Cogprints

Latent Semantic Indexing for Patent Information

Ryley, Dr. James (2007) Latent Semantic Indexing for Patent Information. [Preprint]

Abstract

Latent Semantic Indexing (LSI) promises more accurate information retrieval by incorporating statistical information on term meaning and frequency when retrieving documents in response to a search. LSI’s precision and accuracy have been demonstrated many times on test corpora, but the size and heterogeneity of the world’s patent literature pose a significant challenge to implementing an effective LSI search engine. Some of the factors that must be addressed to realize the goal of a more accurate patent search engine are discussed herein.

Item Type: Preprint
Keywords: patents, search, LSI, LSA, latent semantic indexing, latent semantic analysis, SVD, singular value decomposition, conceptual search
Subjects: Computer Science > Language
ID Code: 5710
Deposited By: Ryley, Dr. James
Deposited On: 12 Sep 2007
Last Modified: 11 Mar 2011 08:56
