creators_name: Turney, Peter D.
creators_id: 2175
type: confpaper
datestamp: 2005-08-11
lastmod: 2011-03-11 08:56:09
metadata_visibility: show
title: Measuring Semantic Similarity by Latent Relational Analysis
ispublished: pub
subjects: comp-sci-lang
subjects: ling-comput
subjects: ling-sem
subjects: comp-sci-mach-learn
subjects: comp-sci-art-intel
full_text_status: public
keywords: analogies, semantic relations, vector space model, noun-modifier expressions, latent relational analysis
abstract: This paper introduces Latent Relational Analysis (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated by the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends the VSM approach in three ways: (1) patterns are derived automatically from the corpus, (2) Singular Value Decomposition is used to smooth the frequency data, and (3) synonyms are used to reformulate word pairs. This paper describes the LRA algorithm and experimentally compares LRA to VSM on two tasks, answering college-level multiple-choice word analogy questions and classifying semantic relations in noun-modifier expressions. LRA achieves state-of-the-art results, reaching human-level performance on the analogy questions and significantly exceeding VSM performance on both tasks.
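The two numerical ideas named in the abstract, cosine similarity between pair vectors and SVD smoothing of the frequency data, can be illustrated with a minimal sketch. The frequency matrix, the choice of `k`, and the helper names `cosine` and `svd_smooth` are hypothetical and chosen only for illustration; the sketch leaves out automatic pattern derivation and synonym reformulation, and it should not be read as the paper's implementation.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def svd_smooth(matrix, k):
    """Rank-k approximation of `matrix` via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]

# Hypothetical toy data (not from the paper): each row is a word pair,
# each column is the corpus frequency of one joining pattern.
pairs = ["cat:meow", "dog:bark", "car:road"]
freq = np.array([
    [8.0, 1.0, 0.0, 3.0],
    [6.0, 2.0, 1.0, 2.0],
    [0.0, 5.0, 7.0, 1.0],
])

# Smooth the frequencies, then compare pairs by cosine.
smoothed = svd_smooth(freq, k=2)
sim_analogous = cosine(smoothed[0], smoothed[1])  # cat:meow vs dog:bark
sim_unrelated = cosine(smoothed[0], smoothed[2])  # cat:meow vs car:road
```

With these made-up counts, the pair vectors for cat:meow and dog:bark point in nearly the same direction, so their cosine is higher than the cosine between cat:meow and car:road.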
date: 2005
date_type: published
pagerange: 1136-1141
refereed: TRUE
referencetext:
[Banerjee and Pedersen, 2003] S. Banerjee and T. Pedersen. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), pages 805-810, Acapulco, Mexico, 2003.
[Barzilay and McKeown, 2001] R. Barzilay and K. McKeown. Extracting paraphrases from a parallel corpus. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 50-57, Toulouse, France, 2001.
[Berry, 1992] M.W. Berry. Large scale singular value computations. International Journal of Supercomputer Applications, 6(1): 13-49, 1992.
[Clarke et al., 1998] C.L.A. Clarke, G.V. Cormack, and C.R. Palmer. An overview of MultiText. ACM SIGIR Forum, 32(2): 14-15, 1998.
[Dagan and Glickman, 2004] I. Dagan and O. Glickman. Probabilistic textual entailment: Generic applied modeling of language variability. In Learning Methods for Text Understanding and Mining, Grenoble, France, 2004.
[Golub and Van Loan, 1996] G.H. Golub and C.F. Van Loan. Matrix Computations. Third edition. Johns Hopkins University Press, Baltimore, MD, 1996.
[Landauer and Dumais, 1997] T.K. Landauer and S.T. Dumais. A solution to Plato’s problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104:211-240, 1997.
[Lesk, 1986] M.E. Lesk. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of ACM SIGDOC ’86, pages 24-26, 1986.
[Lewis, 1991] D.D. Lewis. Evaluating text categorization. In Proceedings of the Speech and Natural Language Workshop, Asilomar, pages 312-318, 1991.
[Lin, 1998] D. Lin. Automatic retrieval and clustering of similar words. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL ’98), pages 768-774, Montreal, Canada, 1998.
[Medin et al., 1990] D.L. Medin, R.L. Goldstone, and D. Gentner. Similarity involving attributes and relations: Judgments of similarity and difference are not inverses. Psychological Science, 1(1): 64-69, 1990.
[Nastase and Szpakowicz, 2003] V. Nastase and S. Szpakowicz. Exploring noun-modifier semantic relations. In Fifth International Workshop on Computational Semantics (IWCS-5), Tilburg, The Netherlands, pages 285-301, 2003.
[Rosario and Hearst, 2001] B. Rosario and M. Hearst. Classifying the semantic relations in noun-compounds via a domain-specific lexical hierarchy. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-01), pages 82-90, 2001.
[Salton and McGill, 1983] G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
[Turney et al., 2003] P.D. Turney, M.L. Littman, J. Bigham, and V. Shnayder. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, pages 482-489, 2003.
[Turney and Littman, 2005] P.D. Turney and M.L. Littman. Corpus-based learning of analogies and semantic relations. Machine Learning, in press, 2005.
citation: Turney, Peter D. (2005) Measuring Semantic Similarity by Latent Relational Analysis. [Conference Paper]
document_url: http://cogprints.org/4501/1/NRC-48255.pdf