Measuring Semantic Similarity by Latent Relational Analysis

Turney, Peter D. (2005) Measuring Semantic Similarity by Latent Relational Analysis. [Conference Paper]

Abstract

This paper introduces Latent Relational Analysis (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated by the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends the VSM approach in three ways: (1) patterns are derived automatically from the corpus, (2) Singular Value Decomposition is used to smooth the frequency data, and (3) synonyms are used to reformulate word pairs. This paper describes the LRA algorithm and experimentally compares LRA to VSM on two tasks: answering college-level multiple-choice word analogy questions and classifying semantic relations in noun-modifier expressions. LRA achieves state-of-the-art results, reaching human-level performance on the analogy questions and significantly exceeding VSM performance on both tasks.
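
The two numerical ideas named in the abstract, cosine similarity between pattern-frequency vectors and smoothing by Singular Value Decomposition, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the word pairs, the four pattern columns, and all counts are invented toy values, the function names are hypothetical, and a plain rank-k reconstruction stands in for whatever preprocessing and projection the full algorithm applies.

```python
import numpy as np


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    norm_u, norm_v = np.linalg.norm(u), np.linalg.norm(v)
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return float(np.dot(u, v) / (norm_u * norm_v))


def smooth_with_svd(matrix, k):
    """Return the rank-k approximation of `matrix` from a truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))  # cannot keep more singular values than exist
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


# ASSUMPTION: toy data, not from the paper. Each row is a word pair and each
# column is the corpus frequency of one (unspecified) joining pattern.
PAIRS = ["cat:meow", "dog:bark", "car:road"]
COUNTS = np.array([
    [8.0, 5.0, 0.0, 1.0],
    [7.0, 6.0, 1.0, 0.0],
    [0.0, 1.0, 9.0, 6.0],
])


def relational_similarity(counts, i, j, k=None):
    """Cosine between rows i and j, optionally after rank-k SVD smoothing."""
    rows = counts if k is None else smooth_with_svd(counts, k)
    return cosine(rows[i], rows[j])


if __name__ == "__main__":
    for k in (None, 2):
        label = "raw counts" if k is None else f"rank-{k} smoothing"
        analogous = relational_similarity(COUNTS, 0, 1, k)
        unrelated = relational_similarity(COUNTS, 0, 2, k)
        print(f"{label}: {PAIRS[0]} vs {PAIRS[1]} = {analogous:.3f}, "
              f"{PAIRS[0]} vs {PAIRS[2]} = {unrelated:.3f}")
```

In this toy matrix the rows for cat:meow and dog:bark load on the same patterns, so their cosine is higher than the cosine between cat:meow and car:road, with or without smoothing. The value of k is a free parameter here, and choosing it for real data is outside the scope of the sketch.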

Item Type: Conference Paper
Keywords: analogies, semantic relations, vector space model, noun-modifier expressions, latent relational analysis
Subjects: Computer Science > Language; Linguistics > Computational Linguistics; Linguistics > Semantics; Computer Science > Machine Learning; Computer Science > Artificial Intelligence
ID Code: 4501
Deposited By: Turney, Peter
Deposited On: 11 Aug 2005
Last Modified: 11 Mar 2011 08:56
