Cogprints

A Supervised Learning Approach to Acronym Identification

Nadeau, David and Turney, Peter (2005) A Supervised Learning Approach to Acronym Identification. [Conference Paper]

Full text available as: PDF (100 KB)

Abstract

This paper addresses the task of finding acronym-definition pairs in text. Most previous work on the topic relies on systems built from manually written rules or regular expressions. In this paper, we present a supervised learning approach to the acronym identification task. Our approach reduces the search space of the supervised learning system by imposing weak constraints on the kinds of acronym-definition pairs that can be identified. We obtain results comparable to those of hand-crafted systems that use stronger constraints. We describe our method for reducing the search space, the features used by our supervised learning system, and our experiments with various learning schemes.
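The abstract does not spell out the constraints or the features, so the following Python sketch only illustrates the general shape of the idea: a candidate generator that prunes the search space with weak constraints, followed by a feature extractor whose output a supervised learner could classify. Every specific here is a hypothetical assumption, not the authors' method: the acronym test (2 to 10 characters, starting with a letter, at least two capitals), the window limit of min(|A|+5, 2|A|) words, the rule that a definition's first word must start with the acronym's first letter, and the feature names. For brevity it only considers definitions that precede the acronym.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# All constraints below are illustrative assumptions, not the paper's settings.
MAX_ACRONYM_LEN = 10
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-]*|[()]")


@dataclass
class Candidate:
    acronym: str
    definition: List[str]  # words of the candidate definition
    in_parens: bool        # acronym written as "(ACRONYM)"


def looks_like_acronym(tok: str) -> bool:
    """Hypothetical weak test: 2-10 chars, starts with a letter, >= 2 capitals."""
    capitals = sum(ch.isupper() for ch in tok)
    return 2 <= len(tok) <= MAX_ACRONYM_LEN and tok[0].isalpha() and capitals >= 2


def candidates(text: str) -> List[Candidate]:
    """Enumerate (acronym, preceding word span) pairs under weak constraints."""
    toks = TOKEN_RE.findall(text)
    out = []
    for i, tok in enumerate(toks):
        if not looks_like_acronym(tok):
            continue
        in_parens = 0 < i < len(toks) - 1 and toks[i - 1] == "(" and toks[i + 1] == ")"
        end = i - 1 if in_parens else i  # span stops before "(" or the acronym
        max_words = min(len(tok) + 5, 2 * len(tok))  # assumed window limit
        lowest = max(end - max_words, 0)
        for start in range(end - 1, lowest - 1, -1):  # grow the span leftwards
            words = toks[start:end]
            if any(w in ("(", ")") for w in words):
                break  # never let a definition cross a parenthesis
            # assumed weak constraint: first word shares the acronym's first letter
            if words[0][0].lower() == tok[0].lower():
                out.append(Candidate(tok, words, in_parens))
    return out


def initials_match(acronym: str, words: List[str]) -> float:
    """Fraction of acronym letters matched, in order, by word initials."""
    letters = [c.lower() for c in acronym if c.isalpha()]
    j = 0
    for w in words:
        if j < len(letters) and w[0].lower() == letters[j]:
            j += 1
    return j / len(letters) if letters else 0.0


def features(c: Candidate) -> Dict[str, float]:
    """Hypothetical feature vector for one candidate pair."""
    return {
        "acronym_len": len(c.acronym),
        "def_words": len(c.definition),
        "in_parens": int(c.in_parens),
        "initials_match": initials_match(c.acronym, c.definition),
    }


if __name__ == "__main__":
    for c in candidates("We train a Hidden Markov Model (HMM) on the data."):
        print(c.acronym, c.definition, features(c))
    # HMM ['Hidden', 'Markov', 'Model'] {'acronym_len': 3, 'def_words': 3, 'in_parens': 1, 'initials_match': 1.0}
```

In practice the generator would usually emit several candidates per acronym; each feature dictionary would then be labeled as a correct or incorrect pair and passed to a supervised learner. Which learner works best is the question the paper's experiments with various learning schemes address, so none is fixed here.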

Item Type: Conference Paper
Keywords: acronym identification, supervised learning
Subjects: Computer Science > Language
ID Code: 4399
Deposited By: Nadeau, David
Deposited On: 19 Jun 2005
Last Modified: 11 Mar 2011 08:56

References in Article


Adar, E. (2002) S-RAD: A Simple and Robust Abbreviation Dictionary. HP Laboratories Technical Report, September.

Chang, J.T., Schütze, H. and Altman, R.B. (2002) Creating an Online Dictionary of Abbreviations from MEDLINE. Journal of the American Medical Informatics Association (JAMIA), 9(6), pp. 612-620.

Larkey, L., Ogilvie, P., Price, A. and Tamilio, B. (2000) Acrophile: An Automated Acronym Extractor and Server. In Proceedings of the ACM Digital Libraries Conference, pp. 205-214.

Park, Y. and Byrd, R.J. (2001) Hybrid Text Mining for Finding Abbreviations and Their Definitions. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, Pittsburgh, PA.

Pustejovsky, J., Castaño, J., Cochran, B., Kotecki, M., Morrell, M. and Rumshisky, A. (2001) Extraction and Disambiguation of Acronym-Meaning Pairs in Medline. Unpublished manuscript.

Schwartz, A. and Hearst, M. (2003) A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Texts. In Proceedings of the Pacific Symposium on Biocomputing (PSB).

Taghva, K. and Gilbreth, J. (1999) Recognizing Acronyms and Their Definitions. International Journal on Document Analysis and Recognition, pp. 191-198.

Tufis, D. and Mason, O. (1998) Tagging Romanian Texts: a Case Study for QTAG, a Language Independent Probabilistic Tagger. In Proceedings of the First International Conference on Language Resources and Evaluation (LREC), Spain, pp. 589-596.

Yeates, S. (1999) Automatic Extraction of Acronyms from Text. In Third New Zealand Computer Science Research Students' Conference, pp. 117-124.

Yu, H., Hripcsak, G. and Friedman, C. (2002) Mapping Abbreviations to Full Forms in Biomedical Articles. Journal of the American Medical Informatics Association, 9, pp. 262-272.

Witten, I.H. and Frank, E. (2000) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco.

Zahariev, M. (2004) A (Acronyms). Ph.D. thesis, School of Computing Science, Simon Fraser University.
