Turney, Peter D. (2004) Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities. [Conference Paper]
Abstract
This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill's rule-based part-of-speech tagger. Head words are represented as feature vectors with several hundred features. Approximately half of the features are syntactic and the other half are semantic. The main novelty in the system is the method for generating the semantic features, based on word co-occurrence probabilities. The probabilities are estimated using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler.
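As a rough illustration of how word co-occurrence probabilities can be turned into a numeric feature, the sketch below scores word pairs with pointwise mutual information (PMI) over a toy set of contexts. The abstract does not state which formula the NRC system uses, so PMI is an assumption here. The function names `build_counts` and `pmi` and the sample contexts are hypothetical. The in-memory counts stand in for estimates that the paper obtains from the Waterloo MultiText System over a corpus of about one terabyte.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(contexts):
    """Count the contexts in which each word, and each unordered word pair, appears."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in contexts:
        vocab = sorted(set(tokens))  # count each word once per context
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))  # pairs are stored in sorted order
    return word_counts, pair_counts, len(contexts)


def pmi(w1, w2, word_counts, pair_counts, total):
    """PMI in bits: log2(p(w1, w2) / (p(w1) * p(w2))).

    Returns -inf when the two words never share a context
    (assumed convention; this includes the case w1 == w2).
    """
    a, b = sorted((w1, w2))
    joint = pair_counts[(a, b)]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * total / (word_counts[a] * word_counts[b]))


# Toy contexts (hypothetical data, not from the paper).
contexts = [
    ["river", "bank", "water"],
    ["bank", "money", "loan"],
    ["money", "loan", "interest"],
    ["river", "water", "fish"],
]
word_counts, pair_counts, total = build_counts(contexts)

# A context word of "bank" can be scored against words typical of each sense;
# such scores could then serve as entries in a semantic feature vector.
river_water = pmi("river", "water", word_counts, pair_counts, total)
bank_money = pmi("bank", "money", word_counts, pair_counts, total)
```

In this toy data, "river" and "water" always appear together, so their score is higher than that of "bank" and "money", which share only one context.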
| Item Type: | Conference Paper |
|---|---|
| Subjects: | Computer Science > Language; Linguistics > Computational Linguistics; Linguistics > Semantics; Computer Science > Machine Learning |
| ID Code: | 3732 |
| Deposited By: | Turney, Peter |
| Deposited On: | 30 Jul 2004 |
| Last Modified: | 19 Dec 2009 19:20 |

