http://cogprints.org/3122/
Coherent Keyphrase Extraction via Web Mining
Keyphrases are useful for a variety of purposes,
including summarizing, indexing, labeling,
categorizing, clustering, highlighting, browsing, and
searching. The task of automatic keyphrase extraction
is to select keyphrases from within the text of a given
document. Automatic keyphrase extraction makes it
feasible to generate keyphrases for the huge number of
documents that do not have manually assigned
keyphrases. A limitation of previous keyphrase
extraction algorithms is that the selected keyphrases are
occasionally incoherent. That is, the majority of the
output keyphrases may fit together well, but there may
be a minority that appear to be outliers, with no clear
semantic relation to the majority or to each other. This
paper presents enhancements to the Kea keyphrase
extraction algorithm that are designed to increase the
coherence of the extracted keyphrases. The approach is
to use the degree of statistical association among
candidate keyphrases as evidence that they may be
semantically related. The statistical association is
measured using web mining. Experiments demonstrate
that the enhancements improve the quality of the
extracted keyphrases. Furthermore, the enhancements
are not domain-specific: the algorithm generalizes well
when it is trained on one domain (computer science
documents) and tested on another (physics documents).
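
The idea of treating statistical association as evidence of semantic relatedness can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not say which association measure or search engine is used, so pointwise mutual information (PMI) over page-hit counts is an assumption here. All function names and hit counts below are hypothetical toy values. A real system would obtain the counts by querying a web search engine.

```python
import math

def pmi(hits_a, hits_b, hits_ab, total_pages):
    """PMI estimated from page counts (assumed association measure)."""
    if min(hits_a, hits_b, hits_ab) <= 0:
        return float("-inf")  # no co-occurrence evidence at all
    p_a = hits_a / total_pages
    p_b = hits_b / total_pages
    p_ab = hits_ab / total_pages
    return math.log2(p_ab / (p_a * p_b))

def coherence_scores(candidates, hits, joint_hits, total_pages):
    """Average PMI of each candidate phrase with every other candidate."""
    scores = {}
    for phrase in candidates:
        values = [
            pmi(hits[phrase], hits[other],
                joint_hits.get(frozenset((phrase, other)), 0), total_pages)
            for other in candidates if other != phrase
        ]
        scores[phrase] = sum(values) / len(values) if values else 0.0
    return scores

# Hypothetical toy counts standing in for search-engine results.
TOTAL_PAGES = 1_000_000
HITS = {"machine learning": 10_000, "neural network": 8_000, "banana bread": 5_000}
JOINT_HITS = {
    frozenset(("machine learning", "neural network")): 2_000,
    frozenset(("machine learning", "banana bread")): 10,
    frozenset(("neural network", "banana bread")): 8,
}

scores = coherence_scores(list(HITS), HITS, JOINT_HITS, TOTAL_PAGES)
outlier = min(scores, key=scores.get)  # candidate least associated with the rest
```

In this toy data the candidate with the lowest average association is the one unrelated to the others. A score of this kind could serve as one extra feature for a keyphrase classifier, which is the general spirit of the enhancements described above.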
Turney, Peter
Statistical Models
Language
Machine Learning