---
abstract: 'The evaluative character of a word is called its semantic orientation. A positive semantic orientation implies desirability (e.g., "honest", "intrepid") and a negative semantic orientation implies undesirability (e.g., "disturbing", "superfluous"). This paper introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora. The method involves issuing queries to a Web search engine and using pointwise mutual information to analyse the results. The algorithm is empirically evaluated using a training corpus of approximately one hundred billion words, the subset of the Web that is indexed by the chosen search engine. Tested with 3,596 words (1,614 positive and 1,982 negative), the algorithm attains an accuracy of 80%. The 3,596 test words include adjectives, adverbs, nouns, and verbs. The accuracy is comparable with the results achieved by Hatzivassiloglou and McKeown (1997), using a complex four-stage supervised learning algorithm that is restricted to determining the semantic orientation of adjectives.'
altloc:
  - http://extractor.iit.nrc.ca/reports/ERB-1094.pdf
chapter: ~
commentary: ~
commref: ~
confdates: ~
conference: ~
confloc: ~
contact_email: ~
creators_id: []
creators_name:
  - family: Turney
    given: Peter D.
    honourific: ''
    lineage: ''
  - family: Littman
    given: Michael L.
    honourific: ''
    lineage: ''
date: 2002
date_type: published
datestamp: 2002-07-15
department: Institute for Information Technology
dir: disk0/00/00/23/22
edit_lock_since: ~
edit_lock_until: ~
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 2322
fileinfo: /style/images/fileicons/application_postscript.png;/2322/1/ERB%2D1094.ps|/style/images/fileicons/application_pdf.png;/2322/5/ERB%2D1094.pdf
full_text_status: public
importid: ~
institution: National Research Council Canada
isbn: ~
ispublished: unpub
issn: ~
item_issues_comment: []
item_issues_count: 0
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: ~
lastmod: 2011-03-11 08:54:57
latitude: ~
longitude: ~
metadata_visibility: show
note: ~
number: ~
pagerange: ~
pubdom: FALSE
publication: ~
publisher: ~
refereed: FALSE
referencetext: ~
relation_type: []
relation_uri: []
reportno: NRC Technical Report ERB-1094
rev_number: 14
series: ~
source: ~
status_changed: 2007-09-12 16:44:14
subjects:
  - comp-sci-art-intel
  - comp-sci-lang
  - comp-sci-mach-learn
  - comp-sci-stat-model
succeeds: ~
suggestions: ~
sword_depositor: ~
sword_slug: ~
thesistype: ~
title: Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus
type: techreport
userid: 2175
volume: ~