creators_name: Berkowitz, Eric
creators_name: Elkhadiri, Mohamed Reda
creators_name: Sahouri, Tim
creators_name: Abraham, Michel
editors_name: Berkowitz, Eric
type: confpaper
datestamp: 2004-06-05
lastmod: 2011-03-11 08:55:37
metadata_visibility: show
title: Intelligent Content Based Title and Author Name Extraction from Formatted Documents
ispublished: pub
subjects: comp-sci-lang
subjects: archives
full_text_status: public
keywords: Document Classification Indexing
abstract: This paper describes the development of algorithms for extracting the title and the names of the authors from documents available on the World Wide Web. In this paper we describe several algorithms for doing so in a manner designed not to rely on specific stylistic dictates of any document formatting standard. Rather, they are designed to rely on a combination of overt and subtle cues that form a generalized, common standard for placing this information in a document and its easy extraction by readers.
date: 2004
date_type: published
publisher: Omnipress
pagerange: 119-124
refereed: TRUE
citation: Berkowitz, Eric and Elkhadiri, Mohamed Reda and Sahouri, Tim and Abraham, Michel (2004) Intelligent Content Based Title and Author Name Extraction from Formatted Documents. [Conference Paper]
document_url: http://cogprints.org/3663/1/ebmaics2004b.pdf