?url_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&rft.title=EXPLOITING+N-GRAM+IMPORTANCE+AND+ADDITIONAL+KNOWEDGE+BASED+ON+WIKIPEDIA+FOR+IMPROVEMENTS+IN+GAAC+BASED+DOCUMENT+CLUSTERING&rft.creator=Kumar%2C+Mr.+Niraj+&rft.creator=Vemula%2C+Mr.+Venkata+Vinay+Babu&rft.creator=Srinathan%2C+Dr.+Kannan&rft.creator=Varma%2C+Dr.+Vasudeva&rft.subject=Statistical+Models&rft.description=This+paper+provides+a+solution+to+the+issue%3A+%E2%80%9CHow+can+we+use+Wikipedia+based+concepts+in+document+clustering+with+lesser+human+involvement%2C+accompanied+by+effective+improvements+in+result%3F%E2%80%9D+In+the+devised+system%2C+we+propose+a+method+to+exploit+the+importance+of+N-grams+in+a+document+and+use+Wikipedia+based+additional+knowledge+for+GAAC+based+document+clustering.+The+importance+of+N-grams+in+a+document+depends+on+several+features+including%2C+but+not+limited+to%3A+frequency%2C+position+of+their+occurrence+in+a+sentence+and+the+position+of+the+sentence+in+which+they+occur%2C+in+the+document.+First%2C+we+introduce+a+new+similarity+measure%2C+which+takes+the+weighted+N-gram+importance+into+account%2C+in+the+calculation+of+similarity+measure+while+performing+document+clustering.+As+a+result%2C+the+chances+of+topical+similarity+in+clustering+are+improved.+Second%2C+we+use+Wikipedia+as+an+additional+knowledge+base+both%2C+to+remove+noisy+entries+from+the+extracted+N-grams+and+to+reduce+the+information+gap+between+N-grams+that+are+conceptually-related%2C+which+do+not+have+a+match+owing+to+differences+in+writing+scheme+or+strategies.+Our+experimental+results+on+the+publicly+available+text+dataset+clearly+show+that+our+devised+system+has+a+significant+improvement+in+performance+over+bag-of-words+based+state-of-the-art+systems+in+this+area.&rft.date=2010-10-25&rft.type=Conference+Paper&rft.type=PeerReviewed&rft.format=application%2Fpdf&rft.identifier=http%3A%2F%2Fcogprints.org%2F7148%2F1%2FKDIR_Niraj.pdf&rft.identifier=++Kumar%2C+Mr
.+Niraj++and+Vemula%2C+Mr.+Venkata+Vinay+Babu+and+Srinathan%2C+Dr.+Kannan+and+Varma%2C+Dr.+Vasudeva++(2010)+EXPLOITING+N-GRAM+IMPORTANCE+AND+ADDITIONAL+KNOWEDGE+BASED+ON+WIKIPEDIA+FOR+IMPROVEMENTS+IN+GAAC+BASED+DOCUMENT+CLUSTERING.++%5BConference+Paper%5D+++++&rft.relation=http%3A%2F%2Fcogprints.org%2F7148%2F