Helmholtz Principle-Based Keyword Extraction

Pradhan, Anima (2013) Helmholtz Principle-Based Keyword Extraction. MTech thesis.



In today’s world of evolving technology, everybody wishes to accomplish tasks in the least time. As the information available online grows every day, it becomes very difficult to summarize more than 100 documents in acceptable time. Thus, “text summarization” is a challenging problem in Natural Language Processing (NLP), especially in the context of global languages. In this thesis, we survey the taxonomy of text summarization from different aspects and briefly explain the different approaches to summarization and the evaluation parameters. We also present details and facts about more than fifty automatic text summarization systems, to ease the job of researchers and to serve as a short encyclopedia of the investigated systems. Keyword extraction methods play a vital role in text mining and document processing. Keywords represent the essential content of a document, and text mining applications take advantage of keywords when processing documents. A quality keyword is a word that concisely represents the exact content of the text. It is very difficult to process a large number of documents to get high-quality keywords in acceptable time. This thesis compares the most popular keyword extraction method, tf-idf, with the proposed method, which is based on the Helmholtz Principle. The Helmholtz Principle is based on ideas from image processing and is derived from the Gestalt theory of human perception. We also investigate the run time both methods need to extract the keywords. Experimental results show that the keyword extraction method based on the Helmholtz Principle outperforms tf-idf.
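As a rough illustration of the two scoring schemes the abstract compares, the sketch below computes a plain tf-idf score and a "number of false alarms" (NFA) score in the style commonly used for Helmholtz-principle keyword detection. It is not taken from the thesis: the function names (`tf_idf`, `helmholtz_nfa`, `keywords`), the use of one container per document, the NFA form C(K, m) / N^(m-1), and the threshold of 1 are all assumptions made for this example, and the thesis may define these quantities differently.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document.

    Assumed variant: relative term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def helmholtz_nfa(docs):
    """Assumed NFA form: C(K, m) / N**(m - 1).

    K = occurrences of the term in the whole corpus,
    m = occurrences in the current document,
    N = number of documents (each document is treated as one container).
    """
    n_docs = len(docs)
    corpus_counts = Counter(term for doc in docs for term in doc)
    results = []
    for doc in docs:
        counts = Counter(doc)
        results.append({
            term: math.comb(corpus_counts[term], m) / n_docs ** (m - 1)
            for term, m in counts.items()
        })
    return results


def keywords(nfa_scores, threshold=1.0):
    """Terms whose NFA falls below the threshold, i.e. unlikely to be chance."""
    return sorted(term for term, nfa in nfa_scores.items() if nfa < threshold)
```

In this sketch a term is flagged only when its concentration in one document would be unexpected if its occurrences were spread at random over the corpus, so a word that appears once in every document receives a tf-idf score of zero and an NFA well above the threshold.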

Item Type: Thesis (MTech)
Uncontrolled Keywords: Text Mining; Text Summarization; Stemming; Helmholtz Principle; Information Retrieval; Keyword Extraction; Term Frequency - Inverse Document Frequency.
Subjects: Engineering and Technology > Computer and Information Science > Data Mining
Divisions: Engineering and Technology > Department of Computer Science
ID Code: 5048
Deposited By: Hemanta Biswal
Deposited On: 06 Dec 2013 10:15
Last Modified: 06 Dec 2013 10:15
Supervisor(s): Babu, K S
