Classification of Sentimental Reviews Using Natural Language Processing Concepts and Machine Learning Techniques

Agrawal, Ankit (2015) Classification of Sentimental Reviews Using Natural Language Processing Concepts and Machine Learning Techniques. MTech thesis.



Natural language processing (NLP) is a theoretically motivated range of computational techniques for representing and analyzing naturally occurring text at many levels of textual analysis, with the goal of achieving automatic language processing for multiple tasks and applications. From an industry perspective, one of the most important applications of NLP is sentiment analysis. Sentiment analysis is a prominent branch of NLP because of its ability to classify a textual document as having either positive or negative polarity. With the proliferation of the World Wide Web, a huge volume of unstructured text is available in the form of tweets, messages, articles, social networking discussions, and reviews of products and movies, and the right information has to be extracted from this large pool. Thus, there is a need to analyze this data to bring out hidden facts based on the intention of the author of the text. The intention can be either criticism (negative) of a product or movie, or admiration (positive). It can also vary in strength, from strongly positive to positive and from strongly negative to negative. This thesis focuses on classifying movie reviews as either positive or negative using machine learning techniques, namely the Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes (NB) classifiers. Further, an n-gram model is proposed in which documents are classified based on the unigram, bigram, and trigram composition of the words in a sentence. Two datasets are considered for this study: a labeled polarity dataset, in which each movie review is labeled as either positive or negative, and the IMDb movie reviews dataset. Finally, the prediction accuracy of the above-mentioned machine learning algorithms on different manipulations of the same dataset is studied, and a comparative analysis is presented.
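The n-gram classification idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function `ngrams`, the class `NaiveBayes`, the whitespace tokenization, the add-one smoothing, and the four toy reviews are all assumptions made here for illustration only. The sketch extracts unigram, bigram, and trigram features and feeds them to a small multinomial Naive Bayes classifier.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=3):
    """Return all word n-grams of length 1..n_max (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, docs, labels, n_max=3):
        self.n_max = n_max
        self.class_counts = Counter(labels)      # documents per class
        self.feat_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc, n_max)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total_docs)
            total = sum(self.feat_counts[label].values())
            for feat in ngrams(doc, self.n_max):
                if feat not in self.vocab:
                    continue  # ignore n-grams never seen in training
                score += math.log(
                    (self.feat_counts[label][feat] + 1) / (total + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy reviews invented for this sketch; they are not from either dataset.
train_docs = [
    "a great and moving film",
    "great acting and a fine plot",
    "a dull and boring film",
    "boring plot and poor acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_docs, train_labels, n_max=3)
print(model.predict("a great plot"))
```

Setting `n_max` to 1, 2, or 3 restricts the features to unigrams, unigrams plus bigrams, or all three, which is one simple way to compare feature compositions. An SVM or KNN classifier could be substituted for the Naive Bayes step over the same features.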

Item Type: Thesis (MTech)
Uncontrolled Keywords: Natural Language Processing, Sentiment Analysis, Naive Bayes, Support Vector Machine, K-Nearest Neighbor
Subjects: Engineering and Technology > Computer and Information Science > Data Mining
Divisions: Engineering and Technology > Department of Computer Science
ID Code: 7347
Deposited By: Mr. Sanat Kumar Behera
Deposited On: 19 May 2016 19:58
Last Modified: 19 May 2016 19:58
Supervisor(s): Rath, S K
