Sarcasm Detection in Textual Data: A Supervised Approach

Bharti, Santosh Kumar (2019) Sarcasm Detection in Textual Data: A Supervised Approach. PhD thesis.

PDF (Restricted until 10/06/2021)
Restricted to Repository staff only



Sentiment analysis is a technique for identifying people’s opinions, attitudes, sentiments, and emotions towards a specific target such as individuals, events, topics, products, organizations, or services. Sarcasm is a special type of sentiment comprising words whose meaning is opposite to what is really being said (especially in a sense of insult, wit, irritation, or humor). People often express it verbally through heavy tonal stress and gestural clues such as eye rolling and hand movements. These tonal and gestural clues are missing in text, making detection reliant upon other factors such as capitalization of words, punctuation marks, and exclamation marks. To express sarcasm in text, people often use positive or intensified positive words to express negative feelings about a particular target. Posting sarcastic messages on social media such as Twitter, Facebook, and WhatsApp has become a new trend to avoid direct negativity. Detecting this indirect negativity, i.e., sarcasm, in social media text has become an important task because it can influence business organizations. In the presence of sarcasm, sentiment analysis of social media text becomes a most challenging task. The property of sarcasm that makes it difficult to analyze and detect is the gap between its literal and intended meaning. Therefore, an automated system is required for sarcasm detection in textual data, one capable of identifying the actual sentiment of a given text in the presence of sarcasm.

In this thesis, we propose an automated system for sarcasm detection in tweets written in English as well as Hindi (transliterated in English). It also detects sarcasm in Telugu conversation sentences (transliterated in English). Sarcasm detection methods for text can be categorized as rule-based, pattern-based, machine learning-based, and context-based.

The rule-based approach is the most basic method used for sarcasm detection in text. In this approach, we mainly focus on hyperbolic and syntactic features of the text. Interjections, intensifiers, and punctuation symbols are the most frequent hyperbole features used in text to convey sarcastic messages. Extreme adjectives and extreme adverbs act as intensifiers. Some examples of phrases with intensifiers are “thoroughly enjoyed”, “fantastic weather”, and “so beautiful”. The rule-based approach is simple to implement and often attains good accuracy for text classification. Three rule-based classification methods are proposed, one each for English, Hindi, and Telugu.
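
As a rough illustration of the hyperbole cues described above, the following Python sketch counts interjections, intensifiers, and emphatic punctuation in a text. The word lists, the function names, and the two-cue threshold are assumptions made for this example; they are not the rules proposed in the thesis.

```python
import re

# Hypothetical word lists for illustration only; the lexicons used in
# the thesis are not given in this abstract.
INTERJECTIONS = {"wow", "yay", "oh", "aha"}
INTENSIFIERS = {"so", "thoroughly", "absolutely", "fantastic", "totally"}


def hyperbole_cues(text):
    """Count the three hyperbole cue types (interjection, intensifier, punctuation)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "interjection": sum(t in INTERJECTIONS for t in tokens),
        "intensifier": sum(t in INTENSIFIERS for t in tokens),
        # Runs of two or more '!'/'?' or three or more dots count as one cue each.
        "punctuation": len(re.findall(r"[!?]{2,}|\.{3,}", text)),
    }


def is_sarcastic_rule(text, min_cue_types=2):
    """Flag a text when at least `min_cue_types` distinct cue types fire.

    The threshold is an assumed value chosen for the example.
    """
    cues = hyperbole_cues(text)
    return sum(count > 0 for count in cues.values()) >= min_cue_types
```

For instance, `is_sarcastic_rule("Wow, I thoroughly enjoyed waiting two hours!!!")` fires all three cue types, while a plain statement such as "The meeting starts at noon." fires none.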

The pattern-based approach is the most effective approach for sarcasm detection in text. Here, a corpus of sarcastic tweets and conversation sentences was analyzed, and six unique patterns of sarcastic text were obtained. The patterns are: sarcasm as a contradiction between a tweet’s sentiment and its situation phrases, sarcasm as a contradiction between a user’s likes and dislikes in Twitter data, sarcasm as a contradiction between a tweet and a universal truth, sarcasm as a contradiction between a tweet and its time-dependent facts, sarcasm as a contradiction between a tweet’s sentiment and the context in which it is posted, and a positive tweet with antonym pairs of verbs, adverbs, or adjectives. These pattern-based methods attain high accuracy for sarcasm detection in text.
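
The first pattern above, a contradiction between a tweet's sentiment and its situation phrase, can be sketched as follows. The positive-word set, the list of negative situation phrases, and the function name are hypothetical placeholders; the thesis derives its patterns from a corpus, which this abstract does not reproduce.

```python
import re

# Hypothetical lexicons for illustration only.
POSITIVE_WORDS = {"love", "enjoy", "great", "awesome"}
NEGATIVE_SITUATIONS = ("being ignored", "waiting in line", "working on weekends")


def sentiment_situation_contradiction(tweet):
    """Return True when a positive sentiment word co-occurs with a negative situation phrase."""
    text = tweet.lower()
    tokens = set(re.findall(r"[a-z']+", text))
    has_positive = bool(tokens & POSITIVE_WORDS)
    has_negative_situation = any(phrase in text for phrase in NEGATIVE_SITUATIONS)
    return has_positive and has_negative_situation
```

Under these assumed lexicons, "I love being ignored all day" is flagged, whereas "I love sunny mornings" and "I hate being ignored" are not.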

The machine learning-based approach is the most common technique used for classification. The performance of machine learning classifiers often depends on the quality of the dataset and the feature set. In this thesis, lexical, syntactic, hyperbole, and sentiment features are used with various machine learning algorithms. The classifiers evaluated are Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and AdaBoost. Among these, NB outperformed the other classifiers, which is attributed to the independence of the text features. In text classification, especially when the training set is small, NB can perform better than other classifiers.
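
To make the independence assumption behind Naive Bayes concrete, here is a minimal multinomial NB sketch with add-one (Laplace) smoothing, written in plain Python. It uses only word counts on a four-sentence toy dataset invented for this example; the thesis uses lexical, syntactic, hyperbole, and sentiment features on real data, so this is not the evaluated model.

```python
import math
from collections import Counter, defaultdict

# Toy data invented for this example.
TOY_DOCS = [
    "oh great another monday",
    "love waiting in traffic",
    "the train arrives at noon",
    "meeting moved to friday",
]
TOY_LABELS = ["sarcastic", "sarcastic", "literal", "literal"]


def train_nb(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Pick the class with the highest log prior plus summed log word likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Each word contributes independently: the "naive" assumption.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TOY_DOCS, TOY_LABELS)
```

With this toy model, `predict_nb(model, "great another traffic jam")` selects the sarcastic class and `predict_nb(model, "train to friday meeting")` selects the literal class.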

The context-based approach is the most important method for text classification. Sarcasm can be detected by considering lexical, pragmatic, hyperbolic, or other such features of the text. Some features can also be derived from patterns such as unigrams, bigrams, and trigrams. Other features stand in for verbal or gestural clues, such as emoticons, onomatopoeic expressions of laughter, positive interjections, quotation marks, and punctuation, which can help in detecting sarcasm. However, these features alone are not enough to identify sarcasm in text unless the context of the text is known. A machine, like a human, should be aware of the context of the text and relate it to general world knowledge to identify sarcasm more accurately. In this approach, we mainly focus on the situational, topical, temporal, and historical context of the text.
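
The surface features listed above (n-grams, emoticons, laughter expressions, quotation marks, punctuation) can be extracted with a few regular expressions, as in the sketch below. The patterns are deliberately rough and the function names are assumptions for this example. The sketch covers only the context-free features, which is exactly the limitation the paragraph describes: none of them captures the situational, topical, temporal, or historical context.

```python
import re


def ngrams(tokens, n):
    """Return all contiguous n-token tuples (n=1 unigrams, n=2 bigrams, ...)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def surface_features(text):
    """Extract context-free cues from a text; the regexes are rough approximations."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z']+", lowered)
    return {
        "bigrams": ngrams(tokens, 2),
        # Simple western emoticons such as :) ;-) :D :P
        "has_emoticon": bool(re.search(r"[:;]-?[)(DPp]", text)),
        # Onomatopoeic laughter such as "haha", "hahaha", or "lol"
        "has_laughter": bool(re.search(r"\b(?:ha){2,}h?\b|\blol\b", lowered)),
        # A word or phrase wrapped in straight double quotes
        "has_quotes": bool(re.search(r'"[^"]+"', text)),
        "exclamations": text.count("!"),
    }
```

A classifier or rule set would then combine such cues with contextual signals rather than rely on them alone.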

Item Type: Thesis (PhD)
Uncontrolled Keywords: Context-based; Hindi; Machine learning; NLP; Pattern-based; Rule-based; Sarcasm; Sentiment; Telugu Conversation; Tweets
Subjects: Engineering and Technology > Computer and Information Science > Data Mining
Divisions: Engineering and Technology > Department of Computer Science Engineering
ID Code: 10002
Deposited By: IR Staff BPCL
Deposited On: 06 Jun 2019 16:01
Last Modified: 06 Jun 2019 16:01
Supervisor(s): Babu, Korra Sathya
