Abstract: The rise of microblogging sites such as Twitter has given people a platform on which to express their opinions on a wide range of topics, products, and events. To derive useful insights from such opinionated data, it must be properly processed. The goal of this study was to develop a supervised machine learning framework for Twitter sentiment analysis (TSA) that makes use of a knowledge-based Tweet Normalization System (TNS) and an advanced negation handling strategy. Although negation has been the subject of numerous publications, the assessment of exceptions to the rule has received less attention. Dealing with negation exception circumstances was therefore also an emphasis of this work. The primary goal is to avoid incorrectly classifying negation tweets by giving proper attention to negation and taking negation exception tweets into account. To detect these tweets, we built an algorithm specifically for negation exceptions and integrated it into our advanced, corpus-based statistical negation modeling method.
Description: Field of the Invention
To extract useful information from textual data, sentiment analysis employs machine learning and natural language processing (NLP) methods. Researchers in fields as diverse as politics, healthcare, finance, marketing, and business have recently focused on sentiment analysis due to its potential benefits in these areas and others. With the abundance of subjective data at our fingertips, sentiment analysis has quickly grown into a hot area of study. Information of this kind can be found on social media sites in blogs, posts, reviews, comments, and more. Reviewing services, products, subjects, or events by analyzing such opinionated data is quite valuable. One example is the widespread usage of hashtags (#) in modern Twitter discourse, e.g., #Demonetization, #election2019. To learn how people feel about something such as demonetization, one can follow tweets with the hashtag #demonetization. Still, sifting through such subjective digital data in search of human opinion is no easy task, and understanding user sentiment on Twitter can be quite a challenge. Obstacles such as POS tagging, sarcasm detection, polarity detection, and many more are presented. Sentiment analysis must take a comprehensive approach in order to deal with the myriad issues associated with polarity and meaning extraction from text.
Background of the Invention
Instead of studying longer texts such as product reviews, which have typically been the focus of researchers, recent studies have shifted to the shorter, less structured language of microblogs. Because the platform was introduced only in late 2006, no studies of sentiment analysis on Twitter existed before then. Despite its relative youth (just over a decade), the field of sentiment analysis on Twitter has recently experienced explosive growth. This topic focuses on the automated treatment of opinions, sentiments, and textual subjectivity. However, due to the casual tone of tweets, approaches differ from those in the earlier literature. Bermingham and Smeaton (2010) used micro reviews, blogs, tweets, and movie reviews as their subjects. They discovered that sentiment analysis on tweets was far easier than on long texts, although they noted that the picture changes when higher-order n-grams are evaluated. An abundance of literature explores different approaches and properties of Twitter sentiment analysis; most of it focuses on machine learning (ML) and, more recently, deep learning (DL) methods.
Since people frequently use acronyms, misspelled words, out-of-vocabulary terms, etc., to fit within a tweet's character limit, writing on Twitter is extremely unstructured. It may contain unnecessary components such as scripts, URLs, HTML tags, etc. Elongated words (in which a single letter is repeated) and capitalized words are common ways of conveying emphasis. Consequently, the questionable grammar of tweets presents researchers with additional obstacles in developing effective solutions for Twitter sentiment analysis. Since Twitter language differs so much from conventional text, the majority of previous research on Twitter sentiment analysis has included an extra data pre-processing step prior to feature extraction.
To denoise tweets and reduce vocabulary size while conducting sentiment analysis, some researchers investigated the tweet's linguistic quirks, including misspellings, acronyms, emoticons, improper grammatical constructions, punctuation, etc. Haddi et al. (2013) investigated the role of pre-processing in improving the performance of a support vector machine (SVM) model for sentiment analysis of movie reviews, in contrast to most prior research on sentiment analysis, which neglected the effectiveness of pre-processing in elevating classifier performance. They tested SVMs with and without pre-processing and found that pre-processing significantly improved performance. Although pre-processing tweets is a prerequisite for sentiment analysis and important for improved performance, very little current research has concentrated on it. This section mostly addresses previous research that examined the effect of data pre-processing on classification accuracy.
The majority of the work on negation focuses on two kinds of words: those that simply negate, like "not", and those that modify content, like "eliminate" (Choi & Cardie, 2008). Along with negators, Wilson et al. (2009) distinguish between shifters of positive and negative polarity. A few early works also investigated other kinds of negation. Negative polarity items (NPIs) (anything, any, etc.) are taken into account by Taboada et al. (2011); NPIs can influence the semantic orientation of a negative sentence. The NPIs were managed by means of irrealis blocking. Their reasoning is that sentiment analysis may not be able to trust sentiment-bearing words when irrealis markers (such as conditionals, modals, certain verbs and connectives (doubt, although), and non-pronoun inflected nouns) occur in the text. For that reason, under an irrealis marker the polarity values of sentiment-bearing words are not taken into account. Take the opinionated word "great" as an example; in the text "This should have been a great performance" its polarity value is completely disregarded. However, according to Benamara et al. (2012), NPIs cannot be accurately resolved without further in-depth analysis, and disregarding sentiment orientation in NPI cases is not a good idea. They focused on three main areas: negation quantifiers (nothing, nobody, never, none, no) (e.g., "You will never be bored"), negation operators (no more, not, without, neither, and no one), and lexical negation (introduced by implicitly negative terms such as deficiency, absence of, and lack of).
A few previous studies have noted that compound and complex sentences, affected by a variety of language elements such as conditionals, conjunctions, POS tags, and punctuation marks, are difficult to analyze using the aforementioned state-of-the-art methods of scope determination for Twitter sentiment analysis. For example, the conjunction "but" limits the scope of the negation to a single clause and prevents its effect from extending to another clause. "This phone is not nice but it is working properly." is an example of such a sentence. In this sentence, only the word "nice", between "not" and the conjunction "but", is subject to negation. If the "but" in this example were ignored, the conventional method of marking the scope up to the punctuation "." would be wrong. For negation scope determination, some current studies rely on methods more complex than linguistic ones, including probabilistic models, reinforcement learning, rule-based algorithms, semantic parsing, and meta-learning approaches (US11227120B2).
Notably, the concept of negation handling first emerged in the biomedical domain. As a result, prior research has mostly used probabilistic models such as CRFs or shallow semantic parsing to tackle negation scope in the biomedical or product review domains. Vincze et al. (2008) developed one such biomedical corpus, annotated with negation cues and their scopes; in this corpus, the scope always extends to its maximal span and includes the negation word. These were the first biomedical corpora of their kind to be made publicly available (US10565306B2).
The ability of neural network models to handle sequential data is why they have been utilized for negation scope detection in a small number of recent publications. For example, by investigating several LSTM models, Gautam et al. (2018) dealt with negation scope in tutorial dialogues. They employed LSTMs both to detect negation cues and to identify negation scopes, and achieved better results than a machine learning CRF model that requires manually crafted features. Their negation scope task is comparable to prior work by Banjade et al. (2016); however, their use of LSTMs for negation cue detection extends what Banjade et al. (2016) accomplished.
Summary of the Invention
This study covers negation modeling in Twitter sentiment analysis. We outline the steps involved in modeling negation and the many types of negation. In addition, we describe our method for dealing with tweets that contain a negation cue but no actual negation sense. Many different kinds of negation are possible. Morphological negation involves the addition of a negating prefix or suffix to the root of an opinionated word (e.g., impolite, dishonest, or limitless); syntactic negation is characterized by the presence of explicit negation cues such as no, not, never, etc.; and implied negation carries an implicit sense of negation ("It will be her first and last movie"). Diminishers, such as barely, hardly, and seldom, soften the intensity of strongly opinionated words. We demonstrate the usefulness of our suggested approach for managing negation exception circumstances through a series of experiments, and we compare manually obtained findings with those of our newly developed negation exception algorithm in order to assess its accuracy.
Brief Description of Drawings
Figure 1: Sentiment Analysis Process
Detailed Description of the Invention
Many different kinds of negation are possible. Morphological negation involves the addition of a negating prefix or suffix to the root of an opinionated word (e.g., impolite, dishonest, or limitless); syntactic negation is characterized by the presence of explicit negation cues such as no, not, never, etc.; and implied negation carries an implicit sense of negation ("It will be her first and last movie"). Diminishers, such as barely, hardly, and seldom, soften the intensity of strongly opinionated words. As an example: "@OfficeOfRG and by the way people have hardly supported your stance against #Demonetization". In this tweet, "hardly" tones down the polarity of the word "supported". A diminisher (an adverb or adjective) and its scope can occur anywhere in the text.
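The diminisher effect described above can be sketched in a few lines. This is an illustrative approximation, not the patented method: the word lists, the toy polarity lexicon, and the 0.5 damping factor are all assumptions made for the example.

```python
# Sketch: tone down (rather than flip) the polarity of an opinion word that
# immediately follows a diminisher such as "hardly" or "barely".

DIMINISHERS = {"barely", "hardly", "seldom", "scarcely", "rarely"}

# Toy prior-polarity lexicon with SentiWordNet-style real-valued scores.
POLARITY = {"supported": 0.8, "nice": 0.7, "bored": -0.6}

def score_tokens(tokens, damping=0.5):
    """Return per-token polarity scores, damping words preceded by a diminisher."""
    scores = []
    for i, tok in enumerate(tokens):
        s = POLARITY.get(tok.lower(), 0.0)
        if i > 0 and tokens[i - 1].lower() in DIMINISHERS:
            s *= damping  # a diminisher softens intensity instead of reversing it
        scores.append(s)
    return scores

tweet = "people have hardly supported your stance".split()
print(score_tokens(tweet))  # "supported" is damped from 0.8 to 0.4
```

A real system would also let the diminisher's scope extend beyond the immediately following token, as the text notes that its scope can occur anywhere.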
When dealing with syntactic negation, there are essentially three steps. The first and most crucial step is finding negation cues, such as no, not, and many others. The next step is to determine the negation scope by locating the negated context terms that follow a negation cue. The last step is to handle the terms that have been marked as falling under the negation scope. In other words, the three steps are sequential and work as a pipeline, with the output of one step feeding into the next.
The first stage in negation modeling for sentiment analysis is detecting the presence of negation terms. Negation cues are words and phrases that explicitly negate, such as "no", "nor", and "not", or that imply negation, such as "no matter what" or "nowhere". The most prevalent English cues are the word "not" and its contractions, such as isn't and can't. Several methods, including lexicon-based approaches, machine learning techniques, and parsing, have been proposed in the literature for negation cue recognition. In the machine learning approach, supervised classifiers such as CRFs are employed to predict negation cues. The lexicon-based method uses a list of explicit negation words to identify cues through a look-up mechanism. Machine learning can outperform lexicon-based methods at negation cue prediction, but it requires corpora annotated with those cues. Moreover, most top-tier Twitter sentiment analysis systems use a lexicon-based approach because it is easy to use and produces good results. Accordingly, we also identified negation cues using the lexicon-based technique. As part of their work on negation handling, researchers have compiled multiple lists of negation cues.
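The lexicon-based look-up described above can be sketched as follows. The cue list here is a small illustrative subset assumed for the example, not the full list compiled in the study.

```python
# Sketch of lexicon-based negation cue detection: a simple look-up of each
# token against a list of explicit negation words and contractions.

NEGATION_CUES = {"no", "not", "nor", "never", "cannot", "nowhere",
                 "isn't", "can't", "didn't", "don't", "wasn't", "won't"}

def find_negation_cues(tokens):
    """Return (index, token) pairs for every token found in the cue lexicon."""
    return [(i, t) for i, t in enumerate(tokens) if t.lower() in NEGATION_CUES]

tokens = "This phone is not nice but it is working properly .".split()
print(find_negation_cues(tokens))  # [(3, 'not')]
```

The look-up is case-insensitive; a production system would also normalize contractions and elongated spellings during tweet pre-processing before this step.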
The negation scope defines the negated context words, i.e., those words that are affected by the negation. A negation scope may consist of a single word or several words. Various methods, such as rule-based linguistics, machine learning, compositional semantic parsing, and meta-learning, have been used for negation scope resolution in the past. Most state-of-the-art systems for analyzing Twitter sentiment use rule-based linguistics, the most popular and conventional of these approaches. Under this convention, the negation scope runs from the negation cue to the first punctuation mark in the text. Even though the NRC-Canada team and other top-performing SemEval teams utilized this method, it results in the needless modification of every word between the cue and the first punctuation mark, regardless of polarity. Alternative linguistic techniques include marking only the first sentiment-bearing word, the remainder of the sentence, or a window of predetermined length. While the aforementioned linguistic approaches to scope resolution have shown performance increases in the literature, in practice they are only effective for simple texts. Due to the presence of numerous language elements, such as conjunctions (e.g., "but"), punctuation, POS tags, and conditionals (e.g., "if" or "else"), such algorithms may fail at scope identification in compound and complex text (containing more than one clause). For example, the inclusion of a conjunction like "but" in a compound sentence restricts the scope of negation to a single clause and prevents it from extending to other parts of the sentence. When the conjunction "but" appears in a piece of text, the negation scope is immediately terminated. One tweet reads: "The truth is, MODI didn't bring unbearable change but we are accustomed to easy life. Battle for Demonetization: Apathy2Change". Here, the scope of the negation cue "didn't" should extend only up to the token "change".
The polarity of the phrase after "but" should not be affected, since the conjunction terminates the negation effect early. This is also accurate on a semantic level. Nevertheless, if the presence of the conjunction were disregarded in this case, the negation cue "didn't" would also affect the word "easy", and the sentiment prediction would be inaccurate.
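The scope rule discussed above can be sketched as a simple scan: the scope starts after a negation cue and ends at the first punctuation mark, or earlier at a clause-terminating conjunction such as "but". The cue and terminator lists are illustrative assumptions, not the study's full lists.

```python
# Sketch of rule-based negation scope resolution with early termination at
# the conjunction "but", as described in the text.

import string

NEGATION_CUES = {"not", "no", "never", "didn't", "isn't"}
SCOPE_TERMINATORS = {"but"}  # conjunctions that cut the scope short

def negation_scope(tokens):
    """Return indices of tokens inside the scope of the first negation cue."""
    scope = []
    negating = False
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if negating:
            # "but" or a punctuation token terminates the negation scope
            if low in SCOPE_TERMINATORS or all(c in string.punctuation for c in tok):
                break
            scope.append(i)
        elif low in NEGATION_CUES:
            negating = True
    return scope

tokens = "This phone is not nice but it is working properly .".split()
print([tokens[i] for i in negation_scope(tokens)])  # ['nice']
```

Without the `SCOPE_TERMINATORS` check, the scope would wrongly run all the way to the final ".", which is exactly the failure mode the text describes for conventional punctuation-bounded scoping.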
Negation modeling concludes with handling the terms that fall within the negation scope. Reverse polarity, shift polarity, and corpus-based approaches are among the most prominent methods for this step. In sentiment analysis, these methods are usually applied together with lexicon-based approaches. In the reverse polarity technique, words in the negated context are essentially flipped: the polarity score goes from p to -p. While some lexicons, like Bing Liu's, only list positive and negative terms, others, like SentiWordNet (SWN), also include the actual polarity values of opinion-bearing words. In Bing Liu-type lexicons, positive words are typically given a score of +1 and negative terms a score of -1; under reverse polarity, a positive word's score is changed to -1 and vice versa. Similarly, in the SWN lexicon, a word's polarity is reversed by multiplying the value of the negated context word by -1.
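The reverse-polarity treatment (p to -p) can be sketched as below. The tiny lexicon is an assumption for illustration; a real system would use Bing Liu's list (scores of +1/-1) or SentiWordNet-style real-valued scores, as discussed above.

```python
# Sketch of reverse polarity: any sentiment word whose index falls inside the
# negation scope has its prior polarity score p replaced by -p.

POLARITY = {"nice": 1, "great": 1, "bad": -1}  # Bing Liu-style +/-1 scores

def polarity_with_negation(tokens, scope):
    """Sum token polarities, reversing those whose index is in `scope`."""
    total = 0
    for i, tok in enumerate(tokens):
        p = POLARITY.get(tok.lower(), 0)
        if i in scope:
            p = -p  # reverse polarity: p -> -p
        total += p
    return total

tokens = "This phone is not nice .".split()
print(polarity_with_negation(tokens, scope={4}))  # -1
```

The same multiplication by -1 applies unchanged to real-valued SWN scores, since the operation is score-agnostic.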
Nevertheless, we enhanced their corpus-based method by incorporating a handful of linguistic rules for dealing with tweets that contain negation cues but lack negation sense, so that such tweets are not misclassified. The presence of a negation cue does not always indicate negation sense; for example, in negative rhetorical questions, as in "@padrebrey I understand your perspective. I thought Ash Wednesday wasn't a holy day of obligation? It's never escaped my notice." In this case, the negation cue "wasn't" does not affect the word "obligation" in any way. To avoid applying negation models to them, it is essential to find these tweets in the corpus. No one has provided guidelines for dealing with such cases, even though a few studies in the current literature touch on them. Reitan et al. (2015) found that their machine learning classifier for scope detection had trouble with negation cues operating as non-cues, which typically occur as determiners. A case in point is a cue that serves as an exclamation rather than a negation (such as "No! don't touch it"). We therefore researched this area and analyzed the obtained corpus of negation tweets. For those instances where a negation word is present but has no scope, we have established rules for handling the situation. In two common contexts, negation cues serve as non-cues, meaning they do not convey negation.
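A hedged sketch of the exception idea follows. The two heuristics below (a cue inside a question treated as rhetorical, and a cue used as a bare exclamation such as "No!") are illustrative approximations of the two contexts discussed, not the exact rules of the patented Negation Exception Algorithm.

```python
# Sketch: flag tweets where a negation cue should NOT trigger negation
# handling, approximating the two non-cue contexts described in the text.

NEGATION_CUES = {"no", "not", "never", "isn't", "wasn't", "don't"}

def is_negation_exception(sentence):
    """Heuristically decide whether a cue-bearing sentence lacks negation sense."""
    tokens = sentence.split()
    if not any(t.lower().strip("!?.,") in NEGATION_CUES for t in tokens):
        return False  # no cue present, so no exception to consider
    # Rule 1 (assumption): a question containing a cue is treated as rhetorical.
    if sentence.rstrip().endswith("?"):
        return True
    # Rule 2 (assumption): a cue immediately followed by "!" is an exclamation.
    if any(t.lower() in {c + "!" for c in NEGATION_CUES} for t in tokens):
        return True
    return False

print(is_negation_exception("I thought Ash Wednesday wasn't a holy day?"))  # True
print(is_negation_exception("No! don't touch it"))   # True (only "No!" excepted)
print(is_negation_exception("This phone is not nice"))  # False
```

In a full pipeline, a tweet flagged by this check would simply skip the scope-marking and polarity-reversal steps.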
The preceding section provides a comprehensive overview of such cases. Our algorithm identifies tweets in which a negation cue is present but no negation sense is present semantically, so that negation handling is not applied to them. The input to the algorithm is a list of tweet tokens and POS tokens; tokenization and POS tagging of the Twitter corpus are therefore prerequisites, and the CMU POS tagger was utilized to obtain the tweet and POS tokens. The algorithm also needs a set of negation phrases (phrases containing a negation cue but no negation sense) as well as a list of negation cues. The algorithm outputs a list of tweet tokens in which negated context words (the scope of negation) are marked with the suffix "_NEG". However, negation tweets that conform to exception cases 1 and 2, as determined by the algorithm's own rules, are excluded from this marking. The procedure for addressing negation exception occurrences is shown in Figure 1.
Claims: The scope of the invention is defined by the following claims:
1. A method for modelling syntactic negation in sentiment analysis of tweets on different streams, comprising the steps of:
a) identifying the presence of negation cues; and
b) handling those words which are marked as falling under the negation scope.
2. The method as claimed in claim 1, wherein a Conditional Random Field (CRF) algorithm is used to identify the presence of negation cues.
3. The method as claimed in claim 1, wherein part-of-speech tagging is used to identify the scope of negation cues in the tweets.
4. The method as claimed in claim 1, wherein a Negation Exception Algorithm is designed to handle those words which are marked as falling under the negation scope.
| # | Name | Date |
|---|---|---|
| 1 | 202441032322-REQUEST FOR EARLY PUBLICATION(FORM-9) [24-04-2024(online)].pdf | 2024-04-24 |
| 2 | 202441032322-FORM-9 [24-04-2024(online)].pdf | 2024-04-24 |
| 3 | 202441032322-FORM FOR SMALL ENTITY(FORM-28) [24-04-2024(online)].pdf | 2024-04-24 |
| 4 | 202441032322-FORM 1 [24-04-2024(online)].pdf | 2024-04-24 |
| 5 | 202441032322-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [24-04-2024(online)].pdf | 2024-04-24 |
| 6 | 202441032322-EVIDENCE FOR REGISTRATION UNDER SSI [24-04-2024(online)].pdf | 2024-04-24 |
| 7 | 202441032322-EDUCATIONAL INSTITUTION(S) [24-04-2024(online)].pdf | 2024-04-24 |
| 8 | 202441032322-DRAWINGS [24-04-2024(online)].pdf | 2024-04-24 |
| 9 | 202441032322-COMPLETE SPECIFICATION [24-04-2024(online)].pdf | 2024-04-24 |