System/Method For Automatic Sentiment Analysis For Summarization And Categorization Of Text Using NLP

Abstract: Nowadays most products are sold online on the basis of their reviews and ratings. Many sentiment analysis and classification algorithms exist for analyzing these reviews, but they are limited in the number of reviews they use, in their treatment of polarity, in their use of standard data domains, and in their use of multiple classifiers. Existing approaches have worked on only a few product reviews, and most were implemented for a limited set of products. Moreover, polarity was checked only for positive and negative reviews, using a single classifier. Automated sentiment analysis is a subset of Machine Learning and Natural Language Processing (NLP) that collects data on language, emotions, and images for categorization, summarization, and classification of subjective review data. It is also known as subjective analysis, and it classifies text according to the direction and polarity of consumer opinions. Sentiment analysis is mostly used to determine whether a user's review is favorable, negative, or neutral, in order to gauge the product's popularity or significance in the market. The proposed PCSA system is a general, automatic comment analyzer that determines the polarity of sentiments and remarks effectively. It uses five primary supervised learning classifiers, including Logistic Regression, Random Forest, K-Nearest Neighbor, and WordNet, to summarize the comments and then classify them into positive, negative, or neutral categories. 4 Claims & 3 Figures

Patent Information

Application #
Filing Date
26 November 2022
Publication Number
51/2022
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
ipfc@mlrinstitutions.ac.in
Parent Application

Applicants

MLR Institute of Technology
Laxman Reddy Avenue, Dundigal – 500 043, Medchal–District

Inventors

1. Mrs. M. Harshini
Department of Information Technology, MLR Institute of Technology, Hyderabad
2. Dr. Thatha Venkata Nagaraju
Department of Information Technology, MLR Institute of Technology, Hyderabad
3. Dr. Nagireddy Venkata Rajasekhar Reddy
Department of Information Technology, MLR Institute of Technology, Hyderabad
4. Dr. Allam Balaram
Department of Information Technology, MLR Institute of Technology, Hyderabad
5. Dr. Murali Krishna Namana
Department of CSE, AVN Institute of Engineering and Technology, Hyderabad-500070
6. Mrs. Shruthi Patil
Department of Information Technology, MLR Institute of Technology, Hyderabad
7. Mrs. Jeethu Philip
Department of Information Technology, MLR Institute of Technology, Hyderabad
8. Mr. D. Sandeep
Department of Information Technology, MLR Institute of Technology, Hyderabad

Specification

Description: SYSTEM/METHOD FOR AUTOMATIC SENTIMENT ANALYSIS FOR SUMMARIZATION AND CATEGORIZATION OF TEXT USING NLP
Field of Invention
Nowadays much of the retail industry sells its products online and takes into account customer feedback and reviews of those products. Sentiment analysis, also called opinion mining, plays an important role in analyzing the feedback received from customers. Generally, automatic sentiment analysis identifies the different types of opinions that customers express about different products in the form of text reviews. Natural Language Processing comes into play to analyze these text reviews. Many data sources sell products online, such as Meesho, Amazon, and Flipkart. Sentiment analysis applies text analysis and language processing to recognize and extract individual information from different sources. It is applied to reviews and social media for uses such as customer service and marketing. The properties of a product are determined by the characteristics of its attributes. Sentiments mainly specify the customer's viewpoint, which may be positive, negative, or neutral. A customer's opinion may be expressed about the whole product or about the characteristics of an individual attribute. The main aim of this sentiment analysis is to analyze the text reviews and ratings of products.
Background of the Invention
Nowadays a large number of customers purchase products through different online platforms. The e-commerce industry encompasses a diverse set of websites that buy and sell many products and services throughout the world. These websites compete heavily with one another, which is why they need customers' feedback, comments, and suggestions from every perspective. These comments take the form of opinions that express the customer's real feelings about a product. However, the growing number of such comments also increases the dimensionality of the data. In addition, such a large number of comments includes different types of polarity, which can be positive, negative, or of some other form. These requirements lead to an automatic sentiment classification system, which must be able to divide product reviews into positive, negative, or neutral categories. The analysis performed by such systems also determines the content form of the review. These reviews express the feelings and emotions of the customer about a product (US9223831B2). These feelings are generally unidirectional, but sometimes they are multidirectional and thereby lead to confusing and ambiguous results. The reviews may contain structured or unstructured text, emoticons, or a mixture of text and emoticons. Unstructured reviews are first converted into structured reviews and are then used for further processing. Automatic sentiment classification follows a sequence of steps: collecting reviews, converting text to lowercase and removing extra spaces, tokenization, stop word removal, stemming, feature extraction, classification, and performance analysis. Product reviews are collected in thousands and lakhs from standard websites such as Amazon and Flipkart. Their structured form is stemmed, tokenized, and tagged. Features are extracted from it and are then used to train the classifier.
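The text-cleaning steps listed above (lowercasing, removal of extra spaces, tokenization, stop word removal, and stemming) can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list, the suffix list, and the function names `naive_stem` and `preprocess` are hypothetical, and the crude suffix stripping stands in for whatever stemmer the actual system uses.

```python
import re

# Illustrative stop-word list (an assumption); a real system would load a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "of", "to", "it", "this", "in", "for"}

# Deliberately naive suffix stripping; a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review: str) -> list[str]:
    """Lowercase, collapse whitespace, tokenize, drop stop words, and stem."""
    text = review.lower()
    text = re.sub(r"\s+", " ", text).strip()
    tokens = re.findall(r"[a-z0-9']+", text)
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("  The battery   is AMAZING and charges quickly "))
```

The resulting token list is what a later feature-extraction step would consume.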
These trained classifiers are used to check unknown sets of reviews for different products and to calculate the polarity of the comments. The performance of such a system is assessed in terms of accuracy, recall, precision, and F1 score. In light of this, many systems and tools for sentiment analysis have evolved over the years (US8818788B1). These systems have been implemented for different types of products and services using different supervised classifiers. All such systems and methodologies are elaborated and compared in the upcoming sections.
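For reference, the four performance measures named above can be computed from actual and predicted labels as in the sketch below. It uses the standard one-vs-rest definitions per class; the function name `evaluate` and the sample labels are illustrative assumptions, not results from the described system.

```python
def evaluate(actual, predicted, labels=("positive", "negative", "neutral")):
    """Return overall accuracy plus per-class precision, recall, and F1."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    correct = sum(a == p for a, p in zip(actual, predicted))
    report = {"accuracy": correct / len(actual) if actual else 0.0}

    for label in labels:
        # One-vs-rest counts for this class.
        tp = sum(a == label and p == label for a, p in zip(actual, predicted))
        fp = sum(a != label and p == label for a, p in zip(actual, predicted))
        fn = sum(a == label and p != label for a, p in zip(actual, predicted))

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}

    return report


if __name__ == "__main__":
    # Made-up labels purely for demonstration.
    actual = ["positive", "positive", "negative", "neutral"]
    predicted = ["positive", "negative", "negative", "neutral"]
    print(evaluate(actual, predicted))
```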
Summary of the Invention
The Product Comment Summarizer and Analyzer (PCSA) system begins with a user registration process in which the user enters some details on the PCSA portal. The system then acquires product reviews from two online websites, Amazon and Flipkart; preprocesses and segments them; extracts their features; summarizes them; classifies them into positive, negative, and neutral categories; and finally provides ratings for the products. Classification is performed using five supervised learning algorithms: logistic regression, Naïve Bayes, random forest, SentiWordNet, and KNN. The PCSA model displays the results on the portal. A huge number of reviews are gathered for tens of product categories from commercial websites such as Flipkart and Amazon. Three quarters of the product subcategories are used for the training phase and one quarter for the testing phase. Results are obtained for categorical reviews, ratings, synthesis, and classification of reviews. Several observations emerge from the experimental results and their analysis. The highest numbers of positive and negative reviews are obtained for the phones category, and the highest number of neutral reviews for both phones and routers. Next, the highest ratings are achieved using SentiWordNet with logistic regression classifiers on both the Flipkart and Amazon online platforms. Finally, the PCSA model predicts higher star ratings for items available on Flipkart than for those on Amazon.
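The three-quarters/one-quarter division of product subcategories can be illustrated with a short sketch. The subcategory names other than phones and routers, the fixed random seed, and the function name `split_subcategories` are assumptions made for the example; the source does not say how subcategories are assigned to each phase.

```python
import random


def split_subcategories(subcategories, train_fraction=0.75, seed=0):
    """Shuffle subcategories reproducibly and split them into train and test sets."""
    items = list(subcategories)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


if __name__ == "__main__":
    # Hypothetical subcategory names used only for demonstration.
    subcategories = ["phones", "routers", "laptops", "cameras",
                     "headphones", "tablets", "watches", "printers"]
    train, test = split_subcategories(subcategories)
    print("train:", train)
    print("test:", test)
```

Splitting at the subcategory level, rather than per review, keeps every review of a held-out subcategory unseen during training.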
Brief Description of Drawings
Figure 1: PCSA system level-A: Registration, Analysis and Results
Figure 2: PCSA system level-B: Detailed PCSA system Design
Figure 3: PCSA component and sub-component structure

Detailed Description of the Invention
The Product Comment Summarizer and Analyzer (PCSA) system is a robust, generic design that segregates English comments received from websites such as Flipkart and Amazon using five popular supervised learning classification techniques: logistic regression, Naïve Bayes, random forest, SentiWordNet, and KNN. The PCSA system is designed in training and testing phases. During the training phase, users are allowed to register, and their credentials are stored in databases. This phase accepts multiple URLs from websites such as Flipkart and Amazon. The comments are first preprocessed; the sentences are then divided into segments, features are extracted from these segments, and finally the comments are summarized and classified by the classification algorithms. The system is designed to support all five classification algorithms, and the model is built so that the comments of multiple products are classified using any one technique at a time. During system testing, the proposed PCSA system is tested on a different, unknown set of product comments collected from both websites. It goes through all these steps one by one, and the chosen trained classifier categorizes the comments efficiently.
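To make the train-then-classify flow concrete for one of the five named techniques, the following is a minimal multinomial Naïve Bayes sketch with Laplace smoothing over tokenized comments. It is not the system's implementation: the class name `NaiveBayesSentiment`, the smoothing constant, and the tiny training set are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over token lists, with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()

    def fit(self, documents, labels):
        for tokens, label in zip(documents, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) / (total_words + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Made-up training comments, already tokenized.
    docs = [
        ["great", "phone", "love"], ["excellent", "battery", "great"],
        ["terrible", "phone", "broke"], ["bad", "battery", "terrible"],
        ["okay", "phone", "average"], ["average", "battery", "fine"],
    ]
    labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
    model = NaiveBayesSentiment().fit(docs, labels)
    print(model.predict(["great", "love"]))
```

Any of the other four techniques could sit behind the same `fit`/`predict` interface, which is one way to classify with a single chosen technique at a time.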

The PCSA model design is organized into two primary levels, Level-A and Level-B. PCSA system Level-A, shown in Fig. 1, describes the user registration process, the system analysis, and the result display through the online portal. Level-A comprises three functions: user registration, the PCSA analysis process, and the PCSA result display. User registration first accepts the profile details and profile picture from the user, who then enters a name, e-mail address, and password. The system stores all user information in a database called User Profile Storage. The registration process also offers an option to update the password and picture after authentication. During the PCSA analysis process, the registered user enters the number of URLs and the set of URL links, and then chooses one of the five classification algorithms. The comment analyzer analyzes the products and then displays the results through the configuration. The PCSA result display depicts and compares the actual and predicted results. These results cover the star rating; the precision, recall, and F1-measure; the positive, negative, and neutral reviews; and a graphical representation.
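The three inputs the registered user supplies during the analysis process (the number of URLs, the set of URL links, and the chosen algorithm) could be captured and validated as in this sketch. The class name `AnalysisRequest`, the classifier identifiers, and the placeholder URLs are hypothetical; the source does not describe how the portal validates its input.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Identifiers for the five techniques named in the text (the spellings are assumptions).
CLASSIFIERS = ("logistic_regression", "naive_bayes", "random_forest", "sentiwordnet", "knn")


@dataclass(frozen=True)
class AnalysisRequest:
    url_count: int
    urls: tuple
    classifier: str

    def __post_init__(self):
        # The declared count must match the links actually supplied.
        if self.url_count != len(self.urls):
            raise ValueError("url_count does not match the number of URLs supplied")
        if self.classifier not in CLASSIFIERS:
            raise ValueError(f"unknown classifier: {self.classifier!r}")
        for url in self.urls:
            parsed = urlparse(url)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                raise ValueError(f"not a valid web URL: {url!r}")


if __name__ == "__main__":
    # Placeholder product links, not real listings.
    request = AnalysisRequest(
        url_count=2,
        urls=("https://www.amazon.in/example-product", "https://www.flipkart.com/example-product"),
        classifier="knn",
    )
    print(request)
```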
PCSA system Level-B describes the detailed PCSA system design, as shown in Fig. 2. The PCSA model is implemented in two phases: training and testing. Across these phases the PCSA model uses two storage media, the NLP toolkit (NLTK) and the class repository. The NLTK consists of the English directory, the available stop words, and WordNet information. The class repository consists of three classes: positive, negative, and neutral. During training, the system collects a large set of known comments for multiple products. It preprocesses them by removing stop words, stemming, and then lemmatizing. It segments the comments into sentences using delimiters. Then it extracts the relevant words present in the comments and discards the irrelevant ones. These relevant words can be adjectives, nouns, adverbs, and verbs. After preprocessing, the system gathers features from the comments, which are used to train all five classifiers. The testing stage of the PCSA system tests an unknown collection of user comments gathered from the websites by summarizing and classifying them into three distinct classes. To begin with, a user has to register with the system. The user then provides the number of URL links, the set of URL links, and the classification technique. The system goes through all the steps one by one and classifies the comments into positive, negative, and neutral categories. The system compares the predicted review results for Amazon or Flipkart products with the actual review results for those products.
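The segmentation and relevant-word extraction steps can be sketched as below. Two simplifications are assumed: the delimiter set is limited to `.`, `!`, and `?`, and a caller-supplied set of relevant words stands in for the part-of-speech filtering (adjectives, nouns, adverbs, verbs) that the text describes. The function names `segment` and `extract_features` are hypothetical.

```python
import re
from collections import Counter

# Assumed sentence delimiters; the source does not list the ones actually used.
SENTENCE_DELIMITERS = r"[.!?]+"


def segment(comment):
    """Split a comment into non-empty sentences at the delimiters."""
    return [s.strip() for s in re.split(SENTENCE_DELIMITERS, comment) if s.strip()]


def extract_features(comment, relevant_words):
    """Count occurrences of relevant words across all sentences of a comment."""
    features = Counter()
    for sentence in segment(comment):
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word in relevant_words:
                features[word] += 1
    return features


if __name__ == "__main__":
    relevant = {"great", "phone", "camera", "battery", "weak"}
    print(segment("Great phone! Battery is weak. Would buy again?"))
    print(extract_features("Great phone! Great camera. Battery is weak.", relevant))
```

The resulting word counts are one simple form of feature vector that could be passed to any of the classifiers.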
The PCSA system is organized into components and sub-components, shown in Fig. 3. The figure shows the calls and returns among the components. The system contains four main components, Set_CA, Set_Login, Train_Test, and User_Profiling, also referred to as C1, C2, C3, and C4. Component C1 is subdivided into three sub-components, Basic_Settings, Main_Func, and Set_Env, represented in the figure as C11, C12, and C13 respectively. The second component, C2, is subdivided into three sub-components, Basic_Settings, Main_Function, and Set_Env, represented as C21, C22, and C23 respectively. The third component is divided into seven sub-components: Amazon_Word_Scraper, Flipkart_Word_Scraper, Product_Info, Review_Modeling, ML_Method, Test_Train_Data, and Sentiment_Analyze, represented as C31 through C37 respectively. Finally, the fourth component is divided into four sub-components, Define_Profile, User_Profile, Str_User_Forms, and Registration, represented as C41, C42, C43, and C44 respectively.
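The component hierarchy just described maps naturally onto a small lookup table. The sketch below encodes the names and identifiers exactly as given in the text; the dictionary layout and the helper `sub_component_ids` are illustrative choices, not part of the described system.

```python
# Component id -> (component name, ordered sub-component names), as listed in the text.
COMPONENTS = {
    "C1": ("Set_CA", ["Basic_Settings", "Main_Func", "Set_Env"]),
    "C2": ("Set_Login", ["Basic_Settings", "Main_Function", "Set_Env"]),
    "C3": ("Train_Test", ["Amazon_Word_Scraper", "Flipkart_Word_Scraper", "Product_Info",
                          "Review_Modeling", "ML_Method", "Test_Train_Data", "Sentiment_Analyze"]),
    "C4": ("User_Profiling", ["Define_Profile", "User_Profile", "Str_User_Forms", "Registration"]),
}


def sub_component_ids(components):
    """Derive ids such as C31 by appending the 1-based position to the parent id."""
    ids = {}
    for component_id, (name, subs) in components.items():
        for position, sub in enumerate(subs, start=1):
            ids[f"{component_id}{position}"] = f"{name}.{sub}"
    return ids


if __name__ == "__main__":
    for identifier, qualified_name in sub_component_ids(COMPONENTS).items():
        print(identifier, qualified_name)
```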
The Product Comment Summarizer and Analyzer (PCSA) system is a powerful comment analyzer that analyzes feedback from the Amazon or Flipkart shopping websites. The comments are analyzed through five prime classification algorithms: Naïve Bayes, WordNet, random forest, logistic regression, and KNN. The proposed PCSA system has several strengths. First, it never admits a user without login credentials. Second, password strength is kept very high, so that no unauthenticated user can access it. Third, it can analyze any number of URL links, i.e. products, at a time. Fourth, the system works on all types of textual reviews from both data sets. Fifth, it summarizes all the reviews and also provides a rating for the product. Sixth, it is a multi-tasking system that analyzes, summarizes, and classifies the product comments simultaneously. Seventh, the system's time complexity is kept low. Eighth, it is a fully automated, efficient, fast, and easy-to-understand sentiment analysis system. Ninth, the prediction results obtained from any classification algorithm are also compared with the original results from Amazon or Flipkart. It is found that the PCSA system provides better results than the original results.

4 Claims & 3 Figures
Claims: The scope of the invention is defined by the following claims:

Claim:
1. The System/Method for Automatic Sentiment Analysis for Summarization and Categorization of Text Using NLP comprising the steps of:
a) Identifying and analyzing the comments and the relations among the text.
b) Adopting a method for efficient reviewing and rating of text documents collected from different online websites.
c) Providing a framework and component architecture that describes the analysis of text comments and ratings step by step.
2. The System/Method for Automatic Sentiment Analysis for Summarization and Categorization of Text Using NLP as claimed in claim 1, wherein a novel approach is designed that specifies the relation of the comment analyser with NLP and Machine Learning.
3. The System/Method for Automatic Sentiment Analysis for Summarization and Categorization of Text Using NLP as claimed in claim 1, wherein the method leads to the construction of a new, automated, and efficient sentiment review analyzer.
4. The System/Method for Automatic Sentiment Analysis for Summarization and Categorization of Text Using NLP as claimed in claim 1, wherein the PCSA method is adopted for complete analysis of text summarization and categorization.

Documents

Application Documents

# Name Date
1 202241068094-COMPLETE SPECIFICATION [26-11-2022(online)].pdf 2022-11-26
2 202241068094-REQUEST FOR EARLY PUBLICATION(FORM-9) [26-11-2022(online)].pdf 2022-11-26
3 202241068094-DRAWINGS [26-11-2022(online)].pdf 2022-11-26
4 202241068094-FORM-9 [26-11-2022(online)].pdf 2022-11-26
5 202241068094-EDUCATIONAL INSTITUTION(S) [26-11-2022(online)].pdf 2022-11-26
6 202241068094-FORM FOR SMALL ENTITY(FORM-28) [26-11-2022(online)].pdf 2022-11-26
7 202241068094-EVIDENCE FOR REGISTRATION UNDER SSI [26-11-2022(online)].pdf 2022-11-26
8 202241068094-FORM FOR SMALL ENTITY [26-11-2022(online)].pdf 2022-11-26
9 202241068094-FORM 1 [26-11-2022(online)].pdf 2022-11-26
10 202241068094-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [26-11-2022(online)].pdf 2022-11-26