Abstract: Sentiment Analysis (SA) is an important area of natural language processing research owing to its considerable significance in understanding public opinion and performing accurate opinion-based evaluations. A wide variety of data types, including images, videos, audio, and text, is constantly flooding in due to the proliferation of online shopping and social media usage. Text, the most prevalent type of unstructured data, deserves particularly close attention. Several approaches have been suggested to efficiently extract useful information from large datasets, which is understandable given the data's abundance. Recognising polarity in lengthy consumer reviews remains a challenge because of the complexity of dealing with massive textual datasets generated by reviews, comments, tweets, and posts. In response to this difficulty, this research presents the Double Path Transformer Network (DPTN), an easy-to-understand architecture that models both global and local information for thorough review classification. The research proposes a parallel design that combines a convolutional network with a strong self-attention mechanism to improve the synergy between the attention path and the convolutional path. The hyperparameters are fine-tuned using the gaining-sharing knowledge optimization (GSK) method, which improves the classification accuracy of the model. Even without explicit metrics for class imbalance, the research shows that optimization techniques and deep learning work together to handle it gracefully.
Description: Field of Invention
The present invention relates to the field of artificial intelligence, natural language processing (NLP), and sentiment analysis, specifically focusing on deep learning-based sentiment classification in large-scale e-commerce platforms. It introduces a novel GSK-Based Double Path Transformer Network designed to enhance the accuracy and efficiency of sentiment analysis by leveraging advanced machine learning techniques for analyzing public sentiment in online reviews. This invention finds applications in computational linguistics, opinion mining, automated customer feedback analysis, and business intelligence, enabling more precise consumer sentiment tracking and market trend predictions. Additionally, it contributes to big data analytics by optimizing large-scale text data processing for improved decision-making in e-commerce platforms.
Background of the Invention
One of the most important aspects of developing social relationships since the beginning of human civilisation has been learning how to communicate effectively. Almost every segment of society today makes use of social media because of how effective it has become as a networking tool. Many social media platforms also serve as online marketplaces. As awareness of e-commerce grows rapidly, more and more people are opting to shop online. Customers can rate a wide range of experiences, products, and services on social media, with ratings ranging from positive to negative. Criticism is essential to a business's growth, since it allows services to be improved. This is where sentiment analysis is helpful. Sentiment analysis sheds light on consumer sentiment by examining the overall tone of product reviews. Three levels of sentiment analysis are commonly studied: document-level, sentence-level, and phrase-level.
Sentiment analysis refers to the computational study of people's opinions, sentiments, attitudes, and emotions. In natural language processing, sentiment analysis is where much of the innovation is happening, and data mining has likewise seen a great deal of it. The scope of this research has therefore broadened to include management and social science alongside traditional scientific disciplines. An ever-more-important tool, sentiment analysis is seeing a meteoric rise in use alongside microblogs, chat rooms, and Twitter; more opinions than ever before can be collected and analysed digitally. Opinions are at the heart of everything that humans do, which is why sentiment analysis tools are so popular in the for-profit and nonprofit worlds: they affect our behaviour significantly. Our opinions, perceptions, and choices are shaped by the views and values of those around us, which is why, when faced with a decision, people often seek the opinions of others. This is beneficial for companies and individuals alike. Amazon customers leave ratings and reviews for products. Reviews serve a variety of purposes depending on the product in question, but in general they allow companies to learn what customers like and dislike about their products and services and to make adjustments as needed.
Since the expansion of the internet speeds up the flow of information, it is influencing the business sector. Some buyers will go so far as to post negative reviews or social media comments after buying a product. Amazon.com is one of many such online marketplaces. An opinion engine is necessary for dealing with large volumes of text, and text mining is a tool utilised in sentiment analysis: it allows hidden patterns or information to be discovered by processing and analysing large datasets, and it can help process, organise, group, and analyse large amounts of unstructured content. Classification is among the many mining techniques (US8463595B1). Machine learning has become increasingly important in sentiment analysis during the past decade, and since deep learning methods were introduced and developed, the use of sentiment analysis has skyrocketed. It is hard to tell where in the review (feature) space trustworthy predictions can be expected because of the lack of explicitly defined confidence metrics for predictions made by such algorithms, a common problem in machine learning. Among confidence predictors, conformal prediction provides a framework that allows the use of highly specific instance-level confidence measures. Because most machine learning algorithms presume either exchangeability or, more commonly, that the data is IID (Independent and Identically Distributed), it is essential to remember that conformal prediction does not impose any further constraints on confidence predictors. Nearly 5.2 million reviews across four Amazon categories—beauty, books, electronics, and home—were sentiment-analysed using a word vector network. Along with density-based conformal prediction, this was done on three balanced datasets, one of which was the 50,000-review IMDB dataset (1:1).
Businesses and organizations are concentrating on public relations, campaigns, shoring up their weaknesses, and growing their clientele in response to the rising demand for SA. An organization greatly values customer feedback on its products and services (CN109271522B). Furthermore, political groups are interested in public opinion and media coverage. Understanding the tone of social media evaluations is currently SA's top priority. Healthcare, politics, entertainment, athletics, and industries dealing with harassment are among those that have started to embrace SA. Improved methods of natural language processing (NLP), data mining for predictive research, and contextual understanding of text are among the current research issues in SA. Support vector machines (SVMs) have long been used to solve a wide range of NLP problems. In recent times, numerous NLP applications have showcased state-of-the-art neural network (NN) approaches built on dense vector representations. Early on, deep learning NNs demonstrated impressive performance on pattern recognition and computer vision tasks. Innovative algorithms for NLP tasks, such as sentiment analysis, are part of this trend.
Qorich and El Ouazzani developed a CNN model that can identify positive and negative sentiment in review text. They also contrasted the word embedding representations of competing models with their proposed CNN model in an effort to identify the optimal one. The results demonstrate that various model designs can achieve satisfactory performance when tested on the Amazon reviews dataset. The results also demonstrate the importance of retaining stop words in sentiment analysis tasks: neglecting to include them can result in inaccurate sentiment predictions. A 2% gain in accuracy was observed in the CNN model that used stop words compared with the one that did not. They also showed that a random initialisation method performs better than supervised and embedded model vectors on large-scale datasets. Following word embedding representation training, the model can acquire more accurate features with reduced computing effort. Furthermore, they improved CNN accuracy to 90% on the Amazon reviews dataset, and their model surpassed the baseline ML and DL methods. Other researchers set out to find out which factors most affect Amazon's stock price by analysing social media comments made by customers. To find out how strongly Italian retailers' service attributes were linked to Amazon-related customer dissatisfaction, they used NLP-based methods to examine the content and sentiment of user comments taken from the retailers' Facebook pages over two years (2016-2018). When it comes to pricing, service, in-store staff, and after-sale assistance, consumers have a lot to say about how Amazon has affected consumer electronics stores. Comparing negative feedback on the Italian Amazon website with customer reviews on Facebook makes clear that customers are less satisfied with the service they receive from other companies because of Amazon's high standards.
They suggest further research to clarify Amazonification in terms of customer impatience and dissatisfaction more generally, going beyond the well-known factors of cost and logistics. Jadhav's intention was to sort mobile phone reviews into star ratings and determine how well the ratings reflected the reviews' actual sentiments. The data had to be cleaned and pre-processed before this could be done. To turn the text into numbers, the TF-IDF method and word embedding were employed. A variety of approaches, including Support Vector Machines, Logistic Regression, and Ensemble models, were applied and evaluated for accuracy. A variety of metrics are employed for assessment, including F1-score, recall, accuracy, and precision. When a balanced classifier is used, unigrams yield superior results. Elahi et al. presented a novel hybrid recommender system that can read reviews and glean emotions to inform recommendations. They used complex algorithms to find people who could contribute additional data, such as review sentiment, and let them know what to add. In a number of industries, including music, there was no significant relationship between ratings and user evaluation attitude. That being said, sentiment could be an extra measure of user input that sheds light on a previously unseen aspect of customer preferences. Their proposed hybrid recommender system takes into consideration both star ratings and review sentiment to assess its efficacy. This study showed that the proposed hybrid recommender system performed better than many baselines on two popular datasets, Amazon Digital Music and Games. The comparisons were based on two evaluation scenarios: one where user feedback was taken as ratings, and the other where user input was collected as review sentiments. A recommendation method based on sentiment analysis and matrix factorisation (SAMF) is provided by Liu and Zhao to assist in rating matrix recovery and suggestion.
The method fully mines the implicit information in reviews using deep learning methods and topic models. First, reviews (both user reviews and item reviews) undergo LDA (Latent Dirichlet Allocation) to build the user topic distribution. Both the user feature matrix and the item feature matrix are created using the topic likelihood as their basis. In the second phase, the user feature matrix and the item feature matrix are combined to create a user-item preference matrix. Third, the user-item rating matrix is constructed on the basis of the original rating matrix. Lastly, BERT (Bidirectional Encoder Representations from Transformers) is used to update and refine the user-item rating matrix: BERT quantifies the sentiment information in the reviews and combines it with the rating matrix. The final step uses the revised user-item rating matrix to provide rating predictions and Top-N suggestions. The proposed SAMF beats other conventional algorithms in terms of recommendation performance, according to trials run on Amazon datasets.
Summary of the Invention
One of the most significant themes in natural language processing research is sentiment analysis (SA), which is very important for understanding public opinion and performing accurate opinion-based evaluations. A wide variety of data types, including images, videos, audio, and text, is constantly flooding in due to the proliferation of online shopping and social media usage. Text, the most prevalent type of unstructured data, deserves particularly close attention. After realising sentiment analysis was vital to online life, the system's developers set out to enhance it. The suggested approach is the most computationally efficient way to achieve the same or better results with the greatest degree of assurance and the fewest resources used. Examining the effects of several preprocessing procedures on customer reviews, the study found that cleaning and normalising the data, eliminating punctuation and hash tags, transforming the text to lowercase, and tokenising were all beneficial. Here, Amazon customer reviews are analysed using optimised deep learning approaches. To begin, data retrieved from Amazon.com is cleaned through a series of pre-processing steps. Using a variety of word embedding techniques, the pre-processed texts are transformed into features. To improve review detection performance, the invention proposes a hierarchical dual-path backbone that gives the CNN global attention. It then provides a simple yet effective solution known as a bidirectional connection module, which enables two concurrent ways of communicating context information. During the concentrating phase, the model employs a multi-head attention block to generate scalable features without relying on complex and computationally costly modules.
Brief Description of Drawings
Figure 1: Enhanced Concept Map Generation Process
Detailed Description of the Invention
Data pre-processing is crucial for text data analysis. Text data can become more complex due to repetition and redundancy in tweets, blogs, reviews, and other textual sources. Examining the effects of several preprocessing procedures on customer reviews, the study found that cleaning and normalising the data, eliminating punctuation and hash tags, transforming the text to lowercase, and tokenising were all beneficial. Data normalisation relies on data pre-processing, a filtering operation. Normalising the data, tokenising words, removing stop words, padding, removing extra spaces, converting text to lowercase, and removing hash tags are all examples of data pre-processing tasks. Considerable work went into getting the data into the right format for this task.
Punctuated statements make up about 40% to 50% of all written content, yet punctuation has no bearing on the outcome of sentiment analysis models. These punctuation marks are therefore removed, as they contribute nothing to sentiment analysis; all punctuation was stripped from the input data. Reviews written by customers often violate basic grammar norms; for instance, the text may mix uppercase and lowercase letters. Since case-sensitive approaches are widely used in the study, mixed casing makes it difficult for the classifier to determine the polarity of the text. This difficulty disappears once the full text is brought into a standard form: every uppercase letter is converted to lowercase while all other characters keep their original form. For example, the phrase "myself working on natural language processing engineer in India" is made lowercase simply by converting it in this way.
Tokenisation divides text streams into smaller units, such as words or phrases, for analysis; a token is a fragment of text. The aim of this step is to simplify complex textual material, and data mining becomes less of a hassle when tokens are used. Tokenisation is essential for lexical evaluation, has applications in sentiment analysis and semantics, and is a crucial step in NLP pipelines. The study cannot proceed with model generation unless the text is appropriately cleaned. Tokenisation has two variants: word tokenisation and sentence tokenisation. Two possible uses for the resulting structured data are word counts and word frequency analysis. The text data is parsed into individual words, so that small packets of words or symbols are generated from a massive and intricate record. Duplicate and filler words are common in text files, so removing stop words is crucial: such words carry little information and merely inflate the text, and a great deal of writing makes use of them. The stop words have been removed from this data set, which decreases the textual content while increasing system efficiency. Links likewise no longer carry any relevance in the databases; their only practical use is their functionality. Because the study relies exclusively on tweets, comments, and reviews to determine the text's polarity, deleting links from the databases is critically important.
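The cleaning steps described above (lowercasing, link and hash-tag removal, punctuation stripping, word-level tokenisation, and stop-word filtering) can be sketched as follows. This is a minimal illustration, not the invention's actual pipeline; the stop-word list is a small illustrative subset of a full English list.

```python
import re

# Small illustrative stop-word list; a real pipeline would use a full list
# (e.g. the English stop words shipped with NLTK).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in"}

def preprocess(review: str) -> list[str]:
    """Clean one raw review and return its remaining tokens."""
    text = review.lower()                      # normalise case
    text = re.sub(r"#\w+", " ", text)          # strip hash tags
    text = re.sub(r"https?://\S+", " ", text)  # strip links
    text = re.sub(r"[^\w\s]", " ", text)       # strip punctuation
    tokens = text.split()                      # word-level tokenisation
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The battery is GREAT!!! #happy https://amzn.to/x")
# hash tag, link, punctuation, and stop words are all removed
```

The order matters: hash tags and links are removed before punctuation stripping, otherwise `#` and `://` would be deleted first and the remaining fragments would survive as spurious tokens.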
Additionally, hash tagging is currently all the rage, and customer feedback often makes use of hash tags. Hash tags are space-intensive, are useless for sentiment analysis, and only raise the classifier's uncertainty. For this reason, doing away with hash tags is crucial: eliminating them from datasets makes the training data clearer and more succinct. Since consumer review databases contain both extremely short and extremely long reviews, the classifier struggles with sentiment analysis; in a CNN this is addressed with padding. To ensure that all customer reviews are the same length, a few zeroes are appended to the end of each input review. The objective of POS (part-of-speech) tagging is to categorise words in the training data according to their grammatical form; this step takes the meaning of words into account. Applying POS labels is no picnic, and despite its usefulness for many other problems, POS tagging alone cannot resolve the serious discovery challenge in opinion analysis. For the purpose of writing a product review, this process compiled multiple perspectives and factors. A modified POS tagger is used to identify specific attributes, and a POS tagging technique driven by grammatical relations is made available to users in the review; the POS tags identified the right major issues. In this POS process, the observable output is created using a hidden Markov model (HMM), in which the tags are hidden.
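Zero-padding integer-encoded reviews to a common length, as described above, can be sketched as follows; the maximum length of 6 is an illustrative choice, not a parameter taken from the invention.

```python
def pad_sequences(seqs, max_len, pad_value=0):
    """Right-pad (or truncate) integer-encoded reviews to a fixed length."""
    padded = []
    for seq in seqs:
        seq = seq[:max_len]                                 # truncate long reviews
        padded.append(seq + [pad_value] * (max_len - len(seq)))  # pad short ones
    return padded

# A short and a long review, both forced to length 6.
batch = pad_sequences([[4, 7, 2], [9, 1, 3, 5, 8, 6, 2]], max_len=6)
```

Fixing the sequence length this way is what lets a CNN treat every review as an input of the same shape.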
In POS tagging, the objective is to find the mathematically optimal tag sequence (C), i.e. the sequence of hidden tags that is most probable given the observed words.
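Under an HMM, the most probable tag sequence C can be found with the standard Viterbi algorithm. The sketch below uses a toy two-tag model with invented probabilities, purely to illustrate the decoding step; the tag set, vocabulary, and numbers are not taken from the invention.

```python
import math

# Toy HMM: two tags and illustrative start/transition/emission probabilities.
TAGS = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit = {"NOUN": {"reviews": 0.5, "ship": 0.3, "fast": 0.2},
        "VERB": {"reviews": 0.1, "ship": 0.6, "fast": 0.3}}

def viterbi(words):
    """Return the most probable tag sequence for `words` (log-space Viterbi)."""
    # v[t] = (log-prob of best path ending in tag t, that path)
    v = {t: (math.log(start[t] * emit[t][words[0]]), [t]) for t in TAGS}
    for w in words[1:]:
        v = {t: max(((lp + math.log(trans[p][t] * emit[t][w]), path + [t])
                     for p, (lp, path) in v.items()),
                    key=lambda x: x[0])
             for t in TAGS}
    return max(v.values(), key=lambda x: x[0])[1]

best = viterbi(["reviews", "ship", "fast"])  # one tag per observed word
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would otherwise cause on long reviews.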
The model's structure is shown in Figure 1. The pipeline consists of a backbone with Transformer improvements, a multi-head attention decoder, a post-processing algorithm, and so on. Integrating self-attention with convolution, the invention employs a dual-path technique. It attempts to collect information within and across initial receptive fields, handling feature output through a feed-forward network (FFN). To derive multi-level features, the input data is first routed through an FPN assembly, following the usual CNN backbone paradigm. There are four levels in this structure, with down-sampling rates of {4, 8, 16, 32} respectively. In step two, all of the retrieved features are brought to the same scale using multi-head attention. The features are then used to construct two maps: one for likelihood and one for thresholds.
Since CNN and Transformer both have limitations and complementary strengths, the research suggests that they can work together to achieve better results. The inductive bias of convolutional layers allows them to model local relations. As illustrated in Figure 1, a 4×4 window is used in accordance with previous work. Applying depth-wise convolutions to the CNN path maximises efficiency. Given the architectural differences between the two paths, the channel count must also be adjusted to make each path compatible with the merged branch, enabling their seamless integration. After the channel correction, the two outputs are normalised using separate layers and then mixed. Strengthened feature representation learning is achieved by optimising both parallel branches simultaneously during training, so that features are interwoven across the two branches. A sequential Feed Forward Network (FFN) then fuses the relations learnt in both pathways, producing the final output features.
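The channel alignment, per-branch normalisation, and FFN fusion described above can be sketched schematically in NumPy. All shapes, widths, and weights below are illustrative (random) placeholders, not the invention's actual layers; the point is only the data flow: project each branch to the merged width, normalise each separately, sum, then fuse with an FFN.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    """Normalise each position's feature vector, as each branch is normalised before merging."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# Illustrative shapes: 10 positions; the conv branch emits 32 channels,
# the attention branch 48; the merged branch works at width 64.
conv_out = rng.standard_normal((10, 32))
attn_out = rng.standard_normal((10, 48))

# 1x1 projections align each branch's channel count with the merged branch.
w_conv = rng.standard_normal((32, 64)) * 0.1
w_attn = rng.standard_normal((48, 64)) * 0.1
merged = layer_norm(conv_out @ w_conv) + layer_norm(attn_out @ w_attn)

# A sequential FFN (expand, ReLU, contract) fuses the relations from both paths.
w1 = rng.standard_normal((64, 128)) * 0.1
w2 = rng.standard_normal((128, 64)) * 0.1
fused = np.maximum(merged @ w1, 0.0) @ w2
```

Because each branch is normalised independently before the sum, neither path's activation scale can dominate the merged representation.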
Not only are the output features of the dual-path branches kept separate, but the architecture also lets these features interact with each other in both directions. By utilising the complementary signals offered by the parallel branches, representation learning is improved in both of them. Two paths exist for extracting channel/spatial context: one relies on attention and the other on channel/spatial interaction. Within a single block, the bidirectional interactions are composed of channel and spatial interactions. First, the invention makes use of a structure similar to the SE (squeeze-and-excitation) layer, meaning that data is passed from the convolutional branch to the other branch via the channel interaction. A single GAP (global average pooling) layer, two normalisation/activation-connected layers, and a dynamic weighting mechanism for the different channels make up this structure. The information from the branch is used to reduce the channel count to 1 through two successive 1×1 layers that connect BN and GELU, and the spatial weights are distributed using a sigmoid layer. Most semantic segmentation algorithms incorporate input from various scales by summing or cascading; however, such a simplistic fusion paradigm pays only scant attention and inevitably omits several vital components. Consequently, the research uses a multi-head attention decoder to refocus on the instances while preserving the integrity of the spatially hidden text area.
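The SE-like channel interaction (global average pooling, two fully connected layers, and sigmoid gating over channels) can be sketched as below. This is a generic squeeze-and-excitation sketch in NumPy with random weights and illustrative sizes, standing in for the invention's normalisation/activation-connected layers.

```python
import numpy as np

rng = np.random.default_rng(1)

def se_channel_interaction(feat, reduction=4):
    """SE-style gating: reweight the channels of `feat` (positions x channels)."""
    c = feat.shape[-1]
    w1 = rng.standard_normal((c, c // reduction)) * 0.1   # squeeze FC
    w2 = rng.standard_normal((c // reduction, c)) * 0.1   # excite FC
    pooled = feat.mean(axis=0)                    # GAP over the spatial axis
    hidden = np.maximum(pooled @ w1, 0.0)         # first FC + activation
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid channel weights in (0, 1)
    return feat * gate                            # dynamically reweight each channel

out = se_channel_interaction(rng.standard_normal((10, 16)))
```

The gate is computed once per channel from globally pooled statistics, so information from one branch can modulate every spatial position of the other at negligible cost.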
Applying appropriate post-processing techniques to the acquired features yields more expressive text areas. Before the features can be parsed into text sections, the invention must binarise them and create labels for them. The probability map's label creation was inspired by PSENet. Results from post-processing procedures are often shown as a cluster of vertices in a polygonal shape. Typically, the number of vertices (n) is determined by the labelling rule of each dataset, and the review findings (S) are represented in each dataset. To efficiently generate an offset for shrinking the starting polygon, the Vatti clipping approach was adopted as a solution to the problem of defining the bounds of surrounding texts. The offset can be computed mathematically as D = Area(S) × (1 − r²) / Perimeter(S), where Area() gives the area of the polygon, Perimeter() gives its perimeter, and the shrink ratio r is fixed analytically to 0.4. By applying graphics-related operations to the reduced polygons, the kernel for each text segment can easily be generated from the original ground truth. The first step in producing the binary map is binarising the probability map with a predefined threshold, usually 0.2 in this case. In the second step, the invention divides the text pixels into smaller portions using the binary map. The reduced areas are then given an offset D', which enlarges the final text prediction outcomes.

Claims: The scope of the invention is defined by the following claims:
1. A design for analyzing public sentiment using the gaining-sharing knowledge (GSK) optimization technique with the Double Path Transformer Network approach, comprising the steps of:
a) performing preprocessing on the text data, such as noise removal, stemming, removal of hash tags and punctuation, conversion of the text to lowercase, and tokenization, thereby generating local and global information for the categorization of reviews;
b) improving efficiency in the CNN path by applying depth-wise convolutions, and adjusting the number of channels so they align with the merged branch, allowing the two paths to be flawlessly merged;
c) designing a new optimization technique for hyperparameter tuning that enhances the accuracy of sentiment analysis.
2. According to claim 1, the Double Path Transformer Network (DPTN) is designed to generate local and global information for the categorization of reviews.
3. According to claim 1, a parallel-design Transformer-Enhanced Backbone is used that merges a CNN with a strong self-attention mechanism to improve the interaction between the two paths.
4. According to claim 1, the model's hyperparameters are optimized using the gaining-sharing knowledge optimization (GSK) method, which mimics the process of gaining and sharing knowledge over the course of a person's lifetime and boosts the accuracy of its categorization.
| # | Name | Date |
|---|---|---|
| 1 | 202541060945-REQUEST FOR EARLY PUBLICATION(FORM-9) [26-06-2025(online)].pdf | 2025-06-26 |
| 2 | 202541060945-FORM-9 [26-06-2025(online)].pdf | 2025-06-26 |
| 3 | 202541060945-FORM FOR STARTUP [26-06-2025(online)].pdf | 2025-06-26 |
| 4 | 202541060945-FORM FOR SMALL ENTITY(FORM-28) [26-06-2025(online)].pdf | 2025-06-26 |
| 5 | 202541060945-FORM 1 [26-06-2025(online)].pdf | 2025-06-26 |
| 6 | 202541060945-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [26-06-2025(online)].pdf | 2025-06-26 |
| 7 | 202541060945-EVIDENCE FOR REGISTRATION UNDER SSI [26-06-2025(online)].pdf | 2025-06-26 |
| 8 | 202541060945-EDUCATIONAL INSTITUTION(S) [26-06-2025(online)].pdf | 2025-06-26 |
| 9 | 202541060945-DRAWINGS [26-06-2025(online)].pdf | 2025-06-26 |
| 10 | 202541060945-COMPLETE SPECIFICATION [26-06-2025(online)].pdf | 2025-06-26 |