Perplexity and Log-Likelihood Based Approach for Text Classification Using Causal Language Models

Abstract: State-of-the-art techniques using moderate-sized Language Models (LMs) for text classification need fine-tuning or in-context learning. A method and system providing two-step classification using a moderate-sized (#params = 2.7B) causal LM (Gen AI) is disclosed. First, for a text instance to be classified, a set of perplexity and log-likelihood based features is obtained from an LM. A light-weight classifier is then trained in the second step to predict the final label. The system enables a new way of exploiting the available labelled instances, in addition to existing ways such as fine-tuning LMs or in-context learning. It neither needs any parameter updates in LMs, as fine-tuning does, nor is it restricted by the number of training examples that can be provided in the prompt, as in-context learning is. The key advantages of the disclosed system are explainability through the most suitable key phrases and its applicability in resource-poor environments.

Patent Information

Application # 202321086642
Filing Date
18 December 2023
Publication Number
25/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India

Inventors

1. PAWAR, Sachin Sharad
Tata Consultancy Services Limited, Sahyadri Park 2 office, Rajiv Gandhi Infotech Park, Hinjewadi Phase-III, Pune - 411057, Maharashtra, India
2. RAMRAKHIYANI, Nitin Vijaykumar
Tata Consultancy Services Limited, Plot No.41, Phase-1, IT/ITES-SEZ, Garima Park, GND, Gandhinagar - 382009, Gujarat, India
3. SINHA, Anubhav
Tata Consultancy Services Limited, Gopalan Global Axis Block-H, Rd Number 9, Whitefield, KIADB Export Promotion Industrial Area, Bangalore - 560066, Karnataka, India
4. APTE, Manoj Madhav
Tata Consultancy Services Limited, Sahyadri Park 2 office, Rajiv Gandhi Infotech Park, Hinjewadi Phase-III, Pune - 411057, Maharashtra, India
5. PALSHIKAR, Girish Keshav
Tata Consultancy Services Limited, Sahyadri Park 2 office, Rajiv Gandhi Infotech Park, Hinjewadi Phase-III, Pune - 411057, Maharashtra, India

Specification

Description: FORM 2, THE PATENTS ACT, 1970 (39 of 1970) & THE PATENT RULES, 2003, COMPLETE SPECIFICATION (See Section 10 and Rule 13)

Title of invention: PERPLEXITY AND LOG-LIKELIHOOD BASED APPROACH FOR TEXT CLASSIFICATION USING CAUSAL LANGUAGE MODELS

Applicant: Tata Consultancy Services Limited, a company incorporated in India under the Companies Act, 1956, having address: Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD

The embodiments herein generally relate to the field of Natural Language Processing and, more particularly, to a method and system for a perplexity and log-likelihood based approach for text classification using causal Language Models (LMs).

BACKGROUND

In recent years, autoregressive or causal Language Models (LMs) like Generative Pre-trained Transformers (GPT)-3 and GPT-Neo have been successful in a variety of natural language processing tasks such as summarization, machine translation, question answering, etc. Recently, there have been attempts to use such LMs for text classification in a zero-shot or few-shot manner. In these approaches, there are several challenges in using moderate-sized LMs like GPT-Neo-2.7B for text classification in both zero-shot as well as few-shot settings.

In a zero-shot setting, getting the LM to generate an output containing the expected class labels is challenging. For example, in the case of the SST-2 dataset for sentiment prediction, in spite of providing a specific instruction in the prompt, the generated text contained the expected Positive and Negative labels for only around 10% of test instances. Most cases resulted in the LM generating some random text, or text containing words like "mess" or "brilliant" from which the actual labels need to be inferred in a non-trivial way, as can be seen in Table 1.

Table 1:

PROMPT: This is an overall sentiment classifier for movie reviews. Classify the overall SENTIMENT of the INPUT as Positive or Negative.
INPUT: If this movie were a book, it would be a page turner, you can't wait to see what happens next.
SENTIMENT: This movie is a mess (generated text)

PROMPT: This is an overall sentiment classifier for movie reviews. A review with Positive SENTIMENT finds the movie to be great, good, encouraging, brilliant, excellent, accurate, realistic, engaging, funny, or exciting. A review with Negative SENTIMENT finds the movie to be terrible, bad, unrealistic, frustrating, boring, forgettable, predictable, thoughtless, appalling, or incomprehensible. Classify the overall SENTIMENT of the INPUT as Positive or Negative.
INPUT: Together, Tok and O orchestrate a buoyant, darkly funny dance of death.
SENTIMENT: Tok and O are a couple of misfits who... (generated text)

In a few-shot setting, the generated output conforms to the expected format in most cases. However, due to the limited context window of the LM, a large number of training instances cannot be provided in the prompt. This limits the ability of the LM to exploit the available labelled examples. Another way of exploiting training examples is through fine-tuning the LM. However, this requires specialized hardware resources (like GPUs with significant RAM) and time for fine-tuning. Very large LMs like GPT-3 may not face these challenges, but their usage through Application Programming Interfaces (APIs) entails sharing the data to be classified, which may not be desirable for private and confidential data.
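For illustration only, the generate-then-parse zero-shot prompting whose shortcomings are described above can be reproduced with the Hugging Face transformers library and GPT-Neo-2.7B, as in the minimal sketch below. The prompt text follows Table 1; the decoding settings and the brittle substring check for the labels are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch of naive generate-then-parse zero-shot classification with a
# moderate-sized causal LM; decoding settings and label parsing are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-2.7B")

prompt = (
    "This is an overall sentiment classifier for movie reviews. "
    "Classify the overall SENTIMENT of the INPUT as Positive or Negative.\n"
    "INPUT: If this movie were a book, it would be a page turner, "
    "you can't wait to see what happens next.\n"
    "SENTIMENT:"
)

out = generator(prompt, max_new_tokens=10, do_sample=False)[0]["generated_text"]
continuation = out[len(prompt):].strip()

# As Table 1 shows, the continuation often contains neither expected label,
# so this brittle substring check frequently yields no usable prediction.
label = next((l for l in ("Positive", "Negative") if l in continuation), None)
print(repr(continuation), "->", label)
```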
Moderate-sized LMs such as GPT-Neo-2.7B can be deployed in-house with very limited hardware. Thus, they can be useful resources for the text classification requirements of organizations where data privacy is critical. However, achieving text classification with the desired accuracy using moderate-sized LMs still remains an unaddressed technical challenge, considering the technical limitations of such LMs mentioned above.

SUMMARY

Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for text classification is provided. The method includes receiving a text, a predefined number of class labels, a set of key phrases associated with each of the predefined class labels, and a connector sentence, wherein the text is to be classified into one or more class labels from among the predefined class labels. Further, the method includes generating a plurality of label-specific augmentations for the text based on each key phrase among the set of key phrases associated with each of the predefined class labels, and the connector sentence. Further, the method includes deriving, via a Language Model (LM) executed by one or more hardware processors, perplexity based key phrase level features and log-likelihood based key phrase level features for each of the plurality of label-specific augmentations. Each of the perplexity based key phrase level features captures a reduction in perplexity of a key phrase from the set of key phrases, wherein the reduction in perplexity is the ratio of the conditional perplexity of the key phrase given the text to be classified, to the perplexity of the key phrase. Each of the log-likelihood based key phrase level features captures an increase in log-likelihood of the key phrase from the set of key phrases, wherein the increase in log-likelihood is the difference between the conditional log-likelihood of the key phrase given the text to be classified, and the log-likelihood of the key phrase. Furthermore, the method includes determining i) a class level perplexity based feature for each of the predefined class labels as the minimum of the perplexity based key phrase level features associated with the corresponding class label, and ii) a class level log-likelihood based feature for each of the predefined class labels as the maximum of the log-likelihood based key phrase level features associated with the corresponding class label. Further, the method includes predicting, for a zero-shot classification, the one or more class labels for the text based on one of: i) the value of the perplexity based class level features lying below a minimum threshold value; and ii) the value of the log-likelihood based class level features lying above a maximum threshold value. Furthermore, the method includes enhancing the accuracy of prediction of text classification of the text into one or more class labels using a pretrained supervised machine learning classifier that utilizes the perplexity based key phrase level features, the log-likelihood based key phrase level features, the class level perplexity based features, and the class level log-likelihood based features. The supervised machine learning classifier is trained on the perplexity based key phrase level features, the log-likelihood based key phrase level features, the class level perplexity based features, and the class level log-likelihood based features obtained for training data.
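For concreteness, the two-step feature construction and prediction summarized above can be sketched as follows. The helper score_phrase, the example key phrases, the connector wording, and the choice of logistic regression as the light-weight classifier are all illustrative assumptions (an LM-based implementation of score_phrase is sketched later, alongside the perplexity definition); the zero-shot rule shown uses a simple relative comparison of the class-level features, with the threshold-based variant noted in a comment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

CONNECTOR = "This text is about"  # connector sentence (illustrative wording)
KEY_PHRASES = {                   # hypothetical key phrases for an SST-2-style task
    "Positive": ["a great movie", "an excellent film"],
    "Negative": ["a terrible movie", "a boring film"],
}

def features(text, score_phrase):
    """Key-phrase-level and class-level PPL/LL features for one text.

    score_phrase(text, connector, phrase) is assumed to return
    (conditional PPL, PPL, conditional LL, LL) of the key phrase.
    """
    feats, class_ppl, class_ll = [], {}, {}
    for label, phrases in KEY_PHRASES.items():
        ppl_feats, ll_feats = [], []
        for phrase in phrases:
            cond_ppl, ppl, cond_ll, ll = score_phrase(text, CONNECTOR, phrase)
            ppl_feats.append(cond_ppl / ppl)  # reduction in perplexity (ratio)
            ll_feats.append(cond_ll - ll)     # increase in log-likelihood (difference)
        class_ppl[label] = min(ppl_feats)     # class-level PPL feature: minimum
        class_ll[label] = max(ll_feats)       # class-level LL feature: maximum
        feats += ppl_feats + ll_feats
    feats += list(class_ppl.values()) + list(class_ll.values())
    return np.array(feats), class_ppl, class_ll

def zero_shot_label(class_ppl, class_ll, use_ppl=True):
    """Step 1 only: relative comparison of class-level features.

    (A threshold-based variant, as also described above, would instead test
    whether a class-level feature falls below/above a chosen threshold.)
    """
    if use_ppl:
        return min(class_ppl, key=class_ppl.get)  # largest perplexity reduction
    return max(class_ll, key=class_ll.get)        # largest log-likelihood gain

def train_classifier(texts, labels, score_phrase):
    """Step 2: fit a light-weight classifier on the feature vectors."""
    X = np.stack([features(t, score_phrase)[0] for t in texts])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```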
In another aspect, a system for text classification is provided. The system comprises a memory storing instructions; one or more Input/Output (I/O) interfaces; and one or more hardware processors coupled to the memory via the one or more I/O interfaces, wherein the one or more hardware processors are configured by the instructions to receive a text, a predefined number of class labels, a set of key phrases associated with each of the predefined class labels, and a connector sentence, wherein the text is to be classified into one or more class labels from among the predefined class labels. Further, the system is configured to generate a plurality of label-specific augmentations for the text based on each key phrase among the set of key phrases associated with each of the predefined class labels, and the connector sentence. Further, the system is configured to derive, via a Language Model (LM) executed by the one or more hardware processors, perplexity based key phrase level features and log-likelihood based key phrase level features for each of the plurality of label-specific augmentations. Each of the perplexity based key phrase level features captures a reduction in perplexity of a key phrase from the set of key phrases, wherein the reduction in perplexity is the ratio of the conditional perplexity of the key phrase given the text to be classified, to the perplexity of the key phrase. Each of the log-likelihood based key phrase level features captures an increase in log-likelihood of the key phrase from the set of key phrases, wherein the increase in log-likelihood is the difference between the conditional log-likelihood of the key phrase given the text to be classified, and the log-likelihood of the key phrase. Furthermore, the system is configured to determine i) a class level perplexity based feature for each of the predefined class labels as the minimum of the perplexity based key phrase level features associated with the corresponding class label, and ii) a class level log-likelihood based feature for each of the predefined class labels as the maximum of the log-likelihood based key phrase level features associated with the corresponding class label. Further, the system is configured to predict, for a zero-shot classification, the one or more class labels for the text based on one of: i) the value of the perplexity based class level features lying below a minimum threshold value; and ii) the value of the log-likelihood based class level features lying above a maximum threshold value. Furthermore, the system is configured to enhance the accuracy of prediction of text classification of the text into one or more class labels using a pretrained supervised machine learning classifier that utilizes the perplexity based key phrase level features, the log-likelihood based key phrase level features, the class level perplexity based features, and the class level log-likelihood based features. The supervised machine learning classifier is trained on the same features obtained for training data.

In yet another aspect, one or more non-transitory machine-readable information storage mediums are provided, comprising one or more instructions which, when executed by one or more hardware processors, cause a method for text classification.
The method includes receiving a text, a predefined number of class labels, a set of key phrases associated with each of the predefined class labels, and a connector sentence, wherein the text is to be classified into one or more class labels from among the predefined class labels. Further, the method includes generating a plurality of label-specific augmentations for the text based on each key phrase among the set of key phrases associated with each of the predefined class labels, and the connector sentence. Further, the method includes deriving, via a Language Model (LM) executed by the one or more hardware processors, perplexity based key phrase level features and log-likelihood based key phrase level features for each of the plurality of label-specific augmentations. Each of the perplexity based key phrase level features captures a reduction in perplexity of a key phrase from the set of key phrases, wherein the reduction in perplexity is the ratio of the conditional perplexity of the key phrase given the text to be classified, to the perplexity of the key phrase. Each of the log-likelihood based key phrase level features captures an increase in log-likelihood of the key phrase from the set of key phrases, wherein the increase in log-likelihood is the difference between the conditional log-likelihood of the key phrase given the text to be classified, and the log-likelihood of the key phrase. Furthermore, the method includes determining i) a class level perplexity based feature for each of the predefined class labels as the minimum of the perplexity based key phrase level features associated with the corresponding class label, and ii) a class level log-likelihood based feature for each of the predefined class labels as the maximum of the log-likelihood based key phrase level features associated with the corresponding class label. Further, the method includes predicting, for a zero-shot classification, the one or more class labels for the text based on one of: i) the value of the perplexity based class level features lying below a minimum threshold value; and ii) the value of the log-likelihood based class level features lying above a maximum threshold value. Furthermore, the method includes enhancing the accuracy of prediction of text classification of the text into one or more class labels using a pretrained supervised machine learning classifier that utilizes the perplexity based key phrase level features, the log-likelihood based key phrase level features, the class level perplexity based features, and the class level log-likelihood based features. The supervised machine learning classifier is trained on the same features obtained for training data.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:

FIG. 1A is a functional block diagram of a system for the perplexity and log-likelihood based approach for text classification using causal Language Models (LMs), in accordance with some embodiments of the present disclosure.

FIG. 1B illustrates an architectural overview of the system of FIG.
1A, in accordance with some embodiments of the present disclosure.

FIGS. 2A and 2B (collectively referred to as FIG. 2) are a flow diagram illustrating a method for the perplexity and log-likelihood based approach for text classification using causal LMs, using the system depicted in FIGS. 1A and 1B, in accordance with some embodiments of the present disclosure.

FIGS. 3 through 5 are graphical illustrations of a comparative analysis of the system of FIG. 1 with respect to state-of-the-art (SoA) approaches for text classification, in accordance with some embodiments of the present disclosure.

It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

DETAILED DESCRIPTION OF EMBODIMENTS

Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of the disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.

While Language Models (LMs) enhance performance across various Natural Language Processing (NLP) tasks, prior research has revealed several challenges in applying them to text classification, such as designing appropriate prompts in a zero-shot setting, limited input prompt length when using in-context learning, and costly as well as time-consuming fine-tuning. Given these constraints, there is a line of research which explores ways of using moderate-sized LMs for text classification. One of the recent prominent works in this area is by Min et al. (2022). They introduce "noisy channel" as well as "direct" methods which compute the conditional probability of the input text given the label, or vice versa, for few-shot text classification through in-context learning and prompt tuning. Another work is by Estienne (2023), wherein the authors propose to calibrate the output probabilities of an LM through prior adaptation to perform text classification tasks. They propose two variations of their approach: unsupervised (UCPA), where no labelled data is needed, and semi-unsupervised (SUCPA), where some training examples (600) are used for prior adaptation. Both Min et al. (2022) and Estienne (2023) are considered baseline approaches for the performance of the method and system disclosed herein, as they use moderate-sized LMs such as Generative Pre-trained Transformers (GPT2-XL). The method and system disclosed herein is partially based on the similar approach of Min et al. (2022) of computing conditional perplexity, but there are several key differences, such as (i) computing multiple features (perplexity (PPL) and log-likelihood (LL)) using domain-knowledge based key phrases, (ii) no limitation on the number of training examples, and (iii) learning an ML classifier based on these features.
Further, the technical limitations of moderate-sized causal/autoregressive Language Models (LMs) mentioned in the background section are addressed by embodiments of the present disclosure. The central idea relied on is that generating new text using LMs is not absolutely essential for text classification, as it is in the case of other tasks such as summarization or machine translation, because the final goal is simply to discriminate among a finite set of class labels.

Embodiments of the method and system disclosed herein provide a perplexity and log-likelihood based approach for text classification using causal or autoregressive Language Models (LMs). The method discloses a two-step technique for text classification. In the first step, for any text X to be classified, a set of feature values is elicited from the LM based on the perplexity and log-likelihood of certain label-specific augmentations of X. These augmentations are of the form "X. This text is about <key phrase>.", where only a set of key phrases associated with each class label is required. In a zero-shot setting, only this first step is required, and a class label is predicted by a simple relative comparison of these feature values. In a supervised setting, where labelled training instances are available, the second step is needed to train a light-weight supervised machine learning (ML) classifier using the feature values obtained for the training instances. The trained classifier can then be used to predict the class label for any new instance to be classified. Even though the LMs mostly discussed herein are moderate-sized (#parameters = 2.7B), open-source autoregressive language models, it can be understood that the method is equally applicable to Large LMs (LLMs). The system and method disclosed herein attempts to improve the accuracy of moderate-sized LMs, using the disclosed technique, with respect to standard zero-shot/few-shot prompting techniques using these LMs.

Described below is the well-known concept of perplexity and the way it has been used by the method disclosed herein. Also explained is the log-likelihood function and the manner in which both perplexity and log-likelihood are used for text classification using causal LMs.

Perplexity: This is used in the art as a metric to evaluate language models. Intuitively, a better model of a text is one which assigns a higher probability to a word that actually occurs. However, in the method disclosed herein, perplexity is used for a different purpose: judging the plausibility of a text fragment using an autoregressive or causal LM, and comparing multiple such text fragments to decide which one is the most plausible. Here, plausibility of a text means that it is seemingly more reasonable or probable. Consider a text fragment X = [w_1, w_2, ..., w_n], which consists of n tokens. The perplexity of X as computed by an LM (M) is:

$$\mathrm{PPL}_M(X) = \left( \prod_{i=1}^{n} \frac{1}{P_M(w_i \mid w_1, \ldots, w_{i-1})} \right)^{1/n}$$
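Under the definition above, the perplexity and log-likelihood of a key phrase, both unconditioned and conditioned on the text to be classified, can be computed with a causal LM as in the following minimal sketch using Hugging Face transformers and GPT-Neo-2.7B. The tokenization details (the leading space before the phrase, the end-of-sequence token used as a stand-in context for the unconditioned case) are illustrative assumptions; the function returns the tuple assumed by the earlier feature-construction sketch.

```python
# Minimal sketch, under the assumptions noted above, of scoring a key phrase
# with and without the text to be classified as context, using GPT-Neo-2.7B.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
model.eval()

@torch.no_grad()
def _nll_of_span(prefix_ids, span_ids):
    """Total negative log-likelihood of span_ids when they follow prefix_ids."""
    input_ids = torch.tensor([prefix_ids + span_ids])
    logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # row i predicts token i+1
    start = len(prefix_ids) - 1  # first row that predicts a span token
    targets = torch.tensor(span_ids).unsqueeze(1)
    token_lls = log_probs[start:start + len(span_ids)].gather(1, targets)
    return -token_lls.sum().item()

def score_phrase(text, connector, phrase):
    """Return (conditional PPL, PPL, conditional LL, LL) of the key phrase.

    PPL is exp of the mean token NLL, which equals the n-th-root-of-product
    form of PPL_M defined above; LL is the (negated) total NLL.
    """
    phrase_ids = tokenizer.encode(" " + phrase)           # leading space: illustrative
    prefix_ids = tokenizer.encode(f"{text} {connector}")  # "X. This text is about"
    bos_ids = [tokenizer.eos_token_id]                    # stand-in empty context
    nll_cond = _nll_of_span(prefix_ids, phrase_ids)
    nll_uncond = _nll_of_span(bos_ids, phrase_ids)
    n = len(phrase_ids)
    return (math.exp(nll_cond / n), math.exp(nll_uncond / n),
            -nll_cond, -nll_uncond)
```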

Documents

Application Documents

# Name Date
1 202321086642-STATEMENT OF UNDERTAKING (FORM 3) [18-12-2023(online)].pdf 2023-12-18
2 202321086642-REQUEST FOR EXAMINATION (FORM-18) [18-12-2023(online)].pdf 2023-12-18
3 202321086642-FORM 18 [18-12-2023(online)].pdf 2023-12-18
4 202321086642-FORM 1 [18-12-2023(online)].pdf 2023-12-18
5 202321086642-FIGURE OF ABSTRACT [18-12-2023(online)].pdf 2023-12-18
6 202321086642-DRAWINGS [18-12-2023(online)].pdf 2023-12-18
7 202321086642-DECLARATION OF INVENTORSHIP (FORM 5) [18-12-2023(online)].pdf 2023-12-18
8 202321086642-COMPLETE SPECIFICATION [18-12-2023(online)].pdf 2023-12-18
9 202321086642-FORM-26 [17-01-2024(online)].pdf 2024-01-17
10 Abstract1.jpg 2024-02-28
11 202321086642-Proof of Right [23-05-2024(online)].pdf 2024-05-23
12 202321086642-Request Letter-Correspondence [16-12-2024(online)].pdf 2024-12-16
13 202321086642-Power of Attorney [16-12-2024(online)].pdf 2024-12-16
14 202321086642-Form 1 (Submitted on date of filing) [16-12-2024(online)].pdf 2024-12-16
15 202321086642-Covering Letter [16-12-2024(online)].pdf 2024-12-16
16 202321086642-CERTIFIED COPIES TRANSMISSION TO IB [16-12-2024(online)].pdf 2024-12-16
17 202321086642-FORM 3 [23-01-2025(online)].pdf 2025-01-23