Abstract: A CONVERSATIONAL SYSTEM WITH ENTITY BASED INTENT CLASSIFICATION AND METHOD FOR OPERATING THE SAME Abstract The conversational system 100 comprises an input means 102, such as at least one microphone, to receive a query from a user 124, an output means 112, such as at least one speaker/display, to provide a response to the query, and a controller 110 to preprocess the query through a Natural Language Understanding (NLU) module 104, process an extracted entity and classified intent through a core module 108, and prepare a response through a Natural Language Generation (NLG) module 106. The NLU module 104 comprises an entity extractor 114 and an intent classifier 116 for preprocessing, characterized in that an output from the entity extractor 114 is given as input to the intent classifier 116. The conversational system 100 addresses the problem of entity-agnostic intent classification at its core, without majorly altering the existing architecture of conversational systems. This enables the use of proprietary, existing or third-party intent classifiers 116 and entity extractors 114. Figure 1
Description: Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.
Field of the invention:
[0001] The present invention relates to a conversational system with entity based intent classification and method for operating the same.
Background of the invention:
[0002] There are many conversation building and dialogue management frameworks like Rasa, Google DialogFlow, Microsoft Luis, Facebook Wit, Amazon Lex, etcetera. However, they all consider intent classification independent of the entities. Almost every conversation management framework uses an intent classifier trained over the raw data provided to it. The intent classifier operates independently of the entity extractor, i.e., intent classification is independent of entity recognition. Having such an architecture means that the entities have no influence over the intent classification, and intent classification requires a huge amount of manual data engineering to compensate for that.
[0003] Because the intent classifier sees 'raw' data (without annotations), there is a high chance of overfitting an intent to an entity if the data is limited, or if training examples are not selected with extreme care. Consider the following table with intents and training examples. Even though the data is provided with entity annotations, those annotations go to the entity extractor, and the intent classifier sees 'raw' data, like 'call me john', 'i am jane'. This creates a problem if data is limited, or extreme care is not taken in exhaustive data engineering.
| Intent – User gives name | Intent – Phone someone |
|---|---|
| Call me [John](NAME) | Call [Jane](NAME) |
| I am [Jane](NAME) | Connect to [Jacob](NAME) |
[0004] For this example, when the user utters the query 'Call John', it is classified as 'User gives name' instead of 'Phone someone'. This is because of the strong association of the intent 'User gives name' with the entity 'John'. In other words, the classifier has only seen the pair ('call', 'John') in the intent 'User gives name'. Although it has seen 'call' in the intent 'Phone someone', the entity 'John' does not exist in the training data of that intent. This way, the classifier has a stronger association of the entity 'John' with the intent 'User gives name'.
[0005] One way to overcome this problem is to provide a large set of training data that includes a variety of entities. In the above example, this means making sure that all the entities 'John', 'Jane' and 'Jacob' exist in both the intents. This does not stop at only three names; for a large number of entities, the training data explodes into an unmanageable pile of largely similar data.
[0006] Such a pile leads to training data where the majority of the examples have a similar structure. For the example above, if training data is added for the 'Phone someone' intent, its training data might contain a lot of examples with the verb 'call' in them, leading to a strong association of the verb 'call' with the intent 'Phone someone'. Hence, it may now misclassify queries from other intents that contain the verb 'call', like 'User gives name' ('Call me Purvish') or 'Book taxi' ('Call a cab please'), as 'Phone someone'. Hence, to balance this imbalanced data, a lot of examples may have to be added to such other intents. This increases the burden of manual data engineering and maintenance.
[0007] Additionally, because the entity type has no influence over intent classification, it is difficult to train the model to differentiate between largely overlapping examples. For example, classifying 'Let’s hear Joe Biden' under the 'Turn on political news channel' intent and 'Let’s hear Eminem' under 'Start playing (Eminem) songs' is difficult because the only difference between the two is the type of the entity and the metadata of the entity. Training an intent classifier that only sees 'raw' data is challenging. It can increase the confusion of the classifier and reduce its confidence, often leading to an 'I don’t understand' scenario.
[0008] Facebook Wit has a workaround that partially addresses this in an interesting way. It provides an exhaustively pretrained entity classifier leveraging its large dataset. However, this works only for person-name entities and a few other predefined entity classes. For domain-specific or custom entities, like disease names, restaurant names, etcetera, this would not work, because there is no predefined model. Additionally, the predefined model might have biases from being trained over universal data. Hence, if the system is targeted towards a region, which is often the case, the pretrained model might not be as effective. Additionally, adding regional entities means going through the previously discussed problems again.
[0009] Patent literature US10628483 discloses entity resolution with ranking. The system is configured to identify an entity referred to in speech or text by comparing the text of the entity mention to a database of the domain of the entity. The system may obtain a group of potential matches from the database and may then discriminatively rank those potential matches according to specific features identified for the context of the speech and/or text.
Brief description of the accompanying drawings:
[0010] An embodiment of the disclosure is described with reference to the following accompanying drawings,
[0011] Fig. 1 illustrates a block diagram of a conversational system, according to an embodiment of the present invention, and
[0012] Fig. 2 illustrates a method of operating the conversational system, according to the present invention.
Detailed description of the embodiments:
[0013] Fig. 1 illustrates a block diagram of a conversational system, according to an embodiment of the present invention. The conversational system 100 comprises an input means 102, such as at least one microphone, to receive a query from a user 124, an output means 112, such as at least one speaker/display, to provide a response to the query, and a controller 110 to preprocess the query through a Natural Language Understanding (NLU) module 104, process an extracted entity and classified intent through a core module 108, and prepare a response through a Natural Language Generation (NLG) module 106. The NLU module 104 comprises an entity extractor 114 and an intent classifier 116 for preprocessing, characterized in that an output from the entity extractor 114 is given as input to the intent classifier 116. The query from the user 124 is received either as audio or as text.
[0014] According to the present invention, the input to the intent classifier 116 comprises a modified query having entities replaced with respective classes. In an embodiment, the controller 110 accesses a table/list 118 of entities and corresponding classes which are predefined and stored in the memory element (not shown) of the controller 110. The controller 110 creates a mapping function between the entities and classes. When a sentence is received from the end user 124, the words in the sentence are masked and replaced, if at all, with their respective entity classes ('Eminem' → '$SINGER', 'Joe Biden' → '$POLITICIAN').
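As an illustration of this embodiment, the following is a minimal sketch in Python of masking a query against a predefined list of entities and classes such as the list 118. The entity names and the '$CLASS' format follow the examples above, while the function and variable names are hypothetical, not a prescribed implementation.

```python
# Illustrative sketch of list-based masking (list 118): names and the
# '$CLASS' format are assumptions taken from the examples in the text.
ENTITY_CLASSES = {                  # predefined table/list of entities -> classes
    "joe biden": "$POLITICIAN",
    "eminem": "$SINGER",
    "draupadi murmu": "$POLITICIAN",
}

def mask_with_list(query: str) -> tuple[str, dict]:
    """Replace known entities in the query with their entity classes.

    Returns the modified query for the intent classifier and a record of
    the masked entities (e.g. '$POLITICIAN = Joe Biden')."""
    masked = query
    record = {}
    # Longest entities first so multi-word names match before substrings.
    for entity in sorted(ENTITY_CLASSES, key=len, reverse=True):
        idx = masked.lower().find(entity)
        if idx != -1:
            entity_class = ENTITY_CLASSES[entity]
            record[entity_class] = masked[idx:idx + len(entity)]
            masked = masked[:idx] + entity_class + masked[idx + len(entity):]
    return masked, record

print(mask_with_list("Let's hear Joe Biden"))
# ("Let's hear $POLITICIAN", {'$POLITICIAN': 'Joe Biden'})
```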
[0015] In another embodiment, the controller 110 is configured to access at least one from a group 120 comprising a pretrained language model and an entity extractor to provide recognized entities with respective classes. Such a group 120 usually outputs a map of recognised entities with their own predefined classes like name, location, etcetera. Such class names are alterable to match a format, such as all capital letters preceded by '$' (or any other symbol), after which all the entities are replaced with their respective entity classes, as in the example of the preceding paragraph.
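The following is a minimal sketch of this embodiment, assuming spaCy with its pretrained 'en_core_web_sm' pipeline as the entity extractor of the group 120; this choice is an illustrative assumption only, and any model that returns entity spans and labels could stand in. The labels returned by the model are normalized to the '$CLASS' format before replacement.

```python
# Sketch assuming spaCy as the pretrained entity extractor of group 120.
import spacy

nlp = spacy.load("en_core_web_sm")   # pretrained pipeline with an NER component

def mask_with_ner(query: str) -> tuple[str, dict]:
    """Replace recognised entities with '$LABEL' placeholders."""
    doc = nlp(query)
    record = {}
    masked = query
    # Replace from the end so earlier character offsets stay valid.
    for ent in reversed(doc.ents):
        entity_class = "$" + ent.label_.upper()   # e.g. PERSON -> $PERSON
        record[entity_class] = ent.text
        masked = masked[:ent.start_char] + entity_class + masked[ent.end_char:]
    return masked, record

print(mask_with_ner("Call John"))
# e.g. ("Call $PERSON", {'$PERSON': 'John'})
```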
[0016] In yet another embodiment, the controller 110 is configured to access a knowledge source 122 with predefined metadata types to replace entities with respective classes. The knowledge source 122 is at least one of a public source and a private source. The NLU module 104 is configured with access to an existing/new private or public knowledge source 122, along with a definition of metadata types (like the occupation of a person). The entity classes are conditioned based on the values of the metadata types. For example, if the occupation includes Politician, mark it as '$POLITICIAN'. When the user 124 gives a sentence, the various parameters of the sentence, like object and subject, are searched for in the knowledge source. If any required metadata are found (like occupation), they are tested for the defined condition (inclusion of politician in the occupation). If the conditions hold, the controller 110 replaces the entities with their respective entity classes ('$POLITICIAN') as defined.
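A minimal sketch of this embodiment follows, in which 'lookup_metadata' is a hypothetical stand-in for a query against the knowledge source 122 (a public source such as Wikipedia/Wikidata, or a private one), and the rules mapping metadata values to entity classes are illustrative assumptions.

```python
# Sketch of metadata-conditioned masking via a knowledge source 122.
# 'lookup_metadata' and its contents are hypothetical; a real system would
# query a public or private knowledge source here.
CLASS_RULES = [
    # (metadata type, value that must be present, entity class)
    ("occupation", "politician", "$POLITICIAN"),
    ("occupation", "singer", "$SINGER"),
]

def lookup_metadata(candidate: str) -> dict:
    """Hypothetical knowledge-source lookup."""
    fake_store = {
        "joe biden": {"occupation": ["politician", "lawyer"]},
        "eminem": {"occupation": ["singer", "rapper"]},
    }
    return fake_store.get(candidate.lower(), {})

def classify_entity(candidate: str):
    metadata = lookup_metadata(candidate)
    for key, required_value, entity_class in CLASS_RULES:
        if required_value in metadata.get(key, []):
            return entity_class        # condition holds, e.g. '$POLITICIAN'
    return None

def mask_with_knowledge_source(query: str, candidates: list[str]) -> tuple[str, dict]:
    """Candidates (e.g. subject/object phrases of the sentence) are tested
    against the knowledge source and replaced when a rule matches."""
    masked, record = query, {}
    for candidate in candidates:
        entity_class = classify_entity(candidate)
        if entity_class and candidate in masked:
            record[entity_class] = candidate
            masked = masked.replace(candidate, entity_class)
    return masked, record

print(mask_with_knowledge_source("Let's hear Joe Biden", ["Joe Biden"]))
# ("Let's hear $POLITICIAN", {'$POLITICIAN': 'Joe Biden'})
```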
[0017] According to an embodiment of the present invention, the controller 110 is configurable with a selection of at least one of the list 118, the group 120 and the knowledge source 122. Regardless of the replacement strategy, as a result, the important entities in the user sentence are masked with their respective entity classes. For example, 'Let’s hear Joe Biden' becomes 'Let’s hear $POLITICIAN', which is then provided to the intent classifier 116, and the entity '$POLITICIAN = Joe Biden' is recorded. The intent classifier 116 then confidently classifies it as 'Turn on political news channel'.
[0018] It is important to understand some aspects of Artificial Intelligence (AI) technology and AI based devices, which can be explained as follows. Depending on the architecture of the implementation, AI devices may include many components. One such component is an AI model or module. The AI model can be defined as a reference or an inference set of data, which uses different forms of correlation matrices. Using these AI models and the data from these AI models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI models such as linear regression, naïve Bayes classifier, support vector machine, neural networks and the like. It must be understood that this disclosure is not specific to the type of model being executed and can be applied to any AI module irrespective of the AI model being executed. A person skilled in the art will also appreciate that the AI model may be implemented as a set of software instructions, a combination of software and hardware or any combination of the same. The controller 110 makes use of AI modules which are the NLU module 104, the core module 108, the NLG module 106 and the like.
[0019] Some of the typical tasks performed by AI systems are classification, clustering, regression, etc. The majority of classification tasks depend upon labeled datasets; that is, the data sets are labelled manually in order for a neural network to learn the correlation between labels and data. This is known as supervised learning. Some of the typical applications of classification are face recognition, object identification, gesture recognition, voice recognition, etc. In a regression task, the model is trained based on labeled datasets, where the target labels are numeric values. Some of the typical applications of regression are weather forecasting, stock price prediction, house price estimation, energy consumption forecasting, etc. Clustering or grouping is the detection of similarities in the inputs. Cluster learning techniques do not require labels to detect similarities.
[0020] In accordance with an embodiment of the present invention, the controller 110 is provided with the necessary signal detection, acquisition, and processing circuits. The controller 110 comprises an input interface, output interfaces having pins or ports, the memory element such as Random Access Memory (RAM) and/or Read Only Memory (ROM), an Analog-to-Digital Converter (ADC) and a Digital-to-Analog Converter (DAC), clocks, timers, counters and at least one processor (capable of implementing machine learning) connected with each other and to other components through communication bus channels. The memory element is pre-stored with logics or instructions or programs or applications or modules/models and/or threshold values/ranges, reference values, predefined/predetermined criteria/conditions, lists and knowledge sources, which is/are accessed by the at least one processor as per the defined routines. The internal components of the controller 110 are not explained further for being state of the art, and the same must not be understood in a limiting manner. The controller 110 may also comprise communication units such as transceivers to communicate through wireless or wired means such as Global System for Mobile Communications (GSM), 3G, 4G, 5G, Wi-Fi, Bluetooth, Ethernet, serial networks, and the like. The controller 110 is implementable in the form of a System-in-Package (SiP) or System-on-Chip (SoC) or any other known type. Examples of the controller 110 comprise, but are not limited to, a microcontroller, a microprocessor, a microcomputer, etc.
[0021] Further, the processor may be implemented as any or a combination of one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored in the memory element and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The processor is configured to exchange and manage the processing of various AI modules.
[0022] According to an embodiment of the present invention, the controller 110 is part of at least one of an infotainment unit of a vehicle, a smartphone, a wearable device and a cloud computer. Alternatively, the conversational system 100 is at least one of the infotainment unit of the vehicle, the smartphone, the wearable device, the cloud computer, a smart speaker or a smart display and the like. In other words, the controller 110 is part of an internal device of the vehicle or part of an external device which may or may not be connected to the vehicle through known wired or wireless means as described earlier.
[0023] In Fig. 1, the conversational system 100 is shown with the controller 110 having the input interface connected with the input means 102 and an output interface connected with the output means 112. The controller 110 comprises the NLU module 104, the NLG module 106 and the core module 108, which further processes the extracted entities and classified intents and gives a signal to the NLG module 106 for generation of the output. The core module 108 is state of the art and known to a person skilled in the art. The NLU module 104 further comprises the entity extractor 114 and the intent classifier 116. The internals of the NLU module 104 are shown through a first block 126, which is as per the conventional solution. The first block 126 is prior art and is shown in dotted lines. In the first block 126, the entity extractor 114 and the intent classifier 116 take inputs from the speech-to-text converter (Automatic Speech Recognition) independently. The entity extraction and intent classification do not influence each other, and each output is individually fed to the core module 108 for further processing.
[0024] A second block 128 shows the internal components of the NLU module 104 as per the present invention. The second block 128 is shown in solid lines as per the present invention. The entity extractor 114 and the intent classifier 116 are configured to be operated differently in comparison to the conventional method. The output of the entity extractor 114 is given as input to the intent classifier 116. The extracted entity is used in the intent classification. The output from the entity extractor 114 and the intent classifier 116 is then given as input to the core module 108 for further processing.
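The sketch below illustrates the wiring of the second block 128, reusing the masking helpers sketched earlier together with a stub intent classifier; the function names and the returned structure are hypothetical illustrations rather than a prescribed interface.

```python
# Sketch of the second block 128: the entity extractor output feeds the
# intent classifier, and both results go to the core module 108.
def nlu_preprocess(query: str, mask_entities, classify_intent) -> dict:
    """Entity extraction runs first; the masked query is what the intent
    classifier sees, so entity classes influence the classification."""
    masked_query, entities = mask_entities(query)     # e.g. mask_with_list / mask_with_ner
    intent = classify_intent(masked_query)            # classifier never sees raw entity values
    return {"intent": intent, "entities": entities, "masked_query": masked_query}

# Usage with the list-based masking sketch and a stub classifier:
def classify_intent(masked_query: str) -> str:
    return ("Turn on political news channel"
            if "$POLITICIAN" in masked_query else "Start playing songs")

result = nlu_preprocess("Let's hear Joe Biden", mask_with_list, classify_intent)
print(result)   # the core module 108 would receive this intent and entity record
```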
[0025] According to the present invention, a working of the conversational system 100 is explained. The controller 110 allows entity metadata to influence intent classification, where an external knowledge source such as Wikipedia (just for example) is used to extract more data about entities and let that data influence the intent classification. Consider the example of classifying 'Let’s hear Joe Biden' under the 'Turn on political news channel' intent and 'Let’s hear Eminem' under 'Start playing (Eminem) songs'. With Wikipedia stating the occupation of Joe Biden as 'Politician' and that of Eminem as 'Singer', it is easy to differentiate the intent of these two highly overlapping sentences without having to come up with an extremely sophisticated solution or a huge amount of training data. The examples are easily defined in the training data as
| Intent – Turn on political news channel | Intent – Start playing songs |
|---|---|
| Let’s hear $POLITICIAN | Let’s hear $SINGER |
| Start political news please | Play some rock songs |
[0026] Notice the difference in the first example. Instead of providing examples with all politicians and singers, an example with the abstract entity type is provided to the controller 110 during training. This drastically improves readability, maintainability, training speed, and so forth, while also improving the classification accuracy. Moreover, constant updates on entities and their metadata are not required, as the system can leverage existing knowledge sources. For example, after 'Draupadi Murmu' became the President of India, 'Let’s hear Draupadi Murmu' is classified to 'Turn on political news channel' with the help of the knowledge source, without manually adding 'Draupadi Murmu' to the politician class of the entity extractor data.
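As an illustration of how such masked training data can be used, the following sketch trains a simple intent classifier on the masked examples above, assuming scikit-learn as the classifier backend; this is an assumption for illustration only, as the invention does not prescribe a particular model.

```python
# Illustrative training sketch on masked examples (scikit-learn assumed):
# the training data carries abstract entity classes instead of concrete names.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_examples = [
    ("Let's hear $POLITICIAN", "Turn on political news channel"),
    ("Start political news please", "Turn on political news channel"),
    ("Let's hear $SINGER", "Start playing songs"),
    ("Play some rock songs", "Start playing songs"),
]
texts, labels = zip(*training_examples)

intent_classifier = make_pipeline(CountVectorizer(), LogisticRegression())
intent_classifier.fit(texts, labels)

# At runtime the query is masked first, so a newly prominent politician is
# handled through the knowledge source without retraining the classifier:
print(intent_classifier.predict(["Let's hear $POLITICIAN"])[0])
# -> 'Turn on political news channel'
```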
[0027] Fig. 2 illustrates a method of operating the conversational system, according to the present invention. The method comprises a plurality of steps, of which a step 202 comprises receiving, by the controller 110, the conversational input from the input means 102. A step 204 comprises preprocessing, by the controller 110, the query through the Natural Language Understanding (NLU) module 104, processing, by the controller 110, the extracted entity and classified intent through the core module 108, and preparing, by the controller 110, the response through the Natural Language Generation (NLG) module 106. The NLU module 104 comprises the entity extractor 114 and the intent classifier 116 for preprocessing. A step 206 comprises providing the output through the output means 112 corresponding to the processed query. The method is characterized by the step 204, wherein preprocessing the query through the NLU module 104 comprises a step 208 which in turn comprises feeding the extracted entity as input for intent classification. The method is performed by the controller 110. As per the step 208, once the entity is extracted/identified in the sentence deciphered from the speech-to-text converter, the entity is replaced with its respective class of a specific format and then the modified sentence is given as input to the intent classifier 116 for further processing.
[0028] According to the method, preprocessing the query comprises replacing the extracted entity with respective classes before feeding as input to the intent classifier 116. According to the present invention, the method comprises accessing the table/list 118 of entities and corresponding classes which are predefined and stored in the memory element of the controller 110. The method also comprises accessing at least one selected from a group 120 comprising a pretrained language model and entity extractor to provide recognized entities with respective classes. The method further comprises accessing the knowledge source 122 with predefined metadata types to replace entities with respective classes. The knowledge source 122 is at least one of the public source and the private source.
[0029] According to an embodiment of the present invention, an entity and metadata aware intent classification in the conversational system 100 is disclosed. The controller 110 addresses the issue of entity-agnostic intent classification at its core, without majorly altering the existing architecture. This enables the use of proprietary, existing or third-party intent classifiers 116 and entity extractors 114. Additionally, because the controller 110 allows an entity and its metadata to influence intent classification, the training data does not have to contain exhaustive examples with different entities. This significantly reduces the burden of manual data engineering, saving the precious human hours of experts and allowing a new expert to handle the engineering without being overwhelmed by a gigantic dataset of mostly overlapping examples during the development of the conversational system 100. In turn, it also addresses the issue of overfitting intents to entities or word phrases. Besides, updating entities, like adding more sports persons or restaurants, does not require retraining the large, time-consuming intent classifier.
[0030] It should be understood that the embodiments explained in the description above are only illustrative and do not limit the scope of this invention. Many such embodiments and other modifications and changes in the embodiment explained in the description are envisaged. The scope of the invention is only limited by the scope of the claims.
Claims: We claim:
1. A conversational system (100), said conversational system (100) comprises:
an input means (102) to receive a query from a user (124);
an output means (112) to provide a response to said query, and
a controller (110) to preprocess said query through a Natural Language Understanding (NLU) module (104), process an extracted entity and an intent through a core module (108), and prepare response through a Natural Language Generation (NLG) module (106), said NLU module (104) comprises an entity extractor (114) and an intent classifier (116) for preprocessing, characterized in that,
an output from said entity extractor (114) is given as input to the intent classifier (116).
2. The conversational system (100) as claimed in claim 1, wherein said input to said intent classifier (116) comprises a modified query having entities replaced with respective classes.
3. The conversational system (100) as claimed in claim 1, wherein said controller (110) accesses a table/list (118) of entities and corresponding classes which are predefined and stored in a memory element of said controller (110).
4. The conversational system (100) as claimed in claim 1, wherein said controller (110) is configured to access at least one selected from a group (120) comprising a pretrained language model and entity extractor to provide recognized entities with respective classes.
5. The conversational system (100) as claimed in claim 1, wherein said controller (110) is configured to access a knowledge source (122) with predefined metadata types to replace entities with respective classes, wherein said knowledge source (122) is at least one of a public source and a private source.
6. A method for operating a conversational system (100), the method comprising the steps of:
receiving a query from user (124) through an input means (102);
preprocessing said query through a Natural Language Understanding (NLU) module (104), processing an extracted entity and intent through a core module (108) and preparing a response through a Natural Language Generation (NLG) module (106), said NLU module (104) comprises an entity extractor (114) and an intent classifier (116), and
providing an output through an output means (112) corresponding to said processed query, characterized by,
preprocessing of said query through said NLU module (104) comprises feeding extracted entity as input for intent classification.
7. The method as claimed in claim 6, wherein preprocessing said query comprises replacing said extracted entity with respective classes before feeding as input to said intent classifier (116).
8. The method as claimed in claim 6 comprises accessing a table/list (118) of entities and corresponding classes which are predefined and stored in a memory element of a controller (110).
9. The method as claimed in claim 6 comprises accessing at least one selected from a group (120) comprising a pretrained language model and entity extractor to provide recognized entities with respective classes.
10. The method as claimed in claim 6 comprises accessing a knowledge source (122) with predefined metadata types to replace entities with respective classes, wherein said knowledge source (122) is at least one of a public source and a private source.