
An Automatic Conversational System And Method For Operating The Same

Abstract: The automatic conversational system 100 comprises at least one microphone 102 to receive speech input from at least one user 120. A controller 110 is connected to the at least one microphone 102 through an input interface 104 of the controller 110. The controller 110 is configured to receive the speech input through the at least one microphone 102 and process the speech input through a speech module 112. The controller 110 is further configured to estimate, through a context module 114, a context of a conversation identified in speech data obtained from the speech input, characterized in that the controller 110 is configured to store the estimated context in a propositional logic format and determine, through an analyzer module 116, an actionable context in the stored estimated context. The context module 114 is any one of a rule based model and a learning based model as known in the art. Figure 1


Patent Information

Application #
202341058667
Filing Date
01 September 2023
Publication Number
10/2025
Publication Type
INA
Invention Field
ELECTRONICS
Status
Parent Application

Applicants

Bosch Global Software Technologies Private Limited
123, Industrial Layout, Hosur Road, Koramangala, Bangalore – 560095, Karnataka, India
Robert Bosch GmbH
Postfach 30 02 20, D-70442, Stuttgart, Germany

Inventors

1. Khalpada Purvish
C/O Gordhan Vallabh, 2183, Shree Gokul Niwas, Near Statue of Gandhi, Aazad Chowk, Kapadwanj, Kheda, Gujarat – 387620, India
2. Karthikeyani Shanmuga Sundaram
3/58, AKG Nagar, Ponnalamman durai, Sethumadai(Po), Pollachi(Tk), Coimbatore – 642133, Tamilnadu, India
3. Swetha Shankar Ravisankar
Tower 4, 304 Salarpuria Sattva Cadenza Apartments, Near Nandi Toyota Office, Kudlu Gate Signal, Hosur Main Road, Bengaluru – 560068, Karnataka, India
4. Arvind Devarajan Sankruthi
P-207, Purva, Bluemont,Trichy Road, Singanallur, Coimbatore – 641005, Tamilnadu, India

Specification

Description: Complete Specification
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.
Field of the invention:
[0001] The present invention relates to an automatic conversational system and method for operating the same.

Background of the invention:
[0002] Many corporations have products or solutions for continuous listening on a device and detecting prompting keywords or wake-up words uttered at any place in the speech. There are other solutions which advocate using a list of phrases as wake words. For example, “increase ac” can be a wake phrase which solves two problems: waking up the system and providing the command. However, the user has to utter the phrase in a predetermined way.

[0003] Voice based virtual assistants have gained significant popularity. After all, they offer a natural way of “communicating” commands, instead of navigating a plethora of hierarchical menus. It is more human. However, there is an inherent problem. When the user is navigating the menus and clicking on something, the system can safely assume that the user wants the system to execute the command. However, for a system that only takes speech signals from a microphone, it is challenging to understand whether the user is commanding the system, talking to a friend, or practicing the delivery of a presentation.

[0004] Existing voice assistants deploy various mechanisms to allow the user to express whether a vocal command is intended for the system. Such mechanisms include using a physical button to start the service. Once the user presses the button, whatever is spoken next is treated as an intended command for the system. Many automobiles on the road have this button on the steering wheel. For example, the user presses the button and utters “Play radio”. This is more of a hybrid approach: the user still presses the button before giving out the command. This is also one of the simplest mechanisms.
[0005] Although this improves the user experience compared to the conventional system of hierarchical menus, it is still not as seamless as human conversations can get.

[0006] There are solutions which use only a wake word. This is like pressing a button, but instead of pressing a physical button, the user utters a specific phrase, like “Okay Google”, “Hey Alexa”, etcetera, to ask the system to take the following utterance as the intended command. For example, the user utters “Hey Google, what’s the weather like today?”. This approach translates the physical button into a vocal utterance, so the user can activate the system through the same medium (speech instead of a physical button). However, it is still far from the way humans communicate. The system requires a specific phrase as a predecessor to the command.

[0007] In yet another solution, buffered data is processed. With this approach, the system keeps buffering a fixed amount of data and keeps analyzing it to find whether there is any wake word. If it finds a wake word, it detects the speech segment in the current buffer to identify the command. If not, it clears the buffer and starts the process again for the new speech data coming in. This approach allows more flexibility compared to the previous ones. The user can now say, “Play Tum Tum, Alexa!” or “I want to listen to Tum Tum, Alexa! Can you please play it?” instead of a rigid preceding wake word structure like “Hey Alexa! Play Tum Tum.” However, the user still must utter the wake phrase for the system to act upon the command.

[0008] The above mechanisms still rely largely on the user explicitly expressing the intention, either with the physical button or with the wake phrase. Most conversational systems reside in the cloud. So, when the user utters a command, the command data (either raw speech or text) is sent to the respective server in the cloud. The server then interprets the command, performs the necessary actions, and sends back the response along with any action necessary on the edge (for example, increasing the volume). The deployment of the conversational system in the cloud necessitates sending the command data to the cloud. The physical button or wake phrase allows the system to know from which point to start sending the data. Without the wake word or such a physical button, the system would have to send all the speech / text data, including the user’s private conversations with other humans or with the self, which violates the right to privacy (and probably a lot of compliance requirements, like GDPR).

[0009] In patent literature US2019279624, voice command processing without a wake word is disclosed. Audio is recorded by the computer system to form a recorded audio. The computer system then determines whether a voice command spoken by a first person is present in the recorded audio. If the voice command is present in the recorded audio, the computer system determines whether the voice command is directed to a second person by the first person. If the voice command is not directed to the second person, the computer system processes the voice command, wherein processing of the voice command occurs without a wake word.

Brief description of the accompanying drawings:
[0010] An embodiment of the disclosure is described with reference to the following accompanying drawings,
[0011] Fig. 1 illustrates a block diagram of an automatic conversational system, according to an embodiment of the present invention, and
[0012] Fig. 2 illustrates a method of operating the automatic conversational system, according to the present invention.

Detailed description of the embodiments:
[0013] Fig. 1 illustrates a block diagram of an automatic conversational system, according to an embodiment of the present invention. A conversational system 100 facilitates contextual conversation with a user 120. The conversational system 100 comprises the controller 110 with an input interface 104 and an output interface 108. The automatic conversational system 100 provides an option to the user 120 to enable conversation without any keyword or wake-up word, as explained below. The automatic conversational system 100 comprises at least one microphone 102 to receive speech input from at least one user 120. A controller 110 is connected to the at least one microphone 102 through the input interface 104 of the controller 110. The controller 110 is configured to receive the speech input through the at least one microphone 102 and process the speech input through a speech module 112. The controller 110 is further configured to estimate, through a context module 114, a context of a conversation identified in speech data obtained from the speech input, characterized in that the controller 110 is configured to store the estimated context in a propositional logic format, and determine, through an analyzer module 116, an actionable context in the stored estimated context. The context module 114 is any one of a rule based model and a learning based model as known in the art. Further, it is to be noted that Automatic Speech Recognition or Speech-to-Text conversion is also performed, but the same is not explained for being state of the art.

[0014] According to an embodiment of the present invention, an always-on microphone 102 is used to listen to the conversations within an environment. The at least one microphone 102 collects the speech data and passes it to the controller 110, where a speech module 112 (or speech processor) continuously analyzes and processes the incoming speech data. The speech module 112 performs speech processing such as speaker diarization, recognition, tonal and emotion analysis, and speech to text conversion. The speech module 112 transcribes the speech data with additional information, such as speaker identity (who spoke what when multiple users 120 are present in the environment), emotion and tonal information, along with text utterances, and sends the data to the context module 114.
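As a purely illustrative sketch (the class and field names below are hypothetical and not taken from the specification), the annotated output handed from the speech module 112 to the context module 114 could be represented as follows:

```python
from dataclasses import dataclass

@dataclass
class AnnotatedUtterance:
    """One transcribed utterance enriched by the speech module (112)."""
    speaker: str      # speaker identity from diarization, e.g. "User1"
    text: str         # speech-to-text transcription
    emotion: str      # coarse emotion label, e.g. "neutral", "discomfort"
    tone: str         # tonal information, e.g. "statement", "question"
    timestamp: float  # seconds since the start of the conversation

# Example of the kind of data handed over to the context module (114)
utterances = [
    AnnotatedUtterance("User1", "This summer, Bangalore is too hot.", "discomfort", "statement", 0.0),
    AnnotatedUtterance("Friend", "Previously it always used to rain when it was this hot.", "neutral", "statement", 3.2),
    AnnotatedUtterance("User1", "Even right now, we are sweating.", "discomfort", "statement", 7.5),
]
```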

[0015] The speech input refers to dialogue or utterances in the environment with one or more users 120. The context module 114 uses a hybrid (rule based + learning based) model to process the incoming text data, along with the conversation history 126, to estimate the context of the ongoing conversation. The context module 114 expresses and stores the identified context in a propositional logic format, which allows easy inferences. Further, the context module 114 takes input from context history 122, user preference 124 and conversation history 126 in addition to the speech input. The speech module 112 also checks for known voices 128 in the speech input for better processing.
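A minimal sketch of how the estimated context might be kept in a propositional logic format is shown below; the representation chosen here (string predicates with a negation flag) is an assumption made for illustration, not a format mandated by the specification:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposition:
    """A propositional fact such as heat(Bangalore) or !rain(Bangalore)."""
    predicate: str          # e.g. "heat", "rain", "sweating"
    argument: str           # e.g. "Bangalore", "User1"
    negated: bool = False   # True encodes logical negation "!"

    def __str__(self) -> str:
        return f"{'!' if self.negated else ''}{self.predicate}({self.argument})"

# Estimated context for the example conversation, stored as a set of facts.
estimated_context = {
    Proposition("heat", "Bangalore"),
    Proposition("rain", "Bangalore", negated=True),
    Proposition("sweating", "User1"),
}

print(" -> ".join(str(p) for p in sorted(estimated_context, key=str)))
```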

[0016] According to an embodiment of the present invention, the controller 110 applies a rule of inference to the estimated context stored in the propositional logic format to determine the actionable context. The actionable context is compared with a predetermined list, stored in the controller 110, and categorized into an active actionable context and a passive actionable context, followed by providing an active response and a passive response respectively. The passive response merely provides some information to the user. The active response is an alteration of a parameter of the environment through a control signal 118. For example, the active response (or the control signal 118) in an automobile environment comprises turning sound or music ON/OFF, volume up/down, wiper control, AC ON/OFF, temperature increase or decrease, window open or close, sunroof open or close and the like. An active response in a non-automotive environment such as a home or office comprises turning music ON/OFF, AC ON/OFF, temperature increase/decrease and control of other smart devices connected in the smart environment.
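The comparison against a predetermined list and the active/passive categorization could, for instance, be sketched as follows; the list contents, function names and control-signal strings are illustrative assumptions, not part of the specification:

```python
# Hypothetical predetermined lists mapping actionable predicates to a category.
ACTIVE_ACTIONS = {"sweating": "decrease_ac_temperature", "feeling_cold": "increase_ac_temperature"}
PASSIVE_TOPICS = {"rain": "weather_update", "election": "news_update"}

def categorize(predicate: str) -> str:
    """Return 'active', 'passive' or 'none' for a detected actionable context."""
    if predicate in ACTIVE_ACTIONS:
        return "active"
    if predicate in PASSIVE_TOPICS:
        return "passive"
    return "none"

def respond(predicate: str) -> str:
    """Produce an active control signal (118) or a passive informational response."""
    category = categorize(predicate)
    if category == "active":
        return f"CONTROL_SIGNAL: {ACTIVE_ACTIONS[predicate]}"
    if category == "passive":
        return f"INFO: {PASSIVE_TOPICS[predicate]}"
    return "IGNORE"

print(respond("sweating"))  # -> CONTROL_SIGNAL: decrease_ac_temperature
print(respond("rain"))      # -> INFO: weather_update
```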

[0017] In other words, the passive response is where the system is not intended to act immediately. For example, in an automobile environment, responses to booking a show ticket or checking the election result in a region (such as India) may be considered passive. The active response is where the system is intended to act immediately. For example, in an automobile environment, the active response comprises controlling the air conditioning system, music system, sunroof, etcetera. In a health care environment, the active response comprises calling out a doctor, responding to health queries / worries, etc.

[0018] According to an embodiment of the present invention, the controller 110 is configured to perform an action in relation to the determined actionable context. The rule of inference and the actionable context are dependent on the application domain. The application domain corresponds to the environment in which the conversational system 100 is deployed, such as the automotive domain, home, office, hospital, hospitality, etc. Further, the user 120 in the environment is not limited to one; there may be one or more users 120 who are in proximity to the at least one microphone 102.

[0019] It is important to understand some aspects of Artificial Intelligence (AI) / Machine Learning (ML) technology and AI/ML based devices/systems (such as the automatic conversational system 100), which can be explained as follows. Depending on the architecture of the implementation, AI/ML devices/systems may include many components. One such component is an AI/ML model or AI/ML module. Different modules are described later in this disclosure. The AI/ML model can be defined as a reference or an inference set of data, which uses different forms of correlation matrices. Using these AI/ML models and the data from these AI/ML models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI/ML models such as linear regression, naïve Bayes classifier, support vector machine, neural networks and the like. It must be understood that this disclosure is not specific to the type of model being executed and can be applied to any AI/ML module irrespective of the AI/ML model being executed. A person skilled in the art will also appreciate that the AI/ML model may be implemented as a set of software instructions, a combination of software and hardware, or any combination of the same.

[0020] Some of the typical tasks performed by AI/ML systems are classification, clustering, regression, etc. The majority of classification tasks depend upon labeled datasets; that is, the data sets are labelled manually in order for a neural network to learn the correlation between labels and data. This is known as supervised learning. Some of the typical applications of classification are face recognition, object identification, gesture recognition, voice recognition, etc. In a regression task, the model is trained based on labeled datasets where the target labels are numeric values. Some of the typical applications of regression are weather forecasting, stock price prediction, house price estimation, energy consumption forecasting, etc. Clustering or grouping is the detection of similarities in the inputs. Clustering techniques do not require labels to detect similarities.

[0021] In accordance with an embodiment of the present invention, the controller 110 is provided with the necessary signal detection, acquisition, and processing circuits. The controller 110 comprises the input interface 104, the output interface 108 having pins or ports, the memory element 106 such as Random Access Memory (RAM) and/or Read Only Memory (ROM), an Analog-to-Digital Converter (ADC) and a Digital-to-Analog Converter (DAC), clocks, timers, counters and at least one processor (capable of implementing machine learning) connected with each other and to other components through communication bus channels. The memory element 106 is pre-stored with logics or instructions or programs or applications or modules/models and/or threshold values/ranges, reference values, predefined/predetermined criteria/conditions and predetermined lists, which is/are accessed by the at least one processor as per the defined routines. The internal components of the controller 110 are not explained for being state of the art, and the same must not be understood in a limiting manner. The controller 110 may also comprise communication units such as transceivers to communicate through wireless or wired means such as Global System for Mobile Communications (GSM), 3G, 4G, 5G, Wi-Fi, Bluetooth, Ethernet, serial networks, and the like. The controller 110 is implementable in the form of a System-in-Package (SiP) or System-on-Chip (SoC) or any other known type. Examples of the controller 110 comprise, but are not limited to, a microcontroller, microprocessor, microcomputer, etc.

[0022] Further, the processor may be implemented as any or a combination of one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored in the memory element 106 and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The processor is configured to exchange and manage the processing of various AI models.

[0023] According to an embodiment of the present invention, the controller 110 is part of at least one of an infotainment unit of the vehicle, a smartphone, a wearable device, a cloud computer. Alternatively, the conversational system 100 is at least one of the infotainment unit of the vehicle, the smartphone, the wearable device, the cloud computer, a smart speaker, or a smart display and the like. In other words, the controller 110 is part of an internal device of the vehicle or part of external device which is connected to the vehicle through known wired or wireless means as described earlier or an external device to be used in non-automotive environment such as home, office, hospitals, etc.

[0024] According to the present invention, the working of the automatic conversational system 100 is explained considering an example conversation with the conversational system 100. The same must not be understood in a limiting manner. The controller 110 enables the automatic conversational system 100 to trigger a personalized conversation based on the context of the ongoing conversation or dialogue or query, without an explicit keyword or wake-up word. Consider the following scenario where the conversational system 100 is triggering a conversation.
User 120: This summer, Bangalore is too hot. It had never been this hot.
Friend: Yeah. Previously it always used to rain in the evening when it would be this hot.
User 120: So true. Global warming, I tell you. Even right now, we are sweating.
Controller 110: Hey! I can help you feel better. Do you want me to improve the AC cooling?
User 120: Oh, please do!
Controller 110: Okay! The AC is now set to 21 degrees, and I have increased the fan speed to 3. By the way, it is expected to rain next Friday.

[0025] In the above dialogue flow, the controller 110 makes the automatic conversational system 100 perceivably more intelligent, especially when compared to existing voice assistants. The controller 110 understands the context of the conversation well, “joins” the conversation and acts upon it. At first, the speech module 112 processes the incoming audio or speech input from the at least one microphone 102. The at least one microphone 102 is part of the environment, such as the cabin of an automobile or a home or office, in which case the at least one microphone 102 is positioned at strategic locations. The speech module 112 performs speech processing such as speaker diarization, recognition, tonal and emotion analysis, and speech to text conversion, and transcribes the speech data with additional information, such as speaker identity (who spoke what, as in the above example), emotion and tonal information, along with text utterances. The processed data of the speech input is sent to the context module 114 (or the context processor). The context module 114 processes the speech data and expresses/stores the context in a propositional logic format. For the example conversation depicted above, the controller 110 estimates the conversation context chain, before joining the conversation, to be the following.
heat(Bangalore) → !rain(Bangalore) → sweating(User1)

[0026] The estimated context is passed to the analyzer module 116. The analyzer module 116 looks for actionable contexts to which a contribution is possible, either actively or passively. The analyzer module 116 uses rules of inference on the estimated context stored in the propositional logic format. For example, if the user 120 is sweating, then the user 120 may be feeling hot, and if someone is feeling hot, an action is possible, such as decreasing the temperature of the Air Conditioner (AC). Such rules of inference, and what possible actions can be taken, vary largely by the domain in which the automatic conversational system 100 is deployed. For a virtual assistant in an automobile or home, the controller adjusts the cooling of the AC. Therefore, the analyzer module 116, on finding the actionable context, that is sweating(User1), sends the prompt to the automatic conversational system 100. The conversational system 100 determines the appropriate action and responds.
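A minimal sketch of such rules of inference, applied by forward chaining over the stored propositional context, might look like the following; the rule set mirrors the sweating example above, while the data structures and names are assumptions introduced for illustration:

```python
# Forward-chaining inference over the propositional context, as a sketch.
# Each rule maps an antecedent fact to a consequent fact or action.
RULES = [
    ("sweating(User1)", "feeling_hot(User1)"),
    ("feeling_hot(User1)", "action:decrease_ac_temperature"),
]

def infer(facts: set[str]) -> set[str]:
    """Repeatedly apply modus ponens until no new facts or actions are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in RULES:
            if antecedent in derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived

context = {"heat(Bangalore)", "!rain(Bangalore)", "sweating(User1)"}
actions = {fact for fact in infer(context) if fact.startswith("action:")}
print(actions)  # -> {'action:decrease_ac_temperature'}
```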

[0027] In the above example, the controller 110 differentiates a first propositional context, “!rain(Bangalore)”, and a second propositional context, “sweating(User1)”, based on actionability. The first propositional context is one where the controller 110 is able to act passively (if the user’s preference allows passive actions), while the second propositional context is one where the controller 110 is able to act actively. The differentiation of active and passive actions is defined based on the domain of the conversational system 100. For example, for a weather assistant, a question on rain might be an active actionable context, while the sweating information can be a passive context.

[0028] Hence, based on the user preferences 124, the conversational system 100 may respond to the passive context (send the prompt to the conversational system), respond to it passively (if any active context comes in within a threshold number of turns, or randomly, although with less likelihood, responding during a pause in the ongoing conversation), or ignore the passive context.
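One possible, purely hypothetical policy for deciding how to handle a passive context based on the user preference 124 is sketched below; the threshold, probability and flag names are assumptions, not values from the specification:

```python
import random

def handle_passive_context(user_allows_passive: bool,
                           turns_since_detection: int,
                           threshold_turns: int = 3,
                           pause_in_conversation: bool = False,
                           pause_response_probability: float = 0.2) -> str:
    """Decide how to treat a passive actionable context based on user preference (124)."""
    if not user_allows_passive:
        return "ignore"
    # Piggy-back the passive information if an active response happens soon enough.
    if turns_since_detection <= threshold_turns:
        return "attach_to_next_active_response"
    # Otherwise, occasionally speak up during a pause, with low likelihood.
    if pause_in_conversation and random.random() < pause_response_probability:
        return "respond_now"
    return "ignore"

print(handle_passive_context(user_allows_passive=True, turns_since_detection=1))
```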

[0029] According to an embodiment of the present invention, the automatic conversational system 100 is deployed in a vehicle. The controller 110 is either the infotainment control unit or any other control unit. Alternatively, the input from the at least one microphone 102 is collected in the vehicle and then transmitted to the cloud, in which case the cloud is the controller 110. In yet another alternative, the control unit internal to the vehicle and the external cloud are both used to process the speech input in a shared manner based on the current processing load. The data is processed in the vehicle and only the processed data is sent to the cloud for determination of the actionable context. The actionable context is then received in the vehicle and executed, such as increasing the temperature or turning off the AC, etc. In yet another alternative, the entire processing of the speech input and determination of the actionable context is performed within the vehicle.
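The load-dependent split between in-vehicle and cloud processing could be sketched as a simple dispatch policy like the one below; the threshold value and function names are assumptions rather than values given in the specification:

```python
def choose_processing_location(current_cpu_load: float,
                               cloud_available: bool,
                               load_threshold: float = 0.75) -> str:
    """Pick where the speech input is processed, as a hypothetical policy.

    Only already-processed data (never raw private speech) is forwarded to the
    cloud, in line with the privacy considerations discussed earlier.
    """
    if current_cpu_load < load_threshold or not cloud_available:
        return "process_on_vehicle"
    return "process_locally_then_send_context_to_cloud"

print(choose_processing_location(current_cpu_load=0.9, cloud_available=True))
```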

[0030] According to an embodiment of the present invention, the automatic conversational system 100 understands the user preferences 124. If the user prefers that the automatic conversational system 100 only be active when explicitly prompted, it can stop the continuous listening process and be active only when it detects the preset phrase. Thus, the present invention enables the selective use of the automatic conversational system 100.

[0031] Also, the controller 110 of the automatic conversational system 100 utilizes rules of inference and (higher order) propositional logic to understand and process the conversational context. Compared to language models, these are faster and lighter, allowing deployment on less sophisticated edge hardware.

[0032] Fig. 2 illustrates a method of operating the automatic conversational system, according to the present invention. The method comprises a plurality of steps, of which a step 202 comprises processing, through the speech module 112 of the controller 110, the speech input received from the at least one microphone 102. The speech or utterances in the defined environment are received by the at least one microphone 102. A step 204 comprises estimating, through the context module 114 of the controller 110, the context of the ongoing conversation identified/detected in the speech data obtained from the speech input. The context module 114 takes input from the context history 122, the user preference 124 and the conversation history 126, based on availability, in addition to the speech input. The speech module 112 also checks for known voices 128 in the speech input for better processing. The method is characterized by a step 206, which comprises storing, by the controller 110, the estimated context in a propositional logic format. A step 208 comprises determining, through the analyzer module 116 of the controller 110, at least one actionable context in the estimated context. The actionable context is compared with the predetermined list and categorized into the active actionable context and the passive actionable context. The method is executed by the controller 110.
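Putting steps 202 through 208 together, a highly simplified end-to-end sketch of the method could look like the following; every function body is a stand-in assumption for the corresponding module described above, not an actual implementation:

```python
def speech_module(audio: bytes) -> list[str]:
    """Step 202: speech processing and transcription (placeholder)."""
    return ["This summer, Bangalore is too hot.", "Even right now, we are sweating."]

def context_module(utterances: list[str]) -> set[str]:
    """Steps 204 and 206: estimate context and store it as propositional facts (placeholder)."""
    return {"heat(Bangalore)", "sweating(User1)"}

def analyzer_module(context: set[str]) -> list[str]:
    """Step 208: determine actionable contexts from the stored context (placeholder)."""
    return [fact for fact in context if fact.startswith("sweating")]

def run_pipeline(audio: bytes) -> list[str]:
    """Controller (110): run the whole method on one chunk of audio."""
    utterances = speech_module(audio)       # step 202
    context = context_module(utterances)    # steps 204 and 206
    return analyzer_module(context)         # step 208

print(run_pipeline(b"raw-microphone-samples"))
```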

[0033] According to the method, a step 210 comprises applying a rule of inference to the estimated context stored in the propositional logic format for determining the at least one actionable context. The rule of inference and the actionable context are dependent on the application domain. The application domain corresponds to the environment in which the conversational system 100 is deployed, such as the automotive domain, home, office, hospital, hospitality, etc. A step 212 comprises performing the action in relation to the determined actionable context.

[0034] According to the method, the action is at least one of joining the ongoing conversation, prompting the user 120 for input followed by performing the action, and performing the action followed by prompting the user 120 for confirmation/affirmation and rolling back if not confirmed. For example, if the conversational system 100 determines that the AC temperature is to be increased, then the controller 110 either prompts the user 120 about increasing the temperature and increases the temperature after confirmation, either through voice or a button, or increases the temperature first and then asks the user 120 if the temperature is fine. If the user response is negative, the controller 110 rolls back the temperature to the previously set temperature. This is just an example and is not to be understood in a limiting manner.
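The act-first-then-confirm-with-rollback behaviour described above could be sketched as follows; the AC interface here is a hypothetical stand-in for the control signal 118:

```python
class AirConditioner:
    """Hypothetical stand-in for the vehicle AC controlled via the control signal (118)."""
    def __init__(self, temperature: int = 24):
        self.temperature = temperature

    def set_temperature(self, value: int) -> None:
        self.temperature = value

def act_then_confirm(ac: AirConditioner, new_temperature: int, user_confirms: bool) -> None:
    """Perform the action first, then roll back if the user does not confirm."""
    previous = ac.temperature
    ac.set_temperature(new_temperature)        # perform the action immediately
    print(f"AC set to {new_temperature} degrees. Is that okay?")
    if not user_confirms:                      # negative response: roll back
        ac.set_temperature(previous)
        print(f"Rolled back to {previous} degrees.")

ac = AirConditioner()
act_then_confirm(ac, new_temperature=21, user_confirms=False)
```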

[0035] According to an embodiment of the present invention, the automatic conversational system 100 is used in a vehicle to provide more convenience to the driver or passengers. The conversational system 100 may also be referred to as a digital companion or virtual companion, which is more than a digital assistant in that the conversational system 100 is able to extract/derive and give more information for a detected or asked query. Again, as indicated above, the automatic conversational system 100 is applicable to different domains and environments such as home, office, hospital, airports, the hospitality industry and the like.

[0036] According to the present invention, the automatic conversational system 100 and method enable responding to intended prompts without wake-up keywords, using only the speech input. The present invention listens continuously, understands the actionable context and does not require keywords. The present invention goes beyond that by relying not on the syntax of the utterance, but on the context of the ongoing conversation and the utterance. The present invention uses intelligence to understand whether an utterance is intended for the automatic conversational system 100. Further, the automatic conversational system 100 provides flexibility to compute the context and actionability on the edge or on the cloud or both, without leaking data to the cloud while allowing continuous listening. Additionally, the ability to grasp the intended conversation without being explicitly told takes the current state of virtual assistants closer to the “human way” and to being a personal companion. The present invention facilitates understanding of “actionability” and immediate contribution to the ongoing conversation.

[0037] It should be understood that the embodiments explained in the description above are only illustrative and do not limit the scope of this invention. Many such embodiments and other modifications and changes in the embodiment explained in the description are envisaged. The scope of the invention is only limited by the scope of the claims.

Claims: We claim:
1. An automatic conversational system (100), said automatic conversational system (100) comprises:
at least one microphone (102) to receive speech input from at least one user (120), and
a controller (110) connected to said at least one microphone (102) and configured to,
process said speech input through a speech module (112);
estimate, through a context module (114), a context of a conversation identified in speech data obtained from said speech input, characterized in that,
store said estimated context in propositional logic, and
determine, through an analyzer module (116), actionable context in said stored estimated context.

2. The automatic conversational system (100) as claimed in claim 1, wherein said controller (110) applies rule of inference to said estimated context stored in propositional logic format to determine said actionable context.

3. The automatic conversational system (100) as claimed in claim 1, wherein said controller (110) configured to perform an action in relation to said determined actionable context, wherein said rule of inference, and said actionable context are dependent on application domain.

4. The automatic conversational system (100) as claimed in claim 1, wherein said actionable context is compared with a predetermined list and categorized into an active actionable context and a passive actionable context.

5. The automatic conversational system (100) as claimed in claim 1, wherein said context module (114) takes input from context history (122), user preference (124) and conversation history (126) in addition to said speech input.

6. A method for operating an automatic conversational system (100), said method comprising the steps of:
processing, through a speech module (112), a speech input received from at least one microphone (102);
estimating, through a context module (114), a context of an ongoing conversation identified in said speech data obtained from said speech input, characterized by,
storing said estimated context in propositional logic, and
determining, through an analyzer module (116), at least one actionable context in said estimated context.

7. The method as claimed in claim 6 comprises applying rule of inference to said estimated context stored in propositional logic for determining the at least one actionable context, wherein said rule of inference and said actionable context are dependent on application domain.

8. The method as claimed in claim 6 comprises performing an action in relation to said determined actionable context, said action is at least one of joining the ongoing conversation, prompting said user (120) for input followed by performing the action, and performing an action followed by prompting said user (120) for confirmation.

9. The method as claimed in claim 8, wherein said actionable context is compared with list of predetermined items and categorized into an active actionable context and a passive actionable context.

10. The method as claimed in claim 6, wherein said context module (114) takes input from a context history (122), a user preference (124) and conversation history (126) in addition to said speech input.

Documents

Application Documents

# Name Date
1 202341058667-POWER OF AUTHORITY [01-09-2023(online)].pdf 2023-09-01
2 202341058667-FORM 1 [01-09-2023(online)].pdf 2023-09-01
3 202341058667-DRAWINGS [01-09-2023(online)].pdf 2023-09-01
4 202341058667-DECLARATION OF INVENTORSHIP (FORM 5) [01-09-2023(online)].pdf 2023-09-01
5 202341058667-COMPLETE SPECIFICATION [01-09-2023(online)].pdf 2023-09-01
6 202341058667-Power of Attorney [29-08-2024(online)].pdf 2024-08-29
7 202341058667-Form 1 (Submitted on date of filing) [29-08-2024(online)].pdf 2024-08-29
8 202341058667-Covering Letter [29-08-2024(online)].pdf 2024-08-29