Abstract: A voice assistant system designed for understanding and executing commands provided in a plurality of languages and dialects. The system comprises a microphone for capturing spoken commands and a Natural Language Processing (NLP) unit utilizing a hybrid of Deep Neural Networks and Hidden Markov Models for converting speech into text while discerning regional accents and filtering ambient noise. The NLP unit includes a Natural Language Understanding component capable of interpreting elliptical and anaphoric expressions. A machine learning unit processes the textual commands to identify the user's intent and context through a sequence-to-sequence prediction model. Based on the derived intent and context, an execution module carries out the specified tasks. This invention represents a significant advancement in the field of interactive voice-responsive systems by improving accuracy, understanding, and operational capabilities across diverse linguistic environments.
Description:
ADVANCED VOICE ASSISTANT SYSTEM WITH CONTEXT-AWARE PROCESSING AND ADAPTIVE NOISE FILTERING
Field of the Invention
This disclosure relates to the field of voice assistant systems. Specifically, it concerns systems that interpret and act on voice commands across a range of languages and dialects, employing advanced Natural Language Processing (NLP) and machine learning techniques to understand and execute a variety of tasks.
Background
The background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
In the realm of human-computer interaction, voice assistant systems have emerged as a critical technology enabling users to interact with devices using natural language commands. These systems rely on sophisticated Natural Language Processing (NLP) and machine learning technologies to interpret and execute commands. Traditional systems often struggle with recognizing and understanding voice commands in various languages and dialects, particularly in noisy environments or when commands include regional accents, elliptical and anaphoric expressions. Moreover, the ability to discern the user's intent and context accurately from the spoken commands has been a persistent challenge, affecting the system's responsiveness and effectiveness.
Another well-acknowledged limitation of existing systems lies in their reliance on models that are not adequately equipped to handle the wide variability in human speech, including differences in pronunciation, syntax, and semantics across languages and dialects. Such systems may fail to accurately convert spoken commands into text, leading to misunderstandings or incorrect execution of tasks.
Furthermore, the application of machine learning in processing and understanding the context and intent behind voice commands has been traditionally constrained by models that do not adapt well to the unpredictability of natural language. This limitation often results in an inability to accurately execute tasks as intended by the user, especially when commands are complex or involve nuanced linguistic elements.
In light of the above discussion, there exists an urgent need for solutions that overcome the problems associated with conventional systems and/or techniques for understanding and executing voice commands across a diverse linguistic landscape.
Summary
The following presents a simplified summary of various aspects of this disclosure in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements nor delineate the scope of such aspects. Its purpose is to present some concepts of this disclosure in a simplified form as a prelude to the more detailed description that is presented later.
The following paragraphs provide additional support for the claims of the subject application.
In a first aspect, the present disclosure aims to provide a voice assistant system capable of interpreting and executing commands received in a multitude of languages and dialects. The system is equipped with a microphone to capture spoken commands, and a Natural Language Processing (NLP) unit that employs a sophisticated hybrid model, combining Deep Neural Networks with Hidden Markov Models, for translating these commands into text. The NLP unit is specially configured to identify regional accents, employ a speech recognition component with an acoustic model for filtering ambient noise, and use a Natural Language Understanding (NLU) component with a context-aware algorithm for accurately determining intent from elliptical and anaphoric expressions. A machine learning unit processes the converted text to ascertain the intent and context using a sequence-to-sequence prediction model. Tasks are then executed based on the identified intent and context by an execution module.
Further, the system introduces an acoustic model capable of adjusting its weight based on the level of detected environmental noise to enhance the clarity of voice commands. Additionally, the sequence-to-sequence prediction model benefits from a bidirectional LSTM architecture for improved prediction accuracy. The NLU component also conducts sentiment analysis to tailor responses in alignment with the perceived emotions of the user. The inclusion of a feedback loop within the NLP engine permits the continual refinement and retraining of the hybrid model, leveraging real-time user command inputs for system enhancement. Task prioritization and execution by the execution module are informed by the urgency and context derived from the user's spoken commands. Security is bolstered through an authentication module that utilizes voice biometrics to verify user identity, thereby safeguarding access to personalized tasks and information.
Brief Description of the Drawings
The features and advantages of the present disclosure would be more clearly understood from the following description taken in conjunction with the accompanying drawings in which:
FIG. 1 illustrates a block diagram of a voice assistant system (100), in accordance with the embodiments of the present disclosure.
FIG. 2 illustrates a method (200) for operating the voice assistant system (100), in accordance with the embodiments of the present disclosure.
FIG. 3 illustrates a flowchart outlining the operational sequence of a voice-activated system that utilizes a Natural Language Processing (NLP) engine.
Detailed Description
In the following detailed description of the invention, reference is made to the accompanying drawings that form a part hereof, and in which is shown, by way of illustration, specific embodiments in which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims and equivalents thereof.
The use of the terms “a” and “an” and “the” and “at least one” and similar referents in the context of describing the invention (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The use of the term “at least one” followed by a list of one or more items (for example, “at least one of A and B”) is to be construed to mean one item selected from the listed items (A or B) or any combination of two or more of the listed items (A and B), unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
Pursuant to the "Detailed Description" section herein, whenever an element is explicitly associated with a specific numeral for the first time, such association shall be deemed consistent and applicable throughout the entirety of the "Detailed Description" section, unless otherwise expressly stated or contradicted by the context.
The term "voice assistant system" as used throughout the present disclosure relates to a system designed to perform tasks in response to voice commands. This system integrates various components to understand and act upon user inputs delivered vocally across multiple languages and dialects.
The term "microphone" as used throughout the present disclosure relates to a device for receiving spoken commands. This microphone is capable of capturing audio across a plurality of languages and dialects, serving as the primary interface for user interaction with the voice assistant system.
The term "Natural Language Processing (NLP) unit" as used throughout the present disclosure relates to a component that employs a hybrid model combining Deep Neural Networks and Hidden Markov Models for converting spoken commands into textual format. This unit is further configured to discern regional accents, utilize a speech recognition component with an acoustic model tailored to filter ambient noise, and employ a Natural Language Understanding (NLU) component with a context-aware algorithm capable of handling elliptical and anaphoric expressions for intent determination.
The term "machine learning unit" as used throughout the present disclosure relates to a component that processes the textual format of spoken commands to extract the intent and context. This unit employs a sequence-to-sequence prediction model for accurately identifying the user's intentions and the context of the commands.
The term "execution module" as used throughout the present disclosure relates to a component that executes tasks based on the intent and context derived from the voice commands. This module ensures the effective implementation of actions as dictated by the user's inputs.
FIG. 1 illustrates a block diagram of a voice assistant system (100), in accordance with the embodiments of the present disclosure. The block diagram of a voice assistant system (100) is depicted for executing tasks based on voice commands. The system (100) includes a microphone (102), a Natural Language Processing (NLP) unit (104), a machine learning unit (106), and an execution module (108). The microphone (102) is configured to receive spoken commands across a plurality of languages and dialects. Said spoken commands are captured by said microphone (102) and conveyed to the NLP unit (104) for processing. Within the NLP unit (104), a hybrid model comprising Deep Neural Networks and Hidden Markov Models is utilized to convert the spoken commands into a textual format. Said hybrid model is further configured to discern regional accents and to employ a speech recognition component that uses an acoustic model tailored to filter ambient noise from the received spoken commands. Subsequent to the conversion of spoken commands into text, the NLP unit (104) analyzes the text utilizing a Natural Language Understanding (NLU) component. The machine learning unit (106), which is equipped with a sequence-to-sequence prediction model, processes the analyzed text to extract the intent and context. Finally, the execution module (108) is responsible for executing tasks that correlate with the extracted intent and derived context.
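By way of a non-limiting illustration, the sketch below shows one conventional way the hybrid DNN-HMM recognizer inside the NLP unit (104) could be realized: a neural network emits per-frame phone-state posteriors, these are converted into pseudo log-likelihoods by dividing out the state priors, and the state sequence is decoded against an HMM transition structure with the Viterbi algorithm. The function names, toy dimensions, and probabilities are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def posteriors_to_loglik(posteriors, state_priors, eps=1e-10):
    """Convert DNN state posteriors p(s|x) into pseudo log-likelihoods
    log p(x|s) ~ log p(s|x) - log p(s), the standard hybrid DNN-HMM step."""
    return np.log(posteriors + eps) - np.log(state_priors + eps)

def viterbi(log_emit, log_trans, log_init):
    """Most likely HMM state sequence given per-frame log emission scores."""
    T, S = log_emit.shape
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans   # score of moving prev -> cur
        back[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + log_emit[t]
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 3 HMM states, 5 frames of (synthetic) DNN posteriors.
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(3), size=5)          # each row sums to 1
priors = np.array([0.5, 0.3, 0.2])                      # state priors from training data
log_trans = np.log(np.array([[0.8, 0.2, 0.0],           # left-to-right topology
                             [0.0, 0.8, 0.2],
                             [0.0, 0.0, 1.0]]) + 1e-10)
log_init = np.log(np.array([1.0, 1e-10, 1e-10]))
print(viterbi(posteriors_to_loglik(posteriors, priors), log_trans, log_init))
```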
In an embodiment, the acoustic model within the Natural Language Processing (NLP) unit (104) is dynamically adjusted based on the detected environmental noise level to optimize voice command clarity. By modifying the weight of the acoustic model dynamically, the system ensures that the clarity of voice commands through the microphone (102) is maintained, significantly reducing errors in command recognition even in fluctuating noise environments. This adaptability enhances the system's performance, offering a robust solution to the challenge of voice command recognition in diverse acoustic settings.
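One plausible realization of this embodiment, sketched below under assumed thresholds, estimates the ambient noise level from frame energy and maps it to the acoustic-model weight used when combining acoustic and language-model scores during decoding. The decibel bounds and weight range are illustrative assumptions, not claimed values.

```python
import numpy as np

def estimate_noise_db(frame: np.ndarray, eps=1e-12) -> float:
    """Rough noise-level estimate: RMS energy of an audio frame in dBFS."""
    rms = np.sqrt(np.mean(frame.astype(np.float64) ** 2) + eps)
    return float(20.0 * np.log10(rms + eps))

def acoustic_weight(noise_db: float, w_min=0.6, w_max=1.4) -> float:
    """Map measured noise to an acoustic-model weight: the noisier the input,
    the less the decoder trusts the acoustic score relative to the language model."""
    quiet_db, loud_db = -60.0, -20.0          # illustrative calibration points
    t = np.clip((noise_db - quiet_db) / (loud_db - quiet_db), 0.0, 1.0)
    return float(w_max - t * (w_max - w_min))

def combined_score(acoustic_logp: float, lm_logp: float, noise_db: float) -> float:
    """Weighted log-linear combination used during decoding."""
    return acoustic_weight(noise_db) * acoustic_logp + lm_logp

# A quiet frame keeps a high acoustic weight; a loud frame lowers it.
quiet = 0.01 * np.random.default_rng(1).standard_normal(1600)
loud = 0.3 * np.random.default_rng(2).standard_normal(1600)
print(acoustic_weight(estimate_noise_db(quiet)), acoustic_weight(estimate_noise_db(loud)))
```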
In another embodiment, the sequence-to-sequence prediction model employed by the machine learning unit (106) utilizes a bidirectional Long Short-Term Memory (LSTM) architecture. The bidirectional LSTM architecture facilitates a comprehensive analysis of the textual data processed by the NLP unit (104), allowing for a more nuanced understanding of user intent and context. This architectural approach enhances the system's capability to accurately interpret and respond to complex commands, thereby improving the interaction experience for the user.
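A minimal PyTorch sketch of such a sequence-to-sequence model with a bidirectional LSTM encoder is given below. The vocabulary sizes, embedding and hidden dimensions, and the single-layer topology are illustrative assumptions rather than the claimed architecture.

```python
import torch
import torch.nn as nn

class Seq2SeqIntentModel(nn.Module):
    """Bidirectional-LSTM encoder feeding a unidirectional LSTM decoder."""
    def __init__(self, vocab_in=5000, vocab_out=200, emb=128, hidden=256):
        super().__init__()
        self.embed_in = nn.Embedding(vocab_in, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.embed_out = nn.Embedding(vocab_out, emb)
        # Decoder hidden size matches the concatenated forward+backward encoder state.
        self.decoder = nn.LSTM(emb, hidden * 2, batch_first=True)
        self.proj = nn.Linear(hidden * 2, vocab_out)

    def forward(self, src_ids, tgt_ids):
        _, (h, c) = self.encoder(self.embed_in(src_ids))
        # h, c: (2, batch, hidden) -> (1, batch, 2*hidden) by concatenating directions.
        h = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
        c = torch.cat([c[0], c[1]], dim=-1).unsqueeze(0)
        out, _ = self.decoder(self.embed_out(tgt_ids), (h, c))
        return self.proj(out)                  # (batch, tgt_len, vocab_out) logits

model = Seq2SeqIntentModel()
src = torch.randint(0, 5000, (2, 12))          # two tokenized commands
tgt = torch.randint(0, 200, (2, 6))            # target intent/slot token sequence
print(model(src, tgt).shape)                   # torch.Size([2, 6, 200])
```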
In a further embodiment, the Natural Language Understanding (NLU) component within the NLP unit (104) executes sentiment analysis to adjust responses based on perceived user emotions. This incorporation of sentiment analysis allows the system to personalize responses by aligning them with the emotional tone detected in the user's spoken commands received through the microphone (102). This level of response customization enhances user engagement and satisfaction, illustrating the system's advanced capacity for empathetic interaction.
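The sketch below illustrates the idea with a deliberately small lexicon-based sentiment score that selects a response tone; a deployed system would use a trained sentiment classifier, and the word lists and thresholds here are illustrative assumptions.

```python
NEGATIVE = {"angry", "annoyed", "terrible", "slow", "broken", "hate"}
POSITIVE = {"great", "thanks", "love", "awesome", "good", "please"}

def sentiment_score(text: str) -> float:
    """Tiny lexicon-based score in [-1, 1], used only to illustrate tone selection."""
    words = text.lower().split()
    hits = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(-1.0, min(1.0, hits / max(len(words), 1) * 5))

def styled_response(base: str, user_text: str) -> str:
    score = sentiment_score(user_text)
    if score < -0.2:                       # frustrated user: apologetic tone
        return f"Sorry about the trouble. {base}"
    if score > 0.2:                        # positive user: upbeat tone
        return f"Happy to help! {base}"
    return base

print(styled_response("The lights are now off.", "this app is so slow and broken"))
```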
In an additional embodiment, the NLP engine includes a feedback loop to refine and retrain the hybrid model using real-time user command inputs captured by the microphone (102). This continuous improvement mechanism leverages live interaction data to enhance the system's accuracy and responsiveness. By incorporating user feedback directly into the model's training process, the system remains adaptable and effective in meeting evolving user needs.
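A hypothetical sketch of such a feedback loop is shown below: corrected user inputs are buffered and, once enough examples accumulate, handed to whatever retraining routine the hybrid model exposes. The buffer sizes and the `retrain_fn` callable are illustrative assumptions.

```python
from collections import deque

class FeedbackLoop:
    """Collects live (input, corrected output) pairs and periodically retrains."""
    def __init__(self, retrain_fn, batch_size=32, max_buffer=1000):
        self.buffer = deque(maxlen=max_buffer)
        self.retrain_fn = retrain_fn          # update routine supplied by the caller
        self.batch_size = batch_size

    def record(self, features, hypothesis, user_correction):
        # Only keep examples where the user actually corrected the system.
        if user_correction and user_correction != hypothesis:
            self.buffer.append((features, user_correction))
        if len(self.buffer) >= self.batch_size:
            self.retrain_fn(list(self.buffer))   # incremental fine-tuning step
            self.buffer.clear()

loop = FeedbackLoop(retrain_fn=lambda batch: print(f"retraining on {len(batch)} examples"))
for _ in range(32):
    loop.record(features=[0.0], hypothesis="turn of lights", user_correction="turn off lights")
```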
In another embodiment, the execution module (108) is configured to prioritize and manage task execution based on urgency and context derived from the user's spoken commands captured by the microphone (102). This capability ensures that tasks deemed urgent by the system, based on the analysis performed by the machine learning unit (106), are executed promptly, optimizing the system's efficiency and responsiveness to the user's immediate needs.
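One way such prioritization could be realized, sketched below under assumed keyword and context rules, is to score each command for urgency and execute tasks from a priority queue.

```python
import heapq
import itertools

URGENT_WORDS = {"now", "immediately", "emergency", "urgent", "asap"}

def urgency(command_text: str, context: dict) -> int:
    """Lower number = higher priority; the rules here are illustrative assumptions."""
    score = 5
    if any(w in command_text.lower().split() for w in URGENT_WORDS):
        score -= 3
    if context.get("domain") == "safety":          # e.g. alarms, locks
        score -= 2
    return max(score, 0)

class ExecutionQueue:
    def __init__(self):
        self._heap, self._counter = [], itertools.count()   # counter breaks ties FIFO

    def submit(self, command_text, context, action):
        heapq.heappush(self._heap, (urgency(command_text, context), next(self._counter), action))

    def run_all(self):
        while self._heap:
            _, _, action = heapq.heappop(self._heap)
            action()

q = ExecutionQueue()
q.submit("play some music", {}, lambda: print("playing music"))
q.submit("call for help immediately", {"domain": "safety"}, lambda: print("calling for help"))
q.run_all()   # the urgent safety task runs first
```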
In a further embodiment, the system incorporates an authentication module to verify the identity of the user based on voice biometrics, securing access to personalized tasks and information. This module enhances the security framework of the system by utilizing unique voice characteristics to authenticate user identity, ensuring that sensitive data and tasks are accessible only to authorized individuals.
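A hedged sketch of voice-biometric verification is given below: enrolled speaker embeddings are compared against an incoming embedding by cosine similarity. The `embed_fn` placeholder, the similarity threshold, and the toy embedding function are illustrative assumptions, not the claimed authentication module.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

class VoiceAuthenticator:
    """Verifies a speaker by comparing voice embeddings against enrolled ones."""
    def __init__(self, embed_fn, threshold=0.75):
        self.embed_fn = embed_fn              # stand-in for a speaker-embedding model
        self.threshold = threshold
        self.enrolled = {}                    # user id -> mean enrollment embedding

    def enroll(self, user_id, audio_samples):
        self.enrolled[user_id] = np.mean([self.embed_fn(a) for a in audio_samples], axis=0)

    def verify(self, user_id, audio) -> bool:
        ref = self.enrolled.get(user_id)
        if ref is None:
            return False
        return cosine_similarity(ref, self.embed_fn(audio)) >= self.threshold

# Toy demo with a fake embedding function (hash-seeded random vectors).
def fake_embed(audio):
    rng = np.random.default_rng(abs(hash(audio)) % (2**32))
    return rng.standard_normal(64)

auth = VoiceAuthenticator(embed_fn=fake_embed)
auth.enroll("alice", ["alice_sample"])
print(auth.verify("alice", "alice_sample"), auth.verify("alice", "someone_else"))
```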
FIG. 2 illustrates a method (200) for operating the voice assistant system (100), in accordance with the embodiments of the present disclosure. The method (200) pertains to the operation of a voice assistant system (100) designed to execute tasks upon receiving voice commands. In step (202), the method (200) involves the reception of spoken commands through a microphone (102). The received commands are diverse, encapsulating a spectrum of languages and dialects. In step (204), these commands are processed in the Natural Language Processing (NLP) unit (104), which employs a hybrid model integrating Deep Neural Networks with Hidden Markov Models. Within said NLP unit (104), the hybrid model performs several functions: it converts the spoken commands into text, discerns regional accents to enhance comprehension accuracy, and utilizes a speech recognition component. This component is further refined by an acoustic model, which is specifically configured to diminish the impact of ambient noise, thereby augmenting the clarity of the spoken commands. In step (206), the converted text is analyzed using a Natural Language Understanding (NLU) component within the NLP unit (104), ensuring a sophisticated interpretation of the text. In step (208), a machine learning unit (106) processes the text to extract the user's intent and the context of the commands, employing a sequence-to-sequence prediction model for this purpose. The method (200) culminates in step (210), in which the execution module (108) implements tasks corresponding to the extracted intent and context. This execution confirms the system's capability to translate vocal user interactions into responsive actions, completing the operational cycle of the voice assistant system (100).
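The method (200) can be summarized as a linear pipeline. The following Python skeleton is a hypothetical sketch of that flow; the class names mirror the reference numerals, and every method body is a placeholder rather than the claimed implementation.

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    intent: str
    context: dict

class NLPUnit:                      # (104): speech -> text (steps 204-206)
    def transcribe(self, audio: bytes) -> str:
        # Placeholder for the hybrid DNN-HMM recognizer with noise filtering.
        return "turn on the living room lights"

class MachineLearningUnit:          # (106): text -> intent/context (step 208)
    def interpret(self, text: str) -> Interpretation:
        # Placeholder for the sequence-to-sequence prediction model.
        return Interpretation(intent="lights_on", context={"room": "living room"})

class ExecutionModule:              # (108): intent/context -> action (step 210)
    def execute(self, interp: Interpretation) -> str:
        return f"Executing {interp.intent} with context {interp.context}"

class VoiceAssistantSystem:         # (100)
    def __init__(self):
        self.nlp, self.ml, self.executor = NLPUnit(), MachineLearningUnit(), ExecutionModule()

    def handle(self, audio: bytes) -> str:
        text = self.nlp.transcribe(audio)       # steps 202-204: capture and convert
        interp = self.ml.interpret(text)        # steps 206-208: analyze, extract intent/context
        return self.executor.execute(interp)    # step 210: execute the task

print(VoiceAssistantSystem().handle(b"\x00\x01"))  # stand-in for microphone (102) audio
```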
In an embodiment, the present disclosure provides an AI voice assistant that utilizes advanced Natural Language Processing (NLP) techniques for understanding and responding to voice commands across various languages and dialects. This assistant integrates Speech Recognition, Natural Language Understanding (NLU), and Natural Language Generation (NLG) within a comprehensive NLP framework. Machine learning models, which are continually trained on diverse datasets, enhance the assistant's accuracy and adaptability. The AI voice assistant can be used across multiple devices and platforms, facilitating seamless digital interactions and making task management more accessible and efficient for users globally. This innovation not only improves the intuitiveness of interacting with technology but also significantly advances the capabilities of AI-driven voice assistants.
The AI voice assistant system provides various advantages, such as: (a) enhanced accuracy in voice recognition, enabling the assistant to understand commands in multiple languages and dialects with minimal errors; (b) sophisticated contextual understanding and response capabilities that allow the assistant to comprehend the intent behind user commands, ensuring relevant and precise interactions; (c) adaptability to user preferences, where the assistant learns from individual interactions to tailor responses based on user accents and speech patterns; (d) multilingual support that accommodates a diverse range of languages, thereby expanding accessibility to non-English speakers; and (e) versatile integration capabilities that allow the assistant to operate across different digital platforms and devices, providing a consistent user experience, among other advantages.
FIG. 3 illustrates a flowchart outlining the operational sequence of a voice-activated system that utilizes a Natural Language Processing (NLP) engine. The sequence begins with voice input received through a microphone. Following the initial voice input, the NLP engine converts the spoken words into text and analyzes that text to determine the user's intent. Once the user's intent is understood, the system moves to the next step, which involves recognizing the specific command given by the user. The recognition of the command triggers the execution phase, where the system performs the task corresponding to the recognized command. After successfully completing the task, the system conveys the result or output through a speaker to the user. The flowchart also indicates a loopback to the step where the system determines what the user needs. This looping is conditional and continues until the system receives a voice command to "exit." Essentially, this loop allows for continuous interaction between the user and the system until the user decides to terminate the session by issuing an "exit" command.
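The loop of FIG. 3 can be summarized by the short sketch below, in which scripted stand-ins replace the microphone, executor, and speaker, and the session terminates on an "exit" command. The helper names are illustrative assumptions.

```python
def recognize(text: str) -> str:
    """Placeholder command recognition; a real system would use the NLP/ML units."""
    return "exit" if "exit" in text.lower() else f"do:{text.strip()}"

def run_session(get_utterance, perform, speak):
    """Loop of FIG. 3: listen -> understand -> execute -> respond, until 'exit'."""
    while True:
        command = recognize(get_utterance())
        if command == "exit":
            speak("Goodbye.")
            break
        speak(perform(command))

# Scripted stand-ins for the microphone, executor, and speaker.
utterances = iter(["turn on the lights", "what's the weather", "exit"])
run_session(get_utterance=lambda: next(utterances),
            perform=lambda cmd: f"Done: {cmd[3:]}",
            speak=print)
```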
Example embodiments herein have been described above with reference to block diagrams and flowchart illustrations of methods and apparatuses. It will be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by various means including hardware, software, firmware, and a combination thereof. For example, in one embodiment, each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations can be implemented by computer program instructions. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks.
Throughout the present disclosure, the term ‘processing means’ or ‘microprocessor’ or ‘processor’ or ‘processors’ includes, but is not limited to, a general purpose processor (such as, for example, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a microprocessor implementing other types of instruction sets, or a microprocessor implementing a combination of types of instruction sets) or a specialized processor (such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), or a network processor).
The term “non-transitory storage device” or “storage” or “memory,” as used herein relates to a random access memory, read only memory and variants thereof, in which a computer can store data or software for any duration.
Operations in accordance with a variety of aspects of the disclosure described above need not be performed in the precise order described. Rather, various steps can be handled in reverse order, performed simultaneously, or omitted entirely.
While several implementations have been described and illustrated herein, a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein may be utilized, and each of such variations and/or modifications is deemed to be within the scope of the implementations described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be exemplary, and the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described herein. It is, therefore, to be understood that the foregoing implementations are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, implementations may be practiced otherwise than as specifically described and claimed. Implementations of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.
Claims
I/We claim:
1. A voice assistant system (100) for performing tasks based on voice commands, comprising:
a microphone (102) for receiving spoken commands in a plurality of languages and dialects;
a Natural Language Processing (NLP) unit (104) that utilizes a hybrid model combining Deep Neural Networks and Hidden Markov Models to convert the spoken commands into text, the NLP unit (104) being further configured to:
discern regional accents,
utilize a speech recognition component using an acoustic model tailored to filter ambient noise,
utilize a Natural Language Understanding (NLU) component with a context-aware algorithm capable of handling elliptical and anaphoric expressions for intent determination;
a machine learning unit (106) configured to process the text to extract intent and context by employing a sequence-to-sequence prediction model; and
an execution module (108) configured to execute the tasks based on the extracted intent and derived context.
2. The system (100) of claim 1, wherein the acoustic model dynamically adjusts its weight based on a detected environmental noise level to optimize voice command clarity.
3. The system (100) of claim 1, wherein the sequence-to-sequence prediction model utilizes a bidirectional LSTM architecture.
4. The system (100) of claim 1, wherein the NLU component executes a sentiment analysis to adjust responses based on perceived user emotions.
5. The system (100) of claim 1, additionally comprising a feedback loop within the NLP engine to refine and retrain the hybrid model using real-time user command inputs.
6. The system (100) of claim 1, wherein the execution module (108) prioritizes and manages task execution based on urgency and context derived from the user's spoken commands.
7. The system (100) of claim 1, further including an authentication module to verify the identity of the user based on voice biometrics to secure access to personalized tasks and information.
8. A method (200) for operating a voice assistant system (100) to perform tasks based on voice commands, the method (200) comprising the steps of:
Receiving spoken commands through a microphone (102), wherein the commands are delivered in a plurality of languages and dialects;
Processing the spoken commands in a Natural Language Processing (NLP) unit (104), wherein the NLP unit (104) utilizes a hybrid model that combines Deep Neural Networks and Hidden Markov Models to:
Convert the spoken commands into text;
Discern regional accents within the commands; and
Employ a speech recognition component that uses an acoustic model specifically tailored to filter ambient noise from the received spoken commands;
Analyzing the converted text using a Natural Language Understanding (NLU) component integrated within the NLP unit (104);
Extracting user intent and context from the analyzed text by processing it through a machine learning unit (106) that employs a sequence-to-sequence prediction model;
Executing tasks related to the extracted intent and context by an execution module (108).