Abstract: SYSTEM AND METHOD FOR PROVIDING USER-PERSONALISED VOICE COMMANDS TO A VOICE-CONTROLLED ELECTRONIC DEVICE WITH ATTENUATED AMBIENT NOISE. The present invention discloses a system and method for providing one or more user-personalised voice commands to a voice-controlled electronic device with attenuated ambient noise. The system (102) is configured to generate at least one user profile of one or more user profiles for each user of one or more users to store at least one of: one or more pre-defined voice instructions and voice frequency data associated with one or more audio signals received through one or more voice-capturing units (104). The system (102) attenuates the ambient noise by comparing a plurality of frequencies against one or more predefined ambient noise frequency data and generates one or more bandpass filters. The one or more bandpass filters isolate the one or more user-personalised voice commands associated with the at least one user profile for providing the one or more user-personalised voice commands to the voice-controlled electronic device (106). Figure 2A
EARLIEST PRIORITY DATE:
[0001]This application claims priority from a provisional patent application filed in India, having Patent Application No. 202441004916, filed on January 24, 2024, and titled “SYSTEM AND METHOD FOR CUSTOMISED VOICE-CONTROLLED ELECTRONIC DEVICES WITH A PERSON VOICE-BASED NOISE-REDUCTION”.
FIELD OF INVENTION
[0002]Embodiments of the present invention relate to voice-controlled electronic devices, and more particularly relate to a system and a method for providing one or more user-personalised voice commands to a voice-controlled electronic device with attenuated ambient noise.
BACKGROUND
[0003]Voice-controlled electronic devices have become increasingly prevalent in modern households and workplaces. These voice-controlled electronic devices facilitate hands-free operation and convenience, allowing one or more users to control various operations through voice commands. However, the widespread adoption of such technology has exposed several limitations and challenges in current implementations.
[0004]One significant drawback of the existing voice-controlled electronic devices is their inability to accurately recognize voice commands in noisy environments. Ambient noise from household appliances, street traffic, or multiple conversations may interfere with the voice commands, leading to misinterpretation of the voice commands or a failure to respond altogether. This issue is particularly pronounced in busy households or open-plan offices where multiple noise sources are present. Current noise reduction algorithms employed in voice-controlled systems often face limitations in adapting to diverse and dynamic noise scenarios, impacting their ability to distinguish between the desired signal and unwanted background noise.
[0005]Generic noise reduction approaches, commonly used in voice-controlled devices, lack the precision needed to cater to the individual preferences of users and the specific characteristics of different electronic devices and appliances with internal mechanical and other noises. The absence of customization in recognizing user-defined voice commands contributes to a gap in providing a truly personalized and efficient voice control experience. Moreover, the sustained performance and noise characteristics of electronic devices, household devices, and other such devices are affected over time by factors such as wear and tear of components and changes in environmental conditions. Existing technologies often struggle to maintain optimal noise reduction performance as the device ages.
[0006]In the existing technology, the landscape of noise reduction technologies for the voice-controlled electronic devices is marked by various approaches. Firstly, adaptive noise cancellation (ANC) employs a feedback mechanism to actively cancel external noise, making ANC effective in dynamic noise environments. However, ANC struggles when faced with highly specific or stationary noise sources, limiting its adaptability in certain scenarios. Secondly, noise suppression algorithms aim to filter out background noise during voice processing, providing general noise reduction. However, noise suppression algorithms lack the specificity required for recognizing personalized voice command profiles and may face challenges in handling diverse product environments.
[0007]Furthermore, echo cancellation technology specializes in eliminating echo effects during voice communication. However, echo cancellation technology is not well suited to handling complex noise scenarios in the voice-controlled electronic devices. Further, Digital Signal Processing (DSP) techniques are commonly employed for noise filtering and voice enhancement. While effective in certain applications, DSP does not fully address the challenges posed by personalized voice commands and the variability in product environments.
[0008]There are various technical problems with the noise-reduction systems in the prior art. In the existing technology, the noise-reduction system fails to adapt to the specific preferences of the users and the unique control commands of the users, resulting in suboptimal performance. The noise-reduction system may struggle to differentiate between specific noises of the electronic devices and environmental sounds, leading to reduced accuracy in recognizing voices or specific sounds (such as a baby's cry or a gunshot). The noise-reduction/cancellation system may not adapt well to changes in the noise environment and fails to differentiate between dynamic noise sources and stationary noise sources. The noise-reduction system may be complex and expensive to implement, thereby making the noise-reduction system less practical for widespread adoption. The noise-reduction system may not adequately adjust to changing noise conditions in various environments, impacting overall efficiency and the user experience.
[0009]Therefore, there is a need for a system to address the aforementioned issues by providing an innovative and comprehensive solution that tackles the limitations of existing noise reduction technologies in the voice-controlled electronic devices.
SUMMARY
[0010]This summary is provided to introduce a selection of concepts, in a simple manner, which is further described in the detailed description of the disclosure. This summary is neither intended to identify key or essential inventive concepts of the subject matter nor to determine the scope of the disclosure.
[0011]In order to overcome the above deficiencies of the prior art, the present disclosure solves the technical problem by providing a system and a method for providing one or more user-personalised voice commands to a voice-controlled electronic device with attenuated ambient noise.
[0012]In accordance with an embodiment of the present disclosure, the system comprises one or more voice-capturing units, one or more hardware processors, and a memory unit. The one or more voice-capturing units are operatively connected to the voice-controlled electronic device. The one or more voice-capturing units are configured to receive one or more audio signals from at least one of: each user of one or more users and one or more noise-generating objects.
[0013]In an embodiment, the memory unit is operatively connected to the one or more hardware processors. The memory unit comprises a set of computer-readable instructions in the form of a plurality of subsystems. The plurality of subsystems is configured to be executed by the one or more hardware processors. The plurality of subsystems comprises a user profile generation subsystem, a data-obtaining subsystem, an audio signals processing subsystem, a noise reduction subsystem, a voice profile extraction subsystem, a command extraction subsystem, and a command executing subsystem.
[0014]In an embodiment, the user profile generation subsystem is configured to generate at least one user profile of one or more user profiles for each user of the one or more users. The one or more user profiles are configured to store at least one of: one or more pre-defined voice instructions and voice frequency data associated with the one or more audio signals. The one or more pre-defined voice instructions are received during an initial configuration of the voice-controlled electronic device through the one or more voice-capturing units. The one or more pre-defined voice instructions are analysed and processed in the audio signals processing subsystem and noise reduction subsystem to obtain the voice frequency data associated with the one or more audio signals. The voice frequency data comprises at least one of: pitch, tone, and speech patterns of each user of the one or more users.
[0015]In an embodiment, the data-obtaining subsystem is configured to obtain at least one of: the received one or more audio signals from the one or more voice-capturing units and one or more predefined ambient noise frequency data for storing in one or more databases. The data-obtaining subsystem is configured to obtain the one or more audio signals from the one or more voice-capturing units in real time and transfer them to the audio signals processing subsystem and the noise reduction subsystem for attenuating the ambient noise. The one or more predefined ambient noise frequency data are obtained based on a pre-defined time period to perform a periodic re-calibration.
[0016]In an embodiment, the audio signals processing subsystem is configured to convert the one or more audio signals into one or more individual spectral components to extract a plurality of frequencies in the one or more audio signals. The audio signals processing subsystem is configured with a Fast Fourier Transform (FFT) model to determine the plurality of frequencies in the one or more audio signals. The audio signals processing subsystem is configured to process the one or more audio signals with a real-time processing latency between 100 milliseconds and 2 seconds.
[0017]In an embodiment, the noise reduction subsystem is configured to compare the extracted plurality of frequencies against the one or more predefined ambient noise frequency data by one or more noise reduction models for segregating the one or more user-personalised voice commands from the ambient noise in the processed one or more audio signals to attenuate the ambient noise. The noise reduction subsystem is configured to generate one or more bandpass filters to isolate the one or more user-personalised voice commands associated with the at least one user profile of the one or more user profiles by analysing one or more frequency characteristics of the one or more user-personalised voice commands. The noise reduction subsystem is configured to perform a periodic re-calibration to the one or more noise reduction models by updating the one or more predefined ambient noise frequency data at the pre-defined time period. The pre-defined time period for performing the periodic re-calibrations is between 24 hours and 365 days. The periodic re-calibration of the one or more noise reduction models optimises the one or more bandpass filters. The optimised one or more bandpass filters are configured to refine the isolation of the one or more user-personalised voice commands.
[0018]In an embodiment, the voice profile extraction subsystem is configured to compare the one or more frequency characteristics of each bandpass filter of the one or more bandpass filters against the voice frequency data stored in the one or more user profiles to extract the at least one user profile. The voice profile extraction subsystem is configured to perform a correlation analysis between the one or more frequency characteristics of each bandpass filter of the one or more bandpass filters against the voice frequency data for refining the extraction of the at least one user profile to optimise accuracy in recognition of the one or more user-personalised voice commands.
[0019]In an embodiment, the command extraction subsystem is configured to extract at least one pre-defined voice instruction of the one or more pre-defined voice instructions from the extracted at least one user profile based on the isolated one or more user-personalised voice commands, for providing the one or more user-personalised voice commands to the voice-controlled electronic device with the attenuated ambient noise. The command extraction subsystem is operatively connected to the command executing subsystem. The command executing subsystem is configured to execute the extracted at least one pre-defined voice instruction by adopting a voice wake model for actuating the voice-controlled electronic device.
[0020]In accordance with another embodiment of the present disclosure, a method for providing the one or more user-personalised voice commands to the voice-controlled electronic device with the attenuated ambient noise is disclosed. In the first step, the method includes receiving, by the one or more voice-capturing units, the one or more audio signals from at least one of: each user of the one or more users and the one or more noise-generating objects. In the next step, the method includes generating, by the one or more hardware processors through the user profile generation subsystem, at least one user profile of the one or more user profiles for each user of the one or more users to store at least one of: the one or more pre-defined voice instructions and the voice frequency data associated with the one or more audio signals. In the next step, the method includes obtaining, by the one or more hardware processors through the data-obtaining subsystem, at least one of: the received one or more audio signals from the one or more voice-capturing units and the one or more predefined ambient noise frequency data for storing in the one or more databases.
[0021]In the next step, the method includes converting, by the one or more hardware processors through the audio signals processing subsystem, the one or more audio signals into one or more individual spectral components to extract the plurality of frequencies in the one or more audio signals. In the next step, the method includes comparing, by the one or more hardware processors through the noise reduction subsystem, the extracted plurality of frequencies against the one or more predefined ambient noise frequency data by the one or more noise reduction models for segregating the one or more user-personalised voice commands from the ambient noise in the processed one or more audio signals to attenuate the ambient noise.
[0022]In the next step, the method includes generating, by the one or more hardware processors through the noise reduction subsystem, the one or more bandpass filters to isolate the one or more user-personalised voice commands associated with the at least one user profile of the one or more user profiles by analysing the one or more frequency characteristics of the one or more user-personalised voice commands. In the next step, the method includes comparing, by the one or more hardware processors through the voice profile extraction subsystem, the one or more frequency characteristics of each bandpass filter of the one or more bandpass filters against the voice frequency data stored in the one or more user profiles to extract the at least one user profile.
[0023]In the next step, the method includes extracting, by the one or more hardware processors through the command extraction subsystem, the at least one pre-defined voice instruction of the one or more pre-defined voice instructions from the extracted at least one user profile based on the isolated one or more user-personalised voice commands. In the next step, the method includes providing, by the one or more hardware processors through the command executing subsystem, the one or more user-personalised voice commands to the voice-controlled electronic device with the attenuated ambient noise to execute the extracted at least one pre-defined voice instruction. In the next step, the method includes performing, by the one or more hardware processors through the noise reduction subsystem, the periodic re-calibration to the one or more noise reduction models by updating the one or more predefined ambient noise frequency data at the pre-defined time period. The pre-defined time period for performing the periodic re-calibrations is between 24 hours and 365 days.
[0024]To further clarify the advantages and features of the present invention, a more particular description of the invention will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the invention and are therefore not to be considered limiting in scope. The invention will be described and explained with additional specificity and detail with the appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025]The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:
[0026]Figure 1 illustrates an exemplary block diagram representation of a network architecture of a system for providing one or more user-personalised voice commands to a voice-controlled electronic device with an attenuated ambient noise, in accordance with an embodiment of the present disclosure;
[0027]Figure 2A illustrates an exemplary block diagram of the system as shown in Figure 1 for providing the one or more user-personalised voice commands to the voice-controlled electronic device with the attenuated ambient noise, in accordance with an embodiment of the present disclosure;
[0028]Figure 2B illustrates an exemplary schematic diagram depicting the system for providing the one or more user-personalised voice commands to the voice-controlled electronic device with the attenuated ambient noise, in accordance with an embodiment of the present disclosure;
[0029]Figure 3 illustrates an exemplary schematic diagram depicting an initial configuration of the voice-controlled electronic device for generating one or more user profiles, in accordance with an embodiment of the present disclosure;
[0030]Figure 4 illustrates an exemplary schematic diagram depicting a periodic re-calibration for updating one or more predefined ambient noise frequency data, in accordance with an embodiment of the present disclosure;
[0031]Figure 5 illustrates an exemplary schematic diagram depicting a use case of the system, in accordance with an embodiment of the present disclosure; and
[0032]Figure 6 illustrates an exemplary flow chart of a method for providing the one or more user-personalised voice commands to the voice-controlled electronic device with the attenuated ambient noise, in accordance with an embodiment of the present disclosure.
[0033]Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not necessarily have been drawn to scale. Furthermore, the method steps, equipment, and parameters used herein may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure, so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
[0034]For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.
[0035]The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more components, compounds, and ingredients preceded by "comprises... a" do not, without more constraints, preclude the existence of other components, compounds, ingredients, or additional components. Appearances of the phrases "in an embodiment", "in another embodiment", and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
[0036]Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.
[0037]In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
[0038]Embodiments of the present disclosure relate to a system and a method for providing one or more user-personalised voice commands to a voice-controlled electronic device with attenuated ambient noise.
[0039]As used herein, the term "voice-controlled electronic devices" refers to electronic devices that are designed to receive, interpret, and respond to voice commands for various functionalities. Examples of voice-controlled electronic devices include, but are not limited to: refrigerators, ovens, washing machines, air conditioners, fans, and the like.
[0040]Embodiments of the present invention relate to a system 102 for providing one or more user-personalised voice commands to a voice-controlled electronic device 106 with attenuated ambient noise. Providing the one or more user-personalised voice commands with the attenuated ambient noise means that the noise reduction is personalized to the unique characteristics of each user's voice. This suggests that the system 102 may utilize a voice profile of each user of one or more users to enhance noise reduction.
[0041]Figure 1 illustrates an exemplary block diagram representation of a network architecture 100 of the system 102 for providing one or more user-personalised voice commands to the voice-controlled electronic device 106 with an attenuated ambient noise, in accordance with an embodiment of the present disclosure.
[0042]According to an exemplary embodiment of the present disclosure, the network architecture 100 may include the system 102, one or more databases 114, and one or more communication devices 116. The one or more databases 114 and the one or more communication devices 116 may be communicatively coupled via one or more communication networks 118, ensuring seamless data transmission, processing, and attenuation of the ambient noise. The system 102 acts as the central processing unit within the network architecture 100, responsible for generating one or more user profiles, and for analysing and attenuating the ambient noise from one or more audio signals received through one or more voice-capturing units 104. The system 102 is configured to execute a set of computer-readable instructions that control a plurality of subsystems 112, enabling the system 102 to attenuate the ambient noise from the one or more audio signals associated with the voice profile of each user of the one or more users.
[0043]In an exemplary embodiment, the one or more voice-capturing units 104 are operatively connected to the voice-controlled electronic device 106, ensuring seamless communication between these components. The one or more voice-capturing units 104 serve as the primary interface for receiving the one or more audio signals, which may include at least one of, but not limited to: the one or more user-personalised voice commands and the ambient noise. Specifically, the one or more voice-capturing units 104 are configured to receive the one or more audio signals from at least two primary sources: (i) each user of the one or more users, and (ii) one or more noise-generating objects. The one or more voice-capturing units 104 are capable of capturing a broad range of audio frequencies, including those associated with human speech, such as the one or more user-personalised voice commands, as well as the ambient noises. The system 102 is configured to operate in various environments, where background noise from the one or more noise-generating objects, such as, but not limited to, fans, air conditioners, other household appliances, traffic sounds, sounds of animals, and the like, might interfere with clear reception of the one or more user-personalised voice commands from the one or more users. To accomplish this, the one or more voice-capturing units 104 may comprise, but are not limited to, at least one of: microphones, headsets with microphones, smart speakers, wearable devices, voice-activated units, smartphone microphones, and the like.
[0044]In an exemplary embodiment, the one or more databases 114 may be configured to store and manage data related to various aspects of the system 102. The one or more databases 114 may store at least one of: the received one or more audio signals, one or more predefined ambient noise frequency data, the one or more user profiles, one or more pre-defined voice instructions, voice frequency data, outcomes of command instructions, and the like. The one or more databases 114 may store the received one or more audio signals as raw audio signals captured by the one or more voice-capturing units 104. These raw audio signals represent both the one or more user-personalised voice commands and any surrounding environmental noise. Storing these raw audio signals allows the system 102 to perform subsequent processing and analysis to differentiate between the one or more user-personalised voice commands and irrelevant noise. A section of the one or more databases 114 contains the one or more predefined ambient noise frequency data, which corresponds to diverse types of ambient noise that the system 102 may encounter. Examples of such ambient noise include background conversations, machine noises, appliance noise, and other environmental sounds that could interfere with the one or more user-personalised voice commands. By storing the one or more predefined ambient noise frequency data, the system 102 may more effectively apply noise reduction techniques, thereby enhancing the clarity and accuracy of the one or more user-personalised voice commands received by the voice-controlled electronic device 106.
[0045]Each user of the system 102 may have at least one individual user profile stored in the one or more databases 114. The one or more user profiles include personalized information such as the user's voice characteristics, the preferred one or more user-personalised voice commands, and any associated frequency data. The one or more user profiles enable the system 102 to recognize and respond to the one or more user-personalised voice commands in a personalized manner, ensuring that the system 102 accurately distinguishes between different users within the one or more users. The system 102 stores the one or more pre-defined voice instructions, which are pre-configured by each user during an initial configuration of the voice-controlled electronic device 106. The one or more pre-defined voice instructions are linked to corresponding actions or functions that the voice-controlled electronic device 106 is expected to perform. By storing the one or more pre-defined voice instructions, the system 102 may quickly retrieve and execute commands based on each user's voice input, facilitating seamless interaction with the voice-controlled electronic device 106.
[0046]The system 102 also stores the one or more user-personalised voice commands, which may be pre-configured by the one or more users and mapped to the corresponding one or more pre-defined voice instructions stored in the one or more databases 114. The voice frequency data stored in the one or more databases 114 includes detailed spectral information related to the one or more user-personalised voice commands associated with each user of the one or more users. The voice frequency data is critical for the system 102 to analyse and optimize the recognition and isolation of the one or more user-personalised voice commands from the background noise. The frequency data helps in the fine-tuning of one or more bandpass filters and one or more noise reduction models, ensuring precise execution of the one or more user-personalised voice commands.
[0047]The one or more databases 114 also store the outcomes or results of the one or more user-personalised voice commands executed by the system 102. These outcomes of command instructions include whether the one or more user-personalised voice commands were successfully executed, any errors encountered, and how the voice-controlled electronic device 106 responded. Storing these outcomes of command instructions allows the system 102 to learn from past interactions, potentially improving the accuracy and reliability of future one or more user-personalised voice commands.
[0048]The one or more databases 114 may include diverse types of databases such as relational databases (e.g., Structured Query Language (SQL) databases such as PostgreSQL), non-Structured Query Language (NoSQL) databases (e.g., MongoDB, Cassandra), time-series databases (e.g., InfluxDB), an OpenSearch database, and object storage systems (e.g., Amazon S3). The relational databases may be used to store structured data comprising, but not limited to, logs, event details, and metadata. The NoSQL databases may be employed to store unstructured and semi-structured data such as user activity data, network traffic details, and application behaviour logs. The time-series databases are ideal for storing time-stamped data, such as process activities, network connections, and other time-dependent events. The object storage systems may be utilized for storing large volumes of binary data, including files from the Fast Fourier Transform (FFT) model and the one or more noise reduction models. Additionally, the one or more databases 114 may implement advanced indexing, partitioning, and replication techniques to ensure high availability, scalability, and quick access to the data. The one or more databases 114 may also support various security features such as encryption, access control, and regular backups to protect sensitive information and ensure data integrity within the system 102.
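By way of illustration only, the following is a minimal sketch of how the one or more user profiles and the one or more predefined ambient noise frequency data might be laid out in a relational store; the table and column names are assumptions introduced here for clarity and are not part of the disclosure.

```python
# Illustrative relational layout for profiles and ambient-noise data;
# schema names are hypothetical, not from the disclosure.
import sqlite3

conn = sqlite3.connect("voice_system.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS user_profiles (
    user_id        INTEGER PRIMARY KEY,
    name           TEXT,
    pitch_hz       REAL,   -- average fundamental frequency
    tone_features  BLOB,   -- serialised spectral descriptors
    speech_rate    REAL    -- e.g. syllables per second
);
CREATE TABLE IF NOT EXISTS ambient_noise_profiles (
    noise_id       INTEGER PRIMARY KEY,
    label          TEXT,   -- e.g. 'ceiling fan', 'traffic'
    centre_freq_hz REAL,
    bandwidth_hz   REAL,
    updated_at     TEXT    -- supports the periodic re-calibration
);
""")
conn.commit()
```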
[0049]In an exemplary embodiment, the one or more communication devices 116 are configured to be used as the one or more voice-capturing units 104 for receiving the one or more audio signals from the one or more users and the one or more noise-generating objects. The one or more communication devices 116 may be digital devices, computing devices, and/or networks. The one or more communication devices 116 may include, but are not limited to, a mobile device, a smartphone, a personal digital assistant (PDA), a tablet computer, a phablet computer, a wearable computing device, a virtual reality/augmented reality (VR/AR) device, a laptop, a desktop, and the like. For instance, the mobile device utilizes its built-in microphone array to capture the one or more audio signals and provide the one or more user-personalised voice commands to the voice-controlled electronic device 106.
[0050]In an exemplary embodiment, the one or more communication networks 118 may be, but are not limited to, a wired communication network and/or a wireless communication network, a local area network (LAN), a wide area network (WAN), a Wireless Local Area Network (WLAN), a metropolitan area network (MAN), a telephone network, such as the Public Switched Telephone Network (PSTN) or a cellular network, an intranet, the Internet, a fibre optic network, a satellite network, a cloud computing network, or a combination of networks. The wired communication network may comprise, but not limited to, at least one of: Ethernet connections, Fiber Optics, Power Line Communications (PLCs), Serial Communications, Coaxial Cables, Quantum Communication, Advanced Fiber Optics, Hybrid Networks, and the like. The wireless communication network may comprise, but not limited to, at least one of: wireless fidelity (Wi-Fi), cellular networks (including fourth generation (4G) technologies and fifth generation (5G) technologies), Bluetooth, ZigBee, long-range wide area network (LoRaWAN), satellite communication, radio frequency identification (RFID), sixth generation (6G) networks, advanced IoT protocols, mesh networks, non-terrestrial networks (NTNs), near field communication (NFC), and the like.
[0051]In an exemplary embodiment, the system 102 comprises one or more hardware processors 108. The one or more hardware processors 108 may comprise a combination of discrete components, an integrated circuit, an application-specific integrated circuit, a field-programmable gate array, a digital signal processor, or other suitable hardware. The "software" may comprise one or more objects, agents, threads, lines of code, subroutines, separate software applications, two or more lines of code, or other suitable software structures operating in one or more software applications or on the one or more hardware processors 108. The one or more hardware processors 108 are operatively connected to each voice-capturing unit 104 of the one or more voice-capturing units 104 and a memory unit 110. The memory unit 110 is operatively connected to the one or more hardware processors 108. The memory unit 110 comprises a set of computer-readable instructions in the form of the plurality of subsystems 112, configured to be executed by the one or more hardware processors 108.
[0052]The one or more hardware processors 108 may include, for example, microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any devices that manipulate data or signals based on operational instructions. Among other capabilities, the one or more hardware processors 108 may fetch and execute computer-readable instructions in the memory unit 110 operationally coupled with the system 102 for performing tasks such as data processing, input/output processing, and/or any other functions. Any reference to a task in the present disclosure may refer to an operation being or that may be performed on data. The one or more hardware processors 108 are high-performance processors capable of handling large volumes of data and complex computations. The one or more hardware processors 108 may be, but are not limited to, at least one of: multi-core central processing units (CPUs), graphics processing units (GPUs), and specialized Artificial Intelligence (AI) accelerators that enhance the ability of the system 102 to process the one or more audio signals from the one or more voice-capturing units 104 in real time.
[0053]In an exemplary embodiment, the system 102 may be implemented by way of a single device or a combination of multiple devices that may be operatively connected or networked together. The system 102 may be implemented in hardware or a suitable combination of hardware and software.
[0054]Though few components and the plurality of subsystems 112 are disclosed in Figure 1, there may be additional components and subsystems which are not shown, such as, but not limited to: ports, routers, repeaters, firewall devices, network devices, the one or more databases 114, network-attached storage devices, assets, machinery, instruments, facility equipment, emergency management devices, image-capturing devices, any other devices, and combinations thereof. A person skilled in the art should not construe the components/subsystems shown in Figure 1 as limiting. Although Figure 1 illustrates the system 102 and the one or more communication devices 116 connected to the one or more databases 114, one skilled in the art can envision that the system 102 and the one or more communication devices 116 may be connected to several user devices located at various locations and to several databases via the one or more communication networks 118.
[0055]Those of ordinary skill in the art will appreciate that the hardware depicted in Figure 1 may vary for particular implementations. For example, other peripheral devices, such as an optical disk drive and the like, the local area network (LAN), the wide area network (WAN), a wireless (e.g., wireless-fidelity (Wi-Fi)) adapter, a graphics adapter, a disk controller, or an input/output (I/O) adapter, may also be used in addition to or in place of the hardware depicted. The depicted example is provided for explanation only and is not meant to imply architectural limitations concerning the present disclosure.
[0056]Those skilled in the art will recognize that, for simplicity and clarity, the full structure and operation of all data processing systems suitable for use with the present disclosure are not being depicted or described herein. Instead, only so much of the system 102 as is unique to the present disclosure or necessary for an understanding of the present disclosure is depicted and described. The remainder of the construction and operation of the system 102 may conform to any of the various current implementations and practices known in the art.
[0057]Figure 2A illustrates an exemplary block diagram 200A of the system 102 as shown in Figure 1 for providing the one or more user-personalised voice commands to the voice-controlled electronic device 106 with the attenuated ambient noise, in accordance with an embodiment of the present disclosure.
[0058]Figure 2B illustrates an exemplary schematic diagram 200B depicting the system 102 for providing the one or more user-personalised voice commands to the voice-controlled electronic device 106 with the attenuated ambient noise, in accordance with an embodiment of the present disclosure.
[0059]In an exemplary embodiment, the system 102 comprises the one or more hardware processors 108, the memory unit 110, and a storage unit 204. The one or more hardware processors 108, the memory unit 110, and the storage unit 204 are communicatively coupled through a system bus 202 or any similar mechanism. The system bus 202 functions as the central conduit for data transfer and communication between the one or more hardware processors 108, the memory unit 110, and the storage unit 204. The system bus 202 facilitates the efficient exchange of information and instructions, enabling the coordinated operation of the system 102. The system bus 202 may be implemented using various technologies, including but not limited to, parallel buses, serial buses, or high-speed data transfer interfaces such as, but not limited to, at least one of a: universal serial bus (USB), peripheral component interconnect express (PCIe), and similar standards.
[0060]In an exemplary embodiment, the memory unit 110 is operatively connected to the one or more hardware processors 108. The memory unit 110 comprises the plurality of subsystems 112 in the form of programmable instructions executable by the one or more hardware processors 108. The plurality of subsystems 112 comprises a user profile generation subsystem 206, a data-obtaining subsystem 208, an audio signals processing subsystem 210, a noise reduction subsystem 212, a voice profile extraction subsystem 214, a command extraction subsystem 216, and a command executing subsystem 218. The one or more hardware processors 108, as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor unit, microcontroller, complex instruction set computing microprocessor unit, reduced instruction set computing microprocessor unit, very long instruction word microprocessor unit, explicitly parallel instruction computing microprocessor unit, graphics processing unit, digital signal processing unit, or any other type of processing circuit. The one or more hardware processors 108 may also include embedded controllers, such as generic or programmable logic devices or arrays, application-specific integrated circuits, single-chip computers, and the like.
[0061]The memory unit 110 may include non-transitory volatile memory and non-volatile memory. The memory unit 110 may be coupled to communicate with the one or more hardware processors 108, such as being a computer-readable storage medium. The one or more hardware processors 108 may execute machine-readable instructions and/or source code stored in the memory unit 110. A variety of machine-readable instructions may be stored in and accessed from the memory unit 110. The memory unit 110 may include any suitable elements for storing data and machine-readable instructions, such as read-only memory, random access memory, erasable programmable read-only memory, electrically erasable programmable read-only memory, a hard drive, a removable media drive for handling compact disks, digital video disks, diskettes, magnetic tape cartridges, memory cards, and the like. In the present embodiment, the memory unit 110 includes the plurality of subsystems 112 stored in the form of machine-readable instructions on any of the above-mentioned storage media and may be in communication with and executed by the one or more hardware processors 108.
[0062]The storage unit 204 may be a cloud storage or the one or more databases 114 such as those shown in Figure 1. The storage unit 204 may store, but not limited to, recommended course of action sequences dynamically generated by the system 102. These action sequences are based on at least one of: the one or more user profiles generation, data-obtaining, audio signals processing, attenuating the ambient noise, re-calibrating, voice profile extracting, one or more pre-defined voice instructions extracting, executing the one or more pre-defined voice instructions, and the like. The storage unit 204 ensures that the action sequences are readily accessible for analysis and implementation. By storing this information, the system 102 provides the one or more user-personalised voice commands to the voice-controlled electronic device 106 with the attenuated ambient noise. The storage unit 204 may also store historical data related to, but not limited to, at least one of: the one or more pre-defined voice instructions and the voice frequency data of the one or more users, the one or more predefined ambient noise frequency data, and the like. Additionally, the storage unit 204 can retain configuration settings, one or more pre-defined voice instructions, and the like, ensuring that the system 102 operates consistently to attenuate the ambient noise. The storage unit 204 may be any kind of database such as, but not limited to, relational databases, dedicated databases, dynamic databases, monetized databases, scalable databases, cloud databases, distributed databases, any other databases, and a combination thereof.
[0063]In an exemplary embodiment, the user profile generation subsystem 206 is configured to generate at least one user profile of the one or more user profiles for each user of the one or more users. The one or more user profiles are configured to store at least one of: the one or more pre-defined voice instructions and the voice frequency data associated with the one or more audio signals. The one or more pre-defined voice instructions are received during an initial configuration of the voice-controlled electronic device 106 through the one or more voice-capturing units 104.
[0064]During an initial configuration phase, each user of the one or more users may be prompted to provide specific voice commands of the one or more user-personalised voice commands, which are intended for future use in operating the voice-controlled electronic device 106. The one or more user-personalised voice commands are carefully captured by the one or more voice-capturing units 104 to ensure accuracy and clarity. The received one or more user-personalised voice commands in the one or more audio signals are then analysed within the system 102 to determine distinct voice characteristics that are unique to each user. These characteristics are essential in ensuring that the system 102 may accurately recognize and respond to voice commands in real-world, noisy environments.
[0065]The one or more pre-defined voice instructions are mapped with the one or more user-personalised voice commands and are then processed within the audio signals processing subsystem 210 and the noise reduction subsystem 212. The one or more audio signals with the one or more user-personalised voice commands are processed in the audio signals processing subsystem 210. The audio signals processing subsystem 210 is responsible for converting the captured audio signals into individual spectral components, which allows for a detailed analysis of the voice frequencies. This conversion helps in isolating specific frequency bands that are characteristic of each user's voice. The noise reduction subsystem 212 then plays a critical role in filtering out ambient noise and other irrelevant audio elements, ensuring that the one or more user-personalised voice commands are captured in their purest form. This process is crucial in environments where background noise is prevalent, such as crowded areas or homes with multiple noise-generating devices.
[0066]The voice frequency data obtained through the noise reduction subsystem 212 comprises key attributes of each user's voice, including at least one of: pitch, tone, and speech patterns. The pitch refers to the perceived frequency of the voice, which varies from user to user and is a critical factor in distinguishing between different voices. The tone encompasses the quality or character of the voice, influenced by factors such as emotion or emphasis. The speech patterns refer to the unique rhythmic and intonational characteristics of a user's speech, which may include factors like the speed of talking, the use of pauses, and the emphasis on certain syllables or words by each user. This voice frequency data is then stored in the corresponding user profile within the user profile generation subsystem 206. By storing this data, the system 102 enables the voice-controlled electronic device 106 to recognize and respond to voice commands from each user of the one or more users, even in challenging acoustic environments accurately and consistently. The creation and refinement of the one or more user profiles ensure that the system 102 is personalized and optimized for each user, allowing for a seamless and intuitive voice command experience.
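As an illustration of the kind of per-user record described above, the following is a minimal sketch in Python; the field names (e.g., mean_pitch_hz, speech_rate_sps) and the mapping of command phrases to device actions are assumptions introduced here, since the disclosure only requires that pre-defined voice instructions and voice frequency data (pitch, tone, speech patterns) be stored per user.

```python
# Hypothetical shape of a user profile holding pre-defined voice
# instructions and voice frequency data; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class UserVoiceProfile:
    user_id: int
    # Pre-defined voice instructions captured at initial configuration,
    # mapped from spoken command phrase to a device action.
    instructions: dict[str, str] = field(default_factory=dict)
    # Voice frequency data extracted from the training utterances.
    mean_pitch_hz: float = 0.0                    # perceived pitch
    pitch_range_hz: tuple[float, float] = (0.0, 0.0)
    speech_rate_sps: float = 0.0                  # syllables per second

profile = UserVoiceProfile(
    user_id=1,
    instructions={"turn on the fan": "FAN_ON"},
    mean_pitch_hz=165.0,
    pitch_range_hz=(85.0, 255.0),
    speech_rate_sps=4.2,
)
```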
[0067]In an exemplary embodiment, the data-obtaining subsystem 208 is configured to obtain at least one of: the received one or more audio signals from the one or more voice-capturing units 104 and the one or more predefined ambient noise frequency data for storing in the one or more databases 114. The data-obtaining subsystem 208 plays a critical role in ensuring that the system 102 operates effectively in various acoustic environments by gathering and managing the necessary data for real-time processing and noise reduction. The data-obtaining subsystem 208 is configured to receive the one or more audio signals from the one or more voice-capturing units 104 in real-time. This real-time capability is essential for enabling the system 102 to respond promptly to the received one or more audio signals and to perform the continuous noise reduction. As soon as the one or more audio signals are captured by the one or more voice-capturing units 104, they are immediately transferred to both the audio signals processing subsystem 210 and the noise reduction subsystem 212. This swift data transfer allows the system 102 to quickly begin the process of converting the audio signals into individual spectral components, which are then analysed to extract relevant voice frequencies for each user of the one or more users.
[0068]In addition to obtaining the real-time one or more audio signals, the data-obtaining subsystem 208 is also responsible for acquiring the one or more predefined ambient noise frequency data. The one or more predefined ambient noise frequency data serves as a reference for the noise reduction subsystem 212, enabling it to distinguish effectively between the one or more user-personalised voice commands and the background noise, i.e., the ambient noise. The one or more predefined ambient noise frequency data is stored in the one or more databases 114 and is periodically updated based on a pre-defined time period. This periodic updating, or re-calibration, is essential for maintaining the accuracy and reliability of the noise reduction models over time, particularly in environments where noise conditions may change frequently.
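The real-time hand-off from capture to processing might be organised as a producer-consumer pipeline. The following is a minimal sketch, with synthetic frames standing in for the output of the one or more voice-capturing units 104; the queue-based design, frame length, and function names are assumptions rather than details from the disclosure.

```python
# Producer-consumer sketch of the real-time hand-off; synthetic frames
# stand in for captured audio.
import queue
import threading
import numpy as np

audio_queue: "queue.Queue[np.ndarray]" = queue.Queue()

def capture_loop(n_frames: int = 5, frame_len: int = 1600) -> None:
    """Stand-in for the voice-capturing units: pushes raw frames."""
    rng = np.random.default_rng(0)
    for _ in range(n_frames):
        audio_queue.put(rng.standard_normal(frame_len))
    audio_queue.put(None)  # sentinel: capture finished

def processing_loop() -> None:
    """Stand-in for the audio-signals-processing subsystem."""
    while (frame := audio_queue.get()) is not None:
        print(f"processing frame of {frame.size} samples")

threading.Thread(target=capture_loop).start()
processing_loop()
```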
[0069]In an exemplary embodiment, the audio signals processing subsystem 210 is configured to convert the one or more audio signals into one or more individual spectral components to extract a plurality of frequencies in the one or more audio signals. This conversion is fundamental to the ability of the system 102 to analyse and interpret the one or more audio signals received from the one or more voice-capturing units 104. By breaking down the one or more audio signals into the one or more individual spectral components, the audio signals processing subsystem 210 enables the system 102 to identify specific frequencies of the plurality of frequencies that correspond to the one or more user-personalised voice commands of each user of the one or more users and to the ambient noise.
[0070]The audio signals processing subsystem 210 is configured with the Fast Fourier Transform (FFT) model to determine the plurality of frequencies in the one or more audio signals. The FFT model is a mathematical algorithm that transforms one or more time-domain audio signals into a frequency domain, allowing for the identification of the plurality of frequencies present in the one or more audio signals. This transformation is crucial for isolating one or more user-personalised voice commands from the ambient noise and other irrelevant audio data. The FFT model is selected for its efficiency and accuracy in frequency analysis, making it particularly suitable for real-time audio processing applications.
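To make the transformation concrete, the following is a minimal sketch of an FFT-based spectral decomposition using NumPy; the 16 kHz sample rate and the synthetic two-tone signal are assumptions chosen for illustration.

```python
# FFT decomposition of a frame into spectral components.
import numpy as np

fs = 16_000                       # sample rate in Hz (assumed)
t = np.arange(fs) / fs
# Synthetic "audio": a 200 Hz voice-like tone plus a 50 Hz hum.
signal = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

spectrum = np.fft.rfft(signal)                   # one-sided spectrum
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)   # bin frequencies in Hz
magnitude = np.abs(spectrum)

# Report the strongest components (the "plurality of frequencies").
top = np.argsort(magnitude)[-2:]
for idx in sorted(top):
    print(f"{freqs[idx]:.1f} Hz  magnitude {magnitude[idx]:.1f}")
```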
[0071]Furthermore, the audio signals processing subsystem 210 is configured to handle the one or more audio signals with a real-time processing latency between 100 milliseconds and 2 seconds. This latency range ensures that the system 102 is able to process the audio signals quickly enough to respond to the one or more user-personalised voice commands in a timely manner, while also allowing the necessary computational time to accurately analyse the one or more individual spectral components. The lower bound of 100 milliseconds provides near-instantaneous processing, which is essential for applications requiring rapid feedback, such as the voice-controlled electronic device 106. The upper bound of 2 seconds accommodates more complex processing tasks that may involve additional steps, such as noise reduction or pattern recognition, without significantly delaying the response of the system 102.
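As a quick sanity check on this latency window, the sketch below computes the frame sizes it implies, assuming a 16 kHz sample rate (the disclosure does not specify one).

```python
# Frame sizes implied by the 100 ms - 2 s window at an assumed 16 kHz.
FS = 16_000  # samples per second (assumed)
for latency_s in (0.1, 0.5, 2.0):
    print(f"{latency_s:>4} s  ->  {int(FS * latency_s):>6} samples/frame")
```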
[0072]The ability of the audio signals processing subsystem 210 to operate within this specified latency range is critical for maintaining the overall performance of the system 102. The audio signals processing subsystem 210 ensures that the processed one or more audio signals is delivered to subsequent subsystems, such as the noise reduction subsystem 212 and the voice profile extraction subsystem 214, with minimal delay, thereby preserving the real-time responsiveness of the system 102.
[0073]In an exemplary embodiment, the noise reduction subsystem 212 is configured to compare the extracted plurality of frequencies against the one or more predefined ambient noise frequency data using the one or more noise reduction models. This comparison is essential for distinguishing/segregating the one or more user-personalised voice commands from the ambient noise within the processed one or more audio signals. By effectively segregating the one or more user-personalised voice commands from the background noise, i.e., the ambient noise, the system 102 is able to attenuate unwanted noise, thereby enhancing the clarity and accuracy of the one or more user-personalised voice commands that are detected and processed by the system 102.
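One plausible form such a comparison could take is spectral masking: bins whose frequencies match the predefined ambient-noise data are attenuated before the signal is reconstructed. The following is a minimal sketch under that assumption; the disclosure does not commit to a particular noise-reduction model, and the tolerance and attenuation factor here are illustrative.

```python
# Spectral masking against predefined ambient-noise frequencies.
import numpy as np

fs = 16_000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 200 * t)       # user command component
hum = 0.8 * np.sin(2 * np.pi * 50 * t)    # ambient noise component
mixed = voice + hum

ambient_noise_freqs = [50.0, 100.0]       # predefined noise data (Hz)

spectrum = np.fft.rfft(mixed)
freqs = np.fft.rfftfreq(mixed.size, d=1 / fs)
for f0 in ambient_noise_freqs:
    mask = np.abs(freqs - f0) < 5.0       # +/- 5 Hz tolerance (assumed)
    spectrum[mask] *= 0.05                # attenuate rather than zero
cleaned = np.fft.irfft(spectrum, n=mixed.size)

print(f"hum power before: {np.mean(hum**2):.3f}, "
      f"residual after: {np.mean((cleaned - voice)**2):.3f}")
```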
[0074]The noise reduction subsystem 212 is further configured to generate one or more bandpass filters for each user of the one or more users specifically configured to isolate the one or more user-personalised voice commands that are associated with the at least one user profile within the one or more user profiles. The generation of the one or more bandpass filters involves an analysis of one or more frequency characteristics of the one or more user-personalised voice commands. By focusing on the one or more frequency characteristics, the bandpass filters allow the system 102 to selectively pass only those frequencies that correspond to the one or more audio signals associated with the corresponding user of the one or more users, while filtering out other non-relevant frequencies, including those associated with the ambient noise.
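A minimal sketch of generating such a per-user bandpass filter is shown below, using a standard Butterworth design from SciPy; deriving the band edges from a stored pitch range is an assumption about how the frequency characteristics might be used, not a detail fixed by the disclosure.

```python
# Per-user bandpass filter from the profile's frequency characteristics.
import numpy as np
from scipy.signal import butter, sosfilt

fs = 16_000
user_band = (85.0, 255.0)   # from the user's stored pitch range (Hz)

# Fourth-order Butterworth bandpass in second-order sections.
sos = butter(4, user_band, btype="bandpass", fs=fs, output="sos")

t = np.arange(fs) / fs
mixed = np.sin(2 * np.pi * 200 * t) + 0.8 * np.sin(2 * np.pi * 50 * t)
isolated = sosfilt(sos, mixed)   # passes the user's band, rejects the hum
```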
[0075]In order to maintain the effectiveness of the one or more noise reduction models over time, the noise reduction subsystem 212 is also configured to perform a periodic re-calibration. The periodic re-calibration involves updating the one or more predefined ambient noise frequency data at the pre-defined time period, which is typically set between 24 hours and 365 days. The purpose of this periodic re-calibration is to adapt the one or more noise reduction models to any changes in the ambient noise environment, ensuring that the system 102 continues to accurately segregate and isolate the one or more user-personalised voice commands under varying noise conditions.
[0076]The periodic re-calibration of the one or more noise reduction models not only keeps the performance of the system 102 consistent but also optimises the one or more bandpass filters generated by the noise reduction subsystem 212. By continually refining the one or more bandpass filters based on the updated one or more predefined ambient noise frequency data, the system 102 enhances its ability to isolate the one or more user-personalised voice commands with greater precision. This optimisation process is crucial for maintaining prominent levels of accuracy in the one or more user-personalised voice commands recognition, particularly in environments where the ambient noise levels may fluctuate or evolve over time.
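The following is a minimal sketch of such a periodic re-calibration loop, assuming the 24-hour lower bound of the disclosed range and a simple peak-picking update rule; both are illustrative choices.

```python
# Periodic re-calibration: refresh ambient-noise frequency estimates
# from a recent noise-only frame once the interval has elapsed.
import time
import numpy as np

RECAL_INTERVAL_S = 24 * 3600          # lower bound of disclosed range
last_recalibrated = 0.0
ambient_noise_freqs = np.array([50.0, 120.0])

def recalibrate(noise_frame: np.ndarray, fs: int = 16_000) -> None:
    global ambient_noise_freqs, last_recalibrated
    mags = np.abs(np.fft.rfft(noise_frame))
    freqs = np.fft.rfftfreq(noise_frame.size, d=1 / fs)
    # Keep the strongest observed noise peaks as the new reference data.
    ambient_noise_freqs = freqs[np.argsort(mags)[-2:]]
    last_recalibrated = time.time()

def maybe_recalibrate(noise_frame: np.ndarray) -> None:
    if time.time() - last_recalibrated >= RECAL_INTERVAL_S:
        recalibrate(noise_frame)
```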
[0077]The optimised one or more bandpass filters are thus configured to further refine the isolation of the one or more user-personalised voice commands, ensuring that the system 102 is able to reliably distinguish and process the one or more audio signals even in challenging acoustic environments. This capability is key to the overall effectiveness of the system 102, enabling it to deliver a superior user experience in the voice-controlled electronic device 106 by reducing the impact of the ambient noise on voice command accuracy.
[0078]In an exemplary embodiment, the voice profile extraction subsystem 214 is configured to accurately identify and isolate the one or more user-personalised voice commands by leveraging the one or more bandpass filters generated by the noise reduction subsystem 212. The voice profile extraction subsystem 214 is configured to achieve this by comparing the one or more frequency characteristics of each bandpass filter of the one or more bandpass filters against the voice frequency data stored in the one or more user profiles. This comparison is essential for extracting the at least one user profile associated with the user of the one or more users whose voice commands are being processed. The voice frequency data stored in the user profiles includes detailed information such as pitch, tone, and specific speech patterns, which are unique to each user.
[0079]The voice profile extraction subsystem 214 is further configured to perform a sophisticated correlation analysis. This correlation analysis involves a detailed comparison of the one or more frequency characteristics of each bandpass filter against the voice frequency data stored within the one or more user profiles. The goal of this correlation analysis is to refine the extraction process, ensuring that the system 102 accurately matches the one or more user-personalised voice commands with the correct user profile within the one or more user profiles. By doing so, the voice profile extraction subsystem 214 significantly optimises the accuracy of recognizing the one or more user-personalised voice commands.
[0080]The correlation analysis conducted by the voice profile extraction subsystem 214 is crucial for enhancing the overall performance of the system 102. By meticulously aligning the frequency characteristics of the one or more bandpass filters with the stored voice frequency data, the voice profile extraction subsystem 214 ensures that the extracted at least one user profile corresponds precisely to the user issuing the one or more user-personalised voice commands. This precise matching process reduces the likelihood of errors and improves the ability of the system 102 to distinguish between different users, especially in multi-user environments where multiple user profiles may exist.
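The correlation analysis may be illustrated, under the assumption that both a bandpass filter's frequency characteristics and a profile's stored voice frequency data are reduced to fixed-length feature vectors (e.g., per-band energies); the function, threshold, and feature values below are hypothetical.

```python
import numpy as np

def extract_matching_profile(filter_features, profiles, threshold=0.8):
    """Return the user profile whose stored voice frequency data best
    correlates with the bandpass filter's frequency characteristics."""
    best_name, best_score = None, -1.0
    for name, voice_features in profiles.items():
        score = float(np.corrcoef(filter_features, voice_features)[0, 1])
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Hypothetical per-band energy features for two stored profiles.
profiles = {
    "user_1": np.array([0.9, 0.6, 0.2, 0.1]),
    "user_2": np.array([0.2, 0.4, 0.8, 0.7]),
}
print(extract_matching_profile(np.array([0.85, 0.55, 0.25, 0.15]), profiles))  # -> user_1
```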
[0081]Furthermore, the refinement of the extraction process through the correlation analysis contributes to the robustness of the system 102 in various acoustic environments. Whether in a quiet room or a noisy setting, the voice profile extraction subsystem 214 maintains its ability to accurately identify and extract the at least one user profile, thereby enabling the system 102 to reliably recognize and respond to the one or more user-personalised voice commands. This capability is essential for providing a seamless and intuitive user experience in the voice-controlled electronic device 106, where the accuracy of voice command recognition directly impacts the effectiveness of the system 102.
[0082]In an exemplary embodiment, the command extraction subsystem 216 is a critical component within the system 102 that is responsible for identifying and isolating at least one pre-defined voice instruction of the one or more pre-defined voice instructions that have been predefined and stored within the at least one user profile. Once the voice profile extraction subsystem 214 has successfully identified and matched the at least one user profile associated with the one or more user-personalised voice commands, the command extraction subsystem 216 extracts the at least one pre-defined voice instruction from this at least one user profile. The one or more pre-defined voice instructions are carefully associated with the user's unique vocal characteristics, ensuring that only the intended commands are extracted and acted upon.
[0083]The extracted at least one pre-defined voice instruction is then prepared for delivery to the voice-controlled electronic device 106. During this process, the command extraction subsystem 216 ensures that the extracted at least one pre-defined voice instruction is provided with the ambient noise already attenuated, meaning that the command is free from external interference and can be clearly understood and executed by the system 102. This meticulous extraction process is crucial for maintaining the accuracy and reliability of the system 102, particularly in environments where background noise is prevalent.
[0084]In an exemplary embodiment, the command extraction subsystem 216 is operatively connected to the command executing subsystem 218. The command executing subsystem 218 is configured to execute the extracted at least one pre-defined voice instruction by adopting a voice wake model for actuating the voice-controlled electronic device 106. The system 102 only responds to the one or more user-personalised voice commands that match the one or more pre-defined voice instructions. Upon receiving the one or more user-personalised voice commands from the command extraction subsystem 216, the command executing subsystem 218 actuates the voice-controlled electronic device 106, triggering the voice-controlled electronic device 106 to perform the desired action.
[0085]In an exemplary embodiment, the voice wake model used by the command executing subsystem 218 is particularly important because it adds an additional layer of precision to the system 102. It ensures that only authenticated voice commands, which are recognised as coming from the legitimate user of the one or more users, are executed. This prevents accidental or unauthorised activation of the voice-controlled electronic device 106, providing a seamless user experience that is both intuitive and secure.
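A minimal sketch of this wake-gating behaviour follows, assuming a profile dictionary carrying an authentication flag and the registered instructions; the profile fields and the device interface shown are hypothetical stand-ins.

```python
def execute_if_authenticated(profile, command, device):
    """Execute only commands that match a pre-defined instruction of an
    authenticated profile; anything else is ignored."""
    instructions = profile.get("pre_defined_voice_instructions", {})
    action = instructions.get(command)
    if profile.get("authenticated") and action is not None:
        device.actuate(action)       # wake and actuate the target device
        return True
    return False                     # unknown or unauthenticated: ignored

class _DemoDevice:
    def actuate(self, action):
        print(f"actuating: {action}")

profile = {"authenticated": True,
           "pre_defined_voice_instructions": {"Turn on the lights": "lights_on"}}
execute_if_authenticated(profile, "Turn on the lights", _DemoDevice())
```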
[0086]For instance, imagine a smart home environment where multiple family members use voice commands to control various devices, such as lights, thermostats, and entertainment systems. Each family member has a unique voice profile stored in the system, allowing the smart home to recognize who is speaking and execute personalized commands accordingly. Let the family members with the one or more pre-defined voice instructions be: a) user 1: Prefers dim lighting and a cooler temperature, b) user 2: Likes bright lighting and a warmer environment, and c) user 3: Enjoys specific music playlists when he enters the living room.
[0087]During setup, user 1, user 2, and user 3 each provide the one or more user-personalised voice commands to the system and create user profiles. The one or more user-personalised voice commands include commands such as "Turn on the living room lights," "Set the thermostat to 68 degrees," and "Play my favourite playlist". The one or more user-personalised voice commands are analysed by the audio signals processing subsystem 210 using the Fast Fourier Transform (FFT) model to determine their unique frequency characteristics, such as pitch, tone, and speech patterns. This data is stored in their respective user profiles.
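As a non-limiting illustration of the FFT analysis, the NumPy sketch below recovers the dominant frequency of a synthetic command frame, standing in for the richer pitch, tone, and speech-pattern analysis described above; the sampling rate and test tone are assumptions.

```python
import numpy as np

FS = 16_000                               # assumed sampling rate in Hz
t = np.arange(FS) / FS
frame = np.sin(2 * np.pi * 220.0 * t)     # stand-in for a captured voice command

spectrum = np.abs(np.fft.rfft(frame))     # individual spectral components
freqs = np.fft.rfftfreq(frame.size, d=1 / FS)
dominant = freqs[np.argmax(spectrum)]     # a pitch-like frequency characteristic
print(f"dominant frequency: {dominant:.1f} Hz")   # ~220.0 Hz
```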
[0088]In real-time usage, the user 1 walks into a living room and says, "Turn on the lights." The system captures his voice through the voice-capturing unit 104 (which could be his smartphone, a smart speaker, or any other communication device present in the room). The data-obtaining subsystem 208 immediately sends the audio signal of the user 1 to the audio signals processing subsystem 210. The audio signals processing subsystem 210 breaks down the audio signal into individual spectral components to identify the frequency characteristics of the one or more user-personalised voice commands from the user 1.
[0089]Simultaneously, the noise reduction subsystem 212 compares the plurality of frequencies with the ambient noise frequency data stored in the one or more databases 114. It uses a bandpass filter to isolate user 1's voice from background noise, ensuring that only his command is processed. The voice profile extraction subsystem 214 then matches the filtered voice command with user 1's stored user profile, confirming his identity. The command extraction subsystem 216 extracts the specific instruction ("Turn on the lights") from the user 1's profile and sends it to the command executing subsystem 218. The command executing subsystem 218 then uses the voice wake model to activate the smart home system, turning on the living room lights to the preferred dim setting of the user 1.
[0090]Over time, the system performs periodic re-calibration of the noise reduction models to account for changes in ambient noise, such as seasonal differences in outdoor noise levels or new appliances in the home. This recalibration ensures that the bandpass filters remain optimized, providing consistent accuracy in command recognition. For multi-user interaction, the user 2 later enters the room and says, "Turn on the lights." The system recognizes the one or more user-personalised voice commands of the user 2, isolates the one or more user-personalised voice commands, and adjusts the living room lights to the preferred bright setting of the user 2, different from that of the user 1. When the user 3 arrives, the user 3 says, "Play my playlist." The system identifies the user 3, isolates the one or more user-personalised voice commands, and begins playing his favourite music, which is stored in the user profile associated with the user 3.
[0091]Figure 3 illustrates an exemplary schematic diagram depicting an initial configuration 300 of the voice-controlled electronic device 106 for generating one or more user profiles, in accordance with an embodiment of the present disclosure.
[0092]In an exemplary embodiment, during the initial configuration 300, the system 102 is set up to identify and differentiate between the one or more audio signals of the one or more users. This process is crucial for enabling personalized interactions with the voice-controlled electronic device 106. The voice-controlled electronic device 106 needs to be activated and enters an initial setup mode. This setup mode is configured to guide each user through the process of generating their individual user profile of the one or more user profiles. Each user of the system 102 is prompted to register their voice with the voice-controlled electronic device 106. This involves speaking the one or more user-personalised voice commands into the one or more voice-capturing units 104 through the one or more communication devices 116. For instance, the one or more user-personalised voice commands may include common commands such as "Turn on the lights," "Play music," or "Set the thermostat to 72 degrees." The one or more user-personalised voice commands are chosen to cover a range of vocal expressions and speech patterns.
[0093]After the one or more audio signals have been processed and isolated by the audio signals processing subsystem 210 and the noise reduction subsystem 212, the system 102 creates the at least one user profile of the one or more user profiles for each user of the one or more users. The at least one user profile is stored in the user profile generation subsystem 206 and includes key data points such as: the one or more pre-defined voice instructions provided by the user during registration and the voice frequency data, which includes the pitch, tone, and speech patterns extracted during the FFT analysis. The created at least one user profile is then stored in the one or more databases 114 for future reference. The stored at least one user profile may be used by the system 102 to recognize and respond to the one or more user-personalised voice commands in real-time. The initial configuration 300 results in the generation of personalized user profiles for each user. The one or more user profiles enable the system 102 to accurately differentiate between the one or more users and execute commands tailored to their preferences.
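For illustration, the stored data points described above might be organised as follows; the dataclass layout is an assumption, while the field names mirror the data points listed in this paragraph.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    user_id: str
    # instructions registered during the initial configuration
    pre_defined_voice_instructions: dict[str, str] = field(default_factory=dict)
    # voice frequency data extracted during the FFT analysis
    pitch_hz: float = 0.0
    tone_features: list[float] = field(default_factory=list)
    speech_pattern_features: list[float] = field(default_factory=list)

profile = UserProfile(
    user_id="user_1",
    pre_defined_voice_instructions={"Turn on the lights": "lights_on_dim"},
    pitch_hz=210.0,
)
```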
[0094]The system 102 manages 5 to 20 user-personalised voice commands. The system 102 is configured to handle a frequency band range between, but not limited to, 20 hertz (Hz) and 20 kilohertz (kHz). To enhance noise reduction, the system 102 incorporates a variable number of pre-recorded standard noises for reduction, offering adaptability with an optimal range. The noise reduction effectiveness parameter, measured by signal-to-noise ratio improvement, spans from, but not limited to, 0 dB Signal-to-Noise Ratio (SNR) to -20 dB Signal-to-Noise Ratio (SNR), showcasing the capability of the system 102 to significantly enhance voice clarity in various noise scenarios. Real-time processing latency, representing the time taken for processing voice commands, is finely tuned within the range of, but not limited to, 100 milliseconds to 2 seconds, ensuring prompt and seamless execution of user commands. This comprehensive set of parameters underscores the versatility, adaptability, and commitment of the system 102 to deliver a personalized and optimal voice-controlled experience for the one or more users.
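Collected for convenience, the operating ranges quoted above might be expressed as the following illustrative configuration; the values mirror the specification, while the structure and key names are assumptions.

```python
OPERATING_PARAMETERS = {
    "personalised_commands": (5, 20),      # commands managed per system
    "frequency_band_hz": (20, 20_000),     # 20 Hz to 20 kHz
    "snr_improvement_db": (-20, 0),        # noise reduction effectiveness span
    "processing_latency_s": (0.100, 2.0),  # real-time processing latency window
    "recalibration_period_s": (86_400, 31_536_000),  # 24 hours to 365 days
}
```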
[0095]Figure 4 illustrates an exemplary schematic diagram depicting a periodic re-calibration 400 for updating one or more predefined ambient noise frequency data, in accordance with an embodiment of the present disclosure.
[0096]In an exemplary embodiment, the periodic re-calibration 400 is an essential aspect of maintaining the accuracy and effectiveness of the noise reduction subsystem 212 in the voice-controlled electronic device 106. The periodic re-calibration 400 ensures that the system 102 adapts to changing environmental conditions, thereby continuing to isolate user-personalized voice commands with high precision. During the initial configuration of the system 102, as described, the noise reduction subsystem 212 is calibrated with the one or more predefined ambient noise frequency data. The one or more predefined ambient noise frequency data represent common background noises in the user's environment, such as, but not limited to, at least one of: air conditioning hums, street noise, household appliances, and the like. The one or more predefined ambient noise frequency data are obtained through the one or more communication devices 116 and stored in the one or more databases 114 and serve as a baseline for the one or more noise reduction models used in the system 102.
[0097]Over time, the acoustic environment in which the voice-controlled electronic device 106 operates may change. Factors such as new appliances, changes in furniture layout, seasonal variations, or even new sources of noise may alter the one or more predefined ambient noise frequency data. To account for these changes, the system 102 is configured to perform the periodic re-calibration. The periodic re-calibration is essential for ensuring that the noise reduction models remain effective in isolating the one or more user-personalized voice commands from the evolving background noise, i.e., the ambient noise.
[0098]The periodic re-calibration 400 is triggered at the pre-defined time period, which may range between 24 hours and 365 days, depending on the system's settings and the preferences of the one or more users. The periodic re-calibration updates maintain the effectiveness of the system 102 by addressing wear and tear and adapting to changing environmental conditions. The pre-defined time period is chosen to balance the need for accuracy with the convenience of the one or more users, avoiding too frequent interruptions while still maintaining system performance. When the periodic re-calibration is initiated, the data-obtaining subsystem 208 collects new one or more audio signals from the environment using the voice-capturing units 104. These new one or more audio signals include the latest ambient noises that might not have been present during the initial calibration or previous re-calibrations. The newly collected one or more audio signals are processed by the audio signals processing subsystem 210 to extract the current one or more predefined ambient noise frequency data. With the updated one or more predefined ambient noise frequency data, the noise reduction subsystem 212 recalibrates the one or more noise reduction models. The periodic re-calibration involves adjusting the parameters and the bandpass filters used to segregate the one or more user-personalised voice commands from the ambient noise.
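One possible form of the re-calibration update is sketched below, assuming the one or more predefined ambient noise frequency data are held as a per-bin magnitude spectrum; the exponential smoothing factor is an illustrative choice, not a claimed parameter.

```python
import numpy as np

def recalibrate_ambient_baseline(old_baseline, new_ambient_frames, alpha=0.3):
    """Blend freshly captured ambient spectra into the stored baseline."""
    spectra = [np.abs(np.fft.rfft(f)) for f in new_ambient_frames]
    fresh = np.mean(spectra, axis=0)
    # exponential update keeps history while tracking new noise sources
    return (1 - alpha) * old_baseline + alpha * fresh

frames = [np.random.randn(1024) for _ in range(8)]   # fresh ambient captures
baseline = np.zeros(513)                             # rfft length for 1024 samples
baseline = recalibrate_ambient_baseline(baseline, frames)
```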
[0099]In an alternative embodiment, the periodic re-calibration 400 may be dynamically triggered at any moment when the system 102 detects the need for re-calibration, rather than adhering to a fixed pre-defined time period. This real-time, on-demand re-calibration may occur based on various factors such as, but not limited to, at least one of: changes in the noise environment, detection of significant deviations in ambient noise frequency data, and system performance degradation. The system 102 continuously monitors for such conditions and initiates the re-calibration process as soon as it identifies the need to update the one or more predefined ambient noise frequency data. In another exemplary embodiment, the periodic re-calibration 400 may be triggered either manually or automatically whenever the system 102 requires. In manual mode, users can initiate the re-calibration process at any time based on their preference or upon noticing performance issues, such as reduced accuracy in recognizing voice commands. In automatic mode, the system 102 continuously monitors environmental conditions and system performance. When significant changes in the noise environment or a drop in voice recognition accuracy are detected, the system 102 automatically triggers the re-calibration process to update the one or more predefined ambient noise frequency data. This flexibility ensures that the system 102 maintains optimal performance, adapting to varying conditions as needed.
[0100]Figure 5 illustrates an exemplary schematic diagram depicting a use case 500 of the system 102, in accordance with an embodiment of the present disclosure.
[0101]In a specific use case 500, User A installs a smart air conditioner (Smart AC) and configures the smart air conditioner with a designated voice command as "Turn on AC." The system 102 meticulously captures the unique frequency associated with User A's command, thereby generating a personalized voice profile (including a bandpass filter, amplitude level, etc.) tailored to their voice. In practical scenarios where ambient noise is prevalent, the system 102 adeptly discerns and executes User A's command with precision, showcasing the resilience of the system 102 against irrelevant background noises. This tailored approach ensures that the smart air conditioner accurately responds to User A's voice commands, exemplifying the effectiveness of the system 102 in real-world, noise-prone environments.
[0102]In another specific use case 500, User B installs a ceiling fan and configures a voice command as "Switch On Fan." The system 102 records the specific profiles associated with the "Switch On Fan" command for the particular user and generates a tailored profile for User B. In the presence of the ambient noise, encompassing factors such as air-cutting noise, mounting noise, and motor noise, the system 102 consistently and accurately identifies User B's command by attenuating the ambient noise, ensuring the reliable activation of the ceiling fan. This exemplifies the ability of the system 102 to distinguish and respond to the one or more user-personalised voice commands amidst diverse environmental noises, enhancing the overall user experience.
[0103]Figure 6 illustrates an exemplary flow chart of a method 600 for providing the one or more user-personalised voice commands to the voice-controlled electronic device 106 with the attenuated ambient noise, in accordance with an embodiment of the present disclosure.
[0104]In accordance with another embodiment of the present disclosure, the method 600 is disclosed for providing the one or more user-personalised voice commands to the voice-controlled electronic device with the attenuated ambient noise. At step 602, the method 600 includes receiving, by the one or more voice-capturing units, the one or more audio signals from at least one of: each user of the one or more users and the one or more noise generating objects. The one or more voice-capturing units may comprise, but are not limited to, at least one of: microphones, headsets with microphones, smart speakers, wearable devices, voice-activated units, smartphone microphones, communication device microphones, and the like.
[0105]At step 604, the method 600 includes generating, by the one or more hardware processors through the user profile generation subsystem, the at least one user profile of the one or more user profiles for each user of the one or more users to store at least one of the: one or more pre-defined voice instructions and voice frequency data associated with the one or more audio signals. The one or more pre-defined voice instructions are mapped with the one or more user-personalised voice commands and then processed within the audio signals processing subsystem and the noise reduction subsystem.
[0106]At step 606, the method 600 includes obtaining, by the one or more hardware processors through the data-obtaining subsystem, at least one of: the received one or more audio signals from the one or more voice-capturing units and the one or more predefined ambient noise frequency data for storing in the one or more databases. The one or more predefined ambient noise frequency data are stored in the one or more databases and periodically updated based on the pre-defined time period. The one or more predefined ambient noise frequency data serve as a reference for the noise reduction subsystem, enabling it to effectively distinguish between the one or more user-personalised voice commands and the background noise, i.e., the ambient noise.
[0107]At step 608, the method 600 includes converting, by the one or more hardware processors through the audio signals processing subsystem, the one or more audio signals into one or more individual spectral components to extract the plurality of frequencies in the one or more audio signals. The audio signals processing subsystem is configured with the Fast Fourier Transform (FFT) model to determine the plurality of frequencies in the one or more audio signals.
[0108]At step 610, the method 600 includes comparing, by the one or more hardware processors through the noise reduction subsystem, the extracted plurality of frequencies against the one or more predefined ambient noise frequency data by the one or more noise reduction models for segregating the one or more user-personalised voice commands from the ambient noise in the processed one or more audio signals to attenuate the ambient noise. By effectively segregating the one or more user-personalised voice commands from the background noise, i.e., the ambient noise, the method 600 is able to attenuate unwanted noise, thereby enhancing the clarity and accuracy of the one or more user-personalised voice commands that are detected and processed by the method 600.
[0109]At step 612, the method 600 includes generating, by the one or more hardware processors through the noise reduction subsystem, the one or more bandpass filters to isolate the one or more user-personalised voice commands associated with the at least one user profile of the one or more user profiles by analysing the one or more frequency characteristics of the one or more user-personalised voice commands. By focusing on the one or more frequency characteristics, the bandpass filters allow the system 102 to selectively pass only those frequencies that correspond to the one or more audio signals associated with the corresponding user of the one or more users, while filtering out other non-relevant frequencies, including those associated with the ambient noise.
[0110]The method 600 performs the periodic re-calibration of the one or more noise reduction models, which not only keeps the performance of the method 600 consistent but also optimises the one or more bandpass filters generated by the noise reduction subsystem. By continually refining the one or more bandpass filters based on the updated one or more predefined ambient noise frequency data, the method 600 enhances its ability to isolate the one or more user-personalised voice commands with greater precision. This optimisation process is crucial for maintaining high levels of accuracy in recognition of the one or more user-personalised voice commands, particularly in environments where the ambient noise levels may fluctuate or evolve over time. The pre-defined time period for performing the periodic re-calibrations is between 24 hours and 365 days.
[0111]At step 614, the method 600 includes comparing, by the one or more hardware processors through the voice profile extraction subsystem, the one or more frequency characteristics of each bandpass filter of the one or more bandpass filters against the voice frequency data stored in the one or more user profiles to extract the at least one user profile.
[0112]At step 616, the method 600 includes extracting, by the one or more hardware processors through the command extraction subsystem, the at least one pre-defined voice instruction of the one or more pre-defined voice instructions from the extracted at least one user profile based on the isolated one or more user-personalised voice commands. At step 618, the method 600 includes providing, by the one or more hardware processors through the command executing subsystem, the one or more user-personalised voice commands to the voice-controlled electronic device with the attenuated ambient noise to execute the extracted at least one pre-defined voice instruction.
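A compact, non-limiting walk-through of steps 608 through 618 is sketched below; the spectral-subtraction proxy for noise attenuation, the correlation threshold, and all helper names are hypothetical stand-ins rather than the claimed implementation.

```python
import numpy as np

def handle_frame(frame, ambient_baseline, profiles, device_actions):
    spectrum = np.abs(np.fft.rfft(frame))                  # step 608: spectral components
    voice = np.maximum(spectrum - ambient_baseline, 0.0)   # step 610: attenuate ambient noise
    features = voice / (np.linalg.norm(voice) + 1e-9)      # step 612: band-shape features (proxy)
    best, score = None, -1.0
    for name, stored in profiles.items():                  # step 614: match stored voice data
        s = float(np.corrcoef(features, stored)[0, 1])
        if s > score:
            best, score = name, s
    if best is None or score < 0.8:
        return None                                        # no confident profile match: ignore
    return best, device_actions.get(best)                  # steps 616-618: extract and provide

# Demonstration: a 220 Hz tone whose normalised spectrum is its own profile.
fs = 16_000
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * 220.0 * t)
baseline = np.zeros(513)                                   # rfft length for 1024 samples
spec = np.abs(np.fft.rfft(frame))
profiles = {"user_1": spec / np.linalg.norm(spec)}
print(handle_frame(frame, baseline, profiles, {"user_1": "lights_on_dim"}))
```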
[0001] Numerous advantages of the present disclosure may be apparent from the discussion above. In accordance with the present disclosure, the system provides the one or more user-personalised voice commands to the voice-controlled electronic device with the attenuated ambient noise. The personalized approach leads to improved voice recognition accuracy, thereby reducing false activations and false rejections. The system effectively handles dynamic noise environments and stationary noise environments, making the system suitable for various use cases. The personalized noise reduction ensures efficient resource utilization and reduces the need for complex hardware and expensive solutions. The users experience a highly customised experience, leading to increased satisfaction and ease of use of the voice-controlled electronic device. With the implementation of the FFT model and the efficient one or more noise reduction models, the system offers low-latency real-time processing of voice commands. This minimizes the delay between voice input and voice-controlled electronic device response, leading to a more seamless and responsive user experience.
[0002] The periodic re-calibration mechanism allows the system to adapt and improve over time by continuously updating one or more noise reduction models and one or more user profiles. This ensures that the system remains effective even as the user's environment changes, maintaining high accuracy in voice recognition. By creating personalized user profiles that are unique to each user, the system adds an additional layer of security. This personalization minimizes the risk of unauthorized access or accidental command activation by others, ensuring that only the intended user’s commands are recognized and executed. The system's ability to operate effectively in dynamic and stationary noise environments reduces the need for constant, high-power processing. By efficiently isolating relevant voice commands, the system conserves energy, which is particularly advantageous for battery-powered devices. The system's ability to store and manage multiple user profiles allows for a highly customizable user experience. The one or more users may have their own personalized commands and noise reduction settings, making the system versatile and user-friendly in multi-user households or shared spaces.
[0003] While specific language has been used to describe the invention, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.
[0113]The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, order of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts need to be necessarily performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.
CLAIMS:
I/We Claim:
1. A system (102) for providing one or more user-personalised voice commands to a voice-controlled electronic device (106) with an attenuated ambient noise, comprising:
one or more voice-capturing units (104) operatively connected to the voice-controlled electronic device (106), configured to receive one or more audio signals from at least one of: each user of one or more users and one or more noise generating objects;
one or more hardware processors (108); and
a memory unit (110) operatively connected to the one or more hardware processors (108), wherein the memory unit (110) comprises a set of computer-readable instructions in form of a plurality of subsystems (112), configured to be executed by the one or more hardware processors (108), wherein the plurality of subsystems (112) comprises:
a user profile generation subsystem (206) configured to generate at least one user profile of one or more user profiles for each user of the one or more users to store at least one of: one or more pre-defined voice instructions and voice frequency data associated with the one or more audio signals;
a data-obtaining subsystem (208) configured to obtain at least one of: the received one or more audio signals from the one or more voice-capturing units (104) and one or more predefined ambient noise frequency data for storing in one or more databases (114);
an audio signals processing subsystem (210) configured to convert the one or more audio signals into one or more individual spectral components to extract a plurality of frequencies in the one or more audio signals;
a noise reduction subsystem (212) configured to:
compare the extracted plurality of frequencies against the one or more predefined ambient noise frequency data by one or more noise reduction models for segregating the one or more user-personalised voice commands from ambient noise in the processed one or more audio signals to attenuate the ambient noise;
generate one or more bandpass filters to isolate the one or more user-personalised voice commands associated with the at least one user profile of the one or more user profiles by analysing one or more frequency characteristics of the one or more user-personalised voice commands; and
perform a periodic re-calibration to the one or more noise reduction models by updating the one or more predefined ambient noise frequency data at a pre-defined time period;
a voice profile extraction subsystem (214) configured to compare the one or more frequency characteristics of each bandpass filter of the one or more bandpass filters against the voice frequency data stored in the one or more user profiles to extract the at least one user profile; and
a command extraction subsystem (216) configured to extract at least one pre-defined voice instruction of the one or more pre-defined voice instructions from the extracted at least one user profile based on the isolated one or more user-personalised voice commands, for providing the one or more user-personalised voice commands to the voice-controlled electronic device (106) with the attenuated ambient noise.
2. The system (102) as claimed in claim 1, wherein the one or more pre-defined voice instructions are received during an initial configuration of the voice-controlled electronic device (106) through the one or more voice-capturing units (104),
the one or more pre-defined voice instructions are analysed and processed in the audio signals processing subsystem (210) and the noise reduction subsystem (212) to obtain the voice frequency data associated with the one or more audio signals,
the voice frequency data comprises at least one of: pitch, tone, and speech patterns of each user of the one or more users.
3. The system (102) as claimed in claim 1, wherein the data-obtaining subsystem (208) is configured to obtain the one or more audio signals from the one or more voice-capturing units (104) in real time and transferred to the audio signals processing subsystem (210) and the noise reduction subsystem (212) for attenuating the ambient noise,
the one or more predefined ambient noise frequency data are obtained based on the pre-defined time period to perform the periodic re-calibration.
4. The system (102) as claimed in claim 1, wherein the audio signals processing subsystem (210) is configured with a Fast Fourier Transform (FFT) model to determine the plurality of frequencies in the one or more audio signals,
the audio signals processing subsystem (210) is configured to process the one or more audio signals with a real-time processing latency between 100 milliseconds and 2 seconds.
5. The system (102) as claimed in claim 1, wherein the periodic re-calibration of the one or more noise reduction models optimises the one or more bandpass filters,
the optimised one or more bandpass filters configured to refine the isolation of the one or more user-personalised voice commands.
6. The system (102) as claimed in claim 1, wherein the pre-defined time period for performing the periodic re-calibrations is between 24 hours and 365 days.
7. The system (102) as claimed in claim 1, wherein the voice profile extraction subsystem (214) is configured to perform a correlation analysis between the one or more frequency characteristics of each bandpass filter of the one or more bandpass filters against the voice frequency data for refining the extraction of the at least one user profile to optimise accuracy in recognition of the one or more user-personalised voice commands.
8. The system (102) as claimed in claim 1, wherein the command extraction subsystem (216) is operatively connected to a command executing subsystem (218),
the command executing subsystem (218) is configured to execute the extracted at least one pre-defined voice instruction by adopting a voice wake model for actuating the voice-controlled electronic device (106).
9. A method (600) for providing one or more user-personalised voice commands to a voice-controlled electronic device (106) with an attenuated ambient noise, comprising:
receiving, by one or more voice-capturing units (104), one or more audio signals from at least one of: each user of one or more users and one or more noise generating objects;
generating, by one or more hardware processors (108) through a user profile generation subsystem (206), at least one user profile of one or more user profiles for each user of the one or more users to store at least one of: one or more pre-defined voice instructions and voice frequency data associated with the one or more audio signals;
obtaining, by the one or more hardware processors (108) through a data-obtaining subsystem (208), at least one of: the received one or more audio signals from the one or more voice-capturing units (104) and one or more predefined ambient noise frequency data for storing in one or more databases (114);
converting, by the one or more hardware processors (108) through an audio signals processing subsystem (210), the one or more audio signals into one or more individual spectral components to extract a plurality of frequencies in the one or more audio signals;
comparing, by the one or more hardware processors (108) through a noise reduction subsystem (212), the extracted plurality of frequencies against the one or more predefined ambient noise frequency data by one or more noise reduction models for segregating the one or more user-personalised voice commands from ambient noise in the processed one or more audio signals to attenuate the ambient noise;
generating, by the one or more hardware processors (108) through the noise reduction subsystem (212), one or more bandpass filters to isolate the one or more user-personalised voice commands associated with the at least one user profile of the one or more user profiles by analysing one or more frequency characteristics of the one or more user-personalised voice commands;
comparing, by the one or more hardware processors (108) through a voice profile extraction subsystem (214), the one or more frequency characteristics of each bandpass filter of the one or more bandpass filters against the voice frequency data stored in the one or more user profiles to extract the at least one user profile;
extracting, by the one or more hardware processors (108) through a command extraction subsystem (216), at least one pre-defined voice instruction of the one or more pre-defined voice instructions from the extracted at least one user profile based on the isolated one or more user-personalised voice commands; and
providing, by the one or more hardware processors (108) through a command executing subsystem (218), the one or more user-personalised voice commands to a voice-controlled electronic device (106) with the attenuated ambient noise to execute the extracted at least one pre-defined voice instruction.
10. The method (600) as claimed in claim 9, comprising:
performing, by the one or more hardware processors (108) through the noise reduction subsystem (212), a periodic re-calibration to the one or more noise reduction models by updating the one or more predefined ambient noise frequency data at a pre-defined time period,
the pre-defined time period for performing the periodic re-calibrations is between 24 hours and 365 days.
Dated this 12th day of September, 2024
VIDYA BHASKAR SINGH NANDIYAL
Patent Agent No. 2912
IPExcel Private Limited