Abstract: A system and method for providing an emotional interactive response to a voice command of a user. The method encompasses receiving, from a first target device, the voice command of the user. The method thereafter comprises identifying age, gender and emotional state of the user based on the voice command. Further, the method encompasses categorizing voice sample/s of the voice command into one or more categories of the emotional state. The method thereafter encompasses identifying usage pattern/s associated with the user based on contextual and non-voice contextual data associated with the user. Further, the method comprises generating the emotional interactive response to the voice command based at least on the usage pattern/s and the one or more categories. The method comprises providing the emotional interactive response to the voice command via at least one of voice capable device/s and non-voice capable device/s present in the vicinity of the user.
FORM 2
THE PATENTS ACT, 1970
(39 OF 1970)
AND
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See section 10 and rule 13)
“SYSTEM AND METHOD FOR PROVIDING AN EMOTIONAL INTERACTIVE RESPONSE”
We, Jio Platforms Limited, an Indian Citizen of 101, Saffron, Nr. Centre Point, Panchwati 5 Rasta, Ambawadi, Ahmedabad - 380006, Gujarat, India.
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD:
The present invention generally relates to voice interaction and more particularly, to systems and methods for providing an emotional interactive response to a voice command of a user.
BACKGROUND OF THE DISCLOSURE:
The following description of the related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is used only to enhance the understanding of the reader with respect to the present disclosure, and not as admissions of the prior art.
Over the past few years, with advancements in digital and wireless technologies, smart devices (such as voice and non-voice capable devices) have also been enhanced to a great extent. These smart devices are generally configured to provide a response to users’ commands. Furthermore, the non-voice capable devices are electronic devices that receive non-voice inputs/commands from the users, or instructions from other connected devices (such as the voice capable devices), to perform one or more operations, for instance to provide a response to the received non-voice commands from the users and/or the instructions received from other connected devices. Also, the voice capable devices are electronic devices that receive voice inputs/commands from the users and provide a response to the received voice commands based on voice trigger capabilities. These voice trigger detection enabled devices (i.e. the voice capable devices) are very common these days. Voice triggers, for example “Hello ABC”, “OK XYZ”, are very common and many smart devices with voice trigger detection capability are available. Voice trigger detection is gaining popularity and many devices are launched every year with voice trigger detection capability. Also, today the smart devices (i.e. the voice and non-voice capable devices) contain communication capabilities including, but not limited to, wireless communication capabilities provided by wireless networks such as LTE, Wi-Fi, Bluetooth, NB-IoT, etc.
Also, today a wireless network, which is widely deployed to provide various communication services such as voice, video, data, advertisement, content, messaging, broadcasts, etc., usually comprises multiple access networks and supports communications for multiple users by sharing the available network resources. One example of such a network is the Evolved Universal Terrestrial Radio Access (E-UTRA), which is a radio access network standard meant to be a replacement of the Universal Mobile Telecommunications System (UMTS) and High-Speed Downlink Packet Access/High-Speed Uplink Packet Access (HSDPA/HSUPA) technologies specified in 3GPP releases 5 and beyond. Unlike HSPA, Long Term Evolution’s (LTE's) E-UTRA is an entirely new air interface system, unrelated to and incompatible with W-CDMA. It provides higher data rates, lower latency and is optimized for packet data. The earlier UMTS Terrestrial Radio Access Network (UTRAN) is the radio access network (RAN) defined as a part of the Universal Mobile Telecommunications System (UMTS), a third generation (3G) mobile phone technology supported by the 3rd Generation Partnership Project (3GPP). The UMTS, which is the successor to Global System for Mobile Communications (GSM) technologies, currently supports various air interface standards, such as Wideband-Code Division Multiple Access (W-CDMA), Time Division-Code Division Multiple Access (TD-CDMA), and Time Division-Synchronous Code Division Multiple Access (TD-SCDMA). The UMTS also supports enhanced 3G data communications protocols, such as High-Speed Packet Access (HSPA), which provides higher data transfer speeds and capacity to associated UMTS networks. Furthermore, as the demand for mobile data and voice access continues to increase, research and development continue to advance the technologies not only to meet the growing demand for access, but also to advance and enhance the user experience with the user device. Building on technologies that have evolved from the GSM/EDGE, UMTS/HSPA, CDMA2000/EV-DO and TD-SCDMA radio interfaces, E-UTRA, introduced with 3GPP Release 8, is designed to provide a single evolution path for providing increases in data speeds and spectral efficiency, and allowing the provision of more functionality.
Also, wireless communication includes 5th generation mobile networks or 5th generation wireless systems, abbreviated 5G, which are the telecommunications standards beyond the current 4G LTE/International Mobile Telecommunications (IMT)-Advanced standards. 5G aims at higher capacity than current 4G LTE, allowing a higher density of mobile broadband users, and supporting device-to-device, ultra-reliable, and massive machine communications. 5G also aims at lower latency than 4G equipment and lower battery consumption, for better implementation of Internet of Things (IoT) devices.
Furthermore, 3GPP has introduced Narrow Band Internet of Things (NB-IoT) technology in Release 13. Low-end IoT applications can be met with this technology, and efforts have been taken to address IoT markets with the completion of standardization of NB-IoT. The NB-IoT technology has been implemented in licensed bands; the licensed bands of LTE are used for exploiting this technology. This technology makes use of a minimum system bandwidth of 180 kHz, i.e. one PRB (Physical Resource Block) is allocated for this technology. NB-IoT can be seen as a separate RAT (Radio Access Technology). NB-IoT can be deployed in three modes: “in-band”, “guard band” and “standalone”. In the “in-band” operation, resource blocks present within the LTE carrier are used; specific resource blocks reserved for synchronization of LTE signals are not used for NB-IoT. In “guard band” operation, resource blocks between LTE carriers that are not utilized by any operator are used. In “standalone” operation, GSM frequencies are used, or possibly unused LTE bands are used. Release 13 contains important refinements like extended discontinuous reception (eDRX) and the Power Save Mode (PSM). The PSM, introduced in Release 12, ensures battery longevity and is complemented by eDRX for devices that need to receive data more frequently.
Furthermore, the Internet of Things (IoT) is a network of devices, vehicles, home appliances, and other items embedded with electronics, software, sensors, actuators, and connectivity, which can be readable, recognizable, locatable, addressable, and controllable via an IoT communications network. This network enables these things to connect and exchange data, creating opportunities for more direct integration of the physical world into computer-based systems, resulting in efficiency improvements, economic benefits, and reduced human exertion. With the “Internet of Things” (IoT) concept getting more and more popular, devices such as sensors, actuators and everyday objects including coffee makers, washing machines, headphones, lamps, wearable devices, etc. are being increasingly looked upon as potential IoT devices. IoT involves extending internet connectivity beyond standard devices, such as desktops, laptops, smartphones and tablets, to any range of traditionally dumb or non-internet-enabled physical devices and everyday objects. Embedded with technology, these devices can communicate and interact over communication networks, and they can be remotely monitored and controlled. The term "Enterprise IoT" refers to devices used in business and corporate settings in a network of physical objects that contain embedded technology to communicate and sense or interact with their internal states or the external environment. Here, IoT refers to connected physical devices, in many cases everyday objects (things), that can communicate their status, respond to events, or even act autonomously. This enables communication among those things, closing the gap between the real and the virtual world and creating smarter processes and structures that can support users without needing their attention. IoT has evolved from the convergence of wireless technologies, micro-electromechanical systems (MEMS), and the Internet. An IoT device is generally provisioned with an IP address to provide it with the capability of transferring data and receiving control signals over an IP network using standard Internet protocols such as TCP/IP or UDP, which are used extensively on the Internet.
Furthermore, the smart devices, via the above disclosed and similar communication capabilities, provide a response to a user command and/or a command received from other devices. Also, with voice capable smart devices/smart computing devices such as smartphones, smart TVs, smart Set-Top-Boxes, smart fitness bands, smart speakers and the like, the popularity of voice trigger detection enabled devices is growing, and people are embracing the voice trigger capability. Furthermore, a smart computing device or user equipment (UE) or user device refers to any electrical, electronic, electro-mechanical computing device or equipment or a combination of one or more of the above devices. Also, a 'smartphone' is one type of "smart computing device" that refers to a mobility wireless cellular connectivity device that allows end users to use services on cellular networks, including but not limited to 2G, 3G, 4G, 5G and/or the like mobile broadband Internet connections, with an advanced mobile operating system which combines features of a personal computer operating system with other features useful for mobile or handheld use. Also, a smartphone may have one or the other type of a subscriber identity module (SIM) card to connect to a network. These smartphones are also equipped with voice trigger capabilities to provide a response to various voice commands. Further, the smart set top box is also an example of a smart device. Currently, the type of TV Set-Top Box (STB) most widely used is one which receives encoded/compressed digital signals from a signal source (e.g., a content provider's headend) and decodes/decompresses those signals, converting them into analog signals compatible with an analog (SDTV) television. A STB may be defined as a computerized device that processes digital information and may come in many forms and can have a variety of functions; Digital Media Adapters, Digital Media Receivers, Windows Media Extenders and most video game consoles are also examples of set-top boxes. The STB accepts commands from user/s (often via use of
remote devices such as a remote control) and transmits these commands back to the network operator, which has a return path capability for two-way communication. The STBs can make it possible to receive and display TV signals, connect to networks, play games via a game console, provide satellite broadband TV services and Video on Demand, surf the Internet, interact with Interactive Program Guides (IPGs), display virtual channels, electronic storefronts and walled gardens, send e-mail, and videoconference. Also, many STBs are able to communicate in real time with other devices such as camcorders, DVD and CD players, portable media devices and music keyboards. Some STBs have large dedicated hard drives and smart card slots to insert smart cards into for purchases and identification. The customer uses the STB for entertainment purposes. The users typically watch specific content on specific channels at specific times via the STB. Also, there is an option to record content as well; however, this optionally involves connecting the STB with a dedicated external hard drive and explicitly recording the desired content. Furthermore, a voice portable on demand (POD) device, when connected to a STB (for instance via USB), provides voice trigger capability to the existing STBs. Furthermore, for voice capable devices, voice commands can be mainly classified into the following categories:
a. Command for knowledge –
i. These are commands which provide information and knowledge and usually do not result in any action. Commands like “how is the weather today?”, “What is the capital of India?”, “what is the time?” etc.
b. Command to perform action –
i. These are commands which result in performing an action. Commands like “Set alarm for 06:00AM”, “Play happy songs”, “Turn off lights”, etc.
c. Conversation commands –
i. This is a conversation mode which could be just for information or may result in performing an action.
ii. Information commands are like “What is the capital of India?” -> “Response” -> “What is the population over there?” -> “Response” -> “How is the climate?” -> “Response”.
iii. Conversation commands resulting in an action are like “book a cab” -> “Response (what is the destination?)” -> “Give …” -> “Response (book via ABC or XYZ?)” -> “XYZ” -> “Response (what time?)” -> “Now” -> “Response (XYZ cab booked for … and the cab will arrive in 15 min)”.
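By way of a non-limiting illustration only, the coarse classification described above may be sketched in a few lines of Python. The keyword lists, function name and example utterances below are hypothetical placeholders chosen for illustration and do not form part of, or limit, the claimed subject matter.

ACTION_KEYWORDS = ("set alarm", "play", "turn off", "turn on", "book")
KNOWLEDGE_PREFIXES = ("what is", "how is", "who is", "when is")

def classify_command(transcript, in_conversation=False):
    """Map a transcribed utterance to a coarse command category (illustrative only)."""
    text = transcript.lower().strip()
    if in_conversation:
        # Follow-up utterances inside an ongoing dialogue are treated as
        # conversation commands; they may be informational or result in an action.
        return "conversation"
    if any(keyword in text for keyword in ACTION_KEYWORDS):
        return "action"
    if text.startswith(KNOWLEDGE_PREFIXES):
        return "knowledge"
    return "conversation"

print(classify_command("Set alarm for 06:00AM"))          # -> action
print(classify_command("What is the capital of India?"))  # -> knowledge

In practice, such a mapping would typically be learned rather than rule-based; the sketch only fixes the three category labels used in the remainder of this specification.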
Further, in the scenarios where users have multiple voice enabled devices that can accept a voice trigger command, it is a pleasant experience for the users to use voice trigger detection and enable one or more desired smart devices. However, at present, there are no solutions to the problems related to generation of an emotional interactive response to voice commands initiated by the users. The current solutions fail to provide emotional voice trigger management for multiple smart devices through external/internal sensors to further process voice commands intended for only one specific device from the multiple smart devices with a customized voice based on an emotional state of the users. Also, there are no current solutions for identification and categorization of voice samples of the users into specific categories of emotional state. The current solutions also fail to provide solutions for identification and categorization of the voice samples of the users using various techniques such as Artificial Intelligence techniques, Machine Learning techniques and Deep Learning techniques, to detect age using age-based data, gender using gender-based data, emotional state (mood) using emotion-based data, and language accent using language-based data and Automatic Speech Recognizer (ASR) techniques, to further categorize the voice samples of the users into specific categories of emotional state. Furthermore, the current solutions also fail to provide solutions for identification and categorization of non-voice sample contextual information received from the smart devices and/or sensors, to provide an emotional interactive response to the voice commands initiated by the users. The current solutions fail to detect the emotional state of the users based on sensor information collected from surrounding connected smart devices and/or sensors. Also, the current solutions fail to contextually modify/suit/fit a voice response content based on the emotional state of the user.
However, there are many independent solutions available for detecting age, gender, language, regional accent and/or emotional state (mood) based on human raw speech/voice data, but the above stated problems have not been solved by the current solutions in a multi-device environment for emotional voice responses. For instance, one known prior art provides a solution for recognizing emotion from speech based on age and gender using hierarchical models. Also, another known art discloses learning to identify gender from a raw speech signal. Furthermore, recognition of elderly speech and voice-driven document retrieval is also disclosed in one of the known arts. Also, a currently known solution provides a solution for automatic speaker, age-group and gender identification from children's speech. Furthermore, analysis of children's speech (duration, pitch and formants) is also disclosed in one of the known arts. Also, another known solution describes intelligent sound-based emotion decisions to adjust surrounding lighting. Further, another solution suggests a method for a speech recognition system to identify the language of a command so as to respond back in the same language. However, at present, there are no solutions to the existing problems as defined above for emotional voice trigger management for multiple smart devices through external/internal sensors to process the voice command intended for only one specific smart device with a customized voice based on the emotional state of the users. Also, these voice capable devices do not have an emotional voice identity where each device would speak with different emotions. There is no solution to handle scenarios where a user has multiple voice trigger enabled devices that accept different emotional voice commands and provide a unified user experience across all these devices with emotional voice output. There is no solution for a user voice preference with an emotional voice preference set on the user's personal voice enabled device, e.g. the user's personal smartphone device. Also, there is no solution where a user can set the emotional voice output preference and further customize the voice preference to have different emotional voice types and voice ages, e.g. child, teenage, adult, etc.
Hence, there is a need in the art to provide a solution to the existing problems as defined above for emotional voice trigger management for multiple devices through external/internal sensors to process the voice command intended for only one specific device with a customized voice based on the emotional state of the user. Therefore, there is a requirement to provide a novel system and method for providing an emotional interactive response to a voice command of a user, to further handle scenarios where a user has multiple voice trigger enabled devices that accept different emotional voice commands, and to provide a unified user experience across all these devices with emotional voice output.
The foregoing examples of the related art and limitations related herewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the figures.
SUMMARY OF THE DISCLOSURE
This section is provided to introduce certain objects and aspects of the present invention in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
In order to overcome at least some of the drawbacks mentioned in the previous section and those otherwise known to persons skilled in the art, an object of the present invention is to provide a method and system for providing an emotional interactive response to a voice command of a user. Also, another object of the present invention is to provide a solution for emotional voice trigger management for multiple devices through external/internal sensors to process voice command/s intended for only one specific device with a customized voice based on the emotional state of the user. Further, an object of the present invention is to provide a solution for voice capable devices with an emotional voice identity where each device would speak with different emotions. Another object of the present invention is to provide a solution to handle scenarios where a user has multiple voice trigger enabled devices that accept different emotional voice commands and to provide a unified user experience across all these devices with emotional voice output. Further, an object of the present invention is to provide a solution for a user voice preference with an emotional voice preference set on the user's personal voice enabled device, e.g. the user's personal smartphone device. Another object of the present invention is to provide a solution where a user can set the emotional voice output preference and further customize the voice preference to have different emotional voice types and voice ages, e.g. child, teenage, adult, etc. Further, an object of the present invention is to provide a solution that will help transform emotional voice capability on various devices that contain wireless communication capabilities including, but not limited to, Wi-Fi, Bluetooth, NB-IoT, 5G, etc. Also, an object of the present invention is to provide a solution that will help transform emotional voice capability on various devices using a secure backend cloud infrastructure where all sensor information (internal and external) collected by the devices is securely stored, categorized and processed using techniques such as complex Artificial Intelligence (AI) and Machine Learning (ML) techniques to derive emotional patterns and user preference recognition, and to provide an enhanced emotional user experience with emotional voice trigger management capability. Another object of the present invention is to provide a solution that will upgrade the existing available devices that lack the capability for voice commands with emotional voices, so as to understand a user voice command with a new emotional voice output. One more object of the present invention is to provide a solution that will upgrade the current dumb devices with the capability of emotional voice commands based on the person's presence and preference, to provide better emotional voice command experiences. Further, an object of the present invention is to provide a mechanism which provides a seamless enhancement of existing devices for emotional voice command services in the user devices, independent and interoperable for devices and the IoT on 6G/5G/4G/3G/EV-DO/eHRPD capable technology. Also, another object of the present invention is to add value to the user for content services and deliver interactive advertisements in the emotional voice command and emotional voice output services in the user devices. Yet another object of the present invention is to add value with input-rich information and technology-rich digital content, serving the dual purpose of enhancing user experience with low input cost and reducing the ecological burden of adding additional devices for such functionality.
Furthermore, in order to achieve the aforementioned objectives, the present invention provides a method and system for providing an emotional interactive response to a voice command of a user.
A first aspect of the present invention relates to the method for providing an emotional interactive response to a voice command of a user. The method encompasses receiving, at a transceiver unit of a cloud server unit from a first target device, the voice command of the user. The method thereafter comprises identifying, by an identification unit of the cloud server unit, age of the user, gender of the user and emotional state of the user based on the received voice command of the user. Further the method leads to categorizing, by a processing unit of the cloud server unit, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user. The method thereafter encompasses identifying, by the identification unit of the cloud server unit, one or more usage pattern associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user. Further the method comprises generating, by the processing unit of the cloud server unit, the emotional interactive response to the voice command of the user based at least on the one or more usage pattern and the one or more categories of the emotional state. The method comprises providing, by the processing unit of the cloud
server unit, the generated emotional interactive response to the voice command of the user via at least one of one or more voice capable devices and one or more non voice capable devices present in vicinity of the user.
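Purely for illustration, and without limiting the claimed method in any way, the sequence of steps of this aspect may be sketched as follows in Python. Every function and class name below is a hypothetical placeholder standing in for the corresponding unit or model of the cloud server unit, and the stub return values are invented solely so that the sketch is self-contained.

from dataclasses import dataclass

@dataclass
class UserProfile:
    age: int
    gender: str
    emotional_state: str  # e.g. "happy", "sad"

def identify_user_profile(voice_samples):
    # Stub for the identification unit (age/gender/emotional state detection).
    return UserProfile(age=20, gender="male", emotional_state="happy")

def categorize_samples(voice_samples, profile):
    # Stub: categorize voice samples into categories of the emotional state.
    return [f"{profile.gender}-{profile.age}-{profile.emotional_state}"]

def identify_usage_patterns(contextual, non_voice_contextual):
    # Stub: derive usage patterns from contextual and non-voice contextual data.
    return ["prefers_calm_adult_voice"]

def generate_emotional_response(categories, usage_patterns, command_text):
    # Stub: generate the emotional interactive response.
    return f"[{categories[0]} | {usage_patterns[0]}] response to: {command_text}"

def handle_voice_command(command_text, voice_samples):
    profile = identify_user_profile(voice_samples)            # identify age, gender, emotional state
    categories = categorize_samples(voice_samples, profile)   # categorize the voice samples
    patterns = identify_usage_patterns({}, {})                # identify usage pattern/s
    response = generate_emotional_response(categories, patterns, command_text)
    return response  # delivered via a voice and/or non-voice capable device near the user

print(handle_voice_command("Play happy songs", voice_samples=[]))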
Another aspect of the present invention relates to a system for providing an emotional interactive response to a voice command of a user. The system comprises a transceiver unit, configured to receive at a cloud server unit from a first target device, the voice command of the user. The system further comprises an identification unit, configured to identify at the cloud server unit, age of the user, gender of the user and emotional state of the user based on the received voice command of the user. Further the system comprises a processing unit configured to categorize at the cloud server unit, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user. The identification unit of the system is further configured to identify at the cloud server unit, one or more usage pattern associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user. Also, the processing unit of the system is further configured to generate at the cloud server unit, the emotional interactive response to the voice command of the user based at least on the one or more usage pattern and the one or more categories of the emotional state. Thereafter, the processing unit is configured to provide by the cloud server unit, the generated emotional interactive response to the voice command of the user, via at least one of one or more voice capable devices and one or more non voice capable devices present in vicinity of the user.
Another aspect of the present invention relates to a user equipment for providing an emotional interactive response to a voice command of a user. The user equipment comprises a system, wherein the system comprises a transceiver unit, configured to receive from a first target device, the voice command of the user. The system further comprises an identification unit, configured to identify, age of the user, gender of the user and emotional state of the user based on the received voice command of the user. Further the system comprises a processing unit configured to categorize, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user. The identification unit of the system is further configured to
identify, one or more usage pattern associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user. Also, the processing unit of the system is further configured to generate, the emotional interactive response to the voice command of the user based at least on the one or more usage pattern and the one or more categories of the emotional state. Thereafter, the processing unit is configured to provide, the generated emotional interactive response to the voice command of the user, via at least one of one or more voice capable devices and one or more non voice capable devices present in vicinity of the user.
Yet another aspect of the present invention relates to a method for providing an emotional interactive response to a voice command of a user. The method encompasses receiving, at a transceiver unit of a user equipment from a first target device, the voice command of the user. The method thereafter comprises identifying, by an identification unit of the user equipment, age of the user, gender of the user and emotional state of the user based on the received voice command of the user. Further the method encompasses categorizing, by a processing unit of the user equipment, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user. The method thereafter comprises identifying, by the identification unit of the user equipment, one or more usage pattern associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user. Further the method leads to generating, by the processing unit of the user equipment, the emotional interactive response to the voice command of the user based at least on the one or more usage pattern and the one or more categories of the emotional state. The method thereafter encompasses providing, by the processing unit of the user equipment, the generated emotional interactive response to the voice command of the user via at least one of one or more voice capable devices and one or more non voice capable devices present in vicinity of the user.
BRIEF DESCRIPTION OF DRAWINGS
The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings.
Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components, electronic components or circuitry commonly used to implement such components.
Figure 1 illustrates an exemplary block diagram of a system [100] for providing an emotional interactive response to a voice command of a user, in accordance with exemplary embodiments of the present invention.
Figure 2 illustrates an exemplary network architecture diagram [200] for providing an emotional interactive response to a voice command of a user, in accordance with exemplary embodiments of the present invention.
Figure 3 illustrates an exemplary method flow diagram [300], depicting a method for providing an emotional interactive response to a voice command of a user, in accordance with exemplary embodiments of the present invention.
Figure 4 (i.e. Figure 4a, Figure 4b and Figure 4c) illustrates an exemplary flow diagram, depicting an instance implementation of an exemplary process for providing an emotional voice response in multi voice environment, in accordance with exemplary embodiments of the present invention.
The foregoing shall be more apparent from the following more detailed description of the disclosure.
DESCRIPTION OF THE INVENTION
In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address
only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth.
Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a sequence diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a machine-readable medium. A processor(s) may perform the necessary tasks.
The term “machine-readable storage medium” or “computer-readable storage medium” includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other mediums capable of storing, containing, or carrying instruction(s) and/or data. A machine-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as compact disk (CD) or digital versatile disk (DVD), flash memory, memory or memory devices. A computer-program product may include code and/or machine-executable
instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive—in a manner similar to the term “comprising” as an open transition word— without precluding any additional or other elements.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements,
components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
The term "data" as used herein means any indicia, signals, marks, symbols, domains, symbol sets, representations, and any other physical form or forms representing information, whether permanent or temporary, whether visible, audible, acoustic, electric, magnetic, electromagnetic or otherwise manifested. The term "data" as used to represent predetermined information in one physical form shall be deemed to encompass any and all representations of corresponding information in a different physical form or forms.
The terms "media data" and "media" as used herein mean data which is widely accessible, whether over-the-air, or via cable, satellite, network, internetwork (including the Internet), print, displayed, distributed on storage media, or by any other means or technique that is humanly perceptible, without regard to the form or content of such data, and including but not limited to audio, video, audio/video, text, images, animations, databases, broadcasts, displays (including but not limited to video displays, posters and billboards), signs, signals, web pages, print media and streaming media data.
The terms "reading" and "read" as used herein mean a process or processes that serve to recover data that has been added to, encoded in, combined with or embedded in, media data.
The term "database" as used herein means an organized body of related data, regardless of the manner in which the data or the organized body thereof is represented. For example, the organized body of related data may be in the form of one or more of a table, a map, a grid, a packet, a datagram, a frame, a file, an e-mail, a message, a document, a report, a list or in any other form.
The terms "first", "second", "primary" and "secondary" are used to distinguish one element, set, data, object, step, process, function, activity or thing from another, and are not used to designate relative position, or arrangement in time or relative importance, unless otherwise stated explicitly. The terms "coupled", "coupled to", and "coupled with" as used herein each mean a relationship between or among two or more devices, apparatus, files, circuits, elements, functions, operations, processes, programs, media, components, networks, systems, subsystems, and/or means, constituting any one or more of (a) a connection,
whether direct or through one or more other devices, apparatus, files, circuits, elements, functions, operations, processes, programs, media, components, networks, systems, subsystems, or means, (b) a communications relationship, whether direct or through one or more other devices, apparatus, files, circuits, elements, functions, operations, processes, programs, media, components, networks, systems, subsystems, or means, and/or (c) a functional relationship in which the operation of any one or more devices, apparatus, files, circuits, elements, functions, operations, processes, programs, media, components, networks, systems, subsystems, or means depends, in whole or in part, on the operation of any one or more others thereof.
The terms "communicate," and "communicating'' and as used herein include both conveying data from a source to a destination, and delivering data to a communications medium, system, channel, network, device, wire, cable, fiber, circuit and/or link to be conveyed to a destination and the term "communication" as used herein means data so conveyed or delivered. The term "communications" as used herein includes one or more of a communications medium, system, channel, network, device, wire, cable, fiber, circuit and link.
Moreover, terms like “user equipment” (UE), “electronic device”, “mobile station”, “user device”, “mobile subscriber station,” “access terminal,” “terminal,” “smartphone,” “smart computing device,” “smart device”, “device”, “handset,” and similar terminology refers to any electrical, electronic, electro-mechanical equipment or a combination of one or more of the above devices. Smart computing devices may include, voice and non-voice capable devices such as including but not limited to, a mobile phone, smart phone, virtual reality (VR) devices, augmented reality (AR) devices, pager, laptop, a general-purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, smart set top box (STB), smart speaker, smart fitness band, smart watches, or any other computing device as may be obvious to a person skilled in the art and required to implement the features of the present invention. In general, a smart computing device is a digital, user configured, computer networked device that can operate autonomously. A smart computing device is one of the appropriate systems for storing data and other private/sensitive information. The said device may operate at all the seven levels of ISO reference model, but the primary function is related to the application layer along with the network, session and presentation layer with any additional features of a touch screen, apps ecosystem, physical and biometric
security, etc. Further, a 'smartphone' is one type of "smart computing device" that refers to a mobility wireless cellular connectivity device that allows end-users to use services on 2G, 3G, 4G, 5G and/or the like mobile broadband Internet connections with an advanced mobile operating system which combines features of a personal computer operating system with other features useful for mobile or handheld use. These smartphones can access the Internet, have a touchscreen user interface, can run third-party apps including the capability of hosting online applications and music players, and are camera phones possessing high-speed mobile broadband 4G LTE internet with video calling, hotspot functionality, motion sensors, mobile payment mechanisms and enhanced security features with alarm and alert in emergencies. Mobility devices may include smartphones, wearable devices, smart-watches, smart bands, wearable augmented devices, etc. For the sake of specificity, the term mobility device refers to both feature phones and smartphones in this disclosure, but this does not limit the scope of the disclosure, which may extend to any mobility device in implementing the technical solutions. The above smart devices, including the smartphone as well as the feature phone, IoT devices and the like, enable communication on the devices. Furthermore, the foregoing terms are utilized interchangeably in the subject specification and related drawings.
As used herein, a “processor” or “processing unit” includes one or more processors, wherein processor refers to any logic circuitry for processing instructions. A processor may be a general-purpose processor, a special-purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, a low-end microcontroller, Application Specific Integrated Circuits, Field Programmable Gate Array circuits, any other type of integrated circuits, etc. Furthermore, the term "processor" as used herein includes, but is not limited to one or more computers, hardwired circuits, signal modifying devices and systems, devices and machines for controlling systems, central processing units, programmable devices and systems, systems on a chip, systems comprised of discrete elements and/or circuits, state machines, virtual machines, data processors, processing facilities and combinations of any of the foregoing. The processor may perform signal coding data processing, input/output processing, and/or any other functionality that enables the working of the system according to the present disclosure. More specifically, the processor or processing unit is a hardware processor. The term "processor" as used herein means
processing devices, apparatus, programs, circuits, components, systems and subsystems, whether implemented in hardware, tangibly-embodied software or both, and whether or not programmable.
As used herein, “memory unit”, “storage unit” and/or “memory” refers to a machine or computer-readable medium including any mechanism for storing information in a form readable by a computer or similar machine. For example, a computer-readable medium includes read-only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices or other types of machine-accessible storage media. The memory unit as used herein is configured to retain data, whether on a temporary or permanent basis, and to provide such retained data to various units to perform their respective functions.
As used herein the “Transceiver Unit” may include but not limited to a transmitter to transmit data to one or more destinations and a receiver to receive data from one or more sources. Further, the Transceiver Unit may include any other similar unit obvious to a person skilled in the art, to implement the features of the present invention. The transceiver unit may convert data or information to signals and vice versa for the purpose of transmitting and receiving respectively.
The present invention provides a novel system and method for providing an emotional interactive response to a voice command of a user. More particularly, the present invention provides a solution to process the voice command of the user to detect the emotional state of the user and use this information, along with other contextual data, to deliver a smart emotional interactive response to the voice command in a multi-voice capable environment via wireless connectivity along with directly connected sensors. Furthermore, the present invention provides a solution to overcome the challenges in scenarios where a voice trigger command with an emotional voice trigger is intended for multiple devices (i.e. voice capable devices), by using external/internal sensors to process the voice command with a customized voice based on the emotional state of the user. Also, the present invention provides a solution to the existing problems as defined above for emotional voice trigger management for multiple devices through external/internal sensors to process the voice command intended for only one specific device (i.e. voice capable device) with a customized voice based on the emotional state of the user. Also, the present invention provides a solution for voice capable devices with an emotional voice identity where each device would speak with different emotions. The present invention also provides a solution to handle scenarios where the user has multiple voice trigger enabled devices (i.e. voice capable devices) that accept different emotional voice commands and to provide a unified user experience across all these devices with emotional voice output. The present disclosure provides a solution for a user voice preference with an emotional voice preference set on the user's personal voice enabled device, e.g. the user's personal smartphone device. Also, the present disclosure provides a solution where a user can set the emotional voice output preference and further customize the voice preference to have different emotional voice types and voice ages, e.g. child, teenage, adult, etc.
Furthermore, the present invention, in order to provide the above solutions, encompasses providing the emotional interactive response to the voice command of the user. To provide the emotional interactive response to the voice command of the user, the present invention encompasses categorizing one or more voice samples of the received voice command of the user into one or more categories of emotional state based on an identified age of the user, an identified gender of the user and an identified emotional state of the user. Also, the present invention further encompasses identifying one or more usage patterns associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user. The present invention thereafter comprises generating the emotional interactive response to the voice command of the user based at least on the one or more usage patterns and the one or more categories of the emotional state. Furthermore, to provide the generated emotional interactive response, the present invention also encompasses identifying one or more best devices available to communicate back to the user and one or more best devices available to perform an action while responding back to the user. In an event, the one or more best devices to communicate back to the user and the one or more best devices available to perform the action while responding back to the user may be the same or different, based on a location of the user and the one or more devices on which the action is required to be performed. Also, to provide the generated emotional interactive response, the present invention encompasses temporarily setting a user's voice preference on the one or more devices identified to deliver the speech, in order to communicate back to the user based at least on the user's voice preference (i.e. the user's usage pattern).
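As a non-limiting illustration of the device selection and temporary voice preference described in the preceding paragraph, a minimal Python sketch is given below. The device names, attribute names and preference strings are hypothetical placeholders and do not denote any actual implementation of the invention.

from dataclasses import dataclass

@dataclass
class Device:
    name: str
    location: str
    can_speak: bool
    can_perform: bool
    voice_preference: str = "default"

def choose_response_devices(devices, user_location, action_device_hint):
    # Best device to communicate back: a voice capable device co-located with the user.
    speaker = next((d for d in devices if d.can_speak and d.location == user_location), None)
    # Best device to perform the action: may be the same device or a different one.
    actor = next((d for d in devices if d.can_perform and d.name == action_device_hint), speaker)
    return speaker, actor

def apply_temporary_voice_preference(device, preference):
    # Temporarily set the user's voice preference (e.g. a calm adult voice)
    # on the device identified to deliver the speech back to the user.
    device.voice_preference = preference

devices = [
    Device("smart_speaker_livingroom", "living room", can_speak=True, can_perform=False),
    Device("smart_bulb_livingroom", "living room", can_speak=False, can_perform=True),
]
speaker, actor = choose_response_devices(devices, "living room", "smart_bulb_livingroom")
apply_temporary_voice_preference(speaker, "calm adult female")
print(speaker.name, "->", speaker.voice_preference, "| action performed on:", actor.name)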
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry out the present disclosure.
Referring to Figure 1, an exemplary block diagram of a system [100] for providing an emotional interactive response to a voice command of a user, in accordance with exemplary embodiments of the present invention is shown.
The system [100] comprises, at least one transceiver unit [102], at least one identification unit [104], at least one processing unit [106] and at least one storage unit [108]. Also, all of the components/ units of the system [100] are assumed to be connected to each other unless otherwise indicated below. In an implementation the system [100] resides at a cloud server unit and in another implementation the system [100] is connected to the cloud server unit, to implement features of the present invention. Also, in Fig. 1 only a few units are shown, however, the system [100] may comprise multiple such units or the system [100] may comprise any such numbers of said units, as required to implement the features of the present disclosure.
The system [100], is configured to provide an emotional interactive response to a voice command of a user, with the help of the interconnection between the components/ units of the system [100].
The processing unit [106] of the system [100] is configured to register at the cloud server unit, one or more smart devices (i.e. one or more voice capable devices and one or more non voice capable devices) present in the vicinity of a user. Further, a device capability for each of the one or more voice capable devices and the one or more non voice capable devices is published and listed within the cloud server unit by the processing unit [106]. The device capability includes one or more parameters indicating capabilities of each smart device (i.e. each of the one or more voice capable devices and the one or more non voice capable devices), such as including but not limited to one or more quality related parameters, one or more signal to noise ratio related parameters, one or more connectivity related parameters, one or more location related parameters, one or more device identification parameters and other such parameters indicating capabilities of a device. Also, in an implementation, the processing unit [106] is also configured to register and store at the cloud server unit, capabilities of all smart devices (including voice trigger enabled devices and non-voice capable devices) belonging to a same user. Further, in an implementation the capabilities of a smart device may include but are not limited to:
a. User voice preference – the customized voice preference set on the user's personal voice enabled device, e.g. the user's personal smartphone device.
b. For non-voice capable devices – type of device, location, features supported, etc.
c. For voice enabled/capable devices – type of microphone used (single channel or multi-channel array), type of multimedia speaker used (mono, stereo, single or multi-speaker environment, etc.), display capabilities, and/or location of the device, etc.
Furthermore, the capability for the voice capable devices may also include, but is not limited to, an ability to detect trigger or non-trigger based user voice commands, compute information like signal-to-noise ratio, audio signal amplitude, etc., detect proximity based on speech utterance and/or on sensors (proximity, camera, etc.), detect the voice capable device's current state, and transmit this information (extra data/capabilities related information) along with audio data to the cloud server unit whenever a trigger or a command is received.
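By way of example only, a capability record of the kind published and listed within the cloud server unit might resemble the following Python sketch. The field names and values are purely illustrative assumptions and are not prescribed by this specification.

import json

def build_capability_record(device_id, voice_capable, **extra):
    # Hypothetical capability payload a smart device could publish on registration.
    return {
        "device_id": device_id,
        "voice_capable": voice_capable,
        "location": extra.get("location"),                        # e.g. GPS fix or room label
        "features": extra.get("features", []),                    # e.g. display, speaker type
        "microphone": extra.get("microphone"),                    # single or multi-channel array
        "user_voice_preference": extra.get("user_voice_preference"),
    }

stb = build_capability_record(
    "smart_stb_01", voice_capable=True,
    location="living room", microphone="multi-channel",
    features=["display", "stereo speaker"],
    user_voice_preference="adult female, cheerful",
)
print(json.dumps(stb, indent=2))  # payload registered/listed within the cloud server unit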
Further, once the one or more voice capable devices and the one or more non voice capable devices are registered on the cloud server unit, the transceiver unit [102] is configured to receive at the cloud server unit from a first target device, a voice command of the user. The first target device is a voice capable device identified from the one or more voice capable devices present in the vicinity of the user by the processing unit [106], to receive the voice command of the user. The first target device is identified based on the device capability of the one or more voice capable devices. More particularly, the first target device is identified based on at least one of a real time GPS location data associated with the one or more voice capable devices, a device ID of the one or more voice capable devices, a voice command trigger timestamp associated with the one or more voice capable devices, a signal to noise ratio (SNR) associated with the one or more voice capable devices, a Quality of Service (QoS) parameter associated with the one or more voice capable devices and other such parameters indicating one or more capabilities of the one or more voice capable devices. For example, whenever a user triggers a voice command, all the nearby voice capable devices are configured to transmit a trigger and the voice command to the transceiver unit [102] of the cloud server unit along with one or more parameters indicating the device capability of all the nearby voice capable devices, such as current GPS location, device ID, trigger timestamp, SNR, etc. Thereafter, the processing unit [106] at the cloud server unit is configured to check if the trigger and the voice command are coming from multiple voice capable devices at a same location, based on the one or more parameters indicating the device capability of all the nearby voice capable devices. Also, the processing unit [106] is configured to identify if the trigger and the voice command are coming from a same user using one or more voice recognition techniques. Once it is identified that the trigger and the voice command are coming from multiple voice capable devices at the same location and also from the same user, the processing unit [106] is configured to evaluate the one or more parameters indicating the device capability of each voice capable device from all the nearby voice capable devices (such as associated Quality of Service (QoS) parameters, signal to noise ratio (SNR) and the like), to further decide which is the best device (i.e. the first target device) to continue receiving the user's voice commands. For instance, the processing unit [106] may identify a voice capable device with the best microphone and good QoS with a very low SNR as the first target device. Also, in an implementation, for a particular voice command of the user, any voice capable device present near the user can be selected as the first target device based on the real time device capability of the one or more nearby voice capable devices.
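The following minimal Python sketch illustrates, without limitation, how trigger reports from multiple nearby voice capable devices might be ranked to pick the first target device. The data fields, the scoring order (favouring higher QoS and higher SNR, then the earlier trigger) and the example values are assumptions made only for this illustration and do not define the selection logic of the invention.

from dataclasses import dataclass

@dataclass
class TriggerReport:
    device_id: str
    gps: tuple        # (latitude, longitude) reported with the trigger
    timestamp: float  # trigger timestamp
    snr_db: float     # signal-to-noise ratio of the captured audio
    qos_score: float  # quality parameter reported by the device (0..1)

def select_first_target(reports):
    # Assumes the cloud server unit has already grouped reports coming from
    # the same user at the same location; one device is then chosen to keep
    # receiving the user's voice commands.
    return max(reports, key=lambda r: (r.qos_score, r.snr_db, -r.timestamp))

reports = [
    TriggerReport("smart_tv", (19.07, 72.87), 1000.2, snr_db=12.0, qos_score=0.7),
    TriggerReport("smart_speaker", (19.07, 72.87), 1000.1, snr_db=21.5, qos_score=0.9),
]
print(select_first_target(reports).device_id)  # -> smart_speaker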
Further, after the receipt of the voice command of the user from the first target device at the cloud server unit, the identification unit [104] is configured to identify at the cloud server unit, the age of the user, the gender of the user and the emotional state (mood) of the user based on the received voice command of the user. Each of the age of the user, the gender of the user and the emotional state of the user is further identified based on a pre-trained data set. Also, in an implementation each of the age of the user, the gender of the user and the emotional state of the user is identified using one or more Artificial Intelligence techniques. Furthermore, the pre-trained data set is trained based at least on a plurality of data associated with the age, gender, emotional state and language of a plurality of users. More particularly, the pre-trained data set may include, but is not limited to, an age-based pre-trained data set, a gender-based pre-trained data set, an emotion-based pre-trained data set and a language-based pre-trained data set. Furthermore, in an instance, if a voice command of a user is received at the cloud server unit from the first target device, the processing unit [106] is configured to analyze
such voice command based on the pre-trained data set and, based on such analysis, the processing unit [106] is further configured to identify the age of the user, the gender of the user and the emotional state (mood) of the user. Also, in an implementation the emotional state of the user is further identified based on the identified age of the user and the identified gender of the user. More particularly, in such an implementation the processing unit [106] is configured to identify the emotional state of the user based on an analysis of the age of the user and the gender of the user based on the pre-trained data set. For example, if the identified age of the user is 20 years and the identified gender of the user is Male, the processing unit [106] is configured to identify the emotional state (happy, sad, etc.) of said user based on an analysis of the age 20 years and the gender Male based on the pre-trained data set. Furthermore, in an implementation one or more artificial intelligence, machine learning and deep learning techniques are used to detect the age of the user using the age-based pre-trained data set, the gender of the user using the gender-based pre-trained data set, the emotional state (mood) of the user using the emotion-based pre-trained data set and the language accent of the user using automatic speech recognition (ASR) techniques and the language-based pre-trained data set. Also, in an implementation, once the age is detected using the age-based pre-trained data set and the gender is detected using the gender-based pre-trained data set, a speech data (i.e. the voice command related data) associated with the determined age and gender is then analyzed by the processing unit [106] based at least on an age and/or gender specific emotional pre-trained data set, where gender-specific, age-based deep learning emotion techniques are used to derive the emotional state of the user.
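Purely as an illustrative, non-limiting Python sketch of the staged identification described above (age first, then gender, then a gender-specific, age-based emotion model), and assuming simple placeholder classifiers stand in for the pre-trained data sets, the flow may be expressed as follows; all function names here (detect_age, detect_gender, identify_user_state) are hypothetical.

from typing import Callable, Dict

def detect_age(voice_sample: bytes) -> int:
    # Placeholder for inference against the age-based pre-trained data set.
    return 20

def detect_gender(voice_sample: bytes) -> str:
    # Placeholder for inference against the gender-based pre-trained data set.
    return "male"

def age_group(age: int) -> str:
    # Bucket the age into 5-year groups, e.g. 20 -> "20-24".
    low = (age // 5) * 5
    return f"{low}-{low + 4}"

def _emotion_model(group: str, gender: str) -> Callable[[bytes], str]:
    # Placeholder lookup of a gender-specific, age-based emotion model
    # derived from the emotion-based pre-trained data set.
    def model(voice_sample: bytes) -> str:
        return "happy"
    return model

def identify_user_state(voice_sample: bytes) -> Dict[str, str]:
    """Identify age, gender and emotional state from one voice command sample."""
    age = detect_age(voice_sample)
    gender = detect_gender(voice_sample)
    # The emotional state is derived with a model selected by the already
    # identified age group and gender, as described above.
    emotion = _emotion_model(age_group(age), gender)(voice_sample)
    return {"age": str(age), "gender": gender, "emotional_state": emotion}

if __name__ == "__main__":
    print(identify_user_state(b"\x00\x01"))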
Furthermore, the processing unit [106] is also configured to collect at the cloud server unit, one or more sensor data from one or more sensor devices connected in the vicinity of the user. The one or more sensor devices may include, but are not limited to, one or more microphone sensors, proximity sensors, ultrasonic sensors, camera sensors, temperature sensors, motion sensors, optical sensors, infrared (IR) sensors, pressure sensors, light sensors and/or the like sensors. In an implementation, the one or more sensor data may also be collected from surrounding connected smart devices (like a Wi-Fi mesh) including, but not limited to, the set temperature (via smart thermostat), lighting condition (via smart bulb), background music (via smart music box), TV channel / over the top (OTT) app content played (via Smart STB), etc. Further, the identification unit [104] is also configured to identify
at the cloud server unit, one or more contextual data associated with the user based on at least one of the received voice command of the user, the identified age of the user, the identified gender of the user, the identified emotional state of the user and the one or more sensor data. Also, the identification unit [104] is configured to identify at the cloud server unit, one or more non-voice contextual data associated with the user based at least on a user device data. In an instance, the non-voice contextual information may include, but is not limited to, the date & time of an event associated with a user, the month & year of an event associated with the user, a historical user data, current and upcoming user engagements (integration with e-mail, calendar and other apps on the user’s smartphone) and the like information identified based on the user device data. In an event, the one or more non-voice contextual data associated with the user may be identified based on the one or more sensor data, such as a camera sensor data indicating current or past events attended by the user, etc.
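As a minimal, non-limiting Python sketch of how the contextual and non-voice contextual data described above might be grouped at the cloud server unit, the following data structures are illustrative only; the type names and fields (ContextSnapshot, NonVoiceContext, sensor_data, upcoming_engagements) are hypothetical.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ContextSnapshot:
    """Contextual data derived from the voice command, identified traits and sensor data."""
    emotional_state: str
    age: int
    gender: str
    sensor_data: Dict[str, str] = field(default_factory=dict)  # e.g. {"thermostat": "24C"}

@dataclass
class NonVoiceContext:
    """Non-voice contextual data derived from user device data (calendar, e-mail, history)."""
    event_datetime: str = ""
    upcoming_engagements: List[str] = field(default_factory=list)
    historical_preferences: List[str] = field(default_factory=list)

# Example population (all values are illustrative only):
ctx = ContextSnapshot("happy", 22, "female", {"smart_bulb": "warm_white", "thermostat": "24C"})
nv_ctx = NonVoiceContext("2023-06-01T19:00", ["evening music show"], ["regional songs"])
print(ctx, nv_ctx)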
Further, the processing unit [106] is configured to categorize at the cloud server unit, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user. For example, if an identified age of a user is 22 years, an identified gender of the user is Female and an identified emotional state of the user is a laughing state, then the processing unit [106] in such example is configured to categorize at the cloud server unit, one or more voice samples of the voice command of the user into a happy category as a category of the emotional state. Furthermore, the categorization of the one or more voice samples of the received voice command of the user into the one or more categories of the emotional state is further based on the pre-trained data set. Also, each of the one or more categories of the emotional state is one of an age based category of the emotional state, a gender based category of the emotional state and an emotional state based category of the emotional state. Further, considering the above example, to categorize at the cloud server unit, the one or more voice samples of the voice command of the user into the happy category as the category of the emotional state, the processing unit is configured to analyze the identified age of the user i.e. 22 years, the identified gender of the user i.e. Female and the identified emotional state of the user i.e. the laughing state, based on the pre-trained data set. Also, in the given example, the happy category is the emotional state based category of the emotional state. Furthermore, in an
instance, the category of the emotional state may further be defined based on the age of the user and the gender of the user as the age based category of the emotional state (for example a 20-25 years age based category of the emotional state) and the gender based category of the emotional state (for example a Male gender based category of the emotional state), respectively. Also, the processing unit [106] in an implementation is also configured to categorize at the cloud server unit, the one or more voice samples of the received voice command of the user into the one or more categories of the emotional state based on at least one of the one or more sensor data and the one or more non-voice contextual data. For example, to categorize a voice sample into a category of the emotional state, a non-voice contextual information such as user details indicating the age and gender of a user, and a sensor data such as an input of a microphone sensor indicating a sad sound (i.e. emotional state) of the user, etc. can further be used to categorize the voice sample into an emotional (sad) category as the category of the emotional state.
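The categorization of a voice sample into age based, gender based and emotion based categories may be sketched as follows; this is an illustrative Python example only, and the mapping table and function name (EMOTION_CATEGORY, categorize_voice_sample) are hypothetical rather than the claimed pre-trained data set.

# Hypothetical coarse mapping from identified emotional states to emotion categories.
EMOTION_CATEGORY = {
    "laughing": "happy", "smiling": "happy",
    "crying": "sad", "unhappy": "sad",
}

def categorize_voice_sample(age: int, gender: str, emotional_state: str,
                            sensor_hint: str = "") -> dict:
    """Return the age-, gender- and emotion-based categories for one voice sample."""
    # A sensor or non-voice contextual hint, when present, may refine the state.
    state = sensor_hint or emotional_state
    low = (age // 5) * 5
    return {
        "age_based": f"{low}-{low + 4} years",
        "gender_based": gender,
        "emotion_based": EMOTION_CATEGORY.get(state, state),
    }

print(categorize_voice_sample(22, "female", "laughing"))
# {'age_based': '20-24 years', 'gender_based': 'female', 'emotion_based': 'happy'}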
Further, the identification unit [104] is also configured to identify at the cloud server unit, one or more usage patterns associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user. For example, if a user prefers to listen to regional songs in the evening, a usage pattern of such listening to regional songs in the evening can be determined based on a contextual data (such as determined based on a voice sample indicating a command to play a regional song, a determined emotional state of the user, for instance a happy mood, and a sensor data such as a camera sensor indicating that the user is present in the vicinity of a TV telecasting regional songs) and a non-voice contextual data (such as an email/notification) indicating an evening time when an event of such regional songs is telecasted.
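One simple, non-limiting way to derive such usage patterns is to count repeated (activity, time-of-day) pairs drawn from the contextual and non-voice contextual data; the Python sketch below is illustrative only and the function name identify_usage_patterns is hypothetical.

from collections import Counter
from typing import List, Tuple

def identify_usage_patterns(events: List[Tuple[str, str]], min_count: int = 2) -> List[str]:
    """Derive usage patterns as (activity, time-of-day) pairs observed repeatedly.

    `events` mixes voice contextual data (e.g. commands issued) and non-voice
    contextual data (e.g. calendar entries), each tagged with a time of day.
    """
    counts = Counter(events)
    return [f"{activity} in the {tod}" for (activity, tod), n in counts.items() if n >= min_count]

history = [("play regional songs", "evening"),
           ("play regional songs", "evening"),
           ("watch news", "morning")]
print(identify_usage_patterns(history))   # ['play regional songs in the evening']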
Also, the processing unit [106] is further configured to generate at the cloud server unit, the emotional interactive response to the voice command of the user based at least on the one or more usage patterns and the one or more categories of the emotional state. For example, if a user voice command to play a specific song ABC is received at the cloud server unit from the first target device, an emotional interactive response to said voice command is generated based on a usage pattern of the user associated with said received voice command (for example a usage pattern indicating that the user listens to songs similar to the song ABC at volume level 5) and a category of the emotional state associated with the voice command of the user (for example a happy category). Therefore, in the given example the
emotional interactive response may include an emotional voice response (for instance a happy voice response) – indicating starting song ABC at volume level 5. In an implementation the processing unit [106] is also configured to continue to implement the features of the present invention and to perform techniques such as automatic speech recognition (ASR), natural language processing (NLP) and natural language understanding (NLU) techniques to identify one or more voice commands of the user to generate an appropriate emotional interactive response even after providing the emotional interactive response to a particular voice command.
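The combination of a usage pattern and an emotion category into a response may be sketched as below; this Python example is illustrative only, and the function name generate_emotional_response and the returned fields (speech, tone, actions) are hypothetical.

def generate_emotional_response(command: str, usage_patterns: list, emotion_category: str) -> dict:
    """Compose an emotional interactive response from the command, the usage
    patterns and the category of the emotional state."""
    # Pull a preferred volume out of any matching usage pattern (illustrative parsing only).
    volume = 5
    for pattern in usage_patterns:
        if "volume level" in pattern:
            volume = int(pattern.rsplit("volume level", 1)[1])
    # Match the delivery style to the emotion category.
    tone = {"happy": "cheerful", "sad": "soothing"}.get(emotion_category, "neutral")
    return {
        "speech": f"Sure! Starting {command} at volume level {volume}.",
        "tone": tone,
        "actions": [{"set_volume": volume}],
    }

print(generate_emotional_response("song ABC",
                                  ["listens to similar songs at volume level 5"],
                                  "happy"))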
Furthermore, the processing unit [106] is also configured to determine at the cloud server unit, a voice preference for the emotional interactive response based on the one or more usage patterns and the one or more categories of the emotional state, wherein the voice preference further comprises at least one of a preferred voice type and a preferred voice age. The preferred voice type may include a voice of a male, a female, a robot, etc. and the preferred voice age may include any human age in years, such as 10 years, 20 years, 50 years, etc. More particularly, based on a category of the emotional state and other information such as a usage pattern identified based on contextual and/or non-voice contextual information, the processing unit [106] is configured to auto select a voice preference (such as a male or female voice of age 15 years or a robot voice) while delivering the response back to the user. Also, in an example the usage pattern may be identified based on a customized voice preference set on a user’s personal voice enabled device, e.g. the user’s personal smartphone device, etc. Furthermore, the processing unit [106] is configured to generate at the cloud server unit, the emotional interactive response to the voice command of the user based on the voice preference for the emotional response. More particularly, the processing unit [106] is configured to generate at the cloud server unit, the emotional interactive response to the voice command of the user based on the voice preference for the emotional response, the one or more usage patterns and the one or more categories of the emotional state.
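A minimal, non-limiting Python sketch of such voice preference auto-selection is given below, assuming a customized preference set on the user's personal device overrides the derived one; the function name select_voice_preference and the heuristics inside it are hypothetical and purely illustrative.

from typing import Optional

def select_voice_preference(emotion_category: str, age: int, gender: str,
                            custom_preference: Optional[dict] = None) -> dict:
    """Auto-select a preferred voice type and preferred voice age for the response."""
    if custom_preference:
        # A customized voice preference from the user's personal device takes priority.
        return custom_preference
    # Illustrative heuristics only: a contrasting, mature voice for sad users,
    # a peer-like voice otherwise.
    if emotion_category == "sad":
        return {"voice_type": "female" if gender == "male" else "male", "voice_age": 40}
    return {"voice_type": "male" if gender == "female" else "female",
            "voice_age": max(15, min(age, 60))}

print(select_voice_preference("happy", 20, "female"))   # {'voice_type': 'male', 'voice_age': 20}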
Further, the processing unit [106] is also configured to provide by the cloud server unit, the generated emotional interactive response to the voice command of the user, via at least one of one or more voice capable devices and one or more non-voice capable devices present in the vicinity of the user. Also, the processing unit [106] is further configured to modify via the cloud server unit, a surrounding environment of the user based on the emotional interactive
response. More particularly, the processing unit [106] is configured to identify at the cloud server unit, one or more second target devices from at least one of the one or more voice capable devices and the one or more non-voice capable devices, to provide the generated emotional interactive response by the one or more second target devices. Thereafter, the processing unit [106] is configured to modify via the cloud server unit, the surrounding environment of the user based on the emotional interactive response provided by the one or more second target devices. For example, once the emotional interactive response is generated, the processing unit [106] identifies one or more best devices (i.e. the one or more second target devices) to deliver the emotional response, based on the capabilities of all available nearby voice and non-voice capable devices (e.g. a device with the best speaker or display will get picked as a second target device based on the response type). The processing unit [106] then communicates to the one or more identified second target devices to deliver the response to the user. The processing unit [106] also sends extra parameters which include the determined voice preference for that given user. The one or more second target devices then respond back to the user using the voice preference applicable for that user. Also, the processing unit [106] communicates to the one or more second target devices (for example all surrounding smart device(s)) to modify the surrounding environment of the user to suit the mood of the user while delivering the response/emotional response. In an implementation, one or more devices other than the one or more second target devices may also be selected by the processing unit [106] to modify the surrounding environment of the user to suit the mood of the user while delivering the response/emotional interactive response by the one or more second target devices. For example, based on an emotional state (mood) of the user and a usage pattern of the user, the processing unit [106] is configured to set the modulation of the voice preference (male or female, robot voice) and/or to add/modify background music/video and background light to match the mood or to bring a positive change to the mood, while delivering the emotional interactive response back to the user.
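The selection of the second target device(s) and the environment-modification commands described above may be sketched as follows; this Python example is illustrative only, and the capability scores, function names (pick_second_target_devices, environment_commands) and command payloads are hypothetical.

from typing import Dict, List

def pick_second_target_devices(devices: Dict[str, Dict], response_type: str) -> List[str]:
    """Pick the best nearby device(s) to deliver the response, by capability score."""
    key = "display_score" if response_type == "visual" else "speaker_score"
    ranked = sorted(devices, key=lambda d: devices[d].get(key, 0), reverse=True)
    return ranked[:1]

def environment_commands(emotion_category: str) -> List[Dict]:
    """Commands for surrounding smart devices to suit (or lift) the user's mood."""
    if emotion_category == "sad":
        return [{"smart_bulb": "warm_dim"}, {"music_box": "play_calm_playlist"}]
    return [{"smart_bulb": "bright"}, {"music_box": "resume_background_music"}]

nearby = {"smart_speaker": {"speaker_score": 9, "display_score": 0},
          "smart_stb":     {"speaker_score": 6, "display_score": 10}}
print(pick_second_target_devices(nearby, "visual"))   # ['smart_stb']
print(environment_commands("sad"))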
Therefore, based on the user’s age, gender, emotional state (mood) and various types of contextual information while delivering an emotional interactive response back to the user, the system [100] is configured to contextually modify/suit/fit the emotional response. For example, in summer, based on the user’s age, gender, mood, a thermostat sensor reading and contextual information like weather data and the user’s eating habits, while responding to a user’s command like “find me nearest restaurants”, the processing unit [106] may auto filter air-conditioned (AC) restaurants of the user’s preferred cuisine. Also, in another implementation, based on the user’s age, gender, emotional state (mood), other contextual information and the one or more sensor data, the system [100] is configured to provide additional contextual information to the user while delivering the emotional response. For example, in summer, based on the user’s age, gender, mood, a thermostat sensor reading and contextual information like weather data and the user’s travel history, while responding to a user’s command like “how far is Pune from Mumbai”, the system [100] may provide the required information and add extra details of nearby resorts and hotels.
Furthermore, Figure 2 illustrates an exemplary network architecture diagram [200] for providing an emotional interactive response to a voice command of a user, in accordance with exemplary embodiments of the present invention.
The network architecture diagram [200] indicates that a cloud server unit [202] is connected to various smart devices such as a smartphone [206], a smart remote control [208], a smart set top box [210], a smart speaker [212] and other such voice capable and non-voice capable devices [214] via a communication interface (like Wi-Fi, Ethernet, IR, Bluetooth, BLE, NFC, and/or Cloud, etc.) [204]. Furthermore, Figure 2 also indicates that the cloud server unit [202] comprises the system [100]. Also, in Figure 2 only a few units are shown; however, the network architecture diagram [200] may comprise multiple such units or the network architecture diagram [200] may comprise any such number of said units, as required to implement the features of the present disclosure.
Moreover, Figure 2 provides an overall network architecture view of the voice capable devices (for instance [206], [208], [210], [212]) that are connected to the secured cloud server unit [202] (i.e. the computing backend for voice data processing). In an implementation, the smart devices of the network architecture [200] for managing emotional voice triggers (i.e. for providing an emotional interactive response to the voice command of the user) and for controlling other devices remotely via the cloud server unit [202], may be implemented in a typical home environment, i.e. a home having a multi voice capable device environment. For instance, the living room in such a typical home environment may contain a smart STB, a smart speaker and smart phone(s). The kitchen may contain a smart speaker and smart phone(s). One or more bedrooms may contain a smart STB, a smart
speaker and smart phone(s), and a study room may contain smart phone(s). Also, the components of the network architecture [200] are provided below:
The smart Set-Top-Box (smart STB) [210] receives, decodes and displays digital signals and also supports IP TV, gaming, etc. The smart STB has multiple communication interfaces like IR, Bluetooth, BLE, Wi-Fi, NFC, Cloud, etc. through which it connects to external devices within/outside a house/building including, but not limited to, thermostats, smart door locks, smart bulbs, smartphones, home surveillance systems, home automation systems, fitness bands, etc. In an implementation, a Wi-Fi mesh device connects all home sensors including, but not limited to, a thermostat, smart door lock, smart bulbs, smart switches, smart STB/TV, smartphones, home surveillance systems, home automation systems, etc. and has the capability to control these devices. The Remote-control unit (RCU) [208] includes sensors like an accelerometer, microphone, gyroscope, fingerprint sensor, etc. and connects to the STB [210] via IR (InfraRed), BLE (Bluetooth Low Energy) or any other communication mechanism that is able to transmit sensor data to the STB [210]. A Voice POD, when connected to the STB [210] (via USB), provides a voice trigger capability to the existing STB [210]. The other components in the network architecture diagram [200] are as follows:
1. A Smartphone with voice trigger capability [206].
2. A Smart Speaker with voice trigger capability [212].
3. Other voice capable and/or non-voice capable devices such as a Music Box which plays online music (via music applications) and has voice trigger capability [214].
The smart devices (i.e. the smartphone [206], the smart remote control [208], the smart set top box [210], the smart speaker [212] and other such voice capable and non-voice capable devices [214]) are connected with the cloud server unit [202] comprising the system [100]. Also, all sensor information (for instance internal and external sensor data collected by the smart devices) is securely stored, categorized and processed using complex artificial intelligence (AI) and machine learning (ML) modules to recognize voice patterns of user/s and usage patterns of user/s (for example user preference recognition) to provide a better user experience. In an implementation, the cloud server unit [202] may also have modules having capabilities such as natural language understanding (NLU) and natural language processing (NLP) for supporting all voice and non-voice capable devices. Also, the cloud server unit [202], comprising the system [100], has the capability to recognize the age, gender, language, regional accent and emotional state (mood) of a user based on human raw speech/voice sample data, to implement the features of the present invention.
Also, a few use cases based on the implementation of the features of the present invention are provided as below:
Use Case I –
Let’s assume ABC is 20 years old, female, lives in Mumbai, is in a happy mood and likes continental food.
If ABC said “Hello ‘A’, how is the weather in Mumbai today”, in a multi-voice capable device environment, the system [100] implemented at the cloud server unit is configured to receive a trigger command from multiple devices (e.g. Smart STB, Voice POD, Smartphone, Smart Speaker, etc.) at the same time, wherein ‘Hello A’ is a command assigned to activate the multiple devices. In an implementation, various other commands such as ‘Hello followed by a specific name’, ‘Hi followed by a specific name’ and/or the like may be configured to activate device(s) in the multi-voice capable device environment to receive/trigger voice command(s). Based on the use case, the system [100] decides that the Smart Speaker is to continue listening for the voice command and, once the command is received, the system [100] may decide to display the results on the TV via the Smart STB.
The system [100] to provide the emotional interactive response to the voice command “Hello A, how is the weather in Mumbai today”, encompasses
i. Deriving ABC’s age, gender and mood to set the preferred voice response. In this
case, it is a male teenage voice.
ii. From contextual information like – ABC being in Mumbai (current location), likes
eating continental food (eating habits), it is lunch time (time of day), etc., displaying relevant weather information of Mumbai along with providing information about nearby good continental restaurants and a background music/video content of the most famous hotel in Mumbai, while delivering the emotional interactive response back to ABC.
iii. From other sensor information like thermostat reading, background music/no-
music, etc., pausing the ongoing music and, while displaying the audio/video content on the TV via the STB, changing the lighting condition to suit the mood.
Use Case II–
Let’s assume XYZ is 50 years old, male, lives in Mumbai, is in a sad mood and has an interest in music concerts.
If XYZ said “Hello A how is the weather in Mumbai today”, in a multi-voice capable device environment, the system [100] at the cloud server unit receives the trigger command from multiple devices (e.g. Smart STB, Voice POD, Smartphone, Smart Speaker, etc.) at the same time, wherein ‘Hello A’ is a command assigned to activate the multiple devices. In an implementation, various other commands such as ‘Hello followed by a specific name’, ‘Hi followed by a specific name’ and/or the like may be configured to activate device(s) in the multi-voice capable device environment to receive/trigger voice command(s). Based on the above, the system [100] decides that the Smart Speaker is to continue listening for the voice command and, once the command is received, the system [100] may decide to display the results on the TV via the Smart STB.
The system [100] to provide the emotional interactive response to the voice command “Hello A how is the weather in Mumbai today”, encompasses
i. Determining XYZ’s age, gender and mood to set the preferred voice response.
In this case, it is a female voice with a deep/low pitch.
ii. From usage pattern/contextual information like – XYZ being in Mumbai
(current location), has an interest in music concerts (hobbies), it is morning time (time of day), etc., displaying relevant weather information of Mumbai along with providing information about nearby live music concerts and a background music/video content of famous music concerts being held in Mumbai, while delivering the response back to XYZ.
iii. From other sensor information like thermostat reading, background music/no-
music, etc., pausing the ongoing music and, while displaying the audio/video content on the TV via the STB, changing the lighting condition to suit the mood.
Also, an aspect of the present invention relates to a user equipment for providing an emotional interactive response to a voice command of a user. The user equipment comprises a system, wherein the system comprises a transceiver unit [102], configured to receive from a first target device, the voice command of the user. The system further comprises an identification unit [104], configured to identify, the age of the user, the gender of the user and the emotional state of the user based on the received voice command of the user. Further, the system comprises a processing unit [106] configured to categorize, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user. The identification unit [104] of the system is further configured to identify, one or more usage patterns associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user. Also, the processing unit [106] of the system is further configured to generate, the emotional interactive response to the voice command of the user based at least on the one or more usage patterns and the one or more categories of the emotional state. Thereafter, the processing unit [106] is configured to provide, the generated emotional interactive response to the voice command of the user, via at least one of one or more voice capable devices and one or more non-voice capable devices present in the vicinity of the user.
Referring to Figure 3, an exemplary method flow diagram [300], depicting a method for providing an emotional interactive response to a voice command of a user, in accordance with exemplary embodiments of the present invention is shown. In an implementation the method is performed at a cloud server unit, by a system [100] implemented at the cloud server unit. Also as shown in Figure 3, the method starts at step [302].
The method comprises registering via a processing unit [106], at the cloud server unit, one or more voice capable devices and one or more non-voice capable devices present in the vicinity of the user. Further, the method comprises publishing and listing by the processing unit [106], a device capability for each of the one or more voice capable devices and the one or more non-voice capable devices at the cloud server unit. The device capability includes one or more parameters indicating the capabilities of each smart device (i.e. each of the one or more voice capable devices and the one or more non-voice capable devices), including but not limited to one or more quality related parameters, one or more signal to noise ratio
related parameters, one or more connectivity related parameters, one or more location related parameters, one or more device identification parameters and other such parameters indicating capabilities of a device. Also, in an implementation, the method also encompasses registering and storing by the processing unit [106] at the cloud server unit, one or more capabilities of all smart devices (including voice trigger enabled devices and non-voice capable devices) belonging to a same user.
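As an illustrative, non-limiting Python sketch of such registration and capability publishing at the cloud server unit, an in-memory registry keyed by user account is shown below; the class and field names (DeviceRegistry, voice_capable, snr_db, qos) are hypothetical and stand in for whatever store an actual implementation may use.

from typing import Dict

class DeviceRegistry:
    """In-memory stand-in for the cloud server unit's device registration store."""

    def __init__(self) -> None:
        self._by_user: Dict[str, Dict[str, Dict]] = {}

    def register(self, user_id: str, device_id: str, capability: Dict) -> None:
        # `capability` carries the published parameters: quality, SNR, connectivity,
        # location, device identification, voice-capable flag, etc.
        self._by_user.setdefault(user_id, {})[device_id] = capability

    def list_capabilities(self, user_id: str) -> Dict[str, Dict]:
        # Return the published capabilities of all smart devices of the same user.
        return self._by_user.get(user_id, {})

registry = DeviceRegistry()
registry.register("user1", "smart_speaker", {"voice_capable": True, "snr_db": 30, "qos": 0.9})
registry.register("user1", "smart_bulb", {"voice_capable": False, "connectivity": "wifi"})
print(registry.list_capabilities("user1"))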
Further, once the one or more voice capable devices and the one or more non-voice capable devices are registered on the cloud server unit, the method at step [304] comprises receiving, at a transceiver unit [102] of the cloud server unit from a first target device, the voice command of the user. The first target device is a voice capable device identified from the one or more voice capable devices present in the vicinity of the user by the processing unit [106], to receive the voice command of the user. The first target device is identified based on the device capability of the one or more voice capable devices. More particularly, the first target device is identified based on at least one of a real time GPS location data associated with the one or more voice capable devices, a device ID of the one or more voice capable devices, a voice command trigger timestamp associated with the one or more voice capable devices, a signal to noise ratio (SNR) associated with the one or more voice capable devices, a Quality of Service (QoS) parameter associated with the one or more voice capable devices and other such parameters indicating one or more capabilities of the one or more voice capable devices. For example, whenever a user triggers a voice command, all the nearby voice capable devices are configured to transmit a trigger and the voice command to the transceiver unit [102] of the cloud server unit along with one or more parameters indicating the device capability of all the nearby voice capable devices, such as the current GPS location, device ID, trigger timestamp, SNR, etc. Thereafter, the method comprises checking by the processing unit [106] at the cloud server unit if the trigger and the voice command are coming from multiple voice capable devices from a same location, based on the one or more parameters indicating the device capability of all the nearby voice capable devices. Also, the method encompasses identifying by the processing unit [106], if the trigger and the voice command are coming from a same user using one or more voice recognition techniques. Once it is identified that the trigger and the voice command are coming from multiple voice capable devices from the same location and also from the same user, the method encompasses identifying by the processing unit [106], one or more parameters indicating the
device capability of each voice capable device from all the nearby voice capable devices, based on the one or more parameters indicating the device capability of all the nearby voice capable devices, to further decide which is the best device (i.e. the first target device) to continue receiving the user’s voice commands. For instance, the method may comprise identifying by the processing unit [106] a voice capable device with the best microphone, good QoS and the best SNR as the first target device. Also, in an implementation, for a particular voice command of the user, any voice capable device present near the user can be selected as the first target device based on the real time device capability of the one or more nearby voice capable devices.
Further, after the receipt of the voice command of the user from the first target device at the cloud server unit, the method at step [306] comprises identifying, by an identification unit [104] of the cloud server unit, the age of the user, the gender of the user and the emotional state of the user based on the received voice command of the user. Each of the age of the user, the gender of the user and the emotional state of the user is further identified based on a pre-trained data set. Also, in an implementation each of the age of the user, the gender of the user and the emotional state of the user is identified using one or more Artificial Intelligence techniques. The pre-trained data set is trained based at least on a plurality of data associated with the age, gender, emotional state and language of a plurality of users. More particularly, the pre-trained data set may include, but is not limited to, an age-based pre-trained data set, a gender-based pre-trained data set, an emotion-based pre-trained data set and a language-based pre-trained data set. Furthermore, in an instance, if a voice command of a user is received at the cloud server unit from the first target device, the method encompasses analyzing by the processing unit [106] such voice command based on the pre-trained data set and, based on such analysis, the method further encompasses identifying by the processing unit [106] the age of the user, the gender of the user and the emotional state (mood) of the user. Also, in an implementation the emotional state of the user is further identified based on the identified age of the user and the identified gender of the user. More particularly, in such an implementation the method comprises identifying by the processing unit [106] the emotional state of the user based on an analysis of the age of the user and the gender of the user based on the pre-trained data set. For example, if the identified age of the user is 25 years and the identified gender of the user is Female, the method encompasses identifying
by the processing unit [106], the emotional state (happy, sad, worried, nervous, etc.) of said user based on an analysis of the age 25 years and the gender Female based on the pre-trained data set. Furthermore, in an implementation one or more artificial intelligence, machine learning and deep learning techniques are used to detect the age of the user using the age-based pre-trained data set, the gender of the user using the gender-based pre-trained data set, the emotional state (mood) of the user using the emotion-based pre-trained data set and the language accent of the user using automatic speech recognition (ASR) techniques and the language-based pre-trained data set. Also, in an implementation, once the age is detected using the age-based pre-trained data set and the gender is detected using the gender-based pre-trained data set, the method encompasses analyzing a speech data (i.e. the voice command related data) associated with the determined age and gender by the processing unit [106] based at least on an age and/or gender specific emotional pre-trained data set, where gender-specific, age-based deep learning emotion techniques are used to derive the emotional state of the user.
Also, the method further comprises collecting, by the processing unit [106] of the cloud server unit, one or more sensor data from one or more sensor devices connected in the vicinity of the user. The one or more sensor devices may include, but are not limited to, one or more microphone sensors, proximity sensors, ultrasonic sensors, camera sensors, temperature sensors, motion sensors, optical sensors, infrared (IR) sensors, pressure sensors, light sensors and/or the like sensors. In an implementation, the one or more sensor data may also be collected from surrounding connected smart devices (like a Wi-Fi mesh) including, but not limited to, the set temperature (via smart thermostat), lighting condition (via smart bulb), background music (via smart music box), TV channel / over the top (OTT) app content played (via Smart STB), etc. Also, the method comprises identifying, by the identification unit [104] of the cloud server unit, the one or more contextual data associated with the user based on at least one of the received voice command of the user, the identified age of the user, the identified gender of the user, the identified emotional state of the user and the one or more sensor data. The method also encompasses identifying, by the identification unit [104] of the cloud server unit, the one or more non-voice contextual data associated with the user based at least on a user device data. In an instance, the non-voice contextual information may include, but is not limited to, the date & time of an event associated with the user, the month & year of an event associated with the user, a historical user data,
current and upcoming user engagements (integration with e-mail, calendar and other apps on the user’s smartphone) and the like information identified based on the user device data. In an event, the one or more non-voice contextual data associated with the user may also be identified based on the one or more sensor data.
Thereafter, at step [308] the method comprises categorizing, by the processing unit [106] of the cloud server unit, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user. For example, if an identified age of a user is 32 years, an identified gender of the user is Male and an identified emotional state of the user is an unhappy state, then the method in such example encompasses categorizing by the processing unit [106] at the cloud server unit, one or more voice samples of the voice command of the user into a sad category as a category of the emotional state. Also, the categorization of the one or more voice samples of the received voice command of the user into the one or more categories of the emotional state is further based on the pre-trained data set. Furthermore, each of the one or more categories of the emotional state is one of an age based category of the emotional state, a gender based category of the emotional state and an emotional state based category of the emotional state. Further, considering the above example, to categorize at the cloud server unit, the one or more voice samples of the voice command of the user into the sad category, the method encompasses analyzing by the processing unit the identified age of the user i.e. 32 years, the identified gender of the user i.e. Male and the identified emotional state of the user i.e. the unhappy state, based on the pre-trained data set. Also, in the given example, the sad category is the emotional state based category of the emotional state. Furthermore, in an instance the category of the emotional state may further be defined based on the age of the user and the gender of the user as the age based category of the emotional state (for example a 30-35 years age based category of the emotional state) and the gender based category of the emotional state (for example a Female gender based category of the emotional state), respectively. Also, in an implementation the method also encompasses categorizing, by the processing unit [106] of the cloud server unit, the one or more voice samples of the received voice command of the user into the one or more categories of the emotional state based on at least one of the one or more sensor data and the one or more non-voice contextual data. For example, to categorize a voice sample into a category of the
emotional state, a non-voice contextual information such as user details indicating the age and gender of a user, and a sensor data such as a camera sensor data indicating a sad state (i.e. emotional state) of the user, etc. can further be used to categorize the voice sample into an emotional (sad) category as the category of the emotional state.
Next, at step [310] the method comprises identifying, by the identification unit [104] of the cloud server unit, one or more usage patterns associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user. For example, if a user prefers to watch religious TV shows in the evening, a usage pattern of such watching of religious TV shows in the evening can be determined based on a contextual data (such as determined based on a voice sample indicating a command to play a religious TV show, a determined emotional state of the user, for instance a happy mood, and a sensor data such as a camera sensor data indicating that the user is present in the vicinity of a TV telecasting the religious TV show) and a non-voice contextual data (such as a text/notification) indicating an evening time when an event of such religious TV show is telecasted.
Thereafter at step [312] the method comprises generating, by the processing unit [106] of the cloud server unit, the emotional interactive response to the voice command of the user based at least on the one or more usage pattern and the one or more categories of the emotional state. For example, if a user voice command to play a specific program AAA is received at the cloud server unit from the first target device, an emotional interactive response to said voice command is generated based on a usage pattern of the user associated with said received voice command (for example a usage pattern indicating the user watches programs similar to the AAA at volume level 10) and a category of the emotional state associated with the voice command of the user (for example a happy category). Therefore, in the given example the emotional interactive response may include an emotional voice response (for instance a happy voice response) – indicating starting program AAA at volume level 10.
Furthermore, the method also comprises determining by the processing unit [106] of the cloud server unit, a voice preference for the emotional interactive response based on the one or more usage patterns and the one or more categories of the emotional state, wherein the voice preference further comprises at least one of a preferred voice type and a preferred
voice age. The preferred voice type may include a voice of a male, a female, a robot, etc. and the preferred voice age may include any human age in years, such as 30 years, 40 years, 15 years, etc. More particularly, based on a category of the emotional state and other information such as a usage pattern identified based on contextual and/or non-voice contextual information, the method encompasses auto selecting by the processing unit [106] a voice preference (such as a male or female voice of age 50 years) while delivering the response back to the user. Also, in an example the usage pattern to determine the voice preference may be identified based on a customized voice preference set on a user’s personal voice enabled device, e.g. the user’s personal smartphone device, etc. Furthermore, the process of generating, by the processing unit [106] of the cloud server unit, the emotional interactive response to the voice command of the user is further based on the voice preference for the emotional response. More particularly, the method comprises generating by the processing unit [106] at the cloud server unit, the emotional interactive response to the voice command of the user based on the voice preference for the emotional response, the one or more usage patterns and the one or more categories of the emotional state.
Further, at step [314] the method comprises providing, by the processing unit [106] of the cloud server unit, the generated emotional interactive response to the voice command of the user via at least one of one or more voice capable devices and one or more non-voice capable devices present in the vicinity of the user. More particularly, the method comprises identifying by the processing unit [106] of the cloud server unit, one or more second target devices from at least one of the one or more voice capable devices and the one or more non-voice capable devices, to provide the generated emotional response. The method also encompasses modifying by the processing unit [106] of the cloud server unit, a surrounding environment of the user based on the emotional interactive response provided by the one or more second target devices.
For example, once the emotional interactive response is generated, the method encompasses identifying by the processing unit [106], one or more best devices (i.e. the one or more second target devices) to deliver the emotional response, based on capabilities of all available nearby voice and non-voice capable devices. The method then encompasses communicating by the processing unit [106] to the one or more identified second target devices to deliver the emotional interactive response to user. The method also comprises
sending by the processing unit [106], extra parameters which include at least the determined voice preference for that given user. The one or more second target devices then respond back to the user using the voice preference applicable for that user. Also, the method encompasses communicating by the processing unit [106] to one or more surrounding devices (for example all surrounding smart device(s)) or the one or more second target devices, to modify the surrounding environment of the user to suit the mood of the user while delivering the response/emotional interactive response by the one or more second target devices. For example, based on an emotional state (mood) of the user and a usage pattern of the user, the method encompasses setting by the processing unit [106], the modulation of the voice preference (such as a male, female or robot voice) and/or adding/modifying background music/video and background light to match the mood or to bring a positive change to the mood of the user, while delivering the emotional interactive response back to the user.
After providing the emotional interactive response to the voice command of the user, the method terminates at step [316].
Furthermore, a use case for customized advertisement on an emotional voice output is described as below:
In an implementation, after the detection of a voice trigger on the cloud server unit, a category of an emotional state and an advertisement associated with one or more voice samples of the received voice command of a user are identified at the cloud server unit based on the implementation of the features of the present invention. Also, once the category of the emotional state is identified, an appropriate emotional interactive response along with an advertisement is also generated for the same based on the implementation of the features of the present invention. The emotional interactive response could be a simple voice response or an action. Based on all available user’s voice capable and/or non-voice capable devices in a same location, the system [100] at the cloud server unit decides which is the best device available along with the best suitable advertisement (i.e. the second target device) to communicate back to the user and which is the best device available to perform an action if necessary. Also, the one or more second target devices identified at the cloud server unit via the system [100] to deliver the advertisement with speech, to communicate back to the user, and the one or more devices identified by the system [100] to perform the necessary action could
be the same device or could be different devices, based on a location of the user and the one or more second target devices. Further, based on the user’s voice preference, the cloud server unit via the system [100] temporarily sets the user’s voice preference on the one or more second target devices identified to deliver the speech and advertisement, to communicate back to the user. Also, in an event, any voice command could result in a conversation mode. The cloud server unit via the system [100], based on a previous contextual information of the user and based on other parameters like the time of the day, location, date, etc., may predict a possibility for a specific command to convert into a conversation mode. Furthermore, in an event, based on this prediction, the cloud server unit via the system [100] may also decide which is the best device available (i.e. the second target device) to communicate the emotional interactive response back to the user with speech and advertisement.
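The conversation-mode prediction and the choice of the advertisement delivery device described above may be sketched, purely for illustration, as follows; the heuristics, thresholds and function names (predict_conversation_mode, pick_ad_delivery_device) are hypothetical and not part of the claimed implementation.

def predict_conversation_mode(command: str, hour: int, past_conversation_rate: float) -> bool:
    """Predict whether a command is likely to turn into a conversation, from previous
    contextual information and parameters like the time of day."""
    open_ended = any(w in command.lower() for w in ("recommend", "suggest", "what should"))
    evening = 18 <= hour <= 23
    return past_conversation_rate > 0.5 or (open_ended and evening)

def pick_ad_delivery_device(devices: dict, expect_conversation: bool) -> str:
    """Prefer a device with both a microphone and a display when a conversation is expected."""
    def score(d):
        caps = devices[d]
        s = caps.get("speaker_score", 0) + caps.get("display_score", 0)
        return s + (5 if expect_conversation and caps.get("mic", False) else 0)
    return max(devices, key=score)

nearby = {"smart_speaker": {"speaker_score": 9, "mic": True},
          "smart_stb": {"speaker_score": 6, "display_score": 10, "mic": False}}
print(pick_ad_delivery_device(nearby, predict_conversation_mode("suggest a movie", 20, 0.2)))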
Referring to Figure 4 (i.e. Figure 4a, Figure 4b and Figure 4c), an exemplary flow diagram, depicting an instance implementation of an exemplary process for providing an emotional voice response in multi voice environment, in accordance with exemplary embodiments of the present invention is shown. As shown in Figure 4 the method starts at step [402].
The method at step [404] indicates that a user enters a multi voice capable environment where multiple smart devices are placed in different rooms of a building and listed/registered at a cloud server unit under the user’s account.
Next, the method at step [406] depicts that the user triggers a voice command by saying “Hello A…”, wherein “Hello A…” is a command assigned to activate the multiple smart devices. In an implementation, various other commands such as ‘Hello followed by a specific name’, ‘Hi followed by a specific name’ and/or the like may be configured to activate smart device(s) in the multi-voice capable device environment to receive/trigger voice command(s).
Further, the method at step [408] indicates that all nearby voice capable devices of the user send the trigger (i.e. voice trigger) and command (i.e. user voice command) to the cloud server unit along with extra parameters. The extra parameters include device capabilities of the voice capable devices, including but not limited to a current GPS location, a device ID, a trigger timestamp, an SNR, etc.
Next, the method at step [410] comprises checking by the cloud server unit (i.e. via the system [100] implemented at the cloud server unit), if the trigger and command are received from multiple voice capable devices from the same location, based on the extra parameters and user identification via voice recognition.
Next, the method at step [412] comprises identifying by the system [100] implemented at the cloud server unit, whether all the voice capable devices are listening to the same user.
Further, the method at step [414] comprises continuing to analyze the incoming voice data independently for all devices (i.e. all voice capable devices) by the system [100] implemented at the cloud server unit, if all the voice capable devices are not listening to the same user.
Also, the method at step [416] comprises identifying by the system [100] implemented at the cloud server unit, the best device to continue receiving the user command based on the device capabilities of all the voice capable devices, if all the voice capable devices are listening to the same user.
Next, the method at step [418] comprises communicating back by the system [100] of the cloud server unit, to all listening voice capable devices, whereby only one device (i.e. the first target device) will continue to listen to the user command and all other devices stop listening to user voice commands.
Next, the method at step [420] comprises performing by the system [100] of the cloud server unit, ASR techniques for detecting language and regional accent along with NLP and NLU techniques to identify the user’s voice command.
Further, the method at step [422] comprises identifying by the system [100] implemented at the cloud server unit, an incoming speaker/user speech data associated with the user’s voice command.
Next, the method at step [424] comprises detecting by the system [100] of the cloud server unit, the age/age group of the user based on age based data (i.e. a pre-trained data set comprising at least a plurality of data trained based on age of plurality of users).
Thereafter, the method at step [426] comprises detecting by the system [100] of the cloud server unit, the gender of the user based on gender based data (i.e. a pre-trained data set comprising at least a plurality of data trained based on gender of plurality of users).
Further, the method at step [428] comprises identifying by the system [100] implemented at the cloud server unit, if the detected gender is male or female.
Next, the method at step [430] comprises analyzing by the system [100] implemented at the cloud server unit, the voice command of the user based on male gender specific age/age group based emotional data, in an event if at step [428] the detected gender is identified as male.
Also, the method at step [432] comprises analyzing by the system [100] implemented at the cloud server unit, the voice command of the user based on female gender specific age/age group based emotional data, in an event if at step [428] the detected gender is identified as female.
Next, the method at step [434] comprises detecting by the system [100] implemented at the cloud server unit, an emotional state of the user based on emotional data (i.e. a pre-trained data set comprising at least a plurality of data trained based on emotional state of plurality of users) and one of the analysis of voice command of the user based on male gender specific age/age group based emotional data and the analysis of the voice command of the user based on female gender specific age/age group based emotional data.
Also, the method at step [436] comprises generating by the system [100] of the cloud server unit, an appropriate response for the user’s voice command based on a voice based contextual information and a non-voice based contextual information of the user. In an event, the voice based contextual information includes, but is not limited to, the language accent of the user, the age/age group of the user, the gender of the user, the emotional state of the user and the like. Also, in an event, the non-voice based contextual information includes, but is not limited to, the date & time of an event associated with the user, the month & year of an event associated with the user, topographical data, a historical user data (liking, disliking, travel history, etc.), current and upcoming user engagements and the like.
Also, the method at step [438] comprises automatically selecting an appropriate voice preference to deliver the constructed/generated emotional interactive response to the user by the system [100] of the cloud server unit, based on the voice based contextual information and the non-voice based contextual information.
Further, the method at step [440] comprises identifying/deciding by the system [100] of the cloud server unit, the best device to deliver the response (i.e. second target device), based at least on the voice based contextual information, the non-voice based contextual information and all available nearby voice and non-voice capable devices.
Thereafter, the method at step [442] comprises deciding and commanding by the system [100] of the cloud server unit, one or more surrounding smart devices of the user to control the environment settings while delivering the response to the user via the second target device, based on the voice based contextual information, the non-voice based contextual information and the all available nearby voice and non-voice capable devices and their operating state. In an event the one or more surrounding smart devices may be same as that of the one or more second target devices.
Further, the method at step [444 A] comprises communicating by the cloud server unit to the identified device (i.e. the second target device) to deliver the response back to the user, along with extra parameters which includes derived custom voice preference.
Also, the method at step [444 B] comprises communicating by the cloud server unit (i.e. by the system [100] of the cloud server unit) to the one or more surrounding smart devices to modify the surrounding environment of the user to suit mood of the user while delivering the response.
Thereafter, the method at step [446 A] comprises responding back to the user by the second target device using the custom voice preference of the user.
Further, the method at step [446 B] comprises performing one or more actions by the one or more surrounding smart devices, to modify the surrounding environment of the user to suit the mood of the user while delivering the response by the second target device, based on the command received from the cloud server unit to modify the surrounding environment.
Thereafter, the method terminates at step [448].
Furthermore, an aspect of the present invention relates to a method for providing an emotional interactive response to a voice command of a user. The method encompasses receiving, at a transceiver unit [102] of a user equipment from a first target device, the voice command of the user. The method thereafter comprises identifying, by an identification unit [104] of the user equipment, the age of the user, the gender of the user and the emotional state of the user based on the received voice command of the user. Further, the method encompasses categorizing, by a processing unit [106] of the user equipment, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user. The method thereafter comprises identifying, by the identification unit [104] of the user equipment, one or more usage patterns associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user. Further, the method leads to generating, by the processing unit [106] of the user equipment, the emotional interactive response to the voice command of the user based at least on the one or more usage patterns and the one or more categories of the emotional state. The method thereafter encompasses providing, by the processing unit [106] of the user equipment, the generated emotional interactive response to the voice command of the user via at least one of one or more voice capable devices and one or more non-voice capable devices present in the vicinity of the user.
Thus, the present invention provides a novel solution to process a voice command of a user to detect the emotional state of the user and to use this information, along with other contextual data, to deliver a smart emotional interactive response to the voice command in a multi-voice capable environment via wireless connectivity along with directly connected sensors. Furthermore, the present invention provides a solution to overcome the challenges in scenarios where an emotional voice trigger command is intended for multiple devices (i.e. voice capable devices), by processing the voice command through external/internal sensors and responding with a customized voice based on the emotional state of the user. Also, the present invention provides a solution for voice capable devices with an emotional voice identity, where each device would speak with different emotions. The present invention also provides a solution to handle scenarios where the user has multiple voice trigger enabled devices (i.e. voice capable devices) that accept different emotional voice commands, and to provide a unified user experience across all these devices with emotional voice output.
Further, the systems/units depicted in some of the figures may be provided in various configurations. In some embodiments, the systems may be configured as a distributed system where one or more components of the system are distributed across one or more networks in a cloud computing system.
A network may be set up to provide an access device user with access to various devices connected to the network. For example, a network may include one or more network devices that provide a user with the ability to remotely configure or control the network devices themselves or one or more electronic devices (e.g., appliances) connected to the network devices. The electronic devices may be located within an environment or a venue that can support the network. An environment can include, for example, a home, an office, a business, an automobile, a park, or the like. A network may include one or more gateways that allow client devices (e.g., network devices, access devices, or the like) to access the network by providing wired connections and/or wireless connections using radio frequency channels in one or more frequency bands. The one or more gateways may also provide the client devices with access to one or more external networks, such as a cloud network, the Internet, and/or other wide area networks.
A local area network, such as a user's home local area network, can include multiple network devices that provide various functionalities. Network devices may be accessed and controlled using an access device and/or one or more network gateways. One or more gateways in the local area network may be designated as a primary gateway that provides the local area network with access to an external network. The local area network can also extend outside of the user's home/building and may include network devices located outside of the user's home/building. For instance, the local area network can include network devices such as exterior motion sensors, exterior lighting (e.g., porch lights, walkway lights, security lights, or the like), garage door openers, sprinkler systems, or other network devices that are exterior to the user's home. It is desirable for a user to be able to access the network devices while located within the local area network and also while located remotely from the local area network. For example, a user may access the network devices using an access device within the local area network or remotely from the local area network.
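By way of a non-limiting illustration, the local area network arrangement described above, including gateways and interior/exterior network devices, may be modelled as in the following Python sketch; all class names and fields are hypothetical and serve only to show how such devices can be grouped and enumerated.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class NetworkDevice:
    mac: str
    name: str                  # e.g. "porch light", "garage door opener"
    exterior: bool = False     # True for devices outside the home/building

@dataclass
class Gateway:
    ssid: str
    mac: str
    primary: bool = False      # the primary gateway provides external network access
    devices: List[NetworkDevice] = field(default_factory=list)

@dataclass
class LocalAreaNetwork:
    gateways: List[Gateway] = field(default_factory=list)

    def all_devices(self) -> List[NetworkDevice]:
        # Devices are reachable through whichever gateway they are paired with.
        return [d for g in self.gateways for d in g.devices]

lan = LocalAreaNetwork(gateways=[
    Gateway(ssid="HomeLAN", mac="AA:BB:CC:DD:EE:FF", primary=True,
            devices=[NetworkDevice("11:22:33:44:55:66", "porch light", exterior=True)]),
])
```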
In some embodiments, a user may create an account with login information that is used to authenticate the user and allow access to the network devices. For example, once an account is created, a user may enter the login information in order to access a network device in a logical network.
In some embodiments, an accountless authentication process may be performed so that the user can access one or more network devices within a logical network without having to enter network device login credentials each time access is requested. While located locally within the local area network, an access device may be authenticated based on the access device's authentication with the logical network. For example, if the access device has authorized access to the logical network (e.g., a WiFi network provided by a gateway), the network devices paired with that logical network may allow the access device to connect to them without requiring a login. Accordingly, only users of access devices that have authorization to access the logical network are authorized to access network devices within the logical network, and these users are authorized without having to provide login credentials for the network devices.
An accountless authentication process may also be performed when the user is remote so that the user can access network devices within the logical network, using an access device, without having to enter network device login credentials. While remote, the access device may access the network devices in the local area network using an external network, such as a cloud network, the Internet, or the like. One or more gateways may provide the network devices and/or access device connected to the local area network with access to the external network. To allow accountless authentication, a cloud network server may provide a network ID and/or one or more keys to a network device and/or to the access device (e.g., running an application, program, or the like). In some cases, a unique key may be generated for the network device and a separate unique key may be generated for the access device. The keys may be specifically encrypted with unique information identifiable only to the network device and the access device. The network device and the access device may be authenticated using the network ID and/or each device's corresponding key each time the network device or access device attempts to access the cloud network server.
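By way of a non-limiting illustration, the accountless authentication exchange described above may be sketched in Python as follows. The use of HMAC-based proofs, and all class, method and identifier names (CloudServer, register, authenticate, the network and device IDs), are assumptions made only for the purpose of this sketch; the invention is not limited to any particular key scheme.

```python
import hashlib
import hmac
import secrets

class CloudServer:
    """Toy cloud network server keeping one unique key per device per logical network."""
    def __init__(self):
        self._profiles = {}                              # network_id -> {device_id: key}

    def register(self, network_id, device_ids):
        keys = {d: secrets.token_hex(16) for d in device_ids}
        self._profiles[network_id] = keys
        return keys                                      # keys are distributed to the devices

    def authenticate(self, network_id, device_id, challenge, proof):
        key = self._profiles.get(network_id, {}).get(device_id)
        if key is None:
            return False
        expected = hmac.new(key.encode(), challenge, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, proof)

# A device proves possession of its unique key instead of sending login credentials.
server = CloudServer()
keys = server.register("net-001", ["network-device", "access-device"])
challenge = b"nonce-issued-by-server"
proof = hmac.new(keys["access-device"].encode(), challenge, hashlib.sha256).hexdigest()
assert server.authenticate("net-001", "access-device", challenge, proof)
```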
In some embodiments, a home local area network may include a single gateway, such as a router. A network device within the local area network may pair with or connect to the gateway and may obtain credentials from the gateway. For example, when the network device is powered on, a list of gateways that are detected by the network device may be displayed on an access device (e.g., via an application, program, or the like installed on and executed by the access device). In this example, only the single gateway is included in the home local area network (e.g., any other displayed gateways may be part of other local area networks). In some embodiments, only the single gateway may be displayed (e.g., when only the single gateway is detected by the network device). A user may select the single gateway as the gateway with which the network device is to pair and may enter login information for accessing the gateway. The login information may be the same information that was originally set up for accessing the gateway (e.g., a network user name and password, a network security key, or any other appropriate login information). The access device may send the login information to the network device and the network device may use the login information to pair with the gateway. The network device may then obtain the credentials from the gateway. The credentials may include a service set identification (SSID) of the home local area network, a media access control (MAC) address of the gateway, and/or the like. The network device may transmit the credentials to a server of a wide area network, such as a cloud network server. In some embodiments, the network device may also send to the server information relating to the network device (e.g., MAC address, serial number, or the like) and/or information relating to the access device (e.g., MAC address, serial number, application unique identifier, or the like).
The cloud network server may register the gateway as a logical network and may assign the first logical network a network identifier (ID). The cloud network server may further generate a set of security keys, which may include one or more security keys. For example, the server may generate a unique key for the network device and a separate unique key for the access device. The server may associate the network device and the access device with the logical network by storing the network ID and the set of security keys in a record or profile. The cloud network server may then transmit the network ID and the set of security keys to the network device. The network device may store the network ID and its unique security key. The network device may also send the network ID and the access device's unique security key to the access device. In some embodiments, the server may transmit the network ID and the access device's security key directly to the access device. The network device and the access device may then communicate with the cloud server using the network ID and the unique key generated for each device. Accordingly, the access device may perform accountless authentication to allow the user to remotely access the network device via the cloud network without logging in each time access is requested. Also, the network device can communicate with the server regarding the logical network.
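By way of a non-limiting illustration, the pairing and registration flow described above may be sketched in Python as follows; the names (GatewayCredentials, CloudNetworkServer, register_gateway) and the form of the network ID and keys are assumptions made for illustration only.

```python
import secrets
from dataclasses import dataclass

@dataclass
class GatewayCredentials:
    ssid: str              # SSID of the home local area network
    gateway_mac: str       # MAC address of the gateway

class CloudNetworkServer:
    def __init__(self):
        self.logical_networks = {}                      # network_id -> record/profile

    def register_gateway(self, creds, network_device_id, access_device_id):
        """Register the gateway as a logical network and issue a network ID and keys."""
        network_id = "net-" + secrets.token_hex(4)
        record = {
            "credentials": creds,
            "keys": {
                network_device_id: secrets.token_hex(16),
                access_device_id: secrets.token_hex(16),
            },
        }
        self.logical_networks[network_id] = record
        # The network device stores its own key and forwards the access device's
        # key (or the server may transmit it to the access device directly).
        return network_id, record["keys"]

server = CloudNetworkServer()
network_id, keys = server.register_gateway(
    GatewayCredentials(ssid="HomeLAN", gateway_mac="AA:BB:CC:DD:EE:FF"),
    network_device_id="network-device-01",
    access_device_id="access-device-01")
```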
In some embodiments, a local area network may include multiple gateways (e.g., a router and a range extender) and multiple network devices. For example, a local area network may include a first gateway paired with a first network device, and a second gateway paired with a second network device. In the event credentials for each gateway are used to create a logical network, a server (e.g., a cloud network server) may register the first gateway as a first logical network and may register the second gateway as a second logical network. The server may generate a first network ID and a first set of security keys for the first logical network. The first set of security keys may include a unique security key for the first network device and a unique security key for the access device for use in accessing the first network device on the first logical network. The server may register the second gateway as the second logical network due to differences in the credentials between the first gateway and second gateway. The server may assign the second gateway a second network ID and may generate a second set of security keys. For example, the server may generate a unique security key for the second network device and may generate a unique security key for the access device for use in accessing the second network device on the second logical network. The server may associate the first network device and the access device with the first logical network by storing the first network ID and the first set of security keys in a first record or profile. The server may also associate the second network device and the access device with the second logical network by storing the second network ID and the second set of security keys in a record or profile. The server may then transmit the first network ID and the first set of security keys to the first network device and may transmit the second network ID and the second set of security keys to the second network device. The two network devices may store the respective network ID and set of security keys of the gateway with which each network device is connected. Each network device may send the respective network ID and the access device's unique security key to the access device. The network devices and the access device may then communicate with the cloud server using the respective network ID and the unique key generated for each device.
Accordingly, when multiple gateways are included in the home local area network, multiple logical networks associated with different network identifiers may be generated for the local area network. When the access device is located within range of both gateways in the local area network, there is no problem accessing both network devices due to the ability of the access device to perform local discovery techniques (e.g., universal plug and play (UPnP)). However, when the user is located remotely from the local area network, the access device may only be associated with one logical network at a time, which prevents the access device from accessing network devices of other logical networks within the local area network.
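By way of a non-limiting illustration, the following Python sketch shows how differing gateway credentials lead to two separate logical networks, each with its own network identifier and key set, for a single home local area network; the registration helper and identifiers are hypothetical and mirror the registration sketch above in self-contained form.

```python
import secrets

def register_logical_network(gateway_mac, network_device_id, access_device_id, registry):
    # Derive a distinct network ID per gateway; differing gateway credentials
    # therefore always produce distinct logical networks.
    network_id = "net-" + gateway_mac.replace(":", "").lower()
    registry[network_id] = {
        "gateway": gateway_mac,
        "keys": {network_device_id: secrets.token_hex(16),
                 access_device_id: secrets.token_hex(16)},
    }
    return network_id

registry = {}
first = register_logical_network("AA:BB:CC:00:00:01", "nd-01", "phone-01", registry)   # router
second = register_logical_network("AA:BB:CC:00:00:02", "nd-02", "phone-01", registry)  # range extender
assert first != second
# While remote, the access device can hold only one of the two network IDs at a
# time, which is the limitation noted in the paragraph above.
```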
While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
We Claim:
1. A method for providing an emotional interactive response to a voice command of a
user, the method comprising:
- receiving, at a transceiver unit [102] of a cloud server unit from a first target device, the voice command of the user;
- identifying, by an identification unit [104] of the cloud server unit, age of the user, gender of the user and emotional state of the user based on the received voice command of the user;
- categorising, by a processing unit [106] of the cloud server unit, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user;
- identifying, by the identification unit [104] of the cloud server unit, one or more usage pattern associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user;
- generating, by the processing unit [106] of the cloud server unit, the emotional interactive response to the voice command of the user based at least on the one or more usage pattern and the one or more categories of the emotional state; and
- providing, by the processing unit [106] of the cloud server unit, the generated emotional interactive response to the voice command of the user via at least one of one or more voice capable devices and one or more non voice capable devices present in vicinity of the user.
2. The method as claimed in claim 1, wherein the first target device is a voice capable
device identified from the one or more voice capable devices, to receive the voice
command of the user and the first target device is identified based on at least one
of a real time GPS location data associated with the one or more voice capable
devices, a device ID of the one or more voice capable devices, a voice command trigger timestamp associated with the one or more voice capable devices, a signal to noise ratio (SNR) associated with the one or more voice capable devices, and a Quality of Service (QoS) parameter associated with the one or more voice capable devices.
3. The method as claimed in claim 1, wherein each of the age of the user, the gender of the user and the emotional state of the user is further identified based on a pre-trained data set, wherein the pre-trained data set is trained based at least on a plurality of data associated with age of plurality of users, gender of plurality of users, emotional state of plurality of users and language of plurality of users.
4. The method as claimed in claim 3, wherein the emotional state of the user is further identified based on the identified age of the user and the identified gender of the user.
5. The method as claimed in claim 3, wherein the each of the age of the user, the gender of the user and the emotional state of the user is further identified using one or more Artificial Intelligence techniques.
6. The method as claimed in claim 1, wherein the categorisation of the one or more voice samples of the received voice command of the user into the one or more categories of the emotional state is further based on the pre-trained data set and each of the one or more categories of the emotional state is one of an age based category of the emotional state, a gender based category of the emotional state and an emotional state based category of the emotional state.
7. The method as claimed in claim 1, the method further comprises:
- collecting, by the processing unit [106] of the cloud server unit, one or more sensor data from one or more sensor devices connected in the vicinity of the user; and
- categorising, by the processing unit [106] of the cloud server unit, the one or more voice samples of the received voice command of the user into the one or more categories of the emotional state based on at least one of the one or more sensor data and the one or more non-voice contextual data.
8. The method as claimed in claim 1, the method further comprises:
- identifying, by the identification unit [104] of the cloud server unit, the one or more non-voice contextual data associated with the user based at least on a user device data; and
- identifying, by the identification unit [104] of the cloud server unit, the one or more contextual data associated with the user based on at least one of the received voice command of the user, the identified age of the user, the identified gender of the user, the identified emotional state of the user and the one or more sensor data.
9. The method as claimed in claim 1, the method further comprises determining by the processing unit [106] of the cloud server unit, a voice preference for the emotional interactive response based on the one or more usage pattern and the one or more categories of the emotional state, wherein the voice preference further comprises at least one of a preferred voice type and a preferred voice age.
10. The method as claimed in claim 9, wherein generating, by the processing unit [106] of the cloud server unit, the emotional interactive response to the voice command of the user is further based on the voice preference for the emotional interactive response.
11. The method as claimed in claim 1, the method further comprises:
- identifying by the processing unit [106] of the cloud server unit, one or more second target devices from at least one of the one or more voice capable devices and the one or more non voice capable devices, to provide the generated emotional interactive response; and
- modifying by the processing unit [106] of the cloud server unit, a surrounding environment of the user based on the emotional interactive response provided by the one or more second target devices.
12. A system for providing an emotional interactive response to a voice command of a
user, the system comprising:
- a transceiver unit [102], configured to receive at a cloud server unit from a first target device, the voice command of the user;
- an identification unit [104], configured to identify at the cloud server unit, age of the user, gender of the user and emotional state of the user based on the received voice command of the user;
- a processing unit [106] configured to categorise at the cloud server unit, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user; wherein:
the identification unit [104] is further configured to identify at the cloud server unit, one or more usage pattern associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user, and
the processing unit [106] is further configured to:
generate at the cloud server unit, the emotional interactive response to the voice command of the user based at least on the one or more usage pattern and the one or more categories of the emotional state, and
provide by the cloud server unit, the generated emotional interactive response to the voice command of the user, via at least one of one or more voice capable devices and one or more non voice capable devices present in vicinity of the user.
13. The system as claimed in claim 12, wherein the first target device is a voice capable device identified from the one or more voice capable devices, to receive the voice command of the user and the first target device is identified based on at least one of a real time GPS location data associated with the one or more voice capable
devices, a device ID of the one or more voice capable devices, a voice command trigger timestamp associated with the one or more voice capable devices, a signal to noise ratio (SNR) associated with the one or more voice capable devices, and a Quality of Service (QoS) parameter associated with the one or more voice capable devices.
14. The system as claimed in claim 12, wherein each of the age of the user, the gender of the user and the emotional state of the user is further identified based on a pre-trained data set, wherein the pre-trained data set is trained based at least on a plurality of data associated with age of plurality of users, gender of plurality of users, emotional state of plurality of users and language of plurality of users.
15. The system as claimed in claim 14, wherein the emotional state of the user is further identified based on the identified age of the user and the identified gender of the user.
16. The system as claimed in claim 14, wherein the each of the age of the user, the gender of the user and the emotional state of the user is further identified using one or more Artificial Intelligence techniques.
17. The system as claimed in claim 12, wherein the categorisation of the one or more voice samples of the received voice command of the user into the one or more categories of the emotional state is further based on the pre-trained data set and each of the one or more categories of the emotional state is one of an age based category of the emotional state, a gender based category of the emotional state and an emotional state based category of the emotional state.
18. The system as claimed in claim 12, wherein the processing unit [106] is further configured to:
- collect at the cloud server unit, one or more sensor data from one or more sensor devices connected in the vicinity of the user; and
- categorise at the cloud server unit, the one or more voice samples of the received voice command of the user into the one or more categories of the emotional state based on at least one of the one or more sensor data and the one or more non-voice contextual data.
19. The system as claimed in claim 12, wherein the identification unit [104] is further
configured to:
- identify at the cloud server unit, the one or more non-voice contextual data associated with the user based at least on a user device data; and
- identify at the cloud server unit, the one or more contextual data associated with the user based on at least one of the received voice command of the user, the identified age of the user, the identified gender of the user, the identified emotional state of the user and the one or more sensor data.
20. The system as claimed in claim 12, wherein the processing unit [106] is further configured to determine at the cloud server unit, a voice preference for the emotional interactive response based on the one or more usage pattern and the one or more categories of the emotional state, wherein the voice preference further comprises at least one of a preferred voice type and a preferred voice age.
21. The system as claimed in claim 20, wherein the processing unit [106] is further configured to generate at the cloud server unit, the emotional interactive response to the voice command of the user based on the voice preference for the emotional interactive response.
22. The system as claimed in claim 12, wherein the processing unit [106] is further configured to:
- identify at the cloud server unit, one or more second target devices from at least one of the one or more voice capable devices and the one or more non voice capable devices, to provide the generated emotional interactive response; and
- modify via the cloud server unit, a surrounding environment of the user based on the emotional interactive response provided by the one or more second target devices.
23. A user equipment for providing an emotional interactive response to a voice
command of a user, the user equipment comprising:
- a system [100] comprising:
a transceiver unit [102], configured to receive from a first target device, the voice command of the user,
an identification unit [104], configured to identify, age of the user, gender of the user and emotional state of the user based on the received voice command of the user, and
a processing unit [106], configured to categorise, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user, wherein:
the identification unit [104] is further configured to identify, one or more usage pattern associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user, and
the processing unit [106] is further configured to:
generate, the emotional interactive response to the voice command of the user based at least on the one or more usage pattern and the one or more categories of the emotional state, and
provide, the generated emotional interactive response to the voice command of the user, via at least one of one or more voice capable devices and one or more non voice capable devices present in vicinity of the user.
24. A method for providing an emotional interactive response to a voice command of a user, the method comprising:
- receiving, at a transceiver unit [102] of a user equipment from a first target
device, the voice command of the user;
- identifying, by an identification unit [104] of the user equipment, age of the user, gender of the user and emotional state of the user based on the received voice command of the user;
- categorising, by a processing unit [106] of the user equipment, one or more voice samples of the received voice command of the user into one or more categories of the emotional state based on the identified age of the user, the identified gender of the user and the identified emotional state of the user;
- identifying, by the identification unit [104] of the user equipment, one or more usage pattern associated with the user based on one or more contextual data associated with the user and one or more non-voice contextual data associated with the user;
- generating, by the processing unit [106] of the user equipment, the emotional interactive response to the voice command of the user based at least on the one or more usage pattern and the one or more categories of the emotional state; and
- providing, by the processing unit [106] of the user equipment, the generated emotional interactive response to the voice command of the user via at least one of one or more voice capable devices and one or more non voice capable devices present in vicinity of the user.