
Method And System For Improved Emotion Identification

Abstract: A system (100) and method for identifying an emotion of a user is disclosed. Sensor data corresponding to a user is acquired from sensors (104a-104n) disposed in a defined area. A set of features is extracted from the sensor data, and optimal features are selected from the extracted set. Selecting the optimal features includes iteratively processing each feature using a modified bat algorithm (MOBA). Each feature corresponds to a bat processed by the MOBA. A frequency, velocity, pulse rate, and/or position of the bat is iteratively updated using a trained neural network upon determining that a fitness value of the bat exceeds a threshold value using a first classifier. An emotion of the user is identified by processing the optimal features using a second classifier. Operational parameters corresponding to functional units (108a-108n) disposed in the defined area are adjusted for personalizing functions for the user based on the identified emotion.


Patent Information

Application #:
Filing Date: 31 March 2017
Publication Number: 40/2018
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email: shery.nair@tataelxsi.co.in
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2024-06-14
Renewal Date:

Applicants

TATA ELXSI LIMITED
ITPB Road, Whitefield, Bangalore – 560048, India.

Inventors

1. SIVA PRASAD NANDYALA
TATA ELXSI LIMITED, ITPB Road, Whitefield, Bangalore – 560048, India.
2. BHADRADRI SRIRAMKUMAR VARANASI
TATA ELXSI LIMITED, ITPB Road, Whitefield, Bangalore – 560048, India.
3. MURALIDHAR KAVERI VASUDEVA
TATA ELXSI LIMITED, ITPB Road, Whitefield, Bangalore – 560048, India.

Specification

Claims:

1. A method, comprising:
acquiring sensor data corresponding to a user from one or more sensors (104a-104n) disposed in a defined area;
extracting a set of features from the acquired sensor data by an emotion recognition system (106);
selecting one or more optimal features from the extracted set of features by the emotion recognition system (106), wherein selecting the one or more optimal features comprises:
iteratively processing each feature from the extracted set of features using a modified bat algorithm, wherein each feature corresponds to at least one bat processed by the modified bat algorithm, and wherein one or more of a frequency, velocity, pulse rate, and position of the bat is iteratively updated using a trained neural network upon determining that a fitness value of the bat exceeds a threshold value using a first classifier;
identifying an emotion corresponding to the user by processing the one or more optimal features using a second classifier; and
adjusting one or more operational parameters corresponding to one or more functional units (108a-108n) disposed in the defined area for personalizing one or more functions for the user based on the identified emotion.

2. The method as claimed in claim 1, wherein the first classifier corresponds to a combination of Gaussian Mixture Model with one or more of a Universal Background Model and a Restricted Boltzmann Machine.

3. The method as claimed in claim 2, wherein the first classifier and the second classifier are different, and wherein the second classifier corresponds to one or more of a Support Vector Machine (SVM), a Convolutional Neural Network (CNN), and a decision graph.

4. The method as claimed in claim 1, wherein the selected neural network comprises one or more of a deep belief network, a convolutional neural network, an autoencoder, a Deep Boltzmann Machine, and a recurrent neural network.

5. The method as claimed in claim 1, wherein the defined area corresponds to an automobile (102), a virtual reality environment, a movie hall, an auditorium, a medical center, a recreational center, and a residence of the user.

6. The method as claimed in claim 1, wherein the personalizing one or more functions corresponds to modifying one or more parameters of an environment of the defined area, and wherein the parameters comprise one or more of ambient temperature, colour of light, intensity of light, intensity of sound, media content, type of advertisement relayed to the user based on a user profile, relative position of the user, relative orientation of the user, and an operational state of the functional units (108a-108n) disposed in the defined area.

7. The method as claimed in claim 1, wherein the personalizing one or more functions further comprises:
recording information corresponding to one or more user actions for updating the operational parameters of the functional units (108a-108n) and a subsequent change in the emotion of the user for a defined period of time;
identifying one or more patterns correlating the updated operational parameters of the functional units (108a-108n) with a subsequent change in the emotion of the user based on the recorded information;
determining a personalization for one or more of the functional units (108a-108n) based on the identified patterns such that the determined personalization achieves a defined change in the emotion of the user within a specified time.

8. The method as claimed in claim 1, wherein the fitness value is determined by an evaluation function, and wherein the fitness value of the bat depends on at least one of the frequency, velocity, pulse rate, and position of the bat.

9. The method as claimed in claim 1, wherein the optimal features are selected from the extracted set of features based on a condition, and wherein the condition includes at least one of: iteratively processing each feature from the extracted set of features for a predefined number of times and a fitness value of the feature remaining constant for at least two iterations.

10. The method as claimed in claim 1, wherein the sensor data includes at least one of speech data and video data, and wherein the set of features corresponding to the speech data comprise mel-frequency cepstral coefficients (MFCC), and wherein the set of features corresponding to the video data comprise histogram of oriented gradients (HOG).

11. The method of claim 1, wherein identifying the emotion comprises selecting the emotion identified using a majority of data types in the sensor data, or a majority of weighted data types in the sensor data as the prevailing emotion of the user.

12. A system (100), comprising:
one or more sensors (104a-104n) for measuring one or more biometric parameters of a user and generating corresponding sensor data;
an emotion recognition system (106) communicatively coupled to the sensors (104a-104n) for receiving the sensor data for identifying the emotion of the user, wherein the emotion recognition system (106) is configured to:
extract a set of features from the acquired sensor data;
select one or more optimal features from the extracted set of features by iteratively processing each feature from the extracted set of features using a modified bat algorithm, wherein each feature corresponds to at least one bat processed by the modified bat algorithm, and wherein one or more of a frequency, velocity, pulse rate, and position of the bat is iteratively updated using a trained multi-layered neural network upon determining that a fitness value of the bat exceeds a threshold value using a first classifier, wherein the first classifier corresponds to a combination of Gaussian Mixture Model with one or more of a Universal Background Model and a Restricted Boltzmann Machine;
identify an emotion corresponding to the user by processing the one or more optimal features using a second classifier; and
adjust one or more operational parameters corresponding to one or more functional units (108a-108n) disposed in the defined area for personalizing one or more functions for the user based on the identified emotion.

13. The system (100) as claimed in claim 12, wherein the defined area is at least one of an automobile (102), a virtual reality environment, a movie hall, an auditorium, a medical center, a recreational center, and a residence of the user.

14. The system (100) as claimed in claim 12, wherein the sensors (104a-104n) comprise one or more of a microphone, a video capture unit, an image acquisition device, an electrocardiogram machine, an electroencephalography sensor, a temperature sensor, a respiration monitoring unit, a blood pressure monitoring unit, an infrared sensor, and a depth sensing device, and wherein the functional units (108a-108n) comprise one or more of a media playing system, an air conditioner, a seat adjustment system, a lighting unit, a fragrance dispenser, a gaming unit, a communications device, and a control unit disposed in the defined area.

15. The system (100) as claimed in claim 12, wherein the personalizing the one or more functions corresponds to modifying one or more parameters of an environment of the defined area, and wherein the parameter of the environment is one or more of a temperature of an air conditioner, colour of light, intensity of light, and intensity of sound, media content, a type of advertisement relayed to the user based on a user profile, a relative position of the user, relative orientation of the user, and an operational state of the functional units (108a-108n) disposed in the defined area.
Description:

FIELD OF THE INVENTION

[0001] The present disclosure relates generally to data sciences, and more particularly to a system and a method for identifying an emotion by means of efficient data processing and analytics.

DESCRIPTION OF RELATED ART

[0002] With advances in communications technology, the world is rapidly moving towards a state of ubiquitous connectivity, where a multitude of connected devices with intelligent sensors is generating data at unprecedented rates. Examples of the connected devices with sensors include mobile phones, tablets, fitness devices, laptops, desktops, home automation devices, connected cars, and the like. The sensor data gathered from such connected devices may assist in taking a plethora of decisions related to business and personal operations. Multiple data processing algorithms process the sensor data received from such connected devices and attempt to generate useful inferences. For instance, video data that includes facial expressions and head movements of a user may be collected and processed to identify an emotion of the user. Subsequently, the ambiance of the user’s residence and/or office, the volume of music playing at the user’s residence, and the like may be modified based on the identified emotion of the user.
[0003] Commonly used data processing algorithms involve the use of data classifiers to categorize the video data received from the connected devices into predefined categories and identify the emotion of the user indicated by the video data. The data classifiers classify the video data based on a standard dataset. The standard dataset includes data corresponding to past observations and inferences gathered from the data corresponding to the past observations. Certain examples of such data classifiers include Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), K-Nearest Neighbor Classifier (KNN), Support Vector Machine (SVM), Dynamic Time Warping (DTW), and the like.
[0004] While present-day data analytics attempts to process data to generate insights using the aforementioned classifiers, the data processing operation is often inefficient and time-intensive due to the large size of the data. Particularly, ill-designed algorithms lead to unnecessary and redundant data processing that precludes their use in real-time systems. Furthermore, in case of an error, reprocessing of the sensor data consumes a considerable amount of time.
[0005] To solve the aforementioned problems, certain systems employ various data optimization algorithms. The data optimization algorithms extract characteristics that satisfy desired criteria from a dataset. For example, an emotion identification system identifies the emotion of a user based on characteristic data extracted from ambient sensor data. The characteristic data is a subset of the sensor data and is smaller in size as compared to the sensor data. The data optimization algorithms reduce the size of the data to be processed for identifying the emotion of the user, thus decreasing the latency of the emotion identification process. The sensor data is typically collected from multiple sensors of the connected devices such as a video camera, a microphone, a heart rate monitor, a body temperature monitor, and the like to identify the emotion of the user. Conventional data processing methods using known classifiers treat each of these multiple sensor outputs alike during processing. However, in reality, some of the sensor outputs may be highly pertinent to a particular application scenario but not to others. As conventional methods fail to identify the most relevant sensor outputs, the output of these methods may often be skewed due to the presence of many unrelated data points, while also requiring considerable processing time and computation effort.
[0006] Therefore, there exists a need for a method and system for efficient emotion identification of the user from the data corresponding to the multiple sensors of the connected devices.

SUMMARY

[0007] An object of the current disclosure is to provide a method for identifying an emotion of a user. The method includes acquiring sensor data corresponding to a user from one or more sensors (104a-104n) disposed in a defined area, extracting a set of features from the acquired sensor data by an emotion recognition system (106), and selecting one or more optimal features from the extracted set of features. Selecting the one or more optimal features includes iteratively processing each feature from the extracted set of features using a modified bat algorithm. Each feature corresponds to at least one bat processed by the modified bat algorithm, and one or more of a frequency, velocity, pulse rate, and position of the bat is iteratively updated using a trained neural network upon determining that a fitness value of the bat exceeds a threshold value using a first classifier. The method further includes identifying an emotion corresponding to the user by processing the one or more optimal features using a second classifier. The method also includes adjusting one or more operational parameters corresponding to one or more functional units (108a-108n) disposed in the defined area for personalizing one or more functions for the user based on the identified emotion.
[0008] According to an aspect of the present disclosure, the first classifier corresponds to a combination of Gaussian Mixture Model with one or more of a Universal Background Model and a Restricted Boltzmann Machine.
[0009] According to an aspect of the present disclosure, the first classifier and the second classifier are different, and wherein the second classifier corresponds to one or more of a Support Vector Machine (SVM), a Convolutional Neural Network (CNN), and a decision graph.
[0010] According to an aspect of the present disclosure, the selected neural network comprises one or more of a deep belief network, a convolutional neural network, an autoencoder, a Deep Boltzmann Machine, and a recurrent neural network.
[0011] According to an aspect of the present disclosure, the defined area corresponds to an automobile (102), a virtual reality environment, a movie hall, an auditorium, a medical center, a recreational center, and a residence of the user.
[0012] According to an aspect of the present disclosure, personalizing one or more functions corresponds to modifying one or more parameters of an environment of the defined area, and wherein the parameters comprise one or more of ambient temperature, colour of light, intensity of light, intensity of sound, media content, type of advertisement relayed to the user based on a user profile, relative position of the user, relative orientation of the user, and an operational state of the functional units (108a-108n) disposed in the defined area.
[0013] According to an aspect of the present disclosure, personalizing one or more functions further comprises recording information corresponding to one or more user actions for updating the operational parameters of the functional units (108a-108n) and a subsequent change in the emotion of the user for a defined period of time. Personalizing the functions further comprises identifying one or more patterns correlating the updated operational parameters of the functional units (108a-108n) with a subsequent change in the emotion of the user based on the recorded information. Personalizing the functions also comprises determining a personalization for one or more of the functional units (108a-108n) based on the identified patterns such that the determined personalization achieves a defined change in the emotion of the user within a specified time.
[0014] According to an aspect of the present disclosure, the fitness value is determined by an evaluation function, and wherein the fitness value of the bat depends on at least one of the frequency, velocity, pulse rate, and position of the bat.
[0015] According to an aspect of the present disclosure, the optimal features are selected from the extracted set of features based on a condition, and wherein the condition includes at least one of: iteratively processing each feature from the extracted set of features for a predefined number of times and a fitness value of the feature remaining constant for at least two iterations.
[0016] According to an aspect of the present disclosure, the sensor data includes at least one of speech data and video data, and wherein the set of features corresponding to the speech data comprise mel-frequency cepstral coefficients (MFCC), and wherein the set of features corresponding to the video data comprise histogram of oriented gradients (HOG).
[0017] According to an aspect of the present disclosure, identifying the emotion comprises selecting the emotion identified using a majority of data types in the sensor data, or a majority of weighted data types in the sensor data as the prevailing emotion of the user.
[0018] An object of the current disclosure is to provide a system for identifying an emotion of a user. The system includes one or more sensors (104a-104n) for measuring one or more biometric parameters of a user and generating corresponding sensor data. The system also includes an emotion recognition system (106) communicatively coupled to the sensors (104a-104n) for receiving the sensor data for identifying the emotion of the user. The emotion recognition system (106) is configured to extract a set of features from the acquired sensor data and select one or more optimal features from the extracted set of features by iteratively processing each feature from the extracted set of features using a modified bat algorithm. Each feature corresponds to at least one bat processed by the modified bat algorithm, and one or more of a frequency, velocity, pulse rate, and position of the bat is iteratively updated using a trained multi-layered neural network upon determining that a fitness value of the bat exceeds a threshold value using a first classifier. The first classifier corresponds to a combination of Gaussian Mixture Model with one or more of a Universal Background Model and a Restricted Boltzmann Machine. The emotion recognition system (106) is further configured to identify an emotion corresponding to the user by processing the one or more optimal features using a second classifier. The emotion recognition system (106) is also configured to adjust one or more operational parameters corresponding to one or more functional units (108a-108n) disposed in the defined area for personalizing one or more functions for the user based on the identified emotion.
[0019] According to an aspect of the present disclosure, the defined area is at least one of an automobile (102), a virtual reality environment, a movie hall, an auditorium, a medical center, a recreational center, and a residence of the user.
[0020] According to an aspect of the present disclosure, the sensors (104a-104n) comprise one or more of a microphone, a video capture unit, an image acquisition device, an electrocardiogram machine, an electroencephalography sensor, a temperature sensor, a respiration monitoring unit, a blood pressure monitoring unit, an infrared sensor, and a depth sensing device.
[0021] According to an aspect of the present disclosure, the functional units (108a-108n) comprise one or more of a media playing system, an air conditioner, a seat adjustment system, a lighting unit, a fragrance dispenser, a gaming unit, a communications device, and a control unit disposed in the defined area.
[0022] According to an aspect of the present disclosure, personalizing the one or more functions corresponds to modifying one or more parameters of an environment of the defined area, and wherein the parameter of the environment is one or more of a temperature of an air conditioner, color of light, intensity of light, and intensity of sound, media content, a type of advertisement relayed to the user based on a user profile, a relative position of the user, relative orientation of the user, and an operational state of the functional units (108a-108n) disposed in the defined area.
[0023] Additional features and advantages will be readily apparent from the following detailed description and the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

[0024] The features of the present disclosure, which are believed to be novel, are set forth with particularity in the appended claims. Embodiments of the present invention will hereinafter be described in conjunction with the appended drawings provided to illustrate and not to limit the scope of the claims, wherein like designations denote like elements, and in which:
[0025] FIG. 1 is a schematic block diagram illustrating an emotion identification system, according to an embodiment of the present disclosure;
[0026] FIG. 2 is a flow chart illustrating a method for personalizing an environment of an automobile using the system of FIG. 1, in accordance with an embodiment of the present disclosure; and
[0027] FIG. 3 is a flow chart illustrating a method for identifying a set of optimal features from a superset of features using a modified bat algorithm (MOBA), in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0028] It may be noted that various components and method steps have been described in the present specification to show specific details that are pertinent for an understanding of the embodiments described herein. Furthermore, the components and the method steps have been represented so as not to obscure the disclosure with details that will be readily apparent to those with ordinary skill in the art having the benefit of the description herein.
[0029] It may further be noted that the singular forms “a,” “an,” and “the,” as used in the specification and claims, include plural references unless the context clearly dictates otherwise. For example, the term “an article” may include a plurality of articles unless the context clearly dictates otherwise.
[0030] Additionally, those with ordinary skill in the art will appreciate that the elements in the figures are illustrated for simplicity and clarity, and are not necessarily drawn to scale. For example, the dimensions of some of the elements in the FIG. 1 may be exaggerated, relative to other elements, in order to improve the understanding of the embodiments described herein.
[0031] Further, there may be additional components described in the following application that are not depicted in one of the drawings. In the event of such component being described, but not depicted in a drawing, the absence of such a drawing should not be considered as an omission of the component from the specification.
[0032] It may further be noted that the embodiments described herein are merely exemplary and can be embodied in various forms in alternative implementations. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the embodiments described herein in virtually any appropriately detailed structure. Further, the terms and phrases used herein are not intended to be limiting but rather to provide an understandable description of the embodiments of the emotion identification system and the present method with reference to FIGS. 1, 2, and 3.
[0033] FIG. 1 illustrates a schematic block diagram depicting an exemplary emotion identification system (100) for identifying a prevailing emotion of a user. The emotion identification system (100) is configured to identify multiple emotions of a user based on one or more biometric parameters of the user. Examples of the biometric parameters include speech, body temperature, heart rate, electroencephalography (EEG) signal, electrocardiogram (ECG) signal, blood pressure, facial expressions, and the like of the user. In one example, the emotion identification system (100) is deployed in an automobile (102) (also referred to herein as a defined area) of the user. In alternate embodiments, the emotion identification system (100) may be deployed in a virtual reality environment, a movie hall, an auditorium, a medical center, a residence of the user, and the like.
[0034] In one embodiment, the emotion identification system (100) includes a set of sensors (104a-104n), an emotion recognition system (ERS) (106), a set of functional devices (108a-108n), a network device (110), a server (112), and a database (114). In certain embodiments, the set of sensors (104a-104n) detects and measures the biometric parameters of the user in the automobile (102). Examples of the set of sensors (104a-104n) include a microphone (also referred to herein as a speech sensor), a video camera, a temperature sensor, an infrared sensor, a pulse rate sensor, an ECG sensor, an EEG sensor, and the like. The set of sensors (104a-104n) generate a set of sensor data based on the detected biometric parameters of the user. In one embodiment, the first and second sensors (104a and 104b) generate first and second sensor data, respectively. In one example, the first and second sensors (104a and 104b) correspond to a microphone and a video camera, respectively. Accordingly, the first and second sensor data correspond to speech and video data, respectively. The set of sensors (104a-104n) is communicatively coupled to the ERS (106). The ERS (106) receives and processes the set of sensor data from the set of sensors (104a-104n). To that end, the ERS (106) includes a memory (116), a display (118), and a processing subsystem (120).
[0035] In one embodiment, the processing subsystem (120) in the ERS (106) processes the set of sensor data to identify a prevailing emotion of the user. The processing subsystem (120) processes the set of sensor data using an optimization algorithm stored in the memory (116) to identify the emotion of the user. In particular, the optimization algorithm identifies a set of optimal features from a superset of features determined from the set of sensor data. The set of optimal features is then further processed to identify the emotion of the user. In one example, the optimization algorithm corresponds to a modified bat algorithm (MOBA). The MOBA will be described in greater detail with reference to FIGs. 2-3. Additional non-limiting examples of optimization algorithms include metaheuristic algorithms such as a cuckoo algorithm, a firefly algorithm, hill climbing algorithm, particle swarm algorithm, and the like.
[0036] In certain embodiments, the ERS (106) is connected to the Internet and/or a network cloud (116) by way of the network device (110). FIG. 1A depicts a graphical representation of an embodiment of the emotion identification system (100) deployed in an automobile environment. In the automobile environment, the network device (110), for example, may correspond to a mobile phone or a human machine interface (HMI) unit operatively coupled to one or more wired and/or wireless communications networks. The network device (110) receives the emotion of the user from the ERS (106) and transmits the emotion of the user to the server (112) using one or more wired and/or wireless communications networks (116). The network device (110) includes a network interface (not shown) to facilitate communication using different communications networks (116). Examples of the communications networks (116) include global system for mobile communication (GSM), 2G, 3G, 4G, or 5G, long-term evolution (LTE), Wi-Fi, and the like. The network device (110) is configured to allow communication between the ERS (106) and the server (112) over a local area network (LAN), a wide area network (WAN), a Wireless Fidelity (Wi-Fi) network, a Light-Fidelity (Li-Fi) network, a short-range network such as Bluetooth Low Energy or Zigbee, and the Internet for identifying the emotion of the user, for example, in near real-time.
[0037] Referring back to FIG. 1, in certain embodiments, the server (112) fetches suitable configuration data from the database (114) based on the emotion information received from the ERS (106). The database (114) stores the configuration data required to personalize an environment within the automobile (102) based on the emotion recognized by the ERS (106). The parameters of the environment that are personalized may be different for different emotions of a particular user. The configuration data includes an instruction set that is transmitted to the ERS (106) by way of the network device (110). The set of devices (108a-108n) receives the configuration data from the ERS (106).
[0038] According to an aspect of the present disclosure, the configuration data personalizes operational parameters of the set of devices (108a-108n) in the automobile (102) based on the identified emotion of the user. In one embodiment, the set of devices (108a-108n), for example, may include ambient lights, an air conditioner, a music system, a display unit, a fragrance dispenser, a media delivery system, a seat adjustment system, a gaming unit, a communications device, a control unit, and the like. The set of devices (108a-108n) may be configured to control one or more aspects or parameters of the environment within the automobile (102). Examples of these parameters include temperature of the automobile (102), color and intensity of ambient light within the automobile (102), loudness and genre of music playing in the automobile (102), type and frequency of media content delivered to the user, and the like. One or more of these parameters may be personalized based on an identified emotion of the user. In one embodiment, for example, when a change in body temperature of the user in the automobile (102) indicates a change in emotion of the user, the air conditioner adjusts the temperature of the automobile (102) accordingly. Further, the ambient lights and music playing in the music system of the automobile (102) may be changed to modulate the mood of the user based on the emotion of the user. In certain embodiments, a personalized advertisement may be delivered to the user based on an identified emotion and a demographic profile of the user. For example, an advertisement for a high-end car accessory may be delivered to a dashboard display in the automobile for a high net worth individual (HNI) if the identified emotion of the HNI is determined to be happy.
[0039] According to certain aspects of the present disclosure, the specific personalization for a particular user and a corresponding emotion may be input by the user, for example, during user registration. Alternatively, the specific personalization may be learned over a period of time based on historical data corresponding to change in user emotions following one or more user actions for updating operational parameters of the set of devices (108a-108n). The learning based personalization may identify patterns for selectively configuring the set of devices (108a-108n) for improving user disposition and/or user alertness within a specified time that may not be evident even to the user. The emotion identification and personalization steps, thus, may not need any user intervention and may be performed completely automatically. To that end, in one embodiment, the ERS (106) is disposed in a head unit (118) of the automobile (102). The head unit (118) of the automobile (102), in turn, may be connected to the set of sensors (104a-104n) used for identifying a prevailing emotion of the user and the set of devices (108a-108n) that may be personalized based on the identified emotion.
[0040] As previously noted, the emotion of the user may be identified based on measurement of one or more biometric parameters corresponding to the user. For example, in one embodiment, the emotion of the user in an automobile (102) may be identified using speech data (also referred to herein as the first sensor data) acquired by the microphone (104a). Although the emotion recognition process is explained in the present description by means of speech data, it is to be noted that the emotion of the user may be identified by the emotion identification system (100) based on one or more of video data, body temperature signals, EEG signals, ECG signals, and the like. When the user in the automobile (102) speaks, the microphone (104a) records a voice sample of the user as speech data. The speech data may be stored in the memory (116) of the ERS (106). In certain embodiments, the processing subsystem (120) retrieves the speech data from the memory (116) and performs preprocessing operations such as noise removal and white space removal on the stored speech data.
[0041] Further, the processing subsystem (120) splits the speech data into a set of frames. For each frame in the set of frames, the ERS (106) extracts a predefined set of features. The set of features may be indicative of one or more signal properties of a corresponding speech frame. In one example, the set of features correspond to 39 Mel-frequency cepstral coefficients (MFCC) features. In another example, however, the set of features may correspond to LPC (Linear Prediction Coefficients) features, or any other suitable set of features.
[0042] In the present embodiment, the processing subsystem (120) may be configured to extract the MFCC features by sampling the speech data with a sampling rate of about 16 kilohertz (kHz) and dividing the speech into 25-millisecond (ms) frames. If the speech data cannot be split into a whole number of frames, the speech data is padded with requisite zeros. Further, from each of the frames, the processing subsystem (120) may be configured to determine 12 MFCC coefficients, 12 delta coefficients, 12 delta-delta coefficients, an energy coefficient, a delta energy coefficient, and a double delta energy coefficient. Particularly, the processing subsystem (120) may be configured to employ mathematical equations and methods comprising the Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT), and/or other suitable methods to extract the MFCC coefficients, the delta-delta coefficients, the energy coefficient, the delta energy coefficient, and the double delta energy coefficient from the speech data. It is to be noted that the set of features includes MFCC features when the processing subsystem (120) is configured to process the speech data. Similarly, the set of features may include Pose and Occlusion Invariant Feature set (POIF) and/or histogram of oriented gradients (HOG) features when the processing subsystem (120) is configured to process the video data. Thus, the set of features depends on the sensor data from which the set of features is extracted.
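For illustration only, the feature-extraction step described above could be sketched as follows in Python, assuming the librosa library is available; the helper name and the use of the zeroth cepstral coefficient as the energy term are assumptions, not part of the disclosure.

```python
import numpy as np
import librosa

def extract_39_mfcc(speech, sr=16000):
    """Sketch: 12 MFCCs + 12 deltas + 12 delta-deltas plus energy,
    delta-energy, and delta-delta-energy per 25 ms frame."""
    frame_len = int(0.025 * sr)                      # 25 ms frames at 16 kHz
    mfcc = librosa.feature.mfcc(y=speech, sr=sr, n_mfcc=13,
                                n_fft=frame_len, hop_length=frame_len)
    energy = mfcc[0:1, :]                            # coefficient 0 used as an energy proxy (assumption)
    coeffs = mfcc[1:13, :]                           # 12 cepstral coefficients
    static = np.vstack([coeffs, energy])             # 13 static features
    delta = librosa.feature.delta(static)            # 13 delta features
    delta2 = librosa.feature.delta(static, order=2)  # 13 delta-delta features
    return np.vstack([static, delta, delta2])        # 39 features per frame
```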
[0043] When processing speech data for emotion identification, the extracted set of features may be stored in sets X and Y in the memory (116). During initialization, the superset of features extracted from the speech data may be stored in the set X, whereas the set Y remains empty. The set of features from the set X is provided as an input to the MOBA. The MOBA, upon execution, identifies a set of optimal features from the superset of features. The phrase “optimal set of features” is used in the present description to refer to a subset of features that are identified to be particularly suitable for a desired application area. In the present example, the set of optimal features are determined to be particularly suited, for example, for accurate identification of the prevailing emotion of the user by the MOBA.
[0044] The mathematical analogy and implementation of a conventional bat algorithm are known in the art. Conventional bat algorithms typically allow for selection of features based on a predefined objective function. However, these algorithms do not employ any learning systems for update and/or classification of features. In contrast, the present system (100) uses the MOBA that employs a neural network in conjunction with a first classifier that selects the optimal set of features from an overall feature set to train the system (100) to identify a user emotion with greater speed and accuracy.
[0045] To that end, in certain embodiments, the MOBA assigns a set of bats to the set of features corresponding to the set X. Specifically, in one example where speech data is processed, the processing subsystem (120) assigns one bat from the set of bats to each feature in the superset of MFCC features. Thus, 39 bats are assigned for the 39 MFCC features corresponding to the speech data. Further, a set of parameters is assigned to each bat in the set of bats. The set of parameters includes a position, a frequency, an amplitude, a pulse rate, and a velocity. In a first iteration of the MOBA, the frequency, the velocity, and the pulse rate of each bat are initialized with random values. In one example, the frequency and the amplitude are initialized based on a selected MFCC value associated with the specific MFCC feature of the corresponding bat. The position of each bat is initialized on a two-dimensional frequency vs. amplitude plane based on the corresponding frequency and amplitude values. In other embodiments, for example when processing video data, the position of each bat may be initialized on a three-dimensional height, width, and depth plane based on the corresponding height, width, and depth (color depth) values.
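A minimal sketch of this assignment and initialization step is given below; the dictionary layout, the random velocity, and the use of the absolute MFCC value as the amplitude are illustrative assumptions consistent with the description above.

```python
import numpy as np

def init_bats(mfcc_values, f_min=0.1, f_max=0.9, rng=None):
    """Assign one bat per MFCC feature on the frequency vs. amplitude plane."""
    rng = rng or np.random.default_rng()
    bats = []
    for value in mfcc_values:                          # 39 values -> 39 bats
        freq = f_min + (f_max - f_min) * rng.random()
        r0 = rng.random()                              # random initial pulse rate
        bats.append({
            "frequency": freq,
            "velocity": rng.random(2) - 0.5,           # small random 2-D velocity
            "pulse_rate": r0,
            "pulse_rate0": r0,                         # retained for the update rule, eq. (5)
            "amplitude": abs(value),                   # seeded from the MFCC value (assumption)
            "position": np.array([freq, abs(value)]),  # (frequency, amplitude) coordinates
            "mfcc_value": value,
        })
    return bats
```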
[0046] According to aspects of the present disclosure, the MOBA employs a neural network that facilitates the movement of each bat of the set of bats on the frequency vs. amplitude plane based on the optimal set of features selected using a first classifier. Examples of the neural network include a deep belief network (DBN), a Deep Boltzmann Machine (DBM), an autoencoder, a convolutional neural network, a recurrent neural network, and the like. Further, examples of the first classifier include Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), K-Nearest Neighbor Classifier (KNN), Support Vector Machine (SVM), Dynamic Time Warping (DTW), and the like. In one example, the first classifier is a combination of a Gaussian Mixture Model with a Universal Background Model (GMM-UBM) or a Restricted Boltzmann Machine (GMM-RBM). The neural network moves each bat in the set of bats and updates the set of parameters corresponding to each bat subsequent to the movement of the bat. When processing speech data, in every iteration, the neural network moves each bat of the set of bats in the frequency vs. amplitude plane. Further, the set of parameters corresponding to each bat is updated based on new positions of the set of bats in every iteration. The neural network continues to update the set of parameters for each bat in the set of bats for a predefined number of iterations. In one example, the predefined number of iterations may be 500. In one embodiment, the neural network is trained based on historical movements of the bats.
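By way of illustration, the gating role of the first classifier could be approximated with a plain GMM from scikit-learn standing in for the GMM-UBM; the background data, component count, and threshold below are placeholders, not values taken from the disclosure.

```python
from sklearn.mixture import GaussianMixture

def passes_first_classifier(bat_frames, background_frames, threshold):
    """Train a small GMM on background frames (the 'UBM' stand-in) and check
    whether the bat's feature frames score above the threshold value."""
    ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
    ubm.fit(background_frames)                # background_frames: (n_samples, n_features)
    return ubm.score(bat_frames) > threshold  # average log-likelihood of the bat's frames
```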
[0047] Particularly, in the first iteration of the MOBA, the neural network moves each bat in the set of bats towards a corresponding nearest neighboring bat that is identified based on a corresponding position and velocity of each bat. The neural network estimates a time required for each bat to reach the corresponding nearest neighboring bat. The bats reaching the corresponding nearest neighboring bats within a duration of time that is less than a predefined threshold time are stored as an intermediate set of bats with good fitness. In one example, the threshold time may be defined as 10 milliseconds. Further, each bat in the intermediate set of bats is moved to a new position of the corresponding nearest neighboring bat, and corresponding parameter values are determined. Parameter values corresponding to each bat in the intermediate set of bats are subsequently updated when the bat moves to a new position. In one exemplary implementation, the parameters such as the frequency (f_i), the velocity (v_i^t), the position (x_i^t), the amplitude (A_i^t), and the pulse rate (r_i^t) corresponding to the new position of a bat are updated using equations (1)-(5):

f_i = f_min + (f_max - f_min) β    (1)
v_i^t = v_i^(t-1) + (x_i^t - x_*) f_i    (2)
x_i^t = x_i^(t-1) + v_i^t    (3)
A_i^t = α A_i^(t-1)    (4)
r_i^t = r_i^0 (1 - e^(-γ(t-1)))    (5)
where:
β corresponds to a random value in [0, 1],
x_* corresponds to the current best path (i.e., the distance between the bat and the nearest bat) associated with the ith bat,
f_i corresponds to the frequency associated with the ith bat,
v_i^t corresponds to the velocity associated with the ith bat at the tth iteration,
x_i^t corresponds to the position associated with the ith bat at the tth iteration,
A_i^t corresponds to the loudness associated with the ith bat at the tth iteration,
r_i^t corresponds to the pulse rate associated with the ith bat at the tth iteration,
r_i^0 corresponds to the initial pulse rate of the ith bat,
f_min corresponds to 0.1 Hz,
f_max corresponds to 0.9 Hz,
and where 0 < α < 1 and γ > 0 are constants.
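
Equations (1)-(5) translate directly into code; the sketch below assumes the bat dictionary used in the earlier initialization sketch and takes x_* to be the position of the current best (nearest neighboring) bat.

```python
import numpy as np

def update_bat(bat, x_best, t, alpha=0.9, gamma=0.9, f_min=0.1, f_max=0.9, rng=None):
    """Apply equations (1)-(5) to one bat for iteration t (t starts at 1)."""
    rng = rng or np.random.default_rng()
    beta = rng.random()                                                # beta in [0, 1]
    bat["frequency"] = f_min + (f_max - f_min) * beta                  # eq. (1)
    bat["velocity"] = bat["velocity"] + (bat["position"] - x_best) * bat["frequency"]  # eq. (2)
    bat["position"] = bat["position"] + bat["velocity"]                # eq. (3)
    bat["amplitude"] = alpha * bat["amplitude"]                        # eq. (4), 0 < alpha < 1
    bat["pulse_rate"] = bat["pulse_rate0"] * (1 - np.exp(-gamma * (t - 1)))  # eq. (5)
    return bat
```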

[0048] Similarly, the neural network moves each bat of the intermediate set of bats towards a corresponding nearest neighboring bat based on the new position and velocity of each bat in the intermediate set of bats. When each bat in the intermediate set of bats reaches the position of the corresponding nearest neighbor, the fitness of each bat is determined based on the threshold time. When the fitness value of the bat of the intermediate set of bats decreases (i.e., the time taken by the bat to reach its nearest neighboring bat increases), the bat stops traversing on the frequency vs. amplitude plane. Specifically, the bat stops traversing on the frequency vs. amplitude plane when addition of more features does not improve efficiency of the ERS (106), or if the bat cannot reach the next position within a specified time.
[0049] At the end of the first iteration, the GMM-UBM classifier selects a best or a desired set of bats from the intermediate set of bats. In one example, the best set of bats has the least travel times on the frequency vs. amplitude plane. Further, the GMM-UBM classifier evaluates the positions that each bat has travelled on the frequency vs. amplitude plane, for example, using equation (6):

Fit_i = 1/(1 + f_i),    if f_i >= 0
Fit_i = 1 + |f_i|,      if f_i < 0    (6)
where Fit_i corresponds to the fitness value associated with the feature, and f_i corresponds to the MFCC value of the feature.
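
The fitness evaluation of equation (6) can be transcribed directly; here f_i is assumed to be the MFCC value associated with the feature, as stated above.

```python
def fitness(mfcc_value):
    """Equation (6): fitness of a feature from its MFCC value f_i."""
    if mfcc_value >= 0:
        return 1.0 / (1.0 + mfcc_value)
    return 1.0 + abs(mfcc_value)
```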

[0050] In one embodiment, a subset of features with the corresponding fitness values calculated using the equation (6) are stored in the set Y. A subsequent iteration (i.e. second iteration) starts with the bats that correspond to the subset of features stored in the set Y.
[0051] Following completion of the first iteration, the set of parameters corresponding to the best set of bats are again provided as input to the neural network. The neural network undergoes, for example, supervised learning to move the bats on the frequency vs. amplitude plane. In an example, the neural network includes a DBN with three hidden layers. The DBN is trained using a standard data set stored in the memory (116). The DBN then receives the bat along with the corresponding set of parameters. The DBN is trained based on the set of parameters of the bat. The training is performed until a set of conditions is achieved. Generally, for speech data, an exemplary set of conditions may comprise minimization of Equal Error Rate (EER) and maximization of accuracy rate. Additionally, a Detection Cost Function (DCF) with a fixed threshold can be used for monitoring the training of the DBN. When the training is complete, the first classifier, for example, the GMM-UBM classifier receives each bat along with the corresponding set of parameters and moves each bat to a new position on the frequency vs. amplitude plane based on information received from the trained DBN.
[0052] In the second iteration, the neural network receives a first bat with a first set of parameters from the set Y. In one example, the first set of parameters includes a first frequency, a first velocity, a first position, a first amplitude, and a first pulse rate. Further, the first bat belongs to the intermediate set of bats. The neural network assigns a new position to each bat in the set X after the first iteration based on the training of the DBN. The neural network determines the fitness of the bat in the new position based on a time required by the bat to reach the new position. If the bat reaches the new position in a time less than the threshold time, the neural network considers the bat to be fit and assigns the new position to the first bat. If the bat does not reach the new position in a time less than the threshold time, the GMM-UBM classifier retains the previous position of the bat. The aforementioned process is repeated for each bat in the intermediate set of bats until the movement of the bats is stopped based on certain termination criteria. Certain examples of the termination criteria are described in the following sections. At the end of every iteration, the set Y stores the subset of features from the set of bats with increased fitness in that iteration. The set Y is also referred to as an interpreter class and includes the subset of features with good fitness values. A set of bats corresponding to the subset of features in the set Y is then considered for the next iteration. Thus, in each iteration, the MOBA selects only a set of the best features from the set of features, thereby ensuring that the optimal set of features is extracted.
[0053] The MOBA stops its execution when at least one of a first and second condition is satisfied. The first condition, for example, includes a determination of whether a best value of fitness associated with the bat, Fbest, is equal to an optimum value of the parameters corresponding to the set X in a particular iteration Xbest, that is, Fbest=Xbest. When the fitness values of the set of bats remain constant for two subsequent iterations, the first condition is satisfied. The second condition, for example, includes a determination of whether a predefined maximum number of iterations is reached, for example 500. Further, when the fitness values of the features in the subset of features in the set Y remains constant, the solution is deemed to be a non-dominated or Pareto optimal solution. A solution is called non-dominated, Pareto optimal, Pareto efficient, or non-inferior, if none of the objective functions can be improved in value without degrading certain other objective values. Without additional subjective preference information, all Pareto optimal solutions are considered equally good.
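The two termination criteria described above can be checked with a small helper along these lines (a sketch; the fitness-history structure and the function name are assumptions, while the 500-iteration default follows the description).

```python
def should_stop(fitness_history, iteration, max_iterations=500):
    """First condition: fitness values unchanged for two consecutive iterations
    (Fbest == Xbest). Second condition: the iteration budget is exhausted."""
    unchanged = len(fitness_history) >= 2 and fitness_history[-1] == fitness_history[-2]
    return unchanged or iteration >= max_iterations
```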
[0054] Once the first or second condition is satisfied, the set Y that includes the extracted set of optimal features is stored in the memory (116). Similarly, the MOBA processes each speech frame in the set of speech frames and selects the set of optimal features for each frame in the set of frames. The set of optimal features corresponding to each frame is provided as an input to a second classifier in the ERS (106). As previously noted, the second classifier, for example, may correspond to an SVM, a decision tree, a convolutional neural network (CNN), and the like.
[0055] The second classifier processes the set of optimal features of each frame and identifies the emotion of the user. The emotion of the user is identified by selectively processing a subset of all selected features, that is, the features identified from the frame as being optimal for emotion recognition. In certain embodiments, optimal features extracted from different types of data received from the set of sensors (104a-104n) may be used to determine the emotion of the person. In one example, one or more of speech data, video data, an EEG signal, ECG data, heart rate, and body temperature data may be processed to identify the emotion of the user as anger, neutral, sad, surprise, disgust, and happiness. In one embodiment, each data type may be processed to independently identify a potential emotion of the user. The emotion identified by the majority of the data types may be selected as the overall emotion of the user.
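A sketch of the second-classifier stage, using a linear SVM from scikit-learn as one of the options named above; the training data, labels, and emotion set are placeholders for illustration.

```python
import numpy as np
from sklearn.svm import SVC

EMOTIONS = ["anger", "neutral", "sad", "surprise", "disgust", "happiness"]

def train_second_classifier(optimal_features, labels):
    """optimal_features: (n_samples, n_optimal_features) matrix built from set Y."""
    clf = SVC(kernel="linear")
    clf.fit(optimal_features, labels)
    return clf

def identify_emotion(clf, frame_features):
    """Predict the emotion for one frame's optimal feature vector."""
    return clf.predict(np.asarray(frame_features).reshape(1, -1))[0]
```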
[0056] In an automotive environment, an emotion recognition application may be pre-installed on the head unit (118) in the automobile (102). Alternatively, the application may be deployed in the automobile (102) through an associated application store, and management of the application may be performed using a software update function. According to certain aspects of the present disclosure, the emotion recognition application may include software instructions that enable control of and communication with the set of sensors (104a-104n) using in-vehicle networks such as Bluetooth, Controller Area Network (CAN), Dedicated short-range communications (DSRC)/WAVE, Wi-Fi, Li-Fi, and 4G/LTE to acquire sensor data and identify user emotion using the MOBA, as described herein.
[0057] In certain embodiments, the identified emotion of the user is received by the network device (110) connected to the ERS (106). The network device (110) transmits the identified emotion to the server (112). In one example, the network device (110) may be a transmitter attached to the ERS (106). In another example, the network device (110) is a mobile communications or computing device. The server (112) fetches the configuration data required for modifying and personalizing the environment of the automobile (102) for an identified user from the database (114). The configuration data may be predetermined based on previous user inputs during registration, or during a learning phase. Alternatively, the configuration data may be predefined for specific emotions, specific demographic of users, and/or for other identified usage patterns. The server (112) transmits the configuration data to the network device (110), which in turn, communicates the configuration data to the ERS (106). The ERS (106) receives the configuration data and sends the configuration data to the set of devices (108a-108n) for personalizing the environment of the automobile (102).
[0058] An example of personalizing the environment of the automobile (102) based on the emotion identified by the ERS (106) is to play suitable music and/or video, set a suitable temperature within the automobile, and/or adjust seat height and angle of incline based on the emotion of the user. In certain scenarios, for example when suitable content is not available in local storage, the ERS (106) may provide the user with an option to search and download a relevant media file by way of the Wi-Fi/cellular network based on the identified emotion of the user. Additionally, the ERS (106) may provide access to a payment portal if downloading of the media file requires payment. The media file may be played on the main head unit (118) or a Rear Seat Entertainment unit based on whether the user is the driver or a passenger in the automobile (102). Further personalization examples include modifying the appearance of an instrument cluster of the automobile (102) based on the emotion of the user, adjusting the cooling and position of the user's seat, sending a personalized advertisement to the user on a display associated with the automobile (102) or a user device, and the like.
[0059] Particularly, in one embodiment, the head unit (118) may provide a message via the controller area network (CAN) to a Body Control Module (BCM) (not shown) within the automobile (102). The BCM measures the body temperature and uses the emotion of the user as an input to adjust the temperature and the position of the seat based on received configuration data. In certain implementations, the position of the seat may be adjusted so as to provide a desired reclining angle and extended support for head/legs upon detecting drowsiness. Also, based on the distance between the passenger's leg and the front seat, leg space can be adjusted automatically to improve the user's disposition upon identifying a negative emotion such as sadness or anger. Similarly, the ambient lights in the automobile may also be adjusted based on the emotion of the user. If the user is drowsy, the ambient lights may be turned on.
[0060] Embodiments of the emotion identification system (100), thus, provide suitable personalization options based on the identified emotions in different environments and different application scenarios. As the system (100) allows for rapid and accurate identification of the user emotion by use of the MOBA that identifies the optimal features to reduce the processing time and complexity, the subsequent personalization is also expedited. Quick and accurate personalization, in turn, improves user disposition and experience, which, in turn, leads to improvement in user behavior, action, and reaction times in different scenarios. Certain exemplary methods for identifying the optimal features for emotion identification and subsequent personalization are described in greater detail with reference to FIGs. 2-3.
[0061] FIG. 2 illustrates a flowchart (200) depicting an exemplary method for identifying the emotion of the user in the automobile (102). The exemplary method is illustrated as a collection of blocks in a logical flow chart, which represents operations that may be implemented in hardware, software, or a combination thereof. The various operations are depicted in the blocks to illustrate the functions that are performed during various phases of the exemplary method. In the context of software, the blocks represent computer instructions that, when executed by one or more processing subsystems, perform the recited operations. The order in which the exemplary method is described is not intended to be construed as a limitation, and any number of the described blocks may be combined in any order to implement the exemplary method disclosed herein, or an equivalent alternative method. Additionally, certain blocks may be deleted from the exemplary method or augmented by additional blocks with added functionality without departing from the spirit and scope of the subject matter described herein. For discussion purposes, the present method is described with reference to the emotion identification system (100) of FIG. 1.
[0062] At step (202), the ERS (106) detects that the user has entered the automobile (102). The ERS (106) may detect the presence of the user in the automobile (102) by monitoring one or more sensory cues such as facial recognition, respiration, or heartbeat associated with the user using the set of sensors (104a-104n). At step (204), the ERS (106) receives sensor data from the set of sensors (104a-104n). Each sensor in the set of sensors (104a-104n) generates and transmits the corresponding sensor data to the ERS (106). According to aspects of the present disclosure, different kinds of sensor data such as image data, video data, temperature data, and the like, may be acquired and transmitted to the ERS (106). At step (206), the processing subsystem (120) extracts an overall set of features from each data type received from each sensor in the set of sensors (104a-104n). In an example, the ERS (106) extracts a set of 39 MFCC features from the speech data. Similarly, features from other sensor data sets may also be extracted. At step (208), the ERS (106) executes the MOBA to identify the set of optimal features from the set of features corresponding to each data set. An exemplary method for identifying the set of optimal features from the set of features is described in greater detail with reference to FIG. 3.
[0063] Further, at step (210), the ERS (106) identifies an emotion of the user based on the set of optimal features extracted from the set of features corresponding to each of the data sets. At step (212), the ERS (106) determines whether the emotions corresponding to one or more desired sets of sensor data have been identified. If the emotion corresponding to a desired set of sensor data has not yet been identified, the method returns to step (204), and the ERS (106) receives further data from one or more of the sensors (104a-104n). If data from each sensor in the set of sensors (104a-104n) has been used to identify a corresponding emotion, the method proceeds to step (214). At step (214), the ERS (106) identifies the emotion indicated by the majority of the sensors (104a-104n), when the corresponding data sets are independently processed using the MOBA, as the prevailing emotion of the user. For example, if the emotions identified by processing the video data and the heartbeat data indicate surprise, but the emotion identified by processing the speech data is neutral, the prevailing emotion is identified as "surprise." Although the present embodiment employs a simple majority to identify the prevailing emotion, in certain other embodiments, the prevailing emotion may be identified based on emotions identified from weighted data types.
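By way of a non-limiting illustration, the simple-majority fusion described above may be sketched in Python as follows; the function name, the tie-handling (the first label encountered wins), and the example labels are assumptions chosen for illustration and are not part of the disclosure.

```python
from collections import Counter

def fuse_emotions(per_sensor_emotions):
    """Pick the prevailing emotion by simple majority vote over the
    emotions identified independently from each sensor data type."""
    counts = Counter(per_sensor_emotions.values())
    prevailing, _ = counts.most_common(1)[0]   # ties fall back to the first-seen label
    return prevailing

# Mirrors the example above: video and heartbeat indicate surprise,
# speech indicates neutral, so the prevailing emotion is "surprise".
print(fuse_emotions({"video": "surprise",
                     "heartbeat": "surprise",
                     "speech": "neutral"}))
```

A weighted variant, as contemplated for certain other embodiments, could multiply each vote by a per-data-type weight before selecting the maximum.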
[0064] At step (216), the ERS (106) transmits the identified emotion to the server (112) by way of the network device (110) such as a mobile phone of the user. At step (218), the ERS (106) receives the configuration data from the server (112). In particular, the server (112) fetches the configuration data from the database (114) based on the identified emotion of the user. The configuration data includes one or more sets of instructions that facilitate desired configuration of the operation of the set of devices (108a-108n). In one example, the configuration data includes an instruction set to personalize the ambiance of the automobile (102). Particularly, the instruction set may personalize operational settings for the set of devices (108a-108n) including ambient lights, air conditioner, seats, music system, display device, and the like, of the automobile (102). At step (220), one or more operational parameters of the set of devices (108a-108n) are adjusted to personalize the ambiance of the automobile (102) based on the configuration data received from the ERS (106). As previously noted, accurate personalization based on identification of optimal set of features improves user disposition and experience. Certain exemplary methods for identifying the optimal features for emotion identification are described in greater detail with reference to FIG. 3.
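As a non-limiting illustration of how configuration data might be keyed to an identified emotion, the following sketch assumes a simple dictionary lookup; the parameter names and values are hypothetical and are not taken from the disclosure.

```python
# Hypothetical configuration data keyed by identified emotion; the parameter
# names and values below are illustrative assumptions only.
CONFIGURATION_DATA = {
    "sadness": {"ambient_light": "warm_low", "music": "uplifting_playlist", "seat_recline_deg": 15},
    "drowsy":  {"ambient_light": "bright", "ac_temp_c": 19, "seat_recline_deg": 0},
    "neutral": {"ambient_light": "default", "ac_temp_c": 22},
}

def fetch_configuration(emotion, database=CONFIGURATION_DATA):
    """Return the instruction set that a server might fetch for an identified
    emotion, falling back to a neutral profile for unknown labels."""
    return database.get(emotion, database["neutral"])
```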
[0065] FIG. 3 illustrates a flowchart (300) depicting an exemplary method for extracting the set of optimal features from the set of features using the MOBA. According to aspects of the present disclosure, the optimal set of features may be extracted from different kinds of sensor data such as image data, video data, temperature data, and the like. However, for clarity, the present method is described with reference to identifying the optimal set of features from speech data.
[0066] Accordingly, at step (302), the ERS (106) receives the speech data from the microphone (104a). At step (304), the ERS (106) extracts a desired set of features from the speech data. In certain embodiments, the feature extraction is preceded by pre-processing operations, for example, to eliminate noise from the speech data. Subsequently, the ERS (106) extracts the desired set of features, for example the 39 MFCC features, from the speech data. The 39 features include 12 Mel-frequency cepstral coefficients (MFCCs), 12 delta coefficients, 12 delta-delta coefficients, the energy coefficient, the delta energy coefficient, and the double delta energy coefficient corresponding to the first sensor data. Further, the ERS (106) initializes the memory (116) and stores the set of features. In one embodiment, the set of features may be stored in a set X in the memory (116). The memory (116) may further include a set Y. During initialization, the set X includes the set of features and the set Y is an empty set.
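For orientation, a minimal Python sketch of the 39-dimensional feature vector described above is given below; it assumes the librosa library, treats the 0th cepstral coefficient as the energy term, and averages frame-level values into a single vector, all of which are implementation choices rather than part of the disclosure.

```python
import numpy as np
import librosa

def extract_39_mfcc_features(wav_path):
    """Assemble 12 MFCCs + 12 deltas + 12 delta-deltas + energy,
    delta-energy, and double-delta-energy terms (39 values in total)."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)   # row 0 approximates log energy
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)
    # 12 cepstral terms first, the three energy-related terms last,
    # matching the grouping in the text.
    frame_features = np.vstack([mfcc[1:], delta[1:], delta2[1:],
                                mfcc[:1], delta[:1], delta2[:1]])
    return frame_features.mean(axis=1)   # shape: (39,)
```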
[0067] At step (306), a neural network, such as a DBN, is trained using a standard data set of speech data, for example the set of features stored in set X, to output a set of best bats including the optimal features. At step (308), the processing subsystem (120) initiates the MOBA. The processing subsystem (120) assigns a set of bats to the set of features. In one example, each bat in the set of bats is assigned to one feature in the set of features. Thus, the processing subsystem (120) assigns 39 bats for the 39 features extracted from the speech data. Further, at step (310), a set of selected parameters is assigned to each bat in the set of bats. The selected set of parameters of a bat may include, for example, a position, frequency, velocity, amplitude, and pulse rate. In one embodiment, the processing subsystem (120) assigns random values to each parameter in the set of parameters corresponding to a bat for the first iteration.
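A minimal sketch of the bat initialization of steps (308)-(310) is shown below; the value ranges used for the random parameters are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def initialize_bats(n_features=39):
    """Assign one bat per extracted feature and give each bat random
    starting values for its position, frequency, velocity, amplitude,
    and pulse rate (first MOBA iteration)."""
    return [{"feature_index": i,
             "position":   rng.uniform(0.0, 1.0),
             "frequency":  rng.uniform(0.0, 2.0),
             "velocity":   rng.uniform(-1.0, 1.0),
             "amplitude":  rng.uniform(0.5, 1.0),
             "pulse_rate": rng.uniform(0.0, 1.0)}
            for i in range(n_features)]

bats = initialize_bats()   # 39 bats for the 39 speech features
```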
[0068] At step (312), the processing subsystem (120) arranges each bat in the set of bats in the set X on a selected plane, for example, the frequency vs. amplitude plane when processing speech data. In alternative embodiments, each bat may be arranged on a suitable plane based on the type of data being processed. When processing speech data, the bats are arranged on the frequency vs. amplitude plane based on the MFCC value of each bat. At step (314), each bat in the set of bats is updated with a random value of frequency, velocity, pulse rate, and position by the processing subsystem (120). At step (316), the neural network iteratively moves each bat towards its nearest neighboring bat based on the velocity and position of each bat on the frequency vs. amplitude plane. The movement of a bat towards its nearest neighbor depends on the velocity of the bat and the time taken by the bat to reach the nearest neighboring bat. If the time taken by the bat to reach the nearest neighboring bat exceeds a determined threshold time, for example 10 milliseconds, the bat retains its position and does not move towards the nearest neighboring bat. In contrast, based on previous training, the neural network iteratively moves those bats for which the time taken to reach the corresponding nearest neighboring bat is less than the determined threshold time. When a bat moves to a new position, the values of the set of parameters corresponding to the bat are updated, for example, using the set of equations (1)-(5).
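The parameter update itself is governed by equations (1)-(5) of the disclosure, which are not reproduced here. For orientation only, the sketch below applies the standard bat-algorithm update rules; the modified equations used by the MOBA may differ.

```python
import math

def update_bat(bat, best_position, beta, r0, t, alpha=0.9, gamma=0.9):
    """Standard bat-algorithm style update (an assumed stand-in for the
    modified equations (1)-(5)).

    beta -- random draw in [0, 1]; r0 -- the bat's initial pulse rate;
    t -- current iteration index.
    """
    f_min, f_max = 0.0, 2.0
    bat["frequency"] = f_min + (f_max - f_min) * beta
    bat["velocity"] += (bat["position"] - best_position) * bat["frequency"]
    bat["position"] += bat["velocity"]
    bat["amplitude"] *= alpha                             # loudness decays over iterations
    bat["pulse_rate"] = r0 * (1 - math.exp(-gamma * t))   # pulse emission rate grows
    return bat
```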
[0069] At step (318), the processing subsystem (120) calculates a fitness of each bat, and thereby a fitness value associated with the feature corresponding to the bat. The fitness value, for example, may be calculated using the equation (6). At step (320), the processing subsystem (120) identifies the features whose fitness values have increased as compared to the corresponding fitness values in the previous iteration and stores them in set Y. The set Y accordingly includes the subset of features with good fitness values. The set of bats corresponding to the subset of selected features stored in set Y is provided as the input for training the DBN. At step (322), the processing subsystem (120) determines whether at least one of a first termination condition and a second termination condition is satisfied. The first termination condition is satisfied when the MOBA has executed for a predefined number of iterations using the set of features in set X or set Y. The second termination condition is satisfied when the fitness values of the subset of features remain constant for at least two consecutive iterations. In one embodiment, if at least one of the conditions is satisfied, the method terminates and the subset of features in the set Y is determined to be the set of optimal features for the speech data, as depicted in step (324). The set of optimal features, thus determined, is stored in the set Y in the memory (116) and/or is communicated to the ERS (106) for emotion recognition by the second classifier. However, if neither of the first and second conditions is satisfied, execution of the method returns to step (314).
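A skeleton of the iteration and termination logic of steps (314)-(324) may be sketched as follows; the fitness computation of equation (6) and the bat movement of steps (314)-(316) are left as placeholders, and the bookkeeping details are assumptions for illustration.

```python
def select_optimal_features(bats, move_bats, compute_fitness, max_iterations=50):
    """Return the indices of the features retained in set Y when either
    termination condition is met.

    move_bats       -- placeholder for the bat updates of steps (314)-(316)
    compute_fitness -- placeholder for the fitness of equation (6), e.g. a
                       score derived from the first (GMM-UBM) classifier
    """
    previous = {b["feature_index"]: float("-inf") for b in bats}
    last_snapshot, set_y = None, set()
    for _ in range(max_iterations):                 # first termination condition
        move_bats(bats)
        set_y = set()
        for bat in bats:
            fitness = compute_fitness(bat)
            if fitness > previous[bat["feature_index"]]:
                set_y.add(bat["feature_index"])     # improved fitness -> keep in set Y
            previous[bat["feature_index"]] = fitness
        snapshot = tuple(sorted(previous.items()))
        if snapshot == last_snapshot:               # second condition: fitness values constant
            break                                   # over two consecutive iterations
        last_snapshot = snapshot
    return set_y
```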
[0070] Specifically, when neither the first condition nor the second condition is satisfied, the processing subsystem (120) considers the subset of features from the set Y for the second iteration. In the second iteration, the first classifier, for example a GMM-UBM classifier, receives the bats corresponding to the subset of features from the set Y and undergoes training based on those bats. In subsequent iterations, steps (316), (318), (320), and (322) are executed. Specifically, the first classifier moves each bat from the subset of bats based on the training and calculates the fitness values for the subset of features. When at least one of the first and second conditions is satisfied, the subset of features obtained corresponds to the set of optimal features for the speech data. The set of optimal features, thus determined, is stored in the set Y in the memory (116). As previously noted, the set of optimal features is processed by the second classifier to identify the emotion of the user. In one example, the second classifier is an SVM or a CNN. Once the emotions from the data acquired by each sensor in the set of sensors (104a-104n) have been identified, the processing subsystem (120) identifies the emotion indicated by the majority of the sensor data types as the emotion of the user in the automobile (102).
[0071] The emotion recognition system (100), thus, implements the MOBA to extract, from the set of features, the set of optimal features that provides information of greater relevance for emotion identification. The MOBA employs a multi-level classifier, such as a DBN, that allows for identification of the emotion of the user from sensor data received from multiple sensors irrespective of the type of sensor data. The multi-layered structure of the DBN aids in extraction of consistent features from the sensor data using the bottom hidden layers. Further, the DBN facilitates extraction of inconsistent features of the sensor data using the top hidden layers. The systematic extraction of the features from the sensor data assists in efficient training of the first classifier. Further, when processing a large number of features in the sensor data using a limited number of training samples, the DBN provides better performance in comparison to existing neural network approaches for emotion identification.
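As a rough illustration of the layer-wise structure referred to above, the following sketch pre-trains two stacked Restricted Boltzmann Machines using scikit-learn; the layer sizes, hyperparameters, and the use of BernoulliRBM (which expects inputs scaled to [0, 1]) are assumptions and do not represent the patent's DBN.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Illustrative data: 200 feature vectors of 39 values scaled to [0, 1].
X = np.random.rand(200, 39)

# Greedy layer-wise pre-training: the bottom hidden layer captures broadly
# consistent structure in the features, while the top hidden layer models
# more specific, less consistent variation.
rbm_bottom = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)
rbm_top = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)

hidden_bottom = rbm_bottom.fit_transform(X)            # bottom-hidden-layer representation
hidden_top = rbm_top.fit_transform(hidden_bottom)      # top-hidden-layer representation
```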
[0072] Use of the GMM-UBM as the first classifier further improves performance of the emotion identification method, as the GMM-UBM is a fast algorithm for learning mixture models and therefore helps in faster processing of the data. The GMM-UBM classifier processes the set of optimal features and accurately identifies the emotion of the user in the automobile (102). As the GMM-UBM classifier detects the emotion of the user based on the set of optimal features, the accuracy of emotion identification is improved. Since the set of optimal features is a subset of the set of features and includes less data, the time required to process the set of optimal features and identify the emotion is reduced considerably, thereby decreasing the latency in identification of the user emotion and the corresponding personalization.
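A simplified GMM-UBM style scorer may be sketched with scikit-learn as follows; a full GMM-UBM would MAP-adapt the universal background model rather than fit each emotion model from scratch, so the sketch below is an approximation for illustration only.

```python
from sklearn.mixture import GaussianMixture

def train_gmm_ubm(background_features, per_emotion_features, n_components=8):
    """Fit a universal background model on pooled data and one GMM per
    emotion (illustrative stand-in for MAP adaptation)."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0).fit(background_features)
    emotion_models = {
        label: GaussianMixture(n_components=n_components, covariance_type="diag",
                               random_state=0).fit(feats)
        for label, feats in per_emotion_features.items()
    }
    return ubm, emotion_models

def score_emotions(features, ubm, emotion_models):
    """Return per-emotion log-likelihood ratios against the UBM; higher
    values indicate a better match for that emotion."""
    baseline = ubm.score(features)   # average log-likelihood under the UBM
    return {label: model.score(features) - baseline
            for label, model in emotion_models.items()}
```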
[0073] Various embodiments of the present emotion recognition system (100), thus, may be used for personalizing one or more functions in various application scenarios. The personalization, for example, may include customizing the ambiance of a room, activating a specific operational mode in a virtual reality environment, and medical applications such as activating an emergency response system based on an identified emotion of the user.
[0074] Although specific features of various embodiments of the present system and exemplary methods may be shown in and/or described with respect to some drawings and not in others, this is for convenience only. It is to be understood that the described features, structures, and/or characteristics may be combined and/or used interchangeably in any suitable manner in the various embodiments.
[0075] While various embodiments of the present system and method have been illustrated and described, it will be clear that the present system and method is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions, and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the present system and method, as described in the claims.

Documents

Application Documents

# Name Date
1 Power of Attorney [31-03-2017(online)].pdf 2017-03-31
2 Form 5 [31-03-2017(online)].pdf 2017-03-31
3 Form 3 [31-03-2017(online)].pdf 2017-03-31
5 Form 18 [31-03-2017(online)].pdf_36.pdf 2017-03-31
6 Form 18 [31-03-2017(online)].pdf 2017-03-31
7 Drawing [31-03-2017(online)].pdf 2017-03-31
8 Description(Complete) [31-03-2017(online)].pdf_35.pdf 2017-03-31
9 Description(Complete) [31-03-2017(online)].pdf 2017-03-31
10 Form 5_As Filed_17-04-2017.pdf 2017-04-17
11 Form 26_Power of Attorney_17-04-2017.pdf 2017-04-17
12 Form 1_As Filed_17-04-2017.pdf 2017-04-17
13 Correspondence by Agent_Power of Attorney-17-04-2017.pdf 2017-04-17
14 201741011608-PETITION UNDER RULE 137 [05-05-2021(online)].pdf 2021-05-05
15 201741011608-FORM 3 [05-05-2021(online)].pdf 2021-05-05
16 201741011608-FER_SER_REPLY [05-05-2021(online)].pdf 2021-05-05
17 201741011608-COMPLETE SPECIFICATION [05-05-2021(online)].pdf 2021-05-05
18 201741011608-CLAIMS [05-05-2021(online)].pdf 2021-05-05
19 201741011608-FER.pdf 2021-10-17
20 201741011608-US(14)-HearingNotice-(HearingDate-22-03-2024).pdf 2024-02-20
21 201741011608-FORM-26 [27-02-2024(online)].pdf 2024-02-27
22 201741011608-Correspondence to notify the Controller [27-02-2024(online)].pdf 2024-02-27
23 201741011608-Written submissions and relevant documents [03-04-2024(online)].pdf 2024-04-03
24 201741011608-Annexure [03-04-2024(online)].pdf 2024-04-03
25 201741011608-PatentCertificate14-06-2024.pdf 2024-06-14
26 201741011608-IntimationOfGrant14-06-2024.pdf 2024-06-14

Search Strategy

1 Searchstrategy_201741011608E_29-10-2020.pdf
