
Method And System For Analyzing Emotions

Abstract: A method and system for analyzing emotions is provided. The method includes segmenting audio of a media content, wherein each audio segment is further split into multiple frames. The method also includes identifying feature levels associated with the multiple frames. Further, the method includes acquiring features from the multiple frames based on the feature levels identified. Further, the method includes classifying the multiple frames using a first Gaussian mixture model based on the features acquired. Furthermore, the method includes identifying the emotion using a second Gaussian mixture model based on the classification and providing services based on the identified emotion. The system includes a communication interface for receiving a media content on a communication channel and a processor for processing the received media content.


Patent Information

Application #:
Filing Date: 27 July 2009
Publication Number: 05/2011
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

Samsung Electronics Company
416 Maetan-Dong, Yeongtong-Gu, Suwon-si, Gyeonggi-do 442-742, Republic of Korea

Inventors

1. Palaniswami Krishnamoorthy
Samsung India Software Center, Ground & 1st Floor, Logix Infotech Park, D-5, Sector-59, Noida (U.P.) - 201305, India
2. Shailendra Singh
Samsung India Software Center, Ground & 1st Floor, Logix Infotech Park, D-5, Sector-59, Noida (U.P.) - 201305, India
3. Rajen Bhatt
Samsung India Software Center, Ground & 1st Floor, Logix Infotech Park, D-5, Sector-59, Noida (U.P.) - 201305, India

Specification

METHOD AND SYSTEM FOR ANALYZING EMOTIONS
FIELD

[0001]
The present disclosure relates generally to the field of multimedia. More particularly, the present disclosure relates to a method and a system for analyzing emotions.
BACKGROUND
[0002]
In a rapidly developing environment, mobile devices are being built to automate manual processes. The automation of these processes includes mapping parameters to functions. Currently, applications in mobile devices can play music by detecting the user's activity level. The mapping of music to the activity level is based on an artificial neural network. However, the activity levels are detected by sensing parameters such as movement of the user, acceleration and heat of the user's body. Further, for sensing these parameters, one or more sensors must be physically attached to the user.
[0003]
In light of the foregoing discussion, there is a need for a method and system for analyzing emotions and providing services based on the emotions.
SUMMARY
[0004]
Embodiments of the present disclosure described herein provide a method and system for analyzing emotions.
[0005]
An example of a method for analyzing emotions includes segmenting audio of a media content, wherein each audio segment is further split into multiple frames. The method also includes identifying feature levels associated with the multiple frames. Further, the method includes acquiring the features from the multiple frames based on the feature levels identified. Further, the method includes classifying the multiple frames using a first Gaussian mixture model based on the features acquired. Furthermore, the method includes identifying the emotion using a second Gaussian mixture model based on the classification and providing services based on the identified emotion.
[0006]
An example of a system for analyzing emotions includes one or more electronic devices. The electronic device includes a communication interface for receiving a media content on a communication channel. The electronic device also includes a processor responsive to the received media content to segment the audio of the media content, wherein each audio segment is further split into multiple frames. The processor also identifies feature levels associated with the audio segments. Further, the processor acquires the features from the audio segments based on the feature levels identified. Further, the processor classifies the audio segments using a first Gaussian mixture model based on the features acquired. Furthermore, the processor identifies the emotion using a second Gaussian mixture model based on the classification and provides services based on the identified emotion.
BRIEF DESCRIPTION OF FIGURES
[0007]
In the accompanying figures, similar reference numerals may refer to identical or functionally similar elements. These reference numerals are used in the detailed description to illustrate various embodiments and to explain various aspects and advantages of the present disclosure.
[0008]
FIG. 1 is a block diagram of an environment, in accordance with which various embodiments can be implemented;
[0009]
FIG. 2 is a block diagram of an electronic device, in accordance with one embodiment; and
[0010]
FIG. 3 illustrates a method for analyzing emotions, in accordance with one embodiment.
[0011]
Persons skilled in the art will appreciate that elements in the figures are illustrated for simplicity and clarity and may not have been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help improve understanding of various embodiments of the present disclosure.
DETAILED DESCRIPTION
[0012]
It should be observed that method steps and system components have been represented by conventional symbols in the figures, showing only specific details that are relevant for an understanding of the present disclosure. Further, details that may be readily apparent to a person ordinarily skilled in the art may not have been disclosed. In the present disclosure, relational terms such as first and second, and the like, may be used to distinguish one entity from another entity, without necessarily implying any actual relationship or order between such entities.
[0013]
Embodiments of the present disclosure described herein provide a method and system for analyzing emotions.
[0014]
FIG. 1 is a block diagram of an environment 100, in accordance with which various embodiments can be implemented.
[0015]
The environment 100 includes a network 105, and one or more electronic devices, for example an electronic device 110a and an electronic device 110b connected to each other through the network 105.
[0016]
Examples of the electronic devices include, but are not limited to, computers, televisions, laptops, mobile devices, handheld devices and personal digital assistants (PDAs). Examples of the network 105 include, but are not limited to, a Worldwide Interoperability for Microwave Access (WiMAX) network, a local area network (LAN), a wide area network (WAN), a Bluetooth network, an internet protocol multimedia subsystem (IMS) network, an infrared network, a ZigBee network, a wireless LAN (WLAN) or any other wireless network specified by the Institute of Electrical and Electronics Engineers (IEEE).
[0017]
The electronic device 110a receives a call from the electronic device 110b through the network 105. The electronic device 110a includes a plurality of elements for detecting the audio content in the call, analyzing the audio content, identifying emotions associated with the audio content and providing services based on the emotions.
[0018]
The electronic device 110a including the elements is explained in detail in FIG. 2.
[0019]
FIG. 2 is a block diagram of the electronic device 110a, in accordance with one embodiment.
[0020]
The electronic device 110a includes a bus 205 for communicating information, and a processor 210 coupled with the bus 205 for processing information. The electronic device 110a also includes a memory 215, for example a random access memory (RAM), coupled to the bus 205 for storing information required by the processor 210. The memory 215 can be used for storing temporary information required by the processor 210. The electronic device 110a further includes a read only memory (ROM) 220 coupled to the bus 205 for storing static information required by the processor 210. A storage unit 225, for example a magnetic disk, hard disk or optical disk, can be provided and coupled to the bus 205 for storing information. The storage unit 225 stores data associated with the services.
[0021]
The electronic device 110a can be coupled via the bus 205 to a display 230, for example a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information. An input device 235, including various keys, is coupled to the bus 205 for communicating information to the processor 210. In some embodiments, cursor control 240, for example a mouse, a trackball, a joystick, or cursor direction keys for communicating information to the processor 210 and for controlling cursor movement on the display 230 can also be present.
[0022]
In one embodiment, the steps of the present disclosure are performed by the electronic device 110a using the processor 210. The information can be read into the memory 215 from a machine-readable medium, for example the storage unit 225. In alternative embodiments, hard-wired circuitry can be used in place of or in combination with software instructions to implement various embodiments.
[0023]
The term machine-readable medium can be defined as a medium providing data to a machine to enable the machine to perform a specific function. The machine-readable medium can be a storage media. Storage media can include non-volatile media and volatile media. The storage unit 225 can be a non-volatile media. The memory 215 can be a volatile media. All such media must be tangible to enable the instructions carried by the media to be detected by a physical mechanism that reads the instructions into the machine.
[0024]
Examples of the machine readable medium include, but are not limited to, a floppy disk, a flexible disk, a hard disk, magnetic tape, a CD-ROM, an optical disk, punch cards, paper tape, a RAM, a PROM, an EPROM, and a FLASH-EPROM.
[0025]
The machine readable medium can also include online links, download links, and installation links providing the information to the processor 210.
[0026]
The electronic device 110a also includes a communication interface 245 coupled to the bus 205 for enabling data communication. Examples of the communication interface 245 include, but are not limited to, an integrated services digital network (ISDN) card, a modem, a local area network (LAN) card, an infrared port, a Bluetooth port, a zigbee port, and a wireless port.
[0027]
In some embodiments, the processor 210 can include one or more processing units for performing one or more functions of the processor 210. The processing units are hardware circuitry performing specified functions.
[0028]
The one or more functions include segmenting audio of a media content. Each audio segment is further divided into multiple frames. The one or more functions also
include identifying feature levels associated with the multiple frames, acquiring the features from the multiple frames based on the feature levels identified, classifying the multiple frames using a first Gaussian mixture model based on the acquired features, identifying the emotions using a second Gaussian mixture model based on the classification, and providing services based on the identified emotion.
[0029]
FIG. 3 illustrates a method for analyzing emotions, in accordance with one embodiment.
[0030]
A first electronic device may receive a call from a second electronic device. An application in the first electronic device may analyze media content communicated between users of the electronic devices during the call. In an embodiment, audio being communicated from the first electronic device is determined by the application.
[0031]
In another embodiment, the application can determine the audio communicated from the second electronic device.
[0032]
The method starts at step 305.
[0033]
At step 310, the audio of the media content is segmented. The DC components of the audio segments are removed, and each audio segment is further sampled at a pre-determined frequency, for example 8 kHz. Each audio segment is further divided into multiple frames of a certain duration, for example frames of 5 ms duration.
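By way of illustration only, the following Python sketch shows one possible pre-processing along the lines of step 310: the DC offset is removed, the segment is resampled to 8 kHz and the result is split into 5 ms frames. The function name and the use of numpy and scipy are assumptions made for the example, not details taken from the disclosed system.

    import numpy as np
    from scipy.signal import resample

    def frame_audio_segment(segment, orig_rate, target_rate=8000, frame_ms=5):
        """Remove DC, resample to target_rate and split into fixed-length frames."""
        segment = np.asarray(segment, dtype=np.float64)
        segment = segment - segment.mean()              # remove the DC component
        num_out = int(round(len(segment) * target_rate / orig_rate))
        segment = resample(segment, num_out)            # resample to 8 kHz
        frame_len = int(target_rate * frame_ms / 1000)  # 40 samples per 5 ms frame at 8 kHz
        num_frames = len(segment) // frame_len
        return segment[:num_frames * frame_len].reshape(num_frames, frame_len)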
[0034]
At step 315, feature levels of the audio associated with the multiple frames are identified. Each frame is determined as silence or non-silence based on the identification. In some embodiments, the feature levels correspond to the amplitude of the audio.
[0035]
The identification is performed by processing the short time energy, the short time spectral entropy and the autocorrelation peak value of the audio frames. The value derived from processing the short time energy, the short time spectral entropy and the autocorrelation peak value is compared to a pre-determined threshold. If the value is less than the threshold, the frame is determined as silence. If the value exceeds the threshold, the frame is considered as non-silence or environmental noise. Each audio segment is determined as silence or non-silence based on the count of frames corresponding to silence or non-silence. The non-silence frames may include speech content and environmental noise. Further, the speech content corresponding to the user is extracted from the non-silence frames.
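A minimal sketch of the per-frame silence test described above is given below, assuming the three measures are combined by simple addition and compared against a single threshold; the actual combination rule and threshold value are not specified here, so both are illustrative assumptions.

    import numpy as np

    def short_time_energy(frame):
        return float(np.sum(frame ** 2))

    def spectral_entropy(frame):
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        p = spectrum / (np.sum(spectrum) + 1e-12)        # normalised power spectrum
        return float(-np.sum(p * np.log2(p + 1e-12)))

    def autocorrelation_peak(frame):
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        ac = ac / (ac[0] + 1e-12)                        # normalise by the zero-lag value
        return float(np.max(ac[1:])) if len(ac) > 1 else 0.0

    def is_silence(frame, threshold):
        value = short_time_energy(frame) + spectral_entropy(frame) + autocorrelation_peak(frame)
        return value < threshold                         # below the threshold -> silence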
[0036]
At step 320, the features are acquired from the multiple frames. The features correspond to the extracted speech content. The multiple frames correspond to the frames classified as non-silence in step 315.
[0037]
At step 325, the multiple frames are classified using a first Gaussian mixture model (GMM) based on the features acquired. The classification can be one of pure speech and non-speech. The non-speech may include music or a mixture of speech and music.
[0038]
The first GMM classifies the frames by computing a normalized root mean square amplitude, a low short-time energy ratio, a variance of log energy, a mean and a variance of spectral flux, a variance of differential pitch and a variance of the first five Mel frequency cepstral coefficients excluding the first coefficient. Each audio segment is determined as a speech segment or a non-speech segment based on the count of frames corresponding to the speech or non-speech classification.
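The following sketch shows one conventional way such a two-class GMM decision could be realised: one mixture model is trained per class on the features listed above, and each frame is assigned to the class with the higher log-likelihood. scikit-learn's GaussianMixture is used purely as a stand-in, since the disclosure does not name a particular GMM implementation, and the feature extraction is assumed to be done elsewhere.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_class_gmms(speech_features, nonspeech_features, n_components=8):
        """Each argument is an (n_samples, n_features) array of training vectors."""
        gmm_speech = GaussianMixture(n_components=n_components).fit(speech_features)
        gmm_nonspeech = GaussianMixture(n_components=n_components).fit(nonspeech_features)
        return gmm_speech, gmm_nonspeech

    def classify_frames(features, gmm_speech, gmm_nonspeech):
        """Label each feature vector as speech or non-speech by log-likelihood."""
        ll_speech = gmm_speech.score_samples(features)
        ll_nonspeech = gmm_nonspeech.score_samples(features)
        return np.where(ll_speech > ll_nonspeech, "speech", "non-speech")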
[0039]
In some embodiments, the method described in step 320 may be performed on the non-speech segment.
[0040]
At step 330, the emotions are identified using a second GMM based on the classification. The emotions are identified for audio segments classified as speech segments.
[0041]
The second GMM identifies the emotions associated with the audio segments classified as the speech segments by computing one of a vocal tract formant structure, a glottal source structure, mel frequency cepstral coefficient based parameters, autocorrelation of the mel frequency cepstral coefficients, and a combination thereof.
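One common realisation of such an emotion classifier is sketched below, under the assumption that a separate GMM is trained for each emotion on the MFCC-derived feature vectors and that the best-scoring model labels the segment. The function names and the use of scikit-learn are assumptions made for the example.

    from sklearn.mixture import GaussianMixture

    def train_emotion_gmms(training_data, n_components=16):
        """training_data maps each emotion name to an (n_samples, n_features) array."""
        return {emotion: GaussianMixture(n_components=n_components).fit(feats)
                for emotion, feats in training_data.items()}

    def identify_emotion(segment_features, emotion_gmms):
        """Return the emotion whose GMM scores the segment's feature vectors highest."""
        scores = {emotion: gmm.score(segment_features)   # mean log-likelihood per vector
                  for emotion, gmm in emotion_gmms.items()}
        return max(scores, key=scores.get)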
[0042]
The second GMM further classifies the emotions of the audio segments as at least one of a happy emotion, a sad emotion, an angry emotion and a Lombard emotion. Further, segmental classification of the audio segments associated with emotions is performed. The segmental classification is based on a “winner takes all” logic.
[0043]
For example, consider an audio content being analyzed for emotions. The audio content is segmented into five audio segments. The emotion associated with the first two audio segments is identified as happy, and the emotions associated with the third, fourth and fifth audio segments are identified as Lombard, sad and angry, respectively. Here, based on the “winner takes all” logic, the audio content is associated with the happy emotion.
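A short sketch of the “winner takes all” rule used in this example: the emotion assigned to the largest number of audio segments labels the whole audio content. With the five segments above, happy occurs twice and the other emotions once each, so happy wins.

    from collections import Counter

    def winner_takes_all(segment_emotions):
        """segment_emotions is a list such as ['happy', 'happy', 'lombard', 'sad', 'angry']."""
        return Counter(segment_emotions).most_common(1)[0][0]

    print(winner_takes_all(["happy", "happy", "lombard", "sad", "angry"]))  # prints: happy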
[0044]
At step 335, services are provided based on the identified emotion.
[0045]
Examples of services for the happy emotion include, but are not limited to, a messaging service, connecting to a movie database and suggesting movies, connecting to a travel database and suggesting places for a trip, and connecting to a songs database and suggesting songs.
[0046]
In some embodiments, the messaging service may be one of short message service (SMS) and multimedia messaging service (MMS).
[0047]
Examples of services for the sad emotion include, but are not limited to, providing motivational proverbs, connecting to a tourism database and suggesting places for a trip, and connecting to a songs database and suggesting songs.
[0048]
Examples of services for the angry emotion include, but are not limited to, messaging of jokes, cartoon clips or videos, and comedy videos.
[0049]
Examples of services for the Lombard emotion include, but are not limited to, connecting to a songs database and suggesting songs, connecting to FM radio, and performing noise cancellation on the user's speech during the call.
[0050]
In some embodiments, the messaging service may be one of short message service (SMS) and multimedia messaging service (MMS).
[0051]
In some embodiments, the providing of the services may be user-defined.
[0052]
The method ends at step 340.
[0053]
In some embodiments, the method described in FIG. 3 may be performed on audio content received from the second electronic device.
[0054]
In the preceding specification, the present disclosure and its advantages have been described with reference to specific embodiments. However, it will be apparent to a person of ordinary skill in the art that various modifications and changes can be made without departing from the scope of the present disclosure, as set forth in the claims below. Accordingly, the specification and figures are to be regarded as illustrative examples of the present disclosure, rather than in a restrictive sense. All such possible modifications are intended to be included within the scope of the present disclosure.

I/We claim:

1.
A method of analyzing emotions, the method comprising:
segmenting audio of a media content, wherein each audio segment is further split into multiple frames;
identifying feature levels associated with the multiple frames;
acquiring the features from the multiple frames based on the identifying;
classifying the multiple frames using a first gaussian mixture model based on the acquiring;
identifying the emotion using a second gaussian mixture model based on the classification; and
providing services based on the identified emotion.
2.
The method of claim 1, wherein the acquiring comprises:
extracting the features from the multiple frames.
3.
The method of claim 1, wherein the classification and identification of emotions is performed by a fuzzy decision tree.
4.
The method of claim 1, wherein the audio feature levels are identified by determining at least one of a short time energy, a short time spectral entropy, an autocorrelation peak value of frames of the multiple frames and a combination thereof.
5.
The method of claim 1, wherein the multiple frames are classified into at least one of speech and non-speech.
6.
The method of claim 4, wherein the multiple frames are classified using at least one of a normalized root mean square amplitude, a low short-time energy ratio, a non-zero pitch ratio, a variance of first five Mel frequency cepstral coefficients excluding the first coefficient and a combination thereof.
7.
The method of claim 1, wherein the emotion of the multiple frames is identified using a vocal tract formant structure, a glottal source structure, the mel frequency cepstral coefficients based parameters, auto correlation of the mel frequency cepstral coefficients, and a combination thereof.
8.
The method of claim 1, wherein the emotion is associated with audio segment based on the emotions of the multiple frames associated with the audio segment.
9.
The method of claim 1, wherein the emotions are at least one of a happy emotion, a sad emotion, an angry emotion and a lombard.
10.
The method of claim 1, wherein the identification of emotion comprises:
performing segmental classification of the audio segments.
11.
A system for analyzing emotions, the system comprising:
one or more electronic devices;
the electronic device comprising:
a communication interface for receiving a media content on a communication channel; and
a processor responsive to the received media content to:
segment the audio of the media content, wherein the audio segment is further split into multiple frames;
identify feature levels associated with the audio segments;
acquire the features from the audio segments based on the identifying;
classify the audio segments using a first gaussian mixture model based on the acquiring;
identify the emotion using a second gaussian mixture model based on the classification; and
provide services based on the identified emotion.

Documents

Application Documents

# Name Date
1 1766-CHE-2009 POWER OF ATTORNEY 28-05-2010.pdf 2010-05-28
2 1766-CHE-2009 OTHER PATENT DOCUMENT 28-05-2010.pdf 2010-05-28
3 1766-che-2009 form-1 28-05-2010.pdf 2010-05-28
4 1766-CHE-2009 FORM-18 27-06-2011.pdf 2011-06-27
5 1766-CHE-2009 CORRESPONDENCE OTHERS 27-06-2011.pdf 2011-06-27
6 1766-CHE-2009 POWER OF ATTORNEY 27-06-2011.pdf 2011-06-27
7 Power of Authority.pdf 2011-09-03
8 Form-1.pdf 2011-09-03
9 Form-3.pdf 2011-09-03
10 Form-5.pdf 2011-09-03
11 Drawings.pdf 2011-09-03
12 1766-CHE-2009-FER.pdf 2017-11-16
13 1766-CHE-2009-RELEVANT DOCUMENTS [14-02-2018(online)].pdf 2018-02-14
14 1766-CHE-2009-Changing Name-Nationality-Address For Service [14-02-2018(online)].pdf 2018-02-14
15 1766-CHE-2009-AbandonedLetter.pdf 2018-05-21

Search Strategy

1 1766-che-2009_26-09-2017.pdf