
Spotting Facial Micro-Expressions Of A User Based Upon An Estimation Of Instantaneous Heart Rates

Abstract: Spotting a plurality of facial micro-expressions of a user by estimating a plurality of instantaneous heart rates of the user is provided. A few of the traditional systems and methods provide for spotting facial micro-expressions but are affected by unavoidable human pose variations and eye-blinks. Embodiments of the present disclosure provide for spotting the plurality of facial micro-expressions by estimating the plurality of instantaneous heart rates by extracting, from a facial region of a user, a plurality of Region of Interests (ROIs); extracting a temporal signal from each of the plurality of ROIs; filtering a plurality of temporal signals extracted to obtain a filtered set of signals; extracting a pulse signal from the filtered set of signals; estimating, from the pulse signal, the plurality of instantaneous heart rates of the user; and spotting, based upon the plurality of instantaneous heart rates, the plurality of facial micro-expressions of the user.


Patent Information

Filing Date: 08 June 2018
Publication Number: 50/2019
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Grant Date: 2024-03-14

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India

Inventors

1. GUPTA, Puneet
Tata Consultancy Services Limited, Ecospace Plot - IIF/12, New Town, Rajarhat, Kolkata - 700156, West Bengal, India
2. BHOWMICK, Brojeshwar
Tata Consultancy Services Limited, Ecospace Plot - IIF/12, New Town, Rajarhat, Kolkata - 700156, West Bengal, India
3. PAL, Arpan
Tata Consultancy Services Limited, Ecospace Plot - IIF/12, New Town, Rajarhat, Kolkata - 700156, West Bengal, India

Specification

Claims:

1. A method for spotting a plurality of facial micro-expressions of a user based upon an estimation of a plurality of instantaneous heart rates of the user, the method comprising processor-implemented steps of:
detecting, via one or more hardware processors, a facial region of the user by a Constrained Local Neural Field (CNLF) technique (201);
extracting, a plurality of Region of Interests (ROIs) from the facial region detected, wherein the plurality of ROIs comprise skin pixels obtained by applying a morphological erosion technique on the facial region detected (202);
performing, by the one or more hardware processors, a plurality of steps based upon the plurality of ROIs extracted, wherein the plurality of steps comprise (203):
(i) extracting a temporal signal from each of the plurality of ROIs, wherein the temporal signal comprises a cardiovascular pulse signal of the user and noise (203(i));
(ii) filtering, by a band-pass filter technique, a plurality of temporal signals extracted from the plurality of ROIs to obtain a filtered set of signals (203(ii));
(iii) extracting a pulse signal from the filtered set of signals using a Kurtosis based optimization technique (203(iii)); and
(iv) estimating, based upon the pulse signal extracted, the plurality of instantaneous heart rates of the user, wherein estimating the plurality of instantaneous heart rates comprises (203(iv)):
(a) partitioning the pulse signal into a plurality of overlapping windows, wherein each of the plurality of overlapping windows comprises a sequence of frames corresponding to a facial video of the user (203(iv)(a)); and
(b) estimating the plurality of instantaneous heart rates based upon the plurality of overlapping windows (203(iv)(b)); and
spotting, based upon a plurality of variations in the plurality of instantaneous heart rates estimated, the plurality of facial micro-expressions of the user (204).
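
The filtering and pulse-extraction steps (ii) and (iii) of claim 1 can be sketched as follows. This is an illustrative reading only: the 0.7-4 Hz band (roughly 42-240 bpm), the FFT-based band-pass, and the use of sample kurtosis to rank candidate signals are assumptions, not limitations recited in the claim.

```python
import numpy as np

def bandpass(sig, fps, low=0.7, high=4.0):
    """Zero out FFT bins outside the typical human heart-rate band (42-240 bpm)."""
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    spec = np.fft.rfft(sig)
    spec[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spec, n=len(sig))

def excess_kurtosis(sig):
    """Fourth standardized moment minus 3; higher for peaky, pulse-like signals."""
    centered = sig - sig.mean()
    return np.mean(centered ** 4) / np.mean(centered ** 2) ** 2 - 3.0

def extract_pulse(roi_signals, fps):
    """Band-pass each ROI temporal signal and keep the most pulse-like
    (highest-kurtosis) one as the extracted pulse signal."""
    filtered = [bandpass(s, fps) for s in roi_signals]
    return max(filtered, key=excess_kurtosis)

# Toy 10 s signals at 30 fps: a sinusoidal distractor (e.g. periodic motion)
# and a peaky, harmonic-rich beat train standing in for a cardiac pulse.
fps = 30
t = np.arange(0, 10, 1.0 / fps)
distractor = np.sin(2 * np.pi * 1.0 * t)
beats = np.exp(3.0 * np.cos(2 * np.pi * 1.2 * t))
pulse = extract_pulse([distractor, beats], fps)
```

The kurtosis criterion favours the beat train because its retained harmonics make its amplitude distribution peakier than the sinusoid's, while the band-pass removes any drift below 0.7 Hz outright.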

2. The method of claim 1, wherein the step of spotting the plurality of facial micro-expressions comprises classifying, based upon the plurality of variations in the plurality of instantaneous heart rates, a plurality of plausible peaks as either a genuine peak or a spurious peak.

3. The method of claim 2, wherein each of the plurality of plausible peaks corresponds to at least one facial micro-expression spot amongst the plurality of facial micro-expressions spotted.

4. The method of claim 2, wherein the step of classifying the plurality of plausible peaks is preceded by computing, based upon the pulse signal extracted, the plurality of variations in the plurality of instantaneous heart rates to spot the plurality of facial micro-expressions of the user.

5. The method of claim 4, wherein the plurality of variations are computed as a sum of changes in a plurality of neighboring heart rates, wherein each of the plurality of neighboring heart rates correspond to the plurality of instantaneous heart rates estimated.
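
Claims 2 through 5 can be read together as: compute a per-index variation score (a sum of changes in neighbouring instantaneous heart rates), then label each plausible peak genuine or spurious by that score. A minimal sketch follows; the neighbourhood radius and the threshold are illustrative assumptions, as the claims do not fix their values.

```python
import numpy as np

def hr_variation(heart_rates, radius=2):
    """Per-index variation: sum of absolute changes between neighbouring
    instantaneous heart rates within a small window (radius is assumed)."""
    diffs = np.abs(np.diff(np.asarray(heart_rates, dtype=float)))
    return np.array([diffs[max(0, i - radius):i + radius].sum()
                     for i in range(len(heart_rates))])

def classify_peaks(peak_indices, variation, threshold=6.0):
    """Label a plausible peak genuine when the local heart-rate variation
    exceeds the (illustrative) threshold, otherwise spurious."""
    return {i: "genuine" if variation[i] > threshold else "spurious"
            for i in peak_indices}

# Toy instantaneous heart rates (bpm): flat, with a brief surge around index 5.
hr = [72, 72, 71, 72, 80, 88, 81, 73, 72, 72]
variation = hr_variation(hr)
labels = classify_peaks([5, 9], variation)
```

Here the surge at index 5 accumulates a large neighbouring-change sum and is labelled genuine, while the flat region at index 9 is labelled spurious.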

6. The method of claim 1, wherein the estimation of the plurality of instantaneous heart rates based upon the plurality of overlapping windows comprises:
(i) extracting the pulse signal partitioned from each of the plurality of overlapping windows; and
(ii) applying a Fast Fourier Transform (FFT) technique on the partitioned pulse signal extracted to estimate the plurality of instantaneous heart rates, wherein each of the plurality of instantaneous heart rates estimated corresponds to at least one of the plurality of overlapping windows.
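
The two-step estimation of claim 6 can be sketched as below: the pulse signal is partitioned into overlapping windows and an FFT picks each window's dominant frequency as that window's heart rate. The 8 s window, 1 s step, and 0.7-4 Hz search band are illustrative choices, not values recited in the claim.

```python
import numpy as np

def instantaneous_heart_rates(pulse, fps, win_s=8.0, step_s=1.0):
    """Partition the pulse signal into overlapping windows of frames and take
    the dominant in-band FFT frequency of each window as its heart rate (bpm)."""
    win, step = int(win_s * fps), int(step_s * fps)
    freqs = np.fft.rfftfreq(win, d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)    # plausible human range, assumed
    rates = []
    for start in range(0, len(pulse) - win + 1, step):
        mag = np.abs(np.fft.rfft(pulse[start:start + win]))
        rates.append(60.0 * freqs[band][np.argmax(mag[band])])
    return np.array(rates)

# Synthetic 20 s pulse at 1.25 Hz (75 bpm) sampled at 30 fps.
fps = 30
t = np.arange(0, 20, 1.0 / fps)
rates = instantaneous_heart_rates(np.sin(2 * np.pi * 1.25 * t), fps)
```

Overlapping windows trade frequency resolution against temporal resolution: each estimate is localized to one window, which is what makes the rates "instantaneous".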

7. A system (100) for spotting a plurality of facial micro-expressions of a user based upon an estimation of a plurality of instantaneous heart rates of the user, the system (100) comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
detect, by a Constrained Local Neural Field (CNLF) technique, a facial region of the user;
extract, a plurality of Region of Interests (ROIs) from the facial region detected, wherein the plurality of ROIs comprise skin pixels obtained by applying a morphological erosion technique on the facial region detected;
perform, based upon the plurality of ROIs extracted, a plurality of steps, wherein the plurality of steps comprise:
(i) extract a temporal signal from each of the plurality of ROIs, wherein the temporal signal comprises a cardiovascular pulse signal of the user and noise;
(ii) filter, by a band-pass filter technique, a plurality of temporal signals extracted from the plurality of ROIs to obtain a filtered set of signals;
(iii) extract a pulse signal from the filtered set of signals using a Kurtosis based optimization technique; and
(iv) estimate, based upon the pulse signal extracted, the plurality of instantaneous heart rates of the user, wherein estimation of the plurality of instantaneous heart rates comprises:
(a) partition the pulse signal into a plurality of overlapping windows, wherein each of the plurality of overlapping windows comprises a sequence of frames corresponding to a facial video of the user; and
(b) estimate the plurality of instantaneous heart rates based upon the plurality of overlapping windows; and
spot, based upon a plurality of variations in the plurality of instantaneous heart rates estimated, the plurality of facial micro-expressions of the user.

8. The system (100) of claim 7, wherein the one or more hardware processors (104) are configured to spot the plurality of facial micro-expressions by classifying, based upon the plurality of variations in the plurality of instantaneous heart rates, a plurality of plausible peaks as either a genuine peak or a spurious peak.

9. The system (100) of claim 8, wherein each of the plurality of plausible peaks corresponds to at least one facial micro-expression spot amongst the plurality of facial micro-expressions spotted.

10. The system (100) of claim 8, wherein the one or more hardware processors (104) are configured to classify the plurality of plausible peaks by computing, based upon the pulse signal extracted, the plurality of variations in the plurality of instantaneous heart rates to spot the plurality of facial micro-expressions of the user.

11. The system (100) of claim 10, wherein the one or more hardware processors (104) are configured to compute the plurality of variations as a sum of changes in a plurality of neighboring heart rates, wherein each of the plurality of neighboring heart rates correspond to the plurality of instantaneous heart rates estimated.

12. The system (100) of claim 7, wherein the estimation of the plurality of instantaneous heart rates based upon the plurality of overlapping windows comprises:
(i) extracting the pulse signal partitioned from each of the plurality of overlapping windows; and
(ii) applying a Fast Fourier Transform (FFT) technique on the partitioned pulse signal extracted to estimate the plurality of instantaneous heart rates, wherein each of the plurality of instantaneous heart rates estimated corresponds to at least one of the plurality of overlapping windows.
Description:

FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:

SPOTTING FACIAL MICRO-EXPRESSIONS OF A USER BASED UPON AN ESTIMATION OF INSTANTANEOUS HEART RATES

Applicant

Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.


TECHNICAL FIELD
The disclosure herein generally relates to spotting facial micro-expressions of a user based upon an estimation of instantaneous heart rates, and, more particularly, to spotting a plurality of facial micro-expressions of a user based upon an estimation of a plurality of instantaneous heart rates of the user.

BACKGROUND
Facial micro-expressions are manifested by human reflexive behavior and thus are useful to disclose genuine human emotions. Analysis of the facial micro-expressions plays a pivotal role in many real-world applications comprising affective computing, biometrics and psychotherapy. Micro-expressions spotting (which is a part of micro-expressions analysis) involves detection of the micro-expression-affected frames from a video. In general, any change in human emotion not only manifests the micro-expressions but also introduces a plurality of variations in instantaneous heart rates of a user.
Some of the prominent facial deformations are generated by a plurality of unavoidable pose variations and eye blinks. Further, subtle facial deformations may be generated by the micro-expressions. Facial deformations resembling those due to the micro-expressions may also be generated by macro-expressions. Subtle facial deformations due to pose variations, eye blinks and macro-expressions can be easily misinterpreted as deformations due to the micro-expressions, which eventually results in erroneous micro-expressions spotting.
Generally, micro-expressions spotting is performed by analyzing temporal deformations that are produced by either variations in facial appearances or movements of discriminative facial points. Both these deformations are inadequate for correct and highly accurate micro-expressions spotting. Thus, the traditional systems and methods may suffer from a plethora of problems, such as a facial appearance influenced by illumination, eye-blinking and macro-expressions, and a plurality of movements of facial points that result in erroneous micro-expressions spotting due to inaccurate localization.
Generally, critical information corresponding to the micro-expression spotting is generated from the lips, eyes and eye-brows, since facial expressions are manifested by the contraction or stretching of facial arteries present in these facial areas. However, most of the above-mentioned appearance-based feature encodings consider the full face region to extract facial appearances, which may not be required. Hence, the appearance-based feature encoding may comprise a large amount of redundant information.
Some of the traditional systems and methods analyze feature encoding to detect frames affected by the micro-expressions. However, a plurality of spurious peaks may be generated in the feature encoding due to macro-expressions, eye-blinking and background noise. In such a case, apex frames (which are a category of video frames) may get misclassified, which in turn results in incorrect or inaccurate micro-expressions spotting.

SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for spotting a plurality of facial micro-expressions of a user based upon an estimation of a plurality of instantaneous heart rates of the user is provided, the method comprising: detecting, via one or more hardware processors, a facial region of the user by a Constrained Local Neural Field (CNLF) technique; extracting, a plurality of Region of Interests (ROIs) from the facial region detected, wherein the plurality of ROIs comprise skin pixels obtained by applying a morphological erosion technique on the facial region detected; performing, by the one or more hardware processors, a plurality of steps based upon the plurality of ROIs extracted, wherein the plurality of steps comprise: (i) extracting a temporal signal from each of the plurality of ROIs, wherein the temporal signal comprises a cardiovascular pulse signal of the user and noise; (ii) filtering, by a band-pass filter technique, a plurality of temporal signals extracted from the plurality of ROIs to obtain a filtered set of signals; (iii) extracting a pulse signal from the filtered set of signals using a Kurtosis based optimization technique; and (iv) estimating, based upon the pulse signal extracted, the plurality of instantaneous heart rates of the user, wherein estimating the plurality of instantaneous heart rates comprises: (a) partitioning the pulse signal into a plurality of overlapping windows, wherein each of the plurality of overlapping windows comprises a sequence of frames corresponding to a facial video of the user; and (b) estimating the plurality of instantaneous heart rates based upon the plurality of overlapping windows; spotting, based upon a plurality of variations in the plurality of the instantaneous heart rates estimated, the plurality of facial 
micro-expressions of the user; classifying, based upon the plurality of variations in the plurality of instantaneous heart rates, a plurality of plausible peaks as either a genuine peak or a spurious peak; computing, based upon the pulse signal extracted, the plurality of variations in the plurality of instantaneous heart rates to spot the plurality of facial micro-expressions of the user; computing the plurality of variations as a sum of changes in a plurality of neighboring heart rates, wherein each of the plurality of neighboring heart rates correspond to the plurality of instantaneous heart rates estimated; and estimating of the plurality of instantaneous heart rates based upon the plurality of overlapping windows by: (i) extracting the pulse signal partitioned from each of the plurality of overlapping windows; and (ii) applying a Fast Fourier Transform (FFT) technique on the partitioned pulse signal extracted to estimate the plurality of instantaneous heart rates, wherein each of the plurality of instantaneous heart rates estimated corresponds to at least one of the plurality of overlapping windows.
In another aspect, there is provided a system for spotting a plurality of facial micro-expressions of a user based upon an estimation of a plurality of instantaneous heart rates of the user, the system comprising a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: detect, by a Constrained Local Neural Field (CNLF) technique, a facial region of the user; extract, a plurality of Region of Interests (ROIs) from the facial region detected, wherein the plurality of ROIs comprise skin pixels obtained by applying a morphological erosion technique on the facial region detected; perform, based upon the plurality of ROIs extracted, a plurality of steps, wherein the plurality of steps comprise: (i) extract a temporal signal from each of the plurality of ROIs, wherein the temporal signal comprises a cardiovascular pulse signal of the user and noise; (ii) filter, by a band-pass filter technique, a plurality of temporal signals extracted from the plurality of ROIs to obtain a filtered set of signals; (iii) extract a pulse signal from the filtered set of signals using a Kurtosis based optimization technique; and (iv) estimate, based upon the pulse signal extracted, the plurality of instantaneous heart rates of the user by partitioning the pulse signal into a plurality of overlapping windows, wherein each of the plurality of overlapping windows comprises a sequence of frames corresponding to a facial video of the user; and estimating the plurality of instantaneous heart rates based upon the plurality of overlapping windows; spot, based upon a plurality of variations in the plurality of the instantaneous heart rates estimated, the plurality of facial micro-expressions of the user; spot the plurality of facial micro-expressions by classifying, based upon the plurality of variations 
in the plurality of instantaneous heart rates, a plurality of plausible peaks as either a genuine peak or a spurious peak; classify the plurality of plausible peaks by computing, based upon the pulse signal extracted, the plurality of variations in the plurality of instantaneous heart rates to spot the plurality of facial micro-expressions of the user; compute the plurality of variations as a sum of changes in a plurality of neighboring heart rates, wherein each of the plurality of neighboring heart rates correspond to the plurality of instantaneous heart rates estimated; and estimate the plurality of instantaneous heart rates based upon the plurality of overlapping windows by: (i) extracting the pulse signal partitioned from each of the plurality of overlapping windows; and (ii) applying a Fast Fourier Transform (FFT) technique on the partitioned pulse signal extracted to estimate the plurality of instantaneous heart rates, wherein each of the plurality of instantaneous heart rates estimated corresponds to at least one of the plurality of overlapping windows.
In yet another aspect, there is provided one or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors causes the one or more hardware processors to perform a method for spotting a plurality of facial micro-expressions of a user based upon an estimation of a plurality of instantaneous heart rates of the user, the method comprising: detecting, a facial region of the user by a Constrained Local Neural Field (CNLF) technique; extracting, a plurality of Region of Interests (ROIs) from the facial region detected, wherein the plurality of ROIs comprise skin pixels obtained by applying a morphological erosion technique on the facial region detected; performing, a plurality of steps based upon the plurality of ROIs extracted, wherein the plurality of steps comprise: (i) extracting a temporal signal from each of the plurality of ROIs, wherein the temporal signal comprises a cardiovascular pulse signal of the user and noise; (ii) filtering, by a band-pass filter technique, a plurality of temporal signals extracted from the plurality of ROIs to obtain a filtered set of signals; (iii) extracting a pulse signal from the filtered set of signals using a Kurtosis based optimization technique; and (iv) estimating, based upon the pulse signal extracted, the plurality of instantaneous heart rates of the user, wherein estimating the plurality of instantaneous heart rates comprises: (a) partitioning the pulse signal into a plurality of overlapping windows, wherein each of the plurality of overlapping windows comprises a sequence of frames corresponding to a facial video of the user; and (b) estimating the plurality of instantaneous heart rates based upon the plurality of overlapping windows; spotting, based upon a plurality of variations in the plurality of the instantaneous heart rates estimated, the plurality of facial micro-expressions of the user; classifying, based upon the 
plurality of variations in the plurality of instantaneous heart rates, a plurality of plausible peaks as either a genuine peak or a spurious peak; computing, based upon the pulse signal extracted, the plurality of variations in the plurality of instantaneous heart rates to spot the plurality of facial micro-expressions of the user; computing the plurality of variations as a sum of changes in a plurality of neighboring heart rates, wherein each of the plurality of neighboring heart rates correspond to the plurality of instantaneous heart rates estimated; and estimating of the plurality of instantaneous heart rates based upon the plurality of overlapping windows by: (i) extracting the pulse signal partitioned from each of the plurality of overlapping windows; and (ii) applying a Fast Fourier Transform (FFT) technique on the partitioned pulse signal extracted to estimate the plurality of instantaneous heart rates, wherein each of the plurality of instantaneous heart rates estimated corresponds to at least one of the plurality of overlapping windows.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.
FIG. 1 illustrates a block diagram of a system for spotting a plurality of facial micro-expressions of a user based upon an estimation of a plurality of instantaneous heart rates of the user, in accordance with some embodiments of the present disclosure.
FIGS. 2A through 2B illustrate a flow diagram of the steps involved in the process of spotting the plurality of facial micro-expressions of the user based upon the estimation of the plurality of instantaneous heart rates of the user, in accordance with some embodiments of the present disclosure.
FIG. 3 illustrates a visual representation of a detected facial region of the user, a plurality of Region of Interest (ROIs) extracted, a plurality of temporal signals extracted, pulse signal extracted from a filtered set of signals, and the plurality of instantaneous heart rates estimated, in accordance with some embodiments of the present disclosure.
FIG. 4 illustrates a graphical representation of a plurality of Receiver Operating Characteristic (ROC) curves corresponding to the proposed disclosure and an example system, and genuine micro-expressions spotted (amongst the plurality of facial micro-expressions spotted) at a low threshold, in accordance with some embodiments of the present disclosure.
FIG. 5 shows a graphical representation of an example of a plurality of plausible peaks classified as either a genuine peak or a spurious peak, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.
Embodiments of the present disclosure provide systems and methods for spotting a plurality of facial micro-expressions of a user based upon an estimation of a plurality of instantaneous heart rates of the user. Human expressions can be classified as facial macro-expressions, which are usually observed in day-to-day life and are easily elucidated by humans, and facial micro-expressions, which arise from human reflexive behavior. Since the micro-expressions arise from human reflexive behavior, which is hard to control or suppress, they disclose the genuine human emotions. Hence, the micro-expressions are useful in a plurality of applications comprising, inter alia, business negotiations and security applications where lie detection is required to circumvent frauds, psychotherapy for monitoring suspicious intent or surveillance, applications in the realm of affective computing that require the understanding of genuine human emotions like commercial advertisement rating, and virtual/augmented reality for face synthesis.
Micro-expressions analysis comprises two stages, that is, micro-expressions spotting, which refers to the localization of micro-expressions frames in a face video, and micro-expressions recognition, wherein the spotted frames are analyzed to classify the expression. Micro-expressions spotting per se is a highly challenging problem as the micro-expressions are generated for a short duration (usually 1/25 to 1/5 of a second) by subtle stretching or contraction of facial arteries located in facial areas such as the lips, eyes and mouth. Human eyes are unable to process such short-duration, subtle facial temporal movements.
A micro-expression spotting system comprises a face alignment technique/module, wherein a face present in each video frame is detected and aligned to a common reference so as to handle the geometric deformations; feature encoding, wherein subtle temporal deformations in the video frames are represented; and spotting of the micro-expressions frames by analyzing the temporal deformations. The efficacy of the micro-expressions spotting is highly dependent on the feature encoding. The micro-expression spotting cannot be correct unless the feature encoding correctly represents a plurality of subtle facial temporal deformations that are generated by the micro-expressions and simultaneously mitigates the spurious temporal deformations generated by expressions and illuminations. Some of the traditional systems and methods analyze feature encoding to detect frames affected by the micro-expressions. However, various spurious peaks may be generated in the feature encoding due to macro-expressions, eye-blinking and background noise. In such a case, apex frames (which are a category of video frames) may get misclassified, which in turn results in incorrect or inaccurate micro-expressions spotting.
Further, fluctuations in heart rates may be observed when human emotions change. Hence, there is a need for a technical solution that can accurately estimate a plurality of instantaneous heart rates of a user and spot, based upon any variations in the plurality of instantaneous heart rates estimated, the facial micro-expressions of the user.
Referring now to the drawings, and more particularly to FIG. 1 through FIG. 5, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates an exemplary block diagram of a system 100 for spotting a plurality of facial micro-expressions of a user based upon an estimation of a plurality of instantaneous heart rates of the user, in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
FIGS. 2A through 2B, with reference to FIG. 1, illustrate an exemplary flow diagram of a method for spotting the plurality of facial micro-expressions of the user based upon the estimation of the plurality of instantaneous heart rates of the user, in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices of the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions for execution of steps of the method by the one or more processors 104. The steps of the method of the present disclosure will now be explained with reference to the components of the system 100 as depicted in FIG. 1 and the flow diagram. In the embodiments of the present disclosure, the hardware processors 104, when configured with the instructions, perform one or more methodologies described herein.
According to an embodiment of the present disclosure, at step 201, the one or more hardware processors 104 detect, by implementing a Constrained Local Neural Field (CNLF) technique, a facial region of the user. As is known in the art, instantaneous heart rate estimations and micro-expressions spotting require only the facial region as an input. Thus, only the facial region is detected, and the remaining area comprising background is removed. Hence, an accurate localization of a plurality of discriminating facial landmarks is required.
In an embodiment, the CNLF technique is used to detect the plurality of discriminating facial landmarks. The plurality of discriminating facial landmarks represent the contours of the eyes, lips, nose and a facial boundary. The CNLF initially implements a Viola-Jones face detector for detecting one or more areas corresponding to a plurality of plausible faces. An actual face is then determined from the one or more areas corresponding to the plurality of plausible faces, and the plurality of discriminating facial landmarks are localized using global and local facial models. The facial region is thereby detected, and the remaining area comprising background is removed.
As the detected facial region comprises rigid deformations, for example translations and rotations, the performance of the instantaneous heart rate estimations and micro-expressions spotting may degrade. To improve the performance, the detected facial region may be aligned to a common reference. The detected facial region may then be normalized, such that the eye distance of the detected facial region is fixed, using any traditional method or technique. In an example implementation of step 201, referring to FIG. 3, the facial region detected for the user may be referred.
According to an embodiment of the present disclosure, at step 202, the one or more hardware processors 104 extract, from the facial region detected, a plurality of Region of Interests (ROIs), wherein the plurality of ROIs comprise skin pixels obtained by applying a morphological erosion technique on the facial region detected. The performance of the instantaneous heart rate estimations and micro-expressions spotting may also get degraded by unavoidable eye-blinking. Hence, the eye areas may be removed before processing. The eye areas are given by the convex hull of only those facial landmarks that correspond to the eyes.
As is observed in some traditional systems and methods, even slight movement in facial boundaries may result in large color variations thus degrading the heart rate estimations. In an embodiment, the morphological erosion technique may be applied to remove the facial boundaries. Further, if a full face is considered a single ROI, the variations in expressions localized in a small facial area may deteriorate the heart rate estimations. Thus, a plurality of non-overlapping square blocks from a resulting image may be extracted and the plurality of ROIs comprising only skin pixels may be obtained. In an example implementation of step 202, referring to FIG. 3 again, the plurality of ROIs extracted based upon the facial region detected of the user may be referred.
According to an embodiment of the present disclosure, at step 203, the one or more hardware processors 104 perform, based upon the plurality of ROIs extracted, a plurality of steps. At step 203(i), the one or more hardware processors 104 extract a temporal signal from each of the plurality of ROIs, wherein the temporal signal comprises a cardiovascular pulse signal of the user and noise. Each of the plurality of ROIs, comprising skin pixels, contains the cardiovascular pulse signal of the user in terms of color variations, along with noise.
As is known from prior arts, the green channel intensities provide more detailed information corresponding to the cardiovascular pulse signal as compared to the blue and red channel intensities, since green light penetrates better inside the skin as compared to blue light, and the green light provides better haemoglobin absorption than red light. Thus, the temporal signal of a ROI (from the plurality of ROIs) is given by the mean green value of its pixels. In an embodiment, the temporal signal of the i^th ROI, denoted by T^i, may be obtained as:
T^i = [ Σ_{(x,y)∈R^i} I_g^1(x,y), ..., Σ_{(x,y)∈R^i} I_g^n(x,y) ]    equation (1)
wherein n is the total number of frames, (x,y) represents the pixel location, R^i is the i^th ROI, and I_g^k comprises the green channel intensities of the k^th frame.
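By way of a minimal, non-limiting sketch, the per-ROI temporal signal of equation (1) (taken as the mean green value, per the description above; the sum in equation (1) differs only by the constant pixel count) may be computed as follows. The (n, H, W, 3) RGB frame layout and the boolean ROI mask are assumptions for illustration:

```python
import numpy as np

def temporal_signal(frames, roi_mask):
    """Temporal signal of one ROI: mean green-channel value of the ROI's
    skin pixels in every frame.
    frames: (n, H, W, 3) RGB array; roi_mask: (H, W) boolean mask."""
    green = frames[:, :, :, 1].astype(float)   # green channel of all n frames
    return green[:, roi_mask].mean(axis=1)     # one value per frame -> length-n signal
```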
According to an embodiment of the present disclosure, at step 203(ii), the one or more hardware processors 104 filter, using a band-pass filter technique, a plurality of temporal signals extracted from the plurality of ROIs to obtain a filtered set of signals, wherein the filtered set of signals are obtained by removing noise from each of the plurality of temporal signals. As the human heart beats within the range of 40-240 beats per minute (bpm), the frequency range of the band-pass filter may be set from 0.7 to 4 Hz, the filtering being implemented by a serial fusion of Eulerian and Lagrangian approaches. Further, as is known in the art, each of the plurality of temporal signals extracted may comprise a non-stationary trend due to focus or illumination changes. A de-trending filter may be applied to remove this trend from each of the plurality of temporal signals. In an example implementation of steps 203(i) and 203(ii), referring to FIG. 3 again, the plurality of temporal signals that may be extracted for each of the plurality of ROIs and filtered using the band-pass filter technique may be referred.
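A hedged sketch of the de-trending and 0.7-4 Hz band-pass step, using SciPy's Butterworth filter. The filter order and the use of `detrend`/`filtfilt` are illustrative choices, not prescribed by the source:

```python
import numpy as np
from scipy.signal import butter, filtfilt, detrend

def bandpass_pulse_band(signal, fps=30.0, low=0.7, high=4.0, order=3):
    """De-trend, then band-pass to the 40-240 bpm heart-rate band (0.7-4 Hz).
    fps is the video frame rate; order=3 is an illustrative choice."""
    sig = detrend(signal)                       # remove slow illumination/focus trend
    nyq = fps / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, sig)                  # zero-phase band-pass
```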
According to an embodiment of the present disclosure, at step 203(iii), the one or more hardware processors 104 extract a pulse signal from the filtered set of signals using a Kurtosis based optimization technique. As discussed above, each of the plurality of temporal signals comprises the cardiovascular pulse signal of the user and noise. In general, in case of the plurality of temporal signals, the pulse signal may be extracted using a blind source separation by estimating the individual source components.
As is known in the art, the amplitudes of the pulse signal and noise in each of the plurality of temporal signals depend upon the facial structure, user characteristics, for example skin color, and environmental settings, for example illumination. In such a scenario, Z-score normalization may be applied to normalize the plurality of temporal signals. In an embodiment, the pulse signal is extracted from the plurality of temporal signals by applying the Kurtosis based optimization technique, as it facilitates a high level of accuracy for extracting the pulse signal. In an example implementation of step 203(iii), referring to FIG. 3 again, the pulse signal extracted from the filtered set of signals by applying the Kurtosis based optimization technique may be referred.
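As a simplified, non-limiting stand-in for the Kurtosis based optimization (the source does not spell out the optimization itself), the sketch below z-score normalizes each filtered ROI signal and keeps the one whose excess kurtosis has the largest magnitude, kurtosis magnitude being a common non-Gaussianity measure that separates a periodic pulse from Gaussian noise:

```python
import numpy as np
from scipy.stats import kurtosis

def extract_pulse(signals):
    """Simplified kurtosis-based selection: z-score normalise each
    filtered ROI signal, then keep the one with the largest |excess
    kurtosis| (Gaussian noise has excess kurtosis near 0).
    signals: (num_rois, n) array of non-constant signals."""
    z = (signals - signals.mean(axis=1, keepdims=True)) / signals.std(axis=1, keepdims=True)
    best = np.argmax([abs(kurtosis(s)) for s in z])   # most non-Gaussian row
    return z[best]
```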
According to an embodiment of the present disclosure, at step 203(iv), the one or more hardware processors 104 estimate, based upon the pulse signal extracted, the plurality of instantaneous heart rates of the user. At step 203(iv)(a), the one or more hardware processors 104 partition the pulse signal (extracted in the step 203(iii) above) into a plurality of overlapping windows (not shown in the figure), wherein each of the plurality of overlapping windows comprises a sequence of frames corresponding to a facial video of the user. The pulse signal extracted is partitioned into the plurality of overlapping windows, and an instantaneous heart rate is estimated from each of the plurality of overlapping windows.
In an embodiment, the size of each of the plurality of overlapping windows (not shown in the figure) is 60 frames, and there is an overlap of 30 frames in subsequent windows. Considering an example scenario, the first window comprises frames 1 to 60 of the facial video, the second window comprises frames 30 to 90 of the facial video, the third window comprises frames 60 to 120 of the facial video, and so on, wherein the first window, the second window, the third window and the subsequent windows correspond to the plurality of overlapping windows. Further, as mentioned above, each of the plurality of overlapping windows comprises the sequence of frames corresponding to the facial video of the user. In general, the facial video (or an input video) comprises a human face along with a large background. However, the instantaneous heart rate estimations and micro-expressions spotting require only the facial region as an input (as discussed in step 201).
In general, video frames may be classified, based upon one or more facial expressions, as: (i) onset frames, wherein an expression comes into existence and thus a plurality of temporal deformations are escalating; (ii) apex frames, where the expression is at its peak and thus the plurality of temporal deformations are the most prominent; (iii) offset frames, where the expression diminishes and thus the plurality of temporal deformations start to fade away; and (iv) neutral frames, where no expression is noticeable and thus the plurality of temporal deformations are negligible.
According to an embodiment of the present disclosure, at step 203(iv)(b), the one or more hardware processors 104 estimate the plurality of instantaneous heart rates based upon the plurality of overlapping windows. In an embodiment, a Fast Fourier Transform (FFT) technique is applied on the partitioned pulse signal extracted from each of the plurality of overlapping windows to estimate the plurality of instantaneous heart rates, wherein each of the plurality of instantaneous heart rates estimated corresponds to at least one of the plurality of overlapping windows. In an example implementation of step 203(iv)(b), referring to FIG. 3 again, the plurality of instantaneous heart rates estimated may be referred, wherein the plurality of instantaneous heart rates are estimated as 85, 86, 83, 85,…..,78.
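The windowed FFT estimation of steps 203(iv)(a)-(b) may be sketched as below, using the 60-frame windows and 30-frame overlap of the example above; the 30 fps frame rate and the restriction to the 0.7-4 Hz heart-rate band are assumptions carried over from the earlier steps:

```python
import numpy as np

def instantaneous_heart_rates(pulse, fps=30.0, win=60, step=30):
    """Partition the pulse signal into overlapping windows and take the
    dominant FFT frequency of each window, within the 0.7-4 Hz band,
    as the instantaneous heart rate in bpm."""
    rates = []
    for start in range(0, len(pulse) - win + 1, step):
        seg = pulse[start:start + win]
        seg = seg - seg.mean()                       # drop the DC component
        spectrum = np.abs(np.fft.rfft(seg))
        freqs = np.fft.rfftfreq(win, d=1.0 / fps)
        band = (freqs >= 0.7) & (freqs <= 4.0)       # heart-rate band only
        rates.append(60.0 * freqs[band][np.argmax(spectrum[band])])
    return rates
```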
According to an embodiment of the present disclosure, at step 204, the one or more hardware processors 104 spot, based upon a plurality of variations in the plurality of the instantaneous heart rates estimated, the plurality of facial micro-expressions of the user. The step of computing the plurality of variations in the plurality of the instantaneous heart rates estimated may now be considered in detail. In an embodiment, one or more variations in an instantaneous heart rate corresponding to the m^th frame, v_m, is given by:
v_m = Σ_{a=-p}^{p} | h_i - h_{i+a} |    equation (2)
wherein the one or more variations and the instantaneous heart rate at the m^th frame correspond to the plurality of variations and the plurality of the instantaneous heart rates estimated respectively, wherein p is the number of neighbors, |.| denotes the absolute operation, i is the fragment containing the m^th frame, and h_i denotes the instantaneous heart rate in the i^th fragment. In an embodiment, the value of p is set to 2. Similarly, each of the plurality of variations in the plurality of the instantaneous heart rates may be computed.
According to an embodiment of the present disclosure, the plurality of variations may be computed as sums of changes in a plurality of neighboring heart rates, wherein each of the plurality of neighboring heart rates corresponds to the plurality of instantaneous heart rates estimated. Taking an example scenario, suppose h1, h2, h3, h4 and h5 denote the first, second, third, fourth and fifth instantaneous heart rates estimated respectively; then the variation at h3 may be computed as the sum of the absolute values of (h1-h3), (h2-h3), (h4-h3) and (h5-h3). Thus, each of the plurality of variations is computed as a sum of changes in a plurality of neighboring heart rates.
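The variation of equation (2), with p = 2, may be sketched as follows (the a = 0 term of the sum is identically zero and is skipped, and neighbors falling outside the sequence are simply omitted in this illustration):

```python
def hr_variation(rates, i, p=2):
    """Variation at fragment i per equation (2): sum of absolute
    differences between h_i and its p neighbours on each side."""
    return sum(abs(rates[i] - rates[i + a])
               for a in range(-p, p + 1)
               if a != 0 and 0 <= i + a < len(rates))
```

With the example heart rates above, `hr_variation([85, 86, 83, 85, 78], 2)` gives |83-85| + |83-86| + |83-85| + |83-78| = 12.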
According to an embodiment of the present disclosure, the step of spotting the plurality of facial micro-expressions comprises classifying, based upon the plurality of variations in the plurality of instantaneous heart rates, a plurality of plausible peaks as either a genuine peak or a spurious peak. Further, each of the plurality of plausible peaks corresponds to at least one facial micro-expression spot amongst the plurality of facial micro-expressions spotted. Determining the plurality of plausible peaks, along with the classification, may now be considered in detail.
In an embodiment, the plurality of plausible peaks may be determined based upon the plurality of ROIs extracted. Initially, one or more feature variations in each of the plurality of ROIs extracted may be evaluated. Based upon the one or more feature variations, one or more ROIs amongst the plurality of ROIs extracted are selected that may possibly comprise the plurality of temporal deformations due to the facial micro-expressions. An encoding of one or more appearance based features may be performed using the selected one or more ROIs, and the encoding may then be used to determine or extract the plurality of plausible peaks.
The one or more feature variations are required for spotting the plurality of facial micro-expressions. In an embodiment, the one or more feature variations may be evaluated using changes in a facial appearance of the user. The one or more feature variations in each of the plurality of ROIs may be calculated by utilizing an Eulerian methodology. That is, the intensity variation in a particular ROI amongst the plurality of ROIs extracted provides the one or more feature variations. In an embodiment, the feature variation corresponding to the i^th ROI in the a^th frame, F_i(a), is given as:
F_i(a) = Σ_{(x,y)∈R^i} ( G^a_{(x,y)} - ( G^{a+k}_{(x,y)} + G^{a-k}_{(x,y)} ) / 2 )^2    equation (3)
wherein R^i denotes the i^th ROI and G^w_{(x,y)} is the gray-scale intensity at pixel location (x,y) in the w^th frame. Referring to equation (3), it may be noted that the feature difference is evaluated by subtracting features of frames within a specified interval defined by k, rather than by subtracting features of alternate frames, thereby providing an optimized feature representation as compared to considering an alternate frame difference.
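Equation (3) may be sketched as below; the (n, H, W) gray-scale frame stack and the boolean ROI mask are assumed layouts for illustration:

```python
import numpy as np

def feature_variation(gray_frames, roi_mask, a, k=1):
    """Feature variation of one ROI at frame a per equation (3):
    squared deviation of the current gray intensities from the average
    of the frames k steps before and after.
    gray_frames: (n, H, W) array; roi_mask: (H, W) boolean mask."""
    cur = gray_frames[a][roi_mask].astype(float)
    ref = (gray_frames[a + k][roi_mask].astype(float)
           + gray_frames[a - k][roi_mask].astype(float)) / 2.0
    return float(np.sum((cur - ref) ** 2))
```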
According to an embodiment of the present disclosure, based upon the one or more feature variations, the one or more ROIs are selected amongst the plurality of ROIs extracted, wherein the selected one or more ROIs comprise the plurality of temporal deformations due to the facial micro-expressions. In an embodiment, the total variation in a ROI (amongst the plurality of ROIs extracted), d(i), is obtained by adding the one or more feature variations as:
d(i) = Σ_{a=q+1}^{n-q} F_i(a)    equation (4)
wherein F_i(a) is the feature variation for the i^th block, n is the number of video frames, and q is the interval used for the frame feature difference. In an embodiment, the 40% of the ROIs comprising the largest total variations were selected.
According to an embodiment of the present disclosure, the encoding (that is a feature encoding) may be obtained by adding the feature variation(s) of the selected one or more ROIs (amongst the plurality of ROIs extracted), that is:
A(a) = Σ_{i∈β} F_i(a)    equation (5)
wherein A denotes an appearance based feature encoding (amongst the one or more appearance based features), F_i(a) is the feature variation for the i^th block, a is a frame number, and β stores the indices of the selected one or more ROIs (amongst the plurality of ROIs extracted).
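Equations (4) and (5) may be combined into one short sketch: total variation per ROI, selection of the 40% of ROIs with the largest totals, and summation of the selected rows into the encoding A. The (num_rois, num_frames) matrix layout of pre-computed feature variations is an assumption for illustration:

```python
import numpy as np

def appearance_encoding(F, keep_frac=0.4):
    """Select the keep_frac (40%) of ROIs with the largest total
    variation d(i) (equation (4)) and sum their feature variations per
    frame to obtain the encoding A (equation (5)).
    F: (num_rois, num_frames) matrix of feature variations."""
    totals = F.sum(axis=1)                      # d(i) for every ROI
    k = max(1, int(round(keep_frac * len(totals))))
    beta = np.argsort(totals)[-k:]              # indices of the selected ROIs
    return F[beta].sum(axis=0)                  # A(a), one value per frame
```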
Plausible Spot Detection – As is known in the art, micro-expressions are given by the peaks in the feature encoding. Spurious peaks, which do not correspond to genuine facial micro-expressions, may be generated by the noise. The spurious peaks usually have low amplitude as compared to the genuine micro-expression peaks. In an embodiment, a threshold T may be defined to remove a few of the spurious peaks, according to any known techniques or methods, as below:
T = A_mean + t × (A_max - A_mean)    equation (6)
wherein A_max and A_mean indicate the maximum and mean value in A respectively, and t is a predefined parameter. In an embodiment, all the peaks whose magnitude is greater than T are marked as plausible peaks and comprise the plurality of plausible peaks.
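The thresholding of equation (6) may be sketched with SciPy's peak finder; the value t = 0.3 used below is purely illustrative (the description varies t from 0 to 1 for the ROC curves):

```python
import numpy as np
from scipy.signal import find_peaks

def plausible_peaks(A, t=0.3):
    """Plausible micro-expression spots: peaks of the encoding A whose
    magnitude exceeds T = A_mean + t * (A_max - A_mean), per
    equation (6). t = 0.3 is an illustrative choice."""
    T = A.mean() + t * (A.max() - A.mean())
    peaks, _ = find_peaks(A, height=T)          # local maxima above T
    return peaks
```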
According to an embodiment of the present disclosure, each of the plurality of plausible peaks corresponds to at least one facial micro-expression spot amongst the plurality of facial micro-expressions spotted (as mentioned above). In an embodiment, the plurality of facial micro-expressions spotted may be classified either as genuine micro-expressions or spurious micro-expressions (or spuriously generated micro-expressions), wherein the spuriously generated micro-expressions comprise one or more facial micro-expressions spuriously generated due to noise.
Hence, amongst all the plurality of plausible peaks detected, only some peaks correspond to the genuine micro-expressions, while the others correspond to the spuriously generated micro-expressions. As mentioned above, the plurality of instantaneous heart rates estimated may be used to classify the plurality of plausible peaks as either the genuine peak or the spurious peak. This is due to the reason that, in general, a heart rate fluctuates when the micro-expressions of the user change, and therefore, the plurality of variations in the plurality of instantaneous heart rates should be higher at the frames affected by micro-expressions.
In an embodiment, the plurality of plausible peaks may thus be classified using below equation:
Classify(s) = { Genuine, if v_s > t_f; Spurious, otherwise }    equation (7)
wherein s is a plausible location of a micro-expression, v_s is the variation in the instantaneous heart rate at the frame at location s, and t_f is a pre-defined threshold, which is set to 10 bpm.
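Equation (7) reduces to a one-line rule, shown here with the 10 bpm threshold from the description:

```python
def classify_peak(v_s, t_f=10.0):
    """Equation (7): a plausible peak is genuine if the heart-rate
    variation at its location exceeds t_f (10 bpm), spurious otherwise."""
    return "Genuine" if v_s > t_f else "Spurious"
```

With the example variations depicted in FIG. 5, `classify_peak(16)` yields "Genuine" and `classify_peak(3)` yields "Spurious".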
According to an embodiment of the present disclosure, the efficacy and performance evaluation of the proposed methodology, that is, spotting the plurality of facial micro-expressions based upon the plurality of instantaneous heart rates estimated, may now be considered in detail. The efficacy is tested using a publicly available dataset. The dataset comprises a total of 57 micro-expressions from a total of 22 users. A plurality of face videos from the total of 22 users were acquired using a camera, wherein the plurality of face videos were acquired at 30 frames per second (fps) and the resolution of the camera was set to 640×480 pixels. Further, a uniform illumination was maintained during the acquisition, and the dataset comprises a plurality of onsets and offsets of micro-expressions from the plurality of face videos.
In an embodiment, the performance is evaluated based upon a verification of the genuine peak (amongst the plurality of plausible peaks) denoting the apex of a spotted micro-expression (amongst the plurality of facial micro-expressions spotted). In an embodiment, the spotted micro-expression is classified as a true positive, if the genuine peak is in the range of [onset-20, offset+20], else the spotted micro-expression is classified as a false positive. Further, behavior of a True Positive Rate (TPR) and a False Positive Rate (FPR) is analyzed for the performance evaluation.
In an embodiment, the TPR is computed as a percentage of true positives, divided by a total number of micro-expressions spotted, while the FPR is computed as a percentage of false positives, divided by a total number of false peaks. The performance evaluation is performed using a plurality of Receiver Operating Characteristic (ROC) curves, wherein the x-axis and y-axis of each of the ROC curves denote the FPR and the TPR respectively. Further, a plurality of threshold values required for plotting each of the ROC curves may be obtained by varying the predefined parameter t in the equation (6) from 0 to 1.
According to an embodiment of the present disclosure, the performance analysis of the proposed methodology and the corresponding technical advantages may be considered in detail by comparing the proposed methodology with a system MEP, wherein the system MEP (or MEP) is simply an example system (or an example name) used for comparison purposes only. In an embodiment, the system MEP may be obtained based upon the plurality of facial micro-expressions spotted, but by excluding the plurality of instantaneous heart rates estimated. The system MEP generated only the plurality of plausible peaks. Referring to FIG. 4, the plurality of the ROC curves corresponding to the proposed methodology and the system MEP may be referred. Referring to FIG. 4 again, it may be observed that a large number of genuine micro-expressions (amongst the plurality of facial micro-expressions spotted) are spotted at low thresholds; however, a large number of spurious micro-expression spots may also be observed.
In an embodiment, referring to FIG. 4 again, it may be observed that the proposed methodology, by spotting the plurality of facial micro-expressions based upon the plurality of instantaneous heart-rates, performs better than the system MEP, thereby further indicating that the plurality of variations in the plurality of instantaneous heart rates successfully classify the plurality of plausible peaks as either the genuine peak or the spurious peak. Referring to FIG. 5, an example of the plurality of plausible peaks classified as either the genuine peak or the spurious peak based upon the plurality of variations in the plurality of instantaneous heart rates may be referred. Referring to FIG. 5 again, it may be noted that the plurality of variations in the plurality of instantaneous heart rates at the depicted genuine and spurious spots are 16 bpm and 3 bpm respectively.
In an embodiment, the memory 102 can be configured to store any data that is associated with spotting the plurality of facial micro-expressions of the user based upon the estimation of the plurality of instantaneous heart rates of the user. In an embodiment, the information pertaining to the facial region detected, the plurality of ROIs extracted, the temporal signal extracted, the filtered set of signals, the pulse signal extracted, the plurality of instantaneous heart rates estimated, the plurality of facial micro-expressions spotted, the plurality of variations in the plurality of instantaneous heart rates, and the plurality of plausible peaks etc. is stored in the memory 102. Further, all information (inputs, outputs and so on) pertaining to spotting the plurality of facial micro-expressions of the user based upon the estimation of the plurality of instantaneous heart rates of the user may also be stored in the database, as history data, for reference purpose.
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
The embodiments of the present disclosure herein address the unresolved problem of spotting the plurality of facial micro-expressions of the user based upon the estimation of the plurality of instantaneous heart rates of the user. The embodiments thus provide for a highly accurate and optimized spotting of the plurality of micro-expressions of the user based upon the estimation of the plurality of instantaneous heart rates. Moreover, the embodiments herein further provide for computing, based upon the pulse signal extracted, the plurality of variations in the plurality of instantaneous heart rates to spot the plurality of facial micro-expressions of the user.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.