
System And Method For Analyzing Physiological State Of Interlocutors

Abstract: An embodiment herein provides for a system for analyzing physiological state of interlocutors. The system processes a set of multimedia data packets associated with at least one interlocutor, said set of multimedia data packets being deconstructed into at least one media stream indicative of data packets; processes said at least one media stream to detect at least one syntactical object indicative of physiological state of the at least one interlocutor; identifies at least one physiological state label for each of the detected at least one syntactical object; and, based on the identified at least one physiological state label, extracts a physiological attribute value for the at least one interlocutor. The system may further transmit a set of processed data packets associated with at least one message generated by the system based on the extracted physiological attribute value.


Patent Information

Application #: 202241058916
Filing Date: 14 October 2022
Publication Number: 16/2024
Publication Type: INA
Invention Field: BIO-MEDICAL ENGINEERING
Status:
Parent Application:

Applicants

MUST Research Labs LLP
Flat 109, Block 4, My Home Krishe, Hyderabad - 500046, Telangana, India.

Inventors

1. MUSTAFI, Joy
Flat 301, Block 4, My Home Vihanga, Hyderabad - 500046, Telangana, India.
2. VUTUKURI, Santosh
3-1-119, Santosh Nagar, Azad Road, Near Raithu Bazar, Bhongir - 508116, Telangana, India.
3. SREEPADA, Soudamini
Flat 30508, Indu Fortune Fields Gardenia, KPHB 13th Phase, Hyderabad - 500085, Telangana, India.
4. VIJAYAKUMAR, Tharangini
B404, Purvi Lotus, HSR layout, Hosapalya Main Road, Sector 2, Bangalore - 560068, Karnataka, India.
5. PAUL, Anubhab
9A, Becharam Chatterjee Road, Behala Natun Para. Kolkata - 700061, West Bengal, India.

Specification

Description:

FIELD OF INVENTION
[0001] The present disclosure generally relates to a system for analyzing physiological state of interlocutors. More particularly, it relates to a system for providing recommendations to interlocutors to adjust their physical and locutory expressions based on predicted physiological states of interlocutors.

BACKGROUND AND PRIOR ART
[0002] The advent of the Internet has enabled the digital transformation of every aspect of human life. Increasingly, human-to-human interaction is becoming reliant on digital solutions owing to their spontaneity and overall convenience. Some modes of digital communication include phone calls, video calls, VoIP, instant messaging, etc. Moreover, the ubiquity of digital communication has made it an inescapable necessity in today’s professional and social activities.
[0003] However, modes of digital communication have been criticized for several reasons, the foremost being their inefficient or ineffective conveyance of social and emotional cues between interlocutors, i.e. the listeners and speakers in a communication. Facial expressions, body language, speech intonation and tone (factors that indicate the physiological state of the interlocutors) are difficult to convey over a digital medium. Physiological states refer to the condition or state of the body or of bodily function. Physiological states may be physical states, including fatigue, vigour, hunger, etc., or emotional states, including happiness, sadness, love, anger, etc. For instance, it may be difficult to gauge a speaker’s emotional state when using a video conferencing tool owing to network bandwidth issues, limitations of monitor displays in terms of size and orientation, etc. This risks misunderstanding or misinterpretation, and often causes more problems than the benefits digital communication offers. Apart from the dangers of miscommunication, interlocutors in digital communication may want to adjust their physiological appearance in response to the other party’s physiological state. Existing solutions do not provide such feedback in real time to allow interlocutors to modify their tone or facial expression in order to stay relevant or captivating during communication. For instance, in an online class or a workshop, a speaker may choose to adjust their speed or change their style of presentation depending on the listeners’ physiological state. If the listeners’ physiological state is determined to be ‘distracted’, the speaker may want to take a break or employ a different style of presentation. Likewise, during negotiations, a first interlocutor may try to ascertain a second interlocutor’s state of mind from the tone and intonation of the latter’s speech. A speaker may also try to assess the listener’s face, body posture and speech, among other indicators, to ascertain the listener’s physiological state. Moreover, speakers may also want to know their own perceived physiological state and may choose to adjust it to obtain the intended result. For instance, a speaker, unbeknownst to them, may be using a harsh tone or speaking too quickly, which may not be desirable in a particular situation. In such cases, speakers may wish to receive feedback in real time to remedy the same.
[0004] Several solutions have been proposed to address problems in conveying the physiological state of interlocutors during digital communication. Most of these solutions rely on independent estimations of the physical and mental states of interlocutors using computer vision and natural language processing technologies. However, these solutions do not provide a holistic appraisal of the interlocutor’s physiological state. A physiological analyzing model is required that uses physical expressions indicative of facial expressions, postures, gestures, etc., and vocal expressions such as tonal and locutionary acts, among other contextual information. Additionally, such expressions and contextual information need to be detected and combined to improve the accuracy of the predicted emotion.
[0005] Furthermore, once the physiological state of an interlocutor is detected, the detected emotion must be presented to the interlocutor along with recommendations, so as to allow interlocutors to adjust their expressions and, in turn, their perceived physiological state to achieve the desired effect during communication. Existing solutions do not provide for such recommendations.
[0006] Therefore, there is a need for a system for analyzing physiological state of interlocutors by detecting and combining physical and locutory expressions of interlocutors. Furthermore, there is a need for a system that provides recommendations to interlocutors to adjust their physical and locutory expressions based on the physiological state of interlocutors in communication.

OBJECTS OF THE INVENTION
[0007] An object of the present disclosure is to provide for a system for analyzing physiological state of interlocutors.
[0008] Another object of the present disclosure is to provide for a system for analyzing physiological state of interlocutors that uses contextual information of the interlocutors such as physical and locutory expressions.
[0009] Yet another object of the present disclosure is to provide for a system that provides a physiological attribute value that is a numeric or ordinal value.
[00010] Another object of the present disclosure is to provide for a system for analyzing physiological state of interlocutors that also provides recommendations to adjust the physiological state of the interlocutors.
[00011] Another object of the present disclosure is to provide for a system that can analyze physiological states of a diverse set of human and non-human interlocutors.
[00012] The other objects and advantages of the present invention will be apparent from the following description when read in conjunction with the accompanying drawings, which are incorporated for illustration of the preferred embodiments of the present invention and are not intended to limit the scope thereof.

SUMMARY OF THE INVENTION
[00013] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended for use in determining/limiting the scope of the claimed subject matter.
[00014] The present disclosure generally relates to a system for analyzing physiological state of interlocutors. More particularly, it relates to a system for providing recommendations to interlocutors to adjust their physical and locutory expressions based on predicted physiological states of interlocutors.
[00015] An aspect of the present disclosure relates to a system for analyzing physiological state of interlocutors, said system may include: a processor; a memory coupled to the processor, wherein the memory stores processor-executable instructions which, on execution, may cause the processor to: process a set of multimedia data packets associated with at least one interlocutor, said set of multimedia data packets being deconstructed into at least one media stream; process said at least one media stream to detect at least one syntactical object indicative of physiological state of the at least one interlocutor; identify at least one physiological state label for each of the detected at least one syntactical object; and based on the identified at least one physiological state label, extract a physiological attribute value for the at least one interlocutor.
[00016] In an embodiment, upon extracting the physiological attribute value, the processor may be configured to: transmit a set of processed data packets associated with at least one message generated by the system based on the extracted physiological attribute value to the interlocutor.
[00017] In an embodiment, the set of processed data packets may be combined with the at least one media stream deconstructed from the set of multimedia data packets.
[00018] In an embodiment, the set of multimedia data packets may be a set of video data packets, a set of audio data packets, a set of textual data packets, or a set of data packets indicative of virtual representations of the at least one interlocutor in computer-generated environments, or a combination thereof.
[00019] In an embodiment, the at least one media stream may include an image stream indicative of a set of at least one still image, an audio stream indicative of a recording of sounds, or a data stream indicative of time metadata of the image stream and the audio stream.
[00020] In an embodiment, the at least one syntactical object may include at least one object or at least one part of an object in the at least one media stream that may be capable of signifying physiological states.
[00021] In an embodiment, the at least one syntactical object may include physical expressions indicative of eye movements, faces, gestures, behaviours, body parts, or postures from the at least one media stream; and locutory expressions indicative of locutionary acts and tones from the at least one media stream.
[00022] In an embodiment, the system may further include a syntactic engine with a first deep learning model that outputs a first set of data packets indicative of at least one categorical label or at least one numeric value associated with the detected at least one syntactical object on receiving data packets associated with the at least one media stream; a prediction engine with a second deep learning model that outputs a second set of data packets indicative of at least one categorical label associated with the detected at least one physiological state label on receiving the first set of data packets; and an aggregation engine with a third deep learning model that outputs a third set of data packets indicative of at least one numerical or ordinal value associated with the detected at least one physiological attribute value on receiving the second set of data packets, wherein the first, second and third deep learning models are optimized and pretrained on an annotated dataset of a diverse set of human and non-human interlocutors.
[00023] Another aspect of the present disclosure relates to a method for analysing physiological state of interlocutors, said method may include the steps of: processing, at a processor associated with a system, a set of multimedia data packets associated with at least one interlocutor, said set of multimedia data packets being deconstructed into at least one media stream; processing, at the processor, said at least one media stream to detect at least one syntactical object capable of indicating physiological state of the at least one interlocutor; identifying, at the processor, at least one physiological state label for each of the detected at least one syntactical object; and based on the identified at least one physiological state label, at the processor, extracting a physiological attribute value for the at least one interlocutor.
[00024] In an embodiment, the method may further include the step of: transmitting, by the processor, a set of processed data packets with at least one message generated by the system based on the extracted physiological attribute value to the interlocutor.
[00025] Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.

BRIEF DESCRIPTION OF DRAWINGS
[00026] The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present invention. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry/sub-components of each component. It will be appreciated by those skilled in the art that the invention of such drawings includes the invention of electrical components, electronic components, or circuitry commonly used to implement such components.
[00027] Fig. 1 illustrates an exemplary block diagram representation of an implementation of an architecture of a system for analysing physiological state of interlocutors, according to an embodiment of the present disclosure.
[00028] Fig. 2 illustrates an exemplary detailed block diagram representation of functional components of a system for analysing physiological state of interlocutors, according to embodiments of the present disclosure.
[00029] Fig. 3 illustrates an exemplary detailed block diagram representation of an implementation of an internal architecture of a system for analysing physiological state of interlocutors, according to an embodiment of the present disclosure.
[00030] Fig. 4 illustrates an exemplary flow diagram representation of a method for predicting physiological state of interlocutors, according to an embodiment of the present disclosure.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[00031] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims. It will be apparent to one skilled in the art that embodiments of the present invention may be practised without some of these specific details.
[00032] Embodiments of the present invention include various steps, which will be described below. The steps may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special purpose processor programmed with the instructions to perform the steps. Alternatively, steps may be performed by a combination of hardware, software, and firmware and/or by human operators.
[00033] Thus, for example, it will be appreciated by those of ordinary skill in the art that the diagrams, schematics, illustrations, and the like represent conceptual views or processes illustrating systems and methods embodying this invention. The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing associated software. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the entity implementing this invention. Those of ordinary skill in the art further understand that the exemplary hardware, software, processes, methods, and/or operating systems described herein are for illustrative purposes and, thus, are not intended to be limited to any particular named element.
[00034] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word, without precluding any additional or other elements.
[00035] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[00036] The present disclosure generally relates to a system for analyzing physiological state of interlocutors. More particularly, it relates to a system for providing recommendations to interlocutors to adjust their physical and locutory expressions based on predicted physiological states of interlocutors.
[00037] An aspect of the present disclosure relates to a system for analyzing physiological state of interlocutors, said system may include: a processor; a memory coupled to the processor, wherein the memory stores processor-executable instructions which, on execution, may cause the processor to: process a set of multimedia data packets associated with at least one interlocutor, said set of multimedia data packets being deconstructed into at least one media stream; process said at least one media stream to detect at least one syntactical object indicative of physiological state of the at least one interlocutor; identify at least one physiological state label for each of the detected at least one syntactical object; and based on the identified at least one physiological state label, extract a physiological attribute value for the at least one interlocutor.
[00038] In an embodiment, upon extracting the physiological attribute value, the processor may be configured to: transmit a set of processed data packets associated with at least one message generated by the system based on the extracted physiological attribute value to the interlocutor.
[00039] In an embodiment, the set of processed data packets may be combined with the at least one media stream deconstructed from the set of multimedia data packets.
[00040] In an embodiment, the set of multimedia data packets may be a set of video data packets, a set of audio data packets, a set of textual data packets, or a set of data packets indicative of virtual representations of the at least one interlocutor in computer-generated environments, or a combination thereof.
[00041] In an embodiment, the at least one media stream may include an image stream indicative of a set of at least one still image, an audio stream indicative of a recording of sounds, or a data stream indicative of time metadata of the image stream and the audio stream.
[00042] In an embodiment, the at least one syntactical object may include at least one object or at least one part of an object in the at least one media stream that may be capable of signifying physiological states.
[00043] In an embodiment, the at least one syntactical object may include physical expressions indicative of eye movements, faces, gestures, behaviours, body parts, or postures from the at least one media stream; and locutory expressions indicative of locutionary acts and tones from the at least one media stream.
[00044] In an embodiment, the system may further include a syntactic engine with a first deep learning model that outputs a first set of data packets indicative of at least one categorical label or at least one numeric value associated with the detected at least one syntactical object on receiving data packets associated with the at least one media stream; a prediction engine with a second deep learning model that outputs a second set of data packets indicative of at least one categorical label associated with the detected at least one physiological state label on receiving the first set of data packets; and an aggregation engine with a third deep learning model that outputs a third set of data packets indicative of at least one numerical or ordinal value associated with the detected at least one physiological attribute value on receiving the second set of data packets, wherein the first, second and third deep learning models are optimized and pretrained on an annotated dataset of a diverse set of human and non-human interlocutors.
[00045] Another aspect of the present disclosure relates to a method for analysing physiological state of interlocutors, said method may include the steps of: processing, at a processor associated with a system, a set of multimedia data packets associated with at least one interlocutor, said set of multimedia data packets being deconstructed into at least one media stream; processing, at the processor, said at least one media stream to detect at least one syntactical object capable of indicating physiological state of the at least one interlocutor; identifying, at the processor, at least one physiological state label for each of the detected at least one syntactical object; and extracting, at the processor, a physiological attribute value for the at least one interlocutor based on the identified at least one physiological state label.
[00046] In an embodiment, the method may further include the step of: transmitting, by the processor, a set of processed data packets with at least one message generated by the system based on the extracted physiological attribute value to the interlocutor.
[00047] Fig. 1 illustrates an exemplary block diagram representation of an implementation of an architecture of a system 100 for analysing physiological state of interlocutors, according to an embodiment of the present disclosure. The architecture 1000 may include a system 100, an electronic device 110, at least one computing device 125 associated with at least one interlocutor 120, and a communication network 130.
[00048] The system 100 may be implemented by way of a single device or of multiple devices that may be operatively connected or networked together. In one embodiment, the system 100 may be implemented in or associated with the electronic device 110, said electronic device 110 being installed locally or in a remote location. In another embodiment, the system 100 may be implemented by way of a standalone device, such as a server that may be communicatively coupled to the electronic device 110. In such an embodiment, the server may be a stand-alone server, a remote server, a cloud computing server, a dedicated server, a rack server, a server blade, a server rack, a bank of servers, a server farm, hardware supporting a part of a cloud service or system, a home server, hardware running a virtualized server, one or more processors executing code to function as a server, one or more machines performing server-side functionality as described herein, at least a portion of any of the above, or some combination thereof. In yet another embodiment, the system 100 may be implemented in or associated with respective computing devices 125-1, 125-2, …, 125-N (individually referred to as the computing device 125, and collectively referred to as the computing devices 125), associated with one or more interlocutors 120-1, 120-2, …, 120-N (individually referred to as the interlocutor 120, and collectively referred to as the interlocutors 120). In such an embodiment, the system 100 may be replicated in each of the computing devices 125. In one embodiment, interlocutors 120 may include, but are not limited to, human and non-human interlocutors.
[00049] The computing devices 125 and the electronic device 110 may be any one of an electrical, an electronic, an electromechanical and a computing device, including, but not limited to, a mobile device, a smart phone, a Personal Digital Assistant (PDA), a tablet computer, a phablet computer, a wearable device, a Virtual Reality/Augmented Reality (VR/AR) device, a laptop, a desktop, a server, and the like. In an embodiment where the system 100 is implemented using the server or the electronic device 110, the computing devices 125 may communicate with the system 100 using a communication network 130. The communication network 130 may be a wired or wireless network implemented as one of the different types of networks, such as an intranet, local area network (LAN), wide area network (WAN), the internet, Wi-Fi, LTE network, CDMA network, and the like. The communication network 130 may be implemented so as to allow the transfer of data packets using a variety of protocols, including but not limited to, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like.
[00050] The system 100 may also include one or more processor(s) 102, a memory 103 and one or more interface(s) 104. The processor(s) 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that manipulate data based on operational instructions. Further, the processor(s) 102 may be configured to read and execute computer-readable instructions from the memory 103 for performing tasks including, but not limited to, data processing, input/output processing, feature extraction, and/or any other functions. The memory 103 may store computer-readable instructions or routines that may be fetched and executed to create and share data. The memory 103 may comprise any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like. The interface(s) 104 may be used to receive inputs from the computing devices 125 associated with the interlocutors 120. The interface(s) 104 may include a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like. The interface(s) 104 may facilitate communication between the system 100 and the computing devices 125 and the electronic device 110. In an embodiment, the interface(s) 104 may also include a Graphical User Interface (GUI), an Application Programming Interface (API), a Command Line Interface (CLI) or the like.
[00051] The computer-readable instructions may be executed by the processor 102 for analyzing the physiological state of the interlocutors 120. In the example that follows, an interlocutor 120 may desire to analyze the physiological state of the interlocutors 120. In this instance, the interlocutor 120, via the computing device 125 corresponding to the interlocutor 120, may transmit a set of multimedia data packets associated with the interlocutor 120 to the system 100. In an embodiment, the set of multimedia data packets may be a set of video data packets, a set of audio data packets, a set of textual data packets, or a set of data packets indicative of virtual representations of the at least one interlocutor in computer-generated environments, or a combination thereof. In such an embodiment, the system 100 may be communicatively coupled to the corresponding computing device 125 to receive the set of multimedia data packets.
[00052] In an embodiment, the system 100 may process the set of multimedia data packets associated with at least one interlocutor, said set of multimedia data packets may be deconstructed into at least one media stream 215 indicative of data packets. The at least one media stream 215 may be an image stream 215A indicative of a set of at least one still image, an audio stream 215B indicative of a recording of sounds, or a data stream 218 indicative of time metadata of the image stream 215A and the audio stream 215B.
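By way of illustration only, the deconstructed streams described above may be represented with simple data structures. The following Python sketch is a hypothetical representation (the class and field names are assumptions introduced here, not part of the disclosure) of the image stream 215A, the audio stream 215B and the data stream 218 carrying time metadata.

from dataclasses import dataclass
from typing import List

@dataclass
class ImageStream:                     # 215A: a set of at least one still image
    frames: List[bytes]                # encoded still images extracted from the multimedia data packets
    frame_timestamps_ms: List[int]     # capture time of each still image, in milliseconds

@dataclass
class AudioStream:                     # 215B: a recording of sounds
    samples: bytes                     # raw or encoded audio samples
    sample_rate_hz: int = 16000        # assumed default sampling rate

@dataclass
class DataStream:                      # 218: time metadata aligning the image and audio streams
    start_time_ms: int
    duration_ms: int

@dataclass
class MediaStream:                     # 215: one deconstructed media stream
    image: ImageStream
    audio: AudioStream
    data: DataStream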
[00053] In an embodiment, the system 100 may process said at least one media stream 215 to detect at least one syntactical object 225 indicative of physiological state of the at least one interlocutor. The at least one syntactical object 225 comprises at least one object or at least one part of an object in the at least one media stream 215 that may be capable of signifying physiological states. The at least one syntactical object 225 may also include physical expressions indicative of eye movements, faces, gestures, behaviours, body parts, or postures from the at least one media stream 215; and locutory expressions indicative of locutionary acts and tones from the at least one media stream 215.
[00054] In an embodiment, the system 100 may identify at least one physiological state label 235 for each of the detected at least one syntactical object 225. The at least one physiological state label 235 may include, but not be limited to, a label for each of the physiological states of sleepiness, alertness, love, aggression, hunger, distraction, fatigue, vigour, happiness, sadness, fear, anger, surprise, disgust, and the like.
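Purely as an illustrative assumption, the label set recited above could be encoded as a simple enumeration; the member names below merely mirror the states listed in this paragraph and carry no special meaning.

from enum import Enum

class PhysiologicalStateLabel(str, Enum):   # 235: candidate physiological state labels
    SLEEPINESS = "sleepiness"
    ALERTNESS = "alertness"
    LOVE = "love"
    AGGRESSION = "aggression"
    HUNGER = "hunger"
    DISTRACTION = "distraction"
    FATIGUE = "fatigue"
    VIGOUR = "vigour"
    HAPPINESS = "happiness"
    SADNESS = "sadness"
    FEAR = "fear"
    ANGER = "anger"
    SURPRISE = "surprise"
    DISGUST = "disgust"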
[00055] In an embodiment, the system 100 may, based on the identified at least one physiological state label 235, extract a physiological attribute value 245 for the at least one interlocutor. The physiological attribute value 245 extracted by the aggregation engine 240 may have numeric or ordinal value. Since the physiological attribute value 245 may not be a discrete categorical value, the system 100 may be able to predict intermediate physiological states.
[00056] In an embodiment, the system 100 may transmit a set of processed data packets associated with at least one message generated by the system 100 based on the extracted physiological attribute value 245 to the interlocutors 120. In an embodiment, the set of processed data packets may be combined with the at least one media stream deconstructed from the set of multimedia data packets. In an embodiment, the at least one message generated by the system 100 may be a recommendation or a prescription to the interlocutors 120 that may aid said interlocutors 120 to maximize their relevance during communication.
[00057] Fig. 2 illustrates an exemplary detailed block diagram representation of functional components of a system for analysing physiological state of interlocutors, according to embodiments of the present disclosure. The system 100 may include the processor(s) 102, the memory 103, and the interface(s) 104. In some implementations, the system 100 may include processing engine(s) 200. The processing engine(s) 200 may be stored in the memory 103 and executed by the processor(s) 102. The processing engine(s) 200 may be implemented as hardware or a combination of hardware and software components that implement one or more functionalities of the processing engine(s) 200. For instance, the hardware elements of the processing engine(s) 200 may include one or more processor(s) 102 and the software elements may include computer-readable and executable instructions stored in the memory 103. In other examples, the processing engine(s) 200 may be implemented by electronic circuitry or mechanical computers.
[00058] The system 100 may also include a database 105. The database 105 may include data received, stored or generated as a result of the functioning of any component of the processing engine(s) 200. In an embodiment, the data stored in the memory 103 and the database 105 may be processed by any component of the processing engine(s) 200 of the system 100. In an embodiment, the processing engine(s) 200, may include a data collection engine 210, a syntactic engine 220, a prediction engine 230, an aggregation engine 240 and a presentation engine 250.
[00059] Fig. 3 illustrates an exemplary detailed block diagram representation of an implementation of an internal architecture of a processing engine(s) 200 of a system 100 for analysing physiological state of interlocutors, according to an embodiment of the present invention herein. The processing engine(s) 200 may include the data collection engine 210, the syntactic engine 220, the prediction engine 230, the aggregation engine 240 and the presentation engine 250.
[00060] In an embodiment, the data collection engine 210 may receive the set of multimedia data packets associated with at least one interlocutor 120. In an embodiment, the set of multimedia data packets may be from a pre-recorded video. Alternatively, in another embodiment, the set of multimedia data packets may be from a continuous stream of multimedia data packets from a video recorded and transmitted in real-time. In the exemplary embodiment shown in Fig. 3, the set of multimedia data packets is shown as an input stream 212. The data collection engine 210 may deconstruct the set of multimedia data packets into at least one media stream 215. In an embodiment, the at least one media stream 215 may include an image stream 215A indicative of a set of at least one still image, an audio stream 215B indicative of a recording of sounds, or a data stream 218 indicative of time metadata of the image stream 215A and the audio stream 215B.
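As a non-limiting sketch of the data collection engine 210, the function below splits an input stream 212 into the streams of Fig. 3, reusing the data classes sketched under Fig. 1. The decoders decode_frames and decode_audio are placeholders for whatever codec an implementation actually uses; they are assumptions, not part of the disclosure.

from typing import Iterable, Tuple

def decode_frames(packets: bytes) -> Iterable[Tuple[int, bytes]]:
    """Placeholder decoder: yield (timestamp_ms, encoded_still_image) pairs from video packets."""
    raise NotImplementedError

def decode_audio(packets: bytes) -> Tuple[bytes, int]:
    """Placeholder decoder: return (audio_samples, sample_rate_hz) from audio packets."""
    raise NotImplementedError

def deconstruct(input_stream: bytes) -> MediaStream:
    """Data collection engine 210: deconstruct the input stream 212 into a media stream 215."""
    timestamps, frames = [], []
    for ts_ms, frame in decode_frames(input_stream):
        timestamps.append(ts_ms)
        frames.append(frame)
    samples, rate = decode_audio(input_stream)
    return MediaStream(
        image=ImageStream(frames=frames, frame_timestamps_ms=timestamps),
        audio=AudioStream(samples=samples, sample_rate_hz=rate),
        data=DataStream(
            start_time_ms=timestamps[0] if timestamps else 0,
            duration_ms=(timestamps[-1] - timestamps[0]) if timestamps else 0,
        ),
    )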
[00061] In an embodiment, the syntactic engine 220 may process the at least one media stream 215 to detect at least one syntactical object 225 indicative of physiological state of the at least one interlocutor 120. Here, the syntactical objects 225 may include at least one object or at least one part of an object in the at least one media stream 215 that is capable of signifying physiological states. For instance, the syntactic engine may detect positions and orientations of body parts 225B including, but not limited to, hands, elbows, shoulders and feet. In the exemplary embodiment shown in Fig. 3, the syntactical objects 225 detected by the syntactic engine 220 may include physical expressions indicative of eye movements, faces 225A, gestures, behaviours, body parts 225B and postures 225C of the interlocutors from the image stream 215A, and locutory expressions indicative of locutionary acts 225D and tones 225E from the at least one media stream 215. Alternatively, in an embodiment where the at least one media stream is indicative of a VR/AR environment, the syntactical objects 225 may also include facial expressions and other body movements of the interlocutors’ 120 avatars. In an embodiment, the syntactic engine 220 may include a machine learning model, wherein the machine learning model may be a deep learning model that outputs a set of data packets indicative of at least one categorical label or numeric value associated with the detected at least one syntactical object 225 on receiving data packets associated with the at least one media stream 215.
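The disclosure does not fix a network architecture for the first deep learning model; purely as an assumed example, the PyTorch sketch below maps an image frame to a categorical distribution over candidate syntactical objects together with a small set of numeric values (e.g. a region of interest), matching the categorical-label-or-numeric-value output described above.

import torch
import torch.nn as nn

class SyntacticModel(nn.Module):
    """Illustrative first deep learning model of the syntactic engine 220 (architecture assumed)."""

    def __init__(self, num_object_classes: int = 5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.class_head = nn.Linear(32, num_object_classes)  # categorical label per syntactical object 225
        self.value_head = nn.Linear(32, 4)                   # numeric values, e.g. a bounding region

    def forward(self, frame: torch.Tensor):
        feat = self.backbone(frame)                          # frame: (batch, 3, H, W)
        return self.class_head(feat).softmax(dim=-1), self.value_head(feat)

# Example: probs, values = SyntacticModel()(torch.rand(1, 3, 224, 224))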
[00062] In an embodiment, the prediction engine 230 may identify at least one physiological state label 235 for each of the detected at least one syntactical object 225. For instance, the physiological state label 235 identified from at least one syntactical object 225 indicative of faces 225A may be “happy” if the face detected signifies a happy state. In the exemplary embodiment shown in Fig. 3, the prediction engine 230 identifies a facial state label 235A corresponding to faces 225A, a postural state label 235B corresponding to postures 225C, locutory state labels 235C corresponding to locutionary acts 225D, and tonal state labels 235D corresponding to tones 225E. In an embodiment, the prediction engine 230 may include a machine learning model, wherein the machine learning model may be a deep learning model that outputs a set of data packets indicative of at least one categorical label associated with the detected physiological state label 235 on receiving data packets associated with the categorical labels or numeric values from the syntactic engine 220.
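Again as an assumption rather than the disclosed method, the second deep learning model of the prediction engine 230 could be a small feed-forward network that consumes the syntactic engine’s outputs (flattened into a feature vector) and emits a categorical distribution over the physiological state labels 235.

import torch
import torch.nn as nn

class PredictionModel(nn.Module):
    """Illustrative second deep learning model of the prediction engine 230 (architecture assumed)."""

    def __init__(self, feature_dim: int = 9, num_state_labels: int = 14):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(),
            nn.Linear(64, num_state_labels),
        )

    def forward(self, syntactic_features: torch.Tensor) -> torch.Tensor:
        # syntactic_features: concatenated categorical/numeric outputs of the syntactic engine 220
        return self.net(syntactic_features).softmax(dim=-1)  # probabilities over state labels 235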
[00063] In an embodiment, the aggregation engine 240 may extract the physiological attribute value 245 from the at least one physiological state label 235. The physiological attribute value 245 extracted by the aggregation engine 240 may have a numeric or ordinal value. In an embodiment, the physiological attribute value 245 may be a numeric value within the limits [-1, +1]. Alternatively, in another embodiment, the physiological attribute value 245 may be a value on an ordinal scale. For instance, the physiological attribute value 245 may be “medium” on an ordinal scale of “low”, “medium”, and “high”. Since the physiological attribute value 245 need not be a discrete categorical value, the system 100 may be able to predict intermediate physiological states. For instance, the system 100 may be able to predict the physiological attribute value 245 of +0.5 as moderately happy, wherein +1 is predicted as extremely happy and -1 is predicted as extremely sad. In an embodiment, the aggregation engine 240 may include a machine learning model, wherein the machine learning model may be a deep learning model that outputs a set of data packets indicative of at least one numerical or ordinal value associated with the detected at least one physiological attribute value on receiving the data packets associated with the categorical labels from the prediction engine 230.
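One way, among many, to realise the bounded numeric output described above is to regress a single value through a tanh activation so that it lies in [-1, +1], and optionally to bucket it onto an ordinal scale; the model and thresholds below are illustrative assumptions only.

import torch
import torch.nn as nn

class AggregationModel(nn.Module):
    """Illustrative third deep learning model of the aggregation engine 240 (architecture assumed)."""

    def __init__(self, num_state_labels: int = 14):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_state_labels, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, state_probs: torch.Tensor) -> torch.Tensor:
        # state_probs: label probabilities 235 received from the prediction engine 230
        return torch.tanh(self.net(state_probs))             # physiological attribute value 245 in [-1, +1]

def to_ordinal(value: float) -> str:
    """Map the numeric attribute value 245 onto an assumed ordinal scale of low/medium/high."""
    if value < -0.33:
        return "low"
    if value < 0.33:
        return "medium"
    return "high"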
[00064] In an embodiment, upon extracting the physiological attribute value 245, the presentation engine 250 may transmit a set of processed data packets associated with at least one message generated by the system 100 based on the extracted physiological attribute value 245. In the exemplary embodiment shown in Fig. 3, the set of processed data packets is shown as an output stream 255. In an embodiment, the set of processed data packets may be combined with the at least one media stream 215 deconstructed from the set of multimedia data packets. For instance, in one embodiment, the presentation engine 250 may transmit the set of processed data packets wherein the physiological attribute value 245 may be overlaid, along with the at least one message, on the video component associated with the set of multimedia data packets. For instance, if an interlocutor’s 120 physiological state is predicted as “drowsy”, the at least one message may include “Alert!!! Person_1 you are drowsy, please wash your face” or “Alert!!! Person_18 please look towards the screen”. In another embodiment, if the physiological state of an interlocutor 120 who is speaking is predicted as “harsh”, the at least one message may include “Alert!!! Person_3 your tone is too harsh while addressing Person_9, please quiet down”. In yet another embodiment, when the interlocutor 120 is engaging and maintaining relevance with the audience interlocutors 120, the at least one message may include “Person_12 your usage of words ‘THANKS’ and ‘EXCELLENT’ has improved overall engagement, keep it up!”. In an embodiment, the at least one message may be generated by the presentation engine 250. In one embodiment, the at least one message may be generated by retrieving human-annotated messages corresponding to the extracted physiological attribute value 245 from the database 105. The database 105 may store the physiological attribute value 245 and the corresponding at least one message in key-value pairs. In another embodiment, the at least one message may be generated using a natural language generation (NLG) process that takes the physiological attribute value 245 as input and provides a natural language output.
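As a hedged sketch of the key-value retrieval path described above (not the NLG path), the snippet below keys canned messages on a state label and an ordinal attribute value; the keys and message strings are placeholders standing in for the human-annotated content stored in the database 105.

# Hypothetical key-value store mirroring the database 105 lookup described above.
MESSAGE_STORE = {
    ("drowsy", "high"): "Alert!!! {name} you are drowsy, please wash your face",
    ("harsh", "high"): "Alert!!! {name} your tone is too harsh, please quiet down",
    ("engaged", "high"): "{name} your engagement has improved, keep it up!",
}

def generate_message(state_label: str, ordinal_value: str, interlocutor_name: str) -> str:
    """Presentation engine 250: retrieve a stored message for the extracted attribute value 245."""
    template = MESSAGE_STORE.get((state_label, ordinal_value))
    return template.format(name=interlocutor_name) if template else ""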
[00065] Furthermore, in one embodiment, the processing engine(s) 200 may employ at least one machine learning model. In such an embodiment, the syntactic engine 220 may include a first deep learning model that outputs a first set of data packets indicative of at least one categorical label or at least one numeric value associated with the detected at least one syntactical object 225 on receiving data packets associated with the at least one media stream 215. In such an embodiment, the prediction engine 230 may include a second deep learning model that outputs a second set of data packets indicative of at least one categorical label associated with the detected at least one physiological state label 235 on receiving the first set of data packets. In such an embodiment, the aggregation engine 240 may also include a third deep learning model that outputs a third set of data packets indicative of at least one numerical or ordinal value associated with the detected at least one physiological attribute value 245 on receiving the second set of data packets. The first, second and third deep learning models may be optimized and pretrained on an annotated dataset of a diverse set of human and non-human interlocutors. In an embodiment, the annotated dataset may include human-annotated data entries with features associated with the physiological states of interlocutors 120.
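Tying the three assumed models together, a single end-to-end inference pass over one frame could look like the following; preprocessing of the media stream and the training loop over the annotated dataset are omitted and remain implementation choices.

import torch

def analyse_frame(frame: torch.Tensor,
                  syntactic: SyntacticModel,
                  prediction: PredictionModel,
                  aggregation: AggregationModel) -> float:
    """Chain the three illustrative models: frame -> syntactical objects 225 -> labels 235 -> value 245."""
    with torch.no_grad():
        class_probs, values = syntactic(frame)               # first set of data packets
        features = torch.cat([class_probs, values], dim=-1)  # flattened input for the prediction engine 230
        state_probs = prediction(features)                   # second set of data packets
        attribute_value = aggregation(state_probs)           # third set of data packets
    return float(attribute_value.squeeze())

# Example: analyse_frame(torch.rand(1, 3, 224, 224), SyntacticModel(), PredictionModel(), AggregationModel())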
[00066] Fig. 4 illustrates an exemplary flow diagram representation of a method for predicting physiological state of interlocutors, according to an embodiment of the present invention herein.
[00067] At block 402, the method 400 includes processing, at a processor 102 associated with a system 100, a set of multimedia data packets received from at least one interlocutor 120, said set of multimedia data packets being deconstructed into at least one media stream 215.
[00068] At block 404, the method 400 includes processing, at the processor 102, said at least one media stream 215 to detect at least one syntactical object 225 capable of indicating physiological state of the at least one interlocutor 120.
[00069] At block 406, the method 400 includes identifying, at the processor 102, at least one physiological state label 235 for each of the detected at least one syntactical object 225; and
[00070] At block 408, the method 400 includes extracting, at the processor 102, a physiological attribute value 245 for the at least one interlocutor based on the identified at least one physiological state label 235.
[00071] Additionally, the method 400 may also include transmitting, by the processor, a set of processed data packets associated with at least one message generated by the system 100 based on the extracted physiological attribute value 245 to the interlocutor 120.
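For completeness, a hypothetical driver combining the earlier sketches to walk through blocks 402 to 408 and the optional transmitting step might look as follows; preprocess and transmit_to_interlocutor are assumed helpers, not elements of the disclosure.

import torch

def preprocess(encoded_image: bytes) -> torch.Tensor:
    """Placeholder image decoder: return a (1, 3, H, W) tensor for the syntactic engine 220."""
    raise NotImplementedError

def transmit_to_interlocutor(message: str) -> None:
    """Placeholder transmission step; the actual channel is an implementation detail."""
    print(message)

def method_400(input_stream: bytes) -> float:
    media = deconstruct(input_stream)                        # block 402: process the multimedia data packets
    frame = preprocess(media.image.frames[0])                # prepare one still image from the image stream 215A
    value = analyse_frame(frame, SyntacticModel(),           # blocks 404-408: detect, label, extract value 245
                          PredictionModel(), AggregationModel())
    transmit_to_interlocutor(generate_message("drowsy", to_ordinal(value), "Person_1"))
    return value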
[00072] The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks may be combined or otherwise performed in any order to implement the method 400 or an alternate method. Additionally, individual blocks may be deleted from the method 400.
[00073] Hence, the invention disclosed herein solves the need for a system for analyzing the physiological state of the interlocutor by detecting and combining physical and locutory expressions of the interlocutor. Furthermore, the invention disclosed herein solves the need for a system that provides recommendations to the interlocutor to adjust their physical and locutory expressions based on the physiological state of the interlocutors in communication.
[00074] While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.

ADVANTAGES OF THE PRESENT DISCLOSURE
[00075] The present disclosure provides for a system for analyzing physiological state of interlocutors.
[00076] The present disclosure provides for a system for analyzing physiological state of interlocutors that uses contextual information of the interlocutors such as physical and locutory expressions.
[00077] The present disclosure provides for a system that provides a physiological attribute value that is a numeric or ordinal value.
[00078] The present disclosure provides for a system for analyzing physiological state of interlocutors that also provides recommendations to adjust the physiological state of the interlocutors.
[00079] The present disclosure provides for a system that can analyze physiological states of a diverse set of human and non-human interlocutors.
List of References:
100 - System
110 - Electronic Device
102 - Processor
103 - Memory
104 - Interface(s)
105 - Database
120 - Interlocutors
125 - Computing devices
130 - Communication network
200 - Processing engine
210 – Data collection engine
212 - Input stream
215 - Media stream
215A - Image stream
215B - Audio stream
218 - Data stream
220 - Syntactic engine
225 - Syntactical Objects
225A - Faces
225B - Body Parts
225C - Postures
225D - Locutory acts
225E - Tones
230 - Prediction Engine
235 - Physiological state Label
235A- Facial state label
235B - Postural state label
235C - Locutory state label
235D - Tonal state label
240 - Aggregation engine
245 - Physiological attribute value
250 - Presentation Engine
255 - Output Stream
400- Method
1000 - Architecture 
Claims:
1. A system (100) for analysing physiological state of interlocutors (120), comprising:
a processor (102); a memory (103) coupled to the processor, wherein the memory (103) storing processor-executable instructions, which on execution, causes the processor (102) to:
process a set of multimedia data packets associated with at least one interlocutor (120), said set of multimedia data packets being deconstructed into at least one media stream (215);
process said at least one media stream (215) to detect at least one syntactical object (225) indicative of physiological state of the at least one interlocutor (120);
identify at least one physiological state label (235) for each of the detected at least one syntactical object (225); and
based on the identified at least one physiological state label (235), extract a physiological attribute value (245) for the at least one interlocutor (120).
2. The system (100) as claimed in claim 1, wherein upon extracting the physiological attribute value (245), the processor (102) is configured to:
transmit a set of processed data packets associated with at least one message generated by the system (100) based on the extracted physiological attribute value (245).
3. The system (100) as claimed in claim 2, wherein the set of processed data packets is combined with the at least one media stream (215) deconstructed from the set of multimedia data packets.
4. The system (100) as claimed in claim 1, wherein the set of multimedia data packets is a set of video data packets, a set of audio data packets, a set of textual data packets, or a set of data packets indicative of virtual representations of the at least one interlocutor in computer-generated environments, or a combination thereof.
5. The system (100) as claimed in claim 1, wherein the at least one media stream (215) is an image stream (215A) indicative of a set of at least one still image, an audio stream (215B) indicative of a recording of sounds, or a data stream (218) indicative of time metadata of the image stream (215A) and the audio stream (215B).
6. The system (100) as claimed in claim 1, wherein the at least one syntactical object (225) comprises at least one object or at least one part of an object in the at least one media stream (215) that is capable of signifying physiological states.
7. The system (100) as claimed in claim 6, wherein the at least one syntactical object (225) comprises physical expressions indicative of eye movements, faces, gestures, behaviours, body parts, or postures from the at least one media stream; and locutory expressions indicative of locutionary acts and tones from the at least one media stream.
8. The system (100) as claimed in claim 1, wherein the system (100) further comprises a syntactic engine (220) with a first deep learning model that outputs a first set of data packets indicative of at least one categorical label or at least one numeric value associated with the detected at least one syntactical object (225) on receiving data packets associated with the at least one media stream (215);
a prediction engine (230) with a second deep learning model that outputs a second set of data packets indicative of at least one categorical label associated with the detected at least one physiological state label (235) on receiving the first set of data packets; and
an aggregation engine (240) with a third deep learning model that outputs a third set of data packets indicative of at least one numerical or ordinal value associated with the detected at least one physiological attribute value (245) on receiving the second set of data packets,
wherein the first, second and third deep learning models are optimized and pretrained on an annotated dataset of a diverse set of human and non-human interlocutors (120).
9. A method (400) for analysing physiological state of interlocutors (120), said method (400) comprising the steps of:
processing, at a processor (102) associated with a system (100), a set of multimedia data packets associated with at least one interlocutor (120), said set of multimedia data packets being deconstructed into at least one media stream (215);
processing, at the processor (102), said at least one media stream (215) to detect at least one syntactical object (225) capable of indicating physiological state of the at least one interlocutor (120);
identifying, at the processor (102), at least one physiological state label (235) for each of the detected at least one syntactical object (225); and
based on the identified at least one physiological state label (235), at the processor (102), extracting a physiological attribute value (245) for the at least one interlocutor (120).
10. The method (400) as claimed in claim 9, wherein the method (400) further comprises the step of:
transmitting, by the processor (102), a set of processed data packets with at least one message generated by the system (100) based on the extracted physiological attribute value (245) to the interlocutor (120).

Documents

Application Documents

# Name Date
1 202241058916-STATEMENT OF UNDERTAKING (FORM 3) [14-10-2022(online)].pdf 2022-10-14
2 202241058916-POWER OF AUTHORITY [14-10-2022(online)].pdf 2022-10-14
3 202241058916-FORM 1 [14-10-2022(online)].pdf 2022-10-14
4 202241058916-DRAWINGS [14-10-2022(online)].pdf 2022-10-14
5 202241058916-DECLARATION OF INVENTORSHIP (FORM 5) [14-10-2022(online)].pdf 2022-10-14
6 202241058916-COMPLETE SPECIFICATION [14-10-2022(online)].pdf 2022-10-14
7 202241058916-Proof of Right [20-10-2022(online)].pdf 2022-10-20
8 202241058916-ENDORSEMENT BY INVENTORS [31-10-2022(online)].pdf 2022-10-31