Abstract: Title: An IoT-Based Emotion Detection and Object Identification System and Method Thereof. The present disclosure proposes an IoT-based emotion detection and object identification system and method for detecting the symptoms of attention disorder, emotional dysregulation, mental stress, and autism spectrum disorder (ASD). The IoT-based system (100) comprises a computing device (102) having a controller (104) and a memory (106) for storing and executing a plurality of instructions and a plurality of modules (108) by the controller (104). The plurality of modules (108) comprises a camera interface module (110), an object detection module (112), an emotion detection module (114), an object recognition module (116), an object analysis module (118), an instructional content selection module (120), and a presentation module (122). The proposed IoT-based system (100) customizes the instructional videos based on individual needs, which is a crucial advantage; adjusting the pacing, level of detail, and language used in the videos ensures optimal learning for each child.
Description:
Field of the invention:
[0001] The present disclosure generally relates to the technical field of systems and methods for detecting abnormal mental and emotional health and, specifically, relates to an IoT-based emotion detection and identification system and method for detecting the symptoms of attention disorder, emotional dysregulation, mental stress, and autism spectrum disorder (ASD).
Background of the invention:
[0002] Emotions are the instinctive state of mind of an individual that are catalyzed and brought on by neuropsychological changes. These changes can be triggered by a circumstance or in relation to others. When such changes take place, they almost instantaneously impact the physiological nature of a being. This complex amalgamation of consciousness, bodily sensations, and behaviour is often critical to one's wellbeing and holistic development. Emotion dysregulation is a difficulty in managing one's emotions in a healthy way. Emotion dysregulation is expressed in the form of anger, frustration, and anxiety, which are all normal emotions but can become problematic when they are experienced too intensely or for too long. People with emotion dysregulation also have difficulty in identifying and expressing their emotions, or they resort to unhealthy coping mechanisms such as substance abuse or self-harm.
[0003] There are a number of possible causes of emotion dysregulation, including genetics, brain chemistry, and environmental factors such as childhood trauma. There is no one-size-fits-all treatment for emotion dysregulation, but there are a number of therapies that can be helpful, such as cognitive-behavioural therapy (CBT), dialectical behaviour therapy (DBT), and mindfulness-based therapies.
[0004] The brain undergoes a period of rapid development in the first few years of life. This is when neural connections are being formed at an astonishing rate. These connections are essential for learning, memory, and emotional regulation. Negative experiences, behaviours, or environmental factors can disrupt this development and lead to mental health problems. For example, children who experience abuse or neglect are more likely to develop anxiety, depression, and other mental health problems. Even relatively minor negative experiences, such as being teased or bullied, can have a lasting impact on a user's mental health. This is because these experiences can lead to feelings of insecurity, worthlessness, and fear.
[0005] Early experiences are not the only factor that determines a user's mental health. Genetics, family history, and other factors also play a role. However, early childhood experiences can have a profound impact on a user's development and mental health. Disorders such as autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD), anxiety, depression, obsessive-compulsive disorder (OCD), oppositional defiant disorder (ODD), post-traumatic stress disorder (PTSD), and chronic stress can all alter a person's physiological baseline. As a result, it is critical to monitor and address even minor changes in these physiological bio-variables in order to assess a person's emotional and physical health.
[0006] An attempt was made to diagnose schizophrenia using pre-pulse inhibition (PPI). PPI is a neurological phenomenon in which a weaker pre-stimulus (prepulse) inhibits the reaction of an organism to a subsequent strong reflex-eliciting stimulus (pulse), often measured using the startle reflex. The stimuli are usually acoustic. The reduction, or lack of reduction, of the amplitude of the startle reflects the ability of the nervous system to temporarily adapt to a strong sensory stimulus when a preceding weaker signal is given to warn the organism. In the test, known in the art as the "San Diego" test, examinees were given an acoustic stimulus and pre-stimulus, and the startle response was measured using electromyography (EMG). The test results were not conclusive and the method was abandoned.
[0007] Therefore, there is a need for an IoT-based system for detecting emotion and identifying objects. There is also a need for an IoT-based system for detecting the symptoms of attention disorder, emotional dysregulation, and autism spectrum disorder (ASD). There is also a need for an IoT-based system that utilizes a database for emotion analysis using physiological and audio-visual signals. There is also a need for an IoT-based system that combines real-time emotion detection, considering both facial expressions and stereotypic behaviours, with an interactive object identification system using voice feedback. Further, there is also a need for an IoT-based system that provides rapid evaluation and a personalized intervention regimen for children, adults, and elderly persons.
Objectives of the invention:
[0008] The primary objective of the invention is to provide an IoT-based emotion detection and identification system and method for detecting the symptoms of attention disorder, emotional dysregulation, mental stress, and autism spectrum disorder (ASD).
[0009] Another objective of the invention is to provide an IoT-based system that aids users to understand their mood and personality traits.
[0010] Another objective of the invention is to provide an IoT-based system that, when integrated with instructional videos triggered by object detection, capitalizes on the visual learning style prevalent in many children with autism and enhances information retention and understanding.
[0011] Another objective of the invention is to provide an IoT-based system that detects emotions for security purposes, easy communication, and automated identification.
[0012] Another objective of the invention is to provide an IoT-based system that customizes the instructional videos based on individual needs, which is a crucial advantage; adjusting the pacing, level of detail, and language used in the videos ensures optimal learning for each child.
[0013] Another objective of the invention is to provide an IoT-based system that reveals valuable information on frequently encountered objects and those posing difficulty, identifying user patterns, tracking progress, and informing future software improvements.
[0014] Yet another objective of the invention is to provide an IoT-based system that collects physiological and psychological signals from users to recognise their emotions.
[0015] A further objective of the invention is to provide an IoT-based system that includes a set of modules to measure relevant physiological signals associated with basic bodily drives via body sensors.
Summary of the invention:
[0016] The present disclosure proposes an IoT-based emotion detection and object identification system and method thereof. The following presents a simplified summary in order to provide a basic understanding of some aspects of the claimed subject matter. This summary is not an extensive overview. It is not intended to identify key/critical elements or to delineate the scope of the claimed subject matter. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
[0017] In order to overcome the above deficiencies of the prior art, the present disclosure solves the technical problem by providing an IoT-based emotion detection and identification system and method for detecting the symptoms of attention disorder, emotional dysregulation, mental stress, and autism spectrum disorder (ASD).
[0018] According to an aspect, the invention provides an IoT-based system for detecting emotion and identifying objects. In one embodiment, the IoT-based system comprises a computing device having a controller and a memory for storing and executing a plurality of instructions and a plurality of modules by the controller. The plurality of modules comprises a camera interface module, an object detection module, an emotion detection module, an object recognition module, an object analysis module, an instructional content selection module, and a presentation module.
[0019] In one embodiment, the camera interface module is configured to capture video frames from the computing device. In one embodiment herein, the object detection module is configured to identify objects in the video frames using a computer vision algorithm. In one embodiment, the emotion detection module is configured to detect the emotional state of a user based on facial expressions and stimming behaviour captured by the camera interface module. In one embodiment herein, the object recognition module is configured to classify the detected objects. The emotion detection module utilizes a machine learning framework to analyze the facial expressions and stimming behaviour.
[0020] The emotion detection module is configured to allocate a weight to the analysis of facial expressions and to the detected stimming behaviour, and to combine the weighted analyses to determine the emotional state. The emotion detection module comprises a stimming behaviour detection algorithm configured to identify repetitive movements associated with the user's emotional state. In one embodiment herein, the object analysis module is configured to analyse the relevance and significance of the classified objects.
[0021] In one embodiment, the instructional content selection module is configured to select instructional content based on the analysis of the classified objects and the detected emotional state of the user. The instructional content selection module is configured to select instructional content based on user preferences. The instructional content selection module includes videos, tutorials, or interactive guides. The feedback mechanism is configured to receive feedback from the user on the instructional content.
[0022] In one embodiment, the presentation module is configured to present the selected instructional content to the user through the computing device. The IoT-based system is configured to detect users' emotional response and attention to digital content in real time by continuously monitoring the user's biofeedback while they are experiencing or engaging with any digital content by viewing or listening to the digital content.
[0023] According to another aspect, the invention provides a method for operating the IoT-based system for detecting emotion and identifying objects. At one step, the camera interface module captures video frames from a computing device. At another step, the object detection module identifies the objects in the video frames using the computer vision algorithm. At another step, the emotion detection module detects the emotional state of a user based on facial expressions and stimming behaviour captured by the camera interface module.
[0024] At another step, the object analysis module analyses the relevance and significance of the classified objects upon classifying the detected objects via an object recognition module. At another step, the instructional content selection module selects the instructional content based on the analysis of the classified objects and the detected emotional state of the user.
[0025] Further, at another step, the presentation module presents the selected instructional content to the user through the computing device. The detection of the emotional state of the user further comprises utilizing a machine learning framework to analyse the facial expressions and stimming behaviour.
[0026] Further, objects and advantages of the present invention will be apparent from a study of the following portion of the specification, the claims, and the attached drawings.
Detailed description of drawings:
[0027] The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate an embodiment of the invention, and, together with the description, explain the principles of the invention.
[0028] FIG. 1 illustrates a block diagram of an IoT-based system for detecting emotion and identifying objects for children, in accordance with an exemplary embodiment of the invention.
[0029] FIG. 2A illustrates a schematic view of emotion detection in children with autism, in accordance with an exemplary embodiment of the invention.
[0030] FIG. 2B illustrates a schematic view of the emotional state classification of facial expressions, in accordance with an exemplary embodiment of the invention.
[0031] FIG. 2C illustrates a schematic view of a camera unit capturing real-time objects, in accordance with an exemplary embodiment of the invention.
[0032] FIG. 3 illustrates a schematic view of an architecture that accepts input images, in accordance with an exemplary embodiment of the invention.
[0033] FIG. 4A illustrates a schematic view of the camera unit integrated with software to capture the pixels of an image, in accordance with an exemplary embodiment of the invention.
[0034] FIG. 4B illustrates a schematic view of the software conditional probability map, in accordance with an exemplary embodiment of the invention.
[0035] FIG. 4C illustrates a schematic view of the software output feature map, in accordance with an exemplary embodiment of the invention.
[0036] FIG. 4D illustrates a schematic view of the software test result, in accordance with an exemplary embodiment of the invention.
[0037] FIG. 5 illustrates a flowchart of a method for operating the IoT-based system for detecting emotion and identifying objects, in accordance with an exemplary embodiment of the invention.
Detailed invention disclosure:
[0038] Various embodiments of the present invention will be described in reference to the accompanying drawings. Wherever possible, same or similar reference numerals are used in the drawings and the description to refer to the same or like parts or steps.
[0039] The present disclosure has been made with a view towards solving the problem with the prior art described above, and it is an object of the present invention to provide an IoT-based emotion detection and identification system and method for detecting the symptoms of attention disorder, emotional dysregulation, mental stress, and autism spectrum disorder (ASD).
[0040] According to an exemplary embodiment of the invention, FIG. 1 refers to a block diagram of the IoT-based system for detecting emotion and identifying objects. In one embodiment herein, the IoT-based system comprises a computing device having a controller and a memory for storing and executing a plurality of instructions and a plurality of modules by the controller. The plurality of modules comprises a camera interface module, an object detection module, an emotion detection module, an object recognition module, an object analysis module, an instructional content selection module, and a presentation module. In one embodiment herein, the camera interface module is configured to capture video frames from the computing device. In one embodiment herein, the object detection module is configured to identify objects in the video frames using a computer vision algorithm.
[0041] In one embodiment herein, the emotion detection module is configured to detect the emotional state of a user based on facial expressions and stimming behaviour captured by the camera interface module. In one embodiment herein, the object recognition module is configured to classify the detected objects. The emotion detection module utilizes a machine learning framework to analyze the facial expressions and stimming behaviour. The emotion detection module is configured to allocate a weight to the analysis of facial expressions and to the detected stimming behaviour, and to combine the weighted analyses to determine the emotional state. The emotion detection module comprises a stimming behaviour detection algorithm configured to identify repetitive movements associated with the user's emotional state. In one embodiment herein, the object analysis module is configured to analyse the relevance and significance of the classified objects.
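By way of a non-limiting illustration of the weighted combination described above, the following Python sketch fuses per-emotion scores from facial-expression analysis and stimming-behaviour detection; the emotion labels, weights, and scoring inputs are illustrative assumptions and do not represent the module's exact implementation.

    import numpy as np

    # Illustrative emotion labels; the actual label set used by the module is not specified here.
    EMOTIONS = ["happy", "sad", "angry", "anxious", "neutral"]

    def fuse_emotion_scores(face_scores, stimming_scores, face_weight=0.7, stim_weight=0.3):
        """Combine per-emotion scores from facial-expression analysis and
        stimming-behaviour detection into a single emotional-state estimate."""
        face = np.asarray(face_scores, dtype=float)
        stim = np.asarray(stimming_scores, dtype=float)
        combined = face_weight * face + stim_weight * stim
        combined /= combined.sum()                      # renormalize to a probability distribution
        return EMOTIONS[int(np.argmax(combined))], combined

    # Example: facial analysis leans "neutral", the stimming detector suggests "anxious".
    state, scores = fuse_emotion_scores([0.1, 0.1, 0.1, 0.2, 0.5],
                                        [0.0, 0.1, 0.1, 0.7, 0.1])
    print(state, scores.round(2))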
[0042] In one embodiment herein, the IoT-based system is configured to communicate with an application server via a network for transferring the real-time data to the computing device. In one embodiment herein, the instructional content selection module is configured to select instructional content based on the analysis of the classified objects and the detected emotional state of the user. The instructional content selection module is configured to select instructional content based on user preferences. The instructional content selection module includes videos, tutorials, or interactive guides. The feedback mechanism is configured to receive feedback from the user on the instructional content.
[0043] In one embodiment herein, the presentation module is configured to present the selected instructional content to the user through the computing device. The IoT-based system is configured to detect users' emotional response and attention to digital content in real time by continuously monitoring the user's biofeedback while they are experiencing or engaging with any digital content by viewing or listening to the digital content. In one example embodiment herein, the IoT-based system is integrated with the computing device to detect objects and then provide instructional videos on how to use those objects, which can be incredibly helpful for children with autism. The benefits include visual learning, real-time assistance, customization, life skills development, parent involvement, and data collection. Visual learning matters because many children with autism are visual learners; by providing instructional videos, the system caters to their preferred learning style, making it easier for them to understand and retain information.
[0044] Real-time assistance offers immediate support whenever a child encounters an object they are unfamiliar with or unsure how to use; this real-time support helps them navigate their environment more independently. Customization tailors the instructional videos to suit the individual needs and preferences of each child; for example, the pacing, level of detail, or language used in the videos can be adjusted to match their learning abilities. Life skills development teaches children how to use everyday objects, helping them develop important life skills and increase their independence in various settings, such as at home, at school, or in the community. Parent involvement engages parents or caregivers by providing them with insights into their child's interactions with objects and suggestions for supporting their learning and development outside of the software.
[0045] According to another exemplary embodiment of the invention, FIG. 2A refers to a schematic view 202 of emotion detection in children with autism. According to another exemplary embodiment of the invention, FIG. 2B refers to a schematic view 204 of the emotional state classification of facial expressions. In one embodiment herein, emotion detection using Python typically involves leveraging computer vision libraries such as OpenCV and machine learning frameworks such as TensorFlow or PyTorch. At a high level, implementing emotion detection in Python starts with a dataset: a collection of facial images labelled with different emotions is obtained. Common datasets used for emotion detection include the FER2013 dataset and the Extended Cohn-Kanade (CK+) dataset.
[0046] Pre-processing prepares the dataset by resizing the images to a consistent size, converting them to grayscale, and normalizing pixel values to improve model training. For model training, a machine learning model architecture is chosen for emotion detection; common approaches include Convolutional Neural Networks (CNNs) or pre-trained models such as VGG, ResNet, or MobileNet. The model is trained on the labelled dataset to learn to recognize facial expressions.
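Purely as an illustrative sketch of the pre-processing described above, the following Python snippet resizes, grayscales, and normalizes a face image; the 48x48 target size mirrors the FER2013 image dimensions and is an assumption, not a requirement of the system.

    import cv2
    import numpy as np

    def preprocess_face(image_path, size=(48, 48)):
        """Resize, convert to grayscale, and normalize a face image for training."""
        image = cv2.imread(image_path)                      # BGR image from disk
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)      # convert to grayscale
        resized = cv2.resize(gray, size)                    # consistent input size
        normalized = resized.astype(np.float32) / 255.0     # scale pixel values to [0, 1]
        return normalized[..., np.newaxis]                  # add channel axis for a CNN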
[0047] Model evaluation assesses the trained model on a separate validation dataset in terms of accuracy, precision, recall, and F1-score. Once the model achieves satisfactory performance, it is deployed in the application; frameworks such as Flask or Django can be used to create a web application, or the model can be integrated into other Python projects. Real-time emotion detection captures frames from a webcam or video stream using OpenCV, and each frame is processed by detecting faces and applying the trained model to predict the emotion of each face.
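The following non-limiting sketch illustrates such a real-time loop, assuming a previously trained Keras model saved as emotion_model.h5 and OpenCV's bundled Haar cascade for face detection; the model file name and label order are illustrative assumptions.

    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model

    EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]  # assumed order

    model = load_model("emotion_model.h5")                  # assumed pre-trained model file
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    cap = cv2.VideoCapture(0)                               # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
            face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
            probs = model.predict(face.reshape(1, 48, 48, 1), verbose=0)[0]
            label = EMOTIONS[int(np.argmax(probs))]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, label, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.imshow("Emotion detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):               # press 'q' to quit
            break
    cap.release()
    cv2.destroyAllWindows()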
[0048] In one embodiment herein, for data collection, the camera allows the software to collect data on the objects that children interact with most frequently or have difficulty using. This data can be valuable for identifying patterns, tracking progress, and informing future updates or enhancements to the software. The system thus has the potential to empower children with autism by providing them with personalized, real-time support for navigating their environment and learning how to use everyday objects effectively. It is an exciting and impactful use of technology in the field of autism support.
[0049] According to another exemplary embodiment of the invention, FIG. 2C refers to a schematic view 206 of a camera unit capturing real-time objects. The software initializes the camera interface on the user's device, accessing the live camera feed. It configures camera settings such as resolution, frame rate, and focus to optimize object detection performance. Upon activation, the software establishes a continuous stream of image frames captured by the device's camera. The camera feed is processed in real time, providing a constant flow of visual information to the object detection module. Ensuring smooth operation, the software manages resources efficiently, maintaining a stable connection to the device's camera interface. Parameters such as exposure, white balance, and autofocus are adjusted dynamically to adapt to varying lighting conditions.
[0050] Frame buffering techniques may be employed to enhance processing speed and minimize latency in capturing image data. The software verifies camera permissions and system compatibility before initializing the camera interface. It monitors camera health indicators, such as temperature and battery level, to prevent hardware-related issues during operation. Error handling mechanisms are implemented to gracefully manage interruptions or failures in the camera acquisition process.
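A non-limiting sketch of such camera initialization and configuration using OpenCV is given below; the chosen resolution and frame rate are illustrative defaults rather than values mandated by the system.

    import cv2

    def open_camera(index=0, width=1280, height=720, fps=30):
        """Initialize the device camera and apply basic capture settings."""
        cap = cv2.VideoCapture(index)
        if not cap.isOpened():
            raise RuntimeError("Camera could not be opened; check permissions and connections.")
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)    # requested resolution
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
        cap.set(cv2.CAP_PROP_FPS, fps)              # requested frame rate
        return cap

    cap = open_camera()
    ok, frame = cap.read()                          # one frame from the continuous stream
    if ok:
        print("Captured frame of shape:", frame.shape)
    cap.release()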
[0051] In one embodiment herein, object detection utilizes advanced computer vision algorithms, and the software meticulously analyses each frame of the camera feed. Employing techniques such as convolutional neural networks (CNNs), the software extracts features indicative of object presence. Each frame is divided into smaller regions, known as anchors, to facilitate localized object detection. The software applies a series of convolutional and pooling layers to extract hierarchical features from the image data.
[0052] Through iterative refinement, the object detection module progressively narrows down candidate regions containing potential objects. To optimize performance, parallel processing techniques may be employed to distribute computational workload across multiple cores. Advanced optimization strategies, including quantization and model pruning, may be employed to enhance efficiency on resource-constrained devices. The software incorporates adaptive learning mechanisms to refine its object detection capabilities over time. Real-time feedback loops enable the software to adjust detection parameters dynamically based on environmental conditions. Comprehensive logging and diagnostics mechanisms are implemented to monitor the performance and accuracy of the object detection process.
[0053] In one embodiment herein, the object detection module may utilize a state-of-the-art object detection algorithm, such as YOLO (You Only Look Once), to identify objects in real time from camera feeds. The chosen algorithm offers high accuracy, speed, and efficiency in detecting objects, ensuring reliable performance for users. The principles of object detection algorithms such as YOLO or SSD, including how they process images and identify objects, make them well suited to real-time object detection tasks, with strong performance in terms of accuracy, speed, and efficiency.
[0054] The object detection algorithms can be applied to assist children with autism in recognizing objects in their environment. The software harnesses cutting-edge object detection algorithms, such as YOLO (You Only Look Once), to empower real-time identification of objects from camera feeds. These algorithms are built on sophisticated principles that enable them to efficiently process images and accurately recognize objects within them. Object detection algorithms such as YOLO and SSD (Single Shot MultiBox Detector) operate by dividing the input image into a grid of cells. For each cell, the algorithm predicts bounding boxes and associated confidence scores for potential objects. These bounding boxes are refined through regression techniques to precisely localize the objects within the image. Additionally, the algorithms classify the content within each bounding box, assigning labels to detected objects based on learned features.
[0055] One key principle underlying these algorithms is their ability to perform detection and classification simultaneously. Unlike traditional methods that require separate stages for object localization and classification, YOLO and SSD streamline the process by directly predicting bounding boxes and class probabilities in a single forward pass through the neural network.
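As one possible, non-limiting way to run such a single-pass detector on a camera frame, the sketch below uses the third-party ultralytics package with a small pre-trained YOLO model; the package, model file, and attribute names are assumptions used for illustration and are not the system's exact implementation.

    import cv2
    from ultralytics import YOLO   # assumed third-party package providing pre-trained YOLO models

    model = YOLO("yolov8n.pt")     # small pre-trained model; downloaded on first use

    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()

    if ok:
        results = model(frame)                      # single forward pass over the frame
        for box in results[0].boxes:                # each detected object
            x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding-box corners
            label = model.names[int(box.cls[0])]    # predicted class label
            conf = float(box.conf[0])               # confidence score
            print(f"{label}: {conf:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")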
[0056] According to another exemplary embodiment of the invention, FIG. 3 refers to a schematic view 300 of the architecture that accepts input images. In one embodiment herein, the architecture begins by accepting input images, which are then resized to dimensions of 448x448 pixels. To ensure consistency in aspect ratios, padding is applied as needed. This preprocessing step is crucial for maintaining the integrity of the image information across various input sizes. Once resized, the images are passed through a Convolutional Neural Network (CNN) architecture tailored for object detection tasks. The CNN architecture consists of a series of 24 convolutional layers, strategically interspersed with 4 max-pooling layers to downsample the feature maps and capture hierarchical representations of the input images.
[0057] The convolutional layers are designed to extract intricate features at different spatial scales, enabling the model to discern objects of varying sizes and complexities. To manage computational complexity and reduce the number of channels without sacrificing important features, the model incorporates 1x1 convolutions followed by 3x3 convolutions. This technique helps streamline the network architecture while preserving crucial information necessary for accurate object detection.
[0058] At the output stage, the model generates predictions in a cuboidal format. This entails transforming the output from the final fully connected layer into a vector of dimensions (1, 1470), which is then reshaped into a 3D tensor of size (7, 7, 30). Each element within this tensor encapsulates specific information pertinent to object detection, such as class probabilities and bounding box coordinates. This structured output format facilitates precise localization and classification of objects within the input images.
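The reshaping of the flat network output into the cuboidal prediction tensor may be illustrated as follows; the values S = 7, B = 2, and C = 20 follow from the (7, 7, 30) dimensions stated above.

    import numpy as np

    S, B, C = 7, 2, 20                       # grid size, boxes per cell, classes: 7*7*(5*2+20) = 1470
    flat_output = np.random.rand(1, 1470)    # stand-in for the final fully connected layer's output

    predictions = flat_output.reshape(S, S, 5 * B + C)    # (7, 7, 30) cuboidal prediction tensor
    cell = predictions[3, 4]                               # predictions for one grid cell
    boxes = cell[:5 * B].reshape(B, 5)                     # each row: x, y, w, h, confidence
    class_probs = cell[5 * B:]                             # conditional class probabilities
    print(predictions.shape, boxes.shape, class_probs.shape)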
[0059] In one embodiment herein, the architecture predominantly utilizes the Leaky Rectified Linear Unit (Leaky ReLU) activation function, except for the final layer, where a linear activation function is employed. The Leaky ReLU function introduces a small slope for negative inputs, aiding in mitigating the vanishing gradient problem commonly encountered during training. To enhance model robustness and facilitate stable training, batch normalization is integrated into the architecture. Batch normalization normalizes the activations of each layer, reducing internal covariate shift and accelerating convergence during training. Additionally, dropout regularization is employed to prevent overfitting. Dropout randomly deactivates a fraction of neurons during training, forcing the network to learn more robust and generalizable features by reducing reliance on specific neurons. The architecture amalgamates a suite of sophisticated techniques to effectively process input images, extract pertinent features, and accurately predict the presence and attributes of objects within the images. By leveraging a combination of advanced architectural design choices and regularization strategies, the model demonstrates robust performance across diverse object detection scenarios.
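A non-limiting PyTorch sketch of one convolutional block combining these elements (1x1 channel reduction, 3x3 convolution, batch normalization, Leaky ReLU, and dropout) is given below; the channel sizes and dropout rate are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ConvBlock(nn.Module):
        """One YOLO-style block: 1x1 channel reduction, 3x3 convolution,
        batch normalization, Leaky ReLU activation, and dropout."""
        def __init__(self, in_ch=256, mid_ch=128, out_ch=256, dropout=0.1):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, kernel_size=1),            # 1x1 reduces channel count
                nn.BatchNorm2d(mid_ch),
                nn.LeakyReLU(0.1),                                   # small slope for negative inputs
                nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1), # 3x3 extracts spatial features
                nn.BatchNorm2d(out_ch),
                nn.LeakyReLU(0.1),
                nn.Dropout(dropout),                                 # regularization against overfitting
            )

        def forward(self, x):
            return self.block(x)

    x = torch.randn(1, 256, 56, 56)       # dummy feature map
    print(ConvBlock()(x).shape)           # torch.Size([1, 256, 56, 56])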
[0060] In one embodiment herein, the model is pre-trained on the ImageNet-1000 dataset, a large-scale dataset containing images categorized into 1000 classes. Training the model spanned over a week, during which it achieved a top-5 accuracy of 88% on the ImageNet 2012 validation set. This level of accuracy is comparable to GoogLeNet, the state-of-the-art model at the time of training. Fast YOLO, a variant of the YOLO (You Only Look Once) object detection model, is characterized by its streamlined architecture, featuring only 9 layers instead of the original 24 layers in YOLO. Additionally, Fast YOLO employs fewer filters, reducing computational complexity while maintaining performance. Despite its simplified architecture, Fast YOLO shares many parameters and design principles with the original YOLO model. Notably, both models utilize the sum-squared error loss function, which simplifies optimization. However, one limitation of this loss function is that it assigns equal weight to both the classification and localization tasks. In YOLO, the loss function is defined as follows:
\[
\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]
+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right]
+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2
+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2
+\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\]
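For clarity, a simplified, non-limiting sketch of this sum-squared error loss is given below, assuming a single predicted box per grid cell (B = 1) and targets encoded as [x, y, w, h, confidence, class probabilities]; it is an illustrative reduction of the full multi-box formulation rather than the exact training code.

    import torch

    def yolo_loss(pred, target, lambda_coord=5.0, lambda_noobj=0.5):
        # pred, target: tensors of shape (batch, S, S, 5 + C), one box per cell,
        # channels ordered as [x, y, w, h, confidence, class probabilities...].
        obj_mask = target[..., 4] > 0          # cells that contain an object
        noobj_mask = ~obj_mask

        # Localization loss (only for cells responsible for an object).
        xy_loss = ((pred[..., 0:2] - target[..., 0:2]) ** 2).sum(-1)
        wh_loss = ((pred[..., 2:4].clamp(min=0).sqrt()
                    - target[..., 2:4].sqrt()) ** 2).sum(-1)
        loc = lambda_coord * (xy_loss + wh_loss)[obj_mask].sum()

        # Confidence loss, split between object and no-object cells.
        conf_err = (pred[..., 4] - target[..., 4]) ** 2
        conf = conf_err[obj_mask].sum() + lambda_noobj * conf_err[noobj_mask].sum()

        # Classification loss for object cells only.
        cls = ((pred[..., 5:] - target[..., 5:]) ** 2).sum(-1)[obj_mask].sum()

        return loc + conf + cls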
[0061] According to another exemplary embodiment of the invention, FIG. 4A refers to a schematic view 402 of the camera unit integrated with the software to capture the pixels of an image. According to another exemplary embodiment of the invention, FIG. 4B refers to a schematic view 404 of the software conditional probability map. According to another exemplary embodiment of the invention, FIG. 4C refers to a schematic view 406 of the software output feature map. In one embodiment herein, detection is integrated with the architecture, which divides the image into a grid of size S×S. Each grid cell is responsible for detecting objects whose center lies within its boundaries. For each grid cell, the model predicts bounding boxes along with their confidence scores. These confidence scores indicate the likelihood that the predicted bounding box contains an object and how accurately it predicts the bounding box coordinates relative to the ground truth.
[0062] During testing, the final confidence score for each predicted bounding box is computed as follows: when no object exists in the grid cell, the confidence score is set to 0; if an object is present, the confidence score is determined by the Intersection over Union (IoU) between the ground truth and predicted bounding boxes. Each bounding box prediction consists of five parameters: (x, y) represents the coordinates of the center of the bounding box relative to the grid cell boundaries, w and h represent the width and height of the bounding box relative to the whole image, and the fifth parameter is the box confidence score. By combining conditional class probabilities and individual box confidence predictions at test time, the model generates a comprehensive assessment of object presence and localization accuracy across the image grid. This approach enables robust object detection by leveraging both spatial information and confidence scores to accurately identify objects within the input image. In addition to bounding boxes, each grid cell predicts conditional class probabilities Pr(Class i | Object), which together form the YOLO conditional probability map. In this architecture, class probabilities are conditioned on the presence of an object within each grid cell. Despite the potential for multiple bounding box predictions per grid cell, the model generates only one set of class probabilities. These predictions are encoded in a 3D tensor of size S×S×(5×B+C), where S is the grid size, B is the number of bounding boxes per grid cell, and C is the number of classes.
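A small helper consistent with this description, converting center-format boxes to corners and computing the IoU, may be sketched as follows; the (x, y, w, h) box format in image-relative units is an assumption made for illustration.

    def iou(box_a, box_b):
        """Intersection over Union of two boxes given as (x_center, y_center, w, h)."""
        ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
        ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
        bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
        bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

        # Overlapping region (zero if the boxes do not intersect).
        inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = inter_w * inter_h

        union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
        return inter / union if union > 0 else 0.0

    # Example: two partially overlapping boxes.
    print(round(iou((0.5, 0.5, 0.4, 0.4), (0.6, 0.6, 0.4, 0.4)), 3))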
[0063] During inference, the model multiplies the conditional class probabilities with the individual box confidence predictions. This multiplication operation combines spatial information with confidence scores, aiding in the accurate identification of objects within the input image. By employing this approach, the model can effectively evaluate the presence of objects and their associated class probabilities across the entire image grid. This enables robust object detection capabilities, facilitating the localization and classification of objects within complex visual scenes.
[0064] Each term in the class-specific confidence score equation, Pr(Class i | Object) × Pr(Object) × IoU(truth, pred) = Pr(Class i) × IoU(truth, pred), corresponds to a distinct aspect of the object detection or classification task. Pr(Class i | Object) represents the conditional probability of an object belonging to a specific class i, given that an object exists within the grid cell. Pr(Object) denotes the probability of an object being present within the grid cell. IoU(truth, pred), the Intersection over Union, measures the overlap between the predicted bounding box and the ground truth bounding box and quantifies the accuracy of the bounding box prediction relative to the ground truth. Pr(Class i) represents the likelihood of an object belonging to class i, irrespective of whether an object exists within the grid cell.
[0065] According to another exemplary embodiment of the invention, FIG. 4D refers to a schematic view 408 of the software test result. In one embodiment herein, the above confidence equation is a common formulation used in machine learning models designed for object detection tasks, where the objective is to accurately identify and localize objects within an image. In the YOLO testing phase, class-specific confidence scores are generated for each bounding box prediction. Subsequently, non-maximal suppression is applied to filter out redundant bounding box predictions, particularly when multiple boxes are predicted for the same object. Finally, the remaining predictions constitute the model's final output.
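A compact, non-limiting sketch of the non-maximal suppression step is given below; boxes are assumed to be in (x1, y1, x2, y2) corner format with per-box confidence scores, and the 0.5 overlap threshold is an illustrative choice.

    import numpy as np

    def non_max_suppression(boxes, scores, iou_threshold=0.5):
        """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much, repeat."""
        boxes = np.asarray(boxes, dtype=float)
        order = np.argsort(scores)[::-1]            # indices sorted by descending confidence
        keep = []
        while order.size > 0:
            best = order[0]
            keep.append(int(best))
            # IoU of the best box with every remaining box.
            x1 = np.maximum(boxes[best, 0], boxes[order[1:], 0])
            y1 = np.maximum(boxes[best, 1], boxes[order[1:], 1])
            x2 = np.minimum(boxes[best, 2], boxes[order[1:], 2])
            y2 = np.minimum(boxes[best, 3], boxes[order[1:], 3])
            inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
            area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
            area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                        (boxes[order[1:], 3] - boxes[order[1:], 1])
            overlap = inter / (area_best + area_rest - inter)
            order = order[1:][overlap <= iou_threshold]  # discard heavily overlapping boxes
        return keep

    boxes = [[10, 10, 60, 60], [12, 12, 62, 62], [100, 100, 150, 150]]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))        # -> [0, 2]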
[0066] YOLO distinguishes itself with its efficiency during the testing phase, leveraging a single CNN architecture to predict results. Additionally, YOLO treats the classification task as a regression problem, further streamlining the inference process. Regarding performance metrics, empirical evaluations reveal that the standard YOLO model achieves a mean average precision (mAP) of 63.4 percent when trained on the VOC dataset spanning 2007 and 2012. In contrast, the Fast YOLO variant, which is approximately three times faster in result generation, attains a slightly lower mAP of 52 percent. While this performance is inferior to the top-performing Fast R-CNN model, which achieves an mAP of 71 percent, and the R-CNN model, which achieves an mAP of 66 percent, it surpasses other real-time detectors such as DPMv5, which achieves an mAP of 33 percent, highlighting YOLO's competitive accuracy in real-time object detection scenarios.
[0067] In one embodiment herein, the object recognition is configured to identify and classify the detected objects. Leveraging a diverse set of image features, including shape, texture, and color, the software distinguishes between different object categories. Machine learning models, such as support vector machines (SVMs) or decision trees, are utilized for robust object recognition. The software compares the extracted features of detected objects against a pre-trained database of object descriptors. Utilizing pattern recognition techniques, the software assigns a confidence score to each object category based on the similarity of features. Threshold methods may be employed to filter out spurious detections or noise in the camera feed. Contextual information, such as scene semantics and object relationships, is incorporated to improve recognition accuracy. The software employs ensemble learning techniques to combine multiple recognition models for enhanced performance. In cases of ambiguous or uncertain detections, the software employs probabilistic inference methods to infer the most likely object class. Continuous refinement of recognition algorithms is facilitated through iterative model updates and data augmentation techniques.
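A non-limiting sketch of SVM-based recognition with confidence scores and a rejection threshold is given below; the synthetic feature vectors, category names, and threshold value are illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Synthetic shape/texture/colour feature vectors for three object categories.
    CATEGORIES = ["cup", "toothbrush", "spoon"]
    X_train = np.vstack([rng.normal(loc=i, scale=0.3, size=(30, 8)) for i in range(3)])
    y_train = np.repeat(np.arange(3), 30)

    # Probability-enabled SVM so each prediction carries a confidence score.
    clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)

    def recognize(features, threshold=0.6):
        """Return the category label, or None when the confidence falls below the threshold."""
        probs = clf.predict_proba(features.reshape(1, -1))[0]
        best = int(np.argmax(probs))
        if probs[best] < threshold:
            return None, float(probs[best])          # spurious or uncertain detection filtered out
        return CATEGORIES[best], float(probs[best])

    print(recognize(rng.normal(loc=1, scale=0.3, size=8)))   # likely recognized as "toothbrush"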
[0068] In one embodiment herein, once the objects are recognized, the software proceeds with contextual analysis to determine their relevance and significance. Incorporating environmental cues and user-specific preferences, the software evaluates the potential utility of detected objects. Semantic understanding techniques are employed to infer the functional role of objects within the user's context. The software assesses the user's current activities and objectives to prioritize the analysis of detected objects. Dynamic scene understanding enables the software to adapt its analysis based on evolving user interactions and environmental changes. Hierarchical reasoning mechanisms facilitate multi-level object analysis, considering both individual objects and their collective context. The software employs machine learning algorithms to infer user intentions and preferences from observed object interactions.
[0069] Statistical modelling techniques are utilized to quantify the salience and relevance of detected objects within the user's environment. Feedback mechanisms enable users to provide input on the accuracy and usefulness of object analysis results. Continuous refinement of object analysis algorithms is achieved through iterative learning from user feedback and system performance metrics. In one embodiment herein, instructional content selection is informed by the results of object analysis; the software selects appropriate instructional content to present to the user. An extensive repository of instructional materials, including videos, tutorials, and interactive guides, is curated based on user needs. Content selection algorithms consider factors such as object category, user proficiency level, and learning objectives. Machine learning techniques, such as collaborative filtering and content-based recommendation, personalize content selection based on user preferences.
[0070] The software employs adaptive learning strategies to dynamically adjust the difficulty and pacing of instructional content. Context-aware content filtering ensures that instructional materials are relevant and aligned with the user's immediate learning context. Metadata associated with instructional content, such as topic tags and difficulty ratings, is leveraged to facilitate content selection. Content diversity is maintained to cater to users with varying interests, learning styles, and cognitive abilities. The software incorporates user feedback mechanisms to iteratively refine content selection algorithms and improve recommendation accuracy. Real-time monitoring of user engagement metrics informs content selection decisions, ensuring a personalized and engaging learning experience.
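A minimal, non-limiting content-based selection sketch in the spirit of the above is given below, scoring instructional items by cosine similarity between metadata tag vectors and a user preference profile; the tag vocabulary, items, and profile are illustrative assumptions.

    import numpy as np

    TAGS = ["kitchen", "hygiene", "school", "beginner", "advanced"]   # assumed metadata vocabulary

    # Each instructional video is described by a tag vector over the vocabulary above.
    CONTENT = {
        "Using a cup (beginner)":          np.array([1, 0, 0, 1, 0], dtype=float),
        "Brushing teeth (beginner)":       np.array([0, 1, 0, 1, 0], dtype=float),
        "Packing a school bag (advanced)": np.array([0, 0, 1, 0, 1], dtype=float),
    }

    def recommend(user_profile, top_k=2):
        """Rank content by cosine similarity between its tag vector and the user profile."""
        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        ranked = sorted(CONTENT.items(), key=lambda kv: cosine(user_profile, kv[1]), reverse=True)
        return [title for title, _ in ranked[:top_k]]

    # Example profile: the detected object was a cup and the user is at beginner level.
    profile = np.array([1, 0, 0, 1, 0], dtype=float)
    print(recommend(profile))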
[0071] Real-time feedback mechanisms enable users to track their progress, receive performance metrics, and monitor their understanding of the presented material. Gamification elements, such as rewards, achievements, and progress bars, may be integrated to incentivize active participation and goal attainment. The software adapts its presentation style and content delivery based on user feedback and interaction patterns, ensuring a personalized and effective learning experience. Prompting and scaffolding techniques are employed to provide guidance and support to users as they navigate through instructional content. Continuous assessment mechanisms enable the software to dynamically adjust the difficulty level and pacing of instructional materials based on user performance and comprehension. User feedback is solicited at various stages of the learning process to gather insights, assess user satisfaction, and identify areas for improvement.
[0072] In one embodiment herein, the user interaction and engagement features are configured to promote user engagement and exploration by providing interactive features and opportunities for hands-on learning. Users are encouraged to interact with instructional content through gestures, touch interactions, voice commands, or other input modalities supported by the device. Immersive learning experiences, such as augmented reality (AR) or virtual reality (VR) simulations, may be integrated to enhance user engagement and comprehension. Personalization features allow users to customize their learning experience by selecting topics of interest, adjusting content preferences, and setting learning goals. Social learning features, such as collaborative activities, peer-to-peer interactions, and community forums, promote knowledge sharing and peer support among users. Progress tracking tools enable users to monitor their learning achievements, track completion of learning modules, and set milestones for skill development.
[0073] The software provides timely feedback and reinforcement to users, acknowledging their efforts, celebrating achievements, and addressing areas for improvement. Iterative learning experiences encourage users to revisit previously learned concepts, apply newly acquired skills in real-world scenarios, and reinforce learning through practice. User engagement analytics are collected and analysed to assess the effectiveness of instructional content, identify user preferences, and optimize the learning experience. Continuous iteration and improvement based on user feedback and performance metrics ensure that the software remains adaptive, responsive, and conducive to sustained user engagement.
[0074] By leveraging technology, we're providing a platform for these children to engage with their environment more effectively, helping them comprehend the world around them in a way that suits their learning needs. With our application, children can interact with various objects, receiving spoken descriptions of each item and how it's used. This interactive approach not only helps them identify objects but also provides them with context on their functionality, fostering a deeper understanding of the world and how objects relate to daily activities.
[0075] According to another exemplary embodiment of the invention, FIG. 5 refers to a flowchart 500 of a method for operating the IoT-based system for detecting emotion and identifying objects. At step 502, the camera interface module 110 captures video frames from the computing device 102. At step 504, the object detection module 112 identifies the objects in the video frames using the computer vision algorithm. At step 506, the emotion detection module 114 detects the emotional state of the user based on facial expressions and stimming behaviour captured by the camera interface module 110.
[0076] At step 508, the object analysis module 118 analyses the relevance and significance of the classified objects upon classifying the detected objects via the object recognition module 116. At step 510, the instructional content selection module 120 selects the instructional content based on the analysis of the classified objects and the detected emotional state of the user. At step 512, the presentation module 122 presents the selected instructional content to the user through the computing device 102. The detection of the emotional state of the user further comprises utilizing a machine learning framework to analyse the facial expressions and stimming behaviour.
[0077] Numerous advantages of the present disclosure may be apparent from the discussion above. In accordance with the present disclosure, the IoT-based system aids users in understanding their mood and personality traits. The proposed IoT-based system, integrated with instructional videos triggered by object detection, capitalizes on the visual learning style prevalent in many children with autism and enhances information retention and understanding. The proposed IoT-based system detects emotions for security purposes, easy communication, and automated identification.
[0078] The proposed IoT-based system customizes the instructional videos based on individual needs, which is a crucial advantage; adjusting the pacing, level of detail, and language used in the videos ensures optimal learning for each child. The proposed IoT-based system reveals valuable information on frequently encountered objects and those posing difficulty, identifying user patterns, tracking progress, and informing future software improvements. The proposed IoT-based system collects physiological and psychological signals from users to recognise their emotions. The proposed IoT-based system includes a set of modules to measure relevant physiological signals associated with basic bodily drives via body sensors.
[0079] It will readily be apparent that numerous modifications and alterations can be made to the processes described in the foregoing examples without departing from the principles underlying the invention, and all such modifications and alterations are intended to be embraced by this application.
Claims:
I / We Claim:
1. An IoT-based system (100) for detecting emotion and identifying objects, comprising:
a computing device (102) having a controller (104) and a memory (106) for storing and executing a plurality of instructions and a plurality of modules (108) by the controller (104), wherein the plurality of modules (108) comprises:
a camera interface module (110) configured to capture video frames from a computing device (102);
an object detection module (112) configured to identify objects in the video frames using a computer vision algorithm;
an emotion detection module (114) configured to detect the emotional state of a user based on facial expressions and stimming behaviour captured by the camera interface module (110);
an object recognition module (116) configured to classify the detected objects;
an object analysis module (118) configured to analyze the relevance and significance of the classified objects;
an instructional content selection module (120) configured to select instructional content based on the analysis of the classified objects and the detected emotional state of the user; and
a presentation module (122) configured to present the selected instructional content to the user through the computing device (102).
2. The IoT-based system (100) as claimed in claim 1, wherein the emotion detection module (114) utilizes a machine learning framework to analyze the facial expressions and stimming behaviour.
3. The IoT-based system (100) as claimed in claim 1, wherein the instructional content selection module (120) is configured to select instructional content based on user preferences, wherein the instructional content selection module (120) includes videos, tutorials, or interactive guides.
4. The IoT-based system (100) as claimed in claim 1, wherein a feedback mechanism is configured to receive feedback from the user on the instructional content.
5. The IoT-based system (100) as claimed in claim 1, wherein the emotion detection module (114) allocates a weight to the analysis of facial expressions and to the detected stimming behaviour, and combines the weighted analyses to determine the emotional state, wherein the emotion detection module (114) comprises a stimming behaviour detection algorithm configured to identify repetitive movements associated with the user's emotional state.
6. The IoT-based system (100) as claimed in claim 1, wherein the IoT-based system (100) is configured to detect users' emotional response and attention to digital content in real time by continuously monitoring the user's biofeedback while they are experiencing or engaging with any digital content by viewing or listening to the digital content.
7. The IoT-based system (100) as claimed in claim 1, wherein the IoT-based system (100) is configured to communicate with an application server (126) via a network (124) for transferring the real-time data to the computing device (102).
8. A method for operating of an IoT-based system (100) for detecting emotion and identifying objects, comprising:
capturing, by a camera interface module (110), video frames from a computing device (102);
identifying, by an object detection module (112), objects in the video frames using a computer vision algorithm;
detecting, by an emotion detection module (114), the emotional state of a user based on facial expressions and stimming behaviour captured by the camera interface module (110);
analysing, by an object analysis module (118), the relevance and significance of the classified objects upon classifying the detected objects via an object recognition module (116);
selecting, by an instructional content selection module (120), instructional content based on the analysis of the classified objects and the detected emotional state of the user; and
presenting, by a presentation module (122), the selected instructional content to the user through the computing device (102).
9. The method as claimed in claim 8, wherein detecting the emotional state of the user further comprises utilizing a machine learning framework to analyse the facial expressions and stimming behaviour.