Abstract: Speech therapy is a specialized intervention that aims to improve people's speech and communication abilities. Concurrently, emotion detection entails identifying emotional states from speech patterns. The proposed invention combines emotion recognition, speech therapy, and a chatbot interface in a novel way. The initial phase of this envisioned technology leverages emotion detection to determine the patient's emotional state. This data is then used to tailor speech therapy sessions to the individual's emotional needs. Furthermore, by incorporating a chatbot, the system provides rapid feedback and help; the chatbot can be programmed to respond to a variety of emotions and events. The potential benefits are significant. This invention uses algorithms such as Support Vector Machines (SVM) and Artificial Neural Networks (ANN). The integration improves the precision and efficiency of speech treatment while also adding a personalised and supportive component to it. Furthermore, this approach addresses accessibility issues, providing a solution for those who live in rural places or are constrained by the limits of traditional therapy. 4 Claims & 4 Figures
Description: Field of the Invention
The current innovation is in the field of healthcare and relates to detecting emotion in order to incorporate this capability into speech therapy sessions. The goal is to develop algorithms and machine learning models that can detect and analyse emotional cues in patients' speech.
Background of the Invention
Speech therapy is a subspecialty of medicine that focuses on identifying, treating, and improving communication and speech impairments. It is extremely valuable to society since it helps people of all ages, from children with speech difficulties to adults recuperating from strokes, overcome communication challenges. Speech therapists help people articulate language, express themselves, and improve their social and cognitive skills through individualised interventions. Ultimately, speech therapy helps individuals to live more satisfying lives by creating greater educational and employment prospects, stronger interpersonal relationships, and increased general well-being, all of which contribute considerably to society's overall health and social fabric.
For instance, US20210249035A1 discloses a method and system for detecting emotions in a speaker's voice. The method includes obtaining a baseline voice signal from the speaker, and then obtaining a target voice signal from the speaker while the speaker is expressing an emotion. The target voice signal is then compared to the baseline voice signal to determine the emotion being expressed. The method can be used to detect a variety of emotions, including happiness, sadness, anger, and fear. This can be used in a variety of applications, such as customer service, healthcare, and security.
Similarly, US11189302B2 relates to a method and apparatus for detecting emotions in a speaker's voice. The method includes obtaining a speech signal from the speaker, extracting features from the speech signal, and using machine learning to classify the emotion being expressed. The features that are extracted from the speech signal can include pitch, loudness, and duration of the speech signal. The machine learning model is trained on a dataset of speech signals that have been labelled with the emotions being expressed.
US9226866B2 discloses a speech therapy device that uses a computer to generate a personalized speech therapy program for a user. The program is based on the user's individual speech production abilities and goals. The device also includes a sensor that measures the user's speech production and provides feedback to the user. The feedback can be used to help the user improve their speech production. The device can be used to treat a variety of speech disorders, such as stuttering, dysarthria, and apraxia. The device can also be used to improve the speech production of people who are learning a new language.
US11517254B2 discloses a method and system for detecting errors when practising fluency shaping exercises. The method includes setting each threshold of a set of thresholds to a respective predetermined initial value; analysing a voice production of a user practising a fluency shaping exercise to compute a set of first energy levels composing that voice production; detecting at least one speech-related error, with respect to the exercise being practised, based on the computed set of first energy levels, a set of second energy levels determined through a calibration process, and the set of thresholds; and generating feedback indicating the detected speech-related error.
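A minimal sketch of this kind of energy-threshold error detection is shown below. It is a generic illustration, not the cited patent's implementation; the frame parameters and threshold values are assumptions standing in for the patent's calibration process.

```python
import numpy as np

def frame_energies(signal, frame_len=1024, hop=512):
    """Short-time energy of each frame of a mono audio signal."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([
        np.sum(signal[i * hop: i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

def detect_errors(energies, low_thr, high_thr):
    """Flag frames whose energy falls outside the calibrated [low_thr, high_thr] band."""
    return [i for i, e in enumerate(energies) if e < low_thr or e > high_thr]

# Illustrative use: in practice the thresholds would come from a per-user calibration step.
audio = np.random.randn(16000).astype(np.float32)   # 1 s of placeholder audio at 16 kHz
flagged = detect_errors(frame_energies(audio), low_thr=50.0, high_thr=2000.0)
```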
US10096319 discloses systems, methods, and computer-readable media for determining both physical and emotional traits of users through voice-based interactions. One approach involves analysing a user's voice data to ascertain their real-time status, creating a corresponding data tag, selecting audio content based on this tag and the voice data, and playing the chosen audio content through a speaker device.
US11004461 outlines methods to analyse voice signals and extract vocal features for assessing the emotional or mental states of individuals, particularly to identify risks like suicide. By examining changes in speech influenced by emotions, such as in hotline conversations, it's possible to gauge potential risks. The process involves preparing recordings, extracting vocal features, and making predictions. For instance, a computer-based approach involves obtaining conversation audio, isolating emotional components, and providing information about the person's emotional state.
US20220238110A1 describes a computer-implemented approach based on the capabilities of a portable electronic device. It begins by capturing spoken input from a user and converting it into an audio signal using an audio input device. To improve accuracy and responsiveness, the device combines both an online and an offline processing module to perform speech recognition on this audio signal. An interactive game module then uses the outcomes of these recognition procedures to generate user feedback, which could take the form of game responses or prompts. This feedback is then provided to the user via a user interface, delivering an engaging and participatory experience that makes use of the portable device's speech recognition technology.
US20220280087A1 describes an emotion identification system based on visual perception. Several steps are involved in the process. First, a face video is captured and a region of interest is identified using face detection. Second, color channel data from this region is extracted for dimension reduction. Third, the signals are denoised and stabilized to generate physiological characteristics such as heart rate, breathing waveforms, and heart rate variability (HRV). In the fourth stage, nonlinear aspects of these physiological parameters are identified and employed for modelling and regression using a twin network-based support vector machine (SVM) model. Finally, utilizing the regressed three-dimensional information, a valence-arousal-dominance (VAD) model is generated, resulting in a unique emotional representation. Essentially, this system recognizes emotions by combining visual data and physiological information, providing a fresh approach to emotion identification technology.
US20230096357A1 describes a system and method for emotion recognition and emotion-driven moderation in voice-based interactions. It entails storing a user's emotion profile, which contains moderation rules associated with distinct emotional states. The system monitors a communication session involving the user and others based on the user's emotion profile. Moderation actions are initiated if it detects emotional states in messages that match the user's predefined rules. One such action is modifying the appearance of messages on the user's device to carry out the stated moderation. In essence, this technology aims to improve online interactions by adapting communication experiences based on users' emotional states.
US202000335086A1 describes data augmentation employed in speech emotion recognition tasks to address imbalanced emotional labels, particularly when certain emotions, such as sadness, are under-represented in a training dataset, which is common in real-life scenarios. The proposed approach involves conditioned data augmentation using Generative Adversarial Networks (GANs) to generate samples for the under-represented emotions; a conditional GAN architecture is used to create synthetic spectrograms for the minority emotion class. The effectiveness of this method is demonstrated by comparing it to traditional signal-based data augmentation techniques, and the results show that the GAN-based method considerably enhances classification performance in speech emotion recognition.
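As a generic illustration of conditioned spectrogram augmentation (not the architecture of the cited publication), the sketch below assumes PyTorch, arbitrary layer sizes, and 64x64 spectrogram patches; a one-hot emotion label conditions the generator so that samples can be produced for an under-represented class.

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Toy conditional generator producing mel-spectrogram-like 64x64 patches."""
    def __init__(self, noise_dim=100, n_classes=4, spec_shape=(64, 64)):
        super().__init__()
        self.spec_shape = spec_shape
        self.net = nn.Sequential(
            nn.Linear(noise_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, spec_shape[0] * spec_shape[1]),
            nn.Tanh(),  # spectrograms assumed scaled to [-1, 1]
        )

    def forward(self, noise, labels):
        # Conditioning: concatenate the one-hot emotion label to the noise vector.
        x = torch.cat([noise, labels], dim=1)
        return self.net(x).view(-1, 1, *self.spec_shape)

# Generate 8 synthetic spectrograms for a minority class (e.g. "sadness" = index 2).
gen = CondGenerator()
z = torch.randn(8, 100)
y = nn.functional.one_hot(torch.full((8,), 2), num_classes=4).float()
fake_specs = gen(z, y)   # shape: (8, 1, 64, 64)
```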
Similarly, in US20180137109A1, the system processes audio data containing a wakeword and a command linked to two speech-processing systems. It decides how to transition the audio data to the second system based on the level of interaction with it: if interaction is low, a comprehensive indication is created; if interaction is high, a concise indication is generated. A local device plays the indication audio before playing the audio generated by the second system in response to the command.
In the disclosures mentioned above, the accuracy, robustness, interpretability, privacy, and bias of emotion detection technologies can be constrained. The accuracy of these technologies varies with the technology utilised, the quality of the audio data, and the context of the dialogue. Many of the approaches and systems mentioned here rely on specific elements or cues, such as pitch, loudness, duration, or facial expressions, which may not fully capture the complexities of human emotions. They can also be affected by noise and other factors that distort the audio signal. The results of these technologies can be difficult to interpret, since they frequently employ complex algorithms that are not transparent to the user. Furthermore, the recognition process may struggle to distinguish subtle nuances and context-dependent emotions. Emotion detection technology may also be biased towards certain groups of people, such as those with accents or from specific cultures. By fine-tuning the algorithms and methodologies used, the project seeks to ensure that emotions are detected with higher precision and reliability. This enhancement is expected to have a significant positive impact on various fields, particularly speech therapy and emotional analysis.
Brief Description of Drawings
The invention will be described in detail with reference to the exemplary embodiments shown in the figures, wherein:
Figure-1: Flowgorithm representing the workflow of the SpeechSentio
Figure-2: Flowchart representing speech emotion detection
Figure-3: Diagrammatic representation of speech-to-text conversion
Figure-4: Flowchart representing the basic architecture and workflow of the developed prototype
Detailed Description of the Invention
The "SpeechSentio " presents an innovative approach to aiding individuals with speech challenges, particularly stammering. This project leverages advanced technologies to create a comprehensive and user-centric platform for speech improvement. The following detailed description provides insights into the application's functioning, impact, and significance.The application's journey begins with the user login process. Once users access the platform, they are prompted to record a speech sample. This recorded audio serves as the foundation for the subsequent analysis and feedback. The application takes this raw speech data and employs cutting-edge speech-to-text engines to convert the spoken words into text form. This transition from spoken to textual representation forms a pivotal step in the analysis pipeline.
From here, the application enters the realm of acoustic analysis. Acoustic features are extracted from the audio recording, encompassing a spectrum of auditory attributes. This phase involves the dissection of sound waves into discrete components, unraveling aspects such as pitch, rhythm, intensity, and resonance. These acoustic features are fundamental to understanding speech patterns and serve as the building blocks for the subsequent machine learning analysis.
The core of the application's efficacy rests upon its implementation of machine learning models. These models have been meticulously trained to discern intricate patterns within the extracted acoustic features. A trained ML model is employed to identify areas where the user's speech may be struggling, encompassing instances of hesitation, repetition, disfluency, or deviations from established speech norms. The ML model's discerning capability lends a data-driven lens to the assessment process, offering insights that go beyond surface observations.
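A minimal sketch of this acoustic analysis stage is given below, assuming librosa for feature extraction; the silence threshold used to flag hesitations is an illustrative assumption, not a trained parameter of the SpeechSentio model.

```python
import librosa
import numpy as np

def extract_features(wav_path):
    """Extract pitch, intensity, timbre, and a rough speaking-rate proxy from a recording."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)         # pitch contour
    rms = librosa.feature.rms(y=y)[0]                     # frame-wise intensity
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # resonance/timbre summary
    onsets = librosa.onset.onset_detect(y=y, sr=sr)       # acoustic events as a tempo proxy
    rate = len(onsets) / (len(y) / sr)
    return {"f0": f0, "rms": rms, "mfcc_mean": mfcc.mean(axis=1), "rate": rate}

def flag_hesitations(rms, silence_thr=0.01, min_frames=20):
    """Return (start, end) frame spans where intensity stays below a silence threshold."""
    spans, start = [], None
    for i, v in enumerate(rms):
        if v < silence_thr and start is None:
            start = i
        elif v >= silence_thr and start is not None:
            if i - start >= min_frames:
                spans.append((start, i))
            start = None
    if start is not None and len(rms) - start >= min_frames:
        spans.append((start, len(rms)))
    return spans
```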
The culmination of this comprehensive analysis is the generation of a detailed report. This report encapsulates the outcomes of the acoustic analysis and the ML-driven assessment. It provides users with an incisive overview of their speech performance, highlighting areas of proficiency as well as those in need of improvement. This individualized report equips users with a roadmap for honing their speech skills and fostering greater fluency. The application's commitment to user engagement and empowerment extends to its chatbot functionality. Users are encouraged to seek clarification and insights through conversation with the chatbot. This interaction allows users to delve into the details of their report, comprehend the nuances of the ML-driven assessment, and gain a deeper understanding of their speech patterns. The chatbot's responsive nature enhances the user experience, fostering an environment of learning and growth.
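The report assembly can be sketched as follows; the scoring rule and field names are assumptions for illustration, since the description does not fix a report schema, and the resulting object is what the chatbot would draw on when answering questions.

```python
from dataclasses import dataclass, field

@dataclass
class SpeechReport:
    speech_score: float                               # 0-100 overall fluency score
    emotion: str                                      # dominant detected emotion
    strengths: list = field(default_factory=list)
    weaknesses: list = field(default_factory=list)
    recommendations: list = field(default_factory=list)

def build_report(hesitation_spans, total_frames, emotion):
    """Turn the acoustic/ML analysis outputs into a user-facing report."""
    hesitation_ratio = sum(e - s for s, e in hesitation_spans) / max(total_frames, 1)
    report = SpeechReport(speech_score=round(100 * (1 - hesitation_ratio), 1), emotion=emotion)
    if hesitation_ratio < 0.1:
        report.strengths.append("Consistent pacing with few prolonged pauses")
    else:
        report.weaknesses.append("Frequent prolonged pauses")
        report.recommendations.append("Practise paced reading exercises")
    return report
```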
The impact of the "SpeechSentio" is profound. It bridges the gap between technology and speech improvement, offering a multifaceted approach that resonates with users seeking to enhance their communication abilities. By fusing speech-to-text engines, acoustic analysis, and machine learning algorithms, the application transforms raw speech data into actionable insights. Users are not only empowered with insights into their speech patterns but are also equipped with personalized recommendations for progress. Beyond its technical prowess, the project underscores the potential for technology to catalyze positive change. It merges scientific rigor with user-centric design, thereby cultivating an ecosystem where individuals can embark on a journey of speech enhancement. Collaboration with experts in speech therapy ensures that the application adheres to best practices and offers a credible and valuable resource.
In conclusion, the "SpeechSentio" transcends conventional boundaries. It embodies the symbiotic relationship between technology and human empowerment, fostering speech improvement through data-driven analysis and personalized feedback. By championing user engagement, embracing machine learning, and delivering comprehensive insights, this project redefines speech therapy for the digital age.
Advantages of the proposed model:
The integration of real-time feedback via chatbots provides a dynamic and instantaneous emotional monitoring system. As patients practise communication skills, the chatbot analyses their speech and emotional cues in real time, delivering rapid insights into their present emotional states. This immediate feedback enables people to recognise and manage their emotions on the spot, improving their capacity to control emotional responses during dialogue. With the chatbot's integration, patients can engage in ongoing practice outside of therapy sessions. This continual involvement allows for the consistent development of emotional regulation abilities, which improves the therapeutic process.
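A minimal sketch of such a real-time feedback loop is shown below; the chunk-level classifier and the reply table are placeholders rather than the deployed chatbot components.

```python
import numpy as np

REPLIES = {
    "anxious": "Take a slow breath and try the phrase again at half speed.",
    "calm": "Nice, steady delivery. Let's move on to the next exercise.",
}

def classify_chunk(chunk):
    # Placeholder rule: a trained model would map acoustic features to an emotion label.
    return "anxious" if np.abs(chunk).mean() > 0.2 else "calm"

def feedback_stream(chunks):
    """Yield an immediate chatbot message for every incoming audio chunk."""
    for chunk in chunks:
        yield REPLIES.get(classify_chunk(chunk), "Keep going, you're doing well.")

# Example with placeholder 0.5 s chunks at 16 kHz.
dummy_chunks = (np.random.randn(8000).astype(np.float32) * 0.1 for _ in range(4))
for message in feedback_stream(dummy_chunks):
    print(message)
```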
This technique employs advanced algorithms to detect minor emotional indicators within speech. It creates a more thorough comprehension of emotional expressions by analysing characteristics such as intonation, pitch changes, and speech tempo. This increased precision ensures that even subtle emotions are reliably detected, providing trustworthy insights into patients' emotional states. As a result, interventions can be personalised to specific emotional responses, increasing the efficacy of therapy by addressing both linguistic and emotional limitations.
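To illustrate how such characteristics could feed an SVM of the kind named in the abstract, the sketch below assumes scikit-learn and uses synthetic feature vectors (mean pitch, pitch variability, speaking rate, mean intensity) with placeholder labels; a real system would train on features extracted from labelled speech corpora.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row: [mean pitch (Hz), pitch std (Hz), speaking rate (events/s), mean intensity]
X = np.array([
    [210.0, 45.0, 4.5, 0.08],
    [120.0, 10.0, 2.1, 0.03],
    [190.0, 60.0, 5.2, 0.09],
    [110.0,  8.0, 1.9, 0.02],
])
y = np.array(["excited", "sad", "excited", "sad"])   # placeholder emotion labels

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)

sample = np.array([[180.0, 50.0, 4.8, 0.07]])        # features from a new recording
print(clf.predict(sample))
```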
The system's adaptability allows it to be used in a variety of settings. It can effectively support the treatment of speech issues such as stuttering by assisting people in controlling their emotional responses during communication exercises. For those experiencing communication difficulties as a result of anxiety, the system provides a platform to practise and improve emotional expressiveness. It also aids language learners by supporting them in honing their emotional delivery while learning a new language.
The efficiency advantage is centred on the expedited therapeutic process made possible by technology. Therapists can devote more of their valuable time and expertise to personalised guidance and interventions by automating the emotion recognition aspect. This technological efficiency enables therapists to concentrate on adapting tactics to individual patients, ensuring that therapies are properly aligned with their emotional and communication needs. Therapy sessions become more focused and productive as a result, maximising the impact of therapist-patient interactions.
4 Claims & 4 figures , Claims:The scope of the invention is defined by the following claims:
Claims:
The following illustrates the exemplary embodiments of the invention.
1. The SpeechSentio: an AI-based emotion analyser for spoken language, comprising:
a) A speech-to-text interface which allows people with stammering to record their speech; the speech recordings are then converted into text by a speech-to-text engine.
b) A machine learning model trained to analyse the speech recordings and identify areas where the person is struggling; the model is trained on a dataset of speech recordings from people with stammering and is able to identify a variety of speech disorders.
c) A report generator incorporated to generate a report that includes a speech score, strengths and weaknesses, recommendations, and detected emotion; the report is generated based on the analysis of the speech recordings by the machine learning model.
d) A virtual assistant integrated with the report generator to answer questions about the report; the virtual assistant can also provide additional information and support to the person with stammering.
2. According to claim 1, a data storage module stores the speech recordings, the machine learning model, and the reports. The data is protected by a variety of security measures, such as encryption and access controls, and these measures are regularly reviewed to ensure that they remain effective.
3. According to claim 1, user authentication is performed using a variety of methods, such as passwords, biometrics, and two-factor authentication, and is strong enough to prevent unauthorized access to the system. The data collected by the system is kept confidential and is not shared with third parties without the user's consent.
4. According to claim 1, the system learns from the data that it collects and uses this data to improve the accuracy of the machine learning model and the quality of the reports. The continuous learning module is regularly updated with new data.
| # | Name | Date |
|---|---|---|
| 1 | 202341078532-REQUEST FOR EARLY PUBLICATION(FORM-9) [18-11-2023(online)].pdf | 2023-11-18 |
| 2 | 202341078532-FORM-9 [18-11-2023(online)].pdf | 2023-11-18 |
| 3 | 202341078532-FORM FOR STARTUP [18-11-2023(online)].pdf | 2023-11-18 |
| 4 | 202341078532-FORM FOR SMALL ENTITY(FORM-28) [18-11-2023(online)].pdf | 2023-11-18 |
| 5 | 202341078532-FORM 1 [18-11-2023(online)].pdf | 2023-11-18 |
| 6 | 202341078532-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [18-11-2023(online)].pdf | 2023-11-18 |
| 7 | 202341078532-EVIDENCE FOR REGISTRATION UNDER SSI [18-11-2023(online)].pdf | 2023-11-18 |
| 8 | 202341078532-EDUCATIONAL INSTITUTION(S) [18-11-2023(online)].pdf | 2023-11-18 |
| 9 | 202341078532-DRAWINGS [18-11-2023(online)].pdf | 2023-11-18 |
| 10 | 202341078532-COMPLETE SPECIFICATION [18-11-2023(online)].pdf | 2023-11-18 |