Abstract: The present invention relates to a multimodal system for speech disorder detection, communication assistance, and an apparatus thereof. The system (S) incorporates a user-wearable device (D) with an integrated sensor array (SA) comprising an audio sensor (SA1), an infrared camera (SAC), nasal (SNA) and oral (SOA) airflow sensors, and user-wearable optoelectronic sensors (SAO). Said system (S) analyzes abnormalities in speech generation-related aspects, including speech patterns, articulatory organ movement, and airflow in the oral and nasal cavities. Employing deep learning algorithms and computational linguistics techniques, the system (S) enables real-time identification and classification of specific speech disorders. Said system (S) further incorporates a speech correction and personalized feedback module (SCA1.3A), a practice module (SCA1.3), and a user progress visualization and tracking module (SCA1.4). These modules support users (U), speech pathologists, and therapists in diagnosing and treating a broad spectrum of speech disorders. The system also incorporates a secure cloud-based user data storage facility for user progress monitoring.
Description:
FIELD OF THE INVENTION
The present invention relates to a multimodal system for speech disorder detection, communication assistance, and an apparatus thereof. More particularly, the present invention discloses a system capable of assessing both speech patterns and articulatory organ movement using image, tactile, and audio sensors. By integrating deep learning algorithms and computational linguistics techniques, the system identifies and classifies a wide array of speech disorders. Said system provides tailored diagnostics and treatment for users.
BACKGROUND OF THE INVENTION
Speech sound disorders encompass a wide range of challenges related to perceiving, producing, and phonologically representing speech sounds and segments. This umbrella term includes difficulties in various aspects, such as the accurate perception of speech sounds, the precise motor production of speech sounds, and the understanding and application of phonotactic rules that dictate permissible sound sequences within a particular language. It is important to note that speech sound disorders can manifest in different ways, affecting individuals' ability to articulate and sequence sounds effectively.
There are various types of speech disorders, including articulation disorders, fluency disorders, voice disorders, apraxia of speech, dysarthria, and language-based disorders. Speech disorders are intricate conditions that affect individuals' ability to produce clear and intelligible speech, consequently influencing their communication skills, social interactions, and overall quality of life.
Articulation disorders involve phonetic errors resulting from difficulties in producing specific speech sounds or phonemes. These errors may manifest as substitutions, omissions, distortions, or additions of sounds, impeding speech intelligibility. This section explores the intricate nature of articulation disorders, highlighting the importance of accurate assessment and targeted intervention techniques.
Fluency disorders, notably stuttering, disrupt the natural rhythm and flow of speech, leading to communication breakdowns. This section provides an in-depth analysis of the etiology and subtypes of fluency disorders, emphasizing the multifactorial nature of stuttering. Evidence-based assessment protocols and intervention strategies are discussed, encompassing both behavioral and cognitive approaches to enhance fluency and minimize the psychosocial impact of these disorders.
Voice disorders encompass a wide array of conditions that impact the quality, pitch, and loudness of an individual's voice. This section explores the underlying causes of voice disorders, including vocal nodules, vocal cord paralysis, and vocal strain. Accurate diagnosis through comprehensive voice assessment techniques is highlighted, along with evidence-based interventions such as vocal hygiene, vocal therapy, and surgical interventions when necessary.
Apraxia of speech is a complex neurological disorder that affects an individual's ability to plan and execute the precise movements required for speech production. This section provides a detailed exploration of the underlying neural mechanisms and diagnostic challenges associated with apraxia of speech. Evidence-based treatment approaches, including principles of motor learning and augmented feedback, are discussed to aid clinicians in developing effective intervention plans.
Dysarthria is a motor speech disorder resulting from weakness or impairment of the muscles involved in speech production. This section delves into the various etiologies of dysarthria, including cerebral palsy, stroke, and degenerative diseases. A comprehensive examination of assessment techniques, differential diagnosis, and intervention strategies is provided, emphasizing the importance of a multidisciplinary approach for optimal outcomes.
Language-based disorders, while not strictly speech disorders, significantly impact an individual's ability to comprehend and express language effectively. This section explores various language-based disorders such as aphasia, developmental language disorder, and specific language impairment. Diagnostic considerations, comprehensive language assessments, and evidence-based language interventions are discussed to facilitate accurate diagnosis and effective treatment planning.
A number of documents, including patent and non-patent literature, have been published in said domain for the identification of speech disorders. A patent literature with publication number CN112617755A, titled “Speech dysfunction detection method, device, equipment, storage medium and system”, published by Ma Yong, Ma Sai, Yang Jiandong, and Wang Chengxing, describes a system that merges audio and video prints of an individual to detect speech dysfunction using an artificial neural network. However, the system lacks a user progress monitoring module, nasal and oral cavity airflow detection, and tracking of other articulatory organs' movements needed to identify and classify the associated speech disorders in a user.
Another patent literature with publication number AU2020343020A1, titled “Wireless Real-Time Tongue Tracking for Speech Impairment Diagnosis, Speech Therapy with Audiovisual Biofeedback, and Silent Speech Interfaces”, published by Ghovanloo Maysam and Block Jacob, describes a system focused on tongue tracking using a magnetic tracer unit attached in the mouth temporarily or semi-permanently, with an audiovisual biofeedback mechanism for diagnosing speech impairments and improving speech therapy. However, the system focuses only on speech impairments related to tongue movement and lacks multimodal feedback from the nasal and oral cavities and from other articulators such as the lips, neck, and jaw, which is needed to identify associated speech disorders.
Another patent literature with publication number WO2023189379A1, titled “Articulation Disorder Detection Device and Articulation Disorder Detection Method”, published by Akiho Sakurai, Takahiro Kamai, Katsunori Daimou, Tomoki Ogawa, Shogo Takahata, Seiki Nagao, and Kazunori Kawami, describes a system and method to identify dysarthria specifically, using a neural network based on speech signal analysis. The system lacks a real-time identification and classification system and apparatus addressing a wide range of speech disorders.
Another patent literature with publication number IN202341004261A, titled “Deep Learning Based Pattern Analysis System for Detecting Speech Disorders in Children”, published by Dr D Jeyakumari, Dr M Sundar Prakash Balaji, Dr P Valarmathi, Dr R Jayabharathy, Mr L Gowrisankar, Mrs C Sakunthala, Mrs D Ramya Cauvery, Mrs G Sharmila, and S Balamurugan, describes a system using deep learning and digital signal processing techniques to accurately extract and analyze audio features for the classification of speech disorders. However, the system focuses only on analyzing audio signals and lacks assessment of other parameters associated with speech disorders, such as articulatory organ movements, nasal and oral airflow, and jaw and neck movement.
Another non-patent literature, titled “Voice Disorder Identification by Using Machine Learning Techniques”, published in March 2018 by Laura Verde, Giuseppe De Pietro, and Giovanna Sannino, describes voice disorder identification using machine learning. However, the approach focuses only on speech signal analysis and does not address the other factors needed to recognize a wide array of speech disorders.
Existing inventions in this field have not fully utilized analysis of speech patterns, articulatory organ movement tracking, and breathing-related measurements for the identification and categorization of speech disorders.
In order to obviate the drawbacks in the existing state of the art, the present invention provides a hybrid system and apparatus incorporating audio, visual, and tactile sensors for accurate, real-time identification and classification of a broad range of speech disorders, to aid clinicians in diagnosis and treatment planning.
OBJECT OF THE INVENTION
In order to overcome the shortcomings in the existing state of the art, the objective of the present invention is to provide a multimodal system for identification and categorization of speech disorders.
Yet another objective of the invention is to provide wearable apparatus capable of integrating audio, video, oral and nasal airflow sensors.
Yet another objective of the invention is to provide a communication assisting system.
Yet another objective of the invention is to provide a system equipped with optoelectronic sensors to track articulatory organ movements including lower neck and jaw during speech, with the purpose of detecting any irregularities in their motion.
Yet another objective of the invention is to provide an apparatus incorporating high-definition infrared camera in order to track lip and tongue movement associated with speech with the purpose of detecting any irregularities in their motion.
Yet another objective of the invention is to provide an apparatus incorporating audio sensor to receive speech of the user, for analyzing the associated anomalies.
Yet another objective of the invention is to provide deep learning algorithm and computational linguistics technique-based anomaly detection in data collected via the sensors.
Yet another objective of the invention is to provide a system for identifying and categorizing speech disorders associated with various languages.
Yet another objective of the invention is to provide a system with cloud storage facilities for securely storing user data, facilitating real time user progress tracking, and monitoring.
Yet another objective of the invention is to provide a wearable, portable, cost-effective apparatus for identification of a broad spectrum of speech disorders.
SUMMARY OF THE INVENTION:
The present invention discloses a multimodal system for assessing and addressing speech disorders. Said system (S) provides a comprehensive solution for identifying and classifying various speech-related anomalies by integrating a wearable device (D), sensor arrays (SA), and a data analysis module (SCA1.2). The present invention includes a headset device (D) equipped with a sensor array (SA) on an extended mounting seat (EMS). Said system (S) employs sensor arrays (SA) strategically positioned on the extended mounting seat within the device (D). Said sensor array (SA) is a combination of different types of sensors capturing multiple aspects of speech, including an audio sensor for recording speech sounds and an infrared camera (SAC) for intraoral mouth and lip movement monitoring. The system (S) also utilizes oral and nostril airflow sensors for breathing-related data collection, and incorporates optoelectronics sensor (SAO) based lower jaw and neck movement tracking for gathering vibrotactile data sensing articulatory organ motion.
The sensor arrays (SA) ensure data collection during natural speech production. Said system (S) utilizes a data analysis module (SCA1.2) including Convolutional Neural Networks (CNN) for lip and/or mouth movement analysis using edge detection and texture mapping. Recurrent Neural Networks (RNN) in the present system (S) are utilized for nostril and oral airflow data analysis to assess respiratory patterns, as well as for lower jaw and neck movement analysis to track articulatory organ motion.
Both neural networks (CNN and RNN) analyze the extracted features from sensor data to detect anomalies and categorize them into specific speech disorders. Based on the identified anomalies, said system (S) provides personalized feedback using visual and audio cues. Said system (S) utilizes a speech correction module (SCA1.3A), which helps the system (S) pinpoint issues such as pronunciation errors, fluency problems, or vocal strain. Said system (S) then tailors a treatment plan incorporating repetition-based exercises and targeted practice sessions, which are delivered via the personalized feedback and practice module (SCA1.3). These sessions aim to enhance muscle memory and promote accurate articulation.
Said system (S) employs an advanced adaptation mechanism that personalizes the therapy experience. The system (S) analyzes user progress by utilizing the user progress visualization and tracking module (SCA1.4), adjusts the complexity of exercises, and modifies feedback modalities accordingly. This dynamic approach ensures the therapy remains aligned with the user's evolving capabilities and goals, fostering a tailored and responsive rehabilitation environment.
The present system (S) incorporates an encryption-based secure cloud data storage facility to safeguard the user's personal data, and telehealth functionality to help users connect with specialized healthcare professionals.
Furthermore, the present system's architecture is designed to accommodate the integration and application of a diverse array of machine learning and deep learning algorithms. Such adaptability not only enhances the system's capability to accurately diagnose and treat speech disorders but also broadens its applicability across a spectrum of languages and dialects, extending its global utility.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 depicts system architecture for speech disorders identification and classification.
Figure 2 depicts sensor array for auditory, visual, and tactile feedback for an articulatory movement, speech pattern and somatosensory analysis.
DETAILED DESCRIPTION OF THE INVENTION, ILLUSTRATIONS AND EXAMPLES
While the invention has been disclosed with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made, and equivalents may be substituted, without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from its scope. Throughout the specification and claims, the following terms take the meanings explicitly associated herein unless the context clearly dictates otherwise. The meanings of “a”, “an”, and “the” include plural references. Additionally, a reference to the singular includes a reference to the plural unless otherwise stated or inconsistent with the disclosure herein.
Table 1: Legend of Reference numerals
| Serial no. | Item description | Reference numerals |
|---|---|---|
| 1. | Wearable headset device | D |
| 2. | System | S |
| 3. | Sensor array | SA |
| 4. | Extended arm | EMS |
| 5. | Infrared camera | SAC |
| 6. | Oral airflow sensor | SOA |
| 7. | Nasal airflow sensor | SNA |
| 8. | Piezoelectric sensor | SAP1 |
| 9. | Thermocouple sensor | SAT1 |
| 10. | Optoelectronic adhesive sensor | SAO |
| 11. | Optoelectronics neck movement sensor | SAO1.1 |
| 12. | Optoelectronics lower jaw movement sensor | SAO1.2 |
| 13. | Convolution Neural Network | CNN |
| 14. | Recurrent Neural Network | RNN |
| 15. | User | U |
| 16. | Communication unit | CU |
| 17. | Communication assistant application | SCA |
| 18. | User interface | SCA1.1 |
| 19. | Data analysis module | SCA1.2 |
| 20. | Practice module | SCA1.3 |
| 21. | ASR module | SCA1.31 |
| 22. | NLP module | SCA1.32 |
| 23. | Text-to-speech module | SCA1.33 |
| 24. | Voice embedding module | SCA1.34 |
| 25. | User progress visualization and tracking module | SCA1.4 |
| 26. | Telehealth module | SCA1.5 |
| 27. | Cloud storage module | SCA1.6 |
| 28. | Audio sensor | SA1 |
| 29. | Speech correction module | SCA1.3A |
The present invention provides a multimodal system (S) with a wearable headset (D) and a communication unit (CU), as shown in Figure 1, for the identification and classification of a broad spectrum of speech disorders. Said headset (D) features an adjustable extended arm (EMS) attached to the said headset (D), positioned close to the user's mouth and integrating a sensor array (SA) to collect data related to speech generation. Said sensor array (SA) comprises sensors such as an audio sensor (SA1), a camera (SAC) for visual lip movement analysis, and nasal (SNA) and oral (SOA) airflow monitoring sensors. Said sensor array (SA) further comprises a wearable optoelectronics sensor (SAO) to capture the user's lower jaw and neck muscle activity. Figure 1 provides a visual representation of this multi-sensor speech data collection and analysis system.
The architecture of the present invention comprises several key components:
• Sensor-array based data collection: The system (S) gathers data using specialized sensor arrays (SA) that capture nuances in speech patterns, vocal characteristics, and articulatory organ movement.
• Feature extraction and data analysis: The present system (S) adopts a flexible algorithmic framework, meaning the architecture of said system (S) is designed to accommodate the integration and application of a diverse array of machine learning and deep learning algorithms. While the present system (S) employs Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) for data analysis and feature extraction on data collected via the sensor array (SA), it is not limited to these specific algorithms. The extracted features serve as the foundation for subsequent analysis. Feature extraction includes lip movement analysis using edge detection and texture mapping, nostril and oral airflow data analysis, movement analysis of the lower jaw and neck, and analysis of audio pitch, frequencies, and voice quality metrics.
• Disorder identification, classification, and prediction: Neural networks analyze the extracted features to detect anomalies and classify them into a type of disorder based on the analysis. Predictive models assess the progression of these disorders over time.
• Speech correction and personalized feedback: The system (S) analyzes inconsistencies or errors in the user's speech by utilizing automatic speech recognition (SCA1.31), natural language processing (SCA1.32), and a speech synthesizer using voice embedding technology. Personalized feedback includes visual, audio, or textual cues that highlight specific pronunciation errors, fluency issues, or vocal strain, and provides communication assistance to the user (U).
• Targeted practice-based user progress visualization and tracking: Targeted practice sessions are generated based on the identified speech disorder, helping users (U) enhance their muscle memory and promote accurate articulation.
• Telehealth functionality: Users (U) may consult speech therapists or healthcare professionals remotely. Telehealth sessions address specific challenges, provide expert guidance, and adjust therapy plans based on individual needs.
The present system (S) comprises a three-part sensor array (SA) integrated on a wearable device (D), as depicted in Figure 2, representing data collection via sensor arrays (SA) mounted on the extended mounting seat (EMS). The said three-part sensor array (SA) includes an audio sensor (SA1) integrated into the extended mounting seat (EMS) and adjustably positioned close to the mouth to collect audio data. Said three-part sensor array (SA) comprises a camera (SAC) for detection and data collection of articulatory organ movements which may be indiscernible to the naked eye. The said articulatory organs include but are not limited to the lips, tongue, and mouth. The said lip-movement data collection is performed via the camera (SAC) mounted on said extendable mounting seat (EMS), wherein the said camera (SAC) is an infrared camera with features including but not limited to high-definition recording, anti-fogging capability using a hydrophobic coating, and a recording rate of at least 120 frames per second.
The said three-part sensor array (SA) comprises optoelectronics sensors (SAO), which are adhesive reflective sensors placed at critical articulatory points. Said articulatory points are selected from the lower jaw and neck. The said infrared camera (SAC) measures the displacement and subtle movements of said adhesive sensors placed at said critical articulatory points with high precision.
The three-part sensor array (SA) comprises oral (SOA) and nasal (SNA) airflow sensors, where the sensors are selected from piezoelectric (SAP1) and thermocouple (SAT1) sensors and mounted on the extended arm (EMS). The said nasal airflow (SNA) sensors measure the nasal airflow pressure and rate. The term “thermocouple sensor” (SAT1) used in this specification refers to a sensor capable of measuring temperature differentials associated with airflow, providing valuable information about the thermal characteristics of the exhaled air. The term “piezoelectric sensor” (SAP1) used in this specification refers to a sensor capable of capturing subtle vibrations and pressure changes in airflow, allowing for a comprehensive analysis of airflow rates.
The second phase of said system (S) focuses on extracting essential features from sensor data. Said wearable headset device (D) offers flexible connectivity options with the communication unit (CU), supporting both wireless and wired configurations. The term “Communication Assistant Application” or “dedicated application” (SCA) used in this specification refers to a computer program in the form of an application installed on the communication unit (CU). The term “Communication unit” used in this specification refers to a tablet device, smartphone, or any smart device with a display and similar computing capabilities. Said dedicated application (SCA) further comprises a user interface (SCA1.1) for the system (S), allowing users to receive personalized audio- and video-based exercise recommendations for their identified speech disorder. Said user interface (SCA1.1) incorporates user (U) progress visualization and tracking based on said exercise results, and telehealth functionality for remote consultations with healthcare professionals.
The data analysis module (SCA1.2) is incorporated into said communication assistant application (SCA) installed on said communication unit (CU). Said data analysis module (SCA1.2) employs deep learning algorithms and computational linguistics techniques to assist in the real-time identification, categorization, and diagnosis of speech disorders by analyzing the data received via the sensor array (SA), in various languages. Said data analysis module (SCA1.2) is designed for flexibility, allowing it to adapt to and integrate various machine learning and deep learning algorithms as classifiers. While the present system (S) utilizes Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) for identification and/or classification, this adaptability ensures the system's ability to incorporate future algorithmic advancements for even more effective speech disorder identification.
The term “Recurrent Neural Network” (RNN) used in this specification refers to a classifier that analyzes and processes signals received via the optoelectronics sensors (SAO), nasal airflow sensor (SNA), and oral airflow sensor (SOA). Said Recurrent Neural Network (RNN) is trained using a dataset comprising feature vectors derived from the sensor data of individuals with normal speech patterns and those with various speech disorders.
Said Recurrent Neural Network (RNN) takes inputs from two sensor groups. The first is an optoelectronic (SAO) movement analysis system, which includes a lower jaw optoelectronics sensor (SAO1.1) and a neck optoelectronics sensor (SAO1.2), as shown in Figure 1. The term “optoelectronics sensor” (SAO) used in this specification refers to adhesive sensors placed at critical articulatory points such as the lower jaw (SAO1.1) and neck (SAO1.2). The said sensors (SAO) reflect infrared light emitted by said camera (SAC), which is then used to measure the displacement of these points with high precision, capturing the subtle movements of the lower jaw and neck as the individual speaks.
The second is the nasal (SNA) and oral (SOA) airflow sensor group, which includes a series of thermocouple (SAT1) and piezoelectric (SAP1) sensors to measure temperature differentials associated with airflow and to capture subtle vibrations and pressure changes, respectively. This provides a comprehensive analysis of nasal and oral airflow rates and velopharyngeal valve dynamics.
Said Recurrent Neural Network (RNN) preprocesses and extracts the features of said data collected from the said optoelectronics sensors (SAO) and the nasal (SNA) and oral (SOA) airflow sensors. Said preprocessing may involve converting the data into a suitable format for analysis. This involves steps such as noise reduction, where unwanted disturbances or interferences in the data are removed; techniques such as filtering or smoothing can be used to achieve this. After said noise reduction, the normalization step is performed, which is a process of adjusting values measured on different scales to a common scale. This normalization ensures that no particular feature dominates others due to its numerical values, which could bias the learning algorithm.
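By way of illustration only, the following non-limiting sketch shows one way such preprocessing could be performed, assuming the sensor signals are available as NumPy arrays sampled at a known rate; the cutoff frequency, filter order, and sampling rate shown are hypothetical choices and not values prescribed by the present invention.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_channel(signal, fs, cutoff_hz=20.0, order=4):
    """Noise-reduce and normalize one sensor channel (SAO, SNA or SOA).

    A zero-phase low-pass filter removes high-frequency disturbances
    (smoothing), and z-score normalization brings channels measured on
    different scales to a common scale so no feature dominates learning.
    """
    b, a = butter(order, cutoff_hz / (0.5 * fs), btype="low")
    filtered = filtfilt(b, a, signal)
    return (filtered - filtered.mean()) / (filtered.std() + 1e-8)

# Example: a hypothetical 2-second jaw-displacement trace sampled at 200 Hz
fs = 200
raw_jaw = np.random.randn(2 * fs)          # placeholder for SAO data
clean_jaw = preprocess_channel(raw_jaw, fs)
```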
Said preprocessed data is then subjected to feature extraction which involves transforming the raw sensor data into a set of features that effectively capture the underlying patterns in the data. For example, the displacement measurements from the optoelectronic sensors (SAO) can be used to derive features related to the speed, acceleration, and rhythm of the articulatory movements. Similarly, the airflow and pressure measurements from the nasal-oral airflow sensor can be used to derive features related to the rate and intensity of airflow, as well as the opening and closing dynamics of the velopharyngeal valve.
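Purely as an illustrative, non-limiting sketch of this feature derivation (the particular statistics and the use of numerical differentiation are assumptions for illustration), speed and acceleration may be obtained from the optoelectronic displacement trace, and airflow rate and intensity features from simple summary statistics per analysis window:

```python
import numpy as np

def movement_features(displacement, fs):
    """Speed, acceleration and rhythm-style features from SAO displacement."""
    dt = 1.0 / fs
    speed = np.gradient(displacement, dt)        # first derivative
    accel = np.gradient(speed, dt)               # second derivative
    return [np.mean(np.abs(speed)), np.max(np.abs(accel)), np.std(speed)]

def airflow_features(flow):
    """Rate and intensity features from SNA/SOA airflow signals."""
    return [np.mean(flow), np.max(flow), np.std(flow)]

def feature_vector(displacement, flow, fs):
    """One feature vector per analysis window, ready for the RNN classifier."""
    return np.array(movement_features(displacement, fs) + airflow_features(flow))
```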
The Recurrent Neural Network (RNN) is trained using a dataset comprising feature vectors derived from the sensor data of individuals with normal speech patterns and those with various speech disorders. Once preprocessed data is fed to the trained RNN, it can be used to analyze the sensor data in real time. The features from each time step are extracted and fed into the Recurrent Neural Network (RNN). The said extracted features are compared in the RNN with the expected patterns learned during training. Any significant differences or anomalies are flagged for further analysis. The detected discrepancies are classified based on their similarity to the patterns associated with different types of speech disorders in the training data. This involves calculating the distance between the anomaly and each class in the feature space and assigning the anomaly to the class with the smallest distance, typically using a method called nearest neighbour classification. Finally, the said system outputs the type of speech disorder based on the classification of the anomaly. Said output can then be used for further analysis or intervention.
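The nearest-neighbour assignment described above may be sketched as follows, under the illustrative assumptions that each disorder class is summarized by the centroid of its training feature vectors and that an anomaly is any window whose distance from the “normal” centroid exceeds a chosen threshold; both assumptions are non-limiting simplifications.

```python
import numpy as np

def class_centroids(X_train, y_train):
    """Mean feature vector per class label ('normal', 'dysarthria', ...)."""
    return {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}

def classify_window(x, centroids, normal_label="normal", threshold=2.5):
    """Flag anomalous windows and assign the nearest disorder class."""
    if np.linalg.norm(x - centroids[normal_label]) <= threshold:
        return normal_label                      # no anomaly flagged
    # anomaly: assign the class with the smallest distance in feature space
    others = {c: np.linalg.norm(x - mu)
              for c, mu in centroids.items() if c != normal_label}
    return min(others, key=others.get)
```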
Said system (S) employs a Convolution Neural Network (CNN). The term “Convolution Neural Network (CNN)” used in this specification refers to a classifier used for data analysis and feature extraction specifically for image-based tasks. Said Convolution Neural Network (CNN) analyzes mouth and lip movement in video captured via said infrared camera (SAC). Said camera (SAC) records the mouth and lip movements, and each frame of the video is digitized and stored for further processing. Said digitized frames are preprocessed to enhance the features of interest and suppress the irrelevant ones. This involves noise reduction, normalization, and image enhancement. The frames are then converted into a suitable format, selected from but not limited to grayscale or binary images, for feature extraction.
The preprocessed frames undergo texture mapping and edge detection. The term “texture mapping” used in the specification in the context of the present invention refers to analyzing the spatial distribution of intensity variations or color information in the image, which can highlight the subtle movements of the lips. The term “edge detection” used in this specification in the context of the present invention refers to the process of identifying the boundaries of the lips by locating the points in the image to identify their sharp movements and position displacement. These features are then quantified into a feature vector for each frame.
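For illustration only, a non-limiting sketch of how each digitized frame could be reduced to an edge/texture feature vector is given below; OpenCV is used here as one possible toolkit, and the Canny thresholds and grid size are assumed values rather than features of the present invention.

```python
import cv2
import numpy as np

def frame_features(frame_bgr, grid=(4, 4)):
    """Edge-detection and texture-mapping features for one video frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)             # lip-boundary edge detection

    # Texture mapping: spatial distribution of intensity variation,
    # summarized per cell of a coarse grid over the mouth region
    h, w = gray.shape
    gh, gw = h // grid[0], w // grid[1]
    feats = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            cell = gray[r * gh:(r + 1) * gh, c * gw:(c + 1) * gw]
            edge_cell = edges[r * gh:(r + 1) * gh, c * gw:(c + 1) * gw]
            feats.append(cell.std())                 # texture (intensity variation)
            feats.append(edge_cell.mean() / 255.0)   # edge density
    return np.array(feats, dtype=np.float32)
```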
Said extracted features are fed to said Convolution Neural Network (CNN), so that it can be used to monitor lip movements in real time. Said extracted features of each frame are compared in the CNN with expected patterns learned during training. Any significant differences, or anomalies, are flagged for further analysis. The detected discrepancies are classified based on their similarity to the patterns associated with different types of speech disorders in the training data. This involves calculating the distance between the anomaly and each class in the feature space and assigning the anomaly to the class with the smallest distance. Finally, said system (S) outputs the type of speech disorder based on the classification of the anomalies. This output can then be used for further analysis or intervention.
Said system (S) collects audio data via the audio sensor (SA1) and analyzes it to find irregularities in speech patterns, in the pitch or frequency of the voice, and in the intensity of the sound. Prior to analysis, the said audio data undergoes noise filtering to remove background noise. The resulting audio data is digitized, stored in a standardized format for analysis, and compared in a Convolution Neural Network (CNN) trained with audio data of healthy persons and patients suffering from speech disorders. The audio is labelled either “normal” or with a speech disorder class, based on the analysis result.
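A minimal, non-limiting sketch of this audio preprocessing and measurement is shown below, assuming the audio is available as a NumPy array; the band-pass limits and the use of a dominant-frequency estimate as a pitch proxy are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def denoise(audio, fs, lo=80.0, hi=4000.0):
    """Band-pass filter keeping the typical speech band (illustrative limits)."""
    b, a = butter(4, [lo / (0.5 * fs), hi / (0.5 * fs)], btype="band")
    return filtfilt(b, a, audio)

def audio_descriptors(audio, fs):
    """Pitch proxy (dominant frequency), intensity (RMS) and variability."""
    clean = denoise(audio, fs)
    spectrum = np.abs(np.fft.rfft(clean))
    freqs = np.fft.rfftfreq(len(clean), 1.0 / fs)
    return {
        "dominant_freq_hz": float(freqs[np.argmax(spectrum)]),
        "rms_intensity": float(np.sqrt(np.mean(clean ** 2))),
        "intensity_std": float(np.std(np.abs(clean))),
    }
# These descriptors (or the spectrogram itself) are what the trained CNN
# compares against patterns learned from normal and disordered speech.
```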
The present invention utilizes a multimodal fusion approach for the classification of identified speech disorders. Said approach combines the outputs of said classifiers, namely the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN). Said outputs are fused using techniques such as feature-level or decision-level fusion, depending on the desired emphasis on comprehensive features or on reconciling model interpretations. Said fused output then serves as input for a separate classifier, selected from but not limited to a neural network or a traditional classifier such as a Support Vector Machine (SVM), decision tree, or ensemble method, chosen based on its effectiveness in handling complex, multimodal data. This approach prioritizes accuracy, sensitivity, and specificity in speech disorder classification while maintaining the flexibility to incorporate future advancements in algorithms.
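As a hedged, non-limiting illustration of the decision-level variant (the SVM meta-classifier and the synthetic placeholder data below are assumptions for demonstration; feature-level fusion would instead concatenate the raw feature vectors before a single classifier):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Decision-level fusion: each modality-specific model (CNN for video/audio,
# RNN for airflow/movement) outputs class probabilities for the same window;
# the per-model probability vectors are concatenated into one fused vector.
def fuse_decisions(p_video, p_audio, p_movement):
    return np.concatenate([p_video, p_audio, p_movement])

# Placeholder training set: fused probability vectors with disorder labels
# (3 classes per model -> 9-dimensional fused vectors; data is synthetic).
X_fused = rng.random((200, 9))
y = rng.integers(0, 3, size=200)       # e.g. 0=normal, 1=articulation, 2=fluency

meta_clf = SVC(kernel="rbf", probability=True)   # SVM meta-classifier
meta_clf.fit(X_fused, y)

# At run time, the fused outputs of the modality models yield the final class
sample = fuse_decisions(rng.random(3), rng.random(3), rng.random(3))
final_label = meta_clf.predict(sample.reshape(1, -1))[0]
```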
Following speech disorder classification, said system (S) utilizes speech correction and personalized feedback module (SCA1.3A) which further comprises:
• Automatic Speech Recognition (ASR) module (SCA1.31),
• Natural Language Processing (NLP) module (SCA1.32),
• Text-to-Speech (TTS) or speech synthesizer module (SCA1.33), and
• Voice embedding module (SCA1.34).
Said system (S) employs a speech correction module (SCA1.3A) to identify inconsistencies, errors, or corrections needed, and to provide feedback to the user (U) for improvement. These corrections may involve phonetic correction, which means recommending correct pronunciation for specific words based on the identified articulation errors; grammatical correction, which may include suggesting appropriate sentence rephrasing or word substitutions to address grammatical errors; and semantic correction, which may offer alternative wording or sentence structures to improve clarity and avoid potential misunderstandings.
The term “Personalized feedback” (SCA1.3A) used in this specification refers to a module that provides feedback tailored to the user's specific speech disorder, with the aim of enhancing communication clarity. This feedback may take various forms, including visual cues, audio prompts, or text suggestions.
To provide corrected, personalized feedback, said system (S) utilizes an Automatic Speech Recognition module (SCA1.31) and a Natural Language Processing module (SCA1.32), which work synergistically to transform raw audio data into a rich linguistic representation, enabling said system (S) to effectively analyze and identify speech patterns and potential speech disorders and to generate user (U) targeted practice sessions.
The term “Automatic Speech Recognition” or “ASR module” (SCA1.31) used in this specification refers to a module capable of converting live audio data into text by processing an audio signal through several key stages. Firstly, audio features are analyzed through a feature extraction process, in which relevant characteristics such as pitch and energy are extracted. Next, a built-in acoustic model compares these features to pre-existing databases of sounds and phonemes (the basic units of speech) to identify the most likely spoken sounds. Then, an additional built-in language model analyzes the sequence of sounds, considering grammar and context, to predict the most probable word sequence. Finally, a decoder translates the most likely word sequence into written text.
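The staged flow described above can be outlined as the following non-limiting sketch; every function below is a hypothetical placeholder standing in for the corresponding ASR stage of the module (SCA1.31) and does not correspond to any existing library call.

```python
import numpy as np

def extract_features(audio, fs):
    """Stage 1 (placeholder): derive pitch/energy-style features per 25 ms frame."""
    frame = int(0.025 * fs)
    frames = [audio[i:i + frame] for i in range(0, len(audio) - frame, frame)]
    return np.array([[np.sqrt(np.mean(f ** 2)), np.abs(f).max()] for f in frames])

def acoustic_model(features):
    """Stage 2 (placeholder): map frame features to the most likely phonemes."""
    return ["p", "l", "ey"]

def language_model(phonemes):
    """Stage 3 (placeholder): score candidate word sequences using grammar/context."""
    return [("play", 0.9), ("pray", 0.1)]

def decode(hypotheses):
    """Stage 4 (placeholder): pick the most probable word sequence as text."""
    return max(hypotheses, key=lambda h: h[1])[0]

fs = 16000
audio = np.zeros(fs)                    # placeholder one-second audio signal
text = decode(language_model(acoustic_model(extract_features(audio, fs))))
```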
Said text output, in the form of a word sequence, from the ASR module (SCA1.31) is subsequently fed into said Natural Language Processing (SCA1.32) module. The term “Natural Language Processing” or “NLP module” (SCA1.32) used in this specification refers to a module capable of performing speech analysis which includes phoneme recognition, syntax analysis, and the extraction of unusual phoneme combinations, grammatical errors, or semantic inconsistencies. Said NLP module (SCA1.32) adds a crucial layer of linguistic analysis to the user's (U) speech, delving deeper than the raw audio information. This analysis encompasses various tasks, including:
• Phonetic and Pronunciation Matching: the said user's (U) pronunciation of individual phonemes is compared against a reference database or user-specific pronunciations for similarity evaluation. This step aims to identify potential articulation errors.
• Morphological Analysis: speech segmentation techniques are employed to divide the speech into individual words. Subsequently, morphological analysis examines word formation patterns (e.g., prefixes, suffixes). This analysis helps identify patterns related to specific speech disorders.
• Syntax Analysis: said NLP module (SCA1.32) parses the sentence structure to understand the grammatical relationships between words and identify potential errors using techniques such as constituency parsing or dependency parsing. Deviations from typical syntax can be indicative of specific language-based speech disorders.
• Semantic Analysis: this component aims to understand the meaning of words and sentences, including homonyms and discourse coherence, utilizing techniques such as Word Sense Disambiguation (WSD) and topic modeling. Analyzing the meaning helps identify potential issues with understanding or expressing ideas, contributing to the detection of specific speech disorders.
Said analysis helps the NLP module (SCA1.32) gather insight into articulation, grammatical, and other relevant errors in speech. Based on this analysis, said module (SCA1.32) extracts specific information about the identified errors, such as incorrectly pronounced phonemes, grammatical mistakes, or ambiguous sentence structures. Using this information, said module (SCA1.32) helps generate user-specific feedback and disorder-targeted practice sessions.
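As a hedged, simplified sketch of the phonetic-matching and error-extraction steps (the reference pronunciations and the similarity threshold below are illustrative placeholders; a full NLP module (SCA1.32) would additionally perform the morphological, syntactic, and semantic analyses described above):

```python
from difflib import SequenceMatcher

# Hypothetical reference pronunciations (phoneme strings) for target words
REFERENCE = {"rabbit": "r ae b ih t", "yellow": "y eh l ow"}

def phonetic_match(word, produced_phonemes, threshold=0.85):
    """Compare produced phonemes against the reference pronunciation.

    Returns None when the production is close enough; otherwise returns an
    error record that can feed personalized feedback and targeted practice.
    """
    ref = REFERENCE.get(word)
    if ref is None:
        return None
    similarity = SequenceMatcher(None, ref.split(), produced_phonemes.split()).ratio()
    if similarity >= threshold:
        return None
    return {"word": word, "expected": ref, "produced": produced_phonemes,
            "similarity": round(similarity, 2)}

# Example: a /r/ -> /w/ substitution typical of an articulation error
print(phonetic_match("rabbit", "w ae b ih t"))
```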
The present system (S) incorporates a text-to-speech module (SCA1.33). The term “text-to-speech module” or “speech synthesizer module” (SCA1.33) used in this specification refers to a module which converts generated text (phonemes) into natural-sounding speech. Said speech synthesizer (SCA1.33) analyzes the phonemes and adjusts factors such as pitch, speed, and prosody (the rhythm and emphasis of speech), through which the synthesizer aims to mimic natural human speech. For a more realistic user (U) experience, the said system (S) incorporates a voice embedding module (SCA1.34). The term “voice embedding” used in this specification refers to personalized, synthesized speech incorporating the user's unique voice characteristics. Said technology captures and securely saves the user's unique voice characteristics, such as pitch, accent, intonation, and speaking style, received via the audio sensor (SA1), and incorporates them into said synthesized speech output. This personalization allows the generated speech to sound closer to the user's own voice.
Said system (S) employs a real-time practice module (SCA1.3) integrated within the communication assistant application (SCA) installed on the communication unit (CU). The practice sessions aim to enhance muscle memory and promote accurate articulation by generating disorder-specific practice sessions for the user and recording progression over time.
To visualize and track user progress based on the user's performance in targeted practice sessions, said system (S) additionally employs a user progress visualization and tracking module (SCA1.4). The term “user progress visualization and tracking module” (SCA1.4) used in this specification refers to a module that represents user progress in a graphical format, such as bars or charts, based on the targeted practice sessions delivered through the personalized feedback and practice module (SCA1.3). This graphical representation allows users to monitor their progress in addressing their speech disorder.
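By way of illustration only (the session names and accuracy scores below are placeholder data), the graphical representation could be rendered as a simple bar chart:

```python
import matplotlib.pyplot as plt

# Placeholder practice-session results for one user (percent accuracy)
sessions = ["Week 1", "Week 2", "Week 3", "Week 4"]
accuracy = [62, 68, 75, 83]

plt.figure(figsize=(5, 3))
plt.bar(sessions, accuracy)
plt.ylim(0, 100)
plt.ylabel("Accuracy (%)")
plt.title("Targeted practice progress")
plt.tight_layout()
plt.savefig("progress_chart.png")   # chart displayed to the user in the app UI
```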
For user data security, said system (S) utilizes a cloud storage module (SCA1.6). Additionally, said system (S) facilitates a Telehealth module (SCA1.5). The term “Telehealth module” (SCA1.5) used in this specification refers to a module integrated within the communication assistant application (SCA) installed on the communication unit (CU). Said module is capable of providing timely intervention from a healthcare professional when the user experiences significant difficulty in progressing or encounters severe speech disorder symptoms. This allows the user to connect with a specialized healthcare professional for further assessment and treatment.
Said system (S) employs a cloud storage facility (SCA1.6) with advanced encryption methodologies, strictly adhering to international data protection standards including but not limited to HIPAA and GDPR. Said storage facility (SCA1.6) provides secure handling of user (U) data. Furthermore, robust user consent mechanisms are in place, ensuring that users retain control over their personal information, thereby upholding ethical standards and user trust.
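Purely as an illustrative, non-limiting sketch (the Fernet symmetric scheme from the Python `cryptography` package is used here as one possible encryption method; key management and the cloud upload step are placeholders):

```python
from cryptography.fernet import Fernet

# In practice the key would be generated once and held in a secure
# key-management service rather than alongside the data.
key = Fernet.generate_key()
cipher = Fernet(key)

user_record = b'{"user_id": "U-001", "session": "week3", "accuracy": 0.83}'

# Encrypt before transmission to the cloud storage module (SCA1.6)
encrypted = cipher.encrypt(user_record)
# ... upload `encrypted` to cloud storage (placeholder) ...

# Decrypt only upon authorized retrieval, e.g. for a telehealth consultation
restored = cipher.decrypt(encrypted)
assert restored == user_record
```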
Claims:
WE CLAIM:
1. A multimodal system (S) for speech disorder detection and communication assistance, comprising:
• a user (U) wearable headset (D);
• a sensor array (SA) integrated into an extended arm (EMS) of said user wearable headset (D);
• optoelectronics adhesive sensors (SAO);
• a communication unit (CU) in communication with said user-wearable headset (D);
• a communication assistant application (SCA) capable of being installed on said communication unit (CU) by said user (U),
wherein said communication assistant application (SCA) comprises:
• a user interface module (SCA1.1);
• a data analysis module (SCA1.2);
• a speech correction and personalized feedback module (SCA1.3A);
• a practice module (SCA1.3);
• a user progress visualization and tracking module (SCA1.4);
• a telehealth module (SCA1.5); and
• a cloud storage module (SCA1.6),
wherein said headset (D) and communication unit (CU) work in conjunction to provide a comprehensive and personalized solution for speech disorder detection, classification, personalized treatment, and communication assistance.
2. The system as claimed in claim 1, wherein said sensor array (SA) comprises:
• an anti-fogging hydrophobic layer coated infrared camera (SAC) configured to record articulatory organ movement including but not limited to the lip, mouth, lower jaw, and neck;
• thermocouple sensors (SAT1) to measure temperature variation of nasal and oral airflow;
• piezoelectric sensors (SAP1) to measure subtle vibrations and pressure changes in oral and nasal airflow; and
• an audio sensor (SA1) to record audio of the said user.
3. The system as claimed in claim 1, wherein said optoelectronics adhesive sensors (SAO) are wearable and are configured to gather movement data from critical articulatory organs of said user (U), including, but not limited to, the lower jaw and neck.
4. The system as claimed in claim 1, wherein said communication unit (CU) is a dedicated device including but not limited to a smart mobile, tablet, or any other similar device with display and data processing capabilities.
5. The system as claimed in claim 1, wherein said sensor array (SA) collects multisensory data including audio, video, nasal, oral airflow, and articulatory organs movement.
6. The system as claimed in claim 1, wherein said communication assistance application (SCA) receives sensors’ data via user wearable device (D) for feature extraction.
7. The system as claimed in claim 5, wherein said data analysis module (SCA1.2) utilizes classifiers selected from, but not limited to, a Convolution Neural Network (CNN) and a Recurrent Neural Network (RNN).
8. The system as claimed in claim 7, wherein said Convolution Neural Network is configured to analyze mouth and lip movement in video captured via said infrared camera (SAC) for data analysis and feature extraction.
9. The system as claimed in claim 7, wherein said Recurrent Neural Network (RNN) is configured to analyze and process signals received via optoelectronics sensors (SAO), nasal airflow sensor (SNA), oral airflow sensors (SOA).
10. The system as claimed in claim 1, wherein said data analysis module (SCA1.2) is configured to merge outputs from said convolutional neural network (CNN) and Recurrent Neural Network (RNN) to identify and classify user’s speech disorder.
11. The system as claimed in claim 1, wherein said data analysis module (SCA1.2) is configured to classify detected disorder into its category.
12. The system as claimed in claim 1, wherein said speech correction and personalized feedback module (SCA1.3A) comprising:
• an automatic speech recognition module (SCA1.31);
• a natural language processing module (SCA1.32);
• a text-to-speech module (SCA1.33); and
• a voice embedding module (SCA1.34).
13. The system as claimed in claim 12, wherein said Automatic speech recognition module (SCA1.31) is configured to receive audio data via audio sensor and convert it into text format.
14. The system as claimed in claim 12, wherein said natural language processing module (SCA1.32) is configured to analyze output of said Automatic speech recognition (SCA1.31) and gather inconsistencies in phoneme, syntax, grammar, or sentence semantics.
15. The system as claimed in claim 12, wherein said voice embedding module (SCA1.34) is configured to gather user’s (U) unique voice characteristics, including pitch, accent, intonation, and incorporates them into the synthesized speech output.
16. The system as claimed in claim 12, wherein said text to speech (SCA1.33) module provides synthesized voice output by utilizing said voice embedding modules (SCA1.34) features for providing corrected speech feedback to said user (U).
17. The system as claimed in claim 1, wherein said practice module (SCA1.3) is configured to provide practice sessions based on the user's disorder and is capable of adjusting the complexity of the practice sessions.
18. The system as claimed in claim 1, wherein said user progress monitoring and tracking module (SCA1.4) is configured to represent the user's progress in a graphical format, such as bars or charts, based on performance in targeted practice sessions provided via the personalized feedback and practice module (SCA1.3).
19. The system as claimed in claim 1, wherein the telehealth module (SCA1.5) enables remote consultations with speech therapists for user-specific guidance and therapy adjustments.
20. The system as claimed in claim 1, wherein said cloud storage module (SCA1.6) enables said system (S) to securely store user’s personal data remotely.
21. The system as claimed in claim 20, wherein said cloud storage facility employs advanced data privacy and security protocols, including adherence to international data protection regulations and robust user consent mechanisms, to ensure the ethical management and security of sensitive health information.
22. An apparatus for tracking, receiving, and collecting data related to the movement of articulatory organs involved in speech production, comprising:
• a user (U) wearable headset (D);
wherein said user (U) wearable headset (D) is capable of integrating a sensor array (SA) on an extended arm (EMS) adjustably attached to said headset (D),
characterized in that
said sensor array (SA) comprises:
• an audio sensor (SA1) for capturing speech audio data,
• an infrared camera (SAC) for capturing continuous visual data related to lip and mouth movements,
• a nasal airflow sensor (SNA) for capturing data on nasal airflow during speech, and
• an oral airflow sensor (SOA) for capturing data on oral airflow during speech,
wherein said sensor array (SA) integrated into the extended arm (EMS) is configured to be adjustably positioned close to said user's lips, mouth, and nose for data collection, and
wherein said apparatus is capable of identification and categorization of a broad spectrum of speech disorders in various languages and dialects.