
A System And Method For Providing Speech Training

Abstract: A system and method for providing speech training is disclosed, wherein a handheld device present with the user undergoing speech training is utilized for providing the training. The system includes a first input module and a second input module which cooperate with the handheld device to receive voice signals corresponding to the user's articulation and video signals corresponding to the facial expressions of the user, respectively. The system further includes a repository which stores signals corresponding to ideal articulations having ideal characteristics. The system facilitates comparison of the ideal characteristics with the characteristics of the voice signals that represent the user's articulation and subsequently provides a recommendation based on the comparison. The recommendation includes at least one graphical representation depicting the comparison between the characteristics corresponding to ideal articulation and the characteristics corresponding to the voice signals that represent the user's articulation.


Patent Information

Application #: 1584/MUM/2011
Filing Date: 27 May 2011
Publication Number: 49/2012
Publication Type: INA
Invention Field: ELECTRONICS
Status:
Email: dewan@rkdewanmail.com
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2023-03-20
Renewal Date:

Applicants

TATA CONSULTANCY SERVICES LTD
NIRMAL BUILDING, 9TH FLOOR, NARIMAN POINT, MUMBAI 400 021, MAHARASHTRA, INDIA

Inventors

1. PANDE ARUN
TCS INNOVATION LAB, SDC-5, ODC-G, TATA CONSULTANCY SERVICES, YANTRA PARK, OPP. VOLTAS HRD TRG. CENTER, SUBHAS NAGAR, POKHARAN ROAD 2, THANE 400601, MAHARASHTRA, INDIA
2. KOPPARAPU SUNIL KUMAR
TCS INNOVATION LAB, SDC-5, ODC-G, TATA CONSULTANCY SERVICES, YANTRA PARK, OPP. VOLTAS HRD TRG. CENTER, SUBHAS NAGAR, POKHARAN ROAD 2, THANE 400601, MAHARASHTRA, INDIA
3. PANDEY VINOD KUMAR
TCS INNOVATION LAB, SDC-5, ODC-G, TATA CONSULTANCY SERVICES, YANTRA PARK, OPP. VOLTAS HRD TRG. CENTER, SUBHAS NAGAR, POKHARAN ROAD 2, THANE 400601, MAHARASHTRA, INDIA

Specification

FORM-2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2006
COMPLETE SPECIFICATION
(See section 10 and rule 13)
A SYSTEM AND METHOD FOR PROVIDING SPEECH TRAINING
TATA CONSULTANCY SERVICES LTD.
an Indian Company of Nirmal Building, 9th floor, Nariman Point, Mumbai 400021, Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.

FIELD OF THE INVENTION
The present invention relates to the field of speech training. Particularly, the present invention relates to a system that facilitates analysis of speech and subsequently facilitates providing remote speech training to persons, in need thereof.
DEFINITIONS OF TERMS USED IN THIS SPECIFICATION
The term 'handheld device' in this specification relates to any portable device which has a display. Handheld devices include, for instance, laptops, mobile phones, iPads, iPhones, business phones, cell phones and the like.
BACKGROUND OF THE INVENTION
A large number of people suffer from articulatory problems. Articulatory problems can be caused as a result of the presence of a cleft lip, a cleft palate, autism, cerebral palsy, hearing impairment, stuttering and the like. Children born with cleft palates normally suffer from articulatory problems and warrant special consideration and treatment. Persons with clefts in the lip or palate find it difficult to engage in fluent articulation of speech. Even after undergoing surgery to overcome speech related problems, these people are required to undertake speech training for a certain period of time in order to acquire the ability to engage in proper articulation of speech.
Speech training is administered to such persons in order to provide them with proper articulatory skills and better articulatory abilities. A speech therapist administering speech training to such persons designs a dictionary, which is a collection of syllables/words/sentences, and subsequently instructs the person undergoing speech training to articulate those words. The speech therapist also makes sure that the person practices articulating the displayed words.
One of the major requirements of administering speech training to people suffering from articulatory problems is that they are required to travel to hospitals or speech training centres at regular intervals of time. For people residing in remote geographic locations, travelling to hospitals and back on a regular basis might be troublesome and also expensive. Moreover, such people also encounter transportation related issues. If the hospital is in an area that is not well connected, then visiting such a hospital on a regular basis for the purpose of obtaining speech training can be cumbersome.
Another factor that needs to be taken into consideration is the dearth of qualified speech therapists. The lack of qualified speech therapists severely impacts the quality of speech training imparted to people suffering from articulatory problems. It also means that people suffering from articulatory problems are forced to wait longer in order to undergo speech training. Due to the unevenness in the ratio of the number of people suffering from articulatory problems to the number of speech therapists, speech therapists are often required to provide speech training to a large number of people within a very short span of time so as to be able to keep up with and handle the inflow of people suffering from articulatory problems.
Since there is a shortage of qualified speech therapists and the number of people suffering from articulatory problems is on the rise, speech therapists who are qualified enough to handle such people are overburdened. Since speech therapists are forced to work long hours, there is a possibility that they might not be able to dedicate a sufficient amount of time to every person that they provide speech training to. The occurrence of such a phenomenon directly affects the quality of the speech training a particular person receives and the efficiency of the speech therapists.
There have been various attempts in the prior art towards developing a system that would provide speech therapists with the option of conducting speech training sessions online.
Great Britain Patent 1588217 teaches a method for assisting in speech training. The method in accordance with this Patent involves detecting speech sounds of a person and producing an audibly reproducible recording of the speech sounds. The recorded speech sounds are further mapped onto a substantially permanent graphic record in terms of variations in the speech sounds with respect to time. Based on the mappings, nasal airflow concurrent with the speech sounds is determined and subsequently mapped onto a substantially permanent graphic record in terms of variations in the nasal airflow. The two graphic records mentioned above provide a visual record of the nasal airflow concurrent with the speech sounds, as well as an audible record of the speech sounds. The approach envisaged by the above mentioned patent is not automated and necessitates the physical presence of a speech therapist close to the person undergoing speech training.
WIPO Patent Application WO0239423 teaches providing a computer-based interactive speech training service to people with speech related problems. The system, in accordance with this Patent Application, can be installed on a multimedia computer with a sound card and a headset or on a network server to which the multimedia computer is connected. The system acquires descriptive data about a person's articulatory problem and automatically detects defects of articulation on the basis of speech analysis of the person. The system envisaged by this Patent Application performs automatic diagnosis of the articulatory problem. The system also provides information about adequate means of training or a schedule of necessary speech training, apart from providing for the storing of the person's speech samples and retrieval of previously stored speech samples. However, the system envisaged by this Patent Application restricts itself to remedying only a stuttering problem and cannot be utilized to treat people suffering from the presence of a cleft lip or a cleft palate.
United States Patent 7031922 teaches a method for enhancing the fluency of persons who stutter while speaking. The method displays (on frames of eyeglasses) visual speech gestures on a display device while a person having a stuttering or speech impediment is speaking, so that the person is able to visually perceive the articulatory movements of the person's mouth provided on the display and improve the way he/she speaks. However, the system envisaged by this patent is restricted to remedying a stuttering problem and cannot be extended to provide speech training to people suffering from the presence of a cleft lip or a cleft palate.
United States Patent 6971993 teaches a method for utilizing oral movements in speech assessment, speech training, language development and controlling external devices. This US Patent teaches using a device having a sensor plate which detects the point of contact of the tongue with the sensor plate. The system in accordance with this Patent provides for visual representations of contact between the tongue and the palate during speech and for comparing the representations with model representations displayed in a split screen fashion. The model representations may be generated by another speaker or can be generated by a computer. The model representations are capable of being electronically stored and retrieved. This US Patent makes use of a palatometer for the purpose of providing speech training and speech assessment, and this system cannot be utilized to administer speech training to a person who is at a remote or faraway destination.
Canadian Patent Application 2519755 discloses a device for determining the occurrence of voice abuse. The device senses the voice signal of a person via a microphone and background noise via a belt worn at the waist. The voice signal of the person is separated from the background noise using a low power, high CMRR (Common Mode Rejection Ratio) instrumentation amplifier. The sensed signal is thereafter passed on to a microcontroller where the signal energy of the corresponding sensed signal is calculated. The device provides physicians with adequate information necessary for proper diagnosis, provided that the person undergoing training wears the device continuously for several days. A physician can retrieve information stored in the device by connecting the device to the serial port of a computer. The device envisaged by this Patent Application is restricted to determining voice abuse based on comparing the voice signal energy with a predefined threshold and cannot be utilized to provide and administer speech training to people suffering from articulatory problems.
United States Patent Application 2009138270 provides speech training to a learner. The method envisaged by this Patent Application includes receiving a speech signal at a computing system which corresponds to an utterance made by the learner. A set of parameters is ascertained from the speech signal. The parameters represent a contact pattern between the tongue and the palate of the learner during the utterance. For each parameter in the set of parameters, a deviation measure is calculated relative to a corresponding parameter from a set of normative parameters characterizing a proper pronunciation of the utterance. An accuracy score for the utterance, relative to its ideal pronunciation, is generated from the deviation measure. The accuracy score is provided to the learner to visualize the accuracy of the utterance relative to its ideal pronunciation. The system envisaged by this US Patent Application is restricted to providing speech training to learners and cannot be utilized to provide and administer speech training to people suffering from articulatory problems. Moreover, this US Patent Application does not disclose providing speech training through a handheld device such as a mobile phone present with the person undergoing speech training.
US Patent 6732076 describes a system which presents a symbolic representation of a word, prompts the user to articulate the word represented by the symbol into a microphone which is in signal communication with a processor, enters a phonetic representation of the user's pronunciation into the processor, automatically determines whether an error exists in the user's pronunciation and, if an error exists, automatically categorizes the occurred error. This method always requires a speech therapist to enter a phonetic representation of the user's pronunciation of the word into the computer. The system envisaged by this US Patent does not disclose providing speech training through a handheld device such as a mobile phone that is present with the person undergoing speech training.
None of the above mentioned Patent documents disclose a system which can be utilized for:
• conducting remote speech training;
• providing speech training to a person through a hand held device such as a GPRS (General Packet Radio Service) enabled mobile phone present with him/her;
• dynamically selecting and displaying the words to be articulated by a person during the course of a speech training session, based on the articulatory improvements witnessed in the person;
• facilitating online as well as offline interaction between a speech therapist and the person undergoing speech training;
• providing a speech therapist with the option of retrieving a person's surgical history and his/her speech and articulatory movements online; and
• providing speech training to a person in a language of his/her choice.
Hence, in order to overcome the above mentioned shortcomings, there was felt a need for a system, a computer based system in particular, that can be utilized for providing and administering speech training to persons suffering from articulatory problems. There was a need for a system which did not require the people suffering from articulatory problems to make frequent trips to a hospital. There was a need for a system which provided speech training to people suffering from articulatory problems through a hand held device such as a GPRS enabled mobile phone present with them.
OBJECTS OF THE INVENTION
It is an object of the present invention to provide a system for effectively administering speech training.
Another object of the present invention is to provide a system that can be utilized to administer speech training without requiring the person suffering from articulatory problems to personally visit the speech therapist.
Yet another object of the present invention is to make available a system which provides feedback pertaining to the improvement in articulatory abilities of the person undergoing speech training, online.
A further object of the present invention is to make available a system which can be utilized to provide speech training through a handheld device present with the person undergoing speech training.
Another object of the present invention is to provide a system which displays to the person undergoing speech training the word to be articulated, in a language of his/her choice.
Still a further object of the present invention is to provide a system which facilitates real time communication as well as offline communication between the person undergoing speech training and the speech therapist.
Yet another object of the present invention is to provide a system which provides visual feedback on the handheld device present with the person undergoing speech training.
A still further object of the present invention is to provide a system which facilitates administration of speech training to people who are located in remote geographic locations.
Another object of the present invention is to make available a system which provides speech therapists with easy access to all the available surgical history, speech signals and video signals of each training session corresponding to a particular person undergoing speech training.
Yet another object of the present invention is to provide a system that helps enhance the productivity of speech therapists.
One more object of the present invention is to make available a system which allows speech therapists to decide upon the words to be displayed to a person for articulation, based on the progress in articulation of the person.
Another object of the present invention is to make available a system that automatically decides about the words to be displayed to a person for articulation, based on the progress achieved by the person in properly articulating words.
Yet another object of the present invention is to provide a system that makes it possible for speech therapists to easily transfer surgical history of users to experts and subsequently seek their opinion.
A still further object of the present invention is to make available a system that provides speech training seekers with trouble-free and timely access to speech therapists.
SUMMARY OF THE INVENTION
In accordance with the present invention, there is provided a system for providing speech training to a user. The system, in accordance with the present invention includes the following components:
• first input module configured to receive signals corresponding to at least one articulation of the user;
• processing module configured to determine the characteristics of received signals;
• repository adapted to store signals corresponding to ideal articulations having ideal characteristics;
• comparison module adapted to perform comparison of the characteristics of the received signals with the corresponding ideal characteristics received from the repository; and
• recommendation module adapted to generate at least one recommendation based on the comparison and adapted to generate iteratively, based on the comparison, words, syllables or sentences in at least one language.
Typically, in accordance with the present invention, the characteristics include at least one characteristic selected from the group consisting of speech energy, pitch, speech formant, speech signal level, jitter, shimmer, spectral tilt, spectral balance, spectral moments, spectral decrease, spectral slope and spectral roll-off.
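By way of illustration, the sketch below shows how a handful of these characteristics (speech energy, pitch, jitter and shimmer) might be derived from a recorded voice signal. The specification does not prescribe any particular algorithm, so the frame sizes, the autocorrelation pitch estimator, the 50-400 Hz search band and the voicing threshold are all illustrative assumptions.

```python
import numpy as np

def frame_signal(x, sr, frame_ms=30, hop_ms=10):
    """Split a mono signal into overlapping analysis frames."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = max(0, 1 + (len(x) - frame) // hop)
    if n == 0:
        return np.zeros((0, frame))
    return np.stack([x[i * hop:i * hop + frame] for i in range(n)])

def pitch_per_frame(frames, sr, fmin=50.0, fmax=400.0):
    """Crude autocorrelation F0 estimate; 0.0 marks unvoiced frames."""
    lo, hi = int(sr / fmax), int(sr / fmin)
    f0 = []
    for f in frames:
        f = f - f.mean()
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]
        if ac[0] <= 0:          # silent frame
            f0.append(0.0)
            continue
        lag = lo + int(np.argmax(ac[lo:hi]))
        f0.append(sr / lag if ac[lag] > 0.3 * ac[0] else 0.0)
    return np.array(f0)

def characteristics(x, sr):
    """Derive a few of the named characteristics from a voice signal."""
    frames = frame_signal(np.asarray(x, dtype=float), sr)
    energy = float(np.mean(frames ** 2)) if len(frames) else 0.0
    f0 = pitch_per_frame(frames, sr)
    voiced = f0[f0 > 0]
    periods = 1.0 / voiced if voiced.size else np.array([])
    # Jitter: mean cycle-to-cycle period change relative to the mean period.
    jitter = (float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))
              if periods.size > 1 else 0.0)
    peaks = np.abs(frames).max(axis=1) if len(frames) else np.array([0.0])
    # Shimmer: analogous relative variation of per-frame peak amplitude.
    shimmer = (float(np.mean(np.abs(np.diff(peaks))) / np.mean(peaks))
               if np.mean(peaks) > 0 else 0.0)
    return {"energy": energy,
            "pitch_hz": float(np.median(voiced)) if voiced.size else 0.0,
            "jitter": jitter, "shimmer": shimmer}

# Example: a 150 Hz tone stands in for a sustained vowel.
sr = 16000
t = np.arange(sr) / sr
print(characteristics(0.5 * np.sin(2 * np.pi * 150 * t), sr))
```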
Typically, in accordance with the present invention, the repository further comprises at least one list of syllables, words, or sentences.
Typically, in accordance with the present invention, the recommendation module further comprises a dictionary module adapted to receive and store said words, syllables or sentences in at least one language from the repository based on the comparison.
Typically, in accordance with the present invention, the system further comprises a second input module to receive video signals of the facial expressions of the user that correspond to at least one articulation of the user.
Typically, in accordance with the present invention, the system includes a receiver module adapted to receive signals corresponding to ideal articulations and syllables, words, or sentences from the repository.
Typically, in accordance with the present invention, the system includes a hand held device adapted to cooperate with at least one module selected from the group of modules consisting of first input module, second input module, repository, recommendation module and receiver module.
Typically, in accordance with the present invention, the handheld device is provided with a graphical user interface and a display module for displaying syllables, words or sentences received from the repository and facial expressions corresponding to the at least one articulation of the user.
In accordance with the present invention, there is provided a system for providing speech training to a user. The system in accordance with the present invention includes the following components:
• first input module configured to receive voice signals corresponding to at least one articulation of the user;
• processing module configured to determine characteristics of the received voice signals;
• receiver module to receive signals corresponding to ideal articulations and a list of probable syllables, words, or sentences from the repository;
• comparison module to compare the characteristics of received voice signals with the corresponding ideal characteristics received from the receiver module; and
• recommendation module to generate at least one recommendation based on the comparison and the list of probable syllables, words, or sentences.
Typically, in accordance with the present invention, the system further comprises a graphical user interface and a display module for displaying syllables, words or sentences received from the repository and facial expressions corresponding to the at least one articulation of the user.
Typically, in accordance with the present invention, the recommendation module further comprises a dictionary module for storing and receiving syllables, words, or sentences in at least one language from the repository based on the comparison.
Typically, in accordance with the present invention, the recommendation module is further adapted to generate iteratively, based on the comparison, words, syllables or sentences in at least one language.
Typically, in accordance with the present invention, the system further comprises a second input module to receive video signals of the facial expressions of the user corresponding to the at least one articulation of the user.
In accordance with the present invention, there is provided a method for providing speech training to a user. The method in accordance with the present invention includes the following steps:
• receiving signals corresponding to at least one articulation of the user;
• determining characteristics of the received signals;
• receiving a set of ideal characteristics corresponding to the received signals;
• comparing the characteristics of the received signals with the corresponding ideal characteristics; and
• generating at least one recommendation based on the comparison.
Typically, in accordance with the present invention, the method further comprises the step of storing in a dictionary module and retrieving therefrom syllables, words, or sentences for the purpose of comparison.
Typically, in accordance with the present invention, the method further comprises the step of receiving video signals of the facial expressions of the user corresponding to the at least one articulation of the user.
Typically, in accordance with the present invention, the characteristics include at least one characteristic selected from the group consisting of speech energy, pitch, speech formant, speech signal level, jitter, shimmer, spectral tilt, spectral balance, spectral moments, spectral decrease, spectral slope and spectral roll-off.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The present invention will now be described with reference to the accompanying drawings, in which:
FIGURE 1 illustrates a first embodiment of the system for providing speech training to users having articulatory problems;
FIGURE 2 illustrates a second embodiment of the system for providing speech training to users having articulatory problems; and
FIGURE 3 illustrates a flowchart for a method of providing speech training to users having articulatory problems.
DETAILED DESCRIPTION
The invention will now be described with reference to the accompanying drawings which do not limit the scope and ambit of the invention. The description provided is purely by way of example and illustration.

The present invention envisages a system and method for providing and administering speech training to users suffering from articulatory problems. The present invention makes it possible for speech training to be administered even to those users who reside in remote geographic locations. The present invention allows users to undergo speech training without visiting a hospital and personally consulting a speech therapist. The system in accordance with the present invention is adapted to provide speech training to users through a handheld device such as a GPRS (General Packet Radio Service) enabled mobile phone present with them. The present invention does not require the user to wear any articulatory sensors or any other special device for the purpose of obtaining speech training.
The system, in accordance with the present invention, captures a user's articulation or speech prior to the user undergoing surgery to correct his/her articulatory problem, using a handheld device such as a GPRS enabled mobile phone. Through the handheld device, the user is guided to articulate or speak a set of words, syllables or sentences. The articulation of the user is captured and subsequently recorded using the handheld device. The captured articulation is converted into speech signals and at least one characteristic of the speech signals selected from the group of characteristics consisting of speech energy, pitch, speech formant, speech signal level, jitter, shimmer, spectral tilt, spectral balance, spectral moments, spectral decrease, spectral slope and spectral roll-off is determined. The determined characteristics along with the surgical history of the user and the family background of the user are transmitted to and subsequently stored in a repository. The characteristics of the speech signals, the family background of the user and the surgical history of the user stored in the repository serve as a reference for speech therapists to determine the progress and improvement in the articulatory abilities of the user undergoing speech training.
The system in accordance with the present invention is also utilized to capture the user's articulation or speech after he/she has undergone surgery to correct his/her articulatory problems. The user's articulation or speech after the surgery is also captured using a handheld device such as a GPRS enabled mobile phone. Through the handheld device, the user is guided to articulate or speak a set of words, syllables or sentences. The articulation of the user is captured and subsequently recorded using the handheld device. The captured articulation is converted into speech signals and at least one characteristic of the speech signals selected from the group of characteristics consisting of speech energy, pitch, speech formant, speech signal level, jitter, shimmer, spectral tilt, spectral balance, spectral moments, spectral decrease, spectral slope and spectral roll-off is determined. The determined characteristics that correspond to the speech signals representing the words, syllables or sentences articulated by the user are compared against the characteristics corresponding to ideal articulation of the same words, syllables or sentences, the ideal characteristics being selected from the same group of characteristics. A closeness metric determining the deviation between the characteristics corresponding to the speech signals representing the words, syllables or sentences articulated by the user and the characteristics corresponding to the ideal articulation of the same words, syllables or sentences is produced. The closeness metric produced by the system of the invention is essentially a graphical representation of the comparison between the characteristics corresponding to ideal articulation of words, syllables or sentences and the characteristics corresponding to the user's articulation of the same words, syllables or sentences. The closeness metric provided to the user through the handheld device acts as visual feedback and helps the user in ascertaining his/her progress in terms of improvement in his/her articulatory abilities after surgery.
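The specification describes the closeness metric only as a graphical representation of the deviation between the user's and the ideal characteristics, so the sketch below assumes a simple per-characteristic relative deviation and a bar chart as the graphical feedback. The characteristic names and values are illustrative.

```python
import matplotlib.pyplot as plt

def closeness(user, ideal):
    """Relative deviation per characteristic; 0.0 means a perfect match."""
    return {k: abs(user[k] - ideal[k]) / abs(ideal[k]) if ideal[k] else 0.0
            for k in ideal}

def plot_comparison(user, ideal, path="feedback.png"):
    """Bar chart contrasting ideal and user characteristics side by side."""
    keys = list(ideal)
    x = range(len(keys))
    plt.figure()
    plt.bar([i - 0.2 for i in x], [ideal[k] for k in keys],
            width=0.4, label="ideal articulation")
    plt.bar([i + 0.2 for i in x], [user[k] for k in keys],
            width=0.4, label="user articulation")
    plt.xticks(list(x), keys, rotation=30)
    plt.legend()
    plt.tight_layout()
    plt.savefig(path)   # the image that would be sent to the handheld device

# Illustrative values only.
ideal = {"energy": 0.012, "pitch_hz": 210.0, "jitter": 0.008, "shimmer": 0.04}
user  = {"energy": 0.009, "pitch_hz": 195.0, "jitter": 0.021, "shimmer": 0.07}
print(closeness(user, ideal))
plot_comparison(user, ideal)
```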
Referring to the accompanying drawings, FIGURE 1 illustrates the first embodiment of the present invention. The system 10, in accordance with the present invention, includes a handheld device 12 cooperating with a repository denoted by the reference numeral 14 to receive words, syllables or sentences to be articulated by a user undergoing speech training. The repository 14 includes at least one list of words, syllables or sentences that will be provided for articulation to the user undergoing speech training. Words, syllables or sentences stored in the repository 14 can be arranged in terms of the complexity associated with them, the difficulty associated with articulating them and the like. The repository 14 is adapted to store the words, syllables or sentences in at least one language.
In accordance with the present invention, the handheld device 12 utilized for the purpose of providing speech training is essentially a GPRS enabled mobile phone. However, other kinds of handheld devices can also be utilized for the purpose of providing speech training to users, without departing from the scope of the invention.
The handheld device 12 includes a display module denoted by the reference numeral 12A which is adapted to display a graphical user interface (not shown in figures). The graphical user interface is adapted to optionally display on the handheld device 12, the words, syllables or sentences to be articulated by the user undergoing speech training. The words, syllables or sentences displayed on the graphical user interface of the display module 12A are selected by the system 10 based on the age of the user undergoing speech training, severity of articulatory problem, linguistic profile of the user, reading ability of the user and the like. The graphical user interface of the display module 12A provides the user with the option of selecting a language in which the words, syllables or sentences will be displayed. The words, syllables or sentences are displayed on the graphical user interface of the display module 12A one after the other in a language selected by the user and subsequently, the user is prompted to articulate the displayed words, syllables or sentences.
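As a rough illustration of such selection logic, the hypothetical sketch below filters a repository word list by a complexity score against the user's profile; the field names, the 0-10 scales and the thresholds are assumptions, not taken from the specification.

```python
def select_words(word_list, profile, count=5):
    """Pick the easiest not-yet-mastered words that fit the user's level."""
    # A higher severity lowers the complexity ceiling; both scales assumed 0-10.
    level = min(profile["reading_level"], 10 - profile["severity"])
    eligible = [w for w in word_list
                if w["complexity"] <= level and not w.get("mastered")]
    eligible.sort(key=lambda w: w["complexity"])
    return [w["text"] for w in eligible[:count]]

words = [{"text": "ma", "complexity": 1},
         {"text": "papa", "complexity": 2},
         {"text": "butterfly", "complexity": 6}]
print(select_words(words, {"reading_level": 3, "severity": 4}))  # ['ma', 'papa']
```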
The hand held device 12 is also utilized for capturing articulatory movements of the user while he/she articulates the words, syllables or sentences displayed on the graphical user interface of the display module 12A. The camera available on the handheld device 12 can be utilized for the purpose of capturing articulatory movements of users. Typically, for every '10' milliseconds of the user's articulation of words, syllables or sentences, the articulatory movements corresponding to the user's articulation are extracted and subsequently displayed on the graphical user interface of the display module 12A. Such a display helps the user determine whether he/she is articulating words, syllables or sentences in an appropriate manner, visualize appropriate articulatory movements and subsequently try to achieve appropriate articulatory movements.
In accordance with the present invention, the system 10 further includes a module (not shown in figures) which, in harmony with the display of words, syllables or sentences on the graphical user interface of the display module 12A, orally provides the words, syllables or sentences to the user so that the user is able to listen to the words, syllables or sentences to be articulated. The words, syllables or sentences can be heard by the user using an earphone or a headphone attachable to the handheld device 12. After viewing a word, syllable or sentence on the graphical user interface of the display module 12A and listening to it through the handheld device 12, the user is instructed to articulate the displayed word, syllable or sentence into a microphone within or connected to the handheld device 12. The words, syllables or sentences articulated by the user are captured and recorded by the handheld device 12. The handheld device 12 transmits the captured articulations in the form of voice signals and the captured articulatory movements in the form of video signals. The captured articulations and corresponding articulatory movements of the user are transmitted through GPRS (General Packet Radio Service) or through a wired or wireless communication network.
In accordance with the present invention, first input module 16 receives from the handheld device 12 the voice signals corresponding to the articulation of the user. The second input module denoted by the reference numeral 18 receives from the hand held device 12 video signals representing the articulatory movements corresponding to the articulation of the user. The video signals received by the second input module 18 are made available to a speech therapist for the purpose of analysis. The first input module 16 cooperates with the processing module denoted by the reference numeral 20 and transmits the voice signals corresponding to the articulation of the user to the processing module 20. The processing module 20 receives the voice signals from the first input module 16 and processes them to determine the characteristics corresponding to the received voice signals. The characteristics corresponding to the received voice signals include at least one characteristic selected from the group consisting of speech energy, pitch, speech formant, speech signal level, jitter, shimmer, spectral tilt, spectral balance, spectral moments, spectral decrease, spectral slope and spectral roll-off. The processing module 20 cooperates with the repository 14 to store the voice signals along with these characteristics in the repository 14. A speech therapist can access the repository 14 at his/her convenience, analyze the voice signals and articulatory movements of the user along with the characteristics corresponding to the user's articulation, and ascertain the progress and improvements achieved by the user in articulating words, syllables or sentences.
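A minimal sketch of such a repository, assuming an SQLite store, is shown below; the specification only says that voice signals and their characteristics are stored for later review by the therapist, so the schema is illustrative.

```python
import json
import sqlite3
import time

con = sqlite3.connect("speech_training.db")
con.execute("""CREATE TABLE IF NOT EXISTS sessions (
                   user_id TEXT,
                   word TEXT,
                   captured_at REAL,
                   characteristics TEXT)""")

def store_session(user_id, word, chars):
    """Persist one articulation's characteristics for later review."""
    con.execute("INSERT INTO sessions VALUES (?, ?, ?, ?)",
                (user_id, word, time.time(), json.dumps(chars)))
    con.commit()

store_session("user-42", "papa",
              {"energy": 0.009, "pitch_hz": 195.0, "jitter": 0.021})
```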
In accordance with the present invention, the system 10 further includes a comparison module denoted by the reference numeral 22. The comparison module 22 cooperates with the first input module 16 to receive the characteristics corresponding to the articulation of the user undergoing speech training. The comparison module 22 also cooperates with the repository 14 to receive the characteristics corresponding to ideal articulation of the words, syllables or sentences articulated by the user during the course of speech training. The comparison module 22 compares the characteristics of the voice signals associated with the speech of the user with the ideal characteristics received from the repository 14 and determines the deviation between the ideal characteristics and the characteristics corresponding to the voice signals representing the user's articulation. The ideal characteristics received from the repository 14 include at least one characteristic selected from the group consisting of speech energy, pitch, speech formant, speech signal level, jitter, shimmer, spectral tilt, spectral balance, spectral moments, spectral decrease, spectral slope and spectral roll-off.
In accordance with the present invention, the comparison module 22 cooperates with the recommendation module denoted by the reference numeral 24, which receives the comparison between the ideal characteristics and the characteristics corresponding to the voice signals representing the articulation of the user from the comparison module 22 and generates at least one recommendation based on that comparison. The recommendation generated by the recommendation module 24 is typically a graph depicting the comparison between the ideal characteristics and the characteristics corresponding to the voice signals representing the articulation of the user. The recommendation module 24 further transmits the generated graph to the hand held device 12 for the purpose of display. The graph generated by the recommendation module 24 provides visual feedback to the user in terms of the progress and improvements achieved by him/her in articulating words, syllables or sentences.
In accordance with the present invention, the recommendation module 24 differentiates between correctly articulated words, syllables or sentences and improperly articulated words, syllables or sentences based on the comparison between the ideal characteristics and the characteristics corresponding to the voice signals representing the articulation of the user. The recommendation module 24 cooperates with the repository 14 to store the words, syllables or sentences that have been identified as improper articulations. The recommendation module 24 iterates the speech training process with the improperly articulated words, syllables or sentences by instructing the graphical user interface of the display module 12A to retrieve the improperly articulated words from the repository 14 and display them for the purpose of articulation. The recommendation module 24 further includes a dictionary module denoted by the reference numeral 24A which is adapted to store the words, syllables or sentences identified as improper articulations by the recommendation module 24. The dictionary module 24A is adapted to store the words, syllables or sentences in at least one language.
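The sketch below illustrates one way this recommendation loop could work: articulations whose deviation exceeds a threshold are recorded in a dictionary structure and re-queued for the next round of training. The 0.25 threshold and the data structures are assumptions for illustration.

```python
THRESHOLD = 0.25   # assumed cut-off on the relative deviation

def classify(word, deviations, improper_dictionary):
    """Mark a word improper if any characteristic deviates too much."""
    if max(deviations.values()) > THRESHOLD:
        improper_dictionary[word] = improper_dictionary.get(word, 0) + 1
        return "improper"
    improper_dictionary.pop(word, None)   # mastered: drop from retry list
    return "correct"

def next_round(improper_dictionary, fresh_words):
    """Retry improperly articulated words first, then new material."""
    return list(improper_dictionary) + fresh_words

improper = {}
classify("papa", {"energy": 0.30, "pitch_hz": 0.07}, improper)
print(next_round(improper, ["mama", "ball"]))   # ['papa', 'mama', 'ball']
```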
In accordance with the present invention, the system 10 is adapted to automatically select from the repository 14 the words, syllables or sentences to be displayed to a particular user during the speech training session. The selection of words, syllables or sentences is based on the articulatory improvements and progress witnessed in the user during the course of speech training. Alternatively, the speech therapist administering speech training to the user can also select from the repository 14 the words, syllables or sentences to be displayed during the course of speech training.
In accordance with the present invention, the voice signals corresponding to the words, syllables or sentences articulated by the user, and the subsequent comparison between the characteristics corresponding to those voice signals and the characteristics corresponding to ideal articulation of the same words, syllables or sentences, are also stored in the repository 14. The user, after completion of a speech training session, can listen to the words, syllables or sentences he/she had articulated during the course of speech training, using the handheld device 12. When the user makes a request to listen to words, syllables or sentences articulated by him/her, the respective contents are retrieved from the repository 14 and transmitted to the handheld device 12, thereby enabling the user to listen to the articulated words, syllables or sentences. Using the hand held device 12, the user can also listen to the ideal articulation of the words, syllables or sentences that he/she articulated during the course of speech training; upon such a request, the respective contents are likewise retrieved from the repository 14 and transmitted to the handheld device 12. The graphs provided by the recommendation module 24 form a part of the visual feedback provided to the user in relation to his/her progress during speech training. The graphs generated by the recommendation module 24 are stored in the repository 14 for further reference. For the purpose of display on the graphical user interface of the display module 12A, the graphs are retrieved from the repository 14 and transmitted through a transmission network to the display module 12A.
The speech therapist can listen to the words, syllables or sentences articulated by the user and review the articulatory movements exhibited by the user during the course of articulation. The speech therapist can also access the graphs provided by the recommendation module 24 before opining about the user's improvement in articulation. The speech therapist can also provide voice based feedback to the user using the system 10. The speech therapist records his/her instructions in voice format, which are subsequently transmitted and played on the handheld device 12 present with the user. The graphs provided to the user as visual feedback and the instructions and suggestions from the speech therapist can be provided in real time through the handheld device 12. These contents are also stored in the repository 14 and made available to the user on his/her request at a later time. Using the handheld device 12, the user can also send his/her queries to the speech therapist. The therapist's reply to the user's query is either transmitted to the user through the hand held device 12 or stored in the repository 14 for later retrieval and display.
In accordance with the present invention, the speech therapist can review online and in real time the words, syllables or sentences articulated by the user, the articulatory movements of the user and the graphs generated by the recommendation module 24, and subsequently determine the progress and improvements achieved by the user after undergoing surgery and after receiving speech treatment. The words articulated by a user, the articulatory movements of the user and the graphical representations generated by the recommendation module 24 are displayed to the speech therapist along with a time stamp, which makes it possible for the speech therapist to determine the date and time on which the articulatory movements and the words articulated by the user were captured and the graphs were generated.
FIGURE 2 illustrates the second embodiment of the present invention. Referring to FIGURE 2, there is provided a block diagram of a system 20 that provides speech training to users with articulatory problems. The system 20 is similar to the system 10 described in FIGURE 1 in that it includes hand held device 22, first input module 26, second input module 28, processing module 30, comparison module 32 and recommendation module 34, whose functionalities are similar to those of the corresponding components of system 10 described in FIGURE 1. The system 20 differs from system 10 of FIGURE 1 in that the comparison module 32 cooperates with the receiver module 36 instead of the repository 24 to receive the signals corresponding to ideal articulations having ideal characteristics. In system 20, the receiver module 36 receives signals corresponding to ideal articulations from the repository 24 and subsequently transmits them to the comparison module 32. The receiver module 36 of system 20 thus acts as an intermediary between the comparison module 32 and the repository 24, whereas in system 10 of FIGURE 1, the signals corresponding to ideal articulations having ideal characteristics were transferred directly from the repository to the comparison module.
Referring to FIGURE 3, a method for providing speech training to a user is illustrated through a flow diagram. The method in accordance with the present invention includes the following steps (a sketch tying them together follows the list):
• receiving voice signals corresponding to at least one articulation of the user 100;
• determining characteristics of the received voice signals 102;
• receiving a set of ideal characteristics corresponding to the received voice signals 104;
• comparing the characteristics of the received voice signals with the corresponding ideal characteristics 106; and
• generating at least one recommendation based on the comparison 108.
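Tying the flowchart steps together, the sketch below composes the hypothetical helpers from the earlier sketches (characteristics, closeness and plot_comparison) into a single training step; only the five steps 100-108 come from the specification, and everything else is assumed glue.

```python
# A sketch composing the earlier illustrative helpers into one pass of the
# method of FIGURE 3. The helper functions and their signatures are the
# assumptions made in the previous sketches, not the patented implementation.
def training_step(voice_signal, sr, ideal_chars):
    user_chars = characteristics(voice_signal, sr)    # steps 100 and 102
    deviations = closeness(user_chars, ideal_chars)   # steps 104 and 106
    plot_comparison(user_chars, ideal_chars)          # step 108: graph as recommendation
    return deviations
```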
In accordance with the present invention, the method further comprises the step of storing in a dictionary module and retrieving therefrom syllables, words, or sentences for the purpose of comparison.
In accordance with the present invention, the method further comprises the step of receiving video signals of the facial expressions of the user corresponding to the at least one articulation of the user.
In accordance with the present invention, the characteristics include at least one characteristic selected from the group consisting of speech energy, pitch, speech formant, speech signal level, jitter, shimmer, spectral tilt, spectral balance, spectral moments, spectral decrease, spectral slope and spectral roll-off.
TECHNICAL ADVANCEMENTS
The technical advancements of the present invention include the following:
• present invention provides a system for effectively administering speech training;
• present invention provides a system that can be utilized to administer speech training without requiring the user to personally visit the speech therapist;
• present invention provides a system which makes available the feedback pertaining to the improvement in user's speech, online;
• present invention provides a system which can be utilized to provide speech training through a handheld device adapted to be with the user;
• present invention provides a system which displays to the user, the words to be articulated, in a language of his/her choice;
• present invention provides a system which facilitates real time communication as well as offline communication between the user and speech therapist;
• present invention provides a system which allows the users to obtain visual feedback pertaining to their progress with the speech treatment;
• present invention provides a system which makes administration of speech training possible even to those users who are located in remote geographic locations;
• present invention makes available a system which provides speech therapists with easy access to all the surgical history, speech signals and video signals corresponding to every speech training session undertaken by a user;
• present invention makes available a system which provides for speech therapists to decide upon the words to be displayed to a user for articulation, based on the progress in articulation of the user;
• present invention makes available a system that automatically selects the words to be displayed to a user for articulation based on the improvement in articulatory abilities of the user;
• present invention provides a system that helps enhance the productivity of speech therapists;
• present invention makes available a system that provides speech training seekers with trouble-free and timely access to speech therapists;
• present invention provides a system that makes it possible for speech therapists to easily transfer surgical history of users to experts and subsequently seek their opinion.
While considerable emphasis has been placed herein on the particular features of this invention, it will be appreciated that various modifications can be made, and that many changes can be made in the preferred embodiment without departing from the principles of the invention. These and other modifications in the nature of the invention or the preferred embodiments will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.

We Claim:
1. A system for providing speech training to a user, the system comprising:
• first input module configured to receive voice signals corresponding to at least one articulation of the user;
• processing module configured to determine the characteristics of received voice signals;
• repository adapted to store signals corresponding to ideal articulations having ideal characteristics;
• comparison module adapted to perform comparison of the characteristics of the received voice signals with the corresponding ideal characteristics received from the repository; and
• recommendation module adapted to generate at least one recommendation based on the comparison and adapted to generate iteratively, based on the comparison, words, syllables or sentences in at least one language.

2. The system as claimed in claim 1, wherein the characteristics include at least one characteristic selected from the group consisting of speech energy, pitch, speech formant, speech signal level, jitter, shimmer, spectral tilt, spectral balance, spectral moments, spectral decrease, spectral slope and spectral roll-off.
3. The system as claimed in claim 1, wherein said repository further comprises at least one list of syllables, words, or sentences.
4. The system as claimed in claim 1, wherein said recommendation module further comprises a dictionary module adapted to receive and store said words, syllables or sentences in at least one language from the repository based on the comparison.
5. The system as claimed in claim 1, the system further comprising a second input module to receive video signals of the facial expressions of the user that correspond to at least one articulation of the user.

6. The system for providing speech training to a user as claimed in any one of the preceding claims, which includes a receiver module adapted to receive signals corresponding to ideal articulations and syllables, words, or sentences from the repository.
7. The system as claimed in any one of the preceding claims, which includes a hand held device adapted to cooperate with at least one module selected from the group of modules consisting of first input module, second input module, repository, recommendation module and receiver module.
8. The system as claimed in claim 7, wherein the handheld device is provided with a graphical user interface and a display module for displaying syllables, words or sentences received from the repository and facial expressions corresponding to the at least one articulation of the user.
9. A system for providing speech training to a user, the system comprising:
a. first input module configured to receive voice signals corresponding to at least one articulation of the user;
b. processing module configured to determine characteristics of the received voice signals;
c. receiver module to receive signals corresponding to ideal articulations and a list of probable syllables, words, or sentences from a repository;
d. comparison module to compare the characteristics of received voice signals with the corresponding ideal characteristics received from the receiver module; and
e. recommendation module to generate at least one recommendation based on the comparison and the list of probable syllables, words, or sentences.
10. The system as claimed in claim 9, the system further comprising a graphical user interface and a display module for displaying syllables, words or sentences received from the repository and facial expressions corresponding to the at least one articulation of the user.
11. The system as claimed in claim 9, wherein the recommendation module further comprises a dictionary module for storing and receiving syllables, words, or sentences in at least one language from the repository based on the comparison.
12. The system as claimed in claim 9, wherein the recommendation module is further adapted to generate iteratively, based on the comparison, words, syllables or sentences in at least one language.
13. The system as claimed in claim 9, the system further comprising a second input module to receive video signals of the facial expressions of the user corresponding to the at least one articulation of the user.
14. A method for providing speech training to a user, the method comprising:

• receiving signals corresponding to at least one articulation of the user;
• determining characteristics of the received signals;
• receiving a set of ideal characteristics corresponding to the received signals;
• comparing the characteristics of the received signals with the corresponding ideal characteristics; and
• generating at least one recommendation based on the comparison.

15. The method as claimed in claim 14, the method further comprising the step of storing in a dictionary module and retrieving therefrom syllables, words, or sentences for the purpose of comparison.
16. The method as claimed in claim 14, the method further comprising the step of receiving video signals of the facial expressions of the user corresponding to the at least one articulation of the user.

17. The method as claimed in claim 14, wherein the characteristics include at least one characteristic selected from the group consisting of speech energy, pitch, speech formant, speech signal level, jitter, shimmer, spectral tilt, spectral balance, spectral moments, spectral decrease, spectral slope and spectral roll-off.

Documents

Application Documents

# Name Date
1 Form-18(Online).pdf 2018-08-10
2 abstract1.jpg 2018-08-10
3 1584-mum-2011-form 3.pdf 2018-08-10
4 1584-mum-2011-form 26.pdf 2018-08-10
5 1584-mum-2011-form 2.pdf 2018-08-10
6 1584-mum-2011-form 2(title page).pdf 2018-08-10
7 1584-mum-2011-form 1.pdf 2018-08-10
8 1584-MUM-2011-FORM 1(12-4-2012).pdf 2018-08-10
9 1584-mum-2011-drawing.pdf 2018-08-10
10 1584-mum-2011-description(complete).pdf 2018-08-10
11 1584-mum-2011-correspondence.pdf 2018-08-10
12 1584-MUM-2011-CORRESPONDENCE(12-4-2012).pdf 2018-08-10
13 1584-mum-2011-claims.pdf 2018-08-10
14 1584-mum-2011-abstract.pdf 2018-08-10
15 1584-MUM-2011-FER.pdf 2019-03-05
16 1584-MUM-2011-RELEVANT DOCUMENTS [27-06-2019(online)].pdf 2019-06-27
17 1584-MUM-2011-PETITION UNDER RULE 137 [27-06-2019(online)].pdf 2019-06-27
18 1584-MUM-2011-OTHERS [27-06-2019(online)].pdf 2019-06-27
19 1584-MUM-2011-FER_SER_REPLY [27-06-2019(online)].pdf 2019-06-27
20 1584-MUM-2011-CLAIMS [27-06-2019(online)].pdf 2019-06-27
21 1584-MUM-2011-ABSTRACT [27-06-2019(online)].pdf 2019-06-27
22 1584-MUM-2011-Response to office action [09-09-2020(online)].pdf 2020-09-09
23 1584-MUM-2011-PatentCertificate20-03-2023.pdf 2023-03-20
24 1584-MUM-2011-IntimationOfGrant20-03-2023.pdf 2023-03-20
25 1584-MUM-2011-RELEVANT DOCUMENTS [30-09-2023(online)].pdf 2023-09-30

Search Strategy

1 SearchPattern1584MUM2011_25-02-2019.pdf

ERegister / Renewals

3rd: 29 May 2023 (from 27/05/2013 to 27/05/2014)
4th: 29 May 2023 (from 27/05/2014 to 27/05/2015)
5th: 29 May 2023 (from 27/05/2015 to 27/05/2016)
6th: 29 May 2023 (from 27/05/2016 to 27/05/2017)
7th: 29 May 2023 (from 27/05/2017 to 27/05/2018)
8th: 29 May 2023 (from 27/05/2018 to 27/05/2019)
9th: 29 May 2023 (from 27/05/2019 to 27/05/2020)
10th: 29 May 2023 (from 27/05/2020 to 27/05/2021)
11th: 29 May 2023 (from 27/05/2021 to 27/05/2022)
12th: 29 May 2023 (from 27/05/2022 to 27/05/2023)
13th: 29 May 2023 (from 27/05/2023 to 27/05/2024)
14th: 01 May 2024 (from 27/05/2024 to 27/05/2025)
15th: 23 May 2025 (from 27/05/2025 to 27/05/2026)