"An Apparatus For Measuring Clarity Of Spoken English"

Abstract: The present invention relates to an apparatus for measuring and determining the clarity of spoken English by means of a speech to text engine that is trained on an average voice and is capable of categorizing speakers into different bands based on quantitative analysis of their speech. The speaker's speech, recorded through a microphone, is transcribed by the engine to obtain the corresponding text. This text is compared with the original text and analyzed: the number of matching words is counted, scores are tabulated, and an accuracy % is calculated. The system also provides for analyzing accented and neutral voices of both male and female speakers. Based on the stated speech recording, conversion of speech into text, and comparison and analysis of the converted text against the original text, it can be safely concluded that the apparatus provided by the present invention is able to distinguish a clear voice from the rest and to categorize speakers into different bands based on the clarity of their voices.

Patent Information

Application #

Filing Date

18 November 2002

Publication Number

15/2010

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

NIIT LIMITED

NIIT HOUSE, C-125, OKHLA INDUSTRIAL AREA, PHASE-1, NEW DELHI-110 020, INDIA.

Inventors

1. SUGATA MITRA

NIIT LIMITED, NIIT HOUSE, C-125, OKHLA INDUSTRIAL AREA, PHASE-1, NEW DELHI-110 020, INDIA.

2. RANJU GHATAK

NIIT LIMITED, NIIT HOUSE, C-125, OKHLA INDUSTRIAL AREA, PHASE-1, NEW DELHI-110 020, INDIA.

Specification

PRIOR ART
US PATENT # 4707858
Title: UTILIZING WORD-TO-DIGITAL CONVERSION
The cited invention describes a communication system in which human speech is analyzed, but the similarity with our invention ends there. While the cited invention analyzes human speech against pre-stored words and tries to recognize the speaker through its analysis, our invention analyzes human speech through a speech engine with an unlimited vocabulary, with the objective of measuring the speaker's clarity of speech. The engine is not confined to a fixed set of words present in a vocabulary. What our invention analyzes is the clarity of speech of one speaker against another.
US PATENT # 5899972
Title: INTERACTIVE VOICE RECOGNITION METHOD AND APPARATUS USING AFFIRMATIVE/NEGATIVE CONTENT DISCRIMINATION.
The cited invention recognizes voices, translates them into digital form, analyzes whether the characteristic voice data can be characterized as an affirmative or negative response, and generates synthesized audio corresponding to the appropriate response formulated by said voice comprehension and conversation control unit. Here again, the analysis of voice is performed on a set of pre-registered words, which is different from our invention. Also, where the cited invention primarily categorizes voice data as affirmative or negative and responds appropriately, our invention categorizes speakers based on the clarity of their pronunciation.
US PATENT # 5640490
Title: USER INDEPENDENT, REAL-TIME SPEECH RECOGNITION SYSTEM AND METHOD
The purpose of this invention is to recognize speech irrespective of accent or pronunciation. This is the opposite of the purpose of our invention which measures the clarity of pronunciation of different speakers.

US PATENT # 6438520
Title: APPARATUS, METHOD & SYSTEM FOR CROSS SPEAKER SPEECH RECOGNITION FOR TELECOMMUNICATION APPLICATION
The invention involves phonetic transcription of incoming calls and comparison of the resulting "confidence ratio" with a predefined threshold value. The invention provides cross-speaker speech recognition utilizing a methodology that provides both high discrimination and high noise immunity, utilizing a matching or collision of two different speech models. Once again, the purpose is to recognize speech from different speakers irrespective of their pronunciation, while our invention measures the clarity of spoken English.
US PATENT # 5621857
Title: METHOD AND SYSTEM FOR IDENTIFYING AND RECOGNIZING SPEECH
The invention involves a system which processes a sequence of spoken utterances to identify the same based upon a highest probability match of each utterance with learned speech tokens and based upon a highest probability match of the uttered sequence with a defined utterance library. Once again, the purpose is to recognize speech from different speakers irrespective of their pronunciation, while our invention measures the clarity of spoken English.
SUMMARY OF THE INVENTION
The present invention provides for determining the clarity of spoken English of the speaker through a speech to text conversion engine.
The engine is first trained in a specific speaker's voice. Hence, when this speaker reads out any passage into the engine, the resultant text file is an accurate rendition of the passage read.
Then probable candidates in the recruitment process are asked to read the same passage. Each such reading is fed into the engine, which produces text that differs from that produced by the original speaker because of pronunciation differences between them. The number of errors in the resultant transcription of any speaker's voice is converted to a score that measures how closely the new speaker's pronunciation matches that of the original speaker in whose voice the engine was trained.
Any speaker can now use this engine (trained in another speaker's voice) to check how much of his or her speech is recognized by the system. Such feedback was, until now, not possible from any automated system.
Moreover, a speaker can use this feedback from the system to change his or her pronunciation until the system recognizes the words spoken. In effect, the speaker would have changed his or her pronunciation to match that of the original speaker in whose voice the engine was originally trained.
DETAILED DESCRIPTION OF THE PROCESS
Figure 1 gives a block diagram of the invention, showing how the clarity of spoken English is measured through a speech-to-text engine. A suitable speech-to-text engine (12) is installed on an appropriate computer and trained (11) using a human speaker whose pronunciation is considered correct or desirable. The resulting file (13) is stored as a "User Profile" on the computer for future reference.
A user of the system then reads (21) a target passage (31) into the system through a microphone, and his or her speech is converted to text (22) (also referred to as the transcribed file) by the speech-to-text engine (12) using the trainer's or user's profile (13).
The transcribed file (22) is then compared with the text of the target passage (31) through a comparator algorithm (32) and errors are counted. The number of errors is converted into a pronunciation score (33), which is a measure of how close the pronunciation of the user is to that of the trainer. Such measurement was, until now, considered impossible to do automatically.
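The comparison step can be illustrated with a minimal sketch. The specification does not disclose the comparator algorithm (32) or the exact scoring formula, so everything here is an assumption: the function names `words` and `pronunciation_score` are hypothetical, the word alignment uses Python's standard `difflib.SequenceMatcher`, and the score is taken to be the percentage of target-passage words that reappear, in order, in the transcribed file.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def pronunciation_score(target_text, transcribed_text):
    """Compare a transcribed file (22) with the target passage (31).

    Returns (matching_words, errors, accuracy_percent). This is an
    assumed scoring rule, not the one used by the patented apparatus.
    """
    target = words(target_text)
    transcribed = words(transcribed_text)
    # Align the two word sequences and total the sizes of the matching runs.
    matcher = SequenceMatcher(None, target, transcribed, autojunk=False)
    matching = sum(block.size for block in matcher.get_matching_blocks())
    errors = len(target) - matching
    accuracy = 100.0 * matching / len(target) if target else 0.0
    return matching, errors, accuracy


# Example: two of the nine target words are transcribed incorrectly.
matching, errors, accuracy = pronunciation_score(
    "The quick brown fox jumps over the lazy dog.",
    "the quick brown box jumps over a lazy dog",
)
print(matching, errors, round(accuracy, 1))  # 7 2 77.8
```

Counting errors against the length of the target passage keeps scores comparable between speakers who read the same passage, which is what categorizing speakers into bands would require.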

The user can repeat the reading (21) to achieve a higher score. In the process, the user's pronunciation (22) becomes more and more similar to the trainer's pronunciation (13). If the resultant score (33) is found satisfactory (34), the process can be ended.
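The repeat-until-satisfactory loop can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `practice`, `transcribe_attempt`, `score`, the threshold of 90 and the limit of five attempts are all hypothetical, and the readings are simulated strings rather than output from a real speech-to-text engine.

```python
def practice(transcribe_attempt, score, satisfactory=90.0, max_attempts=5):
    """Repeat the reading (21) until the score (33) is satisfactory (34).

    transcribe_attempt(n) returns the transcribed text for attempt n;
    score(text) returns a percentage. Both are supplied by the caller.
    Returns the list of scores obtained, in order.
    """
    history = []
    for attempt in range(1, max_attempts + 1):
        transcribed = transcribe_attempt(attempt)
        history.append(score(transcribed))
        if history[-1] >= satisfactory:
            break  # score found satisfactory, end the process
    return history


# Simulated session: the transcription improves with each reading.
target = "she sells sea shells".split()
simulated = {
    1: "she sell see shells",
    2: "she sells see shells",
    3: "she sells sea shells",
}


def simple_score(transcribed):
    """Percentage of target words matched position by position (assumed rule)."""
    got = transcribed.split()
    same = sum(1 for a, b in zip(target, got) if a == b)
    return 100.0 * same / len(target)


history = practice(lambda n: simulated[n], simple_score)
print(history)  # [50.0, 75.0, 100.0]
```

Capping the number of attempts is a design choice of this sketch so that the loop always terminates; the specification itself only says the process can be ended once the score is satisfactory.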
LIMITATIONS
The reading should usually be at a slower than normal pace, since words are lost if the pace of reading is too high.
There is difficulty in distinguishing phonetically similar words, e.g. right and write, yourself and your self.
Such limitations are generic to all speech to text engines available today.
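The effect of phonetically similar words on the error count can be shown with a small sketch. It is not part of the disclosed apparatus: the `HOMOPHONES` table, the `count_errors` function and its `merge_homophones` switch are hypothetical, the comparison is a simple position-by-position check that assumes both texts have the same number of words, and it does not address word-boundary cases such as "yourself" versus "your self".

```python
# Hypothetical homophone groups; a practical table would be far larger.
HOMOPHONES = [{"right", "write"}, {"their", "there"}]
# Map every word in a group to one representative of that group.
CANONICAL = {word: min(group) for group in HOMOPHONES for word in group}


def count_errors(target_text, transcribed_text, merge_homophones=False):
    """Count positions where the transcription differs from the target.

    With merge_homophones=True, words in the same homophone group are
    treated as equal, so they are not counted as pronunciation errors.
    """
    def normalize(word):
        word = word.lower()
        return CANONICAL.get(word, word) if merge_homophones else word

    target = target_text.split()
    transcribed = transcribed_text.split()
    return sum(
        1 for a, b in zip(target, transcribed) if normalize(a) != normalize(b)
    )


# A correctly pronounced reading can still be transcribed with the wrong
# spelling, which a plain word comparison counts as two errors.
print(count_errors("please write it right", "please right it write"))
print(count_errors("please write it right", "please right it write",
                   merge_homophones=True))
```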

WE CLAIM
1. An apparatus for measurement and determination of clarity of speech by a speech to text engine capable of doing quantitative analysis on a given speech.
2. An apparatus as claimed in claim 1 providing for feeding an average voice of a speaker through a microphone and conversion of the same by speech to text engine and storing the same as user profile.
3. An apparatus as claimed in claim 1 providing for feeding the same speech by a different speaker and storing the same as transcribed file.
4. An apparatus as claimed in claim 1 providing for comparing the user profile and transcribed file as claimed in claim 2 and 3 through a comparator algorithm.
5. An apparatus as claimed in claim 1 providing for detecting and counting of errors in transcribed file and converting the same to pronunciation score.
6. An apparatus as claimed in claim 1 providing for feedback enabling the speakers to rectify their pronunciation to match the accurate.

Documents

Application Documents

#	Name	Date
1	1159-del-2002-abstract.pdf	2011-08-21
2	1159-del-2002-claims.pdf	2011-08-21
3	1159-del-2002-form-2.pdf	2011-08-21
4	1159-del-2002-form-19.pdf	2011-08-21
5	1159-del-2002-description (complete).pdf	2011-08-21
6	1159-del-2002-drawings.pdf	2011-08-21
7	1159-del-2002-form-1.pdf	2011-08-21
8	1159-del-2002-correspondence-po.pdf	2011-08-21
9	1159-del-2002-correspondence-others.pdf	2011-08-21
10	1159-del-2002-form-3.pdf	2011-08-21
11	1159-del-2002-gpa.pdf	2011-08-21

Search Strategy

1	Searchstrategy_18-05-2017.pdf