Abstract: A device and method for speech recognition using phoneme statistics in lexical tree search is disclosed. The present application provides a method and system for speech recognition using phoneme statistics in lexical tree search, comprising organization and optimization of the search graph and early incorporation of a phoneme based bigram language model and a word based trigram language model in the search process. The system and method also provide for generation of a weighted lexical tree which may be used for the search, wherein the weighted lexical tree is based on the phoneme based bigram model score.
Claims:1. A method for speech recognition using phoneme statistics in lexical tree search, said method comprising processor implemented steps of:
acquiring, a speech signal using a speech acquisition module (210);
segmenting, the speech signal into a plurality of frames using at least one spectrum analyzer (212);
extracting, at least one feature vector for the speech signal using a front end processing module (214);
generating, at least one acoustic model based on Hidden Markov Model (HMM), wherein each state of the HMM is modeled by at least one of a Gaussian mixture model (GMM) and a deep neural network (DNN), using an acoustic model generation module (216);
generating, a plurality of statistical language models, wherein the plurality of language models comprises at least one phoneme based bigram model and at least one word based trigram model, using a language model generation module (218);
generating, a weighted lexical tree using the at least one phoneme based bigram model using a lexical tree generation module (220);
propagating, the at least one feature vector and a corresponding token through the weighted lexical tree using a token passing algorithm and scoring the at least one feature vector by matching the at least one feature vector with each HMM using a search module (222), wherein propagating comprises
setting the token score based on the weight assigned to each node of the weighted lexical tree;
updating, the token score by adding a phoneme transitional probability score to the token score to generate a first updated score; and
selecting, the state with the best token score for propagating through the weighted lexical tree;
updating the first updated score by adding a word based language model score to the first updated score to generate a second updated score, wherein the word based language model score is based on the at least one word based trigram model, when the token transitions from the end of a first word to the start of a second word;
storing, a history record in a database (224) each time a token transitions from the end of a first word to the start of a second word, wherein a link to the history record is stored and updated in the token; and
generating, a hypothesis for speech recognition by examining the best token at a valid network exit point after a predetermined time and tracing back the most likely state sequence and time boundary using a speech recognition module (226).
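The score-update steps of claim 1 can be illustrated with a minimal token-passing sketch. This is an illustrative assumption, not the specification's implementation: the names `Token`, `propagate`, and `cross_word_boundary`, and all weights and probabilities, are hypothetical, and scores are combined in log space so that additions correspond to probability products.

```python
import math

class Token:
    """A token carries a cumulative log score and a link to its word history."""
    def __init__(self, score=0.0, history=None):
        self.score = score
        self.history = history  # link to the most recent word-boundary record

def propagate(token, node_weight, phone_trans_logprob):
    # Set the token score from the weight on the lexical-tree node, then
    # add the phoneme transitional probability to get the first updated score.
    score = token.score + node_weight + phone_trans_logprob
    return Token(score, token.history)

def cross_word_boundary(token, trigram_logprob, history_db, word, frame):
    # At a word-end to word-start transition, add the word trigram score
    # (second updated score) and store a history record linked from the token.
    record = {"word": word, "frame": frame, "prev": token.history}
    history_db.append(record)
    return Token(token.score + trigram_logprob, record)

# Toy run: one in-word step followed by a word boundary.
db = []
t = propagate(Token(), node_weight=math.log(0.5), phone_trans_logprob=math.log(0.8))
t = cross_word_boundary(t, trigram_logprob=math.log(0.3), history_db=db,
                        word="hello", frame=42)
```

The final hypothesis is then recovered by following the `prev` links backward from the best token's history record, which mirrors the trace-back of the most likely state sequence and time boundary in the last step of claim 1.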
2. The method according to claim 1 wherein feature extraction comprises extracting, a plurality of spectral features from each of the plurality of frames using Fast Fourier Transform (FFT), wherein the plurality of spectral features comprise a magnitude spectral feature.
3. The method according to claim 2 further comprising extracting, one or more Mel Frequency Cepstral Coefficients (MFCC) by performing Mel filter bank analysis, applying a logarithm, and applying a discrete cosine transform (DCT) on the magnitude spectral feature, wherein the one or more MFCCs are used for acoustic model generation.
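The MFCC pipeline of claims 2 and 3 (FFT magnitude spectrum, then Mel filter bank analysis, logarithm, and DCT) can be sketched as follows. The filter-bank construction, the sample rate, and the filter and coefficient counts are illustrative assumptions, not values taken from the specification:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sample_rate=16000, n_filters=26, n_ceps=13):
    # FFT magnitude spectrum of one frame (claim 2).
    mag = np.abs(np.fft.rfft(frame))
    n_bins = mag.shape[0]
    # Triangular Mel filter bank spanning 0 Hz to the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bin_pts = np.floor((n_bins - 1) * mel_to_hz(mel_pts) / (sample_rate / 2)).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = bin_pts[i], bin_pts[i + 1], bin_pts[i + 2]
        fbank[i, lo:mid] = np.linspace(0.0, 1.0, mid - lo, endpoint=False)
        fbank[i, mid:hi] = np.linspace(1.0, 0.0, hi - mid, endpoint=False)
    # Mel filter-bank analysis, logarithm, then DCT-II (claim 3).
    log_energies = np.log(fbank @ mag + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_energies

# One 25 ms frame of synthetic audio at 16 kHz (400 samples).
coeffs = mfcc(np.random.default_rng(0).standard_normal(400))
```

In practice the frame would first be windowed (e.g. with a Hamming window) before the FFT; that step is omitted here for brevity.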
4. The method according to claim 1 wherein when the HMM is modeled by the GMM, each HMM is trained iteratively by using an Expectation Maximization (EM) algorithm.
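The iterative EM training of claim 4 can be illustrated with a minimal one-dimensional Gaussian mixture fit. This sketches only the EM updates for a GMM; training GMM-HMM state distributions additionally requires the Baum-Welch procedure, which is not shown. All parameter values and the function name `em_gmm_1d` are illustrative assumptions:

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50, seed=0):
    """Fit a 1-D Gaussian mixture by Expectation-Maximization."""
    rng = np.random.default_rng(seed)
    means = rng.choice(x, k, replace=False)
    variances = np.full(k, np.var(x))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        dens = weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances) \
               / np.sqrt(2 * np.pi * variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture parameters from the responsibilities.
        nk = resp.sum(axis=0)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        weights = nk / x.shape[0]
    return means, variances, weights

# Two well-separated clusters; EM should recover means near -3 and +3.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
m, v, w = em_gmm_1d(data)
```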
5. The method according to claim 1 wherein the at least one phoneme based bigram model is generated by replacing each word in a text corpus with a pronunciation unit corresponding to the word.
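The phoneme bigram construction of claim 5 can be sketched as follows. The toy lexicon entries and the smoothing-free maximum-likelihood estimate are illustrative assumptions (a deployed model would use a full pronunciation dictionary and smoothing):

```python
from collections import Counter

# Toy pronunciation lexicon: each word maps to its phoneme sequence.
lexicon = {"cat": ["k", "ae", "t"], "sat": ["s", "ae", "t"]}
corpus = ["cat", "sat"]

# Replace every word in the text corpus with its pronunciation unit (claim 5).
phones = [p for word in corpus for p in lexicon[word]]

# Maximum-likelihood bigram estimates P(p2 | p1) over the phoneme stream.
bigrams = Counter(zip(phones, phones[1:]))
unigrams = Counter(phones[:-1])
prob = {(p1, p2): c / unigrams[p1] for (p1, p2), c in bigrams.items()}
```

The resulting conditional probabilities are the phoneme transition scores that weight the lexical tree nodes and are added to the token score during propagation.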
6. A system (102), comprising a processor (202) and a memory (206) coupled to said processor, the system comprising:
a speech acquisition module (210), configured to acquire a speech signal;
at least one spectrum analyzer (212), configured to segment the speech signal into a plurality of frames;
a front end processing module (214), configured to extract at least one feature vector for the speech signal;
an acoustic model generation module (216) configured to generate, at least one acoustic model based on Hidden Markov Model (HMM), wherein each state of the HMM is modeled by at least one of a Gaussian mixture model (GMM) and a deep neural network (DNN);
a language model generation module (218) configured to generate, a plurality of statistical language models, wherein the plurality of language models comprises at least one phoneme based bigram model and at least one word based trigram model;
a lexical tree generation module (220) configured to generate, a weighted lexical tree using the at least one phoneme based bigram model;
a search module (222) configured to propagate, the at least one feature vector and a corresponding token through the weighted lexical tree using a token passing algorithm and to score the at least one feature vector by matching the at least one feature vector with each HMM, wherein propagation comprises
setting the token score based on the weight assigned to each node of the weighted lexical tree;
updating, the token score by adding a phoneme transitional probability score to the token score to generate a first updated score; and
selecting, the state with the best token score for propagating through the weighted lexical tree;
the search module (222) is further configured to update the first updated score by adding a word based language model score to the first updated score to generate a second updated score, wherein the word based language model score is based on the at least one word based trigram model, when the token transitions from the end of a first word to the start of a second word;
the search module (222) is further configured to store, a history record in a database (226) each time a token transitions from the end of a first word to the start of a second word, wherein a link to the history record is stored and updated in the token; and
a speech recognition module (224) configured to generate, a hypothesis for speech recognition by examining the best token at a valid network exit point after a predetermined time and tracing back the most likely state sequence and time boundary.
7. The system (102) according to claim 6 wherein the speech acquisition module (210) comprises one or more voice capture devices to capture a speech signal.
8. The system (102) according to claim 6 wherein the speech acquisition module (210) further comprises at least one analog-to-digital converter for converting the speech signal to a digital signal.
9. The system (102) according to claim 6 wherein the front end processing module (214) is further configured to extract, a plurality of spectral features from each of the plurality of frames using Fast Fourier Transform (FFT), wherein the plurality of spectral features comprise a magnitude spectral feature.
10. The system (102) according to claim 9 wherein the front end processing module (214) is further configured to extract one or more Mel Frequency Cepstral Coefficients (MFCC) by performing Mel filter bank analysis, applying a logarithm, and applying a discrete cosine transform (DCT) on the magnitude spectral feature, wherein the one or more MFCCs are used for acoustic model generation.
11. The system (102) according to claim 6 wherein the language model generation module (218) is further configured to generate the at least one phoneme based bigram model by replacing each word in a text corpus with a pronunciation unit corresponding to the word.
Description: As attached.
| # | Name | Date |
|---|---|---|
| 1 | 201621010685-IntimationOfGrant24-11-2023.pdf | 2023-11-24 |
| 2 | Form 5 [29-03-2016(online)].pdf | 2016-03-29 |
| 3 | Form 3 [29-03-2016(online)].pdf | 2016-03-29 |
| 4 | 201621010685-PatentCertificate24-11-2023.pdf | 2023-11-24 |
| 5 | Form 18 [29-03-2016(online)].pdf | 2016-03-29 |
| 6 | 201621010685-CLAIMS [07-02-2020(online)].pdf | 2020-02-07 |
| 7 | Drawing [29-03-2016(online)].pdf | 2016-03-29 |
| 8 | 201621010685-DRAWING [07-02-2020(online)].pdf | 2020-02-07 |
| 9 | Description(Complete) [29-03-2016(online)].pdf | 2016-03-29 |
| 10 | 201621010685-FER_SER_REPLY [07-02-2020(online)].pdf | 2020-02-07 |
| 11 | 201621010685-POWER OF ATTORNEY-(02-5-2016).pdf | 2018-08-11 |
| 12 | 201621010685-FORM 1.pdf | 2019-08-14 |
| 13 | 201621010685-Form 1-220416.pdf | 2018-08-11 |
| 14 | 201621010685-FER.pdf | 2019-08-09 |
| 15 | 201621010685-Correspondence-220416.pdf | 2018-08-11 |
| 16 | 201621010685-CORRESPONDENCE-(02-5-2016).pdf | 2018-08-11 |
| 17 | 201621010685_09-08-2019.pdf | |