Abstract: A device and method for speech recognition using phoneme statistics in lexical tree search is disclosed. The present application provides a method and system for speech recognition using phoneme statistics in lexical tree search, comprising organization and optimization of the search graph and early incorporation of a phoneme based bigram language model and a word based trigram language model in the search process. The system and method also provide for generation of a weighted lexical tree which may be used for the search, wherein the weighted lexical tree is based on the phoneme based bigram model score.
Claims:1. A method for speech recognition using phoneme statistics in lexical tree search, said method comprising processor implemented steps of:
acquiring, a speech signal using a speech acquisition module (210);
segmenting, the speech signal into a plurality of frames using at least one spectrum analyzer (212);
extracting, at least one feature vector for the speech signal using a front end processing module (214);
generating, at least one acoustic model based on Hidden Markov Model (HMM), wherein each state of the HMM is modeled by at least one of a Gaussian mixture model (GMM) and a deep neural network (DNN), using an acoustic model generation module (216);
generating, a plurality of statistical language models, wherein the plurality of language models comprises at least one phoneme based bigram model and at least one word based trigram model, using a language model generation module (218);
generating, a weighted lexical tree using the at least one phoneme based bigram model using a lexical tree generation module (220);
propagating, the at least one feature vector and a corresponding token through the weighted lexical tree using a token passing algorithm and scoring the at least one feature vector by matching the at least one feature vector with each HMM using a search module (222), wherein propagating comprises
setting the token score based on the weight assigned to each node of the weighted lexical tree;
updating, the token score by adding a phoneme transitional probability score to the token score to generate a first updated score; and
selecting, the state with the best token score for propagating through the weighted lexical tree;
updating the first updated score by adding a word based language model score to the first updated score to generate a second updated score, wherein the word based language model score is based on the at least one word based trigram model, when the token transitions from the end of a first word to the start of a second word;
storing, a history record in a database (224) each time a token transitions from the end of a first word to the start of a second word, wherein a link to the history record is stored and updated in the token; and
generating, a hypothesis for speech recognition by examining the best token at a valid network exit point after a predetermined time and tracing back the most likely state sequence and time boundary using a speech recognition module (226).
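The score-update steps of claim 1 can be illustrated with a minimal token-passing sketch. This is an illustrative assumption, not the specification's implementation: the names `Token`, `propagate`, and `cross_word_boundary`, and all weights and probabilities, are hypothetical, and scores are combined in log space so that additions correspond to probability products.

```python
import math

class Token:
    """A token carries a cumulative log score and a link to its word history."""
    def __init__(self, score=0.0, history=None):
        self.score = score
        self.history = history  # link to the most recent word-boundary record

def propagate(token, node_weight, phone_trans_logprob):
    # Set the token score from the weight on the lexical-tree node, then
    # add the phoneme transitional probability to get the first updated score.
    score = token.score + node_weight + phone_trans_logprob
    return Token(score, token.history)

def cross_word_boundary(token, trigram_logprob, history_db, word, frame):
    # At a word-end to word-start transition, add the word trigram score
    # (second updated score) and store a history record linked from the token.
    record = {"word": word, "frame": frame, "prev": token.history}
    history_db.append(record)
    return Token(token.score + trigram_logprob, record)

# Toy run: one in-word step followed by a word boundary.
db = []
t = propagate(Token(), node_weight=math.log(0.5), phone_trans_logprob=math.log(0.8))
t = cross_word_boundary(t, trigram_logprob=math.log(0.3), history_db=db,
                        word="hello", frame=42)
```

The final hypothesis is then recovered by following the `prev` links backward from the best token's history record, which mirrors the trace-back of the most likely state sequence and time boundary in the last step of claim 1.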
2. The method according to claim 1 wherein feature extraction comprises extracting, a plurality of spectral features from each of the plurality of frames using Fast Fourier Transform (FFT), wherein the plurality of spectral features comprise a magnitude spectral feature.
3. The method according to claim 2 further comprising extracting, one or more Mel Frequency Cepstral Coefficients (MFCC) by performing Mel filter bank analysis, applying a logarithm, and applying a discrete cosine transform (DCT) on the magnitude spectral feature, wherein the one or more MFCCs are used for acoustic model generation.
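The MFCC pipeline of claims 2 and 3 (FFT magnitude spectrum, then Mel filter bank analysis, logarithm, and DCT) can be sketched as follows. The filter-bank construction, the sample rate, and the filter and coefficient counts are illustrative assumptions, not values taken from the specification:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sample_rate=16000, n_filters=26, n_ceps=13):
    # FFT magnitude spectrum of one frame (claim 2).
    mag = np.abs(np.fft.rfft(frame))
    n_bins = mag.shape[0]
    # Triangular Mel filter bank spanning 0 Hz to the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bin_pts = np.floor((n_bins - 1) * mel_to_hz(mel_pts) / (sample_rate / 2)).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = bin_pts[i], bin_pts[i + 1], bin_pts[i + 2]
        fbank[i, lo:mid] = np.linspace(0.0, 1.0, mid - lo, endpoint=False)
        fbank[i, mid:hi] = np.linspace(1.0, 0.0, hi - mid, endpoint=False)
    # Mel filter-bank analysis, logarithm, then DCT-II (claim 3).
    log_energies = np.log(fbank @ mag + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_energies

# One 25 ms frame of synthetic audio at 16 kHz (400 samples).
coeffs = mfcc(np.random.default_rng(0).standard_normal(400))
```

In practice the frame would first be windowed (e.g. with a Hamming window) before the FFT; that step is omitted here for brevity.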
4. The method according to claim 1 wherein when the HMM is modeled by the GMM, each HMM is trained iteratively by using an Expectation Maximization (EM) algorithm.
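The iterative EM training of claim 4 can be illustrated with a minimal one-dimensional Gaussian mixture fit. This sketches only the EM updates for a GMM; training GMM-HMM state distributions additionally requires the Baum-Welch procedure, which is not shown. All parameter values and the function name `em_gmm_1d` are illustrative assumptions:

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50, seed=0):
    """Fit a 1-D Gaussian mixture by Expectation-Maximization."""
    rng = np.random.default_rng(seed)
    means = rng.choice(x, k, replace=False)
    variances = np.full(k, np.var(x))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        dens = weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances) \
               / np.sqrt(2 * np.pi * variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture parameters from the responsibilities.
        nk = resp.sum(axis=0)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        weights = nk / x.shape[0]
    return means, variances, weights

# Two well-separated clusters; EM should recover means near -3 and +3.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
m, v, w = em_gmm_1d(data)
```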
5. The method according to claim 1 wherein the at least one phoneme based bigram model is generated by replacing each word in a text corpus with a pronunciation unit corresponding to the word.
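The phoneme bigram construction of claim 5 can be sketched as follows. The toy lexicon entries and the smoothing-free maximum-likelihood estimate are illustrative assumptions (a deployed model would use a full pronunciation dictionary and smoothing):

```python
from collections import Counter

# Toy pronunciation lexicon: each word maps to its phoneme sequence.
lexicon = {"cat": ["k", "ae", "t"], "sat": ["s", "ae", "t"]}
corpus = ["cat", "sat"]

# Replace every word in the text corpus with its pronunciation unit (claim 5).
phones = [p for word in corpus for p in lexicon[word]]

# Maximum-likelihood bigram estimates P(p2 | p1) over the phoneme stream.
bigrams = Counter(zip(phones, phones[1:]))
unigrams = Counter(phones[:-1])
prob = {(p1, p2): c / unigrams[p1] for (p1, p2), c in bigrams.items()}
```

The resulting conditional probabilities are the phoneme transition scores that weight the lexical tree nodes and are added to the token score during propagation.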
6. A system (102), comprising a processor (202) and a memory (206) coupled to said processor, the system comprising:
a speech acquisition module (210), configured to acquire a speech signal;
at least one spectrum analyzer (212), configured to segment the speech signal into a plurality of frames;
a front end processing module (214), configured to extract at least one feature vector for the speech signal;
an acoustic model generation module (216) configured to generate, at least one acoustic model based on Hidden Markov Model (HMM), wherein each state of the HMM is modeled by at least one of a Gaussian mixture model (GMM) and a deep neural network (DNN);
a language model generation module (218) configured to generate, a plurality of statistical language models, wherein the plurality of language models comprises at least one phoneme based bigram model and at least one word based trigram model;
a lexical tree generation module (220) configured to generate, a weighted lexical tree using the at least one phoneme based bigram model;
a search module (222) configured to propagate, the at least one feature vector and a corresponding token through the weighted lexical tree using a token passing algorithm and to score the at least one feature vector by matching the at least one feature vector with each HMM, wherein propagation comprises
setting the token score based on the weight assigned to each node of the weighted lexical tree;
updating, the token score by adding a phoneme transitional probability score to the token score to generate a first updated score; and
selecting, the state with the best token score for propagating through the weighted lexical tree;
the search module (222) is further configured to update the first updated score by adding a word based language model score to the first updated score to generate a second updated score, wherein the word based language model score is based on the at least one word based trigram model, when the token transitions from the end of a first word to the start of a second word;
the search module (222) is further configured to store, a history record in a database (226) each time a token transitions from the end of a first word to the start of a second word, wherein a link to the history record is stored and updated in the token; and
a speech recognition module (224) configured to generate, a hypothesis for speech recognition by examining the best token at a valid network exit point after a predetermined time and tracing back the most likely state sequence and time boundary.
7. The system (102) according to claim 6 wherein the speech acquisition module (210) comprises one or more voice capture devices to capture a speech signal.
8. The system (102) according to claim 6 wherein the speech acquisition module (210) further comprises at least one analog-to-digital converter for converting the speech signal to a digital signal.
9. The system (102) according to claim 6 wherein the front end processing module (214) is further configured to extract, a plurality of spectral features from each of the plurality of frames using Fast Fourier Transform (FFT), wherein the plurality of spectral features comprise a magnitude spectral feature.
10. The system (102) according to claim 9 wherein the front end processing module (214) is further configured to extract one or more Mel Frequency Cepstral Coefficients (MFCC) by performing Mel filter bank analysis, applying a logarithm, and applying a discrete cosine transform (DCT) on the magnitude spectral feature, wherein the one or more MFCCs are used for acoustic model generation.
11. The system (102) according to claim 6 wherein the language model generation module (218) is further configured to generate the at least one phoneme based bigram model by replacing each word in a text corpus with a pronunciation unit corresponding to the word.
Description: As attached.
| # | Name | Date |
|---|---|---|
| 1 | 201621010685-IntimationOfGrant24-11-2023.pdf | 2023-11-24 |
| 2 | Form 5 [29-03-2016(online)].pdf | 2016-03-29 |
| 3 | Form 3 [29-03-2016(online)].pdf | 2016-03-29 |
| 4 | 201621010685-PatentCertificate24-11-2023.pdf | 2023-11-24 |
| 5 | Form 18 [29-03-2016(online)].pdf | 2016-03-29 |
| 6 | 201621010685-CLAIMS [07-02-2020(online)].pdf | 2020-02-07 |
| 7 | Drawing [29-03-2016(online)].pdf | 2016-03-29 |
| 8 | 201621010685-DRAWING [07-02-2020(online)].pdf | 2020-02-07 |
| 9 | Description(Complete) [29-03-2016(online)].pdf | 2016-03-29 |
| 10 | 201621010685-FER_SER_REPLY [07-02-2020(online)].pdf | 2020-02-07 |
| 11 | 201621010685-POWER OF ATTORNEY-(02-5-2016).pdf | 2018-08-11 |
| 12 | 201621010685-FORM 1.pdf | 2019-08-14 |
| 13 | 201621010685-Form 1-220416.pdf | 2018-08-11 |
| 14 | 201621010685-FER.pdf | 2019-08-09 |
| 15 | 201621010685-Correspondence-220416.pdf | 2018-08-11 |
| 16 | 201621010685-CORRESPONDENCE-(02-5-2016).pdf | 2018-08-11 |
| 17 | 201621010685_09-08-2019.pdf | |