Modified Cascaded Median Based Noise Estimation For Speech Enhancement

Abstract: Speech enhancement in everyday devices is attracting growing attention from the research community. This speech enhancement approach proposes a modified cascaded median (MCM) method for estimating the noise spectrum from the degraded speech signal. The estimate is used with the spectral subtraction method in a single-channel speech enhancement system. The system is implemented on the MATLAB platform using the Simulink library. The results obtained from the model are evaluated using the PESQ score and segmented SNR (SSNR) values. The PESQ score of the recovered (enhanced) speech signals is higher than that of the observed speech signals at each SNR level: 1, 3, 5, 7, 9, 11, 13 and 15 dB. In terms of SSNR, the MCM method performs better than the cascaded median (CM) method. Under different noise conditions, the MCM method also performs well on both PESQ score and SSNR. 5 Claims & 6 Figures

Patent Information

Application #

Filing Date

04 October 2023

Publication Number

42/2023

Publication Type

INA

Invention Field

ELECTRONICS

Status

Parent Application

Applicants

MLR Institute of Technology

Laxman Reddy Avenue, Dundigal-500043

Inventors

1. Dr. B.Sridhar

Department of Electronics and Communication Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043

2. Dr. Manoj Kumar

Department of Electronics and Communication Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043

3. Dr. SVS Prasad

Department of Electronics and Communication Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043

4. Mr. A.Sudhakar

Department of Electronics and Communication Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal-500043

Specification

Description: Field of Invention
The present invention relates to speech signal processing, in particular to speech enhancement, echo cancellation, and pre-processing for any system or device that works in a noisy environment. In this invention, a novel noise estimation method is proposed for obtaining the noise spectrum in speech enhancement systems.
The objectives of this invention
This invention aims to enhance the degraded speech signal to improve the performance of systems that operate in noisy environments or real-time scenarios, for example hearing aid devices, automatic speech recognition systems, human-machine communication (HMC), humanoids, etc. In these systems, speech enhancement methods are deployed at the pre-processing stage to reduce the noise present in the input speech signal.
Background of the invention
In speech processing, speech enhancement techniques are used as a pre-processing step to remove noise with little speech distortion. Because of continually increasing noise pollution, our living spaces are becoming noisier, and speech-processing systems are disturbed by this noise, which the microphone picks up together with the speech signal. Several speech enhancement techniques are available for recovering the desired speech signal from degraded speech signals, such as Wiener filtering (Jaiswal et al., International Journal of Speech Technology, Vol. 25, No. 3, pp. 745-758, 2022), MMSE STSA (Kumar, Bittu, International Journal of Speech Technology, Vol. 21, No. 4, pp. 1033-1044, 2018), spectral subtraction (Bharti et al., 2016 3rd International Conference on Recent Advances in Information Technology (RAIT), IEEE, 2016), the signal subspace approach, and source separation. Among these, the spectral subtraction method has received a great deal of attention over the last few decades. In spectral subtraction, the original speech signal is recovered by subtracting the estimated noise from the degraded speech signal. If the noise present in the observed noisy speech signal is estimated accurately, the spectral subtraction method can give a better-quality speech signal. The choice of noise estimation method is therefore important for any speech enhancement technique.
A survey of the literature on noise estimation shows that researchers have devoted considerable effort to quantile-based noise estimation for speech enhancement. In the quantile-based noise estimation method (V. Stahl, A. Fischer, and R. Bippus, “Quantile based noise estimation for spectral subtraction and Wiener filtering”, in Proc. IEEE Int. Conf. Acoust., Speech, and Sig. Proc. ICASSP’00, 2000), Stahl et al. observe that, in a particular frequency bin, most frames (80-90%) of degraded speech carry a low energy level that is very close to the noise energy, and only a small number of frames (10-20%) contain a high energy level corresponding to the speech signal. Based on this observation, noise samples are obtained as quantile values of the observed degraded speech signal. Many sorting operations are needed to find the sample value at the chosen quantile in every frequency bin. In the cascaded median (CM) based noise estimation method (Santosh K. Waddi, et al., “Speech Enhancement Using Spectral Subtraction and Cascaded-Median Based Noise Estimation for Hearing Impaired Listeners”, Communications (NCC), 2013 National Conference on, IEEE, 2013), Waddi et al. reduce the number of sorting operations, which lowers computational complexity and storage requirements. To improve speech quality, Kumar (Kumar, Bittu, Fluctuation and Noise Letters, Vol. 18, No. 04, 1950020, 2019) proposed a noise estimation method that is a modified version of cascaded median based noise estimation (the MCM-based method), although it has some storage requirement. A speech signal (or noisy speech signal) is non-stationary, so it must first be segmented into frames, each of which can be considered stationary over its duration. In the CM-based method, the first-stage median is calculated from samples of multiple frames to estimate the noise spectrum.
Here, the authors did not consider the case in which the sample values of successive frames vary widely, which results in a large estimated noise value that is not the true value of the noise. In the proposed method, i.e., MCM, the median is calculated from the samples within a frame; this is the modification made to the CM-based method. The improved performance of the proposed method is due to this modification, which takes into account the case of large variation of sample values across frames.
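The difference between the two first-stage medians can be illustrated with a small NumPy sketch. The array layout (rows are frames, columns are frequency bins), the function names, and the toy values are assumptions made for illustration only; they are not taken from the cited works.

```python
import numpy as np

def across_frame_median(mag):
    """CM-style first stage (assumed layout): one median per frequency bin, across frames."""
    return np.median(mag, axis=0)

def within_frame_median(mag):
    """MCM-style first stage (assumed layout): one median per frame, across its bins."""
    return np.median(mag, axis=1)

# Toy magnitude spectra: 3 frames x 5 bins. The third frame is much louder.
mag = np.array([
    [1.0, 1.2, 0.9, 1.1, 1.0],
    [1.1, 0.8, 1.0, 1.2, 0.9],
    [9.0, 8.5, 9.5, 9.2, 8.8],
])

per_bin = across_frame_median(mag)    # shape (5,): mixes samples from different frames
per_frame = within_frame_median(mag)  # shape (3,): each value depends on one frame only
```

The sketch only shows which axis each median is taken over; it does not by itself reproduce the performance difference reported in this specification.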
Details of Prior Art
Several applications of speech enhancement, such as hands-free devices, hearing aids, and speech-to-speech communication, are frequently deployed on real-time devices. Noise suppression and speech enhancement algorithms are used to improve speech quality and reduce noise. In one approach, a speech filter processes the degraded speech signal picked up by the microphone and outputs the filtered sound. It is additionally set up to change the frequency and volume of the auditory input. The enhanced speech component is amplified and output as audible sound by the speaker (US7191127B2). In a voiced speech signal, the amplitudes and time periods may be non-uniform, which adversely affects intelligibility. In this methodology, a few successive frames of the speech spectrum are processed so that every part of the spectrum has a substantially uniform period and improved intelligibility. In some examples, the processing may be done so that each processed section also has roughly uniform peak amplitudes. The improved voiced waveform obtained by the enhancement technique may be used in conjunction with approaches for handling unvoiced speech waveforms to improve their intelligibility (US4468804A). Another method estimates the noise power spectrum of an observed noisy signal in the following steps: (a) receiving and storing the noisy signal for T time frames; (b) for a frequency k, sorting the frames and obtaining a noise estimate, calculated as a quantile; and (c) applying a recursive function to the noise estimate, the recursive function being arranged to reduce fluctuations in the noise estimate. The same document also describes a way of saving processing power by updating only odd frequency bands in one time frame and only even frequency bands in the next time frame (GB2426167A).
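As a rough illustration of the quantile-plus-recursive-smoothing idea summarized above, the following NumPy sketch takes a per-bin quantile over T stored frames and then smooths successive estimates with a first-order recursion. The quantile value `q`, the smoothing factor `alpha`, the window length, and all function names are assumptions chosen for illustration; they are not taken from GB2426167A.

```python
import numpy as np

def quantile_noise_estimate(power_frames, q=0.5):
    """Per-bin q-quantile over stored frames (rows = frames, columns = bins)."""
    return np.quantile(power_frames, q, axis=0)

def smooth_estimates(estimates, alpha=0.9):
    """First-order recursion: out[t] = alpha * out[t-1] + (1 - alpha) * estimates[t]."""
    out = np.empty_like(estimates, dtype=float)
    out[0] = estimates[0]
    for t in range(1, len(estimates)):
        out[t] = alpha * out[t - 1] + (1.0 - alpha) * estimates[t]
    return out

# Hypothetical usage on synthetic power spectra (40 frames x 8 bins).
rng = np.random.default_rng(0)
power = rng.exponential(1.0, size=(40, 8))
T = 10  # number of stored frames per estimate (assumed)
estimates = np.stack(
    [quantile_noise_estimate(power[t - T:t]) for t in range(T, len(power) + 1)]
)
smoothed = smooth_estimates(estimates)
```

A larger `alpha` gives a steadier but slower-reacting estimate; the odd/even band update scheme mentioned in the text is not modelled here.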
In some embodiments, a method comprises: dividing, using at least one processor, an audio input into speech and non-speech segments; for each frame in each non-speech segment, estimating, using at least one processor, a time-varying noise spectrum of the non-speech segment; for each frame in each speech segment, estimating, using the at least one processor, speech spectrum of the speech segment; for each frame in each speech segment, identifying one or more non-speech frequency components in the speech spectrum; comparing the one or more non-speech frequency components with one or more corresponding frequency components in a plurality of estimated noise spectra and selecting the estimated noise spectrum from the plurality of estimated noise spectra based on a result of the comparing. (WO2022066590A1).
Summary of Invention
This patent discusses a noise estimation method for speech enhancement systems commonly used in hearing aids and speech-to-speech communication systems. The proposed method, i.e., the modified cascaded median based noise estimation method, gives a good-quality speech signal with less computation time and memory. In terms of the mean PESQ score and mean segmented signal-to-noise ratio (SSNR), its performance is better than that of the existing method. The Mean Opinion Score (MOS) is also calculated using a listening test for the enhanced speech signal, and it is observed to be very close to the listening comfort of the original speech signal. The modified cascaded median based noise estimation can therefore be utilized for enhancement purposes in hearing aid devices.
Detailed description of the invention
The block diagram of the speech enhancement approach employing the MCM-based noise estimation method is shown in Fig. 1. Only the noisy speech signal is used as input in this method to recover the original speech signal. The observed noisy speech signal, which contains background noise and the true speech, first undergoes a Fast Fourier Transform (FFT) to produce a noisy speech spectrum. The spectral subtraction method uses the magnitude of the noisy speech spectrum to recover the original speech spectrum: the true speech spectrum is estimated by subtracting the noise spectrum from the observed speech spectrum. A complex spectrum is produced by combining the resulting enhanced speech spectrum with the phase spectrum of the noisy speech signal. To return this spectrum to the time domain, the Inverse Fast Fourier Transform (IFFT) is applied. In this process, it is assumed that speech and noise are uncorrelated and that the phase spectrum of the noisy speech is independent of the noise.
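The processing chain described above (FFT, magnitude subtraction, recombination with the noisy phase, IFFT) can be sketched for a single frame as follows. This is a minimal illustration rather than the Simulink model of the invention: the function name, the use of the real-input FFT (`rfft`), and the clipping of negative magnitudes to a `floor` value are assumptions not stated in the text.

```python
import numpy as np

def spectral_subtract_frame(noisy_frame, noise_mag, floor=0.0):
    """Enhance one time-domain frame given a noise magnitude estimate (scalar or per bin)."""
    spectrum = np.fft.rfft(noisy_frame)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # Subtract the noise magnitude; clip so the result never goes below `floor` (assumed rule).
    enhanced_mag = np.maximum(magnitude - noise_mag, floor)
    # Recombine with the phase of the noisy speech and return to the time domain.
    enhanced_spectrum = enhanced_mag * np.exp(1j * phase)
    return np.fft.irfft(enhanced_spectrum, n=len(noisy_frame))

# Hypothetical usage: a 256-sample tone with a small flat noise magnitude removed.
frame = np.sin(2 * np.pi * 0.05 * np.arange(256))
enhanced = spectral_subtract_frame(frame, noise_mag=0.5)
```

With `noise_mag=0` the frame is returned unchanged, which is a quick sanity check that the phase recombination and inverse transform are consistent.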
The sampled degraded speech signal s(n), which is a combination of additive noise e(n) and clean speech signal y(n), is used to estimate the noise spectrum and is shown in equation (1).
s(n)= y(n)+ e(n) (1)
Here, it is assumed that there is no correlation between y(n) and e(n). A window function is used to first partition the degraded speech signal s(n) into distinct frames. Additionally, by using Fast Fourier Transform (FFT), these frames are converted into the frequency domain, and equation (1) can be put in equation form (2).
S_i (k)= Y_i (k)+ E_i (k) (2)
where the clean speech signal's spectrum is represented by Y_i (k) and the noise spectrum is represented by E_i (k). Here, "i" stands for the frame index and "k" stands for the frequency bin index (0, 1, 2, ..., N-1). Using the MCM-based noise estimation approach, the noise spectrum is estimated from the magnitude spectrum of the degraded speech signal. The following are the steps that make up the MCM-based noise estimation approach.
Step 1: Take the input S_i(k) and arrange it in the form of a 2-D array.
S_i(k) = [S_1(k), S_2(k), S_3(k), \ldots, S_R(k)]^{T} \quad (3)
where i = 1, 2, 3, \ldots, R and S_1(k), S_2(k), S_3(k), \ldots, S_R(k) \in \mathbb{R}^{1 \times f}.
Step 2: Calculate the ensemble median M_i of S_i(k) using
M_i = [M_1, M_2, M_3, M_4, \ldots, M_R]^{T} \quad (4)
where M_1 = \operatorname{median}\{S_1(k)\}, M_2 = \operatorname{median}\{S_2(k)\}, \ldots, M_R = \operatorname{median}\{S_R(k)\}.
Step 3: Find the median value M_R from M_i using
M_R = \operatorname{median}\{M_i\} \quad (5)
Step 4: Repeat Steps 1, 2 and 3 for the S number of stages.
The final median value will be:
M_S = \operatorname{median}\{M_R\} \quad (6)
Step 5: Estimate the noise spectrum E_i(k) using
E_i(k) = M_S U \quad (7)
where U = [1\ 1\ 1\ 1\ 1\ \ldots]^{T} \in \mathbb{R}^{1 \times f}.
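Steps 1 to 5 can be sketched in NumPy as shown below. This is one possible reading of the steps, not the authors' implementation: it assumes that "repeat for the S number of stages" means processing S consecutive, non-overlapping blocks of R frames each, and the function name and array layout (rows are frames, columns are the f frequency bins) are likewise assumptions.

```python
import numpy as np

def mcm_noise_estimate(mag_frames, R):
    """Flat noise spectrum from magnitude spectra (rows = frames, columns = f bins)."""
    n_frames, f = mag_frames.shape
    n_blocks = n_frames // R  # assumed meaning of the S repetitions of Steps 1-3
    if n_blocks == 0:
        raise ValueError("need at least R frames")
    block_medians = []
    for b in range(n_blocks):
        block = mag_frames[b * R:(b + 1) * R]            # Step 1: R x f array
        frame_medians = np.median(block, axis=1)          # Step 2: M_i, one median per frame
        block_medians.append(np.median(frame_medians))    # Step 3: M_R for this block
    m_s = np.median(block_medians)                        # Step 4: M_S over all blocks
    return m_s * np.ones(f)                               # Step 5: E(k) = M_S * U

# Hypothetical usage: 9 frames of level 1.0 with one loud frame of level 10.0.
mag = np.ones((9, 4))
mag[4] = 10.0
noise = mcm_noise_estimate(mag, R=3)
```

Because the final value is a scalar replicated over all bins, the estimated noise spectrum is flat across frequency, which matches equation (7) as written.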
Brief description of Drawing
The figures illustrate exemplary embodiments of the invention.
Figure 1 Speech Enhancement System.
Figure 2 Modified Cascaded Median based Noise Estimation Method.
Figure 3 Performance of MCM in terms of PESQ Score with different SNR Levels.
Figure 4 Performance of MCM in terms of SSNR with different SNR Levels.
Figure 5 Performance of MCM in terms of PESQ Score with different Noises.
Figure 6 Performance of MCM in terms of SSNR with different Noises.

Detailed description of the drawing
As described above, the present invention relates to MCM-based noise estimation for a speech enhancement system.
Figure 1 shows the block diagram of the speech enhancement technique based on MCM noise estimation. In this technique, only the noisy speech signal is used as input for recovering the original speech signal. The observed noisy speech signal first passes through a Fast Fourier Transform (FFT), which transforms it into a noisy speech spectrum. To obtain the original speech spectrum, the spectral subtraction method uses the magnitude of the noisy speech spectrum. The MCM-based method is used to estimate the noise spectrum. Next, the resulting enhanced speech spectrum and the phase spectrum of the noisy speech signal are combined into a complex spectrum. To return this spectrum to the time domain, the Inverse Fast Fourier Transform (IFFT) is applied.
The block diagram for the MCM-based noise estimation approach is shown in Figure 2. The median blocks are staged in a cascading manner according to the MCM approach. Degraded spectra are kept in the first stage, while the subsequent stages each store an ensemble of the median of the degraded spectra. Assuming that "R" recent frames are present in the first stage, "R" numbers of median values are obtained and stored in the second stage utilising the median calculation from each frame's samples. Once more, a median value is derived from the 'R' ensemble medians in the second stage, and the subsequent stage stores this median (3rd stage). The third stage is entirely filled with median values following 'R' repetitions of this operation. The fourth stage's median is once more calculated, and the results are accumulated in the fifth stage. In the final step, a single median is obtained and multiplied by a unity array to create a frame. Finally, a frame-based spectrum is identified, and this spectrum is applied to the speech enhancement process as a noise spectrum.
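The cascading of median stages described above can also be sketched as a streaming structure in which each stage collects R values, passes their median to the next stage, and then starts refilling. The class name, the block-wise clearing of each stage, and the way stages are counted are assumptions for illustration; Figure 2 may organize the buffers differently.

```python
import numpy as np

class CascadedMedian:
    """Streaming cascade: each stage holds up to R values and passes its median on."""

    def __init__(self, R, n_stages):
        self.R = R
        self.buffers = [[] for _ in range(n_stages)]
        self.level = None  # most recent output of the last stage

    def push_frame(self, mag_frame):
        """Feed one magnitude-spectrum frame; return the current noise level (or None)."""
        value = float(np.median(mag_frame))  # within-frame median
        for buf in self.buffers:
            buf.append(value)
            if len(buf) < self.R:
                return self.level            # stage not full yet, nothing to pass on
            value = float(np.median(buf))    # stage full: take its median
            buf.clear()                      # and start refilling this stage
        self.level = value                   # the last stage produced a new output
        return self.level

    def noise_spectrum(self, n_bins):
        """Replicate the scalar level over n_bins (the unity-array step)."""
        if self.level is None:
            raise RuntimeError("cascade has not produced an estimate yet")
        return self.level * np.ones(n_bins)

# Hypothetical usage: R = 3 and two stages, so 9 frames are needed for the first output.
cascade = CascadedMedian(R=3, n_stages=2)
levels = [cascade.push_frame(np.full(4, v)) for v in [1, 1, 1, 1, 10, 1, 1, 1, 1]]
noise = cascade.noise_spectrum(4)
```

Each stage stores at most R scalars, so the memory needed grows with the number of stages rather than with the total number of frames covered by the estimate.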
In interpreting the simulation results with objective measures, enhanced speech signals obtained with three noise estimation methods have been considered: cascaded median (CM), modified cascaded median (MCM), and dynamic quantile tracking (DQT). The noisy speech signals in this case differ in SNR value and are corrupted with pink noise. The comparison results for the CM, MCM, and DQT techniques in terms of PESQ score and SSNR are displayed in Figs. 3 and 4. As shown in Fig. 3, the PESQ score of the enhanced speech signals for MCM-based noise estimation is greater than the scores for the other two methods at all SNR levels, and the scores for the other two methods are equal. Figure 4 shows that, at all SNR levels, the mean SSNR of the enhanced speech signal for the MCM-based noise estimation approach lies between the values obtained with the CM and DQT methods.
Figure 5 displays the mean PESQ score for the cascaded median, modified cascaded median, and dynamic quantile tracking methods for a range of noises (burst, f16, factory, pink, Volvo, and white noise). This figure shows that the PESQ score of the enhanced speech signals for the MCM technique is higher than for the other approaches; however, the DQT method performs well for particular noises, such as factory and Volvo noise. Figure 6 shows further comparison results for the CM, MCM, and DQT techniques in terms of SSNR across the same noises (burst, f16, factory, pink, Volvo, and white noise). The enhanced speech signal of the MCM-based technique exhibits good SSNR performance, especially for speech corrupted by pink noise and white noise, and the MCM-based method performs better than CM in all noisy cases.
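For readers who want to reproduce the SSNR measure, a common formulation (often called segmental SNR) averages the per-frame SNR in dB between the clean reference and the enhanced signal. The sketch below follows that common formulation; the frame length and the clipping limits of -10 dB and 35 dB are conventional choices assumed here, and the specification does not state which settings were used for Figures 4 and 6.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal."""
    n = min(len(clean), len(enhanced)) // frame_len * frame_len
    if n == 0:
        raise ValueError("signals are shorter than one frame")
    c = np.asarray(clean[:n], dtype=float).reshape(-1, frame_len)
    e = np.asarray(enhanced[:n], dtype=float).reshape(-1, frame_len)
    eps = 1e-12  # avoids division by zero and the log of zero
    signal_energy = np.sum(c ** 2, axis=1)
    error_energy = np.sum((c - e) ** 2, axis=1)
    snr_db = 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))
    # Clip each frame's SNR to [lo, hi] before averaging (assumed convention).
    return float(np.mean(np.clip(snr_db, lo, hi)))

# Hypothetical usage: a tone and a copy with a 10 % amplitude error.
clean = np.sin(2 * np.pi * 0.05 * np.arange(1024))
score = segmental_snr(clean, 1.1 * clean)
```

PESQ is not sketched here because it is defined by a standardized algorithm rather than a short closed-form expression.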
5 Claims & 6 Figures
Claims: The scope of the invention is defined by the following claims:

Claims:
1. A Modified Cascaded Median (MCM) based method for the estimation of noise is used to improve the speech quality of speech-to-speech communication for hearing aid devices.
a) For estimating the noise spectrum, a novel method, i.e., the MCM method, is reported for the application of speech enhancement systems. Here, the median is used to obtain the noise.
b) Using the estimated median values, the noise spectrum is obtained from the degraded speech spectrum with the help of a cascaded construction.
c) Next, with the help of the spectral subtraction method, the improved or enhanced speech signal is produced using the degraded speech and the estimated noise spectrum.
2. As per Claim 1, the quality of the speech signal in terms of PESQ has been obtained using different noisy or degraded speech signals.
3. As per Claim 1, the testing of the obtained results has been carried out at various SNR levels: 1, 3, 5, 7, 9, 11, 13 and 15 dB.
4. As per Claim 1, various speech signals with different combinations of words or sentences have been taken as input signals degraded by different noises (burst, f16, factory, pink, Volvo and white noise) for evaluating the performance of the MCM-based speech enhancement system.
5. As mentioned in Claim 1, the quality of the enhanced speech signal is also analysed in terms of segmented Signal to Noise Ratio.

Documents

Application Documents

#	Name	Date
1	202341066361-REQUEST FOR EARLY PUBLICATION(FORM-9) [04-10-2023(online)].pdf	2023-10-04
2	202341066361-FORM-9 [04-10-2023(online)].pdf	2023-10-04
3	202341066361-FORM FOR STARTUP [04-10-2023(online)].pdf	2023-10-04
4	202341066361-FORM FOR SMALL ENTITY(FORM-28) [04-10-2023(online)].pdf	2023-10-04
5	202341066361-FORM 1 [04-10-2023(online)].pdf	2023-10-04
6	202341066361-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [04-10-2023(online)].pdf	2023-10-04
7	202341066361-EVIDENCE FOR REGISTRATION UNDER SSI [04-10-2023(online)].pdf	2023-10-04
8	202341066361-EDUCATIONAL INSTITUTION(S) [04-10-2023(online)].pdf	2023-10-04
9	202341066361-DRAWINGS [04-10-2023(online)].pdf	2023-10-04
10	202341066361-COMPLETE SPECIFICATION [04-10-2023(online)].pdf	2023-10-04