Abstract: This invention, “Speech Enhancement using Wavelet Transform”, concerns wavelet-based enhancement of noisy speech. The wavelet transform has been used intensively in various fields of signal processing. It has the advantage of using variable-size time windows for different frequency bands. The tunable Q-factor wavelet transform (TQWT) is a novel method employed for the speech enhancement (SE) task. However, in TQWT the controlling parameters, the Q-factor (Q) and the level of decomposition (J), are kept constant across different noise conditions, which degrades the overall SE performance. Generally, SE performance is measured in terms of quality and intelligibility. However, it has been reported that these two measures do not always correlate with each other because of the distortions introduced by SE algorithms. Multiple sources of interference and a low signal-to-interference ratio are two major challenges for speech-based intelligent driver-assistance systems, and they seriously affect the performance of voice control commands. To address this problem, the invention proposes a speech enhancement method based on wavelet analysis and blind source separation for a complicated automobile environment. The system uses a novel thresholding algorithm and introduces a new method for threshold selection. Moreover, the efficiency of the system is increased by selecting more suitable parameters for voiced, unvoiced, and silent regions separately. The proposed system has been evaluated on different sentences under various noise conditions. The results show an improvement in performance in comparison with similar approaches.
Description:
1. FIELD OF THE INVENTION
[1] This invention relates to speech enhancement using the wavelet transform.
2. BACKGROUND OF THE INVENTION
[2] In many speech processing applications, speech has to be processed in the presence of undesirable background noise, which creates a need for front-end speech enhancement. Over the last decades, various approaches to the problem have been adopted. Generally, these approaches can be classified into two major categories: single-microphone and multi-microphone methods.
[3] Despite its capability of removing background noise, spectral subtraction introduces additional artifacts known as musical noise and faces difficulties in pause detection. In recent years, several alternative approaches such as extended and iterative Wiener filtering, HMM-based algorithms [1,2] and signal subspace methods [3] have been proposed for enhancing degraded speech.
[4] Problems in applying wavelet thresholding to speech signals: Some major problems arise when the basic wavelet thresholding method is applied to a complex signal such as speech degraded by real-life noise.
[5] Long pauses (lasting a few hundred milliseconds) occur naturally in human speech and can be used to obtain an estimate of the noise profile. Since the noise dynamics are often much slower (at least ten times) than the speech dynamics, we assume that the estimated profile remains constant between two consecutive long pauses.
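The pause-based noise estimate described above can be sketched as follows. This is only an illustration under stated assumptions: pauses are detected with a simple frame-energy threshold, and the names `track_noise_power`, `pause_threshold` and `min_pause_frames` are hypothetical and do not come from the specification.

```python
def track_noise_power(energies, pause_threshold, min_pause_frames=3):
    """Per-frame noise-power estimate, refreshed only during long pauses.

    A run of at least `min_pause_frames` low-energy frames is treated as a
    long pause (assumed detector). Between pauses the last estimate is held
    constant, mirroring the assumption that noise changes slowly.
    """
    estimate = None   # no estimate until the first long pause is seen
    run = []          # energies of the current low-energy run
    track = []
    for energy in energies:
        if energy < pause_threshold:
            run.append(energy)
            if len(run) >= min_pause_frames:
                estimate = sum(run) / len(run)
        else:
            run = []  # speech frame: end the current run, keep the estimate
        track.append(estimate)
    return track


# Hypothetical frame energies: pause, speech, pause, speech.
energies = [0.010, 0.012, 0.011, 0.5, 0.6, 0.4, 0.020, 0.021, 0.019, 0.7]
track = track_noise_power(energies, pause_threshold=0.05)
```

With frames of a few tens of milliseconds, a larger `min_pause_frames` would be needed to match pauses lasting a few hundred milliseconds.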
[6] Resonance-Based Signal Decomposition and Speech Signal Processing: The speech signal is non-stationary, but its properties remain approximately constant over short intervals of 10–30 ms. By applying the short-time Fourier transform (STFT), speech signals can be represented as a sum of sinusoids or complex exponentials. The STFT is mostly used with a sliding window for analysis and synthesis: the processed signal is reconstructed with an inverse transform and the overlap-add method. The time-varying properties of speech can thus be extracted using the STFT with a sliding window.
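The sliding-window analysis and overlap-add synthesis can be illustrated with a minimal sketch. It assumes a periodic Hann window with 50% overlap and a 128-sample frame (16 ms at an assumed 8 kHz sampling rate). The function names are illustrative, and the naive DFT is used only to keep the example self-contained; a practical system would use an FFT library.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window; copies shifted by n_fft / 2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(frame):
    """Naive O(N^2) forward DFT of one frame."""
    n_fft = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft))
            for k in range(n_fft)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n_fft = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                for k in range(n_fft)).real / n_fft
            for n in range(n_fft)]


def stft(signal, n_fft=128):
    """Window the signal into frames at 50% overlap and transform each one."""
    hop = n_fft // 2
    window = hann(n_fft)
    return [dft([signal[start + n] * window[n] for n in range(n_fft)])
            for start in range(0, len(signal) - n_fft + 1, hop)]


def istft(frames, n_fft=128):
    """Inverse-transform each frame and overlap-add at the same hop size."""
    hop = n_fft // 2
    out = [0.0] * (hop * (len(frames) - 1) + n_fft)
    for m, spectrum in enumerate(frames):
        for n, value in enumerate(idft(spectrum)):
            out[m * hop + n] += value
    return out


# Toy input: a 440 Hz tone at 8 kHz. Any enhancement step would modify the
# spectra between stft() and istft(); here they are passed through unchanged.
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(512)]
rebuilt = istft(stft(signal))
```

Samples covered by two overlapping frames are reconstructed exactly (up to rounding), because the shifted windows sum to one. The first and last half-frames are covered by a single window and are therefore attenuated.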
[7] Particle swarm optimization (PSO) is a population-based stochastic optimization technique inspired by the social behavior of bird flocking. PSO has few parameters to adjust and is hence easier to implement than the genetic algorithm.
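A minimal single-objective PSO is sketched below to show how few parameters the method needs. The inertia weight, the acceleration constants `c1` and `c2`, the swarm size and the test function `sphere` are illustrative assumptions, not values taken from the specification.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box given as [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def sphere(x):
    """Toy objective with its minimum at (1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


best, value = pso(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

To maximise a score such as speech quality, the objective can simply return the negated score.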
[8] Maximization of Speech Quality (Of1): Speech quality is a subjective parameter that mainly depends on the listener’s opinion of the processed speech signals. The quality can be assessed by subjective as well as objective evaluation measures.
[9] Maximization of Speech Intelligibility (Of2): Speech intelligibility depends on the understanding of the meaning or the content of the spoken words [40]. Short-Time Objective Intelligibility (STOI) is one of the popular and effective objective intelligibility evaluation measures and has a high correlation (95%) with subjective evaluations. It is based on an intermediate intelligibility calculation over time-frequency regions of 400 ms.
3. OBJECTIVES OF THE INVENTION
1) The objective of the invention is to find the best values of Q and J for TQWT-based SE using the multi-objective particle swarm optimization (MOPSO) technique, so as to yield the best possible quality and intelligibility at the same time under different noise conditions.
2) Maximization of Speech Intelligibility (Of2): Speech intelligibility depends on the understanding of the meaning or the content of the spoken words.
3) Short-Time Objective Intelligibility (STOI) is one of the popular and effective objective intelligibility evaluation measures and has a high correlation (95%) with subjective evaluations.
4) The objective of the invention is to implement dereverberation at the first stage and Dynamic Level Control (DLC) at the last stage of SE, to provide additional SE by reducing the effect of reverberation and compensating for the difference in loudness levels between studio-recorded speech and real-life speech.
5) The objective of the invention is to perform objective and subjective evaluation of the proposed method alongside state-of-the-art SE techniques, using three standard speech data sets.
4. SUMMARY OF THE INVENTION
[11] An ANN model is implemented to predict the relationship between the selected audio features and the best values of Q and J of the TQWT algorithm obtained earlier from the MOPSO technique.
[12] The objective function of the FLANN model and its block diagram are given. The model is used to learn the relationship between the values of the selected audio features at different noise levels and the pre-calculated values of Q and J of the TQWT algorithm obtained from the MOPSO simulation.
[13] One important parameter of good speech quality is the loudness level of each spoken word. In a data set of studio recordings, the loudness level is normalized and the speakers are also trained to produce sound with equal loudness. In real-life scenarios, however, the human voice is not normalized and undergoes large loudness changes within a single sentence.
[14] To validate the results, two objective evaluation parameters and one subjective evaluation parameter are used. Quality and intelligibility are two different measures of SE performance, and their interrelationship is difficult to interpret.
[15] The TQWT algorithm is based on resonance-based signal decomposition and has been used for speech processing because of several advantages, such as the sparse representation of clean and noisy speech samples and the resemblance of its frequency responses to those of Mel-scale filter banks.
5. BRIEF DESCRIPTION OF THE DIAGRAM
Fig.1: Simple arrangement of Speech Enhancement using Wavelet Transform.
Fig.2: Particle swarm optimization (PSO)
Fig.3: Flowchart of MOPSO method implementation along with the fuzzy decision-making.
6. DESCRIPTION OF THE INVENTION
[16] Modification of the hard thresholding algorithm: As another improvement, we use a refined version of the hard thresholding function instead of its standard form. More precisely, instead of setting some wavelet coefficients to zero (which causes observable sharp time-frequency discontinuities in the speech spectrogram), we attenuate the coefficients that are smaller than the threshold value in a nonlinear manner to avoid creating abrupt changes.
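The difference between standard and refined hard thresholding can be illustrated as follows. The specification does not give the attenuation formula, so the power-law form and the exponent `gamma` below are assumptions chosen only because they shrink small coefficients smoothly and stay continuous at the threshold.

```python
import math


def hard_threshold(x, t):
    """Standard hard thresholding: coefficients below the threshold become 0."""
    return x if abs(x) >= t else 0.0


def modified_hard_threshold(x, t, gamma=3.0):
    """Keep |x| >= t unchanged; attenuate smaller coefficients nonlinearly.

    Assumed form: sign(x) * |x|**gamma / t**(gamma - 1). It equals x at
    |x| == t, so there is no jump at the threshold, and tends to 0 as x -> 0.
    """
    if abs(x) >= t:
        return x
    return math.copysign(abs(x) ** gamma / t ** (gamma - 1.0), x)


# Hypothetical wavelet coefficients and a threshold of 1.0.
coeffs = [-2.0, -0.8, -0.1, 0.0, 0.4, 1.0, 1.5]
zeroed = [hard_threshold(c, 1.0) for c in coeffs]
smoothed = [modified_hard_threshold(c, 1.0) for c in coeffs]
```

A larger `gamma` pushes the function closer to standard hard thresholding, while `gamma = 1` leaves the coefficients unchanged.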
[17] The output of MOPSO is a group of optimal solutions rather than a single solution. These solutions are normally categorized as dominated or non-dominated. The non-dominated solutions are called Pareto-optimal solutions and lie on the Pareto front.
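The split into dominated and non-dominated solutions can be sketched with a short filter. Both objectives are assumed to be maximised, and the (quality, intelligibility) pairs are made-up numbers used purely for illustration.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective and strictly
    better in at least one (all objectives are maximised)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (quality score, intelligibility score) pairs.
candidates = [(2.1, 0.70), (2.4, 0.65), (2.0, 0.60), (2.6, 0.55), (2.3, 0.62)]
front = pareto_front(candidates)
```

In this toy set, (2.0, 0.60) is dominated by (2.1, 0.70) and (2.3, 0.62) by (2.4, 0.65), so three candidates remain on the front.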
[18] When optimizing the TQWT parameters, there is the limitation that Q and J can take only positive integer values. Therefore, in the MOPSO implementation, the positive integer part of the values of Q and J is used in the simulation. The implementation uses noisy speech (train station and babble noise) and clean speech taken from the NOIZEUS data set.
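One possible reading of "the positive integer part" is sketched below: the absolute value of each continuous particle coordinate is truncated to an integer and clamped to at least 1. The function name `to_tqwt_params` and the upper limits `q_max` and `j_max` are hypothetical and are not specified in the text.

```python
def to_tqwt_params(q_raw, j_raw, q_max=8, j_max=30):
    """Map a continuous particle position to integer (Q, J) with Q, J >= 1.

    Assumed interpretation: take the magnitude, drop the fractional part,
    then clamp into [1, q_max] and [1, j_max] (illustrative limits).
    """
    q = min(max(int(abs(q_raw)), 1), q_max)
    j = min(max(int(abs(j_raw)), 1), j_max)
    return q, j


params = to_tqwt_params(3.7, 12.2)
```

The objective functions would then be evaluated at the projected integer pair rather than at the raw particle position.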
[19] The wavelet transform represents a given function f very efficiently by using a set of basis functions. In this invention, these basis functions are referred to as wavelet families. Some of the most well-known wavelet families are Daubechies, Biorthogonal, Coiflet and Symlet. Here, they are denoted dbN, biorNr.Nd, coifN and symN, respectively, where N indicates the order. For the Biorthogonal family, the decomposition and reconstruction wavelets have different orders, denoted Nd and Nr, respectively.
Claims:
1. The invention “Speech Enhancement using Wavelet Transform”, in which, additionally, the calculation and analysis of the Q-factor are carried out effectively.
2. According to claim 1, the invention implements dereverberation at the first stage and Dynamic Level Control (DLC) at the last stage of SE, to provide additional SE by reducing the effect of reverberation and compensating for the difference in loudness levels between studio-recorded speech and real-life speech.
3. According to claims 1 and 2, the invention provides the perspective that Short-Time Objective Intelligibility (STOI) is one of the popular and effective objective intelligibility evaluation measures and has a high correlation (95%) with subjective evaluations.
4. According to claims 1 to 3, the invention calculates the maximization of speech intelligibility (MSI), where speech intelligibility depends on the understanding of the meaning or the content of the spoken words and is assessed with the Short-Time Objective Intelligibility measure.
5. According to claims 1 to 4, noisy speech (Seg SNR = 10 dB), enhanced speech using PSS, enhanced speech using MUWCT, and the enhanced speech are calculated.
6. According to claims 1 to 5, the invention calculates mean opinion score (MOS) results across all test conditions.
7. According to claims 1 to 6, the invention demonstrates the strong initial performance of the Bionic Wavelet Transform representation; continued work in this direction is expected to lead to additional improvement in overall signal enhancement.
| # | Name | Date |
|---|---|---|
| 1 | 202341044434-STATEMENT OF UNDERTAKING (FORM 3) [03-07-2023(online)].pdf | 2023-07-03 |
| 2 | 202341044434-REQUEST FOR EARLY PUBLICATION(FORM-9) [03-07-2023(online)].pdf | 2023-07-03 |
| 3 | 202341044434-FORM-9 [03-07-2023(online)].pdf | 2023-07-03 |
| 4 | 202341044434-FORM 1 [03-07-2023(online)].pdf | 2023-07-03 |
| 5 | 202341044434-DRAWINGS [03-07-2023(online)].pdf | 2023-07-03 |
| 6 | 202341044434-DECLARATION OF INVENTORSHIP (FORM 5) [03-07-2023(online)].pdf | 2023-07-03 |
| 7 | 202341044434-COMPLETE SPECIFICATION [03-07-2023(online)].pdf | 2023-07-03 |