FORM 2
THE PATENTS ACT, 1970 (39 of 1970)
& THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
[See section 10, Rule 13]
ENCODING DEVICE AND ENCODING METHOD;
PANASONIC CORPORATION, A CORPORATION ORGANIZED AND EXISTING UNDER THE LAWS OF JAPAN, WHOSE ADDRESS IS 1006, OAZA KADOMA, KADOMA-SHI, OSAKA 5718501, JAPAN.
THE FOLLOWING SPECIFICATION
PARTICULARLY DESCRIBES THE INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED.
DESCRIPTION
Technical Field
The present invention relates to a coding apparatus and coding method for encoding speech signals and audio signals.
Background Art
In mobile communication, it is necessary to compress and encode digital information of speech and images for efficient use of radio channel capacity and storage media, and many coding and decoding schemes have been developed to date.
Among these, the performance of speech coding technology has been improved significantly by the fundamental scheme of "CELP (Code Excited Linear Prediction)," which models the vocal tract system of speech and skillfully adopts vector quantization. Further, the performance of sound coding technology such as audio coding has been improved significantly by transform coding techniques (such as MPEG-standard AAC and MP3).
On the other hand, a scalable codec, the standardization of which is in progress by ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) and others, is designed to cover from the conventional speech band (which is a band of 300 Hz to 3.4 kHz at 8 kHz sampling) to the wideband (which is a band of 50 Hz to 7 kHz at 16 kHz sampling). Further, in the standardization, it is also necessary to encode frequency band signals of an ultra wideband (which is a band of 10 Hz to 15 kHz at 32 kHz sampling). Accordingly, in a wideband codec, audio has to be encoded to a certain degree, which cannot be supported only by conventional, low-bit-rate speech coding techniques based on the human voice model such as CELP. Now, ITU-T standard G.729.1, declared earlier as a recommendation, uses an audio codec coding scheme of transform coding, to encode speech of wideband or above.
Patent Literature 1 discloses a coding scheme utilizing spectral parameters and pitch parameters, whereby signals acquired by inverse-filtering speech signals by spectral parameters are orthogonally transformed
and encoded, and, as an example of coding, further discloses a coding method based on codebooks of an algebraic structure.
Patent Literature 2 discloses a coding scheme of dividing a speech signal into linear prediction parameters and residual components, performing an orthogonal transform of the residual components, normalizing the residual waveform by the power, and then quantizing the gain and the normalized residue. Further, Patent Literature 2 discloses vector quantization as a quantization method for the normalized residue.
Non-Patent Literature 1 discloses a coding method based on an algebraic codebook for improving excitation spectra in TCX (i.e. a fundamental coding scheme modeled by filtering of an excitation subjected to transform coding and spectral parameters), and this coding method is adopted in ITU-T standard G.729.1.
Non-Patent Literature 2 discloses the "TC-WVQ" scheme. This scheme also transforms the linear prediction residue and performs vector quantization of the spectrum, using DCT (Discrete Cosine Transform) as the orthogonal transform method.
With the above four conventional techniques, upon coding, it is possible to use quantization of spectral parameters such as linear prediction parameters, which is an efficient coding element technique for speech signals, and realize efficient audio coding at a low bit rate.
Citation List

Patent Literature
PTL 1: Japanese Patent Application Laid-Open No. HEI10-260698
PTL 2: Japanese Patent Application Laid-Open No. HEI07-261800

Non-Patent Literature
NPL 1: Xie, Adoul, "Embedded Algebraic Vector Quantizers (EAVQ) with Application to Wideband Speech Coding," ICASSP '96
NPL 2: Moriya, Honda, "Transform Coding of Speech Using a Weighted Vector Quantizer," IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988
Summary of Invention

Technical Problem
However, the number of bits to be assigned is small especially in a relatively lower layer of a scalable codec, and, consequently, the performance of excitation transform coding is not sufficient. For example, in ITU-T standard G.729.1, although the bit rate is 12 kbps up to a second layer of the telephone band (300 Hz to 3.4 kHz), only 2 kbps is assigned to a third layer supporting the next wideband (50 Hz to 7 kHz). Thus, when there are few information bits, it is not possible to achieve sufficient perceptual performance by a method of encoding a spectrum acquired by an orthogonal transform, with vector quantization using a codebook.
Further, as for above G.729.1, in a scalable codec to implement extension standardization, in the same way as above, only a low bit rate of 2 kbps is assigned to an enhancement layer in which the bit rate increases from a wideband (50 Hz to 7 kHz) to an ultra wideband (10 Hz to 15 kHz). That is, despite the 8 kHz increase of the band, it is not possible to secure a sufficient bit rate.
It is therefore an object of the present invention to provide a coding apparatus and coding method that can achieve good perceptual quality even when there are few information bits.
Solution to Problem
The coding apparatus of the present invention employs a configuration having: a shape quantizing section that encodes a shape of a frequency spectrum; and a gain quantizing section that encodes a gain of the frequency spectrum, in which the shape quantizing section includes: an interval search section that searches for a first waveform in each of a plurality of bands dividing a predetermined search interval, and encodes the first waveform searched out in a predetermined band, by a smaller number of bits than other first waveforms; and a thorough search section that searches for a second waveform over the predetermined search interval, and, when the second waveform located in the predetermined band satisfies a predetermined condition, encodes a position near a position of the second waveform located in the predetermined band.
The coding method of the present invention includes: a shape quantizing step of encoding a shape of a frequency spectrum; and a gain quantizing step of encoding a gain of the frequency spectrum, in which the shape quantizing step includes: an interval search step of searching for a first waveform in each of a plurality of bands dividing a predetermined search interval, and encoding the first waveform searched out in a predetermined band, by a smaller number of bits than other first waveforms; and a thorough search step of searching for a second waveform over the predetermined search interval, and, when the second waveform located in the predetermined band satisfies a predetermined condition, encoding a position near the position of the second waveform located in the predetermined band.
Advantageous Effects of Invention
According to the present invention, it is possible to accurately encode frequency (positions) where energy is present, so that it is possible to improve qualitative performance, which is unique to spectrum coding, and provide good sound quality even at a low bit rate.
Brief Description of Drawings
FIG.1 is a block diagram showing the configuration of a speech coding apparatus according to Embodiments 1 and 2 of the present invention;
FIG.2 is a block diagram showing the configuration of a speech decoding apparatus according to Embodiments 1 and 2 of the present invention;
FIG.3 is a flowchart showing a search algorithm of an interval search section according to Embodiment 1 of the present invention;
FIG.4 shows an example of a spectrum represented by pulses searched out in an interval search section according to Embodiment 1 of the present invention;
FIG.5 is a flowchart showing a search algorithm of a thorough search section according to Embodiment 1 of the present invention;
FIG.6 is a flowchart showing a search algorithm of a thorough search section according to Embodiment 1 of the present invention;
FIG.7 shows an example of a coding result of pulse positions searched out by thorough search;
FIG.8 shows an example of a spectrum represented by pulses searched out in an interval search section and thorough search section according to Embodiment 1 of the present invention;
FIG.9 is a flowchart showing a decoding algorithm of a spectrum decoding section according to Embodiment 1 of the present invention;
FIG. 10 is a flowchart showing a search algorithm of an interval search section according to Embodiment 2 of the present invention;
FIG.11 is a flowchart showing a search algorithm of a thorough search section according to Embodiment 2 of the present invention; and
FIG. 12 is a flowchart showing a search algorithm of a thorough search section according to Embodiment 2 of the present invention.
Description of Embodiments
Human perception perceives voltage components (i.e. the signal value of a digital signal) logarithmically, and, consequently, in a case where speech signals are converted into the frequency domain and encoded, has a characteristic of having difficulty recognizing frequency accurately and perceptually in higher spectral components. For example, human perception perceives the same amount of increase (twice) between a case where the signal value increases from 10 dB to 20 dB and a case where the signal value increases from 20 dB to 40 dB. In contrast, although human perception can perceive the difference of signal values between 20 dB and 21 dB, it cannot perceive the difference between 1000 dB and 1001 dB.
The present inventors focused on this point and arrived at the present invention. That is, the present invention adopts a model of encoding a frequency spectrum by a small number of pulses, and, in coding for transforming a speech signal to be encoded (a time-series vector) into the frequency domain by an orthogonal transform, encodes a spectrum and then performs coding at a low bit rate with reduced accuracy of frequency information of high frequency components.
An embodiment of the present invention will be explained below with reference to the accompanying drawings. Here, an example case will be described with the present embodiment, using a speech coding apparatus and a speech decoding apparatus as a coding apparatus and a decoding apparatus, respectively.
FIG.1 is a block diagram showing the configuration of a speech coding apparatus according to the present embodiment. The speech coding apparatus shown in FIG.1 is provided with LPC analyzing section 101, LPC quantizing section 102, inverse filter 103, orthogonal transform section 104, spectrum coding section 105 and multiplexing section 106. Spectrum coding section 105 is provided with shape quantizing section 111 and gain quantizing section 112.
LPC analyzing section 101 performs a linear prediction analysis of an input speech signal and outputs a spectral envelope parameter to LPC quantizing section 102 as an analysis result. LPC quantizing section 102 performs quantization processing of the spectral envelope parameter (LPC: Linear Prediction Coefficient) outputted from LPC analyzing section 101, and outputs a code representing the quantized LPC to multiplexing section 106. Further, LPC quantizing section 102 outputs decoded parameters acquired by decoding the code representing the quantized LPC, to inverse filter 103. Here, the parameter quantization may adopt vector quantization ("VQ"), prediction quantization, multi-stage VQ, split VQ and other modes.
Inverse filter 103 inverse-filters input speech using the decoded parameters and outputs the resulting residual component to orthogonal transform section 104.
Orthogonal transform section 104 applies a suitable window, such as a sine window, to the residual component, performs an orthogonal transform using MDCT (Modified Discrete Cosine Transform), and outputs a spectrum transformed into the frequency domain (hereinafter "input spectrum"), to spectrum coding section 105. Here, the orthogonal transform may employ other transforms such as the FFT (Fast Fourier Transform), KLT (Karhunen-Loeve Transform) and Wavelet transform, and, although their usage varies, it is possible to transform the residual component into an input spectrum using any of these.
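As a rough illustration of this transform step (not the exact implementation of orthogonal transform section 104, whose window and frame length are not specified here), a windowed MDCT of a 160-sample residual frame into an 80-point spectrum can be sketched as follows; the sine window and frame length are illustrative assumptions.

```python
import math

def mdct(frame):
    """MDCT of a frame of 2N samples, producing N spectral coefficients.

    A sine window (an illustrative choice satisfying the Princen-Bradley
    condition) is applied before the transform.
    """
    two_n = len(frame)
    n = two_n // 2
    windowed = [x * math.sin(math.pi / two_n * (i + 0.5))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n):
        # Standard MDCT basis: cos[(pi/N)(i + 1/2 + N/2)(k + 1/2)]
        coeff = sum(windowed[i] *
                    math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                    for i in range(two_n))
        spectrum.append(coeff)
    return spectrum

# Example: a 160-sample residual frame yields an 80-sample input spectrum,
# matching the vector length used in the embodiment below.
residual = [math.sin(0.3 * i) for i in range(160)]
input_spectrum = mdct(residual)
```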
Here, the order of processing may be reversed between inverse filter 103 and orthogonal transform section 104. That is, by dividing an input speech signal subjected to orthogonal transform by the frequency spectrum of an inverse filter (i.e. subtraction on the logarithmic axis), it is possible to provide the same input spectrum.
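The equivalence noted here, division of spectral magnitudes being subtraction on the logarithmic axis, is elementary but worth making explicit:

```python
import math

# Dividing spectral magnitudes in the linear domain is the same as
# subtracting them on the logarithmic axis (illustrative magnitudes).
a, h = 8.0, 2.0                      # input spectrum and filter spectrum magnitudes
linear = a / h                       # division in the linear domain
log_sub = math.exp(math.log(a) - math.log(h))  # subtraction on the log axis
assert abs(linear - log_sub) < 1e-12
```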
Spectrum coding section 105 quantizes the spectral shape and gain of the input spectrum separately and outputs the resulting quantization codes to multiplexing section 106. Shape quantizing section 111 quantizes the shape of the input spectrum based on the positions and polarities of a small number of pulses. Here, in coding of pulse positions, shape quantizing section 111 performs coding with a saved number of bits by reducing the accuracy of position information in the higher frequency band. Gain quantizing section 112 calculates and quantizes the gain of the pulses searched out by shape quantizing section 111, on a per-band basis. Shape quantizing section 111 and gain quantizing section 112 will be described later in detail.
Multiplexing section 106 receives as input a code representing the quantized LPC from LPC quantizing section 102 and a code representing the quantized input spectrum from spectrum coding section 105, multiplexes these items of information, and outputs the result to the transmission channel as encoded information.
FIG.2 is a block diagram showing the configuration of a speech decoding apparatus according to the present embodiment. The speech decoding apparatus shown in FIG.2 is provided with demultiplexing section 201, parameter decoding section 202, spectrum decoding section 203, orthogonal transform section 204 and synthesis filter 205.
Encoded information transmitted from the speech coding apparatus of FIG.1 is received in the speech decoding apparatus of FIG.2 and demultiplexed into individual codes in demultiplexing section 201. The code representing the quantized LPC is outputted to parameter decoding section 202, and the code of the input spectrum is outputted to spectrum decoding section 203.
Parameter decoding section 202 decodes the spectral envelope parameter and outputs the resulting decoded parameter to synthesis filter 205.
Spectrum decoding section 203 decodes the shape vector and gain by a method corresponding to the coding method in spectrum coding section 105 shown in FIG.1, acquires a decoded spectrum by multiplying the decoded shape vector by the decoded gain, and outputs the decoded spectrum to orthogonal transform section 204.
Orthogonal transform section 204 transforms the decoded spectrum outputted from spectrum decoding section 203 in the opposite way to orthogonal transform section 104 shown in FIG.1, and outputs the resulting time-series decoded residual signal to synthesis filter 205.
Synthesis filter 205 provides output speech by applying a synthesis filter to the decoded residual signal outputted from orthogonal transform section 204, using the decoded parameter outputted from parameter decoding section 202.
Here, when the order of processing is reversed between inverse filter 103 and orthogonal transform section 104 shown in FIG.1, the speech decoding apparatus of FIG.2 multiplies the decoded spectrum by the frequency spectrum of the decoded parameter (i.e. addition on the logarithmic axis) and then performs an orthogonal transform of the resulting spectrum.
Next, shape quantizing section 111 and gain quantizing section 112 will be explained in detail. Shape quantizing section 111 is provided with interval search section 121 that searches for pulses in each of a plurality of bands into which a predetermined search interval is divided, and thorough search section 122 that searches for pulses over the entire search interval.
Following equation 1 provides the reference of the search. Here, in equation 1, E is the coding distortion, s_i is the input spectrum, g is the optimal gain, δ is the delta function, and p is the pulse position.

(Equation 1)
E = Σ_i (s_i − g·δ(i − p))²
From equation 1 above, the pulse position that minimizes the cost function is the position where the absolute value |s_p| of the input spectrum in each band is maximum, and the polarity is the polarity of the input spectrum value at that pulse position.
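The per-band search this implies, picking in each band the position of maximum |s[i]| and the polarity of the spectrum there, can be sketched as follows (a simplified illustration of the idea, not the exact flow of FIG.3; even band division is assumed):

```python
def interval_search(spectrum, num_bands=5):
    """For each band, return the position of max |s[i]| and its polarity.

    Sketch of the per-band pulse search implied by equation 1, assuming
    the search interval divides evenly into num_bands bands.
    """
    band_len = len(spectrum) // num_bands
    pos, pol = [], []
    for b in range(num_bands):
        start = b * band_len
        band = spectrum[start:start + band_len]
        # argmax of |s[i]| within this band
        local = max(range(band_len), key=lambda i: abs(band[i]))
        pos.append(start + local)
        pol.append(1 if band[local] >= 0 else -1)
    return pos, pol

# Example: 80-sample spectrum, five bands of sixteen samples each
s = [0.0] * 80
s[3], s[20], s[37], s[50], s[70] = 0.9, -1.2, 0.5, 0.8, -0.3
positions, polarities = interval_search(s)
```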
An example case will be explained below where the vector length of an input spectrum is eighty samples and the number of bands is five, and where the spectrum is encoded using eight pulses in total, one pulse from each band and three pulses from the entire band. In this case, the length of each band is sixteen samples. Further, the amplitude of pulses to search for is fixed to "1," and their polarity is "+" or "-."
Also, upon shape coding, the number of bits is saved by reducing the accuracy of pulse positions in the two high frequency bands. To be more specific, although the search is performed over all positions, positions in the two high frequency bands are limited to "odd-numbered" positions in decoding. Here, in a case where a pulse is already present at that position upon decoding, the pulse may be placed in an even-numbered position.
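One way to realize this reduced accuracy, assuming the encoder drops the least significant bit of the position and the decoder reconstructs the odd-numbered position (falling back to the even-numbered neighbour when occupied), is sketched below; the exact mapping used in the embodiment may differ:

```python
def encode_coarse_position(pos):
    # Drop the LSB: sixteen positions collapse to eight 3-bit codes
    return pos >> 1

def decode_coarse_position(code, occupied=()):
    # Reconstruct an odd-numbered position by default
    pos = (code << 1) + 1
    # If a pulse is already present there, place it at the
    # even-numbered neighbour instead (assumed fallback rule)
    if pos in occupied:
        pos -= 1
    return pos

# Position 6 is coded as 3 (3 bits) and decoded as the odd position 7;
# if 7 is already occupied, the pulse lands on the even position 6.
assert decode_coarse_position(encode_coarse_position(6)) == 7
assert decode_coarse_position(encode_coarse_position(6), occupied={7}) == 6
```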
Interval search section 121 searches for the position of the maximum energy and the polarity (+/-) in each band, and places one pulse per band. In this example, the number of bands is five, and showing the pulse positions requires four bits (entries of positions: sixteen) × three bands + three bits (entries of positions: eight) × two bands, plus one bit per band to show the polarity (+/-), requiring twenty-three information bits in total. Also, if the accuracy in the high frequency bands is not reduced, it requires five (bands) × (four (position) + one (polarity)) = twenty-five information bits. Therefore, according to this example, it is possible to save two bits compared to a case of not reducing the accuracy in the high frequency bands.
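The bit counts in this example follow from simple arithmetic, checked below using the band split and bit widths stated above:

```python
# Three low bands keep full accuracy: 16 positions -> 4 bits each.
# Two high bands are coarsened: 8 positions -> 3 bits each.
position_bits = 4 * 3 + 3 * 2   # 12 + 6 = 18 bits for positions
polarity_bits = 1 * 5           # one polarity bit per band
total_reduced = position_bits + polarity_bits   # 23 bits in total

# Without reducing accuracy: five bands of (4 position + 1 polarity) bits
total_full = 5 * (4 + 1)        # 25 bits
saving = total_full - total_reduced             # 2 bits saved
```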
The flow of the search algorithm of interval search section 121 is shown in FIG.3. Here, the symbols used in the flowchart of FIG.3 stand for the following.
i: position
b: band number
max: maximum value
c: counter
pos[b]: search result (position)
pol[b]: search result (polarity)
s[i]: input spectrum
As shown in FIG.3, interval search section 121 calculates the input spectrum s[i] of each sample (0
Documents

Application Documents

1. 2117-MUMNP-2010-AFR.pdf (2022-12-05)
2. Other Patent Document [05-10-2016(online)].pdf (2016-10-05)
3. abstract1.jpg (2018-08-10)
4. 2117-MUMNP-2010-AbandonedLetter.pdf (2018-10-31)
5. 2117-MUMNP-2010-POWER OF ATTORNEY(2-11-2010).pdf (2018-08-10)
6. 2117-mumnp-2010-abstract.doc (2018-08-10)
7. 2117-mumnp-2010-other document.pdf (2018-08-10)
8. 2117-mumnp-2010-abstract.pdf (2018-08-10)
9. 2117-mumnp-2010-form pct-isa-210.pdf (2018-08-10)
10. 2117-mumnp-2010-claims.pdf (2018-08-10)
11. 2117-mumnp-2010-form pct-ib-304.pdf (2018-08-10)
12. 2117-MUMNP-2010-CORRESPONDENCE(15-3-2012).pdf (2018-08-10)
13. 2117-mumnp-2010-form pct-ib-301.pdf (2018-08-10)
14. 2117-MUMNP-2010-CORRESPONDENCE(2-11-2010).pdf (2018-08-10)
15. 2117-mumnp-2010-form 5.pdf (2018-08-10)
16. 2117-MUMNP-2010-CORRESPONDENCE(2-5-2012).pdf (2018-08-10)
17. 2117-MUMNP-2010-CORRESPONDENCE(6-4-2011).pdf (2018-08-10)
18. 2117-mumnp-2010-form 3.pdf (2018-08-10)
19. 2117-mumnp-2010-correspondence.pdf (2018-08-10)
20. 2117-MUMNP-2010-FORM 3(6-4-2011).pdf (2018-08-10)
21. 2117-mumnp-2010-description(complete).pdf (2018-08-10)
22. 2117-MUMNP-2010-FORM 3(2-5-2012).pdf (2018-08-10)
23. 2117-mumnp-2010-drawing.pdf (2018-08-10)
24. 2117-mumnp-2010-form 2.pdf (2018-08-10)
25. 2117-MUMNP-2010-ENGLISH TRANSLATION(2-11-2010).pdf (2018-08-10)
26. 2117-mumnp-2010-form 2(title page).pdf (2018-08-10)
27. 2117-MUMNP-2010-FER.pdf (2018-08-10)
28. 2117-MUMNP-2010-FORM 18(15-3-2012).pdf (2018-08-10)
29. 2117-mumnp-2010-form 1.pdf (2018-08-10)
30. 2117-MUMNP-2010-FORM 1(2-11-2010).pdf (2018-08-10)
Search Strategy

1. search_strategy_18-09-2017.pdf