Specification
FORM 2
THE PATENTS ACT, 1970 (39 of 1970)
& THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
[See section 10, Rule 13]
ENCODING DEVICE, DECODING DEVICE, AND METHOD THEREOF;
PANASONIC CORPORATION, A
CORPORATION ORGANIZED AND EXISTING
UNDER THE LAWS OF JAPAN WHOSE
ADDRESS IS 1006, OAZA KADOMA, KADOMA-SHI, OSAKA, 571-8501, JAPAN
THE FOLLOWING SPECIFICATION
PARTICULARLY DESCRIBES THE
INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED.
DESCRIPTION
Technical Field
The present invention relates to an encoding apparatus/decoding apparatus and encoding method/decoding method used in a communication system in which a signal is encoded and transmitted, and received and decoded.
Background Art
When a speech/audio signal is transmitted in a mobile communication system or a packet communication system typified by Internet communication, compression/encoding technology is often used in order to increase speech/audio signal transmission efficiency. Also, in recent years, a scalable encoding/decoding method has been developed that enables a good-quality decoded signal to be obtained from part of encoded information even if a transmission error occurs during transmission.
One above-described compression/encoding technology is a time-domain predictive encoding technology that increases compression efficiency by using the temporal correlation of a speech signal and/or audio signal (hereinafter referred to as "speech/audio signal"). For example, in Patent Document 1, a current-frame signal is predicted from a past-frame signal and the predictive encoding method is switched according to the prediction error. Also, in Non-patent Document 1, a technology is described whereby a predictive encoding method is switched according to the degree of change in the time domain of a speech parameter such as LSF (Line Spectral Frequency) and the frame error occurrence state.
Patent Document 1: Japanese Patent Application Laid-Open No. HEI 8-211900
Non-patent Document 1: Thomas Eriksson, Jan Linden, and Jan Skoglund, "Exploiting Interframe Correlation in Spectral Quantization," Acoustics, Speech, and Signal Processing, 1996 (ICASSP-96) Conference Proceedings, 7-10 May 1996, pages 765-768, vol. 2
Disclosure of Invention
Problems to be Solved by the Invention
However, with any of the above technologies, predictive encoding is performed based on a time domain parameter on a frame-by-frame basis, and predictive encoding based on a non-time domain parameter such as a frequency domain parameter is not mentioned. If a predictive encoding method based on a time domain parameter such as described above is simply applied to frequency domain parameter encoding, there is no problem if a quantization target band is the same in a past frame and current frame, but if the quantization target band is different in a past frame and current frame, encoding error and decoded signal audio quality degradation increase greatly, and a speech/audio signal may not be able to be decoded.
It is an object of the present invention to provide an encoding apparatus and so forth capable of reducing the encoded information amount of a speech/audio signal, and also capable of reducing speech/audio signal encoding error and decoded signal audio quality degradation, when a frequency component of a different band is made a quantization target in each frame.
Means for Solving the Problems
An encoding apparatus of the present invention employs a configuration having: a transform section that transforms an input signal to the frequency domain to obtain a frequency domain parameter; a selection section that selects a quantization target band from among a plurality of subbands obtained by dividing the frequency domain, and generates band information indicating the quantization target band; a shape quantization section that quantizes the shape of the frequency domain parameter in the quantization target band; and a gain quantization section that encodes gain of a frequency domain parameter in the quantization target band to obtain gain encoded information.
A decoding apparatus of the present invention employs a configuration having: a receiving section that receives information indicating a quantization target band selected from among a plurality of subbands obtained by dividing a frequency domain of an input signal; a shape dequantization section that decodes shape encoded information in which the shape of a frequency domain parameter in the quantization target band is quantized, to generate a decoded shape; a gain dequantization section that decodes gain encoded information in which gain of a frequency domain parameter in the quantization target band is encoded, to generate decoded gain, and decodes a frequency domain parameter using the decoded shape and the decoded gain to generate a decoded frequency domain parameter; and a time domain transform section that transforms the decoded frequency domain parameter to the time domain to obtain a time domain decoded signal.
An encoding method of the present invention has: a step of transforming an input signal to the frequency domain to obtain a frequency domain parameter; a step of selecting a quantization target band from among a plurality of subbands obtained by dividing the frequency domain, and generating band information indicating the quantization target band; a step of quantizing the shape of the frequency domain parameter in the quantization target band to obtain shape encoded information; and a step of encoding gain of a frequency domain parameter in the quantization target band to obtain gain encoded information.
A decoding method of the present invention has: a step of receiving information indicating a quantization target band selected from among a plurality of subbands obtained by dividing a frequency domain of an input signal; a step of decoding shape encoded information in which the shape of a frequency domain parameter in the quantization target band is quantized, to generate a decoded shape; a step of decoding gain encoded information in which gain of a frequency domain parameter in the quantization target band is quantized, to generate decoded gain, and decoding a frequency domain parameter using the decoded shape and the decoded gain to generate a decoded frequency domain parameter; and a step of transforming the decoded frequency domain parameter to the time domain to obtain a time domain decoded signal.
Advantageous Effect of the Invention
The present invention reduces the encoded information amount of a speech/audio signal or the like, and also can prevent sharp quality degradation of a decoded signal, decoded speech, and so forth, and can reduce encoding error of a speech/audio signal or the like and decoded signal quality degradation.
Brief Description of Drawings
FIG.1 is a block diagram showing the main configuration of a speech encoding apparatus according to Embodiment 1 of the present invention;
FIG.2 is a drawing showing an example of the configuration of regions obtained by a band selection section according to Embodiment 1 of the present invention;
FIG.3 is a block diagram showing the main configuration of a speech decoding apparatus according to Embodiment 1 of the present invention;
FIG.4 is a block diagram showing the main configuration of a variation of a speech encoding apparatus according to Embodiment 1 of the present invention;
FIG.5 is a block diagram showing the main configuration of a variation of a speech decoding apparatus according to Embodiment 1 of the present invention;
FIG.6 is a block diagram showing the main configuration of a speech encoding apparatus according to Embodiment 2 of the present invention;
FIG.7 is a block diagram showing the main configuration of the interior of a second layer encoding section according to Embodiment 2 of the present invention;
FIG.8 is a block diagram showing the main configuration of a speech decoding apparatus according to Embodiment 2 of the present invention;
FIG.9 is a block diagram showing the main configuration of the interior of a second layer decoding section according to Embodiment 2 of the present invention;
FIG.10 is a block diagram showing the main configuration of a speech encoding apparatus according to Embodiment 3 of the present invention;
FIG.11 is a block diagram showing the main configuration of a speech decoding apparatus according to Embodiment 3 of the present invention;
FIG.12 is a block diagram showing the main configuration of a speech encoding apparatus according to Embodiment 4 of the present invention;
FIG.13 is a block diagram showing the main configuration of a speech decoding apparatus according to Embodiment 4 of the present invention;
FIG.14 is a block diagram showing the main configuration of a speech encoding apparatus according to Embodiment 5 of the present invention;
FIG.15 is a block diagram showing the main configuration of the interior of a band enhancement encoding section according to Embodiment 5 of the present invention;
FIG.16 is a block diagram showing the main
configuration of the interior of a corrective scale factor encoding section according to Embodiment 5 of the present invention;
FIG.17 is a block diagram showing the main configuration of the interior of a second layer encoding section according to Embodiment 5 of the present invention;
FIG.18 is a block diagram showing the main configuration of a speech decoding apparatus according to Embodiment 5 of the present invention;
FIG.19 is a block diagram showing the main configuration of the interior of a band enhancement decoding section according to Embodiment 5 of the present invention;
FIG.20 is a block diagram showing the main configuration of the interior of a second layer decoding section according to Embodiment 5 of the present invention;
FIG.21 is a block diagram showing the main configuration of a speech encoding apparatus according to Embodiment 6 of the present invention;
FIG.22 is a block diagram showing the main configuration of the interior of a second layer encoding section according to Embodiment 6 of the present invention;
FIG.23 is a drawing showing an example of the configuration of regions obtained by a band selection
section according to Embodiment 6 of the present invention;
FIG.24 is a block diagram showing the main configuration of a speech decoding apparatus according to Embodiment 6 of the present invention;
FIG.25 is a block diagram showing the main configuration of the interior of a second layer decoding section according to Embodiment 6 of the present invention;
FIG.26 is a block diagram showing the main configuration of a speech encoding apparatus according to Embodiment 7 of the present invention;
FIG.27 is a block diagram showing the main configuration of the interior of a second layer encoding section according to Embodiment 7 of the present invention;
FIG.28 is a block diagram showing the main configuration of a speech decoding apparatus according to Embodiment 7 of the present invention; and
FIG.29 is a block diagram showing the main configuration of the interior of a second layer decoding section according to Embodiment 7 of the present invention.
Best Mode for Carrying Out the Invention
As an overview of an example of the present invention, in quantization of a frequency component of
a different band in each frame, if the number of subbands common to a past-frame quantization target band and current-frame quantization target band is determined to be greater than or equal to a predetermined value, predictive encoding is performed on a frequency domain parameter, and if the number of common subbands is determined to be less than the predetermined value, a frequency domain parameter is encoded directly. By this means, the encoded information amount of a speech/audio signal or the like is reduced, and also sharp quality degradation of a decoded signal, decoded speech, and so forth, can be prevented, and encoding error of a speech/audio signal or the like and decoded signal quality degradation — and decoded speech audio quality degradation, in particular — can be reduced.
Embodiments of the present invention will now be described in detail with reference to the accompanying drawings. In the following descriptions, a speech encoding apparatus and speech decoding apparatus are used as examples of an encoding apparatus and decoding apparatus of the present invention.
(Embodiment 1) FIG.1 is a block diagram showing the main configuration of speech encoding apparatus 100 according to Embodiment 1 of the present invention.
In this figure, speech encoding apparatus 100 is equipped with frequency domain transform section 101,
band selection section 102, shape quantization section 103, predictive encoding execution/non-execution decision section 104, gain quantization section 105, and multiplexing section 106.
Frequency domain transform section 101 performs a Modified Discrete Cosine Transform (MDCT) on an input signal to calculate an MDCT coefficient, which is a frequency domain parameter, and outputs this to band selection section 102.
Band selection section 102 divides the MDCT coefficient input from frequency domain transform section 101 into a plurality of subbands, selects a band as a quantization target band from the plurality of subbands, and outputs band information indicating the selected band to shape quantization section 103, predictive encoding execution/non-execution decision section 104, and multiplexing section 106. In addition, band selection section 102 outputs the MDCT coefficient to shape quantization section 103. Alternatively, the MDCT coefficient may be input to shape quantization section 103 directly from frequency domain transform section 101, separately from the input from frequency domain transform section 101 to band selection section 102.
Shape quantization section 103 performs shape quantization using an MDCT coefficient corresponding to a band indicated by band information input from band selection section 102 from among MDCT coefficients input
from band selection section 102, and outputs obtained shape encoded information to multiplexing section 106. In addition, shape quantization section 103 finds a shape quantization ideal gain value, and outputs the obtained ideal gain value to gain quantization section 105.
Predictive encoding execution/non-execution decision section 104 finds a number of subbands common to a current-frame quantization target band and a past-frame quantization target band using the band information input from band selection section 102. Then predictive encoding execution/non-execution decision section 104 determines that predictive encoding is to be performed on the MDCT coefficient of the quantization target band indicated by the band information if the number of common subbands is greater than or equal to a predetermined value, or determines that predictive encoding is not to be performed on the MDCT coefficient of the quantization target band indicated by the band information if the number of common subbands is less than the predetermined value. Predictive encoding execution/non-execution decision section 104 outputs the result of this determination to gain quantization section 105.
If the determination result input from predictive encoding execution/non-execution decision section 104 indicates that predictive encoding is to be performed, gain quantization section 105 performs predictive
encoding of current-frame quantization target band gain using a past-frame quantization gain value stored in an internal buffer and an internal gain codebook, to obtain gain encoded information. On the other hand, if the determination result input from predictive encoding execution/non-execution decision section 104 indicates that predictive encoding is not to be performed, gain quantization section 105 obtains gain encoded information by directly quantizing the ideal gain value input from shape quantization section 103. Gain quantization section 105 outputs the obtained gain encoded information to multiplexing section 106.
Multiplexing section 106 multiplexes band information input from band selection section 102, shape encoded information input from shape quantization section 103, and gain encoded information input from gain quantization section 105, and transmits the obtained bit stream to a speech decoding apparatus.
Speech encoding apparatus 100 having a configuration such as described above separates an input signal into sections of N samples (where N is a natural number), and performs encoding on a frame-by-frame basis with N samples as one frame. The operation of each section of speech encoding apparatus 100 is described in detail below. In the following description, an input signal of a frame that is an encoding target is represented by xn (where n = 0, 1, ..., N-1). Here, n indicates the index
of each sample in a frame that is an encoding target. Frequency domain transform section 101 has N internal buffers, and first initializes each buffer using a value of 0 in accordance with Equation (1) below.
In this equation, bufn (n = 0, ..., N-1) indicates the (n+1)'th of the N buffers in frequency domain transform section 101.
Next, frequency domain transform section 101 finds MDCT coefficient Xk by performing a modified discrete cosine transform (MDCT) of input signal xn in accordance with Equation (2) below.
In this equation, k indicates the index of each sample in one frame, and x'n is a vector linking input signal xn and bufn in accordance with Equation (3) below.
Next, frequency domain transform section 101 updates bufn (n = 0, ..., N-1) as shown in Equation (4) below.
Then frequency domain transform section 101 outputs found MDCT coefficient Xk to band selection section 102.
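As a hedged illustration of the framing procedure above, the following Python sketch implements an unwindowed MDCT/IMDCT pair together with the zero-initialized N-sample buffer and the per-frame buffer update described for frequency domain transform section 101. The transform formulas are a standard MDCT formulation assumed here, since Equations (1) through (4) themselves are not reproduced in this text:

```python
import math

def mdct(block):
    """Unwindowed MDCT of a 2N-sample block -> N coefficients (assumed standard form)."""
    N = len(block) // 2
    X = []
    for k in range(N):
        s = 0.0
        for i in range(2 * N):
            s += block[i] * math.cos(math.pi / N * (i + 0.5 + N / 2.0) * (k + 0.5))
        X.append(s)
    return X

def imdct(X):
    """Inverse MDCT -> 2N samples; overlap-add of consecutive frames cancels aliasing."""
    N = len(X)
    y = []
    for i in range(2 * N):
        s = 0.0
        for k in range(N):
            s += X[k] * math.cos(math.pi / N * (i + 0.5 + N / 2.0) * (k + 0.5))
        y.append(s / N)
    return y

def encode_frames(signal, N):
    """Frame-by-frame MDCT with an N-sample look-back buffer,
    mirroring Equations (1)-(4) as described in the text."""
    buf = [0.0] * N                       # Equation (1): buffers initialized to 0
    coeffs = []
    for start in range(0, len(signal), N):
        frame = list(signal[start:start + N])
        coeffs.append(mdct(buf + frame))  # Equations (2)-(3): transform [buf, frame]
        buf = frame                       # Equation (4): buffer update
    return coeffs
```

Overlap-adding the inverse transforms of two consecutive frames cancels the MDCT time-domain aliasing, which is why the buffer carries the previous frame's samples forward.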
Band selection section 102 first divides MDCT
coefficient Xk into a plurality of subbands. Here, a description will be given taking as an example a case in which MDCT coefficient Xk is divided equally into J subbands (where J is a natural number). Then band selection section 102 selects L consecutive subbands (where L is a natural number) from among the J subbands, and obtains M kinds of subband groups (where M is a natural number). Below, these M kinds of subband groups are called regions.
FIG.2 is a drawing showing an example of the configuration of regions obtained by band selection section 102.
In this figure, the number of subbands is 17 (J=17), the number of kinds of regions is eight (M=8), and each region is composed of five consecutive subbands (L=5). Of these, for example, region 4 is composed of subbands 6 through 10.
Next, band selection section 102 calculates average energy E(m) of each of the M kinds of regions in accordance with Equation (5) below.
In this equation, j indicates the index of each of the J subbands, m indicates the index of each of the M kinds of regions, S(m) indicates the minimum value among the indices of the L subbands composing region m, B(j) indicates the minimum value among the indices of a plurality of MDCT coefficients composing subband j, and W(j) indicates
the bandwidth of subband j. In the following description, a case in which the bandwidths of the J subbands are all equal, that is, a case in which W(j) is a constant, will be described as an example.
Next, band selection section 102 selects a region — for example, a band composed of subbands j" through j"+L-1 — for which average energy E(m) is a maximum as a band that is a quantization target (a quantization target band), and outputs index m_max indicating this region as band information to shape quantization section 103, predictive encoding execution/non-execution decision section 104, and multiplexing section 106. Band selection section 102 also outputs MDCT coefficient Xk to shape quantization section 103. In the following description, the band indices indicating a quantization target band selected by band selection section 102 are assumed to be j" through j"+L-1.
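The region-selection step can be sketched as follows. The assumption that region m starts at subband m (unit stride between regions) is illustrative only, since the actual region layout follows FIG.2, and the averaging used for E(m) is likewise an assumed reading of Equation (5):

```python
def select_band(X, J, L, W):
    """Return index m_max of the region (L consecutive subbands out of J,
    each W coefficients wide) whose average energy E(m) is maximum.
    Region m is assumed to start at subband m (illustrative layout)."""
    best_m, best_e = 0, -1.0
    for m in range(J - L + 1):
        e = 0.0
        for j in range(m, m + L):
            for k in range(j * W, (j + 1) * W):
                e += X[k] * X[k]
        e /= float(L * W)          # average energy E(m), assumed normalization
        if e > best_e:
            best_m, best_e = m, e
    return best_m
```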
Shape quantization section 103 performs shape quantization on a subband-by-subband basis on an MDCT coefficient corresponding to the band indicated by band information m_max input from band selection section 102. Specifically, shape quantization section 103 searches an internal shape codebook composed of quantity SQ of shape code vectors for each of L subbands, and finds the index of a shape code vector for which the result of Equation (6) below is a maximum.
...(Equation 6)
In this equation, SC indicates a shape code vector composing a shape codebook, i indicates a shape code vector index, and k indicates the index of a shape code vector element.
Shape quantization section 103 outputs shape code vector index S_max for which the result of Equation (6) above is a maximum to multiplexing section 106 as shape encoded information. Shape quantization section 103 also calculates ideal gain value Gain_i(j) in accordance with Equation (7) below, and outputs this to gain quantization section 105.
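One plausible reading of the shape search and the ideal gain computation can be sketched as follows; since Equations (6) and (7) are not reproduced in this text, the normalized cross-correlation criterion and the least-squares ideal gain below are assumptions:

```python
def quantize_shape(x, shape_codebook):
    """Search the shape codebook for the vector maximizing an assumed
    Equation (6)-style criterion (squared correlation over vector energy),
    and compute an assumed Equation (7)-style ideal gain for that vector."""
    best_i, best_val = 0, float("-inf")
    for i, sc in enumerate(shape_codebook):
        num = sum(xk * sk for xk, sk in zip(x, sc)) ** 2
        den = sum(sk * sk for sk in sc)
        if num / den > best_val:
            best_i, best_val = i, num / den
    sc = shape_codebook[best_i]
    ideal_gain = (sum(xk * sk for xk, sk in zip(x, sc))
                  / sum(sk * sk for sk in sc))
    return best_i, ideal_gain   # (S_max, Gain_i) for one subband
```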
...(Equation 7)
Predictive encoding execution/non-execution decision section 104 has an internal buffer that stores band information m_max input from band selection section 102 in a past frame. Here, a case will be described by way of example in which predictive encoding execution/non-execution decision section 104 has an internal buffer that stores band information m_max for the past three frames. Predictive encoding execution/non-execution decision section 104 first finds a number of subbands common to a past-frame quantization target band and current-frame quantization target band using band information m_max input from band selection section 102 in a past frame and band information m_max input from band selection section 102 in the current frame. Then predictive encoding execution/non-execution decision section 104 determines that predictive encoding is to be performed if the number of common subbands is greater than or equal to a predetermined value, or determines that predictive encoding is not to be performed if the number of common subbands is less than the predetermined value. Specifically, L subbands indicated by band information m_max input from band selection section 102 one frame back in time are compared with L subbands indicated by band information m_max input from band selection section 102 in the current frame, and it is determined that predictive encoding is to be performed if the number of common subbands is P or more, or it is determined that predictive encoding is not to be performed if the number of common subbands is less than P. Predictive encoding execution/non-execution decision section 104 outputs the result of this determination to gain quantization section 105. Then predictive encoding execution/non-execution decision section 104 updates the internal buffer storing band information using band information m_max input from band selection section 102
in the current frame.
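The execution/non-execution decision reduces to counting common subbands, which can be sketched as:

```python
def use_prediction(prev_band, cur_band, P):
    """Return True if predictive encoding should be used: the previous and
    current quantization target bands (iterables of L subband indices)
    share at least P subbands."""
    return len(set(prev_band) & set(cur_band)) >= P
```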
Gain quantization section 105 has an internal buffer that stores a quantization gain value obtained in a past frame. If a determination result input from predictive encoding execution/non-execution decision section 104 indicates that predictive encoding is to be performed, gain quantization section 105 performs quantization by predicting a current-frame gain value using past-frame quantization gain value Ctj stored in the internal buffer. Specifically, gain quantization section 105 searches an internal gain codebook composed of quantity GQ of gain code vectors for each of L subbands, and finds an index of a gain code vector for which the result of Equation (8) below is a minimum.
...(Equation 8)
In this equation, GCij indicates a gain code vector composing a gain codebook, i indicates a gain code vector index, and j indicates an index of a gain code vector element. For example, if the number of subbands composing a region is five (L=5), j has a value of 0 to 4. Here, Ctj indicates a gain value of t frames before in time, so that when t=1, for example, Ctj indicates a gain value of one frame before in time. Also, the 4th-order linear prediction coefficients used in Equation (8) are stored in gain quantization section 105. Gain quantization section 105 treats L subbands within one region as an L-dimensional vector, and performs vector quantization.
Gain quantization section 105 outputs gain code vector index G_min for which the result of Equation (8) above is a minimum to multiplexing section 106 as gain encoded information. If there is no gain value of a subband corresponding to a past frame in the internal buffer, gain quantization section 105 substitutes the gain value of the nearest subband in frequency in the internal buffer in Equation (8) above.
On the other hand, if the determination result input from predictive encoding execution/non-execution decision section 104 indicates that predictive encoding is not to be performed, gain quantization section 105 directly quantizes ideal gain value Gain_i(j) input from shape quantization section 103 in accordance with Equation (9) below. Here, gain quantization section 105 treats an ideal gain value as an L-dimensional vector, and performs vector quantization.
Here, a codebook index that makes Equation (9) above a minimum is denoted by G_min.
Gain quantization section 105 outputs G_min to multiplexing section 106 as gain encoded information. Gain quantization section 105 also updates the internal buffer in accordance with Equation (10) below using gain
encoded information G_min and quantization gain value Ctj obtained in the current frame.
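A sketch of the predictive gain search and the buffer update follows. The exact prediction form of Equation (8) and the quantity stored by Equation (10) are not reproduced above, so the MA-style predictor (three past quantized gain vectors plus the candidate code vector, weighted by four coefficients alpha[0..3]) and the choice to store the selected code vector are assumptions:

```python
def quantize_gain(ideal_gain, gain_codebook, past, alpha):
    """Choose gain code vector index G_min minimizing squared error between
    the ideal gain and an assumed 4th-order prediction, then push the
    selected vector into the three-frame past-gain buffer."""
    L = len(ideal_gain)
    best_i, best_err = 0, float("inf")
    for i, gc in enumerate(gain_codebook):
        err = 0.0
        for j in range(L):
            pred = alpha[0] * gc[j]
            for t in (1, 2, 3):              # past frames t = 1..3
                pred += alpha[t] * past[t - 1][j]
            err += (ideal_gain[j] - pred) ** 2
        if err < best_err:
            best_i, best_err = i, err
    # assumed Equation (10)-style update: newest vector pushed in front
    new_past = [list(gain_codebook[best_i])] + [list(p) for p in past[:2]]
    return best_i, new_past
```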
Multiplexing section 106 multiplexes band information m_max input from band selection section 102, shape encoded information S_max input from shape quantization section 103, and gain encoded information G_min input from gain quantization section 105, and transmits the obtained bit stream to a speech decoding apparatus.
FIG.3 is a block diagram showing the main configuration of speech decoding apparatus 200 according to this embodiment.
In this figure, speech decoding apparatus 200 is equipped with demultiplexing section 201, shape dequantization section 202, predictive decoding execution/non-execution decision section 203, gain dequantization section 204, and time domain transform section 205.
Demultiplexing section 201 demultiplexes band information, shape encoded information, and gain encoded information from a bit stream transmitted from speech encoding apparatus 100, outputs the obtained band information to shape dequantization section 202 and predictive decoding execution/non-execution decision
section 203, outputs the obtained shape encoded information to shape dequantization section 202, and outputs the obtained gain encoded information to gain dequantization section 204.
Shape dequantization section 202 finds the shape value of an MDCT coefficient corresponding to a quantization target band indicated by band information input from demultiplexing section 201 by performing dequantization of shape encoded information input from demultiplexing section 201, and outputs the found shape value to gain dequantization section 204.
Predictive decoding execution/non-execution decision section 203 finds a number of subbands common to a current-frame quantization target band and a past-frame quantization target band using the band information input from demultiplexing section 201. Then predictive decoding execution/non-execution decision section 203 determines that predictive decoding is to be performed on the MDCT coefficient of the quantization target band indicated by the band information if the number of common subbands is greater than or equal to a predetermined value, or determines that predictive decoding is not to be performed on the MDCT coefficient of the quantization target band indicated by the band information if the number of common subbands is less than the predetermined value. Predictive decoding execution/non-execution decision section 203 outputs the
result of this determination to gain dequantization section 204.
If the determination result input from predictive decoding execution/non-execution decision section 203 indicates that predictive decoding is to be performed, gain dequantization section 204 performs predictive decoding on gain encoded information input from demultiplexing section 201 using a past-frame gain value stored in an internal buffer and an internal gain codebook, to obtain a gain value. On the other hand, if the determination result input from predictive decoding execution/non-execution decision section 203 indicates that predictive decoding is not to be performed, gain dequantization section 204 obtains a gain value by directly performing dequantization of gain encoded information input from demultiplexing section 201 using the internal gain codebook. Gain dequantization section 204 outputs the obtained gain value to time domain transform section 205. Gain dequantization section 204 also finds an MDCT coefficient of the quantization target band using the obtained gain value and a shape value input from shape dequantization section 202, and outputs this to time domain transform section 205 as a decoded MDCT coefficient.
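The final combination step in gain dequantization section 204, scaling each subband's decoded shape by that subband's decoded gain to rebuild the decoded MDCT coefficients, can be sketched as:

```python
def reconstruct_band(shape, gains, W):
    """Decoded MDCT coefficients of the quantization target band:
    subband j's W shape samples scaled by decoded gain gains[j]."""
    out = []
    for j, g in enumerate(gains):
        out.extend(g * s for s in shape[j * W:(j + 1) * W])
    return out
```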
Time domain transform section 205 performs an Inverse Modified Discrete Cosine Transform (IMDCT) on the decoded MDCT coefficient input from gain
dequantization section 204 to generate a time domain signal, and outputs this as a decoded signal.
Speech decoding apparatus 200 having a configuration such as described above performs the following operations.
Demultiplexing section 201 demultiplexes band information m_max, shape encoded information S_max, and gain encoded information G_min from a bit stream transmitted from speech encoding apparatus 100, outputs obtained band information m_max to shape dequantization section 202 and predictive decoding execution/non-execution decision section 203, outputs obtained shape encoded information S_max to shape dequantization section 202, and outputs obtained gain encoded information G_min to gain dequantization section 204.
Shape dequantization section 202 has an internal shape codebook similar to the shape codebook with which shape quantization section 103 of speech encoding apparatus 100 is provided, and searches for a shape code vector for which shape encoded information S_max input from demultiplexing section 201 is an index. Shape dequantization section 202 outputs the searched code vector to gain dequantization section 204 as the shape value of an MDCT coefficient of a quantization target band indicated by band information m_max input from demultiplexing section 201. Here, a shape code vector searched as a shape value is denoted by Shape_q(k).
Predictive decoding execution/non-execution decision section 2 03 has an internal buffer that stores band information m_max input from demultiplexing section 201 in a past frame. Here, a case will be described by way of example in which predictive decoding execution/non-execution decision section 203 has an internal buffer that stores band information m_max for the past three frames. Predictive decoding execution/non-execution decision section 2 03 first finds a number of subbands common to a past-frame quantization target band and current-frame quantization target band using band information m_max input from demultiplexing section 201 in a past frame and band information m_max input from demultiplexing section 201 in the current frame Then predictive decoding execution/non-execution decision section 203 determines that predictive decoding is to be performed if the number of common subbands is greater than or equal to a predetermined value, or determines that predictive decoding is not to be performed if the number of common subbands is less than the predetermined value. Specifically, predictive decoding execution/non-execution decision section 203 compares L subbands indicated by band information m_max input from demultiplexing section 201 one frame back in time with L subbands indicated by band information m_max input from
demultiplexing section 201 in the current frame, and determines that predictive decoding is to be performed if the number of common subbands is P or more, or determines that predictive decoding is not to be performed if the number of common subbands is less than P. Predictive decoding execution/non-execution decision section 203 outputs the result of this determination to gain dequantization section 204. Then predictive decoding execution/non-execution decision section 203 updates the internal buffer storing band information using band information m_max input from demultiplexing section 201 in the current frame.
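The execution/non-execution decision described above can be sketched as follows. This is an illustrative reading of the rule, not code from the specification; the function name and the use of index sets are assumptions.

```python
# Hypothetical sketch of the predictive decoding execution/non-execution
# decision: prediction is used only when the current frame's quantization
# target band shares at least P subbands with the band selected one frame
# back in time.

def decide_predictive_decoding(prev_band, curr_band, p_threshold):
    """prev_band, curr_band: lists of subband indices (L subbands each)."""
    common = len(set(prev_band) & set(curr_band))  # number of shared subbands
    return common >= p_threshold

# Example: regions of L = 5 subbands sharing 4 subbands.
prev_band = [5, 6, 7, 8, 9]    # past-frame quantization target band
curr_band = [6, 7, 8, 9, 10]   # current-frame quantization target band
use_prediction = decide_predictive_decoding(prev_band, curr_band, p_threshold=3)
```

When the bands drift apart between frames, the overlap drops below P and the decoder falls back to direct dequantization of the gain code vector.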
Gain dequantization section 204 has an internal buffer that stores a gain value obtained in a past frame. If a determination result input from predictive decoding execution/non-execution decision section 203 indicates that predictive decoding is to be performed, gain dequantization section 204 performs dequantization by predicting a current-frame gain value using a past-frame gain value stored in the internal buffer. Specifically, gain dequantization section 204 has the same kind of internal gain codebook as gain quantization section 105 of speech encoding apparatus 100, and obtains gain value Gain_q' by performing gain dequantization in accordance with Equation (11) below. Here, C^t_j indicates a gain value of t frames before in time, so that when t = 1, for example, C^1_j indicates a gain value of one frame before in time. Also, α_t is a 4th-order linear prediction coefficient stored in gain dequantization section 204. Gain dequantization section 204 treats L subbands within one region as an L-dimensional vector, and performs vector dequantization.
Gain_q'(j) = Σ_{t=1}^{3} (α_t · C^t_j) + α_0 · GC^{G_min}_j
(j = 0, …, L-1)
... (Equation 11)
If there is no gain value of a subband corresponding to a past frame in the internal buffer, gain dequantization section 204 substitutes the gain value of the nearest subband in frequency in the internal buffer in Equation (11) above.
On the other hand, if the determination result input from predictive decoding execution/non-execution decision section 203 indicates that predictive decoding is not to be performed, gain dequantization section 204 performs dequantization of a gain value in accordance with Equation (12) below using the above-described gain codebook. Here, a gain value is treated as an L-dimensional vector, and vector dequantization is performed. That is to say, when predictive decoding is not performed, the gain code vector corresponding to gain encoded information G_min is taken directly as the gain value.
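The two gain dequantization paths, and the nearest-subband substitution for missing buffer entries, can be sketched as follows. The prediction coefficients, codebook contents, and buffer layout are illustrative stand-ins, not values from the specification.

```python
# Illustrative sketch of the two gain dequantization paths (prediction as in
# Equation (11), direct use of the code vector as in Equation (12)).

def nearest_stored(frame_gains, j):
    """Substitute the gain of the subband nearest in frequency."""
    stored = [k for k, v in enumerate(frame_gains) if v is not None]
    return frame_gains[min(stored, key=lambda k: abs(k - j))]

def dequantize_gain(gain_codebook, g_min, past_gains, alpha, use_prediction):
    """past_gains[t][j]: gain of subband j, t+1 frames back (None if absent).
    alpha: [a0, a1, a2, a3]; a0 weights the code vector, a1..a3 past frames.
    Returns the L-dimensional dequantized gain vector Gain_q'."""
    code_vec = gain_codebook[g_min]          # gain code vector GC^{G_min}
    if not use_prediction:
        return list(code_vec)                # direct path: code vector as-is
    gains = []
    for j, c in enumerate(code_vec):         # predictive path, per subband
        g = alpha[0] * c
        for t in (1, 2, 3):
            past = past_gains[t - 1][j]
            if past is None:                 # no stored gain for this subband
                past = nearest_stored(past_gains[t - 1], j)
            g += alpha[t] * past
        gains.append(g)
    return gains

codebook = [[1.0, 1.0, 1.0], [2.0, 1.0, 0.5]]
past = [[1.0, None, 1.0]] * 3                # subband 1 missing in the buffer
g = dequantize_gain(codebook, 1, past, [0.4, 0.3, 0.2, 0.1], use_prediction=True)
```

The same routine with `use_prediction=False` simply returns the selected code vector, mirroring the direct path described above.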
Next, gain dequantization section 204 calculates a decoded MDCT coefficient in accordance with Equation (13) below using a gain value obtained by current-frame dequantization and a shape value input from shape dequantization section 202, and updates the internal buffer in accordance with Equation (14) below. Here, a calculated decoded MDCT coefficient is denoted by X''_k. Also, in MDCT coefficient dequantization, if k is present within B(j') through B(j'+1)-1, gain value Gain_q'(j) takes the value of Gain_q'(j').
Gain dequantization section 204 outputs decoded MDCT coefficient X''_k calculated in accordance with Equation (13) above to time domain transform section 205. Time domain transform section 205 first initializes internal buffer buf'_k to a value of zero in accordance with Equation (15) below.
Then time domain transform section 205 finds decoded signal Y_n in accordance with Equation (16) below using decoded MDCT coefficient X''_k input from gain dequantization section 204.
... (Equation 16)
In this equation, X2''_k is a vector linking decoded MDCT coefficient X''_k and buffer buf'_k.
Next, time domain transform section 205 updates buffer buf'_k in accordance with Equation (18) below.
buf'_k = X''_k (k = 0, …, N-1) ... (Equation 18)
Time domain transform section 205 outputs obtained decoded signal Y_n as an output signal.
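The reconstruction of decoded MDCT coefficients from gain and shape values, and the buffer update of Equation (18), can be sketched as below. The band layout and function names are illustrative; coefficients outside the quantization target band are simply left at zero here.

```python
# Minimal sketch of decoded MDCT coefficient reconstruction
# (X''_k = Gain_q'(j') * Shape_q'(k), as in Equation (13)) and the
# buffer refresh of Equation (18).

def decode_mdct(gain_q, shape_q, band_edges, region, n_total):
    """gain_q[j']: dequantized gain of subband j'; shape_q[k]: shape value.
    band_edges[j] = B(j), the first coefficient index of subband j."""
    x = [0.0] * n_total
    for j in region:                          # j' in the quantization target band
        for k in range(band_edges[j], band_edges[j + 1]):
            x[k] = gain_q[j] * shape_q[k]     # scale shape by subband gain
    return x

band_edges = [0, 2, 4, 6]                     # three subbands of width 2
x = decode_mdct({1: 2.0, 2: 0.5}, [1.0] * 6, band_edges, region=[1, 2], n_total=6)
buf = list(x)                                 # buffer update: buf'_k = X''_k
```

The buffer copied at the end is what the overlap with the next frame's inverse MDCT (Equation (16)) draws on.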
Thus, according to this embodiment, a high-energy band is selected in each frame as a quantization target band and a frequency domain parameter is quantized, enabling bias to be created in the quantized gain value distribution, and vector quantization performance to be improved.
Also, according to this embodiment, in frequency domain parameter quantization of a different quantization target band of each frame, predictive encoding is performed on a frequency domain parameter if the number of subbands common to a past-frame quantization target band and current-frame quantization target band is determined to be greater than or equal to a predetermined value, and a frequency domain parameter is encoded directly if the number of common subbands is determined to be less than the predetermined value. Consequently, the encoded information amount in speech encoding is
reduced, sharp speech quality degradation can be prevented, and speech/audio signal encoding error and decoded signal audio quality degradation can be reduced.
Furthermore, according to this embodiment, on the encoding side a quantization target band can be decided, and frequency domain parameter quantization performed, in region units each composed of a plurality of subbands, and information as to a frequency domain parameter of which region has become a quantization target can be transmitted to the decoding side. Consequently, quantization efficiency can be improved and the encoded information amount transmitted to the decoding side can be further reduced as compared with deciding whether or not predictive encoding is to be used on a subband-by-subband basis and transmitting information as to which subband has become a quantization target to the decoding side.
In this embodiment, a case has been described by way of example in which gain quantization is performed in region units each composed of a plurality of subbands, but the present invention is not limited to this, and a quantization target may also be selected on a subband-by-subband basis — that is, determination of whether or not predictive quantization is to be carried out may also be performed on a subband-by-subband basis.
In this embodiment, a case has been described by way of example in which the gain predictive quantization
method is to perform linear prediction in the time domain for gain of the same frequency band, but the present invention is not limited to this, and linear prediction may also be performed in the time domain for gain of different frequency bands.
In this embodiment, a case has been described in which an ordinary speech/audio signal is taken as an example of a signal that becomes a quantization target, but the present invention is not limited to this, and an excitation signal obtained by processing a speech/audio signal by means of an LPC (Linear Prediction Coefficient) inverse filter may also be used as a quantization target.
In this embodiment, a case has been described by way of example in which a region for which the magnitude of individual region energy, that is, perceptual significance, is greatest is selected as a reference for selecting a quantization target band, but the present invention is not limited to this, and in addition to perceptual significance, frequency correlation with a band selected in a past frame may also be taken into consideration at the same time. That is to say, if candidate bands exist for which the number of subbands common to a quantization target band selected in the past is greater than or equal to a predetermined value and energy is greater than or equal to a predetermined value, the band with the highest energy among the above candidate
bands may be selected as the quantization target band, and if no such candidate bands exist, the band with the highest energy among all frequency bands may be selected as the quantization target band. For example, if a subband common to the highest-energy region and a band selected in a past frame does not exist, the number of subbands common to the second-highest-energy region and a band selected in a past frame is greater than or equal to a predetermined threshold value, and the energy of the second-highest-energy region is greater than or equal to a predetermined threshold value, the second-highest-energy region is selected rather than the highest-energy region. Also, a band selection section according to this embodiment selects a region closest to a quantization target band selected in the past from among regions whose energy is greater than or equal to a predetermined value as a quantization target band.
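The candidate-based selection rule described above can be sketched as follows. The thresholds and region layout are illustrative assumptions; the point shown is the fallback to the overall highest-energy region when no candidate overlaps the past selection.

```python
# Hypothetical sketch of band selection that weighs frequency correlation
# with the past frame: among regions overlapping the previously selected
# band by at least `min_common` subbands and meeting `min_energy`, the most
# energetic is chosen; otherwise the highest-energy region overall is used.

def select_band(regions, energies, prev_region, min_common, min_energy):
    """regions: list of subband-index lists; energies: one value per region."""
    candidates = [
        i for i, r in enumerate(regions)
        if len(set(r) & set(prev_region)) >= min_common
        and energies[i] >= min_energy
    ]
    pool = candidates if candidates else range(len(regions))
    return max(pool, key=lambda i: energies[i])

regions = [[0, 1, 2], [2, 3, 4], [6, 7, 8]]
# Region 2 has the most energy but shares no subband with the past band,
# so the overlapping region 1 is preferred.
chosen = select_band(regions, [1.0, 3.0, 5.0], prev_region=[2, 3, 4],
                     min_common=2, min_energy=2.0)
```

Favoring overlap with the past frame keeps the common-subband count high, which in turn keeps predictive encoding available in more frames.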
In this embodiment, MDCT coefficient quantization may be performed after interpolation is performed using a past frame. For example, a case will be described with reference to FIG. 2 in which a past-frame quantization target band is region 3 (that is, subbands 5 through 9), a current-frame quantization target band is region 4 (that is, subbands 6 through 10), and current-frame predictive encoding is performed using a past-frame quantization result. In such a case, predictive encoding is performed on current-frame
subbands 6 through 9 using past-frame subbands 6 through 9, and for current-frame subband 10, past-frame subband 10 is interpolated using past-frame subbands 6 through 9, and then predictive encoding is performed using past-frame subband 10 obtained by interpolation.
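The interpolation step above can be sketched as follows. The specification does not fix a particular interpolation formula, so simple linear extrapolation from the known past-frame subbands is assumed here purely for illustration.

```python
# Illustrative sketch: when the current frame needs past-frame subband 10
# but only subbands 6-9 were quantized in the past frame, a value for
# subband 10 is extrapolated from them before predictive encoding.

def interpolate_past_subband(past_gains, target):
    """past_gains: {subband_index: gain} available from the past frame."""
    if target in past_gains:
        return past_gains[target]
    known = sorted(past_gains)
    lo, hi = known[-2], known[-1]             # extrapolate from the two
    slope = past_gains[hi] - past_gains[lo]   # highest known subbands
    return past_gains[hi] + slope * (target - hi)

past = {6: 1.0, 7: 1.2, 8: 1.4, 9: 1.6}
g10 = interpolate_past_subband(past, 10)      # estimated gain of subband 10
```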
In this embodiment, a case has been described by way of example in which quantization is performed using the same codebook irrespective of whether or not predictive encoding is performed, but the present invention is not limited to this, and different codebooks may also be used according to whether predictive encoding is performed or is not performed in gain quantization and in shape quantization.
In this embodiment, a case has been described by way of example in which all subband widths are the same, but the present invention is not limited to this, and individual subband widths may also differ.
In this embodiment, a case has been described by way of example in which the same codebook is used for all subbands in gain quantization and in shape quantization, but the present invention is not limited to this, and different codebooks may also be used on a subband-by-subband basis in gain quantization and in shape quantization.
In this embodiment, a case has been described by way of example in which consecutive subbands are selected as a quantization target band, but the present invention
is not limited to this, and a nonconsecutive plurality of subbands may also be selected as a quantization target band. In such a case, speech encoding efficiency can be further improved by interpolating an unselected subband value using adjacent subband values.
In this embodiment, a case has been described by way of example in which speech encoding apparatus 100 is equipped with predictive encoding execution/non-execution decision section 104, but a speech encoding apparatus according to the present invention is not limited to this, and may also have a configuration in which predictive encoding execution/non-execution decision section 104 is not provided and predictive quantization is never performed by gain quantization section 105, as illustrated by speech encoding apparatus 100a shown in FIG.4. In this case, as shown in FIG.4, speech encoding apparatus 100a is equipped with frequency domain transform section 101, band selection section 102, shape quantization section 103, gain quantization section 105, and multiplexing section 106. FIG.5 is a block diagram showing the configuration of speech decoding apparatus 200a corresponding to speech encoding apparatus 100a, speech decoding apparatus 200a being equipped with demultiplexing section 201, shape dequantization section 202, gain dequantization section 204, and time domain transform section 205. In such a case, speech encoding
apparatus 100a performs partial selection of a band to be quantized from among all bands, further divides the selected band into a plurality of subbands, and quantizes the gain of each subband. By this means, quantization can be performed at a lower bit rate than with a method whereby components of all bands are quantized, and encoding efficiency can be improved. Also, encoding efficiency can be further improved by quantizing a gain vector using gain correlation in the frequency domain.
A speech encoding apparatus according to the present invention may also have a configuration in which predictive encoding execution/non-execution decision section 104 is not provided and predictive quantization is always performed by gain quantization section 105, as illustrated by speech encoding apparatus 100a shown in FIG.4. The configuration of speech decoding apparatus 200a corresponding to this kind of speech encoding apparatus 100a is as shown in FIG.5. In such a case, speech encoding apparatus 100a performs partial selection of a band to be quantized from among all bands, further divides the selected band into a plurality of subbands, and performs gain quantization for each subband. By this means, quantization can be performed at a lower bit rate than with a method whereby components of all bands are quantized, and encoding efficiency can be improved. Also, encoding efficiency can be further improved by predictively quantizing a gain vector using gain correlation in the
time domain.
In this embodiment, a case has been described by way of example in which the method of selecting a quantization target band in a band selection section is to select the region with the highest energy in all bands, but the present invention is not limited to this, and selection may also be performed using information of a band selected in a temporally preceding frame in addition to the above criterion. For example, a possible method is to select a region to be quantized after performing multiplication by a weight such that a region that includes a band in the vicinity of a band selected in a temporally preceding frame becomes more prone to selection. Also, if there are a plurality of layers in which a band to be quantized is selected, a band quantized in an upper layer may be selected using information of a band selected in a lower layer. For example, a possible method is to select a region to be quantized after performing multiplication by a weight such that a region that includes a band in the vicinity of a band selected in a lower layer becomes more prone to selection.
In this embodiment, a case has been described by way of example in which the method of selecting a quantization target band is to select the region with the highest energy in all bands, but the present invention is not limited to this, and a certain band may also be selected preliminarily, after which a
quantization target band is finally selected in the preliminarily selected band. In such a case, a preliminarily selected band may be decided according to the input signal sampling rate, coding bit rate, or the like. For example, one method is to preliminarily select a low band when the bit rate or sampling rate is low. For example, it is possible for a method to be employed in band selection section 102 whereby a region to be quantized is decided by calculating region energy after limiting selectable regions to low-band regions from among all selectable region candidates. As an example of this, a possible method is to perform limiting to five candidates from the low-band side from among the total of eight candidate regions shown in FIG.2, and select the region with the highest energy among these. Alternatively, band selection section 102 may compare energies after multiplying energy by a weight so that a lower-band region becomes proportionally more prone to selection. Another possibility is for band selection section 102 to select a fixed low-band-side subband. A feature of a speech signal is that the harmonic structure becomes proportionally stronger toward the low-band side, as a result of which a strong peak is present on the low-band side. As this strong peak is difficult to mask, it is prone to be perceived as noise. Here, by increasing the likelihood of selection toward the low-band side rather than simply selecting a region based on energy magnitude,
the possibility of a region that includes a strong peak being selected is increased, and a sense of noise is reduced as a result. Thus, the quality of a decoded signal can be improved by limiting selected regions to the low-band side, or by performing multiplication by a weight such that the likelihood of selection increases toward the low-band side.
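The weighted comparison described above can be sketched as follows. The decay factor is an illustrative choice, not a value from the specification; only the mechanism of biasing selection toward the low band is shown.

```python
# Sketch of weighted region selection: energies are multiplied by a weight
# that decays toward the high band, so low-band regions, which tend to
# contain strong hard-to-mask harmonic peaks, are more likely to be chosen.

def select_weighted_region(energies, decay=0.9):
    """energies[i]: energy of region i, ordered from low band to high band."""
    weighted = [e * (decay ** i) for i, e in enumerate(energies)]
    return max(range(len(weighted)), key=lambda i: weighted[i])

# The high-band region is slightly more energetic, but the weight tips the
# choice to the low-band region.
region = select_weighted_region([10.0, 9.0, 10.5])
```

A sufficiently energetic high-band region still wins, so the bias shifts likelihood rather than hard-limiting the choice.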
A speech encoding apparatus according to the present invention has been described in terms of a configuration whereby shape (shape information) quantization is first performed on a component of a band to be quantized, followed by gain (gain information) quantization, but the present invention is not limited to this, and a configuration may also be used whereby gain quantization is performed first, followed by shape quantization.
(Embodiment 2)
FIG.6 is a block diagram showing the main configuration of speech encoding apparatus 300 according to Embodiment 2 of the present invention.
In this figure, speech encoding apparatus 300 is equipped with down-sampling section 301, first layer encoding section 302, first layer decoding section 303, up-sampling section 304, first frequency domain transform section 305, delay section 306, second frequency domain transform section 307, second layer encoding section 308, and multiplexing section 309, and has a scalable
configuration comprising two layers. In the first layer, a CELP (Code Excited Linear Prediction) speech encoding method is applied, and in the second layer, the speech encoding method described in Embodiment 1 of the present invention is applied.
Down-sampling section 301 performs down-sampling processing on an input speech/audio signal, to convert the speech/audio signal sampling rate from Rate 1 to Rate 2 (where Rate 1 > Rate 2), and outputs this signal to first layer encoding section 302.
First layer encoding section 302 performs CELP speech encoding on the post-down-sampling speech/audio signal input from down-sampling section 301, and outputs obtained first layer encoded information to first layer decoding section 303 and multiplexing section 309. Specifically, first layer encoding section 302 encodes a speech signal comprising vocal tract information and excitation information by finding an LPC parameter for the vocal tract information, and, for the excitation information, performs encoding by finding an index that identifies which previously stored speech model is to be used, that is, an index that identifies which excitation vector of an adaptive codebook and fixed codebook is to be generated.
First layer decoding section 303 performs CELP speech decoding on first layer encoded information input from first layer encoding section 302, and outputs an
obtained first layer decoded signal to up-sampling section 304.
Up-sampling section 304 performs up-sampling processing on the first layer decoded signal input from first layer decoding section 303, to convert the first layer decoded signal sampling rate from Rate 2 to Rate 1, and outputs this signal to first frequency domain transform section 305.
First frequency domain transform section 305 performs an MDCT on the post-up-sampling first layer decoded signal input from up-sampling section 304, and outputs a first layer MDCT coefficient obtained as a frequency domain parameter to second layer encoding section 308. The actual transform method used in first frequency domain transform section 305 is similar to the transform method used in frequency domain transform section 101 of speech encoding apparatus 100 according to Embodiment 1 of the present invention, and therefore a description thereof is omitted here.
Delay section 306 outputs a delayed speech/audio signal to second frequency domain transform section 307 by outputting an input speech/audio signal after storing that input signal in an internal buffer for a predetermined time. The predetermined delay time here is a time that takes account of algorithm delay that arises in down-sampling section 301, first layer encoding section 302, first layer decoding section 303, up-sampling
section 304, first frequency domain transform section 305, and second frequency domain transform section 307.
Second frequency domain transform section 307 performs an MDCT on the delayed speech/audio signal input from delay section 306, and outputs a second layer MDCT coefficient obtained as a frequency domain parameter to second layer encoding section 308. The actual transform method used in second frequency domain transform section 307 is similar to the transform method used in frequency domain transform section 101 of speech encoding apparatus 100 according to Embodiment 1 of the present invention, and therefore a description thereof is omitted here.
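The role of delay section 306 can be sketched with a simple FIFO buffer. The delay length shown is illustrative; in practice it equals the total algorithm delay of the first layer branch so that the two MDCT inputs stay time-aligned.

```python
# Minimal sketch of the delay section: the input signal is buffered for a
# fixed number of samples so that the second layer sees a signal aligned
# with the first layer branch, whose down-sampling, CELP coding, and MDCT
# stages introduce algorithmic delay.

from collections import deque

class DelaySection:
    def __init__(self, delay_samples):
        self.buf = deque([0.0] * delay_samples)  # primed with silence

    def process(self, sample):
        """Output the sample stored delay_samples ago, then store the input."""
        self.buf.append(sample)
        return self.buf.popleft()

d = DelaySection(2)
out = [d.process(x) for x in [1.0, 2.0, 3.0, 4.0]]   # delayed by 2 samples
```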
Second layer encoding section 308 performs second layer encoding using the first layer MDCT coefficient input from first frequency domain transform section 305 and the second layer MDCT coefficient input from second frequency domain transform section 307, and outputs obtained second layer encoded information to multiplexing section 309. The main internal configuration and actual operation of second layer encoding section 308 will be described later herein.
Multiplexing section 309 multiplexes first layer encoded information input from first layer encoding section 302 and second layer encoded information input from second layer encoding section 308, and transmits the obtained bit stream to a speech decoding apparatus.
FIG.7 is a block diagram showing the main
configuration of the interior of second layer encoding section 308. Second layer encoding section 308 has a similar basic configuration to that of speech encoding apparatus 100 according to Embodiment 1 (see FIG.1), and therefore identical configuration elements are assigned the same reference codes and descriptions thereof are omitted here.
Second layer encoding section 308 differs from speech encoding apparatus 100 in being equipped with residual MDCT coefficient calculation section 381 instead of frequency domain transform section 101. Processing by multiplexing section 106 is similar […] section 405 has a similar basic configuration to that of speech decoding apparatus 200 according to Embodiment 1 […] section 307, and outputs obtained band enhancement encoded information […]
... (Equation 35)
Here, a codebook index that makes Equation (35) above a minimum is denoted by G_min.
Gain quantization section 1805 outputs G_min to multiplexing section 106 as gain encoded information. Gain quantization section 1805 also updates the internal buffer in accordance with Equation (36) below using gain encoded information G_min and quantization gain value C^1_j obtained in the current frame. That is to say, in Equation (36), a C^1_{j'} value is updated with gain code vector GC^{G_min} element index j and j' satisfying j' ∈ Region(m_max) respectively associated in ascending order.
C^3_{j'} = C^2_{j'}
C^2_{j'} = C^1_{j'}
C^1_{j'} = GC^{G_min}_j
(j' ∈ Region(m_max), j = 0, …, L-1)
... (Equation 36)
FIG.24 is a block diagram showing the main configuration of speech decoding apparatus 1200 according to this embodiment.
In this figure, speech decoding apparatus 1200 is equipped with control section 401, first layer decoding section 402, up-sampling section 403, frequency domain transform section 404, second layer decoding section 1205, time domain transform section 406, and switch 407.
With the exception of second layer decoding section 1205, configuration elements in speech decoding apparatus 1200 shown in FIG.24 are identical to the configuration elements of speech decoding apparatus 400 shown in FIG.8, and therefore identical configuration elements are assigned the same reference codes and descriptions thereof are omitted here.
FIG.25 is a block diagram showing the main configuration of the interior of second layer decoding section 1205. Second layer decoding section 1205 mainly comprises demultiplexing section 451, shape dequantization section 202, predictive decoding execution/non-execution decision section 203, gain dequantization section 2504, and addition MDCT coefficient calculation section 452. With the exception of gain dequantization section 2504, configuration elements in second layer decoding section 1205 are identical to the configuration elements of second layer decoding section 405 shown in FIG.9, and therefore identical configuration elements are assigned the same reference codes and descriptions thereof are omitted here.
Gain dequantization section 2504 has an internal buffer that stores a gain value obtained in a past frame. If a determination result input from predictive decoding execution/non-execution decision section 203 indicates that predictive decoding is to be performed, gain dequantization section 2504 performs dequantization by
predicting a current-frame gain value using a past-frame gain value stored in the internal buffer. Specifically, gain dequantization section 2504 has the same kind of internal gain codebook as gain quantization section 1805, and obtains gain value Gain_q' by performing gain dequantization in accordance with Equation (37) below. In Equation (37), a gain value is calculated with gain code vector GC^{G_min}_k element index k and j' satisfying j' ∈ Region(m_max) respectively associated in ascending order.
Gain_q'(j') = Σ_{t=1}^{3} (α_t · C^t_{j'}) + α_0 · GC^{G_min}_k
(k = 0, …, L-1, j' ∈ Region(m_max))
... (Equation 37)
If there is no gain value of a subband corresponding to a past frame in the internal buffer, gain dequantization section 2504 substitutes the gain value of the nearest subband in frequency in the internal buffer in Equation (37) above.
On the other hand, if the determination result input from predictive decoding execution/non-execution decision section 203 indicates that predictive decoding is not to be performed, gain dequantization section 2504 performs dequantization of a gain value in accordance with Equation (38) below using the above-described gain codebook. Here, a gain value is treated as an L-dimensional vector, and vector dequantization is performed. That is to say, when predictive decoding is not performed, gain dequantization section 2504 takes gain code vector GC^{G_min}_k corresponding to gain encoded information G_min directly as a gain value. In Equation (38), k and j' are respectively associated in ascending order in the same way as in Equation (37).
Gain_q'(j') = GC^{G_min}_k
(k = 0, …, L-1, j' ∈ Region(m_max))
... (Equation 38)
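The ascending-order association of code vector element index k with subband index j' used in Equations (37) and (38) can be sketched as follows. The codebook contents and region layout are illustrative; the point shown is that the subbands of Region(m_max) need not be consecutive.

```python
# Sketch of the index association in Equations (37) and (38): the subband
# indices j' in Region(m_max) are paired with code vector elements
# k = 0, ..., L-1 in ascending order.

def associate_gains(code_vector, region):
    """Map the k-th code vector element to the k-th subband of the region."""
    return {j: code_vector[k] for k, j in enumerate(sorted(region))}

# A region made of a low-band group (subbands 2-4) and a fixed high-band
# group (subbands 15-16), i.e. a nonconsecutive quantization target band.
gains = associate_gains([0.5, 0.6, 0.7, 1.2, 1.3], [2, 3, 4, 15, 16])
```

Sorting the region before pairing guarantees that encoder and decoder agree on which code vector element belongs to which subband.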
Next, gain dequantization section 2504 calculates a decoded MDCT coefficient in accordance with Equation (39) below using a gain value obtained by current-frame dequantization and a shape value input from shape dequantization section 202, and updates the internal buffer in accordance with Equation (40) below. In Equation (40), a C^1_{j'} value is updated with j of dequantized gain value Gain_q'(j) and j' satisfying j' ∈ Region(m_max) respectively associated in ascending order. Here, a calculated decoded MDCT coefficient is denoted by X''_k. Also, in MDCT coefficient dequantization,
if k is present within B(j') through B(j'+1)-1, the gain value takes the value of Gain_q'(j').
X''_k = Gain_q'(j') · Shape_q'(k)
(k = B(j'), …, B(j'+1)-1, j' ∈ Region(m_max))
... (Equation 39)
C^1_{j'} = Gain_q'(j)
(j' ∈ Region(m_max))
... (Equation 40)
Gain dequantization section 2504 outputs decoded MDCT coefficient X''_k calculated in accordance with Equation (39) above to addition MDCT coefficient calculation section 452.
Thus, according to this embodiment, as compared with selecting one region composed of adjacent subbands from among all bands as a quantization target band, a plurality of bands for which it is wished to improve audio quality are set beforehand across a wide range, and a nonconsecutive plurality of bands spanning a wide range are selected as quantization target bands. Consequently, both low-band and high-band quality can be improved at the same time.
In this embodiment, the reason for always fixing subbands included in a quantization target band on the high-band side, as shown in FIG.23, is that encoding distortion is still large for a high band in the first layer of a scalable codec. Therefore, audio quality is improved by also fixedly selecting a high band that has
not been encoded with very high precision by the first layer as a quantization target, in addition to selecting a low or middle band having perceptual significance as a quantization target in the second layer.
In this embodiment, a case has been described by way of example in which a band that becomes a high-band quantization target is fixed by including the same high-band subbands (specifically, subband indices 15 and 16) throughout all frames, but the present invention is not limited to this, and a band that becomes a high-band quantization target may also be selected from among a plurality of quantization target band candidates for a high-band subband in the same way as for a low-band subband. In such a case, selection may be performed after multiplying by a larger weight the higher the subband is. It is also possible for bands that become candidates to be changed adaptively according to the input signal sampling rate, coding bit rate, and first layer decoded signal spectral characteristics, or the spectral characteristics of a differential signal for an input signal and first layer decoded signal, or the like. For example, a possible method is to give priority as a quantization target band candidate to a part where the energy distribution of the spectrum (residual MDCT coefficient) of a differential signal for the input signal and first layer decoded signal is high.
In this embodiment, a case has been described by
way of example in which a high-band-side subband group composing a region is fixed, and whether or not predictive encoding is to be applied in a gain quantization section is determined according to the number of subbands common to a quantization target band selected in the current frame and a quantization target band selected in a past frame, but the present invention is not limited to this, and predictive encoding may also always be applied to the gain of a high-band-side subband group composing a region, with determination of whether or not predictive encoding is to be performed being made only for a low-band-side subband group. In this case, the number of subbands common to a quantization target band selected in the current frame and a quantization target band selected in a past frame is taken into consideration only for a low-band-side subband group. That is to say, in this case, a gain vector is quantized after division into a part for which predictive encoding is performed and a part for which predictive encoding is not performed. In this way, since determination of whether or not predictive encoding is necessary is not performed for a high-band-side fixed subband group composing a region, and predictive encoding is always performed, gain can be quantized more efficiently.
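The split described above can be sketched as follows. Prediction is reduced here to a simple difference from the past-frame gain, as a hypothetical stand-in for the codec's linear prediction; the subband indices and threshold are likewise illustrative.

```python
# Illustrative sketch: gains of the fixed high-band subband group are always
# predictively encoded, while the low-band group's gains are predicted only
# when enough subbands are shared with the past frame.

def quantize_split(gains, past, low_idx, high_idx, common, p_threshold):
    """Return per-subband residuals (predicted) or raw gains (direct)."""
    predict_low = common >= p_threshold
    out = {}
    for j in high_idx:                        # fixed group: always predicted
        out[j] = gains[j] - past[j]
    for j in low_idx:                         # variable group: conditional
        out[j] = gains[j] - past[j] if predict_low else gains[j]
    return out

# Too few common low-band subbands, so only the fixed high-band subband 15
# is predicted; the low-band gains are encoded directly.
res = quantize_split({0: 1.5, 1: 1.0, 15: 2.0}, {0: 1.0, 1: 1.0, 15: 1.5},
                     low_idx=[0, 1], high_idx=[15], common=1, p_threshold=2)
```

Skipping the execution/non-execution decision for the fixed group saves the decision logic there while keeping the bit-rate benefit of prediction.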
In this embodiment, a case has been described by way of example in which switching is performed between application and non-application of predictive encoding
in a gain quantization section according to the number of subbands common to a quantization target band selected in the current frame and a quantization target band selected one frame back in time, but the present invention is not limited to this, and the number of subbands common to a quantization target band selected in the current frame and a quantization target band selected two or more frames back in time may also be used. In this case, even if the number of subbands common to a quantization target band selected in the current frame and a quantization target band selected one frame back in time is less than or equal to a predetermined value, predictive encoding may be applied in a gain quantization section according to the number of subbands common to a quantization target band selected in the current frame and a quantization target band selected two or more frames back in time.
In this embodiment, a case has been described by way of example in which a region is composed of a low-band-side subband group and a high-band-side subband group, but the present invention is not limited to this, and, for example, a subband group may also be set in a middle band, and a region may be composed of three or more subband groups. The number of subband groups composing a region may also be changed adaptively according to the input signal sampling rate, coding bit rate, and first layer decoded signal spectral characteristics, or the spectral characteristics of a
differential signal for an input signal and first layer decoded signal, or the like.
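The common-subband test described in this embodiment can be sketched as follows; this is a minimal illustration in which subbands are represented by integer indices, and the function names and the strict ">" threshold convention are assumptions rather than part of this specification.

```python
# Hedged sketch of the common-subband decision: predictive encoding is
# applied when the current quantization target band shares enough
# subbands with the band selected in any stored past frame.
# Names and the strict ">" threshold convention are assumptions.

def count_common_subbands(current_band, past_band):
    """Number of subbands shared by two quantization target bands."""
    return len(set(current_band) & set(past_band))

def apply_predictive_encoding(current_band, band_history, threshold):
    """band_history holds the bands of one or more past frames, most
    recent first; prediction is applied if any of them shares more than
    `threshold` subbands with the current band."""
    return any(count_common_subbands(current_band, past) > threshold
               for past in band_history)
```

For example, if the band selected one frame back shares only two subbands with the current band but the band selected two frames back shares all five, predictive encoding can still be applied.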
In this embodiment, a case has been described by way of example in which a high-band-side subband group composing a region is fixed throughout all frames, but the present invention is not limited to this, and a low-band-side subband group composing a region may also be fixed throughout all frames. Also, both high-band-side and low-band-side subband groups composing a region may also be fixed throughout all frames, or both high-band-side and low-band-side subband groups may be searched for and selected on a frame-by-frame basis. Moreover, the various above-described methods may be applied to three or more subband groups among subband groups composing a region.
In this embodiment, a case has been described by way of example in which, of subbands composing a region, the number of subbands composing a high-band-side subband group is smaller than the number of subbands composing a low-band-side subband group (the number of high-band-side subband group subbands being two, and the number of low-band-side subband group subbands being three), but the present invention is not limited to this, and the number of subbands composing a high-band-side subband group may also be equal to, or greater than, the number of subbands composing a low-band-side subband group. The number of subbands composing each subband
group may also be changed adaptively according to the input signal sampling rate, coding bit rate, first layer decoded signal spectral characteristics, spectral characteristics of a differential signal for an input signal and first layer decoded signal, or the like.
In this embodiment, a case has been described by way of example in which encoding using a CELP encoding method is performed by first layer encoding section 302, but the present invention is not limited to this, and encoding using an encoding method other than CELP (such as transform encoding, for example) may also be performed.
(Embodiment 7)
FIG.26 is a block diagram showing the main configuration of speech encoding apparatus 1300 according to Embodiment 7 of the present invention.
In this figure, speech encoding apparatus 1300 is equipped with down-sampling section 301, first layer encoding section 302, first layer decoding section 303, up-sampling section 304, first frequency domain transform section 305, delay section 306, second frequency domain transform section 307, second layer encoding section 1308, and multiplexing section 309, and has a scalable configuration comprising two layers. In the first layer, a CELP speech encoding method is applied, and in the second layer, the speech encoding method described in Embodiment 1 of the present invention is applied.
With the exception of second layer encoding
section 1308, configuration elements in speech encoding apparatus 1300 shown in FIG.26 are identical to the configuration elements of speech encoding apparatus 300 shown in FIG.6, and therefore identical configuration elements are assigned the same reference codes and descriptions thereof are omitted here.
FIG.27 is a block diagram showing the main
configuration of the interior of second layer encoding
section 1308. Second layer encoding section 1308 mainly
comprises residual MDCT coefficient calculation section
381, band selection section 102, shape quantization
section 103, predictive encoding
execution/non-execution decision section 3804, gain quantization section 3805, and multiplexing section 106. With the exception of predictive encoding execution/non-execution decision section 3804 and gain quantization section 3805, configuration elements in second layer encoding section 1308 are identical to the configuration elements of second layer encoding section 308 shown in FIG.7, and therefore identical configuration elements are assigned the same reference codes and descriptions thereof are omitted here.
Predictive encoding execution/non-execution decision section 3804 has an internal buffer that stores band information m_max input from band selection section 102 in a past frame. Here, a case will be described by way of example in which predictive encoding
execution/non-execution decision section 3804 has an internal buffer that stores band information m_max for the past three frames. Predictive encoding execution/non-execution decision section 3804 first detects a subband common to a past-frame quantization target band and current-frame quantization target band using band information m_max input from band selection section 102 in a past frame and band information m_max input from band selection section 102 in the current frame. Of L subbands indicated by band information m_max input from band selection section 102, predictive encoding execution/non-execution decision section 3804 determines that predictive encoding is to be applied, and sets Pred_Flag(j)=ON, for a subband selected as a quantization target one frame back in time. On the other hand, of L subbands indicated by band information m_max input from band selection section 102, predictive encoding execution/non-execution decision section 3804 determines that predictive encoding is not to be applied, and sets Pred_Flag(j)=OFF, for a subband not selected as a quantization target one frame back in time. Here, Pred_Flag is a flag indicating a predictive encoding application/non-application determination result for each subband, with an ON value meaning that predictive encoding is to be applied to a subband gain value, and an OFF value meaning that predictive encoding is not to be applied to a subband gain value. Predictive encoding
execution/non-execution decision section 3804 outputs a determination result for each subband to gain quantization section 3805. Then predictive encoding execution/non-execution decision section 3804 updates the internal buffer storing band information using band information m_max input from band selection section 102 in the current frame.
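The per-subband determination carried out by predictive encoding execution/non-execution decision section 3804 can be sketched as follows; subbands are represented by integer indices and the function name is an illustrative assumption, not the section's actual implementation.

```python
# Hedged sketch of the Pred_Flag decision: ON for a subband of the
# current quantization target band that was also selected one frame
# back in time, OFF otherwise.

def decide_pred_flags(current_subbands, prev_subbands):
    """Return a Pred_Flag value per subband index j."""
    prev = set(prev_subbands)
    return {j: ("ON" if j in prev else "OFF") for j in current_subbands}
```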
Gain quantization section 3805 has an internal buffer that stores a quantization gain value obtained in a past frame. Gain quantization section 3805 switches between execution/non-execution of application of predictive encoding in current-frame gain value quantization according to a determination result input from predictive encoding execution/non-execution decision section 3804. For example, if predictive encoding is to be performed, gain quantization section 3805 searches an internal gain codebook composed of quantity GQ of gain code vectors for each of L subbands, performs a distance calculation corresponding to the determination result input from predictive encoding execution/non-execution decision section 3804, and finds an index of a gain code vector for which the result of Equation (41) below is a minimum. In Equation (41), one or the other distance calculation is performed according to Pred_Flag(j) for all j's satisfying j∈Region(m_max), and a gain vector index is found for which the total value of the error is a minimum.
Gain_q(i) = Σ_{j∈Region(m_max)} {Gain_i(j) − Σ_{t=1}^{3} (α_t · C_j^t) − α_0 · GC_k^i}^2   (if (Pred_Flag(j) = ON))
Gain_q(i) = Σ_{j∈Region(m_max)} {Gain_i(j) − GC_k^i}^2   (if (Pred_Flag(j) = OFF))
    i = 0, ..., GQ−1    k = 0, ..., L−1
. . . (Equation 41)
In this equation, GC_k^i indicates a gain code vector composing a gain codebook, i indicates a gain code vector index, and k indicates an index of a gain code vector element. For example, if the number of subbands composing a region is five (L = 5), k has a value of 0 to 4. Here, C_j^t indicates a gain value of t frames before in time, so that when t = 1, for example, C_j^1 indicates a gain value of one frame before in time. Also, α_t is a 4th-order linear prediction coefficient stored in gain quantization section 3805. Gain quantization section 3805 treats L subbands within one region as an L-dimensional vector, and performs vector quantization.
Gain quantization section 3805 outputs gain code vector index G_min for which the result of Equation (41) above is a minimum to multiplexing section 106 as gain encoded information.
Gain quantization section 3805 also updates the internal buffer in accordance with Equation (42) below using gain encoded information G_min and quantization gain value C_j^t obtained in the current frame. In Equation (42), a C_{j'}^1 value is updated with gain code vector GC^{G_min} element index j and j' satisfying j'∈Region(m_max) respectively associated in ascending order.
C_{j'}^3 = C_{j'}^2
C_{j'}^2 = C_{j'}^1
C_{j'}^1 = GC_j^{G_min}
    j'∈Region(m_max)    j = 0, ..., L−1
. . . (Equation 42)
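The Equation (41) codebook search and the Equation (42) buffer update can be sketched as follows; the codebook contents, prediction coefficients, and array layout are placeholder assumptions, not values from this specification.

```python
# Hedged NumPy sketch of the gain codebook search (Equation (41)) and
# buffer update (Equation (42)); all concrete values are placeholders.
import numpy as np

def quantize_gain(gain, pred_flags, codebook, alpha, C):
    """gain: (L,) target gains Gain_i(j); pred_flags: (L,) bool, True
    means Pred_Flag(j)=ON; codebook: (GQ, L) gain code vectors GC^i;
    alpha: (4,) prediction coefficients, alpha[0] weighting the code
    vector and alpha[1..3] the stored gains; C: (3, L) gains of the
    previous three frames."""
    # Predicted contribution of the past three frames, per subband.
    past = sum(alpha[t] * C[t - 1] for t in range(1, 4))
    errors = []
    for gc in codebook:
        err_on = (gain - past - alpha[0] * gc) ** 2   # Pred_Flag(j)=ON
        err_off = (gain - gc) ** 2                    # Pred_Flag(j)=OFF
        errors.append(float(np.where(pred_flags, err_on, err_off).sum()))
    g_min = int(np.argmin(errors))
    # Equation (42): shift the buffer and store the selected code vector.
    C[2], C[1], C[0] = C[1].copy(), C[0].copy(), codebook[g_min].copy()
    return g_min
```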
FIG.28 is a block diagram showing the main configuration of speech decoding apparatus 1400 according to this embodiment.
In this figure, speech decoding apparatus 1400 is equipped with control section 401, first layer decoding section 402, up-sampling section 403, frequency domain transform section 404, second layer decoding section 1405, time domain transform section 406, and switch 407.
With the exception of second layer decoding section 1405, configuration elements in speech decoding apparatus 1400 shown in FIG.28 are identical to the configuration elements of speech decoding apparatus 400 shown in FIG.8, and therefore identical configuration elements are assigned the same reference codes and descriptions thereof are omitted here.
FIG.29 is a block diagram showing the main configuration of the interior of second layer decoding section 1405. Second layer decoding section 1405 mainly comprises demultiplexing section 451, shape dequantization section 202, predictive decoding execution/non-execution decision section 4503, gain dequantization section 4504, and addition MDCT
coefficient calculation section 452. With the exception of predictive decoding execution/non-execution decision section 4503 and gain dequantization section 4504, configuration elements in second layer decoding section 1405 shown in FIG.29 are identical to the configuration elements of second layer decoding section 405 shown in FIG.9, and therefore identical configuration elements are assigned the same reference codes and descriptions thereof are omitted here.
Predictive decoding execution/non-execution decision section 4503 has an internal buffer that stores band information m_max input from demultiplexing section 451 in a past frame. Here, a case will be described by way of example in which predictive decoding execution/non-execution decision section 4503 has an internal buffer that stores band information m_max for the past three frames. Predictive decoding execution/non-execution decision section 4503 first detects a subband common to a past-frame quantization target band and current-frame quantization target band using band information m_max input from demultiplexing section 451 in a past frame and band information m_max input from demultiplexing section 451 in the current frame. Of L subbands indicated by band information m_max input from demultiplexing section 451, predictive decoding execution/non-execution decision section 4503 determines that predictive decoding is to be applied,
and sets Pred_Flag(j)=ON, for a subband selected as a quantization target one frame back in time. On the other hand, of L subbands indicated by band information m_max input from demultiplexing section 451, predictive decoding execution/non-execution decision section 4503 determines that predictive decoding is not to be applied, and sets Pred_Flag(j)=OFF, for a subband not selected as a quantization target one frame back in time. Here, Pred_Flag is a flag indicating a predictive decoding application/non-application determination result for each subband, with an ON value meaning that predictive decoding is to be applied to a subband gain value, and an OFF value meaning that predictive decoding is not to be applied to a subband gain value. Next, predictive decoding execution/non-execution decision section 4503 outputs a determination result for each subband to gain dequantization section 4504. Then predictive decoding execution/non-execution decision section 4503 updates the internal buffer storing band information using band information m_max input from demultiplexing section 451 in the current frame.
Gain dequantization section 4504 has an internal buffer that stores a gain value obtained in a past frame, and switches between execution/non-execution of application of predictive decoding in current-frame gain value decoding according to a determination result input from predictive decoding execution/non-execution
decision section 4503. Gain dequantization section 4504 has the same kind of internal gain codebook as gain quantization section 105 of speech encoding apparatus 100, and when performing predictive decoding, for example, obtains gain value Gain_q' by performing gain dequantization in accordance with Equation (43) below. Here, C'_j^t indicates a gain value of t frames before in time, so that when t = 1, for example, C'_j^1 indicates a gain value of one frame before. Also, α_t is a 4th-order linear prediction coefficient stored in gain dequantization section 4504. Gain dequantization section 4504 treats L subbands within one region as an L-dimensional vector, and performs vector dequantization. In Equation (43), a Gain_q'(j') value is calculated with gain code vector
GC_k^{G_min} element index k and j' satisfying j'∈Region(m_max) respectively associated in ascending order.
Gain_q'(j') = Σ_{t=1}^{3} (α_t · C'_{j'}^t) + α_0 · GC_k^{G_min}   (if (Pred_Flag(j') = ON))
Gain_q'(j') = GC_k^{G_min}   (if (Pred_Flag(j') = OFF))
    k = 0, ..., L−1    j'∈Region(m_max)
. . . (Equation 43)
Next, gain dequantization section 4504 calculates a decoded MDCT coefficient in accordance with Equation (44) below using a gain value obtained by current-frame dequantization and a shape value input from shape dequantization section 202, and updates the
internal buffer in accordance with Equation (45) below. In Equation (45), a C'_{j'}^1 value is updated with j of dequantized gain value Gain_q'(j) and j' satisfying j'∈Region(m_max) respectively associated in ascending order. Here, a calculated decoded MDCT coefficient is denoted by X''_k. Also, in MDCT coefficient dequantization, if k is present within B(j') through B(j'+1)−1, the gain value takes the value of Gain_q'(j').
X''_k = Gain_q'(j') · Shape_q'(k)    (B(j') ≤ k ≤ B(j'+1)−1)
    j'∈Region(m_max)
. . . (Equation 44)
C'_{j'}^3 = C'_{j'}^2
C'_{j'}^2 = C'_{j'}^1
C'_{j'}^1 = Gain_q'(j')
    j'∈Region(m_max)
. . . (Equation 45)
Gain dequantization section 4504 outputs decoded MDCT coefficient X''_k calculated in accordance with Equation (44) above to addition MDCT coefficient calculation section 452.
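The decoder-side processing of Equations (43) through (45) can be sketched in the same style; names, array shapes, and the representation of the band boundaries B(·) as a plain list are illustrative assumptions.

```python
# Hedged NumPy sketch of gain dequantization (Equation (43)), decoded
# MDCT coefficient calculation (Equation (44)), and the buffer update
# (Equation (45)); all names and layouts are assumptions.
import numpy as np

def dequantize_gain(g_min, pred_flags, codebook, alpha, C):
    """Rebuild decoded gains from index g_min, switching per subband
    between predictive (ON) and direct (OFF) dequantization, then shift
    the three-frame gain buffer C as in Equation (45)."""
    gc = codebook[g_min]
    past = sum(alpha[t] * C[t - 1] for t in range(1, 4))
    gain = np.where(pred_flags, past + alpha[0] * gc, gc)
    C[2], C[1], C[0] = C[1].copy(), C[0].copy(), gain.copy()
    return gain

def decoded_mdct(gain, shape, band_edges):
    """Equation (44): X''_k = Gain_q'(j') * Shape_q'(k) for
    B(j') <= k <= B(j'+1)-1, with band_edges[j] standing in for B(j)."""
    out = np.zeros_like(shape)
    for j, g in enumerate(gain):
        out[band_edges[j]:band_edges[j + 1]] = g * shape[band_edges[j]:band_edges[j + 1]]
    return out
```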
Thus, according to this embodiment, at the time of gain quantization of a quantization target band selected in each frame, whether or not each subband included in a quantization target band was quantized in a past frame is detected. Then vector quantization is performed, with predictive encoding being applied to a subband quantized in a past frame, and with predictive encoding not being applied to a subband not quantized in a past frame. By this means, frequency domain
parameter encoding can be carried out more efficiently than with a method whereby predictive encoding application/non-application switching is performed for an entire vector.
In this embodiment, a method has been described whereby switching is performed between application and non-application of predictive encoding in a gain quantization section according to the number of subbands common to a quantization target band selected in the current frame and a quantization target band selected one frame back in time, but the present invention is not limited to this, and a number of subbands common to a quantization target band selected in the current frame and a quantization target band selected two or more frames back in time may also be used. In this case, even if the number of subbands common to a quantization target band selected in the current frame and a quantization target band selected one frame back in time is less than or equal to a predetermined value, predictive encoding may be applied in a gain quantization section according to the number of subbands common to a quantization target band selected in the current frame and a quantization target band selected two or more frames back in time.
It is also possible for the quantization method described in this embodiment to be combined with the quantization target band selection method described in Embodiment 6. A case will be described in which, for
example, a region that is a quantization target band is
composed of a low-band-side subband group and a
high-band-side subband group, the high-band-side subband
group is fixed throughout all frames, and a vector in
which low-band-side subband group gain and high-band-side
subband group are made consecutive is quantized. In this
case, within a quantization target band gain vector,
vector quantization is performed with predictive encoding
always being applied for an element indicating
high-band-side subband group gain, and predictive
encoding not being applied for an element indicating
low-band-side subband group gain. By this means, gain
vector quantization can be carried out more efficiently
than when predictive encoding
application/non-application switching is performed for an entire vector. At this time, in the low-band-side subband group, a method whereby vector quantization is performed with predictive encoding being applied to a subband quantized in a past frame, and with predictive encoding not being applied to a subband not quantized in a past frame, is also efficient. Also, for an element indicating low-band-side subband group gain, quantization is performed by switching between application and non-application of predictive encoding using subbands composing a quantization target band selected in a past frame in time, as described in Embodiment 1. By this means, gain vector quantization can be performed still more
efficiently. It is also possible for the present invention to be applied to a configuration that combines above-described configurations.
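The combined scheme just described, with prediction always applied to the fixed high-band subband group and switched per subband for the low-band group, can be sketched as follows; the ordering of low-band elements before high-band elements in the flag vector is an illustrative assumption.

```python
# Hedged sketch of flag construction for a region whose high-band
# subband group is fixed throughout all frames; names are assumptions.

def pred_flags_fixed_high_band(low_subbands, prev_low_subbands, n_high):
    """Prediction is always ON for the n_high fixed high-band elements;
    for the low-band group it is switched per subband according to
    selection in the previous frame, as in Embodiment 1."""
    prev = set(prev_low_subbands)
    return [j in prev for j in low_subbands] + [True] * n_high
```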
This concludes a description of embodiments of the present invention.
In the above embodiments, cases have been described by way of example in which the method of selecting a quantization target band is to select the region with the highest energy in all bands, but the present invention is not limited to this, and a certain band may also be preliminarily selected beforehand, after which a quantization target band is finally selected in the preliminarily selected band. In such a case, a preliminarily selected band may be decided according to the input signal sampling rate, coding bit rate, or the like. For example, one method is to select a low band preliminarily when the sampling rate is low.
In the above embodiments, MDCT is used as a transform encoding method, and therefore "MDCT coefficient" used in the above embodiments essentially means "spectrum". Therefore, the expression "MDCT coefficient" may be replaced by "spectrum".
In the above embodiments, examples have been shown
in which speech decoding apparatuses 200, 200a, 400, 600, 800, 1010, 1200, and 1400 receive as input and process
encoded data transmitted from speech encoding apparatuses 100, 100a, 300, 500, 700, 1000, 1100, and 1300,
respectively, but encoded data output by an encoding apparatus of a different configuration capable of generating encoded data having a similar configuration may also be input and processed.
An encoding apparatus, decoding apparatus, and method thereof according to the present invention are not limited to the above-described embodiments, and various variations and modifications may be possible without departing from the scope of the present invention. For example, it is possible for embodiments to be implemented by being combined appropriately.
It is possible for an encoding apparatus and decoding apparatus according to the present invention to be installed in a communication terminal apparatus and base station apparatus in a mobile communication system, thereby enabling a communication terminal apparatus, base station apparatus, and mobile communication system that have the same kind of operational effects as described above to be provided. A case has here been described by way of example in which the present invention is configured as hardware, but it is also possible for the present invention to be implemented by software. For example, the same kind of functions as those of an encoding apparatus and decoding apparatus according to the present invention can be realized by writing an algorithm of an encoding method and decoding method according to the present invention
in a programming language, storing this program in memory, and having it executed by an information processing means.
The function blocks used in the descriptions of the above embodiments are typically implemented as LSIs, which are integrated circuits. These may be implemented individually as single chips, or a single chip may incorporate some or all of them.
Here, the term LSI has been used, but the terms IC, system LSI, super LSI, ultra LSI, and so forth may also be used according to differences in the degree of integration.
The method of implementing integrated circuitry is not limited to LSI, and implementation by means of dedicated circuitry or a general-purpose processor may also be used. An FPGA (Field Programmable Gate Array) for which programming is possible after LSI fabrication, or a reconfigurable processor allowing reconfiguration of circuit cell connections and settings within an LSI, may also be used.
In the event of the introduction of an integrated circuit implementation technology whereby LSI is replaced by a different technology as an advance in, or derivation from, semiconductor technology, integration of the function blocks may of course be performed using that technology. The application of biotechnology or the like is also a possibility.
The disclosures of Japanese Patent Application
No.2006-336270, filed on December 13, 2006, Japanese Patent Application No.2007-053499, filed on March 2, 2007, Japanese Patent Application No.2007-132078, filed on May 17, 2007, and Japanese Patent Application No.2007-185078, filed on July 13, 2007, including the specifications, drawings and abstracts, are incorporated herein by reference in their entirety.
Industrial Applicability
An encoding apparatus and so forth according to the present invention is suitable for use in a communication terminal apparatus, base station apparatus, or the like, in a mobile communication system.
WE CLAIM:
1. An encoding apparatus comprising:
a transform section that transforms an input signal to a frequency domain to obtain a frequency domain parameter;
a selection section that selects a quantization target band from among a plurality of subbands obtained by dividing the frequency domain, and generates band information indicating the quantization target band;
a shape quantization section that quantizes a shape of the frequency domain parameter in the quantization target band; and
a gain quantization section that encodes gain of a frequency domain parameter in the quantization target band to obtain gain encoded information.
2. The encoding apparatus according to claim 1, further comprising a determination section that determines whether or not predictive encoding is to be performed based on a number of subbands common to the quantization target band and a quantization target band selected in the past,
wherein the gain quantization section encodes gain of the frequency domain parameter in accordance with a determination result of the determination section.
3. The encoding apparatus according to claim 2, further
comprising a determination section that determines that predictive encoding is to be performed when a number of subbands common to the quantization target band and a quantization target band selected in the past is greater than or equal to a predetermined value, and determines that predictive encoding is not to be performed when the number of common subbands is less than the predetermined value,
wherein the gain quantization section obtains gain encoded information by performing predictive encoding on gain of a frequency domain parameter in the quantization target band using past gain encoded information when the determination section determines that predictive encoding is to be performed, and obtains gain encoded information by directly quantizing gain of a frequency domain parameter in the quantization target band when the determination section determines that predictive encoding is not to be performed.
4. The encoding apparatus according to claim 1, wherein the gain quantization section obtains the gain encoded information by performing vector quantization of gain of the frequency domain parameter.
5. The encoding apparatus according to claim 1, wherein the gain quantization section obtains the gain encoded information by performing predictive quantizing of the
gain using gain of a frequency domain parameter in a past frame.
6. The encoding apparatus according to claim 1, wherein the selection section selects a region for which energy is highest among regions composed of a plurality of subbands as a quantization target band.
7. The encoding apparatus according to claim 1, wherein the selection section, when candidate bands exist for which a number of subbands common to a quantization target band selected in the past is greater than or equal to a predetermined value and energy is greater than or equal to a predetermined value, selects a band for which energy is highest among the candidate bands as the quantization target band, and when the candidate bands do not exist, selects a band for which energy is highest in all bands of the frequency domain as the quantization target band.
8. The encoding apparatus according to claim 1, wherein the selection section selects a band closest to a quantization target band selected in the past among bands for which energy is greater than or equal to a predetermined value as the quantization target band.
9. The encoding apparatus according to claim 1, wherein the selection section selects the quantization target
band after multiplication by a weight that is larger the more toward a low-band side a subband is.
10. The encoding apparatus according to claim 1, wherein the selection section selects a low-band-side fixed subband as the quantization target band.
11. The encoding apparatus according to claim 1, wherein the selection section selects the quantization target band after multiplication by a weight that is larger the higher the frequency of selection in the past of a subband is.
12. The encoding apparatus according to claim 2, further
comprising an interpolation section that performs
interpolation on gain of a frequency domain parameter
in a subband not quantized in the past among subbands
indicated by the band information using past gain encoded
information, to obtain an interpolation value,
wherein the gain quantization section also uses the interpolation value when performing the predictive encoding.
13. The encoding apparatus according to claim 2, further
comprising a deciding section that decides a prediction
coefficient such that a weight of a gain value of a past
frame is larger the larger a subband common to a
quantization target band of a past frame and a quantization target band of a current frame is,
wherein the gain quantization section uses the prediction coefficient when performing the predictive encoding.
14. The encoding apparatus according to claim 1, wherein the selection section fixedly selects a predetermined subband as part of the quantization target band.
15. The encoding apparatus according to claim 1, wherein the selection section selects the quantization target band after multiplication by a weight that is larger the more toward a high-band side a subband is in part of the quantization target band.
16. The encoding apparatus according to claim 2, wherein the gain quantization section performs predictive encoding on gain of a frequency domain parameter in part of the quantization target band, and performs direct quantizing on gain of a frequency domain parameter in a remaining part.
17. The encoding apparatus according to claim 1, wherein the gain quantization section performs vector quantization of the gain of a nonconsecutive plurality of subbands.
18. A decoding apparatus comprising:
a receiving section that receives information indicating a quantization target band selected from among a plurality of subbands obtained by dividing a frequency domain of an input signal;
a shape dequantization section that decodes shape encoded information in which a shape of a frequency domain parameter in the quantization target band is quantized, to generate a decoded shape;
a gain dequantization section that decodes gain encoded information in which gain of a frequency domain parameter in the quantization target band is quantized, to generate decoded gain, and decodes a frequency parameter using the decoded shape and the decoded gain to generate a decoded frequency domain parameter; and
a time domain transform section that transforms the decoded frequency domain parameter to the time domain and obtains a time domain decoded signal.
19. The decoding apparatus according to claim 18,
further comprising a determination section that
determines whether or not predictive decoding is to be
performed based on a number of subbands common to the
quantization target band and a quantization target band
selected in the past,
wherein the gain dequantization section decodes the
gain encoded information in accordance with a determination result of the determination section to generate decoded gain.
20. The decoding apparatus according to claim 19, further comprising a determination section that determines that predictive decoding is to be performed when a number of subbands common to the quantization target band and a quantization target band selected in the past is greater than or equal to a predetermined value, and determines that predictive decoding is not to be performed when the number of common subbands is less than the predetermined value,
wherein the gain dequantization section performs predictive decoding of gain of a frequency domain parameter in the quantization target band using gain obtained in past gain decoding when the determination section determines that predictive decoding is to be performed, and performs direct dequantization of gain encoded information in which gain of a frequency domain parameter is quantized in the quantization target band when the determination section determines that predictive decoding is not to be performed.
21. An encoding method comprising:
a step of transforming an input signal to a frequency domain to obtain a frequency domain parameter;
a step of selecting a quantization target band from among a plurality of subbands obtained by dividing the frequency domain, and generating band information indicating the quantization target band;
a step of quantizing a shape of the frequency domain parameter in the quantization target band to obtain shape encoded information; and
a step of encoding gain of a frequency domain parameter in the quantization target band to obtain gain encoded information.
22. A decoding method comprising:
a step of receiving information indicating a quantization target band selected from among a plurality of subbands obtained by dividing a frequency domain of an input signal;
a step of decoding shape encoded information in which the shape of a frequency domain parameter in the quantization target band is quantized, to generate a decoded shape;
a step of decoding gain encoded information in which gain of a frequency domain parameter in the quantization target band is quantized, to generate decoded gain, and decoding a frequency domain parameter using the decoded shape and the decoded gain to generate a decoded frequency domain parameter; and
a step of transforming the decoded frequency domain
parameter to a time domain to obtain a time domain decoded signal.
Dated this 8th day of June, 2009
FOR PANASONIC CORPORATION By their Agent
(MADHURI RAMESHCHANDRA TAWRI) KRISHNA & SAURASTRI