Encoding Apparatus And Decoding Apparatus In A Scalable Coding Scheme
Abstract:
Disclosed is an encoding device which can accurately specify a band having a large error among all the bands by using a small calculation amount. The device includes: a first position identification unit (201) which uses a first layer error conversion coefficient indicating an error of a decoded signal relative to an input signal so as to search for a band having a large error in a relatively wide bandwidth in all the bands of the input signal and generates first position information indicating the identified band; a second position identification unit (202) which searches for a target frequency band having a large error in a relatively narrow bandwidth in the band identified by the first position identification unit (201) and generates second position information indicating the identified target frequency band; and an encoding unit (203) which encodes a first layer decoding error conversion coefficient contained in the target frequency band. The first position information, the second position information, and the encoded information are transmitted to a communication partner.
C/O. PANASONIC CORPORATION,
1006. OAZA KADOMA,
KADOMA-SHI,
OSAKA, JAPAN 571-8501.
3. MORII TOSHIYUKI
C/O. PANASONIC CORPORATION,
1006. OAZA KADOMA,
KADOMA-SHI,
OSAKA, JAPAN 571-8501.
4. NOT APPLICABLE
NOT APPLICABLE
Specification
FORM 2
THE PATENTS ACT, 1970 (39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
[See section 10, Rule 13]
ENCODING DEVICE, DECODING DEVICE, AND METHOD THEREOF;
PANASONIC CORPORATION, A COMPANY
ORGANIZED AND EXISTING UNDER THE LAWS OF JAPAN, WHOSE ADDRESS IS 1006, OAZA KADOMA, KADOMA-SHI, OSAKA 571-8501, JAPAN.
THE FOLLOWING SPECIFICATION PARTICULARLY DESCRIBES THE INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED.
Technical Field
The present invention relates to an encoding apparatus, decoding apparatus and methods thereof used in a communication system of a scalable coding scheme.
Background Art
In a mobile communication system, speech signals need to be compressed to low bit rates for transmission so that radio wave resources and the like are used efficiently. On the other hand, there is also demand for improved quality of phone call speech and for high-fidelity call services, and, to meet these demands, it is preferable not only to provide high-quality speech signals but also to encode signals other than speech, such as wider-band audio signals, with high quality.
The technique of integrating a plurality of coding techniques in layers is promising for meeting these two contradictory demands. This technique combines in layers a first layer for encoding input signals in a form suited to speech signals at low bit rates and a second layer for encoding differential signals between input signals and decoded signals of the first layer in a form suited to signals other than speech. Layered coding of this kind provides scalability in the bit streams acquired from an encoding apparatus, that is, decoded signals can be acquired from part of the information of the bit streams, and, therefore, it is generally referred to as "scalable coding (layered coding)."
The scalable coding scheme can flexibly support communication between networks of varying bit rates thanks to this characteristic, and, consequently, is suited to a future network environment where various networks will be integrated by IP (Internet Protocol).
For example, Non-Patent Document 1 discloses a technique of realizing scalable coding using the technique that is standardized by MPEG-4 (Moving Picture Experts Group phase-4). This technique uses CELP (Code Excited Linear Prediction) coding suited to speech signals in the first layer, and, in the second layer, uses transform coding such as AAC (Advanced Audio Coding) and TwinVQ (Transform Domain Weighted Interleave Vector Quantization) with respect to residual signals acquired by subtracting first layer decoded signals from original signals.
By contrast with this, Non-Patent Document 2 discloses a method of encoding MDCT coefficients of desired frequency bands in layers using TwinVQ that is applied to a module as a basic component. By sharing this module and using it a plurality of times, it is possible to implement simple scalable coding with a high degree of flexibility. Although this method is based on the configuration where the subbands which are the targets to be encoded by each layer are determined in advance, a configuration is also disclosed where the position of a subband, which is the target to be encoded by each layer, is changed within predetermined bands according to the property of input signals.
Non-Patent Document 1: "All about MPEG-4," written and edited by Sukeichi MIKI, the first edition, Kogyo Chosakai Publishing, Inc., September 30, 1998, page 126 to 127
Non-Patent Document 2: "Scalable Audio Coding Based on Hierarchical Transform Coding Modules," Akio JIN et al., Academic Journal of The Institute of Electronics, Information and Communication Engineers, Volume J83-A, No.3, page 241 to 252, March, 2000
Non-Patent Document 3: "AMR Wideband Speech Codec; Transcoding functions," 3GPP TS 26.190, March 2001.
Non-Patent Document 4: "Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB), Service options 62 and 63 for Spread Spectrum Systems," 3GPP2 C.S0052-A, April 2005.
Non-Patent Document 5: "7/10/15 kHz band scalable speech coding schemes using the band enhancement technique by means of pitch filtering," Journal of Acoustic Society of Japan 3-11-4, page 327 to 328, March 2004
Disclosure of the Invention
Problems to be Solved by the Invention
However, to improve the speech quality of output signals, how the subbands (i.e. target frequency bands) of the second layer encoding section are set is important. The method disclosed in Non-Patent Document 2 determines in advance the subbands which are the targets to be encoded by the second layer (FIG.1A). In this case, the quality of the predetermined subbands is improved at all times and, therefore, there is a problem that, when error components are concentrated in bands other than these subbands, little improvement in speech quality is acquired.
Further, although Non-Patent Document 2 discloses that the position of a subband, which is the target to be encoded by each layer, is changed within predetermined bands (FIG.1B) according to the property of input signals, the position employed by the subband is limited to within the predetermined bands and, therefore, the above-described problem cannot be solved. If a band employed as a subband covers the full band of an input signal (FIG.1C), there is a problem that the computational complexity to specify the position of a subband increases. Furthermore, when the number of layers increases, the position of a subband needs to be specified on a per layer basis and, therefore, this problem becomes substantial.
It is therefore an object of the present invention to provide an encoding apparatus, decoding apparatus and methods thereof for, in a scalable coding scheme, accurately specifying a band of a great error from the full band with a small computational complexity.
Means for Solving the Problem
The encoding apparatus according to the present invention employs a configuration which includes: a first layer encoding section that performs encoding processing with respect to input transform coefficients to generate first layer encoded data; a first layer decoding section that performs decoding processing using the first layer encoded data to generate first layer decoded transform coefficients; and a second layer encoding section that performs encoding processing with respect to a target frequency band where, in first layer error transform coefficients representing an error between the input transform coefficients and the first layer decoded transform coefficients, a maximum error is found, to generate second layer encoded data, wherein the second layer encoding section has: a first position specifying section that searches for a first band having the maximum error throughout a full band, based on a wider bandwidth than the target frequency band and a predetermined first step size to generate first position information showing the specified first band; a second position specifying section that searches for the target frequency band throughout the first band, based on a narrower second step size than the first step size to generate second position information showing the specified target frequency band; and an encoding section that encodes the first layer error transform coefficients included in the target frequency band specified based on the first position information and the second position information to generate encoded information.
The decoding apparatus according to the present invention employs a configuration which includes: a receiving section that receives: first layer encoded data acquired by performing encoding processing with respect to input transform coefficients; second layer encoded data acquired by performing encoding processing with respect to a target frequency band where, in first layer error transform coefficients representing an error between the input transform coefficients and first layer decoded transform coefficients which are acquired by decoding the first layer encoded data, a maximum error is found; first position information showing a first band which maximizes the error, in a bandwidth wider than the target frequency band; and second position information showing the target frequency band in the first band; a first layer decoding section that decodes the first layer encoded data to generate first layer decoded transform coefficients; a second layer decoding section that specifies the target frequency band based on the first position information and the second position information and decodes the second layer encoded data to generate first layer decoded error transform coefficients; and an adding section that adds the first layer decoded transform coefficients and the first layer decoded error transform coefficients to generate second layer decoded transform coefficients.
The encoding method according to the present invention includes: a first layer encoding step of performing encoding processing with respect to input transform coefficients to generate first layer encoded data; a first layer decoding step of performing decoding processing using the first layer encoded data to generate first layer decoded transform coefficients; and a second layer encoding step of performing encoding processing with respect to a target frequency band where, in first layer error transform coefficients representing an error between the input transform coefficients and the first layer decoded transform coefficients, a maximum error is found, to generate second layer encoded data, where the second layer encoding step includes: a first position specifying step of searching for a first band having the maximum error throughout a full band, based on a wider bandwidth than the target frequency band and a predetermined first step size to generate first position information showing the specified first band; a second position specifying step of searching for the target frequency band throughout the first band, based on a narrower second step size than the first step size to generate second position information showing the specified target frequency band; and an encoding step of encoding the first layer error transform coefficients included in the target frequency band specified based on the first position information and the second position information to generate encoded information.
The decoding method according to the present invention includes: a receiving step of receiving: first layer encoded data acquired by performing encoding processing with respect to input transform coefficients; second layer encoded data acquired by performing encoding processing with respect to a target frequency band where, in first layer error transform coefficients representing an error between the input transform coefficients and first layer decoded transform coefficients which are acquired by decoding the first layer encoded data, a maximum error is found; first position information showing a first band which maximizes the error, in a bandwidth wider than the target frequency band; and second position information showing the target frequency band in the first band; a first layer decoding step of decoding the first layer encoded data to generate first layer decoded transform coefficients; a second layer decoding step of specifying the target frequency band based on the first position information and the second position information and decoding the second layer encoded data to generate first layer decoded error transform coefficients; and an adding step of adding the first layer decoded transform coefficients and the first layer decoded error transform coefficients to generate second layer decoded transform coefficients.
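The adding step described above can be pictured with a minimal Python sketch. It assumes that the decoded error coefficients cover only the target frequency band, that they are zero elsewhere, and that the start bin of the target band (here the hypothetical parameter `target_start`) has already been derived from the first and second position information; the function name is illustrative and does not appear in the specification.

```python
def add_second_layer(first_decoded, decoded_error_band, target_start):
    """Add decoded error coefficients onto the first layer decoded coefficients.

    first_decoded      : first layer decoded transform coefficients (full band)
    decoded_error_band : decoded error coefficients of the target band only
    target_start       : first bin of the target band (assumed already derived
                         from the first and second position information)
    """
    second_decoded = list(first_decoded)  # bins outside the target band are unchanged
    for offset, value in enumerate(decoded_error_band):
        second_decoded[target_start + offset] += value
    return second_decoded


if __name__ == "__main__":
    print(add_second_layer([1.0, 2.0, 3.0, 4.0], [0.5, -0.5], 1))
```

Only the bins inside the target band change, which is why the decoder needs the two items of position information in addition to the encoded information itself.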
Advantageous Effects of Invention
According to the present invention, the first position specifying section searches for the band of a great error throughout the full band of an input signal, based on relatively wide bandwidths and relatively rough step sizes to specify the band of a great error, and a second position specifying section searches for the target frequency band (i.e. the frequency band having the greatest error) in the band specified in the first position specifying section based on relatively narrower bandwidths and relatively narrower step sizes to specify the band having the greatest error, so that it is possible to specify the band of a great error from the full band with a small computational complexity and improve sound quality.
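The reduction in computational complexity can be illustrated with a small Python sketch of a coarse-to-fine search. This is only a sketch under stated assumptions: frequencies are treated as integer bin indices, the error measure is the plain sum of squared error coefficients, and the names `two_stage_search`, `wide_bw`, `coarse_step`, `target_bw` and `fine_step` are hypothetical. Positions are returned as 0-based indices for brevity.

```python
def band_energy(err, start, width):
    """Sum of squared error coefficients in bins [start, start + width)."""
    return sum(e * e for e in err[start:start + width])


def two_stage_search(err, wide_bw, coarse_step, target_bw, fine_step):
    """Return (first_pos, second_pos, evaluations), positions being 0-based."""
    n = len(err)
    # Stage 1: wide bands placed every `coarse_step` bins over the full band.
    coarse_starts = list(range(0, n - wide_bw + 1, coarse_step))
    first_pos = max(range(len(coarse_starts)),
                    key=lambda i: band_energy(err, coarse_starts[i], wide_bw))
    base = coarse_starts[first_pos]
    # Stage 2: narrow target bands placed every `fine_step` bins inside it.
    fine_starts = list(range(base, base + wide_bw - target_bw + 1, fine_step))
    second_pos = max(range(len(fine_starts)),
                     key=lambda j: band_energy(err, fine_starts[j], target_bw))
    return first_pos, second_pos, len(coarse_starts) + len(fine_starts)


def exhaustive_count(n, target_bw, fine_step):
    """Number of candidates if the fine search covered the full band."""
    return len(range(0, n - target_bw + 1, fine_step))


if __name__ == "__main__":
    err = [0.0] * 64
    err[20] = 1.0
    err[40:44] = [3.0, 4.0, 4.0, 3.0]
    print(two_stage_search(err, 32, 16, 4, 2))
    print(exhaustive_count(64, 4, 2))
```

With these example settings the two-stage search evaluates 18 candidate bands, against 31 for a fine search over the full band. A coarse-to-fine search of this kind is not guaranteed to find the same band as an exhaustive search in every case; the sketch only shows where the saving in evaluations comes from.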
Brief Description of Drawings
FIG.1 shows an encoded band of the second layer encoding section of a conventional speech encoding apparatus;
FIG.2 is a block diagram showing the main configuration of an encoding apparatus according to Embodiment 1 of the present invention;
FIG.3 is a block diagram showing the configuration of the second layer encoding section shown in FIG.2;
FIG.4 shows the position of a band specified in the first position specifying section shown in FIG.3;
FIG.5 shows another position of a band specified in the first position specifying section shown in FIG.3;
FIG.6 shows the position of target frequency band specified in the second position specifying section shown in FIG.3;
FIG.7 is a block diagram showing the configuration of an encoding section shown in FIG.3;
FIG.8 is a block diagram showing a main configuration of a decoding apparatus according to Embodiment 1 of the present invention;
FIG.9 shows the configuration of the second layer decoding section shown in FIG.8;
FIG.10 shows the state of the first layer decoded error transform coefficients outputted from the arranging section shown in FIG.9;
FIG.11 shows the position of the target frequency specified in the second position specifying section shown in FIG.3;
FIG.12 is a block diagram showing another aspect of the configuration of the encoding section shown in FIG.7;
FIG.13 is a block diagram showing another aspect of the configuration of the second layer decoding section shown in FIG.9;
FIG.14 is a block diagram showing the configuration of the second layer encoding section of the encoding apparatus according to Embodiment 3 of the present invention;
FIG.15 shows the position of the target frequency specified in a plurality of sub-position specifying sections of the encoding apparatus according to Embodiment 3;
FIG.16 is a block diagram showing the configuration of the second layer encoding section of the encoding apparatus according to Embodiment 4 of the present invention;
FIG.17 is a block diagram showing the configuration of the encoding section shown in FIG.16;
FIG.18 shows an encoding section in case where the second position information candidates stored in the second position information codebook in FIG.17 each have three target frequencies;
FIG.19 is a block diagram showing another configuration of the encoding section shown in FIG.16;
FIG.20 is a block diagram showing the configuration of the second layer encoding section according to Embodiment 5 of the present invention;
FIG.21 shows the position of a band specified in the first position specifying section shown in FIG.20;
FIG.22 is a block diagram showing the main configuration of the encoding apparatus according to Embodiment 6;
FIG.23 is a block diagram showing the configuration of the first layer encoding section of the encoding apparatus shown in FIG.22;
FIG.24 is a block diagram showing the configuration of the first layer decoding section of the encoding apparatus shown in FIG.22;
FIG.25 is a block diagram showing the main configuration of the decoding apparatus supporting the encoding apparatus shown in FIG.22;
FIG.26 is a block diagram showing the main configuration of the encoding apparatus according to Embodiment 7;
FIG.27 is a block diagram showing the main configuration of the decoding apparatus supporting the encoding apparatus shown in FIG.26;
FIG.28 is a block diagram showing another aspect of the main configuration of the encoding apparatus according to Embodiment 7;
FIG.29A shows the positions of bands in the second layer encoding section shown in FIG.28;
FIG.29B shows the positions of bands in the third layer encoding section shown in FIG.28;
FIG.29C shows the positions of bands in the fourth layer encoding section shown in FIG.28;
FIG.30 is a block diagram showing the main configuration of the decoding apparatus supporting the encoding apparatus shown in FIG.28;
FIG.31A shows other positions of bands in the second layer encoding section shown in FIG.28;
FIG.31B shows other positions of bands in the third layer encoding section shown in FIG.28;
FIG.31C shows other positions of bands in the fourth layer encoding section shown in FIG.28;
FIG.32 illustrates the operation of the first position specifying section according to Embodiment 8;
FIG.33 is a block diagram showing the configuration of the first position specifying section according to Embodiment 8;
FIG.34 illustrates how the first position information is formed in the first position information forming section according to Embodiment 8;
FIG.35 illustrates decoding processing according to Embodiment 8;
FIG.36 illustrates a variation of Embodiment 8; and
FIG.37 illustrates a variation of Embodiment 8.
Best Mode for Carrying Out the Invention
Embodiments of the present invention will be explained in detail below with reference to the accompanying drawings.
(Embodiment 1)
FIG.2 is a block diagram showing the main configuration of an encoding apparatus according to Embodiment 1 of the present invention. Encoding apparatus 100 shown in FIG.2 has frequency domain transforming section 101, first layer encoding section 102, first layer decoding section 103, subtracting section 104, second layer encoding section 105 and multiplexing section 106.
Frequency domain transforming section 101 transforms a time domain input signal into a frequency domain signal (i.e. input transform coefficients), and outputs the input transform coefficients to first layer encoding section 102.
First layer encoding section 102 performs encoding processing with respect to the input transform coefficients to generate first layer encoded data, and outputs this first layer encoded data to first layer decoding section 103 and multiplexing section 106.
First layer decoding section 103 performs decoding processing using the first layer encoded data to generate first layer decoded transform coefficients, and outputs the first layer decoded transform coefficients to subtracting section 104.
Subtracting section 104 subtracts the first layer decoded transform coefficients generated in first layer decoding section 103, from the input transform coefficients, to generate first layer error transform coefficients, and outputs these first layer error transform coefficients to second layer encoding section 105.
Second layer encoding section 105 performs encoding processing of the first layer error transform coefficients outputted from subtracting section 104, to generate second layer encoded data, and outputs this second layer encoded data to multiplexing section 106.
Multiplexing section 106 multiplexes the first layer encoded data acquired in first layer encoding section 102 and the second layer encoded data acquired in second layer encoding section 105 to form a bit stream, and outputs this bit stream as final encoded data, to the transmission channel.
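The signal flow through sections 102 to 104 can be sketched in Python as follows. The specification does not fix the first layer coding method at this point, so the uniform scalar quantizer below is purely a stand-in assumption, and the function names and the `step` parameter are hypothetical.

```python
def first_layer_encode(coeffs, step=1.0):
    """Stand-in for section 102: uniform scalar quantization (an assumption)."""
    return [round(c / step) for c in coeffs]


def first_layer_decode(indices, step=1.0):
    """Stand-in for section 103: reconstruct coefficients from the indices."""
    return [i * step for i in indices]


def first_layer_error(coeffs, step=1.0):
    """Return (encoded data, decoded coefficients, error coefficients)."""
    encoded = first_layer_encode(coeffs, step)
    decoded = first_layer_decode(encoded, step)
    # Section 104: input transform coefficients minus first layer decoded ones.
    error = [x - d for x, d in zip(coeffs, decoded)]
    return encoded, decoded, error


if __name__ == "__main__":
    print(first_layer_error([0.25, 1.75, -2.25]))
```

Whatever the first layer actually is, the second layer only sees the error coefficients, so adding the error back to the decoded coefficients recovers the input coefficients.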
FIG.3 is a block diagram showing a configuration of second layer encoding section 105 shown in FIG.2. Second layer encoding section 105 shown in FIG.3 has first position specifying section 201, second position specifying section 202, encoding section 203 and multiplexing section 204.
First position specifying section 201 uses the first layer error transform coefficients received from subtracting section 104 to search for a band employed as the target frequency band, which is the target to be encoded, based on predetermined bandwidths and predetermined step sizes, and outputs information showing the specified band as first position information, to second position specifying section 202, encoding section 203 and multiplexing section 204. First position specifying section 201 will be described later in detail. Further, this specified band may be referred to as "range" or "region."
Second position specifying section 202 searches for the target frequency band in the band specified in first position specifying section 201 based on narrower bandwidths than the bandwidths used in first position specifying section 201 and narrower step sizes than the step sizes used in first position specifying section 201, and outputs information showing the specified target frequency band as second position information, to encoding section 203 and multiplexing section 204. Second position specifying section 202 will be described later in detail.
Encoding section 203 encodes the first layer error transform coefficients included in the target frequency band specified based on the first position information and second position information to generate encoded information, and outputs the encoded information to multiplexing section 204. Encoding section 203 will be described later in detail.
Multiplexing section 204 multiplexes the first position information, second position information and encoded information to generate second layer encoded data, and outputs this second layer encoded data. Further, this multiplexing section 204 is not indispensable and these items of information may be outputted directly to multiplexing section 106 shown in FIG.2.
FIG.4 shows the band specified in first position specifying section 201 shown in FIG.3.
In FIG.4, first position specifying section 201 specifies one of three bands set based on a predetermined bandwidth, and outputs position information of this band as first position information, to second position specifying section 202, encoding section 203 and multiplexing section 204. Each band shown in FIG.4 is configured to have a bandwidth equal to or wider than the target frequency bandwidth (band 1 is equal to or higher than F1 and lower than F3, band 2 is equal to or higher than F2 and lower than F4, and band 3 is equal to or higher than F3 and lower than F5). Further, although each band is configured to have the same bandwidth with the present embodiment, each band may be configured to have a different bandwidth. For example, like the critical bandwidth of human perception, the bandwidths of bands positioned in a low frequency band may be set narrow and the bandwidths of bands positioned in a high frequency band may be set wide.
Next, the method of specifying a band in first position specifying section 201 will be explained. Here, first position specifying section 201 specifies a band based on the magnitude of energy of the first layer error transform coefficients. The first layer error transform coefficients are represented as e_1(k), and energy ER(i) of the first layer error transform coefficients included in each band is calculated according to following equation 1.
\[ ER(i) = \sum_{k=FRL(i)}^{FRH(i)} e_1(k)^2 \qquad \text{(Equation 1)} \]
Here, i is an identifier that specifies a band, FRL(i) is the lowest frequency of the band i and FRH(i) is the highest frequency of the band i.
In this way, the band in which the energy of the first layer error transform coefficients is greater is specified and the first layer error transform coefficients included in this band of a great error are encoded, so that it is possible to decrease errors between decoded signals and input signals and improve speech quality.
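The band selection by energy can be sketched in Python as follows. The sketch assumes that the band edges FRL(i) and FRH(i) are inclusive integer bin indices and that the energy is the sum of squared coefficients over the band; the function names and the example bands, which overlap in the manner of band 1 to band 3 in FIG.4, are illustrative only.

```python
def band_energy(e1, frl, frh):
    """Energy of the error coefficients e1[k] for FRL <= k <= FRH."""
    return sum(e1[k] ** 2 for k in range(frl, frh + 1))


def select_band(e1, bands):
    """Return (1-based id of the band with the greatest energy, all energies).

    bands is a list of (FRL, FRH) pairs with inclusive bin indices.
    """
    energies = [band_energy(e1, lo, hi) for lo, hi in bands]
    best = max(range(len(bands)), key=energies.__getitem__)
    return best + 1, energies


if __name__ == "__main__":
    e1 = [1, 0, 0, 2, 3, 0, 1, 0]
    bands = [(0, 3), (2, 5), (4, 7)]  # overlapping bands, as in FIG.4
    print(select_band(e1, bands))
```

In this example band 2 has the greatest energy, so the first position information would be "2".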
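Meanwhile, normalized energy NER(i), normalized based on the bandwidth as in following equation 2, may be calculated instead of the energy of the first layer error transform coefficients.
\[ NER(i) = \frac{1}{FRH(i) - FRL(i) + 1} \sum_{k=FRL(i)}^{FRH(i)} e_1(k)^2 \qquad \text{(Equation 2)} \]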
Further, as the reference to specify the band, instead of the energy of the first layer error transform coefficients, the energy WER(i) of the first layer error transform coefficients and the normalized energy WNER(i) (normalized based on the bandwidth), to which weight is applied taking into account the characteristics of human perception, may be found according to equations 3 and 4. Here, w(k) represents weight related to the characteristics of human perception.
\[ WER(i) = \sum_{k=FRL(i)}^{FRH(i)} w(k)\, e_1(k)^2 \qquad \text{(Equation 3)} \]
\[ WNER(i) = \frac{1}{FRH(i) - FRL(i) + 1} \sum_{k=FRL(i)}^{FRH(i)} w(k)\, e_1(k)^2 \qquad \text{(Equation 4)} \]
In this case, first position specifying section 201 increases weight for the frequency of high importance in the perceptual characteristics such that the band including this frequency is likely to be selected, and decreases weight for the frequency of low importance such that the band including this frequency is not likely to be selected. By this means, a perceptually important band is preferentially selected, so that it is possible to provide a similar advantage of improving sound quality as described above. Weight may be calculated and used utilizing, for example, human perceptual loudness characteristics or perceptual masking threshold calculated based on an input signal or first layer decoded signal.
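How weighting can change which band is selected may be sketched in Python as follows. The sketch assumes that the weighted energy is the sum of w(k) times the squared error coefficient over the band, and that the normalized variant divides this sum by the number of bins in the band; the weight values are arbitrary illustrative numbers, not values taken from a perceptual model.

```python
def weighted_energy(e1, w, frl, frh, normalize=False):
    """Weighted energy over bins FRL..FRH (inclusive).

    With normalize=True the sum is divided by the number of bins in the band.
    """
    total = sum(w[k] * e1[k] ** 2 for k in range(frl, frh + 1))
    return total / (frh - frl + 1) if normalize else total


if __name__ == "__main__":
    e1 = [1, 2, 2, 1, 3, 3]
    flat = [1, 1, 1, 1, 1, 1]        # no perceptual weighting
    low_heavy = [4, 4, 4, 1, 1, 1]   # hypothetical weights favoring low bins
    for w in (flat, low_heavy):
        print([weighted_energy(e1, w, lo, hi) for lo, hi in [(0, 2), (3, 5)]])
```

With flat weights the upper band has the greater energy, whereas the hypothetical low-frequency weighting makes the lower band the one that is selected.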
Further, the band selecting method may select a band from bands arranged in a low frequency band having a lower frequency than the reference frequency (Fx) which is set in advance. With the example of FIG.5, a band is selected from band 1 to band 8. The reason for setting this limitation (i.e. reference frequency) upon selection of bands is as follows. With a harmonic structure (or harmonics structure), which is one characteristic of a speech signal (i.e. a structure in which peaks appear in a spectrum at given frequency intervals), greater peaks appear in a low frequency band than in a high frequency band, and, similarly, peaks in a quantization error (i.e. error spectrum or error transform coefficients) produced in encoding processing appear more sharply in a low frequency band than in a high frequency band. Therefore, even when the energy of an error spectrum (i.e. error transform coefficients) in a low frequency band is lower than in a high frequency band, peaks in the error spectrum in the low frequency band appear more sharply than in the high frequency band, and, therefore, the error spectrum in the low frequency band is likely to exceed a perceptual masking threshold (i.e. threshold at which people can perceive sound), causing deterioration in perceptual sound quality.
This method sets the reference frequency in advance to determine the target frequency from a low frequency band in which peaks of error coefficients (or error vectors) appear more sharply than in a high frequency band having a higher frequency than the reference frequency (Fx), so that it is possible to suppress peaks of the error transform coefficients and improve sound quality.
Further, with the band selecting method, the band may be selected from bands arranged in the low and middle frequency bands. With the example in FIG.4, band 3 is excluded from the selection candidates and the band is selected from band 1 and band 2. By this means, the target frequency band is determined from the low and middle frequency bands.
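Limiting the selection candidates can be sketched in Python as follows. The sketch assumes that a band counts as a candidate only when it lies entirely below a reference bin `fx`, which is one possible reading of the limitation described above, and it uses the plain sum of squared coefficients as the error measure; the function names are hypothetical.

```python
def candidates_below(bands, fx):
    """Keep (1-based id, FRL, FRH) for bands lying entirely below bin fx."""
    return [(i + 1, lo, hi) for i, (lo, hi) in enumerate(bands) if hi < fx]


def select_limited(e1, bands, fx):
    """Return the id of the candidate band with the greatest energy, or None."""
    best_id, best_energy = None, -1.0
    for band_id, lo, hi in candidates_below(bands, fx):
        energy = sum(e1[k] ** 2 for k in range(lo, hi + 1))
        if energy > best_energy:
            best_id, best_energy = band_id, energy
    return best_id


if __name__ == "__main__":
    e1 = [1, 0, 0, 1, 2, 0, 5, 5]
    bands = [(0, 3), (2, 5), (4, 7)]
    print(select_limited(e1, bands, fx=6))
```

In this example band 3 holds by far the greatest error energy, but it is excluded from the candidates, so band 2 is selected.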
Hereinafter, as first position information, first position specifying section 201 outputs "1" when band 1 is specified, "2" when band 2 is specified and "3" when band 3 is specified.
FIG.6 shows the position of the target frequency band specified in second position specifying section 202 shown in FIG.3.
Second position specifying section 202 specifies the target frequency band in the band specified in first position specifying section 201 based on narrower step sizes, and outputs position information of the target frequency band as second position information, to encoding section 203 and multiplexing section 204.
Next, the method of specifying the target frequency band in second position specifying section 202 will be explained. Here, an example is used where the first position information outputted from first position specifying section 201 shown in FIG.3 is "2," and the width of the target frequency band is represented as "BW." Further, the lowest frequency F2 in band 2 is set as the base point, and this lowest frequency F2 is represented as G_1 for ease of explanation. Then, the lowest frequencies of the target frequency bands that can be specified in second position specifying section 202 are set to G_2 to G_N. Further, the step sizes of the target frequency bands that are specified in second position specifying section 202 are G_n - G_{n-1}, and the step sizes of the bands that are specified in first position specifying section 201 are F_n - F_{n-1} (G_n - G_{n-1}