“Audio Encoder And Decoder, Methods For Encoding And Decoding An Audio Signal, Audio Stream”
Abstract:
An encoder for providing an audio stream on the basis of a
transform-domain representation of an input audio signal
comprises a quantization error calculator configured to
determine a multi-band quantization error over a plurality
of frequency bands of the input audio signal for which
separate band gain information is available. The encoder
also comprises an audio stream provider configured to
provide the audio stream such that the audio stream
comprises an information describing an audio content of the
frequency bands and an information describing the multi-
band quantization error.
A decoder for providing a decoded representation of an
audio signal on the basis of an encoded audio stream
representing spectral components of frequency bands of the
audio signal comprises a noise filler configured to
introduce noise into spectral components of a plurality of
frequency bands to which separate frequency band gain
information is associated on the basis of a common
multi-band noise intensity value.
Audio Encoder, Audio Decoder, Methods for Encoding and
Decoding an Audio Signal, Audio Stream and Computer Program
Background of the Invention
Embodiments according to the invention are related to an
encoder for providing an audio stream on the basis of a
transform-domain representation of an input audio signal.
Further embodiments according to the invention are related
to a decoder for providing a decoded representation of an
audio signal on the basis of an encoded audio stream.
Further embodiments according to the invention provide
methods for encoding an audio signal and for decoding an
audio signal. Further embodiments according to the
invention provide an audio stream. Further embodiments
according to the invention provide computer programs for
encoding an audio signal and for decoding an audio signal.
Generally speaking, embodiments according to the invention
are related to a noise filling.
Audio coding concepts often encode an audio signal in the
frequency domain. For example, the so-called "advanced
audio coding" (AAC) concept encodes the contents of
different spectral bins (or frequency bins), taking into
consideration a psychoacoustic model. For this purpose,
intensity information for different spectral bins is
encoded. However, the resolution used for encoding
intensities in different spectral bins is adapted in
accordance with the psychoacoustic relevances of the
different spectral bins. Thus, some spectral bins, which
are considered as being of low psychoacoustic relevance,
are encoded with a very low intensity resolution, such that
some of the spectral bins considered to be of low
psychoacoustic relevance, or even a dominant number
thereof, are quantized to zero. Quantizing the intensity of
a spectral bin to zero brings along the advantage that the
quantized zero-value can be encoded in a very bit-saving
manner, which helps to keep the bit rate as small as
possible. Nevertheless, spectral bins quantized to zero
sometimes result in audible artifacts, even if the
psychoacoustic model indicates that the spectral bins are
of low psychoacoustic relevance.
Therefore, there is a desire to deal with spectral bins
quantized to zero, both in an audio encoder and an audio
decoder.
Different approaches are known for dealing with spectral
bins encoded to zero in transform-domain audio coding
systems and also in speech coders.
For example, the MPEG-4 "AAC" (advanced audio coding) uses
the concept of perceptual noise substitution (PNS). The
perceptual noise substitution only fills complete scale
factor bands with noise. Details regarding the MPEG-4 AAC
may, for example, be found in the International Standard
ISO/IEC 14496-3 (Information Technology - Coding of Audio-
Visual Objects - Part 3: Audio). Furthermore, the AMR-WB+
speech coder replaces vector quantization vectors (VQ
vectors) quantized to zero with a random noise vector,
where each complex spectral value has a constant amplitude,
but a random phase. The amplitude is controlled by one
noise value transmitted with the bitstream. Details
regarding the AMR-WB+ speech coder may, for example, be
found in the technical specification entitled "Third
Generation Partnership Project; Technical Specification
Group Services and System Aspects; Audio Codec Processing
Functions; Extended Adaptive Multi-Rate-Wide Band (AMR-WB+)
Codec; Transcoding Functions (Release Six)", which is also
known as "3GPP TS 26.290 V6.3.0 (2005-06) - Technical
Specification".
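As an illustration of the AMR-WB+ substitution described above, the following is a minimal Python sketch, not the 3GPP reference code. It assumes that a VQ vector is a list of complex spectral values and that the single transmitted noise value is available as `noise_level`; the helper names `random_phase_noise_vector` and `substitute_zero_vectors` are hypothetical.

```python
import cmath
import math
import random


def random_phase_noise_vector(length, noise_level, rng):
    """Return `length` complex values of constant magnitude `noise_level`
    and uniformly random phase (hypothetical helper, not 3GPP code)."""
    return [cmath.rect(noise_level, rng.uniform(-math.pi, math.pi))
            for _ in range(length)]


def substitute_zero_vectors(vectors, noise_level, seed=None):
    """Replace every VQ vector quantized entirely to zero by a noise vector.

    Vectors with at least one non-zero value are copied unchanged;
    `noise_level` plays the role of the single noise value transmitted
    in the bitstream.
    """
    rng = random.Random(seed)
    result = []
    for vec in vectors:
        if all(v == 0 for v in vec):
            result.append(random_phase_noise_vector(len(vec), noise_level, rng))
        else:
            result.append(list(vec))
    return result
```

For example, `substitute_zero_vectors([[0j] * 8, [1 + 1j] + [0j] * 7], 0.5)` replaces only the first vector. The second vector contains zeros but is not entirely zero, so it is kept as is. This mirrors the vector-level (rather than bin-level) granularity of the approach.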
Further, EP 1 395 980 B1 describes an audio coding concept.
The publication describes a means by which selected
frequency bands of information from an original audio
signal, which are audible but perceptually less relevant,
need not be encoded, but may be replaced by a noise filling
parameter. Those signal bands whose content is perceptually
more relevant are, in contrast, fully encoded. Encoding
bits are saved in this
manner without leaving voids in the frequency spectrum of
the received signal. The noise filling parameter is a
measure of the RMS signal value within the band in question
and is used at the reception end by a decoding algorithm to
indicate the amount of noise to inject in the frequency
band in question.
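The RMS-based noise filling parameter can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of EP 1 395 980 B1 itself. The encoder computes the RMS of a band that is not encoded. The decoder generates noise and normalises it to that RMS. The use of uniform noise and exact normalisation are assumptions, and `noise_filling_parameter` and `inject_noise` are hypothetical names.

```python
import math
import random


def noise_filling_parameter(band):
    """Encoder side: RMS value of the original spectral values of a band."""
    return math.sqrt(sum(v * v for v in band) / len(band))


def inject_noise(rms, length, seed=None):
    """Decoder side: `length` noise values whose RMS equals `rms`.

    Uniform noise is normalised to the transmitted RMS; the noise
    distribution and the normalisation are assumptions of this sketch.
    """
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    current = noise_filling_parameter(noise)
    if current == 0.0:
        return [0.0] * length
    return [rms * n / current for n in noise]
```

Because one parameter is sent per omitted band, the side information grows with the number of such bands. This is the bit-rate cost discussed in the following paragraphs.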
Further approaches provide for a non-guided noise insertion
in the decoder, taking into account the tonality of the
transmitted spectrum.
However, the conventional concepts typically bring along
the problem that they either provide only a coarse
granularity of the noise filling, which typically degrades
the hearing impression, or require a comparatively large
amount of noise filling side information, which requires
extra bit rate.
In view of the above, there is the need for an improved
concept of noise filling, which provides for an improved
trade-off between the achievable hearing impression and the
required bit rate.
Summary of the Invention
An embodiment according to the invention creates an encoder
for providing an audio stream on the basis of a transform-
domain representation of an input audio signal. The encoder
comprises a quantization error calculator configured to
determine a multi-band quantization error over a plurality
of frequency bands (for example, over a plurality of scale
factor bands) of the input audio signal, for which separate
band gain information (for example, separate scale factors)
is available. The encoder also comprises an audio stream
provider configured to provide the audio stream such that
the audio stream comprises an information describing an
audio content of the frequency bands and an information
describing the multi-band quantization error.
The above-described encoder is based on the finding that
the usage of a multi-band quantization error information
brings along the possibility to obtain a good hearing
impression on the basis of a comparatively small amount of
side information. In particular, the usage of a multi-band
quantization error information, which covers a plurality of
frequency bands for which separate band gain information is
available, allows for a decoder-sided scaling of noise
values, which are based on the multi-band quantization
error, in dependence on the band gain information.
Accordingly, as the band gain information is typically
correlated with a psychoacoustic relevance of the frequency
bands or with a quantization accuracy applied to the
frequency bands, the multi-band quantization error
information has been identified as a side information,
which allows for a synthesis of filling noise providing a
good hearing impression while keeping the bit rate-cost of
the side information low.
In a preferred embodiment, the encoder comprises a
quantizer configured to quantize spectral components (for
example, spectral coefficients) of different frequency
bands of the transform domain representation using
different quantization accuracies in dependence on
psychoacoustic relevances of the different frequency bands
to obtain quantized spectral components, wherein the
different quantization accuracies are reflected by the band
gain information. Also, the audio stream provider is
configured to provide the audio stream such that the audio
stream comprises an information describing the band gain
information (for example, in the form of scale factors) and
such that the audio stream also comprises the information
describing the multi-band quantization error.
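The band-dependent quantization accuracy can be illustrated with a minimal Python sketch. It assumes an AAC-like mapping in which a band's scale factor `sf` sets a step size of `2 ** (sf / 4)` (1.5 dB per scale-factor step). Real AAC additionally applies a power law before rounding, which is omitted here. The function names `band_step` and `quantize_bands` are hypothetical.

```python
def band_step(scale_factor):
    """Quantizer step size of a band on an assumed AAC-like 1.5 dB grid.

    AAC additionally applies a |x|**0.75 power law before rounding; that is
    omitted here to keep the sketch linear.
    """
    return 2.0 ** (scale_factor / 4.0)


def quantize_bands(bands, scale_factors):
    """Quantize each band with its own accuracy.

    `bands` is a list of lists of spectral coefficients and `scale_factors`
    holds one integer per band (the band gain information). A larger scale
    factor, chosen for a psychoacoustically less relevant band, yields a
    coarser step and hence fewer distinct integer values.
    """
    return [[int(round(c / band_step(sf))) for c in coeffs]
            for coeffs, sf in zip(bands, scale_factors)]
```

For example, the same coefficients `[1.0, 2.6, -3.2]` quantize to `[1, 3, -3]` with scale factor 0 but to `[0, 1, -1]` with scale factor 8. The scale factors themselves must then be transmitted alongside the integers.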
In a preferred embodiment, the quantization error
calculator is configured to determine the quantization
error in the quantized domain, such that a scaling, in
dependence on the band gain information of the spectral
component, which is performed prior to an integer value
quantization, is taken into consideration. By considering
the quantization error in the quantized domain, the
psychoacoustic relevance of the spectral bins is considered
when calculating the multi-band quantization error. For
example, for frequency bands of small perceptual relevance,
the quantization may be coarse, such that the absolute
quantization error (in the non-quantized domain) is large.
In contrast, for spectral bands of high psychoacoustic
relevance, the quantization is fine and the quantization
error, in the non-quantized domain, is small. In order to
make the quantization errors in the frequency bands of high
psychoacoustic relevance and of low psychoacoustic
relevance comparable, such as to obtain a meaningful multi-
band quantization error information, the quantization error
is calculated in the quantized domain (rather than in the
non-quantized domain) in a preferred embodiment.
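The difference between the two domains can be made concrete with a minimal Python sketch. It assumes the same illustrative gain rule as an AAC-like quantizer, step = `2 ** (sf / 4)`, and the function name `quantization_errors` is hypothetical. The error is measured once after scaling, relative to the integer grid, and once converted back to the unscaled spectral domain.

```python
def quantization_errors(coeffs, scale_factor):
    """Per-bin quantization error of one band, in both domains.

    Returns (quantized_domain, original_domain): the error after scaling by
    the band step (i.e. relative to the integer grid), and the same error
    expressed in the unscaled spectral domain. The 2**(sf/4) step is an
    assumed gain rule.
    """
    step = 2.0 ** (scale_factor / 4.0)
    quantized_domain = []
    original_domain = []
    for c in coeffs:
        scaled = c / step
        err = abs(scaled - round(scaled))
        quantized_domain.append(err)
        original_domain.append(err * step)
    return quantized_domain, original_domain
```

A finely quantized band (scale factor 0, coefficient 0.3) and a coarsely quantized band (scale factor 8, coefficient 1.2) both show an error of 0.3 in the quantized domain. In the non-quantized domain, their errors are 0.3 and 1.2. Only the former values are directly comparable across bands.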
In a further preferred embodiment, the encoder is
configured to set a band gain information (for example, a
scale factor) of a frequency band, which is quantized to
zero (for example, in that all spectral bins of the
frequency band are quantized to zero) to a value
representing a ratio between an energy of the frequency
band quantized to zero and an energy of the multi-band
quantization error. By setting a scale factor of a
frequency band which is quantized to zero to a well-defined
value, it is possible to fill the frequency band quantized
to zero with a noise, such that the energy of the noise is
at least approximately equal to the original signal energy
of the frequency band quantized to zero. By adapting the
scale factor in the encoder, a decoder can treat the
frequency band quantized to zero in the same way as any
other frequency bands not quantized to zero, such that
there is no need for a complicated exception handling
(typically requiring an additional signaling). Rather, by
adapting the band gain information (e.g. scale factor), a
combination of the band gain value and the multi-band
quantization error information allows for a convenient
determination of the filling noise.
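Setting the band gain of a band quantized to zero can be sketched as follows, under explicit assumptions. Assume the decoder replaces each of the band's N bins by a value of magnitude e (the multi-band quantization error) and multiplies by a band gain g = `2 ** (sf / 4)`. The filled band then has energy N·(e·g)². Choosing g² as the ratio between the band energy and N·e² reproduces the original energy, up to scale-factor rounding. The function name `noise_band_scale_factor` and the gain rule are hypothetical.

```python
import math


def noise_band_scale_factor(band_coeffs, mean_quant_error):
    """Scale factor for a band whose bins are all quantized to zero.

    Assumes (hypothetically) that the decoder replaces each of the band's N
    bins by a value of magnitude `mean_quant_error` and multiplies by the
    band gain 2**(sf/4). The gain is the square root of the ratio between
    the band energy and the energy of the unscaled filling noise.
    """
    band_energy = sum(c * c for c in band_coeffs)
    noise_energy = len(band_coeffs) * mean_quant_error ** 2
    if band_energy == 0.0 or noise_energy == 0.0:
        return None  # nothing to reproduce, or no noise to scale
    gain = math.sqrt(band_energy / noise_energy)
    return round(4.0 * math.log2(gain))  # snap to the assumed scale-factor grid
```

For a band `[0.5, -0.5, 0.5, -0.5]` (energy 1.0) and e = 0.25, the noise energy is 0.25. The gain is therefore 2 and the scale factor is 4. Filling four bins with 0.25 · 2 = 0.5 restores an energy of 1.0.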
In a preferred embodiment, the quantization error
calculator is configured to determine the multi-band
quantization error over a plurality of frequency bands
comprising at least one frequency component (e.g. frequency
bin) quantized to a non-zero value while avoiding frequency
bands entirely quantized to zero. It has been found that a
multi-band quantization error information is particularly
meaningful if frequency bands entirely quantized to zero
are omitted from the calculation. In frequency bands
entirely quantized to zero, the quantization is typically
very coarse, so that the quantization error information
obtained from such a frequency band is typically not
particularly meaningful. Rather, the quantization error in
the psychoacoustically more relevant frequency bands, which
are not entirely quantized to zero, provides a more
meaningful information, which allows for a noise filling
adapted to the human hearing at the decoder side.
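The following minimal Python sketch computes a multi-band quantization error that leaves out bands entirely quantized to zero. It assumes the illustrative step `2 ** (sf / 4)` and averages the absolute quantized-domain error over all bins of the contributing bands. Both choices, and the name `multiband_quantization_error`, are assumptions for illustration.

```python
def multiband_quantization_error(bands, scale_factors):
    """Mean quantized-domain error over all bins of bands that contain at
    least one non-zero quantized value; bands entirely quantized to zero
    are skipped.

    The 2**(sf/4) step and averaging of absolute errors over all bins of the
    contributing bands are assumptions of this sketch.
    """
    total = 0.0
    count = 0
    for coeffs, sf in zip(bands, scale_factors):
        step = 2.0 ** (sf / 4.0)
        scaled = [c / step for c in coeffs]
        quantized = [round(s) for s in scaled]
        if all(q == 0 for q in quantized):
            continue  # coarse, uninformative band: leave it out
        for s, q in zip(scaled, quantized):
            total += abs(s - q)
            count += 1
    return total / count if count else 0.0
```

With bands `[[0.3, 1.2], [0.1, 0.2]]` and scale factors `[0, 0]`, the second band quantizes entirely to zero and is skipped. The result is (0.3 + 0.2) / 2 = 0.25 rather than 0.2, the value obtained if the all-zero band were included.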
An embodiment according to the invention creates a decoder
for providing a decoded representation of an audio signal
on the basis of an encoded stream representing spectral
components of frequency bands of the audio signal. The
decoder comprises a noise filler configured to introduce
noise into spectral components (for example, spectral line
values or, more generally, spectral bin values) of a
plurality of frequency bands to which separate frequency
band gain information (for example, scale factors) is
associated on the basis of a common multi-band noise
intensity value.
The decoder is based on the finding that a single multi-
band noise intensity value can be applied for a noise
filling with good results if separate frequency band gain
information is associated with the different frequency
bands. Accordingly, an individual scaling of noise
introduced in the different frequency bands is possible on
the basis of the frequency band gain information, such
that, for example, the single common multi-band noise
intensity value provides, when taken in combination with
separate frequency band gain information, sufficient
information to introduce noise in a way adapted to human
psychoacoustics. Thus, the concept described herein allows
a noise filling to be applied in the quantized (but
non-rescaled) domain. The noise added in the decoder can be
scaled with the psychoacoustic relevance of the band
without requiring additional side information (beyond the
side information, which is, anyway, required to scale the
non-noise audio content of the frequency bands in
accordance with the psychoacoustic relevance of the
frequency bands).
In a preferred embodiment, the noise filler is configured
to selectively decide on a per-spectral-bin basis whether
to introduce a noise into individual spectral bins of a
frequency band in dependence on whether the respective
individual spectral bins are quantized to zero or not.
Accordingly, it is possible to obtain a very fine
granularity of the noise filling while keeping the quantity
of required side information very small. Indeed, it is not
required to transmit any frequency-band-specific noise
filling side information, while still having an excellent
granularity with respect to the noise filling. For example,
it is typically required to transmit a band gain factor
(e.g. scale factor) for a frequency band even if only a
single spectral line (or a single spectral bin) of said
frequency band is quantized to a non-zero intensity value.
Thus, it can be said that the scale factor information is
available for noise filling at no extra cost (in terms of
bitrate) if at least one spectral line (or a spectral bin)
of the frequency band is quantized to a non-zero intensity.
However, according to a finding of the present invention,
it is not necessary to transport frequency-band-specific
noise information in order to obtain an appropriate noise
filling in such a frequency band in which at least one non-
zero spectral bin intensity value exists. Rather, it has
been found that psychoacoustically good results can be
obtained by using the multi-band noise intensity value in
combination with the frequency-band-specific frequency band
gain information (e.g. scale factor). Thus, it is not
necessary to waste bits on a frequency-band-specific noise
filling information. Rather, the transmission of a single
multi-band noise intensity value is sufficient, because
this multi-band noise filling information can be combined
with the frequency band gain information transmitted anyway
to obtain frequency-band-specific noise filling information
well adapted to the human hearing expectations.
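The per-bin decision can be illustrated with a minimal Python sketch operating in the quantized, not yet rescaled, domain. Every bin quantized to zero receives the common noise value with a random sign. Non-zero bins are left untouched. The random-sign noise and the name `fill_zero_bins` are assumptions, and no band-specific noise information is used.

```python
import random


def fill_zero_bins(quantized_bands, noise_value, seed=None):
    """Per-bin noise filling in the quantized (not yet rescaled) domain.

    Every bin quantized to zero receives +/- `noise_value` (the common
    multi-band noise intensity value) with a random sign; non-zero bins keep
    their decoded value. No band-specific noise side information is used.
    """
    rng = random.Random(seed)
    return [[float(q) if q != 0 else rng.choice((-noise_value, noise_value))
             for q in band]
            for band in quantized_bands]
```

In `fill_zero_bins([[0, 3, 0], [0, 0]], 0.25)`, the value 3 survives and the four zero bins become ±0.25. The first band shows the fine granularity: noise is inserted between non-zero lines of the same band.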
In another preferred embodiment, the noise filler is
configured to receive a plurality of spectral bin values
representing different overlapping or non-overlapping
frequency portions of a first frequency band of a
frequency domain audio signal representation, and to
receive a plurality of spectral bin values representing
different overlapping or non-overlapping frequency portions
of a second frequency band of the frequency domain audio
signal representation. Further, the noise filler is
configured to replace one or more spectral bin values of
the first frequency band of the plurality of frequency
bands with a first spectral bin noise value, wherein a
magnitude of the first spectral bin noise value is
determined by the multi-band noise intensity value. In
addition, the noise filler is configured to replace one or
more spectral bin values of the second frequency band with
a second spectral bin noise value having the same magnitude
as the first spectral bin noise value. The decoder also
comprises a scaler configured to scale spectral bin values
of the first frequency band with a first frequency band
gain value to obtain scaled spectral bin values of the
first frequency band, and to scale spectral bin values of
the second frequency band with a second frequency band gain
value to obtain scaled spectral bin values of the second
frequency band, such that the replaced spectral bin values,
replaced with the first and second spectral bin noise
values, are scaled with different frequency band gain
values, and such that the spectral bin value replaced with
the first spectral bin noise value and the un-replaced
spectral bin values of the first frequency band, which
represent an audio content of the first frequency band, are
scaled with the first frequency band gain value, and such
that the spectral bin value replaced with the second
spectral bin noise value and the un-replaced spectral bin
values of the second frequency band, which represent an
audio content of the second frequency band, are scaled with
the second frequency band gain value.
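The combined noise filler and scaler can be sketched in Python as follows. This is a minimal sketch assuming an AAC-like gain rule `2 ** (sf / 4)`, whereas a real dequantizer also undoes a power law. Zero bins in every band receive a noise value of the same magnitude. All bins of a band, whether noise or audio content, are then scaled with that band's gain. The function name `decode_bands` is hypothetical.

```python
import random


def decode_bands(quantized_bands, scale_factors, noise_value, seed=None):
    """Noise-fill zero bins with the common noise value, then scale every bin
    of a band, noise and audio content alike, with that band's gain.

    The gain rule 2**(sf/4) is an assumed AAC-like mapping from scale factor
    to band gain; the real dequantizer also undoes a power law.
    """
    rng = random.Random(seed)
    decoded = []
    for band, sf in zip(quantized_bands, scale_factors):
        gain = 2.0 ** (sf / 4.0)
        filled = [q if q != 0 else rng.choice((-noise_value, noise_value))
                  for q in band]
        decoded.append([gain * v for v in filled])
    return decoded
```

With `decode_bands([[0, 2], [0, 0]], [0, 8], 0.25)`, both bands start from noise of magnitude 0.25. After scaling, the noise has magnitude 0.25 in the first band and 1.0 in the second, while the audio value 2 in the first band is scaled by the same gain of 1 as its neighbouring noise bin.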
In an embodiment according to the invention, the noise
filler is optionally configured to selectively modify a
frequency band gain value of a given frequency band using a
noise offset value if the given frequency band is quantized
to zero. The noise offset serves to minimize the number of
side information bits. Regarding
this minimization, it should be noted that the encoding of
the scale factors (scf) in an AAC audio coder is performed
using a Huffman encoding of the differences between subsequent
scale factors (scf). Small differences obtain the shortest
codes (while larger differences obtain longer codes). The
noise offset minimizes the "mean difference" at a
transition from conventional scale factors (scale factors
of bands not quantized to zero) to noise scale factors and
back, and thus optimizes the bit demand for the side
information. This is due to the fact that normally the
"noise scale factors" are larger than the conventional
scale factors, as the included lines are not >= 1, but
correspond to the mean quantization error e (wherein
typically 0