Audio Signal Decoder, Audio Signal Encoder, Methods Using A Sampling Rate Dependent Time Warp Contour Encoding
Abstract:
An audio signal decoder configured to provide a decoded audio signal representation on the basis of an encoded audio signal representation comprising a sampling frequency information, an encoded time warp information and an encoded spectrum representation comprises a time warp calculator and a warp decoder. The time warp calculator is configured to adapt a mapping rule for mapping codewords of the encoded time warp information onto decoded time warp values describing the decoded time warp information in dependence on the sampling frequency information. The warp decoder is configured to provide the decoded audio signal representation on the basis of the encoded spectrum representation and in dependence on the decoded time warp information.
C/O Apollo Building,
3E Herikerbergweg 1-35,
1101 CN Amsterdam Zuid-Oost,
The Netherlands
Inventors
1. BAYER, Stefan
Dortmunder Str. 14,
90425 Nürnberg,
GERMANY
2. BÄCKSTRÖM, Tom
Bauerngasse 8-12,
90443 Nürnberg
GERMANY
3. GEIGER, Ralf
Jakob-Herz-Weg 36,
91052 Erlangen,
GERMANY
4. EDLER, Bernd
Hemelingstr. 10,
30419 Hannover,
GERMANY
5. DISCH, Sascha
Wilhelmstrasse 70,
90766 Fürth,
GERMANY
6. VILLEMOES, Lars
Mandolinvaegen 22,
S-175 56 Jaerfaella,
SWEDEN
Specification
Audio Signal Decoder, Audio Signal Encoder, Methods and Computer Program
Using a Sampling Rate Dependent Time-Warp Contour Encoding
Background of the Invention
Embodiments according to the invention are related to an audio signal decoder. Further
embodiments according to the invention are related to an audio signal encoder. Further
embodiments according to the invention are related to a method for decoding an audio
signal, to a method for encoding an audio signal and to a computer program.
Some embodiments according to the invention are related to a sampling frequency
dependent pitch variation quantization.
In the following, a brief introduction will be given into the field of time-warped audio
encoding, concepts of which can be applied in conjunction with some of the embodiments
of the invention.
In recent years, techniques have been developed to transform an audio signal to a
frequency-domain representation, and to efficiently encode the frequency-domain
representation, for example, by taking into account perceptual masking thresholds. This
concept of audio signal encoding is particularly efficient if the block length, for which a set
of encoded spectral coefficients is transmitted, is long, and if only a comparatively small
number of spectral coefficients are well above the global masking threshold while a large
number of spectral coefficients are near or below the global masking threshold and can
thus be neglected (or coded with minimum code length). A spectrum in which said
condition holds is sometimes called a sparse spectrum.
For example, cosine-based or sine-based modulated lapped transforms are often used in
applications for source coding due to their energy compaction properties. That is, for
harmonic tones with constant fundamental frequencies (pitch), they concentrate the signal
energy to a low number of spectral components (sub-bands), which leads to an efficient
signal representation.
Generally, the (fundamental) pitch of a signal shall be understood to be the lowest
dominant frequency distinguishable from the spectrum of the signal. In the common
speech model, the pitch is the frequency of the excitation signal modulated by the human
throat. If only a single fundamental frequency were present, the spectrum would be
extremely simple, comprising the fundamental frequency and the overtones only. Such a
spectrum could be encoded highly efficiently. For signals with varying pitch, however, the
energy corresponding to each harmonic component is spread over several transform
coefficients, thus leading to a reduction of coding efficiency.
In order to overcome the reduction of coding efficiency, the audio signal to be encoded is
effectively resampled on a non-uniform temporal grid. In the subsequent processing, the
sample positions obtained by the non-uniform resampling are processed as if they
represented values on a uniform temporal grid. This operation is commonly denoted by the
phrase "time warping". The sample times may be advantageously chosen in dependence on
the temporal variation of the pitch, such that a pitch variation in the time warped version of
the audio signal is smaller than a pitch variation in the original version of the audio signal
(before time warping). After time warping of the audio signal, the time-warped version of
the audio signal is converted into the frequency-domain. The pitch-dependent time warping
has the effect that the frequency-domain representation of the time-warped audio signal
typically exhibits an energy compaction into a much smaller number of spectral
components than a frequency-domain representation of the original (non-time-warped)
audio signal.
At the decoder side the frequency-domain representation of the time-warped audio signal is
converted to the time-domain, such that a time-domain representation of the time-warped
audio signal is available at the decoder side. However, in the time-domain representation
of the decoder-sided reconstructed time-warped audio signal, the original pitch variations
of the encoder-sided input audio signal are not included. Accordingly, yet another time
warping by resampling of the decoder-sided reconstructed time-domain representation of
the time-warped audio signal is applied.
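The effect of the encoder-sided time warping can be illustrated with a minimal sketch. It assumes an analytically defined test tone whose pitch rises at a constant rate in octaves per second; the constants F0, RATE, N and DURATION and all function names are illustrative assumptions and are not taken from the specification. The tone is evaluated once on a uniform temporal grid and once on a non-uniform grid chosen such that the phase advances by equal steps between samples.

```python
import math

F0 = 100.0    # starting pitch in Hz (illustrative assumption)
RATE = 0.5    # pitch rise in octaves per second (illustrative assumption)

def phase_cycles(t):
    """Phase, in cycles, of a tone with instantaneous pitch F0 * 2**(RATE * t)."""
    return F0 * (2.0 ** (RATE * t) - 1.0) / (RATE * math.log(2.0))

def time_at_phase(p):
    """Inverse of phase_cycles: the time at which the phase equals p cycles."""
    return math.log2(1.0 + p * RATE * math.log(2.0) / F0) / RATE

def tone(t):
    return math.sin(2.0 * math.pi * phase_cycles(t))

def crossing_gaps(x):
    """Sample counts between successive positive-going zero crossings."""
    idx = [i for i in range(1, len(x)) if x[i - 1] < 0.0 <= x[i]]
    return [b - a for a, b in zip(idx, idx[1:])]

N, DURATION = 2048, 0.1
total = phase_cycles(DURATION)

# Uniform temporal grid: the pitch variation is still present in the samples.
uniform = [tone(DURATION * k / N) for k in range(N)]

# Non-uniform grid: sample times follow the pitch, then the samples are
# treated as if they were uniformly spaced.
warped = [tone(time_at_phase(total * k / N)) for k in range(N)]

gaps_uniform = crossing_gaps(uniform)  # period in samples shrinks over time
gaps_warped = crossing_gaps(warped)    # period in samples stays constant
```

In the warped sequence the period measured in samples is constant to within one sample, which is the property that leads to energy compaction in the subsequent transform. In a real codec the input is only available as samples, so values at the non-uniform positions would have to be obtained by interpolation, a step this sketch avoids by evaluating the tone analytically.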
In order to obtain a good reconstruction of the encoder-sided input audio signal at the
decoder, it is desirable that the decoder-sided time warping is at least approximately the
inverse operation with respect to the encoder-sided time warping. In order to obtain an
appropriate time warping, it is desirable to have information available at the decoder,
which allows for an adjustment of the decoder-sided time warping.
As it is typically required to transfer such information from the audio signal encoder to
the audio signal decoder, it is desirable to keep the bitrate required for this transmission
small while still allowing for a reliable reconstruction of the required time warp
information at the decoder side.
In view of this situation, there is a desire to have a concept which allows for a reliable
reconstruction of a time-warp information on the basis of an efficiently encoded
representation of the time-warp information.
Summary of the Invention
An embodiment according to the invention creates an audio decoder configured to provide
a decoded audio signal representation on the basis of an encoded audio signal
representation comprising a sampling frequency information, an encoded time warp
information and an encoded spectrum representation. The audio signal decoder comprises a
time warp calculator (which may, for example, take the function of a time warp decoder)
and a warp decoder. The time warp calculator is configured to map the encoded time warp
information onto a decoded time warp information. The time warp calculator is configured
to adapt a mapping rule for mapping codewords of the encoded time warp information onto
decoded time warp values describing the decoded time warp information in dependence on
the sampling frequency information. The warp decoder is configured to provide the
decoded audio signal representation on the basis of the encoded spectrum representation
and in dependence on the decoded time warp information.
This embodiment according to the invention is based on the finding that a time warp
(which is, for example, described by a time warp contour) can be efficiently encoded if the
mapping rule for mapping codewords of the encoded time warp information onto decoded
time warp values is adapted to the sampling rate because it has been found that it is
desirable to represent a larger time warp per sample for lower sampling frequencies than
for higher sampling frequencies. It has been found that this desire arises from the fact that
it is advantageous if a time warp per time unit, which is representable by the set of
codewords of the encoded time warp information, is approximately independent from the
sampling frequency, which translates into the consequence that a time warp representable
by a given set of codewords should be larger for smaller sampling frequencies than for
higher sampling frequencies under the assumption that the number of time warp codewords
per audio sample (or per audio frame) remains at least approximately constant independent
from the actual sampling frequency.
To summarize, it has been found that it is advantageous to adapt the mapping rule for
mapping codewords of the encoded time warp information (also briefly designated as time
warp codewords) onto decoded time warp values in dependence on the sampling frequency
of the encoded audio signal (represented by the encoded audio signal representation),
because this allows the relevant time warp values to be represented using a small (and
consequently bitrate-efficient) set of time warp codewords both for the case of a
comparatively high sampling frequency and for the case of a comparatively low sampling
frequency.
By adapting the mapping rule, it is possible to encode a comparatively smaller range of
time warp values using a higher resolution for a comparatively high sampling frequency,
and to encode a comparatively larger range of time warp values with a coarser resolution
for a comparatively small sampling frequency, which in turn brings along a very good
bitrate efficiency.
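The relation between the time warp per codeword and the time warp per time unit can be made concrete with a short calculation. The following is a sketch only: the frame length, the number of codewords per frame and the maximum pitch change per codeword are assumed values chosen for illustration, and the function name is hypothetical.

```python
def max_warp_oct_per_s(max_oct_per_codeword, codewords_per_frame, frame_len, fs_hz):
    """Largest pitch change per second that a codeword set can signal."""
    frame_duration_s = frame_len / fs_hz
    return max_oct_per_codeword * codewords_per_frame / frame_duration_s

FRAME_LEN = 1024           # samples per audio frame (assumed)
CODEWORDS_PER_FRAME = 16   # independent of the sampling frequency (assumed)
MAX_OCT = 0.01             # largest pitch change per codeword at 48 kHz (assumed)

# Same mapping rule at both sampling frequencies: at 24 kHz a frame lasts
# twice as long, so the representable warp per second is halved.
fixed_48k = max_warp_oct_per_s(MAX_OCT, CODEWORDS_PER_FRAME, FRAME_LEN, 48000)
fixed_24k = max_warp_oct_per_s(MAX_OCT, CODEWORDS_PER_FRAME, FRAME_LEN, 24000)

# Mapping rule adapted to the sampling frequency: the range per codeword is
# enlarged by the ratio of the sampling frequencies.
adapted_24k = max_warp_oct_per_s(MAX_OCT * 48000 / 24000,
                                 CODEWORDS_PER_FRAME, FRAME_LEN, 24000)
```

Under these assumptions the adapted mapping rule restores the same representable warp per second at 24 kHz as at 48 kHz, at the price of a coarser step between neighboring codewords.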
In a preferred embodiment, the codewords of the encoded time warp information describe
a temporal evolution of a time warp contour. The time warp calculator is preferably
configured to evaluate a predetermined number of codewords of the encoded time warp
information for an audio frame of an encoded audio signal represented by the encoded
audio signal representation. The predetermined number of codewords is independent of a
sampling frequency of the encoded audio signal. Accordingly, it can be achieved that a
bitstream format remains substantially independent of the sampling frequency while it is
still possible to efficiently encode the time warp. By using a predetermined number of time
warp codewords for an audio frame of the encoded audio signal, wherein the
predetermined number is preferably independent of the sampling frequency of the encoded
audio signal, the bitstream format does not change with the sampling frequency and the
bitstream parser of an audio decoder does not need to be adjusted to the sampling
frequency. However, an efficient encoding of the time warp is still achieved by the
adaptation of the mapping rule for mapping codewords of the encoded time warp
information onto decoded time warp values, because the mapping of the time warp
codewords onto decoded time warp values can be adapted to the sampling frequency such
that a representable range of time warp values brings along a good compromise between
resolution and maximum encodeable time warp for different sampling frequencies.
In a preferred embodiment, the time warp calculator is configured to adapt the mapping
rule such that a range of decoded time warp values onto which codewords of a given set of
codewords of the encoded time warp information are mapped, is larger for a first sampling
frequency than for a second sampling frequency provided the first sampling frequency is
smaller than the second sampling frequency. Accordingly, the same codewords, which
encode a comparatively smaller range of time warp values for a comparatively high
sampling frequency encode a comparatively larger range of time warp values for a
comparatively smaller sampling frequency. Thus, it can be ensured that it is possible to
encode approximately the same time warp per time unit (defined, for example, in octaves
per second, briefly designated with "oct/s") for a high sampling frequency and a low
sampling frequency, even though more time warp codewords are transmitted per time unit
for a comparatively higher sampling frequency than for a comparatively lower sampling
frequency.
In a preferred embodiment, the decoded time warp values are time warp contour values
representing values of a time warp contour or time warp contour variation values
representing a change of values of a time warp contour.
In a preferred embodiment, the time warp calculator is configured to adapt the mapping
rule such that a maximum change of pitch over a given number of samples, which is
representable by a given set of codewords of the encoded time warp information, is larger
for a first sampling frequency than for a second sampling frequency provided the first
sampling frequency is smaller than the second sampling frequency. Accordingly, the same
set of codewords is used for describing different ranges of decoded time warp values,
which is very well-adapted to the different sampling frequencies.
In a preferred embodiment, the time warp calculator is configured to adapt the mapping
rule such that a maximum change of pitch over a given time period, which is representable
by a given set of codewords of the encoded time warp information at a first sampling
frequency, differs from a maximum change of pitch over the given time period, which is
representable by the given set of codewords of the encoded time warp information at a
second sampling frequency, by no more than 10% for a first sampling frequency and a
second sampling frequency differing by at least 30%. Accordingly, the fact that a given set
of codewords would conventionally represent a significantly different time warp per time
unit for different sampling frequencies is avoided, in accordance with the present
invention, by the adaptation of the mapping rule. Thus, a number of different codewords
can be kept reasonably small, which results in a good coding efficiency, wherein the
resolution for the encoding of the time warp is nevertheless adapted to the sampling
frequency.
In a preferred embodiment, the time warp calculator is configured to use different mapping
tables for mapping codewords of the encoded time warp information onto decoded time
warp values in dependence on the sampling frequency information. By providing different
mapping tables, the decoding mechanism can be kept very simple at the expense of the
memory requirements.
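A table-based variant might look as follows. This is a sketch under stated assumptions: the two tables, their entries, the 3-bit codeword width and the 32 kHz switching threshold are all hypothetical and are not the values of the specification.

```python
# Hypothetical tables: 3-bit codeword index -> relative pitch change.
FINE_TABLE = (0.985, 0.990, 0.995, 1.000, 1.005, 1.010, 1.015, 1.020)
COARSE_TABLE = (0.970, 0.980, 0.990, 1.000, 1.010, 1.020, 1.030, 1.040)

def select_table(fs_hz):
    """Choose a mapping table from the signalled sampling frequency.

    The 32 kHz threshold is an assumption made for this sketch.
    """
    return FINE_TABLE if fs_hz >= 32000 else COARSE_TABLE

def decode_codewords(codewords, fs_hz):
    """Map time warp codewords onto decoded time warp values."""
    table = select_table(fs_hz)
    return [table[c] for c in codewords]

high_rate = decode_codewords([3, 7], 48000)  # fine resolution, small range
low_rate = decode_codewords([3, 7], 16000)   # coarse resolution, large range
```

The decoding step itself is a single table lookup; the cost is that one complete table has to be stored for every supported class of sampling frequencies.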
In another preferred embodiment, the time warp calculator is configured to adapt a
(reference) mapping rule, which describes decoded time warp values associated with
different codewords of the encoded time warp information for a reference sampling
frequency, to an actual sampling frequency different from the reference sampling
frequency. Accordingly, a memory demand can be kept small because it is only necessary
to store the mapping values (i.e. decoded time warp values) associated with a set of
different codewords for a single reference sampling frequency. It has been found that it is
possible with small computational effort to adapt the mapping values to a different
sampling frequency.
In a preferred embodiment, the time warp calculator is configured to scale a portion of the
mapping values, which portion describes a time warp, in dependence on a ratio between
the actual sampling frequency and the reference sampling frequency. It has been found that
such a linear scaling of a portion of the mapping values constitutes a particularly efficient
solution for obtaining the mapping values for different sampling frequencies.
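The scaling of a reference mapping rule can be sketched as follows. Two assumptions are made here and should be read as such: the portion of a mapping value that describes the time warp is taken to be its deviation from 1.0, and that deviation is multiplied by the reference sampling frequency divided by the actual sampling frequency, so that lower sampling frequencies obtain a larger range. The reference table and the reference sampling frequency are hypothetical.

```python
REF_FS_HZ = 48000.0  # hypothetical reference sampling frequency
# Hypothetical reference table: codeword index -> relative pitch change.
REF_TABLE = (0.985, 0.990, 0.995, 1.000, 1.005, 1.010, 1.015, 1.020)

def adapt_table(fs_hz, ref_table=REF_TABLE, ref_fs_hz=REF_FS_HZ):
    """Derive the mapping values for fs_hz from the reference table.

    Only the deviation from 1.0 (assumed to be the portion describing the
    time warp) is scaled; the "no warp" value 1.0 is left unchanged.
    """
    scale = ref_fs_hz / fs_hz
    return [1.0 + (v - 1.0) * scale for v in ref_table]

table_48k = adapt_table(48000.0)  # reproduces the reference table
table_24k = adapt_table(24000.0)  # deviations from 1.0 are doubled
```

Only one table is stored, and the adaptation costs one multiplication and one addition per table entry, which may be performed once when the sampling frequency information is read.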
In a preferred embodiment, the decoded time warp values describe a variation of a time
warp contour over a predetermined number of samples of the encoded audio signal
represented by the encoded audio signal representation. In this case, the time warp
calculator is preferably configured to combine a plurality of decoded time warp values
which represent a variation of the time warp contour, to derive a warp contour node value,
such that a deviation of the derived warp node value from a reference warp node value is
larger than a deviation representable by a single one of the decoded time warp values. By
combining a plurality of decoded time warp values, it is possible to keep the range
required for an individual time warp value sufficiently small. This increases the coding
efficiency of the time warp values. At the same time, it is possible to adjust the range of
representable time warps by adapting the mapping rule.
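The combination of several decoded time warp values into warp contour node values may be sketched as below. The sketch assumes that each decoded value is a relative (multiplicative) change with respect to the preceding node and that the reference node value is 1.0; both the accumulation rule and the example values are assumptions for illustration.

```python
def warp_node_values(ratios, reference=1.0):
    """Accumulate relative changes into warp contour node values."""
    nodes = [reference]
    for r in ratios:
        nodes.append(nodes[-1] * r)
    return nodes

# Four consecutive decoded values, each describing a change of 2 percent.
ratios = [1.02, 1.02, 1.02, 1.02]
nodes = warp_node_values(ratios)

single_step_deviation = ratios[0] - 1.0  # what one decoded value can express
accumulated_deviation = nodes[-1] - 1.0  # deviation of the last node
```

The last node deviates from the reference by roughly 8 percent although no single decoded value exceeds 2 percent, which is why the range of an individual value can remain small.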
In a preferred embodiment, the encoded time warp values describe a relative change of the
time warp contour over a predetermined number of samples of the encoded audio signal
represented by the encoded audio signal representation. In this case, the time warp
calculator is configured to derive the decoded time warp information from the decoded
time warp values, such that the decoded time warp information describes the time warp
contour. A combination of a use of time warp values, which describe a relative change of
the time warp contour over a predetermined number of samples of the encoded audio
signal, with an adaptation of a mapping rule for mapping codewords of the encoded time
warp information onto decoded time warp values brings along a high coding efficiency,
because it can be ensured that a substantially identical, or at least similar range of time
warp (in terms of oct/s) can be encoded for different sampling frequencies, even though the
number of time warp codewords per sample of the encoded audio signal can be kept
constant in the case of a change of the sampling frequency.
In a preferred embodiment, the time warp calculator is configured to compute supporting
points of a time warp contour on the basis of the decoded time warp values. In this case,
the time warp calculator is configured to interpolate between the supporting points to
obtain the time warp contour as the decoded time warp information. In this case, a number
of decoded time warp values per audio frame is predetermined and independent from the
sampling frequency. Accordingly, the interpolation scheme between the supporting points
may be left unchanged, which helps to keep the computational complexity small.
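An interpolation between supporting points could, for example, be linear. The following sketch assumes equally spaced supporting points, a frame length that is an integer multiple of the number of intervals, and a contour that ends just before the last supporting point; the function name and the example values are hypothetical.

```python
def interpolate_contour(nodes, frame_len):
    """Linearly interpolate between equally spaced supporting points.

    Returns frame_len contour values; the last supporting point itself is
    not emitted because it is assumed to start the following portion.
    """
    intervals = len(nodes) - 1
    if intervals < 1 or frame_len % intervals != 0:
        raise ValueError("frame_len must be a multiple of the interval count")
    step = frame_len // intervals
    contour = []
    for a, b in zip(nodes, nodes[1:]):
        for j in range(step):
            contour.append(a + (b - a) * j / step)
    return contour

# Three supporting points, eight contour values per frame (toy sizes).
contour = interpolate_contour([1.00, 1.02, 1.02], 8)
```

Because the number of supporting points per frame does not depend on the sampling frequency, the same loop bounds apply for every sampling frequency; only the node values fed into the interpolation change.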
An embodiment according to the invention creates an audio signal encoder for providing
an encoded representation of an audio signal. The audio signal encoder comprises a time
warp contour encoder configured to map time warp values describing a time warp contour
onto an encoded time warp information. The time warp contour encoder is configured to
adapt a mapping rule for mapping the time warp values describing the time warp contour
onto the codewords of the encoded time warp information in dependence on a sampling
frequency of the audio signal. The audio signal encoder also comprises a time warping
signal encoder configured to obtain an encoded representation of a spectrum of the audio
signal, taking into account a time warp described by the time warp contour information. In
this case, the encoded representation of the audio signal comprises the codewords of the
encoded time warp information, the encoded representation of the spectrum and a sampling
frequency information describing the sampling frequency. Said audio encoder is well-suited
for providing the encoded audio signal representation which is used by the above-discussed
audio signal decoder. Moreover, the audio signal encoder brings along the same
advantages which have been discussed above with respect to the audio signal decoder and
is based on the same considerations.
Another embodiment according to the invention creates a method for providing a decoded
audio signal representation on the basis of an encoded audio signal representation.
Another embodiment according to the invention creates a method for providing an encoded
representation of an audio signal.
Another embodiment according to the invention creates a computer program for
implementing one or both of said methods.
Brief Description of the Figures
Embodiments according to the present invention will subsequently be described taking
reference to the enclosed figures in which:
Fig. 1 shows a block schematic diagram of an audio signal encoder, according to
an embodiment of the present invention;
Fig. 2 shows a block schematic diagram of an audio signal decoder, according to
an embodiment of the present invention;
Fig. 3a shows a block schematic diagram of an audio signal encoder, according to
another embodiment of the present invention;
Fig. 3b shows a block schematic diagram of an audio signal decoder, according to
another embodiment of the present invention;
Fig. 4a shows a block schematic diagram of a mapper for mapping an encoded time
warp information onto decoded time warp values, according to an
embodiment of the invention;
Fig. 4b shows a block schematic diagram of a mapper for mapping an encoded time
warp information onto decoded time warp values, according to another
embodiment of the invention;
Fig. 4c shows a table representation of warps of a conventional quantization
scheme;
Fig. 4d shows a table representation of a mapping of codeword indices onto
decoded time warp values for different sampling frequencies, according to
an embodiment of the invention;
Fig. 4e shows a table representation of a mapping of codeword indices onto
decoded time warp values for different sampling frequencies, according to
another embodiment of the invention;
Figs. 5a, 5b show a detailed extract from a block schematic diagram of an audio signal
decoder, according to an embodiment of the invention;
Figs. 6a, 6b show a detailed extract of a flowchart of a mapper for providing a decoded
audio signal representation, according to an embodiment of the invention;
shows a legend of definitions of data elements and help elements, which are
used in an audio decoder according to an embodiment of the invention;
shows a legend of definitions of constants, which are used in an audio
decoder according to an embodiment of the invention;
shows a table representation of a mapping of a codeword index onto a
corresponding decoded time warp value;
shows a pseudo program code representation of an algorithm for
interpolating linearly between equally spaced warp nodes;
shows a pseudo program code representation of a helper function
"warp_time_inv";
shows a pseudo program code representation of a helper function
"warp_inv_vec";
Fig. 11 shows a pseudo program code representation of an algorithm for computing
a sample position vector and a transition length;
Fig. 12 shows a table representation of values of a synthesis window length N
depending on a window sequence and a core coder frame length;
Fig. 13 shows a matrix representation of allowed window sequences;
Fig. 14 shows a pseudo program code representation of an algorithm for windowing
and for an internal overlap-add of a window sequence of type
"EIGHT_SHORT_SEQUENCE";
Fig. 15 shows a pseudo program code representation of an algorithm for the
windowing and the internal overlap-and-add of other window sequences,
which are not of type "EIGHT_SHORT_SEQUENCE";
Fig. 16 shows a pseudo program code representation of an algorithm for
resampling; and
Figs. 17a-17f show representations of syntax elements of the audio stream, according to
an embodiment of the invention.
Detailed Description of the Embodiments
1. Time Warp Audio Signal Encoder According to Fig. 1
Fig. 1 shows a block schematic diagram of a time warp audio signal encoder 100 according
to an embodiment of the invention.
The audio signal encoder 100 is configured to receive an input audio signal 110 and to
provide, on the basis thereof, an encoded representation 112 of the input audio signal 110.
The encoded representation 112 of the input audio signal 110 comprises, for example, an
encoded spectrum representation, an encoded time warp information (which may be
designated, for example, with "tw_data", and which may, for example, comprise
codewords tw_ratio[i]) and a sampling frequency information.
The audio signal encoder may optionally comprise a time warp analyzer 120, which may
be configured to receive the input audio signal 110, to analyze the input audio signal and to
provide a time warp contour information 122, such that the time warp contour information
122 describes, for example, a temporal evolution of the pitch of the audio signal 110.
However, the audio signal encoder 100 may, alternatively, receive a time warp contour
information provided by a time warp analyzer which is external to the audio signal
encoder.
The audio signal encoder 100 also comprises a time warp contour encoder 130, which is
configured to receive the time warp contour information 122, and to provide, on the basis
thereof, the encoded time warp information 132. For example, the time warp contour
encoder 130 may receive time warp values describing the time warp contour. The time
warp values may, for example, describe absolute values of a normalized or non-normalized
time warp contour or relative changes over time of normalized or non-normalized time
warp contour. Generally speaking, the time warp contour encoder 130 is configured to map
time warp values describing the time warp contour 122 onto the encoded time warp
information 132.
The time warp contour encoder 130 is configured to adapt a mapping rule for mapping the
time warp values describing the time warp contour onto codewords of the encoded time
warp information 132 in dependence on a sampling frequency of the audio signal. For this
purpose, the time warp contour encoder 130 may receive a sampling frequency
information, to thereby adapt said mapping 134.
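On the encoder side, the mapping of time warp values onto codewords may be pictured as a nearest-value search in a table that depends on the sampling frequency. This is a sketch only: the nearest-value rule, the two tables and their entries are assumptions and are not prescribed by the description above.

```python
# Hypothetical tables: codeword index -> relative pitch change.
FINE_TABLE = (0.985, 0.990, 0.995, 1.000, 1.005, 1.010, 1.015, 1.020)
COARSE_TABLE = (0.970, 0.980, 0.990, 1.000, 1.010, 1.020, 1.030, 1.040)

def encode_warp_values(values, table):
    """Map each time warp value onto the index of the closest table entry."""
    return [min(range(len(table)), key=lambda i: abs(table[i] - v))
            for v in values]

values = [1.00, 1.02, 1.04]

# Table assumed for a comparatively high sampling frequency: 1.04 lies
# outside the range and saturates at the largest entry.
codewords_high_fs = encode_warp_values(values, FINE_TABLE)

# Table assumed for a comparatively low sampling frequency: all three
# values are representable.
codewords_low_fs = encode_warp_values(values, COARSE_TABLE)
```

The same time warp value thus leads to different codewords depending on the table in use, so the decoder has to apply the matching table, which it can select from the transmitted sampling frequency information.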
The audio signal encoder 100 also comprises a time warping signal encoder 140, which is
configured to obtain an encoded representation 142 of a spectrum of the audio signal 110,
taking into account a time warp described by the time warp contour information 122.
Consequently, the encoded audio signal representation 112 may be provided, for example,
using a bitstream provider, such that the encoded representation 112 of the audio signal
110 comprises the codewords of the encoded time warp information 132, the encoded
representation 142 of the spectrum and a sampling frequency information 152 describing
the sampling frequency (for example, the sampling frequency of the input audio signal 110
and/or the (average) sampling frequency used by the time warping signal encoder 140 in
context with the time-domain-to-frequency-domain conversion).
Regarding the functionality of the audio signal encoder 100, it can be said that the
spectrum of an audio signal, which changes its pitch during an audio frame (wherein a
length of an audio frame, in terms of audio samples, may be equal to a transform length of
a time-domain-to-frequency-domain transform used by the time warping signal encoder)
may be compacted by a time-varying re-sampling. Accordingly, the time-varying re-sampling,
which may be performed by the time warping signal encoder 140 in dependence
on the time warp contour information 122, results in a spectrum (of the re-sampled audio
signal) which can be encoded with better bitrate-efficiency than the spectrum of the
original input audio signal 110.
However, the time warp which is applied in the time warping signal encoder 140 is
signaled to an audio signal decoder 200 according to Fig. 2 using the encoded time warp
information. Moreover, the encoding of the time warp information, which may comprise a
mapping of the time warp values onto codewords, is adapted in dependence on the
sampling frequency information, such that different mappings of the time warp values onto
the codewords are used for different sampling frequencies of the input audio signal 110 or
for different sampling frequencies at which the time warping signal encoder 140 (or the
time-domain-to-frequency-domain conversion thereof) is operated.
Thus, the most bitrate-efficient mapping may be chosen for each of the possible sampling
frequencies, which can be handled by the time warping signal encoder 140. Such an
adaptation makes sense because it was found that a bitrate of the encoded time warp
information can be kept small even in case of multiple possible sampling frequencies used
by the time warping signal encoder 140 if the mapping of the time warp values describing
the time warp contour onto the codewords matches the current sampling frequency. Accordingly, it
can be ensured that a small set of different codewords is sufficient for encoding the time
warp contour with sufficiently fine resolution and also with sufficiently large dynamic
range, both in the case of comparatively small sampling frequencies and comparatively
large sampling frequencies, even if a number of codewords per audio frame remains
constant over different sampling frequencies (which, in turn, provides for a sampling
frequency independent bitstream and therefore facilitates the generation, storage, parsing
and on-the-fly-processing of the encoded audio signal representation 112).
Further details regarding the adaptation of the mapping 134 will be discussed below.
2. Time Warp Audio Signal Decoder According to Fig. 2
Fig. 2 shows a block schematic diagram of a time warp audio signal decoder 200,
according to an embodiment of the invention.
The audio signal decoder 200 is configured to provide a decoded audio signal
representation 212 (for example, in the form of a time-domain audio signal representation)
on the basis of an encoded audio signal representation 210. The encoded audio signal
representation 210 may, for example, comprise an encoded spectrum representation 214
(which may be equal to the encoded spectrum representation 142 provided by the time
warping audio signal encoder 140), an encoded time warp information 216 (which may, for
example, be equal to the encoded time warp information 132 provided by the time warp
contour encoder 130), and a sampling frequency information 218 (which may, for
example, be equal to the sampling frequency information 152).
The audio signal decoder 200 comprises a time warp calculator 230, which may also be
considered as a time warp decoder. The time warp calculator 230 is configured to map the
encoded time warp information 216 onto a decoded time warp information 232. The
encoded time warp information 216 may, for example, comprise time warp codewords
"tw_ratio[i]", and the decoded time warp information may, for example, take the form of a
time warp contour information describing a time warp contour. The time warp calculator
230 is configured to adapt a mapping rule 234 for mapping (time warp) codewords of the
encoded time warp information 216 onto decoded time warp values describing the decoded
time warp information in dependence on the sampling frequency information 218.
Accordingly, different mappings of codewords of the encoded time warp information 216
onto time warp values of the decoded time warp information 232 may be chosen for
different sampling frequencies signaled by the sampling frequency information.
The audio signal decoder 200 also comprises a warp decoder 240 which is configured to
receive the encoded representation 214 of the spectrum and to provide the decoded audio
signal representation 212 on the basis of the encoded spectrum representation 214 and in
dependence on the decoded time warp information 232.
Accordingly, the audio signal decoder 200 allows for an efficient decoding of the encoded
time warp information, both for a comparatively high sampling frequency and for a
comparatively low sampling frequency, because the mapping of codewords of the encoded
time warp information onto decoded time warp values is dependent on the sampling
frequency. Thus, it is possible to obtain a high resolution of the time warp contour for a
comparatively high sampling frequency while still covering a sufficiently large time warp
per time unit for comparatively small sampling frequencies, and while using the same set
of codewords both for a comparatively small sampling frequency and a comparatively high
sampling frequency. Thus, the bitstream format is substantially independent from the
sampling frequency, while it is still possible to describe the time warp with appropriate
accuracy and dynamic range, both in case of a comparatively high sampling frequency and
a comparatively small sampling frequency.
Further details regarding the adaptation of the mapping 234 will be described below. Also,
further details regarding the warp decoder 240 will be described below.
3. Time Warp Audio Signal Encoder According to Fig. 3a
Fig. 3a shows a block schematic diagram of a time warp audio signal encoder 300,
according to an embodiment of the invention.
The audio signal encoder 300 according to Fig. 3a is similar to the audio signal encoder 100
according to Fig. 1, such that identical signals and devices are designated with identical
reference numerals. However, Fig. 3a shows more details regarding the time warping audio
signal encoder 140.
As the present invention is related to a time warp audio encoding and time warp audio
decoding, a short overview of details of the time warping audio signal encoder 140 will be
given. The time warping audio signal encoder 140 is configured to receive an input audio
signal 110 and to provide an encoded spectrum representation 142 of the input audio signal
110 for a sequence of frames. The time warping audio signal encoder 140 comprises a
sampling unit or re-sampling unit 140a, which is adapted to sample or re-sample the input
audio signal 110 to derive signal blocks (sampled representations) 140d used as a basis for
a frequency domain transform. The sampling unit/re-sampling unit 140a comprises a
sampling position calculator 140b, which is configured to compute sample positions which
are adapted to the time warp described by the time warp contour information 122, and
which are therefore non-equidistant in time if the time warp (or pitch variation, or
fundamental frequency variation) is different from zero. The sampling unit or re-sampling
unit 140a also comprises a sampler or re-sampler 140c, which is configured to sample or
re-sample a portion (for example, an audio frame) of the input audio signal 110 using the
temporally non-equidistant sample positions obtained by the sampling position calculator.
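The cooperation of the sampling position calculator 140b and the re-sampler 140c may be illustrated by the following minimal sketch, which is given under stated assumptions and is not the implementation of the embodiment. It is assumed that the time warp contour is available as one relative pitch value per input sample (the list `contour`, a hypothetical representation), that the sample positions are chosen to be equidistant in the accumulated contour, and that linear interpolation is sufficient. The function names are hypothetical.

```python
from bisect import bisect_right

def sample_positions(contour, num_out):
    """Return num_out positions (in input samples), equidistant in warped time.

    contour[k] is a hypothetical relative pitch value (> 0) for input sample k.
    """
    # Accumulated contour at the boundaries of the input samples.
    cum = [0.0]
    for value in contour:
        cum.append(cum[-1] + value)
    total = cum[-1]

    positions = []
    for n in range(num_out):
        target = total * n / num_out
        # Index of the input sample whose interval contains the target.
        k = min(bisect_right(cum, target) - 1, len(contour) - 1)
        positions.append(k + (target - cum[k]) / contour[k])
    return positions

def resample(signal, positions):
    """Linearly interpolate signal at fractional positions (clamped at the end)."""
    last = len(signal) - 1
    out = []
    for p in positions:
        i = min(int(p), last - 1)
        t = min(p - i, 1.0)
        out.append((1.0 - t) * signal[i] + t * signal[i + 1])
    return out
```

For a constant contour the positions are equidistant. For a rising contour the positions move closer together towards the end of the block, so that the re-sampled block exhibits a reduced pitch variation.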
The time warping audio signal encoder 140 further comprises a transform window
calculator 140e, which is adapted to derive scaling windows for the sampled or re-sampled
representations 140d output by the sampling unit or re-sampling unit 140a. The scaling
window information 140f and the sampled/re-sampled representations 140d are input into a
windower 140g, which is adapted to apply the scaling windows described by the scaling
window information 140f to the corresponding sampled or re-sampled representations
140d derived by the sampling unit/re-sampling unit 140a. In other embodiments, the time
warping audio signal encoder 140 may additionally comprise a frequency-domain
transformer 140i, in order to derive a frequency-domain representation 140j (for example,
in the form of transform coefficients or spectral coefficients) of the sampled and windowed
representation 140h of the input audio signal 110. The frequency-domain representation
140j may, for example, be post-processed. Moreover, the frequency-domain representation
140j, or a post-processed version thereof, may be encoded using an encoding 140k to
obtain the encoded spectrum representation 142 of the input audio signal 110.
The time warping audio signal encoder 140 further uses a pitch contour of the input audio
signal 110, wherein the pitch contour may be described by a time warp contour
information 122. The time warp contour information 122 may be provided to the audio
signal encoder 300 as an input information, or may be derived by the audio signal encoder
300. The audio signal encoder 300 may therefore, optionally, comprise a time warp
analyzer 120, which may operate as a pitch estimator for deriving the time warp contour
information 122, such that the time warp contour information 122 constitutes a pitch
contour information or describes the pitch contour or a fundamental frequency.
The sampling unit/re-sampling unit 140a may operate on a continuous representation of the
input audio signal 110. Alternatively, however, the sampling unit/re-sampling unit 140a
may operate on a previously sampled representation of the input audio signal 110. In the
former case, the unit 140a may sample the input audio signal (and may therefore be
considered a sampling unit), and in the latter case, the unit 140a may resample the
previously sampled representation of the input audio signal 110 (and may therefore be
considered a re-sampling unit). The sampling unit 140a may, for example, be adapted to
time warp neighboring overlapping audio blocks such that the overlapping portion has a
constant pitch or reduced pitch variation within each of the input blocks after the sampling
or re-sampling.
The transform window calculator 140e may, optionally, derive the scaling windows for the
audio blocks (for example, for the audio frames) depending on the time warping performed
by the sampler 140a. To this end, an optional adjustment block 140l may be present in
order to define the warping rule used by the sampler, which is then also provided to the
transform window calculator 140e.
In an alternative embodiment, the adjustment block 140l may be omitted and the pitch
contour described by the time warp contour information 122 may be directly provided to
the transform window calculator 140e, which may itself perform the appropriate
calculations. Furthermore, the sampling unit/re-sampling unit 140a may communicate the
applied sampling to the transform window calculator 140e in order to enable the
calculation of appropriate scaling windows.
However, in some other embodiments, the windowing may be substantially independent
from details of the time warping.
The time warping is performed by the sampling unit/re-sampling unit 140a such that a
pitch contour of sampled (or re-sampled) audio blocks (or audio frames) time-warped and
sampled (or re-sampled) by the unit 140a is more constant than the pitch contour of the
original input audio signal 110. Accordingly, a smearing of the spectrum, which is caused
by a temporal variation of the pitch contour, is reduced by sampling or resampling
performed by the unit 140a. Thus, the spectrum of the sampled or re-sampled audio signal
140d is less smeared (and, typically, shows more explicit spectral peaks and spectral
valleys) than the spectrum of the input audio signal 110. Accordingly, it is typically
possible to encode the spectrum of the sampled (or resampled) audio signal 140d using a
smaller bitrate when compared to a bitrate which would be required for encoding the
spectrum of the input audio signal 110 with the same accuracy.
It should be noted here that the input audio signal 110 is typically processed frame-wise,
wherein the frames may be overlapping or non-overlapping depending on the specific
requirements. For example, each of the frames of the input audio signal may be sampled or
re-sampled individually by the unit 140a, to thereby obtain a sequence of sampled (or resampled)
frames described by respective sets of time-domain samples 140d. Also, the
windowing may be applied individually to the sampled or re-sampled frames, represented
by respective sets of time domain samples 140d, by the windowing 140g. Moreover, the
windowed and re-sampled frames, described by respective sets of windowed and resampled
time domain samples 140h, may be transformed individually into the frequency domain
by the transform 140i. Nevertheless, there may be some (temporal) overlapping of
the individual frames.
Moreover, it should be noted that the audio signal 110 may be sampled with a
predetermined sampling frequency (also designated as a sampling rate). In the re-sampling,
which is performed by the sampler or re-sampler 140c, the re-sampling may be performed
such that a re-sampled block (or frame) of the input audio signal 110 may comprise an
average sampling frequency (or sampling rate) which is identical (or at least approximately
identical, for example within a tolerance of +/- 5%) to the sampling frequency (or sampling
rate) of the input audio signal 110. However, the audio signal encoder 300 may,
alternatively, be configured to operate with input audio signals of different sampling
frequencies (or sampling rates).
Accordingly, the average sampling frequency (or sampling rate) of the re-sampled blocks
or frames, represented by time-domain samples 140d, may vary in dependence on the
sampling frequency or sampling rate of the input audio signal 110 in some embodiments.
However, it is naturally also possible that the average sampling frequency or sampling rate
of the blocks or frames of the sampled or re-sampled audio signal, represented by the time
domain samples 140d, differs from the sampling rate of input audio signal 110, because the
sampler 140a may perform both, a sampling rate conversion, in accordance with an
operator's desires or requirements, and a time warping.
Consequently, it can be said that the blocks or frames of the sampled or re-sampled audio
signal, represented by sets of time domain samples 140d, may be provided at different
sampling frequencies or sampling rates, depending on an average sampling frequency or
sampling rate of the input audio signal 110 and/or users' desires.
However, in some embodiments, a length of the blocks or frames of the sampled or re-sampled
audio signal represented by sets of time-domain samples 140d, in terms of audio
samples, may be constant even for different average sampling frequencies or sampling
rates. However, switching between two possible lengths (in terms of audio samples per
block or frame) may take place in some embodiments, wherein a block length or frame
length in a first (short block) mode may be independent of the average sampling frequency,
and wherein a block length or frame length (in terms of audio samples) in a second (long
block) mode may be independent of the average sampling frequency or sampling rate as
well.
Accordingly, the windowing, which is performed by the windower 140g, the transform,
which is performed by the transformer 140i, and the encoding, which is performed by the
encoder 140k, may be substantially independent of the average sampling frequency or
sampling rate of the sampled or re-sampled audio signal 140d (except for a possible
switching between a short block mode and a long block mode, which may take place
independent of the average sampling frequency or sampling rate).
To conclude, the time warping audio signal encoder 140 allows for an efficient encoding of the input
audio signal 110 because the sampling or re-sampling performed by the sampler 140a
results in a re-sampled audio signal 140d having a less smeared spectrum than the input
audio signal 110 in case the input audio signal 110 comprises a temporal pitch variation,
which in turn allows for a bitrate-efficient encoding (by the encoder 140k) of the spectral
coefficients 140j provided by the transformer 140i on the basis of the sampled/re-sampled
and windowed version 140h of the input audio signal 110.
The time warp contour encoding, which is performed in a sampling-frequency-dependent
manner by the time warp contour encoder 130, allows for a bitrate-efficient
encoding of the time warp contour information 122 for different sampling frequencies (or
average sampling frequencies) of the sampled/re-sampled audio signal 140d, such that a
bitstream comprising the encoded spectrum representation 142 and the encoded time warp
information 132 is bitrate-efficient.
4. Time Warp Audio Signal Decoder According to Fig. 3b
Fig. 3b shows a block schematic diagram of an audio signal decoder 350, according to an
embodiment of the invention.
The audio signal decoder 350 is similar to the audio signal decoder 200 according to Fig.
2, such that identical signals and devices will be designated with identical reference
numerals and not be explained here again.
The audio signal decoder 350 is configured for receiving an encoded spectrum
representation of a first time-warped and sampled audio frame and for also receiving an
encoded spectrum representation of a second time-warped and sampled audio frame.
Generally speaking, the audio signal decoder 350 is configured for receiving a sequence of
encoded spectrum representations of time-warp-resampled audio frames, wherein said
encoded spectrum representations may, for example, be provided by the time warping
signal encoder 140 of the audio signal encoder 300. In addition, the audio signal decoder
350 receives side information, like, for example, an encoded time warp information 216
and a sampling frequency information 218.
The warp decoder 240 may comprise a decoder 240a, which is configured to receive the
encoded representation 214 of the spectrum, to decode the encoded representation 214 of
this spectrum and to provide a decoded representation 240b of the spectrum. The warp
decoder 240 also comprises an inverse transformer 240c which is configured to receive the
decoded representation 240b of the spectrum and to perform an inverse transform on the
basis of said decoded representation 240b of the spectrum, to thereby obtain a time-domain
representation 240d of a block or frame of the time-warp-sampled audio signal described
by the encoded spectrum representation 214. The warp decoder 240 also comprises a
windower 240e, which is configured to apply a windowing to the time-domain
representation 240d of a block or frame, to thereby obtain a windowed time-domain
representation 240f of a block or frame. The warp decoder 240 also comprises a
re-sampling 240g, in which the windowed time-domain representation 240f is re-sampled in
accordance with a sampling position information 240h, to thereby obtain a windowed and
re-sampled time-domain representation 240i for a block or a frame. The warp decoder 240
also comprises an overlapper-adder 240j, which is configured to overlap-and-add
subsequent blocks or frames of the windowed and re-sampled time-domain representation,
to thereby obtain a smooth transition between the subsequent blocks or frames of the
windowed and re-sampled time-domain representation 240i, and to thereby obtain the
decoded audio signal representation 212 as a result of the overlap-and-add operation.
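The final overlap-and-add stage 240j may be illustrated by the following minimal sketch. It is an illustration under assumptions that are not taken from the description: a sine window, an overlap of 50% between subsequent frames, and frames of equal length. The time-varying re-sampling 240g is omitted here, and the function names are hypothetical.

```python
import math

def sine_window(length):
    """Sine window; its square sums to one at 50% overlap."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def overlap_add(frames, hop):
    """Sum equally long frames whose start positions are hop samples apart."""
    length = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + length)
    for index, frame in enumerate(frames):
        start = index * hop
        for n, value in enumerate(frame):
            out[start + n] += value
    return out
```

With this choice of window, a constant signal that is windowed twice (once before the transform and once after the inverse transform) is reconstructed with unit gain in the region where two frames overlap, which is the smooth transition referred to above.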
The warp decoder 240 comprises a sampling position calculator 240k, which is configured
to receive the decoded time warp information 232 from the time warp calculator (or time
warp decoder) 230, and to provide the sampling position information 240h on the basis
thereof. Accordingly, the decoded time warp information 232 describes the time-varying
re-sampling, which is performed by the re-sampler 240g.
Optionally, the warp decoder 240 may comprise a window shape adjuster 240l, which may
be configured to adjust the shape of the window used by the windower 240e in dependence
on the requirements. For example, the window shape adjuster 240l may, optionally,
receive the decoded time warp information 232 and adjust the window in dependence on
said decoded time warp information 232. Alternatively, or in addition, the window shape
adjuster 240l may be configured to adjust the window shape used by the windower 240e in
dependence on an information indicating whether a long block mode or a short block mode
is used, if the warp decoder 240 is switchable between such a long block mode and a short
block mode. Alternatively, or in addition, the window shape adjuster 240l may be
configured to select an appropriate window shape for use by the windower 240e in
dependence on a window sequence information if different window types are used by the
warp decoder 240. However, it should be noted that the window shape adjustment, which
is performed by the window shape adjuster 240l, should be considered as being optional
and is not particularly relevant for the present invention.
Moreover, the warp decoder 240 may, optionally, comprise a sampling rate adjuster
240m, which may be configured to control the window shape adjuster 240l and/or the
sampling position calculator 240k in dependence on the sampling frequency information
218. However, the sampling rate adjuster 240m may be considered as optional and is
not of particular relevance for the present invention.
Regarding the functionality of the warp decoder 240, it can be said that the encoded
representation 214 of the spectrum, which may, for example, comprise a set of transform
coefficients (also designated as spectral coefficients) for each of a plurality of audio frames
(or even a plurality of sets of spectral coefficients for some audio frames), is first decoded
using the decoder 240a, such that the decoded spectrum representation 240b is obtained.
The decoded spectrum representation 240b of a block or frame of the encoded audio signal
is transformed into a time-domain representation (comprising, for example, a
predetermined number of time-domain samples per audio frame) of said block or frame of
the audio content. Typically, but not necessarily, the decoded representation 240b of the
spectrum comprises pronounced peaks and valleys, because such a spectrum can be
encoded efficiently. Consequently, the time-domain representation 240d comprises a
comparatively small pitch variation during a single block or frame (which corresponds to a
spectrum having pronounced peaks and valleys).
The windowing 240e is applied to the time-domain representation 240d of the audio signal
to allow for an overlap-and-add operation. Subsequently, the windowed time-domain
representation 240f is re-sampled in a time-varying manner, wherein the re-sampling is
performed in accordance with the time warp information included, in an encoded form, in
the encoded audio signal representation 210. Accordingly, the re-sampled audio signal
representation 240i typically comprises a significantly larger pitch variation than the
windowed time-domain representation 240f, provided the encoded time warp information
describes a time warp, or, equivalently, a pitch variation. Thus, an audio signal comprising
a significant pitch variation over a single audio frame can be provided at the output of the
re-sampler 240g, even though the output signal 240d of the inverse transformer 240c
comprises a significantly smaller pitch variation over a single audio frame.
However, the warp decoder 240 may be configured to handle encoded spectrum
representations which are provided using different sampling frequencies, and to provide
the decoded audio signal representation 212 with different sampling frequencies. However,
a number of time-domain samples per audio frame or audio block may be identical for a
plurality of different sampling frequencies. Alternatively, however, the warp decoder 240
may be switchable between a short block mode, in which an audio block comprises a
comparatively small number of samples (for example, 256 samples) and a long block mode
in which an audio block comprises a comparatively large number of samples (for example,
2048 samples). In this case, the number of samples per audio block in the short block mode
is identical for the different sampling frequencies, and the number of audio samples per
audio block (or audio frame) in the long block mode is identical for the different sampling
frequencies. Also, the number of time warp codewords per audio frame is typically
identical for the different sampling frequencies. Accordingly, a uniform bitstream format
can be achieved, which is substantially independent (at least with respect to a number of
time-domain samples encoded per audio frame, and with respect to a number of time warp
codewords per audio frame) from the sampling frequency.
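Because the number of samples per frame and the number of time warp codewords per frame are fixed, the time between two transmitted time warp values scales inversely with the sampling frequency. The following short sketch makes this explicit. The default values of 1024 samples per frame and 16 codewords per frame are the ones quoted elsewhere in this description for the decoder of reference [3] and are used here only as an example; the function names are hypothetical.

```python
def frame_duration_s(fs, nf=1024):
    """Duration of one frame of nf samples at sampling frequency fs (in Hz)."""
    return nf / fs

def node_interval_s(fs, nf=1024, num_nodes=16):
    """Time between two time warp values if num_nodes values cover one frame."""
    return nf / (num_nodes * fs)
```

Halving the sampling frequency doubles the time spanned by one codeword, which is why a fixed set of decoded values covers a different warp per unit time at different sampling frequencies.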
However, in order to have both a bitrate efficient encoding of the time warp information
and a sufficient resolution of the time warp information, the encoding of the time warp
information is adapted to the sampling frequency at the side of an audio signal encoder
300, which provides the encoded audio signal representation 210. Consequently, the
decoding of the encoded time warp information 216, which comprises the mapping of time
warp codewords onto decoded time warp values, is adapted to the sampling frequency.
Details regarding this adaptation of the decoding of the time warp information will be
described subsequently.
5. Adaptation of Time Warp Encoding and Decoding
5.1. Conceptual Overview
In the following, details regarding the adaptation of the time warp encoding and decoding
in dependence on a sampling frequency of an audio signal to be encoded or an audio signal
to be decoded will be described. In other words, a sampling frequency dependent pitch
variation quantization will be described. In order to facilitate the understanding, some
conventional concepts will first be described.
In conventional audio encoders and audio decoders using a time warp, the quantization
table for the pitch variation or a warp is fixed for all sampling frequencies. As an example,
reference is made to the Working Draft 6 of the Unified-Speech-and-Audio-Coding
("WD6 of USAC", ISO/IEC JTC1/SC29/WG11 N11213, 2010). Since the update distance
in samples (for example, a distance, in terms of audio samples, of time instances for which
a time warp value is transmitted from an audio encoder to an audio decoder) is also fixed
(both in conventional time warp audio encoders/audio decoders and in time warp audio
encoders/audio decoders according to the present invention), applying such a coding
scheme at a lower bitrate leads to a smaller range of actual pitch changes (for example, in
terms of pitch change per unit time) that can be covered. Typical maximum changes in the
fundamental frequency of speech are below about 15 oct/s (15 octaves per second).
The table of Fig. 4c shows the finding that for certain sampling frequencies that are used in
audio coding, the coding scheme described in reference [3] is not able to map the desired
pitch variation range and therefore leads to a sub-optimal coding gain. To show this effect,
the table of Fig. 4c shows the warps for different sampling frequencies for the table (for
example, mapping table for mapping time warp codewords onto decoded time warp
values) used in the audio decoder described in reference [3]. The formula to obtain those
warp values in oct/s is:
$$ w = \frac{\log_2(p_{rel}) \cdot f_s \cdot n_p}{n_f} \qquad (1) $$
In the above equation, w designates a warp, p_rel designates a relative pitch change factor, fs
designates a sampling frequency, np designates a number of pitch nodes in one frame and
nf designates a frame length in samples.
Accordingly, the table of Fig. 4c shows warps of the quantization scheme used in the audio
decoder described in reference [3], wherein nf = 1024 and np = 16.
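The dependence of the covered warp range on the sampling frequency can be checked numerically. The following sketch assumes that the warp follows from the symbol definitions above as w = log2(p_rel) * fs * np / nf. The function name and the example factor 1.01 are illustrative and are not taken from the table of Fig. 4c.

```python
import math

def warp_oct_per_s(p_rel, fs, nf=1024, num_nodes=16):
    """Warp in octaves per second for one relative pitch change factor."""
    return math.log2(p_rel) * fs * num_nodes / nf

if __name__ == "__main__":
    # The same factor describes a smaller warp at a lower sampling frequency.
    for fs in (48000, 24000, 12000):
        print(fs, round(warp_oct_per_s(1.01, fs), 3))
```

For a fixed table of relative pitch change factors, the warp that can be represented is proportional to the sampling frequency, so that halving the sampling frequency halves the covered range in oct/s.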
In accordance with the present invention, it has been found that it is advantageous to adapt
the mapping of the warp value index (which may be considered as a time warp codeword)
onto a corresponding time warp value p_rel in dependence on the sampling frequency. In
other words, it has been found that the solution to the above-mentioned problems is to
design distinct quantization tables for different sampling frequencies in such a way that the
absolute range of covered pitch variations or warps in oct/s (octaves per second) is the
same (or at least approximately the same) for all sampling frequencies. It has been found
that this might be done, for example, by providing several explicit quantization tables, each
used for a narrow range of neighboring sampling frequencies, or by a calculation of the
quantization table on the fly for the used sampling frequencies.
In accordance with an embodiment of the invention, this might be done by providing a
table of warp values and calculating the quantization table for the relative pitch change
factor by transforming the formula from above:
$$ p_{rel} = 2^{\frac{w \cdot n_f}{f_s \cdot n_p}} \qquad (2) $$
In the above equation, p_rel designates a relative pitch change factor, nf designates the frame
length in samples, w designates the warp, fs designates the sampling frequency and np
designates the number of pitch nodes in one frame. Using said equation, the relative pitch
change factors p_rel, which are shown in the table of Fig. 4d, can be obtained.
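The calculation of a quantization table from a table of warp values may be sketched as follows. The sketch assumes that the relative pitch change factor is obtained by solving the warp formula for the factor, i.e. p_rel = 2^(w * nf / (fs * np)). The warp values in `WARPS` are hypothetical placeholders chosen for illustration; they are not the values of the second column of Fig. 4d.

```python
def rel_pitch_factor(warp, fs, nf=1024, num_nodes=16):
    """Relative pitch change factor for a warp given in octaves per second."""
    return 2.0 ** (warp * nf / (fs * num_nodes))

# Hypothetical warp table in oct/s; index 3 is the constant-pitch entry.
WARPS = [-15.0, -10.0, -5.0, 0.0, 5.0, 10.0, 15.0, 20.0]

def quantization_table(fs, warps=WARPS):
    """Table indexed by time warp codeword for the sampling frequency fs."""
    return [rel_pitch_factor(w, fs) for w in warps]
```

Since the exponent is inversely proportional to the sampling frequency, each entry of the table for 12000 Hz is the square of the corresponding entry for 24000 Hz, while both tables cover the same range in oct/s.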
Taking reference to Fig. 4d, a first column 480 designates an index, which index may be
considered as a time warp codeword, and which index may be included in the bitstream
representing the encoded audio signal representation 210. A second column 482 describes
a maximum representable time warp (in terms of oct/s), which can be represented by np
relative pitch change factors p_rel associated with the index shown in the first column and in
the respective row. A third column 484 describes a relative pitch change factor associated
with the index given in the first column 480 of the respective row for a sampling frequency
of 24000 Hz. A fourth column 486 shows relative pitch change factors associated with
index values shown in the first column 480 of the respective row for a sampling frequency
of 12000 Hz. As can be seen, indices 0, 1 and 2 correspond to relative pitch change factors
p_rel for a "negative" change of the pitch (i.e., for a reduction of the pitch), index value 3
corresponds to a relative pitch change factor of 1, which represents a constant pitch, and
indices 4, 5, 6 and 7 are associated with relative pitch change factors p_rel describing a
"positive" time warp, i.e. an increase of the pitch.
However, it has been found that there are different concepts for obtaining the relative pitch
change factors. It has been found that one other way to obtain the relative pitch change
factors is to design a table of quantization values for the relative pitch change factor and a
corresponding reference sampling rate. The actual quantization table for a given sampling
frequency can then simply be derived from the designed table using the following formula:
$$ p_{rel} = 1 + \frac{(p_{rel,ref} - 1) \cdot f_{s,ref}}{f_s} \qquad (3) $$
In the above equation, p_rel describes a relative pitch change factor for a current sampling
frequency fs. In addition, p_rel,ref describes a relative pitch change factor for the reference
sampling frequency fs,ref. A set of reference pitch change factors p_rel,ref associated with
different indices (time warp codewords) may be stored in a table, wherein the reference
sampling frequency fs,ref, to which the reference (relative) pitch change factors correspond,
is known.
It has been found that the latter formula gives a reasonable approximation to the results
obtained by the formula above while being computationally less complex.
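The quality of this approximation can be examined with a short sketch. It assumes that the simplified rule scales the deviation of the reference factor from 1 by the ratio of the sampling frequencies, i.e. p_rel = 1 + (p_rel,ref - 1) * fs,ref / fs, and that the exponential rule corresponds to raising the reference factor to the power fs,ref / fs. The approximation then follows from 2^x being close to 1 + x * ln(2) for small x. The function names and the example factor 1.01 are hypothetical.

```python
def exact_factor(p_ref, fs, fs_ref=24000.0):
    """Rescale a reference factor with the exponential relation."""
    return p_ref ** (fs_ref / fs)

def approx_factor(p_ref, fs, fs_ref=24000.0):
    """Rescale a reference factor with the linear approximation."""
    return 1.0 + (p_ref - 1.0) * fs_ref / fs

if __name__ == "__main__":
    for fs in (24000.0, 12000.0, 8000.0):
        print(fs, exact_factor(1.01, fs), approx_factor(1.01, fs))
```

The linear rule needs one multiplication, one division and two additions per table entry and no exponential, and its deviation from the exponential rule stays small as long as the factors are close to 1.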
Fig. 4e shows a table representation of relative pitch change factors p_rel, which are obtained
from reference relative pitch change factors p_rel,ref, wherein the table holds for a reference
sampling frequency fs,ref = 24000 Hz.
A first column 490 describes an index, which may be considered as a time warp codeword.
A second column 492 describes reference relative pitch change factors p_rel,ref associated
with the indices (or codewords) shown in the first column 490 in the respective row. A
third column 494 and a fourth column 496 describe (relative) pitch change factors
associated with the indices of the first column 490 for a sampling frequency fs of 24000 Hz
(third column 494) and 12000 Hz (fourth column 496). As can be seen, the relative pitch
change factors p_rel for a sampling frequency fs of 24000 Hz, which are shown in the third
column 494, are identical to the reference relative pitch change factors shown in the second
column 492, because the sampling frequency fs of 24000 Hz is equal to the reference
sampling frequency fs,ref. However, the fourth column 496 shows relative pitch change
factors p_rel at a sampling frequency fs of 12000 Hz, which are derived from the reference
relative pitch change factors of the second column 492 in accordance with the above
equation (3).
Of course, such normalization procedures, as described above, can be applied in a
straightforward manner to any other representation of a change in frequency or pitch, for example,
also to a scheme coding the absolute pitch or frequency values and not the relative changes
thereof.
5.2. Implementation According to Fig. 4a
Fig. 4a shows a block schematic diagram of an adaptive mapping 400, which may be used
in embodiments according to the invention.
For example, the adaptive mapping 400 may take the place of the mapping 234 in the audio
signal decoder 200 or of the mapping 234 in the audio signal decoder 350.
The adaptive mapping 400 is configured to receive an encoded time warp information,
like, for example, a so-called "tw_data" information comprising time warp codewords
"tw_ratio[i]". Accordingly, the adaptive mapping 400 may provide decoded time warp
values, for example, decoded ratio values, which are sometimes designated as values
"warp_value_tbl[tw_ratio]", and which are sometimes also designated as relative pitch
change factors p_rel. The adaptive mapping 400 also receives a sampling frequency
information which describes, for example, the sampling frequency fs of the time-domain
representation 240d provided by the inverse transform 240c, or the average sampling
frequency of the windowed and re-sampled time domain representation 240i provided by
the re-sampling 240g, or the sampling frequency of the decoded audio signal
representation 212.
The adaptive mapping comprises a mapper 420, which provides a decoded time warp value
as a function of a time warp codeword of the encoded time warp information. A mapping
rule selector 430 selects a mapping table, out of a plurality of mapping tables 432, 434 for
the use by the mapper 420 in dependence on the sampling frequency information 406. For
example, the mapping table selector 430 selects a mapping table, which represents a
mapping defined by the first column 480 of the table of Fig. 4d and the third column 484
of the table of Fig. 4d if the current sampling frequency is equal to 24000 Hz, or if the
current sampling frequency is in a predetermined environment of 24000 Hz. In contrast,
the mapping table selector 430 may select a mapping table, which represents a mapping
defined by the first column 480 of the table of Fig. 4d and the fourth column 486 of the
table of Fig. 4d, if the sampling frequency fs is equal to 12000 Hz or if the sampling
frequency fs is in a predetermined environment of 12000 Hz.
Accordingly, time warp codewords (also designated as "indices") 0-7 are mapped onto the
respective decoded time warp values (or relative pitch change factors) shown in the third
column 484 of the table of Fig. 4d if the sampling frequency is equal to 24000 Hz, and
onto the respective decoded time warp values (or relative pitch change factors) shown in the
fourth column 486 of the table of Fig. 4d if the sampling frequency is equal to 12000 Hz.
To summarize, different mapping tables may be selected by the mapping table selector 430
in dependence on the sampling frequency, to thereby map a time warp codeword (for
example, a value "index" included in a bitstream representing the encoded audio signal)
onto a decoded time warp value (for example, a relative pitch change factor p_rel, or a time
warp value "warp_value_tbl").
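The table selection described above may be sketched as follows. This is a minimal illustration and not the implementation of the embodiment. The table entries are hypothetical placeholders (they are not the values of Fig. 4d), the "predetermined environment" of a sampling frequency is modeled as a relative tolerance of 10%, which is an assumption, and the function names are hypothetical.

```python
# Hypothetical mapping tables (counterparts of tables 432, 434), keyed by the
# nominal sampling frequency in Hz and indexed by the time warp codeword.
TABLES = {
    24000: [0.985, 0.990, 0.995, 1.0, 1.005, 1.010, 1.015, 1.020],
    12000: [0.970, 0.980, 0.990, 1.0, 1.010, 1.020, 1.030, 1.040],
}

def select_table(fs, tolerance=0.1):
    """Counterpart of selector 430: pick the table whose nominal rate is near fs."""
    for nominal, table in TABLES.items():
        if abs(fs - nominal) <= tolerance * nominal:
            return table
    raise ValueError("no mapping table for sampling frequency %r" % fs)

def decode_codeword(codeword, fs):
    """Counterpart of mapper 420: map a codeword onto a decoded time warp value."""
    return select_table(fs)[codeword]
```

With these assumptions, a sampling frequency of 22050 Hz would use the table stored for 24000 Hz, and the same codeword yields a different decoded value at 12000 Hz than at 24000 Hz.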
5.3. Implementation According to Fig. 4b
Fig. 4b shows a block schematic diagram of an adaptive mapping 450, which may be used
in embodiments according to the invention. For example, the adaptive mapping 450 may
take the place of the mapping 234 in the audio signal decoder 200 or of the mapping 234 in the
audio signal decoder 350. The adaptive mapping 450 is configured to receive an encoded
time warp information, wherein the above explanations regarding the adaptive mapping
400 hold.
Also, the adaptive mapping 450 is configured to provide decoded time warp values,
wherein the above explanations with respect to the adaptive mapping 400 also hold.
The adaptive mapping 450 comprises a mapper 470, which is configured to receive a
codeword of the encoded time warp information and to provide a decoded time warp value. The
adaptive mapping 450 also comprises a mapping value computer or a mapping table
computer 480.
In the case of a mapping value computer, the decoded time warp value is computed
according to the above equation (3). For this purpose, the mapping value computer may
comprise a reference mapping table 482. The reference mapping table 482 may, for
example, describe the mapping information which is defined by a first column 490 and a
second column 492 of the table of Fig. 4e. Accordingly, the mapping value computer 480
and the mapper 470 may cooperate such that a corresponding reference relative pitch
change factor is selected for a given time warp codeword on the basis of the reference
mapping table, and such that the relative pitch change factor p_rel corresponding to said
given time warp codeword is computed in accordance with equation (3) using the
information about the current sampling frequency fs and returned as decoded time warp
value. In this case, it is not even necessary to store all the entries of a mapping table
adapted to the current sampling frequency fs, at the price of a computation of the decoded
time warp value (relative pitch change factor) for each time warp codeword.
Alternatively, however, the mapping table computer 480 may pre-compute a mapping table
adapted to the current sampling frequency fs for usage by the mapper 470. For example, the
mapping table computer may be configured to compute the entries of the fourth column
496 of Fig. 4e in response to the finding that a current sampling frequency of 12000 Hz is
selected. The computation of said relative pitch change factors p_rel for a sampling
frequency fs of 12000 Hz may be based on the reference mapping table (comprising, for
example, the mapping defined by the first column 490 and the second column 492 of the
table of Fig. 4e), and may be performed using equation (3).
Accordingly, said pre-computed mapping table may be used for the mapping of a time
warp codeword onto a decoded time warp value. Moreover, the pre-computed mapping
table may be updated whenever the re-sampling rate is changed.
To summarize, the mapping rule for the mapping of time warp codewords onto decoded
time warp values may be evaluated or computed on the basis of the reference mapping
table 482, wherein a pre-computation of a mapping table adapted to the current sampling
frequency or an on-the-fly computation of the decoded time warp value may be performed.
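The two alternatives (on-the-fly computation and pre-computation of an adapted table)
may be contrasted in a minimal Python sketch. Equation (3) is not reproduced in this
section, so the function adapt_value below is only an assumed stand-in: it scales the
deviation of a reference value from 1 by the ratio of an assumed reference sampling
frequency of 24000 Hz to the current sampling frequency. All names are hypothetical.

```python
def adapt_value(ref_value, fs_hz, fs_ref_hz=24000.0):
    """ASSUMED stand-in for equation (3); not taken from the specification."""
    return 1.0 + (ref_value - 1.0) * fs_ref_hz / fs_hz


def decode_on_the_fly(index, reference_table, fs_hz):
    """Mapping value computer: adapt one reference entry per decoded codeword."""
    return adapt_value(reference_table[index], fs_hz)


class PrecomputedMapper:
    """Mapping table computer: adapt the whole table once per sampling frequency."""

    def __init__(self, reference_table, fs_hz):
        self.reference_table = list(reference_table)
        self.set_sampling_frequency(fs_hz)

    def set_sampling_frequency(self, fs_hz):
        # Re-compute the adapted table whenever the sampling frequency changes.
        self.fs_hz = fs_hz
        self.table = [adapt_value(v, fs_hz) for v in self.reference_table]

    def decode(self, index):
        # Decoding is a plain table lookup.
        return self.table[index]
```

The first variant stores only the reference table and pays one computation per codeword;
the second variant stores an additional adapted table and pays the computation only when
the sampling frequency changes.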
6. Detailed Description of the Computation of the Time Warp Control Information
In the following, details regarding the computation of the time warp control information on
the basis of a time warp contour evolution information will be described.
6.1. Apparatus according to Figs. 5a and 5b
Figs. 5a and 5b show a block schematic diagram of an apparatus 500 for providing a time
warp control information 512 on the basis of a time warp contour evolution information
510, which may be a decoded time warp information, and which may, for example,
comprise decoded time warp values provided by the mapping 234 of the time warp
calculator 230. The apparatus 500 comprises a means 520 for providing a reconstructed
time warp contour information 522 on the basis of the time warp contour evolution
information 510 and a time warp control information calculator 530 to provide the time
warp control information 512 on the basis of the reconstructed time warp contour
information 522.
In the following, the structure and functionality of the means 520 will be described.
The means 520 comprises a time warp contour calculator 540, which is configured to
receive the time warp contour evolution information 510 and to provide, on the basis
thereof, a new time warp contour portion information 542. For example, a set of time warp
contour evolution information (for example, a set of a predetermined number of decoded
time warp values provided by the mapping 234) may be transmitted to the apparatus 500
for each frame of the audio signal to be reconstructed. Nevertheless, the set of time warp
contour evolution information 510 associated with a frame of the audio signal to be
reconstructed may be used for the reconstruction of a plurality of frames of the audio
signal in some cases. Similarly, a plurality of sets of time warp contour evolution
information may be used for the reconstruction of the audio content of a single frame of the
audio signal, as will be discussed in detail in the following. As a conclusion, it can be
stated that, in some embodiments, the time warp contour evolution information may be
updated at the same rate at which sets of the transform-domain coefficients of the audio
signal to be reconstructed are updated (one set of time warp contour evolution information
510 per frame of the audio signal, and/or one time warp contour portion per frame of the
audio signal).
The time warp contour calculator 540 comprises a warp node value calculator 544, which
is configured to compute a plurality (or temporal sequence) of warp contour node values
on the basis of a plurality (or temporal sequence) of time warp contour ratio values,
wherein the time warp ratio values are comprised by the time warp contour evolution
information 510. In other words, the decoded time warp values provided by the mapping
234 may constitute the time warp ratio values (e.g., warp_value_tbl[tw_ratio[]]). For this
purpose, the warp node value calculator 544 is configured to start the provision of the time
warp contour node values at a predetermined starting value (for example, 1) and to
calculate subsequent time warp contour node values using the time warp contour ratio
values, as will be discussed below.
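The computation of the warp contour node values may be sketched as follows. The sketch
assumes that each node value is obtained by multiplying the preceding node value by the
corresponding time warp contour ratio value, which is one natural reading of the term
"ratio value"; the function name warp_node_values is hypothetical.

```python
def warp_node_values(ratio_values, start_value=1.0):
    """Accumulate decoded ratio values into warp contour node values.

    Assumption: node[i + 1] = node[i] * ratio[i], with node[0] = start_value.
    """
    nodes = [start_value]
    for ratio in ratio_values:
        nodes.append(nodes[-1] * ratio)
    return nodes
```

Under this assumption, a set of N ratio values yields N + 1 node values, the first of
which is always the predetermined starting value.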
Further, the time warp contour calculator 540 optionally comprises an interpolator 548,
which is configured to interpolate between subsequent time warp contour node values.
Accordingly, the description 542 of the new time warp contour portion is obtained,
wherein the new time warp contour portion typically starts from the predetermined starting
value used by the warp node value calculator 544. Furthermore, the means 520 is configured to
store the so-called "last time warp contour portion" and the so-called "current time warp
contour portion" in a memory not shown in Figs. 5a and 5b.
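The interpolation between subsequent node values may, for example, be sketched as shown
below. Linear interpolation, the number of samples per node interval and the convention
that each interval contributes its right-hand node value but not its left-hand one are
assumptions of this sketch; the function name interpolate_contour is hypothetical.

```python
def interpolate_contour(nodes, samples_per_interval):
    """Linearly interpolate between subsequent warp contour node values.

    Each interval (left, right) contributes samples_per_interval values,
    the last of which equals the right-hand node value.
    """
    contour = []
    for left, right in zip(nodes, nodes[1:]):
        step = (right - left) / samples_per_interval
        for j in range(1, samples_per_interval + 1):
            contour.append(left + j * step)
    return contour
```

With this convention, a portion described by N + 1 node values comprises
N * samples_per_interval contour values.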
However, the means 520 also comprises a rescaler 550, which is configured to rescale the
"last time warp contour portion" and the "current time warp contour portion" to avoid (or
reduce, or eliminate) any discontinuities in the full time warp contour section, which is
based on the "last time warp contour portion", the "current time warp contour portion" and
the "new time warp contour portion". For this purpose, the rescaler 550 is configured to
receive the stored description of the "last time warp contour portion" and of the "current
time warp contour portion" and to jointly rescale the "last time warp contour portion" and
the "current time warp contour portion" to obtain rescaled versions of the "last time warp
contour portion" and the "current time warp contour portion". Some details regarding this
functionality will be described below.
Moreover, the rescaler 550 may also be configured to receive, for example, from a memory
not shown in Figs. 5a and 5b, a sum value associated with the "last time warp contour portion" and
another sum value associated with the "current time warp contour portion". These sum values are
sometimes designated with "last_warp_sum" and "cur_warp_sum", respectively. The
rescaler 550 is configured to rescale the sum values associated with the time warp contour
portions using the same rescale factor which the corresponding time warp contour portions
are rescaled with. Accordingly, rescaled sum values are obtained.
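The joint rescaling may be sketched as follows. The sketch assumes that the rescale
factor is chosen such that the final value of the "current time warp contour portion"
equals the predetermined starting value of the "new time warp contour portion", which
removes the discontinuity at that boundary; this choice of factor and the function name
rescale_stored_portions are assumptions.

```python
def rescale_stored_portions(last_portion, cur_portion, last_sum, cur_sum,
                            start_value=1.0):
    """Rescale both stored portions and their sum values with one common factor.

    Assumption: the factor maps the end of the current portion onto start_value,
    i.e. onto the value at which the new portion begins.
    """
    factor = start_value / cur_portion[-1]
    rescaled_last = [v * factor for v in last_portion]
    rescaled_cur = [v * factor for v in cur_portion]
    return rescaled_last, rescaled_cur, last_sum * factor, cur_sum * factor
```

Because a single factor is applied to both portions, the continuity between the "last"
and the "current" portion is preserved while the transition to the "new" portion is
made continuous.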
In some cases, the means 520 may comprise an updater 560, which is configured to
repeatedly update the time warp contour portions input into the rescaler 550 and also the
sum values input into the rescaler 550. For example, the updater 560 may be configured to
update said information at the frame rate. For example, the "new time warp contour
portion" of the present frame cycle may serve as the "current time warp contour portion" in
a next frame cycle. Similarly, the rescaled "current time warp contour portion" of the
current frame cycle may serve as the "last time warp contour portion" in a next frame
cycle. Accordingly, a memory efficient implementation is created, because the "last time
warp contour portion" of the current frame cycle may be discarded upon completion of the
"current frame cycle".
To summarize the above, the means 520 is configured to provide, for each frame cycle
(with the exception of some special frame cycles, for example, at the beginning of a frame
sequence, or at the end of a frame sequence, or in a frame in which time warping is
inactive) a description of a time warp contour section comprising a description of a "new
time warp contour portion", of a "rescaled current time warp contour portion" and of a
"rescaled last time warp contour portion". Furthermore, the means 520 may provide, for
each frame cycle (with the exception of the above-mentioned special frame cycles) a
representation of warp contour sum values, for example, comprising a "new time warp
contour portion sum value", a "rescaled current time warp contour sum value" and a
"rescaled last time warp contour sum value".
The time warp control information calculator 530 is configured to calculate the time warp
control information 512 on the basis of the reconstructed time warp contour information
522 provided by the means 520. For example, the time warp control information calculator
530 comprises a time contour calculator 570, which is configured to compute a time
contour 572 (e.g., a sample-wise representation of the time warp contour) on the basis of
the reconstructed time warp contour information. Furthermore, the time warp control
information calculator 530 comprises a sample position calculator 574, which is configured
to receive the time contour 572 and to provide, on the basis thereof, a sample position
information, for example, in the form of a sample position vector 576. The sample position
vector 576 describes the time warping performed, for example, by the re-sampler 240g.
The time warp control information calculator 530 also comprises a transition length
calculator, which is configured to derive a transition length information 582 from the
reconstructed time warp contour information. The transition length information 582 may,
for example, comprise an information describing a left transition length and an information
describing a right transition length. The transition length may, for example, depend on the
length of time segments described by the "last time warp contour portion", the "current
time warp contour portion" and the "new time warp contour portion". For example, the
transition length may be shortened (when compared to a default transition length) if the
temporal extension of a time segment described by the "last time warp contour portion" is
shorter than a temporal extension of the time segment described by the "current time warp
contour portion", or if the temporal extension of a time segment described by the "new time warp
contour portion" is shorter than the temporal extension of the time segment described by
the "current time warp contour portion".
In addition, the time warp control information calculator 530 may further comprise a first
and last position calculator 584, which is configured to calculate a so-called "first
position" and a so-called "last position" on the basis of the left and right transition length.
The "first position" and the "last position" increase the efficiency of the re-sampler, if
regions outside of these positions are identical to zero after windowing and therefore
need not be taken into account for the time warping. It should be noted here that the
sample position vector 576 comprises, for example, information used (or even required) by
the time warping performed by the re-sampler 240g. Furthermore, the left and right
transition length 582 and the "first position" and the "last position" 586 constitute
information which is, for example, used (or even required) by the windower 240e.
Accordingly, it can be said that the means 520 and the time warp control information
calculator 530 may together take over the functionality of the sample rate adjustment
240m, of the window shape adjustment 240l and of the sampling position calculation 240k.
6.2. Functional Description according to Figs. 6a and 6b
In the following, the functionality of an audio decoder comprising the means 520 and the
time warp control information calculator 530 will be described with reference to Figs. 6a
and 6b.
Figs. 6a and 6b show a flowchart of a method for decoding an encoded representation of an
audio signal, according to an embodiment of the invention. The method 600 comprises
providing a reconstructed time warp contour information, wherein providing the
reconstructed time warp contour information comprises mapping 604 codewords of an
encoded time warp information onto decoded time warp values, calculating 610 warp node
values, interpolating 620 between the warp node values and rescaling 630 one or more
previously calculated warp contour portions and one or more previously calculated warp
contour sum values. The method 600 further comprises calculating 640 time warp control
information using a "new time warp contour portion" obtained in steps 610 and 620, the
rescaled previously calculated time warp contour portions ("current time warp contour
portion", "last time warp contour portion") and also, optionally, using the rescaled
previously calculated warp contour sum values. As a result, a time contour information,
and/or a sample position information, and/or a transition length information and/or a first
position and a last position information can be obtained in the step 640.
The method 600 further comprises performing 650 time warp signal reconstruction using
the time warp control information obtained in step 640. Details regarding the time warp
signal reconstruction will be described subsequently.
The method 600 also comprises a step 660 of updating a memory, as will be described
below.
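Based on the description of the updater 560 given above, the memory update of step 660
may be sketched as a simple shift of the stored contour portions and sum values. The
data layout and the names ContourState and advance_frame are hypothetical, and the
handling of the sum values is assumed to mirror the handling of the contour portions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContourState:
    """Contour information kept in memory between two frame cycles."""
    last_portion: List[float]
    cur_portion: List[float]
    last_sum: float
    cur_sum: float


def advance_frame(rescaled_cur_portion, rescaled_cur_sum, new_portion, new_sum):
    """Build the state for the next frame cycle.

    The rescaled "current" portion becomes the "last" portion, the "new" portion
    becomes the "current" portion, and the old "last" portion is discarded.
    """
    return ContourState(
        last_portion=list(rescaled_cur_portion),
        cur_portion=list(new_portion),
        last_sum=rescaled_cur_sum,
        cur_sum=new_sum,
    )
```

Only two contour portions and two sum values need to be stored between frame cycles in
this sketch, which corresponds to the memory efficient implementation mentioned above.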
7. Detailed Description of the Algorithm
7.1. Overview
In the following, some of the algorithms performed by an audio decoder according to an
embodiment of the invention will be described in detail. For this purpose, reference is
made to Figs. 5a, 5b, 6a, 6b, 7a, 7b, 8, 9, 10a, 10b, 11, 12, 13, 14, 15 and 16.
First of all, reference is made to Fig. 7a, which shows a legend of definitions of data
elements and a legend of definitions of help elements. Moreover, reference is made to Fig.
7b, which shows a legend of definitions of constants.
Generally speaking, it can be said that the methods described here can be used for the
decoding of an audio stream which is encoded according to a time-warped modified
discrete cosine transform (TW-MDCT). Thus, when the TW-MDCT is enabled for an audio stream
(which may be indicated by a flag, for example, referred to as "twMDCT" flag, which may
be comprised in a specific configuration information), a time-warped filter bank and block
switching may replace a standard filter bank and block switching in an audio decoder.
In addition to the inverse modified discrete cosine transform (IMDCT), the time-warped
filter bank and block switching contains a time-domain-to-time-domain mapping from an
arbitrarily spaced time grid to a normal regularly spaced or linearly spaced time grid and a
corresponding adaptation of window shapes.
It should be noted here that the decoding algorithm described here may be performed, for
example, by the warp decoder 240 on the basis of the encoded representation 214 of the
spectrum and also on the basis of the encoded time warp information 232.
7.2. Definitions
With respect to the definition of data elements, help elements and constants, reference is
made to Figs. 7a and 7b.
7.3. Decoding Process - Warp Contour
The codebook indices of the warp contour nodes are decoded as follows to warp values for
the individual nodes:
1 for tw_data_present = 0, 0