Audio Signal Decoder, Time Warp Contour Data Provider, And Method Thereof
Abstract:
An audio signal decoder configured to provide a decoded audio signal representation on
the basis of an encoded audio signal representation comprising a time warp contour
evolution information comprises a time warp contour calculator, a time warp contour
data rescaler and a warp decoder. The time warp contour calculator is configured to
generate time warp contour data repeatedly restarting from a predetermined time warp
contour start value on the basis of a time warp contour evolution information describing
a temporal evolution of the time warp contour. The time warp contour data rescaler is
configured to rescale at least a portion of the time warp contour data such that a
discontinuity at a restart is avoided, reduced or eliminated in a rescaled version of the
time warp contour. The warp decoder is configured to provide the decoded audio signal
representation on the basis of the encoded audio signal representation and using the
rescaled version of the time warp contour.
Audio Signal Decoder, Time Warp Contour Data Provider, Method and Computer
Program
Background of the Invention
Embodiments according to the invention are related to an audio signal decoder. Further
embodiments according to the invention are related to a time warp contour data
provider. Further embodiments according to the invention are related to a method for
decoding an audio signal, a method for providing time warp contour data and to a
computer program.
Some embodiments according to the invention are related to methods for a time warped
MDCT transform coder.
In the following, a brief introduction will be given into the field of time warped audio
encoding, concepts of which can be applied in conjunction with some of the
embodiments of the invention.
In recent years, techniques have been developed to transform an audio signal into a
frequency domain representation, and to efficiently encode this frequency domain
representation, for example taking into account perceptual masking thresholds. This
concept of audio signal encoding is particularly efficient if the block length, for which a
set of encoded spectral coefficients is transmitted, is long, and if only a
comparatively small number of spectral coefficients are well above the global masking
threshold while a large number of spectral coefficients are near or below the global
masking threshold and can thus be neglected (or coded with minimum code length).
For example, cosine-based or sine-based modulated lapped transforms are often used in
applications for source coding due to their energy compaction properties. That is, for
harmonic tones with constant fundamental frequencies (pitch), they concentrate the
signal energy to a low number of spectral components (sub-bands), which leads to an
efficient signal representation.
Generally, the (fundamental) pitch of a signal shall be understood to be the lowest
dominant frequency distinguishable from the spectrum of the signal. In the common
speech model, the pitch is the frequency of the excitation signal modulated by the
human throat. If only a single fundamental frequency were present, the spectrum
would be extremely simple, comprising the fundamental frequency and the overtones
only. Such a spectrum could be encoded highly efficiently. For signals with varying
pitch, however, the energy corresponding to each harmonic component is spread over
several transform coefficients, thus leading to a reduction of coding efficiency.
In order to overcome this reduction of coding efficiency, the audio signal to be encoded
is effectively resampled on a non-uniform temporal grid. In the subsequent processing,
the sample positions obtained by the non-uniform resampling are processed as if they
represented values on a uniform temporal grid. This operation is commonly
denoted by the phrase 'time warping'. The sample times may be advantageously chosen
in dependence on the temporal variation of the pitch, such that a pitch variation in the
time warped version of the audio signal is smaller than a pitch variation in the original
version of the audio signal (before time warping). After time warping of the audio
signal, the time warped version of the audio signal is converted into the frequency
domain. The pitch-dependent time warping has the effect that the frequency domain
representation of the time warped audio signal typically exhibits an energy compaction
into a much smaller number of spectral components than a frequency domain
representation of the original (non time warped) audio signal.
At the decoder side, the frequency-domain representation of the time warped audio
signal is converted back to the time domain, such that a time-domain representation of
the time warped audio signal is available at the decoder side. However, in the time-
domain representation of the decoder-sided reconstructed time warped audio signal, the
original pitch variations of the encoder-sided input audio signal are not included.
Accordingly, yet another time warping by resampling of the decoder-sided
reconstructed time domain representation of the time warped audio signal is applied. In
order to obtain a good reconstruction of the encoder-sided input audio signal at the
decoder, it is desirable that the decoder-sided time warping is at least approximately the
inverse operation with respect to the encoder-sided time warping. In order to obtain an
appropriate time warping, it is desirable to have information available at the decoder
which allows for an adjustment of the decoder-sided time warping.
As it is typically required to transfer such information from the audio signal encoder
to the audio signal decoder, it is desirable to keep a bit rate required for this
transmission small while still allowing for a reliable reconstruction of the required time
warp information at the decoder side.
In view of the above discussion, there is a desire to have a concept which allows for a
reliable reconstruction of a time warp information on the basis of an efficiently encoded
representation of the time warp information.
Summary of the Invention
An embodiment according to the invention creates an audio signal decoder configured
to provide a decoded audio signal representation on the basis of an encoded audio signal
representation comprising a time warp contour evolution information. The audio signal
decoder comprises a time warp contour calculator configured to generate time warp
contour data repeatedly restarting from a predetermined time warp contour start value
on the basis of the time warp contour evolution information describing a temporal
evolution of the time warp contour. The audio signal decoder also comprises a time
warp contour rescaler configured to rescale at least a portion of the time warp contour
data such that a discontinuity at a restart is avoided, reduced or eliminated in a rescaled
version of the time warp contour. The audio signal decoder also comprises a time warp
decoder configured to provide the decoded audio signal representation on the basis of
the encoded audio signal representation and using the rescaled version of the time warp
contour.
The above described embodiment is based on the finding that the time warp contour can
be encoded with high efficiency using a representation which describes the temporal
evolution, or relative change, of the time warp contour, because the temporal variation
of the time warp contour (also designated as "evolution") is actually the characteristic
quantity of the time warp contour, while the absolute value thereof is of no importance
for a time warped audio signal encoding/decoding. However, it has been found that a
reconstruction of a time warp contour on the basis of a time warp contour evolution
information, describing a variation of the time warp contour over time, brings along the
problem that an allowable range of values in a decoder may be exceeded, for example in
the form of a numeric underflow or overflow. This is due to the fact that decoders
typically comprise a number representation having a limited resolution. Further, it has
been found that the risk of an underflow or overflow in the decoder can be eliminated
by repeatedly restarting the reconstruction of the time warp contour from a
predetermined time warp contour start value. Nevertheless, a mere restart of the
reconstruction of the time warp contour brings along the problem that there are
discontinuities in the time warp contour at the times of restart. Thus, it has been found
that a rescaling can be used to avoid, eliminate, or at least reduce this discontinuity at
the restart, where the reconstruction of the time warp contour is repeatedly restarted from the
predetermined time warp contour start value.
To summarize the above, it has been found that a block-wise continuous time warp
contour can be reconstructed without running the risk of a numeric overflow or
underflow if the reconstruction of the time warp contour is repeatedly restarted from a
predetermined time warp contour start value, and if the discontinuity arising from the
restart is reduced or eliminated by a rescale of at least a portion of the time warp
contour.
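The restart-and-rescale idea summarized above can be illustrated with a minimal Python sketch. It is not taken from the specification: the start value of 1.0, the helper names build_portion and rescale_to_end_at, and the example ratios are assumptions chosen purely for illustration.

```python
import math

START = 1.0  # assumed predetermined time warp contour start value


def build_portion(ratios, start=START):
    """Accumulate relative changes into contour values, restarting at `start`."""
    values, current = [], start
    for ratio in ratios:
        current *= ratio
        values.append(current)
    return values


def rescale_to_end_at(portion, target=START):
    """Multiply a portion by one factor so that its last value equals `target`."""
    factor = target / portion[-1]
    return [value * factor for value in portion]


first = build_portion([1.02, 1.02, 1.02])   # hypothetical evolution information
second = build_portion([1.01, 1.01, 1.01])  # restarts from START

naive = first + second                        # plain restart: jump at the join
rescaled = rescale_to_end_at(first) + second  # restart plus rescaling

print(naive[3] / naive[2])        # below 1: a spurious drop at the restart
print(rescaled[3] / rescaled[2])  # close to 1.01, the encoded relative change
```

In this sketch the relative changes inside the first portion are unchanged by the rescaling; only its absolute level moves, so that the second portion continues from it without a jump.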
Accordingly, it can be achieved that the time warp contour is always within a well-
defined range of values surrounding the time warp contour start value within a certain
temporal environment of the restart time. This is, in many cases, sufficient because
typically only a temporal portion of the time warp contour, defined relative to a current
time of audio signal reconstruction, is required for a block-wise audio signal
reconstruction, while "older" portions of the time warp contour are not required for the
present audio signal reconstruction.
To summarize the above, the embodiment described here allows for an efficient usage
of a relative time warp contour information, describing a temporal evolution of the time
warp contour, wherein a numeric overflow or underflow in the decoder can be avoided
by the repeated restart of the time warp contour, and wherein a continuity of the time
warp contour, which is often required for the audio signal reconstruction, can be
achieved even at the time of restart by an appropriate rescaling.
In the following, some preferred embodiments will be discussed, which comprise
optional improvements of the inventive concept.
In an embodiment of the invention, the time warp contour calculator is configured to
calculate, starting from a predetermined starting value and using a first relative change
information, a temporal evolution of a first portion of the time warp contour, and to
calculate, starting from the predetermined starting value and using second relative
change information, a temporal evolution of a second portion of the time warp contour,
wherein the first portion of the time warp contour and the second portion of the time
warp contour are subsequent portions of the time warp contour. Preferably, the time
warp contour rescaler is configured to rescale one of the portions of the time warp
contour, to obtain a steady transition between the first portion of the time warp contour
and the second portion of the time warp contour.
Using this concept, both the first time warp contour portion and the second time warp
contour portion can be generated starting from a well-defined predetermined starting
value, which may be identical for the reconstruction of the first time warp contour
portion and the reconstruction of the second time warp contour portion. Assuming that
the relative change information describes relative changes of the time warp contour in a
limited range, it is ensured that the first portion of the time warp contour and the second
portion of the time warp contour exhibit a limited range of values. Accordingly, a
numeric underflow or a numeric overflow can be avoided.
Further, by rescaling of one of the portions of the time warp contour, a discontinuity at
the transition from the first portion of the time warp contour to the second portion of the
time warp contour (i.e. at the restart) can be reduced or even eliminated.
In a preferred embodiment, the time warp contour rescaler is configured to rescale the
first portion of the time warp contour such that a last value of the scaled version of the
first portion of the time warp contour takes the predetermined starting value, or deviates
from the predetermined starting value by no more than a predetermined tolerance value.
In this way, it can be achieved that a value of the time warp contour, which is at the
transition from the first portion to the second portion, takes a predetermined value.
Accordingly, a range of values can be kept particularly small, because a central value is
fixed (or scaled to a predetermined value). For example, if both the first portion of the
time warp contour and the second portion of the time warp contour are ascending, a
minimum value of the rescaled version of the first portion lies below the predetermined
starting value, and an end value of the second portion lies above the predetermined
starting value. However, a maximum deviation from the predetermined starting value is
determined by a maximum of the ascent of the first portion and the ascent of the second
portion. In contrast, if the first portion and the second portion were put together in a
continuous way, without starting from the starting value and without rescaling, an end
of the second portion would deviate from the starting value by the sum of the ascent of
the first portion and the second portion.
Thus, it can be seen that a range of values (maximum deviation from the starting value)
can be reduced by scaling a central value, at the transition between the first portion and
the second portion, to take the starting value. This reduction of the range of values is
particularly advantageous, because it supports the usage of a comparatively low
resolution data format having a limited numeric range, which in turn allows for the
design of cheap and power-efficient consumer devices, which is a continuous challenge
in the field of audio coding.
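The reduction of the range of values can be checked numerically with a small Python sketch. The step ratio, the portion length and the start value of 1.0 are hypothetical; because the sketch accumulates ratios multiplicatively, the ascents compound rather than add, but the comparison is the same in spirit.

```python
START = 1.0  # assumed predetermined starting value
STEP = 1.05  # hypothetical relative change per contour value
N = 4        # hypothetical number of values per portion

# Chained reconstruction: the second portion continues where the first ended.
chained = [START * STEP ** k for k in range(1, 2 * N + 1)]

# Restarted reconstruction: both portions start from START, and the first
# portion is then rescaled so that its last value equals START.
portion = [START * STEP ** k for k in range(1, N + 1)]
factor = START / portion[-1]
restarted = [v * factor for v in portion] + portion


def max_deviation(values):
    """Largest absolute distance of any contour value from START."""
    return max(abs(v - START) for v in values)


print(max_deviation(chained))    # ascent of both portions combined
print(max_deviation(restarted))  # ascent of a single portion only
```

Both lists describe the same relative evolution; only the restarted one stays within the deviation of a single portion.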
In a preferred embodiment, the rescaler is configured to multiply warp contour data
values with a normalization factor to scale a portion of the time warp contour, or to
divide warp contour data values by a normalization factor to scale the portion of the
time warp contour. It has been found that a linear scaling (rather than, for example, an
additive shift of the time warp contour) is particularly appropriate, because a
multiplication scaling or division scaling maintains relative variations of the time warp
contour, which are relevant for the time warping, unlike the absolute values of the time
warp contour, which are of no importance.
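The difference between a multiplicative scaling and an additive shift can be seen in a short Python sketch. The contour values and the target value of 1.0 are made up for illustration.

```python
contour = [1.00, 1.02, 1.05, 1.10]  # hypothetical contour portion
target = 1.0                        # assumed value the last sample should take


def step_ratios(values):
    """Relative change between each pair of neighbouring contour values."""
    return [b / a for a, b in zip(values, values[1:])]


# Multiplicative scaling with a single normalization factor.
scaled = [v * (target / contour[-1]) for v in contour]

# Additive shift by a single offset, shown only for comparison.
shifted = [v + (target - contour[-1]) for v in contour]

print(step_ratios(contour))
print(step_ratios(scaled))   # same relative changes as the original
print(step_ratios(shifted))  # relative changes are altered
```

Both variants bring the last value to the target, but only the multiplicative one leaves the relative variations intact.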
In another preferred embodiment, the time warp contour calculator is configured to
obtain a warp contour sum value of a given portion of the time warp contour, and to
scale the given portion of the time warp contour and the warp contour sum value of the
given portion of the time warp contour using a common scaling value.
It has been found that in some cases, it is desirable to derive a warp contour sum value
from the warp contour, because such a warp contour sum value can be used for a
derivation of a time contour from the time warp contour. Thus, it is possible to use the
given time warp contour and the corresponding warp contour sum value for the
calculation of a first time contour. Further, it has been found that the scaled version of
the time warp contour and the corresponding scaled sum value may be required for a
subsequent calculation of another time contour. So, it has been found that it is not
necessary to re-compute the warp contour sum value for the rescaled version of the
given time warp contour anew, because it is possible to derive the warp contour
sum value of the rescaled version of the given portion of the warp contour by rescaling
the warp contour sum value of the original version of the given portion of the warp
contour.
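That the sum value does not have to be recomputed follows from the linearity of the sum, as the following Python sketch indicates. The portion values and the scaling factor are hypothetical, and the derivation of a time contour from the sum value is not shown.

```python
import math

portion = [1.01, 1.03, 1.06, 1.08]  # hypothetical contour portion
portion_sum = sum(portion)          # warp contour sum value of the portion

factor = 1.0 / portion[-1]          # assumed common scaling value

# Scale the portion and its stored sum value with the same factor.
scaled_portion = [v * factor for v in portion]
scaled_sum = portion_sum * factor

# Recomputing the sum from the scaled values gives the same result
# up to floating-point rounding.
print(math.isclose(scaled_sum, sum(scaled_portion)))
```

A single multiplication thus replaces a summation over all values of the portion.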
In a preferred embodiment, the audio signal decoder comprises a time contour
calculator configured to calculate a first time contour using time warp contour data
values of a first portion of the time warp contour, of a second portion of the time warp
contour and of a third portion of the time warp contour, and to calculate a second time
contour using time warp contour data values of the second portion of the time warp
contour, of the third portion of the time warp contour and of a fourth portion of the time
warp contour. In other words, a first plurality of portions of the time warp contour
(comprising three portions) is used for a calculation of the first time contour, and a
second plurality of portions (comprising three portions) is used for a calculation of the
second time contour, wherein the first plurality of portions is overlapping with the
second plurality of portions. The time warp contour calculator is configured to generate
time warp contour data of the first portion starting from a predetermined time warp
contour start value on the basis of a time warp contour evolution information describing
a temporal evolution of the first portion. Further, the time warp contour calculator is
configured to rescale the first portion of the time warp contour, such that a last value of
the first portion of the time warp contour comprises the predetermined time warp
contour start value, to generate time warp contour data of the second portion of the time
warp contour starting from the predetermined time warp contour start value on the basis
of a time warp contour evolution information describing a temporal evolution of the
second portion, and to jointly rescale the first portion and the second portion using a
common scaling factor, such that a last value of the second portion comprises the
predetermined time warp contour start value, so as to obtain jointly rescaled time warp
contour data values. The time warp contour calculator is also configured to generate
original time warp contour data values of the third portion of the time warp contour
starting from the predetermined time warp contour start value on the basis of a time
warp contour evolution information of the third portion of the time warp contour.
Accordingly, the first portion, the second portion and the third portion of the time warp
contour are generated such that they form a continuous section of the time warp
contour. Accordingly, the time contour calculator is configured to calculate the first
time contour using the jointly rescaled time warp contour data values of the first and
second time warp contour portions and the time warp contour data values of the third
time warp contour portion.
Subsequently, the time warp contour calculator is configured to jointly rescale the
second, rescaled portion and the third, original portion of the time warp contour using
another common scaling factor, such that a last value of the third portion of the time
warp contour comprises the predetermined time warp start value, so as to obtain a twice
rescaled version of the second portion and a once rescaled version of the third portion of
the time warp contour. Further, the time warp contour calculator is configured to
generate original time warp contour data values of the fourth portion of the time warp
contour starting from the predetermined time warp contour start value on the basis of a
time warp contour evolution information of the fourth portion of the time warp contour.
Further, the time contour calculator is configured to calculate the second time
contour using the twice rescaled version of the second portion, the once rescaled version
of the third portion and the original version of the fourth portion of the time warp
contour.
Thus, it can be seen that the second portion and the third portion of the time warp
contour are used both for the calculation of the first time contour and for the calculation
of the second time contour. Nevertheless, there is a rescaling of the second portion and
of the third portion between the calculation of the first time contour and the calculation
of the second time contour, in order to keep the used range of values sufficiently small
while ensuring the continuity of the time warp contour section considered for the
calculation of the respective time contours.
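The sequence of joint rescaling steps described above may be sketched in Python as a sliding window of three contour portions. The function names build_portion and advance, the start value of 1.0 and the ratio values are assumptions for illustration; the calculation of the time contours themselves is not shown.

```python
START = 1.0  # assumed predetermined time warp contour start value


def build_portion(ratios, start=START):
    """Generate original contour values of one portion, starting from START."""
    values, current = [], start
    for ratio in ratios:
        current *= ratio
        values.append(current)
    return values


def advance(window, new_ratios):
    """Keep the two newest portions, jointly rescale them with a common
    factor so that the last kept value equals START, then append a new
    portion generated from START."""
    kept = window[-2:]
    if kept:
        factor = START / kept[-1][-1]
        kept = [[v * factor for v in portion] for portion in kept]
    return kept + [build_portion(new_ratios)]


# Hypothetical evolution information for four subsequent portions.
evolution = [[1.02, 1.02], [1.01, 1.03], [0.99, 0.98], [1.00, 1.01]]

window = []
for ratios in evolution[:3]:
    window = advance(window, ratios)
first_window = window                               # portions 1, 2, 3
second_window = advance(first_window, evolution[3])  # portions 2, 3, 4

print(first_window)
print(second_window)
```

In second_window the second portion has been rescaled twice and the third portion once, while the fourth portion is in its original form, which mirrors the description above.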
In another preferred embodiment, the audio signal decoder comprises a time warp control
information calculator configured to calculate a time warp control information using a
plurality of portions of the time warp contour. The time warp control information
calculator is configured to calculate a time warp control information for the
reconstruction of a first frame of the audio signal on the basis of time warp contour data
of a first plurality of time warp contour portions, and to calculate a time warp control
information for the reconstruction of a second frame of the audio signal, which is
overlapping or non-overlapping with the first frame, on the basis of time warp contour
data of a second plurality of time warp contour portions. The first plurality of time warp
contour portions is shifted, with respect to time, when compared to the second plurality
of time warp contour portions. The first plurality of time warp contour portions
comprises at least one common time warp contour portion with the second plurality of
time warp contour portions. It has been found that the inventive rescaling approach
brings along particular advantages if overlapping sections of the time warp contour
(first plurality of time warp contour portions, and second plurality of time warp contour
portions) are used for obtaining a time warp control information for the reconstruction
of different audio frames (first audio frame and second audio frame). The continuity of
the time warp contour, which is obtained by the rescaling, brings along particular
advantages if overlapping sections of the time warp contour are used for obtaining the
time warp control information, because the usage of overlapping sections of the time
warp contour could result in severely degraded results, if there was any discontinuity of
the time warp contour.
In another preferred embodiment, the time warp contour calculator is configured to
generate a new time warp contour such that the time warp contour restarts from the
predetermined warp contour start value at a position within the first plurality of time
warp contour portions, or within the second plurality of time warp contour portions,
such that there is a discontinuity of the time warp contour at a location of the restart. To
compensate for that, the time warp contour rescaler is configured to rescale the time
warp contour such that the discontinuity is reduced or eliminated.
In another preferred embodiment, the time warp contour calculator is configured to
generate the time warp contour such that there is a first restart of the time warp contour
from the predetermined time warp contour start value at a position within the first
plurality of time warp contour portions, such that there is a first discontinuity at the
position of the first restart. In this case, the time warp contour rescaler is configured to
rescale the time warp contour such that the first discontinuity is reduced or eliminated.
The time warp contour calculator is further configured to also generate the time warp contour
such that there is a second restart of the time warp contour from the predetermined time
warp contour start value, such that there is a second discontinuity at the position of the
second restart. The rescaler is also configured to rescale the time warp contour such that
the second discontinuity is reduced or eliminated.
In other words, it is sometimes preferred to have a high number of time warp contour
restarts, for example, one restart per audio frame. In this way, the processing algorithm
can be made to be very regular. Also, the range of values can be kept very small.
In a further preferred embodiment, the time warp contour calculator is configured to periodically
restart the time warp contour starting from the predetermined time warp contour start
value, such that there is a discontinuity at the restart. The rescaler is adapted to rescale
at least a portion of the time warp contour to reduce or eliminate the discontinuity of the
time warp contour at the restart. The audio signal decoder comprises a time warp
control information calculator configured to combine rescaled time warp contour data
from before a restart and time warp contour data from after the restart, to obtain time
warp control information.
In a further preferred embodiment, the time warp contour calculator is configured to
receive encoded warp ratio information, to derive a sequence of warp ratio values
from the encoded warp ratio information, and to obtain a plurality of warp contour node
values, starting from the warp contour start value. Ratios between the warp contour start
value associated with the warp contour start node and the warp contour node values are
determined by the warp ratio values. It has been shown that the reconstruction of a time
warp contour on the basis of a sequence of warp ratio values brings along very good
results because the warp ratio values encode, in a very efficient way, the relative
variation of the time warp contour, which is the key information for the application of a
time warp. Thus, the warp ratio information has been found to be a very efficient
description of the time warp contour evolution.
In another preferred embodiment, the time warp contour calculator is configured to
compute a warp contour node value of a given warp contour node, which is spaced from
the time warp contour starting point by an intermediate warp contour node, on the basis
of a product-formation comprising a ratio between the warp contour starting value and
the warp contour node value of the intermediate warp contour node and a ratio between
the warp contour node value of the intermediate warp contour node and the warp
contour value of the given warp contour node as factors. It has been found that warp
contour node values can be calculated in a particularly efficient way using a
multiplication of a plurality of the warp ratio values. Also, usage of such a
multiplication allows for a reconstruction of a warp contour, which is well adapted to
the ideal characteristics of a warp contour.
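The product-formation can be sketched in Python as a running product over the warp ratio values. The function name node_values, the start value of 1.0 and the ratio values are assumptions; the mapping from an encoded index to a warp ratio value is not reproduced here.

```python
from itertools import accumulate
from operator import mul


def node_values(warp_ratios, start=1.0):
    """Return the start node followed by one node value per warp ratio,
    each obtained by multiplying the previous node value by the ratio."""
    return list(accumulate(warp_ratios, mul, initial=start))


ratios = [1.01, 0.99, 1.02]  # hypothetical decoded warp ratio values
nodes = node_values(ratios)

# The node two steps away from the start node is the start value times
# the product of the first two ratios.
print(nodes[2], 1.0 * ratios[0] * ratios[1])
```

The keyword argument initial of itertools.accumulate requires Python 3.8 or later.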
A further embodiment according to the invention creates a time warp contour data
provider for providing time warp contour data representing a temporal evolution of a
relative pitch of an audio signal on the basis of a time warp contour evolution
information. The time warp contour data provider comprises a time warp contour
calculator configured to generate time warp contour data on the basis of a time warp
contour evolution information describing a temporal evolution of the time warp contour.
The time warp contour calculator is configured to repeatedly or periodically restart, at
restart positions, a calculation of the time warp contour data from a predetermined time
warp contour start value, thereby creating discontinuities of the time warp contour and
reducing a range of the time warp contour data values. The time warp contour data
provider further comprises a time warp contour rescaler configured to repeatedly rescale
portions of the time warp contour, to reduce or eliminate the discontinuity at the restart
positions in rescaled sections of the time warp contour. The time warp contour data
provider is based on the same idea as the above described audio signal decoder.
A further embodiment according to the invention creates a method for providing a
decoded audio signal representation on the basis of an encoded audio signal
representation.
Yet another embodiment of the invention creates a computer program for providing a
decoded audio signal on the basis of an encoded audio signal representation.
Brief Description of the Figures
Embodiments according to the invention will subsequently be described taking reference to
the enclosed figures, in which:
Fig. 1 shows a block schematic diagram of a time warp audio encoder;
Fig. 2 shows a block schematic diagram of a time warp audio decoder;
Fig. 3 shows a block schematic diagram of an audio signal decoder, according to an
embodiment of the invention;
Fig. 4 shows a flowchart of a method for providing a decoded audio signal
representation, according to an embodiment of the invention;
Fig. 5 shows a detailed extract from a block schematic diagram of an audio signal
decoder according to an embodiment of the invention;
Fig. 6 shows a detailed extract of a flowchart of a method for providing a decoded
audio signal representation according to an embodiment of the invention;
Figs. 7a, 7b show a graphical representation of a reconstruction of a time warp
contour, according to an embodiment of the invention;
Fig. 8 shows another graphical representation of a reconstruction of a time warp
contour, according to an embodiment of the invention;
Figs. 9a and 9b show algorithms for the calculation of the time warp contour;
Fig. 9c shows a table of a mapping from a time warp ratio index to a time warp
ratio value;
Figs. 10a and 10b show representations of algorithms for the calculation of a time
contour, a sample position, a transition length, a "first position" and a "last position";
Fig. 10c shows a representation of algorithms for a window shape calculation;
Figs. 10d and 10e show a representation of algorithms for an application of a
window;
Fig. 10f shows a representation of algorithms for a time-varying resampling;
Fig. 10g shows a graphical representation of algorithms for a post time warping
frame processing and for an overlapping and adding;
Figs. 11a and 11b show a legend;
Fig. 12 shows a graphical representation of a time contour, which can be extracted from
a time warp contour;
Fig. 13 shows a detailed block schematic diagram of an apparatus for providing a warp
contour, according to an embodiment of the invention;
Fig. 14 shows a block schematic diagram of an audio signal decoder, according to
another embodiment of the invention;
Fig. 15 shows a block schematic diagram of another time warp contour calculator
according to an embodiment of the invention;
Figs. 16a, 16b show a graphical representation of a computation of time warp node
values, according to an embodiment of the invention;
Fig. 17 shows a block schematic diagram of another audio signal encoder, according to
an embodiment of the invention;
Fig. 18 shows a block schematic diagram of another audio signal decoder, according to
an embodiment of the invention; and
Figs. 19a-19f show representations of syntax elements of an audio stream, according to
an embodiment of the invention.
Detailed Description of the Embodiments
1. Time warp audio encoder according to Fig. 1
As the present invention is related to time warp audio encoding and time warp audio
decoding, a short overview will be given of a prototype time warp audio encoder and a
time warp audio decoder, in which the present invention can be applied.
Fig. 1 shows a block schematic diagram of a time warp audio encoder, into which some
aspects and embodiments of the invention can be integrated. The audio signal encoder
100 of Fig. 1 is configured to receive an input audio signal 110 and to provide an
encoded representation of the input audio signal 110 in a sequence of frames. The audio
encoder 100 comprises a sampler 104, which is adapted to sample the audio signal 110
(input signal) to derive signal blocks (sampled representations) 105 used as a basis for a
frequency domain transform. The audio encoder 100 further comprises a transform
window calculator 106, adapted to derive scaling windows for the sampled
representations 105 output from the sampler 104. These are input into a windower 108
which is adapted to apply the scaling windows to the sampled representations 105
derived by the sampler 104. In some embodiments, the audio encoder 100 may
additionally comprise a frequency domain transformer 108a, in order to derive a
frequency-domain representation (for example in the form of transform coefficients) of
the sampled and scaled representations 105. The frequency domain representations may
be processed or further transmitted as an encoded representation of the audio signal 110.
The audio encoder 100 further uses a pitch contour 112 of the audio signal 110, which
may be provided to the audio encoder 100 or which may be derived by the audio
encoder 100. The audio encoder 100 may therefore optionally comprise a pitch
estimator for deriving the pitch contour 112. The sampler 104 may operate on a
continuous representation of the input audio signal 110. Alternatively, the sampler 104
may operate on an already sampled representation of the input audio signal 110. In the
latter case, the sampler 104 may resample the audio signal 110. The sampler 104 may
for example be adapted to time warp neighboring overlapping audio blocks such that the
overlapping portion has a constant pitch or reduced pitch variation within each of the
input blocks after the sampling.
The transform window calculator 106 derives the scaling windows for the audio blocks
depending on the time warping performed by the sampler 104. To this end, an optional
sampling rate adjustment block 114 may be present in order to define a time warping
rule used by the sampler, which is then also provided to the transform window
calculator 106. In an alternative embodiment the sampling rate adjustment block 114
may be omitted and the pitch contour 112 may be directly provided to the transform
window calculator 106, which may itself perform the appropriate calculations.
Furthermore, the sampler 104 may communicate the applied sampling to the transform
window calculator 106 in order to enable the calculation of appropriate scaling
windows.
The time warping is performed such that a pitch contour of sampled audio blocks time
warped and sampled by the sampler 104 is more constant than the pitch contour of the
original audio signal 110 within the input block.
2. Time warp audio decoder according to Fig. 2
Fig. 2 shows a block schematic diagram of a time warp audio decoder 200 for
processing a first time warped and sampled, or simply time warped representation of a
first and second frame of an audio signal having a sequence of frames in which the
second frame follows the first frame and for further processing a second time warped
representation of the second frame and of a third frame following the second frame in
the sequence of frames. The audio decoder 200 comprises a transform window
calculator 210 adapted to derive a first scaling window for the first time warped
representation 211a using information on a pitch contour 212 of the first and the second
frame and to derive a second scaling window for the second time warped representation
211b using information on a pitch contour of the second and the third frame, wherein
the scaling windows may have identical numbers of samples and wherein the first
number of samples used to fade out the first scaling window may differ from a second
number of samples used to fade in the second scaling window. The audio decoder 200
further comprises a windower 216 adapted to apply the first scaling window to the first
time warped representation and to apply the second scaling window to the second time
warped representation. The audio decoder 200 furthermore comprises a resampler 218
adapted to inversely time warp the first scaled time warped representation to derive a
first sampled representation using the information on the pitch contour of the first and
the second frame and to inversely time warp the second scaled time warped
representation to derive a second sampled representation using the information on the
pitch contour of the second and the third frame such that a portion of the first sampled
representation corresponding to the second frame comprises a pitch contour which
equals, within a predetermined tolerance range, a pitch contour of the portion of the
second sampled representation corresponding to the second frame. In order to derive the
scaling window, the transform window calculator 210 may either receive the pitch
contour 212 directly or receive information on the time warping from an optional
sample rate adjustor 220, which receives the pitch contour 212 and which derives an
inverse time warping strategy in such a manner that the pitch becomes the same in the
overlapping regions, and optionally the different fading lengths of overlapping window
parts before the inverse time warping become the same length after the inverse time
warping.
The audio decoder 200 furthermore comprises an optional adder 230, which is adapted
to add the portion of the first sampled representation corresponding to the second frame
and the portion of the second sampled representation corresponding to the second frame
to derive a reconstructed representation of the second frame of the audio signal as an
output signal 242. The first time-warped representation and the second time-warped
representation could, in one embodiment, be provided as an input to the audio decoder
200. In a further embodiment, the audio decoder 200 may, optionally, comprise an
inverse frequency domain transformer 240, which may derive the first and the second
time warped representations from frequency domain representations of the first and
second time warped representations provided to the input of the inverse frequency
domain transformer 240.
3. Time warp audio signal decoder according to Fig. 3
In the following, a simplified audio signal decoder will be described. Fig. 3 shows a
block schematic diagram of this simplified audio signal decoder 300. The audio signal
decoder 300 is configured to receive the encoded audio signal representation 310, and to
provide, on the basis thereof, a decoded audio signal representation 312, wherein the
encoded audio signal representation 310 comprises a time warp contour evolution
information. The audio signal decoder 300 comprises a time warp contour calculator
320 configured to generate time warp contour data 322 on the basis of the time warp
contour evolution information 316, which time warp contour evolution information
describes a temporal evolution of the time warp contour, and which time warp contour
evolution information is comprised by the encoded audio signal representation 310.
When deriving the time warp contour data 322 from the time warp contour evolution
information 316, the time warp contour calculator 320 repeatedly restarts from a
predetermined time warp contour start value, as will be described in detail in the
following. The restart may have the consequence that the time warp contour comprises
discontinuities (step-wise changes which are larger than the steps encoded by the time
warp contour evolution information 316). The audio signal decoder 300 further
comprises a time warp contour data rescaler 330 which is configured to rescale at least a
portion of the time warp contour data 322, such that a discontinuity at a restart of the
time warp contour calculation is avoided, reduced or eliminated in a rescaled version
332 of the time warp contour.
The audio signal decoder 300 also comprises a warp decoder 340 configured to provide
a decoded audio signal representation 312 on the basis of the encoded audio signal
representation 310 and using the rescaled version 332 of the time warp contour.
To put the audio signal decoder 300 into the context of time warp audio decoding, it
should be noted that the encoded audio signal representation 310 may comprise an
encoded representation of the transform coefficients 211 and also an encoded
representation of the pitch contour 212 (also designated as time warp contour). The time
warp contour calculator 320 and the time warp contour data rescaler 330 may be
configured to provide a reconstructed representation of the pitch contour 212 in the
form of the rescaled version 332 of the time warp contour. The warp decoder 340 may,
for example, take over the functionality of the windowing 216, the resampling 218, the
sample rate adjustment 220 and the window shape adjustment 210. Further, the warp
decoder 340 may, for example, optionally, comprise the functionality of the inverse
transform 240 and of the overlap/add 230, such that the decoded audio signal
representation 312 may be equivalent to the output audio signal 232 of the time warp
audio decoder 200.
By applying the rescaling to the time warp contour data 322, a continuous (or at least
approximately continuous) rescaled version 332 of the time warp contour can be
obtained, thereby ensuring that a numeric overflow or underflow is avoided even when
using an efficient-to-encode relative time warp contour evolution information.
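By way of a non-limiting illustration, the following Python sketch contrasts an accumulation of relative ratio values without any restart with the restart-and-rescale approach described above. The function names, the constant ratio of 1.05, the number of 16 nodes per frame, the number of 1000 frames and the flat initial state are assumptions chosen for illustration only and are not taken from the embodiments.

```python
import math


def accumulate_without_restart(ratio, nodes_per_frame, num_frames):
    """Multiply ratio values onto a single running value, never restarting."""
    value = 1.0
    for _ in range(nodes_per_frame * num_frames):
        value *= ratio  # float multiplication saturates to inf on overflow
    return value


def peak_with_restart(ratio, nodes_per_frame, num_frames, start_value=1.0):
    """Restart every frame at start_value and rescale the two stored portions.

    Returns the largest contour value ever held in the three-portion section.
    """
    last = [start_value]  # assumed flat initial state
    cur = [start_value]
    peak = start_value
    for _frame in range(num_frames):
        # New portion: restart from the predetermined start value.
        new = [start_value]
        for _node in range(nodes_per_frame):
            new.append(new[-1] * ratio)
        # Rescale the stored portions so that the end of "cur" meets the start of "new".
        norm_fac = new[0] / cur[-1]
        last = [v * norm_fac for v in last]
        cur = [v * norm_fac for v in cur]
        peak = max(peak, max(last + cur + new))
        # Memory update for the next frame cycle.
        last, cur = cur, new
    return peak


unbounded = accumulate_without_restart(1.05, 16, 1000)
bounded = peak_with_restart(1.05, 16, 1000)
print(math.isinf(unbounded), bounded)
```

In this sketch, the running value without restart exceeds the range of a double-precision number, whereas the values held in the three-portion section remain within a small range around the start value.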
4. Method for providing a decoded audio signal representation according to Fig. 4.
Fig. 4 shows a flowchart of a method for providing a decoded audio signal
representation on the basis of an encoded audio signal representation comprising a time
warp contour evolution information, which can be performed by the apparatus 300
according to Fig. 3. The method 400 comprises a first step 410 of generating the time
warp contour data, repeatedly restarting from a predetermined time warp contour start
value, on the basis of a time warp contour evolution information describing a temporal
evolution of the time warp contour.
The method 400 further comprises a step 420 of rescaling at least a portion of the time
warp contour data, such that a discontinuity at one of the restarts is avoided, reduced or
eliminated in a rescaled version of the time warp contour.
The method 400 further comprises a step 430 of providing a decoded audio signal
representation on the basis of the encoded audio signal representation using the rescaled
version of the time warp contour.
5. Detailed description of an embodiment according to the invention taking
reference to Figs. 5-9.
In the following, an embodiment according to the invention will be described in detail
taking reference to Figs. 5-9.
Fig. 5 shows a block schematic diagram of an apparatus 500 for providing a time warp
control information 512 on the basis of a time warp contour evolution information 510.
The apparatus 500 comprises a means 520 for providing a reconstructed time warp
contour information 522 on the basis of the time warp contour evolution information
510, and a time warp control information calculator 530 to provide the time warp
control information 512 on the basis of the reconstructed time warp contour information
522.
Means 520 for Providing the Reconstructed Time Warp Contour Information
In the following, the structure and functionality of the means 520 will be described. The
means 520 comprises a time warp contour calculator 540, which is configured to receive
the time warp contour evolution information 510 and to provide, on the basis thereof, a
new warp contour portion information 542. For example, a set of time warp contour
evolution information may be transmitted to the apparatus 500 for each frame of the
audio signal to be reconstructed. Nevertheless, the set of time warp contour evolution
information 510 associated with a frame of the audio signal to be reconstructed may be
used for the reconstruction of a plurality of frames of the audio signal. Similarly, a
plurality of sets of time warp contour evolution information may be used for the
reconstruction of the audio content of a single frame of the audio signal, as will be
discussed in detail in the following. As a conclusion, it can be stated that in some
embodiments, the time warp contour evolution information 510 may be updated at the
same rate at which sets of the transform domain coefficients of the audio signal to be
reconstructed are updated (one time warp contour portion per frame of the audio signal).
The time warp contour calculator 540 comprises a warp node value calculator 544,
which is configured to compute a plurality (or temporal sequence) of warp contour node
values on the basis of a plurality (or temporal sequence) of time warp contour ratio
values (or time warp ratio indices), wherein the time warp ratio values (or indices) are
comprised by the time warp contour evolution information 510. For this purpose, the
warp node value calculator 544 is configured to start the provision of the time warp
contour node values at a predetermined starting value (for example 1) and to calculate
subsequent time warp contour node values using the time warp contour ratio values, as
will be discussed below.
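By way of a non-limiting illustration, the computation of the warp contour node values from the ratio values may be sketched in Python as follows. The function name and the default start value of 1 are assumptions for illustration; the mapping from transmitted ratio indices to ratio values is not shown here.

```python
def warp_node_values(ratios, start_value=1.0):
    """Accumulate relative ratio values into warp contour node values.

    The first node is the predetermined start value; each subsequent node
    is the previous node multiplied by the next ratio value.
    """
    nodes = [start_value]
    for ratio in ratios:
        nodes.append(nodes[-1] * ratio)
    return nodes


print(warp_node_values([2.0, 0.5, 0.5]))
```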
Further, the time warp contour calculator 540 optionally comprises an interpolator 548
which is configured to interpolate between subsequent time warp contour node values.
Accordingly, the description 542 of the new time warp contour portion is obtained,
wherein the new time warp contour portion typically starts from the predetermined
starting value used by the warp node value calculator 544. Furthermore, the means 520
is configured to consider additional time warp contour portions, namely a so-called "last
time warp contour portion" and a so-called "current time warp contour portion" for the
provision of a full time warp contour section. For this purpose, means 520 is configured
to store the so-called "last time warp contour portion" and the so-called "current time
warp contour portion" in a memory not shown in Fig. 5.
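The interpolation between subsequent time warp contour node values mentioned above may, for example, be sketched in Python as follows. A linear interpolation, the function name and the parameter samples_per_interval are assumptions for illustration only; other interpolation rules and sample grids are possible.

```python
def interpolate_nodes(nodes, samples_per_interval):
    """Linearly interpolate between subsequent warp contour node values.

    Each interval contributes its left node and the intermediate samples;
    the final node is appended once at the end, so the portion starts at
    the first node value and ends at the last node value.
    """
    contour = []
    for left, right in zip(nodes, nodes[1:]):
        step = (right - left) / samples_per_interval
        for k in range(samples_per_interval):
            contour.append(left + k * step)
    contour.append(nodes[-1])
    return contour


print(interpolate_nodes([1.0, 2.0, 1.0], 2))
```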
However, the means 520 also comprises a rescaler 550, which is configured to rescale
the "last time warp contour portion" and the "current time warp contour portion" to
avoid (or reduce, or eliminate) any discontinuities in the full time warp contour section,
which is based on the "last time warp contour portion", the "current time warp contour
portion" and the "new time warp contour portion". For this purpose, the rescaler 550 is
configured to receive the stored description of the "last time warp contour portion" and
of the "current time warp contour portion" and to jointly rescale the "last time warp
contour portion" and the "current time warp contour portion", to obtain rescaled
versions of the "last time warp contour portion" and the "current time warp contour
portion". Details regarding the rescaling performed by the rescaler 550 will be discussed
below, taking reference to Figs. 7a, 7b and 8.
Moreover, the rescaler 550 may also be configured to receive, for example from a
memory not shown in Fig. 5, a sum value associated with the "last time warp contour
portion" and another sum value associated with the "current time warp contour portion".
These sum values are sometimes designated as "last_warp_sum" and
"cur_warp_sum", respectively. The rescaler 550 is configured to rescale the sum values
associated with the time warp contour portions using the same rescale factor which the
corresponding time warp contour portions are rescaled with. Accordingly, rescaled sum
values are obtained.
In some cases, the means 520 may comprise an updater 560, which is configured to
repeatedly update the time warp contour portions input into the rescaler 550 and also the
sum values input into the rescaler 550. For example, the updater 560 may be configured
to update said information at the frame rate. For example, the "new time warp contour
portion" of the present frame cycle may serve as the "current time warp contour
portion" in a next frame cycle. Similarly, the rescaled "current time warp contour
portion" of the current frame cycle may serve as the "last time warp contour portion" in
a next frame cycle. Accordingly, a memory efficient implementation is created, because
the "last time warp contour portion" of the current frame cycle may be discarded upon
completion of the current frame cycle.
To summarize the above, the means 520 is configured to provide, for each frame cycle
(with the exception of some special frame cycles, for example at the beginning of a
frame sequence, or at the end of a frame sequence, or in a frame in which time warping
is inactive) a description of a time warp contour section comprising a description of a
"new time warp contour portion", of a "rescaled current time warp contour portion" and
of a "rescaled last time warp contour portion". Furthermore, the means 520 may
provide, for each frame cycle (with the exception of the above mentioned special frame
cycle) a representation of warp contour sum values, for example, comprising a "new
time warp contour portion sum value", a "rescaled current time warp contour sum
value" and a "rescaled last time warp contour sum value".
The time warp control information calculator 530 is configured to calculate the time
warp control information 512 on the basis of the reconstructed time warp contour
information provided by the means 520. For example, the time warp control information
calculator comprises a time contour calculator 570, which is configured to compute a
time contour 572 on the basis of the reconstructed time warp contour information.
Further, the time warp control information calculator 530 comprises a sample position
calculator 574, which is configured to receive the time contour 572 and to provide, on
the basis thereof, a sample position information, for example in the form of a sample
position vector 576. The sample position vector 576 describes the time warping
performed, for example, by the resampler 218.
The time warp control information calculator 530 also comprises a transition length
calculator, which is configured to derive a transition length information from the
reconstructed time warp contour information. The transition length information 582
may, for example, comprise an information describing a left transition length and an
information describing a right transition length. The transition length may, for example,
depend on a length of time segments described by the "last time warp contour portion",
the "current time warp contour portion" and the "new time warp contour portion". For
example, the transition length may be shortened (when compared to a default transition
length) if the temporal extension of a time segment described by the "last time warp
contour portion" is shorter than a temporal extension of the time segment described by
the "current time warp contour portion", or if the temporal extension of a time segment
described by the "new time warp contour portion" is shorter than the temporal extension
of the time segment described by the "current time warp contour portion".
In addition, the time warp control information calculator 530 may further comprise a
first and last position calculator 584, which is configured to calculate a so-called "first
position" and a so-called "last position" on the basis of the left and right transition
length. The "first position" and the "last position" increase the efficiency of the
resampler, as regions outside of these positions are identical to zero after windowing
and are therefore not needed to be taken into account for the time warping. It should be
noted here that the sample position vector 576 comprises, for example, information
required by the time warping performed by the resampler 218. Furthermore, the left and
right transition length 582 and the "first position" and "last position" 586 constitute
information, which is, for example, required by the windower 216.
Accordingly, it can be said that the means 520 and the time warp control information
calculator 530 may together take over the functionality of the sample rate adjustment
220, of the window shape adjustment 210 and of the sampling position calculation 219.
In the following, the functionality of an audio decoder comprising the means 520 and the
time warp control information calculator 530 will be described with reference to Figs. 6,
7a, 7b, 8, 9a-9c, 10a-10g, 11a, 11b and 12.
Fig. 6 shows a flowchart of a method for decoding an encoded representation of an
audio signal, according to an embodiment of the invention. The method 600 comprises
providing a reconstructed time warp contour information, wherein providing the
reconstructed time warp contour information comprises calculating 610 warp node
values, interpolating 620 between the warp node values and rescaling 630 one or more
previously calculated warp contour portions and one or more previously calculated warp
contour sum values. The method 600 further comprises calculating 640 time warp
control information using a "new time warp contour portion" obtained in steps 610 and
620, the rescaled previously calculated time warp contour portions ("current time warp
contour portion" and "last time warp contour portion") and also, optionally, using the
rescaled previously calculated warp contour sum values. As a result, a time contour
information, and/or a sample position information, and/or a transition length
information and/or a first position and last position information can be obtained in the
step 640.
The method 600 further comprises performing 650 time warped signal reconstruction
using the time warp control information obtained in step 640. Details regarding the time
warp signal reconstruction will be described subsequently.
The method 600 also comprises a step 660 of updating a memory, as will be described
below.
Calculation of the Time Warp Contour Portions
In the following, details regarding the calculation of the time warp contour portions will
be described, taking reference to Figs. 7a, 7b, 8, 9a, 9b, 9c.
It will be assumed that an initial state is present, which is illustrated in a graphical
representation 710 of Fig. 7a. As can be seen, a first warp contour portion 716 (warp
contour portion 1) and a second warp contour portion 718 (warp contour portion 2) are
present. Each of the warp contour portions typically comprises a plurality of discrete
warp contour data values, which are typically stored in a memory. The different warp
contour data values are associated with time values, wherein a time is shown at an
abscissa 712. A magnitude of the warp contour data values is shown at an ordinate 714.
As can be seen, the first warp contour portion has an end value of 1, and the second
warp contour portion has a start value of 1, wherein the value of 1 can be considered as
a "predetermined value". It should be noted that the first warp contour portion 716 can
be considered as a "last time warp contour portion" (also designated as
"last_warp_contour"), while the second warp contour portion 718 can be considered as
a "current time warp contour portion" (also referred to as "cur_warp_contour").
Starting from the initial state, a new warp contour portion is calculated, for example, in
the steps 610, 620 of the method 600. Accordingly, warp contour data values of the
third warp contour portion (also designated as "warp contour portion 3" or "new time
warp contour portion" or "new_warp_contour") are calculated. The calculation may, for
example, be separated in a calculation of warp node values, according to an algorithm
910 shown in Fig. 9a, and an interpolation 620 between the warp node values, according
to an algorithm 920 shown in Fig. 9a. Accordingly, a new warp contour portion 722 is
obtained, which starts from the predetermined value (for example 1) and which is
shown in a graphical representation 720 of Fig. 7a. As can be seen, the first time warp
contour portion 716, the second time warp contour portion 718 and the third new time
warp contour portion are associated with subsequent and contiguous time intervals.
Further, it can be seen that there is a discontinuity 724 between an end point 718b of the
second time warp contour portion 718 and a start point 722a of the third time warp
contour portion.
It should be noted here that the discontinuity 724 typically comprises a magnitude
which is larger than a variation between any two temporally adjacent warp contour data
values of the time warp contour within a time warp contour portion. This is due to the
fact that the start value 722a of the third time warp contour portion 722 is forced to the
predetermined value (e.g. 1), independent from the end value 718b of the second time
warp contour portion 718. It should be noted that the discontinuity 724 is therefore
larger than the unavoidable variation between two adjacent, discrete warp contour data
values.
Nevertheless, this discontinuity between the second time warp contour portion 718 and
the third time warp contour portion 722 would be detrimental for the further use of the
time warp contour data values.
Accordingly, the first time warp contour portion and the second time warp contour
portion are jointly rescaled in the step 630 of the method 600. For example, the time
warp contour data values of the first time warp contour portion 716 and the time warp
contour data values of the second time warp contour portion 718 are rescaled by
multiplication with a rescaling factor (also designated as "norm_fac"). Accordingly, a
rescaled version 716' of the first time warp contour portion 716 is obtained, and also a
rescaled version 718' of the second time warp contour portion 718 is obtained. In
contrast, the third time warp contour portion is typically left unaffected in this rescaling
step, as can be seen in a graphical representation 730 of Fig. 7a. Rescaling can be
performed such that the rescaled end point 718b' comprises, at least approximately, the
same data value as the start point 722a of the third time warp contour portion 722.
Accordingly, the rescaled version 716' of the first time warp contour portion, the
rescaled version 718' of the second time warp contour portion and the third time warp
contour portion 722 together form an (approximately) continuous time warp contour
section. In particular, the scaling can be performed such that a difference between the
data value of the rescaled end point 718b' and the start point 722a is not larger than a
maximum of the difference between any two adjacent data values of the time warp
contour portions 716', 718', 722.
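By way of a non-limiting illustration, the joint rescaling of the first and second time warp contour portions may be sketched in Python as follows. The function name is hypothetical, and deriving the rescaling factor "norm_fac" as the quotient of the start value of the new portion and the end value of the current portion is an assumption which is consistent with the requirement that the rescaled end point matches the start point of the new portion.

```python
def rescale_previous_portions(last_contour, cur_contour, new_start_value=1.0):
    """Jointly rescale the "last" and "current" portions by one common factor.

    After rescaling, the end value of the current portion equals the start
    value of the new portion, which itself is left unchanged.
    """
    norm_fac = new_start_value / cur_contour[-1]
    rescaled_last = [v * norm_fac for v in last_contour]
    rescaled_cur = [v * norm_fac for v in cur_contour]
    return rescaled_last, rescaled_cur, norm_fac


last, cur, norm_fac = rescale_previous_portions([0.5, 1.0], [1.0, 2.0])
print(last, cur, norm_fac)
```

Since both portions are multiplied by the same factor, the continuity between the first and the second portion, which was established in the preceding frame cycle, is maintained.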
Accordingly, the approximately continuous time warp contour section comprising the
rescaled time warp contour portions 716', 718' and the original time warp contour
portion 722 is used for the calculation of the time warp control information, which is
performed in the step 640. For example, time warp control information can be computed
for an audio frame temporally associated with the second time warp contour portion
718.
Once the time warp control information has been calculated in the step 640, a time-warped
signal reconstruction can be performed in a step 650, which will be explained in
more detail below.
Subsequently, it is required to obtain time warp control information for a next audio
frame. For this purpose, the rescaled version 716' of the first time warp contour portion
may be discarded to save memory, because it is not needed anymore. However, the
rescaled version 716' may naturally also be saved for any purpose. Moreover, the
rescaled version 718' of the second time warp contour portion takes the place of the
"last time warp contour portion" for the new calculation, as can be seen in a graphical
representation 740 of Fig. 7b. Further, the third time warp contour portion 722, which
took the place of the "new time warp contour portion" in the previous calculation, takes
the role of the "current time warp contour portion" for a next calculation. The
association is shown in the graphical representation 740.
Subsequent to this update of the memory (step 660 of the method 600), a new time warp
contour portion 752 is calculated, as can be seen in the graphical representation 750. For
this purpose, steps 610 and 620 of the method 600 may be re-executed with new input
data. The fourth time warp contour portion 752 takes over the role of the "new time
warp contour portion" for now. As can be seen, there is typically a discontinuity
between an end point 722b of the third time warp contour portion and a start point 752a
of the fourth time warp contour portion 752. This discontinuity 754 is reduced or
eliminated by a subsequent rescaling (step 630 of the method 600) of the rescaled
version 718' of the second time warp contour portion and of the original version of the
third time warp contour portion 722. Accordingly, a twice-rescaled version 718" of the
second time warp contour portion and a once rescaled version 722' of the third time
warp contour portion are obtained, as can be seen from a graphical representation 760 of
Fig. 7b. As can be seen, the time warp contour portions 718", 722', 752 form an at least
approximately continuous time warp contour section, which can be used for the
calculation of time warp control information in a re-execution of the step 640. For
example, a time warp control information can be calculated on the basis of the time
warp contour portions 718", 722', 752, which time warp control information is
associated with an audio signal time frame centered on the third time warp contour
portion.
It should be noted that in some cases it is desirable to have an associated warp contour
sum value for each of the time warp contour portions. For example, a first warp contour
sum value may be associated with the first time warp contour portion, a second warp
contour sum value may be associated with the second time warp contour portion, and so
on. The warp contour sum values may, for example, be used for the calculation of the
time warp control information in the step 640.
For example, the warp contour sum value may represent a sum of the warp contour data
values of a respective time warp contour portion. However, as the time warp contour
portions are scaled, it is sometimes desirable to also scale the time warp contour sum
value, such that the time warp contour sum value follows the characteristic of its
associated time warp contour portion. Accordingly, a warp contour sum value
associated with the second time warp contour portion 718 may be scaled (for example
by the same scaling factor) when the second time warp contour portion 718 is scaled to
obtain the scaled version 718' thereof. Similarly, the warp contour sum value associated
with the first time warp contour portion 716 may be scaled (for example with the same
scaling factor) when the first time warp contour portion 716 is scaled to obtain the
scaled version 716' thereof, if desired.
Further, a re-association (or memory re-allocation) may be performed when proceeding
to the consideration of a new time warp contour portion. For example, the warp contour
sum value associated with the scaled version 718' of the second time warp contour
portion, which takes the role of a "current time warp contour sum value" for the
calculation of the time warp control information associated with the time warp contour
portions 716', 718', 722 may be considered as a "last time warp sum value" for the
calculation of a time warp control information associated with the time warp contour
portions 718", 722', 752. Similarly, the warp contour sum value associated with the
third time warp contour portion 722 may be considered as a "new warp contour sum
value" for the calculation of the time warp control information associated with time
warp contour portions 716', 718', 722 and may be mapped to act as a "current warp
contour sum value" for the calculation of the time warp control information associated
with the time warp contour portions 718", 722', 752. Further, the newly calculated
warp contour sum value of the fourth time warp contour portion 752 may take the role
of the "new warp contour sum value" for the calculation of the time warp control
information associated with the time warp contour portions 718", 722', 752.
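By way of a non-limiting illustration, one frame cycle comprising the rescaling of the stored portions, the rescaling of the associated sum values and the subsequent re-association may be sketched in Python as follows. The class name, the method name and the flat initial state are assumptions for illustration; the attribute names merely follow the designations "last_warp_sum" and "cur_warp_sum" used above.

```python
class WarpContourMemory:
    """Stores the "last" and "current" contour portions and their sum values."""

    def __init__(self, portion_length, start_value=1.0):
        flat = [start_value] * portion_length  # assumed flat initial state
        self.last_warp_contour = list(flat)
        self.cur_warp_contour = list(flat)
        self.last_warp_sum = sum(flat)
        self.cur_warp_sum = sum(flat)

    def process_frame(self, new_warp_contour):
        """Rescale stored data, return the full section and sums, then update."""
        norm_fac = new_warp_contour[0] / self.cur_warp_contour[-1]
        self.last_warp_contour = [v * norm_fac for v in self.last_warp_contour]
        self.cur_warp_contour = [v * norm_fac for v in self.cur_warp_contour]
        # The sum values are scaled with the same factor as their portions.
        self.last_warp_sum *= norm_fac
        self.cur_warp_sum *= norm_fac
        new_warp_sum = sum(new_warp_contour)

        section = (self.last_warp_contour + self.cur_warp_contour
                   + list(new_warp_contour))
        sums = (self.last_warp_sum, self.cur_warp_sum, new_warp_sum)

        # Re-association: "current" becomes "last", "new" becomes "current".
        self.last_warp_contour = self.cur_warp_contour
        self.cur_warp_contour = list(new_warp_contour)
        self.last_warp_sum = self.cur_warp_sum
        self.cur_warp_sum = new_warp_sum
        return section, sums


memory = WarpContourMemory(portion_length=2)
print(memory.process_frame([1.0, 2.0]))
print(memory.process_frame([1.0, 0.5]))
```

Scaling the stored sum values instead of recomputing them avoids a renewed summation over the rescaled portions in every frame cycle.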
Example according to Fig. 8
Fig. 8 shows a graphical representation illustrating a problem which is solved by the
embodiments according to the invention. A first graphical representation 810 shows a
temporal evolution of a reconstructed relative pitch over time, which is obtained in
some conventional embodiments. An abscissa 812 describes the time, an ordinate 814
describes the relative pitch. A curve 816 shows the temporal evolution of the relative
pitch over time, which could be reconstructed from a relative pitch information.
Regarding the reconstruction of the relative pitch contour, it should be noted that for the
application of the time warped modified discrete cosine transform (MDCT) only the
knowledge of the relative variation of the pitch within the actual frame is necessary. In
order to understand this, reference is made to the calculation steps for obtaining the time
contour from the relative pitch contour, which lead to an identical time contour for
scaled versions of the same relative pitch contour. Therefore, it is sufficient to only
encode the relative instead of an absolute pitch value, which increases the coding
efficiency. To further increase the efficiency, the actual quantized value is not the
relative pitch but the relative change in pitch, i.e., the ratio of the current relative pitch
over the previous relative pitch (as will be discussed in detail in the following). In some
frames, where, for example, the signal exhibits no harmonic structure at all, no time
warping might be desired. In such cases, an additional flag may optionally indicate a
flat pitch contour instead of coding this flat contour with the aforementioned method.
Since in real world signals the amount of such frames is typically high enough, the
trade-off between the additional bit added at all times and the bits saved for non-warped
frames is in favor of the bit savings.
The start value for the calculation of the pitch variation (relative pitch contour, or time
warp contour) can be chosen arbitrarily and may even differ between the encoder and the decoder. Due
to the nature of the time warped MDCT (TW-MDCT) different start values of the pitch
variation still yield the same sample positions and adapted window shapes to perform
the TW-MDCT.
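The independence from the start value may be illustrated by a small Python sketch. It is assumed here, for illustration only, that the time contour is formed as a running sum of the contour values which is normalized by the total sum; the function name time_contour and the parameter span are hypothetical.

```python
def time_contour(warp_contour, span):
    """Map a warp contour to a time contour covering [0, span].

    Assumed form: a running sum of the contour values, normalized by the
    total sum, so that the last value equals `span`.
    """
    total = sum(warp_contour)
    running = 0.0
    contour = [0.0]
    for value in warp_contour:
        running += value
        contour.append(span * running / total)
    return contour
```

Under this assumption, a common factor applied to all contour values appears both in the running sum and in the total sum and therefore cancels, so that a contour started at 1 and a contour started at any other positive value lead to the same time contour up to rounding.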
For example, an (audio) encoder gets a pitch contour for every node which is expressed
as actual pitch lag in samples in conjunction with an optional voiced/unvoiced
specification, which was, for example, obtained by applying a pitch estimation and
voiced/unvoiced decision known from speech coding. If for the current node the
classification is set to voiced, or no voiced/unvoiced decision is available, the encoder
calculates the ratio between the actual pitch lag and the pitch lag of the preceding node and quantizes it, or just sets the ratio to
1 if unvoiced. Another example might be that the pitch variation is estimated directly by
an appropriate method (for example signal variation estimation).
In the decoder, the start value for the first relative pitch at the start of the coded audio is
set to an arbitrary value, for example to 1. Therefore, the decoded relative pitch contour
is no longer in the same absolute range as the encoder pitch contour, but a scaled
version of it. Still, as described above, the TW-MDCT algorithm leads to the same
sample positions and window shapes. Furthermore, the encoder might decide, if the
encoded pitch ratios would yield a flat pitch contour, not to send the fully coded
contour, but set the activePitchData flag to 0 instead, saving bits in this frame (for
example saving numPitchbits * numPitches bits in this frame).
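The relation between the encoder pitch contour and the decoded relative pitch contour may be sketched as follows in Python. The sketch is illustrative only: quantization of the ratios is omitted, the contour is given as generic pitch values per node, and the function names encode_pitch_ratios and decode_relative_pitch are hypothetical.

```python
def encode_pitch_ratios(pitch_values):
    """Encoder side: ratio of each pitch value over the preceding one
    (quantization is omitted in this sketch)."""
    return [cur / prev for prev, cur in zip(pitch_values, pitch_values[1:])]


def decode_relative_pitch(ratios, start_value=1.0):
    """Decoder side: rebuild a relative pitch contour, starting from an
    arbitrary start value, by successive multiplication with the ratios."""
    contour = [start_value]
    for ratio in ratios:
        contour.append(contour[-1] * ratio)
    return contour
```

For a start value of 1, the decoded contour of this sketch equals the encoder contour divided by its first value, i.e. a scaled version of the encoder contour.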
In the following, the problems will be discussed which occur in the absence of the
inventive pitch contour renormalization. As mentioned above, for the TW-MDCT, only
the relative pitch change within a certain limited time span around the current block is
needed for the computation of the time warping and the correct window shape
adaptation (see the explanations above). The time warping follows the decoded contour
for segments where a pitch change has been detected, and stays constant in all other
cases (see the graphical representation 810 of Fig. 8). For the calculation of the window
and sampling positions of one block, three consecutive relative pitch contour segments
(for example three time warp contour portions) are needed, wherein the third one is the
one newly transmitted in the frame (designated as "new time warp contour portion")
and the other two are buffered from the past (for example designated as "last time warp
contour portion" and "current time warp contour portion").
To get an example, reference is made, for example, to the explanations which were
made with reference to Figs. 7a and 7b, and also to the graphical representations 810,
860 of Fig. 8. To calculate, for example, the sampling positions of the window for (or
associated with) frame 1, which extends from frame 0 to frame 2, the pitch contours of
(or associated with) frame 0, 1 and 2 are needed. In the bit stream, only the pitch
information for frame 2 is sent in the current frame, and the two others are taken from
the past. As explained herein, the pitch contour can be continued by applying the first
decoded relative pitch ratio to the last pitch of frame 1 to obtain the pitch at the first
node of frame 2, and so on. Due to the nature of the signal, it is possible that, if the
pitch contour is simply continued (i.e., if the newly transmitted part of the contour is
attached to the existing two parts without any modification), a range overflow in the
coder's internal number format occurs after a certain time. For example, a signal might
start with a segment of strong harmonic characteristics and a high pitch value at the
beginning which is decreasing throughout the segment, leading to a decreasing relative
pitch. Then, a segment with no pitch information can follow, so that the relative pitch
keeps constant. Then again, a harmonic section can start with an absolute pitch that is
higher than the last absolute pitch of the previous segment, and again go downwards.
However, if one simply continues the relative pitch, it is the same as at the end of the
last harmonic segment and will go down further, and so on. If the signal is strong
enough and has in its harmonic segments an overall tendency to go either up or down
(like shown in the graphical representation 810 of Fig. 8), sooner or later the relative
pitch reaches the border of a range of the internal number format. It is well known from
speech coding that speech signals indeed exhibit such a characteristic. Therefore, it
comes as no surprise that the encoding of a concatenated set of real world signals
including speech actually exceeded the range of the float values used for the relative
pitch after a relatively short amount of time when using the conventional method
described above.
To summarize, for an audio signal segment (or frame) for which a pitch can be
determined, an appropriate evolution of the relative pitch contour (or time warp
contour) could be determined. For audio signal segments (or audio signal frames) for
which a pitch cannot be determined (for example because the audio signal segments are
noise-like) the relative pitch contour (or time warp contour) could be kept constant.
Accordingly, if there was an imbalance between audio segments with increasing pitch
and decreasing pitch, the relative pitch contour (or time warp contour) would either run
into a numeric underflow or a numeric overflow.
For example, in the graphical representation 810 a relative pitch contour is shown for
the case that there is a plurality of relative pitch contour portions 820a, 820b, 820c,
820d with decreasing pitch and some audio segments 822a, 822b without pitch, but no
audio segments with increasing pitch. Accordingly, it can be seen that the relative pitch
contour 816 runs into a numeric underflow (at least under very adverse circumstances).
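The numeric underflow may be illustrated by the following Python sketch, which continues a contour without any renormalization. The sketch uses double precision values and a constant per-node ratio, both of which are assumptions made for illustration; the function name nodes_until_underflow is hypothetical.

```python
import sys


def nodes_until_underflow(ratio, start_value=1.0, max_nodes=1_000_000):
    """Count the nodes a simply continued contour survives before its
    value drops below the smallest positive normal double.

    Returns None if this does not happen within max_nodes nodes.
    """
    value = start_value
    for node in range(1, max_nodes + 1):
        value *= ratio
        if value < sys.float_info.min:  # smallest positive normal double
            return node
    return None
```

With a ratio slightly below 1 at every node, the sketch reports a finite node count, whereas a flat contour (ratio of 1) never leaves the number range. A single precision format, having a smaller exponent range, would be left correspondingly earlier.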
In the following, a solution for this problem will be described. To prevent the above-
mentioned problems, in particular the numeric underflow or overflow, a periodic
relative pitch contour renormalization has been introduced according to an aspect of the
invention. Since the calculation of the warped time contour and the window shapes
relies only on the relative change over the aforementioned three relative pitch contour
segments (also designated as "time warp contour portions"), as explained herein, it is
possible to normalize this contour (for example, the time warp contour, which may be
composed of three pieces of "time warp contour portions") for every frame (for example
of the audio signal) anew with the same outcome.
For this, the reference was, for example, chosen to be the last sample of the second
contour segment (also designated as "time warp contour portion"), and the contour is
now normalized (for example, multiplicatively in the linear domain) in such a way
that this sample has a value of 1.0 (see the graphical representation 860 of Fig. 8).
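The effect of the per-frame renormalization on the number range may be sketched as follows in Python. The sketch is a simplified model under assumptions: every frame carries a decreasing contour with a constant per-node ratio, only node values (not interpolated samples) are stored, and the function name smallest_stored_value is hypothetical.

```python
def smallest_stored_value(num_frames, nodes_per_frame, ratio, renormalize):
    """Return the smallest value ever held in the contour memory.

    The memory holds two past portions; a new portion is attached in
    every frame. If `renormalize` is set, the past portions are first
    scaled so that their last value equals 1.0.
    """
    past = [1.0] * (2 * nodes_per_frame)
    smallest = 1.0
    for _ in range(num_frames):
        if renormalize:
            factor = 1.0 / past[-1]
            past = [factor * v for v in past]
        # Continue the contour from the last stored value.
        value = past[-1]
        new = []
        for _ in range(nodes_per_frame):
            value *= ratio
            new.append(value)
        smallest = min(smallest, min(past), min(new))
        # Drop the oldest portion and keep the two most recent ones.
        past = past[nodes_per_frame:] + new
    return smallest
```

In this model the stored values keep shrinking from frame to frame without renormalization, while with renormalization they remain within a range determined by the change over three contour portions only.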
The graphical representation 860 of Fig. 8 represents the relative pitch contour
normalization. An abscissa 862 shows the time, subdivided in frames (frames 0, 1, 2).
An ordinate 864 describes the value of the relative pitch contour.
A relative pitch contour before normalization is designated with 870 and covers two
frames (for example frame number 0 and frame number 1). A new relative pitch contour
segment (also designated as "time warp contour portion") starting from the
predetermined relative pitch contour starting value (or time warp contour starting value)
is designated with 874. As can be seen, the restart of the new relative pitch contour
segment 874 from the predetermined relative pitch contour starting value (e.g. 1) brings
along a discontinuity between the relative pitch contour segment 870 preceding the
restart point-in-time and the new relative pitch contour segment 874, which is
designated with 878. This discontinuity would bring along a severe problem for the
derivation of any time warp control information from the contour and will possibly
result in audio distortions. Therefore, a previously obtained relative pitch contour
segment 870 preceding the restart point-in-time is rescaled (or normalized), to
obtain a rescaled relative pitch contour segment 870'. The normalization is performed
such that the last sample of the relative pitch contour segment 870 is scaled to the
predetermined relative pitch contour start value (e.g. of 1.0).
Detailed Description of the Algorithm
In the following, some of the algorithms performed by an audio decoder according to an
embodiment of the invention will be described in detail. For this purpose, reference will
be made to Figs. 5, 6, 9a, 9b, 9c and 10a-10g. Further, reference is made to the legend
of data elements, help elements and constants of Figs. 11a and 11b.
Generally speaking, it can be said that the method described here can be used for
decoding an audio stream which is encoded according to a time warped modified
discrete cosine transform. Thus, when the TW-MDCT is enabled for the audio stream
(which may be indicated by a flag, for example referred to as "twMdct" flag, which may
be comprised in a specific configuration information), a time warped filter bank and
block switching may replace a standard filter bank and block switching. Additionally to
the inverse modified discrete cosine transform (IMDCT) the time warped filter bank and
block switching contains a time domain to time domain mapping from an arbitrarily
spaced time grid to the normal regularly spaced time grid and a corresponding
adaptation of window shapes.
In the following, the decoding process will be described. In a first step, the warp
contour is decoded. The warp contour may be, for example, encoded using codebook
indices of warp contour nodes. The codebook indices of the warp contour nodes are
decoded, for example, using the algorithm shown in a graphical representation 910 of
Fig. 9a. According to said algorithm, warp ratio values (warp_value_tbl) are derived
from warp ratio codebook indices (tw_ratio), for example using a mapping defined by a
mapping table 990 of Fig. 9c. As can be seen from the algorithm shown as reference
numeral 910, the warp node values may be set to a constant predetermined value, if a
flag (tw_data_present) indicates that time warp data is not present. In contrast, if the
flag indicates that time warp data is present, a first warp node value can be set to the
predetermined time warp contour starting value (e.g. 1). Subsequent warp node values
(of a time warp contour portion) can be determined on the basis of a formation of a
product of multiple time warp ratio values. For example, a warp node value of a node
immediately following the first warp node (i=0) may be equal to a first warp ratio value
(if the starting value is 1) or equal to a product of the first warp ratio value and the
starting value. Subsequent time warp node values (i=2,3,..., num_tw_nodes) are
computed by forming a product of multiple time warp ratio values (optionally taking
into consideration the starting value, if the starting value differs from 1). Naturally, the
order of the product formation is arbitrary. However, it is advantageous to derive a
(i+1)-th warp node value from an i-th warp node value by multiplying the i-th warp
node value with a single warp ratio value describing a ratio between two subsequent
node values of the time warp contour.
As can be seen from the algorithm shown at reference numeral 910, there may be
multiple warp ratio codebook indices for a single time warp contour portion over a
single audio frame (wherein there may be a 1-to-1 correspondence between time warp
contour portions and audio frames).
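The derivation of the warp node values described above may be sketched as follows in Python. This is a minimal sketch and not a reproduction of the algorithm 910 of Fig. 9a: the table EXAMPLE_WARP_VALUE_TBL contains placeholder values chosen for illustration and is not the mapping table 990 of Fig. 9c, the function name decode_warp_node_values is hypothetical, and it is assumed that the constant value used in the absence of time warp data equals the starting value.

```python
# Placeholder ratio table for illustration only (not the table of Fig. 9c).
EXAMPLE_WARP_VALUE_TBL = [0.97, 0.98, 0.99, 1.0, 1.01, 1.02, 1.03, 1.04]


def decode_warp_node_values(tw_data_present, tw_ratio, warp_value_tbl,
                            num_tw_nodes, start_value=1.0):
    """Return num_tw_nodes + 1 warp node values for one contour portion."""
    if not tw_data_present:
        # No time warp data: flat contour at the predetermined value.
        return [start_value] * (num_tw_nodes + 1)
    values = [start_value]
    for i in range(num_tw_nodes):
        # Each node value follows from its predecessor by multiplication
        # with a single warp ratio value looked up via the codebook index.
        values.append(values[i] * warp_value_tbl[tw_ratio[i]])
    return values
```

The sketch uses the recursive form, in which each node value is obtained from the preceding node value, rather than forming the complete product of ratio values anew for each node.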
To summarize, a plurality of time warp node values can be obtained for a given time
warp contour portion (or a given audio frame) in the step 610, for example using the
warp node value calculator 544. Subsequently, a linear interpolation can be performed
between the time warp node values (warp_node_values[i]). For example, to obtain the
time warp contour data values of the "new time warp contour portion"
(new_warp_contour) the algorithm shown at reference numeral 920 in Fig. 9a can be
used. For example, the number of samples of the new time warp contour portion is
equal to half the number of the time domain samples of an inverse modified discrete
cosine transform. Regarding this issue, it should be noted that adjacent audio signal
frames are typically shifted (at least approximately) by half the number of the time
domain samples of the MDCT or IMDCT. In other words, to obtain the sample-wise
(N_long samples) new_warp_contour[], the warp_node_values[] are interpolated
linearly between the equally spaced (interp_dist apart) nodes using the algorithm shown
at reference numeral 920.
The interpolation may, for example, be performed by the interpolator 548 of the
apparatus of Fig. 5, or in the step 620 of the algorithm 600.
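The linear interpolation between the node values may be sketched as follows in Python. The sketch rests on assumptions which are not taken from Fig. 9a: the number of samples n_long is an integer multiple of the number of node intervals, and the samples of each interval are taken at the offsets 1 to interp_dist, such that the first node value itself is not output and the last sample equals the last node value. The function name interpolate_warp_contour is hypothetical.

```python
def interpolate_warp_contour(warp_node_values, n_long):
    """Linearly interpolate node values to n_long contour samples."""
    num_tw_nodes = len(warp_node_values) - 1
    if num_tw_nodes < 1 or n_long % num_tw_nodes != 0:
        raise ValueError("n_long must be a multiple of the number of node intervals")
    interp_dist = n_long // num_tw_nodes  # samples between adjacent nodes
    new_warp_contour = []
    for i in range(num_tw_nodes):
        # Per-sample increment within the interval between node i and i+1.
        step = (warp_node_values[i + 1] - warp_node_values[i]) / interp_dist
        for j in range(1, interp_dist + 1):
            new_warp_contour.append(warp_node_values[i] + j * step)
    return new_warp_contour
```

With this choice of sample offsets, a new portion starting from the starting value of 1 attaches to a past contour whose last value has been rescaled to 1 without repeating that value.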
Before obtaining the full warp contour for this frame (i.e. for the frame presently under
consideration) the buffered values from the past are rescaled so that the last warp value
of the past_warp_contour[] equals 1 (or any other predetermined value, which is
preferably equal to the starting value of the new time warp contour portion).
It should be noted here that the term "past warp contour" preferably comprises the
above-described "last time warp contour portion" and the above-described "current time
warp contour portion". It should also be noted that the "past warp contour" typically
comprises a length which is equal to a number of time domain samples of the IMDCT,
such that values of the "past warp contour" are designated with indices between 0 and
2*n_long-1. Thus, "past_warp_contour[2*n_long-1]" designates the last warp value of the
"past warp contour". Accordingly, a normalization factor "norm_fac" can be calculated
according to an equation shown at reference numeral 930 in Fig. 9a. Thus, the past warp
contour (comprising the "last time warp contour portion" and the "current time warp
contour portion") can be multiplicatively rescaled according to the equation shown at
reference numeral 932 in Fig. 9a. In addition, the "last warp contour sum value"
(last_warp_sum) and the "current warp contour sum value" (cur_warp_sum) can be
multiplicatively rescaled, as shown in reference numerals 934 and 936 in Fig. 9a. The
rescaling can be performed by the rescaler 550 of Fig. 5, or in step 630 of the method
600 of Fig. 6.
It should be noted that the normalization described here, for example at reference
numeral 930, could be modified, for example, by replacing the starting value of "1"
by any other desired predetermined value.
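The rescaling of the past warp contour and of the associated sum values may be sketched as follows in Python. The sketch follows the description given above rather than the exact notation of Fig. 9a; the function name rescale_past and the parameter start_value are hypothetical.

```python
def rescale_past(past_warp_contour, last_warp_sum, cur_warp_sum,
                 start_value=1.0):
    """Rescale the buffered contour so that its last value equals
    start_value, and rescale both stored sum values by the same factor."""
    norm_fac = start_value / past_warp_contour[-1]
    rescaled_contour = [norm_fac * v for v in past_warp_contour]
    return rescaled_contour, norm_fac * last_warp_sum, norm_fac * cur_warp_sum
```

Since the contour values and the sum values are multiplied by the same factor, the sum values remain consistent with the rescaled contour portions and need not be recomputed.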
By applying the normalization, a "full warp_contour[]", also designated as a "time warp
contour section", is obtained by concatenating the "past_warp_contour" and the
"new_warp_contour". Thus, three time warp contour portions ("last time warp contour
portion", "current time warp contour portion", and "new time warp contour portion")
form the "full warp contour", which may be applied in further steps of the calculation.
In addition, a warp contour sum value (new_warp_sum) is calculated, for example, as a
sum over all "new_warp_contour[]" values. For example, a new warp contour sum
value can be calculated according to the algorithms shown at reference numeral 940 in
Fig. 9a.
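The formation of the full warp contour and of the new warp contour sum value may be sketched as follows in Python. The function name assemble_full_warp_contour is hypothetical; the sketch assumes that the past warp contour has already been rescaled.

```python
def assemble_full_warp_contour(past_warp_contour, new_warp_contour):
    """Concatenate the (rescaled) past contour with the new portion and
    compute the sum over the values of the new portion."""
    warp_contour = list(past_warp_contour) + list(new_warp_contour)
    new_warp_sum = sum(new_warp_contour)
    return warp_contour, new_warp_sum
```

For a past warp contour of 2*n_long values and a new portion of n_long values, the resulting full warp contour comprises 3*n_long values.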
Following the above-described calculations, the input information required by the time
warp control information calculator 330 or by the step 640 of the method 600 is
available. Accordingly, the calculation 640 of the time warp control information can be
performed, for example by the time warp control information calculator 530. Also, the
time warped signal reconstruction 650 can be performed by the audio decoder. Both the
calculation 640 and the time-warped signal reconstruction 650 will be explained in
more detail below.
However, it is important to note that the present algorithm proceeds iteratively. It is
therefore computationally efficient to update a memory. For example, it is possible to
discard information about the last time warp contour portion. Further, it is
recommendable to use the present "current time warp contour portion" as a "last time
warp contour portion" in a next calculation cycle. Further, it is recommendable to use
the present "new time warp contour portion" as a "current time warp contour portion" in
a next calculation cycle. This assignment can be made using the equation shown at
reference numeral 950 in Fig. 9b, (wherein warp_contour[n] describes the present "new
time warp contour portion" for 2* n_long