Apparatus, Method And Computer Program For Upmixing A Downmix Audio Signal Using A Phase Value Smoothing
Abstract:
An apparatus for upmixing a downmix audio signal describing one or more downmix audio channels into an upmixed audio signal describing a plurality of upmixed audio channels comprises an upmixer and a parameter determinator. The upmixer is configured to apply temporally variable upmix parameters to upmix the downmix audio signal in order to obtain the upmixed audio signal, wherein the temporally variable upmix parameters comprise temporally variable smoothened phase values. The parameter determinator is configured to obtain one or more temporally smoothened upmix parameters for usage by the upmixer on the basis of a quantized upmix parameter input information. The parameter determinator is configured to combine a scaled version of a previous smoothened phase value with a scaled version of an input phase information using a phase change limitation algorithm, to determine a current smoothened phase value on the basis of the previous smoothened phase value and the phase input information.
Specification
Apparatus, Method and Computer Program for Upmixing a Downmix Audio Signal
using a Phase Value Smoothing
Technical Field
Embodiments according to the invention are related to an apparatus, a method, and a
computer program for upmixing a downmix audio signal.
Some embodiments according to the invention are related to an adaptive phase parameter
smoothing for parametric multi-channel audio coding.
Background of the Invention
In the following, the context of the invention will be described. Recent development in the
area of parametric audio coding delivers techniques for jointly coding a multi-channel
audio (e.g. 5.1) signal into one (or more) downmix channels plus a side information
stream. These techniques are known as Binaural Cue Coding, Parametric Stereo, and
MPEG Surround, among others.
A number of publications describe the so-called "Binaural Cue Coding" parametric multi-
channel coding approach; see, for example, references [1], [2], [3], [4], and [5].
"Parametric Stereo" is a related technique for the parametric coding of a two-channel
stereo signal based on a transmitted mono signal plus parameter side information, see, for
example, references [6] and [7].
"MPEG Surround" is an ISO standard for parametric multi-channel coding, see, for
example, reference [8].
The above-mentioned techniques are based on transmitting the relevant perceptual cues for
a human's spatial hearing in a compact form to the receiver together with the associated
mono or stereo downmix-signal. Typical cues can be inter-channel level differences (ILD),
inter-channel correlation or coherence (ICC), as well as inter-channel time differences
(ITD), inter-channel phase differences (IPD), and overall phase differences (OPD).
These parameters are, in some cases, transmitted in a frequency and time resolution
adapted to the human's auditory resolution.
For the transmission, the parameters are typically quantized (or, in some cases, even have
to be quantized), where often (especially for low-bit rate scenarios) a rather coarse
quantization is used.
The update interval in time is determined by the encoder, depending on the signal
characteristics. This means that parameters are not transmitted for every sample of the
downmix signal. In other words, in some cases a transmission rate (or transmission
frequency, or update rate) of parameters describing the above-mentioned cues may be
smaller than a transmission rate (or transmission frequency, or update rate) of audio
samples (or groups of audio samples).
Instead of transmitting both inter-channel phase differences (IPDs) and overall phase
differences (OPDs), it is also possible to only transmit inter-channel phase differences
(IPDs) and estimate the overall phase differences (OPDs) in the decoder.
Since the decoder may, in some cases, have to apply the parameters continuously over time
in a gapless manner, e.g. to each sample (or audio sample), intermediate parameters may
need to be derived at decoder side, typically by interpolation between past and current
parameter sets.
Some conventional interpolation approaches, however, result in poor audio quality.
In the following, a generic binaural cue coding scheme will be described, taking reference
to Fig. 7. Fig. 7 shows a block schematic diagram of a binaural cue coding transmission
system 800, which comprises a binaural cue coding encoder 810 and a binaural cue coding
decoder 820. The binaural cue coding encoder 810 may, for example, receive a plurality of
audio signals 812a, 812b, and 812c. Further, the binaural cue coding encoder 810 is
configured to downmix the audio input signals 812a-812c using a downmixer 814 to obtain
a downmix signal 816, which may, for example, be a sum signal, and which may be
designated with "AS" or "X". Further, the binaural cue coding encoder 810 is configured
to analyze the audio input signals 812a-812c using an analyzer 818 to obtain the side
information signal 819 ("SI"). The sum signal 816 and the side information signal 819 are
transmitted from the binaural cue coding encoder 810 to the binaural cue coding decoder
820. The binaural cue coding decoder 820 may be configured to synthesize a multi-channel
audio output signal comprising, for example, audio channels y1, y2, ..., yN on the basis of
the sum signal 816 and inter-channel cues 824. For this purpose, the binaural cue coding
decoder 820 may comprise a binaural cue coding synthesizer 822, which receives the sum
signal 816 and the inter-channel cues 824, and provides the audio signals y1, y2, ..., yN.
The binaural cue coding decoder 820 further comprises a side information processor 826,
which is configured to receive the side information 819 and, optionally, a user input 827.
The side information processor 826 is configured to provide the inter-channel cues 824 on
the basis of the side information 819 and the optional user input 827.
To summarize, the audio input signals are analyzed and downmixed. The sum signal plus
the side information is transmitted to the decoder. The inter-channel cues are generated
from the side information and local user input. The binaural cue coding synthesis generates
the multi-channel audio output signal.
For details, reference is made to the articles "Binaural Cue Coding Part II: Schemes and
applications," by C. Faller and F. Baumgarte (published in: IEEE Transactions on Speech
and Audio Processing, vol. 11, no. 6, Nov. 2003).
However, it has been found that many conventional binaural cue coding decoders provide
multi-channel output audio signals with degraded quality if the side information is
quantized coarsely or with insufficient resolution.
In view of this problem, there is a need for an improved concept of upmixing a downmix
audio signal into an upmixed audio signal, which reduces a degradation of the hearing
impression if the side information describing a phase relationship between different
channels of the upmix signal is quantized with comparatively low resolution.
Summary of the Invention
An embodiment according to the invention creates an apparatus for upmixing a downmix
audio signal describing one or more downmix audio channels into an upmixed audio signal
describing a plurality of upmixed audio channels. The apparatus comprises an upmixer
configured to apply temporally variable upmix parameters to upmix the downmix signal in
order to obtain the upmixed audio signal. The temporally variable upmix parameters
comprise temporally variable smoothened phase values. The apparatus further comprises a
parameter determinator, which parameter determinator is configured to obtain one or more
temporally smoothened upmix parameters to be used by the upmixer on the basis of a
quantized upmix parameter input information. The parameter determinator is configured to
combine a scaled version of a previous smoothened phase value with a scaled version of an
input phase information using a phase change limitation algorithm, to determine a current
smoothened phase value on the basis of the previous smoothened phase value and the input
phase information.
This embodiment according to the invention is based on the finding that audible artifacts in
the upmix signals can be reduced or even avoided by combining a scaled version of a
previous smoothened phase value with a scaled version of an input phase information
using a phase change limitation algorithm, because the consideration of the previous
smoothened phase value in combination with a phase change limitation algorithm allows to
keep discontinuities of the smoothened phase values reasonably small. A reduction of
discontinuities between subsequent smoothened phase values (for example, the previous
smoothened phase value and the current smoothened phase value), in turn, helps to avoid
(or keep sufficiently small) audible frequency variation at a transition between portions of
an audio signal to which the subsequent phase values (e.g. the previous smoothened phase
value and the current smoothened phase value) are applied.
To summarize the above, the invention creates a general concept of adaptive phase
processing for parametric multi-channel audio coding. Embodiments according to the
invention supersede other techniques by reducing artifacts in the output signal caused by
coarse quantization or rapid changes of phase parameters.
In a preferred embodiment, the parameter determinator is configured to combine the scaled
version of the previous smoothened phase value with the scaled version of the input phase
information, such that the current smoothened phase value is in a smaller angle region out
of a first angle region and a second angle region, wherein the first angle region extends, in
a mathematically positive direction, from a first start direction defined by the previous
smoothened phase value to a first end direction defined by the phase input information, and
wherein the second angle region extends, in the mathematically positive direction, from a
second start direction defined by the input phase information to a second end direction
defined by the previous smoothened phase value. Accordingly, in some embodiments of
the invention, a phase variation, which is introduced by a recursive (infinite impulse
response type) smoothening of phase values, is kept as small as possible. Accordingly,
audible artifacts are kept as small as possible. For example, the apparatus may be
configured to ensure that the current smoothened phase value is located within a smaller
angle range out of two angle ranges, wherein a first of the two angle ranges covers more
than 180° and wherein a second of the angle ranges covers less than 180°, and wherein
the two angle ranges together cover 360°. Accordingly, it is ensured by the phase change
limitation algorithm that the phase difference between the previous smoothened phase
value and the current smoothened phase value is smaller than 180° and, preferably, even
smaller than 90°. This helps to keep audible artifacts as small as possible.
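As an illustration of the phase change limitation algorithm described above, the following sketch (not part of the specification; the function names and the wrapping convention are purely illustrative) maps the difference between the input phase information and the previous smoothened phase value into the smaller of the two angle regions, so that a subsequent smoothing step moves the phase through that smaller region:

```python
import math

def wrap_to_pi(angle):
    """Map an angle difference into [-pi, pi), i.e. the smaller angle region."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def limited_phase_step(prev_smoothed, input_phase):
    """Signed phase difference from the previous smoothened phase value to the
    input phase information, taken along the smaller angle region."""
    return wrap_to_pi(input_phase - prev_smoothed)
```

For example, for a previous smoothened phase value of 0.1 rad and an input phase of 2π − 0.1 rad, the limited step is −0.2 rad rather than a jump of almost a full turn.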
In a preferred embodiment, the parameter determinator is configured to select a
combination rule out of a plurality of different combination rules in dependence on a
difference between the phase input information and the previous smoothened phase value,
and to determine the current smoothened phase value using the selected combination rule.
Accordingly, it can be achieved that an appropriate combination rule is chosen, which
ensures that the phase change between the previous smoothened phase value and the
current smoothened phase value is below a predetermined threshold or, more generally,
sufficiently small or as small as possible. Accordingly, the inventive apparatus outperforms
comparable apparatus, which have a fixed combination rule.
In a preferred embodiment, the parameter determinator is configured to select a basic
combination rule if a difference between the phase input information and the previous
smoothened phase value is in a range between -π and +π, and to select one or more
different phase adaptation combination rules otherwise. The basic combination rule defines
a linear combination without a constant summand of the scaled version of the phase input
information and the scaled version of the previous smoothened phase value. The one or
more phase adaptation combination rules define a linear combination, taking into account a
constant phase adaptation summand, of the scaled version of the input phase information
and the scaled version of the previous smoothened phase value. Accordingly, an
advantageous and easy-to-implement linear combination of the previous smoothened phase
value and the input phase information can be performed, wherein an additional summand
can be selectively applied if the difference between the previous smoothened phase value
and the input phase information takes a comparatively large value (greater than π or
smaller than -π). Accordingly, the problematic cases in which there is a large difference
between the previous smoothened phase value and the input phase information can be
handled with specifically adapted phase adaptation combination rules, which allows
keeping the phase changes between subsequent smoothened phase values sufficiently
small.
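The selection between the basic combination rule and a phase adaptation combination rule can be sketched as follows (illustrative only; the smoothing weight `alpha` and the function name are assumptions, not taken from the specification). Note that shifting the input phase by ±2π before the linear combination is equivalent to a linear combination with a constant phase adaptation summand of ±2π·alpha:

```python
import math

def smooth_phase(prev_smoothed, input_phase, alpha=0.25):
    """One recursive smoothing step with a selectable combination rule.

    If the raw difference already lies in (-pi, pi), the basic rule (a plain
    linear combination without a constant summand) is used; otherwise a
    +/-2*pi shift is applied so that the blend proceeds through the smaller
    angle region (phase adaptation combination rule)."""
    diff = input_phase - prev_smoothed
    if diff > math.pi:        # phase adaptation rule: subtract one full turn
        input_phase -= 2.0 * math.pi
    elif diff < -math.pi:     # phase adaptation rule: add one full turn
        input_phase += 2.0 * math.pi
    return (1.0 - alpha) * prev_smoothed + alpha * input_phase
```

With a previous value of 0.1 rad and an input of 2π − 0.1 rad, the basic rule would swing the result towards π, whereas the phase adaptation rule yields 0.05 rad, i.e. a small step towards −0.1 rad.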
In a preferred embodiment, the parameter determinator comprises a smoothing controller,
wherein the smoothing controller is configured to selectively disable a phase value
smoothing functionality if a difference between the smoothened phase quantity and the
corresponding input phase quantity is larger than a predetermined threshold value.
Accordingly, the phase value smoothing functionality can be disabled if there is a large
change in the input phase information. Typically, very large changes of the input phase
information indicate that it is, indeed, desired to perform a non-smoothened phase change,
because comparatively large changes of the input phase information (significantly larger
than a quantization step) are often related to specific sound events within an audio signal.
Thus, a smoothing of the phase values, which improves the auditory impression in most
cases, would be detrimental in this specific case. Accordingly, the auditory impression can
even be improved by selectively disabling the phase value smoothing functionality.
In a preferred embodiment, the smoothing controller is configured to evaluate, as the
smoothened phase quantity, a difference between two smoothened phase values and to
evaluate, as the corresponding input phase quantity, a difference between two input phase
values corresponding to the two smoothened phase values. It has been found that in some
cases, a difference between phase values, which are associated with different (upmixed)
channels of a multi-channel audio signal, is a particularly meaningful quantity to decide
whether the phase value smoothing functionality should be enabled or disabled.
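A minimal sketch of such a smoothing controller, assuming (purely illustratively) that the evaluated quantities are inter-channel phase differences and that the predetermined threshold is π/2:

```python
import math

def smoothing_enabled(smoothed_ipd, input_ipd, threshold=math.pi / 2.0):
    """Keep the phase value smoothing functionality enabled only while the
    smoothened inter-channel phase difference stays within the threshold of
    the corresponding (non-smoothened) input quantity."""
    # wrap the deviation into [-pi, pi) before comparing magnitudes
    deviation = abs((smoothed_ipd - input_ipd + math.pi) % (2.0 * math.pi) - math.pi)
    return deviation <= threshold
```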
In a preferred embodiment, the upmixer is configured to apply, for a given time portion,
different temporally smoothened phase rotations, which are defined by different
smoothened phase values, to obtain signals of the upmixed audio channels having an inter-
channel phase difference if a smoothing function (or a phase value smoothing
functionality) is enabled, and to apply temporally non-smoothened phase rotations, which
are defined by different non-smoothened phase values, to obtain signals of the different
upmixed audio channels having an inter-channel phase difference if the smoothing
function (or the phase value smoothing functionality) is disabled. In this case, the
parameter determinator comprises a smoothing controller, which smoothing controller is
configured to selectively disable the phase value smoothing functionality if a
difference between the smoothened phase values applied to obtain the signals of the
different upmixed audio channels differs from a non-smoothened inter-channel phase
difference value, which is received by the upmixer or derived from a received information
by the upmixer, by more than a predetermined threshold value. It has been found that a
selective deactivation of the phase value smoothing functionality is particularly useful in
terms of improving the hearing impression if an inter-channel phase difference value is
evaluated as the criterion for activating and deactivating the phase value smoothing
functionality.
In a preferred embodiment, the parameter determinator is configured to adjust the filter
time constant for determining a sequence of the smoothened phase values in dependence
on a current difference between a smoothened phase value and a corresponding input phase
value. By adjusting the filter time constant, it can be achieved that a sufficiently small settling
time is obtained for very large changes of the input phase value, while keeping the
smoothing characteristics sufficiently good for lower and medium changes of the input
phase value. This functionality brings along particular advantages, because a
comparatively small (or, at most, medium-sized) change of the input phase value is often
caused by a quantization granularity. In other words, a stepwise change of the input phase
value, which is caused by a quantization granularity, may result in an efficient operation of
the smoothing. In such a case, the smoothing functionality may be particularly
advantageous, wherein a comparatively long filter time constant brings good results. In
contrast, a very large change of the input phase value, which is significantly larger than a
quantization step, typically corresponds to a desired large change of the phase value. In this
case, a comparatively short filter time constant brings along good results. Accordingly, by
adjusting the filter time constant in dependence on a current difference between a
smoothened phase value and a corresponding input phase value, it can be achieved that
intentional large changes of the input phase value result in fast changes of the smoothened
phase values, while comparatively small changes of the input phase value, which take the
size of a quantization step, result in a comparatively slow and smoothed transition of the
smoothened phase value. Accordingly, a good hearing impression is reached both for
intentional, large changes of the desired phase value and for small changes of the desired
phase value (which, nevertheless, may cause a change of the input phase value by one
quantization step).
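The adaptation of the filter time constant may, for example, be sketched as a switch between a slow and a fast smoothing weight; the concrete weights (0.1, 0.9) and the switching threshold (π/2) are illustrative assumptions, not taken from the specification:

```python
import math

def adaptive_alpha(prev_smoothed, input_phase,
                   slow=0.1, fast=0.9, switch=math.pi / 2.0):
    """Choose the smoothing weight from the current deviation: small,
    quantization-sized changes get a long time constant (slow, smooth
    tracking); large, intentional jumps get a short time constant
    (fast tracking)."""
    deviation = abs((input_phase - prev_smoothed + math.pi)
                    % (2.0 * math.pi) - math.pi)
    return fast if deviation > switch else slow
```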
In a preferred embodiment, the parameter determinator is configured to adjust a filter time
constant for determining a sequence of smoothened phase values in dependence on a
difference between a smoothened inter-channel phase difference, which is defined by a
difference between two smoothened phase values associated with different channels of the
upmixed audio signal, and a non-smoothened inter-channel phase difference, which is
defined by a non-smoothened inter-channel phase difference information. It has been found
that the concept of selectively adjusting the filter time constant can be used with advantage
in combination with a processing of the inter-channel phase differences.
In a preferred embodiment, the apparatus for upmixing is configured to selectively enable
or disable a phase value smoothing functionality in dependence on an information
extracted from an audio bit stream. It has been found that an improvement of the hearing
impression may be obtained by providing the possibility to selectively enable or disable,
under the control of an audio encoder, a phase value smoothing functionality in an audio
decoder.
An embodiment according to the invention creates a method implementing the
functionality of the above-discussed apparatus for upmixing a downmix audio signal into
an upmixed audio signal. Said method is based on the same ideas as the above-discussed
apparatus.
In addition, embodiments according to the invention create a computer program for
performing said method.
Brief Description of the Figs.
Embodiments according to the invention will subsequently be described taking reference to
the accompanying Figs., in which:
Fig. 1 shows a block schematic diagram of an apparatus for upmixing a
downmix audio signal, according to an embodiment of the invention;
Figs. 2a and 2b show a block schematic diagram of an apparatus for upmixing a
downmix audio signal, according to another embodiment of the
invention;
Fig. 3 shows a schematic representation of overall phase differences OPD1,
OPD2 and an inter-channel phase difference IPD;
Figs. 4a and 4b show graphical representations of phase relationships for a first case
of the phase change limitation algorithm;
Figs. 5a and 5b show graphical representations of phase relationships for a second
case of the phase change limitation algorithm;
Fig. 6 shows a flow chart of a method for upmixing a downmix audio
signal into an upmixed audio signal, according to an embodiment of
the invention; and
Fig. 7 shows a block schematic diagram representing a generic binaural cue
coding scheme.
Detailed Description of the Embodiments
1. Embodiment according to Fig. 1
Fig. 1 shows a block schematic diagram of an apparatus 100 for upmixing a downmix
audio signal, according to an embodiment of the invention. The apparatus 100 is
configured to receive a downmix audio signal 110 describing one or more downmix audio
channels and to provide an upmixed audio signal 120 describing a plurality of upmixed
audio channels. The apparatus 100 comprises an upmixer 130 configured to apply
temporally variable upmix parameters to upmix the downmix audio signal 110 in order to
obtain the upmixed audio signal 120. The apparatus 100 also comprises a parameter
determinator 140 configured to receive quantized upmix parameter input information 142.
The parameter determinator 140 is configured to obtain one or more temporally
smoothened upmix parameters 144 for usage by the upmixer 130 on the basis of the
quantized upmix parameter input information 142.
The parameter determinator 140 is configured to combine a scaled version of a previous
smoothened phase value with a scaled version of an input phase information 142a, which is
included in the quantized upmix parameter input information 142, using a phase change
limitation algorithm 146, to determine a current smoothened phase value 144a on the basis
of the previous smoothened phase value and the input phase information. The current
smoothened phase value 144a is included in the temporally variable, smoothened upmix
parameters 144.
In the following, some details regarding the functionality of the apparatus 100 will be
described. The downmix audio signal 110 is input into the upmixer 130, for example, in
the form of a sequence of sets of complex values representing the downmix audio signal in
the time-frequency domain (describing overlapping or non-overlapping frequency bands or
frequency subbands at an update rate determined by the encoder not shown here). The
upmixer 130 is configured to linearly combine multiple channels of the downmix audio
signal 110 in dependence on the temporally variable, smoothened upmix parameters and/or
to linearly combine a channel of the downmix audio signal 110 with an auxiliary signal
(e.g. de-correlated signal) (wherein the auxiliary signal may be derived from the same
audio channel of the downmix audio signal 110, from one or more other audio channels of
the downmix audio signal 110, or from a combination of audio channels of the downmix
audio signal 110). Thus, the temporally variable, smoothened upmix parameters 144 may
be used by the upmixer 130 to decide upon the amplitude scaling and/or a phase rotation
(or time delay) used in a generation of the upmixed audio signal 120 (or a channel thereof)
on the basis of the downmix audio signal 110.
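The described linear combination of a downmix channel with a de-correlated auxiliary signal, including per-channel amplitude scaling and phase rotation, might be sketched for the one-to-two case as follows (purely illustrative; the function name is an assumption, and the gain and phase parameters would in practice be derived from the smoothened upmix parameters 144):

```python
import cmath

def upmix_1to2(x, d, gain1, gain2, phase1, phase2):
    """Hypothetical one-to-two upmix of a complex subband sample x with a
    de-correlated sample d: each output channel is a gain-scaled, phase-
    rotated linear combination of downmix and auxiliary signal."""
    y1 = gain1 * cmath.exp(1j * phase1) * (x + d)
    y2 = gain2 * cmath.exp(1j * phase2) * (x - d)
    return y1, y2
```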
The parameter determinator 140 is typically configured to provide temporally variable,
smoothened upmix parameters 144 at an update rate, which is equal to (or, in some cases,
higher than) the update rate of the side information described by the quantized upmix
parameter input information 142. The parameter determinator 140 may be configured to
avoid (or, at least, reduce) artifacts arising from a coarse (bit rate saving) quantization of
the quantized upmix parameter input information 142. For this purpose, the parameter
determinator 140 may apply a smoothening of the phase information describing, for
example, inter-channel phase differences. This smoothening of the input phase information
142a, which is included in the quantized upmix parameter input information 142, is
performed using a phase change limitation algorithm 146, such that large and abrupt
changes of the phase, which would result in audible artifacts, are avoided (or, at least,
limited to a tolerable degree).
The smoothening is preferably performed by combining a previous smoothened phase
value with a value of the input phase information 142a, such that a current smoothened
phase value is dependent both on the previous smoothened phase value and the current
value of the input phase information 142a. By doing so, a particularly smooth transition
can be obtained using a simple structure of the smoothing algorithm. In other words,
disadvantages of a finite-impulse-response smoothing can be avoided by providing an
infinite-impulse-response type smoothening in which the previous smoothened phase value
is considered.
Optionally, the parameter determinator 140 may comprise an additional interpolation
functionality, which is advantageous if the quantized upmix parameter input information
142 is transmitted at comparatively long temporal intervals (for example, less than once
per set of spectral values of the downmix audio signal 110).
To summarize, the apparatus 100 allows for the provision of temporally variable
smoothened phase values 144a on the basis of the quantized upmix parameter input
information 142, such that the temporally variable smoothened phase values 144a are well-
suited for the derivation of the upmixed audio signal 120 from the downmix audio signal
110 using the upmixer 130.
Audible artifacts are reduced (or even eliminated) by providing the smoothened phase
value 144a using the above-discussed concept, wherein a consideration of a previous
smoothened phase value is combined with a phase change limitation. Accordingly, a good
hearing impression of the upmixed audio signal 120 is achieved.
2. Embodiment according to Fig. 2
2.1. Overview over the Embodiment of Fig. 2
Further details regarding the structure and operation of an apparatus for upmixing an audio
signal will be described taking reference to Figs. 2a and 2b. Figs. 2a and 2b show a
detailed block schematic diagram of an apparatus 200 for upmixing a downmix audio signal,
according to another embodiment of the invention.
The apparatus 200 can be considered as a decoder for generating a multi-channel (e.g. 5.1)
audio signal on the basis of a downmix audio signal 210 and a side information SI. The
apparatus 200 implements the functionalities, which have been described with respect to
the apparatus 100.
The apparatus 200 may, for example, serve to decode a multi-channel audio signal encoded
according to a so-called "Binaural Cue Coding", a so-called "Parametric Stereo" or a so-
called "MPEG Surround". Naturally, the apparatus 200 may similarly be used to upmix
multi-channel audio signals encoded according to other systems using spatial cues.
For simplicity, the apparatus 200 is described as performing an upmix of a single-channel
downmix audio signal into a two-channel signal. However, the concept described
here can easily be extended to cases in which the downmix audio signal comprises more
than one channel, and also to cases in which the upmixed audio signal comprises more than
two channels.
2.2. Input Signals and Input Timing of the Embodiment of Fig. 2
The apparatus 200 is configured to receive the downmix audio signal 210 and the side
information 212. Further, the apparatus 200 is configured to provide an upmixed audio
signal 214 comprising, for example, multiple channels.
The downmix audio signal 210 may, for example, be a sum signal generated by an encoder
(e.g. by the BCC encoder 810 shown in Fig. 7). The downmix audio signal 210 may, for
instance, be represented in a time-frequency domain, for example, in the form of a
complex-valued frequency decomposition. For instance, audio contents of a plurality of
frequency subbands (which may be overlapping or non-overlapping) of the audio signal
may be represented by corresponding complex values. For a given frequency band, the
downmix audio signal may be represented by a sequence of complex values describing the
audio content in the frequency subband under consideration for subsequent (overlapping or
non-overlapping) time intervals. The subsequent complex values for subsequent time
intervals may be obtained, for example, using a filterbank (e.g. QMF filterbank), a Fast
Fourier Transform, or the like, in the apparatus 100 (which may be part of a multi-channel
audio signal decoder), or in an additional device coupled to the apparatus 100. However,
the representation of the downmix audio signal 210 described here is typically not identical
to the representation of the downmix signal used for a transmission of the downmix audio
signal from a multi-channel audio signal encoder to a multi-channel audio signal decoder
or to the apparatus 100. Accordingly, the downmix audio signal 210 may be represented by
a stream of sets or vectors of complex values.
In the following, it will be assumed that subsequent time intervals of the downmix audio
signal 210 are designated with an integer-valued index k. It will also be assumed that the
apparatus 200 receives one set or vector of complex values per interval k and per channel
of the downmix audio signal 210. Thus, one sample (set or vector of complex values) is
received for every audio sample update interval described by time index k.
In other words, audio samples ("AS") of the downmix audio signal 210 are received by the
apparatus 200, such that a single audio sample AS is associated with each audio sample
update interval k.
The apparatus 200 further receives a side information 212 describing the upmix
parameters. For instance, the side information 212 may describe one or more of the
following upmix parameters: Inter-channel level difference (ILD), inter-channel
correlation (or coherence) (ICC), inter-channel time difference (ITD), inter-channel phase
difference (IPD) or overall-phase difference (OPD). Typically, the side information 212
comprises the ILD parameters and at least one out of the parameters ICC, ITD, IPD, OPD.
However, in order to save bandwidth, the side information 212 is, in some embodiments,
only transmitted towards, or received by, the apparatus 200 once per multiple of the audio
sample update intervals k of the downmix audio signal 210 (or the transmission of a single
set of side information may be temporally spread over a plurality of audio sample update
intervals k). Thus, in some cases, there is only one set of side information parameters for a
plurality of audio sample update intervals k. However, in other cases, there may be one set
of side information parameters for each audio sample update interval k.
Intervals at which the side information is updated are designated with the index n, wherein,
for the sake of simplicity only, it will be assumed in the following that the subsequent time
intervals of the downmix audio signal 210, which are designated with the integer-valued
index k, are identical to the time intervals at which the side information SI 212 is updated,
such that the relationship k=n holds. However, if an update of the side information SI 212
is performed only once per a plurality of subsequent time intervals k of the downmix audio
signal 210, an interpolation may be performed, for example, between subsequent input
phase information values a„ or subsequent smoothened phase values a n.
For example, side information may be transmitted to (or received by) the apparatus 200 at
the audio sample update intervals k=4, k=8 and k=16. In contrast, no side information 212
may be transmitted to (or received by) the apparatus between said audio sample update
intervals. Thus, the update intervals of the side information 212 may vary over time, as the
encoder may, for example, decide to provide a side information update only when required
(e.g. when the encoder recognizes that the side information has changed by more than a
predetermined value). For example, the side information received by the apparatus 200 for
the audio sample update interval k=4 may be associated with the audio sample update
intervals k=3, 4, 5. Similarly, the side information received by the apparatus 200 for the
audio sample update interval k=8 may be associated with the audio sample update intervals
k=6, 7, 8, 9, 10, and so on. However, a different association is naturally possible and the
update intervals for the side information may naturally also be larger or smaller than
discussed.
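The interpolation between side-information updates described above can be sketched as follows. This is a minimal illustration, assuming simple linear interpolation of a scalar parameter value across the audio sample update intervals k; the function name and the particular update instants are illustrative, not taken from the patent.

```python
import numpy as np

def interpolate_params(updates, k_range):
    """Linearly interpolate a side-information parameter between the
    audio sample update intervals k at which updates were received.

    updates: dict mapping update interval k -> parameter value
    k_range: iterable of sample update intervals k to evaluate
    """
    ks = np.array(sorted(updates))
    vals = np.array([updates[k] for k in ks])
    # np.interp holds the edge values constant outside the known updates
    return np.interp(list(k_range), ks, vals)

# Side information received only at k = 4, 8 and 16 (as in the example above);
# the parameter values 0.0, 1.0, 3.0 are arbitrary placeholders.
updates = {4: 0.0, 8: 1.0, 16: 3.0}
interp = interpolate_params(updates, range(4, 17))
```

Between the sparse update instants, every intermediate interval k thus receives its own interpolated parameter value, which is the prerequisite for updating the upmix parameters once per audio sample update interval.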
2.3. Output Signals and Output Timing of the Embodiment of Fig. 2
The apparatus 200 serves to provide upmixed audio signals in a complex-valued
frequency-domain representation. For example, the apparatus 200 may be configured to provide the
upmixed audio signals 214, such that the upmixed audio signals comprise the same audio
sample update interval or audio signal update rate as the downmix audio signal 210. In
other words, for each sample (or audio sample update interval k) of the downmix audio
signal 210, a sample of the upmixed audio signal 214 is generated in some embodiments.
2.4. Upmix
In the following, it will be described in detail how an update of the upmix parameters,
which are used for upmixing the downmix audio signal 210, can be obtained for each audio
sample update interval k even though the decoder input side information 212 may be
updated, in some embodiments, only at larger update intervals. In the following, the
processing for a single subband will be described, but the concept can naturally be
extended to multiple subbands.
The apparatus 200 comprises, as a key component, an upmixer 230, which is configured to
operate as a complex-valued linear combiner. The upmixer 230 is configured to receive a
sample x(t) or x(k) of the downmix audio signal 210 (e.g. representing a certain frequency
band) associated with the audio sample update interval k. The signal x(t) or x(k) is
sometimes also designated as "dry signal". In addition, the upmixer 230 is configured to
receive samples q(t) or q(k) representing a de-correlated version of the downmix audio
signal.
Further, the apparatus 200 comprises a de-correlator (e.g. a delayer or reverberator) 240,
which is configured to receive samples x(k) of the downmix audio signal and to provide,
on the basis thereof, samples q(k) of a de-correlated version of the downmix audio signal
(represented by x(k)). The de-correlated version (samples q(k)) of the downmix audio signal
(samples x(k)) may be designated as "wet signal".
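The de-correlator 240 can be sketched in its simplest (delayer) form as follows. This is a toy illustration under the assumption of a plain per-subband delay; a practical system would typically use an all-pass reverberator, and the delay length chosen here is an arbitrary placeholder, not specified by the patent.

```python
import numpy as np

def decorrelate(x, delay=8):
    """Toy delay-based de-correlator: the 'wet' signal q(k) is a
    delayed copy of the complex subband samples x(k) of the downmix
    signal. The delay of 8 samples is an illustrative assumption."""
    q = np.zeros_like(x)
    q[delay:] = x[:-delay]
    return q

x = np.exp(1j * 0.1 * np.arange(32))   # complex subband samples x(k), "dry signal"
q = decorrelate(x)                      # de-correlated samples q(k), "wet signal"
```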
The upmixer 230 comprises, for example, a matrix-vector multiplier 232, which is
configured to perform a real-valued (or, in some cases, complex-valued) linear
combination of the "dry signal" (represented by x(k)) and the "wet signal" (represented by
q(k)) to obtain a first upmixed channel signal (represented by samples y1(k)) and a second
upmixed channel signal (represented by samples y2(k)). The matrix-vector multiplier 232
may, for example, be configured to perform the following matrix-vector multiplication to
obtain the samples y1(k) and y2(k) of the upmixed channel signals:

[ y1(k) ]          [ x(k) ]
[       ] = H(k) · [      ]
[ y2(k) ]          [ q(k) ]
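The linear combination performed by the matrix-vector multiplier 232 can be sketched as follows. The numerical entries of H(k) are illustrative placeholders only (they would in practice be derived from the side information), not values given by the patent.

```python
import numpy as np

def upmix_sample(H, x_k, q_k):
    """Linear combination of the dry sample x(k) and the wet sample
    q(k) by the 2x2 upmix matrix H(k), yielding the samples y1(k)
    and y2(k) of the two upmixed channel signals."""
    y = H @ np.array([x_k, q_k])
    return y[0], y[1]

# Illustrative (not patent-specified) upmix matrix for interval k
H_k = np.array([[0.8,  0.3],
                [0.8, -0.3]])
y1, y2 = upmix_sample(H_k, 1.0 + 0.5j, 0.2 - 0.1j)
```

Note that a real-valued H(k) combined with complex-valued subband samples still yields complex outputs, which is why the combiner 230 is described as complex-valued.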
The matrix-vector multiplier 232, or the complex-valued linear combiner 230, may further
comprise a phase adjuster 233, which is configured to adjust phases of the samples y1(k)
and y2(k) representing the upmixed channel signals. For example, the phase adjuster 233
may be configured to obtain the phase-adjusted first upmixed channel signal, which is
represented by samples ŷ1(k), according to

ŷ1(k) = e^(j·α1(k)) · y1(k)

and to obtain the phase-adjusted second upmixed channel signal, which is represented by
samples ŷ2(k), according to

ŷ2(k) = e^(j·α2(k)) · y2(k)
Accordingly, the upmixed audio signal 214, samples of which are designated with ŷ1(k)
and ŷ2(k), is obtained on the basis of the dry signal and the wet signal, by the complex-
valued linear combiner 230 using the temporally variable upmix parameters. The
temporally variable smoothened phase values ᾱ_n are used to determine the phases (or
inter-channel phase differences) of the upmixed audio signals ŷ1(k) and ŷ2(k). For
example, the phase adjuster 233 may be configured to apply the temporally variable
smoothened phase values. However, alternatively, the temporally variable smoothened
phase values may already be used by the matrix-vector multiplier 232 (or even in the
generation of the entries of the matrix H). In this case, the phase adjuster 233 may be
omitted entirely.
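The phase adjustment performed by the phase adjuster 233 can be sketched as a multiplication of each upmixed channel sample by a complex exponential carrying the respective upmix channel phase value. This is one hedged reading of the phase adjustment described above; the particular phase values used in the example are arbitrary.

```python
import cmath
import math

def adjust_phase(y1, y2, alpha1, alpha2):
    """Apply the upmix channel phase values alpha1(k) and alpha2(k)
    to the upmixed channel samples y1(k) and y2(k) by multiplication
    with complex exponentials. This rotates the phase of each sample
    while leaving its magnitude unchanged."""
    return cmath.exp(1j * alpha1) * y1, cmath.exp(1j * alpha2) * y2

# Rotate the first channel by +90 degrees and the second by -90 degrees
ya, yb = adjust_phase(1.0 + 0.0j, 0.0 + 1.0j, math.pi / 2, -math.pi / 2)
```

Because the multiplication is by a unit-magnitude factor, the phase adjuster changes only the phases (and hence the inter-channel phase difference) of the upmixed channels, not their levels.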
2.5 Update Of The Upmix Parameters
As can be seen from the above equations, it is desirable to update the upmix parameter
matrix H(k) and the upmix channel phase values α1(k), α2(k) for each audio sample update
interval k. Updating the upmix parameter matrix for each audio sample update interval k
brings the advantage that the upmix parameter matrix is always well-adapted to the actual
acoustic environment. Updating the upmix parameter matrix for every audio sample update
interval k also allows keeping step-wise changes of the upmix parameter matrix H (or of
the entries thereof) between subsequent audio sample intervals k small, as changes of the
upmix parameter matrix are distributed over multiple audio sample update intervals, even
if the side information 212 is updated only once per multiple of the audio sample update
intervals k. Also, it is desirable to smoothen any changes of the upmix parameter matrix H
which would arise from a quantization of the side information SI, 212. Similarly, it is
desirable to update the upmix channel phase values