Abstract: An audio signal processing device (1) for downmixing a first input signal (X1) and a second input signal (X2) to a downmix signal (XD), comprising: a dissimilarity extractor (2) configured to receive the first input signal (X1) and the second input signal (X2) and to output an extracted signal (Û2), which is less correlated with the first input signal (X1) than the second input signal (X2) is; and a combiner (3) configured to combine the first input signal (X1) and the extracted signal (Û2) in order to obtain the downmix signal (XD).
Concept for generating a downmix signal
Description
The present invention is related to audio signal processing and, in particular,
to downmixing of a plurality of input signals to a downmix signal.
In signal processing, it is often necessary to mix two or more signals
into one sum signal. The mixing procedure usually introduces some
signal impairments, especially if the signals to be mixed contain
similar but phase-shifted signal parts. If those signals are summed, the
resulting signal contains severe comb-filter artifacts. To prevent those
artifacts, different methods have been suggested, which are either very
costly in terms of computational complexity or based on applying a
correction gain or term to the already impaired signal.
Converting multi-channel audio signals into a smaller number of channels
normally implies mixing several audio channels. The ITU, for instance,
recommends using a time-domain, passive mix matrix with static gains for a
downward conversion from a certain multi-channel setup to another [1]. In
[2] a quite similar approach is proposed.
To increase dialogue intelligibility, a combined approach of using the
ITU-based and a matrix-based downmix is proposed in [3]. Also, audio coders
utilize a passive downmix of channels, e.g. in some parametric modules [4, 5,
6].
The approach described in [7] performs a loudness measurement of every
input and output channel, i.e. of every single channel before and after the
mixing process. By taking the ratio of the sum of the input energies (i.e.
energy of the channels supposed to be mixed) and the output energy (i.e.
energy of the mixed channels), gains can be derived such that signal energy
loss and coloration effects are reduced.
The approach described in [8] performs a passive downmix which is afterwards
transformed into the frequency domain. The downmix is then analyzed by
a spatial correction stage which tries to detect and correct any spatial
inconsistencies through modifications to the inter-channel level differences
and inter-channel phase differences. Then, an equalizer is applied to the
signal to ensure the downmix signal has the same power as the input signal.
In the last step, the downmix signal is transformed back into the time domain.
A different approach is disclosed in [9, 10], where two signals, which are to
be downmixed, are transformed into the frequency domain and a desired/actual
value pair is built. The desired value is calculated as the root of the sum
of the individual energies, whereas the actual value is computed as the root
of the energy of the sum signal. The two values are then compared and,
depending on whether the actual value is greater or less than the desired
value, a different correction is applied to the actual value.
Alternatively, there are methods which aim at aligning the signals' phases,
such that no signal cancelation effects occur due to phase differences. Such
methods were proposed, for instance, for parametric stereo encoders [11, 12,
13].
A passive downmix as done in [1, 2, 3, 4, 5, 6] is the most straightforward
approach to mix signals. But if no further action is taken, the resulting
downmix signals might suffer from severe signal loss and comb-filtering
effects.
The approaches described in [7, 8, 9, 10] perform a passive downmix, in the
sense of equally mixing both signals, in the first step. Afterwards, some
corrections are applied to the downmixed signal. This might help to reduce
comb-filter effects, but on the other hand will introduce modulation
artifacts, caused by correction gains/terms that change rapidly over time.
Furthermore, a phase shift of 180 degrees between the signals to be downmixed
still results in a zero-valued downmix and cannot be compensated for by
applying, for instance, a correction gain.
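The cancelation problem of a passive downmix can be illustrated with a minimal sketch. It assumes two unit-amplitude sinusoids that differ only by a phase shift; the function name `passive_downmix_rms` and the test signal are illustrative assumptions, not part of the described device.

```python
import math


def passive_downmix_rms(phase_deg, n=1000):
    """RMS of x1 + x2, where x2 is x1 shifted by phase_deg degrees.

    x1 is one full period of a unit-amplitude sine (an assumed test signal).
    """
    phi = math.radians(phase_deg)
    total = 0.0
    for i in range(n):
        t = 2.0 * math.pi * i / n
        s = math.sin(t) + math.sin(t + phi)  # passive (equal-weight) sum
        total += s * s
    return math.sqrt(total / n)


for deg in (0, 90, 180):
    print(deg, passive_downmix_rms(deg))
```

At 180 degrees the sum vanishes completely, so no gain applied afterwards can restore the lost signal.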
A phase-alignment approach, such as mentioned in [11, 12, 13], may help to
avoid unwanted signal cancelation; but because the phase-aligned signals are
still simply added up, comb-filter and cancelation effects may occur if the
phases are not estimated properly. Additionally, robustly estimating the
phase relations between two signals is not an easy task and is computationally
intensive, especially if done for more than two signals.
It is an object of the present invention to provide an improved concept for
downmixing a plurality of input signals to a downmix signal.
This object is achieved by a device according to claim 1, a system according
to claim 16, a method according to claim 17 or a computer program of claim
18.
An audio signal processing device for downmixing a first input signal and a
second input signal to a downmix signal is provided, wherein the first input
signal (X1) and the second input signal (X2) are at least partly correlated,
the device comprising:
a dissimilarity extractor configured to receive the first input signal and the
second input signal and to output an extracted signal, which is less
correlated with the first input signal than the second input signal is,
and
a combiner configured to combine the first input signal and the extracted
signal in order to obtain the downmix signal.
The device will be described herein in the time-frequency domain, but all
considerations also hold for time-domain signals. A first input signal and a
second input signal are the signals to be mixed, where the first input signal
serves as reference signal. Both signals are fed into a dissimilarity
extractor, where signal parts of the second input signal that are correlated
with the first input signal are rejected and only the uncorrelated signal
parts of the second input signal are passed to the extractor's output.
The improvement of the proposed concept lies in the way the signals are
mixed. In the first step, one signal is selected to serve as a reference. It is
then determined which part of the reference signal is already present within
the other, and only those parts which are not present in the reference signal
(i.e. the uncorrelated signal) are added to the reference to build the downmix
signal. Since only low-correlated or uncorrelated signal parts with respect to
the reference are combined with the reference, the risk of introducing
comb-filter effects is minimized.
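The idea can be sketched in a few lines. This is a simplified, real-valued, single-tap illustration under stated assumptions: the correlated part is estimated by a least-squares gain over one block of samples, and the name `downmix_uncorrelated_part` is hypothetical.

```python
def downmix_uncorrelated_part(x1, x2):
    """Add to the reference x1 only the part of x2 not explained by x1.

    Assumption: a single real least-squares gain w models the part of
    x1 that is present in x2 (the description allows more general filters).
    """
    energy = sum(a * a for a in x1)
    w = sum(a * b for a, b in zip(x1, x2)) / energy if energy > 0 else 0.0
    u2 = [b - w * a for a, b in zip(x1, x2)]   # estimated uncorrelated part
    xd = [a + u for a, u in zip(x1, u2)]       # reference + uncorrelated part
    return xd, u2


# x2 contains the reference with inverted polarity plus an independent part.
x1 = [1.0, -1.0, 1.0, -1.0]
u = [1.0, 1.0, -1.0, -1.0]
x2 = [-a + b for a, b in zip(x1, u)]
xd, u2 = downmix_uncorrelated_part(x1, x2)
print(xd)
```

In this example a passive sum x1 + x2 would lose the reference entirely, whereas the sketch keeps both the reference and the independent part.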
In summary, a novel concept of mixing two signals to one downmix signal
is proposed. The novel method aims at preventing the creation of downmix
artifacts, like comb-filtering. In addition, the proposed method is
computationally efficient.
In some embodiments of the invention the combiner comprises an energy
scaling system configured in such a way that the ratio of the energy of the
downmix signal and the summed-up energies of the first input signal and the
second input signal is independent of the correlation of the first input
signal and the second input signal. Such an energy scaling system may ensure
that the downmixing process is energy preserving (i.e., the downmix signal
contains the same amount of energy as the original stereo signal) or at least
that the perceived sound stays the same independently of the correlation of
the first input signal and the second input signal.
In embodiments of the invention the energy scaling system comprises a first
energy scaling device configured to scale the first input signal based on a
first scale factor in order to obtain a scaled input signal.
In some embodiments of the invention the energy scaling system comprises
a first scale factor provider configured to provide the first scale factor,
wherein the first scale factor provider preferably is designed as a processor
configured to calculate the first scale factor depending on the first input
signal, the second input signal, the extracted signal and/or a scale factor
for the extracted signal. During the downmixing, the reference signal (first
input signal) might be scaled automatically to preserve the overall energy
level or to keep the energy level independent of the correlation of the input
signals.
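One possible way to compute such a first scale factor is sketched below. It rests on assumptions that go beyond the text above: the target downmix energy is taken to be the sum of the two input energies, and the extracted signal is assumed to be uncorrelated with the first input signal, so that energies simply add. The function and parameter names are hypothetical.

```python
import math


def first_scale_factor(e_x1, e_x2, e_u2_hat, g_eu=1.0):
    """Scale factor for the reference signal under an energy constraint.

    e_x1, e_x2 : energies of the first and second input signal
    e_u2_hat   : energy of the extracted signal
    g_eu       : scale factor already chosen for the extracted signal
    Assumed constraint: g_ex^2 * e_x1 + g_eu^2 * e_u2_hat == e_x1 + e_x2.
    """
    if e_x1 <= 0.0:
        return 0.0
    remaining = e_x1 + e_x2 - g_eu ** 2 * e_u2_hat
    return math.sqrt(max(remaining, 0.0) / e_x1)


# Fully correlated inputs of equal energy: nothing is extracted.
print(first_scale_factor(1.0, 1.0, 0.0))
# Fully uncorrelated inputs: the whole second signal is extracted.
print(first_scale_factor(1.0, 1.0, 1.0))
```

With this choice the downmix energy equals the summed input energies in both extreme cases, which is one reading of "independent of the correlation".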
In embodiments of the invention the energy scaling system comprises a second
energy scaling device configured to scale the extracted signal based on
a second scale factor in order to obtain a scaled extracted signal.
In some embodiments of the invention the energy scaling system comprises
a second scale factor provider configured to provide the second scale factor,
wherein the second scale factor provider preferably is designed as a
man-machine interface configured for manually inputting the second scale
factor. The second scale factor can be seen as an equalizer. In general, this
may be done in a frequency-dependent manner and in preferred embodiments
manually by a sound engineer. Of course, plenty of different mixing ratios are
possible and these highly depend on the experience and/or taste of the sound
engineer. Alternatively, the second scale factor provider preferably is
designed as a processor configured to calculate the second scale factor
depending on the first input signal, the second input signal and/or the
extracted signal.
In some embodiments of the invention the combiner comprises a sum-up
device for outputting the downmix signal based on the first input signal and
based on the extracted signal. Since only low-correlated or even uncorrelated
signal parts with respect to the reference are added to the reference, the risk
of introducing comb-filter effects is minimized. In addition, the use of a
sum-up device is computationally efficient.
In some embodiments of the invention the dissimilarity extractor comprises a
similarity estimator configured to provide filter coefficients for obtaining the
signal parts of the first input signal being present in the second input signal
from the first input signal and a similarity reducer configured to reduce the
signal parts of the first input signal being present in the second input signal
based on the filter coefficients. In such implementations, the dissimilarity
extractor consists of two sub-stages: a similarity estimator and a similarity
reducer. The first input signal and the second input signal are fed into a
similarity estimation stage, where the signal parts of the first input signal
being present within the second input signal are estimated and represented by
the resulting filter coefficients. The filter coefficients, the first input
signal and the second input signal are fed into the similarity reducer, where
the signal parts of the second input signal being similar to the first input
signal are suppressed and/or canceled, respectively. This results in the
extracted signal, which is an estimate of the uncorrelated signal part of the
second input signal with respect to the first input signal.
In some embodiments of the invention the similarity reducer comprises a
cancelation stage having a signal cancelation device configured to subtract
the obtained signal parts of the first input signal being present in the second
input signal, or a signal derived from the obtained signal parts, from the
second input signal or from a signal derived from the second input signal. This
concept is related to a method used in the field of adaptive noise
cancelation, but with the difference that it is not used, as originally
intended, to cancel the noise or uncorrelated component but instead to cancel
the correlated signal part, which results in the extracted signal.
In some embodiments of the invention the cancelation stage comprises a
complex filter device configured to filter the first input signal by using complex
valued filter coefficients. The advantage of this approach is that phase shifts
can be modeled.
In some embodiments of the invention the cancelation stage comprises a
phase shift device configured to align the phase of the second input signal to
the phase of the first input signal. For opposite phases of the first input
signal and the second input signal, combined with sudden signal drops of the
first input signal, phase jumps and signal cancelation effects may occur
within the downmix signal. This effect can be drastically reduced by aligning
the phase of the second input signal towards the first input signal. Such a
cancelation stage may be called a reverse phase-aligned cancelation stage.
In some embodiments of the invention the similarity reducer comprises a
signal suppression stage having a signal suppression device configured to
multiply the second input signal with a suppression gain factor in order to
obtain the extracted signal. It has been observed that audible distortions due
to estimation errors in the filter coefficients may be reduced by these
features.
In some embodiments of the invention the signal suppression stage comprises
a phase shift device configured to align the phase of the second input
signal to the phase of the first input signal. The suppression gain factors are
real-valued and therefore have no influence on the phase relations of the two
input signals, but since the complex-valued filter coefficients have to be
estimated anyway, additional information on the relative phase between the
input signals may be obtained. This information can be used to adjust the
phase of the second input signal towards the first input signal. This may be
done within the signal suppression stage before the suppression gains are
applied, wherein the phase of the second input signal is shifted by the
estimated phase of the complex-valued filter coefficients mentioned above.
Such a suppression stage may be called a reverse phase-aligned suppression
stage.
In some embodiments of the invention an output signal of the cancelation
stage is fed to an input of the signal suppression stage in order to obtain the
extracted signal, or an output signal of the signal suppression stage is fed to
an input of the cancelation stage in order to obtain the extracted signal. A
combined approach of using canceling as well as suppression of coherent
signal components may be used to further increase the quality of the
downmix signal. The resulting downmix signal may be obtained by performing
a cancelation procedure first, and afterwards applying a suppression
procedure. In other embodiments, the resulting downmix signal may be obtained
by performing a suppression procedure first, and afterwards applying a
cancelation procedure. In this way, signal parts in the extracted signal which
are correlated to the first signal may be further reduced. The extracted
signal as well as the first input signal may be energy scaled as before.
In some embodiments of the invention the signal parts of the first input signal
being present in the second input signal are weighted, depending on a
weighting factor, before being subtracted from the second input signal. A
weighting factor may in general be time and frequency dependent but can
also be chosen as constant. In some embodiments, the reverse phase-aligned
cancelation module can be used here as well with a small modification:
the weighting with the weighting factor has to be done analogously after
filtering with the absolute value of the filter coefficients.
In some embodiments of the invention the phase shift device is configured to
align the phase of the second input signal to the phase of the first input signal
depending on the weighting factor.
In some embodiments of the invention the phase shift device is configured to
align the phase of the second input signal to the phase of the first input signal
only if the weighting factor is smaller than or equal to a predefined threshold.
The invention further relates to an audio signal processing system for
downmixing of a plurality of input signals to a downmix signal comprising at
least a first device according to the invention and a second device according
to the invention, wherein the downmix signal of the first device is fed to the
second device as a first input signal or as a second input signal. To downmix
a plurality of input channels, a cascade of a plurality of two-channel downmix
devices can be used.
Moreover, the invention relates to a method for downmixing of a first input
signal and a second input signal to a downmix signal comprising the steps of:
estimating an uncorrelated signal, which is a component of the second input
signal and which is uncorrelated with respect to the first input signal and
summing up the first input signal and the uncorrelated signal in order to obtain
the downmix signal.
Furthermore, the invention relates to a computer program for implementing
the method according to the invention when being executed on a computer or
signal processor.
Preferred embodiments are subsequently discussed with respect to the
accompanying drawings, in which:
Fig. 1 illustrates a first embodiment of an audio signal processing device;
Fig. 2 illustrates the first embodiment in more detail;
Fig. 3 illustrates a similarity reducer and a combiner of the first
embodiment;
Fig. 4 illustrates a similarity reducer of a second embodiment;
Fig. 5 illustrates a similarity reducer and a combiner of a third
embodiment;
Fig. 6 illustrates a similarity reducer of a fourth embodiment;
Fig. 7 illustrates a similarity reducer and a combiner of a fifth
embodiment;
Fig. 8 illustrates a similarity reducer and a combiner of a sixth
embodiment; and
Fig. 9 illustrates a cascade of a plurality of audio signal processing
devices.
Fig. 1 shows a high-level system description of the proposed novel downmix
device 1. The device is described in the time-frequency domain, where k and m
correspond to frequency and time indices, respectively, but all considerations
also hold for time-domain signals. A first input signal X1(k,m) and a second
input signal X2(k,m) are the input signals to be mixed, where the first input
signal X1(k,m) may serve as reference signal. Both signals X1(k,m) and
X2(k,m) are fed into a dissimilarity extractor 2, where signal parts that are
correlated between X1(k,m) and X2(k,m) are rejected or at least reduced and
only the uncorrelated or low-correlated parts Û2(k,m) are extracted
and passed to the extractor's output. Then, the first input signal X1(k,m) is
scaled using a first energy scaling device 4 to meet some predefined energy
constraint, which results in a scaled reference signal X1s(k,m). The
necessary scale factors G_Ex(k,m) are provided by the scale factor provider 5.
The extracted signal part Û2(k,m) can also be scaled using a second energy
scaling device 6, which results in a scaled uncorrelated signal part Û2s(k,m).
The corresponding scale factors G_Eu(k,m) are provided by the second scale
factor provider 7. The scale factors G_Eu(k,m) may preferably be determined
manually by a sound engineer. Both scaled signals X1s(k,m) and Û2s(k,m)
are summed up using a sum-up device 8 to form the desired downmix signal
XD(k,m).
Figure 2 shows a medium-level system description of the proposed device 1.
In some implementations, the dissimilarity extractor 2 consists of two
sub-stages: a similarity estimator 9 and a similarity reducer 10, as depicted
in Figure 2. The first input signal X1(k,m) and the second input signal
X2(k,m) are fed into a similarity estimation stage 9, where the signal parts
of X1(k,m) being present within X2(k,m) are estimated and represented by the
resulting filter coefficients W_k(l) with l = 0 ... L - 1 and L being the
filter length. The filter coefficients W_k(l), the first input signal X1(k,m)
and the second input signal X2(k,m) are fed into the similarity reducer 10,
where the signal parts of X2(k,m) being similar to X1(k,m) are at least partly
suppressed and/or canceled, respectively. This results in the residual signal
Û2(k,m), which is an estimate of the uncorrelated signal part of X2(k,m) with
respect to X1(k,m).
The signal model assumes the second input signal X2(k,m) to be a mixture
of a weighted or filtered version W'(k,m)·X1(k,m) of the first input signal
X1(k,m) and an initially unknown independent signal U2(k,m) with
E{X1 U2*} = 0. Thus, X2(k,m) is considered to consist of the sum of a
correlated and an uncorrelated signal part with respect to X1(k,m):
$$ X_2(k,m) = W'(k,m)\, X_1(k,m) + U_2(k,m) \tag{1} $$
Capital letters indicate frequency-transformed signals and k and m are the
frequency and time indices, respectively. Now the desired downmix signal
XD(k,m) can be defined as:
$$ X_D(k,m) = G_{Ex}(k,m)\, X_1(k,m) + G_{Eu}(k,m)\, \hat{U}_2(k,m) \tag{2} $$
where Û2(k,m) is an estimate of U2(k,m) and where G_Ex(k,m) and
G_Eu(k,m) are scaling factors to adjust the energies of the reference signal
X1(k,m) and the extracted signal part Û2(k,m) of the other input signal
X2(k,m) according to predefined constraints. Additionally, they can be used
to equalize the signals. In some scenarios this might become necessary,
especially for Û2(k,m). In the remainder of this description the
time-frequency indices (k,m) will be omitted for clarity.
The paramount objective is to obtain the signal component U2, which is
uncorrelated with X1. This can be done by utilizing a method used in the
field of adaptive noise cancelation, but with the difference that it is not
used, as originally intended, to cancel the noise or uncorrelated component,
but instead the correlated signal part, which results in the estimate Û2 of U2.
Figure 3 depicts a similarity reducer 10 having a cancelation stage 10a and a
combiner 3 of the first embodiment of such a system. The advantage of this
approach is that W is allowed to be complex and thus phase shifts can be
modeled.
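To determine Û2, an estimated complex gain Ŵ for the initially unknown
complex gain W is needed. This is done by minimizing the energy of the
extracted signal Û2 in the minimum mean squared (MMS) sense:
$$
\begin{aligned}
J(\hat{W}) &= E\{|X_2 - \hat{W} X_1|^2\} \\
&= E\{(X_2 - \hat{W} X_1)(X_2 - \hat{W} X_1)^*\} \\
&= E\{X_2 X_2^* - X_2 \hat{W}^* X_1^* - \hat{W} X_1 X_2^* + \hat{W} X_1 \hat{W}^* X_1^*\}
\end{aligned}
\tag{4}
$$
Setting the partial derivative of J(Ŵ) with respect to Ŵ* to zero leads to the
desired filter coefficients, i.e.:
$$
\frac{\partial J(\hat{W})}{\partial \hat{W}^*} = -E\{X_2 X_1^*\} + \hat{W}\, E\{|X_1|^2\} = 0
\quad\Rightarrow\quad
\hat{W} = \frac{E\{X_2 X_1^*\}}{E\{|X_1|^2\}}
\tag{6}
$$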
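The estimate and the subsequent cancelation can be sketched for a single frequency bin as follows. The sketch assumes that the expectation E{·} is approximated by an average over a block of time frames and that the cancelation subtracts the filtered reference from the second input; the function names are hypothetical.

```python
def estimate_filter(x1, x2):
    """Least-squares estimate of the complex gain of x1 inside x2.

    x1, x2: lists of complex spectral values of one bin over several frames.
    Assumption: E{.} is replaced by a sum over the given frames.
    """
    num = sum(b * a.conjugate() for a, b in zip(x1, x2))  # ~ E{X2 X1*}
    den = sum(abs(a) ** 2 for a in x1)                    # ~ E{|X1|^2}
    return num / den if den > 0 else 0j


def cancel(x1, x2, w_hat):
    """Subtract the estimated correlated part w_hat * x1 from x2."""
    return [b - w_hat * a for a, b in zip(x1, x2)]


x1 = [1 + 0j, 1j, -1 + 0j, -1j]
u2 = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]        # uncorrelated with x1 on this block
x2 = [0.5j * a + b for a, b in zip(x1, u2)]    # true gain: 0.5j (90 degree shift)
w_hat = estimate_filter(x1, x2)
u2_hat = cancel(x1, x2, w_hat)
print(w_hat, u2_hat)
```

Because the gain is complex, the 90 degree phase shift between the inputs is captured by the estimate and removed together with the correlated part.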
In one embodiment, the cancelation module 10a, highlighted by the gray
dashed rectangle in Figure 3, can be replaced by a reverse phase-aligned
cancelation block 10a' as depicted in Figure 4, wherein the cancelation stage
10a' comprises a phase shift device 13 configured to align the phase of the
second input signal X2 to the phase of the first input signal X1 and an
absolute filter device 11' configured to filter an aligned first input signal
(X'2) by using absolute-valued filter coefficients |Ŵ|.
For opposite phases of the first input signal X1 and the second input signal
X2, combined with sudden signal drops of the first input signal X1, phase
jumps and signal cancelation effects may occur within the downmix signal XD.
This effect can be drastically reduced by aligning the phase of the second
input signal X2 towards the phase of the first input signal X1. Furthermore,
just the absolute value of Ŵ is used to perform the filtering of X1 and hence
the cancelation too.
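One possible reading of the reverse phase-aligned cancelation is sketched below for a single frequency bin. It assumes that the second input is rotated by the negative phase of the estimated gain and that the reference, filtered with the magnitude of that gain, is then subtracted; the function name is hypothetical.

```python
import cmath


def reverse_phase_aligned_cancel(x1, x2, w_hat):
    """Rotate x2 towards the phase of x1, then subtract |w_hat| * x1.

    Assumption: the phase shift equals the negative phase of w_hat.
    """
    rotation = cmath.exp(-1j * cmath.phase(w_hat))
    magnitude = abs(w_hat)
    return [b * rotation - magnitude * a for a, b in zip(x1, x2)]


x1 = [1 + 0j, 1j, -1 + 0j, -1j]
u2 = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]
w = 0.5j
x2 = [w * a + b for a, b in zip(x1, u2)]
out = reverse_phase_aligned_cancel(x1, x2, w)
print(out)
```

With a correctly estimated gain, the output contains only the uncorrelated part, rotated by the same phase shift that was applied to the second input.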
Figure 5 illustrates a similarity reducer 10 and a combiner 3 of a third
embodiment, wherein the similarity reducer 10 comprises a signal suppression
stage 10b having a signal suppression device 14 configured to multiply the
second input signal X2 with a suppression gain factor (G) in order to obtain
the extracted signal Û2.
In practice, the extracted signal Û2 obtained using (3) might contain audible
distortions due to estimation errors in the complex gain Ŵ. As an alternative,
an estimator 9 (see Figure 2) to obtain an estimate Û2 of U2 in the minimum
mean squared error (MMSE) sense may be derived. Figure 5 shows a block
diagram of the proposed approach.
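The extracted signal Û2 is then given by Û2 = G·X2, where the gain G
minimizes the cost function
$$
\begin{aligned}
J(G) &= E\{|U_2 - \hat{U}_2|^2\} = E\{|U_2 - G X_2|^2\} = E\{|U_2 - G W X_1 - G U_2|^2\} \\
&= (1 - 2G + G^2)\,\Phi_{U_2} + G^2\,\Phi_{WX_1}
\end{aligned}
$$
with the energies Φ_U2 = E{|U2|²} and Φ_WX1 = E{|W X1|²}; the cross terms
vanish because U2 and X1 are uncorrelated.
Setting the partial derivative of J(G) with respect to G to zero leads to the
desired gains:
$$ \Phi_{U_2}(-2 + 2G) + 2G\,\Phi_{WX_1} = 0 \tag{10} $$
$$ 2\,\Phi_{U_2}(-1 + G) + 2G\,\Phi_{WX_1} = 0 $$
$$ -\Phi_{U_2} + G\,\Phi_{U_2} + G\,\Phi_{WX_1} = 0 $$
$$ G \cdot (\Phi_{U_2} + \Phi_{WX_1}) = \Phi_{U_2} \tag{11} $$
$$ G = \frac{\Phi_{U_2}}{\Phi_{U_2} + \Phi_{WX_1}} $$
According to (12), we can substitute the energy of X2 by the sum of the
energies of the filtered version of X1 and the uncorrelated signal U2:
$$ \Phi_{X_2} = E\{|X_2|^2\} = E\{(W X_1 + U_2)(W X_1 + U_2)^*\} = \Phi_{WX_1} + \Phi_{U_2} \tag{12} $$
For the gains G, this leads to
$$ G = \frac{\Phi_{X_2} - \Phi_{WX_1}}{\Phi_{X_2}} = \frac{SNR_{U_2|WX_1}}{1 + SNR_{U_2|WX_1}}, \qquad SNR_{U_2|WX_1} = \frac{\Phi_{U_2}}{\Phi_{WX_1}} $$
with SNR_{U2|WX1} being the a priori SNR of X2. The complex filter gains Ŵ are
determined using (6).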
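The gain computation can be sketched as follows. The sketch assumes that the energy of the uncorrelated part is obtained as the energy of X2 minus the energy of the filtered reference, clipped at zero to guard against estimation errors; the function name and the clipping are illustrative assumptions.

```python
def suppression_gain(e_x2, e_wx1):
    """Real-valued suppression gain from two energy estimates.

    e_x2  : energy of the second input signal
    e_wx1 : energy of the filtered reference (correlated part)
    Assumption: energy of the uncorrelated part = e_x2 - e_wx1, clipped at 0.
    """
    if e_x2 <= 0.0:
        return 0.0
    e_u2 = max(e_x2 - e_wx1, 0.0)
    return e_u2 / e_x2


print(suppression_gain(2.0, 1.0))   # half of X2 is correlated with X1
print(suppression_gain(2.0, 0.0))   # X2 fully uncorrelated: pass through
print(suppression_gain(2.0, 2.0))   # X2 fully correlated: suppress completely
```

The gain stays between 0 and 1, so the suppression stage can only attenuate the second input and never changes its phase.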
In one embodiment, the suppression module 10b, highlighted by the dashed
gray rectangle in Figure 5, can be replaced by a reverse phase-aligned
suppression module 10b' comprising a phase shift device 15 configured to align
the phase of the second input signal X2 to the phase of the first input signal
X1.
Figure 6 illustrates a similarity reducer 10b' having such a phase shift
device 15 as a fourth embodiment of the invention. The suppression gains G are
real-valued and therefore have no influence on the phase relations of the two
signals X1 and X2. But since the filter coefficients Ŵ have to be estimated
anyway, additional information on the relative phase between the input
signals may be gained. This information can be used to adjust the phase of X2
towards the phase of X1. This is done within the reverse phase-aligned
suppression block 10b'; before the suppression gains G are applied, the phase
of X2 is shifted by the estimated phase of Ŵ. With a phase alignment, the
signal Û2 can be expressed as
$$
\hat{U}_2 = X_2\, e^{-j\varphi_{\hat{W}}}\, G
= \left(|W|\, e^{j(\varphi_W - \varphi_{\hat{W}})}\, X_1 + U_2\, e^{-j\varphi_{\hat{W}}}\right) G
\tag{14}
$$
where φ_W and φ_Ŵ denote the phases of W and Ŵ. This shows that the residual
component of X1 within Û2 is in phase with respect to X1, provided that φ_W is
correctly estimated.
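A minimal sketch of the phase-aligned suppression for one frequency bin follows. It assumes that the phase shift is the negative phase of the estimated complex gain and that the real-valued suppression gain is applied afterwards; the function name is hypothetical.

```python
import cmath


def phase_aligned_suppress(x2, w_hat, gain):
    """Shift x2 by the negative phase of w_hat, then apply a real gain."""
    rotation = cmath.exp(-1j * cmath.phase(w_hat))
    return [gain * b * rotation for b in x2]


# X2 is a purely correlated, 90 degree shifted copy of X1 (illustrative data).
x1 = [1 + 0j, 1j]
w = 0.5j
x2 = [w * a for a in x1]
residual = phase_aligned_suppress(x2, w, 1.0)
print(residual)
```

In this example the residual equals 0.5·X1, i.e. whatever remains of the reference after suppression adds constructively in the downmix instead of canceling it.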
A combined approach of using canceling as well as suppression of coherent
signal components is depicted in Figure 7, wherein an output signal Û'2 of
the cancelation stage 10a is fed to an input of the signal suppression stage
10b in order to obtain the extracted signal Û2. The cancelation stage 10a
comprises a weighting device configured to weight the obtained signal parts
ŴX1 of the first input signal X1 being present in the second input signal X2.
Here, the resulting downmix signal XD is obtained by performing a weighted
cancelation procedure first, and afterwards applying a suppression gain. The
resulting signal Û2 as well as X1 is energy scaled as before. Due to the
weighting factor γ, the signal Û'2 after the canceling stage still contains
some signal parts correlated to X1. To further reduce those signal parts, we
derive the suppression gain G_c for the combined approach:
$$
G_c = \arg\min_{G_c} E\{|U_2 - \hat{U}_2|^2\}, \qquad
\hat{U}_2 = G_c\, \hat{U}'_2, \qquad
\hat{U}'_2 = X_2 - \gamma\, \hat{W} X_1
\tag{16}
$$
$$
J'(G_c) = -2\,(1 - G_c)\,\Phi_{U_2} + 2\,(1 - \gamma)^2\, G_c\, \Phi_{WX_1} = 0
\tag{17}
$$
with Φ_U2 = E{|U2|²} and Φ_WX1 = E{|W X1|²}.
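The combined procedure can be sketched as follows. The closed-form gain used here is an assumption: it follows from minimizing the mean squared error between U2 and G_c·(X2 − γ·Ŵ·X1) when Ŵ equals the true gain and U2 is uncorrelated with X1. Function and parameter names are hypothetical.

```python
def combined_suppression_gain(e_u2, e_wx1, gamma):
    """Suppression gain applied after a weighted cancelation.

    e_u2  : energy of the uncorrelated part of X2
    e_wx1 : energy of the correlated part of X2
    gamma : cancelation weight in [0, 1]
    After weighted cancelation, a fraction (1 - gamma) of the correlated part
    remains, so its remaining energy is (1 - gamma)^2 * e_wx1 (assumption).
    """
    denominator = e_u2 + (1.0 - gamma) ** 2 * e_wx1
    if denominator <= 0.0:
        return 0.0  # nothing left to pass through
    return e_u2 / denominator


def weighted_cancel(x1, x2, w_hat, gamma):
    """Subtract the gamma-weighted correlated part from x2."""
    return [b - gamma * w_hat * a for a, b in zip(x1, x2)]


print(combined_suppression_gain(1.0, 1.0, 0.0))   # suppression only
print(combined_suppression_gain(1.0, 1.0, 0.5))   # mixed
print(combined_suppression_gain(1.0, 1.0, 1.0))   # cancelation only
```

The weight γ thus blends the two mechanisms: γ = 1 leaves nothing for the suppression stage to do, while γ = 0 reduces the scheme to pure suppression.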
The parameter γ is in general time and frequency dependent but can also be
chosen as constant. One possibility to determine a time- and
frequency-dependent γ is the normalized cross-correlation of X1 and X2:
$$ \gamma = \frac{|E\{X_1 X_2^*\}|}{\sqrt{E\{|X_1|^2\}\, E\{|X_2|^2\}}} \tag{19} $$
Fig. 8 illustrates a similarity reducer 10 and a combiner 3 of a sixth
embodiment. According to this embodiment the normalized cross-correlation in
(19) is fed as input to a mapping function whose output can be used to
determine the actual γ-values. For the mapping, a logistic function can be
used, which can be defined as:
$$ f(i) = A_l + \frac{A_u - A_l}{\left(1 + f_0\, e^{-R\,(i - M)}\right)^{1/v}} $$
where i defines the input data, A_u and A_l are the upper and lower asymptote,
R is the growth rate, v > 0 influences the maximum growth rate near the
asymptote, f_0 specifies the output value for f(0) and M is the data point i
of maximum growth. In such an embodiment, γ is determined by applying the
mapping function f to the normalized cross-correlation in (19).
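A mapping of this kind can be sketched as follows. The exact placement of the parameters is an assumption: the sketch uses the common generalized logistic form, and the default parameter values are purely illustrative.

```python
import math


def logistic_map(i, a_u=1.0, a_l=0.0, r=10.0, v=1.0, f0=1.0, m=0.5):
    """Generalized logistic mapping of a correlation value i to a weight.

    a_u, a_l : upper and lower asymptote
    r        : growth rate
    v        : shape parameter (> 0)
    f0       : parameter tied to the output at i = 0
    m        : input value of maximum growth
    Assumed form: a_l + (a_u - a_l) / (1 + f0 * exp(-r * (i - m))) ** (1 / v)
    """
    return a_l + (a_u - a_l) / (1.0 + f0 * math.exp(-r * (i - m))) ** (1.0 / v)


for corr in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(corr, logistic_map(corr))
```

With these defaults, low correlation values map to weights near the lower asymptote and high values to weights near the upper asymptote, with a smooth transition around m.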
In one embodiment, the reverse phase-aligned cancelation module 10a' can
be used here as well with a small modification: the weighting with γ has to
be done analogously after filtering with the absolute value of Ŵ.
A sixth embodiment shown in Fig. 8 comprises a more sophisticated
application of the reverse phase processing. It affects only time-frequency
bins which were mapped to mainly be suppressed, i.e. bins for which γ is below
a certain threshold Γ. For that reason, a flag F defined by
$$ F = \begin{cases} 1, & \gamma \le \Gamma \\ 0, & \text{otherwise} \end{cases} $$
is introduced.
In some embodiments the scale factor provider 7 provides G_Eu, by which the
energy amount of the uncorrelated signal Û2 with respect to X1 contributing
to the downmix signal XD can be controlled. These scale factors G_Eu can be
seen as an equalizer. In general, this is done in a frequency-dependent manner
and in the preferred embodiment manually by a sound engineer. Of course,
plenty of different mixing ratios are possible and these highly depend on the
experience and/or taste of the sound engineer. Alternatively, the scale factors
G_Eu can be a function of the signals X1, X2 and Û2.
In some embodiments the scale factor provider 5 provides G_Ex, by which the
energy amount of the first input signal X1 contributing to the downmix signal
XD can be controlled. If the downmixing process ought to be energy preserving
(i.e., the downmix signal contains the same amount of energy as the
original stereo signal) or at least if the perceived sound level ought to stay
the same, additional processing is required. The following consideration is
made with the objective to keep the perceived sound level of the individual
signal parts in the downmix signal constant. In the preferred embodiment, the
energy is scaled according to a derived optimal-downmix-energy consideration.
One may consider two signals X1 and X2 and assume them to be highly
correlated, as would be the case, for instance, for an amplitude-panned source
with E{X1 X2*} ≠ 0. The signal X2 can be expressed as X2 = a·X1 such
that the downmix signal X̃D results in
$$ \tilde{X}_D = X_1 + a\,X_1 \tag{23} $$
The energy of X̃D is given by
$$ E\{|\tilde{X}_D|^2\} = |1 + a|^2\, E\{|X_1|^2\} \tag{24} $$
We now assume the two signals to be fully uncorrelated with E{X1 X2*} = 0.
The downmix signal X̃D results in
$$ \tilde{X}_D = X_1 + X_2 \tag{25} $$
The energy of X̃D is given by
$$
\begin{aligned}
E\{|\tilde{X}_D|^2\} &= E\{|X_1|^2\} + E\{|X_2|^2\} \\
&= (1 + a^2)\, E\{|X_1|^2\}
\end{aligned}
\tag{26}
$$
where the second line holds if X2 again carries a² times the energy of X1.
From these considerations, one can see that the energy of an optimal downmix
of the correlated signal parts would result in
$$ E\{|\tilde{X}_D|^2\} = E\{|X_1|^2\} + E\{|W X_1|^2\} \tag{27} $$
with W corresponding to a in (23), and that for the uncorrelated signal parts
a simple addition of the energy has to be done. The final optimal downmix
energy with respect to the assumed signal model and the desired downmix
signal in (1) and (2) would then result in
$$ E\{|\tilde{X}_D|^2\} = E\{|X_1|^2\} + E\{|W X_1|^2\} + E\{|U_2|^2\} \tag{28} $$
In order to make sure X̃D and XD contain the same amount of energy, we
introduce the energy scaling factors G_Ex and G_Eu, where the latter is
provided by the scale factor provider 7. The actual downmix signal XD
computes as
$$ X_D = G_{Ex}\, X_1 + G_{Eu}\, \hat{U}_2 \tag{29} $$
Given the optimal downmix energy and G_Eu, we can now derive G_Ex as
follows:
$$ E\{|\tilde{X}_D|^2\} = E\{|X_D|^2\} \tag{30} $$
With (12) the middle part of equation (32) is identified as
so it becomes
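To downmix multiple input channels X1, X2, ..., XN, a cascade of multiple
two-channel downmix stages 1 can be used. In Figure 9, an example is shown for
three input signals X1, X2, X3.
The final downmix signal XD for a two-stage system results in
$$
\begin{aligned}
X_D &= G'_{Ex}\, X'_D + G'_{Eu}\, \hat{U}_3 \\
&= G'_{Ex}\,(G_{Ex}\, X_1 + G_{Eu}\, \hat{U}_2) + G'_{Eu}\, \hat{U}_3
\end{aligned}
\tag{34}
$$
where X'D is the downmix signal of the first stage, Û3 is the signal
extracted from X3 in the second stage and the primed scale factors belong to
the second stage.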
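The cascade can be sketched by folding a two-channel stage over the list of input channels. The sketch makes several simplifying assumptions: signals are real-valued, each stage uses a single least-squares gain, all energy scale factors are set to 1, and the downmix of the previous stage always serves as the reference of the next one. The function names are hypothetical.

```python
from functools import reduce


def two_channel_downmix(reference, other):
    """One downmix stage: reference plus the uncorrelated part of other."""
    energy = sum(a * a for a in reference)
    if energy > 0:
        w = sum(a * b for a, b in zip(reference, other)) / energy
    else:
        w = 0.0
    return [a + (b - w * a) for a, b in zip(reference, other)]


def cascade_downmix(channels):
    """Downmix any number of channels with a chain of two-channel stages."""
    return reduce(two_channel_downmix, channels)


x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, -1.0, 1.0, -1.0]      # identical to x1: contributes nothing new
x3 = [1.0, 1.0, -1.0, -1.0]      # orthogonal to x1: added completely
print(cascade_downmix([x1, x2, x3]))
```

A cascade of N - 1 such stages handles N channels, so the cost grows linearly with the number of input channels.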
Key-features of an embodiment of the invention are:
• Considering X as a reference signal and considering X 2 as a mixture
of a filtered version of X , and therefore a correlated signal part W X
and an uncorrelated signal part U2 with respect to X .
• Separation/Decomposition of X 2 into its two afore-mentioned signal
components. Dissimilarity extraction of X . and X2 via
- estimation of the similarity of X . and X2 , which results in
a filter coefficient W and
- similarity reduction either by cancelation or suppression
of correlated signal parts or a combination of both, which
results in an estimated uncorrelated signal part U2 .
• Energy scaling of to meet a predefined energy level.
• Energy scaling of 02.
· Summing up the energy scaled signals to form the desired downmix
signal XD .
• Processing in frequency bands.
Optional implementation features are:
• Reverse phase-aligned suppression or reverse phase-aligned cancelation.
• Cascade of two or more downmix blocks to perform a multi-channel
downmix.
• Only partially applied reverse phase-aligned suppression.
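The dissimilarity extraction named in the key features can be illustrated for a single frequency band. The Python sketch below is a minimal example under stated assumptions: W is estimated as E{X2 X1*} / E{|X1|^2}, cancelation subtracts W X1 from X2, and suppression multiplies X2 by the real-valued gain that minimises the mean squared error to U2 under the model X2 = W X1 + U2 with U2 uncorrelated with X1. The function names and test signals are hypothetical, and phase alignment is left out.

```python
import math
import random

def mean_product(a, b):
    """Sample estimate of E{a * conj(b)}."""
    return sum(p * q.conjugate() for p, q in zip(a, b)) / len(a)

def extract_dissimilar(x1, x2, mode="cancel"):
    """Return (u2_hat, w) for one band; assumes x1 and x2 have non-zero energy."""
    e1 = mean_product(x1, x1).real
    e2 = mean_product(x2, x2).real
    w = mean_product(x2, x1) / e1               # similarity estimate
    if mode == "cancel":
        u2_hat = [b - w * a for a, b in zip(x1, x2)]
    else:
        # Gain minimising E{|G * X2 - U2|^2}: G = E{|U2|^2} / E{|X2|^2}.
        gain = max(1.0 - abs(w) ** 2 * e1 / e2, 0.0)
        u2_hat = [gain * b for b in x2]
    return u2_hat, w

# Synthetic band signals following the model X2 = W * X1 + U2.
rng = random.Random(1)

def noise(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

x1, u2 = noise(20000), noise(20000)
w_true = 0.8 * complex(math.cos(0.5), math.sin(0.5))
x2 = [w_true * a + b for a, b in zip(x1, u2)]

u_cancel, w_est = extract_dissimilar(x1, x2, mode="cancel")
u_suppress, _ = extract_dissimilar(x1, x2, mode="suppress")
residual_corr = abs(mean_product(u_cancel, x1))
applied_gain = math.sqrt(mean_product(u_suppress, u_suppress).real
                         / mean_product(x2, x2).real)
```

Cancelation removes the correlated part entirely for this synthetic model, whereas suppression only attenuates X2 as a whole. A combination of both, as the list allows, would trade residual correlation against artifacts caused by estimation errors in W.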
Although some aspects have been described in the context of an apparatus,
it is clear that these aspects also represent a description of the corresponding
method, where a block or device corresponds to a method step or a feature
of a method step. Analogously, aspects described in the context of a
method step also represent a description of a corresponding block or item or
feature of a corresponding apparatus.
Depending on certain implementation requirements, embodiments of the invention
can be implemented in hardware or in software. The implementation
can be performed using a non-transitory storage medium such as a digital
storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM,
a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically
readable control signals stored thereon, which cooperate (or are capable
of cooperating) with a programmable computer system such that the respective
method is performed. Therefore, the digital storage medium may be
computer readable.
Some embodiments according to the invention comprise a data carrier having
electronically readable control signals, which are capable of cooperating
with a programmable computer system, such that one of the methods described
herein is performed.
Generally, embodiments of the present invention can be implemented as a
computer program product with a program code, the program code being
operative for performing one of the methods when the computer program
product runs on a computer. The program code may, for example, be stored
on a machine readable carrier.
Other embodiments comprise the computer program for performing one of
the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer
program having a program code for performing one of the methods described
herein, when the computer program runs on a computer.
A further embodiment of the inventive method is, therefore, a data carrier (or
a digital storage medium, or a computer-readable medium) comprising, recorded
thereon, the computer program for performing one of the methods described
herein. The data carrier, the digital storage medium or the recorded
medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or
a sequence of signals representing the computer program for performing one
of the methods described herein. The data stream or the sequence of signals
may, for example, be configured to be transferred via a data communication
connection, for example, via the internet.
A further embodiment comprises a processing means, for example, a computer
or a programmable logic device, configured to, or adapted to, perform
one of the methods described herein.
A further embodiment comprises a computer having installed thereon the
computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a
system configured to transfer (for example, electronically or optically) a computer
program for performing one of the methods described herein to a receiver.
The receiver may, for example, be a computer, a mobile device, a
memory device or the like. The apparatus or system may, for example, comprise
a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field
programmable gate array) may be used to perform some or all of the functionalities
of the methods described herein. In some embodiments, a field
programmable gate array may cooperate with a microprocessor in order to
perform one of the methods described herein. Generally, the methods are
preferably performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of
the present invention. It is understood that modifications and variations of the
arrangements and the details described herein will be apparent to others
skilled in the art. It is the intent, therefore, to be limited only by the scope of
the impending patent claims and not by the specific details presented by way
of description and explanation of the embodiments herein.
Reference signs:
1 audio signal processing device
2 dissimilarity extractor
3 combiner
4 first energy scaling device
5 first scale factor provider
6 second energy scaling device
7 second scale factor provider
8 sum up device
9 similarity estimator
10 similarity reducer
10a cancelation stage
10a' cancelation stage
10b suppression stage
10b' suppression stage
11 complex filter device
11' absolute filter device
12 signal cancellation device
13 phase shift device
14 suppression device
15 phase shift device
16 weighting device
X1 first input signal
X2 second input signal
XD downmix signal
Û2 extracted signal
GEx first scale factor
X1s first scaled input signal
W filter coefficients
WX1 signal parts of the first input signal being present in the second input
signal (X2)
X'2 signal derived from the second input signal
γ weighting factor
γWX1 weighted signal parts of the first input signal being present in the second
input signal (X2)
References:
[1] ITU-R BS.775-2, "Multichannel Stereophonic Sound System With And
Without Accompanying Picture," 07/2006.
[2] R. Dressler, (05.08.2004) Dolby Surround Pro Logic II Decoder Principles
of Operation. [Online]. Available:
http://www.dolby.com/uploadedFiles/Assets/US/Doc/Professional/209_Dolby_Surround_Pro_LogicJI_Decoder_Principles_of_Operation.pdf.
[3] K. Lopatka, B. Kunka, and A. Czyzewski, "Novel 5.1 Downmix Algorithm
with Improved Dialogue Intelligibility," in 134th Convention of the AES, 2013.
[4] J. Breebaart, K. S. Chong, S. Disch, C. Faller, J. Herre, J. Hilpert, K. Kjorling,
J. Koppens, K. Linzmeier, W. Oomen, H. Purnhagen, and J. Roden,
"MPEG Surround - the ISO/MPEG Standard for Efficient and Compatible
Multi-Channel Audio Coding," J. Audio Eng. Soc., vol. 56, no. 11, pp. 932-955,
2007.
[5] M. Neuendorf, M. Multrus, N. Rellerbach, R. J. Fuchs Guillaume, J.
Lecomte, Wilde Stefan, S. Bayer, S. Disch, C. Helmrich, R. Lefebvre, P.
Gournay, B. Bessette, J. Lapierre, K. Kjorling, H. Purnhagen, L. Villemoes,
W. Oomen, E. Schuijers, K. Kikuiri, T. Chinen, T. Norimatsu, C. K. Seng, E.
Oh, M. Kim, S. Quackenbush, and B. Grill, "MPEG Unified Speech and Audio
Coding - The ISO/MPEG Standard for High-Efficiency Audio Coding of all
Content Types," J. Audio Eng. Soc., vol. 132nd Convention, 2012.
[6] C. Faller and F. Baumgarte, "Binaural Cue Coding-Part II: Schemes and
Applications," Speech and Audio Processing, IEEE Transactions on, vol. 11,
no. 6, pp. 520-531, 2003.
[7] F. Baumgarte, "Equalization for Audio Mixing," Patent US 7,039,204 B2,
2003.
[8] J. Thompson, A. Warner, and B. Smith, "An Active Multichannel Downmix
Enhancement for Minimizing Spatial and Spectral Distortions," in 127th Convention
of the AES, October 2009.
[9] G. Stoll, J. Groh, M. Link, J. Deigmoller, B. Runow, M. Keil, R. Stoll, M.
Stoll, and C. Stoll, "Method for Generating a Downward-Compatible Sound
Format," US Patent US 2012/0014526, 2012.
[10] B. Runow and J. Deigmoller, "Optimierter Stereo-Downmix von 5.1-
Mehrkanalproduktionen: An optimized Stereo-Downmix of a 5.1 multichannel
audio production," in 25. Tonmeistertagung - VDT International Convention,
2008.
[11] Samsudin, E. Kurniawati, Ng Boon Poh, F. Sattar, and S. George, "A
Stereo to Mono Downmixing Scheme for MPEG-4 Parametric Stereo Encoder,"
in Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings.
2006 IEEE International Conference on, vol. 5, 2006, p. V. 2.
[12] M. Kim, E. Oh, and H. Shim, "Stereo audio coding improved by phase
parameters," in 129th Convention of the AES, 2010.
[13] W. Wu, L. Miao, Y. Lang, and D. Virette, "Parametric Stereo Coding
Scheme with a New Downmix Method and Whole Band Inter Channel
Time/Phase Differences," Acoustics, Speech and Signal Processing, IEEE
Transactions on, pp. 556-560, 2013.
ARTICLE 34 AMENDED CLAIMS
I/We Claim:
1. An audio signal processing device (1) for downmixing of a first input signal
(X1) and a second input signal (X2) to a downmix signal (XD), wherein the
first input signal (X1) and the second input signal (X2) are at least partly
correlated, comprising:
a dissimilarity extractor (2) configured to receive the first input signal (X1)
and the second input (X2) signal as well as to output an extracted signal
(Û2), which is lesser correlated with respect to the first input signal (X1)
than the second input signal (X2) and
a combiner (3) configured to combine the first input signal (X1) and the extracted
signal (Û2) in order to obtain the downmix signal (XD),
wherein the dissimilarity extractor (2) comprises a similarity estimator (9)
configured to provide filter coefficients (W, |W|) for obtaining signal parts
(WX1, |WX1|) of the first input signal (X1) being present in the second input
signal (X2) from the first input signal (X1),
wherein the dissimilarity extractor (2) comprises a similarity reducer (10)
configured to reduce the obtained signal parts (WX1, |WX1|) of the first input
signal being present in the second input signal (X2) based on the filter
coefficients (W, |W|),
wherein the similarity reducer (10) comprises a signal suppression stage
(10b, 10b') having a signal suppression device (14) configured to multiply
the second input signal (X2) or a signal (X'2) derived from the second input
signal (X2) with a suppression gain factor (G) in order to obtain the extracted
signal (Û2),
wherein the suppression gain factor (G) is chosen in such way that a mean
squared error between the extracted signal (Û2) and a signal part (U2) of
the second input signal (X2), which is uncorrelated with the first input signal
(X1), is minimized.
2. A device according to the preceding claim, wherein the combiner (3) comprises
an energy scaling system (4, 5, 6, 7) configured in such way that
the ratio of the energy of the downmix (XD) and the summed up energies
of the first input signal (X1) and the second input signal (X2) is independent
from the correlation of the first input signal (X1) and the second input signal
(X2).
3. A device according to the preceding claim, wherein the energy scaling system
(4, 5, 6, 7) comprises a first energy scaling device (4) configured to
scale the first input signal (X1) based on a first scale factor (GEx) in order to
obtain a scaled input signal (X1s).
4. A device according to the preceding claim, wherein the energy scaling system
(4, 5, 6, 7) comprises a first scale factor provider (5) configured to provide
the first scale factor (GEx), wherein the first scale factor provider (5)
preferably is designed as a processor (5) configured to calculate the first
scale factor (GEx) depending on the first input signal (X1), the second input
signal (X2) and/or the extracted signal (Û2).
5. A device according to one of the claims 2 to 4, wherein the energy scaling
system (4, 5, 6, 7) comprises a second energy scaling device (6) configured
to scale the extracted signal (Û2) based on a second scale factor
(GEu) in order to obtain a scaled extracted signal (Û2s).
6. A device according to the preceding claim, wherein the energy scaling system
(4, 5, 6, 7) comprises a second scale factor provider (7) configured to
provide the second scale factor (GEu), wherein the second scale factor provider
(7) preferably is designed as a man-machine interface configured for
manually inputting the second scale factor (GEu).
7. A device according to one of the preceding claims, wherein the combiner
(3) comprises a sum up device (8) for outputting the downmix signal (XD)
based on the first input signal (X1) and based on the extracted signal (Û2).
8. A device according to one of the preceding claims, wherein the similarity
reducer (10) comprises a cancelation stage (10a, 10a') having a signal
cancellation device (12) configured to subtract the obtained signal parts
(WX1, |WX1|) of the first input signal (X1) being present in the second input
signal (X2) or a signal (γWX1) derived from the obtained signal parts
(WX1, |WX1|) from the second input signal (X2) or from a signal (X'2) derived
from the second input signal (X2).
9. A device according to claim 8, wherein the cancelation stage (10a) comprises
a complex filter device (11) configured to filter the first input signal
(X1) by using complex valued filter coefficients (W).
10. A device according to claim 8 or 9, wherein the cancelation stage (10a')
comprises a phase shift device (13) configured to align the phase of the
second input signal (X2) to the phase of the first input signal (X1).
11. A device according to one of the claims 8 to 10, wherein an output signal
(Û'2) of the cancelation stage (10a) is fed to an input of the signal suppression
stage (10b) in order to obtain the extracted signal (Û2), or wherein an
output signal of the signal suppression stage (10b) is fed to an input of the
cancellation stage (10a) in order to obtain the extracted signal (Û2).
12. A device according to the preceding claim, wherein the cancelation stage
(10a) comprises a weighting device (16) configured to weight the obtained
signal parts (WX1, |WX1|) of the first input signal (X1) being present in the
second input signal (X2) depending on a weighting factor (γ).
13. A device according to one of the preceding claims, wherein the signal suppression
stage (10b') comprises a phase shift device (15) configured to
align the phase of the second input signal (X2) to the phase of the first input
signal (X1).
14. A device according to claim 10 and 12, wherein the phase shift device (13)
is configured to align the phase of the second input signal (X2) to the
phase of the first input signal (X1) depending on the weighting factor (γ).
15. A device according to the preceding claim, wherein the phase shift device
(13) is configured to align the phase of the second input signal (X2) to the
phase of the first input signal (X1) only, if the weighting factor (γ) is smaller
or equal to a predefined threshold (r).
16. An audio signal processing system for downmixing of a plurality of input
signals (X1, X2, X3) to a downmix signal (XD2) comprising at least a first device
(1) according to one of the preceding claims and a second device (1')
according to one of the preceding claims, wherein the downmix signal
(XD1) of the first device is fed to the second device as a first input signal
(XD1) or as a second input signal.
17. A method for downmixing of a first input signal (X1) and a second input signal
(X2) to a downmix signal (XD) comprising the steps of:
extracting an extracted signal (Û2) from the second input signal (X2),
wherein the extracted signal (Û2) is lesser correlated with respect to the
first input signal (X1) than the second input signal (X2)
summing up the first input signal (X1) and the extracted signal (Û2) in order
to obtain the downmix signal (XD)
providing filter coefficients (W, |W|) for obtaining signal parts (WX1, |WX1|)
of the first input signal (X1) being present in the second input signal (X2)
from the first input signal (X1),
reducing the obtained signal parts (WX1, |WX1|) of the first input signal being
present in the second input signal (X2) based on the filter coefficients
(W, |W|),
multiplying the second input signal (X2) or a signal (X'2) derived from the
second input signal (X2) with a suppression gain factor (G) in order to obtain
the extracted signal (Û2),
wherein the suppression gain factor (G) is chosen in such way that a mean
squared error between the extracted signal (Û2) and a signal part (U2) of
the second input signal (X2), which is uncorrelated with the first input signal
(X1), is minimized.
18. A computer program for implementing the method of claim 17 when being
executed on a computer or signal processor.