Abstract: An apparatus for generating an enhanced downmix signal on the basis of a multi-channel microphone signal comprises a spatial analyzer configured to compute a set of spatial cue parameters comprising a direction information describing a direction of arrival of a direct sound, a direct sound power information and a diffuse sound power information on the basis of the multi-channel microphone signal. The apparatus also comprises a filter calculator for calculating enhancement filter parameters in dependence on the direction information describing the direction of arrival of the direct sound, in dependence on the direct sound power information and in dependence on the diffuse sound power information. The apparatus also comprises a filter for filtering the microphone signal, or a signal derived therefrom, using the enhancement filter parameters, to obtain the enhanced downmix signal.
Apparatus for Generating an Enhanced Downmix Signal, Method for Generating an
Enhanced Downmix Signal and Computer Program
Description
Embodiments according to the invention are related to an apparatus for generating an
enhanced downmix signal, to a method for generating an enhanced downmix signal and to
a computer program for generating an enhanced downmix signal.
An embodiment according to the invention is related to an enhanced downmix computation
for spatial audio microphones.
Background of the Invention
Recording surround sound with a small microphone configuration remains a challenge.
One of the most widely known such configurations is a Soundfield microphone and
corresponding surround decoders (see, for example, reference [3]), which filter and
combine its four nearly-coincident microphone capsule signals to generate the surround
sound output channels. While high single channel signal fidelity is maintained, the
weakness of this approach is its limited channel separation related to limited directivity of
first order microphone directional responses.
Alternatively, techniques based on a parametric representation of the observed sound field
can be applied. In reference [2], a method has been proposed using conventional coincident
stereo microphone pairs to record surround sound. It was shown how to estimate the spatial
cue parameters direct-to-diffuse-sound-ratios and directions-of-arrival of sound from these
directional microphone signals and how to apply this information to drive a spatial audio
coding synthesis to generate surround sound. In reference [2] it has also been discussed
how the parametric information, i.e., the direction-of-arrival (DOA) of sound and the
diffuse-sound-ratio (DSR) of the sound field, can be used to directly compute the specific spatial
parameters that are used in the MPEG Surround (MPS) coding scheme (see, for example,
reference [6]).
MPEG Surround is a parametric representation of multi-channel audio signals, representing
an efficient approach to high-quality spatial audio coding. MPS exploits the fact that, from
a perceptual point of view, multi-channel audio signals contain significant redundancy with
respect to the different loudspeaker channels. The MPS encoder takes multiple loudspeaker
signals as input, where the corresponding spatial configuration of the loudspeakers has to
be known in advance. Based on these input signals, the MPS encoder computes spatial
parameters in frequency subbands, such as channel level differences (CLD) between two
channels and inter channel correlation (ICC) between two channels. The actual MPS side
information is then derived from these spatial parameters. Furthermore, the encoder
computes a downmix signal, which could consist of one or more audio channels.
It has been found that the stereo microphone input signals are well suited for estimating
the spatial cue parameters. However, it has also been found that the unprocessed stereo
microphone input signal is in general not well suited to be used directly as the
corresponding MPEG Surround downmix signal. It has been found that in many cases,
crosstalk between left and right channels is too high, resulting in a poor channel separation
in the MPEG Surround decoded signals.
In view of this situation, there is a need for a concept for generating an enhanced downmix
signal on the basis of a multi-channel microphone signal, such that the enhanced downmix
signal leads to a sufficiently good spatial audio quality and localization property after
MPEG Surround decoding.
Summary of the Invention
This objective is achieved by the claimed apparatus for generating an enhanced downmix
signal, by the claimed method for generating an enhanced downmix signal and by the
claimed computer program for generating an enhanced downmix signal.
An embodiment according to the invention creates an apparatus for generating an enhanced
downmix signal on the basis of a multi-channel microphone signal. The apparatus
comprises a spatial analyzer configured to compute a set of spatial cue parameters
comprising a direction information describing a direction-of-arrival of direct sound, a
direct sound power information and a diffuse sound power information on the basis of the
multi-channel microphone signal. The apparatus also comprises a filter calculator for
calculating enhancement filter parameters in dependence on the direction information
describing the direction-of-arrival of the direct sound, in dependence on the direct sound
power information and in dependence on the diffuse sound power information. The
apparatus also comprises a filter for filtering the microphone signal, or a signal derived
therefrom, using the enhancement filter parameters, to obtain the enhanced downmix
signal.
This embodiment according to the invention is based on the finding that an enhanced
downmix signal, which is better-suited than the input multi-channel microphone signal,
can be derived from the input multi-channel microphone signal by a filtering operation,
and that the filter parameters for such a signal enhancement filtering operation can be
derived efficiently from the spatial cue parameters.
Accordingly, it is possible to reuse the same information, namely the spatial cue
parameters, which is also well-suited for the derivation of the MPEG Surround parameters,
for the computation of the enhancement filter parameters. Accordingly, a highly-efficient
system can be created using the above-described concept.
Moreover, it is possible to derive a downmix signal, which allows for a good channel
separation when processed in an MPEG surround decoder even if the channel signals of the
multi-channel microphone signal only comprise a low spatial separation. Accordingly, the
enhanced downmix signal may lead to a significantly improved spatial audio quality and
localization property after MPEG Surround decoding compared to conventional systems.
To summarize, the above-described embodiment according to the invention makes it possible to
provide an enhanced downmix signal having good spatial separation properties at moderate
computational effort.
In a preferred embodiment, the filter calculator is configured to calculate the enhancement
filter parameters such that the enhanced downmix signal approximates a desired downmix
signal. Using this approach, it can be ensured that the enhancement filter parameters are
well-adapted to a desired result of the filtering. For example, enhancement filter
parameters can be calculated such that one or more statistical properties of the enhanced
downmix signal approximate desired statistical properties of the downmix signal.
Accordingly, the enhanced downmix signal can be well-adapted to the
expectations, wherein the expectations can be defined numerically in terms of desired
correlation values.
In a preferred embodiment, the filter calculator is configured to calculate desired
correlation values between the multi-channel microphone signal (or, more precisely,
channel signals thereof) and desired channel signals of the downmix signal in dependence
on the spatial cue parameters. In this case, the filter calculator is preferably configured to
calculate the enhancement filter parameters in dependence on the desired cross-correlation
values. It has been found that said cross-correlation values are a good measure of whether
the channel signals of the downmix signal exhibit sufficiently good channel separation
characteristics. Also, it has been found that the desired correlation values can be computed
with moderate computational effort on the basis of the spatial cue parameters.
In a preferred embodiment, the filter calculator is configured to calculate the desired
cross-correlation values in dependence on direction-dependent gain factors, which describe
desired contributions of a direct sound component of the multi-channel microphone signal
to a plurality of loudspeaker signals, and in dependence on one or more downmix matrix
values which describe desired contributions of a plurality of audio channels (for example,
loudspeaker signals) to one or more channels of the enhanced downmix signal. It has been
found that both the direction-dependent gain factors and the downmix matrix values are
very well-suited for computing the desired cross-correlation values and that said
direction-dependent gain factors and said downmix matrix values are easily obtainable. Moreover, it
has been found that the desired cross-correlation values are easily obtainable on the basis
of said information.
In a preferred embodiment, the filter calculator is configured to map the direction
information onto a set of direction-dependent gain factors. It has been found that a
multi-channel amplitude panning law may be used to determine the gain factors with moderate
effort in dependence on the direction information. It has been found that the direction-of-arrival
information is well-suited to determine the direction-dependent gain factors, which
may describe, for example, which speakers should render the direct sound component. It is
easily understandable that the direct sound component is distributed to different speaker
signals in dependence on the direction-of-arrival information (briefly designated as
direction information), and that it is relatively simple to determine the gain factors which
describe which of the speakers should render the direct sound component. For example, the
mapping rule, which is used for mapping the direction information onto the set of
direction-dependent gain factors, may simply determine that those speakers, which are
associated to the direction of arrival, could render (or mainly render) the direct sound
component, while the other speakers, which are associated with other directions, should
only render a small portion of the direct sound component or should even suppress the
direct sound component.
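The mapping of the direction information onto direction-dependent gain factors may be illustrated by the following minimal Python sketch. It is only one possible panning law (a constant-power panning between the two loudspeakers adjacent to the direction of arrival) and is not taken from the embodiments; the loudspeaker azimuths, the name `SPEAKER_AZIMUTHS` and the function `panning_gains` are assumptions made for illustration.

```python
import math

# Hypothetical loudspeaker azimuths in degrees (an assumed five-channel layout).
SPEAKER_AZIMUTHS = [-110.0, -30.0, 0.0, 30.0, 110.0]


def panning_gains(doa_deg, speakers=SPEAKER_AZIMUTHS):
    """Map a direction of arrival (degrees) onto one gain per loudspeaker.

    Only the two loudspeakers adjacent to the direction receive a non-zero
    gain; the gains are power-normalised (sum of squares equals 1).
    """
    n = len(speakers)
    order = sorted(range(n), key=lambda j: speakers[j])
    doa = (doa_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    gains = [0.0] * n
    for idx in range(n):
        lo = order[idx]
        hi = order[(idx + 1) % n]  # the last pair wraps around the rear
        span = (speakers[hi] - speakers[lo]) % 360.0
        offset = (doa - speakers[lo]) % 360.0
        if offset <= span:
            frac = offset / span
            gains[lo] = math.cos(frac * math.pi / 2.0)
            gains[hi] = math.sin(frac * math.pi / 2.0)
            break
    return gains
```

In this sketch, loudspeakers that are not adjacent to the direction of arrival suppress the direct sound component completely; a softer law that leaves them a small portion would be equally compatible with the description above.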
In a preferred embodiment, the filter calculator is configured to consider the direct sound
power information and the diffuse sound power information to calculate the desired
cross-correlation values. It has been found that the consideration of the powers of both of said
sound components (direct sound component and diffuse sound component) results in a
particularly good hearing impression, because both the direct sound component and the
diffuse sound component can be properly allocated to the channel signals of the (typically
multi-channel) downmix signal.
In a preferred embodiment, the filter calculator is configured to weight the direct sound
power information in dependence on the direction information, and to apply a
predetermined weighting, which is independent from the direction information, to the
diffuse sound power information, in order to calculate the desired cross-correlation values.
Accordingly, a distinction can be made between the direct sound components and the diffuse
sound components, which results in a particularly realistic estimation of the desired
cross-correlation values.
In a preferred embodiment, the filter calculator is configured to evaluate a Wiener-Hopf
equation to derive the enhancement filter parameters. In this case, the Wiener-Hopf
equation describes a relationship between correlation values describing a correlation
between different channel pairs of the multi-channel microphone signal, enhancement filter
parameters and desired cross-correlation values between channel signals of the multi-channel
microphone signal and desired channel signals of the downmix signal. It has been
found that the evaluation of such a Wiener-Hopf equation results in enhancement filter
parameters which are well-adapted to the desired correlation characteristics of the channel
signals of the downmix signal.
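The evaluation of such an equation may be sketched as follows for a two-channel microphone signal and a single desired downmix channel. This is a simplified Python illustration under the assumption of real-valued correlation values; the function name `wiener_hopf_2ch` and its argument names are hypothetical and are not part of the embodiments.

```python
def wiener_hopf_2ch(r11, r12, r22, p1, p2):
    """Solve [[r11, r12], [r12, r22]] @ [h1, h2] = [p1, p2] for (h1, h2).

    r11, r22: powers of microphone channels 1 and 2 (assumed real).
    r12:      cross-correlation of the two microphone channels (assumed real).
    p1, p2:   desired cross-correlations between microphone channels 1, 2
              and one desired downmix channel.
    """
    det = r11 * r22 - r12 * r12
    if det <= 0.0:
        raise ValueError("microphone correlation matrix is singular")
    # Cramer's rule for the 2x2 linear system.
    h1 = (p1 * r22 - p2 * r12) / det
    h2 = (p2 * r11 - p1 * r12) / det
    return h1, h2
```

The resulting pair (h1, h2) would weight the first and the second microphone channel to form one downmix channel; a second call with the desired cross-correlations of the other downmix channel would give the second pair.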
In a preferred embodiment, the filter calculator is configured to calculate the enhancement
filter parameters in dependence on a model of desired downmix channels. By modeling the
desired downmix channels, the enhancement filter parameters can be computed such that
they yield a downmix signal which allows for a good reconstruction of desired multi-channel
speaker signals in a multi-channel decoder.
In some embodiments, the model of the desired downmix channels may comprise a model
of an ideal downmixing, which would be performed if the channel signals (for example,
loudspeaker signals) were available individually. Moreover, the modeling may include a
model of how individual channel signals could be obtained from the multi-channel
microphone signal, even if the multi-channel microphone signal comprises channel signals
having only a limited spatial separation. Accordingly, an overall model of the desired
downmix channels can be obtained, for example, by combining a modeling of how to
obtain individual channel signals (for example, loudspeaker signals) and how to derive
desired downmix channels from said individual channel signals. Thus, a sufficiently
good reference for the calculation of the enhancement filter parameters is obtainable with
relatively small computational effort.
In a preferred embodiment, the filter calculator is configured to selectively perform a
single-channel filtering, in which a first channel of the downmix signal is derived by a
filtering of a first channel of the multi-channel microphone signal and in which a second
channel of the downmix signal is derived by a filtering of a second channel of the multi-channel
microphone signal while avoiding a cross talk from the first channel of the multi-channel
microphone signal to the second channel of the downmix signal and from the
second channel of the multi-channel microphone signal to the first channel of the downmix
signal, or a two-channel filtering, in which a first channel of the downmix signal is derived
by filtering a first and a second channel of the multi-channel microphone signal, and in
which a second channel of the downmix signal is derived by filtering a first and a second
channel of the multi-channel microphone signal. The selection of the single-channel
filtering and of the two-channel filtering is made in dependence on a correlation value
describing a correlation between the first channel of the multi-channel microphone signal
and the second channel of the multi-channel microphone signal. By selecting between the
single-channel filtering and the two-channel filtering, numeric errors can be avoided which
may sometimes appear if the two-channel filtering is used in a situation in which the left
and right channel are highly correlated. Accordingly, a good-quality downmix signal can
be obtained irrespective of whether the channel signals of the multi-channel microphone
signal are highly correlated or not.
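The selection between the two filtering modes may be sketched as follows in Python. The threshold value, the normalisation of the correlation value and the simple per-channel gain used in the single-channel branch are assumptions made for illustration only; the description above merely states that the selection depends on a correlation value between the two microphone channels. Real-valued correlation values are assumed, and all names are hypothetical.

```python
import math


def select_filters(r11, r12, r22, p, threshold=0.95):
    """Return a 2x2 filter matrix H with Y_c = H[c][0]*X1 + H[c][1]*X2.

    r11, r22: powers of the two microphone channels.
    r12:      their cross-correlation (assumed real).
    p[j][c]:  desired cross-correlation between microphone channel j
              and downmix channel c.
    threshold: hypothetical limit on the normalised correlation.
    """
    rho = abs(r12) / math.sqrt(r11 * r22)
    if rho >= threshold:
        # Single-channel filtering: no cross talk between the channels.
        return [[p[0][0] / r11, 0.0], [0.0, p[1][1] / r22]]
    # Two-channel filtering: solve the 2x2 system for each downmix channel.
    det = r11 * r22 - r12 * r12
    rows = []
    for c in range(2):
        p1, p2 = p[0][c], p[1][c]
        rows.append([(p1 * r22 - p2 * r12) / det,
                     (p2 * r11 - p1 * r12) / det])
    return rows
```

With a normalised correlation close to one, the determinant in the two-channel branch approaches zero, which is the numerically critical case that the single-channel branch avoids.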
Another embodiment according to the invention creates a method for generating an
enhanced downmix signal.
Another embodiment according to the invention creates a computer program for
performing said method for generating an enhanced downmix signal.
The method and the computer program are based on the same findings as the apparatus and
may be supplemented by any of the features and functionalities discussed with respect to
the apparatus.
Brief Description of the Figures
Embodiments according to the present invention will subsequently be described taking
reference to the enclosed figures in which:
Fig. 1 shows a block schematic diagram of an apparatus for generating an
enhanced downmix signal, according to an embodiment of the invention;
shows a graphic illustration of the spatial audio microphone processing,
according to an embodiment of the invention;
shows a graphic illustration of the enhanced downmix computation,
according to an embodiment of the invention;
shows a graphic illustration of the channel mapping for the computation of
the desired downmix signals Y1 and Y2, which may be used in embodiments
according to the invention;
shows a graphic illustration of an enhanced downmix computation based on
preprocessed microphone signals, according to an embodiment of the
invention;
shows a schematic representation of computations for deriving the
enhancement filter parameters from the multi-channel microphone signal,
according to an embodiment of the invention; and
shows a schematic representation of computations for deriving the
enhancement filter parameters from the multi-channel microphone signal,
according to another embodiment of the invention.
Detailed Description of the Embodiments
1. Apparatus for Generating an Enhanced Downmix Signal According to Fig. 1
Fig. 1 shows a block schematic diagram of an apparatus 100 for generating an enhanced
downmix signal on the basis of a multi-channel microphone signal. The apparatus 100 is
configured to receive a multi-channel microphone signal 110 and to provide, on the basis
thereof, an enhanced downmix signal 112. The apparatus 100 comprises a spatial analyzer
120 configured to compute a set of spatial cue parameters 122 on the basis of the multi-channel
microphone signal 110. The spatial cue parameters typically comprise a direction
information describing a direction-of-arrival of direct sound (which direct sound is
included in the multi-channel microphone signal), a direct sound power information and a
diffuse sound power information. The apparatus 100 also comprises a filter calculator 130
for calculating enhancement filter parameters 132 in dependence on the spatial cue
parameters 122, i.e., in dependence on the direction information describing the direction-of-arrival
of direct sound, in dependence on the direct sound power information and in
dependence on the diffuse sound power information. The apparatus 100 also comprises a
filter 140 for filtering the microphone signal 110, or a signal 110' derived therefrom, using
the enhancement filter parameters 132, to obtain the enhanced downmix signal 112. The
signal 110' may optionally be derived from the multi-channel microphone signal 110 using
an optional pre-processing 150.
Regarding the functionality of the apparatus 100, it can be noted that the enhanced
downmix signal 112 is typically provided such that the enhanced downmix signal 112
allows for an improved spatial audio quality after MPEG Surround decoding when
compared to the multi-channel microphone signal 110, because the enhancement filter
parameters 132 are typically provided by the filter calculator 130 in order to achieve this
objective. The provision of the enhancement filter parameters 132 is based on the spatial
cue parameters 122 provided by the spatial analyzer 120, such that the enhancement filter
parameters 132 are provided in accordance with a spatial characteristic of the multi-channel
microphone signal 110, and in order to emphasize the spatial characteristic of the
multi-channel microphone signal 110. Accordingly, the filtering performed by the filter
140 allows for a signal-adaptive improvement of the spatial characteristic of the enhanced
downmix signal 112 when compared to the input multi-channel microphone signal 110.
Details regarding the spatial analysis performed by the spatial analyzer 120, the
filter parameter calculation performed by the filter calculator 130 and
the filtering performed by the filter 140 will subsequently be described.
2. Apparatus for Generating an Enhanced Downmix Signal According to Fig. 2
Fig. 2 shows a block schematic diagram of an apparatus 200 for generating an enhanced
downmix signal (which may take the form of a two-channel audio signal) and a set of
spatial cues associated with an upmix signal having more than two channels. The apparatus
200 comprises a microphone arrangement 205 configured to provide a two-channel
microphone signal comprising a first channel signal 210a and a second channel signal
210b.
The apparatus 200 further comprises a processor 216 for providing a set of spatial cues
associated with an upmix signal having more than two channels on the basis of a two-channel
microphone signal. The processor 216 is also configured to provide enhancement
filter parameters 232. The processor 216 is configured to receive, as its input signals, the
first channel signal 210a and the second channel signal 210b provided by the microphone
arrangement 205. The processor 216 is configured to provide the enhancement filter
parameters 232 and to also provide a spatial cue information 262. The apparatus 200
further comprises a two-channel audio signal provider 240, which is configured to receive
the first channel signal 210a and the second channel signal 210b provided by the
microphone arrangement 205 and to provide processed versions of the first channel
microphone signal 210a and of the second channel microphone signal 210b as the two-channel
audio signal 212 comprising channel signals 212a, 212b.
The microphone arrangement 205 comprises a first directional microphone 206 and a
second directional microphone 208. The first directional microphone 206 and the second
directional microphone 208 are preferably spaced by no more than 30cm. Accordingly, the
signals received by the first directional microphone 206 and the second directional
microphone 208 are strongly correlated, which has been found to be beneficial for the
calculation of a component energy information (or component power information) 122a
and a direction information 122b by the signal analyzer 220. However, the first directional
microphone 206 and the second directional microphone 208 are oriented such that a
directional characteristic 209 of the second directional microphone 208 is a rotated version
of a directional characteristic 207 of the first directional microphone 206. Accordingly, the
first channel microphone signal 210a and the second channel microphone signal 210b are
strongly correlated (due to the spatial proximity of the microphones 206, 208) yet different
(due to the different directional characteristics 207, 209 of the directional microphones
206, 208). In particular, a directional signal incident on the microphone arrangement 205
from an approximately constant direction causes strongly correlated signal components of
the first channel microphone signal 210a and the second channel microphone signal 210b
having a temporally constant direction-dependent amplitude ratio (or intensity ratio). An
ambient audio signal incident on the microphone arrangement 205 from temporally-varying
directions causes signal components of the first channel microphone signal 210a and the
second channel microphone signal 210b having a significant correlation, but temporally
fluctuating amplitude ratios (or intensity ratios). Accordingly, the microphone arrangement
205 provides a two-channel microphone signal 210a, 210b, which allows the signal
analyzer 220 of the processor 216 to distinguish between direct sound and diffuse sound
even though the microphones 206, 208 are closely spaced. Thus, the apparatus 200
constitutes an audio signal provider, which can be implemented in a spatially compact
form, and which is, nevertheless, capable of providing spatial cues associated with an
upmix signal having more than two channels.
The spatial cues 262 can be used in combination with the provided two-channel audio
signal 212a, 212b by a spatial audio decoder to provide a surround sound output signal.
In the following, some further explanations regarding the apparatus 200 will be given. The
apparatus 200 optionally comprises a microphone arrangement 205, which provides the
first channel signal 210a and the second channel signal 210b. The first channel signal 210a
is also designated with x1(t) and the second channel signal 210b is also designated with
x2(t). It should also be noted that the first channel signal 210a and the second channel signal
210b may represent the multi-channel microphone signal 110, which is input into the
apparatus 100 according to Fig. 1.
The two-channel audio signal provider 240 receives the first channel signal 210a and the
second channel signal 210b and typically also receives the enhancement filter parameter
information 232. The two-channel audio signal provider 240 may, for example, perform
the functionality of the optional pre-processing 150 and of the filter 140, to provide the
two-channel audio signal 212 which is represented by a first channel signal 212a and a second
channel signal 212b. The two-channel audio signal 212 may be equivalent to the enhanced
downmix signal 112 output by the apparatus 100 of Fig. 1.
The signal analyzer 220 may be configured to receive the first channel signal 210a and the
second channel signal 210b. Also, the signal analyzer 220 may be configured to obtain a
component energy information 122a and a direction information 122b on the basis of the
two-channel microphone signal 210, i.e., on the basis of the first channel signal 210a and
the second channel signal 210b. Preferably, the signal analyzer 220 is configured to obtain
the component energy information 122a and the direction information 122b such that the
component energy information 122a describes estimates of energies (or, equivalently, of
powers) of a direct sound component of the two-channel microphone signal and of a
diffuse sound component of the two-channel microphone signal, and such that the direction
information 122b describes an estimate of a direction from which the direct sound
component of the two-channel microphone signal 210a, 210b originates. Accordingly, the
signal analyzer 220 may take the functionality of the spatial analyzer 120, and the
component energy information 122a and the direction information 122b may be equivalent
to the spatial cue parameters 122. The component energy information 122a may be
equivalent to the direct sound power information and the diffuse sound power information.
The processor 216 also comprises the spatial side information generator 260 which
receives the component energy information 122a and the direction information 122b from
the signal analyzer 220. The spatial side information generator 260 is configured to
provide, on the basis thereof, the spatial cue information 262. Preferably, the spatial side
information generator 260 is configured to map the component energy information 122a of
the two-channel microphone signal 210a, 210b and the direction information 122b of the
two-channel microphone signal 210a, 210b onto the spatial cue information 262.
Accordingly, the spatial cue information 262 is obtained such that the spatial cue
information 262 describes a set of spatial cues associated with an upmix audio signal
having more than two channels.
The processor 216 allows for a computationally very efficient computation of the spatial
cue information 262, which is associated with an upmix audio signal having more than two
channels, on the basis of a two-channel microphone signal 210a, 210b. The signal analyzer
220 is capable of extracting a large amount of information from the two-channel
microphone signal, namely the component energy information 122a describing both an
estimate of an energy of a direct sound component and an estimate of an energy of a
diffuse sound component, and the direction information 122b describing an estimate of a
direction from which the direct sound component of the two-channel microphone signal
originates. It has been found that this information, which can be obtained by the signal
analyzer 220 on the basis of the two-channel microphone signal 210a, 210b, is sufficient to
derive the spatial cue information 262 even for an upmix audio signal having more than
two channels. Importantly, it has been found that the component energy information 122a
and the direction information 122b are sufficient to directly determine the spatial cue
information 262 without actually using the upmix audio channels as an intermediate
quantity.
Moreover, the processor 216 comprises a filter calculator 230 which is configured to
receive the component energy information 122a and the direction information 122b and to
provide, on the basis thereof, the enhancement filter parameter information 232.
Accordingly, the filter calculator 230 may take over the functionality of the filter calculator
130.
To summarize the above, the apparatus 200 is capable of efficiently determining both the
enhanced downmix signal 212 and the spatial cue information 262,
using the same intermediate information 122a, 122b in both cases. Also, it should be noted
that the apparatus 200 is capable of using a spatially small microphone arrangement 205 in
order to obtain both the (enhanced) downmix signal 212 and the spatial cue information
262. The downmix signal 212 comprises a particularly good spatial separation
characteristic, despite the usage of the small microphone arrangement 205 (which may be
part of the apparatus 200 or which may be external to the apparatus 200 but connected to
the apparatus 200) because of the computation of the enhancement filter parameters 232 by
the filter calculator 230. Accordingly, the (enhanced) downmix signal 212 may be well-suited
for a spatial rendering (for example, using an MPEG Surround decoder) when taken
in combination with the spatial cue information 262.
To summarize, Fig. 2 shows a block schematic diagram of a spatial audio microphone
approach. As can be seen, the stereo microphone input signals 210a (also designated with
x1(t)) and 210b (also designated with x2(t)) are used in the block 216 to compute the set of
spatial cue information 262 associated with a multi-channel upmix signal (for example, the
two-channel audio signal 212). Furthermore, a two-channel downmix signal 212 is
provided.
In the following sections, the required steps to determine the spatial cue information 262
based on an analysis of the stereo microphone signals will be summarized. Here, reference
will be made to the presentation in reference [2].
3. Stereo Signal Analysis
In the following, a stereo signal analysis will be described which may be performed by the
spatial analyzer 120 or by the signal analyzer 220. It should be noted that in some
embodiments, in which there are more than two microphones used and in which there are
more than two channel signals of a multi-channel microphone signal, an enhanced signal
analysis may be used.
The stereo signal analysis described herein may be used to provide the spatial cue
parameters 122, which may take the form of the component energy information 122a and
the direction information 122b. It should be noted that the stereo signal analysis may be
performed in a time-frequency domain. Accordingly, the channel signals 210a, 210b of the
multi-channel microphone signal 110, 210 may be transformed into a time-frequency
domain representation for the purpose of the further analysis.
The time-frequency representations of the microphone signals x1(t) and x2(t) are X1(k, i) and
X2(k, i), where k and i are time and frequency indices. It is assumed that X1(k, i) and
X2(k, i) can be modeled as
$$\begin{aligned} X_1(k,i) &= S(k,i) + N_1(k,i) \\ X_2(k,i) &= a(k,i)\,S(k,i) + N_2(k,i) \end{aligned} \qquad (1)$$
where a(k, i) is a gain factor, S(k, i) is the direct sound in the left channel, and N1(k, i) and
N2(k, i) represent diffuse sound.
The spatial audio coding (SAC) downmix signal 112, 212 and side information 262 are
computed as a function of a, E{SS*}, E{N1N1*}, and E{N2N2*}, where E{.} is a short-time
averaging operation, and where * denotes complex conjugate. These values are derived in
the following.
From (1) it follows that
$$E\{X_1X_1^*\} = E\{SS^*\} + E\{N_1N_1^*\}$$
$$E\{X_2X_2^*\} = a^2\,E\{SS^*\} + E\{N_2N_2^*\}$$
$$E\{X_1X_2^*\} = a\,E\{SS^*\} + E\{N_1N_2^*\}. \qquad (2)$$
It should be noted here that E{SS*} may be considered as a direct sound power information
or, equivalently, a direct sound energy information, and that E{N1N1*} and E{N2N2*} may
be considered as a diffuse sound power information or a diffuse sound energy information.
E{SS*} and E{N1N1*} may be considered as a component energy information, and a may be
considered as a direction information.
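The short-time averaging operation E{.} is not specified further at this point. A minimal sketch, assuming a first-order recursive (exponential) average over the time index k for one fixed frequency index i, is given below; the function names and the smoothing constant `alpha` are hypothetical and chosen for illustration only.

```python
def short_time_average(values, alpha=0.9):
    """Recursively smooth a sequence of per-frame values (real or complex)."""
    averaged = []
    state = 0.0
    for v in values:
        # First-order recursion: keep a fraction alpha of the past, add the new value.
        state = alpha * state + (1.0 - alpha) * v
        averaged.append(state)
    return averaged


def power_estimates(X1, X2, alpha=0.9):
    """Estimate E{X1X1*}, E{X2X2*} and E{X1X2*} for one frequency index over time."""
    p11 = short_time_average([abs(u) ** 2 for u in X1], alpha)
    p22 = short_time_average([abs(v) ** 2 for v in X2], alpha)
    p12 = short_time_average([u * v.conjugate() for u, v in zip(X1, X2)], alpha)
    return p11, p22, p12
```

A larger `alpha` gives smoother but slower-reacting estimates; the appropriate trade-off depends on the frame rate and is left open here.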
It is assumed that the amount of diffuse sound in both microphone signals is the same, i.e.,
E{N1N1*} = E{N2N2*} = E{NN*}, and that the normalized cross-correlation coefficient
between N1 and N2 is Φ_diff, i.e.,
$$\Phi_{\mathrm{diff}} = \frac{E\{N_1N_2^*\}}{\sqrt{E\{N_1N_1^*\}\,E\{N_2N_2^*\}}}. \qquad (3)$$
Φ_diff may, for example, take a predetermined value, or may be computed according to some
algorithm.
Given these assumptions, (2) can be written as
$$E\{X_1X_1^*\} = E\{SS^*\} + E\{NN^*\}$$
$$E\{X_2X_2^*\} = a^2\,E\{SS^*\} + E\{NN^*\}$$
$$E\{X_1X_2^*\} = a\,E\{SS^*\} + \Phi_{\mathrm{diff}}\,E\{NN^*\}. \qquad (4)$$
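Elimination of E{SS*} and a in (4) yields the quadratic equation
$$A\,E\{NN^*\}^2 + B\,E\{NN^*\} + C = 0 \qquad (5)$$
with
$$A = 1 - \Phi_{\mathrm{diff}}^2$$
$$B = 2\,\Phi_{\mathrm{diff}}\,E\{X_1X_2^*\} - E\{X_1X_1^*\} - E\{X_2X_2^*\}$$
$$C = E\{X_1X_1^*\}\,E\{X_2X_2^*\} - E\{X_1X_2^*\}^2. \qquad (6)$$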
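The elimination step can be made explicit as follows. This is a sketch which assumes a real-valued gain factor a and a real-valued cross-correlation E{X1X2*}, and which uses the normalized diffuse cross-correlation coefficient of (3); the shorthand symbols P_11, P_22, P_12, P_S and P_N are introduced for this derivation only.

```latex
% Shorthand (assumed for this sketch only):
%   P_{11} = E\{X_1X_1^*\},  P_{22} = E\{X_2X_2^*\},  P_{12} = E\{X_1X_2^*\},
%   P_S = E\{SS^*\},  P_N = E\{NN^*\}.
\begin{align*}
P_{11} - P_N &= P_S, \qquad
P_{22} - P_N = a^2 P_S, \qquad
P_{12} - \Phi_{\mathrm{diff}} P_N = a P_S \\
\left(P_{12} - \Phi_{\mathrm{diff}} P_N\right)^2 &= a^2 P_S^2
  = \left(P_{11} - P_N\right)\left(P_{22} - P_N\right) \\
0 &= \left(1 - \Phi_{\mathrm{diff}}^2\right) P_N^2
  + \left(2\,\Phi_{\mathrm{diff}} P_{12} - P_{11} - P_{22}\right) P_N
  + P_{11} P_{22} - P_{12}^2
\end{align*}
```

The last line is a quadratic equation in the diffuse sound power, whose coefficients depend only on the measured correlation values and on the diffuse cross-correlation coefficient.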
Then E{NN*} is one of the two solutions of (5), namely the physically possible one, i.e.,
$$E\{NN^*\} = \frac{-B - \sqrt{B^2 - 4AC}}{2A}. \qquad (7)$$
The other solution of (5) yields a diffuse sound power larger than the microphone signal
power, which is physically impossible.
Given (7), it is easy to compute a and E{SS*}:
$$a = \sqrt{\frac{E\{X_2X_2^*\} - E\{NN^*\}}{E\{X_1X_1^*\} - E\{NN^*\}}}$$
$$E\{SS^*\} = E\{X_1X_1^*\} - E\{NN^*\}. \qquad (8)$$
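The estimation of the diffuse sound power, the gain factor and the direct sound power from the three correlation values can be sketched as follows. The function name `analyze_bin`, the clamping of the result and the fallback value returned when no direct sound is present are assumptions of this example; real-valued correlation estimates and a diffuse cross-correlation coefficient of magnitude smaller than one are assumed as well.

```python
import math


def analyze_bin(p11, p22, p12, phi_diff):
    """Estimate E{NN*}, a and E{SS*} for one time-frequency tile.

    p11, p22, p12 stand for E{X1X1*}, E{X2X2*} and E{X1X2*} (assumed real here);
    phi_diff is the normalized diffuse cross-correlation, assumed |phi_diff| < 1.
    """
    A = 1.0 - phi_diff ** 2
    B = 2.0 * phi_diff * p12 - p11 - p22
    C = p11 * p22 - p12 ** 2
    # Guard against a slightly negative discriminant caused by estimation noise.
    disc = max(B * B - 4.0 * A * C, 0.0)
    # Smaller root of the quadratic: the physically possible diffuse power.
    nn = (-B - math.sqrt(disc)) / (2.0 * A)
    # Assumption of this sketch: clamp to the range allowed by the signal powers.
    nn = min(max(nn, 0.0), min(p11, p22))
    ss = p11 - nn
    # Assumption: without direct sound the ratio is undefined; 1.0 is returned then.
    a = math.sqrt((p22 - nn) / ss) if ss > 0.0 else 1.0
    return nn, a, ss
```

For example, correlation values constructed from E{SS*} = 2, a = 0.5, E{NN*} = 0.3 and a diffuse cross-correlation coefficient of 0.2 are E{X1X1*} = 2.3, E{X2X2*} = 0.8 and E{X1X2*} = 1.06, and the function recovers the three underlying quantities from them.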
As discussed in reference [2], the direction-of-arrival α(k, i) of direct sound can be
determined as a function of the estimated amplitude ratio a(k, i),
$$\alpha(k,i) = f\big(a(k,i)\big). \qquad (9)$$
The specific mapping depends on the directional characteristics of the stereo microphones
used for sound recording.
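To illustrate how such a mapping may look, the following sketch assumes a hypothetical pair of coincident cardioid microphones aimed at +theta0 (first channel) and -theta0 (second channel). This microphone configuration, the sign convention of the angle and the function names are assumptions made for this example and are not taken from reference [2]. For cardioids, the amplitude ratio is a = (1 + cos(α + theta0)) / (1 + cos(α - theta0)), which can be inverted in closed form.

```python
import math


def gain_ratio(alpha, theta0):
    """Amplitude ratio a for direct sound arriving from angle alpha (radians)."""
    r1 = 0.5 * (1.0 + math.cos(alpha - theta0))  # cardioid aimed at +theta0
    r2 = 0.5 * (1.0 + math.cos(alpha + theta0))  # cardioid aimed at -theta0
    return r2 / r1


def direction_from_ratio(a, theta0):
    """Invert gain_ratio for |alpha| < pi - theta0 (where the mapping is monotonic)."""
    root = math.sqrt(a)
    # Uses 1 + cos(x) = 2 cos^2(x / 2), which gives
    # sqrt(a) = (1 - tan(alpha/2) tan(theta0/2)) / (1 + tan(alpha/2) tan(theta0/2)).
    return 2.0 * math.atan((1.0 - root) / ((1.0 + root) * math.tan(theta0 / 2.0)))
```

With a different microphone directivity, a closed-form inverse may not exist, in which case a lookup table over the monotonic range of the mapping would be an alternative.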
4. Generation of Spatial Side Information
In the following, the generation of the spatial cue information 262, which may be provided
by the spatial side information generator 260, will be described. However, it should be
noted that the generation of spatial side information in the form of the spatial cue
information 262 is not a necessary feature of embodiments of the present invention.
Accordingly, it should be noted that the generation of the spatial side information can be
omitted in some embodiments. Also, it should be noted that different methods for
obtaining the spatial cue information 262, or any other spatial side information, may be
used.
Nevertheless, it should also be noted that the generation of the spatial side information
which is discussed in the following may be considered as a preferred concept for generating
a spatial cue information.
Given the stereo signal analysis results 122a, 122b, i.e. the parameters a respectively α
according to equation (9), E{SS*}, and E{NN*}, SAC decoder compatible spatial
parameters are generated, for example, by the spatial side information generator 260. It has
been found that one efficient way of doing this is to consider a multi-channel signal model.
As an example, we consider the loudspeaker configuration as shown in Fig. 4 in the
following, implying:
$$L(k,i) = g_1(k,i)\,S(k,i) + h_1(k,i)\,\tilde{N}_1(k,i)$$
$$R(k,i) = g_2(k,i)\,S(k,i) + h_2(k,i)\,\tilde{N}_2(k,i)$$
$$C(k,i) = g_3(k,i)\,S(k,i) + h_3(k,i)\,\tilde{N}_3(k,i)$$
where,
$$d = E\{X_1X_1^*\}\,E\{X_2X_2^*\} - E\{X_1X_2^*\}\,E\{X_2X_1^*\}$$
wherein
X1 designates a first channel signal of the multi-channel microphone signal,
X2 designates a second channel signal of the multi-channel microphone signal,
E{.} designates a short-time averaging operation,
1/104146 PCT/EP2011/052246
* designates a complex conjugate operation,
E{X1Y1*}, E{X2Y1*}, E{X1Y2*} and E{X2Y2*} designate cross-correlation values
between channel signals X1, X2 of the multi-channel microphone signal and desired
channel signals Y1, Y2 of the enhanced downmix signal.
The apparatus according to one of claims 1 to 9, wherein the filter calculator (130;
230; 316) is configured to calculate the enhancement filter parameters Hj 1(k,i) to
Hj (k,i) such that channel signals Yj (k,i) of the enhanced downmix signal (112;
212; 312) obtained by filtering the channel signals (X1, X2) of the multi-channel
microphone signal in accordance with the enhancement filter parameters
approximate, with respect to a statistical measure of similarity, desired channel
signals Yj(k,i) defined as
$$Y_j(k,i) = \sum_{l=0}^{K-1} m_{j,l}\,Z_l(k,i)$$
with
$$Z_l(k,i) = g_l(k,i)\,S(k,i) + h_l(k,i)\,\tilde{N}_l(k,i),$$
wherein g_l are gain factors, which are dependent on the direction information (a, α)
and which represent desired contributions of a direct sound component (S) of the
multi-channel microphone signal (110; 210; 310) to a plurality of loudspeaker
signals (Z_l);
wherein h_l are predetermined values describing desired contributions of a diffuse
sound component (N) of the multi-channel microphone signal (110; 210; 310) to a
plurality of loudspeaker signals.
The apparatus according to one of claims 1 to 10, wherein the filter calculator (130;
230; 316) is configured to evaluate a Wiener-Hopf equation to derive the
enhancement filter parameters (132; 232; 332; H1,1, H1,2, H2,1, H2,2),
wherein the Wiener-Hopf equation describes a relationship between correlation
values E{X1X1*}, E{X1X2*}, E{X2X1*}, E{X2X2*}, which correlation values
describe a relationship between different channel pairs of the multi-channel
microphone signal, enhancement filter parameters (H1,1, H1,2, H2,1, H2,2) and desired
cross-correlation values (E{X1Y1*}, E{X2Y1*}, E{X1Y2*}, E{X2Y2*}) between
channel signals (X1, X2) of the multi-channel microphone signal (110; 210; 310)
and desired channel signals (Y1, Y2) of the downmix signal.
The apparatus according to one of claims 1 to 11, wherein the filter calculator (130;
230; 316) is configured to calculate the enhancement filter parameters (132; 232;
332) in dependence on a model of desired downmix channels.
The apparatus according to one of claims 1 to 12, wherein the filter calculator (130;
230; 316) is configured to selectively perform a single-channel filtering, in which a
first channel (Y1) of the enhanced downmix signal (112; 212; 312) is derived by a
filtering of a first channel (X1) of the multi-channel microphone signal (110; 210;
310) and in which a second channel (Y2) of the enhanced downmix signal is
derived by a filtering of a second channel (X2) of the multi-channel microphone
signal while avoiding a cross talk from the first channel of the multi-channel
microphone signal to the second channel of the enhanced downmix signal and from
the second channel of the multi-channel microphone signal to the first channel of
the enhanced downmix signal,
or a two-channel filtering in which a first channel (Y1) of the enhanced downmix signal
is derived by filtering a first and a second channel (X1, X2) of the multi-channel
microphone signal, and in which a second channel (Y2) of the enhanced downmix
signal is derived by filtering a first and a second channel (X1, X2) of the
multi-channel microphone signal,
in dependence on a correlation value describing a correlation between the first
channel (X1) of the multi-channel microphone signal and the second channel (X2)
of the multi-channel microphone signal.
A method for generating an enhanced downmix signal on the basis of a
multi-channel microphone signal, the method comprising:
computing a set of spatial cue parameters comprising a direction information
describing a direction-of-arrival of a direct sound, a direct sound power information
and a diffuse sound power information on the basis of the multi-channel
microphone signal;
calculating enhancement filter parameters in dependence on the direction
information describing the direction-of-arrival of the direct sound, in dependence
on the direct sound power information and in dependence on the diffuse sound
power information; and
filtering the microphone signal, or a signal derived therefrom, using the
enhancement filter parameters, to obtain the enhanced downmix signal.
A computer program for performing the method according to claim 14 when the
computer program runs on a computer.