
Acoustic Echo Suppression Unit And Conferencing Front End

Abstract: An acoustic echo suppression unit (210) according to an embodiment of the present invention comprises an input interface (230) for extracting a downmix signal (310) from an input signal (300), the input signal comprising the downmix signal (310) and parametric side information (320), wherein the downmix signal and the parametric side information together represent a multichannel signal, a calculator (220) for calculating filter coefficients for an adaptive filter (240), wherein the calculator (220) is adapted to determine the filter coefficients based on the downmix signal (310) and a microphone signal (340) or a signal derived from the microphone signal, and an adaptive filter (240) adapted to filter the microphone signal (340) or the signal derived from the microphone signal based on the filter coefficients to suppress an echo caused by the multichannel signal in the microphone signal (340).


Patent Information

Application #
Filing Date
12 October 2011
Publication Number
35/2016
Publication Type
INA
Invention Field
BIO-MEDICAL ENGINEERING
Status
Parent Application
Patent Number
Legal Status
Grant Date
2019-12-04
Renewal Date

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
HANSASTR. 27C, 80686 MÜNCHEN, GERMANY

Inventors

1. FABIAN KUECH
SCHUETZENWEG 13 91054 ERLANGEN, DE
2. MARKUS KALLINGER
SCHORLACHSTRASSE 23A 91058 ERLANGEN, DE
3. MARKUS SCHMIDT
ZEPPELINSTRASSE 28B 91052 ERLANGEN, DE
4. MERAY ZOURUB
NUERNBERGER STRASSE 81 91052 ERLANGEN, DE
5. MARCO DIATSCHUK
NEUSTADTER STRASSE 18 91085 WEISENDORF, DE
6. OLIVER MOSER
LACHNERSTRASSE 25 91058 ERLANGEN, DE

Specification

Description
Embodiments according to the present invention relate to an
echo suppression unit and a method for suppressing an acoustic
echo, which may be used, for instance, in hands-free
telecommunication systems or other acoustic systems that
include multichannel loudspeaker playback based on a
parametric representation of spatial sound.
Acoustic echoes arise from an acoustic coupling or feed-back
between loudspeakers and microphones of telecommunication
devices. This phenomenon is especially present in hands-free
operations. The acoustic feedback signal from the loudspeaker
is transmitted back to the far-end subscriber, who notices a
delayed version of his own speech. Echo signals represent a
very distracting disturbance and can even inhibit interactive,
full-duplex communication. Additionally, acoustic echoes can
result in howling effects and instability of the acoustic
feedback loop. In a full-duplex hands-free telecommunication
system, echo control is therefore advisable in order to cancel
the coupling between loudspeakers and microphones.
Fig. 9 illustrates the general acoustic echo control problem.
The far-end signal, emitted by a loudspeaker, travels to the
microphone directly, and through reflected paths. Thus, the
microphone does not only capture the local near-end speech but
also the echo which is thus fed back to the user on the far-
end.
A loudspeaker signal x(n) is provided to a loudspeaker 100,
which transforms the loudspeaker signal into an audible
oscillation of the medium surrounding the loudspeaker 100. As
indicated in Fig. 9, microphone 110 may receive the sound
emitted by the loudspeaker 100, represented in Fig. 9 by a
curved vector, wherein y(n) denotes the feedback signal from
the loudspeaker 100 to the microphone 110.
Apart from the feedback signal y(n), the microphone 110 also
records an additional sound signal w(n), which may for
instance represent speech by a user. Both acoustic signals are
recorded by the microphone 110 and provided, as a microphone
signal z(n), to an echo removal unit 120. The echo removal
unit 120 also receives the loudspeaker signal x(n). It outputs
a signal in which - ideally - the contribution from the
loudspeaker signal x(n) is removed from the recorded signal or
the microphone signal z(n).
Hence, Fig. 9 illustrates the general setup of the acoustic
echo control problem. The loudspeaker signal x(n) is fed back
to the microphone signal z(n). An echo removal process removes
this echo while - ideally - letting through the desired local
near-end signal w(n).
Acoustic echo control represents a well-known problem and
various methods to remove the acoustic echoes have been
proposed [13]. Below, we briefly recall the approaches to
acoustic echo suppression (AES) as, e.g., presented in [8, 9],
as they are most suitable in the considered context of spatial
audio communication.
When transmitting or playing back audio signals, multichannel
systems are often used. In these systems multiple loudspeakers
are used to play back sound and/or multiple microphones are
used to record spatial sound. Such multichannel systems are,
for instance, used in spatial audio teleconferencing systems
that do not only transmit audio signals of the different
parties, but also preserve spatial information of the
recording scenario [12]. In other systems, the spatial
information can be provided artificially or changed
interactively [5].

In case that spatial audio is applied in telecommunication
scenarios, an efficient representation of the multichannel
audio signals should be used, while still assuring high audio
quality. Parametric spatial audio coding represents a suitable
approach to address this challenge. Below, we present
practical methods that follow the parametric spatial audio
coding paradigm and are especially important in the context of
communication.
While multichannel systems as, for instance, the previously
mentioned spatial audio coding provide the opportunity of
transmitting a plurality of audio signals in a very efficient
and bandwidth-saving manner, a straightforward implementation
of an echo removal or echo suppression process into such
multichannel systems requires an application to each and every
microphone signal based on each and every loudspeaker signal
as output by the multichannel system. This, however, may
represent a significant, approximately exponentially growing
computational complexity simply due to the high number of
microphone and/or loudspeaker signals to be processed.
Accordingly, this may require additional costs due to a higher
energy consumption, the necessity for a higher data processing
capability and, eventually, also a slightly increased delay.
It is therefore an object of the present invention to provide
an acoustic echo suppression unit and a conferencing front-end
which allow a more efficient acoustic echo suppression.
This object is achieved by an acoustic echo suppression unit
according to claim 1, a method for suppressing an acoustic echo
according to claim 8, a conferencing front-end according to
claim 10, a method for providing loudspeaker signals and a
microphone signal according to claim 14, or a computer program
according to claim 15.
Embodiments according to the present invention are based on
the finding that a more efficient acoustic echo suppression is
achievable by extracting a downmix signal from an input signal

comprising the downmix signal and parametric side information,
calculating filter coefficients for an adaptive filter based
on the downmix signal and a microphone signal or a signal
derived from a microphone signal, and filtering the microphone
signal or the signal derived from the microphone signal based
on the calculated filter coefficients. In other words, in the
case of a multichannel system based on a downmix signal and
parametric side information forming the input signal, wherein
the parametric side information together with the downmix
signal represent a multichannel signal, the echo suppression
may be done based on the downmix signal.
While employing an embodiment according to the present
invention, it may be, therefore, possible to avoid decoding
the input signal into the multichannel signal and afterwards
performing the acoustic echo suppression. It may therefore be
possible to reduce the computational complexity significantly
since the number of signals is drastically reduced compared to
a multichannel system as previously described. By employing an
embodiment according to the present invention it is possible
to perform the acoustic echo suppression on the basis of the
downmix signal comprised in the input signal.
In further embodiments according to the present invention, the
echo suppression may be performed based on reference power
spectra, which may be determined based on the received downmix
signal and the microphone signal or a signal derived from the
microphone signal. Optionally, the reference power spectrum
derived from the multichannel signal may be delayed by a delay
value, which may, for instance, be determined on the basis of
a correlation value.
Accordingly, a conferencing front-end according to an
embodiment of the present invention does not only comprise an
acoustic echo suppression unit according to an embodiment of
the present invention but also a multichannel decoder and at
least one microphone unit, wherein the multichannel decoder is
adapted to decode the downmix signal and the parametric side

information into a plurality of loudspeaker signals. The at
least one microphone unit is further adapted to provide the
microphone signal to the acoustic echo suppression unit. In
further embodiments of the conferencing front-end, the input
interface is further adapted to extract the parametric side
information, wherein the multichannel decoder comprises an
upmixer and a parameter processor. The parameter processor is
then adapted to receive the parametric side information from
the input interface and to provide an upmix control signal.
The upmixer is adapted to receive the downmix signal from the
input interface and the upmix control signal from the
parameter processor and is adapted to provide the plurality of
loudspeaker signals based on the downmix signal and the upmix
control signal. Hence, in embodiments according to the present
invention, the input interface of the acoustic echo
suppression unit may be that of the multichannel decoder or
both the multichannel decoder and the acoustic echo
suppression unit may share a common input interface.
Furthermore, embodiments according to the present invention
may optionally also comprise a corresponding multichannel
encoder adapted to encode a plurality of audio input signals
into a further downmix signal and further parametric side
information together representing the plurality of audio input
signals, wherein the microphone signal of the at least one
microphone unit is one of the audio input signals of the
plurality of audio input signals. In this case, the acoustic
echo suppression unit comprised in the conferencing front-end
is adapted to receive the further downmix signal as the signal
derived from the microphone signal.
In other words, as will be presented below, the approach
according to embodiments of the present invention allows
efficiently combining acoustic echo suppression and parametric
spatial audio coding.

Embodiments according to the present invention will be
described hereinafter making reference to the appended
drawings.
Fig. 1 shows a block diagram of a conferencing front-end
comprising an acoustic echo suppression unit
according to an embodiment of the present invention;
Fig. 2 illustrates a general structure of a parametric
spatial audio encoder;
Fig. 3 illustrates a general structure of a parametric
spatial audio decoder;
Fig. 4 illustrates a signal processing chain used in the
MPEG surround (MPS) decoder;
Fig. 5 illustrates a general structure of a spatial audio
object coding (SAOC) decoder;
Fig. 6a illustrates a mono downmix based transcoder for
transcoding SAOC-data to MPS-data;
Fig. 6b illustrates a stereo downmix based transcoder for
transcoding SAOC-data to MPS-data;
Fig. 7 shows a conferencing front-end according to an
embodiment of the present invention to illustrate the
proposed efficient approach of acoustic echo
suppression based on the downmix of parametric
spatial audio coders;
Fig. 8 illustrates a further embodiment according to the
present invention in the form of a conferencing
front-end comprising an acoustic echo suppression
unit according to an embodiment of the present
invention;

Fig. 9 illustrates the general setup of an acoustic echo
control problem.
With reference to Figs. 1-9 in the following different
embodiments according to the present invention and underlying
technologies will be outlined and described in more detail.
However, before introducing acoustic echo suppression
techniques for single channel acoustic echo suppression and
multichannel acoustic echo suppression, an embodiment
according to the present invention in the form of a
conferencing front-end along with an acoustic echo suppression
unit will be described first.
Fig. 1 shows a block diagram of a conferencing front-end 200
comprising, as a central component, an acoustic echo
suppression unit 210 according to an embodiment of the present
invention. The acoustic echo suppression unit 210 comprises a
calculator 220, an input interface 230 and an adaptive filter
240. The conferencing front-end 200 further comprises a
multichannel decoder 250, which is output-wise coupled to a
plurality of loudspeakers 100, of which exemplarily four
loudspeakers 100-1, ..., 100-4 are shown. The conferencing
front-end further comprises a microphone or microphone unit
110.
To be a little more specific, an input signal 300 is provided,
which comprises a downmix signal 310 and parametric side
information 320. The input interface 230 separates or extracts
from the input signal, in the embodiment shown in Fig. 1, both
the downmix signal 310 and the parametric side information
320. In the embodiment shown in Fig. 1, the input interface
230 provides the downmix signal 310 along with the parametric
side information 320 to multichannel decoder 250.
The multichannel decoder 250 is adapted to decode the downmix
signal 310 and the parametric side information 320 into a
plurality of loudspeaker signals 330, of which, for the sake
of simplicity, only one is labeled as such in Fig. 1.

Since the loudspeakers 100 are coupled to appropriate outputs
of the multichannel decoder 250, the loudspeakers 100 receive
the individual loudspeaker signals 330 and transform them back
into audible acoustic signals.
The calculator 220 is furthermore coupled to an output of the
input interface 230 at which the downmix signal 310 is
available. Hence, the calculator 220 is adapted to receive the
downmix signal 310. However, in the embodiment shown in Fig.
1, the parametric side information 320 of the input signal 300
is not provided to the calculator 220. In other words, in
embodiments according to the present invention, the calculator
220 may use the downmix signal alone among the signals
comprised in the input signal.
The microphone 110 is output-wise coupled to both the
calculator 220 and the adaptive filter 240. As a consequence,
the calculator 220 is also adapted to receive a microphone
signal 340 as provided by the microphone 110. Based on the
microphone signal 340 and the downmix signal 310, the
calculator 220 is adapted to determine filter coefficients for
the adaptive filter 240 and to provide a corresponding filter
coefficient signal 350 to the adaptive filter 240 on the basis
of which the adaptive filter 240 filters the incoming
microphone signal 340. The adaptive filter 240 provides at its
output an output signal, which is an echo suppressed version
of the microphone signal 340.
Further details concerning the mode of operation of a possible
implementation of a calculator 220 will be given below.
Although the input interface 230 is drawn schematically in
Fig. 1 as an individual component of the acoustic echo
suppression unit 210, the interface 230 may also be part of
the decoder 250 or may be shared by both the decoder 250 and
the acoustic echo suppression unit 210. Furthermore, it is
possible to implement embodiments according to the present
invention, for instance, by implementing an input interface

230 which is capable of extracting the downmix signal 310
alone. In this case, the input signal 300 would be provided to
the multichannel decoder 250, which in turn comprises an
appropriate interface being capable of extracting both the
downmix signal 310 and the parametric side information 320. In
other words, it may be possible to implement an acoustic echo
suppression unit 210 with an input interface 230 which is not
capable of extracting the parametric side information but only
the downmix signal 310.
Embodiments according to the present invention represent an
efficient method for the suppression of acoustic echoes for
multichannel loudspeaker systems used in spatial audio
communication systems. The method is applicable in case that
the spatial audio signals are represented by a downmix signal
and corresponding parametric side information or metadata.
These parameters capture the information that is required for
computing the loudspeaker signals on the reproduction side.
The invention exploits the fact that the echo suppression can
be performed directly based on the received downmix signal
rather than explicitly computing the loudspeaker signals
before they are input into the acoustic echo suppression
(AES). Analogously, the echo components can also be suppressed
in the downmix signal of the spatial audio signal to be
transmitted to the far-end. This approach typically is also
more efficient than applying the echo suppression to each of
the recorded signals of the microphones used to capture the
observed sound field.
In the following, summarizing reference signs will be used for
objects which appear more than once in an embodiment or a
figure, but which are nevertheless equal or similar at least
in terms of some of their features or structures. For
instance, in Fig. 1 the four loudspeakers 100-1, ..., 100-4
have been denoted with individual reference signs, however,
when their basic properties or features as being loudspeakers
are discussed, reference was made to the "loudspeakers 100".

Furthermore, to simplify the description, similar or equal
objects will be denoted with the same or similar reference
signs. Comparing Figs. 1 and 9, the loudspeakers have been
referenced with the same reference sign 100. Objects denoted
by the same or similar reference signs may be implemented
identically, similarly or differently. For instance, in some
implementations it might be advisable to implement different
types of loudspeakers 100 for the different loudspeaker
signals, while in different applications the loudspeakers may
be implemented identically. Therefore, objects denoted by the
same or similar reference signs may optionally be implemented
identically or similarly.
Moreover, it should be noted that when several objects appear
more than once in a figure, the depicted number of objects is
typically for illustrative purposes only. Deviations from the
number may be made either by increasing or decreasing the
number. For instance, Fig. 1 shows four loudspeakers 100-1,
..., 100-4. However, in different embodiments more or fewer
loudspeakers 100 may equally well be implemented. For
instance, in the case of a "5.1" system, 5 loudspeakers along
with a subwoofer are typically used.
In the following we briefly recall the general approach of
acoustic echo suppression. Thereby, we basically follow the
method as described in [8, 9].
As illustrated in Fig. 9, the microphone signal z(n) is composed
of the acoustic echo signal y(n) that results from the feedback
of the loudspeaker signal x(n) and the near-end signal w(n).
Here, we assume that the room impulse response can be
expressed as a combination of a direct propagation path
corresponding to a delay of d samples between the loudspeaker
signal x(n) and the microphone signal z(n), and a linear
filter gn which models the acoustic properties of the
enclosure.
Then, the microphone signal z(n) can be expressed by

z(n) = g_n * x(n - d) + w(n),    (1)
where * denotes convolution. The short-time Fourier transform
(STFT) domain representation of equation (1) is given by

Z(k,m) = G(k,m) X_d(k,m) + W(k,m),    (2)

where k is a block time index and m denotes a frequency index.
X_d(k,m) is defined as the STFT-domain correspondence of the
delayed loudspeaker signal. The first term on the right-hand
side of equation (2) represents the echo components Y(k,m),
where

Y(k,m) = G(k,m) X_d(k,m).    (3)
It should be noted that the following discussion of acoustic
echo suppression refers to the STFT as spectral representation
of signals. However, the concept can obviously also be applied
to any other suitable frequency subband representation
instead.
The acoustic echo suppression is performed by modifying the
magnitude of the STFT of the microphone signal Z(k,m), while
keeping its phase unchanged. This can be expressed by

Ẑ(k,m) = H(k,m) Z(k,m),    (4)

where H(k,m) represents a real-valued, positive attenuation
factor. In the following we refer to H(k,m) as the echo
suppression filter (ESF).
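Since the suppression filter is a positive real-valued gain per time-frequency bin, multiplying the complex STFT by it attenuates the magnitude while the phase is preserved automatically. A minimal NumPy illustration (not part of the patent text; names are illustrative):

```python
import numpy as np

def apply_esf(Z, H):
    """Apply a real-valued, positive echo suppression filter H(k, m)
    to the complex microphone STFT Z(k, m). Multiplying a complex
    number by a positive real factor scales its magnitude and leaves
    its phase (argument) unchanged."""
    return H * Z

# One STFT bin with magnitude 3 and phase 0.7 rad, attenuated by 0.25.
Z = np.array([3.0 * np.exp(1j * 0.7)])
H = np.array([0.25])
Z_hat = apply_esf(Z, H)
```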

A practical approach to the computation of the echo
suppression filter H(k,m) is to use a parametric spectral
subtraction approach analogously to [7]:

H(k,m) = [ (|Z(k,m)|^β - α |Ŷ(k,m)|^β) / |Z(k,m)|^β ]^γ,    (5)
where α, β, and γ represent design parameters to control the
echo suppression performance.
Typical values for β and γ are values around 2, while in some
applications α is chosen to be the inverse of γ. In other
words, when choosing typical values of β = 2 and γ = 2, α is
typically chosen to be 0.5 (= 1/2).
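A hedged sketch of such a parametric spectral subtraction rule, using the typical values α = 0.5, β = 2, γ = 2 mentioned above. The exact placement of the exponents varies between publications, so this is one plausible form rather than the patent's exact equation:

```python
import numpy as np

def spectral_subtraction_esf(Z_mag, Y_mag, alpha=0.5, beta=2.0, gamma=2.0):
    """Parametric spectral subtraction: attenuate each bin according
    to how much of the microphone magnitude is explained by the echo
    estimate. alpha over/underestimates the echo; beta and gamma
    shape the subtraction. Illustrative form, one common convention."""
    num = np.maximum(Z_mag**beta - alpha * Y_mag**beta, 0.0)
    H = (num / np.maximum(Z_mag**beta, 1e-12)) ** gamma
    return np.clip(H, 0.0, 1.0)

# No echo estimate -> no attenuation; echo as strong as the
# microphone signal -> strong attenuation.
H_clean = spectral_subtraction_esf(np.array([1.0]), np.array([0.0]))
H_echo = spectral_subtraction_esf(np.array([1.0]), np.array([1.0]))
```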
The estimate of the power spectrum of the echo signal can be
obtained by

|Ŷ(k,m)|^2 = |Ĝ(k,m)|^2 |X_d(k,m)|^2,    (6)

where |Ĝ(k,m)|^2 represents an estimate of the echo power
transfer function |G(k,m)|^2. Alternatively, a complex spectrum
based approach according to

Ŷ(k,m) = Ĝ(k,m) X_d(k,m)    (7)

can be used.
Note that in practice both the echo power transfer function
|G(k,m)|^2 and the delay d are not known and have to be
replaced by corresponding estimates, as discussed next. In the
following we will refer to |G(k,m)|^2 as the echo estimation
filter (EEF).
One possible method to estimate the EEF has been proposed in
[8]. Assuming that the near-end speaker is silent, equation
(2) implies that the EEF may be estimated by

Ĝ(k,m) = E{X_d*(k,m) Z(k,m)} / E{|X_d(k,m)|^2},    (8)

where * denotes the complex conjugate operator, and E{...}
denotes the expectation operator. The expectation operator may
be approximated by a floating average of its argument.
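The cross-correlation based estimate with floating averages in place of the expectations can be sketched as follows; the recursion constant `a_avg`, the regularization `eps`, and the function name are illustrative assumptions:

```python
import numpy as np

def estimate_eef(X_d, Z, a_avg=0.9, eps=1e-12):
    """Estimate the real-valued EEF per frequency bin: the
    expectations in E{X_d* Z} / E{|X_d|^2} are replaced by recursive
    (floating) averages over the block time index, and the magnitude
    of the resulting transfer function estimate is taken.
    X_d, Z: complex STFTs of shape (num_blocks, num_bins)."""
    cross = np.zeros(X_d.shape[1], dtype=complex)  # average of X_d* Z
    auto = np.zeros(X_d.shape[1])                  # average of |X_d|^2
    for k in range(X_d.shape[0]):
        cross = a_avg * cross + (1.0 - a_avg) * np.conj(X_d[k]) * Z[k]
        auto = a_avg * auto + (1.0 - a_avg) * np.abs(X_d[k]) ** 2
    return np.abs(cross) / (auto + eps)

# With a purely attenuating echo path Z = 0.5 * X_d, the estimate
# recovers the factor 0.5 in every bin.
rng = np.random.default_rng(1)
X_d = rng.standard_normal((50, 4)) + 1j * rng.standard_normal((50, 4))
G_est = estimate_eef(X_d, 0.5 * X_d)
```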
The above technique effectively estimates the echo path
transfer function and takes the magnitude thereof to obtain
the real-valued EEF. Whenever the phase changes abruptly, such
as during echo path changes, time drift, etc., this EEF
estimation may have to re-converge. To make equation (8)
insensitive to phase variations, it can be modified to be
computed from the power spectra rather than from the complex
spectra [6]:

|Ĝ(k,m)|^2 = E{|X_d(k,m)|^2 |Z(k,m)|^2} / E{|X_d(k,m)|^4}.    (9)
In [6] it is shown that the estimate according to (9) is
biased. Thus, in [6] it is proposed to use another approach to
estimate the EEF, namely to estimate |G(k, m)|2 based on temporal
fluctuations of the power spectra of the loudspeaker and
microphone signals. The temporal fluctuations of the power
spectra may be computed according to

|X̃_d(k,m)|^2 = |X_d(k,m)|^2 - E{|X_d(k,m)|^2},    (10)

|Z̃(k,m)|^2 = |Z(k,m)|^2 - E{|Z(k,m)|^2}.    (11)
The estimation of the EEF is then performed analogously to
equation (9), but based on the fluctuating spectra of the
loudspeaker and the microphone:

|Ĝ(k,m)|^2 = E{|X̃_d(k,m)|^2 |Z̃(k,m)|^2} / E{|X̃_d(k,m)|^4}.    (12)
It is important to note that the fluctuating power spectra are
only used for the estimation of |Ĝ(k,m)|^2. The computation of
the echo suppression filter H(k,m) is still based on the
original power spectra of the loudspeaker and microphone
signals.
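A sketch of the fluctuation-based estimate, using batch means as a stand-in for the expectations (in practice floating averages would be used); function and variable names are illustrative:

```python
import numpy as np

def eef_from_fluctuations(X_pow, Z_pow, eps=1e-12):
    """Phase-insensitive EEF estimate from temporal fluctuations:
    the short-time mean is removed from the loudspeaker and
    microphone power spectra, and the squared EEF is obtained by
    correlating the mean-free power trajectories per bin.
    X_pow, Z_pow: power spectra, shape (num_blocks, num_bins)."""
    Xf = X_pow - X_pow.mean(axis=0)  # fluctuating loudspeaker power
    Zf = Z_pow - Z_pow.mean(axis=0)  # fluctuating microphone power
    return np.sum(Xf * Zf, axis=0) / (np.sum(Xf * Xf, axis=0) + eps)

# If the microphone power follows the loudspeaker power scaled by
# 0.25 plus a constant near-end floor, the estimate returns 0.25:
# the constant floor disappears with the mean, unlike in a direct
# power-ratio estimate.
rng = np.random.default_rng(2)
X_pow = rng.uniform(0.5, 2.0, size=(200, 4))
G2_est = eef_from_fluctuations(X_pow, 0.25 * X_pow + 1.0)
```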
The delay value d can be estimated using the squared coherence
function with respect to the loudspeaker and microphone power
spectra according to

Γ_d(k,m) = (E{|X_d(k,m)|^2 |Z(k,m)|^2})^2 / (E{|X_d(k,m)|^4} E{|Z(k,m)|^4}).    (13)
In general, the delay d can then be chosen differently for each
frequency bin m. Here, however, we consider one single delay
for all frequencies. Therefore, we compute an echo prediction
gain ω_d(k) as the mean of Γ_d(k,m) over frequency,

ω_d(k) = (1/M) Σ_m Γ_d(k,m),    (14)
where M denotes the number of frequency bins. Then, d is
chosen such that the echo prediction gain is maximized, i.e.,

d̂ = arg max_d ω_d(k).    (15)
Alternatively to equation (15), the estimation of the delay
value d can also be performed with respect to the fluctuating
spectra, i.e., based on equations (10), (11).
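The broadband delay selection can be sketched as follows; batch sums replace the expectations, and the function name and search range are illustrative assumptions:

```python
import numpy as np

def estimate_delay(X_pow, Z_pow, max_delay, eps=1e-12):
    """Choose one broadband delay d by maximizing an echo prediction
    gain: for each candidate d, the squared normalized correlation
    between the delayed loudspeaker power spectra and the microphone
    power spectra is computed per bin and then averaged over
    frequency. X_pow, Z_pow: shape (num_blocks, num_bins)."""
    n = X_pow.shape[0] - max_delay
    gains = []
    for d in range(max_delay + 1):
        X_seg = X_pow[:n]         # loudspeaker power, delayed by d
        Z_seg = Z_pow[d:d + n]    # microphone power aligned to it
        num = np.sum(X_seg * Z_seg, axis=0) ** 2
        den = np.sum(X_seg**2, axis=0) * np.sum(Z_seg**2, axis=0) + eps
        gains.append(np.mean(num / den))
    return int(np.argmax(gains))

# Microphone power that repeats the loudspeaker power three blocks
# later is detected with d = 3.
rng = np.random.default_rng(3)
X_pow = rng.uniform(0.0, 1.0, size=(120, 6))
Z_pow = np.vstack([np.zeros((3, 6)), X_pow[:-3]])
d_hat = estimate_delay(X_pow, Z_pow, max_delay=8)
```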
Note that in practice, the mathematical expectation E{...},
used in the derivations above, may have to be replaced by
corresponding short-time or floating averages. To give an
example, we consider

Φ_AB(k,m) = E{A(k,m) B*(k,m)}.    (16)

The short-time average Φ̂_AB(k,m) corresponding to Φ_AB(k,m)
can, for instance, be obtained by recursive smoothing according
to

Φ̂_AB(k,m) = α_avg Φ̂_AB(k-1,m) + (1 - α_avg) A(k,m) B*(k,m).    (17)
The factor α_avg determines the degree of smoothing over time
and may be adjusted to any given requirements.
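The recursive smoothing rule can be sketched directly; the smoothing constant and function name are illustrative choices:

```python
import numpy as np

def smoothed_cross_spectrum(A, B, a_avg=0.8):
    """First-order recursive smoothing of A(k, m) B*(k, m) over the
    block index k, as a short-time stand-in for the expectation
    E{A B*}. Larger a_avg means stronger smoothing (longer memory),
    at the price of slower tracking of changes."""
    phi = np.zeros(A.shape[1], dtype=complex)
    for k in range(A.shape[0]):
        phi = a_avg * phi + (1.0 - a_avg) * A[k] * np.conj(B[k])
    return phi

# For stationary inputs the recursion converges to the true
# expectation; with A = B = 1 in every block, phi approaches 1.
ones = np.ones((100, 2), dtype=complex)
phi = smoothed_cross_spectrum(ones, ones, a_avg=0.8)
```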
In the following we discuss how the single-channel AES
described in the previous section can be applied analogously
to multichannel AES.
Let X_l(k,m) denote the STFT-domain representation of the l-th
loudspeaker signal. A joint power spectrum for all loudspeaker
channels is then computed by combining the power spectra of
the individual loudspeaker signals:

|X(k,m)|^2 = Σ_{l=1}^{L} |X_l(k,m)|^2,    (18)
where L denotes the number of loudspeaker channels.
Alternatively, the joint power spectrum of the loudspeaker
signals may be obtained from adding the spectrum of each
loudspeaker signal and then taking the squared magnitude of
the joint spectrum:

|X(k,m)|^2 = |Σ_{l=1}^{L} X_l(k,m)|^2.    (19)
Analogously, a joint power spectrum is computed for the
microphone channels according to

|Z(k,m)|^2 = Σ_{p=1}^{P} |Z_p(k,m)|^2,    (20)

where Z_p(k,m) represents the signal of the p-th microphone, and
P denotes the number of microphones.
As in case of the loudspeaker signals, the joint microphone
power spectrum can alternatively be computed according to

|Z(k,m)|^2 = |Σ_{p=1}^{P} Z_p(k,m)|^2.    (21)
The desired model for the power spectrum of the echo is given
analogously to equation (2), when assuming statistical
independence of the loudspeaker signals and the near-end
signals:

|Z(k,m)|^2 = |G(k,m)|^2 |X_d(k,m)|^2 + |W(k,m)|^2,    (22)

where in the multichannel case the power spectra |X(k,m)|^2 and
|Z(k,m)|^2 are given by equations (18) and (20), respectively.
For determining the echo estimation filter |Ĝ(k,m)|^2 and the
delay value d, respectively, we may also apply the different
methods discussed above, but using the joint loudspeaker and
microphone power spectra defined here.
The actual echo suppression is then performed for each
microphone signal separately, but by using the same echo
removal filters for each microphone channel:

Ẑ_p(k,m) = H(k,m) Z_p(k,m),  p = 1, ..., P.    (23)
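The joint power spectra and the shared suppression filter can be sketched as follows (summation variant; names are illustrative assumptions):

```python
import numpy as np

def joint_power_spectrum(channels):
    """Combine per-channel power spectra into one joint power
    spectrum by summation (one of the two options in the text; the
    alternative sums the complex spectra first and then squares).
    channels: complex STFTs, shape (num_channels, blocks, bins)."""
    return np.sum(np.abs(channels) ** 2, axis=0)

def suppress_all_microphones(mics, H):
    """Apply the same real-valued suppression filter H(k, m),
    derived from the joint spectra, to every microphone channel
    (H broadcasts over the leading channel axis)."""
    return H * mics

# Two microphones with unit spectra: joint power is 2 per bin, and
# the shared filter H = 0.5 attenuates both channels identically.
mics = np.ones((2, 3, 4), dtype=complex)
joint = joint_power_spectrum(mics)
H = np.full((3, 4), 0.5)
out = suppress_all_microphones(mics, H)
```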
In this section we will review some important examples of
parametric spatial audio representation and parametric spatial
audio coding. Thereby, we consider the approaches Directional
Audio Coding (DirAC) [12], MPEG Surround (MPS) [1], and MPEG
Spatial Audio Object Coding (SAOC) [5]. Before looking into
specific details of the different coding approaches, we
consider the basic encoder/decoder structures which are common
for all methods discussed here.
The general structure of a parametric spatial audio encoder is
illustrated in Fig. 2. Fig. 2 shows a multichannel or a
parametric spatial audio encoder 400. The encoder takes
multiple audio signals as input and outputs a downmix signal

of one or more channels and the corresponding parametric side
information. To be a little more specific, the multichannel
encoder 400 is provided with a plurality of input signals
410-1, ..., 410-N, which may, in principle, be any audio signal.
Based on the input signals 410, the encoder 400 provides a
downmix signal 310 and parametric side information 320, which
together represent the plurality of input signals 410. In many
cases and implementations of a multichannel encoder 400 this
representation is typically not lossless.
The encoder takes as input multiple audio channels. Depending
on the actual coding approach, these audio input channels can
represent microphone signals [12], loudspeaker signals [10],
or so-called spatial audio objects [5]. The output of the
encoder is the downmix signal
310 and corresponding side information 320. The downmix signal
comprises one or more audio channels. The side information
includes parametric metadata, representing the observed sound
field, the relation between different input channels, or the
relation between different audio objects. The output of the
encoder, i.e. the combination of the downmix signal and the
side information, is called spatial audio stream or spatial
audio representation in the following.
The general structure of a corresponding parametric spatial
audio decoder is illustrated in Fig. 3. Fig. 3 shows a
(multichannel) decoder 250, which takes a downmix signal 310
and corresponding parametric side information 320 as input.
The multichannel decoder 250 outputs a plurality of output
signals 420-1, ..., 420-N, which may be, for instance,
loudspeaker signals (e.g. loudspeaker signals 330 as shown in
Fig. 1) corresponding to a desired playback configuration. As
can be seen, the decoder takes the spatial audio stream as
input. Based on the downmix signal and the metadata included
in the side information, the decoder computes loudspeaker
signals corresponding to a desired playback configuration.
Typical loudspeaker setups are, for instance, described in
[1].

One example of a parametric spatial audio coding scheme is
directional audio coding, which is also referred to as DirAC.
DirAC uses a parametric representation of a sound field using
the direction-of-arrival (DOA) and diffuseness of sound in
frequency subbands. Hence, it only takes features into account
that are relevant for human hearing. The DirAC approach is
based on the assumption that interaural time differences (ITD)
and the interaural level differences (ILD) are perceived
correctly, if the direction-of-arrival of a sound field is
correctly reproduced. Correspondingly, the interaural
coherence (IC) is assumed to be perceived correctly, if the
diffuseness of a sound field is reproduced correctly. In this
way the reproduction side only needs the direction and
diffuseness parameters and a mono microphone signal to
generate features that are relevant for human perception of
spatial audio at a given listening position with an arbitrary
set of loudspeakers.
In DirAC, the desired parameters (i.e. the DOA φ(k,m) of sound
and the diffuseness ψ(k,m) in each frequency band) are
estimated via an energetic analysis of the sound field [12]
based on B-format microphone signals. B-format microphone
signals typically comprise an omnidirectional signal W(k,m)
and two dipole signals (U_x(k,m), U_y(k,m)) corresponding to
the x- and y-directions of a Cartesian coordinate system. The
B-format signals may be directly measured using, for instance,
sound field microphones [2]. Alternatively, an array of
omnidirectional microphones can be used to generate the
required B-format signals [11].
On the reproduction side (decoder), the different loudspeaker
signals are computed based on a mono downmix signal together
with the direction and diffuseness parameters. The loudspeaker
signals are composed of signal components corresponding to
direct sound and to diffuse sound, respectively. The signal of
the p-th loudspeaker channel can, for instance, be computed
according to

U_p(k,m) = g_p(k,m) √(1 - ψ(k,m)) W(k,m) + √(ψ(k,m)) D_p{W(k,m)},    (24)
where ψ(k,m) denotes the diffuseness at frequency subband m and
block time index k. The panning gain g_p(k,m) depends on both
the DOA of sound φ(k,m) and the position of the loudspeaker p
relative to the desired listening position. The operator
D_p{...} corresponds to a decorrelator, which is applied to the
downmix signal W(k,m) when computing the p-th loudspeaker
signal.
From the above discussion it follows that the microphone
signals (B-format or array of omnidirectional microphones)
represent the input of the DirAC encoder 400. The output of
the encoder is given by the downmix signal W(k,m) and the
direction φ(k,m) and diffuseness ψ(k,m) parameters as side
information.
Correspondingly, the decoder 250 takes the downmix signal
W(k,m) and the parametric side information φ(k,m) and ψ(k,m)
as input to compute the desired loudspeaker signals according
to equation (24).
MPEG Surround (MPS) represents an efficient approach to high-
quality spatial audio coding [10]. A complete specification of
MPS can be found in [1]. In the following we will not look
into the details of MPS, but rather review those parts that
are relevant in the context of embodiments according to the
invention.
MPS exploits the fact that, from a perceptual point of view,
multichannel audio signals typically comprise significant
redundancy with respect to the different loudspeaker channels.
The MPS encoder takes multiple loudspeaker signals as input,
where the corresponding spatial configuration of the
loudspeakers has to be known in advance. Based on these input
signals, the MPS encoder 400 computes spatial parameters in
frequency subbands, such as channel level differences (CLD)
between two channels, inter-channel correlation (ICC) between
two channels, and channel prediction coefficients (CPC) used
to predict a third channel from two other channels. The actual
MPS side information 320 is then derived from these spatial
parameters. Furthermore, the encoder 400 computes a downmix
signal which may comprise one or more audio channels.
In the mono case, the downmix signal B(k,m) obviously comprises
only one channel, whereas in the stereo case, the downmix
signal may be written as

B(k,m) = [B1(k,m), B2(k,m)]^T,

where, for instance, B1(k,m) corresponds to the left
loudspeaker channel and B2(k,m) denotes the right loudspeaker
channel of a common stereo loudspeaker configuration.
The MPS decoder 250 takes the downmix signal and the
parametric side information as input and computes the
loudspeaker signals 330, 420 for a desired loudspeaker
configuration. The general structure of the signal processing
chain used in the MPEG surround decoder is illustrated in Fig.
4 for the stereo case.
Fig. 4 shows a schematic representation of a MPEG surround
decoder 250. To the decoder 250 the downmix signal 310 and
parametric side information are provided. The downmix signal
310 comprises the downmix signal channels B1(k,m) and B2(k,m),
which correspond to the left and right loudspeaker channels of
a common stereo configuration.
In a pre-mixing matrix 450 (M1) the two channels of the downmix
signal 310 are transformed into an intermediate signal vector
V(k,m). Parts of the components of the intermediate signal
vector V(k,m) are then provided to a plurality of
decorrelators 460-1, ..., 460-P that decorrelate the
respective components of the intermediate signal vector. The
signals provided by the decorrelators 460 along with the
undecorrelated signals or signal components of the
intermediate signal vector V(k,m) form a second intermediate
signal vector R(k,m), which in turn is provided to the post-
mixing matrix 470 (M2). The post-mixing matrix 470 provides at
its output a plurality of loudspeaker signals 330-1, ..., 330-
P, which represent the output signals 420 in terms of the
decoder shown in Fig. 3.
The decoder 250 further comprises a parameter processor 480 to
which the parametric side information 320 is provided. The
parameter processor 480 is adapted to receive the parametric
side information 320 and to generate corresponding matrix
elements to be processed by the pre-mixing matrix 450 and the
post-mixing matrix 470. To facilitate this, the parameter
processor 480 is coupled to both the pre-mixing matrix 450 and
the post-mixing matrix 470.
As implied by Fig. 4, the decoding process may be written in
matrix notation according to

V(k,m) = M1(k,m) B(k,m),   (26)
X(k,m) = M2(k,m) R(k,m).   (27)

Following [1], M1(k,m) denotes the pre-mixing matrix 450 and
M2(k,m) the post-mixing matrix 470. Note that the elements of
M1(k,m) and M2(k,m) depend on the spatial side information and
the loudspeaker configuration used for playback, which may be
provided by the parameter processor 480.
As can be seen from Fig. 4, the relation between the
intermediate signal vectors V(k,m) and R(k,m) is given as
follows: one part of the signal vector elements Vp(k,m) is kept
unchanged (Rp(k,m) = Vp(k,m)), while the other components of
R(k,m) are decorrelated versions of the corresponding elements
of V(k,m), i.e., Rp(k,m) = Dp{Vp(k,m)}, wherein Dp{·} describes
a decorrelator operator. The elements of the signal vector
X(k,m) correspond to the multichannel loudspeaker signals
Xp(k,m) used for playback.
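The chain B → M1 → (partial decorrelation) → M2 → X described above can be sketched per time-frequency tile as follows; the concrete matrices and the decorrelator set are illustrative stand-ins for what the parameter processor 480 would supply.

```python
import numpy as np

def mps_decode_tile(B, M1, M2, decorrelators):
    """Sketch of the MPS decoding chain for one time-frequency tile:
    V = M1 B, selected components of V are decorrelated to form R
    (R_p = D_p{V_p}, the rest pass through unchanged), and X = M2 R.

    decorrelators : dict mapping an index p of V to a callable D_p;
                    indices not present are kept as-is (R_p = V_p).
    """
    V = M1 @ B
    R = np.array([decorrelators[p](v) if p in decorrelators else v
                  for p, v in enumerate(V)])
    return M2 @ R
```

With identity matrices and an empty decorrelator set the chain reduces to a pass-through, which is a convenient sanity check.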
It should be noted that MPS assumes loudspeaker channels as
input, whereas in teleconferencing systems the input consists
of recorded microphone signals. A conversion of the microphone
input signal to corresponding loudspeaker channels may be
required before MPS can be applied for determining the desired
efficient spatial audio representation of the recorded sound.
One possible approach is to simply use multiple directional
microphones which are arranged such that the loudspeaker
channels can be directly computed by a combination of the
microphone input signals. Alternatively, a DirAC-based
computation of the loudspeaker channels may be applied,
comprising a direct connection of a DirAC encoder and a DirAC
decoder as described in the previous sections.
Spatial Audio Object Coding (SAOC) is based on the concept of
representing a complex audio scene by a number of single
objects together with a corresponding scene description. In
order to implement an efficient way to achieve this goal, SAOC
applies techniques that are closely related to MPS [5]. As
before, we will only consider those parts of the SAOC concept
that are relevant in the context of this invention. More
details can be found, for instance, in [5].
The general structure of an SAOC encoder is shown in Fig. 2,
where the input signals 410 correspond to audio objects. From
these input signals 410, the SAOC encoder 400 computes a
downmix signal 310 (mono or stereo) along with corresponding
parametric side information 320 representing the relation of
the different audio objects in the given audio scene. Similar
to MPS, these parameters are computed for each block time
index and each frequency subband. These parameters include
Object Level Differences (OLD), Inter-Object Cross Coherence
(IOC), Object Energies (NRG), and other downmix-signal-
related measures and parameters [5].
The SAOC decoder 250 takes the downmix signal 310 together
with the corresponding side information 320 as input, and
outputs the loudspeaker channel signals for a desired
loudspeaker configuration. The SAOC decoder also uses the MPS
rendering engine for determining the final loudspeaker
signals. Note that in addition to the side information
generated by the SAOC encoder 400, the SAOC decoder 250 takes
also information of the loudspeaker configuration used for
rendering, or other interactive information with respect to
controlling specific audio objects, as input for computing the
final output signals. This is illustrated in Fig. 5.
Fig. 5 illustrates the general structure of a SAOC decoder
250. To the SAOC decoder 250 a downmix signal 310 along with
the parametric side information 320 are provided.
Additionally, the SAOC decoder 250 is also provided with
rendering or interaction information 490. As described above,
the SAOC decoder 250 takes the downmix signal 310, the
parametric side information 320 along with a
rendering/interaction parameter 490 to generate a plurality of
loudspeaker signals 330-1, ..., 330-N. These signals are
output by the SAOC decoder 250.
Let us now consider the SAOC decoder for the case of a mono
downmix signal and a stereo downmix signal, respectively.
Following [5], the structure of the SAOC decoder is
illustrated in Fig. 6(a) for a mono downmix and in Fig. 6(b)
for the stereo case.
Fig. 6a illustrates more specific details concerning a mono
downmix-based transcoder, which may be used as an SAOC-to-MPS-
transcoder according to [5]. The system shown in Fig. 6a
comprises an MPEG surround decoder 250, to which a downmix
signal 310 and a MPEG surround bitstream as parametric side
information 320 is provided. The MPEG surround decoder 250
outputs, in the situation shown in Fig. 6a, at least five
loudspeaker signals 330-1, ..., 330-5. Optionally, the MPEG
surround decoder 250 may also output further loudspeaker
signals, such as a subwoofer loudspeaker signal. However, a
corresponding subwoofer loudspeaker is not shown in Fig. 6a
for the sake of simplicity, while corresponding loudspeakers
100-1, ..., 100-5 for each of the loudspeaker signals 330 are
shown in Fig. 6a.
While the downmix bitstream 310 is directly provided to the
MPEG surround decoder 250, the parametric side information 320
is provided by an SAOC-to-MPS transcoder 500. The transcoder
500 comprises an SAOC parsing unit 510 to which an SAOC
bitstream as an input signal 520 is provided. The SAOC parsing
unit 510 provides as one of its output signals information
concerning a number of objects 530.
The SAOC parsing unit 510 is furthermore coupled to a scene
rendering engine 540, which processes data received from the
SAOC parsing unit 510 based on a rendering matrix 550
generated by a rendering matrix generator 560 to produce the
corresponding side information 320 for the MPEG surround
decoder 250. Accordingly, the scene rendering engine 540 and
its output, at which the side information 320 is provided to
the MPEG surround decoder 250, also represent the output of
the transcoder 500.
The rendering matrix generator 560 is provided with
information concerning the playback configuration 570 as well
as with information concerning the object positions 580 on the
basis of which the rendering matrix generator 560 provides the
rendering matrix 550.
The mono downmix decoding comprises transcoding the SAOC side
information 520 to MPS side information 320, based on the
given object positions 580 and the loudspeaker configuration
570 used for the playback. The so-determined MPS side
information 320 is fed into the MPS decoder 250 together with
the SAOC

mono downmix signal 310. Since the downmix signal 310 remains
unchanged, the computation of the loudspeaker signals can also
be expressed according to equations (26), (27), where the pre-
mixing matrix M1(k,m) and the post-mixing matrix M2(k,m) are
determined by the SAOC-to-MPS transcoder.
Fig. 6b shows a similar SAOC-to-MPS transcoder 500 compared to
the corresponding transcoder 500 shown in Fig. 6a. Therefore,
reference is made to the description above. However, both the
system as well as the transcoder 500 differ mainly with
respect to the downmix signal 310, which is in the situation
depicted in Fig. 6b a stereo downmix signal. Accordingly, the
MPEG surround decoder 250 differs from the corresponding MPEG
surround decoder of Fig. 6a by the fact that the downmix
signal 310 comprises two channels such that the decoder 250 is
adapted to generate the loudspeaker signals 330 on the basis
of the side information 320 and the stereo downmix signal 310.
The system shown in Fig. 6b differs from the system shown in
Fig. 6a with respect to further details. The transcoder 500
further comprises a downmix transcoder 590 which receives an
original downmix signal 310' as well as control information
600 from the scene rendering engine 540. The downmix
transcoder 590 is therefore adapted to generate the downmix
signal 310 based on the control information 600 and the
original or incoming downmix signal 310'.
In the stereo case, the SAOC downmix signal 310' may not
represent a suitable input for the MPS decoder. An example of
such a situation is when the signal components of one object
are included only in the left channel of the SAOC stereo
downmix 310', while the object should be rendered to the right
hemisphere during the MPS mixing process [5]. Then, as shown
in Fig. 6(b), the SAOC downmix signal 310' has to be processed
by the so-called downmix transcoder 590 before it can be used
as input for the MPS decoder 250. The specific properties of
this processing stage depend on the actual SAOC side
information 520 and the playback configuration 570. Obviously,

the relation of the transcoded downmix signal 310 and the
loudspeaker channels 330 used for playback can then be
expressed by equations (26), (27), too.
It should be noted that SAOC assumes signals corresponding to
an ensemble of audio objects as input, whereas in
teleconferencing systems, the input typically comprises
recorded microphone signals. A conversion of the microphone
input signal to a corresponding spatial audio object
representation may be useful before SAOC can be applied for
determining the desired efficient spatial audio representation
of the recorded sound. A possible approach to determine
different audio objects from a microphone array input is given
by blind source separation techniques such as [3]. Based on
the microphone input signals, blind source separation methods
exploit the statistical independence of different audio
objects to estimate the corresponding audio signals. In case
that the configuration of the microphone array is known in
advance, additional spatial information with respect to the
audio objects can be determined, too [4].
For the sake of simplicity, it should be noted that
throughout the description, information and signals carrying
the respective information have been identified with the same
reference sign. Moreover, the signals and the data lines over
which same are transported have also been identified with the
same reference signs. Depending on the concrete implementation
of an embodiment of the present invention, information may be
exchanged between different units or objects by signals
transmitted directly over signal lines or by virtue of a
memory, a storage location or another mediator (e.g. a latch)
coupled in between the respective units or objects. For
instance, in the case of processor-based implementation,
information may be, for instance, stored in a memory
associated with the respective processor. Therefore,
information, pieces of information and signals may be
synonymously referred to.

Based on the discussion of acoustic echo suppression and
parametric spatial audio coding presented in the previous
sections, we now present a method for efficiently integrating
acoustic echo suppression (AES) into a spatial audio encoder/
decoder structure as used in spatial audio telecommunication,
according to an embodiment of the present invention.
The general structure of the proposed approach is illustrated
in Fig. 7. Fig. 7 shows a conferencing front-end 200 according
to an embodiment of the present invention, wherein the
acoustic echo suppression is based on the downmix signals of
parametric spatial audio coders.
The conferencing front-end 200 as shown in Fig. 7 comprises an
acoustic echo suppression unit 210 according to an embodiment
of the present invention with an input interface 230, which is
coupled to an echo removal or echo suppression unit 700 such
that a downmix signal 310 comprised in an input signal 300
provided to the input interface 230 is provided thereto. In
the embodiment shown in Fig. 7, the parametric side
information 320, which is also separated from the input signal
300 by the input interface 230, is not provided to the echo
suppression unit 700.
Both the downmix signal 310 and the parametric side
information 320 are provided to a multichannel decoder 250,
which is output-wise coupled to a plurality of loudspeakers
100-1, ..., 100-N. The decoder 250 provides to each of the
loudspeakers 100 a corresponding loudspeaker signal 330-1,
..., 330-N.
The conferencing front-end 200 further comprises a plurality
of microphones 110-1, ..., 110-K, which provide acoustic input
signals to the conferencing front-end 200. In contrast, the
loudspeakers 100 provide the equivalent acoustic output. The
microphones 110 are coupled to a processing unit 710 and
further to an encoder 400, which is adapted to generate a
further downmix signal 720 and further parametric side

information 730 corresponding to the pre-processed microphone
signals received from the microphones 110. The echo
suppression unit 700 is coupled to the encoder 400 such that
the echo suppression unit 700 is capable of receiving both
the further downmix signal 720 and the further side
information 730. At an output, the echo suppression unit 700
provides a modified downmix signal 740 along with the further
parametric side information 730 which passes through the echo
suppression unit 700 without being altered.
The echo suppression unit 700 will be outlined in more detail
with respect to Fig. 8 and comprises a calculator 220 and the
adaptive filter 240 as shown in Fig. 1.
Here, a spatial audio communication application is considered,
where we assume that the spatial audio scenes at the far-end
and at the near-end are represented by spatial audio streams
which are transmitted between the different subscribers. Since
hands-free operation is often essential in case of surround
playback with multiple loudspeakers, an AES unit 210 may be
useful to remove annoying echoes in the output of the near-
end's decoder. In contrast to previous methods described
above, where the AES is performed based on the loudspeaker
signals, we propose to perform the AES solely based on the
downmix signal 310 of the spatial audio stream 300 received
from the far-end. Since the number of downmix channels is in
general much lower than the number of loudspeaker signals used
for the playback, the proposed method is significantly more
efficient with respect to complexity. The AES can be applied
to either the microphone signals at the near-end, or, even
more efficiently, to the downmix signal of the near-end's
encoder output, as illustrated in Fig. 7.
Before describing the echo suppression unit 700 in more detail
in context with Fig. 8, in the following the process or method
according to an embodiment of the present invention will be
described in more detail.

First, a reference power spectrum (RPS) |P(k,m)|² of the
playback signals is computed based on the downmix signal 310
of the received spatial audio stream. In the general case of
an N-channel downmix signal
B(k,m) = [B1(k,m), B2(k,m), ..., BN(k,m)]^T,
this can be performed according to a linear combination

|P(k,m)|² = a1(k,m)|B1(k,m)|² + ... + aN(k,m)|BN(k,m)|².   (28)

Alternatively, the linear combination can be computed with
respect to the complex spectra of the downmix channels:

|P(k,m)|² = |a1(k,m)B1(k,m) + ... + aN(k,m)BN(k,m)|².   (29)

The weighting factors ai(k,m) may be used to control the
contribution of the different downmix channels to the RPS.
A different weighting of the channels may be, for instance,
beneficial in the context of SAOC. When the input of the AES
is determined before the downmix transcoder is applied to the
SAOC downmix signal (see Fig. 6(b)), the time-variant behavior
of the downmix transcoder may not have to be modeled by the
echo estimation filter, but is already captured by the
computation of the reference power spectrum.
For the special case of a mono downmix signal, it is
reasonable to simply choose the RPS equal to the power
spectrum of the downmix signal, i.e., |P(k,m)|² = |B(k,m)|².
In other words, the weighting coefficient a1(k,m) is chosen to
be one for the single downmix channel comprised in the downmix
signal 310.

Analogously to equations (28), (29), we compute an RPS
|Q(k,m)|² of the recorded signals based on the K-channel
downmix signal A(k,m) = [A1(k,m), A2(k,m), ..., AK(k,m)]^T of
the near-end's encoder:

|Q(k,m)|² = c1(k,m)|A1(k,m)|² + ... + cK(k,m)|AK(k,m)|².   (30)

Alternatively, the linear combination may be computed with
respect to the complex spectra of the downmix channels:

|Q(k,m)|² = |c1(k,m)A1(k,m) + ... + cK(k,m)AK(k,m)|².   (31)

The weighting factors ci(k,m) may be used to control the
contribution of the different downmix channels to the RPS. As
before, we can simply use |Q(k,m)|² = |A(k,m)|² in case of a
mono downmix signal (c1(k,m) = 1).
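Both RPS computations amount to a weighted sum of per-channel power spectra; a minimal sketch follows (the uniform default weights are an assumption of this sketch).

```python
import numpy as np

def reference_power_spectrum(downmix, weights=None):
    """Weighted combination of per-channel power spectra, as in
    equations (28)/(30): |P(k, m)|^2 = sum_i a_i |B_i(k, m)|^2.

    downmix : complex array of shape (channels, subbands)
    weights : per-channel weights a_i; defaults to all ones, so a mono
              downmix reduces to |P|^2 = |B|^2.
    """
    power = np.abs(downmix) ** 2
    if weights is None:
        weights = np.ones(downmix.shape[0])
    # (channels,) @ (channels, subbands) -> (subbands,)
    return np.asarray(weights) @ power
```

The same function serves for |Q(k,m)|² by passing the near-end downmix A(k,m) and the weights ci(k,m).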
The downmix signal A(k,m) and, thus, also the RPS |Q(k,m)|²
typically contain undesired echo components resulting from a
feedback of the loudspeaker signals. An estimate |Ĉ(k,m)|² of
the echo components |C(k,m)|² is computed based on a delayed
version of the RPS |P(k,m)|² and an estimate of the echo power
transfer function according to

|Ĉ(k,m)|² = Ĝ(k,m) |P(k − d, m)|².   (32)

Analogously to the description above, Ĝ(k,m) is called the
echo estimation filter (EEF) in the following.
This estimate is then used to determine an echo suppression
filter (ESF), e.g., analogously to (5):

H(k,m) = [(|Q(k,m)|^α − β |Ĉ(k,m)|^α) / |Q(k,m)|^α]^γ,   (33)

where α, β, and γ represent design parameters to control the
echo suppression performance. Typical values for α, β, and γ
have been given above.
The removal of the undesired echo components is finally
obtained by multiplying the channels of the original downmix
signal of the near-end's encoder with the ESF:

Ãi(k,m) = H(k,m) Ai(k,m), for i = 1, ..., K.   (34)
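The estimate-subtract-apply sequence of the preceding paragraphs can be sketched as follows; the exact gain rule and the lower gain bound ("floor") are assumptions of this sketch rather than values prescribed by the text.

```python
import numpy as np

def suppress_echo(A, P_delayed, G, Q, alpha=2.0, beta=2.0, gamma=1.0, floor=0.05):
    """Sketch of the downmix-domain suppression step: estimate the echo
    power from the delayed far-end RPS and the EEF (cf. (32)), derive a
    spectral-subtraction-type gain (cf. (33)), and apply it to each
    channel of the near-end downmix (cf. (34)).

    A         : near-end downmix spectra, shape (channels, subbands)
    P_delayed : delayed far-end RPS |P(k - d, m)|^2, shape (subbands,)
    G         : echo estimation filter per subband
    Q         : near-end RPS |Q(k, m)|^2, shape (subbands,)
    """
    echo_power = G * P_delayed                        # estimated |C(k, m)|^2
    q = Q ** (alpha / 2.0) + 1e-12                    # |Q|^alpha, regularized
    c = echo_power ** (alpha / 2.0)                   # |C|^alpha
    H = np.maximum((q - beta * c) / q, 0.0) ** gamma  # suppression gain
    H = np.maximum(H, floor)                          # assumed gain floor
    return H * A                                      # echo-reduced downmix
```

With G = 0 (no estimated echo) the gain is unity and the downmix passes through unchanged; with a dominant echo estimate the gain collapses to the floor value.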
The estimation of the EEF can be based on a correlation with
respect to the RPSs according to

Ĝ(k,m) = E{|P(k − d, m)|² |Q(k,m)|²} / E{|P(k − d, m)|⁴}.   (35)

Alternatively, the EEF can be estimated using temporal
fluctuations of the RPSs, i.e., analogously to (12):

Ĝ(k,m) = E{P̃(k − d, m) Q̃(k,m)} / E{P̃²(k − d, m)},   (36)

where the temporal fluctuations of the RPSs are computed
according to

P̃(k,m) = |P(k,m)|² − E{|P(k,m)|²},   (37)
Q̃(k,m) = |Q(k,m)|² − E{|Q(k,m)|²}.   (38)
The estimation of the delay parameter d may be performed
analogously to (13), when replacing the loudspeaker and
microphone signals X(k,m) and Z(k,m) by the corresponding RPSs
P(k,m) and Q(k,m), respectively.
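A block-averaged least-squares version of the fluctuation-based EEF estimate can be sketched as follows; averaging over a finite frame history instead of the recursive expectation operators E{·} is an assumption of this sketch.

```python
import numpy as np

def estimate_eef(P_hist, Q_hist, d, eps=1e-12):
    """Per-subband estimate of the echo estimation filter from the
    temporal fluctuations of the two RPSs (cf. (36)-(38)): remove the
    mean from each RPS trajectory, then take the ratio of the cross-
    correlation to the auto-correlation of the delayed far-end RPS.

    P_hist, Q_hist : arrays of shape (frames, subbands) with |P|^2, |Q|^2
    d              : integer delay in frames
    """
    n = len(P_hist)
    P_del = P_hist[:n - d] if d else P_hist   # |P(k - d, m)|^2 trajectory
    Q_cur = Q_hist[d:]
    P_fl = P_del - P_del.mean(axis=0)         # P-tilde, eq. (37)-style
    Q_fl = Q_cur - Q_cur.mean(axis=0)         # Q-tilde, eq. (38)-style
    return np.sum(P_fl * Q_fl, axis=0) / (np.sum(P_fl ** 2, axis=0) + eps)
```

When the near-end RPS is, per subband, a scaled and delayed copy of the far-end RPS, the estimator recovers the scaling factor, which is the behavior the EEF is meant to capture.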
It should be mentioned that typically there is no meaningful
phase relation between the downmix signals A(k,m) and B(k,m).
This is because their phases are related not only through the
frequency response of the room, but also by the highly time-
variant process of determining the loudspeaker signals from
the downmix signal and the spatial side information. Thus,
approaches which use the phase information to estimate the EEF
(or the delay), such as (8), are not suitable when performing
the echo removal based on the downmix signals.
It is worth mentioning that the same reasoning holds for the
case that echo cancellation using linear adaptive filtering
techniques should be applied with respect to the downmix
signals. Such adaptive filters would have to model and track
the highly time-variant changes caused by the mapping of the
downmix signal to the loudspeaker channels.
Fig. 8 shows a block diagram of a conferencing front-end 200
according to an embodiment of the present invention, which is
fairly similar to the one shown in Fig. 1. Accordingly,
reference is made to the description of Fig. 1.
The conferencing front-end 200 also comprises an acoustic echo
suppression unit 210 according to an embodiment of the present
invention, which in turn comprises a calculator 220 for
performing essentially the same functionality as described in
context with Fig. 1. However, in the following a more detailed
description will be given.

The conferencing front-end 200 further comprises an input
interface 230 and an adaptive filter 240. The conferencing
front-end 200 further comprises a multichannel decoder 250,
which is coupled to a plurality of loudspeakers 100-1, ...,
100-N. The conferencing front-end 200 further comprises a
corresponding encoder or multichannel encoder 400, which in
turn is coupled to a plurality of microphones 110-1, ..., 110-
K.
To be a little more specific, an input signal 300 is provided
to the input interface 230 from the far-end of a communication
system underlying the front-end 200. In the embodiment shown
in Fig. 8, the input interface 230 separates a downmix signal
310 and parametric side information 320 from the input signal
and provides same as the input signals to the multichannel
decoder 250. Inside the multichannel decoder 250 the two
signals, the downmix signal 310 and the parametric side
information 320, are decoded into a plurality of corresponding
loudspeaker signals 330, which are then provided to the
respective loudspeakers 100. For the sake of simplicity, only
the first loudspeaker signal 330-1 is labeled as such.
The decoder 250 comprises, in the embodiment shown in Fig. 8,
an upmixer 705 and a parameter processor 480. The upmixer 705
is coupled to the input interface 230 and adapted to receive
the downmix signal 310. Similarly, the parameter processor 480
is also coupled to the input interface 230, but adapted to
receive the parametric side information 320. The upmixer 705
and the parameter processor 480 are interconnected such that
upmix control information 707 derived from the parametric side
information 320 may be transmitted to the upmixer 705. The
upmixer 705 is also coupled to the loudspeakers 100.
With respect to its functionality, the upmixer 705 is adapted
to generate the loudspeaker signals 330 from the downmix
signal 310 based on the upmix control information 707 derived
from the parametric side information 320. For each of the N (N

being an integer) loudspeakers 100-1, ..., 100-N, the upmixer
705 provides an individual loudspeaker signal 330.
As discussed before, the decoder 250 may optionally comprise
an interface, which extracts the side information 320 and the
downmix 310 and provides same to the parameter processor 480
and the upmixer 705, respectively, in case the input interface
230 is not shared by the decoder 250 and the acoustic echo
suppression unit 210.
As already described in context with Fig. 1, an output of the
input interface 230 is coupled to the calculator 220 to
provide the downmix signal 310 to the calculator 220. In other
words, the calculator 220 is adapted to receive the downmix
signal 310.
Before describing the internal structure of the calculator 220
in more detail, it should be noted that the microphones 110
provide a respective number K (K being an integer) of
microphone signals 340 to the multichannel encoder 400, of
which only the first microphone signal 340-1 is labeled as
such in Fig. 8.
Based on the received microphone signals 340, the multichannel
encoder 400 generates a further downmix signal 720 and further
parametric side information 730. While the further parametric
side information 730 is provided to an output of the
conferencing system 200, the further downmix signal 720 is
provided to both the calculator 220 and the adaptive filter
240. The
calculator 220 also provides a filter coefficient signal 350
to the adaptive filter 240 on the basis of which the further
downmix signal 720 is filtered to obtain a modified downmix
signal 740 at an output of the adaptive filter 240. The
modified downmix signal 740 represents an echo-suppressed
version of the incoming further downmix signal 720. As a
consequence, on the receiver side of the further downmix
signal 720 and the further parametric side information 730 an

echo-suppressed version of the microphone signal received by
the microphones 110 may be reconstructed.
With respect to the internal structure of the calculator 220,
the downmix signal 310 from the input interface 230 is
provided to the first reference power spectrum generator 800,
which is adapted to generate the previously described
reference power spectrum, for instance, according to equations
generator 800 is coupled to an optional delayer 810, which is
adapted to delay an incoming signal by a delay value d. An
output of the delayer 810 is then coupled to an echo estimator
820, which may be, for instance, adapted to calculate an echo
estimate according to equation (32). An output of the echo
estimator 820 is then coupled to an input of an echo
suppression filter generator 830, which generates or estimates
the echo suppression filter according to equation (33). An
output of
the echo suppression filter generator 830 is the filter
coefficient signal 350 comprising the filter coefficient,
which is provided to the adaptive filter 240.
The further downmix signal 720 as generated by the encoder 400
is provided to the echo suppression filter generator 830, if
this circuit comprises a second reference power spectrum
generator 840, or directly to the second reference power
spectrum generator 840. To achieve this, the acoustic echo
suppression unit 210 may optionally comprise an additional or
further input interface to extract the further downmix signal
720, if necessary.
An output of the second reference power spectrum generator 840
is then coupled to an echo estimation filter coefficient
generator 850, which in turn is coupled to the echo estimator
820 to provide the echo estimation filter coefficients
according to equation (35) or (36) to the echo estimator 820.
In case
the echo estimation filter coefficient generator 850 operates
based on equation (36), optional first and second temporal
fluctuation compensators 860, 870 are coupled in between the

echo estimation filter coefficient generator 850 and an output
of the delayer 810 and the second reference power spectrum
generator 840, respectively. The two temporal fluctuation
compensators 860, 870 may then be adapted to calculate
modified reference power spectra based on equations (37) and
(38), respectively. Then, the echo estimation filter
coefficient generator 850 may use the modified reference power
spectra to operate based on equation (36).
It should be noted that the delayer 810 is not a required, but
an often useful, component. A determination of the delay value d
may be achieved based on computations according to equations
(13), (14) and (15). To be more precise, an embodiment
according to the present invention may therefore comprise a
coherence calculator 880, which input-wise is coupled to an
output of the first reference power spectrum generator 800.
Furthermore, the coherence calculator 880 is also coupled to
an output of the second reference power spectrum generator 840
to provide the coherence calculator 880 with a respective
reference power spectrum.
For instance, based on equation (13) but with the two
reference power spectra as provided by the two reference power
spectrum generators 800, 840, the coherence calculator 880 may
generate values of a coherence function according to equation
(13) and provide them to an echo prediction gain calculator
890, which calculates the echo prediction gain ωd(k) according
to or based on equation (14). An output of the echo prediction
gain calculator 890 is then coupled to an input of an
optimizer 900, which may be adapted to optimize the delay
value d according to equation (15). To provide the delay value
d to the delayer 810, the optimizer 900 is coupled to the
delayer 810 and the delayer 810 is adapted to receive the
delay value d.
Naturally, the delayer is also in this case adapted to delay
the incoming signal (here the first reference power spectrum)
by the delay value d.

For the sake of completeness also the echo suppression unit
700 is shown in Fig. 8, which comprises a calculator 220 as
well as the adaptive filter 240 as already outlined in the
context of Fig. 7.
In the remainder of this section we will present practical
variations of the above method for downmix signal based echo
suppression.
We can obtain a variation of equation (32) according to

|Ĉ(k,m)|² = Ĝ(k,m) |P(k − d, m)|²,   (39)

where the complex reference spectrum of the playback signals
P(k,m) is computed with respect to the complex spectra of the
downmix channels, i.e., according to

P(k,m) = a1(k,m)B1(k,m) + ... + aN(k,m)BN(k,m).   (40)

Equation (40) results from (29) by discarding the magnitude
computation.
Another modification of the AES approach can be obtained by
performing the echo suppression not on the downmix channels,
as proposed by (34), but with respect to the microphone input
signals instead. In other words, the echo suppression is
performed on the originally recorded microphone signals before
they are used as input for the near-end's encoder or any pre-
processing stage, respectively.
Many embodiments according to the present invention therefore
share the following features:

1. Receiving a first parametric spatial audio representation,
consisting of a downmix signal together with side information,
which is used to generate multichannel loudspeaker signals.
2. Receiving a second parametric spatial audio
representation, consisting of a downmix signal together with
side information, which has been determined from recorded
microphone signals.
3. Computing a reference power spectrum of the first and the
second downmix signals.
4. Computing an echo estimation filter for estimating the
echo components in the reference power spectrum of the second
downmix signal.
5. Computing an echo removal filter from the reference power
spectrum of the first downmix signal, the reference power
spectrum of the second downmix signal, and the echo estimation
filter to remove the echo components in the downmix signal of
the second spatial audio representation.
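The five steps above can be sketched numerically. The following is a minimal sketch only, assuming STFT-domain processing one frame at a time; the function names, the recursive adaptation rule for the echo estimation filter, and the spectral floor are illustrative assumptions, not the exact equations of the specification.

```python
import numpy as np

def reference_power_spectrum(downmix, weights=None):
    """Step 3: weighted sum of channel power spectra.
    downmix: (channels, bins) complex STFT frame."""
    p = np.abs(downmix) ** 2
    if weights is None:
        weights = np.ones(downmix.shape[0])
    return np.einsum('i,ib->b', weights, p)

def update_echo_filter(g, x_pow, y_pow, mu=0.1, eps=1e-12):
    """Step 4 (illustrative rule): adapt the echo estimation filter
    g(m) so that g * |X|^2 tracks the echo power in |Y|^2."""
    err = y_pow - g * x_pow
    g = g + mu * err * x_pow / (x_pow ** 2 + eps)
    return np.maximum(g, 0.0)

def removal_gain(g, x_pow, y_pow, floor=0.1, eps=1e-12):
    """Step 5: spectral-subtraction style gain removing the
    estimated echo from the second downmix spectrum."""
    echo_est = g * x_pow
    gain = 1.0 - echo_est / (y_pow + eps)
    return np.clip(gain, floor, 1.0)
```

Applying `removal_gain` to the complex spectrum of the second downmix attenuates bins dominated by estimated echo, while the floor limits audible distortion of near-end speech.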
Depending on certain implementation requirements, embodiments
of the inventive methods may be implemented in hardware or in
software. The implementation can be performed using a digital
storage medium, in particular a disc, a CD or a DVD, having
electronically readable control signals stored thereon, which
cooperate with a programmable computer or processor such that
an embodiment of the inventive methods is performed.
Generally, an embodiment of the present invention is,
therefore, a computer program product with program code
stored on a machine-readable carrier, the program code being
operative to perform an embodiment of the inventive method
when the computer program product runs on a computer or
processor. In other words, embodiments of the inventive
methods are, therefore, a computer program having program code
for performing at least one of the embodiments of the
inventive method when the computer program runs on a computer
or processor. A processor may be formed by a computer, a
chip card, a smart card, an application-specific integrated
circuit (ASIC) or another integrated circuit.
Embodiments according to the present invention may furthermore
be implemented based on discrete electrical or electronic
elements, integrated circuits or combinations thereof.
Embodiments according to the present invention therefore
enable an acoustic echo control for parametric spatial audio
reproduction. As the previous discussion has shown,
embodiments may represent an efficient method for the
suppression of acoustic echoes for multichannel loudspeaker
systems used in spatial audio communication systems. The
methods are applicable in cases where the spatial audio
signals are represented by a downmix signal and corresponding
parametric side information or meta data. Embodiments exploit
the fact that the echo suppression may be performed directly
based on the received downmix signal rather than explicitly
computing the loudspeaker signals before they are input into
an acoustic echo suppression. Analogously, the echo components
may also be suppressed in the downmix signal of the spatial
audio signal to be transmitted to the far-end.

ISO/IEC 23003-1:2007. Information technology - MPEG Audio
technologies - Part 1: MPEG Surround. International Standards
Organization, Geneva, Switzerland, 2007.
E. Benjamin and T. Chen. The native B-format microphone:
Part I. In 119th AES Convention, Paper 6621, New York,
Oct. 2005.
H. Buchner, R. Aichner, and W. Kellermann. A generalization
of blind source separation algorithms for convolutive
mixtures based on second-order statistics. IEEE Trans. on
Speech and Audio Processing, 13(1):120-134, Jan. 2005.
H. Buchner, R. Aichner, J. Stenglein, H. Teutsch, and W.
Kellermann. Simultaneous localization of multiple sound
sources using blind adaptive MIMO filtering. In Proc. IEEE
Int. Conf. on Acoustics, Speech, and Signal Processing
(ICASSP), Philadelphia, March 2005.
J. Engdegard, B. Resch, C. Falch, O. Hellmuth, J. Hilpert,
A. Hoelzer, L. Terentiev, J. Breebaart, J. Koppens, E.
Schuijers, and W. Oomen. Spatial audio object coding (SAOC) -
the upcoming MPEG standard on parametric object based audio
coding. In 124th AES Convention, Paper 7377, Amsterdam,
May 2008.
A. Favrot et al. Acoustic echo control based on temporal
fluctuations of short-time spectra. In Proc. Intl. Workshop
on Acoust. Echo and Noise Control (IWAENC), Seattle,
Sept. 2008, submitted.
W. Etter and G. S. Moschytz. Noise reduction by noise-
adaptive spectral magnitude expansion. J. Audio Eng. Soc.,
42:341-349, May 1994.
C. Faller and C. Tournery. Estimating the delay and
coloration effect of the acoustic echo path for low
complexity echo suppression. In Proc. Intl. Workshop on
Acoust. Echo and Noise Control (IWAENC), Sept. 2005.
A. Favrot, C. Faller, M. Kallinger, F. Kuech, and M.
Schmidt. Acoustic echo control based on temporal
fluctuations of short-time spectra. In Proc. Intl. Workshop
on Acoust. Echo and Noise Control (IWAENC), Sept. 2008.
Jurgen Herre, Kristofer Kjorling, Jeroen Breebaart, Christof
Faller, Sascha Disch, Heiko Purnhagen, Jeroen Koppens,
Johannes Hilpert, Jonas Roden, Werner Oomen, Karsten
Linzmeier, and Kok Seng Chong. MPEG Surround - the ISO/MPEG
standard for efficient and compatible multichannel audio
coding. J. Audio Eng. Soc., 56(11):932-955, Nov. 2008.
J. Merimaa. Applications of a 3-D microphone array. In 112th
AES Convention, Paper 5501, Munich, May 2002.
V. Pulkki. Spatial sound reproduction with directional audio
coding. J. Audio Eng. Soc., 55(6):503-516, June 2007.
G. Schmidt and E. Hansler. Acoustic echo and noise control:
a practical approach. Hoboken: Wiley, 2004.

We Claim:
1. An acoustic echo suppression unit (210) comprising:
an input interface (230) for extracting a downmix signal
(310) from an input signal (300) comprising the downmix
signal (310) and parametric side information (320),
wherein the downmix signal (310) and parametric side
information (320) together represent a multichannel signal
having at least further channels or a channel number
higher than the number of channels in the downmix signal;
a calculator (220) for calculating filter coefficients
(350) for an adaptive filter (240), wherein the calculator
(220) is adapted to receive the downmix signal (310),
wherein the calculator (220) is further adapted to receive
a microphone signal (340) or a signal derived from the
microphone signal (720), wherein the calculator (220) is
adapted to determine the filter coefficients (350) based
on the received signals;
an adaptive filter (240) adapted to receive the filter
coefficients (350) from the calculator (220) and adapted
to filter the microphone signal (340) or the signal
derived from the microphone signal (720) based on the
filter coefficients (350) to suppress an echo caused by
the multichannel signal in the microphone signal (340).
2. The acoustic echo suppression unit (210) according to
claim 1, wherein the calculator (220) is adapted to
determine the filter coefficients (350) by determining a
first reference power spectrum based on the downmix signal
(310), by determining a second reference power spectrum
based on the microphone signal (340) or the signal derived
from the microphone signal (720), by determining echo
estimation filter coefficients based on the first and
second reference power spectra, by determining an echo
estimation based on the first reference power spectrum and

the echo estimation filter coefficients, and by determining
the filter coefficients (350) based on the
echo estimation filter coefficients and the second
reference power spectrum.
3. The acoustic echo suppression unit (210) according to
claim 1 or 2, wherein the calculator (220) is adapted to
calculate a first reference power spectrum based on

|P(k, m)|2 = Σ_{i=1}^{N} ai(k, m) · |Bi(k, m)|2,

wherein |P(k, m)|2 is the first reference power spectrum,
ai(k, m) is a weighting factor, Bi(k, m) is an i-th channel
of the downmix signal (310), wherein N is the number of
channels in the downmix signal (310), N being greater than
or equal to 1, wherein k is a block time index and m
denotes a frequency index.
4. The acoustic echo suppression unit (210) according to any
of the claims 1, 2 or 3, wherein the calculator (220) is
adapted to calculate a second reference power spectrum
based on

|Q(k, m)|2 = Σ_{i=1}^{K} ci(k, m) · |Ai(k, m)|2,

wherein |Q(k, m)|2 is the second reference power spectrum,
ci(k, m) is a weighting factor, Ai(k, m) is an i-th channel
of a downmix signal (720), wherein K is a number of channels
in the downmix signal (720), K being greater than or equal
to 1, wherein k is a block time index and m denotes a
frequency index.
5. The acoustic echo suppression unit (210) according to any
of the claims 1 to 4, wherein the calculator (220) is
further adapted to determine the echo estimation filter
coefficients and the echo estimation based on the first
reference power spectrum in a delayed version by delaying
the first reference power spectrum by a delay value.
6. The acoustic echo suppression unit (210) according to
claim 5, wherein the calculator (220) is further adapted
to determine the delay value by determining a correlation
value for a plurality of different possible delay values,
by determining echo prediction gain values for values of
the plurality of different possible delay values and by
determining the value of the plurality of different
possible delay values as the delay value with a maximum
value of the determined echo prediction gain values.
7. The acoustic echo suppression unit (210) according to any
of the claims 1 to 6, wherein the calculator (220) is
adapted to determine a first modified power spectrum based
on the first reference power spectrum by subtracting a
mean value of the first reference power spectrum, wherein
the calculator (220) is adapted to determine a second
modified power spectrum based on the second reference
power spectrum by subtracting a second mean value of the
second reference power spectrum, and wherein the
calculator (220) is adapted to determine the echo
estimation filter coefficients based on the first and
second modified power spectra.

8. A method for suppressing an acoustic echo, comprising:
extracting a downmix signal (310) from an input signal
(300) comprising the downmix signal (310) and parametric
side information (320), wherein the downmix signal (310)
and the parametric side information (320) together
represent a multichannel signal having at least further
channels or a channel number higher than the number of
channels in the downmix signal;
calculating filter coefficients (350) for adaptive
filtering based on the downmix signal and the microphone
signal or a signal derived from the microphone signal;
adaptively filtering the microphone signal (340) or the
signal derived from the microphone signal (720) based on
the filter coefficients to suppress an echo caused by the
multichannel signal in the microphone signal (340).
9. The method according to claim 8, further comprising
decoding the downmix signal (310) and the parametric side
information (320) into a plurality of loudspeaker signals
(330).
10. A conferencing front-end (200), comprising:
an acoustic echo suppression unit (210) according to any
of the claims 1 to 7;
a multichannel decoder (250);
at least one microphone unit (110),
wherein the multichannel decoder (250) is adapted to
decode the downmix signal (310) and the parametric side
information (320) to a plurality of loudspeaker signals
(330);

wherein the at least one microphone unit (110) is adapted
to provide the microphone signal (340).
11. Conferencing front-end (200) according to claim 10,
wherein the input interface (230) is further adapted to
extract the parametric side information (320), wherein the
multichannel decoder (250) comprises an upmixer (705) and
a parameter processor (480), wherein the parameter
processor (480) is adapted to receive the parameter side
information (320) from the input interface (230) and is
adapted to provide an upmix control signal (707), and
wherein the upmixer (705) is adapted to receive the
downmix signal (310) from the input interface (230) and
the upmix control signal from the parameter processor and
is adapted to provide the plurality of loudspeaker signals
(330) based on the downmix signal (310) and the upmix
control signal (707).
12. Conferencing front-end (200) according to any of the
claims 10 or 11, further comprising a multichannel encoder
(400) adapted to encode a plurality of audio input signals
(340; 410) into a further downmix signal (720) and further
parametric side information (730), together representing
the plurality of audio input signals, wherein the
microphone signal (340) of the at least one microphone
unit (110) is comprised in the plurality of audio input
signals, wherein the acoustic echo suppression unit (210)
is adapted to receive the further downmix signal (720) as
the signal derived from the microphone signal.
13. Conferencing front-end (200) according to any of the
claims 10 to 12, comprising a plurality of microphone
units (110), wherein the plurality of microphone units
(110) is adapted to provide the plurality of audio input
signals (340; 410).

14. Method of providing a plurality of loudspeaker signals
(330) and a microphone signal (340), comprising:
a method of suppressing (210) an acoustic echo according
to claim 8;
a step of multichannel decoding (250);
a step of receiving a microphone signal (340),
wherein, in the step of multichannel decoding (250), the
downmix signal (310) and the parametric side information
(320) are decoded to obtain a plurality of loudspeaker
signals (330).
15. A computer program for performing, when running on a
processor, a method according to any of the claims 8 or
14.


List of Reference Signs
100 loudspeaker
110 microphone
120 echo removal unit
200 conferencing front-end
210 acoustic echo suppression unit
220 calculator
230 input interface
240 adaptive filter
250 multichannel decoder
300 input signal
310 downmix signal
320 parametric side information
330 loudspeaker signal
340 microphone signal
350 filter coefficient signal
360 output signal
400 multichannel encoder
410 input signal
420 output signal
450 pre-mixing matrix
460 decorrelator
470 post-mixing matrix
480 parameter processor
490 rendering/interaction information
500 transcoder
510 SAOC parsing unit
520 SAOC bitstream
530 number of objects
540 scene rendering engine
550 rendering matrix
560 rendering matrix generator
570 playback configuration
580 object position
590 downmix transcoder
600 control information
700 echo suppression unit

710 processing unit
720 further downmix signal
730 further parametric side information
740 modified downmix signal
800 first reference power spectrum generator
810 delayer
820 echo estimator
830 echo suppression filter generator
840 second reference power spectrum generator
850 echo estimation filter coefficient generator
860 first temporal fluctuation compensator
870 second temporal fluctuation compensator
880 coherence calculator
890 echo prediction gain calculator
900 optimizer

An acoustic echo suppression unit (210) according to an
embodiment of the present invention comprises an input
interface (230) for extracting a downmix signal (310) from an
input signal (300), the input signal comprising the downmix
signal (310) and parametric side information (320), wherein
the downmix and the parametric side information together
represent a multichannel signal, a calculator (220) for
calculating filter coefficients for an adaptive filter (240),
wherein the calculator (220) is adapted to determine the
filter coefficients based on the downmix signal (310) and a
microphone signal (340) or a signal derived from the
microphone signal, and an adaptive filter (240) adapted to
filter the microphone signal (340) or the signal derived from
the microphone signal based on the filter coefficients to
suppress an echo caused by the multichannel signal in the
microphone signal (340).

