
Method And Apparatus For Conversion Between Multi Channel Audio Formats

Abstract: An input multi-channel representation is converted into a different output multi-channel representation of a spatial audio signal, in that an intermediate representation of the spatial audio signal is derived, the intermediate representation having direction parameters indicating a direction of origin of a portion of the spatial audio signal; and in that the output multi-channel representation of the spatial audio signal is generated using the intermediate representation of the spatial audio signal.


Patent Information

Application #
Filing Date
04 September 2009
Publication Number
46/2009
Publication Type
INA
Invention Field
PHYSICS
Status
Parent Application

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
HANSASTRASSE 27C 80686 MÜNCHEN GERMANY

Inventors

1. JÜRGEN HERRE
HALLERSTR. 24 91054 BUCKENHOF GERMANY
2. VILLE PULKKI
YLÄPORTTI 4 A 7 02210 ESPOO FINLAND

Specification

METHOD AND APPARATUS FOR CONVERSION BETWEEN MULTI-CHANNEL
AUDIO FORMATS
Field of the Invention
The present invention relates to a technique as to how to
convert between different multi-channel audio formats in
the highest possible quality without being limited to
specific multi-channel representations. That is, the
present invention relates to a technique allowing the
conversion between arbitrary multi-channel formats.
Background of the Invention and prior art
Generally, in multi-channel reproduction and listening, a
listener is surrounded by multiple loudspeakers. Various
methods exist to capture audio signals for specific setups.
One general goal in the reproduction is to reproduce the
spatial composition of the originally recorded sound event,
i.e. the origins of individual audio sources, such as the
location of a trumpet within an orchestra. Several
loudspeaker setups are fairly common and can create
different spatial impressions. Without using special post-
production techniques, the commonly known two-channel
stereo setups can only recreate auditory events on a line
between the two loudspeakers. This is mainly achieved by
so-called "amplitude-panning", where the amplitude of the
signal associated to one audio source is distributed
between the two loudspeakers, depending on the position of
the audio source with respect to the loudspeakers. This is
normally done during recording or subsequent mixing. That
is, an audio source coming from the far-left with respect
to the listening position will be mainly reproduced by the
left loudspeaker, whereas an audio source in front of the
listening position will be reproduced with identical

amplitude (level) by both loudspeakers. However, sound
emanating from other directions cannot be reproduced.
Consequently, by using more loudspeakers that are
distributed around the listener, more directions can be
covered and a more natural spatial impression can be
created. Probably the most well-known multi-channel
loudspeaker layout is the 5.1 standard (ITU-R 775-1), which
consists of 5 loudspeakers, whose azimuthal angles with
respect to the listening position are predetermined to be
0°, ±30° and ±110°. That means, during recording or
mixing, the signal is tailored to that specific loudspeaker
configuration and deviations of a reproduction setup from
the standard will result in decreased reproduction quality.
Numerous other systems with varying numbers of loudspeakers
located at different directions have also been proposed.
Professional and special systems, especially in theaters
and sound installations, do also include loudspeakers at
different heights.
A universal audio reproduction system named DirAC has been
recently proposed which is able to record and reproduce
sound for arbitrary loudspeaker setups. The purpose of
DirAC is to reproduce the spatial impression of an existing
acoustical environment as precisely as possible, using a
multi-channel loudspeaker system having an arbitrary
geometrical setup. Within the recording environment, the
responses of the environment (which may be continuous
recorded sound or impulse responses) are measured with an
omnidirectional microphone (W) and with a set of
microphones allowing measurement of the direction of arrival
of sound and of the diffuseness of sound. In the following
paragraphs and within the application, the term
"diffuseness" is to be understood as a measure for the non-
directivity of sound. That is, sound arriving at the
listening or recording position with equal strength from
all directions, is maximally diffuse. A common way to

quantify diffuseness is to use diffuseness values from the
interval [0, ..., 1], wherein a value of 1 describes maximally
diffuse sound and a value of 0 describes perfectly
directional sound, i.e. sound emanating from one clearly
distinguishable direction only. One commonly known method
of measuring the direction of arrival of sound is to apply
3 figure-of-eight microphones (XYZ) aligned with Cartesian
coordinate axes. Special microphones, so-called "SoundField
microphones", have been designed, which directly yield all
the desired responses. However, as mentioned above, the W,
X, Y and Z signals may also be computed from a set of
discrete omnidirectional microphones.
Another method to store audio formats for arbitrary number
of channels to one or two downmix channels of audio with
accompanying directional data has been recently proposed by
Goodwin and Jot. This format can be applied to arbitrary
reproduction systems. The directional data, i.e. the data
having information about the direction of audio sources, is
computed using "Gerzon vectors", which consist of a
velocity vector and an energy vector. The velocity vector
is a weighted sum of vectors pointing at loudspeakers from
the listening position, wherein each weight is the
magnitude of a frequency spectrum at a given time/frequency
tile for a loudspeaker. The energy vector is a similarly
weighted vector sum. However, the weights are short-time
energy estimates of the loudspeaker signals, that is, they
describe a somewhat smoothed signal or an integral of the
signal energy contained in the signal within finite length
time-intervals. These vectors share the disadvantage of not
being related to a physical or a perceptual quantity in a
well-grounded way. For example, the relative phase of the
loudspeakers with respect to each other is not properly
taken into account. That means, for example, if a broadband
signal is fed into the loudspeakers of a stereophonic setup
in front of a listening position with opposite phase, a
listener would perceive sound from ambient direction, and
the sound field in the listening position would have sound

energy oscillations from side to side (e.g. from the left
side to the right side). In such a scenario, the Gerzon
vectors would be pointing towards the front direction,
which is obviously not representing the physical or the
perceptual situation.
Naturally, having multiple multi-channel formats or
representations in the market, the requirement exists to be
able to convert between the different representations, such
that the individual representations may be reproduced with
setups originally developed for the reconstruction of an
alternative multi-channel representation. That is, for
example, a transformation between the 5.1 channels and 7.1
or 7.2 channels may be required to use an existing 7.1 or
7.2 channel playback setup for playing back the 5.1 multi-
channel representation commonly used on DVD. The great
variety of audio formats makes the audio content production
difficult, as all formats require specific mixes and
storage/transmission formats. Therefore, conversion between
different recording formats for playback on different
reproduction setups is necessary.
There are a number of methods proposed to convert audio in
a specific audio format to another audio format. However,
these methods are always tailored to specific multi-channel
formats or representations. That is, these are only
applicable to the conversion from one specific
predetermined multi-channel representation into another
specific multi-channel representation.
Generally, a reduction in the number of reproduction
channels (so-called "downmix") is simpler to implement than
an increase in the number of reproduction channels
("upmix"). For some standard loudspeaker reproduction
setups, recommendations are provided by, for example, the
ITU on how to downmix to reproduction setups with a lower
number of reproduction channels. In these so-called "ITU"
downmix equations, the output signals are derived as simple

static linear combinations of input signals. Usually, a
reduction of the number of reproduction channels leads to a
degradation of the perceived spatial image, i.e. a degraded
reproduction quality of a spatial audio signal.
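A static downmix of this kind can be sketched in a few lines; the channel naming and the -3 dB (0.7071) centre and surround gains below follow the commonly cited ITU-R BS.775 values, but treat them as illustrative assumptions rather than the exact recommended equations:

```python
import numpy as np

def itu_downmix_5_to_2(ch, c_gain=0.7071, s_gain=0.7071):
    """Static 5.1 -> 2.0 downmix as a linear combination of channels.

    `ch` maps the five main channel names 'L', 'R', 'C', 'Ls', 'Rs'
    to equal-length signal arrays; the LFE channel is ignored here.
    The -3 dB centre and surround gains are illustrative defaults.
    """
    left = ch['L'] + c_gain * ch['C'] + s_gain * ch['Ls']
    right = ch['R'] + c_gain * ch['C'] + s_gain * ch['Rs']
    return left, right
```

Because the combination is static, it cannot adapt to the signal content, which is exactly the limitation that motivates the parametric conversion described below.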
For a possible benefit from a high number of reproduction
channels or reproduction loudspeakers, upmixing techniques
for specific types of conversions have been developed. An
often investigated problem is how to convert 2-channel
stereophonic audio for reproduction with 5-channel surround
loudspeaker systems. One approach to such a 2-to-5 upmix is
to use a so-called "matrix" decoder. Such
decoders have become common to provide or upmix 5.1 multi-
channel sound over stereo transmission infrastructures,
especially in the early days of surround sound for movies
and home theatres. The basic idea is to reproduce sound
components which are in-phase in the stereo signal in the
front of the sound image, and to put out-of-phase
components into the rear loudspeakers. An alternative 2-to-
5 upmixing method proposes to extract the ambient
components of the stereo signal and to reproduce those
components via the rear loudspeakers of the 5.1 setup. An
approach following the same basic ideas on a perceptually
more justified basis and using a mathematically more
elegant implementation has been recently proposed by C.
Faller in "Parametric Multi-channel Audio Coding: Synthesis
of Coherence Cues", IEEE Trans. On Speech and Audio Proc,
vol. 14, no. 1, January 2006.
The recently published MPEG Surround standard performs an
upmix from one or two downmixed and transmitted channels to
the final channels used in reproduction or playback, which
is usually 5.1. This is implemented either using spatial
side information (side information similar to the BCC
technique) or without side information, by using the phase
relations between the two channels of a stereo downmix
("non-guided mode" or "enhanced matrix mode").

All methods for format conversion described in the previous
paragraphs are specialized to be applied to specific
configurations of both the source and the destination audio
reproduction format and are thus not universal. That is, a
conversion between arbitrary input multi-channel
representations to arbitrary output multi-channel
representations cannot be performed. That is to say the
prior art transformation techniques are specifically
tailored to the number of loudspeakers and their precise
position for the input multi-channel audio representation
as well as for the output multi-channel representation.
The international patent application 2004/077884 proposes
to utilize DirAC-coding to record impulse responses of
audio signals within listening environments. Using impulse
responses recorded in this way, audio signals may be reproduced
with the spatial impression of the listening environment.
The AES-convention paper 6658 is directed to DirAC audio
coding and proposes a method of creating an efficient
encoded representation of signals recorded by B-format
microphones.
The international patent application 01/82651 relates to
multi-channel surround mastering and reproduction
techniques. A particular spatial encoding technique is
proposed, in order to provide for a compact encoded
representation to be transmitted. The encoded
representation may then be decoded by a specially designed
decoder at the receiving end.
It is, naturally, desirable to have a concept for multi-
channel transformation which is applicable to arbitrary
combinations of input and output multi-channel
representations.

Summary of the Invention
According to one embodiment of the present invention, an
apparatus for conversion of an input multi-channel
representation into a different output multi-channel
representation of a spatial audio signal comprises: an
analyzer for deriving an intermediate representation of the
spatial audio signal, the intermediate representation
having direction parameters indicating a direction of
origin of a portion of the spatial audio signal; and a
signal composer for generating the output multi-channel
representation of the spatial audio signal using the
intermediate representation of the spatial audio signal.
In that an intermediate representation is used which has
direction parameters indicating a direction of origin of a
portion of the spatial audio signal, conversion can be
achieved between arbitrary multi-channel representations,
as long as the loudspeaker configuration of the output
multi-channel representation is known. It is important to

note that the loudspeaker configuration of the output
multi-channel representation does not have to be known in
advance, that is, during the design of the conversion
apparatus. As the conversion apparatus and method are
universal, a multi-channel representation provided as an
input multi-channel representation and designed for a
specific loudspeaker-setup may be altered on the receiving
side, to fit the available reproduction setup such that the
reproduction quality of a reproduction of a spatial audio
signal is enhanced.
According to a further embodiment of the present invention,
the direction of origin of a portion of the spatial audio
signal is analyzed within different frequency bands. Thus,
different direction parameters are derived for finite-width
frequency portions of the spatial audio signal. To derive
the finite width frequency portions, a filterbank or a
Fourier-transform may, for example, be used. According to
another embodiment, the frequency portions or frequency
bands for which the analysis is performed individually are
chosen to match the frequency resolution of the human
hearing process. These embodiments may have the advantage
that the direction of origin of portions of the spatial
audio signal is determined as well as the human auditory
system itself can determine the direction of origin of
audio signals. Therefore, the analysis is performed without
a potential loss of precision in the determination of the
origin of an audio object or a signal portion, when a
signal analyzed in this way is reconstructed and played back
via an
arbitrary loudspeaker setup.
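A band-wise analysis of this kind presupposes a split of each frame into finite-width frequency portions. The sketch below is one way to do that; the FFT size, band count and log-spaced grouping are illustrative assumptions, not the patent's own filterbank:

```python
import numpy as np

def split_into_bands(x, n_fft=512, n_bands=8):
    """Split one signal frame into finite-width frequency portions.

    The frame is transformed with an FFT and the bins are grouped
    into roughly log-spaced bands, a crude stand-in for the
    non-uniform frequency resolution of human hearing.  Returns one
    complex spectrum per band (zero outside the band), so the bands
    sum back to the full spectrum.
    """
    spec = np.fft.rfft(x, n=n_fft)
    # band edges: bin 0 plus log-spaced edges up to the Nyquist bin
    edges = np.unique(np.concatenate(
        ([0], np.geomspace(1, spec.size, n_bands + 1).astype(int))))
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.zeros_like(spec)
        band[lo:hi] = spec[lo:hi]
        bands.append(band)
    return bands
```

The direction analysis described in this document would then be run on each band independently.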
According to a further embodiment of the present invention,
one or more downmix channels are additionally derived
belonging to the intermediate representation. That is,
downmixed channels are derived from audio channels
corresponding to loudspeakers associated to the input
multi-channel representation, which may then be used for
generating the output multi-channel representation or for

generating audio channels corresponding to loudspeakers
associated to the output multi-channel representation.
For example, a monophonic downmix channel may be
generated from the 5.1 input channels of a common 5.1
channel audio signal. This could, for example, be performed
by computing the sum of all the individual audio channels.
Based on the monophonic downmix channel derived in this
way, a signal composer may distribute the portions of the
monophonic downmix channel corresponding to the analyzed
portions of the input multi-channel representation to the
channels of the output multi-channel representation as
indicated by the direction parameters. That is, a time/
frequency or signal portion analyzed to be coming from the
far left of the spatial audio signal will be redistributed
to the loudspeakers of the output multi-channel
representation which are located on the left side with
respect to a listening position.
Generally, some embodiments of the present invention allow
portions of the spatial audio signal to be distributed with
greater intensity to a channel corresponding to a
loudspeaker closer to the direction indicated by the
direction parameters than to a channel further away from
that direction. That is, no matter how the locations of the
loudspeakers used for reproduction are defined in the
output multi-channel representation, a spatial
redistribution will be achieved fitting the available
reproduction setup as well as possible.
According to some embodiments of the present invention, a
spatial resolution, with which a direction of origin of a
portion of the spatial audio signal can be determined, is
much higher than the solid angle of three-dimensional space
associated to one single loudspeaker of the input multi-
channel representation. That is, the direction of origin of
a portion of the spatial audio signal can be derived with a
better precision than a spatial resolution achievable by

simply redistributing the audio channels from one distinct
setup to another specific setup, as for example by
redistributing the channels of a 5.1 setup to a 7.1 or 7.2
setup.
Summarizing, some embodiments of the invention allow the
application of an enhanced method for format conversion
which is universally applicable and does not depend on a
particular desired target loudspeaker layout/configuration.
Some embodiments convert an input multi-channel audio
format (representation) with N1 channels into an output
multi-channel format (representation) having N2 channels by
means of extracting direction parameters (similar to
DirAC), which are then used for synthesizing the output
signal having N2 channels. Furthermore, according to some
embodiments, a number of N0 downmix channels are computed
from the N1 input signals (audio channels corresponding to
loudspeakers according to the input multi-channel
representation), which are then used as a basis for a
decoding process using the extracted direction parameters.
Brief Description of the Drawings
Several embodiments of the present invention will in the
following be described referencing the enclosed drawings.
Fig. 1 shows an illustration of derivation of direction
parameters indicating a direction of origin of a portion of
an audio signal;
Fig. 2 shows a further embodiment of derivation of
direction parameters based on a 5.1-channel representation;
Fig. 3 shows an example of generation of an output multi-
channel representation;

Fig. 4 shows an example for audio conversion from a 5.1-
channel setup to an 8.1 channel setup; and
Fig. 5 shows an example for an inventive apparatus for
conversion between multi-channel audio formats.
Some embodiments of the present invention derive an
intermediate representation of a spatial audio signal
having direction parameters indicating a direction of
origin of a portion of the spatial audio signal. One
possibility is to derive a velocity vector indicating the
direction of origin of a portion of a spatial audio signal.
One example for doing so will be described in the following
paragraphs, referencing Fig. 1.
Before detailing the concept, it may be noted that the
following analysis may be applied to multiple individual
frequency or time portions of the underlying spatial audio
signal simultaneously. For the sake of simplicity, however,
the analysis will be described for one specific frequency
or time or time/frequency portion only. The analysis is
based on an energetic analysis of the sound field recorded
at a recording position 2, located at the center of a
coordinate system, as indicated in Fig. 1.
The coordinate system is a Cartesian Coordinate System,
having an x axis 4 and a y axis 6 perpendicular to each
other. Using a right-handed system, the z axis, not shown
in Fig. 1, points out of the drawing plane.
For the direction analysis, it is assumed that 4 signals
(known as B-format signals) are recorded. One
omnidirectional signal w is recorded, i.e. a signal picked
up with (ideally) equal sensitivity from all directions.
Furthermore, three directional signals X, Y
and Z are recorded, having a sensitivity distribution
pointing in the direction of the axes of the Cartesian
Coordinate System. Examples for possible sensitivity

patterns of the microphones used are given in Fig. 1
showing two "figure-of-eight" patterns 8a and 8b, pointing
to the directions of the axes. Two possible audio sources
10 and 12 are furthermore illustrated in the two-
dimensional projection of the coordinate system shown in
Fig. 1.
For the direction analysis, an instantaneous velocity
vector (at time index n) is composed for different
frequency portions (described by the index i) by

v(n,i) = X(n,i) e_x + Y(n,i) e_y + Z(n,i) e_z    (1)

That is, a vector is created having the individually
recorded microphone signals of the microphones associated
to the axes of the coordinate system as components. In the
previous and the following equations, the quantities are
indexed in time (n) as well as in frequency (i) by the two
indices (n,i). The vectors e_x, e_y and e_z represent the
Cartesian unit vectors.
Using the simultaneously recorded omnidirectional signal w,
an instantaneous intensity I is computed as

I(n,i) = w(n,i) v(n,i)    (2)

and the instantaneous energy is derived according to the
following formula:

E(n,i) = ( ||v(n,i)||^2 + w(n,i)^2 ) / 2    (3)

where ||.|| denotes the vector norm.
That is, an intensity quantity is derived allowing for
possible interference between two signals (as positive and
negative amplitudes may occur). Additionally, an energy
quantity is derived, which naturally does not allow for

interference between two signals, as the energy quantity
does not contain negative values allowing for an
cancellation of the signal.
These properties of the intensity and the energy signals
can be advantageously used to derive a direction of origin
of signal portions with high accuracy, preserving a
virtual correlation of audio channels (a relative phase
between the channels), as will be detailed below.
On the one hand, the instantaneous intensity vector may be
used as vector indicating the direction of origin of a
portion of the spatial audio signal. However, this vector
may undergo rapid changes thus causing artifacts within the
reproduction of the signal. Therefore, alternatively, an
instantaneous direction may be computed using short-time
averaging utilizing a Hanning window W2 according to the
following formula:

D(n,i) = sum_{m=-M/2..M/2} W2(m) I(n+m, i)    (4)

where W2(m) is the Hanning window used for the short-time
averaging of D.
That is, optionally, a short-time averaged direction vector
having parameters indicating a direction of origin of the
spatial audio signal may be derived.
Optionally, a diffuseness measure ψ may be computed as
follows:

ψ(n,i) = 1 - || sum_{m=-M/2..M/2} W1(m) I(n+m, i) || /
             ( sum_{m=-M/2..M/2} W1(m) E(n+m, i) )    (5)

where W1(m) is a window function defined between -M/2 and
M/2 for short-time averaging.
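The energetic analysis above, from the velocity vector through intensity, energy, short-time averaged direction and diffuseness, can be sketched for one frequency band as follows; the window length M and the convolution-based smoothing are illustrative assumptions:

```python
import numpy as np

def direction_and_diffuseness(w, X, Y, Z, M=32):
    """Energetic analysis of one frequency band of B-format signals.

    For real-valued band signals over time index n: velocity vector
    v, instantaneous intensity I = w * v, instantaneous energy E, a
    Hanning-smoothed direction vector D and the diffuseness psi.
    """
    v = np.stack([X, Y, Z], axis=-1)               # velocity vector
    I = w[:, None] * v                             # instantaneous intensity
    E = 0.5 * (np.sum(v ** 2, axis=-1) + w ** 2)   # instantaneous energy

    win = np.hanning(M)
    smooth = lambda s: np.convolve(s, win, mode='same')
    D = np.stack([smooth(I[:, k]) for k in range(3)], axis=-1)

    # diffuseness: one minus the ratio of averaged intensity magnitude
    # to averaged energy (small epsilon guards against division by zero)
    psi = 1.0 - np.linalg.norm(D, axis=-1) / (smooth(E) + 1e-12)
    return D, psi
```

Because the intensity keeps the sign of the signals, phase relations between the channels are retained, which is precisely what distinguishes this estimate from a purely energy-based one.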

It should again be noted that the deriving is performed
such as to preserve virtual correlation of the audio
channels. That is, phase information is properly taken into
account, which is not the case for direction estimates
based on energy estimates only (as for example Gerzon
vectors).
The following simple example shall serve to explain this in
more detail. Consider a perfectly diffuse signal which is
played back by two loudspeakers of a stereo system. As the
signal is diffuse (originating from all directions), it is
to be played back by both speakers with equal intensity.
However, as the perception shall be diffuse, a phase shift
of 180 degrees is required. In such a scenario, a purely
energy based direction estimation would yield a direction
vector pointing exactly to the middle between the two
loudspeakers, which certainly is an undesirable result not
reflecting reality.
According to the inventive concept detailed above, virtual
correlation of the audio channels is preserved while
estimating the direction parameters (direction vectors). In
this particular example, the direction vector would be
zero, indicating that the sound does not originate from any
one distinct direction, which is indeed the case in
reality. Correspondingly, the diffuseness parameter of
equation (5) is 1, matching the real situation perfectly.
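The out-of-phase stereo example can be checked numerically; the ±30° layout and the simplified 2-D B-format simulation below are illustrative assumptions:

```python
import numpy as np

# Two loudspeakers at +/-30 degrees fed with opposite-phase copies of
# the same signal.  The omnidirectional sum w cancels, so the
# intensity, and with it the direction vector, vanishes, while the
# diffuseness comes out as 1.  An energy-only (Gerzon-style) estimate
# would instead point straight ahead, since both energy weights are
# equal.
s = np.sin(0.2 * np.arange(64))
az = np.radians([30.0, -30.0])
signals = np.stack([s, -s])

w = signals.sum(axis=0)                                   # zero
X = (np.cos(az)[:, None] * signals).sum(axis=0)
Y = (np.sin(az)[:, None] * signals).sum(axis=0)

Ix, Iy = w * X, w * Y                                     # intensity: zero
E = 0.5 * (X ** 2 + Y ** 2 + w ** 2)
psi = 1.0 - np.hypot(Ix.sum(), Iy.sum()) / max(E.sum(), 1e-12)
print(psi)   # -> 1.0
```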
The Hanning windows in the above equations may furthermore
have different lengths for different frequency bands.
As a result of this analysis, for each time slice of a
frequency portion, a direction vector or direction
parameters are derived indicating a direction of origin of
the portion of the spatial audio signal, for which the
analysis has been performed. Optionally, a diffuseness
parameter can be derived indicating the diffuseness of the
direction of a portion of the spatial audio signal. As

previously described, a diffuseness value of one derived
according to equation (5) describes a signal of maximal
diffuseness, i.e. originating from all directions with
equal intensity.
To the contrary, small diffuseness values are attributed to
signal portions originating predominantly from one
direction.
Fig. 2 shows an example for the derivation of direction
parameters from an input multi-channel representation
having five channels according to ITU-R 775-1. The multi-
channel input audio signal, i.e. the input multi-channel
representation, is first transformed into B-format by
simulating an anechoic recording of the corresponding
multi-channel audio setup. With respect to a center 20 of
the Cartesian Coordinate System having an axis x 22 and y
24, a rear-right loudspeaker 26 is located at an angle of
110°. A right-front loudspeaker 28 is located at +30°, a
center loudspeaker at 0°, a left-front loudspeaker 32 at
-30° and a left-rear loudspeaker 34 at -110°. In practice,
an anechoic recording can be simulated by applying simple
matrixing operations, as the geometrical setup of the input
multi-channel representation is known.
An omnidirectional signal w can be obtained by taking a
direct sum of all loudspeaker signals, that is of all audio
channels corresponding to the loudspeakers associated to
the input multi-channel representation. The dipole or
"figure-of-eight" signals X, Y and Z can be formed by
adding the loudspeaker signals weighted by the cosine of
the angle between the loudspeaker and the corresponding
Cartesian axes, i.e. the direction of maximum sensitivity
of the dipole microphone to be simulated. Let Ln be the 2-D
or 3-D Cartesian vector pointing towards the nth
loudspeaker and V be the unit vector pointing to the
Cartesian axis direction corresponding to the dipole

microphone. Then, the weighting factor is cos(angle(Ln,V)).
The directional signal X would, for example, be written as

X = sum_{n=1..N} cos(angle(L_n, V)) C_n    (6)

where C_n denotes the loudspeaker signal of the nth channel
and N is the number of channels. The term angle has to be
interpreted as an operator computing the spatial angle
between the two given vectors, that is, for example, the
angle 40 (Θ) between the y axis 24 and the left-front
loudspeaker 32 in the two-dimensional case illustrated in
Fig. 2.
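The B-format simulation described above (a direct sum for w, cosine-weighted sums for the dipole signals) can be sketched as follows for the 2-D case:

```python
import numpy as np

def simulate_b_format(channels, azimuths_deg):
    """Simulate an anechoic 2-D B-format recording of a speaker setup.

    `channels` is an (N, T) array of loudspeaker signals and
    `azimuths_deg` the corresponding azimuths (0 degrees = front).
    w is the direct sum of all channels; X and Y weight each channel
    by cos(angle(L_n, V)) for V being the x and y axis respectively,
    which for azimuth angles reduces to cosine and sine weights.
    """
    az = np.radians(np.asarray(azimuths_deg, dtype=float))
    w = channels.sum(axis=0)
    X = (np.cos(az)[:, None] * channels).sum(axis=0)
    Y = (np.sin(az)[:, None] * channels).sum(axis=0)
    return w, X, Y
```

A 5.1 input is simulated by passing its five main channels together with the azimuths of the standard layout.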
The further derivation of direction parameters could, for
example, be performed as illustrated in Fig. 1 and detailed
in the corresponding description, i.e. audio signals X, Y
and Z can be divided into frequency bands according to the
frequency resolution of the human auditory system. The
direction of the sound, i.e. the direction of origin of the
portions of the spatial audio signal and, optionally,
diffuseness is analyzed depending on time in each frequency
channel. Optionally, another measure of signal
dissimilarity than diffuseness, such as the coherence
between (stereo) channels associated to the spatial audio
signal, can also be used in place of diffuseness.
If, as a simplified example, one audio source 44 is
present, as indicated in Fig. 2, wherein that source only
contributes to the signal within a specific frequency band,
a direction vector 46 pointing to the audio source 44 would
be derived. The direction vector is represented by
direction parameters (vector components) indicating the
direction of the portion of the spatial audio signal
originating from audio source 44. In the reproduction setup
of Fig. 2, such a signal would be reproduced mainly by the
left-front loudspeaker 32 as illustrated by the symbolic
wave form associated to this loudspeaker. However, minor

signal portions will also be played back from the left-rear
loudspeaker 34. Hence, the directional signal of the
microphone associated to the X coordinate 22 would receive
signal components from the left-front channel 32 (the audio
channel associated to the left-front loudspeaker 32) and the
left-rear channel 34.
As, according to the above implementation, the directional
signal Y associated to the y-axis will also receive signal
portions played back by the left-front loudspeaker 32, a
directional analysis based on directional signals X and Y
will be able to reconstruct sound coming from direction
vector 46 with high precision.
For the final conversion to the desired multi-channel
representation (multi-channel format), the direction
parameters indicating the direction of origin of portions
of the audio signals are used. Optionally, one or more (N0)
additional audio downmix channels may be used. Such a
downmix channel may, for example, be the omnidirectional
channel W or any other monophonic channel. However, for the
spatial distribution, the use of only one single channel
associated to the intermediate representation is of minor
negative impact. That is, several downmix channels, such as
a stereo mix, the channels W, X and Y or all channels of a
B-format may be used as long as the direction parameters or
the directional data has been derived and can be used for
the reconstruction or the generation of the output multi-
channel representation. It is alternatively also possible
to use the 5 channels of Fig. 2 directly or any combination
of channels associated to the input multi-channel
representation as replacement for possible downmix
channels. When only one channel is stored, there might be a
degradation of the quality in the reproduction of diffuse
sound.
Fig. 3 shows an example for the reproduction of the signal
of audio source 44 with a loudspeaker-setup differing

significantly from the loudspeaker-setup of Fig. 2, which
was the input multi-channel representation from which the
parameters have been derived. Fig. 3 shows, as an example,
six loudspeakers 50a to 50f equally distributed along a
line in front of a listening position 60, defining the
center of a coordinate system having an x-axis 22 and a y-
axis 24, as introduced in Fig. 2. As a previous analysis
has provided direction parameters describing the direction
of the direction vector 46 pointing to the source of the
audio signal 44, an output multi-channel representation
adapted to the loudspeaker setup of Fig. 3 can easily be
derived by redistributing the portion of the spatial audio
signal to be reproduced to the loudspeakers close to the
direction of audio source 44, i.e. by those loudspeakers
close to the direction indicated by the direction
parameters. That is, audio channels corresponding to
loudspeakers in the direction indicated by the direction
parameters are emphasized with respect to audio channels
corresponding to loudspeakers far away from this direction.
That is, loudspeakers 50a and 50b can be steered (for
example using amplitude panning) to reproduce the signal
portion, whereas loudspeakers 50c to 50f do not reproduce
that specific signal portion, while they may be used for
reproduction of diffuse sound or other signal portions of
different frequency bands.
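The redistribution described above can be sketched as follows; the cosine fall-off, the spread angle and the power normalization are illustrative assumptions, not the patent's specific panning law:

```python
import numpy as np

def redistribute(portion, direction_deg, speaker_az_deg, spread_deg=45.0):
    """Distribute one analyzed signal portion over an output setup.

    Each output loudspeaker receives a gain that falls off with its
    angular distance from the analyzed direction of origin, reaches
    zero at `spread_deg`, and the gains are power-normalized so the
    portion's energy is preserved.  Returns one signal per channel.
    """
    diff = np.radians(np.asarray(speaker_az_deg, dtype=float) - direction_deg)
    ang = np.abs(np.arctan2(np.sin(diff), np.cos(diff)))   # wrapped distance
    gains = np.cos(np.clip(ang / np.radians(spread_deg), 0.0, 1.0) * np.pi / 2)
    norm = np.linalg.norm(gains)
    gains = gains / norm if norm > 0 else gains
    return gains[:, None] * np.asarray(portion)[None, :]
```

Loudspeakers close to the analyzed direction are emphasized and far-away ones stay silent, regardless of how many output channels the target setup has.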
The use of a signal composer for generating the output
multi-channel representation of the spatial audio signal
using the direction parameters can also be interpreted as
being a decoding of the intermediate signal into the
desired multi-channel output format having N2 output
channels. Audio downmix channels or signals generated are
typically processed in the same frequency bands in which
they have been analyzed. Decoding may be performed in a
manner similar to DirAC. In the optional reproduction of
diffuse sound, the audio used for representing a non-
diffuse stream is typically either one of the optional N0
downmix channel signals or a linear combination thereof.

For the optional creation of a diffuse stream, several
synthesis options exist to create the diffuse part of the
output signals or the output channels corresponding to
loudspeakers according to the output multi-channel
representation. If only one downmix channel is transmitted,
that same channel has to be used to create the diffuse
signals for each loudspeaker. If more channels are
transmitted, there are more options for how diffuse
sound may be created. If, for example, a stereo downmix is
used in the conversion process, an obviously suited method
is to apply the left downmix channel to the loudspeakers on
the left and the right downmix channel to the loudspeakers
on the right side. If several downmix channels are used for
the conversion (i.e. N0 > 1), the diffuse stream for each
loudspeaker can be computed as a differently weighted sum
of these downmix channels. One possibility could, for
example, be transmitting a B-format signal (channels X, Y,
Z and W, as previously described) and computing a virtual
cardioid microphone signal for each loudspeaker.
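The virtual cardioid computation mentioned above can be sketched as follows, restricted to the horizontal B-format channels. The convention assumed here (a plane wave s from azimuth phi yields W = s/√2, X = s·cos(phi), Y = s·sin(phi)) is a common one but is not mandated by the text, and the function names are illustrative:

```python
import math

def virtual_cardioid(w, x, y, az):
    """Signal of a virtual first-order cardioid microphone aimed at
    azimuth `az` (radians), computed from the horizontal B-format
    channels W, X and Y. For a plane wave s from azimuth phi this
    yields 0.5 * s * (1 + cos(phi - az)), i.e. a cardioid pattern."""
    return 0.5 * (math.sqrt(2.0) * w + x * math.cos(az) + y * math.sin(az))

def diffuse_streams(w, x, y, loudspeaker_azimuths):
    """One differently weighted combination of the B-format downmix
    channels per output loudspeaker."""
    return [virtual_cardioid(w, x, y, az) for az in loudspeaker_azimuths]
```

A sound arriving from the direction a cardioid points at passes at full gain, while sound from the opposite direction is suppressed, so each loudspeaker receives a different mixture of the transmitted channels.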
The following text describes a possible procedure for the
conversion of an input multi-channel representation into an
output multi-channel representation as a list. In this
example, sound is recorded with a simulated B-format
microphone and then further processed by a signal composer
for listening or playing back with a multi-channel or a
monophonic loudspeaker setup. The single steps are
explained referencing Fig. 4 showing a conversion of a 5.1-
channel input multi-channel representation into an 8-
channel output multi-channel representation. The basis is
an N1-channel audio format (N1 being 5 in the specific
example). To convert the input multi-channel representation
into a different output multi-channel representation the
following steps may be performed.

1. Simulate an anechoic recording of an arbitrary multi-
channel audio representation having N1 audio channels (5
channels), as illustrated in the recording section 70 (with
a simulated B-format microphone in a center 72 of the
layout).
2. In an analysis step 74, the simulated microphone
signals are divided into frequency bands and, in a
directional analysis step 76, the directions of origin of
portions of the simulated microphone signals are derived.
Furthermore, optionally, diffuseness (or coherence) may be
determined in a diffuseness determination step 78.
As previously mentioned, a direction analysis may be
performed without using a B-format intermediate step. That
is, generally, an intermediate representation of the
spatial audio signal has to be derived based on an input
multi-channel representation, wherein the intermediate
representation has direction parameters indicating a
direction of origin of a portion of the spatial audio
signal.
3. In a downmix step 80, N0 downmix audio signals are
derived, to be used as the basis for the conversion, i.e.
the creation of the output multi-channel representation. In
a composition step 82, the N0 downmix audio signals are
decoded or upmixed to an arbitrary loudspeaker setup
requiring N2 audio channels by an appropriate synthesis
method (for example using amplitude panning or equally
suitable techniques).
The result can be reproduced by a multi-channel loudspeaker
system, having for example 8 loudspeakers as indicated in
the playback scenario 84 of Fig. 4. However, thanks to the
universality of the concept, a conversion may also be
performed to a monophonic loudspeaker setup, providing an
effect as if the spatial audio signal had been recorded
with one single directional microphone.
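Steps 1 and 2 of the procedure above can be sketched as follows. This is a simplified, broadband illustration (the actual analysis operates per frequency band), it is restricted to the horizontal plane, and the DirAC-style intensity and diffuseness estimates are one possible choice; all names are illustrative:

```python
import math

def simulate_b_format(channels, azimuths):
    """Step 1: simulated anechoic B-format recording in the centre 72
    of the layout. Each of the N1 loudspeaker feeds is treated as a
    plane wave arriving from its loudspeaker azimuth (radians)."""
    n = len(channels[0])
    w, x, y = [0.0] * n, [0.0] * n, [0.0] * n
    for chan, az in zip(channels, azimuths):
        for t, s in enumerate(chan):
            w[t] += s / math.sqrt(2.0)   # omnidirectional channel W
            x[t] += s * math.cos(az)     # dipole channel X
            y[t] += s * math.sin(az)     # dipole channel Y
    return w, x, y

def direction_and_diffuseness(w, x, y):
    """Step 2: direction of origin and diffuseness, estimated from
    the time-averaged active intensity vector I ~ W * [X, Y]."""
    ix = sum(wt * xt for wt, xt in zip(w, x))
    iy = sum(wt * yt for wt, yt in zip(w, y))
    azimuth = math.atan2(iy, ix)  # direction parameter
    energy = sum(wt * wt + 0.5 * (xt * xt + yt * yt)
                 for wt, xt, yt in zip(w, x, y))
    psi = 0.0 if energy == 0.0 else \
        1.0 - math.sqrt(2.0) * math.hypot(ix, iy) / energy
    return azimuth, psi  # psi near 0: directional; near 1: diffuse
```

A single source panned entirely to one input loudspeaker yields an estimated azimuth equal to that loudspeaker's direction and a diffuseness near zero.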

Fig. 5 shows a schematic sketch of an example of an
apparatus 100 for conversion between multi-channel audio
formats.
The apparatus 100 receives an input multi-channel
representation 102.
The apparatus 100 comprises an analyzer 104 for deriving an
intermediate representation 106 of the spatial audio
signal, the intermediate representation 106 having
direction parameters indicating a direction of origin of a
portion of the spatial audio signal.
The apparatus 100 furthermore comprises a signal composer
108 for generating an output multi-channel representation
110 of the spatial audio signal using the intermediate
representation 106 of the spatial audio signal.
Summarizing, the embodiments of the conversion apparatuses
and conversion methods previously described provide
significant advantages. First of all, virtually any input
audio format can be processed in this way. Moreover, the
conversion process can generate output for any loudspeaker
layout, including non-standard loudspeaker configurations,
without the need to specifically tailor new conversion
rules for each combination of input and output loudspeaker
configurations. Furthermore, the spatial resolution
of audio reproduction increases when the number of
loudspeakers is increased, contrary to prior art
implementations.
Depending on certain implementation requirements of the
inventive methods, the inventive methods can be implemented
in hardware or in software. The implementation can be
performed using a digital storage medium, in particular a
disk, DVD or a CD having electronically readable control

signals stored thereon, which cooperate with a programmable
computer system such that the inventive methods are
performed. Generally, the present invention is, therefore,
a computer program product with a program code stored on a
machine readable carrier, the program code being operative
for performing the inventive methods when the computer
program product runs on a computer. In other words, the
inventive methods are, therefore, a computer program having
a program code for performing at least one of the inventive
methods when the computer program runs on a computer.
While the foregoing has been particularly shown and
described with reference to particular embodiments thereof,
it will be understood by those skilled in the art that
various other changes in the form and details may be made
without departing from the spirit and scope thereof. It is
to be understood that various changes may be made in
adapting to different embodiments without departing from
the broader concepts disclosed herein and comprehended by
the claims that follow.

We Claim:
1. Apparatus for conversion of an input multi-channel
representation into a different output multi-channel
representation of a spatial audio signal, comprising:
an input representation decoder for deriving a number
of audio channels corresponding to the loudspeakers
associated to the input multi-channel representation;
an analyzer for deriving, using the number of audio
channels corresponding to the loudspeakers associated
to the input multi-channel representation, an
intermediate representation of the spatial audio
signal, the intermediate representation having
direction parameters indicating a direction of origin
of a portion of the spatial audio signal; and
a signal composer for generating the output multi-
channel representation of the spatial audio signal
using the intermediate representation of the spatial
audio signal.
2. Apparatus in accordance with claim 1, in which the
analyzer is operative to derive direction parameters
depending on a virtual correlation of the audio
channels associated to the input multi-channel
representation.
3. Apparatus in accordance with claim 1, in which the
analyzer is operative to derive direction parameters
preserving the relative phase information of the audio
channels associated to the input multi-channel
representation.
4. Apparatus in accordance with claim 1, in which the
analyzer is operative to derive different direction

parameters for finite width frequency portions of the
spatial audio signal.
5. Apparatus in accordance with claim 1, in which the
analyzer is operative to derive different direction
parameters for finite length time portions of the
spatial audio signal.
6. Apparatus in accordance with claim 4, in which the
analyzer is operative to derive the different
direction parameters for finite length time portions
of the spatial audio signal associated to the
frequency portions, wherein the length of a first time
portion associated to a first frequency portion
differs from the length of a second time portion
associated to a second, different frequency portion
of the spatial audio signal.
7. Apparatus in accordance with claim 1, in which the
analyzer is operative to derive direction parameters
describing a vector pointing to the direction of
origin of the portion of the spatial audio signal.
8. Apparatus in accordance with claim 1, in which the
analyzer is additionally operative to derive one or
more audio channels associated to the intermediate
representation.
9. Apparatus in accordance with claim 8, in which the
analyzer is operative to derive audio channels
corresponding to loudspeakers associated to the input
multi-channel representation.
10. Apparatus in accordance with claim 8, in which the
analyzer is operative to derive one downmix channel as
the sum of the audio channels corresponding to
loudspeakers associated to the input multi-channel
representation.

11. Apparatus in accordance with claim 8, in which the
analyzer is operative to derive at least one audio
channel associated to the direction of an axis of a
Cartesian Coordinate System.
12. Apparatus in accordance with claim 11, in which the
analyzer is operative to derive the at least one audio
channel by building the weighted sum of the audio
channels corresponding to the loudspeakers associated
to the input multi-channel representation.
13. Apparatus in accordance with claim 11, in which the
analyzer is operative such that the deriving of the at
least one audio channel X associated to the direction
V of an axis of the Cartesian Coordinate System can be
described by a combination of n audio channels Cn
corresponding to the n loudspeakers associated to the
input multi-channel representation and directed in a
direction Ln, according to the following formula:

X = Σn (V · Ln) · Cn

14. Apparatus in accordance with claim 1, in which the
analyzer is further operative to derive a diffuseness
parameter indicating a diffuseness of the direction of
origin of the portion of the spatial audio signal.
15. Apparatus in accordance with claim 1, in which the
signal composer is operative to distribute the portion
of the spatial audio signal to a number of channels
corresponding to a number of loudspeakers associated
to the output multi-channel representation.
16. Apparatus in accordance with claim 15, in which the
signal composer is operative such that the portion of
the spatial audio signal is distributed with greater

intensity to a channel corresponding to a loudspeaker
closer to the direction indicated by the direction
parameters than to a channel corresponding to a
loudspeaker further away from that direction.
17. Apparatus in accordance with claim 14, in which the
signal composer is operative such that the portion of
the spatial audio signal is distributed with more
uniform intensity to channels corresponding to
loudspeakers associated to the output multi-channel
representation when the diffuseness parameter
indicates higher diffuseness than when the diffuseness
parameter indicates lower diffuseness.
18. Apparatus in accordance with claim 1 further
comprising:
an input interface for receiving the input multi-
channel representation.
19. Apparatus in accordance with claim 15, in which the
signal composer further comprises an output channel
encoder for deriving the output multi-channel
representation based on the audio channels
corresponding to the loudspeakers associated to the
output channel representation.
20. Apparatus in accordance with claim 1, further
comprising: an output interface for providing the
output multi-channel representation.
21. Method for conversion of an input multi-channel
representation into a different output multi-channel
representation of a spatial audio signal, the method
comprising:

deriving a number of audio channels corresponding to
the loudspeakers associated to the input multi-channel
representation;
deriving, using the number of audio channels
corresponding to the loudspeakers associated to the
input multi-channel representation, an intermediate
representation of the spatial audio signal, the
intermediate representation having direction
parameters indicating a direction of origin of a
portion of the spatial audio signal; and
generating the output multi-channel representation of
the spatial audio signal using the intermediate
representation of the spatial audio signal.
22. A computer program for, when running on a computer,
implementing the method for conversion of an input
multi-channel representation into a different output
multi-channel representation of a spatial audio
signal, the method comprising:
deriving a number of audio channels corresponding to
the loudspeakers associated to the input multi-channel
representation;
deriving, using the number of audio channels
corresponding to the loudspeakers associated to the
input multi-channel representation, an intermediate
representation of the spatial audio signal, the
intermediate representation having direction
parameters indicating a direction of origin of a
portion of the spatial audio signal; and
generating the output multi-channel representation of
the spatial audio signal using the intermediate
representation of the spatial audio signal.

An input multi-channel representation is converted into a different output multi-channel representation of a spatial audio signal, in that an intermediate representation of the
spatial audio signal is derived, the intermediate
representation having direction parameters indicating a direction of origin of a portion of the spatial audio signal; and in that the output multi-channel representation of the spatial audio signal is generated using the intermediate representation of the spatial audio signal.

Documents

Application Documents

# Name Date
1 3140-KOLNP-2009-FORM 3 [05-03-2024(online)].pdf 2024-03-05
2 3140-KOLNP-2009-FORM 3 [07-09-2023(online)].pdf 2023-09-07
3 3140-KOLNP-2009-FORM 3 [02-03-2023(online)].pdf 2023-03-02
4 3140-KOLNP-2009-FORM 3 [01-09-2022(online)].pdf 2022-09-01
5 3140-KOLNP-2009-FORM 3 [07-03-2022(online)].pdf 2022-03-07
6 3140-KOLNP-2009-FORM 3 [02-09-2021(online)].pdf 2021-09-02
7 3140-KOLNP-2009-Information under section 8(2) [02-03-2021(online)].pdf 2021-03-02
8 3140-KOLNP-2009-Information under section 8(2) [14-09-2020(online)].pdf 2020-09-14
9 3140-KOLNP-2009-Information under section 8(2) [13-03-2020(online)].pdf 2020-03-13
10 3140-KOLNP-2009-Information under section 8(2) (MANDATORY) [20-09-2019(online)].pdf 2019-09-20
11 3140-KOLNP-2009-Information under section 8(2) (MANDATORY) [16-04-2019(online)].pdf 2019-04-16
12 3140-KOLNP-2009-Information under section 8(2) (MANDATORY) [31-12-2018(online)].pdf 2018-12-31
13 3140-KOLNP-2009-Information under section 8(2) (MANDATORY) [14-03-2018(online)].pdf 2018-03-14
14 3140-KOLNP-2009-Information under section 8(2) (MANDATORY) [21-09-2017(online)].pdf 2017-09-21
15 Information under section 8(2) [28-06-2017(online)].pdf 2017-06-28
16 Other Patent Document [09-01-2017(online)].pdf 2017-01-09
17 Other Patent Document [13-09-2016(online)].pdf 2016-09-13
18 3140-KOLNP-2009_EXAMREPORT.pdf 2016-06-30
19 3140-KOLNP-2009-(10-12-2014)-ABSTRACT.pdf 2014-12-10
20 3140-KOLNP-2009-(10-12-2014)-ANNEXURE TO FORM 3.pdf 2014-12-10
21 3140-KOLNP-2009-(10-12-2014)-CLAIMS.pdf 2014-12-10
22 3140-KOLNP-2009-(10-12-2014)-CORRESPONDENCE.pdf 2014-12-10
23 3140-KOLNP-2009-(10-12-2014)-FORM-1.pdf 2014-12-10
24 3140-KOLNP-2009-(10-12-2014)-FORM-2.pdf 2014-12-10
25 3140-KOLNP-2009-(10-12-2014)-FORM-3.pdf 2014-12-10
26 3140-KOLNP-2009-(10-12-2014)-FORM-5.pdf 2014-12-10
27 3140-KOLNP-2009-(10-12-2014)-OTHERS.pdf 2014-12-10
28 3140-KOLNP-2009-(10-12-2014)-PETITION UNDER RULE 137.pdf 2014-12-10
29 3140-kolnp-2009-abstract.pdf 2011-10-07
30 3140-KOLNP-2009-ASSIGNMENT.pdf 2011-10-07
31 3140-kolnp-2009-claims.pdf 2011-10-07
32 3140-KOLNP-2009-CORRESPONDENCE 1.2.pdf 2011-10-07
33 3140-KOLNP-2009-CORRESPONDENCE-1.1.pdf 2011-10-07
34 3140-kolnp-2009-correspondence.pdf 2011-10-07
35 3140-kolnp-2009-description (complete).pdf 2011-10-07
36 3140-kolnp-2009-drawings.pdf 2011-10-07
37 3140-kolnp-2009-form 1.pdf 2011-10-07
38 3140-KOLNP-2009-FORM 18.pdf 2011-10-07
39 3140-kolnp-2009-form 2.pdf 2011-10-07
40 3140-KOLNP-2009-FORM 3.1.1.pdf 2011-10-07
41 3140-kolnp-2009-form 3.pdf 2011-10-07
42 3140-kolnp-2009-form 5.pdf 2011-10-07
43 3140-kolnp-2009-international preliminary examination report.pdf 2011-10-07
44 3140-kolnp-2009-international publication.pdf 2011-10-07
45 3140-kolnp-2009-international search report.pdf 2011-10-07
46 3140-KOLNP-2009-PA.pdf 2011-10-07
47 3140-kolnp-2009-pct priority document notification.pdf 2011-10-07
48 3140-kolnp-2009-pct request form.pdf 2011-10-07
49 3140-kolnp-2009-specification.pdf 2011-10-07
50 abstract-3140-kolnp-2009.jpg 2011-10-07