
Apparatus For Merging Spatial Audio Streams

Abstract: An apparatus (100) for merging a first spatial audio stream with a second spatial audio stream to obtain a merged audio stream comprising an estimator (120) for estimating a first wave representation comprising a first wave direction measure and a first wave field measure for the first spatial audio stream, the first spatial audio stream having a first audio representation and a first direction of arrival. The estimator (120) being adapted for estimating a second wave representation comprising a second wave direction measure and a second wave field measure for the second spatial audio stream, the second spatial audio stream having a second audio representation and a second direction of arrival. The apparatus (100) further comprising a processor (130) for processing the first wave representation and the second wave representation to obtain a merged wave representation comprising a merged wave field measure and a merged direction of arrival measure, and for processing the first audio representation and the second audio representation to obtain a merged audio representation, and for providing the merged audio stream comprising the merged audio representation and the merged direction of arrival measure.


Patent Information

Application #
Filing Date
09 February 2011
Publication Number
47/2011
Publication Type
INA
Invention Field
COMMUNICATION
Status
Email
Parent Application
Patent Number
Legal Status
Grant Date
2018-07-12
Renewal Date

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
HANSASTRASSE 27C 80686 MUENCHEN GERMANY

Inventors

1. GIOVANNI DEL GALDO
KRAEHENWEG 95 90768 FUERTH GERMANY
2. KUECH, FABIAN
SOPHIENSTRASSE 77 91052 ERLANGEN GERMANY
3. MARKUS KALLINGER
SCHORLACHSTRASSE 23A 91058 ERLANGEN GERMANY
4. VILLE PULKKI
YLAEPORTI 4 A 7 02210 ESPOO Finland
5. MIKKO-VILLE LAITINEN
ALBERGANESPLANADI 2 A 26 02600 ESPOO FINLAND
6. RICHARD SCHULTZ-AMLING
WALZWERKSTR. 2 90491 NUERNBERG GERMANY

Specification

Apparatus for Merging Spatial Audio Streams
Description
The present invention is in the field of audio processing, especially spatial audio
processing, and the merging of multiple spatial audio streams.
DirAC (DirAC = Directional Audio Coding), cf. V. Pulkki and C. Faller, Directional audio
coding in spatial sound reproduction and stereo upmixing, In AES 28th International
Conference, Piteå, Sweden, June 2006, and V. Pulkki, A method for reproducing natural or
modified spatial impression in multichannel listening, Patent WO 2004/077884 A1,
September 2004, is an efficient approach to the analysis and reproduction of spatial sound.
DirAC uses a parametric representation of sound fields based on the features which are
relevant for the perception of spatial sound, namely the direction of arrival (DOA =
Direction Of Arrival) and diffuseness of the sound field in frequency subbands. In fact,
DirAC assumes that interaural time differences (ITD = Interaural Time Differences) and
interaural level differences (ILD = Interaural Level Differences) are perceived correctly
when the DOA of a sound field is correctly reproduced, while interaural coherence (IC =
Interaural Coherence) is perceived correctly, if the diffuseness is reproduced accurately.
These parameters, namely DOA and diffuseness, represent side information which
accompanies a mono signal in what is referred to as a mono DirAC stream. The DirAC
parameters are obtained from a time-frequency representation of the microphone signals.
Therefore, the parameters are dependent on time and on frequency. On the reproduction
side, this information allows for an accurate spatial rendering. To recreate the spatial sound
at a desired listening position a multi-loudspeaker setup is required. However, its geometry
is arbitrary. In fact, the signals for the loudspeakers are determined as a function of the
DirAC parameters.
There are substantial differences between DirAC and parametric multichannel audio
coding such as MPEG Surround although they share very similar processing structures, cf.
Lars Villemoes, Juergen Herre, Jeroen Breebaart, Gerard Hotho, Sascha Disch, Heiko
Purnhagen, and Kristofer Kjörling, MPEG surround: The forthcoming ISO standard for
spatial audio coding, in AES 28th International Conference, Piteå, Sweden, June 2006.
While MPEG Surround is based on a time-frequency analysis of the different loudspeaker
channels, DirAC takes as input the channels of coincident microphones, which effectively
describe the sound field in one point. Thus, DirAC also represents an efficient recording
technique for spatial audio.
Another conventional system which deals with spatial audio is SAOC (SAOC = Spatial
Audio Object Coding), cf. Jonas Engdegård, Barbara Resch, Cornelia Falch, Oliver
Hellmuth, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev, Jeroen Breebaart, Jeroen
Koppens, Erik Schuijers, and Werner Oomen, Spatial audio object coding (SAOC), the
upcoming MPEG standard on parametric object based audio coding, in 124th AES
Convention, May 17-20, 2008, Amsterdam, The Netherlands, 2008, currently under
standardization in ISO/MPEG.
It builds upon the rendering engine of MPEG Surround and treats different sound sources
as objects. This audio coding offers very high efficiency in terms of bitrate and gives
unprecedented freedom of interaction at the reproduction side. This approach promises
new compelling features and functionality in legacy systems, as well as several other novel
applications.
It is the object of the present invention to provide an improved concept for merging spatial
audio signals.
The object is achieved by an apparatus for merging according to one of the claims 1 or 14
and a method for merging according to one of the claims 13 or 15.
Note that the merging would be trivial in the case of a multi-channel DirAC stream, i.e. if
the 4 B-format audio channels were available. In fact, the signals from different sources
can be directly summed to obtain the B-format signals of the merged stream. However, if
these channels are not available direct merging is problematic.
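As an illustration (a sketch, not part of the original text; names are illustrative), the trivial multi-channel case reduces to sample-wise addition of the four B-format channels:

```python
import numpy as np

def merge_b_format(streams):
    """Trivially merge spatial audio streams given as B-format signals.

    streams: list of arrays of shape (4, num_samples), holding the channels
    (w, x, y, z) of each source. Since sound propagation is linear, the
    B-format channels of the merged stream are the sums of the individual ones.
    """
    return np.sum(streams, axis=0)
```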
The present invention is based on the finding that spatial audio signals can be represented
by the sum of a wave representation, e.g. a plane wave representation, and a diffuse field
representation. A direction may be assigned to the former. When merging several audio
streams, embodiments may allow the side information of the merged stream to be obtained, e.g.
in terms of a diffuseness and a direction. Embodiments may obtain this information from
the wave representations as well as the input audio streams. When merging several audio
streams, which all can be modeled by a wave part or representation and a diffuse part or
representation, wave parts or components and diffuse parts or components can be merged
separately. Merging the wave part yields a merged wave part, for which a merged direction
can be obtained based on the directions of the wave part representations. Moreover, the
diffuse parts can also be merged separately; from the merged diffuse part, an overall
diffuseness parameter can be derived.
Embodiments may provide a method to merge two or more spatial audio signals coded as
mono DirAC streams. The resulting merged signal can be represented as a mono DirAC
stream as well. In embodiments mono DirAC encoding can be a compact way of
describing spatial audio, as only a single audio channel needs to be transmitted together
with side information.
In embodiments a possible scenario can be a teleconferencing application with more than
two parties. For instance, let user A communicate with users B and C, who generate two
separate mono DirAC streams. At the location of A, the embodiment may allow the
streams of user B and C to be merged into a single mono DirAC stream, which can be
reproduced with the conventional DirAC synthesis technique. In an embodiment utilizing a
network topology which sees the presence of a multipoint control unit (MCU = multipoint
control unit), the merging operation would be performed by the MCU itself, so that user A
would receive a single mono DirAC stream already containing speech from both B and C.
Clearly, the DirAC streams to be merged can also be generated synthetically, meaning that
proper side information can be added to a mono audio signal. In the example just
mentioned, user A might receive two audio streams from B and C without any side
information. It is then possible to assign to each stream a certain direction and diffuseness,
thus adding the side information needed to construct the DirAC streams, which can then be
merged by an embodiment.
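As a sketch of this synthetic construction (the container and the chosen constants are illustrative assumptions, not part of the original text):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class MonoDirACStream:
    """Mono DirAC stream: one audio channel plus per-(k, n) side information."""
    P: np.ndarray      # pressure spectrum P(k, n), complex, shape (K, N)
    e_doa: np.ndarray  # unit DOA vectors, shape (K, N, 3)
    psi: np.ndarray    # diffuseness in [0, 1], shape (K, N)

def synthesize_stream(P, azimuth_deg, psi=0.0):
    """Attach a fixed direction and diffuseness to a plain mono spectrum,
    e.g. to one remote talker before merging."""
    az = np.deg2rad(azimuth_deg)
    e = np.array([np.cos(az), np.sin(az), 0.0])   # DOA in the horizontal plane
    e_doa = np.broadcast_to(e, P.shape + (3,)).copy()
    return MonoDirACStream(P, e_doa, np.full(P.shape, float(psi)))
```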
Another possible scenario in embodiments can be found in multiplayer online gaming and
virtual reality applications. In these cases several streams are generated from either players
or virtual objects. Each stream is characterized by a certain direction of arrival relative to
the listener and can therefore be expressed by a DirAC stream. The embodiment may be
used to merge the different streams into a single DirAC stream, which is then reproduced
at the listener position.
Embodiments of the present invention will be detailed using the accompanying figures, in
which
Fig. 1a shows an embodiment of an apparatus for merging;
Fig. 1b shows pressure and components of a particle velocity vector in a Gaussian plane for
a plane wave;
Fig. 2 shows an embodiment of a DirAC encoder;
Fig. 3 illustrates an ideal merging of audio streams;
Fig. 4 shows the inputs and outputs of an embodiment of a general DirAC merging
processing block;
Fig. 5 shows a block diagram of an embodiment; and
Fig. 6 shows a flowchart of an embodiment of a method for merging.
Fig. 1a illustrates an embodiment of an apparatus 100 for merging a first spatial audio
stream with a second spatial audio stream to obtain a merged audio stream. The
embodiment illustrated in Fig. 1a shows the merging of two audio streams; however, it shall
not be limited to two audio streams, as multiple spatial audio streams may
be merged in a similar way. The first spatial audio stream and the second spatial audio stream may, for
example, correspond to mono DirAC streams and the merged audio stream may also
correspond to a single mono DirAC audio stream. As will be detailed subsequently, a
mono DirAC stream may comprise a pressure signal, e.g. captured by an omni-directional
microphone, and side information. The latter may comprise time-frequency dependent
measures of diffuseness and direction of arrival of sound.
Fig. 1a shows an embodiment of an apparatus 100 for merging a first spatial audio stream
with a second spatial audio stream to obtain a merged audio stream, comprising an
estimator 120 for estimating a first wave representation comprising a first wave direction
measure and a first wave field measure for the first spatial audio stream, the first spatial
audio stream having a first audio representation and a first direction of arrival, and for
estimating a second wave representation comprising a second wave direction measure and
a second wave field measure for the second spatial audio stream, the second spatial audio
stream having a second audio representation and a second direction of arrival. In
embodiments the first and/or second wave representation may correspond to a plane wave
representation.
In the embodiment shown in Fig. 1a the apparatus 100 further comprises a processor 130
for processing the first wave representation and the second wave representation to obtain a
merged wave representation comprising a merged wave field measure and a merged direction of
arrival measure, and for processing the first audio representation and the second audio
representation to obtain a merged audio representation. The processor 130 is further adapted
for providing the merged audio stream comprising the merged audio representation and the
merged direction of arrival measure.
The estimator 120 can be adapted for estimating the first wave field measure in terms of a
first wave field amplitude, for estimating the second wave field measure in terms of a
second wave field amplitude and for estimating a phase difference between the first wave
field measure and the second wave field measure. In embodiments the estimator can be
adapted for estimating a first wave field phase and a second wave field phase. In
embodiments, the estimator 120 may estimate only a phase shift or difference between the
first and second wave representations, the first and second wave field measures,
respectively. The processor 130 may then accordingly be adapted for processing the first
wave representation and the second wave representation to obtain a merged wave
representation comprising a merged wave field measure, which may comprise a merged
wave field amplitude, a merged wave field phase and a merged direction of arrival
measure, and for processing the first audio representation and the second audio
representation to obtain a merged audio representation.
In embodiments the processor 130 can be further adapted for processing the first wave
representation and the second wave representation to obtain the merged wave
representation comprising the merged wave field measure, the merged direction of arrival
measure and a merged diffuseness parameter, and for providing the merged audio stream
comprising the merged audio representation, the merged direction of arrival measure and
the merged diffuseness parameter.
In other words, in embodiments a diffuseness parameter can be determined based on the
wave representations for the merged audio stream. The diffuseness parameter may
establish a measure of a spatial diffuseness of an audio stream, i.e. a measure for a spatial
distribution as e.g. an angular distribution around a certain direction. In an embodiment a
possible scenario could be the merging of two mono synthetic signals with just directional
information.
The processor 130 can be adapted for processing the first wave representation and the
second wave representation to obtain the merged wave representation, wherein the merged
diffuseness parameter is based on the first wave direction measure and on the second wave
direction measure. In embodiments the first and second wave representations may have
different directions of arrival and the merged direction of arrival may lie in between them.
In this embodiment, although the first and second spatial audio streams may not provide
any diffuseness parameters, the merged diffuseness parameter can be determined from the
first and second wave representations, i.e. based on the first wave direction measure and on
the second wave direction measure. For example, if two plane waves impinge from
different directions, i.e. the first wave direction measure differs from the second wave
direction measure, the merged audio representation may comprise a combined merged
direction of arrival with a non-vanishing merged diffuseness parameter, in order to
account for the first wave direction measure and the second wave direction measure. In
other words, while two focussed spatial audio streams may not have or provide any
diffuseness, the merged audio stream may have a non-vanishing diffuseness, as it is based
on the angular distribution established by the first and second audio streams.
Embodiments may estimate a diffuseness parameter Ψ, for example, for a merged DirAC
stream. Generally, embodiments may then set or assume the diffuseness parameters of the
individual streams to a fixed value, for instance 0 or 0.1, or to a varying value derived from
an analysis of the audio representations and/or direction representations.
In other embodiments, the apparatus 100 for merging the first spatial audio stream with
the second spatial audio stream to obtain a merged audio stream, may comprise the
estimator 120 for estimating the first wave representation comprising a first wave direction
measure and a first wave field measure for the first spatial audio stream, the first spatial
audio stream having the first audio representation, the first direction of arrival and a first
diffuseness parameter. In other words, the first audio representation may correspond to an
audio signal with a certain spatial width or being diffuse to a certain extent. In one
embodiment, this may correspond to a scenario in a computer game. A first player may be in
a scenario, where the first audio representation represents an audio source as for example a
train passing by, creating a diffuse sound field to a certain extent. In such an embodiment,
sounds evoked by the train itself may be diffuse, while a sound produced by the train's horn, i.e.
the corresponding frequency components, may not be diffuse.
The estimator 120 may further be adapted for estimating the second wave representation
comprising the second wave direction measure and the second wave field measure for the
second spatial audio stream, the second spatial audio stream having the second audio
representation, the second direction of arrival and a second diffuseness parameter. In other
words, the second audio representation may correspond to an audio signal with a certain
spatial width or being diffuse to a certain extent. Again, this may correspond to the
scenario in the computer game, where a second sound source may be represented by the
second audio stream, for example, background noise of another train passing by on another
track. For the first player in the computer game, both sound sources may be diffuse, as he is
located at the train station.
In embodiments the processor 130 can be adapted for processing the first wave
representation and the second wave representation to obtain the merged wave
representation comprising the merged wave field measure and the merged direction of
arrival measure, and for processing the first audio representation and the second audio
representation to obtain the merged audio representation, and for providing the merged
audio stream comprising the merged audio representation and the merged direction of
arrival measure. In other words the processor 130 may not determine a merged diffuseness
parameter. This may correspond to the sound field experienced by a second player in the
above-described computer game. The second player may be located farther away from the
train station, so the two sound sources may not be experienced as diffuse by the second
player, but represent rather focussed sound sources, due to the larger distance.
In embodiments the apparatus 100 may further comprise a means 110 for determining for
the first spatial audio stream the first audio representation and the first direction of arrival,
and for determining for the second spatial audio stream the second audio representation
and the second direction of arrival. In embodiments the means 110 for determining may be
provided with a direct audio stream, i.e. the determining may just refer to reading the audio
representation in terms of e.g. a pressure signal and a DOA and optionally also diffuseness
parameters in terms of the side information.
The estimator 120 can be adapted for estimating the first wave representation from the first
spatial audio stream further having a first diffuseness parameter and/or for estimating the
second wave representation from the second spatial audio stream further having a second
diffuseness parameter, the processor 130 may be adapted for processing the merged wave
field measure, the first and second audio representations and the first and second
diffuseness parameters to obtain the merged diffuseness parameter for the merged audio
stream, and the processor 130 can be further adapted for providing the audio stream
comprising the merged diffuseness parameter. The means 110 for determining can be
adapted for determining the first diffuseness parameter for the first spatial audio stream
and the second diffuseness parameter for the second spatial audio stream.
The processor 130 can be adapted for processing the spatial audio streams, the audio
representations, the DOA and/or the diffuseness parameters blockwise, i.e. in terms of
segments of samples or values. In some embodiments a segment may comprise a
predetermined number of samples corresponding to a frequency representation of a certain
frequency band at a certain time of a spatial audio stream. Such a segment may correspond
to a mono representation and have an associated DOA and diffuseness parameter.
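A minimal sketch of such blockwise time-frequency segmentation (assuming a windowed FFT filter bank; parameters are illustrative):

```python
import numpy as np

def stft_frames(p, frame_len=512, hop=256):
    """Slice a time-domain pressure signal p(t) into segments P(k, n):
    row index k is the frequency bin, column index n is the time frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(p) - frame_len) // hop
    frames = np.stack([p[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)  # shape (frame_len // 2 + 1, n_frames)
```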
In embodiments the means 110 for determining can be adapted for determining the first
and second audio representation, the first and second direction of arrival and the first and
second diffuseness parameters in a time-frequency dependent way and/or the processor
130 can be adapted for processing the first and second wave representations, diffuseness
parameters and/or DOA measures and/or for determining the merged audio representation,
the merged direction of arrival measure and/or the merged diffuseness parameter in a time-
frequency dependent way.
In embodiments the first audio representation may correspond to a first mono
representation and the second audio representation may correspond to a second mono
representation and the merged audio representation may correspond to a merged mono
representation. In other words, the audio representations may correspond to a single audio
channel.
In embodiments, the means 110 for determining can be adapted for determining and/or the
processor can be adapted for processing the first and second mono representation, the first
and the second DOA and a first and a second diffuseness parameter and the processor 130
may provide the merged mono representation, the merged DOA measure and/or the
merged diffuseness parameter in a time-frequency dependent way. In embodiments the
first spatial audio stream may already be provided in terms of, for example, a DirAC
representation, the means 110 for determining may be adapted for determining the first and
second mono representation, the first and second DOA and the first and second diffuseness
parameters simply by extraction from the first and the second audio streams, e.g. from the
DirAC side information.
In the following, an embodiment will be illuminated in detail, where the notation and the
data model are to be introduced first. In embodiments, the means 110 for determining can
be adapted for determining the first and second audio representations and/or the processor
130 can be adapted for providing a merged mono representation in terms of a pressure
signal p(t) or a time-frequency transformed pressure signal P(k,n), wherein k denotes a
frequency index and n denotes a time index.
In embodiments the first and second wave direction measures as well as the merged
direction of arrival measure may correspond to any directional quantity, as e.g. a vector, an
angle, a direction etc. and they may be derived from any directional measure representing
an audio component as e.g. an intensity vector, a particle velocity vector, etc. The first and
second wave field measures as well as the merged wave field measure may correspond to
any physical quantity describing an audio component, which can be real or complex
valued, and may correspond to a pressure signal, a particle velocity amplitude or magnitude,
a loudness etc. Moreover, measures may be considered in the time and/or frequency
domain.
Embodiments may be based on the estimation of a plane wave representation for the wave
field measures of the wave representations of the input streams, which can be carried out
by the estimator 120 in Fig. la. In other words the wave field measure may be modelled
using a plane wave representation. In general there exist several equivalent exhaustive (i.e.,
complete) descriptions of a plane wave or waves in general. In the following a
mathematical description will be introduced for computing diffuseness parameters and
directions of arrivals or direction measures for different components. Although only a few
descriptions relate directly to physical quantities, as for instance pressure, particle velocity
etc., potentially there exist an infinite number of different ways to describe wave
representations, of which one shall be presented as an example subsequently, however, not
meant to be limiting in any way to embodiments of the present invention.
In order to further detail different potential descriptions two real numbers a and b are
considered. The information contained in a and b may be transferred by sending c and
d, when
$$ \begin{bmatrix} c \\ d \end{bmatrix} = \Omega \begin{bmatrix} a \\ b \end{bmatrix}, $$
wherein Ω is a known 2x2 matrix. The example considers only linear combinations;
generally any combination, i.e. also a non-linear combination, is conceivable.
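For instance, with the (hypothetical) choice
$$ \Omega = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad c = a + b, \quad d = a - b, $$
the pair (c, d) carries the same information as (a, b), since Ω is invertible and a = (c + d)/2, b = (c − d)/2.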
In the following scalars are represented by small letters a,b,c, while column vectors are
represented by bold small letters a, b, c. The superscript (·)^T denotes the transpose,
whereas (·)* denotes complex conjugation. The complex phasor
notation is distinguished from the temporal one. For instance, the pressure p(t), which is a
real number and from which a possible wave field measure can be derived, can be
expressed by means of the phasor P, which is a complex number and from which another
possible wave field measure can be derived, by
$$ p(t) = \mathrm{Re}\{P\, e^{j\omega t}\}, $$
wherein Re{·} denotes the real part and ω = 2πf is the angular frequency. Furthermore,
capital letters used for physical quantities represent phasors in the following. For the
following introductory example and to avoid confusion, please note that all quantities with
subscript "PW" considered in the following refer to plane waves.
For an ideal monochromatic plane wave the particle velocity
vector U_PW can be noted as
$$ U_{PW} = \frac{1}{\rho_0 c}\, e_d\, P_{PW}, $$
where the unit vector ed points towards the direction of propagation of the wave, e.g.
corresponding to a direction measure. It can be proven that
$$ I_a = \frac{1}{2\rho_0 c}\,|P_{PW}|^2\, e_d, \qquad E = \frac{1}{2\rho_0 c^2}\,|P_{PW}|^2, \qquad \Psi = 0, $$
wherein I_a denotes the active intensity, ρ0 denotes the air density, c denotes the speed of
sound, E denotes the sound field energy and Ψ denotes the diffuseness.
It is interesting to note that since all components of e_d are real numbers, the components
of U_PW are all in-phase with P_PW. Fig. 1b illustrates an exemplary U_PW and P_PW in the
Gaussian plane. As just mentioned, all components of U_PW share the same phase as P_PW,
namely θ. Their magnitudes, on the other hand, are bound to
$$ \|U_{PW}\| = \frac{|P_{PW}|}{\rho_0 c}. $$
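As an illustration (not part of the original disclosure), a short numerical check of these plane-wave relations, assuming SI units and the phasor convention introduced above:

```python
import numpy as np

rho0, c = 1.2, 343.0                 # air density [kg/m^3], speed of sound [m/s]
e_d = np.array([1.0, 0.0, 0.0])      # direction of propagation (unit vector)
P = 2.0 * np.exp(1j * 0.3)           # pressure phasor with phase theta = 0.3

U = P * e_d / (rho0 * c)             # particle velocity, in phase with P
Ia = 0.5 * np.real(P * np.conj(U))   # active intensity vector
E = rho0 / 4 * np.sum(np.abs(U)**2) + np.abs(P)**2 / (4 * rho0 * c**2)

print(1.0 - np.linalg.norm(Ia) / (c * E))  # diffuseness: ~0 for a plane wave
```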
Even when multiple sound sources are present, the pressure and particle velocity can still
be expressed as a sum of individual components. Without loss of generality, the case of
two sound sources can be illuminated. In fact, the extension to larger numbers of sources is
straight-forward.
Let P^(1) and P^(2) be the pressures which would have been recorded for the first and second
source, respectively, e.g. representing the first and second wave field measures.
Similarly, let U^(1) and U^(2) be the complex particle velocity vectors. Given the linearity of
the propagation phenomenon, when the sources play together, the observed pressure P
and particle velocity U are
$$ P = P^{(1)} + P^{(2)}, \qquad U = U^{(1)} + U^{(2)}. $$
Therefore, the active intensities are
$$ I_a^{(1)} = \frac{1}{2}\,\mathrm{Re}\{P^{(1)}\, U^{(1)*}\}, \qquad I_a^{(2)} = \frac{1}{2}\,\mathrm{Re}\{P^{(2)}\, U^{(2)*}\}. $$
Thus,
$$ I_a = \frac{1}{2}\,\mathrm{Re}\{P\, U^{*}\} = I_a^{(1)} + I_a^{(2)} + \frac{1}{2}\,\mathrm{Re}\{P^{(1)}\, U^{(2)*} + P^{(2)}\, U^{(1)*}\}. $$
Note that apart from special cases,
$$ I_a \neq I_a^{(1)} + I_a^{(2)}. $$
When the two, e.g. plane, waves are exactly in-phase (although traveling towards different
directions),
$$ P^{(2)} = \gamma\, P^{(1)}, $$
wherein γ is a real number. It follows that
$$ P = (1 + \gamma)\, P^{(1)} $$
and
$$ U = \frac{1}{\rho_0 c}\left(e_d^{(1)} + \gamma\, e_d^{(2)}\right) P^{(1)}. $$
When the waves are in-phase and traveling towards the same direction they can be clearly
interpreted as one wave.
For γ = −1 and any direction, the pressure vanishes and there can be no flow of energy,
$$ P \equiv 0 \qquad \text{and} \qquad I_a \equiv 0. $$
When the waves are perfectly in quadrature, then
$$ P^{(2)} = j\gamma\, P^{(1)}, $$
wherein γ is a real number. From this it follows that
$$ I_a = I_a^{(1)} + I_a^{(2)} $$
and
$$ E = E^{(1)} + E^{(2)}. $$
Using the above equations it can easily be proven that for a plane wave each of the
exemplary quantities U, P and e_d, or P and I_a may represent an equivalent and
exhaustive description, as all other physical quantities can be derived from them, i.e., any
combination of them may in embodiments be used in place of the wave field measure or
wave direction measure. For example, in embodiments the 2-norm of the active intensity
vector may be used as wave field measure.
A minimum description may be identified to perform the merging as specified by the
embodiments. The pressure and particle velocity vectors for the i-th plane wave can be
expressed as
$$ P^{(i)} = |P^{(i)}|\, e^{j\angle P^{(i)}}, \qquad U^{(i)} = \frac{1}{\rho_0 c}\, e_d^{(i)}\, P^{(i)}. $$
This equation shows that the information required to compute I_a can be reduced to
|P^(i)|, e_d^(i) and |∠P^(1) − ∠P^(2)|. In other words, the representation for each, e.g. plane, wave can
be reduced to the amplitude of the wave and the direction of propagation. Furthermore, the
relative phase difference between the waves may be considered as well. When more than
two waves are to be merged, the phase differences between all pairs of waves may be
considered. Clearly, there exist several other descriptions which contain the very same
information. For instance, knowing the intensity vectors and the phase difference would be
equivalent.
Generally, an energetic description of the plane waves may not be enough to carry out the
merging correctly. The merging could be approximated by assuming the waves in
quadrature. An exhaustive descriptor of the waves (i.e., all physical quantities of the wave
are known) can be sufficient for the merging, however may not be necessary in all
embodiments. In embodiments carrying out correct merging the amplitude of each wave,
the direction of propagation of each wave and the relative phase difference between each
pair of waves to be merged may be taken into account.
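A minimal sketch (not from the original text; function and variable names are illustrative) of merging two plane waves from exactly this minimum description, i.e. the amplitudes, the propagation directions and the relative phase difference:

```python
import numpy as np

rho0, c = 1.2, 343.0

def merge_plane_waves(amp1, e1, amp2, e2, dphi):
    """Merge two plane waves given |P^(1)|, |P^(2)|, the unit propagation
    directions e_d^(1), e_d^(2) and the phase difference dphi between them.
    Returns the merged active intensity and merged direction of arrival."""
    P1 = amp1                       # phase of wave 1 taken as reference
    P2 = amp2 * np.exp(1j * dphi)   # wave 2 carries the relative phase
    U1 = P1 * np.asarray(e1, dtype=float) / (rho0 * c)
    U2 = P2 * np.asarray(e2, dtype=float) / (rho0 * c)
    P, U = P1 + P2, U1 + U2
    Ia = 0.5 * np.real(P * np.conj(U))   # merged active intensity
    return Ia, -Ia / np.linalg.norm(Ia)  # DOA opposite to the energy flow
```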
The means 110 for determining can be adapted for providing and/or the processor 130 can
be adapted for processing the first and second directions of arrival and/or for providing the
merged direction of arrival measure in terms of a unit vector e_DOA(k,n), with
U(k,n) = [U_x(k,n), U_y(k,n), U_z(k,n)]^T
denoting the time-frequency transformed u(t) = [u_x(t), u_y(t), u_z(t)]^T particle velocity vector.
In other words, let p(t) and u(t) = [u_x(t), u_y(t), u_z(t)]^T be the pressure and particle velocity
vector, respectively, for a specific point in space, where [·]^T denotes the transpose. These
signals can be transformed into a time-frequency domain by means of a proper filter bank
e.g., a Short Time Fourier Transform (STFT) as suggested e.g. by V. Pulkki and C. Faller,
Directional audio coding: Filterbank and STFT-based design, in 120th AES Convention,
May 20-23, 2006, Paris, France, May 2006.
Let P(k,n) and U(k,n) = [U_x(k,n), U_y(k,n), U_z(k,n)]^T denote the transformed signals,
where k and n are indices for frequency (or frequency band) and time, respectively. The
active intensity vector I_a(k,n) can be defined as
$$ I_a(k,n) = \frac{1}{2}\,\mathrm{Re}\{P(k,n)\, U^{*}(k,n)\}, $$
where (·)* denotes complex conjugation and Re{·} extracts the real part. The active
intensity vector expresses the net flow of energy characterizing the sound field, cf. F.J.
Fahy, Sound Intensity, Essex: Elsevier Science Publishers Ltd., 1989, and may thus be
used as a wave field measure.
Let c denote the speed of sound in the medium considered and E the sound field energy
defined by F.J. Fahy
$$ E(k,n) = \frac{\rho_0}{4}\,\|U(k,n)\|^2 + \frac{1}{4\rho_0 c^2}\,|P(k,n)|^2, $$
where ‖·‖ computes the 2-norm. In the following, the content of a mono DirAC stream will
be detailed.
The mono DirAC stream may consist of the mono signal p(t) and of side information.
This side information may comprise the time-frequency dependent direction of arrival and
a time-frequency dependent measure for diffuseness. The former can be denoted with
e_DOA(k,n), which is a unit vector pointing towards the direction from which sound arrives.
The latter, diffuseness, is denoted by Ψ(k,n).
In embodiments, the means 110 and/or the processor 130 can be adapted for
providing/processing the first and second DOAs and/or the merged DOA in terms of a
unit vector e_DOA(k,n). The direction of arrival can be obtained as
$$ e_{DOA}(k,n) = -e_I(k,n), $$
where the unit vector e_I(k,n) indicates the direction towards which the active intensity
points, namely
$$ e_I(k,n) = \frac{I_a(k,n)}{\|I_a(k,n)\|}. $$
Alternatively in embodiments, the DOA can be expressed in terms of azimuth and
elevation angles in a spherical coordinate system. For instance, if φ and ϑ are azimuth
and elevation angles, respectively, then
$$ e_{DOA}(k,n) = [\cos\varphi\,\cos\vartheta,\ \sin\varphi\,\cos\vartheta,\ \sin\vartheta]^T. $$
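A small helper (illustrative, not part of the original text) converting between the two DOA representations under the convention just given:

```python
import numpy as np

def angles_to_unit_vector(azimuth, elevation):
    """Azimuth/elevation [rad] -> unit DOA vector, per the convention above."""
    return np.array([np.cos(azimuth) * np.cos(elevation),
                     np.sin(azimuth) * np.cos(elevation),
                     np.sin(elevation)])

def unit_vector_to_angles(e_doa):
    """Unit DOA vector -> (azimuth, elevation) in radians."""
    azimuth = np.arctan2(e_doa[1], e_doa[0])
    elevation = np.arcsin(np.clip(e_doa[2], -1.0, 1.0))
    return azimuth, elevation
```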
In embodiments, the means 110 for determining and/or the processor 130 can be adapted
for providing/processing the first and second diffuseness parameters and/or the merged
diffuseness parameter Ψ(k,n) in a time-frequency dependent manner. The means 110
for determining can be adapted for providing the first and/or the second diffuseness
parameters and/or the processor 130 can be adapted for providing a merged diffuseness
parameter in terms of
$$ \Psi(k,n) = 1 - \frac{\|\langle I_a(k,n)\rangle_t\|}{c\,\langle E(k,n)\rangle_t}, $$
where ⟨·⟩_t indicates a temporal average.
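Taken together, the quantities above suggest the following energetic-analysis sketch (the constants and the moving-average length are illustrative assumptions, not the patent's reference implementation):

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

RHO0, C = 1.2, 343.0  # assumed air density [kg/m^3] and speed of sound [m/s]

def energetic_analysis(P, U, avg_len=8):
    """DOA and diffuseness from time-frequency pressure/velocity signals.

    P: complex array, shape (K, N) -- pressure P(k, n).
    U: complex array, shape (K, N, 3) -- particle velocity U(k, n).
    avg_len: length of the temporal average <.>_t, in frames.
    """
    Ia = 0.5 * np.real(P[..., None] * np.conj(U))        # active intensity
    E = RHO0 / 4 * np.sum(np.abs(U)**2, axis=-1) \
        + np.abs(P)**2 / (4 * RHO0 * C**2)               # sound field energy

    Ia_t = uniform_filter1d(Ia, size=avg_len, axis=1)    # <I_a(k, n)>_t
    E_t = uniform_filter1d(E, size=avg_len, axis=1)      # <E(k, n)>_t

    psi = 1.0 - np.linalg.norm(Ia_t, axis=-1) / (C * E_t + 1e-20)
    e_doa = -Ia / (np.linalg.norm(Ia, axis=-1, keepdims=True) + 1e-20)
    return e_doa, psi
```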
There exist different strategies to obtain P(k,n) and U(k,n) in practice. One possibility is
to use a B-format microphone, which delivers 4 signals, namely w(t), x(t), y(t) and z(t).
The first one, w(t), corresponds to the pressure reading of an omnidirectional microphone.
The latter three are pressure readings of microphones having figure-of-eight pickup
patterns directed towards the three axes of a Cartesian coordinate system. These signals are
also proportional to the particle velocity. Therefore, in some embodiments
$$ P(k,n) = W(k,n), \qquad\qquad (5) $$
$$ U(k,n) = -\frac{1}{\sqrt{2}\,\rho_0 c}\,\big[X(k,n),\ Y(k,n),\ Z(k,n)\big]^T, \qquad\qquad (6) $$
where W(k,n), X(k,n), Y(k,n) and Z(k,n) are the transformed B-format signals. Note
that the factor √2 in (6) comes from the convention used in the definition of B-format
signals, cf. Michael Gerzon, Surround sound psychoacoustics, In Wireless World, volume
80, pages 483-486, December 1974.
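A sketch of the conversion in (5) and (6), assuming B-format spectra given as complex arrays (the sign convention follows the reconstruction above):

```python
import numpy as np

RHO0, C = 1.2, 343.0

def b_format_to_pu(W, X, Y, Z):
    """Estimate P(k, n) and U(k, n) from transformed B-format signals.

    W, X, Y, Z: complex arrays of shape (K, N).
    Returns P of shape (K, N) and U of shape (K, N, 3), per eqs. (5)-(6).
    """
    P = W                                                        # eq. (5)
    U = -np.stack([X, Y, Z], axis=-1) / (np.sqrt(2) * RHO0 * C)  # eq. (6)
    return P, U
```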
Alternatively, P(k,n) and U(k,n) can be estimated by means of an omnidirectional
microphone array as suggested in J. Merimaa, Applications of a 3-D microphone array, in
112th AES Convention, Paper 5501, Munich, May 2002. The processing steps described
above are also illustrated in Fig. 2.
Fig. 2 shows a DirAC encoder 200, which is adapted for computing a mono audio channel
and side information from proper input signals, e.g., microphone signals. In other words,
Fig. 2 illustrates a DirAC encoder 200 for determining diffuseness and direction of arrival
from proper microphone signals. Fig. 2 shows a DirAC encoder 200 comprising a P/U
estimation unit 210. The P/U estimation unit receives the microphone signals as input
information, on which the P/U estimation is based. Since all information is available, the
P/U estimation is straight-forward according to the above equations. An energetic
analysis stage 220 enables estimation of the direction of arrival and the diffuseness
parameter of the merged stream.
In embodiments, other audio streams than mono DirAC audio streams may be merged. In
other words, in embodiments the means 110 for determining can be adapted for converting
any other audio stream to the first and second audio streams as for example stereo or
surround audio data. In case that embodiments merge DirAC streams other than mono,
they may distinguish between different cases. If the DirAC stream carried B-format signals
as audio signals, then the particle velocity vectors would be known and a merging would
be trivial, as will be detailed subsequently. When the DirAC stream carries audio signals
other than B-format signals or a mono omnidirectional signal, the means 110 for
determining may be adapted for converting to two mono DirAC streams first, and an
embodiment may then merge the converted streams accordingly. In embodiments the first
and the second spatial audio streams can thus represent converted mono DirAC streams.
Embodiments may combine available audio channels to approximate an omnidirectional
pickup pattern. For instance, in case of a stereo DirAC stream, this may be achieved by
summing the left channel L and the right channel R.
In the following, the physics in a field generated by multiple sound sources shall be
illuminated. When multiple sound sources are present, it is still possible to express the
pressure and particle velocity as a sum of individual components.
Let P^(i)(k,n) and U^(i)(k,n) be the pressure and particle velocity which would have been
recorded for the i-th source, if it were to play alone. Assuming linearity of the propagation
phenomenon, when N sources play together, the observed pressure P(k,n) and particle
velocity U(k,n) are
$$ P(k,n) = \sum_{i=1}^{N} P^{(i)}(k,n) \qquad\qquad (7) $$
and
$$ U(k,n) = \sum_{i=1}^{N} U^{(i)}(k,n). \qquad\qquad (8) $$
The previous equations show that if both pressure and particle velocity were known,
obtaining the merged mono DirAC stream would be straight-forward. Such a situation is
depicted in Fig. 3. Fig. 3 illustrates an embodiment performing optimized or possibly ideal
merging of multiple audio streams. Fig. 3 assumes that all pressure and particle velocity
vectors are known. Unfortunately, such a trivial merging is not possible for mono DirAC
streams, for which the particle velocity U^(i)(k,n) is not known.
Fig. 3 illustrates N streams, for each of which a P/U estimation is carried out in blocks
301, 302-30N. The outcome of the P/U estimation blocks are the corresponding time-
frequency representations of the individual P^(i)(k,n) and U^(i)(k,n) signals, which can
then be combined according to the above equations (7) and (8), illustrated by the two
adders 310 and 311. Once the combined P(k,n) and U(k,n) are obtained, an energetic
analysis stage 320 can determine the diffuseness parameter Ψ(k,n) and the direction of
arrival e_DOA(k,n) in a straight-forward manner.
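The structure of Fig. 3 can be sketched compactly (building on the energetic_analysis sketch above; again illustrative, not the patent's reference implementation):

```python
import numpy as np

def ideal_merge(P_list, U_list, avg_len=8):
    """Ideal merging per Fig. 3 and eqs. (7)-(8): sum the pressures and
    particle velocities of the N streams, then run the energetic analysis.

    P_list: N complex arrays of shape (K, N_t); U_list: N arrays (K, N_t, 3).
    """
    P = np.sum(P_list, axis=0)   # eq. (7)
    U = np.sum(U_list, axis=0)   # eq. (8)
    e_doa, psi = energetic_analysis(P, U, avg_len)
    return P, e_doa, psi
```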
Fig. 4 illustrates an embodiment for merging multiple mono DirAC streams. According to
the above description, N streams are to be merged by the embodiment of an apparatus 100
depicted in Fig. 4. As illustrated in Fig. 4, each of the N input streams may be represented
by a time-frequency dependent mono representation P^(i)(k,n), a direction of arrival
e_DOA^(i)(k,n) and a diffuseness parameter Ψ^(i)(k,n), where (1) represents the first stream. An according representation
is also illustrated in Fig. 4 for the merged stream.
The task of merging two or more mono DirAC streams is depicted in Fig. 4. As the
pressure P(k,n) can be obtained simply by summing the known quantities P^(i)(k,n) as in
(7), the problem of merging two or more mono DirAC streams reduces to the
determination of e_DOA(k,n) and Ψ(k,n). The following embodiment is based on the
assumption that the field of each source consists of a plane wave summed to a diffuse field.
Therefore, the pressure and particle velocity for the i-th source can be expressed as
$$ P^{(i)}(k,n) = P_{PW}^{(i)}(k,n) + P_{diff}^{(i)}(k,n), \qquad U^{(i)}(k,n) = U_{PW}^{(i)}(k,n) + U_{diff}^{(i)}(k,n), $$
where the subscripts "PW" and "diff" denote the plane wave and the diffuse field,
respectively. In the following an embodiment is presented having a strategy to estimate the
direction of arrival of sound and diffuseness. The corresponding processing steps are
depicted in Fig. 5.
Fig. 5 illustrates another apparatus 500 for merging multiple audio streams which will be
detailed in the following. Fig. 5 exemplifies the processing of the first spatial audio stream
in terms of a first mono representation P^(1), a first direction of arrival e_DOA^(1) and a first
diffuseness parameter Ψ^(1). According to Fig. 5, the first spatial audio stream is
decomposed into an approximated plane wave representation P̂_PW^(1)(k,n), as well as the
second spatial audio stream and potentially other spatial audio streams accordingly into
P̂_PW^(2)(k,n) ... P̂_PW^(N)(k,n). Estimates are indicated by the hat above the respective formula
representation.
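One plausible decomposition step can be sketched as follows, assuming, as is common in DirAC-style processing, that the energy fraction (1 − Ψ) is attributed to the plane-wave part; the patent's own estimator is not reproduced here:

```python
import numpy as np

def split_wave_diffuse(P, psi):
    """Split a mono TF representation P^(i)(k, n) into estimated plane-wave
    and diffuse parts, attributing the energy fraction (1 - psi) to the wave
    part. This split is an illustrative assumption."""
    P_pw_hat = np.sqrt(1.0 - psi) * P   # estimated plane-wave part
    P_diff_hat = np.sqrt(psi) * P       # estimated diffuse part
    return P_pw_hat, P_diff_hat
```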
The estimator 120 can be adapted for estimating a plurality of N wave representations
P̂_PW^(i)(k,n) and diffuse field representations P̂_diff^(i)(k,n) as approximations of P^(i)(k,n) for a
plurality of N spatial audio streams, with 1 ≤ i ≤ N.

Claims

2. The apparatus (100) of claim 1, wherein the estimator (120) is adapted for
estimating the first wave field measure in terms of a first wave field amplitude and
for estimating the second wave field measure in terms of a second wave field
amplitude, and for estimating a phase difference between the first wave field
measure and the second wave field measure, and/or for estimating a first wave field
phase and a second wave field phase.
3. The apparatus of one of the claims 1 to 2, comprising a means (110) for
determining for the first spatial audio stream the first audio representation, the first
direction of arrival measure and the first diffuseness parameter, and for determining
for the second spatial audio stream the second audio representation, the second
direction of arrival measure and the second diffuseness parameter.
4. The apparatus of one of the claims 1 to 3, wherein the processor (130) is adapted
for determining the merged audio representation, the merged direction of arrival
measure and the merged diffuseness parameter in a time-frequency dependent way.
5. The apparatus (100) of one of the claims 1 to 4, wherein the estimator (120) is
adapted for estimating the first and/or second wave representations, and wherein the
processor (130) is adapted for providing the merged audio representation in terms
of a pressure signal p(t) or a time-frequency transformed pressure signal P(k,n),
wherein k denotes a frequency index and n denotes a time index.
6. The apparatus (100) of claim 5, wherein the processor (130) is adapted for
processing the first and second directions of arrival measures and/or for providing
the merged direction of arrival measure in terms of a unit vector e_DOA(k,n), with
$$ e_{DOA}(k,n) = -\frac{I_a(k,n)}{\|I_a(k,n)\|}, \qquad I_a(k,n) = \frac{1}{2}\,\mathrm{Re}\{P(k,n)\,U^{*}(k,n)\}, $$
U(k,n) denoting the time-frequency transformed particle velocity vector of the
merged audio stream, where Re{·} denotes the real part.
7. The apparatus (100) of claim 6, wherein the processor (130) is adapted
for processing the first and/or the second diffuseness parameters and/or for
providing the merged diffuseness parameter in terms of
$$ \Psi(k,n) = 1 - \frac{\|\langle I_a(k,n)\rangle_t\|}{c\,\langle E(k,n)\rangle_t}, $$
with u(t) = [u_x(t), u_y(t), u_z(t)]^T the particle velocity vector, Re{·} denoting the real part,
P(k,n) denoting a time-frequency transformed pressure signal p(t), wherein k
denotes a frequency index and n denotes a time index, c is the speed of sound and
$$ E(k,n) = \frac{\rho_0}{4}\,\|U(k,n)\|^2 + \frac{1}{4\rho_0 c^2}\,|P(k,n)|^2 $$
denotes the sound field energy, where ρ0 denotes the air density and ⟨·⟩_t denotes a temporal average.
8. The apparatus (100) of claim 7, wherein the estimator (120) is adapted for
estimating a plurality of N wave representations P̂_PW^(i)(k,n) and diffuse field
representations P̂_diff^(i)(k,n) as approximations for a plurality of N spatial audio
streams P^(i)(k,n), with 1 ≤ i ≤ N,

wherein the second spatial audio stream additionally comprises a second
diffuseness parameter (Ψ^(2)), and
wherein the merged diffuseness parameter (Ψ) is calculated in the step of
processing additionally based on the first diffuseness parameter (Ψ^(1)) and the
second diffuseness parameter (Ψ^(2)).
15. Computer program having a program code for performing the method of claim 14,
when the program code runs on a computer or a processor.

Abstract

An apparatus (100) for merging a first spatial audio stream with a second spatial audio
stream to obtain a merged audio stream comprising an estimator (120) for estimating a first
wave representation comprising a first wave direction measure and a first wave field
measure for the first spatial audio stream, the first spatial audio stream having a first audio
representation and a first direction of arrival. The estimator (120) being adapted for
estimating a second wave representation comprising a second wave direction measure and
a second wave field measure for the second spatial audio stream, the second spatial audio
stream having a second audio representation and a second direction of arrival. The
apparatus (100) further comprising a processor (130) for processing the first wave
representation and the second wave representation to obtain a merged wave representation
comprising a merged wave field measure and a merged direction of arrival measure, and
for processing the first audio representation and the second audio representation to obtain a
merged audio representation, and for providing the merged audio stream comprising the
merged audio representation and the merged direction of arrival measure.

Documents

Application Documents

# Name Date
1 619-KOLNP-2011-RELEVANT DOCUMENTS [05-09-2023(online)].pdf 2023-09-05
2 619-kolnp-2011-specification.pdf 2011-10-06
3 619-kolnp-2011-pct request form.pdf 2011-10-06
4 619-KOLNP-2011-RELEVANT DOCUMENTS [05-09-2022(online)].pdf 2022-09-05
5 619-KOLNP-2011-RELEVANT DOCUMENTS [24-09-2021(online)].pdf 2021-09-24
6 619-kolnp-2011-pct priority document notification.pdf 2011-10-06
7 619-KOLNP-2011-RELEVANT DOCUMENTS [02-03-2020(online)].pdf 2020-03-02
8 619-kolnp-2011-international search report.pdf 2011-10-06
9 619-KOLNP-2011-RELEVANT DOCUMENTS [06-02-2019(online)].pdf 2019-02-06
10 619-kolnp-2011-international publication.pdf 2011-10-06
11 619-KOLNP-2011-IntimationOfGrant12-07-2018.pdf 2018-07-12
12 619-kolnp-2011-international preliminary examination report.pdf 2011-10-06
13 619-KOLNP-2011-PatentCertificate12-07-2018.pdf 2018-07-12
14 619-kolnp-2011-form-5.pdf 2011-10-06
15 619-KOLNP-2011-Information under section 8(2) (MANDATORY) [29-05-2018(online)].pdf 2018-05-29
16 619-kolnp-2011-form-3.pdf 2011-10-06
17 619-KOLNP-2011-ABSTRACT [18-12-2017(online)].pdf 2017-12-18
18 619-kolnp-2011-form-2.pdf 2011-10-06
19 619-KOLNP-2011-CLAIMS [18-12-2017(online)].pdf 2017-12-18
20 619-kolnp-2011-form-1.pdf 2011-10-06
21 619-KOLNP-2011-FER_SER_REPLY [18-12-2017(online)].pdf 2017-12-18
22 619-KOLNP-2011-FORM 3-1.1.pdf 2011-10-06
23 619-KOLNP-2011-FORM 18.pdf 2011-10-06
24 619-KOLNP-2011-OTHERS [18-12-2017(online)].pdf 2017-12-18
25 619-kolnp-2011-drawings.pdf 2011-10-06
26 619-KOLNP-2011-PETITION UNDER RULE 137 [18-12-2017(online)].pdf 2017-12-18
27 619-kolnp-2011-description (complete).pdf 2011-10-06
28 619-KOLNP-2011-Information under section 8(2) (MANDATORY) [13-11-2017(online)].pdf 2017-11-13
29 619-kolnp-2011-correspondence.pdf 2011-10-06
30 619-KOLNP-2011-FER.pdf 2017-06-20
31 619-KOLNP-2011-CORRESPONDENCE-1.2.pdf 2011-10-06
32 Other Patent Document [20-02-2017(online)].pdf 2017-02-20
33 Other Patent Document [08-08-2016(online)].pdf 2016-08-08
34 619-KOLNP-2011-CORRESPONDENCE-1.1.pdf 2011-10-06
35 619-KOL-2011-CORRESPONDENCE-1.3.pdf 2011-10-17
36 619-kolnp-2011-claims.pdf 2011-10-06
37 619-KOL-2011-PA.pdf 2011-10-17
38 619-KOLNP-2011-ASSIGNMENT.pdf 2011-10-06
39 619-kolnp-2011-abstract.pdf 2011-10-06

Search Strategy

1 PatSeerresult1_17-05-2017.pdf
2 PatSeerstrategy_17-05-2017.pdf

ERegister / Renewals

3rd: 02 Aug 2018

From 11/08/2011 - To 11/08/2012

4th: 02 Aug 2018

From 11/08/2012 - To 11/08/2013

5th: 02 Aug 2018

From 11/08/2013 - To 11/08/2014

6th: 02 Aug 2018

From 11/08/2014 - To 11/08/2015

7th: 02 Aug 2018

From 11/08/2015 - To 11/08/2016

8th: 02 Aug 2018

From 11/08/2016 - To 11/08/2017

9th: 02 Aug 2018

From 11/08/2017 - To 11/08/2018

10th: 02 Aug 2018

From 11/08/2018 - To 11/08/2019

11th: 01 Aug 2019

From 11/08/2019 - To 11/08/2020

12th: 30 Jul 2020

From 11/08/2020 - To 11/08/2021

13th: 30 Jul 2021

From 11/08/2021 - To 11/08/2022

14th: 01 Aug 2022

From 11/08/2022 - To 11/08/2023

15th: 28 Jul 2023

From 11/08/2023 - To 11/08/2024

16th: 26 Jul 2024

From 11/08/2024 - To 11/08/2025

17th: 11 Aug 2025

From 11/08/2025 - To 11/08/2026