APPARATUS AND METHOD FOR DERIVING A DIRECTIONAL INFORMATION AND
COMPUTER PROGRAM PRODUCT
Description
1. Technical Field
Embodiments of the present invention relate to an apparatus for deriving a directional
information from a plurality of microphone signals or from a plurality of components of a
microphone signal. Further embodiments relate to systems comprising such an apparatus.
Further embodiments relate to a method for deriving a directional information from a
plurality of microphone signals.
2. Background of the Invention
Spatial sound recording aims at capturing a sound field with multiple microphones such
that at the reproduction side, a listener perceives the sound image as it was present at the
recording location. Standard approaches for spatial sound recording use conventional
stereo microphones or more sophisticated combinations of directional microphones, e.g.,
such as the B-format microphones used in Ambisonics (M.A. Gerzon, Periphony: With-height
sound reproduction, J. Audio Eng. Soc., 21(1):2-10, 1973). Commonly, most of
these methods are referred to as coincident-microphone techniques.
Alternatively, methods based on a parametric representation of sound fields can be applied,
which are referred to as parametric spatial audio coders. These methods determine one or
more downmix audio signals together with corresponding spatial side information, which
are relevant for the perception of spatial sound. Examples are Directional Audio Coding
(DirAC), as discussed in V. Pulkki, Spatial sound reproduction with directional audio
coding, J. Audio Eng. Soc., 55(6):503-516, June 2007, or the so-called spatial audio
microphones (SAM) approach proposed in C. Faller, Microphone front-ends for spatial
audio coders, in 125th AES Convention, Paper 7508, San Francisco, Oct. 2008. The spatial
cue information is determined in frequency subbands and basically consists of the
direction-of-arrival (DOA) of sound and, sometimes, of the diffuseness of the sound field
or other statistical measures. In a synthesis stage, the desired loudspeaker signals for
reproduction are determined based on the downmix signals and the parametric side
information.
In addition to spatial audio recording, parametric approaches to sound field representations
have been used in applications such as directional filtering (M. Kallinger, H. Ochsenfeld,
G. Del Galdo, F. Kuech, D. Mahne, R. Schultz-Amling, and O. Thiergart, A spatial
filtering approach for directional audio coding, in 126th AES Convention, Paper 7653,
Munich, Germany, May 2009) or source localization (O. Thiergart, R. Schultz-Amling, G.
Del Galdo, D. Mahne, and F. Kuech, Localization of sound sources in reverberant
environments based on directional audio coding parameters, in 128th AES Convention,
Paper 7853, New York City, NY, USA, Oct. 2009). These techniques are also based on
directional parameters such as DOA of sound or the diffuseness of the sound field.
One way to estimate directional information from the sound field, namely the direction of
arrival of sound, is to measure the field at different points with an array of microphones.
Several approaches using relative time delay estimates between the microphone signals have
been proposed in the literature (J. Chen, J. Benesty, and Y. Huang, Time delay estimation
in room acoustic environments: An overview, in EURASIP Journal on Applied Signal
Processing, Article ID 26503, 2006). However, these approaches make use of the phase
information of the microphone signals, leading inevitably to spatial aliasing. In fact, as
higher frequencies are being analyzed, the wavelength becomes shorter. At a certain
frequency, termed aliasing frequency, the wavelength is such that the identical phase
readings correspond to two or more directions, so that an unambiguous estimation is not
possible (at least without additional a priori information).
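The ambiguity described above can be illustrated numerically. The following is a minimal sketch, not part of the original disclosure: the speed of sound, the spacing of 4 cm, the analysis frequency of 8 kHz and the function name `wrapped_phase_difference` are all assumptions chosen for illustration. It shows two clearly different arrival angles that produce the same wrapped phase difference between two microphones.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def wrapped_phase_difference(freq_hz, spacing_m, angle_rad):
    """Phase difference between two microphones for a plane wave, wrapped to (-pi, pi]."""
    phase = 2.0 * np.pi * freq_hz * spacing_m * np.sin(angle_rad) / C
    return np.angle(np.exp(1j * phase))

spacing = 0.04   # hypothetical microphone spacing of 4 cm
freq = 8000.0    # wavelength is about 4.3 cm, i.e. shorter than twice the spacing
wavelength = C / freq

# Two arrival angles whose path differences differ by exactly one wavelength.
angle_a = np.arcsin(0.8)
angle_b = np.arcsin(0.8 - wavelength / spacing)

phase_a = wrapped_phase_difference(freq, spacing, angle_a)
phase_b = wrapped_phase_difference(freq, spacing, angle_b)

print(np.rad2deg(angle_a), np.rad2deg(angle_b))  # two different directions
print(phase_a, phase_b)                          # identical phase readings
```

Because both angles yield the same phase reading, a purely phase-based estimator cannot distinguish them at this frequency without additional a priori information.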
There exists a large variety of methods to estimate the DOA of sound using arrays of
microphones. An overview of common approaches is given in J. Chen, J. Benesty,
and Y. Huang, Time delay estimation in room acoustic environments: An overview, in
EURASIP Journal on Applied Signal Processing, Article ID 26503, 2006. These
approaches have in common that they exploit the phase relation of the microphone signals
to estimate the DOA of sound. Often, the time difference between different sensors is
determined first, and then the knowledge of the array geometry is exploited to compute the
corresponding DOA. Other approaches evaluate the correlation between the different
microphone signals in frequency subbands to estimate the DOA of sound (C. Faller,
Microphone front-ends for spatial audio coders, in 125th AES Convention, Paper 7508,
San Francisco, Oct. 2008 and J. Chen, J. Benesty, and Y. Huang, Time delay estimation in
room acoustic environments: An overview, in EURASIP Journal on Applied Signal
Processing, Article ID 26503, 2006).
In DirAC the DOA estimate for each frequency band is determined based on the active
sound intensity vector measured in the observed sound field. In the following the
estimation of the directional parameters in DirAC is briefly summarized. Let P(k, n) denote
the sound pressure and U(k, n) the particle velocity vector at frequency index k and time
index n. Then, the active sound intensity vector is obtained as
I_a(k, n) = \mathrm{Re}\{ P(k, n) \, U^*(k, n) \}    (1)
The superscript * denotes the conjugate complex and Re{ } is the real part of a complex
number. ρ0 represents the mean density of air. Finally, the opposite direction of I_a(k, n)
points to the DOA of sound:
e_{DOA}(k, n) = - \frac{I_a(k, n)}{\| I_a(k, n) \|}    (2)
Additionally, the diffuseness of the sound field can be determined, e.g., according to
(3)
In practice, the particle velocity vector is computed from the pressure gradient of closely
spaced omnidirectional microphone capsules, often referred to as differential microphone
array. Considering Fig. 2, the x component of the particle velocity vector can, e.g., be
computed using a pair of microphones according to
U_x(k, n) = K(k) \, [ P_1(k, n) - P_2(k, n) ] ,    (4)
where K(k) represents a frequency dependent normalization factor. Its value depends on
the microphone configuration, e.g. the distance of the microphones and/or their directivity
patterns. The remaining components U_y(k, n) (and U_z(k, n)) of U(k, n) can be determined
analogously by combining suitable pairs of microphones.
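The intensity-based estimation summarized above can be sketched for a simulated plane wave. This is an illustrative sketch under stated assumptions, not the reference DirAC implementation: the capsule layout (capsules 1 and 2 on the x axis, 3 and 4 on the y axis), the value used for the normalization factor K(k) (taken from Euler's equation with a finite difference), the use of the capsule average as the center pressure, the constants and the function name `intensity_doa_azimuth` are all hypothetical choices.

```python
import numpy as np

C = 343.0    # assumed speed of sound in m/s
RHO0 = 1.2   # assumed mean density of air in kg/m^3

def intensity_doa_azimuth(freq_hz, spacing_m, true_azimuth_rad):
    """Estimate the azimuth of a plane wave from four omnidirectional capsules."""
    omega = 2.0 * np.pi * freq_hz
    wavenumber = omega / C
    e_doa = np.array([np.cos(true_azimuth_rad), np.sin(true_azimuth_rad)])

    # Assumed layout: capsules 1 and 2 on the x axis, 3 and 4 on the y axis.
    half = spacing_m / 2.0
    positions = np.array([[half, 0.0], [-half, 0.0], [0.0, half], [0.0, -half]])

    # Plane-wave pressure at each capsule (time convention exp(j*omega*t)).
    p = np.exp(1j * wavenumber * (positions @ e_doa))

    # Assumed normalization factor K(k): Euler's equation with a finite difference.
    k_norm = -1.0 / (1j * omega * RHO0 * spacing_m)
    u = k_norm * np.array([p[0] - p[1], p[2] - p[3]])  # eq. (4) and its y analogue

    p_center = p.mean()                                # pressure at the array center
    intensity = np.real(p_center * np.conj(u))         # active intensity, cf. eq. (1)
    e_est = -intensity / np.linalg.norm(intensity)     # opposite direction points to the DOA
    return np.arctan2(e_est[1], e_est[0])

low_est = intensity_doa_azimuth(500.0, 0.02, np.deg2rad(30.0))
high_est = intensity_doa_azimuth(20000.0, 0.02, np.deg2rad(30.0))
print(np.rad2deg(low_est), np.rad2deg(high_est))
```

With these assumed values the estimate at 500 Hz stays close to the true azimuth of 30 degrees, while the estimate at 20 kHz with the same spacing is far off, which is the spatial aliasing effect discussed in the following paragraphs.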
As shown in M. Kallinger, F. Kuech, R. Schultz-Amling, G. Del Galdo, J . Ahonen, and V.
Pulkki, Analysis and Adjustment of Planar Microphone Arrays for Application in
Directional Audio Coding, in 124th AES Convention, Paper 7374, Amsterdam, the
Netherlands, May 2008, spatial aliasing affects the phase information of the particle
velocity vector, prohibiting the use of pressure gradients for the active sound intensity
estimation at high frequencies. This spatial aliasing yields ambiguities in the DOA
estimates. As can be shown, the maximum frequency f_max at which unambiguous DOA
estimates can be obtained based on active sound intensity is determined by the distance of
the microphone pairs. Additionally, the estimation of directional parameters such as the
diffuseness of a sound field is also affected. In case of omnidirectional microphones with
a distance d, this maximum frequency is given by
(5)
where c denotes the speed of sound propagation.
Typically, the frequency range required by applications exploiting the directional
information of sound fields extends beyond the spatial aliasing limit f_max to be expected for
practical microphone configurations. Notice that reducing the microphone spacing d, which
increases the spatial aliasing limit f_max, is not a feasible solution for most applications, as
too small a spacing d significantly reduces the estimation reliability at low frequencies in practice.
Thus, new methods are needed to overcome the limitations of current directional parameter
estimation techniques at high frequencies.
3. Summary of the Invention
It is an objective of embodiments of the present invention to create a concept, which allows
for a better determination of a directional information above a spatial aliasing limit
frequency.
This objective is solved by an apparatus according to claim 1, systems according to claims
15 and 16, a method according to claim 18 and a computer program according to claim 19.
Embodiments provide an apparatus for deriving a directional information from a plurality
of microphone signals or from a plurality of components of a microphone signal, wherein
different effective microphone look directions are associated with the microphone signals
or components. The apparatus comprises a combiner configured to obtain a magnitude
value from a microphone signal or a component of the microphone signal. Furthermore, the
combiner is configured to combine (e.g. linearly combine) direction information items
describing the effective microphone look directions, such that a direction information item
describing a given effective microphone look direction is weighted in dependence on the
magnitude value of the microphone signal, or of the component of the microphone signal,
associated with the given effective microphone look direction, to derive the directional
information.
It has been found that the problem of spatial aliasing in directional parameter estimation
results from ambiguities in the phase information within the microphone signals. It is an
idea of embodiments of the present invention to overcome this problem by deriving a
directional information based on magnitude values of the microphone signals. It has been
found that by deriving the directional information based on magnitude values of the
microphone signals or of components of the microphone signals, ambiguities of the kind that
may occur in traditional systems using the phase information to determine the directional
information do not occur. Hence, embodiments enable a determination of a directional
information even above a spatial aliasing limit, above which a determination of the
directional information using phase information is not possible (or is possible only with errors).
In other words, the use of the magnitude values of the microphone signals or of the
components of the microphone signals is especially beneficial within frequency regions
where spatial aliasing or other phase distortions are expected, since these phase distortions
do not have an influence on the magnitude values and, therefore, do not lead to ambiguities
in the directional information determination.
According to some embodiments, an effective microphone look direction associated to a
microphone signal describes the direction where the microphone from which the
microphone signal is derived has its maximum response (or its highest sensitivity). As an
example, the microphone may be a directional microphone possessing a non-isotropic pick-up
pattern and the effective microphone look direction can be defined as the direction
where the pick-up pattern of the microphone has its maximum. Hence, for a directional
microphone the effective microphone look direction may be equal to the microphone look
direction (describing the direction towards which the directional microphone has a
maximum sensitivity), e.g. when no objects modifying the pick-up pattern of the
directional microphone are placed near the microphone. The effective microphone look
direction may be different to the microphone look direction of the directional microphone
if the directional microphone is placed near an object that has the effect of modifying its
pick-up pattern. In this case the effective microphone look direction may describe the
direction, where the directional microphone has its maximum response.
In the case of an omnidirectional microphone, an effective response pattern of the
omnidirectional microphone may be shaped, for example, using a shadowing object (which
has the effect of modifying the pick-up pattern of the microphone), such that
the shaped effective response pattern has an effective microphone look direction which is
the direction of maximum response of the omnidirectional microphone with the shaped
effective response pattern.
According to further embodiments, the directional information may be a directional
information of a sound field pointing towards the direction from which the sound field is
propagating (for example, at certain frequency and time indices). The plurality of
microphone signals may describe the sound field. According to some embodiments, a
direction information item describing a given effective microphone look direction may be a
vector pointing into the given effective microphone look direction. According to further
embodiments, the direction information items may be unit vectors, such that direction
information items associated with different effective microphone look directions have
equal norms (but different directions). Therefore, a norm of a weighted vector linearly
combined by the combiner is determined by the magnitude value of the microphone signal
or the component of the microphone signal associated to the direction information item of
the weighted vector.
According to further embodiments, the combiner may be configured to obtain a magnitude
value, such that the magnitude value describes a magnitude of a spectral coefficient (as a
component of the microphone signal) representing a spectral sub-region of the microphone
signal or of the component of the microphone signal. In other words, embodiments may
extract the actual information of a sound field (for example analyzed in a time frequency
domain) from the magnitudes of the spectra of the microphones used for deriving the
microphone signals.
According to further embodiments, only the magnitude values (or the magnitude
information) of the microphone signals (or of the microphone spectra) are used in the
estimation process for deriving the directional information, as the phase term is corrupted
by the spatial aliasing effect.
In other words, embodiments create an apparatus and a method for directional parameter
estimation using only the magnitude information of microphone signals or components of
the microphone signals and the spectrum, respectively.
According to further embodiments, the output of the magnitude based directional
parameter estimation (the directional information) can be combined with other techniques
which also consider phase information.
According to further embodiments, the magnitude value may describe a magnitude of the
microphone signal or of the component.
4. Short Description of the Figures
Embodiments of the present invention will be described in detail using the accompanying
figures, in which:
Fig. 1 shows a block schematic diagram of an apparatus according to an
embodiment of the present invention;
Fig. 2 shows an illustration of a microphone configuration using four
omnidirectional capsules, providing sound pressure signals P_i(k, n) with i =
1, . . . , 4;
Fig. 3 shows an illustration of a microphone configuration using four directional
microphones with cardioid pick-up patterns;
Fig. 4 shows an illustration of a microphone configuration employing a rigid
cylinder to cause scattering and shadowing effects;
Fig. 5 shows an illustration of a microphone configuration similar to Fig. 4, but
employing a different microphone placement;
Fig. 6 shows an illustration of a microphone configuration employing a rigid
hemisphere to cause scattering and shadowing effects;
Fig. 7 shows an illustration of a 3D microphone configuration employing a rigid
sphere to cause shadowing effects;
Fig. 8 shows a flow diagram of a method according to an embodiment;
Fig. 9 shows a block schematic diagram of a system according to an embodiment;
Fig. 10 shows a block schematic diagram of a system according to a further
embodiment of the present invention;
Fig. 11 shows an illustration of an array of four omnidirectional microphones with
spacing of d between the opposing microphones;
Fig. 12 shows an illustration of an array of four omnidirectional microphones,
which are mounted on the end of a cylinder;
Fig. 13 shows a diagram of a directivity index DI in decibels as a function of ka,
which represents a diaphragm circumference of an omnidirectional
microphone divided by the wavelength;
Fig. 14 shows logarithmic directional patterns with G.R.A.S. microphone;
Fig. 15 shows logarithmic directional patterns with AKG microphone; and
Fig. 16 shows diagram results for direction analysis expressed as root-mean-square
error (RMSE).
Before embodiments of the present invention will be described in more detail using the
accompanying figures, it is to be pointed out that the same or functionally equal elements
are provided with the same reference numbers and that a repeated description of elements
provided with the same reference numbers is omitted. Hence, descriptions provided for
elements with the same reference numbers are mutually exchangeable.
5. Detailed Description of Embodiments of the Present Invention
5.1 Apparatus According to Fig. 1
Fig. 1 shows an apparatus 100 according to an embodiment of the present invention. The
apparatus 100 for deriving a directional information 101 (also denoted as d(k, n)) from a
plurality of microphone signals 103_1 to 103_N (also denoted as P_1 to P_N) or from a plurality
of components of a microphone signal comprises a combiner 105. The combiner 105 is
configured to obtain a magnitude value from a microphone signal or a component of the
microphone signal, and to linearly combine direction information items describing
effective microphone look directions being associated with the microphone signals 103_1 to
103_N or the components, such that a direction information item describing a given effective
microphone look direction is weighted in dependence on the magnitude value of the
microphone signal, or of the component of the microphone signal, associated with the
given effective microphone look direction to derive the directional information 101.
A component of an i-th microphone signal P_i may be denoted as P_i(k, n). The component
P_i(k, n) of the microphone signal P_i may be a value of the microphone signal P_i at
frequency index k and time index n. The microphone signal P_i may be derived from an i-th
microphone and may be available to the combiner 105 in the time frequency representation
comprising a plurality of components P_i(k, n) for different frequency indices k and time
indices n. As an example, the microphone signals P_1 to P_N may be sound pressure signals,
as they can be derived from B-format microphones.
Therefore, each component P_i(k, n) may correspond to a time frequency tile (k, n). The
combiner 105 may be configured to obtain the magnitude value such that the magnitude
value describes a magnitude of a spectral coefficient representing a spectral sub-region of
the microphone signal P_i. This spectral coefficient may be a component P_i(k, n) of the
microphone signal P_i. The spectral sub-region may be defined by the frequency index k of
the component P_i(k, n). Furthermore, the combiner 105 may be configured to derive the
directional information 101 on the basis of a time frequency representation of the
microphone signals, for example, in which a microphone signal P_i is represented by a
plurality of components P_i(k, n), each component being associated to a time frequency tile
(k, n).
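One possible way to obtain such components and their magnitude values is a short-time Fourier transform. The following is a minimal sketch; the transform itself, the Hann window, the frame length, the hop size and the function name `stft_magnitudes` are assumptions for illustration and are not prescribed by the description.

```python
import numpy as np

def stft_magnitudes(signal, frame_len=256, hop=128):
    """Return |P(k, n)| with shape (num_bins, num_frames) for a real-valued signal."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[n * hop : n * hop + frame_len] * window
        for n in range(num_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)  # one row per time index n
    return np.abs(spectrum).T               # rows: frequency index k, columns: time index n

fs = 16000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs
tone = np.sin(2.0 * np.pi * 1000.0 * t)     # 1 kHz test tone
magnitudes = stft_magnitudes(tone)
print(magnitudes.shape)
```

Each entry of `magnitudes` corresponds to one time frequency tile (k, n) of one microphone signal; the phase of the spectral coefficient is discarded.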
As described in the introductory part of this application, by obtaining the directional
information d(k, n) based on the magnitude values of the microphone signals P_1 to P_N or of
components of a microphone signal, a determination of the directional information d(k, n)
can be achieved even at higher frequencies of the microphone signals P_1 to P_N, e.g. for
components P_1(k, n) to P_N(k, n) having a frequency index above the frequency index of the
spatial aliasing frequency f_max, since spatial aliasing or other phase distortions do not
affect the magnitude values.
In the following a detailed example of an embodiment of the present invention is given,
which is based on a combination of the magnitudes of the microphone signals (directional
magnitude combination), and how it can be performed by the apparatus 100 according to
Fig. 1. The directional information d(k, n), also denoted as DOA estimate, is obtained by
interpreting the magnitude of each microphone signal (or of each component of a
microphone signal) as a corresponding vector in a two-dimensional (2D) or three-dimensional
(3D) space.
Let d_t(k, n) be the true or desired vector which points towards the direction from which the
sound field is propagating at frequency and time indices k and n, respectively. In other
words, the DOA of sound corresponds to the direction of d_t(k, n). Estimating d_t(k, n) so
that the directional information from the sound field can be extracted is the goal of
embodiments of the invention. Let further b_1, b_2, . . . , b_N be vectors (e.g. unit norm
vectors) pointing into the look directions of the N directional microphones. The look
direction of a directional microphone is defined as the direction where the pick-up pattern
has its maximum. Analogously, in case scattering/shadowing objects are included in the
microphone configuration, the vectors b_1, b_2, . . . , b_N point in the direction of maximum
response of the corresponding microphone.
The vectors b_1, b_2, . . . , b_N may be designated as direction information items describing
effective microphone look directions of the first to the N-th microphone. In this example,
the direction information items are vectors pointing into corresponding effective
microphone look directions. According to further embodiments, a direction information
item may also be a scalar, for example an angle describing a look direction of a
corresponding microphone.
Furthermore, in this example the direction information items may be unit norm vectors,
such that vectors associated with different effective microphone look directions have equal
norms.
It should also be noted that the proposed method may work best if the sum of the vectors
b_i, corresponding to the effective microphone look directions of the microphones, equals
zero (e.g. within a tolerance range), i.e.,
\sum_{i=1}^{N} b_i = 0.    (6)
In some embodiments the tolerance range may be ±30%, ±20%, ±10% or ±5% of one of the
direction information items used to derive the sum (e.g. of the direction information item
having the largest norm, of the direction information item having the smallest norm, or of
the direction information item having the norm closest to the average of all norms of the
direction information items used to derive the sum).
In some embodiments effective microphone look directions may not be equally distributed
with regard to a coordinate system. For example, assuming a system in which a first
effective microphone look direction of a first microphone is EAST (e.g. 0 degrees in a 2-
dimensional coordinate system), a second effective microphone look direction of a second
microphone is NORTH-EAST (e.g. 45 degrees in the 2-dimensional coordinate system), a
third effective microphone look direction of a third microphone is NORTH (e.g. 90 degrees in the
2-dimensional coordinate system), and a fourth effective microphone look direction of a
fourth microphone is SOUTH-WEST (e.g. -135 degrees in the 2-dimensional coordinate
system), having the direction information items being unit norm vectors would result in:
b_1 = [1 \; 0]^T for the first effective microphone look direction;
b_2 = [1/\sqrt{2} \; 1/\sqrt{2}]^T for the second effective microphone look direction;
b_3 = [0 \; 1]^T for the third effective microphone look direction; and
b_4 = [-1/\sqrt{2} \; -1/\sqrt{2}]^T for the fourth effective microphone look direction.
This would lead to a non-zero sum of the vectors of:
b_sum = b_1 + b_2 + b_3 + b_4 = [1 \; 1]^T.
As, in some embodiments, it is desired to have a sum of the vectors being zero, a direction
information item being a vector pointing into an effective microphone look direction may
be scaled. In this example, the direction information item b_4 may be scaled, such as:
b_4 = [-(1 + 1/\sqrt{2}) \; -(1 + 1/\sqrt{2})]^T
resulting in a sum b_sum of the vectors being equal to zero:
b_sum = b_1 + b_2 + b_3 + b_4 = [0 \; 0]^T.
In other words, according to some embodiments, different direction information items
being vectors pointing into different effective microphone look directions may have
different norms, which may be chosen such that a sum of the direction information items
equals zero.
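The EAST, NORTH-EAST, NORTH and SOUTH-WEST example can be checked numerically. The following short sketch (variable names are illustrative only) computes the sum of the four unit norm vectors and then rescales the fourth vector along its own direction so that the sum vanishes.

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)
b = np.array([
    [1.0, 0.0],   # first look direction: EAST
    [s, s],       # second look direction: NORTH-EAST
    [0.0, 1.0],   # third look direction: NORTH
    [-s, -s],     # fourth look direction: SOUTH-WEST
])
sum_unit = b.sum(axis=0)   # sum of the unit norm vectors, not zero

# Rescale the fourth vector so that it cancels the sum of the first three.
# In this example the required vector still points SOUTH-WEST.
b_scaled = b.copy()
b_scaled[3] = -b[:3].sum(axis=0)
sum_scaled = b_scaled.sum(axis=0)

print(sum_unit, b_scaled[3], sum_scaled)
```

Note that this simple rescaling works here only because the residual of the first three vectors happens to lie along the fourth look direction; for other configurations several vectors may need to be scaled.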
The estimate d(k, n) of the true vector d_t(k, n), and therefore the directional information to be
determined, can be defined as
d(k, n) = \sum_{i=1}^{N} | P_i(k, n) |^{\kappa} \, b_i ,    (7)
where P_i(k, n) denotes the signal of the i-th microphone (or the component of the
microphone signal P_i of the i-th microphone) associated to the time frequency tile (k, n).
Equation (7) forms a linear combination of the direction information items b_1 to b_N of
a first microphone to an N-th microphone, weighted by magnitude values of components
P_1(k, n) to P_N(k, n) of microphone signals P_1 to P_N derived from the first to the N-th
microphone. Therefore, the combiner 105 may calculate equation (7) to derive the
directional information 101 (d(k, n)).
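The linear combination of eq. (7) can be sketched for all time frequency tiles at once. This is a minimal sketch, assuming the microphone spectra are available as a complex array of shape (N, K, T); the function name `directional_magnitude_combination`, the array layout and the parameter name `kappa` (the freely chosen factor applied to the magnitudes) are illustrative assumptions.

```python
import numpy as np

def directional_magnitude_combination(spectra, look_vectors, kappa=1.0):
    """Weight each look vector by the magnitude of its microphone spectrum.

    spectra:      complex array, shape (N, K, T), one spectrum per microphone
    look_vectors: real array, shape (N, D), direction information items b_1..b_N
    returns:      real array, shape (K, T, D), one vector d(k, n) per tile
    """
    weights = np.abs(spectra) ** kappa   # phase information is discarded here
    return np.einsum("ikn,id->knd", weights, look_vectors)

look_vectors = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
magnitudes = np.array([2.0, 1.0, 3.0, 1.0]).reshape(4, 1, 1)

# Arbitrary phases: they do not change the result.
rng = np.random.default_rng(0)
phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(4, 1, 1)))

d_lin = directional_magnitude_combination(magnitudes * phases, look_vectors)
d_sq = directional_magnitude_combination(magnitudes * phases, look_vectors, kappa=2.0)
print(d_lin[0, 0], d_sq[0, 0])
```

The same look vectors are reused for every tile; only the weights change from tile to tile, and the phase of each spectral coefficient has no influence on the result.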
As can be seen from eq. (7), the combiner 105 may be configured to linearly combine the
direction information items b_1 to b_N weighted in dependence on the magnitude values
being associated to a given time frequency tile (k, n) in order to derive the directional
information d(k, n) for the given time frequency tile (k, n).
According to further embodiments, the combiner 105 may be configured to linearly
combine the direction information items b_1 to b_N weighted only in dependence on the
magnitude values being associated to the given time frequency tile (k, n).
Furthermore, from equation (7) it can be seen that the combiner 105 may be configured to
linearly combine for a plurality of different time frequency tiles the same direction
information items b_1 to b_N (as these are independent from the time frequency tiles)
describing different effective microphone look directions, but the direction information
items may be weighted differently in dependence on the magnitude values associated to the
different time frequency tiles.
As the direction information items b_1 to b_N may be unit vectors, a norm of a weighted
vector being formed by a multiplication of a direction information item b_i and a magnitude
value may be defined by the magnitude value. Weighted vectors for the same effective
microphone look direction but different time frequency tiles may have the same direction
but differ in their norms due to the different magnitude values for different time frequency
tiles.
According to some embodiments, the weighted values may be scalar values.
The factor κ (the exponent of the magnitude values) shown in eq. (7) may be chosen freely. In
the case that κ = 2 and that opposing microphones (from which the microphone signals P_1 to
P_N are derived) are equidistant, the directional information d(k, n) is proportional to the
energy gradient in the center of the array (for example in a set of two microphones).
In other words, the combiner 105 may be configured to obtain squared magnitude values
based on the magnitude values, a squared magnitude value describing a power of a
component P_i(k, n) of a microphone signal P_i. Furthermore, the combiner 105 may be
configured to linearly combine the direction information items b_1 to b_N such that a
direction information item b_i is weighted in dependence on the squared magnitude value of
the component P_i(k, n) of the microphone signal P_i associated with the corresponding look
direction (of the i-th microphone).
From d(k, n), the directional information expressed with azimuth φ and elevation ϑ angles
is easily obtained considering that
(8)
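A minimal sketch of such a conversion is given below. The angle convention is an assumption made for illustration (azimuth measured in the x-y plane from the x axis, elevation measured from the x-y plane towards the z axis) and may differ from the convention of eq. (8); the function name `azimuth_elevation` is likewise hypothetical.

```python
import numpy as np

def azimuth_elevation(d):
    """Azimuth and elevation in radians of a 3D vector d = [dx, dy, dz].

    Assumed convention: azimuth in the x-y plane measured from the x axis,
    elevation measured from the x-y plane towards the z axis.
    """
    dx, dy, dz = d
    azimuth = np.arctan2(dy, dx)
    elevation = np.arctan2(dz, np.hypot(dx, dy))
    return azimuth, elevation

az, el = azimuth_elevation(np.array([1.0, 1.0, np.sqrt(2.0)]))
print(np.rad2deg(az), np.rad2deg(el))
```

Only the direction of d(k, n) enters the angles, so the result does not depend on the overall level of the microphone signals.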
In some applications, when only 2D analysis is required, four directional microphones,
e.g., arranged as in Fig. 3, can be employed. In this case, the direction information items
may be chosen as:
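b_1 = [1 \; 0 \; 0]^T    (9)
b_2 = [-1 \; 0 \; 0]^T    (10)
b_3 = [0 \; 1 \; 0]^T    (11)
b_4 = [0 \; -1 \; 0]^T    (12)
so that (7) becomes
d_x(k, n) = | P_1(k, n) |^{\kappa} - | P_2(k, n) |^{\kappa}    (13)
d_y(k, n) = | P_3(k, n) |^{\kappa} - | P_4(k, n) |^{\kappa}    (14)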
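The 2D case of eqs. (13) and (14) can be simulated for four directional microphones. The sketch below assumes ideal first-order cardioid pick-up patterns and a single unit-amplitude plane wave; both are idealizations introduced here for illustration, as are the names `cardioid_magnitudes` and `estimate_azimuth`.

```python
import numpy as np

# Look directions of microphones 1..4: +x, -x, +y, -y.
LOOK_ANGLES = np.deg2rad([0.0, 180.0, 90.0, 270.0])

def cardioid_magnitudes(source_azimuth_rad):
    """Magnitudes picked up by four ideal cardioids for a unit-amplitude plane wave."""
    return 0.5 * (1.0 + np.cos(source_azimuth_rad - LOOK_ANGLES))

def estimate_azimuth(magnitudes, kappa=1.0):
    """Azimuth of d(k, n) computed from the magnitude differences of opposing pairs."""
    w = magnitudes ** kappa
    d_x = w[0] - w[1]   # eq. (13)
    d_y = w[2] - w[3]   # eq. (14)
    return np.arctan2(d_y, d_x)

true_azimuth = np.deg2rad(130.0)
mags = cardioid_magnitudes(true_azimuth)
est_lin = estimate_azimuth(mags, kappa=1.0)
est_sq = estimate_azimuth(mags, kappa=2.0)
print(np.rad2deg(est_lin), np.rad2deg(est_sq))
```

For ideal cardioids, both the magnitude differences and the squared-magnitude differences of opposing pairs reduce to the cosine and sine of the arrival angle, so the estimate matches the true azimuth in this idealized setting. Real pick-up patterns and reverberant fields will generally introduce deviations.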
This approach can analogously be applied in case of rigid objects placed in the microphone
configuration. As an example, Figs. 4 and 5 illustrate the case of a cylindrical object placed
in the middle of an array of four microphones. Another example is shown in Fig. 6, where
the scattering object has the shape of a hemisphere.
An example of a 3D configuration is shown in Fig. 7, where six microphones are
distributed over a rigid sphere. In this case, the z component of the vector d(k, n) can be
obtained analogously to (9) - (14):
b_5 = [0 \; 0 \; 1]^T    (15)
b_6 = [0 \; 0 \; -1]^T    (16)
yielding
d_z(k, n) = | P_5(k, n) |^{\kappa} - | P_6(k, n) |^{\kappa} .    (17)
A well known 3D configuration of directional microphones which is suitable for
application in embodiments of this invention is the so-called A-format microphone, as
described in P.G. Craven and M.A. Gerzon, US4042779 (A), 1977.
To follow the proposed directional magnitude combination approach, certain assumptions
need to be fulfilled. If directional microphones are employed, then for each microphone the
pick-up patterns should be approximately symmetric with respect to the orientation or look
direction of the microphones. If the scattering/shadowing approach is used, then
scattering/shadowing effects should be approximately symmetric with respect to the
direction of maximum response. These assumptions are easily met when the array is
constructed as in the examples shown in Figs. 3 to 7.
Application in DirAC
The above discussion considers the estimation of the directional information (the DOA)
only. In the context of directional coding, information about the diffuseness of a sound field
may additionally be required. A straightforward approach is obtained by simply equating
the estimated vector d(k, n), i.e. the determined directional information, with the opposite
direction of the active sound intensity vector I_a(k, n):
I_a(k, n) = -d(k, n).    (18)
This is possible as d(k, n) contains information related to the energetic gradient. Then, the
diffuseness can be computed according to (3).
5.2. Method According to Figure 8
Further embodiments of the present invention create a method for deriving a directional
information from a plurality of microphone signals or from a plurality of components of a
microphone signal, wherein different effective microphone look directions are associated
with the microphone signals.
Such a method 800 is shown in a flow diagram in Fig. 8. The method 800 comprises a step
801 of obtaining a magnitude value from a microphone signal or a component of the microphone
signal.
Furthermore, the method 800 comprises a step 803 of combining (e.g. linearly combining)
direction information items describing the effective microphone look directions, such that a
direction information item describing a given effective microphone look direction is
weighted in dependence on the magnitude value of the microphone signal or of the
component of the microphone signal associated with the corresponding effective
microphone look direction, to derive the directional information.
The method 800 may be performed by the apparatus 100 (for example by the combiner 105
of the apparatus 100).
In the following, two systems according to embodiments for acquiring the microphone
signals and deriving a directional information from these microphone signals are
described using Figs. 9 and 10.
5.3 Systems According to Fig. 9 and Fig. 10
As commonly known, the use of the pressure magnitude to extract directional information
is not practical when using omnidirectional microphones. In fact, the magnitude
differences due to the different distances traveled by the sound to reach the microphones are
normally too small to be measured, so that most known algorithms mainly rely on the
phase information. Embodiments overcome the problem of spatial aliasing in directional
parameter estimation. The systems described in the following make use of microphone
arrays adequately designed so that there exists a measurable magnitude difference in the
microphone signals which is dependent on the direction of arrival. Only this magnitude
information of the microphone spectra is then used in the estimation process, as the phase
term is corrupted by the spatial aliasing effect.
Embodiments comprise extracting directional information (such as DOA or diffuseness) of
a sound field analyzed in a time-frequency domain from only the magnitudes of the spectra
of two or more microphones, or of one microphone successively placed in two or more
positions, e.g., by making one microphone rotate about an axis. This is possible when the
magnitudes vary sufficiently strongly and in a predictable way depending on the direction
of arrival. This can be achieved in two ways, namely by
1. employing directional microphones (i.e., possessing a non-isotropic pick-up pattern,
such as cardioid microphones), where each microphone points to a different
direction, or by
2. realizing for each microphone or microphone position a unique scattering and/or
shadowing effect. This can be achieved for instance by employing a physical object
in the center of the microphone configuration. Suitable objects modify the
magnitudes of the microphone signals in a known way by means of scattering
and/or shadowing effects.
An example of a system using the first method is shown in Fig. 9.
5.3.1 System Using Directional Microphones According to Fig. 9
Fig. 9 shows a block schematic diagram of a system 900. The system 900 comprises an
apparatus, for example the apparatus 100 according to Fig. 1. Furthermore, the system 900
comprises a first directional microphone 901_1 having a first effective microphone look
direction 903_1 for deriving a first microphone signal 103_1 of the plurality of microphone
signals of the apparatus 100. The first microphone signal 103_1 is associated with the first
look direction 903_1. Furthermore, the system 900 comprises a second directional
microphone 901_2 having a second effective microphone look direction 903_2 for deriving a
second microphone signal 103_2 of the plurality of microphone signals of the apparatus 100.
The second microphone signal 103_2 is associated with the second look direction 903_2.
Furthermore, the first look direction 903_1 is different from the second look direction 903_2.
For example, the look directions 903_1, 903_2 may be opposing. A further extension to this
concept is shown in Fig. 3, where four cardioid microphones (directional microphones) are
pointed towards opposing directions of a Cartesian coordinate system. The microphone
positions are marked by black circles.
By applying directional microphones it can be achieved that magnitude differences
between the directional microphones 901_1, 901_2 are large enough to determine the
directional information 101.
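How large such magnitude differences can become may be illustrated with the following Python sketch. It assumes ideal first-order cardioid patterns and four look directions along the axes of a Cartesian coordinate system; the pattern model, the function name cardioid_gains and the chosen DOA of 60 degrees are assumptions for illustration, not measured data.

```python
import numpy as np

# Assumed look directions of four cardioid microphones: +x, +y, -x, -y.
look_angles = np.radians([0.0, 90.0, 180.0, 270.0])
b = np.stack([np.cos(look_angles), np.sin(look_angles)], axis=1)

def cardioid_gains(doa_rad):
    """Ideal first-order cardioid magnitude response of each microphone (assumed model)."""
    return 0.5 * (1.0 + np.cos(doa_rad - look_angles))

doa_true = np.radians(60.0)
g = cardioid_gains(doa_true)

# Level difference between the microphones facing +x and -x.
level_diff_db = 20.0 * np.log10(g[0] / g[2])

# Magnitude-weighted sum of the look-direction vectors and the resulting angle.
d = g @ b
doa_est_deg = np.degrees(np.arctan2(d[1], d[0]))
```

For this idealized model the opposing pair differs by several decibels, and the magnitude-weighted sum of the look directions points exactly towards the simulated source.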
An example of a system using the second method to achieve a strong variation of
magnitudes of different microphone signals for omnidirectional microphones is shown in
Fig. 10.
5.3.2 System Using Omnidirectional Microphones According to Fig. 10
Fig. 10 shows a system 1000 comprising an apparatus, for example, the apparatus 100
according to Fig. 1, for deriving a directional information 101 from a plurality of
microphone signals or components of a microphone signal. Furthermore, the system 1000
comprises a first omnidirectional microphone 1001_1 for deriving a first microphone signal
103_1 of the plurality of microphone signals of the apparatus 100. Furthermore, the system
1000 comprises a second omnidirectional microphone 1001_2 for deriving a second
microphone signal 103_2 of the plurality of microphone signals of the apparatus 100.
Furthermore, the system 1000 comprises a shadowing object 1005 (also denoted as
scattering object 1005) placed between the first omnidirectional microphone 1001_1 and the
second omnidirectional microphone 1001_2 for shaping effective response patterns of the
first omnidirectional microphone 1001_1 and of the second omnidirectional microphone
1001_2, such that a shaped effective response pattern of the first omnidirectional
microphone 1001_1 comprises a first effective microphone look direction 1003_1 and a
shaped effective response pattern of the second omnidirectional microphone 1001_2
comprises a second effective microphone look direction 1003_2. In other words, by using the
shadowing object 1005 between the omnidirectional microphones 1001_1, 1001_2, a
directional behavior of the omnidirectional microphones 1001_1, 1001_2 can be achieved,
such that measurable magnitude differences between the omnidirectional microphones
1001_1, 1001_2 are obtained even with a small distance between the two omnidirectional
microphones 1001_1, 1001_2.
Further optional extensions to the system 1000 are given in Fig. 4 to Fig. 6, in which
different geometric objects are placed in the middle of a conventional array of four
(omnidirectional) microphones.
Fig. 4 shows an illustration of a microphone configuration employing an object 1005 to
cause scattering and shadowing effects. In this example in Fig. 4 the object is a rigid
cylinder. The microphone positions of four (omnidirectional) microphones 1001_1 to 1001_4
are marked by the black circles.
Fig. 5 shows an illustration of a microphone configuration similar to Fig. 4, but employing
a different microphone placement (on a rigid surface of a rigid cylinder). The microphone
positions of the four (omnidirectional) microphones 1001_1 to 1001_4 are marked by the
black circles. In the example shown in Fig. 5 the shadowing object 1005 comprises the
rigid cylinder and the rigid surface.
Fig. 6 shows an illustration of a microphone configuration employing a further object 1005
to cause scattering and shadowing effects. In this example, the object 1005 is a rigid
hemisphere (with a rigid surface). The microphone positions of the four (omnidirectional)
microphones 1001_1 to 1001_4 are marked by the black circles.
Furthermore, Fig. 7 shows an example for a three-dimensional DOA estimation (a
three-dimensional directional information derivation) using six (omnidirectional)
microphones 1001_1 to 1001_6 distributed over a rigid sphere. In other words, Fig. 7 shows
an illustration of a 3D microphone configuration employing an object 1005 to cause
shadowing effects. In this example, the object is a rigid sphere. The microphone positions
of the (omnidirectional) microphones 1001_1 to 1001_6 are marked by the black circles.
From the magnitude differences between the different microphone signals generated by the
different microphones shown in Figs. 2 to 7 and 9 to 10, embodiments compute the
directional information following the approach explained in conjunction with the apparatus
100 according to Fig. 1.
According to further embodiments, the first directional microphone 901_1 or the first
omnidirectional microphone 1001_1 and the second directional microphone 901_2 or the
second omnidirectional microphone 1001_2 may be arranged such that a sum of a first
direction information item being a vector pointing in the first effective microphone look
direction 903_1, 1003_1 and of a second direction information item being a vector pointing
into the second effective microphone look direction 903_2, 1003_2 equals 0 within a tolerance
range of +/- 5 %, +/- 10 %, +/- 20 % or +/- 30 % of the first direction information item or
the second direction information item.
In other words, equation (6) may apply to the microphones of the systems 900, 1000, in
which b_i is a direction information item of the i-th microphone being a unit vector pointing
in the effective microphone look direction of the i-th microphone.
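The arrangement condition can be checked numerically, as in the following Python sketch. It assumes that the tolerance is measured relative to the norm of the first direction information item; the function name look_directions_balanced and the example vectors are hypothetical.

```python
import numpy as np

def look_directions_balanced(directions, tolerance=0.30):
    """True if the look-direction vectors sum to zero within tolerance * norm of the first vector."""
    b = np.asarray(directions, dtype=float)
    return np.linalg.norm(b.sum(axis=0)) <= tolerance * np.linalg.norm(b[0])

opposing = [[1.0, 0.0], [-1.0, 0.0]]                     # look directions 180 degrees apart
angle = np.radians(170.0)
tilted = [[1.0, 0.0], [np.cos(angle), np.sin(angle)]]    # look directions 170 degrees apart
skewed = [[1.0, 0.0], [0.0, 1.0]]                        # look directions 90 degrees apart

results = [look_directions_balanced(v) for v in (opposing, tilted, skewed)]
```

With the default tolerance of 30 %, the opposing and the slightly tilted pairs satisfy the condition in this sketch, whereas the perpendicular pair does not.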
In the following, alternative solutions for using the magnitude information of the
microphone signals for directional parameter estimation will be described.
5.4 Alternate Solutions
5.4.1 Correlation Based Approach
An alternative approach to exploit solely the magnitude information of microphone signals
for directional parameter estimation is proposed in this section. It is based on correlations
between magnitude spectra of the microphone signals and corresponding a priori
determined magnitude spectra obtained from models or measurements.
Let S_i(k, n) = |P_i(k, n)|^κ denote the magnitude or power spectrum of the i-th microphone
signal. Then, we define the measured magnitude array response S(k, n) of the N
microphones as
\[ \mathbf{S}(k, n) = [S_1(k, n), S_2(k, n), \ldots, S_N(k, n)]^T. \]
(19)
The corresponding magnitude array manifold of the microphone array is denoted by
S_M(φ, k, n). The magnitude array manifold obviously depends on the DOA of sound φ if
directional microphones with different look directions or scattering/shadowing with objects
within the array are used. The influence of the DOA of sound on the array manifold
depends on the actual array configuration, and it is influenced by the directional patterns of
the microphones and/or the scattering object included in the microphone configuration. The
array manifold can be determined from measurements of the array, where sound is played
back from different directions. Alternatively, physical models can be applied. The effect of
a cylindrical scatterer on the sound pressure distribution on its surface is, e.g., described in
H. Teutsch and W. Kellermann, Acoustic source detection and localization based on
wavefield decomposition using circular microphone arrays, J. Acoust. Soc. Am., 120(5),
2006.
To determine the desired estimate of the DOA of sound, the magnitude array response and
the magnitude array manifold are correlated. The estimated DOA corresponds to the
maximum of the normalized correlation according to
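\[ \hat{\varphi}(k, n) = \arg\max_{\varphi} \frac{\mathbf{S}^T(k, n)\, \mathbf{S}_M(\varphi, k, n)}{\|\mathbf{S}(k, n)\|\, \|\mathbf{S}_M(\varphi, k, n)\|}. \]
(20)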
Although we have presented only the 2D case for the DOA estimation here, it is obvious
that the 3D DOA estimation including azimuth and elevation can be performed
analogously.
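The correlation based approach can be sketched in Python as follows. The sketch assumes a modelled magnitude array manifold of four ideal cardioid microphones and a grid of candidate directions with 1 degree resolution; the names manifold and estimate_doa, the pattern model and the simulated measurement are assumptions for illustration and do not reproduce a measured manifold.

```python
import numpy as np

LOOK_ANGLES = np.radians([0.0, 90.0, 180.0, 270.0])   # assumed look directions

def manifold(doa_rad):
    """Modelled magnitude array manifold, one row per candidate DOA (ideal cardioids)."""
    doa = np.atleast_1d(doa_rad)[:, None]
    return 0.5 * (1.0 + np.cos(doa - LOOK_ANGLES[None, :]))

def estimate_doa(s, candidates_rad):
    """Return the candidate DOA whose manifold vector has the largest normalized correlation with s."""
    m = manifold(candidates_rad)                                   # shape (candidates, N)
    corr = (m @ s) / (np.linalg.norm(m, axis=1) * np.linalg.norm(s))
    return candidates_rad[np.argmax(corr)]

candidates = np.radians(np.arange(0.0, 360.0, 1.0))
s_measured = 0.7 * manifold(np.radians(135.0))[0]   # simulated measurement, arbitrary level
doa_deg = float(np.degrees(estimate_doa(s_measured, candidates)))
```

Because the correlation is normalized, the overall signal level (the factor 0.7 in the simulated measurement) does not affect the estimate.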
5.4.2 Noise Subspace Based Approach
An alternative approach to exploit solely the magnitude information of microphone signals
for directional parameter estimation is proposed in this section. It is based on the well
known root MUSIC algorithm (R. Schmidt, Multiple emitter location and signal parameter
estimation, IEEE Transactions on Antennas and Propagation, 34(3):276-280, 1986), with
the exception that in the example shown only the magnitude information is processed.
Let S(k, n) be the measured magnitude array response, as defined in (19). In the following
the dependencies on k and n are omitted, as all steps are carried out separately for each
time-frequency bin. The correlation matrix R can be computed with
\[ \mathbf{R} = E\{\mathbf{S}\mathbf{S}^H\}, \]
(21)
where (·)H denotes the conjugate transpose and E{·} is the expectation operator. The
expectation is usually approximated by a temporal and/or spectral averaging process in the
practical application. The eigenvalue decomposition of R can be written as
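\[ \mathbf{R} = \mathbf{Q}\, \mathrm{diag}\{\lambda_1, \ldots, \lambda_N\}\, \mathbf{Q}^H, \]
(22)
where λ_1, ..., λ_N are the eigenvalues, the columns of Q are the corresponding
eigenvectors, and N is the number of microphones or measurement positions. Now, when a
strong plane wave arrives at the microphone array, one relatively
large eigenvalue λ is obtained, while all other eigenvalues are close to zero. The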
eigenvectors, which correspond to the latter eigenvalues, form the so-called noise subspace
Qn. This matrix is orthogonal to the so-called signal subspace Qs, which contains the
eigenvector(s) corresponding to the largest eigenvalue(s). The so-called MUSIC spectrum
can be computed with
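\[ P(\varphi) = \frac{1}{\mathbf{s}^H(\varphi)\, \mathbf{Q}_n \mathbf{Q}_n^H\, \mathbf{s}(\varphi)}, \]
(23)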
where the steering vector s(
0.
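The noise subspace based approach can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the implementation of the embodiments: the magnitude steering vectors are modelled as ideal cardioid gains, the snapshots are simulated with random amplitudes and a small amount of additive noise, and a small constant is added to the denominator to avoid a division by zero. The names steering, Qn and spectrum are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
look_angles = np.radians([0.0, 90.0, 180.0, 270.0])   # assumed look directions

def steering(doa_rad):
    """Hypothetical magnitude steering vector: ideal cardioid gains (assumed model)."""
    return 0.5 * (1.0 + np.cos(doa_rad - look_angles))

# Simulated magnitude snapshots for one time-frequency bin: one plane wave plus small noise.
doa_true = np.radians(60.0)
amplitudes = rng.uniform(0.5, 1.5, size=200)
snapshots = amplitudes[:, None] * steering(doa_true) + 0.01 * rng.standard_normal((200, 4))

# Correlation matrix: the expectation is approximated by averaging over the snapshots.
# The magnitudes are real, so the conjugate transpose reduces to the transpose.
R = snapshots.T @ snapshots / len(snapshots)

# Eigenvalue decomposition; eigh returns the eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(R)
Qn = eigvecs[:, :-1]          # noise subspace: all eigenvectors except the dominant one

# MUSIC pseudo-spectrum on a grid of candidate directions with 1 degree resolution.
grid = np.radians(np.arange(0.0, 360.0, 1.0))
spectrum = np.array([1.0 / (np.linalg.norm(Qn.T @ steering(phi)) ** 2 + 1e-12) for phi in grid])
doa_est_deg = float(np.degrees(grid[np.argmax(spectrum)]))
```

In this simulation one eigenvalue dominates, and the pseudo-spectrum peaks at the grid point closest to the simulated direction of arrival.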
12. Apparatus according to one of the preceding claims,
wherein the combiner is configured to derive the directional information (d(k, n))
on the basis of the magnitude values and independent from phases of the
microphone signals (P_1 to P_N) or of the components (P_i(k, n)) of the microphone
signal (P ) in a first frequency range; and
wherein the combiner is further configured to derive the directional information in
dependence on the phases of the microphone signals (P_1 to P_N) or of the
components (P_i(k, n)) of the microphone signal (P ) in a second frequency range.
13. Apparatus according to one of the preceding claims,
wherein the combiner is configured such that the direction information item (b_i) is
weighted solely in dependence on the magnitude value.
14. Apparatus (100) according to one of the preceding claims, wherein the combiner
(105) is configured to linearly combine the direction information items (b_1 to b_N).
15. System (900) comprising:
an apparatus (100) according to one of the preceding claims,
a first directional microphone (901_1) having a first effective microphone look
direction (903_1) for deriving a first microphone signal (103_1) of the plurality of
microphone signals, the first microphone signal (103_1) being associated with a first
effective microphone look direction (903_1); and
a second directional microphone (901_2) having a second effective microphone look
direction (903_2) for deriving a second microphone signal (103_2) of the plurality of
microphone signals, the second microphone signal (103_2) being associated with the
second effective microphone look direction (903_2); and
wherein the first look direction (903_1) is different from the second look direction
(903_2).
16. System (1000) comprising:
an apparatus according to one of claims 1 to 14,
a first omnidirectional microphone (1001_1) for deriving a first microphone signal
(103_1) of the plurality of microphone signals;
a second omnidirectional microphone (1001_2) for deriving a second microphone
signal (103_2); and
a shadowing object (1005) placed between the first omnidirectional microphone
(1001_1) and the second omnidirectional microphone (1001_2) for shaping effective
response patterns of the first omnidirectional microphone (1001_1) and of the second
omnidirectional microphone (1001_2), such that a shaped effective response pattern
of the first omnidirectional microphone (1001_1) comprises a first effective
microphone look direction (1003_1) and a shaped effective response pattern of the
second omnidirectional microphone (1001_2) comprises a second effective
microphone look direction (1003_2), being different from the first effective
microphone look direction (1003_1).
17. System according to one of claims 15 or 16,
wherein the directional microphones (901_1, 901_2) or the omnidirectional
microphones (1001_1, 1001_2) are arranged such that a sum of direction information
items being vectors pointing in the effective microphone look directions (903_1,
903_2, 1003_1, 1003_2) equals zero within a tolerance range of ± 30 % of the norm of
one of the direction information items.
18. Method (800) for deriving a directional information from a plurality of microphone
signals or from a plurality of components of a microphone signal, wherein different
effective microphone look directions are associated with the microphone signals or
the components, the method comprising:
obtaining (801) a magnitude value from the microphone signal or a component of
the microphone signal; and
combining (803) direction information items describing the effective microphone
look directions, such that a direction information item describing a given effective
microphone look direction is weighted in dependence on the magnitude value of the
microphone signal or of the component of the microphone signal associated with
the given effective microphone look direction, to derive the directional information.
19. Computer program having a program code for, when running on a computer,
performing the method according to claim 18.