Abstract: A system for generating one or more audio output signals is provided. The system comprises a decomposition module (101), a signal processor (105) and an output interface (106). The decomposition module (101) is configured to receive two or more audio input signals, wherein the decomposition module (101) is configured to generate a direct component signal comprising direct signal components of the two or more audio input signals, and wherein the decomposition module (101) is configured to generate a diffuse component signal comprising diffuse signal components of the two or more audio input signals. The signal processor (105) is configured to receive the direct component signal, the diffuse component signal and direction information, said direction information depending on a direction of arrival of the direct signal components of the two or more audio input signals. Moreover, the signal processor (105) is configured to generate one or more processed diffuse signals depending on the diffuse component signal. For each audio output signal of the one or more audio output signals, the signal processor (105) is configured to determine, depending on the direction of arrival, a direct gain; the signal processor (105) is configured to apply said direct gain on the direct component signal to obtain a processed direct signal; and the signal processor (105) is configured to combine said processed direct signal and one of the one or more processed diffuse signals to generate said audio output signal. The output interface (106) is configured to output the one or more audio output signals.
The gain functions depend on φ(k, n) and/or r(k, n) such that the desired consistent spatial image is obtained. For example, when zooming in with the visual camera, the gain functions are adjusted such that the sound is reproduced from the directions where the sources are visible in the video. The weights Gi(k, n) and Q and the underlying gain functions gi and q are further described below. It should be noted that the weights Gi(k, n) and Q and the underlying gain functions gi and q may, e.g., be complex-valued. Computing the gain functions requires information such as the zooming factor, the width of the visual image, the desired look direction, and the loudspeaker setup. In other embodiments, the weights Gi(k, n) and Q are computed directly within the signal modifier 103, instead of first computing the gain functions in module 104 and then selecting the weights Gi(k, n) and Q from the computed gain functions in the gain selection units 201 and 202.

According to embodiments, more than one plane wave per time-frequency bin may, e.g., be specifically processed. For example, two or more plane waves in the same frequency band from two different directions may, e.g., arrive at and be recorded by a microphone array at the same point in time. These two plane waves may each have a different direction of arrival. In such scenarios, the direct signal components of the two or more plane waves and their directions of arrival may, e.g., be considered separately.

According to embodiments, the direct component signal Xdir1(k, n) and one or more further direct component signals Xdir2(k, n), ..., Xdirq(k, n) may, e.g., form a group of two or more direct component signals Xdir1(k, n), Xdir2(k, n), ..., Xdirq(k, n), wherein the decomposition module 101 may, e.g., be configured to generate the one or more further direct component signals Xdir2(k, n), ..., Xdirq(k, n) comprising further direct signal components of the two or more audio input signals x1(k, n), x2(k, n), ..., xp(k, n). The direction of arrival and one or more further directions of arrival form a group of two or more directions of arrival, wherein each direction of arrival of the group of the two or more directions of arrival is assigned to exactly one direct component signal Xdirj(k, n) of the group of the two or more direct component signals Xdir1(k, n), Xdir2(k, n), ..., Xdirq(k, n), wherein the number of the direct component signals of the two or more direct component signals and the number of the directions of arrival of the two or more directions of arrival are equal. The signal processor 105 may, e.g., be configured to receive the group of the two or more direct component signals Xdir1(k, n), Xdir2(k, n), ..., Xdirq(k, n) and the group of the two or more directions of arrival. For each audio output signal Yi(k, n) of the one or more audio output signals Y1(k, n), Y2(k, n), ..., Yv(k, n), the signal processor 105 may, e.g., be configured to determine, for each direct component signal Xdirj(k, n) of the group of the two or more direct component signals, a direct gain Gj,i(k, n) depending on the direction of arrival of said direct component signal Xdirj(k, n), and the signal processor 105 may, e.g., be configured to generate a group of two or more processed direct signals Ydir1,i(k, n), Ydir2,i(k, n), ..., Ydirq,i(k, n) by applying, for each direct component signal Xdirj(k, n) of the group of the two or more direct component signals, the direct gain Gj,i(k, n) of said direct component signal on said direct component signal Xdirj(k, n).
The signal processor 105 may, e.g., be configured to combine one Ydiff,i(k, n) of the one or more processed diffuse signals and each processed signal of the group of the two or more processed direct signals to generate said audio output signal Yi(k, n). Thus, if two or more plane waves are considered separately, the model of formula (1) becomes a sum of the two or more direct signal components plus the diffuse component, and the weights may, e.g., be computed analogously to formulae (2a) and (2b) for each of the direct signal components.

It is sufficient that only a few direct component signals, a diffuse component signal and side information are transmitted from a near-end side to a far-end side. In an embodiment, the number of the direct component signals of the group of the two or more direct component signals Xdir1(k, n), Xdir2(k, n), ..., Xdirq(k, n) plus 1 is smaller than the number of the audio input signals x1(k, n), x2(k, n), ..., xp(k, n) being received by the receiving interface 101 (using the indices: q + 1 < p). Here, "plus 1" represents the diffuse component signal Xdiff(k, n) that is also needed.

When, in the following, explanations are provided with respect to a single plane wave, a single direction of arrival and a single direct component signal, it is to be understood that the explained concepts are equally applicable to more than one plane wave, more than one direction of arrival and more than one direct component signal.

In the following, direct and diffuse sound extraction is described. Practical realizations of the decomposition module 101 of Fig. 2, which realizes the direct/diffuse decomposition, are provided. In embodiments, to realize consistent spatial sound reproduction, the outputs of two recently proposed informed linearly constrained minimum variance (LCMV) filters described in [8] and [9] are combined; these filters enable an accurate multi-channel extraction of direct sound and diffuse sound with a desired arbitrary response, assuming a sound field model similar to that of DirAC (Directional Audio Coding). A specific way of combining these filters according to an embodiment is now described in the following.

At first, direct sound extraction according to an embodiment is described. The direct sound is extracted using the recently proposed informed spatial filter described in [8]. This filter is briefly reviewed in the following and then formulated such that it can be used in embodiments according to Fig. 2. The estimated desired direct signal Ydir,i(k, n) for the i-th loudspeaker channel in (2b) and Fig. 2 is computed by applying a linear multi-channel filter to the microphone signals, e.g.,

    Ydir,i(k, n) = w_dir^H(k, n) x(k, n),

where the vector x(k, n) = [X1(k, n), ..., XM(k, n)]^T comprises the M microphone signals and w_dir(k, n) is a complex-valued weight vector. Here, the filter weights minimize the noise and diffuse sound comprised in the microphone signals while capturing the direct sound with the desired gain Gi(k, n). Expressed mathematically, the weights may, e.g., be computed as

    w_dir(k, n) = argmin_w w^H Φu(k, n) w,    (5)

subject to the linear constraint

    w^H a(k, φ) = Gi(k, n).

Here, a(k, φ) is the so-called array propagation vector. The m-th element of this vector is the relative transfer function of the direct sound between the m-th microphone and a reference microphone of the array (without loss of generality, the first microphone is used in the following description). This vector depends on the DOA φ(k, n) of the direct sound. The array propagation vector is, for example, defined in [8]. In formula (6) of document [8], the array propagation vector is defined as a(k, φl) = [a1, ..., aM]^T, wherein φl is an azimuth angle of a direction of arrival of an l-th plane wave.
Thus, the array propagation vector depends on the direction of arrival. If only one plane wave exists or is considered, the index l may be omitted. According to formula (6) of [8], the m-th element am of the array propagation vector a describes the phase shift of the l-th plane wave from the first to the m-th microphone and is defined according to

    am = exp{ j κ rm sin(φl(k, n)) },

where, e.g., rm is equal to the distance between the first and the m-th microphone, κ indicates the wavenumber of the plane wave, and j is the imaginary unit. More information on the array propagation vector a and its elements am can be found in [8], which is explicitly incorporated herein by reference.

The M × M matrix Φu(k, n) in (5) is the power spectral density (PSD) matrix of the noise and diffuse sound, which can be determined as explained in [8]. The solution to (5) is given by

    w_dir(k, n) = Gi(k, n) h(k, n),   where   h(k, n) = Φu^(-1) a / (a^H Φu^(-1) a).

Computing the filter thus requires the array propagation vector a(k, φ), and therefore the DOA information φ(k, n) of the direct sound.
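The closed-form weights above translate directly into a few lines of code. The following Python sketch is a minimal illustration (the helper names and the example array geometry are assumptions, not part of the document): it builds the propagation vector of formula (6) of [8] for a uniform linear array and evaluates the LCMV solution for one time-frequency bin.

```python
import numpy as np

def propagation_vector(kappa, mic_distances, phi):
    """Array propagation vector a for a plane wave with azimuth phi.
    kappa: wavenumber; mic_distances: distances r_m of each microphone to
    the reference (first) microphone. Element m is exp(j*kappa*r_m*sin(phi))."""
    r = np.asarray(mic_distances, dtype=float)
    return np.exp(1j * kappa * r * np.sin(phi))

def lcmv_direct_weights(Phi_u, a, G_i):
    """Informed spatial filter weights: minimize w^H Phi_u w subject to
    w^H a = G_i (standard closed-form LCMV solution)."""
    Phi_inv_a = np.linalg.solve(Phi_u, a)
    h = Phi_inv_a / (a.conj() @ Phi_inv_a)
    return G_i * h

# Example for one (k, n) bin with M = 4 microphones:
M = 4
rng = np.random.default_rng(0)
N = rng.standard_normal((M, 64)) + 1j * rng.standard_normal((M, 64))
Phi_u = N @ N.conj().T / 64                    # stand-in PSD matrix of noise + diffuse sound
a = propagation_vector(kappa=2 * np.pi * 1000 / 343,      # 1 kHz, c = 343 m/s
                       mic_distances=[0.0, 0.03, 0.06, 0.09],
                       phi=np.deg2rad(30))
w = lcmv_direct_weights(Phi_u, a, G_i=1.0)
x = 0.5 * a                                    # microphone signals for this bin
Y_dir = w.conj() @ x                           # estimated direct signal, w^H x
print(abs(Y_dir))                              # ~0.5: direct sound captured with gain G_i
```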
Instead of the azimuth angle φ(k, n), the DOA information can also be provided in the form of the spatial frequency μ[k | φ(k, n)] for one or more waves arriving at the microphone array. It should be noted that the DOA information can also be provided externally. For example, the DOA of the plane wave can be determined by a video camera together with a face recognition algorithm, assuming that human talkers form the acoustic scene.

Finally, it should be noted that the DOA information can also be estimated in 3D (in three dimensions). In that case, both the azimuth φ(k, n) and elevation ϑ(k, n) angles are estimated in the parameter estimation module 102, and the DOA of the plane wave is in such a case provided, for example, as (φ, ϑ). Thus, when reference is made below to the azimuth angle of the DOA, it is understood that all explanations are also applicable to the elevation angle of the DOA, to an angle derived from the azimuth angle of the DOA, to an angle derived from the elevation angle of the DOA, or to an angle derived from both the azimuth angle and the elevation angle of the DOA. More generally, all explanations provided below are equally applicable to any angle depending on the DOA.

Now, distance information determination/estimation is described. Some embodiments relate to acoustic zoom based on DOAs and distances. In such embodiments, the parameter estimation module 102 may, for example, comprise two sub-modules, e.g., the DOA estimator sub-module described above and a distance estimation sub-module that estimates the distance r(k, n) from the recording position to the sound source. In such embodiments, it may, for example, be assumed that each plane wave that arrives at the recording microphone array originates from the sound source and propagates along a straight line to the array (which is also known as the direct propagation path).

Several state-of-the-art approaches exist for distance estimation using microphone signals. For example, the distance to the source can be found by computing the power ratios between the microphone signals, as described in [12]. Alternatively, the distance to the source r(k, n) in acoustic enclosures (e.g., rooms) can be computed based on the estimated signal-to-diffuse ratio (SDR) [13]. The SDR estimates can then be combined with the reverberation time of the room (known or estimated using state-of-the-art methods) to calculate the distance. For a high SDR, the direct sound energy is high compared to the diffuse sound, which indicates that the distance to the source is small. When the SDR value is low, the direct sound power is weak in comparison to the room reverberation, which indicates a large distance to the source.

In other embodiments, instead of calculating/estimating the distance by employing a distance computation module in the parameter estimation module 102, external distance information may, e.g., be received, for example, from the visual system. For example, state-of-the-art techniques used in vision may, e.g., be employed that can provide the distance information, for example, Time of Flight (ToF), stereoscopic vision, and structured light. For example, in ToF cameras, the distance to the source can be computed from the measured time-of-flight of a light signal emitted by the camera and traveling to the source and back to the camera sensor. Computer stereo vision, for example, utilizes two vantage points from which the visual image is captured to compute the distance to the source. Or, for example, structured light cameras may be employed, where a known pattern of pixels is projected onto the visual scene.
The analysis of the deformations after the projection allows the visual system to estimate the distance to the source. It should be noted that the distance information r(k, n) for each time-frequency bin is required for consistent audio scene reproduction. If the distance information is provided externally by a visual system, the distance to the source r(k, n) that corresponds to the DOA φ(k, n) of the direct sound may, e.g., be selected from that visual distance information.

In the following, the weights Gi(k, n) and Q are selected from the gain functions gi(φ(k, n)) and q(k, n) provided by the gain function computation module 104. According to an embodiment, Gi(k, n) may, for example, be selected based on the DOA information only, and Q may, for example, have a constant value. In other embodiments, however, the weight Gi(k, n) may, for example, be determined based on further information, and the weight Q may, for example, be determined variably.

At first, implementations are considered that realize consistency with the recorded acoustic scene. Afterwards, embodiments are considered that realize consistency with image information / with a visual image.

In the following, a computation of the weights Gi(k, n) and Q is described to reproduce an acoustic scene that is consistent with the recorded acoustic scene, e.g., such that a listener positioned in the sweet spot of the reproduction system perceives the sound sources as arriving from the DOAs of the sound sources in the recorded sound scene, having the same power as in the recorded scene, and with the same perception of the surrounding diffuse sound. For a known loudspeaker setup, reproduction of the sound source from its direction φ(k, n) may, e.g., be achieved by selecting the direct gain Gi(k, n) from a panning gain function gi(φ), for example, based on vector base amplitude panning (VBAP) [14]. For example, for a stereo loudspeaker setup and direct sound arriving from φ(k, n) = 30°, the right loudspeaker gain is Gr(k, n) = gr(30°) = pr(30°) = 1 and the left loudspeaker gain is Gl(k, n) = gl(30°) = pl(30°) = 0. For direct sound arriving from φ(k, n) = 0°, the final stereo loudspeaker gains are equal, e.g., Gr(k, n) = Gl(k, n) = √0.5.
In an embodiment, the panning gain function, e.g., pi(φ), may, e.g., be a head-related transfer function (HRTF) in case of binaural sound reproduction. For example, if the HRTF gi(φ) = pi(φ) returns complex values, then the direct sound gain Gi(k, n) selected in the gain selection unit 201 may, e.g., be complex-valued.
If three or more audio output signals shall be generated, corresponding state-of-the-art panning concepts may, e.g., be employed to pan an input signal to the three or more audio output signals. For example, VBAP [14] may be employed for three or more audio output signals.
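As a concrete illustration of such a panning gain function, the following sketch implements the tangent panning law for a standard ±30° stereo setup. The tangent law itself is an assumption here (the document does not prescribe a specific panning law); the sketch reproduces the gains of the example above, i.e., (0, 1) for a source at 30° and equal gains at 0°.

```python
import numpy as np

def stereo_panning_gains(phi_deg, base_deg=30.0):
    """Tangent-law panning for a standard +/-30 degree stereo setup.
    Returns (g_left, g_right), normalized so that g_l^2 + g_r^2 = 1."""
    phi = np.deg2rad(np.clip(phi_deg, -base_deg, base_deg))
    base = np.deg2rad(base_deg)
    # tangent law: (g_r - g_l) / (g_r + g_l) = tan(phi) / tan(base)
    ratio = np.tan(phi) / np.tan(base)
    g_l, g_r = 1.0 - ratio, 1.0 + ratio
    norm = np.hypot(g_l, g_r)
    return g_l / norm, g_r / norm

print(stereo_panning_gains(30.0))   # (0.0, 1.0): source panned fully right
print(stereo_panning_gains(0.0))    # (0.707..., 0.707...): equal gains at the center
```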
In consistent acoustic scene reproduction, the power of the diffuse sound should remain the same as in the recorded scene. Therefore, for a loudspeaker system with, e.g., equally spaced loudspeakers, the diffuse sound gain has a constant value, e.g.,

    Q = 1 / √v,

where v is the number of the output loudspeaker channels. This means that the gain function computation module 104 provides a single output value for the i-th loudspeaker (or headphone channel) depending on the number of loudspeakers available for reproduction, and this value is used as the diffuse gain Q across all frequencies. The final diffuse sound Ydiff,i(k, n) for the i-th loudspeaker channel is obtained by decorrelating Ydiff(k, n) obtained in (2b).
Thus, acoustic scene reproduction that is consistent with the recorded acoustical scene may be achieved, for example, by determining gains for each of the audio output signals depending on, e.g., a direction of arrival, by applying the plurality of determined gains Gi(k, n) on the direct sound signal Xdir(k, n) to determine a plurality of direct output signal components Ydir,i(k, n), by applying the determined gain Q on the diffuse sound signal Xdiff(k, n) to obtain a diffuse output signal component Ydiff,i(k, n), and by combining each of the plurality of direct output signal components Ydir,i(k, n) with the diffuse output signal component Ydiff,i(k, n) to obtain the one or more audio output signals Yi(k, n).
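The summary above maps one-to-one onto a per-bin synthesis loop. The sketch below is a simplified illustration (the function and parameter names are assumptions, and the decorrelators are idealized as random all-pass phases):

```python
import numpy as np

def synthesize_outputs(X_dir, X_diff, phi, direct_gain, num_out):
    """Per time-frequency-bin synthesis: Y_i = G_i(k, n) X_dir + Q D_i{X_diff}.

    X_dir, X_diff: direct/diffuse spectra, shape (K, N); phi: DOA per bin,
    shape (K, N); direct_gain(phi, i) -> G_i(k, n); the decorrelator D_i is
    idealized here as a random all-pass phase per output channel.
    """
    Q = 1.0 / np.sqrt(num_out)            # constant diffuse gain
    rng = np.random.default_rng(0)
    outputs = []
    for i in range(num_out):
        G_i = direct_gain(phi, i)         # direct gain per bin, from the DOA
        allpass = np.exp(2j * np.pi * rng.random(X_diff.shape))
        outputs.append(G_i * X_dir + Q * X_diff * allpass)
    return outputs

# toy usage with two output channels and cosine/sine stand-in gain functions:
K, N = 4, 3
gains = lambda phi, i: np.cos(phi) if i == 0 else np.sin(phi)
Y = synthesize_outputs(np.ones((K, N)), np.ones((K, N)),
                       np.full((K, N), np.deg2rad(30)), gains, num_out=2)
```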
Now, audio output signal generation according to embodiments is described that achieves consistency with the visual scene. In particular, the computation of the weights Gi(k, n) and Q according to embodiments is described that are employed to reproduce an acoustic scene that is consistent with the visual scene. The aim is to recreate an acoustical image in which the direct sound from a source is reproduced from the direction where the source is visible in a video/image.
A geometry as depicted in Fig. 4 may be considered, where l corresponds to the look direction of the visual camera. Without loss of generality, l may define the y-axis of the coordinate system.
The azimuth of the DOA of the direct sound in the depicted (x, y) coordinate system is given by φ(k, n), and the location of the source on the x-axis is given by xg(k, n). Here, it is assumed that all sound sources are located at the same distance g to the x-axis, e.g., the source positions are located on the left dashed line, which is referred to in optics as a focal plane. It should be noted that this assumption is only made to ensure that the visual and acoustical images are aligned, and the actual distance value g is not needed for the presented processing.

On the reproduction side (far-end side), the display is located at b, and the position of the source on the display is given by xb(k, n). Moreover, xd is the display size (or, in some embodiments, xd indicates half of the display size), φd is the corresponding maximum visual angle, S is the sweet spot of the sound reproduction system, and φb(k, n) is the angle at which the direct sound should be reproduced so that the acoustical image coincides with the visual image; φb(k, n) follows from φ(k, n), r and g according to the geometry of Fig. 4.
Thus, according to an embodiment, the signal processor 105 may, e.g., be configured to receive an original azimuth angle φ(k, n) of the direction of arrival, being the direction of arrival of the direct signal components of the two or more audio input signals, and may, e.g., be configured to further receive distance information r. The signal processor 105 may, e.g., be configured to calculate a modified azimuth angle φb(k, n) of the direction of arrival depending on the azimuth angle φ(k, n) of the original direction of arrival and depending on the distance information r and g. The signal processor 105 may, e.g., be configured to generate each audio output signal of the one or more audio output signals depending on the azimuth angle φb(k, n) of the modified direction of arrival.
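A sketch of such a DOA remapping is given below. The exact mapping is defined by the geometry of Fig. 4; the pinhole-style relations used here (source position xg = g·tan φ on the focal plane, displayed position xb = β·c·xg, reproduction angle φb = arctan(xb/b)) are simplifying assumptions for illustration only.

```python
import numpy as np

def modified_azimuth(phi, g, b, beta=1.0, c=1.0):
    """Map the original DOA phi to a reproduction angle phi_b so that the
    acoustical image follows the source position on the display.
    phi: original azimuth (rad); g: distance of the focal plane; b: distance
    of the display; beta: zoom factor; c: calibration parameter.
    All mapping relations are illustrative assumptions."""
    x_g = g * np.tan(phi)        # source position on the focal plane
    x_b = beta * c * x_g         # position of the source on the display
    return np.arctan2(x_b, b)    # angle at which the source appears/should sound

# zooming in (beta = 3) pushes an off-center source further to the side:
print(np.rad2deg(modified_azimuth(np.deg2rad(10.0), g=2.0, b=2.0, beta=1.0)))  # ~10 deg
print(np.rad2deg(modified_azimuth(np.deg2rad(10.0), g=2.0, b=2.0, beta=3.0)))  # ~28 deg
```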
The required distance information can be estimated as explained above (the distance g of the focal plane can be obtained from the lens system or from autofocus information). It should be noted that, for example, in this embodiment, the distance r(k, n) between the source and the focal plane is transmitted to the far-end side together with the (mapped) DOA φ(k, n).

Moreover, by analogy to the visual zoom, sources lying at a large distance r from the focal plane do not appear sharp in the image. This effect is well known in optics as the so-called depth-of-field (DOF), which defines the range of source distances that appear acceptably sharp in the visual image.
An example of the DOF curve as a function of the distance r is depicted in Fig. 10(a). Fig. 10 illustrates example curves for the depth-of-field (Fig. 10(a)), for the cut-off frequency of a low-pass filter (Fig. 10(b)), and for the time delay in ms for the repeated direct sound (Fig. 10(c)).
In Fig. 10(a), sources at a small distance from the focal plane are still sharp, whereas sources at larger distances (either closer to or further away from the camera) appear blurred. So, according to an embodiment, the corresponding sound sources are blurred such that their visual and acoustical images are consistent.
To derive the gains Gi(k, n) and Q in (2a), which realize the acoustic blurring and consistent spatial sound reproduction, the angle is considered at which the source positioned at (φ, r) will appear on the display. The position at which the blurred source is displayed depends on the calibration parameter c, the user-controlled zoom factor β ≥ 1, and the (mapped) DOA φ(k, n), for example, estimated in the parameter estimation module 102. As mentioned before, the direct gain Gi(k, n) in such embodiments may, e.g., be computed from multiple direct gain functions. In particular, two gain functions gi,1(φ(k, n)) and gi,2(r(k, n)) may, for example, be used, wherein the first gain function depends on the DOA φ(k, n), and wherein the second gain function depends on the distance r(k, n). The direct gain Gi(k, n) may, e.g., be computed as

    Gi(k, n) = gi,1(φ(k, n)) · gi,2(r(k, n)) = pi(φ) · wi(φ) · b(r),    (32)

wherein pi(φ) denotes the panning gain function (to assure that the sound is reproduced from the right direction), wherein wi(φ) is the window gain function (to assure that the direct sound is attenuated if the source is not visible in the video), and wherein b(r) is the blurring function (to blur sources acoustically if they are not located on the focal plane). It should be noted that all gain functions can be defined frequency-dependent (which is omitted here for brevity). It should further be noted that in this embodiment the direct gain is found by selecting and multiplying gains from two different gain functions, as shown in formula (32).
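The multiplicative structure of formula (32) can be sketched as follows (the three stand-in gain functions are toy assumptions; the document defines pi and wi via formulae (26) and (27) and b(r) via the blurring effects described below):

```python
import numpy as np

def direct_gain(phi, r, p_i, w_i, b):
    """G_i(k, n) = g_{i,1}(phi) * g_{i,2}(r), with g_{i,1} = p_i(phi) * w_i(phi)
    and g_{i,2} = b(r): panning gain x window gain x blurring gain."""
    return p_i(phi) * w_i(phi) * b(r)

# toy stand-ins for the actual gain functions:
p_i = lambda phi: np.cos(phi / 2)                              # panning gain
w_i = lambda phi: 1.0 if abs(phi) < np.deg2rad(30) else 0.1    # window gain
b   = lambda r: 1.0 / (1.0 + r)                                # blurring gain (real-valued toy)

print(direct_gain(np.deg2rad(10), r=0.0, p_i=p_i, w_i=w_i, b=b))  # visible and in focus
print(direct_gain(np.deg2rad(50), r=2.0, p_i=p_i, w_i=w_i, b=b))  # off-screen and blurred
```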
Both gain functions pi(φ) and wi(φ) are defined analogously as described above. For example, they may be computed, e.g., in the gain function computation module 104, for example, using formulae (26) and (27), and they remain fixed unless the zoom factor β changes. The detailed description of these two functions has been provided above. The blurring function b(r) returns complex gains that cause a blurring, e.g., a perceptual spreading, of a source, and thus the overall gain function gi will also typically return a complex number. For simplicity, in the following, the blurring is denoted as a function b(r) of the distance to the focal plane.
The blurring effect can be obtained as a selected one or a combination of the following blurring effects: low-pass filtering, adding delayed direct sound, direct sound attenuation, temporal smoothing and/or DOA spreading. Thus, according to an embodiment, the signal processor 105 may, e.g., be configured to generate the one or more audio output signals by conducting low-pass filtering, or by adding delayed direct sound, or by conducting direct sound attenuation, or by conducting temporal smoothing, or by conducting direction of arrival spreading.
Low-pass filtering: In vision, a non-sharp visual image can be obtained by low-pass filtering, which effectively merges the neighboring pixels of the visual image. By analogy, an acoustic blurring effect can be obtained by low-pass filtering the direct sound, with the cut-off frequency selected based on the estimated distance r of the source to the focal plane. In this case, the blurring function b(r, k) returns the low-pass filter gains for frequency k and distance r. An example curve for the cut-off frequency of a first-order low-pass filter at a sampling frequency of 16 kHz is shown in Fig. 10(b). For small distances r, the cut-off frequency is close to the Nyquist frequency, and thus almost no low-pass filtering is effectively performed. For larger distance values, the cut-off frequency is decreased until it levels off at 3 kHz, where the acoustical image is sufficiently blurred.
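A sketch of such a distance-dependent low-pass blurring gain follows. The exponential cut-off curve falling from near-Nyquist to a 3 kHz floor is an assumed stand-in for the curve of Fig. 10(b); the first-order low-pass response itself is standard.

```python
import numpy as np

def lowpass_blur_gain(freqs_hz, r, fs=16000.0, f_floor=3000.0, r_scale=1.0):
    """Blurring function b(r, k): first-order low-pass gains whose cut-off
    decreases with the distance r to the focal plane and levels off at f_floor."""
    f_nyq = fs / 2.0
    # cut-off falls from ~Nyquist towards f_floor as |r| grows (illustrative curve)
    f_c = f_floor + (f_nyq - f_floor) * np.exp(-abs(r) / r_scale)
    # complex response of a first-order low-pass at the bin frequencies
    return 1.0 / (1.0 + 1j * np.asarray(freqs_hz) / f_c)

freqs = np.array([250.0, 1000.0, 4000.0])
print(np.abs(lowpass_blur_gain(freqs, r=0.0)))   # ~1: in focus, almost no filtering
print(np.abs(lowpass_blur_gain(freqs, r=5.0)))   # highs attenuated: blurred source
```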
Adding delayed direct sound: In order to unsharpen the acoustical image of a source, we can decorrelate the direct sound, for instance by repeating an attenuated copy of the direct sound after some delay τ (e.g., between 1 and 30 ms). Such processing can, for example, be conducted according to the complex gain function of formula (34), where α denotes the attenuation gain for the repeated sound and τ is the delay after which the direct sound is repeated. An example delay curve (in ms) is shown in Fig. 10(c). For small distances, the delayed signal is not repeated and α is set to zero. For larger distances, the time delay increases with increasing distance, which causes a perceptual spreading of the acoustic source.
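Assuming formula (34) has the standard form of a direct path plus one attenuated, delayed copy, b = 1 + α·e^(−j2πfτ), such a blurring gain may, e.g., be sketched as follows (the α(r) and τ(r) curves are toy stand-ins for Fig. 10(c)):

```python
import numpy as np

def delayed_sound_gain(freqs_hz, r, max_alpha=0.8, max_tau_ms=30.0, r_scale=2.0):
    """Complex blurring gain b(r, k) = 1 + alpha * exp(-j*2*pi*f*tau): the
    direct sound plus an attenuated copy repeated after delay tau. alpha and
    tau grow with the distance r to the focal plane (illustrative curves)."""
    alpha = max_alpha * (1.0 - np.exp(-abs(r) / r_scale))   # 0 at r = 0: no repetition
    tau = (max_tau_ms / 1000.0) * (1.0 - np.exp(-abs(r) / r_scale))
    w = 2.0 * np.pi * np.asarray(freqs_hz)
    return 1.0 + alpha * np.exp(-1j * w * tau)

print(delayed_sound_gain(np.array([500.0, 2000.0]), r=0.0))          # [1, 1]: in focus
print(np.abs(delayed_sound_gain(np.array([500.0, 2000.0]), r=4.0)))  # comb-like coloration
```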
Direct sound attenuation: The source can also be perceived as blurred when the direct sound is attenuated by a constant factor. In this case, b(r) = const < 1. As mentioned above, the blurring function b(r) can consist of any of the mentioned blurring effects or of a combination of these effects. In addition, alternative processing that blurs the source can be used.
Temporal smoothing: Smoothing of the direct sound across time can, for example, be used to perceptually blur the acoustic source. This can be achieved by smoothing the envelope of the extracted direct signal over time.
DOA spreading: Another method to unsharpen an acoustical source consists in reproducing the source signal from a range of directions instead of from the estimated direction only. This can be achieved by randomizing the angle, for example, by taking a random angle from a Gaussian distribution centered around the estimated φ. Increasing the variance of such a distribution, and thus widening the possible DOA range, increases the perception of blurring.
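A minimal sketch of such DOA spreading follows (the variance-versus-distance curve is an assumption):

```python
import numpy as np

def spread_doa(phi, r, max_spread_deg=20.0, r_scale=2.0, rng=None):
    """Replace the estimated DOA by a random angle from a Gaussian centered
    at phi; the standard deviation grows with the distance r to the focal
    plane, widening the perceived source."""
    rng = rng or np.random.default_rng()
    sigma = np.deg2rad(max_spread_deg) * (1.0 - np.exp(-abs(r) / r_scale))
    return rng.normal(loc=phi, scale=sigma)

rng = np.random.default_rng(1)
print(np.rad2deg(spread_doa(np.deg2rad(15), r=0.0, rng=rng)))  # exactly 15: sharp source
print(np.rad2deg(spread_doa(np.deg2rad(15), r=5.0, rng=rng)))  # randomized: blurred source
```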
Analogously as described above, computing the diffuse gain function q(β) in the gain function computation module 104 may, in some embodiments, require only the knowledge of the number of loudspeakers available for reproduction. Thus, the diffuse gain function q(β) can, in such embodiments, be set as desired for the application. For example, for equally spaced loudspeakers, the real-valued diffuse sound gain Q = q(β) in formula (2a) is selected in the gain selection unit 202 based on the zoom parameter β. The aim of using the diffuse gain is to attenuate the diffuse sound depending on the zooming factor, e.g., zooming increases the direct-to-reverberant ratio (DRR) of the reproduced signal. This is achieved by lowering Q for larger β. In fact, zooming in means that the opening angle of the camera becomes smaller, e.g., a natural acoustical correspondence would be a more directive microphone which captures less diffuse sound. To mimic this effect, we can use, for instance, the gain function shown in Fig. 8. Clearly, the gain function could also be defined differently. Optionally, the final diffuse sound Ydiff,i(k, n) for the i-th loudspeaker channel is obtained by decorrelating Ydiff(k, n) obtained in formula (2b).
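A sketch of a zoom-dependent diffuse gain q(β) follows. The decreasing curve is a simple stand-in for the gain function of Fig. 8 (an assumption), starting from the constant value Q = 1/√v for β = 1:

```python
import numpy as np

def diffuse_gain(beta, num_loudspeakers, q_min=0.1):
    """Diffuse gain Q = q(beta): starts at the equally-spaced-setup value
    1/sqrt(num_loudspeakers) for beta = 1 and decreases for larger zoom
    factors, raising the direct-to-reverberant ratio of the output."""
    q0 = 1.0 / np.sqrt(num_loudspeakers)
    return max(q_min, q0 / beta)          # illustrative monotonically decreasing curve

for beta in (1.0, 2.0, 4.0):
    print(beta, diffuse_gain(beta, num_loudspeakers=5))
```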
Now, embodiments are considered that realize an application to hearing aids and
assistive listening devices. Fig. 1 illustrates such a hearing aid application.
Some embodiments are related to binaural hearing aids. In this case, it is assumed that
each hearing aid is equipped with at least one microphone and that information can be
exchanged between the two hearing aids. Due to some hearing loss, the hearing impaired
person might experience difficulties focusing (e.g., concentrating on sounds coming from
a particular point or direction) on a desired sound or sounds. In order to help the brain of
the hearing impaired person to process the sounds that are reproduced by the hearing
aids, the acoustical image is made consistent with the focus point or direction of the
hearing aid user. It is conceivable that the focus point or direction is predefined, user-defined, or defined by a brain-machine interface. Such embodiments ensure that desired
sounds (which are assumed to arrive from the focus point or focus direction) and the
undesired sounds appear spatially separated.
In such embodiments, the directions of the direct sounds can be estimated in different
ways. According to an embodiment, the directions are determined based on the inter-aural
level differences (ILDs) and/or inter-aural time differences (ITDs) that are determined
using both hearing aids (see [15] and [16]).
According to other embodiments, the directions of the direct sounds on the left and right are estimated independently using a hearing aid that is equipped with at least two microphones (see [17]). The estimated directions can be fused based on the sound pressure levels at the left and right hearing aid, or based on the spatial coherence at the left and right hearing aid. Because of the head shadowing effect, different estimators may be employed for different frequency bands (e.g., ILDs at high frequencies and ITDs at low frequencies).
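Such a band-wise combination may, e.g., be sketched as follows (the 1.5 kHz crossover is a common rule of thumb and an assumption, not a value from the document):

```python
import numpy as np

def fuse_binaural_doas(freqs_hz, doa_itd, doa_ild, crossover_hz=1500.0):
    """Per-band DOA: ITD-based estimates below the crossover, ILD-based
    estimates above it (head shadowing makes ILDs reliable mainly at high
    frequencies, while ITDs are unambiguous mainly at low frequencies)."""
    freqs = np.asarray(freqs_hz)
    return np.where(freqs < crossover_hz, doa_itd, doa_ild)

freqs = np.array([250.0, 500.0, 2000.0, 4000.0])
doa_itd = np.deg2rad([20.0, 22.0, 35.0, 40.0])   # ITD-based estimates per band
doa_ild = np.deg2rad([5.0, 10.0, 25.0, 26.0])    # ILD-based estimates per band
print(np.rad2deg(fuse_binaural_doas(freqs, doa_itd, doa_ild)))
```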
In some embodiments, the direct and diffuse sound signals may, e.g., be estimated using
the aforementioned informed spatial filtering techniques. In this case, the direct and
diffuse sounds as received at the left and right hearing aid can be estimated separately
(e.g., by changing the reference microphone), or the left and right output signals can be
generated using a gain function for the left and right hearing aid output, respectively, in a
similar way as the different loudspeaker or headphone signals are obtained in the previous embodiments.
In order to spatially separate the desired and undesired sounds, the acoustic zoom
explained in the aforementioned embodiments can be applied. In this case, the focus
point or focus direction determines the zoom factor.
Thus, according to an embodiment, a hearing aid or an assistive listening device may be
provided, wherein the hearing aid or an assistive listening device comprises a system as
described above, wherein the signal processor 105 of the above-described system
determines the direct gain for each of the one or more audio output signals, for example,
depending on a focus direction or a focus point.
In an embodiment, the signal processor 105 of the above-described system may, e.g., be
configured to receive zoom information. The signal processor 105 of the above-described
system may, e.g., be configured to generate each audio output signal of the one or more
audio output signals depending on a window gain function, wherein the window gain
function depends on the zoom information. The same concepts as explained with
reference to Fig. 7(a), 7(b) and 7(c) are employed.
If a window function argument, depending on the focus direction or on the focus point, is greater than a lower threshold and smaller than an upper threshold, the window gain function is configured to return a window gain that is greater than any window gain returned by the window gain function when the window function argument is smaller than the lower threshold or greater than the upper threshold.
For example, in case of the focus direction, the focus direction may itself be the window function argument (and thus, the window function argument depends on the focus direction). In case of the focus position, a window function argument may, e.g., be derived from the focus position.
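A minimal sketch of a window gain function with this threshold behavior follows (the thresholds and gain values are illustrative assumptions):

```python
def window_gain(argument, lower=-0.3, upper=0.3, inside=1.0, outside=0.2):
    """Window gain function: returns a gain inside (lower, upper) that is
    greater than any gain returned outside that interval, attenuating
    sources whose window-function argument lies outside the focus region."""
    return inside if lower < argument < upper else outside

print(window_gain(0.0))    # 1.0: within the focus window
print(window_gain(0.5))    # 0.2: outside the window, attenuated
```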
Similarly, the invention can be applied to other wearable devices, including assistive listening devices or devices such as Google Glass®. It should be noted that some wearable devices are also equipped with one or more cameras or a ToF sensor that can be used to estimate the distance of objects to the person wearing the device.
Although some aspects have been described in the context of an apparatus, it is clear that
these aspects also represent a description of the corresponding method, where a block or
device corresponds to a method step or a feature of a method step. Analogously, aspects
described in the context of a method step also represent a description of a corresponding
block or item or feature of a corresponding apparatus.
The inventive decomposed signal can be stored on a digital storage medium or can be
transmitted on a transmission medium such as a wireless transmission medium or a wired
transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be
implemented in hardware or in software. The implementation can be performed using a
digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an
EPROM, an EEPROM or a FLASH memory, having electronically readable control signals
stored thereon, which cooperate (or are capable of cooperating) with a programmable
computer system such that the respective method is performed.
Some embodiments according to the invention comprise a non-transitory data carrier
having electronically readable control signals, which are capable of cooperating with a
programmable computer system, such that one of the methods described herein is
performed.
Generally, embodiments of the present invention can be implemented as a computer
program product with a program code, the program code being operative for performing
one of the methods when the computer program product runs on a computer. The
program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods
described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program
having a program code for performing one of the methods described herein, when the
computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital
storage medium, or a computer-readable medium) comprising, recorded thereon, the
computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence
of signals representing the computer program for performing one of the methods
described herein. The data stream or the sequence of signals may for example be
configured to be transferred via a data communication connection, for example via the
Internet.
A further embodiment comprises a processing means, for example a computer, or a
programmable logic device, configured to or adapted to perform one of the methods
described herein.
A further embodiment comprises a computer having installed thereon the computer
program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable
gate array) may be used to perform some or all of the functionalities of the methods
described herein. In some embodiments, a field programmable gate array may cooperate
with a microprocessor in order to perform one of the methods described herein. Generally,
the methods are preferably performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of the present
invention. It is understood that modifications and variations of the arrangements and the
details described herein will be apparent to others skilled in the art. It is the intent,
therefore, to be limited only by the scope of the impending patent claims and not by the
specific details presented by way of description and explanation of the embodiments
herein.
References
[1] Y. Ishigaki, M. Yamamoto, K. Totsuka, and N. Miyaji, "Zoom microphone," in Audio Engineering Society Convention 67, Paper 1713, October 1980.
[2] M. Matsumoto, H. Naono, H. Saitoh, K. Fujimura, and Y. Yasuno, "Stereo zoom microphone for consumer video cameras," Consumer Electronics, IEEE Transactions on, vol. 35, no. 4, pp. 759-766, November 1989.
[3] T. van Waterschoot, W. J. Tirry, and M. Moonen, "Acoustic zooming by multi microphone sound scene manipulation," J. Audio Eng. Soc, vol. 61, no. 7/8, pp. 489-507, 2013.
[4] V. Pulkki, "Spatial sound reproduction with directional audio coding," J. Audio Eng. Soc, vol. 55, no. 6, pp. 503-516, June 2007.
[5] R. Schultz-Amling, F. Kuech, O. Thiergart, and M. Kallinger, "Acoustical zooming based on a parametric sound field representation," in Audio Engineering Society Convention 128, Paper 8120, London, UK, May 2010.
[6] O. Thiergart, G. Del Galdo, M. Taseska, and E. Habets, "Geometry-based spatial sound acquisition using distributed microphone arrays," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 21, no. 12, pp. 2583-2594, December 2013.
[7] K. Kowalczyk, O. Thiergart, A. Craciun, and E. A. P. Habets, "Sound acquisition in noisy and reverberant environments using virtual microphones," in Applications of Signal Processing to Audio and Acoustics (WASPAA), 2013 IEEE Workshop on, October 2013.
[8] O. Thiergart and E. A. P. Habets, "An informed LCMV filter based on multiple instantaneous direction-of-arrival estimates," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013, pp. 659-663.
[9] O. Thiergart and E. A. P. Habets, "Extracting reverberant sound using a linearly constrained minimum variance spatial filter," Signal Processing Letters, IEEE, vol. 21, no. 5, pp. 630-634, May 2014.
[10] R. Roy and T. Kailath, "ESPRIT-estimation of signal parameters via rotational invariance techniques," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 7, pp. 984-995, July 1989.
[11] B. Rao and K. Hari, "Performance analysis of root-MUSIC," in Signals, Systems and Computers, 1988. Twenty-Second Asilomar Conference on, vol. 2, 1988, pp. 578-582.
[12] H. Teutsch and G. Elko, "An adaptive close-talking microphone array," in Applications of Signal Processing to Audio and Acoustics, 2001 IEEE Workshop on the, 2001, pp. 163-166.
[13] O. Thiergart, G. D. Galdo, and E. A. P. Habets, "On the spatial coherence in mixed sound fields and its application to signal-to-diffuse ratio estimation," The Journal of the Acoustical Society of America, vol. 132, no. 4, pp. 2337-2346, 2012.
[14] V. Pulkki, "Virtual sound source positioning using vector base amplitude panning," J. Audio Eng. Soc, vol. 45, no. 6, pp. 456-466, 1997.
[15] J. Blauert, Spatial hearing, 3rd ed. Hirzel-Verlag, 2001.
[16] T. May, S. van de Par, and A. Kohlrausch, "A probabilistic model for robust localization based on a binaural auditory front-end," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 1, pp. 1-13, 2011.
[17] J. Ahonen, V. Sivonen, and V. Pulkki, "Parametric spatial sound processing applied to bilateral hearing aids," in AES 45th International Conference, Mar. 2012.
Claims
1. A system for generating one or more audio output signals, comprising:
a decomposition module (101),
a signal processor (105), and
an output interface (106),
wherein the decomposition module (101) is configured to receive two or more audio input signals, wherein the decomposition module (101) is configured to generate a direct component signal, comprising direct signal components of the two or more audio input signals, and wherein the decomposition module (101) is configured to generate a diffuse component signal, comprising diffuse signal components of the two or more audio input signals,
wherein the signal processor (105) is configured to receive the direct component
signal, the diffuse component signal and direction information, said direction
information depending on a direction of arrival of the direct signal components of
the two or more audio input signals,
wherein the signal processor (105) is configured to generate one or more
processed diffuse signals depending on the diffuse component signal,
wherein, for each audio output signal of the one or more audio output signals, the
signal processor (105) is configured to determine, depending on the direction of
arrival, a direct gain, the signal processor (105) is configured to apply said direct
gain on the direct component signal to obtain a processed direct signal, and the
signal processor (105) is configured to combine said processed direct signal and
one of the one or more processed diffuse signals to generate said audio output
signal, and
wherein the output interface (106) is configured to output the one or more audio
output signals.
2. A system according to claim 1,
wherein the signal processor (105) is configured to determine two or more audio
output signals,
wherein for each audio output signal of the two or more audio output signals a
panning gain function is assigned to said audio output signal,
wherein the panning gain function of each of the two or more audio output signals
comprises a plurality of panning function argument values, wherein a panning
function return value is assigned to each of said panning function argument values,
wherein, when said panning gain function receives one of said panning function
argument values, said panning gain function is configured to return the panning
function return value being assigned to said one of said panning function argument
values, and
wherein the signal processor (105) is configured to determine each of the two or
more audio output signals depending on a direction dependent argument value of
the panning function argument values of the panning gain function being assigned
to said audio output signal, wherein said direction dependent argument value
depends on the direction of arrival.
3. A system according to claim 2,
wherein the panning gain function of each of the two or more audio output signals
has one or more global maxima, being one of the panning function argument
values, wherein for each of the one or more global maxima of each panning gain
function, no other panning function argument value exists for which said panning
gain function returns a greater panning function return value than for said global
maxima, and
wherein, for each pair of a first audio output signal and a second audio output
signal of the two or more audio output signals, at least one of the one or more
global maxima of the panning gain function of the first audio output signal is
different from any of the one or more global maxima of the panning gain function of
the second audio output signal.
4. A system according to claim 2 or 3,
wherein the signal processor (105) is configured to generate each audio output
signal of the one or more audio output signals depending on a window gain
function,
wherein the window gain function is configured to return a window function return
value when receiving a window function argument value,
wherein, if the window function argument value is greater than a lower window
threshold and smaller than an upper window threshold, the window gain function is
configured to return a window function return value being greater than any window
function return value returned by the window gain function, if the window function
argument value is smaller than the lower threshold, or greater than the upper
threshold.
5. A system according to one of claims 2 to 4, wherein the signal processor (105) is
configured to further receive orientation information indicating an angular shift of a
look direction with respect to the direction of arrival, and wherein at least one of the
panning gain function and the window gain function depends on the orientation
information, or
wherein the gain function computation module (104) is configured to further
receive zoom information, wherein the zoom information indicates an opening
angle of a camera, and wherein at least one of the panning gain function and the
window gain function depends on the zoom information, or
wherein the gain function computation module (104) is configured to further
receive a calibration parameter, and wherein at least one of the panning gain
function and the window gain function depends on the calibration parameter.
6. A system according to one of the preceding claims,
wherein the signal processor (105) is configured to receive distance information,
wherein the signal processor (105) is configured to generate each audio output
signal of the one or more audio output signals depending on the distance
information.
7. A system according to claim 6,
wherein the signal processor (105) is configured to receive an original angle value
depending on an original direction of arrival, being the direction of arrival of the
direct signal components of the two or more audio input signals, and is configured
to receive the distance information,
wherein the signal processor (105) is configured to calculate a modified angle
value depending on the original angle value and depending on the distance
information, and
wherein the signal processor (105) is configured to generate each audio output
signal of the one or more audio output signals depending on the modified angle
value.
8. A system according to claim 6 or 7, wherein the signal processor (105) is
configured to generate the one or more audio output signals by conducting low
pass filtering, or by adding delayed direct sound, or by conducting direct sound
attenuation, or by conducting temporal smoothing, or by conducting direction of
arrival spreading, or by conducting decorrelation.
9. A system according to one of the preceding claims,
wherein the signal processor (105) is configured to generate two or more audio
output channels,
wherein the signal processor (105) is configured to apply the diffuse gain on the
diffuse component signal to obtain an intermediate diffuse signal, and
wherein the signal processor (105) is configured to generate one or more
decorrelated signals from the intermediate diffuse signal by conducting
decorrelation,
wherein the one or more decorrelated signals form the one or more processed
diffuse signals, or wherein the intermediate diffuse signal and the one or more
decorrelated signals form the one or more processed diffuse signals.
10. A system according to one of the preceding claims,
wherein the direct component signal and one or more further direct component
signals form a group of two or more direct component signals, wherein the
decomposition module (101) is configured to generate the one or
more further direct component signals comprising further direct signal components
of the two or more audio input signals,
wherein the direction of arrival and one or more further direction of arrivals form a
group of two or more direction of arrivals, wherein each direction of arrival of the
group of the two or more direction of arrivals is assigned to exactly one direct
component signal of the group of the two or more direct component signals,
wherein the number of the direct component signals of the two or more direct component signals and the number of the direction of arrivals of the two or more direction of arrivals is equal,
wherein the signal processor (105) is configured to receive the group of the two or
more direct component signals, and the group of the two or more direction of
arrivals, and
wherein, for each audio output signal of the one or more audio output signals,
the signal processor (105) is configured to determine, for each direct
component signal of the group of the two or more direct component signals,
a direct gain depending on the direction of arrival of said direct component
signal,
the signal processor (105) is configured to generate a group of two or more
processed direct signals by applying, for each direct component signal of
the group of the two or more direct component signals, the direct gain of
said direct component signal on said direct component signal, and
the signal processor (105) is configured to combine one of the one or more
processed diffuse signals and each processed signal of the group of the
two or more processed signals to generate said audio output signal.
11. A system according to claim 10, wherein the number of the direct component
signals of the group of the two or more direct component signals plus 1 is smaller
than the number of the audio input signals being received by the receiving
interface (101).
12. A hearing aid or an assistive listening device comprising a system according to
one of claims 1 to 11.
13. An apparatus for generating one or more audio output signals, comprising:
a signal processor (105), and
an output interface (106),
wherein the signal processor (105) is configured to receive a direct component
signal, comprising direct signal components of the two or more original audio
signals, wherein the signal processor (105) is configured to receive a diffuse
component signal, comprising diffuse signal components of the two or more
original audio signals, and wherein the signal processor (105) is configured to
receive direction information, said direction information depending on a direction of
arrival of the direct signal components of the two or more audio input signals,
wherein the signal processor (105) is configured to generate one or more
processed diffuse signals depending on the diffuse component signal,
wherein, for each audio output signal of the one or more audio output signals, the
signal processor (105) is configured to determine, depending on the direction of
arrival, a direct gain, the signal processor (105) is configured to apply said direct
gain on the direct component signal to obtain a processed direct signal, and the
signal processor (105) is configured to combine said processed direct signal and
one of the one or more processed diffuse signals to generate said audio output
signal, and
wherein the output interface (106) is configured to output the one or more audio
output signals.
14. A method for generating one or more audio output signals, comprising:
receiving two or more audio input signals,
generating a direct component signal, comprising direct signal components of the
two or more audio input signals,
generating a diffuse component signal, comprising diffuse signal components of
the two or more audio input signals,
receiving direction information depending on a direction of arrival of the direct
signal components of the two or more audio input signals,
generating one or more processed diffuse signals depending on the diffuse
component signal,
for each audio output signal of the one or more audio output signals, determining,
depending on the direction of arrival, a direct gain, applying said direct gain on the
direct component signal to obtain a processed direct signal, and combining said
processed direct signal and one of the one or more processed diffuse signals to
generate said audio output signal, and
outputting the one or more audio output signals.
15. A method for generating one or more audio output signals, comprising:
receiving a direct component signal, comprising direct signal components of the
two or more original audio signals,
receiving a diffuse component signal, comprising diffuse signal components of the
two or more original audio signals,
receiving direction information, said direction information depending on a direction
of arrival of the direct signal components of the two or more audio input signals,
generating one or more processed diffuse signals depending on the diffuse
component signal,
for each audio output signal of the one or more audio output signals, determining,
depending on the direction of arrival, a direct gain, applying said direct gain on the
direct component signal to obtain a processed direct signal, and combining
said processed direct signal and one of the one or more processed diffuse signals
to generate said audio output signal, and
outputting the one or more audio output signals.
16. A computer program for implementing the method of claim 14 or 15 when being
executed on a computer or signal processor.