Source Separation


Abstract: The present examples refer to methods, apparatus and techniques for obtaining a plurality of output signals associated with different sources (e.g. audio sources). In one example, it is possible to: combine a first input signal (502, M0), or a processed version thereof, with a delayed and scaled version (5031) of a second input signal (M1), to obtain a first output signal (504, S'0); and combine a second input signal (502, M1), or a processed version thereof, with a delayed and scaled version (5030) of the first input signal (M0), to obtain a second output signal (504, S'1). It is possible to determine, e.g. using a random direction optimization (560): scaling values (564, a0, a1), to obtain the delayed and scaled versions (5030) of the first and second input signals; and delay values (564, d0, d1), to obtain the delayed and scaled versions of the first and second input signals.


Patent Information

Application #

Filing Date

29 March 2022

Publication Number

22/2022

Publication Type

INA

Invention Field

ELECTRONICS

Status

Parent Application

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.

Hansastraße 27c 80686 München

TECHNISCHE UNIVERSITÄT ILMENAU

Ehrenbergstraße 29 98693 Ilmenau

Inventors

1. SCHULLER, Gerald

c/o Fraunhofer-Institut für Digitale Medientechnologie IDMT Ehrenbergstraße 31 98693 Ilmenau

Specification

Source Separation

Technical field

The present examples refer to methods and apparatus for obtaining a plurality of output signals associated with different sources (e.g. audio sources). The present examples also refer to methods and apparatus for signal separation. The present examples also refer to methods and apparatus for teleconferencing. Techniques for separation (e.g., audio source separation) are also disclosed. Techniques for fast time domain stereo audio source separation (e.g. using fractional delay filters) are also discussed.

Introductory discussion

Fig. 1 shows the setup of microphones indicated with 50a. The microphones 50a may include a first microphone mic0 and a second microphone mic1 which are here shown at a distance of 5 cm (50 mm) from each other. Other distances are possible. Two different sources (source0 and source1) are here shown. As identified by angles β0 and β1, they are placed in different positions (here also in different orientations with respect to each other).

A plurality of input signals M0 and M1 (from the microphones, also collectively indicated as a multi-channel, or stereo, input signal 502), are obtained from the sound source0 and source1. While source0 generates the audio sound indexed as S0, source1 generates the audio sound indexed as S1.

The microphone signals M0 and M1 may be considered, for example, as input signals. It is possible to consider a multi-channel with more than 2 channels instead of stereo signal 502.

The input signals may be more than two in some examples (e.g. other additional input channels besides M0 and M1), even though here only two channels are mainly discussed. Notwithstanding, the present examples are valid for any multi-channel input signal. In examples, it is also not necessary that the signals M0 and M1 are directly obtained by a microphone, since they may be obtained, for example, from a stored audio file.

Figs. 2a and 4 show the interactions between the sources source0 and source1 and the microphones mic0 and mic1. For example, source0 generates an audio sound S0, which primarily reaches the microphone mic0 and also reaches the microphone mic1. The same applies to source1, whose generated audio sound S1 primarily reaches the microphone mic1 and also reaches the microphone mic0. Figs. 2a and 4 show that the sound S0 needs less time to arrive at the microphone mic0 than to arrive at the microphone mic1. Analogously, the sound S1 needs less time to arrive at mic1 than to arrive at mic0. The intensity of the signal S0, when reaching the microphone mic1, is in general attenuated with respect to when reaching mic0, and vice versa.

Accordingly, in the multi-channel input signal 502, each of the channel signals M0 and M1 contains a combination of the signals S0 and S1 from source0 and source1. Separation techniques are therefore pursued.

Summary

Here below, text in square brackets and round brackets indicates non-limiting examples.

wherein the apparatus is configured to combine a first input signal [M0], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a1.z-d1.M1] of a second input signal [e.g. M1] [e.g. by subtracting the delayed and scaled version of the second input signal from the first input signal, e.g. by S’0 = M0(z) - a1.z-d1.M1(z)], to obtain a first output signal [S’0];

wherein the apparatus is configured to combine a second input signal [M1], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a0.z-d0.M0] of the first input signal [M0] [e.g. by subtracting the delayed and scaled version of the first input signal from the second input signal, e.g. by S’1 = M1(z) - a0.z-d0.M0(z)], to obtain a second output signal [S’1];

wherein the apparatus is configured to determine, using a random direction optimization [e.g. by performing one of operations defined in other claims, for example; and/or by finding the delay and attenuation values which minimize an objective function, which could be, for example that in formulas (6) and/or (8)]:

a first scaling value [a0], which is used to obtain the delayed and scaled version [a0.z-d0.M0] of the first input signal [M0];

a first delay value [d0], which is used to obtain the delayed and scaled version [a0.z-d0.M0] of the first input signal [M0];

a second scaling value [a1], which is used to obtain the delayed and scaled version [a1.z-d1.M1] of the second input signal [M1]; and

a second delay value [d1], which is used to obtain the delayed and scaled version of the second input signal [a1.z-d1.M1].

The delayed and scaled version [a1.z-d1.M1] of the second input signal [M1], which may be combined with the first input signal [M0], may be obtained by applying a fractional delay to the second input signal [M1].

The delayed and scaled version [a0.z-d0.M0] of the first input signal [M0], which may be combined with the second input signal [M1], may be obtained by applying a fractional delay to the first input signal [M0].
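As a minimal sketch of the cross-subtraction with fractional delays described above, the following NumPy code delays a signal by a non-integer number of samples with a windowed-sinc FIR filter and then forms S’0 and S’1. The function names, the filter length (33 taps) and the Hamming window are assumptions made for illustration; the text does not prescribe a particular fractional delay filter design.

```python
import numpy as np


def fractional_delay(x, delay, taps=33):
    """Delay x by `delay` samples (non-negative, may be non-integer).

    Assumed design: windowed-sinc interpolation kernel (odd number of taps).
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    x = np.asarray(x, dtype=float)
    center = (taps - 1) // 2
    int_part = int(np.floor(delay))
    frac = delay - int_part
    n = np.arange(taps)
    # Kernel peak sits at center + frac, so the kernel delays by center + frac.
    h = np.sinc(n - center - frac) * np.hamming(taps)
    h /= h.sum()  # unit gain at DC
    y = np.convolve(x, h)[center:center + len(x)]  # drop the kernel latency
    # Apply the integer part of the delay by shifting right with zeros.
    return np.concatenate([np.zeros(int_part), y])[:len(x)]


def demix(m0, m1, a0, a1, d0, d1):
    """S'0 = M0 - a1*z^(-d1)*M1 and S'1 = M1 - a0*z^(-d0)*M0 in the time domain."""
    s0_est = m0 - a1 * fractional_delay(m1, d1)
    s1_est = m1 - a0 * fractional_delay(m0, d0)
    return s0_est, s1_est


# Toy example: only source1 is active, and it leaks into mic0
# attenuated by 0.5 and delayed by 2 samples.
m1 = np.zeros(32)
m1[5] = 1.0
m0 = np.zeros(32)
m0[7] = 0.5
s0_est, s1_est = demix(m0, m1, a0=0.0, a1=0.5, d0=0.0, d1=2.0)
```

In this toy case the crosstalk in `m0` is cancelled, so `s0_est` is numerically zero while `s1_est` keeps the source1 impulse.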

The apparatus may sum a plurality of products [e.g., as in formula (6) or (8)] between:

- a respective element [Pi(n), with i being 0 or 1 ] of a first set of normalized magnitude values [e.g., as in formula (7)], and

- a logarithm of a quotient formed on the basis of:

o the respective element [P(n) or Pi(n)] of the first set of normalized magnitude values; and

o a respective element [Q(n) or Q1(n)] of a second set of normalized magnitude values,

in order to obtain a value [DKL(P||Q) or D(P0,P1) in formulas (6) or (8)] describing a similarity [or dissimilarity] between a signal portion [s0’(n)] described by the first set of normalized magnitude values [P0(n), for n=1 to ...] and a signal portion [s1’(n)] described by the second set of normalized magnitude values [P1(n), for n=1 to ...].

The random direction optimization may be such that candidate parameters form a candidates’ vector [e.g., with four entries, e.g. corresponding to a0, a1, d0, d1], wherein the vector is iteratively refined [e.g., in different iterations, see also below] by modifying the vector in random directions.

The random direction optimization may be such that a metric and/or a value indicating the similarity (or dissimilarity) between the first and second output signals is measured, and the first and second output signals are selected to be those associated with the candidate parameters for which the value or metric indicates the lowest similarity (or highest dissimilarity).

At least one of the first and second scaling values and first and second delay values may be obtained by minimizing the mutual information or related measure of the output signals.

In accordance with an aspect, there is provided an apparatus for obtaining a plurality of output signals [S’0, S’1] associated with different sound sources [source0, source1] on the basis of a plurality of input signals [e.g. microphone signals] [M0, M1], in which signals from the sound sources [source0, source1] are combined,

wherein the apparatus is configured to combine a first input signal [M0], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a1.z-d1.M1] of a second input signal [M1], to obtain a first output signal [S’0], wherein the apparatus is configured to apply a fractional delay [d1] to the second input signal [M1] [wherein the fractional delay (d1) may be indicative of the relationship and/or difference between the delay (e.g. delay represented by H1,0) of the signal (H1,0.S1) arriving at the first microphone (mic0) from the second source (source1) and the delay (e.g. delay represented by H1,1) of the signal (H1,1.S1) arriving at the second microphone (mic1) from the second source (source1)] [in examples, the fractional delay d1 may be understood as approximating the exponent of the z term of the result of the fraction H1,0(z)/H1,1(z)];

wherein the apparatus is configured to combine a second input signal [M1], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a0.z-d0.M0] of the first input signal [M0], to obtain a second output signal [S’1], wherein the apparatus is configured to apply a fractional delay [d0] to the first input signal [M0] [wherein the fractional delay (d0) may be indicative of the relationship and/or difference between the delay (e.g. delay represented by H0,0) of the signal (H0,0.S0) arriving at the first microphone (mic0) from the first source (source0) and the delay (e.g. delay represented by H0,1) of the signal (H0,1.S0) arriving at the second microphone (mic1) from the first source (source0)][ in examples, the fractional delay d0 may be understood as approximating the exponent of the z term of the result of the fraction H0,1(z)/H0,0(z)];

wherein the apparatus is configured to determine, using an optimization:

a first scaling value [a0], which is used to obtain the delayed and scaled version [a0.z-d0.M0] of the first input signal [M0];

a first fractional delay value [d0], which is used to obtain the delayed and scaled version [a0.z-d0.M0] of the first input signal [M0];

a second scaling value [a1], which is used to obtain the delayed and scaled version [a1.z-d1.M1] of the second input signal [M1]; and

a second fractional delay value [d1], which is used to obtain the delayed and scaled version [a1.z-d1.M1] of the second input signal [M1].

The optimization may be a random direction optimization.

The apparatus may sum a plurality of products [e.g., as in formula (6) or (8)] between:

- a respective element [Pi(n), with i being 0 or 1] of a first set of normalized magnitude values [e.g., as in formula (7)], and

- a logarithm of a quotient formed on the basis of:

o the respective element [P(n) or P1(n)] of the first set of normalized magnitude values; and

o a respective element [Q(n) or Q1(n)] of a second set of normalized magnitude values,

in order to obtain a value [DKL(P||Q) or D(P0,P1) in formulas (6) or (8)] describing a similarity [or dissimilarity] between a signal portion [s0’(n)] described by the first set of normalized magnitude values [P0(n), for n=1 to ...] and a signal portion [s1’(n)] described by the second set of normalized magnitude values [P1(n), for n=1 to ...].

In accordance to an aspect, there is provided an apparatus [e.g. a multichannel or stereo audio source separation apparatus] for obtaining a plurality of output signals [S’0, S’1] associated with different sound sources [source0, source1] on the basis of a plurality of input signals [e.g. microphone signals][M0, M1], in which signals from the sound sources are combined [e.g. by subtracting a delayed and scaled version of a second input signal from a first input signal and/or by subtracting a delayed and scaled version of a first input signal from a second input signal],

wherein the apparatus is configured to combine a first input signal [M0], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a1.z-d1.M1] of a second input signal [M1] [e.g. by subtracting the delayed and scaled version of the second input signal from the first input signal], to obtain a first output signal [S’0], wherein the apparatus is configured to combine a second input signal [M1], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a0.z-d0.M0] of the first input signal [M0] [e.g. by subtracting the delayed and scaled version of the first input signal from the second input signal], to obtain a second output signal [ S’1], wherein the apparatus is configured to sum a plurality of products [e.g., as in formula (6) or (8)] between:

- a respective element [Pi(n), with i being 0 or 1] of a first set of normalized magnitude values [e.g., as in formula (7)], and

- a logarithm of a quotient formed on the basis of:

o the respective element [P(n) or P1(n)] of the first set of normalized magnitude values; and

o a respective element [Q(n) or Q1(n)] of a second set of normalized magnitude values,

in order to obtain a value [DKL(P||Q) or D(P0,P1) in formulas (6) or (8)] describing a similarity [or dissimilarity] between a signal portion [s0’(n)] described by the first set of normalized magnitude values [P0(n), for n=1 to ...] and a signal portion [s1’(n)] described by the second set of normalized magnitude values [P1(n), for n=1 to ...].

The apparatus may determine:

a first scaling value [a0], which is used to obtain the delayed and scaled version of the first input signal [M0],

a first delay value [d0], which is used to obtain the delayed and scaled version of the first input signal,

a second scaling value [a1], which is used to obtain the delayed and scaled version of the second input signal, and

a second delay value [d1], which is used to obtain the delayed and scaled version of the second input signal, using an optimization [e.g. on the basis of a “modified KLD computation”].

The first delay value [d0] may be a fractional delay. The second delay value [d1] may be a fractional delay.

The optimization may be a random direction optimization.

The apparatus may perform at least some of the processes in the time domain. The apparatus may perform at least some of the processes in the z transform or frequency domain.

The apparatus may be configured to:

combine the first input signal [M0], or a processed [e.g. delayed and/or scaled] version thereof, with the delayed and scaled version [a1.z-d1.M1] of the second input signal [M1] in the time domain and/or in the z transform or frequency domain;

combine the second input signal [M1], or a processed [e.g. delayed and/or scaled] version thereof, with the delayed and scaled version [a0.z-d0.M0] of the first input signal [M0] in the time domain and/or in the z transform or frequency domain.

The optimization may be performed in the time domain and/or in the z transform or frequency domain.

The fractional delay (d0) applied to the first input signal [M0] may be indicative of the relationship and/or difference of arrival between:

the signal [S0.H0,0(z)] from the first source [source0] received by the first microphone [mic0]; and

the signal [S0.H0,1(z)] from the first source [source0] received by the second microphone [mic1].

The fractional delay (d1) applied to the second input signal [M1] may be indicative of the relationship and/or difference of arrival between:

the signal [S1.H1,1(z)] from the second source [source1] received by the second microphone [mic1]; and

the signal [S1.H1,0(z)] from the second source [source1] received by the first microphone [mic0].

The apparatus may perform an optimization [e.g., the optimization such that different candidate parameters [a0, a1, d0, d1] are iteratively chosen and processed, and a metric [e.g., as in formula (6) or (8)] [e.g. on the basis of a “modified KLD computation”] [e.g., objective function] is measured for each of the candidate parameters, wherein the metric is a similarity metric (or dissimilarity metric), so as to choose the first output signal [S’0] and the second output signal [S’1] obtained by using the candidate parameters [a0, a1, d0, d1] which are associated with the metric indicating the lowest similarity (or largest dissimilarity)]. [The similarity may be imagined as a statistical dependency between the first and second output signals (or values associated thereto, such as those in formula (7)), and/or the dissimilarity may be imagined as a statistical independence between the first and second output signals (or values associated thereto, such as those in formula (7)).]

For each iteration, the candidate parameters may include a candidate delay (d0) [e.g., a candidate fractional delay] to be applied to the first input signal [M0], the candidate delay (d0) being associated with a candidate relationship and/or candidate difference of arrival between:

the signal [S0.H0,0(z)] from the first source [source0] received by the first microphone [mic0]; and

the signal [S0.H0,1(z)] from the first source [source0] received by the second microphone [mic1].

For each iteration, the candidate parameters may include a candidate delay (d1) [e.g., a candidate fractional delay] to be applied to the second input signal [M1], the candidate delay (d1) being associated with a candidate relationship and/or candidate difference of arrival between:

the signal [S1. H1,1(z)] from the second source [source1] received by the second microphone [mic1]; and

the signal [S1. H1,0(z)] from the second source [source1] received by the first microphone [mic0].

For each iteration, the candidate parameters may include a candidate relative attenuation value [a0] to be applied to the first input signal [M0], the candidate relative attenuation value [a0] being indicative of a candidate relationship and/or candidate difference between:

the amplitude of the signal [S0.H0,0(z)] received by the first microphone [mic0] from the first source [source0]; and

the amplitude of the signal [S0.H0,1(z)] received by the second microphone [mic1] from the first source [source0].

For each iteration, the candidate parameters may include a candidate relative attenuation value [a1] to be applied to the second input signal [M1], the candidate relative attenuation value [a1] being indicative of a candidate relationship and/or candidate difference between:

the amplitude of the signal [S1. H1,1(z)] received by the second microphone [mic1] from the second source [source1]; and

the amplitude of the signal [S1. H1,0(z)] received by the first microphone [mic0] from the second source [source1].

The apparatus may change at least one candidate parameter for different iterations by randomly choosing at least one step from at least one candidate parameter for a preceding iteration to at least one candidate parameter for a subsequent iteration [e.g., random direction optimization].

The apparatus may choose the at least one step [e.g., coeffvariation in line 10 of algorithm 1] randomly [e.g., random direction optimization].

The at least one step may be weighted by a preselected weight [e.g. coeffweights in line 5 of algorithm 1].

The at least one step may be limited by a preselected weight [e.g. coeffweights in line 5 of algorithm 1].

The apparatus may be so that the candidate parameters [a0, a1, d0, d1 ] form a candidates’ vector, wherein, for each iteration, the candidates’ vector is perturbed [e.g., randomly] by applying a vector of uniformly distributed random numbers [e.g., each between -0.5 and +0.5], which are element-wise multiplied by (or added to) the elements of the candidates’ vector. [it is possible to avoid gradient processing] [e.g., random direction optimization].

For each iteration, the candidates’ vector is modified (e.g., perturbed) for a step [e.g., which is each between -0.5 and +0.5].

The apparatus may be so that the number of iterations is limited to a predetermined maximum number, the predetermined maximum number being between 10 and 30 (e.g., 20, as in subsection 2.3, last three lines).
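The perturbation step described above can be sketched as follows. This is an illustrative sketch only: it uses the additive variant (the text also allows element-wise multiplication), and the function name `perturb`, the starting vector and the weight values are hypothetical, standing in for the role that coeffweights plays in algorithm 1.

```python
import numpy as np


def perturb(candidates, step_weights, rng):
    """Return a randomly perturbed copy of the candidates' vector.

    Each entry moves by a uniform random number in [-0.5, 0.5),
    scaled by the preselected weight for that entry (additive variant).
    """
    candidates = np.asarray(candidates, dtype=float)
    step = rng.uniform(-0.5, 0.5, size=candidates.shape)
    return candidates + np.asarray(step_weights, dtype=float) * step


rng = np.random.default_rng(0)
best = np.array([1.0, 1.0, 1.0, 1.0])      # hypothetical [a0, a1, d0, d1]
weights = np.array([0.1, 0.1, 1.0, 1.0])   # hypothetical per-entry step limits
trial = perturb(best, weights, rng)
```

Because each random number lies between -0.5 and +0.5, the weight for an entry bounds how far that entry can move in a single iteration.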

The metrics may be processed as a Kullback-Leibler divergence.

The metrics may be based on:

for each of the first and second signals [M0, M1], a respective element [Pi(n), with i being 0 or 1] of a first set of normalized magnitude values [e.g., as in formula (7)]. [A trick may be: considering the normalized magnitude values of the time domain samples as probability distributions, and after that measuring the metric (e.g., as the Kullback-Leibler divergence, e.g. as obtained through formula (6) or (8)).]

For at least one of the first and second input signals [M0, M1], the respective element [Pi(n)] may be based on the candidate first or second outputs signal [S’0, S’1] as obtained from the candidate parameters [e.g., like in formula (7)].

For at least one of the first and second input signals [M0, M1], the respective element [Pi(n)] may be obtained as a fraction between:

a value [e.g., absolute value] associated to a candidate first or second output signal [S’0(n), S’1(n)] [e.g., in absolute value]; and

a norm [e.g., 1-norm] associated to the previously obtained values of the first or second output signal [S’0(...n-1), S’1(...n-1)].

For at least one of the first and second input signals [M0, M1], the respective element [Pi(n)] may be obtained by

$$P_i(n) = \frac{|s'_i(n)|}{\lVert s'_i \rVert_1}$$

(Here, “s’i(n)” and “s’i” are written without capital letters by virtue of not being, in this case, z transforms).
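The fraction described above (absolute value of each sample divided by the 1-norm of the signal) can be sketched in a few lines. The function name and the small `eps` guard against an all-zero signal are assumptions added for illustration and are not part of the text.

```python
import numpy as np


def normalized_magnitudes(s, eps=1e-12):
    """Map time-domain samples s'_i(n) to values P_i(n) that sum to about 1.

    P_i(n) = |s'_i(n)| / ||s'_i||_1, with eps (assumed) avoiding division by zero.
    """
    mag = np.abs(np.asarray(s, dtype=float))
    return mag / (mag.sum() + eps)


p = normalized_magnitudes([1.0, -2.0, 1.0])
```

The resulting values are non-negative and sum to one, which is what allows them to be treated like a probability distribution when the metric is evaluated.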

The metrics may include a logarithm of a quotient formed on the basis of:

o the respective element [P(n) or P1(n)] of the first set of normalized magnitude values; and

o a respective element [Q(n) or Q1 (n)] of a second set of normalized magnitude values,

in order to obtain a value [DKL(P||Q) or D(P0,P1) in formulas (6) or (8)] describing a similarity [or dissimilarity] between a signal portion [s0’(n)] described by the first set of normalized magnitude values [P0(n), for n=1 to ...] and a signal portion [s1’(n)] described by the second set of normalized magnitude values [P1(n), for n=1 to ...].

The metrics may be obtained in form of:

$$D_{KL}(P \| Q) = \sum_n P(n) \log\frac{P(n)}{Q(n)}$$

wherein P(n) is an element associated to the first input signal [e.g., P1(n) or element of the first set of normalized magnitude values] and Q(n) is an element associated to the second input signal [e.g., P2(n) or element of the second set of normalized magnitude values].
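For illustration, the standard Kullback-Leibler divergence (a sum of products between P(n) and the logarithm of the quotient P(n)/Q(n)) can be computed as in the sketch below. The function name and the `eps` term that keeps the logarithm finite are assumptions. The sketch shows the textbook form only, not the “modified KLD computation” mentioned elsewhere in the text.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Textbook D_KL(P||Q) = sum_n P(n) * log(P(n) / Q(n)).

    eps (assumed) is added to both inputs so zero entries do not produce
    log(0) or a division by zero.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))


same = kl_divergence([0.5, 0.5], [0.5, 0.5])
different = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

Identical distributions give a value of zero, and the value grows as the two sets of normalized magnitude values become more dissimilar.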

The metrics may be obtained in form of:

wherein P1(n) is an element associated to the first input signal [e.g., P1(n) or element of the first set of normalized magnitude values] and P2(n) is an element associated to the second input signal [e.g., element of the second set of normalized magnitude values].

The apparatus may perform the optimization using a sliding window [e.g., the optimization may take into account TD samples of the last 0.1s...1.0s].

The apparatus may transform, into a frequency domain, information associated to the obtained first and second output signals (S’0, S’1).

The apparatus may encode information associated to the obtained first and second output signals (S’0, S’1).

The apparatus may store information associated to the obtained first and second output signals (S’0, S’1).

The apparatus may transmit information associated to the obtained first and second output signals (S’0, S’1).

The apparatus may include at least one of a first microphone (mic0) for obtaining the first input signal [M0] and a second microphone (mic1) for obtaining the second input signal [M1] [e.g., at a fixed distance].

An apparatus for teleconferencing may be provided, including the apparatus as above and equipment for transmitting information associated to the obtained first and second output signals (S’0, S’1).

A binaural system is disclosed including the apparatus as above.

An optimizer is disclosed for iteratively optimizing physical parameters associated to physical signals, wherein the optimizer is configured, at each iteration, to randomly generate a current candidate vector for evaluating whether the current candidate vector performs better than a current best candidate vector,

wherein the optimizer is configured to evaluate an objective function associated to a similarity, or dissimilarity, between physical signals, in association to the current candidate vector,

wherein the optimizer is configured so that, in case the current candidate vector causes the objective function to be reduced with respect to the current best candidate vector, to render, as the new current best candidate vector, the current candidate vector.
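The optimizer behaviour described above (randomly generate a candidate vector, evaluate the objective function, keep the candidate only if the objective is reduced) can be sketched as a short loop. This is a sketch under stated assumptions: the function name, the additive uniform perturbation, the default of 20 iterations and the quadratic toy objective are illustrative stand-ins, not the objective of formulas (6) or (8).

```python
import numpy as np


def random_direction_search(objective, start, step_weights, max_iter=20, seed=0):
    """Keep a best candidate vector and try random perturbations of it.

    A candidate replaces the current best only if it lowers the objective.
    No gradient is computed.
    """
    rng = np.random.default_rng(seed)
    step_weights = np.asarray(step_weights, dtype=float)
    best = np.asarray(start, dtype=float)
    best_value = objective(best)
    for _ in range(max_iter):
        candidate = best + step_weights * rng.uniform(-0.5, 0.5, size=best.shape)
        value = objective(candidate)
        if value < best_value:  # accept only improvements
            best, best_value = candidate, value
    return best, best_value


# Toy objective (assumed): squared distance to a hypothetical optimum.
target = np.array([0.8, 0.8, 2.0, 2.0])


def toy_objective(v):
    return float(np.sum((v - target) ** 2))


start = np.ones(4)
best, best_value = random_direction_search(toy_objective, start, np.ones(4))
```

Since a candidate is accepted only when it lowers the objective, the returned value can never be worse than the value at the starting vector.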

The physical signal may include audio signals obtained by different microphones.

The parameters may include a delay and/or a scaling factor for an audio signal obtained at a particular microphone.

The objective function may be a Kullback-Leibler divergence. The Kullback-Leibler divergence may be applied to first and second sets of normalized magnitude values.

The objective function may be obtained by summing a plurality of products [e.g., as in formula (6) or (8)] between:

- a respective element [Pi(n), with i being 0 or 1] of a first set of normalized magnitude values [e.g., as in formula (7)], and

- a logarithm of a quotient formed on the basis of:

o the respective element [P(n) or P1(n)] of the first set of normalized magnitude values; and

o a respective element [Q(n) or Q1 (n)] of a second set of normalized magnitude values,

in order to obtain a value [DKL(P||Q) or D(P0,P1) in formulas (6) or (8)] describing a similarity [or dissimilarity] between a signal portion [s0’(n)] described by the first set of normalized magnitude values [P0(n), for n=1 to ...] and a signal portion [s1’(n)] described by the second set of normalized magnitude values [P1(n), for n=1 to ...].

The objective function may be obtained as

wherein P1(n) or P(n) is an element associated to the first input signal [e.g., P1(n) or element of the first set of normalized magnitude values] and P2(n) or Q(n) is an element associated to the second input signal.

the method comprising:

combining a first input signal [M0], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a1.z-d1.M1] of a second input signal [M1] [e.g. by subtracting the delayed and scaled version of the second input signal from the first input signal, e.g. by S’0 = M0(z) - a1.z-d1.M1(z)], to obtain a first output signal [S’0];

combining a second input signal [M1], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a0.z-d0.M0] of the first input signal [M0] [e.g. by subtracting the delayed and scaled version of the first input signal from the second input signal, e.g. by S’1 = M1(z) - a0.z-d0.M0(z)], to obtain a second output signal [S’1]; and

determining, using a random direction optimization [e.g. by performing one of operations defined in other claims, for example; and/or by finding the delay and attenuation values which minimize an objective function, which could be, for example that in formulas (6) and/or (8)]:

a first scaling value [a0], which is used to obtain the delayed and scaled version [a0*z-d0*M0] of the first input signal [M0];

a first delay value [d0], which is used to obtain the delayed and scaled version [a0*z-d0*M0] of the first input signal [M0];

a second scaling value [a1], which is used to obtain the delayed and scaled version [a1*z-d1*M1] of the second input signal [M1]; and

a second delay value [d1], which is used to obtain the delayed and scaled version of the second input signal [a1*z-d1*M1].

In accordance with an example, there is provided a method for obtaining a plurality of output signals [S’0, S’1] associated with different sound sources [source0, source1] on the basis of a plurality of input signals [e.g. microphone signals] [M0, M1], in which signals from the sound sources [source0, source1] are combined,

the method including

combining a first input signal [M0], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a1*z-d1*M1] of a second input signal [M1], to obtain a first output signal [S’0], wherein the method includes applying a fractional delay [d1] to the second input signal [M1] [wherein the fractional delay (d1) may be indicative of the relationship and/or difference between the delay (e.g. delay represented by H1,0) of the signal (H1,0*S1) arriving at the first microphone (mic0) from the second source (source1) and the delay (e.g. delay represented by H1,1) of the signal (H1,1*S1) arriving at the second microphone (mic1) from the second source (source1)] [in examples, the fractional delay d1 may be understood as approximating the exponent of the z term of the result of the fraction H1,0(z)/H1,1(z)];

combining a second input signal [M1], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a0*z-d0*M0] of the first input signal [M0], to obtain a second output signal [S’1], wherein the method includes applying a fractional delay [d0] to the first input signal [M0] [wherein the fractional delay (d0) may be indicative of the relationship and/or difference between the delay (e.g. delay represented by H0,0) of the signal (H0,0*S0) arriving at the first microphone (mic0) from the first source (source0) and the delay (e.g. delay represented by H0,1) of the signal (H0,1*S0) arriving at the second microphone (mic1) from the first source (source0)] [in examples, the fractional delay d0 may be understood as approximating the exponent of the z term of the result of the fraction H0,1(z)/H0,0(z)];

determining, using an optimization:

a first scaling value [a0], which is used to obtain the delayed and scaled version [a0*z-d0*M0] of the first input signal [M0];

a first fractional delay value [d0], which is used to obtain the delayed and scaled version [a0*z-d0*M0] of the first input signal [M0];

a second scaling value [a1], which is used to obtain the delayed and scaled version [a1*z-d1*M1] of the second input signal [M1]; and

a second fractional delay value [d1], which is used to obtain the delayed and scaled version [a1*z-d1*M1] of the second input signal [M1].

In accordance to an example, there is provided a method for obtaining a plurality of output signals [S’0, S’1] associated with different sound sources [source0, source1] on the basis of a plurality of input signals [e.g. microphone signals][M0, M1], in which signals from the sound sources are combined [e.g. by subtracting a delayed and scaled version of a second input signal from a first input signal and/or by subtracting a delayed and scaled version of a first input signal from a second input signal],

combining a first input signal [M0], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a1*z-d1*M1] of a second input signal [M1] [e.g. by subtracting the delayed and scaled version of the second input signal from the first input signal], to obtain a first output signal [S’0],

combining a second input signal [M1], or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version [a0*z-d0*M0] of the first input signal [M0] [e.g. by subtracting the delayed and scaled version of the first input signal from the second input signal], to obtain a second output signal [S’1],

summing a plurality of products [e.g., as in formula (6) or (8)] between:

- a respective element [Pi(n), with i being 0 or 1] of a first set of normalized magnitude values [e.g., as in formula (7)], and

- a logarithm of a quotient formed on the basis of:

o the respective element [P(n) or P1(n)] of the first set of normalized magnitude values; and

o a respective element [Q(n) or Q1(n)] of a second set of normalized magnitude values,

In accordance with an example, there is provided a method of any of the preceding method claims, configured to use equipment as above or below.

A non-transitory storage unit storing instructions which, when executed by a processor, cause the processor to perform a method according to any of the preceding method claims.

Brief description of the figures

Fig. 1 shows a layout of microphones and sources useful to understand the invention;

Fig. 2a shows a functioning technique according to the present invention;

Fig. 2b shows a signal block diagram of a convolutive mixing and demixing process;

Fig. 3 shows performance evaluation of BSS algorithm applied to simulated data;

Fig. 4 shows a layout of microphones and sound sources useful to understand the invention;

Fig. 5 shows an apparatus according to the invention;

Figs. 6a, 6b and 6c show results obtainable with the invention; and

Fig. 7 shows elements of the apparatus of Fig. 5.

Description of examples

It has been understood that, by applying techniques such as those discussed above and below, a signal may be processed so as to arrive at a plurality of signals S’0 and S’1 which are separated from each other. The result is that the output signal S’1 is not affected (or is only negligibly or minimally affected) by the sound S0, while the output signal S’0 is not affected (or is only minimally or negligibly affected) by the effects of the sound S1 onto the microphone mic0.

An example is provided by Fig. 2b, showing a model of the physical relationships between the generated sounds S0 and S1 and the signal 502 as collectively obtained from the microphones mic0 and mic1 (input signals M0 and M1). The results are here represented in the z transform (which in some cases is not indicated for the sake of brevity). As can be seen from block 501, the sound signal S0 is subjected to a transfer function H0,0(z), and the result is summed to the sound signal S1 (modified by a transfer function H1,0(z)). Accordingly, the signal M0(z) is obtained at microphone mic0 and unwantedly includes a contribution from the sound signal S1(z). Analogously, the signal M1(z) as obtained at the microphone mic1 includes both a component associated to the sound signal S1(z) (obtained through a transfer function H1,1(z)) and a second, unwanted component caused by the sound signal S0(z) (after having been subjected to the transfer function H0,1(z)). This phenomenon is called crosstalk.

In order to compensate for the crosstalk, the solution indicated at block 510 may be exploited. Here, the multi-channel output signal 504 includes:

• a first output signal S’0 (representing the sound S0 collected at microphone mic0 but cleaned of the crosstalk), which includes at least the two components:

o the input signal M0 and

o a subtractive component 5031 (which is a delayed and/or scaled version of the signal M1 and which may be obtained by subjecting the signal M1 to the transfer function -a1·z^-d1)

• a second output signal S’1 (representing the sound S1 collected at microphone mic1 but cleaned of the crosstalk), which includes:

o the input signal M1 and

o a subtractive component 5030 (which is a delayed and/or scaled version of the first input signal M0 as obtained at the microphone mic0 and which may be obtained by subjecting the signal M0 to the transfer function -a0·z^-d0).

The mathematical explanations are provided below, but it may be understood that the subtractive components 5031 and 5030 at block 510 compensate for the unwanted components caused at block 501. It is therefore clear that block 510 permits obtaining a plurality (504) of output signals (S’0, S’1), associated with different sound sources (source0, source1), on the basis of a plurality (502) of input signals [e.g. microphone signals] (M0, M1), in which signals (S0, S1) from the sound sources (source0, source1) are (unwantedly) combined (501). The block 510 may be configured to combine (510) the first input signal (M0), or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version (5031) [a1·z^-d1·M1] of the second input signal (M1) [e.g. by subtracting the delayed and scaled version of the second input signal from the first input signal, e.g. by S’0(z) = M0(z) - a1·z^-d1·M1(z)], to obtain a first output signal (S’0); wherein the block is configured to combine (510) a second input signal (M1), or a processed [e.g. delayed and/or scaled] version thereof, with a delayed and scaled version (5030) [a0·z^-d0·M0] of the first input signal (M0) [e.g. by subtracting the delayed and scaled version of the first input signal from the second input signal, e.g. by S’1(z) = M1(z) - a0·z^-d0·M0(z)], to obtain a second output signal (S’1).

While the z transform is particularly useful in this case, it is nevertheless possible to make use of other kinds of transforms or to operate directly in the time domain.
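By way of a non-limiting illustration, the combination performed at block 510 may be sketched directly in the time domain as follows. The sketch assumes a Hann-windowed sinc FIR filter for the fractional delay; this particular filter, the function names `fractional_delay` and `unmix`, and the parameter `taps` are illustrative assumptions and are not prescribed by the present examples.

```python
import numpy as np

def fractional_delay(x, delay, taps=33):
    """Delay x by a non-negative, possibly non-integer number of samples.

    Assumption: a Hann-windowed sinc FIR filter realizes the fractional part.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    x = np.asarray(x, dtype=float)
    int_part = int(np.floor(delay))
    frac = delay - int_part
    half = taps // 2
    k = np.arange(taps) - half
    h = np.sinc(k - frac) * np.hanning(taps)   # sinc shifted by the fractional part
    h /= h.sum()                               # unit gain at DC
    y = np.convolve(x, h)[half:half + len(x)]  # compensate the filter's own latency
    out = np.zeros_like(y)
    if int_part < len(x):
        out[int_part:] = y[:len(x) - int_part]  # integer part as a plain shift
    return out

def unmix(m0, m1, a0, a1, d0, d1):
    """S'0 = M0 - a1*z^-d1*M1 and S'1 = M1 - a0*z^-d0*M0, in the time domain."""
    s0_est = np.asarray(m0, dtype=float) - a1 * fractional_delay(m1, d1)
    s1_est = np.asarray(m1, dtype=float) - a0 * fractional_delay(m0, d0)
    return s0_est, s1_est
```

In this sketch, each output is obtained by subtracting from one input signal the delayed and scaled version of the other input signal, which mirrors the subtractive components 5031 and 5030.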

Basically, it may be understood that a couple of scaling values a0 and a1 modify the amplitude of the subtractive components 5030 and 5031, respectively, to obtain a scaled version of the input signals, and the delays d0 and d1 may be understood as fractional delays. The fractional delay d0 may be indicative of the relationship and/or difference between the delay (e.g. delay represented by H0,0) of the signal (H0,0·S0) arriving at the first microphone (mic0) from the first source (source0) and the delay (e.g. delay represented by H0,1) of the signal (H0,1·S0) arriving at the second microphone (mic1) from the first source (source0). In examples, the fractional delay d0 may be understood as approximating the exponent of the z term of the result of the fraction H0,1(z)/H0,0(z). The fractional delay d1 may be indicative of the relationship and/or difference between the delay (e.g. delay represented by H1,0) of the signal (H1,0·S1) arriving at the first microphone (mic0) from the second source (source1) and the delay (e.g. delay represented by H1,1) of the signal (H1,1·S1) arriving at the second microphone (mic1) from the second source (source1). In examples, the fractional delay d1 may be understood as approximating the exponent of the z term of the result of the fraction H1,0(z)/H1,1(z).
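The relationship between the delays and the fractions of transfer functions may be made explicit by substituting the mixing model of block 501 into the combinations of block 510. The following is a sketch under the assumption that each fraction of transfer functions is well approximated by a pure attenuation-and-delay term:

```latex
\begin{aligned}
M_0(z) &= H_{0,0}(z)\,S_0(z) + H_{1,0}(z)\,S_1(z),\\
M_1(z) &= H_{0,1}(z)\,S_0(z) + H_{1,1}(z)\,S_1(z),\\[4pt]
S'_0(z) &= M_0(z) - a_1 z^{-d_1} M_1(z)\\
        &= \bigl(H_{0,0}(z) - a_1 z^{-d_1} H_{0,1}(z)\bigr)\,S_0(z)
         + \bigl(H_{1,0}(z) - a_1 z^{-d_1} H_{1,1}(z)\bigr)\,S_1(z),\\[4pt]
S'_1(z) &= M_1(z) - a_0 z^{-d_0} M_0(z)\\
        &= \bigl(H_{0,1}(z) - a_0 z^{-d_0} H_{0,0}(z)\bigr)\,S_0(z)
         + \bigl(H_{1,1}(z) - a_0 z^{-d_0} H_{1,0}(z)\bigr)\,S_1(z).
\end{aligned}
```

Under this assumption, the contribution of S1 vanishes from S'0 when a1·z^-d1 equals H1,0(z)/H1,1(z), and the contribution of S0 vanishes from S'1 when a0·z^-d0 equals H0,1(z)/H0,0(z). Each output then contains only a filtered version of its own source.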

As it will be explained subsequently, it is possible to find the most preferable values (also collectively indicated with the reference numeral 564), in particular:

• a first scaling value [a0], e.g., which is used to obtain the delayed and scaled version 5030 [a0·z^-d0·M0] of the first input signal [502, M0];

• a first fractional delay value [d0], e.g., which is used to obtain the delayed and scaled version 5030 [a0·z^-d0·M0] of the first input signal [502, M0];

• a second scaling value [a1], e.g., which is used to obtain the delayed and scaled version 5031 [a1·z^-d1·M1] of the second input signal [502, M1]; and

• a second fractional delay value [d1], e.g., which is used to obtain the delayed and scaled version 5031 [a1·z^-d1·M1] of the second input signal [502, M1].

Techniques for obtaining the most preferable scaling values a0 and a1 and delay values d0 and d1 are here discussed, particularly with reference to Fig. 5. As can be seen from Fig. 5, a stereo or multiple channel signal 502 (including the input signals M0(z) and M1(z)) is obtained. As can be seen, the method may be iterative, in the sense that it is possible to cycle along multiple iterations for obtaining the best values of the scaling values and the delay values to be adopted.

Fig. 5 shows an output 504 formed by signals S’0 and S’1 which are optimized, e.g. after multiple iterations. Fig. 5 shows the mixing block 510, which may be the block 510 of Fig. 2b.

The multichannel signal 512 (including its channel components) is thus obtained at block 510 by making use of scaling values a0 and a1 and delay values d0 and d1, which are more and more optimized along the iterations.

At block 520, normalizations are performed on the signals output by the mixing block 510. An example of normalization is provided by formula (7), represented as the following quotient:

$$P_i(n) = \frac{|s'_i(n)|}{\|s'_i\|_1} \qquad (7)$$

Here, i = 0,1, indicating that there is a normalized value P0(n) for the input signal M0 and a normalized value P1(n) for the input signal M1. The index n is the time index of the time domain input signal. Here, s'i(n) is the time domain sample (it is not a z transform) of the signal Mi (with i=0, 1). |s'i(n)| indicates that the magnitude (e.g. absolute value) of s'i(n) is obtained and is therefore positive or, at worst, 0. This implies that the numerator in formula (7) is positive or, at worst, 0. ‖s'i‖1 indicates that the denominator in formula (7) is formed by the 1-norm of the vector s'i. The 1-norm indicates the sum of the magnitudes |s'i(n)|, where n goes over the signal samples, e.g. up to the present index (e.g., the signal samples may be taken within a predetermined window from a past index to the present index). Hence, ‖s'i‖1 (which is the denominator in formula (7)) is positive (or is 0 in some cases). Moreover, it is always |s'i(n)| ≤ ‖s'i‖1, which implies that 0≤Pi(n)≤1 (i=0,1). Further, also the following is verified:

$$\sum_{n} P_i(n) = 1$$

It has been therefore noted that P0(n) and P1(n) can be artificially considered as probabilities since, by adopting equation (7), they verify:

1. Pi(n) ≥ 0, ∀n

2. $\sum_{n=0}^{\infty} P_i(n) = 1$

with i = 0,1 (further discussion is provided here below). “∞” is used for mathematical formalism, but can be approximated over the considered signal.

It is noted that other kinds of normalizations may be provided, and not only those obtained through formula (7).
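A normalization of the kind of formula (7) may, for instance, be sketched as follows. The function name `normalize_magnitudes` and the small constant `eps` (added only to avoid a division by zero for an all-zero signal) are illustrative assumptions.

```python
import numpy as np

def normalize_magnitudes(s, eps=1e-12):
    """Return the magnitudes of s divided by their sum (the 1-norm of s).

    The result is non-negative and sums to (approximately) 1, so that it can
    be treated like a probability distribution over the sample index n.
    eps is an assumption: it only guards against an all-zero input.
    """
    mag = np.abs(np.asarray(s, dtype=float))
    return mag / (mag.sum() + eps)

# Example: three samples with magnitudes 1, 2 and 1.
p = normalize_magnitudes([1.0, -2.0, 1.0])
```

In this example, `p` is approximately `[0.25, 0.5, 0.25]`: the sign of the samples is discarded and the values sum to one.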

Fig. 5 shows block 530, which is input by the normalized values 522 and outputs a similarity value (or dissimilarity value) 532, giving information on the relationship between the first and second input signals M0 and M1. Block 530 may be understood as a block which measures a metrics that gives an indication of how much the input signals M0 and M1 are similar (or dissimilar) to each other.

It has been understood that the metrics chosen for indicating the similarity or dissimilarity between the first and second input signals may be the so-called Kullback-Leibler Divergence (KLD). This can be obtained using formulas (6) or (8).

A discussion on how to obtain the Kullback-Leibler Divergence (KLD) is now provided. Fig. 7 shows an example of block 530 downstream of block 520 of Fig. 5. Block 520 therefore provides P0(n) and P1(n) (522), e.g. using the formula (7) as discussed above (other techniques may be used). Block 530 (which may be understood as a Kullback-Leibler processor or KL processor) may be adapted to obtain a metrics 532, which is in this case the Kullback-Leibler Divergence as calculated in formula (8).

With reference to Fig. 7, at a first branch 700a, a quotient 702’ between P0(n) and P1(n) is calculated at block 702. At block 706, a logarithm of the quotient 702’ is calculated, hence obtaining the value 706’. Then, the logarithm value 706’ may be used for scaling the normalized value P0 at scaling block 710, hence obtaining a product 710’. At a second branch 700b, a quotient 704’ between P1(n) and P0(n) is calculated at block 704. The logarithm 708’ of the quotient 704’ is calculated at block 708. Then, the logarithm value 708’ is used for scaling the normalized value P1 at scaling block 712, hence obtaining the product 712’.

At adder block 714, the values 710’ and 712’ (as respectively obtained at branches 700a and 700b) are combined to each other. The combined values 714’ are summed with each other and along the sample domain indexes at block 716. The added values 716’ may be inverted at block 718 (e.g., scaled by -1) to obtain the inverted value 718’. It is to be noted that, while the value 716’ can be understood as a similarity value, the inverted value 718' can be understood as a dissimilarity value. Either the value 716’ or the value 718’ may be provided as metrics 532 to the optimizer 560 as explained above (value 716’ indicating similarity, value 718’ indicating dissimilarity).

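Hence, the block 530 may permit to arrive at formula (8), i.e. (the leading minus sign corresponding to the inversion at block 718):

$$D(P_0, P_1) = -\sum_{n} \left[ P_0(n) \log\frac{P_0(n)}{P_1(n)} + P_1(n) \log\frac{P_1(n)}{P_0(n)} \right] \qquad (8)$$

In order to arrive at formula (6), e.g.

$$D_{KL}(P \| Q) = \sum_{n} P(n) \log\frac{P(n)}{Q(n)} \qquad (6)$$

it could simply be possible to eliminate, from Fig. 7, blocks 704, 708, 712 and 714, and substitute P0 with P and P1 with Q.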
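The processing of Fig. 7 may be sketched as follows. The sketch assumes the Kullback-Leibler form of formula (6) and a symmetric, sign-inverted combination of the two branches 700a and 700b for formula (8); the function names and the small constant `eps` (which keeps the logarithm finite when an element is zero) are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Sum over n of p(n) * log(p(n) / q(n)), as in one branch of Fig. 7."""
    p = np.asarray(p, dtype=float) + eps   # eps: assumption, avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def negative_symmetric_kld(p0, p1, eps=1e-12):
    """Add both branches (blocks 714, 716) and invert the sign (block 718).

    The more the two normalized signals differ, the lower this value, so it
    can serve as an objective function to be minimized.
    """
    return -(kl_divergence(p0, p1, eps) + kl_divergence(p1, p0, eps))

# Identical distributions give (approximately) zero; different ones give a negative value.
same = negative_symmetric_kld([0.5, 0.5], [0.5, 0.5])
different = negative_symmetric_kld([0.9, 0.1], [0.1, 0.9])
```

Here, `same` is approximately zero and `different` is negative, which illustrates why minimizing the inverted value favours candidate parameters that yield dissimilar output signals.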

The Kullback-Leibler divergence was originally conceived for giving measurements regarding probabilities and is, in principle, unrelated to the physical significance of the input signals M0 and M1. Notwithstanding, the inventors have understood that, by normalizing the signals S0 and S1 and obtaining normalized values such as P0(n) and P1(n), the Kullback-Leibler Divergence provides a valid metrics for measuring the similarity/dissimilarity between the input signals M0 and M1. Hence, it is possible to consider the normalized magnitude values of the time domain samples as probability distributions, and after that, it is possible to measure the metrics (e.g., as the Kullback-Leibler divergence, e.g. as obtained through formula (6) or (8)).

Reference is now made to Fig. 5 again. For each iteration, the metrics 532 provides a good estimate of the validity of the scaling values a0 and a1 and the delay values d0 and d1. Along the iterations, different candidate values for the scaling values a0 and a1 and the delay values d0 and d1 are evaluated, and those candidates which present the lowest similarity (or highest dissimilarity) will be chosen.

Block 560 (optimizer) is input by the metrics 532 and outputs candidates 564 (vector) for the delay values d0 and d1 and the scaling values a0 and a1. The optimizer 560 may measure the different metrics obtained for different groups of candidates a0, a1, d0, d1, change them, and choose the group of candidates associated to the lowest similarity (or highest dissimilarity) 532. Hence, the output 504 (output signals S’0, S’1) will provide the best approximation. The candidate values 564 may be grouped in a vector, which can be subsequently modified, for example, through a random technique (Fig. 5 shows a random generator 540 providing a random input 542 to the optimizer 560). The optimizer 560 may make use of weights through which the candidate values 564 (a0, a1, d0, d1) are scaled (e.g., randomly). Initial coefficient weights 562 may be provided, e.g., by default. An example of processing of the optimizer 560 is provided and discussed in detail below (“algorithm 1”). Possible correspondences between the lines of the algorithm and elements of Fig. 5 are also shown in Fig. 5.

As may be seen, the optimizer 560 outputs a vector 564 of values a0, a1, d0, d1, which are subsequently reused at the mixing block 510 for obtaining new values 512, new normalized values 522, and new metrics 532. The iterations may end after a certain number of iterations (which could be, for example, predefined); a maximum number of iterations may be, for example, a number chosen between 10 and 20. Basically, the optimizer 560 may be understood as finding the delay and scaling values which minimize an objective function, which could be, for example, the metrics 532 obtained at block 530 and/or using formulas (6) and (8).

It has been therefore understood that the optimizer 560 may be based on a random direction optimization technique, such that candidate parameters form a candidates’ vector [e.g., with four entries, e.g. corresponding to 564, a0, a1, d0, d1 ], wherein the candidates’ vector is iteratively refined by modifying the candidates’ vector in random directions.
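A random direction optimization of this kind may be sketched as follows. The sketch is generic and is not a reproduction of “algorithm 1”: the function name `random_direction_search`, the uniform random steps, the step weights, the starting vector and the toy objective are all illustrative assumptions. In an actual application, the objective function would evaluate the metrics 532 on the signals obtained with the candidate vector (a0, a1, d0, d1).

```python
import numpy as np

def random_direction_search(objective, start, step_weights, iterations=20, seed=0):
    """Minimize objective(vector) by randomly perturbing the best vector so far."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(step_weights, dtype=float)
    best = np.asarray(start, dtype=float)
    best_value = objective(best)
    for _ in range(iterations):
        # Random direction: each entry moves by at most half of its weight.
        step = weights * rng.uniform(-0.5, 0.5, size=best.shape)
        candidate = best + step
        value = objective(candidate)
        if value < best_value:  # keep the candidate only if it performs better
            best, best_value = candidate, value
    return best, best_value

# Toy objective (assumption): squared distance to a hypothetical optimum.
target = np.array([0.8, 0.6, 2.5, 3.5])

def toy_objective(v):
    return float(np.sum((v - target) ** 2))

best, value = random_direction_search(
    toy_objective,
    start=[1.0, 1.0, 1.0, 1.0],          # initial candidates (a0, a1, d0, d1)
    step_weights=[0.1, 0.1, 1.0, 1.0],   # smaller steps for scalings than for delays
    iterations=200,
)
```

Since a candidate vector replaces the current best vector only when it lowers the objective function, the value of the objective function never increases along the iterations.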

Claims

1. An apparatus (500) for obtaining a plurality of output signals (504, S’0, S’1), associated with different sound sources (source0, source1), on the basis of a plurality of input signals (502, M0, M1), in which signals (S0, S1) from the sound sources (source0, source1) are combined (501),

wherein the apparatus is configured to combine (510) a first input signal (502, M0), or a processed version thereof, with a delayed and scaled version (5031) of a second input signal (M1), to obtain a first output signal (504, S’0);

wherein the apparatus is configured to combine (510) a second input signal (502, M1), or a processed version thereof, with a delayed and scaled version (5030) of the first input signal (M0), to obtain a second output signal (504, S'1);

wherein the apparatus is configured to determine, using a random direction optimization (560):

a first scaling value (564, a0), which is used to obtain the delayed and scaled version (5030) of the first input signal (502, M0);

a first delay value (564, d0), which is used to obtain the delayed and scaled version (5030) of the first input signal (502, M0);

a second scaling value (564, a1), which is used to obtain the delayed and scaled version (5031) of the second input signal (502, M1); and

a second delay value (564, d1), which is used to obtain the delayed and scaled version (5031) of the second input signal.

2. The apparatus of claim 1, wherein the delayed and scaled version (5031) of the second input signal (502, M1), to be combined with the first input signal (502, M0), is obtained by applying a fractional delay to the second input signal (502, M1).

3. The apparatus of claim 1 or 2, wherein the delayed and scaled version (5030) of the first input signal (502, M0), to be combined with the second input signal (502, M1), is obtained by applying a fractional delay to the first input signal (502, M0).

4. The apparatus of any of the preceding claims, configured to sum (714, 716) a plurality of products (712’, 710’) between:

- a respective element (P0) of a first set of normalized magnitude values, and

- a logarithm (706’) of a quotient (702’) formed on the basis of:

o the respective element (P0) of the first set of normalized magnitude values (522); and

o a respective element (P1) of a second set of normalized magnitude values (522),

in order to obtain (530) a value (DKL, D, 532) describing a similarity, or dissimilarity, between a signal portion (s’0) described by the first set of normalized magnitude values (P0) and a signal portion (s’1) described by the second set of normalized magnitude values (P1).

5. The apparatus of any of the preceding claims, wherein the random direction optimization (560) is such that candidate parameters form a candidates’ vector (564, a0, a1, d0, d1), wherein the candidates’ vector is iteratively refined by modifying the candidates’ vector in random directions.

6. The apparatus of any of the preceding claims, wherein the random direction optimization (560) is such that candidate parameters form a candidates’ vector, wherein the candidates’ vector is iteratively refined by modifying the candidates’ vector in random directions.

7. The apparatus of any of the preceding claims, wherein the random direction optimization (560) is such that a metrics and/or a value (DKL, D, 532) indicating the similarity, or dissimilarity, between the first and second output signals is measured, and the first and second output measurements are selected to be those measurements associated to the candidate parameters associated to the value or metrics indicating lowest similarity, or highest dissimilarity.

8. An apparatus (500) for obtaining a plurality of output signals (504, S’0, S’1), associated with different sound sources (source1, source2), on the basis of a plurality of input signals (502, M0, M1), in which signals from the sound sources (source1, source2) are combined (501),

wherein the apparatus is configured to combine (510) a first input signal (502, M0), or a processed version thereof, with a delayed and scaled version (5031) of a second input signal (502, M1), to obtain a first output signal (504, S’0), wherein the apparatus is configured to apply a fractional delay (d1) to the second input signal (502, M1);

wherein the apparatus is configured to combine (510) a second input signal (502, M1), or a processed version thereof, with a delayed and scaled version (5030) of the first input signal (502, M0), to obtain a second output signal (504, S’1), wherein the apparatus is configured to apply a fractional delay (d0) to the first input signal (502, M0);

wherein the apparatus is configured to determine, using an optimization (560):

a first scaling value (564, a0), which is used to obtain the delayed and scaled version (5030) of the first input signal (502, M0);

a first fractional delay value (564, d0), which is used to obtain the delayed and scaled version (5030) of the first input signal (502, M0);

a second scaling value (564, a1), which is used to obtain the delayed and scaled version (5031) of the second input signal (502, M1); and

a second fractional delay value (564, d1), which is used to obtain the delayed and scaled version (5031) of the second input signal (502, M1).

9. The apparatus according to claim 8, wherein the optimization is a random direction optimization (560).

10. The apparatus of any of claims 8 or 9, configured to sum (714, 716) a plurality of products (710, 712) between:

- a respective element (P0) of a first set of normalized magnitude values, and

- a logarithm (706’) of a quotient (702’) formed on the basis of:

o the respective element (P0) of the first set of normalized magnitude values; and

o a respective element (P1) of a second set of normalized magnitude values,

in order to obtain (530) a value (DKL, D, 532) describing a similarity, or dissimilarity, between a signal portion (s0’) described by the first set of normalized magnitude values (P0) and a signal portion (s1’) described by the second set of normalized magnitude values (P1).

11. An apparatus (500) for obtaining a plurality of output signals (504, S’0, S’1) associated with different sound sources (source0, source1) on the basis of a plurality of input signals (M0, M1), in which signals from the sound sources are combined,

wherein the apparatus is configured to sum (714, 716) a plurality of products between:

- a respective element (P0) of a first set of normalized magnitude values, and

- a logarithm (706’) of a quotient (702’) formed on the basis of:

o the respective element (P0) of the first set of normalized magnitude values; and

o a respective element (P1) of a second set of normalized magnitude values,

in order to obtain (530) a value (DKL, D, 532) describing a similarity, or dissimilarity, between a signal portion (s0’) described by the first set of normalized magnitude values (P0) and a signal portion (s1’) described by the second set of normalized magnitude values.

12. The apparatus of claim 11, configured to determine at least one of:

a first scaling value (564, a0), which is used to obtain the delayed and scaled version of the first input signal (502, M0),

a first delay value (564, d0), which is used to obtain the delayed and scaled version of the first input signal,

a second scaling value (564, a1), which is used to obtain the delayed and scaled version of the second input signal, and

a second delay value (564, d1), which is used to obtain the delayed and scaled version of the second input signal, using an optimization.

13. The apparatus of claim 12, wherein the first delay value (d0) is a fractional delay.

14. The apparatus of any of claims 11 to 13, wherein the second delay value (d1) is a fractional delay.

15. The apparatus of any of claims 12 to 14, wherein the optimization is a random direction optimization (560).

16. The apparatus of any of the preceding claims, wherein at least one of the first and second scaling values and first and second delay values is obtained by minimizing the mutual information or related measure of the output signals.

17. The apparatus of any of the preceding claims, further comprising an optimizer (560) for iteratively performing the optimization, wherein the optimizer is configured, at each iteration, to randomly generate a current candidate vector for evaluating whether the current candidate vector performs better than a current best candidate vector,

wherein the optimizer is configured to evaluate an objective function associated to a similarity, or dissimilarity, between physical signals, in association to the current candidate vector,

18. The apparatus of any of the preceding claims, configured to:

combine the first input signal (502, M0), or a processed version thereof, with the delayed and scaled version (5031) of the second input signal (502, M1) in the time domain and/or in the z transform or frequency domain;

combine the second input signal (502, M1), or a processed version thereof, with the delayed and scaled version (5030) of the first input signal (502, M0) in the time domain and/or in the z transform or frequency domain.

19. The apparatus of any of the preceding claims,

wherein the optimization is performed in the time domain and/or in the z transform or frequency domain.

20. The apparatus of any of the preceding claims, wherein the delay or fractional delay (d0) applied to the second input signal (502, M1) is indicative of the relationship and/or difference or arrival between:

the signal from the first source (source0) received by the first microphone (mic0); and

the signal from the first source (source0) received by the second microphone (mic1).

21. The apparatus of any of the preceding claims, wherein the delay or fractional delay (d1) applied to the first input signal (502, M0) is indicative of the relationship and/or difference or arrival between:

the signal from the second source (source1) received by the second microphone (mic1); and

the signal from the second source (source1) received by the first microphone (mic0).

22. The apparatus of any of the preceding claims, configured to perform an optimization (560) such that different candidate parameters (a0, a1, d0, d1) are iteratively chosen and processed, and a metrics (532) is measured for each of the candidate parameters, wherein the metrics (532) is a similarity metrics, or dissimilarity metrics, so as to process and combine the first input signal (502, M0) and the second input signal (502, M1) by using the candidate parameters (a0, a1, d0, d1) associated to the metrics indicating the lowest similarity of the output signals, or largest dissimilarity.

23. The apparatus of claim 22, wherein, for each iteration, the candidate parameters include a candidate delay (d0) to be applied to the second input signal (502, M1), the candidate delay (d0) being associable to a candidate relationship and/or candidate difference or arrival between:

the signal from the first source (source0) received by the first microphone (mic0); and

the signal from the first source (source0) received by the second microphone (mic1).

24. The apparatus of claim 22 or 23, wherein, for each iteration, the candidate parameters include a candidate delay (d1) to be applied to the first input signal (502, M0), the candidate delay (d1) being associable to a candidate relationship and/or candidate difference or arrival between:

the signal from the second source (source1) received by the second microphone (mic1); and

the signal from the second source (source1) received by the first microphone (mic0).

25. The apparatus of claim 22 or 23 or 24, wherein, for each iteration, the candidate parameters include a candidate relative attenuation value (564, a0) to be applied to the second input signal (502, M1), the candidate relative attenuation value (564, a0) being indicative of a candidate relationship and/or candidate difference between:

the amplitude of the signal received by the first microphone (mic0) from the first source (source0); and

the amplitude of the signal received by the second microphone (mic1) from the first source (source0).

26. The apparatus of claim 22 or 23 or 24 or 25, wherein, for each iteration, the candidate parameters (564) include a candidate relative attenuation value (a1) to be applied to the first input signal (502, M0), the candidate relative attenuation value (a1) being indicative of a candidate relationship and/or candidate difference between:

the amplitude of the signal received by the second microphone (mic1) from the second source (source1); and

the amplitude of the signal received by the first microphone (mic0) from the second source (source1).

27. The apparatus of any of claims 22 to 26, configured to change at least one candidate parameter for different iterations.

28. The apparatus of any of claims 22 to 27, configured to change at least one candidate parameter for different iterations by randomly choosing at least one step from at least one candidate parameter for a preceding iteration to at least one candidate parameter for a subsequent iteration.

29. The apparatus of claim 28, configured to choose the at least one step randomly.

30. The apparatus of claim 29, wherein at least one step is weighted by a preselected weight.

31. The apparatus of claim 29 or 30, wherein the at least one step is limited by a preselected weight.

32. The apparatus of any of claims 22 to 31, wherein candidate parameters (a0, a1, d0, d1) form a candidates’ vector, wherein, for each iteration, the candidates’ vector is perturbed by applying a vector of random numbers, which are element-wise multiplied by, or added to, the elements of the candidates’ vector.

33. The apparatus of claim 32, wherein, for each iteration, the candidates' vector is modified for a step.

34. The apparatus of any of claims 22 to 33, wherein the number of iterations is limited to a predetermined maximum number.

35. The apparatus of any of claims 7 and 22 to 34, wherein the metrics (532) is processed as a Kullback-Leibler divergence.

36. The apparatus of any of claims 7 and 22 to 35, wherein the metrics (532) is processed using a Kullback-Leibler divergence.

37. The apparatus of any of claims 7 and 22 to 36, wherein the metrics is so that, the less the crosstalk, the higher its value.

38. The apparatus of any of claims 22 to 37, wherein the metrics (532) is based on:

for each of the first and second signals (M0, M1), a respective element of a set of normalized magnitude values.

39. The apparatus of claim 38, wherein, for at least one of the first and second input signals (M0, M1), the respective element is based on the candidate first or second output signal (S’0, S’1) as obtained from the candidate parameters.

40. The apparatus of claim 39, wherein for at least one of the first and second input signals (M0, M1), the respective element is obtained as a fraction between:

a value associated to a candidate first or second output signal (S’0, S’1); and a norm associated to the previously obtained values of the first or second output signal.

41. The apparatus of claim 39 or 40, wherein for at least one of the first and second input signals (M0, M1), the respective element is obtained by

$$P_i(n) = \frac{|s'_i(n)|}{\|s'_i\|_1}$$

42. The apparatus of claim 41, wherein Pi(n) is comprised between 0 and 1.

43. The apparatus of any of claims 22 to 42, wherein the metrics includes a logarithm of a quotient formed on the basis of:

o the respective element of the first set of normalized magnitude values; and

o a respective element of a second set of normalized magnitude values, in order to obtain (530) a value (532) between a signal portion (s0’) described by the first set of normalized magnitude values (P0) and a signal portion (s1’) described by the second set of normalized magnitude values (P1).

44. The apparatus of any of claims 7 or 22 to 43, wherein the metrics is obtained in form of:

$$D_{KL}(P \| Q) = \sum_{n} P(n) \log\frac{P(n)}{Q(n)}$$

wherein P(n) is an element associated to the first input signal and Q(n) is an element associated to the second input signal.

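45. The apparatus of any of claims 22 to 43, wherein the metrics is obtained in form of:

$$D(P_1, P_2) = -\sum_{n} \left[ P_1(n) \log\frac{P_1(n)}{P_2(n)} + P_2(n) \log\frac{P_2(n)}{P_1(n)} \right]$$

wherein P1(n) is an element associated to the first input signal and P2(n) is an element associated to the second input signal.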

46. The apparatus of any of claims 22 to 45, configured to perform the optimization using a sliding window.

47. The apparatus of any of the preceding claims, further configured to transform, into a frequency domain, information associated to the obtained first and second output signals (S’0, S’1).

48. The apparatus of any of the preceding claims, further configured to encode information associated to the obtained first and second output signals (S’0, S’1).

49. The apparatus of any of the preceding claims, further configured to store information associated to the obtained first and second output signals (S’0, S’1).

50. The apparatus of any of the preceding claims, further configured to transmit information associated to the obtained first and second output signals (S’0, S’1).

51. The apparatus of any of the preceding claims, further comprising at least one of a first microphone (mic0) for obtaining the first input signal (502, M0) and a second microphone (mic1) for obtaining the second input signal (502, M1).

52. An apparatus for teleconferencing including the apparatus of any of the preceding claims and equipment for transmitting information associated to the obtained first and second output signals (S’0, S’1).

53. A binaural system including the apparatus of any of the preceding claims.

54. An optimizer (560) for iteratively optimizing physical parameters associated to physical signals, wherein the optimizer is configured, at each iteration, to randomly generate a current candidate vector for evaluating whether the current candidate vector performs better than a current best candidate vector,

wherein the optimizer is configured to evaluate an objective function associated to a similarity, or dissimilarity, between physical signals, in association to the current candidate vector,

55. The optimizer of claim 54, wherein the physical signals include audio signals obtained by different microphones.

56. The optimizer of claim 54 or 55, wherein the parameters include a delay and/or a scaling factor for an audio signal obtained at a particular microphone.

57. The optimizer of any of claims 54 to 56, wherein the objective function is a Kullback-Leibler divergence.

58. The optimizer of claim 57, wherein the Kullback-Leibler divergence is applied to a first and a second set of normalized magnitude values.

59. The optimizer of claim 57 or 58, wherein the objective function is obtained by summing (714, 716) a plurality of products (712’, 710’) between:

- a respective element (P0) of a first set of normalized magnitude values, and

- a logarithm (706’) of a quotient (702’) formed on the basis of:

o the respective element (P0) of the first set of normalized magnitude values (522); and

o a respective element (P1) of a second set of normalized magnitude values (522),

in order to obtain (530) a value (DKL, D, 532) describing a similarity, or dissimilarity, between a signal portion (S’0) described by the first set of normalized magnitude values (P0) and a signal portion (S’1) described by the second set of normalized magnitude values (P1).

60. The optimizer of claim 57 or 58 or 59, wherein the objective function is obtained as

wherein P1(n) or P(n) is an element associated to the first input signal and P2(n) or Q(n) is an element associated to the second input signal.
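
By way of a non-limiting sketch, the iterative optimizer of claims 54 to 60 can be pictured as a loop that randomly perturbs the current best parameter vector and keeps the perturbed vector only when it scores better. The function `random_direction_search`, the Gaussian perturbation, the fixed step sizes, and the quadratic `toy_score` (a hypothetical stand-in for a real similarity or dissimilarity measure between signals) are assumptions made for this example, not the claimed optimizer itself.

```python
import random


def random_direction_search(score, start, step_sizes, iterations=2000, seed=0):
    """Maximize `score` by trying random perturbations of the best vector."""
    rng = random.Random(seed)
    best = list(start)
    best_score = score(best)
    for _ in range(iterations):
        # Randomly generate the current candidate vector around the current best.
        candidate = [b + rng.gauss(0.0, s) for b, s in zip(best, step_sizes)]
        candidate_score = score(candidate)
        # Keep the candidate only if it performs better than the current best.
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(vector):
    """Hypothetical objective that peaks at scaling 0.7 and delay 3.2."""
    scaling, delay = vector
    return -((scaling - 0.7) ** 2 + (delay - 3.2) ** 2)


best_vector, best_score = random_direction_search(
    toy_score, start=[1.0, 0.0], step_sizes=[0.1, 0.5]
)
```

In a separation setting, the vector would instead hold the scaling and delay values, and the score would be computed from the candidate output signals obtained with them.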

61. A method for obtaining a plurality of output signals (504, S’0, S’1) associated with different sound sources (source0, source1) on the basis of a plurality of input signals (502, M0, M1), in which signals from the sound sources (source0, source1) are combined, the method comprising:

combining a first input signal (502, M0), or a processed version thereof, with a delayed and scaled version (5031) of a second input signal (502, M1), to obtain a first output signal (504, S’0);

combining a second input signal (502, M1), or a processed version thereof, with a delayed and scaled version (5030) of the first input signal (502, M0), to obtain a second output signal (504, S’1);

determining, using a random direction optimization (560), at least one of:

a first scaling value (564, a0), which is used to obtain the delayed and scaled version (5030) of the first input signal (502, M0);

a first delay value (564, d0), which is used to obtain the delayed and scaled version (5030) of the first input signal (502, M0);

a second scaling value (564, a1), which is used to obtain the delayed and scaled version (5031) of the second input signal (502, M1); and

a second delay value (564, d1), which is used to obtain the delayed and scaled version (5031) of the second input signal.

62. A method for obtaining a plurality of output signals (504, S’0, S’1), associated with different sound sources (source0, source1), on the basis of a plurality of input signals (502, M0, M1), in which signals from the sound sources (source0, source1) are combined, the method including:

combining (510) a first input signal (502, M0), or a processed version thereof, with a delayed and scaled version (5031) of a second input signal (502, M1), to obtain a first output signal (504, S’0), wherein the method is configured to apply a fractional delay (d1) to the second input signal (502, M1);

combining (510) a second input signal (502, M1), or a processed version thereof, with a delayed and scaled version (5030) of the first input signal (502, M0), to obtain a second output signal (504, S’1), wherein the method is configured to apply a fractional delay (d0) to the first input signal (502, M0);

determining, using an optimization, at least one of:

a first scaling value (564, a0), which is used to obtain the delayed and scaled version (5030) of the first input signal (502, M0);

a first fractional delay value (564, d0), which is used to obtain the delayed and scaled version (5030) of the first input signal (502, M0);

a second scaling value (564, a1), which is used to obtain the delayed and scaled version (5031) of the second input signal (502, M1); and

a second fractional delay value (564, d1), which is used to obtain the delayed and scaled version (5031) of the second input signal (502, M1).

63. A method for obtaining a plurality of output signals (504, S’0, S’1) associated with different sound sources (source0, source1) on the basis of a plurality of input signals (M0, M1), in which signals from the sound sources are combined, the method comprising:

combining a first input signal (502, M0), or a processed version thereof, with a delayed and scaled version (5031) of a second input signal (502, M1), to obtain a first output signal (504, S’0),

combining a second input signal (502, M1), or a processed version thereof, with a delayed and scaled version (5030) of the first input signal (502, M0), to obtain a second output signal (504, S’1),

summing (714, 716) a plurality of products between:

- a respective element (P0) of a first set of normalized magnitude values, and

- a logarithm (706’) of a quotient (702’) formed on the basis of:

o the respective element (P0) of the first set of normalized magnitude values; and

o a respective element (P1, Q) of a second set of normalized magnitude values,

in order to obtain (530) a value (532) describing a similarity, or dissimilarity, between a signal portion (S’0) described by the first set of normalized magnitude values (P0) and a signal portion (S’1) described by the second set of normalized magnitude values (P1).

64. A method of any of claims 61-63, configured to use equipment of any of the preceding product claims.

65. A method according to any of claims 61-64, wherein the fractional delay (d1) is indicative of the relationship and/or difference between the delay of the signal (M0) arriving at the first microphone (mic0) from the second source (source1) and the delay (H1,1) of the signal (M1) arriving at the second microphone (mic1) from the second source (source1).

66. An optimizing method for iteratively optimizing physical parameters associated to physical signals, wherein the method includes, for each iteration, randomly generating a current candidate vector for evaluating whether the current candidate vector performs better than a current best candidate vector,

wherein the optimizer is configured to evaluate an objective function associated to a similarity, or dissimilarity, between physical signals, in association to the current candidate vector,

67. A non-transitory storage unit storing instructions which, when executed by a processor, cause the processor to perform a method according to any of the claims 61-66.
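
For illustration only, the combining steps of claims 61 to 63 may be sketched as follows. The sketch assumes that "combining" means subtracting the delayed and scaled version of the other input signal, and it uses linear interpolation as a simple stand-in for a fractional delay filter. The function names `fractional_delay` and `separate`, and the toy mixing coefficients, are hypothetical.

```python
import math


def fractional_delay(signal, delay):
    """Delay `signal` by a non-negative, possibly fractional, number of samples.

    Assumption: linear interpolation between neighbouring samples, with
    zeros before the start of the signal.
    """
    whole = math.floor(delay)
    frac = delay - whole
    out = []
    for n in range(len(signal)):
        i = n - whole
        x0 = signal[i] if 0 <= i < len(signal) else 0.0
        x1 = signal[i - 1] if 0 <= i - 1 < len(signal) else 0.0
        out.append((1.0 - frac) * x0 + frac * x1)
    return out


def separate(m0, m1, a0, a1, d0, d1):
    """Return (s0, s1) from inputs M0, M1 with scalings a0, a1 and delays d0, d1."""
    delayed_m1 = fractional_delay(m1, d1)
    delayed_m0 = fractional_delay(m0, d0)
    # First output: M0 minus the delayed and scaled version of M1.
    s0 = [x - a1 * y for x, y in zip(m0, delayed_m1)]
    # Second output: M1 minus the delayed and scaled version of M0.
    s1 = [x - a0 * y for x, y in zip(m1, delayed_m0)]
    return s0, s1


# Toy mixture: only source1 is active. It reaches mic1 directly and mic0
# attenuated by 0.6 and delayed by 1 sample.
src0 = [0.0] * 6
src1 = [0.0, 1.0, -0.5, 0.25, 0.0, 0.0]
m0 = [a + 0.6 * b for a, b in zip(src0, fractional_delay(src1, 1))]
m1 = [a + 0.5 * b for a, b in zip(src1, fractional_delay(src0, 2))]

s0, s1 = separate(m0, m1, a0=0.5, a1=0.6, d0=2, d1=1)
```

With the matching scaling and delay values, the contribution of source1 cancels in the first output of this toy case. In practice those values are not known in advance and would be found by an optimization.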

Documents

Application Documents

#	Name	Date
1	202237018358.pdf	2022-03-29
2	202237018358-STATEMENT OF UNDERTAKING (FORM 3) [29-03-2022(online)].pdf	2022-03-29
3	202237018358-FORM 1 [29-03-2022(online)].pdf	2022-03-29
4	202237018358-DRAWINGS [29-03-2022(online)].pdf	2022-03-29
5	202237018358-DECLARATION OF INVENTORSHIP (FORM 5) [29-03-2022(online)].pdf	2022-03-29
6	202237018358-COMPLETE SPECIFICATION [29-03-2022(online)].pdf	2022-03-29
7	202237018358-MARKED COPIES OF AMENDEMENTS [31-03-2022(online)].pdf	2022-03-31
8	202237018358-FORM 13 [31-03-2022(online)].pdf	2022-03-31
9	202237018358-Annexure [31-03-2022(online)].pdf	2022-03-31
10	202237018358-AMMENDED DOCUMENTS [31-03-2022(online)].pdf	2022-03-31
11	202237018358-FORM 18 [01-04-2022(online)].pdf	2022-04-01
12	202237018358-Proof of Right [01-06-2022(online)].pdf	2022-06-01
13	202237018358-FORM-26 [01-06-2022(online)].pdf	2022-06-01
14	202237018358-FORM 3 [24-08-2022(online)].pdf	2022-08-24
15	202237018358-FER.pdf	2022-09-12
16	202237018358-FORM 3 [21-02-2023(online)].pdf	2023-02-21
17	202237018358-FORM 3 [21-02-2023(online)]-1.pdf	2023-02-21
18	202237018358-Proof of Right [06-03-2023(online)].pdf	2023-03-06
19	202237018358-Information under section 8(2) [06-03-2023(online)].pdf	2023-03-06
20	202237018358-FER_SER_REPLY [06-03-2023(online)].pdf	2023-03-06
21	202237018358-DRAWING [06-03-2023(online)].pdf	2023-03-06
22	202237018358-COMPLETE SPECIFICATION [06-03-2023(online)].pdf	2023-03-06
23	202237018358-CLAIMS [06-03-2023(online)].pdf	2023-03-06
24	202237018358-Information under section 8(2) [02-05-2023(online)].pdf	2023-05-02
25	202237018358-Information under section 8(2) [19-09-2023(online)].pdf	2023-09-19
26	202237018358-FORM 3 [19-09-2023(online)].pdf	2023-09-19
27	202237018358-Information under section 8(2) [26-02-2024(online)].pdf	2024-02-26
28	202237018358-FORM 3 [26-02-2024(online)].pdf	2024-02-26

Search Strategy

1	202237018358E_09-09-2022.pdf