
Determining Corrections To Be Applied To A Multichannel Audio Signal, Associated Coding And Decoding

Abstract: The invention relates to a method for determining a set of corrections (Corr.) to be made to a multichannel sound signal, in which the set of corrections is determined on the basis of an item of information representative of a spatial image of an original multichannel signal (Inf. B) and an item of information representative of a spatial image of the original multichannel signal that has been coded and then decoded (Inf. B̂). The invention also relates to a decoding method and a coding method implementing the determining method, and to the associated coding and decoding devices.


Patent Information

Filing Date: 14 March 2022
Publication Number: 26/2022
Publication Type: INA
Invention Field: ELECTRONICS
Grant Date: 2025-03-13

Applicants

ORANGE
111, quai du Président Roosevelt 92130 Issy-les-Moulineaux

Inventors

1. MAHE, Pierre Clément
ORANGE - TGI/OLR/IPL/PATENTS ORANGE GARDENS - 44 avenue de la Republique - CS 50010 92326 CHÂTILLON CEDEX
2. RAGOT, Stéphane
ORANGE - TGI/OLR/IPL/PATENTS ORANGE GARDENS - 44 avenue de la Republique - CS 50010 92326 CHÂTILLON CEDEX
3. DANIEL, Jerome
ORANGE - TGI/OLR/IPL/PATENTS ORANGE GARDENS - 44 avenue de la Republique - CS 50010 92326 CHÂTILLON CEDEX

Specification

DESCRIPTION

Title: Determination of corrections to be applied to a multichannel audio signal, associated coding and decoding

The present invention relates to the coding/decoding of spatialized sound data, in particular in an ambiophonic context (hereinafter also denoted “ambisonic”).

The coders/decoders (hereafter called "codecs") currently used in mobile telephony are mono (a single signal channel for reproduction on a single loudspeaker). The 3GPP EVS codec (for "Enhanced Voice Services") makes it possible to offer "Super-HD" quality (also called "High Definition Plus" or HD+ voice) with a super-wideband audio band (SWB for "super-wideband" in English) for signals sampled at 32 or 48 kHz, or full band (FB for "Fullband") for signals sampled at 48 kHz; the audio bandwidth is 14.4 to 16 kHz in SWB mode (9.6 to 128 kbit/s) and 20 kHz in FB mode (16.4 to 128 kbit/s).

The next quality evolution in the conversational services offered by operators should consist of immersive services, using terminals such as smartphones equipped with several microphones, spatialized audio conferencing or telepresence equipment, 360° videoconferencing equipment, or "live" audio content sharing equipment, with spatialized 3D sound rendering that is much more immersive than simple 2D stereo reproduction. With the increasingly widespread use of listening on a mobile phone with an audio headset and the appearance of advanced audio equipment (accessories such as 3D microphones, voice assistants with acoustic antennas, virtual reality headsets, etc.), the capture and rendering of spatialized sound scenes are now common enough to offer immersive communication experiences.

As such, the future 3GPP "IVAS" standard (for "Immersive Voice And Audio Services") proposes the extension of the EVS codec to immersive audio by accepting as the codec input format at least the spatialized sound formats listed below (and their combinations):

- Multichannel format (channel-based in English) of the stereo or 5.1 type where each channel feeds a loudspeaker (for example L and R in stereo or L, R, Ls, Rs and C in 5.1);

- Object format (object-based in English), where sound objects are described as an audio signal (usually mono) associated with metadata describing the attributes of this object (position in space, spatial width of the source, etc.);

- Ambisonic format (scene-based in English) which describes the sound field at a given point, generally picked up by a spherical microphone or synthesized in the field of spherical harmonics.

In the following, we focus on the coding of sound in the ambisonic format, by way of exemplary embodiment (at least certain aspects presented in connection with the invention below may also apply to formats other than ambisonics).

Ambisonics is a method of recording ("coding" in the acoustic sense) spatialized sound and a system of reproduction ("decoding" in the acoustic sense). An ambisonic microphone (at order 1) comprises at least four capsules (typically of the cardioid or sub-cardioid type) arranged on a spherical grid, for example at the vertices of a regular tetrahedron. The audio channels associated with these capsules are called the "A-format". This format is converted into a "B-format", in which the sound field is broken down into four components (spherical harmonics) denoted W, X, Y, Z, which correspond to four coincident virtual microphones. The W component corresponds to an omnidirectional capture of the sound field, while the more directional X, Y and Z components are similar to pressure gradient microphones oriented along the three orthogonal axes of space. An ambisonic system is flexible in the sense that recording and playback are separate and decoupled. It allows decoding (in the acoustic sense) on any loudspeaker configuration (for example binaural, "surround" sound of the 5.1 type, or periphony (with elevation) of the 7.1.4 type). The ambisonics approach can be generalized to more than four channels in B-format, and this generalized representation is commonly referred to as "HOA" (for "Higher-Order Ambisonics"). Decomposing the sound over more spherical harmonics improves the spatial accuracy of the restitution when rendering on loudspeakers.

An ambisonic signal of order M comprises K = (M+1)² components and, at order 1 (M = 1), we find the four components W, X, Y and Z, commonly called FOA (for First-Order Ambisonics). There is also a so-called "planar" variant of ambisonics (W, X, Y) which decomposes the sound defined in a plane, generally the horizontal plane. In this case, the number of components is K = 2M+1 channels.

First-order ambisonics (4 channels: W, X, Y, Z), planar first-order ambisonics (3 channels: W, X, Y) and higher-order ambisonics are all referred to hereinafter as "ambisonic" indiscriminately, to facilitate reading, the processing presented being applicable independently of the planar or non-planar type and of the number of ambisonic components.

Hereinafter, we will call "ambisonic signal" a signal in B-format at a predetermined order with a certain number of ambisonic components. This also includes hybrid cases where, for example, at order 2 we only have 8 channels (instead of 9) - more precisely, at order 2, we find the 4 channels of order 1 (W, X, Y, Z) to which 5 channels are normally added (usually denoted R, S, T, U, V), and we can for example ignore one of the higher-order channels (for example R).

The signals to be processed by the coder/decoder are presented as successions of blocks of sound samples called “frames” or “sub-frames” below.

In addition, hereafter, the mathematical notations follow the following convention:

- Scalar: s or N (lowercase for variables or uppercase for constants)

- The operator Re(.) designates the real part of a complex number

- Vector: u (lowercase, bold)

- Matrix: A (uppercase, bold)

The notations A^T and A^H respectively denote the transposition and the Hermitian transposition (transposed and conjugated) of A.

- A one-dimensional discrete-time signal, s(i), defined over a time interval i = 0, ..., L-1 of length L, is represented by a row vector

s = [s(0), ..., s(L-1)]

We can also write s = [s_0, ..., s_(L-1)] to avoid the use of parentheses.

- A multidimensional discrete-time signal, b(i), defined over a time interval i = 0, ..., L-1 of length L and in K dimensions, is represented by a matrix B of size KxL.

We can also write B = [b_(i,j)], i = 0, ..., K-1, j = 0, ..., L-1, to avoid the use of parentheses.

- A 3D point with Cartesian coordinates (x, y, z) can be converted into spherical coordinates (r, Θ, φ), where r is the distance from the origin, Θ the azimuth and φ the elevation. We use here, without loss of generality, the mathematical convention where the elevation is defined with respect to the horizontal plane (Oxy); the invention can easily be adapted to other definitions, including the convention used in physics where the angle of elevation is defined with respect to the Oz axis.
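By way of illustration (an informal sketch, not part of the patent text; the function names are ours), this coordinate convention can be written in Python as:

```python
import numpy as np

def cartesian_to_spherical(x, y, z):
    """(x, y, z) -> (r, azimuth, elevation); elevation measured from the Oxy plane.

    Assumes r > 0; azimuth and elevation are returned in radians.
    """
    r = np.sqrt(x * x + y * y + z * z)
    azimuth = np.arctan2(y, x)       # theta
    elevation = np.arcsin(z / r)     # phi, relative to the horizontal plane
    return r, azimuth, elevation

def spherical_to_cartesian(r, azimuth, elevation):
    """Inverse conversion, with the same convention."""
    x = r * np.cos(elevation) * np.cos(azimuth)
    y = r * np.cos(elevation) * np.sin(azimuth)
    z = r * np.sin(elevation)
    return x, y, z
```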

Moreover, we do not recall here the conventions known from the state of the art in ambisonics concerning the ordering of the ambisonic components (including ACN for Ambisonic Channel Number, SID for Single Index Designation, FuMa for Furse-Malham) and the normalization of ambisonic components (SN3D, N3D, maxN). More details can be found, for example, in the resource available online: https://en.wikipedia.org/wiki/Ambisonic_data_exchange_formats

By convention, the first component of an Ambisonic signal generally corresponds to the omnidirectional component W.

The simplest approach to encoding an Ambisonic signal is to use a mono encoder and apply it in parallel to all channels with possibly different bit allocation depending on the channel. This approach is referred to herein as "multi-mono". The multi-mono approach can be extended to multi-stereo coding (where pairs of channels are coded separately by a stereo codec) or more generally to the use of several parallel instances of the same core codec.

Such an embodiment is presented in FIG. 1. The input signal is divided into channels (one mono channel or several channels) by block 100. These channels are coded separately by blocks 120 to 122 according to a predetermined distribution and bit allocation. Their bitstreams are multiplexed (block 130) and, after transmission and/or storage, demultiplexed (block 140) in order to apply a decoding to reconstruct the decoded channels (blocks 150-152), which are recombined (block 160).

The associated quality varies according to the core coding and decoding used (blocks 120 to 122 and 150 to 152), and it is generally satisfactory only at very high bit rates. For example, in the multi-mono case, EVS coding can be considered quasi-transparent (from a perceptual point of view) at a rate of at least 48 kbit/s per (mono) channel; thus, for an ambisonic signal of order 1, a minimum bit rate of 4x48 = 192 kbit/s is obtained.

Since the multi-mono coding approach does not take into account the correlation between channels, it produces spatial deformations with the addition of various artifacts such as the appearance of ghost sound sources, diffuse noise or displacements of sound source trajectories. Thus, the coding of an ambisonic signal according to this approach generates degradations of the spatialization.

An alternative approach to the separate coding of all the channels is given, for a stereo or multichannel signal, by parametric coding. In this type of coding, the input multichannel signal is reduced to a lower number of channels by a processing called "downmix"; these channels are coded and transmitted, and additional spatialization information is also coded. Parametric decoding consists in increasing the number of channels after decoding the transmitted channels, using a processing called "upmix" (typically implemented by decorrelation) and a spatial synthesis according to the decoded additional spatialization information. An example of parametric stereo coding is given by the 3GPP e-AAC+ codec. It will be noted that the downmix operation also generates degradations of the spatialization; in this case, the spatial image is already degraded even before core coding.

The invention improves the state of the art.

To this end, it proposes a method for determining a set of corrections to be made to a multichannel sound signal, in which the set of corrections is determined from information representative of a spatial image of an original multichannel signal and information representative of a spatial image of the original multichannel signal coded and then decoded.

Thus, the determined set of corrections, to be applied to the decoded multichannel signal, makes it possible to limit the spatial degradations due to the coding and possibly to channel reduction/increase operations. The implementation of the correction thus makes it possible to obtain a spatial image of the decoded multichannel signal that is as close as possible to the spatial image of the original multichannel signal.

In a particular embodiment, the determination of the set of corrections is performed in the time domain over the full band (a single frequency band). In variants, it is performed in the time domain by frequency sub-band. This makes it possible to adapt the corrections according to the frequency bands. In other variants, it is carried out in a real or complex transform domain (typically a frequency domain), of the short-term discrete Fourier transform (STFT) or modified discrete cosine transform (MDCT) type, or other.

The invention also relates to a method for decoding a multi-channel sound signal, comprising the following steps:

- reception of a binary stream comprising a coded audio signal originating from an original multi-channel signal and information representative of a spatial image of the original multi-channel signal;

- decoding of the coded audio signal received and obtaining a decoded multi-channel signal;

- decoding of information representative of a spatial image of the original multi-channel signal;

- determination of information representative of a spatial image of the decoded multichannel signal;

- determination of a set of corrections to be made to the decoded signal according to the determination method described above;

- correction of the decoded multi-channel signal by the determined set of corrections.

Thus, in this embodiment, the decoder is capable of determining the corrections to be made to the decoded multichannel signal from the information representative of the spatial image of the original multichannel signal, received from the coder. The information received from the encoder is thus limited. It is the decoder that takes care of both the determination and the application of the corrections.

The invention also relates to a method for coding a multi-channel sound signal, comprising the following steps:

- coding of an audio signal originating from an original multi-channel signal;

- determination of information representative of a spatial image of the original multi-channel signal;

- local decoding of the coded audio signal and obtaining a decoded multi-channel signal;

- determination of information representative of a spatial image of the decoded multi-channel signal;

- determination of a set of corrections to be made to the decoded multi-channel signal according to the determination method described previously;

- coding of the determined set of corrections.

In this embodiment, it is the coder which determines the set of corrections to be made to the decoded multichannel signal and which transmits it to the decoder. It is therefore the coder that initiates this determination of corrections.

In a first particular embodiment of the decoding method as described above or of the coding method as described above, the information representative of a spatial image is a covariance matrix and the determination of the set of corrections further comprises the following steps:

- obtaining a weighting matrix comprising weighting vectors associated with a set of virtual loudspeakers;

- determination of a spatial image of the original multi-channel signal from the weighting matrix obtained and from the covariance matrix of the original multi-channel signal received;

- determination of a spatial image of the decoded multi-channel signal from the weighting matrix obtained and from the covariance matrix of the determined decoded multi-channel signal;

- calculation of a ratio between the spatial image of the original multichannel signal and the spatial image of the decoded multichannel signal at the directions of the loudspeakers of the set of virtual loudspeakers, to obtain a set of gains.

According to this embodiment, this method using rendering on loudspeakers makes it possible to transmit only a limited quantity of data from the coder to the decoder. Indeed, for a given order M, K = (M+1)² coefficients to be transmitted (associated with as many virtual loudspeakers) may be sufficient, but for a more stable correction it may be advisable to use a larger number of virtual loudspeakers and thus to transmit more points. Moreover, the correction is easily interpretable in terms of gains associated with virtual loudspeakers.

In another variant embodiment, in the case where the coder directly determines the energy of the signal in different directions and transmits this spatial image of the original multichannel signal to the decoder, the determination of the set of corrections of the decoding method further comprises the following steps:

- obtaining a weighting matrix comprising weighting vectors associated with a set of virtual loudspeakers;

- determination of a spatial image of the decoded multi-channel signal from the weighting matrix obtained and from the information representative of a spatial image of the determined decoded multi-channel signal;

- calculation of a ratio between the spatial image of the original multichannel signal and the spatial image of the decoded multichannel signal at the directions of the loudspeakers of the set of virtual loudspeakers, to obtain a set of gains.

In order to guarantee a correction value that is not too abrupt, the decoding method or the coding method includes a step of limiting the values of the gains obtained according to at least one threshold.

This set of gains constitutes the set of corrections and can, for example, take the form of a correction matrix comprising the set of gains thus determined.
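As an informal sketch of the gain computation and limiting just described (the square root, which turns an energy ratio into an amplitude gain, and the threshold values are assumptions of this sketch, not prescribed by the text):

```python
import numpy as np

def correction_gains(img_orig, img_dec, g_min=0.5, g_max=2.0, eps=1e-12):
    """Per-direction correction gains from the ratio of the two spatial images.

    img_orig, img_dec: energies of the original and of the coded/decoded signal
    at the N virtual-loudspeaker directions. The square root (energy ratio ->
    amplitude gain) and the thresholds g_min/g_max are illustrative assumptions.
    """
    g = np.sqrt(img_orig / (img_dec + eps))
    return np.clip(g, g_min, g_max)   # limiting step, cf. the threshold above
```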

In a second particular embodiment of the decoding method or of the coding method, the information representative of a spatial image is a covariance matrix and the determination of the set of corrections comprises a step of determining a transformation matrix by matrix decomposition of the two covariance matrices, the transformation matrix constituting the set of corrections.

This embodiment has the advantage of making the corrections directly in the ambisonic domain in the case of an ambisonic multichannel signal. The steps of transforming the signals rendered on loudspeakers back to the ambisonic domain are thus avoided. This embodiment also makes it possible to optimize the correction so that it is mathematically optimal, even if it requires the transmission of a greater number of coefficients compared with the method with rendering on loudspeakers. Indeed, for an order M and therefore a number of components K = (M+1)², the number of coefficients to be transmitted is Kx(K+1)/2. In order to avoid amplifying certain frequency zones too much, a normalization factor is determined and applied to the transformation matrix. In the case where the set of corrections is represented by a transformation matrix or a correction matrix as described above, the correction of the decoded multichannel signal by the determined set of corrections is performed by applying the set of corrections to the decoded multichannel signal, that is to say directly in the ambisonic domain in the case of an ambisonic signal.
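The text does not fix the decomposition used; as one illustrative possibility (an assumption of this sketch, not necessarily the patent's choice), a Cholesky factorization of the covariance C of the original signal and of the covariance of the decoded signal yields a transformation T mapping the decoded statistics onto the original ones:

```python
import numpy as np

def transformation_matrix(C_orig, C_dec, eps=1e-9):
    """Transformation T such that T @ C_dec @ T.T == C_orig (up to regularization).

    Cholesky factors are one possible matrix decomposition (an assumption of
    this sketch); the small eps regularization mirrors the C + eps*I variant
    mentioned in the text.
    """
    K = C_orig.shape[0]
    L_orig = np.linalg.cholesky(C_orig + eps * np.eye(K))
    L_dec = np.linalg.cholesky(C_dec + eps * np.eye(K))
    return L_orig @ np.linalg.inv(L_dec)
```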

In the loudspeaker rendering embodiment implemented by the decoder, the correction of the decoded multichannel signal by the determined set of corrections is carried out according to the following steps:

- acoustic decoding of the decoded multi-channel signal on the defined set of virtual loudspeakers;

- application of the set of gains obtained to the signals resulting from the acoustic decoding;

- acoustic coding of the signals resulting from the acoustic decoding and corrected to obtain components of the multichannel signal;

- summation of the components of the multichannel signal thus obtained to obtain a corrected multichannel signal.

In a variant embodiment, the steps of decoding, application of gains and coding/summation above are grouped together in a direct correction operation by a correction matrix. This correction matrix can be applied directly to the decoded multichannel signal, which has the advantage, as described above, of bringing the corrections directly into the Ambisonic domain.
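A sketch of this grouping (with E the acoustic encoding matrix and D the decoding matrix, as defined in the rendering discussion later in the description; the gains come from the spatial-image ratio above):

```python
import numpy as np

def correction_matrix(E, D, gains):
    """Fold acoustic decoding, per-speaker gains and re-encoding/summation into
    a single K x K matrix: B_corrected = (E @ diag(gains) @ D) @ B_decoded.

    E: K x N acoustic encoding matrix, D: N x K decoding matrix,
    gains: the N gains of the correction set.
    """
    return E @ np.diag(gains) @ D
```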

In a second embodiment, where the coding method implements the method for determining all of the corrections, the decoding method comprises the following steps:

- reception of a binary stream comprising a coded audio signal originating from an original multi-channel signal and a coded set of corrections to be made to the decoded multi-channel signal, the set of corrections having been coded according to a coding method described previously;

- decoding of the received coded audio signal and obtaining a decoded multi-channel signal;

- decoding of the coded set of corrections;

- correction of the decoded multi-channel signal by applying the set of decoded corrections to the decoded multi-channel signal.

In this embodiment, it is the coder which determines the corrections to be applied to the decoded multi-channel signal, directly in the ambisonic domain, and it is the decoder which implements the application of these corrections to the decoded multi-channel signal, directly in the ambisonic domain.

The set of corrections can in this case be a transformation matrix or else a correction matrix comprising a set of gains.

In a variant embodiment of the decoding method by rendering on loudspeakers, the decoding method comprises the following steps:

- reception of a binary stream comprising a coded audio signal originating from an original multi-channel signal and a coded set of corrections to be made to the decoded multi-channel signal, the set of corrections having been coded according to a coding method as described previously;

- decoding the received coded audio signal and obtaining a decoded multi-channel signal;

- decoding of the coded set of corrections;

- correction of the decoded multichannel signal by the set of decoded corrections according to the following steps:

. acoustically decoding the decoded multi-channel signal on the defined set of virtual loudspeakers;

. application of the set of gains obtained to the signals resulting from the acoustic decoding;

. acoustic coding of the signals resulting from the acoustic decoding and corrected to obtain components of the multichannel signal;

. summation of the components of the multichannel signal thus obtained to obtain a corrected multichannel signal.

In this embodiment, it is the coder which determines the corrections to be made to the signals resulting from the acoustic decoding on a set of virtual loudspeakers, and it is the decoder which applies these corrections to the signals resulting from the acoustic decoding and then transforms these signals to return to the ambisonic domain in the case of an ambisonic multichannel signal.

In a variant embodiment, the steps of decoding, application of gains and coding/summation above are grouped together in a direct correction operation by a correction matrix. The correction is then performed directly by applying a correction matrix to the decoded multichannel signal, for example the Ambisonic signal. As described previously, this has the advantage of bringing the corrections directly into the Ambisonic domain.

The invention also relates to a decoding device comprising a processing circuit for implementing the decoding methods as described above.

The invention also relates to a coding device comprising a processing circuit for implementing the coding methods as described previously.

The invention also relates to a computer program comprising instructions for implementing the decoding methods or coding methods as described above, when these instructions are executed by a processor.

Finally, the invention relates to a storage medium, readable by a processor, storing a computer program comprising instructions for the execution of the decoding methods or the coding methods described above.

Other characteristics and advantages of the invention will appear more clearly on reading the following description of particular embodiments, given by way of simple illustrative and non-limiting examples, and the appended drawings, among which:

[Fig 1] Figure 1 illustrates a multi-mono coding according to the state of the art and as described above;

[Fig 2] Figure 2 illustrates in the form of a flowchart, the steps of a method for determining a set of corrections according to one embodiment of the invention;

[Fig 3] Figure 3 illustrates a first embodiment of an encoder and a decoder, an encoding method and a decoding method according to the invention;

[Fig 4] Figure 4 illustrates a first detailed embodiment of the block for determining the set of corrections;

[Fig 5] Figure 5 illustrates a second detailed embodiment of the block for determining the set of corrections;

[Fig 6] Figure 6 illustrates a second embodiment of an encoder and a decoder, of an encoding method and of a decoding method according to the invention; and

[Fig 7] Figure 7 illustrates examples of structural embodiments of an encoder and a decoder according to one embodiment of the invention.

The method described below is based on the correction of spatial degradations, in particular to ensure that the spatial image of the decoded signal is as close as possible to that of the original signal. Unlike known parametric coding approaches for stereo or multichannel signals, where perceptual attributes are coded, the invention is not based on a perceptual interpretation of the spatial image information, because the ambisonic domain is not directly "listenable".

FIG. 2 represents the main steps implemented to determine a set of corrections to be applied to the coded then decoded multichannel signal.

The original multichannel signal B of dimension KxL (i.e. K components of L time or frequency samples) is input to the determination method. At step S1, information representative of a spatial image of the original multichannel signal is extracted.

We are interested here in the case of a multichannel signal in ambisonic representation, as described above. The invention can also be applied to other types of multichannel signals, such as a B-format signal with modifications, for example the suppression of certain components (e.g. suppression of the R component at order 2 so as to keep only 8 channels) or the matrixing of the B-format to pass into an equivalent domain (called "Equivalent Spatial Domain") as described in the 3GPP specification TS 26.260 - another example of matrixing is given by the "channel mapping 3" of the IETF Opus codec and in the 3GPP TR 26.918 specification (clause 6.1.6.3).

We call here "spatial image" the distribution of the sound energy of the ambisonic sound scene in different directions in space; in variants, this spatial image describing the sound scene corresponds more generally to positive magnitudes evaluated at different predetermined directions in space, for example in the form of a pseudo-spectrum of the MUSIC (MUltiple SIgnal Classification) type sampled at these directions, or a histogram of directions of arrival (where the directions of arrival are counted according to the discretization given by the predetermined directions); these positive magnitudes can be interpreted as energies and are treated as such below to simplify the description of the invention.

A spatial image associated with an ambisonic sound scene therefore represents the sound energy (or more generally a positive magnitude) relative to different directions in space. In the invention, information representative of a spatial image can be, for example, a covariance matrix calculated between the channels of the multichannel signal, or energy information associated with the directions of origin of the sound (associated with directions of virtual loudspeakers distributed on a unit sphere).

The set of corrections to be applied to a multichannel signal is information which can be defined by a set of gains associated with the directions from which the sound comes, which can take the form of a correction matrix comprising this set of gains, or a transformation matrix.

A covariance matrix of a multichannel signal B is for example obtained at step S1. As described later with reference to Figures 3 and 6, this matrix is for example calculated as follows:

C = BB^T up to a normalization factor (in the real case)

or

C = Re(BB^H) up to a normalization factor (in the complex case)

In variants, temporal smoothing operations of the covariance matrix could be used. In the case of a multichannel signal in the time domain, the covariance can be estimated recursively (sample by sample) in the form:

C_ij(n) = n/(n+1) C_ij(n-1) + 1/(n+1) b_i(n) b_j(n)
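A minimal numpy sketch of these estimators (frame-wise and recursive) may help fix ideas; the 1/L normalization and the function names are illustrative choices, the text leaving the normalization factor open:

```python
import numpy as np

def covariance(B):
    """Frame-wise covariance C = B B^T of a K x L block (real case), here
    normalized by 1/L (the normalization factor is left open in the text)."""
    return (B @ B.T) / B.shape[1]

def covariance_complex(B):
    """Covariance in a complex transform domain: C = Re(B B^H)."""
    return np.real(B @ B.conj().T) / B.shape[1]

def covariance_recursive(B):
    """Sample-by-sample estimate C(n) = n/(n+1) C(n-1) + 1/(n+1) b(n) b(n)^T."""
    K, L = B.shape
    C = np.zeros((K, K))
    for n in range(L):
        b = B[:, n:n + 1]                     # column vector b(n)
        C = (n / (n + 1.0)) * C + (1.0 / (n + 1.0)) * (b @ b.T)
    return C
```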

In a variant embodiment, energy information is obtained in different directions (associated with directions of virtual loudspeakers distributed over a unit sphere). For this, a method of the SRP type (for "Steered-Response Power" in English), described later with reference to FIGS. 3 and 4, can for example be applied. In variants, other spatial image calculation methods (MUSIC pseudo-spectrum, histogram of directions of arrival) can be used.

Several embodiments are possible and described here for encoding the original multichannel signal.

In a first embodiment, the different channels b_k, k = 0, ..., K-1, of B are coded, in step S2, by multi-mono coding, each channel b_k being coded separately. In variant embodiments, a multi-stereo coding where the channels b_k are coded in separate pairs is also possible. A classical example for a 5.1 input signal is to use two separate stereo encodings of L/R and Ls/Rs with mono encodings of C and LFE (low frequencies only); for the ambisonic case, the multi-stereo coding can be applied to the ambisonic components (B-format) or to an equivalent multichannel signal obtained after matrixing the channels of the B-format - for example, at order 1 the channels W, X, Y, Z can be converted into four transformed channels, and two channel pairs are coded separately and converted back to B-format on decoding. An example is given in the recent versions of the Opus codec ("channel mapping 3") and in the 3GPP specification TR 26.918 (clause 6.1.6.3).

In other variants, it will also be possible to use in step S2 a joint multichannel coding, such as for example the MPEG-H 3D Audio codec for the ambisonic (scene-based) format; in this case, the codec encodes the input channels jointly. In the MPEG-H example, this joint coding breaks down, for an ambisonic signal, into several stages such as the extraction and coding of predominant mono sources, the extraction of an ambiance (typically reduced to an ambisonic signal of order 1), the coding of all the extracted channels (called "transport channels") and of metadata describing the acoustic beamforming vectors ("beamforming" in English) used for the extraction of the predominant channels. Joint multichannel coding makes it possible to exploit the relationships between all the channels, for example to improve the overall coding efficiency.

In the preferred embodiment, step S2 is exemplified by a multi-mono coding performed with the 3GPP EVS codec as described above. However, the method according to the invention can be used independently of the core codec (multi-mono, multi-stereo, joint coding) used to represent the channels to be coded.

The signal thus coded in the form of a bitstream can be decoded in step S3, either by a local decoder of the coder, or by a decoder after transmission. This signal is decoded to recover the channels of the decoded multichannel signal B̂ (for example by several instances of the EVS decoder according to a multi-mono decoding).

Steps S2a, S2b, S3a, S3b represent a variant embodiment of the coding and decoding of the multichannel signal B. The difference with the coding of step S2 described above lies in the use of additional processing operations to reduce the number of channels ("downmix" in English) in step S2a and to increase the number of channels ("upmix" in English) in step S3b. These coding and decoding steps (S2b and S3a) are similar to steps S2 and S3, except that the number of respective input and output channels is lower in steps S2b and S3a.

An example of downmix for a first-order ambisonic input signal is to keep only the W channel; for an ambisonic input signal of order > 1, we can take as downmix the first 4 components W, X, Y, Z (thus truncating the signal to order 1). In variants, we can take as downmix a subset of the ambisonic components (for example 8 channels at order 2, without the R component) and also consider cases of matrixing, such as for example a stereo downmix obtained in the form: L = W - Y + 0.3*X, R = W + Y + 0.3*X (using only FOA channels).
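As an informal sketch of the downmix variants just described (function names are ours; the coefficients come from the text above):

```python
import numpy as np

def downmix_truncate_to_foa(B):
    """Keep only the first 4 ambisonic components W, X, Y, Z (truncation to order 1)."""
    return B[:4, :]

def downmix_foa_to_stereo(B):
    """Stereo downmix L = W - Y + 0.3*X, R = W + Y + 0.3*X (channel order W, X, Y, Z)."""
    W, X, Y = B[0], B[1], B[2]
    return np.vstack([W - Y + 0.3 * X, W + Y + 0.3 * X])
```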

An example of upmix for a mono signal is to apply different spatial room impulse responses (SRIR) or different decorrelating filters (of the all-pass type) in the time or frequency domain. An exemplary realization of decorrelation in a frequency domain is given, for example, in document 3GPP S4-180975, pCR to 26.118 on Dolby VRStream audio profile candidate (clause X.6.2.3.5).
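As a rough illustration of an all-pass decorrelation upmix (the first-order all-pass structure and its coefficients are assumptions of this sketch; the 3GPP document cited describes a different, frequency-domain realization):

```python
import numpy as np
from scipy.signal import lfilter

def upmix_mono_allpass(s, num_channels, coeffs=(0.3, 0.5, 0.7, 0.9)):
    """Upmix a mono signal by filtering it with a distinct all-pass per channel.

    H(z) = (a + z^-1) / (1 + a z^-1) is all-pass for |a| < 1, so each output
    channel keeps the spectrum of s but gets a different phase (decorrelation).
    """
    return np.vstack([lfilter([coeffs[k % len(coeffs)], 1.0],
                              [1.0, coeffs[k % len(coeffs)]], s)
                      for k in range(num_channels)])
```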

The signal B* resulting from this "downmix" processing is coded in step S2b by a core codec (multi-mono, multi-stereo, joint coding), for example by a mono or multi-mono approach with the 3GPP EVS codec. The audio signal at the input of the coding step S2b and at the output of the decoding step S3a has a lower number of channels than the original multichannel audio signal. In this case, the spatial image represented by the core codec is already substantially degraded even before coding. In an extreme case, the number of channels is reduced to a single mono channel, encoding only the W channel; the input signal is then limited to a single audio channel and the spatial image is therefore lost. The method according to the invention makes it possible to describe and reconstruct this spatial image as close as possible to that of the original multichannel signal.

At the output of the upmix step S3b of this variant embodiment, a decoded multichannel signal B̂ is available.

From the decoded multichannel signal B̂ obtained according to either of the two variants (S2-S3 or S2a-S2b-S3a-S3b), information representative of the spatial image of the decoded multichannel signal is extracted in step S4. As for the original image, this information can be a covariance matrix calculated on the decoded multichannel signal, or energy information associated with the directions of origin of the sound (or equivalently, with virtual points on a unit sphere).

Information representative of the original multichannel signal and of the decoded multichannel signal is used in step S5 to determine a set of corrections to be made to the decoded multichannel signal in order to limit the spatial degradations.

Two embodiments will be detailed later with reference to Figures 4 and 5 to illustrate this step.

The process described in Figure 2 can be implemented in the time domain, over the full frequency band (with a single band) or by frequency sub-bands (with several bands); this does not change the operation of the process, each sub-band then being processed separately. If the method is carried out per sub-band, the set of corrections is then determined per sub-band, which entails an additional cost in computation and in data to be transmitted to the decoder compared to the single-band case. The division into sub-bands can be uniform or non-uniform. For example, the spectrum of a signal sampled at 32 kHz can be divided according to different variants:

- 4 bands of respective widths 1, 3, 4 and 8 kHz, or 2, 2, 4 and 8 kHz

- 24 Bark bands (from 100 Hz wide in the low frequencies to 3.5-4 kHz for the last sub-band)

- the 24 Bark bands can be grouped together in blocks of 4 or 6 successive bands to form a set of, respectively, 6 or 4 "agglomerated" bands.

Other divisions are possible (for example ERB bands - for "equivalent rectangular bandwidth" in English - or 1/3-octave bands), including for the case of a different sampling frequency (for example 16 or 48 kHz).

In variants, the invention may also be implemented in a transform domain, for example in the domain of the short-term discrete Fourier transform (STFT) or the domain of the modified discrete cosine transform (MDCT).

Several embodiments are now described for implementing the determination of this set of corrections and for applying this set of corrections to the decoded signal.

The known technique for encoding a sound source in the ambisonic format is recalled here. A mono sound source can be artificially spatialized by multiplying its signal by the values of the spherical harmonics associated with its direction of origin (assuming the signal is carried by a plane wave) to obtain as many ambisonic components. For this, the coefficients are calculated for each spherical harmonic, for a position determined in azimuth Θ and in elevation φ, up to the desired order:

B = Y(Θ, φ)·s

where s is the mono signal to be spatialized and Y(Θ, φ) is the encoding vector defining the coefficients of the spherical harmonics associated with the direction (Θ, φ) for order M. An example of an encoding vector is given below for order 1 with the SN3D convention and the SID or FuMa channel ordering:

Y(Θ, φ) = [1, cos Θ cos φ, sin Θ cos φ, sin φ]^T
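An informal Python sketch of this acoustic encoding (assuming the order-1 SN3D vector above; the function names are ours):

```python
import numpy as np

def foa_encoding_vector(azimuth, elevation):
    """Order-1 SN3D encoding vector, SID/FuMa channel order (W, X, Y, Z)."""
    return np.array([
        1.0,                                   # W: omnidirectional
        np.cos(azimuth) * np.cos(elevation),   # X
        np.sin(azimuth) * np.cos(elevation),   # Y
        np.sin(elevation),                     # Z
    ])

def encode_mono_source(s, azimuth, elevation):
    """Spatialize a mono signal s of length L: B = Y(theta, phi) . s, shape 4 x L."""
    return np.outer(foa_encoding_vector(azimuth, elevation), s)
```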

In variants, other normalization conventions (e.g. maxN, N3D) and channel ordering conventions (e.g. ACN) may be used, and the different embodiments are then adapted according to the convention used for the channel ordering or the normalization of the ambisonic components (FOA or HOA). This amounts to modifying the order of the rows of Y(Θ, φ) or multiplying these rows by predefined constants.

For higher orders, the coefficients Y(Θ, φ) of the spherical harmonics can be found in B. Rafaely's book, Fundamentals of Spherical Array Processing, Springer, 2015; for an order M, the number of components is K = (M+1)².

Similarly, we recall here some notions on ambisonic rendering or reproduction over loudspeakers. Ambisonic sound is not meant to be listened to as it is; for immersive listening on loudspeakers or on headphones, a "decoding" step in the acoustic sense, also called rendering ("renderer" in English), must be carried out. We consider the case of N loudspeakers (virtual or physical) distributed over a sphere - typically of unit radius - whose directions (Θ_n, φ_n), n = 0, ..., N-1, in terms of azimuth and elevation are known. Decoding, as considered here, is a linear operation which consists in applying a matrix D to the ambisonic signals B to obtain the loudspeaker signals s_n, which can be assembled into a matrix S = DB, where one can decompose the matrix D into row vectors d_n:

D = [d_0; d_1; ...; d_(N-1)] (the rows d_n stacked vertically)

d_n can be seen as a weighting vector for the nth loudspeaker, used to recombine the components of the ambisonic signal and compute the signal played on the nth loudspeaker: s_n = d_n·B.

There are multiple methods of "decoding" in the acoustic sense. The so-called "basic decoding" method, also called "mode-matching", is based on the encoding matrix E associated with all the virtual loudspeaker directions:

E = [Y(Θ_0, φ_0), ..., Y(Θ_(N-1), φ_(N-1))]

According to this method, the matrix D is typically defined as the pseudo-inverse of E: D = pinv(E) = E^T (E E^T)^(-1)

Alternatively, the method that can be called "projection" gives similar results for certain regular distributions of directions, and is described by the equation:

D = (1/N) E^T

In the latter case, we see that for each direction of index n, d_n = (1/N) Y(Θ_n, φ_n)^T.

In the context of this invention, such matrices will serve as matrices for forming directional beams ("beamforming" in English), describing how to obtain signals characteristic of directions in space in order to perform a spatial analysis and/or spatial transformations.

In the context of the present invention, it is useful to describe the reciprocal conversion to pass from the loudspeaker domain to the ambisonic domain. The successive application of the two conversions should reproduce exactly the original ambisonic signals if no intermediate modification is applied in the loudspeaker domain. We therefore define the reciprocal conversion as involving the pseudo-inverse of D: pinv(D)·S = (D^T D)^(-1) D^T·S

When N = K = (M+1)², the matrix D of size KxK is invertible under certain conditions and in this case: B = D^(-1)·S

In the case of the "mode-matching" method, it appears that pinv(D) = E. In variants, other methods of decoding by D could be used, with the corresponding inverse conversion E; the only condition to be checked is that the combination of the decoding by D and the inverse conversion by E must give a perfect reconstruction (when no intermediate processing is carried out between the acoustic decoding and the acoustic encoding).

Such variants are for example given by:

- "mode-matching" decoding with a regulation term in the form D T (DD T +εl) -1 where ε is a low value (for example 0.01),

- "in phase" or "max-rE" decoding known from the state of the art

- or variants where the distribution of the loudspeaker directions is not regular on the sphere.
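To make the preceding "mode-matching" construction and the reciprocal conversion concrete, here is an informal numpy sketch (the 12-direction layout and all names are our own illustrative choices, not a recommended loudspeaker configuration):

```python
import numpy as np

def foa_encoding_vector(azimuth, elevation):
    """Order-1 SN3D encoding vector (W, X, Y, Z channel order)."""
    return np.array([1.0,
                     np.cos(azimuth) * np.cos(elevation),
                     np.sin(azimuth) * np.cos(elevation),
                     np.sin(elevation)])

def encoding_matrix(directions):
    """E = [Y(theta_0, phi_0), ..., Y(theta_(N-1), phi_(N-1))], size K x N (K = 4)."""
    return np.stack([foa_encoding_vector(az, el) for az, el in directions], axis=1)

# Basic ("mode-matching") decoding: D = pinv(E), of size N x K.
directions = [(az, el) for az in np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
              for el in (-0.5, 0.5)]                 # N = 12 virtual loudspeakers
E = encoding_matrix(directions)
D = np.linalg.pinv(E)

# Round trip: decoding then reciprocal conversion reproduces B exactly when no
# intermediate processing is applied in the loudspeaker domain (here pinv(D) = E).
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 640))                    # FOA frame, L = 640 samples
S = D @ B                                            # loudspeaker signals
B_rec = np.linalg.pinv(D) @ S
assert np.allclose(B, B_rec)
```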

FIG. 3 represents a first embodiment of an encoding device and of a decoding device for the implementation of an encoding and decoding method including a method for determining a set of corrections as described with reference to Figure 2.

In this embodiment, the coder calculates the information representative of the spatial image of the original multichannel signal and transmits it to the decoder in order to allow it to correct the spatial degradation generated by the coding. This makes it possible, during decoding, to attenuate spatial artifacts in the decoded ambisonic signal.

Thus, the encoder receives a multichannel input signal, for example in FOA or HOA ambisonic representation, or in a hybrid representation with a subset of ambisonic components up to a given partial ambisonic order - the latter case is in fact included in an equivalent way in the FOA or HOA case, where the missing ambisonic components are zero and the ambisonic order is given by the minimum order required to include all the defined components. Thus, without loss of generality, the following description considers the FOA or HOA cases.

In the embodiment described here, the input signal is sampled at 32 kHz. The coder operates on frames which are preferably 20 ms long, i.e. L = 640 samples per frame at 32 kHz. In variants, other frame lengths and sampling frequencies are possible (for example L = 480 samples per 10 ms frame at 48 kHz).

In a preferred embodiment, the coding is performed in the time domain (over one or more bands); however, in variants, the invention can be implemented in a transform domain, for example after a short-term discrete Fourier transform (STFT) or a modified discrete cosine transform (MDCT).

Depending on the coding embodiment used, as explained with reference to FIG. 2, a block 310 for reducing the number of channels (DMX) may be implemented; the input of block 311 is the signal B* at the output of block 310 when the downmix is implemented, or the signal B otherwise. In one embodiment, if the downmix is applied, it consists, for example, for an ambisonic input signal of order 1, in keeping only the W channel and, for an ambisonic input signal of order > 1, in keeping only the first 4 ambisonic components W, X, Y, Z (therefore truncating the signal at order 1). Other types of downmix (like those described previously, with a selection of a subset of channels and/or a matrixing) can be implemented without this modifying the method according to the invention.

Block 311 codes the audio signal b'_k of B* at the output of block 310 in the case where the downmix step is performed, or the audio signal b_k of the original multichannel signal B otherwise. This signal corresponds to the ambisonic components of the original multichannel signal if no channel reduction processing has been applied.

In a preferred embodiment, block 311 uses multi-mono coding (COD) with a fixed or variable allocation, where the core codec is the standard 3GPP EVS codec. In this multi-mono approach, each channel b_k or b'_k is coded separately by an instance of the codec; however, in variants, other coding methods are possible, for example multi-stereo coding or joint multichannel coding. We therefore obtain, at the output of this coding block 311, a coded audio signal derived from the original multichannel signal, in the form of a bitstream which is sent to the multiplexer 340.

Optionally, block 320 performs a division into sub-bands. In variants, this division into sub-bands could reuse equivalent processing carried out in blocks 310 or 311; the separation of block 320 is functional here. In a preferred embodiment, the channels of the original multichannel audio signal are divided into 4 frequency sub-bands with respective widths of 1 kHz, 3 kHz, 4 kHz and 8 kHz (which amounts to dividing the frequencies according to the ranges 0-1000, 1000-4000, 4000-8000 and 8000-16000 Hz). The slicing can be implemented through a short-term discrete Fourier transform (STFT), band-pass filtering in the Fourier domain (by application of a frequency mask), and an inverse transform with overlap-add. The sub-bands remain sampled at the same original frequency, and the processing according to the invention is applied in the time domain; in variants, it is possible to use a filter bank with critical sampling. It will be noted that the division into sub-bands generally involves a processing delay which depends on the type of filter bank implemented; according to the invention, a temporal alignment can be applied before or after coding-decoding and/or before the extraction of spatial image information, so that the spatial image information is well synchronized in time with the corrected signal.
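A possible sketch of this slicing (assuming scipy's STFT with its default 50% overlap Hann window; band edges follow the example above, and the function name is ours):

```python
import numpy as np
from scipy.signal import stft, istft

def split_subbands(x, fs=32000, edges=(0, 1000, 4000, 8000, 16000), nperseg=640):
    """Split one channel into time-domain sub-bands: STFT, frequency mask, ISTFT.

    The sub-bands remain sampled at fs; their sum reconstructs x (overlap-add).
    """
    f, _, X = stft(x, fs=fs, nperseg=nperseg)
    bands = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Hard mask in the Fourier domain; the last band includes the Nyquist bin.
        mask = (f >= lo) & ((f < hi) | ((i == len(edges) - 2) & (f == hi)))
        _, xb = istft(X * mask[:, None], fs=fs, nperseg=nperseg)
        bands.append(xb[:len(x)])
    return bands
```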

In variants, a full-band processing may be carried out, or the division into sub-bands may be different as explained above.

In other variants, the signal resulting from a transformation of the original multichannel audio signal is directly used and the invention applies in the transformed domain with a division into sub-bands in the transformed domain.

In the remainder of the description, the various coding and decoding steps are described as a processing in the time or frequency domain (real or complex) with a single frequency band, in order to simplify the description.

It is also possible to implement, optionally, in each sub-band, a high-pass filtering (with a cut-off frequency typically at 20 or 50 Hz), for example in the form of an order-2 elliptic IIR filter whose cut-off frequency is preferably set at 20 Hz (50 Hz in variants). This preprocessing avoids a potential bias in the subsequent covariance estimation during coding; without this preprocessing, the correction implemented in block 390, described later, will tend to boost the low frequencies in full-band processing.
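For illustration, such a filter can be obtained with scipy (the 1 dB passband ripple and 40 dB stopband attenuation are illustrative values not specified in the text):

```python
import numpy as np
from scipy.signal import ellip, sosfilt

# Order-2 elliptic high-pass at 20 Hz for signals sampled at 32 kHz.
sos = ellip(2, 1, 40, 20, btype='highpass', fs=32000, output='sos')

def highpass_channels(B):
    """Filter each channel of a K x L block to avoid biasing the covariance."""
    return sosfilt(sos, B, axis=-1)
```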

Block 321 determines (Inf. B) information representative of a spatial image of the original multichannel signal.

In one embodiment, this information is energy information associated with directions from which the sound originates (associated with directions of virtual loudspeakers distributed over a unit sphere).

To do this, we define a virtual 3D sphere of unit radius; this 3D sphere is discretized by N points ("point" virtual loudspeakers) whose position is defined in spherical coordinates by the directions (Θ_n, φ_n) for the nth loudspeaker. The loudspeakers are typically placed (quasi-)uniformly on the sphere. The number N of virtual loudspeakers is determined as a discretization having at least N = K points, where M is the ambisonic order of the signal and K = (M+1)², i.e. N ≥ K. A quadrature method of the "Lebedev" type can for example be used to carry out this discretization, according to the references V.I. Lebedev and D.N. Laikov, "A quadrature formula for the sphere of the 131st algebraic order of accuracy", Doklady Mathematics, vol. 59, no. 3, 1999, pp. 477-481, or Pierre Lecomte, Philippe-Aubert Gauthier, Christophe Langrenne, Alexandre Garcia and Alain Berry, "On the use of a Lebedev grid for Ambisonics", AES Convention 139, New York, 2015.

In variants, other discretizations may be used, such as a Fliege discretization with at least N = K points (N ≥ K), as described in the reference J. Fliege and U. Maier, "A two-stage approach for computing cubature formulae for the sphere", Technical Report, Dortmund University, 1999, or a discretization taking the points of a "spherical t-design", as described in the article by R.H. Hardin and N.J.A. Sloane, "McLaren's Improved Snub Cube and Other New Spherical Designs in Three Dimensions", Discrete and Computational Geometry, 15 (1996), pp. 429-441.
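The Lebedev and Fliege grids and the spherical t-designs rely on tabulated point sets; as a simple stand-in for experimentation (an assumption of this sketch, not one of the quadratures cited above), a Fibonacci spiral gives N quasi-uniform directions:

```python
import numpy as np

def fibonacci_directions(N):
    """N quasi-uniform directions (azimuth, elevation) on the unit sphere.

    Elevations are chosen so that the z coordinates are uniformly spread in
    (-1, 1); azimuths follow the golden-angle spiral.
    """
    i = np.arange(N)
    golden = (1.0 + np.sqrt(5.0)) / 2.0
    azimuth = (2.0 * np.pi * i / golden) % (2.0 * np.pi)
    elevation = np.arcsin(1.0 - 2.0 * (i + 0.5) / N)
    return np.stack([azimuth, elevation], axis=1)
```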

From this discretization, it is possible to determine the spatial image of the multichannel signal. One possible method is the SRP method (for "Steered-Response Power"). This method consists in calculating the short-term energy coming from different directions defined in terms of azimuth and elevation. For this, as explained previously, similarly to rendering on N loudspeakers, a weighting matrix of the ambisonic components is calculated, then this matrix is applied to the multichannel signal to sum the contribution of the components and produce a set of N acoustic beams ("beamformers" in English).

The signal of the acoustic beam for the direction (Θ_n, φ_n) of the nth loudspeaker is given by: s_n = d_n·B

where d_n is the (row) weighting vector giving the acoustic beamforming coefficients for the given direction, and B is a matrix of size KxL representing the ambisonic signal (B-format) with K components over a time interval of length L.

The set of signals of the N acoustic beams leads to the equation: S = DB

where D = [d_0; d_1; ...; d_(N-1)] is the matrix of size NxK stacking the N weighting vectors, and S is a matrix of size NxL representing the signals of the N virtual loudspeakers over a time interval of length L.

The short-term energy over the time segment of length L for each direction (Θ_n, φ_n) is:

σ_n² = s_n·s_n^T = d_n·C·d_n^T

where C = BB^T (real case) or C = Re(BB^H) (complex case) is the covariance matrix of B.

Each term σ_n² = s_n·s_n^T can thus be calculated for all the directions (Θ_n, φ_n) which correspond to a discretization of the 3D sphere by virtual loudspeakers.

The spatial image Σ is then given by:

Σ = [σ_0², ..., σ_(N-1)²]
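Once C and D are available, this SRP spatial image reduces to a few lines of numpy (an informal sketch; the einsum avoids forming the full N x N matrix D C D^T):

```python
import numpy as np

def srp_spatial_image(C, D):
    """SRP spatial image: sigma_n^2 = d_n C d_n^T for each beamforming row d_n.

    C: K x K covariance matrix of the ambisonic frame, D: N x K matrix of
    weighting vectors; returns the N short-term energies.
    """
    return np.einsum('nk,kl,nl->n', D, C, D)
```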

Variants other than the SRP method can be used to calculate a spatial image Σ:

- The d_n values may vary depending on the type of acoustic beamforming used (delay-and-sum, MVDR, LCMV, etc.). The invention also applies to these variants of calculation of the matrix D and of the spatial image Σ.

- The MUSIC (MUltiple SIgnal Classification) method also provides another way to calculate a spatial image, with a subspace approach. The invention also applies to this variant of calculation of the spatial image, which corresponds to the MUSIC pseudo-spectrum computed by diagonalizing the covariance matrix and evaluated for the directions (Θ_n, φ_n).

- The spatial image can be calculated from a histogram of the intensity vector (at order 1), as for example in the article by S. Tervo, "Direction estimation based on sound intensity vectors", Proc. EUSIPCO, 2009, or from its generalization in the form of a pseudo-intensity vector. In this case, the histogram (whose values are the numbers of occurrences of direction-of-arrival values according to the predetermined directions (Θ_n, φ_n)) is interpreted as a set of energies according to the predetermined directions.

Block 330 then carries out a quantization of the spatial image thus determined, for example with a scalar quantization on 16 bits per coefficient (by directly using the floating-point representation truncated to 16 bits). In variants, other scalar or vector quantization methods are possible.

In another embodiment, the information representative of the spatial image of the original multichannel signal is a covariance matrix (per sub-band) of the input channels B. This matrix is calculated as:

C = BB^T, up to a normalization factor (in the real case).

If the invention is implemented in a complex-valued transform domain, this covariance is calculated as:

C = Re(BB^H), up to a normalization factor.

In variants, temporal smoothing operations of the covariance matrix could be used. In the case of a multichannel signal in the time domain, the covariance can be estimated recursively (sample by sample).

The covariance matrix C (of size KxK) being, by definition, symmetric, only one of its lower or upper triangles is transmitted to the quantization block 330, which encodes (Q) K(K+1)/2 coefficients, K being the number of ambisonic components. This block 330 carries out a quantization of these coefficients, for example with a scalar quantization on 16 bits per coefficient (by directly using the floating-point representation truncated to 16 bits). In variants, other scalar or vector quantizations of the covariance matrix can be implemented. For example, one can calculate the maximum value (maximum variance) of the covariance matrix and then encode, by scalar quantization with a logarithmic step, on a more or less high number of bits (e.g. 8 bits), the values of the upper (or lower) triangle of the covariance matrix normalized by its maximum value.
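An informal numpy sketch of this triangle extraction and 16-bit quantization (numpy's float16 stands in here for the "truncated floating-point" representation, and the rebuild helper is hypothetical; both are assumptions of the sketch):

```python
import numpy as np

def quantize_covariance(C):
    """Keep the upper triangle of a symmetric K x K matrix: K(K+1)/2 coefficients,
    each stored as a 16-bit float."""
    return C[np.triu_indices(C.shape[0])].astype(np.float16)

def dequantize_covariance(coeffs, K):
    """Rebuild the full symmetric matrix from its quantized upper triangle."""
    C = np.zeros((K, K))
    C[np.triu_indices(K)] = coeffs.astype(np.float64)
    return C + np.triu(C, 1).T   # mirror the strictly-upper part to the lower part
```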

In variants, the covariance matrix C could be regularized before quantization in the form C + εI.

The quantized values are sent to the multiplexer 340.

In this embodiment, the decoder receives in the demultiplexer block 350, a binary stream comprising a coded audio signal coming from the original multichannel signal and the information representative of a spatial image of the original multichannel signal.

Block 360 decodes (Q⁻¹) the covariance matrix or other information representative of the spatial image of the original signal. Block 370 decodes (DEC) the audio signal represented by the bitstream.

In one embodiment of the coding and decoding not implementing the downmix and upmix steps, the decoded multichannel signal is obtained at the output of the decoding block 370.

In the embodiment where the downmix step was used for coding, the decoding implemented in block 370 makes it possible to obtain a decoded audio signal which is sent to the input of the upmix block 371.

Thus, block 371 implements an optional step (UPMIX) of increasing the number of channels. In one embodiment of this step, for the channel of a mono signal, it consists in convolving the signal with different spatial room impulse responses (SRIR for "Spatial Room Impulse Response"); these SRIRs are defined at the original ambisonic order of B. Other decorrelation methods are possible, for example the application of all-pass decorrelating filters to the different channels of the signal.
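As an illustration of this SRIR-based upmix, a minimal sketch assuming one SRIR per target ambisonic component; the array shapes, the truncation to the input length and the use of scipy's FFT convolution are assumptions for the example.

import numpy as np
from scipy.signal import fftconvolve

def upmix_mono(mono, srirs):
    # mono: (L,) decoded mono downmix.
    # srirs: (K, T) one spatial room impulse response per ambisonic
    # component, defined at the original ambisonic order of B.
    # Each output channel is the mono signal convolved with the SRIR
    # of the corresponding component, truncated to the input length.
    return np.stack([fftconvolve(mono, h)[: len(mono)] for h in srirs])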

CLAIMS

1. Method for determining a set of corrections (Corr.) to be applied to a multichannel sound signal, in which the set of corrections is determined from information representative of a spatial image of an original multichannel signal (Inf. B) and information representative of a spatial image of the original multichannel signal coded then decoded (Inf. B̂).

2. Method according to claim 1, in which the determination of the set of corrections is carried out by frequency sub-band.

3. Method for decoding a multi-channel sound signal, comprising the following steps:

- reception (350) of a binary stream comprising a coded audio signal originating from an original multi-channel signal and information representative of a spatial image of the original multi-channel signal;

- decoding (370) the received coded audio signal and obtaining a decoded multi-channel signal;

- decoding (360) of the information representative of a spatial image of the original multi-channel signal;

- determination (375) of information representative of a spatial image of the decoded multichannel signal;

- determination (380) of a set of corrections to be made to the decoded signal according to the determination method in accordance with one of claims 1 to 2;

- correction (390) of the decoded multichannel signal by the determined set of corrections.

4. Method for coding a multi-channel sound signal, comprising the following steps:

- coding (611) of an audio signal originating from an original multi-channel signal;

- determination (621) of information representative of a spatial image of the original multi-channel signal;

- local decoding (612) of the coded audio signal and obtaining a decoded multi-channel signal;

- determination (615) of information representative of a spatial image of the decoded multi-channel signal;

- determination (630) of a set of corrections to be made to the decoded multi-channel signal according to the determination method in accordance with one of claims 1 to 2;

- coding (640) of the determined set of corrections.

5. Decoding method according to claim 3 or coding method according to claim 4, in which the information representative of a spatial image is a covariance matrix and the determination of the set of corrections further comprises the following steps:

- obtaining a weighting matrix comprising weighting vectors associated with a set of virtual loudspeakers;

- determination of a spatial image of the original multichannel signal from the weighting matrix obtained and from the covariance matrix of the original multichannel signal;

- determination of a spatial image of the decoded multi-channel signal from the weighting matrix obtained and from the covariance matrix of the determined decoded multi-channel signal;

- calculation of a ratio between the spatial image of the original multichannel signal and the spatial image of the decoded multichannel signal at the directions of the loudspeakers of the set of virtual loudspeakers, to obtain a set of gains.

6. Decoding method according to claim 3, in which the information representative of a spatial image of the original multi-channel signal received is the spatial image of the original multi-channel signal and the determination of the set of corrections further comprises the following steps:

- obtaining a weighting matrix comprising weighting vectors associated with a set of virtual loudspeakers;

- determination of a spatial image of the decoded multi-channel signal from the weighting matrix obtained and from the information representative of a spatial image of the determined decoded multi-channel signal;

- calculation of a ratio between the spatial image of the original multichannel signal and the spatial image of the decoded multichannel signal at the directions of the loudspeakers of the set of virtual loudspeakers, to obtain a set of gains.

7. Decoding method according to claim 3 or coding method according to claim 4, in which the information representative of a spatial image is a covariance matrix and the determination of the set of corrections comprises a step of determining a transformation matrix by matrix decomposition of the two covariance matrices, the transformation matrix constituting the set of corrections.

8. Decoding method according to one of claims 5 to 7, in which the correction of the decoded multichannel signal by the determined set of corrections is performed by applying the set of corrections to the decoded multichannel signal.

9. Decoding method according to one of claims 5 to 6, in which the correction of the decoded multichannel signal by the determined set of corrections is carried out according to the following steps:

- acoustic decoding of the decoded multi-channel signal on the defined set of virtual loudspeakers;

- application of the set of gains obtained to the signals resulting from the acoustic decoding;

- acoustic coding of the corrected signals resulting from the acoustic decoding, to obtain components of the multichannel signal;

- summation of the components of the multichannel signal thus obtained to obtain a corrected multichannel signal.

10. Method for decoding a multi-channel sound signal, comprising the following steps:

- reception of a binary stream comprising a coded audio signal originating from an original multichannel signal and a coded set of corrections to be made to the decoded multichannel signal, the set of corrections having been coded according to a coding method in accordance with one of claims 4, 5 or 7;

- decoding the received coded audio signal and obtaining a decoded multi-channel signal;

- decoding of the coded set of corrections;

- correction of the decoded multi-channel signal by applying the set of decoded corrections to the decoded multi-channel signal.

11. Method for decoding a multi-channel sound signal, comprising the following steps:

- reception of a binary stream comprising a coded audio signal originating from an original multichannel signal and a coded set of corrections to be made to the decoded multichannel signal, the set of corrections having been coded according to a coding method according to claim 5;

- decoding the received coded audio signal and obtaining a decoded multi-channel signal;

- decoding of the coded set of corrections;

- correction of the decoded multichannel signal by the set of decoded corrections according to the following steps:

. acoustic decoding of the decoded multi-channel signal on the set of virtual loudspeakers;

. application of the set of gains obtained to the signals resulting from the acoustic decoding;

. acoustic coding of the corrected signals resulting from the acoustic decoding, to obtain components of the multichannel signal;

. summation of the components of the multichannel signal thus obtained to obtain a corrected multichannel signal.

12. Decoding device comprising a processing circuit for implementing the decoding method according to one of claims 3 or 5 to 11.

13. Coding device comprising a processing circuit for implementing the coding method according to one of claims 4, 5 or 7.

14. Storage medium, readable by a processor, storing a computer program comprising instructions for the execution of the decoding method according to one of claims 3 or 5 to 11 or of the coding method according to one of claims 4, 5 or 7.

Documents

Application Documents

# Name Date
1 202217013787.pdf 2022-03-14
2 202217013787-TRANSLATIOIN OF PRIOIRTY DOCUMENTS ETC. [14-03-2022(online)].pdf 2022-03-14
3 202217013787-STATEMENT OF UNDERTAKING (FORM 3) [14-03-2022(online)].pdf 2022-03-14
4 202217013787-PRIORITY DOCUMENTS [14-03-2022(online)].pdf 2022-03-14
5 202217013787-POWER OF AUTHORITY [14-03-2022(online)].pdf 2022-03-14
6 202217013787-FORM 1 [14-03-2022(online)].pdf 2022-03-14
7 202217013787-DRAWINGS [14-03-2022(online)].pdf 2022-03-14
8 202217013787-DECLARATION OF INVENTORSHIP (FORM 5) [14-03-2022(online)].pdf 2022-03-14
9 202217013787-COMPLETE SPECIFICATION [14-03-2022(online)].pdf 2022-03-14
10 202217013787-FORM 3 [08-04-2022(online)].pdf 2022-04-08
11 202217013787-Proof of Right [09-05-2022(online)].pdf 2022-05-09
12 202217013787-FORM 3 [02-08-2022(online)].pdf 2022-08-02
13 202217013787-FORM 18 [15-09-2023(online)].pdf 2023-09-15
14 202217013787-FER.pdf 2024-01-23
15 202217013787-Verified English translation [13-03-2024(online)].pdf 2024-03-13
16 202217013787-OTHERS [11-04-2024(online)].pdf 2024-04-11
17 202217013787-FORM-26 [11-04-2024(online)].pdf 2024-04-11
18 202217013787-FER_SER_REPLY [11-04-2024(online)].pdf 2024-04-11
19 202217013787-DRAWING [11-04-2024(online)].pdf 2024-04-11
20 202217013787-COMPLETE SPECIFICATION [11-04-2024(online)].pdf 2024-04-11
21 202217013787-CLAIMS [11-04-2024(online)].pdf 2024-04-11
22 202217013787-US(14)-HearingNotice-(HearingDate-03-03-2025).pdf 2025-02-05
23 202217013787-FORM-26 [27-02-2025(online)].pdf 2025-02-27
24 202217013787-Correspondence to notify the Controller [27-02-2025(online)].pdf 2025-02-27
25 202217013787-Written submissions and relevant documents [11-03-2025(online)].pdf 2025-03-11
26 202217013787-PatentCertificate13-03-2025.pdf 2025-03-13
27 202217013787-IntimationOfGrant13-03-2025.pdf 2025-03-13

Search Strategy

1 ssE_22-01-2024.pdf
2 202217013787AE_14-01-2025.pdf

ERegister / Renewals

3rd: 25 Mar 2025 (from 24/09/2022 to 24/09/2023)

4th: 25 Mar 2025 (from 24/09/2023 to 24/09/2024)

5th: 25 Mar 2025 (from 24/09/2024 to 24/09/2025)

6th: 25 Mar 2025 (from 24/09/2025 to 24/09/2026)