
Audio Encoder, Audio Decoder, Methods and Computer Program Using Jointly Encoded Residual Signals

Abstract: An audio decoder for providing at least four audio channel signals on the basis of an encoded representation is configured to provide a first residual signal and a second residual signal on the basis of a jointly encoded representation of the first residual signal and of the second residual signal using a multi-channel decoding. The audio decoder is configured to provide a first audio channel signal and a second audio channel signal on the basis of a first downmix signal and the first residual signal using a residual-signal-assisted multi-channel decoding. The audio decoder is configured to provide a third audio channel signal and a fourth audio channel signal on the basis of a second downmix signal and the second residual signal using a residual-signal-assisted multi-channel decoding. An audio encoder is based on corresponding considerations.


Patent Information

Application #
Filing Date
12 January 2016
Publication Number
27/2016
Publication Type
INA
Invention Field
COMMUNICATION
Status
Parent Application

Applicants

FRAUNHOFER GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
Hansastrasse 27c 80686 München

Inventors

1. DICK Sascha
Schupferstrasse 49 90482 Nürnberg
2. ERTEL Christian
Nürnberger Str. 24 90542 Eckental
3. HELMRICH Christian
Hauptstrasse 68 91054 Erlangen
4. HILPERT Johannes
Herrnhüttestrasse 46 90411 Nürnberg
5. HÖLZER Andreas
Obere Karlstrasse 23 91054 Erlangen
6. KUNTZ Achim
Weiherstrasse 12 91334 Hemhofen

Specification

Audio Encoder, Audio Decoder, Methods and Computer Program Using Jointly Encoded Residual Signals

Description

Technical Field

Embodiments according to the invention are related to an audio decoder for providing at least four audio channel signals on the basis of an encoded representation.

Further embodiments according to the invention are related to an audio encoder for providing an encoded representation on the basis of at least four audio channel signals.

Further embodiments according to the invention are related to a method for providing at least four audio channel signals on the basis of an encoded representation and to a method for providing an encoded representation on the basis of at least four audio channel signals.

Further embodiments according to the invention are related to a computer program for performing one of said methods.

Generally speaking, embodiments according to the invention are related to a joint coding of n channels.

Background of the Invention

In recent years, a demand for storage and transmission of audio contents has been steadily increasing. Moreover, the quality requirements for the storage and transmission of audio contents have also been increasing steadily. Accordingly, the concepts for the encoding and decoding of audio content have been enhanced. For example, the so-called "advanced audio coding" (AAC) has been developed, which is described, for example, in the International Standard ISO/IEC 13818-7:2003. Moreover, some spatial extensions have been created, like, for example, the so-called "MPEG Surround" concept, which is described, for example, in the international standard ISO/IEC 23003-1:2007. Moreover, additional improvements for the encoding and decoding of spatial information of audio signals are described in the international standard ISO/IEC 23003-2:2010, which relates to the so-called spatial audio object coding (SAOC).

Moreover, a flexible audio encoding/decoding concept, which provides the possibility to encode both general audio signals and speech signals with good coding efficiency and to handle multi-channel audio signals, is defined in the international standard ISO/IEC 23003-3:2012, which describes the so-called "unified speech and audio coding" (USAC) concept.

In MPEG USAC [1], joint stereo coding of two channels is performed using complex prediction, MPS 2-1-1 or unified stereo with band-limited or full-band residual signals.

MPEG surround [2] hierarchically combines OTT and TTT boxes for joint coding of multichannel audio with or without transmission of residual signals.

However, there is a desire to provide an even more advanced concept for an efficient encoding and decoding of three-dimensional audio scenes.

Summary of the Invention

An embodiment according to the invention creates an audio decoder for providing at least four audio channel signals on the basis of an encoded representation. The audio decoder is configured to provide a first residual signal and a second residual signal on the basis of a jointly encoded representation of the first residual signal and of the second residual signal using a multi-channel decoding. The audio decoder is also configured to provide a first audio channel signal and a second audio channel signal on the basis of a first downmix signal and the first residual signal using a residual-signal-assisted multi-channel decoding. The audio decoder is also configured to provide a third audio channel signal and a fourth audio channel signal on the basis of a second downmix signal and the second residual signal using a residual-signal-assisted multi-channel decoding.

This embodiment according to the invention is based on the finding that dependencies between four or even more audio channel signals can be exploited by deriving two residual signals, each of which is used to provide two or more audio channel signals using a residual-signal-assisted multi-channel decoding, from a jointly-encoded representation of the residual signals. In other words, it has been found that there are typically some similarities between said residual signals, such that a bit rate for encoding said residual signals, which help to improve an audio quality when decoding the at least four audio channel signals, can be reduced by deriving the two residual signals from a jointly-encoded representation using a multi-channel decoding, which exploits similarities and/or dependencies between the residual signals.
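The two-stage structure described above can be sketched in a few lines of Python. This is an illustrative model only: the simple mid/side reconstruction below stands in for the actual multi-channel and residual-signal-assisted decodings, and all function names are invented for this sketch.

```python
def ms_decode(mid, side):
    """Mid/side reconstruction, used here as a stand-in for both the
    multi-channel decoding and the residual-signal-assisted decoding."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def decode_four_channels(res_mid, res_side, dmx1, dmx2):
    # Stage 1: derive the first and second residual signals from their
    # jointly encoded representation (modeled here as a mid/side pair).
    res1, res2 = ms_decode(res_mid, res_side)
    # Stage 2: residual-signal-assisted decoding of each channel pair,
    # each using its own downmix signal and residual signal.
    ch1, ch2 = ms_decode(dmx1, res1)
    ch3, ch4 = ms_decode(dmx2, res2)
    return ch1, ch2, ch3, ch4
```

The point of the sketch is only the data flow: one jointly encoded pair yields two residual signals, and each residual signal refines the upmix of one downmix signal into two audio channel signals.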

In a preferred embodiment, the audio decoder is configured to provide the first downmix signal and the second downmix signal on the basis of a jointly-encoded representation of the first downmix signal and the second downmix signal using a multi-channel decoding. Accordingly, a hierarchical structure of an audio decoder is created, wherein both the downmix signals and the residual signals, which are used in the residual-signal-assisted multi-channel decoding for providing the at least four audio channel signals, are derived using separate multi-channel decodings. Such a concept is particularly efficient, since the two downmix signals typically comprise similarities, which can be exploited in a multi-channel encoding/decoding, and since the two residual signals typically also comprise similarities, which can be exploited in a multi-channel encoding/decoding. Thus, a good coding efficiency can typically be obtained using this concept.

In a preferred embodiment, the audio decoder is configured to provide the first residual signal and the second residual signal on the basis of the jointly-encoded representation of the first residual signal and of the second residual signal using a prediction-based multi-channel decoding. The usage of a prediction-based multi-channel decoding typically brings along a comparatively good reconstruction quality for the residual signals. This is, for example, advantageous if the first residual signal represents a left side of an audio scene and the second residual signal represents a right side of the audio scene, because human hearing is typically comparatively sensitive to differences between the left and right sides of the audio scene.

In a preferred embodiment, the audio decoder is configured to provide the first residual signal and the second residual signal on the basis of the jointly-encoded representation of the first residual signal and of the second residual signal using a residual-signal-assisted multi-channel decoding. It has been found that a particularly good quality of the first and second residual signal can be achieved if the first residual signal and the second residual signal are provided using a multi-channel decoding, which in turn receives a residual signal (and typically also a downmix signal, which combines the first residual signal and the second residual signal). Thus, there is a cascading of decoding stages, wherein two residual signals (the first residual signal, which is used for providing the first audio channel signal and the second audio channel signal, and the second residual signal, which is used for providing the third audio channel signal and the fourth audio channel signal) are provided on the basis of an input downmix signal and an input residual signal (wherein the latter may also be designated as a common residual signal of the first residual signal and the second residual signal). Thus, the first residual signal and the second residual signal are actually "intermediate" residual signals, which are derived using a multi-channel decoding from a corresponding downmix signal and a corresponding "common" residual signal.

In a preferred embodiment, the prediction-based multi-channel decoding is configured to evaluate a prediction parameter describing a contribution of a signal component, which is derived using a signal component of a previous frame, to the provision of the residual signals (i.e., the first residual signal and the second residual signal) of a current frame. Usage of such a prediction-based multi-channel decoding brings along a particularly good quality of the residual signals (first residual signal and second residual signal).

In a preferred embodiment, the prediction-based multi-channel decoding is configured to obtain the first residual signal and the second residual signal on the basis of a (corresponding) downmix signal and a (corresponding) "common" residual signal, wherein the prediction-based multi-channel decoding is configured to apply the common residual signal with a first sign, to obtain the first residual signal, and to apply the common residual signal with a second sign, which is opposite to the first sign, to obtain the second residual signal. It has been found that such a prediction-based multi-channel decoding brings along a good efficiency for reconstructing the first residual signal and the second residual signal.
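The opposite-sign application of the common residual signal can be illustrated with a short, hedged sketch. The function name and the simplified real-valued prediction below are assumptions for illustration only, not the standardized algorithm:

```python
def upmix_with_common_residual(dmx, common_res, alpha):
    """Illustrative sketch: reconstruct the first and second residual
    signals from their downmix and a single common residual signal."""
    # Side signal: a prediction from the downmix (coefficient alpha)
    # plus the transmitted common residual signal.
    side = [alpha * m + c for m, c in zip(dmx, common_res)]
    # The common residual thereby enters the first residual signal with a
    # positive sign and the second residual signal with the opposite sign.
    res1 = [m + s for m, s in zip(dmx, side)]
    res2 = [m - s for m, s in zip(dmx, side)]
    return res1, res2
```

With `common_res` set to zero, the reconstruction falls back to the pure prediction; the common residual corrects both outputs symmetrically.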

In a preferred embodiment, the audio decoder is configured to provide the first residual signal and the second residual signal on the basis of the jointly-encoded representation of the first residual signal and of the second residual signal using a multi-channel decoding which is operative in the modified-discrete-cosine-transform domain (MDCT domain). It has been found that such a concept can be implemented in an efficient manner, since an audio decoding, which may be used to provide the jointly-encoded representation of the first residual signal and of the second residual signal, preferably operates in the MDCT domain. Accordingly, intermediate transformations can be avoided by applying the multi-channel decoding for providing the first residual signal and the second residual signal in the MDCT domain.

In a preferred embodiment, the audio decoder is configured to provide the first residual signal and the second residual signal on the basis of the jointly-encoded representation of the first residual signal and of the second residual signal using a USAC complex stereo prediction (for example, as mentioned in the above referenced USAC standard). It has been found that such a USAC complex stereo prediction brings along good results for the decoding of the first residual signal and of the second residual signal. Moreover, usage of the USAC complex stereo prediction for the decoding of the first residual signal and the second residual signal also allows for a simple implementation of the concept using decoding blocks which are already available in the unified-speech-and-audio coding (USAC). Accordingly, a unified-speech-and-audio coding decoder may be easily reconfigured to perform the decoding concept discussed here.

In a preferred embodiment, the audio decoder is configured to provide the first audio channel signal and the second audio channel signal on the basis of the first downmix signal and the first residual signal using a parameter-based residual-signal-assisted multi-channel decoding. Similarly, the audio decoder is configured to provide the third audio channel signal and the fourth audio channel signal on the basis of the second downmix signal and the second residual signal using a parameter-based residual-signal-assisted multi-channel decoding. It has been found that such a multi-channel decoding is well-suited for the derivation of the audio channel signals on the basis of the first downmix signal, the first residual signal, the second downmix signal and the second residual signal. Moreover, it has been found that such a parameter-based residual-signal-assisted multi-channel decoding can be implemented with small effort using processing blocks which are already present in typical multi-channel audio decoders.

In a preferred embodiment, the parameter-based residual-signal-assisted multi-channel decoding is configured to evaluate one or more parameters describing a desired correlation between two channels and/or level differences between two channels in order to provide the two or more audio channel signals on the basis of a respective downmix signal and a respective corresponding residual signal. It has been found that such a parameter-based residual-signal-assisted multi-channel decoding is well adapted for the second stage of a cascaded multi-channel decoding (wherein, preferably, the first and second downmix signals and the first and second residual signals are provided using a prediction-based multi-channel decoding).
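A minimal sketch of such a parameter-based, residual-signal-assisted upmix is given below. The gain derivation from a channel level difference (CLD, in dB) is illustrative, and the function name is invented; the residual signal here takes the place that a decorrelated signal would otherwise occupy:

```python
import math

def parametric_upmix(dmx, res, cld_db):
    """Illustrative parameter-based upmix of one downmix signal and one
    residual signal into two audio channel signals."""
    c = 10.0 ** (cld_db / 20.0)          # linear amplitude ratio ch1/ch2
    g1 = c / math.sqrt(1.0 + c * c)      # energy-preserving panning gains
    g2 = 1.0 / math.sqrt(1.0 + c * c)
    # The downmix is distributed according to the level difference; the
    # residual restores the component that the downmix cannot represent.
    ch1 = [g1 * d + g2 * r for d, r in zip(dmx, res)]
    ch2 = [g2 * d - g1 * r for d, r in zip(dmx, res)]
    return ch1, ch2
```

For a CLD of 0 dB the downmix is split equally between both output channels, and a non-zero residual introduces the transmitted inter-channel difference.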

In a preferred embodiment, the audio decoder is configured to provide the first audio channel signal and the second audio channel signal on the basis of the first downmix signal and the first residual signal using a residual-signal-assisted multi-channel decoding which is operative in the QMF domain. Similarly, the audio decoder is preferably configured to provide the third audio channel signal and the fourth audio channel signal on the basis of the second downmix signal and the second residual signal using a residual-signal-assisted multi-channel decoding which is operative in the QMF domain. Accordingly, the second stage of the hierarchical multi-channel decoding is operative in the QMF domain, which is well adapted to typical post-processing, which is also often performed in the QMF domain, such that intermediate conversions may be avoided.

In a preferred embodiment, the audio decoder is configured to provide the first audio channel signal and the second audio channel signal on the basis of the first downmix signal and the first residual signal using an MPEG Surround 2-1-2 decoding or a unified stereo decoding. Similarly, the audio decoder is preferably configured to provide the third audio channel signal and the fourth audio channel signal on the basis of the second downmix signal and the second residual signal using an MPEG Surround 2-1-2 decoding or a unified stereo decoding. It has been found that such decoding concepts are particularly well-suited for the second stage of a hierarchical decoding.

In a preferred embodiment, the first residual signal and the second residual signal are associated with different horizontal positions (or, equivalently, azimuth-positions) of an audio scene. It has been found that it is particularly advantageous to separate residual signals, which are associated with different horizontal positions (or azimuth positions), in a first stage of the hierarchical multi-channel processing because a particularly good hearing impression can be obtained if the perceptually important left/right separation is performed in a first stage of the hierarchical multi-channel decoding.

In a preferred embodiment, the first audio channel signal and the second audio channel signal are associated with vertically neighboring positions of the audio scene (or, equivalently, with neighboring elevation positions of the audio scene). Also, the third audio channel signal and the fourth audio channel signal are preferably associated with vertically neighboring positions of the audio scene (or, equivalently, with neighboring elevation positions of the audio scene). It has been found that good decoding results can be achieved if the separation between upper and lower signals is performed in a second stage of the hierarchical audio decoding (which typically comprises a somewhat smaller separation accuracy than the first stage), since the human auditory system is less sensitive with respect to a vertical position of an audio source when compared to a horizontal position of the audio source.

In a preferred embodiment, the first audio channel signal and the second audio channel signal are associated with a first horizontal position of an audio scene (or, equivalently, azimuth position), and the third audio channel signal and the fourth audio channel signal are associated with a second horizontal position of the audio scene (or, equivalently, azimuth position), which is different from the first horizontal position (or, equivalently, azimuth position).

Preferably, the first residual signal is associated with a left side of an audio scene, and the second residual signal is associated with a right side of the audio scene. Accordingly, the left-right separation is performed in a first stage of the hierarchical audio decoding.

In a preferred embodiment, the first audio channel signal and the second audio channel signal are associated with the left side of the audio scene, and the third audio channel signal and the fourth audio channel signal are associated with a right side of the audio scene.

In another preferred embodiment, the first audio channel signal is associated with a lower left side of the audio scene, the second audio channel signal is associated with an upper left side of the audio scene, the third audio channel signal is associated with a lower right side of the audio scene, and the fourth audio channel signal is associated with an upper right side of the audio scene. Such an association of the audio channel signals brings along particularly good coding results.
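For orientation, the channel assignment of this embodiment can be written as a small mapping (the labels are paraphrased from the text above):

```python
# Channel-to-position assignment of the embodiment described above.
CHANNEL_POSITIONS = {
    "first audio channel signal": "lower left",
    "second audio channel signal": "upper left",
    "third audio channel signal": "lower right",
    "fourth audio channel signal": "upper right",
}

# The first decoding stage separates left from right (first vs. second
# residual signal); the second stage separates lower from upper per side.
left_side = [name for name, pos in CHANNEL_POSITIONS.items()
             if pos.endswith("left")]
```

This makes explicit that the perceptually important left/right split is handled first, and the less critical lower/upper split second.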

In a preferred embodiment, the audio decoder is configured to provide the first downmix signal and the second downmix signal on the basis of a jointly-encoded representation of the first downmix signal and the second downmix signal using a multi-channel decoding, wherein the first downmix signal is associated with the left side of an audio scene and the second downmix signal is associated with the right side of the audio scene. It has been found that the downmix signals can also be encoded with good coding efficiency using a multi-channel coding, even if the downmix signals are associated with different sides of the audio scene.

In a preferred embodiment, the audio decoder is configured to provide the first downmix signal and the second downmix signal on the basis of the jointly-encoded representation of the first downmix signal and of the second downmix signal using a prediction-based multi-channel decoding or even using a residual-signal-assisted prediction-based multi-channel decoding. It has been found that the usage of such multi-channel decoding concepts provides for a particularly good decoding result. Also, existing decoding functions can be reused in some audio decoders.

In a preferred embodiment, the audio decoder is configured to perform a first multi-channel bandwidth extension on the basis of the first audio channel signal and the third audio channel signal. Also, the audio decoder may be configured to perform a second (typically separate) multi-channel bandwidth extension on the basis of the second audio channel signal and the fourth audio channel signal. It has been found that it is advantageous to perform a possible bandwidth extension on the basis of two audio channel signals which are associated with different sides of an audio scene (wherein different residual signals are typically associated with different sides of the audio scene).

In a preferred embodiment, the audio decoder is configured to perform the first multi-channel bandwidth extension in order to obtain two or more bandwidth-extended audio channel signals associated with a first common horizontal plane (or, equivalently, with a first common elevation) of an audio scene on the basis of the first audio channel signal and the third audio channel signal and one or more bandwidth extension parameters. Moreover, the audio decoder is preferably configured to perform the second multi-channel bandwidth extension in order to obtain two or more bandwidth-extended audio channel signals associated with a second common horizontal plane (or, equivalently, a second common elevation) of the audio scene on the basis of the second audio channel signal and the fourth audio channel signal and one or more bandwidth extension parameters. It has been found that such a decoding scheme results in good audio quality, since the multi-channel bandwidth extension can consider stereo characteristics, which are important for the hearing impression, in such an arrangement.

In a preferred embodiment, the jointly-encoded representation of the first residual signal and of the second residual signal comprises a channel pair element comprising a downmix signal of the first and second residual signal and a common residual signal of the first and second residual signal. It has been found that the encoding of the downmix signal of the first and second residual signal and of the common residual signal of the first and second residual signal using a channel pair element is advantageous since the downmix signal of the first and second residual signal and the common residual signal of the first and second residual signal typically share a number of characteristics. Accordingly, the usage of a channel pair element typically reduces a signaling overhead and consequently allows for an efficient encoding.
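The encoder-side counterpart of such a channel pair element can be sketched as follows; the function name and the simple sum/difference formulation are assumptions for illustration, not the standardized payload format:

```python
def residual_channel_pair(res1, res2):
    """Illustrative sketch: form the contents of a channel pair element
    from the first and second residual signals."""
    # Downmix of the two residual signals (here a simple average) and
    # their common residual (here half the difference).
    dmx = [0.5 * (a + b) for a, b in zip(res1, res2)]
    common = [0.5 * (a - b) for a, b in zip(res1, res2)]
    return dmx, common
```

Because the two residual signals typically share characteristics, the common residual tends to be small, which is what makes transmitting the pair in one channel pair element efficient.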

In another preferred embodiment, the audio decoder is configured to provide the first downmix signal and the second downmix signal on the basis of a jointly-encoded representation of the first downmix signal and the second downmix signal using a multi-channel decoding, wherein the jointly-encoded representation of the first downmix signal and of the second downmix signal comprises a channel pair element, the channel pair element comprising a downmix signal of the first and second downmix signal and a common residual signal of the first and second downmix signal. This embodiment is based on the same considerations as the embodiment described before.

Another embodiment according to the invention creates an audio encoder for providing an encoded representation on the basis of at least four audio channel signals. The audio encoder is configured to jointly encode at least a first audio channel signal and a second audio channel signal using a residual-signal-assisted multi-channel encoding, to obtain a first downmix signal and a first residual signal. The audio encoder is configured to jointly encode at least a third audio channel signal and a fourth audio channel signal using a residual-signal-assisted multi-channel encoding, to obtain a second downmix signal and a second residual signal. Moreover, the audio encoder is configured to jointly encode the first residual signal and the second residual signal using a multi-channel encoding, to obtain a jointly-encoded representation of the residual signals. This audio encoder is based on the same considerations as the above-described audio decoder.
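The encoder structure can be sketched analogously to the decoder; again, the mid/side operation below is an illustrative stand-in for the residual-signal-assisted and multi-channel encodings, and all names are invented:

```python
def ms_encode(left, right):
    """Mid/side encoding, used as a stand-in for both encoding stages."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side

def encode_four_channels(ch1, ch2, ch3, ch4):
    # Stage 1: residual-signal-assisted encoding of each channel pair,
    # yielding a downmix signal and a residual signal per pair.
    dmx1, res1 = ms_encode(ch1, ch2)
    dmx2, res2 = ms_encode(ch3, ch4)
    # Stage 2: joint encoding of the two residual signals, yielding
    # their jointly encoded representation.
    res_dmx, res_common = ms_encode(res1, res2)
    return dmx1, dmx2, res_dmx, res_common
```

In this toy model the encoder is the exact inverse of the decoder sketch: correlated input channels produce correlated residual signals, whose joint encoding concentrates the remaining information.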

Moreover, optional improvements of this audio encoder, and preferred configurations of the audio encoder, are substantially in parallel with improvements and preferred configurations of the audio decoder discussed above. Accordingly, reference is made to the above discussion.

Another embodiment according to the invention creates a method for providing at least four audio channel signals on the basis of an encoded representation, which substantially performs the functionality of the audio decoder described above, and which can be supplemented by any of the features and functionalities discussed above.

Another embodiment according to the invention creates a method for providing an encoded representation on the basis of at least four audio channel signals, which substantially fulfills the functionality of the audio encoder described above.

Another embodiment according to the invention creates a computer program for performing the methods mentioned above.

Brief Description of the Figures

Embodiments according to the present invention will subsequently be described taking reference to the enclosed figures in which:

Fig. 1 shows a block schematic diagram of an audio encoder, according to an embodiment of the present invention;

Fig. 2 shows a block schematic diagram of an audio decoder, according to an embodiment of the present invention;

Fig. 3 shows a block schematic diagram of an audio decoder, according to another embodiment of the present invention;

Fig. 4 shows a block schematic diagram of an audio encoder, according to an embodiment of the present invention;

Fig. 5 shows a block schematic diagram of an audio decoder, according to an embodiment of the present invention;

Fig. 6 shows a block schematic diagram of an audio decoder, according to another embodiment of the present invention;

Fig. 7 shows a flowchart of a method for providing an encoded representation on the basis of at least four audio channel signals, according to an embodiment of the present invention;

Fig. 8 shows a flowchart of a method for providing at least four audio channel signals on the basis of an encoded representation, according to an embodiment of the invention;

Fig. 9 shows as flowchart of a method for providing an encoded representation on the basis of at least four audio channel signals, according to an embodiment of the invention; and

Fig. 10 shows a flowchart of a method for providing at least four audio channel signals on the basis of an encoded representation, according to an embodiment of the invention;

Fig. 11 shows a block schematic diagram of an audio encoder, according to an embodiment of the invention;

Fig. 12 shows a block schematic diagram of an audio encoder, according to another embodiment of the invention;

shows a block schematic diagram of an audio decoder, according to an embodiment of the invention;

shows a syntax representation of a bitstream, which can be used with the audio encoder according to Fig. 13;

shows a table representation of different values of the parameter qceIndex;

shows a block schematic diagram of a 3D audio encoder in which the concepts according to the present invention can be used;

Fig. 16 shows a block schematic diagram of a 3D audio decoder in which the concepts according to the present invention can be used;

Fig. 17 shows a block schematic diagram of a format converter;

Fig. 18 shows a graphical representation of a topological structure of a Quad Channel Element (QCE), according to an embodiment of the present invention;

Fig. 19 shows a block schematic diagram of an audio decoder, according to an embodiment of the present invention;

Fig. 20 shows a detailed block schematic diagram of a QCE Decoder, according to an embodiment of the present invention; and

Fig. 21 shows a detailed block schematic diagram of a Quad Channel Encoder, according to an embodiment of the present invention.

Detailed Description of the Embodiments

1. Audio encoder according to Fig. 1

Fig. 1 shows a block schematic diagram of an audio encoder, which is designated in its entirety with 100. The audio encoder 100 is configured to provide an encoded representation on the basis of at least four audio channel signals. The audio encoder 100 is configured to receive a first audio channel signal 110, a second audio channel signal 112, a third audio channel signal 114 and a fourth audio channel signal 116. Moreover, the audio encoder 100 is configured to provide an encoded representation of a first downmix signal 120 and of a second downmix signal 122, as well as a jointly-encoded representation 130 of residual signals. The audio encoder 100 comprises a residual-signal-assisted multi-channel encoder 140, which is configured to jointly encode the first audio channel signal 110 and the second audio channel signal 112 using a residual-signal-assisted multi-channel encoding, to obtain the first downmix signal 120 and a first residual signal 142. The audio encoder 100 also comprises a residual-signal-assisted multi-channel encoder 150, which is configured to jointly encode at least the third audio channel signal 114 and the fourth audio channel signal 116 using a residual-signal-assisted multi-channel encoding, to obtain the second downmix signal 122 and a second residual signal 152. The audio encoder 100 also comprises a multi-channel encoder 160, which is configured to jointly encode the first residual signal 142 and the second residual signal 152 using a multi-channel encoding, to obtain the jointly-encoded representation 130 of the residual signals 142, 152.

Regarding the functionality of the audio encoder 100, it should be noted that the audio encoder 100 performs a hierarchical encoding, wherein the first audio channel signal 110 and the second audio channel signal 112 are jointly encoded using the residual-signal-assisted multi-channel encoding 140, wherein both the first downmix signal 120 and the first residual signal 142 are provided. The first residual signal 142 may, for example, describe differences between the first audio channel signal 110 and the second audio channel signal 112, and/or may describe some or any signal features which cannot be represented by the first downmix signal 120 and optional parameters, which may be provided by the residual-signal-assisted multi-channel encoder 140. In other words, the first residual signal 142 may be a residual signal which allows for a refinement of a decoding result which may be obtained on the basis of the first downmix signal 120 and any possible parameters which may be provided by the residual-signal-assisted multi-channel encoder 140. For example, the first residual signal 142 may allow for at least a partial waveform reconstruction of the first audio channel signal 110 and of the second audio channel signal 112 at the side of an audio decoder when compared to a mere reconstruction of high-level signal characteristics (like, for example, correlation characteristics, covariance characteristics, level difference characteristics, and the like). Similarly, the residual-signal-assisted multi-channel encoder 150 provides both the second downmix signal 122 and the second residual signal 152 on the basis of the third audio channel signal 114 and the fourth audio channel signal 116, such that the second residual signal allows for a refinement of a signal reconstruction of the third audio channel signal 114 and of the fourth audio channel signal 116 at the side of an audio decoder.
The second residual signal 152 may consequently serve the same functionality as the first residual signal 142. However, if the audio channel signals 110, 112, 114, 116 comprise some correlation, the first residual signal 142 and the second residual signal 152 are typically also correlated to some degree. Accordingly, the joint encoding of the first residual signal 142 and of the second residual signal 152 using the multi-channel encoder 160 typically comprises a high efficiency since a multi-channel encoding of correlated signals typically reduces the bitrate by exploiting the dependencies. Consequently, the first residual signal 142 and the second residual signal 152 can be encoded with good precision while keeping the bitrate of the jointly-encoded representation 130 of the residual signals reasonably small.
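The bitrate benefit of jointly encoding correlated residual signals can be made concrete with a small numeric illustration (the sample values are invented): a sum/difference transform concentrates the energy of two correlated signals into one channel, leaving little to encode in the other.

```python
def energy(sig):
    """Total energy of a signal, as a list of samples."""
    return sum(x * x for x in sig)

# Two correlated residual signals (illustrative values only).
res1 = [1.0, 0.8, -0.5, 0.3]
res2 = [0.9, 0.7, -0.6, 0.2]

# Sum/difference (mid/side) joint representation.
mid = [0.5 * (a + b) for a, b in zip(res1, res2)]
side = [0.5 * (a - b) for a, b in zip(res1, res2)]

# The difference channel carries only a small fraction of the energy,
# which is what a joint multi-channel encoding exploits to save bits.
ratio = energy(side) / energy(mid)
```

For the values above the side channel carries well under one percent of the mid channel's energy, so most of the bit budget can be spent on a single channel.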

To summarize, the embodiment according to Fig. 1 provides a hierarchical multi-channel encoding, wherein a good reproduction quality can be achieved by using the residual-signal-assisted multi-channel encoders 140, 150, and wherein the bitrate demand can be kept moderate by jointly encoding the first residual signal 142 and the second residual signal 152.
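The two-stage structure described above can be illustrated in a few lines of Python. This is a deliberately simplified model, not the encoder of the invention: each residual-signal-assisted pair encoding is approximated here by a plain mid/side split, where the "mid" plays the role of the downmix signal and the "side" plays the role of the residual signal, and the joint encoding of the two residual signals reuses the same split. The function names `pair_encode` and `hierarchical_encode` are illustrative assumptions.

```python
def pair_encode(ch_a, ch_b):
    """Hypothetical stand-in for a residual-signal-assisted pair
    encoding (cf. encoders 140 and 150): a plain mid/side split,
    where 'downmix' plays the role of the downmix signal and
    'residual' plays the role of the residual signal."""
    downmix = [0.5 * (a + b) for a, b in zip(ch_a, ch_b)]
    residual = [0.5 * (a - b) for a, b in zip(ch_a, ch_b)]
    return downmix, residual


def hierarchical_encode(ch1, ch2, ch3, ch4):
    # First stage: two residual-signal-assisted pair encodings
    # (cf. encoders 140 and 150).
    dmx1, res1 = pair_encode(ch1, ch2)
    dmx2, res2 = pair_encode(ch3, ch4)
    # Second stage: the two residual signals are jointly encoded
    # (cf. multi-channel encoder 160); if res1 and res2 are
    # correlated, the common residual res_common carries little energy.
    res_dmx, res_common = pair_encode(res1, res2)
    return dmx1, dmx2, (res_dmx, res_common)
```

If the four channel signals are pairwise similar, `res_common` stays small, which is the mechanism by which the joint encoding of the residual signals keeps the bitrate of the jointly-encoded representation 130 moderate.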

Further optional improvements of the audio encoder 100 are possible. Some of these improvements will be described taking reference to Figs. 4, 11 and 12. However, it should be noted that the audio encoder 100 can also be adapted in parallel with the audio decoders described herein, wherein the functionality of the audio encoder is typically inverse to the functionality of the audio decoder.

2. Audio decoder according to Fig. 2

Fig. 2 shows a block schematic diagram of an audio decoder, which is designated in its entirety with 200.

The audio decoder 200 is configured to receive an encoded representation which comprises a jointly-encoded representation 210 of a first residual signal and a second residual signal. The audio decoder 200 also receives a representation of a first downmix signal 212 and of a second downmix signal 214. The audio decoder 200 is configured to provide a first audio channel signal 220, a second audio channel signal 222, a third audio channel signal 224 and a fourth audio channel signal 226.

The audio decoder 200 comprises a multi-channel decoder 230, which is configured to provide a first residual signal 232 and a second residual signal 234 on the basis of the jointly-encoded representation 210 of the first residual signal 232 and of the second residual signal 234. The audio decoder 200 also comprises a (first) residual-signal-assisted multi-channel decoder 240 which is configured to provide the first audio channel signal 220 and the second audio channel signal 222 on the basis of the first downmix signal 212 and the first residual signal 232 using a multi-channel decoding. The audio decoder 200 also comprises a (second) residual-signal-assisted multi-channel decoder 250, which is configured to provide the third audio channel signal 224 and the fourth audio channel signal 226 on the basis of the second downmix signal 214 and the second residual signal 234.

Regarding the functionality of the audio decoder 200, it should be noted that the audio decoder 200 provides the first audio channel signal 220 and the second audio channel signal 222 using a (first) residual-signal-assisted multi-channel decoding 240, wherein the decoding quality of the multi-channel decoding is increased by the first residual signal 232 (when compared to a non-residual-signal-assisted decoding). In other words, the first downmix signal 212 provides "coarse" information about the first audio channel signal 220 and the second audio channel signal 222, wherein, for example, differences between the first audio channel signal 220 and the second audio channel signal 222 may be described by (optional) parameters, which may be received by the residual-signal-assisted multi-channel decoder 240, and by the first residual signal 232. Consequently, the first residual signal 232 may, for example, allow for a partial waveform reconstruction of the first audio channel signal 220 and of the second audio channel signal 222.

Similarly, the (second) residual-signal-assisted multi-channel decoder 250 provides the third audio channel signal 224 and the fourth audio channel signal 226 on the basis of the second downmix signal 214, wherein the second downmix signal 214 may, for example, "coarsely" describe the third audio channel signal 224 and the fourth audio channel signal 226. Moreover, differences between the third audio channel signal 224 and the fourth audio channel signal 226 may, for example, be described by (optional) parameters, which may be received by the (second) residual-signal-assisted multi-channel decoder 250, and by the second residual signal 234. Accordingly, the evaluation of the second residual signal 234 may, for example, allow for a partial waveform reconstruction of the third audio channel signal 224 and the fourth audio channel signal 226, and may thus allow for an enhancement of the quality of reconstruction of these signals.

However, the first residual signal 232 and the second residual signal 234 are derived from a jointly-encoded representation 210 of the first residual signal and of the second residual signal. Such a multi-channel decoding, which is performed by the multi-channel decoder 230, allows for a high decoding efficiency since the first audio channel signal 220, the second audio channel signal 222, the third audio channel signal 224 and the fourth audio channel signal 226 are typically similar or "correlated". Accordingly, the first residual signal 232 and the second residual signal 234 are typically also similar or "correlated", which can be exploited by deriving the first residual signal 232 and the second residual signal 234 from a jointly-encoded representation 210 using a multi-channel decoding.

Consequently, it is possible to obtain a high decoding quality with moderate bitrate by decoding the residual signals 232, 234 on the basis of a jointly-encoded representation 210 thereof, and by using each of the residual signals for the decoding of two or more audio channel signals.

To conclude, the audio decoder 200 allows for a high coding efficiency while providing high-quality audio channel signals 220, 222, 224, 226.
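Under the same simplified mid/side assumptions used for the encoder sketch, the decoding described above is the exact inverse: first recover the two residual signals from their jointly-encoded representation, then use each residual to refine one downmix into two channel signals. Again a sketch, not the decoder of any standard; `pair_decode` and `hierarchical_decode` are assumed helper names.

```python
def pair_decode(downmix, residual):
    """Hypothetical stand-in for a residual-signal-assisted pair
    decoding (cf. decoders 240 and 250): the residual refines the
    downmix into the two original channel signals."""
    first = [d + r for d, r in zip(downmix, residual)]
    second = [d - r for d, r in zip(downmix, residual)]
    return first, second


def hierarchical_decode(dmx1, dmx2, joint_residuals):
    # Recover both residual signals from the jointly-encoded
    # representation (cf. multi-channel decoder 230).
    res_dmx, res_common = joint_residuals
    res1, res2 = pair_decode(res_dmx, res_common)
    # Each residual then refines one downmix into two channel signals.
    ch1, ch2 = pair_decode(dmx1, res1)
    ch3, ch4 = pair_decode(dmx2, res2)
    return ch1, ch2, ch3, ch4
```

Note that each of the two residual signals serves two output channels, so the jointly-encoded residual representation is "amortized" over all four channel signals.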

It should be noted that additional features and functionalities, which can be implemented optionally in the audio decoder 200, will be described subsequently taking reference to Figs. 3, 5, 6 and 13. However, it should be noted that the audio decoder 200 provides the above-mentioned advantages even without any additional modification.

3. Audio decoder according to Fig. 3

Fig. 3 shows a block schematic diagram of an audio decoder according to another embodiment of the present invention. The audio decoder of Fig. 3 is designated in its entirety with 300. The audio decoder 300 is similar to the audio decoder 200 according to Fig. 2, such that the above explanations also apply. However, the audio decoder 300 is supplemented with additional features and functionalities when compared to the audio decoder 200, as will be explained in the following.

The audio decoder 300 is configured to receive a jointly-encoded representation 310 of a first residual signal and of a second residual signal. Moreover, the audio decoder 300 is configured to receive a jointly-encoded representation 360 of a first downmix signal and of a second downmix signal. Moreover, the audio decoder 300 is configured to provide a first audio channel signal 320, a second audio channel signal 322, a third audio channel signal 324 and a fourth audio channel signal 326. The audio decoder 300 comprises a multichannel decoder 330 which is configured to receive the jointly-encoded representation 310 of the first residual signal and of the second residual signal and to provide, on the basis thereof, a first residual signal 332 and a second residual signal 334. The audio decoder 300 also comprises a (first) residual-signal-assisted multi-channel decoding 340, which receives the first residual signal 332 and a first downmix signal 312, and provides the first audio channel signal 320 and the second audio channel signal 322. The audio decoder 300 also comprises a (second) residual-signal-assisted multi-channel decoding 350, which is configured to receive the second residual signal 334 and a second downmix signal 314, and to provide the third audio channel signal 324 and the fourth audio channel signal 326.

The audio decoder 300 also comprises another multi-channel decoder 370, which is configured to receive the jointly-encoded representation 360 of the first downmix signal and of the second downmix signal, and to provide, on the basis thereof, the first downmix signal 312 and the second downmix signal 314.

In the following, some further specific details of the audio decoder 300 will be described. However, it should be noted that an actual audio decoder does not need to implement a combination of all these additional features and functionalities. Rather, the features and functionalities described in the following can be individually added to the audio decoder 200 (or any other audio decoder), to gradually improve the audio decoder 200 (or any other audio decoder).

In a preferred embodiment, the audio decoder 300 receives a jointly-encoded representation 310 of the first residual signal and the second residual signal, wherein this jointly-encoded representation 310 may comprise a downmix signal of the first residual signal 332 and of the second residual signal 334, and a common residual signal of the first residual signal 332 and the second residual signal 334. In addition, the jointly-encoded representation 310 may, for example, comprise one or more prediction parameters. Accordingly, the multi-channel decoder 330 may be a prediction-based, residual-signal-assisted multi-channel decoder. For example, the multi-channel decoder 330 may be a USAC complex stereo prediction decoder, as described, for example, in the section "Complex Stereo Prediction" of the international standard ISO/IEC 23003-3:2012. For example, the multi-channel decoder 330 may be configured to evaluate a prediction parameter describing a contribution of a signal component, which is derived using a signal component of a previous frame, to the provision of the first residual signal 332 and the second residual signal 334 for a current frame. Moreover, the multi-channel decoder 330 may be configured to apply the common residual signal (which is included in the jointly-encoded representation 310) with a first sign, to obtain the first residual signal 332, and to apply the common residual signal with a second sign, which is opposite to the first sign, to obtain the second residual signal 334. Thus, the common residual signal may, at least partly, describe differences between the first residual signal 332 and the second residual signal 334.
However, the multi-channel decoder 330 may evaluate the downmix signal, the common residual signal and the one or more prediction parameters, which are all included in the jointly-encoded representation 310, to obtain the first residual signal 332 and the second residual signal 334 as described in the above-referenced international standard ISO/IEC 23003-3:2012. Moreover, it should be noted that the first residual signal 332 may be associated with a first horizontal position (or azimuth position), for example, a left horizontal position, and that the second residual signal 334 may be associated with a second horizontal position (or azimuth position), for example a right horizontal position, of an audio scene.
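The sign-based evaluation of the common residual signal described above can be sketched as follows. This is a strongly simplified, real-valued model inspired by (but not equal to) USAC complex stereo prediction, which operates on complex-valued spectral coefficients per frequency band; `alpha` stands in for a transmitted prediction parameter, and the function name is illustrative.

```python
def joint_residual_decode(res_downmix, common_residual, alpha):
    # Predicted side component: the prediction parameter applied to
    # the downmix of the two residual signals (simplified to a
    # real-valued, per-sample operation).
    side = [alpha * d + c for d, c in zip(res_downmix, common_residual)]
    # The side component enters with a first sign for the first
    # residual signal 332 and with the opposite sign for the second
    # residual signal 334.
    first_residual = [d + s for d, s in zip(res_downmix, side)]
    second_residual = [d - s for d, s in zip(res_downmix, side)]
    return first_residual, second_residual
```

In the terms of the description above, the first output would be associated with the left azimuth position of the audio scene and the second output with the right azimuth position.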

The jointly-encoded representation 360 of the first downmix signal and of the second downmix signal preferably comprises a downmix signal of the first downmix signal and of the second downmix signal, a common residual signal of the first downmix signal and of the second downmix signal, and one or more prediction parameters. In other words, there is a "common" downmix signal, into which the first downmix signal 312 and the second downmix signal 314 are downmixed, and there is a "common" residual signal which may describe, at least partly, differences between the first downmix signal 312 and the second downmix signal 314. The multi-channel decoder 370 is preferably a prediction-based, residual-signal-assisted multi-channel decoder, for example, a USAC complex stereo prediction decoder. In other words, the multi-channel decoder 370, which provides the first downmix signal 312 and the second downmix signal 314 may be substantially identical to the multi-channel decoder 330, which provides the first residual signal 332 and the second residual signal 334, such that the above explanations and references also apply. Moreover, it should be noted that the first downmix signal 312 is preferably associated with a first horizontal position or azimuth position (for example, left horizontal position or azimuth position) of the audio scene, and that the second downmix signal 314 is preferably associated with a second horizontal position or azimuth position (for example, right horizontal position or azimuth position) of the audio scene. Accordingly, the first downmix signal 312 and the first residual signal 332 may be associated with the same, first horizontal position or azimuth position (for example, left horizontal position), and the second downmix signal 314 and the second residual signal 334 may be associated with the same, second horizontal position or azimuth position (for example, right horizontal position). 
Accordingly, both the multi-channel decoder 370 and the multi-channel decoder 330 may perform a horizontal splitting (or horizontal separation or horizontal distribution).

The residual-signal-assisted multi-channel decoder 340 may preferably be parameter-based, and may consequently receive one or more parameters 342 describing a desired correlation between two channels (for example, between the first audio channel signal 320 and the second audio channel signal 322) and/or level differences between said two channels. For example, the residual-signal-assisted multi-channel decoding 340 may be based on an MPEG Surround coding (as described, for example, in ISO/IEC 23003-1:2007) with a residual signal extension, or on a "unified stereo decoding" decoder (as described, for example, in ISO/IEC 23003-3, chapter 7.11 (Decoder) and Annex B.21 (Description of the Encoder & Definition of the Term "Unified Stereo")). Accordingly, the residual-signal-assisted multi-channel decoder 340 may provide the first audio channel signal 320 and the second audio channel signal 322, wherein the first audio channel signal 320 and the second audio channel signal 322 are associated with vertically neighboring positions of the audio scene. For example, the first audio channel signal may be associated with a lower left position of the audio scene, and the second audio channel signal may be associated with an upper left position of the audio scene (such that the first audio channel signal 320 and the second audio channel signal 322 are, for example, associated with identical horizontal positions or azimuth positions of the audio scene, or with azimuth positions separated by no more than 30 degrees). In other words, the residual-signal-assisted multi-channel decoder 340 may perform a vertical splitting (or distribution, or separation).

The functionality of the residual-signal-assisted multi-channel decoder 350 may be identical to the functionality of the residual-signal-assisted multi-channel decoder 340, wherein the third audio channel signal may, for example, be associated with a lower right position of the audio scene, and wherein the fourth audio channel signal may, for example, be associated with an upper right position of the audio scene. In other words, the third audio channel signal and the fourth audio channel signal may be associated with vertically neighboring positions of the audio scene, and may be associated with the same horizontal position or azimuth position of the audio scene, wherein the residual-signal-assisted multi-channel decoder 350 performs a vertical splitting (or separation, or distribution).

To summarize, the audio decoder 300 according to Fig. 3 performs a hierarchical audio decoding, wherein a left-right splitting is performed in the first stage (multi-channel decoder 330, multi-channel decoder 370), and wherein an upper-lower splitting is performed in the second stage (residual-signal-assisted multi-channel decoders 340, 350). Moreover, the residual signals 332, 334 are encoded using a jointly-encoded representation 310, just like the downmix signals 312, 314 (jointly-encoded representation 360). Thus, correlations between the different channels are exploited both for the encoding (and decoding) of the downmix signals 312, 314 and for the encoding (and decoding) of the residual signals 332, 334. Accordingly, a high coding efficiency is achieved, and the correlations between the signals are well exploited.
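The complete two-stage routing of Fig. 3 (left-right splitting first, upper-lower splitting second) can be sketched on single samples. Each joint decoding is again approximated by a mid/side-style split; the position labels follow the description above, but the `ms_split` helper itself is an assumption.

```python
def ms_split(mid, side):
    # Simplified stand-in for one residual-assisted joint decoding:
    # downmix plus/minus residual.
    return mid + side, mid - side


def decode_fig3(joint_downmix, joint_residual):
    # First stage: left/right splitting
    # (multi-channel decoders 370 and 330).
    dmx_left, dmx_right = ms_split(*joint_downmix)
    res_left, res_right = ms_split(*joint_residual)
    # Second stage: vertical splitting (residual-signal-assisted
    # multi-channel decoders 340 and 350), each downmix refined by
    # the residual of the same azimuth position.
    lower_left, upper_left = ms_split(dmx_left, res_left)
    lower_right, upper_right = ms_split(dmx_right, res_right)
    return {"lower left": lower_left, "upper left": upper_left,
            "lower right": lower_right, "upper right": upper_right}
```

The pairing of downmix and residual by azimuth position (left with left, right with right) mirrors the association described in the preceding paragraphs.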

4. Audio encoder according to Fig. 4

Fig. 4 shows a block schematic diagram of an audio encoder, according to another embodiment of the present invention. The audio encoder according to Fig. 4 is designated in its entirety with 400. The audio encoder 400 is configured to receive four audio channel signals, namely a first audio channel signal 410, a second audio channel signal 412, a third audio channel signal 414 and a fourth audio channel signal 416. Moreover, the audio encoder 400 is configured to provide an encoded representation on the basis of the audio channel signals 410, 412, 414 and 416, wherein said encoded representation comprises a jointly-encoded representation 420 of two downmix signals, as well as an encoded representation of a first set 422 of common bandwidth extension parameters and of a second set 424 of common bandwidth extension parameters. The audio encoder 400 comprises a first bandwidth extension parameter extractor 430, which is configured to obtain the first set 422 of common bandwidth extension parameters on the basis of the first audio channel signal 410 and the third audio channel signal 414. The audio encoder 400 also comprises a second bandwidth extension parameter extractor 440, which is configured to obtain the second set 424 of common bandwidth extension parameters on the basis of the second audio channel signal 412 and the fourth audio channel signal 416.

Moreover, the audio encoder 400 comprises a (first) multi-channel encoder 450, which is configured to jointly-encode at least the first audio channel signal 410 and the second audio channel signal 412 using a multi-channel encoding, to obtain a first downmix signal 452. Further, the audio encoder 400 also comprises a (second) multi-channel encoder 460, which is configured to jointly-encode at least the third audio channel signal 414 and the fourth audio channel signal 416 using a multi-channel encoding, to obtain a second downmix signal 462. Further, the audio encoder 400 also comprises a (third) multi-channel encoder 470, which is configured to jointly-encode the first downmix signal 452 and the second downmix signal 462 using a multi-channel encoding, to obtain the jointly-encoded representation 420 of the downmix signals.

Regarding the functionality of the audio encoder 400, it should be noted that the audio encoder 400 performs a hierarchical multi-channel encoding, wherein the first audio channel signal 410 and the second audio channel signal 412 are combined in a first stage, and wherein the third audio channel signal 414 and the fourth audio channel signal 416 are also combined in the first stage, to thereby obtain the first downmix signal 452 and the second downmix signal 462. The first downmix signal 452 and the second downmix signal 462 are then jointly encoded in a second stage. However, it should be noted that the first bandwidth extension parameter extractor 430 provides the first set 422 of common bandwidth extension parameters on the basis of audio channel signals 410, 414 which are handled by different multi-channel encoders 450, 460 in the first stage of the hierarchical multi-channel encoding. Similarly, the second bandwidth extension parameter extractor 440 provides the second set 424 of common bandwidth extension parameters on the basis of different audio channel signals 412, 416, which are handled by different multi-channel encoders 450, 460 in the first processing stage. This specific processing order brings along the advantage that the sets 422, 424 of bandwidth extension parameters are based on channels which are only combined in the second stage of the hierarchical encoding (i.e., in the multi-channel encoder 470). This is advantageous, since it is desirable to combine, in the first stage of the hierarchical encoding, those audio channels whose relationship is not highly relevant with respect to a sound source position perception.
Rather, it is recommendable that the relationship between the first downmix signal and the second downmix signal mainly determines the sound source location perception, because the relationship between the first downmix signal 452 and the second downmix signal 462 can be maintained better than the relationship between the individual audio channel signals 410, 412, 414, 416. Worded differently, it has been found desirable that the first set 422 of common bandwidth extension parameters is based on two audio channel signals which contribute to different ones of the downmix signals 452, 462, and that the second set 424 of common bandwidth extension parameters is provided on the basis of audio channel signals 412, 416 which also contribute to different ones of the downmix signals 452, 462, which is achieved by the above-described processing of the audio channel signals in the hierarchical multi-channel encoding. Consequently, the first set 422 of common bandwidth extension parameters is based on a channel relationship similar to the channel relationship between the first downmix signal 452 and the second downmix signal 462, wherein the latter typically dominates the spatial impression generated at the side of an audio decoder. Accordingly, the provision of the first set 422 of bandwidth extension parameters, and also the provision of the second set 424 of bandwidth extension parameters, is well-adapted to the spatial hearing impression which is generated at the side of an audio decoder.
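The processing order of Fig. 4 (bandwidth extension parameters extracted across the pairs, downmixes formed within them) can be sketched as follows. `extract_bwe_parameters` is a crude, illustrative stand-in for a real extractor (which would, for example, perform an SBR-style spectral analysis); only the routing of the channel signals is the point of this sketch.

```python
def extract_bwe_parameters(ch_a, ch_b):
    # Illustrative placeholder: report the energies of the channel pair.
    energy = lambda ch: sum(x * x for x in ch)
    return {"energy_a": energy(ch_a), "energy_b": energy(ch_b)}


def encode_fig4(ch1, ch2, ch3, ch4):
    # Common bandwidth extension parameters are extracted from channel
    # signals that only meet in the SECOND encoding stage
    # (extractors 430 and 440).
    bwe_set_1 = extract_bwe_parameters(ch1, ch3)
    bwe_set_2 = extract_bwe_parameters(ch2, ch4)
    pair_downmix = lambda a, b: [0.5 * (x + y) for x, y in zip(a, b)]
    # First stage: downmix within each pair (encoders 450 and 460).
    dmx1 = pair_downmix(ch1, ch2)
    dmx2 = pair_downmix(ch3, ch4)
    # Second stage: joint encoding of the two downmix signals
    # (encoder 470); only the common downmix is kept in this sketch.
    joint_downmix = pair_downmix(dmx1, dmx2)
    return joint_downmix, bwe_set_1, bwe_set_2
```

Note that signals 410 and 414 (here `ch1` and `ch3`) never share a first-stage encoder, so the extracted parameter set 422 reflects a relationship that survives until the second encoding stage.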

5. Audio decoder according to Fig. 5

Fig. 5 shows a block schematic diagram of an audio decoder, according to another embodiment of the present invention. The audio decoder according to Fig. 5 is designated in its entirety with 500.

The audio decoder 500 is configured to receive a jointly-encoded representation 510 of a first downmix signal and a second downmix signal. Moreover, the audio decoder 500 is configured to provide a first bandwidth-extended channel signal 520, a second bandwidth-extended channel signal 522, a third bandwidth-extended channel signal 524 and a fourth bandwidth-extended channel signal 526.

The audio decoder 500 comprises a (first) multi-channel decoder 530, which is configured to provide a first downmix signal 532 and a second downmix signal 534 on the basis of the jointly-encoded representation 510 of the first downmix signal and the second downmix signal using a multi-channel decoding. The audio decoder 500 also comprises a (second) multi-channel decoder 540, which is configured to provide at least a first audio channel signal 542 and a second audio channel signal 544 on the basis of the first downmix signal 532 using a multi-channel decoding. The audio decoder 500 also comprises a (third) multi-channel decoder 550, which is configured to provide at least a third audio channel signal 556 and a fourth audio channel signal 558 on the basis of the second downmix signal 534 using a multi-channel decoding. Moreover, the audio decoder 500 comprises a (first) multi-channel bandwidth extension 560, which is configured to perform a multi-channel bandwidth extension on the basis of the first audio channel signal 542 and the third audio channel signal 556, to obtain the first bandwidth-extended channel signal 520 and the third bandwidth-extended channel signal 524. Moreover, the audio decoder comprises a (second) multi-channel bandwidth extension 570, which is configured to perform a multi-channel bandwidth extension on the basis of the second audio channel signal 544 and the fourth audio channel signal 558, to obtain the second bandwidth-extended channel signal 522 and the fourth bandwidth-extended channel signal 526.

Regarding the functionality of the audio decoder 500, it should be noted that the audio decoder 500 performs a hierarchical multi-channel decoding, wherein a splitting between a first downmix signal 532 and a second downmix signal 534 is performed in a first stage of the hierarchical decoding, wherein the first audio channel signal 542 and the second audio channel signal 544 are derived from the first downmix signal 532 in a second stage of the hierarchical decoding, and wherein the third audio channel signal 556 and the fourth audio channel signal 558 are derived from the second downmix signal 534 in the second stage of the hierarchical decoding. However, the first multi-channel bandwidth extension 560 and the second multi-channel bandwidth extension 570 each receive one audio channel signal which is derived from the first downmix signal 532 and one audio channel signal which is derived from the second downmix signal 534. Since a better channel separation is typically achieved by the (first) multi-channel decoding 530, which is performed as the first stage of the hierarchical multi-channel decoding, when compared to the second stage of the hierarchical decoding, each multi-channel bandwidth extension 560, 570 receives input signals which are well-separated (because they originate from the first downmix signal 532 and the second downmix signal 534, which are well channel-separated). Thus, the multi-channel bandwidth extensions 560, 570 can consider stereo characteristics, which are important for a hearing impression, and which are well-represented by the relationship between the first downmix signal 532 and the second downmix signal 534, and can therefore provide a good hearing impression.

In other words, the "cross" structure of the audio decoder, wherein each of the multi-channel bandwidth extension stages 560, 570 receives input signals from both (second-stage) multi-channel decoders 540, 550, allows for a good multi-channel bandwidth extension which considers the stereo relationship between the channels.
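The "cross" routing itself can be sketched with trivial placeholders: each second-stage pair decoder simply duplicates its downmix, and the bandwidth extension just tags its two inputs with the parameter set it was given. Only the wiring (which channel signal reaches which bandwidth extension stage) corresponds to the structure described above; all helpers are assumptions.

```python
def decode_fig5(dmx1, dmx2, bwe_params_1, bwe_params_2):
    # Second-stage pair decoders (540 and 550), trivially modeled:
    # each downmix yields one vertical pair of channel signals.
    ch1, ch2 = dmx1, dmx1
    ch3, ch4 = dmx2, dmx2
    apply_bwe = lambda a, b, p: ((a, p), (b, p))
    # Cross structure: each multi-channel bandwidth extension receives
    # one channel signal per downmix, i.e. a well-separated left/right
    # pair (extensions 560 and 570).
    bwe_ch1, bwe_ch3 = apply_bwe(ch1, ch3, bwe_params_1)
    bwe_ch2, bwe_ch4 = apply_bwe(ch2, ch4, bwe_params_2)
    return bwe_ch1, bwe_ch2, bwe_ch3, bwe_ch4
```

Each bandwidth extension stage thus operates on a left/right pair rather than on a vertical pair, which is the property the preceding paragraph attributes the good hearing impression to.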

However, it should be noted that the audio decoder 500 can be supplemented by any of the features and functionalities described herein with respect to the audio decoders according to Figs. 2, 3, 6 and 13, wherein it is possible to introduce individual features into the audio decoder 500 to gradually improve the performance of the audio decoder.

6. Audio decoder according to Fig. 6

Fig. 6 shows a block schematic diagram of an audio decoder according to another embodiment of the present invention. The audio decoder according to Fig. 6 is designated in its entirety with 600. The audio decoder 600 according to Fig. 6 is similar to the audio decoder 500 according to Fig. 5, such that the above explanations also apply. However, the audio decoder 600 has been supplemented by some features and functionalities, which can also be introduced, individually or in combination, into the audio decoder 500 for improvement.

The audio decoder 600 is configured to receive a jointly-encoded representation 610 of a first downmix signal and of a second downmix signal and to provide a first bandwidth-extended channel signal 620, a second bandwidth-extended channel signal 622, a third bandwidth-extended channel signal 624 and a fourth bandwidth-extended channel signal 626. The audio decoder 600 comprises a multi-channel decoder 630, which is configured to receive the jointly-encoded representation 610 of the first downmix signal and of the second downmix signal, and to provide, on the basis thereof, the first downmix signal 632 and the second downmix signal 634. The audio decoder 600 further comprises a multi-channel decoder 640, which is configured to receive the first downmix signal 632 and to provide, on the basis thereof, a first audio channel signal 642 and a second audio channel signal 644. The audio decoder 600 also comprises a multi-channel decoder 650, which is configured to receive the second downmix signal 634 and to provide a third audio channel signal 656 and a fourth audio channel signal 658. The audio decoder 600 also comprises a (first) multi-channel bandwidth extension 660, which is configured to receive the first audio channel signal 642 and the third audio channel signal 656 and to provide, on the basis thereof, the first bandwidth-extended channel signal 620 and the third bandwidth-extended channel signal 624. Also, a (second) multi-channel bandwidth extension 670 receives the second audio channel signal 644 and the fourth audio channel signal 658 and provides, on the basis thereof, the second bandwidth-extended channel signal 622 and the fourth bandwidth-extended channel signal 626.

The audio decoder 600 also comprises a further multi-channel decoder 680, which is configured to receive a jointly-encoded representation 682 of a first residual signal and of a second residual signal and which provides, on the basis thereof, a first residual signal 684 for usage by the multi-channel decoder 640 and a second residual signal 686 for usage by the multi-channel decoder 650.

The multi-channel decoder 630 is preferably a prediction-based, residual-signal-assisted multi-channel decoder. For example, the multi-channel decoder 630 may be substantially identical to the multi-channel decoder 370 described above. For example, the multi-channel decoder 630 may be a USAC complex stereo prediction decoder, as mentioned above, and as described in the USAC standard referenced above. Accordingly, the jointly-encoded representation 610 of the first downmix signal and of the second downmix signal may, for example, comprise a (common) downmix signal of the first downmix signal and of the second downmix signal, a (common) residual signal of the first downmix signal and of the second downmix signal, and one or more prediction parameters, which are evaluated by the multi-channel decoder 630.

Moreover, it should be noted that the first downmix signal 632 may, for example, be associated with a first horizontal position or azimuth position (for example, a left horizontal position) of an audio scene and that the second downmix signal 634 may, for example, be associated with a second horizontal position or azimuth position (for example, a right horizontal position) of the audio scene.

Moreover, the multi-channel decoder 680 may, for example, be a prediction-based, residual-signal-assisted multi-channel decoder. The multi-channel decoder 680 may be substantially identical to the multi-channel decoder 330 described above. For example, the multi-channel decoder 680 may be a USAC complex stereo prediction decoder, as mentioned above. Consequently, the jointly-encoded representation 682 of the first residual signal and of the second residual signal may comprise a (common) downmix signal of the first residual signal and of the second residual signal, a (common) residual signal of the first residual signal and of the second residual signal, and one or more prediction parameters, which are evaluated by the multi-channel decoder 680. Moreover, it should be noted that the first residual signal 684 may be associated with a first horizontal position or azimuth position (for example, a left horizontal position) of the audio scene, and that the second residual signal 686 may be associated with a second horizontal position or azimuth position (for example, a right horizontal position) of the audio scene.

The multi-channel decoder 640 may, for example, be a parameter-based multi-channel decoder like, for example, an MPEG Surround multi-channel decoder, as described above and in the referenced standard. However, in the presence of the (optional) multi-channel decoder 680 and the (optional) first residual signal 684, the multi-channel decoder 640 may be a parameter-based, residual-signal-assisted multi-channel decoder, like, for example, a unified stereo decoder. Thus, the multi-channel decoder 640 may be substantially identical to the multi-channel decoder 340 described above, and the multi-channel decoder 640 may, for example, receive the parameters 342 described above.

Similarly, the multi-channel decoder 650 may be substantially identical to the multi-channel decoder 640. Accordingly, the multi-channel decoder 650 may, for example, be parameter-based and may optionally be residual-signal-assisted (in the presence of the optional multi-channel decoder 680).

Moreover, it should be noted that the first audio channel signal 642 and the second audio channel signal 644 are preferably associated with vertically adjacent spatial positions of the audio scene. For example, the first audio channel signal 642 is associated with a lower left position of the audio scene and the second audio channel signal 644 is associated with an upper left position of the audio scene. Accordingly, the multi-channel decoder 640 performs a vertical splitting (or separation or distribution) of the audio content described by the first downmix signal 632 (and, optionally, by the first residual signal 684). Similarly, the third audio channel signal 656 and the fourth audio channel signal 658 are associated with vertically adjacent positions of the audio scene, and are preferably associated with the same horizontal position or azimuth position of the audio scene. For example, the third audio channel signal 656 is preferably associated with a lower right position of the audio scene and the fourth audio channel signal 658 is preferably associated with an upper right position of the audio scene. Thus, the multi-channel decoder 650 performs a vertical splitting (or separation, or distribution) of the audio content described by the second downmix signal 634 (and, optionally, the second residual signal 686).
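The vertical splitting performed by the multi-channel decoders 640 and 650 can be sketched as a simple parametric upmix. The gain formula below, derived from an assumed channel level difference (CLD) in dB, is only one plausible simplification in the spirit of an MPEG Surround 2-1-2 / unified stereo upmix; the function name, the parameterization, and the way the residual is applied are illustrative assumptions, not the standardized processing:

```python
import math

def upmix_2_1_2(downmix, residual, cld_db):
    """Split one downmix into two (e.g. vertically adjacent) channels.
    cld_db is an assumed channel level difference in dB between the two
    output channels; the residual refines the parametric reconstruction."""
    c = 10.0 ** (cld_db / 10.0)        # linear power ratio between channels
    g1 = math.sqrt(c / (1.0 + c))      # gain for the first output channel
    g2 = math.sqrt(1.0 / (1.0 + c))    # gain for the second output channel
    ch1 = [g1 * d + r for d, r in zip(downmix, residual)]
    ch2 = [g2 * d - r for d, r in zip(downmix, residual)]
    return ch1, ch2
```

With a CLD of 0 dB the downmix energy is distributed equally to both output channels, and the residual adds the component that the parametric model alone cannot represent.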

However, the first multi-channel bandwidth extension 660 receives the first audio channel signal 642 and the third audio channel signal 656, which are associated with the lower left position and the lower right position of the audio scene. Accordingly, the first multi-channel bandwidth extension 660 performs a multi-channel bandwidth extension on the basis of two audio channel signals which are associated with the same horizontal plane (for example, the lower horizontal plane) or elevation of the audio scene and with different sides (left/right) of the audio scene. Accordingly, the multi-channel bandwidth extension can consider stereo characteristics (for example, the human stereo perception) when performing the bandwidth extension. Similarly, the second multi-channel bandwidth extension 670 may also consider stereo characteristics, since the second multi-channel bandwidth extension operates on audio channel signals of the same horizontal plane (for example, the upper horizontal plane) or elevation but at different horizontal positions (different sides, left/right) of the audio scene.

To conclude, the hierarchical audio decoder 600 comprises a structure wherein a left/right splitting (or separation, or distribution) is performed in a first stage (multi-channel decoding 630, 680), wherein a vertical splitting (or separation, or distribution) is performed in a second stage (multi-channel decoding 640, 650), and wherein the multi-channel bandwidth extension operates on a pair of left/right signals (multi-channel bandwidth extension 660, 670). This "crossing" of the decoding paths allows the left/right separation, which is particularly important for the hearing impression (for example, more important than the upper/lower splitting), to be performed in the first processing stage of the hierarchical audio decoder, and allows the multi-channel bandwidth extension also to be performed on a pair of left/right audio channel signals, which again results in a particularly good hearing impression. The upper/lower splitting is performed as an intermediate stage between the left/right separation and the multi-channel bandwidth extension, which makes it possible to derive four audio channel signals (or bandwidth-extended channel signals) without significantly degrading the hearing impression.

7. Method according to Fig. 7

Fig. 7 shows a flow chart of a method 700 for providing an encoded representation on the basis of at least four audio channel signals.

The method 700 comprises jointly encoding 710 at least a first audio channel signal and a second audio channel signal using a residual-signal-assisted multi-channel encoding, to obtain a first downmix signal and a first residual signal. The method also comprises jointly encoding 720 at least a third audio channel signal and a fourth audio channel signal using a residual-signal-assisted multi-channel encoding, to obtain a second downmix signal and a second residual signal. The method further comprises jointly encoding 730 the first residual signal and the second residual signal using a multi-channel encoding, to obtain an encoded representation of the residual signals. However, it should be noted that the method 700 can be supplemented by any of the features and functionalities described herein with respect to the audio encoders and audio decoders.

8. Method according to Fig. 8

Fig. 8 shows a flow chart of a method 800 for providing at least four audio channel signals on the basis of an encoded representation.

The method 800 comprises providing 810 a first residual signal and a second residual signal on the basis of a jointly-encoded representation of the first residual signal and the second residual signal using a multi-channel decoding. The method 800 also comprises providing 820 a first audio channel signal and a second audio channel signal on the basis of a first downmix signal and the first residual signal using a residual-signal-assisted multichannel decoding. The method also comprises providing 830 a third audio channel signal and a fourth audio channel signal on the basis of a second downmix signal and the second residual signal using a residual-signal-assisted multi-channel decoding.

Moreover, it should be noted that the method 800 can be supplemented by any of the features and functionalities described herein with respect to the audio decoders and audio encoders.

9. Method according to Fig. 9

Fig. 9 shows a flow chart of a method 900 for providing an encoded representation on the basis of at least four audio channel signals.

The method 900 comprises obtaining 910 a first set of common bandwidth extension parameters on the basis of a first audio channel signal and a third audio channel signal. The method 900 also comprises obtaining 920 a second set of common bandwidth extension parameters on the basis of a second audio channel signal and a fourth audio channel signal. The method also comprises jointly encoding 930 at least the first audio channel signal and the second audio channel signal using a multi-channel encoding, to obtain a first downmix signal, and jointly encoding 940 at least the third audio channel signal and the fourth audio channel signal using a multi-channel encoding, to obtain a second downmix signal. The method also comprises jointly encoding 950 the first downmix signal and the second downmix signal using a multi-channel encoding, to obtain an encoded representation of the downmix signals.
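One plausible way steps 910 and 920 could derive a single parameter set shared by a left/right channel pair is to average per-band high-band envelope energies of both channels. This is only a crude sketch; the actual bandwidth extension parameterization (e.g. SBR as used in USAC) is considerably more elaborate, and all names below are illustrative assumptions:

```python
import math

def band_energies(signal, num_bands):
    """Mean power of the signal in num_bands equal-width blocks
    (a crude stand-in for a filter-bank analysis)."""
    n = len(signal) // num_bands
    return [sum(x * x for x in signal[i * n:(i + 1) * n]) / n
            for i in range(num_bands)]

def common_bwe_params(ch_a, ch_b, num_bands=4):
    """One common envelope for a channel pair (e.g. lower left and
    lower right), as the per-band average of both channels' energies."""
    ea = band_energies(ch_a, num_bands)
    eb = band_energies(ch_b, num_bands)
    return [10.0 * math.log10((a + b) / 2.0 + 1e-12)  # dB, with small floor
            for a, b in zip(ea, eb)]
```

Sharing one envelope per left/right pair is what makes the parameters "common" in the sense of steps 910 and 920, and it is consistent with pairing channels of the same elevation.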

It should be noted that some of the steps of the method 900, which do not comprise specific interdependencies, can be performed in arbitrary order or in parallel. Moreover, it should be noted that the method 900 can be supplemented by any of the features and functionalities described herein with respect to the audio encoders and audio decoders.

10. Method according to Fig. 10

Fig. 10 shows a flow chart of a method 1000 for providing at least four audio channel signals on the basis of an encoded representation.

The method 1000 comprises providing 1010 a first downmix signal and a second downmix signal on the basis of a jointly encoded representation of the first downmix signal and the second downmix signal using a multi-channel decoding, providing 1020 at least a first audio channel signal and a second audio channel signal on the basis of the first downmix signal using a multi-channel decoding, providing 1030 at least a third audio channel signal and a fourth audio channel signal on the basis of the second downmix signal using a multi-channel decoding, performing 1040 a multi-channel bandwidth extension on the basis of the first audio channel signal and the third audio channel signal, to obtain a first bandwidth-extended channel signal and a third bandwidth-extended channel signal, and performing 1050 a multi-channel bandwidth extension on the basis of the second audio channel signal and the fourth audio channel signal, to obtain a second bandwidth-extended channel signal and a fourth bandwidth-extended channel signal.
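The ordering of steps 1010 to 1050 can be sketched as a small pipeline. The stage functions below are trivial stand-ins (a sum/difference split and a gain for the "bandwidth extension"), chosen only to make the step sequence runnable; the names and the zero-side simplification in steps 1020/1030 are illustrative assumptions:

```python
def split(m, s):
    """Trivial stand-in for a multi-channel decoding of a jointly encoded
    pair (sum/difference instead of prediction or spatial parameters)."""
    return ([x + y for x, y in zip(m, s)],
            [x - y for x, y in zip(m, s)])

def bwe_pair(a, b, gain):
    """Stand-in for a multi-channel bandwidth extension operating on a
    left/right pair; here it merely applies a common gain."""
    return [gain * x for x in a], [gain * x for x in b]

def decode_method_1000(dmx_joint_m, dmx_joint_s):
    dmx1, dmx2 = split(dmx_joint_m, dmx_joint_s)  # step 1010: left/right split
    zeros = [0.0] * len(dmx1)
    ch1, ch2 = split(dmx1, zeros)                 # step 1020: split left downmix
    ch3, ch4 = split(dmx2, zeros)                 # step 1030: split right downmix
    bwe1, bwe3 = bwe_pair(ch1, ch3, 1.0)          # step 1040: lower left/right pair
    bwe2, bwe4 = bwe_pair(ch2, ch4, 1.0)          # step 1050: upper left/right pair
    return bwe1, bwe2, bwe3, bwe4
```

Note that steps 1040 and 1050 pair the first with the third and the second with the fourth channel signal, i.e. the bandwidth extension again operates on left/right pairs, as in the hierarchical decoder described above.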

Claims

1. An audio decoder (200; 300; 600; 1300; 1600; 2000) for providing at least four audio channel signals (220, 222, 224, 226; 320, 322, 324, 326; 620, 622, 624, 626; 1320, 1322, 1324, 1326) on the basis of an encoded representation (210; 310, 360; 610, 682; 1310, 1312; 1610),

wherein the audio decoder is configured to provide a first residual signal (232; 332; 684; 1362) and a second residual signal (234; 334; 686; 1364) on the basis of a jointly encoded representation (210; 310; 682; 1312) of the first residual signal and of the second residual signal using a multi-channel decoding (230; 330; 680; 1360);

wherein the audio decoder is configured to provide a first audio channel signal (220; 320; 642; 1372) and a second audio channel signal (222; 322; 644; 1374) on the basis of a first downmix signal (212; 312; 632; 1342) and the first residual signal using a residual-signal-assisted multi-channel decoding (240; 340; 640; 1370); and

wherein the audio decoder is configured to provide a third audio channel signal (224; 324; 656; 1382) and a fourth audio channel signal (226; 326; 658; 1384) on the basis of a second downmix signal (214; 314; 634; 1344) and the second residual signal using a residual-signal-assisted multi-channel decoding (250; 350; 650; 1380).

2. The audio decoder according to claim 1, wherein the audio decoder is configured to provide the first downmix signal (212; 312; 632; 1342) and the second downmix signal (214; 314; 634; 1344) on the basis of a jointly-encoded representation (360; 610; 1310) of the first downmix signal and the second downmix signal using a multi-channel decoding (370; 630; 1340).

3. The audio decoder according to claim 1 or claim 2, wherein the audio decoder is configured to provide the first residual signal and the second residual signal on the basis of the jointly encoded representation of the first residual signal and of the second residual signal using a prediction-based multi-channel decoding.

4. The audio decoder according to one of claims 1 to 3, wherein the audio decoder is configured to provide the first residual signal and the second residual signal on the basis of the jointly encoded representation of the first residual signal and of the second residual signal using a residual-signal-assisted multi-channel decoding.

5. The audio decoder according to claim 3, wherein the prediction-based multi-channel decoding is configured to evaluate a prediction parameter describing a contribution of a signal component, which is derived using a signal component of a previous frame, to the provision of the residual signals of the current frame.

6. The audio decoder according to one of claims 3 to 5, wherein the prediction-based multi-channel decoding is configured to obtain the first residual signal and the second residual signal on the basis of a downmix signal of the first residual signal and of the second residual signal and on the basis of a common residual signal of the first residual signal and the second residual signal.

7. The audio decoder according to claim 6, wherein the prediction-based multi-channel decoding is configured to apply the common residual signal with a first sign, to obtain the first residual signal, and to apply the common residual signal with a second sign, which is opposite to the first sign, to obtain the second residual signal.

8. The audio decoder according to one of claims 1 to 7, wherein the audio decoder is configured to provide the first residual signal and the second residual signal on the basis of the jointly encoded representation of the first residual signal and of the second residual signal using a multi-channel decoding which is operative in an MDCT domain.

9. The audio decoder according to one of claims 1 to 8, wherein the audio decoder is configured to provide the first residual signal and the second residual signal on the basis of the jointly encoded representation of the first residual signal and of the second residual signal using a USAC Complex Stereo Prediction.

10. The audio decoder according to one of claims 1 to 9,

wherein the audio decoder is configured to provide the first audio channel signal and the second audio channel signal on the basis of the first downmix signal and the first residual signal using a parameter-based residual-signal-assisted multi-channel decoding; and

wherein the audio decoder is configured to provide the third audio channel signal and the fourth audio channel signal on the basis of the second downmix signal and the second residual signal using a parameter-based residual-signal-assisted multi-channel decoding.

11. The audio decoder according to claim 10, wherein the parameter-based residual-signal-assisted multi-channel decoding is configured to evaluate one or more parameters describing a desired correlation between two channels and/or level differences between two channels in order to provide the two or more audio channel signals on the basis of a respective one of the downmix signals and a corresponding one of the residual signals.

12. The audio decoder according to one of claims 1 to 11, wherein the audio decoder is configured to provide the first audio channel signal and the second audio channel signal on the basis of the first downmix signal and the first residual signal using a residual-signal-assisted multi-channel decoding which is operative in a QMF domain; and

wherein the audio decoder is configured to provide the third audio channel signal and the fourth audio channel signal on the basis of the second downmix signal and the second residual signal using a residual-signal-assisted multi-channel decoding which is operative in the QMF domain.

13. The audio decoder according to one of claims 1 to 12, wherein the audio decoder is configured to provide the first audio channel signal and the second audio channel signal on the basis of the first downmix signal and the first residual signal using an MPEG Surround 2-1-2 decoding or a Unified Stereo Decoding; and

wherein the audio decoder is configured to provide the third audio channel signal and the fourth audio channel signal on the basis of the second downmix signal and the second residual signal using an MPEG Surround 2-1-2 decoding or a Unified Stereo Decoding.

14. The audio decoder according to one of claims 1 to 13, wherein the first residual signal and the second residual signal are associated with different horizontal positions of an audio scene or with different azimuth positions of the audio scene.

15. The audio decoder according to one of claims 1 to 14, wherein the first audio channel signal and the second audio channel signal are associated with vertically neighboring positions of an audio scene, and

wherein the third audio channel signal and the fourth audio channel signal are associated with vertically neighboring positions of the audio scene.

16. The audio decoder according to one of claims 1 to 15, wherein the first audio channel signal and the second audio channel signal are associated with a first horizontal position or azimuth position of an audio scene, and

wherein the third audio channel signal and the fourth audio channel signal are associated with a second horizontal position or azimuth position of the audio scene, which is different from the first horizontal position or the first azimuth position.

17. The audio decoder according to one of claims 1 to 16, wherein the first residual signal is associated with a left side of an audio scene, and wherein the second residual signal is associated with a right side of an audio scene.

18. The audio decoder according to claim 17,

wherein the first audio channel signal and the second audio channel signal are associated with the left side of the audio scene, and

wherein the third audio channel signal and the fourth audio channel signal are associated with the right side of the audio scene.

19. The audio decoder according to claim 18, wherein the first audio channel signal is associated with a lower left position of the audio scene,

wherein the second audio channel signal is associated with an upper left position of the audio scene,

wherein the third audio channel signal is associated with a lower right position of the audio scene, and

wherein the fourth audio channel signal is associated with an upper right position of the audio scene.

20. The audio decoder according to one of claims 1 to 19, wherein the audio decoder is configured to provide the first downmix signal and the second downmix signal on the basis of a jointly-encoded representation of the first downmix signal and the second downmix signal using a multi-channel decoding, wherein the first downmix signal is associated with a left side of an audio scene and the second downmix signal is associated with a right side of the audio scene.

21. The audio decoder according to one of claims 1 to 20, wherein the audio decoder is configured to provide the first downmix signal and the second downmix signal on the basis of a jointly encoded representation of the first downmix signal and of the second downmix signal using a prediction-based multi-channel decoding.

22. The audio decoder according to one of claims 1 to 21, wherein the audio decoder is configured to provide the first downmix signal and the second downmix signal on the basis of a jointly encoded representation of the first downmix signal and of the second downmix signal using a residual-signal-assisted prediction-based multi-channel decoding.

23. The audio decoder according to one of claims 1 to 22, wherein the audio decoder is configured to perform a first multi-channel bandwidth extension (660; 390) on the basis of the first audio channel signal and the third audio channel signal, and wherein the audio decoder is configured to perform a second multi-channel bandwidth extension (670; 1394) on the basis of the second audio channel signal and the fourth audio channel signal.

24. The audio decoder according to claim 23, wherein the audio decoder is configured to perform the first multi-channel bandwidth extension in order to obtain two or more bandwidth-extended audio channel signals (620, 624; 1320, 1324) associated with a first common horizontal plane or a first common elevation of an audio scene on the basis of the first audio channel signal and the third audio channel signal and one or more bandwidth extension parameters (1338), and

wherein the audio decoder is configured to perform the second multi-channel bandwidth extension in order to obtain two or more bandwidth-extended audio channel signals (622, 626; 1322, 1326) associated with a second common horizontal plane or a second common elevation of the audio scene on the basis of the second audio channel signal and the fourth audio channel signal and one or more bandwidth extension parameters (1358).

25. The audio decoder according to one of claims 1 to 24, wherein the jointly encoded representation of the first residual signal and of the second residual signal comprises a channel pair element comprising a downmix signal of the first and second residual signal and a common residual signal of the first and second residual signal.

26. The audio decoder according to one of claims 1 to 25, wherein the audio decoder is configured to provide the first downmix signal and the second downmix signal on the basis of a jointly-encoded representation of the first downmix signal and the second downmix signal using a multi-channel decoding,

wherein the jointly encoded representation of the first downmix signal and of the second downmix signal comprises a channel pair element comprising a downmix signal of the first and second downmix signal and a common residual signal of the first and second downmix signal.

27. An audio encoder (100; 1100; 1200; 1500; 2100) for providing an encoded representation (130; 1144, 1154; 1220, 1222; 2272, 2282) on the basis of at least four audio channel signals (110, 112, 114, 116; 1110, 1112, 1114, 1116; 1210, 1212, 1214, 1216; 2216, 2226, 2218, 2228),

wherein the audio encoder is configured to jointly encode at least a first audio channel signal and a second audio channel signal using a residual-signal-assisted multi-channel encoding (140; 1120; 1230; 2230), to obtain a first downmix signal (120; 1122; 1232; 2234) and a first residual signal (142; 1124; 1234; 2236); and

wherein the audio encoder is configured to jointly encode at least a third audio channel signal and a fourth audio channel signal using a residual-signal-assisted multi-channel encoding (150; 1130; 1240; 2240), to obtain a second downmix signal (122; 1132; 1242; 2244) and a second residual signal (152; 1134; 1244; 2246); and

wherein the audio encoder is configured to jointly encode the first residual signal and the second residual signal using a multi-channel encoding (160; 1150; 1260; 2260), to obtain a jointly encoded representation (130; 1154; 1262; 2264) of the residual signals.

28. The audio encoder according to claim 27, wherein the audio encoder is configured to jointly encode the first downmix signal and the second downmix signal using a multi-channel encoding (1140; 1250; 2250), to obtain a jointly encoded representation (1144; 1252; 2254) of the downmix signals.

29. The audio encoder according to claim 28, wherein the audio encoder is configured to jointly encode the first residual signal and the second residual signal using a prediction-based multi-channel encoding, and

wherein the audio encoder is configured to jointly encode the first downmix signal and the second downmix signal using a prediction-based multi-channel encoding.

30. The audio encoder according to one of claims 27 to 29, wherein the audio encoder is configured to jointly encode at least the first audio channel signal and the second audio channel signal using a parameter-based residual-signal-assisted multi-channel encoding, and

wherein the audio encoder is configured to jointly encode at least the third audio channel signal and the fourth audio channel signal using a parameter-based residual-signal-assisted multi-channel encoding.

31. The audio encoder according to one of claims 27 to 30, wherein the first audio channel signal and the second audio channel signal are associated with vertically neighboring positions of an audio scene, and

wherein the third audio channel signal and the fourth audio channel signal are associated with vertically neighboring positions of the audio scene.

32. The audio encoder according to one of claims 27 to 31, wherein the first audio channel signal and the second audio channel signal are associated with a first horizontal position or azimuth position of an audio scene, and

wherein the third audio channel signal and the fourth audio channel signal are associated with a second horizontal position or azimuth position of the audio scene, which is different from the first horizontal position or azimuth position.

33. The audio encoder according to one of claims 27 to 32, wherein the first residual signal is associated with a left side of an audio scene, and wherein the second residual signal is associated with a right side of the audio scene.

34. The audio encoder according to claim 33,

wherein the first audio channel signal and the second audio channel signal are associated with the left side of the audio scene, and

wherein the third audio channel signal and the fourth audio channel signal are associated with the right side of the audio scene.

35. The audio encoder according to claim 34, wherein the first audio channel signal is associated with a lower left position of the audio scene,

wherein the second audio channel signal is associated with an upper left position of the audio scene,

wherein the third audio channel signal is associated with a lower right position of the audio scene, and

wherein the fourth audio channel signal is associated with an upper right position of the audio scene.

36. The audio encoder according to one of claims 27 to 35, wherein the audio encoder is configured to jointly encode the first downmix signal and the second downmix signal using a multi-channel encoding, to obtain a jointly encoded representation of the downmix signals, wherein the first downmix signal is associated with a left side of an audio scene and the second downmix signal is associated with a right side of the audio scene.

37. A method (800) for providing at least four audio channel signals on the basis of an encoded representation, the method comprising:

providing (810) a first residual signal and a second residual signal on the basis of a jointly encoded representation of the first residual signal and the second residual signal using a multi-channel decoding;

providing (820) a first audio channel signal and a second audio channel signal on the basis of a first downmix signal and the first residual signal using a residual-signal-assisted multi-channel decoding; and

providing (830) a third audio channel signal and a fourth audio channel signal on the basis of a second downmix signal and the second residual signal using a residual-signal-assisted multi-channel decoding.

38. A method (700) for providing an encoded representation on the basis of at least four audio channel signals, the method comprising:

jointly encoding (710) at least a first audio channel signal and a second audio channel signal using a residual-signal-assisted multi-channel encoding, to obtain a first downmix signal and a first residual signal;

jointly encoding (720) at least a third audio channel signal and a fourth audio channel signal using a residual-signal-assisted multi-channel encoding, to obtain a second downmix signal and a second residual signal; and

jointly encoding (730) the first residual signal and the second residual signal using a multi-channel encoding, to obtain an encoded representation of the residual signals.

39. A computer program for performing the method according to claim 37 or 38 when the computer program runs on a computer.

Documents

Application Documents

# Name Date
1 Form 5 [12-01-2016(online)].pdf 2016-01-12
2 Form 3 [12-01-2016(online)].pdf 2016-01-12
3 Form 20 [12-01-2016(online)].pdf 2016-01-12
4 Drawing [12-01-2016(online)].pdf 2016-01-12
5 Description(Complete) [12-01-2016(online)].pdf 2016-01-12
6 201637001048-(03-03-2016)-FORM-1.pdf 2016-03-03
7 201637001048-(03-03-2016)-CORRESPONDENCE.pdf 2016-03-03
8 Other Patent Document [06-07-2016(online)].pdf 2016-07-06
9 Other Patent Document [22-11-2016(online)].pdf 2016-11-22
10 Other Patent Document [31-01-2017(online)].pdf 2017-01-31
11 Other Patent Document [13-04-2017(online)].pdf 2017-04-13
12 Information under section 8(2) [15-06-2017(online)].pdf 2017-06-15
13 201637001048-Information under section 8(2) (MANDATORY) [17-07-2017(online)].pdf 2017-07-17
14 201637001048-Information under section 8(2) (MANDATORY) [12-01-2018(online)].pdf 2018-01-12
15 201637001048-Information under section 8(2) (MANDATORY) [22-01-2018(online)].pdf 2018-01-22
16 201637001048-Information under section 8(2) (MANDATORY) [14-03-2018(online)].pdf 2018-03-14
17 201637001048-Information under section 8(2) (MANDATORY) [09-07-2018(online)].pdf 2018-07-09
18 201637001048-Information under section 8(2) (MANDATORY) [18-10-2018(online)].pdf 2018-10-18
19 201637001048-Information under section 8(2) (MANDATORY) [11-01-2019(online)].pdf 2019-01-11
20 201637001048-Information under section 8(2) (MANDATORY) [06-03-2019(online)].pdf 2019-03-06
21 201637001048-Information under section 8(2) (MANDATORY) [11-07-2019(online)].pdf 2019-07-11
22 201637001048-Information under section 8(2) (MANDATORY) [24-07-2019(online)].pdf 2019-07-24
23 201637001048-FER.pdf 2019-07-25
24 201637001048-Information under section 8(2) (MANDATORY) [21-10-2019(online)].pdf 2019-10-21
25 201637001048-Information under section 8(2) (MANDATORY) [02-12-2019(online)].pdf 2019-12-02
26 201637001048-FORM 4(ii) [22-01-2020(online)].pdf 2020-01-22
27 201637001048-OTHERS [29-04-2020(online)].pdf 2020-04-29
28 201637001048-FER_SER_REPLY [29-04-2020(online)].pdf 2020-04-29
29 201637001048-DRAWING [29-04-2020(online)].pdf 2020-04-29
30 201637001048-CORRESPONDENCE [29-04-2020(online)].pdf 2020-04-29
31 201637001048-COMPLETE SPECIFICATION [29-04-2020(online)].pdf 2020-04-29
32 201637001048-CLAIMS [29-04-2020(online)].pdf 2020-04-29
33 201637001048-Information under section 8(2) [05-10-2020(online)].pdf 2020-10-05
34 201637001048-Information under section 8(2) [08-03-2021(online)].pdf 2021-03-08
35 201637001048-FORM 3 [10-04-2021(online)].pdf 2021-04-10
36 201637001048-FORM 3 [19-10-2021(online)].pdf 2021-10-19
37 201637001048-Information under section 8(2) [03-12-2021(online)].pdf 2021-12-03
38 201637001048-FORM 3 [06-04-2022(online)].pdf 2022-04-06
39 201637001048-Information under section 8(2) [09-06-2022(online)].pdf 2022-06-09
40 201637001048-FORM 3 [12-10-2022(online)].pdf 2022-10-12
41 201637001048-Information under section 8(2) [10-04-2023(online)].pdf 2023-04-10
42 201637001048-FORM 3 [10-04-2023(online)].pdf 2023-04-10
43 201637001048-Information under section 8(2) [03-07-2023(online)].pdf 2023-07-03
44 201637001048-Information under section 8(2) [11-10-2023(online)].pdf 2023-10-11
45 201637001048-FORM 3 [11-10-2023(online)].pdf 2023-10-11
46 201637001048-US(14)-HearingNotice-(HearingDate-04-03-2024).pdf 2024-02-17
47 201637001048-Information under section 8(2) [20-02-2024(online)].pdf 2024-02-20
48 201637001048-REQUEST FOR ADJOURNMENT OF HEARING UNDER RULE 129A [28-02-2024(online)].pdf 2024-02-28
49 201637001048-FORM-26 [01-03-2024(online)].pdf 2024-03-01
50 201637001048-FORM 3 [01-03-2024(online)].pdf 2024-03-01
51 201637001048-US(14)-ExtendedHearingNotice-(HearingDate-16-05-2024).pdf 2024-05-03
52 201637001048-REQUEST FOR ADJOURNMENT OF HEARING UNDER RULE 129A [13-05-2024(online)].pdf 2024-05-13

Search Strategy

1 201637001048_23-07-2019.pdf