
Audio Encoder, Audio Decoder, Methods And Computer Program Using Jointly Encoded Residual Signals

Abstract: An audio decoder for providing at least four audio channel signals on the basis of an encoded representation is configured to provide a first residual signal and a second residual signal on the basis of a jointly encoded representation of the first residual signal and of the second residual signal using a multi-channel decoding. The audio decoder is configured to provide a first audio channel signal and a second audio channel signal on the basis of a first downmix signal and the first residual signal using a residual-signal-assisted multi-channel decoding. The audio decoder is configured to provide a third audio channel signal and a fourth audio channel signal on the basis of a second downmix signal and the second residual signal using a residual-signal-assisted multi-channel decoding. An audio encoder is based on corresponding considerations.


Patent Information

Application #
Filing Date
02 May 2025
Publication Number
28/2025
Publication Type
INA
Invention Field
ELECTRONICS
Status
Email
Parent Application

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
Hansastrasse 27c 80686 München Germany

Inventors

1. DICK, Sascha
Schupferstrasse 49 90482 Nürnberg Germany
2. ERTEL, Christian
Nürnberger Str. 24 90542 Eckental Germany
3. HELMRICH, Christian
Hauptstrasse 68 91054 Erlangen Germany
4. HILPERT, Johannes
Herrnhüttestrasse 46 90411 Nürnberg Germany
5. HÖLZER, Andreas
Obere Karlstrasse 23 91054 Erlangen Germany
6. KUNTZ, Achim
Weiherstrasse 12 91334 Hemhofen Germany

Specification

Technical Field
Embodiments according to the invention are related to an audio decoder for providing at
least four audio channel signals on the basis of an encoded representation.
Further embodiments according to the invention are related to an audio encoder for
providing an encoded representation on the basis of at least four audio channel signals.
Further embodiments according to the invention are related to a method for providing at
least four audio channel signals on the basis of an encoded representation and to a
method for providing an encoded representation on the basis of at least four audio
channel signals.
Further embodiments according to the invention are related to a computer program for
performing one of said methods.
Generally speaking, embodiments according to the invention are related to a joint coding of
n channels.
Background of the Invention
In recent years, a demand for storage and transmission of audio contents has been
steadily increasing. Moreover, the quality requirements for the storage and transmission of
audio contents have also been increasing steadily. Accordingly, the concepts for the
encoding and decoding of audio content have been enhanced. For example, the so-called
"advanced audio coding" (AAC) has been developed, which is described, for example, in
the International Standard ISO/IEC 13818-7:2003. Moreover, some spatial extensions
have been created, like, for example, the so-called "MPEG Surround" concept, which is
described, for example, in the International Standard ISO/IEC 23003-1:2007. Moreover,
additional improvements for the encoding and decoding of spatial information of audio
signals are described in the International Standard ISO/IEC 23003-2:2010, which relates
to the so-called spatial audio object coding (SAOC).
Moreover, a flexible audio encoding/decoding concept, which provides the possibility to
encode both general audio signals and speech signals with good coding efficiency and to
handle multi-channel audio signals, is defined in the International Standard ISO/IEC
23003-3:2012, which describes the so-called "unified speech and audio coding" (USAC)
concept.
In MPEG USAC [1], joint stereo coding of two channels is performed using complex
prediction, MPS 2-1-1 or unified stereo with band-limited or full-band residual signals.
MPEG Surround [2] hierarchically combines OTT and TTT boxes for joint coding of
multi-channel audio with or without transmission of residual signals.
However, there is a desire to provide an even more advanced concept for an efficient
encoding and decoding of three-dimensional audio scenes.
An embodiment according to the invention creates an audio decoder for providing at least
four audio channel signals on the basis of an encoded representation. The audio decoder
is configured to provide a first residual signal and a second residual signal on the basis of
a jointly encoded representation of the first residual signal and of the second residual
signal using a multi-channel decoding. The audio decoder is also configured to provide a
first audio channel signal and a second audio channel signal on the basis of a first
downmix signal and the first residual signal using a residual-signal-assisted multi-channel
decoding. The audio decoder is also configured to provide a third audio channel signal
and a fourth audio channel signal on the basis of a second downmix signal and the
second residual signal using a residual-signal-assisted multi-channel decoding.
This embodiment according to the invention is based on the finding that dependencies
between four or even more audio channel signals can be exploited by deriving two
residual signals, each of which is used to provide two or more audio channel signals using
a residual-signal-assisted multi-channel decoding, from a jointly-encoded representation
of the residual signals. In other words, it has been found that there are typically some
similarities of said residual signals, such that a bit rate for encoding said residual signals,
which help to improve an audio quality when decoding the at least four audio channel
signals, can be reduced by deriving the two residual signals from a jointly-encoded
representation using a multi-channel decoding, which exploits similarities and/or
dependencies between the residual signals.
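The two-stage decoding described above can be sketched in a few lines. This is a purely illustrative, real-valued model: the helper names (joint_decode, residual_assisted_decode) and the simple mid/side arithmetic are assumptions chosen for clarity, not the standardized processing.

```python
def joint_decode(jointly_encoded):
    """First stage: recover the two residual signals from their jointly
    encoded representation (here sketched as a mid/side pair)."""
    mid, side = jointly_encoded
    res1 = [m + s for m, s in zip(mid, side)]
    res2 = [m - s for m, s in zip(mid, side)]
    return res1, res2

def residual_assisted_decode(downmix, residual):
    """Second stage: split one downmix into two channels, refined by a
    residual (again a mid/side-like toy model)."""
    ch_a = [d + r for d, r in zip(downmix, residual)]
    ch_b = [d - r for d, r in zip(downmix, residual)]
    return ch_a, ch_b

def decode_four_channels(dmx1, dmx2, jointly_encoded_residuals):
    # Derive both residual signals from their joint representation,
    # then use each one to refine the splitting of one downmix.
    res1, res2 = joint_decode(jointly_encoded_residuals)
    ch1, ch2 = residual_assisted_decode(dmx1, res1)
    ch3, ch4 = residual_assisted_decode(dmx2, res2)
    return ch1, ch2, ch3, ch4
```

In this toy model the cascade is exactly invertible; the actual multi-channel decodings additionally evaluate transmitted parameters and prediction coefficients.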
In a preferred embodiment, the audio decoder is configured to provide the first downmix
signal and the second downmix signal on the basis of a jointly-encoded representation of
the first downmix signal and the second downmix signal using a multi-channel decoding.
Accordingly, a hierarchical structure of an audio decoder is created, wherein both the
downmix signals and the residual signals, which are used in the residual-signal-assisted
multi-channel decoding for providing the at least four audio channel signals, are derived
using separate multi-channel decodings. Such a concept is particularly efficient, since the
two downmix signals typically comprise similarities, which can be exploited in a
multi-channel encoding/decoding, and since the two residual signals typically also
comprise similarities, which can be exploited in a multi-channel encoding/decoding. Thus,
a good coding efficiency can typically be obtained using this concept.
In a preferred embodiment, the audio decoder is configured to provide the first residual
signal and the second residual signal on the basis of the jointly-encoded representation of
the first residual signal and of the second residual signal using a prediction-based
multi-channel decoding. The usage of a prediction-based multi-channel decoding typically
brings along a comparatively good reconstruction quality for the residual signals. This is,
for example, advantageous if the first residual signal represents a left side of an audio
scene and the second residual signal represents a right side of the audio scene, because
the human hearing is typically comparatively sensitive to differences between the left and
right sides of the audio scene.
In a preferred embodiment, the audio decoder is configured to provide the first residual
signal and the second residual signal on the basis of the jointly-encoded representation of
the first residual signal and of the second residual signal using a residual-signal-assisted
multi-channel decoding. It has been found that a particularly good quality of the first and
second residual signal can be achieved if the first residual signal and the second residual
signal are provided using a multi-channel decoding, which in turn receives a residual
signal (and typically also a downmix signal, which combines the first residual signal and
the second residual signal). Thus, there is a cascading of decoding stages, wherein two
residual signals (the first residual signal, which is used for providing the first audio channel
signal and the second audio channel signal, and the second residual signal, which is used
for providing the third audio channel signal and the fourth audio channel signal) are
provided on the basis of an input downmix signal and an input residual signal (wherein the
latter may also be designated as a common residual signal of the first residual signal and
the second residual signal). Thus, the first residual signal and the second residual signal
are actually "intermediate" residual signals, which are derived using a multi-channel
decoding from a corresponding downmix signal and a corresponding "common" residual
signal.
In a preferred embodiment, the prediction-based multi-channel decoding is configured to
evaluate a prediction parameter describing a contribution of a signal component, which is
derived using a signal component of a previous frame, to the provision of the residual
signals (i.e., the first residual signal and the second residual signal) of a current frame.
Usage of such a prediction-based multi-channel decoding brings along a particularly good
quality of the residual signals (first residual signal and second residual signal).
In a preferred embodiment, the prediction-based multi-channel decoding is configured to
obtain the first residual signal and the second residual signal on the basis of a
(corresponding) downmix signal and a (corresponding) "common" residual signal, wherein
the prediction-based multi-channel decoding is configured to apply the common residual
signal with a first sign, to obtain the first residual signal, and to apply the common residual
signal with a second sign, which is opposite to the first sign, to obtain the second residual
signal. It has been found that such a prediction-based multi-channel decoding brings
along a good efficiency for reconstructing the first residual signal and the second residual
signal.
In a preferred embodiment, the audio decoder is configured to provide the first residual
signal and the second residual signal on the basis of the jointly-encoded representation of
the first residual signal and of the second residual signal using a multi-channel decoding
which is operative in the modified-discrete-cosine-transform domain (MDCT domain). It
has been found that such a concept can be implemented in an efficient manner, since an
audio decoding, which may be used to provide the jointly-encoded representation of the
first residual signal and of the second residual signal, preferably operates in the MDCT
domain. Accordingly, intermediate transformations can be avoided by applying the
multi-channel decoding for providing the first residual signal and the second residual
signal in the MDCT domain.
In a preferred embodiment, the audio decoder is configured to provide the first residual
signal and the second residual signal on the basis of the jointly-encoded representation of
the first residual signal and of the second residual signal using a USAC complex stereo
prediction (for example, as mentioned in the above-referenced USAC standard). It has
been found that such a USAC complex stereo prediction brings along good results for the
decoding of the first residual signal and of the second residual signal. Moreover, usage of
the USAC complex stereo prediction for the decoding of the first residual signal and the
second residual signal also allows for a simple implementation of the concept using
decoding blocks which are already available in the unified speech and audio coding
(USAC). Accordingly, a unified-speech-and-audio-coding decoder may be easily
reconfigured to perform the decoding concept discussed here.
In a preferred embodiment, the audio decoder is configured to provide the first audio
channel signal and the second audio channel signal on the basis of the first downmix
signal and the first residual signal using a parameter-based residual-signal-assisted
multi-channel decoding. Similarly, the audio decoder is configured to provide the third
audio channel signal and the fourth audio channel signal on the basis of the second
downmix signal and the second residual signal using a parameter-based
residual-signal-assisted multi-channel decoding. It has been found that such a
multi-channel decoding is well-suited for the derivation of the audio channel signals on the
basis of the first downmix signal, the first residual signal, the second downmix signal and
the second residual signal. Moreover, it has been found that such a parameter-based
residual-signal-assisted multi-channel decoding can be implemented with small effort
using processing blocks which are already present in typical multi-channel audio
decoders.
In a preferred embodiment, the parameter-based residual-signal-assisted multi-channel
decoding is configured to evaluate one or more parameters describing a desired
correlation between two channels and/or level differences between two channels in order
to provide the two or more audio channel signals on the basis of a respective downmix
signal and a respective corresponding residual signal. It has been found that such a
parameter-based residual-signal-assisted multi-channel decoding is well adapted for the
second stage of a cascaded multi-channel decoding (wherein, preferably, the first and
second downmix signals and the first and second residual signals are provided using a
prediction-based multi-channel decoding).
In a preferred embodiment, the audio decoder is configured to provide the first audio
channel signal and the second audio channel signal on the basis of the first downmix
signal and the first residual signal using a residual-signal-assisted multi-channel decoding
which is operative in the QMF domain. Similarly, the audio decoder is preferably
configured to provide the third audio channel signal and the fourth audio channel signal on
the basis of the second downmix signal and the second residual signal using a
residual-signal-assisted multi-channel decoding which is operative in the QMF domain.
Accordingly, the second stage of the hierarchical multi-channel decoding is operative in
the QMF domain, which is well adapted to typical post-processing, which is also often
performed in the QMF domain, such that intermediate conversions may be avoided.
In a preferred embodiment, the audio decoder is configured to provide the first audio
channel signal and the second audio channel signal on the basis of the first downmix
signal and the first residual signal using an MPEG Surround 2-1-2 decoding or a unified
stereo decoding. Similarly, the audio decoder is preferably configured to provide the third
audio channel signal and the fourth audio channel signal on the basis of the second
downmix signal and the second residual signal using an MPEG Surround 2-1-2 decoding
or a unified stereo decoding. It has been found that such decoding concepts are
particularly well-suited for the second stage of a hierarchical decoding.
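An illustrative, much-simplified MPS-2-1-2-style upmix may help to see the role of the residual: two gains derived from a channel level difference (CLD) distribute the downmix, and the residual restores waveform detail. The gain formulas below are an assumed, energy-preserving toy model, not the exact standardized arithmetic.

```python
import math

def mps_212_like_upmix(downmix, residual, cld_db):
    """Toy MPS-2-1-2-style upmix: a channel level difference (in dB)
    yields per-channel gains, and the residual refines the result with
    opposite signs on the two output channels."""
    r = 10.0 ** (cld_db / 20.0)          # linear level ratio ch1/ch2
    g1 = r / math.sqrt(1.0 + r * r)      # energy-preserving gains
    g2 = 1.0 / math.sqrt(1.0 + r * r)
    ch1 = [g1 * d + res for d, res in zip(downmix, residual)]
    ch2 = [g2 * d - res for d, res in zip(downmix, residual)]
    return ch1, ch2
```

With a CLD of 0 dB and no residual, the downmix is split symmetrically between both channels.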
In a preferred embodiment, the first residual signal and the second residual signal are
associated with different horizontal positions (or, equivalently, azimuth positions) of an
audio scene. It has been found that it is particularly advantageous to separate residual
signals, which are associated with different horizontal positions (or azimuth positions), in a
first stage of the hierarchical multi-channel processing, because a particularly good
hearing impression can be obtained if the perceptually important left/right separation is
performed in a first stage of the hierarchical multi-channel decoding.
In a preferred embodiment, the first audio channel signal and the second audio channel
signal are associated with vertically neighboring positions of the audio scene (or,
equivalently, with neighboring elevation positions of the audio scene). Also, the third audio
channel signal and the fourth audio channel signal are preferably associated with
vertically neighboring positions of the audio scene (or, equivalently, with neighboring
elevation positions of the audio scene). It has been found that good decoding results can
be achieved if the separation between upper and lower signals is performed in a second
stage of the hierarchical audio decoding (which typically comprises a somewhat smaller
separation accuracy than the first stage), since the human auditory system is less
sensitive with respect to a vertical position of an audio source when compared to a
horizontal position of the audio source.
In a preferred embodiment, the first audio channel signal and the second audio channel
signal are associated with a first horizontal position of an audio scene (or, equivalently,
azimuth position), and the third audio channel signal and the fourth audio channel signal
are associated with a second horizontal position of the audio scene (or, equivalently,
azimuth position), which is different from the first horizontal position (or, equivalently,
azimuth position).
Preferably, the first residual signal is associated with a left side of an audio scene, and the
second residual signal is associated with a right side of the audio scene. Accordingly, the
left/right separation is performed in a first stage of the hierarchical audio decoding.
In a preferred embodiment, the first audio channel signal and the second audio channel
signal are associated with the left side of the audio scene, and the third audio channel
signal and the fourth audio channel signal are associated with a right side of the audio
scene.
In another preferred embodiment, the first audio channel signal is associated with a lower
left side of the audio scene, the second audio channel signal is associated with an upper
left side of the audio scene, the third audio channel signal is associated with a lower right
side of the audio scene, and the fourth audio channel signal is associated with an upper
right side of the audio scene. Such an association of the audio channel signals brings
along particularly good coding results.
In a preferred embodiment, the audio decoder is configured to provide the first downmix
signal and the second downmix signal on the basis of a jointly-encoded representation of
the first downmix signal and the second downmix signal using a multi-channel decoding,
wherein the first downmix signal is associated with the left side of an audio scene and the
second downmix signal is associated with the right side of the audio scene. It has been
found that the downmix signals can also be encoded with good coding efficiency using a
multi-channel coding, even if the downmix signals are associated with different sides of
the audio scene.
In a preferred embodiment, the audio decoder is configured to provide the first downmix
signal and the second downmix signal on the basis of the jointly-encoded representation
of the first downmix signal and of the second downmix signal using a prediction-based
multi-channel decoding or even using a residual-signal-assisted prediction-based
multi-channel decoding. It has been found that the usage of such multi-channel decoding
concepts provides for a particularly good decoding result. Also, existing decoding
functions can be reused in some audio decoders.
In a preferred embodiment, the audio decoder is configured to perform a first
multi-channel bandwidth extension on the basis of the first audio channel signal and the
third audio channel signal. Also, the audio decoder may be configured to perform a
second (typically separate) multi-channel bandwidth extension on the basis of the second
audio channel signal and the fourth audio channel signal. It has been found that it is
advantageous to perform a possible bandwidth extension on the basis of two audio
channel signals which are associated with different sides of an audio scene (wherein
different residual signals are typically associated with different sides of the audio scene).
In a preferred embodiment, the audio decoder is configured to perform the first
multi-channel bandwidth extension in order to obtain two or more bandwidth-extended
audio channel signals associated with a first common horizontal plane (or, equivalently,
with a first common elevation) of an audio scene on the basis of the first audio channel
signal and the third audio channel signal and one or more bandwidth extension
parameters. Moreover, the audio decoder is preferably configured to perform the second
multi-channel bandwidth extension in order to obtain two or more bandwidth-extended
audio channel signals associated with a second common horizontal plane (or,
equivalently, a second common elevation) of the audio scene on the basis of the second
audio channel signal and the fourth audio channel signal and one or more bandwidth
extension parameters. It has been found that such a decoding scheme results in good
audio quality, since the multi-channel bandwidth extension can consider stereo
characteristics, which are important for the hearing impression, in such an arrangement.
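The pairwise routing into the two bandwidth extension instances can be sketched as follows; stereo_bwe is a hypothetical stand-in (here an identity) for an actual stereo bandwidth extension, and the plane assignments in the comments follow the association described above.

```python
def stereo_bwe(ch_a, ch_b, params):
    """Hypothetical stand-in for a stereo bandwidth extension that
    processes two channels of one horizontal plane together.
    Here it is simply an identity."""
    return ch_a, ch_b

def apply_bandwidth_extension(ch1, ch2, ch3, ch4, bwe_params):
    # First instance: first and third channel signals (a first common
    # horizontal plane); second instance: second and fourth channel
    # signals (a second common horizontal plane).
    p1_a, p1_b = stereo_bwe(ch1, ch3, bwe_params)
    p2_a, p2_b = stereo_bwe(ch2, ch4, bwe_params)
    return p1_a, p1_b, p2_a, p2_b
```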
In a preferred embodiment, the jointly-encoded representation of the first residual signal
and of the second residual signal comprises a channel pair element comprising a
downmix signal of the first and second residual signal and a common residual signal of
the first and second residual signal. It has been found that the encoding of the downmix
signal of the first and second residual signal and of the common residual signal of the first
and second residual signal using a channel pair element is advantageous, since the
downmix signal of the first and second residual signal and the common residual signal of
the first and second residual signal typically share a number of characteristics.
Accordingly, the usage of a channel pair element typically reduces a signaling overhead
and consequently allows for an efficient encoding.
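As a rough illustration, such a channel pair element can be thought of as one container holding the residual downmix, the common residual and shared side information. The class below is a hypothetical sketch, not the actual USAC bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChannelPairElement:
    """Illustrative container for a channel pair element carrying the
    downmix of the two residual signals and their common residual,
    plus side information shared by both (hence the reduced
    signaling overhead)."""
    downmix_of_residuals: List[float]
    common_residual: List[float]
    shared_side_info: Dict[str, float] = field(default_factory=dict)
```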
In another preferred embodiment, the audio decoder is configured to provide the first
downmix signal and the second downmix signal on the basis of a jointly-encoded
representation of the first downmix signal and the second downmix signal using a
multi-channel decoding, wherein the jointly-encoded representation of the first downmix
signal and of the second downmix signal comprises a channel pair element, the channel
pair element comprising a downmix signal of the first and second downmix signal and a
common residual signal of the first and second downmix signal. This embodiment is
based on the same considerations as the embodiment described before.
Another embodiment according to the invention creates an audio encoder for providing an
encoded representation on the basis of at least four audio channel signals. The audio
encoder is configured to jointly encode at least a first audio channel signal and a second
audio channel signal using a residual-signal-assisted multi-channel encoding, to obtain a
first downmix signal and a first residual signal. The audio encoder is configured to jointly
encode at least a third audio channel signal and a fourth audio channel signal using a
residual-signal-assisted multi-channel encoding, to obtain a second downmix signal and a
second residual signal. Moreover, the audio encoder is configured to jointly encode the
first residual signal and the second residual signal using a multi-channel encoding, to
obtain a jointly-encoded representation of the residual signals. This audio encoder is
based on the same considerations as the above-described audio decoder.
Moreover, optional improvements of this audio encoder, and preferred configurations of
the audio encoder, are substantially in parallel with improvements and preferred
configurations of the audio decoder discussed above. Accordingly, reference is made to
the above discussion.
Another embodiment according to the invention creates a method for providing at least
four audio channel signals on the basis of an encoded representation, which substantially
performs the functionality of the audio decoder described above, and which can be
supplemented by any of the features and functionalities discussed above.
Another embodiment according to the invention creates a method for providing an
encoded representation on the basis of at least four audio channel signals, which
substantially fulfills the functionality of the audio encoder described above.
Another embodiment according to the invention creates a computer program for
performing the methods mentioned above.
Brief Description of the Figures
Embodiments according to the present invention will subsequently be described taking
reference to the enclosed figures, in which:
Fig. 1 shows a block schematic diagram of an audio encoder, according to an
embodiment of the present invention;

Fig. 2 shows a block schematic diagram of an audio decoder, according to an
embodiment of the present invention;

Fig. 3 shows a block schematic diagram of an audio decoder, according to
another embodiment of the present invention;

Fig. 4 shows a block schematic diagram of an audio encoder, according to an
embodiment of the present invention;

Fig. 5 shows a block schematic diagram of an audio decoder, according to an
embodiment of the present invention;

Fig. 6 shows a block schematic diagram of an audio decoder, according to
another embodiment of the present invention;
Fig. 7 shows a flowchart of a method for providing an encoded representation on
the basis of at least four audio channel signals, according to an
embodiment of the present invention;

Fig. 8 shows a flowchart of a method for providing at least four audio channel
signals on the basis of an encoded representation, according to an
embodiment of the invention;

Fig. 9 shows a flowchart of a method for providing an encoded representation on
the basis of at least four audio channel signals, according to an
embodiment of the invention;

Fig. 10 shows a flowchart of a method for providing at least four audio channel
signals on the basis of an encoded representation, according to an
embodiment of the invention;

Fig. 11 shows a block schematic diagram of an audio encoder, according to an
embodiment of the invention;

Fig. 12 shows a block schematic diagram of an audio encoder, according to
another embodiment of the invention;

Fig. 13 shows a block schematic diagram of an audio decoder, according to an
embodiment of the invention;

Fig. 14a shows a syntax representation of a bitstream, which can be used with the
audio encoder according to Fig. 13;

Fig. 14b shows a table representation of different values of the parameter qceIndex;

Fig. 15 shows a block schematic diagram of a 3D audio encoder in which the
concepts according to the present invention can be used;

Fig. 16 shows a block schematic diagram of a 3D audio decoder in which the
concepts according to the present invention can be used;
Fig. 17 shows a block schematic diagram of a format converter;

Fig. 18 shows a graphical representation of a topological structure of a Quad
Channel Element (QCE), according to an embodiment of the present
invention;

Fig. 19 shows a block schematic diagram of an audio decoder, according to an
embodiment of the present invention;

Fig. 20 shows a detailed block schematic diagram of a QCE Decoder, according to
an embodiment of the present invention; and

Fig. 21 shows a detailed block schematic diagram of a Quad Channel Encoder,
according to an embodiment of the present invention.
Detailed Description of the Embodiments
1. Audio encoder according to Fig. 1
Fig. 1 shows a block schematic diagram of an audio encoder, which is designated in its
entirety with 100. The audio encoder 100 is configured to provide an encoded
representation on the basis of at least four audio channel signals. The audio encoder 100
is configured to receive a first audio channel signal 110, a second audio channel signal
112, a third audio channel signal 114 and a fourth audio channel signal 116. Moreover,
the audio encoder 100 is configured to provide an encoded representation of a first
downmix signal 120 and of a second downmix signal 122, as well as a jointly-encoded
representation 130 of residual signals. The audio encoder 100 comprises a
residual-signal-assisted multi-channel encoder 140, which is configured to jointly encode
the first audio channel signal 110 and the second audio channel signal 112 using a
residual-signal-assisted multi-channel encoding, to obtain the first downmix signal 120
and a first residual signal 142. The audio encoder 100 also comprises a
residual-signal-assisted multi-channel encoder 150, which is configured to jointly encode
at least the third audio channel signal 114 and the fourth audio channel signal 116 using a
residual-signal-assisted multi-channel encoding, to obtain the second downmix signal 122
and a second residual signal 152. The audio encoder 100 also comprises a multi-channel
encoder 160, which is configured to jointly encode the first residual signal 142 and the
second residual signal 152 using a multi-channel encoding, to obtain the jointly encoded
representation 130 of the residual signals 142, 152.
Regarding the functionality of the audio encoder 100, it should be noted that the audio
encoder 100 performs a hierarchical encoding, wherein the first audio channel signal 110
and the second audio channel signal 112 are jointly encoded using the
residual-signal-assisted multi-channel encoding 140, wherein both the first downmix
signal 120 and the first residual signal 142 are provided. The first residual signal 142 may,
for example, describe differences between the first audio channel signal 110 and the
second audio channel signal 112, and/or may describe some or any signal features which
cannot be represented by the first downmix signal 120 and optional parameters, which
may be provided by the residual-signal-assisted multi-channel encoder 140. In other
words, the first residual signal 142 may be a residual signal which allows for a refinement
of a decoding result which may be obtained on the basis of the first downmix signal 120
and any possible parameters which may be provided by the residual-signal-assisted
multi-channel encoder 140. For example, the first residual signal 142 may allow at least for
a partial waveform reconstruction of the first audio channel signal 110 and of the second
audio channel signal 112 at the side of an audio decoder, when compared to a mere
reconstruction of high-level signal characteristics (like, for example, correlation
characteristics, covariance characteristics, level difference characteristics, and the like).
Similarly, the residual-signal-assisted multi-channel encoder 150 provides both the
second downmix signal 122 and the second residual signal 152 on the basis of the third
audio channel signal 114 and the fourth audio channel signal 116, such that the second
residual signal allows for a refinement of a signal reconstruction of the third audio channel
signal 114 and of the fourth audio channel signal 116 at the side of an audio decoder. The
second residual signal 152 may consequently serve the same functionality as the first
residual signal 142. However, if the audio channel signals 110, 112, 114, 116 comprise
some correlation, the first residual signal 142 and the second residual signal 152 are
typically also correlated to some degree. Accordingly, the joint encoding of the first
residual signal 142 and of the second residual signal 152 using the multi-channel encoder
160 typically comprises a high efficiency, since a multi-channel encoding of correlated
signals typically reduces the bitrate by exploiting the dependencies. Consequently, the
first residual signal 142 and the second residual signal 152 can be encoded with good
precision while keeping the bitrate of the jointly-encoded representation 130 of the
residual signals reasonably small.
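The two-stage encoding performed by the encoders 140, 150 and 160 can be sketched with a simple, purely illustrative mid/side model. This is an assumption chosen for clarity; the actual residual-signal-assisted encodings also produce parameters, and the joint residual encoding exploits the correlation rather than merely pairing the signals.

```python
def residual_assisted_encode(ch_a, ch_b):
    """Toy residual-signal-assisted pair encoding (mid/side): the
    downmix carries the common part, the residual carries what the
    downmix alone cannot represent."""
    downmix = [(a + b) / 2 for a, b in zip(ch_a, ch_b)]
    residual = [(a - b) / 2 for a, b in zip(ch_a, ch_b)]
    return downmix, residual

def encode_four_channels(ch1, ch2, ch3, ch4):
    # Stage 1: pairwise residual-assisted encoding (encoders 140, 150).
    dmx1, res1 = residual_assisted_encode(ch1, ch2)
    dmx2, res2 = residual_assisted_encode(ch3, ch4)
    # Stage 2: joint encoding of the (typically correlated) residual
    # signals (encoder 160), sketched as a further downmix/common-
    # residual pair.
    res_dmx, res_common = residual_assisted_encode(res1, res2)
    return dmx1, dmx2, (res_dmx, res_common)
```

In this model, strongly correlated residuals yield a common residual that is close to zero, which illustrates why the joint encoding keeps the bitrate small.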
To summarize, the embodiment according to Fig. 1 provides a hierarchical multi-channel
encoding, wherein a good reproduction quality can be achieved by using the
residual-signal-assisted multi-channel encoders 140, 150, and wherein a bitrate demand
can be kept moderate by jointly encoding a first residual signal 142 and a second residual
signal 152.
Further optional improvements of the audio encoder 100 are possible. Some of these
improvements will be described taking reference to Figs. 4, 11 and 12. However, it should
be noted that the audio encoder 100 can also be adapted in parallel with the audio
decoders described herein, wherein the functionality of the audio encoder is typically
inverse to the functionality of the audio decoder.
2. Audio decoder according to Fig. 2
Fig. 2 shows a block schematic diagram of an audio decoder, which is designated in its
entirety with 200.
The audio decoder 200 is configured to receive an encoded representation which
comprises a jointly-encoded representation 210 of a first residual signal and a second
residual signal. The audio decoder 200 also receives a representation of a first downmix
signal 212 and of a second downmix signal 214. The audio decoder 200 is configured to
provide a first audio channel signal 220, a second audio channel signal 222, a third audio
channel signal 224 and a fourth audio channel signal 226.
The audio decoder 200 comprises a multi-channel decoder 230, which is configured to provide a first residual signal 232 and a second residual signal 234 on the basis of the jointly-encoded representation 210 of the first residual signal 232 and of the second residual signal 234. The audio decoder 200 also comprises a (first) residual-signal-assisted multi-channel decoder 240 which is configured to provide the first audio channel signal 220 and the second audio channel signal 222 on the basis of the first downmix signal 212 and the first residual signal 232 using a multi-channel decoding. The audio decoder 200 also comprises a (second) residual-signal-assisted multi-channel decoder 250, which is configured to provide the third audio channel signal 224 and the fourth audio channel signal 226 on the basis of the second downmix signal 214 and the second residual signal 234.
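The signal flow just described can be sketched in a few lines. This sketch is purely structural and hypothetical: `joint_residual_decode` and `residual_assisted_upmix` are stand-ins (a simple sum/difference combination) for the actual multi-channel decodings 230, 240 and 250, whose internals the text describes only at block level.

```python
# Structural sketch of the decoder 200 signal flow (all arithmetic is a
# stand-in): one joint decoding yields the two residual signals, and each
# residual then assists the upmix of one downmix signal into two channels.

def joint_residual_decode(jointly_encoded):
    # Stand-in for multi-channel decoder 230: recover the two residual
    # signals from a jointly-encoded (mid, side) pair.
    mid, side = jointly_encoded
    res1 = [m + s for m, s in zip(mid, side)]
    res2 = [m - s for m, s in zip(mid, side)]
    return res1, res2

def residual_assisted_upmix(downmix, residual):
    # Stand-in for decoders 240/250: split one downmix into two channel
    # signals, refined by the residual signal.
    ch_a = [d + r for d, r in zip(downmix, residual)]
    ch_b = [d - r for d, r in zip(downmix, residual)]
    return ch_a, ch_b

def decode_200(jointly_encoded_residuals, downmix1, downmix2):
    res1, res2 = joint_residual_decode(jointly_encoded_residuals)  # decoder 230
    ch1, ch2 = residual_assisted_upmix(downmix1, res1)             # decoder 240
    ch3, ch4 = residual_assisted_upmix(downmix2, res2)             # decoder 250
    return ch1, ch2, ch3, ch4
```

Note that each residual signal is consumed by exactly one of the two second-stage decodings, which mirrors the block diagram of Fig. 2.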
Regarding the functionality of the audio decoder 200, it should be noted that the audio decoder 200 provides the first audio channel signal 220 and the second audio channel signal 222 on the basis of a (first) common residual-signal-assisted multi-channel decoding 240, wherein the decoding quality of the multi-channel decoding is increased by the first residual signal 232 (when compared to a non-residual-signal-assisted decoding). In other words, the first downmix signal 212 provides a "coarse" information about the first audio channel signal 220 and the second audio channel signal 222, wherein, for example, differences between the first audio channel signal 220 and the second audio channel signal 222 may be described by (optional) parameters, which may be received by the residual-signal-assisted multi-channel decoder 240, and by the first residual signal 232. Consequently, the first residual signal 232 may, for example, allow for a partial waveform reconstruction of the first audio channel signal 220 and of the second audio channel signal 222.
Similarly, the (second) residual-signal-assisted multi-channel decoder 250 provides the third audio channel signal 224 and the fourth audio channel signal 226 on the basis of the second downmix signal 214, wherein the second downmix signal 214 may, for example, "coarsely" describe the third audio channel signal 224 and the fourth audio channel signal 226. Moreover, differences between the third audio channel signal 224 and the fourth audio channel signal 226 may, for example, be described by (optional) parameters, which may be received by the (second) residual-signal-assisted multi-channel decoder 250, and by the second residual signal 234. Accordingly, the evaluation of the second residual signal 234 may, for example, allow for a partial waveform reconstruction of the third audio channel signal 224 and the fourth audio channel signal 226. Accordingly, the second residual signal 234 may allow for an enhancement of the quality of reconstruction of the third audio channel signal 224 and the fourth audio channel signal 226.
However, the first residual signal 232 and the second residual signal 234 are derived from a jointly-encoded representation 210 of the first residual signal and of the second residual signal. Such a multi-channel decoding, which is performed by the multi-channel decoder 230, allows for a high decoding efficiency since the first audio channel signal 220, the second audio channel signal 222, the third audio channel signal 224 and the fourth audio channel signal 226 are typically similar or "correlated". Accordingly, the first residual signal 232 and the second residual signal 234 are typically also similar or "correlated", which can
be exploited by deriving the first residual signal 232 and the second residual signal 234 from a jointly-encoded representation 210 using a multi-channel decoding.
Consequently, it is possible to obtain a high decoding quality with moderate bitrate by decoding the residual signals 232, 234 on the basis of a jointly-encoded representation 210 thereof, and by using each of the residual signals for the decoding of two or more audio channel signals.
To conclude, the audio decoder 200 allows for a high coding efficiency by providing high quality audio channel signals 220, 222, 224, 226.
It should be noted that additional features and functionalities, which can be implemented optionally in the audio decoder 200, will be described subsequently taking reference to Figs. 3, 5, 6 and 13. However, it should be noted that the audio decoder 200 may comprise the above-mentioned advantages without any additional modification.
3. Audio decoder according to Fig. 3
Fig. 3 shows a block schematic diagram of an audio decoder according to another embodiment of the present invention. The audio decoder of Fig. 3 is designated in its entirety with 300. The audio decoder 300 is similar to the audio decoder 200 according to Fig. 2, such that the above explanations also apply. However, the audio decoder 300 is supplemented with additional features and functionalities when compared to the audio decoder 200, as will be explained in the following.
The audio decoder 300 is configured to receive a jointly-encoded representation 310 of a first residual signal and of a second residual signal. Moreover, the audio decoder 300 is configured to receive a jointly-encoded representation 360 of a first downmix signal and of a second downmix signal. Moreover, the audio decoder 300 is configured to provide a first audio channel signal 320, a second audio channel signal 322, a third audio channel signal 324 and a fourth audio channel signal 326. The audio decoder 300 comprises a multi-channel decoder 330 which is configured to receive the jointly-encoded representation 310 of the first residual signal and of the second residual signal and to provide, on the basis thereof, a first residual signal 332 and a second residual signal 334. The audio decoder 300 also comprises a (first) residual-signal-assisted multi-channel decoding 340,
which receives the first residual signal 332 and a first downmix signal 312, and provides the first audio channel signal 320 and the second audio channel signal 322. The audio decoder 300 also comprises a (second) residual-signal-assisted multi-channel decoding 350, which is configured to receive the second residual signal 334 and a second downmix signal 314, and to provide the third audio channel signal 324 and the fourth audio channel signal 326.
The audio decoder 300 also comprises another multi-channel decoder 370, which is configured to receive the jointly-encoded representation 360 of the first downmix signal and of the second downmix signal, and to provide, on the basis thereof, the first downmix signal 312 and the second downmix signal 314.
In the following, some further specific details of the audio decoder 300 will be described. However, it should be noted that an actual audio decoder does not need to implement a combination of all these additional features and functionalities. Rather, the features and functionalities described in the following can be individually added to the audio decoder 200 (or any other audio decoder), to gradually improve the audio decoder 200 (or any other audio decoder).
In a preferred embodiment, the audio decoder 300 receives a jointly-encoded representation 310 of the first residual signal and the second residual signal, wherein this jointly-encoded representation 310 may comprise a downmix signal of the first residual signal 332 and of the second residual signal 334, and a common residual signal of the first residual signal 332 and the second residual signal 334. In addition, the jointly-encoded representation 310 may, for example, comprise one or more prediction parameters. Accordingly, the multi-channel decoder 330 may be a prediction-based, residual-signal-assisted multi-channel decoder. For example, the multi-channel decoder 330 may be a USAC complex stereo prediction, as described, for example, in the section "Complex Stereo Prediction" of the international standard ISO/IEC 23003-3:2012. For example, the multi-channel decoder 330 may be configured to evaluate a prediction parameter describing a contribution of a signal component, which is derived using a signal component of a previous frame, to a provision of the first residual signal 332 and the second residual signal 334 for a current frame. Moreover, the multi-channel decoder 330 may be configured to apply the common residual signal (which is included in the jointly-encoded representation 310) with a first sign, to obtain the first residual signal 332, and to apply the common residual signal (which is included in the jointly-encoded representation
310) with a second sign, which is opposite to the first sign, to obtain the second residual signal 334. Thus, the common residual signal may, at least partly, describe differences between the first residual signal 332 and the second residual signal 334. However, the multi-channel decoder 330 may evaluate the downmix signal, the common residual signal and the one or more prediction parameters, which are all included in the jointly-encoded representation 310, to obtain the first residual signal 332 and the second residual signal 334 as described in the above-referenced international standard ISO/IEC 23003-3:2012.
Moreover, it should be noted that the first residual signal 332 may be associated with a first horizontal position (or azimuth position), for example, a left horizontal position, and that the second residual signal 334 may be associated with a second horizontal position (or azimuth position), for example a right horizontal position, of an audio scene.
The jointly-encoded representation 360 of the first downmix signal and of the second downmix signal preferably comprises a downmix signal of the first downmix signal and of the second downmix signal, a common residual signal of the first downmix signal and of the second downmix signal, and one or more prediction parameters. In other words, there is a "common" downmix signal, into which the first downmix signal 312 and the second downmix signal 314 are downmixed, and there is a "common" residual signal which may describe, at least partly, differences between the first downmix signal 312 and the second downmix signal 314. The multi-channel decoder 370 is preferably a prediction-based, residual-signal-assisted multi-channel decoder, for example, a USAC complex stereo prediction decoder. In other words, the multi-channel decoder 370, which provides the first downmix signal 312 and the second downmix signal 314, may be substantially identical to the multi-channel decoder 330, which provides the first residual signal 332 and the second residual signal 334, such that the above explanations and references also apply.
Moreover, it should be noted that the first downmix signal 312 is preferably associated with a first horizontal position or azimuth position (for example, a left horizontal position or azimuth position) of the audio scene, and that the second downmix signal 314 is preferably associated with a second horizontal position or azimuth position (for example, a right horizontal position or azimuth position) of the audio scene. Accordingly, the first downmix signal 312 and the first residual signal 332 may be associated with the same, first horizontal position or azimuth position (for example, the left horizontal position), and the second downmix signal 314 and the second residual signal 334 may be associated with the same, second horizontal position or azimuth position (for example, the right horizontal position). Accordingly, both the multi-channel decoder 370 and the multi-channel decoder 330 may perform a horizontal splitting (or horizontal separation, or horizontal distribution).
The residual-signal-assisted multi-channel decoder 340 may preferably be parameter-based, and may consequently receive one or more parameters 342 describing a desired correlation between two channels (for example, between the first audio channel signal 320 and the second audio channel signal 322) and/or level differences between said two channels. For example, the residual-signal-assisted multi-channel decoding 340 may be based on an MPEG Surround coding (as described, for example, in ISO/IEC 23003-1:2007) with a residual signal extension, or a "unified stereo decoding" decoder (as described, for example, in ISO/IEC 23003-3, chapter 7.11 (Decoder) & Annex B.21 (Description of the Encoder & Definition of the Term "Unified Stereo")). Accordingly, the residual-signal-assisted multi-channel decoder 340 may provide the first audio channel signal 320 and the second audio channel signal 322, wherein the first audio channel signal 320 and the second audio channel signal 322 are associated with vertically neighboring positions of the audio scene. For example, the first audio channel signal may be associated with a lower left position of the audio scene, and the second audio channel signal may be associated with an upper left position of the audio scene (such that the first audio channel signal 320 and the second audio channel signal 322 are, for example, associated with identical horizontal positions or azimuth positions of the audio scene, or with azimuth positions separated by no more than 30 degrees). In other words, the residual-signal-assisted multi-channel decoder 340 may perform a vertical splitting (or distribution, or separation).
The functionality of the residual-signal-assisted multi-channel decoder 350 may be identical to the functionality of the residual-signal-assisted multi-channel decoder 340, wherein the third audio channel signal may, for example, be associated with a lower right position of the audio scene, and wherein the fourth audio channel signal may, for example, be associated with an upper right position of the audio scene. In other words, the third audio channel signal and the fourth audio channel signal may be associated with vertically neighboring positions of the audio scene, and may be associated with the same horizontal position or azimuth position of the audio scene, wherein the residual-signal-assisted multi-channel decoder 350 performs a vertical splitting (or separation, or distribution).
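A parameter-based upmix of the kind attributed to the decoders 340 and 350 can be sketched as follows. The gain formula is a common textbook energy-preserving split driven by a level-difference parameter, not the exact MPEG Surround / unified stereo computation; the function `parametric_vertical_upmix` and its parameters are hypothetical.

```python
import math

# Sketch of a parameter-based vertical upmix: one downmix is distributed
# to two vertically neighboring output channels using a transmitted
# level-difference parameter (in dB), and an optional residual signal
# refines the result. The gains are an energy-preserving split, which is
# an assumption of this sketch rather than the standardized formula.

def parametric_vertical_upmix(downmix, level_diff_db, residual=None):
    ratio = 10.0 ** (level_diff_db / 20.0)            # amplitude ratio lower/upper
    g_lower = ratio / math.sqrt(1.0 + ratio * ratio)  # gain of the lower channel
    g_upper = 1.0 / math.sqrt(1.0 + ratio * ratio)    # gain of the upper channel
    res = residual if residual is not None else [0.0] * len(downmix)
    # The residual refines the split, entering the two outputs with
    # opposite signs, analogous to the residual-signal assistance above.
    lower = [g_lower * d + r for d, r in zip(downmix, res)]
    upper = [g_upper * d - r for d, r in zip(downmix, res)]
    return lower, upper
```

For a level difference of 0 dB and no residual, the sketch distributes the downmix equally to both output channels while preserving its energy.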
To summarize, the audio decoder 300 according to Fig. 3 performs a hierarchical audio decoding, wherein a left-right splitting is performed in the first stage (multi-channel decoder 330, multi-channel decoder 370), and wherein an upper-lower splitting is
performed in the second stage (residual-signal-assisted multi-channel decoders 340, 350). Moreover, the residual signals 332, 334 are also encoded using a jointly-encoded representation 310, as are the downmix signals 312, 314 (jointly-encoded representation 360). Thus, correlations between the different channels are exploited both for the encoding (and decoding) of the downmix signals 312, 314 and for the encoding (and decoding) of the residual signals 332, 334. Accordingly, a high coding efficiency is achieved, and the correlations between the signals are well exploited.
4. Audio encoder according to Fig. 4
Fig. 4 shows a block schematic diagram of an audio encoder according to another embodiment of the present invention. The audio encoder according to Fig. 4 is designated in its entirety with 400. The audio encoder 400 is configured to receive four audio channel signals, namely a first audio channel signal 410, a second audio channel signal 412, a third audio channel signal 414 and a fourth audio channel signal 416. Moreover, the audio encoder 400 is configured to provide an encoded representation on the basis of the audio channel signals 410, 412, 414 and 416, wherein said encoded representation comprises a jointly-encoded representation 420 of two downmix signals, as well as an encoded representation of a first set 422 of common bandwidth extension parameters and of a second set 424 of common bandwidth extension parameters. The audio encoder 400 comprises a first bandwidth extension parameter extractor 430, which is configured to obtain the first set 422 of common bandwidth extension parameters on the basis of the first audio channel signal 410 and the third audio channel signal 414. The audio encoder 400 also comprises a second bandwidth extension parameter extractor 440, which is configured to obtain the second set 424 of common bandwidth extension parameters on the basis of the second audio channel signal 412 and the fourth audio channel signal 416. Moreover, the audio encoder 400 comprises a (first) multi-channel encoder 450, which is configured to jointly encode at least the first audio channel signal 410 and the second audio channel signal 412 using a multi-channel encoding, to obtain a first downmix signal 452. Further, the audio encoder 400 also comprises a (second) multi-channel encoder 460, which is configured to jointly encode at least the third audio channel signal 414 and the fourth audio channel signal 416 using a multi-channel encoding, to obtain a second downmix signal 462. Further, the audio encoder 400 also comprises a (third) multi-channel encoder 470, which is configured to jointly encode the first downmix signal 452
and the second downmix signal 462 using a multi-channel encoding, to obtain the jointly-encoded representation 420 of the downmix signals.
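The hierarchical encoder structure just described can be sketched structurally. All processing blocks are hypothetical stand-ins (simple averages and magnitude means); only the routing mirrors the description, in particular that each bandwidth extension parameter extractor sees channels feeding different first-stage downmixes.

```python
# Structural sketch of the encoder 400 (stand-in arithmetic, not the
# actual encoder mathematics): channels 410/412 and 414/416 are downmixed
# in a first stage, the two downmixes are jointly encoded in a second
# stage, and the two sets of bandwidth extension parameters are each
# derived from channels that feed *different* first-stage downmixes.

def downmix_pair(a, b):
    # Stand-in for multi-channel encoders 450/460.
    return [(x + y) / 2.0 for x, y in zip(a, b)]

def joint_encode(d1, d2):
    # Stand-in for multi-channel encoder 470: a (mid, side) pair.
    return ([(x + y) / 2.0 for x, y in zip(d1, d2)],
            [(x - y) / 2.0 for x, y in zip(d1, d2)])

def bwe_params(a, b):
    # Stand-in for the bandwidth extension parameter extractors 430/440:
    # here simply the per-channel mean magnitudes.
    return (sum(abs(x) for x in a) / len(a),
            sum(abs(x) for x in b) / len(b))

def encode_400(ch1, ch2, ch3, ch4):
    set1 = bwe_params(ch1, ch3)        # extractor 430: channels 410 and 414
    set2 = bwe_params(ch2, ch4)        # extractor 440: channels 412 and 416
    d1 = downmix_pair(ch1, ch2)        # encoder 450
    d2 = downmix_pair(ch3, ch4)        # encoder 460
    return joint_encode(d1, d2), set1, set2   # encoder 470
```

The "cross" is visible in `encode_400`: the extractor pairs (`ch1`, `ch3`) and (`ch2`, `ch4`) differ from the downmix pairs (`ch1`, `ch2`) and (`ch3`, `ch4`).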
Regarding the functionality of the audio encoder 400, it should be noted that the audio encoder 400 performs a hierarchical multi-channel encoding, wherein the first audio channel signal 410 and the second audio channel signal 412 are combined in a first stage, and wherein the third audio channel signal 414 and the fourth audio channel signal 416 are also combined in the first stage, to thereby obtain the first downmix signal 452 and the second downmix signal 462. The first downmix signal 452 and the second downmix signal 462 are then jointly encoded in a second stage. However, it should be noted that the first bandwidth extension parameter extractor 430 provides the first set 422 of common bandwidth extension parameters on the basis of audio channel signals 410, 414 which are handled by different multi-channel encoders 450, 460 in the first stage of the hierarchical multi-channel encoding. Similarly, the second bandwidth extension parameter extractor 440 provides a second set 424 of common bandwidth extension parameters on the basis of different audio channel signals 412, 416, which are handled by different multi-channel encoders 450, 460 in the first processing stage. This specific processing order brings along the advantage that the sets 422, 424 of bandwidth extension parameters are based on channels which are only combined in the second stage of the hierarchical encoding (i.e., in the multi-channel encoder 470). This is advantageous, since it is desirable to combine such audio channels in the first stage of the hierarchical encoding, the relationship of which is not highly relevant with respect to a sound source position perception. Rather, it is recommendable that the relationship between the first downmix signal and the second downmix signal mainly determines a sound source location perception, because the relationship between the first downmix signal 452 and the second downmix signal 462 can be maintained better than the relationship between the individual audio channel signals 410, 412, 414, 416. Worded differently, it has been found that it is desirable that the first set 422 of common bandwidth extension parameters is based on two audio channels (audio channel signals) which contribute to different ones of the downmix signals 452, 462, and that the second set 424 of common bandwidth extension parameters is provided on the basis of audio channel signals 412, 416, which also contribute to different ones of the downmix signals 452, 462, which is reached by the above-described processing of the audio channel signals in the hierarchical multi-channel encoding. Consequently, the first set 422 of common bandwidth extension parameters is based on a similar channel relationship when compared to the channel relationship between the first downmix signal 452 and the second downmix signal 462, wherein the
latter typically dominates the spatial impression generated at the side of an audio decoder. Accordingly, the provision of the first set 422 of bandwidth extension parameters, and also the provision of the second set 424 of bandwidth extension parameters, is well-adapted to a spatial hearing impression which is generated at the side of an audio decoder.
5. Audio decoder according to Fig. 5
Fig. 5 shows a block schematic diagram of an audio decoder according to another embodiment of the present invention. The audio decoder according to Fig. 5 is designated in its entirety with 500.
The audio decoder 500 is configured to receive a jointly-encoded representation 510 of a first downmix signal and a second downmix signal. Moreover, the audio decoder 500 is configured to provide a first bandwidth-extended channel signal 520, a second bandwidth-extended channel signal 522, a third bandwidth-extended channel signal 524 and a fourth bandwidth-extended channel signal 526.
The audio decoder 500 comprises a (first) multi-channel decoder 530, which is configured to provide a first downmix signal 532 and a second downmix signal 534 on the basis of the jointly-encoded representation 510 of the first downmix signal and the second downmix signal using a multi-channel decoding. The audio decoder 500 also comprises a (second) multi-channel decoder 540, which is configured to provide at least a first audio channel signal 542 and a second audio channel signal 544 on the basis of the first downmix signal 532 using a multi-channel decoding. The audio decoder 500 also comprises a (third) multi-channel decoder 550, which is configured to provide at least a third audio channel signal 556 and a fourth audio channel signal 558 on the basis of the second downmix signal 534 using a multi-channel decoding. Moreover, the audio decoder 500 comprises a (first) multi-channel bandwidth extension 560, which is configured to perform a multi-channel bandwidth extension on the basis of the first audio channel signal 542 and the third audio channel signal 556, to obtain the first bandwidth-extended channel signal 520 and the third bandwidth-extended channel signal 524. Moreover, the audio decoder comprises a (second) multi-channel bandwidth extension 570, which is configured to perform a multi-channel bandwidth extension on the basis of the second audio channel signal 544 and the fourth audio channel signal 558, to obtain the second
bandwidth-extended channel signal 522 and the fourth bandwidth-extended channel
signal 526.
Regarding the functionality of the audio decoder 500, it should be noted that the audio decoder 500 performs a hierarchical multi-channel decoding, wherein a splitting between a first downmix signal 532 and a second downmix signal 534 is performed in a first stage of the hierarchical decoding, and wherein the first audio channel signal 542 and the second audio channel signal 544 are derived from the first downmix signal 532 in a second stage of the hierarchical decoding, and wherein the third audio channel signal 556 and the fourth audio channel signal 558 are derived from the second downmix signal 534 in the second stage of the hierarchical decoding. However, both the first multi-channel bandwidth extension 560 and the second multi-channel bandwidth extension 570 each receive one audio channel signal which is derived from the first downmix signal 532 and one audio channel signal which is derived from the second downmix signal 534. Since a better channel separation is typically achieved by the (first) multi-channel decoding 530, which is performed as a first stage of the hierarchical multi-channel decoding, when compared to the second stage of the hierarchical decoding, it can be seen that each multi-channel bandwidth extension 560, 570 receives input signals which are well-separated (because they originate from the first downmix signal 532 and the second downmix signal 534, which are well-channel-separated). Thus, the multi-channel bandwidth extensions 560, 570 can consider stereo characteristics, which are important for a hearing impression, and which are well-represented by the relationship between the first downmix signal 532 and the second downmix signal 534, and can therefore provide a good hearing impression.
In other words, the "cross" structure of the audio decoder, wherein each of the multi-channel bandwidth extension stages 560, 570 receives input signals from both (second stage) multi-channel decoders 540, 550, allows for a good multi-channel bandwidth extension, which considers a stereo relationship between the channels.
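The "cross" routing can be made explicit with a purely structural sketch; all functions below are hypothetical stand-ins that merely tag signal names, so only the wiring matters.

```python
# Routing sketch of the "cross" structure: each multi-channel bandwidth
# extension receives one channel derived from the first downmix and one
# derived from the second downmix, so it operates on a well-separated
# left/right pair. All blocks are stand-ins (string-tagging functions).

def upmix(downmix_label):
    # Stand-in for the second-stage decoders 540/550: one downmix yields
    # two channel signals.
    return downmix_label + "-a", downmix_label + "-b"

def multi_channel_bwe(sig_x, sig_y):
    # Stand-in for the bandwidth extensions 560/570.
    return "bwe(" + sig_x + ")", "bwe(" + sig_y + ")"

def decode_500(dmx1_label, dmx2_label):
    ch1, ch2 = upmix(dmx1_label)               # decoder 540
    ch3, ch4 = upmix(dmx2_label)               # decoder 550
    # The "cross": BWE 560 pairs ch1 with ch3, and BWE 570 pairs ch2 with
    # ch4, so each BWE sees one channel from each downmix.
    out1, out3 = multi_channel_bwe(ch1, ch3)   # bandwidth extension 560
    out2, out4 = multi_channel_bwe(ch2, ch4)   # bandwidth extension 570
    return out1, out2, out3, out4

print(decode_500("left", "right"))
```

Running the sketch shows that every bandwidth extension output depends on channels from both downmixes, which is exactly the routing argued for in the text.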
However, it should be noted that the audio decoder 500 can be supplemented by any of
the features and functionalities described herein with respect to the audio decoders
according to Figs. 2, 3, 6 and 13, wherein it is possible to introduce individual features into
the audio decoder 500 to gradually improve the performance of the audio decoder.
6. Audio decoder according to Fig. 6
Fig. 6 shows a block schematic diagram of an audio decoder according to another embodiment of the present invention. The audio decoder according to Fig. 6 is designated in its entirety with 600. The audio decoder 600 according to Fig. 6 is similar to the audio decoder 500 according to Fig. 5, such that the above explanations also apply. However, the audio decoder 600 has been supplemented by some features and functionalities, which can also be introduced, individually or in combination, into the audio decoder 500 for improvement.
The audio decoder 600 is configured to receive a jointly-encoded representation 610 of a first downmix signal and of a second downmix signal and to provide a first bandwidth-extended signal 620, a second bandwidth-extended signal 622, a third bandwidth-extended signal 624 and a fourth bandwidth-extended signal 626. The audio decoder 600 comprises a multi-channel decoder 630, which is configured to receive the jointly-encoded representation 610 of the first downmix signal and of the second downmix signal, and to provide, on the basis thereof, the first downmix signal 632 and the second downmix signal 634. The audio decoder 600 further comprises a multi-channel decoder 640, which is configured to receive the first downmix signal 632 and to provide, on the basis thereof, a first audio channel signal 642 and a second audio channel signal 644. The audio decoder 600 also comprises a multi-channel decoder 650, which is configured to receive the second downmix signal 634 and to provide a third audio channel signal 656 and a fourth audio channel signal 658. The audio decoder 600 also comprises a (first) multi-channel bandwidth extension 660, which is configured to receive the first audio channel signal 642 and the third audio channel signal 656 and to provide, on the basis thereof, the first bandwidth-extended channel signal 620 and the third bandwidth-extended channel signal 624. Also, a (second) multi-channel bandwidth extension 670 receives the second audio channel signal 644 and the fourth audio channel signal 658 and provides, on the basis thereof, the second bandwidth-extended channel signal 622 and the fourth bandwidth-extended channel signal 626.
The audio decoder 600 also comprises a further multi-channel decoder 680, which is
configured to receive a jointly-encoded representation 682 of a first residual signal and of
a second residual signal and which provides, on the basis thereof, a first residual signal
684 for usage by the multi-channel decoder 640 and a second residual signal 686 for
usage by the multi-channel decoder 650.
The multi-channel decoder 630 is preferably a prediction-based, residual-signal-assisted multi-channel decoder. For example, the multi-channel decoder 630 may be substantially identical to the multi-channel decoder 370 described above. For example, the multi-channel decoder 630 may be a USAC complex stereo prediction decoder, as mentioned above, and as described in the USAC standard referenced above. Accordingly, the jointly-encoded representation 610 of the first downmix signal and of the second downmix signal may, for example, comprise a (common) downmix signal of the first downmix signal and of the second downmix signal, a (common) residual signal of the first downmix signal and of the second downmix signal, and one or more prediction parameters, which are evaluated by the multi-channel decoder 630.
Moreover, it should be noted that the first downmix signal 632 may, for example, be
associated with a first horizontal position or azimuth position (for example, a left horizontal
position) of an audio scene and that the second downmix signal 634 may, for example, be
associated with a second horizontal position or azimuth position (for example, a right
horizontal position) of the audio scene.
Moreover, the multi-channel decoder 680 may, for example, be a prediction-based, residual-signal-assisted multi-channel decoder. The multi-channel decoder 680 may be substantially identical to the multi-channel decoder 330 described above. For example, the multi-channel decoder 680 may be a USAC complex stereo prediction decoder, as mentioned above. Consequently, the jointly-encoded representation 682 of the first residual signal and of the second residual signal may comprise a (common) downmix signal of the first residual signal and of the second residual signal, a (common) residual signal of the first residual signal and of the second residual signal, and one or more prediction parameters, which are evaluated by the multi-channel decoder 680. Moreover, it should be noted that the first residual signal 684 may be associated with a first horizontal position or azimuth position (for example, a left horizontal position) of the audio scene, and that the second residual signal 686 may be associated with a second horizontal position or azimuth position (for example, a right horizontal position) of the audio scene.
The multi-channel decoder 640 may, for example, be a parameter-based multi-channel decoding like, for example, an MPEG Surround multi-channel decoding, as described above and in the referenced standard. However, in the presence of the (optional) multi-channel decoder 680 and the (optional) first residual signal 684, the multi-channel
decoder 640 may be a parameter-based, residual-signal-assisted multi-channel decoder, like, for example, a unified stereo decoder. Thus, the multi-channel decoder 640 may be substantially identical to the multi-channel decoder 340 described above, and the multi-channel decoder 640 may, for example, receive the parameters 342 described above. Similarly, the multi-channel decoder 650 may be substantially identical to the multi-channel decoder 640. Accordingly, the multi-channel decoder 650 may, for example, be parameter-based and may optionally be residual-signal-assisted (in the presence of the optional multi-channel decoder 680).
Moreover, it should be noted that the first audio channel signal 642 and the second audio channel signal 644 are preferably associated with vertically adjacent spatial positions of the audio scene. For example, the first audio channel signal 642 is associated with a lower left position of the audio scene and the second audio channel signal 644 is associated with an upper left position of the audio scene. Accordingly, the multi-channel decoder 640 performs a vertical splitting (or separation, or distribution) of the audio content described by the first downmix signal 632 (and, optionally, by the first residual signal 684). Similarly, the third audio channel signal 656 and the fourth audio channel signal 658 are associated with vertically adjacent positions of the audio scene, and are preferably associated with the same horizontal position or azimuth position of the audio scene. For example, the third audio channel signal 656 is preferably associated with a lower right position of the audio scene and the fourth audio channel signal 658 is preferably associated with an upper right position of the audio scene. Thus, the multi-channel decoder 650 performs a vertical splitting (or separation, or distribution) of the audio content described by the second downmix signal 634 (and, optionally, the second residual signal 686).
However, the first multi-channel bandwidth extension 660 receives the first audio channel signal 642 and the third audio channel signal 656, which are associated with the lower left position and a lower right position of the audio scene. Accordingly, the first multi-channel bandwidth extension 660 performs a multi-channel bandwidth extension on the basis of two audio channel signals which are associated with the same horizontal plane (for example, the lower horizontal plane) or elevation of the audio scene and with different sides (left/right) of the audio scene. Accordingly, the multi-channel bandwidth extension can consider stereo characteristics (for example, the human stereo perception) when performing the bandwidth extension. Similarly, the second multi-channel bandwidth
extension 670 may also consider stereo characteristics, since the second multi-channel bandwidth extension operates on audio channel signals of the same horizontal plane (for example, the upper horizontal plane) or elevation but at different horizontal positions (different sides, left/right) of the audio scene.
To further conclude, the hierarchical audio decoder 600 comprises a structure wherein a left/right splitting (or separation, or distribution) is performed in a first stage (multi-channel decoding 630, 680), wherein a vertical splitting (or separation, or distribution) is performed in a second stage (multi-channel decoding 640, 650), and wherein the multi-channel bandwidth extension operates on a pair of left/right signals (multi-channel bandwidth extension 660, 670). This "crossing" of the decoding paths allows the left/right separation, which is particularly important for the hearing impression (for example, more important than the upper/lower splitting), to be performed in the first processing stage of the hierarchical audio decoder, and allows the multi-channel bandwidth extension to also be performed on a pair of left/right audio channel signals, which again results in a particularly good hearing impression. The upper/lower splitting is performed as an intermediate stage between the left/right separation and the multi-channel bandwidth extension, which makes it possible to derive four audio channel signals (or bandwidth-extended channel signals) without significantly degrading the hearing impression.
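The "crossing" structure described above can be illustrated with a toy sketch, in which trivial sum/difference boxes stand in for the complex prediction stereo decoding and for the MPEG surround 2-1-2 / unified stereo decoding, and a pass-through stands in for the multi-channel bandwidth extension (all function names and the toy math are illustrative assumptions, not the standardized algorithms):

```python
def ms_decode(mid, side):
    """Toy stand-in for a residual-assisted two-channel decoding box."""
    first = [m + s for m, s in zip(mid, side)]
    second = [m - s for m, s in zip(mid, side)]
    return first, second

def bandwidth_extension(ch_a, ch_b):
    """Pass-through stand-in for the stereo multi-channel bandwidth extension."""
    return list(ch_a), list(ch_b)

def hierarchical_decode(dmx_joint, dmx_residual, res_joint, res_residual):
    # Stage 1: left/right splitting of the jointly coded downmixes and residuals
    dmx_left, dmx_right = ms_decode(dmx_joint, dmx_residual)
    res_left, res_right = ms_decode(res_joint, res_residual)
    # Stage 2: vertical splitting per side (corresponding to decoders 640, 650)
    lower_left, upper_left = ms_decode(dmx_left, res_left)
    lower_right, upper_right = ms_decode(dmx_right, res_right)
    # Stage 3: bandwidth extension on left/right pairs of the same horizontal plane
    lower_left, lower_right = bandwidth_extension(lower_left, lower_right)
    upper_left, upper_right = bandwidth_extension(upper_left, upper_right)
    return lower_left, upper_left, lower_right, upper_right
```

The point of the sketch is the ordering: left/right separation first, vertical separation second, and bandwidth extension last, operating on left/right pairs.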
7. Method according to Fig. 7
Fig. 7 shows a flow chart of a method 700 for providing an encoded representation on the basis of at least four audio channel signals.

The method 700 comprises jointly encoding 710 at least a first audio channel signal and a second audio channel signal using a residual-signal-assisted multi-channel encoding, to obtain a first downmix signal and a first residual signal. The method also comprises jointly encoding 720 at least a third audio channel signal and a fourth audio channel signal using a residual-signal-assisted multi-channel encoding, to obtain a second downmix signal and a second residual signal. The method further comprises jointly encoding 730 the first residual signal and the second residual signal using a multi-channel encoding, to obtain an encoded representation of the residual signals. However, it should be noted that the method 700 can be supplemented by any of the features and functionalities described herein with respect to the audio encoders and audio decoders.
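The three encoding steps of method 700 can be sketched as follows. A simple sum/difference transform stands in here for the residual-signal-assisted multi-channel encodings and for the multi-channel encoding of step 730; the actual tools (MPEG surround 2-1-2 / unified stereo, complex prediction) are far more elaborate, so this is an illustrative assumption only:

```python
def ms_encode(a, b):
    """Toy downmix/residual decomposition of two channel signals."""
    downmix = [(x + y) / 2 for x, y in zip(a, b)]
    residual = [(x - y) / 2 for x, y in zip(a, b)]
    return downmix, residual

def encode_method_700(ch1, ch2, ch3, ch4):
    dmx1, res1 = ms_encode(ch1, ch2)   # step 710: jointly encode channels 1 and 2
    dmx2, res2 = ms_encode(ch3, ch4)   # step 720: jointly encode channels 3 and 4
    res_joint = ms_encode(res1, res2)  # step 730: jointly encode the two residuals
    return dmx1, dmx2, res_joint
```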
8. Method according to Fig. 8
Fig. 8 shows a flow chart of a method 800 for providing at least four audio channel signals on the basis of an encoded representation.

The method 800 comprises providing 810 a first residual signal and a second residual signal on the basis of a jointly-encoded representation of the first residual signal and the second residual signal using a multi-channel decoding. The method 800 also comprises providing 820 a first audio channel signal and a second audio channel signal on the basis of a first downmix signal and the first residual signal using a residual-signal-assisted multi-channel decoding. The method also comprises providing 830 a third audio channel signal and a fourth audio channel signal on the basis of a second downmix signal and the second residual signal using a residual-signal-assisted multi-channel decoding.
Moreover, it should be noted that the method 800 can be supplemented by any of the features and functionalities described herein with respect to the audio decoders and audio encoders.
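The decoding steps of method 800 can be sketched as the inverse of the same toy sum/difference transform (again a hypothetical stand-in for the residual-signal-assisted multi-channel decodings, not the standardized algorithms):

```python
def ms_decode(downmix, residual):
    """Inverse of a toy sum/difference downmix/residual encoding."""
    return ([d + r for d, r in zip(downmix, residual)],
            [d - r for d, r in zip(downmix, residual)])

def decode_method_800(dmx1, dmx2, res_joint):
    res_dmx, res_res = res_joint
    res1, res2 = ms_decode(res_dmx, res_res)  # step 810: recover both residuals
    ch1, ch2 = ms_decode(dmx1, res1)          # step 820: channels 1 and 2
    ch3, ch4 = ms_decode(dmx2, res2)          # step 830: channels 3 and 4
    return ch1, ch2, ch3, ch4
```

With matching toy encodings on the encoder side, this decoding reconstructs the four channel signals exactly.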
9. Method according to Fig. 9

Fig. 9 shows a flow chart of a method 900 for providing an encoded representation on the basis of at least four audio channel signals.

The method 900 comprises obtaining 910 a first set of common bandwidth extension parameters on the basis of a first audio channel signal and a third audio channel signal. The method 900 also comprises obtaining 920 a second set of common bandwidth extension parameters on the basis of a second audio channel signal and a fourth audio channel signal. The method also comprises jointly encoding 930 at least the first audio channel signal and the second audio channel signal using a multi-channel encoding, to obtain a first downmix signal, and jointly encoding 940 at least the third audio channel signal and the fourth audio channel signal using a multi-channel encoding, to obtain a second downmix signal. The method also comprises jointly encoding 950 the first downmix signal and the second downmix signal using a multi-channel encoding, to obtain an encoded representation of the downmix signals.
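The step ordering of method 900, and in particular the "crossed" pairing (bandwidth extension parameters over channels 1/3 and 2/4, joint encoding over channels 1/2 and 3/4), can be sketched as below. The energy-based "common" parameter and the sum downmixes are hypothetical stand-ins for the real tools:

```python
def common_bwe_params(ch_a, ch_b):
    """Toy common parameter: joint energy of a left/right channel pair."""
    return sum(x * x for x in ch_a) + sum(x * x for x in ch_b)

def encode_method_900(ch1, ch2, ch3, ch4):
    bwe_lower = common_bwe_params(ch1, ch3)  # step 910: channels 1 and 3
    bwe_upper = common_bwe_params(ch2, ch4)  # step 920: channels 2 and 4
    # Steps 930/940: vertical pairs jointly encoded to downmix signals
    dmx1 = [(a + b) / 2 for a, b in zip(ch1, ch2)]
    dmx2 = [(a + b) / 2 for a, b in zip(ch3, ch4)]
    # Step 950: joint encoding of the two downmix signals
    joint_dmx = [(a + b) / 2 for a, b in zip(dmx1, dmx2)]
    return bwe_lower, bwe_upper, joint_dmx
```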
It should be noted that some of the steps of the method 900, which do not comprise specific interdependencies, can be performed in arbitrary order or in parallel. Moreover, it should be noted that the method 900 can be supplemented by any of the features and functionalities described herein with respect to the audio encoders and audio decoders.
10. Method according to Fig. 10
Fig. 10 shows a flow chart of a method 1000 for providing at least four audio channel signals on the basis of an encoded representation.

The method 1000 comprises providing 1010 a first downmix signal and a second downmix signal on the basis of a jointly encoded representation of the first downmix signal and the second downmix signal using a multi-channel decoding, providing 1020 at least a first audio channel signal and a second audio channel signal on the basis of the first downmix signal using a multi-channel decoding, providing 1030 at least a third audio channel signal and a fourth audio channel signal on the basis of the second downmix signal using a multi-channel decoding, performing 1040 a multi-channel bandwidth extension on the basis of the first audio channel signal and the third audio channel signal, to obtain a first bandwidth-extended channel signal and a third bandwidth-extended channel signal, and performing 1050 a multi-channel bandwidth extension on the basis of the second audio channel signal and the fourth audio channel signal, to obtain a second bandwidth-extended channel signal and a fourth bandwidth-extended channel signal.
It should be noted that some of the steps of the method 1000 may be performed in parallel or in a different order. Moreover, it should be noted that the method 1000 can be supplemented by any of the features and functionalities described herein with respect to the audio encoders and the audio decoders.
11. Embodiments according to Figs. 11, 12 and 13
In the following, some additional embodiments according to the present invention and the underlying considerations will be described.

Fig. 11 shows a block schematic diagram of an audio encoder 1100 according to an embodiment of the invention. The audio encoder 1100 is configured to receive a left lower
channel signal 1110, a left upper channel signal 1112, a right lower channel signal 1114 and a right upper channel signal 1116.
The audio encoder 1100 comprises a first multi-channel audio encoder (or encoding) 1120, which is an MPEG surround 2-1-2 audio encoder (or encoding) or a unified stereo audio encoder (or encoding) and which receives the left lower channel signal 1110 and the left upper channel signal 1112. The first multi-channel audio encoder 1120 provides a left downmix signal 1122 and, optionally, a left residual signal 1124. Moreover, the audio encoder 1100 comprises a second multi-channel encoder (or encoding) 1130, which is an MPEG surround 2-1-2 encoder (or encoding) or a unified stereo encoder (or encoding) and which receives the right lower channel signal 1114 and the right upper channel signal 1116. The second multi-channel audio encoder 1130 provides a right downmix signal 1132 and, optionally, a right residual signal 1134. The audio encoder 1100 also comprises a stereo coder (or coding) 1140, which receives the left downmix signal 1122 and the right downmix signal 1132. Moreover, the first stereo coding 1140, which is a complex prediction stereo coding, receives psycho acoustic model information 1142 from a psycho acoustic model. For example, the psycho acoustic model information 1142 may describe the psycho acoustic relevance of different frequency bands or frequency subbands, psycho acoustic masking effects and the like. The stereo coding 1140 provides a channel pair element (CPE) "downmix", which is designated with 1144 and which describes the left downmix signal 1122 and the right downmix signal 1132 in a jointly encoded form.
Moreover, the audio encoder 1100 optionally comprises a second stereo coder (or coding) 1150, which is configured to receive the optional left residual signal 1124 and the optional right residual signal 1134, as well as the psycho acoustic model information 1142. The second stereo coding 1150, which is a complex prediction stereo coding, is configured to provide a channel pair element (CPE) "residual", which represents the left residual signal 1124 and the right residual signal 1134 in a jointly encoded form.
The encoder 1100 (as well as the other audio encoders described herein) is based on the idea that horizontal and vertical signal dependencies are exploited by hierarchically combining available USAC stereo tools (i.e., encoding concepts which are available in the USAC encoding). Vertically neighbored channel pairs are combined using MPEG surround 2-1-2 or unified stereo (designated with 1120 and 1130) with a band-limited or full-band residual signal (designated with 1124 and 1134). The output of each vertical channel pair is a downmix signal 1122, 1132 and, for the unified stereo, a residual signal 1124, 1134. In order to satisfy perceptual requirements for binaural unmasking, both
downmix signals 1122, 1132 are combined horizontally and jointly coded by use of complex prediction (encoder 1140) in the MDCT domain, which includes the possibility of left-right and mid-side coding. The same method can be applied to the horizontally combined residual signals 1124, 1134. This concept is illustrated in Fig. 11.
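The prediction idea behind this joint stereo coding can be sketched in a strongly simplified, real-valued form: the side spectrum is predicted from the mid spectrum and only the prediction error is transmitted. The real USAC tool uses a complex-valued coefficient (with an imaginary part derived from an MDST estimate) per frequency band; the real scalar alpha below is an illustrative simplification:

```python
def predictive_ms_encode(left, right, alpha):
    """Encode a stereo pair as mid plus the prediction error of side given mid."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    error = [s - alpha * m for s, m in zip(side, mid)]
    return mid, error

def predictive_ms_decode(mid, error, alpha):
    """Invert the prediction and the mid/side transform."""
    side = [e + alpha * m for e, m in zip(error, mid)]
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

When alpha matches the actual mid/side correlation, the error signal carries little energy, which is what makes the joint coding efficient.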
The hierarchical structure explained with reference to Fig. 11 can be achieved by enabling both stereo tools (for example, both USAC stereo tools) and resorting channels in between. Thus, no additional pre-/post-processing step is necessary and the bit stream syntax for transmission of the tool's payloads remains unchanged (for example, substantially unchanged when compared to the USAC standard). This idea results in the encoder structure shown in Fig. 12.
Fig. 12 shows a block schematic diagram of an audio encoder 1200 according to an embodiment of the invention. The audio encoder 1200 is configured to receive a first channel signal 1210, a second channel signal 1212, a third channel signal 1214 and a fourth channel signal 1216. The audio encoder 1200 is configured to provide a bit stream 1220 for a first channel pair element and a bit stream 1222 for a second channel pair element.

The audio encoder 1200 comprises a first multi-channel encoder 1230, which is an MPEG surround 2-1-2 encoder or a unified stereo encoder, and which receives the first channel signal 1210 and the second channel signal 1212. Moreover, the first multi-channel encoder 1230 provides a first downmix signal 1232, an MPEG surround payload 1236 and, optionally, a first residual signal 1234. The audio encoder 1200 also comprises a second multi-channel encoder 1240, which is an MPEG surround 2-1-2 encoder or a unified stereo encoder and which receives the third channel signal 1214 and the fourth channel signal 1216. The second multi-channel encoder 1240 provides a second downmix signal 1242, an MPEG surround payload 1246 and, optionally, a second residual signal 1244.
The audio encoder 1200 also comprises a first stereo coding 1250, which is a complex prediction stereo coding. The first stereo coding 1250 receives the first downmix signal 1232 and the second downmix signal 1242. The first stereo coding 1250 provides a jointly encoded representation 1252 of the first downmix signal 1232 and the second downmix signal 1242, wherein the jointly encoded representation 1252 may comprise a representation of a (common) downmix signal (of the first downmix signal 1232 and of the
second downmix signal 1242) and of a common residual signal (of the first downmix signal 1232 and of the second downmix signal 1242). Moreover, the (first) complex prediction stereo coding 1250 provides a complex prediction payload 1254, which typically comprises one or more complex prediction coefficients. Moreover, the audio encoder 1200 also comprises a second stereo coding 1260, which is a complex prediction stereo coding. The second stereo coding 1260 receives the first residual signal 1234 and the second residual signal 1244 (or zero input values, if there is no residual signal provided by the multi-channel encoders 1230, 1240). The second stereo coding 1260 provides a jointly encoded representation 1262 of the first residual signal 1234 and of the second residual signal 1244, which may, for example, comprise a (common) downmix signal (of the first residual signal 1234 and of the second residual signal 1244) and a common residual signal (of the first residual signal 1234 and of the second residual signal 1244). Moreover, the complex prediction stereo coding 1260 provides a complex prediction payload 1264, which typically comprises one or more prediction coefficients.
Moreover, the audio encoder 1200 comprises a psycho acoustic model 1270, which provides information that controls the first complex prediction stereo coding 1250 and the second complex prediction stereo coding 1260. For example, the information provided by the psycho acoustic model 1270 may describe which frequency bands or frequency bins are of high psycho acoustic relevance and should be encoded with high accuracy. However, it should be noted that the usage of the information provided by the psycho acoustic model 1270 is optional.
Moreover, the audio encoder 1200 comprises a first encoder and multiplexer 1280, which receives the jointly encoded representation 1252 from the first complex prediction stereo coding 1250, the complex prediction payload 1254 from the first complex prediction stereo coding 1250 and the MPEG surround payload 1236 from the first multi-channel audio encoder 1230. Moreover, the first encoding and multiplexing 1280 may receive information from the psycho acoustic model 1270, which describes, for example, which encoding precision should be applied to which frequency bands or frequency subbands, taking into account psycho acoustic masking effects and the like. Accordingly, the first encoding and multiplexing 1280 provides the first channel pair element bit stream 1220.

Moreover, the audio encoder 1200 comprises a second encoding and multiplexing 1290, which is configured to receive the jointly encoded representation 1262 provided by the second complex prediction stereo encoding 1260, the complex prediction payload 1264
provided by the second complex prediction stereo coding 1260, and the MPEG surround payload 1246 provided by the second multi-channel audio encoder 1240. Moreover, the second encoding and multiplexing 1290 may receive information from the psycho acoustic model 1270. Accordingly, the second encoding and multiplexing 1290 provides the second channel pair element bit stream 1222.
Regarding the functionality of the audio encoder 1200, reference is made to the above
explanations, and also to the explanations with respect to the audio encoders according to
Figs. 2, 3, 5 and 6.
Moreover, it should be noted that this concept can be extended to use multiple MPEG surround boxes for joint coding of horizontally, vertically or otherwise geometrically related channels and for combining the downmix and residual signals to complex prediction stereo pairs, considering their geometric and perceptual properties. This leads to a generalized decoder structure.
In the following, the implementation of a quad channel element will be described. In a three-dimensional audio coding system, the hierarchical combination of four channels to form a quad channel element (QCE) is used. A QCE consists of two USAC channel pair elements (CPEs) (or provides two USAC channel pair elements, or receives two USAC channel pair elements). Vertical channel pairs are combined using MPS 2-1-2 or unified stereo. The downmix channels are jointly coded in the first channel pair element CPE. If residual coding is applied, the residual signals are jointly coded in the second channel pair element CPE; otherwise, the signal in the second CPE is set to zero. Both channel pair elements (CPEs) use complex prediction for joint stereo coding, including the possibility of left-right and mid-side coding. To preserve the perceptual stereo properties of the high frequency part of the signal, stereo SBR (spectral bandwidth replication) is applied between the upper left/right channel pair and the lower left/right channel pair, by an additional resorting step before the application of SBR.
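The assembly of a QCE from two consecutive channel pair elements, including the zero signal in the second CPE when residual coding is not applied, can be sketched as follows (the container layout below is a hypothetical illustration, not the normative bit stream structure):

```python
def build_qce(dmx_left, dmx_right, res_left=None, res_right=None):
    """Assemble a quad channel element as two consecutive channel pair
    elements: the first carries the downmix pair, the second carries the
    residual pair, or zero signals when residual coding is not applied
    (plain MPS 2-1-2 instead of unified stereo)."""
    n = len(dmx_left)
    cpe_downmix = (dmx_left, dmx_right)
    if res_left is None or res_right is None:
        cpe_residual = ([0.0] * n, [0.0] * n)  # zero signal in the second CPE
    else:
        cpe_residual = (res_left, res_right)
    return [cpe_downmix, cpe_residual]
```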
A possible decoder structure will be described taking reference to Fig. 13, which shows a block schematic diagram of an audio decoder according to an embodiment of the invention. The audio decoder 1300 is configured to receive a first bit stream 1310
representing a first channel pair element and a second bit stream 1312 representing a second channel pair element. However, the first bit stream 1310 and the second bit stream 1312 may be included in a common overall bit stream.

The audio decoder 1300 is configured to provide a first bandwidth extended channel signal 1320, which may, for example, represent a lower left position of an audio scene, a second bandwidth extended channel signal 1322, which may, for example, represent an upper left position of the audio scene, a third bandwidth extended channel signal 1324, which may, for example, be associated with a lower right position of the audio scene, and a fourth bandwidth extended channel signal 1326, which may, for example, be associated with an upper right position of the audio scene.
The audio decoder 1300 comprises a first bit stream decoding 1330, which is configured to receive the bit stream 1310 for the first channel pair element and to provide, on the basis thereof, a jointly-encoded representation 1332 of two downmix signals, a complex prediction payload 1334, an MPEG surround payload 1336 and a spectral bandwidth replication payload 1338. The audio decoder 1300 also comprises a first complex prediction stereo decoding 1340, which is configured to receive the jointly encoded representation 1332 and the complex prediction payload 1334 and to provide, on the basis thereof, a first downmix signal 1342 and a second downmix signal 1344. Similarly, the audio decoder 1300 comprises a second bit stream decoding 1350, which is configured to receive the bit stream 1312 for the second channel pair element and to provide, on the basis thereof, a jointly encoded representation 1352 of two residual signals, a complex prediction payload 1354, an MPEG surround payload 1356 and a spectral bandwidth replication payload 1358. The audio decoder also comprises a second complex prediction stereo decoding 1360, which provides a first residual signal 1362 and a second residual signal 1364 on the basis of the jointly encoded representation 1352 and the complex prediction payload 1354.
Moreover, the audio decoder 1300 comprises a first MPEG surround-type multi-channel decoding 1370, which is an MPEG surround 2-1-2 decoding or a unified stereo decoding. The first MPEG surround-type multi-channel decoding 1370 receives the first downmix signal 1342, the first residual signal 1362 (optional) and the MPEG surround payload 1336 and provides, on the basis thereof, a first audio channel signal 1372 and a second audio channel signal 1374. The audio decoder 1300 also comprises a second MPEG surround-type multi-channel decoding 1380, which is an MPEG surround 2-1-2 multi-channel
decoding or a unified stereo multi-channel decoding. The second MPEG surround-type multi-channel decoding 1380 receives the second downmix signal 1344 and the second residual signal 1364 (optional), as well as the MPEG surround payload 1356, and provides, on the basis thereof, a third audio channel signal 1382 and a fourth audio channel signal 1384. The audio decoder 1300 also comprises a first stereo spectral bandwidth replication 1390, which is configured to receive the first audio channel signal 1372 and the third audio channel signal 1382, as well as the spectral bandwidth replication payload 1338, and to provide, on the basis thereof, the first bandwidth extended channel signal 1320 and the third bandwidth extended channel signal 1324. Moreover, the audio decoder comprises a second stereo spectral bandwidth replication 1394, which is configured to receive the second audio channel signal 1374 and the fourth audio channel signal 1384, as well as the spectral bandwidth replication payload 1358, and to provide, on the basis thereof, the second bandwidth extended channel signal 1322 and the fourth bandwidth extended channel signal 1326.
Regarding the functionality of the audio decoder 1300, reference is made to the above discussion, and also to the discussion of the audio decoder according to Figs. 2, 3, 5 and 6.
In the following, an example of a bit stream which can be used for the audio encoding/decoding described herein will be described taking reference to Figs. 14a and 14b. It should be noted that the bit stream may, for example, be an extension of the bit stream used in unified speech-and-audio coding (USAC), which is described in the above-mentioned standard (ISO/IEC 23003-3:2012). For example, the MPEG surround payloads 1236, 1246, 1336, 1356 and the complex prediction payloads 1254, 1264, 1334, 1354 may be transmitted as for legacy channel pair elements (i.e., for channel pair elements according to the USAC standard). For signaling the use of a quad channel element QCE, the USAC channel pair configuration may be extended by two bits, as shown in Fig. 14a. In other words, two bits designated with "qceIndex" may be added to the USAC bit stream element "UsacChannelPairElementConfig()". The meaning of the parameter represented by the bits "qceIndex" can be defined, for example, as shown in the table of Fig. 14b.
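Extracting such a two-bit field from a configuration element can be sketched with a minimal MSB-first bit reader. The `BitReader` class and `parse_qce_index` helper are hypothetical, and the mapping of qceIndex values to QCE modes (given in the table of Fig. 14b) is not reproduced here:

```python
class BitReader:
    """Minimal MSB-first bit reader over a byte buffer (illustrative only)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bit position

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

def parse_qce_index(reader: BitReader) -> int:
    """Read the 2-bit qceIndex field appended to the channel pair config."""
    return reader.read_bits(2)
```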
For example, two channel pair elements that form a QCE may be transmitted as
consecutive elements, first the CPE containing the downmix channels and the MPS
payload for the first MPS box, second the CPE containing the residual signal (or zero
audio signal for MPS 2-1-2 coding) and the MPS payload for the second MPS box.
In other words, there is only a small signaling overhead when compared to the conventional USAC bit stream for transmitting a quad channel element QCE.

However, different bit stream formats can naturally also be used.
12. Encoding/decoding environment
In the following, an audio encoding/decoding environment will be described in which concepts according to the present invention can be applied.

A 3D audio codec system, in which the concepts according to the present invention can be used, is based on an MPEG-D USAC codec for decoding of channel and object signals. To increase the efficiency for coding a large amount of objects, MPEG SAOC technology has been adapted. Three types of renderers perform the tasks of rendering objects to channels, rendering channels to headphones or rendering channels to a different loudspeaker setup. When object signals are explicitly transmitted or parametrically encoded using SAOC, the corresponding object metadata information is compressed and multiplexed into the 3D audio bit stream.
Fig. 15 shows a block schematic diagram of such an audio encoder, and Fig. 16 shows a block schematic diagram of such an audio decoder. In other words, Figs. 15 and 16 show the different algorithmic blocks of the 3D audio system.
Taking reference now to Fig. 15, which shows a block schematic diagram of a 3D audio encoder 1500, some details will be explained. The encoder 1500 comprises an optional pre-renderer/mixer 1510, which receives one or more channel signals 1512 and one or more object signals 1514 and provides, on the basis thereof, one or more channel signals 1516 as well as one or more object signals 1518, 1520. The audio encoder also comprises a USAC encoder 1530 and, optionally, a SAOC encoder 1540. The SAOC encoder 1540 is configured to provide one or more SAOC transport channels 1542 and SAOC side information 1544 on the basis of one or more objects 1520 provided to the SAOC encoder. Moreover, the USAC encoder 1530 is configured to receive the channel signals 1516 comprising channels and pre-rendered objects from the pre-renderer/mixer, to receive one or more object signals 1518 from the pre-renderer/mixer and to receive one or more SAOC transport channels 1542 and SAOC side information 1544, and provides,
on the basis thereof, an encoded representation 1532. Moreover, the audio encoder 1500 also comprises an object metadata encoder 1550, which is configured to receive object metadata 1552 (which may be evaluated by the pre-renderer/mixer 1510) and to encode the object metadata to obtain encoded object metadata 1554. The encoded metadata is also received by the USAC encoder 1530 and used to provide the encoded representation 1532.
Some details regarding the individual components of the audio encoder 1500 will be described below.
Taking reference now to Fig. 16, an audio decoder 1600 will be described. The audio decoder 1600 is configured to receive an encoded representation 1610 and to provide, on the basis thereof, multi-channel loudspeaker signals 1612, headphone signals 1614 and/or loudspeaker signals 1616 in an alternative format (for example, in a 5.1 format).
The audio decoder 1600 comprises a USAC decoder 1620, and provides one or more channel signals 1622, one or more pre-rendered object signals 1624, one or more object signals 1626, one or more SAOC transport channels 1628, SAOC side information 1630 and compressed object metadata information 1632 on the basis of the encoded representation 1610. The audio decoder 1600 also comprises an object renderer 1640, which is configured to provide one or more rendered object signals 1642 on the basis of the object signals 1626 and object metadata information 1644, wherein the object metadata information 1644 is provided by an object metadata decoder 1650 on the basis of the compressed object metadata information 1632. The audio decoder 1600 also comprises, optionally, a SAOC decoder 1660, which is configured to receive the SAOC transport channels 1628 and the SAOC side information 1630, and to provide, on the basis thereof, one or more rendered object signals 1662. The audio decoder 1600 also comprises a mixer 1670, which is configured to receive the channel signals 1622, the pre-rendered object signals 1624, the rendered object signals 1642, and the rendered object signals 1662, and to provide, on the basis thereof, a plurality of mixed channel signals 1672 which may, for example, constitute the multi-channel loudspeaker signals 1612. The audio decoder 1600 may, for example, also comprise a binaural renderer 1680, which is configured to receive the mixed channel signals 1672 and to provide, on the basis thereof, the headphone signals 1614. Moreover, the audio decoder 1600 may comprise a format conversion 1690, which is configured to receive the mixed channel signals 1672 and
reproduction layout information 1692 and to provide, on the basis thereof, a loudspeaker signal 1616 for an alternative loudspeaker setup.

In the following, some details regarding the components of the audio encoder 1500 and of the audio decoder 1600 will be described.
Pre-renderer/mixer
The pre-renderer/mixer 1510 can optionally be used to convert a channel-plus-object input scene into a channel scene before encoding. Functionally, it may, for example, be identical to the object renderer/mixer described below. Pre-rendering of objects may, for example, ensure a deterministic signal entropy at the encoder input that is basically independent of the number of simultaneously active object signals. In the pre-rendering of objects, no object metadata transmission is required. Discrete object signals are rendered to the channel layout that the encoder is configured to use. The weights of the objects for each channel are obtained from the associated object metadata (OAM) 1552.
USAC core codec
The core codec 1530, 1620 for loudspeaker-channel signals, discrete object signals, object downmix signals and pre-rendered signals is based on MPEG-D USAC technology. It handles the coding of the multitude of signals by creating channel and object mapping information based on the geometric and semantic information of the input's channel and object assignment. This mapping information describes how input channels and objects are mapped to USAC channel elements (CPEs, SCEs, LFEs) and the corresponding information is transmitted to the decoder. All additional payloads like SAOC data or object metadata have been passed through extension elements and have been considered in the encoder's rate control.
The coding of objects is possible in different ways, depending on the rate/distortion
requirements and the interactivity requirements for the renderer. The following object
coding variants are possible:
1. Pre-rendered objects: object signals are pre-rendered and mixed to the 22.2
channel signals before encoding. The subsequent coding chain sees 22.2 channel
signals.
2. Discrete object waveforms: objects are supplied as monophonic waveforms to the encoder. The encoder uses single channel elements (SCEs) to transfer the objects in addition to the channel signals. The decoded objects are rendered and mixed at the receiver side. Compressed object metadata information is transmitted to the receiver/renderer alongside.
3. Parametric object waveforms: object properties and their relation to each other are described by means of SAOC parameters. The downmix of the object signals is coded with USAC. The parametric information is transmitted alongside. The number of downmix channels is chosen depending on the number of objects and the overall data rate. Compressed object metadata information is transmitted to the SAOC renderer.
SAOC
The SAOC encoder 1540 and the SAOC decoder 1660 for object signals are based on MPEG SAOC technology. The system is capable of recreating, modifying and rendering a number of audio objects based on a smaller number of transmitted channels and additional parametric data (object level differences OLDs, inter-object correlations IOCs, downmix gains DMGs). The additional parametric data exhibits a significantly lower data rate than required for transmitting all objects individually, making the coding very efficient. The SAOC encoder takes as input the object/channel signals as monophonic waveforms and outputs the parametric information (which is packed into the 3D-audio bit stream 1532, 1610) and the SAOC transport channels (which are encoded using single channel elements and transmitted).
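The flavor of these parameters can be sketched in a strongly simplified, broadband form: object level differences as object powers normalized to the strongest object, and a mono downmix built with per-object gains. Real SAOC computes such parameters per time/frequency tile and also uses inter-object correlations; the functions below are illustrative assumptions only:

```python
def object_powers(objects):
    """Broadband power of each object signal."""
    return [sum(x * x for x in obj) for obj in objects]

def compute_olds(objects):
    """Toy object level differences: powers normalized to the strongest object."""
    powers = object_powers(objects)
    ref = max(powers) or 1.0  # avoid division by zero for all-silent input
    return [p / ref for p in powers]

def downmix(objects, gains):
    """Mono downmix of all objects with per-object downmix gains (DMG-like)."""
    n = len(objects[0])
    return [sum(g * obj[i] for g, obj in zip(gains, objects)) for i in range(n)]
```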
The SAOC decoder 1660 reconstructs the object/channel signals from the decoded SAOC transport channels 1628 and the parametric information 1630, and generates the output audio scene based on the reproduction layout, the decompressed object metadata information and, optionally, on the user interaction information.
Object Metadata Codec
For each object, the associated metadata that specifies the geometrical position and
volume of the object in 3D space is efficiently coded by quantization of the object
properties in time and space. The compressed object metadata cOAM 1554, 1632 is
transmitted to the receiver as side information.
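The quantization of object properties can be sketched with a plain uniform quantizer; the step sizes and the frame layout below are illustrative assumptions, not the grid defined for cOAM.

```python
def quantize_oam(value, step):
    # uniform quantization of one object property (e.g. azimuth in degrees)
    return round(value / step)

def dequantize_oam(index, step):
    # reconstruct the property from its quantization index
    return index * step

# hypothetical metadata frame: (azimuth [deg], elevation [deg], gain)
frame = (31.7, -12.2, 0.93)
steps = (1.5, 3.0, 0.05)  # illustrative step sizes, not the standardized ones
coded = [quantize_oam(v, s) for v, s in zip(frame, steps)]
decoded = [dequantize_oam(i, s) for i, s in zip(coded, steps)]
```

Only the small integer indices need to be transmitted; the reconstruction error is bounded by half a step per property.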
Object Renderer/Mixer
The object renderer utilizes the compressed object metadata to generate object
waveforms according to the given reproduction format. Each object is rendered to certain
output channels according to its metadata. The output of this block results from the sum of
the partial results. If both channel-based content as well as discrete/parametric objects are
decoded, the channel-based waveforms and the rendered object waveforms are mixed
before outputting the resulting waveforms (or before feeding them to a post-processor
module like the binaural renderer or the loudspeaker renderer module).
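The "sum of partial results" can be sketched as follows; the list-based signal format and the per-object panning gains are illustrative assumptions, not the standardized rendering algorithm.

```python
def render_objects(objects, panning, channel_beds):
    """Render each mono object to the output channels according to its
    metadata-derived panning gains and mix the result with the
    channel-based waveforms (illustrative sketch)."""
    n = len(channel_beds[0])
    out = [list(bed) for bed in channel_beds]  # start from channel content
    for obj, gains in zip(objects, panning):
        for ch, g in enumerate(gains):
            for i in range(n):
                out[ch][i] += g * obj[i]       # partial result of this object
    return out
```

A single object panned halfway into the first of two silent channels illustrates the mixing step.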
Binaural Renderer
The binaural renderer module 1680 produces a binaural downmix of the multichannel
audio material, such that each input channel is represented by a virtual sound source. The
processing is conducted frame-wise in the QMF domain. The binauralization is based on
measured binaural room impulse responses.
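The virtual-source idea can be sketched as a time-domain convolution of each channel with a measured impulse-response pair. The actual module works frame-wise in the QMF domain; the naive convolution below is an illustrative simplification.

```python
def binauralize(channels, brirs):
    """Each input channel is treated as a virtual source: convolve it with
    its (left, right) binaural room impulse response and sum all
    contributions into a two-channel binaural downmix (sketch)."""
    out_len = len(channels[0]) + max(len(h) for pair in brirs for h in pair) - 1
    out = [[0.0] * out_len, [0.0] * out_len]
    for sig, (h_left, h_right) in zip(channels, brirs):
        for ear, h in ((0, h_left), (1, h_right)):
            for i, s in enumerate(sig):
                for j, hv in enumerate(h):
                    out[ear][i + j] += s * hv  # direct-form convolution
    return out
```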
Loudspeaker Renderer/Format Conversion
The loudspeaker renderer 1690 converts between the transmitted channel configuration
and the desired reproduction format. It is thus called "format converter" in the following.
The format converter performs conversions to lower numbers of output channels, i.e., it
creates downmixes. The system automatically generates optimized downmix matrices for
the given combination of input and output formats and applies these matrices in a downmix
process. The format converter allows for standard loudspeaker configurations as well as
for random configurations with non-standard loudspeaker positions.
Fig. 17 shows a block schematic diagram of the format converter. As can be seen, the
format converter 1700 receives mixer output signals 1710, for example the mixed channel
signals 1672, and provides loudspeaker signals 1712, for example the speaker signals
1616. The format converter comprises a downmix process 1720 in the QMF domain and a
downmix configurator 1730, wherein the downmix configurator provides configuration
information for the downmix process 1720 on the basis of a mixer output layout
information 1732 and a reproduction layout information 1734.
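The downmix process amounts to applying a gain matrix per QMF band. The following time-domain sketch with an illustrative 3-to-2 matrix (not one of the automatically generated, optimized matrices) shows the matrixing itself.

```python
def format_convert(channels, matrix):
    """Apply an (n_out x n_in) downmix matrix to a list of channel
    waveforms; the real converter applies such matrices per QMF band."""
    n = len(channels[0])
    return [[sum(matrix[o][i] * channels[i][t] for i in range(len(channels)))
             for t in range(n)]
            for o in range(len(matrix))]

# illustrative L/R/C -> stereo downmix with an assumed 0.5 centre gain
stereo = format_convert([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                        [[1.0, 0.0, 0.5],
                         [0.0, 1.0, 0.5]])
```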
Moreover, it should be noted that the concepts described above, for example the audio
encoder 100, the audio decoder 200 or 300, the audio encoder 400, the audio decoder
500 or 600, the methods 700, 800, 900 or 1000, the audio encoder 1100 or 1200 and the
audio decoder 1300, can be used within the audio encoder 1500 and/or within the audio
decoder 1600. For example, the audio encoders/decoders mentioned before can be used
for encoding or decoding of channel signals which are associated with different spatial
positions.
13. Alternative embodiments
In the following, some additional embodiments will be described.
Taking reference now to Figs. 18 to 21, additional embodiments according to the invention
will be explained.
It should be noted that a so-called "Quad Channel Element" (QCE) can be considered as
a tool of an audio decoder, which can be used, for example, for decoding 3-dimensional
audio content.
In other words, the Quad Channel Element (QCE) is a method for joint coding of four
channels for more efficient coding of horizontally and vertically distributed channels. A
QCE consists of two consecutive CPEs and is formed by hierarchically combining the
Joint Stereo Tool with the possibility of the Complex Stereo Prediction Tool in the horizontal
direction and the MPEG-Surround-based stereo tool in the vertical direction. This is achieved by
enabling both stereo tools and swapping output channels between applying the tools.
Stereo SBR is performed in the horizontal direction to preserve the left-right relations of high
frequencies.
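The horizontal joint-stereo idea can be illustrated with a real-valued simplification of complex stereo prediction. The actual tool of ISO/IEC 23003-3 operates in the MDCT domain with a complex-valued prediction coefficient; the fixed real alpha below is an assumption for illustration only.

```python
def complex_prediction_encode(left, right, alpha=0.5):
    """Real-valued sketch of prediction-based joint stereo: transmit a
    downmix and a prediction residual instead of the two channels."""
    dmx = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    res = [s - alpha * d for s, d in zip(side, dmx)]  # predict side from dmx
    return dmx, res

def complex_prediction_decode(dmx, res, alpha=0.5):
    """Invert the prediction and the mid/side mapping."""
    side = [r + alpha * d for r, d in zip(res, dmx)]
    left = [d + s for d, s in zip(dmx, side)]
    right = [d - s for d, s in zip(dmx, side)]
    return left, right
```

With a well-chosen coefficient the residual carries much less energy than the side signal, which is what makes the joint coding efficient.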
Fig. 18 shows a topological structure of a QCE. It should be noted that the QCE of Fig. 18
is very similar to the QCE of Fig. 11, such that reference is made to the above
explanations. However, it should be noted that, in the QCE of Fig. 18, it is not necessary
to make use of the psychoacoustic model when performing complex stereo prediction
(while such use naturally remains possible as an option). Moreover, it can be seen that a first stereo
spectral bandwidth replication (Stereo SBR) is performed on the basis of the left lower
channel and the right lower channel, and that a second stereo spectral bandwidth
replication (Stereo SBR) is performed on the basis of the left upper channel and the right
upper channel.
In the following, some terms and definitions will be provided, which may apply in some
embodiments.
A data element qceIndex indicates a QCE mode of a CPE. Regarding the meaning of the
bitstream variable qceIndex, reference is made to Fig. 14b. It should be noted that
qceIndex describes whether two subsequent elements of type
UsacChannelPairElement() are treated as a Quadruple Channel Element (QCE). The
different QCE modes are given in Fig. 14b. The qceIndex shall be the same for the two
subsequent elements forming one QCE.
In the following, some help elements will be defined, which may be used in some
embodiments according to the invention:
cplx_out_dmx_L[]   first channel of first CPE after complex prediction stereo decoding
cplx_out_dmx_R[]   second channel of first CPE after complex prediction stereo decoding
cplx_out_res_L[]   first channel of second CPE after complex prediction stereo decoding
                   (zero if qceIndex == 1)
cplx_out_res_R[]   second channel of second CPE after complex prediction stereo
                   decoding (zero if qceIndex == 1)
mps_out_L_1[]      first output channel of first MPS box
mps_out_L_2[]      second output channel of first MPS box
mps_out_R_1[]      first output channel of second MPS box
mps_out_R_2[]      second output channel of second MPS box
sbr_out_L_1[]      first output channel of first Stereo SBR box
sbr_out_R_1[]      second output channel of first Stereo SBR box
sbr_out_L_2[]      first output channel of second Stereo SBR box
sbr_out_R_2[]      second output channel of second Stereo SBR box
In the following, a decoding process, which is performed in an embodiment according to
the invention, will be explained.
The syntax element (or bitstream element, or data element) qceIndex in
UsacChannelPairElementConfig() indicates whether a CPE belongs to a QCE and if
residual coding is used. In case that qceIndex is unequal to 0, the current CPE forms a QCE
together with its subsequent element, which shall be a CPE having the same qceIndex.
Stereo SBR is always used for the QCE, thus the syntax item stereoConfigIndex shall be
3 and bsStereoSbr shall be 1.
In case of qceIndex == 1, only the payloads for MPEG Surround and SBR, and no relevant
audio signal data, are contained in the second CPE, and the syntax element
bsResidualCoding is set to 0.
The presence of a residual signal in the second CPE is indicated by qceIndex == 2. In this
case the syntax element bsResidualCoding is set to 1.
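The signaling rules above can be summarized in a small helper; this is a hypothetical sketch in which only the constants mirror the text, while the dictionary layout is an assumption for illustration.

```python
def parse_qce_pair(qce_index):
    """Interpret qceIndex for a CPE pair, following the rules stated above."""
    # qceIndex == 0: two independent CPEs, no QCE
    if qce_index == 0:
        return {"qce": False}
    # qceIndex != 0: the CPE pair forms a QCE; Stereo SBR is always used,
    # so stereoConfigIndex shall be 3 and bsStereoSbr shall be 1
    return {"qce": True,
            "stereoConfigIndex": 3,
            "bsStereoSbr": 1,
            # a residual signal in the second CPE only for qceIndex == 2
            "bsResidualCoding": 1 if qce_index == 2 else 0}
```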
However, some different and possibly simplified signaling schemes may also be used.
Decoding of Joint Stereo with the possibility of Complex Stereo Prediction is performed as
described in ISO/IEC 23003-3, subclause 7.7. The resulting output of the first CPE are the
MPS downmix signals cplx_out_dmx_L[] and cplx_out_dmx_R[]. If residual coding is used
(i.e., qceIndex == 2), the output of the second CPE are the MPS residual signals
cplx_out_res_L[] and cplx_out_res_R[]; if no residual signal has been transmitted (i.e.,
qceIndex == 1), zero signals are inserted.
Before applying MPEG Surround decoding, the second channel of the first element
(cplx_out_dmx_R[]) and the first channel of the second element (cplx_out_res_L[]) are
swapped.
Decoding of MPEG Surround is performed as described in ISO/IEC 23003-3, subclause
7.11. If residual coding is used, the decoding may, however, be modified when compared
to conventional MPEG Surround decoding in some embodiments. Decoding of MPEG
Surround without residual using SBR, as defined in ISO/IEC 23003-3, subclause 7.11.2.7
(figure 23), is modified so that Stereo SBR is also used for bsResidualCoding == 1,
resulting in the decoder schematics shown in Fig. 19. Fig. 19 shows a block schematic
diagram of an audio decoder for bsResidualCoding == 0 and bsStereoSbr == 1.
As can be seen in Fig. 19, a USAC core decoder 2010 provides a downmix signal (DMX)
2012 to an MPS (MPEG Surround) decoder 2020, which provides a first decoded audio
signal 2022 and a second decoded audio signal 2024. A Stereo SBR decoder 2030
receives the first decoded audio signal 2022 and the second decoded audio signal 2024
and provides, on the basis thereof, a left bandwidth-extended audio signal 2032 and a right
bandwidth-extended audio signal 2034.
Before applying Stereo SBR, the second channel of the first element (mps_out_L_2[]) and
the first channel of the second element (mps_out_R_1[]) are swapped to allow left-right
Stereo SBR. After application of Stereo SBR, the second output channel of the first
element (sbr_out_R_1[]) and the first channel of the second element (sbr_out_L_2[]) are
swapped again to restore the input channel order.
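The swap operations described above can be summarized as a routing sketch. Here mps_box and sbr_box are hypothetical stand-ins for the actual MPS 2-1-2 and Stereo SBR decoding, and the swaps appear as argument routing rather than explicit exchanges.

```python
def qce_decode(cplx_out, mps_box, sbr_box):
    """Signal-flow sketch of the QCE decoder swaps.
    cplx_out = (dmx_L, dmx_R, res_L, res_R) after complex prediction;
    mps_box(dmx, res) -> (out_1, out_2); sbr_box(l, r) -> (l', r')."""
    dmx_l, dmx_r, res_l, res_r = cplx_out
    # swap 1: exchange cplx_out_dmx_R and cplx_out_res_L between the elements,
    # so each MPS box receives the downmix and residual of one side
    mps_out_l1, mps_out_l2 = mps_box(dmx_l, res_l)   # first MPS box
    mps_out_r1, mps_out_r2 = mps_box(dmx_r, res_r)   # second MPS box
    # swap 2: exchange mps_out_L_2 and mps_out_R_1, so each Stereo SBR box
    # receives a horizontal left/right pair
    sbr_out_l1, sbr_out_r1 = sbr_box(mps_out_l1, mps_out_r1)
    sbr_out_l2, sbr_out_r2 = sbr_box(mps_out_l2, mps_out_r2)
    # swap 3: exchange sbr_out_R_1 and sbr_out_L_2 to restore input order
    return sbr_out_l1, sbr_out_l2, sbr_out_r1, sbr_out_r2
```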
A QCE decoder structure is illustrated in Fig. 20, which shows a QCE decoder schematic.
It should be noted that the block schematic diagram of Fig. 20 is very similar to the block
schematic diagram of Fig. 13, such that reference is also made to the above explanations.
Moreover, it should be noted that some signal labeling has been added in Fig. 20, wherein
reference is made to the definitions in this section. Moreover, a final resorting of the
channels is shown, which is performed after the Stereo SBR.
Fig. 21 shows a block schematic diagram of a Quad Channel Encoder 2200, according to
an embodiment of the present invention. In other words, a Quad Channel Encoder (Quad
Channel Element), which may be considered as a core encoder tool, is illustrated in Fig.
21.
The Quad Channel Encoder 2200 comprises a first Stereo SBR 2210, which receives a
first left channel input signal 2212 and a first right channel input signal 2214, and which
provides, on the basis thereof, a first SBR payload 2215, a first left channel SBR output
signal 2216 and a first right channel SBR output signal 2218. Moreover, the Quad
Channel Encoder 2200 comprises a second Stereo SBR, which receives a second left
channel input signal 2222 and a second right channel input signal 2224, and which
provides, on the basis thereof, a second SBR payload 2225, a second left channel SBR
output signal 2226 and a second right channel SBR output signal 2228.
The Quad Channel Encoder 2200 comprises a first MPEG-Surround-type (MPS 2-1-2 or
Unified Stereo) multi-channel encoder 2230, which receives the first left channel SBR
output signal 2216 and the second left channel SBR output signal 2226, and which
provides, on the basis thereof, a first MPS payload 2232, a left channel MPEG Surround
downmix signal 2234 and, optionally, a left channel MPEG Surround residual signal 2236.
The Quad Channel Encoder 2200 also comprises a second MPEG-Surround-type (MPS
2-1-2 or Unified Stereo) multi-channel encoder 2240, which receives the first right channel
SBR output signal 2218 and the second right channel SBR output signal 2228, and which
provides, on the basis thereof, a second MPS payload 2242, a right channel MPEG Surround
downmix signal 2244 and, optionally, a right channel MPEG Surround residual signal
2246.
The Quad Channel Encoder 2200 comprises a first complex prediction stereo encoding
2250, which receives the left channel MPEG Surround downmix signal 2234 and the right
channel MPEG Surround downmix signal 2244, and which provides, on the basis thereof,
a complex prediction payload 2252 and a jointly encoded representation 2254 of the left
channel MPEG Surround downmix signal 2234 and the right channel MPEG Surround
downmix signal 2244. The Quad Channel Encoder 2200 comprises a second complex
prediction stereo encoding 2260, which receives the left channel MPEG Surround residual
signal 2236 and the right channel MPEG Surround residual signal 2246, and which
provides, on the basis thereof, a complex prediction payload 2262 and a jointly encoded
representation 2264 of the left channel MPEG Surround residual signal 2236 and the
right channel MPEG Surround residual signal 2246.
The Quad Channel Encoder also comprises a first bitstream encoding 2270, which
receives the jointly encoded representation 2254, the complex prediction payload 2252,
the MPS payload 2232 and the SBR payload 2215, and provides, on the basis thereof, a
bitstream portion representing a first channel pair element. The Quad Channel Encoder
also comprises a second bitstream encoding 2280, which receives the jointly encoded
representation 2264, the complex prediction payload 2262, the MPS payload 2242 and
the SBR payload 2225, and provides, on the basis thereof, a bitstream portion
representing a second channel pair element.
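The hierarchy of Fig. 21 can be condensed into a sketch (SBR omitted). A plain sum/difference mapping stands in for both the MPS 2-1-2 boxes and the complex prediction encodings; this substitution is an illustrative assumption, not the standardized processing.

```python
def sum_diff_encode(ch_a, ch_b):
    """Sum/difference stand-in for a joint stereo box (the real tools use
    parametric upmix with optional residual, or complex prediction)."""
    dmx = [(a + b) / 2.0 for a, b in zip(ch_a, ch_b)]
    res = [(a - b) / 2.0 for a, b in zip(ch_a, ch_b)]
    return dmx, res

def quad_channel_encode(l_low, r_low, l_up, r_up):
    """Hierarchical four-channel sketch: vertical pairs first (MPS-style),
    then the two downmixes and the two residuals each jointly coded
    horizontally (complex-prediction-style)."""
    dmx_l, res_l = sum_diff_encode(l_low, l_up)   # left vertical pair
    dmx_r, res_r = sum_diff_encode(r_low, r_up)   # right vertical pair
    joint_dmx = sum_diff_encode(dmx_l, dmx_r)     # horizontal downmix coding
    joint_res = sum_diff_encode(res_l, res_r)     # horizontal residual coding
    return joint_dmx, joint_res
```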
14. Implementation Alternatives
Although some aspects have been described in the context of an apparatus, it is clear that
these aspects also represent a description of the corresponding method, where a block or
device corresponds to a method step or a feature of a method step. Analogously, aspects
described in the context of a method step also represent a description of a corresponding
block or item or feature of a corresponding apparatus. Some or all of the method steps
may be executed by (or using) a hardware apparatus, like, for example, a microprocessor,
a programmable computer or an electronic circuit. In some embodiments, one or
more of the most important method steps may be executed by such an apparatus.
The inventive encoded audio signal can be stored on a digital storage medium or can be
transmitted on a transmission medium such as a wireless transmission medium or a wired
transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be
implemented in hardware or in software. The implementation can be performed using a
digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a
PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable
control signals stored thereon, which cooperate (or are capable of cooperating) with a
programmable computer system such that the respective method is performed. Therefore,
the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having
electronically readable control signals, which are capable of cooperating with a
programmable computer system, such that one of the methods described herein is
performed.
Generally, embodiments of the present invention can be implemented as a computer
program product with a program code, the program code being operative for performing
one of the methods when the computer program product runs on a computer. The
program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods
described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program
having a program code for performing one of the methods described herein, when the
computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital
storage medium, or a computer-readable medium) comprising, recorded thereon, the
computer program for performing one of the methods described herein. The data carrier,
the digital storage medium or the recorded medium are typically tangible and/or
non-transitionary.
A further embodiment of the inventive method is, therefore, a data stream or a sequence
of signals representing the computer program for performing one of the methods
described herein. The data stream or the sequence of signals may, for example, be
configured to be transferred via a data communication connection, for example via the
Internet.
A further embodiment comprises a processing means, for example a computer, or a
programmable logic device, configured to or adapted to perform one of the methods
described herein.
A further embodiment comprises a computer having installed thereon the computer
program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system
configured to transfer (for example, electronically or optically) a computer program for
performing one of the methods described herein to a receiver. The receiver may, for
example, be a computer, a mobile device, a memory device or the like. The apparatus or
system may, for example, comprise a file server for transferring the computer program to
the receiver.
In some embodiments, a programmable logic device (for example a field programmable
gate array) may be used to perform some or all of the functionalities of the methods
described herein. In some embodiments, a field programmable gate array may cooperate
with a microprocessor in order to perform one of the methods described herein. Generally,
the methods are preferably performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of the present
invention. It is understood that modifications and variations of the arrangements and the
details described herein will be apparent to others skilled in the art. It is the intent,
therefore, to be limited only by the scope of the impending patent claims and not by the
specific details presented by way of description and explanation of the embodiments
herein.
15. Conclusions
In the following, some conclusions will be provided.
The embodiments according to the invention are based on the consideration that, to
account for signal dependencies between vertically and horizontally distributed channels,
four channels can be jointly coded by hierarchically combining joint stereo coding tools.
For example, vertical channel pairs are combined using MPS 2-1-2 and/or unified stereo
with band-limited or full-band residual coding. In order to satisfy perceptual requirements
for binaural unmasking, the output downmixes are, for example, jointly coded by use of
complex prediction in the MDCT domain, which includes the possibility of left-right and
mid-side coding. If residual signals are present, they are horizontally combined using the
same method.
Moreover, it should be noted that embodiments according to the invention overcome
some or all of the disadvantages of the prior art. Embodiments according to the invention
are adapted to the 3D audio context, wherein the loudspeaker channels are distributed in
several height layers, resulting in horizontal and vertical channel pairs. It has been found
that the joint coding of only two channels, as defined in USAC, is not sufficient to consider
the spatial and perceptual relations between channels. However, this problem is overcome
by embodiments according to the invention.
Moreover, conventional MPEG Surround is applied in an additional pre-/post-processing
step, such that residual signals are transmitted individually without the possibility of joint
stereo coding, e.g., to exploit dependencies between left and right vertical residual
signals. In contrast, embodiments according to the invention allow for an efficient
encoding/decoding by making use of such dependencies.
To further conclude, embodiments according to the invention create an apparatus, a
method or a computer program for encoding and decoding as described herein.
References:
[1] ISO/IEC 23003-3:2012 - Information Technology - MPEG Audio Technologies, Part 3:
Unified Speech and Audio Coding;
[2] ISO/IEC 23003-1:2007 - Information Technology - MPEG Audio Technologies, Part 1:
MPEG Surround
1. An audio decoder (200; 300; 600; 1300; 1600; 2000) for providing at least four
audio channel signals (220, 222, 224, 226; 320, 322, 324, 326; 620, 622, 624, 626;
1320, 1322, 1324, 1326) on the basis of an encoded representation (210; 310,
360; 610, 682; 1310, 1312; 1610),
wherein the audio decoder is configured to provide a first residual signal (232; 332;
684; 1362) and a second residual signal (234; 334; 686; 1364) on the basis of a
jointly encoded representation (210; 310; 682; 1312) of the first residual signal and
of the second residual signal using a prediction-based multi-channel decoding (230;
330; 680; 1360);
wherein the audio decoder is configured to provide a first audio channel signal
(220; 320; 642; 1372) and a second audio channel signal (222; 322; 644; 1374) on
the basis of a first downmix signal (212; 312; 632; 1342) and the first residual
signal using a residual-signal-assisted multi-channel decoding (240; 340; 640;
1370); and
wherein the audio decoder is configured to provide a third audio channel signal
(224; 324; 656; 1382) and a fourth audio channel signal (226; 326; 658; 1384) on
the basis of a second downmix signal (214; 314; 634; 1344) and the second
residual signal using a residual-signal-assisted multi-channel decoding (250; 350;
650; 1380).
2. A method (800) for providing at least four audio channel signals on the basis of an
encoded representation, the method comprising:
providing (810) a first residual signal and a second residual signal on the basis of a
jointly encoded representation of the first residual signal and the second residual
signal using a prediction-based multi-channel decoding;

providing (820) a first audio channel signal and a second audio channel signal on
the basis of a first downmix signal and the first residual signal using a
residual-signal-assisted multi-channel decoding; and
providing (830) a third audio channel signal and a fourth audio channel signal on
the basis of a second downmix signal and the second residual signal using a
residual-signal-assisted multi-channel decoding.
3. A computer program for performing the method according to claim 2 when the
computer program runs on a computer.
Audio Encoder, Audio Decoder, Methods and Computer Program Using Jointly
Encoded Residual Signals
