
Frame Element Length Transmission In Audio Coding

Abstract: Frame elements which shall be made available for skipping are transmitted more efficiently by arranging that default payload length information is transmitted separately within a configuration block, with the length information within the frame elements, in turn, being subdivided into a default payload length flag followed, if the default payload length flag is not set, by a payload length value explicitly coding the payload length of the respective frame element. If the default payload length flag is set, however, an explicit transmission of the payload length may be avoided: any frame element, the default extension payload length flag of which is set, has the default payload length, and any frame element, the default extension payload length flag of which is not set, has a payload length corresponding to the payload length value. By this measure, transmission effectiveness is increased.


Patent Information

Application #:
Filing Date: 23 September 2013
Publication Number: 01/2014
Publication Type: INA
Invention Field: COMMUNICATION
Status:
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2020-04-20
Renewal Date:

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
Hansastraße 27c, 80686 München, GERMANY
DOLBY INTERNATIONAL AB
Apollo Building, 3E Herikerbergweg 1-35, 1101 CN Amsterdam Zuid-Oost, NETHERLANDS
KONINKLIJKE PHILIPS N.V.
High Tech Campus 5, NL 5656 AE Eindhoven, NETHERLANDS

Inventors

1. NEUENDORF, Max
Theatergasse 17, 90402 Nürnberg, GERMANY
2. MULTRUS, Markus
Etzlaubweg 7, 90469 Nürnberg, GERMANY
3. DÖHLA, Stefan
Hartmannstr. 47a, 91052 Erlangen, GERMANY
4. PURNHAGEN, Heiko
Gjuteribacken 17, 17265 Sundbyberg, SWEDEN
5. DE BONT, Frans
De Hasselt 7, NL-5561 CC Riethoven, NETHERLANDS

Specification

Frame Element Length Transmission in Audio Coding
The present invention relates to audio coding, such as the so-called USAC codec (USAC =
Unified Speech and Audio Coding) and, in particular, to frame element length
transmission.
In recent years, several audio codecs have been made available, each audio codec being
specifically designed to fit to a dedicated application. Mostly, these audio codecs are able
to code more than one audio channel or audio signal in parallel. Some audio codecs are
even suitable for differently coding audio content by differently grouping audio channels
or audio objects of the audio content and subjecting these groups to different audio coding
principles. Even further, some of these audio codecs allow for the insertion of extension
data into the bitstream so as to accommodate for future extensions/developments of the
audio codec.
One example of such audio codecs is the USAC codec as defined in ISO/IEC CD 23003-3.
This standard, named "Information Technology - MPEG Audio Technologies - Part 3:
Unified Speech and Audio Coding", describes in detail the functional blocks of a reference
model of a call for proposals on unified speech and audio coding.
Figs. 5a and 5b illustrate encoder and decoder block diagrams. In the following, the
general functionality of the individual blocks is briefly explained. Thereupon, the problems
in putting all of the resulting syntax portions together into a bitstream are explained with
respect to Fig. 6.
The block diagrams of the
USAC encoder and decoder reflect the structure of MPEG-D USAC coding. The general
structure can be described like this: First there is a common pre/post-processing consisting
of an MPEG Surround (MPEGS) functional unit to handle stereo or multi-channel
processing and an enhanced SBR (eSBR) unit which handles the parametric representation
of the higher audio frequencies in the input signal. Then there are two branches, one
consisting of a modified Advanced Audio Coding (AAC) tool path and the other consisting
of a linear prediction coding (LP or LPC domain) based path, which in turn features either
a frequency domain representation or a time domain representation of the LPC residual.
All transmitted spectra for both AAC and LPC are represented in the MDCT domain
following quantization and arithmetic coding. The time domain representation uses an
ACELP excitation coding scheme.
The basic structure of the MPEG-D USAC is shown in Figure 5a and Figure 5b. The data
flow in this diagram is from left to right, top to bottom. The functions of the decoder are to
find the description of the quantized audio spectra or time domain representation in the
bitstream payload and decode the quantized values and other reconstruction information.
In case of transmitted spectral information the decoder shall reconstruct the quantized
spectra, process the reconstructed spectra through whatever tools are active in the bitstream
payload in order to arrive at the actual signal spectra as described by the input bitstream
payload, and finally convert the frequency domain spectra to the time domain. Following
the initial reconstruction and scaling of the spectrum, there are optional
tools that modify one or more of the spectra in order to provide more efficient coding.
In case of transmitted time domain signal representation, the decoder shall reconstruct the
quantized time signal, process the reconstructed time signal through whatever tools are
active in the bitstream payload in order to arrive at the actual time domain signal as
described by the input bitstream payload.
For each of the optional tools that operate on the signal data, the option to "pass through" is
retained, and in all cases where the processing is omitted, the spectra or time samples at its
input are passed directly through the tool without modification.
In places where the bitstream changes its signal representation from time domain to
frequency domain representation or from LP domain to non-LP domain or vice versa, the
decoder shall facilitate the transition from one domain to the other by means of an
appropriate transition overlap-add windowing.
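As a rough illustration of such transition handling, a generic windowed cross-fade between the two coders' outputs might look as follows. This is a hedged sketch only: the power-complementary sine/cosine windows used here are an assumption for illustration, not the standard's exact transition windows.

```c
#include <math.h>

/* Illustrative sketch only: cross-fade the tail of the previous coder's
 * output into the head of the next coder's output using
 * power-complementary sine/cosine windows (an assumption; the actual
 * transition windows of the codec differ in detail). */
void transition_overlap_add(const float *prev_tail, const float *next_head,
                            float *out, int overlap)
{
    for (int n = 0; n < overlap; n++) {
        float a = (float)M_PI * (n + 0.5f) / (2.0f * (float)overlap);
        out[n] = prev_tail[n] * cosf(a)     /* fade out previous domain */
               + next_head[n] * sinf(a);    /* fade in next domain      */
    }
}
```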
eSBR and MPEGS processing is applied in the same manner to both coding paths after
transition handling.
The input to the bitstream payload demultiplexer tool is the MPEG-D USAC bitstream
payload. The demultiplexer separates the bitstream payload into the parts for each tool, and
provides each of the tools with the bitstream payload information related to that tool.
The outputs from the bitstream payload demultiplexer tool are:
• Depending on the core coding type in the current frame, either:
o the quantized and noiselessly coded spectra, represented by
- scale factor information
- arithmetically coded spectral lines
o or: linear prediction (LP) parameters together with an excitation signal represented
by either:
- quantized and arithmetically coded spectral lines (transform coded excitation, TCX) or
- ACELP coded time domain excitation
• The spectral noise filling information (optional)
• The M/S decision information (optional)
• The temporal noise shaping (TNS) information (optional)
• The filterbank control information
• The time unwarping (TW) control information (optional)
• The enhanced spectral bandwidth replication (eSBR) control information (optional)
• The MPEG Surround (MPEGS) control information
The scale factor noiseless decoding tool takes information from the bitstream payload
demultiplexer, parses that information, and decodes the Huffman and DPCM coded scale
factors.
The input to the scale factor noiseless decoding tool is:
• The scale factor information for the noiselessly coded spectra
The output of the scale factor noiseless decoding tool is:
• The decoded integer representation of the scale factors
The spectral noiseless decoding tool takes information from the bitstream payload
demultiplexer, parses that information, decodes the arithmetically coded data, and
reconstructs the quantized spectra. The input to this noiseless decoding tool is:
• The noiselessly coded spectra
The output of this noiseless decoding tool is:
• The quantized values of the spectra
The inverse quantizer tool takes the quantized values for the spectra, and converts the
integer values to the non-scaled, reconstructed spectra. This quantizer is a companding
quantizer, whose companding factor depends on the chosen core coding mode.
The input to the Inverse Quantizer tool is:
• The quantized values for the spectra
The output of the inverse quantizer tool is:
• The un-scaled, inversely quantized spectra
The noise filling tool is used to fill spectral gaps in the decoded spectra, which occur when
spectral values are quantized to zero, e.g. due to a strong restriction on bit demand in the
encoder. The use of the noise filling tool is optional.
The inputs to the noise filling tool are:
• The un-scaled, inversely quantized spectra
• Noise filling parameters
• The decoded integer representation of the scale factors
The outputs of the noise filling tool are:
• The un-scaled, inversely quantized spectral values for spectral lines which were
previously quantized to zero.
• Modified integer representation of the scale factors
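A minimal sketch of the noise filling idea follows, under the assumption of a single transmitted noise level and uniformly random signs; the actual tool's parameterization is richer, so treat the names (noise_level, the rand()-based sign) as illustrative assumptions.

```c
#include <stdlib.h>

/* Hedged sketch of spectral noise filling: every spectral line quantized
 * to zero receives low-level pseudo-random noise scaled by a transmitted
 * noise level. The uniform-random sign model is an assumption. */
void noise_fill(float *spec, int num_lines, float noise_level)
{
    for (int i = 0; i < num_lines; i++) {
        if (spec[i] == 0.0f) {
            float sign = (rand() & 1) ? 1.0f : -1.0f;
            spec[i] = sign * noise_level;
        }
    }
}
```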
The rescaling tool converts the integer representation of the scale factors to the actual
values, and multiplies the un-scaled inversely quantized spectra by the relevant scale
factors.
The inputs to the scale factors tool are:
• The decoded integer representation of the scale factors
• The un-scaled, inversely quantized spectra
The output from the scale factors tool is:
• The scaled, inversely quantized spectra
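The inverse quantizer and rescaling steps can be pictured together. The sketch below assumes the AAC-style 4/3-power companding law and a band gain of 2^(0.25·(sf − SF_OFFSET)); SF_OFFSET = 100 is borrowed from AAC practice and, like the helper names, is an assumption rather than a constant taken from this specification.

```c
#include <math.h>
#include <stdlib.h>

#define SF_OFFSET 100   /* assumption borrowed from AAC practice */

/* Companding inverse quantizer: x = sign(q) * |q|^(4/3) */
static float inverse_quantize(int q)
{
    float mag = powf((float)abs(q), 4.0f / 3.0f);
    return (q < 0) ? -mag : mag;
}

/* Rescaling: apply the gain 2^(0.25 * (sf - SF_OFFSET)) derived from the
 * decoded integer scale factor of one scale factor band. */
void rescale_band(const int *quant, float *spec, int n, int sf)
{
    float gain = powf(2.0f, 0.25f * (sf - SF_OFFSET));
    for (int i = 0; i < n; i++)
        spec[i] = inverse_quantize(quant[i]) * gain;
}
```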
For an overview of the M/S tool, please refer to ISO/IEC 14496-3:2009, 4.1.1.2.
For an overview of the temporal noise shaping (TNS) tool, please refer to ISO/IEC
14496-3:2009, 4.1.1.2.
The filterbank / block switching tool applies the inverse of the frequency mapping that was
carried out in the encoder. An inverse modified discrete cosine transform (IMDCT) is used
for the filterbank tool. The IMDCT can be configured to support 120, 128, 240, 256, 480,
512, 960 or 1024 spectral coefficients.
The inputs to the filterbank tool are:
• The (inversely quantized) spectra
• The filterbank control information
The output(s) from the filterbank tool is (are):
• The time domain reconstructed audio signal(s).
The time-warped filterbank / block switching tool replaces the normal filterbank / block
switching tool when the time warping mode is enabled. The filterbank is the same
(IMDCT) as for the normal filterbank, additionally the windowed time domain samples are
mapped from the warped time domain to the linear time domain by time-varying
resampling.
The inputs to the time-warped filterbank tool are:
• The inversely quantized spectra
• The filterbank control information
• The time-warping control information
The output(s) from the filterbank tool is (are):
• The linear time domain reconstructed audio signal(s).
The enhanced SBR (eSBR) tool regenerates the highband of the audio signal. It is based on
replication of the sequences of harmonics, truncated during encoding. It adjusts the spectral
envelope of the generated highband and applies inverse filtering, and adds noise and
sinusoidal components in order to recreate the spectral characteristics of the original signal.
The input to the eSBR tool is:
• The quantized envelope data
• Misc. control data
• a time domain signal from the frequency domain core decoder or the ACELP/TCX
core decoder
The output of the eSBR tool is either:
• a time domain signal or
• a QMF-domain representation of a signal, e.g. if the MPEG Surround tool is used.
The MPEG Surround (MPEGS) tool produces multiple signals from one or more input
signals by applying a sophisticated upmix procedure to the input signal(s) controlled by
appropriate spatial parameters. In the USAC context MPEGS is used for coding a
multi-channel signal, by transmitting parametric side information alongside a transmitted
downmixed signal.
The input to the MPEGS tool is:
• a downmixed time domain signal or
• a QMF-domain representation of a downmixed signal from the eSBR tool
The output of the MPEGS tool is:
• a multi-channel time domain signal
The Signal Classifier tool analyses the original input signal and generates from it control
information which triggers the selection of the different coding modes. The analysis of the
input signal is implementation dependent and will try to choose the optimal core coding
mode for a given input signal frame. The output of the signal classifier can (optionally)
also be used to influence the behavior of other tools, for example MPEG Surround,
enhanced SBR, time-warped filterbank and others.
The inputs to the Signal Classifier tool are:
• the original unmodified input signal
• additional implementation dependent parameters
The output of the Signal Classifier tool is:
• a control signal to control the selection of the core codec (non-LP filtered
frequency domain coding, LP filtered frequency domain or LP filtered time domain
coding)
The ACELP tool provides a way to efficiently represent a time domain excitation signal by
combining a long term predictor (adaptive codeword) with a pulse-like sequence
(innovation codeword). The reconstructed excitation is sent through an LP synthesis filter
to form a time domain signal.
The inputs to the ACELP tool are:
• adaptive and innovation codebook indices
• adaptive and innovation code gain values
• other control data
• inversely quantized and interpolated LPC filter coefficients
The output of the ACELP tool is:
• The time domain reconstructed audio signal
The MDCT based TCX decoding tool is used to turn the weighted LP residual
representation from the MDCT domain back into a time domain signal and outputs a time
domain signal including weighted LP synthesis filtering. The IMDCT can be configured to
support 256, 512, or 1024 spectral coefficients.
The input to the TCX tool is:
• The (inversely quantized) MDCT spectra
• inversely quantized and interpolated LPC filter coefficients
The output of the TCX tool is:
• The time domain reconstructed audio signal
The technology disclosed in ISO/IEC CD 23003-3, which is incorporated herein by
reference, allows the definition of channel elements which are, for example, single channel
elements only containing payload for a single channel, or channel pair elements comprising
payload for two channels, or LFE (Low-Frequency Enhancement) channel elements
comprising payload for an LFE channel.
Naturally, the USAC codec is not the only codec which is able to code and transfer
information on a more complicated audio scene of more than one or two audio channels or
audio objects via one bitstream. Accordingly, the USAC codec merely serves as a concrete
example.
Fig. 6 shows a more general example of an encoder and decoder, respectively, both
depicted in one common scenery where the encoder encodes audio content 10 into a
bitstream 12, with the decoder decoding the audio content or at least a portion thereof,
from the bitstream 12. The result of the decoding, i.e. the reconstruction, is indicated at 14.
As illustrated in Fig. 6, the audio content 10 may be composed of a number of audio
signals 16. For example, the audio content 10 may be a spatial audio scene composed of a
number of audio channels 16. Alternatively, the audio content 10 may represent a
conglomeration of audio signals 16 with the audio signals 16 representing, individually
and/or in groups, individual audio objects which may be put together into an audio scene at
the discretion of a decoder's user so as to obtain the reconstruction 14 of the audio content
10 in the form of, for example, a spatial audio scene for a specific loudspeaker
configuration. The encoder encodes the audio content 10 in units of consecutive time
periods. Such a time period is exemplarily shown at 18 in Fig. 6. The encoder encodes the
consecutive periods 18 of the audio content 10 in the same manner: that is, the encoder
inserts into the bitstream 12 one frame 20 per time period 18. In doing so, the encoder
decomposes the audio content within the respective time period 18 into frame elements, the
number and the meaning/type of which is the same for each time period 18 and frame 20,
respectively. With respect to the USAC codec outlined above, for example, the encoder
encodes the same pair of audio signals 16 in every time period into a channel pair
element of the elements 22 of the frames 20, while using another coding principle, such as
single channel encoding for another audio signal 16 so as to obtain a single channel
element 22 and so forth. Parametric side information for obtaining an upmix of audio
signals out of a downmix audio signal as defined by one or more frame elements 22 is
collected to form another frame element within frame 20. In that case, the frame element
conveying this side information relates to, or forms a kind of extension data for, other
frame elements. Naturally, such extensions are not restricted to multi-channel or multi-object
side information.
One possibility is to indicate within each frame element 22 of what type the respective
frame element is. Advantageously, such a procedure allows for coping with future
extensions of the bitstream syntax. Decoders which are not able to deal with certain frame
element types would simply skip the respective frame elements within the bitstream by
exploiting respective length information within these frame elements. Moreover, it is
possible to allow for standard-conforming decoders of different types: some are able to
understand a first set of types, while others understand and can deal with another set of
types; alternative element types would simply be disregarded by the respective decoders.
Additionally, the encoder would be able to sort the frame elements at its discretion so that
decoders which are able to process such additional frame elements may be fed with the
frame elements within the frames 20 in an order which, for example, minimizes buffering
needs within the decoder. Disadvantageously, however, the bitstream would have to
convey frame element type information per frame element, the necessity of which, in turn,
negatively affects the compression rate of the bitstream 12 on the one hand and the
decoding complexity on the other hand as the parsing overhead for inspecting the
respective frame element type information occurs within each frame element.
Moreover, in order to allow frame elements to be skipped, the bitstream 12
has to convey the afore-mentioned length information concerning the frame elements
potentially to be skipped. This transmission in turn reduces the compression efficiency.
Naturally, it would be possible to otherwise fix the order among the frame elements 22,
such as per convention, but such a procedure prevents encoders from having the freedom
to rearrange frame elements due to, for example, specific properties of future extension
frame elements necessitating or suggesting, for example, a different order among the frame
elements.
Further, it would be favorable if the transmission of the length information could be
performed more effectively.
Accordingly, there is a need for another concept of a bitstream, encoder and decoder,
respectively.
Accordingly, it is the object of the present invention to provide a bitstream, an encoder and
a decoder which solve the just-outlined problem and allow for obtaining a more efficient
way of length information transmission.
This object is achieved by the subject matter of the pending independent claims.
The present invention is based on the finding that frame elements which shall be made
available for skipping may be transmitted more efficiently if default payload length
information is transmitted separately within a configuration block, with the length
information within the frame elements, in turn, being subdivided into a default payload
length flag followed, if the default payload length flag is not set, by a payload length value
explicitly coding the payload length of the respective frame element. However, if the
default payload length flag is set, an explicit transmission of the payload length may be
avoided. Rather, any frame element, the default extension payload length flag of which is
set, has the default payload length and any frame element, the default extension payload
length flag of which is not set, has a payload length corresponding to the payload length
value. By this measure, transmission effectiveness is increased.
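As an illustration of this syntax, a minimal C sketch of the reading side follows. The Bitstream type, the helper names, and the 8/16-bit escape coding of the explicit value are assumptions for illustration, not the standard's exact definitions.

```c
typedef struct Bitstream Bitstream;        /* opaque bit reader (assumed) */
extern unsigned read_bit(Bitstream *bs);   /* assumed reader primitive    */
/* assumed helper: read n1 bits; if all ones, read n2 further bits and
 * add; if those are again all ones, read n3 further bits and add. */
extern unsigned read_escaped_value(Bitstream *bs,
                                   unsigned n1, unsigned n2, unsigned n3);

/* One flag bit per frame element; an explicit payload length value is
 * read only if the default payload length flag is not set. */
unsigned read_frame_element_length(Bitstream *bs, unsigned default_length)
{
    if (read_bit(bs))
        return default_length;     /* flag set: default payload length */
    return read_escaped_value(bs, 8, 16, 0);  /* explicit length value */
}
```

When most frame elements of a substream carry payloads of the same size, the flag costs one bit per element while the full length value is coded only for the exceptions, which is the source of the claimed gain in transmission effectiveness.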
In accordance with an embodiment of the present application, the bitstream syntax is
further designed to take advantage of the finding that a better compromise between a too
high bitstream and decoding overhead on the one hand and flexibility of frame element
positioning on the other hand may be obtained if each of the sequence of frames of the
bitstream comprises a sequence of N frame elements and, on the other hand, the bitstream
comprises a configuration block comprising a field indicating the number of elements N
and a type indication syntax portion indicating, for each element position of the sequence
of N element positions, an element type out of a plurality of element types with, in the
sequences of N frame elements of the frames, each frame element being of the element
type indicated, by the type indication portion, for the respective element position at which
the respective frame element is positioned within the sequence of N frame elements of the
respective frame in the bitstream. Thus, the frames are equally structured in that each
frame comprises the same sequence of N frame elements of the frame element type
indicated by the type indication syntax portion, positioned within the bitstream in the same
sequential order. This sequential order is commonly adjustable for the sequence of frames
by use of the type indication syntax portion which indicates, for each element position of
the sequence of N element positions, an element type out of a plurality of element types.
By this measure, the frame element types may be arranged in any order, such as at the
encoder's discretion, so as to choose the order which is the most appropriate for the frame
element types used, for example.
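A corresponding configuration-block reader might look as follows; numElements mirrors the identifier used in the concrete syntax example further below, while the enum ordering, array bound, field widths, and read_bits() helper are illustrative assumptions (the Bitstream type is reused from the sketch above).

```c
typedef enum {               /* the element types outlined below; the    */
    ID_SCE,                  /* numeric order is an assumption:          */
    ID_CPE,                  /* single-channel / channel pair / LFE /    */
    ID_LFE,                  /* extension element type                   */
    ID_EXT
} ElementType;

typedef struct {
    unsigned    num_elements;        /* field 50: number of elements N   */
    ElementType element_type[64];    /* one type per element position    */
} Config;

extern unsigned read_bits(Bitstream *bs, unsigned n);  /* assumed */

/* Read the field indicating N, then one element type per element
 * position; the field widths are illustrative assumptions. */
void read_config_block(Bitstream *bs, Config *cfg)
{
    cfg->num_elements = read_bits(bs, 6);
    for (unsigned i = 0; i < cfg->num_elements; i++)
        cfg->element_type[i] = (ElementType)read_bits(bs, 2);
}
```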
The plurality of frame element types may, for example, include an extension element type
with merely frame elements of the extension element type comprising the length
information on the length of the respective frame element so that decoders not supporting
the specific extension element type are able to skip these frame elements of the extension
element type using the length information as a skip interval length. On the other hand,
decoders able to handle these frame elements of the extension element type accordingly
process the content or payload portion thereof. Frame elements of other element types
may not comprise such length information. If, in accordance with the just mentioned more
specific embodiment, the encoder is able to freely position these frame elements of the
extension element type within the sequence of frame elements of the frames, buffering
overhead at the decoders may be minimized by choosing the frame element type order
appropriately and signaling same within the type indication syntax portion.
Advantageous implementations of embodiments of the present invention are the subject of
the dependent claims.
Further, preferred embodiments of the present application are described below with respect
to the figures, among which:
Fig. 1 shows a schematic block diagram of an encoder and its input and output in
accordance with an embodiment;
Fig. 2 shows a schematic block diagram of a decoder and its input and output in
accordance with an embodiment;
Fig. 3 schematically shows a bitstream in accordance with an embodiment;
Figs. 4a to 4z and 4za to 4zc show tables of pseudo code, illustrating a concrete syntax of
the bitstream in accordance with an embodiment;
Figs. 5a and 5b show a block diagram of a USAC encoder and decoder; and
Fig. 6 shows a typical pair of encoder and decoder.
Fig. 1 shows an encoder 24 in accordance with an embodiment. The encoder 24 is for
encoding an audio content 10 into a bitstream 12.
As described in the introductory portion of the specification of the present application, the
audio content 10 may be a conglomeration of several audio signals 16. The audio signals
16 represent, for example, individual audio channels of a spatial audio scene. Alternatively,
the audio signals 16 form audio objects of a set of audio objects together defining an audio
scene for free mixing at the decoding side. The audio signals are defined at a common
time basis t as illustrated at 26. That is, the audio signals 16 may relate to the same time
interval and may, accordingly, be time aligned relative to each other.
The encoder 24 is configured to encode consecutive time periods 18 of the audio content
10 into a sequence of frames 20 so that each frame 20 represents a respective one of the
time periods 18 of the audio content 10. The encoder 24 is configured to, in some sense,
encode each time period in the same way such that each frame 20 comprises a sequence of
a number of elements N of frame elements. Within each frame 20, it holds true that each
frame element 22 is of a respective one of a plurality of element types. In particular, the
sequence of frames 20 is a composition of N sequences of frame elements 22 with each
frame element 22 being of a respective one of a plurality of element types such that each
frame 20 comprises one frame element 22 out of each of the N sequences of frame
elements 22, respectively, and for each sequence of frame elements 22, the frame elements
22 are of equal element type relative to each other. In the embodiments described further
below, the N frame elements within each frame 20 are arranged within the bitstream 12
such that frame elements 22 positioned at a certain element position are of the same or
equal element type and form one of the N sequences of frame elements, sometimes called
substreams in the following. That is, the first frame elements 22 in the frames 20 are of the
same element type and form a first sequence (or substream) of frame elements, the second
frame elements 22 of all frames 20 are of an element type equal to each other and form a
second sequence of frame elements, and so forth. However, it is emphasized that this
aspect of the following embodiments is merely optional and all of the subsequently
outlined embodiments may be modified in this regard: for example, instead of keeping the
order among the frame elements of the N substreams within each frame 20 constant with
transferring the information concerning the element types of the substreams within the
configuration block, all of the subsequently explained embodiments may be revised in that
a respective element type of the frame elements is contained within the frame element
syntax itself so that the order among the substreams within each frame 20 may change
between different frames. Naturally, such a modification would come at the cost of giving
up the advantage regarding transmission effectiveness as further explained below. Even
alternatively, the order could be fixed but somehow predefined by convention so that no
indication within the configuration block would be necessary.
As will be outlined in more detail below, the substreams conveyed by the sequence of
frames 20 convey information which enables a decoder to reconstruct the audio content.
While some of the substreams may be indispensable, others are somehow optional and may
be skipped by some of the decoders. For example, some of the substreams may represent
side information with respect to other substreams and may, for example, be dispensable.
This will be explained in more detail below. However, in order to allow for decoders to
skip some of the frame elements or, to be more precise, the frame elements of at least one
of the sequences of frame elements, i.e. substreams, the encoder 24 is configured to write a
configuration block 28 into the bitstream 12, which comprises a default payload length
information on a default payload length. Further, the encoder writes, for each frame
element 22 of this at least one substream, length information into the bitstream 12,
comprising, for at least a subset of the frame elements 22 of this at least one substream, a
default payload length flag followed, if the default payload length flag is not set, by a
payload length value. Any frame element of the at least one of the sequences of frame
elements 22, the default extension payload length flag of which is set, has the default
payload length, and any frame element of the at least one of the sequences of frame
elements 22, the default extension payload length flag 64 of which is not set, has a payload
length corresponding to the payload length value. By this measure, an explicit transmission
of the payload length for each frame element of a skippable substream may be avoided.
Rather, depending on the payload type conveyed by such frame elements, the statistics of
the payload length may be such that the transmission effectiveness is greatly increased by
referring to the default payload length rather than explicitly transmitting the payload length
for each frame element again and again.
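On the writing side, the encoder can be pictured as comparing each element's payload length against the configured default; below is a minimal sketch under the same assumptions as the reader above (the writer primitives are assumed counterparts, not names from the specification).

```c
extern void write_bit(Bitstream *bs, unsigned bit);             /* assumed */
extern void write_escaped_value(Bitstream *bs, unsigned value,
                                unsigned n1, unsigned n2, unsigned n3);

/* Writer counterpart of read_frame_element_length(): signal the default
 * with a single flag bit, otherwise code the length explicitly. */
void write_frame_element_length(Bitstream *bs, unsigned payload_length,
                                unsigned default_length)
{
    if (payload_length == default_length) {
        write_bit(bs, 1);            /* default payload length flag set  */
    } else {
        write_bit(bs, 0);            /* flag not set ...                 */
        write_escaped_value(bs, payload_length, 8, 16, 0);  /* ... value */
    }
}
```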
Thus, after having rather generally described the bitstream, in the following the same is
described in more detail with respect to more specific embodiments. As mentioned before,
in these embodiments the constant, but adjustable order among the substreams within the
consecutive frames 20 merely represents an optional feature and may be changed in these
embodiments.
In accordance with an embodiment, for example, the encoder 24 is configured such that the
plurality of element types comprises the following:
a) frame elements of a single-channel element type, for example, may be generated by the
encoder 24 to represent one single audio signal. Accordingly, the sequence of frame
elements 22 at a certain element position within the frames 20, e.g. the i-th frame elements
with 0 < i < N+1, which hence form the i-th substream of frame elements, would together
represent consecutive time periods 18 of such a single audio signal. The audio signal thus
represented could directly correspond to any one of the audio signals 16 of the audio
content 10. Alternatively, however, and as will be described in more detail below, such a
represented audio signal may be one channel out of a downmix signal which, along with
payload data of frame elements of another frame element type, positioned at another
element position within the frames 20, yields a number of audio signals 16 of the audio
content 10 which is higher than the number of channels of the just-mentioned downmix
signal. In case of the embodiment described in more detail below, frame elements of such
single-channel element type are denoted UsacSingleChannelElement. In the case of MPEG
Surround and SAOC, for example, there is only a single downmix signal, which can be
mono, stereo, or even multichannel in the case of MPEG Surround. In the latter case the,
e.g., 5.1 downmix consists of two channel pair elements and one single channel element. In
this case the single channel element, as well as the two channel pair elements, are only a
part of the downmix signal. In the stereo downmix case, a channel pair element will be
used.
b) Frame elements of a channel pair element type may be generated by the encoder 24 so
as to represent a stereo pair of audio signals. That is, frame elements 22 of that type, which
are positioned at a common element position within the frames 20, would together form a
respective substream of frame elements which represent consecutive time periods 18 of
such a stereo audio pair. The stereo pair of audio signals thus represented could be directly
any pair of audio signals 16 of the audio content 10, or could represent, for example, a
downmix signal, which along with payload data of frame elements of another element type
that are positioned at another element position yield a number of audio signals 16 of the
audio content 10 which is higher than 2. In the embodiment described in more detail
below, frame elements of such channel pair element type are denoted as
UsacChannelPairElement.
c) In order to convey information on audio signals 16 of the audio content 10 which need
less bandwidth such as subwoofer channels or the like, the encoder 24 may support frame
elements of a specific type with frame elements of such a type, which are positioned at a
common element position, representing, for example, consecutive time periods 18 of a
single audio signal. This audio signal may be any one of the audio signals 16 of the audio
content 10 directly, or may be part of a downmix signal as described before with respect to
the single channel element type and the channel pair element type. In the embodiment
described in more detail below, frame elements of such a specific frame element type are
denoted UsacLfeElement.
d) Frame elements of an extension element type could be generated by the encoder 24 so as
to convey side information within the bitstream so as to enable the decoder to upmix any
of the audio signals represented by frame elements of any of the types a, b and/or c to
obtain a higher number of audio signals. Frame elements of such an extension element
type, which are positioned at a certain common element position within the frames 20,
would accordingly convey side information relating to the consecutive time periods 18 that
enables upmixing the respective time period of one or more audio signals represented by
any of the other frame elements so as to obtain the respective time period of a higher
number of audio signals, wherein the latter ones may correspond to the original audio
signals 16 of the audio content 10. Examples for such side information may, for example,
be parametric side information such as, for example, MPS or SAOC side information.
In accordance with the embodiment described in detail below, the available element types
merely consist of the above outlined four element types, but other element types may be
available as well. On the other hand, only one or two of the element types a to c may be
available.
As became clear from the above discussion, the omission of frame elements 22 of the
extension element type from the bitstream 12, or the neglect of these frame elements in
decoding, does not completely render the reconstruction of the audio content 10
impossible: at least, the remaining frame elements of the other element types convey
enough information to yield audio signals. These audio signals do not necessarily
correspond to the original audio signals of the audio content 10 or a proper subset thereof,
but may represent a kind of "amalgam" of the audio content 10. That is, frame elements of
the extension element type may convey information (payload data) which represents side
information with respect to one or more frame elements positioned at different element
positions within frames 20.
In the embodiment described below, however, frame elements of the extension element
type are not restricted to such a kind of side information conveyance. Rather, frame
elements of the extension element type are, in the following, denoted UsacExtElement and
are defined to convey payload data along with length information, wherein the latter length
information enables decoders receiving the bitstream 12 to skip these frame elements
of the extension element type in case of, for example, the decoder being unable to process
the respective payload data within these frame elements. This is described in more detail
below.
Before proceeding with the description of the encoder of Fig. 1, however, it should be
noted that there are several possibilities for alternatives for the element types described
above. This is especially true for the extension element type described above. In particular,
in case of the extension element type being configured such that the payload data thereof is
skippable by decoders which are, for example, not able to process the respective payload
data, the payload data of these extension element type frame elements could be any
payload data type. This payload data could form side information with respect to payload
data of other frame elements of other frame element types, or could form self-contained
payload data representing another audio signal, for example. Moreover, even in case of the
payload data of the extension element type frame elements representing side information of
payload data of frame elements of other frame element types, the payload data of these
extension element type frame elements is not restricted to the kind just-described, namely
multi-channel or multi-object side information. Multi-channel side information payload
accompanies, for example, a downmix signal represented by any of the frame elements of
the other element type, with spatial cues such as binaural cue coding (BCC) parameters
such as inter channel coherence values (ICC), inter channel level differences (ICLD),
and/or inter channel time differences (ICTD) and, optionally, channel prediction
coefficients, which parameters are known in the art from, for example, the MPEG
Surround standard. The just mentioned spatial cue parameters may, for example, be
transmitted within the payload data of the extension element type frame elements in a
time/frequency resolution, i.e. one parameter per time/frequency tile of the time/frequency
grid. In case of multi-object side information, the payload data of the extension element
type frame element may comprise similar information such as inter-object cross-correlation
(IOC) parameters, object level differences (OLD) as well as downmix parameters revealing
how original audio signals have been downmixed into a channel(s) of a downmix signal
represented by any of the frame elements of another element type. The latter parameters are,
for example, known in the art from the SAOC standard. However, an example of a
different side information which the payload data of extension element type frame
elements could represent is, for example, SBR data for parametrically encoding an
envelope of a high frequency portion of an audio signal represented by any of the frame
elements of the other frame element types, positioned at a different element position within
frames 20, enabling, for example, spectral band replication by using the low frequency
portion obtained from the latter audio signal as a basis for the high frequency portion,
whose envelope is then shaped according to the envelope conveyed by the SBR
data. More generally, the payload data of frame elements of the extension
element type could convey side information for modifying audio signals represented by
frame elements of any of the other element types, positioned at a different element position
within frame 20, either in the time domain or in the frequency domain wherein the
frequency domain may, for example, be a QMF domain or some other filterbank domain or
transform domain.
Proceeding further with the description of the functionality of encoder 24 of Fig. 1, same is
configured to encode into the bitstream 12 a configuration block 28 which comprises a
field indicating the number of elements N, and a type indication syntax portion indicating,
for each element position of the sequence of N element positions, the respective element
type. Accordingly, the encoder 24 is configured to encode, for each frame 20, the sequence
of N frame elements 22 into the bitstream 12 so that each frame element 22 of the
sequence of N frame elements 22, which is positioned at a respective element position
within the sequence of N frame elements 22 in the bitstream 12, is of the element type
indicated by the type indication portion for the respective element position. In other words,
the encoder 24 forms N substreams, each of which is a sequence of frame elements 22 of a
respective element type. That is, for all of these N substreams, the frame elements 22 are of
equal element type, while frame elements of different substreams may be of a different
element type. The encoder 24 is configured to multiplex all of these frame elements into
bitstream 12 by concatenating all N frame elements of these substreams concerning one
common time period to form one frame 20. Accordingly, in the bitstream 12 these
frame elements 22 are arranged in frames 20. Within each frame 20, the representatives of
the N substreams, i.e. the N frame elements concerning the same time period 18, are
arranged in the static sequential order defined by the sequence of element positions and the
type indication syntax portion in the configuration block 28, respectively.
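As a sketch, the per-frame multiplexing step reduces to emitting one frame element per substream in the configured order; Substream and write_element() are placeholder assumptions, and the Config type is reused from the earlier sketch.

```c
typedef struct Substream Substream;  /* per-substream encoder state (assumed) */

/* assumed: emit one frame element of the given type for one time period */
extern void write_element(Bitstream *bs, ElementType type,
                          Substream *sub, unsigned period);

/* One frame 20 per time period 18: concatenate one frame element per
 * substream in the fixed order given by the configuration block. */
void write_frame(Bitstream *bs, const Config *cfg,
                 Substream *substreams, unsigned period)
{
    for (unsigned i = 0; i < cfg->num_elements; i++)
        write_element(bs, cfg->element_type[i], &substreams[i], period);
}
```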
By use of the type indication syntax portion, the encoder 24 is able to freely choose the
order, using which the frame elements 22 of the N substreams are arranged within frames
20. By this measure, the encoder 24 is able to keep, for example, buffering overhead at the
decoding side as low as possible. For example, a substream of frame elements of the
extension element type which conveys side information for frame elements of another
substream (base substream), which are of a non-extension element type, may be positioned
at an element position within frames 20 immediately succeeding the element position at
which these base substream frame elements are located in the frames 20. By this measure,
the buffering time during which the decoding side has to buffer results, or intermediate
results, of the decoding of the base substream for an application of the side information
thereon, is kept low, and the buffering overhead may be reduced. In case of the side
information of the payload data of frame elements of a substream, which are of the
extension element type, being applied to an intermediate result, such as a frequency
domain, of the audio signal represented by another substream of frame elements 22 (base
substream), the positioning of the substream of extension element type frame elements 22
so that same immediately follows the base substream, does not only minimize the buffering
overhead, but also the time duration during which the decoder may have to interrupt
further processing of the reconstruction of the represented audio signal because, for
example, the payload data of the extension element type frame elements is to modify the
reconstruction of the audio signal relative to the base substream's representation. It might,
however, also be favorable to position a dependent extension substream prior to its base
substream representing an audio signal to which the extension substream refers. For
example, the encoder 24 is free to position the substream of extension payload within the
bitstream upstream relative to a channel element type substream. For example, the
extension payload of substream i could convey dynamic range control (DRC) data and is
transmitted prior to, or at an earlier element position i, relative to the coding of the
corresponding audio signal, such as via frequency domain (FD) coding, within the channel
substream at element position i+1, for example. Then, the decoder is able to use the DRC
data immediately when decoding and reconstructing the audio signal represented by
non-extension type substream i+1.
The encoder 24 as described so far represents a possible embodiment of the present
application. However, Fig. 1 also shows a possible internal structure of the encoder which
is to be understood merely as an illustration. As shown in Fig. 1, the encoder 24 may
comprise a distributer 30 and a sequentializer 32 between which various encoding modules
34a-e are connected in a manner described in more detail in the following. In particular,
the distributer 30 is configured to receive the audio signals 16 of the audio content 10 and
to distribute same onto the individual encoding modules 34a-e. The way the distributer 30
distributes the consecutive time periods 18 of the audio signal 16 onto the encoding
modules 34a to 34e is static. In particular, the distribution may be such that each audio
signal 16 is forwarded to one of the encoding modules 34a to 34e exclusively. An audio
signal fed to LFE encoder 34a is encoded by LFE encoder 34a into a substream of frame
elements 22 of type c (see above), for example. Audio signals fed to an input of single
channel encoder 34b are encoded by the latter into a substream of frame elements 22 of
type a (see above), for example. Similarly, a pair of audio signals fed to an input of channel
pair encoder 34c is encoded by the latter into a substream of frame elements 22 of type b
(see above), for example. The just mentioned encoding modules 34a to 34c are connected
with an input and output thereof between distributer 30 on the one hand and sequentializer
32 on the other hand.
As is shown in Fig. 1, however, the inputs of encoder modules 34b and 34c are not only
connected to the output interface of distributer 30. Rather, same may be fed by an output
signal of any of encoding modules 34d and 34e. The latter encoding modules 34d and 34e
are examples of encoding modules which are configured to encode a number of inbound
audio signals into a downmix signal of a lower number of downmix channels on the one
hand, and a substream of frame elements 22 of type d (see above), on the other hand. As
became clear from the above discussion, encoding module 34d may be a SAOC encoder,
and encoding module 34e may be an MPS encoder. The downmix signals are forwarded to
either of encoding modules 34b and 34c. The substreams generated by encoding modules
34a to 34e are forwarded to sequentializer 32 which sequentializes the substreams into the
bitstream 12 as just described. Accordingly, encoding modules 34d and 34e have their
input for the number of audio signals connected to the output interface of distributer 30,
while their substream output is connected to an input interface of sequentializer 32, and
their downmix output is connected to inputs of encoding modules 34b and/or 34c,
respectively.
It should be noted that in accordance with the description above the existence of the
multi-object encoder 34d and multi-channel encoder 34e has merely been chosen for illustrative
purposes, and either one of these encoding modules 34d and 34e may be left away or
replaced by another encoding module, for example.
After having described the encoder 24 and the possible internal structure thereof, a
corresponding decoder is described with respect to Fig. 2. The decoder of Fig. 2 is
generally indicated with reference sign 36 and has an input in order to receive the bitstream
12 and an output for outputting a reconstructed version 38 of the audio content 10 or an
amalgam thereof. Accordingly, the decoder 36 is configured to decode the bitstream 12
comprising the configuration block 28 and the sequence of frames 20 shown in Fig. 1, and
to decode each frame 20 by decoding the frame elements 22 in accordance with the
element type indicated, by the type indication portion, for the respective element position
at which the respective frame element 22 is positioned within the sequence of N frame
elements 22 of the respective frame 20 in the bitstream 12. That is, the decoder 36 is
configured to assign each frame element 22 to one of the possible element types depending
on its element position within the current frame 20 rather than any information within the
frame element itself. By this measure, the decoder 36 obtains N substreams, the first
substream made up of the first frame elements 22 of the frames 20, the second substream
made up of the second frame elements 22 within frames 20, the third substream made up of
the third frame elements 22 within frames 20 and so forth.
Before describing the functionality of decoder 36 with respect to extension element type
frame elements in more detail, a possible internal structure of decoder 36 of Fig. 2 is
explained in more detail so as to correspond to the internal structure of encoder 24 of Fig.
1. As described with respect to the encoder 24, the internal structure is to be understood
merely as being illustrative.
In particular, as shown in Fig. 2, the decoder 36 may internally comprise a distributer 40
and an arranger 42 between which decoding modules 44a to 44e are connected. Each
decoding module 44a to 44e is responsible for decoding a substream of frame elements 22
of a certain frame element type. Accordingly, distributer 40 is configured to distribute the
N substreams of bitstream 12 onto the decoding modules 44a to 44e correspondingly.
Decoding module 44a, for example, is an LFE decoder which decodes a substream of
frame elements 22 of type c (see above) so as to obtain a narrowband (for example) audio
signal at its output. Similarly, single-channel decoder 44b decodes an inbound substream
of frame elements 22 of type a (see above) to obtain a single audio signal at its output, and
channel pair decoder 44c decodes an inbound substream of frame elements 22 of type b
(see above) to obtain a pair of audio signals at its output. Decoding modules 44a to 44c
have their input and output connected between output interface of distributer 40 on the one
hand and input interface of arranger 42 on the other hand.
Decoder 36 may merely have decoding modules 44a to 44c. The other decoding modules
44e and 44d are responsible for extension element type frame elements and are,
accordingly, optional as far as conformity with the audio codec is concerned. If both or
any of these extension modules 44d and 44e are missing, distributer 40 is configured to skip
respective extension frame element substreams in the bitstream 12 as described in more
detail below, and the reconstructed version 38 of the audio content 10 is merely an
amalgam of the original version having the audio signals 16.
If present, however, i.e. if the decoder 36 supports SAOC and/or MPS extension frame
elements, the multi-channel decoder 44e may be configured to decode substreams
generated by encoder 34e, while multi-object decoder 44d is responsible for decoding
substreams generated by multi-object encoder 34d. Accordingly, in case of decoding
module 44e and/or 44d being present, a switch 46 may connect the output of any of
decoding modules 44c and 44b with a downmix signal input of decoding module 44e
and/or 44d. The multi-channel decoder 44e may be configured to up-mix an inbound
downmix signal using side information within the inbound substream from distributer 40 to
obtain an increased number of audio signals at its output. Multi-object decoder 44d may
act accordingly with the difference that multi-object decoder 44d treats the individual
audio signals as audio objects whereas the multi-channel decoder 44e treats the audio
signals at its output as audio channels.
The audio signals thus reconstructed are forwarded to arranger 42 which arranges them to
form the reconstruction 38. Arranger 42 may be additionally controlled by user input 48,
which user input indicates, for example, an available loudspeaker configuration or a
highest number of channels of the reconstruction 38 allowed. Depending on the user input
48, arranger 42 may disable any of the decoding modules 44a to 44e such as, for example,
any of the extension modules 44d and 44e, although present and although extension frame
elements are present in the bitstream 12.
Generally speaking, the decoder 36 may be configured to parse the bitstream 12 and
reconstruct the audio content based on a subset of the sequences of frame elements, i.e.
substreams, and to, with respect to at least one of the sequences of frame elements 22 not
belonging to the subset of the sequences of frame elements, read the configuration block
28 of the at least one of the sequences of frame elements 22, including default payload
length information on a default payload length, and for each frame element 22 of the at least one
of the sequences of frame elements 22, read length information from the bitstream 12, the
reading of the length information comprising, for at least a subset of the frame elements 22
of the at least one of the sequences of frame elements 22, reading a default payload length
flag followed, if the default payload length flag is not set, by reading a payload length
value. The decoder 36 may then skip, in parsing the bitstream 12, any frame element of the
at least one of the sequences of frame elements, the default extension payload length flag
of which is set, using the default payload length as skip interval length, and any frame
element of the at least one of the sequences of frame elements 22, the default extension
payload length flag of which is not set, using a payload length corresponding to the
payload length value as skip interval length.
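Combined with the length reader sketched earlier, the decoder's skip logic can be pictured as follows; decoder_supports(), decode_payload(), skip_bits(), and a per-position default_length field in Config are assumptions added for this sketch.

```c
extern int  decoder_supports(const Config *cfg, unsigned pos);   /* assumed */
extern void decode_payload(Bitstream *bs, const Config *cfg,
                           unsigned pos, unsigned len);          /* assumed */
extern void skip_bits(Bitstream *bs, unsigned long n);           /* assumed */

/* Parse or skip one frame element of a skippable substream: the decoded
 * length serves as skip interval length when the payload is unsupported. */
void handle_skippable_element(Bitstream *bs, const Config *cfg, unsigned pos)
{
    unsigned len = read_frame_element_length(bs, cfg->default_length[pos]);

    if (decoder_supports(cfg, pos))
        decode_payload(bs, cfg, pos, len);  /* process content portion */
    else
        skip_bits(bs, 8ul * len);           /* skip len payload bytes  */
}
```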
In the embodiments described further below, this mechanism is restricted to extension
element type substreams only, but naturally such mechanism or syntax portion could apply
to more than one element type.
Before describing further possible details of the decoder, encoder and bitstream,
respectively, it should be noted that owing to the ability of the encoder to intersperse
frame elements of substreams which are of the extension element type in between frame
elements of substreams which are not of the extension element type, buffer overhead of
decoder 36 may be lowered by the encoder 24 appropriately choosing the order among the
substreams and the order among the frame elements of the substreams within each frame
20, respectively. Imagine, for example, that the substream entering channel pair decoder
44c would be placed at the first element position within frame 20, while multi-channel
substream for decoder 44e would be placed at the end of each frame. In that case, the
decoder 36 would have to buffer the intermediate audio signal representing the downmix
signal for multi-channel decoder 44e for a time period bridging the time between the
arrival of the first frame element and the last frame element of each frame 20, respectively.
Only then is the multi-channel decoder 44e able to commence its processing. This deferral
may be avoided by the encoder 24 arranging the substream dedicated for multi-channel
decoder 44e at the second element position of frames 20, for example. On the other hand,
distributer 40 does not need to inspect each frame element with respect to its membership
to any of the substreams. Rather, distributer 40 is able to deduce the membership of a
current frame element 22 of a current frame 20 to any of the N substreams merely from the
configuration block and the type indication syntax portion contained therein.
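A sketch of this position-driven dispatch follows, reusing the Config and handler sketches above; the decode_* handlers are placeholders for the respective decoding modules 44a to 44c.

```c
extern void decode_single_channel(Bitstream *bs, unsigned pos);  /* assumed */
extern void decode_channel_pair(Bitstream *bs, unsigned pos);    /* assumed */
extern void decode_lfe(Bitstream *bs, unsigned pos);             /* assumed */

/* Distributer 40: the substream membership of each frame element follows
 * from its element position alone; no per-element type indicator is read. */
void distribute_frame(Bitstream *bs, const Config *cfg)
{
    for (unsigned i = 0; i < cfg->num_elements; i++) {
        switch (cfg->element_type[i]) {
        case ID_SCE: decode_single_channel(bs, i);         break;
        case ID_CPE: decode_channel_pair(bs, i);           break;
        case ID_LFE: decode_lfe(bs, i);                    break;
        case ID_EXT: handle_skippable_element(bs, cfg, i); break;
        }
    }
}
```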
Reference is now made to Fig. 3 showing a bitstream 12 which comprises, as already
described above, a configuration block 28 and a sequence of frames 20. Bitstream portions
to the right follow other bitstream portions to the left when looking at Fig. 3. In the
case of Fig. 3, for example, configuration block 28 precedes the frames 20 shown in Fig. 3,
wherein, for illustrative purposes only, merely three frames 20 are completely shown in
Fig. 3.
Further, it should be noted that the configuration block 28 may be inserted into the
bitstream 12 in between frames 20 on a periodic or intermittent basis to allow for random
access points in streaming transmission applications. Generally speaking, the configuration
block 28 may be a simply-connected portion of the bitstream 12.
The configuration block 28 comprises, as described above, a field 50 indicating the number
of elements N, i.e. the number of frame elements N within each frame 20 and the number
of substreams multiplexed into bitstream 12 as described above. In the embodiment
describing a concrete syntax of bitstream 12 below, field 50 is
denoted numElements and the configuration block 28 is called UsacConfig in the
specific syntax example of Figs. 4a-z and 4za-zc. Further, the configuration block 28
comprises a type indication syntax portion 52. As already described above, this portion 52
indicates for each element position an element type out of a plurality of element types. As
shown in Fig. 3 and as is the case with respect to the following specific syntax example,
the type indication syntax portion 52 may comprise a sequence of N syntax elements 54,
with each syntax element 54 indicating the element type for the respective element
position at which the respective syntax element 54 is positioned within the type indication
syntax portion 52. In other words, the i-th syntax element 54 within portion 52 may indicate
the element type of the i-th substream and i-th frame element of each frame 20, respectively.
In the subsequent concrete syntax example, the syntax element is denoted
UsacElementType. Although the type indication syntax portion 52 could be contained
within the bitstream 12 as a simply-connected or contiguous portion of the bitstream 12, it
is exemplarily shown in Fig. 3 that the elements 54 thereof are intermeshed with other
syntax element portions of the configuration block 28 which are present for each of the N
element positions individually. In the below-outlined embodiments, these intermeshed
syntax portions pertain to the substream-specific configuration data 55, the meaning of
which is described in more detail in the following.
As already described above, each frame 20 is composed of a sequence of N frame elements
22. The element types of these frame elements 22 are not signaled by respective type
indicators within the frame elements 22 themselves. Rather, the element types of the frame
elements 22 are defined by their element position within each frame 20. The frame element
22 occurring first in the frame 20, denoted frame element 22a in Fig. 3, has the first
element position and is accordingly of the element type which is indicated for the first
element position by syntax portion 52 within configuration block 28. The same applies
with respect to the following frame elements 22. For example, the frame element 22b
occurring immediately after the first frame element 22a within bitstream 12, i.e. the one
having element position 2, is of the element type indicated for the second element position by syntax portion 52.
In accordance with a specific embodiment, the syntax elements 54 are arranged within
bitstream 12 in the same order as the frame elements 22 to which they refer. That is, the
first syntax element 54, i.e. the one occurring first in the bitstream 12 and being positioned
at the outermost left-hand side in Fig. 3, indicates the element type of the first occurring
frame element 22a of each frame 20, the second syntax element 54 indicates the element
type of the second frame element 22b and so forth. Naturally, the sequential order or
arrangement of syntax elements 54 within bitstream 12 and syntax portions 52 may be
switched relative to the sequential order of frame elements 22 within frames 20. Other
permutations would also be feasible although less preferred.
For the decoder 36, this means that same may be configured to read this sequence of N
syntax elements 54 from the type indication syntax portion 52. To be more precise, the
decoder 36 reads field 50 so that decoder 36 knows about the number N of syntax elements
54 to be read from bitstream 12. As just mentioned, decoder 36 may be configured to
associate the syntax elements and the element types indicated thereby with the frame
elements 22 within frames 20 so that the i-th syntax element 54 is associated with the i-th
frame element 22.
In addition to the above description, the configuration block 28 may comprise a sequence
55 of N configuration elements 56 with each configuration element 56 comprising
configuration information for the element type for the respective element position at which
the respective configuration element 56 is positioned in the sequence 55 of N configuration
elements 56. In particular, the order in which the sequence of configuration elements 56 is
written into the bitstream 12 (and read from the bitstream 12 by decoder 36) may be the
same order as that used for the frame elements 22 and/or the syntax elements 54,
respectively. That is, the configuration element 56 occurring first in the bitstream 12 may
comprise the configuration information for the first frame element 22a, the second
configuration element 56, the configuration information for frame element 22b and so
forth. As already mentioned above, the type indication syntax portion 52 and the element-
position-specific configuration data 55 are shown in the embodiment of Fig. 3 as being
interleaved with each other in that the configuration element 56 pertaining to element
position i is positioned in the bitstream 12 between the type indicator 54 for element
position i and element position i+1. In other words, configuration elements 56 and the
syntax elements 54 are arranged in the bitstream alternately and read therefrom alternately
by the decoder 36, but other positioning of this data in the bitstream 12 within block 28
would also be feasible, as mentioned before.
By conveying a configuration element 56 for each element position 1...N in configuration
block 28, respectively, the bitstream allows for differently configuring frame elements
belonging to different substreams and element positions, respectively, but being of the
same element type. For example, a bitstream 12 may comprise two single channel
substreams and accordingly two frame elements of the single channel element type within
each frame 20. The configuration information for both substreams may, however, be
adjusted differently in the bitstream 12. This, in turn, means that the encoder 24 of Fig. 1 is
enabled to differently set coding parameters within the configuration information for these
different substreams and the single channel decoder 44b of decoder 36 is controlled by
using these different coding parameters when decoding these two substreams. This is also
true for the other decoding modules. More generally speaking, the decoder 36 is
configured to read the sequence of N configuration elements 56 from the configuration
block 28 and decodes the i-th frame element 22 in accordance with the element type
indicated by the i-th syntax element 54, and using the configuration information comprised
by the i-th configuration element 56.
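For illustration only, the following Python sketch shows how a decoder might read field 50 and the interleaved sequence of type indicators 54 and configuration elements 56. All bit widths, type codes and helper names are assumptions made for the example; the normative widths and values are those of the specific syntax of Figs. 4a-z.

    # Illustrative sketch only: bit widths, type codes and helper names are
    # assumptions, not taken from the concrete syntax.
    ID_USAC_SCE, ID_USAC_CPE, ID_USAC_LFE, ID_USAC_EXT = 0, 1, 2, 3

    def read_configuration_block(read_bits):
        """Read field 50 (the number of elements N) and then, per element
        position, the interleaved type indicator 54 and configuration
        element 56."""
        num_elements = read_bits(4) + 1              # field 50 (width assumed)
        element_types, config_elements = [], []
        for _ in range(num_elements):
            elem_type = read_bits(2)                 # syntax element 54
            element_types.append(elem_type)
            config_elements.append(read_config_element(elem_type, read_bits))
        return element_types, config_elements

    def read_config_element(elem_type, read_bits):
        # Placeholder: the element-type-specific configuration data 55/56
        # (UsacSingleChannelElementConfig() etc.) would be parsed here.
        return {"type": elem_type}

The i-th entry of both returned lists then governs the i-th frame element 22 of every frame 20.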
For illustrative purposes, it is assumed in Fig. 3 that the second substream, i.e. the
substream composed of the frame elements 22b occurring at the second element position
within each frame 20, is of the extension element type, i.e. composed of frame
elements 22b of the extension element type. Naturally, this is merely illustrative.
Further, it is only for illustrative purposes that the bitstream or configuration block 28
comprises one configuration element 56 per element position irrespective of the element
type indicated for that element position by syntax portion 52. In accordance with an
alternative embodiment, for example, there may be one or more element types for which
no configuration element is comprised by configuration block 28 so that, in the latter case,
the number of configuration elements 56 within configuration block 28 may be less than N
depending on the number of frame elements of such element types occurring in syntax
portion 52 and frames 20, respectively.
In any case, Fig. 3 shows a further example for building configuration elements 56
concerning the extension element type. In the subsequently explained specific syntax
embodiment, these configuration elements 56 are denoted UsacExtElementConfig. For
completeness only, it is noted that in the subsequently explained specific syntax
embodiment, configuration elements for the other element types are denoted
UsacSingleChannelElementConfig, UsacChannelPairElementConfig and
UsacLfeElementConfig.
However, before describing a possible structure of a configuration element 56 for the
extension element type, reference is made to the portion of Fig. 3 showing a possible
structure of a frame element of the extension element type, here illustratively the second
frame element 22b. As shown therein, frame elements of the extension element type may
comprise a length information 58 on a length of the respective frame element 22b. Decoder
36 is configured to read, from each frame element 22b of the extension element type of
every frame 20, this length information 58. If the decoder 36 is not able to, or is instructed
by user input not to, process the substream to which this frame element of the extension
element type belongs, decoder 36 skips this frame element 22b using the length
information 58 as skip interval length, i.e. the length of the portion of the bitstream to be
skipped. In other words, the decoder 36 may use the length information 58 to compute the
number of bytes or any other suitable measure for defining a bitstream interval length,
which is to be skipped until accessing or visiting the next frame element within the current
frame 20 or the start of the next following frame 20, so as to proceed with reading
the bitstream 12.
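As a minimal sketch of this skipping, assuming byte-aligned frame elements and a seekable input, a decoder might simply advance the read position by the computed number of bytes; the function name is illustrative only.

    import io

    def skip_frame_element(stream: io.BufferedIOBase, skip_length_bytes: int) -> None:
        # Use the length information 58 as the skip interval: advance the read
        # position past the remainder of the current frame element so that
        # reading resumes at the next frame element 22 or at the beginning of
        # the next frame 20.
        stream.seek(skip_length_bytes, io.SEEK_CUR)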
As will be described in more detail below, frame elements of the extension element type
may be configured to accommodate for future or alternative extensions or developments of
the audio codec and accordingly frame elements of the extension element type may have
different statistical length distributions. In order to take advantage of the possibility that in
accordance with some applications the extension element type frame elements of a certain
substream are of constant length or have a very narrow statistical length distribution, in
accordance with some embodiments of the present application, the configuration elements
56 for extension element type may comprise default payload length information 60 as
shown in Fig. 3. In that case, it is possible for the frame elements 22b of the extension
element type of the respective substream to refer to this default payload length information
60 contained within the respective configuration element 56 for the respective substream
instead of explicitly transmitting the payload length. In particular, as shown in Fig. 3, in
that case the length information 58 may comprise a conditional syntax portion 62 in the
form of a default extension payload length flag 64 followed, if the default payload length
flag 64 is not set, by an extension payload length value 66. Any frame element 22b of the
extension element type has the default extension payload length as indicated by
information 60 in the corresponding configuration element 56 in case the default extension
payload length flag 64 of the length information 58 of the respective frame element 22b of
the extension element type is set, and has an extension payload length corresponding to the
extension payload length value 66 of the length information 58 of the respective frame
element 22b of the extension element type in case the default extension payload length
flag 64 of the length information 58 of the respective frame element 22b of the extension
element type is not set. That is, the explicit coding of the extension payload length value 66 may be
avoided by the encoder 24 whenever it is possible to merely refer to the default extension
payload length as indicated by the default payload length information 60 within the
configuration element 56 of the corresponding substream and element position,
respectively. The decoder 36 acts as follows. Same reads the default payload length
information 60 during the reading of the configuration element 56. When reading the frame
element 22b of the corresponding substream, the decoder 36, in reading the length
information of these frame elements, reads the default extension payload length flag 64 and
checks whether same is set or not. If the default payload length flag 64 is not set, the
decoder proceeds with reading the extension payload length value 66 of the conditional
syntax portion 62 from the bitstream so as to obtain an extension payload length of the
respective frame element. However, if the default payload length flag 64 is set, the decoder 36
sets the extension payload length of the respective frame element to be equal to the default
extension payload length as derived from information 60. The skipping of the decoder 36
may then involve skipping a payload section 68 of the current frame element using the
extension payload length just determined as the skip interval length, i.e. the length of a
portion of the bitstream 12 to be skipped so as to access the next frame element 22 of the
current frame 20 or the beginning of the next frame 20.
Accordingly, as previously described, the frame-wise repeated transmission of the payload
length of the frame elements of an extension element type of a certain substream may be
avoided using flag mechanism 64 whenever the variance of the payload lengths of these
frame elements is low.
However, since it is not a priori clear whether the payload conveyed by the frame elements
of an extension element type of a certain substream has such statistics regarding the
payload length of the frame elements, and accordingly whether it is worthwhile to transmit
the default payload length explicitly in the configuration element of such a substream of
frame elements of the extension element type, in accordance with a further embodiment, the
default payload length information 60 is also implemented by a conditional syntax portion
comprising a flag 60a called UsacExtElementDefaultLengthPresent in the following
specific syntax example, and indicating whether or not an explicit transmission of the
default payload length takes place. Merely if set, the conditional syntax portion comprises
the explicit transmission 60b of the default payload length, called
UsacExtElementDefaultLength in the following specific syntax example. Otherwise, the
default payload length is by default set to 0. In the latter case, bitstream bit consumption is
saved as an explicit transmission of the default payload length is avoided. That is, the
decoder 36 (and distributor 40, which is responsible for all reading procedures described
hereinbefore and hereinafter) may be configured to, in reading the default payload length
information 60, read a default payload length present flag 60a from the bitstream 12, check
as to whether the default payload length present flag 60a is set, and if the default payload
length present flag 60a is not set, set the default extension payload length to be zero, and if the
default payload length present flag 60a is set, explicitly read the default extension
payload length 60b from the bitstream 12 (namely, the field 60b following flag 60a).
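A minimal sketch of this conditional reading follows, assuming a bit-reading callable and a fixed width for field 60b; in the concrete syntax the default length is coded with an escape mechanism instead.

    def read_default_payload_length(read_bits):
        """Read the default payload length information 60 of a configuration
        element 56 of the extension element type."""
        default_length_present = read_bits(1)    # flag 60a
        if default_length_present:
            # Field 60b; an 8-bit width is assumed here for brevity only.
            return read_bits(8)
        return 0                                 # no explicit value: default is 0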
In addition to, or alternatively to the default payload length mechanism, the length
information 58 may comprise an extension payload present flag 70 wherein any frame
element 22b of the extension element type, the extension payload present flag 70 of the
length information 58 of which is not set, merely consists of the extension payload present
flag. That is, there is no payload section 68. On the other hand, the length
information 58 of any frame element 22b of the extension element type, the payload data
present flag 70 of the length information 58 of which is set, further comprises a syntax
portion 62 or 66 indicating the extension payload length of the respective frame element 22b, i.e.
the length of its payload section 68. In addition to the default payload length mechanism,
i.e. in combination with the default extension payload length flag 64, the extension payload
present flag 70 enables providing each frame element of the extension element type with
two effectively codable payload lengths, namely 0 on the one hand and the default payload
length, i.e. the most probable payload length, on the other hand.
In parsing or reading the length information 58 of a current frame element 22b of the
extension element type, the decoder 36 reads the extension payload present flag 70 from
the bitstream 12, checks whether the extension payload present flag 70 is set, and if the
extension payload present flag 70 is not set, ceases reading the respective frame element
22b and proceeds with reading another, next frame element 22 of the current frame 20 or
starts with reading or parsing the next frame 20. Whereas if the payload data present flag
70 is set, the decoder 36 reads the syntax portion 62 or at least portion 66 (if flag 64 is non-
existent since this mechanism is not available) and skips, if the payload of the current
frame element 22 is to be skipped, the payload section 68 by using the extension payload
length of the respective frame element 22b of the extension element type as the skip
interval length.
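Putting the two flags together, the length information 58 of a frame element 22b might be parsed as in the following sketch; the fixed width of value 66 is an assumption, the concrete syntax uses an escape mechanism.

    def read_length_information(read_bits, default_payload_length):
        """Parse length information 58 of a frame element 22b of the
        extension element type. A result of 0 means that the frame element
        has no payload section 68."""
        payload_present = read_bits(1)           # flag 70
        if not payload_present:
            return 0
        use_default_length = read_bits(1)        # flag 64
        if use_default_length:
            return default_payload_length        # taken from information 60
        return read_bits(8)                      # explicit value 66 (width assumed)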
As described above, frame elements of the extension element type may be provided in
order to accommodate for future extensions of the audio codec or alternative extensions
which the current decoder is not suitable for, and accordingly frame elements of the
extension element type should be configurable. In particular, in accordance with an
embodiment, the configuration block 28 comprises, for each element position for which the
type indication portion 52 indicates the extension element type, a configuration element 56
comprising configuration information for the extension element type, wherein the
configuration information comprises, in addition or alternatively to the above outlined
components, an extension element type field 72 indicating a payload data type out of a
plurality of payload data types. The plurality of payload data types may, in accordance
with one embodiment, comprise a multi-channel side information type and a multi-object
coding side information type besides other data types which are, for example, reserved for
future developments. Depending on the payload data type indicated, the configuration
element 56 additionally comprises payload data type specific configuration data 74.
Accordingly, the frame elements 22b at the corresponding element position and of the
respective substream, respectively, convey in their payload sections 68 payload data
corresponding to the indicated payload data type. In order to allow for an adaption of the
length of the payload data type specific configuration data 74 to the payload data type, and
to allow for the reservation for future developments of further payload data types, the
specific syntax embodiments described below have the configuration elements 56 of
extension element type additionally comprising a configuration element length value called
UsacExtElementConfigLength so that decoders 36 which are not aware of the payload data
type indicated for the current substream, are able to skip the configuration element 56 and
its payload data type specific configuration data 74 to access the immediately following
portion of the bitstream 12 such as the element type syntax element 54 of the next element
position (or in the alternative embodiment not shown, the configuration element of the next
element position) or the beginning of the first frame following the configuration block 28
or some other data as will be shown with respect to Fig. 4a. In particular, in the following
specific embodiment for a syntax, multi-channel side information configuration data is
contained in SpatialSpecificConfig, while multi-object side information configuration data
is contained within SaocSpecificConfig.
In accordance with the latter aspect, the decoder 36 would be configured to, in reading the
configuration block 28, perform the following steps for each element position or substream
for which the type indication portion 52 indicates the extension element type:
Reading the configuration element 56, including reading the extension element type field
72 indicating the payload data type out of the plurality of available payload data types.
If the extension element type field 72 indicates the multi-channel side information type,
reading multi-channel side information configuration data 74 as part of the configuration
information from the bitstream 12, and if the extension element type field 72 indicates the
multi-object side information type, reading multi-object side-information configuration
data 74 as part of the configuration information from the bitstream 12.
Then, in decoding the corresponding frame elements 22b, i.e. the ones of the
corresponding element position and substream, respectively, the decoder 36 would
configure the multi-channel decoder 44e using the multi-channel side information
configuration data 74 while feeding the thus configured multi-channel decoder 44e payload
data 68 of the respective frame elements 22b as multi-channel side information, in case of
the payload data type indicating the multi-channel side information type, and decode the
corresponding frame elements 22b by configuring the multi-object decoder 44d using the
multi-object side information configuration data 74 and feeding the thus configured multi-
object decoder 44d with payload data 68 of the respective frame element 22b, in case of
the payload data type indicating the multi-object side information type.
However, if an unknown payload data type is indicated by field 72, the decoder 36 would
skip payload data type specific configuration data 74 using the aforementioned
configuration length value also comprised by the current configuration element.
For example, the decoder 36 could be configured to, for any element position for which the
type indication portion 52 indicates the extension element type, read a configuration data
length field 76 from the bitstream 12 as part of the configuration information of the
configuration element 56 for the respective element position so as to obtain a configuration
data length, and check as to whether the payload data type indicated by the extension
element type field 72 of the configuration information of the configuration element for the
respective element position, belongs to a predetermined set of payload data types being a
subset of the plurality of payload data types. If the payload data type indicated by the
extension element type field 72 of the configuration information of the configuration
element for the respective element position, belongs to the predetermined set of payload
data types, decoder 36 would read the payload data dependent configuration data 74 as part
of the configuration information of the configuration element for the respective element
position from the data stream 12, and decode the frame elements of the extension element
type at the respective element position in the frames 20, using the payload data dependent
configuration data 74. But if the payload data type indicated by the extension element type
field 72 of the configuration information of the configuration element for the respective
element position, does not belong to the predetermined set of payload data types, the
decoder would skip the payload data dependent configuration data 74 using the
configuration data length, and skip the frame elements of the extension element type at the
respective element position in the frames 20 using the length information 58 therein.
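The decision logic just described may be sketched as follows; the type codes and field widths are purely illustrative, the normative values being given by the standard's tables.

    # Hypothetical type codes; the normative values live in the standard's tables.
    EXT_TYPE_MPEGS, EXT_TYPE_SAOC = 1, 2
    SUPPORTED_EXT_TYPES = {EXT_TYPE_MPEGS, EXT_TYPE_SAOC}   # decoder-dependent

    def read_extension_config(read_bits, skip_bytes):
        ext_type = read_bits(4)                  # field 72 (width assumed)
        config_length = read_bits(8)             # field 76, in bytes (width assumed)
        if ext_type in SUPPORTED_EXT_TYPES:
            # Payload data type specific configuration data 74.
            config_data = bytes(read_bits(8) for _ in range(config_length))
            return ext_type, config_data
        # Unknown payload data type: skip configuration data 74; the frame
        # elements of this substream are later skipped via their length
        # information 58.
        skip_bytes(config_length)
        return ext_type, None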
In addition to, or alternative to the above mechanisms, the frame elements of a certain
substream could be configured to be transmitted in fragments rather than one per frame
completely. For example, the configuration elements of extension element types could
comprise a fragmentation use flag 78, and the decoder could be configured to, in reading
frame elements 22 positioned at any element position for which the type indication portion
indicates the extension element type, and for which the fragmentation use flag 78 of the
configuration element is set, read a fragment information 80 from the bitstream 12, and use
the fragment information to put payload data of these frame elements of consecutive
frames together. In the following specific syntax example, each extension type frame
element of a substream for which the fragmentation use flag 78 is set, comprises a pair of a
start flag indicating a start of a payload of the substream, and an end flag indicating an end
of a payload item of the substream. These flags are called usacExtElementStart and
usacExtElementStop in the following specific syntax example.
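A sketch of the reassembly implied by such a start/stop pair is given below; the tuple representation of the fragments is an assumption made for the example.

    def reassemble_payload_items(fragments):
        """Join fragmented extension payloads of consecutive frames 20. Each
        fragment is a triple (start_flag, stop_flag, data), mirroring the
        usacExtElementStart/usacExtElementStop pair."""
        items, current = [], bytearray()
        for start, stop, data in fragments:
            if start:
                current = bytearray()            # a new payload item begins
            current.extend(data)
            if stop:
                items.append(bytes(current))     # payload item is complete
        return items

    # A payload item split over three consecutive frames yields one item:
    assert reassemble_payload_items(
        [(1, 0, b"ab"), (0, 0, b"cd"), (0, 1, b"ef")]) == [b"abcdef"]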
Further, in addition to, or alternative to the above mechanisms, the same variable length
code could be used to read the length information 58, the extension element type field 72,
and the configuration data length field 76, thereby lowering the complexity to implement
the decoder, for example, and saving bits by necessitating additional bits merely in
rarely occurring cases such as future extension element types, greater extension element
type lengths and so forth. In the subsequently explained specific example, this VLC code is
derivable from Fig. 4m.
Summarizing the above, the following could apply for the decoder's functionality:
(1) Reading the configuration block 28, and
(2) Reading/parsing the sequence of frames 20. Step 1 and 2 are performed by decoder 36
and, more precisely, distributor 40.
(3) A reconstruction of the audio content is restricted to those substreams, i.e. to those
sequences of frame elements at element positions, the decoding of which is supported by
the decoder 36. Step 3 is performed within decoder 36 at, for example, the decoding
modules thereof (see Fig. 2).
Accordingly, in step 1 the decoder 36 reads the number 50 of substreams and the number
of frame elements 22 per frame 20, respectively, as well as the element type syntax portion
52 revealing the element type of each of these substreams and element positions,
respectively. For parsing the bitstream in step 2, the decoder 36 then cyclically reads the
frame elements 22 of the sequence of frames 20 from bitstream 12. In doing so, the
decoder 36 skips frame elements, or remaining/payload portions thereof, by use of the
length information 58 as has been described above. In the third step, the decoder 36
performs the reconstruction by decoding the frame elements not having been skipped.
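The three steps may be condensed into the following sketch, in which all parsing and decoding primitives are passed in as callables; none of the names are taken from the specification.

    def decode_bitstream(read_config, read_frame_elements, decode_element,
                         num_frames, supported_types):
        """Steps (1) to (3) in compact form: read the configuration block 28,
        cyclically read the frame elements 22 of each frame 20 (skipping via
        the length information 58 happens inside read_frame_elements), and
        reconstruct only the substreams whose element type is supported."""
        element_types, configs = read_config()                      # step (1)
        decoded_frames = []
        for _ in range(num_frames):
            elements = read_frame_elements(element_types, configs)  # step (2)
            decoded_frames.append(
                [decode_element(elem, cfg)                          # step (3)
                 for elem, cfg, etype in zip(elements, configs, element_types)
                 if etype in supported_types])
        return decoded_frames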
In deciding in step 2 which of the element positions and substreams are to be skipped, the
decoder 36 may inspect the configuration elements 56 within the configuration block 28. In
order to do so, the decoder 36 may be configured to cyclically read the configuration
elements 56 from the configuration block 28 of bitstream 12 in the same order as used for
the element type indicators 54 and the frame elements 22 themselves. As denoted above,
the cyclic reading of the configuration elements 56 may be interleaved with the cyclic
reading of the syntax elements 54. In particular, the decoder 36 may inspect the extension
element type field 72 within the configuration elements 56 of extension element type
substreams. If the extension element type is not a supported one, the decoder 36 skips the
respective substream and the corresponding frame elements 22 at the respective frame
element positions within frames 20.
In order to reduce the bitrate needed for transmitting the length information 58, the decoder
36 is configured to inspect the configuration elements 56 of extension element type
substreams, and in particular the default payload length information 60 thereof in step 1. In
the second step, the decoder 36 inspects the length information 58 of extension frame
elements 22 to be skipped. In particular, first, the decoder 36 inspects flag 64. If set, the
decoder 36 uses the default length indicated for the respective substream by the default
payload length information 60, as the remaining payload length to be skipped in order to
proceed with the cyclical reading/parsing of the frame elements of the frames. If flag 64,
however, is not set then the decoder 36 explicitly reads the payload length 66 from the
bitstream 12. Although not explicitly explained above, it should be clear that the decoder
36 may derive the number of bits or bytes to be skipped in order to access the next frame
element of the current frame or the next frame by some additional computation. For
example, the decoder 36 may take into account whether the fragmentation mechanism is
activated or not, as explained above with respect to flag 78. If activated, the decoder 36
may take into account that the frame elements of the substream having flag 78 set, in any
case have the fragmentation information 80 and that, accordingly, the payload data 68
starts later than it would have if the fragmentation flag 78 were not set.
In decoding in step 3, the decoder acts as usual: that is, the individual substreams are
subject to respective decoding mechanisms or decoding modules, as shown in Fig. 2,
wherein some substreams may form side information with respect to other substreams as
has been explained above with respect to specific examples of extension substreams.
Regarding other possible details of the decoder's functionality, reference is made to
the above discussion. For completeness only, it is noted that decoder 36 may also skip the
further parsing of configuration elements 56 in step 1, namely for those element positions
which are to be skipped because, for example, the extension element type indicated by
field 72 does not fit to a supported set of extension element types. Then, the decoder 36
may use the configuration length information 76 in order to skip respective configuration
elements in cyclically reading/parsing the configuration elements 56, i.e. in skipping a
respective number of bits/bytes in order to access the next bitstream syntax element such as
the type indicator 54 of the next element position.
Before proceeding with the above mentioned specific syntax embodiment, it should be
noted that the present invention is not restricted to be implemented with unified speech and
audio coding and its facets, like switched core coding using a mixture of, or switching
between, AAC-like frequency domain coding and LP coding using parametric coding
(ACELP) and transform coding (TCX). Rather, the above mentioned substreams may
represent audio signals using any coding scheme. Moreover, while the below-outlined
specific syntax embodiment assumes that SBR is a coding option of the core codec used to
represent audio signals using single channel and channel pair element type substreams,
SBR may also not be an option for the latter element types, but merely be usable via
extension element types.
In the following, the specific syntax example for a bitstream 12 is explained. It should be
noted that the specific syntax example represents a possible implementation for the
embodiment of Fig. 3 and the concordance between the syntax elements of the following
syntax and the structure of the bitstream of Fig. 3 is indicated or derivable from the
respective notations in Fig. 3 and the description of Fig. 3. The basic aspects of the
following specific example are outlined now. In this regard, it should be noted that any
additional details in addition to those already described above with respect to Fig. 3 are to
be understood as a possible extension of the embodiment of Fig. 3. All of these extensions
may be individually built into the embodiment of Fig. 3. As a last preliminary note, it
should be understood that the specific syntax example described below explicitly refers to
the decoder and encoder environment of Figs. 5a and 5b, respectively.
High level information about the contained audio content, like sampling rate and exact
channel configuration, is present in the audio bitstream. This makes the bitstream more
self contained and makes transport of the configuration and payload easier when embedded
in transport schemes which may have no means to explicitly transmit this information.
The configuration structure contains a combined frame length and SBR sampling rate ratio
index (coreSbrFrameLengthIndex). This guarantees efficient transmission of both values
and makes sure that non-meaningful combinations of frame length and SBR ratio cannot
be signaled. The latter simplifies the implementation of a decoder.
The configuration can be extended by means of a dedicated configuration extension
mechanism. This will prevent bulky and inefficient transmission of configuration
extensions as known from the MPEG-4 AudioSpecificConfig().
Configuration allows free signaling of loudspeaker positions associated with each
transmitted audio channel. Commonly used channel-to-loudspeaker mappings
can be efficiently signaled by means of a channelConfigurationIndex.
Configuration of each channel element is contained in a separate structure such that each
channel element can be configured independently.
SBR configuration data (the "SBR header") is split into an SbrInfo() and an SbrHeader().
For the SbrHeader() a default version is defined (SbrDfltHeader()), which can be
efficiently referenced in the bitstream. This reduces the bit demand in places where retransmission
of SBR configuration data is needed.
More commonly applied configuration changes to SBR can be efficiently signaled with the
help of the SbrInfo() syntax element.
The configuration for the parametric bandwidth extension (SBR) and the parametric stereo
coding tools (MPS212, aka. MPEG Surround 2-1-2) is tightly integrated into the USAC
configuration structure. This represents much better the way that both technologies are
actually employed in the standard.
The syntax features an extension mechanism which allows transmission of existing and
future extensions to the codec.
The extensions may be placed (i.e. interleaved) with the channel elements in any order.
This allows for extensions which need to be read before or after a particular channel
element to which the extension shall be applied.
A default length can be defined for a syntax extension, which makes transmission of
constant length extensions very efficient, because the length of the extension payload does
not need to be transmitted every time.
The common case of signaling a value with the help of an escape mechanism to extend the
range of values if needed was modularized into a dedicated genuine syntax element
(escapedValue()) which is flexible enough to cover all desired escape value constellations
and bit field extensions.
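The behavior of such a two-level escape mechanism can be sketched as follows, with the three bit-field widths given as parameters; this mirrors the escapedValue() behavior as described here, but the sketch itself is illustrative rather than normative.

    def escaped_value(read_bits, n_bits1, n_bits2, n_bits3):
        """Read an integer with a two-level escape: an all-ones first field
        escapes to a second field, whose all-ones pattern escapes once more
        to a third field; the partial values add up."""
        value = read_bits(n_bits1)
        if value == (1 << n_bits1) - 1:          # first escape
            value_add = read_bits(n_bits2)
            value += value_add
            if value_add == (1 << n_bits2) - 1:  # second escape
                value += read_bits(n_bits3)
        return value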
Bitstream Configuration
UsacConfig() (Fig. 4a)
The UsacConfig() was extended to contain information about the contained audio content
as well as everything needed for the complete decoder set-up. The top level information
about the audio (sampling rate, channel configuration, output frame length) is gathered at
the beginning for easy access from higher (application) layers.
UsacChannelConfig() (Fig. 4b)
These elements give information about the contained bitstream elements and their mapping
to loudspeakers. The channelConfigurationIndex allows for an easy and convenient way of
signaling one out of a range of predefined mono, stereo or multi-channel configurations
which were considered practically relevant.
For more elaborate configurations which are not covered by the
channelConfigurationIndex, the UsacChannelConfig() allows for a free assignment of
elements to loudspeaker positions out of a list of 32 speaker positions, which cover all
currently known speaker positions in all known speaker set-ups for home or cinema sound
reproduction.
This list of speaker positions is a superset of the list featured in the MPEG Surround
standard (see Table 1 and Figure 1 in ISO/IEC 23003-1). Four additional speaker positions
have been added to be able to cover the lately introduced 22.2 speaker set-up (see Figs. 3a,
3b, 4a and 4b).
UsacDecoderConfig() (Fig. 4c)
This element is at the heart of the decoder configuration and as such it contains all further
information required by the decoder to interpret the bitstream.
In particular the structure of the bitstream is defined here by explicitly stating the number
of elements and their order in the bitstream.
A loop over all elements then allows for configuration of all elements of all types (single,
pair, lfe, extension).
UsacConfigExtension() (Fig. 4l)
In order to account for future extensions, the configuration features a powerful mechanism
to extend the configuration for yet non-existent configuration extensions for USAC.
UsacSingleChannelElementConfig() (Fig. 4d)
This element configuration contains all information needed for configuring the decoder to
decode one single channel. This is essentially the core coder related information and if
SBR is used the SBR related information.
UsacChannelPairElementConfig() (Fig. 4e)
In analogy to the above this element configuration contains all information needed for
configuring the decoder to decode one channel pair. In addition to the above mentioned
core config and SBR configuration this includes stereo-specific configurations like the
exact kind of stereo coding applied (with or without MPS212, residual etc.). Note that this
element covers all kinds of stereo coding options available in USAC.
UsacLfeElementConfig() (Fig. 4f)
The LFE element configuration does not contain configuration data as an LFE element has
a static configuration.
UsacExtElementConfig() (Fig. 4k)
This element configuration can be used for configuring any kind of existing or future
extensions to the codec. Each extension element type has its own dedicated ID value. A
length field is included in order to be able to conveniently skip over configuration
extensions unknown to the decoder. The optional definition of a default payload length
further increases the coding efficiency of extension payloads present in the actual
bitstream.
Extensions which are already envisioned to be combined with USAC include: MPEG
Surround, SAOC, and some sort of FIL element as known from MPEG-4 AAC.
UsacCoreConfig() (Fig. 4g)
This element contains configuration data that has impact on the core coder set-up.
Currently these are switches for the time warping tool and the noise filling tool.
SbrConfig() (Fig. 4h)
In order to reduce the bit overhead produced by the frequent re-transmission of the
sbr_header(), default values for the elements of the sbr_header() that are typically kept
constant are now carried in the configuration element SbrDfltHeader(). Furthermore, static
SBR configuration elements are also carried in SbrConfig(). These static bits include flags
for en- or disabling particular features of the enhanced SBR, like harmonic transposition or
inter TES.
SbrDfltHeader() (Fig. 4i)
This carries elements of the sbr_header() that are typically kept constant. Elements
affecting things like amplitude resolution, crossover band, spectrum preflattening are now
carried in SbrInfo(), which allows them to be efficiently changed on the fly.
Mps212Config() (Fig. 4j)
Similar to the above SBR configuration, all set-up parameters for the MPEG Surround 2-1-
2 tools are assembled in this configuration. All elements from SpatialSpecificConfig() that
are not relevant or redundant in this context were removed.
Bitstream Payload
UsacFrame() (Fig. 4n)
This is the outermost wrapper around the USAC bitstream payload and represents a USAC
access unit. It contains a loop over all contained channel elements and extension elements
as signaled in the config part. This makes the bitstream format much more flexible in terms
of what it can contain and is future proof for any future extension.
UsacSingleChannelElement() (Fig. 4o)
This element contains all data to decode a mono stream. The content is split in a core coder
related part and an eSBR related part. The latter is now much more closely connected to
the core, which reflects also much better the order in which the data is needed by the
decoder.
UsacChannelPairElement() (Fig. 4p)
This element covers the data for all possible ways to encode a stereo pair. In particular, all
flavors of unified stereo coding are covered, ranging from legacy M/S based coding to
fully parametric stereo coding with the help of MPEG Surround 2-1-2. stereoConfigIndex
indicates which flavor is actually used. Appropriate eSBR data and MPEG Surround 2-1-2
data is sent in this element.
UsacLfeElement() (Fig. 4q)
The former lfe_channel_element() is renamed only in order to follow a consistent naming
scheme.
UsacExtElement() (Fig. 4r)
The extension element was carefully designed to be able to be maximally flexible but at
the same time maximally efficient even for extensions which have a small payload (or
frequently none at all). The extension payload length is signaled so that decoders unaware of
the extension can skip over it. User-defined extensions can be signaled by means of a reserved range of
extension types. Extensions can be placed freely in the order of elements. A range of
extension elements has already been considered, including a mechanism to write fill bytes.
UsacCoreCoderData() (Fig. 4s)
This new element summarizes all information affecting the core coders and hence also
contains fd_channel_stream()'s and lpd_channel_stream()'s.
StereoCoreToolInfo() (Fig. 4t)
In order to ease the readability of the syntax, all stereo related information was captured in
this element. It deals with the numerous dependencies of bits in the stereo coding modes.
UsacSbrData() (Fig. 4x)
CRC functionality and legacy description elements of scalable audio coding were removed
from what used to be the sbr_extension_data() element. In order to reduce the overhead
caused by frequent re-transmission of SBR info and header data, the presence of these can
be explicitly signaled.
SbrInfo() (Fig. 4y)
SBR configuration data that is frequently modified on the fly. This includes elements
controlling things like amplitude resolution, crossover band, spectrum preflattening, which
previously required the transmission of a complete sbr_header(). (See 6.3 in [N11660],
"Efficiency".)
SbrHeader() (Fig. 4z)
In order to maintain the capability of SBR to change values in the sbr_header() on the fly,
it is now possible to carry an SbrHeader() inside the UsacSbrData() in case other values
than those sent in SbrDfltHeader() should be used. The bs_header_extra mechanism was
maintained in order to keep overhead as low as possible for the most common cases.
sbr_data() (Fig. 4za)
Again, remnants of SBR scalable coding were removed because they are not applicable in
the USAC context. Depending on the number of channels the sbr_data() contains one
sbr_single_channel_element() or one sbr_channel_pair_element().
usacSamplingFrequencyIndex
This table is a superset of the table used in MPEG-4 to signal the sampling frequency of
the audio codec. The table was further extended to also cover the sampling rates that are
currently used in the USAC operating modes. Some multiples of the sampling frequencies
were also added.
channelConfigurationIndex
This table is a superset of the table used in MPEG-4 to signal the channelConfiguration. It
was further extended to allow signaling of commonly used and envisioned future
loudspeaker setups. The index into this table is signaled with 5 bits to allow for future
extensions.
usacElementType
Only 4 element types exist, one for each of the four basic bitstream elements:
UsacSingleChannelElement(), UsacChannelPairElement(), UsacLfeElement(),
UsacExtElement(). These elements provide the necessary top level structure while
maintaining all needed flexibility.
usacExtElementType
Inside of UsacExtElement(), this element allows signaling a plethora of extensions. In
order to be future proof the bit field was chosen large enough to allow for all conceivable
extensions. Out of the currently known extensions, a few are already proposed to be
considered: fill element, MPEG Surround, and SAOC.
usacConfigExtType
Should it at some point be necessary to extend the configuration then this can be handled
by means of the UsacConfigExtension(), which would then allow assigning a type to each
new configuration. Currently the only type which can be signaled is a fill mechanism for
the configuration.
coreSbrFrameLengthIndex
This table shall signal multiple configuration aspects of the decoder. In particular these are
the output frame length, the SBR ratio and the resulting core coder frame length (ccfl). At
the same time it indicates the number of QMF analysis and synthesis bands used in SBR.
stereoConfigIndex
This table determines the inner structure of a UsacChannelPairElement(). It indicates the
use of a mono or stereo core, use of MPS212, whether stereo SBR is applied, and whether
residual coding is applied in MPS212.
By moving large parts of the eSBR header fields to a default header which can be
referenced by means of a default header flag, the bit demand for sending eSBR control data
was greatly reduced. Former sbr_header() bit fields that were considered to change most
likely in a real world system were outsourced to the sbrInfo() element instead, which now
consists only of 4 elements covering a maximum of 8 bits. Compared to the sbr_header(),
which consists of at least 18 bits, this is a saving of 10 bits.
It is more difficult to assess the impact of this change on the overall bitrate because it
depends heavily on the rate of transmission of eSBR control data in sbrlnfo(). However,
already for the common use case where the sbr crossover is altered in a bitstream the bit
saving can be as high as 22 bits per occurrence when sending an sbrInfo() instead of a fully
transmitted sbr_header().
The output of the USAC decoder can be further processed by MPEG Surround (MPS)
(ISO/IEC 23003-1) or SAOC (ISO/IEC 23003-2). If the SBR tool in USAC is active, a
USAC decoder can typically be efficiently combined with a subsequent MPS/SAOC
decoder by connecting them in the QMF domain in the same way as it is described for
HE-AAC in ISO/IEC 23003-1 4.4. If a connection in the QMF domain is not possible, they
need to be connected in the time domain.
If MPS/SAOC side information is embedded into a USAC bitstream by means of the
usacExtElement mechanism (with usacExtElementType being ID_EXT_ELE_MPEGS or
ID_EXT_ELE_SAOC), the time-alignment between the USAC data and the MPS/SAOC
data assumes the most efficient connection between the USAC decoder and the
MPS/SAOC decoder. If the SBR tool in USAC is active and if MPS/SAOC employs a 64
band QMF domain representation (see ISO/IEC 23003-1 6.6.3), the most efficient
connection is in the QMF domain. Otherwise, the most efficient connection is in the time
domain. This corresponds to the time-alignment for the combination of HE-AAC and MPS
as defined in ISO/IEC 23003-1 4.4, 4.5, and 7.2.1.
The additional delay introduced by adding MPS decoding after USAC decoding is given
by ISO/IEC 23003-1 4.5 and depends on whether HQ MPS or LP MPS is used, and
whether MPS is connected to USAC in the QMF domain or in the time domain.
ISO/IEC 23003-1 4.4 clarifies the interface between USAC and MPEG Systems. Every
access unit delivered to the audio decoder from the systems interface shall result in a
corresponding composition unit delivered from the audio decoder to the systems interface,
i.e., the compositor. This shall include start-up and shut-down conditions, i.e., when the
access unit is the first or the last in a finite sequence of access units.
For an audio composition unit, ISO/IEC 14496-1 7.1.3.5 Composition Time Stamp (CTS)
specifies that the composition time applies to the n-th audio sample within the composition
unit. For USAC, the value of n is always 1. Note that this applies to the output of the
USAC decoder itself. In the case that a USAC decoder is, for example, being combined
with an MPS decoder, this needs to be taken into account for the composition units
delivered at the output of the MPS decoder.
If MPS/SAOC side information is embedded into a USAC bitstream by means of the
usacExtElement mechanism (with usacExtElementType being ID_EXT_ELE_MPEGS or
ID_EXT_ELE_SAOC), the following restrictions may, optionally, apply:
The MPS/SAOC sacTimeAlign parameter (see ISO/IEC 23003-1 7.2.5) shall have
the value 0.
The sampling frequency of MPS/SAOC shall be the same as the output sampling
frequency of USAC.
The MPS/SAOC bsFrameLength parameter (see ISO/IEC 23003-1 5.2) shall have
one of the allowed values of a predetermined list.
The USAC bitstream payload syntax is shown in Figs. 4n to 4r, the syntax of subsidiary
payload elements is shown in Figs. 4s-w, and the enhanced SBR payload syntax is shown
in Figs. 4x to 4zc.
Short Description of Data Elements
UsacConfig() This element contains information about the contained audio
content as well as everything needed for the complete
decoder set-up
UsacChannelConfig() This element gives information about the contained bitstream
elements and their mapping to loudspeakers
UsacDecoderConfig() This element contains all further information required by the
decoder to interpret the bitstream. In particular the SBR
resampling ratio is signaled here and the structure of the
bitstream is defined here by explicitly stating the number of
elements and their order in the bitstream
UsacConfigExtension() Configuration extension mechanism to extend the
configuration for future configuration extensions for USAC.
UsacSingleChannelElementConfig() contains all information needed for
configuring the decoder to decode one single channel. This is
essentially the core coder related information and if SBR is
used the SBR related information.
UsacChannelPairElementConfig() In analogy to the above this element configuration
contains all information needed for configuring the decoder
to decode one channel pair. In addition to the above
mentioned core config and sbr configuration this includes
stereo specific configurations like the exact kind of stereo
coding applied (with or without MPS212, residual etc.). This
element covers all kinds of stereo coding options currently
available in USAC.
UsacLfeElementConfig() The LFE element configuration does not contain
configuration data as an LFE element has a static
configuration.
UsacExtElementConfig() This element configuration can be used for configuring any
kind of existing or future extensions to the codec. Each
extension element type has its own dedicated type value. A
length field is included in order to be able to skip over
configuration extensions unknown to the decoder.
UsacCoreConfig() contains configuration data which have impact on the core
coder set-up.
SbrConfig() contains default values for the configuration elements of
eSBR that are typically kept constant. Furthermore, static
SBR configuration elements are also carried in SbrConfig().
These static bits include flags for en- or disabling particular
features of the enhanced SBR, like harmonic transposition or
inter TES.
SbrDfltHeader() This element carries a default version of the elements of the
SbrHeader() that can be referred to if no differing values for
these elements are desired.
Mps212Config() All set-up parameters for the MPEG Surround 2-1-2 tools are
assembled in this configuration.
escapedValue() this element implements a general method to transmit an
integer value using a varying number of bits. It features a
two level escape mechanism which allows extending the
representable range of values by successive transmission of
additional bits.
usacSamplingFrequencyIndex This index determines the sampling frequency of the
audio signal after decoding. The values of
usacSamplingFrequencyIndex and their associated sampling
frequencies are described in Table C.
Table C - Value and meaning of usacSamplingFrequencyIndex

usacSamplingFrequencyIndex   sampling frequency
0x00                         96000
0x01                         88200
0x02                         64000
0x03                         48000
0x04                         44100
0x05                         32000
0x06                         24000
0x07                         22050
0x08                         16000
0x09                         12000
0x0a                         11025
0x0b                         8000
0x0c                         7350
0x0d                         reserved
0x0e                         reserved
0x0f                         57600
0x10                         51200
0x11                         40000
0x12                         38400
0x13                         34150
0x14                         28800
0x15                         25600
0x16                         20000
0x17                         19200
0x18                         17075
0x19                         14400
0x1a                         12800
0x1b                         9600
0x1c                         reserved
0x1d                         reserved
0x1e                         reserved
0x1f                         escape value

NOTE: The values of usacSamplingFrequencyIndex 0x00 up to
0x0e are identical to those of the samplingFrequencyIndex 0x0
up to 0xe contained in the AudioSpecificConfig() specified in
ISO/IEC 14496-3:2009
usacSamplingFrequency Output sampling frequency of the decoder coded as unsigned
integer value in case usacSamplingFrequencyIndex equals
zero.
channelConfigurationIndex This index determines the channel configuration. If
channelConfigurationIndex > 0 the index unambiguously
defines the number of channels, channel elements and
associated loudspeaker mapping according to Table Y. The
names of the loudspeaker positions, the used abbreviations
and the general position of the available loudspeakers can be
deduced from Figs. 3a, 3b and Figs. 4a and 4b.
bsOutputChannelPos This index describes loudspeaker positions which are
associated to a given channel according to Table XX. Figure
Y indicates the loudspeaker position in the 3D environment
of the listener. In order to ease the understanding of
loudspeaker positions Table XX also contains loudspeaker
positions according to IEC 100/1706/CDV which are listed
here for information to the interested reader.
Table - Values of coreCoderFrameLength, sbrRatio, outputFrameLength and numSlots
depending on coreSbrFrameLengthIndex
usacConfigExtensionPresent Indicates the presence of extensions to the configuration
numOutChannels If the value of channelConfigurationlndex indicates that none
of the pre-defined channel configurations is used then this
element determines the number of audio channels for which a
specific loudspeaker position shall be associated.
numElements This field contains the number of elements that will follow in
the loop over element types in the UsacDecoderConfig()
usacElementType[elemIdx] defines the USAC channel element type of the element at
position elemIdx in the bitstream. Four element types exist,
one for each of the four basic bitstream elements:
UsacSingleChannelElement(), UsacChannelPairElement(),
UsacLfeElement(), UsacExtElement(). These elements
provide the necessary top level structure while maintaining
all needed flexibility. The meaning of usacElementType is
defined in Table A.
Table A - Value of usacElementType
stereoConfigIndex This element determines the inner structure of a
UsacChannelPairElement(). It indicates the use of a mono or
stereo core, use of MPS212, whether stereo SBR is applied,
and whether residual coding is applied in MPS212 according
to Table ZZ. This element also defines the values of the
helper elements bsStereoSbr and bsResidualCoding.
Table ZZ - Values of stereoConfigIndex and its meaning and implicit assignment of bsStereoSbr and bsResidualCoding
tw_mdct This flag signals the usage of the time-warped MDCT in this
stream.
noiseFilling This flag signals the usage of the noise filling of spectral
holes in the FD core coder.
harmonicSBR This flag signals the usage of the harmonic patching for the
SBR.
bs_interTes This flag signals the usage of the inter-TES tool in SBR.
dflt_start_freq This is the default value for the bitstream element
bs_start_freq, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
dflt_stop_freq This is the default value for the bitstream element
bs_stop_freq, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
dflt_header_extra1 This is the default value for the bitstream element
bs_header_extra1, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
dflt_header_extra2 This is the default value for the bitstream element
bs_header_extra2, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
dflt_freq_scale This is the default value for the bitstream element
bs_freq_scale, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
dflt_alter_scale This is the default value for the bitstream element
bs_alter_scale, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
dflt_noise_bands This is the default value for the bitstream element
bs_noise_bands, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
dflt_limiter_bands This is the default value for the bitstream element
bs_limiter_bands, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
dflt_limiter_gains This is the default value for the bitstream element
bs_limiter_gains, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
dflt_interpol_freq This is the default value for the bitstream element
bs_interpol_freq, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
dflt_smoothing_mode This is the default value for the bitstream element
bs_smoothing_mode, which is applied in case the flag
sbrUseDfltHeader indicates that default values for the
SbrHeader() elements shall be assumed.
usacExtElementType this element allows signaling of bitstream extension types. The
meaning of usacExtElementType is defined in Table B.
Table B - Value of usacExtElementType
usacExtElementConfigLength signals the length of the extension configuration in
bytes (octets).
usacExtElementDefaultLengthPresent This flag signals whether a
usacExtElementDefaultLength is conveyed in the
UsacExtElementConfig().
usacExtElementDefaultLength signals the default length of the extension element in
bytes. Only if the extension element in a given access unit
deviates from this value, an additional length needs to be
transmitted in the bitstream. If this element is not explicitly
transmitted (usacExtElementDefaultLengthPresent==0) then
the value of usacExtElementDefaultLength shall be set to
zero.
usacExtElementPayloadFrag This flag indicates whether the payload of this extension element may be fragmented and sent as several segments in consecutive USAC frames.
numConfigExtensions If extensions to the configuration are present in the UsacConfig(), this value indicates the number of signaled configuration extensions.
confExtIdx Index to the configuration extensions.
usacConfigExtType This element allows signaling of configuration extension types. The meaning of usacConfigExtType is defined in Table D.
Table D - Value of usacConfigExtType
usacConfigExtLength signals the length of the configuration extension in bytes
(octets).
bsPseudoLr This flag signals that an inverse mid/side rotation should be
applied to the core signal prior to Mps212 processing.
Table - bsPseudoLr
bsStereoSbr This flag signals the usage of the stereo SBR in combination
with MPEG Surround decoding.
Table - bsStereoSbr
bsResidualCoding indicates whether residual coding is applied according to the Table below. The value of bsResidualCoding is defined by stereoConfigIndex (see X).
Table X - bsResidualCoding
bsResidualCoding Meaning
0 no residual coding, core coder is mono
1 residual coding, core coder is stereo
sbrRatioIndex indicates the ratio between the core sampling rate and the
sampling rate after eSBR processing. At the same time it
indicates the number of QMF analysis and synthesis bands
used in SBR according to the Table below.
Table - Definition of sbrRatioIndex
elemIdx Index to the elements present in the UsacDecoderConfig() and the UsacFrame().
UsacConfig()
The UsacConfig() contains information about output sampling frequency and channel configuration. This information shall be identical to the information signaled outside of this element, e.g. in an MPEG-4 AudioSpecificConfig().
Usac Output Sampling Frequency
If the sampling rate is not one of the rates listed in the right column in Table 1, the
sampling frequency dependent tables (code tables, scale factor band tables etc.) must be
deduced in order for the bitstream payload to be parsed. Since a given sampling frequency
is associated with only one sampling frequency table, and since maximum flexibility is
desired in the range of possible sampling frequencies, the following table shall be used to
associate an implied sampling frequency with the desired sampling frequency dependent
tables.
Table 1 - Sampling frequency mapping
Frequency range (in Hz) Use tables for sampling frequency (in Hz)
f >= 92017 96000
92017 > f >= 75132 88200
75132 > f >= 55426 64000
55426 > f >= 46009 48000
46009 > f >= 37566 44100
37566 > f >= 27713 32000
27713 > f >= 23004 24000
23004 > f >= 18783 22050
18783 > f >= 13856 16000
13856 > f >= 11502 12000
11502 > f >= 9391 11025
9391 > f 8000
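For illustration, the mapping of Table 1 can be implemented as a simple threshold lookup, as in the following C sketch. The function name and the test driver are illustrative only and not part of the specification.

#include <stddef.h>
#include <stdio.h>

/* Implied sampling frequency lookup following Table 1: the first
 * lower bound that the actual rate reaches or exceeds selects the
 * sampling-frequency-dependent tables to be used. */
static unsigned implied_sampling_frequency(unsigned f)
{
    static const struct { unsigned lower; unsigned tables; } map[] = {
        { 92017, 96000 }, { 75132, 88200 }, { 55426, 64000 },
        { 46009, 48000 }, { 37566, 44100 }, { 27713, 32000 },
        { 23004, 24000 }, { 18783, 22050 }, { 13856, 16000 },
        { 11502, 12000 }, {  9391, 11025 }, {     0,  8000 },
    };
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (f >= map[i].lower)
            return map[i].tables;
    return 8000; /* not reached: the last entry catches all rates */
}

int main(void)
{
    printf("%u\n", implied_sampling_frequency(29400)); /* prints 32000 */
    return 0;
}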
UsacChannelConfig
The channel configuration table covers most common loudspeaker positions. For further
flexibility channels can be mapped to an overall selection of 32 loudspeaker positions
found in modern loudspeaker setups in various applications (see Figs. 3a, 3b).
For each channel contained in the bitstream the UsacChannelConfig() specifies the
associated loudspeaker position to which this particular channel shall be mapped. The
loudspeaker positions which are indexed by bsOutputChannelPos are listed in Table X. In case of multiple channel elements, the index i of bsOutputChannelPos[i] indicates the position in which the channel appears in the bitstream. Figure Y gives an overview of the loudspeaker positions in relation to the listener.
More precisely the channels are numbered in the sequence in which they appear in the
bitstream starting with 0 (zero). In the trivial case of a UsacSingleChannelElement() or
UsacLfeElement() the channel number is assigned to that channel and the channel count is
increased by one. In case of a UsacChannelPairElement() the first channel in that element
(with index ch==0) is numbered first, whereas the second channel in that same element (with index ch==1) receives the next higher number and the channel count is increased by
two.
It follows that numOutChannels shall be equal to or smaller than the accumulated sum of
all channels contained in the bitstream. The accumulated sum of all channels is equivalent
to the number of all UsacSingleChannelElement()'s plus the number of all
UsacLfeElement()'s plus two times the number of all UsacChannelPairElement()'s.
All entries in the array bsOutputChannelPos shall be mutually distinct in order to avoid
double assignment of loudspeaker positions in the bitstream.
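These constraints amount to a simple consistency check, sketched below in C under assumed variable names (nSce, nLfe and nCpe stand for the counts of the respective element types; these identifiers do not appear in the bitstream syntax itself).

#include <stdbool.h>

/* Checks that numOutChannels does not exceed the accumulated channel
 * count (#SCE + #LFE + 2 * #CPE) and that all assigned loudspeaker
 * positions are mutually distinct. */
static bool channel_config_valid(unsigned nSce, unsigned nLfe, unsigned nCpe,
                                 unsigned numOutChannels,
                                 const unsigned *bsOutputChannelPos)
{
    unsigned accumulated = nSce + nLfe + 2 * nCpe;
    if (numOutChannels > accumulated)
        return false;
    for (unsigned i = 0; i < numOutChannels; i++)
        for (unsigned j = i + 1; j < numOutChannels; j++)
            if (bsOutputChannelPos[i] == bsOutputChannelPos[j])
                return false;   /* double assignment of a position */
    return true;
}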
In the special case that channelConfigurationIndex is 0 and numOutChannels is smaller
than the accumulated sum of all channels contained in the bitstream, then the handling of
the non-assigned channels is outside of the scope of this specification. Information about
this can e.g. be conveyed by appropriate means in higher application layers or by
specifically designed (private) extension payloads.
UsacDecoderConfig()
The UsacDecoderConfig() contains all further information required by the decoder to interpret the bitstream. Firstly, the value of sbrRatioIndex determines the ratio between core coder frame length (ccfl) and the output frame length. Following the sbrRatioIndex is a loop over all channel elements in the present bitstream. For each iteration the type of element is signaled in usacElementType, immediately followed by its corresponding configuration structure. The order in which the various elements are present in the UsacDecoderConfig() shall be identical to the order of the corresponding payload in the UsacFrame().
Each instance of an element can be configured independently. When reading each channel
element in UsacFrame(), for each element the corresponding configuration of that instance,
i.e. with the same elemIdx, shall be used.
UsacSingleChannelElementConfig()
The UsacSingleChannelElementConfig() contains all information needed for configuring the decoder to decode one single channel. SBR configuration data is only transmitted if SBR is actually employed.
UsacChannelPairElementConfig()
The UsacChannelPairElementConfig() contains core coder related configuration data as well as SBR configuration data depending on the use of SBR. The exact type of stereo coding algorithm is indicated by the stereoConfigIndex. In USAC a channel pair can be
encoded in various ways. These are:
1. Stereo core coder pair using traditional joint stereo coding techniques, extended by
the possibility of complex prediction in the MDCT domain
2. Mono core coder channel in combination with MPEG Surround based MPS212 for
fully parametric stereo coding. Mono SBR processing is applied on the core signal.
3. Stereo core coder pair in combination with MPEG Surround based MPS212, where
the first core coder channel carries a downmix signal and the second channel
carries a residual signal. The residual may be band limited to realize partial residual
coding. Mono SBR processing is applied only on the downmix signal before
MPS212 processing.
4. Stereo core coder pair in combination with MPEG Surround based MPS212, where the first core coder channel carries a downmix signal and the second channel carries a residual signal. The residual may be band limited to realize partial residual coding. Stereo SBR is applied on the reconstructed stereo signal after MPS212 processing.
Options 3 and 4 can be further combined with a pseudo LR channel rotation after the core decoder.
UsacLfeElementConfig()
Since the use of the time-warped MDCT and noise filling is not allowed for LFE channels, there is no need to transmit the usual core coder flags for these tools. They shall be set to zero instead.
The use of SBR is neither allowed nor meaningful in an LFE context. Thus, SBR configuration data is not transmitted.
UsacCoreConfig()
The UsacCoreConfig() only contains flags to enable or disable the use of the time-warped MDCT and spectral noise filling on a global bitstream level. If tw_mdct is set to zero, time warping shall not be applied. If noiseFilling is set to zero, the spectral noise filling shall not be applied.
SbrConfig()
The SbrConfig() bitstream element serves the purpose of signaling the exact eSBR setup parameters. On one hand, the SbrConfig() signals the general employment of eSBR tools. On the other hand, it contains a default version of the SbrHeader(), the SbrDfltHeader(). The values of this default header shall be assumed if no differing SbrHeader() is transmitted in the bitstream. The background of this mechanism is that typically only one set of SbrHeader() values is applied in one bitstream. The transmission of the SbrDfltHeader() then allows this default set of values to be referenced very efficiently by using only one bit in the bitstream. The possibility to vary the values of the SbrHeader() on the fly is still retained by allowing the in-band transmission of a new SbrHeader() in the bitstream itself.
SbrDfltHeader()
The SbrDfltHeader() is what may be called the basic SbrHeader() template and should contain the values for the predominantly used eSBR configuration. In the bitstream this configuration can be referred to by setting the sbrUseDfltHeader flag. The structure of the SbrDfltHeader() is identical to that of SbrHeader(). In order to be able to distinguish between the values of the SbrDfltHeader() and SbrHeader(), the bit fields in the SbrDfltHeader() are prefixed with "dflt_" instead of "bs_". If the use of the SbrDfltHeader() is indicated, then the SbrHeader() bit fields shall assume the values of the corresponding SbrDfltHeader(), i.e.
bs_start_freq = dflt_start_freq;
bs_stop_freq = dflt_stop_freq;
etc.
(continue for all elements in SbrHeader(), like:
bs_xxx_yyy = dflt_xxx_yyy;)
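The effect of the flag can be pictured in the following C sketch. The struct lists only a few representative fields, and both the struct and the reader callback are illustrative assumptions, not part of the bitstream syntax.

/* Minimal sketch of the SbrDfltHeader() mechanism: when the one-bit
 * sbrUseDfltHeader flag is set, the SbrHeader() fields assume the
 * dflt_ values stored in the configuration; otherwise a full
 * SbrHeader() is read from the bitstream. */
typedef struct {
    unsigned start_freq, stop_freq;   /* dflt_start_freq, dflt_stop_freq */
    unsigned freq_scale, alter_scale; /* further dflt_ fields ...        */
    /* ... remaining SbrHeader() fields ... */
} SbrHeaderValues;

static void get_sbr_header(int sbrUseDfltHeader,
                           const SbrHeaderValues *dflt,          /* from SbrConfig() */
                           SbrHeaderValues (*read_inband)(void), /* in-band reader   */
                           SbrHeaderValues *hdr)
{
    if (sbrUseDfltHeader)
        *hdr = *dflt;         /* bs_xxx_yyy = dflt_xxx_yyy for all fields */
    else
        *hdr = read_inband(); /* a transmitted SbrHeader() overrides the default */
}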
Mps212Config()
The Mps212Config() resembles the SpatialSpecificConfig() of MPEG Surround and was in large parts derived from it. It is however reduced in extent to contain only information relevant for mono to stereo upmixing in the USAC context. Consequently MPS212 configures only one OTT box.
UsacExtElementConfig()
The UsacExtElementConfig() is a general container for configuration data of extension elements for USAC. Each USAC extension has a unique type identifier, usacExtElementType, which is defined in Table X. For each UsacExtElementConfig() the length of the contained extension configuration is transmitted in the variable usacExtElementConfigLength and allows decoders to safely skip over extension elements whose usacExtElementType is unknown.
For USAC extensions which typically have a constant payload length, the UsacExtElementConfig() allows the transmission of a usacExtElementDefaultLength. Defining a default payload length in the configuration allows a highly efficient signaling of the usacExtElementPayloadLength inside the UsacExtElement(), where bit consumption needs to be kept low.
In case of USAC extensions where a larger amount of data is accumulated and transmitted not on a per-frame basis but only every second frame or even more rarely, this data may be transmitted in fragments or segments spread over several USAC frames. This can be helpful in order to keep the bit reservoir more equalized. The use of this mechanism is signaled by the usacExtElementPayloadFrag flag. The fragmentation mechanism is further explained in the description of the usacExtElement in 6.2.X.
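The per-frame length signaling that results from a configured default can be sketched in C as follows. The bit-reader interface is an assumption, and the escape coding of the explicit length (8 bits, escaping to a further 16 bits) reflects the one- or three-octet coding described for UsacExtElement() further below.

#include <stddef.h>

typedef struct BitReader BitReader;          /* assumed bit-reader interface */
extern unsigned read_bits(BitReader *br, unsigned n);

/* Returns the payload length of one UsacExtElement(): a single flag bit
 * selects the configured default; only a deviating payload pays for an
 * explicit usacExtElementPayloadLength. */
static size_t ext_element_payload_length(BitReader *br,
                                         size_t usacExtElementDefaultLength)
{
    if (read_bits(br, 1))          /* usacExtElementUseDefaultLength */
        return usacExtElementDefaultLength;
    size_t len = read_bits(br, 8); /* explicit length, first octet   */
    if (len == 255)                /* escape: two further octets     */
        len += read_bits(br, 16);
    return len;
}

With a default length configured, every frame whose extension payload matches it spends only a single bit on length signaling.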
UsacConfigExtension()
The UsacConfigExtension() is a general container for extensions of the UsacConfig(). It provides a convenient way to amend or extend the information exchanged at the time of the decoder initialization or set-up. The presence of config extensions is indicated by usacConfigExtensionPresent. If config extensions are present (usacConfigExtensionPresent==1), the exact number of these extensions follows in the bit field numConfigExtensions. Each configuration extension has a unique type identifier,
usacConfigExtType, which is defined in Table X. For each UsacConfigExtension the
length of the contained configuration extension is transmitted in the variable
usacConfigExtLength and allows the configuration bitstream parser to safely skip over
configuration extensions whose usacConfigExtType is unknown.
Top level payloads for the audio object type USAC
Terms and definitions
UsacFrame() This block of data contains audio data for a time period of
one USAC frame, related information and other data. As
signaled in UsacDecoderConfig(), the UsacFrame() contains
numElements elements. These elements can contain audio
data, for one or two channels, audio data for low frequency
enhancement or extension payload.
UsacSingleChannelElement() Abbreviation SCE. Syntactic element of the bitstream
containing coded data for a single audio channel. A
single_channel_element() basically consists of the
UsacCoreCoderData(), containing data for either FD or LPD
core coder. In case SBR is active, the
UsacSingleChannelElement also contains SBR data.
UsacChannelPairElement() Abbreviation CPE. Syntactic element of the bitstream
payload containing data for a pair of channels. The channel
pair can be achieved either by transmitting two discrete
channels or by one discrete channel and related MPS payload. This is signaled by means of the stereoConfigIndex.
The UsacChannelPairElement further contains SBR data in
case SBR is active.
UsacLfeElement() Abbreviation LFE. Syntactic element that contains a low
sampling frequency enhancement channel. LFEs are always
encoded using the fd_channel_stream() element.
UsacExtElement() Syntactic element that contains extension payload. The length
of an extension element is either signaled as a default length
in the configuration (UsacExtElementConfig()) or signaled
in the UsacExtElement() itself. If present, the extension
payload is of type usacExtElementType, as signaled in the
configuration.
usacIndependencyFlag indicates if the current UsacFrame() can be decoded entirely
without the knowledge of information from previous frames
according to the Table below
Table - Meaning of usacIndependencyFlag
value of usacIndependencyFlag Meaning
0 Decoding of data conveyed in UsacFrame() might require access to the previous UsacFrame().
1 Decoding of data conveyed in UsacFrame() is possible without access to the previous UsacFrame().
NOTE: Please refer to X.Y for recommendations on the use
of the usacIndependencyFlag.
usacExtElementUseDefaultLength indicates whether the length of the extension element
corresponds to usacExtElementDefaultLength, which was
defined in the UsacExtElementConfig().
usacExtElementPayloadLength shall contain the length of the extension element in
bytes. This value should only be explicitly transmitted in the
bitstream if the length of the extension element in the present
access unit deviates from the default value,
usacExtElementDefaultLength.
usacExtElementStart Indicates if the present usacExtElementSegmentData begins a
data block.
usacExtElementStop Indicates if the present usacExtElementSegmentData ends a
data block.
usacExtElementSegmentData The concatenation of all usacExtElementSegmentData
from UsacExtElement() of consecutive USAC frames,
starting from the UsacExtElement() with
usacExtElementStart==l up to and including the
UsacExtElement() with usacExtElementStop==l forms one
data block. In case a complete data block is contained in one
UsacExtElement(), usacExtElementStart and
usacExtElementStop shall both be set to 1. The data blocks
are interpreted as a byte aligned extension payload depending
on usacExtElementType according to the following Table:
Table - Interpretation of data blocks for USAC extension payload decoding
usacExtElementType The concatenated usacExtElementSegmentData represents:
ID_EXT_ELE_FILL Series of fill_byte
ID_EXT_ELE_MPEGS SpatialFrame()
ID_EXT_ELE_SAOC SaocFrame()
unknown Unknown data. The data block shall be discarded.
fill_byte Octet of bits which may be used to pad the bitstream with bits
that carry no information. The exact bit pattern used for
fill_byte should be '10100101'.
Helper Elements
nrCoreCoderChannels In the context of a channel pair element this variable
indicates the number of core coder channels which form the
basis for stereo coding. Depending on the value of
stereoConfigIndex this value shall be 1 or 2.
nrSbrChannels In the context of a channel pair element this variable
indicates the number of channels on which SBR processing is
applied. Depending on the value of stereoConfigIndex this
value shall be 1 or 2.
Subsidiary payloads for USAC
Terms and Definitions
UsacCoreCoderData() This block of data contains the core-coder audio data. The payload element contains data for one or two core-coder channels, for either FD or LPD mode. The specific mode is signaled per channel at the beginning of the element.
StereoCoreToolInfo() All stereo related information is captured in this element. It deals with the numerous dependencies of bit fields in the stereo coding modes.
Helper Elements
commonCoreMode in a CPE this flag indicates if both encoded core coder
channels use the same mode.
Mps212Data() This block of data contains payload for the Mps212 stereo
module. The presence of this data is dependent on the
stereoConfiglndex.
common_window indicates if channel 0 and channel 1 of a CPE use identical
window parameters.
common_tw indicates if channel 0 and channel 1 of a CPE use identical
parameters for the time warped MDCT.
Decoding of UsacFrame()
One UsacFrame() forms one access unit of the USAC bitstream. Each UsacFrame() decodes into 768, 1024, 2048 or 4096 output samples according to the outputFrameLength determined from Table X.
The first bit in the UsacFrame() is the usacIndependencyFlag, which determines if a given
frame can be decoded without any knowledge of the previous frame. If the
usacIndependencyFlag is set to 0, then dependencies to the previous frame may be present
in the payload of the current frame.
The UsacFrame() is further made up of one or more syntactic elements which shall appear
in the bitstream in the same order as their corresponding configuration elements in the
UsacDecoderConfig(). The position of each element in the series of all elements is indexed
by elemIdx. For each element the corresponding configuration, as transmitted in the UsacDecoderConfig(), of that instance, i.e. with the same elemIdx, shall be used.
These syntactic elements are of one of four types, which are listed in Table X. The type of
each of these elements is determined by usacElementType. There may be multiple
elements of the same type. Elements occurring at the same position elemIdx in different
frames shall belong to the same stream.
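The element loop of UsacFrame() can be pictured as in the following C sketch; the element type constants and parser functions are assumed names used for illustration only.

typedef struct BitReader BitReader;               /* assumed bit-reader interface */
extern unsigned read_bits(BitReader *br, unsigned n);
extern void parse_single_channel_element(BitReader *br, unsigned elemIdx);
extern void parse_channel_pair_element(BitReader *br, unsigned elemIdx);
extern void parse_lfe_element(BitReader *br, unsigned elemIdx);
extern void parse_ext_element(BitReader *br, unsigned elemIdx);

enum UsacElementType { ID_USAC_SCE, ID_USAC_CPE, ID_USAC_LFE, ID_USAC_EXT };

/* Elements are parsed in the order fixed by UsacDecoderConfig(); each
 * element uses the configuration instance with the same elemIdx. */
static void parse_usac_frame(BitReader *br,
                             const enum UsacElementType *elementType,
                             unsigned numElements)
{
    unsigned usacIndependencyFlag = read_bits(br, 1); /* first bit of the frame */
    (void)usacIndependencyFlag;
    for (unsigned elemIdx = 0; elemIdx < numElements; elemIdx++) {
        switch (elementType[elemIdx]) {
        case ID_USAC_SCE: parse_single_channel_element(br, elemIdx); break;
        case ID_USAC_CPE: parse_channel_pair_element(br, elemIdx);   break;
        case ID_USAC_LFE: parse_lfe_element(br, elemIdx);            break;
        case ID_USAC_EXT: parse_ext_element(br, elemIdx);            break;
        }
    }
}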
Table - Examples of simple possible bitstream payloads
If these bitstream payloads are to be transmitted over a constant rate channel then they might include an extension payload element with a usacExtElementType of ID_EXT_ELE_FILL to adjust the instantaneous bitrate. In this case an example of a coded stereo signal is:
Table - Examples of a simple stereo bitstream with extension payload for writing fill bits.
Decoding of UsacSingleChannelElement()
The simple structure of the UsacSingleChannelElement() is made up of one instance of a UsacCoreCoderData() element with nrCoreCoderChannels set to 1. Depending on the sbrRatioIndex of this element, a UsacSbrData() element follows with nrSbrChannels set to 1 as well.
Decoding of UsacExtElement()
UsacExtElement() structures in a bitstream can be decoded or skipped by a USAC decoder. Every extension is identified by a usacExtElementType, conveyed in the UsacExtElement()'s associated UsacExtElementConfig(). For each usacExtElementType a specific decoder can be present.
If a decoder for the extension is available to the USAC decoder then the payload of the extension is forwarded to the extension decoder immediately after the UsacExtElement() has been parsed by the USAC decoder.
If no decoder for the extension is available to the USAC decoder, a minimum of structure
is provided within the bitstream, so that the extension can be ignored by the USAC
decoder.
The length of an extension element is either specified by a default length in octets, which can be signaled within the corresponding UsacExtElementConfig() and which can be overruled in the UsacExtElement(), or by an explicitly provided length information in the UsacExtElement(), which is either one or three octets long, using the syntactic element escapedValue().
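Under the usual MPEG reading of escapedValue(nBits1, nBits2, nBits3), a first bit field escapes to a second and possibly a third stage; with the parameters (8, 16, 0) this yields exactly the one- or three-octet length coding mentioned above. The following generic C sketch assumes the same bit-reader interface as before and that read_bits(br, 0) returns 0.

#include <stdint.h>

typedef struct BitReader BitReader;          /* assumed bit-reader interface */
extern unsigned read_bits(BitReader *br, unsigned n);

/* Generic escapedValue() reader: an all-ones first stage escapes to
 * nBits2 further bits, and an all-ones second stage to nBits3 bits. */
static uint64_t read_escaped_value(BitReader *br, unsigned nBits1,
                                   unsigned nBits2, unsigned nBits3)
{
    uint64_t value = read_bits(br, nBits1);
    if (value == ((uint64_t)1 << nBits1) - 1) {      /* first escape  */
        uint64_t add = read_bits(br, nBits2);
        value += add;
        if (add == ((uint64_t)1 << nBits2) - 1)      /* second escape */
            value += read_bits(br, nBits3);
    }
    return value;
}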
Extension payloads that span one or more UsacFrame()'s can be fragmented and their payload distributed among several UsacFrame()'s. In this case the usacExtElementPayloadFrag flag is set to 1 and a decoder must collect all fragments from the UsacFrame() with usacExtElementStart set to 1 up to and including the UsacFrame() with usacExtElementStop set to 1. When usacExtElementStop is set to 1 then the extension is considered to be complete and is passed to the extension decoder.
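The collection of fragments may be sketched as follows in C; the assembly buffer and the deliver callback are illustrative assumptions, not part of the specification.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint8_t *data; size_t size; } Assembly;

/* Appends one usacExtElementSegmentData fragment; a segment with
 * usacExtElementStart==1 begins a new data block, and a segment with
 * usacExtElementStop==1 completes it and hands it to the extension
 * decoder via the deliver callback. Returns -1 on allocation failure. */
static int on_segment(Assembly *a, const uint8_t *seg, size_t len,
                      int start, int stop,
                      void (*deliver)(const uint8_t *, size_t))
{
    if (start)
        a->size = 0;                         /* new data block begins */
    uint8_t *p = realloc(a->data, a->size + len);
    if (p == NULL)
        return -1;
    a->data = p;
    memcpy(a->data + a->size, seg, len);     /* collect this fragment */
    a->size += len;
    if (stop)                                /* block complete: pass it on */
        deliver(a->data, a->size);
    return 0;
}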
Note that integrity protection for a fragmented extension payload is not provided by this specification and other means should be used to ensure completeness of extension payloads.
Note that all extension payload data is assumed to be byte-aligned.
Each UsacExtElement() shall obey the requirements resulting from the use of the
usacIndependencyFlag. Put more explicitly, if the usacIndependencyFlag is set (==1) the
UsacExtElement() shall be decodable without knowledge of the previous frame (and the
extension payload that may be contained in it).
Decoding Process
The stereoConfigIndex, which is transmitted in the UsacChannelPairElementConfig(), determines the exact type of stereo coding which is applied in the given CPE. Depending on this type of stereo coding, either one or two core coder channels are actually transmitted in the bitstream and the variable nrCoreCoderChannels needs to be set accordingly. The syntax element UsacCoreCoderData() then provides the data for one or two core coder channels.
Similarly, there may be eSBR data available for one or two channels depending on the type of stereo coding and the use of eSBR (i.e. if sbrRatioIndex > 0). The value of nrSbrChannels needs to be set accordingly and the syntax element UsacSbrData() provides the eSBR data for one or two channels.
Finally, Mps212Data() is transmitted depending on the value of stereoConfigIndex.
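Assuming that the stereoConfigIndex values 0 to 3 correspond in order to the four stereo coding options enumerated for UsacChannelPairElementConfig() above (the normative assignment is given by Table ZZ, which is not reproduced here), the helper variables can be derived as in this C sketch:

/* Derives nrCoreCoderChannels and nrSbrChannels from stereoConfigIndex;
 * the case labels follow the assumed option numbering above. */
static void derive_cpe_channels(unsigned stereoConfigIndex,
                                unsigned *nrCoreCoderChannels,
                                unsigned *nrSbrChannels)
{
    switch (stereoConfigIndex) {
    case 0: *nrCoreCoderChannels = 2; *nrSbrChannels = 2; break; /* option 1 */
    case 1: *nrCoreCoderChannels = 1; *nrSbrChannels = 1; break; /* option 2 */
    case 2: *nrCoreCoderChannels = 2; *nrSbrChannels = 1; break; /* option 3: mono SBR on downmix */
    case 3: *nrCoreCoderChannels = 2; *nrSbrChannels = 2; break; /* option 4: stereo SBR after MPS212 */
    }
}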
Low frequency enhancement (LFE) channel element, UsacLfeElement()
General
In order to maintain a regular structure in the decoder, the UsacLfeElement() is defined as a standard fd_channel_stream(0,0,0,0,x) element, i.e. it is equal to a UsacCoreCoderData() using the frequency domain coder. Thus, decoding can be done using the standard procedure for decoding a UsacCoreCoderData() element.
In order to accommodate a more bitrate and hardware efficient implementation of the LFE
decoder, however, several restrictions apply to the options used for the encoding of this
element:
The window_sequence field is always set to 0 (ONLY_LONG_SEQUENCE)
Only the lowest 24 spectral coefficients of any LFE may be non-zero
No Temporal Noise Shaping is used, i.e. tns_data_present is set to 0
Time warping is not active
No noise filling is applied
UsacCoreCoderData()
The UsacCoreCoderData() contains all information for decoding one or two core coder
channels.
The order of decoding is:
• get the core_mode[] for each channel
• in case of two core coded channels (nrChannels==2), parse the StereoCoreToolInfo() and determine all stereo related parameters
• depending on the signaled core_modes, transmit an lpd_channel_stream() or an fd_channel_stream() for each channel
As can be seen from the above list, the decoding of one core coder channel (nrChannels==1) results in obtaining the core_mode bit followed by one lpd_channel_stream() or fd_channel_stream(), depending on the core_mode.
In the two core coder channel case, some signaling redundancies between channels can be
exploited, in particular if the core_mode of both channels is 0. See 6.2.X (Decoding of StereoCoreToolInfo()) for details.
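The decoding order listed above corresponds to the following C control-flow sketch; the bit reader and the channel-stream parsers are assumed interfaces, not normative code.

typedef struct BitReader BitReader;          /* assumed, as before */
extern unsigned read_bits(BitReader *br, unsigned n);
extern void parse_stereo_core_tool_info(BitReader *br, const unsigned core_mode[2]);
extern void parse_lpd_channel_stream(BitReader *br, unsigned ch);
extern void parse_fd_channel_stream(BitReader *br, unsigned ch);

/* Parses UsacCoreCoderData(): core_mode bits first, then the stereo
 * tool info in the two-channel case, then one channel stream per
 * channel according to its core_mode. */
static void parse_core_coder_data(BitReader *br, unsigned nrChannels)
{
    unsigned core_mode[2] = { 0, 0 };
    for (unsigned ch = 0; ch < nrChannels; ch++)
        core_mode[ch] = read_bits(br, 1);
    if (nrChannels == 2)
        parse_stereo_core_tool_info(br, core_mode);
    for (unsigned ch = 0; ch < nrChannels; ch++) {
        if (core_mode[ch])
            parse_lpd_channel_stream(br, ch);   /* LPD core coder */
        else
            parse_fd_channel_stream(br, ch);    /* FD core coder  */
    }
}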
StereoCoreToolInfo()
The StereoCoreToolInfo() allows parameters to be coded efficiently whose values may be shared across core coder channels of a CPE in case both channels are coded in FD mode. In particular, the following data elements are shared when the appropriate flag in the bitstream is set to 1.
Table - Bitstream elements shared across channels of a core coder channel pair
If this flag is set to 1, channels 0 and 1 share the following elements:
common_window ics_info()
common_window && common_max_sfb max_sfb
common_tw tw_data()
common_tns tns_data()
If the appropriate flag is not set then the data elements are transmitted individually for each core coder channel, either in StereoCoreToolInfo() (max_sfb, max_sfb1) or in the fd_channel_stream() which follows the StereoCoreToolInfo() in the UsacCoreCoderData() element.
In case of common_window==1 the StereoCoreToolInfo() also contains the information about M/S stereo coding and complex prediction data in the MDCT domain (see 7.7.2).
UsacSbrData() This block of data contains payload for the SBR bandwidth extension for one or two channels. The presence of this data is dependent on the sbrRatioIndex.
SbrInfo() This element contains SBR control parameters which do not require a decoder reset when changed.
SbrHeader() This element contains SBR header data with SBR configuration parameters that typically do not change over the duration of a bitstream.
SBR payload for USAC
In USAC the SBR payload is transmitted in UsacSbrData(), which is an integral part of each single channel element or channel pair element. UsacSbrData() immediately follows UsacCoreCoderData(). There is no SBR payload for LFE channels.
numSlots The number of time slots in an Mps212Data frame.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or
device corresponds to a method step or a feature of a method step. Analogously, aspects
described in the context of a method step also represent a description of a corresponding
block or item or feature of a corresponding apparatus.
Depending on certain implementation requirements, embodiments of the invention can be
implemented in hardware or in software. The implementation can be performed using a
digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an
EPROM, an EEPROM or a FLASH memory, having electronically readable control
signals stored thereon, which cooperate (or are capable of cooperating) with a
programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a non-transitory data carrier
having electronically readable control signals, which are capable of cooperating with a
programmable computer system, such that one of the methods described herein is
performed.
The encoded audio signal can be transmitted via a wireline or wireless transmission
medium or can be stored on a machine readable carrier or on a non-transitory storage
medium.
Generally, embodiments of the present invention can be implemented as a computer
program product with a program code, the program code being operative for performing
one of the methods when the computer program product runs on a computer. The program
code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods
described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program
having a program code for performing one of the methods described herein, when the
computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital
storage medium, or a computer-readable medium) comprising, recorded thereon, the
computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of
signals representing the computer program for performing one of the methods described
herein. The data stream or the sequence of signals may for example be configured to be
transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a
programmable logic device, configured to or adapted to perform one of the methods
described herein.
A further embodiment comprises a computer having installed thereon the computer
program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable
gate array) may be used to perform some or all of the functionalities of the methods
described herein. In some embodiments, a field programmable gate array may cooperate
with a microprocessor in order to perform one of the methods described herein. Generally,
the methods are preferably performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of the present
invention. It is understood that modifications and variations of the arrangements and the
details described herein will be apparent to others skilled in the art. It is the intent,
therefore, to be limited only by the scope of the impending patent claims and not by the
specific details presented by way of description and explanation of the embodiments
herein.
Claims
1. A bitstream comprising a configuration block (28) and a sequence of frames (20)
respectively representing consecutive time periods (18) of an audio content (10),
wherein the sequence of frames (20) is a composition of N sequences of frame
elements (22) with each frame element (22) being of a respective one of a plurality
of element types so that each frame (20) comprises one frame element (22) out of
the N sequences of frame elements (22), respectively, and for each sequence of
frame elements (22), the frame elements (22) are of equal element type relative to
each other,
wherein the configuration block (28) comprises, for at least one of the sequences of
frame elements (22), a default payload length information (60) on a default payload
length, and
wherein each frame element (22) of the at least one of the sequences of frame
elements (22), comprises a length information (58) comprising, for at least a subset
of the frame elements (22) of the at least one of the sequences of frame elements
(22), a default payload length flag (64) followed, if the default payload length flag
(64) is not set, by a payload length value (66),
wherein any frame element of the at least one of the sequences of frame elements
(22), the default extension payload length flag (64) of which is set, has the default
payload length, and any frame element of the at least one of the sequences of frame
elements (22), the default extension payload length flag (64) of which is not set, has
a payload length corresponding to the payload length value (66).
2. Bitstream according to claim 1, wherein the configuration block (28) comprises
a field (50) indicating a number of elements N, and
a type indication syntax portion (52) indicating, for each element position of a
sequence of N element positions, an element type out of a plurality of element
types;
wherein each frame element is of the element type indicated, by the type indication
syntax portion (52), for the respective element position at which the respective
frame element (22) is positioned within the sequence of N frame elements of the
respective frame (20) in the bitstream (12).
3. Bitstream according to claim 2, wherein the type indication syntax portion (52)
comprises a sequence of N syntax elements (54) with each syntax element (54)
indicating the element type for the respective element position at which the
respective syntax element (54) is positioned within the type indication syntax
portion (52).
4. Bitstream according to any of claims 1 to 3, wherein the configuration block (28)
comprises one configuration element (56) per sequence of frame elements (22),
comprising configuration information for the element type of which the frame
elements of the respective sequence of frame elements are.
5. Bitstream according to claim 4, wherein the type indication syntax portion (52)
comprises a sequence of N syntax elements (54) with each syntax element (54)
indicating the element type for the respective element position at which the
respective syntax element (54) is positioned within the type indication syntax
portion (52), and the configuration elements (56) and the syntax elements are
arranged in the bitstream alternately.
6. Bitstream according to claim 5 or 6, wherein for each frame element (22) of the at
least one sequence of frame elements (22), the length information (58) comprises
an extension payload present flag (70), wherein any frame element (22b), the
extension payload present flag (70) of the length information (58) of which is not
set, merely consists of the extension payload present flag (70), and the length
information (58) of any frame element (22b), the payload data present flag (70) of
the length information (58) of which is set, further comprises the default payload
length flag (64) followed, if the default payload length flag (64) is not set, by the
payload length value (66).
7. Bitstream according to any of claims 1 to 6, wherein the configuration block (28)
comprises, for the at least one of the sequences of frame elements (22), a
configuration element (56) comprising configuration information, wherein the
configuration information comprises an extension element type field (72) indicating
a payload data type out of a plurality of payload data types, wherein the plurality of
payload data types comprises a multi-channel side information type and a multi-object coding side information type, wherein the configuration information, the extension element type field (72) of which indicates the multi-channel side information type, also comprises multi-channel side information configuration data (74), and the configuration information, the extension element type field (72) of which indicates the multi-object side information type, also comprises multi-object side information configuration data (74), and the frame elements (22b) of the at least
one sequence of frame elements (22), convey payload data of the payload data type
indicated by the extension element type field (72) of the configuration information
of the configuration element for the respective sequence of frame elements.
8. Decoder for decoding a bitstream (12) comprising a configuration block (28) and a
sequence of frames (20) respectively representing consecutive time periods of an
audio content (10), wherein the sequence of frames (20) is a composition of N
sequences of frame elements (22) with each frame element (22) being of a
respective one of a plurality of element types so that each frame (20) comprises one
frame element (22) out of the N sequences of frame elements (22), respectively,
and for each sequence of frame elements (22), the frame elements (22) are of equal
element type relative to each other,
wherein the decoder is configured to parse the bitstream (12) and reconstruct the
audio content based on a subset of the sequences of frame elements and to, with
respect to at least one of the sequences of frame elements (22), not belonging to the
subset of the sequences of frame elements,
read from the configuration block (28), for the at least one of the sequences of
frame elements (22), a default payload length information (60) on a default payload
length, and
for each frame element (22) of the at least one of the sequences of frame elements
(22), read a length information from the bitstream (12), the reading of the length
information (58) comprising, for at least a subset of the frame elements (22) of the
at least one of the sequences of frame elements (22), reading a default payload
length flag (64) followed, if the default payload length flag (64) is not set, by
reading a payload length value (66),
skip, in parsing the bitstream (12), any frame element of the at least one of the
sequences of frame elements (22), the default extension payload length flag (64) of
which is set, using the default payload length as skip interval length, and any frame
element of the at least one of the sequences of frame elements (22), the default
extension payload length flag (64) of which is not set, using a payload length
corresponding to the payload length value (66) as skip interval length.
9. Decoder according to claim 8, wherein the decoder is configured to, in reading the configuration block (28), read a field (50) indicating the number of elements N,
and a type indication syntax portion (52) indicating, for each element position of a
sequence of N element positions, an element type out of a plurality of element
types, wherein the decoder is configured to decode each frame (20) by
decoding each frame element (22) in accordance with the element type indicated,
by the type indication syntax portion, for the respective element position at which
the respective frame element is positioned within the sequence of N frame elements
(22) of the respective frame (20) in the bitstream (12).
10. Decoder according to claim 9, wherein the decoder is configured to read a sequence
of N syntax elements (54) from the type indication syntax portion (52), with each
element indicating the element type for the respective element position at which the
respective syntax element is positioned in the sequence of N syntax elements.
11. Decoder according to any of claims 8 to 10, wherein the decoder is configured to
read a configuration element (56) for each sequence of frame elements from the
configuration block (28), with each configuration element comprising
configuration information for the respective sequence of frame elements, wherein
the decoder is configured to, in reconstructing the audio content based on a subset
of the sequences of frame elements, decode each frame element (22) of the subset
of the sequences of frame elements using the configuration information of the
respective configuration element.
12. Decoder according to claim 11, wherein the type indication syntax portion (52)
comprises a sequence of N syntax elements (54) with each syntax element (54)
indicating the element type for the respective element position at which the
respective syntax element (54) is positioned within the type indication syntax
portion (52), the decoder is configured to read the configuration elements (56) and
the syntax elements (54) from the bitstream (12) alternately.
13. Decoder according to any of claims 8 to 12, wherein
the decoder is configured to, in reading the length information (58) of any frame
element of the at least one sequence of frame elements, read an extension payload
present flag (70) from the bitstream (12), check as to whether the extension payload
present flag (70) is set, and, if the extension payload present flag (70) is not set,
cease reading the respective frame element (22b) and proceed with reading another
frame element (22) of a current frame (20) or a frame element of a subsequent
frame (20), and if the payload data present flag (70) is set, proceed with reading the default payload length flag (64) followed, if the default payload length flag (64) is not set, by the payload length value (66) from the bitstream (12), and with the
skipping.
14. Decoder according to any of claims 8 to 13, wherein
the decoder is configured to, in reading the default payload length information (60),
read a default payload length present flag from the bitstream (12),
check as to whether the default payload length present flag is set,
if the default payload length present flag is not set, set the default extension
payload length to be zero, and
if the default payload length present flag is set, explicitly read the default
extension payload length from the bitstream.
15. Decoder according to any of claims 8 to 14, wherein
the decoder is configured to, in reading the configuration block (28), for each
sequence of frame elements of the at least one sequence of frame elements,
read a configuration element (56) comprising configuration information for an
extension element type from the bitstream (12), wherein the configuration
information comprises an extension element type field (72) indicating a payload
data type out of a plurality of payload data types.
16. Decoder according to claim 15, wherein the plurality of payload data types
comprises a multi-channel side information type and a multi-object coding side
information type,
the decoder is configured to, in reading the configuration block (28), for each of the
at least one sequence of frame elements,
if the extension element type field (72) indicates the multi-channel side
information type, read multi-channel side information configuration data
(74) as part of the configuration information from the data stream (12), and
if the extension element type field (72) indicates the multi-object side
information type, read multi-object side information configuration data (74)
as part of the configuration information from the data stream, and
the decoder is configured to, in decoding each frame,
decode the frame elements of any of the at least one sequence of frame elements,
for which the extension element type of the configuration element (56) indicates the
multi-channel side information type, by configuring a multi-channel decoder (44e)
using the multi-channel side information configuration data (74) and feeding the
thus configured multi-channel decoder (44e) with payload data (68) of the frame
elements (22b) of the respective sequence of frame elements as multi-channel side
information, and
decode the frame elements of any of the at least one sequence of frame elements,
for which the extension element type of the configuration element (56) indicates the
multi-object side information type, by configuring a multi-object decoder (44d)
using the multi-object side information configuration data (74) and feeding the thus
configured multi-object decoder (44d) with payload data (68) of the frame elements
(22) of the respective sequence of frame elements.
17. Decoder according to claim 15 or 16, wherein the decoder is configured to, for any
of the at least one sequence of frame elements,
read a configuration data length field (76) from the bitstream (12) as part of the
configuration information of the configuration element for the respective sequence
of frame elements,
check as to whether the payload data type indicated by the extension element type
field (72) of the configuration information of the configuration element for the
respective sequence of frame elements, belongs to a predetermined set of payload
data types being a subset of the plurality of payload data types,
if the payload data type indicated by the extension element type field (72) of the
configuration information of the configuration element for the respective sequence
of frame elements, belongs to the predetermined set of payload data types,
read payload data dependent configuration data (74) as part of the
configuration information of the configuration element for the respective
sequence of frame elements from the data stream (12), and
decode the frame elements of the respective sequence of frame elements in
the frames (20), using the payload data dependent configuration data (74),
and
if the payload data type indicated by the extension element type field (72) of the
configuration information of the configuration element for the respective sequence
of frame elements, does not belong to the predetermined set of payload data types,
skip the payload data dependent configuration data (74) using the
configuration data length, and
skip the frame elements of the respective sequence of frame elements in the
frames (20) using the length information (58) therein.
18. Decoder according to any of claims 8 to 17, wherein
the decoder is configured to, in reading the configuration block (28), for each of the at least one sequence of frame elements,
read a configuration element (56) comprising configuration information for
an extension element type from the bitstream (12), wherein the
configuration information comprises a fragmentation use flag (78), and
the decoder is configured to, in reading frame elements (22) of any sequence of
frame elements for which the fragmentation use flag (78) of the configuration
element is set,
read a fragment information from the bitstream, and
use the fragment information to put payload data of these frame elements of
consecutive frames together.
19. Decoder according to any of claims 8 to 18, wherein the decoder is configured such
that the decoder reconstructs an audio signal from frame elements (22) of one of the
subset of the sequences of frame elements which are of a single channel element
type.
20. Decoder according to any of claims 8 to 19, wherein the decoder is configured such
that the decoder reconstructs an audio signal from frame elements (22) of one of the
subset of the sequences of frame elements which are of a channel pair element type.
21. Decoder according to any of claims 8 to 20, wherein the decoder is configured to
use the same variable length code to read the length information (80), the extension
element type field (72), and the configuration data length field (76).
22. Encoder for encoding of an audio content into a bitstream, the encoder being configured to
encode consecutive time periods (18) of the audio content (10) into a sequence of
frames (20) respectively representing the consecutive time periods (18) of the audio
content (10), such that the sequence of frames (20) is a composition of N sequences
of frame elements (22) with each frame element (22) being of a respective one of a
plurality of element types so that each frame (20) comprises one frame element (22)
out of the N sequences of frame elements (22), respectively, and for each sequence
of frame elements (22), the frame elements (22) are of equal element type relative
to each other,
encode into the bitstream (12) a configuration block (28) which comprises, for at
least one of the sequences of frame elements (22), a default payload length
information (60) on a default payload length, and
encoding each frame element (22) of the at least one of the sequences of frame
elements (22) into the bitstream (12) such that same comprises a length information
(58) comprising, for at least a subset of the frame elements (22) of the at least one
of the sequences of frame elements (22), a default payload length flag (64)
followed, if the default payload length flag (64) is not set, by a payload length value
(66), and that any frame element of the at least one of the sequences of frame
elements (22), the default extension payload length flag (64) of which is set, has the
default payload length, and any frame element of the at least one of the sequences
of frame elements (22), the default extension payload length flag (64) of which is
not set, has a payload length corresponding to the payload length value (66).
23. Method for decoding a bitstream (12) comprising a configuration block (28) and a
sequence of frames (20) respectively representing consecutive time periods of an
audio content, wherein the sequence of frames (20) is a composition of N sequences
of frame elements (22) with each frame element (22) being of a respective one of a
plurality of element types so that each frame (20) comprises one frame element (22)
out of the N sequences of frame elements (22), respectively, and for each sequence
of frame elements (22), the frame elements (22) are of equal element type relative
to each other, wherein the method comprises parsing the bitstream (12) and
reconstructing the audio content based on a subset of the sequences of frame
elements and, with respect to at least one of the sequences of frame elements
(22), not belonging to the subset of the sequences of frame elements,
reading from the configuration block (28), for the at least one of the sequences of
frame elements (22), a default payload length information (60) on a default payload
length, and
for each frame element (22) of the at least one of the sequences of frame elements
(22), reading a length information from the bitstream (12), the reading of the length
information comprising, for at least a subset of the frame elements (22) of the at
least one of the sequences of frame elements (22), reading a default payload length
flag (64) followed, if the default payload length flag (64) is not set, by reading a
payload length value (66),
skipping, in parsing the bitstream (12), any frame element of the at least one of the
sequences of frame elements (22), the default extension payload length flag (64) of
which is set, using the default payload length as skip interval length, and any frame
element of the at least one of the sequences of frame elements (22), the default
extension payload length flag (64) of which is not set, using a payload length
corresponding to the payload length value (66) as skip interval length.
24. Method for encoding of an audio content into a bitstream, the method comprising
encoding consecutive time periods (18) of the audio content (10) into a sequence of
frames (20) respectively representing the consecutive time periods (18) of the audio
content (10), such that the sequence of frames (20) is a composition of N sequences
of frame elements (22) with each frame element (22) being of a respective one of a
plurality of element types so that each frame (20) comprises one frame element (22)
out of the N sequences of frame elements (22), respectively, and for each sequence
of frame elements (22), the frame elements (22) are of equal element type relative
to each other,
encoding into the bitstream (12) a configuration block (28) which comprises, for
at least one of the sequences of frame elements (22), a default payload length
information (60) on a default payload length, and
encoding each frame element (22) of the at least one of the sequences of frame
elements (22) into the bitstream (12) such that same comprises a length information
(58) comprising, for at least a subset of the frame elements (22) of the at least one
of the sequences of frame elements (22), a default payload length flag (64)
followed, if the default payload length flag (64) is not set, by a payload length value
(66), and that any frame element of the at least one of the sequences of frame
elements (22), the default extension payload length flag (64) of which is set, has the
default payload length, and any frame element of the at least one of the sequences
of frame elements (22), the default extension payload length flag (64) of which is
not set, has a payload length corresponding to the payload length value (66).
25. Computer program for performing, when running on a computer, the method of
claim 23 or claim 24.

Documents

Orders

Section Controller Decision Date

Application Documents

# Name Date
1 2804-KOLNP-2013-(23-09-2013)-PCT SEARCH REPORT & OTHERS.pdf 2013-09-23
2 2804-KOLNP-2013-(23-09-2013)-GPA.pdf 2013-09-23
3 2804-KOLNP-2013-(23-09-2013)-FORM-5.pdf 2013-09-23
4 2804-KOLNP-2013-(23-09-2013)-FORM-3.pdf 2013-09-23
5 2804-KOLNP-2013-(23-09-2013)-FORM-2.pdf 2013-09-23
6 2804-KOLNP-2013-(23-09-2013)-FORM-1.pdf 2013-09-23
7 2804-KOLNP-2013-(23-09-2013)-CORRESPONDENCE.pdf 2013-09-23
8 2804-KOLNP-2013.pdf 2013-10-03
9 2804-KOLNP-2013-(05-12-2013)-PA.pdf 2013-12-05
10 2804-KOLNP-2013-(05-12-2013)-CORRESPONDENCE.pdf 2013-12-05
11 2804-KOLNP-2013-(05-12-2013)-ASSIGNMENT.pdf 2013-12-05
12 2804-KOLNP-2013-FORM-18.pdf 2014-01-02
13 2804-KOLNP-2013-(11-04-2014)-OTHERS.pdf 2014-04-11
14 2804-KOLNP-2013-(11-04-2014)-CORRESPONDENCE.pdf 2014-04-11
15 Other Patent Document [13-06-2016(online)].pdf 2016-06-13
16 Other Patent Document [28-07-2016(online)].pdf 2016-07-28
17 Other Patent Document [05-10-2016(online)].pdf 2016-10-05
18 Other Patent Document [08-12-2016(online)].pdf 2016-12-08
19 Other Patent Document [04-04-2017(online)].pdf 2017-04-04
20 Information under section 8(2) [23-06-2017(online)].pdf 2017-06-23
21 2804-KOLNP-2013-Information under section 8(2) (MANDATORY) [19-07-2017(online)].pdf 2017-07-19
22 2804-KOLNP-2013-Information under section 8(2) (MANDATORY) [15-11-2017(online)].pdf 2017-11-15
23 2804-KOLNP-2013-Information under section 8(2) (MANDATORY) [20-03-2018(online)].pdf 2018-03-20
24 2804-KOLNP-2013-Information under section 8(2) (MANDATORY) [04-04-2018(online)].pdf 2018-04-04
25 2804-KOLNP-2013-Information under section 8(2) (MANDATORY) [04-04-2018(online)]_17.pdf 2018-04-04
26 2804-KOLNP-2013-FER.pdf 2018-04-25
27 2804-KOLNP-2013-Information under section 8(2) (MANDATORY) [15-09-2018(online)].pdf 2018-09-15
28 2804-KOLNP-2013-FORM 4(ii) [23-10-2018(online)].pdf 2018-10-23
29 2804-KOLNP-2013-ABSTRACT [23-11-2018(online)].pdf 2018-11-23
30 2804-KOLNP-2013-CLAIMS [23-11-2018(online)].pdf 2018-11-23
31 2804-KOLNP-2013-COMPLETE SPECIFICATION [23-11-2018(online)].pdf 2018-11-23
32 2804-KOLNP-2013-CORRESPONDENCE [23-11-2018(online)].pdf 2018-11-23
33 2804-KOLNP-2013-FER_SER_REPLY [23-11-2018(online)].pdf 2018-11-23
34 2804-KOLNP-2013-OTHERS [23-11-2018(online)].pdf 2018-11-23
35 2804-KOLNP-2013-PETITION UNDER RULE 137 [23-11-2018(online)].pdf 2018-11-23
36 2804-KOLNP-2013-RELEVANT DOCUMENTS [23-11-2018(online)].pdf 2018-11-23
37 2804-KOLNP-2013-Information under section 8(2) (MANDATORY) [21-12-2018(online)].pdf 2018-12-21
38 2804-kolnp-2013-Information under section 8(2) (MANDATORY) [22-03-2019(online)].pdf 2019-03-22
39 2804-KOLNP-2013-Information under section 8(2) (MANDATORY) [15-07-2019(online)].pdf 2019-07-15
40 2804-KOLNP-2013-Information under section 8(2) (MANDATORY) [25-11-2019(online)].pdf 2019-11-25
41 2804-KOLNP-2013-HearingNoticeLetter-(DateOfHearing-11-03-2020).pdf 2020-02-25
42 2804-KOLNP-2013-FORM-26 [05-03-2020(online)].pdf 2020-03-05
43 2804-KOLNP-2013-Correspondence to notify the Controller [05-03-2020(online)].pdf 2020-03-05
44 2804-KOLNP-2013-Written submissions and relevant documents [23-03-2020(online)].pdf 2020-03-23
45 2804-KOLNP-2013-IntimationOfGrant20-04-2020.pdf 2020-04-20
46 2804-KOLNP-2013-PatentCertificate20-04-2020.pdf 2020-04-20
47 2804-KOLNP-2013-RELEVANT DOCUMENTS [07-09-2021(online)].pdf 2021-09-07
48 2804-KOLNP-2013-RELEVANT DOCUMENTS [09-09-2022(online)].pdf 2022-09-09
49 2804-KOLNP-2013-RELEVANT DOCUMENTS [13-09-2022(online)].pdf 2022-09-13
50 2804-KOLNP-2013-RELEVANT DOCUMENTS [27-09-2022(online)].pdf 2022-09-27
51 2804-KOLNP-2013-PROOF OF ALTERATION [23-05-2023(online)].pdf 2023-05-23
52 2804-KOLNP-2013-RELEVANT DOCUMENTS [07-08-2023(online)].pdf 2023-08-07
53 2804-KOLNP-2013-RELEVANT DOCUMENTS [07-09-2023(online)].pdf 2023-09-07
54 2804-KOLNP-2013-RELEVANT DOCUMENTS [25-09-2023(online)].pdf 2023-09-25
55 2804-KOLNP-2013-FORM-27 [31-07-2024(online)].pdf 2024-07-31

Search Strategy

1 2804kolnp2013_07-02-2018.pdf

ERegister / Renewals

3rd: 07 Jul 2020

From 19/03/2014 - To 19/03/2015

4th: 07 Jul 2020

From 19/03/2015 - To 19/03/2016

5th: 07 Jul 2020

From 19/03/2016 - To 19/03/2017

6th: 07 Jul 2020

From 19/03/2017 - To 19/03/2018

7th: 07 Jul 2020

From 19/03/2018 - To 19/03/2019

8th: 07 Jul 2020

From 19/03/2019 - To 19/03/2020

9th: 07 Jul 2020

From 19/03/2020 - To 19/03/2021

10th: 01 Mar 2021

From 19/03/2021 - To 19/03/2022

11th: 28 Feb 2022

From 19/03/2022 - To 19/03/2023

12th: 27 Feb 2023

From 19/03/2023 - To 19/03/2024

13th: 26 Feb 2024

From 19/03/2024 - To 19/03/2025

14th: 25 Feb 2025

From 19/03/2025 - To 19/03/2026