AUDIO ENCODER, AUDIO DECODER, METHOD FOR ENCODING AN AUDIO
INFORMATION, METHOD FOR DECODING AN AUDIO INFORMATION AND COMPUTER
PROGRAM USING AN OPTIMIZED HASH TABLE
5 Technical Field
Embodiments according to the invention are related to an audio decoder for providing a
decoded audio information on the basis of an encoded audio information, an audio encoder for
providing an encoded audio information on the basis of an input audio information, a method
10 for providing a decoded audio information on the basis of an encoded audio information, a
method for providing an encoded audio information on the basis of an input audio information
and a computer program.
Embodiments according to the invention are related to an improved spectral noiseless coding,
15 which can be used in an audio encoder or decoder, like, for example, a so-called unifiedspeech-
and-audio coder (USAC).
Embodiment according to the invention are related to an update of spectral coding tables for
application in a current USAC specification.
20
Background of the Invention
In the following, the background of the invention will be briefly explained in order to
facilitate the understanding of the invention and the advantages thereof. During the past
25 decade, big efforts have been put on creating the possibility to digitally store and distribute
audio contents with good bitrate efficiency. One important achievement on this way is the
definition of the International Standard ISO/IEC 14496-3. Part 3 of this Standard is related to
an encoding and decoding of audio contents, and subpart 4 of part 3 is related to general audio
coding. ISO/IEC 14496 part 3, subpart 4 defines a concept for encoding and decoding of
30 general audio content. In addition, further improvements have been proposed in order to
improve the quality and/or to reduce the required bit rate.
According to the concept described in said Standard, a time-domain audio signal is converted
into a time-frequency representation. The transform from the time-domain to the time-
35 frequency-domain is typically performed using transform blocks, which are also designated as
"frames", of time-domain samples. It has been found that it is advantageous to use
overlapping frames, which are shifted, for example, by half a frame, because the overlap
allows to efficiently avoid (or at least reduce) artifacts. In addition, it has been found that a
windowing should be performed in order to avoid the artifacts originating from this
processing of temporally limited frames.
By transforming a windowed portion of the input audio signal from the time-domain to the
time-frequency domain, an energy compaction is obtained in many cases, such that some of
the spectral values comprise a significantly larger magnitude than a plurality of other spectral
values. Accordingly, there are, in many cases, a comparatively small number of spectral
values having a magnitude, which is significantly above an average magnitude of the spectral
values. A typical example of a time-domain to time-frequency domain transform resulting in
an energy compaction is the so-called modified-discrete-cosine-transform (MDCT).
The spectral values are often scaled and quantized in accordance with a psychoacoustic
model, such that quantization errors are comparatively smaller for psychoacoustically more
important spectral values, and are comparatively larger for psychoacoustically less-important
spectral values. The scaled and quantized spectral values are encoded in order to provide a
bitrate-efficient representation thereof.
For example, the usage of a so-called Huffman coding of quantized spectral coefficients is
described in the International Standard ISO/IEC 14496-3:2005(E), part 3, subpart 4.
However, it has been found that the quality of the coding of the spectral values has a
significant impact on the required bitrate. Also, it has been found that the complexity of an
audio decoder, which is often implemented in a portable consumer device, and which should
therefore be cheap and of low power consumption, is dependent on the coding used for
encoding the spectral values.
In view of this situation, there is a need for a concept for an encoding and decoding of an
audio content, which provides for an improved trade-off between bitrate-efficiency and
resource efficiency.
Summary of the Invention
An embodiment according to the invention creates an audio decoder for providing a plurality
of decoded spectral values on the basis of an arithmetically encoded representation of the
spectral values. The audio decoder also comprises a frequency-domain-to-time-domain
converter for providing a time-domain audio representation using the decoded spectral values,
in order to obtain the decoded audio information. The arithmetic decoder is configured to
select a mapping rule describing a mapping of a code value representing a spectral value, or a
most-significant bit-plane of a spectral value, in an encoded form, onto a symbol code
representing a spectral value, or a most-significant bit-plane of a spectral value, in a decoded
form, in dependence on a context state described by a numeric current context value. The
arithmetic decoder is configured to determine the numeric current context value in
dependence on a plurality of previously decoded spectral values. The arithmetic decoder is
configured to evaluate a hash table, entries of which define both significant state values
amongst the numeric context values and boundaries of intervals of numeric context values, in
order to select the mapping rule. The arithmetic decoder is configured to evaluate the hash
table for finding a hash table index value i for which the value ari_hash_m[i]»8 is equal or
greater than c, while, if the found hash table index value i is greater than 0, the value
ari hash_m[i-l]»8 is lower than c. Moreover, the arithmetic decoder is configured to select
the mapping rule which is determined by a probability model index (pki) which equals to
ari_hash_m[i]&&OxFF when ari_hashjn[i]»8 is equal to c, or equals to ari_lookup_m[i]
otherwise. In the present embodiment, the hash table ari_hash_m is defined as given in Figs.
22(1), 22(2), 22(3) and 22(4). Moreover, the mapping table ari_lookup_m is defined as given
in Fig. 21.
It has been found that the combination of the above mentioned algorithm with the hash table
of Figs. 22(1) to 22(4) allows for a particularly efficient selection of a mapping rule, as the
hash table in accordance with Fig. 22(1) to 22(4) defines, in a particularly well-suited manner,
both significant values of the numeric context value and state intervals. Moreover, the
interaction between said algorithm and the hash table in accordance with Figs. 22(1) to 22(4)
has shown to bring along particularly good results while keeping computational complexity
reasonable small. Moreover, the mapping table defined in Fig. 2 1 is also particularly welladapted
to said algorithm when taken in combination with the above mentioned hash table. To
summarize, the usage of the hash table as given in Figs. 22(1) to 22(4) and of the mapping
table as defined in Fig. 22 in connection with the algorithm as described above brings along a
good coding/decoding efficiency and a low computational complexity.
In a preferred embodiment, the arithmetic decoder is configured to evaluate the hash table
using the algorithm as defined in Fig. 5e, wherein c designates a variable representing the
numeric current context value or a scaled version thereof, wherein i is a variable describing a
current hash table index value, wherein in i_min is a variable initialed to designate a hash
table index value of a first entry of the hash table and selectively updated in dependence on a
comparison between c and (j »8). In the above mentioned algorithm, the condition
"c<(j»8)" defines that a state value described by the variable c is smaller than a state value
described by the table entry ari_hash_m[i]. Also, in the above mentioned algorithm, "j&OxFF"
describes a mapping rule index value described by the table entry ari_hash_m[i]. Further
i_max is a variable initialized to designate a hash table index value of a last entry of the hash
table an selectively updated in dependence on a comparison between c and (j> ) - The
condition "c>(j»8)" defines that a state value described by the variable c is larger than a state
value described by the table entry ari_hash_m[i] . The return value of said algorithm
designates an index pki of a probability model and is a mapping rule index value.
"ari_hash_m" designates the hash table, and "ari_hash_m[i]" designates an entry of the hash
table ari_hash-m having hash table index value i. "ari_lookup_m" designates a mapping table,
and "ari_lookup_m[i_max]" designates an entry of the mapping table ari_lookup_m having
mapping index value i_max.
It has been found that the combination of the above mentioned algorithm, as shown in Fig. 5e,
with the hash table of Figs. 22(1) to 22(4) allows for a particularly efficient selection of a
mapping rule, as the hash in accordance with Fig. 22(1) to 22(4) defines, in a particularly
well-suited manner, both significant values of the numeric context value and state intervals.
Moreover, the interaction between said algorithm in accordance with Fig. 5e and the hash
table in accordance with Figs. 22(1) to 22(4) has shown to bring along particularly good
results in combination with a fast algorithm for the table search. Moreover, the mapping table
defined in Fig. 1 is also particularly well-adapted to said algorithm when taken in
combination with the above mentioned hash table. To summarize, the usage of the hash table
as given in Figs. 22(1) to 22(4) and of the mapping table as defined in Fig. 22 in connection
with the algorithm as defined in Fig. 5e brings along a good coding/decoding efficiency and a
low computational complexity. In other words, it has been found that the bi-section algorithm
of Fig. 5e is well suited to operate with the tables ari_hash_m and ari_lookup_m, as defined
above.
However, it should be noted that slight changes (which are easily feasible) or even significant
changes of the search algorithm may be made without changing the concept according to the
present invention.
In other words, the search method is not constrained to the mentioned methods. Even though
the usage of the bi-section method (for example, according to Fig. 5e) further improves the
performance, it would also be possible to perform a simple exhaustive search, which,
nevertheless brings along some increase of complexity.
In a preferred embodiment, the arithmetic decoder is configured to select the mapping rule
describing a mapping of a code value onto a symbol code on the basis of the mapping rule
index value pki, which is, for example provided as a return value of the algorithm shown in
Fig. 5e. The usage of said mapping rule index value pki is very efficient, because the
interaction of the above mentioned tables and the above mentioned algorithm is optimized for
providing a meaningful mapping rule index value.
In a preferred embodiment, the arithmetic decoder is configured to use the mapping rule index
value as a table index value to select the mapping rule describing a mapping of a code value
onto a symbol code. The usage of the mapping rule index value as a table index value allows
for a computationally efficient an memory efficient selection of the mapping rule.
In a preferred embodiment, the arithmetic decoder is configured to select one of the sub-tables
of the table ari_cf_m[64][17], as defined in Fig. 23(1), 23(2), 23(3), as the selected mapping
rule. This concept is based on the fact that the mapping rules defined by sub tables of the table
ari_cf_m[64][17], as defined in Fig. 23(1 ),(2), (3), are well-adapted to the results which can
be achieved by the execution of the above mentioned algorithm according to Fig. 5e in
combination with the tabled in accordance with Figs. 2 1 and 22(1) to 22(4).
In a preferred embodiment, the arithmetic decoder is configured to obtain the numeric context
value on the basis of a numeric previous context value using an algorithm in accordance with
Fig. 5c, wherein the algorithm receives, as input values, a value of a variable c representing a
numeric previous context value, a value or a variable i representing an index of a two-tuple of
spectral values to decode in a vector of spectral values. A value or a variable N represents a
window length of a reconstruction window of the frequency-domain-to-time-domainconverter.
The algorithm provides, as an output value, an updated value or variable c
representing the numeric current context value. In the algorithm, an operation "c»4"
describes a shift to the right by 4 bits of the value or variable c. Moreover, q[0][i+l]
designates a context sub region value associate with a previous audio frame and having
associated a higher frequency index i+1 , higher by 1, than a current frequency index of a twototal
of spectral values to be currently decoded. Similarly q[l][i-l] designates a context sub
region value associated with a current audio frame and having associated a smaller frequency
index i-1, smaller by 1, then a current frequency index of a two-tuple of spectral values to be
currently decoded. q[l ][i-2] designates a context sub region value associated with a current
audio frame and having associated a smaller frequency index i-2, smaller by 2, than a current
frequency index of a two-tuple of spectral values to be currently decoded. q[l][i-3] designates
a context sub region value associated with the current audio frame and having associated a
smaller frequency index i-3, smaller by 3, than a current frequency index of a two-tuple of
spectral values to be currently decoded. It has been found that the algorithm according to Fig.
5e when taken in combination with the tables of Fig. 2 1 and 22(1) to 22(4) is well-adapted to
provide the mapping rule index value on the basis of a numeric current context value c
obtained using the algorithm of Fig. 5c, wherein obtaining the numeric current context value
using the algorithm of Fig. 5c is computationally particularly efficient, because the algorithm
according to Fig. 5c requires only a very simple computation.
In a preferred embodiment, the arithmetic decoder is configured to update a context sub
region value q[l][i] associated with a current audio frame and having associated the current
frequency index of the two-tuple of spectral values currently decoded using an algorithm
according to Fig. 51, wherein a designates an absolute value of a first spectral value of the
two-tuple of the spectral values currently decoded, and wherein b designates a second spectral
value of the two-tuple of spectral values currently decoded. It can be seen that the preferred
algorithm is very well-suited for a simple update of the context sub region values.
In a preferred embodiment, the arithmetic decoder is configured to provide a decoded value m
representing a two-tuple of decoded spectral values using the arithmetic decoding algorithm
according to Fig. 5g. It has been found that said arithmetic decoding algorithm is very wellsuited
for the cooperation with the above described algorithms.
Another embodiment according to the invention creates a decoder for providing a decoded
audio information on the basis of an encoded audio information. The audio decoder comprises
an arithmetic decoder for providing a plurality of decoded spectral values on the basis of an
arithmetically encoded representation of the spectral values. The audio decoder also
comprises a frequency-domain-to-time-domain converter for providing a time-domain audio
representation using the decoded spectral values, in order to obtain the decoded audio
information. The arithmetic decoder is configured to select a mapping rule describing a
mapping of a code value representing a spectral value, or a most-significant bit-plane of a
spectral value, in a encoded form, onto a symbol code representing a spectral value, or a mostsignificant
bit-plane of a spectral value, in a decoded form, in dependence on a context state
described by a numeric current context value. The arithmetic decoder is configured to
determine the numeric current context value in dependence on a plurality of previously
decoded spectral values. The arithmetic decoder is configured to evaluate the hash table,
entries of which define both significant state values amongst the numeric context values and
boundaries of intervals of numeric context values, in order to select the mapping rule. The
hash table ari_hash_m is defined as given in Fig. 22(1), 22(2), 22(3) and 22(4). The arithmetic
decoder is configured to evaluate the hash table to determine whether the numeric current
context value is identical to a table context value described by an entry of the hash table or to
determine an interval described by entries of the hash table within which the numeric current
context value lies, and to derive a mapping rule index value describing a selected mapping
rule in dependence on a result of a evaluation. It has been found that the hash table
ari_hash_m ais given in Figs. 22(1) to 22(4) is well-suited for a parsing for table context
values described by entries of the hash table and intervals described by entries of the hash
table, to thereby derive the mapping index value. It has been found that the definition of both
table context values and intervals by the hash table in accordance with Fig. 22(1) to 22(4)
provides an efficient mechanism for the selection of the mapping rule when taken in
combination with a simple concept for the evaluation of the hash table which uses entries of
said hash table both to a check for table context values and to determine in which interval
defined by entries of the hash table a non-table context values lies.
In a preferred embodiment, the arithmetic decoder is configured to compare the numeric
current context value, or a scaled version of the numeric current context value, with a series of
the numerically ordered entries or sub-entries of the hash table, to iteratively obtain a hash
table index value of a table entry, such that the numeric current context value lies within a
interval defined by the obtained hash table entry designated by the obtained hash table index
value and an adjacent hash table entry. In this case, the arithmetic decoder is configured to
determine a next entry of the series of entries of the hash table in dependence on a result of a
comparison between the numeric current context value, or a scaled version of the numeric
current context value, and a current entry or sub-entry. It has been recognized that this
mechanism allows for a particularly efficient evaluation of the hash table in accordance with
Fig. 22(1) to 22(4).
In a preferred embodiment, the arithmetic decoder is configured to select a mapping rule
defined by a second sub-entry of the hash table designated by the current hash table index
value if it is found that the numeric current context value or a scaled version thereof is equal
to the first sub-entry of the hash table designated by the current hash table index value.
Accordingly, the entries of the hash-table, as defined in accordance with Figs. 22(1) to 22(4)
take over a double function. A first sub-entry (i.e., a first portion of an entry) of the hash table
is used for identifying particularly significant states of the numeric (current) context value,
while a second sub-entry of the hash table (i.e., a second part of such an entry) defines a
mapping rule, for example, by defining a mapping rule index value. Thus, the entries of the
hash table are used in a very efficient manner. Also, the mechanism is particularly efficient in
providing mapping rule index values for the particularly important states of the numeric
current context values, which are described by entries of the hash table, or, more precisely, by
sub-entries of the hash table. Thus, a complete entry of the hash table, as defined in Figs.
22(1) to 22(4), defines rules amapping of a particularly important state of the numeric
(current) context value to a mapping rule and interval boundaries of regions (or intervals) of
less important states of the numeric current context value.
In a preferred embodiment, the arithmetic decoder is configured to select a mapping rule
defined by an entry or sub-entry of a mapping table ari_lookup_m if it is not found that the
numeric current context value is equal to a sub-entry of the hash table. In this case, the
arithmetic decoder is configured to choose the entry or sub-entry of the mapping table in
dependence on the iteratively obtained hash table index value. Thus, a particularly efficient
two-table mechanism is created, which allows to efficiently provide a mapping rule index
value both for particularly important states of the numeric current context value and for less
important states of the numeric current context value (wherein the less important states of the
numeric current context value are not explicitly, i.e. individually, described by entries or subentries
of the hash table).
In a preferred embodiment, the arithmetic decoder is configured to selectively provide a
mapping rule index value defined by the entry of the hash table designated by the obtained
hash table index value if it is found that the numeric current context value equals the value
defined by the entry of the hash table designated by the current hash table index value. Thus,
there is an efficient mechanism which allows for the double-usage of the entries of the hash
table.
Further embodiments of the invention create methods for providing a decoded audio
information on the basis of an encoded audio information. Said methods fulfill the
functionality of the audio decoders discussed before. Accordingly, the methods are based on
the same ideas and findings as the audio decoders, such that a discussion is omitted here for
the sake of brevity. It should be noted that the methods can be supplemented by any of the
features and functionalities of the audio decoders.
Another embodiment according to the invention creates an audio decoder for providing an
encoded audio information on the basis of an input audio information. The audio encoder
comprises an energy-compacting time-domain-to-frequency-domain converter for providing a
frequency-domain audio representation on the basis of a time-domain representation of the
input audio information, such that the frequency-domain audio representation comprises a set
of spectral values. The audio encoder also comprises an arithmetic encoder configured to
encode a spectral value or a preprocessed version thereof using a variable length codeword.
The arithmetic encoder is configured to map a spectral value, or a value of a most significant
bit-plane of a spectral value, onto a code value. The arithmetic encoder is also configured to
select a mapping rule describing a mapping of a spectral value, or a most significant bit-plane
of a spectral value, onto a code value, in dependence on a context state described by a
numeric current context value. The arithmetic encoder is configured to determine the numeric
current context value in dependence on a plurality of previously-encoded spectral values. The
arithmetic encoder is also configured to evaluate a hash table, entries of which define both
significant state values amongst the numeric context values and boundaries of intervals of
numeric context values, in order to select the mapping rule. The hash table ari hash m is
defined as given in Figs. 22(1) to 22(4). The arithmetic encoder is configured to evaluate the
hash table to determine whether the numeric current context value is identical to a table
context value described by an entry of the hash table or to determine an interval described by
entries of the hash table within which the numeric current context value lies, and to derive a
mapping rule index value describing a selected mapping rule in dependence on a result of said
evaluation. t should be noted that the functionality of the audio encoder is in parallel with the
functionality of the audio decoder discussed above. Accordingly, reference is made to the
above discussion of the key ideas of the audio decoder for the sake of brevity.
Moreover, it should be noted that the audio encoder can be supplemented by any of the
features and functionalities of the audio decoder. In particular, any of the features regarding
the selection of the mapping rule can be implemented in the audio encoder as well, wherein
encoded spectral values take the place of decoded spectral values, and so on.
Another embodiment according to the invention creates a method for providing an encoded
audio information on the basis for an input audio information. The method performs the
functionality of the audio encoder described before in is based on the same ideas.
Another embodiment according to the invention creates a computer program for performing at
least one of the methods described before.
Brief Description of the Figures
Embodiments according to the present invention will subsequently be described taking
reference to the enclosed figures, in which:
Fig 1 shows a block schematic diagram of an audio encoder, according to an
embodiment of the invention;
Fig 2 shows a block schematic diagram of an audio decoder, according to an
embodiment of the invention:
Fig 3 shows a pseudo-program-code representation of an algorithm
"values_decode()' for decoding spectral values;
Fig 4 shows a schematic representation of a context for a state calculation;
Fig 5a shows a pseudo-program-code representation of an algorithm
"arith_map_context()" for mapping a context;
Fig 5b shows a pseudo-program-code representation of another algorithm
"arith_map_context()" for mapping a context;
Fig 5c shows a pseudo-program-code representation of an algorithm
"arith_get_contextO" for obtaining a context state value;
Fig 5 shows a pseudo-program-code representation of another algorithm
"arith_get_context()" for obtaining a context state value;
Fig 5e shows a pseudo-program-code representation of an algorithm "arith_get_pk()"
for deriving a cumulative-frequencies-table index value "pki" from a state
value (or a state variable) ;
shows a pseudo-program-code representation of another algorithm
"arith_get_pk()" for deriving a cumulative-frequencies-table index value "pki"
from a state value (or a state variable);
shows a pseudo-program-code representation of an algorithm "arith_decode()"
for arithmetically decoding a symbol from a variable length codeword;
shows a first part of a pseudo-program-code representation of another
algorithm "arith_decode()" for arithmetically decoding a symbol from a
variable length codeword;
shows a second part of a pseudo-program-code representation of the other
algorithm "arith_decode()" for arithmetically decoding a symbol from a
variable length codeword;
shows a pseudo-program-code representation of an algorithm for deriving
absolute values a,b of spectral values from a common value m;
shows a pseudo-program-code representation of an algorithm for entering the
decoded values a,b into an array of decoded spectral values;
shows a pseudo-program-code representation of an algorithm
"arith_update_context()" for obtaining a context subregion value on the basis
of absolute values a,b of decoded spectral values;
shows a pseudo-program-code representation of an algorithm "arith_finish()"
for filling entries of an array of decoded spectral values and an array of context
subregion values;
shows a pseudo-program-code representation of another algorithm for deriving
absolute values a,b of decoded spectral values from a common value m;
shows a pseudo-program-code representation of an algorithm
"arith_update_context()" for updating an array of decoded spectral values and
an array of context subregion values;
Fig 5p shows a pseudo-program-code representation of an algorithm
"arith_save_context()" for filling entries of an array of decoded spectral values
and entries of an array of context subregion values;
Fig 5q shows a legend of definitions;
Fig 5 shows another legend of definitions;
Fig 6a shows a syntax representation of a unified-speech-and-audio-coding (USAC)
raw data block;
Fig 6 shows a syntax representation of a single channel element;
Fig 6c shows a syntax representation of a channel pair element;
Fig 6d shows a syntax representation of an "ICS" control information;
Fig 6e shows a syntax representation of a frequency-domain channel stream;
Fig 6f shows a syntax representation of arithmetically coded spectral data;
Fig 6g shows a syntax representation for decoding a set of spectral values;
Fig 6h shows another syntax representation for decoding a set of spectral values;
Fig 6i shows a legend of data elements and variables;
Fig 6j shows another legend of data elements and variables;
Fig. 6k shows a syntax representation of a USAC single channel element
"UsacSingleChannelElementO" ;
Fig. 6 1 shows a syntax representation of a USAC channel pair element
"UsacChannelPairElementO";
Fig. 6m shows a syntax representation of an "ICS" control information;
shows a syntax representation of USAC core coder data
"UsacCoreCoderData";
shows a syntax representation of a frequency domain channel stream
"fd_channel_stream() ";
shows a syntax representation of arithmetically coded spectral data
"ac_spectral_data()" ;
shows a block schematic diagram of an audio encoder, according to the first
aspect of the invention;
shows a block schematic diagram of an audio decoder, according to the first
aspect of the invention;
shows a graphical representation of a mapping of a numeric current context
value onto a mapping rule index value, according to the first aspect of the
invention;
shows a block schematic diagram of an audio encoder, according to a second
aspect of the invention;
shows a block schematic diagram of an audio decoder, according to the second
aspect of the invention;
shows a block schematic diagram of an audio encoder, according to a third
aspect of the invention;
shows a block schematic diagram of an audio decoder, according to the third
aspect of the invention;
shows a schematic representation of a context for a state calculation, as it is
used in accordance with working draft 4 of the USAC Draft Standard;
shows an overview of the tables as used in the arithmetic coding scheme
according to working draft 4 of the USAC Draft Standard;
Fig 15a shows a schematic representation of a context for a state calculation, as it is
used in embodiments according to the invention;
Fig 15b shows an overview of the tables as used in the arithmetic coding scheme
according to a comparison example;
Fig 16a shows a graphical representation of a read-only memory demand for the
noiseless coding scheme according a comparison example, and according to
working draft 5 of the USAC Draft Standard, and according to the AAC
(advanced audio coding) Huffman Coding;
Fig 16b shows a graphical representation of a total USAC decoder data read-only
memory demand in accordance with a comparison example and in accordance
with the concept according to working draft 5 of the USAC Draft Standard;
Fig 17 shows a schematic representation of an arrangement for a comparison of a
noiseless coding according to working draft 3 or working draft 5 of the USAC
Draft Standard with a coding scheme according to the comparison example;
Fig 18 shows a table representation of average bit rates produced by a USAC
arithmetic coder according to working draft 3 of the USAC Draft Standard and
according to a comparison example;
Fig 1 shows a table representation of minimum and maximum bit reservoir levels for
an arithmetic decoder according to working draft 3 of the USAC Draft
Standard and for an arithmetic decoder according to a comparison example;
Fig 20 shows a table representation of average complexity numbers for decoding a 32-
kbits bitstream according to working draft 3 of the USAC Draft Standard for
different versions of the arithmetic coder;
Fig. 2 1 shows a table representation of a content of a table "ari_lookup_rn[742]",
according to an embodiment of the invention;
Figs 22(1) to 22(4) show a table representation of a content of a table "ari_hash_m[742]",
according to an embodiment of the invention;
Figs 23(1) to 23(3) show a table representation of a content of a table "ari_cf_m[64][17]",
according to an embodiment of the invention; and
Fig 24 shows a table representation of a content of a table "ari_cf_r[]";
Fig. 25 shows a schematic representation of a context for a state calculation;
Fig. 26 shows a table representation of an averaged coding performance for
transcoding of WD6 reference quality bitstreams for a comparison example
("M17558") and for an embodiment according to the invention ("New
Proposal");
Fig. 27 shows a table representation of a coding performance for transcoding of WD6
reference quality bitstreams per operating point for a comparison example
("Ml 7558") and for an embodiment according to the invention ("Re-trained
tables")
Fig. 28 shows a table representation of a comparison of Noiseless Coder Memory
Demand for WD6, for a comparison example ("Ml 7588") and for an
embodiment according to the invention ("New Proposal");
Fig. 29 shows a table representation of characteristics of tables as used in an
embodiment according to the invention ("Re-trained coding scheme");
Fig. 3 shows a table representation of average complexity numbers for decoding the
32 kbit/s WD6 reference quality bitstreams for the different arithmetic coder
versions;
Fig. 31 shows a table representation of average complexity numbers for decoding the
12 kbit/s WD6 reference quality bitstreams for the different arithmetic coder
versions;
Fig. 32 shows a table representation of average bitrates produced by the arithmetic
coder in an embodiment according to the invention and in the WD6;
Fig. 33 shows a table representation of minimum, maximum and average bitrates of
USAC on a frame basis using the proposed scheme;
shows a table representation of average bitrates produced by a USAC coder
using D6 arithmetic coder and a coder according to an embodiment
according to the invention ("new proposal");
shows a table representation of best and worst cases for an embodiment
according to the invention;
shows a table representation of bitreservoir limit for an embodiment according
to the invention;
shows a syntax representation of arithmetically coded data "arith_data",
according to an embodiment of the invention;
shows a legend of definitions an help elements;
shows another legend of definitions;
shows a pseudo-program-code representation of a function or algorithm
"arith_map_context", according to an embodiment of the invention;
shows a pseudo-program-code representation of a function or algorithm
"arith_get_context", according to an embodiment of the invention;
shows a pseudo-program-code representation of a function or algorithm
"arith_map_pk", according to an embodiment of the invention;
shows a pseudo-program-code representation of a first portion of a function or
algorithm "arith_decode", according to an embodiment of the invention;
shows a pseudo-program-code representation of a second portion of a function
or algorithm "arith_decode", according to an embodiment of the invention;
shows a pseudo-program-code representation of a function or algorithm for
decoding one or more least significant bits, according to an embodiment of the
invention;
Fig. 40g shows a pseudo-program-code representation of a function or algorithm
"arith_update_context", according to an embodiment of the invention;
Fig. 40h shows a pseudo-program-code representation of a function or algorithm
"arith_save_context", according to an embodiment of the invention;
Figs. 41(1) and 41(2) show a table representation of a content of a table
"ari_lookup_m[742]" according to an embodiment of the invention;
Figs. 42 (1),(2),(3),(4) show a table representation of a content of a table
"ari_hash_m[742]", according to an embodiment of the invention;
Figs. 43 (1),(2),(3),(4),(5),(6) show a table representation of a content of a table
"ari_cf_m[96][17]", according to an embodiment of the invention; and
Fig. 44 shows a table representation of a table "ari_cf_r[4]", according to an
embodiment of the invention.
Detailed Description of the Embodiments
1. Audio Encoder according to Fig 7
Fig 7 shows a block schematic diagram of an audio encoder, according to an embodiment of
the invention. The audio encoder 700 is configured to receive an input audio information 710
and to provide, on the basis thereof, an encoded audio information 712.
The audio encoder comprises an energy-compacting time-domain-to-frequency-domain
converter 720 which is configured to provide a frequency-domain audio representation 722 on
the basis of a time-domain representation of the input audio information 710, such that the
frequency-domain audio representation 722 comprises a set of spectral values.
The audio encoder 700 also comprises an arithmetic encoder 730 configured to encode a
spectral value (out of the set of spectral values forming the frequency-domain audio
representation 722), or a pre-processed version thereof, using a variable-length codeword in
order to obtain the encoded audio information 712 (which may comprise, for example, a
plurality of variable-length codewords).
The arithmetic encoder 730 is configured to map a spectral value, or a value of a mostsignificant
bit-plane of a spectral value, onto a code value (i.e. onto a variable-length
codeword) in dependence on a context state.
The arithmetic encoder is configured to select a mapping rule describing a mapping of a
spectral value, or of a most-significant bit-plane of a spectral value, onto a code value, in
dependence on a (current) context state. The arithmetic encoder is configured to determine the
current context state, or a numeric current context value describing the current context state,
in dependence on a plurality of previously-encoded (preferably, but not necessarily, adjacent)
spectral values.
For this purpose, the arithmetic encoder is configured to evaluate a hash-table, entries of
which define both significant state values amongst the numeric context values and boundaries
of intervals of numeric context values.
The hash_table (also designated as "ari_hash_m" in the following) is preferably defined as
given in the table representation of Figs. 22(1), 22(2), 22(3) and 22(4).
Moreover, the arithmetic encoder is preferably configured to evaluate the hash
table(ari hash m), to determine whether the numeric current context value is identical to a
table context value described by entries of the hash table (ari_hash_m) and/or to determine an
interval described by entries of the hash table (ari_hash_m) within which the numeric current
context value lies, and to derive a mapping rule index value (for example, designated with
"pki" herein) describing a selected mapping rule in dependence on a result of the evaluation.
In some cases, a mapping rule index value may be individually associated to a numeric
(current) context value being a significant state value. Also, a common mapping rule index
value may be associated to different numeric (current) context values lying within an interval
bounded by interval boundaries (wherein the interval boundaries are preferably defined by the
entries of the hash table).
As can be seen, the mapping of a spectral value (of the frequency-domain audio
representation 722), or of a most-significant bit-plane of a spectral value, onto a code value
(of the encoded audio information 712), may be performed by a spectral value encoding 740
using a mapping rule 742. A state tracker 750 may be configured to track the context state.
The state tracker 750 provides an information 754 describing the current context state. The
information 754 describing the current context state may preferably take the form of a
numeric current context value. A mapping rule selector 760 is configured to select a mapping
rule, for example, a cumulative-frequencies-table, describing a mapping of a spectral value, or
of a most-significant bit-plane of a spectral value, onto a code value. Accordingly, the
mapping rule selector 760 provides the mapping rule information 742 to the spectral value
encoding 740. The mapping rule information 742 may take the form of a mapping rule index
value or of a cumulative-frequencies-table selected in dependence on a mapping rule index
value. The mapping rule selector 760 comprises (or at least evaluates) a hash-table 752,
entries of which define both significant state values amongst the numeric context values and
boundaries and intervals of numeric context values. Preferably, the entries of the hash table
762 (ari_hash_m[742J) are defined as given in the table representation of Figs. 22(1) to 22(4).
The hash-table 762 is evaluated in order to select the mapping rule, i.e. in order to provide the
mapping rule information 742.
Preferably, but not necessarily, a mapping rule index value may be individually associated to
a numeric context value being a significant state value, and a common mapping rule index
value may be associated to different numeric context values lying within an interval bounded
by interval boundaries.
To summarize the above, the audio encoder 700 performs an arithmetic encoding of a
frequency-domain audio representation provided by the time-domain-to-frequency-domain
converter. The arithmetic encoding is context-dependent, such that a mapping rule (e.g. a
cumulative-frequencies-table) is selected in dependence on previously encoded spectral
values. Accordingly, spectral values adjacent in time and/or frequency (or, at least, within a
predetermined environment) to each other and/or to the currently-encoded spectral value (i.e.
spectral values within a predetermined environment of the currently encoded spectral value)
are considered in the arithmetic encoding to adjust the probability distribution evaluated by
the arithmetic encoding. When selecting an appropriate mapping rule, numeric context current
values 754 provided by a state tracker 750 are evaluated. As typically the number of different
mapping rules is significantly smaller than the number of possible values of the numeric
current context values 754, the mapping rule selector 760 allocates the same mapping rules
(described, for example, by a mapping rule index value) to a comparatively large number of
different numeric context values. Nevertheless, there are typically specific spectral
configurations (represented by specific numeric context values) to which a particular mapping
rule should be associated in order to obtain a good coding efficiency.
It has been found that the selection of a mapping rule in dependence on a numeric current
context value can be performed with particularly high computational efficiency if entries of a
single hash-table define both significant state values and boundaries of intervals of numeric
(current) context values. Moreover, it has been found that the usage of the hash table as
defined in Figs 22(1), 22(2), 22(3), 22(4) brings along a particularly high coding efficiency. It
has been found that this mechanism, in combination with said hash table, is well-adapted to
the requirements of the mapping rule selection, because there are many cases in which a
single significant state value (or significant numeric context value) is embedded between a
left-sided interval of a plurality of non-significant state values (to which a common mapping
rule is associated) and a right-sided interval of a plurality of non-significant state values (to
which a common mapping rule is associated). Also, the mechanism of using a single hashtable,
entries of which are defined in the tables of Figs. 22(1), 22(2), 22(3), 22(4) and define
both significant state values and boundaries of intervals of numeric (current) context values
can efficiently handle different cases, in which, for example, there are two adjacent intervals
of non-significant state values (also designated as non-significant numeric context values)
without a significant state value in between. A particularly high computational efficiency is
achieved due to a number of table accesses being kept small. For example, a single iterative
table search is sufficient in most embodiments in order to find out whether the numeric
current context value is equal to any of the significant state values defined by the entries of
said hash table, or in which of the intervals of non-significant state values the numeric current
context value lays. Consequently, the number of table accesses which are both, timeconsuming
and energy-consuming, can be kept small. Thus, the mapping rule selector 760,
which uses the hash-table 762, may be considered as a particularly efficient mapping rule
selector in terms of computational complexity, while still allowing to obtain a good encoding
efficiency (in terms of bitrate).
Further details regarding the derivation of the mapping rule information 742 from the numeric
current context value 754 will be described below.
2. Audio Decoder according to Fig. 8
Fig. 8 shows a block schematic diagram of an audio decoder 800. The audio decoder 800 is
configured to receive an encoded audio information 10 and to provide, on the basis thereof, a
decoded audio information 812.
The audio decoder 800 comprises an arithmetic decoder 820 which is configured to provide a
plurality of spectral values 822 on the basis of an arithmetically encoded representation 821 of
the spectral values.
The audio decoder 800 also comprises a frequency-domain-to-time-domain converter 830
which is configured to receive the decoded spectral values 822 and to provide the timedomain
audio representation 812, which may constitute the decoded audio information, using
the decoded spectral values 822, in order to obtain a decoded audio information 812.
The arithmetic decoder 820 comprises a spectral value determinator 824, which is configured
to map a code value of the arithmetically-encoded representation 82 of spectral values onto a
symbol code representing one or more of the decoded spectral values, or at least a portion (for
example, a most-significant bit-plane) of one or more of the decoded spectral values. The
spectral value determinator 824 may be configured to perform a mapping in dependence on a
mapping rule, which may be described by a mapping rule information 828a. The mapping rule
information 828a may, for example, take the form of a mapping rule index value, or of a
selected cumulative-frequencies-table (selected, for example, in dependence on a mapping
rule index value).
The arithmetic decoder 820 is configured to select a mapping rule (e.g. a cumulativefrequencies-
table) describing a mapping of code values (described by the arithmeticallyencoded
representation 821 of spectral values) onto a symbol code (describing one or more
spectral values, or a most-significant bit-plane thereof, in a decoded form) in dependence on a
context state (which may be described by the context state information 826a).
The arithmetic decoder 820 is configured to determine the current context state (described by
the numeric current context value) in dependence on a plurality of previously-decoded
spectral values. For this purpose, a state tracker 826 may be used, which receives an
information describing the previously-decoded spectral values and which provides, on the
basis thereof, a numeric current context value 826a describing the current context state.
The arithmetic decoder is also configured to evaluate a hash-table 829, entries of which define
both significant state values amongst the numeric context values and boundaries of intervals
of numeric context values, in order to select the mapping rule. Preferably, the entries of the
hash table 829 (ari_hash_m[742]) are defined as given in the table representation of Figs.
22(1) to 22(4). The hash-table 829 is evaluated in order to select the mapping rule, i.e. in
order to provide the mapping rule information 829.
Preferably, a mapping rule index value is individually associated to a numeric context value
being a significant state value, and a common mapping rule index value is associated to
different numeric context values lying within an interval bounded by interval boundaries. The
evaluation of the hash-table 829 may, for example, be performed using a hash-table evaluator
which may be part of the mapping rule selector 828. Accordingly, a mapping rule information
828a, for example, in the form of a mapping rule index value, is obtained on the basis of the
numeric current context value 826a describing the current context state. The mapping rule
selector 828 may, for example, determine the mapping rule index value 828a in dependence
on a result of the evaluation of the hash-table 829. Alternatively, the evaluation of the hashtable
829 may directly provide the mapping rule index value.
Regarding the functionality of the audio signal decoder 800, it should be noted that the
arithmetic decoder 820 is configured to select a mapping rule (e.g. a cumulative-frequenciestable)
which is, on average, well adapted to the spectral values to be decoded, as the mapping
rule is selected in dependence on the current context state (described, for example, by the
numeric current context value), which in turn is determined in dependence on a plurality of
previously-decoded spectral values. Accordingly, statistical dependencies between adjacent
spectral values to be decoded can be exploited. Moreover, the arithmetic decoder 820 can be
implemented efficiently, with a good trade-off between computational complexity, table size,
and coding efficiency, using the mapping rule selector 828. By evaluating a (single) hashtable
829, entries of which describe both significant state values and interval boundaries of
intervals of non-significant state values, a single iterative table search may be sufficient in
order to derive the mapping rule information 828a from the numeric current context value
826a. Moreover, it has been found that the usage of the hash table as defined in Figs 22(1),
22(2), 22(3), 22(4) brings along a particularly high coding efficiency. Accordingly, it is
possible to map a comparatively large number of different possible numeric (current) context
values onto a comparatively smaller number of different mapping rule index values. By using
the hash-table 829, as described above, and as defined in the table representation of Figs.
22(1) to 22(4), it is possible to exploit the finding that, in many cases, a single isolated
significant state value (significant context value) is embedded between a left-sided interval of
non-significant state values (non-significant context values) and a right-sided interval of non¬
significant state values (non-significant context values), wherein a different mapping rule
index value is associated with the significant state value (significant context value), when
compared to the state values (context values) of the left-sided interval and the state values
(context values) of the right-sided interval. However, usage of the hash-table 829 is also wellsuited
for situations in which two intervals of numeric state values are immediately adjacent,
without a significant state value in between.
To conclude, the mapping rule selector 828, which evaluates the hash-table 829
"ari_hash_m[742], brings along a particularly good efficiency when selecting a mapping rule
(or when providing a mapping rule index value) in dependence on the current context state (or
in dependence on the numeric current context value describing the current context state),
because the hashing mechanism is well-adapted to the typical context scenarios in an audio
decoder.
Further details will be described below.
3 . Context Value Hashing Mechanism According to Fig. 9
In the following, a context hashing mechanism will be disclosed, which may be implemented
in the mapping rule selector 760 and/or the mapping rule selector 828. The hash-table 762
and/or the hash-table 829, as defined in the table representation of Figs. 22(1) to 22(4), may
be used in order to implement said context value hashing mechanism.
Taking reference now to Fig. 9, which shows a numeric current context value hashing
scenario, further details will be described. In the graphic representation of Fig. 9, an abscissa
910 describes values of the numeric current context value (i.e. numeric context values). An
ordinate 912 describes mapping rule index values. Markings 914 describe mapping rule index
values for non-significant numeric context values (describing non-significant states).
Markings 916 describe mapping rule index values for "individual" (true) significant numeric
context values describing individual (true) significant states. Markings 916 describe mapping
rule index values for "improper" numeric context values describing "improper" significant
states, wherein an "improper" significant state is a significant state to which the same
mapping rule index value is associated as to one of the adjacent intervals of non-significant
numeric context values.
As can be seen, a hash-table entry "ari_hash_m[il]" describes an individual (true) significant
state having a numeric context value of cl. As can be seen, the mapping rule index value
mrivl is associated to the individual (true) significant state having the numeric context value
cl. Accordingly, both the numeric context value cl and the mapping rule index value mrivl
may be described by the hash-table entry "ari_hash_m[il]". An interval 932 of numeric
context values is bounded by the numeric context value cl, wherein the numeric context value
c1 does not belong to the interval 932, such that the largest numeric context value of interval
932 is equal to cl - 1. A mapping rule index value of mriv4 (which is different from mrivl) is
associated with the numeric context values of the interval 932. The mapping rule index value
mriv4 may, for example, be described by the table entry "ari_lookup_m[il-l]" of an
additional table "ari_lookup_m".
Moreover, a mapping rule index value mriv2 may be associated with numeric context values
lying within an interval 934. A lower bound of interval 934 is determined by the numeric
context value cl, which is a significant numeric context value, wherein the numeric context
value c does not belong to the interval 932. Accordingly, the smallest value of the interval
934 is equal to cl + 1 (assuming integer numeric context values). Another boundary of the
interval 934 is determined by the numeric context value c2, wherein the numeric context
value c2 does not belong to the interval 934, such that the largest value of the interval 934 is
equal to c2 - 1. The numeric context value c2 is a so-called "improper" numeric context
value, which is described by a hash-table entry "ari_hash_m[i2]". For example, the mapping
rule index value mriv2 may be associated with the numeric context value c2, such that the
numeric context value associated with the "improper" significant numeric context value c2 is
equal to the mapping rule index value associated with the interval 934 bounded by the
numeric context value c2. Moreover, an interval 936 of numeric context value is also bounded
by the numeric context value c2, wherein the numeric context value c2 does not belong to the
interval 936, such that the smallest numeric context value of the interval 936 is equal to c2 +
1. A mapping rule index value mriv3, which is typically different from the mapping rule
index value mriv2, is associated with the numeric context values of the interval 936.
As can be seen, the mapping rule index value mriv4, which is associated to the interval 932 of
numeric context values, may be described by an entry "ari_lookup_m[il-l]" of a table
"ari_lookup_m", the mapping rule index mriv2, which is associated with the numeric context
values of the interval 934, may be described by a table entry "ari_lookup_m[il]" of the table
"ari_lookup_m", and the mapping rule index value mriv3 may be described by a table entry
"ari_lookup_m[i2]" of the table "ari_lookup_m". In the example given here, the hash-table
index value i2, may be larger, by 1, than the hash-table index value il.
As can be seen from Fig. 9, the mapping rule selector 760 or the mapping rule selector 828
may receive a numeric current context value 764, 826a, and decide, by evaluating the entries
of the table "ari sh n", whether the numeric current context value is a significant state
value (irrespective of whether it is an "individual" significant state value or an "improper"
significant state value), or whether the numeric current context value lies within one of the
intervals 932, 934, 936, which are bounded by the ("individual" or "improper") significant
state values cl, c2. Both the check whether the numeric current context value is equal to a
significant state value cl, c2 and the evaluation in which of the intervals 932, 934, 936 the
numeric current context value lies (in the case that the numeric current context value is not
equal to a significant state value) may be performed using a single, common hash table
search.
Moreover, the evaluation of the hash-table "ari_hash_m" may be used to obtain a hash-table
index value (for example, il-1, il or i2). Thus, the mapping rule selector 760, 828 may be
configured to obtain, by evaluating a single hash-table 762, 829 (for example, the hash-table
"ari_hash_m"), a hash-table index value (for example, il-1, i l or i2) designating a significant
state value (e.g., cl or c2) and/or an interval (e.g., 932,934,936) and an information as to
whether the numeric current context value is a significant context value (also designated as
significant state value) or not.
Moreover, if it is found in the evaluation of the hash-table 762, 829, "ari_hash_m", that the
numeric current context value is not a "significant" context value (or "significant" state
value), the hash-table index value (for example, il-1, i l or i2) obtained from the evaluation of
the hash-table ("ari_hash_m") may be used to obtain a mapping rule index value associated
with an interval 932, 934, 936 of numeric context values. For example, the hash-table index
value (e.g., il-1, il or i2) may be used to designate an entry of an additional mapping table
(for example, "ari_lookup_m"), which describes the mapping rule index values associated
with the interval 932, 934, 936 within which the numeric current context value lies.
For further details, reference is made to the detailed discussion below of the algorithm
"arith_get_pk" (wherein there are different options for this algorithm "arith_get_pk()",
examples of which are shown in Figs. 5eand 5f).
Moreover, it should be noted that the size of the intervals may differ from one case to another.
In some cases, an interval of numeric context values comprises a single numeric context
value. However, in many cases, an interval may comprise a plurality of numeric context
values.
4. Audio Encoder According to Fig. 0
Fig. 10 shows a block schematic diagram of an audio encoder 1000 according to an
embodiment of the invention. The audio encoder 1000 according to Fig. 10 is similar to the
audio encoder 700 according to Fig. 7, such that identical signals and means are designated
with identical reference numerals in Figs. 7 and 10.
The audio encoder 1000 is configured to receive an input audio information 710 and to
provide, on the basis thereof, an encoded audio information 712. The audio encoder 1000
comprises an energy-compacting time-domain-to-frequency-domain converter 720, which is
configured to provide a frequency-domain representation 722 on the basis of a time-domain
representation of the input audio information 710, such that the frequency-domain audio
representation 722 comprises a set of spectral values. The audio encoder 1000 also comprises
an arithmetic encoder 1030 configured to encode a spectral value (out of the set of spectral
values forming the frequency-domain audio representation 722), or a pre-processed version
thereof, using a variable-length codeword to obtain the encoded audio information 712 (which
may comprise, for example, a plurality of variable-length codewords).
The arithmetic encoder 1030 is configured to map a spectral value, or a plurality of spectral
values, or a value of a most-significant bit-plane of a spectral value or of a plurality of
spectral values, onto a code value (i.e. onto a variable-length codeword) in dependence on a
context state. The arithmetic encoder 1030 is configured to select a mapping rule describing a
mapping of a spectral value, or of a plurality of spectral values, or of a most-significant bitplane
of a spectral value or of a plurality of spectral values, onto a code value in dependence
on a context state. The arithmetic encoder is configured to determine the current context state
in dependence on a plurality of previously-encoded (preferably, but no necessarily adjacent)
spectral values. For this purpose, the arithmetic encoder is configured to modify a number
representation of a numeric previous context value, describing a context state associated with
one or more previously-encoded spectral values (for example, to select a corresponding
mapping rule), in dependence on a context sub-region value, to obtain a number
representation of a numeric current context value describing a context state associated with
one or more spectral values to be encoded (for example, to select a corresponding mapping
rule).
As can be seen, the mapping of a spectral value, or of a plurality of spectral values, or of a
most-significant bit-plane of a spectral value or of a plurality of spectral values, onto a code
value may be performed by a spectral value encoding 740 using a mapping rule described by
a mapping rule information 742. A state tracker 750 may be configured to track the context
state. The state tracker 750 may be configured to modify a number representation of a
numeric previous context value, describing a context state associated with an encoding of one
or more previously-encoded spectral values, in dependence on a context sub-region value, to
obtain a number representation of a numeric current context value describing a context state
associated with an encoding of one or more spectral values to be encoded. The modification
of the number representation of the numeric previous context value may, for example, be
performed by a number representation modifier 1052, which receives the numeric previous
context value and one or more context sub-region values and provides the numeric current
context value. Accordingly, the state tracker 1050 provides an information 754 describing the
current context state, for example, in the form of a numeric current context value. A mapping
rule selector 1060 may select a mapping rule, for example, a cumulative-frequencies-table,
describing a mapping of a spectral value, or of a plurality of spectral values, or of a mostsignificant
bit-plane of a spectral value or of a plurality of spectral values, onto a code value.
Accordingly, the mapping rule selector 1060 provides the mapping rule information 742 to
the spectral encoding 740.
It should be noted that, in some embodiments, the state tracker 1050 may be identical to the
state tracker 750 or the state tracker 826. It should also be noted that the mapping rule selector
1060 may, in some embodiments, be identical to the mapping rule selector 760, or the
mapping rule selector 828. Preferably, the mapping rule selector 828 may be configured to
use a hash table "ari_hash_m[742]", as defined in the table representation of Figs. 22(1) to
22(4), for the selection of the mapping rule. For example, the mapping rule selector may
perform the functionality as described above with reference to Figs. 7 and 8.
To summarize the above, the audio encoder 1000 performs an arithmetic encoding of a
frequency-domain audio representation provided by the time-domain-to-frequency-domain
converter. The arithmetic encoding is context dependent, such that a mapping rule (e.g. a
cumulative-frequencies-table) is selected in dependence on previously-encoded spectral
values. Accordingly, spectral values adjacent in time and/or frequency (or at least within a
predetermined environment) to each other and/or to the currently-encoded spectral value (i.e.
spectral values within a predetermined environment of the currently-encoded spectral value)
are considered in the arithmetic encoding to adjust the probability distribution evaluated by
the arithmetic encoding.
When determining the numeric current context value, a number representation of a numeric
previous context value, describing a context state associated with one or more previouslyencoded
spectral values, is modified in dependence on a context sub-region value, to obtain a
number representation of a numeric current context value describing a context state associated
with one or more spectral values to be encoded. This approach allows avoiding a complete recomputation
of the numeric current context value, which complete re-computation consumes
a significant amount of resources in conventional approaches. A large variety of possibilities
exist for the modification of the number representation of the numeric previous context value,
including a combination of a re-scaling of a number representation of the numeric previous
context value, an addition of a context sub-region value or a value derived therefrom to the
number representation of the numeric previous context value or to a processed number
representation of the numeric previous context value, a replacement of a portion of the
number representation (rather than the entire number representation) of the numeric previous
context value in dependence on the context sub-region value, and so on. Thus, typically the
numeric representation of the numeric current context value is obtained on the basis of the
number representation of the numeric previous context value and also on the basis of at least
one context sub-region value, wherein typically a combination of operations are performed to
combine the numeric previous context value with a context sub-region value, such as for
example, two or more operations out of an addition operation, a subtraction operation, a
multiplication operation, a division operation, a Boolean-AND operation, a Boolean-OR
operation, a Boolean-NAND operation, a Boolean NOR operation, a Boolean-negation
operation, a complement operation or a shift operation. Accordingly, at least a portion of the
number representation of the numeric previous context value is typically maintained
unchanged (except for an optional shift to a different position) when deriving the numeric
current context value from the numeric previous context value. In contrast, other portions of
the number representation of the numeric previous context value are changed in dependence
on one or more context sub-region values. Thus, the numeric current context value can be
obtained with a comparatively small computational effort, while avoiding a complete recomputation
of the numeric current context value.
Thus, a meaningful numeric current context value can be obtained, which is well-suited for
the use by the mapping rule selector 1060, and which is particularly well suited for use in
combination with the hash table ari_hash_m as defined in the table representation of Figs.
22(1),22(2),22(3),22(4).
Consequently, an efficient encoding can be achieved by keeping the context calculation
sufficiently simple.
5. Audio Decoder According to Fig. 11
Fig. 1 shows a block schematic diagram of an audio decoder 1100. The audio decoder 1100
is similar to the audio decoder 800 according to Fig. 8, such that identical signals, means and
functionalities are designated with identical reference numerals.
The audio decoder 1100 is configured to receive an encoded audio information 810 and to
provide, on the basis thereof, a decoded audio information 812. The audio decoder 1100
comprises an arithmetic decoder 1120 that is configured to provide a plurality of decoded
spectral values 822 on the basis of an arithmetically-encoded representation 821 of the
spectral values. The audio decoder 1100 also comprises a frequency-domain-to-time-domain
converter 830 which is configured to receive the decoded spectral values 822 and to provide
the time-domain audio representation 812, which may constitute the decoded audio
information, using the decoded spectral values 822, in order to obtain a decoded audio
information 812.
The arithmetic decoder 1120 comprises a spectral value determinator 824, which is
configured to map a code value of the arithmetically-encoded representation 821 of spectral
values onto a symbol code representing one or more of the decoded spectral values, or at least
a portion (for example, a most-significant bit-plane) of one or more of the decoded spectral
values. The spectral value determinator 824 may be configured to perform the mapping in
dependence on a mapping rule, which may be described by a mapping rule information 828a.
The mapping rule information 828a may, for example, comprise a mapping rule index value,
or may comprise a selected set of entries of a cumulative-frequencies-table.
The arithmetic decoder 1120 is configured to select a mapping rule (e.g., a cumulativefrequencies-
table) describing a mapping of a code value (described by the arithmeticallyencoded
representation 821 of spectral values) onto a symbol code (describing one or more
spectral values) in dependence on a context state, which context state may be described by the
context state information 1126a. The context state information 1126a may take the form of a
numeric current context value. The arithmetic decoder 1120 is configured to determine the
current context state in dependence on a plurality of previously-decoded spectral values 822.
For this purpose, a state tracker 1126 may be used, which receives an information describing
the previously-decoded spectral values. The arithmetic decoder is configured to modify a
number representation of numeric previous context value, describing a context state
associated with one or more previously decoded spectral values, in dependence on a context
sub-region value, to obtain a number representation of a numeric current context value
describing a context state associated with one or more spectral values to be decoded. A
modification of the number representation of the numeric previous context value may, for
example, be performed by a number representation modifier 1127, which is part of the state
tracker 1126. Accordingly, the current context state information 1126a is obtained, for
example, in the form of a numeric current context value. The selection of the mapping rule
may be performed by a mapping rule selector 1128, which derives a mapping rule information
828a from the current context state information 1126a, and which provides the mapping rule
information 828a to the spectral value determinator 824. Preferably, the mapping rule selector
1128 may be configured to use a hash table "ari_hash_m[742]" as defined in the table
representation of Figs. 22(1) to 22(4), for the selection of the mapping rule. For example, the
mapping rule selector may perform the functionality as described above with reference to
Figs. 7 and 8.
Regarding the functionality of the audio signal decoder 1100, it should be noted that the
arithmetic decoder 1120 is configured to select a mapping rule (e.g., a cumulativefrequencies-
table) which is, on average, well-adapted to the spectral value to be decoded, as
the mapping rule is selected in dependence on the current context state, which, in turn, is
determined in dependence on a plurality of previously-decoded spectral values. Accordingly,
statistical dependencies between adjacent spectral values to be decoded can be exploited.
Moreover, by modifying a number representation of a numeric previous context value
describing a context state associated with a decoding of one or more previously decoded
spectral values, in dependence on a context sub-region value, to obtain a number
representation of a numeric current context value describing a context state associated with a
decoding of one or more spectral values to be decoded, it is possible to obtain a meaningful
information about the current context state, which is well-suited for a mapping to a mapping
rule index value, and which is particularly well suited for use in combination with the hash
table ari_hash_m as defined in the table representation of Figs. 22(1),22(2),22(3),22(4), with
comparatively small computational effort. By maintaining at least a portion of a number
representation of the numeric previous context value (possibly in a bit-shifted or a scaled
version) while updating another portion of the number representation of the numeric previous
context value in dependence on the context sub-region values which have not been considered
in the numeric previous context value but which should be considered in the numeric current
context value, a number of operations to derive the numeric current context value can be kept
reasonably small. Also, it is possible to exploit the fact that contexts used for decoding
adjacent spectral values are typically similar or correlated. For example, a context for a
decoding of a first spectral value (or of a first plurality of spectral values) is dependent on a
first set of previously-decoded spectral values. A context for decoding of a second spectral
value (or a second set of spectral values), which is adjacent to the first spectral value (or the
first set of spectral values) may comprise a second set of previously-decoded spectral values.
As the first spectral value and the second spectral value are assumed to be adjacent (e.g., with
respect to the associated frequencies), the first set of spectral values, which determine the
context for the coding of the first spectral value, may comprise some overlap with the second
set of spectral values, which determine the context for the decoding of the second spectral
value. Accordingly, it can easily be understood that the context state for the decoding of the
second spectral value comprises some correlation with the context state for the decoding of
the first spectral value. A computational efficiency of the context derivation, i.e. of the
derivation of the numeric current context value, can be achieved by exploiting such
correlations. It has been found that the correlation between context states for a decoding of
adjacent spectral values (e.g., between the context state described by the numeric previous
context value and the context state described by the numeric current context value) can be
exploited efficiently by modifying only those parts of the numeric previous context value
which are dependent on context sub-region values not considered for the derivation of the
numeric previous context state, and by deriving the numeric current context value from the
numeric previous context value.
To conclude, the concepts described herein allow for a particularly good computational
efficiency when deriving the numeric current context value.
Further details will be described below.
6. Audio Encoder According to Fig. 1
Fig. 12 shows a block schematic diagram of an audio encoder, according to an embodiment of
the invention. The audio encoder 1200 according to Fig. 12 is similar to the audio encoder
700 according to Fig. 7, such that identical means, signals and functionalities are designated
with identical reference numerals.
The audio encoder 1200 is configured to receive an input audio information 710 and to
provide, on the basis thereof, an encoded audio information 712. The audio encoder 1200
comprises an energy-compacting time-domain-to-frequency-domain converter 720 which is
configured to provide a frequency-domain audio representation 722 on the basis of a timedomain
audio representation of the input audio information 710, such that the frequencydomain
audio representation 722 comprises a set of spectral values. The audio encoder 1200
also comprises an arithmetic encoder 1230 configured to encode a spectral value (out of the
set of spectral values forming the frequency-domain audio representation 722), or a plurality
of spectral values, or a pre-processed version thereof, using a variable-length codeword to
obtain the encoded audio information 7 (which may comprise, for example, a plurality of
variable-length codewords.
The arithmetic encoder 1230 is configured to map a spectral value, or a plurality of spectral
values, or a value of a most-significant bit-plane of a spectral value or of a plurality of
spectral values, onto a code value (i.e. onto a variable-length codeword), in dependence on a
context state. The arithmetic encoder 1 30 is configured to select a mapping rule describing a
mapping of a spectral value, or of a plurality of spectral values, or of a most-significant bitplane
of a spectral value or of a plurality of spectral values, onto a code value, in dependence
on the context state. The arithmetic encoder is configured to determine the current context
state in dependence on a plurality of previously-encoded (preferably, but not necessarily,
adjacent) spectral values. For this purpose, the arithmetic encoder is configured to obtain a
plurality of context sub-region values on the basis of previously-encoded spectral values, to
store said context sub-region values, and to derive a numeric current context value associated
with one or more spectral values to be encoded in dependence on the stored context subregion
vales. Moreover, the arithmetic encoder is configured to compute the norm of a vector
formed by a plurality of previously encoded spectral values, in order to obtain a common
context sub-region value associated with the plurality of previously-encoded spectral values.
As can be seen, the mapping of a spectral value, or of a plurality of spectral values, or of a
most-significant bit-plane of a spectral value or of a plurality of spectral values, onto a code
value may be performed by a spectral value encoding 740 using a mapping rule described by
a mapping rule information 742. A state tracker 1250 may be configured to track the context
state and may comprise a context sub-region value computer 1252, to compute the norm of a
vector formed by a plurality of previously encoded spectral values, in order to obtain a
common context sub-region values associated with the plurality of previously-encoded
spectral values. The state tracker 1250 is also preferably configured to determine the current
context state in dependence on a result of said computation of a context sub-region value
performed by the context sub-region value computer 1252. Accordingly, the state tracker
1250 provides an information 1254, describing the current context state. A mapping rule
selector 1260 may select a mapping rule, for example, a cumulative-frequencies-table,
describing a mapping of a spectral value, or of a most-significant bit-plane of a spectral value,
onto a code value. Accordingly, the mapping rule selector 1260 provides the mapping rule
information 742 to the spectral encoding 740. Preferably, the mapping rule selector 1260 may
be configured to use a hash table "ari_hash_m[742]", as defined in the table representation of
Figs. 22(1) to 22(4), for the selection of the mapping rule. For example, the mapping rule
selector may perform the functionality as described above with reference to Figs. 7 and 8.
To summarize the above, the audio encoder 1200 performs an arithmetic encoding of a
frequency-domain audio representation provided by the time-domain-to-frequency-domain
converter 720. The arithmetic encoding is context-dependent, such that a mapping rule (e.g., a
cumulative-frequencies-table) is selected in dependence on previously-encoded spectral
values. Accordingly, spectral values adjacent in time and/or frequency (or, at least, within a
predetermined environment) to each other and/or to the currently-encoded spectral value (i.e.
spectral values within a predetermined environment of the currently encoded spectral value)
are considered in the arithmetic encoding to adjust the probability distribution evaluated by
the arithmetic encoding.
In order to provide a numeric current context value, a context sub-region value associated
with a plurality of previously-encoded spectral values is obtained on the basis of a
computation of a norm of a vector formed by a plurality of previously-encoded spectral
values. The result of the determination of the numeric current context value is applied in the
selection of the current context state, i.e. in the selection of a mapping rule.
By computing the norm of a vector formed by a plurality of previously-encoded spectral
values, a meaningful information describing a portion of the context of the one or more
spectral values to be encoded can be obtained, wherein the norm of a vector of previously
encoded spectral values can typically be represented with a comparatively small number of
bits. Thus, the amount of context information, which needs to be stored for later use in the
derivation of a numeric current context value, can be kept sufficiently small by applying the
above discussed approach for the computation of the context sub-region values. It has been
found that the norm of a vector of previously encoded spectral values typically comprises the
most significant information regarding the state of the context. In contrast, it has been found
that the sign of said previously encoded spectral values typically comprises a subordinate
impact on the state of the context, such that it makes sense to neglect the sign of the
previously decoded spectral values in order to reduce the quantity of information to be stored
for later use. Also, it has been found that the computation of a norm of a vector of previouslyencoded
spectral values is a reasonable approach for the derivation of a context sub-region
value, as the averaging effect, which is typically obtained by the computation of the norm,
leaves the most important information about the context state substantially unaffected. To
summarize, the context sub-region value computation performed by the context sub-region
value computer 1252 allows for providing a compact context sub-region information for
storage and later re-use, wherein the most relevant information about the context state is
preserved in spite of the reduction of the quantity of information.
Moreover, it has been found that a numeric current context value obtained as discussed above
is very well suited for a selection of a mapping rule using the hash table ''ari_hash m[742]",
as defined in the table representation of Figs. 22(1) to 22(4). For example, the mapping rule
selector may perform the functionality as described above with reference to Figs. 7 and 8
Accordingly, an efficient encoding of the input audio information 710 can be achieved, while
keeping the computational effort and the amount of data to be stored by the arithmetic
encoder 1230 sufficiently small.
7. Audio Decoder According to Fig. 3
Fig. 13 shows a block schematic diagram of an audio decoder 1300. As the audio decoder
1300 is similar to the audio decoder 800 according to Fig. 8, and to the audio decoder 1100
according to Fig. 11, identical means, signals and functionalities are designated with identical
numerals.
The audio decoder 1300 is configured to receive an encoded audio information 810 and to
provide, on the basis thereof, a decoded audio information 812. The audio decoder 1300
comprises an arithmetic decoder 1320 that is configured to provide a plurality of decoded
spectral values 822 on the basis of an arithmetically-encoded representation 821 of the
spectral values. The audio decoder 1300 also comprises a frequency-domain-to-time-domain
converter 830 which is configured to receive the decoded spectral values 822 and to provide
the time-domain audio representation 812, which may constitute the decoded audio
information, using the decoded spectral values 822, in order to obtain a decoded audio
information .
The arithmetic decoder 1 20 comprises a spectral value determinator 824 which is configured
to map a code value of the arithmetically-encoded representation 821 of spectral values onto a
symbol code representing one or more of the decoded spectral values, or at least a portion
(e.g. a most-significant bit-plane) of one or more of the decoded spectral values. The spectral
value determinator 824 may be configured to perform a mapping in dependence on a mapping
rule, which is described by a mapping rule information 828a. The mapping rule information
828a may, for example, comprise a mapping rule index value, or a selected set of entries of a
cumulative-frequencies-table.
The arithmetic decoder 1320 is configured to select a mapping rule (e.g., a cumulativefrequencies-
table) describing a mapping of a code value (described by the arithmeticallyencoded
representation 8 1 of spectral values) onto a symbol code (describing one or more
spectral values) in dependence on a context state (which may be described by the context state
information 1326a). Preferably, the arithmetic decoder 1320 may be configured to use a hash
table "ari_hash_m[742]", as defined in the table representation of Figs. 22(1) to 22(4), for the
selection of the mapping rule. For example, the arithmetic decoder 1320 may perform the
functionality as described above with reference to Figs. 7 and 8. The arithmetic decoder 1320
is configured to determine the current context state in dependence on a plurality of
previously-decoded spectral values 822. For this purpose, a state tracker 1326 may be used,
which receives an information describing the previously-decoded spectral values. The
arithmetic decoder is also configured to obtain a plurality of context sub-region values on the
basis of previously-decoded spectral values and to store said context sub-region values. The
arithmetic decoder is configured to derive a numeric current context value associated with one
or more spectral values to be decoded in dependence on the stored context sub-region values.
The arithmetic decoder 1320 is configured to compute the norm of a vector formed by a
plurality of previously decoded spectral values, in order to obtain a common context subregion
value associated with the plurality of previously-decoded spectral values.
The computation of the norm of a vector formed by a plurality of previously-encoded spectral
values, in order to obtain a common context sub-region value associated with the plurality of
previously decoded spectral values, may, for example, be performed by the context sub-region
value computer 1327, which is part of the state tracker 1326. Accordingly, a current context
state information 1326a is obtained on the basis of the context sub-region values, wherein the
state tracker 1326 preferably provides a numeric current context value associated with one or
more spectral values to be decoded in dependence on the stored context sub-region values.
The selection of the mapping rules may be performed by a mapping rule selector 1328, which
derives a mapping rule information 828a from the current context state information 1326a,
and which provides the mapping rule information 828a to the spectral value determinator 824.
Regarding the functionality of the audio signal decoder 1300, it should be noted that the
arithmetic decoder 1320 is configured to select a mapping rule (e.g., a cumulativefrequencies-
table) which is, on average, well-adapted to the spectral value to be decoded, as
the mapping rule is selected in dependence on the current context state, which, in turn, is
determined in dependence on a plurality of previously-decoded spectral values. Accordingly,
statistical dependencies between adjacent spectral values to be decoded can be exploited.
However, it has been found that it is efficient, in terms of memory usage, to store context subregion
values, which are based on the computation of a norm of a vector formed on a plurality
of previously decoded spectral values, for later use in the determination of the numeric
context value. It has also been found that such context sub-region values still comprise the
most relevant context information. Accordingly, the concept used by the state tracker 1326
constitutes a good compromise between coding efficiency, computational efficiency and
storage efficiency.
Further details will be described below.
8. Audio Encoder According to Fig. 1
In the following, an audio encoder according to an embodiment of the present invention will
be described. Fig. 1 shows a block schematic diagram of such an audio encoder 100.
The audio encoder 100 is configured to receive an input audio information 10 and to
provide, on the basis thereof, a bitstream 112, which constitutes an encoded audio
information. The audio encoder 100 optionally comprises a preprocessor 120, which is
configured to receive the input audio information 110 and to provide, on the basis thereof, a
pre-processed input audio information 110a. The audio encoder 100 also comprises an
energy-compacting time-domain to frequency-domain signal transformer 130, which is also
designated as signal converter. The signal converter 130 is configured to receive the input
audio information 110, 110a and to provide, on the basis thereof, a frequency-domain audio
information 132, which preferably takes the form of a set of spectral values. For example, the
signal transformer 130 may be configured to receive a frame of the input audio information
110, 110a (e.g. a block of time-domain samples) and to provide a set of spectral values
representing the audio content of the respective audio frame. In addition, the signal
transformer 130 may be configured to receive a plurality of subsequent, overlapping or nonoverlapping,
audio frames of the input audio information 110, 110a and to provide, on the
basis thereof, a time-frequency-domain audio representation, which comprises a sequence of
subsequent sets of spectral values, one set of spectral values associated with each frame.
The energy-compacting time-domain to frequency-domain signal transformer 130 may
comprise an energy-compacting filterbank, which provides spectral values associated with
different, overlapping or non-overlapping, frequency ranges. For example, the signal
transformer 130 may comprise a windowing MDCT transformer 130a, which is configured to
window the input audio information 110, 10a (or a frame thereof) using a transform window
and to perform a modified-discrete-cosine-transform of the windowed input audio information
110, 110a (or of the windowed frame thereof). Accordingly, the frequency-domain audio
representation 132 may comprise a set of, for example, 1024 spectral values in the form of
MDCT coefficients associated with a frame of the input audio information.
The audio encoder 100 may further, optionally, comprise a spectral post-processor 140, which
is configured to receive the frequency-domain audio representation 132 and to provide, on the
basis thereof, a post-processed frequency-domain audio representation 142. The spectral post¬
processor 140 may, for example, be configured to perform a temporal noise shaping and/or a
long term prediction and/or any other spectral post-processing known in the art. The audio
encoder further comprises, optionally, a sealer/quantizer 150, which is configured to receive
the frequency-domain audio representation 132 or the post-processed version 142 thereof and
to provide a scaled and quantized frequency-domain audio representation 1 2.
The audio encoder 100 further comprises, optionally, a psycho-acoustic model processor 160,
which is configured to receive the input audio information 110 (or the post-processed version
110a thereof) and to provide, on the basis thereof, an optional control information, which may
be used for the control of the energy-compacting time-domain to frequency-domain signal
transformer 130, for the control of the optional spectral post-processor 140 and/or for the
control of the optional sealer/quantizer 150. For example, the psycho-acoustic model
processor 160 may be configured to analyze t e input audio information, to determine which
components of the input audio information 110, 110a are particularly important for the human
perception of the audio content and which components of the input audio information 110,
10a are less important for the perception of the audio content. Accordingly, the psychoacoustic
model processor 160 may provide control information, which is used by the audio
encoder 100 in order to adjust the scaling of the frequency-domain audio representation 132,
142 by the sealer/quantizer 150 and/or the quantization resolution applied by the
sealer/quantizer 150. Consequently, perceptually important scale factor bands (i.e. groups of
adjacent spectral values which are particularly important for the human perception of the
audio content) are scaled with a large scaling factor and quantized with comparatively high
resolution, while perceptually less-important scale factor bands (i.e. groups of adjacent
spectral values) are scaled with a comparatively smaller scaling factor and quantized with a
comparatively lower quantization resolution. Accordingly, scaled spectral values of
perceptually more important frequencies are typically significantly larger than spectral values
of perceptually less important frequencies.
The audio encoder also comprises an arithmetic encoder 170, which is configured to receive
the scaled and quantized version 152 of the frequency-domain audio representation 132 (or,
alternatively, the post-processed version 142 of the frequency-domain audio representation
132, or even the frequency-domain audio representation 132 itself) and to provide arithmetic
codeword information 172a on the basis thereof, such that the arithmetic codeword
information represents the frequency-domain audio representation 152.
The audio encoder 100 also comprises a bitstream payload formatter 1 0, which is configured
to receive the arithmetic codeword information 172a. The bitstream payload formatter 1 0 is
also typically configured to receive additional information, like, for example, scale factor
information describing which scale factors have been applied by the sealer/quantizer 150. In
addition, the bitstream payload formatter 190 may be configured to receive other control
information. The bitstream payload formatter 190 is configured to provide the bitstream 112
on the basis of the received information by assembling the bitstream in accordance with a
desired bitstream syntax, which will be discussed below.
In the following, details regarding the arithmetic encoder 170 will be described. The
arithmetic encoder 170 is configured to receive a plurality of post-processed and scaled and
quantized spectral values of the frequency-domain audio representation 132. The arithmetic
encoder comprises a most-significant-bit-plane-extractor 174, or even from two spectral
values, which is configured to extract a most-significant bit-plane m from a spectral value. It
should be noted here that the most-significant bit-plane may comprise one or even more bits
(e.g. two or three bits), which are the most-significant bits of the spectral value. Thus, the
most-significant bit-plane extractor 174 provides a most-significant bit-plane value 176 of a
spectral value.
Alternatively, however, the most significant bit-plane extractor 174 may provide a combined
most-significant bit-plane value m combining the most-significant bit-planes of a plurality of
spectral values (e.g., of spectral values a and b). The most-significant bit-plane of the spectral
value a is designated with m. Alternatively, the combined most-significant bit-plane value of a
plurality of spectral values a,b is designated with m.
The arithmetic encoder 170 also comprises a first codeword determinator 180, which is
configured to determine an arithmetic codeword acod_m [pki][m] representing the mostsignificant
bit-plane value m. Optionally, the codeword determinator 180 may also provide
one or more escape codewords (also designated herein with "AMTHJESCAPE") indicating,
for example, how many less-significant bit-planes are available (and, consequently, indicating
the numeric weight of the most-significant bit-plane). The first codeword determinator 180
may be configured to provide the codeword associated with a most-significant bit-plane value
m using a selected cumulative-frequencies-table having (or being referenced by) a
cumulative-frequencies-table index pki.
In order to determine as to which cumulative-frequencies-table should be selected, the
arithmetic encoder preferably comprises a state tracker 182, which is configured to track the
state of the arithmetic encoder, for example, by observing which spectral values have been
encoded previously. The state tracker 182 consequently provides a state information 184, for
example, a state value designated with "s" or "t" or "c". The arithmetic encoder 170 also
comprises a cumulative-frequencies-table selector 186, which is configured to receive the
state information 184 and to provide an information 188 describing the selected cumulativefrequencies-
table to the codeword determinator 180. For example, the cumulativefrequencies-
table selector 186 may provide a cumulative-frequencies-table index „pki"
describing which cumulative-frequencies-table, out of a set of 64 cumulative-frequenciestables,
is selected for usage by the codeword determinator. Alternatively, the cumulativefrequencies-
table selector 1 6 may provide the entire selected cumulative-frequencies-table or
a sub-table to the codeword determinator. Thus, the codeword determinator 180 may use the
selected cumulative-frequencies-table or sub-table for the provision of the codeword
acod_m[pki][m] of the mast-significant bit-plane value m, such that the actual codeword
acod_m[pki][m] encoding the most-significant bit-plane value m is dependent on the value of
m and the cumulative-frequencies-table index pki, and consequently on the current state
information 184. Further details regarding the coding process and the obtained codeword
format will be described below.
It should be noted, however, that in some embodiments, the state tracker 82 may be identical
to, or take the functionality of, the state tracker 750, the state tracker 1050 or the state tracker
1250. It should also be noted that the cumulative-frequencies-table selector 186 may, in some
embodiments, be identical to, or take the functionality of, the mapping rule selector 760, the
mapping rule selector 1060, or the mapping rule selector 1260. Moreover, the first codeword
determinator 180 may, in some embodiments, be identical to, or take the functionality of, the
spectral value encoding 740.
The arithmetic encoder 170 further comprises a less-significant bit-plane extractor 189a,
which is configured to extract one or more less-significant bit-planes from the scaled and
quantized frequency-domain audio representation 1 2, if one or more of the spectral values to
be encoded exceed the range of values encodeable using the most-significant bit-plane only.
The less-significant bit-planes may comprise one or more bits, as desired. Accordingly, the
less-significant bit-plane extractor 189a provides a less-significant bit-plane information
189b. The arithmetic encoder 170 also comprises a second codeword determinator 189c,
which is configured to receive the less-significant bit-plane information 89d and to provide,
on the basis thereof, 0, 1 or more codewords "acod_r" representing the content of 0, 1 or more
less-significant bit-planes. The second codeword determinator 189c may be configured to
apply an arithmetic encoding algorithm or any other encoding algorithm in order to derive the
less-significant bit-plane codewords "acod_r" from the less-significant bit-plane information
189b.
It should be noted here that the number of less-significant bit-planes may vary in dependence
on the value of the scaled and quantized spectral values 152, such that there may be no lesssignificant
bit-plane at all, if the scaled and quantized spectral value to be encoded is
comparatively small, such that there may be one less-significant bit-plane if the current scaled
and quantized spectral value to be encoded is of a medium range and such that there may be
more than one less-significant bit-plane if the scaled and quantized spectral value to be
encoded takes a comparatively large value.
To summarize the above, the arithmetic encoder 170 is configured to encode scaled and
quantized spectral values, which are described by the information 52, using a hierarchical
encoding process. The most-significant bit-plane (comprising, for example, one, two or three
bits per spectral value) of one or more spectral values, is encoded to obtain an arithmetic
codeword "acod_m[pki][m]" of a most-significant bit-plane value m. One or more lesssignificant
bit-planes (each of the less-significant bit-planes comprising, for example, one,
two or three bits) of the one or more spectral values are encoded to obtain one or more
codewords "acod_r". When encoding the most-significant bit-plane, the value m of the mostsignificant
bit-plane is mapped to a codeword acod_m[pki][m]. For this purpose, 64 different
cumulative-frequencies-tables are available for the encoding of the value m in dependence on
a state of the arithmetic encoder 170, i.e. in dependence on previously-encoded spectral
values. Accordingly, the codeword "acod_m[pki][m]" is obtained. In addition, one or more
codewords "acod_r" are provided and included into the bitstream if one or more lesssignificant
bit-planes are present.
Reset description
The audio encoder 100 may optionally be configured to decide whether an improvement in
bitrate can be obtained by resetting the context, for example by setting the state index to a
default value. Accordingly, the audio encoder 100 may be configured to provide a reset
information (e.g. named "arith_reset_flag") indicating whether the context for the arithmetic
encoding is reset, and also indicating whether the context for the arithmetic decoding in a
corresponding decoder should be reset.
Details regarding the bitstream format and the applied cumulative-frequency tables will be
discussed below.
9. Audio Decoder According to Fig. 2
In the following, an audio decoder according to an embodiment of the invention will be
described. Fig. 2 shows a block schematic diagram of such an audio decoder 200.
The audio decoder 200 is configured to receive a bitstream 210, which represents an encoded
audio information and which may be identical to the bitstream 112 provided by the audio
encoder 100. The audio decoder 200 provides a decoded audio information 212 on the basis
of the bitstream 210.
The audio decoder 200 comprises an optional bitstream payload de-formatter 220, which is
configured to receive the bitstream 210 and to extract from the bitstream 210 an encoded
frequency-domain audio representation 222. For example, the bitstream payload de-formatter
220 may be configured to extract from the bitstream 210 arithmetically-coded spectral data
like, for example, an arithmetic codeword "acod [pki][m]" representing the mostsignificant
bit-plane value m of a spectral value a, or of a plurality of spectral values a, b, and
a codeword "acod_r" representing a content of a less-significant bit-plane of the spectral value
a, or of a plurality of spectral values a, b, of the frequency-domain audio representation. Thus,
the encoded frequency-domain audio representation 222 constitutes (or comprises) an
arithmetically-encoded representation of spectral values. The bitstream payload deformatter
220 is further configured to extract from the bitstream additional control information, which is
not shown in Fig. 2. In addition, the bitstream payload deformatter is optionally configured to
extract from the bitstream 210, a state reset information 224, which is also designated as
arithmetic reset flag or "arith_reset_flag".
The audio decoder 200 comprises an arithmetic decoder 230, which is also designated as
"spectral noiseless decoder". The arithmetic decoder 230 is configured to receive the encoded
frequency-domain audio representation 220 and, optionally, the state reset information 224.
The arithmetic decoder 230 is also configured to provide a decoded frequency-domain audio
representation 232, which may comprise a decoded representation of spectral values. For
example, the decoded frequency-domain audio representation 232 may comprise a decoded
representation of spectral values, which are described by the encoded frequency-domain audio
representation 220.
The audio decoder 200 also comprises an optional inverse quantizer/rescaler 240, which is
configured to receive the decoded frequency-domain audio representation 232 and to provide,
on the basis thereof, an inversely-quantized and rescaled frequency-domain audio
representation 242.
The audio decoder 200 further comprises an optional spectral pre-processor 250, which is
configured to receive the inversely-quantized and rescaled frequency-domain audio
representation 242 and to provide, on the basis thereof, a pre-processed version 252 of the
inversely-quantized and rescaled frequency-domain audio representation 242. The audio
decoder 200 also comprises a frequency-domain to time-domain signal transformer 260,
which is also designated as a "signal converter". The signal transformer 260 is configured to
receive the pre-processed version 252 of the inversely-quantized and rescaled frequencydomain
audio representation 242 (or, alternatively, the inversely-quantized and rescaled
frequency-domain audio representation 242 or the decoded frequency-domain audio
representation 232) and to provide, on the basis thereof, a time-domain representation 262 of
the audio information. The frequency-domain to time-domain signal transformer 260 may, for
example, comprise a transformer for performing an inverse-modified-discrete-cosine
transform (IMDCT) and an appropriate windowing (as well as other auxiliary functionalities,
like, for example, an overlap-and-add).
The audio decoder 200 may further comprise an optional time-domain post-processor 270,
which is configured to receive the time-domain representation 262 of the audio information
and to obtain the decoded audio information 212 using a time-domain post-processing.
However, if the post-processing is omitted, the time-domain representation 262 may be
identical to the decoded audio information 212.
It should be noted here that the inverse quantizer/rescaler 240, the spectral pre-processor 250,
the frequency-domain to time-domain signal transformer 260 and the time-domain postprocessor
270 may be controlled in dependence on control information, which is extracted
from the bitstream 210 by the bitstream payload deformatter 220.
To summarize the overall functionality of the audio decoder 200, a decoded frequencydomain
audio representation 232, for example, a set of spectral values associated with an
audio frame of the encoded audio information, may be obtained on the basis of the encoded
frequency-domain representation 222 using the arithmetic decoder 230. Subsequently, the set
of, for example, 1024 spectral values, which may be MDCT coefficients, are inversely
quantized, rescaled and pre-processed. Accordingly, an inversely-quantized, rescaled and
spectrally pre-processed set of spectral values (e.g., 1024 MDCT coefficients) is obtained.
Afterwards, a time-domain representation of an audio frame is derived from the inverselyquantized,
rescaled and spectrally pre-processed set of frequency-domain values (e.g. MDCT
coefficients). Accordingly, a time-domain representation of an audio frame is obtained. The
time-domain representation of a given audio frame may be combined with time-domain
representations of previous and/or subsequent audio frames. For example, an overlap-and-add
between time-domain representations of subsequent audio frames may be performed in order
to smoothen the transitions between the time-domain representations of the adjacent audio
frames and in order to obtain an aliasing cancellation. For details regarding the reconstruction
of the decoded audio information 2 on the basis of the decoded time-frequency domain
audio representation 232, reference is made, for example, to the International Standard
ISO/IEC 14496-3, part 3, sub-part 4 where a detailed discussion is given. However, other
more elaborate overlapping and aliasing-cancellation schemes may be used.
In the following, some details regarding the arithmetic decoder 230 will be described. The
arithmetic decoder 230 comprises a most-significant bit-plane determinator 284, which is
configured to receive the arithmetic codeword acod_m [pki][m] describing the mostsignificant
bit-plane value m. The most-significant bit-plane determinator 284 may be
configured to use a cumulative-frequencies table out of a set comprising a plurality of 64
cumulative-frequencies-tables for deriving the most-significant bit-plane value m from the
arithmetic codeword "acodjn [pki][m]".
The most-significant bit-plane determinator 284 is configured to derive values 286 of a mostsignificant
bit-plane of one of more spectral values on the basis of the codeword acodjn. The
arithmetic decoder 230 further comprises a less-significant bit-plane determinator 288, which
is configured to receive one or more codewords "acod_r" representing one or more lesssignificant
bit-planes of a spectral value. Accordingly, the less-significant bit-plane
determinator 288 is configured to provide decoded values 290 of one or more less-significant
bit-planes. The audio decoder 200 also comprises a bit-plane combiner 292, which is
configured to receive the decoded values 286 of the most-significant bit-plane of one or more
spectral values and the decoded values 290 of one or more less-significant bit-planes of the
spectral values if such less-significant bit-planes are available for the current spectral values.
Accordingly, the bit-plane combiner 292 provides decoded spectral values, which are part of
the decoded frequency-domain audio representation 232. Naturally, the arithmetic decoder
230 is typically configured to provide a plurality of spectral values in order to obtain a full set
of decoded spectral values associated with a current frame of the audio content.
The arithmetic decoder 230 further comprises a cumulative-frequencies-table selector 296,
which is configured to select one of the 64 cumulative-frequencies tables ari_cf_m[64][17]
(each table ari_cf_m[pki][17], with 0
j"), or whether the context value described by the input
variable c is smaller than the context value described by the hash-table-entry
"ari_hash_m[i_min]" (second case defined by the condition "cj), an entry "ari lookup n i min +1]" of the table "ari_lookup_m[]"
designated by the table index value "i_min+l" is returned as the output value of the function
"arith_get_pk()". In the second case (c<(j»8)), an entry "ari_lookup_m[i_min]" of the table
"ari_lookup_m[]" designated by the table index value "i_min" is returned as the return value
of the function "arith_get_pk()". In the third case (i.e. if the context value described by the
input variable c is equal to the significant state value described by the table entry
"ari_hash_m[i_min]"), a mapping rule index value described by the lowermost 8-bits of the
hash-table entry "ari_hash_m[i_min]" is returned as the return value of the function
"arith_get_pkO".
To summarize the above, a particularly simple table search is performed in step 508b, wherein
the table search provides a variable value of a variable "i_min" without distinguishing
whether the context value described by the input variable c is equal to a significant state value
defined by one of the state entries of the table "ari_hash_m[]" or not. In the step 508c, which
is performed subsequent to the table search 508b, a magnitude relationship between the
context value described by the input variable c and a significant state value described by the
hash-table-entry "ari_hash_m[i_min]" is evaluated, and the return value of the function
"arith_get_pk()" is selected in dependence on a result of said evaluation, wherein the value of
the variable "i_min", which is determined in the table evaluation 508b, is considered to select
a mapping rule index value even if the context value described by the input variable c is
different from the significant state value described by the hash-table-entry
"ari hash_m[i_min]" .
It should further be noted that the comparison in the algorithm should preferably (or
alternatively) be done between the context index (numeric context value) c and
j=ari_hash_m[i]»8. Indeed, each entry of the table "ari_hash_m[]" represents a context
index, coded beyond the 8th bits, and its corresponding probability model coded on the 8 first
bits (least significant bits). In the current implementation, we are mainly interested in
knowing whether the present context c is greater than ari_hash_m[i]»8, which is equivalent
to detecting if s=c«8 is also greater than ari_hash_m[i].
To summarize the above, once the context state is calculated (which may, for example, be
achieved using the algorithm "arith_get_context(c,i,N)" according to fig 5c, or the algorithm
"arith_get_context(c,i)" according to fig 5d, the most significant 2-bit-wise-plane is decoded
using the algorithm "arith_decode" (which will be described below) called with the
appropriate cumulative-frequencies-table corresponding to the probability model
corresponding to the context state. The correspondence is made by the function
"arith_get_pk()", for example, the function "arith_get_pk()" which has been discussed with
reference to fig 5f.
11.6 Arithmetic Decoding
11.6.1 Arithmetic Decoding Using the Algorithm According to Fig 5g
In the following, the functionality a preferred implementation of the function "arith_decode()"
will be discussed in detail with reference to fig 5g. Fig. 5g shows a pseudo C-code describing
the used algorithm.
It should be noted that the function "arith_decode()" uses the helper function
"arith_first_symbol (void)", which returns TRUE, if it is the first symbol of the sequence and
FALSE otherwise. The function "arith_decode()" also uses the helper function
"arith_get_next_bit(void)", which gets and provides the next bit of the bitstream.
In addition, the function "arith_decode()" uses the global variables "low", "high" and "value".
Further, the function "arith_decode()" receives, as an input variable, the variable
"cum_freq[]", which points towards a first entry or element (having element index or entry
index 0) of the selected cumulative-frequencies-table or cumulative-frequencies sub-table
(preferably, one of the sub-tables ari_cf_m[pki=0][17] to ari_cf_m[pki=63][17] of the table
ari_cf_m[64][17], as defined by the table representation of Figs. 23(1), 23(2), 23(3)). Also,
the function "ari h decodeO" uses the input variable "cfl", which indicates the length of the
selected cumulative-frequencies-table or cumulative-frequencies sub-table designated by the
variable "cum_freq[]".
The function "arith_decode()" comprises, as a first step, a variable initialization 570a, which
is performed if the helper function "arith_first_symbol()" indicates that the first symbol of a
sequence of symbols is being decoded. The value initialization 550a initializes the variable
"value" in dependence on a plurality of, for example, 16 bits, which are obtained from the
bitstream using the helper function "arith_get_next_bit", such that the variable "value" takes
the value represented by said bits. Also, the variable "low" is initialized to take the value of 0,
and the variable "high" is initialized to take the value of 65535.
In a second step 570b, the variable "range" is set to a value, which is larger, by , than the
difference between the values of the variables "high" and "low". The variable "cum" is set to
a value which represents a relative position of the value of the variable "value" between the
value of the variable "low" and the value of the variable "high". Accordingly, the variable
"cum" takes, for example, a value between 0 and 216 in dependence on the value of the
variable "value".
The pointer p is initialized to a value which is smaller, by 1, than the starting address of the
selected cumulative-frequencies-table or sub-table.
The algorithm "arith_decode()" also comprises an iterative cumulative-frequencies-tablesearch
570c. The iterative cumulative-frequencies-table-search is repeated until the variable
cfl is smaller than or equal to 1. In the iterative cumulative-frequencies-table-search 570c, the
pointer variable q is set to a value, which is equal to the sum of the current value of the
pointer variable p and half the value of the variable "cfl". If the value of the entry *q of the
selected cumulative-frequencies-table, which entry is addressed by the pointer variable q, is
larger than the value of the variable "cum", the pointer variable p is set to the value of the
pointer variable q, and the variable "cfl" is incremented. Finally, the variable "cfl" is shifted
to the right by one bit, thereby effectively dividing the value of the variable "cfl" by 2 and
neglecting the modulo portion.
Accordingly, the iterative cumulative-frequencies-table-search 570c effectively compares the
value of the variable "cum" with a plurality of entries of the selected cumulative-frequenciestable,
in order to identify an interval within the selected cumulative-frequencies-table, which
is bounded by entries of the cumulative-frequencies-table, such that the value cum lies within
the identified interval. Accordingly, the entries of the selected cumulative-frequencies-table
define intervals, wherein a respective symbol value is associated to each of the intervals of the
selected cumulative-frequencies-table. Also, the widths of the intervals between two adjacent
values of the cumulative-frequencies-table define probabilities of the symbols associated with
said intervals, such that the selected cumulative-frequencies-table in its entirety defines a
probability distribution of the different symbols (or symbol values). Details regarding the
available cumulative-frequencies-tables or cumulative-frequencies-sub-tables will be
discussed below taking reference to Fig. 23.
Taking reference again to Fig. 5 , the symbol value is derived from the value of the pointer
variable p, wherein the symbol value is derived as shown at reference numeral 570d. Thus,
the difference between the value of the pointer variable p and the starting address "cum freq"
is evaluated in order to obtain the symbol value, which is represented by the variable
"symbol".
The algorithm "arith_decode" also comprises an adaptation 570e of the variables "high" and
"low". If the symbol value represented by the variable "symbol" is different from 0, the
variable "high" is updated, as shown at reference numeral 570e. Also, the value of the
variable "low" is updated, as shown at reference numeral 570e. The variable "high" is set to a
value which is determined by the value of the variable "low", the variable "range" and the
entry having the index "symbol -1" of the selected cumulative-frequencies-table or
cumulative-frequencies sub-table. The variable "low" is increased, wherein the magnitude of
the increase is determined by the variable "range" and the entry of the selected cumulativefrequencies-
table having the index "symbol". Accordingly, the difference between the values
of the variables "low" and "high" is adjusted in dependence on the numeric difference
between two adjacent entries of the selected cumulative-frequencies-table.
Accordingly, if a symbol value having a low probability is detected, the interval between the
values of the variables "low" and "high" is reduced to a narrow width. In contrast, if the
detected symbol value comprises a relatively large probability, the width of the interval
between the values of the variables "low" and "high" is set to a comparatively large value.
Again, the width of the interval between the values of the variable "low" and "high" is
dependent on the detected symbol and the corresponding entries of the cumulativefrequencies-
table.
The algorithm "arith_decode()" also comprises an interval renormalization 570f, in which the
interval determined in the step 570e is iteratively shifted and scaled until the "break"-
condition is reached. In the interval renormalization 570f, a selective shift-downward
operation 570fa is performed. If the variable "high" is smaller than 32768, nothing is done,
and the interval renormalization continues with an interval-size-increase operation 570fb. If,
however, the variable "high" is not smaller than 32768 and the variable "low" is greater than
or equal to 32768, the variables "values", "low" and "high" are all reduced by 32768, such
that an interval defined by the variables "low" and "high" is shifted downwards, and such that
the value of the variable "value" is also shifted downwards. If, however, it is found that the
value of the variable "high" is not smaller than 32768, and that the variable "low" is not
greater than or equal to 32768, and that the variable "low" is greater than or equal to 16384
and that the variable "high" is smaller than 49152, the variables "value", "low" and "high" are
all reduced by 16384, thereby shifting down the interval between the values of the variables
"high" and "low" and also the value of the variable "value". If, however, neither of the above
conditions is fulfilled, the interval renormalization is aborted.
If, however, any of the above-mentioned conditions, which are evaluated in the step 570fa, is
fulfilled, the interval-increase-operation 570fb is executed. In the interval-increase-operation
570fb, the value of the variable "low" is doubled. Also, the value of the variable "high" is
doubled, and the result of the doubling is increased by 1. Also, the value of the variable
"value" is doubled (shifted to the left by one bit), and a bit of the bitstream, which is obtained
by the helper function "arith_get_next_bit" is used as the least-significant bit. Accordingly,
the size of the interval between the values of the variables "low" and "high" is approximately
doubled, and the precision of the variable "value" is increased by using a new bit of the
bitstream. As mentioned above, the steps 570fa and 570fb are repeated until the "break"
condition is reached, i.e. until the interval between the values of the variables "low" and
"high" is large enough.
Regarding the functionality of the algorithm "arith_decode()", it should be noted that the
interval between the values of the variables "low" and "high" is reduced in the step 570e in
dependence on two adjacent entries of the cumulative-frequencies-table referenced by the
variable "cum_freq". If an interval between two adjacent values of the selected cumulativefrequencies-
table is small, i.e. if the adjacent values are comparatively close together, the
interval between the values of the variables "low" and "high", which is obtained in the step
570e, will be comparatively small. In contrast, if two adjacent entries of the cumulativefrequencies-
table are spaced further, the interval between the values of the variables "low"
and "high", which is obtained in the step 570e, will be comparatively large.
Consequently, if the interval between the values of the variables "low" and "high", which is
obtained in the step 570e, is comparatively small, a large number of interval renormalization
steps will be executed to re-scale the interval to a "sufficient" size (such that neither of the
conditions of the condition evaluation 570fa is fulfilled). Accordingly, a comparatively large
number of bits from the bitstream will be used in order to increase the precision of the
variable "value". If, in contrast, the interval size obtained in the step 570e is comparatively
large, only a smaller number of repetitions of the interval normalization steps 570fa and 570fb
will be required in order to renormalize the interval between the values of the variables "low"
and "high" to a "sufficient" size. Accordingly, only a comparatively small number of bits
from the bitstream will be used to increase the precision of the variable "value" and to prepare
a decoding of a next symbol.
To summarize the above, if a symbol is decoded, which comprises a comparatively high
probability, and to which a large interval is associated by the entries of the selected
cumulative-frequencies-table, only a comparatively small number of bits will be read from the
bitstream in order to allow for the decoding of a subsequent symbol. In contrast, if a symbol is
decoded, which comprises a comparatively small probability and to which a small interval is
associated by the entries of the selected cumulative-frequencies-table, a comparatively large
number of bits will be taken from the bitstream in order to prepare a decoding of the next
symbol.
Accordingly, the entries of the cumulative-frequencies-tables reflect the probabilities of the
different symbols and also reflect a number of bits required for decoding a sequence of
symbols. By varying the cumulative-frequencies-table in dependence on a context, i.e. in
dependence on previously-decoded symbols (or spectral values), for example, by selecting
different cumulative-frequencies-tables in dependence on the context, stochastic dependencies
between the different symbols can be exploited, which allows for a particular bitrate-efficient
encoding of the subsequent (or adjacent) symbols.
To summarize the above, the function "arith_decode()", which has been described with
reference to Fig. 5g, is called with the cumulative-frequencies-table "arith_cf_m[pki][]",
corresponding to the index "pki" returned by the function "arith_get_pk()" to determine the
most-significant bit-plane value m (which may be set to the symbol value represented by the
return variable "symbol").
To summarize the above, the arithmetic decoder is an integer implementation using the
method of tag generation with scaling. For details, reference is made to the book
"Introduction to Data Compression" of K. Sayood, Third Edition, 2006, Elsevier Inc.
The computer program code according to Fig. 5 describes the used algorithm according to an
embodiment of the invention.
11.6.2 Arithmetic Decoding Using the Algorithm According to Figs. 5h and 5i
Fig. 5h and 5i show a pseudo program code representation of another embodiment of the
algorithm "arith_decode()", which can be used as an alternative to the algorithm
"arith_decode" described with reference to Fig. 5g.
It should be noted that both the algorithms according to Fig. 5g and Figs. 5h and 5i may be
used in the algorithm "values_decode()" according to Fig. 3.
To summarize, the value m is decoded using the function "arith_decode()" called with the
cumulative-frequencies-table "arith_cf_m[pki][]" (which is, preferably, a sub-table of the
table ari_cf_m[67][17] defined in the table representations of Figs. 23(1), 23(2), 23(3))
wherein "pki" corresponds to the index returned by the function "arith_get_pkO"- The
arithmetic coder (or decoder) is an integer implementation using the method of tag generation
with scaling. For details, reference is made to the Book "Introduction to Data Compression"
of K. Sayood, Third Edition, 2006, Elsevier Inc. The computer program code according to
Fig. 5h and 5i describes the used algorithm.
11.7 Escape Mechanism
In the following, the escape mechanism, which is used in the decoding algorithm
"values_decode()" according to Fig. 3, will briefly be discussed.
When the decoded value m (which is provided as a return value of the function
"arith_decode()") is the escape symbol "ARITH ESCAPE", the variables "lev" and "esc_nb"
are incremented by 1, and another value m is decoded. In this case, the function
"arith_get_pk0" (or "get_pk()") is called once again with the value "c+ esc_nb«17 as input
argument, where the variable "esc_nb" describes the number of escape symbols previously
decoded for the same 2-tuple and bounded to 7.
To summarize, if an escape symbol is identified, it is assumed that the most-significant bitplane
value m comprises an increased numeric weight. Moreover, current numeric decoding is
repeated, wherein a modified numeric current context value "c+ esc_nb«17" is used as an
input variable to the function "arith_get_pkQ". Accordingly, a different mapping rule index
value "pki" is typically obtained in different iterations of the sub-algorithm 312ba.
11.8 Arithmetic Stop Mechanism
In the following, the arithmetic stop mechanism will be described. The arithmetic stop
mechanism allows for the reduction of the number of required bits in the case that the upper
frequency portion is entirely quantized to 0 in an audio encoder.
In an embodiment, an arithmetic stop mechanism may be implemented as follows: Once the
value m is not the escape symbol, "ARITH_ESCAPE", the decoder checks if the successive
m forms an "ARITH_STOP" symbol. If the condition "(esc_nb >0&&m==0)" is true, the
"ARITH_STOP" symbol is detected and the decoding process is ended. In this case, the
decoder jumps directly to the sign decoding described below or to the "arith_finish()"
function which will be described below. The condition means that the rest of the frame is
composed of zero values.
11.9 Less-Significant Bit-Plane Decoding
In the following, the decoding of the one or more less-significant bit-planes will be described.
The decoding of the less-significant bit-plane, is performed, for example, in the step 12d
shown in Fig. 3 . Alternatively, however, the algorithms as shown in Fig. 5j and 5n may be
used, wherein the algorithm of Fig 5j is a preferred algorithm.
11.9.1 Less-Significant Bit-Plane Decoding According to Fig. 5i
Taking reference now to Fig. 5j, it can be seen that the values of the variables a and b are
derived from the value m. For example, the number representation of the value m is shifted to
the right by 2-bits to obtain the number representation of the variable b. Moreover, the value
of the variable a is obtained by subtracting a bit-shifted version of the value of variable b, bitshifted
to the left by 2-bits, from the value of the variable m.
Subsequently, an arithmetic decoding of the least-significant bit-plane values r is repeated,
wherein the number of repetitions is determined by the value of the variable "lev". A leastsignificant
bit-plane value r is obtained using the function "arith_decode", wherein a
cumulative-frequencies-table adapted to the least-significant bit-plane decoding is used
(cumulative-frequencies-table "arith_cf_r"). A least-significant bit (having a numeric weight
of 1) of the variable r describes a less-significant bit-plane of the spectral value represented by
the variable a, and a bit having a numeric weight of 2 of the variable r describes a lesssignificant
bit of the spectral value represented by the variable b. Accordingly, the variable a
is updated by shifting the variable a to the left by 1 bit and adding the bit having the numeric
weight of 1 of the variable r as the least significant bit. Similarly, the variable b is updated by
shifting the variable b to the left by one bit and adding the bit having the numeric weight of 2
of the variable r.
Accordingly, the two most-significant information carrying bits of the variables a,b are
determined by the most-significant bit-plane value m, and the one or more least-significant
bits (if any) of the values a and b are determined by one or more less- significant bit-plane
values r.
To summarize the above, if the "ARITH_STOP" symbol is not met, the remaining bit planes
are then decoded, if any exist, for the present 2-tuple. The remaining bit-planes are decoded
from the most-significant to the least-significant level by calling the function "arith_decodeO"
lev number of times with the cumulative frequencies table "arith_cf_r[]". The decoded bitplanes
r permit to refine the previously-decoded value m in accordance with the algorithm, a
pseudo program code of which is shown in Fig. 5j .
11.9.2 Less-Significant Bit Band Decoding According to Fig. 5n
Alternatively, however, the algorithm a pseudo program code representation of which is
shown in Fig. 5n can also be used for the less-significant bit-plane decoding. In this case, if
the "ARITH_STOP" symbol is not met, the remaining bit-planes are then decoded, if any
exist, for the present 2-tuple. The remaining bit-planes are decoded from the most-significant
to the least-significant level by calling "lev" times "arith_decode()" with the cumulativefrequencies-
table "arith_cf_r()". The decoded bit-planes r permits for the refining of the
previously-decoded value m in accordance with the algorithm shown in Fig. 5n.
11.10 Context Update
11.10.1 Context Update According to Fig, 5k, 51. and 5m
In the following, operations used to complete the decoding of the tuple of spectral values will
be described, taking reference to Figs. 5k and 51. Moreover, an operation will be described
which is used to complete a decoding of a set of tuples of spectral values associated with a
current portion (for example, a current frame) of an audio content.
It should be noted that the algorithms according to Figs. 5k, 5 1 and 5m are preferred, even
though alternative algorithms may be used.
Taking reference now to Fig. 5k, it can be seen that the entry having entry index 2*i of the
array "x_ac_dec[]" is set to be equal to a, and that the entry having entry index "2*i+l" of the
array "x_ac_dec[]" is set to be equal to b after the less significant bit decoding 3 1 d. In other
words, at the point after the less-significant bit decoding 3 d, the unsigned value of the 2-
tuple {a,b}, is completely decoded. It is saved into the array (for example the array
"x_ac_dec[]") holding the spectral coefficients in accordance with the algorithm shown in
Fig. 5k.
Subsequently, the context "q" is also updated for the next 2-tuple. It should be noted that this
context update also has to be performed for the last 2-tuple. This context update is performed
by the function "arith_update_context()", a pseudo program code representation of which is
shown in Fig. 51.
Taking reference now to Fig. 51, it can be seen that the function "arith_update_context(i,a,b)"
receives, as input variables, decoded unsigned quantized spectral coefficients (or spectral
values) a, b of the 2-tuple. In addition, the function "arith_update_context" also receives, as
an input variable, an index i (for example, a frequency index) of the quantized spectral
coefficient to decode. In other words, the input variable i may, for example, be an index of the
tuple of spectral values, absolute values of which are defined by the input variables a, b. As
can be seen, the entry "q[l][i]" of the array "q[][]" may be set to a value which is equal to
a+b+1. In addition, the value of the entry "q[l][i]" of the array "q[][]" may be limited to a
hexadecimal value of "OxF". Thus, the entry "q[l][i]" of the array "q[][]" is obtained by
computing a sum of absolute values of the currently decoded tuple {a,b} of spectral values
having frequency index i, and adding 1 to the result of said sum.
It should be noted here that the entry "q[l][i]" of the array "q[][]" may be considered as a
context sub-region value, because it describes a sub-region of the context which is used for a
subsequent decoding of additional spectral values (or tuples of spectral values).
It should be noted here that the summation of the absolute values a and b of the two currently
decoded spectral values (signed versions of which are stored in the entries "x_ac_dec[2*i]"
and "x_ac_dec[2*i+l]" of the array "x_ac_dec[]"), may be considered as the computation of a
norm (e.g. a LI norm) of the decoded spectral values.
It has been found that context sub-region values (i.e. entries of the array "q[][]") which
describe a norm of a vector formed by a plurality of previously decoded spectral values are
particularly meaningful and memory efficient. It has been found that such a norm, which is
computed on the basis of a plurality of previously decoded spectral values, comprises
meaningful context information in a compact form. It has been found that the sign of the
spectral values is typically not particularly relevant for the choice of the context. It has also
been found that the formation of a norm across a plurality of previously decoded spectral
values typically maintains the most important information, even though some details are
discarded. Moreover, it has been found that a limitation of the numeric current context value
to a maximum value typically does not result in a severe loss of information. Rather, it has
been found that it is more efficient to use the same context state for significant spectral values
which are larger than a predetermined threshold value. Thus, the limitation of the context subregion
values brings along a further improvement of the memory efficiency. Furthermore, it
has been found that the limitation of the context sub-region values to a certain maximum
value allows for a particularly simple and computationally efficient update of the numeric
current context value, which has been described, for example, with reference to Figs. 5c and
5d. By limiting the context sub-region values to a comparatively small value (e.g. to a value
of 15), a context state which is based on a plurality of context sub-region values can be
represented in the efficient form, which has been discussed taking reference to Figs. 5c and
5d.
Moreover, it has been found that a limitation of the context sub-region values to values
between 1 and 15, brings along a particularly good compromise between accuracy and
memory efficiency, because 4 bits are sufficient in order to store such a context sub-region
value.
However, it should be noted that in some other embodiments, a context sub-region value may
be based on a single decoded spectral value only. In this case, the formation of a norm may
optionally be omitted.
The next 2-tuple of the frame is decoded after the completion of the function
"arith_update_context" by incrementing i by 1 and by redoing the same process as described
above, starting from the function "arith_get_context()".
When lg/2 2-tuples are decoded within the frame, or with the stop symbol "ARITH_STOP"
occurs, the decoding process of the spectral amplitude terminates and the decoding of the
signs begins.
Details regarding the decoding of the signs have been discussed with reference to Fig. 3,
wherein the decoding of the signs is shown in reference numeral 314.
Once all unsigned quantized spectral coefficients are decoded, the according sign is added.
For each non-null quantized value of "x_ac_dec" a bit is read. If the read bit value is equal to
1, the quantized value is positive, nothing is done and the signed value is equal to the
previously-decoded unsigned value. Otherwise (i.e. if the read bit value is equal to 0), the
decoded coefficient (or spectral value) is negative and the two's complement is taken from the
unsigned value. The sign bits are read from the low to the higher frequencies. For details,
reference is made to Figs. 3 and to the explanations regarding the signs decoding 314.
The decoding is finished by calling the function "arith_finish()". The remaining spectral
coefficients are set to 0. The respective context states are updated correspondingly.
For details, reference is made to Fig. 5m, which shows a pseudo program code representation
of the function "arith_finish()". As can be seen, the function "arith_fimsh()" receives an input
variable lg which describes the decoded quantized spectral coefficients. Preferably, the input
variable lg of the function "arith_finish" describes a number of actually-decoded spectral
coefficients, leaving spectral coefficients unconsidered, to which a 0-value has been allocated
in response to the detection of an "ARITH_STOP" symbol. An input variable N of the
function "arith_fmish" describes a window length of a current window (i.e. a window
associated with the current portion of the audio content). Typically, a number of spectral
values associated with a window of length N is equal to N/2 and a number of 2-tuples of
spectral values associated with a window of window length N is equal to N/4.
The function "arith finish" also receives, as an input value, a vector "x_ac_dec" of decoded
spectral values, or at least a reference to such a vector of decoded spectral coefficients.
The function "arith_finish" is configured to set the entries of the array (or vector) "x_ac_dec",
for which no spectral values have been decoded due to the presence of an arithmetic stop
condition, to 0. Moreover, the function "arith finish" sets context sub-region values "q[l][i]",
which are associated with spectral values for which no value has been decoded due to the
presence of an arithmetic stop condition, to a predetermined value of 1. The predetermined
value of 1 corresponds to a tuple of the spectral values wherein both spectral values are equal
to .
Accordingly, the function "arith_finish()" allows to update the entire array (or vector)
"x_ac_dec[]" of spectral values and also the entire array of context sub-region values
"q[l][i]", even in the presence of an arithmetic stop condition.
11.10.2 Context Update According to Figs. 5o and 5p
In the following, another embodiment of the context update will be described taking reference
to Figs. 5o and 5p. At the point at which the unsigned value of the 2-tuple (a,b) is completely
decoded, the context q is then updated for the next 2-tuple. The update is also performed if the
present 2-tuple is the last 2-tuple. Both updates are made by the function
"arith_update_context()", a pseudo program code representation of which is shown in Fig. 5o.
The next 2-tuple of the frame is then decoded by incrementing i by 1 and calling the function
arith_decode(). If the lg/2 2-tuples were already decoded with the frame, or if the stop symbol
"ARITH STOP" occurred, the function "arith_fmish()" is called. The context is saved and
stored in the array (or vector) "qs" for the next frame. A pseudo program code of the function
"arith_save_context()" is shown in Fig. 5p,
Once all unsigned quantized spectral coefficients are decoded, the sign is then added. For
each non-quantized value of "qdec", a bit is read. If the read bit value is equal to 0, the
quantized value is positive, nothing is done and the signed value is equal to the previouslydecoded
unsigned value. Otherwise, the decoded coefficient is negative and the two's
complement is taken from the unsigned vale. The signed bits are read from the low to the high
frequencies.
11.1 1 Summary of Decoding Process
In the following, the decoding process will briefly be summarized. For details, reference is
made to the above discussion and also to Figs. 3, 4, 5a, 5c, 5e, 5g, 5j, 5k, 51, and 5m. The
quantized spectral coefficients "x_ac_dec[]" are noiselessly decoded starting from the lowestfrequency
coefficient and progressing to the highest-frequency coefficient. They are decoded
by groups of two successive coefficients a,b gathering in a so-called 2-tuple (a,b) (also
designated with {a,b }).
The decoded coefficients "x_ac_dec[]" for the frequency-domain (i.e. for a frequency-domain
mode) are then stored in the array "x_ac_quant[g][win][sfb][bin]". The order of transmission
of the noiseless coding codewords is such that when they are decoded in the order received
and stored in the array, "bin" is the most rapidly incrementing index and "g" is the most
slowly incrementing index. Within a codeword, the order of decoding is a, then b. The
decoded coefficients "x_ac_dec[]" for the "TCX" (i.e. for an audio decoding using a
transform-coded excitation) are stored (for example, directly) in the array
"x_tcx_invquant[win][bin]" and the order of the transmission of the noiseless coding
codewords is such that when they are decoded in the order received and stored in the array,
"bin" is the most rapidly incrementing index and "win" is the most slowly incrementing
index. Within a codeword, the order of decoding is a, then b.
First, the flag "arith_reset_fiag" determines if the context must be reset. If the flag is true, this
is considered in the function "arith_map_context".
The decoding process starts with an initialization phase where the context element vector "q"
is updated by copying and mapping the context elements of the previous frame stored in
"q[l][]" into "q[0][]". The context elements within "q" are stored on a 4-bits per 2-tuple. For
details, reference is made to the pseudo program code of Fig. 5a.
The noiseless decoder outputs 2-tuples of unsigned quantized spectral coefficients. At first,
the state c of the context is calculated based on the previously-decoded spectral coefficients
surrounding the 2-tuple to decode. Therefore, the state is incrementally updated using the
context state of the last decoded 2-tuple considering only two new 2-tuples. The state is
decoded on 17-bits and is returned by the function "arith_get_context". A pseudo program
code representation of the set function "arith_get_context" is shown in Fig. 5c.
The context state c determines the cumulative-frequencies-table used for decoding the most
significant 2-bit-wise-plane m. The mapping from c to the corresponding cumulativefrequencies-
table index "pki" is performed by the function "arith_get_pk()". A pseudo
program code representation of the function "arith_get_pk()" is shown in Fig. 5e.
The value m is decoded using the function "arith_decode()" called with the cumulativefrequencies-
table, "arith_cf_m[pki][]" where "pki" corresponds to the index returned by
"arith_get_pk()". The arithmetic coder (and decoder) is an integer implementation using a
method of tag generation with scaling. The pseudo program code according to Fig. 5g
describes the used algorithm.
When the decoded value m is the escape symbol "ARITH_ESCAPE", the variables "lev" and
"esc_nb" are incremented by 1 and another value m is decoded. In this case, the function
"get_pk()" is called once again with the value "c+ esc_nb«17" as input argument, where
"esc_nb" is the number of escape symbols previously decoded for the same 2-tuple and
bounded to 7.
Once the value m is not the escape symbol "ARITH_ESCAPE", the decoder checks if the
successive m forms an "ARITH_STOP" symbol. If the condition "(esc_nb>0&&m==0)" is
true, the "ARITH_STOP" symbol is detected and the decoding process is ended. The decoder
jumps directly to the sign decoding described afterwards. The condition means that the rest of
the frame is composed of 0 values.
If the "ARITH_STOP" symbol is not met, the remaining bit-planes are then decoded, if any
exist, for the present 2-tuple. The remaining bit-planes are decoded from the most-significant
to the least-significant level, by calling "arith_decode()" lev number of times with the
cumulative-frequencies-table "arifh_cf_r[]'\ The decoded bit-planes r permit the refining of
the previously-decoded value m, in accordance with the algorithm a pseudo program code of
which is shown in Fig. 5j. At this point, the unsigned value of the 2-tuple (a,b) is completely
decoded. It is saved into the element holding the spectral coefficients in accordance with the
algorithm, a pseudo program code representation of which is shown in Fig. 5k.
The context "q" is also updated for the next 2-tuple. It should be noted that this context update
has to also be performed for the last 2-tuple. This context update is performed by the function
"arifh_update_context()", a pseudo program code representation of which is shown in Fig. 51.
The next 2-tuple of the frame is then decoded by incrementing i by 1 and by redoing the same
process as described as above, starting from the function "arith_get_context()". When lg/2 2-
tuples are decoded within the frame, or when the stop symbol "ARITH_STOP" occurs, the
decoding process of the spectral amplitude terminates and the decoding of the signs begins.
The decoding is finished by calling the function "arith_finish()". The remaining spectral
coefficients are set to 0. The respective context states are updated correspondingly. A pseudo
program code representation of the function "arith_finish" is shown in Fig. 5m.
Once all unsigned quantized spectral coefficients are decoded, the according sign is added.
For each non-null quantized value of "x_ac_dec", a bit is read. If the read bit value is equal to
I , the quantized value is positive, and nothing is done, and the signed value is equal to the
previously decoded unsigned value. Otherwise, the decoded coefficient is negative and the
two's complement is taken from the unsigned value. The signed bits are read from the low to
the high frequencies.
I I .12 Legends
Fig. 5q shows a legend of the definitions which is related to the algorithms according to Figs.
5a, 5c, 5e, 5f, 5g, 5j, 5k, 51, and 5m.
Fig. 5r shows a legend of the definitions which is related to the algorithms according to Figs.
5b, 5d, 5f, 5h, 5i, 5n, 5o, and 5p.
12. Mapping Tables
In an embodiment according to the invention, particularly advantageous tables
"ari lookup m", "ari_hash_m", and "ari_cf_m" are used for the execution of the function
"arith_getjpk()" according to Fig. 5e or Fig. 5f, and for the execution of the function
"arith_decode()" which was discussed with reference to Figs. 5g, 5h and 5i. However, it
should be noted that different tables may be used in some alternative embodiments.
12.1 Table "ari hash HIG 742 T ' According to Figs. 22(1 . 22(2), 22(3) and 22(4)
A content of a particularly advantageous implementation of the table "ari_hash_m", which is
used by the function "arith_get_pk", a first preferred embodiment of which was described
with reference to Fig. 5e, and a second embodiment of which was described with reference to
Fig. 5f, is shown in the table of Figs. 22(1) to 22(4). It should be noted that the table of Figs.
22(1) to 22(4) lists the 742 entries of the table (or array) "ari_hash_m[742]". It should also be
noted that the table representation of Figs. 22(1 ) to 22(4) shows the elements in the order of
the element indices, such that the first value "0x00000 104UL" corresponds to a table entry
"ari_hash_m[0]" having an element index (or table index) 0, and such that the last value
"OxFFFFFFOOUL" corresponds to a table entry "ari_hash_m[741]" having element index or
table index 741 . It should further be noted here that "Ox" indicates that the table entries of the
table "ari_hash_m[]" are represented in a hexadecimal format. Moreover, it should be noted
here that the suffix "UL" indicates that the table entries of the table "ari_hash_m[]" are
represented as unsigned "long" integer values (having a precision of 32-bits).
Furthermore, it should be noted that the table entries of the table "ari_hash_m[]" according to
Figs. 22(1) to 22(4) are arranged in a numeric order, in order to allow for the execution of the
table search 506b, 508b, 510b of the function "arith_get_pk()".
It should further be noted that the most-significant 24-bits of the table entries of the table
"ari_hash_m" represent certain significant state values (and may be considered as a first subentry),
while the least-significant 8-bits represent mapping rule index values "pki" (and may
be considered as a second sub-entry). Thus, the entries of the table "ari_hash_m[]" describe a
"direct hit" mapping of a context value onto a mapping rule index value "pki".
However, the uppermost 24-bits of the entries of the table "ari_hash_m[]" represent, at the
same time, interval boundaries of intervals of numeric context values, to which the same
mapping rule index value is associated. Details regarding this concept have already been
discussed above.
12.2 Table "ari lookup m" According to Fig. 2 1
A content of a particularly advantageous embodiment of the table "ari_lookup_m" is shown in
the table of Fig. 21. It should be noted here that the table of Fig. 2 1 lists the entries of the
table "ari_lookup_m". The entries are referenced by a 1-dimensional integer-type entry index
(also designated as "element index" or "array index" or "table index") which is, for example,
designated with "i_max" or "i_min" or "i". It should be noted that the table "ari_lookup_m",
which comprises a total of 742 entries, is well-suited for the use by the function
"arith_get__pk" according to Fig. e or Fig. 5f. It should also be noted that the table
"ari_lookup_m" according to Fig. 2 1 is adapted to cooperate with the table "ari_hash_m"
according to Fig. 22.
It should be noted that the entries of the table "ari_lookup_m[742]" are listed in an ascending
order of the table index "i" (e.g. "i min" or "i_max" or "i") between 0 and 741. The term "Ox"
indicates that the table entries are described in a hexadecimal format. Accordingly, the first
table entry "0x01" corresponds to the table entry "ari_lookup_m[0]" having table index 0 and
the last table entry "0x27" corresponds to the table entry "ari_lookup_m[741]" having table
index 741.
It should also be noted that the entries of the table "ari_lookup_m[]" are associated with
intervals defined by adjacent entries of the table "arith_hash_m[]". Thus, the entries of the
table "ari_lookup_m" describe mapping rule index values associated with intervals of numeric
context values, wherein the intervals are defined by the entries of the table "arith_hash_m".
12.3. Table "ari cf mi64iri71" According to Figs. 23(1). 23(2) and 23(3)
Fig. 23 shows a set of 64 cumulative-frequencies-tables (or sub-tables) "ari cf m[pki][17]",
one of which is selected by and audio encoder 100, 700 or an audio decoder 200, 800, for
example, for the execution of the function "arith_decode()", ί · · for the decoding of the mostsignificant
bit-plane value. The selected one of the 64 cumulative-frequencies-tables (or subtables)
shown in Figs. 23(1) to 23(3) takes the function of the table "cum_freq[]" in the
execution of the function "arith_decode()".
As can be seen from Figs. 23(1) to 23(3), each sub-block or line represents a cumulativefrequencies-
table having 17 entries. For example, a first sub-block or line 23 10 represents the
17 entries of a cumulative-frequencies-table for "pki=0". A second sub-block or line 23 12
represents the 7 entries of a cumulative-frequencies-table for "pki=l". Finally, a 64th subblock
or line 2364 represents the 17 entries of a cumulative-frequencies-table for "pki=63".
Thus, Figs. 23(1 ) to 23(3) effectively represent 64 different cumulative-frequencies-tables (or
sub-tables) for "pki=0" to "pki=95", wherein each of the 64 cumulative-frequencies-tables is
represented by a sub-block (enclosed by curled brackets) or line, and wherein each of said
cumulative-frequencies-tables comprises 17 entries.
Within a sub-block or line (e.g. a sub-block or line 2310 or 23 12, or a sub-block or line 2396),
a first value (for example, a first value 708 of the first sub-block 2310) describes a first entry
of the cumulative-frequencies-table (having an array index or table index of 0) represented by
the sub-block or line, and a last value (for example, a last value 0 of the first sub-block or line
2 10) describes a last entry of the cumulative-frequencies-table (having an array index or
table index of 16) represented by the sub-block or line.
Accordingly, each sub-block or line 23 10, 2312, 2364 of the table representation of Fig. 23
represents the entries of a cumulative-frequencies-table for use by the function "arith_decode"
according to Fig. 5g, or according to Figs. 5h and 5i. The input variable "cum freqf]" of the
function "arith_decode" describes which of the 64 cumulative-frequencies-tables (represented
by individual sub-blocks of 17 entries of the table "arith_cf_m") should be used for the
decoding of the current spectral coefficients.
12.4 Table "ari cf GP " According to Fig. 24
Fig. 24 shows a content of the table
The four entries of said table are shown in Fig. 24. However, it should be noted that the table
"ari_cf_r" may eventually be different in other embodiments.
13. Overview, Performance Evaluation and Advantages
The embodiments according to the invention use updated functions (or algorithms) and an
updated set of tables, as discussed above, in order to obtain an improved tradeoff between
computational complexity, memory requirement, and coding efficiency.
Generally speaking, the embodiments according to the invention create an improved spectral
noiseless coding. Embodiments according to the present invention describe an enhancement
of the spectral noiseless coding in USAC (unified speech and audio encoding).
Embodiments according to the invention create an updated proposal for the CE on improved
spectral noiseless coding of spectral coefficients, based on the schemes as presented in the
MPEG input papers ml69l2 and ml7002. Both proposals were evaluated, potential short
comings eliminated and the strengths combined. In addition embodiments of the invention
comprise an update of noiseless spectral coding tables for application in a current USAC
specification.
13.1. Overview
In the following, a short overview will be given. In the course of the ongoing standardization
of USAC (Unified Speech and Audio Coding), an enhanced spectral noiseless coding scheme
(aka entropy coding scheme) in USAC was proposed. This enhanced spectral noiseless coding
scheme helps to more efficiently code quantized spectral coefficients in a lossless manner.
Therefore, spectral coefficients are mapped to corresponding code-words of variable length.
This entropy coding scheme is based on a context based arithmetic coding scheme: The
context (i.e. neighboring spectral coefficients) of a spectral coefficient determines a
probability distribution (cumulative frequency table), which is used for arithmetic coding of
the spectral coefficient.
Embodiments according to the present invention use an updated set of tables for the spectral
coding scheme, as previously proposed in the context of USAC. To give the background, it
should be noted that the conventional spectral noiseless coding technology consists firstly of
an algorithm and secondly of a set of trained tables (or, at least, comprises an algorithm and a
set of trained tables). This conventional set of trained tables is based upon USAC D4
bitstreams. Since USAC has now progressed to WD7, and significant changes have been
applied to the USAC specification in the meantime, a new set of re-trained tables is used in
embodiments according to the invention, which is based on the most recent USAC version
WD7. The algorithm itself remains unchanged. As a side effect, the retrained tables provide
compression performance better than any of the previously presented schemes.
According to the present invention, it is proposed to replace the conventional trained tables by
the re-trained tables as presented here, which results in an increased coding performance.
13.2. Introduction
In the following, an introduction will be provided.
For the USAC work item, several proposals on updating the noiseless coding scheme were
brought forward during the last meetings in a collaborative fashion. However, this work was
basically initiated at the 89th meeting. Since then it has been common practice for all
proposals on spectral coefficient coding to show performance results based on USAC WD 4
reference quality bitstreams and training upon a WD 4 training database.
In the meantime, great improvements to other fields of USAC, in particular to stereo
processing and windowing, have been incorporated into the USAC specification as of today.
It was found that these improvements also slightly affect the statistics for the spectral
noiseless coding. The results shown for the noiseless coding CEs can therefore be regarded as
suboptimal, since they do not correspond to the latest WD revision.
Accordingly, spectral noiseless coding tables are suggested which are better adapted to the
updated algorithms and to the statistics of the spectral values to be encoded and decoded.
13.3. Short description of Algorithm
In the following, a short description of the algorithm will be provided.
To overcome the issue of memory footprint and the computational complexity, an improved
noiseless coding scheme was proposed to replace the scheme as in working draft 6/7
(WD6/7). The main focus in the development was put on reducing memory demand while
maintaining the compression efficiency and not increasing the computational complexity.
More specifically the target was to reach the best tradeoff in the multi-dimension complexity
space of compression performance, complexity and memory requirements.
The proposed coding scheme proposal borrows the main feature of the WD6/7 noiseless
coder, namely the context adaptation. The context is derived using previously decoded
spectral coefficients, which come as in WD6/7 from both the past and the present frame.
However, the spectral coefficients are now coded by combining 2 coefficients together for
forming a 2-tuple. Another difference lays in the fact that the spectral coefficients are now
split in three parts, the sign, the MSBs and the LSBs. The sign is coded independently from
the magnitude which is further divided in two parts, the two most significant bits and the rest
of bits if they exist. The 2-tuples for which the magnitude of the two elements is lower or
equal to 3 are coded directly by the MSBs coding. Otherwise, an escape codeword is
transmitted first for signaling any additional bit plane. In the base version, the missing
information, the LSBs and the sign are both coded using uniform probability distribution.
The table size reduction is still possible since:
• Only probabilities for 1 symbols need to be stored: {[0;+3], [0;+3]} + ESC symbol;
• There is no need to store a grouping table (egroups, dgroups, dgvectors); and
• The size of the hash-table could be reduced with an appropriate training
13.3.1 MSBs coding
In the following, a MSBs coding will be described.
As already mentioned, the main difference between D6/7, previous proposals and the
current proposal, is the dimension of the symbols. In WD6/7 4-tuples were considered for the
context generation and the noiseless coding. In previous submissions, 1-tuples were used
instead for reducing the ROM requirements. In the course of our development, the 2-tuples
were found to be the best compromise for reducing the ROM requirements without increasing
the computational complexity. Instead of considering four 4-tuples for the context derivation,
now four 2-tuples are considered. As shown in Fig. 25, three 2-tuples come from the past
frame and one from the present frame.
The table size reduction is due to three main factors. First, only probabilities for 17 symbols
need to be stored (i.e. {[0;+3], [0;+3]} + ESC symbol). Grouping tables (i.e. egroups,
dgroups, dgvectors) are not needed anymore. Moreover, the size of the hash-table was
reduced by performing an appropriate training.
Although the dimension was reduced from 4 to 2, the complexity was maintained to the range
as in the WD6/7. It was achieved by simplifying both the context generation and the hash
table access.
The different simplifications and optimizations were done in a way that the coding
performance was not affected and even slightly improved.
13.3.2 LSBs coding
The LSBs are coded with a uniform probability distribution. Compared to WD6/7, the LSBs
are now considered within 2-tuples instead of 4 t-tuples. However, different coding of the
least significant bits is possible.
13.3.3 Sign coding
The sign is coded without using the arithmetic core-coder for the sake of complexity
reduction. The sign is transmitted on 1 bit only when the corresponding magnitude is nonnull.
0 means a positive value and 1 a negative value.
13.4. Proposed Update of Tables
This contribution provides an updated set of tables for the USAC spectral noiseless coding
scheme. The tables were re-trained based on the current USAC WD6/7 bitstreams. Apart from
the actual tables, which result from a training process, the algorithm remains unchanged.
To investigate the effect of the re-training, coding efficiency and memory requirement of the
new tables is compared against the previous proposal (M17558) and the WD6. WD6 is
selected as a reference point since a) results at the 92nd meeting were given with respect to
this reference and b) the differences between WD6 and WD7 are only very minor (bugfixes
only, with no effect on entropy coding or distribution of spectral coefficients).
13.4.1 Coding Efficiency
Firstly, the coding efficiency of the proposed new set of tables is compared against USAC
WD6 and the CE as proposed in M17558. As can be seen in the table representation of Fig.
26, by a pure retraining the averaged increase in coding efficiency (compared to WD6) could
be increased from 1.74% (Ml 7558) to 2.45% (new propsal, according to an embodiment of
the invention). Compared to Ml 7558, the compression gain could thus be increased by
roughly 0.7% in embodiments according to the invention.
Fig. 27 visualizes the compression gain for all operating points. As can be seen, a minimum
compression gain of at least 2% can be reached using embodiments according to the invention
compared to WD6. For low rates, such as 12 kbit/s and 16 kbit s, the compression gain is even
slightly increased. The good performance is also retained at higher bitrates such as 64 kbit/s,
where a significant increase in coding efficiency of more than 3% can be observed.
It should be noted that a lossless transcoding of all WD6 reference quality bitstreams was
proven to be possible without violating the bit reservoir constraints. More detailed results will
be given in section 13.6.
13.4.2 Memory Demand and Complexity
Secondly, memory demand and complexity are compared against USAC WD6 and the CE as
proposed in M17558. The table of Fig. 28 compares the memory demand for the noiseless
coder as in WD6, proposed in Ml 7 58 and the new proposal according to an embodiment of
the invention. As can be clearly seen, the memory demand is significantly reduced by
adopting the new algorithm, as proposed in M17558. Further on it can be seen that for the
new proposal the total table size could even be slightly reduced by nearly 80 words (32 bit),
resulting in a total ROM demand of 1441 words, and a total RAM demand of 64 words (32
bit) per audio channel. The small saving in ROM demand is the result of a better trade-off
between number of probability models and hash-table size, found by the automatic training
algorithm based on the new set of D6 training bitstreams. For more details reference is
made to the table of Fig. 29.
In term of complexity, the newly proposed schemes' computational complexity was compared
against an optimized version of the current noiseless in USAC. It was found by a "pen and
paper" method and by instructing the code that the new coding scheme has the same order of
complexity as the current scheme. As reported in the table of Fig. 30 for the 32 kbps stereo
and the table of Fig. 31 for the 12 kbps mono operating points, the estimated complexity
shows an increase of 0.006 weighted MOPS and 0.024 weighted MOPS respectively over an
optimized implementation of the WD6 noiseless decoder. Compared to an overall complexity
of about 11.7 PCU [2], these differences can be considered negligible.
13.5. Conclusion
In the following, some conclusions will be provided.
A new set of tables for the USAC spectral noiseless coding scheme was presented. In contrast
to the previous proposal, which is the result of a training based upon older bitstreams, the
proposed new tables are now trained on current USAC WD bitstreams, wherein an advanced
training concept has been used. By this re-training, the coding efficiency on current USAC
bitstreams could be improved, without sacrificing the low memory demand or increasing the
complexity compared to previous proposals. Compared to the USAC WD6, the memory
demand could be significantly reduced.
13.6. Detailed Information on the Transcoding of WD6 Bitstreams
Detailed information on the transcoding of Working draft 6 (WD6) bitstreams can be seen in
the table representations of Figs. 32, 33, 34, 35 and 36.
Fig. 32 shows a table representation of average bitrates produced by the arithmetic coder in an
embodiment according to the invention and in the WD6.
Fig. shows a table representation of minimum, maximum and average bitrates of USAC on
a frame basis using the proposed scheme.
Fig. 34 shows a table representation of average bitrates produced by a USAC coder using
WD6 arithmetic coder and a coder according to an embodiment according to the invention
("new proposal").
Fig. 35 shows a table representation of best and worst cases for an embodiment according to
the invention.
Fig. 36 shows a table representation of bitreservoir limit for an embodiment according to the
invention.
14. Changes when compared to the working draft 6 or working draft 7
In the following, changes of the noiseless coding when compared to a conventional noiseless
coding will be described. Accordingly, an embodiment is defined in terms of modifications
when compared to the working draft 6 or working draft 7of the USAC draft standard.
In particular, Changes to WD text will be described. In other words, this section lists the
complete set of changes against the USAC specification WD7.
14.1. Changes to the Technical description
The proposed new noiseless coding engenders the modifications in the MPEG USAC WD
which will be described in the following. The main differences are marked.
14.1.1. Changes of the syntax and the payload
Fig. 7 shows a representation of a syntax of the arithmetically coded data "arith_data()". The
main differences are marked.
In the following, changes with respect to the Payloads of the spectral noiseless coder will be
described.
Spectral coefficients from both the "linear prediction-domain" coded signal and the
"frequency-domain" coded signal are scalar quantized and then noiselessly coded by an
adaptively context dependent arithmetic coding. The quantized coefficients are gathered
together in 2-tuples before being transmitted from the lowest-frequency to the highestfrequency.
It should be noted that the usage of 2-tuples constitutes a change when compared
to previous versions of the spectral noiseless coding.
However, it is a further change that each 2-tuple is split into the sign s, the most significant 2
bits-wise plane, m, and the remaining less significant bit-planes, r. Also, it is a change that the
value m is coded according to the coefficient's neighborhood, and that the remaining less
significant bit-planes, r, are entropy coded without considering the context. Also, it is a
change with respect to some of the previous versions that the values m and r form the symbols
of the arithmetic coder. Finally, it is a change with respect to some of the previous versions
that the signs s are coded outside the arithmetic coder using 1 bit per non-null quantized
coefficient.
A detailed arithmetic decoding procedure is described below in section 14.2.3.
14.1.2 Changes of the Definitions and help elements
Changes to the definitions and help elements are shown in the representation of definitions
and help elements in Fig. 38.
14.2 Spectral Noiseless coding
In the following, the spectral noiseless coding according to an embodiment will be
summarized.
14.2.1 Tool description
Spectral noiseless coding is used to further reduce the redundancy of the quantized spectrum.
The spectral noiseless coding scheme is based on an arithmetic coding in conjunction with a
dynamically adapted context. The noiseless coding is fed by the quantized spectral values and
uses context dependent cumulative frequencies tables derived from four previously decoded
neighboring. Here, neighborhood in both, time and frequency is taken into account, as
illustrated in Figure 25. The cumulative frequencies tables are then used by the arithmetic
coder to generate a variable length binary code.
The arithmetic coder produces a binary code for a given set of symbols and their respective
probabilities. The binary code is generated by mapping a probability interval, where the set of
symbols lies, to a codeword.
14.2.2 Definitions
Definitions and help elements are described in Fig. 39. Changes when compared to previous
versions of the arithmetic coding are marked.
14.2.3 Decoding process
The quantized spectral coefficients qdec are noiselessly decoded starting from the lowestfrequency
coefficient and progressing to the highest-frequency coefficient. They are decoded
by groups of two successive coefficients a and b gathering in a so-called 2-tuple {a,b}.
The decoded coefficients for AAC are then stored in the array x_ac_quant[g] [win] [sft>][bin] .
The order of transmission of the noiseless coding codewords is such that when they are
decoded in the order received and stored in the array, bin is the most rapidly incrementing
index and g is the most slowly incrementing index. Within a codeword the order of decoding
is a and then b.
The decoded coefficients for the TCX are stored in the array xjcxjnvquant[win] [bin] , and
the order of the transmission of the noiseless coding codewords is such that when they are
decoded in the order received and stored in the array, bin is the most rapidly incrementing
index and win is the most slowly incrementing index. Within a codeword the order of
decoding is a and then b.
The decoding process starts with an initialization phase where a mapping is done between the
saved past context stored in qs and the context of the current frame q. The past context qs is
stored on 2 bits per frequency line.
For details, reference is mad to the pseudo program code representation of the algorithm
"arith_map_context" in Fig. 40a.
The noiseless decoder outputs 2-tuples of unsigned quantized spectral coefficients. At first,
the state c of the context is calculated based on the previously decoded spectral coefficients
surrounding the 2-tuple to decode. The state is incrementally updated using the context state
of the last decoded 2-tuple considering only two new 2-tuples. The state is coded on 17 bits
and is returned by the function arith_get_context().
A pseudo program code representation of the function "arith_get_context()" is shown in Fig.
40b.
Once the context state c is calculated, the most significant 2-bits wise plane m is decoded
using the arith_decodeQ fed with the appropriated cumulative frequencies table
corresponding to the probability model corresponding to the context state. The
correspondence is made by the function arxth_get k .
A pseudo program code representation of the function arith_get_pk() is shown in Fig. 40c.
The value m is decoded using the function arith decodef) called with the cumulative
frequencies table, arith_cf_m[pki][] , where phi corresponds to the index returned by
arith_get_pk(). The arithmetic coder is an integer implementation using the method of tag
generation with scaling. The pseudo C-code shown in Figs. 40d and 40e describes the used
algorithm.
When the decoded value is the escape symbol, ARITH_ESCAPE, the variable lev and
esc ib are incremented by one and another value m is decoded. In this case, the function
get_pk() is called once again with the value c&esc_nb«17 as input argument, where escjxb
is the number of escape symbols previously decoded for the same 2-tuple and bounded to 7.
Once the value m is not the escape symbol, ARITH ESCAPE, the decoder checks if the
successive m form a ARITH_STOP symbol. If the condition (esc_nb>0 && m==0) is true,
the ARITH_STOP symbol is detected and the decoding process is ended. The decoder jumps
directly to the arith_save_context() function. The condition means that the rest of the frame is
composed of zero values.
If the ARJTH_STOP symbol is not met, the remaining bit planes are then decoded if any
exists for the present 2-tuple. The remaining bit planes are decoded from the most significant
to the lowest significant level by calling lev times arith_decode() with the cumulative
frequencies table arith_cf_r[]. The decoded bit planes r permit to refine the previously
decoded value m by the function or algorithm a pseudo program code representation of which
is shown in Fig. 40f.
At this point, the unsigned value of the 2-tuple {a,b} is completely decoded. The context q is
then updated for the next 2-tuple. If it is the last 2-tuple, as well. The both updates are made
by the function arith_update_context(), a pseudo program code representation of which is
shown in Fig. 40g.
The next 2-tuple of the frame is then decoded by incrementing i by one and calling the
function. If the lg/2 2-tuple were already decoded with the frame or if the stop symbol
ARITH_STOP occurred, the function arith_save_context() is called. The context is saved and
stored in qs for the next frame. A pseudo program code representation of the function or
algorithm arith_save_context() is shown in Fig. 40h.
Once all unsigned quantized spectral coefficients are decoded, the sign is then added. For
each non-null quantized value of qdec a bit is read. If the read bit value is equal to zero, the
quantized value is positive, nothing is done and the signed value is equal to the previously
decoded unsigned value. Otherwise, the decoded coefficient is negative and the two's
complement is taken from the unsigned value. The sign bits are read from the low to the high
frequencies.
14.2.4 Undated Tables
A set of re-trained tables for use with the algorithms described above is shown in Figs. 41(1),
41(2), 42(1), 42(2), 42(3), 42(4), 43(1), 43(2), 43(3), 43(4), 43(5), 43(6) and 44.
Figs. 41(1) and 41(2) show a table representation of a content of a table
"ari_lookup_m[742]", according to an embodiment of the invention;
Figs. 42 (1),(2),(3),(4) show a table representation of a content of a table
"ari_hash_m[742]", according to an embodiment of the invention;
Figs. 43 (1),(2),(3),(4),(5),(6) show a table representation of a content of a table
"ari_cf_m[96][17]", according to an embodiment of the invention; and
Fig. 44 shows a table representation of a table "ari_cf_r[4] , according to an embodiment of
the invention.
To summarize the above, it can be seen that embodiments according to the present invention
provide a particularly good trade-off between computational complexity, memory
requirements and coding efficiency.
Bitstream Syntax
15.1 Payloads of the Spectral Noiseless Coder
In the following, some details regarding the payloads of the spectral noiseless coder will be
described. In some embodiments, there is a plurality of different coding modes, such as, for
example, a so-called "linear-prediction-domain" coding mode and a "frequency-domain"
coding mode. In the linear-prediction-domain coding mode, a noise shaping is performed on
the basis of a linear-prediction analysis of the audio signal, and a noise-shaped signal is
encoded in the frequency-domain. In the frequency-domain coding mode a noise shaping is
performed on the basis of a psychoacoustic analysis and a noise shaped version of the audio
content is encoded in the frequency-domain.
Spectral coefficients from both the "linear-prediction-domain" coded signal and the
"frequency-domain" coded signal are scalar quantized and then noiselessly coded by an
adaptively context dependent arithmetic coding. The quantized coefficients are gathered
together into 2-tuples before being transmitted from the lowest frequency to the highest
frequency. Each 2-tuple is split into a sign s, the most significant 2-bits-wise-plane m, and the
remaining one or more less-significant bit-planes r (if any). The value m is coded according to
a context defined by the neighboring spectral coefficients. In other words, m is coded
according to the coefficients neighborhood. The remaining less-significant bit-planes r are
entropy coded without considering the context. By means of m and r, the amplitude of these
spectral coefficients can be reconstructed on the decoder side. Fo all non-null symbols, the
signs s is coded outside the arithmetic coder using 1-bit. In other words, the values m and r
form the symbols of the arithmetic coder. Finally, the signs s, are coded outside of the
arithmetic coder using 1-bit per non-null quantized coefficient.
A detailed arithmetic coding procedure is described herein.
15.2 Syntax Elements according to Figs 6a to 6i
In the following, the bitstream syntax of a bitstream carrying the arithmetically-encoded
spectral information will be described taking reference to Figs. 6a to 6j.
Fig. 6a shows a syntax representation of so-called USAC raw data block
("usac_raw_data_block()") ,
The USAC raw data block comprises one or more single channel elements
("single channel elementO") and/or one or more channel pair elements
("channeljpair_element()") .
Taking reference now to Fig. 6b, the syntax of a single channel element is described. The
single channel element comprises a linear-prediction-domain channel stream
("lpd_channel_stream ()") or a frequency-domain channel stream ("fd_channel_stream ()") in
dependence on the core mode.
Fig. 6c shows a syntax representation of a channel pair element. A channel pair element
comprises core mode information ("core modeO", "core_model"). In addition, the channel
pair element may comprise a configuration information "ics_info()". Additionally, depending
on the core mode information, the channel pair element comprises a linear-prediction-domain
channel stream or a frequency-domain channel stream associated with a first of the channels,
and the channel pair element also comprises a linear-prediction-domain channel stream or a
frequency-domain channel stream associated with a second of the channels.
The configuration information "ics_info()", a syntax representation of which is shown in Fig.
6d, comprises a plurality of different configuration information items, which are not of
particular relevance for the present invention.
A frequency-domain channel stream ("fd_channel_stream ()"), a syntax representation of
which is shown in Fig. 6e, comprises a gain information ("global gain") and a configuration
information ("icsjnfo ()"). In addition, the frequency-domain channel stream comprises scale
factor data ("scale_factor_data ()"), which describes scale factors used for the scaling of
spectral values of different scale factor bands, and which is applied, for example, by the scaler
150 and the rescaler 240. The frequency-domain channel stream also comprises
arithmetically-coded spectral data ("ac spectral data 0"). which represents arithmeticallyencoded
spectral values.
The arithmetically-coded spectral data ("ac_spectral_data()"), a syntax representation of
which is shown in Fig. 6f, comprises an optional arithmetic reset flag ("arith_reset_fiag"),
which is used for selectively resetting the context, as described above. In addition, the
arithmetically-coded spectral data comprise a plurality of arithmetic-data blocks
("arith_data"), which carry the arithmetically-coded spectral values. The structure of the
arithmetically-coded data blocks depends on the number of frequency bands (represented by
the variable "num_bands") and also on the state of the arithmetic reset flag, as will be
discussed in the following.
In the following, the structure of the arithmetically encoded data-block will be described
taking reference to Fig. 6g, which shows a syntax representation of said arithmetically-coded
data-blocks. The data representation within the arithmetically-coded data-block depends on
the number lg of spectral values to be encoded, the status of the arithmetic reset flag and also
on the context, i.e. the previously-encoded spectral values.
The context for the encoding of the current set (e.g., 2-tuple) of spectral values is determined
in accordance with the context determination algorithm shown at reference numeral 660.
Details with respect to the context determination algorithm have been explained above, taking
reference to Figs. 5a and 5b. The arithmetically-encoded data-block comprises lg/2 sets of
codewords, each set of codewords representing a plurality (e.g., a 2-tuple) of spectral values.
A set of codewords comprises an arithmetic codeword "acod_m[pki][m]" representing a
most-significant bit-plane value m of the tuple of spectral values using between 1 and 20 bits.
In addition, the set of codewords comprises one or more codewords "acod_r[r]" if the tuple of
spectral values requires more bit-planes than the most-significant bit-plane for a correct
representation. The codeword "acod_r[r]" represents a less-significant bit-plane using
between 1 and 14 bits.
If, however, one or more less-significant bit-planes are required (in addition to the mostsignificant
bit-plane) for a proper representation of the spectral values, this is signaled by
using one or more arithmetic escape codewords ("ARITH_ESCAPE"). Thus, it can be
generally said that for a spectral value, it is determined how many bit-planes (the mostsignificant
bit-plane and, possibly, one or more additional less-significant bit-planes) are
required. If one or more less-significant bit-planes are required, this is signaled by one or
more arithmetic escape codewords "acod_m[pki][ARITH_ESCAPE]", which are encoded in
accordance with a currently selected cumulative-frequencies-table, a cumulative-frequenciestable-
index of which is given by the variable "pki". In addition, the context is adapted, as can
be seen at reference numerals 664, 662, if one or more arithmetic escape codewords are
included in the bitstream. Following the one or more arithmetic escape codewords, an
arithmetic codeword "acod_m[pki][m]" is included in the bitstream, as shown at reference
numeral 663, wherein "pki" designates the currently valid probability model index (taking the
context adaptation caused by the inclusion of the arithmetic escape codewords into
consideration) and wherein m designates the most-significant bit-plane value of the spectral
value to be encoded or decoded (wherein m is different from the "ARITH_ESCAPE"
codeword).
As discussed above, the presence of any less-significant bit-plane results in the presence of
one or more codewords "acod_r[r]", each of which represents 1 bit of a least-significant bitplane
of a first spectral value and each of which also represents 1 bit of a least-significant bitplane
of a second spectral value. The one or more codewords "acod_r[r]" are encoded in
accordance with a corresponding cumulative-frequencies-table, which may, for example, be
constant and context-independent. However, different mechanisms for the selection of the
cumulative-frequencies-table for the decoding of the one or more codewords "acod_r[r]" are
possible.
In addition, it should be noted that the context is updated after the encoding of each tuple of
spectral values, as shown at reference numeral 668, such that the context is typically different
for encoding and decoding two subsequent tuples of spectral values.
Fig. 6i shows a legend of definitions and help elements defining the syntax of the
arithmetically encoded data-block.
Moreover, an alternative syntax of the arithmetic data "arifh_data()" is shown in Fig. 6h, with
a corresponding legend of definitions and help elements shown in Fig. 6j.
To summarize the above, a bitstream format has been described, which may be provided by
the audio encoder 100 and which may be evaluated by the audio decoder 200. The bitstream
of the arithmetically encoded spectral values is encoded such that it fits the decoding
algorithm discussed above.
In addition, it should be generally noted that the encoding is the inverse operation of the
decoding, such that it can generally be assumed that the encoder performs a table lookup
using the above-discussed tables, which is approximately inverse to the table lookup
performed by the decoder. Generally, it can be said that a man skilled in the art who knows
the decoding algorithm and/or the desired bitstream syntax will easily be able to design an
arithmetic encoder, which provides the data defined in the bitstream syntax and required by an
arithmetic decoder.
Moreover, it should be noted that the mechanisms for determining the numeric current context
value and for deriving a mapping rule index value may be identical in an audio encoder and
an audio decoder, because it is typically desired that the audio decoder uses the same context
as the audio encoder, such that the decoding is adapted to the encoding.
15.3. Syntax Elements according to Figs 6k, 61, 6m. 6n, 6o and 6p
In the following, an extract from an alternative bitstream syntax will be described taking
reference to Figs. 6k, 61, 6m, 6n, 6o and 6p.
Fig. 6k shows a syntax representation of a bitstream element
"UsacSingleChannelElement(indepFlag)". Said syntax element
"UsacSingleChanneIEIement(indepFlag)" comprises a syntax element "UsacCoreCoderData"
describing one core coder channel.
Fig. 6 1 shows a syntax representation of a bitstream element
"UsacChannelPairElement(indepFlag)". Said syntax element
"UsacChannelPairElement(indepFlag)" comprises a syntax element "UsacCoreCoderData"
describing one or two core coder channels, depending on a stereo configuration.
Fig. 6m shows a syntax representation of a bitstream element "ics_info()" which comprises
definitions of a number of parameters, as can be seen in Fig. 6m.
Fig. 6n shows a syntax representation of a bitstream element "UsacCoreCoderDataO". The
bitstream element "UsacCoreCoderDataO" comprises one or more linear-prediction-domain
channel streams "lpd_channel_srream()" and/or one or more frequency domain channel
streams "fd_channel_streamO". Some other control information may optionally also be
included in the bitstream element "UsacCoreCoderDataO", as can be seen in Fig. 6n.
Fig. 6o shows a syntax representation of a bitstream element "fd_channel_stream()". The
bitstream element "fd_channel_strearn()" comprises, among other optional bitstream
elements, a bitstream element "scale_factor_data()" and a bitstream element
"ac_spectral_data()".
Fig 6p shows a syntax representation of a bitstream element "ac_spectral data()"- The
bitstream element "ac_spectral_data()" optionally comprises a bitstream element
"arith_reset_flag". Moreover, the bitstream element also comprises a number of arithmetically
encoded data "arith_data()". The arithmetically encoded data may, for example, follow the
bitstream syntax described with reference to Fig. 6g.
16. Implementation Alternatives
Although some aspects have been described in the context of an apparatus, it is clear that
these aspects also represent a description of the corresponding method, where a block or
device corresponds to a method step or a feature of a method step. Analogously, aspects
described in the context of a method step also represent a description of a corresponding block
or item or feature of a corresponding apparatus. Some or all of the method steps may be
executed by (or using) a hardware apparatus, like for example, a microprocessor, a
programmable computer or an electronic circuit. In some embodiments, some one or more of
the most important method steps may be executed by such an apparatus.
The inventive encoded audio signal can be stored on a digital storage medium or can be
transmitted on a transmission medium such as a wireless transmission medium or a wired
transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be
implemented in hardware or in software. The implementation can be performed using a digital
storage medium, for example a floppy disk, a DVD, a Blue-Ray, a CD, a ROM, a PROM, an
EPROM, an EEPROM or a FLASH memory, having electronically readable control signals
stored thereon, which cooperate (or are capable of cooperating) with a programmable
computer system such that the respective method is performed. Therefore, the digital storage
medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically
readable control signals, which are capable of cooperating with a programmable computer
system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program
product with a program code, the program code being operative for performing one of the
methods when the computer program product runs on a computer. The program code may for
example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods
described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program
having a program code for performing one of the methods described herein, when the
computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital
storage medium, or a computer-readable medium) comprising, recorded thereon, the computer
program for performing one of the methods described herein. The data carrier, the digital
storage medium or the recorded medium are typically tangible and/or non-transitionary.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of
signals representing the computer program for performing one of the methods described
herein. The data stream or the sequence of signals may for example be configured to be
transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a
programmable logic device, configured to or adapted to perform one of the methods described
herein.
A further embodiment comprises a computer having installed thereon the computer program
for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system
configured to transfer (for example, electronically or optically) a computer program for
performing one of the methods described herein to a receiver. The receiver may, for example,
be a computer, a mobile device, a memory device or the like. The apparatus or system may,
for example, comprise a file server for transferring the computer program to the receiver .
In some embodiments, a programmable logic device (for example a field programmable gate
array) may be used to perform some or all of the functionalities of the methods described
herein. In some embodiments, a field programmable gate array may cooperate with a
microprocessor in order to perform one of the methods described herein. Generally, the
methods are preferably performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of the present
invention. It is understood that modifications and variations of the arrangements and the
details described herein will be apparent to others skilled in the art. It is the intent, therefore,
to be limited only by the scope of the impending patent claims and not by the specific details
presented by way of description and explanation of the embodiments herein.
17. Conclusions
To conclude, embodiments according to the invention comprise one or more of the following
aspects, wherein the aspects may be used individually or in combination.
a) Context state hashing mechanism
According to an aspect of the invention, the states in the hash table are considered as
significant states and group boundaries. This permits to significantly reduce the size of
the required tables.
b). Incremental Context Update
According to an aspect, some embodiments according to the invention comprise a
computationally efficient manner for updating the context. Some embodiments use an
incremental context update in which a numeric current context value is derived from a
numeric previous context value.
c) . Context Derivation
According to an aspect of the invention, using the sum of two spectral absolute values
is association of a truncation. It is a kind of gain vector quantization of the spectral
coefficients (as opposition to the conventional shape-gain vector quantization). It aims
to limit the context order, while conveying the most meaningful information from the
neighborhood.
d) Updated tables
According to an aspect of the invention, optimized tables ari_hash_rn[742],
ari lookup_m[742] and ari_cf_m[64][17], which provide for a particularly good
compromise between coding efficiency and computational complexity are applied.
Some other technologies, which are applied in embodiments according to the invention, are
described in patent applications PCT EP20 10/065725, PCT EP20 10/065 726, and PCT EP
2010/065727. Moreover, in some embodiments according to the invention, a stop symbol is
used. Moreover, in some embodiments, only the unsigned values are considered for the
context.
However, the above-mentioned International patent applications disclose aspects which are
still in use in some embodiments according to the invention.
For example, an identification of a zero-region is used in some embodiments of the invention.
Accordingly, a so-called "small-value-flag" is set (e.g., bit 16 of the numeric current context
value c).
In some embodiments, the region-dependent context computation may be used. However, in
other embodiments, a region-dependent context computation may be omitted in order to keep
the complexity and the size of the tables reasonably small.
Moreover, the context hashing using a hash function is an important aspect of the invention.
The context hashing may be based on the two-table concept which is described in the abovereferenced
non-pre-published International patent applications. However, specific adaptations
of the context hashing may be used in some embodiments in order to increase the
computational efficiency. Nevertheless, in some other embodiments according to the
invention, the context hashing which is described in the above-referenced International patent
applications may be used.
Moreover, it should be noted that the incremental context hashing is rather simple and
computationally efficient. Also, the context-independence from the sign of the values, which
is used in some embodiments of the invention, helps to simplify the context, thereby keeping
the memory requirements reasonably low.
In some embodiments of the invention, a context derivation using the sum of two spectral
values and a context limitation is used. These two aspects can be combined. Both aim to limit
the context order by conveying the most meaningful information from the neighborhood.
In some embodiments, a small-value-flag is used which may be similar to an identification of
a group of a plurality of zero values.
In some embodiments according to the invention, an arithmetic stop mechanism is used. The
concept is similar to the usage of a symbol "end-of-block" in JPEG, which has a comparable
function. However, in some embodiments of the invention, the symbol ("ARITH STOP") is
not included explicitly in the entropy coder. Instead, a combination of already existing
symbols, which could not occur previously, is used, i.e. "ESC+0". In other words, the audio
decoder is configured to detect a combination of existing symbols, which are not normally
used for representing a numeric value, and to interpret the occurrence of such a combination
of already existing symbols as an arithmetic stop condition.
An embodiment according to the invention uses a two-table context hashing mechanism.
To further summarize, some embodiments according to the invention may comprise one or
more of the following five main aspects.
• improved tables;
· extended context for detecting either zero-regions or small amplitude regions in the
neighborhood;
• context hashing;
• context state generation: incremental update of the context state; and
• context derivation: specific quantization of the context values including summation of
the amplitudes and limitation.
To further conclude, one aspect of embodiments according to the present invention lies in an
incremental context update. Embodiments according to the invention comprise an efficient
concept for the update of the context, which avoids the extensive calculations of the working
draft (for example, of the working draft 5). Rather, simple shift operations and logic
operations are used in some embodiments. The simple context update facilitates the
computation of the context significantly.
In some embodiments, the context is independent from the sign of the values (e.g., the
decoded spectral values). This independence of the context from the sign of the values brings
along a reduced complexity of the context variable. This concept is based on the finding that a
neglect of the sign in the context does not bring along a severe degradation of the coding
efficiency.
According to an aspect of the invention, the context is derived using the sum of two spectral
values. Accordingly, the memory requirements for storage of the context are significantly
reduced. Accordingly, the usage of a context value, which represents the sum of two spectral
values, may be considered as advantageous in some cases.
Also, the context limitation brings along a significant improvement in some cases. In addition
to the derivation of the context using the sum of two spectral values, the entries of the context
array "q" are limited to a maximum value of "OxF" in some embodiments, which in turn
results in a limitation of the memory requirements. This limitation of the values of the context
array "q" brings along some advantages.
In some embodiments, a so-called "small value flag" is used. In obtaining the context variable
c (which is also designated as a numeric current context value), a flag is set if the values of
some entries "q[l][i-3]" to "q[l][i-l]" are very small. Accordingly, the computation of the
context can be performed with high efficiency. A particularly meaningful context value (e.g.
numeric current context value) can be obtained.
In some embodiments, an arithmetic stop mechanism is used. The "ARITH_STOP"
mechanism allows for an efficient stop of the arithmetic encoding or decoding if there are
only zero values left. Accordingly, the coding efficiency can be improved at moderate costs in
terms of complexity.
According to an aspect of the invention, a two-table context hashing mechanism is used. The
mapping of the context is performed using an interval-division algorithm evaluating the table
"ari_hash_m" in combination with a subsequent lookup table evaluation of the table
"ari_lookup_m". This algorithm is more efficient than the WD3 algorithm.
In the following, some additional details will be discussed.
It should be noted here that the tables "arith_hash_m[742]" and "arith_lookup_m[742]" are
two distinct tables. The first is used to map a single context index (e.g. numeric context value)
to a probability model index (e.g., mapping rule index value) and the second is used for
mapping a group of consecutive contexts, delimited by the context indices in
"arith_hash_m[]", into a single probability model.
It should further be noted that table "arith_cf_m sb[64][16]" may be used as an alternative to
the table "ari_cf_m[64][17]", even though the dimensions are slightly different.
"ari_cf_m[][]" and "ari_cf_msb[][]" may refer to the same table, as the 17th coefficients of the
probability models are always zero. It is sometimes not taken into account when counting the
required space for storing the tables.
To summarize the above, some embodiments according to the invention provide a proposed
new noiseless coding (encoding or decoding), which engenders modifications in the MPEG
USAC working draft (for example, in the MPEG USAC working draft 5). Said modifications
can be seen in the enclosed figures and also in the related description.
As a concluding remark, it should be noted that the prefix "ari" and the prefix "arith" in names
of variables, arrays, functions, and so on, are used interchangeably.
Claims
1. An audio decoder (200; 800) for providing a decoded audio information (212; 812) on
the basis of an encoded audio information (210; 810), the audio decoder comprising:
an arithmetic decoder (230; 820) for providing a plurality of decoded spectral values
(232; 822) on the basis of an arithmetically encoded representation (222; 821) of the
spectral values; and
a frequency-domain-to-time-domain converter (260; 830) for providing a time-domain
audio representation (262; 812) using the decoded spectral values (232; 822), in order
to obtain the decoded audio information (212; 812);
wherein the arithmetic decoder (232; 820) is configured to select a mapping rule (297;
cum_freq[]) describing a mapping of a code value (value) representing a spectral
value, or a most significant bit-plane of a spectral value, in an encoded form, onto a
symbol code (symbol) representing a spectral value, or a most significant bitplane of a
spectral value, in a decoded form, in dependence on a context state (s) described by a
numeric current context value (c);
wherein the arithmetic decoder (230; 820) is configured to determine the numeric
current context value (c) in dependence on a plurality of previously decoded spectral
values;
wherein the arithmetic decoder is configured to evaluate a hash table (aii_hash_m[]),
entries of which define both significant state values amongst the numeric context
values and boundaries of intervals of numeric context values, in order to select the
mapping rule,
wherein the arithmetic decoder is configured to evaluate the hash table for finding a
hash table index value i for which the value arijiash_m[i]»8 is equal or greater than
c, while, if the found hash table index value i is greater than 0, the value ari_hash_m[i-
1]»8 is lower than c;
wherein the arithmetic decoder is configured to select the mapping rule which is
determined by a probability model index (pki) which equals to ari_hash_m[i]&&OxFF
when ari_hash_m[i]»8 is equal to c, or equals to ari_lookup_m[i] otherwise;
wherein the hash table ari_hash_m is defined as given in Figs. 22(1), 22(2), 22(3) and
22(4) ; and
wherein the mapping table ari_lookup_m is defined as given in Fig. 21.
The audio decoder according to claim 1, wherein the arithmetic decoder is configured
to evaluate the hash table using the algorithm
i = i rnin;
while ((i_max-i_min)>l)
{
i = i_min+((i_max-i_min)/2);
j = ari_hash_m[i] ;
if(c0»8))
i min=i;
else
return(j&0xFF);
}
return ari_lookup_m[i_max];
wherein c designates a variable representing the numeric current context value or a
scaled version thereof;
wherein i is a variable describing a current hash table index value;
wherein i_min is a variable initialized to designate a hash table index value of a first
entry of the hash table and selectively updated in dependence on a comparison
between c and (j ) ;
wherein the condition "c<(j»8)" defines that a state value described by the variable c
is smaller than a state value described by the table entry ari_hash_m[i];
wherein "j&OxFF" describes a mapping rule index value described by the table entry
ari_hash_m[i];
wherein i_max is a variable initialized to designate a hash table index value of a last
entry of the hash table and selectively updated in dependence on a comparison
between c and (j»8);
wherein the condition "c>(j»8)" defines that a state value described by the variable c
is larger than a state value described by the table entry ari_hash_m[i];
wherein j is a variable;
wherein the return value designates an index pki of a probability model, and is a
mapping rule index value;
wherein ari_hash_m designates the hash table;
wherein ari_hash_m[i] designates an entry of the hash table ari_hash_m having hash
table index value i;
wherein ari_lookup_m designates a mapping table; and
wherein ari_lookup_m[i_max] designates an entry of the mapping table ari_lookup_m
having mapping table index value i_max.
3. The audio decoder (200; 800) according to claim 1 or 2,
wherein the arithmetic decoder is configured to select the mapping rule (297;
cum req ]) describing a mapping of a code value (value) onto a symbol code
(symbol) on the basis of the mapping rule index value pki,
4. The audio decoder (200; 800) according to claim 3,
wherein the arithmetic decoder is configured to use the mapping rule index value as a
table index value to select the mapping rule (297; cum_freq[]) describing a mapping of
a code value (value) onto a symbol code (symbol).
5. The audio decoder (200; 800) according to one of claims 1 to 4, wherein the
arithmetic decoder is configured to select one of the sub-tables (ari_cf_m[pki][17]) of
the table ari_cf_m[64][17] as given in Figs. 23(1), 23(2), 23(3) as the selected
mapping rule.
6. The audio decoder according to one of claims 1 to 5,
wherein the arithmetic decoder is configured to obtain the numeric current context
value on the basis of a numeric previous context value using the algorithm
c = c»4;
if (i0)
c = c + (q[l][i-l]);
if(i > 3) {
if((q[l][i-3] + q[l][i-2] + q[l][i-l]) < 5)
return(c+0x 10000);
}
return (c);
wherein the algorithm receives, as input values, a value or variable c representing the
numeric previous context value, and a value or variable i representing an index of a 2-
tuple of spectral values to decode in a vector of spectral values;
wherein a value or variable N represents a window length of a reconstruction window
of the frequency-domain- to-time-domain converter; and
wherein the algorithm provides, as an output value, an updated value or variable c
representing the numeric current context value;
wherein an operation "c»4" describes a shift to the right by 4 bits of the value or
variable c,
wherein q[0][i+l] designates a context subregion value associated with a previous
audio frame and having associated a higher frequency index i+1, higher by one, than a
current frequency index of a 2-tuple of spectral values to be currently decoded; and
wherein q[l][i-l] designates a context subregion value associated with a current audio
frame and having associated a smaller frequency index i-1, smaller by one, than a
current frequency index of a 2-tuple of spectral values to be currently decoded;
wherein q[l][i-2] designates a context subregion value associated with a current audio
frame and having associated a smaller frequency index i-2, smaller by two, than a
current frequency index of a 2-tuple of spectral values to be currently decoded;
wherein q[l][i-3] designates a context subregion value associated with a current audio
frame and having associated a smaller frequency index i-3, smaller by three, than a
current frequency index of a 2-tuple of spectral values to be currently decoded.
The audio decoder according to claim 6,
wherein the arithmetic decoder is configured to update a context subregion value
q[l][i] associated with a current audio frame and having associated the current
frequency index of the 2-tuple of spectral values currently decoded, using a
combination of a plurality of spectral values currently decoded.
The audio decoder according to one of claims 6 or 7,
wherein the arithmetic decoder is configured to update a context subregion value
q[l][i] associated with a current audio frame and having associated the frequency
index of a 2-tuple of spectral values currently decoded using an algorithm
{
q[l][i] = a+b+l;
if(q[l][i]>0xF)
q[l][i] = 0xF;
}
wherein a and b are decoded unsigned quantized spectral coefficients of the 2-tuple
currently decoded; and
wherein i is the frequency index of the 2-tuple of spectral values currently decoded,
The audio decoder according to one of claims 1 to 8,
wherein the arithmetic decoder is configured to provide a decoded value m
representing a 2-tuple of decoded spectral values using the arithmetic decoding
algorithm
{
if (arith_first_symbol()) {
value = 0;
for (i=l; i<=16; i++) {
value = (value«l) | arith_get_next_bit();
}
low = 0;
high = 65535;
}
range = high-low+1;
cum =(((( t ) (value-low+l))«14)-((int) l))/range;
p = cum_freq-l;
do {
q = p + (cfJ»l);
if ( *q > cum ) {p=q; cfl++; }
cfl»=l;
}
while ( cfl>l );
symbol = p-cum_freq+l;
if (symbol)
high = low + (range*cum_freq[symbol-l])»14 - 1;
low += (range * cum_freq[symbol])»14;
for (;;) {
if (high<32768) { }
else if(low>=32768) {
value -= 32768;
low -= 32768;
high -= 32768;
}
else if (low>=16384 && high<49152) {
value -= 16384;
low -= 16384;
high -= 16384;
}
else break;
low += low;
high += high+l;
value = (value«l) j arith_get_next_bit();
}
return symbol;
}
wherein "cum_freq" is a variable describing a start of a selected table or sub-table
(ari_cf_m[pki][17]) describing a mapping of a code value (value) onto a symbol code
(symbol);
wherein "cfl" is a value or variable describing a length of the selected table or subtable
(ari_cf_m[pki][17]) describing a mapping of a code value (value) onto a symbol
code (symbol);
wherein a helper function arith_first_symbol() returns true if a symbol to be decoded
is a first symbol of a sequence of symbols, and returns false otherwise;
wherein a helper function get_next_bit() provides a next bit of the bitstream;
wherein a variable "low" is a global variable;
wherein a variable "high" is a global variable;
wherein a variable "value" is a global variable;
wherein "range" is a variable;
wherein "cum" is a variable;
wherein "p" is a variable pointing to an element of the selected table or sub-table
(ari_cf_m[pki][17]) describing a mapping of a code value (value) onto a symbol code
(symbol);
wherein "q" is a variable pointing to an element of the selected table or sub-table
(ari_cf_m[pki][17]) describing a mapping of a code value (value) onto a symbol code
(symbol);
wherein "*q" is a table element or sub-table element of the selected table or sub-table
(ari_cf_m[pki][17]) describing a mapping of a code value (value) onto a symbol code
(symbol)to which the variable q points;
wherein a variable "symbol" is returned by the arithmetic decoding algorithm; and
wherein the arithmetic decoder is configured to derive most-significant-bitplanevalues
of a currently decoded 2-tuple of spectral values from a return value of the
arithmetic decoding algorithm.
10. An audio decoder (200; 800) for providing a decoded audio information (212; 812) on
the basis of an encoded audio information (210; 810), the audio decoder comprising:
an arithmetic decoder (230; 820) for providing a plurality of decoded spectral values
(232; 822) on the basis of an arithmetically encoded representation (222; 821) of the
spectral values; and
a frequency-domain-to-time-domain converter (260; 830) for providing a time-domain
audio representation (262; 812) using the decoded spectral values (232; 822), in order
to obtain the decoded audio information (212; 812);
wherein the arithmetic decoder (232; 820) is configured to select a mapping rule (297;
cum_freq[]) describing a mapping of a code value (value) representing a spectral
value, or a most significant bit-plane of a spectral value, in an encoded form, onto a
symbol code (symbol) representing a spectral value, or a most significant bit-palne of
a spectral value, in a decoded form, in dependence on a context state (s) described by a
numeric current context value (c);
wherein the arithmetic decoder (230; 820) is configured to determine the numeric
current context value (c) in dependence on a plurality of previously decoded spectral
values;
wherein the arithmetic decoder is configured to evaluate a hash table (ari_hash_m[]),
entries of which define both significant state values amongst the numeric context
values and boundaries of intervals of numeric context values, in order to select the
mapping rule,
wherein the hash table ari_hash_m is defined as given in Figs. 22(1), 22(2), 22(3) and
22(4);
wherein the arithmetic decoder is configured to evaluate the hash table (ari_hash_m),
to determine whether the numeric current context value is identical to a table context
value described by an entry of the hash table (ari_hash_m) or to determine an interval
described by entries of the hash table (ari_hash_m) within which the numeric current
context value lies, and to derive a mapping rule index value (pki) describing a selected
mapping rule in dependence on a result of the evaluation.
11. The audio decoder according to claim 1 ,
wherein the arithmetic decoder is configured to compare the numeric current context
value (c), or a scaled version (s) of the numeric current context value, with a series of
the numerically ordered entries (j=ari_hash_m[i]) or sub-entries of the hash table
(ari_hash_m[]),
to iteratively obtain a hash table index value (i_max) of a table entry
(ari_lookup_m[i_max]), such that the numeric current context value (c) lies within an
interval defined by the obtained hash table entry (ari_hash_m[i_max]) designated by
the obtained hash table index value (i_max) and an adjacent hash table entry
(ari_hash_m[i_max-l]), and
wherein the arithmetic decoder is configured to determine a next entry of the series of
entries of the hash table (ari_hash_m[]) in dependence on a result of a comparison
between the numeric current context value (c), or a scaled version (s) of the numeric
current context value, and a current entry or sub-entry (ari_hash_m[i]) of the hash
table.
The audio decoder according to claim ,
wherein the arithmetic decoder is configured to select a mapping rule defined by a
second sub-entry (j&OxFF) of the hash table (ari_hash_m) designated by the current
hash table index value (i) if it is found that the numeric current context value (c) or a
scaled version (s) thereof is equal to the first sub-entry j> > ) f m e a s table
designated by the current hast table index value (i).
13. The audio decoder according to claim 11 or 12,
wherein the arithmetic decoder is configured to select a mapping rule defined by an
entry or subentry (ari_lookup_m[i_max]) of a mapping table ari_lookup_m if it is not
found that the numeric current context value is equal to a sub-entry of the hash table
(ari_hash_m), wherein the arithmetic decoder is configured to choose the entry or subentry
of the mapping table in dependence on the iteratively obtained hash table index
value (i_max).
14. The audio decoder according to one of claims 10 to 13, wherein the arithmetic decoder
is configured to selectively provide a mapping rule index value defined by the entry of
the hash table designated by the current hash table index value if it is found that the
numeric current context value (c) equals to a value (j»8) defined by the entry
(ari_hash_m[i]) of the hash table designated by the current hash table index value (i).
15. A method for providing a decoded audio information (212; 812) on the basis of an
encoded audio information (210; 810), the method comprising:
providing a plurality of decoded spectral values (232; 822) on the basis of an
arithmetically encoded representation (222; 821) of the spectral values; and
providing a time-domain audio representation (262; 812) using the decoded spectral
values (232; 822), in order to obtain the decoded audio information (212; 812);
wherein providing the plurality of decoded spectral values comprises selecting a
mapping rule (297; cum_freq[]) describing a mapping of a code value (value)
representing a spectral value, or a most-significant bit-plane of a spectral value, in an
encoded form, onto a symbol code (symbol) representing a spectral value, or a most
significant bit-plane of a spectral value, in a decoded form, in dependence on a context
state (s) described by a numeric current context value (c);
wherein the numeric current context value (c) is determined in dependence on a
plurality of previously decoded spectral values;
wherein a hash table (ari_hash_m[]), entries of which define both significant state
values amongst the numeric context values and boundaries of intervals of numeric
context values, is evaluated in order to select the mapping rule,
wherein the hash table is evaluated using the algorithm
i = i_min;
while ((i_max-i_min)>l)
{
i = i_min+((i_max-i_min)/2);
if(c<(j»8))
i_max = i ;
else if(c>(i»8))
i_min=i;
else
return(j&OxFF);
}
return ari_lookup_m[i_max];
wherein c designates a variable representing the numeric current context value or a
scaled version thereof;
wherein i is a variable describing a current hash table index value;
wherein ijmin is a variable initialized to designate a hash table index value of a first
entry of the hash table and selectively updated in dependence on a comparison
between c and (j»8);
wherein the condition "c<(j»8)" defines that a state value described by the variable c
is smaller than a state value described by the table entry ari_hashjn[i] ;
wherein "j&OxFF" describes a mapping rule index value described by the table entry
ari_hash_m[i];
wherein ijmax is a variable initialized to designate a hash table index value of a last
entry of the hash table and selectively updated in dependence on a comparison
between c and (j»8);
wherein the condition "c>(j»8)" defines that a state value described by the variable c
is larger than a state value described by the table entry ari_hash_m[i];
wherein j is a variable;
wherein the return value designates an index pki of a probability model, and is a
mapping rule index value;
wherein ari_hash_m designates the hash table;
wherein ari_hash_m[i] designates an entry of the hash table ari hashjn having hash
table index value i;
wherein ari_lookup_m designates a mapping table;
wherein ari_lookup_m[i_max] designates an entry of the mapping table arMookup m
having mapping table index value i_max;
wherein the hash table ari_hash_m is defined as given in Figs. 22(1), 22(2), 22(3),
22(4) ; and
wherein the mapping table ari lookup m is defined as given in Fig. 21.
16. A method for providing a decoded audio information (212; 812) on the basis of an
encoded audio information (210; 810), the metod comprising:
providing a plurality of decoded spectral values (232; 822) on the basis of an
arithmetically encoded representation (222; 821) of the spectral values; and
providing a time-domain audio representation (262; 812) using the decoded spectral
values (232; 822), in order to obtain the decoded audio information (212; 812);
wherein providing a plurality of decoded spectral values comprises selecting a
mapping rule (297; cum_freq[]) describing a mapping of a code value (value)
representing a spectral value, or a most significant bit-plane of a spectral value, in an
encoded form onto a symbol code (symbol) representing a spectral value, or a most
significant bit-plane of a spectral value, in a decoded form, in dependence on a context
state (s) described by a numeric current context value (c);
wherein the numeric current context value (c) is determined in dependence on a
plurality of previously decoded spectral values;
wherein a hash table (ari_hash_m[]), entries of which define both significant state
values amongst the numeric context values and boundaries of intervals of numeric
context values, is evaluated in order to select the mapping rule,
wherein the hash table ari_hash_m is defined as given in Figs. 22(1), 22(2), 22(3) and
22(4);
wherein the hash table (ari_hash_m) is evaluated to determine whether the numeric
current context value is identical to a table context value described by an entry of the
hash table (ari_hash_m) or to determine an interval described by entries of the hash
table (ari_hash_m) within which the numeric current context value lies, and
wherein a mapping rule index value (pki) describing a selected mapping rule is derived
in dependence on a result of the evaluation.
17. An audio encoder (100; 700) for providing an encoded audio information ( 12; 712)
on the basis of an input audio information (110; 710), the audio encoder comprising:
an energy-compacting time-domain-to-frequency-domain converter (130; 720) for
providing a frequency-domain audio representation (132; 722) on the basis of a timedomain
representation (110; 710) of the input audio information, such that the
frequency-domain audio representation (132; 722) comprises a set of spectral values;
and
an arithmetic encoder (170; 730) configured to encode a spectral value (a) or a
preprocessed version thereof using a variable length codeword (acod_m, acod_r),
wherein the arithmetic encoder (170) is configured to map a spectral value (a), or a
value (m) of a most significant bit-plane of a spectral value (a), onto a code value
(acod_m),
wherein the arithmetic encoder is configured to select a mapping rule describing a
mapping of a spectral value, or of a most significant bit-plane of a spectral value, onto
a code value, in dependence on a context state (s) described by a numeric current
context value (c); and
wherein the arithmetic encoder is configured to determine the numeric current context
value (c) in dependence on a plurality of previously-encoded spectral values; and
wherein the arithmetic encoder is configured to evaluate a hash table, entries of which
define both significant state values amongst the numeric context values and boundaries
of intervals of numeric context values, in order to select the mapping rule,
wherein the hash table ari_hash_m is defined as given in Figs. 22(1), 22(2), 22(3) and
22(4);
wherein the arithmetic encoder is configured to evaluate the hash table (ari_hash_m),
to determine whether the numeric current context value is identical to a table context
value described by an entry of the hash table (ari_hash_m) or to determine an interval
described by entries of the hash table (ari_hash_m) within which the numeric current
context value lies, and to derive a mapping rule index value (pki) describing a selected
mapping rule in dependence on a result of the evaluation.
A method for providing an encoded audio information (112; 712) on the basis of an
input audio information (110; 710), the method comprising:
providing a frequency-domain audio representation (132; 722) on the basis of a timedomain
representation ( 1 10; 710) of the input audio information, such that the
frequency-domain audio representation (132; 722) comprises a set of spectral values;
and
encoding a spectral value (a) or a preprocessed version thereof using a variable length
codeword (acod_m, acod_r), wherein a spectral value (a), or a value (m) of a most
significant bit-plane of a spectral value (a), is mapped onto a code value (acodjtn),
wherein a mapping rule describing a mapping of a spectral value, or of a most
significant bit-plane of a spectral value, onto a code value, is selected in dependence
on a context state (s) described by a numeric current context value (c); and
wherein the numeric current context value (c) is determined in dependence on a
plurality of previously-encoded spectral values; and
wherein a hash table, entries of which define both significant state values amongst the
numeric context values and boundaries of intervals of numeric context values, is
evaluated in order to select the mapping rule,
wherein the hash table ari_hash_m is defined as given in Figs. 22(1), 22(2), 22(3) and
22(4); and
wherein the hash table (ari_hash_m) is evaluated, to determine whether the numeric
current context value is identical to a table context value described by an entry of the
hash table (ari_hash_m) or to determine an interval described by entries of the hash
table (ari_hash_m) within which the numeric current context value lies, and wherein a
mapping rule index value (pki) describing a selected mapping rule is derived in
dependence on a result of the evaluation.
1 . A computer program for performing the method according to claim 16 or claim 18,
when the computer program runs on a computer.