Abstract: An audio encoder comprises a first information sink oriented encoding branch such as a spectral domain encoding branch, a second information source or SNR oriented encoding branch such as an LPC-domain encoding branch, and a switch for switching between the first encoding branch and the second encoding branch, wherein the second encoding branch comprises a converter into a specific domain different from the spectral domain, such as an LPC analysis stage generating an excitation signal, and wherein the second encoding branch furthermore comprises a specific domain coding branch, such as an LPC domain processing branch, and a specific spectral domain coding branch, such as an LPC spectral domain processing branch, and an additional switch for switching between the specific domain coding branch and the specific spectral domain coding branch. An audio decoder comprises a first domain decoder such as a spectral domain decoding branch, a second domain decoder such as an LPC domain decoding branch for decoding a signal such as an excitation signal in the second domain, a third domain decoder such as an LPC-spectral decoder branch, and two cascaded switches for switching between the decoders.
Low Bitrate Audio Encoding/Decoding Scheme Having Cascaded
Switches
Field of the Invention
The present invention is related to audio coding and, par-
ticularly, to low bit rate audio coding schemes.
Background of the Invention and Prior Art
In the art, frequency domain coding schemes such as MP3 or
AAC are known. These frequency-domain encoders are based on
a time-domain/frequency-domain conversion, a subsequent
quantization stage, in which the quantization error is con-
trolled using information from a psychoacoustic module, and
an encoding stage, in which the quantized spectral coeffi-
cients and corresponding side information are entropy-
encoded using code tables.
On the other hand, there are encoders that are very well suited to speech processing, such as AMR-WB+ as described in 3GPP TS 26.290. Such speech coding schemes perform Linear Predictive filtering of a time-domain signal. The LP filter is derived from a Linear Prediction analysis of the input time-domain signal. The resulting LP filter coefficients are then quantized/coded and transmitted as side information. The process is known as Linear Prediction Coding (LPC). At the output of the filter, the prediction residual signal or prediction error signal, which is also known as the excitation signal, is encoded using the analysis-by-synthesis stages of the ACELP encoder or, alternatively, is encoded using a transform encoder which applies a Fourier transform with an overlap. The decision between ACELP coding and Transform Coded eXcitation (TCX) coding is made using a closed-loop or an open-loop algorithm.
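To make the LPC step concrete, the following minimal Python sketch (not the AMR-WB+ implementation) shows how LP filter coefficients can be estimated from a frame by the autocorrelation method and how the prediction residual, i.e., the excitation signal, is obtained by filtering through the whitening filter 1-A(z); the prediction order and the frame handling are illustrative assumptions.

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    def lpc_residual(frame, order=16):
        """Estimate LPC coefficients by the autocorrelation method and
        return the prediction residual (excitation), i.e., the output of
        the whitening filter 1 - A(z)."""
        acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = solve_toeplitz(acf[:order], acf[1:order + 1])  # normal equations
        residual = lfilter(np.concatenate(([1.0], -a)), [1.0], frame)
        return a, residual

    # usage: a, excitation = lpc_residual(np.random.randn(320))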
Frequency-domain audio coding schemes such as the High Efficiency AAC (HE-AAC) encoding scheme, which combines an AAC coding scheme with a spectral band replication technique, can also be combined with a joint stereo or multi-channel coding tool known under the term "MPEG Surround".
On the other hand, speech encoders such as the AMR-WB+ also
have a high frequency enhancement stage and a stereo func-
tionality.
Frequency-domain coding schemes are advantageous in that
they show a high quality at low bitrates for music signals.
Problematic, however, is the quality of speech signals at
low bitrates.
Speech coding schemes show a high quality for speech sig-
nals even at low bitrates, but show a poor quality for mu-
sic signals at low bitrates.
Summary of the Invention
It is an object of the present invention to provide an im-
proved encoding/decoding concept.
This object is achieved by an audio encoder in accordance
with claim 1, a method of audio encoding in accordance with
claim 15, a decoder in accordance with claim 16, a method
of decoding in accordance with claim 23, an encoded signal
in accordance with claim 24 or a computer program in accor-
dance with claim 25.
One aspect of the present invention is an audio encoder for
encoding an audio input signal, the audio input signal be-
ing in a first domain, comprising: a first coding branch
for encoding an audio signal using a first coding algorithm
to obtain a first encoded signal; a second coding branch
for encoding an audio signal using a second coding algo-
rithm to obtain a second encoded signal, wherein the first
coding algorithm is different from the second coding algo-
rithm; and a first switch for switching between the first
coding branch and the second coding branch so that, for a
portion of the audio input signal, either the first encoded
signal or the second encoded signal is in an encoder output
signal, wherein the second coding branch comprises: a con-
verter for converting the audio signal into a second domain
different from the first domain, a first processing branch
for processing an audio signal in the second domain to ob-
tain a first processed signal; a second processing branch
for converting a signal into a third domain different from
the first domain and the second domain and for processing
the signal in the third domain to obtain a second processed
signal; and a second switch for switching between the first
processing branch and the second processing branch so that,
for a portion of the audio signal input into the second
coding branch, either the first processed signal or the
second processed signal is in the second encoded signal.
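The cascaded structure just described can be summarized in the following Python control-flow sketch; every coder and decision function here is a hypothetical stub standing in for the real branches, not something defined by this text.

    import numpy as np

    # Hypothetical stubs for the coders and the decision logic.
    def decide_first_switch(x):  return np.mean(np.abs(np.diff(np.sign(x)))) > 0.5
    def to_second_domain(x):     return np.diff(x, prepend=x[0])  # e.g. LPC-like whitening
    def encode_first(y):         return ("first_encoded", y)      # e.g. spectral coding
    def encode_proc1(y):         return ("first_processed", y)    # stays in 2nd domain
    def encode_proc2(y):         return ("second_processed", np.fft.rfft(y))  # 3rd domain
    def decide_second_switch(y): return np.var(y) > 1.0

    def encode_portion(x):
        """First switch: branch 1 or branch 2; inside branch 2 a converter
        changes the domain and the second switch selects a processing branch."""
        if decide_first_switch(x):
            return encode_first(x)        # first coding branch
        y = to_second_domain(x)           # converter of the second coding branch
        if decide_second_switch(y):
            return encode_proc1(y)        # first processing branch
        return encode_proc2(y)            # second processing branch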
A further aspect is a decoder for decoding an encoded audio
signal, the encoded audio signal comprising a first coded
signal, a first processed signal in a second domain, and a
second processed signal in a third domain, wherein the
first coded signal, the first processed signal, and the
second processed signal are related to different time por-
tions of a decoded audio signal, and wherein a first do-
main, the second domain and the third domain are different
from each other, comprising: a first decoding branch for
decoding the first coded signal based on a first coding
algorithm; a second decoding branch for decoding the first
processed signal or the second processed signal, wherein
the second decoding branch comprises a first inverse
processing branch for inverse processing the first
processed signal to obtain a first inverse processed signal
in the second domain; a second inverse processing branch
for inverse processing the second processed signal to ob-
tain a second inverse processed signal in the second do-
main; a first combiner for combining the first inverse
processed signal and the second inverse processed signal to
obtain a combined signal in the second domain; and a con-
verter for converting the combined signal to the first do-
main; and a second combiner for combining the converted
signal in the first domain and the decoded first signal
output by the first decoding branch to obtain a decoded
output signal in the first domain.
In a preferred embodiment of the present invention, two
switches are provided in a sequential order, where a first
switch decides between coding in the spectral domain using
a frequency-domain encoder and coding in the LPC-domain,
i.e., processing the signal at the output of an LPC analy-
sis stage. The second switch is provided for switching in
the LPC-domain in order to encode the LPC-domain signal ei-
ther in the LPC-domain such as using an ACELP coder or cod-
ing the LPC-domain signal in an LPC-spectral domain, which
requires a converter for converting the LPC-domain signal
into an LPC-spectral domain, which is different from a
spectral domain, since the LPC-spectral domain shows the
spectrum of an LPC filtered signal rather than the spectrum
of the time-domain signal.
The first switch decides between two processing branches, where one branch is mainly motivated by a sink model and/or a psychoacoustic model, i.e., by auditory masking, and the other one is mainly motivated by a source model and by segmental SNR calculations. Exemplarily, one branch has a frequency domain encoder and the other branch has an LPC-based encoder such as a speech coder. The source model is usually a speech processing model and, therefore, LPC is commonly used. The second switch again decides between two processing branches, but in a domain different from the "outer" first branch domain. Again, one "inner" branch is mainly motivated by a source model or by SNR calculations, and the other "inner" branch can be motivated by a sink model and/or a psychoacoustic model, i.e., by masking, or at least includes frequency/spectral domain coding aspects. Exemplarily, one "inner" branch has a frequency domain encoder/spectral converter and the other branch has an encoder coding in the other domain such as the LPC domain, wherein this encoder is, for example, a CELP or ACELP quantizer/scaler processing an input signal without a spectral conversion.
A further preferred embodiment is an audio encoder compris-
ing a first information sink oriented encoding branch such
as a spectral domain encoding branch, a second information
source or SNR oriented encoding branch such as an LPC-
domain encoding branch, and a switch for switching between
the first encoding branch and the second encoding branch,
wherein the second encoding branch comprises a converter into a specific domain different from the time domain, such as an LPC analysis stage generating an excitation signal, and wherein the second encoding branch furthermore comprises a specific domain coding branch, such as an LPC domain processing branch, and a specific spectral domain coding branch, such as an LPC spectral domain processing branch, and an additional switch for switching between the specific domain coding branch and the specific spectral domain coding branch.
A further embodiment of the invention is an audio decoder comprising a first domain decoder such as a spectral domain decoding branch, a second domain decoder such as an LPC domain decoding branch for decoding a signal such as an excitation signal in the second domain, and a third domain decoder such as an LPC-spectral decoder branch for decoding a signal such as an excitation signal in a third domain such as an LPC spectral domain, wherein the third domain is obtained by performing a frequency conversion from the second domain, wherein a first switch for switching between the second domain signal and the third domain signal is provided, and wherein a second switch for switching between the first domain decoder and the decoder for the second domain or the third domain is provided.
Brief Description of the Drawings
Preferred embodiments of the present invention are subse-
quently described with respect to the attached drawings, in
which:
Fig. 1a is a block diagram of an encoding scheme in accordance with a first aspect of the present invention;
Fig. 1b is a block diagram of a decoding scheme in accordance with the first aspect of the present invention;
Fig. 1c is a block diagram of an encoding scheme in accordance with a further aspect of the present invention;
Fig. 2a is a block diagram of an encoding scheme in ac-
cordance with a second aspect of the present in-
vention;
Fig. 2b is a schematic diagram of a decoding scheme in accordance with the second aspect of the present invention;
Fig. 2c is a block diagram of an encoding scheme in accordance with a further aspect of the present invention;
Fig. 3a illustrates a block diagram of an encoding scheme
in accordance with a further aspect of the
present invention;
Fig. 3b illustrates a block diagram of a decoding scheme
in accordance with the further aspect of the
present invention;
Fig. 3c illustrates a schematic representation of the en-
coding apparatus/method with cascaded switches;
Fig. 3d illustrates a schematic diagram of an apparatus
or method for decoding, in which cascaded combin-
ers are used;
Fig. 3e illustrates a time domain signal and a corresponding representation of the encoded signal, illustrating short cross fade regions which are included in both encoded signals;
Fig. 4a illustrates a block diagram with a switch posi-
tioned before the encoding branches;
Fig. 4b illustrates a block diagram of an encoding scheme
with the switch positioned subsequent to encoding
the branches;
Fig. 4c illustrates a block diagram for a preferred com-
biner embodiment;
Fig. 5a illustrates a wave form of a time domain speech
segment as a quasi-periodic or impulse-like sig-
nal segment;
Fig. 5b illustrates a spectrum of the segment of Fig. 5a;
Fig. 5c illustrates a time domain speech segment of unvoiced speech as an example for a noise-like segment;
Fig. 5d illustrates a spectrum of the time domain wave
form of Fig. 5c;
Fig. 6 illustrates a block diagram of an analysis by
synthesis CELP encoder;
Figs. 7a to 7d illustrate voiced/unvoiced excitation sig-
nals as an example for impulse-like signals;
Fig. 7e illustrates an encoder-side LPC stage providing
short-term prediction information and the predic-
tion error (excitation) signal;
Fig. 7f illustrates a further embodiment of an LPC device
for generating a weighted signal;
Fig. 7g illustrates an implementation for transforming a
weighted signal into an excitation signal by ap-
plying an inverse weighting operation and a sub-
sequent excitation analysis as required in the
converter 537 of Fig. 2b;
Fig. 8 illustrates a block diagram of a joint multi-
channel algorithm in accordance with an embodi-
ment of the present invention;
Fig. 9 illustrates a preferred embodiment of a bandwidth
extension algorithm;
Fig. 10a illustrates a detailed description of the switch
when performing an open loop decision; and
Fig. 10b illustrates the switch when operating in a closed loop decision mode.
Detailed Description of Preferred Embodiments
Fig. 1a illustrates an embodiment of the invention having two cascaded switches. A mono signal, a stereo signal or a multi-channel signal is input into a switch 200. The switch 200 is controlled by a decision stage 300. The decision stage receives, as an input, the signal input into block 200. Alternatively, the decision stage 300 may also receive side information which is included in the mono signal, the stereo signal or the multi-channel signal, or which is at least associated with such a signal, i.e., information which was, for example, generated when the mono signal, the stereo signal or the multi-channel signal was originally produced.
The decision stage 300 actuates the switch 200 in order to feed a signal either into a frequency encoding portion 400 illustrated at an upper branch of Fig. 1a or into an LPC-domain encoding portion 500 illustrated at a lower branch in Fig. 1a. A key element of the frequency domain encoding branch
is a spectral conversion block 410 which is operative to
convert a common preprocessing stage output signal (as dis-
cussed later on) into a spectral domain. The spectral con-
version block may include an MDCT algorithm, a QMF, an FFT
algorithm, a Wavelet analysis or a filterbank such as a
critically sampled filterbank having a certain number of
filterbank channels, where the subband signals in this fil-
terbank may be real valued signals or complex valued sig-
nals. The output of the spectral conversion block 410 is
encoded using a spectral audio encoder 421, which may in-
clude processing blocks as known from the AAC coding
scheme.
Generally, the processing in branch 400 is a processing in
a perception based model or information sink model. Thus,
this branch models the human auditory system receiving
sound. Contrary thereto, the processing in branch 500 is to
generate a signal in the excitation, residual or LPC do-
main. Generally, the processing in branch 500 is a
processing in a speech model or an information generation
model. For speech signals, this model is a model of the hu-
man speech/sound generation system generating sound. If,
however, a sound from a different source requiring a dif-
ferent sound generation model is to be encoded, then the
processing in branch 500 may be different.
In the lower encoding branch 500, a key element is an LPC device 510, which outputs LPC information used
for controlling the characteristics of an LPC filter. This
LPC information is transmitted to a decoder. The LPC stage
510 output signal is an LPC-domain signal which consists of
an excitation signal and/or a weighted signal.
The LPC device generally outputs an LPC domain signal,
which can be any signal in the LPC domain such as the exci-
tation signal in Fig. 7e or a weighted signal in Fig. 7f or
any other signal, which has been generated by applying LPC
filter coefficients to an audio signal. Furthermore, an LPC
device can also determine these coefficients and can also
quantize/encode these coefficients.
The decision in the decision stage can be signal-adaptive
so that the decision stage performs a music/speech discrim-
ination and controls the switch 200 in such a way that mu-
sic signals are input into the upper branch 400, and speech
signals are input into the lower branch 500. In one embodi-
ment, the decision stage is feeding its decision informa-
tion into an output bit stream so that a decoder can use
this decision information in order to perform the correct
decoding operations.
Such a decoder is illustrated in Fig. 1b. The signal output by the spectral audio encoder 421 is, after transmission, input into a spectral audio decoder 431. The output of the spectral audio decoder 431 is input into a time-domain converter 440. Analogously, the output of the LPC domain encoding branch 500 of Fig. 1a is received on the decoder side and processed by elements 531, 533, 534, and 532 for obtaining an LPC excitation signal. The LPC excitation signal
is input into an LPC synthesis stage 540, which receives,
as a further input, the LPC information generated by the
corresponding LPC analysis stage 510. The output of the
time-domain converter 440 and/or the output of the LPC syn-
thesis stage 540 are input into a switch 600. The switch
600 is controlled via a switch control signal which was,
for example, generated by the decision stage 300, or which
was externally provided such as by a creator of the origi-
nal mono signal, stereo signal or multi-channel signal. The
output of the switch 600 is a complete mono signal, stereo
signal or multichannel signal.
The input signal into the switch 200 and the decision stage
300 can be a mono signal, a stereo signal, a multi-channel
signal or generally an audio signal. Depending on the deci-
sion which can be derived from the switch 200 input signal
or from any external source such as a producer of the orig-
inal audio signal underlying the signal input into stage
200, the switch switches between the frequency encoding
branch 400 and the LPC encoding branch 500. The frequency
encoding branch 400 comprises a spectral conversion stage
410 and a subsequently connected quantizing/coding stage
421. The quantizing/coding stage can include any of the
functionalities as known from modern frequency-domain en-
coders such as the AAC encoder. Furthermore, the quantiza-
tion operation in the quantizing/coding stage 421 can be
controlled via a psychoacoustic module which generates psy-
choacoustic information such as a psychoacoustic masking
threshold over the frequency, where this information is in-
put into the stage 421.
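As a rough illustration of how such a masking threshold can steer the quantizer in stage 421 (the actual AAC-style rate/distortion loops are considerably more elaborate), one may choose per-coefficient step sizes so that the uniform quantization noise power, step²/12, stays below the threshold; the function and its arguments are illustrative only.

    import numpy as np

    def quantize_below_mask(spectrum, mask):
        """Pick per-coefficient quantizer steps so that the expected uniform
        quantization noise power (step^2 / 12) stays below the masking
        threshold, then quantize. Sketch only."""
        step = np.sqrt(12.0 * mask)
        indices = np.round(spectrum / step).astype(int)
        return indices, step  # a decoder would reconstruct as indices * step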
In the LPC encoding branch, the switch output signal is
processed via an LPC analysis stage 510 generating LPC side
info and an LPC-domain signal. The excitation encoder in-
ventively comprises an additional switch for switching the
further processing of the LPC-domain signal between a quan-
tization/coding operation 522 in the LPC-domain or a quan-
tization/coding stage 524, which is processing values in
the LPC-spectral domain. To this end, a spectral converter
523 is provided at the input of the quantizing/coding stage
524. The switch 521 is controlled in an open loop fashion
or a closed loop fashion depending on specific settings as,
for example, described in the AMR-WB+ technical specifica-
tion.
For the closed loop control mode, the encoder additionally
includes an inverse quantizer/coder 531 for the LPC domain
signal, an inverse quantizer/coder 533 for the LPC spectral
domain signal and an inverse spectral converter 534 for the
output of item 533. Both encoded and again decoded signals
in the processing branches of the second encoding branch
are input into the switch control device 525. In the switch
control device 525, these two output signals are compared
to each other and/or to a target function or a target func-
tion is calculated which may be based on a comparison of
the distortion in both signals so that the signal having
the lower distortion is used for deciding, which position
the switch 521 should take. Alternatively, in case both
branches provide non-constant bit rates, the branch provid-
ing the lower bit rate might be selected even when the sig-
nal to noise ratio of this branch is lower than the signal
to noise ratio of the other branch. Alternatively, the tar-
get function could use, as an input, the signal to noise
ratio of each signal and a bit rate of each signal and/or
additional criteria in order to find the best decision for
a specific goal. If, for example, the goal is such that the
bit rate should be as low as possible, then the target
function would heavily rely on the bit rate of the two sig-
nals output by the elements 531, 534. However, when the
main goal is to have the best quality for a certain bit
rate, then the switch control 525 might, for example, dis-
card each signal which is above the allowed bit rate and
when both signals are below the allowed bit rate, the
switch control would select the signal having the better
signal to noise ratio, i.e., having the smaller quantiza-
tion/coding distortions.
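A minimal sketch of such a closed-loop decision in the switch control 525 might look as follows: the locally decoded outputs of both processing branches are compared by SNR, and candidates above a bit budget are discarded first. The exact target function is left open by the text; this is one plausible instance.

    import numpy as np

    def snr_db(reference, decoded):
        noise = np.sum((reference - decoded) ** 2)
        return 10.0 * np.log10(np.sum(reference ** 2) / max(noise, 1e-12))

    def closed_loop_decision(x, dec1, bits1, dec2, bits2, bit_budget=None):
        """Return 1 or 2 for the winning processing branch: candidates over
        the bit budget are discarded, then the higher SNR (i.e., the lower
        quantization/coding distortion) wins."""
        candidates = [(snr_db(x, dec1), bits1, 1), (snr_db(x, dec2), bits2, 2)]
        if bit_budget is not None:
            allowed = [c for c in candidates if c[1] <= bit_budget]
            candidates = allowed or candidates  # fall back if both exceed the budget
        return max(candidates)[2]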
The decoding scheme in accordance with the present invention is, as stated before, illustrated in Fig. 1b. For
each of the three possible output signal kinds, a specific
decoding/re-quantizing stage 431, 531 or 533 exists. While
stage 431 outputs a time-spectrum which is converted into
the time-domain using the frequency/time converter 440,
stage 531 outputs an LPC-domain signal, and item 533 out-
puts an LPC-spectrum. In order to make sure that the input
signals into switch 532 are both in the LPC-domain, the
LPC-spectrum/LPC-converter 534 is provided. The output data
of the switch 532 is transformed back into the time-domain
using an LPC synthesis stage 540, which is controlled via
encoder-side generated and transmitted LPC information.
Then, subsequent to block 540, both branches have time-
domain information which is switched in accordance with a
switch control signal in order to finally obtain an audio
signal such as a mono signal, a stereo signal or a multi-
channel signal, which depends on the signal input into the
encoding scheme of Fig. 1a.
Fig. 1c illustrates a further embodiment with a different
arrangement of the switch 521 similar to the principle of
Fig. 4b.
Fig. 2a illustrates a preferred encoding scheme in accor-
dance with a second aspect of the invention. A common pre-
processing scheme connected to the switch 200 input may
comprise a surround/joint stereo block 101 which generates,
as an output, joint stereo parameters and a mono output
signal, which is generated by downmixing the input signal
which is a signal having two or more channels. Generally,
the signal at the output of block 101 can also be a signal
having more channels, but due to the downmixing functional-
ity of block 101, the number of channels at the output of
block 101 will be smaller than the number of channels input
into block 101.
The common preprocessing scheme may, alternatively to the block 101 or in addition to the block 101, comprise a bandwidth extension stage 102. In the Fig. 2a embodiment, the
output of block 101 is input into the bandwidth extension
block 102 which, in the encoder of Fig. 2a, outputs a band-
limited signal such as the low band signal or the low pass
signal at its output. Preferably, this signal is downsam-
pled (e.g. by a factor of two) as well. Furthermore, for
the high band of the signal input into block 102, bandwidth
extension parameters such as spectral envelope parameters,
inverse filtering parameters, noise floor parameters etc.
as known from the HE-AAC profile of MPEG-4 are generated and
forwarded to a bitstream multiplexer 800.
Preferably, the decision stage 300 receives the signal in-
put into block 101 or input into block 102 in order to de-
cide between, for example, a music mode or a speech mode.
In the music mode, the upper encoding branch 400 is se-
lected, while, in the speech mode, the lower encoding
branch 500 is selected. Preferably, the decision stage ad-
ditionally controls the joint stereo block 101 and/or the
bandwidth extension block 102 to adapt the functionality of
these blocks to the specific signal. Thus, when the deci-
sion stage determines that a certain time portion of the
input signal is of the first mode such as the music mode,
then specific features of block 101 and/or block 102 can be
controlled by the decision stage 300. Alternatively, when
the decision stage 300 determines that the signal is in a
speech mode or, generally, in a second LPC-domain mode,
then specific features of blocks 101 and 102 can be con-
trolled in accordance with the decision stage output.
Preferably, the spectral conversion of the coding branch
400 is done using an MDCT operation which, even more pre-
ferably, is the time-warped MDCT operation, where the
strength or, generally, the warping strength can be con-
trolled between zero and a high warping strength. At zero warping strength, the MDCT operation in block 411 is a straightforward MDCT operation known in the art. The time
warping strength together with time warping side informa-
tion can be transmitted/input into the bitstream multiplex-
er 800 as side information.
In the LPC encoding branch, the LPC-domain encoder may in-
clude an ACELP core 526 calculating a pitch gain, a pitch
lag and/or codebook information such as a codebook index
and gain. The TCX mode as known from 3GPP TS 26.290 involves processing a perceptually weighted signal in the transform domain. A Fourier transformed weighted signal is
quantized using a split multi-rate lattice quantization
(algebraic VQ) with noise factor quantization. A transform
is calculated in 1024, 512, or 256 sample windows. The ex-
citation signal is recovered by inverse filtering the quan-
tized weighted signal through an inverse weighting filter.
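The TCX idea of transform coding the weighted signal can be sketched as follows, with a plain FFT and scalar quantization standing in for the windowed transform and the split multi-rate lattice (algebraic) VQ of AMR-WB+; this is an illustration, not the standardized procedure.

    import numpy as np

    def tcx_sketch(weighted, step=0.05):
        """Transform the weighted signal, quantize the spectrum (a stand-in
        for algebraic VQ with noise factor quantization), transform back."""
        spectrum = np.fft.rfft(weighted)
        quantized = np.round(spectrum / step) * step
        return np.fft.irfft(quantized, n=len(weighted))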
In the first coding branch 400, a spectral converter pre-
ferably comprises a specifically adapted MDCT operation
having certain window functions followed by a quantiza-
tion/entropy encoding stage which may consist of a single
vector quantization stage, but preferably is a combined
scalar quantizer/entropy coder similar to the quantiz-
er/coder in the frequency domain coding branch, i.e., in
item 421 of Fig. 2a.
In the second coding branch, there is the LPC block 510
followed by a switch 521, again followed by an ACELP block
526 or a TCX block 527. ACELP is described in 3GPP TS
26.190 and TCX is described in 3GPP TS 26.290. Generally,
the ACELP block 526 receives an LPC excitation signal as
calculated by a procedure as described in Fig. 7e. The TCX
block 527 receives a weighted signal as generated by Fig.
7f.
In TCX, the transform is applied to the weighted signal computed by filtering the input signal through an LPC-based weighting filter. The weighting filter used in preferred embodiments of the invention is given by (1-A(z/γ))/(1-µz⁻¹).
Thus, the weighted signal is an LPC domain signal and its transform is in the LPC-spectral domain. The signal processed
by ACELP block 526 is the excitation signal and is differ-
ent from the signal processed by the block 527, but both
signals are in the LPC domain.
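Using this notation, the weighted signal can be computed with a standard IIR filtering routine; the bandwidth expansion factor γ and the de-emphasis factor µ below are illustrative (AMR-WB+-like) values, not mandated by the text.

    import numpy as np
    from scipy.signal import lfilter

    def weight(x, a, gamma=0.92, mu=0.68):
        """Apply the weighting filter (1 - A(z/gamma)) / (1 - mu z^-1), where
        a[k] are the LPC coefficients of A(z) = sum_k a[k] z^-(k+1)."""
        numerator = np.concatenate(([1.0], -a * gamma ** np.arange(1, len(a) + 1)))
        return lfilter(numerator, [1.0, -mu], x)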
At the decoder side illustrated in Fig. 2b, after the inverse spectral transform in block 537, the inverse of the weighting filter is applied, that is (1-µz⁻¹)/(1-A(z/γ)). Then, the signal is filtered through (1-A(z)) to go to the LPC excitation domain. Thus, the conversion to the LPC domain in block 534 and the TCX⁻¹ block 537 include the inverse transform and then filtering through (1-µz⁻¹)/(1-A(z/γ)) and subsequently through (1-A(z)) to convert from the weighted domain to the excitation domain.
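Correspondingly, the decoder-side path from the weighted domain to the excitation domain can be sketched as the inverse weighting filter followed by 1-A(z), mirroring blocks 537 and 534, again under the same illustrative γ and µ.

    import numpy as np
    from scipy.signal import lfilter

    def weighted_to_excitation(w, a, gamma=0.92, mu=0.68):
        """Filter through (1 - mu z^-1)/(1 - A(z/gamma)) to undo the
        weighting, then through (1 - A(z)) to reach the excitation domain."""
        weighting_num = np.concatenate(([1.0], -a * gamma ** np.arange(1, len(a) + 1)))
        x = lfilter([1.0, -mu], weighting_num, w)              # inverse weighting
        return lfilter(np.concatenate(([1.0], -a)), [1.0], x)  # 1 - A(z)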
Although item 510 in Figs. 1a, 1c, 2a, 2c illustrates a
single block, block 510 can output different signals as
long as these signals are in the LPC domain. The actual
mode of block 510 such as the excitation signal mode or the
weighted signal mode can depend on the actual switch state.
Alternatively, the block 510 can have two parallel
processing devices, where one device is implemented similar
to Fig. 7e and the other device is implemented as Fig. 7f.
Hence, the LPC domain at the output of 510 can represent
either the LPC excitation signal or the LPC weighted signal
or any other LPC domain signal.
In the second encoding branch (ACELP/TCX) of Fig. 2a or 2c, the signal is preferably pre-emphasized through a filter 1-0.68z⁻¹ before encoding. At the ACELP/TCX decoder in Fig. 2b, the synthesized signal is de-emphasized with the filter 1/(1-0.68z⁻¹). The pre-emphasis can be part of the LPC block 510, where the signal is pre-emphasized before LPC analysis and quantization. Similarly, de-emphasis can be part of the LPC synthesis block LPC⁻¹ 540.
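The pre-emphasis and de-emphasis filters are exact inverses of each other, which the following snippet verifies numerically; the random frame is merely a stand-in for audio.

    import numpy as np
    from scipy.signal import lfilter

    x = np.random.randn(1024)
    pre = lfilter([1.0, -0.68], [1.0], x)      # pre-emphasis: 1 - 0.68 z^-1
    back = lfilter([1.0], [1.0, -0.68], pre)   # de-emphasis: 1/(1 - 0.68 z^-1)
    assert np.allclose(back, x)                # the two filters cancel exactly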
Fig. 2c illustrates a further embodiment for the implemen-
tation of Fig. 2a, but with a different arrangement of the
switch 521 similar to the principle of Fig. 4b.
In a preferred embodiment, the first switch 200 (see Fig. 1a or 2a) is controlled through an open-loop decision (as in Fig. 4a) and the second switch is controlled through a closed-loop decision (as in Fig. 4b).
For example, Fig. 2c has the second switch placed after
the ACELP and TCX branches as in Fig. 4b. Then, in the
first processing branch, the first LPC domain represents
the LPC excitation, and in the second processing branch,
the second LPC domain represents the LPC weighted signal.
That is, the first LPC domain signal is obtained by filtering through (1-A(z)) to convert to the LPC residual domain, while the second LPC domain signal is obtained by filtering through the filter (1-A(z/γ))/(1-µz⁻¹) to convert to the LPC weighted domain.
Fig. 2b illustrates a decoding scheme corresponding to the
encoding scheme of Fig. 2a. The bitstream generated by bit-
stream multiplexer 800 of Fig. 2a is input into a bitstream
demultiplexer 900. Depending on information derived, for example, from the bitstream via a mode detection block 601,
a decoder-side switch 600 is controlled to either forward
signals from the upper branch or signals from the lower
branch to the bandwidth extension block 701. The bandwidth
extension block 701 receives, from the bitstream demultip-
lexer 900, side information and, based on this side infor-
mation and the output of the mode decision 601, recon-
structs the high band based on the low band output by
switch 600.
The full band signal generated by block 701 is input into
the joint stereo/surround processing stage 702, which re-
constructs two stereo channels or several multi-channels.
Generally, block 702 will output more channels than were
input into this block. Depending on the application, the input into block 702 may include two channels, such as in a stereo mode, or may include more channels, as long as the output of this block has more channels than the input into this block.
The switch 200 has been shown to switch between both
branches so that only one branch receives a signal to
process and the other branch does not receive a signal to
process. In an alternative embodiment, however, the switch may also be arranged subsequent to, for example, the audio encoder 421 and the excitation encoder 522, 523, 524, which
means that both branches 400, 500 process the same signal
in parallel. In order to not double the bitrate, however,
only the signal output by one of those encoding branches
400 or 500 is selected to be written into the output bit-
stream. The decision stage will then operate so that the
signal written into the bitstream minimizes a certain cost
function, where the cost function can be the generated bi-
trate or the generated perceptual distortion or a combined
rate/distortion cost function. Therefore, either in this
mode or in the mode illustrated in the Figures, the deci-
sion stage can also operate in a closed loop mode in order
to make sure that, finally, only the encoding branch output
is written into the bitstream which has for a given percep-
tual distortion the lowest bitrate or, for a given bitrate,
has the lowest perceptual distortion. In the closed loop
mode, the feedback input may be derived from outputs of the three quantizer/scaler blocks 421, 522 and 524 in Fig. 1a.
In the implementation having two switches, i.e., the first
switch 200 and the second switch 521, it is preferred that
the time resolution for the first switch is lower than the
time resolution for the second switch. Stated differently,
the blocks of the input signal into the first switch, which
can be switched via a switch operation are larger than the
blocks switched by the second switch operating in the LPC-
domain. Exemplarily, the frequency domain/LPC-domain switch
200 may switch blocks of a length of 1024 samples, and the
second switch 521 can switch blocks having 256 samples
each.
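These two granularities can be pictured as nested time grids, the inner 256-sample grid subdividing the outer 1024-sample grid; a small sketch using the example sizes from the text:

    OUTER, INNER = 1024, 256  # example frame lengths for switch 200 and switch 521

    def switching_grid(n_samples):
        """List (outer_block, inner_subblock) index pairs: the first switch
        may change state once per OUTER samples, the second once per INNER."""
        return [(start // OUTER, (start % OUTER) // INNER)
                for start in range(0, n_samples, INNER)]

    # switching_grid(2048) -> [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), ...]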
Although some of Figs. 1a through 10b are illustrated
as block diagrams of an apparatus, these figures simulta-
neously are an illustration of a method, where the block
functionalities correspond to the method steps.
Fig. 3a illustrates an audio encoder for generating an en-
coded audio signal as an output of the first encoding
branch 400 and a second encoding branch 500. Furthermore,
the encoded audio signal preferably includes side informa-
tion such as pre-processing parameters from the common
pre-processing stage or, as discussed in connection with
preceding Figs., switch control information.
Preferably, the first encoding branch is operative in or-
der to encode an audio intermediate signal 195 in accor-
dance with a first coding algorithm, wherein the first
coding algorithm has an information sink model. The first
encoding branch 400 generates the first encoder output
signal which is an encoded spectral information represen-
tation of the audio intermediate signal 195.
Furthermore, the second encoding branch 500 is adapted for
encoding the audio intermediate signal 195 in accordance
with a second encoding algorithm, the second coding algo-
rithm having an information source model and generating,
in a second encoder output signal, encoded parameters for
the information source model representing the intermediate
audio signal.
The audio encoder furthermore comprises the common pre-
processing stage for pre-processing an audio input signal
99 to obtain the audio intermediate signal 195. Specifi-
cally, the common pre-processing stage is operative to
process the audio input signal 99 so that the audio inter-
mediate signal 195, i.e., the output of the common pre-
processing algorithm is a compressed version of the audio
input signal.
A preferred method of audio encoding for generating an en-
coded audio signal, comprises a step of encoding 400 an au-
dio intermediate signal 195 in accordance with a first cod-
ing algorithm, the first coding algorithm having an infor-
mation sink model and generating, in a first output signal,
encoded spectral information representing the audio signal;
a step of encoding 500 an audio intermediate signal 195 in
accordance with a second coding algorithm, the second cod-
ing algorithm having an information source model and gene-
rating, in a second output signal, encoded parameters for
the information source model representing the intermediate
signal 195, and a step of commonly pre-processing 100 an
audio input signal 99 to obtain the audio intermediate sig-
nal 195, wherein, in the step of commonly pre-processing
the audio input signal 99 is processed so that the audio
intermediate signal 195 is a compressed version of the au-
dio input signal 99, wherein the encoded audio signal in-
cludes, for a certain portion of the audio signal either
the first output signal or the second output signal. The method preferably includes the further step of encoding a certain portion of the audio intermediate signal either using the first coding algorithm or using the second coding algorithm, or encoding the signal using both algorithms and outputting, in an encoded signal, either the result of the first coding algorithm or the result of the second coding algorithm.
Generally, the audio encoding algorithm used in the first
encoding branch 400 reflects and models the situation in
an audio sink. The sink of an audio information is normal-
ly the human ear. The human ear can be modeled as a fre-
quency analyzer. Therefore, the first encoding branch out-
puts encoded spectral information. Preferably, the first
encoding branch furthermore includes a psychoacoustic mod-
el for additionally applying a psychoacoustic masking
threshold. This psychoacoustic masking threshold is used
when quantizing audio spectral values where, preferably, the quantization is performed such that the quantization noise which is introduced by quantizing the spectral audio values is hidden below the psychoacoustic masking threshold.
The second encoding branch represents an information
source model, which reflects the generation of audio
sound. Therefore, information source models may include a
speech model which is reflected by an LPC analysis stage,
i.e., by transforming a time domain signal into an LPC do-
main and by subsequently processing the LPC residual sig-
nal, i.e., the excitation signal. Alternative sound source
models, however, are sound source models for representing
a certain instrument or any other sound generators such as
a specific sound source existing in the real world. A selec-
tion between different sound source models can be per-
formed when several sound source models are available, for
example based on an SNR calculation, i.e., based on a cal-
culation, which of the source models is the best one suit-
able for encoding a certain time portion and/or frequency
portion of an audio signal. Preferably, however, the
switch between encoding branches is performed in the time
domain, i.e., a certain time portion is encoded using one model and a different time portion of the intermediate signal is encoded using the other encoding branch.
Information source models are represented by certain pa-
rameters. Regarding the speech model, the parameters are
LPC parameters and coded excitation parameters, when a
modern speech coder such as AMR-WB+ is considered. The
AMR-WB+ comprises an ACELP encoder and a TCX encoder. In
this case, the coded excitation parameters can be global
gain, noise floor, and variable length codes.
Fig. 3b illustrates a decoder corresponding to the encoder
illustrated in Fig. 3a. Generally, Fig. 3b illustrates an
audio decoder for decoding an encoded audio signal to ob-
tain a decoded audio signal 799. The decoder includes the
first decoding branch 450 for decoding an encoded signal
encoded in accordance with a first coding algorithm having
an information sink model. The audio decoder furthermore
includes a second decoding branch 550 for decoding an en-
coded information signal encoded in accordance with a
second coding algorithm having an information source mod-
el. The audio decoder furthermore includes a combiner for
combining output signals from the first decoding branch
450 and the second decoding branch 550 to obtain a com-
bined signal. The combined signal which is illustrated in
Fig. 3b as the decoded audio intermediate signal 699 is
input into a common post processing stage for post
processing the decoded audio intermediate signal 699,
which is the combined signal output by the combiner 600 so
that an output signal of the common post-processing stage
is an expanded version of the combined signal. Thus, the
decoded audio signal 799 has an enhanced information con-
tent compared to the decoded audio intermediate signal
699. This information expansion is provided by the common
post processing stage with the help of pre/post processing
parameters which can be transmitted from an encoder to a
decoder, or which can be derived from the decoded audio
intermediate signal itself. Preferably, however, pre/post
processing parameters are transmitted from an encoder to a
decoder, since this procedure allows an improved quality
of the decoded audio signal.
Fig. 3c illustrates an audio encoder for encoding an audio
input signal 195, which may be equal to the intermediate
audio signal 195 of Fig. 3a in accordance with the pre-
ferred embodiment of the present invention. The audio in-
put signal 195 is present in a first domain which can, for
example, be the time domain but which can also be any oth-
er domain such as a frequency domain, an LPC domain, an
LPC spectral domain or any other domain. Generally, the
conversion from one domain to the other domain is per-
formed by a conversion algorithm such as any of the well-
known time/frequency conversion algorithms or frequen-
cy/time conversion algorithms.
An alternative transform from the time domain, for example into the LPC domain, is the result of LPC filtering a time domain signal, which results in an LPC residual signal or
excitation signal. Any other filtering operations produc-
ing a filtered signal which has an impact on a substantial
number of signal samples before the transform can be used
as a transform algorithm as the case may be. Therefore,
weighting an audio signal using an LPC based weighting
filter is a further transform, which generates a signal in
the LPC domain. In a time/frequency transform, the modifi-
cation of a single spectral value will have an impact on
all time domain values before the transform. Analogously,
a modification of any time domain sample will have an im-
pact on each frequency domain sample. Similarly, a modifi-
cation of a sample of the excitation signal in an LPC do-
main situation will have, due to the length of the LPC
filter, an impact on a substantial number of samples be-
fore the LPC filtering. Similarly, a modification of a
sample before an LPC transformation will have an impact on
many samples obtained by this LPC transformation due to
the inherent memory effect of the LPC filter.
The audio encoder of Fig. 3c includes a first coding
branch 400 which generates a first encoded signal. This
first encoded signal may be in a fourth domain which is,
in the preferred embodiment, the time-spectral domain,
i.e., the domain which is obtained when a time domain sig-
nal is processed via a time/frequency conversion.
Therefore, the first coding branch 400 for encoding an au-
dio signal uses a first coding algorithm to obtain a first
encoded signal, where this first coding algorithm may or
may not include a time/frequency conversion algorithm.
The audio encoder furthermore includes a second coding
branch 500 for encoding an audio signal. The second coding
branch 500 uses a second coding algorithm, which is different from the first coding algorithm, to obtain a second encoded signal.
The audio encoder furthermore includes a first switch 200
for switching between the first coding branch 400 and the
second coding branch 500 so that for a portion of the au-
dio input signal, either the first encoded signal at the
output of block 400 or the second encoded signal at the
output of the second encoding branch is included in an en-
coder output signal. Thus, when for a certain portion of
the audio input signal 195, the first encoded signal in
the fourth domain is included in the encoder output sig-
nal, the second encoded signal which is either the first
processed signal in the second domain or the second
processed signal in the third domain is not included in
the encoder output signal. This makes sure that this en-
coder is bit rate efficient. In embodiments, any time por-
tions of the audio signal which are included in two dif-
ferent encoded signals are small compared to a frame
length of a frame as will be discussed in connection with
Fig. 3e. These small portions are useful for a cross fade
from one encoded signal to the other encoded signal in the
case of a switch event in order to reduce artifacts that
might occur without any cross fade. Therefore, apart from
the cross-fade region, each time domain block is
represented by an encoded signal of only a single domain.
As illustrated in Fig. 3c, the second coding branch 500
comprises a converter 510 for converting the audio signal
in the first domain, i.e., signal 195 into a second do-
main. Furthermore, the second coding branch 500 comprises
a first processing branch 522 for processing an audio sig-
nal in the second domain to obtain a first processed sig-
nal which is, preferably, also in the second domain so
that the first processing branch 522 does not perform a
domain change.
The second encoding branch 500 furthermore comprises a
second processing branch 523, 524 which converts the audio
signal in the second domain into a third domain, which is
different from the first domain and which is also differ-
ent from the second domain and which processes the audio
signal in the third domain to obtain a second processed
signal at the output of the second processing branch 523,
524.
Furthermore, the second coding branch comprises a second
switch 521 for switching between the first processing
branch 522 and the second processing branch 523, 524 so
that, for a portion of the audio signal input into the
second coding branch, either the first processed signal in
the second domain or the second processed signal in the
third domain is in the second encoded signal.
Fig. 3d illustrates a corresponding decoder for decoding
an encoded audio signal generated by the encoder of Fig.
3c. Generally, each block of the first domain audio signal
is represented by either a second domain signal, a third
domain signal or a fourth domain encoded signal apart from
an optional cross fade region which is, preferably, short
compared to the length of one frame in order to obtain a
system which is as much as possible at the critical sam-
pling limit. The encoded audio signal includes the first
coded signal, a second coded signal in a second domain and
a third coded signal in a third domain, wherein the first
coded signal, the second coded signal and the third coded
signal all relate to different time portions of the de-
coded audio signal and wherein the second domain, the
third domain and the first domain for a decoded audio sig-
nal are different from each other.
The decoder comprises a first decoding branch for decoding
based on the first coding algorithm. The first decoding
branch is illustrated at 431, 440 in Fig. 3d and prefera-
bly comprises a frequency/time converter. The first coded
signal is preferably in a fourth domain and is converted
into the first domain which is the domain for the decoded
output signal.
The decoder of Fig. 3d furthermore comprises a second de-
coding branch which comprises several elements. These ele-
ments are a first inverse processing branch 531 for in-
verse processing the second coded signal to obtain a first
inverse processed signal in the second domain at the out-
put of block 531. The second decoding branch furthermore
comprises a second inverse processing branch 533, 534 for
inverse processing a third coded signal to obtain a second
inverse processed signal in the second domain, where the
second inverse processing branch comprises a converter for
converting from the third domain into the second domain.
The second decoding branch furthermore comprises a first
combiner 532 for combining the first inverse processed
signal and the second inverse processed signal to obtain a
signal in the second domain, where this combined signal
is, at the first time instant, only influenced by the
first inverse processed signal and is, at a later time in-
stant, only influenced by the second inverse processed
signal.
The second decoding branch furthermore comprises a conver-
ter 540 for converting the combined signal to the first
domain.
Finally, the decoder illustrated in Fig. 3d comprises a
second combiner 600 for combining the decoded first signal
from block 431, 440 and the converter 540 output signal to
obtain a decoded output signal in the first domain. Again,
the decoded output signal in the first domain is, at the
first time instant, only influenced by the signal output
by the converter 540 and is, at a later time instant, only
influenced by the first decoded signal output by block
431, 440.
This situation is illustrated, from an encoder perspec-
tive, in Fig. 3e. The upper portion of Fig. 3e illustrates, in a schematic representation, a first domain audio sig-
nal such as a time domain audio signal, where the time in-
dex increases from left to right and item 3 might be con-
sidered as a stream of audio samples representing the sig-
nal 195 in Fig. 3c. Fig. 3e illustrates frames 3a, 3b, 3c,
3d which may be generated by switching between the first
encoded signal and the first processed signal and the
second processed signal as illustrated at item 4 in Fig.
3e. The first encoded signal, the first processed signal
and the second processed signals are all in different do-
mains and in order to make sure that the switch between
the different domains does not result in an artifact on
the decoder-side, frames 3a, 3b of the time domain signal
have an overlapping range which is indicated as a cross
fade region, and such a cross fade region also exists between frames 3b and 3c. However, no such cross fade region exists between frames 3c and 3d, which means that frame 3d is also represented by a second processed signal, i.e., a signal in the third domain, and there is no domain change between frames 3c and 3d. Therefore, generally, it is preferred not to provide a cross fade region where there is no domain change and to provide a cross fade region, i.e., a portion of the audio signal which is encoded by two subsequent coded/processed signals, when there is a domain change, i.e., a switching action of either of the two switches. Preferably, cross fades are also performed for other domain changes.
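A cross fade over such an overlap region can be as simple as a linear ramp between the tail of the outgoing decoded signal and the head of the incoming one; the ramp shape below is an assumption, the text only requires a short cross fade.

    import numpy as np

    def crossfade(outgoing_tail, incoming_head):
        """Linearly fade from the outgoing decoded signal to the incoming
        one over the short overlap region contained in both encoded signals."""
        ramp = np.linspace(1.0, 0.0, len(outgoing_tail))
        return ramp * outgoing_tail + (1.0 - ramp) * incoming_head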
In the embodiment in which the first encoded signal or the second processed signal has been generated by an MDCT processing having, e.g., 50 percent overlap, each time do-
main sample is included in two subsequent frames. Due to
the characteristics of the MDCT, however, this does not
result in an overhead, since the MDCT is a critically sam-
pled system. In this context, critically sampled means
that the number of spectral values is the same as the num-
ber of time domain values. The MDCT is advantageous in
that the crossover effect is provided without a specific
crossover region so that a crossover from an MDCT block to
the next MDCT block is provided without any overhead which
would violate the critical sampling requirement.
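The critical sampling and the built-in crossover of the MDCT can be demonstrated directly: with a sine (Princen-Bradley) window and 50 percent overlap-add, the interior of the signal is reconstructed exactly from N coefficients per N-sample hop. A self-contained numerical sketch:

    import numpy as np

    def mdct(frame):
        """2N windowed samples -> N MDCT coefficients."""
        N = len(frame) // 2
        n, k = np.arange(2 * N), np.arange(N)
        basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
        return basis @ frame

    def imdct(coeffs):
        """N coefficients -> 2N time-aliased samples (TDAC removes the alias)."""
        N = len(coeffs)
        n, k = np.arange(2 * N), np.arange(N)
        basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
        return (2.0 / N) * (basis @ coeffs)

    N = 128
    window = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))  # Princen-Bradley
    x = np.random.randn(4 * N)
    out = np.zeros_like(x)
    for start in range(0, 2 * N + 1, N):                         # 50% overlap
        frame = x[start:start + 2 * N] * window
        out[start:start + 2 * N] += imdct(mdct(frame)) * window  # overlap-add
    assert np.allclose(out[N:3 * N], x[N:3 * N])                 # interior is exact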
Preferably, the first coding algorithm in the first coding
branch is based on an information sink model, and the
second coding algorithm in the second coding branch is
based on an information source or an SNR model. An SNR
model is a model which is not specifically related to a
specific sound generation mechanism but which is one cod-
ing mode which can be selected among a plurality of coding
modes based, e.g., on a closed loop decision. Thus, an SNR model is any available coding model which does not necessarily have to be related to the physical constitution of the sound generator, but which is any parameterized coding model different from the information sink model and which can be selected by a closed loop decision, specifically by comparing different SNR results from different models.
As illustrated in Fig. 3c, a controller 300, 525 is pro-
vided. This controller may include the functionalities of
the decision stage 300 of Fig. 1a and, additionally, may include the functionality of the switch control device 525 in Fig. 1a. Generally, the controller is for controlling
the first switch and the second switch in a signal adap-
tive way. The controller is operative to analyze a signal
input into the first switch or output by the first or the
second coding branch or signals obtained by encoding and
decoding from the first and the second encoding branch
with respect to a target function. Alternatively, or addi-
tionally, the controller is operative to analyze the sig-
nal input into the second switch or output by the first
processing branch or the second processing branch or ob-
tained by processing and inverse processing from the first
processing branch and the second processing branch, again
with respect to a target function.
In one embodiment, the first coding branch or the second
coding branch comprises an aliasing introducing
time/frequency conversion algorithm such as an MDCT or an
MDST algorithm, which is different from a straightforward
FFT transform, which does not introduce an aliasing ef-
fect. Furthermore, one or both branches comprise a quan-
tizer/entropy coder block. Specifically, only the second
processing branch of the second coding branch includes the
time/frequency converter introducing an aliasing operation
and the first processing branch of the second coding
branch comprises a quantizer and/or entropy coder and does
not introduce any aliasing effects. The aliasing introduc-
ing time/frequency converter preferably comprises a win-
dower for applying an analysis window and an MDCT trans-
form algorithm. Specifically, the windower is operative to
apply the window function to subsequent frames in an over-
lapping way so that a sample of a windowed signal occurs
in at least two subsequent windowed frames.
In one embodiment, the first processing branch comprises
an ACELP coder and a second processing branch comprises an
MDCT spectral converter and the quantizer for quantizing
spectral components to obtain quantized spectral compo-
nents, where each quantized spectral component is zero or
is defined by one quantizer index of the plurality of dif-
ferent possible quantizer indices.
Furthermore, it is preferred that the first switch 200 op-
erates in an open loop manner and the second switch oper-
ates in a closed loop manner.
As stated before, both coding branches are operative to
encode the audio signal in a block wise manner, in which
the first switch or the second switch switches in a block-
wise manner so that a switching action takes place, at the
minimum, after a block of a predefined number of samples
of a signal, the predefined number forming a frame length
for the corresponding switch. Thus, the granule for
switching by the first switch may be, for example, a block
of 2048 or 1024 samples, and the frame length, based on which the first switch 200 is switching, may be variable but is, preferably, fixed to such a quite long period.
Contrary thereto, the block length for the second switch
521, i.e., when the second switch 521 switches from one
mode to the other, is substantially smaller than the block
length for the first switch. Preferably, both block
lengths for the switches are selected such that the longer
block length is an integer multiple of the shorter block
length. In the preferred embodiment, the block length of
the first switch is 2048 or 1024 and the block length of
the second switch is 1024 or, more preferably, 512 or, even more preferably, 256 or even 128 samples so that, at the maximum, the second switch can switch 16 times when the first switch switches only a single time. A preferred maximum block length ratio, however, is 4:1.
In a further embodiment, the controller 300, 525 is opera-
tive to perform a speech music discrimination for the
first switch in such a way that a decision to speech is
favored with respect to a decision to music. In this embo-
diment, a decision to speech is taken even when a portion
less than 50% of a frame for the first switch is speech
and the portion of more than 50% of the frame is music.
Furthermore, the controller is operative to already switch
to the speech mode, when a quite small portion of the
first frame is speech and, specifically, when a portion of
the first frame is speech, which is 50% of the length of
the smaller second frame. Thus, a preferred speech-favouring switching decision already switches over to speech even when, for example, only 6% or 12% of a block corresponding to the frame length of the first switch is speech.
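Such a biased discrimination amounts to a threshold far below a majority vote; in the following sketch the per-subframe speech flags come from an assumed classifier, and the 12% threshold mirrors the example above.

    def first_switch_mode(subframe_is_speech, speech_bias=0.12):
        """Speech-favouring decision for the first switch: select the speech
        (LPC) branch as soon as at least `speech_bias` of the long frame is
        classified as speech, rather than requiring a majority."""
        fraction = sum(subframe_is_speech) / len(subframe_is_speech)
        return "speech" if fraction >= speech_bias else "music"

    # e.g. first_switch_mode([True] + [False] * 7) -> "speech" (12.5% speech)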
This procedure is preferred in order to fully exploit the bit rate saving capability of the first processing branch, which has a voiced speech core in one embodiment, and not to lose any quality even for the rest of the large first frame, which is non-speech, due to the fact that the second
processing branch includes a converter and, therefore, is
useful for audio signals which have non-speech signals as
well. Preferably, this second processing branch includes
an overlapping MDCT, which is critically sampled, and
which even at small window sizes provides a highly effi-
cient and aliasing free operation due to the time domain
aliasing cancellation processing such as overlap and add
on the decoder-side. Furthermore, a large block length for
the first encoding branch which is preferably an AAC-like
MDCT encoding branch is useful, since non-speech signals
are normally quite stationary and a long transform window
provides a high frequency resolution and, therefore, high
quality and, additionally, provides a bit rate efficiency
due to a psycho acoustically controlled quantization mod-
ule, which can also be applied to the transform based cod-
ing mode in the second processing branch of the second
coding branch.
Regarding the Fig. 3d decoder illustration, it is pre-
ferred that the transmitted signal includes an explicit
indicator as side information 4a as illustrated in Fig.
3e. This side information 4a is extracted by a bit stream
parser not illustrated in Fig. 3d in order to forward the
corresponding first encoded signal, first processed signal
or second processed signal to the correct processor such
as the first decoding branch, the first inverse processing
branch or the second inverse processing branch in Fig. 3d.
Therefore, an encoded signal not only has the en-
coded/processed signals but also includes side information
relating to these signals. In other embodiments, however,
there can be an implicit signaling which allows a decoder-side bit stream parser to distinguish between the different signals. Regarding Fig. 3e, it is outlined that the first
processed signal or the second processed signal is the
output of the second coding branch and, therefore, the
second coded signal.
Preferably, the first decoding branch and/or the second inverse processing branch includes an inverse MDCT transform for converting from the spectral domain to the time domain. To
this end, an overlap-adder is provided to perform a time
domain aliasing cancellation functionality which, at the
same time, provides a cross fade effect in order to avoid
blocking artifacts. Generally, the first decoding branch
converts a signal encoded in the fourth domain into the
first domain, while the second inverse processing branch
performs a conversion from the third domain to the second
domain and the converter subsequently connected to the
first combiner provides a conversion from the second do-
main to the first domain so that, at the input of the com-
biner 600, only first domain signals are present, which
represent, in the Fig. 3d embodiment, the decoded output
signal.
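A sketch of the decoder-side inverse MDCT with overlap-add, which performs the time domain aliasing cancellation and, at the same time, the implicit cross fade between subsequent frames (numpy assumed; a Princen-Bradley sine window as in the analysis sketch above):

    import numpy as np

    def imdct(coeffs):
        # Inverse MDCT: N coefficients back to a 2N-sample aliased frame.
        n_half = len(coeffs)
        n = np.arange(2 * n_half)
        k = np.arange(n_half)
        basis = np.cos(np.pi / n_half * (n[:, None] + 0.5 + n_half / 2.0)
                       * (k[None, :] + 0.5))
        return (2.0 / n_half) * (basis @ coeffs)

    def overlap_add(coeff_frames, n_half=1024):
        w = np.sin(np.pi / (2 * n_half) * (np.arange(2 * n_half) + 0.5))
        out = np.zeros(n_half * (len(coeff_frames) + 1))
        for i, c in enumerate(coeff_frames):
            # Adding the windowed halves of neighbouring frames cancels
            # the time domain aliasing and cross fades the frames.
            out[i * n_half:(i + 2) * n_half] += w * imdct(c)
        return out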
Fig. 4a and 4b illustrate two different embodiments, which
differ in the positioning of the switch 200. In Fig. 4a, the switch 200 is positioned between an output of the common pre-processing stage 100 and the inputs of the two encoding branches 400, 500. The Fig. 4a embodiment makes sure that the audio signal is input into a single encoding branch only, and the other encoding branch, which is not connected to the output of the common pre-processing stage, does not operate and, therefore, is switched off or is in a sleep mode. This embodiment is preferable in that the non-active encoding branch does not consume power and computational resources, which is useful for mobile applications in particular, which are battery-powered and, therefore, have the general limitation of power consumption.
On the other hand, however, the Fig. 4b embodiment may be
preferable when power consumption is not an issue. In this
embodiment, both encoding branches 400, 500 are active all
the time, and only the output of the selected encoding
branch for a certain time portion and/or a certain fre-
quency portion is forwarded to the bit stream formatter
which may be implemented as a bit stream multiplexer 800.
Therefore, in the Fig. 4b embodiment, both encoding
branches are active all the time, and the output of an en-
coding branch which is selected by the decision stage 300
is entered into the output bit stream, while the output of
the other non-selected encoding branch 400 is discarded, i.e., not entered into the output bit stream, which is the encoded audio signal.
Fig. 4c illustrates a further aspect of a preferred decod-
er implementation. In order to avoid audible artifacts
specifically in the situation, in which the first decoder
is a time-aliasing generating decoder or generally stated
a frequency domain decoder and the second decoder is a
time domain device, the borders between blocks or frames output by the first decoder 450 and the second decoder 550 will typically not be fully continuous, specifically in a switching situation. Thus, when the first block of the first decoder 450 is output and when, for the subsequent time portion, a block of the second decoder is output, it is preferred to perform a cross fading operation as illustrated
by cross fade block 607. To this end, the cross fade block 607 might be implemented as illustrated in Fig. 4c at 607a, 607b and 607c. Each branch might have a weighter having a weighting factor m1 between 0 and 1 on the normalized scale, where the weighting factor can vary as indicated in the plot 609. Such a cross fading rule makes sure that a continuous and smooth cross fade takes place which, additionally, assures that a user will not perceive any loudness variations. Non-linear crossfade rules such as a sin² crossfade rule can be applied instead of a linear crossfade rule.
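A sketch of this cross fade under the above assumptions (numpy assumed; "sin2" denotes the sin² rule, whose fade-in and fade-out weights sum to one so that the loudness stays constant):

    import numpy as np

    def crossfade(last_block, first_block, rule="sin2"):
        # m2 = fade-in weight applied to the new decoder's first block
        # (weighter 607b); 1 - m2 is applied to the old decoder's last
        # block (607a); the weighted blocks are summed as in adder 607c.
        n = len(last_block)
        t = (np.arange(n) + 0.5) / n
        m2 = np.sin(0.5 * np.pi * t) ** 2 if rule == "sin2" else t
        return (1.0 - m2) * last_block + m2 * first_block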
In certain instances, the last block of the first decoder
was generated using a window where the window actually
performed a fade out of this block. In this case, the
weighting factor m1 in block 607a is equal to 1 and, ac-
tually, no weighting at all is required for this branch.
When a switch from the second decoder to the first decoder
takes place, and when the second decoder includes a window
which actually fades out the output to the end of the
block, then the weighter indicated with "m2" would not be
required or the weighting parameter can be set to 1
throughout the whole cross fading region.
When the first block after a switch was generated using a
windowing operation, and when this window actually per-
formed a fade in operation, then the corresponding weight-
ing factor can also be set to 1 so that a weighter is not
really necessary. Therefore, when the last block is win-
dowed in order to fade out by the decoder and when the
first block after the switch is windowed using the decoder
in order to provide a fade in, then the weighters 607a,
607b are not required at all and an addition operation by
adder 607c is sufficient.
In this case, the fade out portion of the last frame and
the fade in portion of the next frame define the cross
fading region indicated in block 609. Furthermore, it is
preferred in such a situation that the last block of one
decoder has a certain time overlap with the first block of
the other decoder.
If a cross fading operation is not required or not possible or not desired, and if only a hard switch from one decoder to the other decoder takes place, it is preferred to
perform such a switch in silent passages of the audio sig-
nal or at least in passages of the audio signal where
there is low energy, i.e., which are perceived to be si-
lent or almost silent. Preferably, the decision stage 300
assures in such an embodiment that the switch 200 is only
activated when the corresponding time portion which fol-
lows the switch event has an energy which is, for example,
lower than the mean energy of the audio signal and is,
preferably, lower than 50% of the mean energy of the audio
signal related to, for example, two or even more time por-
tions/frames of the audio signal.
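A sketch of such an energy gate for a hard switch (numpy assumed; the 50% threshold and the frame bookkeeping are illustrative):

    import numpy as np

    def hard_switch_allowed(next_frame, recent_frames, ratio=0.5):
        # Permit the hard switch only when the time portion following the
        # switch event is (almost) silent: its energy must stay below,
        # for example, 50% of the mean energy of the preceding frames.
        mean_energy = np.mean([np.sum(f ** 2) for f in recent_frames])
        return np.sum(next_frame ** 2) < ratio * mean_energy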
Preferably, the second encoding rule/decoding rule is an LPC-based coding algorithm. In LPC-based speech coding, a differentiation between quasi-periodic impulse-like excitation signal segments or signal portions, and noise-like excitation signal segments or signal portions, is made. This is performed for very low bit rate LPC vocoders (2.4 kbps) as in Fig. 7b. However, in medium rate CELP coders, the excitation is obtained by the addition of scaled vectors from an adaptive codebook and a fixed codebook.
Quasi-periodic impulse-like excitation signal segments,
i.e., signal segments having a specific pitch are coded
with different mechanisms than noise-like excitation sig-
nals. While quasi-periodic impulse-like excitation signals
are connected to voiced speech, noise-like signals are re-
lated to unvoiced speech.
Exemplarily, reference is made to Figs. 5a to 5d. Here,
quasi-periodic impulse-like signal segments or signal por-
tions and noise-like signal segments or signal portions are
exemplarily discussed. Specifically, a voiced speech as il-
lustrated in Fig. 5a in the time domain and in Fig. 5b in
the frequency domain is discussed as an example for a
quasi-periodic impulse-like signal portion, and an unvoiced
speech segment as an example for a noise-like signal por-
tion is discussed in connection with Figs. 5c and 5d.
Speech can generally be classified as voiced, unvoiced, or
mixed. Time-and-frequency domain plots for sampled voiced
and unvoiced segments are shown in Fig. 5a to 5d. Voiced
speech is quasi periodic in the time domain and harmoni-
cally structured in the frequency domain, while unvoiced
speech is random-like and broadband. The short-time spectrum of voiced speech is characterized by its fine harmonic structure and its formant structure. The fine harmonic structure is a consequence of the quasi-periodicity of speech and may be attributed to the vibrating vocal cords. The formant structure (spectral envelope) is due to the interaction of the source and the vocal tract. The vocal tract consists of the pharynx and the mouth cavity. The shape of the spectral
envelope that "fits" the short time spectrum of voiced
speech is associated with the transfer characteristics of
the vocal tract and the spectral tilt (6 dB/octave) due to
the glottal pulse. The spectral envelope is characterized
by a set of peaks which are called formants. The formants
are the resonant modes of the vocal tract. For the average
vocal tract there are three to five formants below 5 kHz.
The amplitudes and locations of the first three formants, usually occurring below 3 kHz, are quite important both in speech synthesis and perception. Higher formants are also
important for wide band and unvoiced speech representa-
tions. The properties of speech are related to the physical
speech production system as follows. Voiced speech is pro-
duced by exciting the vocal tract with quasi-periodic glot-
tal air pulses generated by the vibrating vocal cords. The
frequency of the periodic pulses is referred to as the fun-
damental frequency or pitch. Unvoiced speech is produced by
forcing air through a constriction in the vocal tract. Na-
sal sounds are due to the acoustic coupling of the nasal
tract to the vocal tract, and plosive sounds are produced
by abruptly releasing the air pressure which was built up
behind the closure in the tract.
Thus, a noise-like portion of the audio signal shows neither any impulse-like time-domain structure nor harmonic frequency-domain structure, as illustrated in Fig. 5c and in Fig. 5d, which is different from the quasi-periodic impulse-like portion as illustrated, for example, in Fig. 5a and in Fig. 5b. As will be outlined later on, however, the differentiation between noise-like portions and quasi-periodic impulse-like portions can also be observed after an LPC analysis, i.e., in the excitation signal. The LPC is a method which models the vocal tract and extracts from the signal the excitation of the vocal tract.
Furthermore, quasi-periodic impulse-like portions and noise-like portions can occur in a time-wise manner, i.e., a portion of the audio signal in time is noisy and another portion of the audio signal in time is quasi-periodic, i.e., tonal. Alternatively, or additionally, the characteristic of a signal can be different in different frequency bands. Thus, the determination whether the audio signal is noisy or tonal can also be performed frequency-selectively so that a certain frequency band or several certain frequency bands are considered to be noisy and other frequency bands are considered to be tonal. In this case, a certain time portion of the audio signal might include tonal components and noisy components.
Fig. 7a illustrates a linear model of a speech production
system. This system assumes a two-stage excitation, i.e.,
an impulse-train for voiced speech as indicated in Fig. 7c,
and a random-noise for unvoiced speech as indicated in Fig.
7d. The vocal tract is modelled as an all-pole filter 70
which processes pulses of Fig. 7c or Fig. 7d, generated by
the glottal model 72. Hence, the system of Fig. 7a can be
reduced to an all-pole filter model of Fig. 7b having a
gain stage 77, a forward path 78, a feedback path 79, and
an adding stage 80. In the feedback path 79, there is a
prediction filter 81, and the whole source-model synthesis
system illustrated in Fig. 7b can be represented using z-
domain functions as follows:
S(z) = g / (1 - A(z)) · X(z),
where g represents the gain, A(z) is the prediction filter
as determined by an LP analysis, X(z) is the excitation
signal, and S(z) is the synthesis speech output.
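A time-domain sketch of this synthesis equation (numpy assumed; the convention A(z) = Σ a_k z^-k in the feedback path 79 is adopted, and the function name is illustrative):

    import numpy as np

    def lpc_synthesize(excitation, a, g=1.0):
        # s[n] = g * x[n] + sum_{k=1..p} a[k-1] * s[n-k],
        # i.e. S(z) = g / (1 - A(z)) * X(z) with A(z) = sum_k a_k z^{-k}.
        s = np.zeros(len(excitation))
        p = len(a)
        for n in range(len(excitation)):
            past = sum(a[k] * s[n - 1 - k] for k in range(min(p, n)))
            s[n] = g * excitation[n] + past
        return s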
Figs. 7c and 7d give a graphical time domain description of
voiced and unvoiced speech synthesis using the linear
source system model. This system and the excitation parame-
ters in the above equation are unknown and must be deter-
mined from a finite set of speech samples. The coefficients
of A(z) are obtained using a linear prediction of the input
signal and a quantization of the filter coefficients. In a
p-th order forward linear predictor, the present sample of the speech sequence is predicted from a linear combination of p past samples. The predictor coefficients can be determined by well-known algorithms such as the Levinson-Durbin algorithm or, generally, an autocorrelation method or a reflection method.
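The following sketch of the Levinson-Durbin recursion operates on the autocorrelation values r[0..order] of the input frame (numpy assumed; illustrative, without the numerical safeguards of a production coder):

    import numpy as np

    def levinson_durbin(r, order):
        # Solves the normal equations for the forward predictor
        # coefficients a[0..order-1] from autocorrelations r[0..order].
        a = np.zeros(order)
        err = r[0]
        for i in range(order):
            acc = r[i + 1] - np.dot(a[:i], r[i:0:-1])
            k = acc / err                    # reflection coefficient
            a_new = a.copy()
            a_new[:i] = a[:i] - k * a[i - 1::-1]
            a_new[i] = k
            a = a_new
            err *= (1.0 - k * k)             # remaining error power
        return a, err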
Fig. 7e illustrates a more detailed implementation of the
LPC analysis block 510. The audio signal is input into a
filter determination block which determines the filter in-
formation A(z). This information is output as the short-
term prediction information required for a decoder. The
short-term prediction information is required by the ac-
tual prediction filter 85. In a subtracter 86, a current
sample of the audio signal is input and a predicted value
for the current sample is subtracted so that for this sam-
ple, the prediction error signal is generated at line 84.
A sequence of such prediction error signal samples is very schematically illustrated in Fig. 7c or 7d. Therefore, Figs. 7c and 7d can be considered as a kind of rectified impulse-like signal.
While Fig. 7e illustrates a preferred way to calculate the
excitation signal, Fig. 7f illustrates a preferred way to
calculate the weighted signal. In contrast to Fig. 7e, the filter 85 is different when γ is different from 1. A value smaller than 1 is preferred for γ. Furthermore, the block 87 is present, and µ is preferably a number smaller than 1. Generally, the elements in Fig. 7e and 7f can be implemented as in 3GPP TS 26.190 or 3GPP TS 26.290.
Fig. 7g illustrates an inverse processing, which can be
applied on the decoder side such as in element 537 of Fig.
2b. Particularly, block 88 generates an unweighted signal
from the weighted signal and block 89 calculates an exci-
tation from the unweighted signal. Generally, all signals
but the unweighted signal in Fig. 7g are in the LPC do-
main, but the excitation signal and the weighted signal
are different signals in the same domain. Block 89 outputs
an excitation signal which can then be used together with
the output of block 536. Then, the common inverse LPC
transform can be performed in block 540 of Fig. 2b.
Subsequently, an analysis-by-synthesis CELP encoder will be
discussed in connection with Fig. 6 in order to illustrate
the modifications applied to this algorithm. This CELP en-
coder is discussed in detail in "Speech Coding: A Tutorial
Review", Andreas Spanias, Proceedings of the IEEE, Vol. 82,
No. 10, October 1994, pages 1541-1582. The CELP encoder as
illustrated in Fig. 6 includes a long-term prediction com-
ponent 60 and a short-term prediction component 62. Fur-
thermore, a codebook is used which is indicated at 64. A
perceptual weighting filter W(z) is implemented at 66, and
an error minimization controller is provided at 68. s(n) is
the time-domain input signal. After having been perceptu-
ally weighted, the weighted signal is input into a subtrac-
ter 69, which calculates the error between the weighted
synthesis signal at the output of block 66 and the original
weighted signal sw(n). Generally, the short-term prediction filter coefficients A(z) are calculated by an LP analysis stage and its coefficients are quantized as indicated in Fig. 7e. The long-term prediction information AL(z), including the long-term prediction gain g and the vector quantization index, i.e., codebook references, is calculated on the prediction error signal at the output of the LPC analysis stage referred to as 10a in Fig. 7e. The LTP parameters are the pitch delay and gain. In CELP, this is usually implemented as an adaptive codebook containing the past excitation signal (not the residual). The adaptive codebook delay and gain are found by minimizing the mean-squared weighted error (closed-loop pitch search).
The CELP algorithm then encodes the residual signal obtained after the short-term and long-term predictions using a codebook of, for example, Gaussian sequences. The ACELP algorithm, where the "A" stands for "Algebraic", has a specific algebraically designed codebook. A codebook may contain more or fewer vectors, where each vector is some samples long. A gain factor g scales the code vector, and the scaled code vector is filtered by the long-term prediction synthesis filter and the short-term prediction synthesis filter. The "optimum" code vector is selected such that the perceptually weighted mean square error at the output of the subtracter 69 is minimized. The search process in CELP is done by an analysis-by-synthesis optimization as illustrated in Fig. 6.
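A brute-force sketch of this analysis-by-synthesis search (numpy assumed; a real ACELP coder exploits the algebraic codebook structure instead of an exhaustive loop):

    import numpy as np

    def codebook_search(target_w, codebook, h_w):
        # target_w: perceptually weighted target; h_w: impulse response
        # of the weighted synthesis filter. Minimizes the weighted error
        # ||target_w - g * (h_w * c)||^2 over all code vectors c, with
        # the least-squares optimum gain g per candidate.
        best_idx, best_gain, best_err = -1, 0.0, np.inf
        for idx, c in enumerate(codebook):
            y = np.convolve(c, h_w)[:len(target_w)]  # synthesize candidate
            g = np.dot(target_w, y) / (np.dot(y, y) + 1e-12)
            err = np.sum((target_w - g * y) ** 2)
            if err < best_err:
                best_idx, best_gain, best_err = idx, g, err
        return best_idx, best_gain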
For specific cases, when a frame is a mixture of unvoiced and voiced speech or when speech over music occurs, a TCX coding can be more appropriate to code the excitation in the LPC domain. The TCX coding processes the weighted signal in the frequency domain without making any assumption about excitation production. The TCX is then more generic than CELP coding and is not restricted to a voiced or an unvoiced source model of the excitation. TCX is still a source-filter model coding using a linear predictive filter for modelling the formants of the speech-like signals.
In the AMR-WB+-like coding, a selection between different
TCX modes and ACELP takes place as known from the AMR-WB+
description. The TCX modes are different in that the
length of the block-wise Discrete Fourier Transform is
different for different modes and the best mode can be se-
lected by an analysis by synthesis approach or by a direct
"feedforward" mode.
As discussed in connection with Fig. 2a and 2b, the common pre-processing stage 100 preferably includes a joint multi-channel (surround/joint stereo device) 101 and, additionally, a bandwidth extension stage 102. Correspondingly, the decoder includes a bandwidth extension stage 701 and a subsequently connected joint multichannel stage 702. Preferably, the joint multichannel stage 101 is, with respect to the encoder, connected before the bandwidth extension stage 102, and, on the decoder side, the bandwidth extension stage 701 is connected before the joint multichannel stage 702 with respect to the signal processing direction. Alternatively, however, the common pre-processing stage can include a joint multichannel stage without the subsequently connected bandwidth extension stage or a bandwidth extension stage without a connected joint multichannel stage.
A preferred example for a joint multichannel stage on the
encoder side 101a, 101b and on the decoder side 702a and
702b is illustrated in the context of Fig. 8. A number of
E original input channels is input into the downmixer 101a so that the downmixer generates a number of K transmitted channels, where the number K is greater than or equal to one and smaller than or equal to E.
Preferably, the E input channels are input into a joint
multichannel parameter analyzer 101b which generates para-
metric information. This parametric information is prefer-
ably entropy-encoded such as by a difference encoding and
subsequent Huffman encoding or, alternatively, subsequent
arithmetic encoding. The encoded parametric information
output by block 101b is transmitted to a parameter decoder
702b which may be part of item 702 in Fig. 2b. The parame-
ter decoder 702b decodes the transmitted parametric infor-
mation and forwards the decoded parametric information in-
to the upmixer 702a. The upmixer 702a receives the K transmitted channels and generates a number of L output channels, where the number L is greater than or equal to K and smaller than or equal to E.
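A sketch of the mono downmix (K = 1) together with one of the cues the parameter analyzer 101b could extract, the inter-channel level difference (numpy assumed; the weights and the choice of cue are illustrative):

    import numpy as np

    def downmix(channels, weights=None):
        # Weighted (or, with the default weights, unweighted) addition of
        # the E input channels to a single transmitted channel.
        e = len(channels)
        w = np.full(e, 1.0 / e) if weights is None else np.asarray(weights)
        return sum(wi * ch for wi, ch in zip(w, channels))

    def level_difference_db(ch1, ch2, eps=1e-12):
        # Inter-channel level difference for one channel pair, one of the
        # parametric cues transmitted as side information.
        return 10.0 * np.log10((np.sum(ch1 ** 2) + eps)
                               / (np.sum(ch2 ** 2) + eps))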
Parametric information may include inter channel level
differences, inter channel time differences, inter channel
phase differences and/or inter channel coherence measures
as is known from the BCC technique or as is known and is
described in detail in the MPEG surround standard. The number of transmitted channels may be a single mono channel for ultra-low bit rate applications, or may include a compatible stereo signal, i.e., two channels, for a compatible stereo application. Typically, the number E of input channels may be five or may be even higher. Alternatively, the E input channels may also be E audio objects, as known in the context of spatial audio object coding (SAOC).
In one implementation, the downmixer performs a weighted
or unweighted addition of the original E input channels or
an addition of the E input audio objects. In case of audio
objects as input channels, the joint multichannel parame-
ter analyzer 101b will calculate audio object parameters
such as a correlation matrix between the audio objects
preferably for each time portion and even more preferably
for each frequency band. To this end, the whole frequency range may be divided into at least 10 and preferably 32 or 64 frequency bands.
Fig. 9 illustrates a preferred embodiment for the imple-
mentation of the bandwidth extension stage 102 in Fig. 2a
and the corresponding band width extension stage 701 in
Fig. 2b. On the encoder-side, the bandwidth extension
block 102 preferably includes a low pass filtering block
102b, a downsampler block, which follows the lowpass, or
which is part of the inverse QMF, which acts on only half
of the QMF bands, and a high band analyzer 102a. The orig-
inal audio signal input into the bandwidth extension block
102 is low-pass filtered to generate the low band signal
which is then input into the encoding branches and/or the
switch. The low pass filter has a cut-off frequency which can be in a range of 3 kHz to 10 kHz. Furthermore, the bandwidth extension block 102 includes a high band analyzer for calculating the bandwidth extension parameters such as a spectral envelope parameter information, a noise floor parameter information, an inverse filtering parameter information, further parametric information relating to certain harmonic lines in the high band and additional parameters as discussed in detail in the MPEG-4 standard in the chapter related to spectral band replication.
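A crude sketch of the encoder-side band split and high band analysis (numpy assumed; a brick-wall FFT split stands in for the low pass/QMF processing of blocks 102b/102a, and eight averaged sub-band magnitudes stand in for the spectral envelope parameters):

    import numpy as np

    def bwe_encode(frame, fs=32000, f_cut=6000.0):
        # f_cut must not exceed fs/4 here, since the low band is
        # decimated by a factor of 2 after the brick-wall low pass.
        spec = np.fft.rfft(frame)
        cut = int(len(frame) * f_cut / fs)
        low = spec.copy()
        low[cut:] = 0.0
        low_band = np.fft.irfft(low, len(frame))[::2]  # to the core coder
        envelope = np.array([b.mean() for b in        # coarse high-band
                             np.array_split(np.abs(spec[cut:]), 8)])
        return low_band, envelope                     # envelope = BWE params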
On the decoder-side, the bandwidth extension block 701 in-
cludes a patcher 701a, an adjuster 701b and a combiner
701c. The combiner 701c combines the decoded low band sig-
nal and the reconstructed and adjusted high band signal
output by the adjuster 701b. The input into the adjuster
701b is provided by a patcher which is operated to derive
the high band signal from the low band signal such as by
spectral band replication or, generally, by bandwidth ex-
tension. The patching performed by the patcher 701a may be
a patching performed in a harmonic way or in a non-
harmonic way. The signal generated by the patcher 701a is,
subsequently, adjusted by the adjuster 701b using the
transmitted parametric bandwidth extension information.
As indicated in Fig. 8 and Fig. 9, the described blocks
may have a mode control input in a preferred embodiment.
This mode control input is derived from the decision stage
300 output signal. In such a preferred embodiment, a cha-
racteristic of a corresponding block may be adapted to the
decision stage output, i.e., whether, in a preferred embo-
diment, a decision to speech or a decision to music is
made for a certain time portion of the audio signal. Pre-
ferably, the mode control only relates to one or more of
the functionalities of these blocks but not to all of the
functionalities of blocks. For example, the decision may
influence only the patcher 701a but may not influence the
other blocks in Fig. 9, or may, for example, influence on-
ly the joint multichannel parameter analyzer 101b in Fig.
8 but not the other blocks in Fig. 8. This implementation is preferable in that a higher flexibility, a higher quality and a lower bit rate output signal are obtained by providing flexibility in the common pre-processing stage. On the other hand, however, the usage of algorithms in the common pre-processing stage for both kinds of signals allows an efficient encoding/decoding scheme to be implemented.
Fig. 10a and Fig. 10b illustrate two different implementa-
tions of the decision stage 300. In Fig. 10a, an open loop
decision is indicated. Here, the signal analyzer 300a in
the decision stage has certain rules in order to decide
whether the certain time portion or a certain frequency
portion of the input signal has a characteristic which re-
quires that this signal portion is encoded by the first en-
coding branch 400 or by the second encoding branch 500. To
this end, the signal analyzer 300a may analyze the audio
input signal into the common pre-processing stage or may
analyze the audio signal output by the common pre-
processing stage, i.e., the audio intermediate signal, or may analyze an intermediate signal within the common pre-processing stage, such as the output of the downmixer, which may be a mono signal or which may be a signal having K channels as indicated in Fig. 8. On the output side, the
signal analyzer 300a generates the switching decision for
controlling the switch 200 on the encoder-side and the cor-
responding switch 600 or the combiner 600 on the decoder-
side.
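A deliberately simple open-loop rule in the spirit of analyzer 300a (numpy assumed; real discriminators combine many features, and both the single cue and the threshold here are arbitrary):

    import numpy as np

    def open_loop_decision(frame, threshold=0.9):
        # Uses the normalized lag-1 autocorrelation as a crude cue:
        # strongly predictable (speech/tonal-like) frames are routed to
        # the second branch 500, the rest to the first branch 400.
        x = frame - np.mean(frame)
        rho = np.dot(x[1:], x[:-1]) / (np.dot(x, x) + 1e-12)
        return "branch_500" if rho > threshold else "branch_400"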
Although not discussed in detail for the second switch
521, it is to be emphasized that the second switch 521 can
be positioned in a similar way as the first switch 200 as
discussed in connection with Fig. 4a and Fig. 4b. Thus, an alternative position of switch 521 in Fig. 3c is at the output of both processing branches 522, 523, 524 so that both processing branches operate in parallel and only the
output of one processing branch is written into a bit
stream via a bit stream former which is not illustrated in
Fig. 3c.
Furthermore, the second combiner 600 may have a specific
cross fading functionality as discussed in Fig. 4c. Alter-
natively or additionally, the first combiner 532 might
have the same cross fading functionality. Furthermore,
both combiners may have the same cross fading functionali-
ty or may have different cross fading functionalities or
may have no cross fading functionalities at all so that
both combiners are switches without any additional cross
fading functionality.
As discussed before, both switches can be controlled via
an open loop decision or a closed loop decision as dis-
cussed in connection with Fig. 10a and Fig. 10b, where the
controller 300, 525 of Fig. 3c can have different or the
same functionalities for both switches.
Furthermore, a time warping functionality which is signal-
adaptive can exist not only in the first encoding branch
or first decoding branch but can also exist in the second
processing branch of the second coding branch on the en-
coder side as well as on the decoder side. Depending on a
processed signal, both time warping functionalities can
have the same time warping information so that the same
time warp is applied to the signals in the first domain
and in the second domain. This saves processing load and
might be useful in some instances, in cases where subsequent blocks have a similar time warping characteristic. In alternative embodiments, however, it is preferred
to have independent time warp estimators for the first
coding branch and the second processing branch in the
second coding branch.
The inventive encoded audio signal can be stored on a digi-
tal storage medium or can be transmitted on a transmission
medium such as a wireless transmission medium or a wired
transmission medium such as the Internet.
In a different embodiment, the switch 200 of Fig. 1a or 2a
switches between the two coding branches 400, 500. In a
further embodiment, there can be additional encoding
branches such as a third encoding branch or even a fourth
encoding branch or even more encoding branches. On the de-
coder side, the switch 600 of Fig. 1b or 2b switches be-
tween the two decoding branches 431, 440 and 531, 532, 533,
534, 540. In a further embodiment, there can be additional
decoding branches such as a third decoding branch or even a
fourth decoding branch or even more decoding branches. Si-
milarly, the other switches 521 or 532 may switch between
more than two different coding algorithms, when such addi-
tional coding/decoding branches are provided.
The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Depending on certain implementation requirements of the in-
ventive methods, the inventive methods can be implemented
in hardware or in software. The implementation can be per-
formed using a digital storage medium, in particular, a
disc, a DVD or a CD having electronically-readable control
signals stored thereon, which co-operate with programmable
computer systems such that the inventive methods are per-
formed. Generally, the present invention is, therefore, a computer program product with a program code stored on a machine-readable carrier, the program code being operative to perform the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are, therefore, a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer.
Claims
1. Audio encoder for encoding an audio input signal
(195), the audio input signal being in a first domain,
comprising:
a first coding branch (400) for encoding an audio sig-
nal using a first coding algorithm to obtain a first
encoded signal;
a second coding branch (500) for encoding an audio
signal using a second coding algorithm to obtain a
second encoded signal, wherein the first coding algo-
rithm is different from the second coding algorithm;
and
a first switch (200) for switching between the first
coding branch and the second coding branch so that,
for a portion of the audio input signal, either the
first encoded signal or the second encoded signal is
in an encoder output signal,
wherein the second coding branch comprises:
a converter (510) for converting the audio signal
into a second domain different from the first do-
main,
a first processing branch (522) for processing an
audio signal in the second domain to obtain a
first processed signal;
a second processing branch (523, 524) for con-
verting a signal into a third domain different
from the first domain and the second domain and
for processing the signal in the third domain to
obtain a second processed signal; and
a second switch (521) for switching between the
first processing branch (522) and the second
processing branch (523, 524) so that, for a por-
tion of the audio signal input into the second
coding branch, either the first processed signal
or the second processed signal is in the second
encoded signal.
2. Audio encoder in accordance with claim 1, in which the
first coding algorithm in the first coding branch
(400) is based on an information sink model, or in
which the second coding algorithm in the second coding
branch (500) is based on an information source or a
signal to noise ratio (SNR) model.
3. Audio encoder in accordance with claim 1 or 2, in
which the first coding branch comprises a converter
(410) for converting the audio input signal into a
fourth domain different from the first domain, the
second domain, and the third domain.
4. Audio encoder in accordance with one of the preceding
claims, in which the first domain is the time domain,
the second domain is an LPC domain obtained by an LPC
filtering the first domain signal, the third domain is
an LPC spectral domain obtained by converting an LPC
filtered signal into a spectral domain, and the fourth
domain is a spectral domain obtained by frequency do-
main converting the first domain signal.
5. Audio encoder in accordance with one of the preceding
claims, further comprising a controller (300, 525) for
controlling the first switch (200) or the second
switch (521) in a signal adaptive way,
wherein the controller is operative to analyze a sig-
nal input into the first switch (200) or output by the
first coding branch or the second coding branch or a
signal obtained by decoding an output signal of the
first coding branch or the second coding branch with
respect to a target function, or
wherein the controller (300, 525) is operative to ana-
lyze a signal input into the second switch (521) or
output by the first processing branch or the second
processing branch or signals obtained by inverse
processing output signals from the first processing
branch (522) and the second processing branch (523,
524) with respect to a target function.
6. Audio encoder in accordance with one of the preceding
claims, in which the first coding branch (400) or the
second processing branch (523, 524) of the second cod-
ing branch (500) comprises an aliasing introducing
time/frequency converter and a quantizer/entropy coder
stage (421) and wherein the first processing branch of
the second coding branch comprises a quantizer or en-
tropy coder stage (522) without an aliasing introduc-
ing conversion.
7. Audio encoder in accordance with claim 6, in which the
aliasing introducing time/frequency converter compris-
es a windower for applying an analysis window and a
modified discrete cosine transform (MDCT) algorithm,
the windower being operative to apply the window func-
tion to subsequent frames in an overlapping manner so
that a sample of an input signal into the windower oc-
curs in at least two subsequent frames.
8. Audio encoder in accordance with one of the preceding
claims, in which the first processing branch (522)
comprises the LPC excitation coding of an algebraic
code excited linear prediction (ACELP) coder and the
second processing branch comprises an MDCT spectral
converter and a quantizer for quantizing spectral com-
ponents to obtain quantized spectral components,
wherein each quantized spectral component is zero or
is defined by one quantization index of a plurality of
quantization indices.
9. Audio encoder in accordance with claim 5, in which the
controller is operative to control the first switch
(200) in an open loop manner and to control the second
switch (521) in a closed loop manner.
10. Audio encoder in accordance with one of the preceding
claims, in which the first coding branch and the
second coding branch are operative to encode the audio
signal in a block wise manner, wherein the first
switch or the second switch are switching in a block-
wise manner so that a switching action takes place, at
the minimum, after a block of a predefined number of
samples of a signal, the predefined number of samples
forming a frame length for the corresponding switch
(521, 200).
11. Audio encoder in accordance with claim 10, in which
the frame length for the first switch is at least
double the size of the frame length of the second
switch.
12. Audio encoder in accordance with claim 5, in which the
controller is operative to perform a speech/music dis-
crimination in such a way that a decision to speech is
favored with respect to a decision to music so that a
decision to speech is taken even when a portion less
than 50% of a frame for the first switch is speech and
a portion more than 50% of the frame for the first
switch is music.
13. Audio encoder in accordance with claim 5 or 12, in
which a frame for the second switch is smaller than a
frame for the first switch, and in which the control-
ler (525, 300) is operative to take a decision to speech when only a portion of the first frame, which has a length which is more than 50% of the length of the second frame, is found to include speech.
14. Audio encoder in accordance with one of the preceding
claims, in which the first encoding branch (400) or
the second processing branch of the second coding
branch includes a variable time warping functionality.
15. Method of encoding an audio input signal (195), the
audio input signal being in a first domain, compris-
ing:
encoding (400) an audio signal using a first coding
algorithm to obtain a first encoded signal;
encoding (500) an audio signal using a second coding
algorithm to obtain a second encoded signal, wherein
the first coding algorithm is different from the
second coding algorithm; and
switching (200) between encoding using the first cod-
ing algorithm and encoding using the second coding al-
gorithm so that, for a portion of the audio input sig-
nal, either the first encoded signal or the second en-
coded signal is in an encoded output signal,
wherein encoding (500) using the second coding algo-
rithm comprises:
converting (510) the audio signal into a second
domain different from the first domain,
processing (522) an audio signal in the second
domain to obtain a first processed signal;
converting (523) a signal into a third domain
different from the first domain and the second
domain and processing (524) the signal in the
third domain to obtain a second processed signal;
and
switching (521) between processing (522) the au-
dio signal and converting (523) and processing
(524) so that, for a portion of the audio signal
encoded using the second coding algorithm, either
the first processed signal or the second
processed signal is in the second encoded signal.
16. Decoder for decoding an encoded audio signal, the en-
coded audio signal comprising a first coded signal, a
first processed signal in a second domain, and a
second processed signal in a third domain, wherein the
first coded signal, the first processed signal, and
the second processed signal are related to different
time portions of a decoded audio signal, and wherein a
first domain, the second domain and the third domain
are different from each other, comprising:
a first decoding branch (431, 440) for decoding the
first encoded signal based on the first coding algo-
rithm;
a second decoding branch for decoding the first
processed signal or the second processed signal,
wherein the second decoding branch comprises
a first inverse processing branch (531) for in-
verse processing the first processed signal to
obtain a first inverse processed signal in the
second domain;
a second inverse processing branch (533, 534) for
inverse processing the second processed signal to
obtain a second inverse processed signal in the
second domain;
a first combiner (532) for combining the first
inverse processed signal and the second inverse
processed signal to obtain a combined signal in
the second domain; and
a converter (540) for converting the combined
signal to the first domain; and
a second combiner (600) for combining the converted
signal in the first domain and the first decoded sig-
nal output by the first decoding branch to obtain a
decoded output signal in the first domain.
17. Decoder of claim 16, in which the first combiner
(532) or the second combiner (600) comprises a switch
having a cross fading functionality.
18. Decoder of claim 16 or 17, in which the first domain
is a time domain, the second domain is an LPC domain,
the third domain is an LPC spectral domain, or the
first encoded signal is encoded in a fourth domain,
which is a time-spectral domain obtained by
time/frequency converting a signal in the first do-
main.
19. Decoder in accordance with any of the claims 16 to 18,
in which the first decoding branch (431, 440) compris-
es an inverse coder and a de-quantizer and a frequency
domain time domain converter (440), or
the second decoding branch comprises an inverse coder
and a de-quantizer in the first inverse processing
branch or an inverse coder and a de-quantizer and an
LPC spectral domain to LPC domain converter (534) in
the second inverse processing branch.
20. Decoder of claim 19, in which the first decoding
branch or the second inverse processing branch com-
prises an overlap-adder for performing a time domain
aliasing cancellation functionality.
21. Decoder in accordance with one of claim 16 to 20, in
which the first decoding branch or the second inverse
processing branch comprises a de-warper controlled by
a warping characteristic included in the encoded audio
signal.
22. Decoder in accordance with one of claims 16 to 21, in
which the encoded signal comprises, as side informa-
tion (4a), an indication whether a coded signal is to
be coded by a first encoding branch or a second encod-
ing branch or a first processing branch of the second
encoding branch or a second processing branch of the
second encoding branch, and
which further comprises a parser for parsing the en-
coded signal to determine, based on the side informa-
tion (4a), whether a coded signal is to be processed
by the first decoding branch, or the second decoding
branch, or the first inverse processing branch of the
second decoding branch or the second inverse
processing branch of the second decoding branch.
23. Method of decoding an encoded audio signal, the en-
coded audio signal comprising a first coded signal, a
first processed signal in a second domain, and a
second processed signal in a third domain, wherein the
first coded signal, the first processed signal, and
the second processed signal are related to different
time portions of a decoded audio signal, and wherein a
first domain, the second domain and the third domain
are different from each other, comprising:
decoding (431, 440) the first encoded signal based on
a first coding algorithm;
decoding the first processed signal or the second
processed signal,
wherein the decoding the first processed signal or the
second processed signal comprises:
inverse processing (531) the first processed sig-
nal to obtain a first inverse processed signal in
the second domain;
inverse processing (533, 534) the second
processed signal to obtain a second inverse
processed signal in the second domain;
combining (532) the first inverse processed sig-
nal and the second inverse processed signal to
obtain a combined signal in the second domain;
and
converting (540) the combined signal to the first
domain; and
combining (600) the converted signal in the first do-
main and the decoded first signal to obtain a decoded
output signal in the first domain.
24. Encoded audio signal comprising:
a first coded signal encoded or to be decoded using a
first coding algorithm,
a first processed signal in a second domain, and a
second processed signal in a third domain, wherein the
first processed signal and the second processed signal
are encoded using a second coding algorithm,
wherein the first coded signal, the first processed
signal, and the second processed signal are related to
different time portions of a decoded audio signal,
wherein a first domain, the second domain and the
third domain are different from each other, and
side information (4a) indicating whether a portion of
the encoded signal is the first coded signal, the
first processed signal or the second processed signal.
25. Computer program for performing, when running on the
computer, the method of encoding an audio signal in
accordance with claim 15 or the method of decoding an
encoded audio signal in accordance with claim 23.