“Apparatus And Method For Encoding And Decoding An Audio Signal Using

< Back

“Apparatus And Method For Encoding And Decoding An Audio Signal Using An Aligned Look Ahead Portion”

Abstract: An apparatus for encoding an audio signal having a stream of audio samples (100) comprises: a windower (102) for applying a prediction coding analysis window (200) to the stream of audio samples to obtain windowed data for a prediction analysis and for applying a transform coding analysis window (204) to the stream of audio samples to obtain windowed data for a transform analysis wherein the transform coding analysis window is associated with audio samples within a current frame of audio samples and with audio samples of a predefined portion of a future frame of audio samples being a transform coding look ahead portion (206) wherein the prediction coding analysis window is associated with at least the portion of the audio samples of the current frame and with audio samples of a predefined portion of the future frame being a prediction coding look ahead portion (208) wherein the transform coding look ahead portion (206) and the prediction coding look ahead portion (208) are identically to each other or are different from each other by less than 20% of the prediction coding look ahead portion (208) or less than 20% of the transform coding look ahead portion 206; and an encoding processor (104) for generating prediction coded data for the current frame using the windowed data for the prediction analysis or for generating transform coded data for the current frame using the windowed data for the transform analysis.

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #

Filing Date

13 August 2013

Publication Number

14/2014

Publication Type

INA

Invention Field

ELECTRONICS

Status

davar@cal2.vsnl.net.in

Parent Application

Patent Number

Legal Status

Grant Date

2019-09-13

Renewal Date

Applicants

FRAUNHOFER GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.

Hansastraße 27c 80686 München

Inventors

1. RAVELLI Emmanuel

Henkestr. 6 91054 Erlangen

2. GEIGER Ralf

Jakob Herz Weg 36 91052 Erlangen

3. SCHNELL Markus

Rieterstr. 5 90419 Nürnberg

4. FUCHS Guillaume

Fürther Straße 17 91058 Erlangen

5. RUOPPILA Vesa

Schweppermannstr. 39 90408 Nürnberg

6. BÄCKSTRÖM Tom

Bauerngasse 8 12 90443 Nürnberg

7. GRILL Bernhard

Peter Henlein Str. 7 91207 Lauf an der Pegnitz

8. HELMRICH Christian

Hauptstraße 68 91054 Erlangen

Specification

Apparatus and Method for Encoding and Decoding an Audio Signal Using an
Aligned Look-Ahead Portion
Specification
The present invention is related to audio coding and, particularly, to audio coding relying
on switched audio encoders and correspondingly controlled audio decoders, particularly
suitable for low-delay applications.
Several audio coding concepts relying on switched codecs are known. One well-known
audio coding concept is the so-called Extended Adaptive Multi-Rate-Wideband (AMRWB+)
codec, as described in 3GPP TS 26.290 BIO.0.0 (201 1-03). The AMR-WB+ audio
codec contains all the AMR-WB speech codec modes 1 to 9 and AMR-WB VAD and
DTX. AMR-WB+ extends the AMR-WB codec by adding TCX, bandwidth extension, and
stereo.
The AMR-WB+ audio codec processes input frames equal to 2048 samples at an internal
sampling frequency Fs. The internal sampling frequency is limited to the range of 12800 to
38400 Hz. The 2048 sample frames are split into two critically sampled equal frequency
bands. This results in two super-frames of 1024 samples corresponding to the low
frequency (LF) and high frequency (HF) bands. Each super-frame is divided into four 256-
sample frames. Sampling at the internal sampling rate is obtained by using a variable
sampling conversion scheme, which re-samples the input signal.
The LF and HF signals are then encoded using two different approaches: the LF is encoded
and decoded using the "core" encoder/decoder based on switched ACELP and transform
coded excitation (TCX). In ACELP mode, the standard AMR-WB codec is used. The HF
signal is encoded with relatively few bits (16 bits/frame) using a bandwidth extension
(BWE) method. The parameters transmitted from encoder to decoder are the mode
selection bits, the LF parameters and the HF parameters. The parameters for each 1024
samples super-frame are decomposed into four packets of identical size. When the input
signal is stereo, the left and right channels are combined into a mono-signal for
ACELP/TCX encoding, whereas the stereo encoding receives both input channels. On the
decoder-side, the LF and HF bands are decoded separately after which they are combined
in a synthesis filterbank. If the output is restricted to mono only, the stereo parameters are
omitted and the decoder operates in mono mode. The AMR-WB+ codec applies LP
analysis for both the ACELP and TCX modes when encoding the LF signal. The LP
coefficients are interpolated linearly at every 64-samples subframe. The LP analysis
window is a half-cosine of length 384 samples. To encode the core mono-signal, either an
ACELP or TCX coding is used for each frame. The coding mode is selected based on a
closed-loop analysis-by-synthesis method. Only 256-sample frames are considered for
ACELP frames, whereas frames of 256, 512 or 1024 samples are possible in TCX mode.
The window used for LPC analysis in AMR-WB+ is illustrated in Fig. 5b. A symmetric
LPC analysis window with look-ahead of 20 ms is used. Look-ahead means that, as
illustrated in Fig. 5b, the LPC analysis window for the current frame illustrated at 500 not
only extends within the current frame indicated between 0 and 20 ms in Fig. 5b illustrated
by 502, but extends into the future frame between 20 and 40 ms. This means that, by using
this LPC analysis window, an additional delay of 20 ms, i.e., a whole future frame is
necessary. Therefore, the look-ahead portion indicated at 504 in Fig. 5b contributes to the
systematic delay associated with the AMR-WB+ encoder. In other words, a future frame
must be fully available so that the LPC analysis coefficients for the current frame 502 can
be calculated.
Fig. 5a illustrates a further encoder, the so-called AMR-WB coder and, particularly, the
LPC analysis window used for calculating the analysis coefficients for the current frame.
Once again, the current frame extends between 0 and 20 ms and the future frame extends
between 20 and 40 ms. In contrast to Fig. 5b, the LPC analysis window of AMR-WB
indicated at 506 has a look-ahead portion 508 of 5 ms only, i.e., the time distance between
20 ms and 25 ms. Hence, the delay introduced by the LPC analysis is reduced substantially
with respect to Fig. 5a. On the other hand, however, it has been found that a larger lookahead
portion for determining the LPC coefficients, i.e., a larger look-ahead portion for the
LPC analysis window results in better LPC coefficients and, therefore, a smaller energy in
the residual signal and, therefore, a lower bitrate, since the LPC prediction better fits the
original signal.
While Figs. 5a and 5b relate to encoders having only a single analysis window for
determining the LPC coefficients for one frame, Fig. 5c illustrates the situation for the
G.718 speech coder. The G718 (06-2008) specification is related to transmission systems
and media digital systems and networks and, particularly, describes digital terminal
equipment and, particularly, a coding of voice and audio signals for such equipment.
Particularly, this standard is related to robust narrow-band and wideband embedded
variable bitrate coding of speech and audio from 8-32 kbit/s as defined in recommendation
ITU-T G718. The input signal is processed using 20 ms frames. The codec delay depends
on the sampling rate of input and output. For a wideband input and wideband output, the
overall algorithmic delay of this coding is 42.875 ms. It consists of one 20-ms frame, 1.875
ms delay of input and output re-sampling filters, 10 ms for the encoder look-ahead, one ms
of post- filtering delay and 10 ms at the decoder to allow for the overlap-add operation of
higher layer transform coding. For a narrow band input and a narrow band output, higher
layers are not used, but the 10 ms decoder delay is used to improve the coding performance
in the presence of frame erasures and for music signals. If the output is limited to layer 2,
the codec delay can be reduced by 10 ms. The description of the encoder is as follows. The
lower two layers are applied to a pre-emphasized signal sampled at 12.8 kHz, and the
upper three layers operate in the input signal domain sampled at 6 kHz. The core layer is
based on the code-excited linear prediction (CELP) technology, where the speech signal is
modeled by an excitation signal passed through a linear prediction (LP) synthesis filter
representing the spectral envelope. The LP filter is quantized in the immittance spectral
frequency (ISF) domain using a switched-predictive approach and the multi-stage vector
quantization. The open-loop pitch analysis is performed by a pitch-tracking algorithm to
ensure a smooth pitch contour. Two concurrent pitch evolution contours are compared and
the track that yields the smoother contour is selected in order to make the pitch estimation
more robust. The frame level pre-processing comprises a high-pass filtering, a sampling
conversion to 12800 samples per second, a pre-emphasis, a spectral analysis, a detection of
narrow-band inputs, a voice activity detection, a noise estimation, noise reduction, linear
prediction analysis, an LP to ISF conversion, and an interpolation, a computation of a
weighted speech signal, an open-loop pitch analysis, a background noise update, a signal
classification for a coding mode selection and frame erasure concealment. The layer 1
encoding using the selected encoding type comprises an unvoiced coding mode, a voiced
coding mode, a transition coding mode, a generic coding mode, and a discontinuous
transmission and comfort noise generation (DTX/CNG).
A long-term prediction or linear prediction (LP) analysis using the auto-correlation
approach determines the coefficients of the synthesis filter of the CELP model. I n CELP,
however, the long-term prediction is usually the „adaptive-codebook" and so is different
from the linear-prediction. The linear-prediction can, therefore , be regarded more a shortterm
prediction. The auto-correlation of windowed speech is converted to the LP
coefficients using the Levinson-Durbin algorithm. Then, the LPC coefficients are
transformed to the immitance spectral pairs (ISP) and consequently to immitance spectral
frequencies (ISF) for quantization and interpolation purposes. The interpolated quantized
and unquantized coefficients are converted back to the LP domain to construct synthesis
and weighting filters for each subframe. In case of encoding of an active signal frame, two
sets of LP coefficients are estimated in each frame using the two LPC analysis windows
indicated at 510 and 512 in Fig. 5c. Window 512 is called the "mid-frame LPC window",
and window 510 is called the "end-frame LPC window". A look-ahead portion 4 of 10
s is used for the frame-end auto-correlation calculation. The frame structure is illustrated
in Fig. 5c. The frame is divided into four subframes, each subframe having a length of 5
ms corresponding to 64 samples at a sampling rate of 12.8 kHz. The windows for frameend
analysis and for mid-frame analysis are centered at the fourth subframe and the second
subframe, respectively as illustrated in Fig. 5c. A Hamming window with the length of 320
samples is used for windowing. The coefficients are defined in G.718, Section 6.4.1. The
auto-correlation computation is described in Section 6.4.2. The Levinson-Durbin algorithm
is described in Section 6.4.3, the LP to ISP conversion is described in Section 6.4.4, and
the ISP to LP conversion is described in Section 6.4.5.
The speech encoding parameters such as adaptive codebook delay and gain, algebraic
codebook index and gain are searched by minimizing the error between the input signal
and the synthesized signal in the perceptually weighted domain. Perceptually weighting is
performed by filtering the signal through a perceptual weighting filter derived from the LP
filter coefficients. The perceptually weighted signal is also used in open-loop pitch
analysis.
The G.718 encoder is a pure speech coder only having the single speech coding mode.
Therefore, the G.718 encoder is not a switched encoder and, therefore, this encoder is
disadvantageous in that it only provides a single speech coding mode within the core layer.
Hence, quality problems will occur when this coder is applied to other signals than speech
signals, i.e., to general audio signals, for which the model behind CELP encoding is not
appropriate.
An additional switched codec is the so-called USAC codec, i.e., the unified speech and
audio codec as defined in ISO/IEC CD 23003-3 dated September 24, 2010. The LPC
analysis window used for this switched codec is indicated in Fig. 5d at 516. Again, a
current frame extending between 0 and 20 ms is assumed and, therefore, it appears that the
look-ahead portion 618 of this codec is 20 ms, i.e., is significantly higher than the lookahead
portion of G.718. Hence, although the USAC encoder provides a good audio quality
due to its switched nature, the delay is considerable due to the LPC analysis window lookahead
portion 518 in Fig. 5d. The general structure of USAC is as follows. First, there is a
common pre/postprocessing consisting of an MPEG surround (MPEGS) functional unit to
handle stereo or multi-channel processing and an enhanced SBR (eSBR) unit which
handles the parametric representation of the higher audio frequency in the input signal.
Then, there are two branches, one consisting of a modified advanced audio coding (AAC)
tool path and the other consisting of a linear prediction coding (LP or LPC domain) based
path, which in turn features either a frequency domain representation or a time-domain
representation of the LPC residual. All transmitted spectra for both, AAC and LPC, are
represented in MDCT domain following quantization and arithmetic coding. The timedomain
representation uses an ACELP excitation coding scheme. The ACELP tool
provides a way to efficiently represent a time domain excitation signal by combining a
long-term predictor (adaptive codeword) with a pulse-like sequence (innovation
codeword). The reconstructed excitation is sent through an LP synthesis filter to form a
time domain signal. The input to the ACELP tool comprises adaptive and innovation
codebook indices, adaptive and innovation codes gain values, other control data and
inversely quantized and interpolated LPC filter coefficients. The output of the ACELP tool
is the time-domain reconstructed audio signal.
The MDCT-based TCX decoding tool is used to turn the weighted LP residual
representation from an MDCT domain back into a time domain signal and outputs the
weighted time-domain signal including weighted LP synthesis filtering. The IMDCT can
be configured to support 256, 512 or 1024 spectral coefficients. The input to the TCX tool
comprises the (inversely quantized) MDCT spectra, and inversely quantized and
interpolated LPC filter coefficients. The output of the TCX tool is the time-domain
reconstructed audio signal.
Fig. 6 illustrates a situation in USAC, where the LPC analysis windows 16 for the current
frame and 520 for the past or last frame are drawn, and where, in addition, a TCX window
522 is illustrated. The TCX window 522 is centered at the center of the current frame
extending between 0 and 20 ms and extends 10 ms into the past frame and 10 s into the
future frame extending between 20 and 40 ms. Hence, the LPC analysis window 516
requires an LPC look-ahead portion between 20 and 40 ms, i.e., 20 ms, while the TCX
analysis window additionally has a look-ahead portion extending between 20 and 30 ms
into the future frame. This means that the delay introduced by the USAC analysis window
516 is 20 ms, while the delay introduced into the encoder by the TCX window is 10 ms.
Hence. It becomes clear that the look-ahead portions of both kinds of windows are not
aligned to each other. Therefore, even though the TCX window 522 only introduces a
delay of 10 ms, the whole delay of the encoder is nevertheless 20 ms due to the LPC
analysis window 516. Therefore, even though there is a quite small look-ahead portion for
the TCX window, this does not reduce the overall algorithmic delay of the encoder, since
the total delay is determined by the highest contribution, i.e., is equal to 20 ms due to the
LPC analysis window 516 extending 20 ms into the future frame, i.e., not only covering
the current frame but additionally covering the future frame.
It is an object of the present invention to provide an improved coding concept for audio
coding or decoding which, on the one hand, provides a good audio quality and which, on
the other hand, results in a reduced delay.
This object is achieved by an apparatus for encoding an audio signal in accordance with
claim 1, a method of encoding an audio signal in accordance with claim 15, an audio
decoder in accordance with claim 16, a method of audio decoding in accordance with
claim 24 or a computer program in accordance with claim 25.
In accordance with the present invention, a switched audio codec scheme is applied having
a transform coding branch and a prediction coding branch. Importantly, the two kinds of
windows, i.e., the prediction coding analysis window on the one hand and the transform
coding analysis window on the other hand are aligned with respect to their look-ahead
portion so that the transform coding look-ahead portion and the prediction coding lookahead
portion are identical or are different from each other by less than 20% of the
prediction coding look-ahead portion or less than 20% of the transform coding look-ahead
portion. It is to be noted that the prediction analysis window" is used not only in the
prediction coding branch, but it is actually used in both branches. The LPC analysis is also
used for shaping the noise in the transform domain. Therefore, in other words, the lookahead
portions are identical or are quite close to each other. This ensures that an optimum
compromise is achieved and that no audio quality or delay features are set into a suboptimum
way. Hence, for the prediction coding in the analysis window it has been found
out that the LPC analysis is the better the higher the look-ahead is, but, on the other hand,
the delay increases with a higher look-ahead portion. On the other hand, the same is true
for the TCX window. The higher the look-ahead portion of the TCX window is, the better
the TCX bitrate can be reduced, since longer TCX windows result in lower bitrates in
general. Therefore, in contrast to the present invention, the look-ahead portions are
identical or quite close to each other and, particularly, less than 20% different from each
other. Therefore, the look-ahead portion, which is not desired due to delay reasons is, on
the other hand, optimally used by both, encoding/decoding branches.
In view of that, the present invention provides an improved coding concept with, on the
one hand, a low-delay when the look-ahead portion for both analysis windows is set low
and provides, on the other hand, an encoding/decoding concept with good characteristics
due to the fact that the delay which has to be introduced for audio quality reasons or bitrate
reasons anyways is optimally used by both coding branches and not only by a single
coding branch.
An apparatus for encoding an audio signal having a stream of audio samples comprises a
windower for applying a prediction coding analysis window to a stream of audio samples
to obtain windowed data for a prediction analysis and for applying a transform coding
analysis window to the stream of audio samples to obtain windowed data for a transform
analysis. The transform coding analysis window is associated with audio samples of a
current frame of audio samples of a predefined look-ahead portion of a future frame of
audio samples being a transform coding look-ahead portion.
Furthermore, the prediction coding analysis window is associated with at least a portion of
the audio samples of the current frame and with audio samples of a predefined portion of
the future frame being a prediction coding look-ahead portion.
The transform coding look-ahead portion and the prediction coding look-ahead portion are
identical to each other or are different from each other by less than 20 % of the prediction
coding look-ahead portion or less than 20 % of the transform coding look-ahead portion
and are therefore quite close to each other. The apparatus additionally comprises an
encoding processor for generating prediction coded data for the current frame using the
windowed data for the prediction analysis or for generating transform coded data for the
current frame using the window data for transform analysis.
An audio decoder for decoding an encoded audio signal comprises a prediction parameter
decoder for performing a decoding of data for a prediction coded frame from the encoded
audio signal and, for the second branch, a transform parameter decoder for performing a
decoding of data for a transform coded frame from the encoded audio signal.
The transform parameter decoder is configured for performing a spectral-time transform
which is preferably an aliasing-affected transform such as an MDCT or MDST or any
other such transform, and for applying a synthesis window to transformed data to obtain a
data for the current frame and the future frame. The synthesis window applied by the audio
decoder is so that it has a first overlap portion, an adjacent second non-overlap portion and
an adjacent third overlap portion, wherein the third overlap portion is associated with audio
samples for the future frame and the non-overlap portion is associated with data of the
current frame. Additionally, in order to have a good audio quality on the decoder side, an
overlap-adder is applied for overlapping and adding synthesis windowed samples
associated with the third overlap portion of a synthesis window for the current frame and
synthesis windowed samples associated with the first overlap portion of a synthesis
window for the future frame to obtain a first portion of audio samples for the future frame,
wherein a rest of the audio samples for the future frame are synthesis windowed samples
associated with the second non-overlapping portion of the synthesis window for the future
frame obtained without overlap-adding, when the current frame and the future frame
comprise transform coded data.
Preferred embodiments of the present invention have the feature that the same look-ahead
for the transform coding branch such as the TCX branch and the prediction coding branch
such as the ACELP branch are identical to each other so that both coding modes have the
maximum available look-ahead under delay constraints. Furthermore, it is preferred that
the TCX window overlap is restricted to the look-ahead portion so that a switching from
the transform coding mode to the prediction coding mode from one frame to the next frame
is easily possible without any aliasing addressing issues.
A further reason to restrict the overlap to the look ahead is for not introducing a delay at
the decoder side. If one would have a TCX window with 10ms look ahead, and e.g. 20ms
overlap, one would introduce 10ms more delay in the decoder. When one has a TCX
window with 10ms look ahead and 10ms overlap, one does not have any additional delay
at the decoder side. The easier switching is a good consequence of that.
Therefore, it is preferred that the second non-overlap portion of the analysis window and of
course the synthesis window extend until the end of current frame and the third overlap
portion only starts with respect to the future frame. Furthermore, the non-zero portion of
the TCX or transform coding analysis/synthesis window is aligned with the beginning of
the frame so that, again, an easy and low efficiency switching over from one mode to the
other mode is available.
Furthermore, it is preferred that a whole frame consisting of a plurality of subframes, such
as four subframes, can either be fully coded in the transform coding mode (such as TCX
mode) or fully coded in the prediction coding mode (such as the ACELP mode).
Furthermore, it is preferred to not only use a single LPC analysis window but two different
LPC analysis windows, where one LPC analysis window is aligned with the center of the
fourth subframe and is an end frame analysis window while the other analysis window is
aligned with the center of the second subframe and is a mid frame analysis window. If the
encoder is switched to transform coding, then however it is preferred to only transmit a
single LPC coefficient data set only derived from the LPC analysis based on the end frame
LPC analysis window. Furthermore, on the decoder-side, it is preferred to not use this LPC
data directly for transform coding synthesis, and particularly a spectral weighting of TCX
coefficients. Instead, it is preferred to interpolate the TCX data obtained from the end
frame LPC analysis window of the current frame with the data obtained by the end frame
LPC analysis window from the past frame, i.e. the frame immediately preceding in time
the current frame. By transmitting only a single set of LPC coefficients for a whole frame
in the TCX mode, a further bitrate reduction can be obtained compared to transmitting two
LPC coefficient data sets for mid frame analysis and end frame analysis. When, however,
the encoder is switched to ACELP mode, then both sets of LPC coefficients are transmitted
from the encoder to the decoder.
Furthermore, it is preferred that the mid-frame LPC analysis window ends immediately at
the later frame border of the current frame and additionally extends into the past frame.
This does not introduce any delay, since the past frame is already available and can be used
without any delay.
On the other hand, it is preferred that the end frame analysis window starts somewhere
within the current frame and not at the beginning of the current frame. This, however, is
not problematic, since, for the forming TCX weighting, an average of the end frame LPC
data set for the past frame and the end frame LPC data set for the current frame is used so
that, in the end, all data are in a sense used for calculating the LPC coefficients. Hence, the
start of the end frame analysis window is preferably within the look-ahead portion of the
end frame analysis window of the past frame.
On the decoder-side, a significantly reduced overhead for switching from one mode to the
other mode is obtained. The reason is that the non-overlapping portion of the synthesis
window, which is preferably symmetric within itself, is not associated to samples of the
current frame but is associated with samples of a future frame, and therefore only extends
within the look-ahead portion, i.e., in the future frame only. Hence, the synthesis window
is so that only the first overlap portion preferably starting at the immediate start of the
current frame is within the current frame and the second non-overlapping portion extends
from the end of the first overlapping portion to the end of the current frame and, therefore,
the second overlap portion coincides with the look-ahead portion. Therefore, when there is
a transition from TCX to ACELP, the data obtained due to the overlap portion of the
synthesis window is simply discarded and is replaced by prediction coding data which is
available from the very beginning of the future frame out of the ACELP branch.
On the other hand, when there is a switch from ACELP to TCX, a specific transition
window is applied which immediately starts at the beginning of the current frame, i.e., the
frame immediately after the switchover, with a non-overlapping portion so that any data do
not have to be reconstructed in order to find overlap "partners". Instead, the non-overlap
portion of the synthesis window provides correct data without any overlapping and without
any overlap-add procedures necessary in the decoder. Only for the overlap portions, i.e.,
the third portion of the window for the current frame and the first portion of the window
for the next frame, an overlap-add procedure is useful and performed in order to have, as in
a straightforward MDCT, a continuous fade-in/fade-out from one block to the other in
order to finally obtain a good audio quality without having to increase the bitrate due to the
critically sampled nature of the MDCT as also known in the art under the term "timedomain
aliasing cancellation (TDAC).
Furthermore, the decoder is useful in that, for an ACELP coding mode, LPC data derived
from the mid-frame window and the end-frame window in the encoder is transmitted
while, for the TCX coding mode, only a single LPC data set derived from the end-frame
window is used. For spectrally weighting TCX decoded data, however, the transmitted
LPC data is not used as it is, but the data is averaged with the corresponding data from the
end-frame LPC analysis window obtained for the past frame.
Preferred embodiments of the present invention are subsequently described with respect to
the accompanying drawings, in which:
Fig. 1a illustrates a block diagram of a switched audio encoder;
Fig. l b illustrates a block diagram of a corresponding switched decoder;
Fig. l c illustrates more details on the transform parameter decoder illustrated in Fig.
lb;
Fig. Id illustrates more details on the transform coding mode of the decoder of Fig.
la;
Fig. 2a illustrates a preferred embodiment for the windower applied in the encoder
for LPC analysis on the one hand and transform coding analysis on the other
hand, and is a representation of the synthesis window used in the transform
coding decoder of Fig. lb;
Fig. 2b illustrates a window sequence of aligned LPC analysis windows and TCX
windows for a time span of more than two frames;
illustrates a situation for a transition from TCX to ACELP and a transition
window for a transition from ACELP to TCX;
illustrates more details of the encoder of Fig. la;
illustrates an analysis-by-synthesis procedure for deciding on a coding mode
for a frame;
illustrates a further embodiment for deciding between the modes for each
illustrates the calculation and usage of the LPC data derived by using two
different LPC analysis windows for a current frame;
illustrates the usage of LPC data obtained by windowing using an LPC
analysis window for the TCX branch of the encoder;
illustrates LPC analysis windows for AMR-WB;
Fig. 5d illustrates symmetric windows for AMR-WB+ for the purpose of LPC
analysis;
Fig. 5c illustrates LPC analysis windows for a G.718 encoder;
Fig. 5d illustrates LPC analysis windows as used in USAC; and
Fig. 6 illustrates a TCX window for a current frame with respect to an LPC
analysis window for the current frame.
Fig. l a illustrates an apparatus for encoding an audio signal having a stream of audio
samples. The audio samples or audio data enter the encoder at 100. The audio data is
introduced into a windower 102 for applying a prediction coding analysis window to the
stream of audio samples to obtain windowed data for a prediction analysis. The windower
102 is additionally configured for applying a transform coding analysis window to the
stream of audio samples to obtain windowed data for a transform analysis. Depending on
the implementation, the LPC window is not applied directly on the original signal but on a
„pre-emphasized" signal (like in AMR-WB, AMR-WB+, G718 and USAC). On the other
hand the TCX window is applied on the original signal directly (like in USAC). However,
both windows can also be applied to the same signals or the TCX window can also be
applied to a processed audio signal derived from the original signal such as by preemphasizing
or any other weighting used for enhancing the quality or compression
efficiency.
The transform coding analysis window is associated with audio samples in a current frame
of audio samples and with audio samples of a predefined portion of the future frame of
audio samples being a transform coding look-ahead portion.
Furthermore, the prediction coding analysis window is associated with at least a portion of
the audio samples of the current frame and with audio samples of a predefined portion of
the future frame being a prediction coding look-ahead portion.
As outlined in block 102, the transform coding look-ahead portion and the prediction
coding look-ahead portion are aligned with each other, which means that these portions are
either identical or quite close to each other, such as different from each other by less than
20% of the prediction coding look-ahead portion or less than 20% of the transform coding
look-ahead portion. Preferably, the look-ahead portions are identical or different from each
other by less than even 5% of the prediction coding look-ahead portion or less than 5% of
the transform coding look-ahead portion.
The encoder additionally comprises an encoding processor 104 for generating prediction
coded data for the current frame using the windowed data for the prediction analysis or for
generating transform coded data for the current frame using the windowed data for the
transform analysis.
Furthermore, the encoder preferably comprises an output interface 106 for receiving, for a
current frame and, in fact, for each frame, LPC data 108a and transform coded data (such
as TCX data) or prediction coded data (ACELP data) over line 108b. The encoding
processor 104 provides these two kinds of data and receives, as input, windowed data for a
prediction analysis indicated at 110a and windowed data for a transform analysis indicated
at 110b. Furthermore, the apparatus for encoding comprises an encoding mode selector or
controller 112 which receives, as an input, the audio data 100 and which provides, as an
output, control data to the encoding processor 104 via control lines 14a, or control data to
the output interface 106 via control line 14b.
Fig. 3a provides additional details on the encoding processor 104 and the windower 102.
The windower 102 preferably comprises, as a first module, the LPC or prediction coding
analysis windower 102a and, as a second component or module, the transform coding
windower (such as TCX windower) 102b. As indicated by arrow 300, the LPC analysis
window and the TCX window are aligned with each other so that the look-ahead portions
of both windows are identical to each other, which means that both look-ahead portions
extend until the same time instant into a future frame. The upper branch in Fig. 3a from the
LPC windower 102a onwards to the right is a prediction coding branch comprising an LPC
analyzer and interpolator 302, a perceptual weighting filter or a weighting block 304 and a
prediction coding parameter calculator 306 such as an ACELP parameter calculator. The
audio data 100 is provided to the LPC windower 102a and the perceptual weighting block
304. Additionally, the audio data is provided to the TCX windower, and the lower branch
from the output of the TCX windower to the right constitutes a transform coding branch.
This transform coding branch comprises a time-frequency conversion block 310, a spectral
weighting block 312 and a processing/quantization encoding block 314. The time
frequency conversion block 310 is preferably implemented as an aliasing-introducing
transform such as an MDCT, an MDST or any other transform which has a number of
input values being greater than the number of output values. The time-frequency
conversion has, as an input, the windowed data output by the TCX or, generally stated,
transform coding windower 02b.
Although, Fig. 3a indicates, for the prediction coding branch, an LPC processing with an
ACELP encoding algorithm, other prediction coders such as CELP or any other time
domain coders known in the art can be applied as well, although the ACELP algorithm is
preferred due to its quality on the one hand and its efficiency on the other hand.
Furthermore, for the transform coding branch, an MDCT processing particularly in the
time-frequency conversion block 310 is preferred, although any other spectral domain
transforms can be performed as well.
Furthermore, Fig. 3a illustrates a spectral weighting 312 for transforming the spectral
values output by block 310 into an LPC domain. This spectral weighting 312 is performed
with weighting data derived from the LPC analysis data generated by block 302 in the
prediction coding branch. Alternatively, however, the transform from the time-domain into
the LPC domain could also be performed in the time-domain. In this case, an LPC analysis
filter would be placed before the TCX windower 102b in order to calculate the prediction
residual time domain data. However, it has been found that the transform from the timedomain
into the LPC-domain is preferably performed in the spectral domain by spectrally
weighting the transform-coded data using LPC analysis data transformed from LPC data
into corresponding weighing factors in the spectral domain such as the MDCT domain.
Fig. 3b illustrates the general overview for illustrating an analysis-by-synthesis or "closedloop"
determination of the coding mode for each frame. To this end, the encoder illustrated
in Fig. 3c comprises a complete transform coding encoder and transform coding decoder as
is illustrated at 104b and, additionally, comprises a complete prediction coding encoder
and corresponding decoder indicated at 104a in Fig. 3c. Both blocks 104a, 104b receive, as
an input, the audio data and perform a full encoding/decoding operation. Then, the results
of the encoding/decoding operation for both coding branches 104a, 104b are compared to
the original signal and a quality measure is determined in order to find out which coding
mode resulted in a better quality. The quality measure can be a segmented SNR value or an
average segmental SNR such as, for example, described in Section 5.2.3 of 3GPP TS
26.290. However, any other quality measures can be applied as well which typically rely
on a comparison of the encoding/decoding result with the original signal.
Based on the quality measure which is provided from each branch 104a, 104b to the
decider 112, the decider decides whether the current examined frame is to be encoded
using ACELP or TCX. Subsequent to the decision, there are several ways in order to
perform the coding mode selection. One way is that the decider 112 controls the
corresponding encoder/decoder blocks 104a, 104b, in order to simply output the coding
result for the current frame to the output interface 106, so that it is made sure that, for a
certain frame, only a single coding result is transmitted in the output coded signal at 107.
Alternatively, both devices 104a, 104b could forward their encoding result already to the
output interface 106, and both results are stored in the output interface 106 until the decider
controls the output interface via line 105 to either output the result from block 104b or
from block 104a.
Fig. 3b illustrates more details on the concept of Fig. 3c. Particularly, block 104a
comprises a complete ACELP encoder and a complete ACELP decoder and a comparator
112a. The comparator 112a provides a quality measure to comparator 112c. The same is
true for comparator 112b, which has a quality measure due to the comparison of a TCX
encoded and again decoded signal with the original audio signal. Subsequently, both
comparators 112a, 112b provide their quality measures to the final comparator 112c.
Depending on which quality measure is better, the comparator decides on a CELP or TCX
decision. The decision can be refined by introducing additional factors into the decision.
Alternatively, an open-loop mode for determining the coding mode for a current frame
based on the signal analysis of the audio data for the current frame can be performed. In
this case, the decider 1 2 of Fig. 3c would perform a signal analysis of the audio data for
the current frame and would then either control an ACELP encoder or a TCX encoder to
actually encode the current audio frame. In this situation, the encoder would not need a
complete decoder, but an implementation of the encoding steps alone within the encoder
would be sufficient. Open-loop signal classifications and signal decisions are, for example,
also described in AMR-WB+ (3GPP TS 26.290).
Fig. 2a illustrates a preferred implementation of the windower 102 and, particularly, the
windows supplied by the windower.
Preferably, the prediction coding analysis window for the current frame is centered at the
center of a fourth subframe and this window is indicated at 200. Furthermore, it is
preferred to use an additional LPC analysis window, i.e., the mid-frame LPC analysis
window indicated at 202 and centered at the center of the second subframe of the current
frame. Furthermore, the transform coding window such as, for example, the MDCT
window 204 is placed with respect to the two LPC analysis windows 200, 202 as
illustrated. Particularly, the look-ahead portion 206 of the analysis window has the same
length in time as the look-ahead portion 208 of the prediction coding analysis window.
Both look-ahead portions extend 10 ms into the future frame. Furthermore, it is preferred
that the transform coding analysis window not only has the overlap portion 206, but has a
non-overlap portion between 10 and 20 ms 208 and the first overlap portion 210. The
overlap portions 206 and 210 are so that an overlap-adder in a decoder performs an
overlap-add processing in the overlap portion, but an overlap-add procedure is not
necessary for the non-overlap portion.
Preferably, the first overlap portion 210 starts at the beginning of the frame, i.e., at zero ms
and extends until the center of the frame, i.e., 10 ms. Furthermore, the non-overlap portion
extends from the end of the first portion of the frame 210 until the end of the frame at 20
ms so that the second overlap portion 206 fully coincides with the look-ahead portion. This
has advantages due to switching from one mode to the other mode.. From a TCX
performance point of view, it would be better to use a sine window with full overlap (20
ms overlap, like in USAC). This would, however, make necessary a technology like
forward aliasing cancellation for the transitions between TCX and ACELP. Forward
aliasing cancellation is used in USAC to cancel the aliasing introduced by the missing next
TCX frames (replaced by ACELP). Forward aliasing cancellation requires a significant
amount of bits and thus is not suitable for a constant bitrate and, particularly, low-bitrate
codec like a preferred embodiment of the described codec. Therefore, in accordance with
the embodiments of the invention, instead of using FAC, the TCX window overlap is
reduced and the window is shifted towards the future so that the full overlap portion 206 is
placed in the future frame. Furthermore, the window illustrated in Fig. 2a for transform
coding has nevertheless a maximum overlap in order to receive perfect reconstruction in
the current frame, when the next frame is ACELP and without using forward aliasing
cancellation. This maximum overlap is preferably set to 0 ms which is the available lookahead
in time, i.e., 10 ms as becomes clear from Fig. 2a.
Although Fig. 2a has been described with respect to an encoder, where window 204 for
transform encoding is an analysis window, it is noted that window 204 also represents a
synthesis window for transform decoding. In a preferred embodiment, the analysis window
is identical to the synthesis window, and both windows are symmetric in itself. This means
that both windows are symmetric to a (horizontal) center line. In other applications,
however, non-symmetric windows can be used, where the analysis window is different in
shape than the synthesis window.
Fig. 2b illustrates a sequence of windows over a portion of a past frame, a subsequently
following current frame, a future frame which is subsequently following the current frame
and the next future frame which is subsequently following the future frame.
It becomes clear that the overlap-add portion processed by an overlap-add processor
illustrated at 250 extends from the beginning of each frame until the middle of each frame,
i.e., between 20 and 30 ms for calculating the future frame data and between 40 and 50 ms
for calculating TCX data for the next future frame or between zero and 10 ms for
calculating data for the current frame. However, for calculating the data in the second half
of each frame, no overlap-add, and therefore no forward aliasing cancellation technique is
necessary. This is due to the fact that the synthesis window has a non-overlap part in the
second half of each frame.
Typically, the length of an MDCT window is twice the length of a frame. This is the case
in the present invention as well. When, again, Fig. 2a is considered, however, it becomes
clear that the analysis/synthesis window only extends from zero to 30 ms, but the complete
length of the window is 40 ms. This complete length is significant for providing input data
for the corresponding folding or unfolding operation of the MDCT calculation. In order to
extend the window to a full length of 14 ms, 5 ms of zero values are added between - 5 and
0 ms and 5 seconds of MDCT zero values are also added at the end of the frame between
30 and 35 ms. This additional portions only having zeros, however, do not play any part
when it comes to delay considerations, since it is known to the encoder or decoder that the
last five ms of the window and the first five ms of the window are zeros, so that this data is
already present without any delay.
Fig. 2c illustrates the two possible transitions. For a transition from TCX to ACELP,
however, no special care has to be taken since, when it is assumed with respect to Fig. 2a
that the future frame is an ACELP frame, then the data obtained by TCX decoding the last
frame for the look-ahead portion 206 can simply be deleted, since the ACELP frame
immediately starts at the beginning of the future frame and, therefore, no data hole exists.
The ACELP data is self-consistent and, therefore, a decoder, when having a switch from
TCX to ACELP uses the data calculated from TCX for the current frame, discards the data
obtained by the TCX processing for the future frame and, instead, uses the future frame
data from the ACELP branch.
When, however, a transition from ACELP to TCX is performed, then a special transition
window as illustrated in Fig. 2c is used. This window starts at the beginning of the frame
from zero to 1, has a non-overlap portion 220 and has an overlap portion in the end
indicated at 222 which is identical to the overlap portion 206 of a straightforward MDCT
window.
This window is, additionally, padded with zeros between -12.5 ms to zero at the beginning
of the window and between 30 and 35.5 ms at the end, i.e., subsequent to the look-ahead
portion 222. This results in an increased transform length. The length is 50 ms, but the
length of the straightforward analysis/synthesis window is only 40 ms. This, however, does
not decrease the efficiency or increase the bitrate, and this longer transform is necessary
when a switch from ACELP to TCX takes place. The transition window used in the
corresponding decoder is identical to the window illustrated in Fig. 2c.
Subsequently, the decoder is discussed in more detail. Fig. l b illustrates an audio decoder
for decoding an encoded audio signal. The audio decoder comprises a prediction parameter
decoder 180, where the prediction parameter decoder is configured for performing a
decoding of data for a prediction coded frame from the encoded audio signal received at
181 and being input into an interface 182. The decoder additionally comprises a transform
parameter decoder 183 for performing a decoding of data for a transform coded frame from
the encoded audio signal on line 181. The transform parameter decoder is configured for
performing, preferably, an aliasing-affected spectral-time transform and for applying a
synthesis window to transformed data to obtain data for the current frame and a future
frame. The synthesis window has a first overlap portion, an adjacent second non-overlap
portion, and an adjacent third overlap portion as illustrated in Fig. 2a, wherein the third
overlap portion is only associated with audio samples for the future frame and the nonoverlap
portion is only associated with data of the current frame. Furthermore, an overlapadder
184 is provided for overlapping and adding synthesis window samples associated
with the third overlap portion of a synthesis window for the current frame and a synthesis
window at the samples associated with the first overlap portion of a synthesis window for
the future frame to obtain a first portion of audio samples for the future frame . The rest of
the audio samples for the future frame are synthesis windowed samples associated with the
second non-overlap portion of the synthesis window for the future frame obtained without
overlap-adding when the current frame and the future frame comprise transform coded
data. When, however, a switch takes place from one frame to the next frame, a combiner
185 is useful which has to care for a good switchover from one coding mode to the other
coding mode in order to finally obtain the decoded audio data at the output of the combiner
185.
Fig. 1c illustrates more details on the construction of the transform parameter decoder 183 .
The decoder comprises a decoder processing stage 183a which is configured for
performing all processing necessary for decoding encoded spectral data such as arithmetic
decoding or Huffman decoding or generally, entropy decoding and a subsequent dequantization,
noise filling, etc. to obtain decoded spectral values at the output of block 183.
These spectral values are input into a spectral weighter 183b. The spectral weighter 183b
receives the spectral weighting data from an LPC weighting data calculator 183c, which is
fed by LPC data generated from the prediction analysis block on the encoder-side and
received, at the decoder, via the input interface 182. Then, an inverse spectral transform is
performed which comprises, as a first stage, preferably a DCT-IV inverse transform 183d
and a subsequent defolding and synthesis windowing processing 183e, before the data for
the future frame, for example, is provided to the overlap-adder 184. The overlap-adder can
perform the overlap-add operation when the data for the next future frame is available.
Blocks 183d and 183e together constitute the spectral/time transform or, in the
embodiment in Fig. lc, a preferred MDCT inverse transform (MDCT ').
Particularly, the block 183d receives data for a frame of 20 ms, and increases the data
volume in the defolding step of block 183e into data for 40 ms, i.e., twice the amount of
the data from before and, subsequently, the synthesis window having a length of 40 ms
(when the zero portions at the beginning and the end of the window are added together) is
applied to these 40 ms of data. Then, at the output of block 183e, the data for the current
block and the data within the look-ahead portion for the future block are available.
Fig. Id illustrates the corresponding encoder-side processing. The features discussed in the
context of Fig. Id are implemented in the encoding processor 104 or by corresponding
blocks in Fig. 3a. The time-frequency conversion 310 in Fig. 3a is preferably implemented
as an MDCT and comprises a windowing, folding stage 310a, where the windowing
operation in block 310a is implemented by the TCX windower 103d. Hence, the actually
first operation in block 310 in Fig. 3a is the folding operation in order to bring back 40 ms
of input data into 20 ms of frame data. Then, with the folded data which now has received
aliasing contributions, a DCT-IV is performed as illustrated in block 310d. Block 302
(LPC analysis) provides the LPC data derived from the analysis using the end-frame LPC
window to an (LPC to MDCT) block 302b, and the block 302d generates weighting factors
for performing spectral weighting by spectral weighter 312. Preferably, 16 LPC
coefficients for one frame of 20 ms in the TCX encoding mode are transformed into 16
MDCT-domain weighting factors, preferably by using an oDFT (odd Discrete Fourier
Transform). For other modes, such as the NB modes having a sampling rate of 8 kHz, the
number of LPC coefficients can be lower such as 10. For other modes with a higher
sampling rates, there can also be more than 16 LPC coefficients. The result of this oDFT
are 16 weighting values, and each weighting value is associated with a band of spectral
data obtained by block 310b. The spectral weighting takes place by dividing all MDCT
spectral values for one band by the same weighting value associated with this band in order
to very efficiently perform this spectral weighting operation in block 312. Hence, 16 bands
of MDCT values are each divided by the corresponding weighting factor in order to output
the spectrally weighted spectral values which are then further processed by block 314 as
known in the art, i.e., by, for example, quantizing and entropy-encoding.
On the other hand, on the decoder-side, the spectral weighting corresponding to block 312
in Fig. 1d will be a multiplication performed by spectral weighter 83b illustrated in Fig.
Ic.
Subsequently, Fig. 4a and Fig. 4b are discussed in order to outline how the LPC data
generated by the LPC analysis window or generated by the two LPC analysis windows
illustrated in Fig. 2 are used either in ACELP mode or in TCX/MDCT mode.
Subsequent to the application of the LPC analysis window, the autocorrelation
computation is performed with the LPC windowed data. Then, a Levinson Durbin
algorithm is applied on the autocorrelation function. Then, the 6 LP coefficients for each
LP analysis, i.e., 16 coefficients for the mid-frame window and 6 coefficients for the endframe
window are converted into ISP values. Hence, the steps from the autocorrelation
calculation to the ISP conversion are, for example, performed in block 400 of Fig. 4a.
Then, the calculation continues, on the encoder side by a quantization of the ISP
coefficients. Then, the ISP coefficients are again unquantized and converted back to the LP
coefficient domain. Hence, LPC data or, stated differently, 16 LPC coefficients slightly
different from the LPC coefficients derived in block 400 (due to quantization and
requantization) are obtained which can then be directly used for the fourth subframe as
indicated in step 401 . For the other subframes, however, it is preferred to perform several
interpolations as, for example, outlined in section 6.8.3 of Rec. ITU-T G.718 (06/2008).
LPC data for the third subframe are calculated by interpolating end-frame and mid-frame
LPC data illustrated at block 402. The preferred interpolation is that each corresponding
data are divided by two and added together, i.e., an average of the end-frame and midframe
LPC data. In order to calculate the LPC data for the second subframe as illustrated in
block 403, additionally, an interpolation is performed. Particularly, 10% of the values of
the end-frame LPC data of the last frame, 80% of the mid-frame LPC data for the current
frame and 10% of the values of the LPC data for the end-frame of the current frame are
used in order to finally calculate the LPC data for the second subframe.
Finally, the LPC data for the first subframe are calculated, as indicated in block 404, by
forming an average between the end-frame LPC data of the last frame and the mid-frame
LPC data of the current frame.
For performing the ACELP encoding, both quantized LPC parameter sets, i.e., from the
mid-frame analysis and the end-frame analysis are transmitted to a decoder.
Based on the results for the individual subframes calculated by blocks 401 to 404, the
ACELP calculations are performed as indicated in block 405 in order to obtain the ACELP
data to be transmitted to the decoder.
Subsequently, Fig. 4b is described. Again, in block 400, mid-frame and end-frame LPC
data are calculated. However, since there is the TCX encoding mode, only the end-frame
LPC data are transmitted to the decoder and the mid-frame LPC data are not transmitted to
the decoder. Particularly, one does not transmit the LPC coefficients themselves to the
decoder, but one transmits the values obtained after ISP transform and quantization.
Hence, it is preferred that, as LPC data, the quantized ISP values derived from the endframe
LPC data coefficients are transmitted to the decoder.
In the encoder, however, the procedures in steps 406 to 408 are, nevertheless, to be
performed in order to obtain weighting factors for weighting the MDCT spectral data of
the current frame. To this end, the end-frame LPC data of the current frame and the endframe
LPC data of the past frame are interpolated. However, it is preferred to not
interpolate the LPC data coefficients themselves as directly derived from the LPC analysis.
Instead, it is preferred to interpolate the quantized and again dequantized ISP values
derived from the corresponding LPC coefficients. Hence, the LPC data used in block 406
as well as the LPC data used for the other calculations in block 401 to 404 are always,
preferably, quantized and again de-quantized ISP data derived from the original 16 LPC
coefficients per LPC analysis window.
The interpolation in block 406 is preferably a pure averaging, i.e., the corresponding values
are added and divided by two. Then, in block 407, the MDCT spectral data of the current
frame are weighted using the interpolated LPC data and, in block 408, the further
processing of weighted spectral data is performed in order to finally obtain the encoded
spectral data to be transmitted from the encoder to a decoder. Hence, the procedures
performed in the step 407 correspond to the block 312, and the procedure performed in
block 408 in Fig. 4d corresponds to the block 314 in Fig. 4d. The corresponding operations
are actually performed on the decoder-side. Hence, the same interpolations are necessary
on the decoder-side in order to calculate the spectral weighting factors on the one hand or
to calculate the LPC coefficients for the individual subframes by interpolation on the other
hand. Therefore, Fig. 4a and Fig. 4b are equally applicable to the decoder-side with respect
to the procedures in blocks 401 to 404 or 406 of Fig. 4b.
The present invention is particularly useful for low-delay codec implementations. This
means that such codecs are designed to have an algorithmic or systematic delay preferably
below 45 ms and, in some cases even equal to or below 35 ms. Nevertheless, the lookahead
portion for LPC analysis and TCX analysis are necessary for obtaining a good audio
quality. Therefore, a good trade-off between both contradictory requirements is necessary.
It has been found that the good trade-off between delay on the one hand and quality on the
other hand can be obtained by a switched audio encoder or decoder having a frame length
of 20 ms, but it has been found that values for frame lengths between 15 and 30 ms also
provide acceptable results. On the other hand, it has been found that a look-ahead portion
of 10 ms is acceptable when it comes to delay issues, but values between 5 ms and 20 ms
are also useful depending on the corresponding application. Furthermore, it has been found
that the relation between look-ahead portion and the frame length is useful when it has the
value of 0.5, but other values between 0.4 and 0.6 are useful as well. Furthermore,
although the invention has been described with ACELP on the one hand and MDCT-TCX
on the other hand, other algorithms operating in the time domain such as CELP or any
other prediction or wave form algorithms are useful as well. With respect to TCX/MDCT,
other transform domain coding algorithms such as an MDST, or any other transform-based
algorithms can be applied as well.
The same is true for the specific implementation of LPC analysis and LPC calculation. It is
preferred to rely on the procedures described before, but other procedures for
calculation/interpolation and analysis can be used as well, as long as those procedures rely
on an LPC analysis window.
Although some aspects have been described in the context of an apparatus, it is clear that
these aspects also represent a description of the corresponding method, where a block or
device corresponds to a method step or a feature of a method step. Analogously, aspects
described in the context of a method step also represent a description of a corresponding
block or item or feature of a corresponding apparatus.
Depending on certain implementation requirements, embodiments of the invention can be
implemented in hardware or in software. The implementation can be performed using a
digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an
EPROM, an EEPROM or a FLASH memory, having electronically readable control
signals stored thereon, which cooperate (or are capable of cooperating) with a
programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a non-transitory data carrier
having electronically readable control signals, which are capable of cooperating with a
programmable computer system, such that one of the methods described herein is
performed.
Generally, embodiments of the present invention can be implemented as a computer
program product with a program code, the program code being operative for performing
one of the methods when the computer program product runs on a computer. The program
code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods
described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program
having a program code for performing one of the methods described herein, when the
computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital
storage medium, or a computer-readable medium) comprising, recorded thereon, the
computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of
signals representing the computer program for performing one of the methods described
herein. The data stream or the sequence of signals may for example be configured to be
transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a
programmable logic device, configured to or adapted to perform one of the methods
described herein.
A further embodiment comprises a computer having installed thereon the computer
program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable
gate array) may be used to perform some or all of the functionalities of the methods
described herein. In some embodiments, a field programmable gate array may cooperate
with a microprocessor in order to perform one of the methods described herein. Generally,
the methods are preferably performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of the present
invention. It is understood that modifications and variations of the arrangements and the
details described herein will be apparent to others skilled in the art. It is the intent,
therefore, to be limited only by the scope of the impending patent claims and not by the
specific details presented by way of description and explanation of the embodiments
herein.
Claims
Apparatus for encoding an audio signal having a stream of audio samples (100),
comprising:
a windower (102) for applying a prediction coding analysis window (200) to the
stream of audio samples to obtain windowed data for a prediction analysis and for
applying a transform coding analysis window (204) to the stream of audio samples
to obtain windowed data for a transform analysis,
wherein the transform coding analysis window is associated with audio samples
within a current frame of audio samples and with audio samples of a predefined
portion of a future frame of audio samples being a transform-coding look-ahead
portion (206),
wherein the prediction coding analysis window is associated with at least the
portion of the audio samples of the current frame and with audio samples of a
predefined portion of the future frame being a prediction coding look-ahead portion
(208),
wherein the transform coding look-ahead portion (206) and the prediction coding
look-ahead portion (208) are identically to each other or are different from each
other by less than 20% of the prediction coding look-ahead portion (208) or less
than 20% of the transform coding look-ahead portion (206); and
an encoding processor (104) for generating prediction coded data for the current
frame using the windowed data for the prediction analysis or for generating
transform coded data for the current frame using the windowed data for the
transform analysis.
Apparatus of claim 1, wherein the transform coding analysis window (204)
comprises a non-overlapping portion extending in the transform-coding look-ahead
portion (206).
Apparatus of claim 1 or 2, wherein the transform coding analysis window (204)
comprises a further overlapping portion (210) starting at the beginning of the
current frame and ending at the beginning of the non-overlapping portion (208).
Apparatus of claim 1, in which the windower (102) is configured to only use a start
window (220, 222) for the transition from prediction coding to transform coding
from a frame to the next frame, wherein the start window is not used for a transition
from transform coding to prediction coding from one frame to the next frame.
Apparatus in accordance with one of the preceding claims, further comprising:
an output interface (106) for outputting an encoded signal for the current frame; and
an encoding mode selector ( 112) for controlling the encoding processor (104) to
output either prediction coded data or transform coded data for the current frame,
wherein the encoding mode selector ( 1 12) is configured to only switch between
either prediction coding or transform coding for the whole frame so that the
encoded signal for the whole frame either contains prediction coded data or
transform coded data.
Apparatus in accordance with one of the preceding claims,
wherein the windower (102) uses, in addition to the prediction coding analysis
window, a further prediction coding analysis window (202) being associated with
audio samples being placed at the beginning of the current frame, and wherein the
prediction coding analysis window (200) is not associated with audio samples being
placed at the beginning of the current frame.
Apparatus in accordance with one of the preceding claims,
wherein the frame comprises a plurality of subframes, wherein the prediction
analysis window (200) is centered to a center of a subframe, and wherein the
transform coding analysis window is centered to a border between two subframes.
Apparatus in accordance with claim 7,
wherein the prediction analysis window (200) is centered at the center of the last
subframe of the frame, wherein the further analysis window (202) is centered at a
center of the second subframe of the current frame, and wherein the transform
coding analysis window is centered at a border between the third and the fourth
subframe of the current frame, wherein the current frame is subdivided into four
subframes.
9. Apparatus in accordance with one of the preceding claims, wherein a further
prediction coding analysis window (202) does not have a look-ahead portion in the
future frame and is associated with samples of the current frame.
10. Apparatus in accordance with one of the preceding claims, in which the transform
coding analysis window additionally comprises a zero portion before a beginning of
the window and a zero portion subsequent to an end of the window so that a full
length in time of the transform coding analysis window is twice the length in time
of the current frame.
11. Apparatus in accordance with claim 10, wherein, for a transition from the
prediction coding mode to the transform coding mode from one frame to the next
frame, a transition window is used by the windower (102),
wherein the transition window comprises a first non-overlap portion starting at the
beginning of the frame and an overlap portion starting at the end of the non-overlap
portion and extending into the future frame,
wherein the overlap portion extending into the future frame has a length which is
identical to the length of the transform coding look-ahead portion of the analysis
window.
12. Apparatus in accordance with one of the preceding claims, wherein a length in time
of the transform coding analysis window is greater than a length in time of the
prediction coding analysis window (200, 202).
13. Apparatus in accordance with one of the preceding claims, further comprising:
an output interface (106) for outputting an encoded signal for the current frame; and
an encoding mode selector ( 1 12) for controlling the encoding processor (104) to
output either prediction coded data or transform coded data for the current frame,
wherein the window (102) is configured to use a further prediction coding window
located in the current frame before the prediction coding window, and
wherein the encoding mode selector ( 11 ) is configured to control the encoding
processor (104) to only forward prediction coding analysis data derived from the
prediction coding window, when the transform coded data is output to the output
interface and not to forward the prediction coding analysis data derived from the
further prediction coding window, and
wherein the encoding mode selector ( 112) is configured to control the encoding
processor (104) to forward prediction coding analysis data derived from the
prediction coding window and to forward the prediction coding analysis data
derived from the further prediction coding window, when the prediction coded data
is output to the output interface.
Apparatus in accordance with one of the preceding claims, wherein the encoding
processor (104) comprises:
a prediction coding analyzer (302) for deriving prediction coding data for the
current frame from the windowed data (100a) for a prediction analysis;
a prediction coding branch comprising:
a filter stage (304) for calculating filter data from the audio samples for the
current frame using the prediction coding data; and
a prediction coder parameter calculator (306) for calculating prediction
coding parameters for the current frames; and
a transform coding branch comprising:
a time-spectral converter (310) for converting the window data for the
transform coding algorithm into a spectral representation;
a spectral weighter (312) for weighting the spectral data using weighted
weighting data derived from the prediction coding data to obtain weighted
spectral data; and
a spectral data processor (314) for processing the weighted spectral data to
obtain transform coded data for the current frame.
15. Method of encoding an audio signal having a stream of audio samples (100),
comprising:
applying (102) a prediction coding analysis window (200) to the stream of audio
samples to obtain windowed data for a prediction analysis and applying a transform
coding analysis window (204) to the stream of audio samples to obtain windowed
data for a transform analysis,
wherein the transform coding analysis window is associated with audio samples
within a current frame of audio samples and with audio samples of a predefined
portion of a future frame of audio samples being a transform-coding look-ahead
portion (206),
wherein the prediction coding analysis window is associated with at least the
portion of the audio samples of the current frame and with audio samples of a
predefined portion of the future frame being a prediction coding look-ahead portion
(208),
wherein the transform coding look-ahead portion (206) and the prediction coding
look-ahead portion (208) are identically to each other or are different from each
other by less than 20% of the prediction coding look-ahead portion (208) or less
than 20% of the transform coding look-ahead portion (206); and
generating (104) prediction coded data for the current frame using the windowed
data for the prediction analysis or for generating transform coded data for the
current frame using the windowed data for the transform analysis.
16. Audio decoder for decoding an encoded audio signal, comprising:
a prediction parameter decoder (180) for performing a decoding of data for a
prediction coded frame from the encoded audio signal;
a transform parameter decoder (183) for performing a decoding of data for a
transform coded frame from the encoded audio signal,
wherein the transform parameter decoder (183) is configured for performing a
spectral-time transform and for applying a synthesis window to transformed data to
obtain data for the current frame and a future frame, the synthesis window having a
first overlap portion, an adjacent second overlap portion and an adjacent third
overlap portion (206), the third overlap portion being associated with audio samples
for the future frame and the non-overlap portion (208) being associated with data of
the current frame; and
an overlap-adder (184) for overlapping and adding synthesis windowed samples
associated with the third overlap portion of a synthesis window for the current
frame and synthesis windowed samples associated with the first overlap portion of
a synthesis window for the future frame to obtain a first portion of audio samples
for the future frame, wherein a rest of the audio samples for the future frame are
synthesis windowed samples associated with the second non-overlapping portion of
the synthesis window for the future frame obtained without overlap-adding, when
the current frame and the future frame comprise transform-coded data.
17. Audio decoder of claim 16, wherein the current frame of the encoded audio signal
comprises transform coded data and the future frame comprises prediction coded
data, wherein the transform parameter decoder (183) is configured to perform a
synthesis windowing using the synthesis window for the current frame to obtain
windowed audio samples associated with the non-overlap portion (208) of the
synthesis window, wherein the synthesis windowed audio samples associated with
the third overlap portion of the synthesis window for the current frame are
discarded, and
wherein audio samples for the future frame are provided by the prediction
parameter decoder ( 180) without data from the transform parameter decoder ( 183).
18. Audio decoder of claim 16 or 17,
wherein the current frame comprises prediction coding data and the future frame
comprises transform coding data,
wherein the transform parameter decoder (183) is configured for using a transition
window being different from the synthesis window,
wherein the transition window (220, 222) comprises a first non-overlap portion
(220) at the beginning of the future frame and an overlap portion (222) starting at
an end of the future frame and extending into the frame following the future frame
in time, and
wherein the audio samples for the future frame are generated without an overlap
and audio data associated with the second overlap portion (222) of the window for
the future frame are calculated by the overlap-adder (184) using the first overlap
portion of the synthesis window for the frame following the future frame.
19. Audio decoder of one of claims 16 to 18,
in which the transform parameter calculator (183) comprises:
a spectral weighter (183b) for weighting decoded transform spectral data for the
current frame using prediction coding data; and
an prediction coding weighting data calculator (183c) for calculating the prediction
coding data by combining a weighted sum of prediction coding data derived from a
past frame and prediction coding data derived from the current frame to obtain
interpolated prediction coding data.
20. Audio decoder in accordance with claim 19,
wherein the prediction coding weighting data calculator (183c) is configured to
convert the prediction coding data into a spectral representation having a weighting
value for each frequency band, and
wherein the spectral weighter (183b) is configured to weight all spectral values in a
band by the same weighting value for this band.
2 1. Audio decoder of any of claims 16 to 19, wherein the synthesis window is
configured to have a total time length less than 50 ms and greater than 25 ms,
wherein the first and the third overlap portions have the same length and wherein
the third overlap portion has a length smaller than 15 ms.
22. Audio decoder of any of claims 16 to 21,
wherein the synthesis window has a length of 30 ms without zero padded portions,
the first and third overlap portions each have a length of 10 ms and the nonoverlapping
portion has a length of 10 ms.
23. Audio decoder of any of claims 16 to 22,
wherein the transform parameter decoder (183) is configured to apply, for the
spectral-time transform, a DCT transform (183d) having a number of samples
corresponding to a frame length, and a defolding operation (183e) for generating a
number of time values being twice the number of time values before the DCT, and
to apply ( 183e) the synthesis window to a result of the defolding operation, wherein
the synthesis window comprises, before the first overlap portion and subsequent to
the third overlap portion zero portions having a length being half the length of the
first and third overlap portions.
24. Method of decoding an encoded audio signal, comprising:
performing (180) a decoding of data for a prediction coded frame from the encoded
audio signal;
from the encoded audio signal,
wherein the step of performing (183) a decoding of data for a transform coded
frame comprises performing a spectral-time transform and applying a synthesis
window to transformed data to obtain data for the current frame and a future frame,
the synthesis window having a first overlap portion, an adjacent second overlap
portion and an adjacent third overlap portion (206), the third overlap portion being
associated with audio samples for the future frame and the non-overlap portion
(208) being associated with data of the current frame; and
overlapping and adding (184) synthesis windowed samples associated with the third
overlap portion of a synthesis window for the current frame and synthesis
windowed samples associated with the first overlap portion of a synthesis window
for the future frame to obtain a first portion of audio samples for the future frame,
wherein a rest of the audio samples for the future frame are synthesis windowed
samples associated with the second non-overlapping portion of the synthesis
window for the future frame obtained without overlap-adding, when the current
frame and the future frame comprise transform-coded data.
Computer program having a program code for performing, when running on a
computer, the method of encoding an audio signal of claim 15 or the method of
decoding an audio signal of claim 24.

Documents

Orders

Section	Controller	Decision Date

Application Documents

#	Name	Date
1	2515-KOLNP-2013-RELEVANT DOCUMENTS [06-09-2023(online)].pdf	2023-09-06
1	FOA.pdf	2013-08-13
2	2515-KOLNP-2013-RELEVANT DOCUMENTS [10-09-2022(online)].pdf	2022-09-10
2	F5.pdf	2013-08-13
3	F3.pdf	2013-08-13
3	2515-KOLNP-2013-RELEVANT DOCUMENTS [25-09-2021(online)].pdf	2021-09-25
4	F2.pdf	2013-08-13
4	2515-KOLNP-2013-RELEVANT DOCUMENTS [06-04-2020(online)].pdf	2020-04-06
5	DW.pdf	2013-08-13
5	2515-KOLNP-2013-IntimationOfGrant13-09-2019.pdf	2019-09-13
6	2515-KOLNP-2013-PatentCertificate13-09-2019.pdf	2019-09-13
6	2515-KOLNP-2013-(16-01-2014)-CORRESPONDENCE.pdf	2014-01-16
7	2515-KOLNP-2013-HearingNoticeLetter09-08-2019.pdf	2019-08-09
7	2515-KOLNP-2013-(16-01-2014)-ANNEXURE TO FORM 3.pdf	2014-01-16
8	2515-KOLNP-2013.pdf	2014-03-05
8	2515-KOLNP-2013-CORRECTED PAGES [08-08-2019(online)].pdf	2019-08-08
9	2515-KOLNP-2013-FORM 13 [08-08-2019(online)].pdf	2019-08-08
9	Other Patent Document [12-07-2016(online)].pdf	2016-07-12
10	2515-KOLNP-2013-Written submissions and relevant documents (MANDATORY) [08-08-2019(online)].pdf	2019-08-08
10	Other Patent Document [24-08-2016(online)].pdf	2016-08-24
11	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [24-06-2019(online)].pdf	2019-06-24
11	Other Patent Document [28-03-2017(online)].pdf	2017-03-28
12	2515-KOLNP-2013-ABSTRACT [14-05-2019(online)]-1.pdf	2019-05-14
12	Information under section 8(2) [21-06-2017(online)].pdf	2017-06-21
13	2515-KOLNP-2013-ABSTRACT [14-05-2019(online)].pdf	2019-05-14
13	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [20-09-2017(online)].pdf	2017-09-20
14	2515-KOLNP-2013-CLAIMS [14-05-2019(online)]-1.pdf	2019-05-14
14	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [15-11-2017(online)].pdf	2017-11-15
15	2515-KOLNP-2013-CLAIMS [14-05-2019(online)].pdf	2019-05-14
15	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [14-03-2018(online)].pdf	2018-03-14
16	2515-KOLNP-2013-CORRESPONDENCE [14-05-2019(online)]-1.pdf	2019-05-14
16	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [16-03-2018(online)].pdf	2018-03-16
17	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [14-09-2018(online)].pdf	2018-09-14
17	2515-KOLNP-2013-CORRESPONDENCE [14-05-2019(online)].pdf	2019-05-14
18	2515-KOLNP-2013-DRAWING [14-05-2019(online)]-1.pdf	2019-05-14
18	2515-KOLNP-2013-FER.pdf	2018-11-15
19	2515-KOLNP-2013-DRAWING [14-05-2019(online)].pdf	2019-05-14
19	2515-KOLNP-2013-PETITION UNDER RULE 137 [14-05-2019(online)].pdf	2019-05-14
20	2515-KOLNP-2013-FER_SER_REPLY [14-05-2019(online)]-1.pdf	2019-05-14
20	2515-KOLNP-2013-OTHERS [14-05-2019(online)].pdf	2019-05-14
21	2515-KOLNP-2013-FER_SER_REPLY [14-05-2019(online)].pdf	2019-05-14
21	2515-KOLNP-2013-OTHERS [14-05-2019(online)]-1.pdf	2019-05-14
22	2515-KOLNP-2013-FER_SER_REPLY [14-05-2019(online)].pdf	2019-05-14
22	2515-KOLNP-2013-OTHERS [14-05-2019(online)]-1.pdf	2019-05-14
23	2515-KOLNP-2013-FER_SER_REPLY [14-05-2019(online)]-1.pdf	2019-05-14
23	2515-KOLNP-2013-OTHERS [14-05-2019(online)].pdf	2019-05-14
24	2515-KOLNP-2013-PETITION UNDER RULE 137 [14-05-2019(online)].pdf	2019-05-14
24	2515-KOLNP-2013-DRAWING [14-05-2019(online)].pdf	2019-05-14
25	2515-KOLNP-2013-DRAWING [14-05-2019(online)]-1.pdf	2019-05-14
25	2515-KOLNP-2013-FER.pdf	2018-11-15
26	2515-KOLNP-2013-CORRESPONDENCE [14-05-2019(online)].pdf	2019-05-14
26	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [14-09-2018(online)].pdf	2018-09-14
27	2515-KOLNP-2013-CORRESPONDENCE [14-05-2019(online)]-1.pdf	2019-05-14
27	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [16-03-2018(online)].pdf	2018-03-16
28	2515-KOLNP-2013-CLAIMS [14-05-2019(online)].pdf	2019-05-14
28	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [14-03-2018(online)].pdf	2018-03-14
29	2515-KOLNP-2013-CLAIMS [14-05-2019(online)]-1.pdf	2019-05-14
29	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [15-11-2017(online)].pdf	2017-11-15
30	2515-KOLNP-2013-ABSTRACT [14-05-2019(online)].pdf	2019-05-14
30	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [20-09-2017(online)].pdf	2017-09-20
31	2515-KOLNP-2013-ABSTRACT [14-05-2019(online)]-1.pdf	2019-05-14
31	Information under section 8(2) [21-06-2017(online)].pdf	2017-06-21
32	2515-KOLNP-2013-Information under section 8(2) (MANDATORY) [24-06-2019(online)].pdf	2019-06-24
32	Other Patent Document [28-03-2017(online)].pdf	2017-03-28
33	2515-KOLNP-2013-Written submissions and relevant documents (MANDATORY) [08-08-2019(online)].pdf	2019-08-08
33	Other Patent Document [24-08-2016(online)].pdf	2016-08-24
34	2515-KOLNP-2013-FORM 13 [08-08-2019(online)].pdf	2019-08-08
34	Other Patent Document [12-07-2016(online)].pdf	2016-07-12
35	2515-KOLNP-2013-CORRECTED PAGES [08-08-2019(online)].pdf	2019-08-08
35	2515-KOLNP-2013.pdf	2014-03-05
36	2515-KOLNP-2013-HearingNoticeLetter09-08-2019.pdf	2019-08-09
36	2515-KOLNP-2013-(16-01-2014)-ANNEXURE TO FORM 3.pdf	2014-01-16
37	2515-KOLNP-2013-PatentCertificate13-09-2019.pdf	2019-09-13
37	2515-KOLNP-2013-(16-01-2014)-CORRESPONDENCE.pdf	2014-01-16
38	DW.pdf	2013-08-13
38	2515-KOLNP-2013-IntimationOfGrant13-09-2019.pdf	2019-09-13
39	F2.pdf	2013-08-13
39	2515-KOLNP-2013-RELEVANT DOCUMENTS [06-04-2020(online)].pdf	2020-04-06
40	F3.pdf	2013-08-13
40	2515-KOLNP-2013-RELEVANT DOCUMENTS [25-09-2021(online)].pdf	2021-09-25
41	F5.pdf	2013-08-13
41	2515-KOLNP-2013-RELEVANT DOCUMENTS [10-09-2022(online)].pdf	2022-09-10
42	2515-KOLNP-2013-RELEVANT DOCUMENTS [06-09-2023(online)].pdf	2023-09-06
42	FOA.pdf	2013-08-13

Search Strategy

1	Search_15-02-2018.pdf