Apparatus, Method And Computer Program For Obtaining A Parameter


Apparatus, Method And Computer Program For Obtaining A Parameter Describing A Variation Of A Signal Characteristic Of A Signal

Abstract: An apparatus for obtaining a parameter describing a variation of a signal characteristic of a signal on the basis of actual transform-domain parameters describing the audio signal in a transform domain comprises a parameter determinator. The parameter determinator is configured to determine one or more model parameters of a transform-domain variation model describing an evolution of the transform-domain parameters in dependence on one or more model parameters representing a signal characteristic, such that a model error, representing a deviation between a modeled temporal evolution of the transform-domain parameters and an evolution of the actual transform-domain parameters, is brought below a predetermined threshold value or minimized.


Patent Information

Application #

Filing Date

21 July 2011

Publication Number

07/2012

Publication Type

INA

Invention Field

ELECTRONICS

Status

Parent Application

Patent Number

Legal Status

Grant Date

2019-01-03

Renewal Date

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V.

HANSASTRASSE 27C, 80686 MUNICH, GERMANY

Inventors

1. BAECKSTROEM, TOM

BAUERNGASSE 8-12, 90443 NUERNBERG, GERMANY

2. BAYER, STEFAN

DORTMUNDER STRASSE 14, 90425 NUERNBERG, GERMANY

3. GEIGER, RALF

JAKOB-HERZ-WEG 36, 91052 ERLANGEN, GERMANY

4. NEUENDORF, MAX

PARADIESSTRASSE 20, 90459 NÜRNBERG, GERMANY

5. DISCH, SASCHA

WILHELMSTRASSE 70, 90766 FÜRTH, GERMANY

Specification

Apparatus, method and computer program for obtaining a
parameter describing a variation of a signal characteristic
of a signal
Background of the invention
Embodiments according to the invention are related to an
apparatus, a method and a computer program for obtaining a
parameter describing a variation of a signal characteristic
of a signal on the basis of actual transform-domain
parameters describing the audio signal in a transform
domain.
Preferred embodiments according to the invention are
related to an apparatus, a method and a computer program
for obtaining a parameter describing a temporal variation
of a signal characteristic of an audio signal on the basis
of actual transform-domain parameters describing the audio
signal in a transform domain.
Further embodiments according to the invention are related
to signal variation estimation.
While the primary scope of the current invention is
analysis of temporal variations of audio signals, the same
method can be readily adapted to any digital signal and the
variations that such signals exhibit on any of their axes.
Such signals and variations include, for example, spatial
and temporal variations in characteristics such as
intensity and contrast of images and movies, modulations
(variations) in characteristics such as amplitude and
frequency of radar and radio signals, and variations in
properties such as heterogeneity of electrocardiogram
signals.
In the following, a brief introduction regarding the
concept of signal variation estimation will be given.
Classical signal processing usually begins with the
assumption of locally stationary signals and for many
applications, this is a reasonable assumption. However, to
claim that signals such as speech and audio are locally
stationary stretches the truth beyond acceptable levels in
some cases. Signals whose characteristics rapidly change
introduce distortions to analysis results that are
difficult to contain by classical approaches and thus
require methodology specially tailored for rapidly varying
signals.
For example, the coding of a speech signal with a transform
based coder may be considered. Here, the input signal is
analyzed in windows, whose contents are transformed to the
spectral domain. When the signal is a harmonic signal whose
fundamental frequency rapidly changes, the locations of
spectral peaks, corresponding to the harmonics, change over
time. If, for example, the analysis window length is
relatively long in comparison to the change in fundamental
frequency, the spectral peaks are spread to neighboring
frequency bins. In other words, the spectral representation
becomes smeared. This distortion may be especially severe at
the upper frequencies, where the location of spectral peaks
moves more rapidly when the fundamental frequency changes.
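The smearing effect can be illustrated with a small numerical experiment. The following is only a sketch under stated assumptions: the function names, the window length of 256 samples, the rectangular window and the chosen frequencies (given in cycles per window) are hypothetical and are not taken from the specification. It compares how much of the spectral energy falls into the strongest frequency bin for a steady tone and for harmonics of a tone whose fundamental glides during the window.

```python
import cmath
import math

def peak_energy_fraction(x):
    """Fraction of the spectral energy (bins 0..N/2) found in the strongest DFT bin."""
    n = len(x)
    energies = []
    for b in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * b * i / n) for i in range(n))
        energies.append(abs(acc) ** 2)
    return max(energies) / sum(energies)

def harmonic(n, start_cycles, end_cycles, order):
    """One harmonic of a tone whose fundamental glides linearly inside the window.

    The fundamental moves from start_cycles to end_cycles (cycles per window);
    the phase is the integral of the instantaneous frequency times the order.
    """
    out = []
    for i in range(n):
        u = i / n
        cycles = start_cycles * u + 0.5 * (end_cycles - start_cycles) * u * u
        out.append(math.sin(2.0 * math.pi * order * cycles))
    return out

N = 256  # hypothetical analysis window length
steady = peak_energy_fraction(harmonic(N, 20, 20, 1))   # constant fundamental
glide_1 = peak_energy_fraction(harmonic(N, 16, 24, 1))  # gliding fundamental
glide_3 = peak_energy_fraction(harmonic(N, 16, 24, 3))  # third harmonic of the glide
```

In this sketch the steady tone sits exactly on one bin, whereas the gliding fundamental sweeps 8 bins and its third harmonic sweeps 24 bins within the same window, which is why the upper harmonics are affected most.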
While methods exist for compensation of changes in the
fundamental frequency, such as the time-warped modified
discrete cosine transform (TW-MDCT) (see references [8] and
[3]), pitch variation estimation has remained a challenge.
In the past, pitch variation has been estimated by
measuring the pitch and simply taking the time derivative.
However, since pitch estimation is a difficult and often
ambiguous task, the pitch variation estimates were littered
with errors. Pitch estimation suffers, among others, from
two types of common errors (see, for example, reference
[2]). Firstly, when the harmonics have greater energy than
the fundamental, estimators are often misled into believing
that the harmonic is actually the fundamental, whereby the
output is a multiple of the true frequency. Such errors can
be observed as discontinuities in the pitch track and
produce a huge error in terms of the time derivative.
Secondly, most pitch estimation methods basically rely on
peak picking in the autocorrelation (or similar) domain(s)
by some heuristic. Especially in the case of varying
signals, these peaks are broad (flat at the top), whereby a
small error in the autocorrelation estimate can move the
estimated peak location significantly. The pitch estimate
is thus an unstable estimate.
As indicated above, the general approach in signal
processing is to assume that the signal is constant in
short time intervals and estimate the properties in such
intervals. If, then, the signal is actually time-varying,
it is assumed that the time evolution of the signal is
sufficiently slow, so that the assumption of stationarity
in a short interval is sufficiently accurate and analysis
in short intervals will not produce significant distortion.
In view of the above, it is desirable to provide a concept
for obtaining a parameter describing a temporal variation
of a signal characteristic with improved robustness.
Summary of the invention
An embodiment according to the invention creates an
apparatus for obtaining a parameter describing a temporal
variation of a signal characteristic of an audio signal on
the basis of actual transform-domain parameters describing
the audio signal in a transform domain. The apparatus
comprises a parameter determinator configured to determine
one or more model parameters of a transform-domain
variation model describing a temporal evolution of
transform-domain parameters in dependence on one or more
model parameters representing a signal characteristic, such
that a model error, representing a deviation between a
modeled temporal evolution of the transform-domain
parameters and a temporal evolution of the actual
transform-domain parameters, is brought below a
predetermined threshold value or is minimized.
This embodiment is based on the finding that typical
temporal variations of an audio signal result in a
characteristic temporal evolution in the transform-domain,
which can be well described using only a limited number of
model parameters. While this is particularly true for voice
signals, where the characteristic temporal evolution is
determined by the typical anatomy of the human speech
organs, the assumption holds over a wide range of audio and
other signals, like typical music signals.
Further, the typically smooth temporal evolution of a
signal characteristic (like, for example, a pitch, an
envelope, a tonality, a noisiness, and so on) can be
considered by the transform-domain variation model.
Accordingly, the usage of a parameterized transform-domain
variation model may even serve to enforce (or to consider)
the smoothness of the estimated signal characteristic.
Thus, discontinuities of the estimated signal
characteristic, or of the derivative thereof, can be
avoided. By choosing the transform-domain variation model
accordingly, any typical restrictions can be imposed on the
modeled variation of the signal characteristics, like, for
example, a limited rate of variation, a limited range of
values, and so on. Also, by choosing the transform-domain
variation model appropriately, the effects of harmonics can
be considered, such that, for example, an improved
reliability can be obtained by simultaneously modeling a
temporal evolution of a fundamental frequency and the
harmonic thereof.
Further, by using a variation modeling in the transform-
domain, the effect of signal distortions may be restricted.
While some kinds of distortion (for example, a frequency-
dependent signal delay) result in a severe modification of
a signal wave form, such distortion may have a limited
impact on the transform-domain representation of a signal.
As it is naturally desirable to also precisely estimate
signal characteristics in the presence of distortions, the
usage of the transform-domain has shown to be a very good
choice.
To summarize the above, the usage of a transform-domain
variation model, the parameters of which are adapted to
bring the parameterized transform-domain variation model
(or the output thereof) in agreement with an actual
temporal evolution of actual transform-domain parameters
describing an input audio signal, allows the signal
characteristics of a typical audio signal to be determined
with good precision and reliability.
In a preferred embodiment, the apparatus may be configured
to obtain, as the actual transform-domain parameters, a
first set of transform-domain parameters describing a first
time interval of the audio signal in the transform-domain
for a predetermined set of values of a transformation
variable (also designated herein as "transform variable").
Similarly, the apparatus may be configured to obtain a
second set of transform-domain parameters describing a
second time interval of the audio signal in the transform-
domain for the predetermined set of values of the
transformation variable. In this case, the parameter
determinator may be configured to obtain a frequency (or
pitch) variation model parameter using a parameterized
transform-domain variation model comprising a frequency-
variation (or pitch-variation) parameter and representing a
compression or expansion of the transform-domain
representation of the audio signal with respect to the
transformation variable assuming a smooth frequency
variation of the audio signal. The parameter determinator
may be configured to determine the frequency variation
parameter such that the parameterized transform-domain
variation model is adapted to the first set of transform-
domain parameters and to the second set of transform-domain
parameters. By using this approach, a very efficient usage
can be made of the information available in the transform-
domain. It has been found that a transform-domain
representation of an audio signal (for example, an
autocorrelation domain representation, an autocovariance
domain representation, a Fourier transform domain
representation, a discrete-cosine-transform domain
representation, and so on) is smoothly expanded or
compressed with varying fundamental frequency or pitch. By
modeling this smooth compression or expansion of the
transform-domain representation, the full information
content of the transform-domain representation may be
exploited, as multiple samples of the transform-domain
representation (for different values of the transformation
variable) may be matched.
In a preferred embodiment, the apparatus may be configured
to obtain, as the actual transform-domain parameters,
transform-domain parameters describing the audio signal in
the transform-domain as a function of a transform variable.
The transform-domain may be chosen such that a frequency
transposition of the audio signal results at least in a
frequency shift of the transform-domain representation of
the audio signal with respect to the transform variable, or
in a stretching of the transform-domain representation with
respect to the transform variable, or in a compression of
the transform-domain representation with respect to the
transform variable. The parameter determinator may be
configured to obtain a frequency-variation model parameter
(or pitch-variation model parameter) on the basis of a
temporal variation of corresponding (e.g. associated with
the same value of the transform variable) actual
transform-domain parameters, taking into consideration a
dependency of the transform-domain representation of the
audio signal on the transform variable. Using this approach,
the information about a temporal variation of corresponding
actual transform-domain parameters (e.g. transform-domain
parameters for identical autocorrelation lag,
autocovariance lag, or Fourier-transform frequency bin) can
be evaluated separately from the information regarding a
dependence of the transform-domain representation on the
transformation variable. Subsequently, the separately
calculated information can be combined. Thus, a
particularly efficient way is available to estimate the
expansion or compression of the transform-domain
representation, for example, by comparing multiple pairs of
transform domain parameters and taking into consideration
an estimated local gradient of the transform-parameter-
dependent variation of the transform-domain representation.
In other words, the local slope of the transform-domain
representation, in dependence on the transform parameter,
and the temporal change of the transform-domain
representation (for example, across subsequent windows) can
be combined to estimate a magnitude of the temporal
compression or expansion of the transform-domain
representation, which in turn is a measure of a temporal
frequency variation or pitch variation.
Further preferred embodiments are also defined in the
dependent claims.
Another embodiment according to the invention creates a
method for obtaining a parameter describing a temporal
variation of a signal characteristic of an audio signal on
the basis of actual transform-domain parameters describing
the audio signal in a transform-domain.
Yet another embodiment creates a computer program for
obtaining a parameter describing a temporal variation of a
signal characteristic of an audio signal.
Brief description of the figures
Fig. 1a shows a block schematic diagram of an apparatus
for obtaining a parameter describing a temporal
variation of a signal characteristic of an audio
signal;
Fig. 1b shows a flow chart of a method for obtaining a
parameter describing a temporal variation of a
signal characteristic of an audio signal;
Fig. 2 shows a flow chart of a method for obtaining a
parameter describing a temporal evolution of a
signal envelope, according to an embodiment of
the invention;
Fig. 3a shows a flow chart of a method for obtaining a
parameter describing a temporal variation of a
pitch, according to an embodiment of the
invention;
Fig. 3b shows a simplified flow chart of the method for
obtaining a parameter describing the temporal
evolution of the pitch;
Fig. 4 shows a flow chart of a further improved method
for obtaining a parameter describing a temporal
variation of a pitch, according to an embodiment
of the invention;
Fig. 5 shows a flow chart of a method for obtaining a
parameter describing a temporal variation of a
signal characteristic of an audio signal in an
autocovariance domain;
Fig. 6 shows a block schematic diagram of an audio
signal encoder, according to the embodiment of
the invention; and
Fig. 7 shows a flow chart of a general method for
obtaining a parameter describing a variation of a
signal.
Detailed description of the embodiment
In the following, the concept of variation modeling will be
described in general in order to facilitate the
understanding of the present invention. Subsequently, a
generic embodiment according to the invention will be
described taking reference to Figs. 1a and 1b.
Subsequently, more specific embodiments will be described
taking reference to Figs. 2 to 5. Finally, the application
of the inventive concept for an audio signal encoding will
be described taking reference to Fig. 6, and a summary will
be given taking reference to Fig. 7.
In order to avoid confusion, the terminology will be used
as follows:
• with the term "variation" we refer to a general set of
functions that describes the change in characteristics
in time, and
• the (partial) derivative ∂/∂x is used as a
mathematically accurately defined entity.
In other words, "variation" refers to signal
characteristics (on an abstract level), whereas
"derivative" is used whenever the mathematical definition
∂/∂x is used, for example, as the k (autocorrelation lag /
autocovariance lag) or t (time) derivatives of
autocorrelation/covariance.
Any other measures of change will be explained in words,
typically without using the term "variation".
Further, embodiments according to the invention will
subsequently be described for an estimation of temporal
variation of audio signals. However, the present invention
is not restricted to only audio signals and only temporal
variations. Rather embodiments according to the invention
can be applied to estimate general variations of signals,
even though the invention is at present mainly used for
estimating temporal variations of audio signals.
Variation modeling
General overview on variation modeling
Generally speaking, embodiments according to the invention
use variation models for the analysis of an input audio
signal. Thus, the variation model is used to provide a
method for estimating the variation.
Assumptions for variation modeling
In the following, some differences between a conventional
signal characteristic estimation and the concept applied in
the embodiments according to the present invention will be
discussed.
Whereas traditional methods assume that characteristics of
the signal (for example, an audio signal) are constant (or
stationary) in short windows of time, it is one of the primary
approaches of the current invention to assume that the
(normalized) rate of change (e.g. of a signal
characteristic, like a pitch or an envelope) is constant
in a short window of time. Therefore, while traditional
methods can handle stationary signals as well as, within a
modest level of distortion, slowly changing signals, some
embodiments according to the present invention can handle
stationary signals, linearly changing signals (or
exponentially changing signals), as well as, with a modest
level of distortion, such non-linearly changing signals
where the rate of non-linear change is slow.
As noted above, it is one of the primary approaches of the
present invention to assume that the (normalized) rate of
change is constant in a short window, but the presented
method and concept can be readily extended to a more
general case. For example, the normalized rate of change,
the variation, can be modeled by any function, and as long
as the variation model (or said function) has fewer
parameters than the number of data points, the model
parameters can be unambiguously solved.
In the preferred embodiments, the variation model may, for
example, describe a smooth change of a signal
characteristic. For example, the model may be based on the
assumption that a signal characteristic (or a normalized
rate of change thereof) follows a scaled version of an
elementary function, or a scaled combination of elementary
functions (wherein elementary functions comprise: x^a; 1/x^a;
sqrt(x); 1/x; 1/x^2; e^x; a^x; ln(x); log_a(x); sinh x; cosh x;
tanh x; coth x; arsinh x; arcosh x; artanh x; arcoth x; sin
x; cos x; tan x; cot x; sec x; csc x; arcsin x; arccos x;
arctan x; arccot x). In some embodiments, it is preferred
that the function describing the temporal evolution of the
signal characteristic, or of the normalized rate of change,
is steady and smooth over the range of interest.
Applicability in different domains
One of the primary fields of application of the concept
according to the present invention is analysis of signal
characteristics where the magnitude of change, the
variation, is more informative than the magnitude of this
characteristic. For example, in terms of pitch this means
that embodiments according to the invention are related to
applications where one is more interested in the change in
pitch, rather than the pitch magnitude.
If, however, in an application, one is more interested in
the magnitude of a signal characteristic rather than its
rate of change, one can still benefit from the concept
according to the present invention. For example, if a
priori information about signal characteristics is
available, such as the valid range for rate of change, then
the signal variation can be used as additional information
in order to obtain accurate and robust time contours of the
signal characteristic. For example, in terms of pitch, it
is possible to estimate the pitch by conventional methods,
frame by frame, and to use the pitch variation to weed out
estimation errors, outliers, octave jumps and assist in
making the pitch contour a continuous track rather than
isolated points at the center of each analysis window. In
other words, it is possible to combine the model parameter,
parameterizing the transform-domain variation model, and
describing the variation of a signal characteristic, with
one or more discrete values describing a snapshot value of
a signal characteristic.
Moreover, in an embodiment according to the invention it is
a primary approach to model the normalized magnitude of
change, since the magnitude of the signal characteristics
is then explicitly cancelled from the calculations.
Generally, this approach makes the mathematical
formulations more tractable. However, embodiments according
to the invention are not constrained to using normalized
measures of variation, because there is no inherent reason
why one should constrain the concept to normalized measures
of variation.
Mathematical variation model
In the following, a mathematical variation model will be
described which may be applied in some embodiments
according to the invention. However, other variation models
are naturally also usable.
Consider a signal with a property, such as pitch, that
varies over time, and denote it by p(t). The change in pitch
is its derivative ∂p(t)/∂t and, in order to cancel the effect
of the pitch magnitude, we normalize the change with p^{-1}(t)
and define
\[ c(t) = p^{-1}(t)\,\frac{\partial}{\partial t}\,p(t). \tag{1} \]
We call this measure c(t) the normalized pitch variation,
or simply pitch variation, since a non-normalized measure
of pitch variation is meaningless in the present example.
The period length T(t) of a signal is inversely
proportional to the pitch, T(t) = p^{-1}(t), whereby we can
readily obtain
\[ c(t) = -T^{-1}(t)\,\frac{\partial}{\partial t}\,T(t). \]
By assuming that the pitch variation is constant in a small
interval of t, c(t) = c, the partial differential equation
of Equation 1 can be readily solved, whereby we obtain
\[ p(t) = p_0\,e^{ct}, \qquad T(t) = T_0\,e^{-ct}, \tag{2} \]
where p0 and T0 signify, respectively, the pitch and period
length at time t = 0.
While T(t) is the period length at time t, we realize that
any temporal feature follows the same formula. In
particular, for the autocorrelation R(k,t) at lag k and time t,
the temporal features in the k-domain follow this formula.
In other words, a feature of the autocorrelation that
appears at lag k0 at t = 0 will be shifted as a function of
t as
\[ k(t) = k_0\,e^{-ct}. \]
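The constant-variation model can be checked numerically. The sketch below is illustrative only: the function names and the example values (a 200 Hz starting pitch, a variation of 0.5 per second, a starting lag of 80 samples) are assumptions made for this example. It evaluates the exponential pitch and period contours, recovers the normalized variation from each by a central difference, and computes where an autocorrelation feature would move after 20 ms.

```python
import math

def pitch(t, p0, c):
    """Pitch under constant normalized variation c: p(t) = p0 * exp(c * t)."""
    return p0 * math.exp(c * t)

def period(t, p0, c):
    """Period length T(t) = 1 / p(t), i.e. T0 * exp(-c * t) with T0 = 1 / p0."""
    return 1.0 / pitch(t, p0, c)

def shifted_lag(k0, c, t):
    """Lag at time t of a feature that sits at lag k0 at t = 0."""
    return k0 * math.exp(-c * t)

def normalized_variation(f, t, dt=1e-6):
    """Central-difference estimate of f'(t) / f(t)."""
    return (f(t + dt) - f(t - dt)) / (2.0 * dt * f(t))

p0, c = 200.0, 0.5  # hypothetical example values
c_from_pitch = normalized_variation(lambda t: pitch(t, p0, c), 0.3)
# The period shrinks when the pitch rises, hence the sign change.
c_from_period = -normalized_variation(lambda t: period(t, p0, c), 0.3)
lag_after_20ms = shifted_lag(80.0, c, 0.02)
```

Both estimates return the same c regardless of the starting pitch p0, which is the point of normalizing the derivative.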
In Equation 2, we considered only variations that can be
assumed constant in a short interval. However, if desired,
we can use higher order models by allowing the variation to
follow some functional form in a short temporal interval.
Polynomials are in this case of special interest since the
resulting differential equation can be readily solved. For
example, if we define the variation to follow the
polynomial form
Note that now, the constant p0 appearing in Equation 2 has
been assimilated into the exponential without loss of
generality, in order to make the presentation clearer.
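As a hedged sketch of how such a higher-order model can be solved: the index m, the order M and the coefficients c_m below are notation assumed for this illustration and may differ from the original equations. Since c(t) is the time derivative of ln p(t), integrating a polynomial variation yields an exponential of a polynomial, and the constant of integration absorbs p0.

```latex
c(t) = \sum_{m=1}^{M} m\,c_m\,t^{m-1}
\;\Longrightarrow\;
\ln p(t) = c_0 + \sum_{m=1}^{M} c_m\,t^{m}
\;\Longrightarrow\;
p(t) = \exp\!\Big(\sum_{m=0}^{M} c_m\,t^{m}\Big),
\qquad c_0 = \ln p_0 .
```

For M = 1 this reduces to the constant-variation case p(t) = p0 e^{ct} with c = c_1.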
This form demonstrates how the variation model can readily
be extended to more complicated cases. However, unless
otherwise stated, in this document we will consider only
the first order case (constant variation), in order to
retain understandability and accessibility. Those familiar
with the art can readily extend the methods to higher order
cases.
The same approach used here for pitch variation modeling can
also be used, without modification, for other measures for
which the normalized derivative is a well-warranted domain.
For example, the temporal envelope of a signal, which
corresponds to the instantaneous energy of the signal's
Hilbert transform, is such a measure. Often, the magnitude
of the temporal envelope is of less importance than the
relative value, that is the temporal variation of the
envelope. In audio coding, modeling of the temporal
envelope is useful in diminishing temporal noise spreading
and is usually achieved by a method known as Temporal Noise
Shaping (TNS), where the temporal envelope is modeled by a
linear predictive model in the frequency domain (see, for
example, reference [4]). The current invention provides an
alternative to TNS for modeling and estimating the temporal
envelope.
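If we denote the temporal envelope by a(t), then the
(normalized) envelope variation h(t) is
\[ h(t) = a^{-1}(t)\,\frac{\partial}{\partial t}\,a(t) \]
and, correspondingly, the solution of the partial
differential equation is
\[ a(t) = a_0 \exp\!\left( \int_0^t h(\tau)\,d\tau \right), \]
where a0 is the envelope at time t = 0. For a constant
variation h(t) = h this reduces to a(t) = a0 e^{ht}, and for a
polynomial h(t) the exponent is again a polynomial.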
Note that the above form implies that in the logarithmic
domain, the amplitude is a simple polynomial. This is
convenient since amplitudes are often expressed on the
decibel scale (dB).
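The log-domain remark can be illustrated as follows. This is a minimal time-domain sketch under assumptions, not the transform-domain estimator of the invention: the function name, the sampling instants and the synthetic envelope are hypothetical. For a constant envelope variation h, ln a(t) is a straight line in t, so h can be read off as the least-squares slope and converted to decibels per second.

```python
import math

def envelope_variation(times, envelope):
    """Least-squares slope of ln a(t) versus t, i.e. h in a(t) = a0 * exp(h * t)."""
    logs = [math.log(a) for a in envelope]
    n = len(times)
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    num = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

# Hypothetical decaying envelope sampled every millisecond.
times = [i * 0.001 for i in range(50)]
envelope = [0.8 * math.exp(-30.0 * t) for t in times]

h = envelope_variation(times, envelope)
# An amplitude ratio of e corresponds to 20 / ln(10) dB.
db_per_second = 20.0 * h / math.log(10.0)
```

The starting amplitude (0.8 here) drops out of the slope, mirroring the way normalization cancels the magnitude of the characteristic.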
Generic embodiment of an apparatus for obtaining a
parameter describing a temporal variation of a signal
characteristic
Fig. 1a shows a block schematic diagram of an apparatus for
obtaining a parameter describing a temporal variation of a
signal characteristic of an audio signal on the basis of
actual transform-domain parameters (e.g. autocorrelation
values, autocovariance values, Fourier coefficients, and so
on) describing the audio signal in a transform domain. The
apparatus shown in Fig. 1a is designated in its entirety
with 100. The apparatus 100 is configured to obtain (e.g.
receive or compute) actual transform-domain parameters 120
describing the audio signal in a transform domain. Also,
the apparatus 100 is configured to provide one or more
model parameters 140 of a transform-domain variation model
describing a temporal evolution of transform-domain
parameters in dependence on one or more model parameters.
The apparatus 100 comprises an optional transformer 110
configured to provide the actual transform-domain
parameters 120 on the basis of a time-domain representation
118 of the audio signal, such that the actual transform-
domain parameters 120 describe the audio signal in a
transform domain. However, the apparatus 100 may
alternatively be configured to receive the actual
transform-domain parameters 120 from an external source of
transform-domain parameters.
The apparatus 100 further comprises a parameter
determinator 130, wherein the parameter determinator 130 is
configured to determine one or more model parameters of the
transform-domain variation model, such that a model error,
representing a deviation between a modeled temporal
evolution of the transform-domain parameters and an actual
temporal evolution of the actual transform-domain
parameters, is brought below a predetermined threshold
value or minimized. Thus, the transform-domain variation
model, describing a temporal evolution of transform-domain
parameters in dependence on one or more model parameters
representing a signal characteristic, is adapted (or fit)
to the audio signal, represented by the actual transform-
domain parameters. Thus, it is effectively achieved that a
modeled variation of the audio-signal transform-domain
parameters described, implicitly or explicitly, by the
transform-domain variation model, approximates (within a
predetermined tolerance range) the actual variation of the
transform-domain parameters.
Many different implementation concepts are available for
the parameter determinator. For example, the parameter
determinator may comprise, stored therein (or
on an external data carrier), variation model parameter
calculation equations 130a describing a mapping of
transform-domain parameters onto variation model parameters. In this
case, the parameter determinator 130 may also comprise a
variation model parameter calculator 130b (for example a
programmable computer, a signal processor or an FPGA),
which may be configured, for example in hardware or software,
to evaluate the variation model parameter calculation
equations 130a. For example, the variation model parameter
calculator 130b may be configured to receive a plurality of
actual transform-domain parameters describing the audio
signal in a transform domain and to compute, using the
variation model parameter calculation equations 130a, the
one or more model parameters 140. The variation model
parameter calculation equations 130a may, for example,
describe in explicit form a mapping of the actual
transform-domain parameters 120 onto the one or more model
parameters 140.
Alternatively, the parameter determinator 130 may, for
example, perform an iterative optimization. For this
purpose, the parameter determinator 130 may comprise a
representation 130c of the time-domain variation model,
which allows, for example, for a computation of a
subsequent set of estimated transform-domain parameters on
the basis of a previous set of actual transform-domain
parameters (representing the audio signal), taking into
consideration a model parameter describing the assumed
temporal evolution. In this case, the parameter
determinator 130 may also comprise a model parameter
optimizer 130d, wherein the model parameter optimizer 130d
may be configured to modify the one or more model
parameters of the time-domain variation model 130c, until
the set of estimated transform-domain parameters obtained
by the parameterized time-domain variation model 130c,
using a previous set of actual transform-domain parameters,
is in sufficiently good agreement (for example within a
predetermined difference threshold) with the current actual
transform-domain parameters.
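The iterative alternative can be sketched as follows. Everything here is an illustrative assumption rather than the implementation of the invention: the function names, the coarse-to-fine scan used as the optimizer, the search range and the threshold are hypothetical, and the prediction model is borrowed from the first-order autocorrelation-domain model discussed later in this text (with the sign convention used there). The synthetic "current" frame is generated from that same model, so the example only demonstrates the control flow of fitting a model parameter until the model error falls below a threshold.

```python
import math

def predict_next(prev, c, dt):
    """Stand-in for model 130c: first-order prediction of the next frame."""
    pred = list(prev)
    for k in range(1, len(prev) - 1):
        slope = 0.5 * (prev[k + 1] - prev[k - 1])  # local slope over the lag axis
        pred[k] = prev[k] - c * dt * k * slope
    return pred

def model_error(prev, cur, c, dt):
    """Mean squared deviation between predicted and actual parameters."""
    pred = predict_next(prev, c, dt)
    return sum((p - q) ** 2 for p, q in zip(pred, cur)) / len(cur)

def optimize(prev, cur, dt, c_lo, c_hi, threshold, max_rounds=20, points=11):
    """Stand-in for optimizer 130d: scan a grid over c, then narrow it around the best value."""
    best_c, best_err = c_lo, model_error(prev, cur, c_lo, dt)
    for _ in range(max_rounds):
        step = (c_hi - c_lo) / (points - 1)
        for i in range(points):
            c = c_lo + i * step
            err = model_error(prev, cur, c, dt)
            if err < best_err:
                best_c, best_err = c, err
        if best_err < threshold:
            break  # model error is below the predetermined threshold
        c_lo, c_hi = best_c - step, best_c + step
    return best_c, best_err

# Hypothetical previous frame: a decaying cosine over 128 lags.
prev = [math.cos(2.0 * math.pi * k / 40.0) * math.exp(-k / 150.0) for k in range(128)]
cur = predict_next(prev, 0.8, 0.02)  # synthetic current frame with c = 0.8
c_est, err = optimize(prev, cur, dt=0.02, c_lo=-5.0, c_hi=5.0, threshold=1e-10)
```

A grid scan is used here only because it is easy to follow; any one-dimensional minimizer could take its place.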
However, there are naturally numerous other methods for
determining the one or more model parameters 140 on the
basis of the actual transform-domain parameters, because
there are different mathematical formulations of the
solution for the general problem to determine model
parameters such that the result of the modeling
approximates the actual transform-domain parameters (and/or
their temporal evolution).
In view of the above discussion, the functionality of the
apparatus 100 can be explained taking reference to Fig. 1b,
which shows a flow chart of a method 150 for obtaining the
parameter 140 describing a temporal variation of a signal
characteristic of an audio signal. The method 150 comprises
an optional step 160 of computing the actual transform-
domain parameters 120 describing the audio signal in a
transform domain. The method 150 also comprises a step 170
of determining the one or more model parameters 140 of a
transform-domain variation model describing a temporal
evolution of transform-domain parameters in dependence on
one or more model parameters representing a signal
characteristic, such that a model error, representing a
deviation between a modeled temporal evolution and the
actual transform-domain parameters, is brought below a
predetermined threshold value or minimized.
In the following, some embodiments according to the
invention will be described in more detail in order to
explain in more detail the inventive concept.
Variation estimation in the autocorrelation domain
In the current context, the autocorrelation of signal xn is
defined as
and estimated by

where we assume that xn is non-zero only on the range
[1,N]. Note that the estimate converges to the true value
when N goes to infinity. Moreover, generally, some sort of
windowing may be applied to xn prior to estimation of the
autocorrelation in order to enforce the assumption that it
is zero outside the range [1,N].
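By way of illustration only, the estimation of the autocorrelation for one frame may be sketched as follows. The 1/N normalization, the function name `autocorrelation` and the optional `window` argument are assumptions of this sketch and are not taken from the embodiment.

```python
import numpy as np

def autocorrelation(x, max_lag, window=None):
    """Estimate R(k) for k = 0..max_lag from one frame x.

    The frame is treated as zero outside its own samples, so each sum
    simply stops at the frame boundary. The 1/N normalization is an
    assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    if window is not None:
        # Optional windowing, applied before the estimation.
        x = x * np.asarray(window, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])
```

For example, `autocorrelation(frame, 60, window=np.hanning(len(frame)))` would return the values for lags 0 to 60 of a Hann-windowed frame.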
Variation estimation in the autocorrelation domain - Pitch
variation
In an embodiment, our objective is to estimate signal
variation, that is, in the case of pitch variation, to
estimate how much the autocorrelation stretches or shrinks
as a function of time. In other words, our objective is to
determine the time derivative of the autocorrelation lag k,
which is denoted as $\frac{dk}{dt}$. In the interest of clarity, we
now use the shorthand form k instead of k(t) and assume
that the dependence on t is implicit.
From Equation 4 we obtain
A conventional problem, which is overcome in some
embodiments according to the invention, is that the time
derivative of k is not available and direct estimation is
difficult. However, it has been recognized that the chain
rule of derivatives can be used to obtain
It has been found that, using an estimate of c, we can
then, using a first order Taylor series, model the
autocorrelation at time $t_2$ using the autocorrelation at
time $t_1$ and the time derivative
In a practical application the derivative $\frac{\partial}{\partial k}R(k)$ can be
estimated, for example, by the second order estimate
This estimate is preferred over the first order difference
R(k + 1) - R(k) since the second order estimate does not
suffer from the half-sample phase shift like the first
order estimate. For improved accuracy or computational
efficiency, alternative estimates can be used, such as
windowed segments of the derivative of the sinc function.
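The difference between the two estimates can be illustrated numerically. The following sketch uses a cosine as a stand-in for the autocorrelation, because its derivative is known exactly; the test function and the value of `omega` are hypothetical and chosen for the example only.

```python
import numpy as np

def forward_difference(r):
    """First order estimate R(k+1) - R(k); it best matches the slope at k + 1/2."""
    r = np.asarray(r, dtype=float)
    return r[1:] - r[:-1]

def central_difference(r):
    """Second order estimate (R(k+1) - R(k-1)) / 2 for the interior lags."""
    r = np.asarray(r, dtype=float)
    return (r[2:] - r[:-2]) / 2.0

# Hypothetical test function with a known derivative.
omega = 0.2
k = np.arange(0, 50)
r = np.cos(omega * k)
true_slope = -omega * np.sin(omega * k)

# Compare both estimates with the true slope at the interior lags 1..48.
err_forward = np.max(np.abs(forward_difference(r)[1:] - true_slope[1:-1]))
err_central = np.max(np.abs(central_difference(r) - true_slope[1:-1]))
```

In this example the central estimate is aligned with lag k, whereas the forward estimate carries the half-sample shift and therefore shows the larger error.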
Using the minimum mean square error criterion we obtain the
optimization problem
The same derivations hold also when the pitch variation is
estimated from consecutive autocovariance windows instead
of the autocorrelation. However, in comparison to the
autocorrelation, the autocovariance contains additional
information the usage of which is described in the section
titled "Modeling in the Autocovariance domain".
Variation estimation in the autocorrelation domain -
Temporal envelope
As will be described in the following, a temporal evolution
of the envelope can also be estimated in the
autocorrelation domain.
In the following, a brief overview of the determination of
the temporal envelope variation will be given taking
reference to Fig. 2. Subsequently, a possible algorithm,
according to an embodiment of the invention, will be
described in detail.
Fig. 2 shows a flow chart of a method for obtaining a
parameter describing a temporal variation of an envelope of
the audio signal. The method shown in Fig. 2 is designated
in its entirety with 200. The method 200 comprises
determining 210 short-time energy values for a plurality of
consecutive time intervals. Determining the short-time
energy values may, for example, comprise determining
autocorrelation values at a common predetermined lag (e.g.
lag 0) for a plurality of consecutive (temporally
overlapping or temporally non-overlapping) autocorrelation
windows, to obtain the short-time energy values. A step 220
further comprises determining appropriate model parameters.
For example, step 220 may comprise determining polynomial
coefficients of a polynomial function of time, such that
the polynomial function approximates a temporal evolution
of the short-time energy values. In the following, an
example algorithm for determining the polynomial
coefficients will be described. For example, the step 220
may comprise a step 220a of setting-up a matrix (e.g.
designated with V) comprising sequences of powers of time
values associated with consecutive time intervals (time
intervals beginning or being centered, for example, at
times $t_0$, $t_1$, $t_2$, and so on). The step 220 may also comprise
a step 220b of setting-up a target vector (e.g. designated
with r) the entries of which describe the short-time energy
values for the consecutive time intervals.
In addition, the step 220 may comprise a step 220c of
solving a linear system of equations (for example, of the
form r = Vh) defined by the matrix (e.g. designated with V)
and by the target vector (e.g. designated with r), to
obtain as a solution the polynomial coefficients (e.g.
described by vector h).
In the following, additional details regarding this
procedure will be explained.
In the autocorrelation domain, modeling of the temporal
envelope is straightforward. We can readily prove that the
autocorrelation at lag zero corresponds to the average of
the squared amplitude. Furthermore, the autocorrelation at
all other lags is scaled by the average of the squared
amplitude. In other words, the same information is
available at any and all lags, whereby it is sufficient to
consider the autocorrelation at lag zero only.
Since the first order model of envelope variation is
trivial, a higher order model is used in a preferred
embodiment. This also serves as an example of how to
proceed with higher order models, also in the case of pitch
variation estimation.
Consider an Mth order polynomial model for the envelope
variation according to Equation 5. We then have M + 1
unknowns and it is thus preferred to use at least M + 1
equations for a solution. In other words, it is preferred
to use at least M + 1 consecutive autocorrelation windows
(designated, for example, by autocorrelation window center
time or autocorrelation window start time $t_h$, $R(k,t_h)$, $h \in [0,N]$
and $N \geq M$). Then, the value of a(t) (describing, for
example, a short-term average power or short-term average
amplitude, for example in a linear or non-linear scaling)
at N + 1 different times $t = t_h$ (or for N + 1 different
overlapping or non-overlapping time intervals) is obtained,
that is $a(t_h) = R(0,t_h)^{1/2}$ and
Since a(t) is a polynomial (more precisely: is
approximated by a polynomial), this is the classical
problem of solving the coefficients of a polynomial, for
which numerous methods exist in literature.
One basic alternative for solution is to use a Vandermonde
matrix as follows.
The Vandermonde matrix V is, for example, defined as
and may be computed, for example, in step 220a. A target
vector r and a solution vector h may be defined as
The target vector may, for example, be computed in step
220b.
Then
$$r = Vh.$$
Since the $t_h$'s are distinct and if M = N, then the inverse
$V^{-1}$ exists and we obtain
$$h = V^{-1}r,$$
for example in step 220c.
If N > M, then the pseudo-inverse yields the answer.
However, if N and M are large, then more refined methods
known in the art may be employed for efficient solution.
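Steps 220a to 220c may be sketched as follows. This is an illustration under stated assumptions: the column order of V (increasing powers of time), the function name `envelope_polynomial` and the use of a least-squares solver in place of an explicit inverse are choices made for the example and are not prescribed by the embodiment.

```python
import numpy as np

def envelope_polynomial(times, r0, order):
    """Fit a(t) ~ sum_m h[m] * t**m to a(t_h) = sqrt(R(0, t_h)).

    times : window times t_h (N + 1 distinct values)
    r0    : lag-zero autocorrelation R(0, t_h) of each window
    order : polynomial order M, with M <= N
    """
    t = np.asarray(times, dtype=float)
    r = np.sqrt(np.asarray(r0, dtype=float))      # target vector r (step 220b)
    V = np.vander(t, order + 1, increasing=True)  # rows [1, t_h, ..., t_h**M] (step 220a)
    # Least squares covers the square case (M = N) and the overdetermined
    # case (N > M), where it coincides with the pseudo-inverse solution.
    h, *_ = np.linalg.lstsq(V, r, rcond=None)     # step 220c
    return h
```

For instance, `envelope_polynomial([0.0, 0.1, 0.2, 0.3], r0_values, 2)` would return three coefficients describing a second order envelope model over four windows.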
Variation estimation in the autocorrelation domain - Bias
analysis
While the above presented estimate measures variation,
there is one step where the locally-stationary assumption
is not overcome in some embodiments. Namely, estimation of
the autocorrelation by conventional means (e.g. using an
autocorrelation window of finite length) makes the
assumption that the signal should be locally stationary. In
the following, it will be shown that signal variation does
not introduce bias to the estimate, such that the method
can be considered as sufficiently accurate.
In order to analyze bias of the autocorrelation, assume
that the pitch variation is constant in this time interval.
Furthermore, assume that at $t_0$ we have a signal x(t) with
period length $T(t_0) = T_0$, then at a second point $t_1$ it has
period length $T(t_1) = T_0 \exp(-c(t_1 - t_0))$. The average period length
on the interval $[t_0, t_1]$ is
Observe that the latter part of the expression above is a
"hyperbolic sine" function, which we will denote by
Then for a window of length $\Delta t_{win} = t_1 - t_0$ we have
By analogy between T and k, this expression also quantifies
how much an autocorrelation estimate is stretched due to
signal variation. However, if windowing is applied prior to
autocorrelation estimation, the bias due to signal
variation is reduced, since the estimate then concentrates
around the mid-point of the analysis window.
When estimating c from two consecutive biased
autocorrelation frames the values of k for each frame are
biased and follow the formulae
where $\bar{t}_1$ and $\bar{t}_2$ are the mid-points of each of the frames.
Parameter c can be solved by defining $\bar{t}_1 = 0$ and the
distance between windows $\Delta t_{step} = \bar{t}_2 - \bar{t}_1$, whereby

where we observe that all instances of $\Delta t_{win}$ have cancelled
each other out. In other words, even though signal
variation biases the autocorrelation estimate, the
variation estimate extracted from two autocorrelations is
unbiased.
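The cancellation can be checked numerically. The sketch below assumes the exponential period model $T(t) = T_0 \exp(-c t)$ and compares a numerical average over the window with a closed form consisting of the mid-point value multiplied by sinh(x)/x, where $x = c \Delta t_{win}/2$. The closed form is derived here from that model and is not copied from the text; all numerical values are hypothetical.

```python
import numpy as np

def mean_period(T0, c, t_start, t_win, samples=100000):
    """Numerical average of T(t) = T0 * exp(-c * t) over [t_start, t_start + t_win]."""
    t = t_start + (np.arange(samples) + 0.5) * t_win / samples  # midpoint rule
    return np.mean(T0 * np.exp(-c * t))

def mean_period_closed_form(T0, c, t_start, t_win):
    """Mid-point value times sinh(x)/x with x = c * t_win / 2."""
    x = c * t_win / 2.0
    t_mid = t_start + t_win / 2.0
    return T0 * np.exp(-c * t_mid) * np.sinh(x) / x

# Hypothetical values: period in samples, c in 1/s, times in seconds.
T0, c, t_win, t_step = 100.0, 0.8, 0.03, 0.01
T1 = mean_period(T0, c, 0.0, t_win)      # biased average period, first frame
T2 = mean_period(T0, c, t_step, t_win)   # biased average period, second frame

# Both frames carry the same sinh(x)/x factor, so it drops out of the ratio.
c_recovered = -np.log(T2 / T1) / t_step
```

Each average is biased by the same window-dependent factor, so the ratio of the two averages depends only on c and the step between the frames.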
However, while signal variation does not bias the variation
estimate, estimation errors due to overly short analysis
windows cannot be avoided. Estimation of the
autocorrelation from a short analysis window is prone to
errors, since it depends on the location of the analysis
window with respect to the signal phase. Longer analysis
windows reduce this type of estimation errors but in order
to retain the assumption of locally constant variation, a
compromise has to be sought. A generally accepted choice in
the art is to have an analysis window length at least twice
the lowest expected period length. Nevertheless, shorter
analysis windows may be used if an increased error is
acceptable.
In terms of temporal envelope variation, the results are
similar. For a first order model, the estimate for envelope
variation is unbiased. Moreover, exactly the same logic can
be applied to autocovariance estimates, whereby the same
result holds for the autocovariance.
Variation estimation in the autocorrelation domain -
Application
In the following, a possible application of the present
invention for the estimation of a pitch variation will be
described. Firstly, the general concept will be outlined
taking reference to Fig. 3, which shows a flow chart of a
method 300 for obtaining a parameter describing a temporal
variation of a pitch of an audio signal, according to an
embodiment of the invention. Subsequently, implementation
details of the said method 300 will be given.
The method 300 shown in Fig. 3 comprises, as an optional
first step, performing 310 an audio signal pre-processing
of an input audio signal. The audio pre-processing may
comprise, for example, a pre-processing which facilitates
an extraction of the desired audio signal characteristics,
for example, by reducing any detrimental signal components.
For example, the formant structure modeling described below
may be applied as an audio signal pre-processing step 310.
The method 300 also comprises a step 320 of determining a
first set of autocorrelation values $R(k,t_1)$ of an audio
signal xn for a first time or time interval $t_1$ and for a
plurality of different autocorrelation lag values k. For a
definition of the autocorrelation values, reference is made
to the description below.
The method 300 also comprises a step 322 of determining a
second set of autocorrelation values $R(k,t_2)$ of the audio
signal xn for a second time or time interval $t_2$ and for a
plurality of different autocorrelation lag values k.
Accordingly, steps 320 and 322 of the method 300 may
provide pairs of autocorrelation values, each pair of
autocorrelation values comprising two autocorrelation
(result) values associated with different time intervals of
the audio signal but same autocorrelation lag value k. The
method 300 also comprises a step 330 of determining a
partial derivative of the autocorrelation over
autocorrelation lag, for example, for the first time
interval starting at $t_1$ or for the second time interval
starting at $t_2$. Alternatively, the partial derivative over
autocorrelation lag may also be computed for a different
instance in time or time interval lying or extending
between time $t_1$ and time $t_2$.
Accordingly, the variation of the autocorrelation R(k,t)
over autocorrelation lag can be determined for a plurality
of the different autocorrelation lag values k, for example,
for those autocorrelation lag values for which the first
set of autocorrelation values and second set of
autocorrelation values are determined in steps 320, 322.
Naturally, there is no fixed temporal order with respect to
the execution of steps 320, 322, 330, such that the steps
can be executed partially or completely in parallel, or in
a different order.
The method 300 also comprises a step 340 of determining one
or more model parameters of a variation model using the
first set of autocorrelation values, the second set of
autocorrelation values and the partial derivative
$\frac{\partial}{\partial k}R(k,t)$ of the autocorrelation over autocorrelation lag.
When determining the one or more model parameters, a
temporal variation between autocorrelation values of a pair
of autocorrelation values (as described above) may be taken
into consideration. The difference between the two
autocorrelation values of the pair of autocorrelation
values may be weighted, for example, in dependence on the
variation of the autocorrelation over lag ($\frac{\partial}{\partial k}R(k,h)$). In
the weighting of a difference between two autocorrelation
values of a pair of autocorrelation values, the
autocorrelation lag value k (associated with the pair of
autocorrelation values) may also be considered as a
weighting factor. Accordingly, a sum term of the form

may be used for the determination of the one or more model
parameters, wherein said sum term may be associated to a
given autocorrelation lag value k and wherein the sum term
comprises a product of a difference between two
autocorrelation values of a pair of autocorrelation values
of the form

and a lag-dependent weighting factor, for example of the
form

The autocorrelation lag dependent weighting factor allows
for a consideration of the fact that the autocorrelation is
extended more intensively for larger autocorrelation lag
values than for small autocorrelation lag values, because
the autocorrelation lag value factor k is included.
Further, the incorporation of the variation of the
autocorrelation value over lag makes it possible to
estimate the expansion or compression of the
autocorrelation function on the basis of local (equal
autocorrelation lag) pairs of autocorrelation values. Thus,
the expansion or compression of the autocorrelation
function (over lag) can be estimated without conducting a
pattern scaling and match functionality. Rather, the
individual sum terms are based on local (single lag value
k) contributions.
Nevertheless, in order to obtain a large amount of
information from the autocorrelation function, sum terms
associated with different lag values k may be combined,
wherein the individual sum terms are still single-lag-value
sum terms.
In addition, normalization may be performed when
determining the model parameters of the variation model,
wherein the normalization factor may, for example, take the
form

and may, for example, comprise a sum of single-
autocorrelation-lag-value terms.
In other words, the determination of the one or more model
parameters may comprise a comparison (e.g. difference
formation or subtraction) of autocorrelation values for a
given, common autocorrelation lag value but for different
time intervals and, for the computation of the variation of
the autocorrelation value over lag (k-derivative of
autocorrelation), a comparison of autocorrelation values
for a given, common time interval but for different
autocorrelation lag values. However, a comparison (or
subtraction) of autocorrelation values for different time
intervals and for different autocorrelation lag values,
which would bring along considerable effort, is avoided.
The method 300 may further, optionally, comprise a step 350
of computing a parameter contour, such as a temporal pitch
contour, on the basis of the one or more model parameters
determined in the step 340.
In the following, a possible implementation of the concept
described with reference to Fig. 3a will be explained in
detail.
As a concrete application of the present innovation, we
shall in the following demonstrate an embodiment of a
method of estimating pitch variation from a temporal signal
in the autocorrelation domain. The method (360), which is
schematically represented in Figure 3b, comprises (or
consists of) the following steps:
1. Estimate (320,322;370) the autocorrelation R(k,h)
of xn for windows h and h+1 (for example windowed by
windowing function wn) of length $\Delta t_{win}$, separated by
$\Delta t_{step}$.
2. Estimate (330;374) the k-derivative of the autocorrelation
for window (or "frame") h, for example by
3. Estimate (340;378) pitch variation $c_h$ between
windows or frames h and h + 1 using (from Eq. 8)

If an (optionally normalized) pitch contour is desired
instead of only the pitch variation measure $c_h$, a further
step shall be added:
4. Let the mid-point of window or frame h be $t_h$. Then
the pitch contour between windows or frames h and h + 1
is

where p(th) is acquired from the previous pair of
frames or actual estimates of pitch magnitude. If no
measurements of the pitch magnitude are available, we
can set p(0) to an arbitrarily chosen starting value,
e.g. p(0) = 1, and calculate pitch contour iteratively
for all consecutive windows.
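Steps 2 to 4 may be sketched as follows for autocorrelations that have already been estimated. This is an illustration under stated assumptions and not a reproduction of Eq. 8: the estimator is obtained here by least squares from a first order Taylor model, using the sign convention $R(k,t) = R_0(k \exp(ct))$, under which a positive c compresses the autocorrelation along the lag axis; the sign convention of the embodiment may differ. The exponential form of the contour and the names `pitch_variation` and `pitch_contour` are likewise assumptions.

```python
import numpy as np

def pitch_variation(r_h, r_h1, t_step):
    """Estimate c_h from the autocorrelations of frames h and h + 1.

    r_h, r_h1 : R(k, h) and R(k, h + 1) for lags k = 0..K
    t_step    : time between the two frames
    Assumed sign convention: R(k, t) = R0(k * exp(c * t)).
    """
    r_h = np.asarray(r_h, dtype=float)
    r_h1 = np.asarray(r_h1, dtype=float)
    k = np.arange(1, len(r_h) - 1)        # interior lags 1..K-1
    dr_dk = (r_h[2:] - r_h[:-2]) / 2.0    # second order k-derivative (step 2)
    weight = k * dr_dk                    # lag-dependent weighting factor
    diff = r_h1[1:-1] - r_h[1:-1]         # same lag, different frames
    # Least-squares solution of diff ~ c * t_step * weight (step 3).
    return np.sum(weight * diff) / (t_step * np.sum(weight ** 2))

def pitch_contour(c, t_step, p0=1.0):
    """Chain per-frame estimates into a contour at the frame mid-points (step 4)."""
    c = np.asarray(c, dtype=float)
    return p0 * np.exp(np.concatenate(([0.0], np.cumsum(c * t_step))))
```

Only values at equal lags are subtracted and only neighbouring lags of one frame enter the derivative, so no pattern scaling or matching across lags is required. With `p0 = 1.0` the contour is normalized to the first frame.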
A number of pre-processing steps (310) known in the art can
be used to improve the accuracy of the estimate. For
example, speech signals have generally a fundamental
frequency in the range 80 to 400 Hz and if it is desired to
estimate the change in pitch, it is beneficial to band-pass
filter the input signal, for example to the range of 80 to 1000
Hz so as to retain the fundamental and a few first
harmonics, but attenuate high-frequency components that
could degrade the quality especially of the derivative
estimates and thus also the overall estimate.
Above, the method is applied in the autocorrelation domain
but the method can optionally, mutatis mutandis, be
implemented in other domains such as the autocovariance
domain. Similarly, above, the method is presented in
application to pitch variation estimation, but the same
approach can be used to estimate variations in other
characteristics of the signal such as the magnitude of the
temporal envelope. Moreover, the variation parameter(s) can
be estimated from more than two windows for increased
accuracy or when the variation model formulation requires
additional degrees of freedom. The general form of the
presented method is depicted in Figure 7.
If additional information is available regarding the
properties of the input signal, thresholds can optionally
be used to remove infeasible variation estimates. For
example, the pitch (or pitch variation) of a speech signal
rarely exceeds 15 octaves/second, whereby any estimate that
exceeds this value is typically either non-speech or an
estimation error, and can be ignored. Similarly, the
minimum modeling error from Eq. 7 can optionally be used as
an indicator of the quality of the estimate. Particularly,
it is possible to set a threshold for the modeling error
such that an estimate based on a model with large modeling
error is ignored, since the change exhibited in the signal
is not well described by the model and the estimate itself
is unreliable.
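Such a plausibility check may be sketched as follows. The conversion to octaves per second assumes that c is expressed in 1/s under an exponential model $p(t) = p_0 \exp(ct)$; the function name and the `max_error` parameter are hypothetical and would be chosen by the application.

```python
import math

MAX_OCTAVES_PER_SECOND = 15.0  # limit quoted in the text for speech signals

def is_plausible(c, model_error, max_error):
    """Return True if a variation estimate should be kept.

    c           : variation estimate in 1/s, assuming p(t) = p0 * exp(c * t)
    model_error : minimum modeling error of the fit
    max_error   : application-specific error threshold (hypothetical)
    """
    octaves_per_second = abs(c) / math.log(2.0)
    return octaves_per_second <= MAX_OCTAVES_PER_SECOND and model_error <= max_error
```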
Variation estimation in the autocorrelation domain -
Formant structure modeling
In the following, a concept will be described for an audio
signal pre-processing, which can be used to improve the
estimation of the characteristics (for example, of the
pitch variation) of the audio signal.
In speech processing, formant structure is generally
modeled by linear predictive (LP) models (see reference
[6]) and its derivatives, such as warped linear prediction
(WLP) (see reference [5]) or minimum variance
distortionless response (MVDR) (see reference [9]).
Furthermore, while speech is constantly changing, the
formant model is usually interpolated in the Line Spectral
Pair (LSP) domain (see reference [7]) or equivalently, in
the Immittance Spectral Pair (ISP) domain (see reference
[1]), to obtain smooth transitions between analysis
windows.
For LP modeling of formants, however, the normalized
variation is not of primary interest, since normalizing the
LP model does not bring relevant advantages in some cases.
Specifically, in speech processing, the location of
formants is usually more important and interesting
information than the change in their locations. Therefore,
while it is possible to formulate normalized variation
models for formants as well, we will focus on the more
interesting topic of canceling the effect of formants.
In other words, inclusion of a model for changes in
formants can be used to improve accuracy of the estimation
of pitch variation or other characteristics. That is, by
canceling the effect of changes in formant structure from
the signal prior to the estimation of pitch variation, it
is possible to reduce the chance that a change in formant
structure is interpreted as a change in pitch. Both the
formant location and pitch can change with up to roughly 15
octaves per second, which means that changes can be very
rapid, they vary on roughly the same range and their
contributions could be easily confused.
To optionally cancel the effect of formant structure, we
first estimate an LP model for each frame, remove formant
structure by filtering and use the filtered data in the
pitch variation estimation. For pitch variation estimation,
it is important that the autocorrelation has a low-pass
character and it is therefore useful to estimate the LP
model from a high-pass filtered signal, but cancel the
formant structure only from the original signal (i.e.
without high-pass filtering), whereby the filtered data
will have a low-pass character. As is well known, the low-
pass character makes it easier to estimate derivatives from
the signal. The filtering process itself can be performed
in time-domain, autocorrelation domain or frequency domain,
according to computational requirements of the application.
Specifically, the pre-processing method for canceling
formant structure from the autocorrelation can be stated as
1. Filter the signal with a fixed high-pass filter.
2. Estimate LP models for each frame of the high-pass
filtered signal.
3. Remove the contribution of the formant structure by
filtering the original signal with the LP filter.
The fixed high-pass filter in Step 1 can optionally be
replaced by a signal adaptive filter, such as a low-order
LP model estimated for each frame, if a higher level of
accuracy is required. If low-pass filtering is used as a
pre-processing step at another stage in the algorithm, this
high-pass filtering step can be omitted, as long as the
low-pass filtering appears after formant cancellation.
The LP estimation method in Step 2 can be freely chosen
according to requirements of the application. Well-
warranted choices would be, for example, conventional LP
(see reference [6]), warped LP (see reference [5]) and MVDR
(see reference [9]). Model order and method should be
chosen so that the LP model does not model the fundamental
frequency but only the spectral envelope.
In step 3, filtering of the signal with the LP filters can
be performed either on a window-by-window basis or on the
original continuous signal. If filtering the signal without
windowing (i.e. filtering the continuous signal), it is
useful to apply interpolation methods known in the art,
such as LSP or ISP, to decrease sudden changes of signal
characteristics at transitions between analysis windows.
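The three pre-processing steps may be sketched for a single frame as follows. The sketch makes several assumptions that are not taken from the embodiment: the fixed high-pass filter is a first order pre-emphasis filter with coefficient 0.97, the LP model is obtained with the conventional autocorrelation method, and the model order of 10 is a placeholder. Interpolation between frames (for example in the LSP or ISP domain) is not shown.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP: returns [1, a_1, ..., a_p] of the error filter A(z)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = -r[1..p].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def remove_formants(frame, order=10, preemphasis=0.97):
    """Steps 1 to 3 for a single frame.

    preemphasis is the coefficient of a fixed first order high-pass filter
    1 - preemphasis * z^-1; both the filter type and the value 0.97 are
    assumptions of this sketch.
    """
    frame = np.asarray(frame, dtype=float)
    highpassed = np.convolve(frame, [1.0, -preemphasis])[:len(frame)]  # step 1
    a = lp_coefficients(highpassed, order)                             # step 2
    residual = np.convolve(frame, a)[:len(frame)]                      # step 3
    return residual, a
```

Because the LP model is estimated from the high-pass filtered frame but applied to the original frame, the high-pass tilt is not removed from the output, which is how the low-pass character described above arises.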
In the following, the process of formant structure removal
(or reduction) will be briefly summarized taking reference
to Fig. 4. The method 400, a flow chart of which is shown
in Fig. 4, comprises a step 410 of reducing or removing a
formant structure from an input audio signal, to obtain a
formant-structure-reduced audio signal. The method 400 also
comprises a step 420 of determining a pitch variation
parameter on the basis of the formant-structure-reduced
audio signal. Generally speaking, the step 410 of reducing
or removing the formant structure comprises a sub-step 410a
of estimating parameters of a linear-predictive model of
the input audio signal on the basis of a high-pass-filtered
version or signal-adaptively filtered version of the input
audio signal. The step 410 also comprises a sub-step 410b
of filtering a broadband version of the input audio signal
on the basis of the estimated parameters, to obtain the
formant-structure-reduced audio signal such that the
formant-structure-reduced audio signal comprises a low-pass
character.
Naturally, the method 400 can be modified, as described
above, for example, if the input audio signal is already
low-pass filtered.
Generally, it can be said that a reduction or removal of
formant structure from the input audio signal can be used
as an audio signal pre-processing in combination with an
estimation of different parameters (e.g. pitch variation,
envelope variation, and so on) and also in combination with
a processing in different domains (e.g. autocorrelation
domain, autocovariance domain, Fourier transformed domain,
and so on).
Modeling in the autocovariance domain
Modeling in the autocovariance domain: Introduction and
overview
In the following, it will be described how model parameters
representing a temporal variation of an audio signal can be
estimated in an autocovariance domain. As mentioned above,
different model parameters, like a pitch variation model
parameter or an envelope variation model parameter, can be
estimated.
The autocovariance is defined as

wherein xn designates samples of the input audio signal.
Note that, in difference to the autocorrelation, here we do
not assume that xn is non-zero only in the analysis
interval. That is, xn does not need to be windowed before
analysis. Like the autocorrelation, for a stationary signal
the autocovariance converges to $E[x_n x_{n+k}]$ when $N \to \infty$.
In comparison to autocorrelation, the autocovariance is a
very similar domain, but with some additional information.
Specifically, whereas in the autocorrelation domain, phase
information of the signal is discarded, in the covariance
it is retained. When looking at stationary signals, we
often find that phase information is not that useful, but
for rapidly varying signals, it can be very useful. The
underlying difference comes from the fact that for a
stationary signal the expected value is independent of time
but for a non-stationary signal this does not hold.
Assume at time t (or for a time interval starting at time t
or being centered at time t) we estimate, for signal xn,
the autocovariance Q(k,t). Then we can readily see that it
holds that $E[Q(k,t)] = E[Q(-k,t+k)]$. In the following we will
adopt a notation where the expectations (described by the
operator E[...]) are implicit, whereby $Q(k,t) = Q(-k,t+k)$.
Similarly, the relationship Q(-k,t) = Q(k,t-k) may hold.
By applying the assumption of locally constant temporal
envelope variation, we have
The time derivative of Q(k,t) is therefore
Using these relations we can now form a first order Taylor
estimate for Q(k,t) centered at t

For example, the time shift may be measured in the same
units as the autocorrelation lag, such that the following
may hold:
Now all terms appear at the same point in time t (or for
the same time interval), so we can define qk=Q(k,t) and
qk=Q(k,t).
Recall that our purpose was to estimate the envelope
variation h. Since the above relation holds for all k we
can, for example, minimize the squared modeling error
(11)
The minimum can be readily found as
(12)
Here we have chosen to use minimum mean square error (MMSE)
as our optimization criterion but any other criteria known
in the art can be applied equally well here and also in the
other embodiments. Likewise, we have chosen to take the
estimate over all lags between k = -N and k = N, but a
selection of indices can be used for benefit of
computational efficiency and accuracy if desired here and
also in the other embodiments.
Note that in comparison to the autocorrelation, with the
autocovariance we do not need to use successive analysis
windows, but we can estimate the temporal envelope
variation from a single window. A similar approach can
readily be developed for the estimation of pitch variation
from a single autocovariance window.
Furthermore, note that in comparison to pitch variation
estimation, for envelope estimation we do not need to pre-
filter the signal with a low-pass filter, since no k-
derivatives of the autocovariance are needed.
Modeling in the autocovariance domain - Application
As another example of concrete application of the concept
of the present invention, we shall demonstrate the method
of estimating temporal envelope variation from a signal in
the autocovariance domain. The method comprises (or
consists of) the following steps:
1. Estimate the autocovariance $q_k$ of signal xn for a
window of length $\Delta t_{win}$
2. Find the temporal envelope variation h by
calculating

If a normalized envelope contour is desired instead of only
the envelope variation measure h, a further step shall be
added optionally:
3. The envelope contour is

where a0 is acquired from the previous frame or an
actual estimate of the envelope magnitude. If no
measurements of the envelope magnitude are available,
we can set $a_0 = 1$ and calculate the envelope contour
iteratively for all consecutive windows.
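Steps 1 and 2 may be sketched as follows. The estimator shown is derived here under stated assumptions and may differ in form from Eq. 12: the signal is modeled as $x_n = a_0 \exp(hn) f_n$ with h in units of 1/sample, for which the autocovariances at lags -k and +k differ by the factor $\exp(-2hk) \approx 1 - 2hk$, and h is then obtained by least squares over the positive lags. The 1/length normalization of the autocovariance and the function names are also assumptions.

```python
import numpy as np

def autocovariance(x, start, length, max_lag):
    """Q(k) = (1/length) * sum over n of x[n] * x[n + k], k = -max_lag..max_lag.

    n runs over start..start + length - 1; samples outside this range are
    read from x unwindowed, so start >= max_lag and
    start + length + max_lag <= len(x) are required.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(start, start + length)
    lags = np.arange(-max_lag, max_lag + 1)
    q = np.array([np.dot(x[n], x[n + k]) / length for k in lags])
    return lags, q

def envelope_variation(x, start, length, max_lag):
    """Least-squares estimate of h from the model q[-k] ~ (1 - 2*h*k) * q[k]."""
    _, q = autocovariance(x, start, length, max_lag)
    k = np.arange(1, max_lag + 1)
    q_pos = q[max_lag + k]   # Q(+k)
    q_neg = q[max_lag - k]   # Q(-k)
    return np.sum(k * q_pos * (q_pos - q_neg)) / (2.0 * np.sum(k ** 2 * q_pos ** 2))
```

A single analysis interval is sufficient here because the forward and backward autocovariances of the same interval are compared. Under the same first order model, the envelope contour would follow as $a_0 \exp(hn)$.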
If additional information is available regarding the
properties of the input signal, thresholds can optionally
be used to remove infeasible variation estimates. For
example, the minimum modeling error from Eq. 11 can
optionally be used as an indicator of the quality of the
estimate. Particularly, it is possible to set a threshold
for the modeling error such that an estimate based on a
model with large modeling error may be ignored, since the
change exhibited in the signal is not well described by the
model and the estimate itself is unreliable.
To further improve the accuracy, it is optionally possible
to first cancel the formant structure of the input signal
(as explained in the section titled "Variation estimation
in the autocorrelation domain - Formant structure
modeling"). However, note that, in terms of speech signals,
we then obtain an estimate of the glottal pressure waveform
instead of the speech signal (speech pressure waveform)
and the temporal envelope thus models the envelope of
the glottal pressure, which may or may not be a desired
consequence, depending on the application.
Modeling in the autocovariance domain - Joint estimation of
pitch and envelope variation
Similarly as the envelope variation was estimated in the
previous section, also the pitch variation can be estimated
directly from a single autocovariance window. However, in
this section, we will demonstrate the more general problem
of how to jointly estimate pitch and envelope variation
from a single autocovariance window. It will then be
straightforward for anyone knowledgeable in the art to
modify the method for the estimation of the pitch variation
only. It should be noted here that it is not necessary to
use any windowing in the autocovariance domain. For
example, it is sufficient to compute the autocovariance
parameters as outlined in the section titled "Modeling in
the Autocovariance domain - Overview". Nevertheless, the
expression "single autocovariance window" expresses that
the autocovariance estimate of a single fixed portion of
the audio signal may be used to estimate variation, in
contrast to the autocorrelation, where autocorrelation
estimates of at least two fixed portions of the audio
signal have to be used to estimate variation. The usage of a
single autocovariance window is possible since the
autocovariance at lag +k and -k express, respectively, the
autocovariance k steps forward and backward from a given
sample. In other words, since the signal characteristics
evolve over time, the autocovariance forward and backward
from a sample will be different and this difference in
forward and backward autocovariance expresses the magnitude
of change in signal characteristics. Such estimation is not
possible in the autocorrelation domain, since the
autocorrelation domain is symmetric, that is,
autocorrelations forward and backward are identical.
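The asymmetry can be illustrated numerically. The following is a minimal sketch, assuming that the autocovariance is the sum of products x(n)x(n+k) taken over a fixed range of n, and that the autocorrelation is computed from the windowed, zero-padded segment; the function names, the test signal and all numeric values are illustrative assumptions and are not taken from the specification.

```python
import numpy as np

def autocovariance(x, t0, length, k):
    """Sum of x[n] * x[n + k] over the fixed range n = t0 .. t0 + length - 1."""
    n = np.arange(t0, t0 + length)
    return float(np.dot(x[n], x[n + k]))

def autocorrelation(x, t0, length, k):
    """Autocorrelation of the windowed (zero-padded) segment; symmetric in k."""
    seg = x[t0:t0 + length]
    k = abs(k)
    return float(np.dot(seg[:length - k], seg[k:]))

fs = 8000.0
t = np.arange(2048) / fs
# Illustrative signal: 200 Hz tone with an exponentially growing envelope.
x = np.exp(20.0 * t) * np.sin(2 * np.pi * 200.0 * t)

t0, length, k = 512, 512, 40  # 40 samples = one period of the tone
q_fwd = autocovariance(x, t0, length, +k)   # k steps forward
q_bwd = autocovariance(x, t0, length, -k)   # k steps backward
r_fwd = autocorrelation(x, t0, length, +k)
r_bwd = autocorrelation(x, t0, length, -k)
```

With a growing envelope, q_fwd exceeds q_bwd, whereas r_fwd and r_bwd coincide, so only the autocovariance of a single window carries information about the change.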
Consider a signal $x(t) = a(t) f(b(t))$, where amplitude and pitch
variation are modeled by first order models, whereby
$a(t) = a_0 e^{ht}$ and $b(t) = b_0 t e^{ct}$. The autocovariance $Q_x(k)$ of $x(t)$ is
then
where $Q_f(k,t)$ is the autocovariance of $f(b(t))$.
Using Equations 6, 10 and 13, we obtain the time derivative
of Qx(k,t) as
However, the above equation contains a product ch and is
thus not a linear function of c and h. In order to
facilitate efficient solution of parameters, we may assume
that $|ch|$ is small, whereby we can approximate
As before, we can define $q_k = Q_x(k,t)$ and form the first order
Taylor estimate
The square difference between the true value $q_{-k}$ and the
Taylor estimate $\hat{q}_{-k}$ will again serve as our objective
function when finding optimal (or at least approximately
optimal) c and h. We obtain the minimization problem
whose solution can be readily obtained as
where
Although the formulas appear to be complex, the
construction of A and u can be performed using only
operations for vectors of length 2N (lag zero can be
omitted) and the solution of c and h can be performed using
the inversion of the 2x2 matrix A. The computational
complexity is thus only a modest O(N) (i.e. of the order of
N).
The application of joint estimation of pitch and envelope
variation follows the same approach as presented in the
section titled "Modeling in the autocovariance domain -
Application", but using Eq. 14 in Step 2.
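The structure of such a joint solution can be sketched as follows. This is a minimal sketch under stated assumptions and is not a reproduction of Eq. 14: it assumes the linearized relation q_k - q_{-k} ≈ c·k²·q'_k + h·2k·q_k, which is inferred here from the weights 2k and k² that the description associates with the differences q_k - q_{-k}; the sign convention of c, the function name solve_joint and the synthetic data are likewise assumptions.

```python
import numpy as np

def solve_joint(q_pos, q_neg, dq_pos):
    """Least-squares c and h for the assumed linear model
    q_k - q_{-k} = c * k^2 * q'_k + h * 2k * q_k, for k = 1..N."""
    k = np.arange(1, len(q_pos) + 1)
    g = k**2 * dq_pos          # regressor multiplying c (assumed form)
    e = 2 * k * q_pos          # regressor multiplying h (assumed form)
    d = q_pos - q_neg          # forward/backward difference
    # 2x2 normal equations A [c, h]^T = u, built from length-N vectors only.
    A = np.array([[g @ g, g @ e], [g @ e, e @ e]])
    u = np.array([g @ d, e @ d])
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    c = (A[1, 1] * u[0] - A[0, 1] * u[1]) / det
    h = (A[0, 0] * u[1] - A[1, 0] * u[0]) / det
    return c, h

# Synthetic check: data built exactly from the assumed model.
rng = np.random.default_rng(0)
q_pos = rng.normal(size=32)
dq_pos = rng.normal(size=32)
k = np.arange(1, 33)
c_true, h_true = 1e-3, -2e-3
q_neg = q_pos - (c_true * k**2 * dq_pos + h_true * 2 * k * q_pos)
c_est, h_est = solve_joint(q_pos, q_neg, dq_pos)
```

Every quantity is a sum over the N lags followed by a closed-form 2x2 inversion, which is consistent with the stated complexity of the order of N.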
Modeling in the autocovariance domain - Further concepts
In the following, different approaches of modeling the
autocovariance domain will be briefly discussed taking
reference to Fig. 5. Fig. 5 shows a block schematic diagram
of a method 500 for obtaining a parameter describing a
temporal variation of a signal characteristic of an audio
signal, according to an embodiment of the invention. The
method 500 comprises, as an optional step 510, an audio
signal pre-processing. The audio signal pre-processing in
step 510 may, for example, comprise a filtering of the
audio signal (for example, a low-pass filtering) and/or a
formant structure reduction/removal, as described above.
The method 500 may further comprise a step 520 of obtaining
first autocovariance information describing an
autocovariance of the audio signal for a first time
interval and for a plurality of different autocovariance
lag values k. The method 500 may also comprise a step 522
of obtaining second autocovariance information describing
an autocovariance of the audio signal for a second time
interval and for the different autocovariance lag values k.
Further, the method 500 may comprise a step 530 of
evaluating, for the plurality of different autocovariance
lag values k, a difference between the first autocovariance
information and the second autocovariance information, to
obtain a temporal variation information.
Further, method 500 may comprise a step 540 of estimating a
"local" (i.e. in an environment of a respective lag value)
variation of the autocovariance information over lag for a
plurality of different lag values, to obtain a "local lag
variation information".
Also, the method 500 may generally comprise a step 550 of
combining the temporal variation information and the
information about the local variation q' of the
autocovariance information over lag (also designated as
"local lag variation information"), to obtain the model
parameter.
When combining the temporal variation information and the
information about the local variation q' of the
autocovariance information over lag, the temporal variation
information and/or the information about the local
variation q' of the autocovariance information over lag may
be scaled in accordance with the corresponding
autocovariance lag k, for example, proportional to the
autocovariance lag k or a power thereof.
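Steps 520 to 550 can be sketched as follows for a frequency variation parameter. This is a minimal sketch, assuming the first order relation q(k, second interval) - q(k, first interval) ≈ c·k·q'(k) and a least-squares combination over all lags; the relation, the sign convention of c, the function name and the synthetic cosine-shaped data are illustrative assumptions rather than the formula of the specification.

```python
import numpy as np

def estimate_lag_scaling(q_first, q_second):
    """Least-squares c for the assumed model q_second - q_first = c * k * dq/dk."""
    k = np.arange(len(q_first))
    temporal = q_second - q_first     # step 530: temporal variation per lag
    local = np.gradient(q_first)      # step 540: local variation over lag
    regressor = k * local             # scaling proportional to the lag k
    # step 550: combine both kinds of information into one parameter
    return float(regressor @ temporal / (regressor @ regressor))

# Synthetic data: the second interval is the first one compressed along the lag axis.
k = np.arange(65)
period, c_true = 50.0, 0.01
q_first = np.cos(2 * np.pi * k / period)
q_second = np.cos(2 * np.pi * k * (1 + c_true) / period)
c_est = estimate_lag_scaling(q_first, q_second)
```

Because every lag contributes to both sums, the estimate does not depend on locating a peak at the period length.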
Alternatively, steps 520, 522 and 530 may be replaced by
steps 570, 580, as will be explained in the following. In
step 570, an autocovariance information describing an
autocovariance of the audio signal for a single
autocovariance window but for different autocovariance lag
values k may be obtained. For example, an autocovariance
value $Q(k,t) = q_k$ and an autocovariance information $q_{-k} = Q(-k,t)$
may be obtained.
Subsequently, weighted differences, e.g. $2k(q_k - q_{-k})$ and/or
$k^2(q_k - q_{-k})$, between autocovariance values associated with
different lag values (e.g. -k, +k) may be evaluated for a
plurality of different autocovariance lag values k in step
580. The weights (e.g. $2k$, $k^2$) may be chosen in dependence
on a difference of the lag values of the respective
subtracted autocovariance values (e.g. the difference in
lag between the autocovariance values $q_k$, $q_{-k}$: $k-(-k) = 2k$).
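Step 580 can be sketched as follows. This is a minimal sketch, assuming the autocovariance values of the single window are stored in one array covering the lags -N..N with lag zero in the middle; the storage layout and the function name are assumptions made for illustration.

```python
import numpy as np

def weighted_differences(q):
    """q holds the lags -N..N (index N is lag 0).
    Returns k, 2k*(q_k - q_{-k}) and k^2*(q_k - q_{-k}) for k = 1..N."""
    n = (len(q) - 1) // 2
    k = np.arange(1, n + 1)
    diff = q[n + k] - q[n - k]   # forward minus backward autocovariance
    return k, 2 * k * diff, k**2 * diff

# Lags -2, -1, 0, 1, 2
q = np.array([1.0, 2.0, 5.0, 4.0, 7.0])
k, w_2k, w_k2 = weighted_differences(q)
```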
To summarize the above, there are many different ways of
obtaining the one or more desired model parameters in the
autocovariance domain. In the preferred embodiments, a
single autocovariance window may be sufficient in order to
estimate one or more temporal variation model parameters.
In this case, autocovariance values
associated with different autocovariance lag values
may be compared (e.g. subtracted). Alternatively,
autocovariance values for different time intervals but the same
autocovariance lag value may be compared (e.g. subtracted)
to obtain temporal variation information. In both cases,
weighting may be introduced which takes into account the
lag difference or the autocovariance lag, when
deriving the model parameter.
Modeling in other domains
In addition to the autocorrelation and autocovariance, the
concept disclosed herein can be formulated also in other
domains, such as the Fourier spectrum. When applying the
method in a domain Ψ, it may comprise the following steps:
1. Transform the time signal to domain Ψ.
2. Calculate time derivative(s) in domain Ψ, in a
form where the variation model parameters are present
in explicit form.
3. Form the Taylor series approximation of the signal
in domain Ψ and fit it to the true time evolution by
minimizing the modeling error, to obtain the variation model parameters.
4. (Optional) Calculate time contour of signal
variation.
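Steps 2 and 3 can be written compactly for a first order model. The following is a sketch in notation introduced here for illustration only: Y(ξ,t) denotes the representation of the signal in the chosen domain as a function of the transform variable ξ, θ denotes the variation model parameter(s), and Δ denotes the time step between the observations being compared; none of these symbols is taken from the specification.

```latex
\hat{Y}(\xi, t+\Delta; \theta)
  = Y(\xi, t) + \Delta \, \frac{\partial Y}{\partial t}(\xi, t; \theta),
\qquad
\theta^{\ast}
  = \arg\min_{\theta} \sum_{\xi}
    \bigl[ Y(\xi, t+\Delta) - \hat{Y}(\xi, t+\Delta; \theta) \bigr]^{2}
```

When the time derivative is linear in θ, the minimization reduces to a set of linear equations.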
In a practical application, the application of the
inventive concept may, for example, comprise transforming
the signal to the desired domain and determining the
parameters of a Taylor series approximation, such that the
model represented by the Taylor series approximation is
adjusted to fit the actual time evolution of the transform-
domain signal representation.
In some embodiments, the transform domain can also be
trivial, that is, it is possible to apply the model
directly in time domain.
As presented in previous sections, the variation model(s)
can, for example, be locally constant, polynomial or
have other functional form(s).
As demonstrated in previous sections, the Taylor series
approximation can be applied either across consecutive
windows, within one window, or in a combination of within
windows and across consecutive windows.
The Taylor series approximation can be of any order,
although first order models are generally attractive since
then the parameters can be obtained as solutions to linear
equations. Moreover, also other approximation methods known
in the art can be used.
Generally, minimization of the mean squared error (MMSE) is
a useful minimization criterion, since then parameters can
be obtained as solutions to linear equations. Other
minimization criteria can be used for improved robustness
or when the parameters are better interpreted in another
minimization domain.
Apparatus for encoding an audio signal
As already mentioned above, the inventive concept can be
applied in an apparatus for encoding an audio signal. For
example, the inventive concept is particularly useful
whenever information about a temporal variation of an
audio signal is required in an audio encoder (or an audio
decoder, or any other audio processing apparatus).
Fig. 6 shows a block schematic diagram of an audio encoder,
according to an embodiment of the invention. The audio
encoder shown in Fig. 6 is designated in its entirety with
600. The audio encoder 600 is configured to receive a
representation 606 of an input audio signal (e.g. a time-
domain representation of an audio signal), and to provide,
on the basis thereof, an encoded representation 630 of the
input audio signal. The audio encoder 600 comprises,
optionally, a first audio signal pre-processor 610 and,
further optionally, a second audio signal pre-processor
612. Also, the audio encoder 600 may comprise an audio
signal encoder core 620, which may be configured to receive
the representation 606 of the input audio signal, or a pre-
processed version thereof, provided, for example, by the
first audio signal pre-processor 610. The audio signal
encoder core 620 is further configured to receive a
parameter 622 describing a temporal variation of a signal
characteristic of the audio signal 606. Also, the audio
signal encoder core 620 may be configured to encode the
audio signal 606, or the respective pre-processed version
thereof, in accordance with an audio signal encoding
algorithm, taking into account the parameter 622. For
example, an encoding algorithm of the audio signal encoder
core 620 may be adjusted to follow a varying characteristic
(described by the parameter 622) of the input audio signal,
or to compensate for the varying characteristic of the
input audio signal.
Thus, the audio signal encoding is performed in a signal-
adaptive way, taking into consideration a temporal
variation of the signal characteristics.
The audio signal encoder core 620 may, for example, be
optimized to encode music audio signals (for example, using
a frequency-domain encoding algorithm). Alternatively, the
audio signal encoder may be optimized for speech encoding,
and may therefore also be considered as a speech encoder
core. However, the audio signal encoder core or speech
encoder core may naturally also be configured to follow a
so-called "hybrid" approach, exhibiting good performance
both for encoding music signals and speech signals.
For example, the audio signal encoder core or speech
encoder core 620 may constitute (or comprise) a time-warp
encoder core, thus using the parameter 622 describing a
temporal variation of a signal characteristic (e.g. pitch)
as a warp parameter.
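The use of a pitch variation parameter as a warp parameter can be illustrated by a simple resampling. This is a minimal sketch and not the time-warp encoder core itself: it assumes a pitch following f0·exp(c·t) with a known, non-zero c, and it uses plain linear interpolation (numpy.interp) in place of a proper resampler; the function name and all numeric values are illustrative assumptions.

```python
import numpy as np

def cancel_pitch_variation(x, fs, c):
    """Resample x so that a pitch following f0 * exp(c * t) becomes constant.
    Requires c != 0."""
    t = np.arange(len(x)) / fs
    # Warped time tau = (exp(c * t) - 1) / c; sample it on a uniform grid
    # and map each tau back to the original time axis.
    tau_max = (np.exp(c * t[-1]) - 1.0) / c
    tau = np.arange(0.0, tau_max, 1.0 / fs)
    t_src = np.log1p(c * tau) / c
    return tau, np.interp(t_src, t, x)

fs, f0, c = 8000.0, 100.0, 0.5
t = np.arange(4000) / fs
# Tone whose frequency rises as f0 * exp(c * t)
x = np.sin(2 * np.pi * f0 * (np.exp(c * t) - 1.0) / c)
tau, y = cancel_pitch_variation(x, fs, c)
reference = np.sin(2 * np.pi * f0 * tau)   # constant-pitch tone
```

After warping, y closely follows the constant-pitch reference, which is the condition under which a subsequent block transform concentrates the energy of each partial in few coefficients.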
The audio encoder 600 may therefore comprise an apparatus
100, as described with reference to Fig. 1, which apparatus
100 is configured to receive the input audio signal 606, or
a preprocessed version thereof (provided by the optional
audio signal pre-processor 612) and to provide, on the
basis thereof, the parameter information 622 describing a
temporal variation of a signal characteristic (e.g. pitch)
of the audio signal 606.
Thus, the audio encoder 600 may be configured to make use
of any of the inventive concepts described herein for
obtaining the parameter 622 on the basis of the input audio
signal 606.
Computer Implementation
Depending on certain implementation requirements,
embodiments of the invention can be implemented in hardware
or in software. The implementation can be performed using a
digital storage medium, for example a floppy disk, a DVD, a
CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory,
having electronically readable control signals stored
thereon, which cooperate (or are capable of cooperating)
with a programmable computer system such that the
respective method is performed.
Some embodiments according to the invention comprise a data
carrier having electronically readable control signals,
which are capable of cooperating with a programmable
computer system, such that one of the methods described
herein is performed.
Generally, embodiments of the present invention can be
implemented as a computer program product with a program
code, the program code being operative for performing one
of the methods when the computer program product runs on a
computer. The program code may for example be stored on a
machine readable carrier.
Other embodiments comprise the computer program for
performing one of the methods described herein, stored on a
machine readable carrier.
In other words, an embodiment of the inventive method is,
therefore, a computer program having a program code for
performing one of the methods described herein, when the
computer program runs on a computer.
A further embodiment of the inventive methods is,
therefore, a data carrier (or a digital storage medium, or
a computer-readable medium) comprising, recorded thereon,
the computer program for performing one of the methods
described herein.
A further embodiment of the inventive method is, therefore,
a data stream or a sequence of signals representing the
computer program for performing one of the methods
described herein. The data stream or the sequence of
signals may for example be configured to be transferred via
a data communication connection, for example via the
Internet.
A further embodiment comprises a processing means, for
example a computer, or a programmable logic device,
configured to or adapted to perform one of the methods
described herein.
A further embodiment comprises a computer having installed
thereon the computer program for performing one of the
methods described herein.
In some embodiments, a programmable logic device (for
example a field programmable gate array) may be used to
perform some or all of the functionalities of the methods
described herein. In some embodiments, a field programmable
gate array may cooperate with a microprocessor in order to
perform one of the methods described herein.
Conclusion
In the following, the inventive concept will be briefly
summarized taking reference to Fig. 7, which shows a
flowchart of a method 700 according to an embodiment of the
invention. The method 700 comprises a step 710 of
calculating a transform domain representation of an input
signal, for example, an input audio signal. The method 700
further comprises a step 730 of minimizing the modeling
error of a model describing an effect of the variation in
the domain. Modeling 720 the effect of variation in the
transform domain may be performed as a part of the method
700, but may also be performed as a preparatory step.
However, when minimizing the modeling error in step 730,
both the transform domain representation of the input audio
signal and the model describing the effect of variation may
be taken into consideration. The model describing the
effect of variation may be used in a form describing
estimates of a subsequent transform domain representation
as an explicit function of previous (or following, or
other) actual transform domain parameters, or in a form
describing optimal (or at least sufficiently good)
variation model parameters as an explicit function of a
plurality of actual transform domain parameters (of a
transform domain representation of the input audio signal).
Step 730 of minimizing the modeling error results in one or
more model parameters describing a variation magnitude.
The optional step 740 of generating a contour results in a
description of a contour of the signal characteristic of
the input (audio) signal.
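The optional step 740 can be sketched as follows. This is a minimal sketch, assuming that the model parameter of each frame is a normalized variation c = p'(t)/p(t) that is constant within the frame, so that the characteristic p grows by the factor exp(c·T) over a frame of duration T; the function name and the numeric values are illustrative assumptions.

```python
import numpy as np

def relative_contour(c_per_frame, frame_duration):
    """Contour p(t) / p(0) at the frame boundaries, assuming p'/p = c_h in frame h."""
    log_ratio = np.concatenate(([0.0], np.cumsum(c_per_frame) * frame_duration))
    return np.exp(log_ratio)

contour = relative_contour(np.array([0.1, 0.1, 0.1]), 2.0)
```

The contour is relative: an absolute estimate of the characteristic at one point in time is still needed to fix its scale.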
To summarize, the above embodiments according to the
present invention address one of the most fundamental
questions in signal processing, namely, how much does a
signal change?
According to the present invention, embodiments provide a
method (and an apparatus) for an estimation of variation in
signal characteristics, such as a change in fundamental
frequency or temporal envelope. For changes in frequency,
the method is oblivious to octave jumps, robust to errors in the
autocorrelation (or autocovariance), simple yet effective,
and unbiased.
Specifically, the embodiments according to the present
invention comprise the following features:
• The variation in signal characteristics (e.g. of the
input audio signal) is modeled. In terms of pitch
variation or temporal envelope, the model specifies
how the autocorrelation or autocovariance (or another
transform domain representation) changes over time.
• While signal characteristics cannot be assumed to be
locally constant, the variation (which may be
normalized in some embodiments) in signal
characteristics can be assumed constant or to follow a
functional form.
• By modeling the signal change, its variation (= the
time evolution of the signal characteristics) can be
modeled.
• The signal variation model (e.g. in implicit or
explicit functional representation) is fitted to
observations (e.g. actual transform domain parameters
obtained by transforming the input audio signal) by
minimizing the modeling error, whereby the model
parameters quantify the magnitude of variation.
• In terms of pitch variation estimation, the variation
is estimated directly from the signal, without an
intermediate step of pitch estimation (e.g. an
estimation of an absolute value of the pitch).
• By modeling the variation in pitch, the effect of
variation can be measured from any lag of the
autocorrelation and not only at multiples of the
period length, thus enabling usage of all available
data and thereby obtaining a high level of robustness
and stability.
• Even though estimating the autocorrelation or
autocovariance from a non-stationary signal introduces
bias to the autocorrelation and -covariance estimates,
the variation estimate in the present work will still
be unbiased in some embodiments.
• When the actual characteristics of the signal are
sought and not only the variation in characteristics,
the method optionally provides an accurate and
continuous contour which can be fitted to estimates of
signal characteristics along the contour.
• In speech and audio coding, the presented method can
be used as input for the time-warped MDCT, such that
when changes in pitch are known, their effect can be
canceled by time-warping, before applying the MDCT.
This will reduce smearing of frequency components and
thus improve energy compaction.
• When estimating from the autocorrelation, consecutive
analysis windows may be used to obtain the temporal
change. When estimating from the autocovariance, only
a single window is needed to measure the temporal
change, but consecutive windows can be used when
desired.
• Jointly estimating changes in both pitch and temporal
envelope corresponds to AM-FM analysis of the signal.
In the following, some embodiments according to the
invention will be briefly summarized.
According to an aspect, an embodiment according to the
invention comprises a signal variation estimator. The
signal variation estimator comprises a signal variation
modeling in a transform domain, a modeling of time
evolution of signal in transform domain, and a model error
minimization in terms of fit to input signal.
According to an aspect of the invention, the signal
variation estimator estimates variation in the
autocorrelation domain.
According to another aspect, the signal variation estimator
estimates variation in pitch.
According to an aspect, the present invention creates a
pitch variation estimator, wherein the variation model
comprises:
• A model for shift in autocorrelation lag.
• An estimate of the autocorrelation lag derivative $\partial R / \partial k$.
• A model for the relation between (i) the time derivative of
autocorrelation lag, (ii) the time derivative of
autocorrelation and (iii) the autocorrelation lag
derivative.
• A Taylor series estimate of autocorrelation.
• An MMSE estimate of model fit, which yields the pitch
variation parameter(s).
According to an aspect of the invention, the pitch
variation estimator can be used, in combination with time-
warped-modified-discrete-cosine-transform (TW-MDCT, see
reference [3]) in speech and audio coding as input (or to
provide input) to the time-warped-modified-discrete-cosine-
transform (TW-MDCT).
According to an aspect of the invention, the signal
variation estimator estimates variation in the
autocovariance domain.
According to an aspect, the signal variation estimator
estimates a variation in temporal envelope.
According to an aspect, the temporal envelope variation
estimator comprises a variation model, the variation model
comprising:
• A model for the effect of temporal envelope variation
on autocovariance as function of lag k.
• A Taylor series estimate of autocovariance.
• An MMSE estimate of model fit, which yields the
envelope variation parameter(s).
According to an aspect, the effect of formant structure is
canceled in the signal variation estimator.
According to another aspect, the present invention
comprises the usage of signal variation estimates of some
characteristics of a signal as additional information for
finding accurate and robust estimates of that
characteristic.
To summarize, embodiments according to the present
invention use variation models for the analysis of a
signal. In contrast, conventional methods require an
estimate of pitch variation as input to their algorithms,
but do not provide a method for estimating the variation.
References
[1] Y. Bistritz and S. Peller. Immittance spectral pairs
(ISP) for speech encoding. In Proc. Acoust. Speech Signal
Processing, ICASSP-93, Minneapolis, MN, USA, April 27-30
1993.
[2] A. de Cheveigne and H. Kawahara. YIN, a fundamental
frequency estimator for speech and music. J Acoust Soc Am,
111(4):1917-1930, April 2002.
[3] B. Edler, S. Disch, R. Geiger, S. Bayer, U. Kramer, G.
Fuchs, M. Neuendorf, M. Multrus, G. Schuller and H. Popp.
Audio processing using high-quality pitch correction. US
Patent application 61/042,314, 2008.
[4] J. Herre and J.D. Johnston. Enhancing the performance
of perceptual audio coders by using temporal noise shaping
(TNS). In Proc AES Convention 101, Los Angeles, CA, USA,
November 8-11 1996.
[5] A. Harma. Linear predictive coding with modified filter
structures. IEEE Trans. Speech Audio Process., 9(8):769-
777, November 2001.
[6] J. Makhoul. Linear prediction: A tutorial review. Proc.
IEEE, 63(4):561-580, April 1975.
[7] K.K. Paliwal. Interpolation properties of linear
prediction parametric representations. In Proc Eurospeech
'95, Madrid, Spain, September 18-21 1995.
[8] L. Villemoes. Time warped modified transform coding of
audio signals. International Patent PCT/EP2006/010246,
Published 10.05.2007.
[9] M. Wolfel and J. McDonough. Minimum variance
distortionless response spectral estimation. IEEE Signal
Process Mag., 22(5):117-126, September 2005.
WE CLAIM:

1. An apparatus (100) for obtaining a parameter (140)
describing a variation of a signal characteristic of a
signal on the basis of actual transform domain
parameters (120) describing the signal in a transform
domain, the apparatus comprising:
a parameter determinator (130) configured to determine
one or more model parameters of a transform-domain
variation model (130a; 130c) describing an evolution
of transform domain parameters in dependence on one or
more model parameters (140) representing a signal
characteristic, such that a model error, representing
a deviation between a modelled evolution of the
transform domain parameters and an evolution of the
actual transform domain parameters, is brought below a
predetermined threshold value or minimized;
wherein the apparatus (100) is configured to obtain,
as the actual transform-domain parameters, first
transform domain information (R(k,h)) describing the
audio signal for a first time interval for a plurality
of different values of the transform variable (k), and
second transform domain information (R(k,h+1))
describing the audio signal for a second time interval
for the different values of the transform variable;
wherein the parameter determinator (130) is configured
to evaluate, for a plurality of different values of
the transform variable (k), a temporal variation
between the first transform domain information and the
second transform domain information, to obtain
temporal variation information,
to estimate a local variation of the transform domain
information over the transform variable for a
plurality of different values of the transform
variable, to obtain a local variation information, and
to combine the temporal variation information and the
local variation information, to obtain a frequency
variation model parameter;
wherein the parameter determinator (130) is configured
to obtain the frequency variation model parameter
using a model comprising the frequency variation model
parameter and representing a compression or expansion
of the transform domain representation of the audio
signal with respect to the transform variable (k)
assuming a smooth frequency variation of the audio
signal;
wherein the parameter determinator is configured to
determine the frequency variation model parameter such
that the parameterized transform-domain variation
model is adapted to the first set of transform domain
parameters and the second set of transform domain
parameters.
2. The apparatus (100) according to claim 1, wherein the
apparatus (100) is configured to obtain, as the actual
transform-domain parameters (120), a first set of
transform domain parameters (R(k,h)) describing a
first time interval of the audio signal in the
transform domain for a predetermined set of values of
a transform variable (k), and a second set of
transform domain parameters (R(k,h+1)) describing a
second time interval of the audio signal in the
transform domain for the predetermined set of values
of the transform variable (k).
3. The apparatus (100) according to claim 1, wherein the
apparatus (100) is configured to obtain, as the actual
transform domain parameters (120), transform domain
parameters describing the audio signal in the
transform-domain as a function of a transform variable
(k),
wherein the transform domain is chosen such that a
frequency transposition of the audio signal results at
least in a shift of the transform domain
representation of the audio signal with respect to the
transform variable or in a stretching of the transform
domain representation with respect to the transform
variable, or in a compression of the transform domain
representation with respect to the transform variable;
wherein the parameter determinator (130) is configured
to obtain a frequency variation model parameter ($c_h$)
on the basis of a temporal change (R(k,h+1)-R(k,h)) of
corresponding actual transform domain parameters,
taking into consideration a dependence of the
transform-domain-representation of the audio signal
from the transform variable (k).
4. The apparatus (100) according to one of claims 1 to 3
wherein the apparatus (100) is configured to obtain,
as the actual transform-domain parameters, first
autocorrelation information (R(k,h)) describing an
autocorrelation of the audio signal for a first time
interval for a plurality of different autocorrelation
lag values (k), and second autocorrelation information
(R(k,h+1)) describing an autocorrelation of the audio
signal for a second time interval for the different
autocorrelation lag values;
wherein the parameter determinator (130) is configured
to evaluate, for a plurality of different
autocorrelation lag values (k), a temporal variation
between the first autocorrelation information and the
second autocorrelation information, to obtain temporal
variation information,
to estimate a local variation of the autocorrelation
information over lag for a plurality of different lag
values, to obtain a local lag variation information,
and
to combine the temporal variation information and the
local lag variation information, to obtain the model
parameter.
5. The apparatus (100) according to claim 4, wherein the
parameter determinator is configured to compute an
estimated variation parameter $c_h$ using the following
equation:

wherein
k designates a running variable describing
different autocorrelation lag values;
h designates a first time interval;
h+1 designates a second time interval;
N>2 designates a number of autocorrelation lag
values to be evaluated;
R(k,h) designates an autocorrelation of the audio
signal ($x_n$) for a window designated by index h;
R(k,h+1) designates an autocorrelation of the
audio signal $x_n$ for a window designated by index
h+1; and
$\frac{\partial}{\partial k} R(k,h)$ designates a variation of the
autocorrelation R(k,h) over a lag for a window
designated by index h in a surrounding of the lag
designated by k.
6. The apparatus (100) according to one of claims 1 to 3,
wherein the apparatus is configured to obtain, as the
actual transform-domain parameters, first
autocovariance information ($Q(k,t) = q_k$) describing an
autocovariance of the audio signal for a first time
interval for a plurality of different autocorrelation
lag values (k) and second autocovariance information
($Q(-k,t) = Q(k,t-k) = q_{-k}$) describing an autocovariance of
the audio signal for a second time interval (t-k) for
a plurality of different autocorrelation lag values;
and
wherein the parameter determinator is configured to
evaluate, for a plurality of different autocovariance
lag values, a variation ($q_k - q_{-k}$) between the first
autocovariance information and the second
autocovariance information, to obtain temporal
variation information,
to estimate a local derivative ($\partial / \partial k$) of the
autocovariance information over lag for a plurality of
different lag values, to obtain a local lag variation
information, and
to combine the temporal variation information and the
local lag variation information, to obtain the model
parameter (140).

7. The apparatus (100) according to one of claims 1 to 3,
wherein the apparatus (100) is configured to obtain
autocovariance information ($Q(k,t) = q_k$, $Q(-k,t) = q_{-k}$)
describing an autocovariance of the audio signal for a
single autocovariance window but for different
autocovariance lag values,
to evaluate, for a plurality of different pairs of
autocovariance lag values (-k,k), weighted differences
($k^2(q_k - q_{-k})$) between the pairs of autocovariance
values,
wherein the weight is chosen in dependence on a
difference (2k) of the lag values of the respective
pairs of lag values, and in dependence on a variation
(q'_k) of the autocovariance values over lag,
to sum-combine different weighted difference values,
to obtain a combination value, and
to obtain the model parameters on the basis of the
combination value.
8. The apparatus (100) according to one of claims 1 to 7,
wherein the apparatus (100) is configured to obtain a
parameter describing a temporal variation of an
envelope of the audio signal,
wherein the parameter determinator (130) is configured
to obtain a plurality of transform-domain parameters
(R(0,th)) describing a signal power of the audio signal
for a plurality of time intervals,
wherein the parameter determinator is configured to
obtain an envelope variation model parameter using a
representation of a parameterized transform-domain
variation model comprising an envelope variation model
parameter and representing a temporal increase in
power or a temporal decrease in power of the
transform-domain representation of the audio signal
assuming a smooth envelope variation of the audio
signal, and
wherein the parameter determinator is configured to
determine the envelope variation model parameter such
that the parameterized transform-domain variation
model is adapted to the transform-domain parameters
(R(0,th)).
9. The apparatus (100) according to claim 8, wherein the
parameter determinator (130) is configured to obtain a
plurality of autocorrelation parameters or
autocovariance parameters for a given autocorrelation
lag or autocovariance lag, and
wherein the parameter determinator is configured to
determine a plurality of polynomial parameters of a
polynomial envelope variation model.
10. The apparatus according to claim 1, wherein the
apparatus is configured to obtain autocorrelation
domain parameters describing the audio signal in an
autocorrelation domain, and
wherein the parameter determinator (130) is configured
to determine one or more model parameters (140) of an
autocorrelation domain variation model; or
wherein the apparatus is configured to obtain
autocovariance domain parameters describing the audio
signal in an autocovariance domain, and
wherein the parameter determinator (130) is configured to
determine one or more model parameters of an
autocovariance domain variation model.
11. The apparatus according to one of claims 1 to 10,
wherein the transform-domain variation model describes
a temporal variation of a pitch of the audio signal,
or
wherein the transform-domain variation model describes
a temporal variation of an envelope of the audio
signal, or
wherein the transform-domain variation model describes
a simultaneous temporal variation of a pitch and of an
envelope of the audio signal.
12. The apparatus (100) according to one of claims 1 to
11, wherein the apparatus comprises a formant-
structure-reducer configured to preprocess an input
audio signal, to obtain a formant-structure-reduced
audio signal; and
wherein the apparatus is configured to obtain the
actual transform-domain parameter on the basis of the
formant-structure-reduced audio signal;
wherein the formant-structure-reducer is configured to
estimate parameters of a linear-predictive model of
the input audio signal on the basis of a high-pass
filtered version of the input audio signal, and
to filter a broad band version of the input audio
signal on the basis of the estimated parameters of the
linear-predictive model,
to obtain the formant-structure-reduced audio signal
such that the formant-structure-reduced audio signal
comprises a low-pass characteristic.
13. The apparatus according to one of claims 1 to 12,
wherein the parameter determinator is configured to
adapt the transform-domain variation model, describing
a temporal evolution of transform domain parameters in
dependence on one or more model parameters
representing a signal characteristic, to the signal
represented by the actual transform domain parameters.
14. The apparatus according to one of claims 1 to 13,
wherein the parameter determinator is configured to
evaluate, for a plurality of different values of the
transform variable (k), differences between pairs
(R(k, h + 1), R(k, h)) of transform domain values of
the first set of transform domain parameters and the
second set of transform domain parameters associated
with same values of the transform variable, to obtain
the temporal variation information.
15. The apparatus according to one of claims 1 to 14,
wherein the parameter determinator is configured to
use all available transform domain values (R(k, h +
1), R(k, h)), for any value of the transform variable,
to obtain the temporal variation information.
16. A method for obtaining a parameter describing a
variation of a signal characteristic for a signal on
the basis of actual transform-domain parameters
describing the signal in a transformed domain, the
method comprising:
determining one or more model parameters of a
transform-domain variation model describing an
evolution of transform-domain parameters in dependence
on one or more model parameters representing a signal
characteristic, such that a model error, representing
a deviation between a modeled temporal evolution of
the transform-domain parameters and an evolution of
the actual transform-domain parameters, is brought
below a predetermined threshold value or minimized;
wherein first transform domain information describing
the audio signal for a first time interval for a
plurality of different values of a transform variable,
and second transform domain information describing the
audio signal for a second time interval for the
different values of the transform variable are
obtained as the actual transform-domain parameters;
wherein a temporal variation between the first
transform domain information and the second transform
domain information is evaluated for a plurality of
different values of the transform variable (k), to
obtain temporal variation information,
wherein a local variation of the transform domain
information over the transform variable is estimated
for a plurality of different values of the transform
variable, to obtain a local variation information;
wherein the temporal variation information and the
local variation information are combined, to obtain a
frequency variation model parameter;
wherein the frequency variation model parameter is
obtained using a model comprising the frequency
variation model parameter and representing a
compression or expansion of the transform domain
representation of the audio signal with respect to the
transform variable (k) assuming a smooth frequency
variation of the audio signal; and
wherein the frequency variation model parameter is
determined such that the parameterized transform-
domain variation model is adapted to the first set of
transform domain parameters and the second set of
transform domain parameters.
17. An apparatus (100) for obtaining a parameter (140)
describing a variation of a signal characteristic of a
signal on the basis of actual transform domain
parameters (120) describing the signal in a transform
domain, the apparatus comprising:
a parameter determinator (130) configured to determine
one or more model parameters of a transform-domain
variation model (130a; 130c) describing an evolution
of transform domain parameters in dependence on one or
more model parameters (140) representing a signal
characteristic, such that a model error, representing
a deviation between a modelled evolution of the
transform domain parameters and an evolution of the
actual transform domain parameters, is brought below a
predetermined threshold value or minimized;
wherein the apparatus (100) is configured to obtain
autocovariance information (Q(k,t) = q_k, Q(-k,t) = q_{-k})
describing an autocovariance of the audio signal for a
single autocovariance window but for different
autocovariance lag values,
to evaluate, for a plurality of different pairs of
autocovariance lag values (-k, k), weighted differences
(k2(q_k - q_{-k})) between the pairs of autocovariance
values,
wherein the weight is chosen in dependence on a
difference (2k) of the lag values of the respective
pairs of lag values, and in dependence on a variation
(q'_k) of the autocovariance values over lag,
to sum-combine different weighted difference values,
to obtain a combination value, and
to obtain the model parameters on the basis of the
combination value.
18. A method for obtaining a parameter describing a
variation of a signal characteristic for a signal on
the basis of actual transform-domain parameters
describing the signal in a transformed domain, the
method comprising:
determining one or more model parameters of a
transform-domain variation model describing an
evolution of transform-domain parameters in dependence
on one or more model parameters representing a signal
characteristic, such that a model error, representing
a deviation between a modeled temporal evolution of
the transform-domain parameters and an evolution of
the actual transform-domain parameters, is brought
below a predetermined threshold value or minimized;
wherein an autocovariance information describing an
autocovariance of the signal for a single
autocovariance window but for different autocovariance
lag values is obtained;
wherein weighted differences between pairs of
autocovariance values are evaluated for a plurality of
different pairs of autocovariance lag values (-k, k),
wherein the weight is chosen in dependence on a
difference (2k) of the lag values of the respective
pairs of lag values, and in dependence on a variation
(q'_k) of the autocovariance values over lag,
wherein different weighted difference values are sum-
combined, to obtain a combination value; and
wherein the model parameters are obtained on the basis
of the combination value.
19. An apparatus (100) for obtaining a parameter (140)
describing a variation of a signal characteristic of a
signal on the basis of actual transform domain
parameters (120) describing the signal in a transform
domain, the apparatus comprising:
a parameter determinator (130) configured to determine
one or more model parameters of a transform-domain
variation model (130a; 130c) describing an evolution
of transform domain parameters in dependence on one or
more model parameters (140) representing a signal
characteristic, such that a model error, representing
a deviation between a modelled evolution of the
transform domain parameters and an evolution of the
actual transform domain parameters, is brought below a
predetermined threshold value or minimized;
wherein the apparatus (100) is configured to obtain a
parameter describing a temporal variation of an
envelope of the audio signal,
wherein the parameter determinator (130) is configured
to obtain a plurality of transform-domain parameters
(R(0,th)) describing a signal power of the audio signal
for a plurality of time intervals,
wherein the parameter determinator is configured to
obtain an envelope variation model parameter using a
representation of a parameterized transform-domain

wherein a plurality of transform-domain parameters
describing a signal power of the audio signal for a
plurality of time intervals is obtained;
wherein an envelope variation model parameter is
obtained using a representation of a parameterized
transform-domain variation model comprising an
envelope variation model parameter and representing a
temporal increase in power or a temporal decrease in
power of the transform-domain representation of the
audio signal assuming a smooth envelope variation of
the audio signal,
wherein the envelope variation model parameter is
determined such that the parameterized transform-
domain variation model is adapted to the transform-
domain parameters,
wherein a plurality of autocorrelation parameters or
autocovariance parameters are obtained for a given
autocorrelation lag or autocovariance lag, and
wherein a plurality of polynomial parameters of a
polynomial envelope variation model are determined.
21. An apparatus (100) for obtaining a parameter (140)
describing a variation of a signal characteristic of a
signal on the basis of actual transform domain
parameters (120) describing the signal in a transform
domain, the apparatus comprising:
a parameter determinator (130) configured to determine
one or more model parameters of a transform-domain
variation model (130a; 130c) describing an evolution
of transform domain parameters in dependence on one or
more model parameters (140) representing a signal
characteristic, such that a model error, representing
a deviation between a modelled evolution of the
transform domain parameters and an evolution of the
actual transform domain parameters, is brought below a
predetermined threshold value or minimized;
wherein the apparatus comprises a formant-structure-
reducer configured to preprocess an input audio
signal, to obtain a formant-structure-reduced audio
signal;
wherein the apparatus is configured to obtain the
actual transform-domain parameter on the basis of the
formant-structure-reduced audio signal;
wherein the formant-structure-reducer is configured to
estimate parameters of a linear-predictive model of
the input audio signal on the basis of a high-pass
filtered version of the input audio signal, and
to filter a broad band version of the input audio
signal on the basis of the estimated parameters of the
linear-predictive model,
to obtain the formant-structure-reduced audio signal
such that the formant-structure-reduced audio signal
comprises a low-pass characteristic.
22. A method for obtaining a parameter describing a
variation of a signal characteristic for a signal on
the basis of actual transform-domain parameters
describing the signal in a transformed domain, the
method comprising:
determining one or more model parameters of a
transform-domain variation model describing an
evolution of transform-domain parameters in dependence
on one or more model parameters representing a signal
characteristic, such that a model error, representing
a deviation between a modeled temporal evolution of
the transform-domain parameters and an evolution of
the actual transform-domain parameters, is brought
below a predetermined threshold value or minimized;
wherein an input audio signal is preprocessed, to
obtain a formant-structure-reduced audio signal;
wherein the actual transform-domain parameter is
obtained on the basis of the formant-structure-reduced
audio signal;
wherein parameters of a linear-predictive model of the
input audio signal are estimated on the basis of a
high-pass filtered version of the input audio signal;
wherein a broad band version of the input audio signal
is filtered on the basis of the estimated parameters
of the linear-predictive model,
to obtain the formant-structure-reduced audio signal
such that the formant-structure-reduced audio signal
comprises a low-pass characteristic.
23. A computer program for performing the method according
to claim 16, claim 18, claim 20 or claim 22, when the
computer program runs in a computer.
24. A time-warped audio encoder for time-warped encoding
an input audio signal, the time-warped audio encoder
comprising:
an apparatus (100) for obtaining a parameter
describing a temporal variation of a signal
characteristic of an audio signal, according to one of
claims 1 to 15 or claim 17 or claim 19 or claim 21,
wherein the apparatus for obtaining a parameter is
configured to obtain a pitch variation parameter
describing a temporal pitch variation of the input
audio signal; and
a time-warped-signal processor configured to perform a
time-warped signal sampling of the input audio signal
using the pitch variation parameter for an adjustment
of the time-warp.
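
As an illustration of the estimation recited in claim 16, whose result the encoder of claim 24 uses to adjust the time warp, the following is a minimal sketch and not the claimed implementation. It assumes that the two sets of transform-domain parameters are autocorrelation values R(k, h) and R(k, h + 1) over lags k, and that a small relative frequency change c compresses or expands the autocorrelation along the lag axis, so that R(k, h + 1) - R(k, h) is approximately c * k * dR/dk. The function name, the central-difference estimate of the local variation and the least-squares combination are assumptions made for this example.

```python
import math


def estimate_frequency_variation(r_first, r_second):
    """Least-squares estimate of a relative frequency change c.

    r_first, r_second: autocorrelation values for lags 0..K of two
    consecutive time intervals (hypothetical inputs for this sketch).
    Assumed model: r_second[k] - r_first[k] ~= c * k * dR/dk.
    A positive c means the frequency rises from the first interval
    to the second.
    """
    num = 0.0
    den = 0.0
    n = min(len(r_first), len(r_second))
    for k in range(1, n - 1):
        # Temporal variation at lag k.
        temporal = r_second[k] - r_first[k]
        # Local variation over lag: central difference, averaged over
        # both intervals.
        local = 0.25 * ((r_first[k + 1] - r_first[k - 1])
                        + (r_second[k + 1] - r_second[k - 1]))
        # Combine both: weight the temporal difference by k * dR/dk.
        weight = k * local
        num += weight * temporal
        den += weight * weight
    # Minimizes sum_k (temporal_k - c * weight_k)^2 over c.
    return num / den if den > 0.0 else 0.0


if __name__ == "__main__":
    # Idealized autocorrelation of a sinusoid whose frequency rises by 1 %.
    w = 2.0 * math.pi / 40.0
    r_h = [math.cos(w * k) for k in range(61)]
    r_h1 = [math.cos(w * 1.01 * k) for k in range(61)]
    print(estimate_frequency_variation(r_h, r_h1))
```

Because every lag contributes to both sums, the estimate does not depend on picking a single pitch peak, which is in the spirit of using all available transform domain values as in claim 15.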

An apparatus for obtaining a parameter describing a
variation of a signal characteristic of a signal on the
basis of actual transform-domain parameters describing the
audio signal in a transform domain comprises a parameter
determinator. The parameter determinator is configured to
determine one or more model parameters of a transform-domain
variation model describing an evolution of the
transform-domain parameters in dependence on one or more
model parameters representing a signal characteristic, such
that a model error, representing a deviation between a
modeled temporal evolution of the transform-domain
parameters and an evolution of the actual transform-domain
parameters, is brought below a predetermined threshold
value or minimized.
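
For the envelope case (claims 8 and 9), the model-error minimization summarized above can be sketched as an ordinary least-squares fit. This is a hedged example under stated assumptions, not the patented method: the signal power per time interval is taken as the zero-lag autocorrelation of non-overlapping frames, the envelope variation model is a polynomial in the interval index h, and the helper names frame_powers and fit_polynomial are hypothetical.

```python
def frame_powers(signal, frame_len):
    """Mean power (zero-lag autocorrelation) of consecutive frames."""
    return [sum(x * x for x in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def fit_polynomial(values, degree):
    """Least-squares polynomial coefficients, lowest order first.

    Fits values[h] ~= sum_i coeffs[i] * h**i for h = 0, 1, 2, ...
    Requires len(values) > degree so the normal equations are solvable.
    """
    n = degree + 1
    hs = [float(h) for h in range(len(values))]
    # Normal equations A c = b of the squared model error.
    a = [[sum(h ** (i + j) for h in hs) for j in range(n)] for i in range(n)]
    b = [sum((h ** i) * v for h, v in zip(hs, values)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for j in range(col, n):
                a[row][j] -= f * a[col][j]
            b[row] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / a[i][i]
    return coeffs


if __name__ == "__main__":
    # Power values that follow a quadratic trend over six intervals.
    powers = [2.0 + 0.5 * h + 0.1 * h * h for h in range(6)]
    print(fit_polynomial(powers, 2))
```

The fitted coefficients play the role of the polynomial parameters of the envelope variation model. A positive first-order coefficient corresponds to a temporal increase in power and a negative one to a decrease.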

Documents

Orders

Section	Controller	Decision Date

Application Documents

#	Name	Date
1	3098-KOLNP-2011-RELEVANT DOCUMENTS [08-09-2023(online)].pdf	2023-09-08
2	3098-KOLNP-2011-RELEVANT DOCUMENTS [07-09-2022(online)].pdf	2022-09-07
3	3098-KOLNP-2011-RELEVANT DOCUMENTS [25-09-2021(online)].pdf	2021-09-25
4	3098-KOLNP-2011-RELEVANT DOCUMENTS [02-03-2020(online)].pdf	2020-03-02
5	3098-KOLNP-2011-IntimationOfGrant03-01-2019.pdf	2019-01-03
6	3098-KOLNP-2011-PatentCertificate03-01-2019.pdf	2019-01-03
7	3098-KOLNP-2011-Written submissions and relevant documents (MANDATORY) [24-12-2018(online)].pdf	2018-12-24
8	3098-KOLNP-2011-HearingNoticeLetter.pdf	2018-11-12
9	3098-KOLNP-2011-Information under section 8(2) (MANDATORY) [18-08-2018(online)].pdf	2018-08-18
10	3098-KOLNP-2011-Information under section 8(2) (MANDATORY) [20-02-2018(online)].pdf	2018-02-20
11	3098-KOLNP-2011-ABSTRACT [19-01-2018(online)].pdf	2018-01-19
12	3098-KOLNP-2011-CLAIMS [19-01-2018(online)].pdf	2018-01-19
13	3098-KOLNP-2011-CORRESPONDENCE [19-01-2018(online)].pdf	2018-01-19
14	3098-KOLNP-2011-DRAWING [19-01-2018(online)].pdf	2018-01-19
15	3098-KOLNP-2011-FER_SER_REPLY [19-01-2018(online)].pdf	2018-01-19
16	3098-KOLNP-2011-PETITION UNDER RULE 137 [19-01-2018(online)].pdf	2018-01-19
17	3098-KOLNP-2011-Information under section 8(2) (MANDATORY) [29-08-2017(online)].pdf	2017-08-29
18	3098-KOLNP-2011-FER.pdf	2017-07-27
19	Other Patent Document [21-02-2017(online)].pdf	2017-02-21
20	Other Patent Document [10-09-2016(online)].pdf	2016-09-10
21	3098-KOLNP-2011-(09-01-2012)-CORRESPONDENCE.pdf	2012-01-09
22	3098-KOLNP-2011-(09-01-2012)-FORM-3.pdf	2012-01-09
23	3098-KOLNP-2011-(14-12-2011)-ASSIGNMENT.pdf	2011-12-14
24	3098-KOLNP-2011-(14-12-2011)-CORRESPONDENCE.pdf	2011-12-14
25	3098-kolnp-2011-abstract.pdf	2011-10-07
26	3098-kolnp-2011-claims.pdf	2011-10-07
27	3098-kolnp-2011-correspondence.pdf	2011-10-07
28	3098-kolnp-2011-description (complete).pdf	2011-10-07
29	3098-kolnp-2011-form-1.pdf	2011-10-07
30	3098-KOLNP-2011-FORM-18.pdf	2011-10-07
31	3098-kolnp-2011-form-2.pdf	2011-10-07
32	3098-kolnp-2011-form-3.pdf	2011-10-07
33	3098-kolnp-2011-form-5.pdf	2011-10-07
34	3098-kolnp-2011-international preliminary examination report.pdf	2011-10-07
35	3098-kolnp-2011-international publication.pdf	2011-10-07
36	3098-kolnp-2011-international search report.pdf	2011-10-07
37	3098-kolnp-2011-others.pdf	2011-10-07
38	3098-kolnp-2011-pct priority document notification.pdf	2011-10-07
39	3098-kolnp-2011-pct request form.pdf	2011-10-07
40	3098-kolnp-2011-specification.pdf	2011-10-07
41	abstract-3098-kolnp-2011.jpg	2011-10-07

Search Strategy

1	SearchStrategy_24-07-2017.pdf