Multi Channel Audio Coding Using Complex Prediction And Window Shape Information
Abstract:
An audio encoder and an audio decoder are based on a combination of two audio channels (201, 202) to obtain a first combination signal (204) as a mid signal and a residual signal (205) which can be derived using a predicted side signal derived from the mid signal. The first combination signal and the prediction residual signal are encoded (209) and written (212) into a data stream (213) together with the prediction information (206) derived by an optimizer (207) based on an optimization target (208). A decoder uses the prediction residual signal, the first combination signal and the prediction information to derive a decoded first channel signal and a decoded second channel signal. In an encoder example or in a decoder example, a real-to-imaginary transform can be applied for estimating the imaginary part of the spectrum of the first combination signal. For calculating the prediction signal used in the derivation of the prediction residual signal, the real-valued first combination signal is multiplied by a real portion of the complex prediction information and the estimated imaginary part of the first combination signal is multiplied by an imaginary portion of the complex prediction information.
{Fig. 2}
Apollo Building, 3E Herikerbergweg 1-35, 1101CN, Amsterdam Zuid-Oost, The Netherlands.
Inventors
1. PURNHAGEN, Heiko
Gjuteribacken 17, S-17265 Sundbyberg, Sweden.
2. CARLSSON, Pontus
Byggmaestarvaegen 3, S-16832 Bromma, Sweden.
3. VILLEMOES, Lars
Mandolinvägen 22, S-17556 Järfälla, Sweden
4. ROBILLARD, Julien
Innerer Kleinreuther Weg 25A, 90408 Nürnberg, Germany
5. NEUSINGER, Matthias
Bergstrasse 10, 91186 Rohr, Germany.
6. HELMRICH, Christian
Hauptstrasse 68, 91054 Erlangen, Germany.
7. HILPERT, Johannes
Herrnhüttestrasse 46, 90411 Nürnberg, Germany.
8. RETTELBACH, Nikolaus
Spessartstrasse 38, 90427 Nürnberg, Germany.
9. DISCH, Sascha
Wilhelmstrasse 70, 90766 Fürth, Germany.
10. EDLER, Bernd
Hemelingstrasse 10, 30419 Hannover, Germany.
Specification
Audio Encoder, Audio Decoder and Related Methods for Processing Multi-Channel Audio Signals Using Complex Prediction
The present invention is related to audio processing and, particularly, to multi-channel audio processing of a multi-channel signal having two or more channel signals.
It is known in the field of multi-channel or stereo processing to apply the so-called mid/side stereo coding. In this concept, a combination of the left or first audio channel signal and the right or second audio channel signal is formed to obtain a mid or mono signal M. Additionally, a difference between the left or first channel signal and the right or second channel signal is formed to obtain the side signal S. This mid/side coding method results in a significant coding gain, when the left signal and the right signal are quite similar to each other, since the side signal will become quite small. Typically, a coding gain of a quantizer/entropy encoder stage will become higher, when the range of values to be quantized/entropy-encoded becomes smaller. Hence, for a PCM or a Huffman-based or arithmetic entropy-encoder, the coding gain increases, when the side signal becomes smaller. There exist, however, certain situations in which the mid/side coding will not result in a coding gain. The situation can occur when the signals in both channels are phase-shifted to each other, for example, by 90°. Then, the mid signal and the side signal can be in a quite similar range and, therefore, coding of the mid signal and the side signal using the entropy-encoder will not result in a coding gain and can even result in an increased bit rate. Therefore, a frequency-selective mid/side coding can be applied in order to deactivate the mid/side coding in bands, where the side signal does not become smaller to a certain degree with respect to the original left signal, for example.
Although the side signal will become zero, when the left and right signals are identical, resulting in a maximum coding gain due to the elimination of the side signal, the situation once again becomes different when the mid signal and the side signal are identical with respect to the shape of the waveform, but the only difference between both signals is their overall amplitudes. In this case, when it is additionally assumed that the side signal has no phase-shift to the mid signal, the side signal significantly increases, although, on the other hand, the mid signal does not decrease so much with respect to its value range. When such a situation occurs in a certain frequency band, then one would again deactivate mid/side coding due to the lack of coding gain. Mid/side coding can be applied frequency-selectively or can alternatively be applied in the time domain.
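The two cases discussed above can be checked numerically. The following Python sketch is illustrative only: the 0.5 weighting in `mid_side`, the test tone and all helper names are assumptions and are not taken from the specification. It compares the side-signal energy for identical channels with the case in which the channels are phase-shifted by 90°.

```python
import math

def mid_side(left, right):
    """Mid/side combination; the 0.5 weighting is an assumed convention."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side

def energy(x):
    """Sum of squared sample values."""
    return sum(v * v for v in x)

N = 64
left = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]

# Case 1: both channels identical, so the side signal vanishes.
mid_same, side_same = mid_side(left, list(left))

# Case 2: right channel is the same tone shifted by 90 degrees.
right_shifted = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]
mid_shift, side_shift = mid_side(left, right_shifted)

print("identical channels: side energy =", energy(side_same))
print("90 degree shift: mid energy =", energy(mid_shift),
      "side energy =", energy(side_shift))
```

For identical channels the side signal is zero, whereas in the 90° case the mid signal and the side signal carry the same energy, so plain mid/side coding does not reduce the value range to be quantized.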
There exist alternative multi-channel coding techniques which do not rely on a kind of a waveform approach as mid/side coding, but which rely on the parametric processing based on certain binaural cues. Such techniques are known under the term "binaural cue coding", "parametric stereo coding" or "MPEG Surround coding". Here, certain cues are calculated for a plurality of frequency bands. These cues include inter-channel level differences, inter-channel coherence measures, inter-channel time differences and/or inter-channel phase differences. These approaches start from the assumption that a multi-channel impression felt by the listener does not necessarily rely on the detailed waveforms of the two channels, but relies on the accurate frequency-selectively provided cues or inter-channel information. This means that, in a rendering machine, care has to be taken to render multi-channel signals which accurately reflect the cues, but the waveforms are not of decisive importance.
This approach can be complex particularly in the case, when the decoder has to apply a decorrelation processing in order to artificially create stereo signals which are decorrelated from each other, although all these channels are derived from one and the same downmix channel. Decorrelators for this purpose are, depending on their implementation, complex and may introduce artifacts particularly in the case of transient signal portions. Additionally, in contrast to waveform coding, the parametric coding approach is a lossy coding approach which inevitably results in a loss of information not only introduced by the typical quantization but also introduced by looking on the binaural cues rather than the particular waveforms. This approach results in very low bit rates but may include quality compromises.
There exist recent developments for unified speech and audio coding (USAC) illustrated in Fig. 7a. A core decoder 700 performs a decoding operation of the encoded stereo signal at input 701, which can be mid/side encoded. The core decoder outputs a mid signal at line 702 and a side or residual signal at line 703. Both signals are transformed into a QMF domain by QMF filter banks 704 and 705. Then, an MPEG Surround decoder 706 is applied to generate a left channel signal 707 and a right channel signal 708. These low-band signals are subsequently introduced into a spectral band replication (SBR) decoder 709, which produces broad-band left and right signals on the lines 710 and 711, which are then transformed into a time domain by the QMF synthesis filter banks 712, 713 so that broad-band left and right signals L, R are obtained.
Fig. 7b illustrates the situation when the MPEG Surround decoder 706 would perform a mid/side decoding. Alternatively, the MPEG Surround decoder block 706 could perform a binaural cue based parametric decoding for generating stereo signals from a single mono core decoder signal. Naturally, the MPEG Surround decoder 706 could also generate a plurality of low band output signals to be input into the SBR decoder block 709 using parametric information such as inter-channel level differences, inter-channel coherence measures or other such inter-channel information parameters.
When the MPEG Surround decoder block 706 performs the mid/side decoding illustrated in Fig. 7b, a real-gain factor g can be applied and DMX/RES and L/R are downmix/residual and left/right signals, respectively, represented in the complex hybrid QMF domain.
Using a combination of a block 706 and a block 709 causes only a small increase in computational complexity compared to a stereo decoder used as a basis, because the complex QMF representation of the signal is already available as part of the SBR decoder. In a non-SBR configuration, however, QMF-based stereo coding, as proposed in the context of USAC, would result in a significant increase in computational complexity because of the necessary QMF banks, which would require, in this example, 64-band analysis banks and 64-band synthesis banks. These filter banks would have to be added only for the purpose of stereo coding.
In the MPEG USAC system under development, however, there also exist coding modes at high bit rates where SBR typically is not used.
It is an objective of the present invention to provide an improved audio processing concept which, on the one hand, yields high coding gain and, on the other hand, results in a good audio quality and/or reduced computational complexity.
This objective is achieved by an audio decoder in accordance with claim 1, an audio encoder in accordance with claim 15, a method of audio decoding in accordance with claim 21, a method of audio encoding in accordance with claim 22, a computer program in accordance with claim 23, or an encoded multi-channel audio signal in accordance with claim 24.
The present invention relies on the finding that the coding gain of the high-quality waveform-coding approach can be significantly enhanced by a prediction of a second combination signal using a first combination signal, where both combination signals are derived from the original channel signals using a combination rule such as the mid/side combination rule. It has been found that this prediction information, which is calculated by a predictor in an audio encoder so that an optimization target is fulfilled, incurs only a small overhead but results in a significant decrease of the bit rate required for the side signal without losing any audio quality, since the inventive prediction is nevertheless a waveform-based coding and not a parameter-based stereo or multi-channel coding approach. In order to reduce computational complexity, it is preferred to perform frequency-domain encoding, where the prediction information is derived from frequency domain input data in a band-selective way. The conversion algorithm for converting the time domain representation into a spectral representation is preferably a critically sampled process such as a modified discrete cosine transform (MDCT) or a modified discrete sine transform (MDST), which is different from a complex transform in that only real values or only imaginary values are calculated, while, in a complex transform, real and imaginary values of a spectrum are calculated, resulting in 2-times oversampling.
Preferably, a transform based on aliasing introduction and cancellation is used. The MDCT, in particular, is such a transform and allows a cross-fading between subsequent blocks without any overhead due to the well-known time domain aliasing cancellation (TDAC) property which is obtained by overlap-add-processing on the decoder side.
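The time domain aliasing cancellation property can be illustrated with a small numerical experiment. The following Python sketch is an assumption-laden illustration, not the transform of any particular codec: the sine window, the block length, the 2/N normalization of the inverse transform and all function names are chosen here for demonstration, and the transforms are evaluated directly rather than with a fast algorithm. Each block of 2N windowed samples yields only N coefficients (critical sampling), and overlap-add of the windowed inverse transforms recovers the interior samples.

```python
import math

def sine_window(length):
    """Sine window; it satisfies w[n]^2 + w[n + N]^2 = 1 for length 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Direct MDCT: 2N input samples -> N real coefficients."""
    n_out = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / n_out
                                    * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n in range(2 * n_out))
            for k in range(n_out)]

def imdct(coeffs):
    """Direct inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_out = len(coeffs)
    return [(2.0 / n_out) * sum(coeffs[k] * math.cos(math.pi / n_out
                                * (n + 0.5 + n_out / 2) * (k + 0.5))
                                for k in range(n_out))
            for n in range(2 * n_out)]

def analysis_synthesis(signal, hop):
    """Windowed MDCT analysis followed by windowed overlap-add synthesis."""
    window = sine_window(2 * hop)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * hop + 1, hop):
        block = [signal[start + n] * window[n] for n in range(2 * hop)]
        recon = imdct(mdct(block))
        for n in range(2 * hop):
            out[start + n] += recon[n] * window[n]
    return out

hop = 4
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
out = analysis_synthesis(signal, hop)

# Only samples covered by two overlapping blocks are fully reconstructed.
max_err = max(abs(out[n] - signal[n]) for n in range(hop, len(signal) - hop))
print("maximum interior reconstruction error:", max_err)
```

The first and last `hop` samples are covered by a single block only and therefore still contain aliasing; all interior samples are recovered up to rounding error.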
Preferably, the prediction information calculated in the encoder, transmitted to the decoder and used in the decoder comprises an imaginary part which can advantageously reflect phase differences between the two audio channels in arbitrarily selected amounts between 0° and 360°. Computational complexity is significantly reduced when only a real-valued transform or, in general, a transform is applied which either provides a real spectrum only or provides an imaginary spectrum only. In order to make use of this imaginary prediction information which indicates a phase shift between a certain band of the left signal and a corresponding band of the right signal, a real-to-imaginary converter or, depending on the implementation of the transform, an imaginary-to-real converter is provided in the decoder in order to calculate a prediction signal from the first combination signal, which is phase-rotated with respect to the original combination signal. This phase-rotated prediction signal can then be combined with the prediction residual signal transmitted in the bit stream to regenerate a side signal which, finally, can be combined with the mid signal to obtain the decoded left channel in a certain band and the decoded right channel in this band.
To increase audio quality, the same real-to-imaginary or imaginary-to-real converter which is applied on the decoder side is implemented on the encoder side as well, when the prediction residual signal is calculated in the encoder.
The present invention is advantageous in that it provides an improved audio quality and a reduced bit rate compared to systems having the same bit rate or having the same audio quality.
Additionally, advantages with respect to computational efficiency of unified stereo coding useful in the MPEG USAC system at high bit rates are obtained, where SBR is typically not used. Instead of processing the signal in the complex hybrid QMF domain, these approaches implement residual-based predictive stereo coding in the native MDCT domain of the underlying stereo transform coder.
In accordance with an aspect of the present invention, the present invention comprises an apparatus or method for generating a stereo signal by complex prediction in the MDCT domain, wherein the complex prediction is done in the MDCT domain using a real-to-complex transform, where this stereo signal can either be an encoded stereo signal on the encoder-side or can alternatively be a decoded/transmitted stereo signal, when the apparatus or method for generating the stereo signal is applied on the decoder-side.
Preferred embodiments of the present invention are subsequently discussed with respect to the accompanying drawings, in which:
Fig. 1 is a diagram of a preferred embodiment of an audio decoder;
Fig. 2 is a block diagram of a preferred embodiment of an audio encoder;
Fig. 3a illustrates an implementation of the encoder calculator of Fig. 2;
Fig. 3b illustrates an alternative implementation of the encoder calculator of Fig. 2;
Fig. 3c illustrates a mid/side combination rule to be applied on the encoder side;
Fig. 4a illustrates an implementation of the decoder calculator of Fig. 1;
Fig. 4b illustrates an alternative implementation of the decoder calculator in form of a matrix calculator;
Fig. 4c illustrates a mid/side inverse combination rule corresponding to the combination rule illustrated in Fig. 3c;
Fig. 5a illustrates an embodiment of an audio encoder operating in the frequency domain which is preferably a real-valued frequency domain;
Fig. 5b illustrates an implementation of an audio decoder operating in the frequency domain;
Fig. 6a illustrates an alternative implementation of an audio encoder operating in the MDCT domain and using a real-to-imaginary transform;
Fig. 6b illustrates an audio decoder operating in the MDCT domain and using a real-to-imaginary transform;
Fig. 7a illustrates an audio postprocessor using a stereo decoder and a subsequently connected SBR decoder;
Fig. 7b illustrates a mid/side upmix matrix;
Fig. 8a illustrates a detailed view on the MDCT block in Fig. 6a;
Fig. 8b illustrates a detailed view on the MDCT⁻¹ block of Fig. 6b;
Fig. 9a illustrates an implementation of an optimizer operating on reduced resolution with respect to the MDCT output;
Fig. 9b illustrates a representation of an MDCT spectrum and the corresponding lower resolution bands in which the prediction information is calculated;
Fig. 10a illustrates an implementation of the real-to-imaginary transformer in Fig. 6a or Fig. 6b; and
Fig. 10b illustrates a possible implementation of the imaginary spectrum calculator of Fig. 10a.
Fig. 1 illustrates an audio decoder for decoding an encoded multi-channel audio signal obtained at an input line 100. The encoded multi-channel audio signal comprises an encoded first combination signal generated using a combination rule for combining a first channel signal and a second channel signal representing the multi-channel audio signal, an encoded prediction residual signal and prediction information. The encoded multi-channel signal can be a data stream such as a bitstream which has the three components in a multiplexed form. Additional side information can be included in the encoded multi-channel signal on line 100. The signal is input into an input interface 102. The input interface 102 can be implemented as a data stream demultiplexer which outputs the encoded first combination signal on line 104, the encoded residual signal on line 106 and the prediction information on line 108. Preferably, the prediction information is a factor having a real part not equal to zero and/or an imaginary part different from zero. The encoded combination signal and the encoded residual signal are input into a signal decoder 110 for decoding the first combination signal to obtain a decoded first combination signal on line 112. Additionally, the signal decoder 110 is configured for decoding the encoded residual signal to obtain a decoded residual signal on line 114. Depending on the encoding processing on an audio encoder side, the signal decoder may comprise an entropy-decoder such as a Huffman decoder, an arithmetic decoder or any other entropy-decoder and a subsequently connected dequantization stage for performing a dequantization operation matching with a quantizer operation in an associated audio encoder. The signals on lines 112 and 114 are input into a decoder calculator 116, which outputs the first channel signal on line 117 and a second channel signal on line 118, where these two signals are stereo signals or two channels of a multi-channel audio signal.
When, for example, the multi-channel audio signal comprises five channels, then the two signals are two channels from the multi-channel signal. In order to fully decode such a multi-channel signal having five channels, two decoders as illustrated in Fig. 1 can be applied, where the first decoder processes the left channel and the right channel, the second decoder processes the left surround channel and the right surround channel, and a third mono decoder would be used for performing a mono-decoding of the center channel. Other groupings, however, or combinations of waveform coders and parametric coders can be applied as well. An alternative way to generalize the prediction scheme to more than two channels would be to treat three (or more) signals at the same time, i.e., to predict a 3rd combination signal from a 1st and a 2nd signal using two prediction coefficients, very similarly to the "two-to-three" module in MPEG Surround.
The decoder calculator 116 is configured for calculating a decoded multi-channel signal having the decoded first channel signal 117 and the decoded second channel signal 118 using the decoded residual signal 114, the prediction information 108 and the decoded first combination signal 112. Particularly, the decoder calculator 116 is configured to operate in such a way that the decoded first channel signal and the decoded second channel signal are at least an approximation of a first channel signal and a second channel signal of the multi-channel signal input into a corresponding encoder, which are combined by the combination rule when generating the first combination signal and the prediction residual signal. Specifically, the prediction information on line 108 comprises a real-valued part different from zero and/or an imaginary part different from zero.
The decoder calculator 116 can be implemented in different manners. A first implementation is illustrated in Fig. 4a. This implementation comprises a predictor 1160, a combination signal calculator 1161 and a combiner 1162. The predictor receives the decoded first combination signal 112 and the prediction information 108 and outputs a prediction signal 1163. Specifically, the predictor 1160 is configured for applying the prediction information 108 to the decoded first combination signal 112 or a signal derived from the decoded first combination signal. The derivation rule for deriving the signal to which the prediction information 108 is applied may be a real-to-imaginary transform, or equally, an imaginary-to-real transform or a weighting operation, or depending on the implementation, a phase shift operation or a combined weighting/phase shift operation. The prediction signal 1163 is input together with the decoded residual signal into the combination signal calculator 1161 in order to calculate the decoded second combination signal 1165. The signals 112 and 1165 are both input into the combiner 1162, which combines the decoded first combination signal and the second combination signal to obtain the decoded multi-channel audio signal having the decoded first channel signal and the decoded second channel signal on output lines 1166 and 1167, respectively. Alternatively, the decoder calculator is implemented as a matrix calculator 1168 which receives, as input, the decoded first combination signal or signal M, the decoded residual signal or signal D and the prediction information α 108. The matrix calculator 1168 applies a transform matrix illustrated as 1169 to the signals M, D to obtain the output signals L, R, where L is the decoded first channel signal and R is the decoded second channel signal. The notation in Fig. 4b resembles a stereo notation with a left channel L and a right channel R. This notation has been applied in order to provide an easier understanding, but it is clear to those skilled in the art that the signals L, R can be any combination of two channel signals in a multi-channel signal having more than two channel signals. The matrix operation 1169 unifies the operations in blocks 1160, 1161 and 1162 of Fig. 4a into a kind of "single-shot" matrix calculation, and the inputs into the Fig. 4a circuit and the outputs from the Fig. 4a circuit are identical to the inputs into the matrix calculator 1168 or the outputs from the matrix calculator 1168.
Fig. 4c illustrates an example for an inverse combination rule applied by the combiner 1162 in Fig. 4a. Particularly, the combination rule is similar to the decoder-side combination rule in well-known mid/side coding, where L = M + S, and R = M - S. It is to be understood that the signal S used by the inverse combination rule in Fig. 4c is the signal calculated by the combination signal calculator, i.e. the combination of the prediction signal on line 1163 and the decoded residual signal on line 114. It is to be understood that in this specification, the signals on lines are sometimes named by the reference numerals for the lines or are sometimes indicated by the reference numerals themselves, which have been attributed to the lines. Therefore, the notation is such that a line having a certain signal is indicating the signal itself. A line can be a physical line in a hardwired implementation. In a computerized implementation, however, a physical line does not exist, but the signal represented by the line is transmitted from one calculation module to the other calculation module.
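The decoder-side calculation described for Fig. 4a and its "single-shot" matrix counterpart can be sketched as follows. This Python sketch makes several assumptions: the prediction information is a real-valued factor, the combination signal calculator adds the prediction signal to the residual, and the matrix coefficients are derived here from those three steps, since the matrix 1169 itself is not reproduced in the text. All function names are hypothetical.

```python
def decode_three_steps(mid, residual, alpha):
    """Predictor 1160, combination signal calculator 1161, combiner 1162."""
    prediction = [alpha * m for m in mid]                   # prediction signal 1163
    side = [d + p for d, p in zip(residual, prediction)]    # second combination signal 1165
    left = [m + s for m, s in zip(mid, side)]               # L = M + S
    right = [m - s for m, s in zip(mid, side)]              # R = M - S
    return left, right

def decode_matrix(mid, residual, alpha):
    """Same result in one step (assumed form, derived from the steps above):
       L = (1 + alpha) * M + D,  R = (1 - alpha) * M - D."""
    left = [(1.0 + alpha) * m + d for m, d in zip(mid, residual)]
    right = [(1.0 - alpha) * m - d for m, d in zip(mid, residual)]
    return left, right

mid = [1.0, -2.0, 0.5]        # decoded first combination signal M
residual = [0.1, 0.0, -0.2]   # decoded residual signal D
alpha = 0.5                   # hypothetical real-valued prediction information

left_a, right_a = decode_three_steps(mid, residual, alpha)
left_b, right_b = decode_matrix(mid, residual, alpha)
print(left_a, right_a)
print(left_b, right_b)
```

Both variants yield the same output signals for the same inputs M, D and prediction factor, which is the sense in which the matrix calculation unifies the three blocks.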
Fig. 2 illustrates an audio encoder for encoding a multi-channel audio signal 200 having two or more channel signals, where a first channel signal is illustrated at 201 and a second channel is illustrated at 202. Both signals are input into an encoder calculator 203 for calculating a first combination signal 204 and a prediction residual signal 205 using the first channel signal 201 and the second channel signal 202 and the prediction information 206, so that the prediction residual signal 205, when combined with a prediction signal derived from the first combination signal 204 and the prediction information 206 results in a second combination signal, where the first combination signal and the second combination signal are derivable from the first channel signal 201 and the second channel signal 202 using a combination rule.
The prediction information is generated by an optimizer 207 for calculating the prediction information 206 so that the prediction residual signal fulfills an optimization target 208. The first combination signal 204 and the residual signal 205 are input into a signal encoder 209 for encoding the first combination signal 204 to obtain an encoded first combination signal 210 and for encoding the residual signal 205 to obtain an encoded residual signal 211. Both encoded signals 210, 211 are input into an output interface 212 for combining the encoded first combination signal 210 with the encoded prediction residual signal 211 and the prediction information 206 to obtain an encoded multi-channel signal 213, which is similar to the encoded multi-channel signal 100 input into the input interface 102 of the audio decoder illustrated in Fig. 1.
Depending on the implementation, the optimizer 207 receives either the first channel signal 201 and the second channel signal 202, or as illustrated by lines 214 and 215, the first combination signal 214 and the second combination signal 215 derived from a combiner 2031 of Fig. 3a, which will be discussed later.
A preferred optimization target is illustrated in Fig. 2, in which the coding gain is maximized, i.e. the bit rate is reduced as much as possible. In this optimization target, the residual signal D is minimized with respect to α. This means, in other words, that the prediction information α is chosen so that ||S - αM||² is minimized. This results in a solution for α illustrated in Fig. 2. The signals S, M are given in a block-wise manner and are preferably spectral domain signals, where the notation ||...|| means the 2-norm of the argument, and where <...> illustrates the dot product as usual. When the first channel signal 201 and the second channel signal 202 are input into the optimizer 207, then the optimizer would have to apply the combination rule, where an exemplary combination rule is illustrated in Fig. 3c. When, however, the first combination signal 214 and the second combination signal 215 are input into the optimizer 207, then the optimizer 207 does not need to implement the combination rule by itself.
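The minimization of the residual energy can be sketched for the simplest case. The closed-form solution shown in Fig. 2 is not reproduced in the text, so the following Python sketch assumes real-valued spectra and a real-valued prediction factor, for which the standard least-squares solution is the dot product <S, M> divided by <M, M>. The function names, the guard for a silent band and the sample values are assumptions made for illustration.

```python
def dot(a, b):
    """Dot product of two equally long real-valued sequences."""
    return sum(x * y for x, y in zip(a, b))

def optimal_alpha(side, mid):
    """Real-valued factor minimizing ||S - alpha * M||^2 for one band."""
    mid_energy = dot(mid, mid)
    if mid_energy == 0.0:
        return 0.0  # assumed convention: no prediction for a silent mid band
    return dot(side, mid) / mid_energy

def residual(side, mid, alpha):
    """Prediction residual D = S - alpha * M."""
    return [s - alpha * m for s, m in zip(side, mid)]

# Hypothetical band: the side signal is 0.3 times the mid signal plus a
# component that is orthogonal to the mid signal and cannot be predicted.
mid = [1.0, 2.0, -1.0, 0.5]
unpredictable = [2.0, -1.0, 0.0, 0.0]   # dot(unpredictable, mid) == 0
side = [0.3 * m + 0.1 * u for m, u in zip(mid, unpredictable)]

alpha = optimal_alpha(side, mid)
d = residual(side, mid, alpha)
print("alpha =", alpha)
print("side energy =", dot(side, side), "residual energy =", dot(d, d))
```

At the optimum the residual is orthogonal to the mid signal, so any other choice of the factor can only increase the residual energy in this band.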
Other optimization targets may relate to the perceptual quality. An optimization target can be that a maximum perceptual quality is obtained. Then, the optimizer would require additional information from a perceptual model. Other implementations of the optimization target may relate to obtaining a minimum or a fixed bit rate. Then, the optimizer 207 would be implemented to perform a quantization/entropy-encoding operation in order to determine the required bit rate for certain α values so that α can be set to fulfill the requirements such as a minimum bit rate, or alternatively, a fixed bit rate. Other implementations of the optimization target can relate to a minimum usage of encoder or decoder resources. In case of an implementation of such an optimization target, information on the required resources for a certain optimization would be available in the optimizer 207. Additionally, a combination of these optimization targets or other optimization targets can be applied for controlling the optimizer 207 which calculates the prediction information 206.
The encoder calculator 203 in Fig. 2 can be implemented in different ways, where an exemplary first implementation is illustrated in Fig. 3a, in which an explicit combination rule is performed in the combiner 2031. An alternative exemplary implementation is illustrated in Fig. 3b, where a matrix calculator 2039 is used. The combiner 2031 in Fig. 3a may be implemented to perform the combination rule illustrated in Fig. 3c, which is exemplarily the well-known mid/side encoding rule, where a weighting factor of 0.5 is applied to all branches. However, other weighting factors or no weighting factors at all can be implemented depending on the implementation. Additionally, it is to be noted that other combination rules such as other linear combination rules or non-linear combination rules can be applied, as long as there exists a corresponding inverse combination rule which can be applied in the decoder combiner 1162 illustrated in Fig. 4a, which applies a combination rule that is inverse to the combination rule applied by the encoder. Due to the inventive prediction, any invertible prediction rule can be used, since the influence on the waveform is "balanced" by the prediction, i.e. any error is included in the transmitted residual signal, since the prediction operation performed by the optimizer 207 in combination with the encoder calculator 203 is a waveform-conserving process.
The combiner 2031 outputs the first combination signal 204 and a second combination signal 2032. The first combination signal is input into a predictor 2033, and the second combination signal 2032 is input into the residual calculator 2034. The predictor 2033 calculates a prediction signal 2035, which is combined with the second combination signal 2032 to finally obtain the residual signal 205. Particularly, the combiner 2031 is configured for combining the two channel signals 201 and 202 of the multi-channel audio signal in two different ways to obtain the first combination signal 204 and the second combination signal 2032, where the two different ways are illustrated in an exemplary embodiment in Fig. 3c. The predictor 2033 is configured for applying the prediction information to the first combination signal 204 or a signal derived from the first combination signal to obtain the prediction signal 2035. The signal derived from the combination signal can be derived by any non-linear or linear operation, where a real-to-imaginary transform/imaginary-to-real transform is preferred, which can be implemented using a linear filter such as an FIR filter performing weighted additions of certain values.
The residual calculator 2034 in Fig. 3a may perform a subtraction operation so that the prediction signal is subtracted from the second combination signal. However, other operations in the residual calculator are possible. Correspondingly, the combination signal calculator 1161 in Fig. 4a may perform an addition operation where the decoded residual signal 114 and the prediction signal 1163 are added together to obtain the second combination signal 1165.
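The encoder-side chain of combiner, predictor and residual calculator, together with the matching addition in the decoder, can be sketched as follows. This Python sketch is illustrative: it assumes the mid/side rule with a weighting factor of 0.5, a real-valued prediction factor and subtraction in the residual calculator; the function names and the sample values are hypothetical. The example uses two channels with the same waveform but different amplitudes.

```python
def combine(left, right):
    """Combiner 2031: mid/side rule with an assumed weighting factor of 0.5."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side

def encode_band(left, right, alpha):
    """Predictor 2033 followed by residual calculator 2034 (subtraction)."""
    mid, side = combine(left, right)
    prediction = [alpha * m for m in mid]
    residual = [s - p for s, p in zip(side, prediction)]
    return mid, residual

def reconstruct_side(mid, residual, alpha):
    """Combination signal calculator 1161: add the prediction back."""
    return [d + alpha * m for d, m in zip(residual, mid)]

# Same waveform in both channels, but the right channel at half amplitude.
x = [0.8, -0.3, 0.5, 0.1]
left = [1.0 * v for v in x]
right = [0.5 * v for v in x]

alpha = 1.0 / 3.0   # hypothetical factor; here side = mid / 3 exactly
mid, residual = encode_band(left, right, alpha)
_, side = combine(left, right)
side_hat = reconstruct_side(mid, residual, alpha)

print("side     =", side)
print("residual =", residual)
print("side_hat =", side_hat)
```

In this example the side signal is non-zero but the residual is zero up to rounding, and adding the prediction back in the decoder recovers the side signal, which reflects the waveform-conserving nature of the scheme.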
Fig. 5a illustrates a preferred implementation of an audio encoder. Compared to the audio encoder illustrated in Fig. 3a, the first channel signal 201 is a spectral representation of a time domain first channel signal 55a. Correspondingly, the second channel signal 202 is a spectral representation of a time domain channel signal 55b. The conversion from the time domain into the spectral representation is performed by a time/frequency converter 50 for the first channel signal and a time/frequency converter 51 for the second channel signal. Preferably, but not necessarily, the spectral converters 50, 51 are implemented as real-valued converters. The conversion algorithm can be a discrete cosine transform, an FFT transform, where only the real part is used, an MDCT or any other transform providing real-valued spectral values. Alternatively, both transforms can be implemented as an imaginary transform, such as a DST, an MDST or an FFT where only the imaginary part is used and the real part is discarded. Any other transform only providing imaginary values can be used as well. One purpose of using a pure real-valued transform or a pure imaginary transform is computational complexity, since, for each spectral value, only a single value such as magnitude or the real part has to be processed, or, alternatively, the phase or the imaginary part. In contrast, for a fully complex transform such as an FFT, two values, i.e., the real part and the imaginary part, would have to be processed for each spectral line, which is an increase of computational complexity by a factor of at least 2. Another reason for using a real-valued transform here is that such a transform is usually critically sampled, and hence provides a suitable (and commonly used) domain for signal quantization and entropy coding (the standard "perceptual audio coding" paradigm implemented in "MP3", AAC, or similar audio coding systems).
Fig. 5a additionally illustrates the residual calculator 2034 as an adder which receives the side signal at its "plus" input and which receives the prediction signal output by the predictor 2033 at its "minus" input. Additionally, Fig. 5a illustrates the situation that the predictor control information is forwarded from the optimizer to the multiplexer 212 which outputs a multiplexed bit stream representing the encoded multi-channel audio signal. Particularly, the prediction operation is performed in such a way that the side signal is predicted from the mid signal as illustrated by the Equations to the right of Fig. 5a.
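The signal flow of Fig. 5a can be sketched for one frame of real-valued spectral lines as follows. The function name `encode_frame`, the 1/2 normalisation of the mid and side signals and the sample values are assumptions made for illustration. The factor `alpha` stands for the predictor control information and is taken here as a single real number.

```python
def encode_frame(left, right, alpha):
    """Return (mid, residual) for one frame of real-valued spectral lines.

    mid      = (L + R) / 2          (assumed normalisation)
    side     = (L - R) / 2
    residual = side - alpha * mid   (side at "plus", prediction at "minus")
    """
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    residual = [s - alpha * m for s, m in zip(side, mid)]
    return mid, residual

# Hypothetical spectra: here the side signal is exactly one third of the mid signal.
left = [4.0, 2.0, -1.0]
right = [2.0, 1.0, -0.5]
mid, residual = encode_frame(left, right, alpha=1.0 / 3.0)
```

With this choice of `alpha` the residual is zero up to rounding, so only the mid signal and the factor would have to be transmitted.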
Preferably, the predictor control information 206 is a factor α as illustrated to the right in Fig. 3b. In an embodiment in which the prediction control information only comprises a real portion, such as the real part of a complex-valued α or a magnitude of the complex-valued α, where this portion corresponds to a factor different from zero, a significant coding gain can be obtained when the mid signal and the side signal are similar to each other due to their waveform structure, but have different amplitudes.
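The amplitude-only case can be illustrated with a small sketch. It assumes, purely for illustration, that the optimization target is the minimum energy of the residual, which leads to the usual least-squares factor. The function name `real_prediction_factor` and the sample spectra are hypothetical.

```python
def real_prediction_factor(mid, side):
    """Least-squares real factor minimising sum((side - alpha * mid)**2)."""
    energy = sum(m * m for m in mid)
    if energy == 0.0:
        return 0.0  # silent mid signal: nothing to predict from
    return sum(s * m for s, m in zip(side, mid)) / energy

# Side signal with the same waveform as the mid signal but half the amplitude.
mid = [3.0, 1.5, -0.75, 2.0]
side = [1.5, 0.75, -0.375, 1.0]

alpha = real_prediction_factor(mid, side)
residual = [s - alpha * m for s, m in zip(side, mid)]
side_energy = sum(s * s for s in side)
residual_energy = sum(d * d for d in residual)
```

Under these assumptions the residual energy is far below the side-signal energy, which is the coding gain referred to above.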
When, however, the prediction control information only comprises a second portion which can be the imaginary part of a complex-valued factor or the phase information of the complex-valued factor, where the imaginary part or the phase information is different from zero, the present invention achieves a significant coding gain for signals which are phase shifted to each other by a value different from 0° or 180°, and which have, apart from the phase shift, similar waveform characteristics and similar amplitude relations.
Preferably, the prediction control information is complex-valued. Then, a significant coding gain can be obtained for signals that differ in amplitude and are phase shifted relative to each other. In a situation in which the time/frequency transforms provide complex spectra, the operation 2034 would be a complex operation in which the real part of the predictor control information is applied to the real part of the complex spectrum M and the imaginary part of the complex prediction information is applied to the imaginary part of the complex spectrum. Then, in adder 2034, the result of this prediction operation is a predicted real spectrum and a predicted imaginary spectrum, and the predicted real spectrum would be subtracted from the real spectrum of the side signal S (band-wise), and the predicted imaginary spectrum would be subtracted from the imaginary part of the spectrum of S to obtain a complex residual spectrum D.
The time-domain signals L and R are real-valued signals, but the frequency-domain signals can be real- or complex-valued. When the frequency-domain signals are real-valued, then the transform is a real-valued transform. When the frequency-domain signals are complex, then the transform is a complex-valued transform. This means that the input to the time-to-frequency and the output of the frequency-to-time transforms are real-valued, while the frequency-domain signals could e.g. be complex-valued QMF-domain signals.
Fig. 5b illustrates an audio decoder corresponding to the audio encoder illustrated in Fig. 5a. Similar elements with respect to the Fig. 1 audio decoder have similar reference numerals.
The bitstream output by bitstream multiplexer 212 in Fig. 5a is input into a bitstream demultiplexer 102 in Fig. 5b. The bitstream demultiplexer 102 demultiplexes the bitstream into the downmix signal M and the residual signal D. The downmix signal M is input into a dequantizer 110a. The residual signal D is input into a dequantizer 110b. Additionally, the bitstream demultiplexer 102 demultiplexes a predictor control information 108 from the bitstream and inputs same into the predictor 1160. The predictor 1160 outputs a predicted side signal α · M and the combiner 1161 combines the residual signal output by the dequantizer 110b with the predicted side signal in order to finally obtain the reconstructed side signal S. The signal is then input into the combiner 1162 which performs, for example, a sum/difference processing, as illustrated in Fig. 4c with respect to the mid/side encoding. Particularly, block 1162 performs an (inverse) mid/side decoding to obtain a frequency-domain representation of the left channel and a frequency-domain representation of the right channel. The frequency-domain representation is then converted into a time domain representation by corresponding frequency/time converters 52 and 53.
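The decoder-side combination described above can be sketched as follows for real-valued spectra and a real factor. The function name `decode_frame` and the sample values are hypothetical. The sum/difference step assumes an encoder that forms M = (L + R)/2 and S = (L − R)/2. Dequantization and the frequency/time converters are left out.

```python
def decode_frame(mid, residual, alpha):
    """Reconstruct (left, right) spectra from mid, residual and factor alpha.

    side  = residual + alpha * mid   (combiner 1161)
    left  = mid + side               (combiner 1162, assumed sum/difference)
    right = mid - side
    """
    side = [d + alpha * m for d, m in zip(residual, mid)]
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Hypothetical dequantized mid and residual spectra for one frame.
mid = [3.0, 1.5]
residual = [0.25, -0.5]
left, right = decode_frame(mid, residual, alpha=0.5)
```

Without quantization, this step exactly inverts an encoder that uses the same factor and the assumed 1/2 normalisation.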
Depending on the implementation of the system, the frequency/time converters 52, 53 are real-valued frequency/time converters when the frequency-domain representation is a real-valued representation, or complex-valued frequency/time converters when the frequency-domain representation is a complex-valued representation.
For increasing efficiency, however, performing a real-valued transform is preferred as illustrated in another implementation in Fig. 6a for the encoder and Fig. 6b for the decoder. The real-valued transforms 50 and 51 are implemented by an MDCT. Additionally, the prediction information is calculated as a complex value having a real part and an imaginary part. Since both spectra M, S are real-valued spectra, and since, therefore, no imaginary part of the spectrum exists, a real-to-imaginary converter 2070 is provided which calculates an estimated imaginary spectrum 600 from the real-valued spectrum of signal M. This real-to-imaginary transformer 2070 is a part of the optimizer 207, and the imaginary spectrum 600 estimated by block 2070 is input into the α optimizer stage 2071 together with the real spectrum M in order to calculate the prediction information 206, which now has a real-valued factor indicated at 2073 and an imaginary factor indicated at 2074. Now, in accordance with this embodiment, the real-valued spectrum of the first combination signal M is multiplied by the real part