Abstract: An apparatus for decoding an encoded multichannel signal, comprises: a base channel decoder (700) for decoding an encoded base channel to obtain a decoded base channel; a decorrelation filter (800) for filtering at least a portion of the decoded base channel to obtain a filling signal; and a multichannel processor (900) for performing a multichannel processing using a spectral representation of the decoded base channel and a spectral representation of the filling signal, wherein the decorrelation filter (800) is a broad band filter and the multichannel processor (900) is configured to apply a narrow band processing to the spectral representation of the decoded base channel and the spectral representation of the filling signal. Fig. 7a
Description
The present invention is related to audio processing and, particularly, to multichannel audio processing within an apparatus or method for decoding an encoded multichannel signal.
The state of the art codec for parametric coding of stereo signals at low bitrates is the MPEG codec xHE-AAC. It features a fully parametric stereo coding mode based on a mono downmix and the stereo parameters inter-channel level difference (ILD) and inter-channel coherence (ICC), which are estimated in subbands. The output is synthesized from the mono downmix by matrixing, in each subband, the subband downmix signal and a decorrelated version of that subband downmix signal, which is obtained by applying subband filters within the QMF filterbank.
There are some drawbacks related to xHE-AAC for coding speech items. The filters by which the synthetic second signal is generated produce a very reverberant version of the input signal, which requires a ducker. Therefore, the processing heavily smears the spectral shape of the input signal over time. This works well for many signal types but for speech signals, where the spectral envelope changes rapidly, this causes unnatural coloration and audible artifacts, such as double talk or ghost voice. Furthermore, the filters depend on the temporal resolution of the underlying QMF filter bank, which changes with the sampling rate. Therefore, the output signal is not consistent for different sampling rates.
Apart from this, the 3GPP codec AMR-WB+ features a semi-parametric stereo mode supporting bitrates from 7 to 48 kbit/s. It is based on a mid/side transform of the left and right input channels. In the low frequency range, the side signal s is predicted by the mid signal m to obtain a balance gain, and m and the prediction residual are both encoded and transmitted, along with the prediction coefficient, to the decoder. In the mid-frequency range, only the downmix signal m is coded and the missing signal s is predicted from m using a low order FIR filter, which is calculated at the encoder. This is combined with a bandwidth extension for both channels. The codec generally yields a more natural sound than xHE-AAC for speech, but faces several problems. The procedure of predicting s by m by a low order FIR filter does not work very well if the input channels are only weakly correlated, as is, e.g., the case for echoic speech signals or double talk. Also, the codec is unable to handle out-of-phase signals, which can lead to substantial loss in quality, and one observes that the stereo image of the decoded output is usually very compressed. Furthermore, the method is not fully parametric and hence not efficient in terms of bitrate.
Generally, a fully parametric method may result in audio quality degradations due to the fact that any signal portions lost due to parametric encoding are not reconstructed on the decoder-side.
On the other hand, waveform-preserving procedures such as mid/side coding do not allow substantial bitrate savings as can be obtained from parametric multichannel coders.
It is an object of the present invention to provide an improved concept for decoding an encoded multichannel signal.
This object is achieved by an apparatus for decoding an encoded multichannel signal, a method of decoding an encoded multichannel signal of claim 37, a computer program of claim 38, an audio signal decorrelator of claim 39, a method of decorrelating an audio input signal of claim 49 or a computer program of claim 50.
The present invention is based on the finding that a mixed approach is useful for decoding an encoded multi-channel signal. This mixed approach relies on using a filling signal generated by a decorrelation filter, and this filling signal is then used by a multi-channel processor such as a parametric or other multi-channel processor to generate the decoded multi-channel signal. Particularly, the decorrelation filter is a broad band filter and the multi-channel processor is configured to apply a narrow band processing to the spectral representation. Thus, the filling signal is preferably generated in the time domain by an allpass filter procedure, for example, and the multichannel processing takes place in the spectral domain using the spectral representation of the decoded base channel and, additionally, using a spectral representation of the filling signal generated from the filling signal calculated in the time domain.
Thus, the advantages of frequency domain multi-channel processing on the one hand and time domain decorrelation on the other hand are combined in a useful way to obtain a decoded multi-channel signal having a high audio quality. Nevertheless, the bitrate for transmitting the encoded multi-channel signal is kept as low as possible due to the fact that the encoded multi-channel signal is typically not a waveform-preserving encoding format but, for example, a parametric multi-channel coding format. Hence, for generating the filling signal, only decoder-available data such as the decoded base channel is used and, in certain embodiments, additional stereo parameters such as a gain parameter or a prediction parameter or, alternatively, ILD, ICC or any other stereo parameters known in the art.
Subsequently, several preferred embodiments are discussed. The most efficient way to code stereo signals is to use parametric methods such as Binaural Cue Coding or Parametric Stereo. They aim at reconstructing the spatial impression from a mono downmix by restoring several spatial cues in subbands and as such are based on psychoacoustics. There is another way of looking at parametric methods: one simply tries to parametrically model one channel by another, trying to exploit inter-channel redundancy. This way, one may recover part of the secondary channel from the primary channel but one is usually left with a residual component. Omitting this component usually leads to an unstable stereo image of the decoded output. Therefore, it is necessary to fill in a suitable replacement for such residual components. Since such a replacement is blind, it is safest to take such parts from a second signal that has similar temporal and spectral properties as the downmix signal.
Hence, embodiments of the present invention are particularly useful in the context of parametric audio coders and, particularly, parametric audio decoders, where replacements for missing residual parts are extracted from an artificial signal generated by a decorrelation filter on the decoder-side.
Further embodiments relate to procedures for generating the artificial signal. Embodiments relate to methods of generating an artificial second channel from which replacements for missing residual parts are extracted and its use in a fully parametric stereo coder, called enhanced Stereo Filling. The signal is more suitable for coding speech signals than the xHE-AAC signal, since its spectral shape is temporally closer to the input signal. It is generated in the time domain by applying a special filter structure, and is therefore independent of the filter bank in which the stereo upmix is performed. It can hence be used in different upmix procedures. It could, for instance, be used in xHE-AAC to replace the artificial signals after transforming to the QMF domain, which would improve the performance for speech, as well as in the midrange of AMR-WB+ to stand in for the residual in the mid/side prediction, which would improve the performance for weakly correlated input channels and improve the stereo image. This is of special interest for codecs featuring different stereo modes (such as time domain and frequency domain stereo processing).
In preferred embodiments, the decorrelation filter comprises at least one allpass filter cell, the at least one allpass filter cell comprising two Schroeder allpass filter cells nested into a third Schroeder allpass filter, and/or the allpass filter comprises at least one allpass filter cell, the allpass filter cell comprising two cascaded Schroeder allpass filters, wherein an input into the first cascaded Schroeder allpass filter and an output from the cascaded second Schroeder allpass filter are connected, in the direction of the signal flow, before a delay stage of the third Schroeder allpass filter.
In a further embodiment, several such allpass filter cells, each comprising three nested Schroeder allpass filters, are cascaded in order to obtain a specifically useful allpass filter that has a good impulse response for the purpose of stereo or multi-channel decoding.
It is to be emphasized here that, although several aspects of the present invention are discussed with respect to stereo decoding generating, from a mono base channel, a left upmix channel and a right upmix channel, the present invention is also applicable for multi-channel decoding, where a signal of, for example, four channels is encoded using two base channels, wherein the first two upmix channels are generated from the first base channel and the third and the fourth upmix channel are generated from the second base channel. In other alternatives, the present invention is also useful to generate, from a single base channel, three or more upmix channels always using preferably the same filling signal. In all such procedures, however, the filling signal is generated in a broad band manner, i.e., preferably in the time domain, and the multi-channel processing for generating, from the decoded base channel, the two or more upmix channels is done in the frequency domain.
The decorrelation filter preferably operates fully in the time domain. However, other hybrid approaches are useful as well, where, for example, the decorrelation is performed by decorrelating a low band portion on the one hand and a high band portion on the other hand while, for example, the multi-channel processing is performed in a much higher spectral resolution. Thus, exemplarily, the spectral resolution of the multi-channel processing can, for example, be as high as processing each DFT or FFT line individually, and parametric data is given for several bands, where each band, for example, comprises two, three, or many more DFT/FFT/MDCT lines, and the filtering of the decoded base channel to obtain the filling signal is done in a broad band manner, i.e., in the time domain, or in a semi-broad band manner, for example, within a low band and a high band or, possibly, within three different bands. Thus, in any case, the spectral resolution of the stereo processing that is typically performed for individual lines or subband signals is the highest spectral resolution. Typically, the stereo parameters generated in an encoder and transmitted to and used by preferred decoders have a medium spectral resolution. Thus, the parameters are given for bands, the bands can have varying bandwidths, but each band comprises at least two lines or subband signals generated and used by the multi-channel processors. Finally, the spectral resolution of the decorrelation filtering is very low: it is extremely low in the case of time domain filtering, or medium in the case of generating different decorrelated signals for different bands, but this medium spectral resolution is still lower than the resolution in which the parameters for the parametric processing are given.
In a preferred embodiment, the decorrelation filter is an allpass filter whose filter characteristic has a constant magnitude region over the whole spectral range of interest. However, other decorrelation filters that do not have this ideal allpass filter behavior are useful as well as long as, in a preferred embodiment, a region of constant magnitude of the filter characteristic is greater than a spectral granularity of the spectral representation of the decoded base channel and the spectral granularity of the spectral representation of the filling signal.
Thus, it is made sure that the spectral granularity of the filling signal or the decoded base channel, on which the multi-channel processing is performed, does not influence the decorrelation filtering, so that a high quality filling signal is generated, preferably adjusted using an energy normalization factor and then used for generating the two or more upmix channels.
Furthermore, it is to be noted that the generation of a decorrelated signal such as described with respect to subsequently discussed Figs. 4, 5, or 6 can be used in the context of a multichannel decoder, but can also be used in any other application where a decorrelated signal is useful, such as in any audio signal rendering, any reverberating operation etc.
Subsequently, preferred embodiments are discussed with respect to the accompanying drawings in which:
Fig. 1a illustrates an artificial signal generation when used with an EVS core coder;
Fig. 1b illustrates an artificial signal generation when used with an EVS core coder in accordance with a different embodiment;
Fig. 2a illustrates an integration into DFT stereo processing including time domain bandwidth extension upmix;
Fig. 2b illustrates an integration into DFT stereo processing including time domain bandwidth extension upmix in accordance with a different embodiment;
Fig. 3 illustrates an integration into a system featuring multiple stereo processing units;
Fig. 4 illustrates a basic allpass unit;
Fig. 5 illustrates an allpass filter unit;
Fig. 6 illustrates an impulse response of a preferred allpass filter;
Fig. 7a illustrates an apparatus for decoding an encoded multi-channel signal;
Fig. 7b illustrates a preferred implementation of the decorrelation filter;
Fig. 7c illustrates a combination of a base channel decoder and a spectral converter;
Fig. 8 illustrates a preferred implementation of the multi-channel processor;
Fig. 9a illustrates a further implementation of the apparatus for decoding an encoded multi-channel signal using bandwidth extension processing;
Fig. 9b illustrates preferred embodiments for generating a compressed energy normalization factor;
Fig. 10 illustrates an apparatus for decoding an encoded multi-channel signal in accordance with a further embodiment operating using a channel transformation in the base channel decoder;
Fig. 11 illustrates cooperation between a resampler for the base channel decoder and the subsequently connected decorrelation filter;
Fig. 12 illustrates an exemplary parametric multi-channel encoder useful with the apparatus for decoding in accordance with the present invention;
Fig. 13 illustrates a preferred implementation of the apparatus for decoding an encoded multi-channel signal; and
Fig. 14 illustrates a further preferred implementation of the multi-channel processor.
Fig. 7a illustrates a preferred embodiment of an apparatus for decoding an encoded multichannel signal. The encoded multi-channel signal comprises an encoded base channel that is input into a base channel decoder 700 for decoding the encoded base channel to obtain a decoded base channel.
Furthermore, the decoded base channel is input into a decorrelation filter 800 for filtering at least a portion of the decoded base channel to obtain a filling signal.
Both the decoded base channel and the filling signal are input into a multi-channel processor 900 for performing a multi-channel processing using a spectral representation of the decoded base channel and, additionally, a spectral representation of the filling signal. The multi-channel processor outputs the decoded multi-channel signal that comprises, for example, a left upmix channel and a right upmix channel in the context of stereo processing or three or more upmix channels in the case of multi-channel processing covering more than two output channels.
The decorrelation filter 800 is configured as a broad band filter, and the multi-channel processor 900 is configured to apply a narrowband processing to the spectral representation of the decoded base channel and the spectral representation of the filling signal. Importantly, the filtering still counts as broad band filtering when the signal to be filtered has been downsampled, for example to 16 kHz or 12.8 kHz from a higher sampling rate such as 22 kHz or lower.
Thus, the multi-channel processor operates at a spectral granularity that is significantly higher than the spectral granularity with which the filling signal is generated. In other words, a filter characteristic of the decorrelation filter is selected so that the region of a constant magnitude of the filter characteristic is greater than a spectral granularity of the spectral representation of the decoded base channel and a spectral granularity of the spectral representation of the filling signal.
Thus, for example, when the spectral granularity of the multi-channel processor is such that the upmix processing is performed for each spectral line of, for example, a 1024 line DFT spectrum, then the decorrelation filter is defined in such a way that the region of constant magnitude of the filter characteristic of the decorrelation filter has a frequency width that spans two or more spectral lines of the DFT spectrum. Typically, the decorrelation filter operates in the time domain, and the used spectral band is, for example, from 20 Hz to 20 kHz. Such filters are known to be allpass filters, and it is to be noted here that a perfectly constant magnitude typically cannot be obtained by allpass filters, but variations from a constant magnitude by +/- 10% of an average value are also found to be useful for an allpass filter and, therefore, also represent a “constant magnitude of the filter characteristic”.
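The +/- 10% criterion can be checked numerically. The following is a minimal sketch and not part of the described apparatus: it evaluates the magnitude response of a truncated impulse response with a plain DFT and tests whether every line stays within 10% of the average magnitude. The function names, the single textbook Schroeder allpass used as the test filter, and its gain and delay values are illustrative assumptions.

```python
import cmath

def schroeder_impulse_response(gain, delay, length):
    """Impulse response of y[n] = -g*x[n] + x[n-D] + g*y[n-D] (assumed textbook form)."""
    x = [1.0] + [0.0] * (length - 1)
    y = [0.0] * length
    for n in range(length):
        x_del = x[n - delay] if n >= delay else 0.0
        y_del = y[n - delay] if n >= delay else 0.0
        y[n] = -gain * x[n] + x_del + gain * y_del
    return y

def magnitude_response(h, n_fft):
    """Magnitudes of DFT lines 0..n_fft/2 of the impulse response h (len(h) <= n_fft)."""
    return [
        abs(sum(h[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft) for n in range(len(h))))
        for k in range(n_fft // 2 + 1)
    ]

def has_constant_magnitude(h, n_fft=256, tolerance=0.10):
    """True if every DFT line deviates at most `tolerance` from the average magnitude."""
    mags = magnitude_response(h, n_fft)
    mean = sum(mags) / len(mags)
    return all(abs(m - mean) <= tolerance * mean for m in mags)

# Hypothetical test filters: one allpass, one two-tap averaging low-pass.
allpass_h = schroeder_impulse_response(gain=0.5, delay=4, length=256)
lowpass_h = [0.5, 0.5]
print(has_constant_magnitude(allpass_h))  # True
print(has_constant_magnitude(lowpass_h))  # False
```

The averaging filter fails the check because its magnitude falls from 1 at DC to 0 at the highest line, whereas the allpass stays flat across all lines.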
Fig. 7b illustrates an implementation of the decorrelation filter 800 with a time domain filter stage 802 and the subsequently connected spectral converter 804 generating a spectral representation of the filling signal. The spectral converter 804 is typically implemented as an FFT or a DFT processor, although other time-frequency domain conversion algorithms are useful as well.
Fig. 7c illustrates a preferred implementation of the cooperation between the base channel decoder 700 and a base channel spectral converter 902. Typically, the base channel decoder is configured to operate as a time domain base channel decoder generating a time domain base channel signal while the multi-channel processor 900 operates in the spectral domain. Thus, the multi-channel processor 900 of Fig. 7a has, as an input stage, the base channel spectral converter 902 of Fig. 7c, and the spectral representation of the base channel spectral converter 902 is then forwarded to the multi-channel processor processing elements that are, for example, illustrated in Fig. 8, Fig. 13, Fig. 14, Fig. 9a or Fig. 10. In this context, it is to be noted that, in general, reference numerals starting with a “7” represent elements that preferably belong to the base channel decoder 700 of Fig. 7a. Elements having a reference numeral starting with an “8” preferably belong to the decorrelation filter 800 of Fig. 7a, and elements with a reference numeral starting with a “9” in the figures preferably belong to the multi-channel processor 900 of Fig. 7a. However, it is to be noted here that the separations between the individual elements are only made for describing the present invention, but any actual implementation can have different, typically hardware or alternatively software or mixed hardware/software processing blocks that are separated in a different manner than the logical separation illustrated in Fig. 7a and other figures.
Fig. 4 illustrates a preferred implementation of the filter stage 802 that is indicated as 802’. Particularly, Fig. 4 illustrates a basic allpass unit that can be included in the decorrelation filter alone or together with more such cascaded allpass units as, for example, illustrated in Fig. 5. Fig. 5 illustrates the decorrelation filter 802 with exemplarily five cascaded basic allpass units 502, 504, 506, 508, 510, where each of the basic allpass units can be implemented as outlined in Fig. 4. Alternatively, however, the decorrelation filter can include a single basic allpass unit 403 of Fig. 4 and, therefore, represents an alternative implementation of the decorrelation filter stage 802’.
Preferably, each basic allpass unit comprises two Schroeder allpass filters 401, 402 nested into a third Schroeder allpass filter 403. In this implementation, the allpass filter cell 403 is connected to two cascaded Schroeder allpass filters 401, 402, wherein an input into the first cascaded Schroeder allpass filter 401 and an output from the cascaded second Schroeder allpass filter 402 are connected, in the direction of the signal flow, before a delay stage 423 of the third Schroeder allpass filter.
Particularly, the allpass filter illustrated in Fig. 4 comprises: a first adder 411, a second adder 412, a third adder 413, a fourth adder 414, a fifth adder 415 and a sixth adder 416; a first delay stage 421, a second delay stage 422 and a third delay stage 423; a first forward feed 431 with a first forward gain, a first backward feed 441 with a first backward gain, a second forward feed 442 with a second forward gain and a second backward feed 432 with a second backward gain; and a third forward feed 443 with a third forward gain and a third backward feed 433 with a third backward gain.
The connections illustrated in Fig. 4 are as follows: The input into the first adder 411 represents an input into the allpass filter 802, wherein a second input into the first adder 411 is connected to an output of the third delay stage 423 and comprises the third backward feed 433 with the third backward gain. The output of the first adder 411 is connected to an input into the second adder 412 and is connected to an input of the sixth adder 416 via the third forward feed 443 with the third forward gain. A further input into the second adder 412 is connected to the first delay stage 421 via the first backward feed 441 with the first backward gain. The output of the second adder 412 is connected to an input of the first delay stage 421 and is connected to an input of the third adder 413 via the first forward feed 431 with the first forward gain. The output of the first delay stage 421 is connected to a further input of the third adder 413. The output of the third adder 413 is connected to an input of the fourth adder 414. The further input into the fourth adder 414 is connected to an output of the second delay stage 422 via the second backward feed 432 with the second backward gain. The output of the fourth adder 414 is connected to an input into the second delay stage 422 and is connected to an input into the fifth adder 415 via the second forward feed 442 with the second forward gain. The output of the second delay stage 422 is connected to a further input into the fifth adder 415. The output of the fifth adder 415 is connected to an input of the third delay stage 423. The output of the third delay stage 423 is connected to an input into the sixth adder 416. The further input into the sixth adder 416 is connected to an output of the first adder 411 via the third forward feed 443 with the third forward gain. The output of the sixth adder 416 represents an output of the allpass filter 802.
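The signal flow described for Fig. 4, and the cascade of Fig. 5, can be sketched sample by sample as follows. This is a minimal illustration under stated assumptions, not the specified filter: the class and function names are hypothetical, each forward gain is assumed to be the negative of the matching backward gain (the usual Schroeder convention), and the delay lengths and gain values are placeholders, since this section gives none.

```python
from collections import deque

class NestedAllpassUnit:
    """Two Schroeder allpass filters (401, 402) nested into a third one (403)."""

    def __init__(self, delays, gains):
        # gains: backward gains of filters 401, 402, 403; forward gains are
        # assumed to be their negatives. delays: lengths (>= 1) of 421, 422, 423.
        self.g1, self.g2, self.g3 = gains
        # Each delay stage is a FIFO pre-filled with zeros; index 0 is its output.
        self.d1, self.d2, self.d3 = (deque([0.0] * d, maxlen=d) for d in delays)

    def process(self, x):
        d1_out, d2_out, d3_out = self.d1[0], self.d2[0], self.d3[0]
        a1 = x + self.g3 * d3_out      # adder 411: input + third backward feed
        a2 = a1 + self.g1 * d1_out     # adder 412: + first backward feed
        a3 = -self.g1 * a2 + d1_out    # adder 413: first forward feed + delay 421
        a4 = a3 + self.g2 * d2_out     # adder 414: + second backward feed
        a5 = -self.g2 * a4 + d2_out    # adder 415: second forward feed + delay 422
        a6 = -self.g3 * a1 + d3_out    # adder 416: third forward feed + delay 423
        self.d1.append(a2)             # input of delay stage 421
        self.d2.append(a4)             # input of delay stage 422
        self.d3.append(a5)             # input of delay stage 423
        return a6

def decorrelate(signal, units):
    """Run a time domain signal through cascaded basic allpass units."""
    out = []
    for x in signal:
        for unit in units:
            x = unit.process(x)
        out.append(x)
    return out

# Five cascaded units with placeholder delays and gains (assumptions).
units = [NestedAllpassUnit((3 + 2 * i, 5 + 2 * i, 7 + 4 * i), (0.5, 0.5, 0.5))
         for i in range(5)]
h = decorrelate([1.0] + [0.0] * 3999, units)
energy = sum(v * v for v in h)
print(abs(energy - 1.0) < 1e-6)  # an allpass preserves the energy of the impulse
```

Because the two inner filters sit in front of delay stage 423 inside the outer loop, the unit remains allpass while producing a denser impulse response than three filters in plain series would.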
Preferably, as illustrated in Fig. 8, the multi-channel processor 900 is configured to determine a first upmix channel and a second upmix channel using different weighted combinations of spectral bands of the decoded base channel and corresponding spectral bands of the filling signal. Particularly, the different weighted combinations depend on a prediction factor and/or a gain factor as derived from encoded parametric information included within the encoded multi-channel signal. Furthermore, the weighted combinations preferably depend on an envelope normalization factor or, preferably, an energy normalization factor calculated using a spectral band of the decoded base channel and the corresponding spectral band of the filling signal. Thus, the processor 904 of Fig. 8 receives the spectral representation of the decoded base channel and the spectral representation of the filling signal and outputs, preferably in the time domain, a first upmix channel and a second upmix channel. The prediction factor, the gain factor, and the energy normalization factor are input in a per-band manner; these factors are then used for all spectral lines within a band, but change from band to band, where this data is retrieved from the encoded signal or locally determined in the decoder.
Particularly, the prediction factor and the gain factor typically represent encoded parameters that are decoded on the decoder side and are then used in the parametric stereo upmixing. Contrary thereto, the energy normalization factor is calculated on the decoder-side typically using a spectral band of the decoded base channel and the spectral band of the filling signal. The same is true for the envelope normalization factor. Preferably, the envelope normalization corresponds to an energy normalization per band.
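The per-band use of these factors can be sketched as follows. This is a hedged illustration only: the exact weighting rule is not stated in this section, so the combination below (a predicted part scaled by the prediction factor plus the normalized filling signal scaled by the gain factor, added to and subtracted from the base channel) is an assumption, as are the function names and the small `eps` guard against empty bands.

```python
import math

def band_energy(spectrum, start, stop):
    """Energy of the spectral lines start..stop-1."""
    return sum(abs(v) ** 2 for v in spectrum[start:stop])

def upmix(base, fill, band_edges, pred, gain, eps=1e-12):
    """base, fill: complex spectra of equal length; band_edges: ascending line
    indices delimiting the bands; pred, gain: one transmitted factor per band."""
    left, right = [0j] * len(base), [0j] * len(base)
    for b in range(len(band_edges) - 1):
        start, stop = band_edges[b], band_edges[b + 1]
        # Energy normalization factor: computed in the decoder, once per band,
        # so that the filling signal matches the base channel energy in that band.
        g_norm = math.sqrt(band_energy(base, start, stop) /
                           (band_energy(fill, start, stop) + eps))
        for k in range(start, stop):
            # The same per-band factors are applied to every line of the band.
            side = pred[b] * base[k] + gain[b] * g_norm * fill[k]  # assumed rule
            left[k] = base[k] + side
            right[k] = base[k] - side
    return left, right

# Two bands of two lines each, with made-up spectra and parameters.
base = [1 + 0j, 0 + 1j, 2 + 0j, 0 - 2j]
fill = [0.5j, 0.5 + 0j, 1j, 1 + 0j]
left, right = upmix(base, fill, band_edges=[0, 2, 4], pred=[0.0, 1.0], gain=[1.0, 0.0])
print(left[2], right[2])  # band 1 uses only the prediction: (4+0j) 0j
```

In the first band only the normalized filling signal contributes to the difference between the two outputs, while in the second band the difference is fully predicted from the base channel.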
Although the present invention is discussed with the specific reference encoder illustrated in Fig. 12 and the specific decoder illustrated in Fig. 13 or Fig. 14, it is, however, to be noted that the generation of a broad band filling signal and the application of the broad band filling signal in multi-channel stereo decoding operating in a narrow band spectral domain can also be applied to any other parametric stereo encoding techniques known in the art. These are parametric stereo encoding known from the HE-AAC standard or from the MPEG surround standard or from Binaural Cue Coding (BCC coding) or any other stereo encoding/decoding tools or any other multi-channel encoding/decoding tools.
Fig. 9a illustrates a further preferred embodiment of the multi-channel decoder comprising a multi-channel processor stage 904 generating a first upmix channel and a second upmix channel and subsequently connected time domain bandwidth extension elements 908, 910 that perform a time domain bandwidth extension in a guided or unguided manner to the first upmix channel and the second upmix channel individually. Typically, a windower and energy normalization factor calculator 912 is provided to calculate an energy normalization factor to be used by the multi-channel processor 904. In alternative embodiments that are discussed with respect to Fig. 1a or Fig. 1b and Fig. 2a or Fig. 2b, however, the bandwidth extension is performed with the mono or decoded core signal, and only a single stereo processing element 960 of Fig. 2a or Fig. 2b is provided for generating, from the high band mono signal, a high band left channel signal and a high band right channel signal that are then added to the low band left channel signal and the low band right channel signal with the use of adders 994a and 994b.
This adding illustrated in Fig. 2a or 2b can, for example, be performed in the time domain. Then, block 960 generates a time domain signal. This is the preferred implementation. However, alternatively, the output of the stereo processing 904 in Fig. 2a or 2b and the left channel and right channel signals from block 960 can be generated in the spectral domain, and the adders 994a and 994b are, for example, implemented by a synthesis filter bank so that the low band data from block 904 is input into the low band input of the synthesis filter bank and the high band output of block 960 is input into the high band input of the synthesis filter bank, and the output of the synthesis filter bank is the corresponding left channel time domain signal or right channel time domain signal.
Preferably, the windower and factor calculator 912 in Fig. 9a generates and calculates an energy value of the high band signal as, for example, also illustrated at 961 in Fig. 1a or Fig. 1b and uses this energy estimate for generating high band first and second upmix channels as will be discussed later on with respect to equations 28 to 31 in a preferred embodiment.
Preferably, the processor 904 for calculating the weighted combination receives, as an input, the energy normalization factor per band. In a preferred embodiment, however, a compression of the energy normalization factor is performed and the different weighted combinations are calculated using the compressed energy normalization factor. Thus, with respect to Fig. 8, the processor 904 receives, instead of the non-compressed energy normalization factor, a compressed energy normalization factor. This procedure is illustrated, with respect to different embodiments, in Fig. 9b. Block 920 receives an energy of the residual or filling signal per time/frequency bin and an energy of the decoded base channel per time/frequency bin, and then calculates an absolute energy normalization factor for a band comprising several such time/frequency bins. Then, in block 921, a compression of the energy normalization factor is performed, and this compression can, for example, be the usage of a logarithm function as, for example, discussed with respect to equation 22 later on.
Based on the compressed energy normalization factor generated by block 921, different procedures for generating the compressed energy normalization factor are given. In the first alternative, a function is applied to the compressed factor as illustrated in 922, and this function is preferably a non-linear function. Then, in block 923, the evaluated factor is expanded to obtain a specific compressed energy normalization factor. Hence, block 922 can, for example, be implemented by the function expression in equation (22) that will be given later on, and block 923 is performed by the “exponent” function within equation (22). However, a different alternative resulting in a similar compressed energy normalization factor is given in blocks 924 and 925. In block 924, an evaluation factor is determined and, in block 925, the evaluation factor is applied to the energy normalization factor obtained from block 920. Thus, the application of the factor to the energy normalization factor as outlined in block 912 can, for example, be implemented by subsequently illustrated equation 27.
Thus, as illustrated, for example, in equation 27 later on, the evaluation factor is determined, and this factor is simply a factor that can be multiplied by the energy normalization factor as determined by block 920 without actually performing special function evaluations. Therefore, the calculation of block 925 can also be dispensed with, i.e., the specific calculation of the compressed energy normalization factor is not necessary, as long as the original non-compressed energy normalization factor, the evaluation factor and a further operand within a multiplication, such as a spectral value of the filling signal, are multiplied together to obtain a normalized filling signal spectral line.
Fig. 10 illustrates a further implementation, where the encoded multi-channel signal is not simply a mono signal but comprises an encoded mid signal and an encoded side signal, for example. In such a situation, the base channel decoder 700 not only decodes the encoded mid signal and the encoded side signal or, generally, the encoded first signal and the encoded second signal, but additionally performs a channel transformation 705, for example, in the form of a mid/side transform and inverse mid/side transformation to calculate a primary channel such as L and a secondary channel such as R, or the transformation is a Karhunen Loeve transformation.
However, the result of the channel transformation and, particularly, the result of the decoding operation is that the primary channel is a broad band channel while the secondary channel is a narrow band channel. Then, the broad band channel is input into the decorrelation filter 800, and a high pass filtering is performed in block 930 to generate a decorrelated high pass signal. This decorrelated high pass signal is then added to the narrow band secondary channel in the band combiner 934 to obtain the broad band secondary channel so that, in the end, the broad band primary channel and the broad band secondary channel are output.
Fig. 11 illustrates a further implementation, where a decoded base channel obtained by the base channel decoder 700 in a certain sampling rate associated with the encoded base channel is input into a resampler 710 in order to obtain a resampled base channel that is then used in the multi-channel processor that operates on the resampled channel.
Fig. 12 illustrates a preferred implementation of a reference stereo encoding. In block 1200, an inter-channel phase difference IPD is calculated for the first channel such as L and the second channel such as R. This IPD value is then typically quantized and output for each band in each time frame as encoder output data 1206. Furthermore, the IPD values are used for calculating parametric data for the stereo signal such as a prediction parameter $g_{t,b}$ for each band $b$ in each time frame $t$ and a gain parameter $r_{t,b}$ for each band $b$ in each time frame $t$.
Furthermore, both first and second channels are also used in a mid/side processor 1203 to calculate, for each band, a mid signal and a side signal.
Depending on the implementation, only the mid signal $M$ can be forwarded to an encoder 1204, and the side signal is not forwarded to the encoder 1204 so that the output data 1206 only comprises the encoded base channel, the parametric data generated by block 1202 and the IPD information generated by block 1200.
Subsequently, a preferred embodiment is discussed with respect to a reference encoder, but it is to be noted that any other stereo encoders as discussed before can be used as well.
A REFERENCE STEREO ENCODER
A DFT based stereo encoder is specified for reference. As usual, time-frequency vectors $L_t$ and $R_t$ of the left and right channel are generated by simultaneously applying an analysis window followed by a Discrete Fourier Transform (DFT). The DFT bins are then grouped into subbands $(L_{t,k})_{k \in I_b}$ resp. $(R_{t,k})_{k \in I_b}$, where $I_b$ denotes the set of subband indices.
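As a rough illustration of this analysis step, the following Python sketch windows one frame, applies a DFT and groups the resulting bins into subbands. The frame length, the sine window and the band edges are placeholder assumptions chosen for the example and are not taken from the reference encoder.

```python
import cmath
import math

def dft_frame(frame, window):
    """Windowed DFT of one real-valued frame; returns bins 0..N/2."""
    n = len(frame)
    xw = [s * w for s, w in zip(frame, window)]
    return [sum(xw[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n // 2 + 1)]

def group_into_bands(spectrum, band_edges):
    """Split the bins into subbands I_b = [edges[b], edges[b+1])."""
    return [spectrum[lo:hi] for lo, hi in zip(band_edges, band_edges[1:])]

N = 16                                                           # assumed frame length
sine_window = [math.sin(math.pi * (m + 0.5) / N) for m in range(N)]  # assumed window
band_edges = [0, 1, 2, 4, 6, 9]                                  # hypothetical band layout

left = [math.cos(2 * math.pi * 2 * m / N) for m in range(N)]     # test tone on bin 2
spectrum = dft_frame(left, sine_window)
L_bands = group_into_bands(spectrum, band_edges)
```

The naive O(N²) DFT keeps the sketch dependency-free; a practical implementation would use an FFT.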
Calculation of IPDs and Downmixing. For the downmix, a band-wise inter-channel phase difference (IPD) is calculated as
(1) $\mathrm{IPD}_{t,b} = \arg\left(\sum_{k \in I_b} L_{t,k} R_{t,k}^{*}\right)$,
where $R^{*}$ denotes the complex conjugate of $R$. This is used to generate a band-wise mid and side signal
(2) $M_{t,k} = \frac{e^{-i\beta} L_{t,k} + e^{i(\mathrm{IPD}_{t,b} - \beta)} R_{t,k}}{\sqrt{2}}$
and
(3) $S_{t,k} = \frac{e^{-i\beta} L_{t,k} - e^{i(\mathrm{IPD}_{t,b} - \beta)} R_{t,k}}{\sqrt{2}}$
for $k \in I_b$, where $\beta$ is an absolute phase rotation parameter, e.g. given by
(4) $\beta = \operatorname{atan2}\left(\sin(\mathrm{IPD}_{t,b}),\ \cos(\mathrm{IPD}_{t,b}) + 2\,\frac{1 + g_{t,b}}{1 - g_{t,b}}\right)$.
Calculation of parameters. In addition to the band-wise IPDs, two further stereo parameters are extracted. The optimal coefficient for predicting $S_{t,k}$ by $M_{t,k}$, i.e. the number $g_{t,b}$ such that the energy of the remainder
(5) $p_{t,k} = S_{t,k} - g_{t,b} M_{t,k}$
is minimal, and a relative gain factor $r_{t,b}$ which, if applied to the mid signal $M_t$, equalizes the energy of $p_t$ and $M_t$ in each band, i.e.,
(6) $r_{t,b} = \sqrt{\frac{\sum_{k \in I_b} |p_{t,k}|^2}{\sum_{k \in I_b} |M_{t,k}|^2}}$.
The optimal prediction coefficient can be calculated from the energies in the subbands
(7) $E_{L,t,b} = \sum_{k \in I_b} |L_{t,k}|^2$ and $E_{R,t,b} = \sum_{k \in I_b} |R_{t,k}|^2$
and the absolute value of the inner product of $L_t$ and $R_t$
(8) $X_{L/R,t,b} = \left|\sum_{k \in I_b} L_{t,k} R_{t,k}^{*}\right|$
as
(9) $g_{t,b} = \frac{E_{L,t,b} - E_{R,t,b}}{E_{L,t,b} + E_{R,t,b} + 2 X_{L/R,t,b}}$.
From this it follows that $g_{t,b}$ lies in $[-1, 1]$. The residual gain can be calculated similarly from the energies and the inner product as
(10) $r_{t,b} = \left(\frac{(1 - g_{t,b}) E_{L,t,b} + (1 + g_{t,b}) E_{R,t,b} - 2 X_{L/R,t,b}}{E_{L,t,b} + E_{R,t,b} + 2 X_{L/R,t,b}}\right)^{1/2}$,
which implies
(11) $0 \le r_{t,b} \le \sqrt{1 - g_{t,b}^2}$.
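The relations (1) to (11) can be checked numerically for a single band. The following Python sketch is only an illustration under assumed inputs: the function names and the three example bins are hypothetical, and the code assumes $g_{t,b} < 1$ so that the phase rotation in (4) is defined. It computes the parameters in closed form from the band energies and compares the residual gain with the direct definition in (6).

```python
import cmath
import math

def band_parameters(L, R):
    """Return (ipd, g, r) for one band of complex DFT bins L and R."""
    cross = sum(l * r.conjugate() for l, r in zip(L, R))
    ipd = cmath.phase(cross)                               # eq. (1)
    e_l = sum(abs(l) ** 2 for l in L)                      # eq. (7)
    e_r = sum(abs(r) ** 2 for r in R)
    x_lr = abs(cross)                                      # eq. (8)
    denom = e_l + e_r + 2 * x_lr
    g = (e_l - e_r) / denom                                # eq. (9)
    r2 = ((1 - g) * e_l + (1 + g) * e_r - 2 * x_lr) / denom  # eq. (10)
    return ipd, g, math.sqrt(max(r2, 0.0))

def mid_side(L, R, ipd, g):
    """Band-wise mid and side signal, eqs. (2) to (4); assumes g < 1."""
    beta = math.atan2(math.sin(ipd), math.cos(ipd) + 2 * (1 + g) / (1 - g))
    rot_l = cmath.exp(-1j * beta)
    rot_r = cmath.exp(1j * (ipd - beta))
    M = [(rot_l * l + rot_r * r) / math.sqrt(2) for l, r in zip(L, R)]
    S = [(rot_l * l - rot_r * r) / math.sqrt(2) for l, r in zip(L, R)]
    return M, S

# Hypothetical bins of one band.
L = [1 + 2j, 0.5 - 1j, -0.3 + 0.2j]
R = [0.8 + 1j, -0.2 - 0.7j, 0.1 + 0.4j]

ipd, g, r = band_parameters(L, R)
M, S = mid_side(L, R, ipd, g)
residual = [s - g * m for s, m in zip(S, M)]               # eq. (5)
r_direct = math.sqrt(sum(abs(p) ** 2 for p in residual) /
                     sum(abs(m) ** 2 for m in M))          # eq. (6)
```

Here `r` from the closed form (10) and `r_direct` from the definition (6) agree up to rounding, and `r` respects the bound (11).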
Fig. 13 illustrates a preferred implementation of the decoder-side. In block 700, representing the base channel decoder of Fig. 7a, the encoded base channel $M$ is decoded.
Then, in block 940a, the primary upmix channel such as L is calculated. Furthermore, in block 940b, the secondary upmix channel is calculated which is, for example, channel R.
Both blocks 940a and 940b are connected to the filling signal generator 800 and receive the parametric data generated by block 1200 in Fig. 12 or 1202 of Fig. 12.
Preferably, the parametric data is given in bands having the second spectral resolution and the blocks 940a, 940b operate in high spectral resolution granularity and generate spectral lines with a first spectral resolution that is higher than the second spectral resolution.
The outputs of blocks 940a, 940b are, for example, input into frequency-time converters 961, 962. These converters can be a DFT or any other transform, and typically also comprise a subsequent synthesis window processing and a further overlap-add operation.
Additionally, the filling signal generator receives the energy normalization factor and, preferably, the compressed energy normalization factor, and this factor is used for generating a correctly leveled/weighted filling signal spectral line for blocks 940a and 940b.
Subsequently, a preferred implementation of blocks 940a, 940b is given. Both blocks comprise the calculation 941a, 941b of a phase rotation factor and the calculation of a first weight for the spectral line of the decoded base channel as indicated by 942a and 942b. Furthermore, both blocks comprise the calculation 943a and 943b of the second weight for the spectral line of the filling signal.
Furthermore, the filling signal generator 800 receives the energy normalization factor generated by block 945. This block 945 receives the filling signal per band and the base channel signal per band and, then, calculates the same energy normalization factor used for all lines in a band.
Finally, this data is forwarded to the processor 946 for calculating the spectral lines for the first and the second upmix channels. To this end, the processor 946 receives the data from blocks 941a, 941b, 942a, 942b, 943a, 943b and the spectral line for the decoded base channel and the spectral line for the filling signal. The output of block 946 is then a corresponding spectral line for the first and the second upmix channel.
Subsequently, preferred implementations of a decoder are given.
Reference Decoder
A DFT based decoder for reference is specified which corresponds to the encoder described above. The time-frequency transform from the encoder is applied to the decoded downmix, yielding time-frequency vectors $\tilde{M}_{t,k}$. Using the dequantized values $\widetilde{\mathrm{IPD}}_{t,b}$, $\tilde{g}_{t,b}$, and $\tilde{r}_{t,b}$, left and right channel are calculated as
(12) $\tilde{L}_{t,k} = \frac{e^{i\beta}\left(\tilde{M}_{t,k}(1 + \tilde{g}_{t,b}) + \tilde{r}_{t,b}\, g_{norm}\, \tilde{\rho}_{t,k}\right)}{\sqrt{2}}$
and
(13) $\tilde{R}_{t,k} = \frac{e^{i(\beta - \widetilde{\mathrm{IPD}}_{t,b})}\left(\tilde{M}_{t,k}(1 - \tilde{g}_{t,b}) - \tilde{r}_{t,b}\, g_{norm}\, \tilde{\rho}_{t,k}\right)}{\sqrt{2}}$
for $k \in I_b$, where $\tilde{\rho}_{t,k}$ is a substitute for the missing residual $p_{t,k}$ from the encoder, and $g_{norm}$ is the energy normalizing factor
(14) $g_{norm} = \sqrt{\frac{E_{\tilde{M},t,b}}{E_{\tilde{\rho},t,b}}}$
which turns the relative residual prediction gain $\tilde{r}_{t,b}$ into an absolute gain. A simple choice for $\tilde{\rho}_{t,k}$ would be
(15) $\tilde{\rho}_{t,k} = \tilde{M}_{t-d_b,k}$,
where $d_b > 0$ denotes a band-wise frame-delay, but this has certain drawbacks, namely
• $\tilde{M}_t$ and $\tilde{\rho}_t$ can have very different spectral and temporal shapes,
• even in the case of matching spectral and temporal envelopes, the use of (15) in (12) and (13) induces a frequency dependent ILD and IPD, which varies only slowly in the low to mid frequency range. This causes problems e.g. for tonal items,
• for speech signals, the delay should be chosen small in order to stay below the echo threshold, but this causes strong coloration due to comb-filtering.
It is therefore better to use time-frequency bins of the artificial signal which is described below.
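The phase rotation factor $\beta$ is again calculated as
(16) $\beta = \operatorname{atan2}\left(\sin(\widetilde{\mathrm{IPD}}_{t,b}),\ \cos(\widetilde{\mathrm{IPD}}_{t,b}) + 2\,\frac{1 + \tilde{g}_{t,b}}{1 - \tilde{g}_{t,b}}\right)$.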
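The upmix of equations (12) to (16) can be sketched per band as follows. This is a minimal Python illustration, not the decoder implementation: the function names, the example bins and the parameter values are hypothetical, and the right channel uses the weight $(1 - \tilde{g}_{t,b})$ that follows from inverting the mid/side definition of the encoder. The helper `downmix_band` re-applies the encoder downmix to the upmixed channels as a consistency check.

```python
import cmath
import math

def phase_rotation(ipd, g):
    """Absolute phase rotation beta, eq. (16); assumes g < 1."""
    return math.atan2(math.sin(ipd), math.cos(ipd) + 2 * (1 + g) / (1 - g))

def upmix_band(M, rho, ipd, g, r):
    """Left/right bins of one band from downmix M and filling signal rho."""
    e_m = sum(abs(m) ** 2 for m in M)
    e_rho = sum(abs(p) ** 2 for p in rho)
    g_norm = math.sqrt(e_m / e_rho) if e_rho > 0 else 0.0   # eq. (14)
    beta = phase_rotation(ipd, g)
    rot_l = cmath.exp(1j * beta)
    rot_r = cmath.exp(1j * (beta - ipd))
    s2 = math.sqrt(2)
    left = [rot_l * (m * (1 + g) + r * g_norm * p) / s2     # eq. (12)
            for m, p in zip(M, rho)]
    right = [rot_r * (m * (1 - g) - r * g_norm * p) / s2    # eq. (13)
             for m, p in zip(M, rho)]
    return left, right

def downmix_band(left, right, ipd, g):
    """Encoder-style mid signal of the upmixed channels (consistency check)."""
    beta = phase_rotation(ipd, g)
    rot_l = cmath.exp(-1j * beta)
    rot_r = cmath.exp(1j * (ipd - beta))
    return [(rot_l * l + rot_r * x) / math.sqrt(2) for l, x in zip(left, right)]

# Hypothetical decoded downmix bins, filling-signal bins and parameters.
M = [1 + 1j, 0.5j, -0.4 + 0.1j]
rho = [0.2 - 0.1j, -0.3 + 0.4j, 0.05 + 0.2j]
ipd, g, r = 0.7, 0.3, 0.4

left, right = upmix_band(M, rho, ipd, g, r)
```

Because the filling signal enters the two channels with opposite sign, downmixing `left` and `right` again returns `M` regardless of the filling signal, so the filling signal only affects the side component.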
Synthetic Signal Generation
For replacing missing residual parts in the stereo upmix, a second signal is generated from the time-domain input signal $\tilde{m}$, outputting a second signal $\tilde{m}_F$. The design constraint for this filter is to have a short, dense impulse response. This is achieved by applying several stages of basic allpass filters obtained by nesting two Schroeder allpass filters into a third Schroeder filter, i.e.
(17) $B(z) = H\left(\left(z^{-d_3} S(z)\right)^{-1}\right)$,
where
(18) $S(z) = \frac{g_1 + z^{-d_1}}{1 - g_1 z^{-d_1}} \cdot \frac{g_2 + z^{-d_2}}{1 - g_2 z^{-d_2}}$
and
(19) $H(z) = \frac{g_3 + z^{-1}}{1 - g_3 z^{-1}}$.
These elementary allpass filters
(20) $\frac{g + z^{-d}}{1 - g z^{-d}}$
have been proposed by Schroeder in the context of artificial reverb generation, where they are applied with both large gains and large delays. Since it is not desirable in this context to have a reverberant output signal, gains and delays are chosen to be rather small. Similarly to the reverb case, a dense and random-like impulse response is best obtained by choosing delays $d_i$ that are pairwise coprime for all allpass filters.
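The nesting of two elementary allpass units into a third one can be sketched in the time domain as follows. This Python example is a sketch under stated assumptions: the gains and delays are hypothetical small values with pairwise coprime delays, not the values of Table 1, and the units use the textbook sign convention $(g + z^{-d})/(1 + g z^{-d})$, whose magnitude response is exactly one. That convention is an assumption and differs in the denominator sign from the form printed in (18) to (20).

```python
from collections import deque

class Allpass:
    """Elementary allpass (g + z^-d) / (1 + g z^-d); assumed sign convention."""

    def __init__(self, gain, delay):
        self.gain = gain
        self.state = deque([0.0] * delay, maxlen=delay)

    def step(self, x):
        delayed = self.state[0]            # internal state u[n - d]
        u = x - self.gain * delayed        # feedback path
        self.state.append(u)               # oldest value drops out
        return self.gain * u + delayed     # feedforward path

class NestedAllpass:
    """Outer allpass whose unit delay is replaced by z^-d3 * S(z)."""

    def __init__(self, g1, d1, g2, d2, g3, d3):
        self.inner = (Allpass(g1, d1), Allpass(g2, d2))   # S(z), two cascaded units
        self.gain = g3
        self.state = deque([0.0] * d3, maxlen=d3)         # z^-d3

    def step(self, x):
        w = self.state[0]                  # v[n - d3]
        for unit in self.inner:            # pass through S(z)
            w = unit.step(w)
        v = x - self.gain * w
        self.state.append(v)
        return self.gain * v + w

def impulse_response(filt, length):
    return [filt.step(1.0 if n == 0 else 0.0) for n in range(length)]

# Hypothetical small gains and pairwise coprime delays.
unit = NestedAllpass(g1=0.2, d1=2, g2=0.25, d2=3, g3=0.3, d3=5)
h = impulse_response(unit, 4096)
energy = sum(v * v for v in h)
```

For an allpass filter the impulse response carries unit energy, so `energy` being close to one is a quick sanity check that the nested structure leaves the magnitude spectrum untouched while spreading the signal in time.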
The filter runs at a fixed sampling rate, regardless of the bandwidth or sampling rate of the signal that is delivered by the core coder. When used with the EVS coder, this is necessary since the bandwidth may be changed by a bandwidth detector during operation, and the fixed sampling rate guarantees a consistent output. The preferred sampling rate for the allpass filter is 32 kHz, the native super wide band sampling rate, since the absence of residual parts above 16 kHz is usually no longer audible. When used with the EVS coder, the signal is directly constructed from the core, which incorporates several resampling routines as displayed in Figure 1.
A filter that has been found to work well at 32 kHz sampling rate is
(21) $F(z) = \prod_{i=1}^{5} B_i(z)$,
where $B_i$ are basic allpass filters with gains and delays displayed in Table 1. The impulse response of this filter is depicted in Figure 6. For complexity reasons, one can also apply such a filter at lower sampling rates and/or reduce the number of basic allpass filter units.
The allpass filter unit also provides the functionality to overwrite parts of the input signal by zeros, which is encoder-controlled. This can for instance be used to delete attacks from the filter input.
COMPRESSION OF THE $g_{norm}$ FACTOR
To obtain a smoother output it has been found beneficial to apply a compressor to the energy-adjusting gain $g_{norm}$ which compresses the values towards one. This also compensates a bit for the fact that part of the ambience is typically lost after coding the downmix at lower bitrates.
Such a compressor can be constructed by taking
(22) $\tilde{g}_{norm} = \exp\left(f(\log(g_{norm}))\right)$,
where
(23) $f(t) = t - \int_0^t c(\tau)\,d\tau$
and the function $c$ satisfies
(24) $0 \le c(t) \le 1$.
The value of $c$ around $t$ then specifies how strongly this region is compressed, where the value 0 corresponds to no compression and the value 1 corresponds to total compression. Furthermore, the compression scheme is symmetric if $c$ is even, i.e., $c(t) = c(-t)$. One example is
(25) ??(??)= {1-??
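The construction in (22) to (24) can be illustrated with a short Python sketch. The function name `compress_gain`, the numerical integration with a fixed number of trapezoid steps and the constant compression functions used in the example are assumptions made for illustration; they do not reproduce the example function of equation (25).

```python
import math

def compress_gain(g_norm, c, steps=1000):
    """Compress g_norm towards one: exp(f(log g_norm)), f(t) = t - int_0^t c."""
    t = math.log(g_norm)                   # requires g_norm > 0
    h = t / steps                          # signed step, handles t < 0 as well
    integral = 0.0
    for i in range(steps):                 # trapezoidal rule on [0, t]
        a, b = i * h, (i + 1) * h
        integral += 0.5 * (c(a) + c(b)) * h
    return math.exp(t - integral)          # eqs. (22) and (23)

# Hypothetical constant compression functions satisfying 0 <= c(t) <= 1.
no_compression = compress_gain(4.0, lambda t: 0.0)
half_compression = compress_gain(4.0, lambda t: 0.5)
total_compression = compress_gain(4.0, lambda t: 1.0)
half_compression_below_one = compress_gain(0.25, lambda t: 0.5)
```

With a constant $c$, the compressor reduces to raising $g_{norm}$ to the power $1 - c$, so a gain of 4 becomes 2 for $c = 0.5$ and 1 for $c = 1$. An even $c$ treats gains above and below one symmetrically on the logarithmic scale.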