Abstract: A multi-channel audio decoder for providing at least two output audio signals on the basis of an encoded representation is configured to render a plurality of decoded audio signals which are obtained on the basis of the encoded representation in dependence on one or more rendering parameters to obtain a plurality of rendered audio signals. The multi-channel audio decoder is configured to derive one or more decorrelated audio signals from the rendered audio signals and to combine the rendered audio signals or a scaled version thereof with the one or more decorrelated audio signals to obtain the output audio signals. A multi-channel audio encoder provides a decorrelation method parameter to control an audio decoder.
Multi-Channel Audio Decoder, Multi-Channel Audio Encoder, Methods, Computer Program and Encoded Audio Representation using a Decorrelation of Rendered Audio Signals
Description
Technical Field
Embodiments according to the invention are related to a multi-channel audio decoder for providing at least two output audio signals on the basis of an encoded representation.
Further embodiments according to the invention are related to a multi-channel audio encoder for providing an encoded representation on the basis of at least two input audio signals.
Further embodiments according to the invention are related to a method for providing at least two output audio signals on the basis of an encoded representation.
Further embodiments according to the invention are related to a method for providing an encoded representation on the basis of at least two input audio signals.
Further embodiments according to the invention are related to a computer program for performing one of said methods.
Further embodiments according to the invention are related to an encoded audio representation.
Generally speaking, embodiments according to the present invention are related to a decorrelation concept for multi-channel downmix/upmix parametric audio object coding systems.
Background of the Invention
In recent years, demand for storage and transmission of audio contents has steadily increased. Moreover, the quality requirements for the storage and transmission of audio contents have also steadily increased. Accordingly, the concepts for the encoding and decoding of audio content have been enhanced.
For example, the so-called "Advanced Audio Coding" (AAC) has been developed, which is described, for example, in the international standard ISO/IEC 13818-7:2003. Moreover, some spatial extensions have been created, like, for example, the so-called "MPEG Surround" concept, which is described, for example, in the international standard ISO/IEC 23003-1:2007. Moreover, additional improvements for encoding and decoding of spatial information of audio signals are described in the international standard ISO/IEC 23003-2:2010, which relates to the so-called "Spatial Audio Object Coding".
Moreover, a switchable audio encoding/decoding concept which provides the possibility to encode both general audio signals and speech signals with good coding efficiency and to handle multi-channel audio signals is defined in the international standard ISO/IEC 23003-3:2012, which describes the so-called "Unified Speech and Audio Coding" concept.
Moreover, further conventional concepts are described in the references, which are mentioned at the end of the present description.
However, there is a desire to provide an even more advanced concept for an efficient coding and decoding of 3-dimensional audio scenes.
Summary of the Invention
An embodiment according to the invention creates a multi-channel audio decoder for providing at least two output audio signals on the basis of an encoded representation. The multi-channel audio decoder is configured to render a plurality of decoded audio signals, which are obtained on the basis of the encoded representation, in dependence on one or more rendering parameters, to obtain a plurality of rendered audio signals. The multi-channel audio decoder is configured to derive one or more decorrelated audio signals from the rendered audio signals. Moreover, the multi-channel audio decoder is configured to combine the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to obtain the output audio signals.
This embodiment according to the invention is based on the finding that audio quality can be improved in a multi-channel audio decoder by deriving one or more decorrelated audio signals from rendered audio signals, which are obtained on the basis of a plurality of decoded audio signals, and by combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to obtain the output audio signals. It has been found that it is more efficient to adjust the correlation characteristics, or the covariance characteristics, of the output audio signals by adding decorrelated signals after the rendering when compared to adding decorrelated signals before the rendering or during the rendering. It has been found that this concept is more efficient in general cases, in which there are more decoded audio signals, which are input to the rendering, than rendered audio signals, because more decorrelators would be required if the decorrelation was performed before the rendering or during the rendering. Moreover, it has been found that artifacts are often introduced when decorrelated signals are added to the decoded audio signals before the rendering, because the rendering typically brings along a combination of decoded audio signals. Accordingly, the concept according to the present embodiment of the invention outperforms conventional approaches, in which decorrelated signals are added before the rendering. For example, it is possible to directly estimate the desired correlation characteristics or covariance characteristics of the rendered signals, and to adapt the provision of decorrelated audio signals to the actually rendered signals, which results in a better tradeoff between efficiency and audio quality, and often even results in an increased efficiency and a better quality at the same time.
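For illustration only, the signal flow implied by this finding (parametric decoding, rendering, decorrelation after the rendering, combination) may be sketched as follows; all signal dimensions, the random rendering matrix and the delay-based toy decorrelator are assumptions of this sketch and not part of any specific embodiment:

```python
import numpy as np

rng = np.random.default_rng(0)

n_objects, n_out, n_samples = 4, 2, 1024
decoded = rng.standard_normal((n_objects, n_samples))  # decoded audio signals
R = 0.5 * rng.standard_normal((n_out, n_objects))      # rendering matrix (assumed)

# Rendering: typically fewer rendered signals than decoded signals.
Y = R @ decoded                                        # rendered audio signals

def toy_decorrelator(x, delay=7):
    """Stand-in decorrelator: a plain delay (real decorrelators are all-pass filters)."""
    return np.roll(x, delay, axis=-1)

# Decorrelation applied AFTER the rendering, on the rendered signals.
W = toy_decorrelator(Y)                                # decorrelated audio signals

# Combination: rendered signals weighted by P, decorrelated signals by M.
P = np.eye(n_out)
M = 0.1 * np.eye(n_out)
output = P @ Y + M @ W                                 # output audio signals
```

Note that only as many decorrelators as rendered signals are needed, rather than one per decoded object signal, which reflects the efficiency argument made above.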
In a preferred embodiment, the multi-channel audio decoder is configured to obtain the decoded audio signals, which are rendered to obtain the plurality of rendered audio signals, using a parametric reconstruction. It has been found that the concept according to the present invention brings along advantages in combination with a parametric reconstruction of audio signals, wherein the parametric reconstruction is, for example, based on a side information describing object signals and/or a relationship between object signals (wherein the object signals may constitute the decoded audio signals). For example, there may be a comparatively large number of object signals (decoded audio signals) in such a concept, and it has been found that the application of the decorrelation on the basis of the rendered audio signals is particularly efficient and avoids artifacts in such a scenario.
In a preferred embodiment, the decoded audio signals are reconstructed object signals (for example, parametrically reconstructed object signals) and the multi-channel audio
decoder is configured to derive the reconstructed object signals from the one or more downmix signals using a side information. Accordingly, the combination of the rendered audio signals with one or more decorrelated audio signals, which are based on the rendered audio signals, allows for an efficient reconstruction of correlation characteristics or covariance characteristics in the output audio signals, even if there is a comparatively large number of reconstructed object signals (which may be larger than a number of rendered audio signals or output audio signals).
In a preferred embodiment, the multi-channel audio decoder may be configured to derive un-mixing coefficients from the side information and to apply the un-mixing coefficients to derive the (parametrically) reconstructed object signals from the one or more downmix signals using the un-mixing coefficients. Accordingly, the input signals for the rendering may be derived from a side information, which may for example be an object-related side information (like, for example, an inter-object-correlation information or an object-level difference information, wherein the same result may be obtained by using absolute energies).
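As a minimal sketch of this un-mixing step, the coefficients may be applied to the downmix as a matrix multiplication; the random coefficient values below are placeholders for values that would, in practice, be derived from the transmitted side information:

```python
import numpy as np

rng = np.random.default_rng(1)

n_objects, n_downmix, n_samples = 4, 1, 512
downmix = rng.standard_normal((n_downmix, n_samples))  # decoded downmix signal(s)

# Un-mixing coefficients; in practice derived from the side information
# (e.g. object level differences and inter-object correlations).
U = rng.standard_normal((n_objects, n_downmix))

# Parametric reconstruction of the object signals from the downmix.
reconstructed_objects = U @ downmix
```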
In a preferred embodiment, the multi-channel audio decoder may be configured to combine the rendered audio signals with the one or more decorrelated audio signals, to at least partially achieve desired correlation characteristics or covariance characteristics of the output audio signals. It has been found that the combination of the rendered audio signals with the one or more decorrelated audio signals, which are derived from the rendered audio signals, allows for an adjustment (or reconstruction) of desired correlation characteristics or covariance characteristics. Moreover, it has been found that it is important for the auditory impression to have the proper correlation characteristics or covariance characteristics in the output audio signal, and that this can be achieved best by modifying the rendered audio signals using the decorrelated audio signals. For example, any degradations, which are caused in previous processing stages, may also be considered when combining the rendered audio signals and the decorrelated audio signals based on the rendered audio signals.
In a preferred embodiment, the multi-channel audio decoder may be configured to combine the rendered audio signals with the one or more decorrelated audio signals, to at least partially compensate for an energy loss during a parametric reconstruction of the decoded audio signals, which are rendered to obtain the plurality of rendered audio signals. It has been found that the post-rendering application of the decorrelated audio signals makes it possible to correct for signal imperfections which are caused by a processing before the rendering, for example, by the parametric reconstruction of the decoded audio signals. Consequently, it is not necessary to reconstruct correlation characteristics or covariance characteristics of the decoded audio signals, which are input into the rendering, with high accuracy. This simplifies the reconstruction of the decoded audio signals and therefore brings along a high efficiency.
In a preferred embodiment, the multi-channel audio decoder is configured to determine desired correlation characteristics or covariance characteristics of the output audio signals. Moreover, the multi-channel audio decoder is configured to adjust a combination of the rendered audio signals with the one or more decorrelated audio signals, to obtain the output audio signals, such that correlation characteristics or covariance characteristics of the obtained output audio signals approximate or equal the desired correlation characteristics or desired covariance characteristics. By computing (or determining) desired correlation characteristics or covariance characteristics of the output audio signals (which should be reached after the combination of the rendered audio signals with the decorrelated audio signals), it is possible to adjust the correlation characteristics or covariance characteristics at a late stage of the processing, which in turn allows for a relatively precise reconstruction. Accordingly, a spatial hearing impression of the output audio signals is well adapted to a desired hearing impression.
In a preferred embodiment, the multi-channel audio decoder may be configured to determine the desired correlation characteristics or desired covariance characteristics in dependence on a rendering information describing a rendering of the plurality of decoded audio signals, which are obtained on the basis of the encoded representation, to obtain the plurality of rendered audio signals. By considering the rendering process in the determination of the desired correlation characteristics or the desired covariance characteristics, it is possible to achieve a precise information for adjusting the combination of the rendered audio signals with the one or more decorrelated audio signals, which brings along the possibility to have output audio signals that match a desired hearing impression.
In a preferred embodiment, the multi-channel audio decoder may be configured to determine the desired correlation characteristics or desired covariance characteristics in dependence on an object correlation information or an object covariance information describing characteristics of a plurality of audio objects and/or a relationship between a plurality of audio objects. Accordingly, it is possible to restore correlation characteristics or covariance characteristics, which are adapted to the audio objects, at a late processing stage, namely after the rendering. Accordingly, the complexity for decoding the audio objects is reduced. Moreover, by considering the correlation characteristics or covariance characteristics of the audio objects after the rendering, a detrimental impact of the rendering can be avoided and the correlation characteristics or covariance characteristics can be reconstructed with good accuracy.
In a preferred embodiment, the multi-channel audio decoder is configured to determine the object correlation information or the object covariance information on the basis of a side information included in the encoded representation. Accordingly, the concept can be well-adapted to a spatial audio object coding approach, which uses side information.
In a preferred embodiment, the multi-channel audio decoder is configured to determine actual correlation characteristics or covariance characteristics of the rendered audio signals and to adjust the combination of the rendered audio signals with the one or more decorrelated audio signals, to obtain the output audio signals in dependence on the actual correlation characteristics or covariance characteristics of the rendered audio signals. Accordingly, imperfections in earlier processing stages, like, for example, an energy loss when reconstructing audio objects, or imperfections caused by the rendering, can be taken into consideration. Thus, the combination of the rendered audio signals with the one or more decorrelated audio signals can be adjusted in a very precise manner to the needs, such that the combination of the actual rendered audio signals with the decorrelated audio signals results in the desired characteristics.
In a preferred embodiment, the multi-channel audio decoder may be configured to combine the rendered audio signals with the one or more decorrelated audio signals, wherein the rendered audio signals are weighted using a first mixing matrix P and wherein the one or more decorrelated audio signals are weighted using a second mixing matrix M. This allows for a simple derivation of the output audio signals, wherein a linear combination operation is performed, which is described by the mixing matrix P, which is applied to the rendered audio signals, and the mixing matrix M, which is applied to the one or more decorrelated audio signals.
In a preferred embodiment, the multi-channel audio decoder is configured to adjust at least one out of the mixing matrix P and the mixing matrix M such that correlation
characteristics or covariance characteristics of the obtained output audio signals approximate or equal the desired correlation characteristics or desired covariance characteristics. Thus, there is a way to adjust one or more of the mixing matrices, which is typically possible with moderate effort and good results.
In a preferred embodiment, the multi-channel audio decoder is configured to jointly compute the mixing matrix P and the mixing matrix M. Accordingly, it is possible to obtain the mixing matrices such that the correlation characteristics or covariance characteristics of the obtained output audio signals can be set to approximate or equal the desired correlation characteristics or desired covariance characteristics. Moreover, when jointly computing the mixing matrix P and the mixing matrix M, some degrees of freedom are typically available, such that it is possible to best fit the mixing matrix P and the mixing matrix M to the requirements.
In a preferred embodiment, the multi-channel audio decoder is configured to obtain a combined mixing matrix F, which comprises the mixing matrix P and the mixing matrix M, such that a covariance matrix of the obtained output audio signals is equal to a desired covariance matrix.
In a preferred embodiment, the combined mixing matrix can be computed in accordance with the equations described below.
In a preferred embodiment, the multi-channel audio decoder may be configured to determine the combined mixing matrix F using matrices, which are determined using a singular value decomposition of a first covariance matrix, which describes the rendered audio signals and the decorrelated audio signals, and of a second covariance matrix, which describes desired covariance characteristics of the output audio signals. Using such a singular value decomposition constitutes a numerically efficient solution for determining the combined mixing matrix.
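A possible covariance-matching computation of a combined mixing matrix along these lines is sketched below, using singular value decompositions of symmetric covariance matrices to obtain matrix square roots; the row-selector construction and the regularization constant are assumptions of this sketch, not a normative procedure:

```python
import numpy as np

def psd_sqrt_and_inv_sqrt(C, eps=1e-9):
    """Square root and regularized inverse square root of a symmetric
    positive semi-definite matrix, obtained via its singular value decomposition."""
    U, s, _ = np.linalg.svd(C)
    sqrt_s = np.sqrt(s)
    C_half = U @ np.diag(sqrt_s) @ U.T
    C_inv_half = U @ np.diag(1.0 / np.maximum(sqrt_s, eps)) @ U.T
    return C_half, C_inv_half

def combined_mixing_matrix(C_in, C_des):
    """One possible combined mixing matrix F with F @ C_in @ F.T == C_des.
    C_in:  covariance of the stacked rendered and decorrelated signals.
    C_des: desired covariance of the output audio signals."""
    n_out = C_des.shape[0]
    _, C_in_inv_half = psd_sqrt_and_inv_sqrt(C_in)
    C_des_half, _ = psd_sqrt_and_inv_sqrt(C_des)
    G = np.eye(n_out, C_in.shape[0])  # selector matrix with orthonormal rows
    return C_des_half @ G @ C_in_inv_half
```

Any matrix of the form `C_des_half @ Q @ C_in_inv_half` with orthonormal rows of `Q` satisfies the covariance constraint; the trivial selector chosen here merely illustrates one point in the available degrees of freedom.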
In a preferred embodiment, the multi-channel audio decoder is configured to set the mixing matrix P to be an identity matrix, or a multiple thereof, and to compute the mixing matrix M. This avoids a mixing of different rendered audio signals, which helps to preserve a desired spatial impression. Moreover, the number of degrees of freedom is reduced.
In a preferred embodiment, the multi-channel audio decoder may be configured to determine the mixing matrix M such that a difference between a desired covariance matrix and a covariance matrix of the rendered audio signals approximates or equals the covariance of the one or more decorrelated signals after mixing with the mixing matrix M. Thus, a computationally simple concept for obtaining the mixing matrix M is given.
In a preferred embodiment, the multi-channel audio decoder may be configured to determine the mixing matrix M using matrices which are determined using a singular value decomposition of the difference between the desired covariance matrix and the covariance matrix of the rendered audio signals and of the covariance matrix of the one or more decorrelated signals. This is a computationally very efficient approach for determining the mixing matrix M.
In a preferred embodiment, the multi-channel audio decoder is configured to determine the mixing matrices P, M under the restriction that a given rendered audio signal is only mixed with a decorrelated version of the given rendered audio signal itself. This concept limits modifications of cross-correlation characteristics or cross-covariance characteristics to a small amount (for example, in the presence of imperfect decorrelators) or prevents such modifications entirely (for example, in the case of ideal decorrelators) and may therefore be desirable in some cases to avoid a change of a perceived object position. However, in the presence of non-ideal decorrelators, autocorrelation values (or autocovariance values) are explicitly modified, and the changes in the cross-terms are ignored.
In a preferred embodiment, the multi-channel audio decoder is configured to combine the rendered audio signals with the one or more decorrelated audio signals such that only autocorrelation values or autocovariance values of rendered audio signals are modified while cross-correlation characteristics or cross-covariance characteristics are left unmodified or modified with a small value (for example, in the presence of imperfect decorrelators). Again, a degradation of a perceived position of audio objects can be avoided. Moreover, the computational complexity can be reduced. However, for example, the cross-covariance values are modified as a consequence of the modification of the energies (autocorrelation values), but the cross-correlation values remain unmodified (they represent a normalized version of the cross-covariance values).
In a preferred embodiment, the multi-channel audio decoder is configured to set the mixing matrix P to be an identity matrix, or a multiple thereof, and to compute the mixing matrix M under the restriction that M is a diagonal matrix. Thus, a modification of cross-correlation characteristics or cross-covariance characteristics can be avoided or restricted to a small value (for example, in the presence of imperfect decorrelators).
In a preferred embodiment, the multi-channel audio decoder is configured to combine the rendered audio signals with the one or more decorrelated audio signals, to obtain the output audio signals, wherein a diagonal matrix M is applied to the one or more decorrelated audio signals W. In this case, the multi-channel audio decoder is configured to compute diagonal elements of the mixing matrix M such that diagonal elements of a covariance matrix of the output audio signals are equal to desired energies. Accordingly, an energy loss, which may be caused by the rendering operation and/or by the reconstruction of audio objects on the basis of one or more downmix signals and a spatial side information, can be compensated. Thus, a proper intensity of the output audio signals can be achieved.
In a preferred embodiment, the multi-channel audio decoder may be configured to compute the elements of the mixing matrix M in dependence on diagonal elements of a desired covariance matrix, diagonal elements of a covariance matrix of the rendered audio signals, and diagonal elements of a covariance matrix of the one or more decorrelated signals. Non-diagonal elements of the mixing matrix may be set to zero, and the desired covariance matrix may be computed on the basis of the rendering matrix used for the rendering operation and an object covariance matrix. Furthermore, a threshold value may be used to limit an amount of decorrelation added to the signals. This concept provides for a very computationally efficient determination of the elements of the mixing matrix M.
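A computationally simple instance of this per-channel computation is sketched below; the exact energy-matching rule and the gain threshold `max_gain` are assumptions of this sketch:

```python
import numpy as np

def diagonal_decorrelator_gains(c_des_diag, c_r_diag, c_w_diag,
                                max_gain=4.0, eps=1e-12):
    """Diagonal elements of the mixing matrix M chosen so that, per channel,
        c_des[i] ~= c_r[i] + m[i]**2 * c_w[i],
    i.e. the decorrelated signal fills up the missing energy.
    c_des_diag: desired energies (diagonal of the desired covariance matrix)
    c_r_diag:   energies of the rendered audio signals
    c_w_diag:   energies of the decorrelated audio signals
    max_gain:   threshold limiting the amount of added decorrelation (assumed)."""
    deficit = np.maximum(np.asarray(c_des_diag) - np.asarray(c_r_diag), 0.0)
    m = np.sqrt(deficit / np.maximum(np.asarray(c_w_diag), eps))
    return np.minimum(m, max_gain)
```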
In a preferred embodiment, the multi-channel audio decoder may be configured to consider correlation characteristics or covariance characteristics of the decorrelated audio signals when determining how to combine the rendered audio signals, or the scaled version thereof, with the one or more decorrelated audio signals. Accordingly, imperfections of the decorrelation can be considered.
In a preferred embodiment, the multi-channel audio decoder may be configured to mix rendered audio signals and decorrelated audio signals, such that a given output audio
signal is provided on the basis of two or more rendered audio signals and at least one decorrelated audio signal. By using this concept, cross-correlation characteristics can be efficiently adjusted without the need to introduce large amounts of decorrelated signals (which may degrade the auditory spatial impression).
In a preferred embodiment, the multi-channel audio decoder may be configured to switch between different modes, in which different restrictions are applied for determining how to combine the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to obtain the output audio signals. Accordingly, complexity and processing characteristics can be adjusted to the signals which are processed.
In a preferred embodiment, the multi-channel audio decoder may be configured to switch between a first mode, in which a mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, a second mode in which no mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, and in which it is allowed that a given decorrelated signal is combined, with same or different scaling, with a plurality of rendered audio signals, or a scaled version thereof, in order to adjust cross-correlation characteristics or cross-covariance characteristics of the output audio signals, and a third mode in which no mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, and in which it is not allowed that a given decorrelated signal is combined with rendered audio signals other than a rendered audio signal from which the given decorrelated signal is derived. Thus, both complexity and processing characteristics can be adjusted to the type of audio signal which is currently being rendered. Modifying only the auto-correlation characteristics or auto-covariance characteristics and not explicitly modifying the cross-correlation characteristics or cross-covariance characteristics may, for example, be helpful if a spatial impression of the audio signals would be degraded by such a modification, while it is nevertheless desirable to adjust intensities of the output audio signals. On the other hand, there are cases in which it is desirable to adjust cross-correlation characteristics or cross-covariance characteristics of the output audio signals. 
The multi-channel audio decoder mentioned here allows for such an adjustment, wherein in the first mode, it is possible to combine rendered audio signals, such that an amount (or intensity) of decorrelated signal components, which is required for adjusting the cross-correlation characteristics or cross-covariance
characteristics, is comparatively small. Thus, "localizable" signal components are used in the first mode to adjust the cross-correlation characteristics or cross-covariance characteristics. In contrast, in the second mode, decorrelated signals are used to adjust cross-correlation characteristics or cross-covariance characteristics, which naturally brings along a different hearing impression. Accordingly, by providing three different modes, the audio decoder can be well-adapted to the audio content being handled.
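The structural restrictions which the three modes impose on the mixing matrices P and M may be illustrated as follows; the mode numbering and the boolean-mask representation are purely illustrative:

```python
import numpy as np

def mixing_structure(mode, n_out):
    """Boolean masks of the entries of P (rendered signals) and M (decorrelated
    signals) that a given mode allows to be non-zero (illustrative only)."""
    full = np.ones((n_out, n_out), dtype=bool)
    diag = np.eye(n_out, dtype=bool)
    if mode == 1:    # mixing between different rendered signals is allowed
        return full, full
    if mode == 2:    # P only scales, but a decorrelator may feed several outputs
        return diag, full
    if mode == 3:    # each signal is only mixed with its own decorrelated version
        return diag, diag
    raise ValueError("unknown mode")
```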
In a preferred embodiment, the multi-channel audio decoder is configured to evaluate a bitstream element of the encoded representation indicating which of the three modes for combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals is to be used, and to select the mode in dependence on said bitstream element. Accordingly, an audio encoder can signal an appropriate mode in dependence on its knowledge of the audio contents. Thus, a maximum quality of the output audio signals can be achieved under any circumstance.
An embodiment according to the invention creates a multi-channel audio encoder for providing an encoded representation on the basis of at least two input audio signals. The multi-channel audio encoder is configured to provide one or more downmix signals on the basis of the at least two input audio signals. Moreover, the multi-channel audio encoder is configured to provide one or more parameters describing a relationship between the at least two input audio signals. In addition, the multi-channel audio encoder is configured to provide a decorrelation method parameter describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder. Accordingly, the multi-channel audio encoder can control the audio decoder to use an appropriate decorrelation mode, which is well adapted to the type of audio signal which is currently encoded. Thus, the multi-channel audio encoder described here is well-adapted for cooperation with the multi-channel audio decoder discussed before.
In a preferred embodiment, the multi-channel audio encoder is configured to selectively provide the decorrelation method parameter, to signal one out of the following three modes for the operation of an audio decoder: a first mode, in which a mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, a second mode in which no mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, and in which it is allowed that a given decorrelated audio
signal is combined, with same or different scaling, with a plurality of rendered audio signals, or a scaled version thereof, in order to adjust cross-correlation characteristics or cross-covariance characteristics of the output audio signals, and a third mode in which no mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, and in which it is not allowed that a given decorrelated audio signal is combined with rendered audio signals other than a rendered audio signal from which the given decorrelated audio signal is derived. Thus, the multi-channel audio encoder can switch a multi-channel audio decoder through the above discussed three modes in dependence on the audio content, wherein the mode in which the multi-channel audio decoder is operated can be well-adapted by the multi-channel audio encoder to the type of audio content currently encoded. However, in some embodiments, only one or two of the above mentioned three modes for the operation of the audio decoder may be used (or may be available).
In a preferred embodiment, the multi-channel audio encoder is configured to select the decorrelation method parameter in dependence on whether the input audio signals comprise a comparatively high correlation or a comparatively lower correlation. Thus, an adaptation of the decorrelation, which is used in the decoder, can be made on the basis of an important characteristic of the audio signals which are currently encoded.
In a preferred embodiment, the multi-channel audio encoder is configured to select the decorrelation method parameter to designate the first mode or the second mode if a correlation or covariance between the input audio signals is comparatively high, and to select the decorrelation method parameter to designate the third mode if a correlation or covariance between the input audio signals is comparatively lower. Accordingly, in the case of comparatively small correlation or covariance between the input audio signals, a decoding mode is chosen in which there is no correction of cross-covariance characteristics or cross-correlation characteristics. It has been found that this is an efficient choice for signals having a comparatively low correlation (or covariance), since such signals are substantially independent, which eliminates the need for an adaptation of cross-correlations or cross-covariances. Rather, an adjustment of cross-correlations or cross-covariances for substantially independent input audio signals (having a comparatively small correlation or covariance) would typically degrade an audio quality and at the same time increase a decoding complexity. Thus, this concept allows for a
reasonable adaptation of the multi-channel audio decoder to the signal input into the multi-channel audio encoder.
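A hypothetical encoder-side decision rule of this kind could evaluate the maximum pairwise correlation of the input audio signals; the threshold value and the returned mode numbers are assumptions of this sketch:

```python
import numpy as np

def select_decorrelation_method(signals, threshold=0.6):
    """Choose a decorrelation method parameter from the input correlation.
    signals: array of shape (n_signals, n_samples).
    Returns 3 (no cross-term correction) for nearly independent inputs,
    otherwise 2 (cross-correlation adjustment at the decoder side)."""
    C = np.corrcoef(signals)                       # normalized correlation matrix
    off_diag = C[~np.eye(C.shape[0], dtype=bool)]  # off-diagonal entries
    return 2 if np.max(np.abs(off_diag)) >= threshold else 3
```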
An embodiment according to the invention creates a method for providing at least two output audio signals on the basis of an encoded representation. The method comprises rendering a plurality of decoded audio signals, which are obtained on the basis of the encoded representation, in dependence on one or more rendering parameters, to obtain a plurality of rendered audio signals. The method also comprises deriving one or more decorrelated audio signals from the rendered audio signals and combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to obtain the output audio signals. This method is based on the same considerations as the above described multi-channel audio decoder. Moreover, the method can be supplemented by any of the features and functionalities discussed above with respect to the multi-channel audio decoder.
Another embodiment according to the invention creates a method for providing an encoded representation on the basis of at least two input audio signals. The method comprises providing one or more downmix signals on the basis of the at least two input audio signals, providing one or more parameters describing a relationship between the at least two input audio signals, and providing a decorrelation method parameter describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder. This method is based on the same considerations as the above described multi-channel audio encoder. Moreover, the method can be supplemented by any of the features and functionalities described herein with respect to the multi-channel audio encoder.
Another embodiment according to the invention creates a computer program for performing one or more of the methods described above.
Another embodiment according to the invention creates an encoded audio representation, comprising an encoded representation of a downmix signal, an encoded representation of one or more parameters describing a relationship between the at least two input audio signals, and an encoded decorrelation method parameter describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder. This encoded audio representation allows to signal an appropriate decorrelation mode and therefore helps to implement the advantages described with respect to the multi-channel audio encoder and the multi-channel audio decoder.
Brief Description of the Figures
Embodiments according to the present invention will subsequently be described taking reference to the enclosed figures in which:
Fig. 1 shows a block schematic diagram of a multi-channel audio decoder, according to an embodiment of the present invention;
Fig. 2 shows a block schematic diagram of a multi-channel audio encoder, according to an embodiment of the present invention;
Fig. 3 shows a flowchart of a method for providing at least two output audio signals on the basis of an encoded representation, according to an embodiment of the invention;
Fig. 4 shows a flowchart of a method for providing an encoded representation on the basis of at least two input audio signals, according to an embodiment of the present invention;
Fig. 5 shows a schematic representation of an encoded audio representation, according to an embodiment of the present invention;
Fig. 6 shows a block schematic diagram of a multi-channel decorrelator, according to an embodiment of the present invention;
Fig. 7 shows a block schematic diagram of a multi-channel audio decoder, according to an embodiment of the present invention;
Fig. 8 shows a block schematic diagram of a multi-channel audio encoder, according to an embodiment of the present invention;
Fig. 9 shows a flowchart of a method for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals, according to an embodiment of the present invention;
Fig. 10 shows a flowchart of a method for providing at least two output audio signals on the basis of an encoded representation, according to an embodiment of the present invention;
Fig. 11 shows a flowchart of a method for providing an encoded representation on the basis of at least two input audio signals, according to an embodiment of the present invention;
Fig. 12 shows a schematic representation of an encoded representation, according to an embodiment of the present invention;
Fig. 13 shows a schematic representation which provides an overview of an MMSE-based parametric downmix/upmix concept;
Fig. 14 shows a geometric representation of an orthogonality principle in 3-dimensional space;
Fig. 15 shows a block schematic diagram of a parametric reconstruction system with decorrelation applied on the rendered output, according to an embodiment of the present invention;
Fig. 16 shows a block schematic diagram of a decorrelation unit;
Fig. 17 shows a block schematic diagram of a reduced complexity decorrelation unit, according to an embodiment of the present invention;
Fig. 18 shows a table representation of loudspeaker positions, according to an embodiment of the present invention;
Figs. 19a to 19g show table representations of premixing coefficients for N = 22 and K between 5 and 11;
Figs. 20a to 20d show table representations of premixing coefficients for N = 10 and K between 2 and 5;
Figs. 21a to 21c show table representations of premixing coefficients for N = 8 and K between 2 and 4;
Figs. 21d to 21f show table representations of premixing coefficients for N = 7 and K between 2 and 4;
Figs. 22a and 22b show table representations of premixing coefficients for N = 5 and K = 2 or K = 3;
Fig. 23 shows a table representation of premixing coefficients for N = 2 and K = 1;
Fig. 24 shows a table representation of groups of channel signals;
Fig. 25 shows a syntax representation of additional parameters, which may be included into the syntax of SAOCSpecificConfig() or, equivalently, SAOC3DSpecificConfig();
Fig. 26 shows a table representation of different values for the bitstream variable bsDecorrelationMethod;
Fig. 27 shows a table representation of a number of decorrelators for different decorrelation levels and output configurations, indicated by the bitstream variable bsDecorrelationLevel;
Fig. 28 shows, in the form of a block schematic diagram, an overview of a 3D audio encoder;
Fig. 29 shows, in the form of a block schematic diagram, an overview of a 3D audio decoder;
Fig. 30 shows a block schematic diagram of a structure of a format converter;
Fig. 31 shows a block schematic diagram of a downmix processor, according to an embodiment of the present invention;
Fig. 32 shows a table representing decoding modes for different numbers of SAOC downmix objects; and
Fig. 33 shows a syntax representation of a bitstream "SAOC3DSpecificConfig".
Detailed Description of the Embodiments
1. Multi-channel audio decoder according to Fig. 1
Fig. 1 shows a block schematic diagram of a multi-channel audio decoder 100, according to an embodiment of the present invention.
The multi-channel audio decoder 100 is configured to receive an encoded representation 110 and to provide, on the basis thereof, at least two output audio signals 112, 114.
The multi-channel audio decoder 100 preferably comprises a decoder 120 which is configured to provide decoded audio signals 122 on the basis of the encoded representation 110. Moreover, the multi-channel audio decoder 100 comprises a renderer 130, which is configured to render a plurality of decoded audio signals 122, which are obtained on the basis of the encoded representation 110 (for example, by the decoder 120), in dependence on one or more rendering parameters 132, to obtain a plurality of rendered audio signals 134, 136. Moreover, the multi-channel audio decoder 100 comprises a decorrelator 140, which is configured to derive one or more decorrelated audio signals 142, 144 from the rendered audio signals 134, 136. Moreover, the multi-channel audio decoder 100 comprises a combiner 150, which is configured to combine the rendered audio signals 134, 136, or a scaled version thereof, with the one or more decorrelated audio signals 142, 144 to obtain the output audio signals 112, 114.
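The decoder signal flow just described (render, derive decorrelated signals, combine) may be sketched as follows. This is a minimal illustration only: the rendering matrix, the one-sample-delay "decorrelator" and the fixed wet/dry combination are placeholders and do not reflect the actual decorrelation or combination rules of the embodiments.

```python
# Minimal sketch of the signal flow of Fig. 1: renderer 130 ->
# decorrelator 140 -> combiner 150. Signals are plain lists of samples.

def render(decoded, R):
    """Apply a rendering matrix R (outputs x inputs) to the decoded signals."""
    n = len(decoded[0])
    return [[sum(R[o][i] * decoded[i][t] for i in range(len(decoded)))
             for t in range(n)]
            for o in range(len(R))]

def decorrelate(signal):
    """Placeholder decorrelator: a one-sample delay (illustrative only)."""
    return [0.0] + signal[:-1]

def combine(rendered, decorrelated, wet=0.5):
    """Mix each rendered signal with its decorrelated counterpart."""
    return [[(1.0 - wet) * r_t + wet * d_t for r_t, d_t in zip(r, d)]
            for r, d in zip(rendered, decorrelated)]

decoded = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
R = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # render 2 objects to 3 channels
rendered = render(decoded, R)
output = combine(rendered, [decorrelate(s) for s in rendered])
```

Note that the decorrelator operates on the rendered signals, so its cost scales with the number of output channels, independently of the number of decoded signals 122, which is the efficiency argument made below.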
However, it should be noted that a different hardware structure of the multi-channel audio decoder 100 may be possible, as long as the functionalities described above are given.
Regarding the functionality of the multi-channel audio decoder 100, it should be noted that the decorrelated audio signals 142, 144 are derived from the rendered audio signals 134, 136, and that the decorrelated audio signals 142, 144 are combined with the rendered audio signals 134, 136 to obtain the output audio signals 112, 114. By deriving the decorrelated audio signals 142, 144 from the rendered audio signals 134, 136, a particularly efficient processing can be achieved, since the number of rendered audio signals 134, 136 is typically independent from the number of decoded audio signals 122 which are input into the renderer 130. Thus, the decorrelation effort is typically independent from the number of decoded audio signals 122, which improves the implementation efficiency. Moreover, applying the decorrelation after the rendering avoids the introduction of artifacts, which could be caused by the renderer when combining multiple decorrelated signals in the case that the decorrelation is applied before the rendering. Moreover, characteristics of the rendered audio signals can be considered in the decorrelation performed by the decorrelator 140, which typically results in output audio signals of good quality.
Moreover, it should be noted that the multi-channel audio decoder 100 can be supplemented by any of the features and functionalities described herein. In particular, it should be noted that individual improvements as described herein may be introduced into the multi-channel audio decoder 100 in order to thereby even improve the efficiency of the processing and/or the quality of the output audio signals.
2. Multi-Channel Audio Encoder According to Fig. 2
Fig. 2 shows a block schematic diagram of a multi-channel audio encoder 200, according to an embodiment of the present invention. The multi-channel audio encoder 200 is configured to receive two or more input audio signals 210, 212, and to provide, on the basis thereof, an encoded representation 214. The multi-channel audio encoder comprises a downmix signal provider 220, which is configured to provide one or more downmix signals 222 on the basis of the at least two input audio signals 210, 212. Moreover, the multi-channel audio encoder 200 comprises a parameter provider 230, which is configured to provide one or more parameters 232 describing a relationship (for example, a cross-correlation, a cross-covariance, a level difference or the like) between the at least two input audio signals 210, 212.
Moreover, the multi-channel audio encoder 200 also comprises a decorrelation method parameter provider 240, which is configured to provide a decorrelation method parameter 242 describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder. The one or more downmix signals 222, the one or more parameters 232 and the decorrelation method parameter 242 are included, for example, in an encoded form, into the encoded representation 214.
However, it should be noted that the hardware structure of the multi-channel audio encoder 200 may be different, as long as the functionalities as described above are fulfilled. In other words, the distribution of the functionalities of the multi-channel audio encoder 200 to individual blocks (for example, to the downmix signal provider 220, to the parameter provider 230 and to the decorrelation method parameter provider 240) should only be considered as an example.
Regarding the functionality of the multi-channel audio encoder 200, it should be noted that the one or more downmix signals 222 and the one or more parameters 232 are provided in a conventional way, for example like in an SAOC multi-channel audio encoder or in a USAC multi-channel audio encoder. However, the decorrelation method parameter 242, which is also provided by the multi-channel audio encoder 200 and included into the encoded representation 214, can be used to adapt a decorrelation mode to the input audio signals 210, 212 or to a desired playback quality. Accordingly, the decorrelation mode can be adapted to different types of audio content. For example, different decorrelation modes can be chosen for types of audio content in which the input audio signals 210, 212 are strongly correlated and for types of audio content in which the input audio signals 210, 212 are independent. Moreover, different decorrelation modes can, for example, be signaled by the decorrelation method parameter 242 for types of audio content in which a spatial perception is particularly important and for types of audio content in which a spatial impression is less important or even of subordinate importance (for example, when compared to a reproduction of individual channels). Accordingly, a multi-channel audio decoder, which receives the encoded representation 214, can be controlled by the multi-channel audio encoder 200, and may be set to a decoding mode which brings along a best possible compromise between decoding complexity and reproduction quality.
Moreover, it should be noted that the multi-channel audio encoder 200 may be supplemented by any of the features and functionalities described herein. It should be
noted that the possible additional features and improvements described herein may be added to the multi-channel audio encoder 200 individually or in combination, to thereby improve (or enhance) the multi-channel audio encoder 200.
3. Method for Providing at Least Two Output Audio Signals According to Fig. 3
Fig. 3 shows a flowchart of a method 300 for providing at least two output audio signals on the basis of an encoded representation. The method comprises rendering 310 a plurality of decoded audio signals, which are obtained on the basis of an encoded representation 312, in dependence on one or more rendering parameters, to obtain a plurality of rendered audio signals. The method 300 also comprises deriving 320 one or more decorrelated audio signals from the rendered audio signals. The method 300 also comprises combining 330 the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to obtain the output audio signals 332.
It should be noted that the method 300 is based on the same considerations as the multichannel audio decoder 100 according to Fig. 1. Moreover, it should be noted that the method 300 may be supplemented by any of the features and functionalities described herein (either individually or in combination). For example, the method 300 may be supplemented by any of the features and functionalities described with respect to the multi-channel audio decoders described herein.
4. Method for Providing an Encoded Representation According to Fig. 4
Fig. 4 shows a flowchart of a method 400 for providing an encoded representation on the basis of at least two input audio signals. The method 400 comprises providing 410 one or more downmix signals on the basis of at least two input audio signals 412. The method 400 further comprises providing 420 one or more parameters describing a relationship between the at least two input audio signals 412 and providing 430 a decorrelation method parameter describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder. Accordingly, an encoded representation 432 is provided, which preferably includes an encoded representation of the one or more downmix signals, one or more parameters describing a relationship between the at least two input audio signals, and the decorrelation method parameter.
It should be noted that the method 400 is based on the same considerations as the multichannel audio encoder 200 according to Fig. 2, such that the above explanations also apply.
Moreover, it should be noted that the order of the steps 410, 420, 430 can be varied flexibly, and that the steps 410, 420, 430 may also be performed in parallel as far as this is possible in an execution environment for the method 400. Moreover, it should be noted that the method 400 can be supplemented by any of the features and functionalities described herein, either individually or in combination. For example, the method 400 may be supplemented by any of the features and functionalities described herein with respect to the multi-channel audio encoders. However, it is also possible to introduce features and functionalities which correspond to the features and functionalities of the multi-channel audio decoders described herein, which receive the encoded representation 432.
5. Encoded Audio Representation According to Fig. 5
Fig. 5 shows a schematic representation of an encoded audio representation 500 according to an embodiment of the present invention.
The encoded audio representation 500 comprises an encoded representation 510 of a downmix signal, an encoded representation 520 of one or more parameters describing a relationship between at least two audio signals. Moreover, the encoded audio representation 500 also comprises an encoded decorrelation method parameter 530 describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder. Accordingly, the encoded audio representation allows to signal a decorrelation mode from an audio encoder to an audio decoder. Accordingly, it is possible to obtain a decorrelation mode which is well-adapted to the characteristics of the audio content (which is described, for example, by the encoded representation 510 of one or more downmix signals and by the encoded representation 520 of one or more parameters describing a relationship between at least two audio signals (for example, the at least two audio signals which have been downmixed into the encoded representation 510 of one or more downmix signals)). Thus, the encoded audio representation 500 allows for a rendering of an audio content represented by the encoded audio representation 500 with a particularly good auditory spatial impression and/or a particularly good tradeoff between auditory spatial impression and decoding complexity.
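The three constituents of the encoded audio representation 500 can be pictured as a simple container. The following fragment is a hypothetical illustration only; the field names and types are chosen for the example and do not reproduce any normative bitstream syntax.

```python
# Hypothetical container mirroring the three parts of the encoded audio
# representation 500 of Fig. 5 (downmix 510, parameters 520, method 530).
# Field names and types are illustrative, not bitstream syntax.
from dataclasses import dataclass

@dataclass
class EncodedAudioRepresentation:
    downmix: bytes                  # encoded representation of the downmix signal(s)
    relationship_parameters: bytes  # encoded parameters relating the input signals
    decorrelation_method: int       # designates one of several decoder-side modes

# an encoder would fill these fields; a decoder reads the method index
# and configures its decorrelation processing accordingly
example = EncodedAudioRepresentation(b"", b"", 2)
```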
Moreover, it should be noted that the encoded representation 500 may be supplemented by any of the features and functionalities described with respect to the multi-channel audio encoders and the multi-channel audio decoders, either individually or in combination.
6. Multi-Channel Decorrelator According to Fig. 6
Fig. 6 shows a block schematic diagram of a multi-channel decorrelator 600, according to an embodiment of the present invention.
The multi-channel decorrelator 600 is configured to receive a first set of N decorrelator input signals 610a to 610n and provide, on the basis thereof, a second set of N' decorrelator output signals 612a to 612n'. In other words, the multi-channel decorrelator 600 is configured for providing a plurality of (at least approximately) decorrelated signals 612a to 612n' on the basis of the decorrelator input signals 610a to 610n.
The multi-channel decorrelator 600 comprises a premixer 620, which is configured to premix the first set of N decorrelator input signals 610a to 610n into a second set of K decorrelator input signals 622a to 622k, wherein K is smaller than N (with K and N being integers). The multi-channel decorrelator 600 also comprises a decorrelation (or decorrelator core) 630, which is configured to provide a first set of K' decorrelator output signals 632a to 632k' on the basis of the second set of K decorrelator input signals 622a to 622k. Moreover, the multi-channel decorrelator comprises a postmixer 640, which is configured to upmix the first set of K' decorrelator output signals 632a to 632k' into a second set of N' decorrelator output signals 612a to 612n', wherein N' is larger than K' (with N' and K' being integers).
However, it should be noted that the given structure of the multi-channel decorrelator 600 should be considered as an example only, and that it is not necessary to subdivide the multi-channel decorrelator 600 into functional blocks (for example, into the premixer 620, the decorrelation or decorrelator core 630 and the postmixer 640) as long as the functionality described herein is provided.
Regarding the functionality of the multi-channel decorrelator 600, it should also be noted that the concept of performing a premixing, to derive the second set of K decorrelator input signals from the first set of N decorrelator input signals, and of performing the decorrelation on the basis of the (premixed or "downmixed") second set of K decorrelator input signals brings along a reduction of complexity when compared to a concept in which the actual decorrelation is applied, for example, directly to N decorrelator input signals. Moreover, the second (upmixed) set of N' decorrelator output signals is obtained on the basis of the first (original) set of decorrelator output signals, which are the result of the actual decorrelation, on the basis of a postmixing, which may be performed by the postmixer 640. Thus, the multi-channel decorrelator 600 effectively (when seen from the outside) receives N decorrelator input signals and provides, on the basis thereof, N' decorrelator output signals, while the actual decorrelator core 630 only operates on a smaller number of signals (namely, the K downmixed decorrelator input signals 622a to 622k of the second set of K decorrelator input signals). Thus, the complexity of the multi-channel decorrelator 600 can be substantially reduced, when compared to conventional decorrelators, by performing a downmixing or "premixing" (which may preferably be a linear premixing without any decorrelation functionality) at an input side of the decorrelation (or decorrelator core) 630 and by performing the upmixing or "postmixing" (for example, a linear upmixing without any additional decorrelation functionality) on the basis of the (original) output signals 632a to 632k' of the decorrelation (decorrelator core) 630.
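The premix/decorrelate/postmix structure can be sketched in a few lines. The mixing matrices and the delay-based "core" below are illustrative placeholders chosen for the example; the point of the sketch is only that the costly core processing runs on K signals rather than N.

```python
# Sketch of the complexity-reduced multi-channel decorrelator of Fig. 6:
# premixer 620 (N -> K), K decorrelator cores 630, postmixer 640 (K' -> N').

def mix(signals, M):
    """Linear mix: each row of M produces one weighted sum of the inputs."""
    length = len(signals[0])
    return [[sum(M[r][c] * signals[c][t] for c in range(len(signals)))
             for t in range(length)]
            for r in range(len(M))]

def decorrelator_core(signal, delay=1):
    """Placeholder decorrelator core (a simple delay, illustrative only)."""
    return [0.0] * delay + signal[:-delay]

def multi_channel_decorrelate(inputs, premix, postmix):
    premixed = mix(inputs, premix)                     # N -> K signals
    core_out = [decorrelator_core(s) for s in premixed]  # only K cores run
    return mix(core_out, postmix)                      # K' -> N' signals

# N = 4 inputs handled with only K = 2 decorrelator cores
inputs = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0], [3.0, 4.0]]
premix = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]       # 4 -> 2
postmix = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # 2 -> 4
outputs = multi_channel_decorrelate(inputs, premix, postmix)
```

Seen from the outside, four signals go in and four come out, yet only two decorrelator cores were executed, which is the complexity saving described above.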
Moreover, it should be noted that the multi-channel decorrelator 600 can be supplemented by any of the features and functionalities described herein with respect to the multi-channel decorrelation and also with respect to the multi-channel audio decoders. It should be noted that the features described herein can be added to the multi-channel decorrelator 600 either individually or in combination, to thereby improve or enhance the multi-channel decorrelator 600.
It should be noted that a multi-channel decorrelator without complexity reduction can be derived from the above described multi-channel decorrelator for K=N (and possibly K'=N', or even K=N=K'=N').
7. Multi-channel Audio Decoder According to Fig. 7
Fig. 7 shows a block schematic diagram of a multi-channel audio decoder 700, according to an embodiment of the invention.
The multi-channel audio decoder 700 is configured to receive an encoded representation 710 and to provide, on the basis thereof, at least two output signals 712, 714. The multi-channel audio decoder 700 comprises a multi-channel decorrelator 720, which may be substantially identical to the multi-channel decorrelator 600 according to Fig. 6. Moreover, the multi-channel audio decoder 700 may comprise any of the features and functionalities of a multi-channel audio decoder which are known to the person skilled in the art or which are described herein with respect to other multi-channel audio decoders.
Moreover, it should be noted that the multi-channel audio decoder 700 comprises a particularly high efficiency when compared to conventional multi-channel audio decoders, since the multi-channel audio decoder 700 uses the high-efficiency multi-channel decorrelator 720.
8. Multi-Channel Audio Encoder According to Fig. 8
Fig. 8 shows a block schematic diagram of a multi-channel audio encoder 800 according to an embodiment of the present invention. The multi-channel audio encoder 800 is configured to receive at least two input audio signals 810, 812 and to provide, on the basis thereof, an encoded representation 814 of an audio content represented by the input audio signals 810, 812.
The multi-channel audio encoder 800 comprises a downmix signal provider 820, which is configured to provide one or more downmix signals 822 on the basis of the at least two input audio signals 810, 812. The multi-channel audio encoder 800 also comprises a parameter provider 830 which is configured to provide one or more parameters 832 (for example, cross-correlation parameters or cross-covariance parameters, or inter-object-correlation parameters and/or object level difference parameters) on the basis of the input audio signals 810, 812. Moreover, the multi-channel audio encoder 800 comprises a decorrelation complexity parameter provider 840 which is configured to provide a decorrelation complexity parameter 842 describing a complexity of a decorrelation to be used at the side of an audio decoder (which receives the encoded representation 814). The one or more downmix signals 822, the one or more parameters 832 and the
decorrelation complexity parameter 842 are included into the encoded representation 814, preferably in an encoded form.
However, it should be noted that the internal structure of the multi-channel audio encoder 800 (for example, the presence of the downmix signal provider 820, of the parameter provider 830 and of the decorrelation complexity parameter provider 840) should be considered as an example only. Different structures are possible as long as the functionality described herein is achieved.
Regarding the functionality of the multi-channel audio encoder 800, it should be noted that the multi-channel encoder provides an encoded representation 814, wherein the one or more downmix signals 822 and the one or more parameters 832 may be similar to, or equal to, downmix signals and parameters provided by conventional audio encoders (like, for example, conventional SAOC audio encoders or USAC audio encoders). However, the multi-channel audio encoder 800 is also configured to provide the decorrelation complexity parameter 842, which allows to determine a decorrelation complexity which is applied at the side of an audio decoder. Accordingly, the decorrelation complexity can be adapted to the audio content which is currently encoded. For example, it is possible to signal a desired decorrelation complexity, which corresponds to an achievable audio quality, in dependence on an encoder-sided knowledge about the characteristics of the input audio signals. For example, if it is found that spatial characteristics are important for an audio signal, a higher decorrelation complexity can be signaled, using the decorrelation complexity parameter 842, when compared to a case in which spatial characteristics are not so important. Alternatively, the usage of a high decorrelation complexity can be signaled using the decorrelation complexity parameter 842, if it is found that a passage of the audio content or the entire audio content is such that a high complexity decorrelation is required at a side of an audio decoder for other reasons.
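On the decoder side, a signaled complexity level of this kind would typically select how much decorrelation processing is performed, e.g. how many decorrelator cores run for a given output configuration (in the spirit of the table indicated by bsDecorrelationLevel in Fig. 27). The mapping below is a placeholder rule invented for illustration; it does not reproduce the normative table entries.

```python
# Illustrative decoder-side use of a decorrelation complexity parameter:
# higher signaled levels reduce the number K of decorrelator cores that
# are run for N output channels. The halving rule is a placeholder, not
# the normative bsDecorrelationLevel table.

def decorrelator_count(n_outputs, complexity_level):
    """Map (number of outputs, signaled level) to a core count 1 <= K <= N."""
    k = max(1, n_outputs >> complexity_level)  # halve K per level (placeholder)
    return min(k, n_outputs)

# level 0 keeps full decorrelation effort; higher levels trade spatial
# quality for lower decoding complexity
assert decorrelator_count(8, 0) == 8
assert decorrelator_count(8, 1) == 4
```

Such a rule realizes the compromise described above: content for which the spatial impression matters can be signaled with a low level (many cores), while other content can be decoded more cheaply.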
To summarize, the multi-channel audio encoder 800 provides for the possibility to control a multi-channel audio decoder, to use a decorrelation complexity which is adapted to signal characteristics or desired playback characteristics which can be set by the multichannel audio encoder 800.
Moreover, it should be noted that the multi-channel audio encoder 800 may be supplemented by any of the features and functionalities described herein regarding a multi-channel audio encoder, either individually or in combination. For example, some or
all of the features described herein with respect to multi-channel audio encoders can be added to the multi-channel audio encoder 800. Moreover, the multi-channel audio encoder 800 may be adapted for cooperation with the multi-channel audio decoders described herein.
9. Method for Providing a Plurality of Decorrelated Signals on the Basis of a Plurality of Decorrelator Input Signals, According to Fig. 9
Fig. 9 shows a flowchart of a method 900 for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals.
The method 900 comprises premixing 910 a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K is smaller than N. The method 900 also comprises providing 920 a first set of K' decorrelator output signals on the basis of the second set of K decorrelator input signals. For example, the first set of K' decorrelator output signals may be provided on the basis of the second set of K decorrelator input signals using a decorrelation, which may be performed, for example, using a decorrelator core or using a decorrelation algorithm. The method 900 further comprises postmixing 930 the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, wherein N' is larger than K' (with N' and K' being integer numbers). Accordingly, the second set of N' decorrelator output signals, which are the output of the method 900, may be provided on the basis of the first set of N decorrelator input signals, which are the input to the method 900.
It should be noted that the method 900 is based on the same considerations as the multichannel decorrelator described above. Moreover, it should be noted that the method 900 may be supplemented by any of the features and functionalities described herein with respect to the multi-channel decorrelator (and also with respect to the multi-channel audio encoder, if applicable), either individually or taken in combination.
10. Method for Providing at Least Two Output Audio Signals on the Basis of an Encoded Representation, According to Fig. 10
Fig. 10 shows a flowchart of a method 1000 for providing at least two output audio signals on the basis of an encoded representation.
The method 1000 comprises providing 1010 at least two output audio signals 1014, 1016 on the basis of an encoded representation 1012. The method 1000 comprises providing 1020 a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals in accordance with the method 900 according to Fig. 9.
It should be noted that the method 1000 is based on the same considerations as the multi-channel audio decoder 700 according to Fig. 7.
Also, it should be noted that the method 1000 can be supplemented by any of the features and functionalities described herein with respect to the multi-channel decoders, either individually or in combination.
11. Method for Providing an Encoded Representation on the Basis of at Least Two Input Audio Signals, According to Fig. 11
Fig. 11 shows a flowchart of a method 1100 for providing an encoded representation on the basis of at least two input audio signals.
The method 1100 comprises providing 1110 one or more downmix signals on the basis of the at least two input audio signals 1112, 1114. The method 1100 also comprises providing 1120 one or more parameters describing a relationship between the at least two input audio signals 1112, 1114. Furthermore, the method 1100 comprises providing 1130 a decorrelation complexity parameter describing a complexity of a decorrelation to be used at the side of an audio decoder. Accordingly, an encoded representation 1132 is provided on the basis of the at least two input audio signals 1112, 1114, wherein the encoded representation typically comprises the one or more downmix signals, the one or more parameters describing a relationship between the at least two input audio signals and the decorrelation complexity parameter in an encoded form.
It should be noted that the steps 1110, 1120, 1130 may be performed in parallel or in a different order in some embodiments according to the invention. Moreover, it should be noted that the method 1100 is based on the same considerations as the multi-channel audio encoder 800 according to Fig. 8, and that the method 1100 can be supplemented by any of the features and functionalities described herein with respect to the multi-channel audio encoder, either in combination or individually. Moreover, it should be noted that the method 1100 can be adapted to match the multi-channel audio decoder and the method for providing at least two output audio signals described herein.
12. Encoded Audio Representation According to Fig. 12
Fig. 12 shows a schematic representation of an encoded audio representation, according to an embodiment of the present invention. The encoded audio representation 1200 comprises an encoded representation 1210 of a downmix signal, an encoded representation 1220 of one or more parameters describing a relationship between the at least two input audio signals, and an encoded decorrelation complexity parameter 1230 describing a complexity of a decorrelation to be used at the side of an audio decoder. Accordingly, the encoded audio representation 1200 allows to adjust the decorrelation complexity used by a multi-channel audio decoder, which brings along an improved decoding efficiency, and possibly an improved audio quality, or an improved tradeoff between coding efficiency and audio quality. Moreover, it should be noted that the encoded audio representation 1200 may be provided by the multi-channel audio encoder as described herein, and may be used by the multi-channel audio decoder as described herein. Accordingly, the encoded audio representation 1200 can be supplemented by any of the features described with respect to the multi-channel audio encoders and with respect to the multi-channel audio decoders.
13. Notation and Underlying Considerations
Recently, parametric techniques for the bitrate efficient transmission/storage of audio scenes containing multiple audio objects have been proposed in the field of audio coding (see, for example, references [BCC], [JSC], [SAOC], [SAOC1], [SAOC2]) and informed source separation (see, for example, references [ISS1], [ISS2], [ISS3], [ISS4], [ISS5], [ISS6]). These techniques aim at reconstructing a desired output audio scene or audio source object based on additional side information describing the transmitted/stored audio scene and/or source objects in the audio scene. This reconstruction takes place in the decoder using a parametric informed source separation scheme. Moreover, reference is also made to the so-called "MPEG Surround" concept, which is described, for example, in the international standard ISO/IEC 23003-1:2007. Moreover, reference is also made to the so-called "Spatial Audio Object Coding", which is described in the international standard ISO/IEC 23003-2:2010. Furthermore, reference is made to the so-called "Unified Speech and Audio Coding" concept, which is described in the international standard ISO/IEC 23003-3:2012. Concepts from these standards can be used in embodiments according to the invention, for example, in the multi-channel audio encoders mentioned herein and the multi-channel audio decoders mentioned herein, wherein some adaptations may be required.
In the following, some background information will be described. In particular, an overview on parametric separation schemes will be provided, using the example of MPEG spatial audio object coding (SAOC) technology (see, for example, the reference [SAOC]). The mathematical properties of this method are considered.
13.1. Notation and Definitions
The following mathematical notation is applied in the current document:
$N_{Objects}$: number of audio object signals
$N_{DmxCh}$: number of downmix (processed) channels
$N_{UpmixCh}$: number of upmix (output) channels
$N_{Samples}$: number of processed data samples
$D$: downmix matrix, size $N_{DmxCh} \times N_{Objects}$
$X$: input audio object signal, size $N_{Objects} \times N_{Samples}$
$E_X$: object covariance matrix, size $N_{Objects} \times N_{Objects}$, defined as $E_X = X X^H$
$Y$: downmix audio signal, size $N_{DmxCh} \times N_{Samples}$, defined as $Y = D X$
$E_Y$: covariance matrix of the downmix signals, size $N_{DmxCh} \times N_{DmxCh}$, defined as $E_Y = Y Y^H$
$G$: parametric source estimation matrix, size $N_{Objects} \times N_{DmxCh}$, which approximates $E_X D^H (D E_X D^H)^{-1}$
$\hat{X}$: parametrically reconstructed object signal, size $N_{Objects} \times N_{Samples}$, which approximates $X$ and is defined as $\hat{X} = G Y$
$R$: rendering matrix (specified at the decoder side), size $N_{UpmixCh} \times N_{Objects}$
$Z$: ideal rendered output scene signal, size $N_{UpmixCh} \times N_{Samples}$, defined as $Z = R X$
$\hat{Z}$: rendered parametric output, size $N_{UpmixCh} \times N_{Samples}$, defined as $\hat{Z} = R \hat{X}$
$C$: covariance matrix of the ideal output, size $N_{UpmixCh} \times N_{UpmixCh}$, defined as $C = R E_X R^H$
$W$: decorrelator outputs, size $N_{UpmixCh} \times N_{Samples}$
$S$: combined signal, size $2 N_{UpmixCh} \times N_{Samples}$, defined as $S = \begin{bmatrix} \hat{Z} \\ W \end{bmatrix}$
$E_S$: combined signal covariance matrix, size $2 N_{UpmixCh} \times 2 N_{UpmixCh}$, defined as $E_S = S S^H$
$\hat{\hat{Z}}$: final output, size $N_{UpmixCh} \times N_{Samples}$
$(\cdot)^H$: self-adjoint (Hermitian) operator, which represents the complex conjugate transpose of $(\cdot)$; the notation $(\cdot)^{*}$ can also be used
$F_{decorr}(\cdot)$: decorrelator function
$\varepsilon$: an additive constant or a limitation constant (for example, used in a "maximum" operation or a "max" operation) to avoid division by zero
$\operatorname{matdiag}(M)$: a matrix containing the elements from the main diagonal of matrix $M$ on the main diagonal and zero values at the off-diagonal positions
Without loss of generality, and in order to improve the readability of the equations, the indices denoting time and frequency dependency are omitted for all introduced variables in this document.
13.2. Parametric Separation Systems
General parametric separation systems aim to estimate a number of audio sources from a signal mixture (downmix) using auxiliary parameter information (like, for example, inter-channel correlation values, inter-channel level difference values, inter-object correlation values and/or object level difference information). A typical solution to this task is based on the application of minimum mean squared error (MMSE) estimation algorithms. The SAOC technology is one example of such parametric audio encoding/decoding systems.
Fig. 13 shows the general principle of the SAOC encoder/decoder architecture. In other words, Fig. 13 shows, in the form of a block schematic diagram, an overview of the MMSE based parametric downmix/upmix concept.
An encoder 1310 receives a plurality of object signals 1312a, 1312b to 1312n. Moreover, the encoder 1310 also receives mixing parameters D, 1314, which may, for example, be downmix parameters. The encoder 1310 provides, on the basis thereof, one or more downmix signals 1316a, 1316b, and so on. Moreover, the encoder provides a side information 1318. The one or more downmix signals and the side information may, for example, be provided in an encoded form.
The encoder 1310 comprises a mixer 1320, which is typically configured to receive the object signals 1312a to 1312n and to combine (for example downmix) the object signals 1312a to 1312n into the one or more downmix signals 1316a, 1316b in dependence on the mixing parameters 1314. Moreover, the encoder comprises a side information estimator 1330, which is configured to derive the side information 1318 from the object signals 1312a to 1312n. For example, the side information estimator 1330 may be configured to derive the side information 1318 such that the side information describes a relationship between object signals, for example, a cross-correlation between object signals (which may be designated as "inter-object-correlation" IOC) and/or an information describing level differences between object signals (which may be designated as a "object level difference information" OLD).
The one or more downmix signals 1316a, 1316b and the side information 1318 may be stored and/or transmitted to a decoder 1350, which is indicated at reference numeral 1340.
The decoder 1350 receives the one or more downmix signals 1316a, 1316b and the side information 1318 (for example, in an encoded form) and provides, on the basis thereof, a plurality of output audio signals 1352a to 1352n. The decoder 1350 may also receive a user interaction information 1354, which may comprise one or more rendering parameters R (which may define a rendering matrix). The decoder 1350 comprises a parametric object separator 1360, a side information processor 1370 and a renderer 1380. The side information processor 1370 receives the side information 1318 and provides, on the basis thereof, a control information 1372 for the parametric object separator 1360. The parametric object separator 1360 provides a plurality of object signals 1362a to 1362n on the basis of the downmix signals 1316a, 1316b and the control information 1372, which is derived from the side information 1318 by the side information processor 1370. For example, the object separator may perform a decoding of the encoded downmix signals and an object separation. The renderer 1380 renders the reconstructed object signals 1362a to 1362n, to thereby obtain the output audio signals 1352a to 1352n.
In the following, the functionality of the MMSE-based parametric downmix/upmix concept will be discussed.
The general parametric downmix/upmix processing is carried out in a time/frequency selective way and can be described as a sequence of the following steps:
• The "encoder" 1310 is provided with input "audio objects" x and "mixing parameters" D . The "mixer" 1320 downmixes the "audio objects" x into a number of "downmix signals" Y using "mixing parameters" D (e.g., downmix gains). The
"side info estimator" extracts the side information 1318 describing characteristics of the input "audio objects" x (e.g., covariance properties).
• The "downmix signals" Y and side information are transmitted or stored. These downmix audio signals can be further compressed using audio coders (such as
MPEG-1/2 Layer II or III, MPEG-2/4 Advanced Audio Coding (AAC), MPEG Unified Speech and Audio Coding (USAC), etc.). The side information can be also represented and encoded efficiently (e.g., as loss-less coded relations of the object powers and object correlation coefficients).
• The "decoder" 1350 restores the original "audio objects" from the decoded "downmix signals" using the transmitted side information 1318. The "side info processor" 1370 estimates the un-mixing coefficients 1372 to be applied on the "downmix signals" within "parametric object separator" 1360 to obtain the parametric object reconstruction of x . The reconstructed "audio objects" 1362a to 1362n are rendered to a (multi-channel) target scene, represented by the output channels Z , by applying "rendering parameters" R , 1354.
Moreover, it should be noted that the functionalities described with respect to the encoder 1310 and the decoder 1350 may be used in the other audio encoders and audio decoders described herein as well.
13.3. Orthogonality Principle of Minimum Mean Squared Error Estimation
The orthogonality principle is one major property of MMSE estimators. Consider two Hilbert spaces $W$ and $V$, with $V$ spanned by a set of vectors $y_i$, and a vector $x \in W$. If one wishes to find an estimate $\hat{x} \in V$ which approximates $x$ as a linear combination of the vectors $y_i \in V$ while minimizing the mean square error, then the error vector will be orthogonal to the space spanned by the vectors $y_i$:

$(x - \hat{x})\, y_i^H = 0.$

As a consequence, the estimation error and the estimate itself are orthogonal:

$(x - \hat{x})\, \hat{x}^H = 0.$
Geometrically one could visualize this by the examples shown in Fig. 14.
Fig. 14 shows a geometric representation of the orthogonality principle in 3-dimensional space. As can be seen, a vector space is spanned by vectors $y_1$, $y_2$. A vector $x$ is equal to the sum of a vector $\hat{x}$ and a difference vector (or error vector) $e$. As can be seen, the error vector $e$ is orthogonal to the vector space (or plane) $V$ spanned by the vectors $y_1$ and $y_2$. Accordingly, the vector $\hat{x}$ can be considered as the best approximation of $x$ within the vector space $V$.
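The orthogonality principle can be checked numerically. The sketch below uses purely illustrative random data: a least-squares (linear MMSE) estimate of a vector from two spanning vectors leaves an error that is orthogonal both to the spanning vectors and to the estimate itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative random data: two vectors y_1, y_2 spanning a subspace V of a
# 100-dimensional space, and a vector x to be approximated from V.
x = rng.standard_normal(100)
Y = rng.standard_normal((2, 100))

# Least-squares (linear MMSE) estimate: x_hat = x Y^H (Y Y^H)^{-1} Y.
x_hat = x @ Y.T @ np.linalg.inv(Y @ Y.T) @ Y
err = x - x_hat

print(np.allclose(err @ Y.T, 0.0))    # (x - x_hat) y_i^H = 0  -> True
print(np.allclose(err @ x_hat, 0.0))  # error orthogonal to estimate -> True
```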
13.4. Parametric Reconstruction Error
Defining a matrix comprising $N$ signals $X$ and denoting the estimation error by $X_{Error}$, the following identities can be formulated. The original signal can be represented as a sum of the parametric reconstruction $\hat{X}$ and the reconstruction error $X_{Error}$ as

$X = \hat{X} + X_{Error}.$

Because of the orthogonality principle, the covariance matrix of the original signals $E_X = X X^H$ can be formulated as a sum of the covariance matrix of the reconstructed signals $\hat{X} \hat{X}^H$ and the covariance matrix of the estimation errors $X_{Error} X_{Error}^H$ as

$X X^H = (\hat{X} + X_{Error})(\hat{X} + X_{Error})^H = \hat{X}\hat{X}^H + X_{Error}X_{Error}^H + \hat{X}X_{Error}^H + X_{Error}\hat{X}^H = \hat{X}\hat{X}^H + X_{Error}X_{Error}^H,$

where the cross terms vanish because of the orthogonality principle.
When the input objects $X$ are not in the space spanned by the downmix channels (e.g., the number of downmix channels is less than the number of input signals) and the input objects cannot be represented as linear combinations of the downmix channels, the MMSE-based algorithms introduce a reconstruction inaccuracy $X_{Error} X_{Error}^H$.
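This covariance identity can likewise be illustrated numerically. The following sketch (hypothetical sizes, random signals) verifies that the covariance of the original objects splits into the covariance of the MMSE reconstruction plus the covariance of the error, with vanishing cross terms.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: 4 objects mixed down to 2 channels over 500 samples.
n_obj, n_dmx, n_smp = 4, 2, 500
X = rng.standard_normal((n_obj, n_smp))
D = rng.standard_normal((n_dmx, n_obj))
Y = D @ X

# MMSE reconstruction of the objects from the downmix; note that
# X @ Y.T equals E_X D^H for these sample covariances.
G = X @ Y.T @ np.linalg.inv(Y @ Y.T)
X_hat = G @ Y
X_err = X - X_hat

# Orthogonality makes the cross terms vanish, so the object covariance
# splits into reconstruction covariance plus error covariance.
E_X = X @ X.T
print(np.allclose(E_X, X_hat @ X_hat.T + X_err @ X_err.T))  # True
```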
13.5. Inter Object Correlation
In the auditory system, the cross-covariance (coherence/correlation) is closely related to the perception of envelopment, of being surrounded by the sound, and to the perceived width of a sound source. For example, in SAOC-based systems, the Inter-Object Correlation (IOC) parameters are used for the characterization of this property:

$IOC_{i,j} = \dfrac{E_X(i,j)}{\sqrt{E_X(i,i)\,E_X(j,j)}}.$
Let us consider an example of reproducing a sound source using two audio signals. If the IOC value is close to one, the sound is perceived as a well-localized point source. If the IOC value is close to zero, the perceived width of the sound source increases and for extreme cases it can even be perceived as two distinct sources [Blauert, Chapter 3].
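As an illustration of the two perceptual extremes just described, an IOC-like value can be computed as a normalized cross-covariance of two signals. The function name `ioc` and the signals below are purely illustrative and not taken from any standard.

```python
import numpy as np

rng = np.random.default_rng(3)

# Purely illustrative signals: x1 and x2 are identical, x3 is independent.
s = rng.standard_normal(10_000)
x1, x2 = s, s.copy()
x3 = rng.standard_normal(10_000)

def ioc(a, b, eps=1e-12):
    """Normalized cross-covariance of two object signals."""
    return (a @ b) / max(np.sqrt((a @ a) * (b @ b)), eps)

print(round(ioc(x1, x2), 6))     # 1.0: perceived as a well-localized point source
print(abs(ioc(x1, x3)) < 0.05)   # True: wide, possibly two distinct sources
```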
13.6. Compensation for Reconstruction Inaccuracy
In the case of an imperfect parametric reconstruction, the output signal may exhibit a lower energy compared to the original objects. An error in the diagonal elements of the covariance matrix may result in audible level differences, and an error in the off-diagonal elements may result in a distorted spatial sound image (compared with the ideal reference output). The proposed method aims to solve this problem.
In MPEG Surround (MPS), for example, this issue is treated only for some specific channel-based processing scenarios, namely, for mono/stereo downmixes and limited static output configurations (e.g., mono, stereo, 5.1, 7.1, etc.). In object-oriented technologies like SAOC, which also uses a mono/stereo downmix, this problem is treated by applying the MPS post-processing rendering for the 5.1 output configuration only.
The existing solutions are limited to standard output configurations and a fixed number of input/output channels. Namely, they are realized as a consecutive application of several blocks implementing just "mono-to-stereo" (or "stereo-to-three") channel decorrelation methods.
Therefore, a general solution (e.g., an energy level and correlation properties correction method) for the compensation of parametric reconstruction inaccuracies is desired, which can be applied to a flexible number of downmix/output channels and arbitrary output configuration setups.
13.7. Conclusions
To conclude, an overview of the notation has been provided. Moreover, a parametric separation system has been described on which embodiments according to the invention are based. Moreover, it has been outlined that the orthogonality principle applies to minimum mean squared error estimation. Moreover, an equation for the computation of a covariance matrix $E_X$ has been provided which applies in the presence of a reconstruction error $X_{Error}$. Also, the relationship between the so-called inter-object correlation values and the elements of a covariance matrix $E_X$ has been provided, which may be applied, for example, in embodiments according to the invention to derive desired covariance characteristics (or correlation characteristics) from the inter-object correlation values (which may be included in the parametric side information), and possibly from the object level differences. Moreover, it has been outlined that the characteristics of reconstructed object signals may differ from desired characteristics because of an imperfect reconstruction. Moreover, it has been outlined that existing solutions to this problem are limited to some specific output configurations and rely on a specific combination of standard blocks, which makes the conventional solutions inflexible.
14. Embodiment According to Fig. 15
14.1. Concept Overview
Embodiments according to the invention extend the MMSE parametric reconstruction methods used in parametric audio separation schemes with a decorrelation solution for an arbitrary number of downmix/upmix channels. Embodiments according to the invention, like, for example, the inventive apparatus and the inventive method, may compensate for the energy loss during a parametric reconstruction and restore the correlation properties of estimated objects.
Fig. 15 provides an overview of the parametric downmix/upmix concept with an integrated decorrelation path. In other words, Fig. 15 shows, in the form of a block schematic diagram, a parametric reconstruction system with decorrelation applied on rendered output.
The system according to Fig. 15 comprises an encoder 1510, which is substantially identical to the encoder 1310 according to Fig. 13. The encoder 1510 receives a plurality of object signals 1512a to 1512n and provides, on the basis thereof, one or more downmix signals 1516a, 1516b, as well as a side information 1518. The downmix signals 1516a, 1516b may be substantially identical to the downmix signals 1316a, 1316b and may be designated with Y. The side information 1518 may be substantially identical to the side information 1318. However, the side information may, for example, comprise a decorrelation mode parameter, a decorrelation method parameter, or a decorrelation complexity parameter. Moreover, the encoder 1510 may receive mixing parameters 1514.
The parametric reconstruction system also comprises a transmission and/or storage of the one or more downmix signals 1516a, 1516b and of the side information 1518, wherein the transmission and/or storage is designated with 1540, and wherein the one or more downmix signals 1516a, 1516b and the side information 1518 (which may include parametric side information) may be encoded.
Moreover, the parametric reconstruction system according to Fig. 15 comprises a decoder 1550, which is configured to receive the transmitted or stored one or more (possibly encoded) downmix signals 1516a, 1516b and the transmitted or stored (possibly encoded) side information 1518 and to provide, on the basis thereof, output audio signals 1552a to 1552n. The decoder 1550 (which may be considered as a multi-channel audio decoder) comprises a parametric object separator 1560 and a side information processor 1570. Moreover, the decoder 1550 comprises a renderer 1580, a decorrelator 1590 and a mixer 1598.
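The decorrelation path just described (renderer, decorrelator, mixer) can be sketched in simplified form. The example below is purely illustrative: all dimensions, energies, and the ideal-decorrelator assumption are hypothetical. It mixes a rendered signal with decorrelated signals, using an identity dry mixing matrix and a diagonal wet mixing matrix chosen so that each output channel reaches a desired energy.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical sizes and target energies; none of this comes from the document.
n_up, n_smp = 3, 1000
Z_hat = rng.standard_normal((n_up, n_smp))           # rendered parametric output
desired_energy = np.array([1500.0, 1800.0, 1600.0])  # diagonal of a desired C

# Idealized decorrelator: white noise projected away from the rendered
# signals, so that W is exactly uncorrelated with Z_hat (a real
# decorrelator only approximates this).
W_raw = rng.standard_normal((n_up, n_smp))
P_z = Z_hat.T @ np.linalg.inv(Z_hat @ Z_hat.T) @ Z_hat
W = W_raw - W_raw @ P_z

E_dry = np.einsum('ij,ij->i', Z_hat, Z_hat)  # per-channel energy of Z_hat
E_wet = np.einsum('ij,ij->i', W, W)          # per-channel energy of W

# Dry matrix P = identity; diagonal wet matrix M adds just enough
# decorrelated signal to fill each channel's energy gap (clipped at zero,
# guarded against division by zero).
m = np.sqrt(np.maximum(desired_energy - E_dry, 0.0) / np.maximum(E_wet, 1e-12))
Z_out = Z_hat + np.diag(m) @ W

E_out = np.einsum('ij,ij->i', Z_out, Z_out)
print(np.allclose(E_out, desired_energy))  # True: output energies restored
```

This mirrors the simplest diagonal-mixing variant of the decoder's mixing stage; more general variants use full mixing matrices for both the dry and the wet path.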
Claims
1. A multi-channel audio decoder (100; 700; 1550; 3000) for providing at least two output audio signals (112, 114; 712, 714; 1552a-1552n; 3012) on the basis of an encoded representation (110; 710; 1516a, 1516b, 1518),

wherein the multi-channel audio decoder is configured to render (130; 1580) a plurality of decoded audio signals (122; 1562a-1562n, $\hat{X}$), which are obtained on the basis of the encoded representation, in dependence on one or more rendering parameters (132), to obtain a plurality of rendered audio signals (134, 136; 1582a-1582n, $\hat{Z}$), and

wherein the multi-channel audio decoder is configured to derive (140; 1590) one or more decorrelated audio signals (142, 144; 1592a-1592n) from the rendered audio signals, and

wherein the multi-channel audio decoder is configured to combine (150; 1598) the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to obtain the output audio signals.
2. The multi-channel audio decoder according to claim 1, wherein the multi-channel audio decoder is configured to obtain the decoded audio signals, which are rendered to obtain the plurality of rendered audio signals, using a parametric reconstruction (120; 1560).
3. The multi-channel audio decoder according to claim 2, wherein the decoded audio signals are reconstructed object signals, and
wherein the multi-channel audio decoder is configured to derive the reconstructed object signals from one or more downmix signals (1516a, 1516b) using a side information (1518).
4. The multi-channel audio decoder according to claim 3, wherein the multi-channel audio decoder is configured to derive un-mixing coefficients from the side information and to apply the un-mixing coefficients to derive the reconstructed object signals from the one or more downmix signals using the un-mixing coefficients.
5. The multi-channel audio decoder according to one of claims 1 to 4, wherein the multi-channel audio decoder is configured to combine the rendered audio signals with the one or more decorrelated audio signals, to at least partially achieve desired correlation characteristics or covariance characteristics of the output audio signals.
6. The multi-channel audio decoder according to one of claims 1 to 5, wherein the multi-channel audio decoder is configured to combine the rendered audio signals with the one or more decorrelated audio signals, to at least partially compensate for an energy loss during a parametric reconstruction (120; 1560) of the decoded audio signals (122; 1562a to 1562n), which are rendered to obtain the plurality of rendered audio signals.
7. The multi-channel audio decoder according to one of claims 1 to 6, wherein the multi-channel audio decoder is configured to determine desired correlation characteristics or desired covariance characteristics of the output audio signals, and
wherein the multi-channel audio decoder is configured to adjust a combination (150; 1598) of the rendered audio signals with the one or more decorrelated audio signals, to obtain the output audio signals, such that correlation characteristics or covariance characteristics of the obtained output audio signals approximate or equal the desired correlation characteristics or desired covariance characteristics (C).
8. The multi-channel audio decoder according to claim 7, wherein the multi-channel audio decoder is configured to determine the desired correlation characteristics or desired covariance characteristics (C) in dependence on a rendering information (R) describing a rendering (130; 1580) of the plurality of decoded audio signals ($\hat{X}$), which are obtained on the basis of the encoded representation, to obtain the plurality of rendered audio signals ($\hat{Z}$).
9. The multi-channel audio decoder according to claim 7 or claim 8, wherein the multi-channel audio decoder is configured to determine the desired correlation characteristics or desired covariance characteristics (C) in dependence on an object correlation information or an object covariance information ($E_X$) describing characteristics of a plurality of audio objects and/or a relationship between a plurality of audio objects.
10. The multi-channel audio decoder according to claim 9, wherein the multi-channel audio decoder is configured to determine the object correlation information or object covariance information ($E_X$) on the basis of a side information (1518) included in the encoded representation.
11. The multi-channel audio decoder according to one of claims 7 to 10, wherein the multi-channel audio decoder is configured to determine actual correlation characteristics or covariance characteristics ($E_S$) of the rendered audio signals and the one or more decorrelated audio signals, and
to adjust the combination (150; 1598) of the rendered audio signals with the one or more decorrelated audio signals, to obtain the output audio signals, in dependence on the actual correlation characteristics or covariance characteristics (Es) of the rendered audio signals and the one or more decorrelated audio signals.
12. The multi-channel audio decoder according to one of claims 1 to 11,

wherein the multi-channel audio decoder is configured to combine the rendered audio signals $\hat{Z}$ with the one or more decorrelated audio signals $W$, to obtain the output audio signals $\hat{\hat{Z}}$, according to

$\hat{\hat{Z}} = P \hat{Z} + M W,$

wherein P is a mixing matrix which is applied to the rendered audio signals $\hat{Z}$, and wherein M is a mixing matrix which is applied to the one or more decorrelated audio signals $W$.
13. The multi-channel audio decoder according to claim 12,
wherein the multi-channel audio decoder is configured to adjust at least one out of the mixing matrix P and the mixing matrix M such that correlation characteristics or covariance characteristics ($E_{\hat{\hat{Z}}}$) of the obtained output audio signals $\hat{\hat{Z}}$ approximate or equal the desired correlation characteristics or desired covariance characteristics (C).
14. The multi-channel audio decoder according to claim 12 or claim 13,
wherein the multi-channel audio decoder is configured to jointly compute the mixing matrix P and the mixing matrix M.
15. The multi-channel audio decoder according to one of claims 12 to 14,
wherein the multi-channel audio decoder is configured to obtain a combined mixing matrix F, with
F = [P M]
such that a covariance matrix $E_{\hat{\hat{Z}}}$ of the obtained output audio signals $\hat{\hat{Z}}$ approximates or equals a desired covariance matrix C.
16. The multi-channel audio decoder according to claim 15,
wherein the multi-channel audio decoder is configured to determine the combined mixing matrix F such that the covariance matrix

$E_{\hat{\hat{Z}}} = F E_S F^H$

is equal to the desired covariance matrix

$C = R E_X R^H,$

wherein $E_S$ is a covariance matrix of a signal $S$ combining the rendered audio signals $\hat{Z}$ and the one or more decorrelated audio signals $W$, which is defined as

$S = \begin{bmatrix} \hat{Z} \\ W \end{bmatrix},$ and

wherein $E_X$ is an object covariance matrix.
17. The multi-channel audio decoder according to one of claims 1 to 11,

wherein the multi-channel audio decoder is configured to combine the rendered audio signals $\hat{Z}$ with the one or more decorrelated audio signals $W$, to obtain the output audio signals $\hat{\hat{Z}}$,

according to

$\hat{\hat{Z}} = A_{dry} P \hat{Z} + M W$

or according to

$\hat{\hat{Z}} = P \hat{Z} + A_{wet} M W$

or according to

$\hat{\hat{Z}} = A_{dry} P \hat{Z} + A_{wet} M W,$

wherein P is a mixing matrix which is applied to the rendered audio signals $\hat{Z}$, and

wherein M is a mixing matrix which is applied to the one or more decorrelated audio signals $W$,

wherein $A_{dry}$ is a first correction matrix or a first adjustment matrix, and

wherein $A_{wet}$ is a second correction matrix or a second adjustment matrix.
18. The multi-channel audio decoder according to claim 17,
wherein the multi-channel audio decoder is configured to adjust at least one out of the mixing matrix P and the mixing matrix M such that correlation characteristics or covariance characteristics ($E_{\hat{\hat{Z}}}$) of the obtained output audio signals $\hat{\hat{Z}}$, or of audio signals obtained by a mixing of $\hat{Z}$ and $W$ using P and M, approximate or equal the desired correlation characteristics or desired covariance characteristics (C).
19. The multi-channel audio decoder according to claim 17 or claim 18,
wherein the multi-channel audio decoder is configured to jointly compute the mixing matrix P and the mixing matrix M.
20. The multi-channel audio decoder according to one of claims 17 to 19,
wherein the multi-channel audio decoder is configured to obtain a combined mixing matrix F, with
F = [P M]
such that a covariance matrix $E_{\hat{\hat{Z}}}$ of the obtained output audio signals $\hat{\hat{Z}}$ or a covariance matrix of audio signals obtained by a mixing of $\hat{Z}$ and $W$ using P and M approximates or equals a desired covariance matrix C.
21. The multi-channel audio decoder according to claim 20,
wherein the multi-channel audio decoder is configured to determine the combined mixing matrix F such that the covariance matrix

$E_{\hat{\hat{Z}}} = F E_S F^H$

is equal to the desired covariance matrix

$C = R E_X R^H,$

wherein $E_S$ is a covariance matrix of a signal $S$ combining the rendered audio signals $\hat{Z}$ and the one or more decorrelated audio signals $W$, which is defined as

$S = \begin{bmatrix} \hat{Z} \\ W \end{bmatrix},$ and

wherein $E_X$ is an object covariance matrix.
22. The multi-channel audio decoder according to one of claims 17 to 21,
wherein the multi-channel audio decoder is configured to determine the first correction matrix such that a contribution of the rendered audio signals onto the output audio signals is limited, and/or
wherein the multi-channel audio decoder is configured to determine the second correction matrix such that a contribution of the decorrelated audio signals onto the output audio signals is limited.
23. The multi-channel audio decoder according to one of claims 17 to 22,
wherein the multi-channel audio decoder is configured to determine the first correction matrix in dependence on properties of the rendered audio signals, and/or in dependence on properties of the decorrelated audio signals, and/or in dependence on properties of desired output audio signals, and/or in dependence on estimated properties of mixed rendered audio signals, and/or in dependence on estimated properties of mixed decorrelated audio signals, such that a contribution of the rendered audio signals onto the output audio signals is limited, and/or
wherein the multi-channel audio decoder is configured to determine the second correction matrix in dependence on properties of the rendered audio signals, and/or in dependence on properties of the decorrelated audio signals, and/or in dependence on properties of desired output audio signals, and/or in dependence on estimated properties of mixed rendered audio signals, and/or in dependence on estimated properties of mixed decorrelated audio signals, such that a contribution of the decorrelated audio signals onto the output audio signals is limited.
24. The multi-channel audio decoder according to claim 23, wherein the properties of the rendered audio signals, and/or of the decorrelated audio signals, and/or of the desired output audio signals, and/or of the mixed rendered audio signals, and/or of the mixed decorrelated audio signals are energy properties, or correlation properties, or covariance properties.
25. The multi-channel audio decoder according to one of claims 1 to 24,
wherein the multi-channel audio decoder is configured to combine the rendered audio signals $\hat{Z}$ with the one or more decorrelated audio signals $W$, to obtain the output audio signals $\hat{\hat{Z}}$

according to

$\hat{\hat{Z}} = P \hat{Z} + A_{wet} M W,$

wherein the multi-channel audio decoder is configured to provide the correction matrix $A_{wet}$ such that $A_{wet}$ is a diagonal matrix and such that entries $A_{wet}(i,i)$ of the correction matrix $A_{wet}$ are reduced, when compared to normal, unreduced diagonal entries of the correction matrix $A_{wet}$, if a ratio between an intensity ($E^{dry}(i,i)$) of a rendered audio signal and an intensity ($E^{wet}(i,i)$) of a mixed decorrelated audio signal, with mixing matrix M, in an i-th output audio signal would be smaller than a threshold value.
26. The multi-channel audio decoder according to claim 25, wherein the threshold value is a predetermined constant threshold value, or wherein the threshold value is time-variant and/or frequency-variant in dependence on signal properties, for example, energy properties, correlation properties and/or covariance properties.
27. The multi-channel audio decoder according to one of claims 1 to 26,

wherein the multi-channel audio decoder is configured to combine the rendered audio signals $\hat{Z}$ with the one or more decorrelated audio signals $W$, to obtain the output audio signals $\hat{\hat{Z}}$

according to

$\hat{\hat{Z}} = P \hat{Z} + A_{wet} M W,$

wherein

wherein $M = P_{wet}$,

wherein $E^{dry}$ is a covariance matrix of the rendered audio signals $\hat{Z}$, and

wherein $E^{wet}$ is an estimated covariance matrix of the decorrelated audio signals after the matrix $P_{wet}$ has been applied.
28. The multi-channel audio decoder according to claim 17, wherein the multi-channel audio decoder is configured to determine the combined mixing matrix F according to

where the matrices U, T, V and Q are determined using a Singular Value Decomposition of the covariance matrices $E_S$ and C, yielding

$C = U T U^H$

and

$E_S = V Q V^H,$

wherein the matrix H is defined as

wherein $a_{ii}$ and $b_{jj}$ are chosen such that

$a_{ii} + b_{jj} = 1.$
29. The multi-channel audio decoder according to claim 12 or claim 13,
wherein the multi-channel audio decoder is configured to set the mixing matrix P to be an identity matrix, or a multiple thereof, and to compute the mixing matrix M.
30. The multi-channel audio decoder according to claim 29, wherein the multi-channel audio decoder is configured to determine the mixing matrix M such that a difference $\Delta E$ between the desired covariance matrix C and a covariance matrix $E_{\hat{Z}}$, which is defined as

$\Delta E = C - E_{\hat{Z}},$

is equal to, or approximates, a covariance

$M E_W M^H,$

wherein the desired covariance matrix C is defined as

$C = R E_X R^H,$

wherein R is a rendering matrix,

wherein $E_X$ is an object covariance matrix, and

wherein $E_W$ is a covariance matrix of the one or more decorrelated signals, and

wherein $E_{\hat{Z}}$ is a covariance matrix of the rendered audio signals.
31. The multi-channel audio decoder according to claim 30,

wherein the multi-channel audio decoder is configured to determine the mixing matrix M according to

where the matrices U, T, V and Q are determined using a Singular Value Decomposition of the covariance matrices $\Delta E$ and $E_W$, yielding

$\Delta E = U T U^H$

and

$E_W = V Q V^H.$
32. The multi-channel audio decoder according to claim 12 or claim 13,
wherein the multi-channel audio decoder is configured to determine the mixing matrices P, M under the restriction that a given rendered audio signal is only mixed with a decorrelated version of the given rendered audio signal itself.
33. The multi-channel audio decoder according to claim 12 or claim 13 or claim 32, wherein the multi-channel audio decoder is configured to combine the rendered audio signals with the one or more decorrelated audio signals such that only
autocorrelation values or autocovariance values of rendered audio signals are modified while cross-correlation values or cross-covariance values are left unchanged.
34. The multi-channel audio decoder according to claim 12 or claim 13 or claim 32 or claim 33,
wherein the multi-channel audio decoder is configured to set the mixing matrix P to be an identity matrix, or a multiple thereof, and to compute the mixing matrix M under the restriction that M is a diagonal matrix.
35. The multi-channel audio decoder according to claim 32 or 33 or 34, wherein the multi-channel audio decoder is configured to combine the rendered audio signals Z with the one or more decorrelated audio signals W, to obtain the output audio signals Ẑ according to
Ẑ = Z + M W
wherein M is a diagonal mixing matrix which is applied to the one or more decorrelated audio signals W, and
wherein the multi-channel audio decoder is configured to compute diagonal elements of the mixing matrix such that diagonal elements of a covariance matrix of the output audio signals are equal to desired energies.
36. The multi-channel audio decoder according to claim 35, wherein the multi-channel audio decoder is configured to compute the elements of the mixing matrix M according to
wherein the desired covariance matrix C is defined as
C = R Ex R*
wherein R is a rendering matrix,
wherein Ex is an object covariance matrix,
wherein Ew is a covariance matrix of the one or more decorrelated signals, and
wherein Dec is a threshold value limiting an amount of decorrelation added to the signals.
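For the diagonal-matrix case above, the diagonal elements of M can be chosen so that each output channel's energy matches the desired energy, with the threshold limiting the decorrelated contribution. A minimal sketch, in which the exact limiting rule is an assumption (the claimed formula is not reproduced in the text):

```python
import numpy as np

def diagonal_mixing(C, E_z, E_w, dec_limit=1.0):
    """Sketch: diagonal M so that diag(E_z + M E_w M^T) matches diag(C).

    `dec_limit` plays the role of the threshold limiting the amount of
    decorrelated energy added per channel; the limiting rule here is an
    illustrative assumption.
    """
    c = np.diag(C)                          # desired channel energies
    ez = np.diag(E_z)                       # energies of the rendered signals
    ew = np.maximum(np.diag(E_w), 1e-12)    # decorrelator energies (regularised)
    # Energy missing in each channel, to be supplied by its decorrelator
    missing = np.maximum(c - ez, 0.0)
    m = np.sqrt(np.minimum(missing / ew, dec_limit))
    return np.diag(m)
```

Because M is diagonal, only autocovariances are modified; cross-covariances of the rendered signals pass through unchanged, matching the restriction of claims 32 and 33.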
37. The multi-channel audio decoder according to one of claims 1 to 36, wherein the multi-channel audio decoder is configured to consider correlation characteristics or covariance characteristics of the decorrelated audio signals when determining how to combine the rendered audio signals, or the scaled version thereof, with the one or more decorrelated audio signals.
38. The multi-channel audio decoder according to one of claims 1 to 28 or 37, wherein the multi-channel audio decoder is configured to mix rendered audio signals and decorrelated audio signals, such that a given output audio signal is provided on the basis of two or more rendered audio signals and at least one decorrelated audio signal.
39. The multi-channel audio decoder according to one of claims 1 to 38, wherein the multi-channel audio decoder is configured to switch between different modes, in which different restrictions are applied for determining how to combine the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to obtain the output audio signals.
40. The multi-channel audio decoder according to one of claims 1 to 39, wherein the multi-channel audio decoder is configured to switch between
a first mode, in which a mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals,
a second mode in which no mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, and in which it is allowed that a given decorrelated signal is combined, with same or different scaling, with a plurality of rendered audio signals, or a scaled version thereof, in order to adjust cross-correlation characteristics or cross-covariance characteristics of the output audio signals, and
a third mode in which no mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, and in which it is not allowed that a given decorrelated signal is combined with rendered audio signals other than a rendered audio signal from which the given decorrelated signal is derived.
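The three modes can be read as progressively stricter restrictions on the two mixing matrices. A sketch of this interpretation (the matrix-level reading — P acting on rendered signals, M on decorrelated signals — is an assumption based on the surrounding claims):

```python
import numpy as np

def restrict_mixing(P, M, mode):
    """Sketch: apply per-mode restrictions to the mixing matrices.

    Mode numbering follows the three modes described in the text.
    """
    if mode == 1:
        # Mixing between different rendered signals is allowed
        return P, M
    if mode == 2:
        # Rendered signals pass through unmixed, but a given decorrelated
        # signal may still feed several outputs (general M)
        return np.eye(P.shape[0]), M
    if mode == 3:
        # Each output only receives the decorrelated version of its own
        # rendered signal (diagonal M)
        return np.eye(P.shape[0]), np.diag(np.diag(M))
    raise ValueError("unknown mode")
```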
41. The multi-channel audio decoder according to claim 39 or claim 40, wherein the multi-channel audio decoder is configured to evaluate a bitstream element of the encoded representation indicating which of the three modes for combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals is to be used, and to select the mode in dependence on said bitstream element.
42. A multi-channel audio encoder (200; 1510; 2900) for providing an encoded representation (214; 1516a, 1516b, 1518; 2932) on the basis of at least two input audio signals (210, 212; 1512a-1512n; 2912, 2914),
wherein the multi-channel audio encoder is configured to provide (220) one or more downmix signals (222; 1516a, 1516b) on the basis of the at least two input audio signals, and
wherein the multi-channel audio encoder is configured to provide (230) one or more parameters (232; 1518) describing a relationship between the at least two input audio signals, and
wherein the multi-channel audio encoder is configured to provide (240) a decorrelation method parameter (242; 1518) describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder.
43. The multi-channel audio encoder according to claim 42, wherein the multi-channel audio encoder is configured to selectively provide the decorrelation method parameter, to signal one out of the following three modes for the operation of an audio decoder:
a first mode, in which a mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals,
a second mode in which no mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, and in which it is allowed that a given decorrelated signal is combined, with same or different scaling, with a plurality of rendered audio signals, or a scaled version thereof, in order to adjust cross-correlation characteristics or cross-covariance characteristics of the output audio signals, and
a third mode in which no mixing between different rendered audio signals is allowed when combining the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, and in which it is not allowed that a given decorrelated signal is combined with rendered audio signals other than a rendered audio signal from which the given decorrelated signal is derived.
44. The multi-channel audio encoder according to claim 42 or claim 43, wherein the multi-channel audio encoder is configured to select the decorrelation method parameter in dependence on whether the input audio signals comprise a comparatively high correlation or a comparatively lower correlation.
45. The multi-channel audio encoder according to claim 43, wherein the multi-channel audio encoder is configured to select the decorrelation method parameter to designate the first mode or the second mode if a correlation between the input audio signals is comparatively high, and
wherein the multi-channel audio encoder is configured to select the decorrelation method parameter to designate the third mode if a correlation between the input audio signals is comparatively lower.
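The encoder-side selection described above can be sketched as a simple correlation test; the threshold value and the choice of mode 2 for the high-correlation case are illustrative assumptions, not from the text:

```python
import numpy as np

def choose_decorrelation_mode(x, threshold=0.6):
    """Sketch: pick a decorrelation-mode parameter from input correlation.

    `x` has one row per input audio signal; `threshold` is illustrative.
    """
    # Largest absolute off-diagonal normalised correlation between inputs
    c = np.corrcoef(x)
    off = np.abs(c - np.diag(np.diag(c)))
    # Comparatively high correlation -> a mode allowing cross-channel
    # decorrelator mixing (here mode 2); comparatively lower -> mode 3
    return 2 if off.max() >= threshold else 3
```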
46. A method (300) for providing at least two output audio signals on the basis of an encoded representation, the method comprising:
rendering (310) a plurality of decoded audio signals, which are obtained on the basis of the encoded representation, in dependence on one or more rendering parameters, to obtain a plurality of rendered audio signals,
deriving (320) one or more decorrelated audio signals from the rendered audio signals, and
combining (330) the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to obtain the output audio signals.
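The three method steps above can be sketched as a small pipeline; the rendering matrix R, mixing matrix M and the `decorrelate` placeholder are assumptions standing in for the components the claims describe:

```python
import numpy as np

def decode(decoded_signals, R, M, decorrelate):
    """Sketch of the claimed three steps: render, decorrelate, combine.

    `decorrelate` stands in for an actual decorrelator filter bank.
    """
    Z = R @ decoded_signals          # (310) render the decoded signals
    W = decorrelate(Z)               # (320) derive decorrelated signals
    return Z + M @ W                 # (330) combine to output signals
```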
47. A method (400) for providing an encoded representation on the basis of at least two input audio signals, the method comprising:
providing (410) one or more downmix signals on the basis of the at least two input audio signals,
providing (420) one or more parameters describing a relationship between the at least two input audio signals, and
providing (430) a decorrelation method parameter describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder.
48. A computer program for performing the method according to claim 46 or claim 47, when the computer program runs on a computer.
49. An encoded audio representation (500), comprising:
an encoded representation (510) of a downmix signal;
an encoded representation (520) of one or more parameters describing a relationship between the at least two input audio signals, and
an encoded decorrelation method parameter (530) describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder.
| # | Name | Date |
|---|---|---|
| 1 | Form 5 [19-01-2016(online)].pdf | 2016-01-19 |
| 2 | Form 3 [19-01-2016(online)].pdf | 2016-01-19 |
| 3 | Form 20 [19-01-2016(online)].pdf | 2016-01-19 |
| 4 | Drawing [19-01-2016(online)].pdf | 2016-01-19 |
| 5 | Description(Complete) [19-01-2016(online)].pdf | 2016-01-19 |
| 6 | 201637001870-(22-04-2016)-FORM-1.pdf | 2016-04-22 |
| 7 | 201637001870-(22-04-2016)-CORRESPONDENCE.pdf | 2016-04-22 |
| 8 | Other Patent Document [30-05-2016(online)].pdf | 2016-05-30 |
| 9 | Other Patent Document [30-09-2016(online)].pdf | 2016-09-30 |
| 10 | Other Patent Document [23-01-2017(online)].pdf | 2017-01-23 |
| 11 | Other Patent Document [25-02-2017(online)].pdf | 2017-02-25 |
| 12 | Other Patent Document [28-03-2017(online)].pdf | 2017-03-28 |
| 13 | Information under section 8(2) [07-07-2017(online)].pdf | 2017-07-07 |
| 14 | 201637001870-Information under section 8(2) (MANDATORY) [15-07-2017(online)].pdf | 2017-07-15 |
| 15 | 201637001870-Information under section 8(2) (MANDATORY) [25-07-2017(online)].pdf | 2017-07-25 |
| 16 | 201637001870-Information under section 8(2) (MANDATORY) [24-08-2017(online)].pdf | 2017-08-24 |
| 17 | 201637001870-Information under section 8(2) (MANDATORY) [25-09-2017(online)].pdf | 2017-09-25 |
| 18 | 201637001870-Information under section 8(2) (MANDATORY) [02-12-2017(online)].pdf | 2017-12-02 |
| 19 | 201637001870-Information under section 8(2) (MANDATORY) [18-01-2018(online)].pdf | 2018-01-18 |
| 20 | 201637001870-Information under section 8(2) (MANDATORY) [24-04-2018(online)].pdf | 2018-04-24 |
| 21 | 201637001870-Information under section 8(2) (MANDATORY) [30-05-2018(online)].pdf | 2018-05-30 |
| 22 | 201637001870-Information under section 8(2) (MANDATORY) [01-08-2018(online)].pdf | 2018-08-01 |
| 23 | 201637001870-Information under section 8(2) (MANDATORY) [16-10-2018(online)].pdf | 2018-10-16 |
| 24 | 201637001870-Information under section 8(2) (MANDATORY) [27-11-2018(online)].pdf | 2018-11-27 |
| 25 | 201637001870-Information under section 8(2) (MANDATORY) [29-11-2018(online)].pdf | 2018-11-29 |
| 26 | 201637001870-Information under section 8(2) (MANDATORY) [16-01-2019(online)].pdf | 2019-01-16 |
| 27 | 201637001870-Information under section 8(2) (MANDATORY) [05-04-2019(online)].pdf | 2019-04-05 |
| 28 | 201637001870-Information under section 8(2) (MANDATORY) [20-05-2019(online)].pdf | 2019-05-20 |
| 29 | 201637001870-Information under section 8(2) (MANDATORY) [26-07-2019(online)].pdf | 2019-07-26 |
| 30 | 201637001870-Information under section 8(2) (MANDATORY) [13-08-2019(online)].pdf | 2019-08-13 |
| 31 | 201637001870-Information under section 8(2) (MANDATORY) [07-11-2019(online)].pdf | 2019-11-07 |
| 32 | 201637001870-FER.pdf | 2019-11-15 |
| 33 | 201637001870-Information under section 8(2) [30-04-2020(online)].pdf | 2020-04-30 |
| 34 | 201637001870-Information under section 8(2) [09-05-2020(online)].pdf | 2020-05-09 |
| 35 | 201637001870-OTHERS [15-05-2020(online)].pdf | 2020-05-15 |
| 36 | 201637001870-FER_SER_REPLY [15-05-2020(online)].pdf | 2020-05-15 |
| 37 | 201637001870-COMPLETE SPECIFICATION [15-05-2020(online)].pdf | 2020-05-15 |
| 38 | 201637001870-CLAIMS [15-05-2020(online)].pdf | 2020-05-15 |
| 39 | 201637001870-FORM 3 [07-07-2020(online)].pdf | 2020-07-07 |
| 40 | 201637001870-Information under section 8(2) [20-07-2020(online)].pdf | 2020-07-20 |
| 41 | 201637001870-Information under section 8(2) [13-10-2020(online)].pdf | 2020-10-13 |
| 42 | 201637001870-Information under section 8(2) [16-12-2020(online)].pdf | 2020-12-16 |
| 43 | 201637001870-Information under section 8(2) [22-02-2021(online)].pdf | 2021-02-22 |
| 44 | 201637001870-Information under section 8(2) [10-04-2021(online)].pdf | 2021-04-10 |
| 45 | 201637001870-Information under section 8(2) [16-06-2021(online)].pdf | 2021-06-16 |
| 46 | 201637001870-Information under section 8(2) [12-07-2021(online)].pdf | 2021-07-12 |
| 47 | 201637001870-Information under section 8(2) [18-09-2021(online)].pdf | 2021-09-18 |
| 48 | 201637001870-Information under section 8(2) [05-11-2021(online)].pdf | 2021-11-05 |
| 49 | 201637001870-Information under section 8(2) [17-01-2022(online)].pdf | 2022-01-17 |
| 50 | 201637001870-Information under section 8(2) [30-03-2022(online)].pdf | 2022-03-30 |
| 51 | 201637001870-Information under section 8(2) [14-06-2022(online)].pdf | 2022-06-14 |
| 52 | 201637001870-FORM 3 [14-07-2022(online)].pdf | 2022-07-14 |
| 53 | 201637001870-Information under section 8(2) [20-07-2022(online)].pdf | 2022-07-20 |
| 54 | 201637001870-Information under section 8(2) [01-12-2022(online)].pdf | 2022-12-01 |
| 55 | 201637001870-FORM 3 [05-01-2023(online)].pdf | 2023-01-05 |
| 56 | 201637001870-Information under section 8(2) [18-04-2023(online)].pdf | 2023-04-18 |
| 57 | 201637001870-FORM 3 [06-07-2023(online)].pdf | 2023-07-06 |
| 58 | 201637001870-Information under section 8(2) [27-07-2023(online)].pdf | 2023-07-27 |
| 59 | 201637001870-US(14)-HearingNotice-(HearingDate-07-12-2023).pdf | 2023-11-10 |
| 60 | 201637001870-FORM-26 [28-11-2023(online)].pdf | 2023-11-28 |
| 61 | 201637001870-REQUEST FOR ADJOURNMENT OF HEARING UNDER RULE 129A [29-11-2023(online)].pdf | 2023-11-29 |
| 62 | 201637001870-US(14)-ExtendedHearingNotice-(HearingDate-04-01-2024).pdf | 2023-12-07 |
| 63 | 201637001870-REQUEST FOR ADJOURNMENT OF HEARING UNDER RULE 129A [29-12-2023(online)].pdf | 2023-12-29 |
| 64 | 201637001870-US(14)-ExtendedHearingNotice-(HearingDate-30-01-2024).pdf | 2024-01-02 |
| 65 | 201637001870-Correspondence to notify the Controller [25-01-2024(online)].pdf | 2024-01-25 |
| 66 | 201637001870-FORM 3 [13-02-2024(online)].pdf | 2024-02-13 |
| 67 | 201637001870-Written submissions and relevant documents [14-02-2024(online)].pdf | 2024-02-14 |
| 68 | 201637001870-Annexure [14-02-2024(online)].pdf | 2024-02-14 |
| 69 | 201637001870-PatentCertificate19-02-2024.pdf | 2024-02-19 |
| 70 | 201637001870-IntimationOfGrant19-02-2024.pdf | 2024-02-19 |