
Parametric Joint Coding Of Audio Sources

Abstract: The following coding scenario is addressed: A number of audio source signals need to be transmitted or stored for the purpose of mixing wave field synthesis, multi channel surround, or stereo signals after decoding the source signals. The proposed technique offers significant coding gain when jointly coding the source signals, compared to separately coding them, even when no redundancy is present between the source signals. This is possible by considering statistical properties of the source signals, the properties of mixing techniques, and spatial hearing. The sum of the source signals is transmitted plus the statistical properties of the source signals which mostly determine the perceptually important spatial cues of the final mixed audio channels. Source signals are recovered at the receiver such that their statistical properties approximate the corresponding properties of the original source signals. Subjective evaluations indicate that high audio quality is achieved by the proposed scheme.


Patent Information

Application #
Filing Date
26 September 2012
Publication Number
47/2014
Publication Type
INA
Invention Field
COMMUNICATION
Status
Email
Parent Application
Patent Number
Legal Status
Grant Date
2021-06-04
Renewal Date

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
HANSASTRAßE 27C, 80686 MUNICH GERMANY

Inventors

1. FALLER, CHRISTOF
Guetrain 1, 8274 Tägerwilen, SWITZERLAND

Specification

PARAMETRIC JOINT-CODING OF AUDIO SOURCES

I. INTRODUCTION

In a general coding problem, we have a number of (mono) source signals si(n) (1 [...] 0) and less de-correlation processing can be applied than would be needed for generating independent M or N channels. Due to less de-correlation processing, better audio quality is expected.

Best audio quality is expected when the mixer parameters are constrained such that ai² + bi² = 1, i.e. Gi = 0 dB. In this case, the power of each source in the transmitted sum signal (1) is the same as the power of the same source in the mixed decoder output signal. The decoder output signal (Figure 10) is then the same as if the mixer output signal (Figure 4) were encoded and decoded by a BCC encoder/decoder. Thus, similar quality can also be expected.

The decoder can determine not only the direction at which each source is to appear; the gain of each source can also be varied. The gain is increased by choosing ai² + bi² > 1 (Gi > 0 dB) and decreased by choosing ai² + bi² < 1 (Gi < 0 dB).

B. Using no de-correlation processing

The restriction of the previously described technique is that mixing is carried out with a BCC synthesis scheme. One could imagine implementing not only ICTD, ICLD, and ICC synthesis but additionally effects processing within the BCC synthesis. However, it may be desired that existing mixers and effects processors can be used. This also includes wavefield synthesis mixers (often denoted "convoluters"). For using existing mixers and effects processors, the Ŝi(n) are computed explicitly and used as if they were the original source signals.

When applying no de-correlation processing (hi(n) = δ(n) in (16)), good audio quality can also be achieved. It is a compromise between artifacts introduced by de-correlation processing and artifacts due to the fact that the source signals Ŝi(n) are correlated. When no de-correlation processing is used, the resulting auditory spatial image may suffer from instability [1].
However, the mixer may itself introduce some de-correlation when reverberators or other effects are used, and thus there is less need for de-correlation processing.

If the Ŝi(n) are generated without de-correlation processing, the level of the sources depends on the direction to which they are mixed relative to the other sources. By replacing amplitude panning algorithms in existing mixers with an algorithm compensating for this level dependence, the negative effect of loudness dependence on mixing parameters can be circumvented. A level-compensating amplitude panning algorithm, which aims to compensate the source level dependence on mixing parameters, is shown in Figure 11. Given the gain factors of a conventional amplitude panning algorithm (e.g. Figure 4), ai and bi, the weights in Figure 11, āi and b̄i, are computed by [...]. Note that āi and b̄i are computed such that the output subband power is the same as if the Ŝi(n) were independent in each subband.

C. Reducing the amount of de-correlation processing

As mentioned previously, the generation of independent Ŝi(n) is problematic. Here, strategies are described for applying less de-correlation processing while effectively getting a similar effect as if the Ŝi(n) were independent.

Consider for example a wavefield synthesis system as shown in Figure 12. The desired virtual source positions for s1, s2, ..., s6 (M = 6) are indicated. A strategy for computing Ŝi(n) (16) without generating M fully independent signals is:

1. Generate groups of source indices corresponding to sources close to each other. For example in Figure 8 these could be: {1}, {2, 5}, {3}, and {4, 6}.
2. At each time in each subband, select the source index of the strongest source. Apply no de-correlation processing for the source indices that are part of the group containing that source.
3. For each other group, choose the same hi(n) within the group.

The described algorithm modifies the strongest signal components least. Additionally, the number of different hi(n) that are used is reduced.
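The three-step grouping strategy can be illustrated with a minimal Python sketch for a single time/subband tile. The function name assign_decorrelators, the integer filter ids, and the example powers are hypothetical. The text only states that the group containing the strongest source is left unprocessed and that each other group shares one hi(n); giving different groups different filters is an assumption of this sketch.

```python
from typing import Dict, Sequence


def assign_decorrelators(groups: Sequence[Sequence[int]],
                         subband_power: Dict[int, float]) -> Dict[int, int]:
    """Map each source index to a filter id for one time/subband tile.

    Filter id 0 stands for "no de-correlation" (h_i(n) = delta(n)).
    Ids 1, 2, ... stand for distinct de-correlation filters (assumed).
    """
    # Step 2: find the strongest source in this tile.
    strongest = max(subband_power, key=subband_power.get)

    filter_of: Dict[int, int] = {}
    next_filter = 1
    for group in groups:
        if strongest in group:
            # The group containing the strongest source is left unprocessed.
            filter_id = 0
        else:
            # Step 3: all sources of another group share one filter.
            filter_id = next_filter
            next_filter += 1
        for index in group:
            filter_of[index] = filter_id
    return filter_of


# Step 1: groups of sources close to each other (example from the text).
groups = [[1], [2, 5], [3], [4, 6]]
# Hypothetical subband powers; source 2 is the strongest.
power = {1: 0.1, 2: 0.9, 3: 0.2, 4: 0.3, 5: 0.05, 6: 0.4}
print(assign_decorrelators(groups, power))
# prints {1: 1, 2: 0, 5: 0, 3: 2, 4: 3, 6: 3}
```

With six sources, only three distinct filters are needed in this tile instead of six.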
Using fewer different hi(n) is an advantage because de-correlation is easier the fewer independent channels need to be generated. The described technique is also applicable when stereo or multi-channel audio signals are mixed.

V. SCALABILITY IN TERMS OF QUALITY AND BITRATE

The proposed scheme transmits only the sum of all source signals, which can be coded with a conventional mono audio coder. When no mono backwards compatibility is needed and capacity is available for transmission/storage of more than one audio waveform, the proposed scheme can be scaled for use with more than one transmission channel. This is implemented by generating several sum signals with different subsets of the given source signals, i.e. the proposed coding scheme is applied individually to each subset of source signals. Audio quality is expected to improve as the number of transmitted audio channels is increased, because fewer independent channels have to be generated by de-correlation from each transmitted channel (compared to the case of one transmitted channel).

VI. BACKWARDS COMPATIBILITY TO EXISTING STEREO AND SURROUND AUDIO FORMATS

Consider the following audio delivery scenario. A consumer obtains a maximum quality stereo or multi-channel surround signal (e.g. by means of an audio CD, DVD, or on-line music store, etc.). The goal is to optionally deliver to the consumer the flexibility to generate a custom mix of the obtained audio content, without compromising standard stereo/surround playback quality.

This is implemented by delivering to the consumer (e.g. as an optional purchase in an on-line music store) a bit stream of side information which allows computation of Ŝi(n) as a function of the given stereo or multi-channel audio signal. The consumer's mixing algorithm is then applied to the Ŝi(n). In the following, two possibilities for computing Ŝi(n), given stereo or multi-channel audio signals, are described.

A. Estimating the sum of the source signals at the receiver

The most straightforward way of using the proposed coding scheme with a stereo or multi-channel audio transmission is illustrated in Figure 13, where yi(n) (1 ≤ i ≤ L) are the L channels of the given stereo or multi-channel audio signal. The sum signal of the sources is estimated by downmixing the transmitted channels to a single audio channel. Downmixing is carried out by computing the sum of the channels yi(n) (1 ≤ i ≤ L), or more sophisticated techniques may be applied.

For best performance, it is recommended that the level of the source signals is adapted prior to E{s̃i²(n)} estimation (6) such that the power ratio between the source signals approximates the power ratio with which the sources are contained in the given stereo or multi-channel signal. In this case, the downmix of the transmitted channels is a relatively good estimate of the sum of the sources (1) (or a scaled version thereof).

An automated process may be used to adjust the level of the encoder source signal inputs si(n) prior to computation of the side information. This process estimates, adaptively in time, the level at which each source signal is contained in the given stereo or multi-channel signal. Prior to side information computation, the level of each source signal is then adjusted, adaptively in time, such that it is equal to the level at which the source is contained in the stereo or multi-channel audio signal.

B. Using the transmitted channels individually

Figure 14 shows a different implementation of the proposed scheme with stereo or multi-channel surround signal transmission. Here, the transmitted channels are not downmixed, but used individually for generation of the Ŝi(n). Most generally, the subband signals of Ŝi(n) are computed by [...], where wi(n) are weights determining specific linear combinations of the transmitted channels' subbands.
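The two options of this section, downmixing the transmitted channels by summation and forming weighted linear combinations of them, can be sketched in Python as follows. This is a simplified illustration only: the function names and sample values are hypothetical, the signals are short time-domain lists rather than subband signals, and one fixed weight per channel stands in for the time-varying weights wi(n) of the text.

```python
from typing import List, Sequence


def downmix_sum(channels: Sequence[Sequence[float]]) -> List[float]:
    """Estimate the sum signal by adding the L channels sample by sample."""
    return [sum(samples) for samples in zip(*channels)]


def combine_channels(channels: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Weighted linear combination of the channels (one weight per channel)."""
    if len(weights) != len(channels):
        raise ValueError("need exactly one weight per channel")
    return [sum(w * x for w, x in zip(weights, samples))
            for samples in zip(*channels)]


# Hypothetical stereo signal (L = 2) with three samples per channel.
left = [0.5, -0.25, 0.0]
right = [0.25, 0.25, -0.5]

print(downmix_sum([left, right]))                     # [0.75, 0.0, -0.5]
print(combine_channels([left, right], [1.0, -1.0]))   # [0.25, -0.5, 0.5]
```

The summation downmix is the special case in which every weight equals one.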
The linear combinations are chosen such that the Ŝi(n) are already as de-correlated as possible. Thus, no or only a small amount of de-correlation processing needs to be applied, which is favorable as discussed earlier.

VII. APPLICATIONS

A number of applications for the proposed coding schemes have already been mentioned. Here, we summarize these and mention a few more applications.

A. Audio coding for mixing

Whenever audio source signals need to be stored or transmitted prior to mixing them to stereo, multi-channel, or wavefield synthesis audio signals, the proposed scheme can be applied. With prior art, a mono audio coder would be applied to each source signal independently, resulting in a bitrate which scales with the number of sources. The proposed coding scheme can encode a high number of audio source signals with a single mono audio coder plus relatively low bitrate side information. As described in Section V, the audio quality can be improved by using more than one transmitted channel, if the memory/capacity to do so is available.

B. Re-mixing with meta-data

As described in Section VI, existing stereo and multi-channel audio signals can be re-mixed with the help of additional side information (i.e. "meta-data"). As opposed to only selling optimized stereo and multi-channel mixed audio content, meta-data can be sold allowing a user to re-mix his stereo and multi-channel music. This can, for example, also be used for attenuating the vocals in a song for karaoke, or for attenuating specific instruments for playing an instrument along with the music.

Even if storage were not an issue, the described scheme would be very attractive for enabling custom mixing of music. This is because it is likely that the music industry would never be willing to give away the multi-track recordings. The danger of abuse is too great. The proposed scheme enables re-mixing capability without giving away the multi-track recordings.
Furthermore, as soon as stereo or multi-channel signals are re-mixed, a certain degree of quality reduction occurs, making illegal distribution of re-mixes less attractive.

C. Stereo/multi-channel to wavefield synthesis conversion

Another application for the scheme described in Section VI is described in the following. The stereo and multi-channel (e.g. 5.1 surround) audio accompanying moving pictures can be extended for wavefield synthesis rendering by adding side information. For example, Dolby AC-3 (audio on DVD) can be extended for 5.1 backwards compatibly coding audio for wavefield synthesis systems, i.e. DVDs play back 5.1 surround sound on conventional legacy players and wavefield synthesis sound on a new generation of players supporting processing of the side information.

VIII. SUBJECTIVE EVALUATIONS

We implemented a real-time decoder of the algorithms proposed in Sections IV-A and IV-B. An FFT-based STFT filterbank is used. A 1024-point FFT and an STFT window size of 768 (with zero padding) are used. The spectral coefficients are grouped together such that each group represents a signal with a bandwidth of two times the equivalent rectangular bandwidth (ERB). Informal listening revealed that the audio quality did not notably improve when choosing higher frequency resolution. A lower frequency resolution is favorable since it results in fewer parameters to be transmitted.

For each source, the amplitude/delay panning and gain can be adjusted individually. The algorithm was used for coding of several multi-track audio recordings with 12 - 14 tracks.

The decoder allows 5.1 surround mixing using a vector base amplitude panning (VBAP) mixer. Direction and gain of each source signal can be adjusted. The software allows on-the-fly switching between mixing the coded source signal and mixing the original discrete source signals.
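The filterbank parameters given for the evaluation (768-sample window, zero-padded to a 1024-point FFT, bands of two ERB) can be illustrated with the Python sketch below. Several details are assumptions not stated in the text: the 44.1 kHz sampling rate, the use of the Glasberg and Moore ERB-rate formula, the placement of the zero padding after the window, and all function names.

```python
import math
from typing import List


def erb_number(freq_hz: float) -> float:
    """ERB-rate scale after Glasberg and Moore (an assumed choice of formula)."""
    return 21.4 * math.log10(1.0 + 0.00437 * freq_hz)


def zero_pad(frame: List[float], fft_size: int = 1024) -> List[float]:
    """Append zeros so that a windowed frame fills the FFT length."""
    if len(frame) > fft_size:
        raise ValueError("frame is longer than the FFT size")
    return list(frame) + [0.0] * (fft_size - len(frame))


def partition_bins(fft_size: int = 1024,
                   sample_rate: float = 44100.0,
                   erbs_per_band: float = 2.0) -> List[int]:
    """Return a band index for each of the fft_size // 2 + 1 spectral bins.

    Bins whose centre frequencies fall into the same interval of
    erbs_per_band ERB on the ERB-rate scale share one band index.
    """
    bands = []
    for k in range(fft_size // 2 + 1):
        freq_hz = k * sample_rate / fft_size
        bands.append(int(erb_number(freq_hz) // erbs_per_band))
    return bands


frame = zero_pad([0.0] * 768)           # 768-sample window, padded to 1024
bands = partition_bins()
print(len(frame), len(bands), len(set(bands)))
```

Each band index then identifies one group of spectral coefficients for which a single set of side-information parameters would be computed, which is why a coarser partition lowers the parameter count.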
Casual listening usually reveals little or no difference between mixing the coded and the original source signals if a gain Gi of 0 dB is used for each source. The more the source gains are varied, the more artifacts occur. Slight amplification and attenuation of the sources (e.g. up to ± 6 dB) still sounds good. A critical scenario is when all the sources are mixed to one side and only a single source to the opposite side. In this case the audio quality may be reduced, depending on the specific mixing and source signals.

IX. CONCLUSIONS

A coding scheme for joint-coding of audio source signals, e.g. the channels of a multi-track recording, was proposed. The goal is not to code the source signal waveforms with high quality, in which case joint-coding would give minimal coding gain since the audio sources are usually independent. The goal is that a high quality audio signal is obtained when the coded source signals are mixed. By considering statistical properties of the source signals, the properties of mixing schemes, and spatial hearing, it was shown that significant coding gain improvement is achieved by jointly coding the source signals.

The coding gain improvement is due to the fact that only one audio waveform is transmitted. Additionally, side information is transmitted, representing the statistical properties of the source signals which are the relevant factors determining the spatial perception of the final mixed signal.

The side information rate is about 3 kb/s per source signal. Any mixer can be applied with the coded source signals, e.g. stereo, multi-channel, or wavefield synthesis mixers.

It is straightforward to scale the proposed scheme for higher bitrate and quality by transmitting more than one audio channel. Furthermore, a variation of the scheme was proposed which allows re-mixing of the given stereo or multi-channel audio signal (and even changing of the audio format, e.g. stereo to multi-channel or wavefield synthesis).
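To make the bitrate argument concrete, the following Python sketch compares the total rate of separate coding with that of the joint scheme. Only the side information rate of about 3 kb/s per source is taken from the text. The 64 kb/s mono coder rate, the function names, and the assumption that the side information rate stays the same when several channels are transmitted are hypothetical.

```python
def separate_bitrate_kbps(num_sources: int, mono_coder_kbps: float) -> float:
    """Total rate when each source is coded with its own mono coder."""
    return num_sources * mono_coder_kbps


def joint_bitrate_kbps(num_sources: int,
                       mono_coder_kbps: float,
                       side_info_kbps_per_source: float = 3.0,
                       num_transmitted_channels: int = 1) -> float:
    """Total rate of the joint scheme: coded sum signal(s) plus side information."""
    waveform_rate = num_transmitted_channels * mono_coder_kbps
    side_info_rate = num_sources * side_info_kbps_per_source
    return waveform_rate + side_info_rate


# Twelve sources, as in the smallest recordings used in the evaluation,
# with an assumed mono coder rate of 64 kb/s.
print(separate_bitrate_kbps(12, 64.0))   # 768.0
print(joint_bitrate_kbps(12, 64.0))      # 100.0
```

Under these assumptions the joint scheme grows by about 3 kb/s per additional source, whereas separate coding grows by the full mono coder rate.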
The applications of the proposed scheme are manifold. For example, MPEG-4 could be extended with the proposed scheme to reduce bitrate when more than one "natural audio object" (source signal) needs to be transmitted. Also, the proposed scheme offers a compact representation of content for wavefield synthesis systems. As mentioned, existing stereo or multi-channel signals could be complemented with side information to allow the user to re-mix the signals to his liking.

REFERENCES

[1] C. Faller, Parametric Coding of Spatial Audio, Ph.D. thesis, Swiss Federal Institute of Technology Lausanne (EPFL), 2004, Ph.D. Thesis No. 3062.
[2] C. Faller and F. Baumgarte, "Binaural Cue Coding - Part II: Schemes and applications," IEEE Trans. on Speech and Audio Proc., vol. 11, no. 6, Nov. 2003.

We Claim:

1. Method of encoding a plurality of source signals, comprising: computing, for the plurality of source signals, statistical information representing the one or more source signals, and transmitting the computed statistical information as metadata for an audio signal derived from the plurality of source signals.

2. Method as claimed in claim 1, wherein the statistical information comprises information on a subband power of the plurality of source signals, a normalized subband cross correlation function or a normalized subband auto-correlation function.

3. Apparatus for encoding a plurality of source signals, wherein the apparatus is operative for: computing, for the plurality of source signals, statistical information representing a spectral envelope of one or more source signals, and transmitting the computed statistical information as metadata for an audio signal derived from the plurality of source signals.

Documents

Application Documents

# Name Date
1 2819-KOLNP-2012-(25-09-2012)-SPECIFICATION.pdf 2012-09-25
2 2819-KOLNP-2012-(25-09-2012)-FORM-5.pdf 2012-09-25
3 2819-KOLNP-2012-(25-09-2012)-FORM-3.pdf 2012-09-25
4 2819-KOLNP-2012-(25-09-2012)-FORM-2.pdf 2012-09-25
5 2819-KOLNP-2012-(25-09-2012)-FORM-1.pdf 2012-09-25
6 2819-KOLNP-2012-(25-09-2012)-DRAWINGS.pdf 2012-09-25
7 2819-KOLNP-2012-(25-09-2012)-DESCRIPTION (COMPLETE).pdf 2012-09-25
8 2819-KOLNP-2012-(25-09-2012)-CORRESPONDENCE.pdf 2012-09-25
9 2819-KOLNP-2012-(25-09-2012)-CLAIMS.pdf 2012-09-25
10 2819-KOLNP-2012-(25-09-2012)-ABSTRACT.pdf 2012-09-25
11 2819-KOLNP-2012-(25-03-2013)-FORM-18.pdf 2013-03-25
12 Other Patent Document [31-08-2016(online)].pdf 2016-08-31
13 Other Patent Document [16-02-2017(online)].pdf 2017-02-16
14 Other Patent Document [31-03-2017(online)].pdf 2017-03-31
15 Information under section 8(2) [31-05-2017(online)].pdf 2017-05-31
16 2819-KOLNP-2012-Information under section 8(2) (MANDATORY) [23-08-2017(online)].pdf 2017-08-23
17 2819-KOLNP-2012-Information under section 8(2) (MANDATORY) [09-02-2018(online)].pdf 2018-02-09
18 2819-KOLNP-2012-Information under section 8(2) (MANDATORY) [17-02-2018(online)].pdf 2018-02-17
19 2819-KOLNP-2012-Information under section 8(2) (MANDATORY) [13-03-2018(online)].pdf 2018-03-13
20 2819-KOLNP-2012-Information under section 8(2) (MANDATORY) [29-08-2018(online)].pdf 2018-08-29
21 2819-KOLNP-2012-Information under section 8(2) (MANDATORY) [20-11-2018(online)].pdf 2018-11-20
22 2819-KOLNP-2012-FER.pdf 2019-01-08
23 2819-KOLNP-2012-Information under section 8(2) (MANDATORY) [01-03-2019(online)].pdf 2019-03-01
24 2819-KOLNP-2012-Information under section 8(2) (MANDATORY) [25-04-2019(online)].pdf 2019-04-25
25 2819-KOLNP-2012-Information under section 8(2) (MANDATORY) [06-06-2019(online)].pdf 2019-06-06
26 2819-KOLNP-2012-Proof of Right (MANDATORY) [14-06-2019(online)].pdf 2019-06-14
27 2819-KOLNP-2012-FORM-26 [14-06-2019(online)].pdf 2019-06-14
28 2819-KOLNP-2012-Proof of Right (MANDATORY) [21-06-2019(online)].pdf 2019-06-21
29 2819-KOLNP-2012-FORM-26 [21-06-2019(online)].pdf 2019-06-21
30 2819-KOLNP-2012-FORM 4(ii) [01-07-2019(online)].pdf 2019-07-01
31 2819-KOLNP-2012-Information under section 8(2) (MANDATORY) [16-09-2019(online)].pdf 2019-09-16
32 2819-KOLNP-2012-PETITION UNDER RULE 137 [04-10-2019(online)].pdf 2019-10-04
33 2819-KOLNP-2012-PETITION UNDER RULE 137 [04-10-2019(online)]-1.pdf 2019-10-04
34 2819-KOLNP-2012-FORM-26 [04-10-2019(online)].pdf 2019-10-04
35 2819-KOLNP-2012-FORM 13 [04-10-2019(online)].pdf 2019-10-04
36 2819-KOLNP-2012-FER_SER_REPLY [04-10-2019(online)].pdf 2019-10-04
37 2819-KOLNP-2012-DRAWING [04-10-2019(online)].pdf 2019-10-04
38 2819-KOLNP-2012-CORRESPONDENCE [04-10-2019(online)].pdf 2019-10-04
39 2819-KOLNP-2012-COMPLETE SPECIFICATION [04-10-2019(online)].pdf 2019-10-04
40 2819-KOLNP-2012-CLAIMS [04-10-2019(online)].pdf 2019-10-04
41 2819-KOLNP-2012-ABSTRACT [04-10-2019(online)].pdf 2019-10-04
42 2819-KOLNP-2012-Information under section 8(2) [04-02-2020(online)].pdf 2020-02-04
43 2819-KOLNP-2012-Information under section 8(2) [05-10-2020(online)].pdf 2020-10-05
44 2819-KOLNP-2012-Information under section 8(2) [13-10-2020(online)].pdf 2020-10-13
45 2819-KOLNP-2012-Information under section 8(2) [20-11-2020(online)].pdf 2020-11-20
46 2819-KOLNP-2012-FORM 3 [10-04-2021(online)].pdf 2021-04-10
47 2819-KOLNP-2012-PatentCertificate04-06-2021.pdf 2021-06-04
48 2819-KOLNP-2012-IntimationOfGrant04-06-2021.pdf 2021-06-04
49 2819-KOLNP-2012-RELEVANT DOCUMENTS [07-09-2023(online)].pdf 2023-09-07

Search Strategy

1 Search_19-12-2017.pdf

ERegister / Renewals

3rd: 11 Aug 2021

From 13/02/2008 - To 13/02/2009

4th: 11 Aug 2021

From 13/02/2009 - To 13/02/2010

5th: 11 Aug 2021

From 13/02/2010 - To 13/02/2011

6th: 11 Aug 2021

From 13/02/2011 - To 13/02/2012

7th: 11 Aug 2021

From 13/02/2012 - To 13/02/2013

8th: 11 Aug 2021

From 13/02/2013 - To 13/02/2014

9th: 11 Aug 2021

From 13/02/2014 - To 13/02/2015

10th: 11 Aug 2021

From 13/02/2015 - To 13/02/2016

11th: 11 Aug 2021

From 13/02/2016 - To 13/02/2017

12th: 11 Aug 2021

From 13/02/2017 - To 13/02/2018

13th: 11 Aug 2021

From 13/02/2018 - To 13/02/2019

14th: 11 Aug 2021

From 13/02/2019 - To 13/02/2020

15th: 11 Aug 2021

From 13/02/2020 - To 13/02/2021

16th: 11 Aug 2021

From 13/02/2021 - To 13/02/2022

17th: 01 Feb 2022

From 13/02/2022 - To 13/02/2023

18th: 31 Jan 2023

From 13/02/2023 - To 13/02/2024

19th: 31 Jan 2024

From 13/02/2024 - To 13/02/2025

20th: 31 Jan 2025

From 13/02/2025 - To 13/02/2026