
Apparatus, Method And Computer Program For Encoding, Decoding, Scene Processing And Other Procedures Related To Dirac Based Spatial Audio Coding Using Low Order, Mid Order And High Order Components Generators

Abstract: An apparatus for generating a sound field description using an input signal comprising a mono-signal or a multi-channel signal comprises: an input signal analyzer (600) for analyzing the input signal to derive direction data and diffuseness data; a low-order components generator (810) for generating a low-order sound field description from the input signal up to a predetermined order and mode, wherein the low-order components generator is configured to derive the low-order sound field description by copying or taking the input signal or performing a weighted combination of the channels of the input signal; a mid-order components generator (820) for generating a mid-order sound field description above the predetermined order or at the predetermined order and above the predetermined mode and below or at a first truncation order using a synthesis of at least one direct portion and of at least one diffuse portion using the direction data and the diffuseness data so that the mid-order sound field description comprises a direct contribution and a diffuse contribution; and a high-order components generator (830) for generating a high-order sound field description having a component above the first truncation order using a synthesis of at least one direct portion, wherein the high order sound field description comprises a direct contribution only.
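The three-way split described in the abstract can be made concrete with a short sketch. The following Python fragment is illustrative only: the function name is invented, and the handling of the "predetermined mode" is simplified to a pure order threshold, which the claim language does not require. It partitions the (H+1)^2 Ambisonics channels of a target order H among the three generators.

```python
# Illustrative sketch (not the patent's implementation): partition the
# (H+1)^2 Ambisonics channels of a target order H among the three
# component generators named in the abstract. The thresholds
# "predetermined_order" and "first_truncation_order" are assumptions.

def partition_channels(H: int, predetermined_order: int, first_truncation_order: int):
    """Return ACN channel indices handled by the low-, mid- and
    high-order components generators for a target order H."""
    def acn(l, m):
        return l * (l + 1) + m  # Ambisonics Channel Numbering

    low, mid, high = [], [], []
    for l in range(H + 1):
        for m in range(-l, l + 1):
            if l <= predetermined_order:
                low.append(acn(l, m))    # copied/weighted from the input signal
            elif l <= first_truncation_order:
                mid.append(acn(l, m))    # direct + diffuse synthesis
            else:
                high.append(acn(l, m))   # direct synthesis only
    return low, mid, high

# Example: H = 3 (16 channels), low orders up to 1, first truncation order 2
low, mid, high = partition_channels(3, 1, 2)
assert len(low) + len(mid) + len(high) == (3 + 1) ** 2
```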


Patent Information

Application #: 202137024114
Filing Date: 31 May 2021
Publication Number: 32/2021
Publication Type: INA
Invention Field: ELECTRONICS
Status: —
Email: kolkatapatent@lsdavar.in
Parent Application: —
Patent Number: —
Legal Status: —
Grant Date: 2024-01-24
Renewal Date: —

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
Hansastraße 27c, 80686 München

Inventors

1. FUCHS, Guillaume
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS, Am Wolfsmantel 33, 91058 Erlangen
2. THIERGART, Oliver
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS, Am Wolfsmantel 33, 91058 Erlangen
3. KORSE, Srikanth
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS, Am Wolfsmantel 33, 91058 Erlangen
4. DÖHLA, Stefan
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS, Am Wolfsmantel 33, 91058 Erlangen
5. MULTRUS, Markus
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS, Am Wolfsmantel 33, 91058 Erlangen
6. KÜCH, Fabian
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS, Am Wolfsmantel 33, 91058 Erlangen
7. BOUTHÉON, Alexandre
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS, Am Wolfsmantel 33, 91058 Erlangen
8. EICHENSEER, Andrea
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS, Am Wolfsmantel 33, 91058 Erlangen
9. BAYER, Stefan
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS, Am Wolfsmantel 33, 91058 Erlangen

Specification

Apparatus, method and computer program for encoding, decoding, scene processing and other procedures related to DirAC based spatial audio coding using low-order, mid-order and high-order components generators

The present invention is directed to audio coding and, particularly, to the generation of a sound field description from an input signal using one or more sound component generators.

The Directional Audio Coding (DirAC) technique [1] is an efficient approach to the analysis and reproduction of spatial sound. DirAC uses a perceptually motivated representation of the sound field based on the direction of arrival (DOA) and the diffuseness measured per frequency band. It is built upon the assumption that at one time instant and in one critical band, the spatial resolution of the auditory system is limited to decoding one cue for direction and another for inter-aural coherence. The spatial sound is then represented in the frequency domain by cross-fading two streams: a non-directional diffuse stream and a directional non-diffuse stream.

DirAC was originally intended for recorded B-format sound but can also be extended to microphone signals matching a specific loudspeaker setup like 5.1 [2] or to any configuration of microphone arrays [5]. In the latter case, more flexibility can be achieved by recording the signals not for a specific loudspeaker setup, but for an intermediate format instead. Such an intermediate format, which is well-established in practice, is represented by (higher-order) Ambisonics [3]. From an Ambisonics signal, one can generate the signals of every desired loudspeaker setup, including binaural signals for headphone reproduction. This requires a specific renderer applied to the Ambisonics signal, using either a linear Ambisonics renderer [3] or a parametric renderer such as Directional Audio Coding (DirAC).

An Ambisonics signal can be represented as a multi-channel signal where each channel (referred to as an Ambisonics component) is equivalent to the coefficient of a so-called spatial basis function. With a weighted sum of these spatial basis functions (with the weights corresponding to the coefficients), one can recreate the original sound field at the recording location [3]. Therefore, the spatial basis function coefficients (i.e., the Ambisonics components) represent a compact description of the sound field at the recording location. There exist different types of spatial basis functions, for example spherical harmonics (SHs) [3] or cylindrical harmonics (CHs) [3]. CHs can be used when describing the sound field in 2D space (for example for 2D sound reproduction), whereas SHs can be used to describe the sound field in 2D and 3D space (for example for 2D and 3D sound reproduction).

As an example, an audio signal $f(t)$ which arrives from a certain direction $(\varphi, \theta)$ results in a spatial audio signal $f(\varphi, \theta, t)$ which can be represented in Ambisonics format by expanding the spherical harmonics up to a truncation order $H$:

$$f(\varphi, \theta, t) = \sum_{l=0}^{H} \sum_{m=-l}^{l} Y_l^m(\varphi, \theta)\, \phi_l^m(t),$$

whereby $Y_l^m(\varphi, \theta)$ are the spherical harmonics of order $l$ and mode $m$, and $\phi_l^m(t)$ the expansion coefficients. With increasing truncation order $H$, the expansion results in a more precise spatial representation. Spherical harmonics up to order $H = 4$ with Ambisonics Channel Numbering (ACN) index are illustrated in Fig. 1a for order $n$ and mode $m$.
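As an aside, the ACN indexing mentioned above has a simple closed form, ACN = l(l+1) + m, which also shows why an order-H signal carries (H+1)^2 components. A minimal Python sketch (helper names are illustrative):

```python
# Ambisonics Channel Numbering (ACN), as referenced for Fig. 1a:
# channel index ACN = l*(l+1) + m, so an order-H signal has (H+1)^2 components.
from math import isqrt

def acn_index(l: int, m: int) -> int:
    assert 0 <= l and -l <= m <= l
    return l * (l + 1) + m

def order_and_mode(acn: int):
    l = isqrt(acn)          # order is floor(sqrt(ACN)) since l^2 <= ACN <= l^2 + 2l
    m = acn - l * (l + 1)   # mode recovered from the remainder
    return l, m

H = 4
assert sum(1 for l in range(H + 1) for m in range(-l, l + 1)) == (H + 1) ** 2  # 25
assert order_and_mode(acn_index(2, -1)) == (2, -1)
```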
DirAC was already extended for delivering higher-order Ambisonics signals from a first-order Ambisonics signal (FOA, also called B-format) or from different microphone arrays [5]. This document focuses on a more efficient way to synthesize higher-order Ambisonics signals from DirAC parameters and a reference signal. In this document, the reference signal, also referred to as the down-mix signal, is considered a subset of a higher-order Ambisonics signal or a linear combination of a subset of the Ambisonics components.

In addition, the present invention considers the case in which DirAC is used for the transmission of the audio scene in parametric form. In this case, the down-mix signal is encoded by a conventional audio core encoder while the DirAC parameters are transmitted in a compressed manner as side information. The advantage of the present method is that it takes into account the quantization error occurring during the audio coding.

In the following, an overview of a spatial audio coding system based on DirAC and designed for Immersive Voice and Audio Services (IVAS) is presented. This represents one of different contexts, namely a system overview of a DirAC-based spatial audio coder. The objective of such a system is to be able to handle different spatial audio formats representing the audio scene, to code them at low bit-rates, and to reproduce the original audio scene as faithfully as possible after transmission.

The system can accept as input different representations of audio scenes. The input audio scene can be captured by multi-channel signals aimed at being reproduced at the different loudspeaker positions, auditory objects along with metadata describing the positions of the objects over time, or a first-order or higher-order Ambisonics format representing the sound field at the listener or reference position. Preferably, the system is based on 3GPP Enhanced Voice Services (EVS), since the solution is expected to operate with low latency to enable conversational services on mobile networks.

As shown in Fig. 1b, the encoder (IVAS encoder) is capable of supporting different audio formats presented to the system separately or at the same time. Audio signals can be acoustic in nature, picked up by microphones, or electrical in nature, i.e., supposed to be transmitted to the loudspeakers. Supported audio formats can be multi-channel signals, first-order and higher-order Ambisonics components, and audio objects. A complex audio scene can also be described by combining different input formats. All audio formats are then transmitted to the DirAC analysis, which extracts a parametric representation of the complete audio scene. A direction of arrival and a diffuseness measured per time-frequency unit form the parameters. The DirAC analysis is followed by a spatial metadata encoder, which quantizes and encodes the DirAC parameters to obtain a low bit-rate parametric representation.

Along with the parameters, a down-mix signal derived from the different sources or audio input signals is coded for transmission by a conventional audio core-coder. In this case, an EVS-based audio coder is adopted for coding the down-mix signal. The down-mix signal consists of different channels, called transport channels: the signal can be, e.g., the four coefficient signals composing a B-format signal, a stereo pair, or a monophonic down-mix, depending on the targeted bit-rate. The coded spatial parameters and the coded audio bitstream are multiplexed before being transmitted over the communication channel.
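As an illustration of the transport-channel choice just described, the following sketch maps a target bit-rate to a down-mix configuration. The thresholds and the function name are assumptions made purely for illustration; they are not taken from the patent or from the EVS/IVAS specifications.

```python
# Illustrative only: pick a transport-channel configuration for the
# down-mix from the targeted bit-rate, as described in the text above.
# The bit-rate thresholds below are invented for the example.
def select_transport_channels(bitrate_bps: int) -> list[str]:
    if bitrate_bps >= 96_000:
        return ["W", "X", "Y", "Z"]   # four B-format coefficient signals
    if bitrate_bps >= 48_000:
        return ["L", "R"]             # stereo pair
    return ["mono"]                   # monophonic down-mix

assert select_transport_channels(128_000) == ["W", "X", "Y", "Z"]
```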
The encoder side of the DirAC-based spatial audio coding supporting different audio formats is illustrated in Fig. 1b. An acoustic/electrical input 1000 is input into an encoder interface 1010, where the encoder interface has a specific functionality for first-order Ambisonics (FOA) or higher-order Ambisonics (HOA) illustrated at 1013. Furthermore, the encoder interface has a functionality for multi-channel (MC) data such as stereo data, 5.1 data, or data having more than two or five channels. Furthermore, the encoder interface 1010 has a functionality for object coding such as, for example, SAOC (spatial audio object coding), illustrated at 1011. The IVAS encoder comprises a DirAC stage 1020 having a DirAC analysis block 1021 and a down-mix (DMX) block 1022. The signal output by block 1022 is encoded by an IVAS core encoder 1040 such as an AAC or EVS encoder, and the metadata generated by block 1021 is encoded using a DirAC metadata encoder 1030.

In the decoder, shown in Fig. 2, the transport channels are decoded by the core-decoder, while the DirAC metadata is first decoded before being conveyed, together with the decoded transport channels, to the DirAC synthesis. At this stage, different options can be considered. It can be requested to play the audio scene directly on any loudspeaker or headphone configuration, as is usually possible in a conventional DirAC system (MC in Fig. 2). The decoder can also deliver the individual objects as they were presented at the encoder side (Objects in Fig. 2). Alternatively, it can also be requested to render the scene to Ambisonics format for further manipulations, such as rotation, reflection or movement of the scene (FOA/HOA in Fig. 2), or for using an external renderer not defined in the original system.

The decoder of the DirAC spatial audio coding delivering different audio formats is illustrated in Fig. 2 and comprises an IVAS decoder 1045 and the subsequently connected decoder interface 1046. The IVAS decoder 1045 comprises an IVAS core-decoder 1060 that is configured to perform a decoding operation of content encoded by the IVAS core encoder 1040 of Fig. 1b. Furthermore, a DirAC metadata decoder 1050 is provided that delivers the decoding functionality for decoding content encoded by the DirAC metadata encoder 1030. A DirAC synthesizer 1070 receives data from blocks 1050 and 1060 and, with or without user interactivity, its output is input into a decoder interface 1046 that generates FOA/HOA data illustrated at 1083, multi-channel data (MC data) as illustrated in block 1082, or object data as illustrated in block 1080.

A conventional HOA synthesis using the DirAC paradigm is depicted in Fig. 3. An input signal, called the down-mix signal, is time-frequency analyzed by a frequency filter bank. The frequency filter bank 2000 can be a complex-valued filter bank like a complex-valued QMF or a block transform like an STFT. The HOA synthesis generates at the output an Ambisonics signal of order $H$ containing $(H + 1)^2$ components. Optionally, it can also output the Ambisonics signal rendered on a specific loudspeaker layout. In the following, we will detail how to obtain the $(H + 1)^2$ components from the down-mix signal, accompanied in some cases by input spatial parameters. The down-mix signal can be the original microphone signals or a mixture of the original signals depicting the original audio scene.
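The front end of this synthesis can be sketched as follows, assuming an STFT as the analysis filter bank (the text equally permits a complex-valued QMF). Signal shapes, the parameters, and the use of random data are illustrative only.

```python
# A minimal sketch of the time-frequency front end of Fig. 3, assuming an
# STFT as the analysis filter bank. Parameters are illustrative.
import numpy as np
from scipy.signal import stft

fs = 48_000
downmix = np.random.randn(4, fs)          # e.g. a 4-channel B-format down-mix, 1 s

# Analysis filter bank (block 2000): one complex spectrum per channel,
# yielding the time-frequency tiles (k, n) the synthesis operates on.
f, t, D = stft(downmix, fs=fs, nperseg=1024)   # D: (channels, freq bins k, frames n)

H = 3                                     # target Ambisonics order
num_components = (H + 1) ** 2             # (H+1)^2 = 16 output components
P = np.zeros((num_components,) + D.shape[1:], dtype=complex)
# ... per-tile DirAC synthesis fills P (sketched further below) ...
```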
For example, if the audio scene is captured by a sound field microphone, the down-mix signal can be the omnidirectional component of the scene (W), a stereo down-mix (L/R), or the first-order Ambisonics signal (FOA). For each time-frequency tile, a sound direction, also called direction of arrival (DOA), and a diffuseness factor are estimated by the direction estimator 2020 and by the diffuseness estimator 2010, respectively, provided the down-mix signal contains sufficient information for determining such DirAC parameters. This is the case, for example, if the down-mix signal is a first-order Ambisonics signal (FOA). Alternatively, or if the down-mix signal is not sufficient to determine such parameters, the parameters can be conveyed directly to the DirAC synthesis via an input bit-stream containing the spatial parameters. The bit-stream could consist, for example, of quantized and coded parameters received as side information in audio transmission applications. In this case, the parameters are derived outside the DirAC synthesis module from the original microphone signals or from the input audio formats given to the DirAC analysis module at the encoder side, as illustrated by switches 2030 and 2040.

The sound directions are used by a directional gains evaluator 2050 for evaluating, for each time-frequency tile of the plurality of time-frequency tiles, one or more sets of $(H + 1)^2$ directional gains $G_l^m(k, n)$, where $H$ is the order of the synthesized Ambisonics signal. The directional gains can be obtained by evaluating the spatial basis function for each estimated sound direction at the desired order (level) $l$ and mode $m$ of the Ambisonics signal to synthesize. The sound direction can be expressed, for example, in terms of a unit-norm vector $n(k, n)$ or in terms of an azimuth angle $\varphi(k, n)$ and/or elevation angle $\theta(k, n)$, which are related, for example, as:

$$n(k, n) = \begin{bmatrix} \cos\varphi(k, n)\cos\theta(k, n) \\ \sin\varphi(k, n)\cos\theta(k, n) \\ \sin\theta(k, n) \end{bmatrix}.$$

After estimating or obtaining the sound direction, a response of a spatial basis function of the desired order (level) $l$ and mode $m$ can be determined, for example, by considering real-valued spherical harmonics with SN3D normalization as spatial basis function:

$$Y_l^m(\varphi, \theta) = N_l^{|m|}\, P_l^{|m|}(\sin\theta) \begin{cases} \sin(|m|\varphi) & \text{if } m < 0 \\ \cos(|m|\varphi) & \text{if } m \ge 0 \end{cases}$$

with the ranges $0 \le l \le H$ and $-l \le m \le l$. $P_l^{|m|}$ are the Legendre functions and $N_l^{|m|}$ is a normalization term for both the Legendre functions and the trigonometric functions, which takes the following form for SN3D:

$$N_l^{|m|} = \sqrt{\frac{2 - \delta_m}{4\pi}\, \frac{(l - |m|)!}{(l + |m|)!}},$$

where the Kronecker delta $\delta_m$ is one for $m = 0$ and zero otherwise. The directional gains are then directly deduced for each time-frequency tile of indices $(k, n)$ as:

$$G_l^m(k, n) = Y_l^m\big(\varphi(k, n), \theta(k, n)\big).$$

The direct sound Ambisonics components are computed by deriving a reference signal $P_{\mathrm{ref}}$ from the down-mix signal and multiplying it by the directional gains and a factor function of the diffuseness $\Psi(k, n)$:

$$P_{\mathrm{dir},l}^m(k, n) = P_{\mathrm{ref}}(k, n)\, \sqrt{1 - \Psi(k, n)}\, G_l^m(k, n).$$

For example, the reference signal $P_{\mathrm{ref}}$ can be the omnidirectional component of the down-mix signal or a linear combination of the $K$ channels of the down-mix signal. The diffuse sound Ambisonics components can be modelled by using a response of a spatial basis function for sounds arriving from all possible directions. One example is to define the average response $D_l^m$ by considering the integral of the squared magnitude of the spatial basis function $Y_l^m(\varphi, \theta)$ over all possible angles $\varphi$ and $\theta$:

$$D_l^m = \frac{1}{4\pi} \int_0^{2\pi} \int_{-\pi/2}^{\pi/2} \left| Y_l^m(\varphi, \theta) \right|^2 \cos\theta \, \mathrm{d}\theta \, \mathrm{d}\varphi.$$

The diffuse sound Ambisonics components are computed from a signal $P_{\mathrm{diff},l}$ multiplied by the average response and a factor function of the diffuseness $\Psi(k, n)$:

$$P_{\mathrm{diff},l}^m(k, n) = P_{\mathrm{diff},l}(k, n)\, \sqrt{\Psi(k, n)}\, \sqrt{D_l^m}.$$

The signal $P_{\mathrm{diff},l}$ can be obtained by using different decorrelators applied to the reference signal $P_{\mathrm{ref}}$.
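The per-tile equations above can be sketched in Python as follows, using scipy's associated Legendre function for the SN3D spherical harmonics. The closed form used for the average diffuse response $D_l^m$ is derived from the normalization above and should be read as an assumption for the sketch, not as the patent's prescription; function names are illustrative.

```python
# A sketch of the per-tile synthesis equations above, assuming numpy/scipy.
# Symbols: psi = diffuseness Psi(k, n); az/el = azimuth phi(k, n) and
# elevation theta(k, n).
import numpy as np
from math import factorial
from scipy.special import lpmv

def sh_sn3d(l: int, m: int, az: float, el: float) -> float:
    """Real spherical harmonic Y_l^m with the SN3D normalization given in
    the specification; (-1)^|m| cancels lpmv's Condon-Shortley phase."""
    am = abs(m)
    norm = np.sqrt((2 - (m == 0)) / (4 * np.pi)
                   * factorial(l - am) / factorial(l + am))
    leg = (-1) ** am * lpmv(am, l, np.sin(el))   # P_l^{|m|}(sin(theta))
    trig = np.sin(am * az) if m < 0 else np.cos(am * az)
    return norm * leg * trig

def synthesize_tile(p_ref: complex, p_diff: complex, psi: float,
                    az: float, el: float, l: int, m: int) -> complex:
    """Direct + diffuse Ambisonics component for one time-frequency tile."""
    g = sh_sn3d(l, m, az, el)                  # directional gain G_l^m(k, n)
    # Closed form of the average response D_l^m under the normalization
    # above (derived here; an assumption for this sketch):
    d = 1.0 / (4 * np.pi * (2 * l + 1))
    p_dir = p_ref * np.sqrt(1.0 - psi) * g     # direct contribution
    p_dif = p_diff * np.sqrt(psi) * np.sqrt(d) # diffuse (decorrelated ref)
    return p_dir + p_dif
```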
Finally, the direct sound Ambisonics components and the diffuse sound Ambisonics components are combined 2060, for example via a summation operation, to obtain the final Ambisonics component $P_l^m$ of the desired order (level) $l$ and mode $m$ for the time-frequency tile $(k, n)$, i.e.,

$$P_l^m(k, n) = P_{\mathrm{dir},l}^m(k, n) + P_{\mathrm{diff},l}^m(k, n).$$

The obtained Ambisonics components may be transformed back into the time domain using an inverse filter bank 2080 or an inverse STFT, stored, transmitted, or used, for example, for spatial sound reproduction applications. Alternatively, a linear Ambisonics renderer 2070 can be applied for each frequency band for obtaining signals to be played on a specific loudspeaker layout or over headphones, before transforming the loudspeaker signals or the binaural signals to the time domain.

It should be noted that [5] also taught the possibility that diffuse sound components could only be synthesized up to an order $L$, where $L < H$. The resulting vector $y_H$ contains the synthesized coefficients of order $L < l \le H$, denoted by $Y_l^m$. The HOA synthesis normally depends on the diffuseness $\Psi$ (or a similar measure), which describes how diffuse the sound field is for the current time-frequency point. Normally, the coefficients in $y_H$ are only synthesized if the sound field becomes non-diffuse, whereas in diffuse situations the coefficients become zero. This prevents artifacts in diffuse situations, but also results in a loss of energy. Details on the HOA synthesis are explained later.

To compensate for the loss of energy in diffuse situations mentioned above, we apply an energy compensation to $b_L$ in the energy compensation block 650, 750. The resulting signal is denoted by $x_L$ and has the same maximum order $L$ as $b_L$. The energy compensation depends on the diffuseness (or a similar measure) and increases the energy of the coefficients in diffuse situations such that the loss of energy of the coefficients in $y_H$ is compensated. Details are explained later. In the combination block, the energy-compensated coefficients in $x_L$ are combined 430 with the synthesized coefficients in $y_H$ to obtain the output Ambisonics signal $z_H$ containing all $(H + 1)^2$ coefficients.

Subsequently, a HOA synthesis is explained as an embodiment. There exist several state-of-the-art approaches to synthesize the HOA coefficients in $y_H$, e.g., a covariance-based rendering or a direct rendering using Directional Audio Coding (DirAC). In the simplest case, the coefficients in $y_H$ are synthesized from the omnidirectional component $B_0^0$ in $b_L$ using

$$Y_l^m = B_0^0\, \sqrt{1 - \Psi}\, G_l^m(\varphi, \theta).$$

Here, $(\varphi, \theta)$ is the direction-of-arrival (DOA) of the sound and $G_l^m(\varphi, \theta)$ is the corresponding gain of the Ambisonics coefficient of order $l$ and mode $m$. Normally, $G_l^m($
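The energy-compensation idea can likewise be sketched. The gain rule below, which restores the diffuse energy share of the coefficients that were not synthesized above order L, is an illustrative assumption; the patent's exact compensation rule is the subject of the detailed description announced above.

```python
# A minimal sketch of the idea behind the energy compensation (blocks
# 650/750): when diffuse components are synthesized only up to order L
# while the output extends to order H, boost the low-order coefficients
# so the total diffuse energy is preserved. The gain formula below is an
# illustrative assumption, not the patent's exact rule.
import numpy as np

def energy_compensation_gain(psi: float, L: int, H: int) -> float:
    """Gain applied to the (L+1)^2 low-order coefficients for one tile."""
    k_low = (L + 1) ** 2           # coefficients that do receive diffuse energy
    k_all = (H + 1) ** 2           # coefficients a full-order synthesis would cover
    # Restore the diffuse energy share of the missing k_all - k_low coefficients:
    return np.sqrt(1.0 + psi * (k_all - k_low) / k_low)

b_L = np.ones(4, dtype=complex)    # placeholder low-order tile coefficients (L = 1)
x_L = energy_compensation_gain(psi=0.8, L=1, H=3) * b_L
```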

Documents

Application Documents

# Name Date
1 202137024114-IntimationOfGrant24-01-2024.pdf 2024-01-24
2 202137024114-PatentCertificate24-01-2024.pdf 2024-01-24
3 202137024114-Information under section 8(2) [17-01-2024(online)].pdf 2024-01-17
4 202137024114-FORM 3 [14-10-2023(online)].pdf 2023-10-14
5 202137024114-Information under section 8(2) [14-10-2023(online)].pdf 2023-10-14
6 202137024114-Information under section 8(2) [27-07-2023(online)].pdf 2023-07-27
7 202137024114-Information under section 8(2) [26-06-2023(online)].pdf 2023-06-26
8 202137024114-Information under section 8(2) [19-05-2023(online)].pdf 2023-05-19
9 202137024114-FORM 3 [13-04-2023(online)].pdf 2023-04-13
10 202137024114-Information under section 8(2) [13-04-2023(online)].pdf 2023-04-13
11 202137024114-CLAIMS [07-12-2022(online)].pdf 2022-12-07
12 202137024114-FER_SER_REPLY [07-12-2022(online)].pdf 2022-12-07
13 202137024114-FORM 3 [07-12-2022(online)].pdf 2022-12-07
14 202137024114-FORM 3 [12-10-2022(online)].pdf 2022-10-12
15 202137024114-Information under section 8(2) [12-10-2022(online)].pdf 2022-10-12
16 202137024114-FORM 4(ii) [23-08-2022(online)].pdf 2022-08-23
17 202137024114-FORM 3 [17-05-2022(online)].pdf 2022-05-17
18 202137024114-FER.pdf 2022-03-08
19 202137024114-Information under section 8(2) [23-10-2021(online)].pdf 2021-10-23
20 202137024114.pdf 2021-10-19
21 202137024114-FORM-26 [20-07-2021(online)].pdf 2021-07-20
22 202137024114-Proof of Right [16-07-2021(online)].pdf 2021-07-16
23 202137024114-FORM 18 [10-06-2021(online)].pdf 2021-06-10
24 202137024114-COMPLETE SPECIFICATION [31-05-2021(online)].pdf 2021-05-31
25 202137024114-DRAWINGS [31-05-2021(online)].pdf 2021-05-31
26 202137024114-FIGURE OF ABSTRACT [31-05-2021(online)].pdf 2021-05-31
27 202137024114-DECLARATION OF INVENTORSHIP (FORM 5) [31-05-2021(online)].pdf 2021-05-31
28 202137024114-FORM 1 [31-05-2021(online)].pdf 2021-05-31
29 202137024114-STATEMENT OF UNDERTAKING (FORM 3) [31-05-2021(online)].pdf 2021-05-31

Search Strategy

1 SearchHistory(4)E_08-03-2022.pdf

ERegister / Renewals

3rd: 26 Feb 2024 (from 06/12/2021 to 06/12/2022)
4th: 26 Feb 2024 (from 06/12/2022 to 06/12/2023)
5th: 26 Feb 2024 (from 06/12/2023 to 06/12/2024)
6th: 03 Dec 2024 (from 06/12/2024 to 06/12/2025)