
Apparatus, Method And Computer Program For Encoding An Audio Signal Or For Decoding An Encoded Audio Scene

Abstract: Disclosed are an apparatus for generating an encoded audio scene and an apparatus for decoding and/or processing an encoded audio scene, as well as related methods and non-transitory storage units storing instructions which, when executed by a processor, cause the processor to perform a related method. An apparatus (200) for processing an encoded audio scene (304), which may comprise, in a first frame (346), a first soundfield parameter representation (316) and an encoded audio signal (346), wherein a second frame (348) is an inactive frame, may comprise: an activity detector (2200) for detecting that the second frame (348) is the inactive frame; a synthetic signal synthesizer (210) for synthesizing a synthetic audio signal (228) for the second frame (308) using the parametric description (348) for the second frame (308); an audio decoder (230) for decoding the encoded audio signal (346) for the first frame (306); and a spatial renderer (240) for spatially rendering the audio signal (202) for the first frame (306) using the first soundfield parameter representation (316) and using the synthetic audio signal (228) for the second frame (308), or a transcoder for generating a meta data assisted output format comprising the audio signal (346) for the first frame (306), the first soundfield parameter representation (316) for the first frame (306), the synthetic audio signal (228) for the second frame (308), and a second soundfield parameter representation (318) for the second frame (308).


Patent Information

Application #
Filing Date
23 June 2025
Publication Number
38/2025
Publication Type
INA
Invention Field
ELECTRONICS
Status
Parent Application

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
Hansastraße 27c 80686 München Germany

Inventors

1. FUCHS, Guillaume
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen Germany
2. TAMARAPU, Archit
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen Germany
3. EICHENSEER, Andrea
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen Germany
4. KORSE, Srikanth
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen Germany
5. DÖHLA, Stefan
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen Germany
6. MULTRUS, Markus
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen Germany

Specification

1. Apparatus (200) for processing an encoded audio scene (304), the encoded audio scene (304) comprising, in a first frame (346) which is an active frame, a first soundfield parameter representation (316) and an encoded audio signal (346), and in a second frame (348) which is an inactive frame, a second soundfield parameter representation (318) and a parametric description (348) for the second frame, the apparatus being configured to derive one or more soundfield parameters (219, 318) for the second frame from the second soundfield parameter representation (318), the apparatus comprising:
an activity detector (2200) configured for detecting that the second frame (348) is the inactive frame;
a synthetic signal synthesizer (210) configured for synthesizing a synthetic audio signal (228) for the second frame (308) using the parametric description (348) for the second frame (308);
an audio decoder (230) configured for decoding the encoded audio signal (346) for the first frame (306); and
a spatial renderer (220) configured for spatially rendering the audio signal (202) for the first frame (306) using the first soundfield parameter representation (316) and using the synthetic audio signal (228) and the one or more soundfield parameters (219, 318) for the second frame (308).
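The frame-by-frame behaviour of claim 1 can be sketched in code. The following is a minimal, illustrative Python sketch only: the function names, the frame dictionary layout, the frame length of 960 samples (20 ms at 48 kHz), and the trivial "decoder" stand-in are all assumptions for illustration and are not taken from the patent.

```python
import random

FRAME_LEN = 960  # assumed frame length: 20 ms at 48 kHz (illustrative)

def synthesize_comfort_noise(level, rng):
    # Stand-in for the synthetic signal synthesizer (210): white noise
    # scaled by a level carried in the parametric description.
    return [level * rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]

def process_frame(frame, rng):
    """One pass of the claimed apparatus for a single frame (sketch)."""
    if frame["active"]:                       # activity detector (2200)
        # Active frame: the audio decoder (230) reconstructs the waveform;
        # here the "encoded" payload is the waveform itself, to stay
        # self-contained.
        audio = list(frame["encoded_audio"])
    else:
        # Inactive frame: only the parametric description was transmitted,
        # so a synthetic (comfort-noise) signal is generated instead.
        audio = synthesize_comfort_noise(frame["noise_level"], rng)
    # The spatial renderer (220) would now apply frame["soundfield_params"]
    # (directions, diffuseness, ...) to produce the output channels.
    return audio
```

The point of the sketch is the branch: the waveform path runs only for active frames, while inactive frames are reconstructed purely from parameters.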
2. The apparatus of claim 1, wherein the first or the second soundfield parameter representation (314, 316) comprises one or more direction parameters indicating a direction of sound with respect to a listener position in the first frame (306), or one or more diffuseness parameters indicating a portion of a diffuse sound with respect to a direct sound in the first frame (306), or one or more energy ratio parameters indicating an energy ratio of a direct sound and a diffuse sound in the first frame (306), or an inter-channel/surround coherence parameter in the first frame (306).
3. The apparatus of any of the preceding claims, wherein the first frame (306) or the second frame (308) has frequency bin(s), representing individual sound source(s), wherein, for each frequency bin, at least one soundfield parameter is determined, the soundfield parameter comprising at least one of a direction parameter, a direction of arrival parameter, a diffuseness parameter, an energy ratio parameter or any parameter representing a characteristic of the soundfield represented by the first frame (306) of the audio signal with respect to a listener position.
4. Apparatus of any one of the preceding claims, comprising a parameter processor (275, 1075) for deriving the one or more soundfield parameters (219, 318) for the second frame (308),
wherein the parameter processor (275, 1075) is configured to store the soundfield parameter representation for the first frame (306) and to synthesize one or more soundfield parameters for the second frame (308) using the stored first soundfield parameter representation (316) for the first frame (306), wherein the second frame (308) follows the first frame (306) in time, or
wherein the parameter processor (275, 1075) is configured to store one or more soundfield parameter representations (318) for several frames occurring in time before the second frame (308) or occurring in time subsequent to the second frame (308) to extrapolate or interpolate using the at least two soundfield parameter representations of the one or more soundfield parameter representations for several frames to determine the one or more soundfield parameters for the second frame (308), and
wherein the spatial renderer is configured to use, for the rendering of the synthetic audio signal (228) for the second frame (308), the one or more soundfield parameters for the second frame (308).
5. Apparatus of any one of the preceding claims, wherein the parameter processor (275) is configured to perform a dithering when deriving the directions included in the one or more soundfield parameters for the second frame.
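Claims 4 and 5 describe deriving directions for an inactive frame by holding, interpolating, or extrapolating stored parameters, optionally with dithering. The following Python sketch is purely illustrative: the azimuth representation in degrees, the function names, and the dither spread are assumptions, not details from the patent.

```python
import random

def wrap_deg(a):
    # Wrap an azimuth into the interval [-180, 180) degrees.
    return (a + 180.0) % 360.0 - 180.0

def interpolate_azimuth(az_prev, az_next, t):
    """Claim-4-style derivation (sketch): a direction for an inactive
    frame lying at fraction t (0..1) between two stored frames,
    interpolated the short way around the circle."""
    delta = wrap_deg(az_next - az_prev)
    return wrap_deg(az_prev + t * delta)

def dither_azimuth(az, rng, sigma_deg=3.0):
    """Claim-5-style dithering (sketch): a small random perturbation of a
    derived direction, so held parameters do not produce an unnaturally
    frozen spatial image. sigma_deg is an assumed, illustrative value."""
    return wrap_deg(az + rng.gauss(0.0, sigma_deg))
```

The simplest variant the claim mentions, repeating the stored first-frame parameters, corresponds to `interpolate_azimuth(az_prev, az_prev, t)`; extrapolation beyond the last stored frame corresponds to `t > 1`.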
6. Apparatus of any one of the preceding claims, wherein the encoded audio scene (304) comprises one or more transport channels (326) for the first frame (306),
wherein the synthetic signal synthesizer (210) is configured to generate one or more transport channels (228) for the second frame (308) as the synthetic audio signal (228), and
wherein the spatial renderer (220) is configured to spatially render the one or more transport channels (228) for the second frame (308).
7. Apparatus of any one of the preceding claims, wherein the synthetic signal synthesizer (210) is configured to generate, for the second frame (308), a plurality of synthetic component audio signals for individual components related to an audio output format of the spatial renderer as the synthetic audio signal (228).
8. Apparatus of claim 7, wherein the synthetic signal synthesizer (210) is configured to generate, at least for each one of a subset of at least two individual components (228a, 228b) related to the audio output format (202), an individual synthetic component audio signal,
wherein a first individual synthetic component audio signal (228a) is decorrelated from a second individual synthetic component audio signal (228b), and
wherein the spatial renderer (220) is configured to render a component of the audio output format (202) using a combination of the first individual synthetic component audio signal (228a) and the second individual synthetic component audio signal (228b).
9. Apparatus of any one of the preceding claims, wherein the synthetic signal synthesizer (210, 710, 810) is a comfort noise generator.
10. Apparatus of one of claims 8-9, wherein the synthetic signal synthesizer (210) comprises a noise generator and the first individual synthetic component audio signal is generated by a first sampling of the noise generator and the second individual synthetic component audio signal is generated by a second sampling of the noise generator, wherein the second sampling is different from the first sampling.
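The decorrelation mechanism of claims 8 and 10, one noise generator sampled independently per component, can be made concrete with a short sketch. This is an assumption-laden illustration (function names, frame length, and the Pearson-correlation check are mine, not the patent's): because each component draws its own samples from the generator, the resulting comfort-noise signals are mutually uncorrelated.

```python
import random
import statistics

FRAME_LEN = 960  # assumed frame length (illustrative)

def noise_frames(level, seed, n_components=2):
    """Claim-10-style sketch: a single noise generator, sampled
    independently for each component, yields decorrelated
    comfort-noise component signals."""
    rng = random.Random(seed)
    return [[level * rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]
            for _ in range(n_components)]

def correlation(x, y):
    # Pearson correlation, used here only to verify decorrelation.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den
```

For independent 960-sample realizations the measured correlation is close to zero, which is exactly the property the spatial renderer relies on when it mixes the components into a diffuse field.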
11. Apparatus of any one of the preceding claims, wherein the spatial renderer (220) is configured to operate
in a first mode for the first frame (306) using a mixing of a direct signal and a diffuse signal generated by a decorrelator (730) from the direct signal under a control of the first soundfield parameter representation (316), and
in a second mode for the second frame (308) using a mixing of a first synthetic component signal and a second synthetic component signal, wherein the first and the second synthetic component signals are generated by the synthetic signal synthesizer (210) by different realizations of a noise process or a pseudo noise process.
12. Apparatus of claim 11, wherein the spatial renderer (220) is configured to control the mixing (740) in the second mode by a diffuseness parameter, an energy distribution parameter, or a coherence parameter derived for the second frame (308), which are, or are obtained from, the one or more soundfield parameters for the second frame.
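The parameter-controlled mixing of claims 11 and 12 can be illustrated with a standard energy-preserving blend. This is a generic sketch, not the patent's mixing rule: the function name and the specific square-root gain law are assumptions, chosen so that total energy is preserved for uncorrelated inputs.

```python
import math

def mix_with_diffuseness(direct, diffuse, psi):
    """Illustrative mixing of a direct and a diffuse (decorrelated)
    signal, controlled by a diffuseness parameter psi in [0, 1].
    The sqrt gains preserve energy when the inputs are uncorrelated."""
    g_dir = math.sqrt(1.0 - psi)   # gain for the direct part
    g_dif = math.sqrt(psi)         # gain for the diffuse part
    return [g_dir * d + g_dif * q for d, q in zip(direct, diffuse)]
```

With `psi = 0` the output is purely direct, with `psi = 1` purely diffuse; in the claimed second mode the two inputs would be the different noise realizations from the synthetic signal synthesizer.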
13. Apparatus of any one of the preceding claims,
wherein the synthetic signal synthesizer (210) is configured to generate a synthetic audio signal (228) for the first frame (306) using the parametric description (348) for the second frame (308), and
wherein the spatial renderer is configured to perform a weighted combination of the audio signal for the first frame (306) and the synthetic audio signal (228) for the first frame (306) before or after the spatial rendering, wherein, in the weighted combination, an intensity of the synthetic audio signal (228) for the first frame (306) is reduced with respect to an intensity of the synthetic audio signal (228) for the second frame (308).
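The weighted combination of claim 13 amounts to a crossfade that lets a reduced amount of comfort noise already sound during the active frame, smoothing the transition into the inactive frame. The sketch below is illustrative only; the function name and the linear weighting are assumptions.

```python
def crossfade(decoded, comfort, w_comfort):
    """Claim-13-style weighted combination (sketch): mix the decoded
    active-frame signal with a comfort-noise signal, where w_comfort
    (0 <= w_comfort < 1) is kept below the full weight the comfort
    noise receives in the inactive frame."""
    assert 0.0 <= w_comfort < 1.0
    return [(1.0 - w_comfort) * d + w_comfort * c
            for d, c in zip(decoded, comfort)]
```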
14. Apparatus of any one of the preceding claims,
wherein a parameter processor (275, 1075) is configured to determine, for the second inactive frame (308), a surround coherence being defined as a ratio of diffuse energy being coherent in a soundfield represented by the second frame (308), wherein the spatial renderer is configured for re-distributing an energy between direct and diffuse signals in the second frame (308) based on the surround coherence, wherein an energy of surround coherent components is removed from the diffuse energy to be re-distributed to directional components, and wherein the directional components are panned in a reproduction space.
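The energy re-distribution of claim 14 is a small bookkeeping step: the coherent fraction of the diffuse energy moves to the directional part, the total energy staying unchanged. The following is a minimal illustrative sketch with assumed names, not the patent's formula.

```python
def redistribute(direct_energy, diffuse_energy, surround_coherence):
    """Claim-14-style sketch: the fraction of the diffuse energy that is
    coherent across the surround (surround_coherence in [0, 1]) is
    removed from the diffuse part and re-assigned to the directional
    part, which the renderer would then pan in the reproduction space."""
    coherent = surround_coherence * diffuse_energy
    return direct_energy + coherent, diffuse_energy - coherent
```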
15. Apparatus of any one of the preceding claims, further comprising an output interface for converting an audio output format generated by the spatial renderer into a transcoded output format, such as an output format comprising a number of output channels dedicated for loudspeakers to be placed at predefined positions, or a transcoded output format comprising first order Ambisonic, FOA, or higher order Ambisonic, HOA, data.
16. Method of processing an encoded audio scene comprising, in a first frame (306), a first soundfield parameter representation (316) and an encoded audio signal, wherein a second frame (308) is an inactive frame, the method comprising:
detecting that the second frame (308) is the inactive frame;
synthesizing a synthetic audio signal (228) for the second frame (308) using the parametric description (348) for the second frame (308);
decoding the encoded audio signal for the first frame (306); and
spatially rendering the audio signal for the first frame (306) using the first soundfield parameter representation (316) and using the synthetic audio signal (228) for the second frame (308), or generating a meta data assisted output format comprising the audio signal for the first frame (306), the first soundfield parameter representation (316) for the first frame (306), the synthetic audio signal (228) for the second frame (308), and a second soundfield parameter representation (318) for the second frame (308).

Documents

Application Documents

# Name Date
1 202538059955-STATEMENT OF UNDERTAKING (FORM 3) [23-06-2025(online)].pdf 2025-06-23
2 202538059955-REQUEST FOR EXAMINATION (FORM-18) [23-06-2025(online)].pdf 2025-06-23
3 202538059955-PROOF OF RIGHT [23-06-2025(online)].pdf 2025-06-23
4 202538059955-FORM 18 [23-06-2025(online)].pdf 2025-06-23
5 202538059955-FORM 1 [23-06-2025(online)].pdf 2025-06-23
6 202538059955-FIGURE OF ABSTRACT [23-06-2025(online)].pdf 2025-06-23
7 202538059955-DRAWINGS [23-06-2025(online)].pdf 2025-06-23
8 202538059955-DECLARATION OF INVENTORSHIP (FORM 5) [23-06-2025(online)].pdf 2025-06-23
9 202538059955-COMPLETE SPECIFICATION [23-06-2025(online)].pdf 2025-06-23
10 202538059955-FORM-26 [22-08-2025(online)].pdf 2025-08-22
11 202538059955-FORM 3 [13-11-2025(online)].pdf 2025-11-13