
Apparatus And Method For Encoding A Spatial Audio Representation Or Apparatus And Method For Decoding An Encoded Audio Signal Using Transport Metadata And Related Computer Programs

Abstract: An apparatus for encoding a spatial audio representation representing an audio scene to obtain an encoded audio signal, comprises: a transport representation generator (600) for generating a transport representation (611) from the spatial audio representation, and for generating transport metadata (610) related to the generation of the transport representation (611) or indicating one or more directional properties of the transport representation (611); and an output interface (640) for generating the encoded audio signal, the encoded audio signal comprising information on the transport representation (611), and information on the transport metadata (610).


Patent Information

Application #
Filing Date
20 July 2021
Publication Number
15/2022
Publication Type
INA
Invention Field
ELECTRONICS
Status
Email
info@krishnaandsaurastri.com
Parent Application
Patent Number
Legal Status
Grant Date
2024-07-19
Renewal Date

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
Hansastraße 27c 80686 München

Inventors

1. KÜCH, Fabian
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen
2. THIERGART, Oliver
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen
3. FUCHS, Guillaume
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen
4. DÖHLA, Stefan
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen
5. BOUTHÉON, Alexandre
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen
6. HERRE, Jürgen
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen
7. BAYER, Stefan
c/o Fraunhofer-Institut für Integrierte Schaltungen IIS Am Wolfsmantel 33 91058 Erlangen

Specification

(EXTRACTED FROM WIPO) Apparatus and Method for Encoding a Spatial Audio Representation or Apparatus and Method for Decoding an Encoded Audio Signal Using Transport Metadata and Related Computer Programs

Embodiments of the invention relate to transport channel or downmix signaling for directional audio coding. The Directional Audio Coding (DirAC) technique [Pulkki07] is an efficient approach to the analysis and reproduction of spatial sound. DirAC uses a perceptually motivated representation of the sound field based on spatial parameters, i.e., the direction of arrival (DOA) and diffuseness measured per frequency band. It is built upon the assumption that at one time instant and in one critical band, the spatial resolution of the auditory system is limited to decoding one cue for direction and another for inter-aural coherence. The spatial sound is then represented in the frequency domain by cross-fading two streams: a non-directional diffuse stream and a directional non-diffuse stream. DirAC was originally intended for recorded B-format sound but can also be extended to microphone signals matching a specific loudspeaker setup like 5.1 [2] or any configuration of microphone arrays [5]. In the latter case, more flexibility can be achieved by recording the signals not for a specific loudspeaker setup, but instead recording the signals of an intermediate format. Such an intermediate format, which is well-established in practice, is represented by (higher-order) Ambisonics [3]. From an Ambisonics signal, one can generate the signals of every desired loudspeaker setup, including binaural signals for headphone reproduction. This requires a specific renderer which is applied to the Ambisonics signal, using either a linear Ambisonics renderer [3] or a parametric renderer such as Directional Audio Coding (DirAC).
An Ambisonics signal can be represented as a multi-channel signal where each channel (referred to as Ambisonics component) is equivalent to the coefficient of a so-called spatial basis function. With a weighted sum of these spatial basis functions (with the weights corresponding to the coefficients) one can recreate the original sound field in the recording location [3]. Therefore, the spatial basis function coefficients (i.e., the Ambisonics components) represent a compact description of the sound field in the recording location. There exist different types of spatial basis functions, for example spherical harmonics (SHs) [3] or cylindrical harmonics (CHs) [3]. CHs can be used when describing the sound field in the 2D space (for example for 2D sound reproduction) whereas SHs can be used to describe the sound field in the 2D and 3D space (for example for 2D and 3D sound reproduction). As an example, an audio signal f(t) which arrives from a certain direction (φ, θ) results in a spatial audio signal f_l,m(φ, θ, t), which can be represented by expanding the signal over the spatial basis functions of order l and mode m:

f_l,m(φ, θ, t) = Y_l,m(φ, θ) · f(t),

with the ranges 0 ≤ l and −l ≤ m ≤ l. The real-valued spherical harmonics Y_l,m can be written as

Y_l,m(φ, θ) = N_l^|m| · P_l^|m|(sin θ) · { cos(mφ) for m ≥ 0; sin(|m|φ) for m < 0 },

where P_l^|m| are the Legendre functions and N_l^|m| is a normalization term for both the Legendre functions and the trigonometric functions, which takes the following form for SN3D:

N_l^|m| = sqrt( (2 − δ_m) · (l − |m|)! / (l + |m|)! ),

where the Kronecker delta δ_m is one for m = 0 and zero otherwise. The directional gains are then directly deduced for each time-frequency tile of indices (k, n) as

G^m(k, n) = c_m + (1 − c_m) · cos(Θ_m(k, n)),

where Θ_m(k, n) is the angle between the direction of arrival and the look direction (φ_m, θ_m); φ_m and θ_m are the desired azimuth angle and elevation angle of the look direction of the generated m-th directional microphone signal. For example, for c_m = 0.5, a directional microphone with cardioid directivity is achieved, c_m = 1 corresponds to an omnidirectional characteristic, and c_m = 0 corresponds to a dipole characteristic. In other words, the parameter c_m describes the general shape of the first-order directivity pattern.
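As a plain illustration of this first-order gain law (a sketch, not part of the claimed apparatus; the function names and the vector form of the angle computation are assumptions), the pattern c_m + (1 − c_m)·cos Θ can be written as:

```python
import numpy as np

def directional_gain(doa_az, doa_el, look_az, look_el, c_m):
    """First-order directional gain c_m + (1 - c_m) * cos(angle between DOA and look direction).

    c_m = 1.0 -> omnidirectional, c_m = 0.5 -> cardioid, c_m = 0.0 -> dipole.
    Angles in radians; azimuth in the horizontal plane, elevation measured from it.
    """
    def unit(az, el):
        # Unit vector of a direction given as (azimuth, elevation).
        return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

    cos_theta = float(np.dot(unit(doa_az, doa_el), unit(look_az, look_el)))
    return c_m + (1.0 - c_m) * cos_theta

# A cardioid looking to the front: gain 1 for frontal sound, gain 0 for sound from behind.
front = directional_gain(0.0, 0.0, 0.0, 0.0, 0.5)
back = directional_gain(np.pi, 0.0, 0.0, 0.0, 0.5)
```

Evaluating the gain on a per-tile DOA estimate (k, n) then yields exactly the G^m(k, n) of the text.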
The weights for the linear combination, e.g., a_m,W, a_m,X, a_m,Y, and a_m,Z, or the corresponding parameters c_m, φ_m, and θ_m, describe the directivity patterns of the corresponding directional microphone signals. This information is represented by the down-mix parameters in the encoder in Fig. 3 and is transmitted to the decoder as part of the metadata. Different encoding strategies can be used to efficiently represent the down-mix parameters in the bitstream, including quantization of the directional information or referring to a table entry by an index, where the table includes all relevant parameters. In some embodiments it is already sufficient or more efficient to use only a limited number of presets for the look directions φ_m and θ_m as well as for the shape parameter c_m. This obviously corresponds to using a limited number of presets for the weights a_m,W, a_m,X, a_m,Y, and a_m,Z, too. For example, the shape parameters can be limited to represent only three different directivity patterns: omnidirectional, cardioid, and dipole characteristic. The number of possible look directions φ_m and θ_m can be limited such that they only represent the cases left, right, front, back, up, and down. In another, even simpler embodiment, the shape parameter is kept fixed and always corresponds to a cardioid pattern, or the shape parameter is not defined at all. The down-mix parameters associated with the look direction are used to signal whether a pair of down-mix channels corresponds to a left/right or a front/back channel pair configuration, such that the rendering process at the decoder can use the optimum down-mix channel as reference signal for rendering a certain loudspeaker channel located in the left, right, or frontal hemisphere. In the practical application, the parameter c_m can be defined, e.g., manually (typically c_m = 0.5). The look directions φ_m and θ_m can likewise be defined, e.g., manually or based on presets.
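A preset scheme of this kind can be sketched as a small lookup from (shape, look direction) to the weights (a_W, a_X, a_Y, a_Z). The sketch assumes SN3D-normalized FOA components, i.e., a plane wave from unit direction n yields W = 1 and (X, Y, Z) = n; the table contents and function names are illustrative, not taken from the specification:

```python
# Hypothetical preset tables: three shape presets and six look-direction presets,
# as suggested in the text. Assumes SN3D-normalized FOA signals: a plane wave
# from unit direction n gives W = 1 and (X, Y, Z) = n.
SHAPES = {"omni": 1.0, "cardioid": 0.5, "dipole": 0.0}
LOOK_DIRS = {
    "front": (1.0, 0.0, 0.0), "back": (-1.0, 0.0, 0.0),
    "left": (0.0, 1.0, 0.0), "right": (0.0, -1.0, 0.0),
    "up": (0.0, 0.0, 1.0), "down": (0.0, 0.0, -1.0),
}

def downmix_weights(shape, look):
    """Map a (shape, look-direction) preset to weights (a_W, a_X, a_Y, a_Z)."""
    c = SHAPES[shape]
    nx, ny, nz = LOOK_DIRS[look]
    return (c, (1.0 - c) * nx, (1.0 - c) * ny, (1.0 - c) * nz)

def virtual_mic(w, x, y, z, weights):
    """Directional down-mix sample as a linear combination of FOA samples."""
    a_w, a_x, a_y, a_z = weights
    return a_w * w + a_x * x + a_y * y + a_z * z
```

Transmitting only the two preset indices (shape, look direction) instead of four weights is one way the text's "table entry by an index" strategy keeps the metadata bit rate small.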
Additionally, the down-mix metadata may comprise information on the directivity pattern of the selected microphones, e.g., by using the first-order parameter c_m described before. On the decoder side (Fig. 4), the down-mix metadata is used in the "spatial audio synthesis" block to obtain optimum rendering quality. For example, for loudspeaker output (MC output), when the down-mix metadata indicates that two omnidirectional microphones at two specific positions were transmitted as down-mix signals, the reference signal P_ref,j(k, n), from which the loudspeaker signal is generated as explained before, can be selected to correspond to the down-mix signal that has the smallest distance to the j-th loudspeaker position. Similarly, if the down-mix metadata indicates that two directional microphones with look directions {φ_m, θ_m} were transmitted, P_ref,j(k, n) can be selected to correspond to the down-mix signal with the closest look direction towards the loudspeaker position. Alternatively, a linear combination of the transmitted coincident directional down-mix signals can be performed, as explained in the second embodiment. When generating FOA/HOA output at the decoder, a single down-mix signal may be selected (at will) for generating the direct sound for all FOA/HOA components if the down-mix metadata indicates that spaced omnidirectional microphones have been transmitted. In fact, each omnidirectional microphone contains the same information on the direct sound to be reproduced due to the omnidirectional characteristic. However, for generating the diffuse sound reference signals P_ref,j, one can consider all transmitted omnidirectional down-mix signals. In fact, if the sound field is diffuse, the spaced omnidirectional down-mix signals will be partially decorrelated, such that less decorrelation is required to generate mutually uncorrelated reference signals P_ref,j.
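The two selection rules just described (smallest distance for spaced omnidirectional down-mixes, closest look direction for coincident directional down-mixes) reduce to simple argmin/argmax searches. A sketch, with hypothetical helper names:

```python
import math

def closest_by_distance(mic_positions, speaker_position):
    """Index of the down-mix signal whose microphone position is closest to the
    j-th loudspeaker position (spaced omnidirectional case)."""
    return min(range(len(mic_positions)),
               key=lambda i: math.dist(mic_positions[i], speaker_position))

def closest_by_look_direction(look_dirs, speaker_dir):
    """Index of the down-mix signal whose look direction (unit vector) points
    most nearly towards the loudspeaker direction (coincident directional case)."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    return max(range(len(look_dirs)), key=lambda i: dot(look_dirs[i], speaker_dir))
```

The chosen index then designates the down-mix channel used as P_ref,j(k, n) for that loudspeaker. (math.dist requires Python 3.8+.)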
The mutually uncorrelated reference signals can be generated from the transmitted down-mix audio signals by using, e.g., the covariance-based rendering approach proposed in [Vilkamo13]. It is well-known that the correlation between the signals of two microphones in a diffuse sound field strongly depends on the distance between the microphones: the larger the distance between the microphones, the less the recorded signals in a diffuse sound field are correlated [Laitinen11]. The information related to the microphone distance included in the down-mix parameters can be used at the decoder to determine by how much the down-mix channels have to be synthetically decorrelated to be suitable for rendering diffuse sound components. In case the down-mix signals are already sufficiently decorrelated due to sufficiently large microphone spacings, artificial decorrelation may even be discarded and any decorrelation-related artifacts can be avoided. When the down-mix metadata indicates that, e.g., coincident directional microphone signals have been transmitted as down-mix signals, then the reference signals P_ref,j(k, n) for FOA/HOA output can be generated as explained in the second embodiment. Note that instead of selecting a subset of microphones as down-mix audio signals in the encoder, one could select all available microphone input signals (for example two or more) as down-mix audio signals. In this case, the down-mix metadata describes the entire microphone array configuration, e.g., in terms of Cartesian microphone positions, microphone look directions φ_m and θ_m in polar coordinates, or microphone directivities in terms of first-order parameters c_m. In a second example, the down-mix audio signals are generated in the encoder in the "down-mix generation" block using a linear combination of the input microphone signals, e.g., using spatial filtering (beamforming).
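The distance-dependent decorrelation decision above can be made concrete with the standard diffuse-field coherence of two omnidirectional microphones, sin(kd)/(kd) with k = 2πf/c; this formula and the threshold value are assumptions used for illustration, not taken from the specification:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def diffuse_field_coherence(distance_m, freq_hz):
    """Diffuse-field coherence of two omni microphones at spacing d:
    sin(kd) / (kd) with k = 2*pi*f/c (a standard textbook result, assumed here)."""
    kd = 2.0 * math.pi * freq_hz * distance_m / SPEED_OF_SOUND
    return 1.0 if kd == 0.0 else math.sin(kd) / kd

def needs_decorrelation(distance_m, freq_hz, threshold=0.3):
    """Illustrative decoder-side rule: apply artificial decorrelation only when
    the transmitted microphone spacing leaves the channels noticeably coherent."""
    return abs(diffuse_field_coherence(distance_m, freq_hz)) > threshold
```

With a 1 cm spacing the channels remain almost fully coherent at mid frequencies, so synthetic decorrelation is needed; with a 1 m spacing the coherence at 2 kHz is already negligible and the decorrelators (and their artifacts) can be skipped, exactly as the text argues.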
In this case, the down-mix signals D_m(k, n) can be computed as

D_m(k, n) = w_m^H · x(k, n).

Here, x(k, n) is a vector containing all input microphone signals and w_m are the weights for the linear combination, i.e., the weights of the spatial filter or beamformer, for the m-th audio down-mix signal. There are various ways to compute spatial filters or beamformers in an optimal way [Veen88]. In many cases, a look direction {φ_m, θ_m} is defined, towards which the beamformer is directed. The beamformer weights can then be computed, e.g., as a delay-and-sum beamformer or an MVDR beamformer [Veen88]. In this embodiment, the beamformer look direction {φ_m, θ_m} is defined for each audio down-mix signal. This can be done manually (e.g., based on presets) or automatically in the same ways as described in the second embodiment. The look direction {φ_m, θ_m} of the beamformer signals, which represent the different audio down-mix signals, then represents the down-mix metadata that is transmitted to the decoder in Fig. 4. Another example is especially suitable when using loudspeaker output at the decoder (MC output). In this case, that down-mix signal D_m(k, n) is used as P_ref,j(k, n) for which the beamformer look direction is closest to the loudspeaker direction. The required beamformer look direction is described by the down-mix metadata. Note that in all examples, the transport channel configuration, i.e., the down-mix parameters, can be adjusted time-frequency dependent, e.g., based on the spatial parameters, similarly as in the previous embodiments. Subsequently, further embodiments of the present invention or the embodiments already described before are discussed with respect to the same or additional or further aspects. Preferably, the transport representation generator 600 of Fig. 6 comprises one or several of the features illustrated in Fig. 8a. Particularly, an energy location determiner 606 is provided that controls a block 602.
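The delay-and-sum option mentioned above for D_m(k, n) = w_m^H x(k, n) can be sketched for a single narrowband tile as follows; this is a generic textbook delay-and-sum beamformer under far-field plane-wave assumptions, not the patented method itself:

```python
import numpy as np

def delay_and_sum_weights(mic_positions, look_dir, freq_hz, c=343.0):
    """Narrowband delay-and-sum weights w_m for a unit look-direction vector n.

    The weights equal the plane-wave steering vector divided by the number of
    microphones, so w^H x sums the microphones in phase for sound arriving
    from the look direction (far-field assumption)."""
    k = 2.0 * np.pi * freq_hz / c
    r = np.asarray(mic_positions, dtype=float)   # (M, 3) microphone positions
    n = np.asarray(look_dir, dtype=float)        # unit vector of {phi_m, theta_m}
    delays = r @ n                               # projected path-length differences
    steering = np.exp(1j * k * delays)
    return steering / len(r)

def apply_beamformer(w, x):
    """One time-frequency tile: D_m(k, n) = w^H x(k, n) (np.vdot conjugates w)."""
    return np.vdot(w, x)
```

For a plane wave from the look direction, x equals the steering vector and the output is exactly 1, i.e., the beamformer passes its look direction with unit gain.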
The block 602 may comprise a selector for selecting from Ambisonics coefficient signals when the input is an FOA or HOA signal. Alternatively, or additionally, the energy location determiner 606 controls a combiner for combining Ambisonics coefficient signals. Additionally, or alternatively, a selection from a multichannel representation or from microphone signals is done. In this case, the input has microphone signals or a multichannel representation rather than FOA or HOA data. Additionally or alternatively, a channel combination or a combination of microphone signals is performed as indicated at 602 in Fig. 8a. For the lower two alternatives, the multichannel representation or microphone signals are input. The transport data generated by one or several of the blocks 602 are input into the transport metadata generator 605 included in the transport representation generator 600 of Fig. 6 in order to generate the (encoded) transport metadata 610. Any one of the blocks 602 generates the preferably non-encoded transport representation 614 that is then further encoded by a core encoder 603 such as the one illustrated in Fig. 3 or Fig. 5. It is outlined that an actual implementation of the transport representation generator 600 may comprise only a single one of the blocks 602 in Fig. 8a or two or more of the blocks illustrated in Fig. 8a. In the latter case, the transport metadata generator 605 is configured to additionally include a further transport metadata item into the transport metadata 610 that indicates for which (time and/or frequency) portion of the spatial audio representation any one of the alternatives indicated at item 602 has been taken. Thus, Fig. 8a illustrates a situation where only one of the alternatives 602 is active or where two or more are active and a signal-dependent switch can be performed among the different alternatives for the transport representation generation or downmixing and the corresponding transport metadata. Fig.
8b illustrates a table of different transport metadata alternatives that can be generated by the transport representation generator 600 of Fig. 6 and that can be used by the spatial audio synthesizer of Fig. 7. The transport metadata alternatives comprise a selection information indicating which subset of a set of audio input data components has been selected as the transport representation. An example is that only two or three out of, for example, four FOA components have been selected. Alternatively, the selection information may indicate which microphone signals of a microphone signal array have been selected. A further alternative of Fig. 8b is a combination information indicating how certain audio representation input components or signals have been combined. A certain combination information may refer to weights for a linear combination or to which channels have been combined, for example with equal or predefined weights. A further information refers to a sector or hemisphere information associated with a certain transport signal. A sector or hemisphere information may refer to the left sector or the right sector or the front sector or the rear sector with respect to a listening position or, alternatively, a smaller sector than a 180° sector. Further embodiments relate to the transport metadata indicating a shape parameter referring to the shape of, for example, a certain physical or virtual microphone directivity generating the corresponding transport representation signal. The shape parameter may indicate an omnidirectional microphone signal shape or a cardioid microphone signal shape or a dipole microphone signal shape or any other related shape.
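The metadata alternatives of Fig. 8b can be pictured as one record with optional fields, only some of which are populated for a given transport signal. The container below is purely illustrative; all field and enum names are assumptions, not taken from the specification:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Sector(Enum):
    # Sector/hemisphere information relative to the listening position.
    LEFT = 0
    RIGHT = 1
    FRONT = 2
    REAR = 3

class Shape(Enum):
    # Directivity shape parameter of the (physical or virtual) microphone.
    OMNI = 0
    CARDIOID = 1
    DIPOLE = 2

@dataclass
class TransportMetadata:
    """One transport-metadata record mirroring the Fig. 8b alternatives."""
    selected_components: Optional[List[str]] = None  # e.g. ["W", "X", "Y"]
    combination_weights: Optional[List[float]] = None  # weights of a linear combination
    sector: Optional[Sector] = None                  # sector/hemisphere of the signal
    shape: Optional[Shape] = None                    # directivity shape
```

A "very small" side information in the sense of the following paragraph would then be a single one-bit flag (omnidirectional vs. not) instead of this full record.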
Further transport metadata alternatives relate to microphone locations, microphone orientations, a distance between microphones, or a directional pattern of microphones that have, for example, generated or recorded the transport representation signals included in the (encoded) transport representation 614. Further embodiments relate to the look direction or a plurality of look directions of signals included in the transport representation or information on beamforming weights or beamformer directions or, alternatively or additionally, related to whether the included microphone signals are omnidirectional microphone signals or cardioid microphone signals or other signals. A very small transport metadata side information (with respect to bit rate) can be generated by simply including a single flag indicating whether the transport signals are microphone signals from an omnidirectional microphone or from any other microphone different from an omnidirectional microphone. Fig. 8c illustrates a preferred implementation of the transport metadata generator 605. In particular, for numerical transport metadata, the transport metadata generator comprises a transport metadata quantizer 605a or 622 and a subsequently connected transport metadata entropy encoder 605b. The procedures illustrated in Fig. 8c can also be applied to parametric metadata and, in particular, to spatial parameters as well. Fig. 9a illustrates a preferred implementation of the spatial audio synthesizer 750 in Fig. 7. The spatial audio synthesizer 750 comprises a transport metadata parser for interpreting the (decoded) transport metadata 710. The output data from block 752 is introduced into a combiner/selector/reference signal generator 760 that, additionally, receives the transport signal 711 as included in the transport representation obtained from the input interface 700 of Fig. 7.
Based on the transport metadata, the combiner/selector/reference signal generator generates one or more reference signals and forwards these reference signals to a component signal calculator 770 that calculates components of the synthesized spatial audio representation, such as general components for a multichannel output, Ambisonics components for an FOA or HOA output, left and right channels for a binaural representation, or audio object components, where an audio object component is a mono or stereo object signal. Fig. 9b illustrates an encoded audio signal consisting of, for example, n transport signals T1, T2, ..., Tn indicated at item 611 and, additionally, consisting of transport metadata 610 and optional spatial parameters 612. The order of the different data blocks and the size of a certain data block with respect to the other data blocks is only schematically illustrated in Fig. 9b. Fig. 9c illustrates an overview table for the procedure of the combiner/selector/reference signal generator 760 for certain transport metadata, a certain transport representation, and a certain speaker setup. In particular, in the Fig. 9c embodiment, the transport representation comprises a first transport signal T1 being a left transport signal (or a front transport signal or an omnidirectional or cardioid signal) and the transport representation additionally comprises a second transport signal T2 being a right transport signal (or a back transport signal, an omnidirectional transport signal, or a cardioid transport signal), for example. In the case of left/right, the reference signal for the left speaker is selected to be the first transport signal T1 and the reference signal for the right speaker is selected as the transport signal T2. For left surround and right surround, the left and the right signals are selected as outlined in the table 771 for the corresponding channels.
For the center channel, a sum of the left and right transport signals T1 and T2 is selected as the reference signal for the center channel component of the synthesized spatial audio representation. In Fig. 9c, a further selection is illustrated when the first transport signal T1 is a front transport signal and the second transport signal T2 is a back transport signal. Then, the first transport signal T1 is selected for left, right, and center, and the second transport signal T2 is selected for left surround and right surround. Fig. 9d illustrates a further preferred implementation of the spatial audio synthesizer of Fig. 7. In a block 910, the transport or downmix data is calculated regarding a certain first-order Ambisonics or higher-order Ambisonics selection. Four different selection alternatives are, for example, illustrated in Fig. 9d where, in the fourth alternative, only two transport signals T1, T2 are selected rather than a third component that is, in the other alternatives, the omnidirectional component. The reference signal for the (virtual) channels is determined based on the transport downmix data, and a fallback procedure is used for the missing component, i.e., for the fourth component with respect to the examples in Fig. 9d or for the two missing components in the case of the fourth example. Then, at block 912, the channel signals are generated using directional parameters received or derived from the transport data. Thus, the directional or spatial parameters can either be additionally received as illustrated at 712 in Fig. 7 or can be derived from the transport representation by a signal analysis of the transport representation signals. In an alternative implementation, a selection of a component as an FOA component is performed as indicated in block 913 and the calculation of the missing component is performed using a spatial basis function response as illustrated at item 914 in Fig. 9d.
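The Fig. 9c selection just described reads naturally as a small lookup table. The sketch below reproduces that logic for a 5.0 layout; the channel labels and dictionary layout are illustrative assumptions, not copied from table 771:

```python
# Sketch of the Fig. 9c selection logic: given the transport-pair configuration
# signaled in the transport metadata, pick the reference signal per output channel.
# "T1+T2" denotes the sum used as the center-channel reference.
REFERENCE_TABLE = {
    "left/right": {"L": "T1", "R": "T2", "C": "T1+T2", "Ls": "T1", "Rs": "T2"},
    "front/back": {"L": "T1", "R": "T1", "C": "T1", "Ls": "T2", "Rs": "T2"},
}

def reference_signal(pair_config, channel, t1, t2):
    """Return the reference samples for one output channel of a 5.0 setup."""
    rule = REFERENCE_TABLE[pair_config][channel]
    if rule == "T1":
        return t1
    if rule == "T2":
        return t2
    return [a + b for a, b in zip(t1, t2)]  # the "T1+T2" center-channel case
```

Transmitting just the pair configuration ("left/right" vs. "front/back") is precisely the short-metadata variant mentioned for Fig. 9e below.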
A certain procedure using a spatial basis function response is illustrated in Fig. 10 at block 410 where, in Fig. 10, block 826 provides an average response for the diffuse portion while block 410 in Fig. 10 provides a specific response for each mode m and order l for the direct signal portion. Fig. 9e illustrates a further table indicating certain transport metadata, particularly comprising a shape parameter or a look direction in addition to the shape parameter or alternative to the shape parameter. The shape parameter may comprise the shape factor c_m being 1, 0.5, or 0. The factor c_m = 1 indicates an omnidirectional shape of the microphone recording characteristic, while a factor of 0.5 indicates a cardioid shape and a value of 0 indicates a dipole shape. Furthermore, different look directions can comprise left, right, front, back, up, down, a specific direction of arrival consisting of an azimuth angle φ and an elevation angle θ or, alternatively, a short metadata item consisting of an indication that the pair of signals in the transport representation comprises a left/right pair or a front/back pair. In Fig. 9f, a further implementation of the spatial audio synthesizer is illustrated where, in block 910, the transport metadata are read as is, for example, done by the input interface 700 of Fig. 7 or an input port of the spatial audio synthesizer 750. In block 950, a reference signal determination is adapted to the read transport metadata as is performed, for example, by block 760. Then, in block 916, the multichannel, FOA/HOA, object, or binaural output and, in particular, the specific components for these kinds of data output are calculated using the reference signal obtained via block 915 and the optionally transmitted parametric data 712 if available.

Claims

1.
Apparatus for encoding a spatial audio representation representing an audio scene to obtain an encoded audio signal, the apparatus comprising: a transport representation generator (600) for generating a transport representation from the spatial audio representation, and for generating transport metadata related to the generation of the transport representation or indicating one or more directional properties of the transport representation; and an output interface (640) for generating the encoded audio signal, the encoded audio signal comprising information on the transport representation, and information on the transport metadata.

2. Apparatus of claim 1, further comprising a parameter processor (620) for deriving spatial parameters from the spatial audio representation, wherein the output interface (640) is configured for generating the encoded audio signal such that the encoded audio signal additionally comprises information on the spatial parameters.

3. Apparatus of claim 1 or 2, wherein the spatial audio representation is a first-order Ambisonics or higher-order Ambisonics representation comprising a multitude of coefficient signals, or a multichannel representation comprising a plurality of audio channels, wherein the transport representation generator (600) is configured to select one or more coefficient signals from the first-order Ambisonics or higher-order Ambisonics representation or to combine coefficients from the higher-order Ambisonics or first-order Ambisonics representation, or wherein the transport representation generator (600) is configured to select one or more audio channels from the multichannel representation or to combine two or more audio channels from the multichannel representation, and wherein the transport representation generator (600) is configured to generate, as the transport metadata, information indicating which specific one or more coefficient signals or audio channels have been selected, or information how the two or more
coefficient signals or audio channels have been combined, or which ones of the first-order Ambisonics or higher-order Ambisonics coefficient signals or audio channels have been combined.

4. Apparatus of claim 1, 2, or 3, wherein the transport representation generator (600) is configured to determine whether a majority of sound energy is located in a horizontal plane, or wherein only an omnidirectional coefficient signal, an X coefficient signal, and a Y coefficient signal are selected as the transport representation in response to the determination or in response to an audio encoder setting, and wherein the transport representation generator (600) is configured to determine the transport metadata so that the transport metadata includes information on the selection of the coefficient signals.

5. Apparatus of claim 1, 2, or 3, wherein the transport representation generator (600) is configured to determine whether a majority of sound energy is located in an x-z plane, or wherein only an omnidirectional coefficient signal, an X coefficient signal, and a Z coefficient signal are selected as the transport representation in response to the determination or in response to an audio encoder setting, and wherein the transport representation generator (600) is configured to determine the transport metadata so that the transport metadata includes information on the selection of the coefficient signals.

6.
Apparatus of claim 1, 2, or 3, wherein the transport representation generator (600) is configured to determine whether a majority of sound energy is located in a y-z plane, or wherein only an omnidirectional coefficient signal, a Y coefficient signal, and a Z coefficient signal are selected as the transport representation in response to the determination or in response to an audio encoder setting, and wherein the transport representation generator (600) is configured to determine the transport metadata so that the transport metadata includes information on the selection of the coefficient signals.

7. Apparatus of claim 1, 2, or 3, wherein the transport representation generator (600) is configured to determine whether a dominant sound energy originates from a specific sector or hemisphere such as a left or right hemisphere or a forward or backward hemisphere, or wherein the transport representation generator (600) is configured to generate a first transport signal from the specific sector or hemisphere where a dominant sound energy originates or in response to an audio encoder setting, and a second transport signal from a different sector or hemisphere such as the sector or hemisphere having an opposite direction with respect to a reference location and with respect to the specific sector or hemisphere, and wherein the transport representation generator (600) is configured to determine the transport metadata so that the transport metadata comprises information identifying the specific sector or hemisphere, or identifying the different sector or hemisphere.
8. Apparatus of one of the preceding claims, wherein the transport representation generator (600) is configured to combine coefficient signals of the spatial audio representation so that a first resulting signal being a first transport signal corresponds to a directional microphone signal directed to a specific sector or hemisphere, and a second resulting signal being a second transport signal corresponds to a directional microphone signal directed to a different sector or hemisphere.

9. Apparatus of one of the preceding claims, further comprising a user interface (650) for receiving a user input, wherein the transport representation generator (600) is configured to generate the transport representation based on the user input received at the user interface (650), and wherein the transport representation generator (600) is configured to generate the transport metadata so that the transport metadata has information on the user input.

10. Apparatus of one of the preceding claims, wherein the transport representation generator (600) is configured to generate the transport representation and the transport metadata in a time-variant or frequency-dependent way, so that the transport representation and the transport metadata for a first frame are different from the transport representation and the transport metadata for a second frame, or so that the transport representation and the transport metadata for a first frequency band are different from a transport representation and the transport metadata for a second, different frequency band.

11.
Apparatus of one of the preceding claims, wherein the transport representation generator (600) is configured to generate one or two transport signals by a weighted combination (602) of two or more than two coefficient signals of the spatial audio representation, and wherein the transport representation generator (600) is configured to calculate the transport metadata so that the transport metadata comprises information on weights used in the weighted combination, or information on an azimuth and/or elevation angle as a look direction of a generated directional microphone signal, or information on a shape parameter indicating a directional characteristic of a directional microphone signal.

12. Apparatus of one of the preceding claims, wherein the transport representation generator (600) is configured to generate quantitative transport metadata, to quantize (605a) the quantitative transport metadata to obtain quantized transport metadata, and to entropy encode (605b) the quantized transport metadata, and wherein the output interface (640) is configured to include the encoded transport metadata into the encoded audio signal.

13. Apparatus of one of claims 1 to 11, wherein the transport representation generator (600) is configured to transform the transport metadata into a table index or a preset parameter, and wherein the output interface (640) is configured to include the table index or the preset parameter into the encoded audio signal.
14. Apparatus of one of the preceding claims, wherein the spatial audio representation comprises at least two audio signals and spatial parameters, wherein a parameter processor (620) is configured to derive the spatial parameters from the spatial audio representation by extracting the spatial parameters from the spatial audio representation, wherein the output interface (640) is configured to include information on the spatial parameters into the encoded audio signal or to include information on processed spatial parameters derived from the spatial parameters into the encoded audio signal, or wherein the transport representation generator (600) is configured to select a subset of the at least two audio signals as the transport representation and to generate the transport metadata so that the transport metadata indicates the selection of the subset, or to combine the at least two audio signals or a subset of the at least two audio signals and to calculate the transport metadata such that the transport metadata includes information on the combination of the audio signals performed for calculating the transport representation of the spatial audio representation.

15. Apparatus of one of the preceding claims, wherein the spatial audio representation comprises a set of at least two microphone signals acquired by a microphone array, wherein the transport representation generator (600) is configured to select one or more specific microphone signals associated with specific locations or with specific microphones of the microphone array, and wherein the transport metadata comprises information on the specific locations or the specific microphones, or on a microphone distance between locations associated with selected microphone signals, or information on a microphone orientation of a microphone associated with a selected microphone signal, or information on microphone directional patterns of microphone signals associated with selected microphones.
16. Apparatus of claim 15, wherein the transport representation generator (600) is configured to select one or more signals of the spatial audio representation in accordance with a user input received by a user interface (650), to perform (606) an analysis of the spatial audio representation with respect to which location has which sound energy and to select (602) one or more signals of the spatial audio representation in accordance with an analysis result, or to perform a sound source localization and to select (602) one or more signals of the spatial audio representation in accordance with a result of the sound source localization.

17. Apparatus of one of claims 1 to 15, wherein the transport representation generator (600) is configured to select all signals of a spatial audio representation, and wherein the transport representation generator (600) is configured to generate the transport metadata so that the transport metadata identifies a microphone array from which the spatial audio representation is derived.

18. Apparatus of one of the preceding claims, wherein the transport representation generator (600) is configured to combine (602) audio signals included in the spatial audio representation using spatial filtering or beamforming, and wherein the transport representation generator (600) is configured to include information on the look direction of the transport representation or information on beamforming weights used in calculating the transport representation into the transport metadata.
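The beamforming mentioned in claim 18 can be sketched with the simplest classical beamformer, delay-and-sum. This is only an illustrative stand-in (the patent does not prescribe a specific beamformer); the function name, the integer-sample delay rounding, and the sign convention (look direction pointing toward the source) are assumptions.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, look_dir, fs, c=343.0):
    """Steer an array toward look_dir by time-aligning and averaging
    microphone signals; fractional delays are rounded to whole samples
    for simplicity.

    mic_signals: (n_mics, n_samples), mic_positions: (n_mics, 3) in m.
    The look direction (or the derived weights) would be signalled as
    transport metadata.
    """
    look_dir = np.asarray(look_dir, dtype=float)
    look_dir /= np.linalg.norm(look_dir)
    # Per-microphone arrival-time offsets relative to the array origin.
    delays = mic_positions @ look_dir / c            # seconds
    shifts = np.round((delays - delays.min()) * fs).astype(int)
    n = mic_signals.shape[1]
    out = np.zeros(n)
    for sig, s in zip(mic_signals, shifts):
        out[: n - s] += sig[s:]                      # advance by s samples
    return out / len(mic_signals)
```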
19. Apparatus of one of the preceding claims, wherein the spatial audio representation is a description of a sound field related to a reference position, and wherein a parameter processor (620) is configured to derive spatial parameters from the spatial audio representation, wherein the spatial parameters define time-variant or frequency-dependent parameters on a direction of arrival of sound at the reference position or time-variant or frequency-dependent parameters on a diffuseness of the sound field at the reference position, or wherein the transport representation generator (600) comprises a downmixer (601) for generating, as the transport representation, a downmix representation having a second number of individual signals being smaller than a first number of individual signals included in the spatial audio representation, wherein the downmixer (601) is configured to select a subset of the individual signals included in the spatial audio representation or to combine the individual signals included in the spatial audio representation in order to decrease the first number of signals to the second number of signals.
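The two downmix alternatives of claim 19 (select a subset, or combine signals) reduce to a few lines. The interface below is purely illustrative; the patent only requires that the first number of signals be reduced to a smaller second number and that the choice be describable by transport metadata.

```python
import numpy as np

def downmix(signals, mode="select", indices=(0,), weights=None):
    """Reduce a first number of individual signals to a smaller second
    number, either by selecting a subset or by a weighted combination.

    signals: (n_in, n_samples). For mode "combine", weights has shape
    (n_out, n_in). Either `indices` or `weights` is the information
    that would be carried as transport metadata.
    """
    signals = np.asarray(signals)
    if mode == "select":
        return signals[list(indices)]
    return np.asarray(weights) @ signals
```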
20. Apparatus of one of the preceding claims, wherein a parameter processor (620) comprises a spatial audio analyzer (621) for deriving the spatial parameters from the spatial audio representation by performing an audio signal analysis, and wherein the transport representation generator (600) is configured to generate the transport representation based on the result of the spatial audio analyzer (621), or wherein the transport representation generator (600) comprises a core encoder (603) for core encoding one or more audio signals of the transport signals of the transport representation, or wherein the parameter processor (620) is configured to quantize and entropy encode (622) the spatial parameters, and wherein the output interface (640) is configured to include a core-encoded transport representation (611) as the information on the transport representation into the encoded audio signal or to include the entropy-encoded spatial parameters (612) as the information on spatial parameters into the encoded audio signal.

21. Apparatus for decoding an encoded audio signal, comprising: an input interface (700) for receiving the encoded audio signal comprising information on a transport representation and information on transport metadata; and a spatial audio synthesizer (750) for synthesizing a spatial audio representation using the information on the transport representation and the information on the transport metadata.

22. Apparatus of claim 21, wherein the input interface (700) is configured for receiving the encoded audio signal additionally comprising information on spatial parameters, and wherein the spatial audio synthesizer (750) is configured for synthesizing the spatial audio representation additionally using the information on the spatial parameters.
23. Apparatus of claim 21 or 22, wherein the spatial audio synthesizer (750) comprises: a core decoder (751) for core decoding two or more encoded transport signals representing the information on the transport representation to obtain two or more decoded transport signals, or wherein the spatial audio synthesizer (750) is configured to calculate a first order Ambisonics or a higher order Ambisonics representation (754) or a multi-channel signal (755) or an object representation (756) or a binaural representation of the spatial audio representation, or wherein the spatial audio synthesizer (750) comprises a metadata decoder (752) for decoding the information on the transport metadata to derive the decoded transport metadata (720) or for decoding information on spatial parameters (722) to obtain decoded spatial parameters.

24. Apparatus of claim 21, 22 or 23, wherein the spatial audio representation comprises a plurality of component signals, wherein the spatial audio synthesizer (750) is configured to determine (760), for a component signal of the spatial audio representation, a reference signal using the information on the transport representation (711) and the information on the transport metadata (710), and to calculate (770) the component signal of the spatial audio representation using the reference signal and information on spatial parameters, or to calculate (770) the component signal of the spatial audio representation using the reference signal.
25. Apparatus of one of claims 22 to 24, wherein the spatial parameters comprise at least one of the time-variant or frequency-dependent direction of arrival or diffuseness parameters, wherein the spatial audio synthesizer (750) is configured to perform a directional audio coding (DirAC) synthesis using the spatial parameters to generate the plurality of different components of the spatial audio representation, wherein the first component of the spatial audio representation is determined using one of the at least two transport signals or a first combination of the at least two transport signals, wherein a second component of the spatial audio representation is determined using another one of the at least two transport signals or a second combination of the at least two transport signals, wherein the spatial audio synthesizer (750) is configured to perform (760) a determination of the one or the different one of the at least two transport signals or to perform (760) a determination of the first combination or the different second combination in accordance with the transport metadata.
26. Apparatus of one of claims 21 to 25, wherein the transport metadata indicates a first transport signal as referring to a first sector or hemisphere related to a reference position of the spatial audio representation and a second transport signal as referring to a second, different sector or hemisphere related to the reference position of the spatial audio representation, wherein the spatial audio synthesizer (750) is configured to generate (915) a component signal of the spatial audio representation associated with the first sector or hemisphere using the first transport signal and without using the second transport signal, or wherein the spatial audio synthesizer (750) is configured to generate (915) another component signal of the spatial audio representation associated with the second sector or hemisphere using the second transport signal and not using the first transport signal, or wherein the spatial audio synthesizer (750) is configured to generate (915) a component signal associated with the first sector or hemisphere using a first combination of the first and the second transport signals, or to generate (915) a component signal associated with a different second sector or hemisphere using a second combination of the first and the second transport signals, wherein the first combination is influenced more strongly by the first transport signal than the second combination, or wherein the second combination is influenced more strongly by the second transport signal than the first combination.
27. Apparatus of one of claims 21 to 26, wherein the transport metadata comprises information on a directional characteristic associated with transport signals of the transport representation, wherein the spatial audio synthesizer (750) is configured to calculate (911) virtual microphone signals using first order Ambisonics or higher order Ambisonics signals, loudspeaker positions and the transport metadata, or wherein the spatial audio synthesizer (750) is configured to determine (911) the directional characteristic of the transport signals using the transport metadata and to determine a first order Ambisonics or a higher order Ambisonics component (754) from the transport signals in line with the determined directional characteristics of the transport signals, or to determine (911) a first order Ambisonics or higher order Ambisonics component (754) not associated with the directional characteristics of the transport signals in accordance with a fallback process.

28. Apparatus of one of claims 21 to 27, wherein the transport metadata comprises information on a first look direction associated with a first transport signal and information on a second look direction associated with a second transport signal, wherein the spatial audio synthesizer (750) is configured to select (771) a reference signal for the calculation of a component signal of the spatial audio representation based on the transport metadata and the position of a loudspeaker associated with the component signal of the spatial audio representation.
29. Apparatus of claim 28, wherein the first look direction indicates a left or a front hemisphere, wherein the second look direction indicates a right or a back hemisphere, wherein, for the calculation of a component signal for a loudspeaker in the left hemisphere, the first transport signal and not the second transport signal is used (771), or wherein, for the calculation of a loudspeaker signal in the right hemisphere, the second transport signal and not the first transport signal is used (771), or wherein, for the calculation of a loudspeaker signal in a front hemisphere, the first transport signal and not the second transport signal is used (771), or wherein, for the calculation of a loudspeaker signal in a back hemisphere, the second transport signal and not the first transport signal is used (771), or wherein, for the calculation of a loudspeaker signal in a center region, a combination of the first transport signal and the second transport signal is used (771), or wherein, for the calculation of a loudspeaker signal associated with a loudspeaker in a region between the front hemisphere and the back hemisphere, a combination of the first transport signal and the second transport signal is used (771).

30. Apparatus of one of claims 21 to 29, wherein the information on the transport metadata indicates, as a first look direction, a left direction for a left transport signal and indicates, as a second look direction, a right look direction for a second transport signal, wherein the spatial audio synthesizer (750) is configured to calculate a first Ambisonics component by adding (920) the first transport signal and the second transport signal, or to calculate a second Ambisonics component by subtracting (921) the first transport signal and the second transport signal, or wherein another Ambisonics component is calculated (922) using a sum of the first transport signal and the second transport signal.
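The sum/difference construction of claim 30 follows directly from the cardioid definitions: if the left and right transport signals are cardioids L = 0.5·(W + Y) and R = 0.5·(W − Y), then L + R recovers the omnidirectional component W and L − R recovers the left-right dipole Y. The sketch below assumes exactly that cardioid convention (not stated in the claim itself):

```python
import numpy as np

def foa_from_lr_cardioids(left, right):
    """Recover FOA W (omni) and Y (left-right dipole) from two cardioid
    transport signals looking left and right.

    Assumes L = 0.5*(W + Y) and R = 0.5*(W - Y); the transport metadata
    (left/right look directions) is what tells the decoder this applies.
    """
    w = left + right   # sum restores the omnidirectional part
    y = left - right   # difference restores the Y dipole
    return w, y
```

The front/back case of claim 31 is the same algebra with the X dipole in place of Y.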
31. Apparatus of one of claims 21 to 27, wherein the transport metadata indicates, for a first transport signal, a front look direction and indicates, for a second transport signal, a back look direction, wherein the spatial audio synthesizer (750) is configured to calculate a first order Ambisonics component for an x direction by performing the calculation of a difference (921) between the first and the second transport signals, and to calculate an omnidirectional first order Ambisonics component using an addition (920) of the first transport signal and the second transport signal, and to calculate (922) another first order Ambisonics component using a sum of the first transport signal and the second transport signal.

32. Apparatus of one of claims 21 to 26, wherein the transport metadata indicates information on weighting coefficients or look directions of transport signals of the transport representation, wherein the spatial audio synthesizer (750) is configured to calculate (932) different first order Ambisonics components of the spatial audio representation using the information on the look direction or the weighting coefficients, using the transport signals and the spatial parameters, or wherein the spatial audio synthesizer (750) is configured to calculate (932) different first order Ambisonics components of the spatial audio representation using the information on the look direction or the weighting coefficients, and using the transport signals.
33. Apparatus of one of claims 21 to 32, wherein the transport metadata include information on the transport signals being derived from microphone signals at two different positions or with different look directions, wherein the spatial audio synthesizer (750) is configured to select (931) a reference signal that has a position that is closest to a loudspeaker position, or to select (932) a reference signal that has a closest look direction with respect to the direction from a reference position of the spatial audio representation to a loudspeaker position, or wherein the spatial audio synthesizer (750) is configured to perform a linear combination (771) with the transport signals to determine a reference signal for a loudspeaker being placed between two look directions indicated by the transport metadata.

34. Apparatus of one of claims 21 to 33, wherein the transport metadata includes information on a distance between microphone positions associated with the transport signals, wherein the spatial audio synthesizer (750) comprises a diffuse signal generator (830, 823, 824), and wherein the diffuse signal generator (830, 823, 824) is configured to control an amount of a decorrelated signal in a diffuse signal generated by the diffuse signal generator using the information on the distance, so that, for a first distance, a higher amount of decorrelated signal is included in the diffuse signal compared to an amount of decorrelated signal for a second distance, wherein the first distance is lower than the second distance, or wherein the spatial audio synthesizer (750) is configured to calculate, for a first distance between the microphone positions, a component signal for the spatial audio representation using an output signal of a decorrelation filter (823) configured for decorrelating a reference signal or a scaled reference signal and the reference signal weighted (822) using a gain derived from a sound direction of arrival information, and to calculate, for a second
distance between the microphone positions, a component signal for the spatial audio representation using the reference signal weighted (822) using a gain derived from a sound direction of arrival information without any decorrelation processing, the second distance being greater than the first distance or being greater than a distance threshold.

35. Apparatus of one of claims 21 to 34, wherein the transport metadata comprises information on a beamforming or a spatial filtering associated with the transport signals of the transport representation, and wherein the spatial audio synthesizer (750) is configured to generate (932) a loudspeaker signal for a loudspeaker using the transport signal having a look direction being closest to a look direction from a reference position of the spatial audio representation to the loudspeaker.

36. Apparatus of one of claims 21 to 35, wherein the spatial audio synthesizer (750) is configured to determine component signals of the spatial audio representation as a combination (825) of a direct sound component and a diffuse sound component, wherein the direct sound component is obtained by scaling (822) a reference signal with a factor depending on a diffuseness parameter or a directional parameter, wherein the directional parameter depends on a direction of arrival of sound, wherein the determination of the reference signal is performed (821, 760) based on the information on the transport metadata, and wherein the diffuse sound component is determined (823, 824) using the same reference signal and the diffuseness parameter.
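The distance-dependent control of claim 34 can be sketched as follows. The idea: closely spaced microphones yield highly correlated transport signals, so more artificial decorrelation is needed to render a convincing diffuse field, while widely spaced microphones are already decorrelated and can be used directly. Everything concrete below (function name, the linear amount-versus-distance mapping, the `d_max` threshold, and the toy random-phase decorrelator) is an assumption for illustration, not the patent's filter design.

```python
import numpy as np

def diffuse_component(reference, mic_distance, d_max=0.5, rng=None):
    """Blend a decorrelated version of the reference into the diffuse
    signal; small microphone spacings get more decorrelation, large
    spacings (>= d_max metres) none at all.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Toy decorrelator: randomize the spectral phase, keep the magnitude.
    spec = np.fft.rfft(reference)
    phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, spec.shape))
    decorrelated = np.fft.irfft(spec * phases, n=len(reference))
    # More decorrelated signal for small distances, less for large ones.
    amount = np.clip(1.0 - mic_distance / d_max, 0.0, 1.0)
    return amount * decorrelated + (1.0 - amount) * reference
```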
37. Apparatus of one of claims 21 to 36, wherein the spatial audio synthesizer (750) is configured to determine component signals of the spatial audio representation as a combination (825) of a direct sound component and a diffuse sound component, wherein the direct sound component is obtained by scaling (822) a reference signal with a factor depending on a diffuseness parameter or a directional parameter, wherein the directional parameter depends on a direction of arrival of sound, wherein the determination of the reference signal is performed (821, 760) based on the information on the transport metadata, and wherein the diffuse sound component is determined (823, 824) using a decorrelation filter (823), the same reference signal, and the diffuseness parameter.

38. Apparatus of one of claims 21 to 37, wherein the transport representation comprises at least two different microphone signals, wherein the transport metadata comprises information indicating whether the at least two different microphone signals are at least one of omnidirectional signals, dipole signals or cardioid signals, and wherein the spatial audio synthesizer is configured for adapting (915) a reference signal determination to the transport metadata to determine, for components of the spatial audio representation, individual reference signals and for calculating (916) the respective component using the individual reference signal determined for the respective component.
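The direct/diffuse split of claims 36 and 37 corresponds to the standard DirAC synthesis pattern: with diffuseness ψ, the direct part is the reference scaled by a directional gain and √(1−ψ), and the diffuse part is a decorrelated copy of the same reference scaled by √ψ. The sketch below assumes that common formulation; the function name and the injected `decorrelate` callable are placeholders.

```python
import numpy as np

def synthesize_component(reference, gain, diffuseness, decorrelate):
    """DirAC-style combination of a direct and a diffuse sound component
    from one reference signal (chosen using the transport metadata).

    gain        : directional gain for this component (from the DOA).
    diffuseness : psi in [0, 1].
    decorrelate : callable implementing the decorrelation filter.
    """
    direct = np.sqrt(1.0 - diffuseness) * gain * reference
    diffuse = np.sqrt(diffuseness) * decorrelate(reference)
    return direct + diffuse
```

For ψ = 0 the output is the purely directional, scaled reference; for ψ = 1 it is fully decorrelated.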
39. Method for encoding a spatial audio representation representing an audio scene to obtain an encoded audio signal, the method comprising: generating a transport representation from the spatial audio representation; generating transport metadata related to the generation of the transport representation or indicating one or more directional properties of the transport representation; and generating the encoded audio signal, the encoded audio signal comprising information on the transport representation and information on the transport metadata.

40. Method of claim 39, further comprising deriving spatial parameters from the spatial audio representation, and wherein the encoded audio signal additionally comprises information on the spatial parameters.

41. Method for decoding an encoded audio signal, the method comprising: receiving the encoded audio signal comprising information on a transport representation and information on transport metadata; and synthesizing a spatial audio representation using the information on the transport representation and the information on the transport metadata.

42. Method of claim 41, further comprising receiving information on spatial parameters, and wherein the synthesizing additionally uses the information on the spatial parameters.

43. Computer program for performing, when running on a computer or a processor, the method of any one of claims 39 to 42.

44. Encoded audio signal comprising: information on a transport representation (611) of a spatial audio representation; and information on transport metadata (610).

45. Encoded audio signal of claim 44, further comprising information on spatial parameters (612) associated with the transport representation (611).

Documents

Application Documents

# Name Date
1 202127032698-STATEMENT OF UNDERTAKING (FORM 3) [20-07-2021(online)].pdf 2021-07-20
2 202127032698-REQUEST FOR EXAMINATION (FORM-18) [20-07-2021(online)].pdf 2021-07-20
3 202127032698-PRIORITY DOCUMENTS [20-07-2021(online)].pdf 2021-07-20
4 202127032698-FORM 18 [20-07-2021(online)].pdf 2021-07-20
5 202127032698-FORM 1 [20-07-2021(online)].pdf 2021-07-20
6 202127032698-FIGURE OF ABSTRACT [20-07-2021(online)].jpg 2021-07-20
7 202127032698-DRAWINGS [20-07-2021(online)].pdf 2021-07-20
8 202127032698-DECLARATION OF INVENTORSHIP (FORM 5) [20-07-2021(online)].pdf 2021-07-20
9 202127032698-COMPLETE SPECIFICATION [20-07-2021(online)].pdf 2021-07-20
10 202127032698-Proof of Right [21-09-2021(online)].pdf 2021-09-21
11 202127032698-FORM-26 [21-09-2021(online)].pdf 2021-09-21
12 202127032698.pdf 2021-10-19
13 202127032698-ORIGINAL UR 6(1A) FORM 1 & FORM 26-270921.pdf 2021-11-01
14 202127032698-FORM 3 [20-01-2022(online)].pdf 2022-01-20
15 Abstract1.jpg 2022-04-07
16 202127032698-FER.pdf 2022-04-22
17 202127032698-FORM 3 [13-06-2022(online)].pdf 2022-06-13
18 202127032698-FORM 4(ii) [13-10-2022(online)].pdf 2022-10-13
19 202127032698-FORM 3 [14-12-2022(online)].pdf 2022-12-14
20 202127032698-OTHERS [20-01-2023(online)].pdf 2023-01-20
21 202127032698-FER_SER_REPLY [20-01-2023(online)].pdf 2023-01-20
22 202127032698-DRAWING [20-01-2023(online)].pdf 2023-01-20
23 202127032698-COMPLETE SPECIFICATION [20-01-2023(online)].pdf 2023-01-20
24 202127032698-CLAIMS [20-01-2023(online)].pdf 2023-01-20
25 202127032698-ABSTRACT [20-01-2023(online)].pdf 2023-01-20
26 202127032698-FORM 3 [23-01-2024(online)].pdf 2024-01-23
27 202127032698-PatentCertificate19-07-2024.pdf 2024-07-19
28 202127032698-IntimationOfGrant19-07-2024.pdf 2024-07-19

Search Strategy

1 neww2E_21-04-2022.pdf

ERegister / Renewals

3rd: 21 Aug 2024

From 21/01/2022 - To 21/01/2023

4th: 21 Aug 2024

From 21/01/2023 - To 21/01/2024

5th: 21 Aug 2024

From 21/01/2024 - To 21/01/2025

6th: 21 Aug 2024

From 21/01/2025 - To 21/01/2026