Abstract: An audio analyzer configured to obtain spectral domain representations of two or more input audio signals. Additionally, the audio analyzer is configured to obtain directional information associated with spectral bands of the spectral domain representations and to obtain loudness information associated with different directions as an analysis result. Contributions to the loudness information are determined in dependence on the directional information.
Directional loudness map based audio processing
Description
Technical Field
Embodiments according to the invention relate to directional loudness map based audio processing.
Background of the Invention
Since the advent of perceptual audio coders, a considerable interest arose in developing algorithms that can predict audio quality of the coded signals without relying on extensive subjective listening tests to save time and resources. Algorithms performing a so-called objective assessment of quality on monaurally coded signals such as PEAQ [3] or POLQA [4] are widespread. However, their performance for signals coded with spatial audio techniques is still considered unsatisfactory [5]. In addition, non-waveform preserving techniques such as bandwidth extension (BWE) are also known for causing these algorithms to overestimate the quality loss [6], since many of the features extracted for analysis assume waveform preserving conditions. Spatial audio and BWE techniques are predominantly used in low-bitrate audio coding (around 32 kbps per channel).
It is assumed that spatial audio content of more than two channels can be rendered to a binaural representation of the signals entering the left and the right ear by using sets of Head Related Transfer Functions (HRTFs) and/or Binaural Room Impulse Responses (BRIR) [5, 7]. Most of the proposed extensions for binaural objective assessment of quality are based on well-known binaural auditory cues related to the human perception of sound localization and perceived auditory source width such as Inter-aural Level Differences (ILD), Inter-aural Time Differences (ITD) and Inter-aural Cross-Correlation (IACC) between signals entering the left and the right ear [1, 5, 8, 9]. In the context of objective quality evaluation, features are extracted based on these spatial cues from reference and test signals and a distance measure between the two is used as a distortion index. The consideration of these spatial cues and their related perceived distortions allowed for considerable progress in the context of spatial audio coding algorithm design [7]. However, in the use case of predicting the overall spatial audio coding quality, the interaction of these cue distortions with each other and with monaural/timbral distortions (especially in non-waveform-preserving cases) renders a complex scenario [10] with varying results when using the features to predict a single quality score given by subjective quality tests such as MUSHRA [11]. Other alternative models have also been proposed [2] in which the output of a binaural model is further processed by a clustering algorithm to identify the number of participating sources in the instantaneous auditory image and therefore is also an abstraction of the classical auditory cue distortion models. Nevertheless, the model in [2] is mostly focused on moving sources in space and its performance is also limited by the accuracy and tracking ability of the associated clustering algorithm. 
The number of features that must be added to make this model usable is also significant.
Objective audio quality measurement systems should also employ as few mutually independent and maximally relevant extracted signal features as possible, to avoid the risk of over-fitting given the limited amount of ground-truth data, provided by listening tests, for mapping feature distortions to quality scores [3].
One of the most salient distortion characteristics reported in listening tests for spatially coded audio signals at low bitrates is described as a collapse of the stereo image towards the center position and channel cross-talk [12].
Therefore, it is desired to obtain a concept which provides an improved, efficient and high-accuracy audio analysis, audio encoding and audio decoding.
This is achieved by the subject matter of the independent claims of the present application.
Further embodiments according to the invention are defined by the subject matter of the dependent claims of the present application.
Summary of the Invention
An embodiment according to this invention is related to an audio analyzer, for example, an audio signal analyzer. The audio analyzer is configured to obtain spectral-domain representations of two or more input audio signals. Thus, the audio analyzer is, for example, configured to determine or receive the spectral-domain representations. According to an embodiment, the audio analyzer is configured to obtain the spectral-domain representations by decomposing the two or more input audio signals into time-frequency tiles. Furthermore, the audio analyzer is configured to obtain directional information associated with spectral bands of the spectral-domain representations. The directional information represents, for example, different directions (or positions) of audio components contained in the two or more input audio signals. According to an embodiment, the directional information can be understood as a panning index, which describes, for example, a source location in a sound field created by the two or more input audio signals in a binaural processing. In addition, the audio analyzer is configured to obtain loudness information associated with different directions as an analysis result, wherein contributions to the loudness information are determined in dependence on the directional information. In other words, the audio analyzer is, for example, configured to obtain the loudness information associated with different panning directions or panning indices or for a plurality of different evaluated direction ranges as an analysis result. According to an embodiment, the different directions, for example, panning directions, panning indices and/or direction ranges, can be obtained from the directional information. The loudness information comprises, for example, a directional loudness map or level information or energy information. 
The contributions to the loudness information are, for example, contributions of spectral bands of the spectral-domain representations to the loudness information. According to an embodiment, the contributions to the loudness information are contributions to values of the loudness information associated with the different directions.
This embodiment is based on the idea that it is advantageous to determine the loudness information in dependence on the directional information obtained from the two or more input audio signals. This enables to obtain information about loudness of different sources in a stereo audio mix realized by the two or more audio signals. Thus, with the audio analyzer a perception of the two or more audio signals can be analyzed very efficiently by obtaining the loudness information associated with different directions as an analysis result. According to an embodiment, the loudness information can comprise or represent a directional loudness map, which gives, for example, information about a loudness of a combination of the two or more signals at the different directions or information about a loudness of at least one common time signal of the two or more input audio signals, averaged over all ERB bands (ERB = equivalent rectangular bandwidth).
According to an embodiment, the audio analyzer is configured to obtain a plurality of weighted spectral-domain (e.g., time-frequency-domain) representations (e.g., "directional signals") on the basis of the spectral-domain (e.g., time-frequency-domain) representations of the two or more input audio signals. Values of the one or more spectral-domain representations are weighted in dependence on the different directions (e.g., panning directions)(e.g., represented by weighting factors) of the audio components (for example, of spectral bins or spectral bands)(e.g., tones from instruments or singers) in the two or more input audio signals to obtain the plurality of weighted spectral-domain representations (e.g., "directional signals"). The audio analyzer is configured to obtain loudness information (e.g., loudness values for a plurality of different directions; e.g., a "directional loudness map") associated with the different directions (e.g., panning directions) on the basis of the weighted spectral-domain representations (e.g., "directional signals") as the analysis result.
This means, for example, that the audio analyzer analyzes in which direction of the different directions of the audio components the values of the one or more spectral-domain representations influence the loudness information. Each spectral bin is, for example, associated with a certain direction, wherein a loudness information associated with a certain direction can be determined by the audio analyzer based on more than one spectral bin associated with this direction. The weighting can be performed for each bin or each spectral band of the one or more spectral-domain representations. According to an embodiment, the values of a frequency bin or a frequency group are windowed by the weighting to one of the different directions. For example, they are weighted to the direction they are associated with and/or to neighboring directions. The direction is, for example, associated with a direction in which the frequency bin or frequency group influences the loudness information. Values deviating from that direction are, for example, weighted less strongly. Thus, the plurality of weighted spectral-domain representations can provide an indication of spectral bins or spectral bands influencing the loudness information in the different directions. According to an embodiment, the plurality of weighted spectral-domain representations can represent at least partially the contributions to the loudness information.
According to an embodiment, the audio analyzer is configured to decompose (e.g., transform) the two or more input audio signals into a short-time Fourier transform (STFT) domain (e.g., using a Hann window) to obtain two or more transformed audio signals. The two or more transformed audio signals can represent the spectral-domain (e.g., the time-frequency-domain) representations of the two or more input audio signals.
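The STFT decomposition described above can be sketched as follows (a minimal sketch; the window length, hop size, sample rate and test signals are illustrative assumptions, not values prescribed by the text):

```python
import numpy as np

def stft_hann(x, win_len=1024, hop=512):
    """Decompose a time-domain signal into time-frequency tiles with a
    short-time Fourier transform using a Hann window (a minimal sketch;
    window length and hop size are illustrative choices)."""
    win = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[m * hop : m * hop + win_len] * win
                       for m in range(n_frames)])
    # One row per time frame m, one column per spectral bin k.
    return np.fft.rfft(frames, axis=1)

# Two input audio signals, e.g., a left and a right channel.
fs = 48000
t = np.arange(fs) / fs
x_left = np.sin(2 * np.pi * 440 * t)
x_right = 0.5 * np.sin(2 * np.pi * 440 * t)
X_L = stft_hann(x_left)   # spectral-domain representation, channel L
X_R = stft_hann(x_right)  # spectral-domain representation, channel R
```

Each entry `X_L[m, k]` then corresponds to one time-frequency tile with time index m and spectral bin index k, matching the indexing used in the formulas of this text.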
According to an embodiment, the audio analyzer is configured to group spectral bins of the two or more transformed audio signals to spectral bands of the two or more transformed audio signals (e.g., such that bandwidths of the groups or spectral bands increase with increasing frequency)(e.g., based on a frequency selectivity of the human cochlea). Furthermore, the audio analyzer is configured to weight the spectral bands (for example, spectral bins within the spectral bands) using different weights, based on an outer-ear and middle-ear model, to obtain the one or more spectral-domain representations of the two or more input audio signals. With the special grouping of the spectral bins into spectral bands and with the weighting of the spectral bands, the two or more input audio signals are prepared such that a loudness perception of the two or more input audio signals by a user hearing said signals can be estimated or determined very precisely and efficiently by the audio analyzer in terms of determining the loudness information. With this feature, the transformed audio signals, respectively the spectral-domain representations of the two or more input audio signals, are adapted to the human ear, to improve an information content of the loudness information obtained by the audio analyzer.
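A grouping of spectral bins into bands of increasing bandwidth can be sketched as follows. The sketch uses the Glasberg & Moore ERB-number scale and an illustrative band count; the outer-ear and middle-ear weighting mentioned above is omitted here, and none of the parameter values are prescribed by the text:

```python
import numpy as np

def erb_band_edges(fs, n_fft, n_bands=20):
    """Group spectral bins into bands whose bandwidth increases with
    frequency, loosely following the ERB scale of the human cochlea
    (Glasberg & Moore ERB-number scale; the band count is illustrative)."""
    def hz_to_erb(f):
        return 21.4 * np.log10(1.0 + 0.00437 * f)
    def erb_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) / 0.00437
    # Equidistant edges on the ERB scale become wider and wider in Hz.
    edges_erb = np.linspace(hz_to_erb(0.0), hz_to_erb(fs / 2.0), n_bands + 1)
    edges_hz = erb_to_hz(edges_erb)
    # Convert band edges in Hz to spectral-bin indices of an n_fft FFT.
    return np.round(edges_hz / (fs / n_fft)).astype(int)

edges = erb_band_edges(48000, 1024)
widths = np.diff(edges)  # bins per band, growing with frequency
```

Band b then covers the spectral bins `edges[b] .. edges[b+1]-1`, so that low-frequency bands contain few bins and high-frequency bands contain many.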
According to an embodiment, the two or more input audio signals are associated with different directions or different loudspeaker positions (e.g., L (left), R (right)). The different directions or different loudspeaker positions can represent different channels for a stereo and/or a multichannel audio scene. The two or more input audio signals can be distinguished from each other by indices, which can, for example, be represented by letters of the alphabet (e.g., L (left), R (right), M (middle)) or, for example, by a positive integer indicating the number of the channel of the two or more input audio signals. Thus, the indices can indicate the different directions or loudspeaker positions with which the two or more input audio signals are associated (e.g., they indicate a position, where the input signals originate in a listening space). According to an embodiment, the different directions (in the following, for example, first different directions) of the two or more input audio signals are not related to the different directions (in the following, for example, second different directions) with which the loudness information, obtained by the audio analyzer, is associated. Thus, a direction of the first different directions can represent a channel of a signal of the two or more input audio signals and a direction of the second different directions can represent a direction of an audio component of a signal of the two or more input audio signals. The second different directions can be positioned between the first directions. Additionally or alternatively, the second different directions can be positioned outside of the first directions and/or at the first directions.
According to an embodiment, the audio analyzer is configured to determine a direction-dependent weighting (e.g., based on panning directions) per spectral bin (e.g., and also per time step/frame) and for a plurality of predetermined directions (desired panning directions).
The predetermined directions represent, for example, equidistant directions, which can be associated with predetermined panning directions/indices. Alternatively, the predetermined directions are, for example, determined using the directional information associated with spectral bands of the spectral-domain representations, obtained by the audio analyzer. According to an embodiment, the directional information can comprise the predetermined directions. The direction-dependent weighting is, for example, applied to the one or more spectral-domain representations of the two or more input audio signals by the audio analyzer. With the direction-dependent weighting, a value of a spectral bin is, for example, associated with one or more directions of the plurality of predetermined directions. This direction-dependent weighting is, for example, based on the idea that each spectral bin of the spectral-domain representations of the two or more input audio signals contributes to the loudness information at one or more different directions of the plurality of predetermined directions. Each spectral bin contributes, for example, primarily to one direction and only in a small amount to neighboring directions, whereby it is advantageous to weight a value of a spectral bin differently for different directions.
According to an embodiment, the audio analyzer is configured to determine a direction-dependent weighting using a Gaussian function, such that the direction-dependent weighting decreases with increasing deviation between respective extracted direction values (e.g., associated with the time-frequency bin under consideration) and respective predetermined direction values. The respective extracted direction values can represent directions of audio components in the two or more input audio signals. An interval for the respective extracted direction values can lie between a direction totally to the left and a direction totally to the right, wherein the directions left and right are with respect to a user perceiving the two or more input audio signals (e.g., facing the loudspeakers). According to an embodiment, the audio analyzer can determine each extracted direction value as a predetermined direction value or equidistant direction values as predetermined direction values. Thus, for example, one or more spectral bins corresponding to an extracted direction are weighted at predetermined directions neighboring this extracted direction according to the Gaussian function less strongly than at the predetermined direction corresponding to the extracted direction value. The greater the distance of a predetermined direction is to an extracted direction, the more the weighting of the spectral bins or of spectral bands decreases, such that, for example, a spectral bin has little or no influence on a loudness perception at a location far away from the corresponding extracted direction.
According to an embodiment, the audio analyzer is configured to determine panning index values as the extracted direction values. The panning index values will, for example, uniquely indicate a direction of time-frequency components (i.e. the spectral bins) of sources in a stereo mix created by the two or more input audio signals.
According to an embodiment, the audio analyzer is configured to determine the extracted direction values in dependence on spectral-domain values of the input audio signals (e.g., values of the spectral-domain representations of the input audio signals). The extracted direction values are, for example, determined on the basis of an evaluation of an amplitude panning of signal components (e.g., in time-frequency bins) between the input audio signals, or on the basis of a relationship between amplitudes of corresponding spectral-domain values of the input audio signals. According to an embodiment, the extracted direction values define a similarity measure between the spectral-domain values of the input audio signals.
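One common way to derive such extracted direction values from the amplitude relationship of two channels is a panning index built from a cross-channel similarity measure. The following is a sketch of one such definition (the exact formula and the sign convention are illustrative assumptions, not prescribed by the text):

```python
import numpy as np

def panning_index(XL, XR, eps=1e-12):
    """Extract direction values Psi(m, k) in [-1, 1] from the amplitude
    relationship of corresponding spectral-domain values of two input
    audio signals (a sketch of one common panning-index definition;
    -1 = fully left, 0 = center, +1 = fully right)."""
    cross = np.abs(XL * np.conj(XR))
    # Similarity measure: 1 for identical amplitudes, 0 for one-sided signals.
    sim = 2.0 * cross / (np.abs(XL) ** 2 + np.abs(XR) ** 2 + eps)
    side = np.sign(np.abs(XR) - np.abs(XL))  # negative when left is louder
    return (1.0 - sim) * side
```

A component present only in the left channel yields -1, a component with equal amplitude in both channels yields 0 (center), and a component present only in the right channel yields +1.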
According to an embodiment, the audio analyzer is configured to obtain the direction-dependent weighting G_{Ψ0,j}(m, k) associated with a predetermined direction (e.g., represented by index Ψ0,j), a time (or time frame) designated with a time index m, and a spectral bin designated by a spectral bin index k according to

G_{Ψ0,j}(m, k) = exp(−(Ψ(m, k) − Ψ0,j)² / (2ξ²)),

wherein ξ is a predetermined value (which controls, for example, a width of a Gaussian window), Ψ(m, k) designates the extracted direction value associated with a time (or time frame) designated with a time index m and a spectral bin designated by a spectral bin index k, and Ψ0,j is a direction value which designates (or is associated with) a predetermined direction (e.g., having direction index j). The direction-dependent weighting is based on the idea that spectral values or spectral bins or spectral bands with an extracted direction value (e.g., a panning index) equaling Ψ0,j (e.g., equaling the predetermined direction) pass the direction-dependent weighting unmodified and spectral values or spectral bins or spectral bands with an extracted direction value (e.g., a panning index) deviating from Ψ0,j are attenuated. According to an embodiment, spectral values or spectral bins or spectral bands with an extracted direction value near Ψ0,j are weighted and passed and the rest of the values are rejected (e.g., not processed further).
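The Gaussian direction-dependent weighting described above can be sketched as follows (the width value ξ = 0.1 is an illustrative assumption):

```python
import numpy as np

def direction_dependent_weight(psi, psi0, xi=0.1):
    """Gaussian direction-dependent weighting: bins whose extracted
    direction psi equals the predetermined direction psi0 pass with
    weight 1; the weight decays with increasing deviation of psi from
    psi0. The width parameter xi = 0.1 is an illustrative choice."""
    return np.exp(-((psi - psi0) ** 2) / (2.0 * xi ** 2))
```

A spectral bin whose extracted direction matches the predetermined direction passes unmodified (weight 1), while bins with deviating directions receive progressively smaller weights and are effectively rejected far away from Ψ0,j.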
According to an embodiment, the audio analyzer is configured to apply the direction-dependent weighting to the one or more spectral-domain representations of the two or more input audio signals, in order to obtain the weighted spectral-domain representations (e.g., "directional signals"). Thus, the weighted spectral-domain representations comprise, for example, spectral bins (i.e. time-frequency components) of the one or more spectral-domain representations of the two or more input audio signals that correspond to one or more predetermined directions within, for example, a tolerance value (e.g., also spectral bins associated with different predetermined directions neighboring a selected predetermined direction). According to an embodiment, for each predetermined direction a weighted spectral-domain representation can be realized by the direction-dependent weighting (e.g., the weighted spectral-domain representation can comprise direction-dependent weighted spectral values, spectral bins or spectral bands associated with the predetermined direction and/or associated with a direction in a vicinity of the predetermined direction over time). Alternatively, for each spectral-domain representation (e.g., of the two or more input audio signals) one weighted spectral-domain representation is obtained, which represents, for example, the corresponding spectral-domain representation weighted for all predetermined directions.
According to an embodiment, the audio analyzer is configured to obtain the weighted spectral-domain representations, such that signal components having associated a first predetermined direction (e.g., a first panning direction) are emphasized over signal components having associated other directions (which are different from the first predetermined direction and which are, for example, attenuated according to the Gaussian function) in a first weighted spectral-domain representation and such that signal components having associated a second predetermined direction (which is different from the first predetermined direction)(e.g., a second panning direction) are emphasized over signal components having associated other directions (which are different from the second predetermined direction, and which are, for example, attenuated according to the Gaussian function) in a second weighted spectral-domain representation. Thus, for example, for each predetermined direction, a weighted spectral-domain representation for each signal of the two or more input audio signals can be determined.
According to an embodiment, the audio analyzer is configured to obtain the weighted spectral-domain representations X̃_{i,b,Ψ0,j}(m, k) associated with an input audio signal or combination of input audio signals designated by index i, a spectral band designated by index b, a direction designated by index Ψ0,j, a time (or time frame) designated with a time index m, and a spectral bin designated by a spectral bin index k according to

X̃_{i,b,Ψ0,j}(m, k) = X_{i,b}(m, k) · G_{Ψ0,j}(m, k),

wherein X_{i,b}(m, k) designates a spectral-domain representation associated with an input audio signal or combination of input audio signals designated by index i (e.g., i=L or i=R or i=DM; wherein L=left, R=right and DM=downmix), a spectral band designated by index b, a time (or time frame) designated with a time index m, and a spectral bin designated by a spectral bin index k, and G_{Ψ0,j}(m, k) designates the direction-dependent weighting (e.g., a weighting function like a Gaussian function) associated with a direction designated by index Ψ0,j, a time (or time frame) designated with a time index m, and a spectral bin designated by a spectral bin index k. Thus, the weighted spectral-domain representations can be determined, for example, by weighting the spectral-domain representation associated with an input audio signal or a combination of input audio signals with the direction-dependent weighting.
According to an embodiment, the audio analyzer is configured to determine an average over a plurality of band loudness values (e.g., associated with different frequency bands but the same direction, e.g. associated with a predetermined direction and/or directions in a vicinity of the predetermined direction), in order to obtain a combined loudness value (e.g., associated with a given direction or panning direction, i.e. the predetermined direction). The combined loudness value can represent the loudness information obtained by the audio analyzer as the analysis result. Alternatively, the loudness information obtained by the audio analyzer as the analysis result can comprise the combined loudness value. Thus, the loudness information can comprise combined loudness values associated with different predetermined directions, out of which a directional loudness map can be obtained.
According to an embodiment, the audio analyzer is configured to obtain band loudness values for a plurality of spectral bands (for example, ERB-bands) on the basis of a weighted combined spectral-domain representation representing a plurality of input audio signals (e.g., a combination of the two or more input audio signals)(e.g., wherein the weighted combined spectral representation may combine the weighted spectral-domain representations associated with the input audio signals). Additionally the audio analyzer is configured to obtain, as the analysis result, a plurality of combined loudness values (covering a plurality of spectral bands; for example, in the form of a single scalar value) on the basis of the obtained band loudness values for a plurality of different directions (or panning directions). Thus, for example, the audio analyzer is configured to average over all band loudness values associated with the same direction to obtain a combined loudness value associated with this direction (e.g., resulting in a plurality of combined loudness values). The audio analyzer is, for example, configured to obtain for each predetermined direction a combined loudness value.
According to an embodiment, the audio analyzer is configured to compute a mean of squared spectral values of the weighted combined spectral-domain representation over spectral values of a frequency band (or over spectral bins of a frequency band), and to apply an exponentiation having an exponent between 0 and 1/2 (and preferably smaller than or equal to 1/3 or 1/4) to the mean of squared spectral values, in order to determine the band loudness values (associated with a respective frequency band).
According to an embodiment, the audio analyzer is configured to obtain the band loudness values L_b(m, Ψ0,j) associated with a spectral band designated with index b, a direction designated with index Ψ0,j, and a time (or time frame) designated with a time index m according to

L_b(m, Ψ0,j) = ((1/K_b) · Σ_{k∈b} |X̃_{DM,b,Ψ0,j}(m, k)|²)^(1/4).

The factor K_b designates a number of spectral bins in a frequency band having frequency band index b. The variable k is a running variable and designates spectral bins in the frequency band having frequency band index b, wherein b designates a spectral band, and X̃_{DM,b,Ψ0,j}(m, k) designates a weighted combined spectral-domain representation associated with a spectral band designated with index b, a direction designated by index Ψ0,j, a time (or time frame) designated with a time index m and a spectral bin designated by a spectral bin index k.
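The band loudness computation described above, i.e., the mean of squared magnitudes per band followed by a compressive exponent, can be sketched as follows (the exponent 1/4 is taken from the preferred values mentioned in the text; band edges are illustrative):

```python
import numpy as np

def band_loudness(X_dir, band_edges):
    """Band loudness values for one time frame of a weighted combined
    spectral-domain representation X_dir: mean of the squared magnitudes
    over the K_b bins of each band b, raised to an exponent between 0
    and 1/2 (1/4 here, following the preferred values in the text)."""
    values = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mean_sq = np.mean(np.abs(X_dir[lo:hi]) ** 2)  # (1/K_b) * sum |X|^2
        values.append(mean_sq ** 0.25)
    return np.array(values)
```

Doubling the spectral magnitudes, for instance, raises each band loudness value only by a factor of sqrt(2), reflecting the compressive nature of loudness perception.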
According to an embodiment, the audio analyzer is configured to obtain a plurality of combined loudness values L(m, Ψ0,j) associated with a direction designated with index Ψ0,j and a time (or time frame) designated with a time index m according to

L(m, Ψ0,j) = (1/B) · Σ_{b=1}^{B} L_b(m, Ψ0,j).

The factor B designates a total number of spectral bands b, and L_b(m, Ψ0,j) designates band loudness values associated with a spectral band designated with index b, a direction designated with index Ψ0,j and a time (or time frame) designated with a time index m.
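Putting the above steps together, a directional loudness map over time frames m and predetermined directions Ψ0,j can be sketched as follows. The panning-index formula, the Gaussian width, the use of a simple L/R average as downmix, and the band edges are all illustrative assumptions:

```python
import numpy as np

def directional_loudness_map(XL, XR, band_edges, psi0_grid, xi=0.1, eps=1e-12):
    """Sketch of a directional loudness map: one combined loudness value
    per time frame m and predetermined direction Psi0_j, obtained by
    averaging band loudness values of the direction-weighted downmix."""
    cross = np.abs(XL * np.conj(XR))
    sim = 2.0 * cross / (np.abs(XL) ** 2 + np.abs(XR) ** 2 + eps)
    psi = (1.0 - sim) * np.sign(np.abs(XR) - np.abs(XL))  # extracted directions
    X_dm = 0.5 * (XL + XR)                                # combined representation
    n_frames = XL.shape[0]
    L = np.zeros((n_frames, len(psi0_grid)))
    for j, psi0 in enumerate(psi0_grid):
        g = np.exp(-((psi - psi0) ** 2) / (2.0 * xi ** 2))  # direction weighting
        Xd = X_dm * g                                       # directional signal
        for m in range(n_frames):
            bands = [np.mean(np.abs(Xd[m, lo:hi]) ** 2) ** 0.25
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])]
            L[m, j] = np.mean(bands)                        # combined loudness
    return L
```

For content panned fully to the left, the resulting map concentrates its loudness at the direction Ψ0 = −1 and is near zero elsewhere.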
According to an embodiment, the audio analyzer is configured to allocate loudness contributions to histogram bins associated with different directions (e.g., second different directions, as described above; e.g., predetermined directions) in dependence on the directional information, in order to obtain the analysis result. The loudness contributions are, for example, represented by the plurality of combined loudness values or by the plurality of band loudness values. Thus, for example, the analysis result comprises a directional loudness map, defined by the histogram bins. Each histogram bin is, for example, associated with one of the predetermined directions.
According to an embodiment, the audio analyzer is configured to obtain loudness information associated with spectral bins on the basis of the spectral-domain representations (e.g., to obtain a combined loudness per T/F tile). The audio analyzer is configured to add a loudness contribution to one or more histogram bins on the basis of a loudness information associated with a given spectral bin. A loudness contribution associated with a given spectral bin is, for example, added to different histogram bins with a different weighting (e.g., depending on the direction corresponding to the histogram bin). A selection, to which one or more histogram bins the loudness contribution is made (i.e. is added), is based on a determination of the directional information (i.e. of the extracted direction value) for a given spectral bin. According to an embodiment, each histogram bin can represent a time-direction tile. Thus, a histogram bin is, for example, associated with a loudness of the combined two or more input audio signals at a certain time frame and direction. For the determination of the directional information for a given spectral bin, for example, level information for corresponding spectral bins of the spectral-domain representations of the two or more input audio signals is analyzed.
According to an embodiment, the audio analyzer is configured to add loudness contributions to a plurality of histogram bins on the basis of a loudness information associated with a given spectral bin, such that a largest contribution (e.g., main contribution) is added to a histogram bin associated with a direction that corresponds to the directional information associated with the given spectral bin (i.e. of the extracted direction value), and such that reduced contributions (e.g., comparatively smaller than the largest contribution or main contribution) are added to one or more histogram bins associated with further directions (e.g., in a neighborhood of the direction that corresponds to the directional information associated with the given spectral bin). As described above, each histogram bin can represent a time-direction tile. According to an embodiment, a plurality of histogram bins can define a directional loudness map, wherein the directional loudness map defines, for example, loudness for different directions over time for a combination of the two or more input audio signals.
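The histogram-based allocation described above, with the largest contribution going to the matching direction bin and reduced contributions to neighboring bins, can be sketched as follows (the Gaussian spreading and its width are illustrative assumptions):

```python
import numpy as np

def allocate_to_direction_histogram(bin_loudness, psi, psi0_grid, xi=0.1):
    """Allocate per-spectral-bin loudness contributions to histogram bins
    associated with predetermined directions: the largest share goes to
    the histogram bin matching the extracted direction of the spectral
    bin, and reduced shares go to neighboring direction bins (Gaussian
    spreading; the width xi is an illustrative choice)."""
    hist = np.zeros(len(psi0_grid))
    for n, psi0 in enumerate(psi0_grid):
        weights = np.exp(-((psi - psi0) ** 2) / (2.0 * xi ** 2))
        hist[n] += np.sum(bin_loudness * weights)
    return hist

# One spectral bin with loudness 1.0 and extracted direction 0 (center):
hist = allocate_to_direction_histogram(np.array([1.0]), np.array([0.0]),
                                       np.linspace(-1.0, 1.0, 11))
```

In this example the center histogram bin receives the main contribution, while the bins for neighboring directions receive progressively smaller shares.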
According to an embodiment, the audio analyzer is configured to obtain directional information on the basis of an audio content of the two or more input audio signals. The directional information comprises, for example, directions of components or sources in the audio content of the two or more input audio signals. In other words, the directional information can comprise panning directions or panning indices of sources in the stereo mix of the two or more input audio signals.
According to an embodiment, the audio analyzer is configured to obtain directional information on the basis of an analysis of an amplitude panning of audio content. Additionally or alternatively, the audio analyzer is configured to obtain directional information on the basis of an analysis of a phase relationship and/or a time delay and/or correlation between audio contents of two or more input audio signals. Additionally or alternatively, the audio analyzer is configured to obtain directional information on the basis of an identification of widened (e.g., decorrelated and/or panned) sources. The analysis of the amplitude panning of the audio content can comprise an analysis of a level correlation between corresponding spectral bins of the spectral-domain representations of the two or more input audio signals (e.g., corresponding spectral bins with the same level can be associated with a direction in a middle of two loudspeakers, each transmitting one of two input audio signals). Similarly, the analysis of the phase relationship and/or the time delay and/or the correlation between audio contents can be performed. Thus, for example, the phase relationship and/or the time delay and/or the correlation between audio contents is analyzed for corresponding spectral bins of the spectral-domain representations of the two or more input audio signals. Additionally or alternatively, aside from inter-channel level/time difference comparisons, there is a further (e.g., third) method for directional information estimation. This method consists in matching the spectral information of an incoming sound to pre-measured "template spectral responses/filters" of Head Related Transfer Functions (HRTFs) in different directions.
For example: at a certain time/frequency tile, the spectral envelope of the signal arriving at 35 degrees, as captured in the left and right channels, might closely match the shape of the linear filters for the left and right ears measured at an angle of 35 degrees. Then, an optimization algorithm or pattern-matching procedure assigns 35° as the direction of arrival of the sound. More information can be found here: https://iem.kug.ac.at/fileadmin/media/iem/projects/2011/baumgartner_robert.pdf (see, for example, Chapter 2). This method has the advantage of allowing estimation of the incoming direction of elevated sound sources (sagittal plane) in addition to sources in the horizontal plane. This method is based, for example, on spectral level comparisons.
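As an illustrative (non-claimed) sketch, the inter-channel level comparison underlying such directional information estimation can be written as a normalized level difference per spectral bin; the function name and the exact formula below are assumptions chosen purely for illustration:

```python
import numpy as np

def panning_index(stft_left, stft_right, eps=1e-12):
    """Per-bin directional information from a level comparison of two channels.

    Returns values in [-1, 1]: -1 = panned fully left, 0 = center (equal
    level in both channels), +1 = panned fully right.  This is one possible
    level-based estimate; phase or time-delay comparisons could be used
    additionally or alternatively, as described above.
    """
    left_power = np.abs(stft_left) ** 2
    right_power = np.abs(stft_right) ** 2
    return (right_power - left_power) / (left_power + right_power + eps)
```

A bin with equal level in both channels maps to 0, i.e., the direction in the middle between the two loudspeakers.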
According to an embodiment, the audio analyzer is configured to spread loudness information to a plurality of directions (e.g., beyond a direction indicated by the directional information) according to a spreading rule (for example, a Gaussian spreading rule, or a limited, discrete spreading rule). This means, for example, that a loudness information corresponding to a certain spectral bin, associated with a certain directional information, can also contribute to neighboring directions (of the certain direction of the spectral bin) according to the spreading rule. According to an embodiment, the spreading rule can comprise or correspond to a direction-dependent weighting, wherein the direction-dependent weighting in this case, for example, defines differently weighted contributions of the loudness information of a certain spectral bin to the plurality of directions.
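A Gaussian spreading rule of this kind could, purely as an illustrative sketch (the function name and the choice of a Gaussian window are assumptions), look as follows:

```python
import numpy as np

def spread_loudness(loudness, direction, direction_grid, sigma=0.1):
    """Distribute one spectral bin's loudness over a grid of directions
    according to a Gaussian spreading rule centered on the bin's estimated
    direction, so that neighboring directions also receive contributions."""
    weights = np.exp(-0.5 * ((direction_grid - direction) / sigma) ** 2)
    weights /= weights.sum()  # normalize: the total loudness is preserved
    return loudness * weights
```

The direction-dependent weighting mentioned above corresponds to the `weights` vector; a limited, discrete spreading rule would simply truncate it to a few neighboring directions.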
An embodiment according to this invention is related to an audio similarity evaluator, which is configured to obtain a first loudness information (e.g., a directional loudness map; e.g., one or more combined loudness values) associated with different (e.g., panning) directions on the basis of a first set of two or more input audio signals. The audio similarity evaluator is configured to compare the first loudness information with a second (e.g., corresponding) loudness information (e.g., reference loudness information, reference directional loudness map and/or reference combined loudness value) associated with the different (e.g., panning) directions and with a set of two or more reference audio signals, in order to obtain a similarity information (e.g., a "Model Output Variable" (MOV); for example, a single scalar value) describing a similarity between the first set of two or more input audio signals and the set of two or more reference audio signals (or representing, for example, a quality of the first set of two or more input audio signals when compared to the set of two or more reference audio signals).
This embodiment is based on the idea that it is efficient, and improves the accuracy of an audio quality indication (e.g., the similarity information), to compare directional loudness information (e.g., the first loudness information) of two or more input audio signals with a directional loudness information (e.g., the second loudness information) of two or more reference audio signals. The usage of loudness information associated with different directions is especially advantageous with regard to stereo mixes or multichannel mixes, because the different directions can be associated, for example, with directions (i.e., panning directions, panning indices) of sources (i.e., audio components) in the mixes. Thus, the quality degradation of a processed combination of the two or more input audio signals can be measured effectively. Another advantage is that non-waveform-preserving audio processing such as bandwidth extension (BWE) influences the similarity information only minimally or not at all, since the loudness information for the stereo image or multichannel image is, for example, determined in a Short-Time Fourier Transform (STFT) domain. Moreover, the similarity information based on loudness information can easily be complemented with monaural/timbral similarity information to improve a perceptual prediction for the two or more input audio signals. Thus, for example, only one similarity measure in addition to monaural quality descriptors is used, which can reduce the number of independent and relevant signal features used by an objective audio quality measurement system compared to known systems using only monaural quality descriptors. Using fewer features for the same performance reduces the risk of over-fitting and indicates their higher perceptual relevance.
According to an embodiment, the audio similarity evaluator is configured to obtain the first loudness information (e.g., a directional loudness map) such that the first loudness information (for example, a vector comprising combined loudness values for a plurality of predetermined directions) comprises a plurality of combined loudness values associated with the first set of two or more input audio signals and associated with respective predetermined directions, wherein the combined loudness values of the first loudness information describe loudness of signal components of the first set of two or more input audio signals associated with the respective predetermined directions (wherein, for example, each combined loudness value is associated with a different direction). Thus, for example, each combined loudness value can be represented by a vector defining, for example, a change of loudness over time for a certain direction. This means, for example, that one combined loudness value can comprise one or more loudness values associated with consecutive time frames. The predetermined directions can be represented by panning directions/panning indices of the signal components of the first set of two or more input audio signals. Thus, for example, the predetermined directions can be predefined by amplitude panning techniques used for a positioning of directional signals in a stereo or multichannel mix represented by the first set of two or more input audio signals.
According to an embodiment, the audio similarity evaluator is configured to obtain the first loudness information (e.g., directional loudness map) such that the first loudness information is associated with combinations of a plurality of weighted spectral-domain representations (e.g., of each audio signal) of the first set of two or more input audio signals associated with respective predetermined directions (e.g., each combined loudness value and/or weighted spectral-domain representation is associated with a different
predetermined direction). This means, for example, that for each input audio signal at least one weighted spectral-domain representation is calculated and that then all the weighted spectral-domain representations associated with the same predetermined direction are combined. Thus, the first loudness information represents, for example, loudness values associated with multiple spectral bins associated with the same predetermined direction. At least some of the multiple spectral bins are, for example, weighted differently than other bins of the multiple spectral bins.
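The combination of weighted spectral-domain representations into per-direction loudness values might be sketched as follows (the simple loudness-like compression, the Gaussian direction-selection window and all names are illustrative assumptions, not the claimed implementation):

```python
import numpy as np

def directional_loudness_map(stfts, pan, directions, sigma=0.1):
    """Accumulate a directional loudness map of shape (directions, frames).

    stfts:      list of complex STFTs, each of shape (bins, frames)
    pan:        per-bin direction estimate, shape (bins, frames)
    directions: 1-D grid of predetermined directions
    """
    level = sum(np.abs(s) ** 2 for s in stfts)   # combined level per bin
    loudness = level ** 0.25                     # crude loudness-like compression
    dlm = np.zeros((len(directions), pan.shape[1]))
    for d_idx, d in enumerate(directions):
        # direction-selection window: weights bins near direction d strongly,
        # bins associated with other directions weakly
        w = np.exp(-0.5 * ((pan - d) / sigma) ** 2)
        dlm[d_idx] = (w * loudness).sum(axis=0)  # combine over frequency bins
    return dlm
```

Multiple spectral bins associated with the same predetermined direction are thus combined, with bins weighted differently depending on how close their estimated direction is to that predetermined direction.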
According to an embodiment, the audio similarity evaluator is configured to determine a difference between the second loudness information and the first loudness information to obtain a residual loudness information. According to an embodiment, the residual loudness information can represent the similarity information, or the similarity information can be determined based on the residual loudness information. The residual loudness information is, for example, understood as a distance measure between the second loudness information and the first loudness information. Thus, the residual loudness information can be understood as a directional loudness distance (e.g., DirLoudDist). With this feature, a quality of the two or more input audio signals associated with the first loudness information can be determined very efficiently.
According to an embodiment, the audio similarity evaluator is configured to determine a value (e.g., a single scalar value) that quantifies the difference over a plurality of directions (and optionally also over time, for example, over a plurality of frames). The audio similarity evaluator is, for example, configured to determine an average of a magnitude of the residual loudness information over all directions (e.g. panning directions) and over time as the value that quantifies the difference. Thereby a single number termed Model Output Variable (MOV) is, for example, determined, wherein the MOV defines a similarity of the first set of two or more input audio signals with respect to the set of two or more reference audio signals.
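Such a Model Output Variable could be computed, as a minimal sketch (averaging absolute residuals is one plausible reading of the averaging described above), as:

```python
import numpy as np

def directional_loudness_distance(dlm_ref, dlm_test):
    """Single scalar MOV: the average magnitude of the residual directional
    loudness map, taken over all directions and all time frames."""
    return float(np.mean(np.abs(dlm_ref - dlm_test)))
```

Identical maps yield 0; larger values indicate a larger deviation of the test signals from the reference signals.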
According to an embodiment, the audio similarity evaluator is configured to obtain the first loudness information and/or the second loudness information (e.g. as directional loudness maps) using an audio analyzer according to one of the embodiments described herein.
According to an embodiment, the audio similarity evaluator is configured to obtain a direction component (e.g., direction information) used for obtaining the loudness information associated with different directions (e.g., one or more directional loudness maps) using
metadata representing position information of loudspeakers associated with the input audio signals. The different directions are not necessarily associated with the direction component. According to an embodiment, the direction component is associated with the two or more input audio signals. Thus, the direction component can represent a loudspeaker identifier or a channel identifier dedicated, for example, to different directions or positions of a loudspeaker. By contrast, the different directions, with which the loudness information is associated, can represent directions or positions of audio components in an audio scene realized by the two or more input audio signals. Alternatively, the different directions can represent equally spaced directions or positions in a position interval (e.g., [-1; 1], wherein -1 represents signals panned fully to the left and +1 represents signals panned fully to the right) in which the audio scene realized by the two or more input audio signals can unfold. According to an embodiment, the different directions can be associated with the herein described predetermined directions. The direction component is, for example, associated with boundary points of the position interval.
An embodiment according to this invention is related to an audio encoder for encoding an input audio content comprising one or more input audio signals (preferably a plurality of input audio signals). The audio encoder is configured to provide one or more encoded (e.g., quantized and then losslessly encoded) audio signals (e.g., encoded spectral-domain representations) on the basis of one or more input audio signals (e.g., left signal and right signal), or one or more signals derived therefrom (e.g., mid signal or downmix signal and side-signal or difference signal). Additionally the audio encoder is configured to adapt encoding parameters (e.g., for the provision of the one or more encoded audio signals; e.g., quantization parameters) in dependence on one or more directional loudness maps which represent loudness information associated with a plurality of different directions (e.g., panning directions) of the one or more signals to be encoded (e.g., in dependence on contributions of individual directional loudness maps of the one or more signals to be quantized to an overall directional loudness map, e.g., associated with multiple input audio signals (e.g., with each signal of the one or more input audio signals)).
Audio content comprising one input audio signal can be associated with a monaural audio scene, an audio content comprising two input audio signals can be associated with a stereo audio scene and an audio content comprising three or more input audio signals can be associated with a multichannel audio scene. According to an embodiment, the audio encoder provides for each input audio signal a separate encoded audio signal as output
signal or provides one combined output signal comprising two or more encoded audio signals of two or more input audio signals.
The directional loudness maps (i.e., DirLoudMap), on which the adaptation of the encoding parameters depends, can vary for different audio content. Thus, for a monaural audio scene, the directional loudness map comprises, for example, loudness values deviating from zero only for one direction (based on the single input audio signal) and loudness values equal to zero for all other directions. For a stereo audio scene the directional loudness map represents, for example, loudness information associated with both input audio signals, wherein the different directions are, for example, associated with positions or directions of audio components of the two input audio signals. In the case of three or more input audio signals the adaptation of the encoding parameters depends, for example, on three or more directional loudness maps, wherein each directional loudness map corresponds to loudness information associated with two of the three input audio signals (e.g., a first DirLoudMap can correspond to a first and a second input audio signal; a second DirLoudMap can correspond to the first and a third input audio signal; and a third DirLoudMap can correspond to the second and the third input audio signal). As described with regard to the stereo audio scene, the different directions for the directional loudness maps are, in the case of a multichannel audio scene, for example, associated with positions or directions of audio components of the multiple input audio signals.
The embodiments of this audio encoder are based on the idea that it is efficient, and improves the accuracy of the encoding, to make the adaptation of encoding parameters dependent on one or more directional loudness maps. The encoding parameters are, for example, adapted in dependence on a difference between the directional loudness map associated with the one or more input audio signals and a directional loudness map associated with one or more reference audio signals. According to an embodiment, overall directional loudness maps, of a combination of all input audio signals and of a combination of all reference audio signals, are compared, or alternatively directional loudness maps of individual or paired signals are compared to an overall directional loudness map of all input audio signals (e.g., more than one difference can be determined). The difference between the DirLoudMaps can represent a quality measure for the encoding. Thus the encoding parameters are, for example, adapted such that the difference is minimized, to ensure a high-quality encoding of the audio content, or the encoding parameters are adapted such that only signals of the audio content corresponding to a difference under a certain threshold are encoded, to reduce a complexity of the encoding. Alternatively the encoding parameters are, for example, adapted in dependence on a ratio (e.g., contributions) of the DirLoudMaps of individual signals or of signal pairs to an overall DirLoudMap (e.g., a DirLoudMap associated with a combination of all input audio signals). This ratio can, similarly to the difference, indicate a similarity between individual signals or signal pairs of the audio content and a combination of all signals of the audio content, resulting in a high-quality encoding and/or a reduction of a complexity of the encoding.
According to an embodiment, the audio encoder is configured to adapt a bit distribution between the one or more signals and/or parameters to be encoded (or, for example, between two or more signals and/or parameters to be encoded) (e.g., between a residual signal and a downmix signal, or between a left channel signal and a right channel signal, or between two or more signals provided by a joint encoding of multiple signals, or between a signal and parameters provided by a joint encoding of multiple signals) in dependence on contributions of individual directional loudness maps of the one or more signals and/or parameters to be encoded to an overall directional loudness map. The adaptation of the bit distribution is, for example, understood as an adaptation of the encoding parameters by the audio encoder. The bit distribution can also be understood as a bitrate distribution. The bit distribution is, for example, adapted by controlling a quantization precision of the one or more input audio signals of the audio encoder. According to an embodiment, a high contribution can indicate a high relevance of the corresponding input audio signal or pair of input audio signals for a high-quality perception of an audio scene created by the audio content. Thus, for example, the audio encoder can be configured to provide many bits for the signals with a high contribution and only a few or no bits for signals with a low contribution. Thus, an efficient and high-quality encoding can be achieved.
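As an illustrative sketch of such a contribution-driven bit distribution (the proportional allocation policy and the function name are assumptions; an actual encoder would combine this with a psychoacoustic model):

```python
import numpy as np

def distribute_bits(contributions, total_bits):
    """Split a bit budget between signals in proportion to the contribution
    of each signal's directional loudness map to the overall map; a higher
    contribution yields more bits, i.e., a higher quantization precision."""
    c = np.asarray(contributions, dtype=float)
    shares = c / c.sum()
    bits = np.floor(shares * total_bits).astype(int)
    bits[np.argmax(shares)] += total_bits - bits.sum()  # hand out the remainder
    return bits
```

For example, three signals with contributions 1 : 1 : 2 would receive 25, 25 and 50 of a 100-bit budget.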
According to an embodiment, the audio encoder is configured to disable encoding of a given one of the signals to be encoded (e.g., of a residual signal) when the contribution of an individual directional loudness map of the given one of the signals to be encoded (e.g., of the residual signal) to an overall directional loudness map is below a (e.g., predetermined) threshold. The encoding is, e.g., disabled if an average ratio, or a ratio in a direction of maximum relative contribution, is below the threshold. Alternatively or additionally, contributions of directional loudness maps of signal pairs (e.g., individual directional loudness maps of signal pairs, where a signal pair is understood as a combination of two signals, e.g., of signals associated with different channels and/or residual signals and/or downmix signals) to the overall directional loudness map can be used by the encoder to disable the encoding of the given one of the signals (e.g., for three signals to be encoded: as described above, three directional loudness maps of signal pairs can be analyzed with respect to the overall directional loudness map; thus the encoder can be configured to determine the signal pair with the highest contribution to the overall directional loudness map, to encode only these two signals and to disable the encoding for the remaining signal). The disabling of an encoding of a signal is, for example, understood as an adaptation of encoding parameters. Thus, signals that are not highly relevant for a perception of the audio content by a listener do not need to be encoded, which results in a very efficient encoding. According to an embodiment, the threshold can be set to smaller than or equal to 5%, 10%, 15%, 20% or 50% of the loudness information of the overall directional loudness map.
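The threshold-based disabling described above might be sketched as follows (function name, data layout and the summed-loudness contribution measure are assumptions for illustration):

```python
import numpy as np

def signals_to_encode(dlm_per_signal, dlm_overall, threshold=0.10):
    """Keep only signals whose individual directional loudness map contributes
    at least `threshold` (e.g., 10%) of the overall directional loudness map;
    encoding is disabled for the remaining signals."""
    total = np.sum(dlm_overall)
    return [i for i, dlm in enumerate(dlm_per_signal)
            if np.sum(dlm) / total >= threshold]
```

The same selection could be run on pairwise maps instead of individual maps, as described above for three signals.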
According to an embodiment, the audio encoder is configured to adapt a quantization precision of the one or more signals to be encoded (e.g., between a residual signal and a downmix signal) in dependence on contributions of individual directional loudness maps of the (respective) one or more signals to be encoded to an overall directional loudness map. Alternatively or additionally, similarly to the above described disabling, contributions of directional loudness maps of signal pairs to the overall directional loudness map can be used by the encoder to adapt a quantization precision of the one or more signals to be encoded. The adaptation of the quantization precision can be understood as an example for adapting the encoding parameters by the audio encoder.
According to an embodiment, the audio encoder is configured to quantize spectral-domain representations of the one or more input audio signals (e.g., left signal and right signal; the one or more input audio signals correspond, for example, to a plurality of different channels, such that the audio encoder receives a multichannel input), or of the one or more signals derived therefrom (e.g., mid signal or downmix signal and side-signal or difference signal) using one or more quantization parameters (e.g., scale factors or parameters describing which quantization accuracies or quantization steps should be applied to which spectral bins or frequency bands of the one or more signals to be quantized) (wherein the quantization parameters describe, for example, an allocation of bits to different signals to be quantized and/or to different frequency bands), to obtain one or more quantized spectral-domain representations. The audio encoder is configured to adjust the one or more quantization parameters (e.g., in order to adapt a bit distribution between the one or more signals to be encoded) in dependence on one or more directional loudness maps which represent loudness information associated with a plurality of different directions (e.g., panning directions) of the one or more signals to be quantized, to adapt the provision of the one or more encoded audio signals (e.g., in dependence on contributions of individual directional loudness maps of the one or more signals to be quantized to an overall directional loudness map, e.g., associated with multiple input audio signals (e.g., with each signal of the one or more input audio signals)). Additionally the audio encoder is configured to encode the one or more quantized spectral-domain representations, in order to obtain the one or more encoded audio signals.
According to an embodiment, the audio encoder is configured to adjust the one or more quantization parameters in dependence on contributions of individual directional loudness maps of the one or more signals to be quantized to an overall directional loudness map.
According to an embodiment, the audio encoder is configured to determine an overall directional loudness map on the basis of the input audio signals, such that the overall directional loudness map represents loudness information associated with the different directions (e.g., of audio components; e.g., panning directions) of an audio scene represented (or to be represented, e.g., after a decoder-sided rendering) by the input audio signals (possibly in combination with knowledge or side information regarding positions of loudspeakers and/or knowledge or side information describing positions of audio objects). The overall directional loudness map represents, e.g., loudness information associated with (e.g. a combination of) all input audio signals.
According to an embodiment, the one or more signals to be quantized are associated (e.g., in a fixed, non-signal-dependent manner) with different directions (e.g., first different directions) or are associated with different loudspeakers (e.g., at different predefined loudspeaker positions) or are associated with different audio objects (e.g., with audio objects to be rendered at different positions, for example, in accordance with an object rendering information; e.g. a panning index).
According to an embodiment, the signals to be quantized comprise components (for example, a mid-signal and a side-signal of a mid-side stereo coding) of a joint multi-signal coding of two or more input audio signals.
According to an embodiment, the audio encoder is configured to estimate a contribution of a residual signal of the joint multi-signal coding to the overall directional loudness map, and to adjust the one or more quantization parameters in dependence thereon. The estimated
contribution is, for example, represented by a contribution of a directional loudness map of the residual signal to the overall directional loudness map.
According to an embodiment, the audio encoder is configured to adapt a bit distribution between the one or more signals and/or parameters to be encoded individually for different spectral bins or individually for different frequency bands. Additionally or alternatively the audio encoder is configured to adapt a quantization precision of the one or more signals to be encoded individually for different spectral bins or individually for different frequency bands. With the adaptation of the quantization precision, the audio encoder is, for example, configured to also adapt the bit distribution. Thus, the audio encoder is, for example, configured to adapt the bit distribution between the one or more input audio signals of the audio content to be encoded by the audio encoder. Additionally or alternatively, the bit distribution between parameters to be encoded is adapted. The adaptation of the bit distribution can be performed by the audio encoder individually for different spectral bins or individually for different frequency bands. According to an embodiment, it is also possible that the bit distribution between signals and parameters is adapted. In other words, each signal of the one or more signals to be encoded by the audio encoder can comprise an individual bit distribution for different spectral bins and/or different frequency bands (e.g., of the corresponding signal) and this individual bit distribution for each of the one or more signals to be encoded can be adapted by the audio encoder.
According to an embodiment, the audio encoder is configured to adapt a bit distribution between the one or more signals and/or parameters to be encoded (for example, individually per spectral bin or per frequency band) in dependence on an evaluation of a spatial masking between two or more signals to be encoded. Furthermore the audio encoder is configured to evaluate the spatial masking on the basis of the directional loudness maps associated with the two or more signals to be encoded. This is, for example, based on the idea that the directional loudness maps are spatially and/or temporally resolved. Thus, for example, only few or no bits are spent for masked signals and more bits (e.g., more than for the masked signals) are spent for the encoding of relevant signals or signal components (e.g., signals or signal components not masked by other signals or signal components). According to an embodiment, the spatial masking depends, for example, on a level associated with spectral bins and/or frequency bands of the two or more signals to be encoded, on a spatial distance between the spectral bins and/or frequency bands and/or on a temporal distance between the spectral bins and/or frequency bands. The directional loudness maps can directly provide loudness information for individual
spectral bins and/or frequency bands for individual signals or a combination of signals (e.g., signal pairs), resulting in an efficient analysis of spatial masking by the encoder.
According to an embodiment, the audio encoder is configured to evaluate a masking effect of a loudness contribution associated with a first direction of a first signal to be encoded onto a loudness contribution associated with a second direction (which is different from the first direction) of a second signal to be encoded (wherein, for example, the masking effect decreases with increasing difference between the angles). The masking effect defines, for example, a relevance of the spatial masking. This means, for example, that for loudness contributions associated with a masking effect lower than a threshold, more bits are spent than for signals (e.g., spatially masked signals) associated with a masking effect higher than the threshold. According to an embodiment, the threshold can be defined as 20%, 50%, 60%, 70% or 75% masking of a total masking. This means, for example, that a masking effect of neighboring spectral bins or frequency bands is evaluated depending on the loudness information of directional loudness maps.
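A direction-dependent masking model of this kind could be sketched as follows (the exponential decay and its slope are illustrative assumptions; only the qualitative behavior, i.e., a masking effect decreasing with angular distance, follows from the text above):

```python
import math

def directional_masking_weight(angle_masker, angle_maskee, slope=0.05):
    """Masking effect of a loudness contribution at one direction onto a
    contribution at another direction; the effect is strongest for equal
    directions and decreases as the angular difference grows."""
    return math.exp(-abs(angle_masker - angle_maskee) / slope)
```

An encoder could compare this weight against the threshold mentioned above to decide how many bits to spend on the potentially masked contribution.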
According to an embodiment, the audio encoder comprises an audio analyzer according to one of the herein described embodiments, wherein the loudness information (e.g., "directional loudness map") associated with different directions forms the directional loudness map.
According to an embodiment the audio encoder is configured to adapt a noise introduced by the encoder (e.g., a quantization noise) in dependence on the one or more directional loudness maps. Thus, for example, the one or more directional loudness maps of the one or more signals to be encoded can be compared by the encoder with one or more directional loudness maps of one or more reference signals. Based on this comparison the audio encoder is, for example, configured to evaluate differences indicating the introduced noise. The noise can be adapted by an adaptation of a quantization performed by the audio encoder.
According to an embodiment, the audio encoder is configured to use a deviation between a directional loudness map, which is associated with a given un-encoded input audio signal (or with a given un-encoded input audio signal pair), and a directional loudness map achievable by an encoded version of the given input audio signal (or of the given input audio signal pair), as a criterion (e.g., target criterion) for the adaptation of the provision of the given encoded audio signal (or of the given encoded audio signal pair). The following
examples are only described for one given un-encoded input audio signal, but it is clear that they are also applicable to a given un-encoded input audio signal pair. The directional loudness map associated with the given un-encoded input audio signal can be associated with or can represent a reference directional loudness map. Thus, a deviation between the reference directional loudness map and the directional loudness map of the encoded version of the given input audio signal can indicate noise introduced by the encoder. To reduce the noise, the audio encoder can be configured to adapt encoding parameters to reduce the deviation in order to provide a high-quality encoded audio signal. This is, for example, realized by a feedback loop which checks the deviation in each iteration. Thus the encoding parameters are adapted until the deviation is below a predefined threshold. According to an embodiment, the threshold can be defined as 5%, 10%, 15%, 20% or 25% deviation. Alternatively, the adaptation by the encoder is performed using a neural network (e.g., achieving a feed-forward loop). With the neural network, the directional loudness map for the encoded version of the given input audio signal can be estimated without directly determining it by the audio encoder or the audio analyzer. Thus, a very fast and high-precision audio coding can be realized.
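The feedback loop described above might be sketched as follows (the quantizer interface, the relative deviation measure and the fixed ladder of step sizes are assumptions chosen for illustration):

```python
import numpy as np

def adapt_quantizer(encode, dlm_of, signal, dlm_ref,
                    max_deviation=0.10, steps=(1.0, 0.5, 0.25, 0.125)):
    """Try coarse-to-fine quantization step sizes until the relative deviation
    between the reference DirLoudMap and the DirLoudMap achievable by the
    encoded version of the signal falls below the target (e.g., 10%)."""
    for step in steps:
        decoded = encode(signal, step)
        deviation = (np.abs(dlm_of(decoded) - dlm_ref).mean()
                     / np.abs(dlm_ref).mean())
        if deviation <= max_deviation:
            break  # deviation small enough: keep this quantization precision
    return step, decoded
```

For instance, with a uniform quantizer `encode = lambda x, s: np.round(x / s) * s` and the identity as a stand-in for the DirLoudMap computation, the loop refines the step size until the target deviation is met.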
According to an embodiment, the audio encoder is configured to activate and deactivate a joint coding tool (which, for example, jointly encodes two or more of the input audio signals, or signals derived therefrom) (for example, to make an M/S (mid/side-signal) on/off decision) in dependence on one or more directional loudness maps which represent loudness information associated with a plurality of different directions of the one or more signals to be encoded. To activate or deactivate the joint coding tool, the audio encoder can be configured to determine a contribution of a directional loudness map of each signal or each candidate signal pair to an overall directional loudness map of an overall scene. According to an embodiment, a contribution higher than a threshold (e.g., a contribution of at least 10% or at least 20% or at least 30% or at least 50%) indicates whether a joint coding of input audio signals is reasonable. For example, the threshold may be comparatively low for this use case (e.g., lower than in other use cases), to primarily filter out irrelevant pairs. Based on the directional loudness maps the audio encoder can check whether a joint coding of signals results in a more efficient encoding and/or a high-resolution encoding with few bits.
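The pair-selection logic behind such a joint coding decision can be sketched as follows (function name and the additive combination of per-signal maps into a pair map are assumptions for illustration):

```python
from itertools import combinations

import numpy as np

def joint_coding_candidates(dlm_per_signal, dlm_overall, threshold=0.10):
    """Rank candidate signal pairs by the contribution of their combined
    directional loudness map to the overall map; pairs below the
    (deliberately low) threshold are filtered out as irrelevant."""
    total = np.sum(dlm_overall)
    pairs = []
    for i, j in combinations(range(len(dlm_per_signal)), 2):
        share = float(np.sum(dlm_per_signal[i] + dlm_per_signal[j]) / total)
        if share >= threshold:
            pairs.append(((i, j), share))
    return sorted(pairs, key=lambda p: -p[1])  # highest contribution first
```

The encoder could then activate the joint coding tool (e.g., M/S coding) for the top-ranked pair and deactivate it for pairs below the threshold.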
According to an embodiment, the audio encoder is configured to determine one or more parameters of a joint coding tool (which, e.g., jointly encodes two or more of the input audio signals, or signals derived therefrom) in dependence on one or more directional loudness maps, which represent loudness information associated with a plurality of different
directions of the one or more signals to be encoded (for example, to control a smoothing of frequency dependent prediction factors; for example, to set parameters of an "intensity stereo" joint coding tool). The one or more directional loudness information maps comprise, for example, information about loudness at predetermined directions and time frames. Thus, for example, the audio encoder is configured, to determine the one or more parameters for a current time frame based on loudness information of previous time frames. Based on the directional loudness maps, masking effects can be analyzed very efficiently and can be indicated by the one or more parameters, whereby frequency dependent prediction factors can be determined based on the one or more parameters, such that predicted sample values are close to original sample values (associated with the signal to be encoded). Thus it is possible for the encoder to determine frequency dependent prediction factors representing an approximation of a masking threshold rather than the signal to be encoded. Furthermore the directional loudness maps are, for example, based on a psychoacoustic model, whereby a determination of the frequency dependent prediction factors based on the one or more parameters is improved further and can result in a highly accurate prediction. Alternatively the parameters of the joint coding tool define, for example, which signals or signal pairs should be coded jointly by the audio encoder. The audio encoder is, for example, configured to base the determination of the one or more parameters on contributions of each directional loudness map associated with a signal to be encoded or a signal pair, of signals to be encoded, to an overall directional loudness map. Thus, for example, the one or more parameters indicate individual signals and/or signal pairs with the highest contribution or a contribution equal to or higher than a threshold (see, for example, the threshold definition above). 
Based on the one or more parameters, the audio encoder is, for example, configured to jointly encode the signals indicated by the one or more parameters. Alternatively, for example, signal pairs having a high proximity/similarity in the respective directional loudness map can be indicated by the one or more parameters of the joint coding tool. The chosen signal pairs are, for example, jointly represented by a downmix. Thus, bits needed for the encoding are minimized or reduced, since the downmix signal or a residual signal of the signals to be encoded jointly is very small.
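The effect exploited here, that a jointly coded pair with similar content yields a small residual, can be illustrated with a standard mid/side decomposition (a sketch; the signal values in the usage are illustrative):

```python
import numpy as np

def mid_side(left, right):
    """Joint representation: a downmix (mid) plus a residual (side).
    For pairs whose directional loudness maps are very similar, the
    side signal is small and therefore cheap to encode."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side
```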
According to an embodiment, the audio encoder is configured to determine or estimate an influence of a variation of one or more control parameters controlling the provision of the one or more encoded audio signals onto a directional loudness map of one or more encoded signals, and to adjust the one or more control parameters in dependence on the determination or estimation of the influence. The influence of the control parameters onto the directional loudness map of one or more encoded signals can comprise a measure for
noise induced by the encoding of the audio encoder (e.g., the control parameters regarding a quantization position can be adjusted), a measure for audio distortions and/or a measure for a falloff in the quality perceived by a listener. According to an embodiment, the control parameters can be represented by the encoding parameters or the encoding parameters can comprise the control parameters.
According to an embodiment, the audio encoder is configured to obtain a direction component (e.g., direction information) used for obtaining the one or more directional loudness maps using metadata representing position information of loudspeakers associated with the input audio signals (this concept can also be used in the other audio encoders). The direction component is, for example, represented by the herein described first different directions which are, for example, associated with different channels or loudspeakers associated with the input audio signals. According to an embodiment, based on the direction component, the obtained one or more directional loudness maps can be associated with an input audio signal and/or a signal pair of the input audio signals having the same direction component. Thus, for example, a directional loudness map can have the index L and an input audio signal can have the index L, wherein L indicates a left channel or a signal for a left loudspeaker. Alternatively, the direction component can be represented by a vector, like (1, 3), which indicates a combination of input audio signals of a first channel and a third channel. Thus, the directional loudness map with the index (1, 3) can be associated with this signal pair. According to an embodiment, each channel can be associated with a different loudspeaker.
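Deriving a direction component from loudspeaker position metadata could look like this (a sketch; the (x, y) coordinate convention and the channel names are assumptions):

```python
import math

def direction_from_metadata(speaker_positions):
    """Map channel names to azimuth angles (degrees) derived from
    metadata giving loudspeaker positions as (x, y) coordinates,
    with y pointing to the front of the listener."""
    return {ch: math.degrees(math.atan2(x, y))
            for ch, (x, y) in speaker_positions.items()}
```

A directional loudness map indexed L can then be associated with the input audio signal carrying the same index.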
An embodiment according to this invention is related to an audio encoder for encoding an input audio content comprising one or more input audio signals (preferably a plurality of input audio signals). The audio encoder is configured to provide one or more encoded (e.g., quantized and then losslessly encoded) audio signals (e.g., encoded spectral-domain representations) on the basis of two or more input audio signals (e.g., left signal and right signal), or on the basis of two or more signals derived therefrom, using a joint encoding of two or more signals to be encoded jointly (e.g., using a mid signal or downmix signal and a side-signal or difference signal). Additionally the audio encoder is configured to select signals to be encoded jointly out of a plurality of candidate signals or out of a plurality of pairs of candidate signals (e.g., out of the two or more input audio signals or out of the two or more signals derived therefrom) in dependence on directional loudness maps which represent loudness information associated with a plurality of different directions (e.g., panning directions) of the candidate signals or of the pairs of candidate signals (e.g., in dependence on contributions of individual directional loudness maps of the candidate signals to an overall directional loudness map, e.g., associated with multiple input audio signals (e.g., with each signal of the one or more input audio signals), or in dependence on contributions of directional loudness maps of pairs of candidate signals to an overall directional loudness map (e.g., associated with all input audio signals)).
According to an embodiment, the audio encoder can be configured to activate and deactivate the joint encoding. Thus, for example, if the audio content comprises only one input audio signal, the joint encoding is deactivated; it is only activated if the audio content comprises two or more input audio signals. Thus, it is possible to encode with the audio encoder a monaural audio content, a stereo audio content and/or an audio content comprising three or more input audio signals (i.e. a multichannel audio content). According to an embodiment, the audio encoder provides for each input audio signal a separate encoded audio signal as output signal (e.g., suitable for audio content comprising only one single input audio signal) or provides one combined output signal (e.g., signals encoded jointly) comprising two or more encoded audio signals of two or more input audio signals.
The embodiments of this audio encoder are based on the idea that it is efficient and improves the accuracy of the encoding to base the joint encoding on directional loudness maps. The usage of directional loudness maps is advantageous, because they can indicate a perception of the audio content by a listener and thus improve the audio quality of the encoded audio content, especially in the context of a joint encoding. It is, for example, possible to optimize the choice of signal pairs to be encoded jointly by analyzing directional loudness maps. The analysis of directional loudness maps gives, for example, information about signals or signal pairs which can be neglected (e.g., signals which have only little influence on a perception of a listener), resulting in a small amount of bits needed for the encoded audio content (e.g., comprising two or more encoded signals) provided by the audio encoder. This means, for example, that signals with a low contribution of their respective directional loudness map to the overall directional loudness map can be neglected. Alternatively, the analysis can indicate signals which have a high similarity (e.g., signals with similar directional loudness maps), whereby, for example, optimized residual signals can be obtained by the joint encoding.
According to an embodiment, the audio encoder is configured to select signals to be encoded jointly out of a plurality of candidate signals or out of a plurality of pairs of candidate signals in dependence on contributions of individual directional loudness maps of the
candidate signals to an overall directional loudness map or in dependence on contributions of directional loudness maps of the pairs of candidate signals to an overall directional loudness map (e.g., associated with multiple input audio signals (e.g., with each signal of the one or more input audio signals))(or associated with an overall (audio) scene, e.g., represented by the input audio signals). The overall directional loudness map represents, for example, loudness information associated with the different directions (e.g., of audio components) of an audio scene represented (or to be represented, for example, after a decoder-sided rendering) by the input audio signals (possibly in combination with knowledge or side information regarding positions of loudspeakers and/or knowledge or side information describing positions of audio objects).
According to an embodiment, the audio encoder is configured to determine a contribution of pairs of candidate signals to the overall directional loudness map. Additionally, the audio encoder is configured to choose one or more pairs of candidate signals having a highest contribution to the overall directional loudness map for a joint encoding, or the audio encoder is configured to choose one or more pairs of candidate signals having a contribution to the overall directional loudness map which is larger than a predetermined threshold (e.g., a contribution of at least 60%, 70%, 80% or 90%) for a joint encoding. Regarding the highest contribution, it is possible that only one pair of candidate signals has the highest contribution, but it is also possible that more than one pair of candidate signals have the same contribution, which represents the highest contribution, or that more than one pair of candidate signals have similar contributions within small variances of the highest contribution. Thus, the audio encoder is, for example, configured to select more than one signal or signal pair for the joint encoding. With the features described in this embodiment it is possible to find relevant signal pairs for an improved joint encoding and to discard signals or signal pairs which do not significantly influence a listener's perception of the encoded audio content.
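Selecting the pairs with the highest contribution might be sketched as follows (illustrative Python; the pair-index keys and the 60% default threshold are assumptions taken from the example values above):

```python
import numpy as np

def select_pairs(pair_dlms, overall_dlm, threshold=0.6):
    """Rank candidate signal pairs by their contribution to the overall
    directional loudness map and keep those at or above the threshold.
    `pair_dlms` maps a pair index such as (0, 1) to its map."""
    total = np.sum(overall_dlm)
    contrib = {pair: np.sum(dlm) / total for pair, dlm in pair_dlms.items()}
    return [pair for pair, c in sorted(contrib.items(),
                                       key=lambda kv: kv[1], reverse=True)
            if c >= threshold]
```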
According to an embodiment, the audio encoder is configured to determine individual directional loudness maps of two or more candidate signals (e.g., directional loudness maps associated with signal pairs). Additionally the audio encoder is configured to compare the individual directional loudness maps of the two or more candidate signals and to select two or more of the candidate signals for a joint encoding in dependence on a result of the comparison (for example, such that candidate signals (e.g., signal pairs, signal triplets, signal quadruplets, etc.), individual loudness maps of which comprise a maximum similarity or a similarity which is higher than a similarity threshold, are selected for a joint encoding).
Thus, for example, only few or no bits are spent for a residual signal (e.g., a side channel with respect to a mid-channel) maintaining a high quality of the encoded audio content.
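The similarity-based selection could be sketched with a cosine similarity between individual maps (the cosine measure and the 0.95 default are assumptions; the text only requires some comparison against a similarity threshold):

```python
import numpy as np
from itertools import combinations

def dlm_similarity(dlm_a, dlm_b):
    """Cosine similarity between two individual directional loudness
    maps (directions x bands); 1.0 means identical shape."""
    a, b = dlm_a.ravel(), dlm_b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_for_joint_coding(dlms, similarity_threshold=0.95):
    """Return index pairs of candidate signals whose maps are similar
    enough that a joint encoding leaves only a small residual."""
    return [(i, j) for i, j in combinations(range(len(dlms)), 2)
            if dlm_similarity(dlms[i], dlms[j]) >= similarity_threshold]
```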
According to an embodiment, the audio encoder is configured to determine an overall directional loudness map using a downmixing of the input audio signals and/or using a binauralization of the input audio signals. The downmixing or the binauralization takes into account, for example, the directions (e.g., associations with channels or loudspeakers for the respective input audio signals). The overall directional loudness map can be associated with loudness information corresponding to an audio scene created by all input audio signals.
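Obtaining the overall map from a downmix can be sketched as a gain-weighted sum over per-signal maps (a simplification for illustration; a real downmix or binauralization would act on the signals themselves, not on the maps):

```python
import numpy as np

def overall_dlm(signal_dlms, downmix_gains=None):
    """Overall directional loudness map of the audio scene, sketched as
    an (optionally gain-weighted) sum over the per-signal maps."""
    if downmix_gains is None:
        downmix_gains = [1.0] * len(signal_dlms)
    return sum(g * dlm for g, dlm in zip(downmix_gains, signal_dlms))
```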
An embodiment according to this invention is related to an audio encoder for encoding an input audio content comprising one or more input audio signals (preferably a plurality of input audio signals). The audio encoder is configured to provide one or more encoded (e.g., quantized and then losslessly encoded) audio signals (e.g., encoded spectral-domain representations) on the basis of two or more input audio signals (e.g., left signal and right signal), or on the basis of two or more signals derived therefrom. Additionally the audio encoder is configured to determine an overall directional loudness map (for example, a target directional loudness map of a scene) on the basis of the input audio signals, and/or to determine one or more individual directional loudness maps associated with individual input audio signals (or associated with two or more input audio signals, like signal pairs). Furthermore, the audio encoder is configured to encode the overall directional loudness map and/or one or more individual directional loudness maps as side information.
Thus, for example, if the audio content comprises only one input audio signal, the audio encoder is configured to encode only this signal together with the corresponding individual directional loudness map. If the audio content comprises two or more input audio signals, the audio encoder is, for example, configured to encode all or at least some (e.g., one individual signal and one signal pair of three input audio signals) signals individually together with the respective directional loudness map (e.g., with individual directional loudness maps of individual encoded signals and/or with directional loudness maps corresponding to signal pairs or other combinations of more than two signals and/or with overall directional loudness maps associated with all input audio signals). According to an embodiment, the audio encoder is configured to encode all or at least some signals resulting in one encoded audio signal, for example, together with the overall directional loudness map as output (e.g., one combined output signal (e.g., signals encoded jointly) comprising, for example, two or more encoded audio signals of two or more input audio signals). Thus it is possible to encode with the audio encoder a monaural audio content, a stereo audio content and/or an audio content comprising three or more input audio signals (i.e. a multichannel audio content).
The embodiments of this audio encoder are based on the idea that it is advantageous to determine and encode one or more directional loudness maps, because they can indicate a perception of the audio content by a listener and thus improve the audio quality of the encoded audio content. According to an embodiment, the one or more directional loudness maps can be used by the encoder to improve the encoding, for example, by adapting encoding parameters based on the one or more directional loudness maps. Thus, the encoding of the one or more directional loudness maps is especially advantageous, since they can represent information concerning an influence of the encoding. With the one or more directional loudness maps as side information in the encoded audio content, provided by the audio encoder, a very accurate decoding can be achieved, since information regarding the encoding is provided (e.g., in a data stream) by the audio encoder.
According to an embodiment, the audio encoder is configured to determine the overall directional loudness map on the basis of the input audio signals such that the overall directional loudness map represents loudness information associated with the different directions (e.g., of audio components) of an audio scene, represented (or to be represented, for example, after a decoder-sided rendering) by the input audio signals (possibly in combination with knowledge or side information regarding positions of loudspeakers and/or knowledge or side information describing positions of audio objects). The different directions of the audio scene represent, for example, the herein described second different directions.
According to an embodiment, the audio encoder is configured to encode the overall directional loudness map in the form of a set of (e.g., scalar) values associated with different directions (and preferably with a plurality of frequency bins or frequency bands). If the overall directional loudness map is encoded in the form of a set of values, a value associated with a certain direction can comprise loudness information of a plurality of frequency bins or frequency bands. Alternatively, the audio encoder is configured to encode the overall directional loudness map using a center position value (for example, describing an angle or a panning index at which a maximum of the overall directional loudness map occurs for a given frequency bin or frequency band) and a slope information (for example, one or more scalar values describing slopes of the values of the overall directional loudness map in angle direction or panning index direction). The encoding of the overall directional loudness map using the center position value and the slope information can be performed for different given frequency bins or frequency bands. Thus, for example, the overall directional loudness map can comprise the center position value and the slope information for more than one frequency bin or frequency band. Alternatively, the audio encoder is configured to encode the overall directional loudness map in the form of a polynomial representation, or the audio encoder is configured to encode the overall directional loudness map in the form of a spline representation. The encoding of the overall directional loudness map in the form of a polynomial representation or a spline representation is a cost-efficient encoding. Although these features are described with respect to the overall directional loudness map, this encoding can also be performed for individual directional loudness maps (e.g., of individual signals, of signal pairs and/or of groups of three or more signals).
Thus, with these features the directional loudness maps are encoded very efficiently, and information on which the encoding is based is provided.
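The center-position/slope encoding can be sketched per frequency band as follows. The linear falloff around the peak is an illustrative assumption; the text equally allows polynomial or spline representations.

```python
import numpy as np

def encode_dlm_parametric(dlm, directions):
    """Per frequency band, store the direction of the maximum (center
    position value), the peak loudness, and one slope value describing
    the falloff around the peak. `dlm` has shape directions x bands."""
    params = []
    for band in dlm.T:
        peak = int(np.argmax(band))
        slope = float(np.mean(np.abs(np.diff(band))))  # mean change per step
        params.append((directions[peak], float(band[peak]), slope))
    return params

def decode_dlm_parametric(params, directions):
    """Reconstruct the map assuming a linear falloff from each peak."""
    out = np.zeros((len(directions), len(params)))
    for b, (center, peak_val, slope) in enumerate(params):
        dist = np.abs(np.asarray(directions, dtype=float) - center)
        out[:, b] = np.maximum(peak_val - slope * dist, 0.0)
    return out
```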
According to an embodiment, the audio encoder is configured to encode (e.g., and transmit or include into an encoded audio representation) one (e.g., only one) downmix signal obtained on the basis of a plurality of input audio signals and an overall directional loudness map. Alternatively the audio encoder is configured to encode (e.g., and transmit or include into an encoded audio representation) a plurality of signals (e.g., the input audio signals or signals derived therefrom), and to encode (e.g., and transmit or include into the encoded audio representation) individual directional loudness maps of a plurality of signals which are encoded (e.g., directional loudness maps of individual signals and/or of signal pairs and/or of groups of three or more signals). Alternatively the audio encoder is configured to encode (e.g., and transmit or include into an encoded audio representation) an overall directional loudness map, a plurality of signals (e.g., the input audio signals or signals derived therefrom) and parameters describing (e.g., relative) contributions of the signals which are encoded to the overall directional loudness map. According to an embodiment, the parameters describing contributions can be represented by scalar values. Thus, it is possible by an audio decoder receiving the encoded audio representation (e.g., an audio content or a data stream comprising the encoded signals, the overall directional loudness map and the parameters) to reconstruct individual directional loudness maps of the signals based on the overall directional loudness map and the parameters describing contributions of the signals.
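Decoder-side reconstruction of individual maps from the overall map and the transmitted contribution parameters might look like this (a sketch; scalar per-signal contributions are one of the variants named above):

```python
import numpy as np

def reconstruct_individual_dlms(overall_dlm, contributions):
    """Reconstruct individual directional loudness maps by scaling the
    transmitted overall map by each signal's contribution parameter."""
    return {sig: c * overall_dlm for sig, c in contributions.items()}
```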
An embodiment according to this invention is related to an audio decoder for decoding an encoded audio content. The audio decoder is configured to receive an encoded representation of one or more audio signals and to provide a decoded representation of the one or more audio signals (for example, using an AAC-like decoding or using a decoding of entropy-encoded spectral values). Furthermore the audio decoder is configured to receive an encoded directional loudness map information and to decode the encoded directional loudness map information, to obtain one or more (e.g., decoded) directional loudness maps. Additionally the audio decoder is configured to reconstruct an audio scene using the decoded representation of the one or more audio signals and using the one or more directional loudness maps. The audio content can comprise the encoded representation of the one or more audio signals and the encoded directional loudness map information. The encoded directional loudness map information can comprise directional loudness maps of individual signals, of signal pairs and/or of groups of three or more signals.
The embodiments of this audio decoder are based on the idea that it is advantageous to determine and decode one or more directional loudness maps because they can indicate a perception of the audio content by a listener and thus improve the audio quality of the decoded audio content. The audio decoder is, for example, configured to determine a high-quality prediction signal based on the one or more directional loudness maps, whereby a residual decoding (or a joint decoding) can be improved. According to an embodiment, the directional loudness maps define loudness information for different directions in the audio scene over time. A loudness information for a certain direction at a certain point of time or in a certain time frame can comprise loudness information of different audio signals or of one audio signal at, for example, different frequency bins or frequency bands. Thus, for example, the provision of the decoded representation of the one or more audio signals by the audio decoder can be improved, for example, by adapting the decoding of the encoded representation of the one or more audio signals based on the decoded directional loudness maps. Thus, the reconstructed audio scene is optimized, since the decoded representation of the one or more audio signals can achieve a minimal deviation from the original audio signals based on an analysis of the one or more directional loudness maps, resulting in a high-quality audio scene. According to an embodiment, the audio decoder can be configured to use the one or more directional loudness maps for an adaptation of decoding parameters to provide the decoded representation of the one or more audio signals efficiently and with high accuracy.
According to an embodiment, the audio decoder is configured to obtain output signals such that one or more directional loudness maps associated with the output signals approximate or equal one or more target directional loudness maps. The one or more target directional loudness maps are based on the one or more decoded directional loudness maps or are equal to the one or more decoded directional loudness maps. The audio decoder is, for example, configured to use an appropriate scaling or combination of the one or more decoded audio signals to obtain the output signals. The target directional loudness maps are, for example, understood as reference directional loudness maps. According to an embodiment, the target directional loudness maps can represent loudness information of one or more audio signals before an encoding and decoding of the audio signals. Alternatively, the target directional loudness maps can represent loudness information associated with the encoded representation of the one or more audio signals (e.g., one or more decoded directional loudness maps). The audio decoder receives, for example, encoding parameters used for the encoding to provide the encoded audio content. The audio decoder is, for example, configured to determine decoding parameters based on the encoding parameters to scale the one or more decoded directional loudness maps to determine the one or more target directional loudness maps. It is also possible that the audio decoder comprises an audio analyzer, which is configured to determine the target directional loudness maps based on the decoded directional loudness maps and the one or more decoded audio signals, wherein, for example, the decoded directional loudness maps are scaled based on the one or more decoded audio signals. 
Since the one or more target directional loudness maps can be associated with an optimal or optimized audio scene realized by the audio signals, it is advantageous to minimize a deviation between the one or more directional loudness maps associated with the output signals and the one or more target directional loudness maps. According to an embodiment, this deviation can be minimized by the audio decoder by adapting decoding parameters or adapting parameters regarding the reconstruction of the audio scene. Thus, with this feature a quality of the output signals is controlled, for example, by a feedback loop analyzing the one or more directional loudness maps associated with the output signals. The audio decoder is, for example, configured to determine the one or more directional loudness maps of the output signals (e.g. the audio decoder comprises a herein described audio analyzer to determine the directional loudness maps). Thus, the audio decoder provides output signals which are associated with directional loudness maps that approximate or equal the target directional loudness maps.
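One step of such a feedback loop can be sketched as a global rescaling of the decoded signals towards the target map. This is a strong simplification: a real adaptation would act per direction, `analyze` stands in for the directional loudness analyzer, and loudness is assumed to scale linearly with gain.

```python
import numpy as np

def match_target_dlm(decoded_signals, analyze, target_dlm):
    """Scale the decoded signals so that the directional loudness map of
    the output approximates the target directional loudness map."""
    current = analyze(decoded_signals)
    gain = float(np.sum(target_dlm)) / max(float(np.sum(current)), 1e-12)
    return [gain * s for s in decoded_signals], gain
```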
According to an embodiment, the audio decoder is configured to receive one (e.g., only one) encoded downmix signal (e.g., obtained on the basis of a plurality of input audio signals) and an overall directional loudness map; or a plurality of encoded audio signals (e.g., the input audio signals of an encoder or signals derived therefrom), and individual directional loudness maps of the plurality of encoded signals; or an overall directional loudness map, a plurality of encoded audio signals (e.g., the input audio signals received by an audio encoder, or signals derived therefrom) and parameters describing (e.g., relative) contributions of the encoded audio signals to the overall directional loudness map. The audio decoder is configured to provide the output signals on the basis thereof.
An embodiment according to this invention is related to a format converter for converting a format of an audio content, which represents an audio scene (e.g., a spatial audio scene), from a first format to a second format. The first format may, for example, comprise a first number of channels or input audio signals and a side information or a spatial side information adapted to the first number of channels or input audio signals, and the second format may, for example, comprise a second number of channels or output audio signals, which may be different from the first number of channels or input audio signals, and a side information or a spatial side information adapted to the second number of channels or output audio signals. Furthermore the format converter is configured to provide a representation of the audio content in the second format on the basis of the representation of the audio content in the first format. Additionally the format converter is configured to adjust a complexity of the format conversion (for example, by skipping one or more of the input audio signals of the first format, which contribute to the directional loudness map below a threshold, in the format conversion process) in dependence on contributions of input audio signals of the first format (e.g., one or more audio signals, one or more downmix signals, one or more residual signals, etc.) to an overall directional loudness map of the audio scene (wherein the overall directional loudness map may, for example, be described by a side information of the first format received by the format converter). Thus, for example, contributions of individual directional loudness maps, associated with individual input audio signals, to the overall directional loudness map of the audio scene are analyzed for the complexity adjustment of the format conversion.
Alternatively, this adjustment can be performed by the format converter in dependence on contributions of directional loudness maps corresponding to combinations of input audio signals (e.g., signal pairs, a mid-signal, a side-signal, downmix signal, a residual signal, a difference signal and/or groups of three or more signals) to the overall directional loudness map of the audio scene.
The embodiments of the format converter are based on the idea that it is advantageous to convert a format of the audio content on the basis of one or more directional loudness maps, because they can indicate a perception of the audio content by a listener; thus a high quality of the audio content in the second format is realized while the complexity of the format conversion is reduced in dependence on the directional loudness maps. With the contributions it is possible to obtain information about the signals relevant for a high-quality audio perception of the format-converted audio content. Thus, the audio content in the second format, for example, comprises fewer signals (e.g., only the relevant signals according to the directional loudness maps) than the audio content in the first format, with nearly the same audio quality.
According to an embodiment, the format converter is configured to receive a directional loudness map information, and to obtain the overall directional loudness map (e.g., of the decoded audio scene; e.g., of the audio content in the first format) and/or one or more directional loudness maps on the basis thereof. The directional loudness map information (i.e. one or more directional loudness maps associated with individual signals of the audio content or associated with signal pairs or a combination of three or more signals of the audio content) can represent the audio content in the first format, can be part of the audio content in the first format or can be determined by the format converter based on the audio content in the first format (e.g., by a herein described audio analyzer; e.g., the format converter comprises the audio analyzer). According to an embodiment, the format converter is configured to also determine directional loudness map information of the audio content in the second format. Thus, for example, directional loudness maps before and after the format conversion can be compared, to reduce a perceived quality degradation due to the format conversion. This is, for example, realized by minimizing a deviation between the directional loudness map before and after the format conversion.
According to an embodiment, the format converter is configured to derive the overall directional loudness map (e.g., of the decoded audio scene) from the one or more (e.g., decoded) directional loudness maps (e.g., associated with signals in the first format).
According to an embodiment, the format converter is configured to compute or estimate a contribution of a given input audio signal (e.g., of a signal in the first format) to the overall directional loudness map of the audio scene. The format converter is configured to decide whether to consider the given input audio signal in the format conversion in dependence on a computation or estimation of the contribution (for example, by comparing the computed
or estimated contribution with a predetermined absolute or relative threshold value). If the contribution is, for example, at or above the absolute or relative threshold value the corresponding signal can be seen as relevant and thus the format converter can be configured to decide to consider this signal. This can be understood as a complexity adjustment by the format converter, since not all signals in the first format are necessarily converted into the second format. The predetermined threshold value can represent a contribution of at least 2% or of at least 5% or of at least 10% or of at least 20% or of at least 30%. This is, for example, meant to exclude inaudible and/or irrelevant channels (or nearly inaudible and/or irrelevant channels), i.e. the threshold should be lower (e.g. when compared to other use cases), e.g. 5%, 10%, 20%, 30%.
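The contribution-based signal selection described above can be illustrated with a small sketch (the helper names and the data layout are hypothetical; a per-signal directional loudness map is assumed to be a one-dimensional array over direction indices, and a signal's contribution is taken as its share of the total loudness of the overall map):

```python
import numpy as np

def select_relevant_signals(individual_dlms, threshold=0.05):
    """individual_dlms: list of per-signal directional loudness maps
    (1-D arrays over direction indices). A signal is kept when its share
    of the overall map's total loudness is at or above `threshold`
    (e.g. 5%, as one of the threshold values mentioned above)."""
    overall = np.sum(individual_dlms, axis=0)  # overall directional loudness map
    total = overall.sum()
    keep = []
    for i, dlm in enumerate(individual_dlms):
        contribution = dlm.sum() / total  # relative contribution of signal i
        if contribution >= threshold:
            keep.append(i)
    return keep
```

Signals whose contribution falls below the threshold would simply not be converted into the second format, which reduces the conversion complexity.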
An embodiment according to this invention is related to an audio decoder for decoding an encoded audio content. The audio decoder is configured to receive an encoded representation of one or more audio signals and to provide a decoded representation of the one or more audio signals (for example, using an AAC-like decoding or using a decoding of entropy-encoded spectral values). Furthermore the audio decoder is configured to reconstruct an audio scene using the decoded representation of the one or more audio signals and to adjust a decoding complexity in dependence on contributions of encoded signals (e.g., one or more audio signals, one or more downmix signals, one or more residual signals, etc.) to an overall directional loudness map of a decoded audio scene.
The embodiments of this audio decoder are based on the idea that it is advantageous to adjust the decoding complexity based on one or more directional loudness maps, because they can indicate a perception of the audio content by a listener and thus allow, at the same time, a reduction of the decoding complexity and an improvement of the decoded audio quality of the audio content. Thus, for example, the audio decoder is configured to decide, based on the contributions, which encoded signals of the audio content should be decoded and used for the reconstruction of the audio scene by the audio decoder. This means, for example, that the encoded representation of the one or more audio signals comprises fewer audio signals (e.g., only the relevant audio signals according to the directional loudness maps) than the decoded representation of the one or more audio signals, with nearly the same audio quality.
Claims
1. An audio analyzer (100),
wherein the audio analyzer (100) is configured to obtain spectral domain representations (110, 1101, 1102, 110a, 110b) of two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b);
wherein the audio analyzer (100) is configured to obtain directional information (122, 1221, 1222, 125, 127) associated with spectral bands of the spectral domain representations (110, 1101, 1102, 110a, 110b);
wherein the audio analyzer (100) is configured to obtain loudness information (142, 1421, 1422, 142a, 142b) associated with different directions (121) as an analysis result,
wherein contributions (132, 1321, 1322, 1351, 1352) to the loudness information (142, 1421, 1422, 142a, 142b) are determined in dependence on the directional information (122, 1221, 1222, 125, 127).
2. Audio analyzer (100) according to claim 1, wherein the audio analyzer (100) is configured to obtain a plurality of weighted spectral domain representations (135, 1351, 1352, 132) on the basis of the spectral domain representations (110, 1101, 1102, 110a, 110b) of the two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b);
wherein values of the one or more spectral domain representations (110, 1101, 1102, 110a, 110b) are weighted (134) in dependence on the different directions (125) of the audio components in the two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b) to obtain the plurality of weighted spectral domain representations (135, 1351, 1352, 132);
wherein the audio analyzer (100) is configured to obtain loudness information (142, 1421, 1422, 142a, 142b) associated with the different directions (121) on the basis of the weighted spectral domain representations (135, 1351, 1352, 132) as the analysis result.
3. Audio analyzer (100) according to claim 1 or claim 2, wherein the audio analyzer (100) is configured to decompose the two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b) into a short-time Fourier transform (STFT) domain to obtain two or more transformed audio signals (110, 1101, 1102, 110a, 110b).
4. Audio analyzer (100) according to claim 3, wherein the audio analyzer (100) is configured to group spectral bins of the two or more transformed audio signals (110, 1101, 1102, 110a, 110b) to spectral bands of the two or more transformed audio signals (110, 1101, 1102, 110a, 110b); and
wherein the audio analyzer (100) is configured to weight the spectral bands using different weights, based on an outer-ear and middle-ear model (116), to obtain the one or more spectral domain representations (110, 1101, 1102, 110a, 110b) of the two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b).
5. Audio analyzer (100) according to one of the claims 1 to 4, wherein the two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b) are associated with different directions or different loudspeaker positions.
6. Audio analyzer (100) according to one of the claims 1 to 5, wherein the audio analyzer (100) is configured to determine a direction-dependent weighting (127, 122) per spectral bin and for a plurality of predetermined directions (121).
7. Audio analyzer (100) according to one of the claims 1 to 6, wherein the audio analyzer (100) is configured to determine a direction-dependent weighting (127,
122) using a Gaussian function, such that the direction-dependent weighting (127, 122) decreases with increasing deviation between respective extracted direction values (125, 122) and respective predetermined direction values (121).
8. Audio analyzer (100) according to claim 7, wherein the audio analyzer (100) is configured to determine panning index values as the extracted direction values (125, 122).
9. Audio analyzer (100) according to claim 7 or claim 8, wherein the audio analyzer (100) is configured to determine the extracted direction values (125, 122) in dependence on spectral domain values (110) of the input audio signals (112, 1121, 1122, 1123, 112a, 112b).
10. Audio analyzer (100) according to one of the claims 6 to 9, wherein the audio analyzer (100) is configured to obtain the direction-dependent weighting (127, 122) associated with a predetermined direction (121), a time designated with a time index m, and a spectral bin designated by a spectral bin index k according to

GΨ0,j(m, k) = x^(-(Ψ(m, k) - Ψ0,j)^2)
wherein x is a predetermined value;
wherein Ψ(m, k) designates the extracted direction values (125, 122) associated with a time designated with a time index m, and a spectral bin designated by a spectral bin index k; and
wherein Ψ0,j is a direction value which designates a predetermined direction (121).
11. Audio analyzer (100) according to one of the claims 6 to 10, wherein the audio analyzer (100) is configured to apply the direction-dependent weighting (127, 122) to the one or more spectral domain representations (110, 1101, 1102, 110a, 110b) of the two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b), in order to obtain the weighted spectral domain representations (135, 1351, 1352, 132).
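The direction-dependent weighting and its application according to claims 6 to 11 can be sketched as follows (the magnitude-ratio panning index in [-1, 1] used as the extracted direction value and the Gaussian width sigma are illustrative assumptions; the weighting function of a concrete embodiment may differ):

```python
import numpy as np

def directional_weights(XL, XR, psi0, sigma=0.1, eps=1e-12):
    """XL, XR: spectral domain representations (frames x bins) of a left and
    a right input audio signal. psi0: predetermined direction value in [-1, 1].
    Returns per-bin weights that decrease (Gaussian function) with increasing
    deviation between the extracted and the predetermined direction."""
    # hypothetical panning index: -1 = fully left, +1 = fully right
    psi = (np.abs(XR) - np.abs(XL)) / (np.abs(XL) + np.abs(XR) + eps)
    return np.exp(-((psi - psi0) ** 2) / (2.0 * sigma ** 2))

def weighted_spectrum(X, weights):
    """Apply the direction-dependent weighting to a spectral domain
    representation (claim 11) to obtain a weighted representation."""
    return X * weights
```

Signal components whose extracted direction matches the predetermined direction psi0 are emphasized, while components from other directions are attenuated, as described in claim 12.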
12. Audio analyzer (100) according to one of the claims 6 to 11, wherein the audio analyzer (100) is configured to obtain the weighted spectral domain representations (135, 1351, 1352, 132),
such that signal components having associated a first predetermined direction (121) are emphasized over signal components having associated other directions (125) in a first weighted spectral domain representation (135, 1351, 1352, 132) and
such that signal components having associated a second predetermined direction (121) are emphasized over signal components having associated other directions (125) in a second weighted spectral domain representation (135, 1351, 1352, 132).
13. Audio analyzer (100) according to one of the claims 1 to 12, wherein the audio analyzer (100) is configured to obtain the weighted spectral domain representations (135, 1351, 1352, 132) associated with an input audio signal or combination of input audio signals (112, 1121, 1122, 1123, 112a, 112b) designated by index i, a spectral band designated by index b, a direction (121) designated by index Ψ0,j, a time designated with a time index m, and a spectral bin designated by a spectral bin index k according to

Yi,b,Ψ0,j(m, k) = Xi,b(m, k) · GΨ0,j(m, k)

wherein Xi,b(m,k) designates a spectral domain representation (110) associated with an input audio signal (112) or combination of input audio signals (112, 1121, 1122, 1123, 112a, 112b) designated by index i, a spectral band designated by index b, a time designated with a time index m, and a spectral bin designated by a spectral bin index k; and

wherein GΨ0,j(m, k) designates the direction-dependent weighting (127, 122) associated with a direction (121) designated by index Ψ0,j, a time designated with a time index m, and a spectral bin designated by a spectral bin index k.
14. Audio analyzer (100) according to one of the claims 1 to 13, wherein the audio analyzer (100) is configured to determine an average over a plurality of band loudness values (145), in order to obtain a combined loudness value (142).
15. Audio analyzer (100) according to one of the claims 1 to 14, wherein the audio analyzer (100) is configured to obtain band loudness values (145) for a plurality of spectral bands on the basis of a weighted combined spectral domain representation (137) representing a plurality of input audio signals (112, 1121, 1122, 1123, 112a, 112b); and
wherein the audio analyzer (100) is configured to obtain, as the analysis result, a plurality of combined loudness values (142) on the basis of the obtained band loudness values (145) for a plurality of different directions (121).
16. Audio analyzer (100) according to claim 14 or claim 15, wherein the audio analyzer (100) is configured to compute a mean of squared spectral values of the weighted combined spectral domain representation (137) over spectral values of a frequency band, and to apply an exponentiation having an exponent between 0 and 1/2 to the mean of squared spectral values, in order to determine the band loudness values (145).
17. Audio analyzer (100) according to one of the claims 14 to 16, wherein the audio analyzer (100) is configured to obtain the band loudness values (145) associated with a spectral band designated with index b, a direction (121) designated with index Ψ0,j, and a time designated with a time index m according to

L(m, b, Ψ0,j) = ((1/Kb) Σk |Yb,Ψ0,j(m, k)|²)^(1/4)

wherein Kb designates a number of spectral bins in a frequency band having frequency band index b;

wherein k is a running variable and designates spectral bins in the frequency band having frequency band index b;

wherein b designates a spectral band; and

wherein Yb,Ψ0,j(m, k) designates a weighted combined spectral domain representation (137) associated with a spectral band designated with index b, a direction (121) designated by index Ψ0,j, a time designated with a time index m and a spectral bin designated by a spectral bin index k.
18. Audio analyzer (100) according to one of the claims 1 to 17, wherein the audio analyzer (100) is configured to obtain a plurality of combined loudness values (142) L(m, Ψ0,j) associated with a direction (121) designated with index Ψ0,j and a time designated with a time index m according to

L(m, Ψ0,j) = (1/B) Σb L(m, b, Ψ0,j)

wherein B designates a total number of spectral bands b and

wherein L(m, b, Ψ0,j) designates band loudness values (145) associated with a spectral band designated with index b, a direction (121) designated with index Ψ0,j and a time designated with a time index m.
19. The audio analyzer (100) according to one of claims 1 to 18, wherein the audio analyzer (100) is configured to allocate loudness contributions (132, 1321, 1322, 1351, 1352) to histogram bins associated with different directions (121) in
dependence on the directional information (122, 1221, 1222, 125, 127), in order to obtain the analysis result.
20. The audio analyzer (100) according to one of claims 1 to 19, wherein the audio analyzer (100) is configured to obtain loudness information associated with spectral bins on the basis of the spectral domain representations (110, 1101, 1102, 110a, 110b), and
wherein the audio analyzer (100) is configured to add a loudness contribution (132, 1321, 1322, 1351, 1352) to one or more histogram bins on the basis of a loudness information associated with a given spectral bin;
wherein a selection, to which one or more histogram bins the loudness contribution (132, 1321, 1322, 1351, 1352) is added, is based on a determination of the directional information for the given spectral bin.
21. The audio analyzer (100) according to one of claims 1 to 20,
wherein the audio analyzer (100) is configured to add loudness contributions (132, 1321, 1322, 1351, 1352) to a plurality of histogram bins on the basis of a loudness information associated with a given spectral bin,
such that a largest contribution (132, 1321, 1322, 1351, 1352) is added to a histogram bin associated with a direction (121) that corresponds to the directional information (125, 122) associated with the given spectral bin, and such that reduced contributions (132, 1321, 1322, 1351, 1352) are added to one or more histogram bins associated with further directions (121).
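The histogram-based allocation of claims 19 to 21 can be sketched as follows (the Gaussian spreading rule and its width are assumptions for illustration; any spreading rule that puts the largest share into the nearest bin would fit the claims):

```python
import numpy as np

def accumulate_histogram(loudness, psi, bin_centers, spread_sigma=0.1):
    """Add per-bin loudness contributions to histogram bins associated with
    different directions. The largest share goes to the bin whose direction
    corresponds to the extracted direction psi; reduced shares go to the
    neighbouring bins (spreading rule of claims 21/24)."""
    hist = np.zeros(len(bin_centers))
    for l, p in zip(loudness, psi):
        w = np.exp(-((bin_centers - p) ** 2) / (2.0 * spread_sigma ** 2))
        hist += l * w / w.sum()  # normalized so the total loudness is preserved
    return hist
```

The filled histogram over all spectral bins of a frame is then a directional loudness map for that frame.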
22. The audio analyzer (100) according to one of claims 1 to 21,
wherein the audio analyzer (100) is configured to obtain directional information (122, 1221 , 1222, 125, 127) on the basis of an audio content of the two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b).
23. The audio analyzer (100) according to one of claims 1 to 22,
wherein the audio analyzer (100) is configured to obtain directional information (122, 1221, 1222, 125, 127) on the basis of an analysis of an amplitude panning of audio content; and/or
wherein the audio analyzer (100) is configured to obtain directional information (122, 1221, 1222, 125, 127) on the basis of an analysis of a phase relationship and/or a time delay and/or correlation between audio contents of two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b); and/or
wherein the audio analyzer (100) is configured to obtain directional information (122,
1221, 1222, 125, 127) on the basis of an identification of widened sources, and/or
wherein the audio analyzer is configured to obtain directional information (122, 1221, 1222, 125, 127) using a matching of spectral information of an incoming sound and templates associated with head related transfer functions in different directions.
24. The audio analyzer (100) according to one of claims 1 to 23,
wherein the audio analyzer (100) is configured to spread loudness information to a plurality of directions (121) according to a spreading rule.
25. An audio similarity evaluator (200),
wherein the audio similarity evaluator (200) is configured to obtain a first loudness information (142, 1421, 1422, 142a, 142b) associated with different directions (121) on the basis of a first set of two or more input audio signals (112a), and
wherein the audio similarity evaluator (200) is configured to compare (220) the first loudness information (142, 1421, 1422, 142a, 142b) with a second loudness information (142, 1421, 1422, 142a, 142b) associated with the different panning directions and with a set of two or more reference audio signals (112b), in order to obtain a similarity information (210) describing a similarity between the first set of two or more input audio signals (112a) and the set of two or more reference audio signals (112b).
26. An audio similarity evaluator (200) according to claim 25, wherein the audio similarity evaluator (200) is configured to obtain the first loudness information (142, 1421, 1422, 142a, 142b) such that the first loudness information (142, 1421, 1422, 142a, 142b) comprises a plurality of combined loudness values (142) associated with the first set of two or more input audio signals (112a) and associated with respective predetermined directions (121), wherein the combined loudness values (142) of the first loudness information (142, 1421, 1422, 142a, 142b) describe loudness of signal components of the first set of two or more input audio signals (112a) associated with the respective predetermined directions (121).
27. An audio similarity evaluator (200) according to claim 25 or claim 26, wherein the audio similarity evaluator (200) is configured to obtain the first loudness information (142, 1421, 1422, 142a, 142b) such that the first loudness information (142, 1421, 1422, 142a, 142b) is associated with combinations of a plurality of weighted spectral domain representations (135, 1351, 1352, 132) of the first set of two or more input audio signals (112a) associated with respective predetermined directions (121).
28. An audio similarity evaluator (200) according to one of the claims 25 to 27, wherein the audio similarity evaluator (200) is configured to determine a difference (210) between the second loudness information (142, 1421, 1422, 142a, 142b) and the first loudness information (142, 1421, 1422, 142a, 142b) to obtain a residual loudness information (210).
29. An audio similarity evaluator (200) according to claim 28, wherein the audio similarity evaluator (200) is configured to determine a value (210) that quantifies the difference (210) over a plurality of directions.
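The comparison of claims 28 and 29 can be sketched as follows (using the mean absolute residual as the scalar difference measure is an assumption; any value quantifying the difference over a plurality of directions would fit the claims):

```python
import numpy as np

def dlm_distance(dlm_test, dlm_ref):
    """Residual loudness information (claim 28) between a reference and a
    test directional loudness map, plus a single scalar quantifying the
    difference over all directions (claim 29)."""
    residual = dlm_ref - dlm_test            # per-direction difference
    return residual, np.mean(np.abs(residual))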
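The comparison of claims 28 and 29 can be sketched as follows (using the mean absolute residual as the scalar difference measure is an assumption; any value quantifying the difference over a plurality of directions would fit the claims):

```python
import numpy as np

def dlm_distance(dlm_test, dlm_ref):
    """Residual loudness information (claim 28) between a reference and a
    test directional loudness map, plus a single scalar quantifying the
    difference over all directions (claim 29)."""
    residual = dlm_ref - dlm_test            # per-direction difference
    return residual, np.mean(np.abs(residual))
```

The scalar can serve as a similarity information describing the similarity between a set of input audio signals and a set of reference audio signals.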
30. An audio similarity evaluator (200) according to one of the claims 25 to 29, wherein the audio similarity evaluator (200) is configured to obtain the first loudness information (142, 1421, 1422, 142a, 142b) and/or the second loudness information (142, 1421, 1422, 142a, 142b) using an audio analyzer (100) according to one of claims 1 to 24.
31. An audio similarity evaluator (200) according to one of claims 25 to 30,
wherein the audio similarity evaluator (200) is configured to obtain a direction component used for obtaining the loudness information (142, 1421, 1422, 142a, 142b) associated with different directions (121) using metadata representing position information of loudspeakers associated with the input audio signals (112, 1121, 1122, 1123, 112a, 112b).
32. An audio encoder (300) for encoding (310) an input audio content (112) comprising one or more input audio signals (112, 1121, 1122, 1123, 112a, 112b),
wherein the audio encoder (300) is configured to provide one or more encoded audio signals (320) on the basis of one or more input audio signals (112, 1121, 1122, 1123, 112a, 112b), or one or more signals derived therefrom (110, 1101, 1102, 110a, 110b);
wherein the audio encoder (300) is configured to adapt (340) encoding parameters in dependence on one or more directional loudness maps which represent loudness information (142, 1421, 1422, 142a, 142b) associated with a plurality of different directions (121) of the one or more signals to be encoded.
33. Audio encoder (300) according to claim 32, wherein the audio encoder (300) is configured to adapt (340) a bit distribution between the one or more signals and/or parameters to be encoded in dependence on contributions of individual directional loudness maps of the one or more signals and/or parameters to be encoded to an overall directional loudness map (142, 1421, 1422, 142a, 142b).
34. Audio encoder (300) according to claim 32 or claim 33, wherein the audio encoder (300) is configured to disable encoding (310) of a given one of the signals to be encoded, when contributions of an individual directional loudness map of the given one of the signals to be encoded to an overall directional loudness map is below a threshold.
35. Audio encoder (300) according to one of the claims 32 to 34, wherein the audio encoder (300) is configured to adapt (342) a quantization precision of the one or more signals to be encoded in dependence on contributions of individual directional loudness maps of the one or more signals to be encoded to an overall directional loudness map.
36. Audio encoder (300) according to one of the claims 32 to 35, wherein the audio encoder (300) is configured to quantize (312) spectral domain representations (110, 1101, 1102, 110a, 110b) of the one or more input audio signals (112, 1121, 1122, 1123, 112a, 112b), or of the one or more signals derived therefrom (110, 1101, 1102, 110a, 110b) using one or more quantization parameters, to obtain one or more quantized spectral domain representations (313);
wherein the audio encoder (300) is configured to adjust (342) the one or more quantization parameters in dependence on one or more directional loudness maps which represent loudness information (142, 1421, 1422, 142a, 142b) associated with a plurality of different directions (121) of the one or more signals to be quantized, to adapt the provision of the one or more encoded audio signals (320); and
wherein the audio encoder (300) is configured to encode the one or more quantized spectral domain representations (313), in order to obtain the one or more encoded audio signals (320).
37. The audio encoder (300) according to claim 36, wherein the audio encoder (300) is configured to adjust (342) the one or more quantization parameters in dependence on contributions of individual directional loudness maps of the one or more signals to be quantized to an overall directional loudness map.
38. The audio encoder (300) according to claim 36 or claim 37, wherein the audio encoder (300) is configured to determine an overall directional loudness map on the basis of the input audio signals (112, 1121, 1122, 1123, 112a, 112b), such that the overall directional loudness map represents loudness information (142, 1421, 1422, 142a, 142b) associated with the different directions (121) of an audio scene represented by the input audio signals (112, 1121, 1122, 1123, 112a, 112b).
39. The audio encoder (300) according to one of the claims 36 to 38, wherein the one or more signals to be quantized are associated with different directions (121) or are associated with different loudspeakers or are associated with different audio objects.
40. The audio encoder (300) according to one of the claims 36 to 39, wherein the signals to be quantized comprise components of a joint multi-signal coding of two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b).
41. The audio encoder (300) according to one of the claims 36 to 40, wherein the audio encoder (300) is configured to estimate a contribution of a residual signal of the joint multi-signal coding to the overall directional loudness map, and to adjust (342) the one or more quantization parameters in dependence thereon.
42. The audio encoder (300) according to one of claims 32 to 41, wherein the audio encoder (300) is configured to adapt (340) a bit distribution between the one or more signals and/or parameters to be encoded individually for different spectral bins or individually for different frequency bands; and/or
wherein the audio encoder (300) is configured to adapt (342) a quantization precision of the one or more signals to be encoded individually for different spectral bins or individually for different frequency bands.
43. The audio encoder (300) according to one of claims 32 to 42,
wherein the audio encoder (300) is configured to adapt (340) a bit distribution between the one or more signals and/or parameters to be encoded in dependence on an evaluation of a spatial masking between two or more signals to be encoded,
wherein the audio encoder (300) is configured to evaluate the spatial masking on the basis of the directional loudness maps associated with the two or more signals to be encoded.
44. The audio encoder (300) according to claim 43, wherein the audio encoder (300) is configured to evaluate a masking effect of a loudness contribution (132, 1321, 1322, 1351, 1352) associated with a first direction of a first signal to be encoded onto a loudness contribution (132, 1321, 1322, 1351, 1352) associated with a second direction of a second signal to be encoded.
45. The audio encoder (300) according to one of claims 32 to 44, wherein the audio encoder (300) comprises an audio analyzer (100) according to one of claims 1 to 24, wherein the loudness information (142, 1421, 1422, 142a, 142b) associated with different directions (121) forms the directional loudness map.
46. The audio encoder (300) according to one of claims 32 to 45,
wherein the audio encoder (300) is configured to adapt (340) a noise introduced by the encoder in dependence on the one or more directional loudness maps.
47. The audio encoder (300) according to claim 46,
wherein the audio encoder (300) is configured to use a deviation between a directional loudness map, which is associated with a given un-encoded input audio signal, and a directional loudness map achievable by an encoded version of the given input audio signal, as a criterion for the adaptation of the provision of the given encoded audio signal.
48. The audio encoder (300) according to one of claims 32 to 47,
wherein the audio encoder (300) is configured to activate and deactivate a joint coding tool in dependence on one or more directional loudness maps which represent loudness information (142, 1421, 1422, 142a, 142b) associated with a plurality of different directions (121) of the one or more signals to be encoded.
49. The audio encoder (300) according to one of claims 32 to 48,
wherein the audio encoder (300) is configured to determine one or more parameters of a joint coding tool in dependence on one or more directional loudness maps which represent loudness information (142, 1421, 1422, 142a, 142b) associated with a plurality of different directions (121) of the one or more signals to be encoded.
50. The audio encoder (300) according to one of claims 32 to 49, wherein the audio encoder (300) is configured to determine or estimate an influence of a variation of one or more control parameters controlling the provision of the one or more encoded audio signals (320) onto a directional loudness map of one or more encoded signals, and to adjust the one or more control parameters in dependence on the determination or estimation of the influence.
51. The audio encoder (300) according to one of claims 32 to 50,
wherein the audio encoder (300) is configured to obtain a direction component used for obtaining the one or more directional loudness maps using metadata representing position information of loudspeakers associated with the input audio signals (112, 1121, 1122, 1123, 112a, 112b).
52. An audio encoder (300) for encoding (310) an input audio content (112) comprising one or more input audio signals (112, 1121, 1122, 1123, 112a, 112b),
wherein the audio encoder (300) is configured to provide one or more encoded audio signals (320) on the basis of two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b), or on the basis of two or more signals derived therefrom (110, 1101, 1102, 110a, 110b), using a joint encoding (310) of two or more signals to be encoded jointly;
wherein the audio encoder (300) is configured to select (350) signals to be encoded jointly out of a plurality of candidate signals (110, 1101, 1102) or out of a plurality of pairs of candidate signals (110, 1101, 1102) in dependence on directional loudness maps which represent loudness information (142, 1421, 1422, 142a, 142b) associated with a plurality of different directions (121) of the candidate signals (110, 1101, 1102) or of the pairs of candidate signals (110, 1101, 1102).
53. The audio encoder (300) according to claim 52,
wherein the audio encoder (300) is configured to select (350) signals to be encoded jointly out of a plurality of candidate signals (110, 1101, 1102) or out of a plurality of pairs of candidate signals (110, 1101, 1102) in dependence on contributions of individual directional loudness maps of the candidate signals (110, 1101, 1102) to an overall directional loudness map or in dependence on contributions of directional loudness maps of the pairs of candidate signals (110, 1101, 1102) to an overall directional loudness map.
54. The audio encoder (300) according to claim 52 or claim 53,
wherein the audio encoder (300) is configured to determine a contribution of pairs of candidate signals (110, 1101, 1102) to the overall directional loudness map; and
wherein the audio encoder (300) is configured to choose one or more pairs of candidate signals (110, 1101, 1102) having a highest contribution to the overall directional loudness map for a joint encoding (310), or
wherein the audio encoder (300) is configured to choose one or more pairs of candidate signals (110, 1101, 1102) having a contribution to the overall directional loudness map which is larger than a predetermined threshold for a joint encoding (310).
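The pair selection of claims 52 to 54 can be sketched as follows (measuring a pair's contribution as the normalized element-wise-minimum overlap with the overall map is an assumption for illustration; the claims leave the contribution measure open):

```python
import numpy as np

def choose_pairs(pair_dlms, overall_dlm, n_pairs=1):
    """pair_dlms: dict mapping candidate signal pairs (i, j) to their
    directional loudness maps. Returns the n_pairs pairs whose maps
    contribute most to the overall directional loudness map and which
    are therefore chosen for joint encoding."""
    def contribution(dlm):
        # overlap of the pair's map with the overall map, relative to the
        # total loudness of the overall map (assumed similarity measure)
        return np.minimum(dlm, overall_dlm).sum() / overall_dlm.sum()
    ranked = sorted(pair_dlms, key=lambda p: contribution(pair_dlms[p]), reverse=True)
    return ranked[:n_pairs]
```

Alternatively, per claim 54, all pairs whose contribution exceeds a predetermined threshold could be selected instead of the n_pairs highest-ranking ones.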
55. The audio encoder (300) according to one of claims 52 to 54,
wherein the audio encoder (300) is configured to determine individual directional loudness maps of two or more candidate signals (110, 1101, 1102), and
wherein the audio encoder (300) is configured to compare the individual directional loudness maps of the two or more candidate signals (110, 1101, 1102), and
wherein the audio encoder (300) is configured to select (350) two or more of the candidate signals (110, 1101, 1102) for a joint encoding (310) in dependence on a result of the comparison.
56. The audio encoder (300) according to one of claims 52 to 55,
wherein the audio encoder (300) is configured to determine an overall directional loudness map using a downmixing of the input audio signals (112, 1121, 1122, 1123, 112a, 112b) or using a binauralization of the input audio signals (112, 1121, 1122, 1123, 112a, 112b).
57. An audio encoder (300) for encoding (310) an input audio content (112) comprising one or more input audio signals (112, 1121, 1122, 1123, 112a, 112b),
wherein the audio encoder (300) is configured to provide one or more encoded audio signals (320) on the basis of two or more input audio signals (112, 1121, 1122, 1123, 112a, 112b), or on the basis of two or more signals derived therefrom (110, 1101, 1102, 110a, 110b);
wherein the audio encoder (300) is configured to determine an overall directional loudness map on the basis of the input audio signals (112, 1121, 1122, 1123, 112a, 112b), and/or to determine one or more individual directional loudness maps associated with individual input audio signals (112, 1121, 1122, 1123, 112a, 112b); and
wherein the audio encoder (300) is configured to encode the overall directional loudness map and/or one or more individual directional loudness maps as a side information.
58. The audio encoder (300) according to claim 57,
wherein the audio encoder (300) is configured to determine the overall directional loudness map on the basis of the input audio signals (112, 1121, 1122, 1123, 112a, 112b) such that the overall directional loudness map represents loudness information (142, 1421, 1422, 142a, 142b) associated with the different directions (121) of an audio scene represented by the input audio signals (112, 1121, 1122, 1123, 112a, 112b).
59. The audio encoder (300) according to one of claims 57 to 58,
wherein the audio encoder (300) is configured to encode the overall directional loudness map in the form of a set of values associated with different directions (121); or
wherein the audio encoder (300) is configured to encode the overall directional loudness map using a center position value and a slope information; or
wherein the audio encoder (300) is configured to encode the overall directional loudness map in the form of a polynomial representation; or
wherein the audio encoder (300) is configured to encode the overall directional loudness map in the form of a spline representation.
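The first encoding option of claim 59 (a set of values associated with different directions) can be sketched with a simple uniform quantizer (the bit depth and the peak-normalized uniform quantization are assumptions for illustration):

```python
import numpy as np

def encode_dlm_as_values(dlm, n_bits=8):
    """Encode a directional loudness map as a set of quantized values, one
    per direction. Uniform quantization relative to the peak value is
    assumed; a non-silent map (peak > 0) is required."""
    peak = dlm.max()
    q = np.round(dlm / peak * (2 ** n_bits - 1)).astype(int)
    return q, peak  # quantized values plus the peak as scale side information

def decode_dlm(q, peak, n_bits=8):
    """Reconstruct the directional loudness map from the side information."""
    return q / (2 ** n_bits - 1) * peak
```

The other options of claim 59 (center position plus slope, polynomial, or spline representations) would replace the per-direction value set with a parametric description of the map's shape.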
60. The audio encoder (300) according to one of claims 57 to 59,
wherein the audio encoder (300) is configured to encode one downmix signal obtained on the basis of a plurality of input audio signals (112, 1121, 1122, 1123, 112a, 112b) and an overall directional loudness map; or
wherein the audio encoder (300) is configured to encode a plurality of signals, and to encode individual directional loudness maps of a plurality of signals which are encoded; or
wherein the audio encoder (300) is configured to encode an overall directional loudness map, a plurality of signals and parameters describing contributions of the signals which are encoded to the overall directional loudness map.
61. An audio decoder (400) for decoding (410) an encoded audio content (420),
wherein the audio decoder (400) is configured to receive an encoded representation (420) of one or more audio signals and to provide a decoded representation (432) of the one or more audio signals;
wherein the audio decoder (400) is configured to receive an encoded directional loudness map information (424) and to decode the encoded directional loudness map information (424), to obtain one or more directional loudness maps (414); and
wherein the audio decoder (400) is configured to reconstruct (430) an audio scene using the decoded representation (432) of the one or more audio signals and using the one or more directional loudness maps.
62. The audio decoder (400) according to claim 61, wherein the audio decoder (400) is configured to obtain output signals such that one or more directional loudness maps associated with the output signals approximate or equal one or more target directional loudness maps,
wherein the one or more target directional loudness maps are based on the one or more decoded directional loudness maps (414) or are equal to the one or more decoded directional loudness maps (414).
63. The audio decoder (400) according to claim 61 or claim 62,
wherein the audio decoder (400) is configured to receive
- one encoded downmix signal and an overall directional loudness map; or
- a plurality of encoded audio signals (422), and individual directional loudness maps of the plurality of encoded signals; or
- an overall directional loudness map, a plurality of encoded audio signals (422) and parameters describing contributions of the encoded audio signals (422) to the overall directional loudness map; and
wherein the audio decoder (400) is configured to provide the output signals on the basis thereof.
64. A format converter (500) for converting (510) a format of an audio content (520), which represents an audio scene, from a first format to a second format,
wherein the format converter (500) is configured to provide a representation (530) of the audio content in the second format on the basis of the representation of the audio content in the first format;
wherein the format converter (500) is configured to adjust (540) a complexity of the format conversion in dependence on contributions of input audio signals (112, 1121, 1122, 1123, 112a, 112b) of the first format to an overall directional loudness map of the audio scene.
65. The format converter (500) according to claim 64,
wherein the format converter (500) is configured to receive a directional loudness map information, and to obtain the overall directional loudness map and/or one or more directional loudness maps on the basis thereof.
66. The format converter (500) according to claim 65,
wherein the format converter (500) is configured to derive the overall directional loudness map from the one or more directional loudness maps.
67. The format converter (500) according to one of claims 64 to 66,
wherein the format converter (500) is configured to compute or estimate a contribution of a given input audio signal to the overall directional loudness map of the audio scene; and
wherein the format converter (500) is configured to decide whether to consider the given input audio signal in the format conversion in dependence on a computation or estimation of the contribution.
68. An audio decoder (400) for decoding (410) an encoded audio content (420),
wherein the audio decoder (400) is configured to receive an encoded representation (420) of one or more audio signals and to provide a decoded representation (432) of the one or more audio signals;
wherein the audio decoder (400) is configured to reconstruct (430) an audio scene using the decoded representation (432) of the one or more audio signals;
wherein the audio decoder (400) is configured to adjust (440) a decoding complexity in dependence on contributions of encoded signals to an overall directional loudness map of a decoded audio scene.
69. The audio decoder (400) according to claim 68,
wherein the audio decoder (400) is configured to receive an encoded directional loudness map information (424) and to decode the encoded directional loudness map information (424), to obtain the overall directional loudness map and/or one or more directional loudness maps.
70. The audio decoder (400) according to claim 69,
wherein the audio decoder (400) is configured to derive the overall directional loudness map from the one or more directional loudness maps.
71. The audio decoder (400) according to one of claims 68 to 70,
wherein the audio decoder (400) is configured to compute or estimate a contribution of a given encoded signal to the overall directional loudness map of the decoded audio scene; and
wherein the audio decoder (400) is configured to decide whether to decode the given encoded signal in dependence on a computation or estimation of the contribution.
72. A renderer (600) for rendering an audio content,
wherein the renderer (600) is configured to reconstruct (640) an audio scene on the basis of one or more input audio signals (112, 1121, 1122, 1123, 112a, 112b);
wherein the renderer (600) is configured to adjust (650) a rendering complexity in dependence on contributions of the input audio signals (112, 1121, 1122, 1123, 112a, 112b) to an overall directional loudness map (142) of a rendered audio scene (642).
73. The renderer (600) according to claim 72,
wherein the renderer (600) is configured to obtain a directional loudness map information (142), and to obtain the overall directional loudness map and/or one or more directional loudness maps on the basis thereof.
74. The renderer (600) according to claim 73,
wherein the renderer (600) is configured to derive the overall directional loudness map from the one or more directional loudness maps.
75. The renderer (600) according to one of claims 72 to 74,
wherein the renderer (600) is configured to compute or estimate a contribution of a given input audio signal to the overall directional loudness map of the audio scene; and
wherein the renderer (600) is configured to decide whether to consider the given input audio signal in the rendering in dependence on a computation or estimation of the contribution.
76. A method (1000) for analyzing an audio signal, the method comprising:
obtaining (1100) a plurality of weighted spectral domain representations on the basis of one or more spectral domain representations of two or more input audio signals,
wherein values of the one or more spectral domain representations are weighted (1200) in dependence on different directions of audio components in two or more input audio signals, to obtain the plurality of weighted spectral domain representations; and
obtaining (1300) loudness information associated with the different directions on the basis of the plurality of weighted spectral domain representations as an analysis result.
77. A method (2000) for evaluating a similarity of audio signals, the method comprising:
obtaining (2100) a first loudness information associated with different directions on the basis of a first set of two or more input audio signals, and
comparing (2200) the first loudness information with a second loudness information associated with the different panning directions and with a set of two or more reference audio signals, in order to obtain (2300) a similarity information describing a similarity between the first set of two or more input audio signals and the set of two or more reference audio signals.
78. A method (3000) for encoding an input audio content comprising one or more input audio signals,
wherein the method comprises providing (3100) one or more encoded audio signals on the basis of one or more input audio signals, or one or more signals derived therefrom; and
wherein the method comprises adapting (3200) the provision of the one or more encoded audio signals in dependence on one or more directional loudness maps which represent loudness information associated with a plurality of different directions of the one or more signals to be encoded.
79. A method (4000) for encoding an input audio content comprising one or more input audio signals,
wherein the method comprises providing (4100) one or more encoded audio signals on the basis of two or more input audio signals, or on the basis of two or more signals derived therefrom, using a joint encoding of two or more signals to be encoded jointly; and
wherein the method comprises selecting (4200) signals to be encoded jointly out of a plurality of candidate signals or out of a plurality of pairs of candidate signals in dependence on directional loudness maps which represent loudness information associated with a plurality of different directions of the candidate signals or of the pairs of candidate signals.
80. A method (5000) for encoding an input audio content comprising one or more input audio signals,
wherein the method comprises providing (5100) one or more encoded audio signals on the basis of two or more input audio signals, or on the basis of two or more signals derived therefrom;
wherein the method comprises determining (5200) an overall directional loudness map on the basis of the input audio signals, and/or determining one or more individual directional loudness maps associated with individual input audio signals; and
wherein the method comprises encoding (5300) the overall directional loudness map and/or one or more individual directional loudness maps as a side information.
81. A method (6000) for decoding an encoded audio content,
wherein the method comprises receiving (6100) an encoded representation of one or more audio signals and providing (6200) a decoded representation of the one or more audio signals;
wherein the method comprises receiving (6300) an encoded directional loudness map information and decoding (6400) the encoded directional loudness map information, to obtain (6500) one or more directional loudness maps; and
wherein the method comprises reconstructing (6600) an audio scene using the decoded representation of the one or more audio signals and using the one or more directional loudness maps.
82. A method (7000) for converting (7100) a format of an audio content, which represents an audio scene, from a first format to a second format,
wherein the method comprises providing a representation of the audio content in the second format on the basis of the representation of the audio content in the first format;
wherein the method comprises adjusting (7200) a complexity of the format conversion in dependence on contributions of input audio signals of the first format to an overall directional loudness map of the audio scene.
83. A method (8000) for decoding an encoded audio content,
wherein the method comprises receiving (8100) an encoded representation of one or more audio signals and providing (8200) a decoded representation of the one or more audio signals;
wherein the method comprises reconstructing (8300) an audio scene using the decoded representation of the one or more audio signals;
wherein the method comprises adjusting (8400) a decoding complexity in dependence on contributions of encoded signals to an overall directional loudness map of a decoded audio scene.
84. A method (9000) for rendering an audio content,
wherein the method comprises reconstructing (9100) an audio scene on the basis of one or more input audio signals;
wherein the method comprises adjusting (9200) a rendering complexity in dependence on contributions of the input audio signals to an overall directional loudness map of a rendered audio scene.
85. A computer program having a program code for performing, when running on a computer, a method according to one of claims 76 to 84.
86. An encoded audio representation, comprising
an encoded representation of one or more audio signals; and
an encoded directional loudness map information.
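The analysis underlying the method claims (claim 76 in particular) can be sketched as follows. This is a minimal, hedged illustration rather than the claimed implementation: the function name, the level-based panning index, the Gaussian direction-selection windows, and the parameters `n_directions` and `width` are all assumptions; an actual analyzer would operate on perceptually grouped spectral bands and apply a proper loudness model.

```python
import numpy as np

def directional_loudness_map(left_spec, right_spec, n_directions=7, width=0.2):
    """Illustrative sketch: derive a per-band panning index from two
    spectral-domain input signals, weight each spectral bin with
    direction-dependent Gaussian windows, and sum the weighted magnitudes
    per direction to obtain loudness information associated with
    different directions (the "directional loudness map")."""
    l = np.abs(left_spec)
    r = np.abs(right_spec)
    eps = 1e-12
    # Level-based panning index per band in [-1, 1] (-1 = left, +1 = right)
    pan = (r - l) / (l + r + eps)
    # Grid of candidate panning directions to analyze
    directions = np.linspace(-1.0, 1.0, n_directions)
    # Gaussian direction windows: each bin contributes mostly to the
    # direction bucket closest to its panning index
    weights = np.exp(-((pan[None, :] - directions[:, None]) ** 2)
                     / (2 * width ** 2))
    # Loudness contribution per direction: weighted sum of bin magnitudes
    return weights @ (l + r)
```

A similarity measure in the spirit of claim 77 could then compare two such maps, e.g. via a simple distance between the map of a signal under test and that of a reference.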
| # | Name | Date |
|---|---|---|
| 1 | 202137018985-FORM 3 [14-03-2024(online)].pdf | 2024-03-14 |
| 2 | 202137018985-PatentCertificate07-03-2024.pdf | 2024-03-07 |
| 3 | 202137018985-IntimationOfGrant07-03-2024.pdf | 2024-03-07 |
| 4 | 202137018985-Information under section 8(2) [18-01-2024(online)].pdf | 2024-01-18 |
| 5 | 202137018985-Information under section 8(2) [06-12-2023(online)].pdf | 2023-12-06 |
| 6 | 202137018985-FORM 3 [19-09-2023(online)].pdf | 2023-09-19 |
| 7 | 202137018985-Information under section 8(2) [19-09-2023(online)].pdf | 2023-09-19 |
| 8 | 202137018985-Information under section 8(2) [23-06-2023(online)].pdf | 2023-06-23 |
| 9 | 202137018985-FORM 3 [10-03-2023(online)].pdf | 2023-03-10 |
| 10 | 202137018985-Information under section 8(2) [10-03-2023(online)].pdf | 2023-03-10 |
| 11 | 202137018985-Information under section 8(2) [29-12-2022(online)].pdf | 2022-12-29 |
| 12 | 202137018985-CLAIMS [30-11-2022(online)].pdf | 2022-11-30 |
| 13 | 202137018985-FER_SER_REPLY [30-11-2022(online)].pdf | 2022-11-30 |
| 14 | 202137018985-FORM 3 [08-09-2022(online)].pdf | 2022-09-08 |
| 15 | 202137018985-FORM 4(ii) [24-08-2022(online)].pdf | 2022-08-24 |
| 16 | 202137018985-Information under section 8(2) [23-08-2022(online)].pdf | 2022-08-23 |
| 17 | 202137018985-FORM 3 [16-03-2022(online)].pdf | 2022-03-16 |
| 18 | 202137018985-FER.pdf | 2022-03-04 |
| 19 | SearchE_04-03-2022.pdf | 2022-03-04 |
| 20 | 202137018985.pdf | 2021-10-19 |
| 21 | 202137018985-Information under section 8(2) [15-09-2021(online)].pdf | 2021-09-15 |
| 22 | 202137018985-FORM-26 [12-07-2021(online)].pdf | 2021-07-12 |
| 23 | 202137018985-Information under section 8(2) [06-07-2021(online)].pdf | 2021-07-06 |
| 24 | 202137018985-FORM 18 [29-04-2021(online)].pdf | 2021-04-29 |
| 25 | 202137018985-COMPLETE SPECIFICATION [24-04-2021(online)].pdf | 2021-04-24 |
| 26 | 202137018985-DRAWINGS [24-04-2021(online)].pdf | 2021-04-24 |
| 27 | 202137018985-FIGURE OF ABSTRACT [24-04-2021(online)].pdf | 2021-04-24 |
| 28 | 202137018985-DECLARATION OF INVENTORSHIP (FORM 5) [24-04-2021(online)].pdf | 2021-04-24 |
| 29 | 202137018985-FORM 1 [24-04-2021(online)].pdf | 2021-04-24 |
| 30 | 202137018985-STATEMENT OF UNDERTAKING (FORM 3) [24-04-2021(online)].pdf | 2021-04-24 |