Audio Decoder, Apparatus For Generating Encoded Audio Output Data And Methods Permitting Initializing A Decoder

Abstract: An audio decoder decodes a bit stream of encoded audio data, wherein the bit stream of encoded audio data represents a sequence of audio sample values and comprises a plurality of frames, wherein each frame includes associated encoded audio sample values. The audio decoder comprises a determiner configured to determine whether a frame of the encoded audio data is a special frame comprising encoded audio sample values associated with the special frame and additional information, wherein the additional information comprise encoded audio sample values of a number of frames preceding the special frame, wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the special frame, wherein the number of preceding frames is sufficient to initialize the decoder to be in a position to decode the audio sample values associated with the special frame if the special frame is the first frame upon start-up of the decoder. The decoder comprises an initializer configured to initialize the decoder, wherein initializing the decoder comprises decoding the encoded audio sample values included in the additional information before decoding the encoded audio sample values associated with the special frame.

Patent Information

Application #

Filing Date

25 February 2021

Publication Number

04/2022

Publication Type

INA

Invention Field

ELECTRONICS

Status

Email

iprdel@lakshmisri.com

Parent Application

Patent Number

Legal Status

Grant Date

2023-11-24

Renewal Date

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.

Hansastraße 27c 80686 München, Germany

Inventors

1. FISCHER, Daniel

Nürnberger Straße 75 90762 Fürth, Germany

2. CZELHAN, Bernd

In der Schwärze 8 91230 Happurg, Germany

3. NEUENDORF, Max

Paradiesstraße 20 90459 Nürnberg, Germany

4. RETTELBACH, Nikolaus

Spessartstr. 38 90427 Nürnberg, Germany

5. HOFMANN, Ingo

Campestraße 21 90419 Nürnberg, Germany

6. FUCHS, Harald

Amselstr. 5 91341 Roettenbach, Germany

7. DÖHLA, Stefan

Saidelsteig 61 91058 Erlangen, Germany

8. FÄRBER, Nikolaus

Thomas-Dehler-Strasse 28 91052 Erlangen, Germany

Specification

Claims

1. Audio decoder (60) for decoding a bit stream of encoded audio data, wherein the bit stream of encoded audio data represents a sequence of audio sample values and comprises a plurality of frames (40), wherein each frame (40) includes associated encoded audio sample values, the audio decoder (60) comprising:

a determiner (130) configured to determine whether a frame of the encoded audio data is a special frame (42, 80) comprising encoded audio sample values associated with the special frame (42, 80) and additional information (82), wherein the additional information (82) comprise encoded audio sample values of a number of frames (86) preceding the special frame, wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the special frame, wherein the number of preceding frames is sufficient to initialize the decoder (60) to be in a position to decode the audio sample values associated with the special frame (42, 80) if the special frame is the first frame upon start-up of the decoder; and

an initializer configured to initialize the decoder (60) if the determiner determines that the frame is a special frame, wherein initializing the decoder comprises decoding the encoded audio sample values included in the additional information before decoding the encoded audio sample values associated with the special frame (42, 80).

2. Audio decoder (60) of claim 1, wherein the initializer is configured to switch the audio decoder (60) from a current codec configuration to a different codec configuration (84) if the determiner (130) determines that the frame is a special frame (42, 80) and if the audio sample values of the special frame have been encoded using the different codec configuration.

3. Decoder of claim 2, configured to decode the special frame (42, 80) using the current codec configuration and to discard the additional information if the determiner (130) determines that the frame is a special frame (42, 80) and if the audio sample values of the special frame have been encoded using the current codec configuration.

4. Audio decoder of claim 2, wherein the additional information comprise information on the codec configuration (84) used for encoding the audio sample values associated with the special frame (42, 80), wherein the determiner is configured to determine whether the codec configuration of the additional information is different from the current codec configuration.

5. Audio decoder (60) of one of claims 2 to 4, comprising a crossfader (318) configured to perform crossfading between a plurality of output sample values obtained using the current codec configuration and a plurality of output sample values obtained by decoding the encoded audio sample values associated with the special frame (42, 80).

6. Audio decoder of claim 5, wherein the crossfader (318) is configured to perform crossfading of output sample values obtained by flushing the decoder (60) in the current codec configuration and output sample values obtained by decoding the encoded audio sample values associated with the special frame (42, 80).

7. Audio decoder of one of claims 1 to 6, wherein an earliest frame of the number of frames (86) comprised in the additional information (82) is not time-differentially encoded or entropy encoded relative to any frame previous to the earliest frame and wherein the special frame (42, 80) is not time-differentially encoded or entropy encoded relative to any frame previous to the earliest frame of the number of frames preceding the special frame (42, 80) or relative to any frame previous to the special frame (42, 80).

8. Audio decoder of one of claims 1 to 7, wherein the special frame (42, 80) comprises the additional information as an extension payload and wherein the determiner is configured to evaluate the extension payload of the special frame (42, 80).

9. Apparatus (100; 12, 14, 16, 18) for generating a bit stream of encoded audio data representing a sequence of audio sample values of an audio signal (10), wherein the bit stream of encoded audio data comprise a plurality of frames, wherein each frame includes associated encoded audio sample values, wherein the apparatus (100; 12, 14, 16, 18) comprises:

a special frame provider configured to provide at least one of the frames as a special frame (42, 80), the special frame (42, 80) comprising encoded audio sample values associated with the special frame (42, 80) and additional information (82), wherein the additional information (82) comprise encoded audio sample values of a number of frames (86) preceding the special frame, wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the special frame, and wherein the number of preceding frames is sufficient to initialize a decoder (60) to be in a position to decode the audio sample values associated with the special frame (42, 80) if the special frame is the first frame upon start-up of the decoder; and

an output (112) configured to output the bit stream of encoded audio data (54, 102).

10. Apparatus (100; 12, 14, 16, 18) of claim 9, wherein the additional information comprise information on the codec configuration (84) used for encoding the audio sample values associated with the special frame (42, 80).

11. Apparatus (100; 12, 14, 16, 18) of claim 9 or 10, wherein the encoded audio data comprise a plurality of segments (30), wherein each segment is associated with one of a plurality of portions of the sequence of audio sample values and comprises a plurality of frames (40), wherein the special frame adder is configured to add a special frame (42, 80) at the beginning of each segment (30).

12. Apparatus (100) of one of claims 9 or 10, wherein the encoded audio data (54, 102) comprise a plurality of segments (44, 46, 48), wherein each segment (44, 46, 48) is associated with one of a plurality of portions of the sequence of audio sample values and comprises a plurality of the frames (40), the apparatus (100) comprising:

a segment provider (104) configured to provide segments (44, 46, 48) associated with different portions of the sequence of audio sample values and encoded by different codec configurations, wherein the special frame provider is configured to provide a first frame (42, 80) of at least one of the segments as the special frame (42, 80); and

a generator (52, 110) configured to generate the audio output data by arranging the at least one of the segments (44, 46, 48) following another one of the segments (44, 46, 48).

13. Apparatus of claim 12, wherein the segment provider (100) is configured to select a codec configuration for each segment based on a control signal.

14. Apparatus of claim 12 or 13, wherein the segment provider (100) is configured to provide m encoded versions (22, 24, 26, 28) of the sequence of audio sample values, with m ≥ 2, wherein the m encoded versions are encoded using different codec configurations, wherein each encoded version comprises a plurality of segments (30) representing the plurality of portions of the sequence of audio sample values, wherein the special frame provider is configured to provide a special frame (42, 80) at the beginning of each of the segments.

15. Apparatus of claim 14, wherein the segment provider (100) comprises a plurality of encoders (12, 14, 16, 18), each configured to encode at least in part the audio signal according to one of the plurality of different codec configurations.

16. Apparatus of claim 15, wherein the segment provider comprises a memory storing the m encoded versions of the sequence of audio sample values.

17. Apparatus of one of claims 12 to 15, wherein the special frame provider (100) is configured to provide the additional information as an extension payload of the special frame (42, 80).

18. Method for decoding a bit stream of encoded audio data, wherein the bit stream of encoded audio data represents a sequence of audio sample values and comprises a plurality of frames (40), wherein each frame (40) includes associated encoded audio sample values, comprising:

determining whether a frame of the encoded audio data is a special frame (42, 80) comprising encoded audio sample values associated with the special frame (42, 80) and additional information (82), wherein the additional information (82) comprise encoded audio sample values of a number of frames (86) preceding the special frame, wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the special frame, wherein the number of preceding frames is sufficient to initialize a decoder (60) to be in a position to decode the audio sample values associated with the special frame (42, 80) if the special frame is the first frame upon start-up of the decoder; and

initializing the decoder (60) if it is determined that the frame is a special frame, wherein the initializing comprises decoding the encoded audio sample values included in the additional information before decoding the encoded audio sample values associated with the special frame (42, 80).

19. Method of claim 18, comprising switching the audio decoder (60) from a current codec configuration to a different codec configuration (84) if it is determined that the frame is a special frame (42, 80) and if the audio sample values of the special frame have been encoded using the different codec configuration.

20. Method of claim 19, wherein the bit stream of audio data comprises a first number of frames encoded using a first codec configuration and a second number of frames following the first number of frames and encoded using a second codec configuration, wherein the first frame of the second number of frames is the special frame.

21. Method of one of claims 18 to 20, wherein the additional information comprise information on the codec configuration (84) used for encoding the audio sample values associated with the special frame (42, 80), the method comprising determining whether the codec configuration of the additional information is different from the current codec configuration using which encoded audio sample values of frames in the bit stream, which precede the special frame, are encoded.

22. Method for generating a bit stream of encoded audio data representing a sequence of audio sample values of an audio signal (10), wherein the bit stream of encoded audio data comprise a plurality of frames, wherein each frame includes associated encoded audio sample values, comprising:

providing at least one of the frames as a special frame (42, 80), the special frame (42, 80) comprising encoded audio sample values associated with the special frame (42, 80) and additional information (82), wherein the additional information (82) comprise encoded audio sample values of a number of frames (86) preceding the special frame, wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the special frame, and wherein the number of preceding frames is sufficient to initialize a decoder (60) to be in a position to decode the audio sample values associated with the special frame (42, 80) if the special frame is the first frame upon start-up of the decoder; and

generating the bit stream by concatenating the special frame (42, 80) and the other frames of the plurality of frames.

23. Method of claim 22, wherein the additional information comprise information on the codec configuration (84) used for encoding the audio sample values associated with the special frame (42, 80).

24. Method of claim 22 or 23, comprising providing segments (44, 46, 48) associated with different portions of the sequence of audio sample values and encoded by different codec configurations, wherein a first frame (42, 80) of at least one of the segments is provided as the special frame (42, 80).

25. Computer program for performing, when running on a computer or a processor, the method of one of claims 18 to 24.

Audio Decoder, Apparatus for Generating Encoded Audio Output Data and Methods Permitting Initializing a Decoder

Description

The present invention is related to audio encoding/decoding and in particular to an approach of encoding and decoding data which permits initializing a decoder, as may be required when switching between different codec configurations.

Embodiments of the invention may be applied to scenarios in which properties of transmission channels may vary widely depending on access technology, such as DSL, WiFi, 3G, LTE and the like. Mobile phone reception may fade indoors or in rural areas. The quality of wireless internet connections strongly depends on the distance to the base station and access technology, leading to fluctuations of the bitrate. The available bitrate per user may also change with the number of clients connected to one base station.

It is the object of the invention to provide for a concept which permits delivery of audio content in a flexible manner.

According to the invention, this object is achieved by an audio decoder according to claim 1, an apparatus for generating encoded audio output data according to claim 9, a method for decoding audio input data according to claim 18, a method for generating encoded audio data according to claim 22, and a computer program according to claim 25.

Embodiments of the invention provide an audio decoder for decoding a bit stream of encoded audio data, wherein the bit stream of encoded audio data represents a sequence of audio sample values and comprises a plurality of frames, wherein each frame includes associated encoded audio sample values, the audio decoder comprising:

a determiner configured to determine whether a frame of the encoded audio data is a special frame comprising encoded audio sample values associated with the special frame and additional information, wherein the additional information comprise encoded audio sample values of a number of frames preceding the special frame, wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the special frame, wherein the number of preceding frames is sufficient to initialize the decoder to be in a position to decode the audio sample values associated with the special frame if the special frame is the first frame upon start-up of the decoder; and

an initializer configured to initialize the decoder if the determiner determines that the frame is a special frame, wherein initializing the decoder comprises decoding the encoded audio sample values included in the additional information before decoding the encoded audio sample values associated with the special frame.

Embodiments of the invention provide an apparatus for generating a bit stream of encoded audio data representing a sequence of audio sample values of an audio signal, wherein the bit stream of encoded audio data comprise a plurality of frames, wherein each frame includes associated encoded audio sample values, wherein the apparatus comprises:

a special frame provider configured to provide at least one of the frames as a special frame, the special frame comprising encoded audio sample values associated with the special frame and additional information, wherein the additional information comprise encoded audio sample values of a number of frames preceding the special frame, wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the special frame, and wherein the number of preceding frames is sufficient to initialize a decoder to be in a position to decode the audio sample values associated with the special frame if the special frame is the first frame upon start-up of the decoder; and

an output configured to output the bit stream of encoded audio data.

Embodiments of the invention provide a method for decoding a bit stream of encoded audio data, wherein the bit stream of encoded audio data represents a sequence of audio sample values and comprises a plurality of frames, wherein each frame includes associated encoded audio sample values, comprising:

determining whether a frame of the encoded audio data is a special frame comprising encoded audio sample values associated with the special frame and additional information, wherein the additional information comprise encoded audio sample values of a number of frames preceding the special frame, wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the special frame, wherein the number of preceding frames is sufficient to initialize a decoder to be in a position to decode the audio sample values associated with the special frame if the special frame is the first frame upon start-up of the decoder; and

initializing the decoder if it is determined that the frame is a special frame, wherein the initializing comprises decoding the encoded audio sample values included in the additional information before decoding the encoded audio sample values associated with the special frame.

Embodiments of the invention provide a method for generating a bit stream of encoded audio data representing a sequence of audio sample values of an audio signal, wherein the bit stream of encoded audio data comprise a plurality of frames, wherein each frame includes associated encoded audio sample values, comprising:

providing at least one of the frames as a special frame, the special frame comprising encoded audio sample values associated with the special frame and additional information, wherein the additional information comprise encoded audio sample values of a number of frames preceding the special frame, wherein the encoded audio sample values of the preceding frames are encoded using the same codec configuration as the special frame, and wherein the number of preceding frames is sufficient to initialize a decoder to be in a position to decode the audio sample values associated with the special frame if the special frame is the first frame upon start-up of the decoder; and

generating the bit stream by concatenating the special frame and the other frames of the plurality of frames.

Embodiments of the invention are based on the finding that immediate replay of a bit stream of encoded audio data representing a sequence of audio sample values of an audio signal and comprising a plurality of frames can be achieved if one of the frames is provided as a special frame including encoded audio sample values associated with preceding frames, which are necessary to initialize a decoder to be in a position to decode the encoded audio sample values associated with the special frame. The number of frames necessary to initialize the decoder accordingly depends on the codec configuration used and is known for the codec configurations. Embodiments of the invention are based on the finding that switching between different codec configurations can be achieved in a beneficial manner if such a special frame is arranged at a position where switching between the codec configurations shall take place. The special frame may not only include encoded audio sample values associated with the special frame, but further information that allows switching between codec configurations and immediate replay upon switching. In embodiments of the invention, the apparatus and method for generating encoded audio output data and the audio encoder are configured to prepare encoded audio data in such a manner that immediate replay upon switching between codec configurations can take place at the decoder side. In embodiments of the invention, such audio data generated and output at the encoder side are received as audio input data at the decoder side and permit immediate replay at the decoder side. In embodiments of the invention, immediate replay is permitted at decoder side upon switching between different codec configurations at the decoder side.
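The start-up behaviour described above can be illustrated with a minimal Python sketch. The names `Frame`, `SpecialFrame`, `ToyDecoder` and `start_up`, the configuration label, and the stand-in "decoding" arithmetic are illustrative assumptions and not part of any real codec; the toy decoder only models that its output depends on state left behind by the previously decoded frame.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    payload: List[float]  # stand-in for encoded audio sample values


@dataclass
class SpecialFrame(Frame):
    config: str = "cfg-A"  # hypothetical codec configuration label
    pre_roll: List[Frame] = field(default_factory=list)  # copies of preceding frames


class ToyDecoder:
    """Toy stateful decoder: each output depends on the previous frame too."""

    def __init__(self, config: str) -> None:
        self.config = config
        self.prev: Optional[List[float]] = None  # state carried between frames

    def decode(self, frame: Frame) -> Optional[List[float]]:
        prev, self.prev = self.prev, frame.payload
        if prev is None:
            return None  # decoder not initialized yet: no valid output
        return [a + b for a, b in zip(prev, frame.payload)]


def start_up(frame: SpecialFrame) -> Optional[List[float]]:
    """Initialize a fresh decoder from the pre-roll, then decode the special frame."""
    decoder = ToyDecoder(frame.config)
    for pre in frame.pre_roll:
        decoder.decode(pre)  # output of pre-roll frames is discarded
    return decoder.decode(frame)
```

In this sketch, a special frame with one pre-roll frame yields valid output as the very first frame, whereas feeding the same frame to a fresh `ToyDecoder` without the pre-roll yields no output.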

In embodiments of the invention, the initializer is configured to switch the audio decoder from a current codec configuration to a different codec configuration if the determiner determines that the frame is a special frame and if the audio sample values of the special frame have been encoded using the different codec configuration.

In embodiments of the invention, the decoder is configured to decode the special frame using the current codec configuration and to discard the additional information if the determiner determines that the frame is a special frame and if the audio sample values of the special frame have been encoded using the current codec configuration.

In embodiments of the invention, the additional information comprise information on the codec configuration used for encoding the audio sample values associated with the special frame, wherein the determiner is configured to determine whether the codec configuration of the additional information is different from the current codec configuration.
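The three cases just described (ordinary frame, special frame with a different codec configuration, special frame with the current codec configuration) can be sketched as a small decision function. The `AdditionalInfo` type, its fields, and the returned action labels are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdditionalInfo:
    config: str          # hypothetical codec configuration identifier
    pre_roll_count: int  # number of pre-roll frames carried in the special frame


def handle_frame(current_config: str, info: Optional[AdditionalInfo]) -> str:
    """Return the action a decoder might take for one incoming frame."""
    if info is None:
        return "decode"  # ordinary frame: no additional information present
    if info.config != current_config:
        # Different configuration: switch, decode the pre-roll, then the frame.
        return "switch_and_initialize"
    # Same configuration: the decoder is already running, pre-roll is not needed.
    return "discard_info_and_decode"
```

For example, `handle_frame("cfg-A", AdditionalInfo("cfg-B", 1))` selects the switching path, while the same call with `"cfg-A"` in the additional information selects the discard path.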

In embodiments of the invention, the audio decoder comprises a crossfader configured to perform crossfading between a plurality of output sample values obtained using the current codec configuration and a plurality of output sample values obtained by decoding the encoded audio sample values associated with the special frame. In embodiments of the invention, the crossfader is configured to perform crossfading of output sample values obtained by flushing the decoder in the current codec configuration and output sample values obtained by decoding the encoded audio sample values associated with the special frame.

In embodiments of the invention, an earliest frame of the number of frames comprised in the additional information is not time-differentially encoded or entropy encoded relative to any frame previous to the earliest frame, and the special frame is not time-differentially encoded or entropy encoded relative to any frame previous to the earliest frame of the number of frames preceding the special frame or relative to any frame previous to the special frame.
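As an illustration of the crossfading step, the following Python sketch blends output samples obtained from the current codec configuration (for example by flushing the decoder) with output samples decoded from the special frame. The linear ramp is an assumption made here for simplicity; the text does not prescribe a particular fade shape, and the function name `crossfade` is hypothetical.

```python
from typing import List


def crossfade(old: List[float], new: List[float]) -> List[float]:
    """Linearly fade out `old` while fading in `new` over equally long blocks."""
    if len(old) != len(new):
        raise ValueError("both blocks must have the same number of samples")
    n = len(old)
    out = []
    for i in range(n):
        gain_new = (i + 0.5) / n  # rises from near 0 to near 1 across the block
        out.append((1.0 - gain_new) * old[i] + gain_new * new[i])
    return out
```

Because the two gains always sum to one, identical input blocks pass through unchanged, which is the desired behaviour when both configurations decode the same underlying signal.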

In embodiments of the invention, the special frame comprises the additional information as an extension payload and the determiner is configured to evaluate the extension payload of the special frame. In embodiments of the invention, the additional information comprise information on the codec configuration used for encoding the audio sample values associated with the special frame.

In embodiments of the invention, the encoded audio data comprise a plurality of segments, wherein each segment is associated with one of a plurality of portions of the sequence of audio sample values and comprises a plurality of frames, wherein the special frame adder is configured to add a special frame at the beginning of each segment.

In embodiments of the invention, the encoded audio data comprise a plurality of segments, wherein each segment is associated with one of a plurality of portions of the sequence of audio sample values and comprises a plurality of the frames, wherein the apparatus for generating a bit stream of encoded audio data comprises a segment provider configured to provide segments associated with different portions of the sequence of audio sample values and encoded by different codec configurations, wherein the special frame provider is configured to provide a first frame of at least one of the segments as the special frame; and a generator configured to generate the audio output data by arranging the at least one of the segments following another one of the segments. In embodiments of the invention, the segment provider is configured to select a codec configuration for each segment based on a control signal. In embodiments of the invention, the segment provider is configured to provide m encoded versions of the sequence of audio sample values, with m ≥ 2, wherein the m encoded versions are encoded using different codec configurations, wherein each encoded version comprises a plurality of segments representing the plurality of portions of the sequence of audio sample values, wherein the special frame provider is configured to provide a special frame at the beginning of each of the segments.

In embodiments of the invention, the segment provider comprises a plurality of encoders, each configured to encode at least in part the audio signal according to one of the plurality of different codec configurations. In embodiments of the invention, the segment provider comprises a memory storing the m encoded versions of the sequence of audio sample values.
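The interplay of segment provider, control signal and generator can be sketched as follows. Frames are represented by plain string labels, the configuration names are made up, and the functions `make_versions` and `generate_stream` are hypothetical helpers; the sketch only shows how segments from m stored versions might be arranged one after another, each beginning with a special frame.

```python
from typing import Dict, List, Sequence

Segment = List[str]  # a segment is modelled as a list of frame labels


def make_versions(num_segments: int, frames_per_segment: int,
                  configs: Sequence[str]) -> Dict[str, List[Segment]]:
    """Build one labelled version per codec configuration (stand-in for encoders)."""
    versions: Dict[str, List[Segment]] = {}
    for cfg in configs:
        segments = []
        for s in range(num_segments):
            first = s * frames_per_segment + 1
            seg = [f"{cfg}:AU{first + i}" for i in range(frames_per_segment)]
            seg[0] = "special:" + seg[0]  # first frame of each segment is special
            segments.append(seg)
        versions[cfg] = segments
    return versions


def generate_stream(versions: Dict[str, List[Segment]],
                    control: Sequence[str]) -> List[str]:
    """Pick, per segment index, the version named by the control signal."""
    stream: List[str] = []
    for index, cfg in enumerate(control):
        stream.extend(versions[cfg][index])
    return stream
```

With two versions named `low` and `high` and a control signal `["low", "high"]`, the generated stream switches configuration exactly at a segment boundary, where a special frame is available.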

In embodiments of the invention, the additional information are in the form of an extension payload of the special frame.

In embodiments of the invention, the method of decoding comprises switching the audio decoder from a current codec configuration to a different codec configuration if it is determined that the frame is a special frame and if the audio sample values of the special frame have been encoded using the different codec configuration.

In embodiments of the invention, the bit stream of encoded audio data comprises a first number of frames encoded using a first codec configuration and a second number of frames following the first number of frames and encoded using a second codec configuration, wherein the first frame of the second number of frames is the special frame.

In embodiments of the invention, the additional information comprise information on the codec configuration used for encoding the audio sample values associated with the special frame, and the method comprises determining whether the codec configuration of the additional information is different from the current codec configuration using which encoded audio sample values of frames in the bit stream, which precede the special frame, are encoded.

In embodiments of the invention, the method of generating a bit stream of encoded audio data comprises providing segments associated with different portions of the sequence of audio sample values and encoded by different codec configurations, wherein a first frame of at least one of the segments is provided as the special frame.

Thus, in embodiments of the invention, crossfading is performed in order to permit seamless switching between different codec configurations. In embodiments of the invention, the additional information of the special frame comprise the pre-roll frames necessary in order to initialize a decoder to be in a position to decode the special frame. In other words, in embodiments of the invention, the additional information comprise a copy of those frames of encoded audio sample values which precede the special frame, are encoded using the same codec configuration as the encoded audio sample values represented by the special frame, and are necessary to initialize the decoder to be in a position to decode the audio sample values associated with the special frame.
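On the generating side, providing a special frame amounts to copying the required preceding frames into the frame's additional information. The following Python sketch shows this under stated assumptions: frames are opaque byte strings, the dictionary layout and the function name `make_special_frame` are hypothetical, and the required number of pre-roll frames is passed in as a known parameter of the codec configuration.

```python
from typing import Dict, List


def make_special_frame(frames: List[bytes], index: int,
                       n_pre_roll: int, config: str) -> Dict[str, object]:
    """Wrap frames[index] together with copies of its preceding frames."""
    start = max(0, index - n_pre_roll)  # fewer frames exist near the stream start
    return {
        "config": config,                      # codec configuration information
        "pre_roll": list(frames[start:index]),  # copies, in original order
        "payload": frames[index],              # the frame's own encoded samples
    }
```

Note that the pre-roll frames are copies: the original frames stay in the bit stream, so a decoder that is already running in the same configuration can ignore the additional information.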

In embodiments of the invention, special frames are introduced into encoded audio data at regular temporal intervals, i.e. in a periodic manner. In embodiments of the invention, a first frame of each segment of encoded audio data is a special frame. In embodiments, the audio decoder is configured to decode the special frames and following frames using the codec configuration indicated in the special frame until a further special frame indicating a different codec configuration is encountered.

In embodiments of the invention, the decoder and the method for decoding are configured to perform a crossfade when switching from one codec configuration to another codec configuration, in order to permit seamless switching between multiple compressed audio representations.

In embodiments of the invention, the different codec configurations are different codec configurations according to the AAC (Advanced Audio Coding) standard, i.e. different codec configurations of the AAC family codecs. Embodiments of the invention may be directed to switching between codec configurations of the AAC family codecs and codec configurations of the AMR (Adaptive Multi-Rate) family codecs.

Thus, embodiments of the invention permit immediate replay at the decoder side and switching between different codec configurations, so that the manner in which audio content is delivered may be adapted to the environmental conditions, such as a transmission channel with variable bitrate. Thus, embodiments of the invention permit providing the consumer with the best possible audio quality for a given network condition.

Embodiments of the invention are subsequently discussed referring to the accompanying drawings, in which:

Fig. 1 shows a schematic view of an embodiment of an apparatus for generating encoded audio output data;

Fig. 2 shows a schematic view for explaining an embodiment of a special frame;

Fig. 3 shows a schematic view of different representations of an audio signal;

Fig. 4a and Fig. 4b show schematic views of apparatuses for generating encoded audio output data;

Fig. 5 shows a schematic view of an audio decoder;

Fig. 6 shows a schematic block diagram for explaining an embodiment of an audio decoder and a method for decoding;

Fig. 7 shows a schematic block diagram for explaining switching of an audio decoder between different codec configurations;

Fig. 8 shows a schematic diagram for explaining AAC (Advanced Audio Coding) decoder behavior;

Fig. 9 shows switching from a first stream 1 to a second stream 2; and

Fig. 10 shows an exemplary syntax element providing additional information.

Generally, embodiments of the invention aim at the delivery of audio content, possibly combined with video delivery, over a transmission-channel with variable bitrate. The goal may be to provide a consumer with the best possible audio quality for a given network condition. Embodiments of the invention focus on the implementation of AAC family codecs into an adaptive streaming environment.

In embodiments of the invention, as used herein, audio sample values which are not encoded represent time domain audio sample values such as PCM (pulse code modulated) samples. In embodiments of the invention, the term encoded audio sample value refers to frequency domain sample values obtained after encoding the time domain audio sample values. In embodiments of the invention, the encoded audio sample values or samples are those obtained by converting the time domain samples into a spectral representation, such as by means of an MDCT (modified discrete cosine transformation), and encoding the result, such as by quantizing and Huffman coding. Accordingly, in embodiments of the invention, encoding means obtaining the frequency domain samples from the time domain samples and decoding means obtaining the time domain samples from the frequency domain samples. Sample values (samples) obtained by decoding encoded audio data are sometimes referred to herein as output sample values (samples).

Fig. 1 shows an embodiment of an apparatus for generating encoded audio output data. Fig. 1 shows a typical scenario of adaptive audio streaming, which embodiments of the invention may be applied to. An audio input signal 10 is encoded by various audio encoders 12, 14, 16 and 18, i.e. encoders 1 to m. The encoders 1 to m may be configured to encode the audio input signal 10 simultaneously. Typically, encoders 1 to m may be configured such that a wide bit rate range can be achieved. The encoders generate different representations, i.e. coded versions, 22, 24, 26 and 28 of the audio input signal 10, i.e. representations 1 to m. Each representation includes a plurality of segments 1 to k, wherein the second segment of the first representation has been given reference number 30 for exemplary purposes only. Each segment comprises a plurality of frames (access units) designated by the letters AU and a respective index 1 to n indicating the position of the frame in the respective representation. The eighth frame of the first representation is given reference number 40 for exemplary purposes only.

The encoders 12, 14, 16 and 18 are configured to insert stream access points (SAPs) 42 at regular temporal intervals, which define the sizes of the segments. Thus, a segment, such as segment 30, consists of multiple frames, such as AU5, AU6, AU7 and AU8, wherein the first frame, AU5, represents a SAP 42. In Fig. 1, the SAPs are indicated by hatching. Each representation 1 to m represents a compressed audio representation (CAR) for the audio input signal 10 and consists of k such segments. Switching between different CARs may take place at segment borders.
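The segment structure described above may be illustrated by the following minimal sketch. It assumes a fixed number of frames per segment (four, as in the AU5 to AU8 example); the names AccessUnit and build_representation are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AccessUnit:
    index: int     # 1-based position of the frame in the representation
    is_sap: bool   # True if this frame is a stream access point (SAP)


def build_representation(num_frames: int, frames_per_segment: int) -> List[List[AccessUnit]]:
    """Split frames 1..num_frames into segments; the first frame of each segment is a SAP."""
    segments = []
    for start in range(1, num_frames + 1, frames_per_segment):
        end = min(start + frames_per_segment, num_frames + 1)
        segments.append([AccessUnit(i, i == start) for i in range(start, end)])
    return segments


segments = build_representation(num_frames=12, frames_per_segment=4)
second_segment = segments[1]
print([f"AU{au.index}" for au in second_segment])  # ['AU5', 'AU6', 'AU7', 'AU8']
print(second_segment[0].is_sap)                    # True
```

Because every segment begins with a SAP, a switch at a segment border always lands on a frame at which decoding may start.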

On the decoder side, a client may request the representation which fits best for a given situation, e.g. for given network conditions. If for some reason the conditions change, the client should be able to request a different CAR, the apparatus for generating the encoded output data should be able to switch between different CARs at every segment border, and the decoder should be able to switch to decoding the different CAR at every segment border. Hence, the client would be in a position to adapt the media bit rate to the available channel bit rate in order to maximize quality while minimizing buffer underruns ("re-buffering"). If HTTP (Hyper Text Transfer Protocol) is used to download the segments, such a streaming architecture may be referred to as HTTP adaptive streaming.

Current implementations include Apple HTTP Live Streaming (HLS), Microsoft Smooth Streaming, and Adobe Dynamic Streaming, which all follow the same basic principle. Recently, MPEG released an open standard: Dynamic Adaptive Streaming over HTTP (MPEG DASH), see "Guidelines for Implementation: DASH-AVC/264 Interoperability Points", http://dashif.Org/w/2013/08/DASH-AVC-264-v2.00-hd-mca.pdf. HTTP typically uses TCP/IP (Transmission Control Protocol/Internet Protocol) as the underlying network protocol. Embodiments of the invention can be applied to all of those current developments.

A switch between representations (encoded versions) shall be as seamless as possible. In other words, there shall not be any audible glitch or click during the switch. Without further measures provided for by embodiments of the invention, this requirement can only be achieved under certain constraints and if special care is taken during the encoding process.

In Fig. 1, the respective encoder which a segment originates from is indicated by a respective mark put within a circle. Fig. 1 further shows a decision engine 50, which decides which representation to download for each segment. A generator 52 generates encoded audio output data 54 from the selected segments which are given reference numbers 44, 46 and 48 in Fig. 1 by concatenating the selected segments. The encoded audio output data 54 may be delivered to a decoder 60 configured to decode the encoded audio output data into an audio output signal 62 comprising audio output samples.
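The interplay of the decision engine 50 and the generator 52 may be sketched as follows. The selection rule used here (highest bit rate not exceeding the measured channel bit rate) is merely an assumed example policy; the specification does not prescribe one. All names, bit rates and frame labels are hypothetical.

```python
from typing import Dict, List


def choose_representation(bitrates_kbps: Dict[str, int], channel_kbps: int) -> str:
    """Assumed policy: highest bit rate that fits the channel, else the lowest one."""
    fitting = [name for name, rate in bitrates_kbps.items() if rate <= channel_kbps]
    if fitting:
        return max(fitting, key=lambda name: bitrates_kbps[name])
    return min(bitrates_kbps, key=lambda name: bitrates_kbps[name])


def generate_output(representations: Dict[str, List[List[str]]],
                    bitrates_kbps: Dict[str, int],
                    channel_kbps_per_segment: List[int]) -> List[str]:
    """Concatenate, segment by segment, the frames of the selected representation."""
    output: List[str] = []
    for seg_idx, channel_kbps in enumerate(channel_kbps_per_segment):
        name = choose_representation(bitrates_kbps, channel_kbps)
        output.extend(representations[name][seg_idx])
    return output


bitrates = {"rep1": 32, "rep2": 64, "rep3": 128}
# Three segments of four frames per representation; labels stand in for encoded frames.
reps = {
    name: [[f"{name}:AU{4 * s + i}" for i in range(1, 5)] for s in range(3)]
    for name in bitrates
}
stream = generate_output(reps, bitrates, channel_kbps_per_segment=[70, 140, 40])
print(stream[3], stream[4])  # rep2:AU4 rep3:AU5
```

In this example the output switches from one representation to another exactly at the segment borders, so that frames from different encoders end up adjacent in the data fed to the decoder.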

In the embodiment shown in Fig. 1, segments, and therefore frames, originating from different encoders are fed into the same decoder 60, e.g. AU4 from encoder 2 and AU5 from encoder 3 in the example of Fig. 1. In case the same decoder instance is used to decode those AUs, it is necessary that both encoders are compatible with each other. In particular, without any additional measures, this approach cannot work if the two encoders are from completely different codec families, say AMR for encoder 2 and G.711 for encoder 3. However, even when the same codec is used across all representations, special care must be taken to restrict the encoding process. This is because modern audio codecs, such as Advanced Audio Coding (AAC), are flexible algorithms which can operate in several configurations using various coding tools and modes. Examples of such coding tools in AAC are Spectral Band Replication (SBR) or Short Blocks (SB). Other important configuration parameters are the sampling frequency (fs, e.g. 48 kHz) or channel configuration (mono, stereo, surround). In order to decode the frames (AUs) correctly, the decoder must know which tools are used and how those are configured (e.g. fs or SBR cross-over frequency). Therefore, generally, the required information is encoded in a short configuration string and made available to the decoder before decoding. These configuration parameters may be referred to as codec configuration. In case of AAC, this configuration is known as the Audio Specific Config (ASC).

So far, in order to achieve seamless switching, it was necessary to restrict the codec configuration to be compatible across representations (encoded versions). For example, the sampling frequency or coding tools must typically be identical across all representations. If incompatible codec configurations are used between representations, then the decoder has to be re-configured. This basically means that the old decoder has to be closed and the new decoder has to be started with a new configuration. However, this re-configuration process is not seamless under all circumstances and may cause a glitch. One reason for this is that the new decoder cannot produce valid samples immediately but requires several pre-roll AUs to build up the full signal strength. This start-up behavior is typical for codecs having a decoder state, i.e. where the decoding of the current AU is not completely independent from decoding previous AUs.
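The start-up behavior of a decoder having a decoder state may be illustrated with a toy transform codec. The sketch below uses an unwindowed MDCT with overlap-add and no quantization; this is an assumption made for brevity and is not the AAC algorithm. In this toy case one pre-roll frame suffices, whereas real codecs use windowing and further tools and may need more. The names ToyDecoder, mdct, imdct and max_error are hypothetical.

```python
import math
from typing import List


def mdct(block: List[float]) -> List[float]:
    """Forward MDCT: 2N time samples -> N coefficients (no window, toy example)."""
    n = len(block) // 2
    return [
        sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs: List[float]) -> List[float]:
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)) / n
        for t in range(2 * n)
    ]


class ToyDecoder:
    """Overlap-add decoder; `overlap` is the state carried from one frame to the next."""

    def __init__(self, frame_len: int) -> None:
        self.frame_len = frame_len
        self.overlap = [0.0] * frame_len

    def decode(self, coeffs: List[float]) -> List[float]:
        y = imdct(coeffs)
        out = [self.overlap[i] + y[i] for i in range(self.frame_len)]
        self.overlap = y[self.frame_len:]
        return out


def max_error(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


FRAME_LEN = 4
signal = [math.sin(0.3 * t) + 0.1 * t for t in range(16)]
# Frame i is computed from samples i*N .. i*N + 2N - 1 (50 % overlap between frames).
frames = [mdct(signal[i * FRAME_LEN:i * FRAME_LEN + 2 * FRAME_LEN]) for i in range(3)]
target = signal[2 * FRAME_LEN:3 * FRAME_LEN]   # samples that frame 2 should reproduce

cold_out = ToyDecoder(FRAME_LEN).decode(frames[2])   # fresh decoder, no pre-roll

warm = ToyDecoder(FRAME_LEN)
warm.decode(frames[1])                               # pre-roll frame, output discarded
warm_out = warm.decode(frames[2])

print("cold start error:", max_error(cold_out, target))
print("with pre-roll error:", max_error(warm_out, target))
```

The freshly started decoder outputs time-aliased samples for its first frame, while the decoder that has first decoded the preceding frame reproduces the input up to rounding error.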

As a result of this behavior, the codec configuration was typically required to be constant across all representations and the only changing parameter was the bit rate. This is e.g. the case for the DASH-AVC/264 profile as defined by the DASH Industry Forum.

This restriction limited the flexibility of the codec and therefore the coding efficiency across the complete bit rate range. For example, SBR is a valuable coding tool for very low bit rates but limits audio quality at high bit rates. Hence, if the codec configuration is required to be constant, i.e. either with or without SBR, one had to compromise at either the high or the low bit rates. Similarly, the coding efficiency could benefit from changing the sampling rate across representations, but the sampling rate had to be kept constant because of the above-mentioned constraints for seamless switching.

Embodiments of the present invention are directed to a novel approach that enables seamless audio switching in an adaptive streaming environment, and in particular seamless audio switching for AAC-family audio codecs in an adaptive streaming environment. The inventive approach is designed to address all shortcomings resulting from the constraints on the codec configuration as described above. The overall goal is to have more flexibility in the configuration across representations (encoded versions), such as coding tools or sampling frequency, while seamless switching is still enabled or assured.

Embodiments of the invention are based on the finding that the restrictions explained above can be overcome and a higher flexibility can be achieved by adding a special frame carrying additional information in addition to encoded audio sample values associated with the special frame between other frames of encoded audio data, such as a compressed audio representation (CAR). A compressed audio representation may be regarded as a piece of audio material (music, speech, ...) after compression by a lossy or lossless audio encoder, for example an AAC-family audio encoder (AAC, HE-AAC, MPEG-D USAC, ...) with a constant overall bit rate. In particular, the additional information in the special frame is designed to permit an instantaneous play-out at the decoder side even in case of a switching between different codec configurations. Thus, the special frame may be referred to as an instantaneous play-out frame (IPF). The IPF is configured to compensate for the decoder start-up delay and is used to transmit audio information on previous frames along with the data of the present frame.

An example of such an IPF 80 is shown in Fig. 2. Fig. 2 shows a number of frames (access units) 40, numbered n-4 to n+3. Each frame includes associated encoded audio sample values, i.e. encoded audio sample values of a specific number of time domain audio sample values of a sequence of time domain audio sample values representing an audio signal, such as audio input signal 10. For example, each frame may comprise encoded audio sample values representing 1024 time domain audio sample values, i.e. audio sample values of an unencoded audio signal. In Fig. 2, frame n arranged between preceding frame n-1 and following frame n+1 represents the special frame or IPF 80. The special frame 80 includes additional information 82. The additional information 82 includes information 84 on the codec configuration, i.e. information on the codec configuration used in encoding the data stream including frames n-4 to n+3, and, therefore, information on the codec configuration used to encode audio sample values associated with the special frame.

In the embodiment shown in Fig. 2, a delay introduced by an audio decoder is assumed to be three frames, i.e. it is assumed that three so-called pre-roll frames are needed to build up the full signal during start-up of the audio decoder. Hence, assuming that the stream configuration (codec configuration) is known to the decoder, the decoder would normally have to start decoding at frame n-3 in order to produce valid samples at frame n. Thus, in order to make available the necessary information to the decoder, the additional information 82 comprises a number of frames of encoded audio sample values preceding the special frame 80 and encoded using the codec configuration 84 indicated in the additional information 82. This number of frames is indicated by reference number 86 in Fig. 2. This number of frames 86 is necessary to initialize the decoder to be in a position to decode the audio sample values associated with the special frame n. Accordingly, the information of frames 86 is duplicated and carried as part of the special frame 80. Thus, this information is available to the decoder immediately upon switching to the data stream shown in Fig. 2 at frame n. Without this additional information in frame n, neither the codec configuration 84 nor frames n-3 to n-1 would be available to the decoder after a switch. Adding this information to the special frame 80 permits immediately initializing the decoder, and therefore immediate play-out upon switching to a data stream comprising the special frame. The decoder is configured such that such initialization and decoding of frame n can be performed within the time window available until the output samples obtained by decoding frame n have to be output.
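On the encoder side, forming a special frame as in Fig. 2 amounts to copying the codec configuration and the payloads of the preceding frames into frame n. The following is a minimal sketch under the assumption that frames are held in a list; the names Frame and make_ipf and the byte strings are hypothetical and do not reflect any bit stream syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    index: int
    payload: bytes                      # encoded audio sample values of this frame
    config: Optional[bytes] = None      # codec configuration, present only in an IPF
    pre_roll: List[bytes] = field(default_factory=list)  # copies of preceding payloads


def make_ipf(frames: List[Frame], n: int, num_pre_roll: int, config: bytes) -> Frame:
    """Return frame n as a special frame carrying the config and frames n-num_pre_roll..n-1."""
    if n < num_pre_roll:
        raise ValueError("not enough preceding frames to serve as pre-roll")
    pre_roll = [frames[i].payload for i in range(n - num_pre_roll, n)]
    return Frame(index=frames[n].index, payload=frames[n].payload,
                 config=config, pre_roll=pre_roll)


frames = [Frame(index=i, payload=f"AU{i}".encode()) for i in range(8)]
ipf = make_ipf(frames, n=5, num_pre_roll=3, config=b"cfg")
print(ipf.pre_roll)  # [b'AU2', b'AU3', b'AU4']
```

The preceding frames themselves remain in the stream unchanged; the special frame only carries duplicates of their payloads.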

During normal decoding, i.e. without switching to a different codec configuration, only frame n is decoded and the frames included in the additional information, n-3 to n-1, are ignored. However, after switching to a different codec configuration, all of the information in the special frame 80 is extracted and the decoder is initialized based on the included codec configuration and based on decoding of the pre-roll frames (n-3 to n-1) before finally decoding and replaying the current frame n. Decoding of the pre-roll frames takes place before the current frame is decoded and replayed. The pre-roll frames are not replayed, but the decoder is configured to decode the pre-roll frames within the time window available prior to replay of the current frame n.
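The two cases described above, normal decoding and decoding after a switch, may be sketched as follows. ToyCore is a hypothetical stand-in for a stateful codec core whose output becomes valid only after three payloads have been decoded; detecting a switch by comparing the carried configuration with the active one is an assumption made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Frame:
    payload: str
    config: Optional[str] = None                       # present only in special frames
    pre_roll: List[str] = field(default_factory=list)  # payloads of preceding frames


class ToyCore:
    """Stand-in for a stateful codec core; output is valid after `warmup` payloads."""

    def __init__(self, config: str, warmup: int = 3) -> None:
        self.config = config
        self.warmup = warmup
        self.seen = 0

    def decode(self, payload: str) -> Tuple[str, bool]:
        self.seen += 1
        return payload, self.seen > self.warmup


class Decoder:
    def __init__(self) -> None:
        self.core: Optional[ToyCore] = None

    def decode(self, frame: Frame) -> Tuple[str, bool]:
        is_special = frame.config is not None
        if is_special and (self.core is None or self.core.config != frame.config):
            # Start-up or switch: re-initialize and decode the pre-roll frames.
            self.core = ToyCore(frame.config)
            for payload in frame.pre_roll:
                self.core.decode(payload)   # output is discarded, not replayed
        if self.core is None:
            raise ValueError("decoding must start at a special frame")
        # Normal decoding: the additional information of a special frame is ignored.
        return self.core.decode(frame.payload)


dec = Decoder()
first = dec.decode(Frame("A4", "cfgA", ["A1", "A2", "A3"]))      # start-up at an IPF
regular = dec.decode(Frame("A5"))                                 # ordinary frame
same_cfg = dec.decode(Frame("A8", "cfgA", ["A5", "A6", "A7"]))    # IPF, no switch
switched = dec.decode(Frame("B12", "cfgB", ["B9", "B10", "B11"])) # IPF after a switch
cold = ToyCore("cfgB").decode("B12")                              # no pre-roll at all
print(first, switched, cold)
```

In this sketch the first frame after start-up and the first frame after the switch are both reported as valid, whereas the core started without pre-roll reports an invalid first output.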

The term "codec configuration" refers to the codec configuration used in encoding audio data or frames of audio data. Thus, the codec configuration can indicate different coding tools and modes used, wherein exemplary coding tools used in AAC are spectral band replication (SBR) or short blocks (SB). One configuration parameter may be the SBR cross-over frequency. Other configuration parameters may be the sampling frequency or the channel configuration. Different codec configurations differ in one or more of these configuration parameters. In embodiments of the invention, different codec configurations may also comprise completely different codecs, such as AAC, AMR or G.711.

Accordingly, in the example illustrated in Fig. 2, three frames, i.e. n-3 to n-1, are necessary to compensate for the decoder start-up delay. The additional frame data may be transmitted by means of an extension payload mechanism inside the audio bitstream. For example, the USAC extension payload mechanism (UsacExtElement) may be used to carry the additional information. Furthermore, the "config" field may be used to transmit the stream configuration 94. This may be useful in case of bitstream switching or bitrate adaptation. Both the first pre-roll AU (n-3) and the IPF itself (n) may be independently decodable frames. In the context of USAC, encoders may set a flag (usacIndependencyFlag) to "1" for those frames. By implementing the frame structure as shown in Fig. 2, it is possible to randomly access the bitstream at every IPF and play out valid PCM samples immediately. The decoding process of an IPF may include the following steps: (1) decode all "pre-roll" AUs (n-3 ... n-1) and discard the resulting output PCM samples, after which the internal decoder states and buffers are completely initialized; (2) decode frame n and start regular play-out; (3) continue decoding as normal with frame n+1. The IPF may be used as an audio stream access point (SAP). Immediate play-out of valid PCM samples is possible at every IPF.

Special frames as defined herein can be implemented in any codec that allows the multiplexing and transmission of ancillary data or extension data or data stream elements or similar mechanisms for transmitting audio codec external data. Embodiments of the invention refer to the implementation for a USAC codec framework. Embodiments of the invention may be implemented in connection with USAC audio encoders and decoders. USAC means unified speech and audio coding and reference is made to standard ISO/IEC 23003-3:2012. In embodiments of the invention, the additional information is contained in an extension payload of the corresponding frame, such as frame n in Fig. 2. For example, the USAC standard allows addition of arbitrary extension payload to encoded audio data. The existence of extension payload is switchable on a frame to frame basis. Accordingly, the additional information may be implemented as a new extension payload type defined to carry additional audio information of previous frames.

As explained above, the instantaneous play-out frame 80 is designed such that valid output samples associated with a certain time stamp (frame n) can be generated immediately, i.e. without having to wait for the specific number of frames according to the audio codec delay. In other words, the audio codec delay can be compensated for. In the embodiment shown in Fig. 2, the audio codec delay is three frames. Moreover, the IPF is designed such that it is fully and independently decodable, i.e. without any further knowledge of the previous audio stream. In this regard, the earliest of the number of frames added to the special frame (i.e. frame n-3 in Fig. 2) is not time differentially encoded or entropy encoded relative to any previous frame. In addition, the special frame is not time differentially encoded or entropy encoded relative to any frame previous to the earliest of the number of frames contained in the additional information or any previous frame at all. In other words, for the frames n-3 and n in Fig. 2 all dependencies on previous frames may be removed, e.g. by avoiding time-differential coding of certain parameters or by resetting the entropy encoding. Thus, those independent frames allow correct decoding and parsing of all symbols but are themselves not sufficient to obtain valid PCM samples instantaneously. While such independent frames are already available in common audio codecs, such as AAC or USAC, such audio codecs do not provide for special frames, such as IPF frame 80.

In embodiments of the invention, a special frame is provided at each stream access point of the representations shown in Fig. 1. In Fig. 1 the stream access points are the first frame in each segment and are hatched. Accordingly, Fig. 1 shows a specific embodiment of an apparatus for generating encoded audio output data according to the present invention. Moreover, each of the encoders 1 to m shown in Fig. 1 represents an embodiment of an audio encoder according to the invention. According to Fig. 1, encoders 12 to 18 represent providers configured to provide segments associated with different portions of the audio input signal 10 and encoded by different codec configurations. In this regard, each of encoders 12 to 18 uses a different codec configuration. Decision unit 50 is configured to decide for each segment which representation to download. Thus, decision unit 50 is configured to select a codec configuration (associated with the respective representation) for each segment based on a control signal. For example, the control signal may be received from a client requesting the representation which fits best for a given situation.

Documents

Application Documents

#	Name	Date
1	202118007919-IntimationOfGrant24-11-2023.pdf	2023-11-24
1	202118007919-STATEMENT OF UNDERTAKING (FORM 3) [25-02-2021(online)].pdf	2021-02-25
2	202118007919-PatentCertificate24-11-2023.pdf	2023-11-24
2	202118007919-REQUEST FOR EXAMINATION (FORM-18) [25-02-2021(online)].pdf	2021-02-25
3	202118007919-POWER OF AUTHORITY [25-02-2021(online)].pdf	2021-02-25
3	202118007919-Information under section 8(2) [07-11-2023(online)].pdf	2023-11-07
4	202118007919-FORM 3 [09-10-2023(online)].pdf	2023-10-09
4	202118007919-FORM 18 [25-02-2021(online)].pdf	2021-02-25
5	202118007919-FORM 3 [11-04-2023(online)].pdf	2023-04-11
5	202118007919-FORM 1 [25-02-2021(online)].pdf	2021-02-25
6	202118007919-DRAWINGS [25-02-2021(online)].pdf	2021-02-25
6	202118007919-CLAIMS [06-03-2023(online)].pdf	2023-03-06
7	202118007919-FER_SER_REPLY [06-03-2023(online)].pdf	2023-03-06
7	202118007919-DECLARATION OF INVENTORSHIP (FORM 5) [25-02-2021(online)].pdf	2021-02-25
8	202118007919-OTHERS [06-03-2023(online)].pdf	2023-03-06
8	202118007919-COMPLETE SPECIFICATION [25-02-2021(online)].pdf	2021-02-25
9	202118007919-FORM 3 [27-07-2021(online)].pdf	2021-07-27
9	202118007919-Information under section 8(2) [07-02-2023(online)].pdf	2023-02-07
10	202118007919-FORM 3 [13-10-2022(online)].pdf	2022-10-13
10	202118007919-FORM 3 [18-01-2022(online)].pdf	2022-01-18
11	202118007919-FORM 3 [09-03-2022(online)].pdf	2022-03-09
11	202118007919-Information under section 8(2) [13-10-2022(online)]-1.pdf	2022-10-13
12	202118007919-FORM 3 [12-07-2022(online)].pdf	2022-07-12
12	202118007919-Information under section 8(2) [13-10-2022(online)].pdf	2022-10-13
13	202118007919-FER.pdf	2022-09-06

Search Strategy

1	202118007919E_31-08-2022.pdf