
Efficient Adaptive Streaming

Abstract: Adaptive streaming is rendered more efficiently combinable with the usage of an open GOP structure by configuring a device for retrieving a video such that the same schedules a transition phase before switching from a first video stream to a second video stream, and/or by configuring a device for outputting a video in accordance with the adaptive streaming protocol such that the same supports switching between outputting the video in form of a layered video stream and an increased spatial resolution stream, the layered video stream having a second layer which has encoded thereinto the video at the increased spatial resolution using inter-layer prediction without residual coding. A media content such as a video is made more efficiently streamable by representing it in a dependent (second) representation which is composed of a first set of temporal segments, which has encoded thereinto the media content dependent on first portions of a first (reference) representation of the media content temporally corresponding to the first set of temporal segments, and a second set of temporal segments, which has encoded thereinto the media content independent from second portions of the first representation temporally corresponding to the second set of temporal segments, so that a successful reconstruction of the media content from the second representation gets along without the second portions of the first representation. A media scene composed of several channels is made more efficiently streamable by spending, for each channel, a set of representations of the respective channel which differ in a temporal distribution of random access points.


Patent Information

Application #
Filing Date
13 August 2018
Publication Number
38/2018
Publication Type
INA
Invention Field
COMMUNICATION
Status
Parent Application
Patent Number
Legal Status
Grant Date
2024-03-21
Renewal Date

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
Hansastraße 27c 80686 München

Inventors

1. SKUPIN, Robert
Naugarder Straße 42 10409 Berlin
2. SANCHEZ, Yago
Warschauer Strasse 67 10243 Berlin
3. SCHIERL, Thomas
Boris-Pasternak-Weg 7b 13156 Berlin
4. HELLGE, Cornelius
Erich-Weinert-Straße 5 10439 Berlin
5. GRÜNEBERG, Karsten
Adickesstraße 43 13599 Berlin
6. WIEGAND, Thomas
Otto-Appel-Straße 52 14185 Berlin

Specification

Efficient Adaptive Streaming

Description

The present application is concerned with adaptive streaming such as using DASH.

Using adaptive streaming, a media data stream is provided from server to client in temporal segments. Depending on the application, the server may offer the media content to the client at different bit rates. That is, sequences of temporal segments for different versions of the media content are available to the client for download and, during media streaming, switching between the different versions is feasible. Accordingly, the sequence of temporal segments retrieved by the client from the server comprises, in an interleaved manner, segments stemming from a first version and segments stemming from another version. Problems may occur, however, if one would like to take advantage of the more efficient open GOP structure for encoding the media content as, in this case, reference pictures may get lost in such situations, in particular the references of those leading pictures that miss references when decoding of the stream is started at their associated random access point (e.g. random access skipped leading pictures, RASL pictures, in HEVC). Using a closed GOP structure for coding the media content does not cause these problems, but results in a lower coding/compression efficiency.
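The segment-wise retrieval described above can be sketched, purely for illustration, as follows; the function names, bit rates and bandwidth figures are assumptions and not part of any streaming specification:

```python
# Illustrative sketch of adaptive-streaming segment selection: the client
# fetches the media in temporal segments and may pick a different
# representation (bit rate) per segment, so the retrieved sequence
# interleaves segments stemming from several versions of the content.

def pick_representation(bitrates, measured_bandwidth):
    """Choose the highest representation bit rate not exceeding the
    currently measured download bandwidth (fall back to the lowest)."""
    feasible = [b for b in sorted(bitrates) if b <= measured_bandwidth]
    return feasible[-1] if feasible else min(bitrates)

def retrieve_stream(num_segments, bitrates, bandwidth_trace):
    """Return, per temporal segment, the representation bit rate chosen."""
    return [pick_representation(bitrates, bw)
            for bw in bandwidth_trace[:num_segments]]

# Bandwidth varies over time, so the downloaded sequence mixes versions.
plan = retrieve_stream(4, [500, 1500, 3000], [600, 2000, 3500, 1000])
```

Here `plan` yields one bit-rate choice per segment; a real client would additionally consider buffer fill level and request timing.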

Other aspects for which there is a general interest in achieving improvements in adaptive streaming pertain to streaming parameters such as the frequency of requests which the client has to issue to the server for requesting the issuance of a next temporal segment, the mean tune-in latency, i.e., the mean time until a client gains access to a certain media content, which latency should be minimized, and the avoidance of bit rate peaks, as such bit rate peaks in streaming media content require a larger input buffer at the client for compensating the bit rate variations.

Accordingly, the object of the present invention is to provide adaptive streaming concepts which achieve the above outlined improvements.

This object is achieved by the subject matter of the independent claims.

In accordance with a thought pertaining to a first aspect of the present application, adaptive streaming is rendered more efficiently combinable with the usage of an open GOP structure by configuring a device for retrieving a video such that the same schedules a transition phase before switching from a first video stream to a second video stream. The second video stream may be, by this measure, encoded using an open GOP structure since the transition phase may provide enough time to compensate for missing reference pictures of random access dependent pictures such as RASL pictures on the basis of the first video stream.

In accordance with a second thought pertaining to the first aspect of the present application, adaptive streaming is rendered more efficiently combinable with the usage of an open GOP structure by configuring a device for outputting a video in accordance with the adaptive streaming protocol such that the same supports switching between outputting the video in form of a layered video stream and an increased spatial resolution stream, the layered video stream having a second layer which has encoded thereinto the video at the increased spatial resolution using inter-layer prediction without residual coding. By this measure, information for substituting the afore-mentioned missing reference pictures of random access dependent pictures such as RASL pictures is rendered easily available at the client. The second stream, for which the reference pictures for random access dependent pictures are made available by means of using the layered video coded without residual coding, may be a layer of a layered video coded using inter-layer prediction or even a single-layer video stream, i.e. a layer coded without inter-layer prediction. For the latter case, this means that inter-layer prediction is only used to make reference pictures for random access dependent pictures available.

In accordance with a second aspect of the present application, a media content, such as a video, is made more efficiently streamable via adaptive streaming by allowing same to be represented in a dependent (second) representation which is composed of a first set of temporal segments, which has encoded thereinto the media content dependent on first portions of a first (reference) representation of the media content temporally corresponding to the first set of temporal segments, and a second set of temporal segments of the second representation, which has encoded thereinto the media content independent from second portions of the first representation temporally corresponding to the second set of temporal segments, so that a successful reconstruction of the media content from the second representation gets along without the second portions of the first representation.

In accordance with a third aspect of the present application, a media scene composed of several channels is made more efficiently streamable by spending, for each channel, a set of representations of the respective channel which differ in a temporal distribution of random access points. By this measure, a client device may schedule the switching between the representations for optimizing fast tune-in and low bitrate variations.
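The bitrate advantage of temporally staggered random access points can be sketched numerically as follows; the segment sizes and the assumption that a RAP-carrying segment costs more bits are illustrative only, not figures from the specification:

```python
# Sketch of the third aspect: if every channel of a multi-channel media
# scene places its random access points (RAPs) in the same temporal
# segment, the per-segment download bitrate peaks; staggering the RAPs
# across channels flattens the peak.

RAP, NONRAP = 10, 2   # illustrative segment sizes in arbitrary bit units

def peak_bitrate(channels):
    """channels: per channel, a per-segment list, True where the segment
    carries a RAP. Returns the largest combined segment size."""
    per_segment = [sum(RAP if has_rap else NONRAP for has_rap in seg)
                   for seg in zip(*channels)]
    return max(per_segment)

aligned   = [[True, False, False, True]] * 3          # all RAPs coincide
staggered = [[True, False, False, False],
             [False, True, False, False],
             [False, False, True, False]]             # RAPs spread in time
```

With these toy numbers, the aligned layout peaks at three simultaneous RAP segments, while the staggered layout never downloads more than one RAP segment at a time.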

In accordance with a fourth aspect of the present application, adaptive streaming quality is increased when transitioning between two representations or media streams offered at a server by offering to a client, in addition to the first and second media streams, an auxiliary media stream having encoded thereinto the media content dependent on the first and second media streams. The client may use the same to fade when switching from the first to the second representation by scheduling a fading phase within which the device retrieves the auxiliary media stream along with the first and second media streams and plays out the auxiliary media stream instead of the second media stream.
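The fading behaviour of this fourth aspect, where the auxiliary stream combines predictions from both representations with factors that ramp linearly over the fading phase (as later illustrated in Fig. 22), can be sketched as follows; the function names and the five-picture fade length are assumptions:

```python
# Sketch of a fading phase: each picture of the auxiliary stream is formed
# as a linear combination of the collocated samples of the first and second
# representations, with weights decreasing/increasing linearly over time.

def fade_weights(num_pictures):
    """Per picture of the fading phase, the (first, second) representation
    factors, ramping linearly from (1, 0) to (0, 1)."""
    return [(1 - t / (num_pictures - 1), t / (num_pictures - 1))
            for t in range(num_pictures)]

def faded_sample(s1, s2, w1, w2):
    """Linear combination of collocated samples of both representations."""
    return w1 * s1 + w2 * s2

ws = fade_weights(5)          # a five-picture fade, purely as an example
mid = faded_sample(100, 200, *ws[2])   # halfway through the fade
```

At the midpoint the two representations contribute equally, so a sample pair (100, 200) blends to 150.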

Advantageous implementations are the subject of the dependent claims. Preferred embodiments of the present application are described below with respect to the figures, among which:

Fig. 1 shows a diagram illustrating a video data stream having a video encoded therein using an open GOP structure, wherein Fig. 1 shows the pictures of the video data stream in presentation time order;

Fig. 2 shows a schematic diagram illustrating a sequence of pictures of a video data stream having coded therein the pictures using an open GOP structure, in presentation time order at the top half and in decoding order at the bottom half, wherein the open GOP structure corresponds to the one of Fig. 1 and merely serves as an example;

Fig. 3 shows a schematic diagram illustrating two separately/independently coded video data streams, temporally subdivided into segments for adaptive streaming, at the top half, and at the bottom half the stitched data stream arriving at a client at a transition from lower quality to higher quality;

Fig. 4 shows a schematic diagram illustrating an output device in accordance with an embodiment concerning the first aspect of the present application;

Fig. 5 shows a schematic diagram illustrating a layered video stream and an increased spatial resolution video stream used by the output device of Fig. 4 in accordance with an embodiment;

Fig. 6 shows a schematic diagram illustrating a client device in accordance with an embodiment concerning the first aspect of the present application;

Fig. 7 shows a schematic diagram illustrating the mode of operation of the client device of Fig. 6 with respect to inserting a transition phase when switching from lower spatial resolution to increased spatial resolution in accordance with an embodiment;

Fig. 8 shows a schematic diagram illustrating the inbound stitched video data stream as obtained from a server by the client device of Fig. 6 when using the streams of Fig. 5;

Fig. 9 shows a schematic diagram illustrating the used streams of Fig. 8 by showing that the increased spatial resolution video stream may also be a layered video stream;

Fig. 10 shows a schematic diagram illustrating a splicing point of a spliced video data stream as received by the client device of Fig. 6 when using an independently coded layer for the increased spatial resolution video data stream and using segments carrying both first and second layers for the layered video data stream;

Fig. 11 shows a schematic diagram illustrating the data structure prepared for adaptive streaming at the server side in accordance with an embodiment where the segments of the layered video stream comprise the first layer and the second layer within common segments;

Fig. 12 shows a schematic diagram illustrating the data structure in accordance with an alternative embodiment to Fig. 11 where separate segments are used for first and second layers within a layered video stream;

Fig. 13 shows a schematic diagram illustrating the situation of Fig. 10, but here using separate segments for first and second layers of the layered video stream;

Fig. 14 shows a schematic diagram illustrating four consecutive segments of two representations, representation Q2 being dependent on representation Q1, wherein in the upper half an example is shown where the segments carrying RAPs are temporally aligned, and in the lower half it is illustrated that the segments are non-aligned, showing the resulting download bitrate when downloading representation Q2, thereby illustrating the lower bitrate peaks in case of using non-aligned RAPs;

Fig. 15 shows a schematic diagram illustrating eight consecutive segments in case of a representation Q2 being dependent on a representation Q1, with some of the segments of representation Q2 coded in a manner independent from representation Q1;

Fig. 16 shows a schematic diagram illustrating an output device which may take advantage of the structure shown in Fig. 15;

Fig. 17 shows a schematic diagram illustrating a client device which may fit to the output device of Fig. 16;

Fig. 18 shows a schematic diagram illustrating a case of having one representation per channel of a common media scene with temporally aligned RAPs in the representations, showing the resulting bitrate peaks when downloading the complete scene;

Fig. 19 shows a schematic diagram illustrating an improved media scene structure of having several representations of differently temporally distributed RAPs for each channel of a multi-channel media scene;

Fig. 20 shows a schematic diagram illustrating an output device;

Fig. 21 shows a schematic diagram illustrating the situation of Fig. 13 with additionally offering at the server an auxiliary track for fading purposes to illustrate a fourth aspect of the present application;

Fig. 22 illustrates two temporal graphs, one upon the other, illustrating the temporal decrease and increase of the factors of the linear combination of predictions on the basis of the first and second representations within the auxiliary track during the fading phase in accordance with an example where the decrease and increase takes place linearly;

Fig. 23 shows a schematic diagram illustrating an output device in accordance with an embodiment concerning the fourth aspect to the present application;

Fig. 24 shows a schematic diagram illustrating a client device in accordance with an embodiment concerning the fourth aspect of the present application; and

Fig. 25 shows a schematic diagram illustrating a client device modified compared to Fig. 24 in that the client device also operates in accordance with the first aspect of the present application.

The description of the present application with respect to the figures starts with a first aspect of the present application. Here, the usage of open GOP structures is made available for video streaming using an adaptive streaming protocol at reduced penalties in terms of switching between representations of the video relating to different spatial resolution.

In order to ease the understanding of the embodiments concerning the first aspect described later, open GOP structures are explained before.

Open GOP structures allow for a more efficient compression of a video than closed GOP structures at the same random access periodicity. As shown in Fig. 1, when random accessing a stream encoded with an open GOP structure, there are certain pictures, denoted RASL in Fig. 1, that are not properly decoded since their references are missing and, therefore, are not reproduced/output/shown.

Fig. 1 shows an open GOP structure in output order using the HEVC nomenclature for indicating the picture types. The random access point is here the Clean Random Access (CRA) picture, and the Random Access Skipped Leading (RASL) pictures are the random access dependent pictures that cannot be shown to the user since, when random accessing at the CRA picture, the reference P picture shown at the left-hand side of the figure is missing.

In order to render this description easier, reference is made to Fig. 2 which shows, at the top, a sequence of nine pictures 10 of a video 12 in presentation time order. Pictures 10 are numbered from 1 to 9 along the presentation time order. At the bottom, Fig. 2 shows pictures 10 in the decoding order at which same are encoded into a data stream. Fig. 2 illustrates the case where pictures 10 are encoded into a data stream using the open GOP structure illustrated in Fig. 1. The numbering of the pictures 10 in the bottom half of Fig. 2 shows that the pictures 10 are temporally rearranged, i.e., that the decoding order deviates from the presentation time order.

In particular, Fig. 2 shows that the fifth picture 10 in presentation time order is coded as a random access point picture. That is, the fifth picture in presentation time order, or picture number 5, is coded without depending on any picture of another presentation time preceding in decoding order. As shown in Fig. 2, pictures number 2 to number 4 are coded in a manner directly or indirectly referencing, by temporal prediction, picture number 5, i.e., the random access point picture, and another picture, namely here picture number 1, which precedes the random access point picture both in terms of presentation time order as well as decoding order. For example, picture number 3 directly references, by temporal prediction, picture number 5 and picture number 1. That is, picture number 3 is temporally predicted, by way of motion compensated prediction for instance, on the basis of picture number 5 and number 1. Picture number 4 does not directly reference, by temporal prediction, picture 1, but indirectly, namely via picture number 3. That is, the set of pictures numbers 2, 3 and 4 have in common that: 1) they directly or indirectly reference, by temporal prediction, a random access point picture, here exemplarily picture number 5, and 2) they directly or indirectly reference, by temporal prediction, a reference picture preceding, in terms of presentation time order and decoding order, the directly or indirectly referenced random access point picture, in Fig. 2 exemplarily reference picture number 1. This set of pictures is liable to be skipped if random access point picture number 5 is used for random accessing the video data stream into which video 12 is encoded using an open GOP structure, since the reference picture number 1 for this set of pictures numbers 2, 3 and 4 would be missing as it lies, in decoding order, upstream of the random access point picture number 5.
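The skipping behaviour just described can be sketched, purely for illustration and outside any codec specification, as follows; the function name and the way references are modelled are assumptions:

```python
# Sketch of which pictures must be skipped when random accessing at a
# random access point (RAP): any picture whose direct or indirect
# references include a picture preceding the RAP in decoding order has a
# missing reference and cannot be decoded.

def skipped_pictures(decoding_order, refs, rap):
    """decoding_order: picture numbers in decoding order;
    refs: picture -> set of directly referenced pictures;
    rap: picture number used as the random access point.
    Returns the set of pictures that cannot be decoded."""
    available = set()          # pictures reconstructible after the RAP
    skipped = set()
    started = False
    for pic in decoding_order:
        if pic == rap:
            started = True
            available.add(pic)     # a RAP needs no earlier picture
            continue
        if not started:
            continue               # pictures upstream of the RAP are never decoded
        if refs.get(pic, set()) <= available:
            available.add(pic)
        else:
            skipped.add(pic)       # at least one reference is missing

    return skipped

# Open GOP example in the spirit of Fig. 2: decoding order 1,5,3,2,4,...;
# pictures 2, 3 and 4 depend, directly or via picture 3, on picture 1,
# which precedes the RAP picture 5 in decoding order.
refs = {3: {1, 5}, 2: {3, 5}, 4: {3, 5}, 6: {5}}
lost = skipped_pictures([1, 5, 3, 2, 4, 6], refs, rap=5)
```

Starting decoding at picture 5 leaves pictures 2, 3 and 4 without a usable reference, matching the RASL behaviour described above.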

Besides using open GOP structures for typical broadcast scenarios, where skipping some of the pictures such as RASL pictures when random accessing, for instance during channel switching between programs, is acceptable, open GOP structures have proven to be valuable for adaptive streaming such as DASH, where switching to one or another stream with a different quality is done aligned with random access point pictures such as CRAs without skipping pictures. As long as the resolution is the same and the streams are authored carefully, it is possible to concatenate or stitch two streams of different qualities and obtain a specification-conformant bit stream that can form a single video sequence from the viewpoint of the video codec specification.

The latter circumstance is illustrated with respect to Fig. 3, which shows at the top half thereof two representations Q1 and Q2 of a video and, in particular, two consecutive temporal segments Seg#1 and Seg#2 thereof. The bottom half of Fig. 3 illustrates, in concatenation, those temporal segments which are actually retrieved by a client from the server. As illustrated in Fig. 3, in this example the client has chosen to retrieve temporal segment Seg#1 from representation Q1 and the subsequent temporal segment Seg#2 from representation Q2. In other words, Fig. 3 illustrates an example where a client downloads a first segment Seg#1 at a quality Q1 followed by a second temporal segment Seg#2 at quality Q2.

As was the case with Figs. 1 and 2, Fig. 3 illustrates the interdependencies among the pictures by way of arrows which point from the predictively coded picture to the respective reference picture referenced, here by temporal prediction, by the respective picture. Each segment starts, in decoding order, with a CRA picture, i.e., a random access point picture, but, in the presentation time order at which the pictures are shown to be ordered in Fig. 3, this random access point picture of each segment is preceded by RASL pictures. This circumstance has been explained above with respect to Fig. 2. By switching from quality Q1 to quality Q2, the reference pictures for the RASL pictures of the second segment of stream Q2 do not get lost: within stream Q2, the RASL pictures of the second segment reference picture P of the first segment of stream Q2, and within the stitched data stream, where the second segment of stream Q2 follows the first segment of stream Q1, these RASL pictures refer to the temporally aligned low quality picture P of the first segment of quality Q1 as a substitute.

Fig. 3 illustrates the effect of this reference picture change. In particular, Fig. 3 depicts the pictures of representation Q2 in a shaded form, whereas pictures of representation Q1 are depicted without shading. In the stitched or concatenated stream, where the segment of quality Q2 follows the first segment of quality Q1, the RASL pictures of the second segment of quality Q2 are depicted with one half without shading and the other half with shading, thereby indicating that the result of decoding these RASL pictures is neither the result obtained for the corresponding RASL pictures when decoding the continuous stream Q2 nor that obtained for stream Q1. However, besides being specification conformant, if the streams are authored properly, the quality degradation with respect to Q2 is not significant. This can already be signaled with the attribute @mediaStreamStructure in the Media Presentation Description (MPD) for DASH, i.e., within the manifest file.
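The stitching and reference substitution just described can be sketched as follows; all names, the segment model and the way a RASL reference is resolved are illustrative assumptions, not the actual bitstream syntax:

```python
# Sketch of segment stitching at a quality switch: the client plays Seg#1
# from representation Q1 and Seg#2 from Q2. A RASL picture at the start of
# Seg#2 references a picture of the preceding segment; in the stitched
# stream that reference resolves to the temporally aligned Q1 picture,
# which serves as a substitute.

def stitch(segments_q1, segments_q2, choice):
    """choice[i] selects 'Q1' or 'Q2' for temporal segment i."""
    return [segments_q1[i] if q == "Q1" else segments_q2[i]
            for i, q in enumerate(choice)]

def resolve_rasl_reference(stitched, seg_index):
    """A RASL picture in segment seg_index references the last picture of
    the preceding segment, whichever quality that segment came from."""
    prev = stitched[seg_index - 1]
    return prev["quality"], prev["last_picture"]

q1 = [{"quality": "Q1", "last_picture": "P@Q1"},
      {"quality": "Q1", "last_picture": "P2@Q1"}]
q2 = [{"quality": "Q2", "last_picture": "P@Q2"},
      {"quality": "Q2", "last_picture": "P2@Q2"}]
stream = stitch(q1, q2, ["Q1", "Q2"])
```

In the stitched stream, the RASL pictures of the second segment (quality Q2) resolve their reference to the Q1 picture instead of the original Q2 picture, mirroring the half-shaded pictures of Fig. 3.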

A problem arises when the different qualities Q1 and Q2 do not have the same resolution, since the reference pictures required for open GOP switching are not present at the proper resolution. This means that it is not possible to perform open GOP switching with resolution change with the currently existing single-layer codecs such as HEVC, for example. For such a purpose, a layered codec such as SHVC might be used.

In SHVC, when upswitching the decoding process from a lower to a higher layer, RASL pictures are automatically marked as non-output pictures. RASL pictures can be decoded after applying the specified process for unavailable reference pictures. However, the decoding result will be visually impaired and the specification notes that, as these pictures do not influence following non-RASL pictures, the RASL pictures can be dropped altogether, resulting in the lower layer pictures being output instead.

The subsequently explained embodiments generally follow two options. The first one provides enough information so that the RASL pictures of a higher quality are shown at the higher quality instead of the lower quality, for the case of using open GOP structures for a layered codec using inter-layer prediction over the whole time (having all layers present constantly). Another option, however, is provided for the case that it is desirable to have independent layer bitstreams due to the higher compression efficiency, while still using inter-layer prediction for upswitching.

In order to ease the understanding of the following more detailed description of the various embodiments concerning the first aspect of the present application, Fig. 4 shows a device for outputting, using an adaptive streaming protocol, a video to a client. The device is denoted as output device in Fig. 4 and indicated using reference sign 20. The output device 20 acts, accordingly, as a streaming server, and the adaptive streaming protocol used by device 20 may be DASH or any other adaptive streaming protocol. The device 20 may be implemented in the form of hardware, firmware or software. When implemented in hardware, device 20 may, for instance, be an integrated circuit. If implemented in firmware, device 20 may be an FPGA and, if implemented in software, device 20 may comprise one or more processors programmed by an appropriate computer program.

The device 20 supports switching between, at least, outputting the video at a first spatial resolution and outputting the video at a second spatial resolution. That is, the stream 22 output by output device 20 to the client may represent, or have encoded therein, the video 24 at a spatial resolution which varies in time and switches, for instance, between a first spatial resolution and a second spatial resolution which is greater than the first spatial resolution. The "spatial resolution" is, for instance, measured in samples per picture. Fig. 4, for instance, illustrates that output device 20 outputs the video 24 by way of stream 22 at a first spatial resolution during a temporal interval 26 and at a second spatial resolution within a temporal interval 28. Within temporal interval 26, stream 22 represents the pictures 30 of video 24 at the first spatial resolution, and during temporal interval 28, stream 22 represents the pictures 30 at the second spatial resolution. The scene section captured by pictures 30 during temporal intervals 26 and 28 may be the same, with merely the sample pitch at which pictures 30 spatially sample the scene differing between temporal intervals 26 and 28, or they may show differently sized sections of the same scene in accordance with an alternative embodiment, or a combination thereof.

The fact that output device 20 supports the switching between outputting the video 24 at the first spatial resolution and outputting the video at the second spatial resolution may, for instance, manifest itself in the ability of the client, an embodiment of which is described later, to retrieve from output device 20 the video 24 at the different spatial resolutions by requesting particular representations from the output device 20. As explained later on, output device 20 may, for instance, be a combination of a storage 32 storing an appropriately conceptualized data structure on the one hand and a manifest provider 34 on the other hand. Manifest provider 34 may, for instance, provide the client with a manifest which describes how a client may access storage 32 by respective requests. In doing so, the client may, on the basis of the manifest, select between temporal segments having encoded thereinto the video at the first spatial resolution and temporal segments having encoded thereinto the video at the second spatial resolution. Details in this regard are set out below.

Fig. 5 illustrates as to how output device 20 enables the usage of open GOP structures for representing and encoding the video 24 at the increased spatial resolution while, nevertheless, avoiding the loss of random access dependent pictures as presented with respect to Fig. 2 in case of switching between the spatial representations. In particular, Fig. 5 illustrates that output device 20 switches, in time, between outputting the video in form of a layered video stream 36 and an increased spatial resolution video stream 38. The details regarding these streams are described further below. That is, the stream 22 output by output device 20 changes between temporal intervals 26 where the output stream 22 is the layered video stream 36 and temporal phases 28 where the output stream 22 is the increased spatial resolution video stream 38. For example, the layered video stream 36 output during interval 26 represents the video during interval 26 and is concatenated or stitched with the increased spatial resolution video stream 38 which represents or has encoded therein the video at the, for example, temporally succeeding interval 28.

The layered video stream 36 comprises, as shown in Fig. 5, a first layer L1 having encoded therein the video at a first spatial resolution. In Fig. 5, the pictures of the video as they are encoded into layer L1 are denoted by reference sign 40. The layered video stream 36 comprises, however, also a second layer L2 having encoded therein the video at the second spatial resolution. The pictures of the second layer L2 are depicted in Fig. 5 using reference sign 42. The way the video 24 is coded into layers L1 and L2 is, however, different. As will be described later on, temporal prediction might, for example, be used to encode pictures 40 into layer L1. For example, a closed GOP structure might be used. The pictures 42 of the second layer, however, are encoded into layer L2 using inter-layer prediction from the first layer L1 to the second layer L2 by inter-layer upsampling, but without residual coding. The inter-layer upsampling is illustrated in Fig. 5 using vertical arrows 44, thereby illustrating that each picture 42 is purely inter-layer predicted on the basis of the temporally aligned picture 40 of layer L1. For example, the whole picture content of pictures 42 is obtained by upsampling from the corresponding portion of pictures 40. It is to be emphasized that this "coding" of pictures 42 comes at very low bitrate cost as no residual data has to be conveyed for layer L2 and the inter-layer prediction mode may be signaled for pictures 42, for example, at the coarsest granularity feasible.
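The residual-free inter-layer prediction described above can be sketched numerically as follows; the nearest-neighbour 2x filter and the function names are stand-in assumptions, not the actual upsampling filter of a layered codec such as SHVC:

```python
# Sketch of a second layer coded purely by inter-layer prediction without
# residual data: each L2 picture is obtained by upsampling the temporally
# aligned L1 picture, so no residual needs to be transmitted for L2.

def upsample2x(picture):
    """Nearest-neighbour 2x upsampling of a picture given as rows of samples."""
    wide = [[s for s in row for _ in (0, 1)] for row in picture]   # double columns
    return [row for row in wide for _ in (0, 1)]                   # double rows

def predict_l2(layer1_pictures):
    """L2 pictures as pure inter-layer predictions of L1, residual-free."""
    return [upsample2x(p) for p in layer1_pictures]

l1 = [[[1, 2], [3, 4]]]       # one 2x2 low-resolution picture
l2 = predict_l2(l1)           # one 4x4 picture, no residual coded
```

Since L2 carries no residual, its bitrate cost is essentially the prediction-mode signalling, which matches the "very low bitrate cost" noted above.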

The increased spatial resolution video stream 38 has encoded therein the video at the second spatial resolution using an open GOP structure. That is, the pictures 46 of the video 24 as they are encoded into the increased spatial resolution video stream 38 are of the second spatial resolution and among these pictures there are random access point pictures such as picture number 5 shown in Fig. 2 and random access dependent pictures such as pictures number 2, 3 and 4 in Fig. 2. Although Fig. 5 illustrates the case that layer L1, layer L2 and the increased spatial resolution video stream 38 have, for each of the pictures, corresponding temporally aligned pictures in the respective other ones, it should be noted that this does not have to be the case in accordance with an alternative embodiment. In order to illustrate as to how the mode of operation of the output device 20 enables the usage of the open GOP structure for forming the increased spatial resolution video stream 38 without loss of random access dependent pictures of this stream, the description of the output device 20 shall be briefly interrupted by a description of a corresponding client device shown in Fig. 6.

Fig. 6 shows a device for retrieving, using an adaptive streaming protocol, a video from a server such as the output device of Fig. 4. The device of Fig. 6 is denoted as client device 50 and may, as was the case with respect to the output device 20, be implemented in hardware, firmware or software. That is, device 50 may be an integrated circuit, an appropriately programmed FPGA, or one or more processors programmed by an appropriate computer program. The client device 50 supports switching between retrieving a video at the first spatial resolution and retrieving the video at the second spatial resolution. To this end, client device 50 retrieves a stream 52 of temporal segments from the server which are selected, per temporal segment, out of different versions of the video or different streams representing the video at different bit rates. For example, stream 52 may be stream 22 of Figs. 4 and 5, with client device 50 switching between retrieving the video via layered video stream 36 and increased spatial resolution video stream 38, the latter corresponding to a higher bit rate than the layered video stream 36. Internally, client device 50 may comprise a requester 54 responsible for requesting, for example, the aforementioned manifest from the server and sending requests to the server for fetching the temporal segments of the streams offered by the server, such as temporal segments of streams 36 and 38 between which requester 54 switches in order to, for example, avoid buffer over- or underflow. For example, client device 50 also comprises a buffer 56 for buffering the inbound temporal segments fetched by requester 54 before they are subject to decoding by forwarding the buffered temporal segment to a video decoder.

The video decoder may be part of the client device 50 or may be external thereto. Fig. 6 illustrates the video decoder 58 as being external to the client device 50.

Device 50, thus, receives stream 52 from the server by requesting temporal segments of different streams having encoded therein the video at different bit rates, and outputs or forwards stream 52 to video decoder 58, thereby retrieving the video at varying spatial resolution.

In doing so, device 50 is configured to, in a transition phase between retrieving a first portion of the video at a first spatial resolution and retrieving a third portion of the video at the second spatial resolution, retrieve a second portion of the video, subsequent to the first and preceding the third portion, at the second spatial resolution by use of up-sampling from the first spatial resolution to the second spatial resolution.
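The three-phase retrieval just described can be sketched as a small scheduling helper. This is an illustrative sketch only; the function name, the portion labels and the segment-index granularity are assumptions for illustration and not part of any standardized API.

```python
# Illustrative sketch of the retrieval schedule described above: before
# switching up at segment index `switch_at`, the client spends
# `transition_len` segments in a transition phase (the second portion),
# during which pictures of the second resolution are obtained as
# up-sampled substitutes. All names are hypothetical.

def schedule(num_segments, switch_at, transition_len):
    """Return, per segment index, which temporal portion it belongs to."""
    plan = []
    for i in range(num_segments):
        if i < switch_at - transition_len:
            plan.append("first portion: low resolution")
        elif i < switch_at:
            plan.append("second portion: transition, up-sampled substitutes")
        else:
            plan.append("third portion: increased resolution")
    return plan
```

With a one-segment transition phase, for example, only the segment immediately preceding the switch belongs to the second portion.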

In order to illustrate the latter circumstance and how device 50 enables the usage of an open GOP structure for encoding the video into stream 52 at the second spatial resolution without loss of random access dependent pictures, reference is made to Fig. 7. As Fig. 7 illustrates, client device 50 retrieves the video 24 within a first temporal portion 60 at a first spatial resolution and within a third temporal portion 62 at the second, increased spatial resolution. Fig. 7 illustrates this by depicting the pictures of video 24 at different sizes. That is, within the temporal portion or phase 60, client device 50 retrieves temporal segments of a first stream offered, or rendered available for download, at the server and within the third temporal portion 62 or phase, client device 50 retrieves temporal segments of another stream offered, or rendered available for download, at the server. In between, there is the second temporal portion 64, i.e., preceding temporal portion 62 and succeeding temporal portion 60. Within this temporal portion, client device 50 obtains substitutes for pictures of the second, increased spatial resolution by way of upsampling from the first to the second spatial resolution as illustrated by arrow 66. By this measure, client device 50 obtains substitutes or supplemental estimates 68 for pictures of the second spatial resolution, i.e., substitute pictures 68. Some of these substitute pictures 68 may be used as substitutes for reference pictures of random access dependent pictures of the video 24 within temporal portion 62. That is, the representation downloaded by client device 50 during temporal phase 62 may be encoded using an open GOP structure and, nevertheless, the random access dependent pictures may be prevented from being lost.

Fig. 8 illustrates the mode of operation of client device 50 in accordance with an embodiment where client device 50 cooperates with output device 20 which offers streams 36 and 38 as described with respect to Fig. 5. That is, stream 52 is a stream like stream 22 explained with respect to Fig. 5. As shown in Fig. 8, client device 50 retrieves, during the second temporal portion 64, layer 1 and layer 2, L1 and L2, of the layered video stream 36 from the output device by fetching temporal segments thereof from output device 20. Client device 50 submits both layers L1 and L2 to video decoder 58 which, in turn, performs the up-sampling 66 in decoding the second layer L2, as the second layer L2 is coded using inter-layer prediction 44 as discussed above. By this measure, video decoder 58 fills an internal decoded picture buffer with pictures of the second spatial resolution which may then serve as reference pictures for random access dependent pictures of the increased spatial resolution video stream 38 which client device 50 retrieves by fetching corresponding temporal segments thereof during subsequent temporal portion 62. In the preceding temporal portion 60, in turn, client device 50 may merely submit the first layer for decoding to video decoder 58, i.e., without the second layer. Client device 50 may retrieve the second layer during the temporal portion 60 or not, depending on whether, for example, output device 20 allows for a separate retrieval or fetching of layers L1 and L2 of the layered video stream 36.

Fig. 9 illustrates the case that the increased spatial resolution video stream 38 may also be a layered video stream having a first layer L1 and a second layer L2, wherein the pictures 46 of the second spatial resolution are not only coded using an open GOP structure, i.e., using temporal prediction, but also using inter-layer prediction 70 using upsampling from pictures 72 of layer L1 of stream 38 to the second resolution of pictures 46. Stream 38, however, then also uses residual coding for coding pictures 46 of layer L2. In other words, in the example of Fig. 9, pictures 42 of layer L2 of the layered video stream 36 are coded into data stream 36 without exploiting temporal redundancies, whereas pictures 46 are coded into stream 38 with exploiting both inter-layer and temporal redundancies, i.e., by removing them and using residual prediction. This corresponds to the first option mentioned before the description of Fig. 4. In accordance with an alternative embodiment, pictures 46 are encoded into stream 38 as a layer of a layered video stream but without inter-layer prediction, i.e., as an independent layer. The layer index of pictures 46 coded into data stream 38 may be the same as the layer index of L2 in layered video stream 36. A transition between temporal portions 64 and 62 which would then result is illustrated in Fig. 10. Fig. 10 shows two consecutive segments within stream 52 arriving at device 50 at the junction between temporal portions 64 and 62, i.e., the first segment in data stream 52 carries layers L1 and L2 of layered video stream 36, and a temporal segment succeeding thereto carries the independently coded layer L2 of stream 38. As can be seen, the pictures of layer L2 of layered data stream 36, which are obtained by inter-layer upsampling, serve as reference pictures for the RASL pictures of the following segment of stream 38 which segment, in turn, is coded without inter-layer prediction.
That is, although reference pictures for RASL pictures in the independent layer L2 of stream 38 are required to be present in the decoded picture buffer (DPB) of decoder 58, at the correct resolution, this does not hinder the possibility of switching between the different spatial resolutions due to the measures taken and described above.

Thus, with respect to the above examples, an embodiment has been described where a layered codec such as SHVC has been used in order to allow the usage of open GOP structures in adaptive streaming for encoding a higher spatial resolution representation of a video. The embodiments generate and offer an "auxiliary switching track" as well as information to a user/client on the existence and usage of such a track.

As will be described in more detail below, timing information may be conveyed from server to client in order to inform the client as to how long the transition phase 64 between switching from a lower spatial resolution to a higher spatial resolution representation should be. By this measure, the client is informed about the necessity, for example, of decoding additional NAL units encapsulated within the "auxiliary switching track" that should be decoded some time before switching to the actual higher quality track during temporal portion 62. Hence, higher quality layer RASL pictures such as those shown in Fig. 10 can be decoded with significantly less impairment at visually appealing quality and be output instead of the corresponding pictures of the lower quality track, in case of considering open GOP structures for a layered codec using inter-layer prediction. In case of using layered codecs with single-layer prediction (so-called independent layers), the client should schedule enough time for decoding the "auxiliary switching track" represented by pictures 42 a specific time before starting to decode a higher independent layer encoded in open GOP configuration such as, for example, pictures 46 with associated RASL pictures.

Briefly, referring back to the description of Fig. 6, it should be noted that the client device 50 may be agnostic with respect to the way the streams or representations available for retrieval at the server are coded. Accordingly, in accordance with an embodiment, the output device or server informs the client device or client on the necessity to schedule the transition phase pertaining to the second temporal portion 64 between the retrieval of the video at the first spatial resolution and the retrieval of the video at the second spatial resolution, or before switching to the video stream 38. Depending on this signalization, the client device 50 may skip or leave off the transition phase or not. By this measure, another video available at the same server or another server at different spatial resolution representations, with the higher spatial resolution representation being coded, for example, in a closed GOP structure, may be retrieved without any transition phase when switching from the lower spatial resolution representation to the higher spatial resolution representation.

In a concrete example, streams 22 and 52, respectively, may be transferred between server and client, or device 20 and 50, respectively, in a file format where an additional track is spent for carrying layer L2 of the layered video stream 36. This track could be marked as "switching track/representation". The marking or indication as switching track does not have to be contained in the file format but could be contained in the manifest sent from server to client, i.e., from device 20 to device 50, such as the MPD in DASH, or in the initial segment of the respective video. Although server and client, i.e., devices 20 and 50, could use a default temporal length for the transition phase or temporal portion 64, so that the aforementioned signalization in, for example, the manifest regarding the transition phase may merely correspond to a binary signalization in the sense of switching between the necessity of a transition phase of predetermined length and the leaving-off of the respective transition phase, it is alternatively possible that the server informs the client on the length of the transition phase and the length of the temporal portion 64, respectively. The length could be indicated by indexing one of a plurality of predetermined length values agreed between server and client, by an indication of the length in units of temporal segments at which the video is retrievable by the adaptive streaming protocol, or in units of time such as in units of picture order count or the like. For example, the manifest or media presentation description sent from the server or device 20 to the client or device 50 could be provided with an indication of the length of temporal portion 64 such as @switchingTimeShift or @numSwitchRepSegments.

Later on, it will be shown that stream 38 may be a video stream comprising a supplemental enhancement information (SEI) that allows for the derivation of the just mentioned transition phase length by providing information on a maximum distance to the reference picture from a RASL picture referring to the respective reference picture, wherein this information is to be understood as a promise. In HEVC, the structure of pictures SEI is, for example, not scoped for the whole coded video sequence (CVS) and could, accordingly, not suffice in this regard. Accordingly, a new type of supplemental enhancement information (SEI) would be advantageous.

Device 20 of the server could, accordingly, derive the length of the transition period 64 from this supplemental enhancement information and inform the client or device 50 via the manifest accordingly.

As also becomes clear from the above discussion, the client or device 50 may either be configured to inevitably apply the transition phase concerning temporal portion 64, thereby inevitably playing out the switching track or switching representation in the form of layer L2, or the transition phase would be optional and would be switched on by the server or device 20 using the aforementioned signalization in, for example, the manifest. In other words, it could be optional or mandatory to play out the switching track or representation in the form of layer L2 of the layered video stream 36.

As far as layer L1 of the layered video stream 36 is concerned, it is noted that it may be coded in a closed GOP structure using, for example, IDR pictures. By this measure, client or client device 50 may directly, i.e., without any transition, switch from the higher spatial resolution, i.e., downloading stream 38, to the lower spatial resolution, i.e., downloading layer L1 of stream 36.

Fig. 11 illustrates a concrete example with respect to the way the auxiliary switching track in the form of layer L2 of layered video stream 36 is offered to the client. Fig. 11 illustrates the data structure that might be stored in storage 32 of the output device 20. The data structure is indicated using reference sign 80 and comprises the increased spatial resolution video stream 38 and the layered video stream 36. Both are temporally subdivided into a sequence of temporal segments. The temporal segments of stream 38 are denoted 381 ... 38N and the temporal segments of stream 36 are denoted 361 ... 36N. Time-aligned temporal segments 38i and 36i pertain to, or have encoded therein, a corresponding temporal portion of the video. In accordance with the embodiment of Fig. 11, layer L2 of layered video stream 36 is not separately retrievable by the client. Rather, layer L2 is included as an additional track within the same segments 36i within which layer L1 of stream 36 is conveyed. Thus, as shown at 82, the client or client device 50 would schedule a transition phase 84 before any start 86 of retrieving the video in the form of data stream 38 from the server or device 20. Within the transition phase 84, stream 22/52 comprises a sequence of corresponding temporal segments of stream 36. That is, during the transition phase 84, device 50 fetches the segments belonging to transition phase 84 out of the segments of layered video stream 36, thereby forming temporal portion 64. From time 86 onwards, device 50 fetches the temporal segments out of the sequence of segments of stream 38 until switching back from the increased spatial resolution to the lower spatial resolution. The difference between the mode of operation of device 50 during the transition phase 84 and the time before is the following.

As can be seen in Fig. 11, in the embodiment shown, the client device merely has the choice between fetching segments of layered video stream 36 or segments of increased spatial resolution video stream 38. Before switching to increased spatial resolution video stream 38, client device 50 schedules the transition phase 84. Before the transition phase, client device 50 merely forwards layer L1 of the layered video stream 36 to decoding by video decoder 58 while, during the transition phase, client device 50 forwards both layers L1 and L2 to the video decoder 58. During this time 84, the video decoder 58 reconstructs pictures 42 of layer L2 of the layered video stream which then serve as reference pictures for random access dependent pictures of one or more segments of the increased spatial resolution video stream 38 retrieved from the server or device 20 from time 86 onwards. Fig. 11 illustrates the above outlined possibility that client device 50 schedules the transition phase 84 responsive to a corresponding signalization 88 from the server or output device 20, which signalization 88 may, for instance, be included in the media presentation description or manifest 90. If the signalization 88 indicates layer L2 to be used as reference picture substitute reservoir in transition phase 84, client device 50 acts as described so far. If not, client device 50 does not schedule the transition phase 84 before starting, at time 86, to retrieve temporal segments of the increased spatial resolution video stream 38 but directly extends the phase of merely subjecting layer L1 to decoding by video decoder 58 up to switching time 86, as illustrated at the bottom of Fig. 11.

The latter embodiment of Fig. 11 involved including the "auxiliary switching track" L2 of stream 36 within the segments of the layered video stream 36. In the media presentation description or manifest 90, this auxiliary switching track would be indicated as a representation separate from a representation formed by layer L1 of the layered video stream 36. Manifest 90 would signal, for instance, the required decoding capabilities for video decoder 58 to decode layer L2 which, in turn, is dependent on layer L1, i.e., to decode the "auxiliary switching track", and indicate the decoding capabilities for video decoder 58 to decode merely the low-resolution layer L1 of the layered video stream 36.

The following concrete signalization could be used within manifest 90 in order to signal to the client device 50 information concerning the auxiliary switching track L2 such as, for example, information 88 which indicates the existence of the auxiliary switching track L2 and, maybe, concurrently the length of the transition phase 84. Additionally, as just outlined, the required capabilities with respect to L2 are signaled as well.

The required capabilities of a representation are currently signaled with the @mimeType attribute. The first attribute that would need to be defined is one indicating that switching to a given representation is allowed, i.e. that the required "auxiliary track" is included within the segments. Such an attribute could be named e.g. @switchableTo. Additionally, the @switchingMimeType attribute should be defined, describing the required capabilities when the "auxiliary switching track" is decoded. Finally, the time before the switch at which the "auxiliary switching track" needs to start being decoded needs to be signaled so that the DASH client can decide whether it can switch to a higher-resolution representation seamlessly or not (@switchingTimeShift/@numSwitchRepSegments). In order to be able to switch to such a higher-resolution representation, the user is required to random-access the lower representation from a SAP earlier than the time described by @switchingTimeShift/@numSwitchRepSegments. The concrete signaling could look as follows:

Element or Attribute Name (Use): Description

Representation: This element contains a description of a Representation.

@id (M): specifies an identifier for this Representation. The identifier shall be unique within a Period unless the Representation is functionally identical to another Representation in the same Period. The identifier shall not contain whitespace characters. If used in the template-based URL construction as defined in 5.3.9.4.4, the string shall only contain characters that are permitted within an HTTP-URL according to RFC 1738.

@bandwidth (M): Consider a hypothetical constant bitrate channel of bandwidth with the value of this attribute in bits per second (bps). Then, if the Representation is continuously delivered at this bitrate, starting at any SAP that is indicated either by @startWithSAP or by any Segment Index box, a client can be assured of having enough data for continuous playout providing playout begins after @minBufferTime * @bandwidth bits have been received (i.e. at time @minBufferTime after the first bit is received). For dependent Representations this value shall specify the minimum bandwidth as defined above of this Representation and all complementary Representations.

@qualityRanking (O): specifies a quality ranking of the Representation relative to other Representations in the same Adaptation Set. Lower values represent higher quality content. If not present then no ranking is defined.

@dependencyId (O): specifies all complementary Representations the Representation depends on in the decoding and/or presentation process as a whitespace-separated list of values of @id attributes. If not present, the Representation can be decoded and presented independently of any other Representation. This attribute shall not be present where there are no dependencies.

@switchableTo (O): specifies all Representations for which the Representation contains additional data to switch to as a whitespace-separated list of values of @id attributes. This attribute shall not contain any value x if this Representation and the Representation with @id=x do not have the same @mediaStreamStructureId.

@switchingMimeType (O): specifies the MIME type of the concatenation of the Initialization Segment, if present, and some consecutive Media Segments in the Representation and some consecutive Media Segments in the Representation pointed to by @switchableTo as a whitespace-separated list of values.

@switchingTimeShift (O): specifies the time before the start of the segment at which switching can happen (e.g. SAP 3) at which the additional data corresponding to the @switchingMimeType needs to be decoded to be able to seamlessly switch to a given representation.

@numSwitchRepSegments (O): specifies the number of segments before the start of the segment at which switching can happen (e.g. SAP 3) at which the additional data corresponding to the @switchingMimeType needs to be decoded to be able to seamlessly switch to a given representation.

CommonAttributesElements: Common Attributes and Elements (attributes and elements from base type RepresentationBaseType); for more details see 5.3.7.

Legend:

For attributes: M=Mandatory, O=Optional, OD=Optional with Default Value, CM=Conditionally Mandatory.

For elements: ... (N=unbounded)

Elements are bold; attributes are non-bold and preceded with an @. A list of elements and attributes in bold italics refers to those taken from the Base type that has been extended by this type.
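Under the @switchingTimeShift semantics described above, a DASH client's seamless-switch decision can be sketched as follows. The helper name and its parameters are assumptions for illustration, not part of the MPD schema.

```python
# Sketch of the client-side check implied by @switchingTimeShift: a
# seamless switch at `switch_time` is only possible if the lower
# representation was random-accessed at a SAP at least
# `switching_time_shift` time units earlier, so that the auxiliary
# switching track could be decoded in time. Hypothetical helper.

def can_switch_seamlessly(sap_access_time, switch_time, switching_time_shift):
    return sap_access_time <= switch_time - switching_time_shift
```

A client random-accessing too late would simply defer the switch to the next eligible switching point.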

An alternative to the description brought forward with respect to Fig. 11 could be that it is agreed between client device 50 and output device 20 that client device 50 inevitably subjects the auxiliary switching track L2 to decoding by video decoder 58. Video decoder 58 would automatically have reference picture substitutes for random access dependent pictures of the increased spatial resolution video stream 38 at hand provided that any switching to the increased spatial resolution video stream 38 does not take place earlier than the length of the transition phase 84 from having started fetching a sequence of segments of the layered video stream 36. Accordingly, even in this case of requiring the client device 50 to inevitably subject layer L2 to decoding, client device 50 is to schedule the transition phase 84 before switching to the increased spatial resolution data stream 38. Thus, in an alternative embodiment to the one described with respect to Fig. 11, the user is signaled that, in order to be able to switch seamlessly to another representation n+1, no additional time information is required but the user must decode the "auxiliary switching track" the whole time from the first AU present in the auxiliary track in segment n. Still, in such a case, the mimeType of this alternative representation would be necessary for the user to know what is required to be able to decode such a track. Besides, the user could derive the resolution of the output from the Representation pointed to by the @switchableTo attribute. In order to be able to switch to such a higher-resolution representation, the user is required to random-access the lower representation from any SAP earlier than the SAP in the higher resolution.

Element or Attribute Name (Use): Description

Representation: This element contains a description of a Representation.

@id (M): specifies an identifier for this Representation. The identifier shall be unique within a Period unless the Representation is functionally identical to another Representation in the same Period. The identifier shall not contain whitespace characters. If used in the template-based URL construction as defined in 5.3.9.4.4, the string shall only contain characters that are permitted within an HTTP-URL according to RFC 1738.

@bandwidth (M): Consider a hypothetical constant bitrate channel of bandwidth with the value of this attribute in bits per second (bps). Then, if the Representation is continuously delivered at this bitrate, starting at any SAP that is indicated either by @startWithSAP or by any Segment Index box, a client can be assured of having enough data for continuous playout providing playout begins after @minBufferTime * @bandwidth bits have been received (i.e. at time @minBufferTime after the first bit is received). For dependent Representations this value shall specify the minimum bandwidth as defined above of this Representation and all complementary Representations.

@qualityRanking (O): specifies a quality ranking of the Representation relative to other Representations in the same Adaptation Set. Lower values represent higher quality content. If not present then no ranking is defined.

@dependencyId (O): specifies all complementary Representations the Representation depends on in the decoding and/or presentation process as a whitespace-separated list of values of @id attributes. If not present, the Representation can be decoded and presented independently of any other Representation. This attribute shall not be present where there are no dependencies.

@switchableTo (O): specifies all Representations for which the Representation contains additional data to switch to as a whitespace-separated list of values of @id attributes. This attribute shall not contain any value x if this Representation and the Representation with @id=x do not have the same @mediaStreamStructureId.

@switchingMimeType (O): specifies the MIME type of the concatenation of the Initialization Segment, if present, and some consecutive Media Segments in the Representation and some consecutive Media Segments in the Representation pointed to by @switchableTo as a whitespace-separated list of values.

CommonAttributesElements: Common Attributes and Elements (attributes and elements from base type RepresentationBaseType); for more details see 5.3.7.

Legend:

For attributes: M=Mandatory, O=Optional, OD=Optional with Default Value, CM=Conditionally Mandatory.

For elements: ... (N=unbounded)

Elements are bold; attributes are non-bold and preceded with an @. A list of elements and attributes in bold italics refers to those taken from the Base type that has been extended by this type.

As stated above, the length of transition phase 84 may be set to a default value so that there is no need to transmit same. For example, by default, transition phase 84 could be one segment length long. That is, the temporal coding inter-dependencies could be limited so as to not be longer than one segment length, at least as far as representation switching instances are concerned, i.e. times where switching between representations is allowed. A further alternative embodiment of using the transition phase so as to improve switching between different qualities uses this default setting and could be implemented as follows. In particular, the just-described embodiment could be used to inform a client, in a manifest file such as a DASH MPD, on the advantageous consideration of the transition phase in switching to a higher quality layer.
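With the default of a one-segment transition phase, the set of segments for which the auxiliary switching track must additionally be decoded becomes trivial to compute. The following is an illustrative sketch under that assumption; the helper and its defaults are hypothetical.

```python
# Sketch of the default described above: with temporal coding
# inter-dependencies limited to one segment length, a client that switches
# to the high-quality representation at segment index `switch_segment`
# needs the auxiliary switching track only for the immediately preceding
# segment(s). Hypothetical helper, not an API from the document.

def segments_needing_aux_track(switch_segment, transition_segments=1):
    """Return the segment indices forming the transition phase."""
    first = max(0, switch_segment - transition_segments)
    return list(range(first, switch_segment))
```

For a switch at segment 5 with the default setting, only segment 4 belongs to the transition phase.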

For example, a Supplemental Property Descriptor could be denoted as "urn:mpeg:dash:resolutionSwitching:2016" and used to indicate which Representations allow for a seamless resolution switching at the start of any Segment starting with a SAP type in the range of 1 to 3, inclusive. The descriptor could be placed on Adaptation Set or Representation level in the MPD hierarchy when used in DASH. @value of the supplemental property descriptor is a white-space separated list of two values as specified in the following table:

SupplementalProperty@value attributes resolutionSwitching:2016

That is, this example shows that a descriptor could indicate, for a certain representation such as L1, which representations are available for being switched to, such as L2. Irrespective of this descriptor indicating such representation(s) explicitly, the descriptor could, by its presence in the MPD, indicate that one segment prefetched in advance before switching to representation L2 suffices to have all temporal references potentially preceding the switching point owing to open GOP structure, for instance. In other words, by default, the resolution switching descriptor shall not be present unless all access units in a segment N with presentation time within [TEPT, TDEC) are constrained in such a way that they only depend on access units of segment N or segment N-1. Thus, if a Representation is changed at segment N, where this descriptor is present, it might be necessary to decode an additional media stream during segment N-1, namely, in Fig. 11, the enhancement layer of the layered stream, different to the one conforming to the @codecs attribute indicated at the "switch-from" representation, whose presence is indicated by the presence of @switchingMimeType, namely, in Fig. 11, the single-layered high quality stream, in order to be able to decode all access units preceding the first SAP (i.e., in the interval [TEPT, TDEC)) of segment N of the "switch-to" Representation.

Fig. 12 shows an alternative embodiment compared to the embodiment of Fig. 11 where the data structure 80 has separate time-aligned segments for layers L1 and L2 of layered video stream 36, namely time-aligned segments 361i and 362i. All temporal segments 361i, 362i and 38i are associated with different addresses and are therefore individually fetchable by client device 50. Here, the client device 50 merely fetches segments 361i from the output device 20 during the time portion preceding the transition phase 84. During the transition phase 84, client device 50 retrieves for each temporal segment i of the video both temporal segments 361i and 362i from the output device 20, thereby forwarding to the video decoder 58 not only layer L1 but also layer L2. From time 86 on, client device 50 retrieves or fetches the temporal segments 38i of the increased spatial resolution video stream 38 and forwards same to the video decoder 58. Again, Fig. 12 illustrates that information 88 may control the client device 50 to apply or not apply the transition phase 84.
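The per-segment fetching behaviour of Fig. 12 can be sketched as follows. The segment address pattern and function name are purely illustrative assumptions; the document does not prescribe any URL scheme.

```python
# Minimal sketch of the Fig. 12 fetching logic, where layers L1 and L2 of
# stream 36 live in separately addressable segments (361_i and 362_i) and
# stream 38 in segments 38_i. Segment names are hypothetical placeholders.

def urls_to_fetch(i, in_transition, after_switch):
    """Return the segment addresses to request for temporal segment i."""
    if after_switch:
        return [f"seg_38_{i}"]                    # increased-resolution stream
    if in_transition:
        return [f"seg_361_{i}", f"seg_362_{i}"]   # both layers of stream 36
    return [f"seg_361_{i}"]                       # base layer L1 only
```

Before the transition phase only the base-layer segment is requested; during it, both layer segments of the same index are requested.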

That is, Fig. 12 illustrates an embodiment where a separate representation is used that contains the additional data required for switching, namely the data within layer L2 of the layered video stream 36. That is, in the embodiment of Fig. 12, this data is not included within the same segments also carrying the base layer L1.

Fig. 13 illustrates, for the latter embodiment, the same situation as shown in Fig. 3, however with the client device retrieving two segments for the first temporal segment of the video, namely one of representation 1 corresponding to layer L1 of the layered video stream 36, and the corresponding temporal segment of a representation 3 which corresponds to layer L2 of the layered video stream. As far as the manifest 90 and the description of the availability of the video at the output device or server 20 is concerned, the following may be noted.

In such a case, Rep3 should include @dependencyId=Rep1, and Rep2 and Rep3 should have the same @mediaStreamStructureId. In such a case, Rep3 would not require an additional @mimeType as, being a representation, it should already include it. However, this representation should be marked as "only intended for switching" with, for instance, a parameter @switchingRepresentation. As for the previous case, timing information could be included indicating from which point onward it is needed to decode such a representation to be able to switch to another representation, or it could be restricted in such a way that, as long as it is decoded from the SAP in Rep1 previous to the switching point in Rep2, all required references are available.

Another embodiment consists of having only closed GOP RAPs (or switching points) in the lowest resolution and only open GOP RAPs in the higher resolution. This allows for seamless switching to the lowest quality at all available RAPs. Alternatively, if more resolutions are available, for instance 3, the lowest resolution has only closed GOP RAPs, the highest resolution has only open GOP RAPs and the middle resolution representation has a mixture of both. Switching up is possible at the presence of any RAP but switching down only at the presence of closed GOPs. In such a case, the existing @switchingPeriod should be extended to differentiate between upSwitching and downSwitching.
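The up/down-switching rule of this embodiment can be summarized in a small sketch. The RAP-type labels and the predicate name are illustrative assumptions only.

```python
# Sketch of the rule described above: up-switching is permitted at any
# RAP (closed or open GOP), while down-switching is permitted only at a
# closed GOP RAP. Labels are hypothetical, not standardized identifiers.

def switch_allowed(rap_type, direction):
    """rap_type: 'closed_gop' or 'open_gop'; direction: 'up' or 'down'."""
    if direction == "up":
        return rap_type in ("closed_gop", "open_gop")
    return rap_type == "closed_gop"  # down-switching
```

This directly reflects why the middle representation, carrying a mixture of both RAP types, supports up-switching everywhere but down-switching only at its closed GOP RAPs.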

A further embodiment relates to the presence, in the video, of information about the largest number of pictures in the past that the RASL pictures can refer to for prediction. This information would be required to derive the attributes at the MPD described in the previous paragraphs. It could be included, for instance, in the form of an SEI message or in the VUI itself.
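Assuming the signalled value is expressed as a time span, a client could derive the transition-phase length in whole segments roughly as follows (an illustrative sketch; the rounding rule is an assumption, not taken from the text):

```python
import math

# Sketch deriving a transition-phase length, in segments, from the maximum
# temporal distance that RASL pictures may reach back, as could be signalled
# in an SEI message or the VUI. The transition phase must cover at least that
# distance, rounded up to whole segments.

def transition_length_segments(max_rasl_reach_s, segment_duration_s):
    return math.ceil(max_rasl_reach_s / segment_duration_s)

# With a 1.5 s reach-back and 1 s segments, two segments must be buffered:
assert transition_length_segments(1.5, 1.0) == 2
```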

Claims

1. Device for outputting, using an adaptive streaming protocol, a video (24) to a client, the device supporting switching between, at least,

outputting the video (24) in form of a layered video stream (36); and

outputting the video (24) in form of an increased spatial resolution video stream (38) encoded using an open GOP structure and having encoded thereinto the video at a second spatial resolution and at a second quality,

the layered video stream (36) comprising

a first layer (L1) having encoded thereinto the video at a first spatial resolution and

a second layer (L2) having encoded thereinto the video at the second spatial resolution and a first quality lower than the second quality and using inter-layer prediction (44) from the first to the second layer by way of inter-layer upsampling, but without prediction residual coding.

2. Device according to claim 1, wherein the increased spatial resolution video stream (38) is a further layered video stream comprising

a further first layer (L1) having encoded thereinto the video at the first spatial resolution and

a further second layer (L2) having encoded thereinto the video at the second spatial resolution using temporal prediction in the open GOP structure and using inter-layer prediction from the further first layer (L1) to the further second layer (L2) by way of inter-layer upsampling and using prediction residual coding.

3. Device according to claim 2, wherein the first layer and the further first layer have the video encoded thereinto equally coded at the first spatial resolution so that a reconstruction of the video on the basis of the layered video stream (36) and the further layered data stream, spliced together at a splicing point, at the first spatial resolution is equal to a reconstruction of the video at the first spatial resolution on the basis of any of the layered video stream and the further layered data stream, respectively.

4. Device according to claim 2 or 3, wherein the first layer and the further first layer and the further second layer are encoded using an open GOP structure.

5. Device according to claim 1, wherein the increased spatial resolution video stream is a further layered video stream comprising a further second layer having encoded thereinto the video at the second spatial resolution using temporal prediction in the open GOP structure and using prediction residual coding and without inter-layer prediction.

6. Device according to any of claims 2, 3 and 5, wherein the first layer is encoded using a closed GOP structure.

7. Device according to any of claims 2 to 6, wherein the second layer and the further second layer are labeled using a common layer ID so that splicing the layered video stream and further layered video stream results in a spliced layered video stream comprising a layer with the common layer ID having encoded thereinto the video at the second spatial resolution.

8. Device according to claim 1, wherein the increased spatial resolution video stream (38) is a further layered video stream comprising a further second layer having encoded thereinto the video at the second spatial resolution and the second layer and the further second layer are labeled using a common layer ID so that splicing the layered video stream and the further layered data stream results in a spliced layered video stream comprising a layer with the common layer ID having encoded thereinto the video at the second spatial resolution.

9. Device according to any of the previous claims, wherein the device is configured to inform the client that the client is to schedule a transition phase before a switch from the layered video stream to the increased spatial resolution video stream in which the client is to derive a supplemental estimate of the video at the second spatial resolution by use of the second layer of the layered video stream.

10. Device according to claim 9, wherein a length of the transition phase exceeds or is equal to a maximum distance between

random access dependent pictures of the increased spatial resolution video stream which respectively directly or indirectly reference, by temporal prediction, a, in terms of presentation time order, succeeding random access point picture of the increased spatial resolution video stream and a reference picture preceding, in terms of presentation time order and decoding order, the random access point picture on the one hand and the reference picture directly or indirectly referenced by the random access dependent pictures on the other hand.

11. Device according to claim 9 or 10, wherein the device is configured to indicate to the client the length of the transition phase

in units of temporal segments of the layered video stream and the increased spatial resolution video stream, or

in temporal units.

12. Device according to any of the previous claims, wherein the device is configured to provide the client with a manifest

describing an availability of the video for the client in the first spatial resolution in form of the layered video stream (36) and in the second spatial resolution in form of the increased spatial resolution video stream (38) and

indicating a presence of the second layer (L2) of the layered video stream (36) along with the first layer (L1) of the layered video stream (36) in temporal segments of the layered video stream and a purpose of the second layer (L2) as a means for deriving a supplemental estimate of the video at the second spatial resolution so as to switch from the layered video stream to the increased spatial resolution video stream;

indicating a computational rule to compute addresses for fetching temporal segments of the layered video stream and the increased spatial resolution video stream.

13. Device according to any of claims 9 to 12, wherein the device is configured to insert into the manifest information indicating that the client is to schedule a transition phase before a switch from the layered video stream to the increased spatial resolution video stream in which the client is to derive a supplemental estimate of the video at the second spatial resolution by use of the second layer of the layered video stream.

14. Device according to claim 13, wherein the device is configured to derive a length of the transition phase from an SEI of the increased spatial resolution video stream.

15. Device according to any of claims 1 to 14, wherein the device supports switching between, at least,

outputting the video in form of the layered video stream;

outputting the video in form of the increased spatial resolution video stream; and

outputting the video in form of a reduced spatial resolution stream having encoded thereinto the video at the first spatial resolution.

16. Device according to claim 15, wherein the reduced spatial resolution stream is

a single layer video stream which, or

an even further layered video stream comprising an even further first layer which

has encoded thereinto the video at the first spatial resolution.

17. Device according to claim 15, wherein the reduced spatial resolution stream is

an even further layered video stream comprising an even further first layer,

wherein the first layer and the even further first layer have the video encoded thereinto equally coded at the first spatial resolution so that a reconstruction of the video on the basis of the layered video stream and the even further layered video stream, spliced together at a splicing point, at the first spatial resolution is equal to a reconstruction of the video at the first spatial resolution on the basis of any of the layered video stream and the even further layered video stream, respectively.

18. Device according to claim 17, wherein the even further first layer is encoded using a closed GOP structure.

19. Device according to claim 17 or 18, wherein the device is configured to provide the client with a manifest describing an availability of the video for the client at the server in the first spatial resolution and in the second spatial resolution and indicating a computational rule to compute addresses for fetching temporal segments of the first layer, the second layer, the even further first layer and the increased spatial resolution video stream, which are different for the first layer, the second layer and the increased spatial resolution video stream, but are equal for the even further first layer and the first layer.

20. Device according to any of claims 15 to 19, wherein the device is configured to provide the client with a manifest describing an availability of the video for the client at the server in the first spatial resolution and in the second spatial resolution and indicating a computational rule to compute addresses for fetching temporal segments of the increased spatial resolution video stream, the first layer, the second layer and the reduced spatial resolution stream which differ for the increased spatial resolution video stream, the first layer, the second layer and the reduced spatial resolution stream.

21. Device for retrieving, using an adaptive streaming protocol, a video, the device supporting switching between, at least,

retrieving the video in form of a first video stream (36); and

retrieving the video in form of a second video stream (38),

wherein the device is configured to schedule a transition phase (64) before switching from retrieving the video in form of the first video stream (36) to retrieving the video in form of the second video stream (38).

22. Device according to claim 21, wherein the second video stream (38) is encoded using an open GOP structure and the device is configured to subject the first video stream (36) and second video stream (38) in a manner spliced together to decoding such that pictures decoded from the first video stream form, for random access dependent pictures of the second video stream which respectively directly or indirectly reference, by temporal prediction, a, in terms of presentation time order, succeeding random access point picture of the second video stream and a reference picture preceding, in terms of presentation time order and decoding order, the random access point picture, a substitute of the reference picture.

23. Device according to claim 21 or 22, wherein the first video stream is a layered video stream comprising

a first layer having encoded thereinto the video at the first spatial resolution and

a second layer having encoded thereinto the video at a second spatial resolution greater than the first spatial resolution and using inter-layer prediction from the first to the second layer by way of inter-layer upsampling, but without residual coding.

24. Device according to claim 23, wherein the device is configured to retrieve the layered video stream in temporal segments containing the first and second layers.

25. Device according to claim 24, wherein the device is configured to restrict subjecting the second layer to decoding along with the first layer to a time during the transition phase.

26. Device according to claim 23, wherein the device is configured to retrieve the layered video stream in temporal segments separately containing the first and second layers.

27. Device according to claim 26, wherein the device is configured to refrain from retrieving temporal segments containing the second layer outside the transition phase.

28. Device according to any of claims 23 to 27, wherein the second video stream is a further layered video stream comprising a further second layer having encoded thereinto the video at the second spatial resolution without inter-layer prediction,

wherein the device is configured to, in the transition phase, submit the first and second layers to decoding by a scalable video decoder and, immediately after the transition phase, submit the further layered video stream to decoding by the scalable video decoder so that the scalable video decoder obtains from the second layer of the layered video stream, for random access dependent pictures of the second spatial resolution of the further layered video stream which respectively directly or indirectly reference, by temporal prediction, a, in terms of presentation time order, succeeding random access point picture of the further layered video stream and a reference picture preceding, in terms of presentation time order and decoding order, the random access point picture, a substitute for the reference picture.

29. Device according to claim 28, wherein the first layer is encoded using a closed GOP structure,

wherein the device is configured to, when switching from retrieving the video in form of the second video stream to retrieving the video in form of the first video stream,

immediately consecutively submit to the scalable video decoder a portion of the further layered video stream pertaining to a first portion of the video so as to retrieve the first portion of the video at the second spatial resolution, followed by the first layer of a second portion of the layered video stream pertaining to a second portion of the video, immediately following the first portion, so as to retrieve the second portion of the video at the first spatial resolution.

30. Device according to any of claims 21 to 29, configured to obtain, from a server from which the video is retrieved, information on a length of the transition phase.

31. Device according to any of claims 21 to 29, configured to obtain, from a server from which the video is retrieved, a signalization and, depending on the signalization, deactivate the scheduling or activate the scheduling.

32. Device according to claim 30 or 31, configured to request from the server a manifest describing an availability of the video for the client at the server in form of the first video stream and in form of the second video stream and obtain the information on the length of the transition phase or the signalization from the manifest.

33. Data structure representing a video, the data structure being conceptualized for a retrieval of the video, using an adaptive streaming protocol, by a client switching between, at least, retrieval at a first spatial resolution and retrieval at a second spatial resolution greater than the first spatial resolution, the data structure comprising

an increased spatial resolution video stream having encoded therein the video using an open GOP structure at the second spatial resolution and at a second quality, and

a layered video stream comprising

a first layer having encoded thereinto the video at the first spatial resolution and

a second layer having encoded thereinto the video at the second spatial resolution and a first quality reduced compared to the second quality and using inter-layer prediction from the first to the second layer by way of inter-layer upsampling, but without residual coding.

34. Data structure according to claim 33, wherein the increased spatial resolution video stream is a further layered video stream comprising

a further first layer having encoded thereinto the video at the first spatial resolution and

a further second layer having encoded thereinto the video at the second spatial resolution using inter-layer prediction from the further first to the further second layer by way of inter-layer upsampling and using residual coding.

35. Data structure according to claim 34, wherein the first layer and the further first layer have the video encoded thereinto equally coded at the first spatial resolution so that a reconstruction of the video on the basis of the layered video stream and the further layered data stream, spliced together at a splicing point, at the first spatial resolution is equal to a reconstruction of the video at the first spatial resolution on the basis of any of the first stream and the second stream, respectively.

36. Data structure according to claim 34 or 35, wherein the first layer and the further first layer and the further second layer are encoded using an open GOP structure.

37. Data structure according to claim 33, wherein the increased spatial resolution video stream is a further layered video stream comprising a further second layer having encoded thereinto the video at the second spatial resolution without inter-layer prediction.

38. Data structure according to any of claims 34 to 37, wherein the first layer is encoded using a closed GOP structure.

39. Data structure according to any of claims 35 to 38, wherein the second layer and the further second layer are labeled using a common layer ID so that splicing the layered video stream and further layered video stream results in a spliced layered video stream comprising a layer with the common layer ID having encoded thereinto the video at the second spatial resolution.

40. Data structure according to claim 33, wherein the increased spatial resolution video stream is a further layered video stream comprising a further second layer having encoded thereinto the video at the second spatial resolution and the second layer and the further second layer are labeled using a common layer ID so that splicing the layered video stream and the further layered data stream results in a spliced layered video stream comprising a layer with the common layer ID having encoded thereinto the video at the second spatial resolution.

41. Device for outputting, using an adaptive streaming protocol, a video to a client, the device being configured to offer the video to the client for retrieval in form of, at least,

a first video stream (36); and

a second video stream (38),

wherein the device is configured to inform the client on the necessity to schedule a transition phase (64) before switching from retrieving the video in form of the first video stream (36) to retrieving the video in form of the second video stream (38).

42. Device according to claim 41, wherein

the device is configured to provide the client with a manifest

describing an availability of the video for the client in a first spatial resolution in form of the first video stream (36) and in a second spatial resolution, higher than the first spatial resolution, in form of the second video stream (38) and

indicating a presence of a second layer (L2) in temporal segments of the first video stream and a purpose of the second layer (L2) to be played-out when switching from the first spatial resolution to the second spatial resolution during the transition phase before switching to the second video stream (38);

indicating a computational rule to compute addresses for fetching temporal segments of the first video stream and the second video stream, respectively.

43. Device according to claim 41, wherein the device is configured to offer the video to the client for retrieval additionally in form of a third video stream, and

the device is configured to provide the client with a manifest

describing an availability of the video for the client in a first spatial resolution in form of the third video stream (L1), and in a second spatial resolution, higher than the first spatial resolution, in form of the second video stream (38) and

indicating that temporal segments of the first video stream are to be retrieved during the transition phase along with temporal segments of the third video stream when switching from the first spatial resolution to the second spatial resolution, i.e. when switching from the third video stream via the first video stream to the second video stream;

indicating a computational rule to compute addresses for fetching temporal segments of the first, second and third video streams.

44. Device according to any of claims 42 and 43, wherein the device is configured to offer the video to the client for retrieval additionally in form of a further video stream with the manifest describing the availability of the video for the client in a third spatial resolution, higher than the first and second spatial resolutions, in form of the further video stream, and to inform the client on

down-switching occasions for switching from the third to the second spatial resolution and

up-switching occasions for switching from the first or third video stream to the second video stream.

45. Device according to any of claims 42 to 44, wherein the device indicates in the manifest that the first video stream and the second video stream may be spliced together so as to be fed to one decoder.

46. Device according to any of claims 41 to 45, wherein the device informs the client on a length of the transition phase.

47. Video stream having encoded thereinto a sequence of pictures in such a manner that there is among the sequence of pictures at least one random access dependent picture which directly or indirectly references, by temporal prediction, a, in terms of presentation time order, succeeding random access point picture of the sequence of pictures and a reference picture preceding, in terms of presentation time order and decoding order, the random access point picture, wherein the video stream comprises

a syntax element indicating a maximum temporal distance between the at least one random access dependent picture and the reference picture directly or indirectly referenced by the at least one random access dependent picture.

48. Video encoder configured to

encode a sequence of pictures into a video stream in such a manner that there is among the sequence of pictures at least one random access dependent picture which directly or indirectly references, by temporal prediction, a, in terms of presentation time order, succeeding random access point picture of the sequence of pictures and a reference picture preceding, in terms of presentation time order and decoding order, the random access point picture, and

insert a syntax element into the data stream indicating a guaranteed maximum temporal distance between the at least one random access dependent picture and the reference picture directly or indirectly referenced by the at least one random access dependent picture.

49. Device for outputting, using an adaptive streaming protocol, a media content to a client, the device supporting switching in units of temporal segments between, at least,

a first representation,

a second representation having encoded thereinto the media content dependent on the first representation,

wherein the device provides the client with information discriminating between

a first set of temporal segments of the second representation which has encoded thereinto the media content dependent on first portions of the first representation temporally corresponding to the first set of temporal segments, and

a second set of temporal segments of the second representation which has encoded thereinto the media content independent from second portions of the first representation temporally corresponding to the second set of temporal segments so that a reconstruction of the media content from the second representation succeeds without the second portions of the first representation.

50. Device according to claim 49, configured to provide the client with a computational rule by means of which it is feasible to discriminate between addresses of temporal segments of the first representation lying within the first portions and second portions, respectively.

51. Device according to claim 50, configured to insert the computational rule into a manifest sent to the client.

52. Device according to any of claims 49 to 51, configured to, using hints in predetermined temporal segments of the first and/or second sets of temporal segments of the second representation, attribute one or more subsequent temporal segments of the second representation subsequent to the predetermined temporal segments to one of the first and second sets of temporal segments.

53. Device according to any of claims 49 to 52, configured to provide the client with a manifest comprising information on a

first transmission bitrate for the second representation corresponding to a transmission of the first and second portions of the first representation in addition to the first and second temporal segments of the second representation and

second transmission bitrate for the second representation corresponding to a transmission of the first portions of the first representation in addition to the first and second temporal segments of the second representation without the second portions of the first representation.

54. Device for retrieving, using an adaptive streaming protocol, a media content, the device supporting switching in units of temporal segments between, at least,

retrieving a first representation,

retrieving a second representation having encoded thereinto the media content dependent on the first representation,

wherein the device is configured to, when retrieving the second representation,

retrieve a first set of temporal segments of the second representation which has encoded thereinto the media content dependent on first portions of the first representation temporally corresponding to the first set of temporal segments, along with the first portions of the first representation, and

retrieve a second set of temporal segments of the second representation which has encoded thereinto the media content independent from second portions of the first representation temporally corresponding to the second set of temporal segments, without the second portions of the first representation.

55. Device according to claim 54, configured to, using a computational rule, discriminate between addresses of temporal segments of the first representation lying within the first portions and second portions, respectively.

56. Device according to claim 55, configured to derive the computational rule from a manifest sent from a server from which the media content is retrieved.

57. Device according to any of claims 54 to 56, configured to, using hints in predetermined temporal segments of the first and/or second sets of temporal segments of the second representation, attribute one or more subsequent temporal segments of the second representation subsequent to the predetermined temporal segments to one of the first and second sets of temporal segments.

58. Device according to claim 57, configured to use the hints in the predetermined temporal segments so as to attribute the one or more subsequent temporal segments of the second representation subsequent to the predetermined temporal segments to one of the first and second sets of temporal segments, responsive to a signalization in a manifest sent from a server from which the media content is retrieved.

59. Manifest for use in an adaptive streaming protocol, describing a media content, the manifest describing the media content as being available in form of

a first representation of the media content,

a second representation having encoded thereinto the media content dependent on the first representation,

wherein the manifest comprises information discriminating between

a first set of temporal segments of the second representation which has encoded thereinto the media content dependent on first portions of the first representation temporally corresponding to the first set of temporal segments, and

a second set of temporal segments of the second representation which has encoded thereinto the media content independent from second portions of the first representation temporally corresponding to the second set of temporal segments so that a reconstruction of the media content from the second representation succeeds without the second portions of the first representation.

60. Data structure representing a media content and conceptualized for streaming, using an adaptive streaming protocol, the media content to a client, the data structure comprising

a first representation having encoded thereinto the media content,

a second representation having encoded thereinto the media content dependent on the first representation,

wherein the data structure comprises information discriminating between

a first set of temporal segments of the second representation which has encoded thereinto the media content dependent on first portions of the first representation temporally corresponding to the first set of temporal segments, and

a second set of temporal segments of the second representation which has encoded thereinto the media content independent from second portions of the first representation temporally corresponding to the second set of temporal segments so that a reconstruction of the media content from the second representation succeeds without the second portions of the first representation.

61. Layered video stream having encoded thereinto a video in a first and a second layer (L1, L2) using inter-layer prediction from the first to the second layer,

wherein the layered video stream comprises information which indicates a temporal subdivision of a sequence of pictures of the second layer, in an alternating manner, into subsequences of pictures coded independent from the first layer, and subsequences of pictures coded dependent on the first layer.

62. Video encoder configured to

encode into a layered video stream a video so that the layered video stream has a first and a second layer (L1, L2), using inter-layer prediction from the first to the second layer, so that a sequence of pictures of the second layer comprises first subsequences of pictures coded independent from the first layer, between which second subsequences of the sequence of pictures of the second layer are located, and

provide the layered video stream with information which indicates a temporal subdivision of the sequence of pictures of the second layer into the first subsequences of pictures coded independent from the first layer and the second subsequences of pictures.

63. Network device configured to

receive a layered video stream having encoded thereinto a video in a first and a second layer (L1, L2) using inter-layer prediction from the first to the second layer, and

read from the layered video stream information which indicates a temporal subdivision of a sequence of pictures of the second layer, in an alternating manner, into subsequences of pictures coded independent from the first layer, and subsequences of pictures coded dependent on the first layer, and

use the information to stream the video using an adaptive streaming protocol.

64. Device for outputting, using an adaptive streaming protocol, channels of a media scene to a client, the device supporting switching, for each channel, between a set of representations of the respective channel which differ in a temporal distribution of random access points.

65. Device according to claim 64, wherein at intermittently occurring time instances, random access points of at least one of the sets of representations of the channels are temporally aligned.

66. Device according to claim 64 or 65, configured to provide the client with information revealing the temporal distribution of random access points in the sets of representations of the channels.

67. Device according to claim 66, configured to provide the information within a manifest.

68. Device according to claim 67, configured to provide the information using hints in predetermined temporal segments of the sets of representations of the channels.

69. Device according to any of claims 64 to 68, configured to inform the client on an achievable bitrate peak reduction by retrieving the media scene by selecting, for each channel, a representation currently to be retrieved for the respective channel among the set of representations for the respective channel, depending on the temporal distribution of random access points in the sets of representations of the channels, so that the number of temporal segments among the selected temporal segments which comprise a random access point varies over time as little as possible.

70. Device for retrieving, using an adaptively streaming protocol, channels of a media scene, the device being configured to switch, for each channel, between a set of representations of the respective channel which differ in a temporal distribution of random access points.

71. Device according to claim 70, configured to select, for each channel, a representation currently to be retrieved for the respective channel among the set of representations for the respective channel, depending on the temporal distribution of random access points in the set of representations of the channels.

72. Device according to claim 70 or 71, configured to retrieve an information revealing the temporal distribution of random access points in the set of representations of the channels from a server from which the media scene is retrieved.

73. Device according to claim 72, configured to retrieve the information from a manifest sent from the server.

74. Device according to claim 72, configured to retrieve the information using hints in predetermined temporal segments of the sets of representations of the channels.

75. Data structure representing a media scene and conceptualized for streaming channels of the media scene, using an adaptively streaming protocol, to a client, wherein the data structure comprises, for each channel, a set of representations of the respective channel which differ in a temporal distribution of random access points.

76. Device for outputting, using an adaptively streaming protocol, a media content to a client, the device offering the media content to the client for retrieval in form of, at least,

a first media stream having encoded thereinto the media content at a first quality,

a second media stream having encoded thereinto the media content at a second quality, and

an auxiliary media stream having encoded thereinto the media content dependent on the first and second media streams.

77. Device according to claim 76, wherein the device is configured to inform the client on a possibility to schedule a fading phase at switching from retrieving the first media stream to retrieving the second media stream within which the auxiliary media stream is to be played-out instead of the second media stream.

78. Device according to claim 76 or 77, wherein the device is configured to inform the client on a length of a fading phase which the client should schedule at switching from retrieving the first media stream to retrieving the second media stream, and within which the auxiliary media stream is to be played-out instead of the second media stream.

79. Device according to any of claims 76 to 78, wherein the first media stream, the second media stream and the auxiliary media stream represent separate layers of a layered media stream, with the layer of the auxiliary media stream being coded by a linear combination of predictors separately derived, by inter-layer prediction, from the layers of the first and second media streams.

80. Device according to any of claims 76 to 79, wherein the second media stream has the media content encoded thereinto dependent on the first media stream.

81. Device according to any of claims 76 to 79, wherein the second media stream has the media content encoded thereinto independent from the first media stream.

82. Device according to any of claims 76 to 81, wherein the auxiliary media stream is retrievable by the client from the device in temporal segments separate from temporal segments of the first and second media streams.

83. Device according to any of claims 76 to 82, wherein the device is configured to additionally offer the media content to the client for retrieval in form of

a switching media stream having encoded thereinto the media content dependent on the first media stream.

84. Device according to claim 83, wherein the device is configured to inform the client on the necessity to schedule a transition phase (64) before switching from retrieving the media content in form of the first media stream to retrieving the media content in form of the second media stream, the transition phase preceding the fading phase.

85. Device for retrieving, using an adaptively streaming protocol, a media content from a server, the device supporting switching between, at least,

retrieving a first media stream having encoded thereinto the media content at a first quality, and

retrieving a second media stream having encoded thereinto the media content at a second quality,

wherein the device is configured to schedule a fading phase at switching from retrieving the first media stream to retrieving the second media stream, within which the device retrieves an auxiliary media stream, having encoded thereinto the media content dependent on the first and second media streams, along with the first and second media streams, and plays-out the auxiliary media stream instead of the second media stream.
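The fading-phase behaviour of such a retrieving device can be illustrated with a small Python sketch; the stream names, the integer segment clock, and the tuple return format are hypothetical conveniences, not taken from the claims:

```python
# Hypothetical client sketch: during the fading phase, the first, second and
# auxiliary streams are fetched together and the auxiliary (blended) stream is
# played out; outside it, only the currently selected stream is fetched.

def schedule(t, switch_t, fade_len):
    """Return (streams to retrieve, stream to play out) for segment time t."""
    if switch_t <= t < switch_t + fade_len:   # fading phase
        return (["first", "second", "aux"], "aux")
    if t < switch_t:                          # before the switch
        return (["first"], "first")
    return (["second"], "second")             # after the fading phase

print(schedule(2, 3, 2))  # -> (['first'], 'first')
print(schedule(3, 3, 2))  # -> (['first', 'second', 'aux'], 'aux')
print(schedule(5, 3, 2))  # -> (['second'], 'second')
```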

86. Device according to claim 85, wherein the device is configured to activate or deactivate the scheduling of the fading phase at switching from retrieving the first media stream to retrieving the second media stream depending on a signalization from the server.

87. Device according to claim 85 or 86, wherein the device is configured to receive from the server an information on a length of the fading phase and set the length of the fading phase accordingly.

88. Device according to any of claims 85 to 87, wherein the first media stream, the second media stream and the auxiliary media stream represent separate layers of a layered media stream, with the layer of the auxiliary media stream being coded by a linear combination of predictors separately derived, by inter-layer prediction, from the layers of the first and second media streams, wherein the device is configured to input the layers of the first media stream, the second media stream and the auxiliary media stream together to a media decoder during the fading phase, while refraining from inputting the auxiliary media stream to the media decoder outside the fading phase.

89. Device according to any of claims 85 to 88, wherein the second media stream has the media content encoded thereinto dependent on the first media stream, wherein the device is configured to, outside the fading phase,

accompany the retrieving of the second media stream with retrieving the first media stream, and

refrain from retrieving the second media stream during the retrieving of the first media stream.

90. Device according to any of claims 85 to 89, wherein the second media stream has the media content encoded thereinto independent from the first media stream, wherein the device is configured to, outside the fading phase,

refrain from retrieving the first media stream during the retrieving of the second media stream, and

refrain from retrieving the second media stream during the retrieving of the first media stream.

91. Device according to any of claims 85 to 90, configured to retrieve the auxiliary media stream during the fading phase in temporal segments separate from, and in addition to, temporal segments of the first and second media streams.

92. Device according to any of claims 85 to 91, wherein the device is configured to, before switching from retrieving the first media stream to retrieving the second media stream, retrieve, in a transition phase, a switching media stream in addition to the first media stream from the server, the switching media stream having encoded thereinto the media content dependent on the first media stream, and use the switching media stream to obtain, for random access dependent pictures of the second media stream which respectively directly or indirectly reference, by temporal prediction, a, in terms of presentation time order, succeeding random access point picture of the second media stream and a reference picture of the second media stream preceding, in terms of presentation time order and decoding order, the random access point picture, a substitute of the reference picture.
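The role of the switching media stream can be illustrated with a Python sketch; the picture labels, the time-keyed dictionaries, and the function name are invented purely for illustration. Each random access dependent picture of the second stream is mapped to a transition-phase picture that substitutes its otherwise missing reference:

```python
# Hypothetical sketch: during the transition phase, pictures of the switching
# stream (coded relative to the first stream) stand in for reference pictures
# that random-access-dependent pictures of the second stream would otherwise
# lack after an open-GOP switch.

def substitutes_for(rad_pictures, switching_pictures):
    """Map each random-access-dependent picture to a substitute reference,
    looked up by the presentation time of its missing reference."""
    return {p: switching_pictures.get(ref_t) for p, ref_t in rad_pictures.items()}

rad = {"RASL_0": 7, "RASL_1": 6}        # picture -> missing reference time
switching = {6: "SUB_6", 7: "SUB_7"}    # transition-phase pictures by time
print(substitutes_for(rad, switching))  # -> {'RASL_0': 'SUB_7', 'RASL_1': 'SUB_6'}
```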

93. Device according to claim 92, wherein the device is configured to set a length of the transition phase (64) depending on information sent from the server.

94. Data structure representing a media content and conceptualized for streaming the media content, using an adaptively streaming protocol, to a client, the data structure comprising

a first media stream having encoded thereinto the media content at a first quality,

a second media stream having encoded thereinto the media content at a second quality, and

an auxiliary media stream having encoded thereinto the media content dependent on the first and second media streams.

95. Device for outputting, using an adaptively streaming protocol, a media content to a client, the device offering the media content to the client for retrieval in form of, at least,

a first media stream having encoded thereinto the media content at a first quality,

a second media stream having encoded thereinto the media content at a second quality,

wherein the device is configured to provide the client with meta data controlling a fading at the client when switching between the first and second media streams.

96. Device for retrieving, using an adaptively streaming protocol, a media content from a server, the device supporting switching between, at least,

retrieving a first media stream having encoded thereinto the media content at a first quality, and

retrieving a second media stream having encoded thereinto the media content at a second quality,

wherein the device is configured to receive meta data from the server and to control, using the meta data, a fading when switching between the first and second media streams.

97. Method for outputting, using an adaptively streaming protocol, a video (24) to a client, the method comprising switching between, at least,

outputting the video (24) in form of a layered video stream (36); and

outputting the video (24) in form of an increased spatial resolution video stream (38) encoded using an open GOP structure and having encoded thereinto the video at a second spatial resolution and at a second quality,

the layered video stream (36) comprising

a first layer (L1 ) having encoded thereinto the video at a first spatial resolution and

a second layer (L2) having encoded thereinto the video at the second spatial resolution and a first quality lower than the second quality and using inter-layer prediction (44) from the first to the second layer by way of inter-layer upsampling, but without prediction residual coding.
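Inter-layer prediction without residual coding means the second-layer picture is simply the upsampled first-layer picture, so the second layer costs almost no bits while still providing a decodable stream at the increased resolution. A Python sketch follows; nearest-neighbour upsampling is chosen here only as one conceivable upsampling filter, not as the filter the claims require:

```python
# Hypothetical sketch of "inter-layer upsampling without residual coding":
# the L2 prediction IS the output picture, since no residual is added.

def upsample_nearest(pic, factor=2):
    """Nearest-neighbour upsampling of a 2D list of samples."""
    return [[v for v in row for _ in range(factor)]
            for row in pic for _ in range(factor)]

l1 = [[1, 2], [3, 4]]          # low-resolution first-layer picture
l2 = upsample_nearest(l1)      # second-layer picture: prediction only
print(l2)  # -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```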

98. Method for retrieving, using an adaptively streaming protocol, a video, the method comprising switching between, at least,

retrieving the video in form of a first video stream (36); and

retrieving the video in form of a second video stream (38),

wherein a transition phase (64) is scheduled before switching from retrieving the video in form of the first video stream (36) to retrieving the video in form of the second video stream (38).

99. Digital storage medium storing a data structure according to any of claims 33 to 40 and 60 and 94.

100. Method for outputting, using an adaptively streaming protocol, a video to a client, the method comprising

offering the video to the client for retrieval in form of, at least,

a first video stream (36); and

a second video stream (38), and

informing the client on the necessity to schedule a transition phase (64) before switching from retrieving the video in form of the first video stream (36) to retrieving the video in form of the second video stream (38).

101. Digital storage medium storing a video stream according to claim 47.

102. Video encoding method comprising

encoding a sequence of pictures into a video stream in such a manner that there is among the sequence of pictures at least one random access dependent picture which directly or indirectly references, by temporal prediction, a, in terms of presentation time order, succeeding random access point picture of the sequence of pictures and a reference picture preceding, in terms of presentation time order and decoding order, the random access point picture, and

inserting a syntax element into the data stream indicating a guaranteed maximum temporal distance between the at least one random access dependent picture and the reference picture directly or indirectly referenced by the at least one random access dependent picture.
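Encoder-side, the signalled guarantee is simply the worst-case distance over all such picture pairs, which in turn tells a client how long a transition phase must be. A hedged Python sketch (the list-of-pairs input format and the function name are assumptions for illustration):

```python
# Hypothetical sketch: compute the guaranteed maximum temporal distance between
# any random-access-dependent (RAD) picture and the reference picture it
# directly or indirectly references, to be written as a syntax element.

def max_rad_reference_distance(rad_refs):
    """rad_refs: (rad_picture_time, referenced_picture_time) pairs."""
    return max(abs(p - r) for p, r in rad_refs)

pairs = [(9, 7), (10, 7), (11, 8)]         # example picture/reference times
print(max_rad_reference_distance(pairs))   # -> 3
```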

103. Method for outputting, using an adaptively streaming protocol, a media content to a client, the method comprising

switching in units of temporal segments between, at least,

a first representation,

a second representation having encoded thereinto the media content dependent on the first representation,

providing the client with an information discriminating between

a first set of temporal segments of the second representation which has encoded thereinto the media content dependent on first portions of the first representation temporally corresponding to the first set of temporal segments, and

a second set of temporal segments of the second representation which has encoded thereinto the media content independent from second portions of the first representation temporally corresponding to the second set of temporal segments, so that a reconstruction of the media content from the second representation succeeds without the second portions of the first representation.

104. Method for retrieving, using an adaptively streaming protocol, a media content, the method comprising

supporting switching in units of temporal segments between, at least,

retrieving a first representation,

retrieving a second representation having encoded thereinto the media content dependent on the first representation,

when retrieving the second representation,

retrieving a first set of temporal segments of the second representation which has encoded thereinto the media content dependent on first portions of the first representation temporally corresponding to a first set of temporal segments of the second representation, along with the first portions of the first representation, and

retrieving a second set of temporal segments of the second representation which has encoded thereinto the media content independent from second portions of the first representation temporally corresponding to the second set of temporal segments, without the second portions of the first representation.
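Such a retrieving method reduces to a per-segment download decision. An illustrative Python sketch, in which the boolean dependency flags stand in for the signalled information discriminating the two segment sets, and the representation names are invented:

```python
# Hypothetical client sketch: fetch first-representation segments only for the
# time ranges whose second-representation segments are inter-representation
# dependent; independent segments of the second representation suffice alone.

def downloads(dependency_flags):
    """Per temporal segment, which representations to fetch (True = dependent)."""
    return [("rep2", "rep1") if dep else ("rep2",) for dep in dependency_flags]

print(downloads([True, False, True]))
# -> [('rep2', 'rep1'), ('rep2',), ('rep2', 'rep1')]
```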

105. Digital storage medium storing a manifest according to claim 59.

106. Digital storage medium storing a layered video stream according to claim 61 .

107. Video encoding method comprising

encoding a video into a layered video stream so that the layered video stream has a first and a second layer (L1, L2), using inter-layer prediction from the first to the second layer, so that a sequence of pictures of the second layer comprises first subsequences of pictures coded independent from the first layer, between which second subsequences of the sequence of pictures of the second layer are interspersed, and

providing the layered video stream with information which indicates a temporal subdivision of the sequence of pictures of the second layer into the first subsequences of pictures coded independent from the first layer and the second subsequences of pictures.

108. Network device configured to

receive a layered video stream having encoded thereinto a video in a first and a second layer (L1, L2) using inter-layer prediction from the first to the second layer, and

read from the layered video stream information which indicates a temporal subdivision of a sequence of pictures of the second layer, in an alternating manner, into subsequences of pictures coded independent from the first layer, and subsequences of pictures coded dependent on the first layer, and

use the information to stream the video using an adaptive streaming protocol.

109. Method for outputting, using an adaptively streaming protocol, channels of a media scene to a client, the method comprising switching, for each channel, between a set of representations of the respective channel which differ in a temporal distribution of random access points.

110. Method for retrieving, using an adaptively streaming protocol, channels of a media scene, the method comprising switching, for each channel, between a set of representations of the respective channel which differ in a temporal distribution of random access points.

111. Method for outputting, using an adaptively streaming protocol, a media content to a client, the method comprising offering the media content to the client for retrieval in form of, at least,

a first media stream having encoded thereinto the media content at a first quality,

a second media stream having encoded thereinto the media content at a second quality, and

an auxiliary media stream having encoded thereinto the media content dependent on the first and second media streams.

112. Method for retrieving, using an adaptively streaming protocol, a media content from a server, the method comprising

switching between, at least,

retrieving a first media stream having encoded thereinto the media content at a first quality, and

retrieving a second media stream having encoded thereinto the media content at a second quality,

scheduling a fading phase at switching from retrieving the first media stream to retrieving the second media stream, within which an auxiliary media stream, having encoded thereinto the media content dependent on the first and second media streams, is retrieved along with the first and second media streams and played-out instead of the second media stream.

113. Method for outputting, using an adaptively streaming protocol, a media content to a client, the method comprising

offering the media content to the client for retrieval in form of, at least,

a first media stream having encoded thereinto the media content at a first quality,

a second media stream having encoded thereinto the media content at a second quality,

providing the client with meta data controlling a fading at the client when switching between the first and second media streams.

114. Method for retrieving, using an adaptively streaming protocol, a media content from a server, the method comprising

switching between, at least,

retrieving a first media stream having encoded thereinto the media content at a first quality, and

retrieving a second media stream having encoded thereinto the media content at a second quality,

receiving meta data from the server and controlling, using the meta data, a fading when switching between the first and second media streams.

115. Computer program having a program code for performing, when running on a computer, a method according to any of claims 97 to 114.

Documents

Application Documents

# Name Date
1 201837030279-STATEMENT OF UNDERTAKING (FORM 3) [13-08-2018(online)].pdf 2018-08-13
2 201837030279-FORM 1 [13-08-2018(online)].pdf 2018-08-13
3 201837030279-FIGURE OF ABSTRACT [13-08-2018(online)].pdf 2018-08-13
4 201837030279-DRAWINGS [13-08-2018(online)].pdf 2018-08-13
5 201837030279-DECLARATION OF INVENTORSHIP (FORM 5) [13-08-2018(online)].pdf 2018-08-13
6 201837030279-COMPLETE SPECIFICATION [13-08-2018(online)].pdf 2018-08-13
7 201837030279-FORM 18 [30-08-2018(online)].pdf 2018-08-30
8 201837030279-Proof of Right (MANDATORY) [20-09-2018(online)].pdf 2018-09-20
9 201837030279-FORM-26 [10-11-2018(online)].pdf 2018-11-10
10 201837030279-Information under section 8(2) (MANDATORY) [17-01-2019(online)].pdf 2019-01-17
11 201837030279-Information under section 8(2) (MANDATORY) [25-07-2019(online)].pdf 2019-07-25
12 201837030279-Information under section 8(2) (MANDATORY) [16-12-2019(online)].pdf 2019-12-16
13 201837030279-Information under section 8(2) (MANDATORY) [16-01-2020(online)].pdf 2020-01-16
14 201837030279-Information under section 8(2) [02-03-2020(online)].pdf 2020-03-02
15 201837030279-Information under section 8(2) [17-07-2020(online)].pdf 2020-07-17
16 201837030279-Information under section 8(2) [17-07-2020(online)]-1.pdf 2020-07-17
17 201837030279-Information under section 8(2) [21-09-2020(online)].pdf 2020-09-21
18 201837030279-Information under section 8(2) [23-01-2021(online)].pdf 2021-01-23
19 201837030279-Information under section 8(2) [23-01-2021(online)]-1.pdf 2021-01-23
20 201837030279-FORM 4(ii) [15-02-2021(online)].pdf 2021-02-15
21 201837030279-Information under section 8(2) [25-02-2021(online)].pdf 2021-02-25
22 201837030279-Information under section 8(2) [20-05-2021(online)].pdf 2021-05-20
23 201837030279-PETITION UNDER RULE 137 [24-05-2021(online)].pdf 2021-05-24
24 201837030279-OTHERS [24-05-2021(online)].pdf 2021-05-24
25 201837030279-FER_SER_REPLY [24-05-2021(online)].pdf 2021-05-24
26 201837030279-CLAIMS [24-05-2021(online)].pdf 2021-05-24
27 201837030279-Information under section 8(2) [08-07-2021(online)].pdf 2021-07-08
28 201837030279-Information under section 8(2) [03-09-2021(online)].pdf 2021-09-03
29 201837030279-FER.pdf 2021-10-18
30 201837030279-Information under section 8(2) [22-10-2021(online)].pdf 2021-10-22
31 201837030279-Information under section 8(2) [24-12-2021(online)].pdf 2021-12-24
32 201837030279-Information under section 8(2) [18-01-2022(online)].pdf 2022-01-18
33 201837030279-FORM 3 [14-07-2022(online)].pdf 2022-07-14
34 201837030279-Information under section 8(2) [14-10-2022(online)].pdf 2022-10-14
35 201837030279-FORM 3 [05-01-2023(online)].pdf 2023-01-05
36 201837030279-Information under section 8(2) [13-01-2023(online)].pdf 2023-01-13
37 201837030279-FORM 3 [06-07-2023(online)].pdf 2023-07-06
38 201837030279-Information under section 8(2) [28-09-2023(online)].pdf 2023-09-28
39 201837030279-US(14)-HearingNotice-(HearingDate-01-02-2024).pdf 2023-12-22
40 201837030279-REQUEST FOR ADJOURNMENT OF HEARING UNDER RULE 129A [27-01-2024(online)].pdf 2024-01-27
41 201837030279-US(14)-ExtendedHearingNotice-(HearingDate-29-02-2024).pdf 2024-02-01
42 201837030279-FORM-26 [28-02-2024(online)].pdf 2024-02-28
43 201837030279-Correspondence to notify the Controller [28-02-2024(online)].pdf 2024-02-28
44 201837030279-Correspondence to notify the Controller [28-02-2024(online)]-1.pdf 2024-02-28
45 201837030279-FORM-26 [29-02-2024(online)].pdf 2024-02-29
46 201837030279-Written submissions and relevant documents [14-03-2024(online)].pdf 2024-03-14
47 201837030279-PatentCertificate21-03-2024.pdf 2024-03-21
48 201837030279-IntimationOfGrant21-03-2024.pdf 2024-03-21

Search Strategy

1 searchstrategyE_24-08-2020.pdf

ERegister / Renewals

3rd: 31 May 2024

From 14/02/2019 - To 14/02/2020

4th: 31 May 2024

From 14/02/2020 - To 14/02/2021

5th: 31 May 2024

From 14/02/2021 - To 14/02/2022

6th: 31 May 2024

From 14/02/2022 - To 14/02/2023

7th: 31 May 2024

From 14/02/2023 - To 14/02/2024

8th: 31 May 2024

From 14/02/2024 - To 14/02/2025

9th: 31 Jan 2025

From 14/02/2025 - To 14/02/2026