Spatially Unequal Streaming

Abstract: Various concepts for media content streaming are described. Some allow for streaming spatial scene content in a spatially unequal manner so that the visible quality for the user is increased or the processing complexity or necessary bandwidth at the streaming retrieval site is decreased. Others allow for streaming spatial scene content in a manner enlarging the applicability to further application scenarios.

Patent Information

Application #

Filing Date

12 April 2019

Publication Number

21/2019

Publication Type

INA

Invention Field

COMMUNICATION

Status

lsdavar@vsnl.com

Parent Application

Patent Number

Legal Status

Grant Date

2024-03-07

Renewal Date

Applicants

FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.

Hansastraße 27c 80686 München

Inventors

1. SKUPIN, Robert

Naugarder Straße 42 10409 Berlin

2. HELLGE, Cornelius

Erich-Weinert-Straße 5 10439 Berlin

3. SCHIERL, Thomas

Boris-Pasternak-Weg 7b 13156 Berlin

4. SÁNCHEZ DE LA FUENTE, Yago

Warschauer Strasse 67 10243 Berlin

5. PODBORSKI, Dimitri

Christstrasse 33 14059 Berlin

6. WIEGAND, Thomas

c/o Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut, HHI Einsteinufer 37 10587 Berlin

Specification

Spatially Unequal Streaming

Description

The present application is concerned with spatially unequal streaming such as occurring in virtual reality (VR) streaming.

VR streaming typically involves transmission of a very high-resolution video. The resolving capacity of the human fovea is around 60 pixels per degree. If transmission of the full sphere with 360° x 180° is considered, one would end up sending a resolution of around 22k x 11k pixels. Since sending such a high resolution would lead to tremendously high bandwidth requirements, another solution is to send only the viewport shown at the Head Mounted Displays (HMDs), which have a field of view (FoV) of 90° x 90°, leading to a video of around 6k x 6k pixels. A trade-off between sending the whole video at the highest resolution and sending only the viewport is to send the viewport at high resolution and some neighboring data (or the rest of the spherical video) at lower resolution or lower quality.
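The pixel counts quoted above follow from multiplying the angular extent by the resolving capacity. A minimal sketch of that arithmetic, assuming uniform sampling at 60 pixels per degree over the whole field (the function name is illustrative only):

```python
PIXELS_PER_DEGREE = 60  # approximate resolving capacity of the fovea, as stated above


def required_resolution(h_fov_deg, v_fov_deg, ppd=PIXELS_PER_DEGREE):
    """Return (width, height) in pixels for a field of view sampled uniformly at ppd."""
    return h_fov_deg * ppd, v_fov_deg * ppd


full_sphere = required_resolution(360, 180)  # (21600, 10800), roughly 22k x 11k
viewport = required_resolution(90, 90)       # (5400, 5400), roughly 6k x 6k
print(full_sphere, viewport)
```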

In a DASH scenario, an omni-directional video (aka spherical video) can be offered in such a way that the mixed resolution or mixed quality video described before is controlled by the DASH client. The DASH client only needs to know information that describes how the content is offered.

One example could be to offer different representations with different projections that have asymmetric characteristics, such as different quality and distortion for different parts of the video. Each representation would correspond to a given viewport and would have the viewport encoded with a higher quality/resolution than the rest of the content. Knowing the orientation information (direction of the viewport for which the content has been encoded with a higher quality/resolution), the DASH client can choose one or another representation dynamically to match the viewing direction of the user at any time.
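One way a client might perform this choice is sketched below: it picks the representation whose signaled orientation has the smallest great-circle angle to the current viewing direction. The representation list, its field names and the yaw/pitch convention are assumptions made for illustration and are not taken from the application.

```python
import math


def angular_distance(yaw1, pitch1, yaw2, pitch2):
    """Great-circle angle in degrees between two viewing directions given in degrees."""
    y1, p1, y2, p2 = map(math.radians, (yaw1, pitch1, yaw2, pitch2))
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(y1 - y2))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def choose_representation(representations, yaw, pitch):
    """Pick the representation whose high-quality orientation is closest to the view."""
    return min(representations,
               key=lambda r: angular_distance(r["yaw"], r["pitch"], yaw, pitch))


# Hypothetical set of four representations, each favoring one direction.
representations = [
    {"id": "front", "yaw": 0, "pitch": 0},
    {"id": "right", "yaw": 90, "pitch": 0},
    {"id": "back", "yaw": 180, "pitch": 0},
    {"id": "left", "yaw": -90, "pitch": 0},
]
print(choose_representation(representations, yaw=100, pitch=10)["id"])
```

Using the angle on the sphere rather than a plain yaw difference keeps the choice correct across the ±180° wraparound.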

A more flexible option for a DASH client to select such asymmetric characteristic for the omni-directional video would be when the video is split into several spatial regions, with each region being available at different resolution or quality. One option could be to split it into rectangular regions (aka tiles) based on a grid, but other options could be foreseen. In such a case, the DASH client would need some signaling about the different qualities into which the different regions are offered and it could download the different regions at different qualities so that the viewport shown to the user is at a better quality than the other non-shown content.

In any of the previous cases, when user interaction happens and the viewport is changed, the DASH client will need some time to react to the user movement and download content that matches the new viewport. Between the moment the user moves and the moment the DASH client has adapted its requests to match the new viewport, the user will see some regions of the viewport in high quality and others in low quality simultaneously. Though the acceptable quality/resolution difference is content dependent, the quality the user sees is in any case degraded.

Thus, it would be favorable to have a concept at hand which alleviates this degradation, renders the streaming more efficient, or even increases the visible quality for the user with respect to partial presentation of spatial scene content streamed by adaptive streaming.

Thus, the object of the present invention is to provide concepts for streaming spatial scene content in a spatially unequal manner so that the visible quality for the user is increased, or the processing complexity or necessary bandwidth at the streaming retrieval site is decreased, or to provide concepts for streaming spatial scene content in a manner enlarging the applicability to further application scenarios.

This object is achieved by the subject matter of the pending independent claims.

A first aspect of the present application is based on the finding that streaming media content pertaining to a temporally-varying spatial scene such as a video in a spatially unequal manner may be improved in terms of visible quality at comparable bandwidth consumption and/or computational complexity at a streaming reception site if the media segments selected and retrieved, and/or a signalization obtained from the server, provide the retrieving device with hints on a predetermined relationship to be complied with by the qualities at which different portions of the temporally-varying spatial scene are encoded into the selected and retrieved media segments. Otherwise, the retrieving device may not know beforehand which negative impact the juxtaposition of portions encoded at different qualities into the selected and retrieved media segments may have on the overall visible quality experienced by the user. Information contained in the media segments and/or a signalization obtained from the server, such as, for instance, within a manifest file (media presentation description) or in additional streaming-related control messages from server to client such as SAND messages, enables the retrieving device to appropriately select among the media segments offered at the server. In this manner, virtual reality streaming or partial streaming of video content may be made more robust against quality degradation as it could otherwise occur owing to an inadequate distribution of the available bandwidth onto the spatial section of the temporally-varying spatial scene presented to the user.
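To illustrate how such a hint could steer the selection, the following sketch assumes that the predetermined relationship takes the form of a maximum permitted gap between the quality level of a first portion and that of a second portion. The form of the relationship, the quality levels and the bitrates are all hypothetical and serve only as an example.

```python
from itertools import product


def pick_qualities(bitrates_first, bitrates_second, max_gap, budget):
    """Return the best (q_first, q_second) pair that respects the signaled gap.

    A pair qualifies if 0 <= q_first - q_second <= max_gap and the summed
    bitrate fits the budget. The highest q_first wins, then the highest
    q_second. Returns None if no pair qualifies.
    """
    candidates = [
        (q1, q2)
        for q1, q2 in product(bitrates_first, bitrates_second)
        if 0 <= q1 - q2 <= max_gap
        and bitrates_first[q1] + bitrates_second[q2] <= budget
    ]
    return max(candidates, default=None)


# Hypothetical bitrates in kbit/s per quality level (higher level = better quality).
first = {1: 1000, 2: 2000, 3: 4000}
second = {1: 500, 2: 1000, 3: 2000}

print(pick_qualities(first, second, max_gap=1, budget=4600))  # gap limited to one level
print(pick_qualities(first, second, max_gap=2, budget=4600))  # larger gap tolerated
```

With the tighter gap the client settles for a uniform medium quality instead of pairing the highest quality with the lowest one, which is the kind of decision the signaled relationship is meant to enable.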

A further aspect of the present invention is based on the finding that streaming of media content pertaining to a temporally-varying spatial scene such as a video in a spatially unequal manner, such as using a first quality at a first portion and a second, lower quality at a second portion, or leaving a second portion non-streamed, may be improved in visible quality and/or may be made less complex in terms of bandwidth consumption and/or computational complexity at the streaming retrieval side, by determining a size and/or position of the first portion depending on information contained in the media segments and/or a signalization obtained from the server. Imagine, for instance, that the temporally-varying spatial scene is offered at the server in a tile-based manner for tile-based streaming, i.e. the media segments represent spatio-temporal portions of the temporally-varying spatial scene, each of which is a temporal segment of the spatial scene within a corresponding tile of a distribution of tiles into which the spatial scene is sub-divided. In such a case, it is up to the retrieving device (client) to decide how to distribute the available bandwidth and/or computational power over the spatial scene, namely at the granularity of tiles. The retrieving device would perform the selection of the media segments such that a first portion of the spatial scene, which follows, or tracks, a temporally-varying view section of the spatial scene, is encoded into the selected and retrieved media segments at a predetermined quality which may, for instance, be the highest quality feasible at the current bandwidth and/or computational power conditions. A spatially neighboring second portion of the spatial scene may, for instance, not be encoded into the selected and retrieved media segments, or may be encoded thereinto at a further quality, reduced relative to the predetermined quality.
In such a situation, it is a computationally complex matter, or even not feasible, to compute a number/count of neighboring tiles, the aggregation of which completely covers the temporally-varying view section irrespective of the view section's orientation. Depending on the projection chosen so as to map the spatial scene onto the individual tiles, the angular scene coverage per tile may vary over the scene, and the fact that the individual tiles may mutually overlap renders a computation of a count of neighboring tiles sufficient to cover the view section in spatial terms, irrespective of the view section's orientation, even more difficult. Accordingly, in such a situation, the aforementioned information could indicate the size of the first portion as a count N of tiles or a number of tiles, respectively. By this measure, the device would be able to track the temporally-varying view section by selecting those media segments having the co-located aggregation of N tiles encoded thereinto at the predetermined quality. The fact that the aggregation of these N tiles sufficiently covers the view section may be guaranteed by way of the information indicating N. Another example would be information contained in the media segments and/or a signalization obtained from the server, which is indicative of the size of the first portion relative to a size of the view section itself. For example, this information could set a "safety zone" or prefetch zone around the actual view section in order to account for movements of the temporally-varying view section. The larger the speed at which the temporally-varying view section moves across the spatial scene, the larger the safety zone should be. Accordingly, the aforementioned information could be indicative of the size of the first portion in a manner relative to a size of the temporally-varying view section, such as in an incremental or scaling manner. A retrieving device setting the size of the first portion according to such information would be able to avoid quality degradation which may otherwise occur owing to non-retrieved or low-quality portions of the spatial scene being visible in the view section. Here, it is irrelevant whether this scene is offered in a tile-based manner or in some other manner.
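The incremental variant of the safety zone can be sketched as follows, assuming the signaled information boils down to an expected viewport speed and a time horizon over which the selection cannot be corrected. Both parameters, and the rule of adding the margin on every side, are assumptions made for illustration.

```python
def first_portion_size(view_w_deg, view_h_deg, speed_deg_per_s, horizon_s):
    """Enlarge the view section by a safety margin on every side.

    The margin is the angle the view section may travel within horizon_s
    seconds at speed_deg_per_s. The result is capped at the full sphere.
    """
    margin = speed_deg_per_s * horizon_s
    width = min(360.0, view_w_deg + 2 * margin)
    height = min(180.0, view_h_deg + 2 * margin)
    return width, height


# A 90° x 90° viewport moving at up to 30°/s with a 0.5 s reaction horizon.
print(first_portion_size(90, 90, speed_deg_per_s=30, horizon_s=0.5))
```

A faster expected movement or a longer horizon yields a larger first portion, in line with the observation that the safety zone should grow with the speed of the view section.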

Related to the just-mentioned aspect of the present application, a video bit stream having a video encoded thereinto may be made decodable at an increased quality if the video bit stream is provided with a signalization of a size of a focus area within the video onto which a decoding power for decoding the video should be focused. By this measure, a decoder which decodes the video from the bit stream could focus, or even restrict, its decoding power onto the decoding of a portion of the video having the size of the focus area signalized in the video bit stream, thereby knowing, for instance, that the thus-decoded portion is decodable by the available decoding power and spatially covers a wanted section of the video. For instance, the size of the focus area thus signalized could be selected to be large enough to cover the size of the view section and a movement of this view section, taking the decoding latency in decoding the video into account. Or, put differently, a signalization of a recommended preferred view-section area of the video contained in the video bit stream could allow the decoder to treat this area in a preferred manner, thereby allowing the decoder to focus its decoding power accordingly. Irrespective of performing area-specific decoding power focusing, the area signalization may be forwarded to a stage selecting which media segments to download, i.e. where to place and how to dimension the portion of increased quality.

The first and second aspects of the present application are closely related to a third aspect of the present application according to which the fact that a vast number of retrieving devices stream media content from a server is exploited so as to gain information which may subsequently be used in order to appropriately set the aforementioned types of information allowing to set the size, or size and/or position, of the first portion and/or appropriately set the predetermined relationship between the first and second quality. Thus, in accordance with this aspect of the present application, the retrieving device (client) sends out log messages logging one of a momentaneous measurement or a statistical value measuring a spatial position and/or movement of the first portion, a momentaneous measurement or a statistical value measuring a quality of the temporally-varying spatial scene as far as it is encoded into the selected media segments and as far as it is visible in a view section, and a momentaneous measurement or statistical value measuring the quality of the first portion or a quality of the temporally-varying spatial scene as far as it is encoded into the selected media segments and as far as it is visible in a view section. Momentaneous measurements and/or statistical values may be provided with time information concerning the time the respective momentaneous measurement or statistical value has been obtained. The log messages may be sent to the server where the media segments are offered, or to some other device evaluating the inbound log messages so as to update, based thereon, current settings of the aforementioned information used to set the size, or size and/or position, of the first portion and/or derive the predetermined relationship based thereon.
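What such a log message might carry can be sketched as a small record serialized to JSON. The field names, the use of a mean as the statistical value and the JSON encoding are purely hypothetical; the application does not prescribe a message format here.

```python
import json
import time
from dataclasses import asdict, dataclass
from statistics import mean


@dataclass
class ViewportLog:
    timestamp: float             # time at which the values were obtained
    yaw_deg: float               # momentaneous position of the first portion
    pitch_deg: float
    mean_visible_quality: float  # statistical value over the visible tiles


def build_log(yaw, pitch, visible_qualities, now=None):
    """Assemble one log entry; 'now' can be injected to make the result reproducible."""
    return ViewportLog(
        timestamp=time.time() if now is None else now,
        yaw_deg=yaw,
        pitch_deg=pitch,
        mean_visible_quality=mean(visible_qualities),
    )


log = build_log(30.0, -5.0, visible_qualities=[3, 3, 2, 1], now=0.0)
message = json.dumps(asdict(log))
print(message)
```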

In accordance with a further aspect of the present application, streaming media content pertaining to a temporally-varying spatial scene such as a video, in particular in a tile-based manner, is made more effective in terms of avoidance of unavailing streaming trials by providing a media presentation description which comprises at least one version at which the temporally-varying spatial scene is offered for tile-based streaming, with an indication of benefitting requirements for benefitting from the tile-based streaming the respective version of the temporally-varying spatial scene for each of the at least one version. By this measure, the retrieving device is able to match the benefitting requirements of the at least one version with a device capability of the retrieving device itself or of another device interacting with the retrieving device with respect to tile-based streaming. For instance, the benefitting requirements could relate to decoding capability requirements. That is, if the decoding power for decoding the streamed/retrieved media content would not suffice to decode all media segments needed to cover a view section of the temporally-varying spatial scene, then trying to stream and present the media content would be a waste of time, bandwidth and computational power and accordingly, it would be more effective to not try it in any case. The decoding capability requirements could, for instance, indicate a number of decoder instantiations necessitated for a respective version if, for instance, the media segments relating to a certain tile form a media stream such as a video stream, separate from media segments pertaining to another tile. 
The decoding capability requirement could, for instance, also pertain to further information such as a certain fraction of decoder instantiations needed to fit to a predetermined decoding profile and/or level, or could indicate a certain minimum capability of a user input device to move in a sufficiently fast manner a viewport/section via which the user sees the scene. Depending on the scene content, a low movement capability may not suffice for the user to look onto the interesting portions of the scene.
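The matching of benefitting requirements against device capabilities can be sketched as a simple filter. Here the requirement is assumed to be the number of decoder instantiations needed per version; the version identifiers and numbers are invented for illustration.

```python
def usable_versions(versions, available_decoders):
    """Return the ids of versions whose signaled decoder requirement the device meets."""
    return [
        v["id"]
        for v in versions
        if v["required_decoders"] <= available_decoders
    ]


# Hypothetical versions as they might be described in a media presentation description.
versions = [
    {"id": "24-tiles", "required_decoders": 9},
    {"id": "6-tiles", "required_decoders": 3},
    {"id": "single-stream", "required_decoders": 1},
]
print(usable_versions(versions, available_decoders=4))
```

A device that finds no usable version can refrain from starting the tile-based session at all, which avoids the unavailing streaming attempts mentioned above.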

A further aspect of the present invention pertains to an extension of streaming of media content pertaining to temporally-varying spatial scenes. In particular, the idea in accordance with this aspect is that a spatial scene may in fact vary not only temporally but also in terms of at least one further parameter such as, for instance, view and position, view depth or some other physical parameter. The retrieving device may use adaptive streaming in this context by, depending on a viewport direction and the at least one further parameter, computing addresses of media segments, the media segments describing a spatial scene varying in time and the at least one parameter, and retrieving the media segments using the computed addresses from a server.

The above-outlined aspects of the present application and their advantageous implementations which are the subject of the dependent claims, may be combined individually or all together.

Preferred embodiments of the present application are set forth below with respect to the figures among which

Fig. 1 shows a schematic diagram illustrating a system of client and server for virtual reality applications as an example as to where the embodiments set forth in the following figures may advantageously be used;

Fig. 2 shows a block diagram of a client device along with a schematic illustration of the media segment selection process in order to describe a possible mode of operation of the client device in accordance with an embodiment of the present application where the server 20 provides the device with information on acceptable or endurable quality variations within the media content presented to the user;

Fig. 3 shows a modification of Fig. 2 in which the portion of increased quality does not concern the portion tracking the view section or viewport, but a region of interest of the media scene content as signaled from server to client;

Fig. 4 shows a block diagram of the client device along with a schematic illustration of the media segment selection process in accordance with an embodiment where the server provides information on how to set a size, or size and/or position, of the portion of increased quality or the size, or size and/or position, of the actually retrieved section of the media scene;

Fig. 5 shows a variant of Fig. 4 in that information sent by the server directly indicates the size of portion 64, rather than scaling it depending on expected movements of the viewport;

Fig. 6 shows a variant of Fig. 4 according to which the retrieved section has the predetermined quality and its size is determined by the information stemming from the server;

Fig. 7a to 7c show schematic diagrams illustrating the manner in which the information 74 according to Figs. 4 and 6 increases the size of the portion retrieved at the predetermined quality via a corresponding enlargement of the size of the viewport;

Fig. 8a shows a schematic diagram illustrating an embodiment where the client device sends log messages to the server or to a certain evaluator which evaluates these log messages so as to derive therefrom appropriate settings, for instance, for the types of information discussed with respect to Figs. 2 to 7c;

Fig. 8b shows a schematic diagram of a tile-based cubic projection of a 360° scene onto the tiles and an example of how some of the tiles are covered by an exemplary position of a viewport. The small circles indicate positions in the viewport equiangularly distributed, and hatched tiles are encoded at higher resolution in the downloaded segments than tiles without hatching;

Fig. 8c and 8d show schematic diagrams showing, along a temporal axis (horizontal), how a buffer fullness (vertical axis) of different buffers of the client might develop, wherein Fig. 8c assumes the buffers to be used to buffer representations coding specific tiles, while Fig. 8d assumes the buffers to be used to buffer omnidirectional representations having the scene encoded thereinto at uneven quality, namely increased toward some direction specific for the respective buffer;

Fig. 8e and 8f show three-dimensional diagrams of different pixel density measurements within the viewport 28, differing in terms of uniformity in a spherical or view-plane sense;

Fig. 9 shows a block diagram of client device and a schematic illustration of the media segment selection process when the device inspects information stemming from the server in order to assess whether a certain version at which a tile-based streaming is offered by the server, is acceptable for the client device or not;

Fig. 10 shows a schematic diagram illustrating the plurality of media segments offered by a server in accordance with an embodiment allowing for a dependency of the media scene not only in time, but also in another non-temporal parameter, namely here, exemplarily, scene center position;

Fig. 11 shows a schematic diagram illustrating a video bit stream comprising information steering or controlling a size of a focus area within the video encoded into the bit stream along with an example for a video decoder able to take advantage of this information.

In order to ease the understanding of the description of embodiments of the present application with respect to the various aspects of the present application, Fig. 1 shows an example for an environment where the subsequently described embodiments of the present application may be applied and advantageously used. In particular, Fig. 1 shows a system composed of client 10 and server 20 interacting via adaptive streaming. For instance, dynamic adaptive streaming over HTTP (DASH) may be used for the communication 22 between client 10 and server 20. However, the subsequently outlined embodiments should not be interpreted as being restricted to the usage of DASH and likewise, terms such as media presentation description (MPD) should be understood as being broad so as to also cover manifest files defined differently than in DASH.

Fig. 1 illustrates a system configured to implement a virtual reality application. That is, the system is configured to present to a user wearing a head up display 24, namely via an internal display 26 of head up display 24, a view section 28 out of a temporally-varying spatial scene 30, which section 28 corresponds to an orientation of the head up display 24 exemplarily measured by an internal orientation sensor 32 such as an inertial sensor of head up display 24. That is, the section 28 presented to the user forms a section of the spatial scene 30 the spatial position of which corresponds to the orientation of head up display 24. In the case of Fig. 1, the temporally-varying spatial scene 30 is depicted as an omni-directional video or spherical video, but the description of Fig. 1 and the subsequently explained embodiments are readily transferable to other examples as well, such as presenting a section out of a video with a spatial position of section 28 being determined by an intersection of a facial axis or eye axis with a virtual or real projector wall or the like. Further, sensor 32 and display 26 may, for instance, be comprised by different devices such as a remote control and a corresponding television, respectively, or they may be part of a hand-held device such as a mobile device such as a tablet or a mobile phone. Finally, it should be noted that some of the embodiments described later on may also be applied to scenarios where the area 28 presented to the user constantly covers the whole temporally-varying spatial scene 30, with the unevenness in presenting the temporally-varying spatial scene relating, for instance, to an unequal distribution of quality over the spatial scene.

Further details with respect to server 20, client 10 and the way the spatial content 30 is offered at server 20 are illustrated in Fig. 1 and described in the following. These details should, however, also not be treated as limiting the subsequently explained embodiments, but should rather serve as an example of how to implement any of the subsequently explained embodiments.

In particular, as shown in Fig. 1, server 20 may comprise a storage 34 and a controller 36 such as an appropriately programmed computer, an application-specific integrated circuit or the like. The storage 34 has media segments stored thereon which represent the temporally-varying spatial scene 30. A specific example will be outlined in more detail below with respect to the illustration of Fig. 1. Controller 36 answers requests sent by client 10 by sending to client 10 the requested media segments or a media presentation description, and may send to client 10 further information on its own. Details in this regard are also set out below. Controller 36 may fetch requested media segments from storage 34. Other information may also be stored within this storage, such as the media presentation description or parts thereof, or the other signals sent from server 20 to client 10.

As shown in Fig. 1, server 20 may optionally in addition comprise a stream modifier 38 modifying the media segments sent from server 20 to client 10 responsive to the requests from the latter, so as to result at client 10 in a media data stream forming one single media stream decodable by one associated decoder although, for instance, the media segments retrieved by client 10 in this manner are actually aggregated from several media streams. However, the existence of such a stream modifier 38 is optional.

Client 10 of Fig. 1 is exemplarily depicted as comprising a client device or controller 40, one or more decoders 42 and a reprojector 44. Client device 40 may be an appropriately programmed computer, a microprocessor, a programmed hardware device such as an FPGA or an application specific integrated circuit or the like. Client device 40 assumes responsibility for selecting segments to be retrieved from server 20 out of the plurality 46 of media segments offered at server 20. To this end, client device 40 retrieves a manifest or media presentation description from server 20 first. From the same, client device 40 obtains a computational rule for computing addresses of media segments out of plurality 46 which correspond to certain, needed spatial portions of the spatial scene 30. The media segments thus selected are retrieved by client device 40 from server 20 by sending respective requests to server 20. These requests contain the computed addresses.

The media segments thus retrieved by client device 40 are forwarded by the latter to the one or more decoders 42 for decoding. In the example of Fig. 1, the media segments thus retrieved and decoded represent, for each temporal time unit, merely a spatial section 48 out of the temporally-varying spatial scene 30, but as already indicated above, this may be different in accordance with other aspects, where, for instance, the view section 28 to be presented constantly covers the whole scene. Reprojector 44 may optionally re-project and cut out the view section 28 to be displayed to the user out of the retrieved and decoded scene content of the selected, retrieved and decoded media segments. To this end, as shown in Fig. 1, client device 40 may, for instance, continuously track and update a spatial position of view section 28 responsive to the user orientation data from sensor 32 and inform reprojector 44, for instance, on this current spatial position of scene section 28 as well as the reprojection mapping to be applied onto the retrieved and decoded media content so as to be mapped onto the area forming view section 28. Reprojector 44 may, accordingly, apply a mapping and an interpolation onto a regular grid of pixels, for instance, to be displayed on display 26.

Fig. 1 illustrates the case where a cubic mapping has been used to map the spatial scene 30 onto tiles 50. The tiles are, thus, depicted as rectangular sub-regions of a cube onto which scene 30 having the form of a sphere has been projected. Reprojector 44 reverses this projection. However, other examples may be applied as well. For instance, instead of a cubic projection, a projection onto a truncated pyramid or a pyramid without truncation may be used. Further, although the tiles of Fig. 1 are depicted as being non-overlapping in terms of coverage of the spatial scene 30, the subdivision into tiles may involve a mutual tile-overlapping. And as will be outlined in more detail below, the subdivision of scene 30 into tiles 50 spatially with each tile forming one representation as explained further below, is also not mandatory.

Thus, as depicted in Fig. 1, the whole spatial scene 30 is spatially subdivided into tiles 50. In the example of Fig. 1, each of the six faces of the cube is subdivided into 4 tiles. For illustration purposes, the tiles are enumerated. For each tile 50, server 20 offers a video 52 as depicted in Fig. 1. To be more precise, server 20 even offers more than one video 52 per tile 50, these videos differing in quality Q#. Even further, the videos 52 are temporally subdivided into temporal segments 54. The temporal segments 54 of all videos 52 of all tiles T# form, or are encoded into, respectively, one of the media segments of the plurality 46 of media segments stored in storage 34 of server 20.
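The size of the plurality 46 in this example can be made concrete by enumerating every combination of tile, quality and temporal segment. The sketch below uses the 6 x 4 tile layout stated above; the number of qualities and of temporal segments are arbitrary example values.

```python
from itertools import product

CUBE_FACES = 6
TILES_PER_FACE = 4  # as in the example of Fig. 1


def enumerate_segments(num_qualities, num_temporal_segments):
    """List every (tile, quality, temporal segment) triple offered by the server."""
    tiles = range(1, CUBE_FACES * TILES_PER_FACE + 1)
    qualities = range(1, num_qualities + 1)
    times = range(num_temporal_segments)
    return list(product(tiles, qualities, times))


segments = enumerate_segments(num_qualities=2, num_temporal_segments=10)
print(len(segments))  # 24 tiles x 2 qualities x 10 temporal segments = 480
```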

It is again emphasized that even the example of a tile-based streaming illustrated in Fig. 1 merely forms an example from which many deviations are possible. For instance, although Fig. 1 seems to suggest that the media segments pertaining to a representation of the scene 30 at a higher quality relate to tiles coinciding with the tiles to which media segments belong which have the scene 30 encoded thereinto at quality Q1, this coincidence is not necessary and the tiles of different qualities may even correspond to tiles of a different projection of scene 30. Moreover, although not discussed so far, it may be that the media segments corresponding to different quality levels depicted in Fig. 1 differ in spatial resolution and/or signal to noise ratio and/or temporal resolution or the like.

Finally, differing from a tile-based streaming concept, according to which the media segments which may be individually retrieved by device 40 from server 20 relate to tiles 50 into which scene 30 is spatially subdivided, the media segments offered at server 20 may alternatively, for instance, each have the scene 30 encoded thereinto in a spatially complete manner with a spatially varying sampling resolution, however with the sampling resolution being maximum at different spatial positions in scene 30. For instance, that could be achieved by offering at the server 20 sequences of segments 54 relating to a projection of the scene 30 onto truncated pyramids, the truncated tips of which would be oriented in mutually different directions, thereby leading to differently oriented resolution peaks.

Further, as to the optionally present stream modifier 38, it is noted that same may alternatively be part of the client 10, or same may even be positioned in between, within a network device via which client 10 and server 20 exchange the signals described herein.

After having explained rather generally the system of server 20 and client 10, the functionality of client device 40 is now described in more detail with respect to an embodiment in accordance with a first aspect of the present application. To this end, reference is made to Fig. 2 which shows device 40 in more detail. As already explained above, device 40 is for streaming media content pertaining to the temporally-varying spatial scene 30. As explained with respect to Fig. 1, device 40 may either be configured so that the media content streamed pertains continuously to the whole scene in spatial terms, or merely a section 28 thereof. In any case, device 40 comprises a selector 56 for selecting appropriate media segments 58 out of the plurality 46 of media segments available on server 20, and a retriever 60 for retrieving the selected media segments from server 20 by respective requests such as HTTP requests. As described above, selector 56 may use the media presentation description so as to compute the addresses of selected media segments, with retriever 60 using these addresses in retrieving the selected media segments 58. For example, the computational rule to compute the addresses indicated in the media presentation description may depend on quality parameter Q, tile T and some temporal segment t. The addresses may be URLs, for instance.
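Such a computational rule can be pictured as a template that is filled in with Q, T and t. In the sketch below the template string, the host name and the placeholder syntax (Python's `str.format`) are hypothetical and merely stand in for whatever rule the media presentation description actually conveys.

```python
def segment_url(template, quality, tile, segment):
    """Fill a URL template with quality Q, tile T and temporal segment index t."""
    return template.format(Q=quality, T=tile, t=segment)


# Hypothetical template as it might be derived from a media presentation description.
TEMPLATE = "https://example.com/scene/q{Q}/tile{T}/seg{t}.mp4"

url = segment_url(TEMPLATE, quality=2, tile=7, segment=42)
print(url)
```

Because the addresses are computed rather than listed, the selector can request any combination of tile, quality and time without the manifest enumerating every media segment.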

As has also been discussed above, the selector 56 is configured to perform the selection so that the selected media segments have at least a spatial section of the temporally-varying spatial scene 30 encoded thereinto. The spatial section may continuously cover the complete scene spatially. Fig. 2 illustrates at 61 the exemplary case where device 40 adapts the spatial section 62 of scene 30 to overlap and surround view section 28. This is, however, as already noted above, not necessarily the case and the spatial section may continuously cover the whole scene 30.

Further, selector 56 performs the selection such that the selected media segments have section 62 encoded thereinto in a manner of spatially unequal quality. To be more precise, a first portion 64, indicated by hatching in Fig. 2, of spatial section 62 is encoded into the selected media segments at a predetermined quality. This quality may, for instance, be the highest quality offered by server 20, or may be a "good" quality. Device 40 moves, for instance, or adapts the first portion 64 in a manner so as to spatially follow the temporally-varying view section 28. For instance, selector 56 selects the current temporal segments 54 of those tiles inheriting the current position of view section 28. In doing so, selector 56 may, optionally, as explained with respect to further embodiments hereinafter, keep the number of tiles making up first portion 64 constant. In any case, a second portion 66 of section 62 is encoded into the selected media segments 58 at another quality such as a lower quality. For example, selector 56 selects the media segments corresponding to the current temporal segments of tiles spatially neighboring the tiles of portion 64 and belonging to the lower quality. For instance, selector 56 mainly selects the media segments corresponding to portion 66 for the sake of addressing the possible occasion where view section 28 moves so fast that it leaves portion 64 and overlaps portion 66 before the temporal interval corresponding to the current temporal segment ends and selector 56 would be able to newly spatially arrange portion 64. In this situation, the portion of section 28 protruding into portion 66 may be presented to the user nevertheless, namely at reduced quality.
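The tile-based selection just described may be illustrated by the following minimal sketch. It assumes a flat rectangular tile grid without wrap-around, identifies tiles by (column, row) pairs and treats the eight surrounding tiles as "spatially neighboring"; all of these are simplifying assumptions of the sketch rather than features of the embodiment.

```python
def select_tile_qualities(view_tiles, cols, rows, high_q, low_q):
    """Return {(col, row): quality} for the tiles making up section 62.

    view_tiles: set of (col, row) tiles overlapped by view section 28.
    These tiles form portion 64 and get high_q; their neighbours inside
    the grid that are not part of portion 64 form portion 66 and get low_q.
    """
    selection = {tile: high_q for tile in view_tiles}
    for col, row in view_tiles:
        for dc in (-1, 0, 1):
            for dr in (-1, 0, 1):
                neighbour = (col + dc, row + dr)
                in_grid = 0 <= neighbour[0] < cols and 0 <= neighbour[1] < rows
                # Tiles already selected (portion 64 or earlier neighbours) are kept.
                if in_grid and neighbour not in selection:
                    selection[neighbour] = low_q
    return selection


if __name__ == "__main__":
    chosen = select_tile_qualities({(1, 1), (2, 1)}, cols=6, rows=4,
                                   high_q="high", low_q="low")
    for tile in sorted(chosen):
        print(tile, chosen[tile])
```

Re-running the function whenever view section 28 moves onto other tiles makes portion 64 follow the view section, with portion 66 acting as the pre-fetched guard band.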

It is, however, not possible for device 40 to assess which quality degradation may result from preliminarily presenting to the user reduced quality scene content along with the scene content within portion 64, which is of the higher quality. In particular, a transition between these two qualities results which may be clearly visible to the user. At least, such transitions may be visible depending on the current scene content within section 28. The severity of the negative impact of such a transition within the view of the user is a characteristic of the scene content as offered by server 20 and may not be forecast by device 40.

Accordingly, in accordance with the embodiment of Fig. 2, device 40 comprises a deriver 66 deriving a predetermined relationship to be fulfilled between the quality of portion 64 and the quality of portion 66. Deriver 66 derives this predetermined relationship from information which may be contained in the media segments such as within transport boxes within the media segments 58 and/or contained in a signalization obtained from server 20 such as within the media presentation description or proprietary signals sent from server 20 such as SAND messages or the like. Examples of how the information 68 may look are presented in the following. The predetermined relationship 70 derived by deriver 66 on the basis of information 68 is used by selector 56 in order to appropriately perform the selection. For instance, the restriction in selecting the qualities of portions 64 and 66, compared to a completely independent selection of qualities for portions 64 and 66, influences a distribution of available bandwidth for retrieving the media content concerning section 62 onto portions 64 and 66. In any case, selector 56 selects the media segments such that the qualities at which portions 64 and 66 are encoded into the media segments finally retrieved fulfill the predetermined relationship. Examples as to how the predetermined relationship might look are also set out below.

The media segments selected by selector 56 and retrieved by retriever 60 are finally forwarded to the one or more decoders 42 for decoding.

In accordance with a first example, for instance, the signaling mechanism embodied by information 68 involves information 68 indicating to device 40, which may be a DASH client, which quality combinations are acceptable for the offered video content. For example, the information 68 could be a list of quality pairs that indicate to the user or device 40 that the different regions 64 and 66 can be mixed with a maximum quality (or resolution) difference. Device 40 may be configured to inevitably use a certain quality level, such as the highest one offered at server 20, for portion 64 and derive quality levels at which portion 66 may be coded into the selected media segments from information 68, wherein same may be contained in the form of a list of quality levels for portion 66, for instance.

The information 68 could indicate an endurable value for a measure of a difference between the quality of portion 66 and the quality of portion 64. As a "measure" of the difference in quality, a quality index of the media segments 58, by way of which the same are distinguished in the media presentation description and by way of which the addresses of the same are computed using the computational rule described in the media presentation description, may be used. In MPEG-DASH, the corresponding attribute indicating the quality would be, for instance, @qualityRanking. Device 40 could take the restriction in selectable quality level pairs at which portions 64 and 66 may be coded into the selected media segments into account in performing the selection.

However, instead of this difference measure, the difference in quality could alternatively be measured, for instance, in bit rate difference, i.e., an endurable difference in bit rate at which portions 64 and 66 are encoded into the corresponding media segments, respectively, assuming that the bit rate usually monotonically increases with increasing quality. The information 68 could indicate allowed pairs of options for qualities at which portions 64 and 66 are encoded into the selected media segments. Alternatively, the information 68 simply indicates allowed qualities for coding portion 66, thereby indirectly indicating allowed or endurable quality differences assuming that main portion 64 is encoded using some default quality such as, for instance, the highest quality possible or available. For instance, information 68 could be a list of acceptable representation IDs or could indicate a minimum bit rate level with respect to the media segments concerning portion 66.
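A minimal sketch of how a selector might honor such a restriction while distributing the available bandwidth is given below. It assumes that quality is expressed as a ranking in which a lower number means higher quality, as is the convention for @qualityRanking in MPEG-DASH, and that the total bit rate per portion and ranking is known to the device; the function name and the example figures are hypothetical.

```python
def choose_quality_pair(bitrates_64, bitrates_66, max_rank_diff, budget):
    """Pick quality rankings (q64, q66); a lower ranking means higher quality.

    bitrates_64 / bitrates_66: {quality_ranking: total bit rate} for
    portions 64 and 66. A pair is admissible if portion 66 is not better
    than portion 64, the ranking difference does not exceed max_rank_diff
    and the summed bit rate fits into budget. Returns None if no pair fits.
    """
    candidates = [
        (q64, q66)
        for q64 in bitrates_64
        for q66 in bitrates_66
        if 0 <= q66 - q64 <= max_rank_diff
        and bitrates_64[q64] + bitrates_66[q66] <= budget
    ]
    # Tuple ordering prefers the best quality for portion 64, then for portion 66.
    return min(candidates, default=None)


if __name__ == "__main__":
    rates_64 = {1: 8000, 2: 4000, 3: 2000}  # hypothetical kbit/s figures
    rates_66 = {1: 6000, 2: 3000, 3: 1500}
    print(choose_quality_pair(rates_64, rates_66, max_rank_diff=1, budget=9500))
    print(choose_quality_pair(rates_64, rates_66, max_rank_diff=2, budget=9500))
```

With these example figures, an endurable difference of one ranking step forces both portions to ranking 2, whereas an endurable difference of two steps would allow portion 64 at ranking 1 next to portion 66 at ranking 3; this illustrates how the signaled restriction shifts the bandwidth distribution between the two portions.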

However, a more gradual quality difference could alternatively be desired, wherein, instead of quality pairs, quality groups (more than two qualities) could be indicated, wherein, dependent on the distance to section 28, i.e., the viewport, the quality difference could be increased. That is, the information 68 could indicate the endurable value for the measure of a difference between the qualities for portion 64 and 66 in a manner depending on a distance to view section 28. This could be done by way of a list of pairs of a respective distance to the view section and a corresponding endurable value for the measure of the difference in quality beyond the respective distance. Below the respective distance, the quality difference has to be lower. That is, each pair would indicate for a corresponding distance that a part within portion 66, further away from section 28 than the corresponding distance, may have a quality difference to the quality of portion 64 exceeding the corresponding endurable value of this list entry.
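One possible reading of such a distance-dependent list is sketched below: each entry gives a distance to view section 28 and the endurable value applying beyond that distance, and a stricter default applies closer to the view section. The units (degrees, ranking steps), the default value and the function name are assumptions of this sketch.

```python
def endurable_difference(distance, table, default=0):
    """Look up the endurable quality difference at a distance to view section 28.

    table: list of (min_distance, endurable_value) pairs; each value applies
    to parts of portion 66 lying beyond min_distance. Closer than the
    smallest listed distance, the stricter default applies.
    """
    allowed = default
    for min_distance, value in sorted(table):
        if distance > min_distance:
            allowed = value
    return allowed


if __name__ == "__main__":
    # Hypothetical list: distances in degrees, differences in ranking steps.
    pairs = [(10.0, 1), (30.0, 2), (60.0, 3)]
    for d in (5.0, 20.0, 45.0, 90.0):
        print(d, endurable_difference(d, pairs))
```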

The endurable value may increase with increasing distance to view section 28. The acceptance of the just discussed quality difference is often dependent on the time that these different qualities are shown to the user. For instance, content with a high quality difference might be acceptable if it is only shown for 200 microseconds, while content with a lower quality difference might be acceptable if it is shown for 500 microseconds. Therefore, in accordance with a further example, the information 68 could also include, in addition to the aforementioned quality combinations, for instance, or in addition to the allowed quality difference, a time interval for which the combination/quality difference may be acceptable. In other words, the information 68 may indicate an endurable or maximally allowed difference between the qualities of portions 66 and 64 along with an indication of a maximally allowed time interval for which portion 66 may be shown within the view section 28 concurrently with portion 64.

As already noted previously, the acceptance of quality differences depends on the content itself. For instance, the spatial position of the different tiles 50 has an influence on the acceptance. Quality differences in a uniform background region with low frequency signals are expected to be more acceptable than quality differences in a foreground object. Furthermore, the position in time also has an influence on the acceptance rate due to changing content. Therefore, according to another example, signals forming information 68 are sent to device 40 intermittently such as, for instance, per representation or period in DASH. That is, the predetermined relationship indicated by information 68 may be intermittently updated. Additionally and/or alternatively, the signaling mechanism realized by information 68 may vary in space. That is, the information 68 may be made spatially dependent such as, by way of an SRD parameter in DASH. That is, different predetermined relationships may be indicated by information 68 for different spatial regions of scene 30.

The embodiment of device 40, as described with respect to Fig. 2, pertains to the fact that device 40 wants to keep quality degradations, due to pre-fetched portions 66, within the retrieved section 62 of video content 30 briefly being visible in section 28 before being able to change the position of section 62 and portion 64 so as to adapt the same to the change in position by section 28, as low as possible. That is, in Fig. 2, portions 64 and 66, the qualities of which were restricted as far as their possible combinations were concerned by way of information 68, were different portions of section 62 with a transition between both portions 64 and 66 being continuously shifted or adapted in order to track or run-ahead the moving view section 28. In accordance with an alternative embodiment shown in Fig. 3, device 40 uses information 68 in order to control possible combinations of qualities of portions 64 and 66 which, however, in accordance with the embodiment of Fig. 3, are defined to be portions differentiated or distinguished from one another in a manner defined, for instance, in the media presentation description, i.e., defined in a manner independent from a position of view section 28. The positions of portions 64 and 66 and the transition there between may be constant or vary in time. If varying in time, the variation is due to a change in content of scene 30. For example, portion 64 would correspond to a region of interest for which the expenditure of higher quality is worthwhile, while portion 66 is a portion for which quality reduction owing to low bandwidth conditions, for instance, should be considered prior to considering quality reductions for portion 64.

In the following, a further embodiment for an advantageous implementation of device 40 is described. In particular, Fig. 4 shows device 40 in a manner corresponding, in structure, to Figs. 2 and 3 but the mode of operation is changed so as to correspond to a second aspect of the present application.

That is, device 40 comprises a selector 56, a retriever 60 and a deriver 66. The selector 56 selects from the media segments 58 of the plurality 46 offered by server 20 and retriever 60 retrieves the selected media segments from the server. Fig. 4 presumes that device 40 operates as depicted and illustrated with respect to Figs. 2 and 3, namely that selector 56 performs the selection so that the selected media segments 58 have a spatial section 62 of scene 30 encoded thereinto in a manner where this spatial section follows view section 28 which varies its spatial position in time. However, a variant corresponding to the same aspect of the present application is described later on with respect to Fig. 5, wherein, for each time instant t, the selected and retrieved media segments 58 have the whole scene or a constant spatial section 62 encoded thereinto.

In any case, selector 56 selects, similar to the description with respect to Figs. 2 and 3, the media segments 58 such that a first portion 64 within section 62 is encoded into the selected and retrieved media segments at a predetermined quality, whereas a second portion 66 of section 62, which spatially neighbors the first portion 64, is encoded into the selected media segments at a reduced quality relative to the predetermined quality of portion 64. A variant where selector 56 restricts the selection and retrieval to media segments pertaining to a moving template tracking the position of viewport 28 and wherein the media segments have encoded thereinto the section 62 completely at the predetermined quality so that the first portion 64 completely covers section 62 while being surrounded by non-encoded portion 72 is depicted in Fig. 6. In any case, selector 56 performs the selection so that the first portion 64 follows the view section 28 which varies in spatial position temporally.

In such a situation, it is also not easy for device 40 to forecast how large section 62 or portion 64 should be. Depending on the scene content, most users may act similarly in moving view section 28 across scene 30 and, accordingly, the same applies to the interval of view section 28 speeds at which view section 28 may presumably move across scene 30. Accordingly, in accordance with the embodiment of Figs. 4 to 6, information 74 is provided by server 20 to device 40 so as to assist device 40 in setting a size, or size and/or position, of the first portion 64, or the size, or size and/or position, of section 62, respectively, dependent on the information 74. With respect to the possibilities of transmitting information 74 from server 20 to device 40, the same applies as described above with respect to Figs. 2 and 3. That is, the information may be contained within the media segments 58 such as within event boxes thereof, or a transmission within the media presentation description or proprietary messages sent from server to device 40, such as SAND messages, may be used to this end.

That is, in accordance with the embodiments of Figs. 4 to 6, selector 56 is configured to set a size of the first portion 64 depending on information 74 stemming from server 20. In the embodiments illustrated in Figs. 4 to 6, the size is set in units of tiles 50, but, as already described above with respect to Fig. 1, the situation may be slightly different when using another concept of offering scene 30 in spatially varying quality at server 20.

In accordance with an example, information 74 could, for instance, include a probability for a given movement speed of the viewport, i.e., of view section 28. Information 74 could, as already denoted above, reside within the media presentation description made available for client device 40 which may, for instance, be a DASH client, or some in-band mechanisms may be used to convey information 74 such as event boxes, i.e., EMSG, or SAND messages in case of DASH. The information 74 could also be included in any container format such as ISO file format or transport format beyond MPEG-DASH such as MPEG-2 TS. It could also be conveyed in the video bitstream such as in SEI messages as described later. In other words, the information 74 may indicate a predetermined value for a measure of a spatial speed of view section 28. In this manner, the information 74 is indicative of the size of portion 64 in the form of a scaling, or in the form of an increment relative to a size of view section 28. That is, information 74 starts from some "base size" for portion 64 necessary to cover the size of section 28 and increases this "base size" appropriately such as incrementally or by scaling. For example, the aforementioned movement speed of view section 28 could be used to correspondingly scale the circumference of a current position of view section 28 so as to determine, for instance, the furthest positions of the circumference of view section 28 along any spatial direction feasible after a time interval determining, for example, the latency in adjusting the spatial location of portion 64 such as, for instance, the time duration of the temporal segments 54 corresponding to the temporal length of media segments 58. The speed times this time duration, added omni-directionally to the circumference of a current position of viewport 28, could thus result in such a worst case circumference and could be used to determine an enlargement of portion 64 relative to some minimum expansion of portion 64 assuming a non-moving viewport 28.

Information 74 may even be related to an evaluation of statistics of user behavior. Later on, embodiments are described which are suitable for feeding such an evaluation process. For instance, information 74 could indicate maximum speeds with respect to certain percentages of users. For example, information 74 could indicate that 90% of users move at a speed lower than 0.2 rad/s and 98% of users move at a speed lower than 0.5 rad/s. The information 74 or the messages carrying the same could be defined in such a way that probability-speed pairs are defined or a message could be defined that signals the maximum speed for a fixed percentage of users, e.g., always for 99% of the users. The movement speed signaling 74 could additionally comprise directional information, i.e., an angle in 2D or 2D plus depth in 3D also known as light field application. Information 74 would indicate different probability-speed pairs for different movement directions.
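How a device might evaluate such probability-speed pairs can be sketched as follows, using the example values given above. The representation of the pairs as (fraction of users, speed) tuples and the function name are assumptions of this sketch.

```python
def speed_for_coverage(pairs, target_fraction):
    """Return the smallest signalled speed covering at least target_fraction.

    pairs: list of (fraction_of_users, max_speed_rad_per_s) tuples, meaning
    that the given fraction of users moves slower than the given speed.
    Returns None if no signalled pair reaches the target fraction.
    """
    covering = [speed for fraction, speed in pairs if fraction >= target_fraction]
    return min(covering, default=None)


if __name__ == "__main__":
    # Values from the example: 90% below 0.2 rad/s, 98% below 0.5 rad/s.
    signalled = [(0.90, 0.2), (0.98, 0.5)]
    print(speed_for_coverage(signalled, 0.90))
    print(speed_for_coverage(signalled, 0.95))
```

A device content with covering 90% of users would thus dimension portion 64 for 0.2 rad/s, whereas a target of 95% would require the 0.5 rad/s entry.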

In other words, information 74 may apply to a given time span such as, for instance, the temporal length of a media segment. It may consist of trajectory-based pairs (x percentile, average user path), velocity-based pairs (x percentile, speed), distance-based pairs (x percentile, aperture/diameter/preferred) or area-based pairs (x percentile, recommended preferred area), or of single maximal boundary values for path, velocity, distance or preferred area. Instead of relating the information to percentiles, a simple frequency ranking could be used, according to which most of the users move at a certain speed, the second most users move at a further speed and so on. Additionally or alternatively, information 74 is not restricted to indicating the speed of view section 28, but could likewise indicate a preferred area to be viewed, to which section 62 and/or portion 64, which is sought to track view section 28, is to be directed, with or without an indication about statistical significance of the indication such as the percentage of users having complied with that indication or an indication of whether the indication coincides with the user viewing speeds/view sections having been logged most often, and with or without temporal persistence of the indication. Information 74 could indicate another measure of the speed of view section 28, such as a measure for a travelling distance of view section 28 within a certain period in time, such as within a temporal length of the media segments or, in more detail, the temporal length of temporal segments 54. Alternatively, information 74 could be signaled in a manner distinguishing between certain directions of movement into which view section 28 may travel. This pertains to both an indication of speed or velocity of view section 28 into a certain direction as well as the indication of traveled distance of view section 28 with respect to a certain direction of movement. Further, the expansion of portion 64 could be signaled by way of information 74 directly, either omni-directionally or in a manner discriminating different movement directions. Furthermore, all of the just outlined examples may be modified in that the information 74 indicates these values along with a percentage of users for which these values suffice in order to account for their statistical behavior in moving view section 28. In this regard, it should be noted that the view speed, i.e., the speed of view section 28, may be considerable and is not restricted to speed values for a user head, for instance. Rather, the view section 28 could be moved depending on the user's eye movement, for instance, in which case the view speed may be considerably larger. The view section 28 could also be moved according to another input device movement such as according to the movement of a tablet or the like. As all these "input possibilities" enabling the user to move section 28 result in different expected speeds of view section 28, information 74 may even be designed such that it distinguishes between different concepts for controlling the movement of view section 28.
That is, information 74 could indicate or be indicative of the size of portion 64 in a manner indicating different sizes for different ways of controlling the movement of view section 28 and device 40 would use the size indicated by information 74 for the correct view section control. That is, device 40 gains knowledge about the way view section 28 is controlled by the user, i.e., checks whether view section 28 is controlled by head movement, eye movement or tablet movement or the like and sets the size in accordance with that part of information 74 which corresponds to this kind of view section control.

Generally, the movement speed can be signaled per content, period, representation, segment, per SRD position, per pixel, per tile, i.e., on any temporal or spatial granularity or the like. The movement speed can also be differentiated into head movement and/or eye movement, as just outlined. Further, the information 74 about user movement probability may be conveyed as a recommendation about high resolution prefetch, i.e., video area outside the user viewport, or spherical coverage.

Fig. 7a to Fig. 7c briefly summarize some of the options explained with respect to information 74 in the way it is used by device 40 to amend the size of portion 64 or section 62, respectively, and/or the position thereof. In accordance with the option shown in Fig. 7a, device 40 enlarges the circumference of section 28 by a distance corresponding to a product of the signaled speed v and the time duration Δt, which may correspond to the temporal length of the temporal segments 54 encoded in the individual media segments 50a. Additionally and/or alternatively, the position of section 62 and/or portion 64 may be placed farther away from a current position of section 28, or from the current position of section 62 and/or portion 64, in the direction of the signaled speed or movement as signaled by information 74, the larger the speed is. The speed and direction may be derived from surveying or extrapolating a recent development or change in the recommended preferred area indication by information 74. Instead of omni-directionally applying v × Δt, the speed may be signaled by information 74 differently for different spatial directions. The alternative depicted in Fig. 7b shows that information 74 may indicate the distance of enlarging the circumference of view section 28 directly, with this distance being indicated by parameter s in Fig. 7b. Again, a directionally varying enlargement of section 28 may be applied. Fig. 7c shows that the enlargement of the circumference of section 28 could be indicated by information 74 by area increase such as, for instance, in the form of the ratio of the area of the enlarged section compared to the original area of section 28. In any case, the circumference of area 28 after enlargement, indicated by 76 in Figs. 7a to 7c, could be used by selector 56 to dimension or set the size of portion 64 such that portion 64 covers the whole area within enlarged section 76 or at least a predetermined amount thereof. Obviously, the larger section 76 is, the larger the number of tiles, for instance, within portion 64. In accordance with a further alternative, information 74 could indicate the size of portion 64 directly such as in the form of the number of tiles making up portion 64.
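The enlargement by the product of the signaled speed and the time duration, as in the option of Fig. 7a, may be illustrated by the following sketch. It works on angular extents along a single axis, ignores wrap-around on the sphere and assumes tiles of uniform angular width; these simplifications and all names are assumptions of the sketch.

```python
import math


def enlarged_extent(fov_deg, speed_rad_s, delta_t_s):
    """Angular extent of enlarged section 76 along one axis, in degrees.

    The view section may move by speed * delta_t in either direction,
    so the margin is added on both sides of the field of view.
    """
    margin_deg = math.degrees(speed_rad_s * delta_t_s)
    return fov_deg + 2.0 * margin_deg


def tiles_needed(extent_deg, tile_deg):
    """Upper bound on the tiles touched along one axis by an unaligned extent."""
    return math.ceil(extent_deg / tile_deg) + 1


if __name__ == "__main__":
    extent = enlarged_extent(fov_deg=90.0, speed_rad_s=0.5, delta_t_s=1.0)
    print(round(extent, 2), tiles_needed(extent, tile_deg=45.0))
```

The additional tile in `tiles_needed` accounts for the view section generally not being aligned with the tile boundaries.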

The latter possibility of signaling the size of portion 64 is depicted in Fig. 5. The embodiment of Fig. 5 could be modified in the same manner as the embodiment of Fig. 4 was modified by the embodiment of Fig. 6, i.e., the complete area of section 62 could be fetched from server 20 by way of segments 58 at the quality of portion 64.

In any case, at the very end of Fig. 5, information 74 distinguishes between different sizes of view section 28, i.e., between different fields of view seen by view section 28. Information 74 simply indicates the size of portion 64 depending on the size of view section 28 which device 40 currently aims at. This enables the service of server 20 to be used by devices with different fields of view or different sizes of view section 28 without devices such as device 40 having to cope with the problem of computing or otherwise guessing the size of portion 64 so that portion 64 suffices to cover view section 28 irrespective of any movement of section 28 as discussed with respect to Figs. 4, 6 and 7. As may have become clear from the description of Fig. 1, it is anything but easy to assess which constant number of tiles, for instance, may suffice to completely cover a certain size of view section 28, i.e., a certain field of view, irrespective of the direction and spatial positioning of view section 28 within scene 30. Here, information 74 alleviates this situation and device 40 is able to simply look up within information 74 the value of the size of portion 64 to be used for the size of view section 28 applying to device 40. That is, in accordance with the embodiment of Fig. 5, the media presentation description made available for the DASH client or some in-band mechanisms, such as event boxes or SAND messages, could include information 74 about the spherical coverage or field of view of sets of representations or sets of tiles, respectively. One example could be a tiled offering with M representations as depicted in Fig. 1. The information 74 could indicate the recommended number N < M of tiles (called representations) to download for coverage of a given end device field of view, e.g., out of a cubic representation tiled into 6 x 4 tiles as depicted in Fig. 1, 12 tiles are deemed sufficient to cover a 90° x 90° field of view. Due to the end device field of view not always being perfectly aligned with the tile boundaries, this recommendation cannot be trivially generated by device 40 on its own. Device 40 may use information 74 by downloading, for instance, at least N tiles, i.e., the media segments 58 concerning N tiles. Another way to utilize the information would be to emphasize the quality of N tiles within section 62 that are closest to the current view center of the end device, i.e., use N tiles for making up portion 64 of section 62.
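The second way of utilizing the information, namely emphasizing the N tiles closest to the current view center, might be sketched as follows. The representation of information 74 as a mapping from field of view to tile count, the description of tiles by the (yaw, pitch) direction of their centers and all names are assumptions of this sketch.

```python
import math


def angular_distance(a, b):
    """Great-circle angle in radians between two (yaw, pitch) directions in degrees."""
    yaw_a, pitch_a = map(math.radians, a)
    yaw_b, pitch_b = map(math.radians, b)
    cos_angle = (math.sin(pitch_a) * math.sin(pitch_b)
                 + math.cos(pitch_a) * math.cos(pitch_b) * math.cos(yaw_a - yaw_b))
    # Clamp against rounding errors before taking the arc cosine.
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def tiles_for_portion_64(tile_centres, view_centre, fov, recommended):
    """Return the ids of the N tiles nearest to the view centre.

    recommended: {(fov_width_deg, fov_height_deg): N}, a hypothetical
    in-memory form of information 74.
    """
    n = recommended[fov]
    ranked = sorted(tile_centres,
                    key=lambda tile: angular_distance(tile_centres[tile], view_centre))
    return ranked[:n]


if __name__ == "__main__":
    # Toy offering with one tile per cube face; N = 2 is a hypothetical value.
    centres = {"front": (0, 0), "right": (90, 0), "back": (180, 0),
               "left": (-90, 0), "top": (0, 90), "bottom": (0, -90)}
    print(tiles_for_portion_64(centres, (10, 0), (90, 90), {(90, 90): 2}))
```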

With respect to Fig. 8a, an embodiment with respect to a further aspect of the present application is described. Here, Fig. 8a shows client device 10 and server 20 which communicate with each other in accordance with any of the possibilities described above with respect to Figs. 1 to 7. That is, device 10 may be embodied in accordance with any of the embodiments described with respect to Figs. 2 to 7 or may simply act without these specifics in the manner described above with respect to Fig. 1. However, favorably, device 10 is embodied in accordance with any of the embodiments described above with respect to Figs. 2 to 7 or any combination thereof and additionally inherits the mode of operation described now with respect to Fig. 8a. In particular, device 10 is internally constructed as has been described above with respect to Figs. 2 to 7, i.e., device 40 comprises selector 56, retriever 60 and, optionally, deriver 66. Selector 56 performs the selection aiming at unequal streaming, i.e., selecting the media segments in a manner so that the media content is encoded into the selected and retrieved media segments in a manner so that the quality spatially varies and/or in a manner so that there are non-encoded portions. However, in addition to this, device 40 comprises a log message sender 80 which sends out to server 20 or an evaluation device 82 log messages logging, for instance,

a momentaneous measurement or a statistical value measuring a spatial position and/or movement of the first portion 64,

a momentaneous measurement or a statistical value measuring a quality of the temporally-varying spatial scene as far as encoded into selected media segments and as far as visible in view section 28, and/or

a momentaneous measurement or a statistical value measuring the quality of the first portion or a quality of the temporally-varying spatial scene 30 as far as encoded into the selected media segments and as far as visible in view section 28.

The motivation is as follows.

In order to be able to derive statistics, such as the most interesting regions or speed-probability pairs, as described previously, reporting mechanisms from users are required. DASH Metrics in addition to the ones defined in Annex D of ISO/IEC 23009-1 are necessary.

One metric would be the FoV of the client as a DASH Metric, where DASH clients send back to a Metric Server (it could be the same as the DASH server or another one) the characteristics of the end device in terms of FoV.

One Metric would be ViewportList, where DASH clients send back to a Metric Server (it could be the same as the DASH server or another one) the viewport watched by each client in time. An instantiation of such a message could be as follows:

Key             Type      Description
ViewportList    List      List of Viewport over time
  Entry         Object    An entry for a single Viewport
    time        Integer   Playout-time (media-time) at which the following viewport is chosen by the client.
    roll        Integer   The roll component of the orientation of the Viewport
    pitch       Integer   The pitch coordinate of the orientation of the Viewport
    yaw         Integer   The yaw coordinate of the orientation of the Viewport

For the Viewport (region of interest) message, the DASH client could be asked to report whenever a Viewport change occurs, with potentially a given granularity (with or without avoiding reporting of very small movements) or with a given periodicity. Such a message could be included in the MPD as an attribute @reportViewPortPeriodicity or an element or descriptor. It could be also indicated out of band, such as with a SAND message or any other means.
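An instantiation of the ViewportList message on the client side might be sketched as follows. The key names follow the table above; the JSON serialization, the units and the function name are assumptions of this sketch, since no wire format is prescribed here.

```python
import json


def viewport_list_message(samples):
    """Serialize logged viewports as a ViewportList message.

    samples: iterable of (time, roll, pitch, yaw) integers, one per logged
    viewport. JSON is used here only as an assumed serialization.
    """
    entries = [
        {"time": time, "roll": roll, "pitch": pitch, "yaw": yaw}
        for time, roll, pitch, yaw in samples
    ]
    return json.dumps({"ViewportList": entries})


if __name__ == "__main__":
    # Two logged viewports; times in media-time units, angles in degrees.
    print(viewport_list_message([(0, 0, 0, 90), (2000, 0, 10, 120)]))
```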

Viewport can also be signalled on tile granularity.

Additionally or alternatively, log messages could report on other current scene related parameters changing responsive to user input, such as any of the parameters discussed below with respect to Fig. 10 such as current user distance from the scene centre and/or the current view depth.

Another metric would be the ViewportSpeedList, where DASH clients indicate the movement speed for a given viewport in time when a movement happens.

Key                  Type      Description
ViewportSpeedList    List      List of Viewport change speed over time
  Entry              Object    An entry for a single Viewport change speed
    time             Integer   Playout-time (media-time) at which the following viewport is chosen by the client.
    roll             Integer   The roll component of the orientation of the Viewport
    pitch            Integer   The pitch coordinate of the orientation of the Viewport
    yaw              Integer   The yaw coordinate of the orientation of the Viewport
    speed_roll       Integer   The speed in roll component of the orientation of the Viewport
    speed_pitch      Integer   The speed in pitch component of the orientation of the Viewport
    speed_yaw        Integer   The speed in yaw component of the orientation of the Viewport

This message would be sent only if the client performs a viewport movement. However, as in the previous case, the server could indicate that the message should only be sent if the movement is significant. Such a configuration could be something like @minViewportDifferenceForReporting, signalling the size in pixels, the angle, or any other magnitude by which the viewport needs to have changed for a message to be sent.
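A client-side sketch of this behaviour is given below. It assumes that @minViewportDifferenceForReporting is expressed as an angle in degrees, that orientation components are whole degrees, and that speeds are reported in degrees per second rounded to integers; the text leaves all of these open. The helper names angle_delta and build_speed_list are hypothetical.

```python
from typing import Dict, List, Tuple

# (time_ms, roll, pitch, yaw); the units are assumptions for this sketch.
Sample = Tuple[int, int, int, int]


def angle_delta(new: int, old: int) -> int:
    """Signed shortest rotation from old to new in degrees, in [-180, 180)."""
    return (new - old + 180) % 360 - 180


def build_speed_list(samples: List[Sample],
                     min_difference_deg: int) -> List[Dict[str, int]]:
    """Emit a ViewportSpeedList entry only for significant movements."""
    entries: List[Dict[str, int]] = []
    ref = samples[0] if samples else None  # last reported orientation
    for cur in samples[1:]:
        dt_ms = cur[0] - ref[0]
        deltas = [angle_delta(c, r) for c, r in zip(cur[1:], ref[1:])]
        if dt_ms <= 0 or max(abs(d) for d in deltas) < min_difference_deg:
            continue  # movement too small or no time elapsed: no message
        speeds = [round(d * 1000 / dt_ms) for d in deltas]  # degrees/second
        entries.append({
            "time": cur[0], "roll": cur[1], "pitch": cur[2], "yaw": cur[3],
            "speed_roll": speeds[0], "speed_pitch": speeds[1],
            "speed_yaw": speeds[2],
        })
        ref = cur
    return entries


trace = [(0, 0, 0, 350), (500, 0, 0, 352), (1000, 0, 2, 10), (1500, 0, 2, 11)]
speed_entries = build_speed_list(trace, min_difference_deg=5)
```

The reference orientation is only advanced when a message is emitted, so a slow drift accumulates until it crosses the threshold. Yaw differences are wrapped, so moving from 350 to 10 degrees counts as a 20 degree movement rather than 340.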

Another important aspect of a VR-DASH service in which asymmetric quality is offered as described above is to evaluate how fast users switch from an asymmetric representation, or a set of unequal quality/resolution representations for a Viewport, to another representation or set of representations better suited to another viewport. With such a metric, servers could derive statistics that help them understand relevant factors that impact the QoE. Such a metric could look like:
Claims

1. Device for streaming media content pertaining a temporally varying spatial scene (30), configured to

select (56) media segments out of a plurality (46) of media segments (58) being available on a server (20),

retrieve (60) the selected media segments from the server (20), wherein the device is configured to

perform the selection so that the selected media segments have at least a spatial section (62) of the temporally varying spatial scene (30) encoded thereinto in a manner according to which a first portion (64) of the spatial section is encoded into the selected media segments at a predetermined quality, and according to which a second portion (66) of the temporally varying spatial scene, which spatially neighbors the first portion (64), is encoded into the selected media segments at a further quality fulfilling a predetermined relationship with respect to the predetermined quality, and

derive the predetermined relationship from information (68) contained in the selected media segments and/or a signalization obtained from the server (20).

2. Device according to claim 1, wherein each media segment of the plurality (46) of media segments has encoded thereinto an associated spatiotemporal portion of the temporally varying spatial scene (30) at an associated one of a set of quality levels.

3. Device according to claim 2, wherein each of the spatiotemporal portions of the temporally varying spatial scene (30) encoded into the plurality (46) of media segments are temporal segments (54) of the temporally varying spatial scene (30) at a respective one of tiles (50) into which the temporally varying spatial scene (30) is spatially subdivided.

4. Device according to any of claims 1 to 3, wherein the information (64) indicates an endurable value for a measure of a difference between the further quality and the predetermined quality.

5. Device according to claim 4, wherein the information (68) indicates the endurable value for the measure of a difference between the further quality and the predetermined quality in a manner depending on a distance to the view section (28).

6. Device according to claim 4 or 5, wherein the information indicates the endurable value for the measure of a difference between the further quality and the predetermined quality in a manner depending on a distance to the view section (28) by way of a list of pairs of a respective distance to the view section (28) and a corresponding endurable value for the measure of the difference beyond the respective distance.

7. Device according to any of claims 4 to 6, wherein the information (64) indicates the endurable value for the measure of a difference between the further quality and the predetermined quality so that the endurable value increases with increasing distance to the view section.

8. Device according to any of claims 4 to 7, wherein the information (64) indicates the endurable value for the measure of the difference between the further quality and the predetermined quality along with an indication of a maximally allowed time interval for which the second portion may be within the view section along with the first portion.

9. Device according to claim 8, wherein the information (64) indicates a further endurable value for the measure of the difference between the further quality and the predetermined quality along with an indication of a further maximally allowed time interval for which the second portion may be within the view section along with the first portion.

10. Device according to any of claims 4 to 9, wherein the information (64) is time-varying and/or spatially varying.

11. Device according to any of claims 1 to 10, wherein the information (64) indicates allowed pairs of concurrent settings for the further quality and the predetermined quality.

12. Device according to any of claims 1 to 11, wherein the device is configured to perform the selection so that the first portion (64) follows a temporally varying view section (28) of the temporally varying spatial scene (30).

13. Device according to claim 12, wherein the device is configured so that a spatial location of the temporally varying view section (28) changes according to user input.

14. Device according to any of claims 1 to 13, wherein the device is configured to determine the first portion (64) so as to correspond to a region of interest.

15. Device according to claim 14, wherein the device is configured to retrieve information on the ROI from the server.

16. Streaming server for media content pertaining a temporally varying spatial scene, configured to

render available a plurality of media segments for retrieval by a device, thereby enabling the device to select media segments for retrieval which have at least a spatial section of the temporally varying spatial scene encoded thereinto in a manner according to which a first portion of the spatial section is encoded into the selected media segments at a predetermined quality, and according to which a second portion of the temporally varying spatial scene, which spatially neighbors the first portion, is encoded into the selected media segments at a further quality, and signal information on a predetermined relationship in the media segments and/or by way of a signalization to the device, the predetermined relationship being to be fulfilled by the further quality with respect to the predetermined quality.

17. Streaming server according to claim 16, wherein each media segment of the plurality (46) of media segments has encoded thereinto an associated spatiotemporal portion of the temporally varying spatial scene (30) at an associated one of a set of quality levels.

18. Streaming server according to claim 17, wherein each of the spatiotemporal portions of the temporally varying spatial scene (30) encoded into the plurality (46) of media segments are temporal segments (54) of the temporally varying spatial scene (30) at a respective one of tiles (50) into which the temporally varying spatial scene (30) is spatially subdivided.

19. Streaming server according to any of claims 16 to 18, wherein the information (64) indicates an endurable value for a measure of a difference between the further quality and the predetermined quality.

20. Streaming server according to claim 19, wherein the information (68) indicates the endurable value for the measure of a difference between the further quality and the predetermined quality in a manner depending on a distance to the view section (28).

21. Streaming server according to claim 19 or 20, wherein the information indicates the endurable value for the measure of a difference between the further quality and the predetermined quality in a manner depending on a distance to the view section (28) by way of a list of pairs of a respective distance to the view section (28) and a corresponding endurable value for the measure of the difference beyond the respective distance.

22. Streaming server according to any of claims 19 to 21, wherein the information (64) indicates the endurable value for the measure of a difference between the further quality and the predetermined quality so that the endurable value increases with increasing distance to the view section.

23. Streaming server according to any of claims 19 to 22, wherein the information (64) indicates the endurable value for the measure of the difference between the further quality and the predetermined quality along with an indication of a maximally allowed time interval for which the second portion may be within the view section along with the first portion.

24. Streaming server according to claim 23, wherein the information (64) indicates a further endurable value for the measure of the difference between the further quality and the predetermined quality along with an indication of a further maximally allowed time interval for which the second portion may be within the view section along with the first portion.

25. Streaming server according to any of claims 19 to 24, wherein the information (64) is time-varying and/or spatially varying.

26. Streaming server according to any of claims 19 to 25, wherein the information (64) indicates allowed pairs of concurrent settings for the further quality and the predetermined quality.

27. Streaming server according to any of claims 19 to 26, wherein the streaming server is configured to send information on an ROI to the device.

28. Media presentation description comprising

information on computing addresses of a plurality of media segments, so that a device, using the information, may select and retrieve media segments out of the plurality of media segments

which have at least a spatial section of the temporally varying spatial scene encoded thereinto in a manner according to which a first portion of the spatial section is encoded into the selected media segments at a predetermined quality, and according to which a second portion of the temporally varying spatial scene, which spatially neighbors the first portion, is encoded into the selected media segments at a further quality, and

information (64) on a predetermined relationship to be fulfilled by the further quality with respect to the predetermined quality.

29. Media presentation description according to claim 28, wherein each media segment of the plurality (46) of media segments has encoded thereinto an associated spatiotemporal portion of the temporally varying spatial scene (30) at an associated one of a set of quality levels.

30. Media presentation description according to claim 29, wherein each of the spatiotemporal portions of the temporally varying spatial scene (30) encoded into the plurality (46) of media segments are temporal segments (54) of the temporally varying spatial scene (30) at a respective one of tiles (50) into which the temporally varying spatial scene (30) is spatially subdivided.

31. Media presentation description according to any of claims 28 to 30, wherein the information (64) indicates an endurable value for a measure of a difference between the further quality and the predetermined quality.

32. Media presentation description according to claim 31, wherein the information (68) indicates the endurable value for the measure of a difference between the further quality and the predetermined quality in a manner depending on a distance to the view section (28).

33. Media presentation description according to claim 31 or 32, wherein the information indicates the endurable value for the measure of a difference between the further quality and the predetermined quality in a manner depending on a distance to the view section (28) by way of a list of pairs of a respective distance to the view section (28) and a corresponding endurable value for the measure of the difference beyond the respective distance.

34. Media presentation description according to any of claims 31 to 33, wherein the information (64) indicates the endurable value for the measure of a difference between the further quality and the predetermined quality so that the endurable value increases with increasing distance to the view section.

35. Media presentation description according to any of claims 31 to 34, wherein the information (64) indicates the endurable value for the measure of the difference between the further quality and the predetermined quality along with an indication of a maximally allowed time interval for which the second portion may be within the view section along with the first portion.

36. Media presentation description according to claim 35, wherein the information (64) indicates a further endurable value for the measure of the difference between the further quality and the predetermined quality along with an indication of a further maximally allowed time interval for which the second portion may be within the view section along with the first portion.

37. Media presentation description according to any of claims 31 to 36, wherein the information (64) is time-varying and/or spatially varying.

38. Media presentation description according to any of claims 31 to 37, wherein the information (64) indicates allowed pairs of concurrent settings for the further quality and the predetermined quality.

39. Media presentation description according to any of claims 31 to 38, wherein the media presentation description comprises information on an ROI.

40. Device for streaming media content pertaining a temporally varying spatial scene (30), configured to

select media segments out of a plurality (46) of media segments (58) available on a server (20), retrieve the selected media segments from the server (20), wherein the device is configured to

perform the selection so that the selected media segments have at least a spatial section (62) of the temporally varying spatial scene (30) encoded thereinto in a manner

according to which a first portion (64) of the spatial section (62) is encoded into the selected media segments at a predetermined quality, and according to which a second portion (66; 72) of the temporally varying spatial scene, which spatially neighbors the first portion (64), is not encoded into the selected media segments or encoded into the selected media segments at a further quality reduced relative to the predetermined quality, and

so that the first portion (64) follows a temporally varying view section (28) of the temporally varying spatial scene (30), and

set a size and/or position of the first portion (64) depending on information (74) contained in the selected media segments and/or a signalization obtained from the server.

41. Device according to claim 40, wherein the information (74) is indicative of the size in form of an increment relative to, or a scaling of, a size of the temporally varying view section.

42. Device according to claim 40 or 41, wherein the information (74) indicates a predetermined value for a measure of a spatial speed of the view section.

43. Device according to claim 42, wherein the information (74) indicates the predetermined value for the measure of the spatial speed of the view section

for a default percentile of users for which a measured spatial speed does not exceed the predetermined value, and/or

along with a percentile value indicating a percentile of users for which a measured spatial speed does not exceed the predetermined value, and/or

along with a percentile value indicating a percentile of users for which the view section is in a predetermined area, and/or

along with a hint indicating one or more types of user inputs controlling a movement of the view section for which the predetermined value is applicable.

44. Device according to claim 42 or 43, configured to perform the setting so that

the higher the predetermined value for the measure of the spatial speed of the view section is, the larger the size is.

45. Device according to any of claims 40 to 44, wherein the information (74) indicates a predetermined value for a measure of a probability for a direction of movement of the view section.

46. Device according to claim 45, configured to perform the setting so that

the higher the probability for a respective direction of movement is, the more the first portion extends into the respective direction.

47. Device according to any of claims 40 to 46, wherein the information (74) indicates the predetermined spatial speed in a time-varying and/or spatially varying and/or directionally varying manner.

48. Device according to any of claims 40 to 47, configured so that the first portion (64) and the spatial section (62) coincide.

49. Device according to any of claims 40 to 48, wherein each of the spatiotemporal portions of the temporally varying spatial scene (30) encoded into the plurality of media segments are temporal segments (54) of the temporally varying spatial scene (30) at a respective one of tiles (50) into which the temporally varying spatial scene (30) is spatially subdivided.

50. Device according to claim 49, wherein each media segment of the plurality of media segments has encoded thereinto an associated spatiotemporal portion of the temporally varying spatial scene at an associated one of a set of quality levels.

51. Device according to claim 40, wherein the device is configured to set the size in a manner independent from a size of the temporally varying view section dependent on the information (74).

52. Device according to claim 40, wherein the information comprises different values for the size for different size options of the time-varying view section with the device using the value comprised by the information for a size option fitting to an actual size of the time-varying view section.

53. Device according to claim 51 or 52, wherein the information indicates the size in number of tiles.

54. Streaming server for media content pertaining a temporally varying spatial scene, configured to

render available a plurality of media segments for retrieval by a device, thereby enabling the device to select media segments for retrieval which have at least a spatial section of the temporally varying spatial scene encoded thereinto in a manner according to which a first portion of the spatial section is encoded into the selected media segments at a predetermined quality, and according to which a second portion of the temporally varying spatial scene, which spatially neighbors the first portion, is not encoded into the selected media segments or encoded into the selected media segments at a further quality reduced relative to the predetermined quality, and so that the first portion follows a temporally varying view section of the temporally varying spatial scene, and

signal information on how to set a size and/or position of the first portion, in the media segments and/or by way of a signalization to the device.

55. Signal defining a media presentation description comprising

information on computing addresses of a plurality of media segments, so that a device, using the information, may select and retrieve media segments out of the plurality of media segments which have at least a spatial section of the temporally varying spatial scene encoded thereinto in a manner according to which a first portion of the spatial section is encoded into the selected media segments at a predetermined quality, and according to which a second portion of the temporally varying spatial scene, which spatially neighbors the first portion, is not encoded into the selected media segments or encoded into the selected media segments at a further quality reduced relative to the predetermined quality, and so that the first portion follows a temporally varying view section of the temporally varying spatial scene, and

information on how to set a size and/or position of the first portion, in the media segments and/or by way of a signalization to the device.

56. Signal according to claim 55, wherein the information (74) is indicative of the size in form of an increment relative to, or a scaling of, a size of the temporally varying view section.

57. Signal according to claim 55 or 56, wherein the information (74) indicates a predetermined value for a measure of a spatial speed of the view section.

58. Signal according to claim 57, wherein the information (74) indicates the predetermined value for the measure of the spatial speed of the view section

for a default percentile of users for which a measured spatial speed does not exceed the predetermined value, and/or

along with a percentile value indicating a percentile of users for which a measured spatial speed does not exceed the predetermined value, and/or

along with a percentile value indicating a percentile of users for which the view section is in a predetermined area, and/or

along with a hint indicating one or more types of user inputs controlling a movement of the view section for which the predetermined value is applicable.

59. Signal according to claim 57 or 58, configured to perform the setting so that

the higher the predetermined value for the measure of the spatial speed of the view section is, the larger the size is.

60. Signal according to any of claims 55 to 59, wherein the information (74) indicates a predetermined value for a measure of a probability for a direction of movement of the view section.

61. Signal according to claim 60, configured to perform the setting so that

the higher the probability for a respective direction of movement is, the more the first portion extends into the respective direction.

62. Signal according to any of claims 55 to 61, wherein the information (74) indicates the predetermined spatial speed in a time-varying and/or spatially varying and/or directionally varying manner.

63. Signal according to any of claims 55 to 62, configured so that the first portion (64) and the spatial section (62) coincide.

64. Signal according to any of claims 55 to 63, wherein each of the spatiotemporal portions of the temporally varying spatial scene (30) encoded into the plurality of media segments are temporal segments (54) of the temporally varying spatial scene (30) at a respective one of tiles (50) into which the temporally varying spatial scene (30) is spatially subdivided.

65. Signal according to claim 64, wherein each media segment of the plurality of media segments has encoded thereinto an associated spatiotemporal portion of the temporally varying spatial scene at an associated one of a set of quality levels.

66. Signal according to claim 55, wherein the device is configured to set the size in a manner independent from a size of the temporally varying view section dependent on the information (74).

67. Signal according to claim 55, wherein the information comprises different values for the size for different size options of the time-varying view section with the device using the value comprised by the information for a size option fitting to an actual size of the time-varying view section.

68. Signal according to claim 65 or 66, wherein the information indicates the size in number of tiles.

69. Video bitstream having a video encoded thereinto, the video bitstream comprising a signalization of one or more of a size of a focus area within the video onto which a decoding power for decoding the video should be focused, and a recommended preferred view-section area of the video.

70. Decoder for decoding from a video bitstream a video, configured to

derive from the video bitstream a signalization (74) of a size of a focus area within the video, and focus a decoding power for decoding the video onto the focus area (116).

71. Decoder according to claim 70, configured to decode the focus area exclusively.

72. Decoder according to claim 70, configured to start decoding each picture of the video at the focus area.

73. Decoder according to claim 70, configured to cease decoding each picture of the video upon decoding the focus area.

74. Decoder according to any of claims 70 to 73, wherein the signalization indicates the size absolutely or the decoder is configured to scale a size of the focus area by a parameter contained in the signalization.

75. Device for streaming media content pertaining a temporally varying spatial scene (30), configured to

derive (90) from a media presentation description

at least one version at which the temporally varying spatial scene is offered for tile-based streaming,

for each of the at least one version an indication of benefiting requirements for benefiting from the tile-based streaming the respective version of the temporally varying spatial scene,

match (92) the benefiting requirements of the at least one version with a device capability of the device or another device interacting with the device with respect to the tile-based streaming.

76. Device according to claim 75, wherein the benefiting requirements and the device capability pertain decoding capabilities.

77. Device according to claim 75 or 76, wherein the benefiting requirements and the device capability pertain numbers of available decoders.

78. Device according to any of claims 75 to 77, wherein the benefiting requirements and the device capability pertain level and/or profile descriptors.

79. Device according to any of claims 75 to 78, wherein the benefiting requirements and the device capability pertain a type of input device for moving a view section across the temporally varying spatial scene (30) or pertain a speed at which the view section is, using the input device, moved across the temporally varying spatial scene (30).

80. Device according to any of claims 75 to 79, wherein the device is configured to

select media segments out of a plurality (46) of media segments (58) available on the server (20) by computing addresses of the selected media segments using a computational rule comprised by the media presentation description,

retrieve the selected media segments from the server (20) using the computed addresses, wherein the device is configured to perform the selection so that the selected media segments have at least a spatial section (62) of the temporally varying spatial scene (30) encoded thereinto in a manner

so that the first portion (64) follows a temporally varying view section (28) of the temporally varying spatial scene (30).

81. Streaming server for streaming media content pertaining a temporally varying spatial scene (30), configured to

provide (90) a media presentation description from which

at least one version at which the temporally varying spatial scene is offered for tile-based streaming,

for each of the at least one version an indication of benefiting requirements for benefiting from the tile-based streaming the respective version of the temporally varying spatial scene,

is derivable, thereby enabling a device streaming the media content from the streaming server to

match (92) the benefiting requirements of the at least one version with a device capability of the device or another device interacting with the device with respect to the tile-based streaming.

82. A media presentation description comprising

Information on at least one version at which a temporally varying spatial scene is offered for tile-based streaming,

for each of the at least one version, an indication of benefiting requirements for benefiting from the tile-based streaming the respective version of the temporally varying spatial scene.

83. Media presentation description according to claim 82, wherein the benefiting requirements and the device capability pertain decoding capabilities.

84. Media presentation description according to claim 82 or 83, wherein the benefiting requirements and the device capability pertain numbers of available decoders.

85. Media presentation description according to any of claims 82 to 84, wherein the benefiting requirements and the device capability pertain level and/or profile descriptors.

86. Media presentation description according to any of claims 82 to 85, wherein the benefiting requirements and the device capability pertain a type of input device for moving a view section across the temporally varying spatial scene (30) or pertain a speed at which the view section is, using the input device, moved across the temporally varying spatial scene (30).

87. Media presentation description according to any of claims 82 to 86, further comprising a computational rule using which the device is enabled to

select media segments out of a plurality (46) of media segments (58) available on the server (20) by computing addresses of the selected media segments using the computational rule comprised.

88. Device for streaming media content pertaining a temporally varying spatial scene (30), configured to

depending on a spatial viewport position and at least one parameter, compute addresses of media segments, the media segments describing a spatial scene (30) varying in time and the at least one parameter,

retrieve the media segments using the computed addresses.

89. Device according to claim 88, wherein the at least one parameter comprises one or more coordinates of a viewing center and/or a view depth.

90. Media presentation description comprising

a computation rule for, depending on a spatial viewport position and at least one parameter, computing addresses of media segments, the media segments describing a spatial scene (30)

varying in time and the at least one parameter so as to retrieve the media segments using the computed addresses.

91. Streaming server for allowing a device to stream media content pertaining a temporally varying spatial scene (30) from a server, configured to provide a media presentation description according to claim 90.

92. Device for streaming media content pertaining a temporally varying spatial scene, configured to

select media segments out of a plurality of media segments available on a server,

wherein the device is configured to

have a first portion of the temporally varying spatial scene encoded thereinto at a quality increased compared to a spatial neighborhood of the first portion or in a manner so that the spatial neighborhood of the first portion is not encoded into the selected media segments, send-out log messages logging

a momentaneous measurement measuring a spatial position and/or movement of the first portion; and/or

a statistical value, such as a temporal average, measuring a spatial position and/or movement of the first portion; and/or

a momentaneous measurement measuring a quality of the temporally varying spatial scene as far as encoded into the selected media segments and as far as visible in a view section; and/or

an indication of a set of buffers (300) of the device involved in buffering the selected media segments, a description of a distribution rule applied in distributing the selected media segments onto the set of buffers, and a momentaneous buffer fullness of each of the set of buffers; and/or

a measurement of an amount of the selected media segments not having been output from a buffer of the device for being subject to decoding (42); and/or

a statistical value, such as a temporal average, measuring a quality of the temporally varying spatial scene as far as encoded into the selected media segments and as far as visible in a view section; and/or

a momentaneous measurement measuring the quality of the first portion or a quality of the temporally varying spatial scene as far as encoded into the selected media segments and as far as visible in a view section; and/or

a statistical value, such as a temporal average, measuring the quality of the first portion or a quality of the temporally varying spatial scene as far as encoded into the selected media segments and as far as visible in a view section; and/or

a field of view covered by the view section; and/or

a momentaneous measurement measuring a user position or view depth relative to a scene center (100); and/or

a statistical value, such as a temporal average, measuring a user position or view depth relative to a scene center (100).

93. Device according to claim 92, wherein the quality of the first portion or a quality of the temporally varying spatial scene as far as encoded into the selected media segments and as far as visible in a view section is measured as a time duration at which a lower quality portion is visible in the view section along with a higher quality portion.
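The time-duration measure of claim 93 can be illustrated with a minimal sketch. It assumes the client periodically samples which quality levels are currently visible in the view section; the `ViewSample` type, its field names and the sampling scheme are hypothetical and not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class ViewSample:
    # Hypothetical sample: presentation time and the set of quality
    # levels currently visible inside the view section.
    t_ms: int
    visible_qualities: frozenset


def mixed_quality_duration_ms(samples):
    """Sum the time during which more than one quality level is visible.

    Each sample is assumed to hold until the next sample's timestamp.
    """
    total = 0
    for cur, nxt in zip(samples, samples[1:]):
        if len(cur.visible_qualities) > 1:
            total += nxt.t_ms - cur.t_ms
    return total


samples = [
    ViewSample(0, frozenset({"high"})),
    ViewSample(100, frozenset({"high", "low"})),
    ViewSample(250, frozenset({"high", "low"})),
    ViewSample(400, frozenset({"high"})),
    ViewSample(500, frozenset({"high"})),
]
print(mixed_quality_duration_ms(samples))  # 300
```

In this example the lower quality portion is visible alongside the higher quality portion from 100 ms to 400 ms, so the logged duration is 300 ms.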

94. Device according to claim 92 or 93, configured to perform the selection such that the first portion (64) of the temporally varying spatial scene tracks the view section (28).

95. Device according to any of claims 92 to 94, configured to send-out log messages logging the momentaneous measurement measuring a quality of the temporally varying spatial scene as far as encoded into the selected media segments and as far as visible in a view section as a measure measuring a mean density of pixels falling into the view section (28) at which the temporally varying spatial scene is encoded into the selected media segments.

96. Device according to claim 95, configured so that the measure measures the mean density of pixels by averaging the pixel density in a spatially uniform manner with respect to a pixel grid of pictures coded into the selected media segments.

97. Device according to claim 95, configured so that the measure measures the mean density of pixels by averaging the pixel density in a spatially non-uniform manner with respect to a pixel grid of pictures coded into the selected media segments.

98. Device according to claim 95, configured so that the send-out log messages indicate whether the measure measures the mean density of pixels by

averaging the pixel density in a spatially uniform manner with respect to a pixel grid of pictures coded into the selected media segments, or

averaging the pixel density in a spatially non-uniform manner with respect to a pixel grid of pictures coded into the selected media segments.

99. Device according to claim 97 or 98, wherein the averaging of the pixel density in a spatially non-uniform manner corresponds to

averaging in a spherically uniform manner or

averaging spatially uniformly with respect to a viewport plane (310) which is perpendicular to a central view direction (312) of the view section (28).
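The difference between averaging uniformly over the pixel grid (claim 96) and averaging in a spherically uniform manner (claims 97 and 99) can be sketched as follows. The sketch assumes equirectangular pictures, where a sample's share of the sphere surface is proportional to the cosine of its latitude; the projection, the function name and the pixels-per-degree unit are assumptions made for illustration only.

```python
import math


def mean_pixel_density(density, row_lat_deg, spherical=False):
    """Average per-sample pixel density over the samples in the view section.

    density     -- 2D list, density[r][c] in pixels per degree (assumed unit)
    row_lat_deg -- latitude of each row's centre (equirectangular assumption)
    spherical   -- False: every grid sample gets the same weight;
                   True: weight each sample by cos(latitude), i.e. by the
                   solid angle it covers on the sphere
    """
    num = 0.0
    den = 0.0
    for row, lat in zip(density, row_lat_deg):
        w = math.cos(math.radians(lat)) if spherical else 1.0
        for d in row:
            num += w * d
            den += w
    return num / den


density = [[10.0, 10.0], [20.0, 20.0]]  # one row near the pole, one at the equator
lats = [60.0, 0.0]
grid_mean = mean_pixel_density(density, lats)
sphere_mean = mean_pixel_density(density, lats, spherical=True)
```

With these inputs the grid-uniform mean is 15, while the spherically uniform mean is pulled towards the equatorial row (about 16.67), because the row at 60° latitude covers only half as much of the sphere per sample.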

100. Device according to any of claims 95 to 99, configured so that the measure measures the mean density of pixels by averaging the pixel density in a manner restricting the averaging to a central subsection of the view section (28), or applying a higher averaging weight to the central subsection (202) compared to an edge portion (204) of the view section surrounding the central subsection.
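Both alternatives of claim 100 can be expressed with a single weight for the edge portion, as in the following sketch. The function name, the row/column index sets describing the central subsection, and the default weight are hypothetical.

```python
def center_weighted_mean(density, center_rows, center_cols, edge_weight=0.25):
    """Mean density with the central subsection weighted higher than the edge.

    Samples whose row and column both fall into the central subsection get
    weight 1.0; all other samples get edge_weight. Setting edge_weight=0.0
    restricts the averaging to the central subsection.
    """
    num = 0.0
    den = 0.0
    for r, row in enumerate(density):
        for c, d in enumerate(row):
            central = r in center_rows and c in center_cols
            w = 1.0 if central else edge_weight
            num += w * d
            den += w
    return num / den


grid = [
    [10, 10, 10],
    [10, 30, 10],
    [10, 10, 10],
]
restricted = center_weighted_mean(grid, {1}, {1}, edge_weight=0.0)
weighted = center_weighted_mean(grid, {1}, {1}, edge_weight=0.25)
```

Here `restricted` considers only the centre sample, while `weighted` still lets the eight edge samples contribute, each with a quarter of the centre's weight.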

101. Device according to any of claims 95 to 100, configured so that the measure measures the mean density of pixels in a manner separately along a horizontal view section axis (204) and a vertical view section axis (206), respectively.

102. Device according to any of claims 95 to 101, configured to send-out log messages intermittently.

103. Device according to any of claims 95 to 102, configured to send-out log messages at a rate controlled by a manifest file based on which the device performs the selection of the media segments for download.

104. Device according to any of claims 95 to 103, wherein the plurality of media segments available on the server each belong to one of a plurality of representations of the temporally varying spatial scene, the representations differing in one or more of

scene section (50) of the temporally varying spatial scene being encoded thereinto,

quality at which the temporally varying spatial scene is encoded thereinto,

spatial quality variation at which the temporally varying spatial scene is encoded thereinto,

wherein the device is configured to send-out log messages logging the description of the distribution rule applied in distributing the selected media segments onto the set of buffers in form of an association of each buffer to one of, or a combination of two or more of,

the scene section,

the quality,

the spatial quality distribution,

representation.

105. Device according to any of claims 95 to 104, wherein the representations are grouped into adaptation sets according to one or more of

the scene section of the temporally varying spatial scene being encoded thereinto,

the spatial quality variation at which the temporally varying spatial scene is encoded thereinto,

wherein the device is configured to send-out log messages logging the description of the distribution rule applied in distributing the selected media segments onto the set of buffers in form of an association of each buffer to one of the adaptation sets, or logging the description of the distribution rule applied in distributing the selected media segments onto the set of buffers in form of an association of each buffer to one of the representations.

106. Device according to any of claims 95 to 105, wherein the device is configured to send-out log messages logging the measurement of the amount of the selected media segments not having been output from a buffer of the device for being subject to decoding (42) in form of a temporal measurement.

107. Device according to claim 106, wherein the device is configured so that the measurement of the amount of the selected media segments not having been output from a buffer of the device for being subject to decoding (42) is logged in form of a temporal measurement in temporal units smaller than and/or being defined independent from a temporal length of the media segments (58) and/or in milliseconds.
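A buffer level expressed in milliseconds rather than in whole segments, as in claims 106 and 107, might be computed as in the following sketch. It assumes each buffered segment is known by its start time and duration and that the position up to which media has been handed to the decoder is tracked in milliseconds; all names are hypothetical.

```python
def buffered_ms(segments, decode_pos_ms):
    """Media time (ms) that is buffered but not yet output for decoding.

    segments      -- list of (start_ms, duration_ms) for buffered segments
    decode_pos_ms -- media time up to which data was already passed on
    """
    total = 0
    for start, dur in segments:
        end = start + dur
        if end > decode_pos_ms:
            # Count only the part of the segment that is still pending.
            total += end - max(start, decode_pos_ms)
    return total


segments = [(0, 2000), (2000, 2000), (4000, 2000)]
level = buffered_ms(segments, decode_pos_ms=2500)
print(level)  # 3500
```

The result, 3500 ms, is not a multiple of the 2000 ms segment length, which shows why a unit finer than the segment duration is useful for such a report.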

108. Device according to any of claims 95 to 107, wherein the device is configured to send-out log messages logging the measurement of the amount of the selected media segments not having been output from a buffer of the device for being subject to decoding (42) in a format classified by one or more of

a buffer of the decoder within which the respective media segment has been buffered,

a scene section encoded into the respective media segment,

a quality at which the temporally varying spatial scene is encoded into the respective media segment,

a spatial quality distribution at which the temporally varying spatial scene is encoded into the respective media segment.
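One way to read the classified report of claim 108 is as a grouping of the buffered amount by selected attributes of the buffered segments. The sketch below assumes each buffered segment is described by a dictionary; the keys `buffer_id`, `scene_section`, `quality` and `buffered_ms` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def buffer_level_report(entries, by=("buffer_id", "quality")):
    """Sum the buffered time per combination of the attributes named in `by`."""
    report = defaultdict(int)
    for e in entries:
        key = tuple(e[k] for k in by)
        report[key] += e["buffered_ms"]
    return dict(report)


entries = [
    {"buffer_id": 0, "scene_section": "front", "quality": "high", "buffered_ms": 1500},
    {"buffer_id": 0, "scene_section": "front", "quality": "high", "buffered_ms": 2000},
    {"buffer_id": 1, "scene_section": "full", "quality": "low", "buffered_ms": 4000},
]
report = buffer_level_report(entries)
```

Passing a different `by` tuple, for example `("scene_section",)`, yields the same totals classified by scene section instead.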

109. Method for streaming media content pertaining to a temporally varying spatial scene (30), comprising

selecting (56) media segments out of a plurality (46) of media segments (58) available on a server (20),

retrieving (60) the selected media segments from the server (20), wherein the selection is performed so that the selected media segments have at least a spatial section (62) of the temporally varying spatial scene (30) encoded thereinto in manner according to which a first portion (64) of the spatial section is encoded into the selected media segments at a

the method further comprises deriving the predetermined relationship from information (68) contained in the selected media segments and/or a signalization obtained from the server (20).

110. Method for streaming media content pertaining to a temporally varying spatial scene, comprising

rendering available a plurality of media segments for retrieval by a device, thereby enabling the device to select media segments for retrieval which have at least a spatial section of the temporally varying spatial scene encoded thereinto in manner according to which a first portion of the spatial section is encoded into the selected media segments at a predetermined quality, and according to which a second portion of the temporally varying spatial scene, which spatially neighbors the first portion, is encoded into the selected media segments at a further quality, and signaling information on a predetermined relationship in the media segments and/or by way of a signalization to the device, the predetermined relationship being to be fulfilled by the further quality with respect to the predetermined quality.

111. Method for streaming media content pertaining to a temporally varying spatial scene (30), comprising

selecting media segments out of a plurality (46) of media segments (58) available on a server (20), retrieving the selected media segments from the server (20), wherein the selection is performed so that the selected media segments have at least a spatial section (62) of the temporally varying spatial scene (30) encoded thereinto in a manner

so that the first portion (64) follows a temporally varying view section (28) of the temporally varying spatial scene (30), and

the method further comprising setting a size of the first portion (64) depending on information (74) contained in the selected media segments and/or a signalization obtained from the server.

112. Method for streaming media content pertaining to a temporally varying spatial scene, comprising

rendering available a plurality of media segments for retrieval by a device, thereby enabling the device to select media segments for retrieval which have at least a spatial section of the temporally varying spatial scene encoded thereinto in manner according to which a first portion of the spatial section is encoded into the selected media segments at a predetermined quality, and according to which a second portion of the temporally varying spatial scene, which spatially neighbors the first portion, is not encoded into the selected media segments or encoded into the selected media segments at a further quality reduced relative to the predetermined quality, and so that the first portion follows a temporally varying view section of the temporally varying spatial scene, and

signaling information on how to set a size of the first portion, in the media segments and/or by way of a signalization to the device.

113. Method for decoding from a video bitstream a video, comprising

deriving from the video bitstream a signalization of a size of a focus area within the video, and focussing a decoding power for decoding the video onto the focus area.

114. Method, performed by a device, for streaming media content pertaining to a temporally varying spatial scene (30), comprising

deriving (90) from a media presentation description

at least one version at which the temporally varying spatial scene is offered for tile-based streaming,

for each of the at least one version an indication of benefiting requirements for benefiting from the tile-based streaming of the respective version of the temporally varying spatial scene,

matching (92) the benefiting requirements of the at least one version with a device capability of the device or another device interacting with the device with respect to the tile-based streaming.
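The matching step of claim 114 can be sketched as a comparison between per-version requirements and device capabilities. The claim does not say which requirements are signaled, so the keys used below (`decoders`, `max_width`) and the "capability must be at least the requirement" rule are purely hypothetical.

```python
def matching_versions(versions, capability):
    """Return the ids of versions whose every requirement the device meets.

    versions   -- list of {"id": str, "requires": {name: minimum_value}}
    capability -- {name: value} describing the device (or a device
                  interacting with it); missing entries count as 0
    """
    return [
        v["id"]
        for v in versions
        if all(capability.get(name, 0) >= need for name, need in v["requires"].items())
    ]


versions = [
    {"id": "v1", "requires": {"decoders": 1, "max_width": 3840}},
    {"id": "v2", "requires": {"decoders": 4, "max_width": 1920}},
]
capability = {"decoders": 2, "max_width": 4096}
usable = matching_versions(versions, capability)
print(usable)  # ['v1']
```

Version `v2` is rejected in this example because it would need four decoders while the device offers two.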

115. Method for streaming media content pertaining to a temporally varying spatial scene, comprising

selecting media segments out of a plurality of media segments available on a server,

retrieving the selected media segments from the server, wherein the selection is performed so that the selected media segments have a first portion of the temporally varying spatial scene encoded thereinto at a quality increased compared to a spatial neighborhood of the first portion or in a manner so that the spatial neighborhood of the first portion is not encoded into the selected media segments,

the method further comprising sending-out log messages logging

a momentaneous measurement measuring a spatial position and/or movement of the first portion; and/or

a statistical value, such as a temporal average, measuring a spatial position and/or movement of the first portion; and/or

a momentaneous measurement measuring a quality of the temporally varying spatial scene as far as encoded into the selected media segments and as far as visible in a view section; and/or

an indication of a set of buffers of the device involved in buffering the selected media segments, a description of a distribution rule applied in distributing the selected media segments onto the set of buffers, and a momentaneous buffer fullness of each of the set of buffers; and/or

a measurement of an amount of the selected media segments not having been output from a buffer of the device for being subject to decoding (42); and/or

a field of view covered by the view section; and/or

a momentaneous measurement measuring a user position or view depth relative to a scene center (100); and/or

a statistical value, such as a temporal average, measuring a user position or view depth relative to a scene center (100).

116. Method for streaming media content pertaining to a temporally varying spatial scene (30), comprising

providing (90) a media presentation description from which

at least one version at which the temporally varying spatial scene is offered for tile-based streaming,

for each of the at least one version an indication of benefiting requirements for benefiting from the tile-based streaming of the respective version of the temporally varying spatial scene,

is derivable, thereby enabling a device streaming the media content from a streaming server to

match (92) the benefiting requirements of the at least one version with a device capability of the device or another device interacting with the device with respect to the tile-based streaming.

117. Method for allowing a device to stream media content pertaining to a temporally varying spatial scene (30) from the server, comprising providing a media presentation description according to claim 90.

118. A computer program having a program code for executing the method according to any of claims 109 to 117, when the program is executed on a computer.

Documents

Application Documents

#	Name	Date
1	201937014752.pdf	2019-04-12
2	201937014752-STATEMENT OF UNDERTAKING (FORM 3) [12-04-2019(online)].pdf	2019-04-12
3	201937014752-FORM 1 [12-04-2019(online)].pdf	2019-04-12
4	201937014752-FIGURE OF ABSTRACT [12-04-2019(online)].pdf	2019-04-12
5	201937014752-DRAWINGS [12-04-2019(online)].pdf	2019-04-12
6	201937014752-DECLARATION OF INVENTORSHIP (FORM 5) [12-04-2019(online)].pdf	2019-04-12
7	201937014752-COMPLETE SPECIFICATION [12-04-2019(online)].pdf	2019-04-12
8	201937014752-FORM 18 [27-04-2019(online)].pdf	2019-04-27
9	201937014752-RELEVANT DOCUMENTS [17-06-2019(online)].pdf	2019-06-17
10	201937014752-MARKED COPIES OF AMENDEMENTS [17-06-2019(online)].pdf	2019-06-17
11	201937014752-FORM 13 [17-06-2019(online)].pdf	2019-06-17
12	201937014752-FORM 13 [17-06-2019(online)]-1.pdf	2019-06-17
13	201937014752-AMMENDED DOCUMENTS [17-06-2019(online)].pdf	2019-06-17
14	201937014752-FORM-26 [01-07-2019(online)].pdf	2019-07-01
15	201937014752-Proof of Right (MANDATORY) [16-08-2019(online)].pdf	2019-08-16
16	201937014752-Information under section 8(2) (MANDATORY) [11-09-2019(online)].pdf	2019-09-11
17	201937014752-Information under section 8(2) [14-07-2020(online)].pdf	2020-07-14
18	201937014752-Information under section 8(2) [21-07-2020(online)].pdf	2020-07-21
19	201937014752-Information under section 8(2) [15-09-2020(online)].pdf	2020-09-15
20	201937014752-Information under section 8(2) [14-10-2020(online)].pdf	2020-10-14
21	201937014752-FORM 3 [23-03-2021(online)].pdf	2021-03-23
22	201937014752-Information under section 8(2) [10-04-2021(online)].pdf	2021-04-10
23	201937014752-Information under section 8(2) [17-06-2021(online)].pdf	2021-06-17
24	201937014752-FORM 4(ii) [29-07-2021(online)].pdf	2021-07-29
25	201937014752-Information under section 8(2) [18-08-2021(online)].pdf	2021-08-18
26	201937014752-Information under section 8(2) [23-09-2021(online)].pdf	2021-09-23
27	201937014752-FER.pdf	2021-10-18
28	201937014752-OTHERS [02-11-2021(online)].pdf	2021-11-02
29	201937014752-FER_SER_REPLY [02-11-2021(online)].pdf	2021-11-02
30	201937014752-DRAWING [02-11-2021(online)].pdf	2021-11-02
31	201937014752-CLAIMS [02-11-2021(online)].pdf	2021-11-02
32	201937014752-Annexure [02-11-2021(online)].pdf	2021-11-02
33	201937014752-Information under section 8(2) [24-11-2021(online)].pdf	2021-11-24
34	201937014752-Information under section 8(2) [24-12-2021(online)].pdf	2021-12-24
35	201937014752-Information under section 8(2) [10-03-2022(online)].pdf	2022-03-10
36	201937014752-Information under section 8(2) [11-03-2022(online)].pdf	2022-03-11
37	201937014752-Information under section 8(2) [04-04-2022(online)].pdf	2022-04-04
38	201937014752-Information under section 8(2) [06-05-2022(online)].pdf	2022-05-06
39	201937014752-Information under section 8(2) [15-06-2022(online)].pdf	2022-06-15
40	201937014752-Information under section 8(2) [29-06-2022(online)].pdf	2022-06-29
41	201937014752-Information under section 8(2) [04-07-2022(online)].pdf	2022-07-04
42	201937014752-Information under section 8(2) [25-08-2022(online)].pdf	2022-08-25
43	201937014752-FORM 3 [05-09-2022(online)].pdf	2022-09-05
44	201937014752-Information under section 8(2) [20-09-2022(online)].pdf	2022-09-20
45	201937014752-Information under section 8(2) [13-01-2023(online)].pdf	2023-01-13
46	201937014752-Information under section 8(2) [27-01-2023(online)].pdf	2023-01-27
47	201937014752-Information under section 8(2) [15-02-2023(online)].pdf	2023-02-15
48	201937014752-Information under section 8(2) [09-03-2023(online)].pdf	2023-03-09
49	201937014752-Information under section 8(2) [02-06-2023(online)].pdf	2023-06-02
50	201937014752-Information under section 8(2) [11-07-2023(online)].pdf	2023-07-11
51	201937014752-Information under section 8(2) [10-08-2023(online)].pdf	2023-08-10
52	201937014752-Information under section 8(2) [19-09-2023(online)].pdf	2023-09-19
53	201937014752-FORM 3 [19-09-2023(online)].pdf	2023-09-19
54	201937014752-Information under section 8(2) [15-01-2024(online)].pdf	2024-01-15
55	201937014752-PatentCertificate07-03-2024.pdf	2024-03-07
56	201937014752-IntimationOfGrant07-03-2024.pdf	2024-03-07

Search Strategy

1	search4752E_01-02-2021.pdf