
Method And Device For Image Processing, And Suitable Method And Device For Decoding A Multi View Video

Abstract: The invention relates to a method and a device for decoding a data flow representing a multi-view video. Syntax elements are obtained (E20) from at least a portion of the flow data and used to reconstruct (E21) at least one image of a view of the video. Thereafter, at least one metadata item in a predetermined form is obtained (E23) from at least one obtained syntax element, and supplied (E24) to an image processing module. The invention also relates to an image processing method and device for reading out the at least one metadata item in the predetermined form and for using same to generate at least one image of a virtual view from a reconstructed view of the multi-view video.


Patent Information

Application #
Filing Date
20 October 2020
Publication Number
06/2021
Publication Type
INA
Invention Field
ELECTRONICS
Status
Email
archana@anandandanand.com
Parent Application
Patent Number
Legal Status
Grant Date
2024-07-22
Renewal Date

Applicants

ORANGE
78 rue Olivier de Serres 75015 Paris

Inventors

1. JUNG, Joël
Orange Gardens - TGI/OLR/IPL/Patents - 44 Avenue de la République CS 50010 92326 Châtillon Cedex
2. NIKITIN, Pavel
Orange Gardens - TGI/OLR/IPL/Patents - 44 Avenue de la République CS 50010 92326 Châtillon Cedex
3. BOISSONADE, Patrick
Orange Gardens - TGI/OLR/IPL/Patents - 44 Avenue de la République CS 50010 92326 Châtillon Cedex

Specification

Method and device for decoding a multi-view video, and method and device for processing images

1. Field of the invention

The present invention relates generally to the field of 3D image processing, and more precisely to the decoding of multi-view image sequences and to the synthesis of images of intermediate views.

2. Prior Art

In the field of virtual reality, free navigation allows the spectator to watch a scene from any point of view, whether that point of view corresponds to a point of view captured by a camera or to a point of view that was not captured by a camera. Such a view which has not been captured by a camera is also called a virtual view or an intermediate view, because it is situated between views captured by the camera and must be synthesized for rendering.

Free navigation in a scene requires correctly managing each movement of the user viewing the multi-view video, and avoiding the feeling of discomfort which may appear in the viewer when the rendering of the images is not optimal.

In general, the user's movement is correctly taken into account by the rendering devices, for example an HMD (Head-Mounted Display) virtual reality headset.

However, providing the correct pixels for display regardless of user movement (rotation or translation) remains problematic. Indeed, the calculation of the images to be displayed requires the use of several captured views in order to be able to display additional virtual (ie synthesized) view images. Such virtual views correspond to points of view which have not been captured by a camera. It is then necessary to calculate them from the decoded captured views and the associated depths.

Consequently, a codec offering free navigation functionality must code several views and associated depths efficiently and allow optimal rendering of virtual views, ie views requiring the use of a synthesis algorithm to be displayed.

There are known multi-view video coders designed to code multi-view sequences, such as the MV-HEVC or 3D-HEVC standard (Series H: Audiovisual and multimedia systems - Infrastructure of audiovisual services - Coding of moving video, High Efficiency Video Coding, Recommendation ITU-T H.265, International Telecommunication Union, December 2016).

The MV-HEVC encoder applies very basic inter-view prediction, while the 3D-HEVC encoder includes several additional tools to take advantage not only of temporal redundancies, but also of inter-view redundancies. In addition, 3D-HEVC has specific tools for efficient coding of depth maps. These two codecs, and in particular 3D-HEVC, effectively reduce the bit rate when it comes to coding several views with associated depths, compared to a conventional video codec processing 2D video sequences, such as the HEVC standard.

In a virtual reality context, after the decoding of the views which have been captured by cameras and encoded in a data stream, virtual views can be synthesized as a function for example of the movements of the user.

For example, the VSRS tool (Wegner, Stankiewicz, Tanimoto, Domanski, Enhanced view synthesis reference software (VSRS) for free-viewpoint television, ISO/IEC JTC1/SC29/WG11 m31520, October 2013, Geneva, Switzerland) is known to synthesize such virtual views.

FIG. 1 illustrates a conventional free navigation system, in which a decoder DEC (for example 3D-HEVC) decodes a data stream STR to produce decoded views (VD1, VD2). Such views are then used by a view synthesizer SYNTH (eg VSRS) to produce synthesized views VS(1+2). Decoded views and synthesized views are then displayed by a display device DISP, according to the user's movement.

A conventional decoder DEC is illustrated in FIG. 2. Conventionally, such a decoder performs the analysis (E20) of the data stream STR to obtain the relevant data to be decoded, then applies the decoding process (E21) to reconstruct the decoded views (VD1, VD2), which can then be used by the synthesis module SYNTH to generate virtual views.

It thus appears that the process of decoding views from a data stream and the process of synthesizing virtual views are not correlated. In particular, the synthesis process is a difficult task in which the decoder does not participate. The decoder only makes available to the synthesis module decoded views reconstructed from the data stream.

A technical problem encountered by virtual reality applications is that the encoder and the decoder have no a priori knowledge of the final point of view required by the user, in particular in the case of free navigation. The multi-view video encoder and decoder even have no knowledge of the synthesis process that will ultimately be used to synthesize virtual views. Indeed, the synthesis method used to synthesize virtual views is not currently standardized, unlike the multi-view video decoder, so that a synthesis method used by virtual reality applications remains a proprietary tool.

Consequently, the quality of the synthesized virtual views depends on the synthesis tools and the synthesis algorithm used by such applications. In general, such a quality depends on the complexity of the synthesis tools used and the resources of the devices implementing these synthesis tools.

Virtual reality applications, and more particularly those implementing free navigation, must operate in real time. The virtual view synthesis modules generally provide virtual views of only average quality, in particular when the number of captured and decoded views is insufficient, and even when the captured, decoded and reconstructed views are of high visual quality.

3. Disclosure of the invention

The invention improves the state of the art.

It relates to a method of decoding a data stream representative of a multi-view video, implemented by a decoding device, comprising obtaining syntax elements from at least part of the data of the stream, and the reconstruction of at least one image of a view of the video from the syntax elements obtained. Advantageously, the decoding method further comprises obtaining at least one metadata in a predetermined form from at least one syntax element, and providing said at least one metadata to an image processing module.

Such a decoding method thus makes it possible to provide an image processing module, for example a synthesis module external to the decoder, with metadata representative of data from the video stream and which can be used by the image processing module. The method implemented within the image processing module is thus less complex. For example, in the case of a virtual view synthesis module, it is not necessary to recalculate some of the data which is used by the synthesis algorithm and which is available from the decoder. In addition, the invention also allows the image processing module to have access to data that it would not be able to calculate by itself and to use them in order to improve its operation. For example, in the case of

The method implemented within the image processing module can thus be improved: since the operational complexity for obtaining data which are available at the decoder is reduced, more complex, and therefore more efficient, image processing algorithms can then be more easily implemented within the image processing module.

In the case of a virtual view synthesis module, the quality of the virtual views is thus improved. This also improves a user's free navigation in multi-view video, providing smoother transitions between views. In addition, the improvement of the synthesis of virtual views also makes it possible to reduce the number of cameras necessary to capture the scene.

Providing the metadata in a predetermined format facilitates the communication between the decoder and the image processing module. For example, the metadata are provided in the form of a standardized indexed table. The image processing module thus knows, for each index of the table, which metadata is stored at that index.

It is known to use metadata for video data communications. For example, SEI messages (Supplemental Enhancement Information), introduced with the H.264/AVC standard, are data relating to optional processing implemented at the decoder level. SEI messages are sent to the decoder via the video data bit stream. However, such SEI message data is created at the encoder level and is used only by the decoder, and only optionally, to improve the quality of the decoded and reconstructed views.

According to a particular embodiment of the invention, obtaining at least one metadata further comprises the calculation of said at least one metadata from at least part of the syntax elements.

Such a particular embodiment of the invention makes it possible to calculate new metadata, corresponding for example to information which is not used by the decoder to reconstruct a view, for example a confidence value calculated for depth information, or to information which is used by the decoder in another form, for example motion information with a coarser granularity than that used during the reconstruction of an image.

According to another particular embodiment of the invention, said at least one metadata is not used for the reconstruction of the at least one image.

According to another particular embodiment of the invention, said at least one metadata corresponds to information included in the group comprising:

- camera settings,

- decoded and scaled motion vectors,

- partitioning of the reconstructed image,

- a reference image used by a block of an image of the reconstructed view,

- coding modes of an image of the reconstructed view,

- values of quantization parameters of an image of the reconstructed view,

- values of prediction residuals of an image of the reconstructed view,

- a map representative of the movement in an image of the reconstructed view,

- a map representative of the presence of occlusions in an image of the reconstructed view,

- a map representative of confidence values associated with a depth map.

According to another particular embodiment of the invention, the predetermined form corresponds to an indexed table in which at least one metadata is stored in association with an index.

According to another particular embodiment of the invention, said at least one metadata is obtained as a function of a level of granularity specified at the decoding device.

According to this particular embodiment of the invention, the metadata generated from the syntax elements can be obtained according to different levels of granularity. For example, for motion information, the motion vectors can be provided with the granularity used at the decoder (ie as used by the decoder), or else with a coarser granularity (eg by providing one motion vector per block of size 64x64).

According to another particular embodiment of the invention, the decoding method further comprises the reception by the decoding device of a request from the image processing module indicating at least one metadata required by the image processing module. According to this particular embodiment of the invention, the image processing module indicates to the decoder the information it needs. The decoder can thus provide the image processing module with only the required metadata, which makes it possible to limit the complexity at the decoder and the use of memory resources.

According to another particular embodiment of the invention, the request comprises at least one index indicating the required metadata from among a predetermined list of available metadata.

The invention also relates to a decoding device configured to implement the decoding method according to any one of the particular embodiments defined above. This decoding device could of course include the various characteristics relating to the decoding method according to the invention. Thus, the characteristics and advantages of this decoding device are the same as those of the decoding method, and are not further detailed.

According to a particular embodiment of the invention, such a decoding device is included in a terminal, or a server.

The invention also relates to an image processing method comprising the generation of at least one image of a virtual view, from at least one image of a view decoded by a decoding device. According to the invention, such an image processing method also comprises reading at least one metadata in a predetermined form, said at least one metadata being obtained by the decoding device from at least one syntax element obtained from a data stream representative of a multi-view video, said at least one image being generated using said at least one read metadata.

Thus, the image processing method advantageously takes advantage of metadata available at the decoder to generate images of a virtual view of the multi-view video. Such metadata may correspond to data to which the image processing device does not have access, or else to data that it is capable of recalculating, but at the cost of great operational complexity.

By virtual view is meant here a view from a new point of view of the scene for which no sequence of images has been captured by a camera of the scene acquisition system.

According to a particular embodiment of the invention, the image processing method further comprises sending to the decoding device a request indicating at least one metadata required to generate the image.

The invention also relates to an image processing device configured to implement the image processing method according to any one of the particular embodiments defined above. This image processing device could of course include the various characteristics relating to the image processing method according to the invention. Thus, the characteristics and advantages of this image processing device are the same as those of the image processing method, and are not further detailed.

According to a particular embodiment of the invention, such an image processing device is included in a terminal, or a server.

The invention also relates to an image processing system for displaying a multi-view video from a data stream representative of the multi-view video, comprising a decoding device according to any one of the embodiments described above, and an image processing device according to any one of the embodiments described above.

The decoding method, respectively the image processing method, according to the invention can be implemented in various ways, in particular in wired form or in software form. According to a particular embodiment of the invention, the decoding method, respectively the image processing method, is implemented by a computer program. The invention also relates to a computer program comprising instructions for implementing the decoding method or the image processing method according to any one of the particular embodiments described above, when said program is executed by a processor. Such a program can use any programming language.

This program can use any programming language, and be in the form of source code, object code, or intermediate code between source code and object code, such as in a partially compiled form, or in any other desirable form.

The invention also relates to a recording medium or information medium readable by a computer, and comprising instructions of a computer program as mentioned above. The recording media mentioned above can be any entity or device capable of storing the program. For example, the medium may comprise a storage means, such as a ROM, for example a CD ROM or a microelectronic circuit ROM, a USB key, or else a magnetic recording means, for example a hard disk. On the other hand, the recording media can correspond to a transmissible medium such as an electrical or optical signal, which can be conveyed via an electrical or optical cable, by radio or by other means. The program according to

Alternatively, the recording media can correspond to an integrated circuit in which the program is incorporated, the circuit being adapted to execute or to be used in the execution of the method in question.

4. List of figures

Other characteristics and advantages of the invention will emerge more clearly on reading the following description of particular embodiments, given by way of simple illustrative and non-limiting examples, and the appended drawings, among which:

- Figure 1 schematically illustrates a system for free navigation within a multi-view video according to the prior art,

- Figure 2 schematically illustrates a decoder of a data stream representative of a multi-view video according to the prior art,

- Figure 3 schematically illustrates a system for free navigation within a multi-view video according to a particular embodiment of the invention,

- Figure 4 illustrates steps of the method of decoding a data stream representative of a multi-view video according to a particular embodiment of the invention,

- Figure 5 schematically illustrates a decoder of a data stream representative of a multi-view video according to a particular embodiment of the invention,

- Figure 6 illustrates steps of the image processing method according to a particular embodiment of the invention,

- Figure 7 illustrates steps of the decoding method and of the image processing method according to another particular embodiment of the invention,

- Figure 8 schematically illustrates a device suitable for implementing the decoding method according to a particular embodiment of the invention,

- Figure 9 schematically illustrates a device suitable for implementing the image processing method according to a particular embodiment of the invention,

- Figure 10 illustrates an arrangement of views of a multi-view capture system.

5. Description of particular embodiments of the invention

The invention is based on the modification of the decoding process of a data stream representative of a multi-view video, so that an image processing process based on views reconstructed by the decoding process is facilitated. For example, the image processing process corresponds to a process for synthesizing virtual views. For this, the decoder not only provides images of views reconstructed from the data stream, but also metadata associated with such images, which can then be used for the synthesis of virtual views. Advantageously, such metadata are formatted, ie put in a predetermined form, to facilitate interoperability between the decoder and the synthesizer.

FIG. 3 schematically illustrates a system for free navigation within a multi-view video according to a particular embodiment of the invention. The system of FIG. 3 operates in a similar manner to the system described in relation to FIG. 1, with the difference that the decoder DEC outputs metadata MD1, MD2 in addition to the images of reconstructed views VD1 and VD2. Such metadata MD1, MD2 are supplied at the input of the synthesizer, which then generates a virtual view VS(1+2), for example from the reconstructed views VD1, VD2. The decoder DEC and the synthesizer SYNTH then form an image processing system according to the invention. They can be included in the same device or in two separate devices able to communicate with each other.

For example, in a nonlimiting and non-exhaustive manner, such metadata may correspond to:

- camera parameters of the view reconstructed by the decoder,

- motion vectors decoded and scaled to the image reconstructed by the decoder,

- partitioning of the reconstructed image,

- an indication of reference images used by blocks of the reconstructed image,

- coding modes of the reconstructed image,

- values of quantization parameters of the reconstructed image,

- values of prediction residuals of the reconstructed image.

Such information can be provided as used by the decoder. Or, such information can be processed by the decoder, for example to be provided with a finer or coarser granularity than that used by the decoder.

Metadata can also be calculated and shared by the decoder, for example:

- a map representing the general movement in an image of the reconstructed view, or in a group of images; for example such a map may be a binary map obtained by thresholding the motion vectors of the image or of the group of images,

- a map representative of the presence of occlusions in an image of the reconstructed view; for example, such a map can be a binary map obtained by considering the level of information contained in the prediction residues of each pixel in the case of an inter-view prediction, or else the information of a possible location of occlusions can be derived from image disparity vectors or from a contour map,

- a map representing confidence values associated with a depth map; for example such a map can be calculated by the decoder by comparing the coding modes of the texture and of the corresponding depth.

Some metadata provided as output can be data relating to a single view: they are then intrinsic to the view. Other metadata can be obtained from two or more views. In this case, the metadata is representative of a difference or a correlation between the views (camera parameters, occlusion map, difference in decoding modes, etc.).

FIG. 4 illustrates steps of the method of decoding a data stream representative of a multi-view video according to a particular embodiment of the invention.

A data stream STR is supplied at the input of the decoder DEC, for example in the form of a bit stream. The STR data stream comprises, for example, multi-view video data encoded by a state-of-the-art video encoder suitable for encoding multi-view video using redundancies between views, or else by a mono-view video encoder applied to each view of the multi-view video individually.

During a step E20, the decoder DEC decodes at least part of the data of the stream, delivering decoded syntax elements. Such a decoding E20 corresponds, for example, to a traversal of the data stream and to an entropy decoding of the bit stream in order to extract the syntax elements necessary for the reconstruction of a current image of a view to be reconstructed, for example a view viewed by a user. Such syntax elements correspond for example to the coding modes of the blocks of the current image, to the motion vectors in the case of an inter-image or inter-view prediction, to the quantized coefficients of the prediction residues, etc.

Conventionally, during step E21, the current image of a view (VD1, VD2) to be reconstructed is reconstructed from the decoded syntax elements and optionally from the images of the view or from other previously reconstructed views. Such a reconstruction of the current image is implemented according to the coding modes and prediction techniques used at the level of the coder to code the current image.

The reconstructed view images are supplied at the input of a SYNTH image processing module.

During a step E23, at least one metadata is obtained from at least one decoded syntax element. Such metadata is formatted in a predetermined form. Such a predetermined form corresponds for example to a particular syntax according to which the data is arranged to be transmitted or stored in memory. When the multi-view video decoder is a decoder conforming to a particular standard, the syntax of the metadata may for example be described in this particular standard or in a standard associated with the particular decoding standard.

According to a particular embodiment of the invention, the predetermined form corresponds to an indexed table in which at least one metadata is stored in association with an index. According to this particular embodiment, each type of metadata is associated with an index. An example of such a table is illustrated by Table 1 below.

Table 1: example of a metadata table

Each metadata is stored in association with its index, and in an appropriate format depending on the type of metadata.

For example, the camera parameters of a view are stored in the form of a triplet of data comprising respectively location information, corresponding for example to the coordinates of the point in the 3D frame corresponding to the location of the camera in the scene, orientation information, defined for example by the values of 3 angles in the 3D frame, and a depth of field.

According to another example, the motion vectors are stored in the form of a table comprising, for each block of the corresponding reconstructed image, the value of the corresponding motion vector.

The metadata table illustrated above is only a non-limiting example. The metadata can be stored in other predetermined forms. For example, when only one type of metadata is possible, it is not necessary to associate an index with the type of metadata.
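As a minimal illustration of this indexed-table form, the following Python sketch shows how such a table might look on the decoder side; index 0 (camera parameters) and index 9 (occlusion map) follow the example indexes used later in this text, while the other entries, field names and values are hypothetical and would in practice be fixed by the decoding standard.

```python
# Sketch of an indexed metadata table as described above.
# Indexes 0 and 9 follow the examples given in the text; everything
# else (field names, toy values) is purely illustrative.

camera_params = {
    "position": (1.20, 0.00, 3.50),     # coordinates of the camera in the 3D frame
    "orientation": (0.0, 15.0, 0.0),    # 3 angles in the 3D frame
    "depth_of_field": 2.8,
}

metadata_table = {
    0: ("camera_parameters", camera_params),
    1: ("motion_vectors", [[(0, 0)] * 4 for _ in range(4)]),   # one vector per block (toy 4x4 grid)
    9: ("occlusion_map", [[0] * 8 for _ in range(8)]),          # binary map (toy 8x8 grid)
}

def read_metadata(table, index):
    """Return the metadata stored at a given index, or None if absent."""
    entry = table.get(index)
    return entry[1] if entry is not None else None

print(read_metadata(metadata_table, 0))   # camera parameters
print(read_metadata(metadata_table, 9))   # occlusion map
```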

According to a particular embodiment of the invention, during a step E22, at least one metadata is calculated from at least part of the decoded syntax elements, before the obtaining step E23.

Such a particular embodiment of the invention thus makes it possible to obtain metadata which is not used to reconstruct the current image of the view to be reconstructed but which can be used to synthesize virtual views from the current image reconstructed, for example an occlusion map.

Such a particular embodiment of the invention also makes it possible to obtain metadata with a granularity different from that used to reconstruct the current image. For example, the motion vectors can be calculated in a coarser way, for example for blocks of size 64x64 pixels over the whole image, from the reconstructed motion vectors of all the sub-blocks of the current image contained in the 64x64 block. For example, for each 64x64 block, a motion vector is calculated by taking the minimum, or maximum, mean or median, or any other function, of the motion vectors of the sub-blocks.
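The aggregation just described can be sketched as follows (NumPy; names and block sizes chosen for illustration). The median is used here, but min, max, mean or any other function would do equally well, as noted above.

```python
import numpy as np

def coarsen_motion_vectors(mv, sub_block=8, target=64, reducer=np.median):
    """Aggregate per-sub-block motion vectors into one vector per target block.

    mv: array of shape (H_blocks, W_blocks, 2) holding the reconstructed
        motion vectors of sub_block x sub_block sub-blocks.
    Returns one vector per target x target block, obtained with the chosen
    reducer applied to the sub-block vectors contained in that block.
    """
    ratio = target // sub_block
    h, w, _ = mv.shape
    coarse = np.zeros((h // ratio, w // ratio, 2))
    for i in range(h // ratio):
        for j in range(w // ratio):
            patch = mv[i * ratio:(i + 1) * ratio, j * ratio:(j + 1) * ratio]
            coarse[i, j] = reducer(patch.reshape(-1, 2), axis=0)
    return coarse

# Toy example: a 128x128-pixel image covered by 8x8 sub-blocks (16x16 of them).
sub_mv = np.random.randint(-4, 5, size=(16, 16, 2))
print(coarsen_motion_vectors(sub_mv).shape)  # (2, 2, 2): one vector per 64x64 block
```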

During a step E24, the metadata MD1, MD2 obtained during step E23 are supplied to the image processing module SYNTH external to the decoder DEC, for example a virtual view synthesis module. By module external to the decoder is meant a module whose operation is not necessary for the decoding of the data stream, nor for the display of the views reconstructed by the decoder.

For example, the metadata are stored in a memory accessible to the image processing module. According to another example, the metadata is transmitted to the image processing module, via a connection link, for example a data transmission bus, when the decoder and the image processing module are integrated within a same device, or via cable or wireless connection, when the decoder and image processing module are integrated in separate devices.

FIG. 5 schematically illustrates a decoder of a data stream representative of a multi-view video according to a particular embodiment of the invention.

Conventionally, the decoding of a view to be reconstructed from an STR data stream is implemented as follows. The decoding of a view to be reconstructed is done image by image, and for each image, block by block. For each block to be reconstructed, the elements corresponding to the block are decoded by an entropy decoding module D from the data stream STR, providing a set of decoded syntax elements SE (texture coding mode, motion vectors, disparity vectors, depth coding mode, reference image index, ...) and quantized coefficients coeff. The quantized coefficients are transmitted to an inverse quantization module (Q⁻¹), then to an inverse transformation module (T⁻¹), to provide the prediction residual values res_rec of the block. The decoded syntax elements SE are transmitted to a prediction module (P) which computes a predictor block pred, also using a previously reconstructed image (a part of the current image, a reference image of the previously reconstructed view, or a reference image of another previously reconstructed view). The current block is then reconstructed (B_rec) by adding the prediction pred to the prediction residuals res_rec of the block. The reconstructed block (B_rec) is then stored in the memory MEM to be used later to reconstruct the current image, another image or another view.
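The per-block reconstruction path just described can be summarized by the following highly simplified sketch, in which a uniform inverse quantization and an identity inverse transform stand in for the actual codec operations, which are considerably more elaborate:

```python
import numpy as np

def inverse_quantize(coeff, qstep):
    """Uniform inverse quantization (placeholder for the codec's Q^-1)."""
    return coeff * qstep

def inverse_transform(dequant):
    """Placeholder for the codec's inverse transform T^-1 (identity here)."""
    return dequant

def reconstruct_block(coeff, qstep, pred):
    """B_rec = pred + res_rec, clipped to the 8-bit sample range."""
    res_rec = inverse_transform(inverse_quantize(coeff, qstep))
    return np.clip(pred + res_rec, 0, 255).astype(np.uint8)

# Toy 4x4 block: predictor from a previously reconstructed image plus residuals.
pred = np.full((4, 4), 120)
coeff = np.array([[1, 0, 0, 0]] * 4)
print(reconstruct_block(coeff, qstep=8, pred=pred))
```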

According to the invention, at the output of the entropy decoding module, the decoded syntax elements SE and possibly the quantized coefficients of the blocks are transmitted to a FORM module configured to select at least part of the decoded syntax elements SE and possibly the quantized coefficients, and to store them in the predetermined form to provide metadata MD relating to the reconstructed image, or to a group of images.

The selection of the decoded syntax elements SE to be formatted can be fixed, for example defined explicitly in the standard describing the operation of the decoder. Or, different types of selection can be defined in a fixed manner, for example via profiles of the decoder, and a parameterization of the decoder makes it possible to configure it so that the formatting module selects the corresponding syntax elements. According to yet another variant, the decoder is able to exchange with the image processing module to which it supplies the metadata. In this case, the image processing module explicitly indicates to the decoder the type of metadata that it wishes to receive and the FORM module of the decoder selects only the decoded syntax elements necessary.

When metadata can be provided according to a level of granularity different from that used by the decoder, such a level of granularity can be defined in a fixed manner in the standard describing the operation of the decoder, or else via profiles of the decoder. When the image processing module communicates with the decoder to obtain metadata, the image processing module can explicitly indicate to the decoder the level of granularity at which it wishes to receive some of the metadata.

According to a particular embodiment of the invention, the decoded syntax elements SE and possibly the quantized coefficients at the output of the entropy decoding module are transmitted to a CALC module configured to calculate metadata from the syntax elements SE and/or quantized coefficients. As previously, the metadata to be calculated can be explicitly defined in the standard describing the operation of the decoder, according to different profiles or not, or else determined from exchanges with the image processing module for which they are intended.

According to a particular embodiment of the invention, the FORM module selects, in particular, the camera parameters of the view to be reconstructed.

In order to synthesize a new point of view, a synthesis module must create a model describing how each pixel of an original (reconstructed) view is projected into the virtual view. Most synthesizers, for example based on the DIBR (Depth Image Based Rendering) technique, use depth information to project the pixels of the reconstructed view into 3D space. The corresponding points in 3D space are then projected into the camera plane of the new point of view.

Such a projection of the points of the image into 3D space can be modeled with the following equation: M = K·RT·M′, in which M is the matrix of coordinates of the points in 3D space, K is the matrix of intrinsic parameters of the virtual camera, RT is the matrix of extrinsic parameters of the virtual camera (camera position and orientation in 3D space) and M′ is the matrix of pixels of the current image.
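Assuming a standard pinhole-camera model, the back-projection of a pixel of the reconstructed view into 3D space and its reprojection into the virtual camera can be sketched as follows; the matrices and numerical values are purely illustrative and do not come from the text.

```python
import numpy as np

def backproject(u, v, depth, K_src, RT_src):
    """Lift a pixel of the reconstructed view into 3D space (pinhole model)."""
    # Ray in the source camera frame, scaled by the decoded depth.
    ray = np.linalg.inv(K_src) @ np.array([u, v, 1.0])
    point_cam = ray * depth
    # Move to the world frame: RT_src = [R | t], so X_world = R^T (X_cam - t).
    R, t = RT_src[:, :3], RT_src[:, 3]
    return R.T @ (point_cam - t)

def project(point_world, K_virt, RT_virt):
    """Project a 3D point into the image plane of the virtual camera."""
    R, t = RT_virt[:, :3], RT_virt[:, 3]
    p = K_virt @ (R @ point_world + t)
    return p[:2] / p[2]

# Illustrative parameters only: identical intrinsics, virtual camera shifted by 10 cm.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
RT_src = np.hstack([np.eye(3), np.zeros((3, 1))])
RT_virt = np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

X = backproject(640, 360, depth=2.0, K_src=K, RT_src=RT_src)
print(project(X, K, RT_virt))  # pixel position of the same point in the virtual view
```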

If the camera parameters are not transmitted to the synthesis module, the synthesis module must either calculate them, at the cost of great complexity and without being able to reach real-time operation with sufficient precision, or obtain them from external sensors. The supply of these parameters by the decoder thus makes it possible to limit the complexity of the synthesis module.

According to another particular embodiment of the invention, the FORM module selects, in particular, the elements of syntax relating to the reference images used to reconstruct the current image.

In the case where, to generate a virtual view, the synthesis module has the possibility of selecting reference images among images of different views available, and previously reconstructed, the synthesis module could take advantage of knowing which reference views have been used when coding a view used for synthesis. For example, Figure 10 illustrates a view arrangement of a multi-view capture system comprising 16 cameras. The arrows between each view indicate the decoding order of the views. If the synthesis module must generate a virtual view VV for a point of view placed between view V6 and view V10 (represented by a cross in figure 10), conventionally, the synthesis module must check the availability of each view, to best build a virtual view.

According to the particular embodiment described here, if the synthesis module has metadata indicating, for a view, the reference views used to reconstruct it, the synthesis module can choose only the available view closest to the virtual point of view (view V6 in the case of FIG. 10) in order to decide which images to use to generate the virtual view. For example, if blocks of the V6 view use an image of the V7 view as a reference image, the synthesis module may decide to also use the V7 view, which is necessarily available since it is used by the view V6. Such an embodiment thus makes it possible to reduce the complexity of the synthesis module, by avoiding having to check the availability of each view during the synthesis.
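A minimal sketch of this selection step, assuming the decoder exposes, for each decoded view, the list of reference views it used; the data structure and view numbers are illustrative, loosely following the arrangement of Figure 10.

```python
# Hypothetical metadata: for each decoded view, the reference views used to code it.
reference_views_used = {
    6: [7],        # e.g. blocks of V6 used an image of V7 as reference (illustrative)
    10: [11, 14],
}

def views_for_synthesis(closest_view, reference_views_used):
    """Use the decoded view closest to the virtual viewpoint, plus the reference
    views it relied on; those are necessarily available, since the decoder
    needed them to reconstruct the closest view."""
    return [closest_view] + reference_views_used.get(closest_view, [])

print(views_for_synthesis(6, reference_views_used))  # -> [6, 7]
```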

According to another particular embodiment of the invention, the CALC module selects, in particular, the elements of syntax relating to the motion vectors to produce a motion map.

In areas where there is little movement, virtual view synthesis generally suffers from a lack of temporal coherence, due to the imprecision of the depth maps. These inconsistencies are extremely troublesome for visualization from the virtual point of view.

In this particular embodiment, the CALC module of the decoder selects the decoded and reconstructed motion vectors, ie after the inverse prediction of the motion vector and the scaling of the motion vector. The CALC module performs a thresholding of the reconstructed motion vectors of each block to produce a motion map, typically a binary map, in which each element takes the value 0 or 1, indicating locally whether the zone has motion or not. The binary map can be improved for example by the use of mathematical morphology (eg erosion, dilation, opening, closing).

The motion binary map can then be formatted, in particular according to the desired granularity (map at pixel level, map at block or sub-block level, or even a map defined for a specific block size in the image, etc.), to indicate the presence or absence of motion information in the view.
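As an illustration, the following NumPy sketch builds such a binary motion map by thresholding per-block motion vectors, cleans it with a morphological opening (scipy.ndimage), and reformats it at a coarser block granularity; the threshold and block size are arbitrary choices, not values prescribed by the text.

```python
import numpy as np
from scipy import ndimage

def motion_map(mv, threshold=1.0):
    """Binary map: 1 where the reconstructed motion vector magnitude exceeds
    the threshold, 0 in static areas; cleaned with a morphological opening."""
    magnitude = np.linalg.norm(mv, axis=-1)
    raw = magnitude > threshold
    return ndimage.binary_opening(raw).astype(np.uint8)

def downsample_map(bmap, block=8):
    """Format the map at a coarser granularity: one flag per block x block area,
    set to 1 if any element of the area is in motion."""
    h, w = bmap.shape
    bmap = bmap[:h - h % block, :w - w % block]
    hb, wb = bmap.shape
    return bmap.reshape(hb // block, block, wb // block, block).max(axis=(1, 3))

mv = np.zeros((32, 32, 2))
mv[8:16, 8:16] = (3, 0)                 # a moving region
print(downsample_map(motion_map(mv)))   # 4x4 map marking the moving blocks
```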

The synthesis module receiving such a motion map can then adapt its operation, for example by applying different synthesis processes depending on whether an area is marked as having motion or not. For example, to solve the problem of temporal inconsistency, the traditional synthesis process can be turned off in the fixed areas (without motion) and the pixel value simply inherited from the previous image.

The synthesis module can obviously generate a motion map by itself, using other means, for example by estimating the motion as an encoder would. However, such an operation has a non-negligible impact on the complexity of the synthesis algorithm, as well as on the precision of the motion obtained, since the encoder estimates the motion from uncoded images which are no longer available at the output of the decoder.

In the example shown in Figure 10 and the embodiment described above, the valid reference views can be determined not only by using the closest available view, but also by averaging the reference views of a neighborhood of the virtual point of view. For example, the reference views V6, V7, V10 and V11 can be averaged by the CALC module of the decoder and the average view obtained transmitted to the synthesis module.

According to another variant, the CALC module of the decoder can calculate an occlusion map, in which it indicates for each pixel or each block of the image whether the zone corresponds to an occlusion zone. For example, the CALC module can determine whether the area corresponds to an occlusion area by using the information on the reference image(s) used by the decoder to reconstruct the area. For example, in the case of Figure 10, if most of the blocks of the image of the V6 view use a temporal prediction and some blocks of the image of the V6 view use an inter-view prediction, for example relative to view V2, these blocks are likely to correspond to an occlusion zone.
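A possible sketch of this derivation, assuming the decoder exposes a per-block prediction-type label; the labels and the block grid are illustrative.

```python
import numpy as np

def occlusion_map(prediction_types):
    """Mark as potential occlusion zones the blocks coded with inter-view
    prediction when most of the image is coded with temporal prediction."""
    types = np.asarray(prediction_types)
    inter_view = types == "inter_view"
    mostly_temporal = np.mean(types == "temporal") > 0.5
    return (inter_view & mostly_temporal).astype(np.uint8)

# Toy 4x4 block grid for an image of view V6: a few blocks predicted from view V2.
blocks = [["temporal"] * 4 for _ in range(4)]
blocks[1][2] = blocks[2][2] = "inter_view"
print(occlusion_map(blocks))
```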

The synthesis module receiving such an occlusion map can then decide to apply a different synthesis process depending on whether the area is marked or not as an occlusion area.

According to another particular embodiment of the invention, the CALC module selects, in particular, the coding modes associated respectively with the texture of the reconstructed image and with the depth map of the image.

According to the prior art, the synthesis algorithms mainly use depth maps. Such depth maps generally exhibit errors which produce artifacts in the synthesized virtual views. By comparing the encoding modes between the texture and the depth map, the decoder can derive a confidence measure associated with the depth map, for example a binary map indicating whether the depth and the texture are correlated (value 1) or not (value 0).

For example, the confidence value can be derived from the encoding modes. If the texture encoding mode and the depth encoding mode are different, for example one in Intra mode and the other in Inter mode, this means that the texture and the depth are not correlated. The confidence value will therefore be low, for example 0.

The confidence value can also be positioned as a function of the motion vectors. If the texture and the depth have different motion vectors, it means that the texture and the depth are not correlated. The confidence value will therefore be low, for example 0.

The confidence value can also be set according to the reference images used by the texture and the depth. If the reference images are different, it means that texture and depth are not correlated. The confidence value will therefore be low, for example 0.
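The three checks just described can be sketched as follows, assuming the decoder exposes, for corresponding texture and depth blocks, the coding mode, the motion vector and the reference-image index; the field names are hypothetical.

```python
def depth_confidence(texture_block, depth_block):
    """Return 1 if texture and depth appear correlated, 0 otherwise,
    following the three criteria described above."""
    if texture_block["mode"] != depth_block["mode"]:                 # e.g. Intra vs Inter
        return 0
    if texture_block.get("mv") != depth_block.get("mv"):             # diverging motion
        return 0
    if texture_block.get("ref_idx") != depth_block.get("ref_idx"):   # different references
        return 0
    return 1

tex = {"mode": "inter", "mv": (2, 0), "ref_idx": 0}
dep = {"mode": "inter", "mv": (2, 0), "ref_idx": 1}
print(depth_confidence(tex, dep))  # 0: different reference images, low confidence
```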

The synthesis module receiving such a confidence map can then decide to apply a different synthesis process depending on whether the zone is marked or not with a low confidence value. For example, for such areas, another reference view providing a better confidence value for the area can be used to synthesize the corresponding area.

FIG. 6 illustrates steps of the image processing method according to a particular embodiment of the invention. Such a method is implemented for example by a module for synthesizing virtual views, from views decoded and reconstructed for example by the decoding method described in relation to FIG. 5.

During a step E60, at least one metadata (MD1, MD2) is read by the synthesis module. The metadata read by the synthesis module correspond to syntax elements decoded from a stream representative of a multi-view video and are associated with one or more views. They can also correspond to information calculated during the process for decoding the stream from decoded syntax elements. The metadata are stored or transmitted to the synthesis module in a predetermined form, so that any synthesis module having a suitable reading module can read them.

During a step E61, the synthesis module receives as input at least one image of a view reconstructed (VD1, VD2) by a multi-view video decoder, for example according to the decoding method described in relation to FIG. 5. The synthesis module uses these received views VD1, VD2 and the read metadata MD1, MD2 to generate at least one image from a virtual point of view VS(1+2). In particular, the metadata MD1, MD2 are used by the synthesis module to determine a synthesis algorithm to be used for certain areas of the image, or else to determine a view to be used to generate the image of the virtual view.

FIG. 7 illustrates steps of the decoding method and of the image processing method according to another particular embodiment of the invention.

In general, the decoder of a multi-view video has no knowledge of the type of synthesizer that will be used to generate virtual viewpoints. In other words, the decoder does not know which synthesis algorithm will be used, nor what types of metadata could be useful to it.

According to the particular embodiment described here, the decoder and the synthesis module are adapted to be able to exchange bidirectionally. For example, the synthesis module can indicate to the decoder a list of metadata that it would need to achieve a better synthesis. Prior to or following the request of the synthesis module, the decoder can inform the synthesis module of the metadata that it is capable of transmitting to it. Advantageously, the list of metadata that the decoder is able to share is standardized, ie that all the decoders conforming to a decoding standard must be able to share the metadata from the list. Thus, for a given decoding standard, the synthesis module knows which metadata may be available. The list of metadata can also be adapted according to the profile of the decoder standard. For example, for a profile intended for decoders requiring low operational complexity, the list of metadata only includes decoded syntax elements of the stream, while for a profile intended for decoders capable of managing greater operational complexity, the list of metadata can also include metadata obtained by calculation from the decoded syntax elements of the stream, such as motion map, occlusion map, confidence map, etc.

During a step E70, the synthesis module transmits to the decoder a request indicating at least one metadata required to generate an image from a virtual point of view. For example, the query includes an index or a list of indexes corresponding respectively to required metadata.

Such a request is transmitted according to a predetermined format, ie according to a predetermined syntax so that the synthesis module and the decoder can understand each other. For example, such a syntax can be:

nb

for i an integer ranging from 0 to nb-1: list[i]

in which the syntax element nb indicates the number of metadata required by the synthesis module, and therefore the number of indexes to be read by the decoder, and list[i] indicates the respective index of each required metadata.

According to one example, taking the example of the metadata given by the aforementioned table 1, the synthesis module can indicate in the request nb = 2 and the indexes 0 and 9 corresponding respectively to the camera parameters and to an occlusion map.

According to one variant, the synthesis module can also indicate, in association with the index of a required metadata, a level of granularity, for example by specifying a predetermined value of a "grlevel" syntax element associated with the metadata. For example, in the case of the occlusion map, the synthesis module can indicate a value of 1 for the "grlevel" element associated with index 9, if it wishes the occlusion map at pixel level, or else a value of 2 for the "grlevel" element associated with index 9, if it wants the occlusion map at a coarser level, for example for blocks of size 8x8.
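As an illustration of such an exchange, the following sketch serializes and parses a request carrying nb, the list of indexes and a granularity level per index; the byte layout is purely illustrative and is not the syntax defined by any standard.

```python
import struct

def build_request(required):
    """required: list of (index, grlevel) pairs; grlevel=0 means 'as decoded'."""
    payload = struct.pack("B", len(required))             # nb
    for index, grlevel in required:
        payload += struct.pack("BB", index, grlevel)      # list[i] (+ granularity)
    return payload

def parse_request(payload):
    nb = payload[0]
    return [tuple(payload[1 + 2 * i: 3 + 2 * i]) for i in range(nb)]

# Synthesis module asks for camera parameters (index 0) and an
# occlusion map (index 9) at pixel level (grlevel 1).
req = build_request([(0, 0), (9, 1)])
print(parse_request(req))   # [(0, 0), (9, 1)]
```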

During a step E71, the decoder obtains the corresponding metadata. For this, and according to the example described above in relation to FIG. 4 or 5, the decoder recovers the decoded syntax elements necessary for obtaining the metadata, and calculates the metadata not used by the decoder for reconstruction, such as the binary maps. The metadata are then formatted according to the predetermined form so that the synthesis module can read them.

During a step E72, the decoder transmits the metadata to the synthesis module which can then use them in its synthesis algorithm.

FIG. 8 schematically illustrates a DEC device suitable for implementing the decoding method according to a particular embodiment of the invention described above.

Such a decoding device comprises a memory MEM, a processing unit UT, equipped for example with a processor PROC, and controlled by the computer program PG stored in memory MEM. The computer program PG comprises instructions for implementing the steps of the decoding method as described above, when the program is executed by the processor PROC.

According to a particular embodiment of the invention, the decoding device DEC comprises a communication interface COM0 allowing in particular the decoding device to receive a data stream representative of a multi-view video, via a communication network.

According to another particular embodiment of the invention, the decoding device DEC comprises a communication interface COM1 allowing the decoding device to transmit metadata to an image processing device, such as a synthesis module, and images of views reconstructed from the data stream.

On initialization, the code instructions of the computer program PG are for example loaded into a memory before being executed by the processor PROC. The processor PROC of the processing unit UT notably implements the steps of the decoding method described in relation to FIGS. 4, 5 and 7, according to the instructions of the computer program PG. The memory MEM is particularly suitable for storing metadata obtained during the decoding process, in a predetermined form.

According to a particular embodiment of the invention, the decoding device described above is included in a terminal, such as a television receiver, a mobile phone (for example a smartphone), a set-top box, a virtual reality headset, etc.

FIG. 9 schematically illustrates a SYNTH device suitable for implementing the image processing method according to a particular embodiment of the invention described above.

Such a device comprises a memory MEM9, a processing unit UT9, equipped for example with a processor PROC9, and controlled by the computer program PG9 stored in memory MEM9. The computer program PG9 comprises instructions for implementing the steps of the image processing method as described above, when the program is executed by the processor PROC9.

According to a particular embodiment of the invention, the SYNTH device comprises a COM9 communication interface allowing the device to receive metadata originating from a decoding device, such as the DEC device described above, and images of views reconstructed from a data stream representative of a multi-view video, by the device DEC.

On initialization, the code instructions of the computer program PG9 are for example loaded into a memory before being executed by the processor PROC9. The processor PROC9 of the processing unit UT9 notably implements the steps of the image processing method described in relation to FIGS. 6 and 7, according to the instructions of the computer program PG9.

According to a particular embodiment of the invention, the SYNTH device comprises an output interface AFF9 allowing the SYNTH device to transmit images to a display device, for example a screen. For example, such images may correspond to images from a virtual point of view, generated by the SYNTH device using the reconstructed view images and the metadata received from the DEC device.

According to a particular embodiment of the invention, the SYNTH device is a synthesis module. It is included in a terminal, such as a television receiver, a mobile phone (for example a smartphone), a set-top box, a virtual reality headset, etc.

The principle of the invention has been described in the case of a multi-view decoding system, in which several views are decoded from the same stream (bit stream) and metadata are obtained for each view. The principle applies in a similar way to the case where the multi-view video is encoded using several streams (bit streams), one view being encoded per stream. In this case, each view decoder provides the metadata associated with the view that it decodes.

Claims

1. A method of decoding a data stream representative of a multi-view video, implemented by a decoding device, comprising:

- obtaining (E20) syntax elements from at least part of the data of the stream, and

- the reconstruction (E21) of at least one image of a view of the video from the syntax elements obtained,

the decoding method is characterized in that it further comprises:

- obtaining (E23) at least one metadata in a predetermined form from at least one syntax element,

- the supply (E24) of said at least one metadata to an image synthesis module.

2. The decoding method according to claim 1, wherein obtaining at least one metadata further comprises calculating said at least one metadata from at least part of the syntax elements.

3. Decoding method according to any one of claims 1 or 2, wherein said at least one metadata is not used for the reconstruction of the at least one image.

4. The decoding method according to any one of claims 1 to 3, wherein said at least one metadata corresponds to information included in the group comprising:

- camera settings,

- decoded and scaled motion vectors,

- partitioning of the reconstructed image,

- a reference image used by a block of an image of the reconstructed view,

- coding modes of an image of the reconstructed view,

- values of quantization parameters of an image of the reconstructed view,

- values of prediction residuals of an image of the reconstructed view,

- a map representative of the movement in an image of the reconstructed view,

- a map representative of the presence of occlusions in an image of the reconstructed view,

- a map representative of confidence values associated with a depth map.

5. The decoding method according to any one of claims 1 to 4, wherein the predetermined form corresponds to an indexed table in which at least one metadata is stored in association with an index.

6. A decoding method according to any one of claims 1 to 5, wherein said at least one metadata is obtained as a function of a level of granularity specified at the decoding device.

7. The decoding method according to any one of claims 1 to 6, further comprising the reception by the decoding device of a request from the image synthesis module indicating at least one metadata required by the image synthesis module.

8. The decoding method according to claim 7, wherein the request comprises at least one index indicating the required metadata from among a predetermined list of available metadata.

9. Device for decoding a data stream representative of a multi-view video, the device is configured (UT, MEM, COM1) for:

- obtain syntax elements from at least part of the data of the stream, and

- reconstruct at least one image of a view of the video from the syntax elements obtained,

the decoding device is characterized in that it is further configured for:

- obtain at least one metadata in a predetermined form from at least one syntax element,

- providing said at least one metadata to an image synthesis module.

10. Image synthesis method comprising the generation of at least one image of a virtual view, from at least one image of a view decoded by a decoding device, the image synthesis method is characterized in that:

- it comprises the reading (E60) of at least one metadata in a predetermined form, said at least one metadata being obtained by the decoding device from at least one syntax element obtained from a data stream representative of a multi-view video,

- the generation (E61) of said at least one image comprising the use of said at least one read metadata.

11. Image synthesis method according to claim 10, further comprising sending to the decoding device a request indicating at least one metadata required to generate the image.

12. Image synthesis device configured to generate at least one image of a virtual view, from at least one image of a view decoded by a decoding device, the image synthesis device is characterized in that:

- it is configured (UT9, MEM9, COM9) to read at least one metadata in a predetermined form, said at least one metadata being obtained by the decoding device from at least one syntax element obtained from a data stream representative of a multi-view video,

- and in that, when said at least one image is generated, said at least one read metadata is used.

13. Image processing system for displaying a multi-view video from a data stream representative of the multi-view video, comprising:

- a decoding device according to claim 9, and

- an image synthesis device according to claim 12.

14. Computer program comprising instructions for implementing the decoding method according to any one of claims 1 to 8, or for implementing the image synthesis method according to any one of claims 10 to 11, when said program is executed by a processor.

Documents

Application Documents

# Name Date
1 202017045734-TRANSLATIOIN OF PRIOIRTY DOCUMENTS ETC. [20-10-2020(online)].pdf 2020-10-20
2 202017045734-STATEMENT OF UNDERTAKING (FORM 3) [20-10-2020(online)].pdf 2020-10-20
3 202017045734-PRIORITY DOCUMENTS [20-10-2020(online)].pdf 2020-10-20
4 202017045734-NOTIFICATION OF INT. APPLN. NO. & FILING DATE (PCT-RO-105) [20-10-2020(online)].pdf 2020-10-20
5 202017045734-FORM 1 [20-10-2020(online)].pdf 2020-10-20
6 202017045734-DRAWINGS [20-10-2020(online)].pdf 2020-10-20
7 202017045734-DECLARATION OF INVENTORSHIP (FORM 5) [20-10-2020(online)].pdf 2020-10-20
8 202017045734-COMPLETE SPECIFICATION [20-10-2020(online)].pdf 2020-10-20
9 202017045734-Information under section 8(2) [12-01-2021(online)].pdf 2021-01-12
10 202017045734-Verified English translation [15-01-2021(online)].pdf 2021-01-15
11 202017045734-FORM-26 [15-01-2021(online)].pdf 2021-01-15
12 202017045734-certified copy of translation [15-01-2021(online)].pdf 2021-01-15
13 202017045734-Proof of Right [19-04-2021(online)].pdf 2021-04-19
14 202017045734-FORM 3 [19-04-2021(online)].pdf 2021-04-19
15 202017045734.pdf 2021-10-19
16 202017045734-FORM 3 [15-11-2021(online)].pdf 2021-11-15
17 202017045734-FORM 18 [27-01-2022(online)].pdf 2022-01-27
18 202017045734-FER.pdf 2022-04-20
19 202017045734-OTHERS [20-10-2022(online)].pdf 2022-10-20
20 202017045734-MARKED COPIES OF AMENDEMENTS [20-10-2022(online)].pdf 2022-10-20
21 202017045734-Information under section 8(2) [20-10-2022(online)].pdf 2022-10-20
22 202017045734-FORM 3 [20-10-2022(online)].pdf 2022-10-20
23 202017045734-FORM 13 [20-10-2022(online)].pdf 2022-10-20
24 202017045734-FER_SER_REPLY [20-10-2022(online)].pdf 2022-10-20
25 202017045734-COMPLETE SPECIFICATION [20-10-2022(online)].pdf 2022-10-20
26 202017045734-CLAIMS [20-10-2022(online)].pdf 2022-10-20
27 202017045734-AMMENDED DOCUMENTS [20-10-2022(online)].pdf 2022-10-20
28 202017045734-Information under section 8(2) [18-09-2023(online)].pdf 2023-09-18
29 202017045734-US(14)-HearingNotice-(HearingDate-15-03-2024).pdf 2024-03-04
30 202017045734-REQUEST FOR ADJOURNMENT OF HEARING UNDER RULE 129A [12-03-2024(online)].pdf 2024-03-12
31 202017045734-US(14)-ExtendedHearingNotice-(HearingDate-19-04-2024).pdf 2024-03-15
32 202017045734-Correspondence to notify the Controller [16-04-2024(online)].pdf 2024-04-16
33 202017045734-US(14)-ExtendedHearingNotice-(HearingDate-29-04-2024).pdf 2024-04-18
34 202017045734-FORM-26 [24-04-2024(online)].pdf 2024-04-24
35 202017045734-Correspondence to notify the Controller [24-04-2024(online)].pdf 2024-04-24
36 202017045734-Written submissions and relevant documents [14-05-2024(online)].pdf 2024-05-14
37 202017045734-PETITION UNDER RULE 137 [14-05-2024(online)].pdf 2024-05-14
38 202017045734-Information under section 8(2) [14-05-2024(online)].pdf 2024-05-14
39 202017045734-FORM 3 [14-05-2024(online)].pdf 2024-05-14
40 202017045734-GPA-060524.pdf 2024-05-15
41 202017045734-Correspondence-060524.pdf 2024-05-15
42 202017045734-PatentCertificate22-07-2024.pdf 2024-07-22
43 202017045734-IntimationOfGrant22-07-2024.pdf 2024-07-22

Search Strategy

1 202017045734searchstrategyE_20-04-2022.pdf

ERegister / Renewals

3rd: 30 Sep 2024

From 16/04/2021 - To 16/04/2022

4th: 30 Sep 2024

From 16/04/2022 - To 16/04/2023

5th: 30 Sep 2024

From 16/04/2023 - To 16/04/2024

6th: 30 Sep 2024

From 16/04/2024 - To 16/04/2025

7th: 14 Apr 2025

From 16/04/2025 - To 16/04/2026