Specification
The present invention relates generally to the field of image synthesis.
The present invention applies more particularly to the synthesis of non-captured intermediate points of view, from images of several 2D (two-dimensional), 360°, 180°, etc. views which are captured in order to generate an immersive video, such as in particular a 360° or 180° video.
The invention may in particular, but not exclusively, be applied to video decoding as implemented in current AVC ("Advanced Video Coding") video decoders and their extensions MVC ("Multiview Video Coding"), 3D-AVC, MV-HEVC, 3D-HEVC, etc.
2. Prior art
In an immersive video context, i.e. where the viewer has the feeling of being immersed in a 3D (three-dimensional) scene, the scene is conventionally captured by a set of cameras. These cameras can be:
- of the 2D type, to capture a particular angle of the scene, and/or
- of the 360°, 180°, or other type, to capture the whole scene at 360 degrees, 180 degrees or other, around the camera.
The images of such captured views are traditionally encoded and then decoded by the viewer's terminal. However, in order to provide a sufficient quality of experience, and therefore good visual quality and good immersion, displaying only the captured views is insufficient. The images of a multitude of views, called intermediate views, must be calculated from the images of the decoded views.
The calculation of the images of these intermediate views is carried out by a view synthesis algorithm. A synthesis algorithm is capable, from the images of N views, with N > 2, of synthesizing an image of an intermediate point of view located anywhere in space. The image of a view considered among the N comprises a texture component and a depth map which indicates the distance separating the different elements of the scene from the camera which captured the image of this view. Thus, the image of an intermediate view obtained by synthesis also comprises a texture component synthesized from the N texture components of the images of the N views and a depth map synthesized from the N depth maps of the images of the N views.
For a view considered among N, the depth map is either captured or calculated from the N texture components. However, in both cases, such a depth map may contain numerous errors which are either linked to the capture, or linked to the calculation, or linked to the compression of the images of the N views.
Such errors therefore inevitably have repercussions in the image of a view synthesized from the image of each of these N views, which significantly reduces the performance of current synthesis algorithms.
A well-known synthesis algorithm of the aforementioned type is for example the VSRS algorithm (in English “View Synthesis Reference Software”). This algorithm implements a projection of the N depth maps at a position corresponding to the image of the view to be synthesized in the video to be generated, in order to determine a single synthesized depth map. This depth map is then used to find the texture information associated with each pixel of the N texture components.
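By way of illustration only, the following Python/NumPy sketch outlines this prior-art principle: the N depth maps, once projected to the target viewpoint, are merged into a single synthesized depth map, which then selects the texture to copy at each pixel. The array shapes, the use of np.inf to mark missing pixels and the function names are assumptions made for the example and are not taken from the VSRS software itself.

    import numpy as np

    def merge_projected_depths(projected_depths):
        # projected_depths: list of N (H, W) depth maps already projected to the
        # target viewpoint, with np.inf where no pixel was projected.
        stack = np.stack(projected_depths)      # (N, H, W)
        winner = np.argmin(stack, axis=0)       # view whose surface is closest at each pixel
        merged = np.min(stack, axis=0)          # single synthesized depth map
        return merged, winner

    def fetch_texture(projected_textures, winner):
        # projected_textures: list of N (H, W, 3) texture components projected
        # to the target viewpoint; the winning view provides the pixel colour.
        stack = np.stack(projected_textures)    # (N, H, W, 3)
        rows, cols = np.indices(winner.shape)
        return stack[winner, rows, cols]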
A drawback of this projection lies in the fact that certain pixels are missing in the N projected depth maps, since they do not correspond to any of the pixels of the N depth maps before projection. This lack of correspondence can result either from a lack of precision on the pixels contained in the N depth maps used for the projection, or from the fact that the missing pixels correspond to areas not visible (occlusions) in the images of the N views containing the N depth maps.
Such projection errors therefore inevitably have repercussions in the image of a view which is synthesized from the image of each of these N views.
The errors induced during the initial calculation of a depth map, which also depend on the quantization of this information, that is to say on the number of values that this depth map can take, as well as on the level of compression applied, translate on the depth map into two types of defects:
- blurring if the errors are small,
- gross defects if these errors are significant.
When the gap between what the camera that captured the image of a view can see and what the virtual camera should see increases, the synthesis defects become more and more significant, and go from blurring to gross defects, the latter having to be absolutely avoided in order not to harm the viewer's feeling of immersion.
The techniques of the state of the art do not take into account the fact that the available depth maps, intended to be projected during the synthesis of the image of an intermediate view, present errors. As a result, the depth maps obtained at the end of the projection of the available depth maps also contain errors.
3. Purpose and summary of the invention
One of the aims of the invention is to remedy the drawbacks of the aforementioned state of the art.
To this end, an object of the present invention relates to a method for synthesizing an image of a view from images of N (N > 2) views, implemented by an image synthesis device, the method comprising the following:
- project, at a position corresponding to the image of the view to be synthesized, N depth maps associated respectively with the N views,
such a method being characterized in that it comprises the following:
- for at least one given pixel of at least one projected depth map, with which a depth value has been associated at the end of the projection, modify the depth value of said at least one given pixel if reliability information associated with said depth value is at a certain value, the modification using the depth value of a pixel of position corresponding to that of said at least one given pixel, in at least one other projected depth map, which generates at least one modified projected depth map.
Taking into account the fact that a conditional modification is applied to one or more projected pixels of at least one of the projected depth maps, the invention advantageously makes it possible to correct errors in said at least one projected depth map. For a given pixel of such a projected depth map, these errors can result from:
- quantization noise introduced during the digital quantization, on a plurality of bits, of the depth value of each pixel of the depth map of the image of a view, from which the projected depth map was obtained,
- errors introduced when the depth map is compressed by an HEVC, 3D-HEVC, etc. type encoder,
- errors introduced during the projection in real space of the depth map of the image of the view.
Not all the pixels of a projected depth map are necessarily modified. A pixel is modified only if a reliability information item assigned to it is at a certain value.
Furthermore, such a modification uses the depth value of a pixel having a position corresponding to that of the given pixel, in another projected depth map, such a depth value being considered relevant for correcting the erroneous depth value of the given pixel.
Thus, thanks to the invention, a considered projected depth map is affected by much fewer errors than a projected depth map of the state of the art. This results in a very marked improvement in the quality of the image of a view synthesized from a plurality of view images, when at least one of these images is associated with a depth map which contains errors before and / or after projection.
According to one embodiment of the invention, the modification uses a weighting of the depth value of the pixel of position corresponding to that of said at least one given pixel, in said at least one other projected depth map.
Such an embodiment makes it possible to attribute a greater or lesser importance to a pixel of position corresponding to that of the given pixel, in another projected depth map, which, therefore, will have a greater or lesser impact on the modification of the depth value of the given pixel.
According to one embodiment of the invention, the confidence level of a pixel of a given depth map is calculated as a variation of the depth value of said pixel, said variation corresponding to an authorized projection error expressed in number of pixels.
Such a calculation of the confidence level advantageously makes it possible to take into account the real projection quality of said at least one other depth map, in addition to the positioning distance between the camera which captured the image of the view, for which the projection of the depth map generated the considered projected depth map, and the camera which captured the image of said at least one other view, for which the projection of the depth map generated the other projected depth map.
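As a purely illustrative sketch of one way such a confidence level could be obtained, the following Python/NumPy function assumes a simplified rectified setup in which disparity = focal_px × baseline / depth; under that assumption, the depth variation that keeps the reprojection error below a given number of pixels can be derived in closed form. The parameter names and the rectified-geometry assumption are illustrative and are not taken from the description.

    import numpy as np

    def confidence_level(depth, focal_px, baseline, max_error_px=1.0):
        # Depth variation that moves the reprojected pixel by at most
        # max_error_px pixels, assuming disparity = focal_px * baseline / depth,
        # hence |d(disparity)| ~ focal_px * baseline * |d(depth)| / depth**2.
        depth = np.asarray(depth, dtype=np.float64)
        return max_error_px * depth ** 2 / (focal_px * baseline)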
According to one embodiment of the invention, the confidence level of a pixel is weighted by a coding parameter of the image of the view with which the depth map which contains the pixel is associated.
Such an embodiment makes it possible to refine the calculation of the confidence level of a given pixel by taking into account the level of compression quality, such as, for example, the value of the quantization step which was used during the coding of the image of the view with which the depth map which contains said pixel is associated, or the position of this image in the coding hierarchy.
According to one embodiment of the invention, the modification of the depth value of the given pixel consists in replacing said depth value by a value which is calculated from said depth value and from the depth value of the pixel of position corresponding to that of said at least one given pixel in said at least one other projected depth map, said depth values each being weighted by their respective confidence level.
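A minimal Python/NumPy sketch of such a confidence-weighted replacement is given below. It assumes that the N projected depth maps and the associated confidence levels are stored as (N, H, W) arrays, that missing pixels are marked with np.inf, and that the reliability information has already been reduced to a boolean mask; these conventions are illustrative and not imposed by the description.

    import numpy as np

    def modify_projected_depth(projected, confidence, unreliable, j):
        # projected, confidence: (N, H, W) arrays; unreliable: (H, W) boolean mask.
        # Unreliable pixels of map j are replaced by the combination of the
        # co-located depths of all projected maps, weighted by their confidence.
        weights = np.where(np.isinf(projected), 0.0, confidence)
        values = np.where(np.isinf(projected), 0.0, projected)
        total = weights.sum(axis=0)
        blended = np.divide((weights * values).sum(axis=0), total,
                            out=projected[j].copy(), where=total > 0)
        return np.where(unreliable, blended, projected[j])

For example, with N = 4 projected maps, calling modify_projected_depth(projected, confidence, unreliable, 0) returns a corrected version of the first projected depth map, the other maps being corrected in the same way.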
According to one embodiment of the invention, the reliability information is generated as follows:
- determine in the set of N projected depth maps, for a same position as that of the given pixel, which pixel has the maximum depth value and which pixel has the minimum depth value,
- calculate the difference between the maximum and minimum depth values,
- compare the calculated difference with a threshold,
- generate reliability information, the value of which depends on the result of the comparison.
According to such an embodiment, the reliability information is advantageously calculated so that the modification which is applied to the value of the given pixel of the depth map considered and which is conditioned by this reliability information only results in fuzzy-type artifacts in the image of the synthesized view.
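The following Python/NumPy sketch illustrates this variant of the reliability test; the (N, H, W) layout, the np.inf hole convention and the choice of returning True where the modification is required are assumptions made for the example.

    import numpy as np

    def reliability_mask(projected, threshold):
        # projected: (N, H, W) projected depth maps, np.inf where no pixel landed.
        # At each position, the spread between the maximum and minimum projected
        # depth values is compared to the threshold.
        valid = ~np.isinf(projected)
        d_min = np.where(valid, projected, np.inf).min(axis=0)
        d_max = np.where(valid, projected, -np.inf).max(axis=0)
        return (d_max - d_min) >= threshold     # True: depth values deemed unreliable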
According to one embodiment of the invention, the reliability information is generated as follows:
- determine in the set of N projected depth maps, for a same position as that of the given pixel, which pixel has the maximum depth value and which pixel has the minimum depth value,
- calculate a difference between the depth value of said given pixel and the determined minimum depth value,
- compare the calculated difference with a threshold,
- generate reliability information with respect to the determined minimum depth value, the value of which depends on the result of the comparison,
- calculate another difference between the determined maximum depth value and the depth value of said given pixel,
- compare the other difference calculated with said threshold,
- generate reliability information with respect to the determined maximum depth value, the value of which depends on the result of the comparison.
According to such an embodiment, the reliability information for a given pixel of a considered projected depth map is advantageously quantified on two levels:
- a first level which takes into account the difference between the depth value of the given pixel in the projected depth map and the minimum depth value determined over the N projected depth maps, for the N pixels of position corresponding to that of the given pixel,
- a second level which takes into account the difference between the maximum depth value determined over the N projected depth maps, for the N pixels of position corresponding to that of the given pixel, and the depth value of the given pixel in the projected depth map.
In this way, for a given pixel of a projected depth map, as a function of the two reliability information items associated with the given pixel, the respective depth values of the pixels of position corresponding to that of the given pixel in the N projected depth maps can be selected in two different ways for modifying the depth value of the given pixel. The modification of the depth value of the given pixel is thus made more precise.
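A possible Python/NumPy rendering of this two-level variant is sketched below, under the same illustrative conventions as above ((N, H, W) arrays, np.inf holes); it returns one reliability item with respect to the minimum and one with respect to the maximum, for every pixel of every projected map.

    import numpy as np

    def two_level_reliability(projected, threshold):
        # projected: (N, H, W) projected depth maps, np.inf where no pixel landed.
        valid = ~np.isinf(projected)
        d_min = np.where(valid, projected, np.inf).min(axis=0)
        d_max = np.where(valid, projected, -np.inf).max(axis=0)
        far_from_min = (projected - d_min) >= threshold   # not reliable w.r.t. the minimum
        far_from_max = (d_max - projected) >= threshold   # not reliable w.r.t. the maximum
        return far_from_min, far_from_max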
According to one embodiment of the invention, the comparison threshold is equal to the average of the N variations of the depth value of each pixel of position corresponding to that of said given pixel in their respective depth map.
Such an embodiment of calculating the comparison threshold makes it possible to optimize the reduction of artifacts of fuzzy type in the image of the synthesized view.
According to one embodiment of the invention, the comparison threshold is equal to the average of the N variances of the depth value of each pixel of position corresponding to that of said given pixel in their respective depth map.
Such an embodiment for calculating the comparison threshold makes it possible to optimize the reduction of fuzzy-type artifacts in the image of the synthesized view, while taking into account the intrinsic quality of the N projected depth maps.
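By way of illustration, the comparison threshold of these two embodiments can be computed per pixel position as the average, over the N depth maps, of a per-pixel statistic supplied upstream (the depth variation of the first variant or the variance of the second); the sketch below simply performs that averaging and assumes an (N, H, W) input.

    import numpy as np

    def comparison_threshold(per_pixel_stat):
        # per_pixel_stat: (N, H, W) array holding, for each pixel of each depth
        # map, either its depth variation or its variance (computed elsewhere).
        return np.asarray(per_pixel_stat, dtype=np.float64).mean(axis=0)

The resulting (H, W) map can then be passed as the threshold argument of the reliability tests sketched above.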
The various aforementioned embodiments or characteristics can be added, independently or in combination with one another, to the synthesis method defined above.
The invention also relates to a device for synthesizing an image of a view from images of N (N > 2) views, such a synthesis device being characterized in that it comprises a processor which is configured to implement the following:
- project, at a position corresponding to the image of the view to be synthesized, N depth maps associated respectively with the N views,
- for at least one given pixel of at least one projected depth map, with which a depth value has been associated at the end of the projection, modify the depth value of said at least one given pixel if reliability information associated with said depth value is at a certain value, said modification using the depth value of a pixel of position corresponding to that of said at least one given pixel, in at least one other projected depth map, which generates at least one modified projected depth map.
Such a synthesis device is in particular suitable for implementing the aforementioned synthesis method, according to any one of its aforementioned embodiments.
The invention also relates to a method of decoding a data signal representative of a set of images of N (N> 2) coded views, comprising the following:
- decoding the images of the N coded views, producing a set of images of N decoded views,
- synthesizing an image of a view from the set of images of N decoded views, in accordance with the aforementioned synthesis method, according to any one of its aforementioned embodiments.
The invention also relates to a device for decoding a data signal representative of a set of images of N (N > 2) coded views, such a decoding device comprising a processor which is configured to implement the following:
- decoding the images of the N coded views, producing a set of images of N decoded views,
- synthesizing an image of a view from said set of images of N decoded views, in accordance with the aforementioned synthesis method, according to any one of its aforementioned embodiments.
The invention also relates to a computer program comprising instructions for implementing the synthesis method or the decoding method integrating the synthesis method according to the invention, according to any one of the particular embodiments described above, when said program is executed by a processor.
This program can use any programming language, and be in the form of source code, object code, or intermediate code between source code and object code, such as in a partially compiled form, or in any other desirable form.
The invention also relates to a recording medium or information medium readable by a computer, and comprising instructions of a computer program as mentioned above.
The recording medium can be any entity or device capable of storing the program. For example, the medium may comprise a storage means, such as a ROM, for example a CD ROM or a microelectronic circuit ROM, or else a magnetic recording means, for example a USB key or a hard disk.
On the other hand, the recording medium can be a transmissible medium such as an electrical or optical signal, which can be conveyed via an electrical or optical cable, by radio or by other means. The program according to the invention can in particular be downloaded from an Internet type network.
Alternatively, the recording medium can be an integrated circuit in which the program is incorporated, the circuit being adapted to execute or to be used in the execution of the aforementioned synthesis or decoding method.
4. Brief description of the drawings
Other characteristics and advantages will emerge more clearly on reading several preferred embodiments, given by way of simple illustrative and non-limiting examples, and described below with reference to the accompanying drawings, in which:
- Figure 1 shows the main actions performed by the synthesis method according to one embodiment of the invention,
- Figure 2 represents an example of an image used in the synthesis method of Figure 1,
- Figures 3A to 3C represent an example of calculation of a confidence level used in the synthesis method of Figure 1,
- Figure 4 represents an example of calculation of reliability information used in the synthesis method of Figure 1,
- Figure 5 shows a synthesis device implementing the synthesis method of Figure 1,
- Figures 6A and 6B represent examples of arrangement of the synthesis device of Figure 5, in the case where the images used for the synthesis of images have been decoded beforehand.
5. Description of the general principle of the invention
The invention mainly proposes a scheme for synthesizing an image of an intermediate view from a plurality of images of, respectively, a plurality of views, each view representing the 3D scene at the current instant according to a given position or a given viewing angle.
For each image of a view in the plurality of images, the depth map of the image of the view is projected in a conventional manner at a position corresponding to the image of a view to be synthesized.
The invention is characterized by the application of a conditional modification of the depth value of each pixel considered in a given projected depth map. Such a modification thus makes it possible to compensate for errors in the depth values which may come from:
- errors introduced when calculating the depth values of the depth map of the image of a view, from which the given projected depth map was obtained,
- errors introduced during the data compression of the image of the view,
- errors introduced during the projection in real space of the depth map of the image of the view.
6. Examples of implementation of the synthesis scheme
Described below is a method of synthesizing an image of a view from images of a plurality of views, such a method being able to be used in or with any type of current AVC and HEVC video decoders and their extensions (MVC, 3D-AVC, MV-HEVC, 3D-HEVC, etc.), or others.
With reference to FIG. 1, such a synthesis method uses N images I_1, I_2, ..., I_j, ..., I_N of respectively N views, with N > 2, the plurality of views representing a 3D scene respectively according to a plurality of viewing angles or a plurality of positions/orientations. In a conventional manner:
- the image I_1 comprises a texture component T_1 and a depth map D_1,
- the image I_2 comprises a texture component T_2 and a depth map D_2,
- the image I_j comprises a texture component T_j and a depth map D_j,
- the image I_N comprises a texture component T_N and a depth map D_N.
For a given image I_j, as shown in Figure 2:
- its texture component T_j comprises Q (Q ≥ 1) points p1_j, p2_j, ..., pQ_j, each assigned a corresponding texture value t1_j, t2_j, ..., tQ_j,
- its depth map D_j comprises the Q points p1_j, p2_j, ..., pQ_j, each assigned a corresponding depth value d1_j, d2_j, ..., dQ_j.
In S1 in FIG. 1, the depth maps D_1, D_2, ..., D_j, ..., D_N are projected at a position corresponding to an image I_sth of a view to be synthesized.
Such a projection is implemented by a projection algorithm, for example of the DIBR type (English abbreviation for “Depth Image Based Rendering”).
At the end of such a projection, N projected depth maps D_1^v, D_2^v, ..., D_j^v, ..., D_N^v are obtained. A projected depth map D_j^v considered among the N comprises the Q points p1_j, p2_j, ..., pQ_j, each assigned a corresponding depth value d1_j^v, d2_j^v, ..., dQ_j^v.
Such depth values are not systematically correct, taking into account in particular:
- quantization noise introduced during digital quantization on a plurality of bits of each of the depth values d1_j, d2_j, ..., dQ_j, and/or
- in the case where the depth map D_j has undergone compression by an encoder of the HEVC, 3D-HEVC, etc. type, errors introduced during this compression, and/or
- errors introduced during the projection in real space of the depth map D_j.
In a manner known per se, the N projected depth maps D_1^v, D_2^v, ..., D_j^v, ..., D_N^v are respectively associated with N attribute maps A_1, A_2, ..., A_j, ..., A_N.
For any projected depth map D_j^v considered (1 ≤ j ≤ N), [...]
If Δd1^v = d1^v_max - d1^v_min [...] depth_TH, in S103, the difference Δd1^v = d1^v_max - d1^v_min is compared to a predefined threshold, according to the following relation:
Δd1^v = d1^v_max - d1^v_min < Avg_Var
The operations S104 or S105 of Fig. 4 proceed in the same manner as in the first embodiment.
The operations S100 to S103 and S104 or S105 are iterated for the pixels p2_1, p2_2, ..., p2_N located at the same position (for example the second pixel of the first row from the left), in each of the projected depth maps D_1^v, D_2^v, D_3^v, D_4^v, and so on up to the pixels pQ_1, pQ_2, ..., pQ_N located at the same position (for example the last pixel of the last row from the left), in each of the projected depth maps D_1^v, D_2^v, D_3^v, D_4^v.
The conditional modification S2 is implemented in the same way as in the first embodiment.
7.3 Third embodiment
According to this third embodiment, the calculation of the confidence level takes place in the same way as in the first embodiment.
The calculation of the reliability information proceeds in the same way as in the first embodiment up to and including S102. The operations S103 to S104 or S105 are replaced by the following:
With regard to the pixel p1_1 of the projected depth map D_1^v, the differences d1_1^v - d1^v_min and d1^v_max - d1_1^v are each compared to a predefined threshold depth_TH, as follows:
If d1_1^v - d1^v_min ≥ depth_TH, a reliability information item F1_1 is set to a first value V3, such as for example V3 = 0, to indicate that the depth value of the pixel p1_1 in the projected depth map D_1^v is not reliable with respect to the minimum depth value d1^v_min.
As a variant, the comparison is d1_1^v - d1^v_min > depth_TH. If d1_1^v - d1^v_min < depth_TH, [...]
If d1^v_max - d1_1^v [...]

Claims

1. Method for synthesizing an image of a view from images of N (N > 2) views (T_1, D_1; T_2, D_2; ...; T_N, D_N), implemented by an image synthesis device, characterized in that it comprises the following:
- project (S1), at a position corresponding to the image of the view to be synthesized, N depth maps associated respectively with the N views,
- for at least one given pixel (pi_j) of at least one projected depth map (D_j^v), with which a depth value (di_j^v) has been associated at the end of the projection, modify (S2) said depth value of said at least one given pixel if a reliability information item (Fi_j) associated with said depth value is at a certain value, said modification using the depth value of a pixel of position corresponding to that of said at least one given pixel, in at least one other projected depth map, which generates at least one modified projected depth map (D).
2. The method of claim 1, wherein said modification (S2) uses a weighting of the depth value of the pixel of position corresponding to that of said at least one given pixel, in said at least one other projected depth map.
3. Method according to claim 2, wherein the confidence level of a pixel of a given depth map is calculated (S12) as a variation of the depth value of said pixel, said variation corresponding to an authorized projection error expressed in number of pixels.
4. Method according to claim 3, in which the confidence level of a pixel is weighted by a coding parameter (par_COMP) of the image of the view with which the depth map which contains said pixel is associated.
5. The method according to any one of claims 1 to 4, wherein the modification of the depth value of the given pixel comprises replacing (S23) said depth value with a value which is calculated from said depth value and from the depth value of the pixel of position corresponding to that of said at least one given pixel in said at least one other projected depth map, said depth values each being weighted by their respective confidence level.
6. Method according to any one of claims 1 to 5, wherein the reliability information is generated as follows:
- determine (S101, S102) in the set of N projected depth maps, for a same position as that of the given pixel, which pixel has the maximum depth value and which pixel has the minimum depth value,
- calculate (S103) the difference between the maximum and minimum depth values,
- compare (S103) the calculated difference to a threshold (depth_TH),
- generating (S104 or S105) a reliability information item, the value of which depends on the result of the comparison.
7. Method according to any one of claims 1 to 5, wherein the reliability information is generated as follows:
- determine in the set of N projected depth maps, for a same position as that of the given pixel, which pixel has the maximum depth value and which pixel has the minimum depth value,
- calculate a difference between the depth value of said given pixel and the determined minimum depth value,
- compare the calculated difference to a threshold (depth_TH),
- generate reliability information with respect to the determined minimum depth value, the value of which depends on the result of the comparison,
- calculate another difference between the determined maximum depth value and the depth value of said given pixel,
- compare the other difference calculated with said threshold (depth_TH),
- generate reliability information with respect to the determined maximum depth value, the value of which depends on the result of the comparison.
8. The method of claim 6 or claim 7, wherein the comparison threshold is equal to the average of the N variations of the depth value of each pixel of position corresponding to that of said given pixel in their respective depth map.
9. The method of claim 6 or claim 7, wherein the comparison threshold is equal to the average of the N variances of the depth value of each pixel of position corresponding to that of said given pixel in their respective depth map.
10. Device (SYNT) for synthesizing an image of a view from images of N (N > 2) views (T_1, D_1; T_2, D_2; ...; T_N, D_N), said synthesis device being characterized in that it comprises a processor which is configured to implement the following:
- project, at a position corresponding to the image of the view to be synthesized, N depth maps associated respectively with the N views,
- for at least one given pixel (pi_j) of at least one projected depth map (D_j^v), with which a depth value has been associated at the end of the projection, modify the depth value of said at least one given pixel if a reliability information item (Fi_j) associated with said depth value is at a certain value (V2), said modification using the depth value of a pixel of position corresponding to that of said at least one given pixel, in at least one other projected depth map, which generates at least one modified projected depth map.
11. A method of decoding a data signal representative of a set of images of N (N> 2) coded views, comprising the following:
- decoding the images of the N coded views, producing a set of images of N decoded views,
- synthesizing an image of a view from said set of images of N decoded views, in accordance with the synthesis method according to any one of claims 1 to 9.
12. Device for decoding a data signal representative of a set of images of N (N> 2) coded views, said decoding device comprising a processor which is configured to implement the following:
- decoding the images of the N coded views, producing a set of images of N decoded views,
- synthesizing an image of a view from said set of images of N decoded views, in accordance with the synthesis method according to any one of claims 1 to 9.
13. A computer program comprising program code instructions for executing the steps of the synthesis method according to any one of claims 1 to 9 or of the decoding method according to claim 11, when the program is run on a computer.
14. Information medium readable by a computer, and comprising instructions of a computer program according to claim 13.