Specification
TECHNIQUES FOR STEREO THREE DIMENSIONAL VIDEO PROCESSING
TECHNICAL FIELD
The present embodiments are related to three dimensional (3D) digital media and in particular to decoding 3D video for presentation on display devices.
BACKGROUND
Recently, the use of stereo 3D to view games and videos has become increasingly popular. Stereo 3D devices such as 3D projectors, 3D TVs and so on are widely used for watching movies, stereo 3D TV programs and other stereo 3D videos. However, users do not have the same experience when watching the same stereo 3D video on devices whose screen sizes differ. For example, the user experience is different when a user watches a stereo 3D video designed for a 64 inch screen on a 32 inch screen instead of the designed-for 64 inch screen. Assuming that the expected viewing distance for a 64 inch screen is 4 meters and the expected viewing distance for a 32 inch screen is 2 meters, under present-day designs the presentation of stereo 3D videos is such that the perceived depth range for a 32 inch screen is much smaller than that for a 64 inch screen. For example, although a 32 inch screen is exactly half the dimension of a 64 inch screen, when a stereo 3D video designed for the 64 inch screen is displayed on the 32 inch screen the perceived depth range may be much less than half of the perceived depth range experienced when watching the stereo 3D video on a 64 inch screen. For example, the perceived depth range on the 32 inch screen may be less than ¼ that of the perceived depth range on a 64 inch screen.
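The quarter-depth figure can be checked with standard stereo geometry: an object's perceived depth behind the screen is N·d/(e - d) for viewing distance N, on-screen disparity d and eye separation e. A minimal sketch, using illustrative numbers that are assumptions rather than values from this disclosure:

```python
def depth_behind_screen(N, d, e=0.065):
    """Perceived depth behind the screen (meters) for viewing distance N,
    on-screen disparity d and eye separation e (all in meters)."""
    return N * d / (e - d)

# Assumed numbers for illustration: a 64 inch screen viewed at 4 m with a
# 2 cm on-screen disparity, versus a 32 inch screen at 2 m, where the
# on-screen disparity is also halved along with everything else on screen.
big = depth_behind_screen(N=4.0, d=0.02)
small = depth_behind_screen(N=2.0, d=0.01)
ratio = small / big  # about 0.20, i.e. less than 1/4
```

Because the eye separation e does not shrink with the screen, halving both N and d divides the perceived depth by roughly four rather than two.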
This large degradation in the ability to present depth in a stereo 3D video when screen size of a viewing device is smaller than that of the designed device limits the quality of experience for many users since stereo 3D videos may be deployed on multiple different device types. Moreover, this may limit the growth of stereo 3D video because of the need to custom design the stereo 3D video for different device screen sizes.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts an exemplary implementation architecture for stereo three dimensional video processing.
FIG. 2 depicts a processing scenario and apparatus consistent with the present embodiments.
FIG. 3 depicts details of one embodiment for generating an encoded S3D video.
FIG. 4A presents an example image that illustrates presentation of an L-frame or R-frame of an S3D video frame as a two dimensional image.
FIG. 4B presents a depth image that corresponds to the image of FIG. 4A.
FIGs. 5A to 5C depict the geometrical relation of parameters involved in video processing consistent with the present embodiments.
FIG. 6 depicts operations involved in determination of a pixel pair in a L-frame and R-frame consistent with the present embodiments.
FIG. 7 illustrates an operation in which a pixel from a depth frame is received by a decoding component consistent with the present embodiments.
FIG. 8 shows generation of an L-frame and R-frame using a device aware decoder component.
FIGs. 9A-9C detail the relation between perceived depth when an S3D video is presented on different devices.
FIG. 10 shows an exemplary first logic flow.
FIG. 11 illustrates an exemplary system architecture.
DETAILED DESCRIPTION
Embodiments provide enhancements for the viewing of stereoscopic 3D (S3D) video when presented on digital display devices in which screen size, among other properties, may vary. In particular embodiments, real time encoding and decoding of video data is handled in a manner that provides device aware video processing as detailed below. As noted, due to the proliferation of display device models with a large range of screen sizes and expected viewer distances, techniques for processing stereo videos are needed that can accommodate differences in screen sizes and still produce an acceptable perceived depth appropriate for the device presenting the video, which is referred to herein as the "target device." The present embodiments address this issue by changing disparity to accommodate differences in size of target devices during decoding of a video in real-time. In this manner a single S3D video may be deployed over multiple different devices having a range of screen sizes where the perceived depth produces an enjoyable user experience independent of device size.
In the present embodiments, novel operations are performed by a decoding system that takes into account the characteristics of the device to present an S3D video. The techniques and components that perform such operations are termed herein "device aware" because of this feature of adjusting video encoding to account for screen size of a device to which the encoded
video is to be transmitted. A result of operation of the present embodiments is a reduction in the distortion experienced by viewers that may be caused by unsuitable compression of perceived depth for a given S3D video.
In various embodiments a component such as a video decoder or codec (encoder/decoder) is provided to perform or assist in the performing of device aware S3D decoding. The device aware S3D decoding may be provided as part of an implementation framework that includes novel S3D video recording and processing as detailed below.
FIG. 1 depicts an implementation architecture for processing S3D content in accordance with embodiments of the disclosure. The architecture 100 includes various components to process an S3D video 102. As detailed below, in various embodiments the S3D video may be video content that is generated in a manner that provides information to facilitate device aware S3D treatment of the S3D video. This has the effect of enhancing user experience when the S3D video is deployed for viewing across multiple different device platforms. In particular, the S3D video may be recorded and encoded to permit viewing on devices having multiple different screen sizes, in which perceived depth of objects is adjusted to account for different screen dimensions. The S3D video 102 may be forwarded to a device aware decoder system 104 whose operation is detailed below. In one implementation, the S3D video 102 may be forwarded to a device 106 that includes a digital display having a screen of a first dimension. In one example, the S3D video 102 may be encoded in a manner that is designed for presentation of the S3D video 102 on devices having screens of the first dimension. Thus the device 106 may be termed a reference device.
As further shown in FIG. 1, the S3D video 102 may be processed by the device aware decoder system 104 and forwarded to a second device, device 108, whose screen size differs from that of device 106. The device aware decoder system 104 may include components that decode the S3D video 102 to adjust its video frames so that perceived depth of objects in the video frames are appropriate for the device 108. In various implementations as detailed below, this may allow a larger perceived depth to be preserved as compared to conventional video codecs when the S3D video 102 is provided to devices having smaller screen size.
FIG. 2 presents an exemplary processing scenario 200 for S3D video consistent with various embodiments. In this scenario video content 202 is recorded by a video recording device 204. In various embodiments, the video recording device 204 includes a left camera, a right camera, and a depth camera, whose operation is detailed below. The video content may be stored or transmitted to another device as S3D video 206. The S3D video 206 may be stored at any location and on any convenient device, which is indicated as a video source 208. In one use
device aware decoder component 220 and device aware encoding system may be located in a different platform than that containing the display device 224.
In the example of FIG. 2, the device aware decoder system 212 is configured to output the received streaming S3D video 210 as the decoded S3D video 222. As detailed below, the decoding of the streaming S3D video 210 takes place in a manner that formats left and right frames of each video frame of the decoded S3D video 222 to generate an on-screen disparity that produces a desired perceived depth in objects of the streaming S3D video 210. In some embodiments, and as detailed below, the device aware decoder system 212 may operate to maintain a constant ratio of perceived object depth to screen width when the decoded S3D video 222 is presented on different devices of differing screen size, including different screen width.
FIG. 3 depicts details of one embodiment for generating an encoded S3D video. As illustrated in FIG. 3, an encoded streaming S3D video frame may be constructed from a left frame, shown as L-frame 302, a right frame, shown as R-frame 306, and a depth frame 304. In the present embodiments a recording device 314 to record S3D video may include a left camera 316, a right camera 318, and a depth camera 320 that is located midway between the left camera 316 and the right camera 318. When an S3D video is recorded, a two dimensional left image, two dimensional right image and two dimensional depth image are recorded simultaneously to generate a left frame, right frame, and depth frame for a given video frame. The physical displacement of the left camera 316 and right camera 318 with respect to one another results in an image of the same scene that is recorded in L-frame 302 being displaced along the axis 330 with respect to the image recorded in R-frame 306. The depth camera 320 may be a conventional depth recording component, such as a combination of an infrared projector to project infrared radiation onto a scene and a CMOS sensor to record reflected radiation as object depth. The embodiments are not limited in this context, however.
The depth frame 304 may comprise a two dimensional array of depth values that present the object depth at pixel positions recorded in the L-frame 302 and R-frame 306. FIG. 4A presents an example image 402 that illustrates presentation of an L-frame or R-frame of an S3D video frame as a two dimensional image. The image 402 may represent one or more visual features associated with a set of two dimensional pixel positions in a video. For example, each pixel coordinate in an L-frame or R-frame may be associated with color and intensity information for that pixel coordinate. When presented on a display device, each L-frame or R-frame may thus present a complete image of a scene to be presented as in conventional S3D video. An offset or disparity between the R-frame and L-frame when presented on a display
screen determines the depth of objects perceived by a viewer. In the present embodiments this disparity is determined from the depth frame of an S3D video frame as detailed below.
FIG. 4B presents a depth image 404 that corresponds to the image 402. In other words, the depth image 404 represents a two dimensional array of depth information for a scene that corresponds to the scene depicted in image 402. For clarity of illustration, the depth information captured from a depth frame shown in depth image 404 is depicted in a format in which intensity is proportional to depth. Thus, in the depth image 404, the lighter regions represent objects that are at greater depths with respect to a viewer. This varying depth information may be used to vary the disparity between L-frame and R-frame at different pixel positions. Thus, for a given object at a given pixel position in a depth frame, the larger the depth value associated with that object, the greater is the disparity between the pixel position of that object in the L-frame and the pixel position of that object in the R-frame. Returning now to FIG. 3, the output of each of the L-frame 302, R-frame 306 and depth frame 304 is encoded by the encoder 310 to generate the S3D encoded video 312. As discussed in the following FIGs., this encoded video format that includes L-frame, R-frame and depth frame data may be harnessed to generate proper on-screen displacement between left and right images of an S3D video for presentation on a display device of any screen size.
Although in the example above the S3D encoded video 312 may be generated by a recording device, such as video recording device 204, in other embodiments an S3D encoded video may be computer-generated, such that L-frame, R-frame, and depth frame are all computer generated.
FIGs. 5A to 5C depict the geometrical relation of several parameters involved in device aware S3D video processing consistent with the present embodiments. In FIG. 5A, there is shown an arrangement in which viewer eyes 502, 504 are shown in relation to a display device 506. The display device 506 has a screen width W that lies along a direction parallel to a plane 514 in which the viewer left and right eyes 504, 502 are located. A value e is the distance between the viewer's eyes. For a given screen size it may be assumed that a viewer's eyes are located a distance N from the screen surface of display device 506 for proper viewing. For example, N may be 4 meters for a 64 inch screen and 2 meters for a 32 inch screen. A desired depth range for perceived depth of objects lies between the lines 508 and 510. The perpendicular distance FO from line 508 to the screen surface of display device 506 represents the limit of a comfortable perceived depth for objects outside the screen surface. A perpendicular distance FI from line 510 to the screen surface of display device 506 represents the limit of a comfortable perceived depth for objects inside the screen surface. The comfortable viewing range for objects presented by the display device 506 may thus lie in a zone between FI and FO, which zone is determined in part by the size of the screen of the display device. In some examples FI may equal 0.7W and FO may equal 0.4W.
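Using the example limits above, the comfortable zone for a given screen can be sketched as follows; the 16:9 aspect ratio used to convert a screen diagonal into the width W is an assumption for illustration:

```python
import math

def comfort_zone(diagonal_inches, aspect=(16, 9)):
    """Return (FI, FO) in meters: comfortable perceived-depth limits
    inside and outside the screen, taking FI = 0.7*W and FO = 0.4*W."""
    aw, ah = aspect
    # convert the diagonal to meters, then to a width via the aspect ratio
    width_m = diagonal_inches * 0.0254 * aw / math.hypot(aw, ah)
    return 0.7 * width_m, 0.4 * width_m

fi64, fo64 = comfort_zone(64)  # about 0.99 m inside, 0.57 m outside
fi32, fo32 = comfort_zone(32)  # exactly half of the 64 inch values
```

As expected, the comfortable zone scales linearly with screen width, which is why a fixed disparity designed for one screen lands outside the comfortable zone of another.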
As noted, in stereo 3D displays, depth is perceived by disparities in the position of objects in left and right images that are provided to respective left and right eyes to produce a perceived three dimensional image. In order to generate the proper depth of an object, the disparity d between left and right images on a device screen is generated to make the object appear either in front of the screen (negative depth) or behind the screen (positive depth). Turning now to FIG. 5B, there is shown the geometrical relationship between disparity d and the location of an object PI that appears to be located inside, or behind, the screen of display device 506. In this case a left image perceived by the left eye 504 is displaced outwardly toward the edge of display device 506 from a right image perceived by the right eye 502. This causes the location of object 520 to appear inside the screen at position PI. Turning now to FIG. 5C, there is shown the geometrical relationship between disparity d and an object P2 that appears to be located outside the screen of display device 506. In this case a left image perceived by the left eye 504 is displaced inwardly from the edge of display device 506 with respect to a right image perceived by the right eye 502. The left and right images in such a scenario cross one another such that the left image is perceived to lie to the right of the right image, which is equivalent to the right image being perceived to lie to the left of the left image. This causes the location of object 520 to appear outside the screen at position P2. The distance of an object from the viewer is termed herein the perceived depth h.
It is straightforward from the geometry shown in FIGs. 5B and 5C using the principle of similar triangles to derive the relation between the perceived depth h and disparity d. In the scenario of FIG. 5B, the value of h is given by:
h = Ne/(e - d),

while in FIG. 5C,

h = Ne/(e + d).
Since e is constant for a given viewer and N is constant for a given display screen, once the disparity d is decided, the perceived depth is determined.
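These two relations can be captured in a short sketch; the numeric values below are illustrative assumptions, not figures from this disclosure:

```python
def perceived_depth(N, e, d, inside=True):
    """Distance h from the viewer to the perceived object position:
    h = N*e/(e - d) for objects behind the screen (FIG. 5B) and
    h = N*e/(e + d) for objects in front of it (FIG. 5C)."""
    return N * e / (e - d) if inside else N * e / (e + d)

# assumed: N = 4 m viewing distance, e = 0.065 m eye separation, d = 0.02 m
h_in = perceived_depth(4.0, 0.065, 0.02, inside=True)    # behind the screen, h > N
h_out = perceived_depth(4.0, 0.065, 0.02, inside=False)  # in front of it, h < N
```

With d = 0 both cases collapse to h = N, i.e. the object sits exactly on the screen surface.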
In order to produce the proper perceived depth of objects in an S3D video when presented on any target device, various embodiments employ device information of the target device as well as depth information received from the encoded S3D video to generate the proper disparity for the target device. Referring again to FIG. 2, in one implementation device aware decoder
system 212 may perform an analysis to determine parameters related to presentation of an S3D video on a display device, such as the display device 224. For example, the device aware decoder component 220 may first retrieve or receive information regarding screen size including screen width W for the display device 224, as well as expected viewing distance N for viewers of the display device 224, and e, which may represent a determined average viewer eye separation.
In one implementation when a source frame of an encoded S3D video is received by the device aware decoder system 212, the frame may be decoded to generate an L-frame, R-frame and depth frame that constitute each video frame of the encoded S3D video as discussed above. In some implementations such as S3D video streaming, S3D video frames may be loaded when received into memory 216, which may include one or more buffers.
The pixels in the depth frame for each loaded S3D video frame may then be analyzed to determine the pixel positions of pixel pairs in the related L-frame and R-frame of the given S3D video frame. A pixel pair constitutes a designated pixel from an L-frame and a related pixel from an R-frame that contain the object data corresponding to a pixel position designated in the depth frame of that video frame. The depth frame may present a depth image that represents data collected from an intermediate position between images collected from left and right cameras that are stored in the respective left and right frames. Accordingly, the depth information for an object or portion of an object that is recorded at any pixel position of the depth frame may be used to determine the location of corresponding pixels in the left and right frames that record the same object or portion of an object. The pixel positions of the object in the R-frame and L-frame are each displaced from the pixel position of the depth frame due to the different locations of the left, right, and depth cameras used to record the same object. For example, considering a pixel (x,y) of the depth frame, which may be located at any position in the depth frame, in order to convey depth the corresponding pixel (xL, yL) in the L-frame may be displaced from the pixel position represented by the pixel (x,y) of the depth frame by a disparity d in a first direction along the "x axis" and the corresponding pixel (xR, yR) in the R-frame may be displaced by a disparity d in a second direction opposite the first direction along the x-axis. In particular, the coordinates of the corresponding L-frame pixel may be given by:
xL = x - d(x,y), yL = y;
and the coordinates of the corresponding R-frame pixel may be given by:
xR = x + d(x,y), and yR = y.
In order to determine the value of d for a given pixel (x,y) in the depth frame being evaluated, it may be recalled that h = Ne/(e - d) and h = Ne/(e + d) for objects inside and outside the screen, respectively. Accordingly, the disparity is given by d = (h-N)e/h or d = (N-h)e/h. The value of the perceived depth h for a given pixel (x,y) in turn is determined directly from the depth(x,y) value for the given pixel (x,y) of the depth frame according to h(x,y) = k*depth(x,y), where k is a constant. For example, the depth value in a depth frame may be provided based upon a reference or desired screen size to generate the proper perceived depth h. Moreover, the values of N and e may be predetermined for a given screen size as discussed above. Accordingly, the device aware decoder component 220 may load the depth value for the given pixel (x,y) of the depth frame being processed, and apply the relevant values of k, N and e in order to determine d, and thereby the coordinates for the pixel pair in the related L-frame and R-frame.
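The per-pixel computation just described can be sketched as follows. The pixels-per-meter conversion and every constant below are assumptions for illustration, and a signed d is used as a design choice so that the inside-screen and outside-screen cases fold into one expression:

```python
def pixel_pair(x, y, depth_value, k, N, e, px_per_m):
    """Map a depth-frame pixel (x, y) to its L-frame and R-frame
    coordinates.  h = k * depth_value; d = (h - N)*e/h, which comes out
    negative for objects in front of the screen (crossed disparity)."""
    h = k * depth_value
    d = (h - N) * e / h * px_per_m  # signed on-screen disparity, in pixels
    return (x - d, y), (x + d, y)   # (xL, yL), (xR, yR)

# assumed values: k = 1, N = 4 m, e = 0.065 m, 1000 px per meter of screen
left, right = pixel_pair(100, 50, depth_value=5.2, k=1.0, N=4.0, e=0.065,
                         px_per_m=1000)
# left is shifted to about x = 85, right to about x = 115: uncrossed
# disparity, so the object is perceived behind the screen
```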
The above procedure is highlighted in FIG. 6, which depicts operations involved in determination of a pixel pair in an L-frame and R-frame consistent with the present embodiments. As illustrated, an (x,y) pixel of the depth frame 602 is chosen for analysis. In the present embodiments, the series of (x,y) pixels that make up the depth frame 602 may be loaded in serial fashion to determine the related pixel pair in the L-frame and R-frame for each (x,y) pixel. For a given (x,y) pixel, the depth(x,y) value is retrieved by the device aware decoder component 220 from the depth value stored in the depth frame 602. The device aware decoder component 220 also receives the k, N and e values for the relevant depth values of the S3D video being decoded. Based upon these inputs, the device aware decoder component 220 calculates the disparity d for that pixel, and from it the coordinates of the related pixel pair.
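The serial traversal of FIG. 6 might look like the following loop; the tiny 2x2 depth frame and all constants are illustrative assumptions, and a signed disparity is used so one expression covers objects both behind and in front of the screen:

```python
def build_pixel_pairs(depth_frame, k, N, e, px_per_m):
    """Walk a depth frame (a list of rows) in serial fashion and record,
    for each (x, y) pixel, the related L-frame / R-frame pixel pair."""
    pairs = {}
    for y, row in enumerate(depth_frame):
        for x, depth_value in enumerate(row):
            h = k * depth_value                # perceived depth at this pixel
            d = (h - N) * e / h * px_per_m     # signed disparity, in pixels
            pairs[(x, y)] = ((x - d, y), (x + d, y))
    return pairs

# assumed 2x2 depth frame in meters, with k = 1, N = 4 m, e = 0.065 m
pairs = build_pixel_pairs([[5.0, 4.0], [3.0, 6.0]],
                          k=1.0, N=4.0, e=0.065, px_per_m=1000)
```

Pixels whose perceived depth equals N get zero disparity and land on the screen surface; pixels closer than N get a negative d, which crosses the left and right images.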