
Low-Latency 2D-to-3D Conversion for Virtual-Reality Gaming

Abstract: A virtual-reality (VR) headset provides for enhanced immersion in a two-dimensional (2D) game using a low-latency 2D-to-3D conversion. Columns of pixels are scanned to identify image object edges in the forms of vertically adjacent pixels that differ by more than a predetermined threshold in color. To reduce processing time, only every third column is considered and only every third pixel in each considered column is considered. Object strips are identified by correlating vertical pairs of edges. Identified object strips are vertically enlarged to provide one of a stereo pair of images. Object strips are not resized for the other image of the stereo pair. The images of the stereo pair are respectively presented to the left and right eyes of a player to provide a more immersive experience.


Patent Information

Application #
201741012256
Filing Date
05 April 2017
Publication Number
41/2018
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
senthil@intepat.com
Parent Application

Applicants

Absentia Virtual Reality Pvt. Ltd.
1302, G25, AWHO, BANGALORE 560067

Inventors

1. MISHRA, Shubham
HAL Township, 510, GT Road, Kanpur -208007, Uttar Pradesh, India
2. PRASADE, Vrushali
A2-6/203, Flower Valley Complex, Off Eastern Express Highway, Khopat, Thane, (West) - 400601, Maharashtra,
3. VALIYATH, Harikrishna
A2, Ratandeep, 141 SV Road, Next to Shopper's Stop, Andheri (West), Mumbai - 400058, Maharashtra, India

Specification

LOW-LATENCY 2D-TO-3D CONVERSION FOR VIRTUAL-REALITY GAMING
BACKGROUND
[01] Virtual-reality three-dimensional (3D) games can provide an immersive and exciting gaming experience. Examples of 3D titles include: Kitchen for the Project Morpheus platform, Minecraft HoloLens for the Microsoft HoloLens platform, Time Machine for the Oculus Rift platform, and Keep Talking and Nobody Explodes for the Gear VR platform. While 3D versions of some conventional two-dimensional (2D) games are being released, there is a vast trove of conventional 2D games that are not scheduled for conversion to 3D. It would be desirable to provide an immersive virtual-reality experience for such games, and to allow owners of games that are to be re-released to benefit from immersive virtual reality without requiring them to buy new versions of already-owned games.
[02] One challenge is to achieve the desired 2D-to-3D conversion without disturbing the perception of immediacy between a player's actions and the results on the display. Inherently, there is a delay between a user action and the resulting change on the display. The processing requirements for most 2D-to-3D conversions would add an excessive delay, compromising the sense of immersion and, in some cases, rendering the game unplayable. What is needed is a lower-latency 2D-to-3D conversion suitable for gaming applications.

BRIEF DESCRIPTION OF THE DRAWINGS
[03] FIGURE 1 is a schematic diagram of a gaming system and associated process for converting a 2D game to 3D.
[04] FIGURE 2 is a grey-scale rendition of a screen shot of a day-map source image to be converted to a stereo pair according to the process of FIG. 1.
[05] FIGURE 3 is a grey-scale rendition of a screen shot of a side-by-side stereo pair of images resulting from a 2D-to-3D conversion of the image of FIG. 2. A nose has been added to minimize disorientation of a player using the gaming system of FIG. 1.
[06] FIGURE 4 is a grey-scale rendition of a screen shot of a night-map source image that may be converted to a stereo pair according to the process of FIG. 1.
[07] FIGURE 5 is a schematic diagram of a source image showing selected pixels according to the process of FIG. 1.
[08] FIGURE 6 is a schematic diagram of the source image of FIG. 5 for explaining edge detection and object resizing as implemented in the process of FIG. 1.
DETAILED DESCRIPTION
[09] In a low-latency 2D-to-3D conversion process, a subset of source-image pixels is selected to reduce the number of pixels to be handled while providing effective immersion. Edge detection is used to identify image objects. Identified objects are resized in at least one image of a stereo pair derived from the source image. Presentation of the stereo pair to respective left and right eyes provides a 3D effect. The simple operations called for by the process, along with the data reduction due to image sampling, allow a low-latency conversion suitable for gaming.
[10] As shown in FIG. 1, a virtual-reality gaming system 100 includes a game controller 102, a game computer 104, and a virtual-reality (VR) headset 106, i.e., a VR head-mounted display (HMD). Game controller 102 may include such features as multiple buttons, directional pads, joysticks, and motion detection, for example.
[11] Computer 104 is a programmable hardware system for running a 2D computer game 108 and 2D-to-3D conversion software 110. To this end, computer 104 can include an input buffer 112, left and right (L&R) back buffers 114, and L&R front buffers 116. Input buffer 112 is used to store a 2D source image 118 generated by computer game 108. Front buffers 116 are used to store, for display, stereo image pairs resulting from the 2D-to-3D conversion. Each stereo pair includes a left-eye image 120 and a right-eye image 122. Back buffers 114 are used to pipeline converted images to front buffers 116 for sequential display.
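
By way of illustration only, the buffering arrangement can be sketched as a simple double-buffered loop. The Python sketch below is not part of the described system; the game, converter, and display objects and their methods are hypothetical placeholders.

    # Illustrative sketch of the input/back/front buffer pipeline (hypothetical objects).
    def run_pipeline(game, converter, display):
        front = {"L": None, "R": None}   # images currently on the left/right displays
        back = {"L": None, "R": None}    # images being prepared for the next frame
        while game.is_running():
            source = game.render_2d_frame()           # 2D source image (input buffer)
            back["L"], back["R"] = converter.convert(source)
            front, back = back, front                 # promote back buffers to front
            display.show(front["L"], front["R"])      # present to left and right eyes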
[12] VR headset 106 includes a display system 124, an inter-pupil distance dial 126, a head-tracking system 128, and a haptic feedback system 130. Display system 124 includes: a left (L-) display 132 for displaying a left-eye image 120; and a right (R-) display 134 for displaying a right-eye image 122. Inter-pupil distance dial 126 can be used to adjust the separation of displays 132 and 134 to match an inter-pupillary distance of a user. In the illustrated VR headset, the default distance is set at 55 millimeters (mm). The dial is capable of adjusting the distance in the range of 50 to 60 mm. A vertical adjustment is also provided to adjust the distance between the screens and the eyes for myopic vision correction.
[13] 2D-to-3D converter 110 is a hardware device, e.g., media encoded with code that, when executed by hardware, implements a 2D-to-3D conversion process 150 for converting a 2D source image, e.g., image 200, FIG. 2, to a stereo image pair, e.g., image pair 300, FIG. 3. Each source image includes a two-dimensional array of pixels, each of which is characterized by a respective position (x, y) vector and a respective color (R, G, B) vector.
[14] At 151, the effective distance between the left-eye and right-eye images is adjusted using dial 126. The adjustment allows matching of a distance between displays 132 and 134 with the inter-pupillary distance of the player.
[15] At 152, a color map for the image is characterized. For example, a day map, such as the one characterizing source image 200 (FIG. 2), may be distinguished from a night map, such as the one characterizing source image 400 of FIG. 4. This characterizing can involve identifying the minimum and the maximum value of 8-bit-per-color-channel pixel data in each of the R, G, and B color dimensions. If the difference between the minimum and the maximum in each scalar is less than, for example, 100, the map is identified as low color range, which is characteristic of a night map; otherwise, the image is considered high color range, which is characteristic of a day map.
[16] This approximation can err: a single object standing out somewhere in the image may fall outside the average range and cause an otherwise low-color-range map to be rendered as a day map. However, that much error is tolerated considering the tradeoff against the increased computation involved in identifying the frequency of pixels falling outside the average range and setting a threshold on that count as well.
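
By way of illustration only, the map characterization of paragraph [15] might be sketched in Python as follows, assuming the source image is an H×W×3 array of 8-bit RGB values; the threshold of 100 is the example value given above.

    import numpy as np

    def characterize_color_map(image, threshold=100):
        """Classify a source image as a day (high color range) or night
        (low color range) map from the per-channel min/max spread."""
        flat = image.reshape(-1, 3).astype(int)
        spread = flat.max(axis=0) - flat.min(axis=0)   # per-channel max - min
        # Low spread in every channel suggests a night map; otherwise a day map.
        return "night" if np.all(spread < threshold) else "day"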
[17] At 153, a background color range is set as a function of the characterized color map. For example, a range of 50 per color channel can be set for a day map, while a narrower range of 25 per color channel can be used for a night map.
[18] At 154, a background portion of the source image is identified using the selected background range. For example, if a day map is determined at 153, then for each RGB color dimension, a range of 50, topping out at the maximum for that color value in the source image, is used. In one scenario, the color ranges could be R: 210-260, G: 190-240, B: 130-180. Pixels falling within this range are considered part of the background. If a night map is determined, the range extends 25 values below the maximum for each color dimension. In FIG. 5, the unshaded portion 502 of image 500 is considered background, in contrast to a red image object 504 and a blue image object 506. Note that FIG. 5 is highly schematized for expository purposes.
[19] This background detection is not performed for every frame but only once per 60 frames, to reduce computation. Typically, the background portion tends to persist for a relatively long time once the game is started, so an error tolerance in background detection over 60 frames at 60 frames per second (fps), in other words one second, is accepted. For other frames, the most recent background is carried forward and process 150 skips to sampling image pixels at 155.
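
By way of illustration only, the background selection of paragraphs [17]-[18] might be sketched as follows, under the same assumption of an H×W×3 array of 8-bit RGB values; the per-channel spans of 50 (day map) and 25 (night map) are the example values given above.

    import numpy as np

    def background_mask(image, map_type):
        """Mark as background every pixel whose channels all lie within a span
        below that channel's maximum (50 values for a day map, 25 for night)."""
        span = 50 if map_type == "day" else 25
        channel_max = image.reshape(-1, 3).max(axis=0).astype(int)  # per-channel maxima
        lower = channel_max - span                                  # bottom of background range
        return np.all(image.astype(int) >= lower, axis=-1)          # True where background

Per paragraph [19], such a mask would be recomputed only once every 60 frames and reused in between.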

[20] At 155, a subset of the pixels of the 2D source image is selected for further processing. This reduces the amount of data to be processed and, concomitantly, provides a lower latency. For example, as shown in FIG. 5, an image can be divided into 3x3 arrays of pixels, with the center pixel of each array being selected and the remaining eight pixels in each array being unselected. Thus, a 9:1 data reduction is achieved. In an alternative embodiment, the sampling is performed before action 152 to reduce the amount of processing required for identifying background images. However, in the illustrated embodiment, the background of the source image is determined using all pixels in the image for greater accuracy.
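
By way of illustration only, the 9:1 sampling of paragraph [20] amounts to keeping the center pixel of each 3x3 block; a minimal sketch, assuming the image is a NumPy array:

    def sample_pixels(image, step=3):
        """Keep the center pixel of each step x step block (every third pixel of
        every third row and column for step=3), a 9:1 data reduction."""
        return image[step // 2::step, step // 2::step]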
[21] At 156, edges are detected. Edge detection can be performed by scanning columns of selected pixels for breakpoints. In the illustrated embodiment, one in three pixels in every vertical strip is selected and the pixel data retrieved. If the difference between vertically adjacent pixels in an individual color-dimension scalar is more than 50, and the sum of the differences across color dimensions is more than 150, then the vertically adjacent pixels represent an edge. Each edge is extrapolated so that it extends three pixels horizontally.
[22] Thus, in FIG. 6, which presents an alternative representation of source image 500, the selected pixels are arranged in rows R1-R8 and columns C1-C8. Scanning is column-wise from top to bottom, as indicated by arrow 602. Scanning from pixel R8C5 to R7C5 does not detect an edge, as the pixels involved have similar colors. However, pixels R7C5 and R6C5 have sharply differing colors, so an edge is detected therebetween. No edge is detected between pixels R6C5 and R5C5, or between pixels R5C5 and R4C5. An edge is detected between pixels R4C5 and R3C5.
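
By way of illustration only, the column-wise edge scan of paragraphs [21]-[22] might be sketched as below, assuming the sampled pixel grid is an H×W×3 NumPy array; the per-channel threshold of 50 and summed threshold of 150 are the values given above, with the per-channel condition applied here to at least one channel.

    import numpy as np

    def detect_edges(sampled, channel_thresh=50, sum_thresh=150):
        """Scan each column of selected pixels and record a breakpoint wherever
        vertically adjacent pixels differ sharply in color (paragraph [21])."""
        edges = []                                   # (row, column) above each breakpoint
        rows, cols = sampled.shape[:2]
        for c in range(cols):                        # column-wise scan
            for r in range(rows - 1):
                diff = np.abs(sampled[r + 1, c].astype(int) - sampled[r, c].astype(int))
                if np.any(diff > channel_thresh) and diff.sum() > sum_thresh:
                    edges.append((r, c))
        return edges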
[23] At 157, image objects are identified. Once a next edge (e.g., between R4C5 and R3C5) is determined, the color vectors associated with the pixel (e.g., R6C5) below the previous edge (between R7C5 and R6C5) and the pixel (R4C5) above the present edge (e.g., between R4C5 and R3C5) are compared. If the color difference in individual scalars is less than 50, then this block is identified as a single object that may be considered for resizing. However, if the vertical extent of this block is less than 80 pixels or greater than 600 pixels, it is not considered for resizing and/or other modification. A narrow strip of pixels would not be consequential even after adding depth, while a strip taller than 600 pixels is treated as a background object (sky, ground, or similar). Note that the identified image objects are 3-pixel-wide vertical strips such as image object 604 in FIG. 6.
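
By way of illustration only, the pairing of consecutive edges into candidate object strips (paragraph [23]) might be sketched as follows; the color-similarity threshold of 50 and the 80-600 pixel height limits are the values given above, and the edge positions are assumed to be expressed as row indices for a single column.

    def find_object_strips(edge_rows, column_pixels, min_height=80, max_height=600,
                           color_thresh=50):
        """edge_rows: rows of the pixels just below each detected edge in one
        column, top to bottom. column_pixels: color vectors for that column,
        indexed by row. Consecutive edges bound a single object strip when the
        colors just inside the two edges agree in every channel and the strip
        height lies between min_height and max_height."""
        strips = []
        for top, bottom in zip(edge_rows, edge_rows[1:]):
            below_prev = column_pixels[top]          # pixel just below the previous edge
            above_next = column_pixels[bottom - 1]   # pixel just above the present edge
            similar = all(abs(int(a) - int(b)) < color_thresh
                          for a, b in zip(below_prev, above_next))
            if similar and min_height <= (bottom - top) <= max_height:
                strips.append((top, bottom - 1))     # top and bottom rows of the strip
        return strips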
[24] At 158, the identified image objects are differentially modified, i.e., an image object is modified one way for the left image and another way for the right image. “Differentially modified” encompasses modifying an image object for one of the left and right images, but not for the other. In the illustrated embodiment, image objects are resized for the left image while the right image remains unchanged. For the left image, identified image objects are scaled up 10% vertically. More specifically, the top extends 5% higher and the bottom extends 5% lower, as shown schematically in dashed lines at 606. For example, an object that extends 100 pixels from 250 to 350 vertically is resized to extend from 245 to 355. As another example, an object that extends 500 pixels is resized so that it extends 550 pixels.
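
By way of illustration only, the 10% vertical enlargement of paragraph [24], applied to the left image only, might be sketched as below; a strip is assumed to be given by its top and bottom rows in source-image coordinates.

    def enlarge_strip(top, bottom, scale=0.10):
        """Stretch an object strip vertically by `scale` (10%), half above and
        half below; e.g., a strip from row 250 to 350 becomes 245 to 355."""
        grow = (bottom - top) * scale / 2.0          # 5% added at each end
        return int(round(top - grow)), int(round(bottom + grow))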

[25] At 159, a nose 302, or rather a left-half nose 304 and a right-half nose 306, is added to the left and right images 308 and 310 of a stereo image pair 300, as shown in FIG. 3. This helps prevent disorientation of the player that might occur as the player’s head turns to change the view in the VR field. At 160, the left and right images en route to displays 132 and 134 are time-staggered, e.g., by 4-5 ms, so that they are slightly out of phase to enhance the sensation of depth. At 162, the left and right images are presented respectively to the left and right eyes. Actions 155-162 can be repeated for each image in a video sequence.
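
By way of illustration only, the 4-5 ms time stagger of paragraph [25] might be sketched as below; the display objects and their show() method are hypothetical placeholders.

    import time

    def present_staggered(display_l, display_r, left_img, right_img, stagger_s=0.0045):
        """Show the left image, wait roughly 4-5 ms, then show the right image,
        so the two eyes' frames are slightly out of phase."""
        display_l.show(left_img)
        time.sleep(stagger_s)                        # 4-5 millisecond stagger
        display_r.show(right_img)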
[26] The images in the video sequence are generated in response to user actions, including head motions that change the field of view. To reduce latency that might be introduced due to head movement, VR headset 106 includes two or more (as opposed to only one) inertial measurement units (IMUs), the outputs of which can be time-multiplexed to improve time resolution. The IMUs can be 9-axis IMUs, with 3-axis accelerometers, 3-axis gyroscopes, and 3-axis magnetometers. The IMU outputs are time-staggered with a small phase shift to allow measurement of intermediate values. The signals can then be converted to human-interface-device (HID) signals to emulate a mouse, keyboard, or joystick with complete customization from the user end. The user can map keyboard keys in a game to any particular action captured by the gyroscope or the accelerometer.
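
By way of illustration only, the time-multiplexed reading of two IMUs described in paragraph [26] might be sketched as below; the imu_a and imu_b objects and their read() method are hypothetical placeholders, and a half-period phase shift is one way to interleave the two sensors for finer time resolution.

    import time

    def read_imus_interleaved(imu_a, imu_b, period=0.01, samples=100):
        """Read two IMUs alternately, offset by half a sample period, so the
        combined stream has twice the time resolution of either sensor alone."""
        readings = []
        for _ in range(samples):
            readings.append(("A", time.monotonic(), imu_a.read()))
            time.sleep(period / 2)                   # phase shift between the IMUs
            readings.append(("B", time.monotonic(), imu_b.read()))
            time.sleep(period / 2)
        return readings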
[27] VR headset 106 also employs haptics involving feature abstraction and haptic feedback. Audio feature abstraction can capture features such as gunshots, bomb blasts, steps (e.g., walking), and recoil, e.g., by building a directory of sounds for each game, accessing the sounds, and comparing them. Moreover, the audio features can be game-specific; the user selects the game to be played from a list. Pixel mapping can also be done to extract features; for example, the health bar falls if the player gets shot, so once the pixels in that region change, a gunshot can be inferred. Recoil can be checked via the mouse click as it relates to a gunshot. Similarly, using audio and visual feature extraction, other game-specific features can be determined.
[28] Haptic feedback can take the form of vibrations and push-pull effects. An array of vibration sensors is triggered by a microcontroller, which receives feedback from the feature-extraction software. The vibrations vary in intensity, pattern, and time. These sensors are placed in the cushion of VR headset 106.
[29] To enhance the pushback effect, a solenoid-based pushback is produced. The solenoid is placed on the top of the head-mounted display (HMD), and its tip is connected to a strap that goes over the head. Whenever the solenoid is triggered, VR headset 106 is slightly pulled, giving a push-back sensation to the user.
[30] Signals from game computer 104 can be taken from the video (e.g., HDMI or VGA) output, and these video signals can be converted to a Mobile Industry Processor Interface (MIPI) specification or a Low-Voltage Differential Signaling (LVDS) standard to be shown on the display (e.g., using a Toshiba IC). The system is USB-powered. An ARM Cortex processor is also used to interact with the two IMUs and emulate an HID.

[31] A spot-jogger band with an accelerometer chip is also provided. The band can be attached to the foot to track walking and running motion and send it to gaming computer 104 in real time, allowing the player to move around within games.
[32] These and other variations upon and modifications to the described embodiments are provided for by the present invention, the scope of which is defined in the following claims.

We Claim:
1. A process comprising:
obtaining a source image including a two-dimensional array of image pixels, each of the pixels being characterized by a position vector and a color vector;
sampling the image to obtain selected pixels, the selected pixels being a non-exhaustive subset of the image pixels;
performing edge detection by comparing color vectors for adjacent selected pixels;
identifying image objects based at least in part on edge detections; and
generating a stereo pair of images from the source image at least in part by differentially modifying image objects for left and right images.
2. The process recited in Claim 1 wherein the differential modifying includes resizing image objects relative to the source image for one but not the other of the stereo pair of images.
3. The process of Claim 1 wherein the generating further includes including images of noses in each of the stereo pair of images.
4. The process of Claim 1 wherein the selected pixels constitute a rectangular array of pixels.
5. The process of Claim 1 further comprising: identifying background pixels prior to performing the edge detection.

6. The process of Claim 5 wherein the identifying background pixels includes determining a source color range for the source image, selecting a background color range as a function of the source color range, and identifying as background pixels source-image pixels having color vectors within the background color range.
7. A system comprising media encoded with code that, when executed by hardware, implements a process including:
obtaining a source image including a two-dimensional array of image pixels, each of the pixels being characterized by a position vector and a color vector;
sampling the image to obtain selected pixels, the selected pixels being a non-exhaustive subset of the image pixels;
performing edge detection by comparing color vectors for adjacent selected pixels;
identifying image objects based at least in part on edge detections; and
generating a stereo pair of images from the source image at least in part by differentially modifying image objects for left and right images.
8. A system as recited in Claim 7 wherein the differential modifying includes resizing image objects relative to the source image for one but not the other of the stereo pair of images.
9. A system as recited in Claim 7 wherein the process further includes: identifying background pixels prior to performing the edge detection.

10. A system as recited in Claim 9 wherein the identifying background pixels includes determining a source color range for the source image, selecting a background color range as a function of the source color range, and identifying as background pixels source-image pixels having color vectors within the background color range.

Documents

Application Documents

# Name Date
1 Power of Attorney [05-04-2017(online)].pdf 2017-04-05
2 Form 5 [05-04-2017(online)].pdf 2017-04-05
3 Form 3 [05-04-2017(online)].pdf 2017-04-05
4 Form 1 [05-04-2017(online)].pdf 2017-04-05
5 Drawing [05-04-2017(online)].pdf 2017-04-05
6 Description(Complete) [05-04-2017(online)].pdf_96.pdf 2017-04-05
7 Description(Complete) [05-04-2017(online)].pdf 2017-04-05
8 201741012256-FORM 18 [24-05-2019(online)].pdf 2019-05-24
9 201741012256-FER.pdf 2021-10-17

Search Strategy

1 2021-02-1215-02-02E_12-02-2021.pdf