
“A System To Generate A Photorealistic Video In A Virtual World And A Method Thereof”

Abstract: The present disclosure relates to a technique for generating a photorealistic video in a virtual world. The technique includes identifying and marking boundaries and coordinates of a real-world location using one or more sensors of a user device; capturing pictures of said real-world location with marked boundaries using one or more cameras of the user device; mapping the boundaries and coordinates of the captured real-world location into boundaries and coordinates of the virtual world and specifying one or more camera positions in the virtual world, based on said mapping; obtaining the user’s trajectories and movements from a shared experience of users in the virtual world through the one or more sensors of the user device; rendering the user’s trajectories and movements in real time by overlaying the user’s trajectories and movements onto the one or more captured pictures of said real-world scene; and rendering a virtual scene for display on the user device.


Patent Information

Application #:
Filing Date: 03 December 2020
Publication Number: 23/2022
Publication Type: INA
Invention Field: COMMUNICATION
Status:
Email: IPO@KNSPARTNERS.COM
Parent Application:

Applicants

HIKE PRIVATE LIMITED
Bharti Crescent, 1 Nelson Mandela Road Vasant Kunj, Phase - II New Delhi India 110070

Inventors

1. Mujtaba Hasan
Bharti Crescent, 1 Nelson Mandela Road Vasant Kunj, Phase - II New Delhi India 110070
2. Ankur Narang
Bharti Crescent, 1 Nelson Mandela Road Vasant Kunj, Phase - II New Delhi India 110070
3. Kavin Bharti Mittal
Bharti Crescent, 1 Nelson Mandela Road Vasant Kunj, Phase - II New Delhi India 110070

Specification

FIELD OF THE INVENTION:
[001] The present disclosure relates to a technique to generate a photorealistic video in a virtual
world.
BACKGROUND OF THE INVENTION:
[002] Smart mobile devices are becoming more common and sophisticated, allowing users to establish and maintain social connections virtually from anywhere using the internet. One way to achieve this is being able to share an experience, which may include allowing a group of users to view, hear, comment on, or be involved in the experience that one user is going through. This forms a shared experience. This sharing of experiences in a virtual world is possible through multiple sensors integrated within the smart mobile devices. However, generating those shared-experience moments in a photorealistic environment is also desirable. The virtual world may be a mobile, virtual reality (VR), or augmented reality (AR) application that lets the user interact visually and physically with virtual objects, agents and more.
[003] Current technology allows the creation of photorealistic videos only when the two persons are together. It may also involve a third person to record their experiences on a camera device. In this case, the viewing angle or camera position is limited by the reach of the person holding the camera. Further, most virtual worlds experienced by users do not offer a customized view or set-up based on the shared experience of the users. Since existing approaches do not take into account the shared experience happening between the users, the user experience may get lost.
[004] Therefore, there exists a need in the art for a technique that overcomes the above-mentioned problems and allows users to experience a photorealistic video generated from the shared experiences of the users in the virtual world.
[005] It may be noted that this Background is provided to introduce a brief context for the Summary and Detailed Description that follow. This Background is not intended to be an aid in determining the scope of the claimed subject matter, nor to be viewed as limiting the described subject matter to implementations that solve any or all of the disadvantages or problems presented by existing art.
SUMMARY OF THE INVENTION:
[006] The present disclosure overcomes one or more shortcomings of the prior art and provides
additional advantages discussed throughout the present disclosure. Additional features and
advantages are realized through the techniques of the present disclosure. Other
embodiments and aspects of the disclosure are described in detail herein and are considered
a part of the claimed disclosure.
[007] It is to be understood that the aspects and embodiments of the disclosure described below
may be used in any combination with each other. Several of the aspects and embodiments
may be combined together to form a further embodiment of the disclosure.
[008] In one non-limiting embodiment of the present disclosure, a method for generating a photorealistic video in a virtual world is disclosed. The method comprises identifying and marking boundaries and coordinates of a real-world location using one or more sensors of a user device. The method further comprises capturing pictures of said real-world location with marked boundaries using one or more cameras of the user device. The method further comprises mapping the boundaries and coordinates of the captured real-world location into boundaries and coordinates of the virtual world and specifying one or more camera positions in the virtual world, based on said mapping. Furthermore, the method comprises obtaining the user’s trajectories and movements from a shared experience of users in the virtual world through the one or more sensors of the user device. The method comprises rendering the user’s trajectories and movements in real time by overlaying the user’s trajectories and movements onto the one or more captured pictures of said real-world scene, based on the mapping. At last, a rendered virtual scene for display on the user device is provided. The above-mentioned method helps to provide realism of virtual objects by enhancing the amount of detail in the visual appearance, thereby providing an improved photorealistic user experience.
[009] In still another non-limiting embodiment of the present disclosure, the method further
comprises a pre-processing step of capturing images of the user from multiple poses and
obtaining a 3D model for the user from the captured images.
[010] In yet another non-limiting embodiment of the present disclosure, obtaining the user’s trajectories and movements from the shared experience in the virtual world further comprises specifying one or more camera positions and orientations during the shared experience and capturing the user’s activities and movements from the shared experience using the one or more sensors of the user device.
[011] In still another non-limiting embodiment of the present disclosure, rendering the user’s trajectories and movements in real time comprises mapping user expressions, lip syncs and body movements.
[012] In another non-limiting embodiment of the present disclosure, a system to generate a photorealistic video in a virtual world is disclosed. The system comprises a user device which further comprises one or more sensors. The one or more sensors are configured to identify and mark boundaries and coordinates of a real-world location. Further, the user device also comprises one or more cameras that are operatively coupled to the one or more sensors. The one or more cameras are configured to capture pictures of said real-world location with marked boundaries. Furthermore, the user device comprises a processing unit electronically coupled to the one or more sensors and the one or more cameras. The processing unit is configured to map the boundaries and the coordinates of the captured real-world location into boundaries and coordinates of the virtual world and specify the one or more camera positions, in the virtual world, based on said mapping. The processing unit is further configured to obtain the user’s trajectories and movements from the shared experience of users in the virtual world through the one or more sensors of the user device. Furthermore, the processing unit is configured to render the user’s trajectories and movements in real time by overlaying the user’s trajectories and movements onto the one or more captured pictures of said real-world scene, based on the mapping. At last, a rendered virtual scene for display on the user device is provided by the processing unit.
[013] In yet another non-limiting embodiment of the present disclosure, the processing unit is
further configured to perform a pre-processing step that comprises capturing images of the
user from multiple poses and obtaining a 3D model for the user from the captured images.
[014] In still another non-limiting embodiment of the present disclosure, the processing unit
when obtaining the user’s trajectories and movements from the shared experience in the
virtual world, is configured to specify one or more camera position and orientation during
the shared experience, and capture user’s activities and movements from the shared
experience using the one or more sensors of the user device.
[015] In yet another non-limiting embodiment of the present disclosure, the processing unit, when rendering the user’s trajectories and movements in real time, is configured to map user expressions, lip syncs and body movements onto the captured real-world location.
OBJECTS OF THE INVENTION:
[016] The main object of the present invention is to generate a photorealistic video in a virtual
world based on a shared experience.
[017] Another object of the present invention is to enhance the user experience by providing a personalized photorealistic experience of the virtual world to each user involved in the shared experience.
[018] Yet another object of the present invention is to provide independence of the camera position and orientation while tracking the movement trajectory of the user during the shared experience.
[019] Still another object of the present invention is that the background in the real world can be chosen as per the user’s wish, depending on the availability of the scene in camera coordinates.
BRIEF DESCRIPTION OF DRAWINGS:
[020] The accompanying drawings, which are incorporated in and constitute a part of this
disclosure, illustrate exemplary embodiments and, together with the description, serve to
explain the disclosed embodiments. In the figures, the left-most digit(s) of a reference
number identifies the figure in which the reference number first appears. The same
numbers are used throughout the figures to reference like features and components. Some
embodiments of system and/or methods in accordance with embodiments of the present
subject matter are now described, by way of example only, and with reference to the
accompanying figures, in which:
[021] Figure 1 illustrates a system for facilitating the present invention according to an
embodiment of the present disclosure.
[022] Figure 2 illustrates, by way of a block diagram, a user terminal for generating a photorealistic video in a virtual world according to an embodiment of the present disclosure.
[023] Figure 3 discloses a flowchart of a method for generating a photorealistic video in a virtual world according to an embodiment of the present disclosure.
[024] It should be appreciated by those skilled in the art that any block diagrams herein represent
conceptual views of illustrative systems embodying the principles of the present subject
matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition
diagrams, pseudo code, and the like represent various processes which may be substantially
represented in computer readable medium and executed by a computer or processor,
whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION OF DRAWINGS:
[025] Referring now to the drawings, there is shown an illustrative embodiment of the disclosure
“A system to generate photorealistic video in a virtual world and a method thereof”. It is
understood that the disclosure is susceptible to various modifications and alternative forms;
specific embodiments thereof have been shown by way of example in the drawings and
will be described in detail below. It will be appreciated as the description proceeds that the
disclosure may be realized in different embodiments.
[026] In the present document, the word “exemplary” is used herein to mean “serving as an
example, instance, or illustration”. Any embodiment or implementation of the present
subject matter described herein as “exemplary” is not necessarily to be construed as
preferred or advantageous over other embodiments.
[027] While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
[028] The terms “comprises”, “comprising”, “include(s)”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, system or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or system or method. In other words, one or more elements in a system or apparatus or device preceded by “comprises… a” does not, without more constraints, preclude the existence of other elements or additional elements in the system or apparatus.
[029] The term virtual world in the context of the present disclosure may refer to an environment, wherein said environment may represent a real or fictitious world governed by rules of interaction. In other words, a virtual world may refer to a simulated environment where a user may be able to make changes in the virtual environment as per his/her choice and is allowed to interact within such environment via his/her avatar. In particular, users in the virtual world may appear on a platform in the form of representations referred to as avatars. The degree of interaction between the avatars and the simulated environment may be implemented by one or more applications that govern such interactions as simulated physics, exchange of information between users, and the like. In an exemplary embodiment, the terms virtual world, virtual environment and virtual reality may be used interchangeably without departing from the scope of the present application.
[030] An avatar, in the context of the present application, relates to a graphical representation of a user, the user’s image/selfie or the user’s character. Thus, it may be said that an avatar may be configured to represent the emotion/expression/feeling of the user by means of an image converted into an avatar capturing such emotion/expression/feelings through various facial expressions or added objects such as hearts, kisses, etc. Further, it is to be appreciated that an avatar may take either a two-dimensional form, such as an icon on a virtual platform such as messaging/chat platforms, or a three-dimensional form, such as in a virtual environment.
[031] According to an aspect, the present disclosure provides a technique that enhances user experience by generating a photorealistic video in a virtual world. The technique enables creation of a photorealistic video from shared experiences of two or more users in the virtual world. The technique includes pre-processing steps which further include getting a user’s image(s) from one or multiple poses, which is done for every user involved in a shared experience, and then obtaining the 3D model/avatar of each person or user. These pre-processing steps are performed by each user device. Further, in the proposed technique, each user device/terminal marks the boundaries of a real-world location through one or more sensors. After marking boundaries in the real world, the user device captures said real-world location with the marked boundaries using one or more cameras of the user device. The technique further enables the user device to map the marked boundaries of the captured real-world location into boundaries and coordinates of the virtual world. Based on the mapping, a camera position is also specified in the virtual world. After the mapping is performed, the user device obtains the user’s trajectories and movements from the shared experience of users in the virtual world through the one or more sensors of the user device. Thus, when the shared experience happens, the technique allows the user device to generate the photorealistic video by rendering the user’s trajectories and movements captured from the shared experience, in real time. This is done by the user device by overlaying the user’s trajectories and movements onto the one or more captured pictures of said real-world scene, based on the mapping, and providing a rendered virtual scene for display on a user device.
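As a non-limiting illustration of the overall flow described above, the following is a minimal Python sketch that strings the stages together with simplified data structures. All class and function names here are hypothetical stand-ins for the sensor, camera and rendering components of an actual user device, not part of the disclosure itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) coordinate pair


@dataclass
class MovementFrame:
    """One captured moment of a user's movement during a shared experience."""
    timestamp: float
    position: Point          # user position in real-world coordinates
    expression: str          # e.g. "smile", "neutral" (placeholder label)


@dataclass
class PhotorealisticVideoPipeline:
    """Hypothetical sketch of the stages described in the disclosure."""
    real_boundary: List[Point] = field(default_factory=list)
    virtual_boundary: List[Point] = field(default_factory=list)
    background_pictures: List[str] = field(default_factory=list)
    frames: List[MovementFrame] = field(default_factory=list)

    def mark_boundaries(self, sensed_corners: List[Point]) -> None:
        # Step 1: identify and mark real-world boundaries via device sensors.
        self.real_boundary = sensed_corners

    def capture_background(self, picture_id: str) -> None:
        # Step 2: capture pictures of the marked real-world location.
        self.background_pictures.append(picture_id)

    def map_to_virtual(self, virtual_corners: List[Point]) -> None:
        # Step 3: associate real-world boundaries with virtual-world boundaries.
        self.virtual_boundary = virtual_corners

    def record_movement(self, frame: MovementFrame) -> None:
        # Step 4: obtain user trajectories/movements during the shared experience.
        self.frames.append(frame)

    def render(self) -> List[str]:
        # Step 5: overlay each recorded frame onto the captured background.
        return [f"{self.background_pictures[-1]} + avatar@{f.position} ({f.expression})"
                for f in self.frames]


pipeline = PhotorealisticVideoPipeline()
pipeline.mark_boundaries([(0, 0), (4, 0), (4, 3), (0, 3)])
pipeline.capture_background("living_room.jpg")
pipeline.map_to_virtual([(0, 0), (10, 0), (10, 10), (0, 10)])
pipeline.record_movement(MovementFrame(0.0, (1.0, 1.5), "smile"))
print(pipeline.render())
```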
[032] In an exemplary scenario, the technique also enables the user device to generate an artistic-looking video in a virtual world by applying a different mapping function to the obtained photorealistic video.
[033] In the following detailed description of the embodiments of the disclosure, reference is
made to the accompanying drawings that form a part hereof, and in which are shown by
way of illustration specific embodiments in which the disclosure may be practiced. These
embodiments are described in sufficient detail to enable those skilled in the art to practice
the disclosure, and it is to be understood that other embodiments may be utilized and that
changes may be made without departing from the scope of the present disclosure. The
following description is, therefore, not to be taken in a limiting sense.
[034] Fig. 1 shows an exemplary system 100 required to generate a photorealistic video from shared experiences in a virtual world. The system 100 may comprise at least two user terminals 102a…102n. Further, as shown in figure 1, the system 100 may comprise a server 108 which may be communicably coupled to each of the user terminals 102a…102n via a network 106. The server 108 further comprises a processor 110 and a memory 112.
[035] In the illustrated embodiment, the system 100 includes at least two user terminals 102a and 102b connected to a server 108 through a communication network 106. In an alternative embodiment, the system 100 may include a plurality of user terminals 102a, 102b… 102m, 102n. Each of the plurality of said user terminals 102a…102n may include one or more processing units, one or more cameras, one or more sensors and one or more memories communicably coupled to each other (as shown in figure 2), to implement one or more functionalities of the user terminals respectively. Examples of user terminals 102a…102n may include, but are not limited to, a personal computer, a mobile phone, a laptop, a tablet and so forth. Further, each of the user terminals 102a…102n may include any number of other components as required for their operation. However, description of such components has been avoided for the sake of brevity.
[036] The user terminals 102a…102n may be configured to capture multiple images of every user involved in a shared experience. The user terminals 102a…102n may be configured to capture multiple images of the users from one or more poses. In a non-limiting example, the user terminals 102a…102n may capture images from different viewing angles. Further, the user terminals 102a…102n may be configured to obtain a 3D model or avatar of each user with the help of the captured images of the users. The 3D model or avatar may be obtained using available machine learning techniques. In one non-limiting example, the machine learning technique may be a computer vision and pattern recognition technique.
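While full avatar reconstruction typically relies on learned models, the geometric core of recovering 3D structure from images taken from multiple poses can be illustrated with classical two-view triangulation. The following is a minimal, non-limiting Python sketch with synthetic camera matrices and a synthetic 3D point; the intrinsics and poses are hypothetical values chosen only for illustration.

```python
import numpy as np


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two camera views."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]                      # dehomogenise


# Synthetic example: two camera poses observing the same 3D point.
K = np.diag([800.0, 800.0, 1.0])             # simple pinhole intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                   # camera at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])   # shifted by 1 m
point = np.array([0.2, -0.1, 5.0, 1.0])      # homogeneous ground-truth point
x1 = (P1 @ point)[:2] / (P1 @ point)[2]      # projection into the first view
x2 = (P2 @ point)[:2] / (P2 @ point)[2]      # projection into the second view
print(triangulate(P1, P2, x1, x2))           # approximately [0.2, -0.1, 5.0]
```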
[037] The user terminals 102a…102n may be configured to identify and mark boundaries and coordinates of a real-world location of users involved in a shared experience. The shared experience may include any action that can be performed by two or more users simultaneously over a common platform, such as, but not limited to, watching a movie, a serial, a web series, a video clip or an audio clip. The user terminals 102a…102n may identify and capture pictures of said real-world location with marked boundaries, and the captured pictures may be sent to one or more other user terminals from the server 108 via the communication network.
[038] In an exemplary embodiment, user terminals 102a and 102b are in a chat session or conversation with each other, on a virtual communication platform. The user terminals 102a and 102b are both configured to mark boundaries of a desired real-world location via one or more sensors (explained in figure 2 below). Then, the user terminal 102a, via one or more cameras of the user terminal 102a, captures pictures of the desired real-world location with marked boundaries. In a non-limiting example, capturing pictures may include taking the focus of the one or more cameras (as shown in figure 2) on a real-world location such as a living room, shopping complex, theater, lawn, park, balcony, etc., which is of interest to the users, and then clicking an image of the location.
[039] Further, the user terminal 102a may be configured to obtain coordinates and boundaries of the captured real-world location and the virtual world, respectively. Based on these, the user terminal 102a may be configured to map the boundaries and the coordinates of the captured real-world location into the boundaries and coordinates of a virtual world. In a non-limiting example, the mapping between the two coordinate systems can be any suitable bijective mapping which is invertible. A linear mapping is an example. Another invertible mapping could be, for example, a third-order polynomial mapping. Depending on the mapping function, the rendering in different regions of the mapped space could behave differently by being stretched or squished. In another exemplary embodiment, the user terminal 102a may specify the one or more camera positions in the virtual world based on the mapping.
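As a concrete, non-limiting illustration of the linear (affine) case mentioned above, the sketch below maps points inside a marked real-world rectangle onto a virtual-world rectangle and back again. The boundary extents (a 4 m × 3 m room mapped to a 10 × 10 virtual region) are hypothetical values chosen only for illustration.

```python
import math


def make_affine_1d(src_min, src_max, dst_min, dst_max):
    """Return a 1-D affine map and its inverse between two intervals."""
    scale = (dst_max - dst_min) / (src_max - src_min)
    forward = lambda x: dst_min + (x - src_min) * scale
    inverse = lambda y: src_min + (y - dst_min) / scale
    return forward, inverse


# Hypothetical boundaries: a 4 m x 3 m real-world room mapped onto a
# 10 x 10 unit region of the virtual world.
fx, fx_inv = make_affine_1d(0.0, 4.0, 0.0, 10.0)
fy, fy_inv = make_affine_1d(0.0, 3.0, 0.0, 10.0)


def real_to_virtual(p):
    return (fx(p[0]), fy(p[1]))


def virtual_to_real(p):
    return (fx_inv(p[0]), fy_inv(p[1]))


real_point = (2.0, 1.5)                      # centre of the real-world room
virtual_point = real_to_virtual(real_point)  # -> (5.0, 5.0)
back = virtual_to_real(virtual_point)
assert all(math.isclose(a, b) for a, b in zip(back, real_point))  # mapping is invertible
print(virtual_point, back)
```

A third-order polynomial mapping would replace the affine functions with cubic ones whose coefficients are chosen so that the map remains monotonic, and therefore invertible, over the marked region.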
[040] Furthermore, the user terminals 102a…102n may be configured to obtain the user’s trajectories and movements from the shared experience of users in the virtual world. In a non-limiting example, the camera viewing trajectory may be different for different users. In another non-limiting example, the lighting source as well as the direction of the camera viewing trajectory may also be customizable.
[041] In one of the exemplary embodiments, the captured real-world location is treated as the ‘background’ and the captured user’s images are treated as the ‘foreground’. The user’s trajectories and movements captured from the shared experience in the virtual world are overlaid with the real-world location as background and the user’s image as foreground. In other words, the user terminals 102a…102n render the user’s trajectories and movements in real time by overlaying the user’s trajectories and movements onto the one or more captured pictures of said real-world scene, based on the mapping. The user terminals 102a…102n then display the rendered virtual scene. In another embodiment, the rendered virtual scene may also be provided to and stored at the server 108 via the communication network.
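A minimal, non-limiting sketch of the background/foreground overlay described above is given below, assuming the captured picture of the real-world location and the rendered avatar frame are available as NumPy arrays. The image sizes, placement offsets and matte values are hypothetical.

```python
import numpy as np


def overlay(background: np.ndarray, foreground: np.ndarray,
            alpha: np.ndarray, top: int, left: int) -> np.ndarray:
    """Alpha-composite a rendered avatar frame onto a captured background.

    background : HxWx3 image of the captured real-world location
    foreground : hxwx3 rendered avatar frame
    alpha      : hxw matte in [0, 1] (1 = avatar pixel, 0 = transparent)
    top, left  : placement derived from the mapped user trajectory
    """
    out = background.astype(np.float32).copy()
    h, w = foreground.shape[:2]
    region = out[top:top + h, left:left + w]
    a = alpha[..., None]                              # broadcast matte over RGB
    region[:] = a * foreground + (1.0 - a) * region   # standard "over" operator
    return out.astype(np.uint8)


# Hypothetical data: grey background, bright avatar patch with a soft matte.
bg = np.full((480, 640, 3), 128, dtype=np.uint8)
fg = np.full((100, 60, 3), 230, dtype=np.uint8)
matte = np.ones((100, 60), dtype=np.float32) * 0.9
frame = overlay(bg, fg, matte, top=200, left=300)
print(frame.shape, frame[250, 320])                   # composited pixel value
```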
[042] It may be worth noting that each of the user terminals 102a…102n may perform similar operations/functions as described in paragraphs [038]-[043], to achieve the technical effects/advantages of the present disclosure.
[043] FIG. 2 shows a detailed block diagram 200 of a user terminal 102 of the system 100 in FIG. 1, in accordance with an embodiment of the present disclosure. According to an embodiment of the present disclosure, the user terminal 102 may comprise an input/output (I/O) interface 202, a processing unit 204, one or more cameras 206, one or more sensors 208, and a memory 210. The I/O interface 202 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, an input device, an output device and the like. The I/O interface 202 may allow the user terminal 102 to interact with the user directly or through other devices.
[044] In an exemplary embodiment, there can be many different types of sensors in the user’s mobile device or in some connected wearable IoT devices. These sensors are able to capture the user’s movements of different body parts. These movements may be aggregated for all the users, and the shared experience in the virtual environment may be created on the server. In one non-limiting example, the one or more sensors which could be used are a gyroscope sensor, an accelerometer sensor, etc. The 3D reconstruction of the real environment may be done using multiple images taken on the mobile device.
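As a non-limiting illustration of how such sensor samples could be aggregated into a movement trajectory, the sketch below performs naive dead reckoning by double-integrating accelerometer readings over time. The sampling rate and readings are hypothetical, and a practical implementation would additionally fuse gyroscope data and apply filtering to control drift.

```python
from typing import List, Tuple

AccelSample = Tuple[float, float, float]   # acceleration (ax, ay, az) in m/s^2


def integrate_trajectory(samples: List[AccelSample],
                         dt: float) -> List[Tuple[float, float, float]]:
    """Double-integrate accelerometer samples into positions (naive dead reckoning)."""
    vx = vy = vz = 0.0
    x = y = z = 0.0
    trajectory = []
    for ax, ay, az in samples:
        # Velocity update from acceleration, then position update from velocity.
        vx += ax * dt
        vy += ay * dt
        vz += az * dt
        x += vx * dt
        y += vy * dt
        z += vz * dt
        trajectory.append((x, y, z))
    return trajectory


# Hypothetical 50 Hz readings: brief acceleration along x, then coasting.
readings = [(1.0, 0.0, 0.0)] * 10 + [(0.0, 0.0, 0.0)] * 40
print(integrate_trajectory(readings, dt=0.02)[-1])   # final estimated position
```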
[045] The memory 210 is communicatively coupled to the processing unit 204, the one or more sensors 208, the I/O interface 202 and the one or more cameras 206. Further, the memory 210 may
store information not limited to, images or pictures, videos, any other social media
information, communications with other people, chat logs, location of the user, checked-in
place information, interaction with one or more real-world entities, browsing history, etc.
[046] In an embodiment, the memory 210 may be a computer-readable medium known in the art
including, for example, volatile memory, such as static random access memory (SRAM)
and dynamic random access memory (DRAM), and/or non-volatile memory, such as read
only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical
disks, and magnetic tapes.
[047] In an embodiment, the information may be stored within the memory 210 in the form of
various data structures. Additionally, the information stored in memory may be organized
using data models, such as relational or hierarchical data models or lookup tables. The
memory may also store other data such as temporary data and temporary files, generated
by the various units 204-208 for performing the various functions of the user terminal 102.
[048] In an embodiment, the information may be processed by one or more units 204-208 of the
user terminal 102. In a non-limiting exemplary implementation, the one or more units 204-
208 may form part of the processing unit 204. In another implementation, the one or more
units 204-208 may be communicatively coupled to each other for performing one or more
functions of the user terminal 102. As used herein, the term ‘unit’ refers to an application
specific integrated circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or
group), a combinational logic circuit, and/or other suitable components that provide the
described functionality. In an embodiment, the other units may be used to perform various
miscellaneous functionalities of the user terminal 102. It will be appreciated that such units
may be represented as a single unit or a combination of different units.
[049] In an embodiment, the processing unit 204 may be configured to perform a pre-processing
step of capturing images of each user, from multiple poses and obtaining a 3D model for
the user from the captured images. The images of the users may be captured from one or
more cameras 206 of the user terminal 102.
[050] In accordance with an embodiment of the present disclosure, the one or more sensors 208 may be configured to identify and mark boundaries of a real-world location. The identified and marked boundaries may be stored in the memory 210. Further, the one or more cameras 206 operatively coupled to the one or more sensors 208 are configured to capture pictures of said real-world location with marked boundaries. These pictures may be stored in the memory 210.
[051] The processing unit 204 is operatively coupled to the one or more sensors 208 and the one
or more cameras 206. The processing unit 204 is configured to map the boundaries and the
coordinates of the captured real-world location into boundaries and coordinates of the
virtual world and specify the at least one camera position, in the virtual world, based on
said mapping. The processing unit 204 is further configured to obtain user’s trajectories
and movements from the shared experience of users in the virtual world, through the one
or more sensors 208. In an embodiment, the processing unit 204, when obtaining the user’s trajectories and movements from the shared experience in the virtual world, may be further configured to specify the one or more camera positions and orientations during the shared experience. In another embodiment, specifying the one or more camera positions may refer to different orientations and viewing angles of the one or more cameras 206 of the user terminal 102.
[052] In an embodiment, the processing unit 204, when obtaining the user’s trajectories and movements from the shared experience in the virtual world, is further configured to capture the user’s trajectories and movements from the shared experience using the one or more sensors of the user terminal 102. In a non-limiting example, the user’s activities or trajectories may include, but are not limited to, viewing, hearing, commenting on, or being involved in an experience that one user is going through, sharing pictures and other media items, communications with other users, interaction with one or more real-world entities, browsing history, etc.
[053] Further, the processing unit 204 may render the user’s trajectories and movements in real time by overlaying the user’s trajectories and movements onto the one or more captured pictures of said real-world scene, based on the mapping. In an exemplary embodiment, the processing unit 204, when rendering the user’s trajectories and movements in real time, is further configured to map user expressions, lip syncs and body movements onto the captured real-world location. In a non-limiting example, the rendering can be performed using available rendering techniques such as differential rendering, HDR rendering, ray tracing, z-buffering, etc. However, the rendering may be performed by using any other rendering technique or algorithm. At last, the processing unit 204 is further configured to provide a rendered virtual scene for display (not shown) on the user terminal 102.
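Of the rendering techniques listed above, z-buffering is straightforward to illustrate: each layer carries a per-pixel depth, and the nearest surface wins at every pixel. The following is a minimal, non-limiting NumPy sketch with hypothetical background and avatar layers (real engines operate on rasterised triangles rather than whole image layers).

```python
import numpy as np


def zbuffer_composite(layers):
    """Composite (color, depth) layers so the nearest surface wins per pixel.

    layers : list of (HxWx3 color array, HxW depth array) pairs; smaller depth = closer.
    """
    h, w, _ = layers[0][0].shape
    out = np.zeros((h, w, 3), dtype=np.uint8)
    zbuf = np.full((h, w), np.inf)
    for color, depth in layers:
        closer = depth < zbuf              # pixels where this layer is in front
        out[closer] = color[closer]
        zbuf[closer] = depth[closer]
    return out


# Hypothetical layers: a far-away background and a closer avatar patch in the centre.
h, w = 120, 160
background = (np.full((h, w, 3), 100, np.uint8), np.full((h, w), 10.0))
avatar_color = np.full((h, w, 3), 220, np.uint8)
avatar_depth = np.full((h, w), np.inf)      # transparent everywhere ...
avatar_depth[40:80, 60:100] = 2.0           # ... except where the avatar is drawn
frame = zbuffer_composite([background, (avatar_color, avatar_depth)])
print(frame[60, 80], frame[0, 0])           # avatar pixel vs background pixel
```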
[054] To explain the embodiments defined in Fig. 2, let us consider an example in which a user of the terminal 102a is in a conversation with a user of the other terminal 102b, on a virtual communication platform. The conversation may be by way of exchanging text, stickers, emoji, video, audio, etc., but is not limited thereto. The users of the terminals 102a and 102b, already involved in the conversation, may want to watch a movie together on the virtual communication platform/virtual world. Each of the users of terminals 102a and 102b may want to watch this movie in a photorealistic environment of their choice, to experience the realism of virtual objects and thus to truly enjoy the environment they are in. In other words, each of the users may choose a background or scene in the real world as per his/her wish, based on the availability of the scene. Once it is identified by the respective user terminals that the users want to watch a movie together, the respective user terminals may determine the user preference to generate a background for creating the photorealistic environment for watching the movie. One user may prefer to watch this movie in his bedroom while the other one may prefer to watch this movie in his balcony. Other real-world scenes that the users may prefer include, but are not limited to, lawns, gardens, parks, terraces, etc., whichever is available at the moment and suits the shared experience they want to be in.
[055] Based on this preference of the user, the terminal 102a may mark boundaries of a preferred location in the bedroom through one or more sensors and may capture pictures of said preferred location in the bedroom with marked boundaries through one or more cameras. In a similar manner, the terminal 102b may mark and capture a preferred location in the balcony. These captured locations will act as a ‘background’ in the photorealistic environment.
[056] Before capturing the user-preferred real-world scenes as above, the user terminals 102a and 102b may capture images of each user from multiple poses and obtain a 3D model or avatar for each user from the captured images. Each of the users in the virtual world may appear in the form of these avatars. These avatars act as a ‘foreground’ in the photorealistic environment. To create a photorealistic video or environment, the user terminals 102a and 102b may map the boundaries and the coordinates of the captured real-world scene into boundaries and coordinates of the virtual world. Further, the user terminals 102a and 102b may capture the user’s movements of different body parts while the users are in conversation with each other. For example, the user’s movements may include the user’s facial expressions, lip syncs and body part movements, etc. These movements may be aggregated for all the users by the respective terminals, and the shared experience, e.g. watching a movie together in the virtual environment, will be created on the server in real time. Each of the user terminals 102a and 102b may accomplish this by rendering the user’s movements in real time by overlaying the user’s movements onto the captured real-world scene, based on the mapping. For example, the obtained user movements from the virtual world are overlaid on the real-world background and the avatars in the foreground, and displayed on a display of the respective user terminal.
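As a non-limiting illustration of the server-side aggregation described above, the sketch below merges timestamped movement samples from several user terminals into a single ordered timeline for the shared experience. The stream structure and sample data are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

MovementEvent = Tuple[float, str, str]   # (timestamp in seconds, user id, movement label)


def merge_shared_experience(
        streams: Dict[str, List[Tuple[float, str]]]) -> List[MovementEvent]:
    """Merge per-user (timestamp, movement) streams into one time-ordered timeline."""
    tagged = ([(t, user, move) for t, move in samples]
              for user, samples in streams.items())
    # Each per-user stream is already time-ordered, so a k-way merge suffices.
    return list(heapq.merge(*tagged))


# Hypothetical movement streams reported by two user terminals.
streams = {
    "user_a": [(0.00, "raises hand"), (0.40, "smiles"), (1.10, "leans left")],
    "user_b": [(0.20, "laughs"), (0.90, "points at screen")],
}
for timestamp, user, movement in merge_shared_experience(streams):
    print(f"{timestamp:.2f}s  {user}: {movement}")
```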
[057] In this manner, the user terminals may ensure that the desired photorealistic environment/video in the virtual world is generated based on the choice of the user, which enhances the user experience. The user terminal may also dynamically update the virtual world based on the user’s interest. Further, the user terminals may monitor the user’s activity/movements continuously during a shared experience and generate the desired photorealistic environment/video in the virtual world accordingly.
[058] In an embodiment of the present disclosure, the photorealistic video may be transferable to a domain other than the realistic domain, such as an artistic domain, e.g. painting, drawing, etc. Therefore, in the final rendered image, some part of the image could be rendered in the photorealistic domain and another part in an art domain. In a non-limiting example, a user’s one hand could look like a real hand while the other one could look like an artistically generated hand.
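One simple, non-limiting way to realise such a partial domain transfer is to apply an artistic mapping only inside a mask that selects the chosen region of the rendered frame. In the sketch below the artistic mapping is plain colour posterisation, standing in for whatever style-transfer model an actual system might use; the frame and mask are hypothetical.

```python
import numpy as np


def stylise_region(frame: np.ndarray, mask: np.ndarray, levels: int = 4) -> np.ndarray:
    """Re-render only the masked part of a frame in a simple 'artistic' style.

    frame  : HxWx3 uint8 rendered photorealistic frame
    mask   : HxW boolean array selecting the region to move into the art domain
    levels : number of colour levels per channel used for posterisation
    """
    out = frame.copy()
    step = 256 // levels
    posterised = (frame // step) * step + step // 2   # quantise each colour channel
    out[mask] = posterised[mask]                      # art domain inside the mask only
    return out


# Hypothetical frame: keep the left half photorealistic, stylise the right half.
frame = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
mask = np.zeros((240, 320), dtype=bool)
mask[:, 160:] = True
stylised = stylise_region(frame, mask)
print(frame[0, 300], stylised[0, 300])   # original vs posterised pixel
```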
[059] It may be worth noting that the aforementioned paragraphs provide various technical effects or advantages, such as that the users involved in the shared experience need not be together at one place for the creation of the photorealistic video. Further, the technique provides an enhanced/enriched user experience by providing a photorealistic video based on the user’s trajectories and movements in real time as captured during a shared experience. Furthermore, the method also provides the technical effect of an improved user experience by choosing the background in the real world as per the wish of the user, based on the availability of the scene.
[060] FIG. 3 shows a flowchart of an exemplary method 300 for enhancing the user experience
by generating a photorealistic video in a virtual world, in accordance with another
embodiment of the present disclosure. The method starts with pre-processing, which comprises capturing the user’s images from one or more poses. The step of capturing
images is performed for every user involved in a shared experience. The pre-processing
also comprises obtaining a 3D model for every user from the captured images. The 3D
model may be obtained using available machine learning techniques. In a non-limiting
example, the machine learning techniques may include computer vision and pattern
recognition technique. However, the 3D model may be obtained by using any other
machine learning technique or algorithm.
[061] At block 302, the method may describe identifying and marking boundaries and
coordinates of a real-world location using one or more sensors 208 of user terminals 102.
In an exemplary embodiment, the users involved in the shared experience mark the boundaries in the real world through the one or more sensors 208 of the user terminal 102.
[062] At block 304, the method may describe capturing pictures of said real-world location with marked boundaries using one or more cameras 206 of the user terminal 102. In an exemplary embodiment, a guiding person or the user may be involved to capture real pictures of the marked boundaries in the real world. In an exemplary embodiment, the virtual world may be enhanced by positioning the one or more sensors and the one or more cameras according to the interests of the users, for example according to the interest levels of the users associated with particular components, locations, and places in the virtual world. Further, the captured pictures may be stored in the memory 210. In a non-limiting example, the captured pictures may be stored along with the timing information of the picture, such as time duration and date, but not limited thereto.
[063] In a similar manner, boundaries and coordinates of the virtual world are also identified and marked, and capturing of pictures of said virtual world with marked boundaries is also performed. In another exemplary embodiment, the boundaries may be marked by taking the focus of the one or more cameras on a foreground section of the real-world location. In an exemplary embodiment, the captured real-world boundaries may be treated as background and the user picture may be treated as foreground, but not limited thereto.
[064] At block 306, the method may describe mapping the boundaries and the coordinates of the captured real-world location into boundaries and coordinates of the virtual world. In an exemplary embodiment, the marked boundaries of the real world are mapped to the boundaries in the virtual world and the camera position is specified. In a non-limiting example, specifying the camera position may refer to the orientation or viewing angle of the camera while capturing the pictures of the user involved in the shared experience.
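To illustrate what specifying a camera position and orientation can amount to in practice, the sketch below builds a standard look-at view matrix from a camera position and viewing target expressed in the mapped virtual-world coordinates. This is a generic computer-graphics construction given as a non-limiting example; the eye and target coordinates are hypothetical.

```python
import numpy as np


def look_at(eye, target, up=(0.0, 1.0, 0.0)) -> np.ndarray:
    """Build a 4x4 view matrix for a virtual camera at `eye` looking at `target`."""
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = right, true_up, -forward
    view[:3, 3] = -view[:3, :3] @ eye   # translate world coordinates into camera space
    return view


# Hypothetical virtual-world camera: placed near a corner, looking at the room centre.
print(look_at(eye=(10.0, 2.0, 10.0), target=(5.0, 1.0, 5.0)))
```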
[065] At block 308, based on the mapping, the user’s trajectories and movements from the shared experience of users in the virtual world are obtained. For example, for the users who are involved in the shared experience or share time in the virtual world, each of their movements is captured through the one or more sensors 208. In an exemplary embodiment, the shared experience may comprise interaction with one or more real-world entities, such as watching a movie together, but not limited thereto. In a non-limiting example, the earlier obtained real-world boundaries may be treated as background and the user images may be treated as foreground, and vice versa.
[066] In an embodiment, the method of obtaining the user’s trajectories and movements from the shared experience in the virtual world further comprises specifying one or more camera positions and orientations during the shared experience. In another embodiment, the method of obtaining the user’s trajectories and movements from the shared experience in the virtual world further comprises capturing the user’s activities and movements from the shared experience using the one or more sensors 208 of the user terminal 102. In a non-limiting example, the captured user’s activities may include, but are not limited to, social media information, communications with other people, chat logs, the location of the user, checked-in place information, interaction with one or more real-world entities, browsing history, etc.
[067] At block 310, the method may describe rendering the user’s trajectories and movements in real time by overlaying the user’s trajectories and movements onto the one or more captured pictures of said real-world scene, based on the mapping. For example, the obtained trajectories from the virtual world are overlaid on the real-world background, with the users in the foreground marking those trajectories. In another embodiment, rendering the user’s trajectories and movements in real time further comprises mapping user expressions, lip syncs and body movements.
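User expressions and lip sync are commonly applied to an avatar through blendshapes, i.e. a neutral mesh deformed by a weighted sum of per-vertex offsets. The sketch below shows that generic construction as one non-limiting way the expression mapping described above could be realised; the tiny four-vertex mesh and the shape names are hypothetical.

```python
import numpy as np


def apply_blendshapes(base: np.ndarray, deltas: dict, weights: dict) -> np.ndarray:
    """Deform a neutral avatar mesh by a weighted sum of blendshape offsets.

    base    : Nx3 vertex positions of the neutral avatar mesh
    deltas  : {shape name: Nx3 per-vertex offset from the neutral mesh}
    weights : {shape name: weight in [0, 1]} driven by tracked expressions/lip sync
    """
    mesh = base.astype(float).copy()
    for name, weight in weights.items():
        mesh += weight * deltas[name]
    return mesh


# Hypothetical tiny "mesh" of four vertices with two expression shapes.
base = np.zeros((4, 3))
deltas = {
    "smile":    np.array([[0, 0.2, 0], [0, 0.2, 0], [0, 0.0, 0], [0, 0.0, 0]]),
    "jaw_open": np.array([[0, 0.0, 0], [0, 0.0, 0], [0, -0.5, 0], [0, -0.5, 0]]),
}
weights = {"smile": 0.8, "jaw_open": 0.3}   # e.g. driven by the sensed lip sync
print(apply_blendshapes(base, deltas, weights))
```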
[068] At block 312, the method may describe providing a rendered virtual scene for display on the user terminal 102. The method as discussed above in steps 302-312 provides various technical effects or advantages, such as that the users involved in the shared experience need not be together at one place for the creation of the video. Further, the method provides an enhanced user experience by providing a photorealistic virtual-experience video of real-world scenes. In an exemplary embodiment, the photorealistic video may be experienced in other artistic domains such as painting, drawing, etc. Furthermore, the method also provides the technical effect of an improved user experience by choosing the background in the real world as per the user’s wish, based on the availability of the scene in the camera coordinates.
[069] The illustrated steps are set out to explain the exemplary embodiments shown, and it should
be anticipated that ongoing technological development will change the manner in which
particular functions are performed. These examples are presented herein for purposes of
illustration, and not limitation. Further, the boundaries of the functional building blocks
have been arbitrarily defined herein for the convenience of the description. Alternative
boundaries can be defined so long as the specified functions and relationships thereof are
appropriately performed.
[070] Alternatives (including equivalents, extensions, variations, deviations, etc., of those
described herein) will be apparent to persons skilled in the relevant art(s) based on the
teachings contained herein. Such alternatives fall within the scope and spirit of the
disclosed embodiments.
[071] Furthermore, one or more computer-readable storage media may be utilized in
implementing embodiments consistent with the present disclosure. A computer-readable
storage medium refers to any type of physical memory on which information or data
readable by a processor may be stored. Thus, a computer-readable storage medium may
store instructions for execution by one or more processors, including instructions for
causing the processor(s) to perform steps or stages consistent with the embodiments
described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., it is non-transitory.
Examples include random access memory (RAM), read-only memory (ROM), volatile
memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any
other known physical storage media.
[072] Suitable processors include, by way of example, a general purpose processor, a special
purpose processor, a conventional processor, a digital signal processor (DSP), a plurality
of microprocessors, one or more microprocessors in association with a DSP core, a
controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field
Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC),
and/or a state machine.

We claim:
1. A method for generating a photorealistic video in a virtual world, the method comprising:
identifying and marking boundaries and coordinates of a real-world location using
one or more sensors of a user device;
capturing pictures of said real-world location with marked boundaries using one or
more cameras of the user device;
mapping the boundaries and the coordinates of the captured real-world location into
boundaries and coordinates of the virtual world and specifying one or more camera
positions in the virtual world, based on said mapping;
obtaining user’s trajectories and movements from shared experience of users in the
virtual world through the one or more sensors of the user device;
rendering the user’s trajectories and movements in real time by overlaying the
user’s trajectories and movements onto the one or more captured pictures of said real-world
location, based on the mapping; and
providing a rendered virtual scene for display on the user device.
2. The method of claim 1, further comprising a pre-processing step that comprises:
capturing images of the user from multiple poses; and
obtaining a 3D model for the user from the captured images.
3. The method of claim 1, wherein obtaining the user’s trajectories and movements from the
shared experience in the virtual world comprises:
specifying one or more camera position and orientation during the shared
experience; and
capturing user’s activities and movements from the shared experience using the one
or more sensors of the user device.
4. The method of claim 1, wherein rendering the user’s trajectories and movements in real
time comprises:
mapping user expressions, lip syncs and body movements.
5. A system to generate a photorealistic video in a virtual world, the system comprising:
a user device comprising:
one or more sensors configured to:
identify and mark boundaries and coordinates of a real-world location;
one or more cameras operatively coupled to the one or more sensors and configured
to:
capture pictures of said real-world location with marked boundaries; and
a processing unit electronically coupled to the one or more sensors and the one or
more cameras, wherein said processing unit is configured to:
map the boundaries and the coordinates of the captured real-world location
into boundaries and coordinates of the virtual world and specify the one or more
camera positions, in the virtual world, based on said mapping;
obtain user’s trajectories and movements from shared experience of users
in the virtual world through the one or more sensors of the user device;
render the user’s trajectories and movements in real time by overlaying the
user’s trajectories and movements onto the one or more captured pictures of said
real-world location, based on the mapping; and
provide a rendered virtual scene for display on a user device.
6. The system of claim 5, wherein the processing unit is further configured to perform a pre-processing step that comprises capturing images of the user from multiple poses; and
obtaining a 3D model for the user from the captured images.
7. The system of claim 5, wherein the processing unit when obtaining the user’s trajectories
and movements from the shared experience in the virtual world, is configured to:
specify one or more camera position and orientation during the shared experience;
and
capture user’s activities and movements from the shared experience using the one
or more sensors of the user device.
8. The system of claim 5, wherein the processing unit when rendering the user’s trajectories
and movements in real time, is configured to map user expressions, lip syncs and body
movements onto captured real-world location.

Documents

Application Documents

# Name Date
1 202011052704-STATEMENT OF UNDERTAKING (FORM 3) [03-12-2020(online)].pdf 2020-12-03
2 202011052704-PROVISIONAL SPECIFICATION [03-12-2020(online)].pdf 2020-12-03
3 202011052704-FORM 1 [03-12-2020(online)].pdf 2020-12-03
4 202011052704-DRAWINGS [03-12-2020(online)].pdf 2020-12-03
5 202011052704-DECLARATION OF INVENTORSHIP (FORM 5) [03-12-2020(online)].pdf 2020-12-03
6 202011052704-FORM-26 [04-12-2020(online)].pdf 2020-12-04
7 202011052704-DRAWING [07-12-2020(online)].pdf 2020-12-07
8 202011052704-CORRESPONDENCE-OTHERS [07-12-2020(online)].pdf 2020-12-07
9 202011052704-COMPLETE SPECIFICATION [07-12-2020(online)].pdf 2020-12-07
10 202011052704-Proof of Right [16-12-2020(online)].pdf 2020-12-16
11 202011052704-FORM 18 [30-09-2024(online)].pdf 2024-09-30