Abstract: SYSTEM AND METHOD FOR GENERATING VIRTUAL CONTENT FOR A LIVE EVENT. Disclosed is a system and a method for generating virtual content for a live event. The system may receive event data comprising venue data and broadcast data. Further, a three dimensional (3D) digital twin may be generated based on the venue data. Subsequently, the system may identify one or more sub-events from the broadcast data. Furthermore, the system may select a time frame and an area of interest based on the one or more sub-events. The system may recreate the one or more sub-events in the 3D digital twin based on the time frame and the area of interest using a transformation model that may employ machine learning algorithms trained for 3D reconstruction. [To be published with Figure 1]
Description: FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR GENERATING VIRTUAL CONTENT FOR A LIVE EVENT
APPLICANT:
Quidich Innovation Labs Pvt. Ltd.
A Company Registered in India,
Having address as:
No 6, Keytuo, Kondivita Rd, M.I.D.C, Andheri East, Mumbai, 400059
The following specification describes the invention and the manner in which it is to be performed.
PRIORITY INFORMATION
[001] The present application does not claim a priority from any other application.
TECHNICAL FIELD
[002] The present subject matter described herein, in general, pertains to the technical field of multimedia technology and digital content creation. Specifically, the invention relates to systems and methods for generating virtual content for a live event.
BACKGROUND
[003] Due to advancements in broadcasting and multimedia, people all over the world can now enjoy watching and participating in live events, such as sporting events, media events, political events, technological events, community events, and the like. In order to increase the viewers' knowledge and enjoyment of the events, these broadcasts often include video footage, live commentary, and metadata. However, a persistent difficulty arises in the field of broadcasting: the need to quickly locate and review critical occurrences (also known as turning points) during a live broadcast. For example, in a sporting event, scoring plays, extraordinary athletic performances, situations that change the course of the game, and the audience's emotional response are all examples of crucial events. Even though these moments are invaluable to viewers, commentators, and broadcasters, locating them again in the middle of a live broadcast can be cumbersome. When using conventional techniques, viewers frequently need to manually go through large recordings or look elsewhere for selective event retrieval. As a result of this restriction, novel solutions are needed that make use of digital content creation technology to provide a more engaging and interactive viewing experience for viewers.
SUMMARY
[004] Before the present system(s) and method(s) are described, it is to be understood that this application is not limited to the particular system(s) and methodologies described, as there can be multiple possible embodiments which are not expressly illustrated in the present disclosure. It is also to be understood that the terminology used in the description is for the purpose of describing the particular implementations or versions or embodiments only and is not intended to limit the scope of the present application. This summary is provided to introduce aspects related to a system and a method for generating virtual content for a live event. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[005] In one implementation, a method for generating virtual content for a live event is disclosed. The method may comprise receiving event data comprising venue data and broadcast data. The venue data may comprise at least one of measurements of a venue, one or more images of the venue, and location of the venue. The broadcast data may include at least one of a live video stream and time stamp data. Further, the method may comprise generating a three dimensional (3D) digital twin based on the venue data. The 3D digital twin may be generated using at least one of computer vision, photogrammetry, and computer graphics techniques. Concurrently, one or more sub-events may be identified from the broadcast data. The one or more sub-events correspond to a key moment of the live event. The one or more sub-events from the broadcast data may be identified using a machine learning model. Furthermore, the method may comprise selecting a time frame and an area of interest in the broadcast data based on the one or more sub-events. The area of interest may include at least one of an area of a venue in the broadcast data and an object in the broadcast data. A beginning and an end of the sub-event are indicated by the time frame. Further, the one or more sub-events may be recreated in the 3D digital twin based on the time frame and the area of interest. The one or more sub-events may be recreated using a transformation model. The transformation model may employ a plurality of machine learning algorithms to convert the one or more sub-events from the broadcast data to virtual content in the 3D digital object. The method may comprise receiving input from a user. The input may comprise a viewpoint and a playback speed. The one or more sub-events may be recreated based on the viewpoint and the playback speed. The one or more sub-events recreated in the 3D digital twin may be viewed on multimedia devices including at least one of a mobile, a tablet, a Virtual Reality (VR) headset, a television (TV), and an Augmented Reality (AR) Headset. The recreation of the one or more sub-events in the 3D digital twin may be performed in real-time.
[006] In another implementation, a system for generating virtual content for a live event is disclosed. The system may receive event data comprising venue data and broadcast data. The venue data may comprise at least one of measurements of a venue, one or more images of the venue, and location of the venue. The broadcast data may include at least one of a live video stream and time stamp data. Further, the system may generate a three dimensional (3D) digital twin based on the venue data. The 3D digital twin may be generated using at least one of computer vision, photogrammetry, and computer graphics techniques. Concurrently, one or more sub-events may be identified from the broadcast data. The one or more sub-events correspond to a key moment of the live event. The one or more sub-events from the broadcast data may be identified using a machine learning model. Furthermore, the system may select a time frame and an area of interest in the broadcast data based on the one or more sub-events. The area of interest may include at least one of a point of a venue in the broadcast data and an object in the broadcast data. A beginning and an end of the sub-event are indicated by the time frame. Further, the one or more sub-events may be recreated in the 3D digital twin based on the time frame and the area of interest. The one or more sub-events may be recreated using a transformation model. The transformation model may employ a plurality of machine learning algorithms to convert the one or more sub-events from the broadcast data to virtual content in the 3D digital object. The system may receive input from a user. The input may comprise a viewpoint and a playback speed. The one or more sub-events may be recreated based on the viewpoint and the playback speed. The one or more sub-events recreated in the 3D digital twin may be viewed on multimedia devices including at least one of a mobile, a tablet, a Virtual Reality (VR) headset, a television (TV), and an Augmented Reality (AR) Headset. The recreation of the one or more sub-events in the 3D digital twin may be performed in real-time.
[007] In another implementation, a non-transitory computer program product having embodied thereon a computer program for generating virtual content for a live event is disclosed. The computer program may comprise a program code for receiving event data comprising venue data and broadcast data. The venue data may comprise at least one of measurements of a venue, one or more images of the venue, and location of the venue. The broadcast data may include at least one of a live video stream and time stamp data. Further, the program may comprise a program code for generating a three dimensional (3D) digital twin based on the venue data. The 3D digital twin may be generated using at least one of computer vision, photogrammetry, and computer graphics techniques. Concurrently, one or more sub-events may be identified from the broadcast data. The one or more sub-events correspond to a key moment of the live event. The one or more sub-events from the broadcast data may be identified using a machine learning model. Furthermore, the program may comprise a program code for selecting a time frame and an area of interest in the broadcast data based on the one or more sub-events. The area of interest may include at least one of a point of a venue in the broadcast data and an object in the broadcast data. A beginning and an end of the sub-event are indicated by the time frame. Further, the one or more sub-events may be recreated in the 3D digital twin based on the time frame and the area of interest. The one or more sub-events may be recreated using a transformation model. The transformation model may employ a plurality of machine learning algorithms to convert the one or more sub-events from the broadcast data to virtual content in the 3D digital object. The program may comprise a program code for receiving input from a user. The input may comprise a viewpoint and a playback speed. The one or more sub-events may be recreated based on the viewpoint and the playback speed. The one or more sub-events recreated in the 3D digital twin may be viewed on multimedia devices including at least one of a mobile, a tablet, a Virtual Reality (VR) headset, a television (TV), and an Augmented Reality (AR) Headset. The recreation of the one or more sub-events in the 3D digital twin may be performed in real-time.
BRIEF DESCRIPTION OF THE DRAWINGS
[008] The foregoing detailed description of embodiments is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present subject matter, an example construction of the present subject matter is provided in the figures; however, the invention is not limited to the specific method and system for generating virtual content for a live event described in the document and the figures.
[009] The present subject matter is described in detail with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to various features of the present subject matter.
[010] Figure 1 illustrates a network implementation of a system for generating virtual content for a live event, in accordance with an embodiment of the present subject matter.
[011] Figure 2 illustrates a method for generating virtual content for a live event, in accordance with an embodiment of the present subject matter.
[012] Figure 3 illustrates an example of virtual content generated for a live event, in accordance with an embodiment of the present subject matter.
[013] Figure 4 shows an example of a frame of a live video stream, in accordance with an embodiment of the present subject matter.
[014] The figures depict an embodiment of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION
[015] Some embodiments of this disclosure, illustrating all its features, will now be discussed in detail. The words “receiving,” “generating,” “identifying,” “selecting,” “recreating,” and other forms thereof, are intended to be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the exemplary system and methods are now described.
[016] The disclosed embodiments are merely examples of the disclosure, which may be embodied in various forms. Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure is not intended to be limited to the embodiments described but is to be accorded the widest scope consistent with the principles and features described herein.
[017] The present subject matter introduces techniques for recreating key moments of a live event, such as a sporting event, as a projection in a 3D digital object. By leveraging machine learning techniques and computer vision, the invention aims to redefine how a viewer may engage with a live broadcasted event.
[018] The present subject matter discloses a system and a method for generating virtual content for a live event. Specifically, the present invention discloses a system to generate virtual content for an event that may improve viewing experience of an audience. The system may analyse a live video stream of an event using one or more machine learning algorithms. Further, the system may identify moments of significance in the live video stream and recreate the moments as a 3D projection.
[019] Certain technical challenges exist in generating virtual content for a live event in real-time. One technical challenge faced in generating virtual content for a live event is that the virtual content needs to contain engaging content of specific interest to viewers. The solution presented in the embodiments herein is to use a machine learning algorithm to identify key moments from a live video stream. The machine learning model is trained on a vast dataset of video streams of events and annotated key moments. Another technical challenge is that the virtual content must accurately represent or depict the key moment in the live video. The solution presented in the embodiments herein is to use a machine learning model trained using a vast dataset to convert image frames from the live video stream of the event into a 3D projection on a 3D digital object. The detailed functioning of the system is described below with the help of figures.
[020] While aspects of the described system and method for generating virtual content for a live event may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
[021] Referring now to Figure 1, a network implementation 100 of a system 102 for generating virtual content for a live event is disclosed. It may be noted that one or more users may access the system 102 through one or more user devices 104-1, 104-2…104-N, collectively referred to as user devices 104, hereinafter, or applications residing on the user devices 104.
[022] Although the present disclosure is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a virtual environment, a mainframe computer, a server, a network server, and a cloud-based computing environment. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N. In one implementation, the system 102 may comprise the cloud-based computing environment in which the user may operate individual computing systems configured to execute remotely located applications. Examples of the user devices 104 may include but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106. In another implementation, the system 102 may be implemented on a user device 104 as a stand-alone system.
[023] In one implementation, the network 106 may be a wireless network, a wired network, or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further, the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[024] In one embodiment, the system 102 may include at least one processor 108, an input/output (I/O) interface 110, a memory 112, and a database 114. The at least one processor 108 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, Central Processing Units (CPUs), state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 108 is configured to fetch and execute computer-readable instructions stored in the memory 112.
[025] The I/O interface 110 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 110 may allow the system 102 to interact with the user directly or through the user devices 104. Further, the I/O interface 110 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 110 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 110 may include one or more ports for connecting a number of devices to one another or to another server.
[026] The memory 112 may include any computer-readable medium or computer program product known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, Solid State Disks (SSD), optical disks, and magnetic tapes. The memory 112 may include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. The memory 112 may include programs or coded instructions that supplement applications and functions of the system 102. In one embodiment, the memory 112, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the programs or the coded instructions.
[027] In an embodiment, for generating virtual content for a live event, a user may use the user device 104 to access the system 102 via the I/O interface 110. The user may register the user devices 104 using the I/O interface 110 in order to use the system 102. In one aspect, the user may access the I/O interface 110 of the system 102 to provide input to the system if required.
[028] The present subject matter describes the system 102 for generating virtual content for a live event. The system 102 may generate virtual content for a live event. In order to generate the virtual content, initially, the system 102 may receive event data. The event data may comprise venue data and broadcast data. The venue data may include at least one of measurements of a venue, one or more images of the venue, and location of the venue. The one or more images may be at least one of two dimensional (2D) images and depth images captured using one or more image sensors. The broadcast data includes at least one of a live video stream and time stamp data. The live video stream may be received using video capture devices. The live video stream may be captured using one or more cameras strategically placed in the venue. The live video stream may be transmitted from the one or more cameras to a broadcasting station. In an embodiment, the live video stream may be transmitted to a cloud server. Further, the system may fetch the live video stream from the cloud server for analysis and virtual content generation. The transmission may be done using at least one of wired connections, fibre optics, and wireless technologies. In an embodiment, the live video stream may be compressed for transmission. The time stamp data may be used to analyse the live video stream by splitting the video into one or more image frames arranged in a sequence based on the time stamp data.
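By way of a non-limiting illustration, the splitting of the live video stream into time-stamped image frames may be sketched as follows, assuming the OpenCV library and a buffered segment of the stream saved to a file; the file name and the fallback frame rate are assumptions made only for this example.

```python
# Illustrative sketch: split a buffered segment of the video stream into image
# frames ordered by time stamp. The file name and fallback frame rate are
# assumptions; a live deployment would read from a stream buffer instead.
import cv2

def split_into_frames(video_path: str):
    """Yield (timestamp_in_seconds, frame) pairs in temporal order."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata is missing
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield index / fps, frame
        index += 1
    capture.release()

if __name__ == "__main__":
    for timestamp, frame in split_into_frames("live_stream_segment.mp4"):
        print(f"frame at {timestamp:.3f} s, shape={frame.shape}")
```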
[029] The venue data from the event data may be used to generate a Three Dimensional (3D) digital twin, also referred to as a 3D digital object. The 3D digital twin may be a replica of the venue in a virtual environment, such as a metaverse. The 3D digital twin may be generated using at least one of computer vision, photogrammetry, and computer graphics techniques. In an embodiment, the venue data and the live video stream may be used to generate the 3D digital twin. One or more images of the venue may be extracted from the live video stream to capture a plurality of features and objects that may be missing from the venue data. One or more machine learning algorithms, including Neural Radiance Fields (NeRF), Multi-View Stereo (MVS), and the like, may be used to convert the venue data into the 3D digital twin of the venue.
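As one possible sketch, and not a statement of the actual implementation, the photogrammetry step may be delegated to an off-the-shelf reconstruction tool such as COLMAP invoked from Python; the choice of tool and the directory names are assumptions made only for illustration, and the resulting model would still be imported into a graphics engine to serve as the 3D digital twin.

```python
# Illustrative sketch: reconstruct a coarse 3D model of the venue from a folder
# of venue images using an off-the-shelf photogrammetry pipeline (COLMAP).
# The tool choice and directory names are assumptions for this example.
import subprocess

def build_venue_model(image_dir: str, workspace_dir: str) -> None:
    """Run COLMAP's automatic reconstruction over the captured venue images."""
    subprocess.run(
        [
            "colmap", "automatic_reconstructor",
            "--workspace_path", workspace_dir,
            "--image_path", image_dir,
        ],
        check=True,
    )

if __name__ == "__main__":
    build_venue_model("venue_images/", "venue_workspace/")
```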
[030] The 3D digital twin may be used in video editing software to illustrate one or more sub-events of the live event from the live video stream. The one or more sub-events may be key moments of the live event being broadcast. The 3D digital twin may be used to animate or recreate the sub-events such that the sub-events may be viewed from angles that were not captured by a camera. One or more infographics may be added along with the animation and the recreated sub-events in the 3D digital twin for a better viewing experience for the audience watching the live broadcast of the event.
[031] In an embodiment, the one or more sub-events may be identified by an operator controlling the broadcast of the live event. The operator may continuously observe the live video stream to identify key moments of the event in real time, referred to as the one or more sub-events. The operator may mark the one or more sub-events in the live video stream at the broadcasting station. The operator may use a user device, such as the user device 518 shown in Figure 5, to mark the one or more sub-events.
[032] In an embodiment, the one or more sub-events (also referred to as key moments) may be identified using a machine learning model. The machine learning model is trained to identify the one or more sub-events from the broadcast data using a training dataset comprising broadcast data for a plurality of live events and one or more sub-events annotated in the broadcast data for the plurality of live events. The machine learning model may use one or more machine learning algorithms and video processing techniques including at least one of Convolutional Neural Networks (CNN), Motion Tracking, Feature extraction, Object Detection, Audio processing, and Action detection. The machine learning model may be trained using at least one of Supervised Learning, Unsupervised Learning, Transfer Learning, and Reinforcement Learning.
[033] The one or more machine learning algorithms and video processing techniques may be used for specific purposes. For example, a CNN may be used to process one or more image frames of the live video stream and classify the image frames into one or more categories, such as “Key Moment,” “Partial Key Moment,” and “No Key Moment.” The CNN may be trained using a labelled dataset comprising a plurality of live video streams and annotated image frames of the live video streams. The image frames may be annotated with a category.
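A minimal sketch of such a frame classifier is reproduced below, assuming PyTorch as the framework; the architecture, input resolution, and class names are illustrative assumptions and do not represent the trained model described above.

```python
# Illustrative sketch: a small CNN that classifies a broadcast image frame into
# "No Key Moment", "Partial Key Moment" or "Key Moment". The architecture,
# input size and class names are assumptions; a production model would be
# trained on annotated broadcast footage as described above.
import torch
import torch.nn as nn

CLASSES = ["No Key Moment", "Partial Key Moment", "Key Moment"]

class FrameClassifier(nn.Module):
    def __init__(self, num_classes: int = len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.flatten(self.features(x), 1))

if __name__ == "__main__":
    model = FrameClassifier()
    dummy_frame = torch.randn(1, 3, 224, 224)   # one RGB frame, resized
    predicted = model(dummy_frame).argmax(dim=1)
    print(CLASSES[predicted.item()])
```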
[034] Motion tracking and Object detection may be used to identify an area of interest or an object of interest and continuously observe the object of interest in the live video stream. In an embodiment, a unique identifier may be assigned to the object of interest. The unique identifier may be replicated in all the image frames of the live video stream containing the object of interest. In an embodiment, optical flow algorithms may be used to estimate motion of the object of interest between one or more image frames. The optical flow algorithms may be used to identify dynamic movement of the object of interest.
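The following sketch illustrates, under stated assumptions, how dense optical flow may be used to estimate motion between two consecutive image frames using OpenCV's Farneback implementation; the decision threshold on the returned motion value is left to the caller and is not part of the described embodiments.

```python
# Illustrative sketch: estimate the average motion between two consecutive
# frames with dense optical flow (Farneback). A large mean flow magnitude in a
# tracked region may be used to flag dynamic movement of the object of interest.
import cv2
import numpy as np

def mean_motion(prev_frame, next_frame) -> float:
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
    )
    magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return float(np.mean(magnitude))
```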
[035] For example, let us consider that a cricket match is being broadcast. The live video stream is received from a plurality of cameras placed in a cricket stadium. The object detection may be done using algorithms including at least one of You Only Look Once (YOLO), Region-based Convolutional Neural Networks (R-CNN), and the like. The area of interest or the object of interest may be at least one of an area of the cricket field inside the 30-yard circle around the pitch, the boundary of the stadium, a player, a cricket ball, and the like. Let us consider that the object of interest is a cricket ball. The cricket ball may be marked with a unique identifier in every image frame of the live video stream containing the cricket ball. The unique identifier may be used to track movement of the cricket ball across the image frames containing the cricket ball. Let us assume that the cricket ball was bowled, and it hits the wickets. The dynamic motion of the ball deviating from its estimated path upon hitting the wickets is identified. This may be identified as a key moment. The image frame in which the cricket ball hits the wicket may be categorised as “Key Moment,” and one or more frames before and after this image frame may be categorised as “Partial Key Moment.” The number of frames marked as “Partial Key Moment” before and after the image frame of the cricket ball hitting the wickets may be predefined. For example, all image frames from the cricket ball being bowled by a bowler to the cricket ball hitting the wicket may be categorised as “Partial Key Moment” and the image frames corresponding to the cricket ball being bowled by a bowler and the cricket ball hitting the wicket may be categorised as “Key Moment.”
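A simplified sketch of detecting such a change of direction from tracked ball positions is shown below; the example positions, frame indices, and angle threshold are assumptions made only for illustration.

```python
# Illustrative sketch: flag a key moment when a tracked object (e.g. the
# cricket ball) deviates sharply from its previous direction of travel.
# Positions and the angle threshold are assumptions for this example.
import numpy as np

def direction_change_frames(positions, angle_threshold_deg: float = 45.0):
    """positions: list of (x, y) ball centres, one per image frame."""
    key_frames = []
    velocities = np.diff(np.asarray(positions, dtype=float), axis=0)
    for i in range(1, len(velocities)):
        a, b = velocities[i - 1], velocities[i]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom == 0:
            continue
        cos_angle = np.clip(np.dot(a, b) / denom, -1.0, 1.0)
        if np.degrees(np.arccos(cos_angle)) > angle_threshold_deg:
            key_frames.append(i + 1)   # index of the frame where direction changed
    return key_frames

# Example: the ball travels in one direction, then abruptly changes direction
# (e.g. upon hitting the wickets) between the third and fourth positions.
print(direction_change_frames([(0, 0), (10, 1), (20, 2), (15, 8), (10, 14)]))
```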
[036] In another example, let us assume that the cricket ball is hit by a batsman and is caught by a fielder. Let us assume that the object of interest is the fielder. The system may generate virtual content depicting movement of the fielder since the time the cricket ball is hit by the batsman to the time the fielder catches the cricket ball. Changing position coordinates of the fielder may be recorded as the fielder moves to catch the cricket ball.
[037] Consider another example in which the cricket ball is hit for a boundary by the batsman. In this example, let us assume that the cricket ball is the object of interest. As seen in Figure 3, the system may generate virtual content highlighting the path taken by the ball from the batsman’s bat to the boundary. To generate the virtual content, the system may identify key moments in the live video stream. Let us assume that the system identifies an image frame in which the batsman hits the ball and an image frame in which the ball lands outside the boundary as “Key Moment.” The system may further categorise the image frames in between the two identified image frames as “Partial Key Moment.”
[038] Feature extraction may be used to assist in object detection and motion tracking. Audio detection may be used to identify key moments. Consider a machine learning algorithm trained to identify changes in audio of the live video stream. The machine learning algorithm may be trained with a dataset comprising audio files of one or more cricket matches; the audio files may be annotated with key moments based on a change in at least one of amplitude, frequency, and waveform of the audio in the audio files. In the example of the cricket match, let us assume that the audience in the cricket stadium viewing the cricket match erupts in cheers when the cricket ball hits the wicket. The machine learning algorithm may identify the moment as a key moment. The machine learning algorithm may mark the time stamp of the moment as a key moment.
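A minimal sketch of flagging such audio-based key moments by detecting amplitude spikes is shown below, assuming the broadcast audio samples are available as a NumPy array; the sample rate, window length, and threshold factor are assumptions made only for this example.

```python
# Illustrative sketch: flag candidate key moments where the crowd-noise energy
# of the broadcast audio rises well above its average level. Sample rate,
# window length and threshold factor are assumptions for this example.
import numpy as np

def loud_moments(samples: np.ndarray, sample_rate: int = 44100,
                 window_s: float = 0.5, factor: float = 3.0):
    """Return time stamps (seconds) of audio windows much louder than average."""
    samples = samples.astype(float)
    window = int(sample_rate * window_s)
    n_windows = len(samples) // window
    # root-mean-square energy of each window
    rms = np.array([
        np.sqrt(np.mean(samples[i * window:(i + 1) * window] ** 2))
        for i in range(n_windows)
    ])
    baseline = np.mean(rms)
    return [i * window_s for i in range(n_windows) if rms[i] > factor * baseline]
```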
[039] Action detection may be used to monitor an area of interest in the venue. In the example of the cricket match, the area of interest may be a point in the cricket stadium or an object in the cricket stadium. Let us assume that the area of interest is an area of the cricket stadium from where the audience is viewing the cricket match. Action detection may be performed using a machine learning algorithm trained using a dataset comprising a plurality of video streams and annotated image frames of the video streams. The image frames may be annotated as containing action or missing action. For example, when the cricket ball hits the wickets, an excessive movement may be observed in the audience, and the machine learning algorithm may identify this movement as an action. Further, the machine learning algorithm may output the time stamp of when the action was detected. The time stamp of the action may be used to identify a key moment. For example, based on the time stamp of the action, one or more seconds before or after the time stamp of the action may be identified as the key moment. In the above example of the cricket match, let us assume that the action is detected one second after the cricket ball hits the wickets. The key moment of the cricket ball hitting the wickets is identified as one second prior to the time stamp of the action.
[040] In an embodiment, the machine learning model may alert the operator of a key moment. Further, the operator may note the time stamp of the key moment. The operator may then use a user device to input the time stamp in the system 102 for recreation of the key moment.
[041] In an embodiment, a recommender system may identify the key moments. The recommender system may employ a rule-based algorithm in combination with the machine learning model trained to detect key moments. The machine learning algorithm may be trained to identify objects of interest and detect the key moments based on the rule-based algorithm. Further, the machine learning algorithm may be trained to process audio input from one or more commentators. The algorithm may determine the key moments based on one or more keywords identified in the audio input and based on the rule-based algorithm. Further, the algorithm may be trained to detect the location of the venue of the event and determine the key moments based on the location of the venue.
[042] Further to identifying the one or more sub-events (also referred to as the key moments), the event data and the 3D digital twin may be used to recreate the one or more sub-events in the 3D digital twin using a transformation model. The transformation model may be a machine learning model employing a plurality of video processing techniques and machine learning algorithms to recreate the one or more sub-events. The transformation model may be trained to plot the area of interest in the 3D digital object. To plot the area of interest in the 3D digital twin, the transformation model may convert the area of interest in the image frames to a 3D projection of the area of interest in the 3D digital twin using techniques such as planar homography, Structure from Motion (SfM), and the like. The transformation model is trained using a training dataset comprising a plurality of video streams, a plurality of objects of interest selected in the plurality of video streams, a plurality of 3D digital objects corresponding to the plurality of video streams, and the plurality of objects of interest plotted in the corresponding plurality of 3D digital objects. The area of interest may be plotted for a time period marked by the time stamps of the key moments identified. In an embodiment, a time frame may be selected. The time frame may be selected based on the identified key moments. The time frame for each key moment may comprise a starting time stamp and an ending time stamp corresponding to a beginning of the key moment and an end of the key moment.
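By way of illustration only, the planar-homography step may be sketched as follows, assuming OpenCV and four known point correspondences between an image frame and the ground plane of the 3D digital twin; the pixel coordinates and the pitch dimensions used are assumptions for this example.

```python
# Illustrative sketch: map pixel coordinates of an area of interest from an
# image frame onto ground-plane coordinates of the 3D digital twin using a
# planar homography. The four correspondences (e.g. pitch corners seen in the
# frame and their positions in the twin, in metres) are assumptions.
import cv2
import numpy as np

# pixel positions of four known venue points in the image frame (assumed)
image_pts = np.array([[420, 610], [860, 612], [835, 300], [455, 298]], dtype=np.float32)
# the same four points in the digital twin's ground plane, in metres (assumed)
twin_pts = np.array([[0.0, 0.0], [3.05, 0.0], [3.05, 20.12], [0.0, 20.12]], dtype=np.float32)

homography, _ = cv2.findHomography(image_pts, twin_pts)

def to_twin_plane(pixel_xy):
    """Project a single pixel coordinate onto the twin's ground plane."""
    src = np.array([[pixel_xy]], dtype=np.float32)       # shape (1, 1, 2)
    return cv2.perspectiveTransform(src, homography)[0, 0]

print(to_twin_plane((640, 450)))   # e.g. a detected ball position in the frame
```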
[043] In an embodiment, the transformation model may trim the live video stream based on the selected time frame. Further, the transformation model may extract one or more image frames in the time frame that may be categorised as “Key Moment.” In an embodiment, the transformation model may extract all image frames of the trimmed live video stream. The image frames may be extracted based on a predefined frame rate. The predefined frame rate may be equal to the broadcasting frame rate or lower than the broadcasting frame rate. The frame rate refers to the number of image frames in one second of the live video stream.
[044] In an embodiment, the transformation model may be used to detect an area of interest in the image frames. The area of interest may be an area in the venue or an object of interest. Considering an example of a cricket match being the live event, the area in the venue may be at least one of boundary of the cricket stadium, the area behind the batsman including the wickets, and the like. The object of interest may be at least one of the cricket ball, the batsman, the bowler, and the like. In an embodiment, the transformation model may receive information regarding the area of interest from the machine learning model used to identify the one or more sub-events. In another embodiment, the transformation model may select the area of interest based on the one or more sub-events. The transformation model may use a machine learning algorithm trained using a dataset comprising a plurality of image frames corresponding to a plurality of video streams, a plurality of sub-events corresponding to the plurality of video streams, one or more categories of sub-events, and annotated areas of interest in the plurality of images. The machine learning model may output a category for the one or more sub-events and detect the area of interest in the image frames extracted from the live video stream. The categories of the sub-events may be based on the live event. For example, if the live event is a cricket match, the categories may include at least one of “Wicket,” “Boundary,” “Inning change,” and the like. The area of interest may be selected based on the category of the key moment. For example, in the above example of the cricket match, when the category is “Wicket,” the area of interest may be at least one of the area of the wickets in the cricket field or the cricket ball.
[045] Further, the transformation model may plot the area of interest on the 3D digital twin of the venue. For plotting the area of interest on the 3D digital twin accurately and to scale, the transformation model may use a combination of machine learning algorithms including at least one of optical flow, Kalman filtering, Multi-View Stereo (MVS), Visual Simultaneous Localization and Mapping (V-SLAM), and other object tracking and 3D reconstruction algorithms, together with mathematical projection techniques such as perspective projections, multi-view geometry, and the like. The transformation model may maintain the scale between the live video stream and the plotted version based on camera parameters, parts of the venue with known measurements, and a triangulated position of the area of interest determined using at least one of live video feeds from two or more cameras and MVS techniques. The scale may be continuously verified, and real-time feedback may be provided to the machine learning algorithm for corrections required to maintain the scale of the plotted area of interest with respect to the area of interest in the live video stream.
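A minimal sketch of triangulating the position of the area of interest from two synchronized camera feeds is shown below, assuming OpenCV and 3x4 projection matrices already obtained through camera calibration; the matrices themselves are not reproduced here.

```python
# Illustrative sketch: triangulate the 3D position of the area of interest
# from its pixel positions in two synchronized camera views. P1 and P2 are
# assumed to be known 3x4 projection matrices from camera calibration.
import cv2
import numpy as np

def triangulate(P1: np.ndarray, P2: np.ndarray, pt_cam1, pt_cam2) -> np.ndarray:
    """pt_cam1 / pt_cam2: (x, y) pixel positions of the same object."""
    pts1 = np.array(pt_cam1, dtype=float).reshape(2, 1)
    pts2 = np.array(pt_cam2, dtype=float).reshape(2, 1)
    homogeneous = cv2.triangulatePoints(P1, P2, pts1, pts2)   # shape (4, 1)
    return (homogeneous[:3] / homogeneous[3]).ravel()         # metric 3D point
```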
[046] To plot the area of interest, the transformation model may perform the following steps:
[047] Firstly, the transformation model may track the area of interest in the live video stream using object detection and tracking algorithms such as You Only Look Once (YOLO), Kernelized Correlation Filters, and the like.
[048] Then, the transformation model may perform semantic segmentation to extract and isolate the area of interest from the live video stream. The transformation model may identify relevant pixels corresponding to the area of interest from an image frame of the live video stream.
[049] The transformation model may perform feature extraction and matching to identify the area of interest in other image frames of the live video stream and track the area of interest in the live video stream. The feature extraction may be used to determine measurements, location, textures, coordinates and other parameters of the area of interest required for 3D mapping.
[050] The area of interest may be rendered in the 3D digital twin using rendering techniques in graphics engines such as Unity3D® and Unreal Engine®. The area of interest may be plotted in the 3D digital twin by mapping the coordinates of the area of interest to corresponding coordinates in the 3D digital twin using techniques like SfM. Further, the texture may also be mapped to the plotted area of interest. The transformation model may be trained to maintain the measurements of the area of interest as per a scale factor defining the scale of the 3D digital twin of the venue with respect to the venue.
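The following sketch illustrates how plotted coordinates may be mapped into the 3D digital twin while preserving the scale factor; the transform values and the example coordinates are assumptions made only for this example.

```python
# Illustrative sketch: place venue-space coordinates (metres) into the 3D
# digital twin while preserving scale. The scale factor and origin are
# assumptions; in practice they follow from camera calibration and from parts
# of the venue with known measurements.
from dataclasses import dataclass

@dataclass
class TwinTransform:
    scale_factor: float          # twin units per metre of the real venue
    origin_x: float = 0.0        # twin coordinates of the venue origin
    origin_y: float = 0.0
    origin_z: float = 0.0

    def to_twin(self, x_m: float, y_m: float, z_m: float = 0.0):
        """Map a venue-space coordinate (metres) to twin-space coordinates."""
        return (self.origin_x + self.scale_factor * x_m,
                self.origin_y + self.scale_factor * y_m,
                self.origin_z + self.scale_factor * z_m)

transform = TwinTransform(scale_factor=1.0)      # a 1:1 digital twin
print(transform.to_twin(3.05, 20.12, 0.0))       # e.g. a far pitch corner
```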
[051] To recreate a story from the one or more sub-events, the transformation model may recreate one or more image frames corresponding to the trimmed live video stream of the story. The one or more image frames may be recreated by plotting the area of interest and other objects visible in each image frame in an instance of the 3D digital object. Further, one or more instances of the 3D digital twin generated by recreating the one or more image frames may be stitched together to be played as a video. In an embodiment, one or more instances of the 3D digital twin may be modified by addition of infographics over one or more pixels of the one or more instances of the 3D digital object. The infographics may include additional details about the story being recreated.
[052] Consider the example of the cricket match and the key moment being the cricket ball hitting the wickets. The one or more image frames may be extracted based on the time frame selected for the key moment. Let us assume that the time frame begins at the time stamp of a bowler releasing the cricket ball while bowling. Let us assume that the time stamp is 12:43:52:32 (hours:minutes:seconds:milliseconds). The time stamp may be represented in various formats including hours:minutes:seconds:milliseconds of the time of day, hours:minutes:seconds:milliseconds of elapsed time, and the like. Further, the time frame ends at the time stamp of the cricket ball hitting the wickets. Let us assume that the time stamp is 12:43:53:32. Based on the above assumptions, the time frame is 12:43:52:32 – 12:43:53:32, i.e., one second. Let us assume that the frame rate of the live video stream is 60 frames per second. Based on the time frame, there are 60 image frames corresponding to the key moment. Each of the 60 image frames may be arranged in a sequence based on the time stamp of the image frames. Further, each image frame may be converted into a 3D projection on the 3D digital twin, generating an instance of the 3D digital twin for each image frame. Furthermore, the instances of the 3D digital twin may be used for video encoding or video synthesis. The instances may be stitched together using a video CODEC (compression-decompression) algorithm. Further, transitioning between the instances may be smoothed using techniques like linear interpolation. A recreation of the story may look like the ball being released from the bowler’s hand, travelling towards the wicket while following the path from the live video stream, and hitting the wickets in the 3D digital object.
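A minimal sketch of stitching the rendered instances of the 3D digital twin into a playable clip is shown below, assuming OpenCV's video writer; the codec, frame rate, and output file name are assumptions made only for this example.

```python
# Illustrative sketch: stitch rendered instances of the 3D digital twin (one
# rendered image per recreated image frame) into a playable clip. The codec,
# frame rate and file name are assumptions for this example.
import cv2

def stitch_instances(rendered_frames, out_path="recreated_key_moment.mp4", fps=60):
    """rendered_frames: iterable of equally sized BGR images (NumPy arrays)."""
    frames = list(rendered_frames)
    height, width = frames[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    for frame in frames:
        writer.write(frame)
    writer.release()

# For the one second time frame above at 60 frames per second, 60 rendered
# instances of the 3D digital twin are stitched into a one second clip.
```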
[053] In an embodiment, the system may recreate the entire live video stream in the 3D digital twin as a 3D digital virtual content. Further, the one or more sub-events may be identified in the 3D digital virtual content. Furthermore, the one or more sub-events may be converted into a simplified recreation by recreating frames of the 3D digital virtual content with only the area of interest. Considering the cricket match example, the image frames from the cricket ball being released from the bowler’s hand till the cricket ball hitting the wickets may contain a lot of background and foreground objects other than the area of interest (the cricket ball, the wickets, the cricket pitch). In the simplified recreation of the key moment of the wicket, the transformation model may only plot the trajectory followed by the cricket ball in the time frame between the cricket ball being released from the bowler’s hand and the cricket ball hitting the wickets. The cricket pitch may be recreated to show where the cricket ball bounced on the cricket pitch before hitting the wickets or to establish a perspective for a better viewing experience.
[054] The one or more sub-events recreated by the system may be viewed by a user using the user devices. The user devices may include at least one of a mobile, a tablet, a Virtual Reality (VR) Headset, a television (TV), and an Augmented Reality (AR) Headset.
[055] In an embodiment, the system may comprise a user interface to receive an input comprising a viewpoint and a playback speed from a user. Since the sub-events are recreated in a 3D digital twin of the venue, the system can generate a view of the recreated story based on the viewpoint selected by the user. The viewpoint may be at least one of a location in the 3D digital object, one of a set of predefined viewpoints, and a moving point following the area of interest. The transformation model may be used to adjust the recreated story with respect to the viewpoint selected by the user. The transformation model may employ one or more 3D reconstruction algorithms such as MVS and NeRF and 3D graphics rendering techniques such as multi-view rendering using real-time rendering engines. The playback speed may be achieved by increasing or decreasing the frame rate of rendering the image frames in the 3D digital object. One or more image frames may be skipped while increasing the playback speed, and one or more image frames may be added between two image frames of the live video stream while decreasing the playback speed for a smooth viewing experience. The one or more image frames added between two image frames may be created using interpolation techniques. Interpolation algorithms, such as linear interpolation, cubic spline interpolation, or optical flow-based methods, may be used to generate new frames based on the existing ones. These algorithms calculate the intermediate frames between two adjacent frames, ensuring a smooth transition between them.
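A minimal sketch of inserting linearly interpolated frames between two adjacent image frames, so that slowed-down playback remains smooth, is shown below, assuming OpenCV; the number of intermediate frames is an assumption, and optical-flow based interpolation may be substituted for higher quality.

```python
# Illustrative sketch: generate linearly interpolated frames between two
# adjacent image frames for smooth slowed-down playback. The number of
# intermediate frames is an assumption for this example.
import cv2

def interpolate_frames(frame_a, frame_b, n_intermediate: int = 3):
    """Return n_intermediate blended frames between frame_a and frame_b."""
    blended = []
    for i in range(1, n_intermediate + 1):
        alpha = i / (n_intermediate + 1)
        blended.append(cv2.addWeighted(frame_a, 1.0 - alpha, frame_b, alpha, 0))
    return blended
```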
[056] In an embodiment, the transformation model may extract audio from the live video feed for the time frame and synchronize the audio with the recreated story in the 3D digital object. In another embodiment, when the entire live video stream is converted into a 3D digital virtual content, the audio from the live video feed may be added over the 3D digital virtual content.
[057] Referring now to Figure 2, a method 200 for generating virtual content for a live event is shown, in accordance with an embodiment of the present subject matter. The method 200 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types.
[058] The order in which the method 200 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 200 or alternate methods for generating virtual content for a live event. Furthermore, the method 200 for generating virtual content for a live event can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 200 may be considered to be implemented in the above-described system 102.
[059] At block 202, event data is received from at least one of a database, one or more internet sources, a broadcasting center, a video camera, and a cloud server. The event data may comprise details about the live event being broadcast. The details may include venue data and broadcast data. The venue data includes at least one of measurements of a venue, one or more images of the venue, and location of the venue. The broadcast data includes at least one of a live video stream and time stamp data. Let us consider an example of a cricket match being the live event. The venue may be a cricket stadium. Let us consider that the measurements of the cricket stadium include dimensions of the stadium building, dimensions of the playing field, size of the cricket pitch, shape of the boundary, radius of the boundary, and the like. One or more images of the stadium may be captured using a drone and an imaging device. The images may be captured from multiple angles to cover the entire stadium. In an embodiment, only one image of the stadium may be captured from the top angle. The location of the stadium may be provided as at least one of the name of the stadium, the city, and geographical coordinates of the stadium. The location of the stadium may be used to search one or more internet sources for images of the stadium. The live video stream may be received from one or more cameras recording the cricket match. The time stamp data may be provided along with the video stream for synchronizing the video stream and the virtual content generated for the video stream.
[060] At block 204, a three dimensional (3D) digital twin is generated based on the venue data. The method may comprise employing at least one of computer vision, photogrammetry, and computer graphics techniques to generate the 3D digital object. The 3D digital twin may be a virtual replica of the venue. Considering the cricket match being the live event, the 3D digital twin may be a 3D model of the cricket stadium where the cricket match is being played. The 3D model may be a scaled replica of the cricket stadium. The 3D model may include all the objects as seen in the cricket stadium, including the audience in the stands, the flood lights, the wickets, the players, and the like.
[061] At block 206, one or more sub-events may be identified from the broadcast data. The one or more sub-events may be moments of the live event that may be of significance (key moments). In one example, the key moments may be identified by an operator continuously monitoring the live video stream. The operator may be an experienced person with knowledge of the event being broadcast. In another example, the key moments may be identified using a machine learning algorithm. In yet another example, the operator may receive identified key moments from the machine learning algorithm and may verify or modify them before recreation of the sub-events. In an embodiment, the key moments may be identified from the live video stream. In another embodiment, the key moments may be identified from 3D digital virtual content created by converting the live video stream into the 3D digital virtual content by plotting image frames from the live video stream in the 3D digital object. The 3D virtual content may be created using a transformation model. The transformation model may use a combination of machine learning algorithms and mathematical projection techniques to convert the live video stream into the 3D virtual content.
[062] At block 208, a time frame and an area of interest may be selected in the broadcast data or the 3D digital virtual content based on the one or more sub-events. The time frame may indicate the duration, the beginning, and the end of the key moments. The area of interest may be determined based on the key moment using a machine learning algorithm trained to select the area of interest based on the type of the key moment. The types of key moments may be determined based on the live event.
[063] At block 210, the one or more sub-events or the key moments may be recreated in the 3D digital object. The transformation model may be used to recreate the key moments in the 3D digital object. The recreation of the key moments may be an animated version of the key moments focusing on the area of interest. The transformation model may use a plurality of machine learning algorithms to recreate the sub-events in the 3D digital object.
[064] Referring now to Figure 3, a snapshot of the virtual content generated is illustrated. The virtual content generated is a recreated story of a boundary in a cricket match. As seen in the figure, the trajectory 302 of the cricket ball 304 is seen. The trajectory follows the cricket ball until the cricket ball lands outside the boundary 306.
[065] Referring now to Figure 4, a snapshot 400 of the live stream is shown. The snapshot is taken right before the batsman 402 hits the cricket ball 304 that lands outside the boundary 306 shown in Figure 3.
[066] Referring now to Figure 5, data flow among various components used for implementation of the system to generate virtual content for a live event is illustrated, in accordance with an embodiment of the present subject matter. Initially, a live video stream is received from one or more cameras 502 at a broadcasting centre 504. The live video stream from multiple cameras may be manipulated and merged to generate a single stream for a user. The single stream may be uploaded to a cloud server 506. In an embodiment, the live video stream may be directly uploaded to the cloud server 506. The video stream, which may be the live video stream or the single stream, may be fetched from the cloud server using a video capture device 508. Further, the video stream may then be processed by a 3D digital twin generator 510.
[067] In an embodiment, the 3D digital twin generator may fetch one or more images by converting the video stream into image frames. The one or more images may be used to determine venue data and the venue data may further be used to generate the 3D digital object. The 3D digital twin generator may use a combination of machine learning algorithms for object tracking and 3D reconstruction algorithms.
[068] In another embodiment, the system 102 may receive the venue data including images of the venue, dimensions of the venue, and the like. The 3D digital twin generator may generate the 3D digital twin using the venue data based on one or more machine learning techniques including 3D reconstruction, computer vision, photogrammetry, and the like.
[069] The live stream and the 3D digital twin may be received by a video manipulator 512. The video manipulator may convert the live video stream into the 3D digital virtual content using the transformation model. The video manipulator may comprise a database storing training data and execution steps for the one or more machine learning algorithms used by the transformation model, and a processor for executing the one or more machine learning algorithms to convert the live video stream into the 3D digital virtual content. In an embodiment, the user may provide input comprising a viewpoint and a playback speed using a user device 518. The user input may be transmitted to the video manipulator from the cloud server, or the user input may be received by the system 102 directly. The video manipulator may use the transformation model to recreate key moments from the live video stream with respect to the viewpoint and the playback speed provided by the user.
[070] The output of the video manipulator may be at least one of the one or more sub-events identified by the machine learning model and recreated as a 3D projection in the 3D digital twin, and the 3D digital virtual content obtained by applying the transformation model to the live video stream. The output of the video manipulator may be uploaded to the cloud server to be transmitted to the user devices for viewing the one or more sub-events recreated in the 3D digital twin or the 3D digital virtual content. In an embodiment, the output of the video manipulator may be directly transmitted or conveyed to the user devices.
[071] The embodiments of the system and the method described above are explained considering an example of a sporting event. The systems and the methods may be used for any live streamed event including media events, political events, technological events, community events, and the like.
[072] Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include the following.
[073] Some embodiments of the system and the method may provide a better viewing experience of a live event to the audience.
[074] Some embodiments of the system and the method may provide abilities to interact with a live video stream of an event.
[075] Some embodiments of the system and the method may provide better viewing experience and better understanding by allowing a user to select viewpoints while watching the live event recreated in the 3D digital twin and by allowing the broadcasters to add infographics to a live video stream.
[076] Some embodiments of the system and the method may provide detailed data of every moment in a live event. The data may be recorded as 3D reconstruction of each moment of the live event. Intricate data such as distance measurements between objects at a particular moment of the live event may be extracted using the 3D reconstruction and mathematical algorithms.
[077] Although implementations for methods and system for generating virtual content for a live event have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for generating virtual content for a live event.
Claims:
We Claim:
1. A method for generating virtual content for a live event, the method comprising:
receiving event data comprising venue data and broadcast data;
generating a three dimensional (3D) digital twin based on the venue data;
identifying one or more sub-events from the broadcast data;
selecting a time frame and an area of interest in the broadcast data based on the one or more sub-events; and
recreating the one or more sub-events in the 3D digital object, based on the time frame and the area of interest, using a transformation model.
2. The method as claimed in claim 1, wherein the venue data comprises at least one of measurements of a venue, one or more images of the venue, and location of the venue.
3. The method as claimed in claim 1, wherein the broadcast data includes at least one of a live video stream and time stamp data.
4. The method as claimed in claim 1, wherein the area of interest is at least one of an area of a venue in the broadcast data and an object in the broadcast data.
5. The method as claimed in claim 1, wherein the time frame indicates a beginning and an end of the sub-event, and wherein the sub-event is a key moment of the live event.
6. The method as claimed in claim 1, wherein the transformation model plots the area of interest in the 3D digital object, and wherein the transformation model is trained using a training dataset comprising a plurality of video streams, a plurality of objects of interest selected in the plurality of video streams, a plurality of 3D digital objects corresponding to the plurality of video streams, and the plurality of objects of interest plotted in the corresponding plurality of 3D digital objects.
7. The method as claimed in claim 1, wherein the transformation model trims the broadcast data based on the time frame, and wherein the transformation model extracts one or more image frames from the broadcast data based on a predefined frame rate and the time frame.
8. The method as claimed in claim 7, wherein the transformation model detects the area of interest in the one or more image frames, and wherein the transformation model plots the area of interest from the one or more image frames on one or more instances of the 3D digital object.
9. The method as claimed in claim 8, wherein the transformation model stitches one or more plotted instances of the 3D digital twin to recreate the one or more sub-events.
10. The method as claimed in claim 1, wherein the one or more sub-events from the broadcast data are identified using a machine learning model.
11. The method as claimed in claim 10, wherein the machine learning model is trained to identify the one or more sub-events from the broadcast data using a training dataset comprising broadcast data for a plurality of live events and one or more sub-events annotated in the broadcast data for the plurality of live events.
12. The method as claimed in claim 1, wherein the 3D digital twin is generated using at least one of computer vision, photogrammetry, and computer graphics techniques.
13. The method as claimed in claim 1, wherein the one or more sub-events are recreated in the 3D digital twin based on a viewpoint selected by a user, and wherein the viewpoint selected by the user is at least one of a location in the 3D digital object, one of a set of predefined viewpoints, and a moving point following the area of interest.
14. The method as claimed in claim 13, wherein playback speed of the one or more sub-events in the 3D digital twin is controlled by the user.
15. The method as claimed in claim 1, wherein the 3D digital twin is viewed on multimedia devices including at least one of a mobile, a tablet, a Virtual Reality (VR) Headset, a television (TV), and an Augmented Reality (AR) Headset.
16. A system for generating virtual content for a live event, the system comprising:
a memory;
a processor coupled to the memory, wherein the processor is configured to execute a set of instructions stored in the memory for:
receiving event data comprising venue data and broadcast data;
generating a three dimensional (3D) digital twin based on the venue data;
identifying one or more sub-events from the broadcast data;
selecting a time frame and an area of interest in the broadcast data based on the one or more sub-events; and
recreating the one or more sub-events in the 3D digital object, based on the time frame and the area of interest, using a transformation model.
17. A non-transitory computer program product having embodied thereon a computer program for generating virtual content for a live event, the non-transitory computer program product storing instructions for:
receiving event data comprising venue data and broadcast data;
generating a three dimensional (3D) digital twin based on the venue data;
identifying one or more sub-events from the broadcast data;
selecting a time frame and an area of interest in the broadcast data based on the one or more sub-events; and
recreating the one or more sub-events in the 3D digital object, based on the time frame and the area of interest, using a transformation model.