Abstract: A method for generating a visual representation of one or more participants in an event occurring on a field is disclosed. A live video stream of the event from an image capturing unit is obtained. The image capturing unit is mounted at a predetermined height to capture an entire view of the field. The obtained live video stream is processed. Further, respective positions of the one or more participants on the field in the processed video stream are identified by deriving spatial information associated with the one or more participants. Furthermore, the visual representation of the one or more participants, based on the identified positions of the one or more participants, is generated on a user device. [To be published with Figure 1]
Description: FORM 2
THE PATENTS ACT, 1970 (39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR GENERATING A VISUAL REPRESENTATION OF PARTICIPANTS DURING AN EVENT
Applicant:
Quidich Innovation Labs Pvt. Ltd.
Having address as:
No 6, Keytuo, Kondivita Rd, M.I.D.C, Andheri East, Mumbai, 400059
The following specification describes the invention and the manner in which it is to be performed.
[001] This patent application does not take priority from any application.
TECHNICAL FIELD
[002] The present subject matter described herein, in general, relates to generating a visual representation of one or more participants in an event occurring on a field, on a user device.
BACKGROUND
[003] In the context of traditional live sports broadcasts, significant challenges persist in delivering a comprehensive, real-time understanding of dynamic game aspects to viewers. At pivotal points in a game, the present broadcast style frequently fails to provide comprehensive insights into player positions, inter-player distances, and additional contextual information. Viewers struggle to locate crucial details and statistics about the players on the field, which hinders their ability to fully comprehend and enjoy the ongoing events. This lack of in-depth information results in a reduced viewing experience and a possible loss of engagement. It is therefore essential to address these problems in order to improve audience satisfaction and the overall quality of live sports broadcasts.
SUMMARY
[004] Before the present systems and methods are described, it is to be understood that this application is not limited to the particular systems, and methodologies described, as there can be multiple possible embodiments which are not expressly illustrated in the present disclosures. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only and is not intended to limit the scope of the present application. This summary is provided to introduce concepts related to systems and methods for generating visual representation of the one or more participants during an ongoing event and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[005] In one implementation, a method for generating a visual representation of one or more participants in an event occurring on a field is disclosed. The method further discloses obtaining a live video stream of the event from an image capturing unit. The image capturing unit may be mounted at a predetermined height to capture an entire view of the field. Further, the obtained live video stream is processed. Furthermore, respective positions of the one or more participants are identified on the field in the processed video stream by deriving spatial information associated with the one or more participants. The method further discloses receiving an input, the input comprising a selection corresponding to the one or more participants. Finally, the method comprises generating, based on the input, the visual representation of the one or more selected participants, on a user device.
[006] In another implementation, a system for generating a visual representation of one or more participants in an event occurring on a field is disclosed. The system may comprise at least one processor and a memory. The processor, upon execution of one or more instructions stored in the memory, is configured to obtain a live video stream of the event from an image capturing unit. The image capturing unit is mounted at a predetermined height to capture an entire view of the field. Further, the processor is configured to process the obtained live video stream. Furthermore, respective positions of the one or more participants are identified on the field in the processed video stream by deriving spatial information associated with the one or more participants. The processor is further configured to receive an input, the input comprising a selection corresponding to the one or more participants. Furthermore, the processor is configured to generate, based on the input, the visual representation of the one or more selected participants, on a user device.
BRIEF DESCRIPTION OF THE DRAWINGS
[007] The foregoing detailed description of embodiments is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the disclosure, examples of the disclosure are shown in the present document; however, the disclosure is not limited to the specific methods and apparatus disclosed in the document and the drawings.
[008] The detailed description is given with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[009] Figure 1 illustrates a network implementation of a system for generating a visual representation of one or more participants in an event occurring on a field, in accordance with an embodiment of the present subject matter.
[0010] Figure 2 illustrates the system, in accordance with an embodiment of the present subject matter.
[0011] Figures 3 – 5 illustrate examples, in accordance with an embodiment of the present subject matter.
[0012] Figure 6 illustrates a method for generating a visual representation of one or more positions of one or more participants in an event occurring on a field, in accordance with an embodiment of the present subject matter.
DETAILED DESCRIPTION
[0013] Some embodiments of this disclosure, illustrating all its features, will now be discussed in detail. The words "receiving," "processing," "identifying," "tracking," "generating," "assigning," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the exemplary systems and methods are now described. The disclosed embodiments are merely exemplary of the disclosure, which may be embodied in various forms.
[0014] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure is not intended to be limited to the embodiments illustrated but is to be accorded the widest scope consistent with the principles and features described herein.
[0015] The present subject matter discloses a method and a system for generating a visual representation of one or more participants in an event occurring on a field, on a user device. The event may be an ongoing sporting event such as, but not limited to, cricket, football, tennis, and hockey. The one or more participants may include persons present on the field who are involved in the ongoing match, for example, one or more players playing the match, one or more umpires or referees, and the like. The visual representation of the one or more participants showcases various parameters related to their involvement, including but not limited to their position (like fielding positions in a cricket match) and movements or actions throughout the event. This visual information, presented alongside the regular event broadcast, offers a more profound insight into the strategic aspects of the sport. Also, this element keeps viewers engaged throughout the match. The visual representations may be generated by implementing a comprehensive system disclosed in the present application. This system allows streamlined receiving of the live video stream and tracking of one or more positions of the participants to generate the visual representation of the one or more participants on a user device, enhancing the overall viewing experience by providing insights, engagement opportunities, and a more immersive understanding of the sport.
[0016] It may be understood that the proposed invention has been described considering the sport 'Cricket' as the event. However, the scope of the proposed methodology is not restricted only to Cricket, but can be implemented for any other outdoor sport or indoor sport including Football, Baseball, Tennis, Hockey, etc.
[0017] Referring now to Figure 1, a network implementation 100 of a system 102 for visual representation of one or more participants in an event occurring on a field, on a user device, is disclosed. In order to generate the visual representations, the system 102 obtains a live video stream from an image capturing unit 101 mounted at a vantage point in a manner such that the image capturing unit captures an entire view of the field 108. In an embodiment, the image capturing unit 101 may be protected and deployed in a watertight environment by using a camera box which restricts water ingress during rainy seasons, thereby keeping the image capturing unit 101 safe for use. The camera box may comprise a cooling unit, such as a fan, to maintain the temperature of the image capturing unit 101 within predefined temperature limits. The camera box may comprise a splitter and a first converter unit configured to convert image signals, pertaining to the stream of images in the live video stream, to Ethernet signals. In an embodiment, the converted Ethernet signals are split and carried through the cables which connect to the base station where the system 102 is deployed.
[0018] Although the present disclosure is explained considering that the system 102 may be implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, or a cloud-based computing environment present at the base station. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N, collectively referred to as user 104 or stakeholders, hereinafter, or applications residing on the user devices 104. In an embodiment, the system 102 may broadcast the live video stream of the event to the user devices 104. In one implementation, the system 102 may comprise the cloud-based computing environment in which a user may operate individual computing systems configured to execute remotely located applications. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106.
[0019] In one implementation, the network 106 may be a wireless network, a wired network, or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further, the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0020] In operation, the system 102 may receive a live video stream including the entire view of the field 108 from the image capturing unit 101 and may process the received live video stream to identify respective positions of the one or more participants on the field in the processed video stream by deriving spatial information associated with the one or more participants. The spatial information comprises a set of position coordinates corresponding to the one or more participants. The system 102 may further receive an input comprising a selection corresponding to the one or more participants and generate a visual representation of the one or more selected participants based on the input.
[0021] Referring now to Figure 2, the system 102 is illustrated in accordance with an embodiment of the present subject matter. In one embodiment, the system 102 may include at least one processor 202, an input/output (I/O) interface 204, and a memory 206. The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, graphics processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, at least one processor 202 is configured to fetch and execute computer-readable instructions stored in the memory 206.
[0022] The I/O interface 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 204 may allow the system 102 to interact with the user directly or through the client devices 104. Further, the I/O interface 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 204 may include one or more ports for connecting a number of devices to one another or to another server.
[0023] The memory 206 may include any computer-readable medium or computer program product known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 206 may include modules 208 and data 210.
[0024] The modules 208 may include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 208 may include an image obtaining module 212, a graphics rendering module 214, a detection module 216, a processing module 220, an image translating module 222, a tracking module 224, a receiving module 226, and other modules 218. The other modules 218 may include programs or coded instructions, or one or more machine learning models, that supplement applications and functions of the system 102. The other modules 218 described herein may be implemented as software and/or hardware modules that may be executed in the cloud-based computing environment of the system 102. The other modules may comprise at least one of a computer graphics module, image manipulation module, video codec module, and a 3-Dimensional (3D) object module. The other modules 218 may be used to project data related to one or more parameters associated with the participants generated by tracking each participant for a user viewing a live stream of the event.
[0025] The data 210, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the modules 208. The data 210 may also include a database 228 and other data 230. The other data 230 may include data generated as a result of the execution of one or more modules in the other modules 218.
[0026] The various challenges observed in the existing art necessitated an automated system 102 for generating a visual representation of the one or more participants. In one example, the system 102 may generate the visual representation from a live video stream received from only one image capturing unit 101 that captures the view of the complete field. In other examples, additional image capturing unit(s) may provide additional live video stream(s) to supplement the video stream received from a primary image capturing unit. In yet another example, the additional image capturing unit(s) may focus on close-ups, capturing detailed shots of players, expressions, and other actions of the players. In order to overcome the challenges as elucidated above, a user, at first, may use the client device 104 to access the system via the I/O interface 204. In an exemplary embodiment, the user may register the user device via the I/O interface 204 to access the system 102. In yet another embodiment, a registration may not be required to access the system 102. The system 102 may employ the image obtaining module 212, the graphics rendering module 214, the detection module 216, the processing module 220, the image translating module 222, the tracking module 224, the receiving module 226, and the other modules 218 for generating the visual representation of each participant present on the playing field, in real-time, during the ongoing event and further generating the visual representation of each participant along with information associated with them.
[0027] In an embodiment, the system 102 obtains a live video stream of the event from the image capturing unit 101. The image capturing unit may be mounted at a predetermined height to capture an entire view, like an aerial view, of the field. The live video stream captured by the image capturing unit 101 may be a continuous stream of images captured in sequence at a rapid rate. The system obtains the live video stream from the image capturing unit via the image obtaining module 212. The image obtaining module obtains individual frames of the images composing the live video stream from the image capturing unit. Referring to figure 3, the image capturing unit 101 is mounted on one of the flood lights illuminating the playing field 302 and captures the aerial view of the playing field 302 in its entirety. In an embodiment, the image obtaining module 212 may pre-process each image frame in the live video stream by performing one or more actions like, but not limited to, image compression, noise reduction, image resizing, normalization, and frame cropping.
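By way of a non-limiting illustration, a minimal pre-processing sketch is given below, assuming OpenCV-style BGR frames; the function name, target size, and denoising parameters are illustrative choices rather than part of the disclosed module.

```python
# Minimal sketch, assuming OpenCV-style BGR frames; names and parameters are illustrative.
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray, target_size=(1280, 720)) -> np.ndarray:
    """Resize, denoise, and normalize one image frame of the live video stream."""
    resized = cv2.resize(frame, target_size, interpolation=cv2.INTER_AREA)
    # Non-local-means denoising to reduce sensor noise before detection.
    denoised = cv2.fastNlMeansDenoisingColored(resized, None, 3, 3, 7, 21)
    # Normalize pixel intensities to the [0, 1] range for downstream models.
    return denoised.astype(np.float32) / 255.0
```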
[0028] In an embodiment, the system is configured to process the live video stream obtained from the image obtaining module, via a processing module 220. In an exemplary embodiment, the live video stream may be processed to detect lighting conditions in each of the image frames of the live video stream. The lighting conditions may refer to characteristics of illumination in a given environment, at a given time, in the image frames of the live video stream. The lighting conditions play a critical role in shaping the visual appearance of images, affecting the perception of colors, shadows, contrasts, and overall visibility. The processing module may use one or more techniques, whether already developed or developed in the future, to determine the lighting conditions in each image frame of the live video stream. For example, in an embodiment, the processing module may perform histogram analysis to detect the lighting conditions, where the histogram of pixel intensities in each image frame may be analyzed to provide insights into the distribution of brightness values. Sudden shifts or outliers in the histogram might indicate changes in lighting conditions. In yet another embodiment, the processing module may use Region of Interest (ROI) analysis to detect the lighting conditions. In this analysis, a focus is made on a specific region of interest within the image frame, such as the playing field or specific areas where lighting changes are critical, to provide a more targeted assessment of lighting conditions. In yet another embodiment, the processing module performs contrast analysis to analyze contrast between different parts of the frame, such that sudden changes in contrast may indicate variations in lighting. In yet another embodiment, the processing module may use a machine learning model trained to classify frames based on their lighting conditions. The machine learning model may be trained on a dataset with labeled examples of different lighting conditions.
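By way of a non-limiting illustration, the histogram analysis described above may be sketched as follows, assuming consecutive OpenCV-style frames; the bin count and the Bhattacharyya-distance threshold are illustrative.

```python
import cv2

def detect_lighting_shift(prev_frame, curr_frame, threshold=0.25) -> bool:
    """Flag a sudden lighting change by comparing grayscale intensity histograms."""
    def hist(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        h = cv2.calcHist([gray], [0], None, [64], [0, 256])  # 64-bin brightness histogram
        return cv2.normalize(h, h).flatten()
    # The Bhattacharyya distance grows as the two brightness distributions diverge.
    distance = cv2.compareHist(hist(prev_frame), hist(curr_frame),
                               cv2.HISTCMP_BHATTACHARYYA)
    return distance > threshold
```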
[0029] In an embodiment, the lighting conditions may depend on at least one or more parameters: time of day, type of stadium comprising the field, weather conditions, angle of the image capturing unit, one or more shadows, and reflections from the field. In an example, the direction and intensity of sunlight are affected by variations in the sun's angle throughout the day. For example, as the sun is lower on the horizon in the morning and evening, it creates longer shadows and softer, warmer light. On the other hand, the midday sun is typically higher in the sky, making the illumination harsher with more intense shadows. Further, the type of stadium also affects the lighting conditions; for example, direct sunlight can create dynamic lighting conditions in an outdoor stadium. On the other hand, artificial lighting in an indoor stadium might offer steady illumination, but the lights' color temperature and intensity might still change. Further, the lighting conditions may depend upon the weather conditions; for example, on a cloudy day, the lighting is diffused and shadows are softer. However, on a sunny day, the sunlight can create strong contrasts between light and shadow. Rain can also affect lighting conditions, as it may add reflections and change the overall brightness. Furthermore, the angle at which the image capturing unit is placed also affects the lighting conditions in the frames of the live video stream. For example, capturing video of the one or more participants in direct sunshine might cause the faces of the participants to have shadows. Thus, depending on the time of day and the angle of illumination, players, equipment, and structures on the field can cast shadows. The length and intensity of these shadows can change rapidly, affecting the overall visibility and aesthetics of the video. Also, at times, reflections from the field may affect the lighting conditions. For example, if the field is wet due to rain or irrigation, it may create reflections of light. This can add an extra level of complexity to the lighting conditions, especially if the sunlight reflects off the field's surface, potentially causing glare or making it challenging to see certain details.
[0030] In an embodiment, the system 102, via the processing module 220, may be configured to adjust the detected lighting conditions in the received live video stream based on a light model. For example, when the lighting conditions in the image frames of the received live video stream are affected by the one or more parameters discussed above, the system 102 may adjust the detected lighting conditions in the received live video stream based on the light model. In an embodiment, the light model may be a machine learning model. To train the machine learning model, the system may receive a diverse training dataset of images or video frames captured in various lighting conditions during different events occurring in the field. The dataset may include samples from different times of the day, a range of lighting intensities, temperatures, types of light sources, weather conditions, stadiums, and camera angles. Each image sample in the video frames may be labeled with information about the lighting conditions, such as time of day, weather conditions, stadium type, and the like.
[0031] In an embodiment, the light model may be trained on the diverse training dataset gathered by the system, using the labeled images and corresponding lighting condition information. For example, each image in the diverse training set may be assigned a label indicating the corresponding lighting conditions. The labels may include categories such as time of day (morning, afternoon, evening), weather conditions (sunny, cloudy, rainy), stadium type (indoor, outdoor), and other one or more parameters. The system may extract features from the images that are indicative of lighting conditions to train the light model, such as by analyzing color histograms, texture patterns, and other image characteristics. For example, a well-lit image might have more vibrant colors and fewer shadows compared to an image taken in low light. Once trained, the light model may predict lighting conditions for new images or video frames and adjust the lighting conditions accordingly. This may include modifying brightness, contrast, color balance, or similar settings to enhance visibility or aesthetics. In an embodiment, the system may split the diverse dataset of images and videos into a training set and a validation set. The light model may be trained on the training set, using the labeled images and corresponding lighting condition information. However, the validation set may help evaluate the light model's performance and allow tuning of the model to improve the predictions.
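By way of a non-limiting illustration, one simple way to apply an adjustment once the light model has predicted a condition is a gamma-correction look-up table, as sketched below; the condition labels and gamma values are illustrative stand-ins for the trained light model's output.

```python
import cv2
import numpy as np

# Hypothetical mapping from a predicted lighting label to a gamma value; the
# actual adjustment in the disclosure is driven by the trained light model.
GAMMA_BY_CONDITION = {"low_light": 0.6, "normal": 1.0, "harsh_sun": 1.4}

def adjust_lighting(frame: np.ndarray, predicted_condition: str) -> np.ndarray:
    """Apply a gamma curve chosen from the light model's predicted condition."""
    gamma = GAMMA_BY_CONDITION.get(predicted_condition, 1.0)
    # Pre-compute a 256-entry look-up table implementing the gamma curve.
    table = np.array([((i / 255.0) ** gamma) * 255 for i in range(256)],
                     dtype=np.uint8)
    return cv2.LUT(frame, table)
```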
[0032] Further, in an embodiment, once the system 102 has adjusted the detected lighting conditions in the received live video stream based on the light model, the system, via the processing module, may be configured to detect one or more parameters specific to the event. The one or more parameters may be associated with a stadium in which the event is taking place. The parameters may include, for example, dimensions of the stadium, dimensions of the playing area in the stadium, lighting conditions, seating capacity, and the like. The detection of one or more parameters may be explained further considering an example of a cricket match as the live event. If the event occurring on the field is a cricket match, one or more parameters specific to the cricket match may include playing field dimensions, field markings, boundary line, pitch dimensions, playing surface, and crowd in the stadium. For example, detecting the playing field dimensions may involve determining the dimensions of the cricket field, including the length of the pitch and the size of the playing area. This is essential for comprehending the game's spatial environment and is useful for a variety of analyses and visualizations. For a cricket match taking place in a cricket stadium, the system may further detect and measure the length and width of the field. Further, detection of field markings may include identifying important field markers, including return creases, popping creases, and stump locations. This is essential for monitoring player positions and deciding which players should be dismissed. Furthermore, detecting the dimensions of the cricket pitch includes detecting the length of the pitch, boundary distance, the position of the stumps, and the popping creases. This information is essential for assessing the delivery of the ball by the bowler. The system may further detect the kind of playing surface, such as artificial, dry and dusty, or grass. The players' tactics and the ball's behavior can both be affected by this knowledge.
[0033] In an embodiment, the system, via the processing module, may detect the crowd present in the stadium, in the live video stream. The system may mask the detected crowd in the live video stream. In an embodiment, the system may use a machine learning model for detecting the crowd in the live video stream and masking it. The system may collect a diverse dataset of labeled images or video frames containing examples of the crowd and non-crowd regions. The dataset may include different scenarios, lighting conditions, and crowd densities. The system may train the machine learning model on the diverse data set to identify and localize the crowd in each frame of the live video stream. In an embodiment, the machine learning model may include object detection, segmentation models, and the like to detect and mask the crowd. The system may train the object detection model on the diverse data set to identify bounding boxes around the crowd in each video frame. Based on the bounding boxes identified by the object detection model, a region of interest (ROI) may be extracted from each frame. This ROI corresponds to the area where the crowd is present. The system may train the semantic segmentation model to classify the pixels within these bounding boxes as belonging to the crowd. The extracted ROI may be then fed into the semantic segmentation model, which performs pixel-level classification to determine which pixels belong to the crowd and which do not. The output of the semantic segmentation model is a binary mask, where pixels belonging to the crowd are marked, and non-crowd pixels are unmarked. The system may integrate both these models to process the live video stream to identify and mask the crowd in the live video stream and generate a processed video stream.
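By way of a non-limiting illustration, the combination of detected crowd bounding boxes and a pixel-level segmentation mask may be applied as sketched below; `segment_fn` is a placeholder for the trained semantic segmentation model and is not part of the disclosure.

```python
import numpy as np

def mask_crowd(frame: np.ndarray, crowd_boxes, segment_fn) -> np.ndarray:
    """Suppress crowd pixels in one frame.

    crowd_boxes: (x, y, w, h) boxes from the crowd object-detection model.
    segment_fn:  callable returning a binary mask (1 = crowd) for a cropped ROI,
                 standing in for the semantic segmentation model.
    """
    masked = frame.copy()
    for (x, y, w, h) in crowd_boxes:
        roi = frame[y:y + h, x:x + w]
        crowd_mask = segment_fn(roi)
        # Zero out the pixels classified as crowd within the bounding box.
        masked[y:y + h, x:x + w][crowd_mask.astype(bool)] = 0
    return masked
```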
[0034] In an embodiment, the system 102, via the detection module 216, may be configured to identify respective positions of the one or more participants in the processed video stream. In an embodiment, the detection module 216 may receive Standard Definition (SD) signals from a converter unit. In an embodiment, one converter unit may be used to convert Ethernet signals received from an image capturing unit to SD signals, and another converter unit may downscale each image frame of the live stream, received as SD signals, based on a predefined format and size to allow faster processing. The positions may be identified by deriving spatial information associated with the one or more participants. The spatial information comprises a set of position coordinates corresponding to the one or more participants. In an exemplary embodiment, to derive the position coordinates of each participant, the detection module 216 may comprise a GStreamer-based streaming application which captures the stream of images in the processed video stream and feeds each image into a Computer Vision application deployed in the form of the detection module 216. In one aspect, the GStreamer application may be enabled using a plug-in to ingest the stream of images into the detection module 216 at 25 frames per second (fps). The above process of converting the Ethernet signals into SD signals and then transmitting the SD signals to the detection module 216 is continuously performed to provide access to each image to the detection module 216, which is configured to detect the position coordinates of each participant present in the image.
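By way of a non-limiting illustration, a GStreamer ingest pipeline read through OpenCV at 25 fps may look as sketched below; the pipeline elements, port, and the `handle_frame` hand-off are illustrative placeholders for the converter-unit output and the detection module.

```python
import cv2

# Illustrative GStreamer pipeline delivering decoded frames at 25 fps to an
# appsink read through OpenCV (built with GStreamer support); the source
# elements and port stand in for the converter-unit output described above.
PIPELINE = (
    "udpsrc port=5000 ! application/x-rtp ! rtph264depay ! avdec_h264 ! "
    "videorate ! video/x-raw,framerate=25/1 ! videoconvert ! appsink"
)

def handle_frame(frame):
    """Hypothetical hand-off of one frame to the detection module 216."""
    pass

capture = cv2.VideoCapture(PIPELINE, cv2.CAP_GSTREAMER)
while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break
    handle_frame(frame)
capture.release()
```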
[0035] In one embodiment, the position coordinates may be detected by a Deep Learning - Machine Learning (ML) model trained using a plurality of training data images and video streams. The plurality of training data images and video streams, depicting the one or more participants scattered around the playing field, are captured from a variety of angles, weather conditions, lighting conditions, times of day, and players wearing jerseys of distinct colors. In an embodiment, the Deep Learning - Machine Learning (ML) model may be trained on a data set comprising a plurality of video recordings of a plurality of events and annotations for position coordinates of each participant detected in the plurality of the video recordings. In one aspect, the Deep Learning - Machine Learning (ML) model may be trained based on object detection algorithms and object tracking algorithms.
[0036] With this training, the detection module 216 detects a set of position coordinates of each participant present in each image frame of the video stream. The set of position coordinates enables the detection module 216 to create a bounded box around the set of position coordinates in a manner such that each bounded box contains a participant. It may be understood that each participant is detected in each image frame of the video stream, so that the participant can be tracked for as long as the event, such as a cricket match, is being played. In one aspect, the position coordinates are detected by using at least one of a person detection technique and a transformation technique. The person detection technique may be used to identify a person (for example, a participant) in an image frame of the video stream. Further, the position coordinates of the person may be derived as pixel coordinates based on a coordinate system defined for the image. Further, the pixel coordinates may be converted to real-world coordinates corresponding to a location in the playing field. The system may use the transformation technique to calculate the real-world coordinates as the position coordinates. The detection module 216 further determines a number of participants, including players and umpires, on the playing field and generates a corresponding number of bounding boxes such that each bounded box subsumes one participant.
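By way of a non-limiting illustration, the person detection technique may be sketched with OpenCV's built-in HOG people detector as a simplified stand-in for the trained Deep Learning - Machine Learning (ML) model described above.

```python
import cv2

# Simplified stand-in for the trained deep-learning detector: OpenCV's built-in
# HOG person detector, which returns one bounding box per detected participant.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_participants(frame):
    """Return (x, y, w, h) bounding boxes around persons in one image frame."""
    boxes, _weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    return [tuple(box) for box in boxes]
```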
[0037] The bounding boxes may be created based on a subset of the set of position coordinates. The set of position coordinates may include a plurality of position coordinates of the participant in one or more images of stream of images in the video stream. The subset may comprise position coordinates of the participant in one image. The bounding box may be created in each image based on the position coordinates of the participant in that image. The position coordinates may represent at least one of a location of the participant in the playing field and pixels of the image depicting the participant.
[0038] In one embodiment, this continuous detection of the participant through the spatial information, including the position coordinates in the stream of images, enables monitoring of the movement of the participant around the playing field. The continuous detection further enables prediction of probable positions for each participant who goes undetected in one or more images of the stream of images. In one aspect, this prediction may be performed based on the last recorded position of the participant, a velocity based on the historic movement of the participant, the number of frames in which the participant goes undetected, and the movement pattern around the playing field. In one embodiment, the participant's probable position may be one of coordinates of a point on the playing field, coordinates corresponding to an area on the playing field, and coordinates corresponding to a track followed by the participant on the playing field.
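By way of a non-limiting illustration, a constant-velocity extrapolation of a participant's probable position, based on the last recorded position and historic velocity, may be sketched as follows; the frame rate and units are illustrative.

```python
def predict_probable_position(last_position, last_velocity, missed_frames, fps=25):
    """Extrapolate a participant's probable position while detections are missing.

    last_position: (x, y) real-world coordinates from the last frame with a detection.
    last_velocity: (vx, vy) in field units per second, derived from historic movement.
    missed_frames: number of consecutive frames in which the participant went undetected.
    """
    dt = missed_frames / fps
    return (last_position[0] + last_velocity[0] * dt,
            last_position[1] + last_velocity[1] * dt)
```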
[0039] In an embodiment, the system, via a receiving module 226, may receive an input from an operator. The input may comprise a selection corresponding to the one or more participants. For example, a user may interact with the system through the user device interface. The user device interface may provide different ways for users to make selections corresponding to one or more participants. For example, the selections may be made via buttons, dropdown menus, checkboxes, or other UI elements. In an embodiment, the selection corresponding to the one or more participants may mean that the user may select one or more participant objects, corresponding to the participants on the field, being displayed on the user's device. The participant object may be in the bounded box which may be assigned a unique identification (UID). The system may receive the selection of the one or more participants along with the assigned unique identification.
[0040] In an embodiment, the system may generate a visual representation of the one or more selected participants based on the identified positions of the one or more selected participants, on a user device. The generation of the visual representation includes overlaying graphics onto a particular portion of a broadcasted video stream on the user device. For example, the graphics may display a representation of the playing field along with the one or more selected participants positioned on the field during the ongoing event. In an embodiment, the system, via an image translating module 222, translates the position coordinates of each participant, present in an image frame of the live video stream, into a destination image by using a homography technique. For example, the destination image may refer to an output image where the visual representation of the participant, showing one or more positions, will be rendered. This image could be a map, a graphical representation of a playing field, or any other visual representation. Homography is a transformation technique used in computer vision to map points from one plane (such as the live video stream) to another plane (the destination image). It assumes a planar surface, making it suitable for mapping positions from a 2D video frame to another 2D image. The homography technique involves calculating a transformation matrix (homography matrix) that can map the coordinates of points in the image frame of the live video stream to their corresponding positions in the destination image.
[0041] The homography technique includes marking a set of points as edges of straight lines on the image frame. It may be noted that the set of points are marked in a manner such that each straight line intersects the playing field from one end to another end of the playing field. The set of points are marked in the image frame in a specific order. In order to translate the position coordinates of each participant into the destination image by using the homography technique, the image frame may be superimposed on a destination image having the set of points marked on the image frame. The destination image may be an image comprising an aerial view of the playing field. For example, the destination image may be captured via a drone. The playing field, varying in dimensions such as width, length, diameter, and circumference based on the stadium, is captured and standardized to a consistent aspect ratio. The standardization process may involve converting the aspect ratio into pixels, where each pixel corresponds to a specific unit of length in the real-world playing field, ensuring a proportional mapping between physical and digital representations. In an embodiment, a transformation matrix may be derived by comparing the destination image with the image frame, incorporating the spatial relationships between the marked points. The obtained transformation matrix may then be applied by multiplying it with one or more parameters of the image frame, resulting in a transformation of the image frame onto the destination image. For example, if a player is at coordinates (x, y) in the image frame, the transformation matrix multiplication may yield new coordinates (x', y') representing the adjusted position. The multiplication produces a set of transformed parameters for each player, accounting for changes in scale, orientation, and perspective. The adjusted parameters may dictate how the image frame should be modified to align with the drone-captured aerial view. The transformation may ensure accurate alignment and superimposition of participant positions. Notably, the transformation matrix facilitates the adjustment of scale and aspect ratio, ensuring a consistent representation between the destination image and the image frame.
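By way of a non-limiting illustration, the derivation and application of the homography matrix may be sketched with OpenCV as follows; the marked point coordinates are illustrative values standing in for the points marked on the image frame and the destination image.

```python
import cv2
import numpy as np

# src_points: the marked points on the camera image frame.
# dst_points: the same points as they appear in the aerial destination image.
# Both sets of coordinates are illustrative; in practice they come from the marking step.
src_points = np.float32([[410, 120], [1510, 118], [1700, 980], [220, 985]])
dst_points = np.float32([[0, 0], [1000, 0], [1000, 1000], [0, 1000]])

H, _status = cv2.findHomography(src_points, dst_points)

def to_destination(participant_xy):
    """Map a participant's pixel coordinates into the destination image plane."""
    pt = np.float32([[participant_xy]])        # shape (1, 1, 2) as required
    mapped = cv2.perspectiveTransform(pt, H)
    return tuple(mapped[0, 0])
```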
[0042] In one aspect, the above translation using the homography technique is performed to intuitively render, on the user's device, each of the one or more selected participants present on the field. The one or more selected participants may be rendered in a two-dimensional linear plane format. In an exemplary embodiment, the one or more participants may be intuitively rendered on the viewer's device. This may be achieved by pushing the one or more selected participants as objects along with information related to one or more parameters associated with them. This process occurs in an output pipeline that is communicatively coupled with the broadcaster's input. Subsequently, the broadcaster broadcasts the participant object and the information related to one or more parameters on the viewer's device, as illustrated in Figure 5. For example, one or more participants may be rendered on the user device in one or more forms such as dots, simple markers, arrows, icons, participant silhouettes, color-coded participants, numbered labels, avatars, and the like. Thus, the result may be a translated image where the positions of participants from the live video stream are accurately reflected in the context of the destination image. This translated image could be displayed, stored, or used for further analysis. In yet another embodiment, the system may generate visual representations of all the participants present on the field during the event, using the above discussed method for the selected participants.
[0043] In an embodiment, the system, via a tracking module 224, may track, in real time, a change in the identified positions of the one or more selected participants using a tracking model. In yet another embodiment, the tracking module may track all the participants present on the field during the event. The tracking model may be a machine learning model trained to detect a change in the position of the one or more participants using a training dataset. The training dataset may comprise a plurality of video streams including annotated positions of the one or more participants in image frames of the video streams and a plurality of video streams including an annotated change in positions of the one or more participants in the one or more image frames of the video streams. For example, the system may gather a diverse set of video streams from matches, practices, or relevant scenarios. The position of the participants in each of the image frames in the video streams may be annotated with bounding boxes or key point annotations. The annotations may include unique participant IDs for tracking. The system further annotates the change in positions of participants between consecutive frames. The annotations may represent the motion or trajectory of each participant over time. The dataset may include a variety of scenarios, such as different sports, lighting conditions, weather conditions, times of day, uniform colors, participant interactions, different angles of the image capturing unit, and the like. In an embodiment, temporal information, including velocity and acceleration of participants, may be annotated. This information helps the tracking model understand the dynamics of participant movement. The dataset gathered by the system acts as an input for training the tracking model to predict both the current position as well as the change in position for each participant based on the provided dataset. In an embodiment, one or more model parameters may be optimized using optimization algorithms to minimize the difference between predicted and actual changes in position. The tracking model, trained to analyze and predict changes in position, effectively generates a movement profile by capturing and understanding the patterns, dynamics, and characteristics of participant movement in the live video stream. A movement profile may refer to a comprehensive representation of how an entity (like one or more participants) moves over time.
[0044] In an embodiment, in order to continuously track the participant, the system may assign a Unique Identification (UID) to the person. In one aspect, the system may assign a UID to each participant detected in the stream of images. In an embodiment, the system may annotate the bounding box comprising the participant with the UID. It may be understood that the UID may be a unique number assigned to each participant. In one embodiment, the UID is assigned by using an Operational Support System (OSS) based player tracking algorithm customized to assign the unique number to each participant in the bounded box. The intent behind assigning the UID is to avoid an occlusion state whenever two or more participants overlap, leading to swapping and/or loss of identification. It is to be understood that the bounded box, once created, moves along as the participant moves around the playing field. It may be noted that if the participant disappears and cannot be tracked, due to the occlusion, for a short amount of time during the ongoing sport and re-appears again in subsequent images, the same UID would be assigned to the participant as assigned before. In other words, if the participant re-appears and is detected again in the succeeding image, the participant may be assigned the same UID as assigned before to him/her.
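By way of a non-limiting illustration, a simple intersection-over-union association that carries a UID forward across frames, and issues a new UID otherwise, may be sketched as follows; the overlap threshold is illustrative and the sketch is a simplified stand-in for the customized player tracking algorithm.

```python
import itertools

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) bounding boxes."""
    ax2, ay2, bx2, by2 = a[0] + a[2], a[1] + a[3], b[0] + b[2], b[1] + b[3]
    ix = max(0, min(ax2, bx2) - max(a[0], b[0]))
    iy = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

_uid_counter = itertools.count(1)

def assign_uids(tracked, detections, iou_threshold=0.3):
    """Carry a UID forward to the best-overlapping detection, else issue a new one.

    tracked:    {uid: bounding_box} from the previous image frame.
    detections: list of bounding boxes detected in the current image frame.
    """
    updated = {}
    for box in detections:
        best_uid, best_iou = None, iou_threshold
        for uid, prev_box in tracked.items():
            overlap = iou(box, prev_box)
            if overlap > best_iou and uid not in updated:
                best_uid, best_iou = uid, overlap
        updated[best_uid if best_uid is not None else next(_uid_counter)] = box
    return updated
```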
[0045] In an embodiment, the system, via the graphics rendering module 214, may generate the visual representation of the one or more selected participants based on the change in the identified positions. As discussed above, the homography matrix may be calculated based on initial positions of participants in the first frame of the video. This matrix represents the transformation from the original coordinate system to a new one. The changes in the positions of participants are continuously tracked using the tracking model. Further, for each frame, the homography matrix may be applied to the changing positions of participants. This transforms the positions to the new coordinate system. In an embodiment, the system may overlay the visual representation onto a destination image using the transformed coordinates. The system may implement real-time updates to the visual representation as the positions of participants change over time. The machine learning model for position tracking is integrated with the homography-based transformation process to reflect the change in positions of the one or more participants in the visual representation. The participants may be represented on the user device as a dot, marker, arrow, 2D avatar, participant silhouette, participant icon, and the like. In an embodiment, the system, via the graphics rendering module 214, may generate the visual representation of all the participants present in the field during the event, based on the change in the identified positions, using the above discussed methods.
[0046] In an embodiment, the system, via the tracking module 224, tracks one of: one or more movements and one or more actions of the one or more selected participants using a machine learning model. The system gathers a diverse dataset for training the machine learning model to track the movements and actions. The dataset may include video streams capturing different sports, lighting conditions, weather conditions, participant movements in the different sports, participant gestures or actions in the different sports, and various participant interactions in the different sports. The dataset may be annotated with labeled information, such as bounding boxes or key point annotations, indicating the positions of participants in each frame. Also, annotations for static positions and dynamic changes in positions between consecutive frames may be included. The dataset may be annotated with temporal information, including velocity and acceleration of participants. The dataset may further be annotated with types of movements and actions like running, jumping, or changing direction, while actions may also involve specific events such as scoring a goal, making a pass, defensive actions, serving, net approaches, taking points, taking a wicket, scoring a run, fielding events like catches, run-outs, stumpings, and the like. In an embodiment, the dataset may include the 3D coordinates (x, y, z) of one or more participants. The system may input the dataset to the machine learning model for training. The machine learning model may learn to recognize patterns and features associated with different movements and actions based on the provided annotations. The system may implement the trained machine learning model in the system's tracking module to perform real-time tracking of participants during a live event. The machine learning model may predict both current positions and changes in positions based on the learned patterns. In an embodiment, the machine learning model may be able to identify and categorize specific movements of participants in real-time. This may involve classifying movements into predefined categories or providing a continuous representation of participant motion. In an embodiment, the system may establish a feedback loop to continuously improve the machine learning model's performance. In an embodiment, the system, via the tracking module 224, tracks one of: one or more movements and one or more actions of all the participants present in the field using the machine learning model.
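By way of a non-limiting illustration, a simple speed-based classification of a tracked participant's movement over a short window of real-world positions may be sketched as follows; the speed thresholds and categories are illustrative and stand in for the trained machine learning model.

```python
import math

def classify_movement(positions_m, fps=25):
    """Classify a short window of real-world positions as stationary, walking, or running.

    positions_m: list of (x, y) field coordinates in meters for consecutive frames.
    The speed thresholds are illustrative and not part of the disclosure.
    """
    if len(positions_m) < 2:
        return "stationary"
    travelled = sum(math.dist(a, b) for a, b in zip(positions_m, positions_m[1:]))
    speed = travelled * fps / (len(positions_m) - 1)   # meters per second
    if speed < 0.5:
        return "stationary"
    return "walking" if speed < 3.0 else "running"
```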
[0047] In an embodiment, the system, via the graphics rendering module 214, may generate the visual representation of at least one of the one or more movements and the one or more actions of the selected one or more participants. In yet another embodiment, the system may generate the visual representation of at least one of the one or more movements and the one or more actions of all participants present in the field during the event. As discussed above, the homography matrix represents the transformation from the original coordinate system to a new one. The changes in the positions of participants are continuously tracked using the tracking model. Further, for each frame, the homography matrix is applied to the changing movements and positions of participants. This transforms the positions to the new coordinate system. In an embodiment, in real-time, the system overlays a dynamic visual representation onto a destination image, reflecting the current movements and actions of the players. The integration of the machine learning model for movement and action tracking further enhances the precision of this overlay, adapting to the evolving actions and movements of the players. The machine learning model for movement and action tracking is integrated with the homography-based transformation process to reflect the change in positions of one or more participants in the visual representation. The participants may be represented on the user device as a dot, marker, arrow, 2D avatar, participant silhouette, participant icon, and the like.
[0048] In an embodiment, it may be understood that after continuously tracking each participant, various meaningful insights, like information related to one or more parameters associated with the participants, can be deduced, which may then be intuitively rendered on the user's device for the user's consumption. In an exemplary embodiment, the information may be rendered on the user's device upon receiving a selection of the insights from the operator. In yet another embodiment, the system may use machine learning algorithms to analyze the user's preferences regarding the insights and render the preferred insights on the user device. In yet another embodiment, insights may be rendered in real-time on the user's device as soon as they become available, or at specific defined intervals, without requiring explicit selection from the operator. A few embodiments of such meaningful insights which can be deduced based on continuous tracking are described hereinafter for better understanding. In an embodiment, as discussed earlier in the disclosure, the system receives the selection corresponding to the one or more participants on the field. The system may further receive, via the receiving module, an input from a user, such that the input comprises a selection corresponding to one or more parameters associated with the selected one or more participants. The system may capture a corresponding UID of the participant at the time of selection. The one or more parameters may include, but are not limited to, distance between the selected participants, distance of each of the selected participants from another one or more participants on the field, distance of each of the selected participants from a point on the field, field dimensions, biographic details related to the selected participant, highlighting the selected participant, tracing a player, change in fielding position, data related to wind speed, gap between selected participants, sport-related statistics of the selected participant (for example, number of matches played, highest score, best catches, and more), and the like. In an exemplary embodiment, in a game of cricket, the one or more parameters may be a distance indicating a gap between any two fielders on the playing field, a total distance covered by a fielder from the fielder's original position to a position where the fielder collects or catches the ball, for a ball delivered by a bowler, and highlighting the gap between two fielders.
[0049] In an embodiment, the system generates both the visual representation of the one or more selected participants as discussed above along with information associated with the one or more selected parameters corresponding to the one or more selected participants. The system may store the information associated with the one or more participants in a database 228. The information corresponding to each participant may be stored using the unique identification assigned by the system while tracking. To generate the associated information, the system may retrieve the required information from the database using the UID of the participant captured at the time of selection.
[0050] This is further elaborated with the help of an example where different fielders are placed on a cricketing field 302 as shown in figure 4. In order to deduce the meaningful insights about the fielders, the operator selects two candidate objects, referred to as fielders, on the image rendered as the playing field. As shown in figure 4, the operator selects two points which are nothing but fielders positioned at 'slip' and 'gully', marked as 'ABC' and 'PQY' respectively. The information output then deduces the meaningful insights as the names of the selected participants along with a distance indicating a gap between the selected fielders 'ABC' and 'PQY'. In an embodiment, the system may retrieve the respective names of the participants from the database 228 using the UID. In an embodiment, the one or more parameters like the distance between 'ABC' and 'PQY' may be determined using a transformation matrix. The transformation may convert the distance in meters on the actual field to pixels, to be displayed on the user device. For example, the system may capture calibration images or videos of the field (like, but not limited to, a cricket field) with known distances, including calibration markers or objects (like participant objects) with easily identifiable features. Calibration provides the necessary information to understand how many pixels correspond to a specific distance in the physical world. Further, calibration objects in the images are annotated to obtain pixel coordinates. The actual distances between these calibration objects are calculated in meters. The system may further calculate a conversion factor by determining pixel-to-meter ratios. In the live video stream, these initial pixel-to-meter ratios are used as conversion factors for subsequent measurements. By establishing the relationship between pixels and meters based on the annotated calibration data, a foundation for accurate spatial measurements in the digital domain may be laid. For example, the annotated calibration data may be data related to objects already present on the playing field, like data related to the pitch present at the playing field, which may include topographical data, pitch dimensions, pitch markings, environmental factors, vegetation conditions, etc. The system may train a machine learning model to refine the pixel-to-meter ratios using features extracted from calibration images. Subsequent to the determination of the meaningful insights, an image translating module (not shown in the figure) translates the position coordinates of each person present in the image, and the meaningful insights, into the visual representation.
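By way of a non-limiting illustration, once a pixel-to-meter ratio has been calibrated, the gap between two selected fielders may be computed as sketched below; the calibration against the pitch length (approximately 20.12 m between the stumps) and the example coordinates are illustrative.

```python
import math

def field_distance_m(p1_px, p2_px, meters_per_pixel):
    """Distance in meters between two fielders given destination-image pixel coordinates."""
    dx = (p2_px[0] - p1_px[0]) * meters_per_pixel
    dy = (p2_px[1] - p1_px[1]) * meters_per_pixel
    return math.hypot(dx, dy)

# Illustrative calibration: if the pitch (about 20.12 m stump to stump) spans
# 200 pixels in the destination image, one pixel corresponds to roughly 0.1 m.
meters_per_pixel = 20.12 / 200
gap = field_distance_m((640, 310), (705, 362), meters_per_pixel)  # roughly 8.4 m
```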
[0051] In an embodiment, the system may identify an order of the selected one or more participants. For example, the user may select three participants, A, B, and C. The user may be an operator who selects the participants based on the ongoing commentary for the match. Accordingly, the system may identify which of these was selected first, second, or third. As discussed above in the disclosure, each participant may be associated with a bounding box and a unique identifier. So, when a user selects a participant object, the system captures the participant’s UID associated with the selection event. The participant object may be selected by a button click, checkbox selection, or any other user interaction method. The system may use different methods to detect the order of selection. In one embodiment, the system may assign a timestamp to each participant selection event. The order may then be determined based on the chronological sequence of the timestamps. In yet another embodiment, an event-driven approach may be utilized by the system, where each participant selection triggers an event. The order is then determined by the sequence of events.
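The timestamp-based approach described above might, for instance, be sketched as follows in Python; the UIDs are hypothetical and time.monotonic() stands in for whatever clock the selection events actually carry.

    import time

    selection_events = []

    def on_participant_selected(uid):
        """Record a selection event with a monotonic timestamp."""
        selection_events.append({"uid": uid, "ts": time.monotonic()})

    # Simulated operator interaction: A, then B, then C.
    for uid in ("A", "B", "C"):
        on_participant_selected(uid)

    # The order of selection is the chronological order of the timestamps.
    ordered = [e["uid"] for e in sorted(selection_events, key=lambda e: e["ts"])]
    print(ordered)  # -> ['A', 'B', 'C']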
[0052] In an embodiment, the system is configured to generate a label for a respective position of the one or more selected participants. In an exemplary embodiment, when the event is a cricket match, the label may indicate respective fielding positions of the one or more selected participants. For example, the label may indicate slip, gully, cover, mid-off, mid-on, square-leg, fine-leg, mid-wicket, and the like. In an embodiment, the system may use a machine learning model to determine the label for the position. The machine learning model may be trained on a variety of datasets including the one or more parameters, like the distance between the selected participants, the distance of each of the selected participants from another one or more participants on the field, and the distance of each of the selected participants from a point on the field, as well as historical matches, including information about the type of batsman, the bowler, match conditions, and the fielding positions at critical moments. The system may label the dataset with the optimal fielding positions for each situation. The system may then train the machine learning model on the labeled dataset, teaching it to predict the optimal fielding position based on the input features.
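As a toy illustration of how such a labelling model might be trained, the Python sketch below fits a nearest-neighbour classifier (scikit-learn is assumed here purely for illustration) on a handful of hand-labelled field coordinates and predicts a fielding-position label for a new position. The coordinates are hypothetical; a production model would be trained on the richer parameters and historical match data described above.

    from sklearn.neighbors import KNeighborsClassifier

    # Hand-labelled training data: (x, y) position in meters relative to the batter,
    # paired with the fielding-position label for that situation.
    X_train = [
        [12.0, -8.0],    # slip
        [15.0, -14.0],   # gully
        [-25.0, 20.0],   # cover
        [-5.0, 45.0],    # mid-off
        [5.0, 45.0],     # mid-on
        [30.0, 5.0],     # square-leg
    ]
    y_train = ["slip", "gully", "cover", "mid-off", "mid-on", "square-leg"]

    model = KNeighborsClassifier(n_neighbors=1)
    model.fit(X_train, y_train)

    # Predict the label for a newly identified fielder position.
    print(model.predict([[14.0, -12.5]])[0])  # -> 'gully'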
[0053] In an embodiment, the system may be configured to generate the visual representation of the one or more participants, via the graphics rendering module, as a 3D model. The system may use one or more techniques, like the Neural Radiance Fields (NeRF) technique, to generate the 3D model. NeRF may be used to model the participants and their surroundings and create a 3D visual representation of the participant on the field. The neural network may learn the fine characteristics of the participants and produce realistic 3D reconstructions by training the NeRF model on a dataset of photos that feature participants from various angles. The trained NeRF model may be used in real-time applications to generate dynamic 3D participant representations from the image capturing unit’s live video stream. In an embodiment, multiple images of each participant may be captured from different viewpoints using the image capturing unit. In yet another embodiment, the multiple images may be captured using more than one image capturing unit. These images serve as the input data for the NeRF algorithm. For each image, the depth and pose (camera viewpoint) are estimated. While pose information may be essential for recreating the 3D world from various angles, depth information aids in understanding the scene’s three-dimensional structure. The NeRF model may be trained on the input images and their associated depth and pose information. The system may integrate the NeRF-based rendering with the live video stream. This involves continuously updating the neural network with new images from the image capturing unit’s live feed, estimating depth and pose, and rendering the participant’s 3D model in real-time.
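To make the NeRF idea concrete, the following PyTorch sketch (PyTorch is assumed here only for illustration) defines a minimal radiance-field network that maps a positionally encoded 3D sample point to a colour and a density. It deliberately omits the view direction, ray sampling, volume rendering, pose estimation, and the training loop that a complete NeRF pipeline requires.

    import torch
    import torch.nn as nn

    def positional_encoding(x, n_freqs=6):
        """Encode 3D points with sin/cos features at increasing frequencies, as in NeRF."""
        features = []
        for i in range(n_freqs):
            freq = 2.0 ** i
            features.append(torch.sin(freq * x))
            features.append(torch.cos(freq * x))
        return torch.cat(features, dim=-1)

    class TinyRadianceField(nn.Module):
        """Minimal MLP mapping an encoded 3D sample point to colour (rgb) and density (sigma)."""
        def __init__(self, n_freqs=6, hidden=128):
            super().__init__()
            self.n_freqs = n_freqs
            self.mlp = nn.Sequential(
                nn.Linear(3 * 2 * n_freqs, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),            # r, g, b, sigma
            )

        def forward(self, xyz):
            out = self.mlp(positional_encoding(xyz, self.n_freqs))
            rgb = torch.sigmoid(out[..., :3])    # colour in [0, 1]
            sigma = torch.relu(out[..., 3:])     # non-negative volume density
            return rgb, sigma

    # Query the (untrained) field at a batch of 3D sample points along camera rays.
    points = torch.rand(1024, 3)
    rgb, sigma = TinyRadianceField()(points)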
[0054] Referring now to Figure 6, a method 600 for generating a visual representation of one or more participants on a playing field during an ongoing sport is shown, in accordance with an embodiment of the present subject matter. The method 600 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 600 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0055] The order in which the method 600 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 600 or alternate methods. Additionally, individual blocks may be deleted from the method 600 without departing from the scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 600 may be considered to be implemented as described in the system 102.
[0056] At block 602, a live video stream of the event occurring on the field is obtained from an image capturing unit 101 mounted at a vantage point. The image capturing unit 101 captures images in a manner such that each image frame in the live video stream shows an aerial view of the playing field in its entirety. In one implementation, the stream of images of the playing field may be obtained by the image obtaining module 212.
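A minimal sketch of obtaining such a stream is shown below, assuming OpenCV and a hypothetical RTSP feed URL for the mounted image capturing unit; the URL and the downstream processing call are placeholders only.

    import cv2

    # Hypothetical RTSP URL of the image capturing unit mounted at the vantage point.
    STREAM_URL = "rtsp://camera.example/live"

    cap = cv2.VideoCapture(STREAM_URL)

    while cap.isOpened():
        ok, frame = cap.read()        # one aerial image frame of the entire playing field
        if not ok:
            break
        # hand the frame to the processing pipeline (block 604), e.g.:
        # process_frame(frame)

    cap.release()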
[0057] At block 604, the live video stream may be processed to detect lighting conditions, mask the crowd, and the like. A stream of images in the live video stream may be continuously received one after the other, such that each received image is processed.
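By way of a simplified illustration only, the sketch below estimates the overall brightness of a frame and masks everything outside a hypothetical field polygon (i.e. the crowd region) using OpenCV; the disclosed processing may instead rely on the light model and the event-specific detection described elsewhere in this specification.

    import cv2
    import numpy as np

    def process_frame(frame, field_polygon):
        """Estimate brightness and mask out the region outside the playing field."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        mean_brightness = float(gray.mean())          # crude lighting-condition indicator

        mask = np.zeros(gray.shape, dtype=np.uint8)
        cv2.fillPoly(mask, [field_polygon], 255)      # keep only the field area
        masked = cv2.bitwise_and(frame, frame, mask=mask)
        return masked, mean_brightness

    # Example with a synthetic frame and a hypothetical field boundary polygon.
    frame = np.full((720, 1280, 3), 120, dtype=np.uint8)
    field_polygon = np.array([[100, 100], [1180, 100], [1180, 620], [100, 620]], dtype=np.int32)
    masked_frame, brightness = process_frame(frame, field_polygon)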
[0058] At block 606, the method comprises identifying the respective positions of the one or more participants on the field. This identification may occur in the processed video stream by deriving spatial information associated with the one or more participants. In an embodiment, the system may collect all the images in which a participant is detected. Further, the system may arrange the images in a sequence based on the order in which the images were received. The system may then analyze the collection of images to derive the spatial information of the one or more participants.
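The following Python sketch shows one simplified way the spatial information (a set of position coordinates per participant) might be derived from per-frame detections. The detection records and UIDs are hypothetical, and the position of each participant is taken as the centre of that participant’s bounding box.

    from collections import defaultdict

    # Hypothetical per-frame detections: (frame_index, uid, bounding box as x, y, w, h).
    detections = [
        (0, "ABC", (1160, 390, 40, 40)),
        (0, "PQY", (1235, 485, 40, 40)),
        (1, "ABC", (1164, 392, 40, 40)),
        (1, "PQY", (1238, 489, 40, 40)),
    ]

    def derive_spatial_information(detections):
        """Return, per UID, a frame-ordered list of position coordinates (bounding-box centres)."""
        tracks = defaultdict(list)
        for frame_idx, uid, (x, y, w, h) in sorted(detections, key=lambda d: d[0]):
            tracks[uid].append((frame_idx, x + w / 2, y + h / 2))
        return dict(tracks)

    positions = derive_spatial_information(detections)
    print(positions["ABC"][-1])  # latest (frame, cx, cy) of participant ABC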
[0059] At block 608, an input may be received comprising a selection of the one or more participants. The system may receive the input and identify an order of selection.
[0060] At block 610, the visual representation of the one or more selected participants based on the identified positions of the one or more participants is generated. Further, the visual representation of the change in position of the participants and the change in actions/movements is also generated.
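One simplified way of generating such a representation as graphics overlaid on a frame is sketched below using OpenCV drawing primitives; the coordinates, colours, and label text are hypothetical (the gap value reuses the figure from the calibration sketch above).

    import cv2
    import numpy as np

    frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # stand-in for a broadcast frame

    abc, pqy = (1180, 410), (1255, 505)                # identified positions of the selected fielders
    gap_label = "5.8 m"                                # value produced by the transformation model

    # Highlight the selected participants and the gap between them.
    cv2.circle(frame, abc, 18, (0, 255, 255), 2)
    cv2.circle(frame, pqy, 18, (0, 255, 255), 2)
    cv2.line(frame, abc, pqy, (0, 255, 0), 2)
    cv2.putText(frame, gap_label, (1190, 470), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)

    # The annotated frame may then be composited onto the broadcast stream for the user device.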
[0061] In an embodiment, a selection corresponding to one or more parameters associated with the selected one or more participants is received, and the visual representation of the one or more selected participants and information associated with the one or more selected parameters corresponding to the one or more selected participants is generated. In an embodiment, a change in the identified positions of the one or more selected participants is tracked in real-time using a tracking model, and the visual representation of the one or more selected participants based on the change in the identified positions is generated. In an embodiment, one or more movements and one or more actions of the selected one or more participants are tracked using a machine learning model, and the visual representation of at least one of the movements and one or more actions of the one or more selected participants is generated. In an embodiment, the visual representation includes overlaying graphics onto a portion of a broadcasted video stream on the user device, wherein the graphics display at least one of: the one or more selected participants, a change in the identified positions of the one or more selected participants, at least one of the movements and one or more actions of the one or more selected participants, and the information associated with the one or more selected participants. In an embodiment, an order of the selected one or more participants is identified. In an embodiment, the one or more parameters associated with the one or more selected participants include at least one of: a distance between the selected participants, a distance of each of the selected participants from another one or more participants on the field, and a distance of each of the selected participants from a point on the field. In an embodiment, the one or more parameters are determined based on a transformation model. In an embodiment, a label for a position of the one or more selected participants is generated based on the one or more parameters. In an embodiment, the tracking model is a machine learning model trained to detect a change in the position using a training dataset comprising a plurality of video streams including annotated positions of the one or more participants in image frames of the video streams and a plurality of video streams including an annotated change in positions of the one or more participants in the one or more image frames of the video streams. In an embodiment, the processing further comprises detecting lighting conditions in the received live video stream and adjusting the detected lighting conditions in the received live video stream based on a light model. In an embodiment, the lighting conditions depend at least on one or more parameters selected from: time of day, type of a stadium comprising the field, weather conditions, angle of the image capturing unit, one or more shadows, and reflections from the field. In an embodiment, the light model is a machine learning model trained on a diverse data set of images including at least a range of lighting intensities, temperatures, types of light sources, time of a day, one or more angles of the image capturing unit, and one or more weather conditions. In an embodiment, the processing further comprises detecting one or more parameters specific to the event.
In an embodiment, detecting comprises detecting one or more parameters related to a stadium at which the event takes place, the one or more parameters at least include playing field dimension, field markings, boundary line, pitch dimension, playing surface, and crowd in the stadium. In an embodiment, the detected crowd in the live video stream received from the image capturing unit is masked. In an embodiment, the spatial information comprises a set of position coordinates corresponding to the one or more participants.
[0062] Although implementations for methods and systems for generating visual representation of a participant on a playing field during an ongoing sport have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for generating visual representation of each participant present on the playing field.
[0063] Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include those provided by the following features.
[0064] Some embodiments of the system and the method highlight important players, their motions, and their actions in real-time, giving spectators a more thorough understanding of the game or event.
[0065] Some embodiments of the system and the method improve player identification: in crowded or action-packed scenes, graphic overlays make it easier for viewers to recognize and follow particular players, making for a more captivating viewing experience.
[0066] Some embodiments of the system and the method enhance user experience by highlighting key moments. For example, overlays are a useful tool for graphically emphasizing plays, goals, or moments that are significant during a broadcast.
[0067] Some embodiments of the system and the method provide integration of statistics with overlaying graphics to give viewers the most recent player stats, scores, and other pertinent data.
[0068] Some embodiments of the system and the method improve the interaction and participatory nature of the viewing experience by the use of graphics. This is especially beneficial for fan interaction during live events.
[0069] The method and system disclosed in the present application have various applications, for example: Sports Broadcasting: during live sports broadcasts, real-time statistics are provided, player tracking visuals are overlaid, and noteworthy plays are highlighted. Instant Replays: enhancing instant replays with overlays to analyze specific actions or moments in detail. Tactical Analysis: during practice, coaches employ overlays to examine player movements, team tactics, and tactical plays. Skill Development: providing visual feedback to athletes by overlaying graphics on their training sessions to improve technique and performance, and the like.
Claims:I/We Claim:
1. A method for generating a visual representation of one or more participants in an event occurring on a field, the method comprising:
obtaining a live video stream of the event from an image capturing unit, wherein the image capturing unit is mounted at a predetermined height to capture an entire view of the field;
processing the obtained live video stream;
identifying respective positions of the one or more participants on the field in the processed video stream by deriving spatial information associated with the one or more participants;
receiving an input, wherein the input comprises a selection corresponding to the one or more participants; and
generating, based on the identified respective positions in the processed video stream, the visual representation of the one or more selected participants on a user device.
2. The method of claim 1 further comprises:
receiving a selection corresponding to one or more parameters associated with the selected one or more participants; and
generating the visual representation of the one or more selected participants and information associated with the one or more selected parameters corresponding to the one or more selected participants.
3. The method of claim 1, wherein generating the visual representation of the one or more selected participants comprises:
tracking, in real time, a change in the identified positions of the one or more selected participants using a tracking model; and
generating the visual representation of the one or more selected participants based on the change in the identified positions.
4. The method of claim 1, wherein generating the visual representation of the one or more selected participants comprises:
tracking at least one of: one or more movements and one or more actions of the selected one or more participants using a machine learning model; and
generating the visual representation of at least one of the movements and one or more actions of the one or more selected participants.
5. The method of claim 1, wherein generating the visual representation includes overlaying graphics onto a portion of a broadcasted video stream on the user device, wherein the graphics display at least one of: the one or more selected participants, change in the identified positions of the one or more selected participants, at least one of the movements and one or more actions of the one or more selected participants, and the information associated with the one or more selected participants.
6. The method of claim 1 further comprises:
identifying an order of the selected one or more participants.
7. The method of claim 2, wherein the one or more parameters associated with the one or more selected participants include at least one of: distance between the selected participants, distance of each of the selected participants from another one or more participants on the field, and distance of each of the selected participants from a point on the field.
8. The method of claim 2, wherein the one or more parameters are determined based on a transformation model.
9. The method of claim 2 further comprises:
generating, based on the one or more parameters, a label for a position of the one or more selected participants.
10. The method of claim 3, wherein the tracking model is a machine learning model trained to detect change in the position using a training dataset comprising a plurality of video streams including annotated positions of the one or more participants in image frames of the video streams and a plurality of video streams including an annotated change in positions of the one or more participants in the one or more image frames of the video streams.
11. The method of claim 1, wherein the processing further comprises:
detecting lighting conditions in the received live video stream; and
adjusting the detected lighting conditions in the received live video stream based on a light model.
12. The method of claim 11, wherein the lighting conditions depend at least on one or more parameters selected from: time of day, type of a stadium comprising the field, weather conditions, angle of the image capturing unit, one or more shadows, and reflections from the field.
13. The method of claim 11, wherein the light model is a machine learning model trained on a diverse data set of images including at least a range of lighting intensities, temperatures, types of light sources, time of a day, one or more angles of the image capturing unit, and one or more weather conditions.
14. The method of claim 1, wherein the processing further comprises detecting one or more parameters specific to the event.
15. The method of claim 14, wherein detecting comprises detecting one or more parameters related to a stadium at which the event takes place, and wherein the one or more parameters at least include playing field dimension, field markings, boundary line, pitch dimension, playing surface, and crowd in the stadium.
16. The method of claim 15 further comprises:
masking the detected crowd in the live video stream received from the image capturing unit.
17. The method of claim 1, wherein the spatial information comprises a set of position coordinates corresponding to the one or more participants.
18. A system for generating visual representation of one or more participants on a field, the system comprising:
a memory; and
a processor coupled to the memory, wherein the processor is configured to execute one or more instructions stored in the memory to:
receive a live video stream from an image capturing unit, wherein the image capturing unit is mounted at a predetermined height to capture an entire view of the field;
process the received live video stream;
identify respective positions of the one or more participants on the field in the processed video stream by deriving spatial information associated with the one or more participants, wherein the spatial information comprises a set of position coordinates corresponding to the one or more participants;
receive an input, wherein the input comprises a selection corresponding to the one or more participants; and
generate, based on the input, the visual representation of the one or more selected participants.