
Automated Dataset Generation For A Sports Activity Monitoring System

Abstract: The present disclosure relates to automated dataset generation for a sports activity monitoring system. The method includes obtaining raw video data associated with the sports activity, wherein the raw video data comprises one or more video frames comprising at least one player and at least one entity associated with the sports activity, detecting the at least one player performing the sports activity in each of the one or more frames of the raw video data, creating a bounding box around the detected player in each of the one or more frames of the raw video data, obtaining a start frame and an end frame based on the presence of the detected player and the bounding box in each of the one or more frames of the raw video data, and generating the dataset based on the obtained start and end frames.


Patent Information

Application #
Filing Date
27 May 2022
Publication Number
48/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

JIO PLATFORMS LIMITED
Office-101, Saffron, Nr. Centre Point, Panchwati 5 Rasta, Ambawadi, Ahmedabad - 380006, Gujarat, India.

Inventors

1. PAILLA, Balakrishna
Type 6 QTRS, Audit Bhavan, Green Valley, Alto Porvarim, Goa - 403521, India.
2. PANDEY, Naveen Kumar
74, Village - Pure Gangaram, Post- Pure Gajai, Kunda, Pratapgarh - 230128, Uttar Pradesh, India.
3. MISHRA, Pulkit
117/531, H-1, Pandu Nagar, Kanpur - 208005, Uttar Pradesh, India.

Specification

RESERVATION OF RIGHTS
[0001] A portion of the disclosure of this patent document contains material which is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, Integrated Circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as the owner). The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.

TECHNICAL FIELD
[0002] The present disclosure relates to a sports activity monitoring system, and more particularly to a system and method for automating a process of generating data related to a performance of a player while playing a sport, wherein the generated data may be analyzed to provide feedback and training to the player on their performance.

BACKGROUND
[0003] The following description of related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is to be used only to enhance the understanding of the reader with respect to the present disclosure, and not as an admission of prior art.
[0004] Action recognition is a major requirement of a sports activity monitoring system. Action recognition helps in generating important insights and analysis relating to a player's performance of a particular sports activity, which may help coaching staff provide better assistance to players towards improving their performance. Action recognition requires understanding and analyzing video inputs by employing computer vision and data-centric artificial intelligence (DCAI) systems.
[0005] For example, in a sport like cricket, a player plays many matches in a year and each match includes many balls, wherein each ball is approached with a particular type of shot by a batsman, resulting in an enormous amount of data relating to various shots. Presently available action recognition systems for cricket shot identification use deep learning and rely heavily on the availability of clean labelled datasets. Creating labelled datasets for action recognition involves manual intervention: domain experts must spatio-temporally localize each cricket shot, which makes the task of dataset generation extremely complex and laborious and causes domain experts to spend a considerable amount of time on the same.
[0006] There is, therefore, a need in the art to automate the process of dataset generation using the concept of Data-Centric Artificial Intelligence (DCAI) to curtail costs while saving the time of domain experts.

OBJECTS OF THE PRESENT DISCLOSURE
[0007] Some of the objects of the present disclosure, which at least one embodiment herein satisfies are as listed herein below.
[0008] It is an object of the present disclosure to provide an automated process for dataset generation associated with a sports performance of a particular player.
[0009] It is an object of the present disclosure to determine a type of shot being played from a video input using a cost-effective computer vision-assisted workflow.
[0010] It is an object of the present disclosure to provide an infinitely scalable dataset generation process with minimal supervision.
[0011] It is an object of the present disclosure to facilitate building an action recognition system for enabling coaching staff or sports instructor to help players improve their skills and techniques.
[0012] It is an object of the present disclosure to provide the dataset generation system to be an easily operable application.
SUMMARY
[0013] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
[0014] In an aspect, the present disclosure relates to a system for generating a dataset for a sports activity. The system includes one or more processors and a memory operatively coupled to the one or more processors, where the memory includes processor-executable instructions, which on execution, cause the one or more processors to obtain, from a video capturing device, raw video data associated with the sports activity, where the raw video data includes one or more video frames including at least one player and at least one entity associated with the sports activity, detect the at least one player performing the sports activity in each of the one or more video frames of the raw video data, create a bounding box around the detected at least one player in each of the one or more video frames of the raw video data, obtain a first start frame and a first end frame based on a presence of the detected at least one player and the bounding box in each of the one or more video frames of the raw video data, and generate a first dataset based on the obtained first start frame and the first end frame, where the first dataset includes the first start frame, the first end frame, and a set of frames in between the first start frame and the first end frame.
[0015] In an embodiment, the processor is configured to determine an area associated with each bounding box in each of the one or more video frames of the first dataset and calculate a median associated with the area of each bounding box in each of the one or more video frames of the first dataset.
[0016] In an embodiment, the processor is configured to determine that the area of the bounding box in the one or more video frames is greater than the median to obtain a second start frame, and determine that the area of the bounding box in the one or more video frames is less than the median to obtain a second end frame, to generate a second dataset including the second start frame, the second end frame, and one or more frames in between the second start frame and the second end frame.
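By way of a non-limiting illustration, the median-based selection of this embodiment can be sketched as follows. The list of per-frame bounding-box areas and the simple first-frame-above/first-frame-below criteria are assumptions made for clarity; the detailed description further qualifies the selection with runs of consecutive frames.

```python
from statistics import median

def refine_dataset(box_areas):
    """Return (second_start, second_end) frame indices for the second
    dataset, using the median bounding-box area as a threshold, or
    None when no qualifying frames exist."""
    threshold = median(box_areas)
    # Second start frame: first frame whose box area exceeds the median.
    second_start = next(
        (i for i, a in enumerate(box_areas) if a > threshold), None)
    if second_start is None:
        return None
    # Second end frame: first later frame whose box area falls below it.
    second_end = next(
        (i for i in range(second_start + 1, len(box_areas))
         if box_areas[i] < threshold), None)
    if second_end is None:
        return None
    return (second_start, second_end)
```

For instance, with areas `[5, 5, 50, 60, 55, 5]` the median is 27.5, so the second start frame is index 2 and the second end frame is index 5.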
[0017] In an embodiment, the second dataset identifies a type of the sports activity performed by the at least one player and provides spatio-temporally localized data associated with the raw video data.
[0018] In an embodiment, the sports activity includes a cricket game, where the at least one player performing the sports activity includes at least one of a batsman, a bowler, a wicketkeeper, a fielder, and a runner.
[0019] In an embodiment, the type of sports activity includes a batting shot performed by the batsman, and the at least one entity includes at least one of a wide marker, wickets, and a crease.
[0020] In an embodiment, the processor is configured to process the raw video data via an artificial intelligence (AI) engine using at least one of a computer vision technique and a data centric artificial intelligence (DCAI) technique.
[0021] In another aspect, the present disclosure relates to a method of generating a dataset for a sports activity. The method includes obtaining, by one or more processors associated with a system, raw video data associated with the sports activity captured by a video capturing device, where the raw video data includes one or more video frames including at least one player and at least one entity associated with the sports activity. Further, the method includes detecting, by the one or more processors, the at least one player performing the sports activity in each of the one or more video frames of the raw video data, creating, by the one or more processors, a bounding box around the detected at least one player in each of the one or more video frames of the raw video data, obtaining, by the one or more processors, a first start frame and a first end frame based on a presence of the detected at least one player and the bounding box in each of the one or more video frames of the raw video data, and generating, by the one or more processors, a first dataset based on the obtained first start frame and the first end frame.
[0022] In an embodiment, the method includes determining, by the one or more processors, an area associated with each bounding box in each of the one or more video frames of the raw video data, calculating, by the one or more processors, a median associated with the area of each bounding box in each of the one or more video frames of the raw video data, determining, by the one or more processors, that the area of the bounding box in the one or more video frames is greater than the median to obtain a second start frame, determining, by the one or more processors, that the area of the bounding box in the one or more video frames is less than the median to obtain a second end frame, and generating, by the one or more processors, a second dataset including the second start frame, the second end frame, and one or more frames in between the second start frame and the second end frame. In an embodiment, the second dataset identifies a type of the sports activity performed by the at least one player.
[0023] In another aspect, the present disclosure relates to a user equipment (UE). The UE includes one or more processors communicatively coupled to a system, where the one or more processors are configured to capture a video associated with a sports activity, and transmit the captured video through a network to the system. The system includes a processor configured to obtain the video associated with the sports activity from the UE, where the video includes one or more video frames including at least one player and at least one entity associated with the sports activity. The processor is configured to detect the at least one player performing the sports activity in each of the one or more video frames of the video, create a bounding box around the detected at least one player in each of the one or more video frames of the video, obtain a first start frame and a first end frame based on a presence of the detected at least one player and the bounding box in each of the one or more video frames of the video, and generate a dataset based on the obtained first start frame and the first end frame.

BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes the disclosure of electrical components, electronic components or circuitry commonly used to implement such components.
[0025] FIG. 1 illustrates an exemplary network architecture (100) in which or with which a proposed system may be implemented, in accordance with an embodiment of the present disclosure.
[0026] FIG. 2 illustrates an exemplary block diagram (200) of the proposed system for automated dataset generation, in accordance with an embodiment of the present disclosure.
[0027] FIG. 3 illustrates an exemplary representation (300) of an object detection model for a sports activity, in accordance with an embodiment of the present disclosure.
[0028] FIG. 4 illustrates an exemplary flow chart of a method (400) for automatically generating a dataset, in accordance with an embodiment of the present disclosure.
[0029] FIG. 5 illustrates an exemplary flow chart of a method (500) for finding an initial start frame and an initial end frame for generating a dataset, in accordance with an embodiment of the present disclosure.
[0030] FIG. 6 illustrates an exemplary flow chart of a method (600) for finding a new start frame and a new end frame for generating a dataset, in accordance with an embodiment of the present disclosure.
[0031] FIG. 7 illustrates an exemplary frame diagram (700) with initial start and end frames, and new start and end frames, in accordance with an embodiment of the present disclosure.
[0032] FIG. 8 illustrates an exemplary computer system (800) in which or with which embodiments of the present disclosure may be implemented.
[0033] The foregoing shall be more apparent from the following more detailed description of the disclosure.


DETAILED DESCRIPTION
[0034] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0035] The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
[0036] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0037] Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0038] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive—in a manner similar to the term “comprising” as an open transition word—without precluding any additional or other elements.
[0039] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0040] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0041] The present disclosure provides a robust and effective solution for automated dataset generation for a sports activity monitoring system. Any sports activity may include different types of moves from different players while playing the sport. Analyzing such moves may enable a coach or trainer to assist a player in improving their performance. For example, in a game like cricket, thousands of cricket matches are played and hundreds of thousands of balls are bowled every year, producing enormous amounts of video data. A key aspect of the video data is a record that may be maintained of what shot was played by a batsman on each of these balls. For training a batsman, such shots may need to be identified and labelled. However, identification and labelling of shots played by a player may not be an easy task. There are over 50 different types of shots, many of which look very similar to each other. The shots may need to be spatio-temporally localized. The huge amount of data coupled with the complexity of shot identification makes the task of shot identification extremely human-resource heavy, requiring a domain expert to spend a considerable amount of time. Action recognition systems using deep learning models enable automating the complex task of shot identification, thereby minimizing the time spent by domain experts.
[0042] The present disclosure relates to a technique for employing a deep learning model, trained to detect different types of players and different entities involved in a sport, to spatio-temporally localize where in the video a particular shot is being played and the area in a video frame in which the shot is being played. Thus, from the raw data, a dataset consisting of spatio-temporally localized shots is generated, which may be utilized for training a deep learning model for shot detection. For example, in a game of cricket, the players include a batsman, a bowler, a runner, a wicketkeeper, etc., and the entities present on the pitch include a wide-marker, wickets, etc. The disclosed system and method facilitate automating the process of dataset generation for cricket shot identification by annotating a set of frames in which a particular shot is being played and also the area in each frame where the shot is being played. The disclosed system and method employ a computer vision-assisted workflow enabling the task of dataset generation to scale infinitely and run with minimal supervision, aiding in building action recognition systems which enable the coaching staff of a sports team to help players improve their skills and techniques.
[0043] In accordance with an embodiment, the disclosed system obtains raw video data, for example, a video captured while a particular sports activity is happening, and processes the video data to obtain a set of video frames comprising only a particular player performing a particular type of sports activity, resulting in the creation of a dataset. A bounding box is created around the player performing the particular type of sports activity, and an area associated with each bounding box in the set of video frames is calculated to find a median. Upon determining the median, the area of each bounding box in each frame is compared with the median. The frame having a bounding box area greater than the median is taken as a start frame, and the frame having a bounding box area less than the median is taken as an end frame. The start frame, the end frame, and the frames between them, in which the player is focussed on playing the sport, are taken as the dataset for further processing. The entire process of generating the dataset is automated, thereby reducing the time spent by domain experts on the same. Other like benefits and advantages are provided by the disclosed solution, which will be discussed in detail throughout the disclosure.
[0044] Certain terms and phrases have been used throughout the disclosure and will have the following meanings in the context of the ongoing disclosure.
[0045] The term “dataset” may refer to a set of video frames comprising a player playing a particular sport.
[0046] The term “frame” may refer to a video frame in a video sequence.
[0047] The term “raw video data” may refer to video of a particular sports event captured by a video capturing device.
[0048] The term “bounding box” may refer to imaginary boxes used for annotating an object or a person in a video or image frame.
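For illustration only, such a bounding box is commonly represented by its corner coordinates, from which the box area (used elsewhere in this disclosure as a measure of how closely the camera is zoomed on the player) follows directly. The tuple layout below is an assumption, not a requirement of the disclosure:

```python
def box_area(box):
    """Area in pixels of an axis-aligned bounding box given as
    (x_min, y_min, x_max, y_max) corner coordinates."""
    x_min, y_min, x_max, y_max = box
    # Clamp to zero so degenerate (inverted or empty) boxes report no area.
    return max(0, x_max - x_min) * max(0, y_max - y_min)
```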
[0049] The various embodiments throughout the disclosure will be explained in more detail with reference to FIGs. 1-8.
[0050] FIG. 1 illustrates an exemplary network architecture (100) in which or with which a proposed system may be implemented, in accordance with an embodiment of the present disclosure.
[0051] Referring to FIG. 1, the network architecture (100) may include a sports activity (102) being captured by a video capturing device (112), wherein the captured video is transmitted for storage in a database (116) via a network (114). The stored video is processed by a system (120) to generate a dataset. The sports activity (102) may include players and a set of entities associated with the sport. In an example embodiment, the sports activity (102) may include a cricket game. The cricket game may include one or more players such as, without limitations, a batsman, a bowler, a fielder, a wicketkeeper, etc.
[0052] Referring to FIG. 1, the sports activity (102) includes a batsman (104) hitting a ball (108) bowled by a bowler (110), and includes the entity, for example, stumps (106). The bowler (110) bowling the ball (108) which is being hit by the batsman (104) is captured by the video capturing device (112). The video capturing device (112) may include, for example, without limitations, a camera, a handy cam, a camcorder, or any device capable of capturing a video. In some embodiments, the video capturing device (112) may include a user equipment (UE), a handheld wireless communication device (e.g., a mobile phone, a smart phone, a phablet device, and so on), a wearable computer device (e.g., a head-mounted display computer device, a head-mounted camera device, a wristwatch computer device, and so on), a laptop computer, a tablet computer, or another type of portable computer, a media capturing/playing device, and/or any other type of image/video capturing device with wireless communication capabilities, and the like.
[0053] A person of ordinary skill in the art will appreciate that the video capturing device (112) or UEs may not be restricted to the mentioned devices and various other devices may be used.
[0054] Referring to FIG. 1, the captured video(s) by the video capturing device (112) may be sent for storage in the database (116) via the network (114). The network (114) may include a wireless card or some other transceiver connection to facilitate this communication. In an exemplary embodiment, the network (114) may incorporate one or more of a plurality of standard or proprietary protocols including, but not limited to, Wi-Fi, ZigBee, or the like. In another embodiment, the network (114) may be implemented as, or include, any of a variety of different communication technologies such as a wide area network (WAN), a local area network (LAN), a wireless network, a mobile network, a Virtual Private Network (VPN), the Internet, the Public Switched Telephone Network (PSTN), or the like.
[0055] Referring to FIG. 1, the captured video stored in the database (116) may be used by the system (120) for further processing. In an embodiment, the database (116) may be present within the system (120) or in communication with the system (120). The video stored in the database (116) is processed to obtain the required details. The video captured by the video capturing device (112) includes not only the player (e.g., 104) playing a particular shot but also other details such as the surrounding area where the sport is played, an audience watching the sport, other players, referees/umpires, etc.
[0056] Referring to FIG. 1, the system (120) may include an artificial intelligence (AI) engine (118) in which or with which the embodiments of the present disclosure may be implemented. In particular, the system (120), and as such the AI engine (118), facilitates processing of the raw video data captured by the video capturing device (112) to obtain frames containing data relevant to a particular player (104) playing a particular shot. The AI engine (118) may use computer vision or data-centric artificial intelligence (DCAI) techniques to process the raw video data. In other words, a spatio-temporal localization is achieved automatically to obtain frames of interest from the raw video data. The AI engine (118) obtains the raw video data from the database (116) and checks each frame in the video data to see if a player (104) and at least one entity (e.g., 106) associated with the sports activity (102) are present. Upon detecting the player (104) and at least one entity (106), the AI engine (118) checks whether the player (104) is present in the next four consecutive frames. If the player (104) is present in the next four consecutive frames, then the first frame with the player (104) is marked as a start frame and the subsequent frames are checked until an absence of the player (104) is detected. If the AI engine (118) detects the absence of the player (104), then a check for absence is performed on the next two consecutive frames. If these two frames also mark the absence of the player (104), then the third consecutive frame with the player (104) absent is marked as an end frame. The start and end frames, along with the frames in between, are taken as a first dataset, and a further limitation is applied on the first dataset to generate a second dataset. To apply this further limitation, the AI engine (118) creates bounding boxes around the player (104) in all the frames and calculates an area associated with each bounding box. Further, the AI engine (118) derives a median area, which is then used as a threshold value. When the player (104) plays a particular shot, the video capturing device (112) zooms in on the player (104), so the frames corresponding to such shots may have a bigger bounding box area than other frames; when the player (104) is done playing the shot, the focus of the video capturing device (112) may shift to other entities (106) of the sports activity (102), and in such frames the area of the bounding boxes is smaller. The AI engine (118) compares the area of the bounding boxes in the first dataset with the median (i.e., the threshold). At least five frames having a bounding box area greater than the threshold are considered, and the first of these five frames is marked as a new start frame. Similarly, at least three frames having a bounding box area smaller than the threshold are considered, and the last of these three frames is marked as a new end frame. The new start frame, the new end frame, and the frames in between form the second dataset. The second dataset is more tightly limited to the player (104) playing the shot and is used in analyzing the player's performance. The second dataset may then be stored in the database (116) for further processing.
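One possible reading of the two-pass procedure of paragraph [0056] is sketched below. Per-frame detection results are modeled as a boolean presence sequence and a list of bounding-box areas; the run lengths (four frames of presence, three of absence, five frames above the median, three below) follow the description above, while the exact scanning order is an assumption made for clarity:

```python
from statistics import median

def first_pass(present, run_in=4, run_out=3):
    """First dataset: start = first frame of a run of `run_in`
    consecutive frames in which the player is detected; end = last
    frame of the first subsequent run of `run_out` consecutive
    frames in which the player is absent."""
    start = end = None
    for i in range(len(present) - run_in + 1):
        if all(present[i:i + run_in]):
            start = i
            break
    if start is None:
        return None
    for i in range(start + run_in, len(present) - run_out + 1):
        if not any(present[i:i + run_out]):
            end = i + run_out - 1
            break
    if end is None:
        end = len(present) - 1  # player never disappears; keep the tail
    return (start, end)

def second_pass(areas, run_in=5, run_out=3):
    """Second dataset: new start = first frame of a run of `run_in`
    frames whose bounding-box area exceeds the median area; new end =
    last frame of the first subsequent run of `run_out` frames whose
    area is below the median."""
    threshold = median(areas)
    new_start = new_end = None
    for i in range(len(areas) - run_in + 1):
        if all(a > threshold for a in areas[i:i + run_in]):
            new_start = i
            break
    if new_start is None:
        return None
    for i in range(new_start + run_in, len(areas) - run_out + 1):
        if all(a < threshold for a in areas[i:i + run_out]):
            new_end = i + run_out - 1
            break
    if new_end is None:
        new_end = len(areas) - 1
    return (new_start, new_end)
```

Applied in sequence, `first_pass` trims the raw footage to the span in which the player appears, and `second_pass`, run over the bounding-box areas of that span, narrows it further to the frames in which the camera is zoomed on the shot.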
[0057] Therefore, it may be appreciated that the present disclosure provides automated dataset generation, reducing the human resources generally required in labelling such shots.
[0058] Although FIG. 1 shows exemplary components of the network architecture (100), in other embodiments, the network architecture (100) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1. Additionally, or alternatively, one or more components of the network architecture (100) may perform functions described as being performed by one or more other components of the network architecture (100).
[0059] FIG. 2 illustrates an exemplary block diagram (200) of the proposed system for automated dataset generation, in accordance with an embodiment of the present disclosure.
[0060] For example, the system (120) may include one or more processor(s) (202). The one or more processor(s) (202) may be implemented as one or more microprocessors, microcomputers, microcontrollers, edge or fog microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions. Among other capabilities, the one or more processor(s) (202) may be configured to fetch and execute computer-readable instructions stored in a memory (204) of the system (120). The memory (204) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as Random-Access Memory (RAM), or non-volatile memory such as Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory, and the like.
[0061] In an embodiment, the system (120) may include an interface(s) (206). The interface(s) (206) may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as input/output (I/O) devices, storage devices, and the like. The interface(s) (206) may facilitate communication for the system (120). The interface(s) (206) may also provide a communication pathway for one or more components of the system (120). Examples of such components include, but are not limited to, processing unit/engine(s) (208) and a database (210).
[0062] The processing unit/engine(s) (208) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) (208). In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) (208) may be processor-executable instructions stored on a non-transitory machine-readable storage medium, and the hardware for the processing engine(s) (208) may comprise a processing resource (for example, one or more processors) to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) (208). In such examples, the system (120) may include the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the system (120) and the processing resource. In other examples, the processing engine(s) (208) may be implemented by electronic circuitry. In an aspect, the database (210) may comprise data that may be either stored or generated as a result of functionalities implemented by any of the components of the processor (202) or the processing engines (208). A person of ordinary skill in the art will understand that the database (210) may be similar to the database (116) of FIG. 1.
[0063] In an embodiment, the processing engine (208) may include engines that receive raw video data from the video capturing device (112) via the network (114) (e.g., via the Internet) of FIG. 1, to process the video frames, index the data, and generate a dataset for further processing. In an embodiment, the generated dataset may be stored at the database (210). In an embodiment, the processing engine (208) may include one or more modules/engines such as, but not limited to, an acquisition engine (212), an AI engine (214), and other engine(s) (216). A person of ordinary skill in the art will understand that the AI engine (214) may be similar in its functionality to the AI engine (118) of FIG. 1 and, hence, may not be described in detail again for the sake of brevity.
[0064] Referring to FIG. 2, the database (210) may store the raw video data, i.e., the video captured by the video capturing device (112) of FIG. 1. In an exemplary embodiment, the raw video data may include a video relating to a sports activity (e.g., 102).
[0065] By way of example but not limitation, the one or more processor(s) (202) may obtain raw video from the video capturing device (112) to generate a dataset of interest. Further, in an embodiment, the one or more processor(s) (202) may cause an acquisition engine (212) to extract video frames from the database (210) for further analysis by the AI engine (214).
[0066] In an embodiment, the one or more processor(s) (202) may cause the AI engine (214) to generate the dataset of interest to analyze the shots played by a particular player.
[0067] A person of ordinary skill in the art will appreciate that the exemplary block diagram (200) may be modular and flexible to accommodate any kind of changes in the system (120).
[0068] FIG. 3 illustrates an exemplary representation (300) of an object detection model for a sports activity, for example, a cricket game, in accordance with an embodiment of the present disclosure. In an example embodiment, the object detection model may include a region-based convolutional neural network (RCNN). In FIG. 3, an object detection model (302) for a game of cricket is shown. The object detection model (302) takes red, green, blue (RGB) video frames as input and, for each frame, detects different types of cricket players, including a batsman, a runner, a wicket keeper, a fielder, etc., and other entities present on the pitch, including a wide marker, wickets, a crease, etc., and creates separate classes for each player and each entity to be stored in a database (e.g., 116 or 210).
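The per-frame detection output described above may be represented with a simple data structure. The Python sketch below is illustrative only; the label set and field names are assumptions for illustration, not the model's actual classes.

```python
from dataclasses import dataclass

# Hypothetical label sets for a cricket detection model; the actual
# class names used by the model are not specified in full here.
PLAYER_CLASSES = {"batsman", "bowler", "wicketkeeper", "fielder", "runner"}
ENTITY_CLASSES = {"wickets", "crease", "wide_marker"}

@dataclass
class Detection:
    """One detected object in an RGB video frame."""
    label: str   # e.g. "batsman" or "wickets"
    x1: float    # bounding-box corners in pixel coordinates
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Area of the axis-aligned bounding box.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def is_player(self) -> bool:
        return self.label in PLAYER_CLASSES
```

A detection such as `Detection("batsman", 100, 50, 300, 450)` then carries both the class used by the heuristics and the bounding box whose area feeds the median threshold described later.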
[0069] FIG. 4 illustrates an exemplary flow chart of a method (400) for automatically generating a dataset, in accordance with an embodiment of the present disclosure.
[0070] At step 402, the method (400) may include receiving an RGB video input, wherein the RGB video input includes raw video data received from a video capturing device (112) of FIG. 1. Further, at step 404, the method (400) may include detecting a type of player and a list of entities present on the ground. For example, in a game of cricket, a batsman along with wickets may be detected. Further, at step 406, the method (400) may include creating bounding boxes around the player in each frame. It may be understood that the bounding boxes are imaginary boxes used for annotating objects or persons in an image. At step 408, the method (400) may include obtaining an initial start frame or a first start frame and an initial end frame or a first end frame for a shot played by a player. Further, at step 410, the method (400) may include creating a first dataset comprising the initial start and end frames and a set of frames in between, where the first dataset mainly represents the shot played by a player. Furthermore, at step 412, upon creating the first dataset, the method (400) may include obtaining a new start frame or a second start frame and a new end frame or a second end frame based on the first dataset and the bounding boxes in each frame of the dataset. Further, at step 414, the method (400) may include generating a second dataset comprising the new start and end frames and the set of frames in between. The second dataset includes frames more focused on the shot played by the player.
[0071] By way of example, without any limitation, in a game of cricket, an automated technique is employed for shot identification. A computer vision-based two-step algorithm aids in generating datasets. The algorithm employs a cricket player and entity detection model to obtain insights from raw ball-by-ball video data and uses the insights gained to spatio-temporally localize where in the video the shot is being played. For example, in the first stage, based on a first heuristic, the algorithm finds an initial start frame and an initial end frame of the shot, and in the second stage, based on a second heuristic, the algorithm tightens the bounds on the start and end frames of the shot obtained from the first stage, i.e., including only the frames where exactly the shot is being played.
[0072] The heuristics applied by the deep learning model for the game of cricket are based on the following insights gained from the raw video data. In a game of cricket, whenever a shot is played, the video begins with random noise such as advertisements, commentators discussing in the commentary box, the camera panning to show a view of the spectators, etc., followed by the bowler's run-up and bowling of the ball to the batsman. Once the bowler bowls the ball, the video focuses on the batsman playing the shot; the video capturing device (112) then follows the direction in which the ball is hit, and the video is again followed by random noise.
[0073] Considering the above insights, the first heuristic applied may be to focus on video frames in which a batsman is detected and present for five consecutive frames. The first frame of these five consecutive frames comprising at least one of a keeper, wickets, or a wide marker is marked as the start frame. Upon marking the start frame, each frame is checked until a point where the batsman is absent. The absence of the batsman is checked for three consecutive frames, and if the batsman is found absent in three consecutive frames, the last frame is marked as the end frame. Marking the start and end frames provides a small portion of the video data, or a first dataset, comprising the shot played by the player or the batsman.
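A minimal Python sketch of this first heuristic follows, under the assumption that each frame has already been reduced to two booleans: whether a batsman is detected, and whether a keeper, wickets, or wide marker is detected. The function name and input encoding are illustrative, not from the disclosure.

```python
def first_heuristic(batsman_present, entity_present):
    """Return (start_idx, end_idx) per the first heuristic, or None.

    batsman_present[i] : True if a batsman is detected in frame i.
    entity_present[i]  : True if a keeper, wickets, or wide marker
                         is detected in frame i.
    """
    n = len(batsman_present)
    start = None
    # Start frame: first of five consecutive batsman detections that
    # also contains a keeper, wickets, or wide marker.
    for i in range(n - 4):
        if all(batsman_present[i:i + 5]) and entity_present[i]:
            start = i
            break
    if start is None:
        return None
    # End frame: last of three consecutive batsman absences after start.
    for i in range(start + 1, n - 2):
        if not any(batsman_present[i:i + 3]):
            return (start, i + 2)
    # Batsman never absent for three frames; fall back to the last frame.
    return (start, n - 1)
```

The returned pair bounds the first dataset; everything before the start frame and after the end frame is discarded as noise.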
[0074] Further, a bounding box is created around the player in each frame between the start and the end frame. An area of the bounding box is calculated and a median of the areas is determined. The median, along with the application of the second heuristic, results in the required dataset. The second heuristic is based on the fact that, whenever the player plays the shot, the video capturing device (112) zooms in on the batsman, making the bounding box bigger and resulting in a larger calculated area. The median is used as a threshold to remove frames from the above-determined short video (the first dataset) to obtain an even more specific dataset, i.e., the second dataset. The area of the bounding box present in each frame of the first dataset is compared with the median. Frames having a bounding-box area greater than the median are considered; where at least five consecutive such frames occur, the first of them is taken as a new start frame. Similarly, to find a new end frame, frames having a bounding-box area less than the median are considered; where at least three consecutive such frames occur, the last of them is marked as the new end frame. The frames between the new start and end frames determine a precise location of the batsman playing the shot in the captured video.
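The area-and-median computation in this paragraph may be sketched as follows, with bounding boxes given as (x1, y1, x2, y2) pixel tuples. These helper functions are purely illustrative, not the disclosure's implementation.

```python
from statistics import median

def box_area(box):
    """Area of an axis-aligned bounding box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def median_area(boxes):
    """Median bounding-box area, used as the second-heuristic threshold."""
    return median(box_area(b) for b in boxes)
```

For boxes of areas 4, 16, and 36, for example, the threshold would be the middle value, 16; frames whose box area exceeds it are candidates for the refined (zoomed-in) segment.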
[0075] FIG. 5 illustrates an exemplary flow chart of a method (500) for finding an initial start frame and an initial end frame for generating a dataset, in accordance with an embodiment of the present disclosure.
[0076] At step 502, the method (500) may include receiving a raw video input. The raw video input is RGB video data including a set of frames related to a sports activity captured by a video capturing device (112) of FIG. 1. Further, at step 504, the method (500) may include checking for the presence of a player and an entity associated with the sport; for example, in a game of cricket, the player may be a batsman, a bowler, a fielder, a wicketkeeper, or a runner, and the entity may include wickets, a crease, wide markers, etc. If the presence of the player is detected in a particular frame, the method (500), at step 506, may include checking the next four consecutive frames. Further, at step 508, the method (500) may include determining if the player is present in all four frames. If the player is present in all four frames, the method (500), at step 510, may include saving the first of the five frames as an initial start frame. On the other hand, if the player is not present in the four consecutive frames, the method (500) may include checking the next frame of the raw video data.
[0077] Further, upon saving the initial start frame, the method (500), at step 512, may include checking more video frames. At step 514, the method (500) may include determining whether the player is absent in the frame. If the player is absent, the method (500), at step 516, may include checking the next two consecutive frames. Further, at step 518, the method (500) may include determining if the player is absent in the next two consecutive frames. If the player is absent in the next two consecutive frames, the method (500), at step 520, may include saving the last of the three frames as an initial end frame. On the other hand, if the player is still present in the frames, or the player is not absent in the consecutive frames, the method (500), at step 512, may include continuing to check the frames for the absence. Further, upon saving the initial start and end frames, the method (500), at step 522, may include calculating a median of the areas of the bounding boxes around the player in each of the start frame, the end frame, and the frames in between. The initial start frame, the initial end frame, and the frames in between form a first dataset. In an embodiment, the median is used as a threshold to determine a new start frame, a new end frame, and, as such, a second dataset, as will be explained below with reference to FIG. 6.
[0078] FIG. 6 illustrates an exemplary flow chart of a method (600) for finding a new start frame and a new end frame for generating a dataset, in accordance with an embodiment of the present disclosure.
[0079] At step 602, the method (600) may include considering the first dataset or first set of frames. Further, at step 604, the method (600) may include calculating an area of the bounding box for each frame in the first dataset. At step 606, the method (600) may include determining if the calculated area of the bounding box is greater than the median value determined above with reference to FIG. 5. If the calculated area is greater than the median value, the method (600), at step 610, may include marking the corresponding frame as a new start frame. On the other hand, at step 608, the method (600) may include proceeding with calculating the area of the bounding box for the next frame if the area of the bounding box is not greater than the median value. Further, upon marking the new start frame, the method (600), at step 612, may include moving through five consecutive frames. At step 614, the method (600) may include determining if the calculated area is less than the median value. Further, at step 618, the method (600) may include moving through three consecutive frames. At step 620, the method (600) may include marking the last of the three frames as a new end frame. On the other hand, at step 616, the method (600) may include proceeding to check the next frame if the calculated area is not less than the median value.
[0080] FIG. 7 illustrates an exemplary frame diagram (700) with initial start and end frames and new start and end frames, in accordance with an embodiment of the present disclosure. FIG. 7 is better understood with reference to FIGs. 5 and 6 as discussed above.
[0081] In FIG. 7, an initial start frame or a first start frame (702), an initial end frame or a first end frame (704), a new start frame or a second start frame (706), and a new end frame or a second end frame (708) are shown. Referring to the discussion above with respect to FIGs. 5 and 6, the frames between the initial start frame (702) and the initial end frame (704) provide a first dataset comprising the frames where the player is playing the shot. Reducing the frames with the new start frame (706) and the new end frame (708) provides more spatio-temporally localized frames comprising exactly the portion where the player is playing the shot. The new start frame (706) and the new end frame (708), along with the frames in between, provide the second dataset generated for shot identification.
[0082] FIG. 8 illustrates an exemplary computer system (800) in which or with which embodiments of the present disclosure may be utilized.
[0083] As shown in FIG. 8, the computer system (800) may include an external storage device (810), a bus (820), a main memory (830), a read-only memory (840), a mass storage device (850), communication port(s) (860), and a processor (870). A person skilled in the art will appreciate that the computer system (800) may include more than one processor and communication ports. The processor (870) may include various modules associated with embodiments of the present disclosure. The communication port(s) (860) may be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports. The communication port(s) (860) may be chosen depending on a network, such as a Local Area Network (LAN), Wide Area Network (WAN), or any network to which the computer system (800) connects. The main memory (830) may be random access memory (RAM), or any other dynamic storage device commonly known in the art. The read-only memory (840) may be any static storage device(s) including, but not limited to, Programmable Read Only Memory (PROM) chips for storing static information, e.g., start-up or basic input/output system (BIOS) instructions for the processor (870). The mass storage device (850) may be any current or future mass storage solution, which may be used to store information and/or instructions.
[0084] The bus (820) communicatively couples the processor (870) with the other memory, storage, and communication blocks. The bus (820) can be, e.g., a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), universal serial bus (USB), or the like, for connecting expansion cards, drives, and other subsystems, as well as other buses, such as a front side bus (FSB), which connects the processor (870) to the computer system (800).
[0085] Optionally, operator and administrative interfaces, e.g., a display, a keyboard, and a cursor control device, may also be coupled to the bus (820) to support direct operator interaction with the computer system (800). Other operator and administrative interfaces may be provided through network connections connected through the communication port(s) (860). In no way should the aforementioned exemplary computer system (800) limit the scope of the present disclosure.
[0086] While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.

ADVANTAGES OF THE PRESENT DISCLOSURE
[0087] The present disclosure provides an automated process of dataset generation for cricket shot identification.
[0088] The present disclosure provides determination of a shot being played using a cost-effective computer vision-assisted workflow.
[0089] The present disclosure enables the task of dataset generation to scale infinitely and run with minimal supervision.
[0090] The present disclosure enables building action recognition systems that help the coaching staff of cricket teams improve players' skills and techniques.
[0091] The present disclosure provides an extremely low-cost solution when compared to hiring domain experts to manually annotate the dataset.
[0092] The present disclosure provides a solution that is as easy to operationalize as any other software or application.
[0093] The present disclosure provides a solution that is infinitely scalable as the process is completely automated and eliminates manual labour.
CLAIMS:
1. A system (120) for generating a dataset for a sports activity (102), said system (120) comprising:
one or more processors (202); and
a memory (204) operatively coupled to the one or more processors (202), wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to:
obtain, from a video capturing device (112), raw video data associated with the sports activity (102), wherein the raw video data comprises one or more video frames comprising at least one player (104) and at least one entity (106) associated with the sports activity (102);
detect the at least one player (104) performing the sports activity in each of the one or more video frames of the raw video data;
create a bounding box around the detected at least one player (104) in each of the one or more video frames of the raw video data;
obtain a first start frame (702) and a first end frame (704) based on a presence of the detected at least one player (104) and the bounding box in each of the one or more video frames of the raw video data; and
generate a first dataset based on the obtained first start frame (702) and the first end frame (704).

2. The system (120) as claimed in claim 1, wherein the first dataset comprises the first start frame (702), the first end frame (704), and a set of frames in between the first start frame (702) and the first end frame (704).

3. The system (120) as claimed in claim 2, wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to determine an area associated with each bounding box in each of the one or more video frames of the first dataset.

4. The system (120) as claimed in claim 3, wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to calculate a median associated with the area of each bounding box in each of the one or more video frames of the first dataset.

5. The system (120) as claimed in claim 4, wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to determine that the area of the bounding box in the one or more video frames is greater than the median to obtain a second start frame (706).

6. The system (120) as claimed in claim 5, wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to determine that the area of the bounding box in the one or more video frames is less than the median to obtain a second end frame (708).

7. The system (120) as claimed in claim 6, wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to generate a second dataset comprising the second start frame (706), the second end frame (708), and one or more frames in between the second start frame (706) and the second end frame (708), and wherein the second dataset identifies a type of the sports activity performed by the at least one player (104).

8. The system (120) as claimed in claim 7, wherein the second dataset provides a spatio-temporally localized data associated with the raw video data.

9. The system (120) as claimed in claim 7, wherein the sports activity (102) comprises a cricket game.

10. The system (120) as claimed in claim 9, wherein the at least one player (104) performing the sports activity (102) comprises at least one of: a batsman, a bowler, a wicketkeeper, a fielder, and a runner.

11. The system (120) as claimed in claim 10, wherein the type of sports activity (102) comprises a batting shot performed by the batsman.

12. The system (120) as claimed in claim 9, wherein the at least one entity (106) comprises at least one of: a wide marker, wickets, and a crease.

13. The system (120) as claimed in claim 1, wherein the one or more processors (202) are configured to process the raw video data via an artificial intelligence (AI) engine (118) using at least one of: a computer vision technique and a data centric artificial intelligence (DCAI) technique.

14. A method (400) for generating a dataset for a sports activity (102), the method (400) comprising:
obtaining, by one or more processors (202) associated with a system (120), raw video data associated with the sports activity (102) captured by a video capturing device (112), wherein the raw video data comprises one or more video frames comprising at least one player (104) and at least one entity (106) associated with the sports activity (102);
detecting, by the one or more processors (202), the at least one player (104) performing the sports activity (102) in each of the one or more video frames of the raw video data;
creating, by the one or more processors (202), a bounding box around the detected at least one player (104) in each of the one or more video frames of the raw video data;
obtaining, by the one or more processors (202), a first start frame (702) and a first end frame (704) based on a presence of the detected at least one player (104) and the bounding box in each of the one or more video frames of the raw video data; and
generating, by the one or more processors (202), a first dataset based on the obtained first start frame (702) and the first end frame (704).

15. The method (400) as claimed in claim 14, comprising:
determining, by the one or more processors (202), an area associated with each bounding box in each of the one or more video frames of the raw video data;
calculating, by the one or more processors (202), a median associated with the area of each bounding box in each of the one or more video frames of the raw video data;
determining, by the one or more processors (202), that the area of the bounding box in the one or more video frames is greater than the median to obtain a second start frame (706);
determining, by the one or more processors (202), that the area of the bounding box in the one or more video frames is less than the median to obtain a second end frame (708); and
generating, by the one or more processors (202), a second dataset comprising the second start frame (706), the second end frame (708), and one or more frames in between the second start frame (706) and the second end frame (708), wherein the second dataset identifies a type of the sports activity (102) performed by the at least one player (104).

16. A user equipment (UE), comprising:
one or more processors communicatively coupled to a system (120), wherein the one or more processors are configured to:
capture a video associated with a sports activity (102); and
transmit the captured video through a network (114) to the system (120), wherein the system (120) comprises a processor (202) configured to:
obtain the video associated with the sports activity (102) from the UE, wherein the video comprises one or more video frames comprising at least one player (104) and at least one entity (106) associated with the sports activity (102);
detect the at least one player (104) performing the sports activity (102) in each of the one or more video frames of the video;
create a bounding box around the detected at least one player (104) in each of the one or more video frames of the video;
obtain a first start frame (702) and a first end frame (704) based on the detected at least one player (104) and the bounding box in each of the one or more video frames of the video; and
generate a dataset based on the obtained first start frame (702) and the first end frame (704).

Documents

Application Documents

# Name Date
1 202221030596-STATEMENT OF UNDERTAKING (FORM 3) [27-05-2022(online)].pdf 2022-05-27
2 202221030596-PROVISIONAL SPECIFICATION [27-05-2022(online)].pdf 2022-05-27
3 202221030596-POWER OF AUTHORITY [27-05-2022(online)].pdf 2022-05-27
4 202221030596-FORM 1 [27-05-2022(online)].pdf 2022-05-27
5 202221030596-DRAWINGS [27-05-2022(online)].pdf 2022-05-27
6 202221030596-DECLARATION OF INVENTORSHIP (FORM 5) [27-05-2022(online)].pdf 2022-05-27
7 202221030596-ENDORSEMENT BY INVENTORS [27-05-2023(online)].pdf 2023-05-27
8 202221030596-DRAWING [27-05-2023(online)].pdf 2023-05-27
9 202221030596-CORRESPONDENCE-OTHERS [27-05-2023(online)].pdf 2023-05-27
10 202221030596-COMPLETE SPECIFICATION [27-05-2023(online)].pdf 2023-05-27
11 202221030596-FORM-8 [29-05-2023(online)].pdf 2023-05-29
12 202221030596-FORM 18 [29-05-2023(online)].pdf 2023-05-29
13 Abstract1.jpg 2023-10-28
14 202221030596-FER.pdf 2025-04-08
15 202221030596-FORM 3 [08-07-2025(online)].pdf 2025-07-08
16 202221030596-FER_SER_REPLY [07-10-2025(online)].pdf 2025-10-07

Search Strategy

1 Search_202221030596E_14-03-2024.pdf