Abstract: The present invention provides a system and method for automatic tagging of cricket metadata. The system and method use artificially intelligent machines/algorithms trained to identify different facets of cricket footage and feeds, and to catalogue the identified content. The input cricket video/stream is broken into frames. The scorecard is extracted from the frames and the scores are analyzed to provide key data such as the batsman, non-striker, bowler and runs scored, and events such as wickets and extras in the ball, replay start and end, and the logical ball boundary (start and end in time). Once the logical ball boundary is created, the frames inside the logical ball boundary are subjected to visual recognition, scene tagging, sound analysis etc. The scenes are tagged and stored to help create cricket highlights and other custom packages. [FIG.1]
FORM 2
The Patents Act 1970
(39 of 1970)
&
The Patent Rules 2003
COMPLETE SPECIFICATION
(See Section 10 and rule 13)
TITLE A SYSTEM AND METHOD FOR AUTOMATIC TAGGING OF
CRICKET METADATA
APPLICANT:
PRIME FOCUS TECHNOLOGIES LIMITED
True North, 63 Road No. 13 MIDC, Andheri (East), Mumbai 400093, Maharashtra,
India
PREAMBLE OF THE DESCRIPTION:
THE FOLLOWING SPECIFICATION PARTICULARLY DESCRIBES THE INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED
A SYSTEM AND METHOD FOR AUTOMATIC TAGGING OF CRICKET METADATA
A) TECHNICAL FIELD
[0001] The present invention is generally related to a field of artificial intelligence and machine learning techniques. The present invention is particularly related to an improved system and method for media applications using artificial intelligence and machine learning. The present invention is more particularly related to a system and method for automatic tagging of cricket metadata using machine learning and artificial intelligence.
B) BACKGROUND OF THE INVENTION
[0002] With the advent of technologies such as artificial intelligence and machine learning, the development of smart applications has steadily become a daily phenomenon. These techniques help mankind in automating processes in various fields, including media applications like computer vision, sentiment analysis, automatic cataloguing, etc. However, the application of these techniques in the field of sports is still not fully exploited, and there is much scope for further development.
[0003] Cricket is a highly watched sport, and the pursuit of interesting cricket packages mostly involves the creation of highlights. Cricket metadata is tagged at various levels in EVS broadcast machines and in media asset management systems. Broadcasters usually run a war room of EVS machines and EVS operators during live cricket matches to tag, mark and assemble events in a cricket match. Highlights packages, live inserts and talking point packages are created and inserted during a match, between innings and after the match as well. Highlights and interesting snippets are packaged into Video on Demand (VOD) after the match. Typically, 2-3 EVS machines and operators per production language are involved in the process. In the case of a live telecast, broadcasters prepare packages for live insertions during the broadcast, during innings breaks and post match. The broadcasters also publish stories for digital VOD during the innings and after the match.
[0004] Cricket metadata is also tagged offline for future search and retrieval purposes, for the creation of packages or for obtaining analytics. Typically, cricket metadata tagging and highlights creation are done on a production floor or in a back office manually, by manual taggers who work on media asset management systems. Due to the increased dependency on human intervention and manually operated machines, the process becomes time consuming, costly and not easily scalable.
[0005] Hence, there is a need for a system and method for automatic tagging of cricket metadata. There is also a need for a system and method for automatic tagging of cricket metadata using artificial intelligence and machine learning to create cricket highlights and other custom packages in a quick and efficient manner. Further, there is a need for a system and method for identifying different facets of cricket footage/feeds and cataloguing the identified content.
[0006] The above-mentioned shortcomings, disadvantages and problems are addressed herein, as will be understood by reading and studying the following specification.
C) OBJECT OF THE INVENTION
[0007] The primary object of the present invention is to provide a system and method for automatic tagging of cricket metadata.
[0008] Another object of the present invention is to provide a system and method for automatic tagging of cricket metadata using artificial intelligence and machine learning to create cricket highlights and other custom packages in a quick and efficient manner.
[0009] Yet another object of the present invention is to provide a system and method for identifying different facets of cricket footage/feeds and cataloguing the identified content.
[0010] Yet another object of the present invention is to provide a system and method for analyzing video scenes, scoreboard, sounds and actions from the video recordings.
[0011] Yet another object of the present invention is to provide a system and method to identify logical ball boundaries in a video recording of the cricket footage and break the footage into a plurality of frames which are then subjected to visual recognition.
[0012] Yet another object of the present invention is to provide a system and method to summarize the learnings from the video footage using various artificially intelligent machines.
[0013] Yet another object of the present invention is to provide a system and method for automatically identifying cricketing data through a summarization over a time scale and a unionization across various elements such as score board, visual, action, sound and commentary analysis.
[0014] Yet another object of the present invention is to provide a system and method to use summarization and unionization to correct the results at various stages of operation thereby generating reliable and precise results.
[0015] Yet another object of the present invention is to provide a system and method for artificial intelligence based identification of cricket metadata in a live stream or a video on demand stream to reduce human costs, specialized EVS machine costs and operational overheads.
[0016] Yet another object of the present invention is to provide a system and method for automatic tagging of cricket metadata to enable an easy search and packaging of various events of a match quickly and to let the producer find an interesting mix and match of events for creating and broadcasting an interesting story, resulting in more value for money for the air time.
[0017] Yet another object of the present invention is to provide a system and method for identifying one or more replays in the video stream through a lack of scoreboard on the frames and/or by using replay and live screen markers that come on the video at the start and end of the video.
[0018] Yet another object of the present invention is to provide a system and method for automatic tagging of cricket metadata that allows the user to search
the tagged metadata using innovative search strings to obtain packages/events such as all fours, batsman’s fours, bowler’s wickets etc.
[0019] These and other objects and advantages of the present invention will become readily apparent from the following detailed description taken in conjunction with the accompanying drawings.
D) SUMMARY OF THE INVENTION
[0020] The various embodiments of the present invention provide a system for automatic tagging of cricket metadata. The system comprises a vision cloud server configured for receiving a video file/stream with cricket content and cutting the received video file/stream into a plurality of frames at a designated number of frames per second (FPS). The vision cloud server also comprises an image separation and scorecard de-construction module configured for cutting a scorecard portion at a bottom of each frame into a separate piece. The system also comprises a summarization and unionization (S and U) cloud server configured for automatic tagging of the video file/stream with a cricket content. The S and U server comprises a summarization and unionization (S and U) engine configured for operating in a plurality of stages. The summarization is done over a time scale and the unionization is done across one or more elements. The one or more elements comprise a score card, visual, action, sound and commentary analysis. Each stage of summarization and unionization is configured for achieving an accurate and precise result. The system also comprises an Optical Character Recognition (OCR) engine communicatively coupled with the S and U cloud server and configured for identifying one or
more blocks of text in each frame. The OCR engine in combination with the summarization and unionization engine is further configured for identifying the logical ball boundaries in the frames. The S and U server also invokes a scene tagging engine configured for receiving the frames that are present within the identified logical boundaries from the S and U engine. The scene tagging engine is further configured for identifying and tagging the frames using a plurality of pre-stored tags and their combinations. The tagged frames are returned to the S and U engine. The S and U server further invokes an action tracking engine configured for receiving the video clips that are present within the identified logical boundaries from the S and U engine. The action tracking engine is further configured for identifying one or more actions in the frames that remain unidentified by the scene tagging engine. The action tagged frames are returned to the S and U engine. The S and U server still further invokes a sound analysis engine configured for receiving the frames from the S and U engine. The sound analysis engine is further configured for tracking a plurality of sounds collected from the frames that are taken within a preset time (in seconds) after a ball is delivered. The plurality of sounds include a sound of a ball hitting a bat, an appeal, an applause and a cheer. The S and U server still further invokes a commentary Natural Language Processing (NLP) engine configured for converting the speech data from the video stream into text. The commentary NLP engine is further configured for extracting keywords from the text. The extracted keywords are analyzed by the NLP engine to identify the tags. The sound identified frames are returned to the S and U engine. The vision cloud
server comprises a database configured for receiving and storing the tagged metadata from the S and U server. The tagged metadata is returned to a server upon request, as a response to a query using a plurality of innovative search strings.
[0021] According to one embodiment of the present invention, the image separation and scorecard de-construction module is further configured for cutting the score card portion into a plurality of pieces to identify batsman1, batsman2, bowler, score and number of overs. The identified data is re-assembled into a new image containing only the score.
[0022] According to one embodiment of the present invention, the summarization and unionization engine in combination with the OCR engine is configured for identifying a plurality of parameters from the frames. The plurality of parameters comprise batsman 1, batsman 2, bowler, the score difference between two successive balls, the runs difference of a batsman between two successive balls, the batsman at the strike end, the crossing over of batsmen at the two opposite ends, legitimate ball delivery or an extra, the portions/frames that are part of a replay, the marking of multiple replays and errors in OCR identification.
[0023] According to one embodiment of the present invention, the one or more actions identified by the action tracking engine include a batsman being beaten, a shot, a sweep, a catch, a yorker and a bouncer.
[0024] According to one embodiment of the present invention, the sound analysis engine is further configured for running a clip of audio corresponding to the video stream acquired from the ball delivery time through an NLP (Natural Language Processing) engine and converting the commentary from speech to
text to extract a plurality of keywords. The extracted keywords are analyzed by the NLP engine to identify one or more tags.
[0025] According to one embodiment of the present invention, the scene tagging engine, action tracking engine, sound analysis engine and the commentary NLP engine are configured to use Artificial Intelligence, neural networks and machine learning algorithms.
[0026] According to one embodiment of the present invention, a method for automatic tagging of cricket metadata is provided. The method comprises the steps of publishing a cricket content as a video stream and cutting of the video stream into a plurality of frames by an image separation and scorecard de-construction module. The score card portion at the bottom of each frame is separated as an individual piece from the rest of the image and submitted to an OCR engine by a summarization and unionization (S and U) engine. The plurality of blocks in the image are identified using the OCR engine and the identified plurality of blocks and data are returned to the S and U engine. The logical boundaries in the video frames are identified and the frames are sent for scene tagging. The scenes in the frames are tagged with a plurality of pre-defined tags and their combinations. The identified scene tags are summarized and unionized. The video frames are passed to an action tracking engine to track a plurality of actions, and the identified tags are summarized and unionized. The sounds present in the frames are tracked using a sound analysis engine and the commentary is tracked with a Natural Language Processing (NLP) engine. The sound identified tags are summarized and unionized. The tagged metadata is stored in a database of a vision cloud server and is then returned to a server upon request as a response to a query.
[0027] According to one embodiment of the present invention, the method further comprises cutting/dividing the score card portion into a plurality of pieces to identify batsman1, batsman2, bowler, score and overs. The identified data is re-assembled into a new image containing only the score.
[0028] According to one embodiment of the present invention, the method further comprises identifying a plurality of parameters from the frames. The plurality of parameters comprise batsman 1, batsman 2, bowler, score difference between two successive balls, runs scored between two successive balls, batsman at the strike end, checking of a crossover between the two batsmen at the two opposite ends, legitimate ball delivery or an extra, portions/frames that are part of a replay, marking multiple replays and errors in OCR identification.
[0029] According to one embodiment of the present invention, the method further comprises running a clip of audio corresponding to the video stream, acquired for a preset time from the time of ball delivery, through the NLP engine and converting the commentary from speech to text to extract a plurality of keywords. The extracted keywords are analyzed by the NLP engine to identify one or more tags.
[0030] According to one embodiment of the present invention, the S and U engine is imparted with instructional learning, which brings in the knowledge of the rules of the game. These rules include, among others, that the batsmen cross over strike at the end of the over, when 1 or 3 runs are scored, or while a catch is being taken; the different methods of scoring and of getting out; that there are 6 balls in an over; and that the ball is not counted when a no-ball or a wide is bowled.
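By way of a non-limiting illustration, the following sketch shows how a few of these instructional rules could be encoded in software. The function names, signatures and the constant are hypothetical and do not form part of the claimed S and U engine.

```python
# Illustrative encoding of a few rules of the game for instructional learning.
# All names here are hypothetical; this is a sketch, not the actual engine.

BALLS_PER_OVER = 6  # there are 6 legal balls in an over

def striker_after_ball(striker, non_striker, runs, is_over_end):
    """Return (striker, non_striker) for the next delivery.

    The batsmen cross when an odd number of runs (e.g. 1 or 3) is scored,
    and the strike rotates again at the end of the over.
    """
    if runs % 2 == 1:                      # odd runs: the batsmen crossed
        striker, non_striker = non_striker, striker
    if is_over_end:                        # strike changes ends after the over
        striker, non_striker = non_striker, striker
    return striker, non_striker

def balls_remaining(legal_balls_bowled):
    # A wide or a no-ball is not counted, so only legal deliveries
    # advance the over count.
    return BALLS_PER_OVER - (legal_balls_bowled % BALLS_PER_OVER)
```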
[0031] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating the preferred embodiments and numerous specific details thereof, are given by way of an illustration and not of a limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
E) BRIEF DESCRIPTION OF THE DRAWINGS
[0032] The other objects, features, and advantages will occur to those skilled in the art from the following description of the preferred embodiment and the accompanying drawings in which:
[0033] FIG. 1 illustrates a block diagram of a system for automatic tagging of cricket metadata, according to an embodiment of the present invention.
[0034] FIG. 2 illustrates a schematic representation of a process flowchart explaining a method of automatic tagging of cricket metadata, according to an embodiment of the present invention.
[0035] FIG. 3 illustrates a functional block diagram of a system for automatic tagging of cricket metadata, according to an embodiment of the present invention.
[0036] FIG. 4 illustrates a flow diagram of a cataloguing process and automatic tagging of cricket metadata, according to an embodiment of the present invention.
[0037] FIG. 5 illustrates a flow chart explaining a method for automatic tagging of cricket metadata, according to an embodiment of the present invention.
[0038] Although the specific features of the present invention are shown in some drawings and not in others, this is done for convenience only, as each feature may be combined with any or all of the other features in accordance with the present invention.
F) DETAILED DESCRIPTION OF THE INVENTION
[0039] In the following detailed description, a reference is made to the accompanying drawings that form a part hereof, and in which specific embodiments that may be practiced are shown by way of illustration. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments, and it is to be understood that logical, mechanical and other changes may be made without departing from the scope of the embodiments. The following detailed description is therefore not to be taken in a limiting sense.
[0040] The various embodiments of the present invention provide a system for automatic tagging of cricket metadata. The system comprises a vision cloud server configured for receiving a video file/stream with cricket content and cutting the received video file/stream into a plurality of frames at a designated
number of frames per second (FPS). The vision cloud server also comprises an image separation and scorecard de-construction module configured for cutting a scorecard portion at a bottom of each frame into a separate piece. The system also comprises a summarization and unionization (S and U) cloud server configured for automatic tagging of the video file/stream with a cricket content. The S and U server comprises a summarization and unionization (S and U) engine configured for operating in a plurality of stages. The summarization is done over a time scale and the unionization is done across one or more elements. The one or more elements comprise a score card, visual, action, sound and commentary analysis. Each stage of summarization and unionization is configured for achieving an accurate and precise result. The system also comprises an Optical Character Recognition (OCR) engine communicatively coupled with the S and U cloud server and configured for identifying one or more blocks of text in each frame. The OCR engine in combination with the summarization and unionization engine is further configured for identifying the logical ball boundaries in the frames. The S and U server also comprises a scene tagging engine configured for receiving the frames that are present within the identified logical boundaries from the S and U engine. The scene tagging engine is further configured for identifying and tagging the frames using a plurality of pre-stored tags and their combinations. The tagged frames are returned to the S and U engine. The S and U server further comprises an action tracking engine configured for receiving the frames that are present within the identified logical boundaries from the S and U engine. The action tracking engine is further
configured for identifying one or more actions in the frames that remain unidentified by the scene tagging engine. The action tagged frames are returned to the S and U engine. The S and U server still further comprises a sound analysis engine configured for receiving the frames from the S and U engine. The sound analysis engine is further configured for tracking a plurality of sounds collected from the frames that are taken within a preset time (in seconds) after a ball is delivered. The plurality of sounds comprise a sound of a ball hitting a bat, an appeal, an applause and a cheer. The S and U server still further comprises a commentary Natural Language Processing (NLP) engine configured for converting the speech data from the video stream into text. The commentary NLP engine is further configured for extracting keywords from the text. The extracted keywords are analyzed by the NLP engine to identify the tags. The sound identified frames are returned to the S and U engine. The vision cloud server comprises a database configured for receiving and storing the tagged metadata from the S and U server. The tagged metadata is returned to a server upon request, as a response to a query using a plurality of innovative search strings.
[0041] According to one embodiment of the present invention, the image separation and scorecard de-construction module is further configured for cutting the score card portion into a plurality of pieces to identify batsman1, batsman2, bowler, score and number of overs. The identified data is re-assembled into a new image containing only the score.
[0042] According to one embodiment of the present invention, the summarization and unionization engine in combination with the OCR engine is configured for identifying a plurality of parameters from the frames. The plurality of parameters comprise batsman 1, batsman 2, bowler, the score difference between two successive balls, the runs difference of a batsman between two successive balls, the batsman at the strike end, the crossing over of batsmen at the two opposite ends, legitimate ball delivery or an extra, the portions/frames that are part of a replay, the marking of multiple replays and errors in OCR identification.
[0043] According to one embodiment of the present invention, the one or more actions identified by the action tracking engine comprise a batsman being beaten, a shot, a sweep, a catch, a yorker and a bouncer.
[0044] According to one embodiment of the present invention, the sound analysis engine is further configured for running a clip of audio corresponding to the video stream acquired from the ball delivery time through an NLP (Natural Language Processing) engine and converting the commentary from speech to text to extract a plurality of keywords. The extracted keywords are analyzed by the NLP engine to identify one or more tags.
[0045] According to one embodiment of the present invention, the scene tagging engine, action tracking engine, sound analysis engine and the commentary NLP engine are configured to use Artificial Intelligence, neural networks and machine learning algorithms.
[0046] According to one embodiment of the present invention, a method for automatic tagging of cricket metadata is provided. The method comprises the steps of publishing a cricket content as a video stream and cutting of the video stream into a plurality of frames by an image separation and scorecard de-
construction module. The score card portion at the bottom of each frame is separated as an individual piece from the rest of the image and submitted to an OCR engine by a summarization and unionization (S and U) engine. The plurality of blocks in the image are identified using the OCR engine and the identified plurality of blocks and data are returned to the S and U engine. The logical boundaries in the video frames are identified and the frames are sent for scene tagging. The scenes in the frames are tagged with a plurality of pre-defined tags and their combinations. The identified scene tags are summarized and unionized. The video frames are passed to an action tracking engine to track a plurality of actions, and the identified tags are summarized and unionized. The sounds present in the frames are tracked using a sound analysis engine and the commentary is tracked with a Natural Language Processing (NLP) engine. The sound identified tags are summarized and unionized. The tagged metadata is stored in a database of a vision cloud server and is then returned to a server upon request as a response to a query.
[0047] According to one embodiment of the present invention, the method further comprises cutting/dividing the score card portion into a plurality of pieces to identify batsman1, batsman2, bowler, score and overs. The identified data is re-assembled into a new image containing only the score.
[0048] According to one embodiment of the present invention, the method further comprises identifying a plurality of parameters from the frames. The plurality of parameters comprise batsman 1, batsman 2, bowler, score difference between two successive balls, runs scored between two successive balls, batsman
at the strike end, checking of a crossover between the two batsmen at the two opposite ends, legitimate ball delivery or an extra, portions/frames that are part of a replay, marking multiple replays and errors in OCR identification.
[0049] According to one embodiment of the present invention, the method further comprises running a clip of audio corresponding to the video stream, acquired for a preset time from the time of ball delivery, through the NLP engine and converting the commentary from speech to text to extract a plurality of keywords. The extracted keywords are analyzed by the NLP engine to identify one or more tags.
[0050] FIG. 1 illustrates a block diagram of a system for automatic tagging of cricket metadata, according to an embodiment of the present invention. The system comprises the vision cloud server 102, the S and U cloud server 104 and the OCR engine 106. The vision cloud server 102 receives a video file or video stream with cricket content and cuts it into a plurality of frames at a designated number of frames per second (FPS). The vision cloud server 102 also comprises the image separation and scorecard de-construction module 108 configured for cutting the scorecard portion, which is typically at the bottom of the frame, from the image into a separate piece. The scorecard portion is further cut into a plurality of pieces to identify the batsman1, batsman2, bowler, score, overs etc., and re-assembled into a new image which contains only the score. The rest of the frame (without the scorecard) is kept separate for image or scene analysis.
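By way of a non-limiting illustration, the scorecard separation and de-construction of module 108 could be sketched as below. The band and column coordinates are hypothetical assumptions; in practice they would be configured or learned per broadcaster graphics package.

```python
# Illustrative sketch of scorecard separation and de-construction.
# The crop coordinates are assumptions, not the actual module 108.
import cv2
import numpy as np

def deconstruct_frame(frame: np.ndarray) -> dict:
    h, w = frame.shape[:2]
    scorecard = frame[int(0.88 * h):, :]  # bottom band assumed to hold the scorecard
    scene = frame[:int(0.88 * h), :]      # rest of the frame, kept for scene analysis

    # Cut the scorecard band into pieces (assumed column layout).
    pieces = {
        "batsman1": scorecard[:, int(0.05 * w):int(0.25 * w)],
        "batsman2": scorecard[:, int(0.25 * w):int(0.45 * w)],
        "bowler":   scorecard[:, int(0.45 * w):int(0.65 * w)],
        "score":    scorecard[:, int(0.65 * w):int(0.80 * w)],
        "overs":    scorecard[:, int(0.80 * w):int(0.95 * w)],
    }
    # Re-assemble the pieces into a new image containing only the score data.
    height = min(p.shape[0] for p in pieces.values())
    score_image = np.hstack([cv2.resize(p, (p.shape[1], height))
                             for p in pieces.values()])
    return {"scene": scene, "pieces": pieces, "score_image": score_image}
```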
[0051] According to one embodiment of the present invention, the summarizer and unionizer (S and U) cloud server 104 comprises the S and U engine 112, which operates on algorithm based instructional learning and runs in one or more phases/stages. In the first phase, the summarizer and unionizer (S and U) engine 112 is configured for picking the score card and submitting it to the OCR (Optical Character Recognition) engine 106. The OCR engine 106 is configured for identifying one or more blocks in the image. The one or more identified blocks comprise the batsman, bowler, scores, overs etc. The OCR engine 106 used here may be provided by a third party provider. The OCR engine 106 returns the identified data to the summarizer and unionizer engine 112 for the second phase.
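By way of a non-limiting illustration, this first phase could be sketched as below, with the open source pytesseract wrapper standing in for the third party OCR engine 106; the block names mirror the scorecard pieces.

```python
# Illustrative first S and U phase: OCR of the scorecard pieces.
# pytesseract is only an example stand-in for the third party OCR engine.
import pytesseract

def ocr_scorecard(pieces: dict) -> dict:
    """OCR each scorecard piece into a block of text,
    e.g. {'batsman1': 'KOHLI 54', 'score': '187/3', 'overs': '34.2'}."""
    return {name: pytesseract.image_to_string(image).strip()
            for name, image in pieces.items()}
```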
[0052] According to one embodiment of the present invention, the summarizer and unionizer engine 112 then works in the second phase. During this phase, the engine in combination with the OCR engine 106 identifies a plurality of parameters such as the logical ball boundaries (where the ball number starts and ends), batsman 1, batsman 2, bowler, the score difference from the previous ball to the current ball, the batsman’s runs difference from the previous ball to the current ball, who the striker is, whether the batsmen crossed, whether the delivery is a legitimate ball or an extra, the portions/frames that were part of a replay, the marking of multiple replays, errors in OCR identification etc. The errors in OCR identification are resolved by matching against the already entered playing 11 names or by checking for anomalies in score and ball number jumps.
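By way of a non-limiting illustration, two elements of this second phase are sketched below: detecting a logical ball boundary from successive scorecard readings, and resolving OCR errors by fuzzy matching against the playing 11. The data structures and names are hypothetical.

```python
# Illustrative second S and U phase: ball boundaries, deltas and OCR cleanup.
import difflib

PLAYING_XI = ["KOHLI", "SHARMA", "BUMRAH"]  # names entered before the match

def resolve_name(ocr_name: str):
    """Correct an OCR-mangled name (e.g. 'K0HLl') against the playing 11."""
    match = difflib.get_close_matches(ocr_name.upper(), PLAYING_XI,
                                      n=1, cutoff=0.6)
    return match[0] if match else None

def ball_delta(prev: dict, curr: dict) -> dict:
    """Compare two successive scorecard readings (integers parsed from OCR)."""
    runs = curr["team_runs"] - prev["team_runs"]
    striker_runs = curr["striker_runs"] - prev["striker_runs"]
    return {
        "new_ball": curr["over_ball"] != prev["over_ball"],  # logical boundary
        "runs": runs,
        "extra": runs > 0 and striker_runs == 0,  # crude wide/no-ball signal
        "wicket": curr["wickets"] > prev["wickets"],
    }
```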
[0053] According to one embodiment of the present invention, the frames that are between the identified logical ball boundaries are then sent to the scene tagging engine 114. In one example embodiment, the scene tagging engine 114 is implemented as a convolutional neural network. The scene tagging engine 114 identifies and tags the frames using a plurality of tags and their combinations. The scene tagging engine 114 is trained indigenously to identify the plurality of tags and combinations of tags such as bowling, celebration, umpire signals, huddles etc. The tagged frames are then returned on a per frame basis to the S and U engine 112 for another phase/round (3rd phase) of summarization and unionization. In this 3rd phase, the starting ball boundary is moved to the exact frame where the bowling action scene is identified. The respective umpiring signals are marked against the ball and the respective tags like celebration, huddle etc. are also marked.
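By way of a non-limiting illustration, the third phase adjustment could be sketched as below; the per-frame tag records are hypothetical.

```python
# Illustrative third S and U phase: snap the ball start to the first frame
# tagged as a bowling action and collect the other scene tags for the ball.
def refine_ball_boundary(ball_frames: list) -> dict:
    """ball_frames: [{'frame_no': 120, 'tags': ['bowling']}, ...] in time order."""
    start = next((f["frame_no"] for f in ball_frames if "bowling" in f["tags"]),
                 ball_frames[0]["frame_no"])
    marked = {t for f in ball_frames for t in f["tags"]
              if t in {"celebration", "umpire_signal", "huddle"}}
    return {"start_frame": start, "scene_tags": sorted(marked)}
```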
[0054] According to one embodiment of the present invention, the S and U engine 112 passes the video frames to an action tracking engine 116. The action tracking engine 116 is configured for identifying the actions that are difficult to identify using the scene tagging engine 114. The action tracking engine 116 identifies actions like a batsman being beaten, a shot, a sweep, a catch, a yorker, a bouncer etc. In one example embodiment, this engine is implemented as a 3D spatio-temporal neural network drawn from an open source. The frames are then sent to the S and U engine 112 for another phase/round (4th phase) of summarization and unionization. In the 4th phase, the S and U engine 112 marks the action tags between the ball boundaries.
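By way of a non-limiting illustration, an open source 3D spatio-temporal network such as torchvision's r3d_18 could be fine-tuned to the action vocabulary; the model choice and class list below are assumptions, not the claimed engine.

```python
# Illustrative action tracking with a 3D spatio-temporal network.
# r3d_18 is only an example open source model; it would need to be
# fine-tuned on cricket clips before use.
import torch
from torchvision.models.video import r3d_18

ACTIONS = ["beaten", "shot", "sweep", "catch", "yorker", "bouncer"]

model = r3d_18(num_classes=len(ACTIONS))
model.eval()

def classify_clip(clip: torch.Tensor) -> str:
    """clip: float tensor of shape (3, T, H, W) holding one ball's frames."""
    with torch.no_grad():
        logits = model(clip.unsqueeze(0))   # add a batch dimension
    return ACTIONS[int(logits.argmax(dim=1))]
```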
[0055] According to one embodiment of the present invention, the frames are then forwarded to a sound analysis engine 118. The sound analysis engine 118 is configured for tracking various sounds in the frames. This engine uses neural networks to track sounds that are heard from the time the ball is delivered to the next few seconds (e.g. 2 to 8 seconds). These tags indicate a ball-hitting-bat sound (in the first 2 seconds), an appeal, an applause, a cheer etc. The clip of audio starting from the ball delivery time is run through a commentary NLP (Natural Language Processing) engine 120 and the commentary is converted from speech to text for extracting keywords. The extracted keywords are analyzed by the NLP engine 120 to identify tags like beaten, yorker, four, six, 50, 100, beamer etc. The analyzed frames are further sent to the S and U engine 112 for another phase/round (5th phase) of summarization and unionization.
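By way of a non-limiting illustration, the keyword-to-tag step of the commentary NLP engine 120 could be sketched as below; the keyword table is a hypothetical fragment.

```python
# Illustrative commentary keyword extraction: the speech-to-text transcript
# for the seconds after the delivery is scanned for tag keywords.
import re

KEYWORD_TAGS = {
    "beaten": "beaten", "yorker": "yorker", "four": "four", "six": "six",
    "fifty": "50", "hundred": "100", "beamer": "beamer",
}

def tags_from_commentary(transcript: str) -> list:
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    return sorted({tag for keyword, tag in KEYWORD_TAGS.items()
                   if keyword in words})

# e.g. tags_from_commentary("What a yorker, he is beaten all ends up!")
# returns ['beaten', 'yorker']
```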
[0056] According to one embodiment of the present invention, the S and U engine 112 completes the identification of the ball boundaries and of the events in the ball from across the score board, visual, audio and action classifications. The tagged metadata is then stored into the database 110 and returned to the servers when sought, or search results are returned using search APIs (Application Programming Interfaces).
[0057] According to one embodiment of the present invention, the scene tagging engine, action tracking engine, sound analysis engine and the commentary NLP engine use artificial intelligence, neural networks and machine learning algorithms.
[0058] According to one embodiment of the present invention, several parameters are AI tagged and searched using the system of present invention. These parameters can be related to batsman, bowler, packages, actions etc. Some of the example parameters related to batsman that can be tagged and searched include batsman’s four, batsman’s four with replays, batsman’s sixes, batsman’s sixes with replays, batsman’s out, batsman’s out with replays, batsman’s 50, batsman’s 100 and the like. For the bowler, the parameters include bowler’s wickets, bowler’s wicket with replays, bowler’s extras etc. Examples of the packages include all fours, all fours with replays, all sixes, all sixes with replays, all wickets, all wickets with replays, all extras etc. Similarly, the parameters for action tracking include batsman beaten, bowler’s Yorkers, batsman’s celebration, replay cut in and cut out, umpiring signals, fielders and the like.
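By way of a non-limiting illustration, such a search string could be parsed into filters over the tagged metadata as sketched below; the field:value grammar is a hypothetical example of the search capability, not the actual search API.

```python
# Illustrative parsing of a search string into filters over tagged metadata.
def parse_search_string(query: str) -> dict:
    """e.g. 'batsman:Kohli event:four replays:yes' ->
    {'batsman': 'Kohli', 'event': 'four', 'replays': 'yes'}"""
    return dict(part.split(":", 1) for part in query.split())

def search(records: list, query: str) -> list:
    filters = parse_search_string(query)
    return [r for r in records
            if all(str(r.get(k, "")).lower() == v.lower()
                   for k, v in filters.items())]
```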
[0059] According to one embodiment of the present invention, the system facilitates creation of packages for live insertions during the broadcast, during innings breaks and post match. Broadcasters can also publish stories for digital VOD during the innings and post the match. Examples of the live inserts include batsman’s fours and sixes, batsman dropped, bowler’s wickets, catch drops, misfields, extras in the match etc. Examples of the innings break and post match stories include innings highlights, match highlights, batsman’s sixes and fours, run outs in the innings, celebrations, milestones etc. Examples of the VOD packages include, pre-match trailers/previews, match highlights, team wickets, match report card, batsman’s shots, final over-relive, key breakthroughs etc.
[0060] FIG. 2 illustrates a schematic depicting a high level solution blueprint diagram of the system, according to one embodiment of the present invention. With respect to FIG. 2, the Media Asset Management (MAM) server 202 receives a mixed feed from the broadcaster in real time or as a time coded score feed. The MAM server can be an on-premise server if required. The features of the MAM server include a contemporary UI and search screens. It is also capable of handling high speed live video. The mixed feed received from the broadcaster or the score feed, and the match data like teams, compositions etc., are then fed to the vision cloud server 204. The vision cloud server performs OCR, compound object/scene tagging, action detection and spatio-temporal analysis, high speed GPU scaling, sound detection and commentary text analysis.
[0061] According to one embodiment of the present invention, at a high level, the process includes feeding the mixed/dirty feed into the MAM server. The MAM server then feeds the vision cloud server for auto-tagging of the content, classification, assorting and recommendations. The operator works on a special search and Reco UI and queries for clips of certain events, readymade packages with pre-programmed graphics, assembly of events into packages for edit, highlights packages and custom packages. The operator further uploads the content after quality control for insertion into innings break shows, post match shows and VOD video pools.
[0062] FIG. 3 illustrates a functional block diagram of a system for automatic tagging of cricket metadata, according to one embodiment of the present invention. With respect to FIG. 3, the video stream is published to a
vision cloud by the Media Asset Management (MAM) server in step 1. At step 2, the video stream is cut into a plurality of frames. The frames are then provided as input to an image separation and de-construction module at step 3. The image separation and de-construction module separates the scorecard portion from the frames. At step 4, the frames are submitted to a S and U server. The S and U server then forwards these frames to an OCR engine at step 5. The OCR engine in combination with the S and U engine provided in the S and U server identifies logical ball boundaries and a plurality of other parameters from the frames. The identified data is returned to the S and U engine at step 6. The S and U engine then marks the replays, cleans up the data and eliminates errors from the identified data at step 7. The frames are then forwarded to a scene tagging engine at step 8. The scene tagging engine tags the frames using several pre-defined tags like umpire signals, people etc. The scene tagging engine uses an R-CNN neural network, a scene classification model and compound object identification. The tagged data is then returned to the S and U engine at step 9.
[0063] According to one embodiment of the present invention, the S and U engine then cuts the frames into fine ball boundaries, notes the tags and adjusts the replay objects at step 10. At step 11, the frames are sent to an action tracking engine which improves the previously identified actions and also identifies new actions. The identified data is returned to the S and U engine at step 12. At step 13, the frames are sent to the sound analysis engine which identifies bat-ball sounds, stumps and appeals. The identified frames are then returned to the S and U engine at step 14. The S and U engine re-summarizes and unionizes the identified data at step 15. At step 16, the tagged metadata is then stored in a database provided in the vision cloud. The tagged metadata stored in the database is then returned to the MAM servers when sought at step 17.
[0064] FIG. 4 illustrates a diagram depicting the cataloguing process and status, according to one embodiment of the present invention. With respect to FIG. 4, in the first pass of OCR (402), the tasks carried out in a sequential manner comprise data segmentation, data cleanup, data restructuring, error resolution, associations, identifying logical ball boundaries, action replay marking and event identification. During the process of visual tagging (404), the tasks performed comprise the invocation of the scene tagging module by the S and U engine, ball boundary identification, scene cataloguing and scene identification. The process of action tagging (406) comprises bowling action identification, event identification, action tracking, sound tracking and advanced action classification. The advanced associations (408) include replay association and replay proposals for manual interventions. Lastly, the search indexing improvements (410) include search string customization.
[0065] FIG. 5 illustrates a flow chart depicting a method for automatic tagging of cricket metadata, according to an embodiment of the present invention. With respect to FIG. 5, the method comprises publishing the cricket content as a video stream (502) and cutting of the video stream into a plurality of frames by an image separation and scorecard de-construction module (504). The method also comprises separating the score card portion at the bottom of each frame into a separate piece from the rest of the image and submitting to an OCR
engine by the S and U engine (506). The method still further comprises identifying various blocks in the image using the OCR engine and returning the identified data to the S and U engine (508). The method still further comprises identifying logical boundaries in the video frames and sending the frames for scene tagging (510). The method still further comprises tagging the scenes in the frames using a plurality of pre-defined tags and their combinations (512). The identified scene tags are summarized and unionized. The method still further comprises passing the video frames to an action tracking engine to track various actions and summarizing and unionizing the identified tags (514). The method still further comprises tracking sounds in the frames using a sound analysis engine and a commentary NLP engine, wherein the sound identified tags are summarized and unionized (516). The method still further comprises storing the tagged metadata in a database of a vision cloud server (518).
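By way of a non-limiting illustration, the final unionization across steps 512 to 516 could be sketched as below: the tags produced by the scene, action and sound/commentary engines for one ball are unionized into a single metadata record. The record format is a hypothetical assumption.

```python
# Illustrative unionization of per-ball tags from the individual engines.
def unionize_ball(over_ball: str, scene_tags: list,
                  action_tags: list, sound_tags: list) -> dict:
    """Union the tags for one ball into a single metadata record."""
    return {"ball": over_ball,
            "tags": sorted(set(scene_tags) | set(action_tags) | set(sound_tags))}

record = unionize_ball("34.2", ["bowling", "celebration"], ["shot"], ["four"])
# record == {'ball': '34.2', 'tags': ['bowling', 'celebration', 'four', 'shot']}
```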
G) ADVANTAGES OF THE INVENTION
[0067] The various embodiments of the present invention provide a system and method for automatic tagging of cricket metadata. The system and method use various artificially intelligent machines/algorithms trained for identifying different facets of cricket footage and feeds, and catalogue the identified content. The system and method are configured for identifying various actions from the video input, like beaten/play and miss, by using various techniques such as hearing the sound of the bat hitting the ball and its absence, listening to commentary, visual action recognition using spatio-temporal analysis, reducing the content to a ball boundary to limit the possibilities and hence training sharply for actions (like a bowling scene, an appeal, etc.), and identifying the scores, the batsman/striker, who got out and the runs scored using the score card.
[0068] The system and method also identify one or more replays on the screen using the lack of a scoreboard on the frames and/or the replay and live screen markers that come on the video at the start and end of the video. Further, innovative search strings are used to obtain packages like all fours, batsman’s fours, bowler’s wickets, etc. Every stage of S and U corrects the results and makes them more reliable and precise.
[0069] Thus, AI based identification of cricket metadata in a Live stream or a VOD stream reduces human costs, specialized EVS machine costs and operational overheads. Automatic tagging enables easy search and packaging of various events of a match quickly and lets the producer find interesting mix and
match of events for interesting storytelling, which in turn enables more value for the air time.
[0070] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such as specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments.
[0071] It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modifications.
[0072] Although the embodiments herein are described with various specific embodiments, it will be obvious for a person skilled in the art to practice the embodiments herein with and without modifications.
We Claim:
1. A system for automatic tagging of cricket metadata, the system comprises:
a vision cloud server configured for receiving a video file/stream with a cricket content and cutting/dividing the video file/stream into a plurality of frames at a predetermined number of frames per second (FPS);
an image separation and scorecard de-construction module provided in the vision cloud server, and wherein the image separation and scorecard de-construction module is run on a hardware processor and configured for cutting/dividing a scorecard portion at a bottom of each frame into a separate/individual piece;
a summarization and unionization (S and U) cloud server configured for automatic tagging of the video file/stream with the cricket content;
a summarization and unionization (S and U) engine provided in the S and U cloud server, and wherein the S and U engine is run on the hardware processor and is configured for operating in a plurality of stages, and wherein the summarization process is done over a time scale and the unionization process is done across one or more elements, and wherein the one or more elements comprise a score card, visual, action, sound and commentary analysis;
an Optical Character Recognition (OCR) engine communicatively coupled with the S and U cloud server, and wherein the OCR engine is either a 3rd party cloud service or run on the hardware processor and configured for identifying one or more blocks of text in each frame, and wherein the OCR engine in combination with the summarization and unionization engine is further configured to identify logical ball boundaries in the plurality of frames;
a scene tagging engine provided in the S and U cloud server, and wherein the scene tagging engine is run on the hardware processor and a GPU and configured for receiving frames that are in the identified logical boundaries from the S and U engine, and wherein the scene tagging engine is further configured for identifying and tagging the frames using a plurality of pre-stored tags and a combination thereof, and wherein the tagged frames are returned to the S and U engine;
an action tracking engine provided in the S and U cloud server, and wherein the action tracking engine is run on the hardware processor and a GPU and configured for receiving a plurality of frames that are present in the identified logical boundaries from the S and U engine, and wherein the action tracking engine is further configured for identifying one or more actions in the plurality of frames that remain unidentified by the scene tagging engine, and wherein the action tagged frames are returned to the S and U engine;
a sound analysis engine provided in the S and U cloud server, and wherein the sound analysis engine is run on the hardware processor and configured for receiving the plurality of frames from the S and U engine, and wherein the sound analysis engine is further configured for tracking a plurality of sounds present in the frames that are acquired for a preset time from the ball delivery time, and wherein the plurality of sounds comprise a sound of a ball hitting a bat, an appeal, an applause and a cheer;
a commentary Natural Language Processing (NLP) engine provided in the S and U cloud server, and wherein the commentary NLP engine is run on the hardware processor and configured for converting a speech/voice data from the video stream into text data, and wherein the commentary NLP engine is further configured for extracting keywords from the text data, and wherein the extracted keywords are analysed by the NLP engine to identify a plurality of tags, and wherein the plurality of frames are returned to the S and U engine after a completion of a sound identification process; and
a database provided in the vision cloud server, and wherein the database is configured for receiving and storing the tagged metadata from the S and U server, and wherein the tagged metadata is returned to a server upon seeking a response to a query using a plurality of search strings.
2. The system according to claim 1, wherein the image separation and scorecard de-construction module is further configured for cutting/dividing the score card portion into a plurality of pieces to identify batsman1, batsman2, bowler, score and overs, and wherein the identified data is re-assembled into a new image containing only a score data.
3. The system according to claim 1, wherein the summarization and unionization engine in combination with OCR engine is configured for identifying a plurality of parameters from the plurality of frames, and wherein the plurality of parameters comprise batsman 1, batsman 2, bowler, score difference between two successive balls, runs scored by each batsman between two successive balls, batsman at a strike end, checking/identifying a crossover of two batsmen at two opposite ends, legitimate ball delivery or an extra, portions/frames that are part of a replay, marking multiple replays and errors in OCR identification.
4. The system according to claim 1, wherein the one or more actions identified by the action tracking engine comprise a batsman being beaten, a shot, a sweep, a catch, a yorker and a bouncer.
5. The system according to claim 1, wherein the sound analysis engine is further configured for running a clip of audio corresponding to the video stream starting from the ball delivery time through a Natural Language Processing (NLP) engine and converting a commentary from speech/voice data to text data to extract a plurality of keywords, and
wherein the extracted keywords are analyzed by the NLP engine to identify one or more tags.
6. The system according to claim 1, wherein the scene tagging engine, the action tracking engine, the sound analysis engine and the commentary NLP engine are configured to use Artificial Intelligence, neural networks and machine learning algorithms.
7. A method for automatic tagging of cricket metadata using a computing device provided with a hardware processor and memory, the method comprising the steps of:
publishing a cricket content as a video stream or an image;
cutting/dividing the video stream into a plurality of video frames by an image separation and scorecard de-construction module;
separating a score card portion at a bottom of each frame into an individual/separate piece from a rest of the image by a summarization and unionization (S and U) engine and submitting the separated score card portion to an Optical Character Recognition (OCR) engine;
identifying a plurality of blocks in the image using an OCR engine and returning the identified data to S and U engine;
identifying logical boundaries in the plurality of video frames and sending the plurality of video frames for scene tagging;
tagging a plurality of scenes in the plurality of video frames using a plurality of pre-defined tags and their combinations, and wherein the identified scene tags are summarized and unionized;
passing the plurality of video frames to an action tracking engine to track various actions and summarizing and unionizing the identified tags;
tracking sounds present in the plurality of video frames using a sound analysis engine and a commentary Natural Language Processing (NLP) engine, and wherein the sound identified tags are summarized and unionized; and
storing the tagged metadata in a database of a vision cloud server which is then returned to a server upon seeking a response to a query.
8. The method according to claim 7 further comprises the step of cutting the score card portion into a plurality of pieces to identify batsman1, batsman2, bowler, score and overs, and wherein the identified data is re-assembled into a new image containing only the score.
9. The method according to claim 7 further comprises the step of identifying a plurality of parameters from the plurality of video frames, and wherein the plurality of parameters comprise batsman 1, batsman 2, bowler, runs scored between two successive balls, runs scored by a batsman between two successive balls, batsman at a strike end, checking or identifying a crossover of two batsmen at two opposite ends, legitimate ball delivery or an extra, portions/frames that are part of a replay, marking multiple replays and errors in OCR identification.
10. The method according to claim 7 further comprises the step of running a clip of audio corresponding to the video stream starting from the ball delivery time through the NLP engine and converting a commentary from a speech/voice data to a text data to extract a plurality of keywords, and wherein the extracted keywords are analyzed by the NLP engine to identify one or more tags.