
Automated Sports Categorization And Multi Language Text Summarization Using Neural Networks

Abstract: In the realm of sports, video analysis plays a crucial role in understanding team performance and strategic insights. However, manually summarizing lengthy sports match videos to extract key moments and highlights is time-consuming and challenging. This invention relates to sports classification and multi-lingual summarization. The purpose of the invention, titled "Automated Sports Categorization and Multi-Language Text Summarization using Neural Networks," is to transform the field by automating event categorization and generating concise text summaries in multiple languages. Advanced neural network techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), are utilized to enhance sports analysis for coaches, analysts, and viewers. The system provides efficient and accurate summarization, adapts to various sports, and continuously improves through exposure to different match dynamics. By incorporating the visual and chronological aspects of matches, the technology offers unbiased and comprehensive summaries. Furthermore, it overcomes language barriers with its multi-language summarization capability and enriches content through integration with multiple data types. The system ensures consistent and objective summaries, resulting in improved sports viewing experiences globally. 4 Claims and 6 Figures


Patent Information

Application #
202341075641
Filing Date
06 November 2023
Publication Number
51/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

MLR Institute of Technology
Laxman Reddy Avenue, Dundigal – 500 043

Inventors

1. Mrs. Saima Afreen
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal – 500 043
2. Mr. Dhatrika Kamal Kumar
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal – 500 043
3. Mr. Maddhe Sai Prashanth
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal – 500 043
4. Mr. Vuppala Praneeth Kumar
Department of Computer Science and Engineering, MLR Institute of Technology, Laxman Reddy Avenue, Dundigal – 500 043

Specification

Description:

Field of the Invention
The proposed innovation relates to sports video-to-text summarization, outlining a technique for capturing and analysing key highlights and events within sports videos. By observing and interpreting visual cues, this technology seeks to summarize sports events effectively, enabling the generation of concise textual summaries for numerous sporting events in multiple languages.
Objective of the Invention
The invention, titled "Automated Sports Categorization and Multi-Language Text Summarization using Neural Networks," aims to transform sports video analysis by automating sports event categorization and generating concise text summaries in multiple languages. It streamlines the process of summarizing sports match videos, making it efficient and accessible to a global audience. By utilizing neural networks, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the invention seeks to enhance sports analysis for coaches, analysts, and viewers, and to provide summaries in various languages, revolutionizing the sports viewing experience.
Background of the Invention
In the domain of sports, comprehensive video analysis of game performance and strategies holds undeniable importance. Yet, the laborious manual task of distilling extensive sports videos to capture pivotal moments necessitates an innovative solution. This invention introduces the concept of "Automated Sports Categorization and Multi-Language Text Summarization using Neural Networks.” The primary objective is to automate the process of creating succinct summaries from sports videos. By harnessing the capabilities of neural networks, particularly convolutional and recurrent neural networks, this invention transforms the landscape of sports video summarization. The result is a streamlined approach that empowers coaches, analysts, and viewers with efficient access to crucial insights from sporting events.
US20120106925A1 refers to an automatic system for efficient video summarization and sharing, particularly suited to enterprise domains, employing techniques such as shot detection, slide recognition, clustering, and ranking. The system creates concise static summaries using illustrative thumbnail images that represent the content effectively. This approach empowers audiences to decide quickly whether to engage with the full video, thereby reducing time, bandwidth, and storage costs. The methodology's adaptability to various video types, incorporating Optical Character Recognition (OCR), contributes to enhanced accessibility and informed decision-making in video content consumption.
US7657836B2 refers to generating concise summaries of soccer videos. The approach focuses on recognizing captivating sections through the analysis of replays, close-up shots, and significant play moments. This process utilizes visual colour cues and audio characteristics to pinpoint these noteworthy portions, ensuring that the resulting summaries maintain their engaging quality. The technique addresses complexities such as varying camera angles and viewer preferences, ultimately producing succinct yet captivating summaries of soccer matches.
US20160070963A1 refers to real-time video summarization, capturing video via a camera module to generate a concise ongoing summary. The summary retains important frames, continuously updated for significance. Frame importance is determined by optimizing an objective function that balances diversity and representation, capturing key elements while conserving resources. The technology integrates into devices with camera modules, enabling efficient, low-power, and minimal-storage video capture, even over limited bandwidth. It is versatile for applications such as surveillance and IoT. The dynamic frame selection ensures vital content is captured, a significant advancement in video technology.
US20190303683A1 refers to the introduction of devices, systems, and methods aimed at enhancing video playback and creating video summaries. Specifically, the approach involves generating annotation data for each video frame to identify its content, such as faces, objects, pets, or speech. By calculating a priority metric for each frame using this annotation data, a video summary is generated. This summary can be tailored to specific criteria, such as faces and time intervals, allowing the creation of video segments that focus on selected faces within designated time periods. Additionally, the video summary can target multiple faces and objects by utilizing the annotation data. In essence, this technology aims to improve video viewing experiences and facilitate the generation of concise and relevant video summaries.
EP1659519A2 refers to a method and apparatus for summarizing sports videos using audio and image data. Traditional devices, like personal video recorders (PVRs), played stored videos and decrypted encrypted content. However, sports demand quick access to vital scenes, called "moving picture summarizing." Existing methods detect events or segment videos into play and non-play parts for summarization, but irrelevant content may be included. This invention addresses this by segmenting, extracting, detecting events, calculating importance, and summarizing based on significant shots. It emphasizes audio and visual events like cheering, ensuring effective and precise sports video summarization for better viewing.
KR20080105387A pertains to generating ongoing media program summaries based on previously aired content. Content from the media program is transcribed into text, and additional contextual elements such as topics, speaker identities, and interactions from listeners are identified. These transcribed segments and contextual details serve as multi-modal inputs for a trained model, which produces summaries of the media program. These summaries are then shared with potential listeners/viewers of the program, presented in a menu or user interface, or even announced to them. The field of the invention extends to video summarization, particularly in sports contexts such as soccer, tennis, and volleyball. The method involves decrypting encrypted image data to enable playback, essential for sports games where selective scenes, such as goal moments, need rapid access. Various techniques, including event detection and categorization, are employed for effectively summarizing sports content.
Summary of the Invention
This innovation presents a system that automates the categorization and summarization of sports videos. By utilizing advanced neural networks such as CNNs and RNNs, the system provides efficient and precise summarization. It excels in identifying important events, adapting to different sports, and continuously improving through exposure to various match dynamics. The technology ensures unbiased summaries that encompass both the visual and chronological aspects of matches, and it surpasses language barriers with its multi-language summarization capability. By integrating audio and visual data, the system enhances the content experience. Moreover, it maintains consistent and coherent summaries while reducing subjective interpretations. Overall, this invention transforms sports analysis and the viewing experience by delivering timely, comprehensive, and accessible summaries to a global audience.

Brief Description of Drawings
The invention will be described in detail with reference to the exemplary embodiments shown in the figures wherein:
Figure-1: Flowgorithm representing the workflow of Automated Sports Categorization and Multi-Language Text Summarization using Neural Networks.

Figure-2: Classification of sports using the VGG16 CNN.

Figure-3: Flowgorithm for sports video-to-text summarization using neural networks.

Figure-4: Visual feature extraction using VGG16.

Figure-5: Identifying the subject in a frame using a CNN.

Figure-6: Multilingual Translation using Neural Machine Translation.

Detailed Description of the Invention
This innovative solution integrates advanced neural network methodologies to revolutionize the categorization of sports videos and the generation of multi-language text summaries. Through the seamless convergence of machine learning, computer vision, and natural language processing, this system facilitates enhanced accessibility and cross-cultural comprehension of sports content.
Data Collection and Preprocessing: A diverse dataset of sports videos, each accompanied by captions or transcripts in various languages, is meticulously collected and used to train the model. Initial preprocessing involves frame extraction and normalization, ensuring consistent format and quality across video frames.
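A minimal preprocessing sketch in Python (an illustrative assumption, not part of the filed specification): frames are sampled from a video with OpenCV, resized to the 224x224 input expected by VGG16, and normalized. The file name and sampling rate are hypothetical.

```python
import cv2
import numpy as np

def extract_frames(video_path, every_n=30, size=(224, 224)):
    """Sample every `every_n`-th frame, resize, and normalize to [0, 1]."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frame = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2RGB)
            frames.append(frame.astype(np.float32) / 255.0)
        idx += 1
    cap.release()
    return np.stack(frames) if frames else np.empty((0, *size, 3), np.float32)

frames = extract_frames("match.mp4")  # hypothetical input video
```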

For classification of sports,

Input and Preprocessing: The system starts by taking a video as input and capturing frames from it. These frames are then fed into a CNN (Convolutional Neural Network), which adapts quickly to identifying patterns within the images. The CNN meticulously labels each frame, aiding in the determination of the specific sport being showcased. The spotlight in this process is on the VGG16 CNN, whose intelligence is garnered from analysing a diverse array of images in the ImageNet dataset. A technique referred to as "transfer learning" refines the VGG16 CNN's capabilities to focus exclusively on discerning various sports. This approach accelerates the learning process and surpasses the results obtained from training from scratch. Ultimately, the system produces an output that classifies the overarching sport being played in the video.
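A hedged sketch of the transfer-learning step described above: the ImageNet-pretrained VGG16 convolutional base is frozen and only a new classification head is trained on sport labels. The class count and head sizes are illustrative assumptions.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

NUM_SPORTS = 10  # assumed number of sport categories

# Frozen ImageNet-pretrained convolutional base (transfer learning).
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# New classification head trained on sport labels only.
classifier = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_SPORTS, activation="softmax"),  # one score per sport
])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# classifier.fit(frames, labels, epochs=5)  # labels: integer sport IDs
```

Freezing the base means only the small head is optimized, which is what makes training faster than starting from scratch.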

For sports video to text summarization,

Video Feature Extraction: After taking the input video, CNNs are employed as feature extractors to transform the raw pixel data of video frames into high-level visual features. These features capture intricate visual details and patterns embedded within the sports videos through the network's activation functions, and can identify the subjects and objects in the frames (Figures 4 and 5).
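As one plausible reading of this step, VGG16 without its classification head can serve as a fixed per-frame feature extractor; the pooling layer chosen below is an assumption, since the specification does not name a specific layer.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# VGG16 without its classification head, used as a fixed feature extractor.
extractor = models.Sequential([
    VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3)),
    layers.GlobalAveragePooling2D(),  # one 512-d feature vector per frame
])
frame_features = extractor.predict(frames)  # frames from the sketch above
```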

Temporal Feature Encoding: Temporal relationships among video frames are captured using advanced neural network models such as Long Short-Term Memory (LSTM) networks or Transformer-based models. These models learn to encode sequential information, enabling the system to comprehend the dynamic progression of sports events.
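A brief sketch of such a temporal encoder, assuming LSTM dimensions the specification does not give: the LSTM reads the per-frame feature vectors in order and produces one hidden state per frame.

```python
from tensorflow.keras import layers, models

# LSTM over the per-frame feature vectors; returns one state per frame.
temporal_encoder = models.Sequential([
    layers.Input(shape=(None, 512)),          # variable-length frame sequence
    layers.LSTM(256, return_sequences=True),  # assumed hidden size
])
encoded = temporal_encoder.predict(frame_features[None, ...])  # add batch dim
```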

Attention Mechanisms: These are applied to focus on relevant temporal segments, allowing the model to weigh different parts of the video differently.

Encoder-Decoder Architecture:
Encoder: Processes the visual features along with temporal context to create a condensed representation of the video’s content.
Decoder: Generates a textual summary from the encoded representation. This can be achieved using recurrent language models (RLMs).

Language Generation: Employs recurrent language models, which use RNN-based language models to generate coherent sentences one word at a time, forming the summarized text.
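The encoder, attention mechanism, and RNN-based decoder described above can be combined into one model. The following is an assumed arrangement, not the filed implementation; vocabulary size and dimensions are illustrative.

```python
from tensorflow.keras import layers, Model

VOCAB = 10000  # assumed vocabulary size
D = 256        # assumed hidden dimension

# Encoder: LSTM over per-frame visual features.
enc_in = layers.Input(shape=(None, 512), name="frame_features")
enc_seq = layers.LSTM(D, return_sequences=True)(enc_in)

# Decoder: RNN language model over summary tokens (teacher-forced in training).
dec_in = layers.Input(shape=(None,), name="summary_tokens")
dec_seq = layers.LSTM(D, return_sequences=True)(layers.Embedding(VOCAB, D)(dec_in))

# Attention: each decoding step weighs the encoder states differently.
context = layers.Attention()([dec_seq, enc_seq])
logits = layers.Dense(VOCAB, activation="softmax")(
    layers.Concatenate()([dec_seq, context]))  # next-word distribution

summarizer = Model([enc_in, dec_in], logits)
summarizer.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# At inference, words are emitted one at a time (e.g. greedy decoding),
# each predicted token being fed back as the next decoder input.
```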

Training and Evaluation:
Loss Functions: Use appropriate loss functions, such as sequence-to-sequence loss or reinforcement learning-based rewards, to guide the model’s learning during training.
Evaluation Metrics: Assess the quality of generated summaries using metrics such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation), which compare generated text with reference summaries.
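As a usage illustration, the open-source rouge-score package computes these scores; the package choice and the example sentences are assumptions, since the specification names only the metric.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
reference = "Team A beat Team B 2-1 with a late goal."       # reference summary
generated = "Team A defeated Team B 2-1 after a late goal."  # model output
print(scorer.score(reference, generated))  # precision/recall/F1 per metric
```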

Fine-Tuning and Transfer Learning:
Domain-Specific Fine-Tuning: Fine-tune the model using domain-specific sports data to improve its performance on sports-related content.

Transfer Learning: Utilizes pre-trained models on a large text corpus to bootstrap the summarization model’s language capabilities.

For Multi-Lingual Translation,

To achieve multilingual accessibility for sports video summarization, a series of steps is involved. First, a diverse dataset of sports videos and their summaries is collected. Then, the videos are pre-processed to clean and format the data. Visual features are extracted from the videos using pre-trained convolutional neural networks (CNNs). The models are then fine-tuned using the extracted features and summaries to ensure accurate multilingual summarization. Neural machine translation (NMT) models are developed for translation between languages. These models are trained on parallel corpora. The process generates initial summaries in the source language using the fine-tuned models. These summaries are then translated to target languages using the NMT models. Language detection mechanisms identify the source language for accurate translation. A user-friendly interface allows users to select the language of the summary, initiating real-time processing and delivery of the summary. User feedback is continuously collected to refine the architecture and enhance the models for improved summaries and translations. This architecture seamlessly integrates data collection, preprocessing, feature extraction, fine-tuning, NMT translation, language detection, user interaction, real-time processing, and feedback mechanisms to enable accessible and multilingual sports video summarization (Figure 6).
Input Text: The input to the system is the sports content in the source language. This text can be descriptions, summaries, or any textual content related to sports events.

NMT Language Translation: The core of the solution is the Neural Machine Translation (NMT) component. The input text in the source language is passed through the NMT model. The model has been trained on parallel texts in multiple languages, enabling it to generate accurate translations in target languages. The NMT model employs deep learning techniques to capture the linguistic nuances and context of the source text.

Multilingual Text Output: The output of the NMT component is the translated text in the target languages. This output represents the input sports content translated into different languages. Each translated text maintains the core meaning and context of the original text while adapting it to the linguistic norms of the target language.
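For illustration, an off-the-shelf pre-trained NMT model can stand in for the custom models described above. The sketch below uses a Helsinki-NLP MarianMT checkpoint via the Hugging Face transformers library; this is an assumption, since the specification describes training NMT models on parallel corpora rather than any particular library.

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-hi"  # English -> Hindi, as an example
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

summary_en = "Team A defeated Team B 2-1 after a dramatic late goal."
batch = tokenizer([summary_en], return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True)[0])
```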

In conclusion, the sport type of the input video is categorized, a concise summarized text is generated, and it can be translated into many languages.
Advantages of the proposed model:
Enhanced Efficiency: The advanced utilization of neural network techniques results in improved speed and accuracy of automated summarization, ensuring timely delivery of summaries to sports enthusiasts and analysts.
Accurate Event Detection: Convolutional Neural Networks (CNNs) excel in extracting visual features, enabling precise detection of crucial events and actions within sports videos. This enhances the accuracy of summarization by capturing critical moments effectively.
Adaptation to Different Sports: The system's flexibility, utilizing CNNs and Recurrent Neural Networks (RNNs), allows it to adapt to the unique characteristics of various sports, making it suitable for summarizing a wide range of sporting events.
Learning and Improvement: Through continuous exposure to diverse sports videos, the system can learn and improve over time, adapting to different match dynamics and user preferences.
Objective Summaries: By relying on data-driven neural networks, the system can generate summaries that are less influenced by subjective biases, resulting in more objective summarization.
Comprehensive Summaries: The collaboration between CNNs and RNNs enables a comprehensive understanding of the video content. Visual features extracted by CNNs and temporal context provided by RNNs contribute to summaries that encompass both the visual and chronological aspects of the match.
Multi-Language Accessibility: The system's multi-language text summarization feature transcends language barriers, ensuring global accessibility to sports content and enhancing engagement and user satisfaction.
Rich Multi-modal Integration: By processing both audio and visual data, the system enriches summaries with auditory context, creating a more immersive content experience.
Consistent Summaries and Reduced Subjectivity: Neural networks ensure consistent and coherent summaries, eliminating the variability of manually created versions. The objective analysis conducted by neural networks minimizes user subjectivity, thereby enhancing precision in identifying key moments.
Claims:
The scope of the invention is defined by the following claims:
1. The Automated Sports Categorization and Multi-Language Text Summarization system, comprising:
a) A system for automated sports video summarization, comprising a processing unit employing advanced neural network techniques for improved speed and accuracy in summarization, delivering timely summaries to sports enthusiasts and analysts.

b) Convolutional neural networks (CNNs) for extracting visual features and enabling accurate event detection, enhancing the precision of summarization by capturing critical moments.

c) Recurrent neural networks (RNNs) for adapting to various sports and their unique characteristics, allowing for summarizing a wide range of sporting events, and further incorporating a learning mechanism that exposes the system to diverse sports videos, facilitating continuous improvement over time and adaptation to different match dynamics and user preferences.

2. As per claim 1, the system comprising data-driven neural networks generating objective summaries, minimizing subjective biases and resulting in more reliable and objective summarization.

3. As per claim 1, the system comprising a collaborative framework between CNNs and RNNs, producing comprehensive summaries that encompass both visual and chronological aspects of the match, and further comprising a method for automated sports video summarization, employing convolutional neural networks (CNNs) to extract visual features from lengthy sports match videos, utilizing recurrent neural networks (RNNs) to capture temporal dependencies within the video data.

4. As per claim 1, a computer program product stored on a non-transitory computer-readable medium, comprising instructions executed by a processor, the instructions causing the processor to extract visual features from sports match videos using convolutional neural networks (CNNs), and to capture temporal dependencies through recurrent neural networks (RNNs).

Documents

Application Documents

# Name Date
1 202341075641-REQUEST FOR EARLY PUBLICATION(FORM-9) [06-11-2023(online)].pdf 2023-11-06
2 202341075641-FORM-9 [06-11-2023(online)].pdf 2023-11-06
3 202341075641-FORM FOR STARTUP [06-11-2023(online)].pdf 2023-11-06
4 202341075641-FORM FOR SMALL ENTITY(FORM-28) [06-11-2023(online)].pdf 2023-11-06
5 202341075641-FORM 1 [06-11-2023(online)].pdf 2023-11-06
6 202341075641-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [06-11-2023(online)].pdf 2023-11-06
7 202341075641-EDUCATIONAL INSTITUTION(S) [06-11-2023(online)].pdf 2023-11-06
8 202341075641-DRAWINGS [06-11-2023(online)].pdf 2023-11-06
9 202341075641-COMPLETE SPECIFICATION [06-11-2023(online)].pdf 2023-11-06