
Method And System For Providing Context Aware Media Streaming Content

Abstract: A system and a method for providing context aware media streaming content. The method comprises receiving, at a character and object recognition module [102], a set of frames of a media streaming content and analyzing at least one selected frame from the set of frames. Further, identifying, by the character and object recognition module [102], character(s) and/or object(s) based on the analysis. Furthermore, indexing, by a processing unit [104], at least one frame from the at least one selected frame that encompasses the character(s) and/or the object(s), and generating, by the processing unit [104], customized artwork(s) based on the indexing and a set of user attributes. Thereafter, generating the context aware media streaming content based at least on the customized artwork(s) and providing the context aware media streaming content on the user device(s). [Fig. 1]


Patent Information

Application #:
Filing Date: 30 November 2022
Publication Number: 22/2024
Publication Type: INA
Invention Field: ELECTRONICS
Status:
Parent Application:

Applicants

Novi Digital Entertainment Private Limited
Star House, Urmi Estate, 95, Ganpatrao Kadam Marg, Lower Parel (West), Mumbai - 400013, India

Inventors

1. Ramesh V. Panchagnula
Star House, Urmi Estate, 95, Ganpatrao Kadam Marg, Lower Parel (West), Mumbai - 400013, India
2. Tao Xiong
Unit N711, Floor 7, North Building, Raycom Infotech Park Tower C, No.2 Kexuyuan South Road, Haidian District, Beijing, China – 100190
3. Haifang Qin
Unit N711, Floor 7, North Building, Raycom Infotech Park Tower C, No.2 Kexuyuan South Road, Haidian District, Beijing, China – 100190

Specification

FORM 2
THE PATENTS ACT, 1970
(39 OF 1970)
AND
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See section 10 and rule 13)
“METHOD AND SYSTEM FOR PROVIDING CONTEXT AWARE MEDIA STREAMING CONTENT”
I/We, Novi Digital Entertainment Private Limited, an Indian National, of Star House, Urmi Estate, 95, Ganpatrao Kadam Marg, Lower Parel (West), Mumbai ‐ 400013 Maharashtra, India.
The following specification particularly describes the invention and the manner in which it is to be performed.

METHOD AND SYSTEM FOR PROVIDING CONTEXT AWARE MEDIA STREAMING
CONTENT
TECHNICAL FIELD:
The present invention generally relates to streaming of media content and more particularly to methods and systems for providing enhanced context aware media streaming including enhanced content search, content recommendation, target content placement, content artwork customization based on indexing of media content and object and character recognition.
BACKGROUND OF THE DISCLOSURE:
The following description of the related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is used only to enhance the understanding of the reader with respect to the present disclosure, and not as an admission of the prior art.
Streaming media platforms providing media content over the Internet have become increasingly popular in the last few years. Over‐the‐top (OTT) platforms provide video on demand as well as live streaming content to users. There are times when a user who has previously viewed a certain piece of multimedia material needs or wants to access it again at a later time. For instance, a person who previously watched multimedia content may desire to continue watching from a desired point of interest, or to jump directly to a scene of interest or preference (for example, a scene which the user enjoyed watching in the past and wants to watch again, or one that someone recommended to the user, etc.). Typically, media players on the OTT platforms or the user devices provide a feature
whereby a user can forward or rewind the video by a fixed number of seconds, say 10 or 15 seconds, to quickly skip some content (such as ads) or replay some content. Also, the media players on these platforms provide a navigation bar using which the user can forward or rewind to a particular point on the timeline, or start exactly from the point where the user left as the media players on the user devices would save the time point on which the user paused or stopped playing the media content. The existing OTT platforms/ solutions heavily rely on timestamping, i.e., the OTT platform may store the timestamp where the user last left the video so that it can be resumed from that timestamp; similarly, the OTT platform may forward or rewind content by 15 seconds by moving ahead/ behind 15 seconds from the current timestamp.
It is also often the case that users wish to skip certain scenes or watch a particular scene or character in a video/media content, however currently known solutions have limited capability of allowing the users to skip certain scenes as the currently known solutions fail to provide accurate information about scenes/ characters/ objects in the video/media content. If the user wants to skip certain scenes/ watch any particular scene or character, he/she is required to drag the cursor on the navigation/ time progress bar (however, there is no guarantee that the user lands on exactly the point he/she wants to land).
At best in the currently existing systems, there is the feature of including a table of contents providing one or more items, that allows a viewer to navigate to a particular item within the content. This allows a very limited use case by which the user can land at a particular point in time of the media content. These existing systems allow the user to go to a point in time in the video where the next item/scene is displayed. However, the user may not wish to watch the items predefined in the table of contents but may wish to watch or skip a particular character or object in that video. The existing solutions provide no flexibility to this
effect as they are solely based on a predefined table of contents that are hard coded with the video primarily based on the scene boundaries in the content.
Therefore, the existing methods of providing multimedia content via OTT platforms or other means using the internet require viewers to either watch the entire content from beginning to end or to haphazardly advance or rewind through it using the video player’s controls (such as fast forward, rewind, etc.) or to watch the complete scene from the items predefined or marked in the content and provided as a table of contents to the user.
One major drawback of the existing systems is that they are not fully context aware based on characters and objects recognized in the frames; in other words, they are not able to automatically become aware of the context of the content being streamed on the user device. To identify any context, the existing systems rely either on manual intervention, where a human operator has to review the content to understand the context of the content being played, or on automated or semi‐automated processing that requires heavy resource consumption and is still inefficient in terms of time and cost. For providing any other enhancement with respect to the context, the existing solutions rely on manual intervention. The existing solutions are unable to automatically interpret or understand the context of the content being displayed and thus do not provide any enhanced features based on automatic identification of that context.
Since the existing systems are unable to automatically identify the context of the content based on characters and objects recognized in the frames in the media stream, they also do not provide for automatic in‐video feed targeted content insertions (such as advertisements) based on such context, without the need for manual intervention. Further, the existing systems are also unable to provide for
suggesting/recommending content of interest to the users, i.e., content that is based on recognition of certain character(s) or object(s) present in a media stream. Also, targeted content provided in live video feeds is often inserted by manual intervention or semi‐automated systems. Such targeted content is often neither dependent on nor related to the content being displayed during the live video feed.
Similarly, since the existing systems are unable to automatically identify the context of the content in the media stream, they also do not provide for enhanced searching based on context. The existing solutions provide a search feature based only on the titles of the media, top content, etc. Since the existing systems do not have any context of the content in the media stream, they are unable to provide search based on such context, and therefore the search results may not always be accurate using the existing systems.
Furthermore, existing systems are unable to provide character‐based recommendations to the users as the existing systems are unable to automatically identify the context of the content in the media stream. The existing solutions only know those characters of a particular media, such as a movie, that have been listed in the description of that media. However, the existing systems are unable to capture all the characters of the media based on the broad description provided and thus are unable to provide character‐based recommendations to the users.
Also, the existing solutions provide fixed artwork such as previews, thumbnails, teasers, trailers, etc., for each media content. That is to say that for a particular media content, say a movie, the existing solutions provide the same thumbnail, preview, trailer and teaser of the movie to all the users, irrespective of the interests of that particular user. This is again because the existing solutions are unable to identify the context of the content and thus are unable to provide any
customization on the artwork based on such context and user preferences. Additionally, there are existing systems that provide for dynamic or customized artwork to some extent, but a lot of computational processing and related hardware resources are needed to implement the same.
Yet another limitation of the existing solutions is that they are unable to provide detailed and relevant description of the characters and/or objects in a media stream since they are unable to understand the context of the media stream. Similarly, the existing solutions are also unable to provide the user with shoppable e‐commerce linking in the media content due to the lack of automatic context awareness of the media content.
In order to solve the above and other related inherent problems, there is an imperative need for a method and system that can provide enhanced context aware media streaming, including enhanced content search, content recommendation, target content placement, and content artwork customization, based on indexing of media content using object and character recognition.
SUMMARY OF THE DISCLOSURE
This section is provided to introduce certain aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
A first aspect of the present disclosure is related to a method for providing a context aware media streaming content on one or more user devices. The method comprises receiving, at a character and object recognition module, a set of frames of a media streaming content. Further, the method encompasses analyzing, by the character and object recognition module, at least one selected frame from the set
of frames of the media streaming content. Furthermore, the method encompasses identifying, by the character and object recognition module at least one of one or more characters and one or more objects based on the analysis of the at least one selected frame. Thereafter, the method comprises: indexing, by a processing unit, at least one frame from the at least one selected frame that encompasses at least one of the one or more characters and / or the one or more objects; and generating, by the processing unit, one or more customized artworks based on the indexing of the at least one frame from the at least one selected frame and a set of user attributes. Subsequently, the method encompasses generating, by the processing unit, the context aware media streaming content based at least on the one or more customized artworks; and providing, by the processing unit, the context aware media streaming content on the one or more user devices.
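The indexing and artwork-generation steps of the method described above can be sketched in simplified form. The following Python sketch is purely illustrative: the class, function names, and user-attribute keys are hypothetical and are not taken from the specification's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A selected frame annotated by the recognition module (hypothetical)."""
    number: int
    characters: list = field(default_factory=list)  # identified character IDs
    objects: list = field(default_factory=list)     # identified object labels

def index_frames(frames):
    """Index only the frames that encompass at least one character or object."""
    return {f.number: (f.characters, f.objects) for f in frames
            if f.characters or f.objects}

def customized_artwork(index, user_attributes):
    """Pick indexed frames whose characters match this user's attributes."""
    liked = set(user_attributes.get("favourite_characters", []))
    return [n for n, (chars, objs) in sorted(index.items())
            if liked & set(chars)]

frames = [Frame(1, characters=["hero"]),
          Frame(2),                      # nothing recognized: not indexed
          Frame(3, objects=["car"])]
index = index_frames(frames)
picks = customized_artwork(index, {"favourite_characters": ["hero"]})
```

In this sketch, frame 2 is never indexed because no character or object was recognized in it, mirroring the claim that only frames encompassing recognized characters/objects are indexed.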
Another aspect of the present disclosure relates to a system for providing a context aware media streaming content on one or more user devices. The system comprises a character and object recognition module that is configured to: receive, a set of frames of the media streaming content; and analyze, at least one selected frame from the set of frames of the media streaming content. Further, the character and object recognition module is configured to identify at least one of one or more characters and one or more objects based on the analysis of the at least one selected frame. Thereafter, the system comprises a processing unit that is configured to: index, at least one frame from the at least one selected frame that encompasses at least one of the one or more characters and the one or more objects; and generate one or more customized artworks based on the indexing of the at least one frame from the at least one selected frame and a set of user attributes. Subsequently, the processing unit is configured to: generate the context aware media streaming content based at least on the one or more customized artworks; and provide, the context aware media streaming content on the one or more user devices.

BRIEF DESCRIPTION OF DRAWINGS
The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components, electronic components or circuitry commonly used to implement such components.
Figure 1a illustrates an exemplary diagram depicting an exemplary network architecture diagram [100a], in accordance with exemplary embodiments of the present disclosure.
Figure 1 illustrates an exemplary block diagram of a system [100] to provide a context aware media streaming content on one or more user devices, in accordance with exemplary embodiments of the present disclosure.
Figure 2 illustrates an exemplary method flow diagram [200] for providing a context aware media streaming content on one or more user devices, shown in accordance with exemplary embodiments of the present disclosure.
Figure 3 illustrates an exemplary process [300] for identifying the one or more characters in accordance with exemplary embodiments of the present disclosure.
The foregoing shall be more apparent from the following more detailed description of the disclosure.
DETAILED DESCRIPTION OF THE DISCLOSURE
In the following description, for the purposes of explanation, various specific details are set forth to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address any of the problems discussed above or might address only some of the problems discussed above.
The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, processes, and other components may be shown in block diagram form in order not to obscure the embodiments in unnecessary detail.
Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as
a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re‐arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure.
The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive—in a manner similar to the term “comprising” as an open transition word—without precluding any additional or other elements.
To address the problems mentioned in the background section, solutions have been developed where indexing is done in the video content, but it is done on the scene boundaries, for example, to enable a user to skip or watch a particular scene in the video content. Also, some existing systems may implement solutions pertaining to object recognition and some other systems may implement character recognition separately, but do not provide a combined solution that does not have compatibility issues. Such compatibility issues may also arise due to different or separate levels of processing that a content has to go through for object recognition and character recognition. Further, there exists no solution for providing enhanced context aware media streaming including enhanced content search, content recommendation, target content placement, content artwork customization, etc. basis indexing of media content based on object and character
recognition. This indexing based on the characters and/or objects in the live or pre‐recorded video content may enable several applications that are not possible with the currently existing systems, such as accessing metadata related to the indexed characters and/or objects, inserting targeted content related to the recognized characters and/or objects in live streamed videos, generating and providing customized artwork to the users, allowing the user to search for and skip to a particular character or object in the media stream, etc.
The present invention relates to methods and systems for enhanced context aware media streaming basis indexing of media content based on object and character recognition in the media being streamed. The content can be pre‐recorded as well as live streamed. This indexing helps users to navigate through the media stream to select the exact instance at which the user wants to skip and/or play the media. The indexing of content provides various other features to the users, as discussed above, which the currently existing solutions implementing indexing based on scene boundaries in the content are unable to offer. Thus, in the present invention, the indexing of content may not be limited to scene boundaries, but the indexing of the content can be applied to characters and objects recognized in the frames of the media content as well. Thus, the present disclosure relates to providing a method and a system that indexes frames, based on identified characters and objects within the timeline of a video content. So, if the user wants to skip a particular part of the content or wishes to skip or watch a content of a particular character, he/she can easily do so by selecting the desired item/frame from an indexed content.
As used herein, a “processing unit” or “processor” or “operating processor” includes one or more processors, wherein processor refers to any logic circuitry for processing instructions, said processing unit may reside at server side (i.e., the network server of the OTT platform) or client side (i.e., the user device). The
processing unit may be centralized or decentralized (distributed). A processor may be a general‐purpose processor, a special purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits, Field Programmable Gate Array circuits, any other type of integrated circuits, a graphics processing unit etc. Also, the processor or the processing unit may comprise one or more processors for various operations. The processor may perform signal coding data processing, input/output processing, and/or any other functionality that enables the working of the system according to the present disclosure. More specifically, the processor or processing unit is a hardware processor. Furthermore, to execute certain operations, the processing unit/processor as disclosed in the present disclosure may include one or more Central Processing Unit (CPU) and one or more Graphics Processing Unit (GPU), selected based on said certain operations. Furthermore, the graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter a memory to accelerate a creation of images in a frame buffer intended for output to a display device.
As used herein, “storage unit” or “memory unit” refers to a machine or computer‐ readable medium including any mechanism for storing information in a form readable by a computer or similar machine. For example, a computer‐readable medium includes read‐only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices or other types of machine‐accessible storage media. The storage unit can be any type of storage unit such as Cloud or CDN (content delivery network) storage, public, shared, private, telecommunications operator based storage, or any other type of storage known in the art or may be developed in future. The storage unit stores at least the data that may be required by one or more units of the server/system/user device to perform their respective functions.

A ‘smart computing device’ or ‘user device’ refers to any electrical, electronic, or electromechanical equipment or a combination of one or more of the above devices. Smart computing devices may include, but not limit to, a mobile phone, smart phone, pager, laptop, a general purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, smart television, gaming consoles, media streaming devices, or any other computing device as may be obvious to a person skilled in the art. In general, a smart computing device is a digital, user configured, computer networked device that can operate autonomously. A smart computing device is one of the appropriate systems for storing data and other private/sensitive information.
A “smartphone” is one type of “smart computing device” that refers to the mobility wireless cellular connectivity device that allows end users to use services on 2G, 3G, 4G, 5G or other upcoming generations of mobile broadband Internet connections with an advanced mobile operating system which combines features of a personal computer operating system with other features useful for mobile or handheld use. These smartphones can access the Internet, have a touchscreen user interface, can run third‐party apps including capturing images, and are camera phones possessing high‐speed mobile broadband internet with video calling, hotspot functionality, motion sensors, mobile payment mechanisms and enhanced security features with alarm and alert in emergency situations.
The present disclosure proposes a unified and enhanced solution in the form of a system and method for providing enhanced context aware media streaming including enhanced content search, content recommendation, target content placement, content artwork customization basis indexing of media content based on object and character recognition.

The present disclosure is further explained in detail below with reference now to the diagrams so that those skilled in the art can easily carry out the present disclosure.
Referring to Figure 1a, which illustrates an exemplary network architecture diagram [100a], in accordance with exemplary embodiments of the present disclosure. As shown in Figure 1a, the exemplary network architecture diagram [100a] comprises a set of user devices (UD) [101(1)], [101(2)], …, [101(n)] (hereinafter collectively referred to as the user device [101], the set of user devices [101], or the one or more user devices [101] for clarity purposes) in communication with at least one server device [107] and a content delivery network [109], wherein, in an implementation, the server device [107] in combination with the content delivery network [109] further comprises a system [100] configured to implement the features of the present disclosure. In an implementation, the system [100] may be in connection with the server device [107] and the user device [101], in a manner obvious to a person skilled in the art, to implement the features of the present disclosure.
Also, while only a single server device [107] is shown in Figure 1a, there may be multiple such server devices [107], or any number of such server devices [107] as obvious to a person skilled in the art or as required to implement the features of the present disclosure.
In general, the content delivery network (CDN) [109] is a geographically distributed network of servers strategically placed in various locations around the world. CDNs facilitate efficient delivery of content, such as personalized content layout/ or catalogue (whether dynamic or static), web pages, images, videos, and other digital assets, to end‐users, reducing latency and improving load times.

The CDN [109] is further connected to various other CDNs (not shown in figures) present at different geographical locations, creating a hierarchy of interconnected networks. Each CDN in the hierarchy can have its own set of servers (e.g., the server device [107]) located in different regions, forming a distributed network infrastructure. It is to be noted that the CDN may be public CDN, private CDN, Telco CDN or the ISP CDN.
When the user device [101] communicates with the content delivery network [109] after initial authentication or validation of the user's subscription via the user device, the communication request is routed through the CDN network. The CDN [109] then uses intelligent routing algorithms to direct the communication request to the nearest server or the server with the lowest load (e.g., the server device [107]), minimizing the distance and optimizing response times. Once the communication request has been directed, the content catalogue layout is then routed through the CDN [109].
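The routing behaviour described above can be illustrated with a minimal sketch. The selection rule shown here (nearest server first, then lowest load) and all field names are assumptions for illustration only; production CDNs typically perform such request routing through DNS resolution or anycast rather than application-level code.

```python
def route_request(servers):
    """Pick the edge server for a request: nearest first, then lowest load.

    Each server is a dict with hypothetical 'name', 'distance_km' (from
    the requesting user), and 'load' (0.0-1.0 utilization) fields.
    """
    return min(servers, key=lambda s: (s["distance_km"], s["load"]))

servers = [
    {"name": "edge-mumbai", "distance_km": 12, "load": 0.7},
    {"name": "edge-delhi", "distance_km": 1150, "load": 0.2},
]
best = route_request(servers)  # nearest server wins despite higher load
```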
Further, in the implementation where the system [100] is present in the server device [107], based on the implementation of the features of the present disclosure, a context aware media streaming content is generated via the system [100] and then provided on the one or more user devices [101].
Further, referring to figure 1 that illustrates an exemplary block diagram of a system [100] to provide a context aware media streaming content on one or more user devices, in accordance with exemplary embodiments of the present disclosure.
The present invention includes a system [100] comprising at least a character and object recognition module [102]; a processing unit [104]; and a storage unit [114]. The character and object recognition module [102] comprises at least one of a
face detector module [106], a face alignment module [108], a face recognition module [110] and an object recognition module [112]. The system components may be present at the same location or may be distributed at different locations. Also, a component of the system [100] may comprise one or more sub‐components which may be centralized or distributed at various locations and may together be referred to as that particular component. In an exemplary implementation, the invention includes a system [100] comprising, but not limited to, an input/output module, a communication module, and a repository of characters/objects.
The character and object recognition module [102] is configured to receive, a set of frames of a media streaming content. The media streaming content is one of a live video content and a pre‐recorded video content. Further, the character and object recognition module [102] is configured to analyze at least one selected frame from the set of frames of the media streaming content. The at least one selected frame from the set of frames is selected based on at least one of a logical selection and a random selection. Furthermore, the character and object recognition module [102] is configured to identify at least one of one or more characters and one or more objects based on the analysis of the at least one selected frame. Particularly, to identify the one or more characters, the face detector module [106] of the character and object recognition module [102] is configured to localize one or more faces in the set of frames of the media streaming content. Thereafter, the facial alignment module [108] of the character and object recognition module [102], is configured to align the one or more localized faces to normalized canonical coordinates. Next, the face recognition module [110] of the character and object recognition module [102] is configured to implement a facial recognition method on the one or more aligned localized faces. Also, the face recognition module [110] is then configured to identify the
one or more characters based on the implementation of the facial recognition method on the one or more aligned localized faces.
Therefore, according to the implementation of the features of the present disclosure, the character and object recognition module [102] is configured to identify faces (i.e., characters from their faces or from their overall appearance), and objects, if any, for selected frames in a streaming video content (i.e., the media streaming content). The present disclosure encompasses logically and/or randomly selecting some frames in a video content (i.e., the media streaming content). For example, while logically selecting the frames, the character and object recognition module [102] may analyse a background and/or a foreground of one or more consecutive frames in a content (i.e., media streaming content), may be through a quick screening of frames, and pick the frames when there is a drastic change in the background and/or the foreground of the frames or the change is above a threshold level as specified in the system [100]. A person skilled in the art would appreciate that the above example is provided for understanding purposes only and does not restrict the invention in any possible manner, and accordingly, any other logic is also possible for selecting one or more frames from the set of frames. In an implementation, each frame on which a process of object and character recognition is performed, or a frame on which indexing is done, should be understood as a selected frame throughout this disclosure. Also, character and object recognition may not be done one by one for a same content, that is for example, by first identifying characters in each frame of the content and then identifying objects in said each frame of the content, but may be done simultaneously where both the objects as well as characters are identified at once in a frame of the content.
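The logical frame selection described above, picking frames where the background/foreground change exceeds a threshold, can be sketched as follows. This is a simplified illustration under assumptions: real systems would compare decoded image arrays (e.g., per-pixel luminance), and the function names and threshold value are hypothetical.

```python
def frame_difference(a, b):
    """Mean absolute pixel-intensity difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_frames(frames, threshold=50):
    """Select frames where the scene changes drastically.

    frames: list of flat pixel-intensity lists. The first frame is
    always selected; a later frame is selected only when its difference
    from the previously selected frame exceeds the threshold.
    """
    selected = [0]
    for i in range(1, len(frames)):
        if frame_difference(frames[selected[-1]], frames[i]) > threshold:
            selected.append(i)
    return selected

frames = [[10] * 4, [12] * 4, [200] * 4]   # tiny synthetic "frames"
picked = select_frames(frames)             # frame 1 is too similar to frame 0
```

Frames skipped here would simply not be passed on to the recognition and indexing stages, which keeps the per-frame processing cost bounded.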
Therefore, in accordance with the above disclosure, the streaming video content may be a live video content or a pre‐recorded video content. The character and

object recognition module [102] may further comprise the face detector module (face anti-spoofing) [106], the facial alignment module [108], and the face recognition module [110]. In an exemplary implementation, to implement the features of the present disclosure, firstly, the face detector module [106] receives the streaming video content and localizes faces in each frame of the streaming video content. The received streaming video content may be retrieved from the storage unit [114] of the system [100] or from an external server. Secondly, the facial alignment module [108] aligns the localized/identified faces to normalized canonical coordinates. Thirdly, the face recognition module [110] implements a facial recognition method on the aligned faces to identify one or more characters in the streaming video content. In an implementation, a character number is assigned to each of the identified one or more characters and stored in the storage unit [114] for easy extraction of a data related to the identified one or more characters to implement the features of the present disclosure. The face recognition module [110] may implement face matching steps based on at least one of a threshold matching, neural network identification, metric learning, etc. In an implementation, the face recognition module [110] may include a feature extraction module, and a loss function calculator.
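The threshold-matching step mentioned above can be sketched as follows. This is a hedged sketch only: face detection and alignment are assumed to have already produced a fixed-length embedding vector per face, and the gallery of reference embeddings, the character numbers, and the 0.8 threshold are all hypothetical illustrations rather than the actual recognition method.

```python
# Hedged sketch of threshold matching against a gallery of known characters.
# Embeddings are assumed to come from upstream detection/alignment; the
# gallery, character numbers, and threshold are hypothetical.
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def identify_character(embedding, gallery, threshold=0.8):
    """Return the best-matching character number, or None if below threshold."""
    best_id, best_score = None, threshold
    for character_id, reference in gallery.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_id, best_score = character_id, score
    return best_id
```

A production system would more likely use a learned metric or a neural-network classifier, as the disclosure notes; the sketch only illustrates the simplest threshold-matching variant.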
Also, in an implementation, the character and object recognition module [102] is a pre‐trained system that is trained on a training dataset, wherein the training dataset comprises a data associated with at least one of a plurality of images of characters and a plurality of objects.
Also, in an implementation the training dataset may include images of characters retrieved from popular search engines and specific databases of characters. The dataset forces the pre-trained system to address a wide range of intraclass variations, such as lighting, age, and pose, which have a limited number of subjects but many images for each subject. In an exemplary implementation, the datasets

used for training may include 3 to 2469 images of each character/ subject for the purposes of training. The training data/training dataset may include metadata, social media profiles of characters, predefined cast of streaming video content, predefined objects database, standard object shapes, geometry, object labels such as brands, logos, titles, etc.
In another implementation, the character and object detection module [102] of the system, via the object recognition module [112], is also configured to detect and recognize the one or more objects, including but not limited to one or more costumes that some characters are wearing, or one or more places that are shown in one or more scenes or clips in a video content, or certain other object(s) in a frame of a content, for example phones, computers, cars, food, refrigerators, water bottles, etc. These are identified and/or detected and/or recognized based on various techniques. For example, a comparison of the one or more objects to a dataset as defined above may help in identifying and tagging the one or more objects. Also, the above examples of objects and/or characters are provided for understanding purposes only and do not restrict or limit the present disclosure in any possible manner. Further, the character and object recognition module [102] implements various techniques for object detection purposes, such as Region-Based Convolutional Neural Networks (R-CNN) models. Also, other techniques, either known or that may be developed in future, related to image object detection or video object detection may be implemented for this purpose. Also, a combination of known techniques may also be developed for implementation of the object detection purpose. For example, ‘optical flow’ is a tool that calculates and traces a movement of all items present in an image. It takes all pixels in an image of a video frame and analyzes their trajectory throughout the video frame. So, a combination of an ‘image detection’ technique with an ‘object tracking’ technique may be implemented for a fast and accurate object detection in a streaming video.
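The combination of an image-detection technique with an object-tracking technique outlined above can be sketched as follows. This is an illustrative sketch only: the detector is a hypothetical stand-in (a real system might use an R-CNN model), and the "tracking" is simplified to carrying forward the most recent detections for the intermediate frames, rather than actual optical-flow trajectory analysis.

```python
# Illustrative sketch: run a heavy detector only every N frames and propagate
# labels in between, approximating the detection-plus-tracking combination.
# detect_objects is a hypothetical stand-in for a real detection model.

def detect_objects(frame):
    """Hypothetical per-frame detector; returns labels found in the frame."""
    return frame.get("objects", [])

def detect_with_tracking(frames, detect_every=3):
    """Run the detector every N frames; carry forward labels otherwise."""
    results, last = [], []
    for i, frame in enumerate(frames):
        if i % detect_every == 0:
            last = detect_objects(frame)
        results.append(list(last))
    return results
```

The design point is the one the disclosure makes: full detection per frame is slow, so cheap tracking between periodic detections keeps object labels available for every frame of a streaming video.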

Further, these techniques of character and object recognition are applied by the system [100] before transmitting a content data for distribution on one or more networks for a set of viewers. Thus, the character and object recognition may be done in run‐time, i.e., when the video streaming content is being streamed to a user’s device. Further, the present disclosure encompasses that the character and object recognition is done by the character and object recognition module [102] in near real‐time for live streaming video content and may be pre‐performed for video on demand content.
Further, once the character(s) and the object(s) in one or more scenes/ shots of the media streaming content have been recognized by the character and object recognition module [102], the processing unit [104] of the system [100] is configured to index, at least one frame from the at least one selected frame that encompasses at least one of the one or more characters and the one or more objects. In an implementation, all relevant frames of the streaming video content with the recognized character(s) and/or the recognized object(s) in each such frame are indexed. The present disclosure encompasses that the indexing is performed against a playtime length on a progress bar of the media streaming content, and the indexing is displayed on the progress bar by at least one of one or more markings, one or more colour codes, one or more character names, one or more emojis, and one or more actual frames in the media streaming content. Further, the processing unit [104] is configured to index, the at least one frame from the at least one selected frame by inserting in the at least one frame from the at least one selected frame a set of content to be provided on the one or more user devices.
For instance, in an implementation, indexing of one or more videos may leverage an insertion of a targeted content such as promotion campaigns, advertisements,

etc. in a video content of the one or more videos. In general, the targeted content is considered intrusive to users and the users are not interested in watching that targeted content or clicking on the targeted content to visit web pages linked with that targeted content to utilise the same. The solution to this problem of showing intrusive targeted content to the users, that the present invention encompasses, is by providing in-feed targeted content that is not intrusive or is less intrusive. Also, in an implementation, the users may be given a choice for viewing targeted content where the users can select whether the targeted content should be shown to them or not.
In another implementation, to index the at least one frame from the at least one selected frame, the present disclosure also encompasses insertion of one or more content hyperlinks in the at least one frame that are relevant to user interests and customised based on user behaviour, past history of user(s), or any other user attributes. Also, in an implementation, the system [100] is configured to learn the user characteristics/behaviour/pattern by implementing a machine learning model. This machine learning model may be dynamically trained and updated periodically.
In already existing systems, this can be done through manual intervention wherein an operator first scans a whole content of a media stream and then marks one or more pointers for embedding in‐feed targeted content in the media stream, or in a semi‐automated manner where a system identifies the one or more pointers/spots for the operator to add/mark the targeted content. However, in the present invention, this manual intervention of the operator is done away with as the system [100] can itself automatically detect object(s) or character(s) in a frame of a streaming content and suggests content insertion opportunities in the frame. In an exemplary implementation, for this purpose of content insertion the system [100] communicates with at least one of a campaign server and an

advertisement server and fetches relevant content to be inserted based on the detected object(s) or character(s). In another exemplary embodiment, the invention encompasses manual indexing of character(s) and object(s) by a user watching a content. This indexing done by the user watching the content may further be verified at an operator’s end either manually or in an automated or semi‐automated manner and then made part of the content or rejected as needed, based on the verification results.
For this purpose of indexing the at least one frame from the at least one selected frame, in an implementation, a target content recommendation server may be connected to the processing unit [104]. This target content recommendation server comprises all contents that can be recommended to a user during streaming of a media streaming content on a user device of the user. Whenever the system [100] detects a content recommendation opportunity that is of relevance with respect to a user, it is directed to the target content recommendation server to search for any recommended content related to an identified object or an identified character and/or customized to a user interest or a user preference, that can be displayed on a screen or a user interface of a user device of the user at a particular instance of a media streaming content being displayed.
After indexing and markings are done, a table, i.e., a tabular array comprising, say for example, multiple rows and columns may be maintained in a file. Say, the tabular array has all timestamps of the media streaming content where there is a character and/or object available in the media streaming content and a list of corresponding characters and/ or objects detected in the media streaming content. It will be appreciated by those skilled in the art that an indexed content may be stored in any other format and the present disclosure is not limited to storing the indexed content in a single tabular array as mentioned above.
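The tabular array described above can be sketched as follows. This is a minimal illustrative sketch: each row maps a playtime timestamp (here in seconds) to the characters and/or objects detected at that point, and a lookup returns what is on screen at a given playtime. The timestamps and entity names are hypothetical, and, as the disclosure notes, the indexed content could equally be stored in other formats.

```python
# Minimal sketch of the timestamp-to-entities index table. Rows and entity
# names are illustrative only.

def build_index(detections):
    """detections: list of (timestamp, [entities]) rows; sorted by timestamp."""
    return sorted(detections)

def entities_at(index, playtime):
    """Return the entities for the most recent indexed timestamp <= playtime."""
    current = []
    for timestamp, entities in index:
        if timestamp <= playtime:
            current = entities
        else:
            break
    return current
```

Such a lookup is what lets the system place markings, colour codes, or character names against the playtime length on the progress bar.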

Further, the processing unit [104] is configured to generate one or more customized artworks based on the indexing of the at least one frame from the at least one selected frame and a set of user attributes. The one or more customized artworks comprise at least one of one or more previews, one or more thumbnails, one or more teasers, one or more trailers, one or more images, and one or more short videos, related to at least one of the one or more characters and the one or more objects. In an implementation in the present disclosure, the processing unit [104] may refer to a table of contents as well as one or more user attributes that are already learned by the system [100] through at least one of a machine learning model, the user's past search history, the user's watch history, and such other user interactions as would be apparent to a person skilled in the art, and may decide whether any item in the table, that is, any object or character, is of relevance to the user attribute(s). For example, if a user is interested in a cartoon movie, then the user in this case might be interested in graphics, the user in another case might be interested in toys, etc. The system [100], after referring to both a timeline or the indexed content and the user attribute(s), generates the one or more customized artworks and provides such customized artwork(s) to the user. Moreover, in an implementation, the present invention encompasses generating the one or more customized artworks for user(s) based on one or more user preferences of the user(s). The customized artwork(s) include at least one of: one or more previews, one or more thumbnails, one or more teasers, one or more trailers, one or more images, one or more short videos (such as a *.gif file, etc.), etc. for a data recommendation related to character(s) or object(s) in the streaming media content for the user(s). For example, a user A is interested in watching comedy content and a user B is interested in watching a content related to an Artist X.
Now say, in a same video content C, the artist X performs a comedy act. Thus, the video content C may be of interest to the user A as well as the user B. However, the system [100] in this implementation, may generate different artworks for the user

A and the user B. For example, a custom artwork shown to the user A may comprise an image or short video, etc. which suggests that the video content C is a comedy content. And, a custom artwork shown to the user B may comprise an image or short video, etc. which suggests that the video content C includes work of the Artist X. Thus, in this implementation, the system [100] first searches, for a particular user, whether a video content contains anything that may be of relevance or interest for the particular user, and based on user preferences of said particular user, the system [100] also generates customized artwork for the particular user, further enhancing user experience. This is possible because the video content is indexed and the system [100] is context aware on the basis of the indexes in the video content, and also the system [100] considers user attributes to generate and provide customized artwork(s) to user(s). So continuing with the above example, say, a video content Z has a short segment featuring the Artist X and the system [100] identifies that the Artist X is relevant to the user B and also identifies a content C in a target content recommendation server of the system [100] which features the Artist X, then the system [100] automatically generates a customized artwork of the content C for the user B where the customized artwork suggests that the content C features the Artist X, and the system [100] provides the customized artwork with the video content Z that is being displayed to the user B on the user device. Also, in an implementation, following the above example, if more than one content similar to the content C for the user B is identified, say the content C, content D, content E and content F are identified as relevant to the user B as all the contents C, D, E and F feature the Artist X, then all the contents C, D, E, and F may be displayed in one or more customized artworks created for the user B.
In one instance, contents C, D, E, and F may be displayed to the user B (where the customized artwork may comprise short clips of videos, images, etc. for each content C, D, E, and F, being displayed ) on the user device, or may be scrollable by the user B (where multiple artworks (i.e., customized artworks) are created for multiple contents and the artworks are displayed on a user device of the user B

and the user B can scroll through the same) on the user device to access the same. Therefore, the user B may directly watch/play the scenes associated with the Artist X through a selection of the contents C, D, E, and F displayed to the user B without watching/playing the entire content. Also, this display of content C, D, E, and F, may be made to the user B using a playlist file such as M3U8 file, etc. Also, in an implementation, a standard artwork is taken by the system [100] from other digital platforms, and checked whether the standard artwork is customizable for a user or not. In case the standard artwork is customizable, the standard artwork is customised by the system [100], and the customised artwork is shown to the user. Alternatively, if the standard artwork is not customizable according to the user, then the standard artwork is shown to the user.
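The per-user artwork customization in the user A / user B example above can be sketched as follows. This is a hedged illustration only: the content tags, preference lists, and artwork identifiers are all hypothetical, standing in for whatever indexed metadata and learned user attributes the system actually holds.

```python
# Sketch of per-user artwork selection: the same content yields different
# artworks depending on the user's preferences. Tags and preferences are
# hypothetical examples (content C features comedy and Artist X).

CONTENT_TAGS = {"C": {"comedy", "Artist X"}}

def pick_artwork(content_id, user_preferences):
    """Choose an artwork theme matching the user's highest-priority interest."""
    tags = CONTENT_TAGS.get(content_id, set())
    for preference in user_preferences:  # preferences listed by priority
        if preference in tags:
            return f"artwork:{content_id}:{preference}"
    return f"artwork:{content_id}:default"
```

Here user A (comedy) and user B (Artist X) receive different artworks for the same content C, while a user with no matching interest falls back to the standard artwork, mirroring the fallback behaviour described above.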
Furthermore, the processing unit [104] is configured to generate the context aware media streaming content based at least on the one or more customized artworks. In an implementation, to generate the context aware media streaming content, the processing unit [104] is further configured to modify the media streaming content based on one of a manifest file associated with the one or more customized artworks and a hardcode of the one or more customized artworks. As used herein, hardcoding the one or more customized artworks implies embedding the one or more customized artworks directly into the media streaming content. Thereafter, the processing unit [104] is configured to provide the context aware media streaming content on the one or more user devices.
For instance, in an implementation, to provide a context aware media streaming content on a user device, a media streaming content modified based on a customised artwork or a link for the customised artwork may be sent to the user device using a manifest file. In an embodiment, this customised artwork is generated prior to encoding the video content for delivery to the user device. For instance, if the user has requested for an on‐demand video like a movie that has

already been indexed and the system [100] also knows the user's preferences, in such a scenario, the customised artwork is generated prior to encoding the video for transmission to the user's device. In another embodiment, the customised artwork is generated after the encoding of the video content for transmission to the user device. For instance, if the user has requested for a live content, the indexing for such content is done in real time, so customised artwork may not be generated prior to encoding. In another instance, if a new user has requested for an on-demand video, the user preferences are not known to the system [100], in such a scenario also a customised artwork may not be generated prior to encoding a video for transmission but only after the user has started watching the video and has provided some inputs to the system [100] that assist in deriving such user's preferences. In another embodiment, the system [100] may generate customized artwork for a user based on the basic demographics of the user and a pool of similar users (that is, by considering the preferences of other users with the same or similar demographics) or any such correlation between the user and other users for which the system already has the data available.
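The two delivery options discussed above, hardcoding the customized artwork into the content versus referencing it from a manifest, can be sketched as follows. This is a simplified illustration only: the "manifest" here is a plain dictionary, not an actual HLS or DASH manifest, and the payload shapes are hypothetical.

```python
# Hedged sketch contrasting the two delivery modes: embed (hardcode) the
# customized artwork into the content payload, or ship the content untouched
# alongside a manifest-like structure the player resolves. The structures
# are simplified stand-ins, not real streaming manifests.

def deliver(content, artwork_url, mode="manifest"):
    if mode == "hardcode":
        # embed the artwork reference directly into the content payload
        return {**content, "embedded_artwork": artwork_url}
    # otherwise, ship the content untouched plus a manifest pointing at it
    return {"content": content, "manifest": {"artwork": [artwork_url]}}
```

The manifest route keeps the encoded content unchanged, which is why it suits per-user customization after encoding; hardcoding bakes one artwork in for all recipients.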
The processing unit [104] is further configured to generate a customized targeted content based on the context aware media streaming content. Thereafter, the processing unit [104] is configured to provide, on the one or more user devices, the customized targeted content based on a historical user pattern associated with the one or more user devices. In an implementation, the present invention also encompasses generating and providing the customized targeted content based on the indexing of the video content (i.e., the media streaming content). This customized targeted content may be shown to the viewer based on the past behavior of the viewer. For instance, say a video has been indexed with objects; for example, at a particular time point in the video, the system [100] knows through indexing that characters are talking on a phone, and/or discussing a mobile phone that is being displayed in the video streamed on a user device. In this

example, a pop-up window in parallel occupying a separate area of a screen of the user device may be displayed wherein a targeted content of said phone may be shown. Alternatively, a link for such targeted content via a button or thumbnail or any other form accessible to the viewer may be shown on the screen or user interface of the user device. This will not be intrusive to the user as the content is already being played on the screen, and also as the targeted content is related to the video being played. For example, consider a communication happening between two persons in a video content regarding a product, the video content being, say, watched by three users at a time, User A, User B and User C on their respective user devices. And the system [100] finds that the product is relevant to the User C and not relevant to the User A and the User B. Now, the system [100] will insert the targeted content only for the User C and not for the User A and the User B. Further, the targeted content inserted for the User C will be non-intrusive for the User C in the sense that it may be just provided, say for example, as a hyperlink in the form of a watermark at a corner of the screen, or say, if the product is shown in the frame, then the product may be clickable by the User C on the user interface on which the User C is watching the video content, or may be provided in a separate tray accessible to the User C on the user device, or in any other such non-intrusive manner. Also, the targeted content shown may be displayed on the whole screen or a part of the screen, and the video content being displayed on the screen may be paused or stopped or may be played in parallel as per the viewer's desire. Alternatively, an ad content such as a brand name (for in-feed targeted content) may be embedded with a streaming video content, such as on an object that is in a frame of the streaming video content.
For example, a water filter that is in a content being displayed on a user screen may be displayed with a brand name embedded on that water filter. Also, in an implementation, the brand name may be hyperlinked to a respective/relevant web page, and a mark or any other identifying factor may be presented to the user to access the web page, etc.

For instance, the hyperlink could direct the user to an ecommerce website page for buying the product for which the targeted content has been inserted.
In an alternate embodiment, a content may be paused and details or any additional data related to a targeted content or character(s) or object(s) in a paused frame of the content, is displayed to the user while the video is paused. Further, the system [100] is aware of a context in the media streaming content being displayed to the one or more user devices. For example, there is some context relating to soft beverages in a content being streamed to a user on a user device. Here, the system [100] identifies an opportunity to insert a targeted content of a soft beverage to the user. Further, say in this example, the system [100] has multiple options of displaying various soft beverages on the user device. Thus, the insertion of the targeted content of the soft beverage in this example may be based on a pre-poll conducted by the system [100] on the user device, or other user patterns from a user data, from which the system [100] is aware of what option from the multiple options of soft beverages will be most relevant to the user. Similarly, in the same example, when the system [100] identifies an opportunity to show a web-page related to a content of the soft beverages, the system [100] may identify that the user may be interested in a short video content on how beverages are made, or may be interested in reading some content on the internet to learn about the chemical composition of soft beverages, etc. The system [100] thus shows the user the relevant link for the same. Thus, the present invention not only enables the system to identify ad opportunities based on the indexed content, i.e., identify where ads can be shown to the user, but also provides customized targeted content to the user based on the user's preferences.
For VOD (video on demand), the system [100] may already have a content available beforehand and therefore the indexing as per the present invention can

be done and presented to the user(s) after processing and insertion of the desired targeted content and/or data relevant to the user(s) at required instances of the media streaming content. For example, there is a pre‐recorded content available at a server in connection with the system [100]. Also, in an implementation the system [100] has a repository comprising the one or more user attributes and other data related to user(s). Further, since a video content data may be indexed beforehand in this case, a data presented to a user may be hardcoded with one or more relevant insertions related to object and character recognition in the video content, i.e., an indexing of a content and user attribute(s). In an implementation, the content available beforehand may not be hardcoded for user(s) and a separate manifest file, for example, a child manifest, may be inserted for recommendations to the user(s). In this implementation, an application running in a user device of the user(s) may pull the manifest file and show relevant content/recommendations to the user(s) basis opportunities detected by the system [100] in the video content being streamed on the user device.
In an implementation, the pre‐recorded content is available and encoded by the system [100] following the steps of shortlisting of frames, indexing of content in the shortlisted frames as explained above, which also comprises video compression as per a video codec used. This encoded content is now available in all resolutions as desired by an operator who needs to transmit a content to a user. Thus, a media content player on a user device may be able to select, based on various parameters, as to which resolution to play on the user device. In an implementation, while indexing is being done in the media streaming content, say the system [100] identifies some character or object in one of the frames of the media streaming content, it immediately contacts a target content recommendation server to find some relevant content that can be connected with the media streaming content at that point of time in the media streaming content. For this purpose, a content title data or any such other content identification data,

and a short clip which may comprise a small number of frames, say, 1‐2 frames where the system [100] has identified the object(s) or character(s), are sent to a content server. Further the target content recommendation server is configured to compare the objects and/or characters with one or more existing objects in the content server. In an implementation, a target content recommendation server may be contacted for displaying targeted content. Further, the targeted content is given to the content delivery network (CDN) [109] for displaying to user(s). In an implementation, a SCTE marker (such as standard markers SCTE 35, SCTE 104, etc. as developed by Society of Cable and Telecommunications Engineers as generally known in the art) may be made for suggesting or informing the system [100] about an opportunity to contact target content recommendation server, or to let the system [100] play the media streaming content as it is without displaying a relevant content from the target content recommendation server to the user(s). Thus, while indexing of the media streaming content, the system [100] keeps things ready in the background and when the media streaming content is being played on the one or more user devices the system [100] may connect to the target content recommendation server via a URL, etc. at the time when the media streaming content needs to be played for the user(s). This may be referred to as dynamic stitching in the pre‐recorded content or VOD (video on demand) content.
In an implementation, the users may be divided into various cohorts and a data content may be same or similarly indexed for a cohort of users, i.e., the same or similar recommendations may be provided to all users in a single cohort, wherein the cohort of users is defined by the users having same or similar characteristics/behavior/pattern. In another implementation, one or more recommendations made in a video content may be personalized for each user based on one or more user characteristics/behavior/pattern and the users may also not be divided into multiple cohorts.
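The cohort-based variant described above can be sketched as follows. This is a minimal illustration only: the attribute keys used to define a cohort (an age band and a preferred genre) are hypothetical examples of the "same or similar characteristics/behavior/pattern" the disclosure mentions.

```python
# Minimal sketch of dividing users into cohorts by shared attributes so that
# one set of recommendations can serve a whole cohort. The attribute keys
# (age band, preferred genre) are hypothetical examples.

def cohort_key(user):
    """Users with the same age band and genre fall into the same cohort."""
    return (user.get("age_band"), user.get("genre"))

def build_cohorts(users):
    """Group user ids under their cohort key."""
    cohorts = {}
    for user in users:
        cohorts.setdefault(cohort_key(user), []).append(user["id"])
    return cohorts
```

Recommendations computed once per cohort key can then be reused for every user id in that cohort, while the fully personalized variant simply treats each user as its own cohort.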

In the live streaming of a data, for example, in case of a live cricket match, one or more small segments of a content get ingested into a platform and/or to another processing unit [104] which processes the content. For example, a live cricket match is captured by a camera set up and fed to a processing unit [104] via a communication means for distribution to the user(s). The processing unit [104] may be configured to insert a relevant content for the user(s) in the media content of the live cricket match received from the camera set up via the communication means. So, in case of the live content being streamed, content in near‐real time has to be shown to the user(s).
Thus, in an implementation, the customization of the video data content may not be performed in a single system where encoding of content is followed by indexing of the content as explained above for the pre-recorded content. Instead, in this implementation, the content is directly fed to the encoder for encoding the content to ingest the encoded content (in various resolutions) in the content delivery network (CDN) [109]. And, separately, the same content may be provided to a separate system for object and/or character identification and indexing of media content for providing enhanced context aware media streaming on the basis of the same. This is done to overcome the issues related to delays which may be incurred due to object and/or character recognition and indexing of the content. Now, for example, the indexing in the separate system identifies some points of objects and/or characters and indexes them so that relevant data can be fetched from the target content recommendation server (that is populated for fetching such data) and displayed at the points of indexing. The information regarding the data to be fetched from the target content recommendation server, to be displayed to the user at the time point suggested by the separate system and played at the user device when needed, is saved in the form of a separate file, that is, a virtual manifest file or an updated/revised manifest file in the system, that can be interpreted by the processors to display the required content. The separate file may also contain

uniform resource locators (URLs) from the target content recommendation server. Based on this updated manifest, the CDN [109] connected to the target content recommendation server fetches relevant content to be displayed to the user. Also, the target content recommendation server and the CDN [109] may or may not be placed together at the same location. In this way, the encoded stream that was directly sent to the CDN [109] and the one that is processed by the separate system to index and insert relevant content for the user can be combined and sent by the CDN to be played in synchronization at the user device. For the purpose of combining this virtual manifest file with the original manifest file, the virtual manifest file may interact with the original manifest file for displaying content recommendation to the user. For this purpose, for example, a periodic polling mechanism may be adopted where the original manifest file polls this updated manifest file to be sent. Also, the content available at a specific URL may also be configurable by the system [100] or the operators as needed.
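The combining step described above, merging the original manifest with the separately produced virtual manifest, can be sketched as follows. This is an illustrative sketch only: both "manifests" are simplified Python structures rather than real HLS/DASH documents, and the segment and recommendation URLs are hypothetical. On each poll, recommendation entries from the virtual manifest are attached to the matching points in the segment timeline.

```python
# Sketch of merging the original manifest's segment timeline with the
# virtual manifest's recommendation entries by timestamp. Both shapes are
# simplified stand-ins for real streaming manifests; URLs are hypothetical.

def merge_manifests(original_segments, virtual_entries):
    """original_segments: [(timestamp, segment_url)];
    virtual_entries: {timestamp: recommendation_url}."""
    merged = []
    for timestamp, segment_url in original_segments:
        merged.append({
            "time": timestamp,
            "segment": segment_url,
            "recommendation": virtual_entries.get(timestamp),
        })
    return merged
```

Segments without an indexed opportunity simply carry no recommendation, so the stream plays as-is at those points, matching the behaviour described above.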
In an implementation, the content recommendation data or the targeted content data is stored in the form of web links, etc. and presented to the user via content delivery networks [109], or may be stored in a storage unit [114] and displayed when required by the system [100].
Further, in an implementation, when the user requests to play a content at the user device, a server of an operator receives the user request. This server of the operator may comprise a group of servers such as a content management system (CMS), a user management system (UMS), an H‐play system, and a Z‐play system. These systems or servers may communicate with the content delivery networks (CDNs) [109] to provide URL addresses of the content to be played. This server(s) of the operator is also configured to authenticate the request of the user, where it checks whether the user is genuine or fake. Once the authentication is done, a short token is issued and delivered to the user device in case the user is found genuine. The short token comprises the URL. Further, the user device finds the URL in that short token and reaches the CDN [109]. Further, in an implementation, at the CDN [109], a basic validation may occur. Pursuant to this validation, the CDN [109] may issue and send a long token to the user device. The long token may comprise a manifest file that contains a plurality of URLs of all resolutions of all segments of the content that needs to be played at the user device. Whenever there is a requirement of an updated manifest, the user device may connect again with the CDN [109] for the updated manifest file.
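By way of a non-limiting illustration, the token exchange described above may be sketched as follows. All names, URLs and data structures here are hypothetical assumptions for explanation only, not the operator's actual protocol:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShortToken:
    cdn_url: str  # URL the user device uses to reach the CDN [109]

@dataclass
class LongToken:
    manifest: dict  # (segment index, resolution) -> segment URL

def authenticate(user_id: str, known_users: set) -> Optional[ShortToken]:
    """Operator-side authentication: a short token is issued only when
    the user is found genuine; a fake user receives no token."""
    if user_id not in known_users:
        return None
    return ShortToken(cdn_url="https://cdn.example.com/content/123")

def cdn_validate(token: ShortToken) -> LongToken:
    """CDN-side basic validation, after which a long token carrying the
    manifest of segment URLs for all resolutions is issued."""
    manifest = {(seg, res): f"{token.cdn_url}/{res}/seg{seg}.ts"
                for seg in range(3) for res in ("480p", "1080p")}
    return LongToken(manifest=manifest)

known = {"alice"}
short = authenticate("alice", known)
long_tok = cdn_validate(short)
print(long_tok.manifest[(0, "1080p")])
# → https://cdn.example.com/content/123/1080p/seg0.ts
```

In this sketch, the device would re-invoke `cdn_validate` whenever an updated manifest is required, mirroring the periodic re-connection to the CDN described above.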
Additionally, the processing unit [104] of the system [100] is also configured to provide at least one of a searching capability, a buying capability, an object related data accessing capability, and a character related data accessing capability on the one or more user devices based on the context aware media streaming content. For instance, in an implementation, the present disclosure also encompasses improving the searching capabilities of platforms by means of the indexing of scenes, objects and characters. For instance, a user may provide an image, say, of a character, to search content; since a similar image is present in an index of, say, content X, the user may be directed to a video of the content X as one of the options, and the system [100] may also search for other data related to that character and present it to the user. Also, in another instance, the search string/image provided by the user may be mapped to the interests of the user; the past browsing history and all the interactions of the user may be stored in a database or memory unit, and from that data of the user, the output of the system may be optimized as per the user interests. Additionally, a user can form search strategies using keywords and Boolean or proximity operators or semantic search, and the search for these keywords or sentences is done in an indexed metadata so that an appropriate content list can be presented to the user.
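As a minimal, non-limiting sketch of Boolean keyword search over such indexed metadata, the following assumes a simple index layout mapping content identifiers to indexed keywords (characters, objects, scenes); all names are illustrative:

```python
from collections import defaultdict

def build_inverted_index(metadata):
    """metadata: content_id -> list of indexed keywords
    (characters, objects, scenes) produced by the indexing step."""
    index = defaultdict(set)
    for content_id, keywords in metadata.items():
        for kw in keywords:
            index[kw.lower()].add(content_id)
    return index

def search(index, query, op="AND"):
    """Boolean keyword search: intersect (AND) or union (OR) the sets
    of contents whose indexed metadata contains each query term."""
    sets = [index.get(term.lower(), set()) for term in query.split()]
    if not sets:
        return set()
    result = sets[0]
    for s in sets[1:]:
        result = result & s if op == "AND" else result | s
    return result

idx = build_inverted_index({
    "content_X": ["hero", "car", "beach"],
    "content_Y": ["hero", "castle"],
})
print(sorted(search(idx, "hero car")))  # AND: contents matching both terms
```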

In an implementation, the invention further encompasses providing a capability to directly purchase an indexed object or items relating to one or more indexed scenes/characters in a video content. For example, when a character is wearing a particular apparel, that article or a similar article available at an ecommerce platform can be identified, and the user may be given an option via the system [100] to directly purchase said item. Also, here the indexing may help in identifying scenes being displayed to the viewer that contain an object such as a tourist attraction, etc. The invention enables identifying one or more scenes or clips within a content in more detail with respect to context, storyline, genre, cast, apparel/costumes, tourist attractions, etc. for identifying targeted content opportunities, character‐based content recommendation, customized thumbnails, content previews with a culled‐out character of interest for users, enhanced character or image based content search, etc. based on the same.
In an example, the invention may provide an additional data tray on the user device to provide details related to cast/objects in the content based on the same. This data tray may be accessed by a viewer on his/her device by clicking or dragging on a user interface or by any other means. Alternatively, in an implementation, the data tray may be available in a dedicated part of the user interface. Also, once the user accesses the data tray on his/her device, the data tray may occupy the whole or a part of the user interface or screen of the user device. Thus, if the data tray is shown partially on the user interface, the user may be able to view the video content on one part of the user interface and access the data tray in parallel. Further, this additional data tray may comprise URL links for the viewer to access the relevant data or the data of interest to the user. The viewer may click to access the additional data tray to access one or more desired links. For example, in a scene or clip in a video content being displayed on the user device, an image of a monument, say, the Statue of Liberty, is being displayed. Here, the system [100] may identify a targeted content opportunity related to the Statue of Liberty, say, a tour and travel package offering a visit to the Statue of Liberty. Accordingly, the system [100] encompasses displaying a targeted content related to the tour and travel package offering a visit to the Statue of Liberty. Further, the invention also encompasses dynamically changing this targeted content or displaying this targeted content for a specific period of time. Therefore, the targeted content may change when a scene changes or after a specified period of time. For example, a scene or clip displaying the Statue of Liberty is over and a new scene shows another place, say, the Taj Mahal. So, the targeted content shown to the user may also change from the “tour and travel package offering a visit to the Statue of Liberty” to a new targeted content related to the Taj Mahal, say a “tour and travel package offering a visit to the Taj Mahal”. Also, the targeted content related to the Taj Mahal may be a different targeted content based on a user preference or behavior, and not a targeted content of a tour and travel package offering a visit to the Taj Mahal. For example, an analysis of past user behavior shows or suggests that a viewer has no interest in travelling or touring but is interested in collecting miniatures of historical monuments. Thus, in that case, the viewer may be shown a targeted content for buying a miniature artefact of the Taj Mahal. Thus, in this way, the targeted content may be customized for each viewer using the present invention. In another example, when a scene or clip displays the Taj Mahal, the user may not be provided with a targeted content feed but a content recommendation in the form of a hyperlink to a web page to read more about the Taj Mahal.
Also, in this implementation of a data tray being visible to the user at the user device, the data tray may contain a combination of targeted content related to the Statue of Liberty and the Taj Mahal as well as non‐targeted content recommendations related to the Statue of Liberty and/or the Taj Mahal, so as to enable the user to select which content to access from the available recommendations in the data tray. Also, extending this example, if a video being streamed contains a lesser‐known place, and the system identifies that the user may be interested to know more about that place, then a hyperlink to read and know more about that place may be included in the data tray for the user to access.
Further, in an implementation, the content of the data tray may also be editable by the user. For example, if a user is shown content in the data tray about which the user has more to share, the user can provide inputs on the same, which may be added to the targeted content after further validation at the operator’s end in a manual, automated or semi‐automated manner.
The invention also encompasses short‐form video generation at a character level. Based on the recognition of a particular character in multiple streaming video content, the system [100] may generate a short video for each character based on various clips or scenes from one or multiple streaming video content.
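As a non-limiting sketch, such character-level short-form assembly may collect indexed clip intervals for a character across multiple contents up to a duration budget. The index layout, names and values below are illustrative assumptions:

```python
def character_clips(index, character, max_total=60.0):
    """Collect (content_id, start, end) clips for one character across
    indexed content, up to a total duration budget in seconds, as input
    for short-form video assembly."""
    clips, total = [], 0.0
    for content_id, entries in index.items():
        for (start, end, name) in entries:
            if name != character:
                continue
            duration = end - start
            if total + duration > max_total:
                return clips  # budget exhausted
            clips.append((content_id, start, end))
            total += duration
    return clips

idx = {
    "movie_1": [(10.0, 20.0, "character_A"), (30.0, 40.0, "character_B")],
    "movie_2": [(5.0, 15.0, "character_A")],
}
print(character_clips(idx, "character_A"))
# → [('movie_1', 10.0, 20.0), ('movie_2', 5.0, 15.0)]
```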
Figure 2 illustrates an exemplary method flow diagram [200] for providing a context aware media streaming content on one or more user devices, shown in accordance with exemplary embodiments of the present disclosure. The method is implemented via the system [100].
The method for providing the context aware media streaming content on the one or more user devices begins at step 202.
The method at step 204 comprises receiving, at a character and object recognition module [102], a set of frames of a media streaming content. The media streaming content is one of a live video content and a pre‐recorded video content. The character and object recognition module [102] comprises at least one of a face detector module [106], a face alignment module [108], and a face recognition module [110]. Also, in an implementation, the character and object recognition module [102] is a pre‐trained system that is trained on a training dataset, wherein the training dataset comprises a data associated with at least one of a plurality of images of characters and a plurality of objects.
Further at step 206, the method encompasses analyzing, by the character and object recognition module [102], at least one selected frame from the set of frames of the media streaming content. The at least one selected frame from the set of frames is selected based on at least one of a logical selection and a random selection. In an implementation, the present disclosure encompasses logically and/or randomly selecting one or more frames in a video content. For example, while logically selecting the one or more frames, the character and object recognition module [102] may analyse a background and/or foreground of the frames, a change in camera angle, a shift of camera to a new scene and/or such analysis of one or more consecutive frames in the video content, that is obvious to a person skilled in the art through a quick screening of frames, and pick frame(s) when there is a drastic change in the background and/or foreground of the frames or a change is above a threshold level as specified in the system [100].
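The logical selection described above may be sketched, in a non-limiting way, as a frame-difference test: a frame is picked when its mean absolute pixel difference from the previously picked frame exceeds a threshold, which stands in for the "drastic change" criterion. The threshold value and frame data are illustrative assumptions:

```python
import numpy as np

def select_frames(frames, threshold=25.0):
    """Logical frame selection: keep a frame when the mean absolute
    pixel difference from the previously kept frame exceeds the
    threshold, a stand-in for detecting a drastic scene change."""
    if not frames:
        return []
    kept = [0]
    last = frames[0].astype(np.float32)
    for i, frame in enumerate(frames[1:], start=1):
        frame = frame.astype(np.float32)
        if np.mean(np.abs(frame - last)) > threshold:
            kept.append(i)
            last = frame
    return kept

scene_a = [np.full((4, 4), 10, dtype=np.uint8)] * 3   # static scene
scene_b = [np.full((4, 4), 200, dtype=np.uint8)] * 2  # drastic change
print(select_frames(scene_a + scene_b))  # → [0, 3]
```

A production selector would typically operate on downsampled luma planes and could equally key on camera-angle or background/foreground changes, as the specification contemplates.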
Furthermore, at step 208, the method comprises identifying, by the character and object recognition module [102], at least one of one or more characters and one or more objects based on the analysis of the at least one selected frame. As depicted in Figure 3, the process [300] of identifying, by the character and object recognition module [102], the one or more characters begins at step [302]. The process [300] further at step [304] comprises localizing, by the face detector module [106], one or more faces in the set of frames of the media streaming content.
Further, the method [300] at step [306] comprises aligning, by the facial alignment module [108], the one or more localized faces to normalized canonical coordinates.

Furthermore, the method [300] at step [308] encompasses implementing, by the face recognition module [110], a facial recognition method on the one or more aligned localized faces, and next the method [300] at step [310] encompasses identifying, by the face recognition module [110], the one or more characters based on the implementation of the facial recognition method on the one or more aligned localized faces. The method then terminates at step [312].
Therefore, in an exemplary implementation, firstly, the face detector module [106] receives the streaming video content and localizes face(s) in each frame of the streaming video content. The received streaming video content may be retrieved from the storage unit [114] of the system or from an external server. Second, the facial alignment module [108] aligns the localized/ identified faces to normalized canonical coordinates. Third, the face recognition module [110] implements the facial recognition method on the aligned faces to identify one or more characters in the streaming video content. The face recognition module [110] may implement face matching steps based on threshold matching, neural network identification, metric learning, etc. In an implementation, the face recognition module [110] may include a feature extraction module, and a loss function calculator.
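The threshold-matching step of the recognition stage may be illustrated, non-limitingly, as cosine-similarity comparison of an aligned face's embedding against a gallery of known character embeddings. The embeddings and threshold below are toy assumptions standing in for a real face-embedding network:

```python
import numpy as np

def recognize(aligned_face_embedding, gallery, threshold=0.8):
    """Threshold matching: return the gallery character whose reference
    embedding has the highest cosine similarity with the aligned face,
    or None when no entry clears the threshold."""
    best_name, best_sim = None, threshold
    for name, ref in gallery.items():
        sim = float(np.dot(aligned_face_embedding, ref) /
                    (np.linalg.norm(aligned_face_embedding) *
                     np.linalg.norm(ref)))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name

gallery = {"character_A": np.array([1.0, 0.0]),
           "character_B": np.array([0.0, 1.0])}
print(recognize(np.array([0.9, 0.1]), gallery))  # → character_A
```

The specification also contemplates neural-network identification and metric learning in place of simple thresholding; the gallery lookup shown here would then be replaced by the learned comparator.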
The character and object detection module [102] detects and recognizes the one or more objects via the object recognition module [112]. In an implementation, the character and object detection module [102] detects and recognizes the one or more objects, including but not limited to one or more cars that some characters are driving, one or more places that are shown in one or more scenes or clips in a video content, or certain other object(s) in a frame of a content, for example phones, computers, cars, food, refrigerators, water bottles, etc. These are identified and/or detected and/or recognized based on various techniques.

For example, a comparison of the one or more objects to a dataset as defined above may help in identifying the one or more objects and tagging them. Also, the above examples of objects and/or characters are provided for understanding purposes only and do not restrict or limit the present disclosure in any possible manner. Further, the character and object recognition module [102] implements various techniques for object detection purposes, such as Region‐Based Convolutional Neural Networks (R‐CNN) and Spatial Pyramid Pooling (SPP‐net) models. Also, a combination of known techniques may be developed for implementation of the object detection purpose. So, a combination of an ‘image detection’ technique with an ‘object tracking’ technique may be implemented for fast and accurate object detection in a streaming video. Further, the present disclosure encompasses that the character and object recognition is done by the character and object recognition module [102] in near real‐time for live streaming video content and may be pre‐performed for video on demand content.
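One way to sketch the detection-plus-tracking combination, as a non-limiting illustration, is to run the detector on keyframes only and carry detections forward by matching bounding boxes on overlap (intersection-over-union). The boxes and threshold below are assumed values:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def track(prev_boxes, new_boxes, min_iou=0.5):
    """Match each new detection to the previous box with the highest
    overlap, so object identity persists between detector runs; an
    unmatched detection maps to None."""
    matches = {}
    for i, nb in enumerate(new_boxes):
        best_j, best = None, min_iou
        for j, pb in enumerate(prev_boxes):
            if iou(nb, pb) > best:
                best_j, best = j, iou(nb, pb)
        matches[i] = best_j
    return matches

print(track([(0, 0, 10, 10)], [(1, 1, 11, 11)]))  # → {0: 0}
```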
Thereafter at step 210, the method comprises indexing, by a processing unit [104], at least one frame from the at least one selected frame that encompasses at least one of the one or more characters and the one or more objects. Further, the indexing is performed against a playtime length on a progress bar of the media streaming content, and the indexing is displayed on the progress bar by at least one of one or more markings, one or more colour codes, one or more character names, one or more emojis, and one or more actual frames in the media streaming content. Furthermore, the indexing, by the processing unit [104], the at least one frame from the at least one selected frame comprises inserting in the at least one frame from the at least one selected frame a set of content to be provided on the one or more user devices. In an implementation, the indexing of videos may also leverage an insertion of a targeted content such as promotion campaigns, advertisements, etc. in the video content of the one or more videos. In general, the targeted content is considered intrusive to users, and the users are not interested in watching that targeted content or in clicking on the targeted content to visit the web pages linked with it. The solution to this problem of showing intrusive targeted content to the users, which the present invention encompasses, is providing an in‐feed targeted content that is not intrusive or is less intrusive. Also, in an implementation, the users may be given a choice for viewing the in‐feed targeted content, where the users can select whether the targeted content should be shown to them or not. For this purpose of indexing the at least one frame from the at least one selected frame, in an implementation, a target content recommendation server may be connected to the processing unit [104]. This target content recommendation server comprises all contents that can be recommended to a user during streaming of a media streaming content on a user device of the user. Whenever a content recommendation is detected by the system [100] that is of relevance with respect to a user, it is directed to the target content recommendation server to search for any recommended content related to an identified object or an identified character and/or customized to a user interest or a user preference, that can be displayed on a screen or a user interface of a user device of the user at a particular instance of a media streaming content being displayed.
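As a non-limiting sketch, the indexing against the playtime length may be held as sorted (timestamp, label) pairs, from which both the progress-bar markers and the currently relevant character/object can be derived. The index layout and values below are illustrative assumptions:

```python
from bisect import bisect_right

def markers_on_progress_bar(index, playtime_length):
    """Position each indexed entry as a fraction along the progress bar;
    labels could be markings, colour codes, names, emojis, or frames."""
    return [(t / playtime_length, label) for t, label in index]

def active_entry(index, position_seconds):
    """Return the most recently indexed character/object at this
    playtime, e.g. to trigger a targeted-content lookup."""
    times = [t for t, _ in index]
    i = bisect_right(times, position_seconds) - 1
    return index[i][1] if i >= 0 else None

idx = [(30, "character_A"), (90, "car"), (150, "character_B")]
print(active_entry(idx, 100))  # → car
```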
Next at step 212, the method comprises generating, by the processing unit [104], one or more customized artworks based on the indexing of the at least one frame from the at least one selected frame and a set of user attributes. The one or more customized artworks comprise at least one of one or more previews, one or more thumbnails, one or more teasers, one or more trailers, one or more images, and one or more short videos related to at least one of the one or more characters and the one or more objects.
In an implementation of the present disclosure, the processing unit [104] may refer to a table of contents as well as one or more user attributes that are already learned by the system [100] (by implementing a machine learning model, etc.) and decide whether any item in the table, that is, any object or character, is of relevance to the user attribute(s). For example, if a user is interested in an action movie, then the user in this case might be interested in a powerful antagonist, and the user in another case might be interested in a stimulating fight between a protagonist and an antagonist, etc. The system [100], after referring to both a timeline, that is, the indexed content, and the user attribute(s), generates the one or more customized artworks and provides such customized artwork(s) to the user. Also, in another implementation, a standard artwork is taken by the system [100] from other digital platforms and checked for whether the standard artwork is customizable for a user or not. In case the standard artwork is customizable, the standard artwork is customised by the system [100], and the customised artwork is shown to the user. Alternatively, if the standard artwork is not customizable according to the user, then the standard artwork is shown to the user.
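The relevance decision described above may be sketched, non-limitingly, as scoring indexed frames by overlap between their tags and the learned user attributes, falling back to a standard artwork when nothing matches. The tag names and frame records are illustrative assumptions:

```python
def pick_artwork_frame(indexed_frames, user_attributes):
    """Score each indexed frame by how many of its tags match the
    learned user attributes; the best-scoring frame seeds the customized
    artwork. Falls back to the first frame (a standard artwork) when no
    frame is relevant to the user."""
    def score(frame):
        return len(set(frame["tags"]) & set(user_attributes))
    best = max(indexed_frames, key=score)
    return best if score(best) > 0 else indexed_frames[0]

frames = [
    {"frame_id": 1, "tags": ["protagonist", "beach"]},
    {"frame_id": 2, "tags": ["antagonist", "fight"]},
]
print(pick_artwork_frame(frames, ["fight", "action"])["frame_id"])  # → 2
```

A deployed system would replace this tag intersection with the learned relevance model the specification refers to; the fallback branch corresponds to showing the non-customizable standard artwork.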
Further at step 214, the method encompasses generating, by the processing unit [104], the context aware media streaming content based at least on the one or more customized artworks. Further, for generating, by the processing unit [104], the context aware media streaming content, the method further comprises modifying the media streaming content based on one of a manifest file associated with the one or more customized artworks and a hardcoding of the one or more customized artworks. Next, at step 216, the method comprises providing, by the processing unit [104], the context aware media streaming content on the one or more user devices.
For instance, in an implementation, to provide a context aware media streaming content on a user device, a media streaming content modified based on a customised artwork, or a link for the customised artwork, may be sent to the user device using a manifest file. In an embodiment, this customised artwork is generated prior to encoding the video content for delivery to the user device. For instance, if the user has requested an on‐demand video like a movie that has already been indexed and the system [100] also knows the user’s preferences, in such a scenario, the customised artwork is generated prior to encoding the video for transmission to the user’s device. In another embodiment, the customised artwork is generated after the encoding of the video content for transmission to the user device. For instance, if the user has requested a live content, the indexing for such content is done in real time, so the customised artwork may not be generated prior to encoding. In another instance, if a new user has requested an on‐demand video and the user preferences are not known to the system [100], in such a scenario also a customised artwork may not be generated prior to encoding a video for transmission, but only after the user has started watching the video and has provided some inputs to the system [100] that assist in deriving such user’s preferences. In another embodiment, the system [100] may generate customized artwork for a user based on the basic demographics of the user and a pool of similar users (that is, by considering the preferences of other users with the same or similar demographics) or any such correlation between the user and other users for which the system already has the data available.
The method further comprises generating, by the processing unit [104], a customized targeted content based on the context aware media streaming content, and providing, by the processing unit [104] on the one or more user devices, the customized targeted content based on a historical user pattern associated with the one or more user devices.
Further, the method encompasses providing, by the processing unit [104], at least one of a searching capability, a buying capability, an object related data accessing capability, and a character related data accessing capability on the one or more user devices based on the context aware media streaming content.

The method then terminates at step 218.
Various embodiments disclosed herein provide numerous advantages. More specifically, the embodiments disclosed herein suggest techniques for providing enhanced context aware media streaming to user(s), wherein content being streamed on user devices is indexed to provide for enhanced content search, content recommendation, target content placement, content artwork customization, customized targeted content, etc. Since the present system is context aware due to the indexing of characters and objects in the media stream, the system generates better recommendations for the users based on the indexed content and user preferences. Also, providing a limited and relevant content to the user requires less data to be ingested on the content delivery networks. Thus, the same leads to saving costs of ingesting more data in the content delivery networks and results in a better user experience, as the user gets to see only the relevant content that the user is interested in. Further, since indexing is done for each frame of the video content and user behavior/attributes are already learned by the system, the video content can be linked with user behavior and recommended content can be shown to the user based on user behavior; for example, the user may like to watch short videos, or may want dubbed content in place of original content in some particular language, or may want to watch content of a particular genre or of a particular artist, etc. Based on this, customized artwork may also be created for the user, which further enhances the user experience of searching the content.
Furthermore, insertion of targeted content related to the object or character being displayed in the streamed video content, and the recommended content for the user or the targeted content being non‐intrusive to the user, also improve the user experience. Considering the increased targeted content tolerance level of the user due to the above‐mentioned features, and the characteristics of the content for determination of the number of targeted contents to be inserted in the media stream, ensures an optimal number of targeted contents that may be inserted for different types of users. Further, training of the system using the existing data and the new data generated by the system also improves characteristics of the system such as the balance between latency and effectiveness, and the accuracy of the character recognition.
Further, in an implementation, ‘context‐aware ad opportunities’ in a content may be identified by the recognition and indexing system, and the content may be encoded along with the spots/markings for the identified opportunities. Further, the customized art, graphics, snippets, frames and shots for the respective items in the content, where the relevant content, that is, targeted content opportunities, are identified, are generated against the relevant content or targeted content campaigns available in the libraries, and populated in the CDN server. Further, the system may be trained to identify the targeted content display opportunities. For example, in a frame, there is a television and a logo on a Product is clearly visible (say the Product belongs to Brand ABC). The system may identify this as a targeted content opportunity and search for the scene boundary related to this scene in the frame. Say, the frame is at XX:30 min. and the scene boundary lies at XX:20 – XX:40 min. In that case, the system may provide to the target content recommendation server the complete segment of XX:20 – XX:40 min. In this case, the target content recommendation server, for example, may search for other products of the Brand ABC to be presented to the user. Continuing with the above example, say the Product is a television of Brand ABC, and the target content recommendation server finds a targeted content for a television of Brand DEF; then this television of Brand ABC may be replaced by the target content recommendation server with a television of Brand DEF, or only the logo of Brand ABC on the television may be replaced with the logo of Brand DEF to be displayed to the user. Further, during the playback at the user device, at the marked spots for relevant content display or targeted content opportunity, the manifest file may inform the user device to communicate with the CDN for displaying relevant content or targeted content.
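The scene-boundary lookup in the example above may be sketched, non-limitingly, as finding the complete segment that contains the frame where an opportunity was detected; the boundary values below are the illustrative XX:20 – XX:40 example expressed in seconds:

```python
def containing_scene(scene_boundaries, frame_time):
    """Given scene boundaries as (start, end) pairs in seconds, return
    the complete segment containing the frame where a targeted content
    opportunity was identified, or None if no segment contains it."""
    for start, end in scene_boundaries:
        if start <= frame_time < end:
            return (start, end)
    return None

# Frame at XX:30 with a scene boundary at XX:20 - XX:40
# (here 1230 s inside the 1220-1240 s segment)
print(containing_scene([(1200, 1220), (1220, 1240)], 1230))  # → (1220, 1240)
```

The returned segment is what would be handed to the target content recommendation server for the product search or replacement described above.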
Further, in an implementation, for recognition of objects and characters and indexing in run time or near real‐time for live‐streaming video content, a local database may be maintained that crawls the internet for data on characters and other items, such as the items available on ecommerce websites or other websites such as an educational website, and the system is trained to compare the characters/objects in a content with that local database in real‐time. This may also help in handling issues related to latency and synchronization of content during live streaming of data. Consequently, the relevant data may be provided to the user, for example, in a data tray where a user may click on a link to access the item that he desires.
Also, the present invention provides an enhanced search option to the user(s) since the system is context aware and understands the context of the content offered by the platform. The user is able to perform a more enhanced image or text based search using the system as the system not only searches in the title and description of the media content but also in the indexed content, i.e., the characters and objects in the media content.
While the invention has been explained with respect to many examples, it will be appreciated by those skilled in the art that the invention is not restricted by these examples and many changes can be made to the embodiments disclosed herein without departing from the principles and scope of the present invention.

I/We claim:
1. A method for providing a context aware media streaming content on one or more user devices, the method comprising:
‐ receiving, at a character and object recognition module [102], a set of frames of a media streaming content;
‐ analyzing, by the character and object recognition module [102], at least one selected frame from the set of frames of the media streaming content;
‐ identifying, by the character and object recognition module [102], at least one of one or more characters and one or more objects based on the analysis of the at least one selected frame;
‐ indexing, by a processing unit [104], at least one frame from the at least one selected frame that encompasses at least one of the one or more characters and the one or more objects;
‐ generating, by the processing unit [104], one or more customized artworks based on the indexing of the at least one frame from the at least one selected frame and a set of user attributes;
‐ generating, by the processing unit [104], the context aware media streaming content based at least on the one or more customized artworks; and
‐ providing, by the processing unit [104], the context aware media streaming content on the one or more user devices.
2. The method as claimed in claim 1, wherein the at least one selected frame from the set of frames is selected based on at least one of a logical selection and a random selection.
3. The method as claimed in claim 1, wherein the media streaming content is one of a live video content and a pre‐recorded video content.
4. The method as claimed in claim 1, wherein the character and object recognition module [102] comprises at least one of a face detector module [106], a face alignment module [108], a face recognition module [110] and an object recognition module [112].
5. The method as claimed in claim 4, wherein the identifying, by the character
and object recognition module [102], the one or more characters further
comprises:
‐ localizing, by the face detector module [106], one or more faces in the set of frames of the media streaming content,
‐ aligning, by the facial alignment module [108], the one or more localized faces to normalized canonical coordinates,
‐ implementing, by the face recognition module [110], a facial recognition method on the one or more aligned localized faces, and
‐ identifying, by the face recognition module [110], the one or more characters based on the implementation of the facial recognition method on the one or more aligned localized faces.
6. The method as claimed in claim 1, wherein the character and object recognition module [102] is a pre‐trained system that is trained on a training dataset, wherein the training dataset comprises a data associated with at least one of a plurality of images of characters and a plurality of objects.
7. The method as claimed in claim 1, wherein the indexing is performed against a playtime length on a progress bar of the media streaming content, and the indexing is displayed on the progress bar by at least one of one or more markings, one or more colour codes, one or more character names, one or more emojis, and one or more actual frames in the media streaming content.
8. The method as claimed in claim 1, wherein the indexing, by the processing unit [104], the at least one frame from the at least one selected frame comprises inserting in the at least one frame from the at least one selected frame a set of content to be provided on the one or more user devices.

9. The method as claimed in claim 1, wherein the one or more customized artworks comprise at least one of one or more previews, one or more thumbnails, one or more teasers, one or more trailers, one or more images, and one or more short videos, related to at least one of the one or more characters and the one or more objects.
10. The method as claimed in claim 1, wherein the generating, by the processing unit [104], the context aware media streaming content further comprises modifying the media streaming content based on one of a manifest file associated with the one or more customized artworks and a hardcoding of the one or more customized artworks.
11. The method as claimed in claim 1, the method further comprising:
‐ generating, by the processing unit [104], a customized targeted content based on the context aware media streaming content, and
‐ providing, by the processing unit [104] on the one or more user devices, the customized targeted content based on a historical user pattern associated with the one or more user devices.
12. The method as claimed in claim 1, the method further comprises providing, by the processing unit [104], at least one of a searching capability, a buying capability, an object related data accessing capability, and a character related data accessing capability, on the one or more user devices, based on the context aware media streaming content.
13. A system [100] for providing a context aware media streaming content on one or more user devices, the system [100] comprising:
o a character and object recognition module [102], configured to:
‐ receive, a set of frames of a media streaming content,
‐ analyze, at least one selected frame from the set of frames of the media streaming content, and
‐ identify, at least one of one or more characters and one or more objects based on the analysis of the at least one selected frame; and
o a processing unit [104], configured to:
‐ index, at least one frame from the at least one selected frame that encompasses at least one of the one or more characters and the one or more objects,
‐ generate, one or more customized artworks based on the indexing of the at least one frame from the at least one selected frame and a set of user attributes,
‐ generate, the context aware media streaming content based at least on the one or more customized artworks, and
‐ provide, the context aware media streaming content on the one or more user devices.
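The two-module flow of claim 13 can be sketched end to end. This is a minimal illustrative sketch, not the claimed implementation: the class and method names mirror the claim language, but the pre-labelled frame tags standing in for real recognition, and all data shapes, are assumptions.

```python
# Hypothetical sketch of the claimed pipeline: recognition module [102]
# identifies characters/objects; processing unit [104] indexes frames and
# generates customized artworks from them plus user attributes.

class CharacterAndObjectRecognitionModule:  # [102]
    def analyze(self, selected_frames):
        # Stand-in for real detection: frames carry pre-labelled tags here.
        return [tag for frame in selected_frames for tag in frame["tags"]]

class ProcessingUnit:  # [104]
    def index(self, frames, identified):
        # Keep only frames that encompass an identified character or object.
        return [f for f in frames if set(f["tags"]) & set(identified)]

    def generate_artworks(self, indexed_frames, user_attributes):
        # One customized artwork per indexed frame, shaped by user attributes.
        return [
            {"frame_t": f["t"], "style": user_attributes.get("style", "default")}
            for f in indexed_frames
        ]

frames = [{"t": 10.0, "tags": ["character_A"]}, {"t": 20.0, "tags": []}]
recognizer = CharacterAndObjectRecognitionModule()
unit = ProcessingUnit()
identified = recognizer.analyze(frames)
indexed = unit.index(frames, identified)
artworks = unit.generate_artworks(indexed, {"style": "action"})
```

The second frame carries no identified character or object, so it is dropped at the indexing step and yields no artwork.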
14. The system [100] as claimed in claim 13, wherein the at least one selected frame from the set of frames is selected based on at least one of a logical selection and a random selection.
15. The system [100] as claimed in claim 13, wherein the media streaming content is one of a live video content and a pre‐recorded video content.
16. The system [100] as claimed in claim 13, wherein the character and object recognition module [102] comprises at least one of a face detector module [106], a face alignment module [108], a face recognition module [110] and an object recognition module [112].
17. The system [100] as claimed in claim 16, wherein to identify, the one or more characters:
‐ the face detector module [106], is configured to localize one or more faces in the set of frames of the media streaming content,
‐ the face alignment module [108], is configured to align the one or more localized faces to normalized canonical coordinates,
‐ the face recognition module [110], is configured to implement a facial recognition method on the one or more aligned localized faces, and
‐ the face recognition module [110], is configured to identify the one or more characters based on the implementation of the facial recognition method on the one or more aligned localized faces.
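The detect → align → recognize flow of claim 17 can be sketched as below. This is a hedged sketch under stated assumptions: the embedding-to-character lookup, the canonical output size, and the function names are illustrative stand-ins for real detector, alignment, and recognition models.

```python
# Hypothetical sketch of claim 17: localize faces [106], align them to
# normalized canonical coordinates [108], then identify characters [110].

KNOWN_CHARACTERS = {"emb_1": "character_A"}  # assumed recognition gallery

def localize_faces(frame):
    """Face detector module [106]: return detected faces in the frame."""
    return frame.get("faces", [])

def align_face(face, canonical_size=(112, 112)):
    """Face alignment module [108]: map a localized face to normalized
    canonical coordinates (a fixed canonical output size here)."""
    return {"face": face, "size": canonical_size}

def recognize(aligned):
    """Face recognition module [110]: match an aligned face to a character
    via its (assumed) embedding key, else report it as unknown."""
    return KNOWN_CHARACTERS.get(aligned["face"].get("embedding"), "unknown")

frame = {"faces": [{"embedding": "emb_1"}, {"embedding": "emb_9"}]}
characters = [recognize(align_face(f)) for f in localize_faces(frame)]
```

A real system would replace the embedding lookup with a trained recognition model, per claim 18's pre-trained training dataset.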
18. The system [100] as claimed in claim 13, wherein the character and object recognition module [102] is a pre‐trained system that is trained on a training dataset, wherein the training dataset comprises a data associated with at least one of a plurality of images of characters and a plurality of objects.
19. The system [100] as claimed in claim 13, wherein the indexing is performed against a playtime length on a progress bar of the media streaming content, and the indexing is displayed on the progress bar by at least one of one or more markings, one or more colour codes, one or more character names, one or more emojis, and one or more actual frames in the media streaming content.
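The playtime indexing of claim 19 can be sketched as a mapping from frame timestamps to marker positions along the progress bar. This is an illustrative sketch only: the proportional positioning, the bar width, and the helper name are assumptions; the claim itself covers markings, colour codes, character names, emojis, or actual frames as the display form.

```python
# Hypothetical sketch: index frames against the playtime length and place a
# labelled marker for each on a progress bar of a given width.

def progress_bar_markers(indexed_frames, playtime_length, bar_width=100):
    """Map each indexed frame's timestamp to a proportional marker position."""
    markers = []
    for frame in indexed_frames:
        position = round(frame["t"] / playtime_length * bar_width)
        markers.append({"position": position, "label": frame["label"]})
    return markers

markers = progress_bar_markers(
    [{"t": 150.0, "label": "character_A"}, {"t": 450.0, "label": "trophy"}],
    playtime_length=600.0,
)
```

A frame at 150 s of a 600 s stream lands a quarter of the way along the bar; the label could equally be rendered as a colour code, emoji, or thumbnail of the frame itself.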
20. The system [100] as claimed in claim 13, wherein to index the at least one frame from the at least one selected frame the processing unit [104] is configured to insert in the at least one frame from the at least one selected frame a set of content to be provided on the one or more user devices.
21. The system [100] as claimed in claim 13, wherein the one or more customized artworks comprise at least one of one or more previews, one or more thumbnails, one or more teasers, one or more trailers, one or more images, and one or more short videos, related to at least one of the one or more characters and the one or more objects.
22. The system [100] as claimed in claim 13, wherein to generate the context aware media streaming content the processing unit [104] is further configured to modify the media streaming content based on one of a manifest file associated with the one or more customized artworks and a hardcoding of the one or more customized artworks.
23. The system [100] as claimed in claim 13, wherein the processing unit [104] is further configured to:
‐ generate a customized targeted content based on the context aware media streaming content, and
‐ provide on the one or more user devices, the customized targeted content based on a historical user pattern associated with the one or more user devices.
24. The system [100] as claimed in claim 13, wherein the processing unit [104] is further configured to provide, at least one of a searching capability, a buying capability, an object related data accessing capability, and a character related data accessing capability, on the one or more user devices, based on the context aware media streaming content.

Documents

Application Documents

# Name Date
1 202221068891-STATEMENT OF UNDERTAKING (FORM 3) [30-11-2022(online)].pdf 2022-11-30
2 202221068891-PROVISIONAL SPECIFICATION [30-11-2022(online)].pdf 2022-11-30
3 202221068891-FORM 1 [30-11-2022(online)].pdf 2022-11-30
4 202221068891-FORM-26 [01-12-2022(online)].pdf 2022-12-01
5 202221068891-Proof of Right [03-02-2023(online)].pdf 2023-02-03
6 202221068891-FORM 18 [28-11-2023(online)].pdf 2023-11-28
7 202221068891-ENDORSEMENT BY INVENTORS [28-11-2023(online)].pdf 2023-11-28
8 202221068891-DRAWING [28-11-2023(online)].pdf 2023-11-28
9 202221068891-CORRESPONDENCE-OTHERS [28-11-2023(online)].pdf 2023-11-28
10 202221068891-COMPLETE SPECIFICATION [28-11-2023(online)].pdf 2023-11-28
11 Abstract1.jpg 2024-03-06
12 202221068891-PA [18-06-2024(online)].pdf 2024-06-18
13 202221068891-ASSIGNMENT DOCUMENTS [18-06-2024(online)].pdf 2024-06-18
14 202221068891-8(i)-Substitution-Change Of Applicant - Form 6 [18-06-2024(online)].pdf 2024-06-18
15 202221068891-RELEVANT DOCUMENTS [12-06-2025(online)].pdf 2025-06-12
16 202221068891-FORM 13 [12-06-2025(online)].pdf 2025-06-12
17 202221068891-FER.pdf 2025-07-28
19 202221068891-ORIGINAL UR 6(1A) FORM 26-220925.pdf 2025-09-25

Search Strategy

1 202221068891_SearchStrategyNew_E_SearchHistoryDOCUMENTNOVIDIGITALE_15-07-2025.pdf