Abstract: The present disclosure provides a system (108) and a method for creating and recommending personalized artwork. The system (108) receives one or more media files associated with a profile of a user (102). The system (108) identifies an action performed in the one or more media files and detects one or more attributes among the one or more media files. The system (108) extracts one or more frames from the one or more media files and selects at least one frame from the one or more frames having the highest Image Quality Assessment (IQA) score. The system (108) generates one or more images based on the one or more attributes and said at least one frame having the highest IQA score. The system (108) generates a quality assessment score based on the one or more images. The system (108) generates personalized artwork based on the quality assessment score and the profile of the user (102) and provides the personalized artwork to the user (102).
RESERVATION OF RIGHTS
[001] A portion of the disclosure of this patent document contains material which is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, integrated circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as the owner). The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.
FIELD OF INVENTION
[0002] The embodiments of the present disclosure generally relate to recommendation systems. More particularly, the present disclosure relates to a system and a method for creating and recommending personalized artwork.
BACKGROUND
[0003] The following description of the related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is used only to enhance the understanding of the reader with respect to the present disclosure, and not as admissions of the prior art.
[0004] An artwork draws a user’s attention; hence, creating and recommending artwork is crucial for a digital platform. However, the artwork may relate to various types of content such as, but not limited to, a movie, a documentary, a sport, and a summit. Therefore, the artwork may not be limited to a single theme and may span multiple themes based on the user’s preference. Hence, generating artwork is complex, as different users have different preferences.
[0005] There is, therefore, a need in the art to provide a system and a method for creating and recommending artwork based on user preferences.
OBJECTS OF THE INVENTION
[0006] Some of the objects of the present disclosure, which at least one embodiment herein satisfies are listed herein below.
[0007] It is an object of the present disclosure to provide a system and a method for creating and recommending a personalized artwork to a user.
[0008] It is an object of the present disclosure to provide a system and a method that generates artwork/thumbnails based on an individual’s preference.
[0009] It is an object of the present disclosure to provide a system and a method that uses Image Quality Assessment (IQA) and computes an IQA score for a video frame using AI models.
[0010] It is an object of the present disclosure to use an AI model to detect and recognize an action being performed in the video frame.
[0011] It is an object of the present disclosure to provide a system and a method that generates image properties from the video frame to identify a person and an emotion essayed by the person.
[0012] It is an object of the present disclosure to provide a system and a method that generates a thumbnail score based on various features in the video frame and ranks the video frame based on the thumbnail score.
SUMMARY
[0013] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
[0014] In an aspect, the present disclosure relates to a system for generating personalized artwork. The system includes a processor and a memory operatively coupled with the processor, where said memory stores instructions which, when executed by the processor, cause the processor to receive one or more media files from a user via one or more computing devices. The one or more media files are associated with a profile of the user. The processor identifies an action performed among the one or more media files. The processor detects one or more attributes among the one or more media files. The processor extracts one or more frames from the one or more media files and selects at least one frame among the one or more frames having the highest Image Quality Assessment (IQA) score. The processor generates one or more images based on the one or more attributes and said at least one frame having the highest IQA score. The processor generates a quality assessment score for the one or more images. The processor generates personalized artwork based on the quality assessment score and the profile of the user. The processor provides the personalized artwork to the user.
[0015] In an embodiment, the one or more attributes may include a face and an emotion associated with the face.
[0016] In an embodiment, the processor may perform deduplication of the one or more images and generate the quality assessment score for the one or more images based on the deduplication.
[0017] In an embodiment, the processor may annotate the extracted one or more frames and train a machine learning model on an annotated dataset to predict the highest IQA score.
[0018] In an embodiment, the trained machine learning model may include a vision transformer (VIT) technique, and a combination of one or more AI models such as a support vector machine (SVM) and an extreme gradient (XG) boost classifier to classify the annotated one or more frames.
[0019] In an embodiment, the processor may bifurcate the one or more media files into one or more action subclasses and identify the action performed among the one or more action subclasses via the machine learning model.
[0020] In an embodiment, the processor may detect one or more faces using an AI based face detector that may include a Multi-Task Cascaded Convolutional Network (MTCNN) model and generate a face vector associated with the detected one or more attributes.
[0021] In an embodiment, the processor may train a machine learning classifier on a face vector database to predict an actor class from the one or more attributes.
[0022] In an embodiment, the processor may use an emotion recognition model to identify the emotion associated with the face.
[0023] In an aspect, the present disclosure relates to a method for generating personalized artwork. The method includes receiving, by a processor associated with a system, one or more media files from a user via one or more computing devices. The one or more media files are associated with a profile of the user. The method includes identifying, by the processor, an action performed among the one or more media files. The method includes detecting, by the processor, one or more attributes among the one or more media files. The method includes extracting, by the processor, one or more frames from the one or more media files and selecting at least one frame among the one or more frames having the highest IQA score. The method includes generating, by the processor, one or more images based on the one or more attributes and said at least one frame having the highest IQA score. The method includes generating, by the processor, a quality assessment score for the one or more images. The method includes generating, by the processor, personalized artwork based on the quality assessment score and the profile of the user, and providing, by the processor, the personalized artwork to the user.
[0024] In an embodiment, the method may include performing, by the processor, deduplication of the one or more images, and generating the quality assessment score for the one or more images based on the deduplication.
[0025] In an embodiment, the method may include annotating, by the processor, the extracted one or more frames, and training a machine learning model on an annotated dataset to predict the highest IQA score.
[0026] In an embodiment, the method may include bifurcating, by the processor, the one or more media files into one or more action subclasses, and identifying the action performed among the one or more action subclasses via the machine learning model.
[0027] In an aspect, a user equipment (UE) for sending requests includes one or more processors communicatively coupled to a processor associated with a system. The one or more processors are coupled with a memory, and where said memory stores instructions which, when executed by the one or more processors, cause the one or more processors to transmit one or more media files to the processor. The one or more media files are associated with a profile of a user associated with the UE. The processor is configured to receive the one or more media files from the user associated with the UE. The processor is configured to identify an action performed among the one or more media files. The processor is configured to detect one or more attributes among the one or more media files. The processor is configured to extract one or more frames from the one or more media files and select at least one frame among the one or more frames having the highest IQA score. The processor is configured to generate one or more images based on the one or more attributes and said at least one frame having the highest IQA score. The processor is configured to generate a quality assessment score for the one or more images. The processor is configured to generate personalized artwork based on the quality assessment score and the profile of the user, and provide the personalized artwork to the user.
BRIEF DESCRIPTION OF DRAWINGS
[0028] The accompanying drawings, which are incorporated herein and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems, in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes the disclosure of electrical components, electronic components, or circuitry commonly used to implement such components.
[0029] FIG. 1 illustrates an example network architecture (100) for implementing a proposed system (108), in accordance with an embodiment of the present disclosure.
[0030] FIG. 2 illustrates an example block diagram (200) of a proposed system (108), in accordance with an embodiment of the present disclosure.
[0031] FIG. 3A illustrates an example architecture diagram (300A) of the proposed system (108), in accordance with an embodiment of the present disclosure.
[0032] FIG. 3B illustrates an example flow diagram (300B) of the proposed system (108), in accordance with an embodiment of the present disclosure.
[0033] FIG. 4 illustrates an example flow diagram (400) for generating an Image Quality Assessment (IQA) model, in accordance with an embodiment of the present disclosure.
[0034] FIG. 5 illustrates an example sub video selection module (500), in accordance with an embodiment of the present disclosure.
[0035] FIG. 6 illustrates an exemplary actor detection module (600), in accordance with an embodiment of the present disclosure.
[0036] FIG. 7 illustrates an example emotion recognition module (700), in accordance with an embodiment of the present disclosure.
[0037] FIG. 8 illustrates an example actor and IQA standardized scoring module (800), in accordance with an embodiment of the present disclosure.
[0038] FIG. 9 illustrates an exemplary de-duplication module (900), in accordance with an embodiment of the present disclosure.
[0039] FIG. 10 illustrates an example computer system (1000) in which or with which embodiments of the present disclosure may be implemented.
[0040] The foregoing shall be more apparent from the following more detailed description of the disclosure.
DETAILED DESCRIPTION
[0041] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0042] The ensuing description provides exemplary embodiments only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
[0043] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.
[0044] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0045] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.
[0046] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0047] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0048] The present disclosure generates personalized artwork based on a user’s interest. For example, if a user is inclined towards romance, the artwork recommended to the user depicts romance. For generating personalized artwork, various aspects of the video content may be analysed, including the video quality, the content itself, and the identification of the persons (actors/actresses) present in the video. The present disclosure provides a system and a method for generating personalized artworks.
[0049] The present disclosure performs image quality assessment (IQA) to examine the characteristics of a video frame. The process of determining the level of accuracy is called IQA, which is usually based on quality parameters that examine characteristics of an image such as blurriness, noise, dynamic range, and distortion. Artificial Intelligence (AI) models are utilized to compute the IQA score for the various video frames. Deep learning/machine learning-based action detection systems and methods are used to detect and recognize the action happening in a video frame. Further, image properties are extracted from the video frame to identify the person and their facial expression. A thumbnail score is computed for each frame based on various features that may include the IQA score, the identified action, the presence of an actor/actress, the identified actor, and the facial expression. The system ranks the video frames by their corresponding thumbnail scores, removes duplicates, and creates thumbnails for display.
[0050] The present disclosure recommends the artwork at a personalized level, where a user profile may be created in terms of the user’s details such as age, demography, gender, interests, and recent activity including likes and dislikes. Further, the user profile may include videos watched by the user and previously recommended thumbnails. A matching algorithm may be used to generate relevant thumbnails for an individual user.
[0051] The present disclosure may include systems and methods to identify the action happening in video content and split the video into sub-videos. The present disclosure may include systems and methods to assess image quality by examining characteristics of the frames extracted from the video, such as blurriness, noise, dynamic range, and distortion. The present disclosure may include systems and methods to detect and identify the person in an image. The present disclosure may include systems and methods to recognize the emotion in a facial image. The present disclosure may include systems to compute the thumbnail score by combining the IQA score, the sub-video genre, the actor presence, and the facial expression in the extracted video frame. Normalized scores for the IQA score and the actor presence may also be combined with the genre and facial expressions instead of creating a separate module. Further, the present disclosure may include a system and method for removing duplicate images, which may be very similar thumbnails, using colour histogram vectors.
[0052] Various embodiments of the present disclosure will be explained in detail with reference to FIGs. 1-10.
[0053] FIG. 1 illustrates an example network architecture (100) for implementing a proposed system (108), in accordance with an embodiment of the present disclosure.
[0054] As illustrated in FIG. 1, the network architecture (100) may include a system (108). The system (108) may be connected to one or more computing devices (104-1, 104-2…104-N) via a network (106). The one or more computing devices (104-1, 104-2…104-N) may be interchangeably specified as a user equipment (UE) (104) and be operated by one or more users (102-1, 102-2...102-N). Further, the one or more users (102-1, 102-2…102-N) may be interchangeably referred as a user (102) or users (102).
[0055] In an embodiment, the computing devices (104) may include, but not be limited to, a mobile, a laptop, etc. Further, the computing devices (104) may include a smartphone, virtual reality (VR) devices, augmented reality (AR) devices, a general-purpose computer, desktop, personal digital assistant, tablet computer, and a mainframe computer. Additionally, input devices for receiving input from the user (102) such as a touch pad, touch-enabled screen, electronic pen, and the like may be used. A person of ordinary skill in the art will appreciate that the computing devices (104) may not be restricted to the mentioned devices and various other devices may be used.
[0056] In an embodiment, the network (106) may include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth. The network (106) may also include, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof.
[0057] In an embodiment, the system (108) may receive one or more media files from a user (102) via one or more computing devices (104). The system (108) may bifurcate the one or more media files into one or more action subclasses and identify the action performed among the one or more action subclasses via a machine learning model.
[0058] In an embodiment, the one or more media files may be associated with a profile of the user (102). The system (108) may detect one or more attributes among the one or more media files. The one or more attributes may include a face and an emotion associated with the face. The system (108) may use an emotion recognition model to identify the emotion associated with the face. The system (108) may detect one or more faces using an AI-based face detector model that may include, but is not limited to, a Multi-Task Cascaded Convolutional Network (MTCNN) model, and generate a face vector associated with the one or more attributes using the AI model, e.g., FaceNet, ResNet, etc. The system (108) may train a machine learning classifier on a face vector database to predict an actor class from the one or more attributes.
[0059] In an embodiment, the system (108) may extract one or more frames from the one or more media files, annotate the extracted one or more frames, and train a machine learning model on the annotated dataset to predict the highest IQA score. Further, the trained machine learning model may be used to generate said at least one frame having the highest IQA score. The trained machine learning model may include, but is not limited to, a vision transformer (VIT) technique and a combination of one or more AI models such as, but not limited to, a support vector machine (SVM) and an extreme gradient (XG) boost technique to classify the annotated one or more frames.
[0060] Further, the system (108) may generate one or more images based on the one or more attributes and said at least one frame having the highest IQA score. The system (108) may generate a quality assessment score based on the one or more images. The system (108) may perform deduplication of the one or more images and generate the quality assessment score associated with the one or more images. Further, the system (108) may generate personalized artwork based on the quality assessment score and the profile of the user (102) and provide the personalized artwork to the user (102).
[0061] Although FIG. 1 shows exemplary components of the network architecture (100), in other embodiments, the network architecture (100) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1. Additionally, or alternatively, one or more components of the network architecture (100) may perform functions described as being performed by one or more other components of the network architecture (100).
[0062] FIG. 2 illustrates an example block diagram (200) of a proposed system (108), in accordance with an embodiment of the present disclosure.
[0063] Referring to FIG. 2, the system (108) may include a processor(s) (202) that may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions. Among other capabilities, the one or more processor(s) (202) may be configured to fetch and execute computer-readable instructions stored in a memory (204) of the system (108). The memory (204) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as random-access memory (RAM), or non-volatile memory such as erasable programmable read only memory (EPROM), flash memory, and the like.
[0064] In an embodiment, the system (108) may include an interface(s) (206). The interface(s) (206) may comprise a variety of interfaces, for example, interfaces for data input and output (I/O) devices, storage devices, and the like. The interface(s) (206) may also provide a communication pathway for one or more components of the system (108). Examples of such components include, but are not limited to, processing engine(s) (208) and a database (210), where the processing engine(s) (208) may include, but not be limited to, a data parameter engine (212) and other engine(s) (214). In an embodiment, the other engine(s) (214) may include, but not limited to, an input/output engine, and a notification engine.
[0065] In an embodiment, the processing engine(s) (208) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) (208). In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) (208) may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) (208) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) (208). In such examples, the system (108) may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the system (108) and the processing resource. In other examples, the processing engine(s) (208) may be implemented by electronic circuitry.
[0066] In an embodiment, the processor (202) may receive one or more media files from a user (102) via the data parameter engine (212). The one or more media files may be received from one or more computing devices (104). The processor (202) may store the one or more media files in the database (210).
[0067] In an embodiment, the processor (202) may bifurcate the one or more media files into one or more action subclasses and identify the action performed among the one or more action subclasses via a machine learning model.
[0068] In an embodiment, the one or more media files may be associated with a profile of the user (102). The processor (202) may identify an action performed among the one or more media files. The processor (202) may detect one or more attributes among the one or more media files. The one or more attributes may include a face and an emotion associated with the face. The processor (202) may use an emotion recognition model to identify the emotion associated with the face. The processor (202) may detect one or more faces using an AI-based face detector that may include a MTCNN model and generate a face vector associated with the one or more attributes. The processor (202) may train a machine learning classifier on a face vector database to predict an actor class from the one or more attributes.
[0069] In an embodiment, the processor (202) may extract one or more frames from the one or more media files and select at least one frame among the one or more frames having the highest IQA score. Further, the processor (202) may annotate the extracted one or more frames and generate a trained machine learning model based on the annotated one or more frames to generate said at least one frame having the highest IQA score. The trained machine learning model may include, but is not limited to, a VIT technique and a combination of one or more AI models such as, but not limited to, SVM and XG boost classifiers to classify the one or more frames.
[0070] Further, the processor (202) may generate one or more images based on the one or more attributes and said at least one frame having the highest IQA score. The processor (202) may generate a quality assessment score associated with the one or more images. The processor (202) may perform deduplication of the one or more images and generate the quality assessment score associated with the one or more images. Further, the processor (202) may generate personalized artwork based on the quality assessment score and the profile of the user (102) and provide the personalized artwork to the user (102).
[0071] Although FIG. 2 shows exemplary components of the system (108), in other embodiments, the system (108) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 2. Additionally, or alternatively, one or more components of the system (108) may perform functions described as being performed by one or more other components of the system (108).
[0072] FIG. 3A illustrates an example architecture diagram (300A) of the proposed system (108), in accordance with an embodiment of the present disclosure.
[0073] As illustrated in FIG. 3A, in an embodiment, upon uploading, at step 1, multiple sub-videos may be identified by the system (108). Action detection may be performed in the videos and the videos may be split into sub-clips. At step 2, actor identification and emotion recognition may be performed by the system (108). Actor identification may be performed by the system (108) through face detection. Further, at step 3, an image selection may be performed by the system (108) from the sub-videos determined at step 1. For performing the image selection, frames may be extracted from the video and an IQA of the frames may be performed by the system (108). Further, ‘N’ number of frames with the highest IQA may be selected by the system (108). At step 4, a thumbnail selection for each sub-video may be performed by the system (108). At step 4, an input may be received from the image selection block at step 3, and the actor identification and emotion recognition module at step 2. At step 4, the actor’s presence, IQA, emotion and action scores may be standardised and the thumbnail score may be computed by the system (108). Further, the system (108) may perform de-duplication of the thumbnail images followed by a thumbnail quality assessment.
[0074] In an embodiment, a user’s interest vector may be determined by the system (108) based on the user’s profile. The user’s interest vector and the thumbnail quality assessment may be used to select images as thumbnails for the video. Thereafter, a personalized artwork for an individual user may be created by the system (108). In addition, the artwork may be displayed to the user (102) and updated based on the user’s recent activities.
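By way of illustration only, the following is a minimal Python sketch of how a user's interest vector may be matched against candidate thumbnail vectors and their quality assessment scores to pick personalized artwork. The attribute space, the blending weights, and the function name are assumptions made for illustration and are not prescribed by the present disclosure.

```python
import numpy as np

def pick_personalized_artwork(user_interest: np.ndarray,
                              thumbnail_vectors: np.ndarray,
                              quality_scores: np.ndarray) -> int:
    """Return the index of the thumbnail with the best blend of relevance and quality."""
    # Cosine similarity between the user's interest vector and each thumbnail's
    # attribute vector (e.g., a distribution over action classes and emotions).
    relevance = thumbnail_vectors @ user_interest / (
        np.linalg.norm(thumbnail_vectors, axis=1) * np.linalg.norm(user_interest) + 1e-9)
    # Blend relevance with the thumbnail quality assessment; weights are illustrative.
    return int(np.argmax(0.7 * relevance + 0.3 * quality_scores))
```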
[0075] In an embodiment, the system (108) may include an image quality assessment (IQA) module. Levels of accuracy of an image may be calculated using the IQA module, which may be based on quality parameters that examine characteristics of an image including, but not limited to, blurriness, noise, dynamic range, and distortion. Further, quality assessment for a video frame may be formulated by the system (108) as a classification problem.
[0076] In an embodiment, the system (108) may extract frames from the video and annotate the frames to generate pre-defined classes associated with the frames (good/accept or bad/reject). A deep learning/machine learning model may be trained on the annotated frames. The trained model may be utilized to compute the IQA score for an image. A pre-defined threshold may be used to classify the image, where if the IQA score is below the pre-defined threshold, the image may be classified as bad/reject; otherwise, the image may be classified as good/accept.
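The following is a non-limiting Python sketch of training such a binary IQA classifier, assuming frame embeddings have already been extracted by a vision backbone (e.g., a ViT feature extractor). The use of scikit-learn and an SVC with probability outputs is an illustrative choice, not the only implementation covered by the disclosure.

```python
# Minimal sketch: train a binary IQA classifier on annotated frames
# (label 1 = good/accept, label 0 = bad/reject), given fixed-length
# frame embeddings produced by a separate vision backbone.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_iqa_classifier(features: np.ndarray, labels: np.ndarray) -> SVC:
    """features: (n_frames, d) frame embeddings; labels: 1 = good/accept, 0 = bad/reject."""
    x_train, x_val, y_train, y_val = train_test_split(
        features, labels, test_size=0.2, random_state=42, stratify=labels)
    clf = SVC(probability=True)  # probability output later serves as the IQA score
    clf.fit(x_train, y_train)
    print("validation accuracy:", clf.score(x_val, y_val))
    return clf
```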
[0077] FIG. 3B illustrates an example flow diagram (300B) of the proposed system (108), in accordance with an embodiment of the present disclosure.
[0078] As illustrated in FIG. 3B, in an embodiment, the following steps may be utilized by the system (108).
[0079] At step 302: The system (108) may receive a full video.
[0080] At step 304: The system (108) may extract frames from the video.
[0081] At step 306: The system (108) may annotate the frames as good/bad.
[0082] At step 308: The system (108) may generate a trained deep learning (DL)/machine learning (ML) model for IQA.
[0083] At step 310: The system (108) may save the trained IQA model.
[0084] FIG. 4 illustrates an example flow diagram (400) for generating an Image Quality Assessment (IQA) model, in accordance with an embodiment of the present disclosure.
[0085] As illustrated in FIG. 4, in an embodiment, the system (108) may use a DL based IQA model to compute an IQA score.
[0086] At step 402: The system (108) may extract frames from the video.
[0087] At step 404: The system (108) may load the trained IQA model.
[0088] At step 406: The system (108) may compute the IQA score.
[0089] At step 408: The system (108) may determine if the IQA score is greater than a threshold.
[0090] At step 410: In response to a positive determination from step 408, the system (108) may classify an image associated with the frame as good/accept.
[0091] At step 412: In response to a negative determination from step 408, the system (108) may classify the image as bad/reject.
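A minimal sketch of steps 406-412 is shown below, assuming a classifier of the kind trained with reference to FIG. 3B. Treating the predicted probability of the good class as the IQA score, and the default threshold value, are illustrative assumptions.

```python
import numpy as np

def classify_frame(iqa_classifier, frame_embedding: np.ndarray, threshold: float = 0.5) -> str:
    """Steps 406-412: compute the IQA score and accept or reject the frame."""
    # Probability of the 'good' class is used as the IQA score (assumption).
    iqa_score = float(iqa_classifier.predict_proba(frame_embedding.reshape(1, -1))[0, 1])
    return "good/accept" if iqa_score >= threshold else "bad/reject"
```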
[0092] FIG. 5 illustrates an example sub video selection module (500), in accordance with an embodiment of the present disclosure.
[0093] As illustrated in FIG. 5, in an embodiment, the system (108) may split the video/full video (502) into sub-clips (504) according to the action being performed in the video and generate sub-videos/sub-clips that may be clustered into action classes. Further, the system (108) may process the sub-videos to create thumbnails from action classes. The system (108) may use DL models for action identification (506) in order to detect and recognize action in the video and generate multiple classes (400-700 classes).
[0094] In an embodiment, the system (108) may apply an action identification model iteratively on the video content with a fixed window size (t seconds). The system (108) may map (508) low-level classes to high-level classes which may be directly useful to signify intimacy. Further, a probability threshold (cut-off) corresponding to the action class may be computed by the system (108). The threshold for action classes such as physical intimacy may be high, while the probability threshold for action classes involving crying, sadness, and thinking may be comparatively low. Successive windows of images may be merged (510) by the system (108) if they belong to the same action class. Further, the system (108) may generate sub-video data frames (512) based on the merged images.
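The following Python sketch illustrates the windowing, low-to-high class mapping, and merging described above. Here, identify_action and low_to_high are placeholders for the action identification model and the class mapping, and the window size is an illustrative default rather than a value specified by the disclosure.

```python
# Hedged sketch of steps 504-512: slide a fixed window over the video, map each
# window's low-level action label to a high-level class, merge consecutive
# windows that share a class, and emit sub-video records.
from dataclasses import dataclass

@dataclass
class SubVideo:
    start: float
    end: float
    action_class: str
    probability: float

def build_sub_videos(video_duration: float, identify_action, low_to_high: dict,
                     window: float = 5.0) -> list:
    sub_videos, t = [], 0.0
    while t < video_duration:
        end = min(t + window, video_duration)
        low_class, prob = identify_action(t, end)        # model applied per window
        high_class = low_to_high.get(low_class, "other")  # map low-level to high-level class
        if sub_videos and sub_videos[-1].action_class == high_class:
            # merge successive windows belonging to the same action class
            sub_videos[-1].end = end
            sub_videos[-1].probability = max(sub_videos[-1].probability, prob)
        else:
            sub_videos.append(SubVideo(t, end, high_class, prob))
        t += window
    return sub_videos
```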
[0095] As illustrated in FIG. 5, the following steps may be utilized by the system (108).
[0096] At step 502: The system (108) may receive a full length video.
[0097] At step 504: The system (108) may split the video into sub-clips/sub-videos with a specific window size (t second).
[0098] At step 506: The system (108) may identify the action in the sub-videos.
[0099] At step 508: The system (108) may map the low level classes to the high level classes.
[00100] At step 510: The system (108) may merge successive sub-videos according to their action classes.
[00101] At step 512: The system (108) may generate sub-video data frames that may include a start time, an end time, an action class, and a class probability.
[00102] FIG. 6 illustrates an exemplary actor detection module (600), in accordance with an embodiment of the present disclosure.
[00103] As illustrated in FIG. 6, in an embodiment, the system (108) may detect a face among the extracted video frames using an AI-based face detection model (e.g., an MTCNN face detector and locator). The system (108) may apply a face embedding model (i.e., a suitable AI model such as FaceNet/ResNet) on the face extracted from the video frame to obtain a face vector. The face vector data may be used to train a machine learning classifier (e.g., SVM) to predict an actor class. The actor’s image database may be created and maintained by the system (108). Further, the system (108) may retrieve the actor images from the database for actor identification, where a trained classifier may be applied on the images (filtered by quality). The classifier may return a list of actors in the image with a probability that the face appears in the image, and emotions may be tagged in the images. Further, a DL based emotion recognition model may be trained on the tagged images. The emotion recognition model may be trained to identify the facial expression in the detected face.
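An illustrative Python sketch of the face detection, embedding, and actor classification pipeline is given below. It assumes the facenet_pytorch package for the MTCNN detector and a FaceNet-style embedder, and scikit-learn for the classifier; none of these particular packages are mandated by the present disclosure.

```python
# Illustrative pipeline for steps 606-612, assuming the facenet_pytorch package.
import torch
from facenet_pytorch import MTCNN, InceptionResnetV1
from sklearn.svm import SVC

mtcnn = MTCNN(image_size=160)                                 # face detection and alignment
embedder = InceptionResnetV1(pretrained="vggface2").eval()    # face crop -> 512-d face vector

def face_vector(pil_image):
    """Detect the most prominent face in the frame and return its embedding, or None."""
    face = mtcnn(pil_image)
    if face is None:
        return None
    with torch.no_grad():
        return embedder(face.unsqueeze(0)).squeeze(0).numpy()

def train_actor_classifier(face_vectors, actor_labels) -> SVC:
    """Train an SVM on labelled face vectors from the actor image database."""
    clf = SVC(probability=True)
    clf.fit(face_vectors, actor_labels)
    return clf
```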
[00104] At step 602: The system (108) may use cast list for generating video frames.
[00105] At step 604: The system (108) may generate the video frames.
[00106] At step 606: The system (108) may perform face detection on the video frames using the MTCNN model.
[00107] At step 608: The system (108) may apply a face embedding model (e.g., FaceNet/ResNet) on the face extracted from the video frame to obtain a face vector.
[00108] At step 610: The system (108) may train a ML classifier on the face embedding.
[00109] At step 612: The system (108) may save the trained ML classifier for identification.
[00110] FIG. 7 illustrates an example emotion recognition module (700), in accordance with an embodiment of the present disclosure.
[00111] As illustrated in FIG. 7, in an embodiment, the following steps may be utilized by the system (108).
[00112] At step 702: The system (108) may receive the video frames.
[00113] At step 704: The system (108) may perform face detection among the video frames.
[00114] At step 706: The system (108) may perform image tagging for emotion recognition.
[00115] At step 708: The system (108) may train a DL based emotion recognition model.
[00116] At step 710: The system (108) may save the trained model for emotion recognition.
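The following is a hedged sketch of step 708, fine-tuning a pretrained backbone as a facial emotion classifier on tagged face crops. The choice of ResNet-18, recent torchvision weights, and seven emotion classes are assumptions made for illustration.

```python
# Minimal sketch: replace the classifier head of a pretrained backbone and
# run one training step on a batch of tagged face crops.
import torch
import torch.nn as nn
from torchvision import models

def build_emotion_model(num_emotions: int = 7) -> nn.Module:
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, num_emotions)  # new emotion classification head
    return model

def train_step(model, faces, labels, optimizer, criterion=nn.CrossEntropyLoss()):
    optimizer.zero_grad()
    loss = criterion(model(faces), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```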
[00117] In an embodiment, the system (108) may select images based on the thumbnail score. The thumbnail score may be a weighted sum of an action class score, an actor identification and presence score, and an emotion recognition score. The system (108) may determine an actor presence score that may include a percentage of the screen occupied by the actor’s face. A threshold may be set to avoid extreme cases such as close-ups or the absence of an actor.
[00118] In an embodiment, the system (108) may use the IQA to eliminate poor quality images and select good quality images for thumbnails. Different scenes in the video may have different lighting, camera angles, etc.; therefore, each of the action class scores may be normalized to a scale of ten. Further, the thumbnail score may be computed by weighting the above scores. Finally, the images with the top K thumbnail scores may be selected by the de-duplication module for further processing.
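A minimal Python sketch of the score normalization, weighted thumbnail score, and thresholded actor presence score follows. The weights, normalization range, and presence thresholds are illustrative placeholders and are not values specified by the present disclosure.

```python
# Sketch of the thumbnail scoring described above: normalize component scores
# to a 0-10 scale within an action class and combine them as a weighted sum.
def normalize(scores):
    lo, hi = min(scores), max(scores)
    return [10.0 * (s - lo) / (hi - lo) if hi > lo else 5.0 for s in scores]

def thumbnail_score(iqa, action, actor_presence, emotion,
                    weights=(0.4, 0.2, 0.2, 0.2)):
    return sum(w * s for w, s in zip(weights, (iqa, action, actor_presence, emotion)))

def actor_presence_score(face_area, frame_area, min_frac=0.02, max_frac=0.5):
    frac = face_area / frame_area
    # Threshold extremes: no actor or an extreme close-up both score zero.
    return 0.0 if frac < min_frac or frac > max_frac else frac / max_frac * 10.0
```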
[00119] FIG. 8 illustrates an example actor and IQA standardized scoring module (800), in accordance with an embodiment of the present disclosure.
[00120] As illustrated in FIG. 8, in an embodiment, the following steps may be utilized by the system (108).
[00121] At step 802: The system (108) may select sub-videos from a selection module and generate a sub-video data frame.
[00122] At step 804: The system (108) may aggregate sub-videos by the action classes.
[00123] At step 806: The system (108) may compute a weighted score to generate a standardized IQA score and a standardized actor presence score.
[00124] At step 808: The system (108) may select top K images as thumbnails.
[00125] At step 810: The system (108) may generate thumbnails for each action class.
[00126] FIG. 9 illustrates an exemplary de-duplication module (900), in accordance with an embodiment of the present disclosure.
[00127] As illustrated in FIG. 9, in an embodiment, the images selected by the system (108) based on the thumbnail score may have high similarity to other images; thus, deduplication may be required to eliminate such similar images from the set of thumbnails. Instead of explicit clustering, the system (108) may vectorize all potential thumbnails using colour histograms. Further, the system (108) may determine a similarity matrix for all thumbnails. The system (108) may iterate through the similarity matrix, where for each image, the system (108) may collect all images whose similarity is greater than the threshold. If a collected image already belongs to a group, the system (108) may remove the collected image from the current group. The system (108) may select one image from each group with the highest thumbnail score.
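The de-duplication flow may be sketched as follows, assuming OpenCV for colour histogram vectorization and cosine similarity as the similarity measure. The histogram bin counts and the similarity threshold are illustrative assumptions.

```python
# Sketch of the de-duplication flow (steps 902-916): vectorize thumbnails with
# colour histograms, group near-duplicates via a similarity matrix, and keep
# the highest-scoring image from each group.
import cv2
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def histogram_vector(image_bgr) -> np.ndarray:
    hist = cv2.calcHist([image_bgr], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def deduplicate(images, scores, threshold=0.9):
    """Return indices of the images kept as de-duplicated thumbnails."""
    vectors = np.stack([histogram_vector(img) for img in images])
    sim = cosine_similarity(vectors)
    assigned, keep = set(), []
    for i in range(len(images)):
        if i in assigned:
            continue
        group = [j for j in range(len(images))
                 if j not in assigned and sim[i, j] >= threshold]
        assigned.update(group)
        keep.append(max(group, key=lambda j: scores[j]))  # best image per group
    return keep
```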
[00128] As illustrated in FIG. 9, the system (108) may perform the following steps.
[00129] At step 902: The system (108) may select images for each thumbnail for each action class.
[00130] At step 904: The system (108) may perform colour histogram vectorization.
[00131] At step 906: The system (108) may generate an image vector.
[00132] At step 908: The system (108) may compute the similarity matrix for the image vector.
[00133] At step 910: The system (108) may determine if similarity is greater than threshold. Based on a negative determination from this step, the system (108) may go to step 916.
[00134] At step 912: Based on a positive determination from step 910, the system (108) may collect all vectors for each image.
[00135] At step 914: The system (108) may keep the best image as thumbnail and eliminate others.
[00136] At step 916: The system (108) may generate deduplicated thumbnails.
[00137] FIG. 10 illustrates an exemplary computer system (1000) in which or with which embodiments of the present disclosure may be implemented.
[00138] As shown in FIG. 10, the computer system (1000) may include an external storage device (1010), a bus (1020), a main memory (1030), a read-only memory (1040), a mass storage device (1050), a communication port(s) (1060), and a processor (1070). A person skilled in the art will appreciate that the computer system (1000) may include more than one processor and communication ports. The processor (1070) may include various modules associated with embodiments of the present disclosure. The communication port(s) (1060) may be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports. The communication port(s) (1060) may be chosen depending on a network, such as a Local Area Network (LAN), Wide Area Network (WAN), or any network to which the computer system (1000) connects.
[00139] In an embodiment, the main memory (1030) may be Random Access Memory (RAM), or any other dynamic storage device commonly known in the art. The read-only memory (1040) may be any static storage device(s) e.g., but not limited to, a Programmable Read Only Memory (PROM) chip for storing static information e.g., start-up or basic input/output system (BIOS) instructions for the processor (1070). The mass storage device (1050) may be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage solutions include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces).
[00140] In an embodiment, the bus (1020) may communicatively couple the processor(s) (1070) with the other memory, storage, and communication blocks. The bus (1020) may be, e.g., a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), USB, or the like, for connecting expansion cards, drives, and other subsystems, as well as other buses, such as a front side bus (FSB), which connects the processor (1070) to the computer system (1000).
[00141] In another embodiment, operator and administrative interfaces, e.g., a display, keyboard, and cursor control device may also be coupled to the bus (1020) to support direct operator interaction with the computer system (1000). Other operator and administrative interfaces can be provided through network connections connected through the communication port(s) (1060). Components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system (1000) limit the scope of the present disclosure.
[00142] While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the disclosure. These and other changes in the preferred embodiments of the disclosure will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be implemented merely as illustrative of the disclosure and not as a limitation.
ADVANTAGES OF THE INVENTION
[00143] The present disclosure provides a system and a method that helps in generating thumbnails from a video which is densely populated with action and various actors.
[00144] The present disclosure provides a system and a method that generates a customized experience for a user based on the user preference.
[00145] The present disclosure provides a system and a method that maps actions to timestamps in the video and enables generation of a trailer based on action classes.
[00146] The present disclosure provides a system and a method that enables a user to skip directly to an action scene based on his/her interest via the timestamped action information.
[00147] The present disclosure provides a system and a method that recommends personalized artwork based on user history, thereby increasing user satisfaction and engagement.
CLAIMS:
1. A system (108) for generating personalized artwork, the system (108) comprising:
a processor (202); and
a memory (204) operatively coupled with the processor (202), wherein said memory (204) stores instructions which, when executed by the processor (202), cause the processor (202) to:
receive one or more media files from a user (102) via one or more computing devices (104), wherein the one or more media files are associated with a profile of the user (102);
identify an action performed among the one or more media files;
detect one or more attributes among the one or more media files;
extract one or more frames from the one or more media files and select at least one frame from the one or more frames having the highest Image Quality Assessment (IQA) score;
generate one or more images based on the one or more attributes and said at least one frame having the highest IQA score;
generate a quality assessment score for the one or more images;
generate personalized artwork based on the quality assessment score and the profile of the user (102); and
provide the personalized artwork to the user (102).
2. The system (108) as claimed in claim 1, wherein the one or more attributes comprise: a face and an emotion associated with the face.
3. The system (108) as claimed in claim 1, wherein the processor (202) is to perform deduplication of the one or more images and generate the quality assessment score for the one or more images based on the deduplication.
4. The system (108) as claimed in claim 1, wherein the processor (202) is to annotate the extracted one or more frames, and train a machine learning model on an annotated dataset to predict the highest IQA score.
5. The system (108) as claimed in claim 4, wherein the trained machine learning model comprises at least one of: a vision transformer (VIT) technique, and a combination of one or more Artificial Intelligence (AI) models including a support vector machine (SVM) and an extreme gradient (XG) boost classifier to classify the annotated one or more frames.
6. The system (108) as claimed in claim 5, wherein the processor (202) is to bifurcate the one or more media files into one or more action subclasses, and identify the action performed among the one or more action subclasses via the machine learning model.
7. The system (108) as claimed in claim 2, wherein the processor (202) is to detect the face using an Artificial Intelligence (AI)-based face detector that comprises a Multi-Task Cascaded Convolutional Network (MTCNN) model, and generate a face vector associated with the one or more attributes.
8. The system (108) as claimed in claim 7, wherein the processor (202) is to train a machine learning classifier on a face vector database to predict an actor class from the one or more attributes.
9. The system (108) as claimed in claim 2, wherein the processor (202) is to use an emotion recognition model to identify the emotion associated with the face.
10. A method for generating personalized artwork, the method comprising:
receiving, by a processor (202) associated with a system (108), one or more media files from a user (102) via one or more computing devices (104), wherein the one or more media files are associated with a profile of the user (102);
identifying, by the processor (202), an action performed among the one or more media files;
detecting, by the processor (202), one or more attributes among the one or more media files;
extracting, by the processor (202), one or more frames from the one or more media files and selecting at least one frame among the one or more frames having the highest Image Quality Assessment (IQA) score;
generating, by the processor (202), one or more images based on the one or more attributes and said at least one frame having the highest IQA score;
generating, by the processor (202), a quality assessment score for the one or more images;
generating, by the processor (202), personalized artwork based on the quality assessment score and the profile of the user (102); and
providing, by the processor (202), the personalized artwork to the user (102).
11. The method as claimed in claim 10, comprising performing, by the processor (202), deduplication of the one or more images, and generating the quality assessment score for the one or more images based on the deduplication.
12. The method as claimed in claim 10, comprising annotating, by the processor (202), the extracted one or more frames, and training a machine learning model on an annotated dataset to predict the highest IQA score.
13. The method as claimed in claim 12, comprising bifurcating, by the processor (202), the one or more media files into one or more action subclasses, and identifying the action performed among the one or more action subclasses via the machine learning model.
14. A user equipment (UE) (104) for sending requests, the UE (104) comprising:
one or more processors communicatively coupled to a processor (202) associated with a system (108), wherein the one or more processors are coupled with a memory, and wherein said memory stores instructions which, when executed by the one or more processors, cause the one or more processors to:
transmit one or more media files to the processor (202), wherein the one or more media files are associated with a profile of a user (102) associated with the UE (104);
wherein the processor (202) is configured to:
receive the one or more media files from the user (102) associated with the UE (104);
identify an action performed among the one or more media files;
detect one or more attributes among the one or more media files;
extract one or more frames from the one or more media files and select at least one frame among the one or more frames having the highest Image Quality Assessment (IQA) score;
generate one or more images based on the one or more attributes and said at least one frame having the highest IQA score;
generate a quality assessment score for the one or more images;
generate personalized artwork based on the quality assessment score and the profile of the user (102); and
provide the personalized artwork to the user (102) associated with the UE (104).