Abstract: The present disclosure relates to a system and a method for recognising products in content. The system (110) identifies image frames comprised in a content via a Hue Saturation Value (HSV) value, detects products from each image frame based on the HSV value, and extracts, via a representation model, feature vectors from the detected products. Further, the system (110) compares the extracted feature vectors with feature vectors of catalogue images stored in a catalogue feature database to recognise visually similar products from the catalogue feature database, determines recommendations corresponding to the recognised products, and saves the determined recommendations corresponding to the recognised products as metadata in a metadata database. The metadata including the determined recommendations corresponding to the recognised products is displayed to a user on a user equipment (104) while viewing the content.
DESC:RESERVATION OF RIGHTS
[001] A portion of the disclosure of this patent document contains material which is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, Integrated Circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as the owner). The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.
FIELD OF DISCLOSURE
[002] The embodiments of the present disclosure generally relate to computer vision-assisted fashion model synthesis, and more particularly, to a system and a method for recognising products, for example, apparels, in a content which includes, but is not limited to, Video on Demand (VOD) content, video content, Augmented Reality/Virtual Reality (AR/VR) content, holographic content, etc.
BACKGROUND OF DISCLOSURE
[003] The following description of related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is to be used only to enhance the understanding of the reader with respect to the present disclosure, and not as an admission of prior art.
[004] In general, creating synthetic data for the fashion sub-field may improve the search experience of users on an e-commerce website and help them make more informed decisions about the choice of apparel they are buying, thereby reducing returns of items. Including cloth type tags may also improve the user experience on the e-commerce website, but shopping for and matching apparels of the personalities in a Video on Demand (VOD) content to an external catalogue may be an extremely challenging task due to the visual difference between the video images and catalogue images. The images of a typical VOD content at 25 frames per second may be blurry and low in contrast, unlike high-definition retail catalogue images with proper pose.
[005] Conventional methods and systems disclose a process to recommend products from a catalogue for street images or images from customers. The conventional methods and systems assume prior knowledge of an exact category of a street image to recommend the products. However, the process fails on videos due to the absence of category information of the video images. Further, the conventional methods and systems may recognise and recommend the products in the video images using a different neural network architecture, but fail to predict gender and category information of the products from the video content.
[006] There is, therefore, a need in the art to provide a system and a method for recognising apparels in the VOD content to overcome the deficiencies of the prior art.
OBJECTS OF THE PRESENT DISCLOSURE
[007] Some of the objects of the present disclosure, which at least one embodiment herein satisfies, are listed herein below.
[008] It is an object of the present disclosure to match apparels of personalities in a content to an external catalogue.
[009] It is an object of the present disclosure to determine visually similar apparel recommendations on the content and provide the visually similar apparel recommendations to a user on a user equipment, thereby enhancing the user experience.
[0010] It is an object of the present disclosure to automatically update the recommendations with new products at regular intervals.
[0011] It is an object of the present disclosure to enhance the user experience of buying the products while viewing the content.
SUMMARY
[0012] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
[0013] In an aspect, the present disclosure relates to a system for recognising products in a content, the system includes one or more processors, and a memory operatively coupled to the one or more processors. The memory includes processor-executable instructions, which on execution, cause the one or more processors to identify each image frame of a plurality of image frames comprised in a content via a Hue Saturation Value (HSV) value, detect one or more products from each image frame of the plurality of image frames based on the HSV value of each image frame, extract, via a representation model, feature vectors from the detected one or more products, compare the extracted feature vectors with feature vectors of one or more catalogue images stored in a catalogue feature database to recognise visually similar products from the catalogue feature database, and determine recommendations corresponding to the recognised products and save the determined recommendations as metadata in a metadata database. The metadata including the determined recommendations corresponding to the recognised products is displayed to a user on a user equipment while viewing the content.
[0014] In an embodiment, the one or more processors may detect the one or more products from each image frame of the plurality of image frames based on the HSV value by being configured to: detect one or more slot boundaries in the content to identify at least one window for displaying the one or more products based on the HSV value of each image frame, detect classes of the one or more products, person bounding boxes, face bounding boxes, and segmented images from the content based on the one or more slot boundaries in the content, determine an Intersection over Union (IoU) threshold between the person bounding boxes and the face bounding boxes to map at least one person detected in at least one image frame with at least one face detected in the at least one image frame, map the one or more products with at least one gender based on the IoU threshold, and obtain the plurality of image frames with the one or more products mapped to the at least one gender on the content.
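The IoU-based mapping between person and face bounding boxes described above can be sketched in Python. This is a minimal illustrative sketch only: the function names, the `(x1, y1, x2, y2)` box format, and the 0.1 default threshold are assumptions not taken from the disclosure.

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); this format is an assumption for illustration.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def map_faces_to_persons(person_boxes, face_boxes, threshold=0.1):
    """Map each face index to the person box with the highest IoU above the
    threshold, or to None when no person box overlaps enough. The default is
    deliberately low: a face box occupies only a small fraction of its person
    box, so their IoU is small even for true pairs."""
    mapping = {}
    for f_idx, face in enumerate(face_boxes):
        best, best_iou = None, threshold
        for p_idx, person in enumerate(person_boxes):
            score = iou(person, face)
            if score > best_iou:
                best, best_iou = p_idx, score
        mapping[f_idx] = best
    return mapping
```

Once each face is tied to a person box, a gender label predicted for the face can be propagated to the apparels segmented within that person box.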
[0015] In an embodiment, the one or more processors may recognise the visually similar products from the catalogue feature database by being configured to: search similar products from the catalogue feature database based on the extracted feature vectors, determine a histogram of the search results and categorise the search results by a metric of Hellinger distances based on the extracted feature vectors, detect the one or more products with similar colour from the catalogue feature database to filter a number of products to be displayed based on the extracted feature vectors, and recognise the visually similar products from the catalogue feature database based on the detection of the one or more products with similar colour.
[0016] In an embodiment, the memory includes processor-executable instructions, which on execution, may cause the one or more processors to remove one or more duplicate products from the determined recommendations based on a detection of one or more slot boundaries in the content and search results of similar products from the catalogue feature database.
[0017] In an embodiment, the memory includes processor-executable instructions, which on execution, may cause the one or more processors to save the detected one or more slot boundaries in the content and the search results of the similar products in a form of the metadata in the metadata database.
[0018] In an aspect, the present disclosure relates to a system for recognising products in a content, the system includes one or more processors, and a memory operatively coupled to the one or more processors. The memory includes processor-executable instructions, which on execution, cause the one or more processors to receive catalogue data including one or more catalogue images from a computing device, perform addition or deletion of the catalogue data to or from a catalogue feature database in response to the reception of the catalogue data, recognise one or more products comprised in the one or more catalogue images which are visually similar to one or more products comprised in a content, based on the addition or deletion of the catalogue data, and store the recognised one or more products in the catalogue feature database.
[0019] In an embodiment, the one or more processors may perform the addition of the catalogue data to the catalogue feature database by being configured to: send the one or more catalogue images comprised in the catalogue data to a product detection and segmentation model, detect one or more catalogue products and one or more objects available in the one or more catalogue images, determine one or more catalogue product categories and one or more object detection classes based on the detection of the one or more catalogue products and the one or more objects available in the one or more catalogue images, map at least one catalogue product category to one of the one or more object detection classes based on preconfigured rules in a mapping file, extract feature vectors from the one or more catalogue images based on the mapping, and store the extracted feature vectors of the one or more catalogue images in the catalogue feature database.
[0020] In an embodiment, the one or more processors may perform the deletion of the catalogue data from the catalogue feature database by being configured to: determine whether the one or more catalogue images comprised in the catalogue data are available in a metadata database, and perform one of: mark the one or more catalogue images as delisted in the catalogue feature database in response to a determination that the one or more catalogue images are available in the metadata database, and delete the one or more catalogue images from the catalogue feature database in response to a determination that the one or more catalogue images are not available in the metadata database.
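The branch between delisting and hard deletion described in this embodiment can be sketched as follows. The in-memory `set`/`dict` stand-ins for the metadata database and the catalogue feature database, and the `status` field, are illustrative assumptions, not the actual storage layer of the disclosure.

```python
def remove_catalogue_images(image_ids, metadata_refs, catalogue_db):
    """For each catalogue image to be removed: if the metadata database still
    references it (existing recommendations point at it), only mark it as
    'delisted'; otherwise delete its entry from the catalogue feature
    database outright."""
    for img_id in image_ids:
        if img_id in metadata_refs:
            catalogue_db[img_id]["status"] = "delisted"  # keep, but hide from new searches
        else:
            catalogue_db.pop(img_id, None)               # unreferenced: safe to delete
    return catalogue_db
```

Delisting rather than deleting referenced images keeps previously saved recommendation metadata resolvable while preventing the image from appearing in new search results.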
[0021] In an aspect, the present disclosure relates to a system for recognising products in a content, the system includes one or more processors, and a memory operatively coupled to the one or more processors. The memory includes processor-executable instructions, which on execution, cause the one or more processors to identify each image frame of a plurality of image frames comprised in a content, wherein each image frame includes one or more products, extract feature vectors from the one or more products comprised in each image frame of the plurality of image frames, pull metadata corresponding to the one or more products comprised in each image frame from a metadata database based on the extracted feature vectors, recognise, via a representation model, new products associated with the extracted feature vectors in a catalogue feature database; merge the pulled metadata with the recognised new products and send the merged metadata to the metadata database, and update recommendations corresponding to the merged metadata on the content based on the extracted feature vectors.
[0022] In an embodiment, the memory includes processor-executable instructions, which on execution, may cause the one or more processors to delete outdated products or out-of-stock products from the metadata.
[0023] In an embodiment, the memory includes processor-executable instructions, which on execution, may cause the one or more processors to delete outdated catalogue products from the catalogue feature database.
[0024] In another aspect, the present disclosure relates to a method for recognising products on a content. The method includes: identifying, by a processor associated with a system, each image frame of a plurality of image frames comprised in a content via a HSV value, detecting, by the processor, one or more products from each image frame of the plurality of image frames based on the HSV value of each image frame, extracting, via a representation model, by the processor, feature vectors from the detected one or more products, comparing, by the processor, the extracted feature vectors with feature vectors of one or more catalogue images stored in a catalogue feature database to recognise visually similar products from the catalogue feature database, and determining, by the processor, recommendations corresponding to the recognised products and saving the determined recommendations as metadata in a metadata database. The metadata including the determined recommendations corresponding to the recognised products is displayed to a user on a user equipment while viewing the content.
[0025] In an embodiment, detecting, by the processor, the one or more products from each image frame of the plurality of image frames based on the HSV value may include: detecting, by the processor, one or more slot boundaries in the content to identify at least one window for displaying the one or more products based on the HSV value of each image frame, detecting, by the processor, classes of the one or more products, person bounding boxes, face bounding boxes, and segmented images from the content based on the one or more slot boundaries in the content, determining, by the processor, an Intersection over Union (IoU) threshold between the person bounding boxes and the face bounding boxes to map at least one person detected in at least one image frame with at least one face detected in the at least one image frame, mapping, by the processor, the one or more products with at least one gender based on the IoU threshold, and obtaining, by the processor, the plurality of image frames with the one or more products mapped to the at least one gender on the content.
[0026] In an embodiment, recognising, by the processor, the visually similar products from the catalogue feature database may include: searching, by the processor, similar products from the catalogue feature database based on the extracted feature vectors, determining, by the processor, a histogram of the search results and categorising the search results by a metric of Hellinger distances based on the extracted feature vectors, detecting, by the processor, the one or more products with similar colour from the catalogue feature database to filter a number of products to be displayed based on the extracted feature vectors, and recognising, by the processor, the visually similar products from the catalogue feature database based on the detection of the one or more products with similar colour.
[0027] In an embodiment, the method may include removing, by the processor, one or more duplicate products from the determined recommendations, based on a detection of one or more slot boundaries in the content and search results of similar products from the catalogue feature database.
[0028] In an embodiment, the method may include saving, by the processor, the detected one or more slot boundaries in the content and the search results of the similar products in a form of the metadata in the metadata database.
[0029] In another aspect, the present disclosure relates to a method for recognising products on a content. The method includes: receiving, by a processor associated with a system, catalogue data comprising one or more catalogue images from a computing device, performing, by the processor, addition or deletion of the catalogue data to or from a catalogue feature database in response to the reception of the catalogue data, recognising, by the processor, one or more products comprised in the one or more catalogue images which are visually similar to one or more products comprised in a content, based on the addition or deletion of the catalogue data, and storing, by the processor, the recognised one or more products in the catalogue feature database.
[0030] In an embodiment, performing, by the processor, the addition of the catalogue data to the catalogue feature database may include: sending, by the processor, the one or more catalogue images comprised in the catalogue data to a product detection and segmentation model, detecting, by the processor, one or more catalogue products and one or more objects available in the one or more catalogue images, determining, by the processor, one or more catalogue product categories and one or more object detection classes based on the detection of the one or more catalogue products and the one or more objects available in the one or more catalogue images, mapping, by the processor, at least one catalogue product category to one of the one or more object detection classes based on preconfigured rules in a mapping file, extracting, by the processor, feature vectors from the one or more catalogue images based on the mapping, and storing, by the processor, the extracted feature vectors of the one or more catalogue images in the catalogue feature database.
[0031] In an embodiment, performing, by the processor, the deletion of the catalogue data from the catalogue feature database may include: determining, by the processor, whether the one or more catalogue images comprised in the catalogue data are available in a metadata database, and performing, by the processor, one of: marking the one or more catalogue images as delisted in the catalogue feature database in response to a determination that the one or more catalogue images are available in the metadata database, and deleting the one or more catalogue images from the catalogue feature database in response to a determination that the one or more catalogue images are not available in the metadata database.
[0032] In another aspect, the present disclosure relates to a method for recognising products on a content. The method includes: identifying, by a processor associated with a system, each image frame of a plurality of image frames comprised in a content, wherein each image frame comprises one or more products, extracting, by the processor, feature vectors from the one or more products comprised in each image frame of the plurality of image frames, pulling, by the processor, metadata corresponding to the one or more products comprised in each image frame from a metadata database based on the extracted feature vectors, recognising, via a representation model, by the processor, new products associated with the extracted feature vectors in a catalogue feature database, merging, by the processor, the pulled metadata with the recognised new products and sending the merged metadata to the metadata database, and updating, by the processor, recommendations corresponding to the merged metadata on the content based on the extracted feature vectors.
[0033] In an embodiment, the method may include deleting, by the processor, outdated products or out-of-stock products from the metadata.
[0034] In an embodiment, the method may include deleting, by the processor, outdated catalogue products from the catalogue feature database.
[0035] In another aspect, the present disclosure relates to a user equipment. The user equipment includes one or more processors, and a memory operatively coupled to the one or more processors, wherein the memory includes processor-executable instructions, which on execution, cause the one or more processors to receive metadata including recommendations corresponding to recognised products from a system, and display the received metadata including the recommendations corresponding to the recognised products to a user while viewing a content. The one or more processors are communicatively coupled with the system, and the system is configured to: identify each image frame of a plurality of image frames comprised in a content via a HSV value, detect one or more products from each image frame of the plurality of image frames based on the HSV value of each image frame, extract, via a representation model, feature vectors from the detected one or more products, compare the extracted feature vectors with feature vectors of one or more catalogue images stored in a catalogue feature database to recognise visually similar products from the catalogue feature database, and determine the recommendations corresponding to the recognised products and save the determined recommendations as metadata in a metadata database.
BRIEF DESCRIPTION OF THE DRAWINGS
[0036] The accompanying drawings, which are incorporated herein and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that such drawings include the electrical components, electronic components, or circuitry commonly used to implement such components.
[0037] The diagrams are for illustration only and thus do not limit the present disclosure, wherein:
[0038] FIG. 1 illustrates an exemplary network architecture (100) in which or with which the proposed system (110) may be implemented, in accordance with an embodiment of the present disclosure.
[0039] FIG. 2 illustrates an exemplary block diagram (200) of a product recognition system (110), in accordance with an embodiment of the present disclosure.
[0040] FIG. 3 illustrates an exemplary block diagram (300) of key components of the product recognition system (110), in accordance with an embodiment of the present disclosure.
[0041] FIG. 4 illustrates an exemplary flow diagram (400) of video processing, in accordance with an embodiment of the present disclosure.
[0042] FIG. 5 illustrates an exemplary flow diagram (500) of catalogue processing, in accordance with an embodiment of the present disclosure.
[0043] FIG. 6 illustrates an exemplary flow diagram (600) of a product recommendation update process, in accordance with an embodiment of the present disclosure.
[0044] FIG. 7 illustrates an exemplary representation (700) of variations of mean Hue Saturation Value (HSV) values per frame for a content, in accordance with an embodiment of the present disclosure.
[0045] FIG. 8 illustrates an exemplary representation (800) of non-single connected component segmentation, in accordance with an embodiment of the present disclosure.
[0046] FIG. 9 illustrates an exemplary representation (900) of product recommendation on a content at pause, in accordance with an embodiment of the present disclosure.
[0047] FIG. 10 illustrates an exemplary computer system (1000) in which or with which embodiments of the present disclosure may be utilized, in accordance with embodiments of the present disclosure.
[0048] The foregoing shall be more apparent from the following more detailed description of the disclosure.
DETAILED DESCRIPTION
[0049] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0050] The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
[0051] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0052] Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0053] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.
[0054] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0055] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0056] The present disclosure provides a robust and effective system and method to recognise visually similar products, for example, apparels, on a content and provide recommendations to a user on a user equipment while viewing the content. The content may include, but is not limited to, a Video on Demand (VOD) content, a video content, an Augmented Reality/Virtual Reality (AR/VR) content, a holographic content, etc. The present disclosure matches the apparels of the personalities in the content to an external catalogue. Also, the present disclosure automatically updates the recommendations with new products at regular intervals. The proposed disclosure uses techniques such as visual search, deep learning, deep neural networks, for example, deepDream or deep Neural Network (deepNN), computer vision, and other artificial intelligence (AI) techniques applied on the content. The present disclosure may detect the product, for example, the apparel worn by a person, on the content on a frame basis, and trigger an advertisement based on the detected product.
[0057] Certain terms and phrases have been used throughout the disclosure and will have the following meanings in the context of the ongoing disclosure.
[0058] The term “VOD content” may refer to Video on Demand content or video content such as movies and television shows delivered directly to individual customers for immediate viewing.
[0059] The term “products” may refer to apparels of personalities in the VOD content, the video content, AR/VR content, a holographic content, etc.
[0060] Various embodiments of the present disclosure will be explained in detail with reference to FIGs. 1-10.
[0061] FIG. 1 illustrates an exemplary network architecture (100) in which or with which the proposed system may be implemented, in accordance with an embodiment of the present disclosure.
[0062] As illustrated in FIG. 1, by way of example and not by way of limitation, the exemplary network architecture (100) may include a plurality of computing devices (104-1, 104-2…104-N), which may be individually referred to as the computing device (104) and collectively referred to as the computing devices (104). The plurality of computing devices (104) may include, but not be limited to, scanners such as cameras, webcams, scanning units, and the like, configured to scan a plurality of images including one or more products (102) (also referred to as apparels herein) from a content. Further, the network architecture (100) may include a system (110) that may receive and identify a plurality of image frames from a database associated with a centralised server (112). The system (110) may extract a set of attributes or feature vectors of the one or more products (102) from the plurality of image frames pertaining to, for example, facial features, body features, slot, person, gender, and the like.
[0063] For example, each slot from the plurality of image frames may be identified using content-based shot boundary detection by analysing variations in the mean Hue Saturation Value (HSV) value of the plurality of image frames. The system (110) may detect bounding boxes of a person in the plurality of image frames using a model, and face bounding boxes in the plurality of image frames using RetinaNet. The system (110) may use ShuffleNet, as an example, to label the gender of the detected faces. For example, the person may be detected by using, but not limited to, a You Only Look Once version 4 (YOLO v4) model to find the bounding boxes of the person in the content.
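The slot/shot boundary detection from mean HSV variations can be sketched as below. In practice the per-frame mean would be computed from decoded video frames (for instance via OpenCV's `cv2.cvtColor` to HSV followed by a mean); here the sequence of per-frame means is assumed to be precomputed, and the jump threshold is an illustrative assumption.

```python
def slot_boundaries(mean_hsv_per_frame, jump_threshold=20.0):
    """Return frame indices where the mean HSV value changes sharply
    relative to the previous frame, i.e. candidate slot boundaries."""
    boundaries = []
    for i in range(1, len(mean_hsv_per_frame)):
        if abs(mean_hsv_per_frame[i] - mean_hsv_per_frame[i - 1]) > jump_threshold:
            boundaries.append(i)
    return boundaries
```

Frames between consecutive boundaries belong to one slot, so apparel detections within a slot can be aggregated rather than recomputed per frame.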
[0064] In an exemplary embodiment, the system (110) may be equipped with an AI engine (214) that may train a fashion synthesis model based on the extracted set of feature vectors and data stored in the database to generate a plurality of priors for apparel segmentation, segmentation filtering, aggregation, feature extraction, and search. In an exemplary embodiment, the AI engine (214) may be configured to employ one or more pre-trained fashion synthesis models to detect a person, face, gender, apparel, or the like in each image frame in the plurality of image frames received by the system (110).
[0065] In an embodiment, the AI engine (214) may generate an AI model to detect and segment out the apparels. The detected apparels may be filtered based on a number of connected components determined using the AI model. In an embodiment, the detected person bounding boxes, the face bounding boxes, and the output of the apparel segmentation model may be combined to label the gender associated with the detected apparels. This process may eliminate possible false positives.
[0066] In an embodiment, the detected apparels may be passed through the AI model to extract at least 128-dimension feature vectors but not limited to the like. In another embodiment, the feature vectors of the detected apparels may be compared with one or more feature vectors of one or more catalogue images using an approximate nearest neighbour search to recognise visually similar products to provide similar product recommendations to a user.
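The nearest-neighbour comparison described above may be illustrated with a minimal sketch. This is not the disclosed implementation: it performs an exact brute-force search in plain NumPy as a stand-in for the approximate nearest neighbour index used at scale, and the 2-dimensional vectors in the usage example stand in for the 128-dimension feature vectors.

```python
import numpy as np

def nearest_catalogue_items(query_vec, catalogue_vecs, k=3):
    """Return indices of the k catalogue feature vectors closest to
    `query_vec` by Euclidean distance.  An approximate index (e.g.
    FAISS) would replace this exact scan for large catalogues."""
    dists = np.linalg.norm(catalogue_vecs - query_vec, axis=1)
    return np.argsort(dists)[:k].tolist()
```

The returned indices identify the visually similar catalogue products whose recommendations are surfaced to the user.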
[0067] In an embodiment, filtering of the product recommendations may be done by determining a histogram and comparing the recommendations against recommended catalogue images using, but not limited to, a Hellinger distance. For example, the apparel detection may be performed by using, but not limited to, a Mask Region-Based Convolutional Neural Network (Mask-RCNN) model.
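For illustration, the Hellinger-distance colour filtering mentioned above may be sketched as follows. This is a minimal sketch under the assumption that colour histograms have already been computed for the detected apparel and each candidate recommendation; the function names are illustrative, not from the disclosure.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two histograms
    (0 = identical distributions, 1 = no overlap)."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))

def sort_by_colour(query_hist, candidate_hists):
    """Order candidate recommendations so that items with a similar
    colour distribution come first."""
    d = [hellinger(query_hist, h) for h in candidate_hists]
    return sorted(range(len(candidate_hists)), key=lambda i: d[i])
```

Sorting by this metric brings similarly coloured catalogue results closer to the top of the recommendation list.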
[0068] In an exemplary embodiment, one or more contents may be processed and slot detection may be used to find different slots for understanding the apparel changes in the one or more contents. The output of the slot detection may then be sent to a person detection module to find one or more person bounding boxes. The output of the slot detection may further be sent for face detection and apparel detection to obtain the bounding boxes of faces and apparels.
[0069] In an embodiment, an aggregation module may be used to understand an overlap between the one or more bounding boxes to map the apparel to a gender, thereby reducing the search space to a relevant category. The output from the aggregation module may be sent to a representation model to obtain one or more recommended products, that is, similar products of interest obtained with the help of the mapping and one or more features generated using the representation model.
[0070] In an embodiment, colour information may be added to the one or more recommended products and the coloured one or more recommended products may then be filtered.
[0071] Once the one or more products in the content are recommended, in an exemplary embodiment, an update of the one or more recommended products may be performed to remove out-of-stock products from the one or more recommended products.
[0072] In an embodiment, each product in the content may be passed through the representation model to retrieve similar products. Before updating the one or more recommended products (also referred to as metadata), the system (110) may check whether all the products are available on an ecommerce platform and then update the metadata.
[0073] In an embodiment, the representation model may include a training model that may normalize the colour information from a catalogue to a set of, but not limited to, 4000 colour data points, use word similarity on product descriptions to understand the product details, use the colour and product detail information to generate triplets for the training, gradually increase the margin while increasing the difficulty of the triplets, use sub-sampling techniques in the training to improve the results, and explore different augmentation techniques to replicate the testing dataset.
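The triplet training with a gradually increasing margin may be illustrated as follows. This is a hedged sketch: the loss is the standard triplet loss, but the `margin_schedule` values are hypothetical, since the disclosure does not specify the exact schedule.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss: pull anchor-positive embeddings
    together and push anchor-negative apart by at least `margin`."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

def margin_schedule(epoch, start=0.2, step=0.1, cap=1.0):
    """Hypothetical schedule: raise the margin over epochs as harder
    triplets are mined (start/step/cap values are assumptions)."""
    return min(cap, start + step * epoch)
```

With a small margin the easy triplet below incurs no loss; raising the margin (as harder triplets are introduced) makes the same triplet contribute to training.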
[0074] In an embodiment, catalogue mapping may be performed using object detection models and mapping models.
[0075] In an exemplary embodiment, the system (110) may be configured to provide flexibility to modify the preference parameters as per requirements at any stage of the process.
[0076] In an embodiment, the computing device (104) may include smart devices operating in a smart environment, for example, an Internet of Things (IoT) system. In such an embodiment, the computing device (104) may include, but is not limited to, smart phones, smart watches, smart sensors (e.g., mechanical, thermal, electrical, magnetic, etc.), networked appliances, networked peripheral devices, networked lighting system, communication devices, networked vehicle accessories, networked vehicular devices, smart accessories, tablets, smart television (TV), computers, smart security system, smart home system, other devices for monitoring or interacting with or for the users and/or entities, or any combination thereof.
[0077] A person of ordinary skill in the art will appreciate that the computing device or user equipment (104) may include, but is not limited to, intelligent, multi-sensing, network-connected devices, that can integrate seamlessly with each other and/or with a central server or a cloud-computing system or any other device that is network-connected.
[0078] In an embodiment, the user equipment (104) may include, but is not limited to, a handheld wireless communication device (e.g., a mobile phone, a smart phone, a phablet device, and so on), a wearable computer device (e.g., a head-mounted display computer device, a head-mounted camera device, a wristwatch computer device, and so on), a Global Positioning System (GPS) device, a laptop computer, a tablet computer, or another type of portable computer, a media playing device, a portable gaming system, and/or any other type of computer device with wireless communication capabilities, and the like. In an embodiment, the user equipment (104) may include, but is not limited to, any electrical, electronic, or electro-mechanical equipment, or a combination of one or more of the above devices such as virtual reality (VR) devices, augmented reality (AR) devices, a laptop, a general-purpose computer, a desktop, a personal digital assistant, a tablet computer, a mainframe computer, or any other computing device, wherein the user equipment (104) may include one or more in-built or externally coupled accessories including, but not limited to, a visual aid device such as a camera, an audio aid, a microphone, a keyboard, and input devices for receiving input from the user or the entity such as a touch pad, a touch enabled screen, an electronic pen, and the like.
[0079] A person of ordinary skill in the art will appreciate that the user equipment (104) may not be restricted to the mentioned devices and various other devices may be used.
[0080] In an exemplary embodiment, the user equipment (104) may communicate with the system (110), for example, a product recognition system, through a network (106). The network (106) may include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth. A network (106) may include, by way of example but not limitation, one or more of: a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a public-switched telephone network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, some combination thereof.
[0081] Although FIG. 1 shows exemplary components of the network architecture (100), in other embodiments, the network architecture (100) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1. Additionally, or alternatively, one or more components of the network architecture (100) may perform functions described as being performed by one or more other components of the network architecture (100).
[0082] FIG. 2 illustrates an exemplary block diagram (200) of the product recognition system (110), in accordance with an embodiment of the present disclosure.
[0083] In an embodiment, and as shown in FIG. 2, the system (110) may include one or more processors (202). The one or more processors (202) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that manipulate data based on operational instructions. Among other capabilities, the one or more processor(s) (202) may be configured to fetch and execute computer-readable instructions stored in a memory (204) of the system (110). The memory (204) may store one or more computer-readable instructions or routines, which may be fetched and executed to create or share the data units over a network service. The memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as Random-Access Memory (RAM), or non-volatile memory such as Erasable Programmable Read-Only Memory (EPROM), flash memory, and the like.
[0084] In an embodiment, the system (110) may also comprise an interface(s) (206). The interface(s) (206) may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like. The interface(s) (206) may facilitate communication of the system (110) with various devices coupled to it. The interface(s) (206) may also provide a communication pathway for one or more components of the system (110). Examples of such components include, but are not limited to, processing engine(s) (208) and a database (210).
[0085] In an embodiment, the processing engine(s) (208) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) (208). In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) (208) may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the one or more processors (202) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) (208). In such examples, the system (110) may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the system (110) and the processing resource. In other examples, the processing engine(s) (208) may be implemented by electronic circuitry.
[0086] In an embodiment, the database (210) may comprise data that may be either stored or generated as a result of functionalities implemented by any of the components of the processor(s) (202) or the processing engine(s) (208) or the system (110).
[0087] In an exemplary embodiment, the processing engine(s) (208) may include one or more engines selected from any of a product detection engine (212), an AI engine (214), a recommendation engine (216), and other units/engines (218).
[0088] In an embodiment, the product detection engine (212) may identify each image frame of a plurality of image frames comprised in a content using but not limited to a HSV value. Further, the product detection engine (212) may detect one or more products from each image frame of the plurality of image frames based on the HSV value of each image frame.
[0089] In an embodiment, the AI engine (214) may extract feature vectors from the detected one or more products using a representation model or an AI model. The AI engine (214) may compare the extracted feature vectors with feature vectors of one or more catalogue images stored in a catalogue feature database to recognise visually similar products from the catalogue feature database.
[0090] In an embodiment, the recommendation engine (216) may determine recommendations corresponding to the recognised products and save the determined recommendations corresponding to the recognised products as a metadata in a metadata database.
[0091] Although FIG. 2 shows exemplary components of the product recognition system (110), in other embodiments, the product recognition system (110) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 2. Additionally, or alternatively, one or more components of the product recognition system (110) may perform functions described as being performed by one or more other components of the product recognition system (110).
[0092] FIG. 3 illustrates an exemplary block diagram (300) of key components of the product recognition system (110), in accordance with an embodiment of the present disclosure.
[0093] As illustrated in FIG. 3, in an aspect, the product recognition system (110) may include an input video (302), a catalogue (304), and a time-based trigger module (306). The input video (302) may be referred to as the video content. The input video (302) may be sent to a video processing module for video processing (308). The catalogue (304) may be sent to a catalogue processing module for catalogue processing (310). The time-based trigger module (306) may be used for performing update process (312).
[0094] In an embodiment, catalogue processing (310) may be performed to detect one or more products available in each image frame of the catalogue (304). Further, feature vectors of the one or more products available in each image frame of the catalogue (304) may be extracted and stored in a catalogue feature store (314) or a catalogue feature database.
[0095] In an embodiment, video processing (308) may be performed to identify image frames in the input video (302), and detect one or more products available in each image frame of the input video (302). Further, feature vectors of the one or more products may be extracted and the extracted feature vectors may be compared with the feature vectors of the one or more products available in each image frame of the catalogue (304). The comparison of the feature vectors may be performed to recognise visually similar products from the catalogue feature store (314) or a catalogue feature database. Furthermore, recommendations corresponding to the recognised products may be determined and the determined recommendations corresponding to the recognised products may be saved as a video metadata in a video metadata store (316).
[0096] In an embodiment, the determined recommendations corresponding to the recognised products may be updated at regular intervals by updating the video metadata stored in the video metadata store (316).
[0097] FIG. 4 illustrates an exemplary flow diagram (400) of video processing, in accordance with an embodiment of the present disclosure.
[0098] With respect to FIG. 4, the video processing module may receive an input video (404) to be processed from one or more entities or vendors (402).
[0099] In an embodiment, mean HSV values may be used to identify each image frame from a plurality of image frames of the input video (404). The HSV values may be used to detect one or more slot boundaries in the input video (404) to identify a window for displaying one or more contents (interchangeably referred to as products). The slot boundary detection process may act as a standalone process which may later be used in the removal of duplicate products in a slot.
[00100] In an embodiment, the plurality of image frames of the input video (404) may be captured and processed in a batch-wise manner (407) to optimally use disk and Graphics Processing Unit (GPU) memory for the following processes:
a. Person detection (416) to detect person bounding boxes in each image frame of the plurality of image frames of the input video (404),
b. Face detection (412) to detect face bounding boxes in each image frame of the plurality of image frames of the input video (404),
c. Gender detection (414) to assign a gender label to the detected faces,
d. Apparel segmentation (408) to detect the apparel classes, bounding boxes and segmented images from the input video (404). The predicted classes may be checked with a pre-defined mapping which contains information about vendor source category and trained classes to discard unmapped classes,
e. Segmentation filtering (410) to eliminate one or more non-single connected segments using a Scan Array Union Find (SAUF) technique but not limited to it,
f. Aggregation (418) to aggregate all the above features. Using the person and face bounding boxes, a person detected in the image frame may be mapped with a face by calculating an Intersection over Union (IoU) value between the person and face bounding boxes and comparing it against a threshold. The person and face information may be used with the apparel bounding boxes by repeating the check on the IoU threshold to further map the apparel with the gender information. By the end of the batch processing (407), the plurality of image frames with apparels may be mapped to gender across the complete video (409),
g. Apparel feature extraction (420) to extract 128-dimension feature vectors on each segmented apparel detected from the video using a Triplet Net model,
h. Search process (422) to search the feature vectors of the segmented images in a catalogue feature store (434) to recognise or obtain the nearest visually similar catalogue images. The catalogue feature store (434) may include 128-dimensional feature vectors of each product image provided by the vendor (402), indexed on the trained category and gender. For example, a Facebook Artificial Intelligence Similarity Search (FAISS) may use the feature extraction information to search for similar products in a relevant window,
i. Colour filtering (424) to calculate histogram of search process results and sort the search process results by a metric of Hellinger distances to bring similar colour results closer, and
j. Deduplication (426) to remove duplicate products by using the slot information and the search process results.
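The aggregation step (f) above may be illustrated with a minimal sketch. This is not the disclosed implementation: box layout, the IoU threshold value, and the mapping strategy (first overlapping person whose box also overlaps a gender-labelled face) are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def map_apparel_to_gender(person_boxes, face_genders, apparel_boxes, thr=0.1):
    """face_genders: list of (face_box, gender) pairs.  For each
    apparel box, return the gender of an overlapping person whose
    box also overlaps a labelled face (`thr` is an assumed value)."""
    out = []
    for ab in apparel_boxes:
        gender = None
        for pb in person_boxes:
            if iou(ab, pb) < thr:
                continue
            for fb, g in face_genders:
                if iou(pb, fb) >= thr:
                    gender = g
                    break
            if gender:
                break
        out.append(gender)
    return out
```

Mapping each detected apparel to a gender in this way restricts the subsequent catalogue search to the relevant category window.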
[00101] In an embodiment, a video metadata store (430) may save the output obtained from the above processes as video metadata (428) including the product recommendations, which is later used by an Ad server (432) to display the recommendations on the video.
[00102] FIG. 5 illustrates an exemplary flow diagram (500) of catalogue processing, in accordance with an embodiment of the present disclosure.
[00103] With respect to FIG. 5, catalogue data (504) including catalogue images may be received from an entity or a retail partner (502). The catalogue data (504) may be added to or deleted from a catalogue feature store (524) at (506).
[00104] In an embodiment, the catalogue images included in the catalogue data (504) may be sent to a product detection and segmentation model for product detection and segmentation (508). The catalogue products and objects available in the catalogue images may be detected. The catalogue product categories and object detection classes may be determined based on the detection of the catalogue products and the objects available in the catalogue images. At 510, the catalogue product category may be mapped to one of the object detection classes based on preconfigured rules in a mapping file. Further, the feature vectors of the catalogue products may be extracted, at 518, from the catalogue images based on mapping. The extracted feature vectors (526) may be added to and stored in the catalogue feature store (524). The 128-dimension feature vectors may be stored in the catalogue feature store (524).
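For illustration, the category mapping and feature-store indexing described above may be sketched as follows. The mapping entries, item field names, and the `extract_fn` hook are hypothetical; only the idea of discarding unmapped categories and indexing features on (trained class, gender) is taken from the disclosure.

```python
# Hypothetical mapping file: vendor catalogue category -> trained class
# (None marks a category that is not mapped and is discarded)
CATEGORY_MAP = {"Men's Shirts": "shirt", "Ethnic Sets": None}

def build_feature_store(catalogue_items, extract_fn):
    """catalogue_items: iterable of dicts with 'category', 'gender',
    'image'.  Returns feature vectors indexed by (class, gender),
    mirroring the category/gender indexing of the feature store."""
    store = {}
    for item in catalogue_items:
        cls = CATEGORY_MAP.get(item["category"])
        if cls is None:  # unmapped or unknown category: discard
            continue
        key = (cls, item["gender"])
        store.setdefault(key, []).append(extract_fn(item["image"]))
    return store
```

In practice `extract_fn` would be the representation model producing 128-dimension feature vectors for each catalogue image.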
[00105] In an embodiment, at 512, the catalogue processing module may determine whether the catalogue images included in the catalogue data (504) are available in a video metadata store (522). Further, the catalogue processing module may mark the catalogue images as delisted, at 520, in the catalogue feature store (524) in response to the determination that the catalogue images are available in the video metadata store (522). The catalogue processing module may, at 514, delete the catalogue images from the catalogue feature store (524) in response to the determination that the catalogue images are not available in the video metadata store (522).
[00106] For example, the object detection model may be trained on, but not limited to, an iMaterialist dataset. The catalogue product categories may be analysed on the object detection model and mapped to object detection classes. The analysis may involve iteratively running each category's images through the model, understanding whether it captures all the relevant details in the bounding box for a given catalogue category, and arriving at different thresholds for each category. Additional checks like person and face detection may also be used to support the cataloguing. One or more products may be discarded based on one or more connected components. Any object detected in the video may be filtered on the basis of the number of connected components in its binary image. This orchestrates the removal of ethnic wear and dresses. Along with it, any object having a greater number of connected components may be too flaky to give substantial recommendations and hence may be eliminated. Products present in the catalogue which have a greater number of connected segments generally may not be recommended appropriately. Therefore, enumeration of connected components in a corresponding binary image of the product may help in efficient elimination of such catalogue items.
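The connected-component filtering described above may be illustrated with a minimal sketch. This plain-Python flood fill is a stand-in for the SAUF labelling technique mentioned earlier in the disclosure, and the one-component cutoff is the simplest illustrative choice.

```python
from collections import deque

def count_components(mask):
    """Count 4-connected foreground components in a binary mask
    given as a list of lists of 0/1 values."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    n = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                n += 1
                q = deque([(y, x)])
                seen[y][x] = True
                while q:  # breadth-first flood fill of one component
                    cy, cx = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return n

def keep_segment(mask, max_components=1):
    """Discard apparel segments that split into multiple components,
    as such fragmented masks tend to yield poor recommendations."""
    return count_components(mask) <= max_components
```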
[00107] FIG. 6 illustrates an exemplary flow diagram (600) of product recommendation update process, in accordance with an embodiment of the present disclosure.
[00108] With respect to FIG. 6, once the products in the video are recommended, the product recommendations may be updated so that they do not show out-of-stock products. The update process module may use a time-based trigger module (602) for retargeting the products. The update process may be performed to remove the outdated products from the processed videos. To perform the update process, the video metadata of new products may be pulled, at 604, from a video metadata store (618) based on the extracted feature vectors. At 606, the outdated video metadata may also be pulled from the video metadata store (618). At 608, the outdated catalogue data may be deleted from a catalogue feature store (616). The video metadata of new products and the outdated video metadata may be merged at 610. The merged video metadata may be pushed back to the video metadata store (618). The merged video metadata may be processed to update the product recommendations at 612. Further, the merged video metadata may be updated with up-to-date catalogue data stored in the catalogue feature store (616) after deletion of the outdated catalogue data from the catalogue feature store (616) at 614. The merged video metadata updated with the up-to-date catalogue data may be sent to an ad server (620) to display the recommendations on the video.
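The out-of-stock removal in the update process may be illustrated as follows. This is a minimal sketch of the time-triggered refresh: the metadata shape and field names (`product_id`, per-frame recommendation lists) are assumptions, not taken from the disclosure.

```python
def refresh_recommendations(video_metadata, in_stock_ids):
    """Drop recommendations whose product is no longer in stock.
    video_metadata: {frame_id: [recommendation dicts]};
    in_stock_ids: set of currently available product ids."""
    refreshed = {}
    for frame_id, recs in video_metadata.items():
        kept = [r for r in recs if r["product_id"] in in_stock_ids]
        if kept:  # frames left with no valid recommendation are dropped
            refreshed[frame_id] = kept
    return refreshed
```

A scheduler (the time-based trigger) would call this periodically and push the refreshed metadata back to the video metadata store.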
[00109] FIG. 7 illustrates an exemplary representation (700) of variations of mean HSV values per frame for a sample video, in accordance with an embodiment of the present disclosure.
[00110] With respect to FIG. 7, a plurality of variations of mean HSV values per video frame for a sample video input may be depicted. The mean HSV values of the image frames may be used to detect the slot boundaries in the videos to identify the window for displaying the products. The plurality of variations may empirically depict the distribution of slot duration in a particular type of video. The inference may signify the temporality of the apparel detected. The skewness of the distribution may decide the time provided for a particular catalogue item to be visualised for shopping.
[00111] FIG. 8 illustrates an exemplary representation (800) of non-single connected component segmentation, in accordance with an embodiment of the present disclosure.
[00112] With respect to FIG. 8, the non-single connected component segmentation is depicted. Non-single connected component segments of apparels such as, for example but not limited to, a dress, top, blazer, saree, etc., may be detected from the image frames of the input video.
[00113] FIG. 9 illustrates an exemplary representation (900) of product recommendation on the content at pause, in accordance with an embodiment of the present disclosure.
[00114] With respect to FIG. 9, sample recommendations of products may be determined while pausing the content, and displayed on a user equipment while viewing the content.
[00115] FIG. 10 illustrates an exemplary computer system (1000) in which or with which embodiments of the present disclosure may be implemented.
[00116] As shown in FIG. 10, the computer system (1000) may include an external storage device (1010), a bus (1020), a main memory (1030), a read only memory (1040), a mass storage device (1050), a communication port (1060), and a processor (1070).
[00117] A person skilled in the art will appreciate that the computer system (1000) may include more than one processor and communication ports. The processor (1070) may include various modules associated with embodiments of the present disclosure.
[00118] In an embodiment, the communication port (1060) may be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports. The communication port (1060) may be chosen depending on a network, such as a Local Area Network (LAN), a Wide Area Network (WAN), or any network to which the computer system (1000) connects.
[00119] In an embodiment, the memory (1030) may be a Random Access Memory (RAM), or any other dynamic storage device commonly known in the art. The read-only memory (1040) may be any static storage device(s) e.g., but not limited to, a Programmable Read Only Memory (PROM) chips for storing static information e.g., start-up or Basic Input/Output system (BIOS) instructions for the processor (1070).
[00120] In an embodiment, the mass storage (1050) may be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage solutions include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces), one or more optical discs, Redundant Array of Independent Disks (RAID) storage, e.g., an array of disks (e.g., SATA arrays).
[00121] In an embodiment, the bus (1020) communicatively couples the processor(s) (1070) with the other memory, storage, and communication blocks. The bus (1020) may be, e.g., a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), Universal Serial Bus (USB) or the like, for connecting expansion cards, drives, and other subsystems as well as other buses, such as a front side bus (FSB), which connects the processor (1070) to the computer system (1000).
[00122] Optionally, operator and administrative interfaces, e.g., a display, keyboard, joystick, and a cursor control device, may also be coupled to the bus (1020) to support direct operator interaction with the computer system (1000). Other operator and administrative interfaces may be provided through network connections connected through the communication port (1060). Components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system (1000) limit the scope of the present disclosure.
[00123] While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.
ADVANTAGES OF THE PRESENT DISCLOSURE
[00124] The present disclosure matches apparels of personalities in a content to an external catalogue.
[00125] The present disclosure determines visually similar apparel recommendations on the content and provides the visually similar apparel recommendations to a user on a user equipment, thereby enhancing the user experience.
[00126] The present disclosure automatically updates the recommendations with new products at regular intervals.
[00127] The present disclosure enhances the user experience of shopping for products while viewing the content.
CLAIMS:
1. A system (110) for recognising products in content, the system (110) comprising:
one or more processors (202); and
a memory (204) operatively coupled to the one or more processors (202), wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to:
identify each image frame of a plurality of image frames comprised in a content via a Hue Saturation Value (HSV) value;
detect one or more products from each image frame of the plurality of image frames based on the HSV value of each image frame;
extract, via a representation model, feature vectors from the detected one or more products;
compare the extracted feature vectors with feature vectors of one or more catalogue images stored in a catalogue feature database to recognise visually similar products from the catalogue feature database; and
determine recommendations corresponding to the recognised products and save the determined recommendations corresponding to the recognised products as metadata in a metadata database,
wherein the metadata comprising the determined recommendations is displayed to a user on a user equipment (104) while viewing the content.
2. The system (110) as claimed in claim 1, wherein the one or more processors (202) are to detect the one or more products from each image frame of the plurality of image frames based on the HSV value by being configured to:
detect one or more slot boundaries in the content to identify at least one window for displaying the one or more products based on the HSV value of each image frame;
detect classes of the one or more products, person bounding boxes, face bounding boxes, and segmented images from the content based on the one or more slot boundaries in the content;
determine Intersection over Union (IoU) threshold between the person bounding boxes and the face bounding boxes to map at least one person detected in at least one image frame with at least one face detected in the at least one image frame;
map the one or more products with at least one gender based on the IoU threshold; and
obtain the plurality of image frames with the one or more products mapped to the at least one gender on the content.
3. The system (110) as claimed in claim 1, wherein the one or more processors (202) are to recognise the visually similar products from the catalogue feature database by being configured to:
search similar products from the catalogue feature database based on the extracted feature vectors;
determine a histogram of the search results and categorise the search results by a metric of Hellinger distances based on the extracted feature vectors;
detect the one or more products with similar colour from the catalogue feature database to filter a number of products to be displayed based on the extracted feature vectors; and
recognise the visually similar products from the catalogue feature database based on the detection of the one or more products with similar colour.
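The Hellinger-distance colour filtering recited in claim 3 can be illustrated with the sketch below. It assumes the colour histograms are already normalised to sum to 1; the function names, the dictionary-shaped catalogue, and the `0.3` cut-off are hypothetical.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two normalised histograms
    (0 = identical, 1 = disjoint support)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def filter_by_colour(query_hist, catalogue_hists, max_distance=0.3):
    """Keep catalogue entries whose colour histogram is close to the
    query's, shrinking the number of products to be displayed."""
    return [key for key, hist in catalogue_hists.items()
            if hellinger(query_hist, hist) <= max_distance]
```

The Hellinger distance is well suited here because it is bounded and symmetric, so a single fixed cut-off behaves consistently across differently shaped histograms.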
4. The system (110) as claimed in claim 1, wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to remove one or more duplicate products from the determined recommendations based on a detection of one or more slot boundaries in the content and search results of similar products from the catalogue feature database.
5. The system (110) as claimed in claim 2, wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to save the detected one or more slot boundaries in the content and the search results of the similar products in a form of the metadata in the metadata database.
6. The system (110) as claimed in claim 1, wherein the content comprises at least one of: a Video on Demand (VOD) content, a video content, an Augmented Reality/Virtual Reality (AR/VR) content, and a holographic content.
7. A system (110) for recognising products in content, the system (110) comprising:
one or more processors (202); and
a memory (204) operatively coupled to the one or more processors (202), wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to:
receive catalogue data comprising one or more catalogue images from a computing device (104);
perform addition or deletion of the catalogue data to or from a catalogue feature database in response to the reception of the catalogue data;
recognise one or more products comprised in the one or more catalogue images which are visually similar to one or more products comprised in a metadata content, based on the addition or deletion of the catalogue data; and
store the recognised one or more products in the catalogue feature database.
8. The system (110) as claimed in claim 7, wherein the one or more processors (202) are to perform the addition of the catalogue data to the catalogue feature database by being configured to:
send the one or more catalogue images comprised in the catalogue data to a product detection and segmentation model;
detect one or more catalogue products and one or more objects available in the one or more catalogue images;
determine one or more catalogue product categories and one or more object detection classes based on the detection of the one or more catalogue products and the one or more objects available in the one or more catalogue images;
map at least one catalogue product category to one of the one or more object detection classes based on preconfigured rules in a mapping file;
extract feature vectors from the one or more catalogue images based on the mapping; and
store the extracted feature vectors of the one or more catalogue images in the catalogue feature database.
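The catalogue-addition flow of claim 8 (map a catalogue product category to an object-detection class via preconfigured rules, extract features, store them) can be sketched as below. Every name here is hypothetical: `CATEGORY_TO_CLASS` stands in for the mapping file, `FEATURE_DB` for the catalogue feature database, and the feature extractor is passed in as a callable rather than being a real model.

```python
CATEGORY_TO_CLASS = {          # stands in for the preconfigured mapping file
    "mens_tshirt": "tshirt",
    "womens_kurta": "kurta",
}
FEATURE_DB = {}                # stands in for the catalogue feature database

def add_catalogue_image(image_id, category, image, extract_features):
    """Map the catalogue category to a detection class per the mapping
    rules, extract a feature vector, and store it in the feature DB."""
    detection_class = CATEGORY_TO_CLASS.get(category)
    if detection_class is None:
        raise KeyError(f"no mapping rule for category {category!r}")
    FEATURE_DB[image_id] = {
        "class": detection_class,
        "features": extract_features(image),
        "delisted": False,
    }
```

Keeping the class mapping in configuration rather than code lets the catalogue taxonomy evolve without retraining or redeploying the detection model.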
9. The system (110) as claimed in claim 7, wherein the one or more processors (202) are to perform the deletion of the catalogue data from the catalogue feature database by being configured to:
determine whether the one or more catalogue images comprised in the catalogue data are available in a metadata database; and
perform one of:
mark the one or more catalogue images as delisted in the catalogue feature database in response to a determination that the one or more catalogue images are available in the metadata database; and
delete the one or more catalogue images from the catalogue feature database in response to a determination that the one or more catalogue images are not available in the metadata database.
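The delete-or-delist branch of claim 9 reduces to a small conditional, sketched below with hypothetical in-memory stores: a dict for the catalogue feature database and a set of image ids standing in for the metadata database.

```python
def remove_catalogue_image(image_id, catalogue_db, metadata_ids):
    """Delist the image if it is still referenced in the metadata
    database; otherwise delete it outright from the catalogue."""
    if image_id in metadata_ids:
        catalogue_db[image_id]["delisted"] = True   # still referenced: keep but hide
    else:
        catalogue_db.pop(image_id, None)            # unreferenced: safe to delete
```

Delisting rather than deleting referenced images avoids dangling references from already-published recommendation metadata.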
10. A system (110) for recognising products in content, the system (110) comprising:
one or more processors (202); and
a memory (204) operatively coupled to the one or more processors (202), wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to:
identify each image frame of a plurality of image frames comprised in a content, wherein each image frame comprises one or more products;
extract feature vectors from the one or more products comprised in each image frame of the plurality of image frames;
pull metadata corresponding to the one or more products comprised in each image frame from a metadata database based on the extracted feature vectors;
recognise, via a representation model, new products associated with the extracted feature vectors in a catalogue feature database;
merge the pulled metadata with the recognised new products and send the merged metadata to the metadata database; and
update recommendations corresponding to the merged metadata on the content based on the extracted feature vectors.
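The pull-recognise-merge refresh of claim 10 can be sketched as follows. The shapes are assumptions: products keyed by id with feature vectors, a dict for the metadata database, and the catalogue lookup injected as a callable `recognise_new`.

```python
def refresh_metadata(frame_products, metadata_db, recognise_new):
    """Pull stored metadata for known products, recognise new products
    from their feature vectors, merge both, and write the result back."""
    merged = {}
    for product_id, features in frame_products.items():
        if product_id in metadata_db:
            merged[product_id] = metadata_db[product_id]   # pulled metadata
        else:
            merged[product_id] = recognise_new(features)   # catalogue lookup
    metadata_db.update(merged)                             # send merged metadata back
    return merged
```

Because known products short-circuit to stored metadata, only genuinely new products incur a feature search against the catalogue on each refresh.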
11. The system (110) as claimed in claim 10, wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to delete outdated products or out-of-stock products from the metadata.
12. The system (110) as claimed in claim 10, wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to delete outdated catalogue products from the catalogue feature database.
13. A method for recognising products in content, the method comprising:
identifying, by a processor (202) associated with a system (110), each image frame of a plurality of image frames comprised in a content via a Hue Saturation Value (HSV) value;
detecting, by the processor (202), one or more products from each image frame of the plurality of image frames based on the HSV value of each image frame;
extracting, via a representation model, by the processor (202), feature vectors from the detected one or more products;
comparing, by the processor (202), the extracted feature vectors with feature vectors of one or more catalogue images stored in a catalogue feature database to recognise visually similar products from the catalogue feature database; and
determining, by the processor (202), recommendations corresponding to the recognised products and saving the determined recommendations corresponding to the recognised products as metadata in a metadata database,
wherein the metadata comprising the determined recommendations is displayed to a user on a user equipment (104) while viewing the content.
14. The method as claimed in claim 13, wherein detecting, by the processor (202), the one or more products from each image frame of the plurality of image frames based on the HSV value comprises:
detecting, by the processor (202), one or more slot boundaries in the content to identify at least one window for displaying the one or more products based on the HSV value of each image frame;
detecting, by the processor (202), classes of the one or more products, person bounding boxes, face bounding boxes and segmented images from the content based on the one or more slot boundaries in the content;
determining, by the processor (202), an Intersection over Union (IoU) threshold between the person bounding boxes and the face bounding boxes to map at least one person detected in at least one image frame with at least one face detected in the at least one image frame;
mapping, by the processor (202), the one or more products with at least one gender based on the IoU threshold; and
obtaining, by the processor (202), the plurality of image frames with the one or more products mapped to the at least one gender on the content.
15. The method as claimed in claim 13, wherein recognising, by the processor (202), the visually similar products from the catalogue feature database comprises:
searching, by the processor (202), similar products from the catalogue feature database based on the extracted feature vectors;
determining, by the processor (202), a histogram of the search results and categorising the search results by a metric of Hellinger distances based on the extracted feature vectors;
detecting, by the processor (202), the one or more products with similar colour from the catalogue feature database to filter a number of products to be displayed based on the extracted feature vectors; and
recognising, by the processor (202), the visually similar products from the catalogue feature database based on the detection of the one or more products with similar colour.
16. The method as claimed in claim 13, comprising removing, by the processor (202), one or more duplicate products from the determined recommendations, based on a detection of one or more slot boundaries in the content and search results of similar products from the catalogue feature database.
17. The method as claimed in claim 14, comprising saving, by the processor (202), the detected one or more slot boundaries in the content and the search results of the similar products in a form of the metadata in the metadata database.
18. A method for recognising products in content, the method comprising:
receiving, by a processor (202) associated with a system (110), catalogue data comprising one or more catalogue images from a computing device (104);
performing, by the processor (202), addition or deletion of the catalogue data to or from a catalogue feature database in response to the reception of the catalogue data;
recognising, by the processor (202), one or more products comprised in the one or more catalogue images which are visually similar to one or more products comprised in a content, based on the addition or deletion of the catalogue data; and
storing, by the processor (202), the recognised one or more products in the catalogue feature database.
19. The method as claimed in claim 18, wherein performing, by the processor (202), the addition of the catalogue data to the catalogue feature database comprises:
sending, by the processor (202), the one or more catalogue images comprised in the catalogue data to a product detection and segmentation model;
detecting, by the processor (202), one or more catalogue products and one or more objects available in the one or more catalogue images;
determining, by the processor (202), one or more catalogue product categories and one or more object detection classes based on the detection of the one or more catalogue products and the one or more objects available in the one or more catalogue images;
mapping, by the processor (202), at least one catalogue product category to one of the one or more object detection classes based on preconfigured rules in a mapping file;
extracting, by the processor (202), feature vectors from the one or more catalogue images based on the mapping; and
storing, by the processor (202), the extracted feature vectors of the one or more catalogue images in the catalogue feature database.
20. The method as claimed in claim 18, wherein performing, by the processor (202), the deletion of the catalogue data from the catalogue feature database comprises:
determining, by the processor (202), whether the one or more catalogue images comprised in the catalogue data are available in a metadata database; and
performing, by the processor (202), one of:
marking the one or more catalogue images as delisted in the catalogue feature database in response to a determination that the one or more catalogue images are available in the metadata database; and
deleting the one or more catalogue images from the catalogue feature database in response to a determination that the one or more catalogue images are not available in the metadata database.
21. A method for recognising products in content, the method comprising:
identifying, by a processor (202) associated with a system (110), each image frame of a plurality of image frames comprised in a content, wherein each image frame comprises one or more products;
extracting, by the processor (202), feature vectors from the one or more products comprised in each image frame of the plurality of image frames;
pulling, by the processor (202), metadata corresponding to the one or more products comprised in each image frame from a metadata database based on the extracted feature vectors;
recognising, via a representation model, by the processor (202), new products associated with the extracted feature vectors in a catalogue feature database;
merging, by the processor (202), the pulled metadata with the recognised new products and sending the merged metadata to the metadata database; and
updating, by the processor (202), recommendations corresponding to the merged metadata on the content based on the extracted feature vectors.
22. The method as claimed in claim 21, comprising deleting, by the processor (202), outdated products or out-of-stock products from the metadata.
23. The method as claimed in claim 21, comprising deleting, by the processor (202), outdated catalogue products from the catalogue feature database.
24. A user equipment (104), comprising:
one or more processors; and
a memory operatively coupled to the one or more processors, wherein the memory comprises processor-executable instructions, which on execution, cause the one or more processors to:
receive metadata comprising recommendations corresponding to recognised products from a system (110); and
display the received metadata comprising the recommendations corresponding to the recognised products to a user while viewing a content,
wherein the one or more processors are communicatively coupled with the system (110), and wherein the system (110) is configured to:
identify each image frame of a plurality of image frames comprised in the content via a Hue Saturation Value (HSV) value;
detect one or more products from each image frame of the plurality of image frames based on the HSV value of each image frame;
extract, via a representation model, feature vectors from the detected one or more products;
compare the extracted feature vectors with feature vectors of one or more catalogue images stored in a catalogue feature database to recognise visually similar products from the catalogue feature database; and
determine the recommendations corresponding to the recognised products and save the determined recommendations corresponding to the recognised products as metadata in a metadata database.
| # | Name | Date |
|---|---|---|
| 1 | 202221042822-STATEMENT OF UNDERTAKING (FORM 3) [26-07-2022(online)].pdf | 2022-07-26 |
| 2 | 202221042822-PROVISIONAL SPECIFICATION [26-07-2022(online)].pdf | 2022-07-26 |
| 3 | 202221042822-POWER OF AUTHORITY [26-07-2022(online)].pdf | 2022-07-26 |
| 4 | 202221042822-FORM 1 [26-07-2022(online)].pdf | 2022-07-26 |
| 5 | 202221042822-DRAWINGS [26-07-2022(online)].pdf | 2022-07-26 |
| 6 | 202221042822-DECLARATION OF INVENTORSHIP (FORM 5) [26-07-2022(online)].pdf | 2022-07-26 |
| 7 | 202221042822-ENDORSEMENT BY INVENTORS [24-07-2023(online)].pdf | 2023-07-24 |
| 8 | 202221042822-DRAWING [24-07-2023(online)].pdf | 2023-07-24 |
| 9 | 202221042822-CORRESPONDENCE-OTHERS [24-07-2023(online)].pdf | 2023-07-24 |
| 10 | 202221042822-COMPLETE SPECIFICATION [24-07-2023(online)].pdf | 2023-07-24 |
| 11 | 202221042822-FORM-8 [01-08-2023(online)].pdf | 2023-08-01 |
| 12 | 202221042822-FORM 18 [01-08-2023(online)].pdf | 2023-08-01 |
| 13 | Abstract1.jpg | 2023-12-23 |
| 14 | 202221042822-FER.pdf | 2025-05-20 |
| 15 | 202221042822-FORM 3 [20-08-2025(online)].pdf | 2025-08-20 |
| 16 | 202221042822-FER_SER_REPLY [18-09-2025(online)].pdf | 2025-09-18 |
| 1 | 202221042822E_26-12-2024.pdf | 2024-12-26 |