
System And Method For Obtaining A Group Of Cropped Images Of Objects Using Generative AI

Abstract: Embodiments herein provide a method for obtaining a group of cropped images of objects using generative AI. The method includes (i) training a generative AI model with historical data of images of objects corresponding to different categories and descriptions, (ii) processing the obtained images from an image capturing device to generate cropped images of the objects, (iii) providing the cropped images to the custom-trained generative AI model, (iv) generating a combination of attributes as a pre-defined hierarchy in a key-value format of the objects, (v) converting the cropped images into image embeddings, (vi) converting text attributes from the combination of attributes in the key-value format into text embeddings, (vii) concatenating the image embeddings and the text embeddings into joint embeddings, and (viii) determining a unique combination of the attributes to obtain a group of the cropped images. FIG. 1


Patent Information

Application #
Filing Date
02 March 2025
Publication Number
14/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

INFILECT TECHNOLOGIES PRIVATE LIMITED
PID NO.68- 6-420, IV BLOCK,100 FEET ROAD, 420, INDEQUBE ASCENT, MAHAYOGI VEMANA ROAD, KORAMANGALA, BENGALURU, KARNATAKA - 560073, INDIA

Inventors

1. VIJAY GABALE
A203, Mantri Classic, 4th Block, 8th - Cross Rd, S.T. Bed, Cauvery Colony, Koramangala, Bengaluru, Karnataka, India-560034
2. Cheekatla Raghu Venkata Manikanta
4-103, Main Road, Sunkarapalem, Tallarev Mandal, Kakinada, Andhra Pradesh, India 533464
3. Lokesh Kishor Nandanwar
192/1 Yashashree, Somwari Quarters, Raghuji Nagar, Nagpur, Maharashtra, India - 440009

Specification

Description:
BACKGROUND
Technical Field
[0001] The embodiments herein generally relate to generative artificial intelligence, and more particularly, to a system and method for generating a group of cropped images of objects using generative artificial intelligence (AI).
Description of the Related Art
[0002] Retailers face significant challenges when generating a stock keeping unit (SKU) nomenclature based on images of a retail shelf, primarily due to inaccuracies caused by visual inconsistencies. Retail shelves are dynamic, ever-changing environments where product positions frequently shift, lighting conditions can vary greatly, and items may be partially obscured or blocked from view. These factors make it difficult to accurately capture and categorize SKUs through images alone. For instance, low-resolution images, glare, or shadows can distort key details such as product labels or barcodes, leading to misidentification or improper classification of products. Such errors can disrupt inventory management, resulting in stock discrepancies, potential stockouts, or overstocking situations that could affect sales and customer satisfaction.
[0003] Traditional implementation of advanced image recognition for SKU identification relies on complex machine learning models and computer vision techniques like deep learning, which require extensive training on large, diverse datasets, making them costly and time-consuming. Even with significant investment, the models can still struggle with new or rare SKUs, updated packaging, or items not well-represented in the training data, leading to errors.
[0004] Moreover, SKU nomenclature generation that relies solely on images of a retail shelf presents a technical issue of scaling and standardization across different retail locations. This is because retail environments vary widely in terms of layout, product placement, and shelf organization, making it difficult to create a consistent and accurate SKU catalog. Each retail store may have unique characteristics that complicate the automated recognition process, such as differences in shelf height, spacing, or product grouping. The inconsistencies can result in errors when consolidating data across various stores, hindering centralized inventory management and supply chain optimization efforts.
[0005] Further, traditional image recognition methods for SKU identification lack domain knowledge and class knowledge, which limits their ability to distinguish between similar products. They rely solely on visual patterns and do not incorporate specific insights or classifications, resulting in reduced accuracy for SKU identification.
[0006] Accordingly, there remains a need to address the aforementioned technical problems using a system and method for generating SKU nomenclature in a retail environment using artificial intelligence.
SUMMARY
[0007] In view of the foregoing, an embodiment herein provides a method of generating a group of cropped images of objects using generative artificial intelligence (AI). The method comprises: (a) training a generative AI model with historical data of images of objects corresponding to different categories and descriptions of the objects corresponding to the different categories to obtain a custom-trained generative AI model, (b) obtaining, from an image capturing device, in real time, an image that comprises a plurality of objects within an area, (c) processing the image to generate cropped images of the plurality of objects, wherein each of the cropped images comprises one of the plurality of objects, (d) providing the cropped images as inputs to the custom-trained generative AI model, (e) generating a combination of attributes as a pre-defined hierarchy in a key-value format for each of the plurality of objects using the custom-trained generative AI model, (f) generating a first set of vectors or image embeddings for the cropped images of the plurality of objects using the custom-trained generative AI model, (g) generating a second set of vectors or text embeddings for the cropped images of the plurality of objects using the custom-trained generative AI model, (h) concatenating the first set of vectors or image embeddings and the second set of vectors or text embeddings into joint embeddings or a third set of vectors, and (i) determining a unique combination of the attributes using the custom-trained generative AI model and cropped images associated with the unique combination of the attributes to obtain a group of the cropped images.
[0008] In some embodiments, the custom-trained generative AI model provides the combination of attributes through prompt interfacing.
[0009] In some embodiments, the text attributes are used to generate a catalog of objects based on a categorical hierarchy.
[0010] In some embodiments, the method includes capturing the objects by placing a box around the objects, covering coordinates of the objects from top to bottom and left to right.
[0011] In some embodiments, a K-means method is applied on the joint embeddings or a third set of vectors of the cropped images to obtain a number K of unique combinations of the attributes, wherein the number K is utilized to group cropped images based on similar appearance and attributes.
[0012] In another aspect, there is provided a system of determining a unique combination of attributes to obtain a group of cropped images of objects using a custom-trained generative AI model. The system comprises an image capturing device configured at a physical retail store environment to capture images of the objects kept on a shelf and an image processing server, where the image processing server includes (i) a memory that stores a database and a set of instructions, and (ii) a processor that executes the set of instructions and is configured to (a) train a generative AI model with historical data of images of objects corresponding to different categories and descriptions of the objects corresponding to the different categories to obtain a custom-trained generative AI model, (b) obtain, from the image capturing device, in real time, an image that comprises a plurality of objects within an area, (c) process the image to generate cropped images of the plurality of objects, wherein each of the cropped images comprises one of the plurality of objects, (d) provide the cropped images as inputs to the custom-trained generative AI model, (e) generate a combination of attributes as a pre-defined hierarchy in a key-value format for each of the plurality of objects using the custom-trained generative AI model, (f) convert the cropped images of the plurality of objects into a first set of vectors or image embeddings, (g) convert text attributes from the combination of attributes in the key-value format into a second set of vectors or text embeddings, (h) concatenate the first set of vectors or image embeddings and the second set of vectors or text embeddings into joint embeddings or a third set of vectors, and (i) determine a unique combination of the attributes using the custom-trained generative AI model and cropped images associated with the unique combination of the attributes to obtain a group of the cropped images.
[0013] The system is advantageous in that it enables identification and categorization of products based only on images of the shelf, which simplifies inventory management and accelerates information retrieval. The system provides increased accuracy in the nomenclature of the SKUs while minimizing errors. Further, a well-established nomenclature system facilitates more effective data analysis and reporting, supporting better decision-making.
[0014] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
[0016] FIG. 1 is a block diagram of a system for generating a group of cropped images of objects using generative artificial intelligence (AI) according to some embodiments herein;
[0017] FIG. 2 illustrates an exploded view of an image processing server of FIG. 1 according to some embodiments herein;
[0018] FIGS. 3A and 3B are flow diagrams that illustrate a method for generating a group of cropped images of objects using generative AI according to some embodiments herein;
[0019] FIG. 4 is a representative hardware environment for practicing the embodiments herein with respect to FIGS. 1 through 3B.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0020] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0021] There remains a need for a system and method for generating a group of cropped images of objects using generative artificial intelligence (AI).
[0022] Referring now to the drawings, and more particularly to FIGS. 1 to 4, where
similar reference characters denote corresponding features consistently throughout the
figures, there are shown preferred embodiments.
[0023] The term “custom-trained generative AI” refers to a generative model that has been fine-tuned on a specific dataset associated with a particular application or domain. The customization includes further training a general-purpose generative AI model on specialized data to enhance its performance and relevance for tasks within that domain.
[0024] The term "computer-vision-based large-language models" refers to advanced AI systems that integrate both computer vision and natural language processing capabilities to understand and generate text based on visual inputs. These models are designed to analyze and interpret images or video content and then generate descriptive, contextually relevant text or answers.
[0025] The term “text embedding” refers to the process of converting text into numerical vectors that capture the semantic meaning and contextual relationships of words or phrases. The vectors, generated by models like Word2Vec, GloVe, or BERT, represent textual data in a high-dimensional space, allowing tasks such as similarity comparison, clustering, and classification to be performed. By embedding text into a format that reflects its meaning and usage, text embeddings enable analysis of language for various natural language processing applications.
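For intuition only, the sketch below hashes tokens into a fixed-size, L2-normalized vector and compares two texts by cosine similarity. The `text_embedding` helper and its 16-bucket dimensionality are hypothetical stand-ins for a Word2Vec-, GloVe-, or BERT-style encoder, not the embedding used by the embodiments herein:

```python
import hashlib
import math

def text_embedding(text, dim=16):
    """Toy fixed-size text embedding: hash each token into one of `dim`
    buckets, count occurrences, then L2-normalize. Real systems would use
    a learned model (Word2Vec, GloVe, BERT) instead of hashing."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

e1 = text_embedding("smooth and silky shampoo")
e2 = text_embedding("smooth silky shampoo premium")
# Shared tokens land in shared buckets, so related texts score higher.
```

Because both vectors are unit-normalized, a plain dot product suffices for cosine similarity, which is why `cosine` omits the usual norm division.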
[0026] The term “image embedding” refers to the process of converting an image into a numerical vector that captures its key features and visual characteristics in a high-dimensional space. This transformation is typically achieved using deep learning models like convolutional neural networks (CNNs) or pre-trained vision models, which extract and encode the image's information into a compact, fixed-size vector. Image embeddings enable efficient comparison, retrieval, and classification of images by representing them in a way that reflects their content and structure, facilitating tasks such as image search, object recognition, and similarity analysis.
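As a minimal illustration of "compact, fixed-size vector" (not the CNN-based embedding the paragraph describes), the hypothetical `image_embedding` helper below average-pools a grayscale crop into a 4x4 grid of brightness values:

```python
def image_embedding(pixels, grid=4):
    """Toy image embedding: average-pool a grayscale image (a 2D list of
    0-255 ints) into a grid x grid layout, producing a fixed-size vector
    of grid*grid values in [0, 1]. Production systems would use a CNN or
    a pre-trained vision model instead."""
    h, w = len(pixels), len(pixels[0])
    vec = []
    for gy in range(grid):
        for gx in range(grid):
            ys = range(gy * h // grid, (gy + 1) * h // grid)
            xs = range(gx * w // grid, (gx + 1) * w // grid)
            cells = [pixels[y][x] for y in ys for x in xs]
            vec.append(sum(cells) / (255.0 * len(cells)))
    return vec

# An 8x8 "cropped image": left half dark (0), right half bright (255).
img = [[0] * 4 + [255] * 4 for _ in range(8)]
emb = image_embedding(img)
# emb has 16 entries; left-side cells pool to 0.0, right-side cells to 1.0
```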
[0027] The term “K-means method” refers to a clustering algorithm used to partition a dataset into k distinct, non-overlapping groups or clusters, where k is a user-specified number. The K-means method works by iteratively assigning data points to the nearest cluster centroid and then recalculating the centroids as the mean of all points assigned to each cluster. The K-means method is widely used for data analysis, pattern recognition, and unsupervised learning tasks.
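The assign-and-recompute loop just described can be sketched generically as follows; this is a textbook K-means on toy 2-D points with a simplistic first-k initialization, not the patent's specific implementation:

```python
def kmeans(points, k, iters=20):
    """Minimal K-means: repeatedly assign each point to its nearest
    centroid, then recompute each centroid as the mean of its cluster.
    Initialization here simply takes the first k points."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated blobs of toy 2-D "embeddings".
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centroids, clusters = kmeans(pts, k=2)
# Each blob settles into its own cluster of three points.
```

In practice a library implementation (e.g. scikit-learn's `KMeans`) with k-means++ initialization and multiple restarts would be preferred over this sketch.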
[0028] FIG. 1 is a block diagram of a system for generating a group of cropped images of objects using generative artificial intelligence (AI) according to some embodiments herein. The system 100 includes one or more shelves 102, an image-capturing device 104, a network 106, and an image-processing server 108. In some embodiments, the image-capturing device 104 includes, but is not limited to, a mobile device, a smartphone, a smartwatch, a notebook, a Global Positioning System (GPS) device, a tablet, a desktop computer, a laptop, or any network-enabled device. The objects may be products kept on the one or more shelves 102 in retail stores. The images captured by the image-capturing device 104 capture the entire shelf from left to right and from top to bottom. The products may include food products, hair products, skin products, etc. The image-capturing device 104 is communicatively connected to the image-processing server 108 through the network 106. In some embodiments, the network 106 is a wired network. In some embodiments, the network 106 is a wireless network. In some embodiments, the network 106 is a combination of the wired network and the wireless network. In some embodiments, the network 106 is the Internet.
[0029] The image-processing server 108 includes a custom-trained generative AI model 110. The custom-trained AI model is a personalized version of an existing AI model and is trained with historical data of images of objects that correspond to different types of categories and descriptions of the objects. The categories are further divided into subcategories. For example, within the category of hair care, the subcategory may be shampoo. The shampoo can then be further segmented into premium or ordinary options, with specific brands and flavors like Smooth and Silky.
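The hierarchy just described can be pictured as one key-value record per cropped image. The sketch below is purely illustrative: the key names, the `ExampleBrand` value, and the `nomenclature` helper are hypothetical and not part of the disclosed system:

```python
# Hypothetical key-value attributes for one cropped shampoo image,
# following a category -> subcategory -> tier -> brand -> variant hierarchy.
attributes = {
    "category": "hair care",
    "subcategory": "shampoo",
    "tier": "premium",
    "brand": "ExampleBrand",        # placeholder value, not from the source
    "variant": "Smooth and Silky",
}

def nomenclature(attrs, order=("category", "subcategory", "tier", "brand", "variant")):
    """Join the attribute values in the pre-defined hierarchical order
    to form a single SKU-style name."""
    return " / ".join(attrs[k] for k in order if k in attrs)

name = nomenclature(attributes)
```

Keeping the hierarchy order explicit in `order` means missing keys are simply skipped rather than raising an error.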
[0030] The image-processing server 108 receives, through the network 106, the images of the objects captured from the image-capturing device 104. The image-processing server 108 processes the images of the objects to determine a unique combination of attributes to obtain a group of cropped images of objects using a custom-trained generative AI model.
[0031] In some embodiments, a human annotator 112 corrects the combination of attributes and saves the cropped image into the catalog.
[0032] FIG. 2 illustrates an exploded view of the image processing server 108 of FIG. 1 according to some embodiments herein. The image processing server 108 includes a cropped image generating module 202, a key value format generating module 204, a cropped image conversion module 206, a text attribute conversion module 208, a concatenating module 210, a grouped cropped images generating module 212, and a custom-trained AI model 214. The cropped image generating module 202 extracts objects from the captured images by identifying and isolating the specific regions of interest that contain the objects while removing any extraneous parts of the image. This precise cropping is crucial for training and improving models that recognize and classify products accurately, as it ensures that each training sample focuses solely on the product itself. By concentrating on the relevant parts of the image, the cropping process is essential for training models to recognize and classify the detailed features of the objects. The key value format generating module 204 generates a combination of attributes as a pre-defined hierarchy in a key-value format using cropped images of each object utilizing the custom-trained AI model 110. The custom-trained generative AI model 110 provides a combination of attributes through prompt interfacing. The prompt interfacing improves interactions by utilizing natural language processing (NLP) to understand and generate human-like responses. The prompt interfacing also personalizes interactions based on historical data and user preferences and adapts dynamically to a wide range of prompts. Since the location of the objects on the shelves is not fixed and can vary, prompt interfacing is essential to manage and adapt to these changes. By using AI-driven prompt interfacing, systems can dynamically detect and adjust to the location of the objects, ensuring accurate and reliable responses regardless of the location of the objects. The flexibility allows for seamless interaction and precise handling of objects in diverse and shifting environments.
[0033] The cropped image conversion module 206 converts the cropped image into a fixed-size first set of vectors of numerical values that captures the essential features and characteristics of the cropped images. The fixed size of the first set of vectors may be a 256-bit vector. The first set of vectors is also known as image embeddings. The image embeddings represent the images in a lower-dimensional space, making them easier to analyze, compare, and process. Image embeddings are useful for tasks such as image retrieval, classification, and clustering, as they enable efficient and effective representation of visual data. The text attribute conversion module 208 transforms text attributes from the combination of attributes in the key-value format into a second set of vectors or text embeddings. The size of the second set of vectors may be a 256-bit vector. Text embeddings represent text data in a continuous vector space, where each word, phrase, or document is converted into a fixed-size numerical vector. The transformation captures semantic meanings and relationships between words or phrases, allowing machine learning models to process and analyze text more effectively.
[0034] The concatenating module 210 combines the first set of vectors or image embeddings and the second set of vectors or text embeddings to generate a third set of vectors or joint embeddings. The concatenation captures more contextual information about the images. The concatenation enables the integration of data from different modalities (such as text and images), enabling models to learn from and make predictions based on multi-modal information. The size of the third set of vectors may be a 512-bit vector. A K-means method is applied to the third set of vectors, where K represents the number of unique nomenclatures identified by the generative AI algorithm. Subsequently, all the cropped images are organized into K distinct groups based on the nomenclatures. The K-means method involves categorizing the images based on certain criteria or features, ensuring that each group contains images with similar characteristics. The grouping helps in managing and analyzing the images more effectively by reducing complexity and enabling focused examination within each category. A unique combination of the attributes is determined using the custom-trained generative AI model and cropped images associated with the unique combination of the attributes to obtain a group of cropped images. A custom-trained generative AI model transforms the group of cropped images into a high-resolution pack shot.
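The concatenation step itself is simple vector joining. In this hedged sketch, two small stand-in vectors play the roles of the 256-bit image and text embeddings, and the joint vector's length is simply the sum of the two:

```python
def joint_embedding(image_vec, text_vec):
    """Concatenate an image embedding and a text embedding into a single
    joint embedding (the 'third set of vectors'). With 256-element inputs
    on each side, this would yield a 512-element joint vector."""
    return list(image_vec) + list(text_vec)

img_vec = [0.2, 0.8]         # stand-in for an image embedding
txt_vec = [0.5, 0.1, 0.9]    # stand-in for a text embedding
joint = joint_embedding(img_vec, txt_vec)
# len(joint) == len(img_vec) + len(txt_vec)
```

Note that concatenation preserves both modalities unchanged; downstream clustering then operates on the combined visual-plus-textual representation.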
[0035] FIGS. 3A and 3B are flow diagrams of a method for generating a group of cropped images of objects using generative AI. At step 302, the method includes training a generative AI model with historical data of images of objects corresponding to different categories and descriptions of the objects corresponding to the different categories to obtain a custom-trained generative AI model. At step 304, the method includes obtaining, from an image-capturing device, in real time, an image that comprises a plurality of objects within an area. At step 306, the method includes processing the image to generate cropped images of the plurality of objects, wherein each of the cropped images comprises one of the plurality of objects. At step 308, the method includes providing the cropped images as inputs to the custom-trained generative AI model. At step 310, the method includes generating a combination of attributes as a pre-defined hierarchy in a key-value format for each of the plurality of objects using the custom-trained generative AI model. At step 312, the method includes converting the cropped images of the plurality of objects into a first set of vectors or image embeddings. At step 314, the method includes converting text attributes from the combination of attributes in the key-value format into a second set of vectors or text embeddings. At step 316, the method includes concatenating the first set of vectors or image embeddings and the second set of vectors or text embeddings into joint embeddings or a third set of vectors. At step 318, the method includes determining a unique combination of the attributes using the custom-trained generative AI model and cropped images associated with the unique combination of the attributes to obtain a group of the cropped images.
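The steps above can be strung together as a minimal, self-contained sketch of the data flow. Every helper here (row-based "cropping", brightness-based attributes and embeddings) is a toy stand-in for the custom-trained generative AI model, used only to show how crops, attributes, joint embeddings, and attribute-based grouping connect:

```python
def crop_objects(image):
    """Stand-in for steps 304-306: treat each row of the shelf image as
    one cropped object."""
    return [row for row in image]

def generate_attributes(crop):
    """Stand-in for step 310: toy key-value attributes from brightness."""
    mean = sum(crop) / len(crop)
    return {"category": "bright" if mean > 127 else "dark"}

def image_embed(crop):
    """Stand-in for step 312: one-element image embedding."""
    return [sum(crop) / (255.0 * len(crop))]

def text_embed(attrs):
    """Stand-in for step 314: one-element text embedding."""
    return [1.0 if attrs["category"] == "bright" else 0.0]

def pipeline(image):
    """Steps 306-318: crop, attribute, embed, concatenate, and group
    crops that share a unique attribute combination."""
    groups = {}
    for crop in crop_objects(image):
        attrs = generate_attributes(crop)
        joint = image_embed(crop) + text_embed(attrs)   # step 316
        key = tuple(attrs.items())                      # step 318 grouping key
        groups.setdefault(key, []).append(joint)
    return groups

shelf = [[0, 0, 0, 0], [255, 255, 255, 255], [10, 20, 30, 40]]
groups = pipeline(shelf)
# Two groups result: one 'dark' (two crops) and one 'bright' (one crop).
```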
[0036] The method is advantageous in that it enables identification and categorization of products based only on images of the shelf, which simplifies inventory management and accelerates information retrieval. The method provides increased accuracy in the nomenclature of the SKUs while minimizing errors. Further, a well-established nomenclature system facilitates more effective data analysis and reporting, supporting better decision-making.
[0037] Embodiments herein may include a computer program product configured to include a pre-configured set of instructions, which, when performed, can result in actions as stated in conjunction with the methods described above. In an example, the pre-configured set of instructions can be stored on a tangible non-transitory computer readable medium or a program storage device. In an example, the tangible non-transitory computer readable medium can be configured to include the set of instructions, which, when performed by a device, can cause the device to perform acts similar to the ones described here. Embodiments herein may also include tangible and/or non-transitory computer-readable storage media for carrying or having computer executable instructions or data structures stored thereon.
[0038] Generally, program modules utilized herein include routines, programs, components, data structures, objects, and the functions inherent in the design of special-purpose processors, etc. that perform particular tasks or implement particular abstract data types. Computer executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
[0039] The embodiments herein can include both hardware and software elements. The embodiments that are implemented in software include, but are not limited to, firmware, resident software, microcode, etc.
[0040] A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
[0041] Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
[0042] A representative hardware environment for practicing the embodiments herein is depicted in FIG. 4, with reference to FIGS. 1 through 3B. This schematic drawing illustrates a hardware configuration of a server or a computer system or a computing device in accordance with the embodiments herein. The system includes at least one processing device CPU 10 that may be interconnected via system bus 14 to various devices such as a random-access memory (RAM) 15, read-only memory (ROM) 17, and an input/output (I/O) adapter 17. The I/O adapter 17 can connect to peripheral devices, such as disk units 12 and program storage devices 13 that are readable by the system. The system can read the inventive instructions on the program storage devices 13 and follow these instructions to execute the methodology of the embodiments herein. The system further includes a user interface adapter 20 that connects a keyboard 18, mouse 19, speaker 25, microphone 23, and other user interface devices such as a touch screen device (not shown) to the bus 14 to gather user input. Additionally, a communication adapter 21 connects the bus 14 to a data processing network 42, and a display adapter 22 connects the bus 14 to a display device 24, which provides a graphical user interface (GUI) 30 of the output data in accordance with the embodiments herein, or which may be embodied as an output device such as a monitor, printer, or transmitter, for example.
[0043] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the appended claims.
Claims:
I/We Claim:
1. A method for determining a unique combination of attributes to obtain a group of cropped images of objects using a custom-trained generative AI model, the method comprising:
training a generative AI model with historical data of images of objects corresponding to different categories and descriptions of the objects corresponding to the different categories to obtain the custom-trained generative AI model;
obtaining, from an image capturing device, in real time, an image that comprises a plurality of objects within an area;
processing the image to generate cropped images of the plurality of objects, wherein each of the cropped images comprises one of the plurality of objects;
providing the cropped images as inputs to the custom-trained generative AI model;
generating a first set of vectors or image embeddings of the cropped images of the plurality of the objects using the custom-trained generative AI model;
generating a second set of vectors or text embeddings of the cropped images of the plurality of the objects using the custom-trained generative AI model;
concatenating the first set of vectors or the image embeddings and the second set of vectors or the text embeddings into joint embeddings or a third set of vectors; and
determining the unique combination of the attributes using the custom-trained generative AI model and the cropped images associated with the unique combination of the attributes to obtain the group of the cropped images.
2. The method as claimed in claim 1, wherein the custom-trained generative AI model provides the combination of the attributes through a prompt interfacing.
3. The method as claimed in claim 1, wherein the text attributes are used to generate a catalog of objects based on a categorical hierarchy.
4. The method as claimed in claim 1, wherein the objects are captured by placing a box around the objects, covering coordinates of the objects from top to bottom and left to right.
5. The method as claimed in claim 1, wherein a K-means method is applied on the joint embeddings or the third set of vectors of the cropped images to obtain a number K of unique combinations of the attributes, wherein the number K is utilized to group the cropped images based on similar appearance and attributes.
6. A system of determining a unique combination of attributes to obtain a group of cropped images of objects using a custom-trained generative AI model, wherein the system comprises:
an image capturing device configured at a physical retail store environment to capture images of the objects kept on a shelf; and
an image processing server, wherein the image processing server comprises:
a memory that stores a database and a set of instructions; and
a processor that executes the set of instructions and is configured to:
train a generative AI model with historical data of images of the objects corresponding to different categories and descriptions of the objects corresponding to the different categories to obtain a custom-trained generative AI model;
obtain, from the image capturing device, in real time, an image that comprises a plurality of objects within an area;
process the image to generate cropped images of the plurality of objects, wherein each of the cropped images comprises one of the plurality of objects;
provide the cropped images as inputs to the custom-trained generative AI model;
generate a first set of vectors or image embeddings for the cropped images of the plurality of objects using the custom-trained generative AI model;
generate a second set of vectors or text embeddings for the cropped images of the plurality of objects using the custom-trained generative AI model;
concatenate the first set of vectors or the image embeddings and the second set of vectors or the text embeddings into joint embeddings or a third set of vectors; and
determine the unique combination of the attributes using the custom-trained generative AI model and cropped images associated with the unique combination of the attributes to obtain the group of the cropped images.
7. The system of claim 6, wherein the custom-trained generative AI model provides the combination of the attributes through a prompt interfacing.
8. The system of claim 6, wherein the text attributes are used to generate a catalog of objects based on a categorical hierarchy.
9. The system of claim 6, wherein the processor is further configured to capture the objects by placing a box around the objects, covering coordinates of the objects from top to bottom and left to right.
10. The system of claim 6, wherein a K-means method is applied on the joint embeddings or the third set of vectors of the cropped images to obtain a number K of unique combinations of the attributes, wherein the number K is utilized to group the cropped images based on similar appearance and attributes.
Dated this 28th February 2025
Arjun Karthik Bala
IN/PA - 1021

Documents

Application Documents

# Name Date
1 202541018270-STATEMENT OF UNDERTAKING (FORM 3) [02-03-2025(online)].pdf 2025-03-02
2 202541018270-PROOF OF RIGHT [02-03-2025(online)].pdf 2025-03-02
3 202541018270-FORM FOR STARTUP [02-03-2025(online)].pdf 2025-03-02
4 202541018270-FORM FOR SMALL ENTITY(FORM-28) [02-03-2025(online)].pdf 2025-03-02
5 202541018270-FORM 1 [02-03-2025(online)].pdf 2025-03-02
6 202541018270-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [02-03-2025(online)].pdf 2025-03-02
7 202541018270-EVIDENCE FOR REGISTRATION UNDER SSI [02-03-2025(online)].pdf 2025-03-02
8 202541018270-DRAWINGS [02-03-2025(online)].pdf 2025-03-02
9 202541018270-DECLARATION OF INVENTORSHIP (FORM 5) [02-03-2025(online)].pdf 2025-03-02
10 202541018270-COMPLETE SPECIFICATION [02-03-2025(online)].pdf 2025-03-02
11 202541018270-FORM-9 [01-04-2025(online)].pdf 2025-04-01
12 202541018270-STARTUP [08-04-2025(online)].pdf 2025-04-08
13 202541018270-FORM28 [08-04-2025(online)].pdf 2025-04-08
14 202541018270-FORM 18A [08-04-2025(online)].pdf 2025-04-08
15 202541018270-FORM-26 [22-04-2025(online)].pdf 2025-04-22
16 202541018270-FER.pdf 2025-05-29
17 202541018270-MARKED COPIES OF AMENDEMENTS [30-06-2025(online)].pdf 2025-06-30
18 202541018270-FORM 13 [30-06-2025(online)].pdf 2025-06-30
19 202541018270-AMENDED DOCUMENTS [30-06-2025(online)].pdf 2025-06-30
20 202541018270-OTHERS [06-11-2025(online)].pdf 2025-11-06
21 202541018270-FER_SER_REPLY [06-11-2025(online)].pdf 2025-11-06
22 202541018270-CORRESPONDENCE [06-11-2025(online)].pdf 2025-11-06
23 202541018270-COMPLETE SPECIFICATION [06-11-2025(online)].pdf 2025-11-06
24 202541018270-CLAIMS [06-11-2025(online)].pdf 2025-11-06
25 202541018270-ABSTRACT [06-11-2025(online)].pdf 2025-11-06
26 202541018270-US(14)-HearingNotice-(HearingDate-15-12-2025).pdf 2025-11-19

Search Strategy

1 202541018270_SearchStrategyNew_E_202541018270E_21-05-2025.pdf