Abstract: Disclosed is an optimized image based searching system (101) comprising a processor (201) and a method thereof. The processor (201) is configured for receiving an input query image. The processor (201) is configured for detecting and segmenting a plurality of objects. The processor (201) is configured for extracting a plurality of features from the plurality of objects. The processor (201) is configured for assigning a score to the plurality of features. The processor (201) is configured for generating a signature feature string. Further, the processor (201) is configured for converting the signature feature string to an input image vector. The processor (201) is configured for comparing the input image vector with the prestored vector representations of the images. Furthermore, the processor (201) is configured for retrieving and displaying at least one similar image vector matching the query image vector representation. [To be published with Figure 2]
Claims:
WE CLAIM:
1. An optimized image based searching system (101) comprising:
a processor unit (201); and
a memory unit (205) coupled to the processor unit (201), wherein the processor unit (201) is configured to execute instructions stored in the memory unit (205) for:
receiving an input in the form of a query image from a user device (103);
detecting and segmenting a plurality of objects present in the query image based upon pretrained object detection and segmentation data;
extracting a plurality of features from the plurality of objects based upon a pre-trained machine learning model;
assigning a score to the plurality of features extracted from the plurality of objects;
generating a signature feature string corresponding to the query image based upon the score assigned to each of the plurality of features;
converting the signature feature string into a vector representation, thereby generating an input image vector corresponding to the query image;
comparing the input image vector with a plurality of prestored image vectors corresponding to a plurality of images;
retrieving at least one similar image vector matching the input image vector corresponding to the query image; and
displaying one or more images on the user device (103), corresponding to the at least one image vector matching the query image vector representation.
2. The optimized image based searching system (101) as claimed in claim 1, wherein the query image is initially pre-processed and converted into a standard format, and wherein the standard format is a format enabled for a deep learning classifier technique.
3. The optimized image based searching system (101) as claimed in claim 1, wherein the system comprises customizable image segments data in a database (213), wherein the database (213) comprises pretrained objects data and segmentation data associated with a plurality of images, and wherein the objects and segments information comprises at least domain specific data, a plurality of features, and text and OCR data related to the plurality of images.
4. The optimized image based searching system (101) as claimed in claim 3, wherein the pretrained objects data is obtained by labeling and creating bounding boxes for the objects of the images stored as domain specific information in the database (213).
5. The optimized image based searching system (101) as claimed in claim 4, wherein the pretrained objects data is obtained by a Convolutional Neural Network (CNN) technique.
6. The optimized image based searching system (101) as claimed in claim 1, wherein the pretrained segment data is obtained by yielding pixel-wise masks from each object to precisely extract color of the images stored as domain specific information in the database (213), by using a Mask R-CNN technique.
7. The optimized image based searching system (101) as claimed in claim 1, wherein the signature feature string of the query image is generated by obtaining and quantifying the scores of 'n' features based on a weighted probability of the extracted features, and then repeating each feature 'n' times to form the signature feature string of the query image, wherein the signature feature string is generated for all the images in the domain specific information prestored in the database (213).
8. The optimized image based searching system (101) as claimed in claim 1, wherein comparing the input image vector is performed by a cosine similarity technique in a vector space, wherein the input image vector corresponds to the query image prestored in the database (213).
9. The optimized image based searching system (101) as claimed in claim 8, wherein the input image vector corresponding to the query image is represented in the form of a vector space occupying a point in a cubical matrix comprising the domain specific information prestored in the database (213).
10. An optimized image based searching method (300), comprising:
receiving (301), by a processor (201), an input in the form of a query image from a user device (103);
detecting and segmenting (303), by the processor (201), a plurality of objects of the query image based upon pretrained object detection and segmentation data;
extracting (305), by the processor (201), a plurality of features from the plurality of objects based upon a pretrained machine learning model;
assigning (307), by the processor (201), a score to the plurality of features extracted from the plurality of objects;
generating (309), by the processor (201), a signature feature string corresponding to the query image based upon the score assigned to each of the plurality of features;
converting (311), by the processor (201), the signature feature string to a vector representation, thereby generating an input image vector corresponding to the query image;
comparing (313), by the processor (201), the input image vector with a plurality of prestored image vectors corresponding to a plurality of images;
retrieving (315), by the processor (201), at least one image vector matching the query image vector; and
displaying (317), by the processor (201), one or more images on a user device (103), corresponding to the at least one image vector matching the query image vector.
11. The optimized image based searching method (300) as claimed in claim 10, wherein the method comprises a sub-step of normalizing the score values, such that all the different features are represented on a common scale in the signature feature string.
Dated this 18th day of December 2019
Priyank Gupta
Agent for the Applicant
IN/PA- 1454
Description:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
AN OPTIMIZED IMAGE BASED SEARCHING SYSTEM AND A METHOD THEREOF
APPLICANT
Zensar Technologies Limited,
an Indian Entity,
having address as:
Zensar Knowledge Park, Plot # 4, MIDC, Kharadi, Off
Nagar Road, Pune-411014, Maharashtra, India
The following specification describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
The present application does not claim priority from any other Patent Application.
TECHNICAL FIELD
The present subject matter described herein, in general, relates to an image searching system. More particularly, the present subject matter is related to an optimized image based searching system and a method thereof.
BACKGROUND
In a machine learning environment, unstructured data, such as that involved in content based image searching, is harder to implement, classify and manage. Various online/offline merchant domains now require search and retrieval of similar matching products given a product image as an input, i.e. content based image retrieval. In the state of the art, text-based searching is utilized to arrive at an expected result. Normally, in text based search techniques, users end up using different terminologies, which increases ambiguity.
In the state of the art, all available methods of content based image retrieval use the conventional approach of extracting and storing the features and then simply indexing them. This approach is limited in terms of representing similar data together. In most cases, the conventional methods end up retrieving some dissimilar data along with the similar data.
Further, in the state of the art, region extracting means are used for deleting a background region from each product image including one or more products and extracting only a product region. A database corresponding to the image information is utilized only for storing the product ID and product information of each product corresponding to the product region extracted from each product image. This product information may contain data like name, type, colour, etc. These features are extracted with the help of classifiers, which classify the images and store them in the database as per need. For retrieval of a similar kind of image, the already extracted product information is compared with the new image information, and the similar image is searched on that basis; in some of the conventional methods, the pixels of the two images are compared to find their similarity.
Furthermore, in the conventional approach, the classifier stores the feature with the maximum probability and discards all the other features with lesser probability. The major drawback of this conventional approach is that it also discards the actual probabilities and the ratios of these probabilities, resulting in loss of information.
Thus, there is a long felt need for a system and method for optimized image based search that retrieves similar images with the highest probability and without any loss of information.
SUMMARY
This summary is provided to introduce the concepts related to an optimized image based searching system and a method thereof, and the concepts are further described in the detailed description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.
In one implementation, the present subject matter describes an optimized image based searching system. The optimized image based searching system may include a processor and a memory coupled to the processor. The processor may be configured to execute programmed instructions stored in the memory. The processor may execute one or more programmed instructions for receiving an input in the form of a query image from a user device. The processor may execute one or more programmed instructions for detecting and segmenting a plurality of objects present in the query image based upon pretrained object detection and segmentation data. The processor may execute one or more programmed instructions for extracting a plurality of features from the plurality of objects based upon a pre-trained machine learning model. The processor may execute one or more programmed instructions for assigning a score to the plurality of features extracted from the plurality of objects. The processor may execute one or more programmed instructions for generating a signature feature string corresponding to the query image based upon the score assigned to each of the plurality of features. The processor may execute one or more programmed instructions for converting the signature feature string into a vector representation, thereby generating an input image vector corresponding to the query image. The processor may execute one or more programmed instructions for comparing the input image vector with a plurality of prestored image vectors corresponding to a plurality of images. The processor may execute one or more programmed instructions for retrieving at least one similar image vector matching the input image vector corresponding to the query image. Furthermore, the processor may execute one or more programmed instructions for displaying one or more images on the user device corresponding to the at least one image vector matching the query image vector representation.
In another implementation, the present subject matter describes an optimized image based searching method. The method may include receiving, by a processor, an input in the form of a query image from a user device. The method may further include detecting and segmenting, by the processor, a plurality of objects of the query image based upon pretrained object detection and segmentation data. The method may further include extracting, by the processor, a plurality of features from the plurality of objects based upon a pretrained machine learning model. The method may further include assigning, by the processor, a score to the plurality of features extracted from the plurality of objects. The method may further include generating, by the processor, a signature feature string corresponding to the query image based upon the score assigned to each of the plurality of features. The method may further include converting, by the processor, the signature feature string to a vector representation, thereby generating an input image vector corresponding to the query image. The method may further include comparing, by the processor, the input image vector with a plurality of prestored image vectors corresponding to a plurality of images. The method may further include retrieving, by the processor, at least one image vector matching the query image vector. The method may further include displaying, by the processor, one or more images on a user device corresponding to the at least one image vector matching the query image vector.
In one embodiment, the query image may be initially pre-processed and converted into a standard format enabled for a deep learning classifier technique.
In one embodiment, the system may comprise customizable image segments data in a database, wherein the database further comprises pretrained objects data and segmentation data associated with a plurality of images, and wherein the objects and segments information comprises at least domain specific data, a plurality of features, and text and OCR data related to the plurality of images.
In one embodiment, the pretrained objects data may be obtained by labeling and creating bounding boxes for the objects of the images stored as domain specific information in the database.
In one embodiment, the pretrained objects data may be obtained by a Convolutional Neural Network (CNN) technique.
In one embodiment, the pretrained segment data may be obtained by yielding pixel-wise masks from each object to precisely extract color of the images stored as domain specific information in the database, by using a Mask R-CNN technique.
In one embodiment, the signature feature string of the query image may be generated by obtaining and quantifying the scores of 'n' features based on the scores of the extracted features, and then repeating each feature 'n' times to form the signature feature string of the query image.
In one embodiment, the signature feature string may be generated for all the images in the domain specific information prestored in the database.
In one embodiment, the comparing of the input image vector may be performed by a cosine similarity technique in a vector space, wherein the input image vector corresponds to the query image prestored in the database.
In one embodiment, the input image vector corresponding to the query image may be represented in the form of a vector space occupying a point in a cubical matrix comprising the domain specific information prestored in the database.
In one embodiment, the optimized image based searching method may comprise a sub-step of normalizing the score values, such that all the different features are represented on a common scale in the signature feature string.
BRIEF DESCRIPTION OF DRAWINGS
The detailed description is described with reference to the accompanying Figures.
Figure 1 illustrates a network implementation (100) of an optimized image based searching system (101), in accordance with an embodiment of the present disclosure.
Figure 2 illustrates a pipeline of the programmed instructions executed by the system (101) for enabling faster and optimized image based search retrieval, in accordance with an embodiment of the present disclosure.
Figure 3 illustrates a method (300) of optimized image based searching by a user, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
Reference throughout the specification to “various embodiments,” “some embodiments,” “one embodiment,” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases “in various embodiments,” “in some embodiments,” “in one embodiment,” or “in an embodiment” in places throughout the specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
Referring to Figure 1, a network implementation (100) of an optimized image based searching system (101) is illustrated, in accordance with an embodiment of the present subject matter.
In an embodiment, the image based searching system (101) (hereinafter interchangeably referred to as the system (101)) may be connected to a user device (103) over a network (102). It may be understood that the optimized image based searching system (101) may be accessed by multiple users through one or more user devices (103-1), (103-2), (103-3)….(103-n), collectively referred to as a user device (103). The user device (103) may be any electronic device, communication device, image capturing device, machine, software, automated computer program, a robot or a combination thereof.
In an embodiment, although the present subject matter is explained considering that the system (101) is implemented (as an optimized image based searching system) on a server, it may be understood that the system (101) may also be implemented in a variety of user devices, such as, but not limited to, a portable computer, a personal digital assistant, a handheld device, a mobile phone, a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a mobile device, and the like. In one embodiment, the system (101) may be implemented in a cloud-computing environment. In an embodiment, the network (102) may be a wireless network such as Bluetooth, Wi-Fi, 3G, 4G/LTE and the like, a wired network, or a combination thereof. The network (102) can be accessed by the user device (103) using wired or wireless network connectivity means including updated communications technology.
In one embodiment, the network (102) can be implemented as one of the different types of networks, such as a cellular communication network, a local area network (LAN), a wide area network (WAN), the internet, and the like. The network (102) may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further, the network (102) may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
Further, referring to Figure 1, various components of the optimized image based searching system (101) are illustrated, in accordance with an embodiment of the present subject matter. As shown, the system (101) may include at least one processor (201), an input/output interface (203), a memory (205), programmed instructions (207) and data (209). In one embodiment, the at least one processor (201) is configured to fetch and execute computer-readable instructions stored in the memory (205).
In one embodiment, the I/O interface (203) may be implemented as a mobile application or a web-based application and may further include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface (203) may allow the system (101) to interact with the user devices (103). Further, the I/O interface (203) may enable the user device (103) to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface (203) can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface (203) may include one or more ports for connecting to another server. In an exemplary embodiment, the I/O interface (203) is an interaction platform which may provide a connection between users and the system (101).
In an implementation, the memory (205) may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and memory cards. The memory (205) may include programmed instructions (207) and data (209).
In one embodiment, the programmed instructions (207) may include routines, programs, objects, components, data structures, etc., which perform particular tasks or functions or implement particular abstract data types. The data (209) may comprise a data repository (211), a database (213) and other data (215). In one embodiment, the database (213) may comprise customizable image segments data, pretrained and customized objects data, and segmentation data associated with a plurality of images. The objects and segments information may comprise at least domain specific data, a plurality of features, and text and optical character recognition (OCR) data related to the plurality of images. The other data (215), amongst other things, serves as a repository for storing data processed, received, and generated by one or more components and programmed instructions.
The aforementioned computing devices may support communication over one or more types of networks in accordance with the described embodiments. For example, some computing devices and networks may support communications over a Wide Area Network (WAN), the Internet, a telephone network (e.g., analog, digital, POTS, PSTN, ISDN, xDSL), a mobile telephone network (e.g., CDMA, GSM, NDAC, TDMA, E-TDMA, NAMPS, WCDMA, CDMA-2000, UMTS, 3G, 4G), a radio network, a television network, a cable network, an optical network (e.g., PON), a satellite network (e.g., VSAT), a packet-switched network, a circuit-switched network, a public network, a private network, and/or other wired or wireless communications network configured to carry data. Computing devices and networks also may support wireless wide area network (WWAN) communications services including Internet access such as EV-DO, EV-DV, CDMA/1×RTT, GSM/GPRS, EDGE, HSDPA, HSUPA, and others.
The aforementioned computing devices and networks may support wireless local area network (WLAN) and/or wireless metropolitan area network (WMAN) data communications functionality in accordance with Institute of Electrical and Electronics Engineers (IEEE) standards, protocols, and variants such as IEEE 802.11 (“WiFi”), IEEE 802.16 (“WiMAX”), IEEE 802.20x (“Mobile-Fi”), and others. Computing devices and networks also may support short range communication such as a wireless personal area network (WPAN) communication, Bluetooth® data communication, infrared (IR) communication, near-field communication, electromagnetic induction (EMI) communication, passive or active RFID communication, micro-impulse radar (MIR), ultra-wide band (UWB) communication, automatic identification and data capture (AIDC) communication, and others.
The working of the system (101) in facilitating optimized image based search will now be described in detail referring to Figures 1, 2 and 3 as below:
In one embodiment, a user may select a product (e.g. a shirt) or an image of the product from multiple available images displayed over a user device (103). If the user desires to look for more similar or matching images to the selected product, the user may input a query image to the optimized image based searching system to retrieve highly similar images in a very short processing time.
In one embodiment, the processor (201) may be configured for receiving an input in the form of a query image from the user device (103). In an embodiment, the processor (201) may be configured to initially pre-process and convert the query image into a standard format. The standard format is a format of images enabled for a deep learning classifier technique.
The processor (201) may be further configured for pre-processing of the query image. In one embodiment, the pre-processing of the query image may be performed by using an open source library such as OpenCV or any similar machine learning environment. The pre-processing of the query image may comprise sub-steps including query image acquisition, image compression and decompression, image enhancement and restoration, image denoising and image resolution modification, which are known in the art and hence are not described in detail for the sake of brevity.
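By way of illustration only, such a pre-processing step may be sketched in Python as below; the function name, the 224×224 target size and the specific OpenCV calls are assumptions for this sketch, not prescriptions of the disclosure.

```python
# A minimal pre-processing sketch using OpenCV and NumPy (assumed libraries).
import cv2
import numpy as np

def preprocess_query_image(path, size=(224, 224)):
    """Read, denoise, resize and normalize a query image into a
    standard format suitable for a deep learning classifier."""
    image = cv2.imread(path)                        # image acquisition
    if image is None:
        raise FileNotFoundError(path)
    image = cv2.fastNlMeansDenoisingColored(image)  # image denoising
    image = cv2.resize(image, size)                 # resolution modification
    return image.astype(np.float32) / 255.0         # scale pixels to [0, 1]
```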
In one embodiment, the processor (201) may be further configured for detecting and segmenting a plurality of objects present in the query image based upon pretrained object detection and segmentation data. In one embodiment, the detection and segmentation of the plurality of objects present in the query image may be performed on a pre-processed query image by implementing machine learning platform models. In one embodiment, the detection and segmentation of the plurality of objects may be performed as per the product specific domain use case. The processor (201) may be further configured to detect and locate, in the image, the objects for which the pre-trained models are trained.
The query image fragment detection and segmentation may be performed by various computer implemented techniques based on Convolutional Neural Networks for object detection. Furthermore, the query image fragment detection and segmentation is performed by using Convolutional Neural Network techniques selected from, but not limited to, R-CNN, SPP, Fast R-CNN, Faster R-CNN, Feature Pyramid Networks, RetinaNet (focal loss), the YOLO framework (YOLOv1, YOLOv2, YOLOv3) and SSD.
In one embodiment, the selection of the computer implemented technique may vary depending on hardware capability and requirements. In a preferred embodiment, the computer implemented technique based on Convolutional Neural Networks for object detection and segmentation is implemented by using a YOLO-V3 framework technique. It must be noted herein that the YOLO-V3 framework is enabled to provide decent and faster mean average precision (mAP) for object segmentation.
In one embodiment, the object detection from the query image may include various steps. In the first step, relevant data and additional data may be collected and labelled by a tool such as 'labelImg'. The labelling of the data enables the processor (201) to create bounding boxes and label the bounding boxes as per the object class. In the next step, the YOLO framework may be implemented over a detected labelled object of the query image.
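A hedged sketch of such YOLO-V3 based detection, using OpenCV's DNN module as one possible implementation path, is shown below; the configuration and weights file names, the 416×416 input size and the confidence threshold are assumptions, and in practice a model trained on the labelled domain specific data would be substituted.

```python
# A sketch of YOLO-V3 object detection via OpenCV's DNN module (assumed files).
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

def detect_objects(image, conf_threshold=0.5):
    """Return (class_id, confidence, box) tuples for detected objects."""
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    detections = []
    for output in net.forward(net.getUnconnectedOutLayersNames()):
        for row in output:              # row = [cx, cy, bw, bh, obj, classes...]
            scores = row[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_threshold:
                cx, cy, bw, bh = row[:4] * np.array([w, h, w, h])
                box = (int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh))
                detections.append((class_id, confidence, box))
    return detections
```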
In one embodiment, the processor (201) may be configured for segmentation of the pre-processed query image. The segmentation of the pre-processed query image may be performed by forming bounding box (x, y)-coordinates for the various objects in the query image. However, unlike the object detection technique, the segmentation of the query image is enabled for yielding pixel-wise masks for each of the segmented objects. The segmentation of the query image is enabled to extract colour from the objects of the query image precisely.
In one embodiment, the segmentation of the query image is enabled via the Mask R-CNN computer implemented CNN technique. The processor (201) may be configured to collect and label the segmented objects of the query image by using a 'visual image annotator' tool. The main purpose of obtaining the segmented image is to extract colour from the segmented objects of the query image precisely and without interference from background noise data of the query image. The segmented image data is then stored in the database (213) as pretrained segment data. It must be noted herein that the aforementioned techniques of object detection and segmentation using the technologies listed above are known in the art, and hence a detailed description of those technologies is not provided in the disclosure for the sake of brevity.
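The colour extraction from a pixel-wise mask may be illustrated as follows; the boolean mask is assumed to come from an instance segmentation model such as Mask R-CNN, whose invocation is omitted here.

```python
# An illustrative sketch of colour extraction under a pixel-wise object mask.
import numpy as np

def dominant_colour(image, mask):
    """Mean BGR colour of the pixels under a boolean object mask,
    ignoring background pixels entirely."""
    pixels = image[mask]          # only pixels belonging to the object
    return pixels.mean(axis=0)    # average colour, free of background noise
```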
The data obtained from the object detection and segmentation is then stored in the database (213) in the form of pretrained objects data. In one embodiment, the pretrained objects data is obtained by labeling and creating bounding boxes for the objects of the images stored as domain specific information in the database (213). The object detection and the object segmentation data are further used for performing feature extraction.
The processor (201) may be further configured to perform feature extraction. The feature extraction includes extraction of information from the query image by classifying the image into multiple categories. The classification of extracted information may be implemented by various custom trained deep learning models (classifiers) and is known as feature extraction of the query images. The feature extraction may be executed by optimized models obtained by training a Convolutional Neural Network (CNN) based on the 'Keras' library and the domain specific data.
The process of feature extraction may include multiple sub-steps. The first step of the feature extraction process is to collect customized data as per the required feature, which is pre-trained and fed to the database (213) of the system (101). In an exemplary embodiment, if the user requirement is to extract the pattern of a shirt as one of the features, then the pretrained data fed to the database (213) of the system (101) is categorised as "Printed, Checked, Stripes, Solids", etc.
The CNN network based on 'Keras' implemented by the processor (201) includes successive steps of Convolution, Pooling and Flattening. The classifier is enabled to determine and predict the pattern of the 'shirt' from the query image by comparing the feature data to pretrained data stored in the database (213). The process of feature extraction is repeated for a predetermined number of iterations in order to extract a plurality of additional features from the objects in the query image. It must be noted herein that the aforementioned technique of feature extraction using the technologies listed above is known in the art, and hence a detailed description of those technologies is not provided in the disclosure for the sake of brevity.
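For illustration, a minimal Keras classifier with the convolution, pooling and flattening stages named above may look as follows; the layer sizes and input shape are assumptions for the sketch, while the four output classes follow the shirt pattern example (Printed, Checked, Stripes, Solids).

```python
# A minimal Keras sketch of a pattern classifier (assumed layer sizes).
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

pattern_classifier = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),  # convolution
    MaxPooling2D((2, 2)),                                              # pooling
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),                                                         # flattening
    Dense(64, activation="relu"),
    Dense(4, activation="softmax"),  # Printed, Checked, Stripes, Solids
])
pattern_classifier.compile(optimizer="adam", loss="categorical_crossentropy")
```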
The processor (201) may be further configured for assigning a score (interchangeably referred to as a "score value") to the plurality of features extracted from the plurality of objects of the query image. The score is assigned based on a pre-trained score assignment model stored in the database (213).
In one embodiment, the plurality of features are the attributes obtained from segmentation and detection of objects. In another embodiment, the plurality of features can be constituted as word or symbolic tags for the characteristics of the objects of the query image. In one embodiment, the plurality of features may be represented as, but are not limited to, optical character recognition (OCR) text available in the query image. Such text is also added as a feature to form a feature string generated from attributes, tags and OCR identified text.
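A hedged sketch of extracting such OCR text is shown below; pytesseract, a wrapper around the Tesseract OCR engine, is an assumed implementation choice, and the extracted tokens would simply be appended to the feature string.

```python
# A sketch of OCR text extraction using the pytesseract wrapper (assumed).
import pytesseract
from PIL import Image

def ocr_features(path):
    """Extract any printed text in the image as additional feature tokens."""
    text = pytesseract.image_to_string(Image.open(path))
    return text.split()
```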
The processor (201) may be further configured to combine and add each of the plurality of features obtained from the pretrained data, generating a feature string corresponding to the query image based upon the score assigned to each of the plurality of features.
The processor (201) may be further configured for normalizing the score values, such that all the different features are represented on a common scale in a signature feature string. In one embodiment, the score values may be represented as the probability values.
The processor (201) may be further configured for generating a signature feature string corresponding to the query image based upon the score assigned to each of the plurality of features. In one embodiment, the signature feature string of the query image may be generated by obtaining and quantifying the score value of 'n' features based on the scores of the extracted features, and then repeating each feature 'n' times to form the signature feature string of the query image. In one embodiment, the signature feature string may be generated for all the images in the domain specific information prestored in the database (213).
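A sketch of this signature generation, under the assumption that the normalized scores lie in [0, 1] and are quantified with an illustrative base of 10, is given below.

```python
# A sketch of signature feature string generation: each feature's normalized
# score is quantified to an integer count 'n', and the feature token is then
# repeated 'n' times. The quantization base of 10 is an assumption.
def signature_feature_string(feature_scores, base=10):
    """feature_scores: dict of feature token -> normalized score in [0, 1]."""
    tokens = []
    for feature, score in feature_scores.items():
        n = round(score * base)       # quantify the score to a repeat count
        tokens.extend([feature] * n)  # repeat the feature token 'n' times
    return " ".join(tokens)

# e.g. {"male": 0.9, "female": 0.1} -> "male male ... male female"
```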
The processor (201) may be further configured for converting the signature feature string into a vector representation, thereby generating an input image vector corresponding to the query image. The processor (201) may be further configured for comparing the input image vector with a plurality of prestored image vectors corresponding to a plurality of images obtained from the database (213). In one embodiment, comparing the input image vector with the prestored image vectors stored in the database (213) may be performed by a cosine similarity technique. The input image vector corresponds to the query image data prestored in the database in the form of a vector representation.
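The conversion of signature strings to vectors and their comparison by cosine similarity may be sketched as below; the use of scikit-learn's CountVectorizer and cosine_similarity is an assumed implementation choice, not the disclosed method itself.

```python
# A sketch of vectorizing signature strings and ranking by cosine similarity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_by_similarity(query_signature, stored_signatures):
    """Return indices of stored signatures sorted by decreasing
    cosine similarity to the query signature."""
    vectorizer = CountVectorizer()
    vectors = vectorizer.fit_transform([query_signature] + stored_signatures)
    scores = cosine_similarity(vectors[0], vectors[1:])[0]
    return scores.argsort()[::-1]
```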
In one embodiment, the processor may be configured to represent the input image vector corresponding to the query image in the form of a vector space occupying a point in a cubical matrix comprising the domain specific information prestored in the database (213).
The processor (201) may be further configured for retrieving at least one image vector matching the input image vector corresponding to the query image. The processor (201) may be further configured for displaying one or more images on the user devices (103) corresponding to the at least one image vector matching the query image vector representation.
Now, referring to Figure 2, a pipeline of the programmed instructions executed by the system (101) for enabling faster and optimized image based search retrieval is illustrated, in accordance with an embodiment of the present disclosure. In an embodiment, the processor (201) may enable the database (213) to store the domain specific information in the form of a vector representation in a vector space. The processor (201) may be configured for receiving an input image (instead of a query image) from a user. The processor (201) may be configured for initially pre-processing and converting the input image into a standard format, wherein the standard format is a format enabled for a deep learning classifier technique. The processor (201) may be configured for independently detecting and segmenting a plurality of objects present in the input image based upon pretrained object detection and segmentation data. The processor (201) may be configured for extracting a plurality of features from the plurality of objects of each input image based upon a pre-trained machine learning model.
The processor (201) may be configured for assigning a score value to the plurality of features extracted from the plurality of objects of each input image. In one embodiment, the score may be a priority value assigned to the plurality of features extracted from the plurality of objects of each input image, wherein the priority value may be weights assigned to the plurality of features. In one embodiment, the score may be a probability value, and the priority value may assign weights to the plurality of features extracted from the plurality of objects. In one embodiment, the weights may comprise a percentage of similarity or difference between prestored features and the features extracted from the plurality of objects of the query image.
The processor (201) may be configured for generating and storing a signature feature string corresponding to the input image based upon the score assigned to each of the plurality of features of each input image. The processor (201) may be configured for converting the signature feature string into a vector representation, thereby generating an input image vector corresponding to each input image, and storing the input image vector corresponding to each input image in the database (213). In one embodiment, the input image vectors corresponding to each input image are stored in a vector space. The input image vectors are represented in the vector space in such a way that the distance between two different features is greater. More particularly, the vector space is generated such that similar items are clustered together while dissimilar items are far apart in the vector space. Therefore, multiple input images may be processed by the processor (201) by implementing the aforementioned steps of object detection, image segmentation, feature extraction, generating a signature feature string, and converting the signature feature string into a vector representation, thereby generating an input image vector for each of the multiple input images. A vector space representation of the multiple input images is generated by the processor (201), which may be utilized for image-based search and retrieval by the system (101).
For example, when an input query image is received from the user device (103), the processor (201) executes the programmed instructions to implement the aforementioned steps of object detection, image segmentation, feature extraction, generating a signature feature string, and converting the signature feature string into a vector representation, thereby generating an input image vector for the input query image. Further, the cosine similarity between the vectors of the query image signature and the stored signatures is calculated by the processor (201), and the results, sorted on similarity, are displayed to the user on the user device (103). It must be noted herein that the query image is also processed through the same pipeline of the programmed instructions (207) for generating a signature feature string, which is further converted to a vector and stored in the vector space. Thus, when data similar to the query image is retrieved from the vector space, the similar image data is retrieved with high similarity. The retrieved data with high similarity is then displayed on the user device (103).
Now, referring to Figure 3, a method (300) depicting optimized image based searching by the system (101) is illustrated in accordance with the embodiments of the present disclosure. The order in which the method (300) is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method (300) or alternate methods. Furthermore, the method (300) can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for the ease of explanation, in the embodiments described below, the method (300) may be implemented in the above described system (101).
At step (301), the processor (201) may be configured for receiving an input in the form of a query image from the user device (103). In one embodiment, the input query image may be a screenshot, a camera captured image, a copied image or a display image on an online merchant domain. In another embodiment, the input image may be a scanned image.
At step (303), the processor (201) may be configured for detecting (303a) and segmenting (303b) a plurality of objects of the query image based upon a pretrained object detection and segmentation data model. The step of detecting (303a) and segmenting (303b) the plurality of objects is performed by detecting and locating the objects using pre-trained object detection models of domain specific data. The step of detecting (303a) further comprises a sub-step of labelling data and creating bounding boxes to form object classifiers. The step of segmenting (303b) the plurality of objects is further performed for yielding a pixel-wise mask for each of the objects, extracting domain specific details from the image in a precise manner, as per the use case scenarios. In one example, if the use case scenario is related to a retail outfit domain, the domain specific details extracted may include a neckline of a shirt, the color of the shirt, the fit of the shirt (slim, tapered, etc.), and the like. In another example, if the use case scenario is related to a furniture related domain, the domain specific details extracted may include the shape of a furniture item, the finishing of the furniture item, or the texture of the furniture item, and the like.
At step (305), the processor (201) may be configured for extracting a plurality of features from the plurality of objects based upon a predetermined machine learning model. In one embodiment, the plurality of features may be attribute features of the object, a word or a symbolic tag for a specific characteristic feature of the object, or OCR recognized text available in the query image as one of the objects.
At step (307), the processor (201) may be configured for assigning a score to the plurality of features extracted from the plurality of objects. In various embodiments, the score may be at least one of a numerical value, a probability value, a priority value, a weighted average value or a combination thereof, as per the use case scenario. Unlike the conventional approach, the plurality of classifiers of the present disclosure generate the features with the maximum probability as well as all the other features with lesser probabilities. In one exemplary embodiment, the processor (201) is configured to generate a score related to the features based on their maximum probability value and lesser probability values, wherein the probability value is indicative of one or more segments such as gender, size, type and the like.
Application of the score ensures that the scores of dissimilar features differ by a large margin. The larger margin in the score values of the features ensures that the processor (201) is enabled to cluster similar features together, whereas unrelated features, being farther apart, are not clustered.
A plurality of deep learning models may be used to extract features from the query image. In one embodiment, in the case of a specific merchant domain use case of outfits, features such as Color, Pattern, Fit, Style and Fabric, along with additional features, are extracted. The overall information of a query image is stored in the database (213) in the form of a single string.
In one embodiment, the processor (201) may be configured for normalizing the score values such that all the different features are represented on a common scale in the signature feature string. In one embodiment, the score values may be represented as probability values.
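A minimal normalization sketch, assuming the raw scores are non-negative, is shown below.

```python
# A minimal sketch of score normalization, assuming non-negative raw scores,
# so that all the different features share a common scale.
def normalize_scores(raw_scores):
    """raw_scores: dict of feature -> non-negative raw score."""
    total = sum(raw_scores.values())
    return {feature: score / total for feature, score in raw_scores.items()}

# e.g. {"male": 180, "female": 20} -> {"male": 0.9, "female": 0.1}
```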
At step (309), the processor (201) may be configured for generating a signature feature string corresponding to the query image based upon the score of the plurality of features extracted from the plurality of objects in the query image.
In one embodiment, the processor (201) may be configured for quantifying the probability value as a numerical value of count ‘n’. Each feature is then repeated ‘n’ times to obtain a signature. This signature is then converted to a vector format.
At step (311), the processor (201) may be configured for converting the signature feature string to a vector representation, thereby generating an input image vector corresponding to the query image.
This process (steps 301 to 311) is repeated for all the images in the database (213). The input image vector representations of the images thus obtained are then stored in the database (213) along with their respective images. The same process steps (steps 301 to 311) are repeated for a new input query image. The vector thus obtained is compared to all the vectors in the database using cosine similarity, and the images are then sorted based on similarity.
At step (313), the processor (201) is configured for comparing the input image vector with a plurality of prestored image vectors corresponding to a plurality of images.
The processor (201) may be configured for converting a vector representation of the plurality of images into a vector space. The plurality of images is represented in the form of a vector space in such a way that the distance between two dissimilar features is larger than that between similar features. In one embodiment, the processor (201) may be configured for clustering images with similar features together and keeping those with dissimilar features far apart.
At step (315), the processor (201) may be configured for retrieving at least one image vector matching the query image vector. The data is therefore retrieved with high similarity. In one embodiment, a cosine similarity between the input vector of the query image signature feature string and the pre-stored signature features is calculated, and the results displayed are sorted on the basis of similarity. The matching results are returned in decreasing order of similarity, or increasing order of distance between the vectors represented in the form of the vector space.
At step (317), the processor (201) may be configured for displaying one or more images on a user device (103), corresponding to the at least one image vector matching with the query image vector.
The embodiments of the present disclosure are further elaborated in the form of a working example. In the case of an online outfit merchant domain based optimized image based search, the following steps are performed to obtain images highly similar to an input query image.
Initially, customized data related to the outfits (tops, bottoms, shirts, etc.) pre-stored in the database is collected. For example, if a user desires to extract similar patterns of 'a shirt' as a feature, in that case, the 'shirt' becomes the 'query image' and the pattern (Printed, Checked, Stripes, Solids) of the shirt is considered as the classifying feature. The data related to the outfit is prestored in the database in the form of a 'pretrained model'. Based on the pre-trained model, the classifier can predict the pattern of the shirt from the given image.
In one embodiment, a model is configured for detecting and segmenting the query image comprising outfits. The model detects an outfit in the query image and segments it as bottoms, tops, male, female, etc., based on the models it is pre-trained for.
The models which are custom trained on thousands of open source images are used to identify [Gender, Pattern, Fit, Style, Sleeve Length, Color] and more. Each of these features or word tags is added and combined together to make a single Feature String.
Feature String – [top short sleeve black white checked flared puma slim fit v neck]
An OCR (Optical Character Recognizer) is also used to identify any text that is available in the image and that text is also added to the feature string.
Final Output = Feature String + OCR Output
Further, the processor (201) assigns a weighted priority to the objects of the query image comprising a shirt. The following implementation represents a potential way of predicting the query image.
Consider the scenario where the image comprises an object 'shirt', more probably being a male outfit. The probability value assigned to gender may be:
Gender Model: Male = 90% : Female = 10%
These probabilities are then multiplied by a predetermined weight. For this case, the predetermined weight assigned for the gender is 2. The probability of the gender model is then multiplied by the predetermined weighted value 2.
Male = 180 : Female = 20
Now these weighted probability values are quantified by a predetermined priority factor of 20 to generate the score value for gender.
Male = 9 : Female = 1
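By way of a hedged restatement, the arithmetic of this example may be sketched as follows; the function name is illustrative, while the weight of 2 and the priority factor of 20 are taken from the example itself.

```python
# The gender score arithmetic of the working example: class probabilities
# (in percent) are multiplied by the predetermined weight (2 for gender)
# and quantified by the priority factor of 20 to yield the score values.
def quantified_scores(probabilities, weight=2, priority_factor=20):
    """probabilities: dict of class -> probability in percent."""
    return {cls: round(p * weight / priority_factor)
            for cls, p in probabilities.items()}

print(quantified_scores({"Male": 90, "Female": 10}))  # {'Male': 9, 'Female': 1}
```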
In the next step, based on the quantification of the features, a signature feature string is generated for the query image. A similar process is repeated for the other features and objects of the input query image.
So, the Signature obtained for this case may be represented as:
[Male Male Male Male Male Male Male Male Male Female Pattern, Fit, Style, Sleeve Length, Color]
Further, with the help of Optical Character Recognition (OCR) extraction, any text present in the image is extracted. Therefore, a final output of the signature feature string is generated as:
Feature String = [Male Male Male Male Male Male Male Male Male Female Pattern Fit Style Sleeve Length Colour] + OCR Text
Further, the generated feature string for the specific query image is converted to form an input vector.
The input vector of a feature string is then represented in the form of a vector space, where the distance between two different features is greater. Similar items are clustered together and dissimilar items are far apart. In this example, shirts for men may be located far from shirts for women when represented in the vector space using the aforementioned process. Therefore, the data is retrieved with high similarity. In the next step, the cosine similarity between the vectors of the query image signature and the stored signatures is calculated, and the results are shown sorted on similarity. In the final step, the most relevant results matching the query image of the 'shirt' are displayed on the user device as a final output.
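As a small self-contained demonstration of this clustering behaviour, the signatures below are illustrative stand-ins, not data from the disclosure; under cosine similarity, the two men's shirt signatures score far closer to each other than to the women's shirt signature.

```python
# Signatures with repeated tokens cluster as described: cosine similarity
# between the two men's shirts is high, and low against the women's shirt.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

men_a = "male " * 9 + "female checked slim"          # mostly 'male' tokens
men_b = "male " * 8 + "female female striped slim"
women = "female " * 9 + "male checked slim"          # mostly 'female' tokens

v = CountVectorizer().fit_transform([men_a, men_b, women])
print(cosine_similarity(v[0], v[1:]))  # men_a ~0.98 to men_b, ~0.24 to women
```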
The optimized image based searching system (101) as described in the present disclosure may provide multiple advantages, involving but not limited to:
• The system (101) discloses an approach for converting unstructured image data into structured, manageable vector representations in a vector space using a cosine similarity technique, enabling more efficient storage and retrieval of similar data.
• The disclosed system (101) may be implemented for various applications, including online merchant domains such as, but not limited to, fashion, furniture, outfits, closets, and inventory management domains.
• The disclosed system (101) represents the input/query image in the form of a vector representation. When these vectors are represented in the vector space, the distance between two different features is greater. Similar items are clustered together and dissimilar items are far apart, retrieving the data with high similarity and less processing time.
The embodiments, examples and alternatives of the preceding paragraphs or the description and drawings, including any of their various aspects or respective individual features, may be taken independently or in any combination. Features described in connection with one embodiment are applicable to all embodiments, unless such features are incompatible.
Although implementations for the optimized image based searching system (101) and the method (300) thereof have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for the optimized image based searching system (101) and the method (300) thereof.