Abstract: A method and system for retrieving image based results, the method comprising: capturing an image; identifying at least one region of interest (ROI) within the said captured image based on pre-determined criteria and extracting one or more descriptors corresponding to each of the at least one ROI; transmitting the extracted one or more descriptors for each of the at least one ROI to a visual search engine; and collating the one or more descriptors corresponding to each of the at least one ROI to form a logical query block and conducting a search on a data repository based on the logical query block. Ref: Fig. 1
FIELD OF INVENTION
The present invention relates generally to visual search technology, and more particularly to an image identification system and method for identifying objects from digitally captured images, wherein image characteristics are used to identify an object from a plurality of objects in a database, and wherein the invention allows the user to perform a visual search over databases directly from the image acquisition device.
BACKGROUND OF INVENTION
With the advent of the era of big data, image resources on the Internet are growing rapidly, and how to quickly and efficiently retrieve large-scale image resources to meet user needs remains a problem to be solved. Current image search engines on the web rely purely on the keywords around the images and the filenames, which produces a lot of garbage in the search results. Because web-based image search engines are blind to the actual content of images, the result of querying for a specific object is often cluttered with irrelevant data.
Traditional image searching relies on text based and/or content based image searching techniques. In conventional content based image searching, the primary parameters for performing image recognition are colour, shape and texture. The disadvantage of such systems is that if the system becomes biased towards any single feature, it starts returning incorrect and irrelevant results, thereby spoiling the entire user experience.
Technological developments in image capturing, such as by using a camera or any other image acquisition device, inbuilt as well as standalone, have led to broad imaging capability in a diversity of applications. For example, image capturing hardware such as portable cellular phones with inbuilt digital cameras is now common in the market, and it is desirable that such devices be useful for duties other than taking pictures for transmission to a remote location. Most such cellular devices are pre-equipped with image acquisition devices which can generate high resolution imagery. However, traditional mobile applications developed for processing of images captured by the in-built mobile image acquisition device are inefficient and lead to shortage of storage space, high battery consumption, poor user experience, etc.
Furthermore, traditional methods for linking image objects to digital information, such as applying a barcode, a radio or optical transceiver or transmitter, or some other means of identification to the object, or modifying the image or object so as to encode detectable information in it, are cumbersome because of the multiplicity of units involved. It is desirable that any image or object be identifiable solely by its visual appearance.
In view of the same, a system and method for content based image retrieval are desired that allow a user to submit a query image and return images that are similar in content. Further, it is desirable that such image acquisition devices be adept at image identification, wherein such identification is efficient so that the computing required for it can be performed locally, shared with an Internet-connected computer, or performed remotely, depending on the database size and the available computing power. Further, it is desirable that an Artificial Intelligence (AI) powered method be developed which, in collaboration with hardware capabilities, can amalgamate shape, colour, texture, pattern and meta tag information collectively, without being biased towards any one of them, to give the best possible search result.
The present invention addresses the shortcomings of the existing technologies in the area of visual search by integrating image identification functionality within the native image acquisition device and identifying whether symbolic content is included in the image. If so, the symbol is decoded and communication is opened with the proper database, usually over the Internet, wherein the best match for the symbol is returned.
BRIEF SUMMARY OF INVENTION
The present invention seeks to address the aforementioned problems of the background art by processing the captured image into a query image at the capture stage itself and performing an effective search in order to find similar images. The present invention allows the user to directly access multiple databases and retrieve desired results based on the image captured by the image acquisition device, and to recognize the item/commodity/object with high precision and accurate results while performing the search.
Embodiments of the present invention include a system for capturing and searching images, having at least one image capturing device, standalone or in conjunction with a wireless pointing device; an image processing unit for automatically identifying at least one region of interest (ROI) and extracting descriptors out of the captured image; and a visual search engine for searching and retrieving search results for the query.
Another embodiment of the present invention includes a method for capturing and searching images having the steps of: capturing an image by an image capturing device; automatically identifying at least one region of interest (ROI) within the said image having an object; analyzing the ROI; and searching for related content based on the ROI analysis.
By way of application and as a preferred embodiment, the present invention will allow the user to capture an image of the desired object from the native camera application, generate a visual query based on the captured image, and wirelessly transmit the query to remote databases or servers which can in turn generate information content in response to the search query, more specifically the matching products available on multiple e-commerce websites, and display the results on the user's mobile phone. Embodiments of the present invention also enable the user to compare the price of the products, time of delivery and other such attributes, and subsequently buy the desired product, all using the native camera application in real time.
The present invention enhances the capabilities of the native camera application of a mobile phone in such a manner that it can be used for purposes beyond taking pictures. This invention will empower mobile makers to monetize the camera and at the same time improve the user experience.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a network diagram that illustrates an exemplary operating environment.
FIG. 2 is a block diagram showing the image processing unit which may be standalone or mounted as an internal arrangement of the camera on the client device.
FIG. 3 illustrates a visual search engine in accordance with the embodiments of the present invention.
FIG. 4 is a flowchart illustrating a processing procedure of the camera of the client device.
FIG. 5 is a flowchart illustrating the indexing and search operations performed by the visual search engine.
FIG. 6 is a graphical user interface 600 that illustrates images taken for an object or set of objects.
DETAILED DESCRIPTION OF THE EMBODIMENTS
The present invention broadly relates to a search technique for image queries. A region of interest (ROI) containing an image in a scene of a displayed content is captured. The ROI is analyzed locally or remotely to search for related content associated with the image. The viewer may then receive the related content.
Embodiments of the invention include a technique to provide related content associated with an image captured on a scene of a displayed content. The related content is of interest to the viewer who is viewing the displayed content. A region of interest (ROI) containing the image is captured by highlighting an object in the scene using a wireless pointing device and selecting an area encompassing the object to correspond to the ROI. The ROI is then analyzed to search for the related content associated with the image.
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Please note that configurations shown in the following embodiments are merely examples and the present invention is not limited to them.
In the current exemplary embodiment, an image acquisition device, mounted on a handheld device, standalone, or on a laptop, etc., compares the feature amount of captured image data with that in a recognition dictionary built over a database, executes object detection processing to detect the likelihood of a detection target object such as a shirt, a bag, etc., and transmits the detection result and the feature amount to an image processing apparatus according to the detected likelihood. Upon receiving the detection result and feature amount, the image processing apparatus executes object redetection processing using the feature amount.
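By way of a non-limiting illustration, the following Python sketch shows one way the likelihood-based comparison against a recognition dictionary could be realized; ORB descriptors, the class labels and the match threshold are assumptions for illustration and are not prescribed by the specification.

```python
# Non-limiting sketch: likelihood-based detection against a recognition
# dictionary. ORB features stand in for the "feature amount"; the class
# labels and threshold below are illustrative assumptions.
import cv2

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def detect_likelihood(image_bgr, dictionary):
    """Return (best_class, likelihood, descriptors) for a captured frame.

    dictionary: mapping such as {"shirt": ref_desc, "bag": ref_desc}, where
    each ref_desc holds ORB descriptors registered for that object type.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = orb.detectAndCompute(gray, None)
    if desc is None:
        return None, 0.0, None
    best_class, best_score = None, 0.0
    for label, ref_desc in dictionary.items():
        matches = matcher.match(desc, ref_desc)
        # Fraction of keypoints with a close match acts as the likelihood.
        good = [m for m in matches if m.distance < 40]
        score = len(good) / max(len(desc), 1)
        if score > best_score:
            best_class, best_score = label, score
    return best_class, best_score, desc

# The detection result and feature amount would be transmitted only when
# the likelihood clears an (assumed) threshold, e.g. 0.15.
```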
FIG. 1 is a network diagram that illustrates an exemplary operating environment 100, in accordance with embodiments of the invention. The operating environment 100 shown in Fig. 1 is merely exemplary and is not intended to suggest any limitation as to scope or functionality. Embodiments of the invention are operable with numerous other configurations. With reference to FIG. 1, the operating environment 100 includes a network 104, a visual search engine 105, image capturing devices including a handheld device 101, a standalone camera 102 and a camera mountable on a PC or a laptop 103, an image objects database 106 and an image processing unit 107.
The network 104 is configured to facilitate communication between the devices 101, 102 or 103 and the visual search engine 105. The devices include, without limitation, personal digital assistants, smart phones, laptops, personal computers, gaming systems, set-top boxes, or any other suitable client computing device. The network 104 also facilitates communication between the image objects database 106 and the visual search engine 105. The network 104 may be a communication network, such as a wireless network, local area network, wired network, or the Internet. In an embodiment, the devices 101, 102 or 103 interact with the visual search engine 105 utilizing the network 104. In response, the visual search engine 105 provides web pages, images, videos, or other electronic documents that contain terms provided or selected by the user.
The image processing unit 107 may be a graphical processing unit (GPU) or a controller that aids selection of the ROI of the captured image. The image processing unit 107 also serves as an information processing apparatus. The image processing unit 107 interacts with the image capturing devices to allow the viewer to capture an image and search for related content associated with the captured image. The image processing unit 107 receives image data and a detection result from the camera units 101, 102 or 103 and outputs, to the network 104, the received image data and detection result or the detection result of object redetection processing.
The image processing unit 107 detects the ROI and returns the related content. To further the analysis, ancillary information associated with the scene may be employed. The ancillary information may include metadata, content provider information, or a link to a Web site which may direct the user to a related Web site (e.g., the official logo for a company identified in the ROI, or the company website for an identified product), which may allow not only identification of the program but also where in the program the viewing is currently happening. For remote analysis, the ancillary information may be transmitted together with the ROI to the server. Optical character recognition (OCR) techniques may also be applied to the image in the ROI if it contains textual information, to give better results.
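As a hedged illustration of the OCR step, the sketch below applies pytesseract (one possible OCR backend; the specification does not name one) to a textual ROI and returns the text as ancillary information.

```python
# Hedged sketch of the OCR step: pytesseract is one possible backend (the
# specification names none). Any text found is returned as ancillary data.
import cv2
import pytesseract

def ocr_ancillary_info(roi_bgr):
    """Return OCR text found in the ROI, as an ancillary-information dict."""
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu binarisation tends to help OCR on product labels and logos.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    text = pytesseract.image_to_string(binary).strip()
    return {"ocr_text": text} if text else {}
```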
The visual search engine 105 is communicatively connected via the network 104 to the devices 101, 102 or 103 and the objects database 106. In certain embodiments, the visual search engine 105 is a server that generates visual representations in response to inputs from the devices 101, 102 or 103. The visual search engine 105 receives, over the network 104, a selection of an image or a combination of images from the devices 101, 102 or 103, which provide interfaces that receive interactions from users. In one embodiment, the visual search engine 105 is equipped to retrieve search results in response to a query having a combination of images formulated and issued by the user.
In certain embodiments, the visual search engine 105 traverses the objects database 106 to identify objects that correspond to the images received from the devices 101, 102 or 103. In turn, the objects database 106 transmits a set of objects that satisfies the selections to the visual search engine 105. The set of objects are associated with object queries, web pages, images, videos, or uniform resource locators (URLs) thereof along with other electronic documents. The visual search engine 105 is also able to format the URLs and transmit the URLs to the devices 101, 102 or 103.
The devices 101, 102 or 103 are utilized by a user to capture images as search terms, to hover over objects, or to select images from web links, and to receive results or web pages that are relevant to the search terms, the selected links, or the selected images. The devices include user and system information storage to store user and system information on the devices. The user information may include search histories, cookies and passwords. The system information may include internet protocol addresses, cached Web pages and system utilization. The client devices communicate with the visual search engine 105 to receive the results or web pages that are relevant to the search images, the selected links of images, or the selected image objects identified from web pages. The web pages provide details about items that interest the user. The web pages are indexed by index servers and may include terms or metadata. The terms or metadata are used by the index servers to store the web page in an appropriate location. Additionally, the web pages are associated with URLs that are also stored by the index servers and may be provided to the devices.
The image objects database 106 stores attributes and images for each object. The attributes include titles, image size, image colour, image patterns, image dimensions, and other metadata for the object. The visual search engine 105 may request one or more objects from the image objects database 106. In turn, the image objects database 106 returns the matching score of the queried image attribute against the already indexed attributes in the image objects database. Additionally, the image object servers may also be communicatively coupled with index servers that store web pages, the terms associated with each web page, and URLs corresponding to the said matching results. The visual search engine 105 may request one or more web pages from the image objects database 106. In turn, the image objects database transmits the web pages to the visual search engine 105.
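The matching-score lookup may be sketched as follows, under the assumption that descriptors are fixed-length vectors and that cosine similarity serves as the scoring function; neither detail is fixed by the specification.

```python
# Sketch of the matching-score lookup, assuming descriptors are fixed-length
# vectors and cosine similarity is the (unstated) scoring function.
import numpy as np

class ImageObjectsDatabase:
    def __init__(self):
        self.vectors = []    # indexed descriptor vectors (unit-normalised)
        self.records = []    # attributes: title, size, colour, URLs, ...

    def index(self, vector, attributes):
        self.vectors.append(vector / np.linalg.norm(vector))
        self.records.append(attributes)

    def match_scores(self, query_vector, top_k=5):
        """Return the top-k (score, attributes) pairs for a query vector."""
        q = query_vector / np.linalg.norm(query_vector)
        scores = np.stack(self.vectors) @ q
        order = np.argsort(scores)[::-1][:top_k]
        return [(float(scores[i]), self.records[i]) for i in order]
```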
Accordingly, the operating environment 100 is configured with a visual search engine 105 that provides results, including web pages and objects, to the devices 101, 102 or 103. The visual search engine 105 traverses the image objects database 106 and in turn traverses the index servers connected thereto to obtain results that satisfy the requests received from the users. In turn, the client devices render the results for display to the users.
In an embodiment, the visual search engine generates a graphical user interface that includes results matching terms provided by the user or selections made by the user. The results may include URLs that point to web pages. The graphical user interface contains a link that reformats the results into a visual representation. The link to the visual representation may be highlighted on the graphical user interface.
FIG. 2 is a block diagram showing the image processing unit, which may be standalone or mounted as an internal arrangement of the camera on the client device, according to the first embodiment as shown above. Fig. 2 illustrates an image capture unit 201 which includes a lens and an image sensor. The image capture unit 201 transmits captured image data to an object detection processing unit 203. The image encoder 202 encodes the image data into commonly known standard formats such as JPEG, MPEG2, MPEG4 or H.264, and the image proceeds for detection of the ROI at the object detection processing unit 203.
An object detection processing unit 203 includes the ROI identification unit 204, a likelihood of object unit 205 and a detection result generation unit 207. The ROI identification unit 204 detects the region with the most likelihood of having the image object. The likelihood of object unit 205 identifies the feature amount of the image data, for example any pattern on the image within the selected ROI. The feature amount represents the features of an image, and is used for internal processing in detecting a likelihood or similarity to the object. The likelihood of object unit 205 uses a feature amount registered in an object recognition engine 206 to detect the object of which the image is being taken and whether the same is a predetermined type of object. The likelihood of object unit 205 tries to identify the type of object whose image has been taken and, based on inputs from the object recognition engine 206, tries to decide whether the detection target object is an already identified predetermined type of object. The object recognition engine 206 is pre-trained on a large number of objects from different domains, for example fashion, e-commerce, natural scenes, facial contours, etc.
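One simplified, assumption-laden reading of the ROI identification unit 204 is given below: the region most likely to contain an object is approximated by the bounding box of the largest edge-connected contour. A trained detector could equally serve; the heuristic is purely illustrative.

```python
# Simplified sketch of the ROI identification unit 204: the region "most
# likely" to contain an object is approximated here by the bounding box of
# the largest edge-connected contour; a trained model could be used instead.
import cv2

def identify_roi(image_bgr):
    """Return an (x, y, w, h) bounding box for the most salient region."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    edges = cv2.dilate(edges, None, iterations=2)   # close small gaps
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:                                # fall back to whole frame
        h, w = gray.shape
        return 0, 0, w, h
    return cv2.boundingRect(max(contours, key=cv2.contourArea))
```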
In an exemplary embodiment, the likelihood of object unit 205 compares a feature amount or attribute of a predetermined object held in the object recognition engine 206 with that of the input image, and then detects the position and likelihood of the said object in the input image based on the comparison result. Based on the result of the said comparison, a determination with respect to the shape and type of the object whose image has been taken is made.
Based on the detection result of the object detection processing unit 203, a detection result generation unit 207 generates ROI information and integrated object(s) information, collates the said information and then outputs it as a detection result in the form of a logical query. The object related information includes the ROI, the type of object and the type of image, which may include the region of the image based on image data corresponding to the object region, and the like. The communication unit 208 transmits the detection result of the detection result generation unit 207 to the visual search engine.
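A minimal sketch of the collation into a logical query block follows; the JSON field names are illustrative assumptions, not taken from the specification.

```python
# Sketch of collation by the detection result generation unit 207 into a
# logical query block; the JSON field names are illustrative assumptions.
import json

def build_logical_query(roi, object_type, descriptors, ancillary=None):
    """roi: (x, y, w, h); descriptors: list of numpy arrays."""
    query = {
        "roi": {"x": roi[0], "y": roi[1], "w": roi[2], "h": roi[3]},
        "object_type": object_type,            # e.g. "shirt", "bag"
        "descriptors": [d.tolist() for d in descriptors],
        "ancillary": ancillary or {},          # OCR text, metadata, ...
    }
    return json.dumps(query)                   # handed to communication unit 208
```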
FIG. 3 illustrates a visual search engine that implements the image search part. The visual search engine 302 receives as input, via a user interface, digital images and user input image information 301 as discussed above. The processing blocks of the visual search engine include the ROI 303 and the ancillary information block 304. The ROI and object related information are captured as illustrated above with reference to Fig. 2. In addition, the visual search engine 302 also obtains ancillary information regarding the image that contains the ROI. The ancillary information 304 may include shape 305, colour 306, texture 307 or metadata 308. The ancillary information 304 provides additional search queries or criteria to narrow the search space or refine the search for more accurate results. The ancillary information 304 may be processed or filtered before being submitted together with the ROI 303 to the search.
The search uses the image contained in the ROI 303 as the query, together with the ancillary information 304 if such information is available. The search may return a number of search results. The search results may be filtered or selected by the image processing unit 107. The filtering or selection criteria may be tailored according to the viewer's preferences, based on established criteria or on history of usage.
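One plausible way to combine the ROI match with the ancillary channels is a weighted fusion of per-channel scores, sketched below; the weights are assumptions, not values from the specification.

```python
# Sketch of refining a search with ancillary channels: each available
# channel contributes a score and the weights are illustrative assumptions.
def fuse_scores(channel_scores, weights=None):
    """channel_scores: e.g. {"roi": 0.8, "colour": 0.6, "texture": 0.4}."""
    weights = weights or {"roi": 0.5, "shape": 0.2, "colour": 0.15,
                          "texture": 0.1, "metadata": 0.05}
    used = {c: s for c, s in channel_scores.items() if c in weights}
    total_w = sum(weights[c] for c in used)
    if total_w == 0.0:
        return 0.0
    return sum(weights[c] * s for c, s in used.items()) / total_w
```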
FIG. 4 is a flowchart illustrating a processing procedure of the camera of the client device according to the embodiment described above. When the camera on the device has a processor and memory, the processing flow of Fig. 4 indicates a program for causing the processor to execute the procedure shown in Fig. 4. The processor of the image capturing device serves as a computer, which executes a program which reads out the image from the said device.
In step 401, the user captures the image via the image capture unit 201 and subsequently the image processing unit 107, more specifically the ROI identification unit 204, acquires the image data input from the image capture unit 201. The image processing unit performs encoding via the encoder 202 as image processing for the acquired image data. In step 402, the ROI identification unit 204 detects the ROI.
In step 403, the likelihood of object unit 205 compares the detected ROI with that in the recognition engine 206, and detects, from the image data, the particular object whose image has been captured. Note that by adding feature amounts such as brand recognition and product textual descriptions to the recognition engine, it may be possible to perform more detailed image object detection processing and derive the exact product whose image has been captured. Subsequently, if the captured object is identified, image object information is generated in step 404.
On the other hand, if the image object is not identified, in step 405 the likelihood of object unit 205 generates available image information and descriptors about the image object. In step 406, the detection processing unit identifies whether the image needs to be recaptured, or whether a separate region thereof needs to be recaptured, and prompts the user to recapture the image via the image capture unit 201.
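Tying steps 402 to 406 together, the following sketch (reusing the identify_roi and detect_likelihood sketches above) shows one possible recapture decision; the threshold is again an assumption.

```python
# Sketch of the decision flow of steps 402-406, reusing identify_roi and
# detect_likelihood from the sketches above; the threshold is an assumption.
def process_capture(image_bgr, dictionary, threshold=0.15):
    x, y, w, h = identify_roi(image_bgr)                          # step 402
    roi = image_bgr[y:y + h, x:x + w]
    label, likelihood, desc = detect_likelihood(roi, dictionary)  # step 403
    if label is not None and likelihood >= threshold:             # step 404
        return {"status": "identified", "object_type": label,
                "descriptors": desc}
    # Steps 405-406: object not identified; return available descriptors
    # and signal that the image (or another region) should be recaptured.
    return {"status": "recapture", "descriptors": desc}
```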
In step 407, the detection result generation unit 207 generates, as a detection result, descriptors of the captured image, integrates specified object information including a plurality of pieces of information of object regions useful to determine the product, and transmits the same over the network 104.
FIG. 5 is a flowchart illustrating the indexing and search operations performed by the visual search engine 302 based on features, with iteration based on result selection, according to the embodiment as described above. In step 501, the user-inputted image as received from the camera unit is characterized with a set of feature descriptors. In step 502, the visual features are extracted from the queried image for the purpose of classification. In step 503, the extracted features are matched against the already indexed image object database 106, and the results are displayed to the user in step 504. User selection of preferred results may also be performed which, upon iterative refinement, would present better results.
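The iterative refinement of FIG. 5 may be sketched as a relevance-feedback loop over the ImageObjectsDatabase sketch above; the Rocchio-style update and the pick_preferred user-interface hook are assumptions for illustration.

```python
# Sketch of FIG. 5 as a relevance-feedback loop over the
# ImageObjectsDatabase sketch above; the Rocchio-style update and the
# pick_preferred hook (returns a result index or None) are assumptions.
import numpy as np

def search_with_refinement(db, query_vec, pick_preferred, rounds=2, top_k=5):
    q = query_vec / np.linalg.norm(query_vec)
    results = []
    for _ in range(rounds + 1):
        scores = np.stack(db.vectors) @ q               # steps 501-503
        order = np.argsort(scores)[::-1][:top_k]
        results = [(float(scores[i]), db.records[i]) for i in order]
        choice = pick_preferred(results)                # step 504 + feedback
        if choice is None:
            break
        # Pull the query towards the item the user preferred.
        q = 0.7 * q + 0.3 * db.vectors[order[choice]]
        q /= np.linalg.norm(q)
    return results
```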
FIG. 6 is a graphical user interface 600 that illustrates images taken for an object or set of objects. Whereas the present representation is depicted in two dimensions, a three-dimensional representation in accordance with embodiments of the invention may also be produced using the apparatus and method described above. The user may use the wireless pointing device having the camera or the image acquisition device to capture the image, identify the ROI 601 and use it as a query to search for related content. In an exemplary embodiment, the object 602 for which the search is to be made may typically be at the center of the ROI. The camera mounted on the user device captures the image of the object and generates a visual representation of the said object. The image processing device mounted along with the camera (or standalone) identifies the object and extracts deep visual features of the said object. Subsequently, the object information is sent over a communication network and the visual search engine matches the visual features extracted from the image to equivalent features of images in the image object database. The visual search engine then responds to the user query and replaces the as-received URL results with the visual representation (three-dimensional or two-dimensional) based on the number of objects and the dimensions of the objects. The visual search engine animates the objects as they are rendered in the graphical user interface.
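The specification does not name the deep feature extractor; as one hedged possibility, the sketch below uses a torchvision ResNet-18 backbone to produce a 512-dimensional descriptor for the object crop.

```python
# Hedged sketch of "deep visual feature" extraction; the specification does
# not name an extractor, so a torchvision ResNet-18 backbone is assumed.
import torch
import torchvision.models as models
import torchvision.transforms as T

_backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
_backbone.fc = torch.nn.Identity()     # keep the 512-d pooled features
_backbone.eval()

_preprocess = T.Compose([
    T.ToPILImage(), T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def deep_features(roi_bgr):
    """Return a 512-d descriptor for an object crop given as a BGR ndarray."""
    rgb = roi_bgr[:, :, ::-1].copy()   # OpenCV BGR -> RGB
    batch = _preprocess(rgb).unsqueeze(0)
    return _backbone(batch).squeeze(0).numpy()
```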
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
CLAIMS:
We claim:
1. A method for retrieving image based results, the method comprising:
capturing an image;
identifying at least one region of interest (ROI) within the said captured image based on pre-determined criteria and extracting one or more descriptors corresponding to each of the at least one ROI;
transmitting the extracted one or more descriptors for each of the at least one ROI to a visual search engine; and
collating the one or more descriptors corresponding to each of the at least one ROI to form a logical query block and conducting a search on a data repository based on the logical query block.
2. The method for retrieving image based results as claimed in claim 1, wherein the predetermined criteria includes automatically identifying the region with the most likelihood of having the image object.
3. The method for retrieving image based results as claimed in claim 1, wherein analyzing the ROI comprises searching for descriptors associated with the image, wherein the descriptors include additional information that further describes the object, such as content provider information, web URLs, metadata, etc.
4. The method for retrieving image based results as claimed in claim 3, wherein the additional information is also transmitted along with the ROI.
5. The method for retrieving image based results as claimed in claim 1, wherein analyzing the ROI comprises: performing an optical character recognition (OCR) analysis if the image has textual information, which is used to help identify the image.
6. The method for retrieving image based results as claimed in claim 5, wherein searching for related content comprises segregating the transmitted content into ROI related information and ancillary information such as shape, colour, texture or metadata of the object; and performing a search simultaneously with each of such parameters.
7. The method for retrieving image based results as claimed in any of the previous claims, comprising filtering or selecting the search results based on user preferences and displaying the said results on a graphical user interface.
8. A system for retrieving image based results, the system comprising:
at least one image capturing device, standalone or in conjunction with a wireless pointing device, for capturing an image;
an image processing unit communicably coupled to the at least one image capturing device, the image processing unit adapted to receive the captured image, to automatically identify at least one region of interest (ROI) in the captured image based on pre-determined criteria, and to extract one or more descriptors corresponding to the at least one ROI; and
a visual search engine communicably coupled to the image processing unit for retrieving search results after collating together the one or more descriptors.
9. The system for retrieving image based results as claimed in claim 8, wherein the image processing unit comprises:
an image encoder for encoding the captured image;
an ROI identification unit for identifying the ROI by detecting at least a portion of the captured image with the most likelihood of having the image object;
a likelihood of object unit for identifying the feature amount of the image data;
a recognition engine for identifying the target object;
a detection result generation unit for integrating ROI information and integrated object(s) information into a logical query block; and
a communication unit for transmitting the said information over a communication network.
10. The system for retrieving image based results as claimed in claims 8 and 9, wherein the visual search engine analyzes the content sent by the communication unit and returns the related content.
11. The system for retrieving image based results as claimed in claim 10, wherein the visual search engine segregates the content into ROI related information and ancillary information, such as shape, colour, texture or metadata of the object, and performs a search simultaneously with each of such parameters.
12. The system for retrieving image based results as claimed in any of the previous claims, wherein the visual search engine is communicatively coupled to an image object database for retrieval of related content.
13. The system for retrieving image based results as claimed in any of the previous claims, wherein the image processing unit filters or selects the search results based on user preferences.
14. The system for retrieving image based results as claimed in any of the previous claims, wherein the search results are displayed to the user on a graphical user interface.
| Section | Controller | Decision Date |
|---|---|---|
| 15 | AJAY SINGH MEENA | 2018-11-29 |

| # | Name | Date |
|---|---|---|
| 1 | 3654-del-2015-GPA-(09-11-2015).pdf | 2015-11-09 |
| 2 | 3654-del-2015-Form-5-(09-11-2015).pdf | 2015-11-09 |
| 3 | 3654-del-2015-Form-3-(09-11-2015).pdf | 2015-11-09 |
| 4 | 3654-del-2015-Form-2-(09-11-2015).pdf | 2015-11-09 |
| 5 | 3654-del-2015-Form-1-(09-11-2015).pdf | 2015-11-09 |
| 6 | 3654-del-2015-Correspondence Others-(09-11-2015).pdf | 2015-11-09 |
| 7 | Other Patent Document [25-05-2016(online)].pdf | 2016-05-25 |
| 8 | Drawing [08-11-2016(online)].jpg | 2016-11-08 |
| 9 | Description(Complete) [08-11-2016(online)].pdf | 2016-11-08 |
| 10 | Form 13 [12-11-2016(online)].pdf | 2016-11-12 |
| 11 | OTHERS [11-06-2017(online)].pdf | 2017-06-11 |
| 12 | Form 9 [04-07-2017(online)].pdf | 2017-07-04 |
| 13 | 3654-del-2015-FORM 18A [18-07-2017(online)].pdf | 2017-07-18 |
| 14 | 3654-DEL-2015-FER.pdf | 2017-08-18 |
| 15 | 3654-del-2015-FER_SER_REPLY [05-02-2018(online)].pdf | 2018-02-05 |
| 16 | 3654-del-2015-DRAWING [05-02-2018(online)].pdf | 2018-02-05 |
| 17 | 3654-del-2015-CLAIMS [05-02-2018(online)].pdf | 2018-02-05 |
| 18 | 3654-DEL-2015-HearingNoticeLetter.pdf | 2018-07-20 |
| 19 | 3654-DEL-2015-FORM-26 [29-08-2018(online)].pdf | 2018-08-29 |
| 20 | 3654-del-2015-FORM FOR STARTUP [29-08-2018(online)].pdf | 2018-08-29 |
| 21 | 3654-del-2015-EVIDENCE FOR REGISTRATION UNDER SSI [29-08-2018(online)].pdf | 2018-08-29 |
| 22 | 3654-del-2015-Written submissions and relevant documents (MANDATORY) [15-09-2018(online)].pdf | 2018-09-15 |
| 23 | 3654-DEL-2015-PETITION UNDER RULE 137 [15-09-2018(online)].pdf | 2018-09-15 |
| 24 | 3654-DEL-2015-PETITION UNDER RULE 137 [15-09-2018(online)]-1.pdf | 2018-09-15 |
| 25 | 3654-DEL-2015-PETITION UNDER RULE 137 [15-09-2018(online)]-1-1.pdf | 2018-09-15 |
| 26 | 3654-del-2015-Annexure (Optional) [15-09-2018(online)].pdf | 2018-09-15 |
| 27 | 3654-DEL-2015-Written submissions and relevant documents (MANDATORY) [01-11-2018(online)].pdf | 2018-11-01 |
| 28 | 3654-DEL-2015-Annexure (Optional) [01-11-2018(online)].pdf | 2018-11-01 |
| 29 | 3654-DEL-2015-PatentCertificate29-11-2018.pdf | 2018-11-29 |
| 30 | 3654-DEL-2015-IntimationOfGrant29-11-2018.pdf | 2018-11-29 |