
System And Method For Automatically Capturing And Mapping A Retail Environment Using A Robotic Camera

Abstract: A system for automatically capturing and mapping a plurality of assets in a retail environment 106 is provided. The system includes a robotic camera 102 for capturing an image of the retail environment 106. The robotic camera 102 comprises a processing unit 104 that is configured with a starting coordinate of the retail environment 106. The processing unit 104 (a) enables the robotic camera 102 to adjust its zoom parameter with respect to the starting coordinate and capture a first photo of the retail environment 106; (b) executes a first machine learning model 308A to identify a plurality of assets in the retail environment 106 by processing the first photo; (c) executes a second machine learning model 310A to compute a plurality of features for different parts of an underlying physical space of the retail environment 106 using the first photo and the identified plurality of assets; (d) executes a third machine learning model 312A to determine a movement of the robotic camera 102 using the identified plurality of assets; (e) automatically configures a second coordinate of the retail environment 106 with respect to the movement of the robotic camera 102 and enables the robotic camera 102 to move in a vertical and horizontal plane to position itself at the second coordinate of the retail environment 106; and (f) automatically captures a second photo of the underlying physical space or a non-overlapping region of the retail environment 106 from the second coordinate in order to capture the entire retail environment 106.
FIG. 1


Patent Information

Application #
201941043842
Filing Date
29 October 2019
Publication Number
12/2020
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
ipo@myipstrategy.com
Parent Application

Applicants

INFILECT TECHNOLOGIES PRIVATE LIMITED
12056, SOBHA ELITE 8TH MILE, TUMKUR RD, NAGASANDRA, BENGALURU, KARNATAKA -560073, INDIA

Inventors

1. VIJAY GABALE
A105, Gopalan Admiralty Square, 6th Main, 13th Cross, Indiranagar, Bengaluru, Karnataka, India-560038.
2. ANAND PRABHU SUBRAMANIAN
12056, Sobha Elite 8th Mile, Tumkur Rd, Nagasandra, Bengaluru, Karnataka, India-560073.
3. VIJAYKUMAR KANNAN
Apartment 23166, Prestige Shantiniketan, ITPL Main Road, Bengaluru, Karnataka, India-560048.

Specification

Claims: I/We Claim:
1. A system for automatically capturing and mapping a plurality of assets in a retail environment (106), wherein the system comprises:
a robotic camera (102) for capturing an image of the retail environment (106), wherein the robotic camera (102) comprises a processing unit (104) that is configured with a starting coordinate of the retail environment (106), wherein the processing unit (104)
(a) enables the robotic camera (102) to adjust its zoom parameter with respect to the starting coordinate and capture a first photo of the retail environment (106);
(b) executes a first machine learning model (308A) to identify a plurality of assets in the retail environment (106) by processing the first photo;
(c) executes a second machine learning model (310A) to compute a plurality of features for different parts of an underlying physical space of the retail environment (106) using the first photo and the identified plurality of assets;
(d) executes a third machine learning model (312A) to determine a movement of the robotic camera (102) using the identified plurality of assets, wherein the movement of the robotic camera (102) comprises a vertically upward movement, a vertically downward movement, a horizontal movement in a right direction or a horizontal movement in a left direction;
(e) automatically configures a second coordinate of the retail environment (106) with respect to the movement of the robotic camera (102) and enables the robotic camera (102) to move in a vertical and horizontal plane to position itself at the second coordinate of the retail environment (106); and
(f) automatically captures a second photo of the underlying physical space or a non-overlapping region of the retail environment (106) from the second coordinate in order to capture the entire retail environment (106).
2. The system as claimed in claim 1, wherein the processing unit (104) repeats the steps of (a) to (f) until the entire retail environment (106) is fully captured by the robotic camera (102).
3. The system as claimed in claim 1, wherein the plurality of assets comprises at least one of a retail shelf, a refrigerator, a retail shelf boundary, a point of sale material, a retail shelf product, or a retail price tag.
4. The system as claimed in claim 1, wherein the processing unit (104) assists a user to move the robotic camera (102) in the vertical or horizontal plane.
5. The system as claimed in claim 1, wherein the robotic camera (102) is mounted on an opposite shelf or is wall-mounted.
6. The system as claimed in claim 1, wherein the robotic camera (102) adjusts at least one of pan, zoom and tilt configurations of the robotic camera (102) with respect to its movement to position itself at the second coordinate of the retail environment (106) for capturing the second photo of the retail environment (106).
7. The system as claimed in claim 1, wherein the processing unit (104) further comprises a set of instructions to (i) determine a time at which the robotic camera (102) starts capturing photos of the retail environment (106), (ii) run the first machine learning model (308A) to determine the plurality of assets from the photos of the retail environment (106), (iii) change the robotic camera (102) parameters comprising an angle or a zoom for capturing the photos of the retail environment (106), (iv) determine a degree of the movement of the robotic camera (102), (v) determine a time at which the robotic camera (102) has to take a photo of the retail environment (106), or (vi) repeat the steps (a) to (f) until the entire retail environment (106) is fully captured by the robotic camera (102).
8. The system as claimed in claim 1, wherein the system further comprises an augmented reality device (208) that captures an augmented reality surface of the retail environment (106) which overlays on a captured area in a real-time video feed of the robotic camera (102).
9. The system as claimed in claim 8, wherein the augmented reality device (208) enables the robotic camera (102) to move around in the vertical and horizontal plane and changes one or more coordinates for automatically capturing a set of photos of the non-overlapping region of the retail environment (106).
10. The system as claimed in claim 1, wherein the robotic camera (102) comprises an in-built wireless transmitter and a receiver for sending the photos that are captured by the robotic camera (102) over the Internet.
11. The system as claimed in claim 1, wherein the first machine learning model (308A) is trained by providing one or more assets that are identified and their corresponding photos taken at a plurality of instances corresponding to one or more retail environments (106) as training data.
12. A method for automatically capturing and mapping a plurality of assets in a retail environment (106) using a robotic system, wherein the robotic system comprises a robotic camera (102), said method comprising:
(a) configuring a processing unit (104) of the robotic camera (102) with a starting coordinate of the retail environment (106) for capturing an image of the retail environment (106);
(b) enabling the robotic camera (102) to adjust its zoom parameter with respect to the starting coordinate and capture a first photo of the retail environment (106);
(c) executing a first machine learning model (308A) to identify a plurality of assets in the retail environment (106) by processing the first photo;
(d) executing a second machine learning model (310A) to compute a plurality of features for different parts of an underlying physical space of the retail environment (106) using the first photo and the identified plurality of assets;
(e) executing a third machine learning model (312A) to determine a movement of the robotic camera (102) using the identified plurality of assets, wherein the movement of the robotic camera (102) comprises a vertically upward movement, a vertically downward movement, a horizontal movement in a right direction or a horizontal movement in a left direction;
(f) automatically configuring a second coordinate of the retail environment (106) with respect to the movement of the robotic camera (102) and enabling the robotic camera (102) to move in a vertical and horizontal plane to position itself at the second coordinate of the retail environment (106); and
(g) automatically capturing a second photo of the underlying physical space or a non-overlapping region of the retail environment (106) from the second coordinate in order to capture the entire retail environment (106).
13. The method as claimed in claim 12, wherein the method comprises repeating the steps of (b) to (g) until the entire retail environment (106) is fully captured by the robotic camera (102).
14. The method as claimed in claim 12, wherein the method comprises adjusting, using the robotic camera (102), at least one of pan, zoom and tilt configurations of the robotic camera (102) with respect to its movement to position itself at the second coordinate of the retail environment (106) for capturing the second photo of the retail environment (106).

Description: BACKGROUND
Technical Field
[0001] The embodiments herein generally relate to automatically capturing an image of a plurality of assets of a retail environment in order to capture the entire retail environment using a robotic camera with respect to the movement of the robotic camera, and more particularly to a system and method for automatically capturing and mapping a plurality of assets in the retail environment via a robotic camera that automatically takes a set of photos of the non-overlapping regions as the robotic camera moves vertically and horizontally in order to cover the entire retail shelf.
Description of the Related Art
[0002] Shelf stocking is an essential component of retail stores. An item that is not on a shelf cannot be sold easily. A company may lose the sale, in addition to a customer, if the consumer tries a competing brand and prefers it, which leads to a loss of future sales from that consumer. The condition of an item missing from the shelf is called out-of-shelf. Typically, a retail store employee is assigned the task of inventorying products on the shelf and replenishing an out-of-shelf product, and this action is called "facing the shelves". Store managers and product manufacturers are also concerned with the appearance of merchandise/items on shelves. The condition of shelves and the products on them reflects the image of the retail store to the customer, and store managers and product manufacturers wish to maintain a high-quality image for customers.
[0003] Retail stores stack products in horizontally wide and vertically tall shelves. It is important for retail stores to spot empty shelf slots and for retail brands to spot out-of-stock SKUs (Stock Keeping Units) on a retail shelf. To accomplish this task, it is important to scan the entire retail shelf. The height and width of retail shelves, coupled with inadequate space between two retail shelf aisles (adequate space between two aisles would make it easy to take a single long shot covering the entire shelf), make it optically impossible to capture the entire shelf from a single vantage point. Consequently, if the retail shelf is to be captured with human assistance, the human needs to carefully take a set of vertically and horizontally aligned photos of the retail shelf by moving a camera device vertically and horizontally in a 2D/3D plane in front of the shelf. Keeping track of alignments and overlaps across consecutive photos is a cognitively cumbersome task and may result in mistakes, as a shelf area is either missed and not captured or captured several times.
[0004] Often, sales force workers, feet-on-street agents or retail store owners may carry low-end mobile/Android phones that cannot capture high-quality photos of retail shelves. Further, the size of those photos is often less than 1000 by 1000 pixels in width and height, or less than 1 MB. Moreover, due to the non-availability of high-speed networks and poor network conditions, even if a high-resolution photo is captured using a high-end mobile/Android phone, it may take more time (e.g. more than 1 second) to upload the photo, or sometimes the upload simply fails due to network timeouts. Hence, even if a high-quality photo is captured, due to the limitations on network speed and available network bandwidth, a low-resolution photo (e.g. a photo with dimensions of less than 1000 by 1000 pixels in width and height, or a photo size of less than 1 MB) needs to be uploaded in order to perform real-time processing of those photos.
[0005] Therefore, a system is desirable in which either a CPG (consumer packaged goods) merchandiser, a sales force auditor or the retail store owner herself takes pictures of retail shelves, so that the retail product-display environments captured by the photos are instantly available to a local point-of-sale terminal as well as a central product master database.
[0006] In addition to the need to capture the entire retail product-display environment, it is important to capture photos of the retail shelves and process those photos using an image processing technique for finding and recognizing products and product information at SKU level from low-quality photos of retail shelves. Existing product recognition systems use complex algorithms or general image classifier models to extract a text description or a label of the products within the photo. However, many product attributes and smaller product details may not be determined using the existing product recognition systems.
[0007] In a recent approach, some systems use a deep machine learning model for determining product information at SKU level from photos of retail stores. However, these systems may not extract product information efficiently from low-quality or low-resolution photos (e.g. with a size of less than 1 MB) taken in a retail environment.
[0008] Accordingly, there remains a need for a system and method for automatically capturing and mapping a plurality of assets in a retail environment in an effective manner.
SUMMARY
[0009] In view of the foregoing, an embodiment herein provides a system for automatically capturing and mapping a plurality of assets in a retail environment. The system includes a robotic camera for capturing an image of the retail environment. The robotic camera comprises a processing unit that is configured with a starting coordinate of the retail environment. The processing unit (a) enables the robotic camera to adjust its zoom parameter with respect to the starting coordinate and capture a first photo of the retail environment; (b) executes a first machine learning model to identify a plurality of assets in the retail environment by processing the first photo; (c) executes a second machine learning model to compute a plurality of features for different parts of an underlying physical space of the retail environment using the first photo and the identified plurality of assets; (d) executes a third machine learning model to determine a movement of the robotic camera using the identified plurality of assets, wherein the movement of the robotic camera comprises a vertically upward movement, a vertically downward movement, a horizontal movement in a right direction or a horizontal movement in a left direction; (e) automatically configures a second coordinate of the retail environment with respect to the movement of the robotic camera and enables the robotic camera to move in a vertical and horizontal plane to position itself at the second coordinate of the retail environment; and (f) automatically captures a second photo of the underlying physical space or a non-overlapping region of the retail environment from the second coordinate in order to capture the entire retail environment.
[0010] In some embodiments, the processing unit repeats the steps of (a) to (f) until the entire retail environment is fully captured by the robotic camera.
[0011] In some embodiments, the plurality of assets comprises at least one of a retail shelf, a refrigerator, a retail shelf boundary, a point of sale material, a retail shelf product, or a retail price tag.
[0012] In some embodiments, the processing unit assists a user to move the robotic camera in the vertical or horizontal plane. In some embodiments, the robotic camera is mounted on an opposite shelf or is wall-mounted.
[0013] In some embodiments, the robotic camera adjusts at least one of pan, zoom and tilt configurations of the robotic camera with respect to its movement to position itself at the second coordinate of the retail environment for capturing the second photo of the retail environment. In some embodiments, the processing unit further comprises a set of instructions to (i) determine a time at which the robotic camera starts capturing photos of the retail environment, (ii) run the first machine learning model to determine the plurality of assets from the photos of the retail environment, (iii) change the robotic camera parameters comprising an angle or a zoom for capturing the photos of the retail environment, (iv) determine a degree of the movement of the robotic camera, (v) determine a time at which the robotic camera has to take a photo of the retail environment, or (vi) repeat the steps (a) to (f) until the entire retail environment is fully captured by the robotic camera.
[0014] In some embodiments, the system further comprises an augmented reality device that captures an augmented reality surface of the retail environment which overlays on a captured area in a real-time video feed of the robotic camera. In some embodiments, the augmented reality device enables the robotic camera to move around in the vertical and horizontal plane and changes one or more coordinates for automatically capturing a set of photos of the non-overlapping region of the retail environment.
[0015] In some embodiments, the robotic camera comprises an in-built wireless transmitter and a receiver for sending the photos that are captured by the robotic camera over the Internet.
[0016] In some embodiments, the first machine learning model is trained by providing one or more assets that are identified and their corresponding photos, taken at a plurality of instances corresponding to one or more retail environments, as training data.
[0017] In another aspect, a method for automatically capturing and mapping a plurality of assets in a retail environment using a robotic system is provided. The robotic system comprises a robotic camera. The method comprises the steps of: (a) configuring a processing unit of the robotic camera with a starting coordinate of the retail environment for capturing an image of the retail environment; (b) enabling the robotic camera to adjust its zoom parameter with respect to the starting coordinate and capture a first photo of the retail environment; (c) executing a first machine learning model to identify a plurality of assets in the retail environment by processing the first photo; (d) executing a second machine learning model to compute a plurality of features for different parts of an underlying physical space of the retail environment using the first photo and the identified plurality of assets; (e) executing a third machine learning model to determine a movement of the robotic camera using the identified plurality of assets, wherein the movement of the robotic camera comprises a vertically upward movement, a vertically downward movement, a horizontal movement in a right direction or a horizontal movement in a left direction; (f) automatically configuring a second coordinate of the retail environment with respect to the movement of the robotic camera and enabling the robotic camera to move in a vertical and horizontal plane to position itself at the second coordinate of the retail environment; and (g) automatically capturing a second photo of the underlying physical space or a non-overlapping region of the retail environment from the second coordinate in order to capture the entire retail environment.
[0018] In some embodiments, the method comprises repeating the steps of (b) to (g) until the entire retail environment is fully captured by the robotic camera.
[0019] In some embodiments, the method comprises adjusting, using the robotic camera, at least one of pan, zoom and tilt configurations of the robotic camera with respect to its movement to position itself at the second coordinate of the retail environment for capturing the second photo of the retail environment.
[0020] In some embodiments, the plurality of assets comprises at least one of a retail shelf, a refrigerator, a retail shelf boundary, a point of sale material, a retail shelf product, or a retail price tag.
[0021] The system has the advantage of automatically capturing the entire retail shelf using a set of photos by moving the camera device in the vertical and horizontal plane. The system has the ability to control the movement and configuration of the robotic camera based on the retail assets detected from a captured photo. The system has the ability to automatically position and point the robotic camera at the appropriate part of a retail shelf or a retail environment, such that photos of the non-overlapping regions are captured to exhaustively cover the retail shelf or retail environment. The system further runs low-complexity and purpose-built artificial intelligence (AI) algorithms that accurately locate and classify boundaries of retail assets, such as a retail shelf or a set of retail products, and compute the next movement, pan, tilt, and zoom configuration of the robotic camera.
[0022] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
[0024] FIG. 1 illustrates a system view of a robotic system that automatically captures and maps a plurality of assets in a retail environment according to some embodiments herein;
[0025] FIG. 2 illustrates a robotic system of FIG. 1 comprising an augmented reality device according to some embodiments herein;
[0026] FIG. 3 is an exploded view of the robotic system of FIG. 1 according to some embodiments herein;
[0027] FIGS. 4A & 4B are flow diagrams that illustrate a method of automatically capturing and mapping a plurality of assets in a retail environment using a robotic system of FIG. 1 according to some embodiments herein;
[0028] FIG. 5 is a flow diagram that illustrates a method of using a plurality of machine learning models for automatically capturing and mapping a plurality of assets in a retail environment according to some embodiments herein; and
[0029] FIG. 6 is a schematic diagram of a computer architecture in accordance with the embodiments herein.
DETAILED DESCRIPTION OF THE DRAWINGS
[0030] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0031] Various embodiments disclosed herein provide a system and method for automatically capturing and mapping a plurality of assets in a retail environment using a robotic camera. Referring now to the drawings, and more particularly to FIGS. 1 through 6, where similar reference characters denote corresponding features consistently throughout the figures, preferred embodiments are shown.
[0032] FIG. 1 illustrates a system view of a robotic system 100 that automatically captures and maps a plurality of assets in a retail environment 106 according to some embodiments herein. The robotic system 100 comprises a robotic camera 102 for capturing an image of the retail environment 106. The robotic camera 102 comprises a processing unit 104 that is configured with a starting coordinate of the retail environment 106. The processing unit 104 enables the robotic camera 102 to adjust its zoom parameter with respect to the starting coordinate and capture a first photo of the retail environment 106. The processing unit 104 executes a first machine learning model to identify a plurality of assets in the retail environment 106 by processing the first photo. The processing unit 104 executes a second machine learning model to compute a plurality of features for different parts of an underlying physical space of the retail environment 106 using the first photo and the identified plurality of assets. The processing unit 104 executes a third machine learning model to determine a movement of the robotic camera 102 using the identified plurality of assets. The movement of the robotic camera 102 comprises a vertically upward movement, a vertically downward movement, a horizontal movement in a right direction or a horizontal movement in a left direction. The processing unit 104 automatically configures a second coordinate of the retail environment 106 with respect to the movement of the robotic camera 102 and enables the robotic camera 102 to move in a vertical and horizontal plane to position itself at the second coordinate of the retail environment 106. The robotic camera 102 adjusts at least one of pan, zoom and tilt configurations of the robotic camera 102 with respect to its movement to position itself at the second coordinate of the retail environment 106 for capturing a second photo of the retail environment 106. The processing unit 104 automatically captures the second photo of the underlying physical space or a non-overlapping region of the retail environment 106 from the second coordinate in order to capture the entire retail environment 106.
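For illustration, the capture-and-move cycle just described (steps (a) to (f)) can be condensed into a short control loop. The following is a minimal Python sketch only; the camera methods and the `detect_assets`, `compute_features` and `predict_movement` callables are hypothetical stand-ins for the pan/tilt/zoom hardware interface and the three machine learning models 308A, 310A and 312A, none of which are given a concrete API in this specification.

```python
# Minimal sketch of the capture loop of FIG. 1 (steps (a)-(f)).
# All interfaces here are hypothetical stand-ins; the specification
# does not define a concrete API for the camera or the three models.

def capture_retail_environment(camera, start_coordinate,
                               detect_assets, compute_features,
                               predict_movement):
    """Scan the retail environment photo by photo until covered."""
    camera.move_to(start_coordinate)        # configured starting coordinate
    camera.adjust_zoom(start_coordinate)    # (a) adjust zoom, then shoot
    photos = [camera.capture_photo()]

    while True:
        photo = photos[-1]
        assets = detect_assets(photo)                # (b) model 308A
        features = compute_features(photo, assets)   # (c) model 310A
        move = predict_movement(assets, features)    # (d) model 312A
        if move is None:                    # entire environment covered
            break
        # (e) derive the next (second) coordinate from the predicted
        # movement (up / down / left / right) and reposition the camera.
        next_coordinate = camera.coordinate_after(move)
        camera.move_to(next_coordinate)
        # (f) capture the non-overlapping region from the new coordinate.
        photos.append(camera.capture_photo())
    return photos
```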
[0033] In some embodiments, the processing unit 104 repeats the steps of (a) to (f) until the entire retail environment 106 is fully captured by the robotic camera 102. The processing unit 104 assists a user to move the robotic camera 102 in the vertical or horizontal plane. The robotic camera 102 is mounted on an opposite shelf or is wall-mounted.
[0034] In some embodiments, the processing unit 104 comprises a graphical processing unit (GPU). The processing unit 104 further comprises a set of instructions to (i) determine a time at which the robotic camera 102 starts capturing photos of the retail environment 106, (ii) run the first machine learning model to determine the plurality of assets from the photos of the retail environment 106, (iii) change the robotic camera 102 parameters comprising an angle or a zoom for capturing the photos of the retail environment 106, (iv) determine a degree of the movement of the robotic camera 102, (v) determine a time at which the robotic camera 102 has to take a photo of the retail environment 106, or (vi) repeat the steps (a) to (f) until the entire retail environment 106 is fully captured by the robotic camera 102.
[0035] In some embodiments, the robotic camera 102 comprises an in-built wireless transmitter and a receiver for sending the photos that are captured by the robotic camera 102 over the Internet. The machine learning models are trained by providing one or more assets that are identified and their corresponding photos, taken at a plurality of instances corresponding to one or more retail environments, as training data.
[0036] In some embodiments, the robotic camera 102 may be selected from a camera of a handheld device, a camera of a computing device, a camera of a smartphone, a camera of a virtual reality device or any kind of imaging device that has the processing power to run the machine learning models. In some embodiments, the robotic system 100 may be selected from a handheld device, a PDA (Personal Digital Assistant), a tablet, a computer, an electronic notebook or a smartphone.
[0037] In some embodiments, the image capturing and the mapping of the assets can be performed by the same device. In some embodiments, the robotic camera 102 communicates with the robotic system 100 via a cloud server to capture the entire retail environment 106 from the robotic system 100. In some embodiments, the steps performed by the processing unit 104 are partially performed in the cloud server.
[0038] The robotic system 100 includes a memory and a processor. The robotic camera 102 captures the images of the assets from the retail environment 106. The robotic system 100 receives the images of the assets captured by the robotic camera 102 and stores the captured images in the memory.
[0039] In some embodiments, the image of the retail environment 106 includes at least one image of an asset, a shelf brand display, a point of sale brand display, a digital advertisement display, or an image of at least one of a physical retail store environment, a digital retail store environment, a virtual reality store environment, a social media environment or a web page environment. In some embodiments, the robotic camera 102 captures a video of an asset, or a video or a three-dimensional model of at least one of a physical retail store environment, a digital retail store environment, a virtual reality store environment, a social media environment or a web page environment.
[0040] In some embodiments, the robotic system 100 determines the image of the retail environment 106 with a size of less than 1 MB, or dimensions of less than 1000 by 1000 pixels in width and height, to be a low-resolution image of the retail environment 106, and the image of the retail environment 106 with a size of greater than 3 MB, or dimensions of greater than 3000 by 3000 pixels in width and height, to be a high-resolution image of the retail environment 106. In some embodiments, the image of the retail environment 106 is an image of the asset.
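The resolution thresholds of paragraph [0040] amount to a simple classification rule. A minimal sketch follows; the function name and the handling of images that fall between the two thresholds (a case the specification does not name) are assumptions.

```python
def classify_resolution(width_px: int, height_px: int, size_bytes: int) -> str:
    """Classify an image by the thresholds stated in paragraph [0040].

    Low resolution:  size < 1 MB, or smaller than 1000 x 1000 pixels.
    High resolution: size > 3 MB, or larger than 3000 x 3000 pixels.
    """
    MB = 1024 * 1024
    if size_bytes < 1 * MB or (width_px < 1000 and height_px < 1000):
        return "low"
    if size_bytes > 3 * MB or (width_px > 3000 and height_px > 3000):
        return "high"
    return "intermediate"  # assumed label; not named in the specification
```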
[0041] In some embodiments, the video of an asset, or the video of at least one of the physical retail store environment, the digital retail store environment, the virtual reality store environment, the social media environment or the web page environment, is parsed to extract one or more images.
[0042] In some embodiments, the three-dimensional model of the retail environment 106 is converted into an image of the retail environment 106 when the three-dimensional model of the retail environment 106 is received from the digital retail store environment or the virtual reality store environment.
[0043] The robotic system 100 includes a system of deep machine learning models to capture a new and maximal non-captured area of the underlying physical space by parsing the low-resolution image of the retail environment 106 (size of less than 1 MB) associated with each of the one or more assets in the retail environment 106.
[0044] In some embodiments, the robotic system 100 includes a plurality of deep machine learning models comprising a first machine learning model, a second machine learning model and a third machine learning model to capture a new and maximal non-captured area of the underlying physical space from the low-resolution image of the retail environment 106 by parsing the image of the retail environment 106. The first machine learning model receives the low-resolution image of the retail environment 106 as input, and processes the low-resolution image to generate a super-resolution image of the retail environment 106 and to identify a plurality of assets in the retail environment 106. The second machine learning model receives the super-resolution image and the low-resolution image of the retail environment 106 from the first machine learning model and computes a plurality of features for different parts of an underlying physical space of the retail environment 106 using the super-resolution image and the low-resolution image.
[0045] The third machine learning model receives the identified plurality of assets of the retail environment 106 captured at the first machine learning model and the computed plurality of features for different parts of an underlying physical space of the retail environment 106 as input, and determines a movement of the robotic camera 102 using the identified plurality of assets from the first machine learning model and the computed plurality of features. The movement of the robotic camera 102 comprises a vertically upward movement, a vertically downward movement, a horizontal movement in a right direction or a horizontal movement in a left direction.
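The data flow of paragraphs [0044] and [0045] is a three-stage pipeline: the first model super-resolves and detects, the second computes features, the third picks a movement. A minimal Python sketch under that reading follows; all type and callable names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Movement(Enum):
    UP = "up"
    DOWN = "down"
    LEFT = "left"
    RIGHT = "right"

@dataclass
class PipelineOutput:
    assets: list          # bounding boxes and labels from the first model
    features: object      # per-region features from the second model
    movement: Movement    # next camera movement from the third model

def run_pipeline(low_res_photo, model_1, model_2, model_3):
    """One pass through the three-model pipeline of paragraphs [0044]-[0045]."""
    # First model (308A): super-resolve the low-resolution photo and
    # identify the retail assets that appear in it.
    super_res_photo, assets = model_1(low_res_photo)
    # Second model (310A): compute features for different parts of the
    # underlying physical space from both image versions.
    features = model_2(super_res_photo, low_res_photo)
    # Third model (312A): decide the next movement (up/down/left/right)
    # from the detected assets and the computed features.
    movement = model_3(assets, features)
    return PipelineOutput(assets, features, movement)
```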
[0046] In some embodiments, a deep machine learning model is a machine learning technique that is designed to recognize and interpret data through machine perception and labeling and by clustering raw data. The deep machine learning models are trained by providing one or more assets that are identified and their corresponding photos, taken at a plurality of instances corresponding to one or more retail environments, as training data. The machine learning model is trained to perform the task with the processor.
[0047] In some embodiments, the first machine learning model identifies retail assets such as different types of retail shelves placed in an aisle (e.g. a retail shelf that may have multiple bays, each bay having multiple rows where products are stocked), a special-purpose retail shelf (e.g. a refrigerator, a cooler, or a standalone brand-specific shelf), and/or different retail point-of-sale materials (e.g. end-caps, counter-tops, hot-spot-windows, hangers, signages, shop-boards).
[0048] In some embodiments, the first machine learning model is trained to generate a bounding box around the different types of assets that appear in a first photo of a video frame. The first machine learning model runs on the robotic camera 102 online to parse the first photo and detect the plurality of assets in the first photo. The detection involves generating a bounding box on the plurality of assets and labeling the bounding box. Before deploying and running the first machine learning model online on the robotic camera 102, the first machine learning model is trained on a cloud by providing a labeled dataset as training data. The labeled dataset consists of photos as well as labels for each photo, where the labels for a photo comprise bounding boxes along with labels for all the different types of assets that appear in the photo.
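The labeled dataset described in paragraph [0048] pairs each photo with bounding boxes and asset labels. A minimal sketch of one such training record follows; the field names, label strings and file name are illustrative assumptions, not taken from the specification.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    # Pixel coordinates of one labeled retail asset in the photo.
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    label: str  # e.g. "retail_shelf", "refrigerator", "retail_price_tag"

@dataclass
class LabeledPhoto:
    """One record of the labeled dataset of paragraph [0048]:
    a shelf photo plus a box for every asset that appears in it."""
    photo_path: str
    boxes: list

# Hypothetical example record.
example = LabeledPhoto(
    photo_path="shelf_0001.jpg",
    boxes=[
        BoundingBox(10, 20, 400, 600, "retail_shelf"),
        BoundingBox(50, 80, 120, 160, "retail_price_tag"),
    ],
)
```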
[0049] In some embodiments, the second machine learning model identifies different features of the first photo, such as colours, shapes and patterns that appear in the first photo, as well as different units or products (e.g., retail products kept on a retail shelf) that appear within a particular retail asset that is part of the first photo. For example, the second machine learning model divides the first photo into an n * m grid (e.g., if a photo is 800 pixels in height and 600 pixels in width, the second machine learning model divides the photo into grid elements of size 80 x 60, yielding a grid of 100 elements) and computes a fine-level computer representation, called an embedding (i.e., each grid element is represented by an n-bit/float-valued vector, and this vector is called an embedding), for each of the 100 grid elements. In some embodiments, the fine-level computer representation is a fine-level n-bit, low-level n-bit or abstract-level n-bit computer representation. The second machine learning model computes the computer representation for each grid element to track the portions covered and the portions not covered in the first photo, and thus the portions overlapping between the first photo and a new photo as the robotic camera 102 captures new photos. In some embodiments, as the robotic camera 102 moves, it continuously captures and evaluates a new photo frame from the video stream. The second machine learning model then performs an evaluation to compute the overlap with respect to the first photo. The overlap computation is performed based on the number of grid elements that appear in both the first photo and the new photo frame. When the minimum overlap threshold, which can be configured in the robotic system 100 (e.g. 10% overlap), is reached, the robotic camera 102 decides to capture the new photo.
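The grid-and-embedding bookkeeping of paragraph [0049] can be sketched as follows. This is a minimal illustration only: `embed_fn` stands in for the second machine learning model 310A, and matching grid elements by cosine similarity between embeddings is one plausible reading of the overlap computation, which the specification does not spell out.

```python
import numpy as np

def to_grid_embeddings(photo, embed_fn, cell_h=80, cell_w=60):
    """Split a photo (H x W x C array) into fixed-size grid elements and
    embed each one. An 800 x 600 photo with 80 x 60 cells yields a
    10 x 10 = 100-element grid, as in the example of paragraph [0049]."""
    h, w = photo.shape[:2]
    return {
        (r, c): embed_fn(photo[r:r + cell_h, c:c + cell_w])
        for r in range(0, h, cell_h)
        for c in range(0, w, cell_w)
    }

def overlap_fraction(embeddings_a, embeddings_b, threshold=0.9):
    """Fraction of grid elements of photo A that reappear in photo B,
    judged by cosine similarity between grid-element embeddings
    (an assumed matching criterion)."""
    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    matched = sum(
        1 for ea in embeddings_a.values()
        if any(cosine(ea, eb) >= threshold for eb in embeddings_b.values())
    )
    return matched / len(embeddings_a)

# The camera decides to capture a new photo once the overlap with the
# previous photo drops to the configured minimum, e.g. 10%:
#   if overlap_fraction(prev_grid, current_grid) <= 0.10: capture()
```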
[0050] In some embodiments, the first position of the robotic camera 102 (i.e. where the first photo is to be taken in the retail environment 106) is fixed during the robotic camera 102 configuration. The first position is typically configured to capture a top-left part of the retail environment 106. The robotic camera 102 then moves either in the right direction or in the downward direction. The third machine learning model determines the robotic camera 102 movement as follows. For example, once the first photo is taken, the robotic camera 102 moves towards the right direction. Based on the features computed by the second machine learning model, using the third machine learning model, the robotic camera 102 predicts whether it has reached the end of a shelf, the end of an aisle or the end of the floor. If the robotic camera 102 reaches the end of the asset/shelf/aisle/floor, then the robotic camera 102 moves in the downward direction and repeats the process by moving in the left direction. Once the robotic camera 102 reaches the start of the shelf/aisle/floor, it again moves downwards and repeats the process by moving in the right direction. While moving to take the next photo, the robotic camera 102 stops when there is X% overlap between the first photo and the new photo with respect to all four sides (i.e. left/right/top/bottom), where X is configurable (e.g. 10% overlap between the new photo and the first photo). This movement of the robotic camera 102 is called pan movement. The movement of the robotic camera upwards and downwards is typically handled via tilt movement. After the pan and tilt movement, based on the configuration provided by a cloud, the robotic camera 102 may determine to employ zoom to capture the new photos in the desired quality and resolution.
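The scan order of paragraph [0050] (start at the top left, pan right to the end of the shelf, tilt down a row, pan back left, and so on, shooting whenever overlap with the previous photo falls to the configured X%) is a boustrophedon sweep. A minimal sketch follows, assuming hypothetical camera methods and overlap/end-of-shelf predicates derived from the second and third models.

```python
def scan_shelf(camera, overlap_with_last_photo, at_end_of_shelf,
               at_bottom_of_shelf, min_overlap=0.10):
    """Boustrophedon (serpentine) sweep sketched from paragraph [0050]:
    pan right to the end of the shelf/aisle, tilt down one row, pan
    back left, and repeat; shoot whenever overlap with the previous
    photo falls to the configured minimum (e.g. 10%)."""
    direction = "right"                 # first photo is at the top left
    photos = [camera.capture_photo()]
    while not at_bottom_of_shelf():
        if at_end_of_shelf(direction):
            camera.tilt_down_one_row()  # vertical step via tilt movement
            direction = "left" if direction == "right" else "right"
        else:
            camera.pan(direction)       # horizontal step via pan movement
        if overlap_with_last_photo(photos[-1]) <= min_overlap:
            camera.apply_zoom_config()  # zoom per cloud-provided config
            photos.append(camera.capture_photo())
    return photos
```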
[0051] In some embodiments, instead of beginning from the top-left position, if the robotic camera 102 starts from any other position, the robotic camera 102 first moves towards the left end, then towards the right end, then moves upwards and repeats the process of scanning, and then moves downwards and repeats the process of scanning.
[0052] In some embodiments, the first machine learning model is trained with one or more low-resolution images of assets taken in the retail environment 106 as input and the corresponding ground-truth high-resolution images of assets as output. The second machine learning model is trained with one or more low-resolution and high-resolution images of assets as input and the computed plurality of features for different parts of an underlying physical space of the retail environment 106 as output. The third machine learning model is trained with the identified plurality of assets of the retail environment 106 captured at the first machine learning model and the computed plurality of features for different parts of an underlying physical space of the retail environment 106 as input, and a movement of the robotic camera 102 using the identified plurality of assets from the first machine learning model and the computed plurality of features as output.
[0053] FIG. 2 illustrates the robotic system 100 of FIG. 1 comprising an augmented reality device 208 according to some embodiments herein. The robotic system 100 further comprises an augmented reality device 208 that captures an augmented reality surface of the retail environment 106 which overlays on a captured area in a real-time video feed of the robotic camera 102. The augmented reality device 208 enables the robotic camera 102 to move around in a vertical and horizontal plane and changes one or more coordinates for automatically capturing a set of photos of the non-overlapping region of the retail environment 106.
[0054] FIG. 3 is an exploded view of the robotic system 100 of FIG. 1 according to some embodiments herein. The robotic system 100 includes a memory 300 that stores a database 304 and a processor 302 that includes a set of modules and executes a set of instructions. The processor 302 includes a first image capturing module 306, an image identification module 308, a feature computing module 310, a camera movement determination module 312, a camera movement control module 314 and a second image capturing module 316.
[0055] The first image capturing module 306 enables the robotic camera 102 to adjust its zoom parameter with respect to the starting coordinate and capture a first photo of the retail environment 106. The image identification module 308 identifies a plurality of assets in the retail environment 106 by executing a first machine learning model 308A and processing the first photo captured by the first image capturing module 306. The feature computing module 310 executes a second machine learning model 310A and computes a plurality of features for different parts of an underlying physical space of the retail environment 106 using the first photo and the identified plurality of assets. The camera movement determination module 312 executes a third machine learning model 312A and determines a movement of the robotic camera 102 using the identified plurality of assets. In some embodiments, the movement of the robotic camera 102 comprises a vertically upward movement, a vertically downward movement, a horizontal movement in a right direction or a horizontal movement in a left direction.
[0056] The camera movement control module 314 automatically configures a second coordinate of the retail environment 106 with respect to the movement of the robotic camera 102 and enables the robotic camera 102 to move in a vertical and a horizontal plane to position itself at the second coordinate of the retail environment 106. In some embodiments, the robotic camera 102 adjusts at least one of pan, zoom and tilt configurations of the robotic camera 102 with respect to its movement to position itself at the second coordinate of the retail environment 106 for capturing a second photo of the retail environment 106. The second image capturing module 316 automatically captures the second photo of the underlying physical space or a non-overlapping region of the retail environment 106 from the second coordinate in order to capture the entire retail environment 106.
[0057] In some embodiments, the plurality of assets comprises at least one of a retail shelf, a refrigerator, a retail shelf boundary, a point of sale material, a retail shelf product, or a retail price tag.
[0058] In some embodiments, the first machine learning model 308A includes convolution-based encoder and decoder models that are trained to minimize the mean square error loss, for each pixel, between the output images of assets produced by the first machine learning model 308A during training and the ground-truth high-resolution images of assets provided as the true output images of assets.
[0059] In some embodiments, the second machine learning model 310A includes a convolution-based encoder and decoder that is trained to minimize a regression loss to compute features for different parts of an underlying physical space of the retail environment 106.
[0060] In some embodiments, the third machine learning model 312A comprises a convolution-based encoder and decoder that is trained to minimize a mean square error loss to determine a movement of the robotic camera 102 based on the identified plurality of assets.
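Paragraphs [0058] to [0060] describe each model as a convolution-based encoder-decoder trained to minimize a mean-square-error or regression loss. The following PyTorch-style sketch shows one such training step for the first model 308A (low-resolution photo in, ground-truth high-resolution photo as the regression target). The tiny architecture and hyperparameters are placeholders, not the models disclosed here, and the low-resolution input is assumed to be pre-upsampled to the target size so the per-pixel loss is well defined.

```python
import torch
import torch.nn as nn

class ConvEncoderDecoder(nn.Module):
    """Tiny convolution-based encoder-decoder, a stand-in for the
    first machine learning model 308A of paragraph [0058]."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1),
        )

    def forward(self, low_res):
        # Assumes low_res has been bicubically upsampled to the target
        # size, so input and output share the same spatial dimensions.
        return self.decoder(self.encoder(low_res))

model = ConvEncoderDecoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
mse = nn.MSELoss()  # per-pixel mean square error, as in paragraph [0058]

def training_step(low_res_batch, high_res_batch):
    """One step: low-resolution shelf photos in, ground-truth
    high-resolution photos as the regression target."""
    optimizer.zero_grad()
    prediction = model(low_res_batch)
    loss = mse(prediction, high_res_batch)  # minimized during training
    loss.backward()
    optimizer.step()
    return loss.item()
```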
[0061] In some embodiments, the robotic system 100 comprises a communication module that communicates with a point-of-sale device or a cloud server via Bluetooth or Wi-Fi and transfers the images obtained from the deep machine learning model, over a secure peer-to-peer protocol, to the point-of-sale device, which further updates a local point-of-sale product master or the cloud server. Further, the cloud server may obtain the information about the products present on the shelves or assets of the retail environment 106.
[0062] In some embodiments, the robotic system 100 comprises a network module that establishes a connection between the robotic system 100 and a global product master/the cloud server to transmit the extracted product information on the shelves of the retail environment 106 to the cloud server, to update a global product-master database of the cloud server. In some embodiments, the network module uses an Internet connection (2G/3G/4G/5G) to transmit the product information to the global product master/cloud server. In some embodiments, the network module further transmits a store ID and timestamp data along with the product information to the global product master/cloud server.
[0063] FIGS. 4A & 4B are flow diagrams that illustrate a method of automatically capturing and mapping a plurality of assets in a retail environment 106 using the robotic system 100 of FIG. 1 according to some embodiments herein. At step 402, a processing unit 104 of the robotic camera 102 is configured with a starting coordinate of the retail environment 106 for capturing an image of the retail environment 106. At step 404, the robotic camera 102 is enabled to adjust its zoom parameter with respect to the starting coordinate and capture a first photo of the retail environment 106. At step 406, the processing unit 104 executes a first machine learning model 308A and identifies a plurality of assets in the retail environment 106 by processing the first photo. At step 408, the processing unit 104 executes a second machine learning model 310A and computes a plurality of features for different parts of an underlying physical space of the retail environment 106 using the first photo and the identified plurality of assets. At step 410, the processing unit 104 executes a third machine learning model 312A and determines a movement of the robotic camera 102 using the identified plurality of assets. The movement of the robotic camera 102 comprises a vertically upward movement, a vertically downward movement, a horizontal movement in a right direction or a horizontal movement in a left direction. At step 412, the processing unit 104 automatically configures a second coordinate of the retail environment 106 with respect to the movement of the robotic camera 102 and enables the robotic camera 102 to move in a vertical and horizontal plane. The robotic camera 102 adjusts at least one of pan, zoom and tilt configurations of the robotic camera 102 with respect to its movement to position itself at the second coordinate of the retail environment 106 for capturing a second photo of the retail environment 106. At step 414, the robotic camera 102 automatically captures the second photo of the underlying physical space or a non-overlapping region of the retail environment 106 from the second coordinate in order to capture the entire retail environment 106.
[0064] FIG. 5 is a flow diagram that illustrates a method of using a plurality of machine learning models for automatically capturing and mapping a plurality of assets in a retail environment 106 according to some embodiments herein. The plurality of machine learning models comprises a first machine learning model 308A, a second machine learning model 310A and a third machine learning model 312A. At step 502, the first machine learning model 308A receives the low-resolution image of the retail environment 106 as input and processes the low-resolution image to provide a super-resolution image of the retail environment 106 and to identify a plurality of assets in the retail environment 106. At step 504, the second machine learning model 310A receives the super-resolution image and the low-resolution image of the retail environment 106 from the first machine learning model and computes a plurality of features for different parts of an underlying physical space of the retail environment 106 using the super-resolution image and the low-resolution image. At step 506, the third machine learning model 312A receives the identified plurality of assets of the retail environment 106 captured at the first machine learning model and the computed plurality of features for different parts of an underlying physical space of the retail environment 106 as input, and determines a movement of the robotic camera 102 using the identified plurality of assets from the first machine learning model and the computed plurality of features. The movement of the robotic camera 102 comprises a vertically upward movement, a vertically downward movement, a horizontal movement in a right direction or a horizontal movement in a left direction.
[0065] A representative hardware environment for practicing the embodiments herein is depicted in FIG. 6, with reference to FIGS. 1 through 5. This schematic drawing illustrates a hardware configuration of a server/computer system/computing device in accordance with the embodiments herein. The system includes at least one processing device CPU 10 that may be interconnected via a system bus 14 to various devices such as a random-access memory (RAM) 12, a read-only memory (ROM) 16, and an input/output (I/O) adapter 18. The I/O adapter 18 can connect to peripheral devices, such as disk units 38 and program storage devices 40 that are readable by the system. The system can read the inventive instructions on the program storage devices 40 and follow these instructions to execute the methodology of the embodiments herein. The system further includes a user interface adapter 22 that connects a keyboard 28, mouse 30, speaker 32, microphone 34, and/or other user interface devices such as a touch screen device (not shown) to the bus 14 to gather user input. Additionally, a communication adapter 20 connects the bus 14 to a data processing network 42, and a display adapter 24 connects the bus 14 to a display device 26, which provides a graphical user interface (GUI) 36 of the output data in accordance with the embodiments herein, or which may be embodied as an output device such as a monitor, printer, or transmitter, for example.
[0066] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications without departing from the generic concept, and, therefore, such adaptations and modifications should be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the appended claims.

Documents

Orders

Section Controller Decision Date
Section 15, 2(1)(j), 3(k), 10(4) Vishal Shukla 2021-10-25
Section 77 Vishal Shukla 2022-10-21

Application Documents

# Name Date
1 201941043842-FORM 1 [29-10-2019(online)].pdf 2019-10-29
2 201941043842-COMPLETE SPECIFICATION [29-10-2019(online)].pdf 2019-10-29
3 201941043842-DRAWINGS [29-10-2019(online)].pdf 2019-10-29
4 201941043842-DECLARATION OF INVENTORSHIP (FORM 5) [29-10-2019(online)].pdf 2019-10-29
5 201941043842-STATEMENT OF UNDERTAKING (FORM 3) [29-10-2019(online)].pdf 2019-10-29
6 201941043842-FORM FOR STARTUP [29-10-2019(online)].pdf 2019-10-29
7 201941043842-FORM FOR SMALL ENTITY(FORM-28) [29-10-2019(online)].pdf 2019-10-29
8 201941043842-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [29-10-2019(online)].pdf 2019-10-29
9 201941043842-EVIDENCE FOR REGISTRATION UNDER SSI [29-10-2019(online)].pdf 2019-10-29
10 Abstract 201941043842.jpg 2019-10-31
11 201941043842-FORM-9 [18-03-2020(online)].pdf 2020-03-18
12 201941043842-FORM 18A [19-03-2020(online)].pdf 2020-03-19
13 201941043842-STARTUP [19-03-2020(online)].pdf 2020-03-19
14 201941043842-FORM28 [19-03-2020(online)].pdf 2020-03-19
15 201941043842-FORM-26 [11-06-2020(online)].pdf 2020-06-11
16 201941043842-FER.pdf 2020-06-16
17 201941043842-FER_SER_REPLY [16-12-2020(online)].pdf 2020-12-16
18 201941043842-COMPLETE SPECIFICATION [16-12-2020(online)].pdf 2020-12-16
19 201941043842-CLAIMS [16-12-2020(online)].pdf 2020-12-16
20 201941043842-ABSTRACT [16-12-2020(online)].pdf 2020-12-16
21 201941043842-CORRESPONDENCE [16-12-2020(online)].pdf 2020-12-16
22 201941043842-OTHERS [16-12-2020(online)].pdf 2020-12-16
23 201941043842-Correspondence to notify the Controller [22-06-2021(online)].pdf 2021-06-22
24 201941043842-FORM-26 [28-06-2021(online)].pdf 2021-06-28
25 201941043842-PETITION UNDER RULE 137 [13-07-2021(online)].pdf 2021-07-13
26 201941043842-PETITION UNDER RULE 137 [13-07-2021(online)]-1.pdf 2021-07-13
27 201941043842-RELEVANT DOCUMENTS [13-07-2021(online)].pdf 2021-07-13
28 201941043842-RELEVANT DOCUMENTS [13-07-2021(online)]-1.pdf 2021-07-13
29 201941043842-Written submissions and relevant documents [14-07-2021(online)].pdf 2021-07-14
30 201941043842-US(14)-HearingNotice-(HearingDate-28-06-2021).pdf 2021-10-17
31 201941043842-US(14)-ExtendedHearingNotice-(HearingDate-29-06-2021).pdf 2021-10-17
32 201941043842-FORM 4 [24-11-2021(online)].pdf 2021-11-24
33 201941043842-FORM-24 [24-12-2021(online)].pdf 2021-12-24
34 201941043842-RELEVANT DOCUMENTS [24-12-2021(online)].pdf 2021-12-24
35 201941043842-ReviewPetition-HearingNotice-(HearingDate-07-06-2022).pdf 2022-05-26
36 201941043842-Correspondence to notify the Controller [03-06-2022(online)].pdf 2022-06-03
37 201941043842-Correspondence to notify the Controller [07-06-2022(online)].pdf 2022-06-07
38 201941043842-Written submissions and relevant documents [22-06-2022(online)].pdf 2022-06-22

Search Strategy

1 Searchstrategy201941043842E_09-06-2020.pdf