Action Detection During Image Tracking


Abstract: A system includes a sensor, a weight sensor, and a tracking subsystem. The tracking subsystem receives an image feed of top-view images generated by the sensor and weight measurements from the weight sensor. The tracking subsystem detects an event associated with an item being removed from a rack in which the weight sensor is installed. The tracking subsystem determines that a first person and a second person may be associated with the event. The tracking subsystem then determines, using a first approach, whether an action associated with the event was performed by the first person or the second person. If results of the first approach do not satisfy criteria, a second approach is used to assign the action to the first or second person.


Patent Information

Application #

Filing Date

13 May 2022

Publication Number

34/2022

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status


Parent Application

Applicants

7-ELEVEN, INC.

3200 Hackberry Road Irving, TX 75063

Inventors

1. KRISHNAMURTHY, Sailesh Bharathwaaj

5337 N. MacArthur Blvd., Apt. 2119 Irving, TX 75038

2. MIRZA, Shahmeer Ali

4157 Maclin Drive Celina, TX 75009

3. VAKACHARLA, Sarath

7954 N. Glen Dr., Apt. 2114 Irving, TX 75063

4. NGUYEN, Trong Nghia

3618 Villaverde Ave., Apt. 225 Dallas, TX 75234

5. MAUNG, Crystal

5814 Edinburgh St. Dallas, TX 75252

6. PAUL, Deepanjan

8100 Memorial Lane, Apt. #6109 Plano, TX 75024

7. CHINNAM, Madan Mohan

543 Northwest Hwy., Apt. 3306 Irving, TX 75039

Specification

ACTION DETECTION DURING IMAGE TRACKING

TECHNICAL FIELD

The present disclosure relates generally to object detection and tracking, and more specifically to action detection during image tracking.

BACKGROUND

Identifying and tracking objects within a space poses several technical challenges. Existing systems use various image processing techniques to identify objects (e.g. people). For example, these systems may identify different features of a person that can be used to later identify the person in an image. This process is computationally intensive when the image includes several people. For example, identifying a person in an image of a busy environment, such as a store, would involve identifying everyone in the image and then comparing the features for a person against every person in the image. In addition to being computationally intensive, this process requires a significant amount of time, which means that it is not compatible with real-time applications such as video streams. This problem becomes intractable when trying to simultaneously identify and track multiple objects. In addition, existing systems lack the ability to determine a physical location for an object that is located within an image.

SUMMARY

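Position tracking systems are used to track the physical positions of people and/or objects in a physical space (e.g., a store). These systems typically use a sensor (e.g., a camera) to detect the presence of a person and/or object and a computer to determine the physical position of the person and/or object based on signals from the sensor. In a store setting, other types of sensors can be installed to track the movement of inventory within the store. For example, weight sensors can be installed on racks and shelves to determine when items have been removed from those racks and shelves. By tracking both the positions of persons in a store and when items have been removed from shelves, it is possible for the computer to determine which person in the store removed the item and to charge that person for the item without needing to ring up the item at a register. In other words, the person can walk into the store, take items, and leave the store without stopping for the conventional checkout process.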

For larger physical spaces (e.g., convenience stores and grocery stores), additional sensors can be installed throughout the space to track the position of people and/or objects as they move about the space. For example, additional cameras can be added to track positions in the larger space and additional weight sensors can be added to track additional items and shelves. Increasing the number of cameras poses a technical challenge because each camera only provides a field of view for a portion of the physical space. This means that information from each camera needs to be processed independently to identify and track people and objects within the field of view of a particular camera. The information from each camera then needs to be combined and processed as a collective in order to track people and objects within the physical space.

The system disclosed in the present application provides a technical solution to the technical problems discussed above by generating a relationship between the pixels of a camera and physical locations within a space. The disclosed system provides several practical applications and technical advantages which include 1) a process for generating a homography that maps pixels of a sensor (e.g. a camera) to physical locations in a global plane for a space (e.g. a room); 2) a process for determining a physical location for an object within a space using a sensor and a homography that is associated with the sensor; 3) a process for handing off tracking information for an object as the object moves from the field of view of one sensor to the field of view of another sensor; 4) a process for detecting when a sensor or a rack has moved within a space using markers; 5) a process for detecting where a person is interacting with a rack using a virtual curtain; 6) a process for associating an item with a person using a predefined zone that is associated with a rack; 7) a process for identifying and associating items with a non-uniform weight to a person; and 8) a process for identifying an item that has been misplaced on a rack based on its weight.

In one embodiment, the tracking system may be configured to generate homographies for sensors. A homography is configured to translate between pixel locations in an image from a sensor (e.g. a camera) and physical locations in a physical space. In this configuration, the tracking system determines coefficients for a homography based on the physical location of markers in a global plane for the space and the pixel locations of the markers in an image from a sensor. This configuration will be described in more detail using FIGS. 2-7.
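The translation performed by a homography can be illustrated with a minimal sketch. It assumes the standard 3x3 planar form of a homography; the coefficient values, the name `pixel_to_global`, and the 100-pixels-per-meter scale are hypothetical and are not taken from this disclosure.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]


def pixel_to_global(h: Matrix3, pixel_col: float, pixel_row: float) -> Tuple[float, float]:
    """Map a pixel location to (x, y) in the global plane with a 3x3 homography."""
    x_h = h[0][0] * pixel_col + h[0][1] * pixel_row + h[0][2]
    y_h = h[1][0] * pixel_col + h[1][1] * pixel_row + h[1][2]
    w = h[2][0] * pixel_col + h[2][1] * pixel_row + h[2][2]
    if w == 0:
        raise ValueError("pixel maps to a point at infinity")
    # Divide by the homogeneous coordinate to obtain plane coordinates.
    return x_h / w, y_h / w


# Hypothetical coefficients (an assumption for illustration): 100 pixels per
# meter, with the image origin located at (2 m, 3 m) in the global plane.
EXAMPLE_H: Matrix3 = [
    [0.01, 0.0, 2.0],
    [0.0, 0.01, 3.0],
    [0.0, 0.0, 1.0],
]

x, y = pixel_to_global(EXAMPLE_H, pixel_col=100, pixel_row=200)
```

In practice the coefficients would be determined from the marker correspondences described above rather than written by hand.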

In one embodiment, the tracking system is configured to calibrate a shelf position within the global plane using sensors. In this configuration, the tracking system periodically compares the current shelf location of a rack to an expected shelf location for the rack using a sensor. In the event that the current shelf location does not match the expected shelf location, then the tracking system uses one or more other sensors to determine whether the rack has moved or whether the first sensor has moved. This configuration will be described in more detail using FIGS. 8 and 9.
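One possible reading of this comparison is sketched below. The decision rule, the tolerance value, and the name `diagnose_shelf_shift` are assumptions made for illustration; the sketch simply treats agreement from a second sensor as evidence that the first sensor, rather than the rack, has moved.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in the global plane


def diagnose_shelf_shift(
    expected: Point,
    seen_by_first: Point,
    seen_by_second: Point,
    tolerance: float = 0.05,
) -> str:
    """Classify a mismatch between the expected and observed shelf locations."""
    if math.dist(expected, seen_by_first) <= tolerance:
        return "ok"  # the first sensor agrees with the expected location
    if math.dist(expected, seen_by_second) <= tolerance:
        # Only the first sensor disagrees, so the first sensor likely moved.
        return "first_sensor_moved"
    # Both sensors disagree with the expected location, so the rack likely moved.
    return "rack_moved"
```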

In one embodiment, the tracking system is configured to hand off tracking information for an object (e.g. a person) as it moves between the fields of view of adjacent sensors. In this configuration, the tracking system tracks an object’s movement within the field of view of a first sensor and then hands off tracking information (e.g. an object identifier) for the object as it enters the field of view of a second adjacent sensor. This configuration will be described in more detail using FIGS. 10 and 11.

In one embodiment, the tracking system is configured to detect shelf interactions using a virtual curtain. In this configuration, the tracking system is configured to process an image captured by a sensor to determine where a person is interacting with a shelf of a rack. The tracking system uses a predetermined zone within the image as a virtual curtain that is used to determine which region and which shelf of a rack a person is interacting with. This configuration will be described in more detail using FIGS. 12-14.
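The region lookup can be sketched as follows, under stated assumptions: the virtual curtain is modeled as a rectangle of pixels divided into equal-width regions, and the hand's pixel location is already known. The class `CurtainZone` and function `region_of_interaction` are hypothetical names, and determining which shelf is involved is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CurtainZone:
    """Rectangular pixel zone in front of a rack (inclusive bounds)."""
    row_min: int
    row_max: int
    col_min: int
    col_max: int
    num_regions: int  # equal-width regions along the pixel columns


def region_of_interaction(zone: CurtainZone, hand_row: int, hand_col: int) -> Optional[int]:
    """Return the index of the region the hand is in, or None if outside the zone."""
    inside_rows = zone.row_min <= hand_row <= zone.row_max
    inside_cols = zone.col_min <= hand_col <= zone.col_max
    if not (inside_rows and inside_cols):
        return None
    width = zone.col_max - zone.col_min + 1
    return (hand_col - zone.col_min) * zone.num_regions // width


zone = CurtainZone(row_min=50, row_max=80, col_min=100, col_max=199, num_regions=2)
```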

In one embodiment, the tracking system is configured to detect when an item has been picked up from a rack and to determine which person to assign the item to using a predefined zone that is associated with the rack. In this configuration, the tracking system detects that an item has been picked up using a weight sensor. The tracking system then uses a sensor to identify a person within a predefined zone that is associated with the rack. Once the item and the person have been identified, the tracking system will add the item to a digital cart that is associated with the identified person. This configuration will be described in more detail using FIGS. 15 and 18.
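The zone-based assignment can be sketched as below. The sketch assumes the zone is an axis-aligned rectangle in global-plane coordinates and that the item is assigned only when exactly one person is inside the zone; both choices, and the names `people_in_zone` and `assign_item`, are illustrative assumptions rather than the disclosed method.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]               # (x, y) in the global plane
Zone = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def people_in_zone(positions: Dict[str, Point], zone: Zone) -> List[str]:
    """Return identifiers of all people standing inside the predefined zone."""
    x_min, y_min, x_max, y_max = zone
    return [
        pid for pid, (x, y) in positions.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


def assign_item(
    item: str,
    positions: Dict[str, Point],
    zone: Zone,
    carts: Dict[str, List[str]],
) -> Optional[str]:
    """Add the item to the digital cart of the only person in the zone, if any."""
    candidates = people_in_zone(positions, zone)
    if len(candidates) != 1:
        return None  # ambiguous or empty; a further approach would be needed
    person = candidates[0]
    carts.setdefault(person, []).append(item)
    return person
```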

In one embodiment, the tracking system is configured to identify an object that has a non-uniform weight and to assign the item to a person’s digital cart. In this configuration, the tracking system uses a sensor to identify markers (e.g. text or symbols) on an item that has been picked up. The tracking system uses the identified markers to then identify which item was picked up. The tracking system then uses the sensor to identify a person within a predefined zone that is associated with the rack. Once the item and the person have been identified, the tracking system will add the item to a digital cart that is associated with the identified person. This configuration will be described in more detail using FIGS. 16 and 18.

In one embodiment, the tracking system is configured to detect and identify items that have been misplaced on a rack. For example, a person may put back an item in the wrong location on the rack. In this configuration, the tracking system uses a weight sensor to detect that an item has been put back on a rack and to determine that the item is not in the correct location based on its weight. The tracking system then uses a sensor to identify the person that put the item on the rack and analyzes their digital cart to determine which item they put back based on the weights of the items in their digital cart. This configuration will be described in more detail using FIGS. 17 and 18.
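A minimal sketch of the weight comparison follows. The tolerance, the closest-weight rule, and the name `identify_misplaced_item` are assumptions for illustration; weights are in arbitrary but consistent units.

```python
from typing import Dict, Optional


def identify_misplaced_item(
    weight_increase: float,
    expected_item_weight: float,
    cart_weights: Dict[str, float],
    tolerance: float = 5.0,
) -> Optional[str]:
    """Return the cart item most likely put back in the wrong location.

    Returns None when the measured increase matches the item that belongs at
    this location, or when the person's digital cart is empty.
    """
    if abs(weight_increase - expected_item_weight) <= tolerance:
        return None  # the returned item belongs here, so nothing is misplaced
    if not cart_weights:
        return None
    # Pick the cart item whose known weight is closest to the measured increase.
    return min(cart_weights, key=lambda name: abs(cart_weights[name] - weight_increase))
```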

In one embodiment, the tracking system is configured to determine pixel regions from images generated by each sensor which should be excluded during object tracking. These pixel regions, or “auto-exclusion zones,” may be updated regularly (e.g., during times when there are no people moving through a space). The auto-exclusion zones may be used to generate a map of the physical portions of the space that are excluded during tracking. This configuration is described in more detail using FIGS. 19 through 21.

In one embodiment, the tracking system is configured to distinguish between closely spaced people in a space. For instance, when two people are standing, or otherwise located, near each other, it may be difficult or impossible for previous systems to distinguish between these people, particularly based on top-view images. In this embodiment, the system identifies contours at multiple depths in top-view depth images in order to individually detect closely spaced objects. This configuration is described in more detail using FIGS. 22 and 23.
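The idea of examining a top-view depth image at more than one depth can be sketched as below. The sketch swaps in a simple connected-region count for contour detection, and the depth values and the name `count_blobs` are hypothetical. Pixel values are distances from the sensor, so smaller values are closer to the sensor.

```python
from typing import List


def count_blobs(depth_image: List[List[float]], max_depth: float) -> int:
    """Count 4-connected regions of pixels no farther from the sensor than max_depth."""
    rows, cols = len(depth_image), len(depth_image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if depth_image[r][c] > max_depth or seen[r][c]:
                continue
            blobs += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:  # flood fill the region containing (r, c)
                cr, cc = stack.pop()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and depth_image[nr][nc] <= max_depth):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
    return blobs


F, S, H = 3.0, 1.5, 1.0  # hypothetical distances to floor, shoulders, heads
image = [
    [F, F, F, F, F, F, F],
    [F, S, S, S, S, S, F],
    [F, S, H, S, H, S, F],
    [F, S, S, S, S, S, F],
    [F, F, F, F, F, F, F],
]
merged = count_blobs(image, max_depth=1.6)    # shoulder depth: one merged region
separate = count_blobs(image, max_depth=1.1)  # head depth: two separate regions
```

At the shoulder depth the two people appear as one region, while at the head depth they separate, which is the effect the multi-depth approach relies on.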

In one embodiment, the tracking system is configured to track people both locally (e.g., by tracking pixel positions in images received from each sensor) and globally (e.g., by tracking physical positions on a global plane corresponding to the physical coordinates in the space). Person tracking may be more reliable when performed both locally and globally. For example, if a person is “lost” locally (e.g., if a sensor fails to capture a frame and a person is not detected by the sensor), the person may still be tracked globally based on an image from a nearby sensor, an estimated local position of the person determined using a local tracking algorithm, and/or an estimated global position determined using a global tracking algorithm. This configuration is described in more detail using FIGS. 24A-C through 26.

In one embodiment, the tracking system is configured to maintain a record, which is referred to in this disclosure as a “candidate list,” of possible person identities, or identifiers (i.e., the usernames, account numbers, etc. of the people being tracked), during tracking. A candidate list is generated and updated during tracking to establish the possible identities of each tracked person. Generally, for each possible identity or identifier of a tracked person, the candidate list also includes a probability that the identity, or identifier, is believed to be correct. The candidate list is updated following interactions (e.g., collisions) between people and in response to other uncertainty events (e.g., a loss of sensor data, imaging errors, intentional trickery, etc.). This configuration is described in more detail using FIGS. 27 and 28.
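A candidate list can be represented as a mapping from identifier to probability. The sketch below assumes a simple update after a collision between two people, in which each list is mixed with the other according to an estimated probability that the identities were swapped. The mixing rule and the name `update_after_collision` are illustrative assumptions, not the disclosed update.

```python
from typing import Dict, Tuple

CandidateList = Dict[str, float]  # identifier -> probability it is correct


def update_after_collision(
    list_a: CandidateList,
    list_b: CandidateList,
    swap_probability: float,
) -> Tuple[CandidateList, CandidateList]:
    """Mix two candidate lists after two tracked people come close together."""
    if not 0.0 <= swap_probability <= 1.0:
        raise ValueError("swap_probability must be between 0 and 1")
    identifiers = sorted(set(list_a) | set(list_b))
    keep = 1.0 - swap_probability
    new_a = {
        i: keep * list_a.get(i, 0.0) + swap_probability * list_b.get(i, 0.0)
        for i in identifiers
    }
    new_b = {
        i: keep * list_b.get(i, 0.0) + swap_probability * list_a.get(i, 0.0)
        for i in identifiers
    }
    return new_a, new_b


first, second = update_after_collision({"alice": 1.0}, {"bob": 1.0}, swap_probability=0.2)
```

With these inputs, the first tracked person is still most likely "alice", but "bob" now appears in the list with a smaller probability, and each list still sums to one.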

In one embodiment, the tracking system is configured to employ a specially structured approach for object re-identification when the identity of a tracked person becomes uncertain or unknown (e.g., based on the candidate lists described above). For example, rather than relying heavily on resource-expensive machine learning-based approaches to re-identify people, “lower-cost” descriptors related to observable characteristics (e.g., height, color, width, volume, etc.) of people are used first for person re-identification. “Higher-cost” descriptors (e.g., determined using artificial neural network models) are used when the lower-cost descriptors cannot provide reliable results. For instance, in some cases, a person may first be re-identified based on his/her height, hair color, and/or shoe color. However, if these descriptors are not sufficient for reliably re-identifying the person (e.g., because other people being tracked have similar characteristics), progressively higher-level approaches may be used (e.g., involving artificial neural networks that are trained to recognize people) which may be more effective at person identification but which generally involve the use of more processing resources. These configurations are described in more detail using FIGS. 29 through 32.
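The ordering of lower-cost and higher-cost descriptors can be sketched as a two-stage lookup. The descriptor layout (height and width in meters), the margin test, and the names `reidentify` and `high_cost_fn` are assumptions for illustration; the higher-cost step is a caller-supplied stand-in for a neural-network-based approach.

```python
import math
from typing import Callable, Dict, List, Tuple

Descriptor = Tuple[float, ...]  # e.g. (height, width); layout is hypothetical


def reidentify(
    observed: Descriptor,
    known: Dict[str, Descriptor],
    high_cost_fn: Callable[[Descriptor, List[str]], str],
    margin: float = 0.1,
) -> Tuple[str, str]:
    """Return (identifier, method), trying low-cost descriptors first."""
    if not known:
        raise ValueError("no known people to compare against")
    ranked = sorted(known, key=lambda pid: math.dist(observed, known[pid]))
    if len(ranked) == 1:
        return ranked[0], "low-cost"
    best = math.dist(observed, known[ranked[0]])
    runner_up = math.dist(observed, known[ranked[1]])
    if runner_up - best >= margin:
        return ranked[0], "low-cost"  # the best match is clearly separated
    # Low-cost descriptors are ambiguous; fall back to the expensive approach.
    return high_cost_fn(observed, ranked), "high-cost"
```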

In one embodiment, the tracking system is configured to employ a cascade of algorithms (e.g., from more simple approaches based on relatively straightforwardly determined image features to more complex strategies involving artificial neural networks) to assign an item picked up from a rack to the correct person. The cascade may be triggered, for example, by (i) the proximity of two or more people to the rack, (ii) a hand crossing into the zone (or a “virtual curtain”) adjacent to the rack, and/or (iii) a weight signal indicating an item was removed from the rack. In yet another embodiment, the tracking system is configured to employ a unique contour-based approach to assign an item to the correct person. For instance, if two people may be reaching into a rack to pick up an item, a contour may be “dilated” from a head height to a lower height in order to determine which person’s arm reached into the rack to pick up the item. If the results of this computationally efficient contour-based approach do not satisfy certain confidence criteria, a more computationally expensive approach may be used involving pose estimation. These configurations are described in more detail using FIGS. 33A-C through 35.

In one embodiment, the tracking system is configured to track an item after it exits a rack, identify a position at which the item stops moving, and determine which person is nearest to the stopped item. The nearest person is generally assigned the item. This configuration may be used, for instance, when an item cannot be assigned to the correct person even using an artificial neural network for pose estimation. This configuration is described in more detail using FIGS. 36A-B and 37.
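The two steps, finding where the item stops and selecting the nearest person, can be sketched as follows. The sketch assumes positions are (x, y) coordinates sampled at a fixed rate and treats the item as stopped once it moves less than a small threshold between samples; the threshold and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def stopped_position(track: List[Point], threshold: float = 0.01) -> Optional[Point]:
    """Return the first position where the item moved less than threshold."""
    for previous, current in zip(track, track[1:]):
        if math.dist(previous, current) < threshold:
            return current
    return None  # the item never stopped within the observed track


def nearest_person(position: Point, people: Dict[str, Point]) -> str:
    """Return the identifier of the person closest to the given position."""
    return min(people, key=lambda pid: math.dist(people[pid], position))


track = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 0.0)]
people = {"p1": (1.2, 0.0), "p2": (3.0, 3.0)}
stop = stopped_position(track)
owner = nearest_person(stop, people) if stop is not None else None
```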

Certain embodiments of the present disclosure may include some, all, or none of these advantages. These advantages and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.

FIG. 1 is a schematic diagram of an embodiment of a tracking system configured to track objects within a space;

FIG. 2 is a flowchart of an embodiment of a sensor mapping method for the tracking system;

FIG. 3 is an example of a sensor mapping process for the tracking system;

FIG. 4 is an example of a frame from a sensor in the tracking system;

FIG. 5A is an example of a sensor mapping for a sensor in the tracking system;

FIG. 5B is another example of a sensor mapping for a sensor in the tracking system;

FIG. 6 is a flowchart of an embodiment of a sensor mapping method for the tracking system using a marker grid;

FIG. 7 is an example of a sensor mapping process for the tracking system using a marker grid;

FIG. 8 is a flowchart of an embodiment of a shelf position calibration method for the tracking system;

FIG. 9 is an example of a shelf position calibration process for the tracking system;

FIG. 10 is a flowchart of an embodiment of a tracking hand off method for the tracking system;

FIG. 11 is an example of a tracking hand off process for the tracking system;

FIG. 12 is a flowchart of an embodiment of a shelf interaction detection method for the tracking system;

FIG. 13 is a front view of an example of a shelf interaction detection process for the tracking system;

FIG. 14 is an overhead view of an example of a shelf interaction detection process for the tracking system;

FIG. 15 is a flowchart of an embodiment of an item assigning method for the tracking system;

FIG. 16 is a flowchart of an embodiment of an item identification method for the tracking system;

FIG. 17 is a flowchart of an embodiment of a misplaced item identification method for the tracking system;

FIG. 18 is an example of an item identification process for the tracking system;

FIG. 19 is a diagram illustrating the determination and use of auto-exclusion zones by the tracking system;

FIG. 20 is an example auto-exclusion zone map generated by the tracking system;

FIG. 21 is a flowchart illustrating an example method of generating and using auto-exclusion zones for object tracking using the tracking system;

FIG. 22 is a diagram illustrating the detection of closely spaced objects using the tracking system;

FIG. 23 is a flowchart illustrating an example method of detecting closely spaced objects using the tracking system;

FIGS. 24A-C are diagrams illustrating the tracking of a person in local image frames and in the global plane of space 102 using the tracking system;

FIGS. 25A-B illustrate the implementation of a particle filter tracker by the tracking system;

FIG. 26 is a flow diagram illustrating an example method of local and global object tracking using the tracking system;

FIG. 27 is a diagram illustrating the use of candidate lists for object identification during object tracking by the tracking system;

FIG. 28 is a flowchart illustrating an example method of maintaining candidate lists during object tracking by the tracking system;

FIG. 29 is a diagram illustrating an example tracking subsystem for use in the tracking system;

FIG. 30 is a diagram illustrating the determination of descriptors based on object features using the tracking system;

FIGS. 31A-C are diagrams illustrating the use of descriptors for re-identification during object tracking by the tracking system;

FIG. 32 is a flowchart illustrating an example method of object re-identification during object tracking using the tracking system;

FIGS. 33A-C are diagrams illustrating the assignment of an item to a person using the tracking system;

FIG. 34 is a flowchart of an example method for assigning an item to a person using the tracking system;

FIG. 35 is a flowchart of an example method of contour dilation-based item assignment using the tracking system;

FIGS. 36A-B are diagrams illustrating item tracking-based item assignment using the tracking system;

FIG. 37 is a flowchart of an example method of item tracking-based item assignment using the tracking system; and

FIG. 38 is an embodiment of a device configured to track objects within a space.

DETAILED DESCRIPTION

Position tracking systems are used to track the physical positions of people and/or objects in a physical space (e.g., a store). These systems typically use a sensor (e.g., a camera) to detect the presence of a person and/or object and a computer to determine the physical position of the person and/or object based on signals from the sensor. In a store setting, other types of sensors can be installed to track the movement of inventory within the store. For example, weight sensors can be installed on racks and shelves to determine when items have been removed from those racks and shelves. By tracking both the positions of persons in a store and when items have been removed from shelves, it is possible for the computer to determine which person in the store removed the item and to charge that person for the item without needing to ring up the item at a register. In other words, the person can walk into the store, take items, and leave the store without stopping for the conventional checkout process.

Additional information is disclosed in U.S. Patent Application No. _ entitled, “Scalable Position Tracking System For Tracking Position In Large Spaces” (attorney docket no. 090278.0176) and U.S. Patent Application No. _ entitled, “Customer-Based Video Feed” (attorney docket no. 090278.0187), which are both hereby incorporated by reference herein as if reproduced in their entirety.

Tracking system overview

FIG. 1 is a schematic diagram of an embodiment of a tracking system 100 that is configured to track objects within a space 102. As discussed above, the tracking system 100 may be installed in a space 102 (e.g. a store) so that shoppers need not engage in the conventional checkout process. Although the example of a store is used in this disclosure, this disclosure contemplates that the tracking system 100 may be installed and used in any type of physical space (e.g. a room, an office, an outdoor stand, a mall, a supermarket, a convenience store, a pop-up store, a warehouse, a storage center, an amusement park, an airport, an office building, etc.). Generally, the tracking system 100 (or components thereof) is used to track the positions of people and/or objects within these spaces 102 for any suitable purpose. For example, at an airport, the tracking system 100 can track the positions of travelers and employees for security purposes. As another example, at an amusement park, the tracking system 100 can track the positions of park guests to gauge the popularity of attractions. As yet another example, at an office building, the tracking system 100 can track the positions of employees and staff to monitor their productivity levels.

In FIG. 1, the space 102 is a store that comprises a plurality of items that are available for purchase. The tracking system 100 may be installed in the store so that shoppers need not engage in the conventional checkout process to purchase items from the store. In this example, the store may be a convenience store or a grocery store. In other examples, the store may not be a physical building, but a physical space or environment where shoppers may shop. For example, the store may be a grab and go pantry at an airport, a kiosk in an office building, an outdoor market at a park, etc.

In FIG. 1, the space 102 comprises one or more racks 112. Each rack 112 comprises one or more shelves that are configured to hold and display items. In some embodiments, the space 102 may comprise refrigerators, coolers, freezers, or any other suitable type of furniture for holding or displaying items for purchase. The space 102 may be configured as shown or in any other suitable configuration.

In this example, the space 102 is a physical structure that includes an entry way through which shoppers can enter and exit the space 102. The space 102 comprises an entrance area 114 and an exit area 116. In some embodiments, the entrance area 114 and the exit area 116 may overlap or be the same area within the space 102. The entrance area 114 is adjacent to an entrance (e.g. a door) of the space 102 where a person enters the space 102. In some embodiments, the entrance area 114 may comprise a turnstile or gate that controls the flow of traffic into the space 102. For example, the entrance area 114 may comprise a turnstile that only allows one person to enter the space 102 at a time. The entrance area 114 may be adjacent to one or more devices (e.g. sensors 108 or a scanner 115) that identify a person as they enter space 102. As an example, a sensor 108 may capture one or more images of a person as they enter the space 102. As another example, a person may identify themselves using a scanner 115. Examples of scanners 115 include, but are not limited to, a QR code scanner, a barcode scanner, a near-field communication (NFC) scanner, or any other suitable type of scanner that can receive an electronic code embedded with information that uniquely identifies a person. For instance, a shopper may scan a personal device (e.g. a smart phone) on a scanner 115 to enter the store. When the shopper scans their personal device on the scanner 115, the personal device may provide the scanner 115 with an electronic code that uniquely identifies the shopper. After the shopper is identified and/or authenticated, the shopper is allowed to enter the store. In one embodiment, each shopper may have a registered account with the store to receive an identification code for the personal device.

After entering the space 102, the shopper may move around the interior of the store. As the shopper moves throughout the space 102, the shopper may shop for items by removing items from the racks 112. The shopper can remove multiple items from the racks 112 in the store to purchase those items. When the shopper has finished shopping, the shopper may leave the store via the exit area 116. The exit area 116 is adjacent to an exit (e.g. a door) of the space 102 where a person leaves the space 102. In some embodiments, the exit area 116 may comprise a turnstile or gate that controls the flow of traffic out of the space 102. For example, the exit area 116 may comprise a turnstile that only allows one person to leave the space 102 at a time. In some embodiments, the exit area 116 may be adjacent to one or more devices (e.g. sensors 108 or a scanner 115) that identify a person as they leave the space 102. For example, a shopper may scan their personal device on the scanner 115 before a turnstile or gate will open to allow the shopper to exit the store. When the shopper scans their personal device on the scanner 115, the personal device may provide an electronic code that uniquely identifies the shopper to indicate that the shopper is leaving the store. When the shopper leaves the store, an account for the shopper is charged for the items that the shopper removed from the store. Through this process the tracking system 100 allows the shopper to leave the store with their items without engaging in a conventional checkout process.

Global plane overview

In order to describe the physical location of people and objects within the space 102, a global plane 104 is defined for the space 102. The global plane 104 is a user-defined coordinate system that is used by the tracking system 100 to identify the locations of objects within a physical domain (i.e. the space 102). Referring to FIG. 1 as an example, a global plane 104 is defined such that an x-axis and a y-axis are parallel with a floor of the space 102. In this example, the z-axis of the global plane 104 is perpendicular to the floor of the space 102. A location in the space 102 is defined as a reference location 101 or origin for the global plane 104. In FIG. 1, the global plane 104 is defined such that reference location 101 corresponds with a corner of the store. In other examples, the reference location 101 may be located at any other suitable location within the space 102.

In this configuration, physical locations within the space 102 can be described using (x,y) coordinates in the global plane 104. As an example, the global plane 104 may be defined such that one unit in the global plane 104 corresponds with one meter in the space 102. In other words, an x-value of one in the global plane 104 corresponds with an offset of one meter from the reference location 101 in the space 102. In this example, a person that is standing in the corner of the space 102 at the reference location 101 will have an (x,y) coordinate with a value of (0,0) in the global plane 104. If the person moves two meters in the positive x-axis direction and two meters in the positive y-axis direction, then their new (x,y) coordinate will have a value of (2,2). In other examples, the global plane 104 may be expressed using inches, feet, or any other suitable measurement units.

Once the global plane 104 is defined for the space 102, the tracking system 100 uses (x,y) coordinates of the global plane 104 to track the location of people and objects within the space 102. For example, as a shopper moves within the interior of the store, the tracking system 100 may track their current physical location within the store using (x,y) coordinates of the global plane 104.

Tracking system hardware

In one embodiment, the tracking system 100 comprises one or more clients 105, one or more servers 106, one or more scanners 115, one or more sensors 108, and one or more weight sensors 110. The one or more clients 105, one or more servers 106, one or more scanners 115, one or more sensors 108, and one or more weight sensors 110 may be in signal communication with each other over a network 107. The network 107 may be any suitable type of wireless and/or wired network including, but not limited to, all or a portion of the Internet, an Intranet, a Bluetooth network, a WIFI network, a Zigbee network, a Z-wave network, a private network, a public network, a peer-to-peer network, the public switched telephone network, a cellular network, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), and a satellite network. The network 107 may be configured to support any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art. The tracking system 100 may be configured as shown or in any other suitable configuration.

Sensors

The tracking system 100 is configured to use sensors 108 to identify and track the location of people and objects within the space 102. For example, the tracking system 100 uses sensors 108 to capture images or videos of a shopper as they move within the store. The tracking system 100 may process the images or videos provided by the sensors 108 to identify the shopper, the location of the shopper, and/or any items that the shopper picks up.

Examples of sensors 108 include, but are not limited to, cameras, video cameras, web cameras, printed circuit board (PCB) cameras, depth sensing cameras, time-of-flight cameras, LiDARs, structured light cameras, or any other suitable type of imaging device.

Each sensor 108 is positioned above at least a portion of the space 102 and is configured to capture overhead view images or videos of at least a portion of the space 102. In one embodiment, the sensors 108 are generally configured to produce videos of portions of the interior of the space 102. These videos may include frames or images 302 of shoppers within the space 102. Each frame 302 is a snapshot of the people and/or objects within the field of view of a particular sensor 108 at a particular moment in time. A frame 302 may be a two-dimensional (2D) image or a three-dimensional (3D) image (e.g. a point cloud or a depth map). In this configuration, each frame 302 is of a portion of a global plane 104 for the space 102. Referring to FIG. 4 as an example, a frame 302 comprises a plurality of pixels that are each associated with a pixel location 402 within the frame 302. The tracking system 100 uses pixel locations 402 to describe the location of an object with respect to pixels in a frame 302 from a sensor 108. In the example shown in FIG. 4, the tracking system 100 can identify the location of different markers 304 within the frame 302 using their respective pixel locations 402. The pixel location 402 corresponds with a pixel row and a pixel column where a pixel is located within the frame 302. In one embodiment, each pixel is also associated with a pixel value 404 that indicates a depth or distance measurement in the global plane 104. For example, a pixel value 404 may correspond with a distance between a sensor 108 and a surface in the space 102.
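The relationship between a pixel location 402 and a pixel value 404 can be illustrated with a minimal sketch. It assumes a frame stored as rows of depth values in meters and a sensor that looks straight down from a known mounting height; the names `pixel_value` and `height_above_floor`, and all numeric values, are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]         # frame[row][column] -> distance from the sensor
PixelLocation = Tuple[int, int]   # (pixel row, pixel column)


def pixel_value(frame: Frame, location: PixelLocation) -> float:
    """Return the depth measurement stored at a pixel location."""
    row, column = location
    return frame[row][column]


def height_above_floor(frame: Frame, location: PixelLocation, sensor_height: float) -> float:
    """Estimate the height of the surface seen at a pixel (assumes a downward-facing sensor)."""
    return sensor_height - pixel_value(frame, location)


frame: Frame = [
    [3.0, 3.0, 3.0],
    [3.0, 1.25, 3.0],  # a surface 1.25 m below the sensor at pixel (1, 1)
    [3.0, 3.0, 3.0],
]
height = height_above_floor(frame, (1, 1), sensor_height=3.0)
```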

Each sensor 108 has a limited field of view within the space 102. This means that each sensor 108 may only be able to capture a portion of the space 102 within its field of view. To provide complete coverage of the space 102, the tracking system 100 may use multiple sensors 108 configured as a sensor array. In FIG. 1, the sensors 108 are configured as a three by four sensor array. In other examples, a sensor array may comprise any other suitable number and/or configuration of sensors 108. In one embodiment, the sensor array is positioned parallel with the floor of the space 102. In some embodiments, the sensor array is configured such that adjacent sensors 108 have at least partially overlapping fields of view. In this configuration, each sensor 108 captures images or frames 302 of a different portion of the space 102 which allows the tracking system 100 to monitor the entire space 102 by combining information from frames 302 of multiple sensors 108. The tracking system 100 is configured to map pixel locations 402 within frames 302 from each sensor 108 to physical locations in the space 102 using homographies 118. A homography 118 is configured to translate between pixel locations 402 in a frame 302 captured by a sensor 108 and (x,y) coordinates in the global plane 104 (i.e. physical locations in the space 102). The tracking system 100 uses homographies 118 to correlate a pixel location 402 in a frame 302 from a particular sensor 108 with a physical location in the space 102. In other words, the tracking system 100 uses homographies 118 to determine where a person is physically located in the space 102 based on their pixel location 402 within a frame 302 from a sensor 108. Since the tracking system 100 uses multiple sensors 108 to monitor the entire space 102, each sensor 108 is uniquely associated with a different homography 118 based on the sensor’s 108 physical location within the space 102. This configuration allows the

Weight sensors

The tracking system 100 is configured to use weight sensors 110 to detect and identify items that a person picks up within the space 102. For example, the tracking system 100 uses weight sensors 110 that are located on the shelves of a rack 112 to detect when a shopper removes an item from the rack 112. Each weight sensor 110 may be associated with a particular item which allows the tracking system 100 to identify which item the shopper picked up.

A weight sensor 110 is generally configured to measure the weight of objects (e.g. products) that are placed on or near the weight sensor 110. For example, a weight sensor 110 may comprise a transducer that converts an input mechanical force (e.g. weight, tension, compression, pressure, or torque) into an output electrical signal (e.g. current or voltage). As the input force increases, the output electrical signal may increase proportionally. The tracking system 100 is configured to analyze the output electrical signal to determine an overall weight for the items on the weight sensor 110.

Examples of weight sensors 110 include, but are not limited to, a piezoelectric load cell or a pressure sensor. For example, a weight sensor 110 may comprise one or more load cells that are configured to communicate electrical signals that indicate a weight experienced by the load cells. For instance, the load cells may produce an electrical current that varies depending on the weight or force experienced by the load cells. The load cells are configured to communicate the produced electrical signals to a server 106 and/or a client 105 for processing.

Weight sensors 110 may be positioned onto furniture (e.g. racks 112) within the space 102 to hold one or more items. For example, one or more weight sensors 110 may be positioned on a shelf of a rack 112. As another example, one or more weight sensors 110 may be positioned on a shelf of a refrigerator or a cooler. As another example, one or more weight sensors 110 may be integrated with a shelf of a rack 112. In other examples, weight sensors 110 may be positioned in any other suitable location within the space 102.

In one embodiment, a weight sensor 110 may be associated with a particular item. For instance, a weight sensor 110 may be configured to hold one or more of a particular item and to measure a combined weight for the items on the weight sensor 110. When an item is picked up from the weight sensor 110, the weight sensor 110 is configured to detect a weight decrease. In this example, the weight sensor 110 is configured to use stored information about the weight of the item to determine a number of items that were removed from the weight sensor 110. For example, a weight sensor 110 may be associated with an item that has an individual weight of eight ounces. When the weight sensor 110 detects a weight decrease of twenty-four ounces, the weight sensor 110 may determine that three of the items were removed from the weight sensor 110. The weight sensor 110 is also configured to detect a weight increase when an item is added to the weight sensor 110. For example, if an item is returned to the weight sensor 110, then the weight sensor 110 will determine a weight increase that corresponds with the individual weight for the item associated with the weight sensor 110.
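The item-count arithmetic described above can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the function name `items_changed`, the use of ounces, and the `tolerance_oz` noise allowance are assumptions introduced here.

```python
def items_changed(weight_delta_oz, item_weight_oz, tolerance_oz=0.5):
    """Convert a measured weight change into a signed item count.

    A negative result means items were removed from the weight sensor;
    a positive result means items were added. tolerance_oz is a
    hypothetical allowance for measurement noise.
    """
    if item_weight_oz <= 0:
        raise ValueError("item weight must be positive")
    count = round(weight_delta_oz / item_weight_oz)
    residual = abs(weight_delta_oz - count * item_weight_oz)
    if residual > tolerance_oz:
        raise ValueError("weight change is not a whole number of items")
    return count


removed = items_changed(-24.0, 8.0)  # twenty-four ounce decrease, eight-ounce item
returned = items_changed(8.0, 8.0)   # one item put back
```

With the eight-ounce example from the text, a twenty-four ounce decrease yields a count of three removed items, reported here as a negative value.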

Servers

A server 106 may be formed by one or more physical devices configured to provide services and resources (e.g. data and/or hardware resources) for the tracking system 100. Additional information about the hardware configuration of a server 106 is described in FIG. 38. In one embodiment, a server 106 may be operably coupled to one or more sensors 108 and/or weight sensors 110. The tracking system 100 may comprise any suitable number of servers 106. For example, the tracking system 100 may comprise a first server 106 that is in signal communication with a first plurality of sensors 108 in a sensor array and a second server 106 that is in signal communication with a second plurality of sensors 108 in the sensor array. As another example, the tracking system 100 may comprise a first server 106 that is in signal communication with a plurality of sensors 108 and a second server 106 that is in signal communication with a plurality of weight sensors 110. In other examples, the tracking system 100 may comprise any other suitable number of servers 106 that are each in signal communication with one or more sensors 108 and/or weight sensors 110.

A server 106 may be configured to process data (e.g. frames 302 and/or video) for one or more sensors 108 and/or weight sensors 110. In one embodiment, a server 106 may be configured to generate homographies 118 for sensors 108. As discussed above, the generated homographies 118 allow the tracking system 100 to determine where a person is physically located within the entire space 102 based on which sensor 108 they appear in and their location within a frame 302 captured by that sensor 108. In this configuration, the server 106 determines coefficients for a homography 118 based on the physical location of markers in the global plane 104 and the pixel locations of the markers in an image from a sensor 108. Examples of the server 106 performing this process are described in FIGS. 2-7.

In one embodiment, a server 106 is configured to calibrate a shelf position within the global plane 104 using sensors 108. This process allows the tracking system 100 to detect when a rack 112 or sensor 108 has moved from its original location within the space 102. In this configuration, the server 106 periodically compares the current shelf location of a rack 112 to an expected shelf location for the rack 112 using a sensor 108. In the event that the current shelf location does not match the expected shelf location, then the server 106 will use one or more other sensors 108 to determine whether the rack 112 has moved or whether the first sensor 108 has moved. An example of the server 106 performing this process is described in FIGS. 8 and 9.

In one embodiment, a server 106 is configured to hand off tracking information for an object (e.g. a person) as it moves between the fields of view of adjacent sensors 108. This process allows the tracking system 100 to track people as they move within the interior of the space 102. In this configuration, the server 106 tracks an object’s movement within the field of view of a first sensor 108 and then hands off tracking information (e.g. an object identifier) for the object as it enters the field of view of a second adjacent sensor 108. An example of the server 106 performing this process is described in FIGS. 10 and 11.

In one embodiment, a server 106 is configured to detect shelf interactions using a virtual curtain. This process allows the tracking system 100 to identify items that a person picks up from a rack 112. In this configuration, the server 106 is configured to process an image captured by a sensor 108 to determine where a person is interacting with a shelf of a rack 112. The server 106 uses a predetermined zone within the image as a virtual curtain that is used to determine which region and which shelf of a rack 112 that a person is interacting with. An example of the server 106 performing this process is described in FIGS. 12-14.

In one embodiment, a server 106 is configured to detect when an item has been picked up from a rack 112 and to determine which person to assign the item to using a predefined zone that is associated with the rack 112. This process allows the tracking system 100 to associate items on a rack 112 with the person that picked up the item. In this configuration, the server 106 detects that an item has been picked up using a weight sensor 110. The server 106 then uses a sensor 108 to identify a person within a predefined zone that is associated with the rack 112. Once the item and the person have been identified, the server 106 will add the item to a digital cart that is associated with the identified person. An example of the server 106 performing this process is described in FIGS. 15 and 18.

In one embodiment, a server 106 is configured to identify an object that has a non-uniform weight and to assign the item to a person’s digital cart. This process allows the tracking system 100 to identify items that a person picks up that cannot be identified based on just their weight. For example, the weight of fresh food is not constant and will vary from item to item. In this configuration, the server 106 uses a sensor 108 to identify markers (e.g. text or symbols) on an item that has been picked up. The server 106 uses the identified markers to then identify which item was picked up. The server 106 then uses the sensor 108 to identify a person within a predefined zone that is associated with the rack 112. Once the item and the person have been identified, the server 106 will add the item to a digital cart that is associated with the identified person. An example of the server 106 performing this process is described in FIGS. 16 and 18.

In one embodiment, a server 106 is configured to identify items that have been misplaced on a rack 112. This process allows the tracking system 100 to remove items from a shopper’s digital cart when the shopper puts down an item regardless of whether they put the item back in its proper location. For example, a person may put back an item in the wrong location on the rack 112 or on the wrong rack 112. In this configuration, the server 106 uses a weight sensor 110 to detect that an item has been put back on the rack 112 and to determine that the item is not in the correct location based on its weight. The server 106 then uses a sensor 108 to identify the person that put the item on the rack 112 and analyzes their digital cart to determine which item they put back based on the weights of the items in their digital cart. An example of the server 106 performing this process is described in FIGS. 17 and 18.
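One way to picture the cart analysis described above is a closest-weight lookup. The text only states that the decision is based on the weights of the items in the digital cart, so the closest-match rule, the function name `match_returned_item`, and the dictionary layout below are assumptions made for illustration.

```python
def match_returned_item(weight_increase, cart_item_weights):
    """Return the cart item whose unit weight is closest to the increase.

    cart_item_weights maps an item name to its unit weight
    (a hypothetical layout for a digital cart).
    """
    if not cart_item_weights:
        raise ValueError("digital cart is empty")
    return min(
        cart_item_weights,
        key=lambda name: abs(cart_item_weights[name] - weight_increase),
    )


cart = {"soda": 12.0, "chips": 8.0, "candy": 2.0}
matched = match_returned_item(7.8, cart)  # weight increase seen on the rack
```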

Clients

In some embodiments, one or more sensors 108 and/or weight sensors 110 are operably coupled to a server 106 via a client 105. In one embodiment, the tracking system 100 comprises a plurality of clients 105 that may each be operably coupled to one or more sensors 108 and/or weight sensors 110. For example, a first client 105 may be operably coupled to one or more sensors 108 and/or weight sensors 110 and a second client 105 may be operably coupled to one or more other sensors 108 and/or weight sensors 110. A client 105 may be formed by one or more physical devices configured to process data (e.g. frames 302 and/or video) for one or more sensors 108 and/or weight sensors 110. A client 105 may act as an intermediary for exchanging data between a server 106 and one or more sensors 108 and/or weight sensors 110. The combination of one or more clients 105 and a server 106 may also be referred to as a tracking subsystem. In this configuration, a client 105 may be configured to provide image processing capabilities for images or frames 302 that are captured by a sensor 108. The client 105 is further configured to send images, processed images, or any other suitable type of data to the server 106 for further processing and analysis. In some embodiments, a client 105 may be configured to perform one or more of the processes described above for the server 106.

Sensor mapping process

FIG. 2 is a flowchart of an embodiment of a sensor mapping method 200 for the tracking system 100. The tracking system 100 may employ method 200 to generate a homography 118 for a sensor 108. As discussed above, a homography 118 allows the tracking system 100 to determine where a person is physically located within the entire space 102 based on which sensor 108 they appear in and their location within a frame 302 captured by that sensor 108. Once generated, the homography 118 can be used to translate between pixel locations 402 in images (e.g. frames 302) captured by a sensor 108 and (x,y) coordinates 306 in the global plane 104 (i.e. physical locations in the space 102). The following is a non-limiting example of the process for generating a homography 118 for a single sensor 108. This same process can be repeated for generating a homography 118 for other sensors 108.

At step 202, the tracking system 100 receives (x,y) coordinates 306 for markers 304 in the space 102. Referring to FIG. 3 as an example, each marker 304 is an object that identifies a known physical location within the space 102. The markers 304 are used to demarcate locations in the physical domain (i.e. the global plane 104) that can be mapped to pixel locations 402 in a frame 302 from a sensor 108. In this example, the markers 304 are represented as stars on the floor of the space 102. A marker 304 may be formed of any suitable object that can be observed by a sensor 108. For example, a marker 304 may be tape or a sticker that is placed on the floor of the space 102. As another example, a marker 304 may be a design or marking on the floor of the space 102. In other examples, markers 304 may be positioned in any other suitable location within the space 102 that is observable by a sensor 108. For instance, one or more markers 304 may be positioned on top of a rack 112.

In one embodiment, the (x,y) coordinates 306 for markers 304 are provided by an operator. For example, an operator may manually place markers 304 on the floor of the space 102. The operator may determine an (x,y) location 306 for a marker 304 by measuring the distance between the marker 304 and the reference location 101 for the global plane 104. The operator may then provide the determined (x,y) location 306 to a server 106 or a client 105 of the tracking system 100 as an input.

Referring to the example in FIG. 3, the tracking system 100 may receive a first (x,y) coordinate 306A for a first marker 304A in a space 102 and a second (x,y) coordinate 306B for a second marker 304B in the space 102. The first (x,y) coordinate 306A describes the physical location of the first marker 304A with respect to the global plane 104 of the space 102. The second (x,y) coordinate 306B describes the physical location of the second marker 304B with respect to the global plane 104 of the space 102. The tracking system 100 may repeat the process of obtaining (x,y) coordinates 306 for any suitable number of additional markers 304 within the space 102.

Returning to FIG. 2 at step 206, the tracking system 100 identifies markers 304 within the frame 302 of the sensor 108. In one embodiment, the tracking system 100 uses object detection to identify markers 304 within the frame 302. For example, the markers 304 may have known features (e.g. shape, pattern, color, text, etc.) that the tracking system 100 can search for within the frame 302 to identify a marker 304. Referring to the example in FIG. 3, each marker 304 has a star shape. In this example, the tracking system 100 may search the frame 302 for star shaped objects to identify the markers 304 within the frame 302. The tracking system 100 may identify the first marker 304A, the second marker 304B, and any other markers 304 within the frame 302. In other examples, the tracking system 100 may use any other suitable features for identifying markers 304 within the frame 302. In other embodiments, the tracking system 100 may employ any other suitable image processing technique for identifying markers 304 within the frame 302. For example, the markers 304 may have a known color or pixel value. In this example, the tracking system 100 may use thresholds to identify the markers 304 within the frame 302 that correspond with the color or pixel value of the markers 304.

Returning to FIG. 2 at step 208, the tracking system 100 determines the number of identified markers 304 within the frame 302. Here, the tracking system 100 counts the number of markers 304 that were detected within the frame 302. Referring to the example in FIG. 3, the tracking system 100 detects eight markers 304 within the frame 302.

Returning to FIG. 2 at step 210, the tracking system 100 determines whether the number of identified markers 304 is greater than or equal to a predetermined threshold value. In some embodiments, the predetermined threshold value is proportional to a level of accuracy for generating a homography 118 for a sensor 108. Increasing the predetermined threshold value may increase the accuracy when generating a homography 118 while decreasing the predetermined threshold value may decrease the accuracy when generating a homography 118. As an example, the predetermined threshold value may be set to a value of six. In the example shown in FIG. 3, the tracking system 100 identified eight markers 304 which is greater than the predetermined threshold value. In other examples, the predetermined threshold value may be set to any other suitable value. The tracking system 100 returns to step 204 in response to determining that the number of identified markers 304 is less than the predetermined threshold value. In this case, the tracking system 100 returns to step 204 to capture another frame 302 of the space 102 using the same sensor 108 to try to detect more markers 304. Here, the tracking system 100 tries to obtain a new frame 302 that includes a number of markers 304 that is greater than or equal to the predetermined threshold value. For example, the tracking system 100 may receive a new frame 302 of the space 102 after an operator adds one or more additional markers 304 to the space 102. As another example, the tracking system 100 may receive a new frame 302 after lighting conditions have been changed to improve the detectability of the markers 304 within the frame 302. In other examples, the tracking system 100 may receive a new frame 302 after any kind of change that improves the detectability of the markers 304 within the frame 302.

The tracking system 100 proceeds to step 212 in response to determining that the number of identified markers 304 is greater than or equal to the predetermined threshold value. At step 212, the tracking system 100 determines pixel locations 402 in the frame 302 for the identified markers 304. For example, the tracking system 100 determines a first pixel location 402A within the frame 302 that corresponds with the first marker 304A and a second pixel location 402B within the frame 302 that corresponds with the second marker 304B. The first pixel location 402A comprises a first pixel row and a first pixel column indicating where the first marker 304A is located in the frame 302. The second pixel location 402B comprises a second pixel row and a second pixel column indicating where the second marker 304B is located in the frame 302.
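Steps 206 through 212 can be sketched for the threshold-based variant mentioned above, where markers 304 have a known pixel value. This is an illustrative sketch only: grouping pixels into four-connected regions, reporting each marker as the centroid (row, column) of its region, and the names `find_marker_pixels` and `has_enough_markers` are assumptions that do not appear in the disclosure. The default threshold of six follows the example value given in the text.

```python
from collections import deque

import numpy as np


def find_marker_pixels(frame, low, high):
    """Return (row, col) centroids of connected regions with values in [low, high]."""
    mask = (frame >= low) & (frame <= high)
    seen = np.zeros(mask.shape, dtype=bool)
    n_rows, n_cols = mask.shape
    centroids = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r, c)])
            seen[r, c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for nr, nc in ((pr - 1, pc), (pr + 1, pc), (pr, pc - 1), (pr, pc + 1)):
                    if 0 <= nr < n_rows and 0 <= nc < n_cols and mask[nr, nc] and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            centroids.append(tuple(float(v) for v in np.mean(pixels, axis=0)))
    return centroids


def has_enough_markers(pixel_locations, threshold=6):
    """Step 210: compare the marker count against the predetermined threshold."""
    return len(pixel_locations) >= threshold


frame = np.zeros((8, 8))
frame[1:3, 1:3] = 255  # first synthetic marker
frame[5:7, 4:6] = 255  # second synthetic marker
pixel_locations = find_marker_pixels(frame, 200, 255)
ready = has_enough_markers(pixel_locations, threshold=2)
```

When `has_enough_markers` returns false, the flow described above would return to step 204 and capture another frame 302.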

At step 214, the tracking system 100 generates a homography 118 for the sensor 108 based on the pixel locations 402 of identified markers 304 within the frame 302 of the sensor 108 and the (x,y) coordinates 306 of the identified markers 304 in the global plane 104. In one embodiment, the tracking system 100 correlates the pixel location 402 for each of the identified markers 304 with its corresponding (x,y) coordinate 306. Continuing with the example in FIG. 3, the tracking system 100 associates the first pixel location 402A for the first marker 304A with the first (x,y) coordinate 306A for the first marker 304A. The tracking system 100 also associates the second pixel location 402B for the second marker 304B with the second (x,y) coordinate 306B for the second marker 304B. The tracking system 100 may repeat the process of associating pixel locations 402 and (x,y) coordinates 306 for all of the identified markers 304.

The tracking system 100 then determines a relationship between the pixel locations 402 of identified markers 304 within the frame 302 of the sensor 108 and the (x,y) coordinates 306 of the identified markers 304 in the global plane 104 to generate a homography 118 for the sensor 108. The generated homography 118 allows the tracking system 100 to map pixel locations 402 in a frame 302 from the sensor 108 to (x,y) coordinates 306 in the global plane 104. Additional information about a homography 118 is described in FIGS. 5A and 5B. Once the tracking system 100 generates the homography 118 for the sensor 108, the tracking system 100 stores an association between the sensor 108 and the generated homography 118 in memory (e.g. memory 3804).

The tracking system 100 may repeat the process described above to generate and associate homographies 118 with other sensors 108. Continuing with the example in FIG. 3, the tracking system 100 may receive a second frame 302 from a second sensor 108. In this example, the second frame 302 comprises the first marker 304A and the second marker 304B. The tracking system 100 may determine a third pixel location 402 in the second frame 302 for the first marker 304A, a fourth pixel location 402 in the second frame 302 for the second marker 304B, and pixel locations 402 for any other markers 304. The tracking system 100 may then generate a second homography 118 based on the third pixel location 402 in the second frame 302 for the first marker 304A, the fourth pixel location 402 in the second frame 302 for the second marker 304B, the first (x,y) coordinate 306A in the global plane 104 for the first marker 304A, the second (x,y) coordinate 306B in the global plane 104 for the second marker 304B, and pixel locations 402 and (x,y) coordinates 306 for other markers 304. The second homography 118 comprises coefficients that translate between pixel locations 402 in the second frame 302 and physical locations (e.g. (x,y) coordinates 306) in the global plane 104. The coefficients of the second homography 118 are different from the coefficients of the homography 118 that is associated with the first sensor 108. This process uniquely associates each sensor 108 with a corresponding homography 118 that maps pixel locations 402 from the sensor 108 to (x,y) coordinates 306 in the global plane 104.

Homographies

An example of a homography 118 for a sensor 108 is described in FIGS. 5A and 5B. Referring to FIG. 5A, a homography 118 comprises a plurality of coefficients configured to translate between pixel locations 402 in a frame 302 and physical locations (e.g. (x,y) coordinates 306) in the global plane 104. In this example, the homography 118 is configured as a matrix and the coefficients of the homography 118 are represented as H11, H12, H13, H14, H21, H22, H23, H24, H31, H32, H33, H34, H41, H42, H43, and H44. The tracking system 100 may generate the homography 118 by defining a relationship or function between pixel locations 402 in a frame 302 and physical locations (e.g. (x,y) coordinates 306) in the global plane 104 using the coefficients. For example, the tracking system 100 may define one or more functions using the coefficients and may perform a regression (e.g. least squares regression) to solve for values for the coefficients that project pixel locations 402 of a frame 302 of a sensor to (x,y) coordinates 306 in the global plane 104. Referring to the example in FIG. 3, the homography 118 for the sensor 108 is configured to project the first pixel location 402A in the frame 302 for the first marker 304A to the first (x,y) coordinate 306A in the global plane 104 for the first marker 304A and to project the second pixel location 402B in the frame 302 for the second marker 304B to the second (x,y) coordinate 306B in the global plane 104 for the second marker 304B. In other examples, the tracking system 100 may solve for coefficients of the homography 118 using any other suitable technique. In the example shown in FIG. 5A, the z-value at the pixel location 402 may correspond with a pixel value 404. In this case, the homography 118 is further configured to translate between pixel values 404 in a frame 302 and z-coordinates (e.g. heights or elevations) in the global plane 104.
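The least squares regression mentioned above can be illustrated with a simplified planar case. Note the swapped-in simplification: the sketch fits a 3x3 planar homography with its last coefficient fixed to 1, and not the 4x4 matrix of FIG. 5A, and it ignores pixel values 404 and z-coordinates. The function name `fit_homography` and the synthetic marker data are assumptions for illustration.

```python
import numpy as np


def fit_homography(pixel_locations, global_coords):
    """Least-squares fit of a 3x3 planar homography (last coefficient fixed to 1).

    pixel_locations: list of (row, col) pairs for the identified markers.
    global_coords:   list of matching (x, y) pairs in the global plane.
    """
    if len(pixel_locations) != len(global_coords) or len(pixel_locations) < 4:
        raise ValueError("need at least four matching point pairs")
    rows, targets = [], []
    for (u, v), (x, y) in zip(pixel_locations, global_coords):
        # x = (h11*u + h12*v + h13) / (h31*u + h32*v + 1), rearranged to be linear in h
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v])
        targets.append(x)
        # y = (h21*u + h22*v + h23) / (h31*u + h32*v + 1)
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v])
        targets.append(y)
    h, *_ = np.linalg.lstsq(np.array(rows, float), np.array(targets, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)


# Synthetic markers: x grows with the pixel column, y grows with the pixel row.
pixels = [(0, 0), (0, 100), (100, 0), (100, 100), (50, 20), (20, 70)]
coords = [(1.0 + 0.01 * c, 2.0 + 0.01 * r) for r, c in pixels]
H = fit_homography(pixels, coords)
```

Using more marker pairs than the minimum of four overdetermines the system, which is consistent with the earlier observation that a higher marker threshold may improve accuracy.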

Using homographies

Once the tracking system 100 generates a homography 118, the tracking system 100 may use the homography 118 to determine the location of an object (e.g. a person) within the space 102 based on the pixel location 402 of the object in a frame 302 of a sensor 108. For example, the tracking system 100 may perform matrix multiplication between a pixel location 402 in a first frame 302 and a homography 118 to determine a corresponding (x,y) coordinate 306 in the global plane 104. For example, the tracking system 100 receives a first frame 302 from a sensor 108 and determines a first pixel location 402 in the first frame 302 for an object in the space 102. The tracking system 100 may then apply the homography 118 that is associated with the sensor 108 to the first pixel location 402 of the object to determine a first (x,y) coordinate 306 that identifies a first x-value and a first y-value in the global plane 104 where the object is located.
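The matrix multiplication described above might look like the following under a simplifying assumption: a 3x3 planar homography applied to a homogeneous (row, column, 1) vector, followed by division by the third component. The matrix values and the name `pixel_to_global` are hypothetical and are not taken from FIG. 5A.

```python
import numpy as np


def pixel_to_global(homography, pixel_location):
    """Project a (row, col) pixel location to an (x, y) coordinate."""
    u, v = pixel_location
    x, y, w = homography @ np.array([u, v, 1.0])
    return (float(x / w), float(y / w))


# Hypothetical homography for one sensor: 0.01 global units per pixel,
# with the frame origin located at (1.0, 2.0) in the global plane.
H = np.array([[0.0, 0.01, 1.0],
              [0.01, 0.0, 2.0],
              [0.0, 0.0, 1.0]])
xy = pixel_to_global(H, (40, 250))  # object seen at row 40, column 250
```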

In some instances, the tracking system 100 may use multiple sensors 108 to determine the location of the object. Using multiple sensors 108 may provide more accuracy when determining where an object is located within the space 102. In this case, the tracking system 100 uses homographies 118 that are associated with different sensors 108 to determine the location of an object within the global plane 104. Continuing with the previous example, the tracking system 100 may receive a second frame 302 from a second sensor 108. The tracking system 100 may determine a second pixel location 402 in the second frame 302 for the object in the space 102. The tracking system 100 may then apply a second homography 118 that is associated with the second sensor 108 to the second pixel location 402 of the object to determine a second (x,y) coordinate 306 that identifies a second x-value and a second y-value in the global plane 104 where the object is located.

When the first (x,y) coordinate 306 and the second (x,y) coordinate 306 are the same, the tracking system 100 may use either the first (x,y) coordinate 306 or the second (x,y) coordinate 306 as the physical location of the object within the space 102. When the first (x,y) coordinate 306 and the second (x,y) coordinate 306 are not the same, the tracking system 100 may employ any suitable clustering technique to determine the physical location of the object within the space 102 based on the first (x,y) coordinate 306 and the second (x,y) coordinate 306. For example, the tracking system 100 may generate an average (x,y) coordinate for the object by computing an average between the first (x,y) coordinate 306 and the second (x,y) coordinate 306. As another example, the tracking system 100 may generate a median (x,y) coordinate for the object by computing a median between the first (x,y) coordinate 306 and the second (x,y) coordinate 306. In other examples, the tracking system 100 may employ any other suitable technique to resolve differences between the first (x,y) coordinate 306 and the second (x,y) coordinate 306.
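The averaging and median options described above can be sketched as follows. The function name `fuse_coordinates`, the `method` switch, and the `tol` parameter used to decide when two coordinates count as the same are assumptions introduced for this example.

```python
import numpy as np


def fuse_coordinates(coords, method="mean", tol=1e-6):
    """Combine (x, y) estimates of one object from several sensors."""
    pts = np.asarray(coords, dtype=float)
    if np.allclose(pts, pts[0], atol=tol):
        # All sensors agree, so any one estimate can be used directly.
        fused = pts[0]
    elif method == "mean":
        fused = pts.mean(axis=0)
    elif method == "median":
        fused = np.median(pts, axis=0)
    else:
        raise ValueError("method must be 'mean' or 'median'")
    return tuple(float(v) for v in fused)


location = fuse_coordinates([(1.0, 2.0), (1.2, 2.4)])
```

With only two sensors the mean and the median coincide. The median becomes useful with three or more estimates, where it is less affected by a single outlying sensor.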

The tracking system 100 may use the inverse of the homography 118 to project from (x,y) coordinates 306 in the global plane 104 to pixel locations 402 in a frame 302 of a sensor 108. For example, the tracking system 100 receives an (x,y) coordinate 306 in the global plane 104 for an object. The tracking system 100 identifies a homography 118 that is associated with a sensor 108 where the object is seen. The tracking system 100 may then apply the inverse homography 118 to the (x,y) coordinate 306 to determine a pixel location 402 where the object is located in the frame 302 for the sensor 108. The tracking system 100 may compute the matrix inverse of the homography 118 when the homography 118 is represented as a matrix. Referring to FIG. 5B as an example, the tracking system 100 may perform matrix multiplication between an (x,y) coordinate 306 in the global plane 104 and the inverse homography 118 to determine a corresponding pixel location 402 in the frame 302 for the sensor 108.
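The inverse projection can be sketched under the same simplifying assumption of a 3x3 planar homography, which differs from the matrix of FIG. 5B. The matrix values and the name `global_to_pixel` are hypothetical.

```python
import numpy as np


def global_to_pixel(homography, xy):
    """Project an (x, y) coordinate back to a (row, col) pixel location."""
    x, y = xy
    u, v, w = np.linalg.inv(homography) @ np.array([x, y, 1.0])
    return (float(u / w), float(v / w))


# Hypothetical homography for one sensor (illustrative values only).
H = np.array([[0.0, 0.01, 1.0],
              [0.01, 0.0, 2.0],
              [0.0, 0.0, 1.0]])
pixel = global_to_pixel(H, (3.5, 2.4))
```

Here the coordinate (3.5, 2.4) maps back to approximately row 40 and column 250, up to floating-point rounding.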

Sensor mapping using a marker grid

FIG. 6 is a flowchart of an embodiment of a sensor mapping method 600 for the tracking system 100 using a marker grid 702. The tracking system 100 may employ method 600 to reduce the amount of time it takes to generate a homography 118 for a sensor 108. For example, using a marker grid 702 reduces the amount of setup time required to generate a homography 118 for a sensor 108. Typically, each marker 304 is placed within a space 102 and the physical location of each marker 304 is determined independently. This process is repeated for each sensor 108 in a sensor array. In contrast, a marker grid 702 is a portable surface that comprises a plurality of markers 304. The marker grid 702 may be formed using carpet, fabric, poster board, foam board, vinyl, paper, wood, or any other suitable type of material. Each marker 304 is an object that identifies a particular location on the marker grid 702. Examples of markers 304 include, but are not limited to, shapes, symbols, and text. The physical locations of each marker 304 on the marker grid 702 are known and are stored in memory (e.g. marker grid information 716). Using a marker grid 702 simplifies and speeds up the process of placing and determining the location of markers 304 because the marker grid 702 and its markers 304 can be quickly repositioned anywhere within the space 102 without having to individually move markers 304 or add new markers 304 to the space 102. Once generated, the homography 118 can be used to translate between pixel locations 402 in a frame 302 captured by a sensor 108 and (x,y) coordinates 306 in the global plane 104 (i.e. physical locations in the space 102).

At step 602, the tracking system 100 receives a first (x,y) coordinate 306A for a first corner 704 of a marker grid 702 in a space 102. Referring to FIG. 7 as an example, the marker grid 702 is configured to be positioned on a surface (e.g. the floor) within the space 102 that is observable by one or more sensors 108. In this example, the tracking system 100 receives a first (x,y) coordinate 306A in the global plane 104 for a first corner 704 of the marker grid 702. The first (x,y) coordinate 306A describes the physical location of the first corner 704 with respect to the global plane 104. In one embodiment, the first (x,y) coordinate 306A is based on a physical measurement of a distance between a reference location 101 in the space 102 and the first corner 704. For example, the first (x,y) coordinate 306A for the first corner 704 of the marker grid 702 may be provided by an operator. In this example, an operator may manually place the marker grid 702 on the floor of the space 102. The operator may determine an (x,y) location 306 for the first corner 704 of the marker grid 702 by measuring the distance between the first corner 704 of the marker grid 702 and the reference location 101 for the global plane 104. The operator may then provide the determined (x,y) location 306 to a server 106 or a client 105 of the tracking system 100 as an input.

In another embodiment, the tracking system 100 may receive a signal from a beacon located at the first corner 704 of the marker grid 702 that identifies the first (x,y) coordinate 306A. An example of a beacon includes, but is not limited to, a Bluetooth beacon. For example, the tracking system 100 may communicate with the beacon and determine the first (x,y) coordinate 306A based on the time-of-flight of a signal that is communicated between the tracking system 100 and the beacon. In other embodiments, the tracking system 100 may obtain the first (x,y) coordinate 306A for the first corner 704 using any other suitable technique.

Returning to FIG. 6 at step 604, the tracking system 100 determines (x,y) coordinates 306 for the markers 304 on the marker grid 702. Returning to the example in FIG. 7, the tracking system 100 determines a second (x,y) coordinate 306B for a first marker 304A on the marker grid 702. The tracking system 100 comprises marker grid information 716 that identifies offsets between markers 304 on the marker grid 702 and the first corner 704 of the marker grid 702. In this example, the offset comprises a distance between the first corner 704 of the marker grid 702 and the first marker 304A with respect to the x-axis and the y-axis of the global plane 104. Using the marker grid information 716, the tracking system 100 is able to determine the second (x,y) coordinate 306B for the first marker 304A by adding an offset associated with the first marker 304A to the first (x,y) coordinate 306A for the first corner 704 of the marker grid 702.

In one embodiment, the tracking system 100 determines the second (x,y) coordinate 306B based at least in part on a rotation of the marker grid 702. For example, the tracking system 100 may receive a fourth (x,y) coordinate 306D that identifies an x-value and a y-value in the global plane 104 for a second corner 706 of the marker grid 702. The tracking system 100 may obtain the fourth (x,y) coordinate 306D for the second corner 706 of the marker grid 702 using a process similar to the process described in step 602. The tracking system 100 determines a rotation angle 712 between the first (x,y) coordinate 306A for the first corner 704 of the marker grid 702 and the fourth (x,y) coordinate 306D for the second corner 706 of the marker grid 702. In this example, the rotation angle 712 is about the first corner 704 of the marker grid 702 within the global plane 104. The tracking system 100 then determines the second (x,y) coordinate 306B for the first marker 304A by applying a translation by adding the offset associated with the first marker 304A to the first (x,y) coordinate 306A for the first corner 704 of the marker grid 702 and applying a rotation using the rotation angle 712 about the first (x,y) coordinate 306A for the first corner 704 of the marker grid 702. In other examples, the tracking system 100 may determine the second (x,y) coordinate 306B for the first marker 304A using any other suitable technique.
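The translation and rotation described above can be illustrated with a minimal Python sketch. The function name marker_global_xy and its parameters are hypothetical, and the sketch assumes that the edge running from the first corner 704 to the second corner 706 defines the x-axis of the marker grid 702, which the description does not state.

```python
import math

def marker_global_xy(corner_xy, offset_xy, second_corner_xy=None):
    """Return a marker's (x, y) in the global plane (illustrative sketch).

    corner_xy: (x, y) of the first corner of the marker grid.
    offset_xy: stored offset of the marker from the first corner.
    second_corner_xy: optional (x, y) of a second corner, used to
        estimate the rotation angle of the grid.
    """
    cx, cy = corner_xy
    dx, dy = offset_xy
    angle = 0.0
    if second_corner_xy is not None:
        # Assumption: the first-to-second-corner edge is the grid's x-axis.
        angle = math.atan2(second_corner_xy[1] - cy, second_corner_xy[0] - cx)
    # Rotate the offset about the first corner, then translate by the corner.
    rx = dx * math.cos(angle) - dy * math.sin(angle)
    ry = dx * math.sin(angle) + dy * math.cos(angle)
    return (cx + rx, cy + ry)
```

With no second corner supplied, the sketch reduces to the simple offset addition described for step 604.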

The tracking system 100 may repeat this process for one or more additional markers 304 on the marker grid 702. For example, the tracking system 100 determines a third (x,y) coordinate 306C for a second marker 304B on the marker grid 702. Here, the tracking system 100 uses the marker grid information 716 to identify an offset associated with the second marker 304B. The tracking system 100 is able to determine the third (x,y) coordinate 306C for the second marker 304B by adding the offset associated with the second marker 304B to the first (x,y) coordinate 306A for the first corner 704 of the marker grid 702. In another embodiment, the tracking system 100 determines a third (x,y) coordinate 306C for a second marker 304B based at least in part on a rotation of the marker grid 702 using a process similar to the process described above for the first marker 304A.

Once the tracking system 100 knows the physical location of the markers 304 within the space 102, the tracking system 100 then determines where the markers 304 are located with respect to the pixels in the frame 302 of a sensor 108. At step 606, the tracking system 100 receives a frame 302 from a sensor 108. The frame 302 is of the global plane 104 that includes at least a portion of the marker grid 702 in the space 102. The frame 302 comprises one or more markers 304 of the marker grid 702. The frame 302 is configured similar to the frame 302 described in FIGS. 2-4. For example, the frame 302 comprises a plurality of pixels that are each associated with a pixel location 402 within the frame 302. The pixel location 402 identifies a pixel row and a pixel column where a pixel is located. In one embodiment, each pixel is associated with a pixel value 404 that indicates a depth or distance measurement. For example, a pixel value 404 may correspond with a distance between the sensor 108 and a surface within the space 102.

At step 608, the tracking system 100 identifies markers 304 within the frame 302 of the sensor 108. The tracking system 100 may identify markers 304 within the frame 302 using a process similar to the process described in step 206 of FIG. 2. For example, the tracking system 100 may use object detection to identify markers 304 within the frame 302. Referring to the example in FIG. 7, each marker 304 is a unique shape or symbol. In other examples, each marker 304 may have any other unique features (e.g. shape, pattern, color, text, etc.). In this example, the tracking system 100 may search for objects within the frame 302 that correspond with the known features of a marker 304. Tracking system 100 may identify the first marker 304A, the second marker 304B, and any other markers 304 on the marker grid 702.

In one embodiment, the tracking system 100 compares the features of the identified markers 304 to the features of known markers 304 on the marker grid 702 using a marker dictionary 718. The marker dictionary 718 identifies a plurality of markers 304 that are associated with a marker grid 702. In this example, the tracking system 100 may identify the first marker 304A by identifying a star on the marker grid 702, comparing the star to the symbols in the marker dictionary 718, and determining that the star matches one of the symbols in the marker dictionary 718 that corresponds with the first marker 304A. Similarly, the tracking system 100 may identify the second marker 304B by identifying a triangle on the marker grid 702, comparing the triangle to the symbols in the marker dictionary 718, and determining that the triangle matches one of the symbols in the marker dictionary 718 that corresponds with the second marker 304B. The tracking system 100 may repeat this process for any other identified markers 304 in the frame 302.

In another embodiment, the marker grid 702 may comprise markers 304 that contain text. In this example, each marker 304 can be uniquely identified based on its text. This configuration allows the tracking system 100 to identify markers 304 in the frame 302 by using text recognition or optical character recognition techniques on the frame 302. In this case, the tracking system 100 may use a marker dictionary 718 that comprises a plurality of predefined words that are each associated with a marker 304 on the marker grid 702. For example, the tracking system 100 may perform text recognition to identify text within the frame 302. The tracking system 100 may then compare the identified text to words in the marker dictionary 718. Here, the tracking system 100 checks whether the identified text matches any of the known text that corresponds with a marker 304 on the marker grid 702. The tracking system 100 may discard any text that does not match any words in the marker dictionary 718. When the tracking system 100 identifies text that matches a word in the marker dictionary 718, the tracking system 100 may identify the marker 304 that corresponds with the identified text. For instance, the tracking system 100 may determine that the identified text matches the text associated with the first marker 304A. The tracking system 100 may identify the second marker 304B and any other markers 304 on the marker grid 702 using a similar process.
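The dictionary lookup described above can be sketched as follows. This is only an illustration: the function name match_text_markers, the dictionary layout (word mapped to a marker identifier), and the case-insensitive comparison are assumptions, and the text recognition step itself is taken as already done.

```python
def match_text_markers(recognized_text, marker_dictionary):
    """Return marker identifiers for recognized text found in the dictionary.

    recognized_text: strings produced by a text recognition step.
    marker_dictionary: hypothetical mapping of known word -> marker id.
    Text that matches no dictionary word is discarded.
    """
    matched = []
    for text in recognized_text:
        key = text.strip().lower()  # assumption: matching ignores case and padding
        if key in marker_dictionary:
            matched.append(marker_dictionary[key])
    return matched
```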

Returning to FIG. 6 at step 610, the tracking system 100 determines a number of identified markers 304 within the frame 302. Here, tracking system 100 counts the number of markers 304 that were detected within the frame 302. Referring to the example in FIG. 7, the tracking system 100 detects five markers 304 within the frame 302.

Returning to FIG. 6 at step 612, the tracking system 100 determines whether the number of identified markers 304 is greater than or equal to a predetermined threshold value. The tracking system 100 may compare the number of identified markers 304 to the predetermined threshold value using a process similar to the process described in step 210 of FIG. 2. The tracking system 100 returns to step 606 in response to determining that the number of identified markers 304 is less than the predetermined threshold value. In this case, the tracking system 100 returns to step 606 to capture another frame 302 of the space 102 using the same sensor 108 to try to detect more markers 304. Here, the tracking system 100 tries to obtain a new frame 302 that includes a number of markers 304 that is greater than or equal to the predetermined threshold value. For example, the tracking system 100 may receive a new frame 302 of the space 102 after an operator repositions the marker grid 702 within the space 102. As another example, the tracking system 100 may receive a new frame 302 after lighting conditions have been changed to improve the detectability of the markers 304 within the frame 302. In other examples, the tracking system 100 may receive a new frame 302 after any kind of change that improves the detectability of the markers 304 within the frame 302.
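The loop of receiving a frame, counting markers, and retrying can be sketched as below. The names frame_with_enough_markers and detect_markers are hypothetical; detect_markers stands in for whatever marker identification technique is used and is supplied by the caller.

```python
def frame_with_enough_markers(frames, detect_markers, threshold):
    """Return (frame, markers) for the first frame meeting the threshold.

    frames: iterable of frames from one sensor.
    detect_markers: caller-supplied function returning the markers
        identified in a frame (hypothetical stand-in).
    threshold: predetermined minimum number of markers.
    Returns None if no frame contains enough markers.
    """
    for frame in frames:
        markers = detect_markers(frame)
        if len(markers) >= threshold:
            return frame, markers
        # Too few markers: fall through and try the next frame.
    return None
```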

The tracking system 100 proceeds to step 614 in response to determining that the number of identified markers 304 is greater than or equal to the predetermined threshold value. Once the tracking system 100 identifies a suitable number of markers 304 on the marker grid 702, the tracking system 100 then determines a pixel location 402 for each of the identified markers 304. Each marker 304 may occupy multiple pixels in the frame 302. This means that for each marker 304, the tracking system 100 determines which pixel location 402 in the frame 302 corresponds with its (x,y) coordinate 306 in the global plane 104. In one embodiment, the tracking system 100 uses bounding boxes 708 to narrow or restrict the search space when trying to identify pixel locations 402 for markers 304. A bounding box 708 is a defined area or region within the frame 302 that contains a marker 304. For example, a bounding box 708 may be defined as a set of pixels or a range of pixels of the frame 302 that comprise a marker 304.

At step 614, the tracking system 100 identifies bounding boxes 708 for markers 304 within the frame 302. In one embodiment, the tracking system 100 identifies a plurality of pixels in the frame 302 that correspond with a marker 304 and then defines a bounding box 708 that encloses the pixels corresponding with the marker 304. The tracking system 100 may repeat this process for each of the markers 304. Returning to the example in FIG. 7, the tracking system 100 may identify a first bounding box 708A for the first marker 304A, a second bounding box 708B for the second marker 304B, and bounding boxes 708 for any other identified markers 304 within the frame 302.
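As a rough illustration of defining a bounding box 708 that encloses the pixels of a marker 304, the following sketch computes the smallest axis-aligned box around a set of pixel locations. The function name bounding_box and the (row, column) tuple layout are assumptions made for the example.

```python
def bounding_box(pixel_locations):
    """Return (min_row, min_col, max_row, max_col) enclosing the pixels.

    pixel_locations: non-empty list of (row, column) pairs that were
        identified as belonging to one marker.
    """
    rows = [row for row, _ in pixel_locations]
    cols = [col for _, col in pixel_locations]
    return (min(rows), min(cols), max(rows), max(cols))
```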

In another embodiment, the tracking system 100 may employ text or character recognition to identify the first marker 304A when the first marker 304A comprises text. For example, the tracking system 100 may use text recognition to identify pixels within the frame 302 that comprise a word corresponding with a marker 304. The tracking system 100 may then define a bounding box 708 that encloses the pixels corresponding with the identified word. In other embodiments, the tracking system 100 may employ any other suitable image processing technique for identifying bounding boxes 708 for the identified markers 304.

Returning to FIG. 6 at step 616, the tracking system 100 identifies a pixel 710 within each bounding box 708 that corresponds with a pixel location 402 in the frame 302 for a marker 304. As discussed above, each marker 304 may occupy multiple pixels in the frame 302 and the tracking system 100 determines which pixel 710 in the frame 302 corresponds with the pixel location 402 for an (x,y) coordinate 306 in the global plane 104. In one embodiment, each marker 304 comprises a light source. Examples of light sources include, but are not limited to, light emitting diodes (LEDs), infrared (IR) LEDs, incandescent lights, or any other suitable type of light source. In this configuration, a pixel 710 corresponds with a light source for a marker 304. In another embodiment, each marker 304 may comprise a detectable feature that is unique to each marker 304. For example, each marker 304 may comprise a unique color that is associated with the marker 304. As another example, each marker 304 may comprise a unique symbol or pattern that is associated with the marker 304. In this configuration, a pixel 710 corresponds with the detectable feature for the marker 304. Continuing with the previous example, the tracking system 100 identifies a first pixel 710A for the first marker 304A, a second pixel 710B for the second marker 304B, and pixels 710 for any other identified markers 304.

At step 618, the tracking system 100 determines pixel locations 402 within the frame 302 for each of the identified pixels 710. For example, the tracking system 100 may identify a first pixel row and a first pixel column of the frame 302 that corresponds with the first pixel 710A. Similarly, the tracking system 100 may identify a pixel row and a pixel column in the frame 302 for each of the identified pixels 710.

The tracking system 100 generates a homography 118 for the sensor 108 after the tracking system 100 determines (x,y) coordinates 306 in the global plane 104 and pixel locations 402 in the frame 302 for each of the identified markers 304. At step 620, the tracking system 100 generates a homography 118 for the sensor 108 based on the pixel locations 402 of identified markers 304 in the frame 302 of the sensor 108 and the (x,y) coordinate 306 of the identified markers 304 in the global plane 104. In one embodiment, the tracking system 100 correlates the pixel location 402 for each of the identified markers 304 with its corresponding (x,y) coordinate 306. Continuing with the example in FIG. 7, the tracking system 100 associates the first pixel location 402 for the first marker 304A with the second (x,y) coordinate 306B for the first marker 304A. The tracking system 100 also associates the second pixel location 402 for the second marker 304B with the third (x,y) location 306C for the second marker 304B. The tracking system 100 may repeat this process for all of the identified markers 304.

The tracking system 100 then determines a relationship between the pixel locations 402 of identified markers 304 within the frame 302 of the sensor 108 and the (x,y) coordinate 306 of the identified markers 304 in the global plane 104 to generate a homography 118 for the sensor 108. The generated homography 118 allows the tracking system 100 to map pixel locations 402 in a frame 302 from the sensor 108 to (x,y) coordinates 306 in the global plane 104. The generated homography 118 is similar to the homography described in FIGS. 5A and 5B. Once the tracking system 100 generates the homography 118 for the sensor 108, the tracking system 100 stores an association between the sensor 108 and the generated homography 118 in memory (e.g. memory 3804).
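One common way to determine such a relationship from point correspondences is the direct linear transform, sketched below with NumPy. This is an assumption for illustration only: it estimates a standard 3x3 planar homography from at least four marker correspondences and ignores depth, so it may differ from the homography 118 described in FIGS. 5A and 5B. The function names estimate_homography and pixel_to_global are hypothetical.

```python
import numpy as np

def estimate_homography(pixel_points, global_points):
    """Estimate a 3x3 homography mapping pixel points to global (x, y).

    pixel_points, global_points: equal-length lists of 2-tuples with at
        least four correspondences, no three of them collinear.
    """
    if len(pixel_points) < 4 or len(pixel_points) != len(global_points):
        raise ValueError("need at least four matched point pairs")
    rows = []
    for (u, v), (x, y) in zip(pixel_points, global_points):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-u, -v, -1, 0, 0, 0, x * u, x * v, x])
        rows.append([0, 0, 0, -u, -v, -1, y * u, y * v, y])
    a = np.asarray(rows, dtype=float)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(a)
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def pixel_to_global(h, pixel):
    """Map one pixel location to an (x, y) coordinate using homography h."""
    u, v = pixel
    x, y, w = h @ np.array([u, v, 1.0])
    return (x / w, y / w)
```

Supplying more than four correspondences gives a least-squares fit, which is one reason a threshold on the number of identified markers is useful.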

The tracking system 100 may repeat the process described above to generate and associate homographies 118 with other sensors 108. The marker grid 702 may be moved or repositioned within the space 102 to generate a homography 118 for another sensor 108. For example, an operator may reposition the marker grid 702 to allow another sensor 108 to view the markers 304 on the marker grid 702. As an example, the tracking system 100 may receive a second frame 302 from a second sensor 108. In this example, the second frame 302 comprises the first marker 304A and the second marker 304B. The tracking system 100 may determine a third pixel location 402 in the second frame 302 for the first marker 304A and a fourth pixel location 402 in the second frame 302 for the second marker 304B. The tracking system 100 may then generate a second homography 118 based on the third pixel location 402 in the second frame 302 for the first marker 304A, the fourth pixel location 402 in the second frame 302 for the second marker 304B, the (x,y) coordinate 306B in the global plane 104 for the first marker 304A, the (x,y) coordinate 306C in the global plane 104 for the second marker 304B, and pixel locations 402 and (x,y) coordinates 306 for other markers 304. The second homography 118 comprises coefficients that translate between pixel locations 402 in the second frame 302 and physical locations (e.g. (x,y) coordinates 306) in the global plane 104. The coefficients of the second homography 118 are different from the coefficients of the homography 118 that is associated with the first sensor 108. In other words, each sensor 108 is uniquely associated with a homography 118 that maps pixel locations 402 from the sensor 108 to physical locations in the global plane 104. This process uniquely associates a homography 118 to a sensor 108 based on the physical location (e.g. (x,y) coordinate 306) of the sensor 108 in the global plane 104.

Shelf position calibration

FIG. 8 is a flowchart of an embodiment of a shelf position calibration method 800 for the tracking system 100. The tracking system 100 may employ method 800 to periodically check whether a rack 112 or sensor 108 has moved within the space 102. For example, a rack 112 may be accidentally bumped or moved by a person, which causes the position of the rack 112 to move with respect to the global plane 104. As another example, a sensor 108 may come loose from its mounting structure, which causes the sensor 108 to sag or move from its original location. Any changes in the position of a rack 112 and/or a sensor 108 after the tracking system 100 has been calibrated will reduce the accuracy and performance of the tracking system 100 when tracking objects within the space 102. The tracking system 100 employs method 800 to detect when either a rack 112 or a sensor 108 has moved and then recalibrates itself based on the new position of the rack 112 or sensor 108.

A sensor 108 may be positioned within the space 102 such that frames 302 captured by the sensor 108 will include one or more shelf markers 906 that are located on a rack 112. A shelf marker 906 is an object that is positioned on a rack 112 that can be used to determine a location (e.g. an (x,y) coordinate 306 and a pixel location 402) for the rack 112. The tracking system 100 is configured to store the pixel locations 402 and the (x,y) coordinates 306 of the shelf markers 906 that are associated with frames 302 from a sensor 108. In one embodiment, the pixel locations 402 and the (x,y) coordinates 306 of the shelf markers 906 may be determined using a process similar to the process described in FIG. 2. In another embodiment, the pixel locations 402 and the (x,y) coordinates 306 of the shelf markers 906 may be provided by an operator as an input to the tracking system 100.

A shelf marker 906 may be an object similar to the marker 304 described in FIGS. 2-7. In some embodiments, each shelf marker 906 on a rack 112 is unique from other shelf markers 906 on the rack 112. This feature allows the tracking system 100 to determine an orientation of the rack 112. Referring to the example in FIG. 9, each shelf marker 906 is a unique shape that identifies a particular portion of the rack 112. In this example, the tracking system 100 may associate a first shelf marker 906A and a second shelf marker 906B with a front of the rack 112. Similarly, the tracking system 100 may also associate a third shelf marker 906C and a fourth shelf marker 906D with a back of the rack 112. In other examples, each shelf marker 906 may have any other uniquely identifiable features (e.g. color or patterns) that can be used to identify a shelf marker 906.

Returning to FIG. 8 at step 802, the tracking system 100 receives a first frame 302A from a first sensor 108. Referring to FIG. 9 as an example, the first sensor 108 captures the first frame 302A which comprises at least a portion of a rack 112 within the global plane 104 for the space 102.

Returning to FIG. 8 at step 804, the tracking system 100 identifies one or more shelf markers 906 within the first frame 302A. Returning again to the example in FIG. 9, the rack 112 comprises four shelf markers 906. In one embodiment, the tracking system 100 may use object detection to identify shelf markers 906 within the first frame 302A. For example, the tracking system 100 may search the first frame 302A for known features (e.g. shapes, patterns, colors, text, etc.) that correspond with a shelf marker 906. In this example, the tracking system 100 may identify a shape (e.g. a star) in the first frame 302A that corresponds with a first shelf marker 906A. In other embodiments, the tracking system 100 may use any other suitable technique to identify a shelf marker 906 within the first frame 302A. The tracking system 100 may identify any number of shelf markers 906 that are present in the first frame 302A.

Once the tracking system 100 identifies one or more shelf markers 906 that are present in the first frame 302A of the first sensor 108, the tracking system 100 then determines their pixel locations 402 in the first frame 302A so they can be compared to expected pixel locations 402 for the shelf markers 906. Returning to FIG. 8 at step 806, the tracking system 100 determines current pixel locations 402 for the identified shelf markers 906 in the first frame 302A. Returning to the example in FIG. 9, the tracking system 100 determines a first current pixel location 402A for the shelf marker 906 within the first frame 302A. The first current pixel location 402A comprises a first pixel row and first pixel column where the shelf marker 906 is located within the first frame 302A.

Returning to FIG. 8 at step 808, the tracking system 100 determines whether the current pixel locations 402 for the shelf markers 906 match the expected pixel locations 402 for the shelf markers 906 in the first frame 302A. Returning to the example in FIG. 9, the tracking system 100 determines whether the first current pixel location 402A matches a first expected pixel location 402 for the shelf marker 906. As discussed above, when the tracking system 100 is initially calibrated, the tracking system 100 stores pixel location information 908 that comprises expected pixel locations 402 within the first frame 302A of the first sensor 108 for shelf markers 906 of a rack 112. The tracking system 100 uses the expected pixel locations 402 as reference points to determine whether the rack 112 has moved. By comparing the expected pixel location 402 for a shelf marker 906 with its current pixel location 402, the tracking system 100 can determine whether there are any discrepancies that would indicate that the rack 112 has moved.

The tracking system 100 may terminate method 800 in response to determining that the current pixel locations 402 for the shelf markers 906 in the first frame 302A match the expected pixel location 402 for the shelf markers 906. In this case, the tracking system 100 determines that neither the rack 112 nor the first sensor 108 has moved since the current pixel locations 402 match the expected pixel locations 402 for the shelf marker 906.

The tracking system 100 proceeds to step 810 in response to a determination at step 808 that one or more current pixel locations 402 for the shelf markers 906 do not match an expected pixel location 402 for the shelf markers 906. For example, the tracking system 100 may determine that the first current pixel location 402A does not match the first expected pixel location 402 for the shelf marker 906. In this case, the tracking system 100 determines that the rack 112 and/or the first sensor 108 has moved since the first current pixel location 402A does not match the first expected pixel location 402 for the shelf marker 906. Here, the tracking system 100 proceeds to step 810 to identify whether the rack 112 has moved or the first sensor 108 has moved.

At step 810, the tracking system 100 receives a second frame 302B from a second sensor 108. The second sensor 108 is adjacent to the first sensor 108 and has at least a partially overlapping field of view with the first sensor 108. The first sensor 108 and the second sensor 108 are positioned such that one or more shelf markers 906 are observable by both the first sensor 108 and the second sensor 108. In this configuration, the tracking system 100 can use a combination of information from the first sensor 108 and the second sensor 108 to determine whether the rack 112 has moved or the first sensor 108 has moved. Returning to the example in FIG. 9, the second frame 302B comprises the first shelf marker 906A, the second shelf marker 906B, the third shelf marker 906C, and the fourth shelf marker 906D of the rack 112.

Returning to FIG. 8 at step 812, the tracking system 100 identifies the shelf markers 906 that are present within the second frame 302B from the second sensor 108. The tracking system 100 may identify shelf markers 906 using a process similar to the process described in step 804. Returning again to the example in FIG. 9, tracking system 100 may search the second frame 302B for known features (e.g. shapes, patterns, colors, text, etc.) that correspond with a shelf marker 906. For example, the tracking system 100 may identify a shape (e.g. a star) in the second frame 302B that corresponds with the first shelf marker 906A.

Once the tracking system 100 identifies one or more shelf markers 906 that are present in the second frame 302B of the second sensor 108, the tracking system 100 then determines their pixel locations 402 in the second frame 302B so they can be compared to expected pixel locations 402 for the shelf markers 906. Returning to FIG. 8 at step 814, the tracking system 100 determines current pixel locations 402 for the identified shelf markers 906 in the second frame 302B. Returning to the example in FIG. 9, the tracking system 100 determines a second current pixel location 402B for the shelf marker 906 within the second frame 302B. The second current pixel location 402B comprises a second pixel row and a second pixel column where the shelf marker 906 is located within the second frame 302B from the second sensor 108.

Returning to FIG. 8 at step 816, the tracking system 100 determines whether the current pixel locations 402 for the shelf markers 906 match the expected pixel locations 402 for the shelf markers 906 in the second frame 302B. Returning to the example in FIG. 9, the tracking system 100 determines whether the second current pixel location 402B matches a second expected pixel location 402 for the shelf marker 906. As discussed above for step 808, the tracking system 100 stores pixel location information 908 that comprises expected pixel locations 402 within the second frame 302B of the second sensor 108 for shelf markers 906 of a rack 112 when the tracking system 100 is initially calibrated. By comparing the second expected pixel location 402 for the shelf marker 906 to its second current pixel location 402B, the tracking system 100 can determine whether the rack 112 has moved or whether the first sensor 108 has moved.

The tracking system 100 determines that the rack 112 has moved when the current pixel location 402 and the expected pixel location 402 for one or more shelf markers 906 do not match for multiple sensors 108. When a rack 112 moves within the global plane 104, the physical location of the shelf markers 906 moves which causes the pixel locations 402 for the shelf markers 906 to also move with respect to any sensors 108 viewing the shelf markers 906. This means that the tracking system 100 can conclude that the rack 112 has moved when multiple sensors 108 observe a mismatch between current pixel locations 402 and expected pixel locations 402 for one or more shelf markers 906.
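The comparison logic of steps 808 and 816 can be sketched as follows. The function names, the dictionary layout (shelf marker identifier mapped to a (row, column) pixel location), and the tolerance parameter are hypothetical. The rack-moved branch follows the description above; the branch that attributes a mismatch seen only by the first sensor to movement of that sensor is an assumption inferred from the stated purpose of step 810.

```python
def locations_match(current, expected, tolerance=0):
    """Return True if every shared marker is within tolerance of its expected pixel."""
    return all(
        abs(current[m][0] - expected[m][0]) <= tolerance
        and abs(current[m][1] - expected[m][1]) <= tolerance
        for m in current
        if m in expected
    )

def diagnose_movement(current_1, expected_1, current_2, expected_2, tolerance=0):
    """Classify movement from the shelf-marker pixel locations of two sensors."""
    if locations_match(current_1, expected_1, tolerance):
        return "no movement"
    if not locations_match(current_2, expected_2, tolerance):
        # Mismatch observed by multiple sensors: the rack has moved.
        return "rack moved"
    # Assumption: only the first sensor sees a mismatch, so it has moved.
    return "first sensor moved"
```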

CLAIMS

1. A system, comprising:

a sensor positioned above a rack in a space, the sensor configured to generate top-view images of at least a portion of a space comprising the rack;

a plurality of weight sensors, each weight sensor associated with a corresponding item stored on a shelf of the rack; and

a tracking subsystem coupled to the image sensor and the weight sensors, the tracking subsystem configured to:

receive an image feed comprising frames of the top-view images generated by the sensor;

receive weight measurements from the weight sensors;

detect an event associated with one or both of a portion of a person entering a zone adjacent to the rack and a change of weight associated with a first item being removed from a first shelf associated with a first weight sensor;

in response to detecting the event, determine that a first person and a second person may be associated with the detected event, based on one or more of a first distance between the first person and the rack, a second distance between the second person and the rack, and an inter-person distance between the first person and the second person;

in response to determining that the first and second person may be associated with the detected event, store buffer frames of top-view images generated by the sensor following the detected event;

determine, using at least one of the stored buffer frames and a first action-detection algorithm, whether an action associated with the detected event was performed by the first person or the second person, wherein the first action-detection algorithm is configured to detect the action based on characteristics of one or more contours in the at least one stored buffer frame(s);

determine whether results of the first action-detection algorithm satisfy criteria based at least in part on a number of iterations required to implement the first action-detection algorithm;

in response to determining the results of the first action-detection algorithm do not satisfy the criteria, determine, by applying a second action-detection algorithm to at least one of the buffer frames, whether the action associated with the detected event was performed by the first person or the second person, wherein the second action-detection algorithm is configured to detect the action using an artificial neural network;

in response to determining the action was performed by the first person, assign the action to the first person; and

in response to determining the action was performed by the second person, assign the action to the second person.

2. The system of Claim 1, wherein the tracking subsystem is further configured to:

following storing the buffer frames, determine a region-of-interest of the top-view images of the stored frames; and

determine, using the region-of-interest of at least one of the stored buffer frames and the first action-detection algorithm, whether the action associated with the detected event was performed by the first person or the second person.

3. The system of Claim 1, wherein the stored buffer frames comprise three or fewer frames of top-view images following one or both of: the portion of the person entering the zone adjacent to the rack and the portion of the person exiting the zone adjacent to the rack.

4. The system of Claim 3, wherein the tracking subsystem is further configured to determine a subset of the buffer frames to use with the first action-detection algorithm and a second subset of the buffer frames to use with the second action detection algorithm.

5. The system of Claim 1, wherein the tracking subsystem is further configured to determine that the first person and the second person may be associated with the detected event based on a first relative orientation between the first person and the rack and a second relative orientation between the second person and the rack.

6. The system of Claim 1, wherein:

the detected action is associated with a person picking up the first item stored on the first shelf of the rack; and

the tracking subsystem is further configured to:

in response to determining the action was performed by the first person, assign the first item to the first person; and

in response to determining the action was performed by the second person, assign the first item to the second person.

7. The system of Claim 1, wherein:

the first action-detection algorithm involves iterative dilation of a first contour associated with the first person and a second contour associated with the second person; and

the criteria comprise a requirement that the portion of the person entering the zone adjacent to the rack is associated with either the first person or the second person within a maximum number of iterative dilations of the first and second contours.

8. The system of Claim 7, wherein the tracking subsystem is further configured to:

in response to determining the first person is associated with the portion of the person entering the zone adjacent to the rack within the maximum number of dilations, assign the action to the first person.
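The iterative dilation of Claims 7 and 8 can be illustrated with a small dependency-free sketch. It assumes each contour and the detected portion entering the zone are given as sets of `(row, col)` pixel coordinates, and it grows each contour by one pixel in the four axis directions per iteration. The names `dilate` and `assign_by_dilation`, the 4-neighbour structuring element, and the tie handling are assumptions for illustration only.

```python
from typing import Iterable, Optional, Set, Tuple

Pixel = Tuple[int, int]


def dilate(pixels: Set[Pixel]) -> Set[Pixel]:
    """Grow a pixel set by one step in the four axis directions."""
    grown = set(pixels)
    for r, c in pixels:
        grown.update({(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)})
    return grown


def assign_by_dilation(
    first: Iterable[Pixel],
    second: Iterable[Pixel],
    target: Iterable[Pixel],
    max_dilations: int,
) -> Tuple[Optional[str], int]:
    """Dilate both contours until exactly one of them touches `target`.

    Returns (person, dilations_used). `person` is None when neither
    contour reaches the target within `max_dilations`, or when both
    reach it at the same step (treated here as ambiguous).
    Image bounds are ignored for simplicity.
    """
    a, b, goal = set(first), set(second), set(target)
    for used in range(max_dilations + 1):
        hit_a, hit_b = bool(a & goal), bool(b & goal)
        if hit_a and hit_b:
            return None, used
        if hit_a or hit_b:
            return ("first" if hit_a else "second"), used
        a, b = dilate(a), dilate(b)
    return None, max_dilations
```

A `None` result corresponds to the case in which the criteria are not satisfied, so a caller would then hand the frames to the second action-detection algorithm.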

9. A method, comprising:

receiving an image feed comprising frames of top-view images generated by a sensor, the sensor positioned above a rack in a space and configured to generate top-view images of at least a portion of a space comprising the rack;

receiving weight measurements from a weight sensor associated with a corresponding item stored on a shelf of the rack;

detecting an event associated with one or both of a portion of a person entering a zone adjacent to the rack and a change of weight associated with a first item being removed from a first shelf associated with the weight sensor;

in response to detecting the event, determining that a first person and a second person may be associated with the detected event, based on one or more of a first distance between the first person and the rack, a second distance between the second person and the rack, and an inter-person distance between the first person and the second person;

in response to determining that the first and second person may be associated with the detected event, storing buffer frames of top-view images generated by the sensor following the detected event;

determining, using at least one of the stored buffer frames and a first action-detection algorithm, whether an action associated with the detected event was performed by the first person or the second person, wherein the first action-detection algorithm is configured to detect the action based on characteristics of one or more contours in the at least one stored buffer frame(s);

determining whether results of the first action-detection algorithm satisfy criteria based at least in part on a number of iterations required to implement the first action-detection algorithm;

in response to determining the results of the first action-detection algorithm do not satisfy the criteria, determining, by applying a second action-detection algorithm to at least one of the buffer frames, whether the action associated with the detected event was performed by the first person or the second person, wherein the second action-detection algorithm is configured to detect the action using an artificial neural network;

in response to determining the action was performed by the first person, assigning the action to the first person; and

in response to determining the action was performed by the second person, assigning the action to the second person.
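The candidate check in Claim 9 relies on "one or more of" three distances. The sketch below shows one possible combination, in which both people must be near the rack and near each other. The function name `may_both_be_associated`, the use of 2D floor coordinates, and both threshold parameters are assumptions introduced for the example.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def may_both_be_associated(
    first_pos: Point,
    second_pos: Point,
    rack_pos: Point,
    max_rack_distance: float,
    max_inter_person_distance: float,
) -> bool:
    """Return True when both people are plausible actors for the event.

    Assumed rule: each person is within `max_rack_distance` of the rack
    and the two people are within `max_inter_person_distance` of each
    other. Other combinations of the three distances are equally
    consistent with the claim language.
    """
    first_to_rack = math.dist(first_pos, rack_pos)
    second_to_rack = math.dist(second_pos, rack_pos)
    inter_person = math.dist(first_pos, second_pos)
    return (
        first_to_rack <= max_rack_distance
        and second_to_rack <= max_rack_distance
        and inter_person <= max_inter_person_distance
    )
```

When this check returns `False`, only one person is a plausible actor and, in this sketch, no buffering or two-stage detection would be needed.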

10. The method of Claim 9, further comprising:

following storing the buffer frames, determining a region-of-interest of the top-view images of the stored frames; and

determining, using the region-of-interest of at least one of the stored buffer frames and the first action-detection algorithm, whether the action associated with the detected event was performed by the first person or the second person.

11. The method of Claim 9, wherein the stored buffer frames comprise three or fewer frames of top-view images following one or both of: the portion of the person entering the zone adjacent to the rack and the portion of the person exiting the zone adjacent to the rack.

12. The method of Claim 11, further comprising determining a subset of the buffer frames to use with the first action-detection algorithm and a second subset of the buffer frames to use with the second action-detection algorithm.

13. The method of Claim 9, further comprising determining that the first person and the second person may be associated with the detected event based on a first relative orientation between the first person and the rack and a second relative orientation between the second person and the rack.

14. The method of Claim 9, wherein:

the detected action is associated with a person picking up the first item stored on the first shelf of the rack; and

the method further comprises:

in response to determining the action was performed by the first person, assigning the first item to the first person; and

in response to determining the action was performed by the second person, assigning the first item to the second person.

15. The method of Claim 9, wherein:

the first action-detection algorithm involves iterative dilation of a first contour associated with the first person and a second contour associated with the second person; and

16. The method of Claim 15, further comprising, in response to determining the first person is associated with the portion of the person entering the zone adjacent to the rack within the maximum number of dilations, assigning the action to the first person.

17. A tracking subsystem coupled to an image sensor and a weight sensor, wherein the image sensor is positioned above a rack in a space and configured to generate top-view images of at least a portion of the space comprising the rack, wherein the weight sensor is configured to measure a change of weight when an item is removed from a shelf of the rack, the tracking subsystem configured to:

receive an image feed comprising frames of the top-view images generated by the sensor;

receive weight measurements from the weight sensor;

in response to determining that the first and second person may be associated with the detected event, store buffer frames of top-view images generated by the sensor following the detected event;

determine, using at least one of the stored buffer frames and a first action-detection algorithm, whether an action associated with the detected event was performed by the first person or the second person, wherein the first action-detection algorithm is configured to detect the action based on characteristics of one or more contours in the at least one stored buffer frame(s);

determine whether results of the first action-detection algorithm satisfy criteria based at least in part on a number of iterations required to implement the first action-detection algorithm;

in response to determining the results of the first action-detection algorithm do not satisfy the criteria, determine, by applying a second action-detection algorithm to at least one of the buffer frames, whether the action associated with the detected event was performed by the first person or the second person, wherein the second action-detection algorithm is configured to detect the action using an artificial neural network;

in response to determining the action was performed by the first person, assign the action to the first person; and

in response to determining the action was performed by the second person, assign the action to the second person.

18. The tracking subsystem of Claim 17, further configured to: following storing the buffer frames, determine a region-of-interest of the top-view images of the stored frames; and

19. The tracking subsystem of Claim 17, wherein the stored buffer frames comprise three or fewer frames of top-view images following one or both of: the portion of the person entering the zone adjacent to the rack and the portion of the person exiting the zone adjacent to the rack.

20. The tracking subsystem of Claim 19, further configured to determine a subset of the buffer frames to use with the first action-detection algorithm and a second subset of the buffer frames to use with the second action-detection algorithm.

Documents

Application Documents

#	Name	Date
1	202217027726.pdf	2022-05-13
2	202217027726-STATEMENT OF UNDERTAKING (FORM 3) [13-05-2022(online)].pdf	2022-05-13
3	202217027726-PRIORITY DOCUMENTS [13-05-2022(online)].pdf	2022-05-13
4	202217027726-FORM 1 [13-05-2022(online)].pdf	2022-05-13
5	202217027726-DRAWINGS [13-05-2022(online)].pdf	2022-05-13
6	202217027726-DECLARATION OF INVENTORSHIP (FORM 5) [13-05-2022(online)].pdf	2022-05-13
7	202217027726-COMPLETE SPECIFICATION [13-05-2022(online)].pdf	2022-05-13
8	202217027726-FORM-26 [18-05-2022(online)].pdf	2022-05-18
9	202217027726-GPA-300522.pdf	2022-06-06
10	202217027726-Correspondence-300522.pdf	2022-06-06
11	202217027726-Proof of Right [04-07-2022(online)].pdf	2022-07-04
12	202217027726-Others-080722.pdf	2022-07-15
13	202217027726-Correspondence-080722.pdf	2022-07-15
14	202217027726-FORM 3 [09-11-2022(online)].pdf	2022-11-09
15	202217027726-FORM 3 [05-05-2023(online)].pdf	2023-05-05
16	202217027726-FORM 18 [23-10-2023(online)].pdf	2023-10-23
17	202217027726-FORM 3 [15-11-2023(online)].pdf	2023-11-15
18	202217027726-FER.pdf	2025-04-03
19	202217027726-Information under section 8(2) [30-05-2025(online)].pdf	2025-05-30
20	202217027726-Information under section 8(2) [30-05-2025(online)]-4.pdf	2025-05-30
21	202217027726-Information under section 8(2) [30-05-2025(online)]-3.pdf	2025-05-30
22	202217027726-Information under section 8(2) [30-05-2025(online)]-2.pdf	2025-05-30
23	202217027726-Information under section 8(2) [30-05-2025(online)]-1.pdf	2025-05-30
24	202217027726-FORM 3 [30-05-2025(online)].pdf	2025-05-30
25	202217027726-FER_SER_REPLY [04-08-2025(online)].pdf	2025-08-04
26	202217027726-DRAWING [04-08-2025(online)].pdf	2025-08-04
27	202217027726-CORRESPONDENCE [04-08-2025(online)].pdf	2025-08-04
28	202217027726-COMPLETE SPECIFICATION [04-08-2025(online)].pdf	2025-08-04
29	202217027726-CLAIMS [04-08-2025(online)].pdf	2025-08-04
30	202217027726-ABSTRACT [04-08-2025(online)].pdf	2025-08-04

Search Strategy

1	202217027726E_20-08-2024.pdf