ABSTRACT
METHOD AND SYSTEM FOR OBJECT RECOGNITION AND TRACKING
Provided is a system (101) for object recognition and tracking, the system (101) comprising a memory (219) configured to store computer-executable instructions and one or more processors (201) configured to execute the instructions to obtain a unique object identifier (ID) associated with an object, wherein the unique object ID includes information of a face of the object. The one or more processors (201) in the system (101) further identify the object based on the unique object ID. The one or more processors (201) further track the identified object, across a first field of view (FOV) of a first image capturing device to a second FOV of a second image capturing device, based on the unique object ID.
METHOD AND SYSTEM FOR OBJECT RECOGNITION AND TRACKING
TECHNOLOGICAL FIELD
[001] An example embodiment of the present invention generally relates to object recognition and to tracking of recognized objects.
BACKGROUND
[002] In the modern world, photographs, and particularly digital images, can be acquired from numerous sources, including digital cameras, camcorders, cell phones, and web cams. Recent digital cameras have face recognition functions. Most related-art methods for face recognition perform face detection and identification using digital images.
[003] Face recognition is mainly performed using one or more deep learning models. The concept of deep learning is derived from research on artificial neural networks. Deep learning forms abstract high-level concepts by combining low-level features. However, the deep learning models that are used for face recognition require high computational cost and large memory consumption. Further, tracking algorithms for tracking a recognized person, such as the optical flow and time-of-flight algorithms used in drones, are very computation intensive and need large storage. Moreover, the optical flow and time-of-flight algorithms work well in sufficient light but fail in low lighting conditions. Further, during object tracking in a multiple-camera framework, there is a problem that the person is recognized and tracked by each camera every time. For example, in indoor home conditions, when a person is spotted by one camera, the system may detect the person, recognize the person, and track the person based on some key features. Whenever the person moves from a field of view of the one camera to a field of view of another camera, the system may again detect the same person, recognize the person, and reinitialize a tracker for the person. This may require higher computation and more memory consumption. In addition, there exists a problem of recognizing a person in low lighting conditions. Accordingly, there is a need for an object recognition and tracking system that overcomes the aforementioned problems.
SUMMARY OF THE INVENTION
[004] Accordingly, there is a need for a system that may track and analyze an object across multiple fields of view of multiple cameras without reinitiating the recognition process. Such a system may save computation time and memory consumption.
[005] Some example embodiments disclosed herein provide a system for object recognition and tracking. The system comprises a memory configured to store computer-executable instructions and one or more processors configured to execute the instructions to obtain a unique object identifier (ID) associated with an object, wherein the unique object ID includes information of a face of the object. The one or more processors are further configured to identify the object based on the unique object ID and track the identified object, across a first field of view (FOV) of a first image capturing device to a second FOV of a second image capturing device, based on the unique object ID.
[006] According to an embodiment, the first image capturing device is different from the second image capturing device, and the first FOV is different from the second FOV.
[007] According to an embodiment, the one or more processors are further configured to obtain data from the first image capturing device, wherein the data may comprise one or more of an image or video content associated with the object captured by the first image capturing device, and detect the object from the data based on an object detection algorithm.
[008] According to an embodiment, the one or more processors are further configured to generate a plurality of tags, associated with the detected object, based on the object detection algorithm, wherein the plurality of tags indicates at least landmarks and key points associated with the face of the object, and associate the plurality of tags with the unique object ID.
[009] According to an embodiment, the one or more processors are further configured to store the unique object ID in the memory.
[0010] According to an embodiment, the one or more processors are further configured to assign the unique object ID to the detected object.
[0011] According to an embodiment, the one or more processors are further configured to change at least one value, associated with at least one tag of the plurality of tags, based on the tracking of the identified object.
[0012] According to an embodiment, the one or more processors are further configured to determine a time period associated with presence of the object within the first FOV of the first image capturing device, change the at least one value, associated with the at least one tag of the plurality of tags, based on the determined time period being greater than a threshold time period, and update the unique object ID based on the changed at least one value associated with the at least one tag of the plurality of tags.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The presently disclosed embodiments will be further explained with reference to the attached drawings. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.
[0014] FIG. 1 illustrates a schematic diagram of a network environment of a system for object recognition and tracking, in accordance with an example embodiment.
[0015] FIG. 2 illustrates a block diagram of the system for object recognition and tracking, in accordance with an example embodiment.
[0016] FIG. 3 illustrates a sequence diagram of functions executed by the system for object recognition, in accordance with an example embodiment.
[0017] FIG. 4 illustrates a flow chart representing a sequence of steps for object recognition and tracking, in accordance with an example embodiment.
[0018] FIG. 5 illustrates an exemplary scenario of a working environment of the system for object recognition and tracking, in accordance with an example embodiment.
[0019] FIG. 6 illustrates a flow diagram of a method for object recognition and tracking, in accordance with an example embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0020] FIG. 1 illustrates a schematic diagram of a network environment 100 of a system 101 for object recognition and tracking, in accordance with an example embodiment. The system 101 may be communicatively coupled to a first image capturing device 103a, a second image capturing device 103b, a cloud 105 and a database 107 via a network 109. Further, it is possible that one or more components may be rearranged, changed, added, and/or removed in the first image capturing device 103a and the second image capturing device 103b.
[0021] In an example embodiment, the system 101 may be embodied in one or more of several ways as per the required implementation. For example, the system 101 may be embodied as a cloud based service or a cloud based platform. As such, the system 101 may be configured to operate outside the first image capturing device 103a and the second image capturing device 103b. However, in some example embodiments, the system 101 may be embodied within the first image capturing device 103a or within the second image capturing device 103b. In each of such embodiments, the system 101 may be communicatively coupled to the components shown in FIG. 1 to carry out the desired operations and wherever required modifications may be possible within the scope of the present disclosure. In an embodiment, the system 101 may be configured to track objects in real time using image processing, analytics and other advanced technologies like geo location tracking and proximity tracking. Further, in some example embodiments, the system 101 may be a standalone unit configured for object recognition and tracking embodied in the first image capturing device 103a or in the second image capturing device 103b.
[0022] In some example embodiments, each of the first image capturing device 103a and the second image capturing device 103b may be any user accessible device such as a digital camera, a camcorder, a web cam, a video recorder, or any other device that may be configured to capture images in real time to execute one or more functions. Each of the first image capturing device 103a and the second image capturing device 103b may comprise an image sensor, a processor, a memory, a communication interface, and one or more different modules to perform different functions. The processor, the memory and the communication interface may be communicatively coupled to each other. In some example embodiments, each of the first image capturing device 103a and the second image capturing device 103b may be communicatively coupled to the system 101. In some example embodiments, the first image capturing device 103a and the second image capturing device 103b may be installed at multiple locations inside an indoor area (for example, offices or homes). In some example embodiments, at least one of the first image capturing device 103a or the second image capturing device 103b may be installed in an outdoor area. In such example embodiments, at least one of the first image capturing device 103a or the second image capturing device 103b may comprise processing means such as a central processing unit (CPU), storage means such as on-board read only memory (ROM) and random access memory (RAM), one or more sensors such as an image sensor, a position sensor (such as a GPS sensor), a gyroscope, a LIDAR sensor, a proximity sensor, a motion sensor (such as an accelerometer), a display enabled user interface (such as a touch screen display), and other components as may be required for specific functionalities.
[0023] In some example embodiments, the cloud 105 may store data associated with one or more of an image or video content associated with one or more objects captured by the first image capturing device 103a and/or the second image capturing device 103b, which may be used later by the system 101. In some example embodiments, the cloud 105 may store information generated by the system 101 in accordance with the one or more of the image or the video content associated with the one or more objects. Further, in some example embodiments, the database 107 may be connected to the system 101 via the network 109. The database 107 may be an external database or an internal database that may be used to store data associated with the one or more objects captured by the first image capturing device 103a and/or the second image capturing device 103b.
[0024] The network 109 may be wired, wireless, or any combination of wired and wireless communication networks, such as cellular, Wi-Fi, internet, local area networks, or the like. In one embodiment, the network 109 may include one or more networks such as a data network, a wireless network, a telephony network, or any combination thereof. It is contemplated that the data network may be any local area network (LAN), metropolitan area network (MAN), wide area network (WAN), a public data network (e.g., the Internet), short range wireless network, or any other suitable packet-switched network, such as a commercially owned, proprietary packet-switched network, e.g., a proprietary cable or fiber-optic network, and the like, or any combination thereof. In addition, the wireless network may be, for example, a cellular network and may employ various technologies including enhanced data rates for global evolution (EDGE), general packet radio service (GPRS), global system for mobile communications (GSM), Internet protocol multimedia subsystem (IMS), universal mobile telecommunications system (UMTS), etc., as well as any other suitable wireless medium, e.g., worldwide interoperability for microwave access (WiMAX), Long Term Evolution (LTE) networks (for e.g. LTE-Advanced Pro), 5G New Radio networks, ITU-IMT 2020 networks, code division multiple access (CDMA), wideband code division multiple access (WCDMA), wireless fidelity (Wi-Fi), wireless LAN (WLAN), Bluetooth, Internet Protocol (IP) data casting, satellite, mobile ad-hoc network (MANET), and the like, or any combination thereof.
[0025] FIG. 2 illustrates a block diagram of the system 101 for object recognition and tracking, in accordance with an example embodiment. The system 101 may include a processing means such as at least one processor 201 (hereinafter, also referred to as “processor 201”), storage means such as at least one memory 219 (hereinafter, also referred to as “memory 219”), and a communication means such as at least one communication interface 221 (hereinafter, also referred to as “communication interface 221”). The processor 201 may retrieve computer program code instructions that may be stored in the memory 219 for execution of the computer program code instructions. The processor 201 may include one or more different modules to perform different functions, such as a motion manager module 203, an image provider service (IPS) 205, a recording manager module 207, a Region-of-Interest (ROI) manager module 209, an object detection service 211, an Object-of-Interest (OOI) manager module 213, an activity monitoring module 215 and a face recognition module 217.
[0026] The processor 201 may be embodied in the system 101 in different ways. For example, the processor 201 may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. As such, in some embodiments, the processor 201 may include one or more processing cores configured to perform independently. A multi-core processor may enable multiprocessing within a single physical package. Additionally or alternatively, the processor 201 may include one or more processors configured in tandem via a bus (not shown in FIG. 2) to enable independent execution of instructions, pipelining and/or multithreading.
[0027] The motion manager module 203 may execute motion monitoring operations for motion related events. To that end, the motion manager module 203 may detect motion of one or more objects in a region of interest (such as the FOV of the first image capturing device 103a and/or the second image capturing device 103b). To that end, the motion manager module 203 may register with the IPS 205 for fetching images of the one or more objects. The motion manager module 203 may detect the motion of the one or more objects from the images including the one or more objects. When the motion manager module 203 detects motion or stops detecting motion, the motion manager module 203 may provide an event callback. Accordingly, other modules in the system 101, registered with the motion manager module 203, may receive notifications for such event callbacks. In an embodiment, the motion manager module 203 may also execute zone based monitoring, which allows a user to mark areas of interest in the FOV of the first image capturing device 103a and/or the second image capturing device 103b and generates events only if motion is detected in the marked areas of interest.
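As a non-limiting illustration of the zone based monitoring described above, the following sketch assumes OpenCV-style frame differencing over a single user-marked rectangular zone; the function name, the difference threshold, and the minimum number of changed pixels are illustrative choices and not part of the disclosed implementation.

```python
import cv2

def motion_in_zone(prev_gray, curr_gray, zone, min_changed_pixels=500):
    """Return True if motion is detected inside a user-marked zone.

    prev_gray and curr_gray are same-sized grayscale frames from one
    camera; zone is an (x, y, w, h) rectangle marked in that camera's FOV.
    """
    x, y, w, h = zone
    # Restrict the comparison to the marked area of interest.
    prev_roi = prev_gray[y:y + h, x:x + w]
    curr_roi = curr_gray[y:y + h, x:x + w]
    # Simple frame differencing followed by thresholding.
    diff = cv2.absdiff(prev_roi, curr_roi)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    # Report motion only if enough pixels changed inside the zone.
    return cv2.countNonZero(mask) >= min_changed_pixels
```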
[0028] The IPS 205 may perform functions related to capturing one or more images of the surroundings of the first image capturing device 103a and/or the second image capturing device 103b, in the FOV of the first image capturing device 103a and/or the second image capturing device 103b. In some example embodiments, the IPS 205 may upload the one or more images to the cloud 105. In some implementations, in case of multiple image capturing devices, such as the first image capturing device 103a and the second image capturing device 103b, one IPS 205 instance may be provided per image capturing device. In an embodiment, the IPS 205 may capture images continuously at 5 frames per second via the first image capturing device 103a and/or the second image capturing device 103b. Further, each module/service which requires image frames may request the image frames from the IPS 205. For example, the motion manager module 203 may request the IPS 205 to obtain the image frames periodically. In some embodiments, the IPS 205 maintains a record of all the images captured by the first image capturing device 103a and/or the second image capturing device 103b. In an embodiment, if any module generates a bookmark event with respect to any image, it may request the IPS 205 to save the corresponding image. The IPS 205 may save that image either in the memory 219 or on the cloud 105 as per a subscription policy. In some embodiments, the IPS 205 may perform a regular scheduling task to delete all images for which no event is generated by any module of the processor 201 and for which the request to save the image is not received within a particular duration, e.g. 30 seconds.
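The frame buffering and clean-up behaviour of the IPS 205 may be sketched as follows. This is a minimal, hypothetical illustration of the retention policy described above (a frame survives only if some module bookmarks it within the retention window, e.g. 30 seconds); the class and method names are illustrative, and the actual persistence to the memory 219 or the cloud 105 is only indicated by a comment.

```python
import time

class ImageProviderService:
    """Illustrative per-camera image provider with a short retention window."""

    def __init__(self, retention_seconds=30):
        self.retention_seconds = retention_seconds
        self._frames = {}  # frame_id -> (timestamp, frame, saved)

    def on_frame_captured(self, frame_id, frame):
        # Called for every captured frame, e.g. at 5 frames per second.
        self._frames[frame_id] = (time.time(), frame, False)

    def get_frame(self, frame_id):
        entry = self._frames.get(frame_id)
        return entry[1] if entry else None

    def save_frame(self, frame_id):
        """Called by a module that generated a bookmark event for this frame."""
        if frame_id in self._frames:
            ts, frame, _ = self._frames[frame_id]
            self._frames[frame_id] = (ts, frame, True)
            # Here the frame would be persisted locally or uploaded to the
            # cloud according to the subscription policy.

    def prune(self):
        """Regular scheduling task: drop unsaved frames older than the window."""
        now = time.time()
        self._frames = {
            fid: (ts, frame, saved)
            for fid, (ts, frame, saved) in self._frames.items()
            if saved or (now - ts) < self.retention_seconds
        }
```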
[0029] The recording manager module 207 may perform all recording related operations on the first image capturing device 103a and the second image capturing device 103b, e.g. starting/stopping the recording of the one or more images captured by the first image capturing device 103a and/or the second image capturing device 103b and generating bookmark events if required. The recording manager module 207 may start the recording based on one or more configurations from an application in each of the first image capturing device 103a and the second image capturing device 103b. In some example embodiments, the recording manager module 207 may stop the recording as soon as detection of motion stops, irrespective of the one or more configurations.
[0030] The ROI manager module 209 may create and manage an ROI, i.e. a region of interest, in the surroundings of each of the first image capturing device 103a and the second image capturing device 103b. In some example embodiments, the ROI may correspond to the FOV of each of the first image capturing device 103a and the second image capturing device 103b. In some example embodiments, the ROI manager module 209 may fetch ROI details and save the details in a database (e.g. the database 107). For example, the ROI details may comprise information of foreground and background objects in the FOV of the first image capturing device 103a or the FOV of the second image capturing device 103b.
[0031] The object detection service 211 may correspond to a service to detect the one or more objects and their faces captured by each of the first image capturing device 103a and the second image capturing device 103b. In some embodiments, the object detection service 211 may execute one or more operations to detect motion of the one or more objects in the FOV of the first image capturing device 103a and the second image capturing device 103b. To that end, the object detection service 211 may be registered with the motion manager module 203. In case motion of the one or more objects is detected, the object detection service 211 may execute one or more operations to detect the one or more objects in the FOV of each of the first image capturing device 103a and/or the second image capturing device 103b. The object detection service 211 may maintain a queue to process image frames captured by the first image capturing device 103a and/or the second image capturing device 103b. In some example embodiments, the object detection service 211 may also track the one or more objects inside the FOV of the first image capturing device 103a and/or the second image capturing device 103b. In some example embodiments, the object detection service 211 may execute one or more operations to identify the color of the one or more objects in the FOV of the first image capturing device 103a and/or the second image capturing device 103b. More specifically, the object detection service 211 may determine the upper body color and lower body color of the one or more objects, such as human type objects. Further, the object detection service 211 may execute one or more operations to determine one or more characteristics of the human type objects, where the one or more characteristics may include age, gender, and the like. In some example embodiments, the object detection service 211 may execute one or more operations to determine a type of the one or more objects, such as face, human, TV, fan, sofa, door, window, table, pet, and the like. In some example embodiments, the object detection service 211 may execute one or more operations to associate a plurality of tags with the one or more objects in accordance with the determined type, color, and one or more characteristics. The object detection service 211 may execute one or more operations to generate a unique object identifier (ID) based on the plurality of tags, and store the unique object ID in at least one of the memory 219, the cloud 105, or the database 107. The stored unique object ID may be utilized by the object detection service 211 to identify the one or more objects in the future. To that end, the object detection service 211 may execute one or more operations to obtain the unique object ID corresponding to the one or more objects which pass through at least one of the FOV of the first image capturing device 103a or the second image capturing device 103b, and identify the one or more objects in the FOV of each of the first image capturing device 103a and/or the second image capturing device 103b based on the obtained unique object ID. Such objects with a stored unique object ID may correspond to known objects. In some example embodiments, the object detection service 211 may execute one or more operations to assign an identifier tag of a new object of interest (NOI) in case a new object is detected in the FOV of the first image capturing device 103a or the second image capturing device 103b. Such a tag remains the same as long as the object stays inside the FOV.
In case the object moves out of the FOV and then comes back into the FOV, the previous tag may not be retained. The OOI manager module 213 may utilize the tag to optimize lookup processes across an OOI database of the OOI manager module 213. In case of a new object, the object detection service 211 may be registered with the OOI manager module 213 to execute NOI events. Each NOI event may contain a list of objects that are detected with respect to one image frame, and each object may have an NOI descriptor object that may hold information about the detected object. Further, in some example embodiments, the object detection service 211 may execute one or more operations to determine current coordinates and previous coordinates of the one or more objects.
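One possible shape for an NOI descriptor and for deriving a unique object ID from a plurality of tags is sketched below. This is an illustrative assumption rather than the disclosed implementation: the tag fields, and the choice to hash only the tags that stay constant for an object (its type and facial landmarks), are design choices made purely for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class NOIDescriptor:
    """Illustrative descriptor holding information about a detected object."""
    object_type: str        # e.g. "human", "pet", "door"
    upper_body_color: str
    lower_body_color: str
    face_landmarks: list    # landmarks / key points associated with the face
    coordinates: tuple      # (x, y, w, h) in the current frame
    extra_tags: dict = field(default_factory=dict)

def generate_unique_object_id(descriptor: NOIDescriptor) -> str:
    """Derive a unique object ID from the tags that remain constant.

    Only the object type and the facial landmarks feed the ID, so the
    same person maps to the same ID even when dress color or movement
    tags change or when the person reappears in another FOV.
    """
    stable_tags = {
        "type": descriptor.object_type,
        "face_landmarks": descriptor.face_landmarks,
    }
    payload = json.dumps(stable_tags, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```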
[0032] The OOI manager module 213 maintains a list of active objects of interest (OOIs) in an ROI in the surroundings of the FOV of the first image capturing device 103a and/or the second image capturing device 103b. The OOI manager module 213 may manage a lifecycle of OOIs and generate bookmark events for all state changes. These bookmark events may be sent to the cloud 105. The cloud 105 may process these bookmark events and maintain a comprehensive collection of the OOIs and their states with respect to each ROI. After receiving the NOI events, the OOI manager module 213 may perform a lookup on the list of existing OOIs in the FOV of the first image capturing device 103a or the second image capturing device 103b and determine whether the NOI matches any existing OOI. Based on the previous coordinates, current coordinates, tag, and state of an OOI, the OOI manager module 213 may update the state along with information of the NOI. Each of the OOIs and NOIs may be assigned a state by the OOI manager module 213, such as “New”, “Entered”, “Exited”, “Present”, or “Disappeared”. In an embodiment, an object is considered “New” until an OOI has been created for it in the FOV of the first image capturing device 103a or the second image capturing device 103b, and an object is marked “Entered” if the object is new. In an embodiment, an object may be marked “Entered” if the object is new and appeared at coordinates that overlap with an existing OOI of type Door or Window. In an embodiment, an object may be marked “Present” if the object is already present in the list of OOIs and another NOI is received with some updated information, e.g. coordinates, recognition result, etc. In an embodiment, an object may be marked “Disappeared” if the object is present/entered and no update is received on the same OOI for some duration, e.g. 30 seconds. In an embodiment, an object may be marked “Exited” if the last coordinates of that OOI are at the edges of the FOV of the first image capturing device 103a or the second image capturing device 103b and no NOI is received for that object in the next frame. In an embodiment, an object may be marked “Exited” if the last coordinates of that OOI overlap with an existing OOI of type Door or Window and no NOI is received for that object in the next frame. In an embodiment, an object may be marked “Exited” if the object is in the “Disappeared” state for some duration, e.g. 2 minutes. In some embodiments, each OOI may also be designated either as temporary or permanent. In an example embodiment, to recognize an OOI by the system 101, the OOI manager module 213 may request a cloud service via an application program interface (API) (e.g. a Representational State Transfer (REST) API) for object recognition. Each object recognition request may be assigned a request ID so that it can be tracked whether a request is already in progress for an ROI. The request ID is also required for the ROI to map the recognition result to a valid request. After the object recognition result is received in response to the request, the OOI manager module 213 may update the OOI to a permanent OOI if the universally unique identifier (UUID) is not null.
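The OOI lifecycle described above may be summarised by a small state machine. The sketch below is illustrative only: it encodes the timing rules mentioned in the description (roughly 30 seconds without an update before “Disappeared” and roughly 2 minutes in “Disappeared” before “Exited”) and collapses the FOV-edge and Door/Window overlap checks into a single flag.

```python
import time

# Lifecycle states named in the description above.
NEW, ENTERED, PRESENT, DISAPPEARED, EXITED = (
    "New", "Entered", "Present", "Disappeared", "Exited")

class OOIRecord:
    """Illustrative lifecycle bookkeeping for one object of interest."""

    DISAPPEAR_AFTER = 30   # seconds without an NOI update
    EXIT_AFTER = 120       # seconds spent in the Disappeared state

    def __init__(self, tag, coordinates):
        self.tag = tag
        self.coordinates = coordinates
        self.state = NEW
        self.last_update = time.time()

    def on_noi_update(self, coordinates):
        """A matching NOI arrived with updated information."""
        self.coordinates = coordinates
        self.state = ENTERED if self.state == NEW else PRESENT
        self.last_update = time.time()

    def on_tick(self, at_exit_region=False):
        """Periodic check when no matching NOI has been received."""
        idle = time.time() - self.last_update
        if self.state in (ENTERED, PRESENT):
            if at_exit_region:
                self.state = EXITED   # left via the FOV edge or a door/window
            elif idle > self.DISAPPEAR_AFTER:
                self.state = DISAPPEARED
        elif self.state == DISAPPEARED and idle > self.EXIT_AFTER:
            self.state = EXITED
```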
[0033] The OOI manager module 213 may execute a number of activity monitor instances using the activity monitoring module 215. Each activity monitor instance may be responsible for monitoring one aspect of the OOI and generating bookmark events. The activity monitoring module 215 may include activity level monitoring to monitor the object’s activity level, i.e. whether the object is moving or stationary. To that end, the displacement between the object’s last coordinates (i.e. the previous coordinates) and its current coordinates may be compared with a predetermined threshold value to determine whether the object is moving or stationary. Further, the OOI manager module 213 may generate a bookmark event if the OOI activity state changes from the moving state to the stationary state or vice versa. The activity monitoring module 215 may also execute operations for direction monitoring and orientation monitoring for the one or more objects. More specifically, the activity monitoring module 215 may determine a change in orientation and movement of the one or more objects, such as a change in body movement of the one or more objects, and the like.
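The activity level monitoring may be illustrated with the following sketch. It assumes the moving/stationary decision is made by comparing the displacement between successive bounding-box centres against a predetermined threshold; the threshold value and the event string are illustrative.

```python
import math

class ActivityLevelMonitor:
    """Illustrative monitor: moving vs. stationary from coordinate changes."""

    def __init__(self, threshold=10.0):
        self.threshold = threshold      # displacement (in pixels) per update
        self.previous_center = None
        self.state = "stationary"

    def update(self, center):
        """center is the (x, y) centre of the tracked object's bounding box.

        Returns a bookmark event string when the activity state flips,
        otherwise None.
        """
        event = None
        if self.previous_center is not None:
            dx = center[0] - self.previous_center[0]
            dy = center[1] - self.previous_center[1]
            displacement = math.hypot(dx, dy)
            new_state = "moving" if displacement > self.threshold else "stationary"
            if new_state != self.state:
                event = f"activity_changed:{self.state}->{new_state}"
                self.state = new_state
        self.previous_center = center
        return event
```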
[0034] In some example embodiments, the face recognition module 217 may be a microservice in the processor 201 or the cloud 105 that may receive face recognition requests from at least one of the first image capturing device 103a or the second image capturing device 103b. The face recognition module 217 may execute operations of face recognition of the one or more objects in the FOV of one of the first image capturing device 103a or the second image capturing device 103b. In some example embodiments, the face recognition module 217 may be a model pre-trained for all recognized faces of the one or more objects. In some example embodiments, the face recognition module 217 may be updated based on recognition of a new object in the FOV of one of the first image capturing device 103a or the second image capturing device 103b.
[0035] Additionally or alternatively, the processor 201 may include one or more processors capable of processing large volumes of workloads and operations to provide support for big data analysis. In an example embodiment, the processor 201 may be in communication with the memory 219 via a bus for passing information among components coupled to the system 101.
[0036] The memory 219 may be non-transitory and may include, for example, one or more volatile and/or non-volatile memories. In other words, for example, the memory 219 may be an electronic storage device (for example, a computer readable storage medium) comprising gates configured to store data (for example, bits) that may be retrievable by a machine (for example, a computing device like the processor 201). The memory 219 may be configured to store information, data, content, applications, instructions, or the like, for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present invention. For example, the memory 219 may be configured to buffer input data for processing by the processor 201. As exemplarily illustrated in FIG. 2, the memory 219 may be configured to store instructions for execution by the processor 201. As such, whether configured by hardware or software methods, or by a combination thereof, the processor 201 may represent an entity (for example, physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly. Thus, for example, when the processor 201 is embodied as an ASIC, FPGA or the like, the processor 201 may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, when the processor 201 is embodied as an executor of software instructions, the instructions may specifically configure the processor 201 to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processor 201 may be a processor specific device (for example, a mobile terminal or a fixed computing device) configured to employ an embodiment of the present invention by further configuration of the processor 201 by instructions for performing the algorithms and/or operations described herein. The processor 201 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 201.
[0037] The communication interface 221 may comprise an input interface and an output interface for supporting communications to and from the first image capturing device 103a, the second image capturing device 103b, or any other component with which the system 101 may communicate. The communication interface 221 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data to/from a communications device in communication with the first image capturing device 103a. The communication interface 221 may comprise an input/output interface in which a user may interact to adjust one or more user-configurable aspects (e.g., alert or notification criteria) of the system 101. In some embodiments, the communication interface 221 may include a display and one or more input keys for entering information related to the one or more user-configurable aspects.
[0038] FIG. 3 illustrates a sequence diagram 300 of functions 315a to 315f executed by the system 101 for object recognition, in accordance with an example embodiment. In FIG. 3, it is shown that the functions 315a to 315f are executed by a plurality of modules, such as an object detection module 301 (i.e. the object detection service 211), a camera OOI manager 303 (i.e. the OOI manager module 213), an event notification helper 305 (i.e. the OOI manager module 213), a cloud MQTT broker 307 (i.e. a module of the cloud 105), a cloud face recognition service 309, a face recognition module 311 (i.e. the face recognition module 217), and a cloud OOI manager 313 (i.e. a module of the cloud 105). FIG. 3 is described in view of the description of the components in FIG. 1 and FIG. 2.
[0039] At step 315a, motion of the one or more objects in the FOV of the first image capturing device 103a or the second image capturing device 103b may be detected. To that end, the object detection module 301 may execute the object detection algorithm. The object detection algorithm may be associated with a Convolutional Neural Network (CNN). The CNN is a class of deep neural networks applied to analyze visual imagery (e.g. images of the one or more objects captured by the first image capturing device 103a and/or the second image capturing device 103b) for object detection. The object detection module 301 may generate a plurality of tags associated with the detected one or more objects, based on the execution of the object detection algorithm. The plurality of tags indicates one or more of color, type, and landmarks and key points associated with the face of the one or more objects. Further, the object detection module 301 may associate the plurality of tags with a unique object ID corresponding to each of the one or more objects in the FOV of the first image capturing device 103a or the second image capturing device 103b. The object detection module 301 may obtain the unique object ID corresponding to each of the one or more objects from the cloud 105 based on the plurality of tags. The object detection module 301 may identify (recognize) the one or more objects based on the unique object ID. Further, the object detection module 301 may track the identified one or more objects inside the FOV of the first image capturing device 103a or the second image capturing device 103b.
[0040] At step 315b, a new object of interest (NOI) may be detected without recognition. In case a new object is detected in the FOV of the first image capturing device 103a or the second image capturing device 103b, NOI events may be triggered. These NOI events may be executed via the camera OOI manager 303, the cloud face recognition service 309, the face recognition module 311, and the cloud OOI manager 313, as described in detail from step 315c to step 315f.
[0041] At step 315c, the camera OOI manager 303 may send a face recognition request to the cloud face recognition service 309 for recognizing the detected new object of interest (NOI). The request may be sent via the API (e.g. the REST API). Each face recognition request may be assigned a request ID to map the face recognition result to a valid request. After the face recognition result is received, the camera OOI manager 303 may update the OOI to a permanent OOI in the OOI database if the unique object ID associated with the OOI is not null. Further, the control passes to step 315d.
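The request ID bookkeeping mentioned above may be sketched as follows. The use of a UUID as the request ID and the callable standing in for the REST call are assumptions made for illustration; the actual cloud endpoint and payload are not specified here.

```python
import uuid

class RecognitionRequestTracker:
    """Illustrative bookkeeping for face recognition requests per ROI."""

    def __init__(self):
        self._pending = {}  # request_id -> roi_id

    def submit(self, roi_id, send_request):
        """Assign a request ID and hand the request to the cloud service.

        send_request is a placeholder callable for the REST call.
        """
        request_id = str(uuid.uuid4())
        self._pending[request_id] = roi_id
        send_request(request_id)
        return request_id

    def has_pending(self, roi_id):
        """Lets an ROI check whether a recognition request is already in progress."""
        return roi_id in self._pending.values()

    def on_result(self, request_id, unique_object_id):
        """Map a recognition result back to a valid request.

        Returns (roi_id, promote_to_permanent) or None for a stale request.
        """
        roi_id = self._pending.pop(request_id, None)
        if roi_id is None:
            return None
        # A non-null unique object ID promotes the OOI to a permanent OOI.
        return roi_id, (unique_object_id is not None)
```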
[0042] At step 315d, the cloud face recognition service 309 may request the face recognition module 311 to recognize the face and the corresponding OOI (e.g. the one or more objects). The face recognition module 311 may obtain a plurality of tags that may comprise face features such as landmarks and key points associated with the face of the OOI. Further, the face recognition module 311 may obtain the unique object ID from the cloud 105 or the database 107 based on the plurality of tags. Further, the face recognition module 311 may recognize the face of the OOI based on the obtained unique object ID associated with the OOI and send the result of the face recognition to the cloud face recognition service 309 and the camera OOI manager 303. The camera OOI manager 303 may maintain the list of active objects in the FOV of the first image capturing device 103a or the second image capturing device 103b, and the states of the active objects. Further, the camera OOI manager 303 may look up the list of existing OOIs (i.e. the identified one or more objects) in the first image capturing device 103a or the second image capturing device 103b to maintain the states of the respective OOIs. To that end, the NOI may be matched with any existing OOI. Based on the previous coordinates, current coordinates, tag, and state of an OOI, the camera OOI manager 303 may update the state along with other information of the OOI. Accordingly, the camera OOI manager 303 may assign a corresponding state to each OOI, such as “New”, “Entered”, “Exited”, “Present”, or “Disappeared”. Further, the camera OOI manager 303 may send a message (e.g. “OOI detected with state updated”) to the cloud OOI manager 313 via an “OOI detected” MQTT message to update the OOI database.
[0043] At step 315e, the event notification helper 305 may mark and forward information, associated with the detection and recognition of the new object that entered the FOV of the first image capturing device 103a or the second image capturing device 103b, to the cloud service over MQTT. At step 315f, the cloud MQTT broker 307 may send the information associated with the detection and recognition of the new object to the cloud OOI manager 313, which may update the new OOI to a permanent OOI in the OOI database and notify information of the recognized OOI, via one or more mobile applications, to one or more mobile phone users. Based on the detection and recognition of the one or more objects (i.e. OOI or NOI) in the FOV of the first image capturing device 103a or the second image capturing device 103b, the processor 201 may track the one or more objects in the FOV of the first image capturing device 103a or the second image capturing device 103b. More specifically, the processor 201 may track the one or more objects, across the FOV of the first image capturing device 103a to the FOV of the second image capturing device 103b, based on the unique object ID of the one or more objects. Detailed operation for tracking the one or more objects is described further with reference to FIG. 4.
[0044] Accordingly, recognizing and tracking objects in an FOV of one or more image capturing devices facilitates the system 101 to execute one or more operations such as analyzing the objects and the surroundings of the objects. Such a system (i.e. the system 101) may be implemented with smart devices or different Internet of Things (IoT) infrastructures in order to track objects in real time using image processing, analytics, and other advanced technologies like geo location tracking and proximity tracking. This feature enables the system 101 to implement scenarios like “Show videos of a specific user or all unknown people”, “How many people are in the room or house”, “When was the object last seen”, etc.
[0045] FIG. 4 illustrates a flow chart 400 representing a sequence of steps for object recognition and tracking, in accordance with one embodiment. In some embodiments, the sequence of steps is executed by the processor 201 of the system 101. Accordingly, blocks of the flow diagram support combinations of means for performing the specified functions and combinations of operations for performing the specified functions.
[0046] At step 401, the processor 201 may obtain data from the first image capturing device 103a and/or the second image capturing device 103b. The data may comprise one or more of an image or video content captured by the first image capturing device 103a and/or the second image capturing device 103b. In some example embodiments, at least one of the first image capturing device 103a or the second image capturing device 103b may be installed at multiple locations inside an indoor area, for example, offices, schools, colleges, homes, or other indoor premises. In some example embodiments, the first image capturing device 103a and/or the second image capturing device 103b may be installed in an outdoor area such as parks, roads, parking lots, and the like. The one or more objects may correspond to a person, an animal, furniture, a tree, and the like. For the sake of convenience, the one or more objects are hereinafter described by considering the one or more objects as a person. However, this should in no way be considered limiting.
[0047] At step 403, the processor 201 may detect the person associated with the data. The system 101 may detect the person based on the object detection algorithm associated with the CNN. The person may be moving or stationary. Generally, an image may correspond to a stationary person, whereas a video frame or a video may correspond to/indicate a moving person.
[0048] At step 405, the processor 201 may generate a plurality of tags associated with the detected person, based on the object detection algorithm. Each of the plurality of tags may refer to a label that identifies the person to which the tag is attached. The plurality of tags may comprise face features such as landmarks and key points associated with the face of the person. In some example embodiments, the plurality of tags may comprise face recognition information such as an indication of an unknown person or a known person.
[0049] In some example embodiments, tags of the plurality of tags may be assigned to the upper dress color of the detected person, the lower dress color of the detected person, a movement of the detected person (for example, whether the detected person is moving or stationary), and any other key features associated with the person. In some example embodiments, the plurality of tags may be utilized in identifying the orientation of the person. To that end, a tag may be assigned when the person is looking upward with respect to the first image capturing device 103a and/or the second image capturing device 103b, while a different tag may be assigned when the person is looking downward with respect to the first image capturing device 103a and/or the second image capturing device 103b. Similarly, different tags may be assigned for different orientations or positions of the person.
[0050] In some example embodiments, each tag associated with the detected person may comprise a value. The value associated with each tag is utilized to convey information about that person. In some example embodiments, the processor 201 may update or store the corresponding tag values in at least one of the cloud 105, the database 107, or the memory 219 for identifying the person. In a non-limiting example, one or more values of a set of tags of the plurality of tags may be changed or updated, whereas one or more values of another set of tags of the plurality of tags may not be updated because they correspond to constant features of the person, such as face landmarks and key points of the person that remain the same. The set of tags of the plurality of tags that are susceptible to change may relate to a dress color or a movement of the person.
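The split between tag values that may be updated and tag values that remain constant may be sketched as follows; the specific tag names placed in each set are illustrative assumptions.

```python
# Illustrative split between tags whose values stay constant for a person
# and tags whose values may be refreshed while the person is tracked.
CONSTANT_TAGS = {"face_landmarks", "face_key_points"}
MUTABLE_TAGS = {"upper_dress_color", "lower_dress_color", "movement", "orientation"}

def update_tag_values(stored_tags: dict, observed_tags: dict) -> dict:
    """Return the stored tags with only the mutable values refreshed."""
    updated = dict(stored_tags)
    for name, value in observed_tags.items():
        if name in MUTABLE_TAGS:
            updated[name] = value
        # Constant tags (face landmarks and key points) are left untouched;
        # they are what keeps the unique object ID stable for the person.
    return updated
```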
[0051] At step 407, the processor 201 may associate the plurality of tag values of each person with a unique object identifier (ID). The unique object ID may comprise the information for detecting the face of the person.
[0052] At step 409, the processor 201 may assign the unique object ID to the detected person. To that end, the processor 201 may execute the object detection algorithm associated with the CNN based on the input data. The input data may comprise one or more of an image or video content of a person. A full frame is extracted from the input data and the person’s frame is cropped. The full frame may refer to the surroundings or background of the person, whereas the person’s frame may refer to a frame, including the person, that is cropped from the full frame and used for face detection.
[0053] Further, the object detection algorithm starts scanning the person’s frame to detect the face of the person. In some example embodiments, the detection and recognition of both front and side faces are supported. Once the face of the person is detected, the detected face is utilized for recognition of the face of the person. The face recognition module 217 in the processor 201 may recognize the detected face and generate information regarding the detected face. The information may comprise the unique object ID assigned to the detected face. The information corresponding to the detected face is stored in at least one of the cloud 105, the database 107, or the memory 219. In some example embodiments, the stored information may be retrieved for identifying and tracking the person across multiple image capturing devices (e.g. the first image capturing device 103a and the second image capturing device 103b).
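A minimal sketch of the crop-then-scan step is shown below. It substitutes OpenCV's bundled Haar cascade for the CNN-based detector described above purely to keep the example self-contained and runnable; detected face coordinates are returned relative to the full frame.

```python
import cv2

# Stand-in face detector; the described system uses a CNN-based detector.
_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_in_person_frame(full_frame, person_box):
    """Crop the person's frame from the full frame and scan it for a face.

    person_box is (x, y, w, h) from the object detection step; the
    returned face box, if any, is expressed in full-frame coordinates.
    """
    x, y, w, h = person_box
    person_frame = full_frame[y:y + h, x:x + w]
    gray = cv2.cvtColor(person_frame, cv2.COLOR_BGR2GRAY)
    faces = _face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    fx, fy, fw, fh = faces[0]
    return (x + fx, y + fy, fw, fh)
```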
[0054] At step 411, the processor 201 may identify and track the person across the FOV of the first image capturing device 103a to the FOV of the second image capturing device 103b based on the unique object ID. In some example embodiments, the processor 201 may match the plurality of tags obtained from the input data associated with the detected person with the stored unique object ID that comprises the plurality of tags associated with the same person. Based on the result of the matching, the processor 201 may identify the person. Further, based on the identification of the person in the FOV of the first image capturing device 103a, the processor 201 may further detect the person in the FOV of the second image capturing device 103b when the person moves into the FOV of the second image capturing device 103b. In such a case, the processor 201 obtains data associated with the image or video content captured by the second image capturing device 103b. The processor 201 may further detect the person from the data based on the object detection algorithm. Based on the detection of the person in the FOV of the second image capturing device 103b, the processor 201 may determine a time associated with the presence of the person in the FOV of the first image capturing device 103a and a time associated with the presence of the person in the FOV of the second image capturing device 103b. Further, the processor 201 may calculate a time difference between the time associated with the presence of the person in the FOV of the first image capturing device 103a and the time associated with the presence of the person in the FOV of the second image capturing device 103b. In case the calculated time difference is less than a threshold value, the processor 201 may restrict execution of operations associated with the identification of the person in the FOV of the second image capturing device 103b. In some example embodiments, the processor 201 may determine a time period associated with the presence of the person within the FOV of the first image capturing device 103a. The processor 201 may change the at least one value, associated with the at least one tag of the plurality of tags, based on the determined time period being greater than a threshold time period. Further, the processor 201 may update the unique object ID based on the changed at least one value associated with the at least one tag of the plurality of tags. Accordingly, as long as the person is present in the FOV of the first image capturing device 103a and no changes occur in the unique object ID, there is no need for repeatedly recognizing the face and re-initiating the tracker for the recognized person. Therefore, no face recognition request is generated while the person is being tracked in the FOVs of the multiple image capturing devices, which also reduces image upload requests to the cloud 105. Accordingly, the system 101 is enabled to track the person in the FOVs of the multiple image capturing devices with less computation and less memory consumption, which results in a fast operation for object tracking. Further, as the person is recognized only once in the FOV of one image capturing device and tracked in the FOVs of the other image capturing devices without executing the recognition operation, the system 101 facilitates the recognition and tracking of the person across the multiple image capturing devices independently of lighting conditions such as low lighting conditions and high lighting conditions.
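The cross-camera handoff logic, in which a sighting shortly after the previous one reuses the stored unique object ID instead of triggering a new face recognition request, may be sketched as follows. The handoff window value is an illustrative assumption.

```python
import time

class CrossCameraTracker:
    """Illustrative handoff: reuse the unique object ID across camera FOVs."""

    def __init__(self, handoff_window_seconds=10.0):
        # If the gap between sightings is below this threshold, recognition
        # is skipped and the stored unique object ID is reused.
        self.handoff_window = handoff_window_seconds
        self._last_seen = {}  # unique_object_id -> (camera_id, timestamp)

    def on_object_seen(self, camera_id, unique_object_id):
        """Record that the identified object was seen in this camera's FOV."""
        self._last_seen[unique_object_id] = (camera_id, time.time())

    def needs_recognition(self, candidate_unique_object_id):
        """Decide whether a new face recognition request must be issued."""
        entry = self._last_seen.get(candidate_unique_object_id)
        if entry is None:
            return True                 # never seen before: recognize once
        _, prev_time = entry
        gap = time.time() - prev_time
        # A small gap means the person just walked from one FOV into the
        # other; the stored ID is reused and the tracker is not re-initiated.
        return gap >= self.handoff_window
```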
[0055] FIG. 5 illustrates an exemplary scenario 500 of a working environment of the system 101 for object recognition and tracking, in accordance with an example embodiment. In FIG. 5, there is shown a first image capturing device 501a (similar to the first image capturing device 103a), a second image capturing device 501b (similar to the second image capturing device 103b), and a person 503 in an indoor premises such as a room.
[0056] When the person 503 is detected in the room in an FOV of the first image capturing device 501a, the system 101 is triggered to recognize and track the person 503. The system 101 generates a plurality of tags associated with the detected person 503, based on the object detection algorithm associated with the CNN. Each of the plurality of tags may convey information associated with the person 503. In an example embodiment, the values of the plurality of tags may remain the same or may change for the person 503. In an embodiment, the tags that may remain the same for the person 503 include facial features such as landmarks and key points associated with the face of the person 503. In an example embodiment, the tags that may change include, but are not limited to, dress color, body movement of the person, and the like.
[0057] In some example embodiments, the system 101 may update or store the tags in the database 107 for identifying the person 503. In an embodiment, the tags that may change for the same person 503 include, but are not limited to, a change of dress color, a body movement that is different from the body movement already stored in the database 107, and the like. The system 101 may not update a few tags because they remain constant for the person 503, such as facial features, eye and nose key points, and the like. Based on the object detection algorithm and the plurality of tags, the system 101 may assign the unique object ID to the detected person 503, and the unique object ID may be stored in the database 107. In an example embodiment, the system 101 may update the unique object ID associated with the object in the FOV of the first image capturing device 501a when at least one tag associated with the object is changed for more than a threshold period of time. For example, if the tag associated with the body movement of the person changes for more than a threshold period of time, the system 101 may update the unique object ID associated with the detected person.
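The rule that the unique object ID record is updated only when a tag value has changed for more than a threshold period of time may be sketched as follows; the threshold value and the method names are illustrative assumptions.

```python
import time

class TagChangeWatcher:
    """Illustrative check: write back a changed tag value only after it has
    persisted for longer than a threshold period of time."""

    def __init__(self, threshold_seconds=60.0):
        self.threshold = threshold_seconds
        self._changed_since = {}  # tag name -> time the change was first seen

    def observe(self, tag_name, stored_value, observed_value):
        """Return True if the changed tag value should now update the record."""
        if observed_value == stored_value:
            # The change did not persist; reset the timer for this tag.
            self._changed_since.pop(tag_name, None)
            return False
        started = self._changed_since.setdefault(tag_name, time.time())
        return (time.time() - started) > self.threshold
```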
[0058] When the person 503 is detected by the first image capturing device 501a, the first image capturing device 501a may request the system 101 for recognition (identification) of a face 505 of the person 503. The system 101 may recognize the detected face 505, based on the object detection algorithm associated with the CNN. The system 101 may further generate information regarding the detected face 505. The system 101 may retrieve information of the detected person 503 from the database 107. The information may comprise the unique object ID assigned to the detected person 503. Based on this, the system 101 may identify and track the person 503 from the FOV of the first image capturing device 501a to the FOV of the second image capturing device 501b, as described above with reference to FIG. 3 and FIG. 4. In an embodiment, the system 101 may not repeatedly recognize and re-initiate the tracking of the person 503 as long as the person 503 is in the FOV of the first image capturing device 501a and no changes occur in the unique object ID.
[0059] Further, when the person 503 moves into a different FOV, i.e. the FOV of the second image capturing device 501b, the system 101 may detect the person 503. The system 101 may retrieve the information of the detected person 503 already stored in the database 107. The system 101 recognizes the person 503 as the same person who was in the FOV of the first image capturing device 501a based on the plurality of tags associated with the unique object ID of the detected person 503. The system 101 may not repeatedly recognize the face 505 of the detected person 503.
[0060] FIG. 6 illustrates a flow diagram of a method 600 for object recognition and tracking, in accordance with an example embodiment. It will be understood that each block of the flow diagram of the method 600 may be implemented by various means, such as hardware, firmware, processor, circuitry, and/or other communication devices associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory 219 of the system 101, employing an embodiment of the present invention and executed by a processor 201. As will be appreciated, any such computer program instructions may be loaded onto a computer or other programmable apparatus (for example, hardware) to produce a machine, such that the resulting computer or other programmable apparatus implements the functions specified in the flow diagram blocks. These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture the execution of which implements the function specified in the flowchart blocks.
[0061] Accordingly, blocks of the flow diagram support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flow diagram, and combinations of blocks in the flow diagram, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions. The method 600 illustrated by the flowchart diagram of FIG. 6 is for object recognition and tracking. Fewer, more, or different steps may be provided.
[0062] At step 601, the method comprises obtaining a unique object identifier (ID) associated with an object. The unique object ID includes information of a face of the object. The unique object ID may comprise a plurality of tag values associated with the object captured by a first image capturing device (e.g. the first image capturing device 103a).
[0063] At step 603, the method comprises identifying the object based on the unique object ID. At step 605, the method comprises tracking the identified object across a first field of view (FOV) of the first image capturing device to a second FOV of a second image capturing device (e.g. the second image capturing device 103b) based on the unique object ID. The first image capturing device is different from the second image capturing device, and the first FOV is different from the second FOV.
[0064] In an example embodiment, the system for performing the method 600 described above may comprise a processor configured to perform some or each of the operations (601-605) described above. The processor may, for example, be configured to perform the operations (601- 605) by performing hardware implemented logical functions, executing stored instructions, or executing algorithms for performing each of the operations. Alternatively, the system may comprise means for performing each of the operations described above. In this regard, according to an example embodiment, examples of means for performing operations (601- 605) may comprise, for example, the processor and/or a device or circuit for executing instructions or executing an algorithm for processing information as described above.
[0065] In this way, example embodiments of the present disclosure provide the system 101 and the method 600 for object recognition and tracking. The disclosed method 600 provides a significant advantage in terms of reducing the computational effort of the system 101 when detecting faces. For example, if a person is assigned a unique object identifier and tags the first time the person is detected, then subsequent detections of the same person may not require assigning a new unique object identifier. In such a situation, the system 101 may retrieve the unique identifier and multiple tags from the database to identify the person. The reduced computational effort also leads to a reduction in memory consumption. The invention provides identification of optimized values for image quality variables, for images captured from the image capturing device, to work best in all types of lighting conditions with higher accuracy. Also, the algorithm employed in the invention may detect the person in low lighting conditions with higher accuracy.
[0066] Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the invention.
CLAIMS:
We Claim:
1. A system (101) for object recognition and tracking, the system (101) comprising:
a memory (219) configured to store computer-executable instructions; and
one or more processors (201) configured to execute the instructions to:
obtain a unique object identifier (ID) associated with an object (503), wherein the unique object ID includes information of a face (505) of the object (503);
identify the object (503) based on the unique object ID; and
track the identified object (503), across a first field of view (FOV) of a first image capturing device (103a or 501a) to a second FOV of a second image capturing device (103b or 501b), based on the unique object ID.
2. The system (101) of claim 1, wherein
the first image capturing device (103a or 501a) is different from the second image capturing device (103b or 501b), and
the first FOV is different from the second FOV.
3. The system (101) of claim 1, wherein the one or more processors (201) are further configured to:
obtain data from the first image capturing device (103a or 501a), wherein the data comprises one or more of an image or video content associated with the object (503) captured by the first image capturing device (103a or 501a); and
detect the object (503) from the data based on an object detection algorithm.
4. The system (101) of claim 3, wherein the one or more processors (201) are further configured to:
generate a plurality of tags, associated with the detected object (503), based on the object detection algorithm, wherein the plurality of tags indicates at least landmarks and key points associated with the face (505) of the object (503); and
associate the plurality of tags with the unique object ID.
5. The system (101) of claim 4, wherein the one or more processors (201) are further configured to store the unique object ID in the memory (219).
6. The system (101) of claim 4, wherein the one or more processors (201) are further configured to assign the unique object ID to the detected object (503).
7. The system (101) of claim 4, wherein the one or more processors (201) are further configured to change at least one value, associated with at least one tag of the plurality of tags, based on the track of the identified object (503).
8. The system (101) of claim 7, wherein the one or more processors (201) are further configured to:
determine a time period associated with presence of the object (503) within the first FOV of the first image capturing device (103a or 501a);
change the at least one value, associated with the at least one tag of the plurality of tags, based on the determined time period being greater than a threshold time period; and
update the unique object ID based on the changed at least one value associated with the at least one tag of the plurality of tags.
9. A method (600) for object recognition and tracking, the method (600) comprising:
obtaining (601) a unique object identifier (ID) associated with an object (503), wherein the unique object ID includes information of a face (505) of the object (503);
identifying (603) the object (503) based on the unique object ID; and
tracking (605) the identified object (503), across a first field of view (FOV) of a first image capturing device (103a or 501a) to a second FOV of a second image capturing device (103b or 501b), based on the unique object ID.