
A System And Method For Detecting Activity Of A Human In An Image

Abstract: Disclosed is a method and system for detecting an activity of a human present in a plurality of images. An image capturing module may capture the plurality of images. An image of the plurality of images comprises pixels, wherein each pixel has a gray scale value and a depth value. An image processing module may analyze each pixel to identify one or more candidate objects of the plurality of objects in the image. An image analysis module may perform a connected component analysis in order to detect a candidate object of the one or more candidate objects as a human. An activity detection module may detect the activity of the candidate object as one of a walking, a standing, a sleeping, and a sitting.


Patent Information

Application #
949-MUM-2014
Filing Date
22 March 2014
Publication Number
40/2015
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application
Patent Number
Legal Status
Grant Date
2023-03-22
Renewal Date

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. ROY, Sangheeta
Tata Consultancy Services Limited, Building 1B, Ecospace Plot - IIF/12 ,New Town, Rajarhat, Kolkata - 700156, West Bengal, India
2. CHATTOPADHYAY, Tanushyam
Tata Consultancy Services Limited, Building 1B, Ecospace Plot - IIF/12 ,New Town, Rajarhat, Kolkata - 700156, West Bengal, India
3. MUKHERJEE, Dipti Prasad
Indian Statistical Institute, 203, Barrackpore Trunk Road, Kolkata - 700108, West Bengal, India

Specification

CLAIMS:
1. A method for detecting an activity of a human present in a plurality of images, the method comprising:
capturing the plurality of images using a motion sensing device, wherein an image of the plurality of images comprises pixels, and wherein each pixel is having a gray scale value and a depth value, and wherein the gray scale value comprises intensity of each pixel corresponding to an object of a plurality of objects present in the image, and wherein the depth value comprises a distance of each object from the motion sensing device;
analyzing each pixel to identify one or more candidate objects of the plurality of objects in the image by,
executing a background subtraction algorithm on the image in order to remove one or more noisy objects of the plurality of objects from the image,
comparing the gray scale value of each pixel with a pre-defined gray scale value,
replacing a subset of the pixels having the gray scale value less than the pre-defined gray scale value with 0 and a remaining subset of the pixels with 1 in order to derive a binary image corresponding to the image, and
determining the subset as the one or more candidate objects;
performing a connected component analysis on the binary image in order to detect a candidate object of the one or more candidate objects as the human;
retrieving the depth value associated with each pixel corresponding to the candidate object from a look-up table; and
detecting the activity of the candidate object by using the depth value or a floor map algorithm, wherein the capturing, the analyzing, the performing, the retrieving, and the detecting are performed by a processor using programmed instructions stored in a memory.

2. The method of claim 1, wherein the one or more candidate objects is selected from a group comprising the human, a chair, and a table.

3. The method of claim 1, wherein the one or more noisy objects is selected from a group comprising a ceiling, a wall, a floor, and a combination thereof.

4. The method of claim 1, wherein the one or more noisy objects are eliminated by using a vertical pixel projection technique.

5. The method of claim 1 further comprising de-noising each pixel in order to retain the depth value associated to each pixel using a nearest neighbor interpolation algorithm, wherein each pixel is de-noised when the depth value of each pixel bounded by the one or more pixels is not within a pre-defined depth value of the one or more pixels.

6. The method of claim 1, wherein the look-up table stores the depth value corresponding to each pixel present in the image.

7. The method of claim 1, wherein the pre-defined gray scale value is 2.

8. The method of claim 1, wherein the predefined threshold value is 20.

9. The method of claim 1, wherein the activity is at least one of a walking, a standing, a sitting or a sleeping.

10. The method of claim 1, wherein the activity of the candidate object is detected as the walking or the standing by,
computing an average depth value of one or more pixels of the pixels in the image, wherein the one or more pixels are associated to the candidate object, and
identifying the activity of the candidate object as the walking when difference of the average depth value of the one or more pixels in the image and the average depth value of the one or more pixels in a subsequent image of the image is greater than a predefined threshold value, or
identifying the activity of the candidate object as the standing.

11. The method of claim 1, wherein the activity of the candidate object is detected as the sleeping or the sitting by using the floor map algorithm and the depth value.

12. A system for detecting an activity of a human present in a plurality of images, the system comprising:
a processor; and
a memory coupled to the processor, wherein the processor executes a plurality of modules stored in the memory, and wherein the plurality of modules comprising:
an image capturing module captures the plurality of images using a motion sensing device, wherein an image of the plurality of images comprises pixels, and wherein each pixel is having a gray scale value and a depth value, and wherein the gray scale value comprises intensity of each pixel corresponding to an object of a plurality of objects present in the image, and wherein the depth value comprises a distance of each object from the motion sensing device;
an image processing module analyzes each pixel to identify one or more candidate objects of the plurality of objects in the image by,
executing a background subtraction algorithm on the image in order to remove one or more noisy objects of the plurality of objects from the image,
comparing the gray scale value of each pixel with a pre-defined gray scale value,
replacing a subset of the pixels having the gray scale value less than the pre-defined gray scale value with 0 and a remaining subset of the pixels with 1 in order to derive a binary image corresponding to the image, and
determining the subset as the one or more candidate objects;
an image analysis module
performs a connected component analysis on the binary image in order to detect a candidate object of the one or more candidate objects as the human;
retrieves the depth value associated with each pixel corresponding to the candidate object from a look-up table; and
an activity detection module detects the activity of the candidate object by using the depth value or a floor map algorithm.

13. The system of claim 12, wherein the image processing module further de-noises each pixel in order to retain the depth value associated to each pixel using a nearest neighbor interpolation algorithm, wherein each pixel is de-noised when the depth value of each pixel bounded by the one or more pixels is not within a pre-defined depth value of the one or more pixels.

14. A non-transitory computer readable medium embodying a program executable in a computing device for detecting an activity of a human present in a plurality of images, the program comprising:
a program code for capturing the plurality of images using a motion sensing device, wherein an image of the plurality of images comprises pixels, and wherein each pixel is having a gray scale value and a depth value, and wherein the gray scale value comprises intensity of each pixel corresponding to an object of a plurality of objects present in the image, and wherein the depth value comprises a distance of each object from the motion sensing device;
a program code for analyzing each pixel to identify one or more candidate objects of the plurality of objects in the image by,
executing a background subtraction algorithm on the image in order to remove one or more noisy objects of the plurality of objects from the image,
comparing the gray scale value of each pixel with a pre-defined gray scale value,
replacing a subset of the pixels having the gray scale value less than the pre-defined gray scale value with 0 and a remaining subset of the pixels with 1 in order to derive a binary image corresponding to the image, and
determining the subset as the one or more candidate objects;
a program code for performing a connected component analysis on the binary image in order to detect a candidate object of the one or more candidate objects as the human;
a program code for retrieving the depth value associated with each pixel corresponding to the candidate object from a look-up table; and
a program code for detecting the activity of the candidate object by using the depth value or a floor map algorithm.
FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION

(See Section 10 and Rule 13)

Title of invention:

A SYSTEM AND METHOD FOR DETECTING ACTIVITY OF A HUMAN IN AN IMAGE

APPLICANT:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

The following specification describes the invention and the manner in which it is to be performed.
PRIORITY INFORMATION
[001] This patent application does not take priority from any application.

TECHNICAL FIELD
[002] The present subject matter described herein, in general, relates to a system and a method for image processing, and more particularly to the system and the method for detecting activity of a human in an image through the image processing.
BACKGROUND
[003] Detection of human activities for indoor and outdoor surveillance has become a major domain of research. It has been observed that the detection of human activities is very effective in applications such as video indexing and retrieval, intelligent human-machine interaction, video surveillance, health care, driver assistance, automatic activity detection, and predicting person behavior. Some of these applications may be utilized in offices, retail stores, or shopping malls in order to monitor/detect people present therein. It has further been observed that the detection of human activities through still images or video frames may also be possible in indoor surveillance.
[004] In order to detect the human activities in the still images or the video frames, traditional methods have been implemented. However, such methods may not reliably detect the human activities from the RGB-D/grayscale data, along with other information, pertaining to each pixel in the still images or the video frames. This is because the camera capturing the still images is not static, or there is constant variation of lighting/environmental conditions around the camera. Further, since the video frames may contain a human leaning over a wall, or a person occluding another person, it may be a challenge to distinguish the person from the wall or the other person, thereby leading to incorrect/inaccurate detection of the activity corresponding to the person.

SUMMARY
[005] Before the present systems and methods are described, it is to be understood that this application is not limited to the particular systems and methodologies described, as there can be multiple possible embodiments which are not expressly illustrated in the present disclosures. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present application. This summary is provided to introduce concepts related to systems and methods for detecting an activity of a human in an image, and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[006] In one implementation, a system for detecting an activity of a human present in a plurality of images is disclosed. In one aspect, the system may comprise a processor and a memory coupled to the processor for executing a plurality of modules present in the memory. The plurality of modules may comprise an image capturing module, an image processing module, an image analysis module, and an activity detection module. The image capturing module may capture the plurality of images by using a motion sensing device. In one aspect, an image of the plurality of images may comprise pixels. Each pixel may have a gray scale value and a depth value. The gray scale value may comprise intensity of each pixel corresponding to an object of a plurality of objects present in the image and the depth value may comprise a distance of each object from the motion sensing device. The image processing module may analyze each pixel to identify one or more candidate objects of the plurality of objects in the image by executing a background subtraction algorithm on the image in order to remove one or more noisy objects of the plurality of objects from the image. The image processing module may further compare the gray scale value of each pixel with a pre-defined gray scale value. The image processing module may further replace a subset of the pixels having the gray scale value less than the pre-defined gray scale value with 0 and a remaining subset of the pixels with 1 in order to derive a binary image corresponding to the image. The image processing module may further determine the subset as the one or more candidate objects. The image analysis module may perform connected component analysis on the binary image in order to detect a candidate object of the one or more candidate objects as the human. The image analysis module may further retrieve the depth value associated with each pixel corresponding to the candidate object from a look-up table. The activity detection module may detect the activity of the candidate object by using the depth value or a floor map algorithm.
[007] In another implementation, a method for detecting an activity of a human present in a plurality of images is disclosed. In one aspect, the plurality of images may be captured by using a motion sensing device. It may be understood that an image of the plurality of images may comprise pixels. Each pixel may have a gray scale value and a depth value. The gray scale value may comprise intensity of each pixel corresponding to an object of a plurality of objects present in the image. The depth value may comprise a distance of each object from the motion sensing device. After capturing the image, each pixel may be analyzed to identify one or more candidate objects of the plurality of objects in the image by executing a background subtraction algorithm on the image in order to remove one or more noisy objects of the plurality of objects from the image. Upon execution of the background subtraction algorithm, the gray scale value of each pixel may be compared with a pre-defined gray scale value. Based on the comparison, a subset of the pixels having the gray scale value less than the pre-defined gray scale value may be replaced with 0 and a remaining subset of the pixels may be replaced with 1 in order to derive a binary image corresponding to the image. Subsequent to the replacement, the subset may be determined as the one or more candidate objects. Upon determination of the one or more candidate objects, a connected component analysis may be performed on the binary image in order to detect a candidate object of the one or more candidate objects as the human. After determining the candidate object as the human, the depth value associated with each pixel corresponding to the candidate object may be retrieved from a look-up table. Subsequent to the retrieval of the depth value, the activity of the candidate object may be detected by using the depth value or a floor map algorithm. In one aspect, the aforementioned method for detecting the activity is performed by a processor using programmed instructions stored in a memory.
[008] In yet another implementation, a non-transitory computer readable medium embodying a program executable in a computing device for detecting an activity of a human present in a plurality of images is disclosed. The program may comprise a program code for capturing the plurality of images using a motion sensing device. An image of the plurality of images may comprise pixels. In one aspect, each pixel may have a gray scale value and a depth value. The gray scale value may comprise intensity of each pixel corresponding to an object of a plurality of objects present in the image. The depth value may comprise a distance of each object from the motion sensing device. The program may further comprise a program code for analyzing each pixel to identify one or more candidate objects of the plurality of objects in the image. The program code for the analyzing further comprises executing a background subtraction algorithm on the image in order to remove one or more noisy objects of the plurality of objects from the image. The program code for the analyzing further comprises comparing the gray scale value of each pixel with a pre-defined gray scale value. The program code for the analyzing further comprises replacing a subset of the pixels having the gray scale value less than the pre-defined gray scale value with 0 and a remaining subset of the pixels with 1 in order to derive a binary image corresponding to the image. The program code for the analyzing further comprises determining the subset as the one or more candidate objects. The program may further comprise a program code for performing a connected component analysis on the binary image in order to detect a candidate object of the one or more candidate objects as the human. The program may further comprise a program code for retrieving the depth value associated with each pixel corresponding to the candidate object from a look-up table. The program may further comprise a program code for detecting the activity of the candidate object by using the depth value or a floor map algorithm.

BRIEF DESCRIPTION OF THE DRAWINGS
[009] The foregoing detailed description of embodiments is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the disclosure, there is shown in the present document example constructions of the disclosure; however, the disclosure is not limited to the specific methods and apparatus disclosed in the document and the drawings.
[0010] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[0011] Figure 1 illustrates a network implementation for detecting an activity of a human present in a plurality of images, in accordance with an embodiment of the present disclosure.
[0012] Figure 2 illustrates the system, in accordance with an embodiment of the present disclosure.
[0013] Figures 3(a)-3(d) illustrate an example, in accordance with an embodiment of the present disclosure.
[0014] Figures 4, 5 and 6 illustrate a method for detecting the activity of the human present in the plurality of images, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0015] Some embodiments of this disclosure, illustrating all its features, will now be discussed in detail. The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the exemplary systems and methods are now described. The disclosed embodiments are merely exemplary of the disclosure, which may be embodied in various forms.
[0016] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure is not intended to be limited to the embodiments illustrated, but is to be accorded the widest scope consistent with the principles and features described herein.
[0017] A system and method for detecting an activity of a human present in a plurality of images are described. In one aspect, the plurality of images may refer to a sequence of video frames of a video, hereinafter referred to as the 'plurality of images'. It may be understood that the plurality of images may include a plurality of objects such as a living object or a non-living object. Thus, in order to detect the activity corresponding to the living object (i.e. a human) in the video, the system and the method may initially capture the plurality of images using a motion sensing device. Examples of the motion sensing device may include, but are not limited to, a Kinect™ device. It may be understood that an image of the plurality of images may comprise pixels. Each pixel may have a gray scale value and a depth value. The gray scale value indicates the intensity of each pixel corresponding to an object of the plurality of objects present in the image. The depth value indicates a distance of each object from the Kinect™ device. In one embodiment, one or more pixels of the pixels may include some noise, resulting in non-inclusion of the depth value while the image is captured by the Kinect™ device. Thus, a nearest neighbor interpolation algorithm may be implemented for de-noising the one or more pixels in order to retain the depth value associated with the one or more pixels.
[0018] Subsequent to the capturing of the plurality of images, each pixel may be analyzed to identify one or more candidate objects of the plurality of objects in the image. It may be understood that the one or more candidate objects may be identified by eliminating one or more noisy objects from the plurality of objects. In one aspect, the one or more noisy objects may be eliminated by using a background subtraction algorithm or a vertical pixel projection technique. Examples of the one or more candidate objects may include, but are not limited to, the human, a chair, and a table. Examples of the one or more noisy objects may include, but are not limited to, a ceiling, a wall, and a floor. Subsequent to the elimination of the one or more noisy objects, the gray scale value of each pixel may be compared with a pre-defined gray scale value, and then a subset of the pixels having the gray scale value less than the pre-defined gray scale value may be replaced with 0 and a remaining subset of the pixels may be replaced with 1 in order to derive a binary image corresponding to the image. After determining the one or more candidate objects, a connected component analysis may be performed on the binary image in order to detect a candidate object of the one or more candidate objects as the living object (i.e. the human).
[0019] Since the binary image does not include the depth value of the pixels, in order to detect the activity of the candidate object, the depth value associated with each pixel corresponding to the candidate object may be retrieved from a look-up table. In one aspect, the look-up table may store the depth value corresponding to each pixel present in the image. Thus, based on the depth value, the activity of the candidate object may be detected as one of a walking, a standing, a sleeping, and a sitting, using at least one of an average depth value of the one or more pixels, a floor map algorithm, and the depth value.
[0020] While aspects of the described system and method for detecting the activity of the human present in the plurality of images may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
[0021] Referring now to Figure 1, a network implementation 100 of a system 102 for detecting an activity of a human present in a plurality of images is illustrated, in accordance with an embodiment of the present disclosure. The system 102 may capture the plurality of images by using a motion sensing device. In one aspect, an image of the plurality of images may comprise pixels. Each pixel may have a gray scale value and a depth value. The gray scale value may comprise intensity of each pixel corresponding to an object of a plurality of objects present in the image and the depth value may comprise a distance of each object from the motion sensing device. The system 102 may further analyze each pixel to identify one or more candidate objects of the plurality of objects in the image by executing a background subtraction algorithm on the image in order to remove one or more noisy objects of the plurality of objects from the image. The system 102 may further compare the gray scale value of each pixel with a pre-defined gray scale value. The system 102 may further replace a subset of the pixels having the gray scale value less than the pre-defined gray scale value with 0 and a remaining subset of the pixels with 1 in order to derive a binary image corresponding to the image. The system 102 may further determine the subset as the one or more candidate objects. The system 102 may further perform connected component analysis on the binary image in order to detect a candidate object of the one or more candidate objects as the human. The system 102 may further retrieve the depth value associated with each pixel corresponding to the candidate object from a look-up table. The system 102 may further detect the activity of the candidate object as one of a walking, a standing, a sleeping, and a sitting. In one aspect, the walking or standing may be detected by computing an average depth value of one or more pixels of the pixels in the image. The one or more pixels may be associated to the candidate object. The system 102 may further identify the activity of the candidate object as the walking when difference of the average depth value of the one or more pixels in the image and the average depth value of the one or more pixels in a subsequent image of the image is greater than a predefined threshold value. Alternatively, the system 102 may identify the activity of the candidate object as the standing. In another aspect, when the activity of the candidate object is identified as the standing, the system 102 may further detect the activity as the sleeping or the sitting by using a floor map algorithm and the depth value.
[0022] Although the present subject matter is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, a cloud-based computing environment and the like. In one implementation, the system 102 may comprise the cloud-based computing environment in which a user may operate individual computing systems configured to execute remotely located applications. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N, collectively referred to as user 104 hereinafter, or applications residing on the user devices 104. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106.
[0023] In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0024] Referring now to Figure 2, the system 102 is illustrated in accordance with an embodiment of the present disclosure. In one embodiment, the system 102 may include at least one processor 202, an input/output (I/O) interface 204, and a memory 206. The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 202 is configured to fetch and execute computer-readable instructions stored in the memory 206.
[0025] The I/O interface 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 204 may allow the system 102 to interact with the user directly or through the user devices 104 also hereinafter referred to as client devices 104. Further, the I/O interface 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 204 may include one or more ports for connecting a number of devices to one another or to another server.
[0026] The memory 206 may include any computer-readable medium and computer program product known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 206 may include modules 208 and data 210.
[0027] The modules 208 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 208 may include an image capturing module 212, an image processing module 214, an image analysis module 216, an activity detection module 218, and other modules 220. The other modules 220 may include programs or coded instructions that supplement applications and functions of the system 102. The modules 208 described herein may be implemented as software modules that may be executed in the cloud-based computing environment of the system 102.
[0028] The data 210, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the modules 208. The data 210 may also include a database 222 and other data 224. The other data 224 may include data generated as a result of the execution of one or more modules in the other modules 220.
[0029] In one implementation, at first, a user may use the client devices 104 to access the system 102 via the I/O interface 204. The user may register using the I/O interface 204 in order to use the system 102. In one aspect, the user may access the I/O interface 204 of the system 102 for detecting an activity of a human present in a plurality of images. In order to detect the activity, the system 102 may employ the plurality of modules, i.e. the image capturing module 212, the image processing module 214, the image analysis module 216, and the activity detection module 218. The detailed working of the plurality of modules is described below.
[0030] Further referring to Figure 2, at first, the image capturing module 212 may capture the plurality of images by using a motion sensing device. In one aspect, the plurality of images may refer to a sequence of video frames of a video, hereinafter referred to as the 'plurality of images'. An example of the motion sensing device is a Kinect™ device. It may be understood that the Kinect™ device is capable of capturing an image of the plurality of images along with metadata associated with the image. The metadata may include a gray scale value and a depth value. In one aspect, the gray scale value and the depth value pertain to an object of a plurality of objects present in the image. The gray scale value may indicate the intensity of each pixel corresponding to the object, whereas the depth value may indicate the distance of the object from the Kinect™ device, as illustrated in Figure 3(a). In one example, the image captured by the image capturing module 212 may comprise living objects (e.g. a human) and non-living objects (e.g. a refrigerator or a chair). Since the plurality of objects present in the image may be located at distinct locations, the image capturing module 212 may determine the depth value along with the gray scale value of each object in the image.
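The following is a minimal sketch, not part of the specification, of how one captured frame could be represented for the processing described above: a pair of equally sized arrays holding the per-pixel gray scale value and the per-pixel depth value. The `Frame` class, its field names, and the synthetic data are illustrative assumptions only.

```python
# Illustrative frame model: gray scale intensities plus per-pixel depth readings.
from dataclasses import dataclass
import numpy as np

@dataclass
class Frame:
    gray: np.ndarray   # H x W uint8 intensities, 0-255
    depth: np.ndarray  # H x W depth readings; 0.0 marks a missing reading

def make_dummy_frame(height: int = 6, width: int = 6) -> Frame:
    """Build a small synthetic frame standing in for a Kinect(TM) capture."""
    rng = np.random.default_rng(0)
    gray = rng.integers(0, 256, size=(height, width), dtype=np.uint8)
    depth = rng.uniform(0.5, 4.0, size=(height, width))
    depth[2, 3] = 0.0  # simulate a noisy pixel whose depth was not captured
    return Frame(gray=gray, depth=depth)
```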
[0031] In one embodiment, at least one pixel of the pixels associated with the image captured by the Kinect™ device may include some noise, resulting in non-inclusion of the depth value. In one aspect, the noise indicates that the at least one pixel appears 'black' in color due to the absence of the depth value. Thus, the image processing module 214 may de-noise the at least one pixel in order to retain the depth value associated with it. In one aspect, the at least one pixel may be de-noised by using a nearest neighbor interpolation algorithm, in which the at least one pixel is de-noised when its depth value, bounded by one or more other pixels of the pixels, is not within a pre-defined depth value of the one or more pixels.
[0032] In order to understand the working of the nearest neighbor interpolation algorithm, consider an image captured by the Kinect™ device as illustrated in Figure 3(a), where a pixel 'E' is bounded by one or more other pixels 'A', 'B', 'C', 'D', 'F', 'G', 'H', and 'I'. In one aspect, 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', and 'I' indicate 9 pixels corresponding to an object present in the image. It may be observed from the image that 'E' includes the depth value as 'null' and is therefore considered as noise in the image. In order to retain the depth value corresponding to the pixel 'E', the image processing module 214 may implement the nearest neighbor interpolation algorithm on the image. In one aspect, the nearest neighbor interpolation algorithm may use a formulation as mentioned below in order to determine the average of the one or more other pixels that bound the pixel 'E':

depth(E) = [depth(A) + depth(B) + depth(C) + depth(D) + depth(F) + depth(G) + depth(H) + depth(I)] / 8
[0033] Since the one or more other pixels include 'A', 'B', 'C', 'D', 'F', 'G', 'H', and 'I', the image processing module 214 may determine an average of the depth values corresponding to the one or more other pixels (i.e. 'A', 'B', 'C', 'D', 'F', 'G', 'H', and 'I'). Based on this average, the image processing module 214 may substitute the average depth value for the depth value corresponding to the pixel 'E'. In this manner, the noise pertaining to the pixel 'E' in the image may be removed.
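As a hedged illustration of the de-noising step described in paragraphs [0031]-[0033], the sketch below replaces every pixel whose depth reading is missing with the average depth of its valid 8-connected neighbours (the pixels 'A'-'I' around 'E' in the example). The function name and the sentinel value for a missing reading are assumptions, not taken from the specification.

```python
import numpy as np

def denoise_depth(depth: np.ndarray, missing: float = 0.0) -> np.ndarray:
    """Replace every pixel with a missing depth reading by the average depth of
    its valid 8-connected neighbours, mirroring the worked example for pixel 'E'."""
    out = depth.copy()
    rows, cols = depth.shape
    for r in range(rows):
        for c in range(cols):
            if depth[r, c] != missing:
                continue  # pixel already carries a depth value
            # gather the neighbours 'A'..'I' surrounding the noisy pixel
            window = depth[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            valid = window[window != missing]
            if valid.size:
                out[r, c] = valid.mean()  # substitute the neighbourhood average
    return out
```

Applied to the worked example, `denoise_depth` assigns the pixel 'E' the mean of the depth values of 'A' through 'I'.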
[0034] Subsequent to the capturing of the plurality of images, the image processing module 214 may analyze each pixel to identify one or more candidate objects of the plurality of objects in the image. Examples of the one or more candidate objects may include, but are not limited to, a human, a chair, and a table. In order to identify the one or more candidate objects, the image processing module 214 may execute a background subtraction algorithm on the image in order to remove one or more noisy objects of the plurality of objects, as illustrated in Figure 3(c). Examples of the one or more noisy objects may include, but are not limited to, a ceiling, a wall, and a floor. In one aspect, the background subtraction algorithm is a technique in the field of image processing wherein a foreground of the image may be extracted for further processing. The foreground of the image may indicate the one or more candidate objects present in the image. In one aspect, the background subtraction algorithm is widely used in the art for detecting the plurality of objects in an image frame captured from a video.
[0035] After executing the background subtraction algorithm, the image processing module 214 may further compare the gray scale value of each pixel, corresponding to the one or more candidate objects, with a pre-defined gray scale value. In one aspect, the pre-defined gray scale value is 2. Upon comparing the gray scale value of each pixel with the pre-defined gray scale value, the image processing module 214 may further replace a subset of the pixels having the gray scale value less than the pre-defined gray scale value with '0' and a remaining subset of the pixels with '1' in order to derive a binary image corresponding to the image. In one aspect, '0' indicates the subset of the pixels is turned 'black' in color having the gray scale value of '0', whereas '1' indicates the remaining subset of the pixels is turned 'white' in color having the gray scale value of '255'. Thus, in this manner, the subset of the pixels (assigned '0' in the binary image) having the gray scale value less than the pre-defined gray scale value may be determined as the one or more candidate objects.
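A minimal sketch of the binarization step just described, assuming the image has already been background-subtracted: pixels whose gray scale value is below the pre-defined value (2 in the specification) are set to 0, all others to 1, and the subset assigned 0 is treated as the candidate-object mask. Function names are illustrative.

```python
import numpy as np

def binarize(gray: np.ndarray, threshold: int = 2) -> np.ndarray:
    """Derive the binary image: pixels whose gray scale value is below the
    pre-defined value become 0, all remaining pixels become 1."""
    return np.where(gray < threshold, 0, 1).astype(np.uint8)

def candidate_mask(binary: np.ndarray) -> np.ndarray:
    """The subset assigned 0 is treated as the candidate-object foreground."""
    return binary == 0
```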
[0036] Since the binary image may contain numerous imperfections and noise, the image analysis module 216 may perform morphological image processing in order to remove the imperfections and the noise. In one aspect, the image analysis module 216 may perform a connected component analysis and morphological operations on the binary image in order to detect a candidate object of the one or more candidate objects as the human. It may be understood that the candidate object may be detected by removing the imperfections and the noise from the binary image by using the connected component analysis and the morphological operations, as illustrated in Figure 3(d). Examples of the morphological operations may include, but are not limited to, Erosion, Dilation, Opening, and Closing. After performing the connected component analysis, the image analysis module 216 may further retrieve the depth value associated with each pixel corresponding to the candidate object from a look-up table. In one aspect, the depth value may be retrieved by creating a mask of the binary image. The mask may facilitate retrieval of the depth value of each pixel associated with the subset of the pixels present in the binary image. It may be understood that the depth value of each pixel of the remaining subset of the pixels in the binary image, assigned with a non-zero value (i.e. 1), is unchanged, whereas the depth value of each pixel of the subset of the pixels (detected as the candidate object) in the binary mask, assigned with 0, is retrieved using the look-up table stored in a database 222. In one aspect, the look-up table stores the depth value corresponding to each pixel present in the image.
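The sketch below illustrates, under stated assumptions, the connected component analysis, the morphological clean-up, and the mask-based retrieval of per-pixel depth values described in paragraph [0036]. It uses SciPy's `ndimage` routines in place of whatever implementation the specification contemplates, keeps only the largest connected component as the human candidate, and treats the de-noised depth image itself as the per-pixel look-up table; the minimum-area threshold is an illustrative parameter.

```python
import numpy as np
from scipy import ndimage

def detect_human_candidate(mask: np.ndarray, min_area: int = 500) -> np.ndarray:
    """Clean the candidate mask with a morphological opening, label its connected
    components, and keep the largest one as the human candidate (empty if too small)."""
    cleaned = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
    labels, num = ndimage.label(cleaned)
    if num == 0:
        return np.zeros_like(mask, dtype=bool)
    areas = ndimage.sum(cleaned, labels, index=np.arange(1, num + 1))
    largest = int(np.argmax(areas)) + 1
    if areas[largest - 1] < min_area:
        return np.zeros_like(mask, dtype=bool)
    return labels == largest

def candidate_depths(component: np.ndarray, depth_lut: np.ndarray) -> np.ndarray:
    """Retrieve the stored depth value of every pixel of the detected candidate,
    using the (de-noised) depth image as the per-pixel look-up table."""
    return depth_lut[component]
```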
[0037] After retrieving the depth value, the activity detection module 218 may detect the activity of the candidate object present in the image by using the depth value or a floor map algorithm. In one embodiment, the activity may be detected as one of, but not limited to, a walking, a standing, a sleeping, and a sitting. However, it may be understood that the aforementioned method may also facilitate detection of activities other than the walking, the standing, the sleeping, and the sitting, based on the depth value or the floor map algorithm.
[0038] In one embodiment, the activity of the candidate object may be detected as the walking or the standing by computing an average depth value of one or more pixels of the pixels in the image. It may be understood that the one or more pixels may be associated with the candidate object. In one aspect, subsequent to the computation of the average depth value, the activity of the candidate object may be identified as the walking when the difference between the average depth value of the one or more pixels in the image and the average depth value of the one or more pixels in a subsequent image of the image is greater than a predefined threshold value. In another aspect, the activity of the candidate object may be identified as the standing, the sleeping or the sitting when the difference between the average depth value of the one or more pixels in the image and the average depth value of the one or more pixels in the subsequent image of the image is less than the predefined threshold value. In one aspect, the predefined threshold value is 20. It is to be understood from the aforementioned description that, when the difference between the average depth value of the one or more pixels in the image and the average depth value of the one or more pixels in the subsequent image of the image is greater than 20, the activity is detected as the walking; otherwise the activity may be detected as one of the standing, the sitting or the sleeping. In one aspect, if the activity is not detected as the walking, it is to be understood that the candidate object is not in motion, since the difference between the average depth value of the one or more pixels in the image and the average depth value of the one or more pixels in the subsequent image of the image is within the predefined threshold value, i.e. 20. Thus, upon determining that the candidate object is not in motion in most of the images amongst the plurality of images, the activity may be detected as one of the standing, the sitting or the sleeping.
[0039] In one example, the average depth value over a sequence of frames (i.e. frames n to n+4) may be computed, where 'n' indicates a first frame of the sequence. The average depth value for the nth frame and for its subsequent frame (i.e. the (n+1)th frame) may be computed using the formulation below, where p_1 ... p_K are the K pixels of the candidate object in a frame:
[0040] avg_depth(n) = [depth(p_1, n) + depth(p_2, n) + ... + depth(p_K, n)] / K
[0041] The average depth values for the (n+2), (n+3) and (n+4) frames, i.e. avg_depth(n+2), avg_depth(n+3) and avg_depth(n+4), may be computed in the same manner. Further, the activity may be decided based on the formulation below:
[0042] if |avg_depth(n+1) - avg_depth(n)| > threshold, then Walking; otherwise Standing, Sitting or Sleeping
[0043] It may be understood that, if the difference is greater than the predefined threshold value (i.e. 20) for most of the 5 frames, then the activity of the candidate object present in the sequence of frames is detected as the "walking"; otherwise the activity is detected as one of the "standing", the "sitting", or the "sleeping".
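A hedged sketch of the walking-versus-static decision described in paragraphs [0038]-[0043]: the average depth of the candidate's pixels is computed for each frame in a short window, and the candidate is classified as walking when the frame-to-frame difference exceeds the threshold (20) for most frame pairs. The function names and the majority rule over the window are illustrative interpretations of the text.

```python
import numpy as np

def mean_depth(depth: np.ndarray, component: np.ndarray) -> float:
    """Average depth of the pixels belonging to the candidate object in one frame."""
    return float(depth[component].mean())

def classify_motion(frame_means: list, threshold: float = 20.0) -> str:
    """Compare consecutive per-frame average depths over a short window (e.g. 5
    frames): if the difference exceeds the threshold for most frame pairs, the
    candidate is walking; otherwise it is static (standing, sitting or sleeping)."""
    diffs = [abs(b - a) for a, b in zip(frame_means, frame_means[1:])]
    moving = sum(d > threshold for d in diffs)
    return "walking" if moving > len(diffs) / 2 else "standing/sitting/sleeping"
```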
[0044] In order to detect the activity as one of the standing, the sitting or the sleeping, the activity detection module 218 may use the floor map algorithm and the depth value. In one aspect, if the spatial location of the person is within a predefined sofa or bed position, the activity detection module 218 may use the floor map algorithm in order to detect the activity as the sitting. Further, the activity detection module 218 may extract features corresponding to the candidate object. A commonly known technique called a look-up table (LUT) is used to map the depth value of the one or more pixels corresponding to the candidate object. It may be understood that a histogram may be produced by matching the depth value with a bin using the look-up table (LUT). The look-up table may store an intensity transformation function. The intensity transformation may generate specific output values for the corresponding input values. Those output values are quantized into a number of bins. In one aspect, the number of occupied bins is particularly indicative of some activities like the sleeping, which are characterized by the depth values of the candidate object. In one aspect, if the activity of the candidate object in the images is detected as the sleeping, it is to be understood that the depth distribution, or the depth space occupied by the candidate object, is more than the depth distribution observed in an activity like the standing or the walking. Thus, based on the aforementioned logic, the depth distribution for 5 frames may be calculated. Further, it may be understood that, if the depth distribution of most of the frames among the 5 frames is greater than the threshold number of bins, the activity of the candidate object in the sequence of frames is detected as the sleeping; otherwise the activity of the candidate object in the sequence of frames is detected as the standing. In one example, when the depth distribution covers more than 5 out of 8 bins, the activity is detected as the sleeping; otherwise the activity is detected as the standing.
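The following sketch is one possible reading of paragraph [0044], not the specification's own implementation: a floor-map check (represented here by a boolean `on_furniture` flag) yields 'sitting' when the candidate lies within a predefined sofa/bed region; otherwise the spread of the candidate's depth values over 8 histogram bins distinguishes 'sleeping' (more than 5 occupied bins) from 'standing'. All names and the bin parameters are illustrative.

```python
import numpy as np

def classify_static_pose(component_depths: np.ndarray,
                         on_furniture: bool,
                         num_bins: int = 8,
                         sleeping_bin_count: int = 5) -> str:
    """Decide among sitting, sleeping and standing for a non-moving candidate.
    `on_furniture` stands in for the floor map check against a predefined sofa/bed
    region; otherwise the spread of the candidate's depth values over `num_bins`
    histogram bins separates sleeping (wide spread) from standing (narrow spread)."""
    if on_furniture:
        return "sitting"
    hist, _ = np.histogram(component_depths, bins=num_bins)
    occupied = int(np.count_nonzero(hist))
    return "sleeping" if occupied > sleeping_bin_count else "standing"
```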
[0045] Referring now to Figure 4, a method 400 for detecting an activity of a human present in a plurality of images is shown, in accordance with an embodiment of the present disclosure. The method 400 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 400 may be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0046] The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 400 or alternate methods. Additionally, individual blocks may be deleted from the method 400 without departing from the spirit and scope of the disclosure described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 400 may be considered to be implemented in the above-described system 102.
[0047] At block 402, the plurality of images may be captured by using a motion sensing device. In one aspect, an image of the plurality of images may comprise pixels, and each pixel may have a gray scale value and a depth value. The gray scale value may comprise intensity of each pixel corresponding to an object of a plurality of objects present in the image, and the depth value may comprise a distance of each object from the motion sensing device. In one implementation, the plurality of images may be captured by the image capturing module 212.
[0048] At block 404, each pixel may be analyzed to identify one or more candidate objects of the plurality of objects in the image. In one implementation, each pixel may be analyzed by the image processing module 214. Further, the block 404 may be explained in greater detail in Figure 5.
[0049] At block 406, a connected component analysis may be performed on a binary image in order to detect a candidate object of the one or more candidate objects as the human. In one aspect, the binary image may be derived from the image as explained below in figure 5. In one implementation, the connected component analysis may be performed by the image analysis module 216.
[0050] At block 408, the depth value associated with each pixel corresponding to the candidate object may be retrieved from a look-up table. In one implementation, the depth value may be retrieved by the image analysis module 216.
[0051] At block 410, the activity of the candidate object may be detected as one of a walking, a standing, a sleeping, and a sitting. In one implementation, the activity of the candidate object may be detected by the activity detection module 218. Further, the block 410 may be explained in greater detail in Figure 6.
[0052] Referring now to Figure 5, the block 404 for analyzing each pixel to identify one or more candidate objects of the plurality of objects in the image is shown, in accordance with an embodiment of the present subject matter.
[0053] At block 502, a background subtraction algorithm may be executed on the image in order to remove one or more noisy objects of the plurality of objects from the image. In one implementation, the background subtraction algorithm may be executed by the image processing module 214.
[0054] At block 504, the gray scale value of each pixel may be compared with a pre-defined gray scale value. In one implementation, the gray scale value of each pixel may be compared with the pre-defined gray scale value by the image processing module 214.
[0055] At block 506, a subset of the pixels having the gray scale value less than the pre-defined gray scale value may be replaced with 0 and a remaining subset of the pixels may be replaced with 1 in order to derive a binary image corresponding to the image. In one implementation, the subset and the remaining subset are replaced by the image processing module 214.
[0056] At block 508, the subset may be determined as the one or more candidate objects. In one implementation, the subset may be determined as the one or more candidate objects by the image processing module 214.
[0057] Referring now to Figure 6, the block 410 for detecting the activity of the candidate object as one of the walking, the standing, the sleeping, and the sitting is shown, in accordance with an embodiment of the present subject matter.
[0058] At block 602, an average depth value of one or more pixels of the pixels in the image may be computed. In one aspect, the one or more pixels may be associated to a candidate object of the one or more candidate objects. In one implementation, the average depth value of one or more pixels of the pixels in the image may be computed by the activity detection module 218.
[0059] At block 604, the activity of the candidate object may be identified as the walking when difference of the average depth value of the one or more pixels in the image and the average depth value of the one or more pixels in a subsequent image of the image is greater than a predefined threshold value. In one implementation, the activity of the candidate object as the walking may be identified by the activity detection module 218.
[0060] At block 606, the activity of the candidate object may be identified as the standing when difference of the average depth value of the one or more pixels in the image and the average depth value of the one or more pixels in the subsequent image of the image is less than the predefined threshold value. In one implementation, the activity of the candidate object may be identified as the standing by the activity detection module 218.
[0061] At block 608, the activity of the candidate object may be detected as the sleeping or the sitting subsequent to the detection of the activity of the candidate object as the standing. In one aspect, the activity of the candidate object may be detected as the sleeping or the sitting by using a floor map algorithm and the depth value. In one implementation, the activity of the candidate object may be detected as the sleeping or the sitting by the activity detection module 218.
[0062] Although implementations for methods and systems for detecting an activity of a human present in a plurality of images have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for detecting the activity of the human present in the plurality of images.
[0063] Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include those provided by the following features.
[0064] Some embodiments enable a system and a method to detect the activity of the object identified as human in the image.
[0065] Some embodiments enable the system and the method to detect the activity as one of a walking, a standing, a sleeping, and a sitting.
[0066] Some embodiments enable the system and the method to detect the activity based on the depth information pertaining to one or more pixels of the object identified as human in the image.

Documents

Application Documents

# Name Date
1 949-MUM-2014-IntimationOfGrant22-03-2023.pdf 2023-03-22
2 949-MUM-2014-PatentCertificate22-03-2023.pdf 2023-03-22
3 949-MUM-2014-Written submissions and relevant documents [15-02-2023(online)].pdf 2023-02-15
4 949-MUM-2014-Response to office action [07-02-2023(online)].pdf 2023-02-07
5 949-MUM-2014-Correspondence to notify the Controller [01-02-2023(online)].pdf 2023-02-01
6 949-MUM-2014-FORM-26 [01-02-2023(online)].pdf 2023-02-01
7 949-MUM-2014-FORM-26 [01-02-2023(online)]-1.pdf 2023-02-01
8 949-MUM-2014-US(14)-HearingNotice-(HearingDate-07-02-2023).pdf 2023-01-19
9 949-MUM-2014-CLAIMS [12-07-2019(online)].pdf 2019-07-12
10 949-MUM-2014-COMPLETE SPECIFICATION [12-07-2019(online)].pdf 2019-07-12
11 949-MUM-2014-FER_SER_REPLY [12-07-2019(online)].pdf 2019-07-12
12 949-MUM-2014-OTHERS [12-07-2019(online)].pdf 2019-07-12
13 949-MUM-2014-FER.pdf 2019-01-14
14 Form 2.pdf 2018-08-11
15 Form 2_Clean Copy.pdf 2018-08-11
16 Form 2_MarkUp Copy.pdf 2018-08-11
17 Form 3.pdf 2018-08-11
18 Form 13.pdf 2018-08-11
19 Drawings.pdf 2018-08-11
20 Figure for Abstract.jpg 2018-08-11
21 ABSTRACT1.jpg 2018-08-11
22 Thumbs.db 2018-08-11
23 Certified Copy-949-mum-2014.pdf 2018-08-11
24 Certified Copy-949-mum-2014.pdf ONLINE 2018-08-11
25 949-MUM-2014-FORM 1(7-4-2014).pdf 2018-08-11
26 949-MUM-2014-CORRESPONDENCE(7-4-2014).pdf 2018-08-11
27 949-MUM-2014-FORM 18.pdf 2018-08-11
28 949-MUM-2014-FORM 26(25-4-2014).pdf 2018-08-11
29 949-MUM-2014-CORRESPONDENCE(25-4-2014).pdf 2018-08-11
30 949-MUM-2014-Request For Certified Copy-Online(23-02-2015).pdf 2015-02-23

Search Strategy

1 SEARCHSTRATEGYFOR949_10-01-2019.pdf

ERegister / Renewals

3rd: 28 Mar 2023 (22/03/2016 - 22/03/2017)
4th: 28 Mar 2023 (22/03/2017 - 22/03/2018)
5th: 28 Mar 2023 (22/03/2018 - 22/03/2019)
6th: 28 Mar 2023 (22/03/2019 - 22/03/2020)
7th: 28 Mar 2023 (22/03/2020 - 22/03/2021)
8th: 28 Mar 2023 (22/03/2021 - 22/03/2022)
9th: 28 Mar 2023 (22/03/2022 - 22/03/2023)
10th: 28 Mar 2023 (22/03/2023 - 22/03/2024)
11th: 22 Mar 2024 (22/03/2024 - 22/03/2025)
12th: 06 Mar 2025 (22/03/2025 - 22/03/2026)