
A System And Method For Enhancing Human Counting By Fusing Results Of Human Detection Modalities

Abstract: The present invention discloses a method and a system for enhancing the accuracy of human counting in at least one frame of a captured image in real time in a predefined area. The present invention detects humans in one or more frames by using at least one human detection modality for obtaining the characteristic result of the captured image. The invention further calculates an activity probability associated with each human detection modality. The characteristic results and the activity probability are selectively integrated by using a fusion technique for enhancing the accuracy of the human count and for selecting the most accurate human detection modality. The human counting is then performed based on the selection of the most accurate human detection modality.


Patent Information

Application #
3167/MUM/2011
Filing Date
09 November 2011
Publication Number
26/2013
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

TATA CONSULTANCY SERVICES LIMITED
NIRMAL BUILDING, 9TH FLOOR, NARIMAN POINT, MUMBAI 400021, MAHARASHTRA, INDIA.

Inventors

1. GUPTA ROHIT
TATA CONSULTANCY SERVICES, BENGAL INTELLIGENCE PARK, PLOT A2, M2 & N2, SECTOR V, BLOCK GP, SALT LAKE ELECTRONICS COMPLEX, KOLKATA-700091, WEST BENGAL, INDIA
2. SINHA ANIRUDDHA
TATA CONSULTANCY SERVICES, BENGAL INTELLIGENCE PARK, PLOT A2, M2 & N2, SECTOR V, BLOCK GP, SALT LAKE ELECTRONICS COMPLEX, KOLKATA-700091, WEST BENGAL, INDIA
3. PAL ARPAN
TATA CONSULTANCY SERVICES, BENGAL INTELLIGENCE PARK, PLOT A2, M2 & N2, SECTOR V, BLOCK GP, SALT LAKE ELECTRONICS COMPLEX, KOLKATA-700091, WEST BENGAL, INDIA
4. CHAKRAVORTI ARITRA
FLAT-201, APSARA-APARTMENT, 7, K.K. STREET, UTTARPARA, HOOGHLY-712258, WEST BENGAL, INDIA.

Specification

FORM 2
THE PATENTS ACT, 1970 (39 of 1970) & THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)

Title of invention: A SYSTEM AND METHOD FOR ENHANCING HUMAN COUNTING BY FUSING RESULTS OF HUMAN DETECTION MODALITIES

Applicant: TATA Consultancy Services Limited, a company incorporated in India under The Companies Act, 1956, having address: Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.

FIELD OF THE INVENTION

The present invention generally relates to the field of image processing, and particularly to a method and a system for enhancing the accuracy of human count in an image in real time. The present invention is an improvement of the invention described and claimed in the earlier Indian Patent Application No. 1359/MUM/2011.

BACKGROUND OF THE INVENTION

Detection of human activities in an image or video is of crucial importance, and it is critical to determine human presence for applications where automatic human body detection is a key enabler, such as security and surveillance, robotics, Intelligent Transport Systems, autonomous vehicles, automatic driver assistance systems, etc. Similarly, in computer vision systems, segmenting an image to detect objects in each segment and differentiating humans from other objects is still a challenge. The large number of visual patterns that appear in an image increases the complexity.

Human detection involves the ability of hardware and software to detect the presence of a human in an image. Detection of humans in an image is currently performed by using various human detection techniques and algorithms. Though such techniques and algorithms are widely used, the results they provide often contain a large number of false predictions. Many solutions have been proposed to address the problem of reducing the false predictions or errors associated with human detection and tracking techniques. One frequently followed approach is to combine a plurality of human detection techniques in order to detect humans in real time. However, the success of such a combination is affected by the error associated with each detection technique.

One such solution is disclosed in US 7,162,076 of Chengjun Liu, which teaches representing an image to be analyzed as a vector, with the DFA vector processed using a Bayesian classifier technique. Although the method discloses face detection with a relatively low probability of error and a low false detection rate, it remains silent on determining the accuracy of the solution when more than one technique or algorithm is involved.

Therefore, there is a need in the art for a solution capable of reducing the false predictions of the plurality of techniques available for human detection by determining the accuracy of all the techniques applied for detecting humans in an image.

OBJECTS OF THE INVENTION

It is the primary object of the invention to suggest a system and method that enhances the accuracy of human count in a human detection modality.

It is another object of the invention to suggest a system and method that reduces the non-reliable factors associated with the human detection modality.

It is yet another object of the invention to suggest a system and method that selects the most accurate human detection modality for counting a human in a frame of a captured image.
SUMMARY OF THE INVENTION

In one aspect, the present invention discloses a method for enhancing the accuracy of human counting in at least one frame of a captured image in real time in a predefined viewing area, wherein the said method comprises the processor-implemented steps of detecting humans in one or more frames by using at least one human detection modality for obtaining a characteristic result of the said captured image, and calculating an accuracy probability by switching between the obtained characteristic results of the human detection modalities and by using the pre-calculated activity probability. The activity probability is adapted to determine a variance in the detected human count in each frame. The said method further comprises the processor-implemented step of selectively integrating the obtained characteristic results for each frame from a combination of human detection modalities and the activity probability by using a selection technique for detecting the location of the human in the predefined viewing area. The combination of human detection modalities is based on a Bayesian fusion technique.

In another aspect, the present invention also discloses a system for enhancing the accuracy of human counting in at least one frame of a captured image in real time in a predefined viewing area, wherein the said system comprises a detection unit embedded with at least one modality component. The detection unit is configured to detect humans in cooperation with at least one human detection modality to obtain a characteristic result associated with the said captured image. The system further comprises a calculation module adapted to calculate an activity probability associated with each human detection modality. The activity probability determines a variance in the detected human count in each frame. The system further comprises a fusion processor adapted to selectively integrate the plurality of characteristic results obtained from each human detection modality for each frame.

BRIEF DESCRIPTION OF DRAWINGS

Figure 1 illustrates the architecture of the system in accordance with an embodiment of the invention.
Figure 2 illustrates the mechanism of human count in accordance with an alternate embodiment of the invention.
Figure 3 illustrates the process flow of accuracy calculation in accordance with an alternate embodiment of the invention.
Figure 4 illustrates an exemplary embodiment of the invention.
Figure 5 illustrates the results of detection accuracy in accordance with an exemplary embodiment of the invention.
Figure 6 illustrates the results for non-reliable factors in accordance with an alternate embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

Some embodiments of this invention, illustrating its features, will now be discussed. The words "comprising", "having", "containing", and "including", and other forms thereof, are intended to be equivalent in meaning and be open ended, in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that, as used herein and in the appended claims, the singular forms "a", "an", and "the" include plural references unless the context clearly dictates otherwise. Although any systems, methods, apparatuses, and devices similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred systems and parts are now described.
The disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms.

The present invention relates to a method and a system for enhancing the accuracy of human counting. Human counting is usually performed by using a plurality of human detection modalities. These human detection modalities, e.g. Histogram Oriented Gradient (HOG), Haar, and Background Subtraction (BGS), detect and track human images to determine their number. Each human detection modality is associated with certain non-reliable factors, for example fluctuation in video frames, improper detection of humans, false positives, etc. The present invention calculates an accuracy for each human detection modality in order to reduce these non-reliable factors. The reduction in non-reliable factors results in an enhanced accuracy of the human count, further enabling selection of the most accurate human detection modality.

In accordance with various aspects and embodiments of the present invention, the methods described herein are intended for operation as software programs (sets of programmed instructions) running on a computer processor.

In accordance with an aspect, referring to figure 1, the system (100) comprises an image capturing device (102) for capturing an image in a plurality of frames. The system (100) further comprises a detection unit (104) configured for detecting humans. The detection unit (104) is further embedded with at least one modality component (106) to apply at least one human detection modality for detection of humans in at least one frame. The human detection modality is applied to obtain the characteristic results associated with the captured image. The characteristic results include the pixel values of a grayscale image of the human. The human detection modality includes, and is not limited to, Haar, Histogram Oriented Gradient (HOG), Background Subtraction (BGS), or a combination thereof, as sketched below.

In accordance with an embodiment of the invention, the system (100) further comprises a tracking module (not shown in the figure) for tracking humans in one or more frames. The tracking module further processes the human image for tracking the human by differentiating it from the non-reliable factors present in the image.

The system (100) further comprises a calculation module (108) adapted to calculate an activity probability associated with each human detection modality. The activity probability provides a value of the fluctuation in each frame of the captured image for determining a variance in the detected human count in each frame. The calculation module (108) also calculates an accuracy probability for determining the accuracy of each human detection modality.

The system (100) further comprises a fusion processor (110) communicating with the detection unit (104) and the calculation module (108), adapted for selectively integrating the characteristic results associated with the image captured by the image capturing device (102) and the activity probability associated with each human detection modality. In accordance with an embodiment, the system (100) further comprises an accuracy enhancer (112) communicatively coupled with the fusion processor (110). The accuracy enhancer (112) functions with the fusion processor (110) to enhance the accuracy of the human count in the image.
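As a minimal sketch of the modality components introduced above, and assuming OpenCV as the underlying library, the three human detection modalities could be instantiated as follows. The detector choices, model files, and the characteristic_results helper are illustrative assumptions, not the patent's exact implementation.

```python
import cv2

# Illustrative stand-ins for the modality components (106): OpenCV's
# stock HOG person detector, Haar face cascade, and MOG2 background
# subtractor. Parameters are assumptions for illustration only.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
haar = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
bgs = cv2.createBackgroundSubtractorMOG2()

def characteristic_results(frame):
    """Per-modality characteristic results for one captured frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rects_hog, _ = hog.detectMultiScale(gray)   # human bodies (HOG)
    rects_haar = haar.detectMultiScale(gray)    # human faces (Haar)
    fg_mask = bgs.apply(frame)                  # changed pixels (BGS)
    return {"HOG": rects_hog, "Haar": rects_haar, "BGS": fg_mask}
```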
The accuracy enhancer (112) further comprises a module (116) which selectively integrates the characteristic result associated with each human detection modality and the activity probability associated with each human detection modality by using a regression model. The selective integration is performed to select the most accurate human detection modality. The system (100) further comprises a human counter (114) communicating with the accuracy enhancer (112). The human counter (114) is adapted to count humans in accordance with the selected human detection modality. The favorable human detection modality is the most accurate human detection modality selected by the accuracy enhancer (112) after the selective integration is performed.

In accordance with an embodiment, referring to figure 2, the image capturing device (102) captures an image of a human in a current frame and a previous frame. One or more features of the extracted image in the plurality of frames are compared to a threshold value. The threshold value is selected manually and helps in determining whether the image is an activity image, i.e. an image captured in an unstable frame. The activity is calculated by the calculation module (108) in terms of the activity probability. The human is then detected by the detection unit (104) in one or more frames by using at least one human detection modality.

Still referring to figure 2, as shown in step 202, the detection unit (104) applies foreground extraction for detecting humans. The detection unit (104) further applies Histogram Oriented Gradient (HOG) for detecting the human body. Cascading of linear SVMs is done for fast object detection, where the object refers to a human. The detection unit (104) further applies Haar feature extraction for detecting the human face. The background changes are detected by using Background Subtraction (BGS) (referring to Parent Application No. 1359/MUM/2011).

Again referring to figure 1, the calculation module (108) calculates an activity probability. The calculation module (108) further calculates an accuracy probability of each human detection modality by switching between the characteristic results obtained from the human detection modalities and by using the value of the pre-calculated activity probability. The activity probability determines a variance in the detected human count in each video frame.

Referring to figure 2, as shown in step 204, the fusion processor (110) generates a combination of the characteristic results obtained from the individual human detection modalities by using a selection technique. The selective integration of the characteristic results of the above combination is performed to detect the location of the human and to reduce the non-reliable factors associated with each human detection modality. In accordance with an embodiment, the selection technique used for combining the individual human detection modalities is a Bayesian fusion technique. The Bayesian fusion improves the classification performance of the human detection modalities. The individual human detection modalities (Haar, Histogram Oriented Gradient (HOG), Background Subtraction (BGS)) provide their own characteristic results. The classification system determines whether an object belongs to the Human class (H) by observing the activity probability associated with each human detection modality, as sketched below.
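A minimal sketch of the fusion step (204) follows, assuming each modality reports a posterior probability p(H/Zi) that the object is human. Under the independence assumption developed in equations (1) and (2) below, the per-modality posteriors combine through the common prior p(H); the fuse_posteriors helper and the example values are hypothetical.

```python
import numpy as np

def fuse_posteriors(posteriors, prior=0.5):
    """Consensus p(H/Z) from per-modality posteriors p(H/Zi).

    With independent observations, each likelihood p(Zi/H) is
    proportional to p(H/Zi)/p(H), so the fused posterior is
    proportional to p(H)**(1-n) * prod(p(H/Zi)); we normalise
    over the two outcomes H and not-H.
    """
    n = len(posteriors)
    h = prior ** (1 - n) * np.prod(posteriors)
    not_h = (1 - prior) ** (1 - n) * np.prod([1 - p for p in posteriors])
    return h / (h + not_h)

# e.g. HOG, BGS and Haar each report how likely the object is human:
p_human = fuse_posteriors([0.8, 0.6, 0.7])   # consensus p(H/Z)
```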
The Bayesian classifier of the Bayesian fusion technique fuses these with a prior p(H) to arrive at a global consensus posterior probability p(H/Z), where Z = Union of {Zi} for all i, p(H) is the prior probability of the class type H, and Z = {HOG, BGS, Haar}. The Histogram Oriented Gradient (HOG) classifier describes the posterior probability of the object belonging to the human class H by p(H/Z_HOG). Similarly, p(H/Z_BGS) and p(H/Z_Haar) are given by the other two human detection modalities. Assuming these information cues are of the same importance, the same confidence level of detection should be given to each in the information fusion process. The information fusion is addressed using a Bayesian modeling approach.

In accordance with an embodiment, referring to figure 3, the image capturing device (102) captures the image in a plurality of frames, for example in an old frame and a new frame. As shown in step 208, after taking the difference in the feature values of both frames, a matrix is prepared. The feature value is the pixel value of the image. The matrix is used for calculating the standard deviation and the mean of the pixel values. As shown in step 202, the detection unit (104) further applies a plurality of human detection modalities in one or more combinations. The combinations include, and are not limited to, a combination of Histogram Oriented Gradient (HOG), Background Subtraction and Haar, or a combination of Histogram Oriented Gradient (HOG) and Background Subtraction (BG).

Starting from the joint distribution and applying the conjunction rule recursively, the following decomposition is obtained:

p(H, Z1, Z2, ..., Zn) = p(H) . p(Z1/H) . p(Z2/H) . ... . p(Zn/H)    (1)

Equation (1) assumes that the observations from different human detection modalities are independent. For a multisensory system, it is reasonable to argue that the likelihoods from each information source, p(Zi/H), i = 1...n, are independent, since the only parameter they have in common is the state. The conditional probability defining the information fusion can then be written as:

p(H/Z1, Z2, ..., Zn) is proportional to p(H) . p(Z1/H) . p(Z2/H) . ... . p(Zn/H)    (2)

Again referring to figure 3, the characteristic results obtained from each human detection modality include the pixel values of the grayscale image. From these characteristic results a set of matrices can be formed, where the entries of each matrix are the pixel values of the grayscale image. As shown in step 210, the matrices are processed by the fusion processor (110) and the accuracy enhancer (112) for identifying those frames where significant activity took place. This gives a measure of activity for each frame. Significant activity arises if the pixel values change significantly from the previous frame; the matrix will then have entries of differences in pixel values and will be processed.

Again referring to figure 3, as shown in steps 212 and 214, the accuracy enhancer (112) enhances the accuracy by applying the regression model to the characteristic result obtained from one or more human detection modalities and the activity probability associated with each human detection modality. The first step is to choose the mean and the standard deviation of the pixel values as a source of information. These values of mean and standard deviation are taken as input covariates for applying the regression model to the activity probability, and the accuracy probability is then calculated by using this value of the activity probability. For the covariate vector CV = [mean, variance], let V = [1, mean, variance]; then the output y of the logistic regression has the following distribution: y = 1 with probability exp(a'V)/[1 + exp(a'V)], and y = 0 with probability 1/[1 + exp(a'V)],
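A minimal sketch of this activity-probability computation, assuming grayscale frames as NumPy arrays and the frame-difference mean and standard deviation as covariates. The coefficient vector alpha is a hypothetical placeholder for the parameters obtained by maximising the likelihood L(a) described next.

```python
import numpy as np

# Hypothetical fitted parameters [bias, weight_mean, weight_std];
# in the invention these come from maximum-likelihood estimation.
alpha = np.array([-4.0, 0.05, 0.10])

def activity_probability(old_frame, new_frame):
    """Logistic probability that the new frame is an unstable one."""
    diff = np.abs(new_frame.astype(float) - old_frame.astype(float))
    V = np.array([1.0, diff.mean(), diff.std()])  # V = [1, mean, std]
    return 1.0 / (1.0 + np.exp(-(alpha @ V)))     # exp(a'V)/(1+exp(a'V))

# Per the example in the text, a value above 0.7 flags an activity frame.
```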
where a is the vector parameter of the model to be estimated. Let there be a sample of size k for which the output values are predetermined; this predetermination can be a manual determination. The yi is known for i = 1, 2, 3, ..., k. A likelihood function L(a), which is a function of a, is given by:

L(a) = Product over i = 1 to k of [exp(a'Vi)/(1 + exp(a'Vi))]^yi . [1/(1 + exp(a'Vi))]^(1-yi)

The likelihood function L(a) is maximized with respect to a to obtain an estimate of a as the value maximizing L(a). With the help of these parameters, the values of the activity probabilities are calculated. These values provide a measure of activity. By way of specific example, a probability value greater than 0.7 indicates the desired unstable frame. With the help of this activity probability, the calculation module (108) calculates the accuracy probability for determining the accuracy of each human detection modality.

Again referring to figure 3, as shown in step 212, the input to the regression model is the output of the human detection modalities. By way of specific example, there is the output of three human detection modalities at frame level: for each frame there are outputs X1 = Haar, X2 = HOG + BG and X3 = HOG + BG + Haar. All of these Xi are categorical variables taking integer values. The other input is the calculated activity probability P for each human detection modality. As the covariates, a set of independent contrasts is picked based on the inputs, and the activity probability is taken unchanged as a covariate. In other words, the covariates are:

CV1 = X2 - X1; CV2 = X3 - X1; CV3 = P.

Let D be the set of all probability distributions over these algorithms, and N the set of all possible values of the covariates. A model is a function f : N -> D. The best element in the class of these functions is chosen. It is evident that the best element in that class is the regression model whose output is closest to the ground truth data with probability value 1 (manually determined ground truth is available for a small sample). Let us consider a special class of functions f(CV) = g(a'V), where V is an elementary transformation of CV. The optimum value of the matrix a is determined for which the regression model gives the best performance on the observed data or sample available. It is generally considered that g is a convex smooth function (that is, a function having a non-zero positive derivative up to a certain order), for example a vector of logistic functions of the individual rows of a'V.

Let CVi take ni values ki,1, ki,2, ki,3, ..., ki,ni. Then for the i-th covariate, ni - 1 indicator variables Ii,1, Ii,2, Ii,3, ..., Ii,ni-1 are introduced, as Ii,j = Ind(CVi = ki,j) for i = 1, 2; j = 1, 2, 3, ..., ni - 1. The transformed vector is then defined as follows:

V = (1, I1,1, I1,2, ..., I1,n1-1, I2,1, I2,2, ..., I2,n2-1, CV3)

The ground truth data provides a sample of desired output vectors y = (y1, y2, y3), where one of the yi is 1 and the rest are zeros. From the frame-level values, a function of the parameters is constructed and maximized with respect to the parameters to obtain the regression model. Let us assume g to be a vector of logistic functions, and let V have m elements. It is considered that a = [a1, a2], where a1 and a2 are vectors of length m. Then g(a'V) = g([a1, a2]'V) = [p1(a'V), p2(a'V), p3(a'V)]', where the pi are defined as:

p1(a'V) = exp(a1'V) / [1 + exp(a1'V) + exp(a2'V)];
p2(a'V) = exp(a2'V) / [1 + exp(a1'V) + exp(a2'V)]; and
p3(a'V) = 1 / [1 + exp(a1'V) + exp(a2'V)].
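A minimal sketch of this three-class logistic (softmax) selection, using the raw contrasts CV1, CV2 and the activity probability P directly as covariates (the full indicator-variable transform of V is omitted for brevity). The select_modality helper and the parameter vectors a1, a2 are hypothetical placeholders for the maximum-likelihood estimates obtained from the ground-truth sample.

```python
import numpy as np

def select_modality(X1, X2, X3, P, a1, a2):
    """Return (p1, p2, p3) and the index of the selected modality.

    X1, X2, X3 are the frame-level outputs of Haar, HOG+BG and
    HOG+BG+Haar; P is the activity probability for the frame.
    """
    V = np.array([1.0, X2 - X1, X3 - X1, P])  # simplified covariate vector
    e1, e2 = np.exp(a1 @ V), np.exp(a2 @ V)
    denom = 1.0 + e1 + e2
    probs = np.array([e1, e2, 1.0]) / denom   # p1, p2, p3 as defined above
    return probs, int(np.argmax(probs))       # most accurate modality
```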

Documents

Orders

Section Controller Decision Date

Application Documents

# Name Date
1 3167-MUM-2011-RELEVANT DOCUMENTS [30-09-2023(online)].pdf 2023-09-30
2 Form 3 [22-12-2016(online)].pdf 2016-12-22
3 3167-MUM-2011-OTHERS [10-07-2018(online)].pdf 2018-07-10
4 3167-MUM-2011-US(14)-HearingNotice-(HearingDate-02-12-2020).pdf 2021-10-03
5 3167-MUM-2011-IntimationOfGrant30-07-2021.pdf 2021-07-30
6 3167-MUM-2011-FER_SER_REPLY [10-07-2018(online)].pdf 2018-07-10
7 3167-MUM-2011-PatentCertificate30-07-2021.pdf 2021-07-30
8 3167-MUM-2011-DRAWING [10-07-2018(online)].pdf 2018-07-10
9 3167-MUM-2011-Written submissions and relevant documents [15-12-2020(online)].pdf 2020-12-15
10 3167-MUM-2011-COMPLETE SPECIFICATION [10-07-2018(online)].pdf 2018-07-10
11 3167-MUM-2011-Correspondence to notify the Controller [01-12-2020(online)].pdf 2020-12-01
12 3167-MUM-2011-CLAIMS [10-07-2018(online)].pdf 2018-07-10
13 3167-MUM-2011-FORM-26 [01-12-2020(online)].pdf 2020-12-01
14 3167-MUM-2011-ABSTRACT [10-07-2018(online)].pdf 2018-07-10
15 ABSTRACT1.jpg 2018-08-10
16 3167-MUM-2011-Response to office action [01-12-2020(online)].pdf 2020-12-01
17 3167-MUM-2011-ABSTRACT.pdf 2018-08-10
18 3167-MUM-2011-FORM 3.pdf 2018-08-10
19 3167-MUM-2011-CLAIMS.pdf 2018-08-10
20 3167-MUM-2011-FORM 26(6-2-2012).pdf 2018-08-10
21 3167-MUM-2011-CORRESPONDENCE(6-2-2012).pdf 2018-08-10
22 3167-MUM-2011-FORM 2.pdf 2018-08-10
23 3167-MUM-2011-CORRESPONDENCE(7-12-2011).pdf 2018-08-10
24 3167-MUM-2011-FORM 2(TITLE PAGE).pdf 2018-08-10
25 3167-MUM-2011-CORRESPONDENCE.pdf 2018-08-10
26 3167-MUM-2011-FORM 18.pdf 2018-08-10
27 3167-MUM-2011-DESCRIPTION(COMPLETE).pdf 2018-08-10
28 3167-MUM-2011-FORM 1.pdf 2018-08-10
29 3167-MUM-2011-DRAWING.pdf 2018-08-10
30 3167-MUM-2011-FORM 1(7-12-2011).pdf 2018-08-10
31 3167-MUM-2011-FER.pdf 2018-08-10

Search Strategy

1 SearchQueries_02-01-2018.pdf