Abstract: The present application provides a system and method for tracking multiple faces, characterized by a reasoning process with a set of computationally simple event predicates in at least two frames of video or at least two images of image sequences; a system and method for tracking multiple faces in video or image sequences which: do not fail to track the faces due to severe occlusion; do not fail to track the faces due to color matches with the background of the image or frame of the video; and do not fail to track the faces due to isolation (unoccluded state), entry/exit, reappearance and disappearance of the face.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
A SYSTEM AND METHOD FOR TRACKING THE MULTIPLE FACES WITH APPEARANCE MODES AND REASONING PROCESS
Applicant:
Tata Consultancy Services Limited A company Incorporated in India under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021.
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
FIELD OF THE INVENTION
The present application relates to a system and method for image information processing. More particularly, the application relates to a system and method for tracking multiple faces with appearance modes and a reasoning process.
BACKGROUND OF THE INVENTION
Multiple face tracking plays a key role in applications related to security-surveillance, human-computer interaction, video indexing, etc. In order to track multiple faces, existing approaches propose extending the target region (tracking the torso along with the face), motion model guidance, etc., to overcome occlusion. Further, most existing works have concentrated on proposing better facial appearance features to localize the faces, which however fail under severe occlusions and color matches with the background.
Thus, in light of the above-mentioned background art, it is evident that there is a need for a system and method for tracking multiple faces in video or image sequences which:
• do not fail to track the faces due to severe occlusion;
• do not fail to track the faces due to color matches with background of the image or frame of the video; and
• do not fail to track the faces due to isolation (unoccluded), entry/exit, reappearance and disappearance of the face.
OBJECTS OF THE INVENTION
The principal object is to provide a system and method for tracking multiple faces, characterized by a reasoning process with a set of computationally simple event predicates in at least two frames of video or at least two images of image sequences.
Another significant object is to provide a system and method which do not fail to track the faces due to severe occlusion.
Still another object is to provide a system and method which do not fail to track the faces due to color matches with background of the image or frame of the video.
Yet another object is to provide a system and method which do not fail to track the faces due to isolation, entry/exit, reappearance and disappearance of the face.
SUMMARY OF THE INVENTION
Before the present systems and methods are described, it is to be understood that this application is not limited to the particular systems and methodologies described, as there can be multiple possible embodiments which are not expressly illustrated in the present disclosures. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present application.
The present application provides a system and method for tracking multiple faces, characterized by a reasoning process with a set of computationally simple event predicates in at least two frames of video or at least two images of image sequences.
A method for multiple face tracking is characterized by a reasoning process with a set of computationally simple event predicates in at least two frames of video or at least two images of image sequences, the method comprising the following machine implemented steps.
Initially, a real-time image sequence or video associated with at least two subjects is acquired by employing at least one sensor, wherein the said images or video, captured in an indoor or outdoor environment, are associated with the at least two human subjects. In one exemplary embodiment, the subject is human. In one exemplary embodiment, the sensor is a color camera using which color image sequences or color videos are acquired.
Upon acquiring the image sequences or videos, a processor is configured with the sensor for analyzing the captured image sequences or video in real-time for tracking the multiple faces, characterized by a reasoning process with a set of computationally simple event predicates in at least two frames of video or at least two images of image sequences, in order to provide applications related to security-surveillance, human-computer interaction, video indexing, etc.
In one exemplary embodiment, upon acquiring the image sequence or frames of video associated with at least two subjects, the face region of at least one subject in at least one profile or pose is detected in a first frame of the video or first image of the image sequences by employing at least one face detector, and the detected face region is subsequently tracked in subsequent frames of video or images of image sequences using motion prediction initialized mean-shift iterations by employing at least one face tracker. In one exemplary embodiment, the face region of at least one subject in at least one profile or pose in a first frame of the video or first image of the image sequences is detected using Haar feature based boosted classifiers by employing at least one face detector. In another embodiment, the face detector can be any of the generally known face detectors. In one exemplary embodiment, the profile or pose comprises profiles falling between 0 degrees and 180 degrees from left to right or vice versa. In a preferred embodiment, the profile or pose is the front profile.
Upon tracking the face region, the tracked face region belonging to the particular subject is normalized to avoid illumination effects and subsequently clustered to discover multiple appearance modes in various facial poses of the particular subject.
In another embodiment, a reasoning process is performed by analyzing the fractional overlaps between tracked and detected face regions to identify a face of the particular subject in one of the states of isolation (no occlusion), grouping with other faces, disappearance (near complete occlusion) or new face detection, track loss and re-appearance/entry/exit. In one exemplary embodiment, the re-appearances of faces are detected using a Normalized Face Cluster Set (NFCS). In one exemplary embodiment, the NFCS is formed by clustering the normalized faces obtained from the different facial poses to learn the modes of appearances of the subject. In another exemplary embodiment, the NFCS captures the different appearance modes for varying facial poses of the subject and is used for discriminating new faces from the existing faces while restoring the face tracks.
In another embodiment, the features of the face region of the particular subject are updated automatically by employing the respective face tracker, or the face tracks of the subjects are added/removed automatically by employing the respective face tracker, upon the identification of one of the states of the face of the particular subject, which are used as cues while detecting re-appearance of the face of the particular subject.
In another embodiment, either the updated facial features of the face region of the particular subject or the face track restoration after re-appearance of the face is verified using Gaussian Mixture Models (GMMs). In one exemplary embodiment, the GMM identifies the facial appearance modes of each subject.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing summary, as well as the following detailed description of preferred embodiments, is better understood when read in conjunction with the appended drawings. There is shown in the drawings example embodiments, however, the application is not limited to the specific system and method disclosed in the drawings.
Figure 1 shows the face representation scheme according to one exemplary embodiment of the invention.
Figure 2 shows results of multiple face tracking under occlusions according to one exemplary embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
Some embodiments, illustrating its features, will now be discussed in detail. The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any methods and systems similar or equivalent to those described herein can be used in the practice or testing of embodiments, the preferred methods and systems are now described. The disclosed embodiments are merely exemplary.
The present application provides a system and method for tracking multiple faces, characterized by a reasoning process with a set of computationally simple event predicates in at least two frames of video or at least two images of image sequences.
Figure 1 shows the face representation scheme according to one exemplary embodiment of the invention. A method for multiple face tracking is characterized by a reasoning process with a set of computationally simple event predicates in at least two frames of video or at least two images of image sequences, the method comprising the following machine implemented steps.
Initially, a real-time image sequence or video associated with at least two subjects is acquired by employing at least one sensor, wherein the said images or video, captured in an indoor or outdoor environment, are associated with the at least two human subjects. In one exemplary embodiment, the subject is human. In one exemplary embodiment, the sensor is a color camera using which color image sequences or color videos are acquired.
Upon acquiring the image sequences or videos, a processor is configured with the sensor for analyzing the captured image sequences or video in real-time for tracking the multiple faces, characterized by a reasoning process with a set of computationally simple event predicates in at least two frames of video or at least two images of image sequences, in order to provide applications related to security-surveillance, human-computer interaction, video indexing, etc.
In one exemplary embodiment, upon acquiring the image sequence or frames of video associated with at least two subjects, the face region of at least one subject in at least one profile or pose in a first frame of the video or first image of the image sequences is detected by employing at least one face detector. In one exemplary embodiment, the face region of at least one subject in at least one profile or pose in a first frame of the video or first image of the image sequences is detected using Haar feature based boosted classifiers by employing at least one face detector. In another embodiment, the face detector can be any of the generally known face detectors.
In one exemplary embodiment, Haar feature based face detectors (available with the OpenCV software) are used to segment the regions of left/right profile or frontal faces in the image sequences. However, these detectors are extremely sensitive to the facial pose. Thus, although they are very accurate in detecting faces in left/right profile or frontal poses, they fail when the facial pose changes. It is also not practical to use a large number of detectors, each tuned to a different face orientation, as that would lead to both high memory and processor usage. Thus, reducing detection to a local neighborhood search guided by face features is advantageous to satisfy real-time constraints. Such a necessity is met by the procedure of tracking.
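By way of illustration only, the following is a minimal sketch of such Haar cascade based detection using the stock cascade files shipped with OpenCV; the function name detect_faces and the detection parameters are assumptions for the sketch, not part of the specification.

```python
import cv2

# Stock OpenCV Haar cascades for frontal and profile faces; the
# cv2.data.haarcascades path ships with the opencv-python package.
frontal = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_profileface.xml")

def detect_faces(bgr_image):
    """Return bounding rectangles (x, y, w, h) of detected face regions."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    boxes = list(frontal.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5))
    # The profile cascade covers one side only; running it on a horizontally
    # flipped image would approximate the opposite profile.
    boxes += list(profile.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5))
    return boxes
```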
In the next step of the proposed method, the detected face region in subsequent frames of video or images of image sequences is tracked using motion prediction initialized mean-shift iterations by employing at least one face tracker, wherein the face tracker uses tracking algorithms which constitute schemes for "object representation" and a procedure for "inter-frame object region correspondence". Additionally, a reasoning process/method is adapted to handle different face trackers in cases involving multiple subjects.
In one exemplary embodiment, the profile or pose comprises profiles falling between 0 degrees and 180 degrees from left to right or vice versa. In a preferred embodiment, the profile or pose is the front profile.
In another embodiment, a set of facial features is initially computed from the face region detected by one of the profile/frontal face detectors and is updated whenever the face is detected next, isolated from other detected/tracked face regions. The face bounding rectangle, motion history, color distribution and a mixture model learned on normalized face appearances are used as features for representing the face.
In another embodiment, the location of the face F in the image is identified by the face bounding rectangle BR(F) with sides parallel to the image axes. A second order motion model (constant jerk) is used, continuously updated from the 3 consecutive centroid positions of BR(F). Using this model, the centroidal position C_t(F) at the t-th instant is predicted as C_t(F) = 2.5C_{t-1}(F) - 2C_{t-2}(F) + 0.5C_{t-3}(F). The color distribution H(F) of the face F is computed as a normalized color histogram, position weighted by the Epanechnikov kernel supported over the maximal elliptical region BE(F) (centered at C(F)) inscribed in BR(F). Mean-shift iterations, initialized from the motion model predicted position, converge to localize the target face region in the current image. The mean-shift tracking algorithm maximizes the Bhattacharyya coefficient between the target color distribution H(F) and the color distribution computed from the localized region at each step of the iterations. The maximum Bhattacharyya coefficient obtained after the mean-shift tracker convergence is used as the tracking confidence tc(F) of the face F. This color based representation is combined with an appearance model to encode the structural information of the face. The RGB image region within BR(F) is first resized and then converted to a q x q monochrome image which is further normalized by its brightest pixel intensity to form the normalized face image nF of the face F.
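By way of illustration only, the following is a minimal sketch of the motion prediction and face normalization defined above; the helper names and the choice q = 20 are assumptions for the sketch.

```python
import cv2
import numpy as np

def predict_centroid(c1, c2, c3):
    """Predict C_t(F) = 2.5*C_{t-1}(F) - 2*C_{t-2}(F) + 0.5*C_{t-3}(F)
    from the three previous centroids c1, c2, c3 of BR(F)."""
    return 2.5 * np.asarray(c1) - 2.0 * np.asarray(c2) + 0.5 * np.asarray(c3)

def normalize_face(bgr_region, q=20):
    """Resize the face region to q x q, convert to monochrome and normalize
    by the brightest pixel intensity to form the normalized face image nF."""
    gray = cv2.cvtColor(cv2.resize(bgr_region, (q, q)), cv2.COLOR_BGR2GRAY)
    gray = gray.astype(np.float32)
    return gray / max(float(gray.max()), 1.0)
```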
Upon tracking the face region, the tracked face region belonging to the particular subject is normalized to avoid illumination effects as shown in Figure 1, and subsequently clustered to discover multiple appearance modes in various facial poses of the particular subject.
In another embodiment, a reasoning process is performed by analyzing the fractional overlaps between tracked and detected face regions to identify a face of the particular subject in one of the states of isolation (no occlusion), grouping with other faces, disappearance (near complete occlusion) or new face detection, track loss and re-appearance/entry/exit. In one exemplary embodiment, the re-appearances of faces are detected using a Normalized Face Cluster Set (NFCS(F)). During the course of tracking, a person appears with various facial poses; the NFCS is formed by clustering the normalized faces obtained from the different facial poses to learn the modes of appearances of the subject. In another exemplary embodiment, the NFCS captures the different appearance modes for varying facial poses of the subject and is used for discriminating new faces from the existing faces while restoring the face tracks.
The normalized face image nF is re-arranged in a row-major format to generate the d = q x q dimensional feature vector X(nF). To achieve computational gain, the individual dimensions of the feature vector are assumed to be uncorrelated and hence a diagonal co-variance matrix is sufficient to approximate the spread of the component Gaussians. A distribution over these feature vectors is approximated by learning a variant of the Gaussian mixture models where a set of normalized face clusters is constructed.
In one exemplary embodiment, the NFCS with K clusters is given by the set NFCS = {(µ_r, σ_r, π_r); r = 1, ..., K}, where µ_r and σ_r are the respective mean and standard deviation vectors of the r-th cluster and the weighing parameter π_r is the fraction of the total number of normalized face vectors belonging to the r-th cluster. The NFCS initializes with µ_1 = X(nF_1), an initial standard deviation vector σ_1 = σ_init and π_1 = 1.0.
Let there be K_{t-1} clusters in the NFCS until the processing of the vector X(nF_{t-1}). We define the belongingness function B_r(u) for the u-th dimension of the r-th cluster as B_r(u) = 1 if |X(nF_t)(u) - µ_r(u)| <= λσ_r(u) and B_r(u) = 0 otherwise, where λ is typically chosen between 2.5 and 5.0 (Chebyshev's inequality). The vector X(nF_t) is considered to belong to the r-th cluster if the number of dimensions violating the belongingness test is at most η_mv x d (equation 2), where η_mv ∈ (0, 1), such that η_mv x d denotes the upper limit of tolerance on the number of membership violations in the normalized face vector. If X(nF_t) belongs to the r-th cluster, then its parameters are updated as µ_r(u) ← (1 - α)µ_r(u) + αX(nF_t)(u) and σ_r(u)² ← (1 - α)σ_r(u)² + α(X(nF_t)(u) - µ_r(u))² for every dimension u with B_r(u) = 1, while the cluster weight is reinforced as π_r ← (1 - α)π_r + α (equation 5). For all other
clusters r' ≠ r, the mean and standard deviation vectors remain unchanged while the cluster weight π_{r'} is penalized as π_{r'} ← (1 - α)π_{r'}. However, if X(nF_t) is not found to belong to any existing cluster, a new cluster is formed (K_t = K_{t-1} + 1) with its mean vector as X(nF_t), standard deviation vector as σ_init and a small initial weight; the weights of the existing clusters are penalized as mentioned before.
It is worth noting that the parameter updates in equation 5 match traditional Gaussian Mixture Model (GMM) learning. In GMMs, all the dimensions of the mean vector are updated with the incoming data vector. However, here the mean and standard deviation vector dimensions are updated selectively with membership checking to resist the fading out of the mean images. Hence, the NFCS is called a variant of the mixture of Gaussians. Figure 1 shows a few mean images of the normalized face clusters learned from the tracked face sequences of the subject.
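By way of illustration only, the following is a minimal sketch of the NFCS learning described above; the constants lam, eta_mv, sigma_init and alpha, and the weight given to a newly spawned cluster, are assumptions standing in for values the specification leaves open.

```python
import numpy as np

class NFCS:
    """Normalized Face Cluster Set: a membership-checked variant of GMM
    learning over d-dimensional normalized face vectors X(nF)."""

    def __init__(self, lam=3.0, eta_mv=0.1, sigma_init=0.1, alpha=0.05):
        self.lam, self.eta_mv = lam, eta_mv            # lambda, eta_mv (assumed)
        self.sigma_init, self.alpha = sigma_init, alpha
        self.mu, self.sigma, self.pi = [], [], []      # one entry per cluster

    def belongs(self, x, r):
        """Membership test of equation 2: at most eta_mv * d dimensions may
        violate |x(u) - mu_r(u)| <= lam * sigma_r(u)."""
        viol = np.abs(x - self.mu[r]) > self.lam * self.sigma[r]
        return viol.sum() <= self.eta_mv * x.size

    def update(self, x):
        """Assign x to a matching cluster (selective update) or spawn one."""
        for r in range(len(self.mu)):
            if self.belongs(x, r):
                b = np.abs(x - self.mu[r]) <= self.lam * self.sigma[r]
                # selective update: only dimensions passing the belongingness
                # test are blended, resisting fade-out of the mean images
                self.mu[r][b] += self.alpha * (x[b] - self.mu[r][b])
                self.sigma[r][b] = np.sqrt(
                    (1 - self.alpha) * self.sigma[r][b] ** 2
                    + self.alpha * (x[b] - self.mu[r][b]) ** 2)
                self.pi[r] = (1 - self.alpha) * self.pi[r] + self.alpha
                for s in range(len(self.pi)):   # penalize the other clusters
                    if s != r:
                        self.pi[s] *= (1 - self.alpha)
                return r
        # no matching cluster: form a new one around x (K_t = K_{t-1} + 1)
        self.mu.append(np.asarray(x, dtype=np.float64).copy())
        self.sigma.append(np.full(x.size, self.sigma_init))
        for s in range(len(self.pi)):
            self.pi[s] *= (1 - self.alpha)
        self.pi.append(self.alpha if self.pi else 1.0)  # pi_1 = 1.0; later
        return len(self.mu) - 1                         # clusters start small
```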
In another embodiment, the features of the face region of the particular subject are updated automatically by employing the respective face tracker, or the face tracks of the subjects are added/removed automatically by employing the respective face tracker, upon the identification of one of the states of the face of the particular subject, which are used as cues while detecting re-appearance of the face of the particular subject.
In one embodiment, tracking multiple faces is not merely the implementation of multiple trackers but a reasoning scheme that binds the individual face trackers to act according to problem case based decisions. For example, consider the case of tracking a face which gets occluded by another subject (object). A straight through tracking approach will try to establish correspondences even when the target face disappears from the image due to complete occlusion by some scene object, leading to tracking failure. A reasoning scheme, on the other hand, will identify the problem situation of the disappearance due to the occlusion of the face and will accordingly wait for the face to reappear by freezing the concerned tracker. The proposed method for multiple face tracking employs a reasoning scheme to identify the cases of face grouping/isolation along with the scene entry/exit of new/existing faces.
The process of reasoning is performed over three sets, viz. the sets of active, passive and detected faces. The active set F_a(t) consists of the faces that are well tracked until the t-th instant. On the other hand, the passive set F_p(t) contains the objects for which either the face tracker has lost track or which are not visible in the scene. The set of detected faces F_d(t) contains the faces detected in the t-th frame. The face tracker initializes itself with empty active/passive/detected face sets and the objects are added or removed accordingly as they enter or leave the field of view. During the process of reasoning, the objects are often switched between the active and passive sets as the track is lost or restored. The process of reasoning is started at the t-th frame based on the active/passive face sets available from the (t - 1)-th instant. The faces in the active set are first localized with motion prediction initialized mean-shift trackers. The extent of overlap between the tracked face regions from the active set and the detected face regions is computed to identify the isolation/grouping state of the faces. The reasoning process based on the tracked-detected region overlaps is described below.
Consider the case where m faces are detected (F_d = {dF_j : j = 1, ..., m}) while n faces were actively tracked till the last frame (F_a = {aF_i : i = 1, ..., n}). To analyze the correspondence between the tracked face region and the detected face area, we define the fractional overlap between the faces F1 and F2 as FO(F1, F2) = Area(BR(F1) ∩ BR(F2)) / Area(BR(F1)),
which signifies the fraction of the bounding rectangle of F1 overlapped with that of F2. The actively tracked face aF_i and the detected face dF_j are considered to have significant overlap if either of their fractional overlaps with the other crosses a certain threshold η_ad (equation 7).
Let S_df(i) denote the set of detected faces which have significant overlap with the face aF_i in the active set, and let S_af(j) represent the set of faces in the active set which have significant overlap with the detected face dF_j (equation 9).
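By way of illustration only, the following is a minimal sketch of the fractional overlap test and the sets S_df(i)/S_af(j) defined above; boxes are (x, y, w, h) tuples and the default value for η_ad is an assumption.

```python
def area(b):
    """Area of a bounding rectangle b = (x, y, w, h)."""
    return b[2] * b[3]

def intersection_area(b1, b2):
    """Area of the intersection of two bounding rectangles."""
    w = min(b1[0] + b1[2], b2[0] + b2[2]) - max(b1[0], b2[0])
    h = min(b1[1] + b1[3], b2[1] + b2[3]) - max(b1[1], b2[1])
    return max(w, 0) * max(h, 0)

def fractional_overlap(b1, b2):
    """FO(F1, F2): fraction of BR(F1) covered by BR(F2)."""
    return intersection_area(b1, b2) / float(area(b1))

def overlaps(b1, b2, eta_ad=0.1):
    """Significant overlap: either fractional overlap crosses eta_ad."""
    return (fractional_overlap(b1, b2) >= eta_ad
            or fractional_overlap(b2, b1) >= eta_ad)

def overlap_sets(active, detected, eta_ad=0.1):
    """S_df(i): detections overlapping active face i; S_af(j): the converse."""
    S_df = {i: {j for j, d in enumerate(detected) if overlaps(a, d, eta_ad)}
            for i, a in enumerate(active)}
    S_af = {j: {i for i, a in enumerate(active) if overlaps(a, d, eta_ad)}
            for j, d in enumerate(detected)}
    return S_df, S_af
```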
Based on the cardinalities of these sets associated with either of aF_i/dF_j and the tracking confidence tc(aF_i), the following situations are identified during the process of tracking.
• Isolation and Feature Update - The face aF_i is considered to be isolated if it does not overlap with any other face in the active set: ∀r ≠ i, ¬Overlaps(aF_i, aF_r); aF_i, aF_r ∈ F_a. Under this condition of isolation of the tracked face, its features are updated if there exists a pair (aF_i, dF_k) which significantly overlap only with each other and none else: ∃k Overlaps(aF_i, dF_k) ∧ |S_df(i)| = 1 ∧ |S_af(k)| = 1. In such a case, the face boundaries, on account of the face detection success, are more accurate and thus the color distribution and the motion features of aF_i are updated from the associated (detected) face dF_k.
• Face Grouping - The face is considered to be in a group (e.g. multiple persons with overlapping face regions) if the bounding rectangles of the tracked faces overlap. In such a case, even if a single detected face dF_k is associated to aF_i, the correspondence is less reliable on account of multiple overlaps. Thus, in this case only the motion model of aF_i is updated based on its currently tracked position.
• Detection and/or Tracking Failure - This is the case where, due to facial pose variations, the detector fails to detect the presence of the face in the image. However, if the face aF_i is tracked well (tc(aF_i) > η_tc), only the motion model of aF_i is updated. In the absence of a confident face detection, the exact face boundaries are not accurate and hence the color distribution is not updated. However, in the case of both detection and tracking failure, aF_i is not associated with any detected face and the tracking confidence also drops below the threshold (η_tc). In this case, aF_i is considered to have disappeared from the scene (equation 10) and is transferred from F_a to F_p.
• New Face Identification - A new face in the scene does not overlap with any of the bounding rectangles of the existing (tracked) faces. Thus, dF_j is considered a new face if S_af(j) is a null set.
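By way of illustration only, the following is a minimal sketch of the per-face reasoning over the four situations listed above, reusing overlaps() and the S_df/S_af sets from the earlier sketch; the returned action labels are illustrative placeholders, not terms from the specification.

```python
def reason(i, active_boxes, S_df, S_af, tc, eta_tc=0.6):
    """Decide the update action for the actively tracked face aF_i."""
    isolated = all(not overlaps(active_boxes[i], active_boxes[r])
                   for r in range(len(active_boxes)) if r != i)
    if isolated and len(S_df[i]) == 1:
        k = next(iter(S_df[i]))
        if len(S_af[k]) == 1:
            # isolation: the detection is trusted, so refresh the color
            # distribution and motion features from dF_k
            return ("UPDATE_FEATURES", k)
    if not isolated:
        # grouping: detected boundaries are unreliable, update motion only
        return ("UPDATE_MOTION_ONLY", None)
    if tc[i] > eta_tc:
        # detection failed but the track is still confident
        return ("UPDATE_MOTION_ONLY", None)
    # both detection and tracking failed: transfer aF_i from F_a to F_p
    return ("MOVE_TO_PASSIVE", None)
```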
The proposed tracker might lose track of an existing face, whose re-appearance would then be detected as the occurrence of a new one. Hence, the newly detected face region is normalized first and checked against the NFCS of the faces in F_p using the belongingness criterion outlined in equation 2. If a match is found, the track of the corresponding face is restored by moving it from F_p to F_a and its color and motion features are re-initialized from the newly detected face region. However, if no match is found, a new face is added to F_a whose color and motion features are learned from the newly detected face region.
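By way of illustration only, the following is a minimal sketch of this reappearance check, reusing the NFCS class from the earlier sketch; the dict-based face records and the helper name are assumptions.

```python
def handle_new_detection(x_nf, passive, active):
    """Restore a passive face whose NFCS claims the normalized detection
    vector x_nf; otherwise start a fresh track for a new face."""
    for face in list(passive):
        nfcs = face["nfcs"]
        if any(nfcs.belongs(x_nf, r) for r in range(len(nfcs.mu))):
            passive.remove(face)     # track restored: move from F_p to F_a
            active.append(face)      # (color/motion features re-initialized)
            return face
    new_face = {"nfcs": NFCS()}      # no match: a genuinely new face
    new_face["nfcs"].update(x_nf)
    active.append(new_face)
    return new_face
```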
During the course of multiple object tracking, the faces in the active set are identified in one of the above situations and the feature update or active to passive set transfer
decisions are taken accordingly. By reasoning with these conditions, new trackers are initialized as new faces enter the scene and destroyed as the faces disappear.
In another embodiment, either the updated facial features of the face region of the particular subject or the face track restoration after re-appearance of the face is verified using Gaussian Mixture Models (GMMs). In one exemplary embodiment, the GMM identifies the facial appearance modes of each subject.
BEST METHOD
The present application is described in the example given below which is provided only to illustrate the application and therefore should not be construed to limit the scope of the application.
The proposed methodology is tested offline on the following 4 different sets of image sequences: from the movies "300" (624 images) and "Sherlock Holmes" (840 images), the "House" TV series, Season 8 (495 images), and a sequence of 1116 images recorded in the lab with a webcam. The fractional overlap threshold is empirically chosen as η_fo = 0.1 and the tracking confidence threshold as η_tc = 0.6. The results of multiple face tracking in these videos are shown in Figure 2. The proposed approach for multiple face tracking is implemented on a single core 1.6 GHz Intel Pentium-4 PC with semi-optimized coding and operates at 13.33 FPS (face detection stage included).
Performance Analysis - An object centric performance analysis is presented by manually inspecting the surveillance log for computing the average rates of tracking precision and track switches. A tracker initialized over a certain face may eventually lose it on account of occlusions and may switch track to some other face(s) until an
exit event. The tracking is considered to be successful if the localized region has a non-zero overlap with the actual face region in the image. Consider the case of a tracker with a life span of T frames, of which for the first T_trk frames the tracker successfully tracks the same face over which it is initialized, and then successively switches track to N_switch number of (different) face(s) during the remaining T - T_trk
frames. The tracking precision of an individual subject (object) is then defined as P = T_trk / T,
and the average tracking precision computed over the entire set of extracted faces is called the Tracking Success Rate for the entire video. In the same line, the Tracker Failure Rate is evaluated as the average number of track switches over the entire set of extracted objects. After a track switch, from the (T_trk + 1)-th frame onwards, a different tracker may pick up the trail of this object through a track switch from some other face or through the initialization of a new tracker; let there be N_reinit number of tracker re-initializations on some face region. The Tracker Re-initialization Rate is defined as the average number of tracker re-initializations per face computed over the entire set of extracted faces. The performance of the tracking algorithm with respect to these measures is presented in Table 1.
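By way of illustration only, the following is a minimal sketch of these measures, computing the per-tracker precision T_trk / T and the three averaged rates reported in Table 1; the record format is an assumption.

```python
def performance(trackers):
    """Each record: {'T': life span, 'T_trk': frames on the initial face,
    'switches': N_switch, 'reinits': N_reinit}."""
    n = float(len(trackers))
    success = sum(t["T_trk"] / float(t["T"]) for t in trackers) / n
    failure = sum(t["switches"] for t in trackers) / n   # switches per face
    reinit = sum(t["reinits"] for t in trackers) / n     # re-inits per face
    return success, failure, reinit
```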
TABLE 1: PERFORMANCE ANALYSIS OF MULTIPLE FACE TRACKING

Sequence (Figures)                  Tracking Success Rate   Tracker Failure Rate   Tracker Re-initialization Rate
300 (Figure 2(a)-(e))               85.10%                  1.00                   0.40
House Season 8 (Figure 2(f)-(j))    87.80%                  0.33                   0.00
Sherlock Holmes (Figure 2(k)-(o))   91.52%                  0.40                   0.40
Lab Sequence (Figure 2(p)-(t))      81.20%                  0.50                   0.67
The methodology and techniques described with respect to the exemplary embodiments can be performed using a machine or other computing device within which a set of instructions, when executed, may cause the machine to perform any one or more of the methodologies discussed above. In some embodiments, the machine operates as a standalone device. In some embodiments, the machine may be connected (e.g., using a network) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client user machine in a server-client user network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may comprise a server computer, a client user computer, a personal computer (PC), a tablet PC, a laptop computer, a desktop computer, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The machine may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory and a static memory, which communicate with each other via a bus. The machine may further include a video display unit (e.g., a liquid crystal display (LCD), a flat panel, a solid state display, or a cathode ray tube (CRT)). The machine may include an input device (e.g., a keyboard or touch-sensitive screen), a cursor control device (e.g., a mouse), a disk drive unit, a signal generation device (e.g., a speaker or remote control) and a network interface device.
The disk drive unit may include a machine-readable medium on which is stored one or more sets of instructions (e.g., software) embodying any one or more of the methodologies or functions described herein, including those methods illustrated above. The instructions may also reside, completely or at least partially, within the main memory, the static memory, and/or within the processor during execution thereof by the machine. The main memory and the processor also may constitute machine-readable media.
Dedicated hardware implementations including, but not limited to, application specific integrated circuits, programmable logic arrays and other hardware devices can likewise be constructed to implement the methods described herein. Applications that may include the apparatus and systems of various embodiments broadly include a variety of electronic and computer systems. Some embodiments implement functions in two or more specific interconnected hardware modules or devices with related control and data signals communicated between and through the modules, or as portions of an application-specific integrated circuit. Thus, the example system is applicable to software, firmware, and hardware implementations.
In accordance with various embodiments of the present disclosure, the methods described herein are intended for operation as software programs running on a computer processor. Furthermore, software implementations including, but not limited to, distributed processing, component/object distributed processing, parallel processing, or virtual machine processing can also be constructed to implement the methods described herein.
The present disclosure contemplates a machine readable medium containing instructions, or that which receives and executes instructions from a propagated signal
so that a device connected to a network environment can send or receive voice, video or data, and to communicate over the network using the instructions. The instructions may further be transmitted or received over a network via the network interface device.
While the machine-readable medium can be a single medium, the term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "machine-readable medium" shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure.
The term "machine-readable medium" shall accordingly be taken to include, but not be limited to: tangible media; solid-state memories such as a memory card or other package that houses one or more read-only (non-volatile) memories, random access memories, or other re-writable (volatile) memories; magneto-optical or optical medium such as a disk or tape; non-transitory mediums or other self-contained information archive or set of archives is considered a distribution medium equivalent to a tangible storage medium. Accordingly, the disclosure is considered to include any one or more of a machine-readable medium or a distribution medium, as listed herein and including art-recognized equivalents and successor media, in which the software implementations herein are stored.
The illustrations of arrangements described herein are intended to provide a general understanding of the structure of various embodiments, and they are not intended to serve as a complete description of all the elements and features of apparatus and systems that might make use of the structures described herein. Many other
arrangements will be apparent to those of skill in the art upon reviewing the above description. Other arrangements may be utilized and derived there from, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. Figures are also merely representational and may not be drawn to scale. Certain proportions thereof may be exaggerated, while others may be minimized. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
The preceding description has been presented with reference to various embodiments. Persons skilled in the art and technology to which this application pertains will appreciate that alterations and changes in the described structures and methods of operation can be practiced without meaningfully departing from the principle and scope.
ADVANTAGES
The above proposed system and method can be used in a wide range of applications like security-surveillance (face logging of visitors), face based video indexing (making a video browsable through face content), human-computer interaction (locating and tracking multiple users in front of a camera), etc., in order to track multiple faces in the frames of video or images of an image sequence. The above proposed system and method make no constraining assumptions regarding the facial appearance and motion features, camera parameters, scene background model, etc., and demonstrate that only reasoning applied on a set of easily computable event predicates leads to an efficient multiple face tracking system. Some salient strengths of the proposed system and method are mentioned below:
• ability to distinguish a variety of problem situations using computationally simple event predicates;
• using the above information to decide on the advisability of feature updates;
• inherent ability of recognizing failure situations, should they occur;
• ability to automatically restore tracks at a later time; and
• no constraining assumptions related to face appearance/motion or camera/environment models.
WE CLAIM:
1. A method for multiple face tracking characterized by a reasoning process with a set of computationally simple event predicates in at least two frames of video or at least two images of image sequences, the method comprising the machine implemented steps of:
detecting the face region of at least one subject in at least one profile or pose in a first frame of the video or first image of the image sequences and subsequently tracking the detected face region in subsequent frames of video or images of image sequences using motion prediction initialized mean-shift iterations;
normalizing the tracked face region belonging to the particular subject to avoid illumination effects and subsequently clustering to discover multiple appearance modes in various facial poses of the particular subject;
characterized by performing a reasoning process by analyzing the fractional overlaps between tracked and detected face regions to identify a face of the particular subject in one of the states of isolation (no occlusion), grouping with other faces, disappearance (near complete occlusions) or new face detection, track loss and re-appearance/entry/exit, wherein the re-appearances of faces is detected using a Normalized Face Cluster Set (NFCS); and
automatically updating the features of the face region of the particular subject or adding/removing the face tracks of the subjects, selectively upon the identification of one of the states of the face of the particular subject, which are used as cues while detecting re-appearance of the face of the particular subject.
2. The method of claim 1, further comprising the step of verifying either one of the updated facial features of the face region of the particular subject or face track
restoration after re-appearance of the face using Gaussian Mixture Models (GMMs), wherein the GMM identifies the facial appearance modes of each subject.
3. The method of claim 1, wherein the NFCS is formed by clustering the normalized faces obtained from the different facial poses to learn the modes of appearances of the subject.
4. The method of claim 1, wherein the NFCS captures the different appearance modes for varying facial poses of the subject and is used for discriminating new faces from the existing faces while restoring the face tracks.
5. The method of claim 1, wherein the profile or pose comprises profiles falling between 0 degrees and 180 degrees from left to right or vice versa.
6. The method of claim 1, wherein the subject comprises human beings only.
7. A system for multiple face tracking characterized by a reasoning process with a set of computationally simple event predicates in at least two frames of video or at least two images of image sequences, the system comprising:
at least one face detector for detecting face region of at least one subject in at least one profile or pose in a first frame of the video or first image of the image sequences;
at least one tracker for tracking the detected face region in subsequent frames of video or images of image sequences using motion prediction initialized mean-shift iterations, and characterized in that the tracker is adapted to a reasoning process for automatically updating the target face features when the face detector succeeds in detecting the frame presence of the face.
8. The system of claim 7, further comprising normalizing means for normalizing the tracked face region belonging to the particular subject to avoid illumination effects.
9. The system of claim 7, further comprising clustering means for discovering multiple appearance modes in various facial poses of the particular subject.
10. The system of claim 7, further comprising Normalized Face Cluster Set (NFCS) means for capturing the different appearance modes for varying facial poses of the subject and for discriminating new faces from the existing faces while restoring the face tracks.
11. The system of claim 10, wherein the NFCS means is formed by clustering the normalized faces obtained from the different facial poses to learn the modes of appearances of the subject.
12. The system of claim 7, further comprising Gaussian Mixture Model (GMM) means for verifying either the updated facial features of the face region of the particular subject or the face track restoration after re-appearance of the face.