Abstract: The present disclosure relates to a system for identifying activity in an area of interest. The system includes image capturing devices configured to capture images of one or more persons in the area of interest and correspondingly generate a set of first signals, and a processing unit having a processor coupled with the image capturing devices. The processor is configured to receive, from the image capturing devices, the set of first signals; extract, from the set of first signals, first attributes pertaining to facial data, second attributes pertaining to body posture, and third attributes pertaining to emotions of the one or more persons, and correspondingly generate a first score, a second score, and a third score; cumulatively add the first score, the second score, and the third score to generate a collective score; and generate a set of second signals when the collective score is more than a pre-defined threshold value.
Claims:1. A system for identifying activity of one or more persons in an area of interest, the system comprising:
one or more image capturing devices configured to capture one or more images of the one or more persons in the area of interest and correspondingly generate a set of first signals;
a processing unit having one or more processors coupled with the one or more image capturing devices and a memory, wherein the memory stores instructions executable by the one or more processors to:
receive, from the one or more image capturing devices, the set of first signals,
extract, from the set of first signals, one or more first attributes pertaining to facial data of the one or more persons, one or more second attributes pertaining to body posture of the one or more persons, and one or more third attributes pertaining to emotions of the one or more persons, and correspondingly generate a first score, a second score, and a third score,
cumulatively add the first score, the second score, and the third score to generate a collective score, and
generate a set of second signals when the collective score is more than a pre-defined threshold value, wherein the set of second signals actuates one or more indicating devices.
2. The system as claimed in claim 1, wherein the collective score is generated by cumulatively adding the first score, the second score, and the third score and multiplying by the number of persons in the area of interest, and wherein, while adding the first score, the second score, and the third score, a highest weight is given to the first score, followed by the second score, and then the third score.
3. The system as claimed in claim 1, wherein the extraction of the one or more first attributes comprises:
a first module for extracting, from the set of first signals, one or more first parameters;
a second module configured to receive the one or more first parameters and correspondingly generate and clean a face corpus, wherein the face corpus is compared with a database having pre-stored faces.
4. The system as claimed in claim 3, wherein the first score is generated based on the comparison between the face corpus and the database.
5. The system as claimed in claim 3, wherein the one or more first parameters comprise any or a combination of head and shoulders, frontal face, side face, eyes, and ears.
6. The system as claimed in claim 1, wherein the extraction of the one or more second attributes comprises:
generating one or more second parameters pertaining to motion-based features of the one or more persons,
extracting, from the one or more second parameters, angular features pertaining to relative angles between joints of the one or more persons, and
normalizing the angular features of the one or more persons to detect body posture of the one or more persons and correspondingly generating the second score.
7. The system as claimed in claim 1, wherein extraction of the one or more third attributes comprises:
extracting, from the set of first signals, one or more third parameters pertaining to visual features of the one or more persons in the area of interest,
extracting, from the one or more third parameters, temporal features using deep learning, and
performing a multiclass probability estimation on the temporal features for detecting emotions of the one or more persons and correspondingly generating the third score.
8. A method for identifying activity of one or more persons in an area of interest, the method comprising:
receiving, by a processor, a set of first signals corresponding to one or more images of the one or more persons in the area of interest;
extracting from the set of first signals, by the processor, one or more first attributes pertaining to facial data of the one or more persons, one or more second attributes pertaining to body posture of the one or more persons, and one or more third attributes pertaining to emotions of the one or more persons, and correspondingly generating a first score, a second score, and a third score;
cumulatively adding, by the processor, the first score, the second score, and the third score to generate a collective score, and
generating, by the processor, a set of second signals when the collective score is more than a pre-defined threshold value, wherein the set of second signals actuates one or more indicating devices.
Description
TECHNICAL FIELD
[0001] The present disclosure relates to the field of identifying suspicious activity. More particularly, the present disclosure relates to identifying suspicious activity of one or more persons in an area of interest.
BACKGROUND
[0002] Background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
[0003] Terrorist and criminal attacks by miscreants are a serious threat to public security and a challenge for both the private actors and public agencies involved in its provision. The recognition of faces is of paramount importance for detecting persons in video surveillance feeds, retrieving an identity from a database for criminal investigations, home and office security, and forensic applications. It is also very difficult for an intelligent control system to recognize a person with a high degree of precision, in a semblance of how a human being would. Government organizations have put substantial effort into detecting and thwarting terrorist and insurgent attacks by observing suspicious behaviour, but the volume and diversity of such behaviour have sometimes been overwhelming, and claims of effectiveness often lack a clear basis in the underlying technologies.
[0004] There is, therefore, a need for an improved system or method for identifying miscreants in an area of interest, to predict any threat to the public and property in advance.
OBJECTS OF THE PRESENT DISCLOSURE
[0005] Some of the objects of the present disclosure, which at least one embodiment herein satisfies are as listed herein below.
[0006] It is an object of the present disclosure to provide a system and method for identifying activity of one or more persons in an area of interest that combines body action recognition, emotion recognition, and face (both frontal and side profile) recognition with an eye and ear tracking approach to investigate the dynamics and eventually predict the probability of a suspicious activity in a video frame.
[0007] It is an object of the present disclosure to provide a system and method for identifying activity of one or more persons in an area of interest that is highly accurate and is of practical value in homeland security solutions, airports, railway stations, shopping malls, and other crowded places with video cameras installed for surveillance.
[0008] It is an object of the present disclosure to provide a system and method for identifying activity of one or more persons in an area of interest which is cost effective and easy to use.
SUMMARY
[0009] The present disclosure relates to the field of identifying suspicious activity. More particularly, the present disclosure relates to identifying suspicious activity of one or more persons in an area of interest.
[0010] An aspect of the present disclosure pertains to a system for identifying activity of one or more persons in an area of interest. The system includes one or more image capturing devices configured to capture one or more images of the one or more persons in the area of interest and correspondingly generate a set of first signals, and a processing unit having one or more processors coupled with the one or more image capturing devices and a memory. The memory stores instructions executable by the one or more processors to: receive, from the one or more image capturing devices, the set of first signals; extract, from the set of first signals, one or more first attributes pertaining to facial data of the one or more persons, one or more second attributes pertaining to body posture of the one or more persons, and one or more third attributes pertaining to emotions of the one or more persons, and correspondingly generate a first score, a second score, and a third score; cumulatively add the first score, the second score, and the third score to generate a collective score; and generate a set of second signals when the collective score is more than a pre-defined threshold value. The set of second signals actuates one or more indicating devices when the collective score is more than the pre-defined threshold value.
[0011] In an aspect, the collective score may be generated by cumulatively adding the first score, the second score, and the third score and multiplying by the number of persons in the area of interest. While adding the first score, the second score, and the third score, the highest weight may be given to the first score, followed by the second score, and then the third score. The extraction of the one or more first attributes may involve a first module for extracting, from the set of first signals, one or more first parameters, and a second module configured to receive the one or more first parameters and correspondingly generate and clean a face corpus. The face corpus may be compared with a database having pre-stored faces, and the first score may be generated based on the comparison between the face corpus and the database. The one or more first parameters may include any or a combination of head and shoulders, frontal face, side face, eyes, and ears.
[0012] In an aspect, the extraction of the one or more second attributes may include generating one or more second parameters pertaining to motion-based features of the one or more persons; extracting, from the one or more second parameters, angular features pertaining to relative angles between joints of the one or more persons; and normalizing the angular features of the one or more persons to detect body posture of the one or more persons and correspondingly generate the second score. Extraction of the one or more third attributes may include extracting, from the set of first signals, one or more third parameters pertaining to visual features of the one or more persons in the area of interest; extracting, from the one or more third parameters, temporal features using deep learning; and performing a multiclass probability estimation on the temporal features for detecting emotions of the one or more persons and correspondingly generating the third score.
[0013] Yet another aspect of the present disclosure relates to a method for identifying activity of one or more persons in an area of interest. The method includes receiving, by a processor, a set of first signals corresponding to one or more images of the one or more persons in the area of interest; extracting from the set of first signals, by the processor, one or more first attributes pertaining to facial data of the one or more persons, one or more second attributes pertaining to body posture of the one or more persons, and one or more third attributes pertaining to emotions of the one or more persons, and correspondingly generating a first score, a second score, and a third score; cumulatively adding, by the processor, the first score, the second score, and the third score to generate a collective score; and generating, by the processor, a set of second signals when the collective score is more than a pre-defined threshold value. The set of second signals actuates one or more indicating devices when the collective score is more than the pre-defined threshold value.
[0014] Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.
BRIEF DESCRIPTION OF DRAWINGS
[0015] The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure. The diagrams are for illustration only, which thus is not a limitation of the present disclosure.
[0016] In the figures, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
[0017] FIG. 1 illustrates an exemplary block diagram of a system for identifying suspicious activity, in accordance with an embodiment of the present disclosure.
[0018] FIG. 2 illustrates an exemplary representation of a processing unit in the system for identifying suspicious activity, in accordance with an embodiment of the present disclosure.
[0019] FIG. 3 illustrates an exemplary block diagram of face recognition module, in accordance with an embodiment of the present disclosure.
[0020] FIG. 4 illustrates an exemplary block diagram of body posture recognition module, in accordance with an embodiment of the present disclosure.
[0021] FIG. 5 illustrates an exemplary block diagram of emotion recognition module, in accordance with an embodiment of the present disclosure.
[0022] FIG. 6 illustrates a method for identifying suspicious activity, in accordance with an embodiment of the present disclosure.
[0023] FIG. 7 illustrates an exemplary computer system in which or with which embodiments of the present invention can be utilized, in accordance with embodiments of the present disclosure.
DETAILED DESCRIPTION
[0024] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.
[0025] In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent to one skilled in the art that embodiments of the present invention may be practiced without some of these specific details.
[0026] The present disclosure relates to the field of identifying suspicious activity. More particularly, the present disclosure relates to identifying suspicious activity of one or more persons in an area of interest.
[0027] The present disclosure elaborates upon a system for identifying activity of one or more persons in an area of interest. The system includes one or more image capturing devices configured to capture one or more images of the one or more persons in the area of interest and correspondingly generate a set of first signals, and a processing unit having one or more processors coupled with the one or more image capturing devices and a memory. The memory stores instructions executable by the one or more processors to: receive, from the one or more image capturing devices, the set of first signals; extract, from the set of first signals, one or more first attributes pertaining to facial data of the one or more persons, one or more second attributes pertaining to body posture of the one or more persons, and one or more third attributes pertaining to emotions of the one or more persons, and correspondingly generate a first score, a second score, and a third score; cumulatively add the first score, the second score, and the third score to generate a collective score; and generate a set of second signals when the collective score is more than a pre-defined threshold value. The set of second signals actuates one or more indicating devices when the collective score is more than the pre-defined threshold value.
[0028] In an embodiment, the collective score can be generated by cumulatively adding the first score, the second score, and the third score and multiplying by the number of persons in the area of interest. While adding the first score, the second score, and the third score, the highest weight can be given to the first score, followed by the second score, and then the third score.
[0029] In an embodiment, the extraction of the one or more first attributes can involve a first module for extracting, from the set of first signals, one or more first parameters, and a second module configured to receive the one or more first parameters and correspondingly generate and clean a face corpus. The face corpus can be compared with a database having pre-stored faces.
[0030] In an embodiment, the first score can be generated based on the comparison between the face corpus and the database.
[0031] In an embodiment, the one or more first parameters can include any or a combination of head and shoulders, frontal face, side face, eyes, and ears.
[0032] In an embodiment, the extraction of the one or more second attributes can include generating one or more second parameters pertaining to motion-based features of the one or more persons, extracting, from the one or more second parameters, angular features pertaining to relative angles between joints of the one or more persons, and normalizing the angular features of the one or more persons to detect body posture of the one or more persons and correspondingly generate the second score.
[0033] In an embodiment, extraction of the one or more third attributes can include extracting, from the set of first signals, one or more third parameters pertaining to visual features of the one or more persons in the area of interest, extracting, from the one or more third parameters, temporal features using deep learning, and performing a multiclass probability estimation on the temporal features for detecting emotions of the one or more persons and correspondingly generating the third score.
[0034] Yet another embodiment of the present disclosure relates to a method for identifying activity of one or more persons in an area of interest. The method includes receiving, by a processor, a set of first signals corresponding to one or more images of the one or more persons in the area of interest; extracting from the set of first signals, by the processor, one or more first attributes pertaining to facial data of the one or more persons, one or more second attributes pertaining to body posture of the one or more persons, and one or more third attributes pertaining to emotions of the one or more persons, and correspondingly generating a first score, a second score, and a third score; cumulatively adding, by the processor, the first score, the second score, and the third score to generate a collective score; and generating, by the processor, a set of second signals when the collective score is more than a pre-defined threshold value. The set of second signals actuates one or more indicating devices when the collective score is more than the pre-defined threshold value.
[0035] FIG. 1 illustrates an exemplary block diagram of a system for identifying suspicious activity, in accordance with an embodiment of the present disclosure.
[0036] FIG. 2 illustrates an exemplary representation of a processing unit in the system for identifying suspicious activity, in accordance with an embodiment of the present disclosure.
[0037] As illustrated, a system for identifying suspicious activity of one or more persons in an area of interest can include one or more image capturing devices 102 (also referred to as imaging devices 102, herein) that can be configured to capture one or more images of the one or more persons in the area of interest and correspondingly generate a set of first signals. The area of interest can be, but is not limited to, any area where surveillance is required, such as a railway station, an airport, and other high-security areas. The one or more imaging devices 102 can include, but are not limited to, a camera configured to take any or a combination of real-time images and videos of the one or more persons in the area of interest. The processing unit 200 can include one or more processor(s) 202.
[0038] In an embodiment, the one or more processor(s) 202 (also referred as processor, herein) can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that manipulate data based on operational instructions. Among other capabilities, the one or more processor(s) 202 are configured to fetch and execute computer-readable instructions stored in a memory 204 of the processing unit 200. The memory 204 can store one or more computer-readable instructions or routines, which can be fetched and executed to create or share the data units over a network service. The memory 204 can comprise any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like.
[0039] In an embodiment, the processing unit 200 can also comprise an interface(s) 206. The interface(s) 206 can comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like. The interface(s) 206 can facilitate communication of the processing unit 200 and can also provide a communication pathway for one or more components of the processing unit 200. Examples of such components include, but are not limited to, processing engine(s) 208 and data 210.
[0040] In an embodiment, the processing engine(s) 208 can be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) 208. In examples described herein, such combinations of hardware and programming can be implemented in several different ways. For example, the programming for the processing engine(s) 208 can be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) 208 can comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium can store instructions that, when executed by the processing resource, implement the processing engine(s) 208. In such examples, the processing unit 200 can comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium can be separate but accessible to processing unit 200 and the processing resource. In other examples, the processing engine(s) 208 can be implemented by electronic circuitry.
[0041] In an embodiment, the processing engine(s) 208 can include a face recognition module 104 that can be configured to receive the set of first signals from the one or more imaging devices 102 and correspondingly extract one or more first attributes pertaining to facial data of the one or more persons along with a first score. The first score can be calculated by comparing the facial data of the one or more persons in the area of interest with a database having pre-stored images of miscreants or criminals; based on this comparison, the first score is generated.
[0042] In an embodiment, the processing engine(s) 208 can include a body posture recognition module 106 that can be configured to extract from the set of first signals, one or more second attributes pertaining to body posture of the one or more persons along with a second score.
[0043] In an embodiment, the processing engine(s) 208 can include an emotion recognition module 108 that can be configured to extract from the set of first signals, one or more third attributes pertaining to emotions of the one or more persons along with a third score. Further, the first score, the second score, and the third score can be cumulatively added, using a cumulative score generator 110, to generate a collective score (also referred to as a threat quotient value, herein). The collective score can be generated by cumulatively adding the first score, the second score, and the third score and multiplying by the number of persons in the area of interest. While adding the first score, the second score, and the third score, the highest weight can be given to the first score, followed by the second score, and then the third score. The processing unit can be configured to generate a set of second signals if the collective score is more than a pre-defined threshold value. The set of second signals can be configured to actuate one or more indicating devices 112 so that necessary action can be taken. The one or more indicating devices can include, but are not limited to, audio alarms and light-emitting diodes.
[0044] In an embodiment, each module outputs a probability for each individual person with a certain degree of confidence (also referred to as the first score, second score, and third score, herein). The modules (body movement, face, and emotion recognition) are given different weightages to predict the occurrence of suspicious activity in a video frame, which is indicated in terms of a threat quotient value (also referred to as the cumulative score, herein). The cumulative score or threat quotient value (TQ) can be calculated by the below formula:

$TQ = W_{BM}\sum_{j=1}^{n} P_{BM_j} + W_{EM}\sum_{k=1}^{m} P_{EM_k} + W_{FM}\sum_{l=1}^{q} P_{FM_l}$

where $P_{BM_j}$ is the probability of suspicious body movement in the scene for each individual, $n$ is the number of persons detected with suspicious body movement, and $W_{BM}$ is the weightage of the body movement module; $P_{EM_k}$ is the probability of suspicious emotions in the scene for each individual, $m$ is the number of persons detected with suspicious emotions, and $W_{EM}$ is the weightage of the emotion recognition module; and $P_{FM_l}$ is the probability of suspicious face recognition in the scene for each individual, $q$ is the number of suspicious persons recognized, and $W_{FM}$ is the weightage of the face recognition module.
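By way of a minimal sketch under the above definitions (the weightages, threshold value, and function name below are illustrative, with the face recognition module weighted highest as the claims suggest):

```python
def threat_quotient(p_bm, p_em, p_fm, w_bm=0.3, w_em=0.2, w_fm=0.5):
    """Weighted sum of per-person suspicion probabilities (the TQ formula above).

    p_bm, p_em, p_fm: per-person probabilities from the body movement,
    emotion, and face recognition modules (lengths n, m, q respectively).
    The weightages are illustrative; w_fm (face) is weighted highest.
    """
    return w_bm * sum(p_bm) + w_em * sum(p_em) + w_fm * sum(p_fm)

# Two persons flagged by body movement, one by emotion, one by face.
tq = threat_quotient(p_bm=[0.7, 0.4], p_em=[0.6], p_fm=[0.9])
if tq > 0.8:  # pre-defined threshold value (illustrative)
    print("actuate indicating devices, TQ =", round(tq, 2))
```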
[0045] FIG. 3 illustrates an exemplary block diagram of face recognition module, in accordance with an embodiment of the present disclosure.
[0046] As illustrated, input video frames are passed through a multibiometric detection module which extracts faces from the images. For a detected individual, the detected face is classified into one of the following categories, such as but not limited to left, half-left, right, half-right, and frontal face. The face recognition module can include a stage 1 detection module, in which a detector array is designed using the following detectors: head and shoulders, frontal face, eyes, and ears. The detectors can be used to detect one or more first parameters including, but not limited to, head and shoulders, ears, eyes, and face. The parameter sequences are defined using all detectors. For an input image, the head and shoulders detector is used first. Once a head and shoulders region is detected, the detection window is shrunk into this region, followed by activation of the frontal face detector. The proposed face detection system mainly contains three processes: Haar-like feature design, AdaBoost training, and a cascade classifier module that uses wavelet features to classify the image regions. If the frontal face detector detects the target, the detection area shrinks into the detected frontal face region and the eyes detector starts; otherwise, the ear detector is activated. In this manner, all detectors capture the position of the subject's features in real images (a sketch of such a cascaded pipeline is given after the list below). The approach for detection of the face based on the eyes and ears is described below:
• Eye Detection: In order to avoid manual processing, eye detection can be utilized for this task. The eyes are detected by the same approach as in the face detection task. Only those face images with two successfully detected eyes are kept, while the others are discarded. This step eliminates a number of incorrectly detected faces. Moreover, the successful detection of both eyes ensures that more or less frontal images are obtained, while profiles are removed.
• Ear detection: In the second output of the frontal face detector, profile faces are taken into account for identification of the person. Images of an individual's ear can be considered a reliable source of data for passive person identification, as the ear satisfies the biometric characteristics of universality, distinctiveness, permanence, and collectability. The approach for ear recognition is based on the extraction of geometrical features such as shape, mean, centroid, and Euclidean distance between pixels. A model can be used to detect the ear, a median filter can be applied to remove noise, and the images can be converted into binary format. After that, filters can be used to enhance the image, the largest boundary is calculated, a distance matrix is created, and the image features are extracted. Finally, the extracted features are classified using nearest neighbour with absolute error distance. This method is invariant to scaling, translation, and rotation.
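As a hedged sketch of the stage 1 cascaded detector pipeline described above, using OpenCV's bundled Haar cascades (the upper-body cascade stands in for the head and shoulders detector, and an ear cascade is omitted because stock OpenCV does not bundle one):

```python
import cv2

# OpenCV ships these Haar cascades under cv2.data.haarcascades.
upper_body = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_upperbody.xml")
frontal_face = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def stage1_detect(frame):
    """Shrink the detection window stage by stage, as described above."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = []
    for (x, y, w, h) in upper_body.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4):
        body_roi = gray[y:y + h, x:x + w]                # shrink to head/shoulders region
        for (fx, fy, fw, fh) in frontal_face.detectMultiScale(body_roi, 1.1, 4):
            face_roi = body_roi[fy:fy + fh, fx:fx + fw]  # shrink to the detected face
            # Keep only faces with two detected eyes (roughly frontal).
            if len(eye.detectMultiScale(face_roi, 1.1, 4)) >= 2:
                faces.append((x + fx, y + fy, fw, fh))
    return faces
```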
[0047] In an embodiment, the face recognition module can include a stage 2 detection module. The stage 2 detection module can be configured to receive extracted images from the stage 1 detection module. The extracted images are passed through the stage 2 detection module to perform the following tasks:
Feature Extraction: An approach is used to detect and extract faces from pictures and to create a face corpus automatically. This approach is not limited by the image resolution and basically has four steps: extrema detection, removal of key-points with low contrast, orientation assignment, and descriptor calculation.
Extrema Detection: The Difference of Gaussian (DoG) filter is applied to the input image. The image is gradually down-sampled and the filtering is performed at several scales. Filtering at several scales ensures scale invariance. Each pixel is then compared with its neighbours. Neighbours on its level as well as on the two neighbouring (lower and higher) levels are examined. If the pixel is the maximum or minimum of all the neighbouring pixels, it is considered to be a potential key-point.
Low Contrast Key-point Removal: The detected key-points are further examined to choose the “best” candidates. For the resulting set of key-points their stability is determined. Locations with low contrast and unstable locations along edges are discarded.
Orientation Assignment: The orientation of each key-point is computed. The computation is based upon gradient orientations in the neighbourhood of the pixel. The values are weighted by the magnitudes of the gradient.
Descriptor Calculation: The final step consists of the creation of descriptors. Gradient magnitudes and orientations are computed at each point of the neighbourhood, and their values are weighted by a Gaussian. For each sub-region, orientation histograms are created. Finally, a vector containing these values is created.
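The four steps above correspond to the classical SIFT key-point pipeline; a brief sketch, assuming an OpenCV build in which SIFT is available (it is in the main module from OpenCV 4.4 onward) and a hypothetical input image:

```python
import cv2

# SIFT internally performs DoG extrema detection, low-contrast key-point
# removal, orientation assignment, and descriptor calculation.
sift = cv2.SIFT_create()
gray = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(len(keypoints), "key-points; descriptor shape:", descriptors.shape)  # (N, 128)
```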
Corpus Cleaning: A large number of face images can be obtained for each individual; however, these numbers differ significantly. Therefore, a "corpus cleaning" can be performed in order to choose the same number of the most representative face images. Manual verification can be performed on a randomly chosen small face sub-set to confirm that the majority of face images are correct and that the erroneous examples differ substantially from this representative set.
[0048] In an embodiment, an integrated classification of eye, ear, and face features is performed, which includes a database and a classifier. The training database contains the eye, ear, and face features, whereas the classifier uses these inputs to identify persons. This approach integrates the eye, ear, and face features into the classifier and then outputs the identification result. The facial region is extracted by face detection. Next, the training template with the smallest Euclidean distance is obtained using the nearest neighbour (NN) method. The individual class corresponding to the identified template is the personal identification result.
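A minimal sketch of this nearest-neighbour matching step, assuming the eye, ear, and face features have already been concatenated into fixed-length vectors (the array shapes, names, and random data below are illustrative):

```python
import numpy as np

def identify(query, templates, labels):
    """Nearest-neighbour identification by Euclidean distance.

    query:     (D,) combined eye/ear/face feature vector of the probe.
    templates: (N, D) training database of enrolled feature vectors.
    labels:    list of N identity labels for the templates.
    """
    dists = np.linalg.norm(templates - query, axis=1)
    return labels[int(np.argmin(dists))]

# Illustrative usage with random features for three enrolled identities.
rng = np.random.default_rng(0)
db = rng.normal(size=(3, 128))
print(identify(db[1] + 0.01, db, ["id_a", "id_b", "id_c"]))  # -> "id_b"
```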
[0049] FIG. 4 illustrates an exemplary block diagram of body posture recognition module, in accordance with an embodiment of the present disclosure.
[0050] As illustrated, real-time input video is used to collect frames. Image processing is used to remove noise, and segmentation is used to separate the background and foreground. The presence of a person is then identified as a skeleton in the frames. Motion-based features, such as the co-ordinates of the skeletal model of the human body, can be utilized for human action recognition. Three-dimensional position data corresponding to various points of the skeletal model is captured from a motion-based sensor. The joint location vectors listed in the table below are utilized to extract the angular features of the postures:
| Joint | Joint Locations |
|---|---|
| J1 | Right hand and right elbow |
| J2 | Right elbow and right shoulder |
| J3 | Left elbow and left shoulder |
| J4 | Left hand and left elbow |
| J5 | Right shoulder and right hip |
| J6 | Left shoulder and left hip |
| J7 | Head and torso |
| J8 | Right hip and right knee |
| J9 | Right knee and right foot |
| J10 | Left hip and left knee |
| J11 | Left knee and left foot |
In an embodiment, an angle between two joints can be calculated using the below formula:

$\theta = \cos^{-1}\left(\dfrac{\vec{v}_1 \cdot \vec{v}_2}{\lVert \vec{v}_1 \rVert \, \lVert \vec{v}_2 \rVert}\right)$

where $\vec{v}_1$ and $\vec{v}_2$ are the vectors along two adjacent joint segments (for example, J1 and J2).
A set of angular features can be extracted and normalized between the values -1 and +1. In the training phase, a classifier can be trained using these features; in the testing phase, the postures can be classified by the trained classifier. In order to detect a posture (during the real-time recognition mode), the multi-class probability estimates of the posture samples are calculated. The maximum value of the probability estimates (over the different classes) can be thresholded to detect a posture.
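A minimal sketch of the angular-feature computation, assuming three-dimensional joint positions from a skeletal tracker; the joint names, the choice of triples, and the cosine form of the relative-angle computation are illustrative:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b formed by the segments b->a and b->c, in radians."""
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.arccos(np.clip(cos_t, -1.0, 1.0))

def angular_features(joints):
    """joints: dict of 3D positions, e.g. {"right_hand": (x, y, z), ...}.
    Returns cos(angle) per triple, which already lies in [-1, +1]."""
    triples = [("right_hand", "right_elbow", "right_shoulder"),  # spans J1 and J2
               ("left_hand", "left_elbow", "left_shoulder")]     # spans J3 and J4
    return [np.cos(joint_angle(*(joints[k] for k in t))) for t in triples]
```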
[0051] FIG. 5 illustrates an exemplary block diagram of emotion recognition module, in accordance with an embodiment of the present disclosure.
[0052] As illustrated, a basic framework of emotion recognition can be used to train a hybrid model to recognize and synthesize temporal dynamics for tasks involving sequential images. The emotion recognition system, based on a deep neural network, can learn basic emotions such as, but not limited to, violent, neutral, anxiety, and stressed. First, a model can be used to extract visual features by learning on the input video frames. Secondly, each visual feature determined through the model can be passed to the corresponding temporal model, which produces a fixed- or variable-length vector representation. In order to detect an emotion, the multi-class probability estimates of the samples can be calculated, and the maximum value of the probability estimates (over the different classes) is thresholded to detect an emotion. Finally, the predicted distribution is computed by applying the below formula:

$p_i = \dfrac{\exp(z_i)}{\sum_{c=1}^{C} \exp(z_c)}$

where $z_i$ is the score produced by the temporal model for emotion class $i$ and $C$ is the number of emotion classes.
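By way of a sketch, the multiclass probability estimation and thresholding described here can be realized with a softmax over the per-class scores emitted by the temporal model; the class scores and threshold below are placeholders:

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def detect_emotion(class_scores, labels, threshold=0.6):
    """Return the most probable emotion, or None if below the threshold."""
    probs = softmax(class_scores)
    i = int(np.argmax(probs))
    return labels[i] if probs[i] > threshold else None

# Example with the emotion classes named in this disclosure.
print(detect_emotion([2.5, 0.3, 0.5, 1.0],
                     ["violent", "neutral", "anxiety", "stressed"]))  # -> violent
```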
[0053] FIG. 6 illustrates a method for identifying suspicious activity, in accordance with an embodiment of the present disclosure.
[0054] As illustrated, at step 602, a method 600 for identifying activity of one or more persons in an area of interest can include receiving, by a processor, a set of first signals corresponding to one or more images of the one or more persons in the area of interest.
[0055] At step 604, the method 600 can include extracting from the set of first signals, by the processor, one or more first attributes pertaining to facial data of the one or more persons, one or more second attributes pertaining to body posture of the one or more persons, and one or more third attributes pertaining to emotions of the one or more persons, and correspondingly generating a first score, a second score, and a third score.
[0056] At step 606, the method 600 can include cumulatively adding, by the processor, the first score, the second score, and the third score to generate a collective score, and generating, by the processor, a set of second signals when the collective score is more than a pre-defined threshold value. The set of second signals actuates one or more indicating devices when the collective score is more than the pre-defined threshold value.
[0057] FIG. 7 illustrates an exemplary computer system in which or with which embodiments of the present invention can be utilized, in accordance with embodiments of the present disclosure.
[0058] As shown in FIG. 7, computer system 700 can include an external storage device 710, a bus 720, a main memory 730, a read only memory 740, a mass storage device 750, a communication port 760, and a processor 770. A person skilled in the art will appreciate that the computer system can include more than one processor and communication port. Examples of processor 770 include, but are not limited to, an Intel® Itanium® or Itanium 2 processor(s), an AMD® Opteron® or Athlon MP® processor(s), Motorola® lines of processors, FortiSOC™ system-on-chip processors, or other future processors. Processor 770 can include various modules associated with embodiments of the present invention. Communication port 760 can be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports. Communication port 760 can be chosen depending on a network, such as a Local Area Network (LAN), a Wide Area Network (WAN), or any network to which the computer system connects.
[0059] Memory 730 can be Random Access Memory (RAM) or any other dynamic storage device commonly known in the art. Read-only memory 740 can be any static storage device(s), e.g., but not limited to, Programmable Read Only Memory (PROM) chips for storing static information, e.g., start-up or BIOS instructions for processor 770. Mass storage 750 can be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage solutions include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces), e.g., those available from Seagate (e.g., the Seagate Barracuda 7102 family) or Hitachi (e.g., the Hitachi Deskstar 7K1000); one or more optical discs; or Redundant Array of Independent Disks (RAID) storage, e.g., an array of disks (e.g., SATA arrays), available from various vendors including Dot Hill Systems Corp., LaCie, Nexsan Technologies, Inc. and Enhance Technology, Inc.
[0060] Bus 720 communicatively couples processor(s) 770 with the other memory, storage, and communication blocks. Bus 720 can be, e.g., a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), USB, or the like, for connecting expansion cards, drives, and other subsystems, as well as other buses, such as a front side bus (FSB), which connects processor 770 to the software system.
[0061] Optionally, operator and administrative interfaces, e.g. a display, keyboard, and a cursor control device, can also be coupled to bus 720 to support direct operator interaction with a computer system. Other operator and administrative interfaces can be provided through network connections connected through communication port 760. The external storage device 710 can be any kind of external hard-drives, floppy drives, IOMEGA® Zip Drives, Compact Disc - Read Only Memory (CD-ROM), Compact Disc-Re-Writable (CD-RW), Digital Video Disk-Read Only Memory (DVD-ROM). Components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system limit the scope of the present disclosure.
[0062] Moreover, in interpreting the specification, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms "comprises" and "comprising" should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C, ..., and N, the text should be interpreted as requiring only one element from the group, not A plus N, B plus N, etc.
[0063] While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.
ADVANTAGES OF THE INVENTION
[0064] The proposed invention provides a system and method for identifying activity of one or more persons in an area of interest that combines body action recognition, emotion recognition, and face (both frontal and side profile) recognition with an eye and ear tracking approach to investigate the dynamics and eventually predict the probability of a suspicious activity in a video frame.
[0065] The proposed invention provides a system and method for identifying activity of one or more persons in an area of interest that is highly accurate and is of practical value in homeland security solutions, airports, railway stations, shopping malls, and other crowded places with video cameras installed for surveillance.
[0066] The proposed invention provides a system and method for identifying activity of one or more persons in an area of interest which is cost effective and easy to use.