Abstract: METHOD AND SYSTEM FOR DETECTING AND NOTIFYING ACTIONABLE EVENTS DURING SURVEILLANCE
The disclosure relates to a method (500) and a system (100) for detecting and notifying actionable events during surveillance. The method (500) may include receiving (502) initial multi-modal inputs from a geo-location during surveillance, determining (504) an incident of interest based on an analysis of the initial multi-modal inputs, and collecting (506) additional multi-modal inputs from at least one access device (110) corresponding to at least one person in the geo-location upon determination of the incident of interest. The method (500) may further include determining (508) the actionable event based on an analysis of the initial and the additional multi-modal inputs, and providing (510) a notification of the actionable event to one or more appropriate authorities (114). To be published with Figure 2.
Claims:
We Claim:
1. A system (100) for detecting and notifying an actionable event during surveillance, the system (100) comprising:
an edge server (104) comprising a processor (120) and a memory (118) communicatively coupled to the processor (120), wherein the memory (118) stores processor-executable instructions, which, on execution, cause the processor (120) to:
receive (502) initial multi-modal inputs from a geo-location during surveillance;
determine (504) an incident of interest based on an analysis of the initial multi-modal inputs;
collect (506) additional multi-modal inputs from at least one access device (110) corresponding to at least one person in the geo-location upon determination of the incident of interest;
determine (508) the actionable event based on an analysis of the initial and the additional multi-modal inputs; and
provide (510) a notification of the actionable event to one or more appropriate authorities (114).
2. The system (100) of claim 1, further comprising one or more surveillance devices (102) for acquiring the initial multi-modal inputs, wherein the processor (120) receives the initial multi-modal inputs from the one or more surveillance devices (102), and wherein the one or more surveillance devices (102) comprise at least one of a closed-circuit television (CCTV) camera, an Internet Protocol (IP) camera, a microphone, an Internet-of-Things (IoT) sensor, a mobile device, a hand-held device, or a wearable device.
3. The system (100) of claim 1, wherein the processor-executable instructions further cause the processor (120) to:
identify (602) a set of persons in the geo-location upon determination of the incident of interest by performing at least one of a facial recognition of a set of faces or a voice recognition of a set of voices in the initial multi-modal inputs against a plurality of persons in a population register;
determine (604) a set of access devices corresponding to the set of persons from the population register; and
activate (606) the at least one access device (110) from the set of access devices for collecting additional multi-modal inputs.
4. The system (100) of claim 1, wherein the processor-executable instructions further cause the processor (120) to:
determine (702) a plurality of access devices in the geo-location upon determination of the incident of interest based on inputs from one or more network operators;
identify (704) a plurality of persons corresponding to the plurality of access devices from a population register;
identify (706) a set of persons by performing at least one of a facial recognition of a set of faces or a voice recognition of a set of voices in the multi-modal inputs against the plurality of persons in the population register; and
activate (708) the at least one access device (110) corresponding to the at least one person from the set of persons for collecting additional multi-modal inputs.
5. The system (100) of claim 1, wherein the at least one access device (110) comprises a mobile device, a hand-held device, or a wearable device, and wherein the at least one access device (110) is activated for collecting additional multi-modal inputs upon at least one of: a notification to the at least one person, and a permission from the at least one person.
6. The system (100) of claim 1, wherein determining the actionable event comprises correlating the initial and the additional multi-modal inputs so as to validate the incident of interest and to gather specific inputs with respect to the incident of interest.
7. The system (100) of claim 1, wherein the processor-executable instructions further cause the processor (120) to:
provide one or more recommendations to the one or more appropriate authorities (114) based on the actionable event.
8. The system (100) of claim 1, wherein the processor-executable instructions further cause the processor (120) to:
generate (802) a confidence score for each of a plurality of incidents of interest determined over a period of time, wherein the confidence score is based on a criticality of the actionable event;
create (804) a catalogue of the plurality of incidents of interest based on their respective confidence scores and actionable events; and
utilize (806) the catalogue for evaluating a new incident of interest.
9. A method (500) for detecting and notifying an actionable event during surveillance, the method (500) comprising:
receiving (502) initial multi-modal inputs from a geo-location during surveillance;
determining (504) an incident of interest based on an analysis of the initial multi-modal inputs;
collecting (506) additional multi-modal inputs from at least one access device (110) corresponding to at least one person in the geo-location upon determination of the incident of interest;
determining (508) the actionable event based on an analysis of the initial and the additional multi-modal inputs; and
providing (510) a notification of the actionable event to one or more appropriate authorities (114).
10. The method (500) of claim 9, further comprising:
identifying (602) a set of persons in the geo-location upon determination of the incident of interest by performing at least one of a facial recognition of a set of faces or a voice recognition of a set of voices in the initial multi-modal inputs against a plurality of persons in a population register;
determining (604) a set of access devices corresponding to the set of persons from the population register; and
activating (606) the at least one access device (110) from the set of access devices for collecting additional multi-modal inputs.
11. The method (500) of claim 9, further comprising:
determining (702) a plurality of access devices in the geo-location upon determination of the incident of interest based on inputs from one or more network operators;
identifying (704) a plurality of persons corresponding to the plurality of access devices from a population register;
identifying (706) a set of persons by performing at least one of a facial recognition of a set of faces or a voice recognition of a set of voices in the multi-modal inputs against the plurality of persons in the population register; and
activating (708) the at least one access device (110) corresponding to the at least one person from the set of persons for collecting additional multi-modal inputs.
12. The method (500) of claim 9, further comprising providing one or more recommendations to the one or more appropriate authorities (114) based on the actionable event.
13. The method (500) of claim 9, further comprising:
generating (802) a confidence score for each of a plurality of incidents of interest determined over a period of time, wherein the confidence score is based on a criticality of the actionable event;
creating (804) a catalogue of the plurality of incidents of interest based on their respective confidence scores and actionable events; and
utilizing (806) the catalogue for evaluating a new incident of interest.
Description: Title: “METHOD AND SYSTEM FOR DETECTING AND NOTIFYING ACTIONABLE EVENTS DURING SURVEILLANCE”
DESCRIPTION
Technical Field
[001] This disclosure relates generally to surveillance, and more particularly to method and system for detecting and notifying actionable events during surveillance.
Background
[002] Thousands of individuals experience serious, unexpected emergency situations every day, in which they may require assistance. Throughout the world, many rescue authorities are available to provide assistance in such situations to persons in need. However, individuals requiring such assistance are often not in a condition to seek it. For example, they may not have the time or the ability to respond to such emergencies due to unfavorable circumstances. Such circumstances may involve limitations of body movement, panic after an accident, or any other hostile condition. Further, such circumstances may lead to irreparable damage (e.g., permanent disability, death, etc.) if timely and appropriate assistance is not given. Sometimes, even if an individual is somehow able to initiate a request for assistance, taking an appropriate action is almost impossible due to the unavailability of details of the emergency.
[003] There are many conventional techniques for triggering and providing assistance to individuals in emergencies. However, such techniques are prone to false initiation and fail to detect the severity of the situation. Further, mere initiation of a request for assistance by the individual may not be enough. Conventional techniques lack the ability to effectively capture details, if any, in order to identify the type of emergency and the condition of the individual in a specific situation, and to detect the seriousness of the situation. As will be appreciated, such details are needed to provide an appropriate remedial response to the individual in need.
SUMMARY
[004] In one embodiment, a method for detecting and notifying an actionable event during surveillance is disclosed. In one example, the method may include receiving initial multi-modal inputs from a geo-location during surveillance. The method may further include determining an incident of interest based on an analysis of the initial multi-modal inputs. The method may further include collecting additional multi-modal inputs from at least one access device corresponding to at least one person in the geo-location upon determination of the incident of interest. The method may further include determining the actionable event based on an analysis of the initial and the additional multi-modal inputs. The method may further include providing a notification of the actionable event to one or more appropriate authorities.
[005] In another embodiment, a system for detecting and notifying an actionable event during surveillance is disclosed. In one example, the system may include a processor and a memory communicatively coupled to the processor, wherein the memory stores processor-executable instructions, which, on execution, may cause the processor to receive initial multi-modal inputs from a geo-location during surveillance. The processor-executable instructions, on execution, may further cause the processor to determine an incident of interest based on an analysis of the initial multi-modal inputs. The processor-executable instructions, on execution, may further cause the processor to collect additional multi-modal inputs from at least one access device corresponding to at least one person in the geo-location upon determination of the incident of interest. The processor-executable instructions, on execution, may further cause the processor to determine the actionable event based on an analysis of the initial and the additional multi-modal inputs. The processor-executable instructions, on execution, may further cause the processor to provide a notification of the actionable event to one or more appropriate authorities.
[006] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[007] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.
[008] FIG. 1 is a block diagram of an exemplary system for detecting and notifying actionable events during surveillance, in accordance with some embodiments of the present disclosure.
[009] FIG. 2 illustrates a functional block diagram of an exemplary edge server for detecting and notifying actionable events during surveillance, in accordance with some embodiments of the present disclosure.
[010] FIGS. 3A and 3B illustrate detection and evaluation of an exemplary incident of interest and generation of an associated knowledge graph, in accordance with some embodiments of the present disclosure.
[011] FIGS. 4A, 4B, and 4C illustrate detection and evaluation of another exemplary incident of interest and generation of an associated knowledge graph, in accordance with some embodiments of the present disclosure.
[012] FIG. 5 is a flow diagram of an exemplary process for detecting and notifying actionable events during surveillance, in accordance with some embodiments of the present disclosure.
[013] FIG. 6 is a flow diagram of an exemplary process for identifying persons present in a geo-location of an incident of interest, in accordance with some embodiments of the present disclosure.
[014] FIG. 7 is another flow diagram of an exemplary process for identifying persons present in a geo-location of an incident of interest, in accordance with some embodiments of the present disclosure.
[015] FIG. 8 is a flow diagram of an exemplary process for generating a catalogue of incidents of interest for subsequent evaluation of new incidents of interest, in accordance with some embodiments of the present disclosure.
[016] FIG. 9 is a flow diagram of an exemplary process for detecting and notifying actionable events during surveillance by public surveillance devices, in accordance with some embodiments of the present disclosure.
[017] FIG. 10. is a flow diagram of a detailed exemplary process for detecting and notifying actionable events during surveillance, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION
[018] Exemplary embodiments are described with reference to the accompanying drawings. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims. Additional illustrative embodiments are listed below.
[019] Referring now to FIG. 1, an exemplary system 100 for detecting and notifying actionable events during surveillance is illustrated, in accordance with some embodiments of the present disclosure. The system 100 may include surveillance devices 102, an edge server 104, a population register 106, a network operator server 108, access devices 110, external devices 112, and authorities 114 communicatively coupled to each other via a network 116. The network 116 may be a wired or a wireless network and the examples may include, but are not limited to, Internet, Wireless Local Area Network (WLAN/Wi-Fi), Wireless CDMA (WCDMA), Long Term Evolution (LTE), 5G-NR, LiFi, Worldwide Interoperability for Microwave Access (WiMAX), and General Packet Radio Service (GPRS).
[020] The surveillance devices 102 may be responsible for continuously capturing initial multi-modal inputs and sharing the same with the edge server 104. In some embodiments, the surveillance devices 102 may perform primary processing of the initial multi-modal inputs and then share the initial multi-modal inputs as well as the result of the primary processing with the edge server 104. It should be noted that the surveillance devices 102 may include public surveillance devices as well as personal devices (e.g., mobile access devices, fitness tracking devices, health monitoring devices, user devices, etc.). By way of an example, the surveillance devices 102 may include, but are not limited to, a closed-circuit television (CCTV) camera, an Internet Protocol (IP) camera, a microphone, an Internet-of-Things (IoT) sensor, a mobile device, a hand-held device, and a wearable device.
[021] In some embodiments, the initial multi-modal inputs correspond to a continuously shared live feed or readings from the surveillance devices 102. In some embodiments, the initial multi-modal inputs may include, but are not limited to, speech, image, video, a notification (e.g., an SOS notification) from a personal access device, and a sensor reading (e.g., heart rate, GPS readings, etc.) from a personal access/wearable device. By way of an example, the initial multi-modal inputs may be a feed from a public CCTV camera showing a damaged vehicle and injured and/or unconscious persons. By way of another example, the initial multi-modal inputs may be a feed from the public CCTV camera showing a person running away from two people in a street, an SOS notification received from a user device (in possession of the person running away), along with a trigger for the SOS notification (e.g., an increased heart rate (probably due to panicked running) recorded by a fitness tracker of the person running away). As will be appreciated, the initial multi-modal inputs may come with or may include geo-location information. For example, the CCTV camera feed may come from a CCTV camera installed at a particular location. Similarly, the feed from a hand-held device or a wearable device may come along with the GPS location.
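By way of a non-limiting illustrative sketch, the geo-tagged multi-modal inputs described above may be represented as simple records; the class name, field names, and the helper below are assumptions for illustration and not part of the disclosed system:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MultiModalInput:
    """One multi-modal input item, tagged with its source and geo-location."""
    source: str                       # e.g. "cctv", "wearable", "user_device"
    modality: str                     # e.g. "video", "speech", "sensor", "sos"
    payload: object                   # raw feed frame, audio clip, or sensor reading
    latitude: Optional[float] = None  # filled from GPS when the device provides it
    longitude: Optional[float] = None

def geo_tag(inputs, lat, lon):
    """Attach a fixed installation location (e.g. of a CCTV camera) to any
    inputs that carry no GPS reading of their own."""
    for item in inputs:
        if item.latitude is None:
            item.latitude, item.longitude = lat, lon
    return inputs
```

Under this sketch, a fixed camera's feed inherits the camera's installation coordinates, while inputs from hand-held or wearable devices keep their own GPS readings.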
[022] The edge server 104 may analyze the initial multi-modal inputs for identifying an incident of interest. The incident of interest may include, but is not limited to, an accident, a crime, a riot, and a natural disaster. By way of an example, the edge server 104 may analyze the camera feed showing the damaged vehicle and the injured and/or unconscious persons to identify an accident or a probable accident. By way of another example, the edge server 104 may analyze the camera feed showing the person running away, along with the SOS notification, to identify a probable crime. Upon determination of the incident of interest, the edge server 104 may proceed to collect additional information with respect to the incident of interest. In particular, the edge server 104 may collect additional multi-modal inputs from at least one access device corresponding to at least one person in the geo-location upon determination of the incident of interest.
[023] In some embodiments, the edge server 104 may identify persons present in the geo-location of the incident of interest. It should be noted that the identification may be carried out for the persons in need (e.g., accident victims, crime victims, etc.) as well as for any persons around the persons in need (e.g., people present at the accident/crime scene, the criminals, etc.). The edge server 104 may communicate with the population register server 106 via the network 116 in order to identify the persons. The edge server 104 may perform at least one of a facial recognition of faces or a voice recognition of voices in the multi-modal inputs against those of a number of persons in the population register server 106. The population register server 106 may include identity information (name, photo, contact number, etc.) of the population. The population register server 106 may include, but is not limited to, a Unique Identification Authority of India (UIDAI) server comprising Aadhaar registration details of the population and a social security server comprising social security registration details of the population. In some embodiments, the population register server 106 may also include biometric information (voice sample, iris scan, fingerprints, etc.), health information (blood group, presence of a health condition (e.g., diabetes, hypertension, etc.), allergy information, etc.), and emergency information (e.g., emergency contact persons and contact numbers, insurance number, etc.).
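The facial/voice recognition against the population register may, purely as an illustrative sketch, be implemented as a nearest-match search over biometric embeddings; the function names, the embedding representation, and the similarity threshold below are assumptions for illustration, not part of the disclosure:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify_persons(scene_embeddings, register, threshold=0.85):
    """Match face/voice embeddings extracted from the scene against a
    register mapping person_id -> {"embedding": [...], "contact": ...}.
    Returns the set of person_ids whose best match clears the threshold."""
    matched = set()
    for emb in scene_embeddings:
        best_id, best_score = None, threshold
        for person_id, record in register.items():
            score = cosine_similarity(emb, record["embedding"])
            if score >= best_score:
                best_id, best_score = person_id, score
        if best_id is not None:
            matched.add(best_id)
    return matched
```

The matched identifiers would then be used to look up contact information and the corresponding access devices 110.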
[024] Upon identifying the persons, the edge server 104 may further coordinate with the network operator server 108 to validate the geo-location of the retrieved contact information associated with the identified persons against the geo-location of the incident of interest. In some embodiments, the network operator server 108 may correspond to a wired or wireless communication services provider that owns the infrastructure necessary to sell and deliver services to mobile network operators (MNOs), virtual network operators, and end users. The edge server 104 may activate an access device 110 corresponding to a user from among the identified persons present in the geo-location of the incident of interest. In some embodiments, activation of the access device 110 may be associated with remote activation of an emergency application that resides on the access device 110. It should be noted that the activation of the access device 110 may be automatic or manual. The automatic activation may be either in stealth mode (i.e., without notifying the user) or in non-stealth mode (i.e., with a notification to the user). In non-stealth mode, the user may be able to disable the activation, if desired. The manual activation may be upon a proactive action by the user (i.e., with a permission of the user). Further, it should be noted that, in some embodiments, the user may be able to pre-set preferred activation modes through the emergency application that resides on the access device 110. By way of an example, the access devices 110 may include, but are not limited to, a mobile device, a tablet device, a wearable device, and any other communication device. The access devices 110, on activation, may capture additional multi-modal inputs either through in-built sensors or through sensors residing on one or more external devices 112 in communication with the activated access device 110. The access devices 110 may then share the captured additional multi-modal inputs with the edge server 104. By way of an example, the external devices 112 may include, but are not limited to, a camera, a voice recorder, a fitness tracker, and a health monitoring system.
[025] The edge server 104 may then determine a correlation between the initial multi-modal inputs and the additional multi-modal inputs for validating the incident of interest. For example, the validation may include, but is not limited to, a validation of a geo-location extracted from the initial multi-modal inputs with respect to a geo-location extracted from the additional multi-modal inputs. Thereafter, the edge server 104 may analyze both the initial and the additional multi-modal inputs in order to determine an actionable event. The actionable event may include, but is not limited to, an injured person in a road accident, a person in an endangered situation, and a fire emergency. It should be noted that the incident of interest is an incident that should be further looked into for determining an actionable event, while the actionable event is a specific event that requires a remedial response. By way of an example, the edge server 104 may analyze the camera feed showing the damaged vehicle and the injured and/or unconscious persons, as well as the additional multi-modal inputs collected from the access device, to identify the injured persons, a type and a severity of the injuries of each of the identified injured persons, their respective blood groups, their vital health parameters, and so forth. By way of another example, the edge server 104 may analyze the camera feed showing the person running away, along with the SOS notification, as well as the additional multi-modal inputs collected from the access device, to identify the victim and the criminals, the recorded crime history of the identified criminals, and so forth.
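The geo-location validation described above may, as one illustrative possibility, be implemented as a great-circle distance check between the geo-location extracted from the initial inputs and that of the additional inputs; the 200-metre validation radius below is an assumed parameter, not one specified by the disclosure:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points,
    computed with the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def validate_incident(initial_geo, additional_geo, radius_m=200):
    """The incident of interest is considered validated when the additional
    inputs originate within radius_m of the geo-location extracted from the
    initial multi-modal inputs."""
    return haversine_m(*initial_geo, *additional_geo) <= radius_m
```

Additional inputs arriving from far outside the radius would be treated as uncorrelated and excluded from the determination of the actionable event.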
[026] The edge server 104 may then notify one or more appropriate authorities 114 of the actionable event. By way of example, the notification may be an intimation of the accident/crime at a particular geo-location. Additionally, in some embodiments, the edge server 104 may also provide a recommendation (e.g., keep two units of B+ blood, etc.) along with the notification about the actionable event (e.g., an accident involving an injured person with excessive bleeding) to the one or more appropriate authorities 114. The one or more appropriate authorities may include, but are not limited to, a nearby police station, a nearby hospital, and a nearby fire station. In some embodiments, the notification about the actionable event may be in the form of, or along with, a synopsis or a summary of a story. By way of example, the story summary may be: “A car accident on Highway 07 at about 5 KM from Toll Plaza 3. Two injured persons. One with minor external injuries, but unconscious. The other with excessive bleeding (B+ blood group) and unconscious.” By way of another example, the story summary may be: “A person is being chased by two history-sheeters (Ranga and Billa) on Street No. 5 off the Saint Mark’s Street.”
[027] The edge server 104 may include a memory 118 and a processor 120. The memory 118 may store the initial and the additional multi-modal inputs. The memory 118 may also store instructions which, when executed by the processor 120, cause the processor 120 to analyze the initial and the additional multi-modal inputs to detect any actionable events, to determine suitable recommendations, to notify the authorities 114 of the actionable events, and so forth. The memory 118 may be a non-volatile memory or a volatile memory. Examples of non-volatile memory may include, but are not limited to, flash memory, Read Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), and Electrically EPROM (EEPROM) memory. Examples of volatile memory may include, but are not limited to, Dynamic Random Access Memory (DRAM) and Static Random Access Memory (SRAM). Further, the edge server 104 may include a display 122 that may render a user interface 124. A user or an administrator may interact with the edge server 104, and vice versa, through the user interface 124. The user interface 124 may be used to display multi-modal inputs received from the surveillance devices 102 and/or the access devices 110. The user interface 124 may also be used to provide further multi-modal inputs to the edge server 104. Such further multi-modal inputs may be inputs provided by the user based on a manual assessment of the initial/additional multi-modal inputs received from the surveillance/access devices, and may be in the form of voice or text.
[028] Referring now to FIG. 2, a functional block diagram of an edge server 104 for detecting and notifying an actionable event during surveillance is illustrated, in accordance with an embodiment. The edge server 104 may include an input module 202, an analytics module 204, a storage module 206, a notification module 216, and a recommendation module 218. The input module 202 may receive the initial and the additional multi-modal inputs from the plurality of surveillance devices 102 and the plurality of access devices 110, respectively. The input module 202 may then share these initial multi-modal inputs and additional multi-modal inputs as input data with the analytics module 204. The analytics module 204 may then analyze the input data received from the input module 202 to determine an incident of interest and an actionable event. Further, for analyzing the received input data, the analytics module 204 may communicate with the storage module 206.
[029] The storage module 206 may further include machine learning (ML) or artificial intelligence (AI) models 208, a catalogue of incidents of interest (along with their evaluation and event classification) 210, incidents of interest and actionable events 212, and classified events and corresponding story summarizations 214. The analytics module 204 may employ a suitable ML/AI model 208 for identifying and classifying a new incident of interest based on the input data received. By way of an example, the ML/AI models 208 may be trained on a training dataset comprising a number of incidents of interest. For example, a Convolutional Neural Network (CNN) based AI model may be trained on a large amount of labelled data that may include incidents of interest from different scenes, so as to identify the incident of interest for any real-time scene and to classify the identified incident of interest. Thereafter, when a new incident of interest is received, the CNN based AI model may identify and classify the new incident of interest based on the learning. Upon identification and classification of the incident of interest, the analytics module 204 may employ a suitable ML/AI model 208 for determining an actionable event based on the input data received. The analytics module 204 may also prepare a story based on the incident of interest and the actionable event, based on the classification of the incident of interest and the information stored in the catalogue of incidents of interest 210 (e.g., evaluation and classification of similar past incidents). In some embodiments, the catalogue of incidents of interest 210 may be stored in the form of a knowledge graph, which may be referred to for decision making while evaluating a new incident of interest with reference to the discrete data available. The discrete data may correspond to the input data, which may include the initial multi-modal inputs and the additional multi-modal inputs.
Once the story of the incident is prepared, the analytics module 204 may perform further analysis in order to classify the actionable event based on the seriousness and criticality of the incident of interest. The seriousness and criticality of the incident of interest may be determined based on information stored in the incidents of interest and actionable events 212 and in the classified events and corresponding story summarizations 214. Moreover, the information stored in the incidents of interest and actionable events 212 and in the classified events and corresponding story summarizations 214 may include confidence scores associated with the incidents of interest and with the actionable events. Thereafter, upon determining the seriousness and criticality of the actionable event, the analytics module 204 may communicate with the notification module 216 for providing a notification about the actionable event to the one or more appropriate authorities 114. Additionally, the analytics module 204 may also provide a recommendation from the recommendation module 218 when notifying the one or more appropriate authorities 114 about the actionable event.
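The confidence scoring and criticality-based classification described above may be sketched as follows, under assumed scoring rules (a noisy-OR combination of per-cue confidences, and criticality thresholds that are illustrative assumptions rather than values specified by the disclosure):

```python
def confidence_score(evidence):
    """Combine per-cue confidences (each in [0, 1]) into a single score.
    A noisy-OR combination is assumed, so that independent corroborating
    cues (e.g. deformed vehicle + people rushing) raise overall confidence."""
    remaining_doubt = 1.0
    for c in evidence:
        remaining_doubt *= (1.0 - c)
    return 1.0 - remaining_doubt

def classify_criticality(score, severity_weight):
    """Map confidence weighted by severity onto an assumed criticality scale."""
    level = score * severity_weight
    if level >= 0.75:
        return "critical"
    if level >= 0.4:
        return "serious"
    return "monitor"

catalogue = {}  # incident_id -> evaluation record, analogous to catalogue 210

def record_incident(incident_id, evidence, severity_weight):
    """Evaluate an incident and store it in the catalogue for later reuse
    when evaluating a new, similar incident of interest."""
    score = confidence_score(evidence)
    catalogue[incident_id] = {
        "confidence": score,
        "criticality": classify_criticality(score, severity_weight),
    }
    return catalogue[incident_id]
```

For instance, two moderately confident cues (0.6 and 0.5) combine to a confidence of 0.8, which, at full severity weight, would be classed as critical under the assumed thresholds.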
[030] Referring now to FIGS. 3A and 3B, detection and evaluation of an exemplary incident of interest and generation of an associated knowledge graph are illustrated, in accordance with some embodiments of the present disclosure. Referring now to FIG. 3A, a situation 300A is captured and assessed to determine the incident of interest 302, in accordance with an embodiment. One or more surveillance devices 102 may continuously capture and share multi-modal inputs with the edge server 104. The edge server 104 may then use the ML/AI models 208 for identifying the incident of interest 302 by analyzing the captured multi-modal inputs. In the illustrated embodiment, the incident of interest 302 may include a road accident. By way of an example, the initial multi-modal inputs may include, but are not limited to, collision data from the vehicle or a public CCTV camera, control failure data from the vehicle, and a clear image of the vehicle and the driver from the public CCTV camera. As will be appreciated, the occurrence of the incident of interest 302 with a particular relationship with other occurrences, or with past or future states, may constitute a meaningful insight that qualifies as a reason for triggering information sharing with the authorities (e.g., safety agencies). For example, if a vehicle is detected to be stationary and deformed along with people rushing towards the vehicle, it may be an indication of an accident. When correlated with a state of the vehicle being in motion just before this set of events, the confidence level of considering this situation as a case of an accident improves. The edge server 104 may then perform identification of persons in a geo-location of the incident of interest 302 by performing at least one of a facial recognition of faces or a voice recognition of voices. The edge server 104 may then determine the access devices 110 belonging to the identified persons in the geo-location of the incident of interest 302 so as to trigger the gathering of additional information.
[031] Once the contact information is retrieved, the edge server 104 may identify a correlation between a geo-location extracted from the initial multi-modal inputs and a geo-location of the access devices corresponding to the retrieved contact information of the persons, so as to further validate the incident of interest 302. The edge server 104 may also remotely enable at least one access device 110 to collect additional multi-modal inputs with respect to the incident of interest 302. For example, the additional multi-modal inputs may include audio of people talking about injured and unconscious occupants of the vehicle. The edge server 104 may then perform analysis based on the initial and the additional multi-modal inputs, in order to identify an actionable event. For example, the actionable event may be that two injured and unconscious persons are present inside the vehicle with a lot of blood loss. Thereafter, based on the analysis of the initial and the additional multi-modal inputs, the edge server 104 may provide a notification of the actionable event along with a recommendation to one or more appropriate authorities 114 so that they may take remedial action. By way of an example, the notification may be ‘road accident involving two injured and unconscious people with a lot of blood loss’, the recommendation may be ‘rush police response team and an ambulance to ’, and the authorities may be nearby police stations and nearby hospitals.
[032] Referring now to FIG. 3B, a knowledge graph 300B representing the learning from the situation 300A is illustrated, in accordance with an embodiment. In the illustrated embodiment, an incident of interest such as the road accident 304 may lead to an immobile vehicle 306 and probably a deformed vehicle 308. The road accident 304 may also involve blood 310 from injured people in the accident. Additionally, once the vehicle has become immobile and/or deformed, the road accident 304 may involve people approaching the immobile and/or deformed vehicle 312 during and/or after the accident.
[033] Referring now to FIGS. 4A, 4B, and 4C, detection and evaluation of another exemplary incident of interest and generation of an associated knowledge graph are illustrated, in accordance with some embodiments of the present disclosure. Referring now to FIG. 4A, a situation 400A is captured and assessed to determine the incident of interest 402, in accordance with an embodiment. One or more surveillance devices 102 may continuously capture and share multi-modal inputs, along with geo-location information, with the edge server 104. The edge server 104 may then use ML/AI models 208 to identify the incident of interest 402 by analyzing the captured multi-modal inputs. In the illustrated embodiment, the incident of interest 402 may include a crime. By way of an example, the initial multi-modal inputs may include, but are not limited to, images/voices captured at the crime scene from a public CCTV camera/microphone (e.g., a pointed weapon, panic in a voice, a gunshot, etc.). The edge server 104 may then identify persons in a geo-location of the incident of interest 402 by performing at least one of a facial recognition of faces or a voice recognition of voices. The edge server 104 may then determine the access devices 110 belonging to the identified persons in the geo-location of the incident of interest 402 so as to trigger gathering of additional information.
[034] Once the contact information is retrieved, the edge server 104 may identify a correlation between a geo-location extracted from the initial multi-modal inputs and a geo-location of the access devices corresponding to the retrieved contact information of the persons, so as to further validate the incident of interest 402. The edge server 104 may also remotely enable at least one access device 110 to collect additional multi-modal inputs with respect to the incident of interest 402. For example, the additional multi-modal inputs may be a voice recording of a terrorist threatening hostages. It should be noted that such collection of additional multi-modal inputs may preferably be performed in stealth mode. Referring now to FIG. 4B, a transcript of the captured voice recording 404 is provided, in accordance with an embodiment. In the illustrated embodiment, the transcript of the captured voice recording may include, but is not limited to, “leave me”, “help”, a pleading tone, a gunshot, and a groaning sound. The edge server 104 may then perform analysis based on the initial and the additional multi-modal inputs, in order to identify an actionable event. For example, the actionable event may be that a person is highly endangered. Thereafter, based on the analysis of the initial and the additional multi-modal inputs, the edge server 104 may provide a notification of the actionable event along with a recommendation to one or more appropriate authorities 114.
[035] Referring now to FIG. 4C, a knowledge graph 400C representing the learning from the situation 400A is illustrated, in accordance with an embodiment. In the illustrated embodiment, an incident of interest such as the crime 402C involves a terrorist 404C. The terrorist 404C makes people put their hands at the back 406C. The terrorist 404C wears a mask 408C and holds a gun 410C, which on firing produces a gunshot sound 412C. The gunshot 412C may lead to a groaning sound 414C and may wound a victim 416C. The wounded victim 416C may speak “leave me” 420C, while making a groaning sound 418C.
[036] Referring now to FIG. 5, an exemplary control logic 500 for detecting and notifying actionable events during surveillance is depicted via a flowchart, in accordance with some embodiments of the present disclosure. At step 502, initial multi-modal inputs may be received from the plurality of surveillance devices 102. The initial multi-modal inputs may correspond to a live feed captured during surveillance. By way of an example, the initial multi-modal inputs may include, but are not limited to, speech, image, video, a notification from a device, and a sensor reading (e.g., heart rate, GPS readings, etc.) from a device. At step 504, an incident of interest and a geo-location of the incident of interest are determined based on the analysis of the initial multi-modal inputs. By way of an example, the incident of interest may include, but is not limited to, an accident, a crime, a riot, and a natural disaster. At step 506, additional multi-modal inputs may be collected from at least one access device corresponding to at least one person in the geo-location of the incident of interest. The additional multi-modal inputs may be collected by enabling an emergency application installed in the at least one access device corresponding to the at least one person in the geo-location of the incident of interest. By way of an example, the additional multi-modal inputs may include, but are not limited to, voices of the identified persons, a video-recording of the surroundings, and monitored sensor parameters. As will be appreciated, the additional multi-modal inputs may provide detailed information with respect to the incident of interest (e.g., how the accident happened, a nature of injury, amount of blood loss, etc.), personal information with respect to the person involved (blood group, emergency contact number, etc.), and so forth.
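The five steps of control logic 500 can be sketched as a plain pipeline. All function bodies below are stand-in assumptions (the specification performs these steps with ML/AI models); only the step ordering reflects the flowchart.

```python
def determine_incident(initial_inputs):
    # Step 504 (stand-in rule): any input tagged as a collision
    # suggests an accident incident of interest.
    if any(i.get("type") == "collision" for i in initial_inputs):
        return "accident"
    return None

def collect_additional_inputs(geo_location):
    # Step 506: placeholder for enabling nearby access devices.
    return [{"type": "voice", "geo": geo_location}]

def determine_actionable_event(initial_inputs, additional_inputs):
    # Step 508: stand-in joint analysis of both input sets.
    return "injured person in road accident" if additional_inputs else None

def surveillance_pipeline(initial_inputs, geo_location):
    notifications = []
    incident = determine_incident(initial_inputs)        # step 504
    if incident:
        extra = collect_additional_inputs(geo_location)  # step 506
        event = determine_actionable_event(initial_inputs, extra)  # step 508
        if event:
            notifications.append(                        # step 510
                {"event": event, "geo": geo_location,
                 "authorities": ["nearby police station",
                                 "nearby hospital"]})
    return notifications

result = surveillance_pipeline([{"type": "collision"}], (12.97, 77.59))
```

The design point the flowchart makes is that the additional-input collection (step 506) is gated on step 504: devices are only enabled after an incident of interest is detected.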
[037] At step 508, the initial and the additional multi-modal inputs are analyzed for determining an actionable event. In an embodiment, the actionable event may correspond to any emergency situation that requires assistance. By way of an example, the actionable event may include, but is not limited to, an injured person in a road accident, a person in an endangered situation, and a fire emergency. At step 510, a notification about the identified actionable event is provided to one or more appropriate authorities. The notification may further be followed by a recommendation with respect to the actionable event to the one or more authorities. By way of an example, the one or more authorities may include, but are not limited to, a nearby police station, a nearby hospital, and a nearby fire station. The notification may also be sent to friends and family when such information is available.
[038] Referring now to FIG. 6, an exemplary control logic 600 for identifying persons present in a geo-location of an incident of interest is depicted via a flowchart, in accordance with some embodiments of the present disclosure. At step 602, a set of persons may be identified in the geo-location of the incident of interest by performing at least one of a facial recognition of a set of faces or a voice recognition of a set of voices in the multi-modal inputs against a plurality of persons in an identity database. By way of an example, the identity database may include, but is not limited to, a UIDAI database and a social security database. Once the set of persons is identified, at step 604, a set of access devices is identified corresponding to the set of persons present in the geo-location of the incident of interest. By way of an example, the access devices may include, but are not limited to, a mobile device, a tablet device, a wearable device, and any other communication device. Once the set of access devices is identified, at step 606, the set of access devices may be activated, in order to collect additional multi-modal inputs.
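A minimal sketch of control logic 600 follows, with faces simplified to string signatures and recognition reduced to exact lookup. The database contents, the `DEVICE_REGISTRY` mapping, and the match rule are all assumptions for illustration; real recognition would use trained models against an identity database such as those named above.

```python
IDENTITY_DB = {"face_a": "Alice", "face_b": "Bob"}   # step 602 lookup
DEVICE_REGISTRY = {"Alice": "device_alice_phone"}    # step 604 lookup

def identify_persons(extracted_faces):
    # Step 602: recognize extracted faces against the identity database;
    # unrecognized faces are simply dropped.
    return [IDENTITY_DB[f] for f in extracted_faces if f in IDENTITY_DB]

def activate_devices(persons):
    # Steps 604-606: map identified persons to their access devices
    # and return the devices to be activated.
    return [DEVICE_REGISTRY[p] for p in persons if p in DEVICE_REGISTRY]

persons = identify_persons(["face_a", "face_unknown"])
activated = activate_devices(persons)
```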
[039] Referring now to FIG. 7, another exemplary control logic 700 for identifying persons present in the geo-location of the incident of interest is depicted via a flowchart, in accordance with some embodiments of the present disclosure. At step 702, a plurality of access devices present in the geo-location of the incident of interest are determined, based on inputs from one or more network operators. The network operators may correspond to wired or wireless communication service providers that own the infrastructure necessary to sell and deliver services to mobile network operators (MNOs), virtual network operators, and end users. At step 704, a plurality of persons corresponding to the plurality of access devices are identified from a population register. At step 706, a set of persons present at the exact place of the incident of interest are identified from among the plurality of persons identified at step 704. The identification at step 706 may be performed by performing at least one of a facial recognition of a set of faces or a voice recognition of a set of voices in the multi-modal inputs against the plurality of persons identified at step 704. As will be appreciated, this may substantially reduce the computing resources and time required as against identifying the set of persons against a larger population. At step 708, at least one access device corresponding to at least one person from the set of persons is activated by a remote trigger. The at least one activated access device may then collect additional multi-modal inputs.
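The narrowing in control logic 700 can be sketched as follows: rather than recognizing scene faces against the full population register, the operator's cell data first yields a small candidate set, and recognition runs only against those candidates. The data shapes and the 1000-entry register are assumptions for illustration.

```python
# A hypothetical population register: person -> face signature.
POPULATION_REGISTER = {f"person_{i}": f"face_{i}" for i in range(1000)}

# Step 702 (assumed operator data): devices reported in the cell serving
# the geo-location of the incident, mapped to their subscribers.
DEVICES_IN_CELL = {"device_7": "person_7", "device_42": "person_42"}

def identify_at_scene(scene_faces):
    # Step 704: devices in the cell -> candidate persons.
    candidates = set(DEVICES_IN_CELL.values())
    # Step 706: recognition only against the narrowed candidate set
    # (2 candidates here, instead of the full 1000-person register).
    return sorted(p for p in candidates
                  if POPULATION_REGISTER[p] in scene_faces)

present = identify_at_scene({"face_42"})
```

The efficiency claim in the paragraph above corresponds to the loop in step 706 touching only `len(DEVICES_IN_CELL)` candidates instead of the whole register.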
[040] Referring now to FIG. 8, an exemplary control logic 800 for generating a catalogue of incidents of interest for subsequent evaluation of new incidents of interest is depicted via a flowchart, in accordance with some embodiments of the present disclosure. At step 802, a confidence score is generated for each of a plurality of incidents of interest determined over a period of time, based on a criticality of the actionable event. By way of an example, a critical actionable event may correspond to any event that is crucial, dangerous, or risky. Once the confidence scores are generated, at step 804, a catalogue is created based on the confidence scores associated with the plurality of incidents of interest and the actionable events. At step 806, the created catalogue is utilized for evaluating a new incident of interest. As discussed above, the catalogue may also be employed to generate a story of the incident of interest and/or the actionable event based on the received multi-modal inputs. This story may be presented to appropriate authorities during notification of the actionable event. As will be appreciated, this helps in providing a more effective and efficient remedial response to the person in need of assistance. Alternatively, as discussed above, the actionable event itself may be in the form of a story.
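A hedged sketch of control logic 800: past incidents are accumulated into a catalogue keyed by incident type, each with a confidence score and an associated actionable event, and a new incident is evaluated by looking up its type. Deriving the score directly from a 0-to-1 criticality value, and keying by type, are both illustrative assumptions.

```python
catalogue = {}  # incident type -> (confidence score, actionable event)

def add_to_catalogue(incident_type, criticality, actionable_event):
    # Steps 802-804: the confidence score is derived from the
    # criticality of the actionable event (assumed 0..1 scale here).
    catalogue[incident_type] = (criticality, actionable_event)

def evaluate_new_incident(incident_type):
    # Step 806: consult the catalogue for a prior incident of the
    # same type; None means no prior learning is available.
    return catalogue.get(incident_type)

add_to_catalogue("road accident", 0.9, "injured person in road accident")
add_to_catalogue("fire", 0.95, "fire emergency")
match = evaluate_new_incident("road accident")
```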
[041] Referring now to FIG. 9, an exemplary control logic 900 for detecting and notifying actionable events during surveillance by public surveillance devices is depicted via a flowchart, in accordance with some embodiments of the present disclosure. At step 902, initial multi-modal inputs, along with a geo-location, are received during surveillance by public surveillance devices. At step 904, based on an analysis of the received initial multi-modal inputs, an incident of interest is determined. Once the incident of interest is determined, at step 906, additional multi-modal inputs are collected from at least one access device corresponding to at least one person in the geo-location. At step 908, the initial and the additional multi-modal inputs are analyzed, in order to determine an actionable event. Thereafter, based on the determined actionable event, at step 910, a notification along with a recommendation for the actionable event is provided to one or more appropriate authorities.
[042] Referring now to FIG. 10, an exemplary control logic 1000 for detecting and notifying actionable events during surveillance is depicted in greater detail via a flowchart, in accordance with some embodiments of the present disclosure. At step 1002, initial multi-modal inputs may be received from a geo-location during surveillance from a plurality of surveillance devices. By way of an example, the initial multi-modal inputs may include, but are not limited to, speech, image, video, a notification from a device, and a sensor reading from a device. By way of another example, the surveillance devices 102 may include, but are not limited to, a CCTV camera, an IP camera, a microphone, an IoT sensor, a mobile device, a hand-held device, and a wearable device. As stated above, the initial multi-modal inputs may come with or may include geo-location information of the reading. These received initial multi-modal inputs are further analyzed in order to identify an incident of interest. The incident of interest may include, but is not limited to, an accident, a crime, a riot, and a natural disaster. By way of an example, the initial multi-modal inputs may be a feed from public surveillance devices (e.g., CCTV cameras) showing a damaged stationary vehicle, probably with an injured passenger.
[043] Further, at step 1004, a set of persons may be detected from the initial multi-modal inputs. A set of identifiers (e.g., face, voice, etc.) corresponding to each of the set of persons may be extracted from the initial multi-modal inputs. At step 1005, the set of persons may be identified in the geo-location of the incident of interest based on the extracted identifiers. By way of an example, in order to identify the set of persons, at least one of a facial recognition of a set of faces or a voice recognition of a set of voices may be performed. Additionally, one or more mobile numbers corresponding to at least one person from the set of identified persons may be retrieved. At step 1006, a geo-location may be retrieved for the one or more retrieved mobile numbers. Thereafter, at step 1008, the geo-location corresponding to the set of persons is verified against the geo-location of the public surveillance device providing the feed. It should be noted that, in some embodiments, the set of persons may be a single person with a verified geo-location (i.e., the location of the mobile device of the identified single person is in the geo-location of the incident of interest).
[044] If the geo-location corresponding to the set of persons matches the geo-location of the public surveillance devices, at step 1010, a request is made for collecting additional multi-modal inputs. As stated above, the additional multi-modal inputs may be collected by enabling an emergency application installed in at least one access device corresponding to at least one person in the geo-location of the incident of interest. Further, as stated above, the emergency application in the mobile device of the identified person may be triggered automatically or manually, in stealth mode or in non-stealth mode, with a notification or without a notification, and with or without permission. By way of an example, the additional multi-modal inputs may include, but are not limited to, voices of the identified persons, a video-recording of the surroundings, and monitored sensor parameters. Alternatively, if the geo-location corresponding to the set of persons does not match the geo-location of the public surveillance devices, the process reiterates back to step 1004.
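The geo-location match gating steps 1008-1010 can be sketched as a proximity check between a mobile number's last known position and the camera's position. The 150 m tolerance radius and the planar distance approximation are assumptions; the specification does not define a matching rule.

```python
import math

def within_radius(device_geo, camera_geo, radius_m=150.0):
    """Approximate planar distance check between two (lat, lon) points;
    adequate for the short ranges involved in this comparison."""
    lat1, lon1 = device_geo
    lat2, lon2 = camera_geo
    # Rough metres-per-degree conversion (assumption): ~111 km per
    # degree of latitude, scaled by cos(latitude) for longitude.
    dy = (lat2 - lat1) * 111_000.0
    dx = (lon2 - lon1) * 111_000.0 * math.cos(math.radians(lat1))
    return math.hypot(dx, dy) <= radius_m

camera = (12.9716, 77.5946)
nearby_device = (12.9717, 77.5947)  # ~15 m away -> request inputs (step 1010)
far_device = (12.99, 77.60)         # ~2 km away -> reiterate to step 1004
```

A production system would likely use operator-supplied cell or GPS fixes and a geodesic distance, but the gating decision has the same shape.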
[045] At step 1012, the initial and additional multi-modal inputs are analyzed for determining an actionable event, via machine learning or artificial intelligence. By way of an example, the actionable event may include, but is not limited to, an injured person in road accident, a person in endangered situation, and a fire emergency. At step 1014, a story is made based on the identified incident of interest and the determined actionable event. Thereafter, at step 1016, a notification about the actionable event is provided to one or more appropriate authorities. The notification may further be followed with a recommendation for the actionable event, to the one or more authorities. By way of an example, one or more authorities may include, but are not limited to, nearby police station, nearby hospital, and nearby fire station.
[046] In an alternative embodiment, if the identified incident of interest is not received from public surveillance devices, then, at step 1018, a user device in possession of the user may be triggered via an external device or by the user. The user device may then initiate a trigger for collecting initial multi-modal inputs. By way of an example, the user device may include the emergency application discussed above. The user device may include, but is not limited to, an access device 110, a fitness tracker, a tracking device, or a specialized device. At step 1020, the user device may communicate with one or more external devices 112 for collecting additional multi-modal inputs. By way of an example, the external devices 112 may include, but are not limited to, a camera, a voice recorder, a fitness tracker, and a health monitoring system. Additionally, in some embodiments, the user device may note a timestamp at which a request is initiated to the external devices for collecting the additional multi-modal inputs, and a timestamp at which the user device receives the additional multi-modal inputs. The user device may also capture geo-location information.
[047] For example, in some embodiments, if the trigger is from a blood sugar level sensor or a blood pressure monitor, the user device may recognize and receive the metadata in order to identify and make sense of the shared data. It should be noted that the user device may receive a trigger when the blood sugar level drops beyond a pre-defined threshold or when the blood pressure crosses pre-defined thresholds. In that case, the blood sugar level sensor and the blood pressure monitor may continuously share the sugar level and the blood pressure level with the user device. Thereafter, at step 1020, a first processing of the initial multi-modal inputs and the additional multi-modal inputs may be performed by the user device. By way of an example, the user device may perform the first processing based on the emergency application installed on the user device. The user device may then share the initial and the additional multi-modal inputs with the edge server 104 for further processing. The edge server 104, at step 1012, may then analyze the initial and the additional multi-modal inputs for determining an actionable event, via machine learning or artificial intelligence. By way of an example, the actionable event may include, but is not limited to, a health emergency. At step 1014, a story is made based on the identified actionable event. Thereafter, at step 1016, a notification about the identified actionable event is provided to one or more appropriate authorities. The notification may further be followed by a recommendation with respect to the actionable event to the one or more authorities. By way of an example, the one or more authorities may include, but are not limited to, a health center or an emergency response team.
[048] By way of another example, an incident of interest may come from a handheld device. For example, a person riding a vehicle may experience a sudden shock, as detected by the handheld device. In such a case, if the device ceases to move soon after, it is an indication of a potential fall or an accident encountered by the person. Under such a condition, the application resident on the handheld device may trigger an emergency alarm either directly or via the communicatively coupled mobile device, subject to further checks to eliminate a false trigger.
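The shock-then-stationary heuristic in this example can be sketched as below. The acceleration and speed thresholds, and the window length, are invented values; a real device would tune these and run the false-trigger checks mentioned above before raising the alarm.

```python
SHOCK_G = 4.0       # acceleration spike threshold in g (assumed value)
STILL_SPEED = 0.2   # m/s below which the device counts as stationary
STILL_WINDOW = 3    # consecutive stationary samples required (assumed)

def detect_fall(accel_g, speeds_after):
    """Return True when a shock is followed by sustained stillness."""
    if accel_g < SHOCK_G:
        return False  # no shock, no candidate incident
    still = sum(1 for s in speeds_after if s < STILL_SPEED)
    return still >= STILL_WINDOW

# Shock followed by no movement -> candidate fall/accident alarm.
fall = detect_fall(6.5, [0.0, 0.1, 0.05, 0.0])
# Shock but the rider keeps moving (e.g., a pothole) -> no alarm.
bump = detect_fall(6.5, [4.2, 4.0, 3.9, 4.1])
```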
[049] By way of a further example, an incident of interest may come from a fitness tracking device. For example, a person running away from a criminal may experience an increased heart rate due to panic. In such a case, if the heart rate reading crosses a certain pre-defined enhanced threshold (a threshold greater than that experienced due to normal running) soon after, it is an indication of a potential threat encountered by the person. Under such a condition, the fitness tracking device may trigger an emergency application resident on the access device, which may trigger an emergency alarm, subject to further checks to eliminate a false trigger. In such a case, the edge server may further validate the threat by correlating a geo-location of the perceived threat with a feed from a public CCTV camera in that geo-location.
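The two-part check in this example — an enhanced heart-rate threshold plus CCTV corroboration by the edge server — can be sketched as follows. The specific beats-per-minute values are illustrative assumptions, and the CCTV correlation is reduced to a boolean flag.

```python
NORMAL_RUNNING_MAX_BPM = 165   # assumed ceiling for ordinary exertion
ENHANCED_THRESHOLD_BPM = 185   # assumed panic-level threshold, above running

def should_trigger(heart_rate_bpm, cctv_confirms_threat):
    """Fire the alarm only when the reading exceeds the enhanced
    threshold AND the edge server's CCTV correlation corroborates it."""
    exceeds = heart_rate_bpm > ENHANCED_THRESHOLD_BPM
    return exceeds and cctv_confirms_threat

alarm_confirmed = should_trigger(192, True)     # panic + CCTV match
alarm_unconfirmed = should_trigger(192, False)  # no CCTV corroboration
no_alarm = should_trigger(150, True)            # normal exertion
```

Requiring both signals is one way to realize the "further check to eliminate false trigger" that the paragraph describes.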
[050] As will be also appreciated, the above described techniques may take the form of computer or controller implemented processes and apparatuses for practicing those processes. The disclosure can also be embodied in the form of computer program code containing instructions embodied in tangible media, such as floppy diskettes, solid state drives, CD-ROMs, hard drives, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer or controller, the computer becomes an apparatus for practicing the invention. The disclosure may also be embodied in the form of computer program code or signal, for example, whether stored in a storage medium, loaded into and/or executed by a computer or controller, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits.
[051] It will be appreciated that, for clarity purposes, the above description has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units, processors or domains may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.
[052] The techniques described in the various embodiments discussed above provide for validated and actionable events that require a remedial response. The techniques further provide layered details of the situation by identifying a type and a severity of the situation, a condition of an individual in the situation, and so forth. As will be appreciated, such details help in providing an appropriate and better remedial response. Thus, the techniques provide for two levels of processing of information - a first level processing of the initial multi-modal inputs and a second level processing of the initial and the additional multi-modal inputs. In some embodiments, the first level of processing may be performed at the device level while the second level of processing may be performed at the edge server. This reduces the latency and bandwidth requirements and increases computational efficiency at the edge server.
[053] In some embodiments, the techniques provide for detection of any untoward activity by using public infrastructure as well as other personal devices. The techniques further provide for collection of additional information to form a story about any untoward activity. It should be noted that the techniques provide for an emergency trigger and additional information collection in an automatic as well as a semi-automatic way. This helps in handling aggravated emergency conditions in an effective and efficient manner. For example, the techniques may trigger an alarm and capture additional details with respect to an endangered person without drawing the attention of the aggressor.
[054] The specification has described method and system for detecting and notifying actionable events during surveillance. The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
[055] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[056] It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.