A System For Enhanced Object Tracking.


Abstract: The present invention relates to object tracking systems that detect the presence of any moving object in a scene, track the object to distinguish it from other similar objects in the scene, and record the trajectory of the object. In particular, the invention relates to automatically tracking the object and zooming in on it so that the detailed features of the object are visible in the video frames, in a manner that can be advantageously deployed on real-life video even when the video is corrupted by noise such as shadow, glare and electronic noise. Further, the system of the invention is directed to be adaptive to demographic and environmental variations. The object tracking system of the invention is adapted to enhance the functionality and utility of a traditional object tracking system while eliminating the drawbacks of a standalone PTZ camera based tracking mechanism.

Patent Information

Application #:
Filing Date: 12 March 2012
Publication Number: 37/2013
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2020-03-17
Renewal Date:

Applicants

VIDEONETICS TECHNOLOGY PRIVATE LIMITED

PLOT-5, BLOCK-BP, SALT LAKE, KOLKATA-700091

Inventors

1. ACHARYA, TINKU

E-375 BAISHNABGHATA - PATULI TOWNSHIP, KOLKATA - 700091, WEST BENGAL, INDIA

2. BHATTACHARYYA, DIPAK

16/839 KISHORI BAGAN MEARBER PIRTALA, P.O.: CHINSURAH, DIST.:HOOGHLY, PIN: 712101 WEST BENGAL, INDIA.

3. BOSE, TUHIN

BE-1/14/1, PEYARA BAGAN DESHBANDHU NAGAR, CITY:KOLKATA, PIN: 700 059, WEST BENGAL, INDIA

4. DALAL, TUTAI KUMAR

KHIRPAI - HATTALA (WD - 4), DIST: PASCHIM MEDINIPUR, PIN: 721232, WEST BENGAL, INDIA.

5. DAS, SAWAN

16 GREEN VIEW, GARIA, CITY: KOLKATA, PIN: 700084, WEST BENGAL, INDIA.

6. DHAR, SOUMYADEEP

PURBAYAN APPARTMENT, TARASANKAAR ROAD BY LANE, DESBANDHU PARA, P.O.: SILIGURI, DIST: DARJEELING, PIN: 734404, WEST BENGAL, INDIA.

7. MAITY, SOUMYADIP

VILL & PO: DUMARDARI, PIN: 721425, PURBA MEDINIPUR, WEST BENGAL, INDIA

Specification

Field of the Invention
The present invention relates to object tracking systems that detect the presence of
any moving object in a scene, track the object to distinguish it from other similar
objects in the scene, and record the trajectory of the object. In particular, the
invention relates to automatically tracking the object and zooming in on it so that the
detailed features of the object are visible in the video frames, in a manner that can be
advantageously deployed on real-life video even when the video is corrupted by noise
such as shadow, glare and electronic noise. Further, the system of the invention is
directed to be adaptive to demographic and environmental variations. The object
tracking system of the invention is adapted to enhance the functionality and utility of
a traditional object tracking system while eliminating the drawbacks of a standalone
PTZ camera based tracking mechanism.
Background of the Invention
Video Management Systems are used for video data acquisition and search processes
using single or multiple servers. They are often loosely coupled with one or more
separate systems for performing operations on the acquired video data such as
analyzing the video content, etc. Servers can record different types of data in storage
media, and the storage media can be directly attached to the servers or accessed
over IP network. This demands a significant amount of network bandwidth to receive
data from the sensors (e.g., cameras) and to concurrently transfer or upload the data
to the storage media. Due to the high bandwidth demand of such tasks, especially for
video data, separate high-speed networks are often dedicated to transferring data to
the storage media. Dedicated high-speed networks are costly and often require costly
storage devices as well, which is often overkill for low or moderately priced
installations.
It is also known that, to back up against server failures, one or more dedicated
fail-over (sometimes called mirror) servers are often deployed in the prior art.
Dedicated fail-over servers remain unused during normal operation, which wastes such
costly resources. Also, a central server process, installed either in the fail-over
server or in a central server, is required to initiate the back-up service in case a
server stops operating. This strategy does not avoid a single point of failure.
Moreover, when the servers and clients reside at different ends of the Internet and
the connectivity suffers from low or widely varying bandwidth, transmission of
multi-channel data from one point to another becomes a challenge. Data aggregation
techniques are often applied in such cases which are computationally intensive or
suffer from inter-channel interference, particularly for video, audio or other types of
multimedia data.
As regards analytic servers presently in use, it is well known that there are many
video analytics systems in the prior art. Video content analysis is often done on a
per-frame basis that is mostly predefined, which makes such systems not only lacking
in the desired efficiency of analytics but also unnecessarily costly, with unwanted
loss of valuable computing resources.
Added to the above, with presently available techniques of video analysis, an
unacceptable number of false alarms is reported when the content analysis
systems are deployed in a noisy environment for generating alerts in real time. This
is because the traditional methods are not automatically adaptive to demography
specific environmental conditions, varying illumination levels, varying behavioural
and movement patterns of the moving objects in a scene, changes of appearance of
colour in varying lighting conditions, changes of appearance of colours in global or
regional illumination intensity and type of illumination, and similar other factors.
It has therefore been a challenge to identify the appearance of a non-moving foreign
object (static object) in a scene in presence of other moving objects, where the
moving objects occasionally occlude the static object. Detection accuracy suffers in
various degrees under different demographic conditions.
Extraction of particular types of objects (e.g., but not limited to, the face of a
person) in images based on fiducial points is a known technique. However, the
computational requirement is often too high for the traditional classifiers used for
this purpose in the prior art, e.g., the Haar classifier.
Also, in a distributed system where multiple sites with independent administrative
controls are present, unification of those systems through a central monitoring
station may be required at any later point of time. This necessitates hardware and OS
independence in addition to the backward compatibility of the underlying
computational infrastructure components, and the software architecture should
accommodate such amalgamation as well.
It would thus be clearly apparent from the above state of the art that there is a need
for advancement in the art of sensory input/data (such as video) acquisition cum
recording and/or analytics of such sensory inputs/data (such as video feeds), adapted
to facilitate fail-safe integration and/or optimized utilization of various sensory
inputs for various utility applications including event/alert generation, recording
and related aspects.
It is well known that object tracking systems are used to detect the presence of any
moving object in a scene and track the object to distinguish it from other similar
objects in the scene and also to record the trajectory of the object. In some such
systems, video data of the scene as captured by a fixed camera is analyzed to detect
and track moving objects. However, this requires the background to be stable and
the camera to cover the whole region where the trajectory is to be formed. This
has the side effect that the size of the object in the camera view becomes small,
particularly when the object is far away.
To overcome this limitation, PTZ camera based tracking systems are used, where a
PTZ camera automatically tracks the object and zooms in on it so that the detailed
features of the object are visible in the video frames. However, traditional PTZ
based tracking systems suffer from some major drawbacks and are not deployable on
real-life video, particularly when the video is corrupted by noise such as shadow,
glare and electronic noise. Also, such systems are not adaptive to demographic and
environmental variations.
Additionally, when a PTZ camera starts tracking an object, it loses visibility of other
parts of the scene. Therefore, some important scene event may be missed while the
PTZ camera tracks one of the objects. This may encourage miscreants to fool the
system. The accuracy of detection and tracking of objects is also very low, as there is
no fixed background while the tracking is in progress and the foreground objects are
to be extracted based on motion detection or some modified version of the method or
using some modified version of object extraction technique from still images. In case
of some tracking error, which is likely to occur when the speed of the object in the
scene is high or random, the system cannot recover from this error state in a short
time, as it loses visibility of the object.
Objects of the Invention
It is thus the basic object of the present invention to provide for a system for
enhanced object tracking which would be efficient and enable object tracking in
conjunction with one or more PTZ cameras.

Another object of the present invention is directed to a system for enhanced object
tracking wherein when an object is detected in the Fixed camera view, the object
tracking system would be adapted to track the object and pass on the positional
information of the object along with velocity prediction data to the PTZ camera
controller.
A further object of the present invention is directed to a system for enhanced object
tracking wherein if more than a single object is detected, one object is taken at a
time for handling based on select criteria (viz, the zone of appearance of the object,
the duration of the object in the scene etc.).
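The selection of one object at a time can be illustrated with a minimal Python sketch. The specification names the criteria (zone of appearance, duration in the scene) but not how they are combined, so the priority table `ZONE_PRIORITY`, the `TrackedObject` type and the tie-breaking rule below are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    object_id: int
    zone: str          # zone of appearance in the fixed camera view
    duration_s: float  # time the object has spent in the scene, in seconds


# Hypothetical zone priorities (not from the specification); lower is handled first.
ZONE_PRIORITY = {"restricted": 0, "entrance": 1, "general": 2}


def select_object(objects):
    """Pick one object for the PTZ camera: best zone first, then longest duration."""
    if not objects:
        return None
    return min(
        objects,
        key=lambda o: (ZONE_PRIORITY.get(o.zone, len(ZONE_PRIORITY)), -o.duration_s),
    )
```

Unknown zones fall behind every listed zone in this sketch; a deployment could equally weight duration above zone.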
A yet further object of the present invention is directed to a system for enhanced
object tracking involving a PTZ camera controller adapted to receive the positional
information of the object for each frame and to estimate the corresponding position of
the object in the PTZ camera view involving an advanced scene registration and
coordinate transformation technique.
Another object of the present invention is directed to a system for enhanced object
tracking which would enhance the functionalities and utility of a traditional Object
tracking system and at the same time eliminates the drawbacks of a standalone PTZ
camera based tracking mechanism.
Yet another object of the invention is directed to a system for enhanced object
tracking which can be extended to develop systems that handle multiple objects in
parallel with more than one PTZ camera, and which is further adaptable to triggers
from multiple fixed cameras, so as to develop a system with multiple fixed cameras and
multiple PTZ cameras together to cover a wider range in the scene, or to enhance
multiple object tracking systems over a single framework.
Another object of the present invention is directed to advancements in the system
discussed above by interconnecting a number of intelligent components consisting of
hardware and software, and involving implementation techniques adapted to make
the system efficient, scalable, cost effective, fail-safe, adaptive to various
demographic conditions, and adaptive to various computing and communication
infrastructural facilities.

Summary of the Invention
Thus according to the basic aspect of the present invention there is provided a
system for enhanced object tracking comprising:
object tracking means in conjunction with one or more PTZ cameras, wherein when
an object is first detected in a fixed camera view of the said object tracking means,
the same is adapted to track the object and also to generate and transmit the
positional values along with velocity prediction data to the PTZ camera controller;
said PTZ camera controller being adapted to receive the positional information of the
object periodically and to estimate the corresponding position of the object in the
PTZ camera view involving scene registration and coordinate transformation technique.
A system for enhanced object tracking as above wherein more than one object is
tracked involving multiple PTZ cameras such as to cover a wider range in the scene
and to enhance multiple object tracking over a single framework.
A system for enhanced object tracking as above wherein said means of coordinate
transformation from fixed camera view to PTZ camera view involves coordinate
transformation technique comprising weighted interpolation method.
A system for enhanced object tracking as above which is adapted to carry out said
coordinate transformation following:
a. identifying a set of points in the static camera view as A, B, etc. and also the
corresponding points A', B', etc. respectively in the PTZ camera view by the user;
b. mapping any arbitrary point C in the static camera view to the corresponding point
C' in the PTZ camera view dynamically wherein:
a_x, b_x, c_x are the x-coordinates of points A, B and C respectively in the static
camera view and similarly a'_x, b'_x and c'_x are those of the corresponding points in
the PTZ view, where c'_x is interpolated with the help of points A and B, with a
confidence factor W_AB, where W_AB [Minimum of (c_x - b_x, c_x - a_x)] is determined to be
[equation not reproduced in the source]
and wherein similarly, an estimate of the x-coordinate of the same point C is generated
for all pairs of points (A, B) in the static camera view based on:
[equation not reproduced in the source]
and similarly generating also the y-coordinate c'_y for the point C.
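Because the two formulas are not reproduced in this text, the weighted interpolation can only be sketched under assumptions. The Python sketch below assumes (a) that each pair (A, B) yields a linear estimate of c'_x through the correspondences (a_x, a'_x) and (b_x, b'_x), (b) that the confidence factor W_AB is the inverse of the minimum distance from C to A or B along the axis, and (c) that the final coordinate is the weight-normalised average over all pairs. None of these three choices is confirmed by the specification, and the function names are hypothetical.

```python
from itertools import combinations


def interpolate_axis(c, static_vals, ptz_vals, eps=1e-9):
    """Estimate one PTZ-view coordinate of C from every reference pair (A, B)."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(static_vals)), 2):
        a, b = static_vals[i], static_vals[j]
        a_p, b_p = ptz_vals[i], ptz_vals[j]
        if abs(b - a) < eps:
            continue  # this pair carries no information along this axis
        # Assumed estimate: straight line through (a, a') and (b, b').
        estimate = a_p + (c - a) * (b_p - a_p) / (b - a)
        # Assumed weight: inverse of the distance to the nearer reference point.
        weight = 1.0 / (min(abs(c - a), abs(c - b)) + eps)
        num += weight * estimate
        den += weight
    if den == 0.0:
        raise ValueError("no usable reference pair along this axis")
    return num / den


def map_point(c, static_refs, ptz_refs):
    """Map point C = (x, y) from the static view to the PTZ view, axis by axis."""
    x = interpolate_axis(c[0], [p[0] for p in static_refs], [p[0] for p in ptz_refs])
    y = interpolate_axis(c[1], [p[1] for p in static_refs], [p[1] for p in ptz_refs])
    return x, y
```

With reference points related by a purely linear mapping, every pair gives the same estimate, so the weights matter only when the correspondence between the two views is non-linear.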
A system for enhanced object tracking as above wherein for a bounding rectangle to
be mapped from the static view to the PTZ view, the system is adapted to apply said
coordinate transformation technique for all the four corner points of the rectangle.
A system for enhanced object tracking as above wherein the bounding rectangle
corresponding to an object in the static camera view is associated with a velocity
prediction information, the system is adapted to apply that velocity prediction
information to map the rectangle in the PTZ camera view.
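The rectangle mapping can be sketched as follows in Python. The specification states that all four corners are transformed and that velocity prediction information is applied, but not how; this sketch assumes the rectangle is first shifted by velocity multiplied by a lead time and then mapped corner by corner. The `lead_time_s` parameter and the `map_point` callable (any static-to-PTZ point mapping) are hypothetical.

```python
def map_rectangle(rect, velocity, lead_time_s, map_point):
    """Map a bounding rectangle from the static view to the PTZ view.

    rect        -- (x_min, y_min, x_max, y_max) in static-view pixels
    velocity    -- predicted (vx, vy) in static-view pixels per second
    lead_time_s -- assumed look-ahead time used to apply the prediction
    map_point   -- callable mapping a static-view (x, y) to a PTZ-view (x, y)
    """
    x0, y0, x1, y1 = rect
    dx = velocity[0] * lead_time_s
    dy = velocity[1] * lead_time_s
    # Shift all four corners by the predicted displacement, then transform each.
    corners = [
        (x0 + dx, y0 + dy),
        (x1 + dx, y0 + dy),
        (x1 + dx, y1 + dy),
        (x0 + dx, y1 + dy),
    ]
    mapped = [map_point(corner) for corner in corners]
    xs = [p[0] for p in mapped]
    ys = [p[1] for p in mapped]
    # Return the axis-aligned box enclosing the four transformed corners.
    return (min(xs), min(ys), max(xs), max(ys))
```

Taking the enclosing box of the mapped corners keeps the result a rectangle even when the transformation distorts the original shape.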
The framework disclosed herein can be used for such situations, and also for
integrating multiple heterogeneous systems in a distributed environment. The
proposed architecture is versatile enough to interface with and scale to many other
management systems.
The details of the invention and its objects and advantages are explained hereunder
in greater detail in relation to the following non-limiting exemplary illustrations as per
the following accompanying figures:
Brief Description of the Drawings
Fig. 1: is a schematic layout of an illustrative embodiment showing an integrated
intelligent server based system of the invention having sensory input/data
acquisition cum recording server group and/or analytics server group adapted to
facilitate fail-safe integration and/or optimized utilization of various sensory inputs
for various utility applications;
Fig. 2: is an illustrative top level view of intelligent video management system with
framework for multiple autonomous system integration;
Fig. 3: is an illustration of fail-safe bandwidth optimized recording without any
supporting failover support server in accordance with the present invention;
Fig. 4: is an illustration of the dataflow diagram from a single video source through
the recording server;
Figs. 4A to 4J: illustrate an exemplary "Intelligent Home Security" box involving the
system of the invention;
Fig. 5: is an illustration of the single channel data flow in video analytical engine in
accordance with the present invention;
Fig. 6: is an illustration of intelligent video analytics server in accordance with the
present invention;
Fig. 7: is an illustration of video management interface functionalities in accordance
with the present invention;
Fig. 8: is an illustration of the enhanced object tracking system in accordance with
the present invention;
Fig. 9: is an illustration of the coordinate transformation used in the present
invention;
Detailed Description of the Invention:
Reference is first invited to accompanying figure 1, which shows the broad overview
of an illustrative embodiment showing an integrated intelligent server based system
having sensory input/data acquisition cum recording server group and/or analytics
server group adapted to facilitate fail-safe integration and/or optimized utilization
of various sensory inputs for various utility applications. More specifically, the system
involves the method for bandwidth adaptive data transfer to a central storage cluster in
accordance with the present invention. The following description in relation to figures
1 to 7 deals with the utilities of the advancement in an integrated intelligent server
based system, and the description in relation to figures 8 and 9 further illustrates the
manner of effecting the stated system for enhanced object tracking in accordance with
the present invention.

As would be apparent from the figure, the system basically involves the self-reliant
group of recording servers (101), the group of analytical servers (102) and an
intelligent interface (103). Importantly, said recording servers, apart from being
mutually cooperative and self-reliant to continuously monitor and distribute the
operative load based on the number of active servers in the group, are also adapted
for bandwidth optimized fail-safe recording (104) and join-split mechanism for
multi-channel video streaming (105).
The analytical servers (102) are also adapted to cater to at least one or more of
background estimation (106), identifying moving, static and quasi-static objects (107),
enhanced object tracking (108), content aware resource scheduling (109), join-split
mechanism for sensory data streaming (110) and resource dependent accuracy
control (111).
The various components of the above system adapted to carry out the above
advanced functionalities in accordance with the present invention are further outlined
and schematically described in Fig 2:
1. Intelligent Video Management System (204)
1.1. Video Recording Server (201)
1.2. Video Management Interface (203)
1.2.1. User management and Client access controller
1.2.2. Event concentrator and Handler (206)
1.2.3. Event distributor
2. Intelligent Video Analytics Server (202)
3. Surveillance Client (207)
4. Web client (207)
5. Mobile device Client (207)
6. Remote Event Receiver (206)
As is clearly apparent from Figure 2, the present system would enable seamless and
intelligent interconnection of multiple Autonomous Systems (210-01; 210-02; ... 210-0n).
Thus at the same time, multiple such Autonomous Systems can be used as
building blocks for a distributed system spanning wide geographical regions
under different local administrative control, with a centralized view of the whole
system from a single point. An Autonomous System (210-01) is considered to be a
system capable of implementing the functionalities and services involving sensory data
and/or its analysis.

Also, the system is capable of handling any sensory data/input and it is only by way
of an illustration but not by way of any limitations of the present system that the
various exemplary illustrations hereunder are discussed with reference to video
sensory data. The underlying system architecture/methodology is applicable to other
sensory data types for a true Intelligent Sensor Management System.
A number of machine vision products spanning the domain of Security and
surveillance, Law enforcement, Data acquisition and Analysis, Transmission of
multimedia contents, etc can be adapted to one or more or the whole of the system
components of the present invention.
Reference is now invited to accompanying figure 3, which shows by way of an
embodiment a fail-safe bandwidth optimized recording without any failover support
server. As apparent from said figure, for this purpose the inputs from the pool of
sensors (305) are fed not to any single server but to a group of servers
(301). Importantly, a communication channel (303) is provided to carry inter-VRS
communication, forming a team towards failover support without any central
management and failover server, while the communication channel (302) is provided
to carry data to central storage involving the intelligent bandwidth sharing technique
of the invention.
The implementation of the Recording System:
The Recording system essentially implements the functionalities and services as
hereunder:
1. Collecting Data real time: Collect data from various images, video and
other sensory sources, both on-line and off-line, archiving and indexing
them to seamlessly map in any relational or networked database in a
fail-safe way making optimal usage of computing, communication and
storage resources, facilitate efficient search, transcoding,
retransmission, authentication of data, rendering and viewing of
archived data at any point of time.
2. Streaming data real time or on demand: Streaming video and other
sensory content in multiple formats to multiple devices for purposes like
live view in different matrix layouts, relay of the content, local archiving,
rendering of the sensory data in multiple forms and formats, etc. by a
fail-safe mechanism without affecting the speed and performance of
ongoing operations and services.
The Video Recording system is implemented using hardware and software, where the
hardware can be any standard computing platform operated under control of various
operating systems like Windows, Linux, MacOS, Unix, etc. Dependence on hardware
computing platform and operating system has been avoided and no dedicated
hardware and communication protocol has been used to implement the system.
Recording server implements an open interface both for input and output (including
standard initiatives by various industry consortia such as ONVIF, PSIA, etc.), and
can input video feeds from multiple and different types of video sources in parallel,
with varying formats including MPEG4, H.264, MJPEG, etc. OEM specific SDKs to
receive video can also be used. The internal operating principle of the Recording
server is outlined below:
Recording Server operating principle is adapted for the following:
1. Auto register itself to the IVMS system so that other components like VMS,
Surveillance Clients and other VRSs can automatically find and connect to it even
when its IP address changes automatically or manually.
2. Form a group with other VRSs in the system to implement failover support
without any central control and without support from any dedicated failover
server.
3. Accept requests from the VMI to add and delete data sources including video
sources like cameras, and receive data from those input sources over IP network
or USB or other connectivity, wired or wireless, using open protocols or SDKs
as applicable for a particular data source.
4. Record the video and other sensory data in local storage either continuously or
on trigger from external devices including the data source itself or on trigger
from other components of the Video management system or on user request
or on a combination of some of the above cases.
5. Intelligently upload the video or other sensory data to a cluster of storage
devices, where a cluster consists of one or more network accessible storages,
in an efficient way giving fair share to individual data sources, utilizing
optimal bandwidth and in a cooperative way.
6. Insert information in the database so that the data, including video data, can be
searched easily by any component in the system.
7. Stream the video or other sensory data in their original format or in some
other transcoded format to other devices including the Surveillance clients
when the surveillance client connects to it using a defined protocol.
Auto registration of servers:
All the servers in the system, including the Recording servers, auto register
themselves by requesting and then getting a unique Identification number (ID) from
the VMI. All the configuration data related to the server including the identification of
data sources including the video sources it caters to, the storage devices it uses, etc
are stored in the database against this ID. This scheme has the advantage that with
only one static IP address (that of the VMI), one can access any component of the
Autonomous System (AS), while the IP addresses of the individual hardware
components may be allowed to vary.
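The registration scheme can be sketched in Python as a small in-memory registry. This is only an illustration of the idea that configuration is keyed by a VMI-issued ID rather than by IP address; the class `VMIRegistry`, its method names and the dictionary-based storage are hypothetical and stand in for the VMI and its database.

```python
import itertools


class VMIRegistry:
    """Hypothetical in-memory stand-in for the VMI and its configuration database."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._config = {}   # server ID -> configuration data (channels, storage, ...)
        self._address = {}  # server ID -> most recently reported IP address

    def register(self, ip_address, config=None):
        """Issue a unique ID to a server and store its configuration against it."""
        server_id = next(self._next_id)
        self._config[server_id] = dict(config or {})
        self._address[server_id] = ip_address
        return server_id

    def update_address(self, server_id, ip_address):
        """Record a new IP address for an already registered server."""
        if server_id not in self._config:
            raise KeyError(server_id)
        self._address[server_id] = ip_address

    def lookup(self, server_id):
        """Return (current IP address, configuration) for a server ID."""
        return self._address[server_id], self._config[server_id]
```

Because other components resolve a server through `lookup`, a change of IP address only requires an `update_address` call and leaves the stored configuration untouched.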
Recording Video or other sensory data in local storage and streaming the data to
Client machine:
The cameras, other video sources or sources generating streaming data (henceforth
called Channels) can be auto detected or manually added to the VRS. The details of
the channels are stored in the Central Database. Once done, one or more channels
can be added to the Recording System. The Recording system thus comprises one
or more Recording servers (VRS) and the Central Database Management System.
The VRSs consult the database, learn the details of the system, and record the
channel streaming data either continuously or on trigger from any external or
internal services, as configured by the user.
The data stream is first segmented into small granular clips or segments of
programmable and variable length sizes (usually of 2 to 10 minutes duration) and the
clips are stored in the Local storage of the server, the clip metadata being stored in
local database.
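The segmentation step can be sketched in Python as a generator that groups timestamped data into clips of a configurable duration. The input format (pairs of timestamp in seconds and payload) and the metadata fields `start`, `end` and `frames` are assumptions made for illustration; the specification only states that clip length is programmable, usually 2 to 10 minutes.

```python
def segment_stream(frames, clip_seconds=120.0):
    """Group (timestamp_s, payload) pairs into clips of roughly clip_seconds each.

    Yields one dict per clip holding its start time, end time and payloads.
    """
    clip = []
    clip_start = None
    last_ts = None
    for ts, payload in frames:
        if clip_start is None:
            clip_start = ts
        elif ts - clip_start >= clip_seconds:
            # Close the current clip and start a new one at this timestamp.
            yield {"start": clip_start, "end": ts, "frames": clip}
            clip = []
            clip_start = ts
        clip.append(payload)
        last_ts = ts
    if clip:
        # Flush the final, possibly shorter, clip.
        yield {"start": clip_start, "end": last_ts, "frames": clip}
```

In a recording server, each yielded dict would correspond to one clip written to local storage, with its start and end times recorded as clip metadata.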

Reference is invited to accompanying figure 4, which shows the dataflow mechanism
in accordance with the invention from a single video source through the recording
server. As apparent from Figure 4, the sensory data stream, viz. video (405), is fed
to a data segment generator (401), and the resulting segments are next stored in local
storage (402/403) and thereafter uploaded through a data upload module (404) to a
central storage (406/407).
Any external component of the system can query the VRS to learn the details
of the channels it is using and get the data streams for purposes like live view,
relaying to other devices, etc. using a networked mutual client-server communication
protocol.
Bandwidth adaptive data uploading to central storage system
In the system of the invention, an efficient technique has been designed to transfer
video or other sensory data received from the channels to the central storage system
via the local storage. Instead of allocating a particular data source (e.g., a camera) to
a particular server (dedicated point to point) for recording of data (e.g, video), it is
allocated to a 'Server group' with multiple servers in the group [Fig 3]. The members
of the group exchange their capacity information amongst themselves and share the
load according to their capacity. In case of breakdown of one or more servers, the
team members share the load of the failed server(s), without any central control or
without support from any dedicated fail-over server. For data uploading, each server
not only monitors the available bandwidth but also the data inflow rate for each
channel into the server, and accordingly adjusts the upload rate for an individual
channel. For this purpose the data stream is segmented into variable sized clips and
the rate of uploading the clips to the central storage is adjusted depending on the
available network bandwidth and data inflow rate for that particular channel [Fig
4]. As shown in the figure, the sensor data stream (405) is segmented in the data
segment generator (401), the segments are next stored in local storage (402, 403) and
thereafter sent to the central storages (406/407) by a data upload module (404).
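The per-channel upload rate adjustment can be sketched in Python. The specification says the rate depends on the available bandwidth and on each channel's data inflow rate but does not give the rule, so this sketch assumes a simple proportional share: each channel receives a fraction of the available bandwidth equal to its fraction of the total inflow. The function name and units are hypothetical.

```python
def allocate_upload_rates(available_bw, inflow_rates):
    """Split available upload bandwidth across channels in proportion to inflow.

    available_bw -- measured upload bandwidth (e.g. in Mbit/s)
    inflow_rates -- dict mapping channel name to its data inflow rate (same unit)
    Returns a dict mapping channel name to its assigned upload rate.
    """
    total_inflow = sum(inflow_rates.values())
    if total_inflow <= 0 or available_bw <= 0:
        # Nothing to upload, or no bandwidth available right now.
        return {channel: 0.0 for channel in inflow_rates}
    return {
        channel: available_bw * rate / total_inflow
        for channel, rate in inflow_rates.items()
    }
```

Re-running the allocation whenever the measured bandwidth or inflow changes gives each data source a fair share, so a busy channel does not starve a quiet one of upload capacity.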
Implementing fail-over support without any dedicated failover server and mirror
central control
The system of the invention is further adapted for back-up support in case of server
failure without the involvement of any special independent stand-by support server.
Traditionally (prior art), dedicated fail-over servers are used which sense the
heartbeat signals broadcast by the regular servers. Once the heartbeat is found
missing, the failover server takes up the task of the failed server. This technique is
inefficient as it not only blocks resources as dedicated failover servers, but also
cannot utilize the remaining capacity of the existing servers for back-up support.
Also, failure of the failover server itself jeopardizes the overall failover support
system.
In the proposed system the recording servers exchange information amongst
themselves so that each server knows the leftover capacity and the channel
information of every other server. In case of server failure, the remaining active
servers distribute the load amongst themselves.
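The redistribution of a failed server's channels can be sketched in Python. The specification states that surviving servers share the load according to leftover capacity but does not give the rule; this sketch assumes a greedy policy in which each orphaned channel goes to the survivor with the most leftover capacity at that moment. The data layout (`capacity` as a channel count, `channels` as a list) is hypothetical.

```python
def redistribute(servers, failed_id):
    """Reassign the channels of a failed server to the surviving servers.

    servers -- dict: server ID -> {"capacity": int, "channels": [channel names]}
    Returns (survivors, unassigned): the updated survivor table and any
    channels that no survivor had capacity to take.
    """
    orphaned = list(servers[failed_id]["channels"])
    survivors = {
        sid: {"capacity": s["capacity"], "channels": list(s["channels"])}
        for sid, s in servers.items()
        if sid != failed_id
    }

    def leftover(sid):
        return survivors[sid]["capacity"] - len(survivors[sid]["channels"])

    unassigned = []
    for channel in orphaned:
        # Assumed policy: give the channel to the survivor with most spare capacity.
        best = max(survivors, key=leftover, default=None)
        if best is None or leftover(best) <= 0:
            unassigned.append(channel)
        else:
            survivors[best]["channels"].append(channel)
    return survivors, unassigned
```

Since every server holds the same capacity and channel information, each survivor could run this computation locally and arrive at the same assignment without a central controller, provided ties are broken deterministically.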
The Implementation of the Video Analytics System
The Video Analytics System essentially implements the functionalities as hereunder:
1. Data Content Analysis: Intelligently analysing the data, on-line or off-line, to
extract the meaningful content of the data, identifying the activities of
foreground human and other inanimate objects in the scene from the sensor
generated data, establishing correlation among various objects (living or
non-living) in the scene, establishing correlation amongst multiple types of
sensory data, identifying events of interest based on the detected activities,
all either automatically or in a user-interactive way under various
demographic and natural real-life situations. Several novelties are described
in the relevant sections describing the details of the data content analysis
techniques.
2. Automatic Alert Generation: Generating Alerts, signals, video clips, other
sensory data segments, covering the events automatically as and when
detected.
The Video Analytics system comprises hardware and software, where the hardware
can be any standard computing platform operated under control of various operating
systems like Microsoft Windows, Linux, MacOS, Unix, RTOS for embedded hardware,
etc.
Dependence on hardware computing platforms and operating systems has been
avoided and no dedicated closed hardware needs to be used to implement the
system. At the same time, part or whole of the system can be embedded into other
products with some existing services, without affecting those services.

An example is provided in the form of an "Intelligent Home Security" box shown in
Figures 4A to 4J, where specially built hardware is used to provide several services
viz, Digital Photo-frame, Perimeter security, Mobile camera FOV recording & relay,
Live view of cameras, etc.
Referring to FIG. 4A, a schematic diagram of a Networked Intelligent
Villa/Home/Property Monitoring System is shown. All of the intelligent video
management server and intelligent monitoring applications that are described in
previous sections have been embedded into the Videonetics Box. The Box has an
easy to use GUI using touch-screen so that any home/villa/property owner can easily
operate it with minimum button pressing using visual display based instructions only.
The top level systems architecture for the embedded hardware and details of the
components in the hardware system is shown in FIG. 4B.
The following is a micro-architectural components summary for an example of a
multi-channel IP-camera solution. Video from IP-Cameras is directly fed to the
computer without the requirement of any encoder. There are three options: One, no
network switch is required. The Motherboard should have multiple Ethernet ports;
two, the Motherboard has only one Ethernet port assuming all the cameras are
wireless IP-Cameras. The Motherboard should have 1 x Ethernet port and 1 x Wifi
interface; and three, the Motherboard has only one Ethernet port, the cameras are
wired, but a Network switch is required as an external hardware.
On detection of events the following tasks are performed:
a siren blows;
an SMS/MMS is sent;
event clip is archived; and
the event clip is also streamed to any designated device over the Internet.
The following Interfaces are required to handle the above tasks: at least one RELAY
O/P for siren drive or DIO for Transmitter interface; and a 3G interface for SMS/MMS
or sending event clip to Cell Phone. Other usual hardware includes:
a. USB;
b. Touch Screen Interface;
c. external storage;
d. 3G dongle, if 3G is not embedded into motherboard;
e. keyboard, if touch screen is not attached; and
f. DVI port for display.

The following is a micro-architectural components summary for an example of a
multi-channel analog camera solution. Video from analog camera is received by an
encoder hardware. The encoded RAW image is fed to the computer for processing.
System Hardware should be capable of handling the following activities:
1. multi channel encoding, each at 15 - 30 fps for D1 size, but not limited to this;
higher frame rates and higher resolutions are possible as long as computing
bandwidth supports the resulting video data
a. Input to encoder: Analog video in NTSC or PAL
b. Output from encoder: YUV or RGB
There are two options:
a. The encoder could be a separate module connected to motherboard
through PCIE
b. The encoder circuitry may be embedded in the mother board
2. On detection of events following tasks are performed:
a. A siren blows
b. An SMS/MMS is sent
c. Event clip is archived
d. Event clip is also streamed to any designated device over Internet
The following hardware Interfaces are required to handle the above tasks:
a. At least one RELAY O/P for siren drive or External Transmitter interface
(DIO)
b. 3G interface for SMS/MMS or sending event clip to Cell Phone.
c. Ethernet for remote access to the system
3. Other usual hardware:
1. USB, for:
a. Touch Screen Interface
b. External Storage
c. 3G dongle, if 3G is not embedded into motherboard
d. keyboard, if touch screen is not attached
2. DVI port, for Display
Referring to FIG. 4C, a top level heterogeneous system architecture (both IP and
analog cameras) is illustrated. Referring additionally to FIGS. 4D-4J an operational
flow by a user and representative GUI using a touch panel display of the intelligent
monitoring system is detailed in a step-by-step flow.
Thus, a new and improved intelligent video surveillance system is illustrated and
described. The improved intelligent video surveillance system is highly adaptable,
can be used in a large variety of applications and can be conveniently adapted to a
variety of customer-specific requirements. Also, the intelligent video surveillance
system is automated, intelligent, and requires minimal or no human intervention.
Various changes and modifications to the embodiment herein chosen for purposes of
illustration will readily occur to those skilled in the art. To the extent that such
modifications and variations do not depart from the spirit of the invention, they are
intended to be included within the scope thereof.
The Analytics Engine
Various rule sets for inferencing the dynamics of the data (interpretation of Events)
are defined inherently in the system or they can be defined by the users. An Analytics
engine detects various activities in the video or other sensory data stream and on
detection of said activities conforming to one or more Events, sends notification
messages with relevant details to the recipients. The recipients can be the VMI, the
central VMS or Surveillance Clients or any other registered devices. To perform the
above tasks, the scene is analyzed and the type of analysis depends on the type of
events to be detected.
The data flow within the Analytics Engine for a single channel, taking a video stream as
the channel data, is schematized in Fig. 5. The functionalities of the various
internal modules of the Analytics Engine and other components are described below,
taking a Video channel as an example of a Sensory data source.
(A) Scene Analyzer (501): The Scene Analyzer is the primary module of the
Analytics engine and of the IVAS as well. Depending on the Events to be
detected, various techniques have been developed to analyze the video and sensory
data content and extract the objects of interest in the scene or the multi-sensory
acquired data. Importantly, the scene analyzer is adapted to analyze the content of
the media (e.g., video) based on an intelligent scene adaptive colour coherent object
analysis framework and method. This has been implemented so that it is adaptive to
the availability of computational bandwidth and memory, and the processing steps are
dynamically reconfigured. For example, as described further in detail hereunder, a
trade-off is made automatically by the Analytics engine to strike a balance between
the accuracy of face capture and the CPU clock cycles available for processing.
The Scene Analyzer generates meta-data against each frame supplied to it for
analysis. It also computes the complexity of the scene using a novel technique and
dynamically reconfigures the processing steps in order to achieve an optimal analysis
result depending upon the availability of the computational and other resources for
on-line and real-time detection of events and follow-up actions. It feeds the metadata
along with the scene complexity measure to the Controller, so that the Controller can
decide the optimal rate at which the frames of that particular video channel should be
sent to the Analytics engine for processing. This technique is unique and saves
computational and memory bandwidth for decoding and analysis of the video frames.
(B) Rule Engine (502): The Rule Engine keeps a history of the metadata and correlates
the data across multiple frames to decide behavioural patterns of the objects in the
scene. Based on the rules, various applications can be defined. For example, it is
possible to detect whether a person is jumping a fence, whether a crowd is
forming, or whether a vehicle is exceeding the speed limit.
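As an illustration of correlating metadata across frames, the following is a minimal sketch of a speed-limit rule. The class name `SpeedRule`, the per-frame metadata fields (object id, timestamp, position) and the units are assumptions made for this example; the specification does not define the rule format.

```python
from collections import defaultdict, deque


class SpeedRule:
    """Flags an object whose net displacement per second exceeds a limit."""

    def __init__(self, limit, window=5):
        self.limit = limit  # assumed unit: scene units per second
        # Keep only the most recent `window` observations per object.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, object_id, timestamp, x, y):
        """Record one frame's metadata; return True if the rule is violated."""
        track = self.history[object_id]
        track.append((timestamp, x, y))
        if len(track) < 2:
            return False  # a single observation carries no speed information
        (t0, x0, y0), (t1, x1, y1) = track[0], track[-1]
        if t1 <= t0:
            return False
        distance = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        return distance / (t1 - t0) > self.limit


rule = SpeedRule(limit=10.0)
observations = [(0.0, 0.0), (1.0, 5.0), (2.0, 30.0)]
flags = [rule.update("car-1", t, x, 0.0) for t, x in observations]
```

The rule only fires once enough frames have accumulated, which is the point of keeping a history rather than judging each frame alone.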
(C) Event Decider (503): The behavioural patterns, as detected by the Rule Engine, are
analyzed by this module to detect various events in parallel. The Events can be
inherently defined or configured by the user. For example, if there is
crowd formation only in a specific zone whereas other areas are not crowded, that
may be defined to be an Event. Once an Event is detected, a message is generated
describing the type of Event, the time of occurrence of the Event, the location of
occurrence of the Event, the Video clip URL, etc.
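The zone-specific crowd example can be sketched as follows. The `Event` fields mirror the message contents listed above; the function name `decide_crowd_event`, the per-zone head counts, the threshold and the clip URL are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_type: str
    time: float      # time of occurrence (seconds since epoch, assumed)
    location: str    # zone in which the Event occurred
    clip_url: str    # hypothetical clip location


def decide_crowd_event(zone_counts, watched_zone, threshold, time, clip_url):
    """Return an Event if only the watched zone is crowded, else None.

    zone_counts: dict of zone name -> number of people in that zone.
    """
    crowded = {zone for zone, count in zone_counts.items() if count >= threshold}
    if crowded == {watched_zone}:
        return Event("crowd_formation", time, watched_zone, clip_url)
    return None


event = decide_crowd_event(
    {"gate": 12, "lobby": 2}, "gate", threshold=10,
    time=1331510400.0, clip_url="clips/gate-0001.mp4",
)
no_event = decide_crowd_event(
    {"gate": 12, "lobby": 15}, "gate", threshold=10,
    time=1331510400.0, clip_url="clips/gate-0002.mp4",
)
```

Here `event` is produced because only the gate is crowded, while `no_event` is `None` because the lobby is crowded as well.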
The Event decider can also control any external device including a PTZ camera
controller which can focus a region where the event has taken place for better
viewing of the activities around that region or recording the scene in a close up view.
One such advanced framework is detailed hereunder as enhanced object tracking
where the utility of an Object tracking system is enhanced using a novel technique
using a PTZ camera along with the Object tracking system.

The Analytics Engine Controller
A Controller module (602), as shown in Figure 6, has been designed which can receive
multiple video channels, possibly in some compressed form (for example, but not
limited to, MJPEG, Motion JPEG2000, MPEG or H.264 for video, and a relevant format
for other sensory data, such as MP4 for audio), and feeds the decoded video
frames to the Analytics engine. The Controller uses an advanced technique to decide
the rate of decoding of the frames and feeds the decoded video frames of multiple
channels to the Analytics engine in an optimal way, so that the number of frames
sent per second for each video channel is individually and automatically controlled
depending on the requirement of the Analytics engine and also on the computational
bandwidth available in the system at any point of time. The technique is
described in detail in relation to video content driven resource allocation for analytical
processing.
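As a rough illustration of per-channel frame-rate control, the sketch below divides a fixed frame budget across channels in proportion to a scene complexity score. The function name `allocate_frame_rates`, the 0 to 1 complexity scale, the budget and the minimum and maximum rates are all assumptions; the actual allocation technique is not disclosed in this section.

```python
def allocate_frame_rates(complexity, budget_fps, min_fps=1.0, max_fps=30.0):
    """Split a total frame budget across channels by scene complexity.

    complexity: dict of channel id -> score in [0, 1] (hypothetical scale).
    budget_fps: total frames per second the analytics engine can process.
    """
    if not complexity:
        return {}
    # Every channel first receives the minimum rate so none is starved.
    rates = {channel: min_fps for channel in complexity}
    spare = budget_fps - min_fps * len(complexity)
    total = sum(complexity.values())
    if spare <= 0 or total <= 0:
        return rates
    for channel, score in complexity.items():
        # Share the spare budget in proportion to complexity, capped at max_fps.
        rates[channel] = min(max_fps, min_fps + spare * score / total)
    return rates


rates = allocate_frame_rates(
    {"cam1": 0.8, "cam2": 0.2, "cam3": 0.0}, budget_fps=23.0
)
```

With these inputs the busiest channel receives most of the budget, while the idle channel stays at the minimum rate and is still sampled.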
The Controller also streams the video along with all the Video Analytics data (existing
configuration for Events, Event Information, video clip URL etc), either as individual
streams for each channel, or as a joined single stream of video data for all or user
requested channels. A novel technique for joining the video channels and
transmitting the resulting combined single channel over IP network has been
deployed to adapt to varying and low bandwidth network connectivity. The technique
is described in detail in relation to video channel join-split mechanism for low
bandwidth communications.
The Controller can generate Events on its own for the cases where Events can be
generated without the help of the Video Analytics engine (e.g., Loss of Video, Camera
Tampering as triggered by the Camera itself, Motion detection as intimated by the
Camera itself, and so on).
The implementation of Video Management Interface (VMI)
The Video Management Interface (702), shown in Figure 7, interfaces between
an individual Autonomous System and the rest of the world. It also acts as the
coordinator among various other components within a single Autonomous System,
viz, Video Recording System (703), Intelligent Video Analytical Server (704),
Surveillance Clients (701), Remote Event Receiver (705), etc. It essentially
implements the functionalities including:

1. Filtering and need based transmission of data: Distribution of whole or part
of the collected sensory data, including the video and other sensory data
segments generated as a result of detection of an Event by the Analytical
engine, to the right recipient at the right point of time, automatically or on
user interaction.
2. Directed distribution of Alerts: Distributing Event information in various
digital forms (SMS, MMS, emails, Audio alerts, animation video, Text,
illustrations, etc. but not limited to) with or without received data segments
(viz, video clips) to the right recipient at the right point of time
automatically or on user interaction.
3. Providing a common gateway for heterogeneous entities: Providing a unified
gateway for users to access the rest of the system for configuration,
management and monitoring of system components.
The Interface operating principle involved in the system is discussed hereunder:
1. Auto-register itself with the IVMS system so that other components like
Surveillance Clients (including Web Clients and Mobile Clients) and Remote Event
Receivers can find and connect to it even when its IP-address changes;
2. Accept requests from Surveillance clients to add and delete data sources like
cameras to the VRSes and IVASes and relay the same to the corresponding
VRSes and IVASes.
3. Receive configuration data from the Surveillance clients and feed them to the
intended components (viz, VRS, IVAS, DBMS, Camera etc) of the system. For
VRS, the configuration data includes Recording parameters, Database paths,
Retention period of recording, etc. For IVAS, it is the Event and Application
settings, Event clip prologue-, after event- and lifetime-duration, etc.
4. Receive Event information from IVAS on-line and transmit it to various
recipients including Remote Event Receivers. Fetch outstanding Event clips, if
any, from IVAS. Outstanding clips may remain inside IVAS in case
there was a temporary network connectivity failure to IVAS.
5. Periodically receive heartbeat signals along with status information from all
the active devices, and relay that to other devices in the same or in other
networks.
6. Serve the Web clients and Mobile embedded clients by streaming Live video,
Recorded Video or Event Alerts at the right time.
7. Join multiple channel video into a single combined stream to adapt to variable
and low bandwidth network. A novel technique for joining the video channels
and transmitting the resulting combined single channel over IP network has
been deployed to adapt to varying and low bandwidth network connectivity.
The technique is described in relation to video channel join -split mechanism
for low bandwidth communication.
8. Enable the user to search for the recorded video and the Event clips based on
various criteria, including Date, Time, Event types and Video Channels.
9. Enable the user to perform a user-interactive Smart search to filter out
the desired segment of video from the video database.
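The criteria-based search in item 8 can be pictured with a small filter over clip records. This is only a sketch: the record layout (`time`, `event_type`, `channel`) and the function name `search_clips` are assumptions, and a deployed system would query its database rather than a Python list.

```python
from datetime import datetime


def search_clips(clips, start=None, end=None, event_types=None, channels=None):
    """Filter clip records; a criterion left as None is not applied."""
    results = []
    for clip in clips:
        if start is not None and clip["time"] < start:
            continue
        if end is not None and clip["time"] > end:
            continue
        if event_types is not None and clip["event_type"] not in event_types:
            continue
        if channels is not None and clip["channel"] not in channels:
            continue
        results.append(clip)
    return results


# Made-up clip records for illustration.
clips = [
    {"time": datetime(2012, 3, 12, 9, 0), "event_type": "intrusion", "channel": 1},
    {"time": datetime(2012, 3, 12, 18, 30), "event_type": "crowd", "channel": 2},
    {"time": datetime(2012, 3, 13, 7, 15), "event_type": "intrusion", "channel": 2},
]
found = search_clips(
    clips, start=datetime(2012, 3, 12, 12, 0), event_types={"intrusion"}
)
```

Only the third record satisfies both the time and the event-type criteria, so `found` holds that one clip.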
In essence, once the Interface (702) is installed, the VRS (703), IVAS (704) and
other components of the system can be configured, and the user can connect to the
System. However, at run time all the VRS and IVAS can operate on their own, and do
not require any service from the VMI unless some System
configuration data has been changed.
Independence of the servers from any Central controller for their routine
operation gives unprecedented scalability with respect to an increase in the number of
servers, because adding a server does not add any extra load to any component other
than the server itself. This is a unique advancement where the Video Management Server
Interface acts only as a unified gateway to the services being executed in other
hardware devices, only for configuration and status updating tasks. This opens up the
possibility of keeping the User interface software unchanged while integrating new
type of devices. The devices themselves can supply their configuration pages when
the VMI connects to them for configuration. Similarly, the messages generated by the
servers can also be shown in the VMI panel seamlessly.
The Video Management Client(701), Web client(707), Mobile device embedded client
(708)

All the above client modules in essence implement the functionalities including:
Providing Live view or recorded view of the data stream: Enabling user to view
camera captured video in different matrix layouts, view other sensory data in a
presentable form, recorded video and other data search and replay, Event clips
search and replay, providing easy navigation across camera views with help of
sitemaps, PTZ control, and configuring the system as per intended use.
The VMS system can be accessed through the standalone surveillance client or any
standard Internet browser can be used to access the system. Handheld devices like
Android enabled cell phone or tablet PCs can also be used as a Client to the system
for the purposes (wholly or partially) as mentioned above.
The Remote Event receiver (705)
The RER (705) shown in Figure 7 is a software module which can be integrated with any
other module of the IVMS. The Remote Event Receiver is meant to receive and
display messages and ALERTs from other components, which are multicast or
broadcast. Those messages include Event ALERTs, ERROR status from VRS or
IVAS, operator generated messages, etc. The messages can be in Video as well as
Audio form, or any other form as transmitted by the Video management system
components, and the resulting response from the RER depends on the capability
and configuration of the hardware where the RER is installed. When integrated with
the Surveillance clients (IVMC), the IVMC can switch to RER mode and thus will
respond to ALERTs and messages only.
The Central VMS system
Central VMS System (204 in Figure 2) is adapted to serve as a gateway to any
Autonomous System (210-01...210-0n) components. It also stores the configuration
data for all ASes in its Centralized database. It is possible to integrate otherwise
running independent VMS systems into a single unified system by including Central
VMS in a Server and configure that accordingly.
The Sitemap Server
A Sitemap server is included within each Autonomous System (210-01...210-0n) and
also within the Centralized VMS (204 in Figure 2). The Sitemap server listens to
requests from any authorized components of the System and responds with positional
data corresponding to any component (Camera, server, user etc.) which is linked to
the Site map. The Site map is multilayered and components can be linked to any
spatial position of the map in any layer.
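A multilayered site map of this kind can be modelled as a lookup from component to layer and position. The sketch below is an assumption-laden illustration: the class name `SiteMap`, the layer names and the coordinates are invented, and the authorization check and network listener are left out.

```python
class SiteMap:
    """Multilayered map in which each component is linked to one position."""

    def __init__(self):
        self._positions = {}  # component id -> (layer, x, y)

    def link(self, component_id, layer, x, y):
        """Link a component (camera, server, user, ...) to a map position."""
        self._positions[component_id] = (layer, x, y)

    def locate(self, component_id):
        """Return (layer, x, y) for a linked component, or None if unlinked."""
        return self._positions.get(component_id)

    def components_in_layer(self, layer):
        """Return the sorted ids of all components linked to one layer."""
        return sorted(
            cid for cid, position in self._positions.items() if position[0] == layer
        )


site = SiteMap()
site.link("camera-7", "ground-floor", 12.5, 40.0)
site.link("server-1", "basement", 3.0, 8.0)
site.link("camera-9", "ground-floor", 30.0, 5.0)
```

A client asking where `camera-7` is would receive its layer and coordinates, and a map view of one layer can list the components to draw on it.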
The above describes the framework, architecture and system level components of the
Intelligent system of the invention. The technology involved in the development of
the system can be used to integrate various other types of components not shown or
discussed above. For example, an Access Control System or a Fire Detection
System can be integrated similarly to VRS or IVAS and configured using IVMC and VMI;
their responses or messages can be received, displayed and responded
to by IVMC or RER, stored as done for Event clips or Video segments, and searched on
various criteria.
The system of the invention detailed above is further versatile enough to interface
with and scale to many other management systems, such as the intelligent
automated traffic enforcement system also discussed in later sections.
Reference is now invited to accompanying Figure 8, involving references of
components/features/stages 2201 to 2209, and Figure 9, which illustrate in greater
detail the features of the advancement involving enhanced object tracking.
Object tracking systems are used to detect the presence of any moving object in a
scene and track the object to distinguish it from other similar objects in the scene
and also to record the trajectory of the object. In some of such systems video data
of the scene as captured by a fixed camera is analyzed to detect and track moving
objects. However, this requires the background to be stable and the camera should
cover the whole region where the trajectory is to be formed. This has the side effect
that the size of the object in the camera view becomes small, particularly when the
object is far.
To overcome this limitation, PTZ camera based tracking systems are used, where a
PTZ camera is used to automatically track the object and zoom in on the object so that
the detailed features of the object are visible in the video frames. However, a
traditional PTZ based tracking system suffers from some major drawbacks and is not
deployable on real life video, particularly when the video is affected by noise such
as shadow, glare, electronic noise etc. One of the reasons is the inability of such
systems to form a good reference background frame. Also, such a system is not adaptive
to demographic and environmental variations.

Additionally, when a PTZ camera starts tracking an object, it loses visibility of other
parts of the scene. Therefore, some important scene event may be missed while the
PTZ camera tracks one of the objects. This may encourage miscreants to fool the
system. The accuracy of detection and tracking of objects is also very low, as there is
no fixed background while the tracking is in progress and the foreground objects have
to be extracted based on motion detection, some modified version of that method, or
some modified version of an object extraction technique for still images. In case
of a tracking error, which is likely to occur when the speed of the object in the
scene is high or random, the system cannot recover from this error state in a short
time, as it loses visibility of the object.
To take the best of the above two techniques, a novel method is designed where an
Object tracking system is used in conjunction with one or more PTZ cameras. When
an object is detected in the Fixed camera view, the object tracking system tracks the
object and passes the positional information of the object, along with velocity
prediction data, to the PTZ camera controller in a periodic manner. If more than a
single object is detected, one object is handled at a time based on some
criteria (viz, the priority of the zone where the object appeared, the duration of the
object in the scene etc.). The PTZ camera controller receives the positional information
of the object periodically and estimates the corresponding position of the object in the
PTZ camera view using a novel Scene Registration and coordinate transformation
technique. The P, T and Z values are set by the Controller such that the object
remains nearly at the center of the PTZ camera view and is sufficiently large.
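The hand-off from the fixed-camera tracker to the PTZ controller can be sketched as two steps: choose one object, then extrapolate its position with the velocity prediction. The names `TrackedObject`, `select_target` and `predict_position`, the tie-breaking order (zone priority first, then duration) and the latency value are assumptions for illustration; the mapping into PTZ coordinates and the P, T, Z computation are not shown.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    object_id: str
    x: float            # object centre in the fixed camera view (pixels)
    y: float
    vx: float           # predicted velocity (pixels per second)
    vy: float
    zone_priority: int  # higher value means a more important zone (assumed)
    duration: float     # seconds the object has been in the scene


def select_target(objects):
    """Pick one object: highest zone priority first, then longest duration."""
    return max(objects, key=lambda o: (o.zone_priority, o.duration))


def predict_position(obj, latency):
    """Extrapolate the object position `latency` seconds ahead."""
    return obj.x + obj.vx * latency, obj.y + obj.vy * latency


objects = [
    TrackedObject("a", 100.0, 50.0, 10.0, 0.0, zone_priority=1, duration=8.0),
    TrackedObject("b", 300.0, 80.0, -20.0, 4.0, zone_priority=2, duration=3.0),
]
target = select_target(objects)
aim = predict_position(target, latency=0.5)
```

Aiming at the predicted position rather than the last observed one compensates for the delay between the periodic updates and the camera's movement.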
Hence, the proposed system enhances the functionalities and utility of a traditional
Object tracking system and at the same time eliminates the drawbacks of a
standalone PTZ camera based tracking mechanism. This concept and implementation
technique is novel and unique. The concept can be extended to develop a system to
handle multiple objects in parallel with more than one PTZ camera. Also, triggers
from multiple fixed cameras can be received to develop a system with multiple fixed
cameras and multiple PTZ cameras together to cover a wider range in the scene, or
to enhance multiple Object tracking systems over a single framework.
Fig. 8 thus shows an embodiment of the enhanced object tracking system.
Technique for Coordinate Transformation from Fixed Camera view to PTZ camera
view

To map the bounding rectangle of an object visible in the Static camera view to the
corresponding Rectangle in the PTZ camera view a weighted interpolation technique
is used. The technique requires as input a set of points (A, B ...) spread uniformly
over the static camera view and their corresponding positions in the PTZ camera
view. This can be done by the user while configuring the system.
Fig. 9: Illustrates the Coordinate Transformation involved in the present invention
enhanced object tracking.
Let A and B be any two such points in the static camera view as marked by the user,
and let A' and B' be the corresponding mapped points in the PTZ camera view as also
marked by the user. Now, any arbitrary point (C) in the static camera view is
mapped to the corresponding point (C') in the PTZ camera view dynamically, using
the following method:
Let ax, bx and cx be the x-coordinates of points A, B and C respectively in the static
Camera view. Similarly, a'x, b'x and c'x are those of the corresponding points in the
PTZ view. Let,

This gives an estimate of the x-coordinate of the point C' as interpolated with the
help of points A and B, with a confidence factor WAB, where WAB =
[Minimum of (cx - bx, cx - ax)].
Similarly, an estimate of the x-coordinate of the same point C' is calculated for all
pairs of points (A, B) in the Static camera view.

Similarly, the y-coordinate c'y is calculated for the point C'.
When a bounding rectangle is to be mapped from the static view to the PTZ view, this
technique is applied for all the four corner points of the rectangle.
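The following is a minimal sketch of a pairwise weighted interpolation of this kind, not the specification's exact formula, which is not reproduced in the text. It assumes that each pair (A, B) yields a linear estimate along one axis and that the estimates are combined as a weighted average. The weight used here (a wide A to B baseline and a point C close to A or B give more confidence) is an assumption, as are the function names and the sample reference points.

```python
from itertools import combinations


def interpolate_axis(c, pairs):
    """Map coordinate c from the static view to the PTZ view along one axis.

    pairs: list of (static, ptz) coordinates of the user-marked points.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (a, a_ptz), (b, b_ptz) in combinations(pairs, 2):
        if a == b:
            continue  # this pair gives no information along this axis
        # Linear interpolation (or extrapolation) through A -> A' and B -> B'.
        estimate = a_ptz + (c - a) * (b_ptz - a_ptz) / (b - a)
        # Assumed confidence, not the formula from the specification.
        weight = abs(a - b) / (1.0 + min(abs(c - a), abs(c - b)))
        weighted_sum += weight * estimate
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("need two reference points with distinct coordinates")
    return weighted_sum / weight_total


def map_point(point, refs):
    """refs: list of ((x, y), (x_ptz, y_ptz)) reference point pairs."""
    x = interpolate_axis(point[0], [(s[0], p[0]) for s, p in refs])
    y = interpolate_axis(point[1], [(s[1], p[1]) for s, p in refs])
    return x, y


def map_rectangle(corners, refs):
    """Apply the mapping to all four corner points of a bounding rectangle."""
    return [map_point(corner, refs) for corner in corners]


# Made-up calibration in which x' = 2x + 10 and y' = 3y - 5.
refs = [((0, 0), (10, -5)), ((100, 50), (210, 145)), ((200, 100), (410, 295))]
mapped = map_point((50, 20), refs)
box = map_rectangle([(0, 0), (50, 0), (50, 20), (0, 20)], refs)
```

Because the sample reference points follow an exact linear relation, every pair produces the same estimate and the weighting has no effect; with real, imperfectly marked points the weights decide how much each pair contributes.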
It is thus possible by way of the method according to the present invention to achieve
the following:

a) to provide for a system for enhanced object tracking which would be efficient and
enable object tracking in conjunction with one or more PTZ cameras;
b) a system for enhanced object tracking wherein when an object is detected in the
Fixed camera view, the object tracking system would be adapted to track the object
and pass on the positional information of the object along with a velocity prediction
data to the PTZ camera controller;
c) a system for enhanced object tracking wherein if more than a single object is
detected, one object is taken at a time for handling based on select criteria (viz, the
zone of appearance of the object, the duration of the object in the scene etc.);

d) a system for enhanced object tracking involving a PTZ camera controller adapted
to receive the positional information of the object for each frame and estimate the
corresponding position of the object in the PTZ camera view involving an advanced
Scene Registration and coordinate transformation technique;
e) a system for enhanced object tracking which would enhance the functionalities
and utility of a traditional Object tracking system and at the same time eliminates
the drawbacks of a standalone PTZ camera based tracking mechanism; and
f) a system for enhanced object tracking which can be extended to develop systems
to handle multiple objects in parallel with the more than one PTZ cameras and
further adaptable to trigger from multiple fixed cameras to develop a system with
multiple fixed cameras and multiple PTZ cameras together to cover a wider range in
the scene, or to enhance multiple Object tracking systems over a single framework.

We Claim:
1. A system for enhanced object tracking comprising:
object tracking means in conjunction with one or more PTZ cameras wherein
when an object is first detected in a fixed camera view of the said object
tracking means the same is adapted to track the object and also generate
and transmit the positional values along with a velocity prediction data to the
PTZ camera controller;
said PTZ camera controller adapted to receive the positional information of the
object in the PTZ camera view involving scene registration and coordinate
transformation technique.
2. A system for enhanced object tracking as claimed in claim 1 wherein more
than one object is tracked involving multiple PTZ cameras such as to cover a
wider range in the scene and to enhance multiple object tracking over a single
framework.
3. A system for enhanced object tracking as claimed in any preceding claim
wherein said means of coordinate transformation from fixed camera view to
PTZ camera view involves coordinate transformation technique comprising
weighted interpolation method.
4. A system for enhanced object tracking as claimed in any preceding claim
which is adapted to carry out said coordinate transformation following:
a. identifying a set of points in the static camera as A ,B, etc and also
corresponding points A',B', etc respectively in the PTZ camera by the
user;
b. mapping any arbitrary point C in the static camera to the
corresponding point C' in the PTZ camera view dynamically wherein:
ax, bx, cx are x-coordinates of points A, B and C respectively in the static
Camera view and similarly a'x, b'x and c'x are for the corresponding points in
PTZ view as interpolated with the help of points A and B, with a confidence
factor WAB , where WAB = (Ax - Bx) [Minimum of (Cx - Bx , Cx - Ax)] is
determined to be

and wherein similarly, an estimate of x-coordinate of the same point C is
generated for all pair of points (A, B) in the Static camera view based on:

and similarly generating also the y-coordinate Cy for the point C.
5. A system for enhanced object tracking as claimed in any preceding claim
wherein for a bounding rectangle to be mapped from the static view to the
PTZ view, the system is adapted to apply said coordinate transformation
technique for all the four corner points of the rectangle.
6. A system for enhanced object tracking as claimed in any preceding claim
wherein the bounding rectangle corresponding to an object in the static
camera view is associated with a velocity prediction information, the system is
adapted to apply that velocity prediction information to map the rectangle in
the PTZ camera view.


Documents

Application Documents

#	Name	Date
1	258-Kol-2012-(12-03-2012)SPECIFICATION.pdf	2012-03-12
2	258-Kol-2012-(12-03-2012)FORM-3.pdf	2012-03-12
3	258-Kol-2012-(12-03-2012)FORM-2.pdf	2012-03-12
4	258-Kol-2012-(12-03-2012)FORM-1.pdf	2012-03-12
5	258-Kol-2012-(12-03-2012)DRAWINGS.pdf	2012-03-12
6	258-Kol-2012-(12-03-2012)DESCRIPTION (COMPLETE).pdf	2012-03-12
7	258-Kol-2012-(12-03-2012)CORRESPONDENCE.pdf	2012-03-12
8	258-Kol-2012-(12-03-2012)CLAIMS.pdf	2012-03-12
9	258-Kol-2012-(12-03-2012)ABSTRACT.pdf	2012-03-12
10	258-KOL-2012-(30-03-2012)-FORM-1.pdf	2012-03-30
11	258-KOL-2012-(30-03-2012)-CORRESPONDENCE.pdf	2012-03-30
12	258-KOL-2012-(09-04-2012)-PA.pdf	2012-04-09
13	258-KOL-2012-(09-04-2012)-CORRESPONDENCE.pdf	2012-04-09
14	258-KOL-2012-FORM-18.pdf	2012-09-04
15	258-KOL-2012-FER.pdf	2018-09-11
16	258-KOL-2012-OTHERS [01-03-2019(online)].pdf	2019-03-01
17	258-KOL-2012-FER_SER_REPLY [01-03-2019(online)].pdf	2019-03-01
18	258-KOL-2012-COMPLETE SPECIFICATION [01-03-2019(online)].pdf	2019-03-01
19	258-KOL-2012-CLAIMS [01-03-2019(online)].pdf	2019-03-01
20	258-KOL-2012-ABSTRACT [01-03-2019(online)].pdf	2019-03-01
21	258-KOL-2012-HearingNoticeLetter-(DateOfHearing-02-03-2020).pdf	2020-02-20
22	258-KOL-2012-Correspondence to notify the Controller [27-02-2020(online)].pdf	2020-02-27
23	258-KOL-2012-FORM-26 [28-02-2020(online)].pdf	2020-02-28
24	258-KOL-2012-Written submissions and relevant documents [16-03-2020(online)].pdf	2020-03-16
25	258-KOL-2012-PETITION UNDER RULE 137 [16-03-2020(online)].pdf	2020-03-16
26	258-KOL-2012-PatentCertificate17-03-2020.pdf	2020-03-17
27	258-KOL-2012-IntimationOfGrant17-03-2020.pdf	2020-03-17
28	258-KOL-2012-RELEVANT DOCUMENTS [25-09-2021(online)].pdf	2021-09-25
29	258-KOL-2012-RELEVANT DOCUMENTS [30-09-2022(online)].pdf	2022-09-30
30	258-KOL-2012-RELEVANT DOCUMENTS [24-07-2023(online)].pdf	2023-07-24

Search Strategy

1	search(74)_11-09-2018.pdf