Abstract: The present invention relates to an Unmanned Aerial System to detect and track multiple moving targets on image sequences recorded by Unmanned Aerial Vehicles (UAVs). Our approach focuses on challenging urban scenarios, where several objects are simultaneously moving in various directions, and we must expect frequent occlusions caused by other moving vehicles or static scene objects such as buildings and bridges. In addition, since the UAVs are flying at relatively low altitude, the 3D-ness of the scene strongly affects the camera motion compensation process, and the independent object motions may often be confused with artefacts of frame registration. Our method enables real-time operation, processing 320x240 frames at around 15 fps and 640x480 frames at 5 fps. The automatic detection of these moving objects can help the operator by giving a caution to them. The tracking of the moving objects can also give useful information, e.g. that a vehicle is moving toward the defended camp. Change detection in video sequences can also reduce the size of the data to be transmitted to the control station: to avoid redundancy, there is no need to transmit the pixels belonging to an unchanged area.
Detecting objects of interest is a key task in reconnaissance and surveillance applications, as well as on the combat field. The moving objects are in most cases relevant, since they are frequently vehicles or persons. The automatic detection of these moving objects can help the operator by giving a caution to them. The tracking of the moving objects can also give useful information, e.g. that a vehicle is moving toward the defended camp. Change detection in video sequences can also reduce the size of the data to be transmitted to the control station: to avoid redundancy, there is no need to transmit the pixels belonging to an unchanged area.
Summary: The present invention relates to an Unmanned Aerial System to detect and track multiple moving targets on image sequences recorded by Unmanned Aerial Vehicles (UAVs). Our approach focuses on challenging urban scenarios, where several objects are simultaneously moving in various directions, and we must expect frequent occlusions caused by other moving vehicles or static scene objects such as buildings and bridges. In addition, since the UAVs are flying at relatively low altitude, the 3D-ness of the scene strongly affects the camera motion compensation process, and the independent object motions may often be confused with artefacts of frame registration. Our method enables real-time operation, processing 320x240 frames at around 15 fps and 640x480 frames at 5 fps. The automatic detection of these moving objects can help the operator by giving a caution to them. The tracking of the moving objects can also give useful information, e.g. that a vehicle is moving toward the defended camp. Change detection in video sequences can also reduce the size of the data to be transmitted to the control station. To avoid redundancy, there is no need to transmit the pixels belonging to an unchanged area.
Brief description of drawings:
The detailed description is described with reference to the accompanying figures. In the figures, the
left most digit in the reference number identifies the figure in which the reference number first
appears. The same numbers are used throughout the drawings to reference like features and
components.
Figure 1: Process overview
Figure 2: Warped frames
Figure 3: Background subtraction
Figure 4: Foreground detection; moving objects are marked by red colour on the mosaic image
Figure 5: The steps of object assignment with Kalman-filtering
Figure 6: Object assignment
Detailed description of drawings:
Exemplary embodiments will now be described with reference to the accompanying drawings. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this invention will be thorough and complete, and will fully convey its scope to those skilled in the art. The terminology used in the detailed description of the particular exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting. In the drawings, like numbers refer to like elements.
Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments. The specification may refer to "an", "one" or "some" embodiment(s) in several locations. This does not necessarily imply that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "includes", "comprises", "including", and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Furthermore, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations and arrangements of one or more of the associated listed items.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The terms used in this specification generally have their ordinary meanings in the art, within the context of the disclosure, and in the specific context where each term is used. Certain terms that are used to describe the disclosure are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner regarding the description of the disclosure. For convenience, certain terms may be highlighted, for example using italics and/or quotation marks. The use of highlighting has no influence on the scope and meaning of a term; the scope and meaning of a term is the same, in the same context, whether or not it is highlighted. It will be appreciated that the same thing can be said in more than one way.
The figures depict a simplified structure only showing some elements and functional entities, all being logical units whose implementation may differ from what is shown. The connections shown are logical connections; the actual physical connections may be different.
Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, and no special significance is to be placed upon whether or not a term is elaborated or discussed herein. Synonyms for certain terms are provided. A recital of one or more synonyms does not exclude the use of other synonyms. The use of examples anywhere in this specification, including examples of any terms discussed herein, is illustrative only and is not intended to further limit the scope and meaning of the disclosure or of any exemplified term. Likewise, the disclosure is not limited to the various embodiments given in this specification.
Without intent to further limit the scope of the disclosure, examples of instruments, apparatus, methods and their related results according to the embodiments of the present disclosure are given below. Note that titles or subtitles may be used in the examples for the convenience of the reader, which in no way should limit the scope of the disclosure. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In the case of conflict, the present document, including definitions, will control.
According to the preferred embodiment of the present invention, moving objects or targets are identified and tracked by the following approach, as explained in Figure 1, which involves:
• (i) video stabilization
• (ii) foreground extraction
• (iii) moving object detection
• (iv) tracking
The first step is the compensation of the camera's ego-motion. This can be achieved by warping the frames to a common coordinate system, where the images can be considered still. This step can also provide better visual information to the operator by avoiding the shaking of the camera. The images in the common coordinate system can be handled by algorithms similar to those developed for fixed cameras. However, we must consider that ego-motion compensation is not totally accurate, thus efficient filtering of the registration noise is needed. The tracking also needs the world coordinate system, where the position of an object in the image refers to its real position in the world. The transformation to the world coordinate system needs additional information, e.g. global position and camera parameters. In the image warped into the common coordinate system, the motion detection is done by background subtraction. Then we obtain a foreground mask, which shows the moving pixels. The moving objects are blobs on the mask, which can be detected and tracked. Foreground detection may contain errors, e.g. some blobs can be split, or only parts of a given moving object are covered by its blob. Therefore, some a priori knowledge is needed about the objects of interest. In aerial videos the size of the objects is a good feature, because it can be easily estimated for several targets such as cars or pedestrians, using the camera parameters (camera altitude, angles, focal length). In the final step, the detection results from the separate frames are assigned to each other by applying Kalman filtering to the consecutive positions and considering the object histograms. The assigned object positions on the frames yield the track.
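To make the data flow of Figure 1 concrete, the following minimal Python sketch wires the four stages together. The stage functions, their names and their signatures are illustrative placeholders and not part of the specification; each stage is sketched in its own section below.

```python
import cv2
import numpy as np

# Placeholder stubs for the four stages; each is sketched in its own section.
def stabilize_frame(frame, ref):
    return frame, (frame if ref is None else ref)
def extract_foreground(warped):
    return np.zeros(warped.shape[:2], np.uint8)
def detect_objects(mask):
    return []
def update_tracks(tracks, detections):
    return tracks

def process_video(path):
    cap = cv2.VideoCapture(path)
    ref, tracks = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        warped, ref = stabilize_frame(frame, ref)             # (i) stabilization
        mask = extract_foreground(warped)                     # (ii) foreground
        tracks = update_tracks(tracks, detect_objects(mask))  # (iii) + (iv)
    cap.release()
    return tracks
```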
VIDEO STABILIZATION
The ego-motion compensation is achieved by calculating an optimized homogeneous linear transform between the frames, and warping the images to a common coordinate system. The applied image registration and alignment techniques are as follows.
The perspective transformation between two images taken of a plane surface can be described by the homography matrix $H$. One point $p_0$ is transformed to $p_1$ by the following equation:

$$p_1 = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} = \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \qquad (1)$$

The pixel coordinates are:

$$x_{pix} = \frac{x_1}{z_1} \qquad (2)$$

$$y_{pix} = \frac{y_1}{z_1} \qquad (3)$$
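As a quick worked instance of equations (1)-(3), the sketch below applies a homography to one point and normalizes by the third homogeneous coordinate; the matrix entries are arbitrary illustrative numbers, not values from the source.

```python
import numpy as np

# Illustrative homography; the entries are made-up numbers.
H = np.array([[1.02,  0.01,  5.0],
              [-0.01, 1.03, -3.0],
              [1e-5,  2e-5,  1.0]])

p0 = np.array([100.0, 200.0, 1.0])            # homogeneous point (x0, y0, 1)
p1 = H @ p0                                   # equation (1)
x_pix, y_pix = p1[0] / p1[2], p1[1] / p1[2]   # equations (2) and (3)
print(x_pix, y_pix)
```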
This $H$ matrix can be calculated for every image pair taken over the same scene, but it will give an accurate transformation only if the scene is a planar surface. This constraint cannot be met in general, but for aerial images it is a good assumption, since the surface of the earth seen from low altitudes is approximately a plane. This assumption is not met perfectly in the presence of relatively high buildings and considerable relief. In these situations the points emerging from the ground plane surface cause parallax errors in the registration procedure. In special cases, special homography matrices can be used: when there is only translation or rotation between two images, or an affine transformation. On the considered videos the camera's ego-motion is arbitrary, thus we describe the transformation by the most general homography matrix.
To warp one image to the other we need to calculate the homography matrix $H$. We mention here two approaches to calculate this matrix:
• Direct (pixel) based
• Feature based
Direct (pixel) based registration
These approaches define an error metric which shows how much the corresponding pixel values agree after the warping transform. Then the transformation which warps the images with minimum error is searched for, which requires a computationally suitable searching technique.
The advantage of this approach is that it can handle blurred images, where there are no feature points and thus other, feature based, approaches fail. On the other hand, this method is mostly suitable for specific transformations, i.e. translation and rotation, but it can hardly handle general homography transformations.
Feature based registration
The feature based techniques begin with the extraction of feature points on the images. These points can be found and aligned on the two images. This yields a point set $S$ on one image which is aligned to the point set $S'$ on the other image. By fitting a transformation to these points the transformation matrix can be estimated:

$$S' = HS \qquad (4)$$
In the case of aerial images the transformation cannot be restricted to translation or rotation, thus the more general affine or perspective transformation has to be used. Therefore, we use the feature based method to find the homography matrix of the perspective transformation.
Feature points
We use a corner detector, which is more suitable in man-made environments where corners are abundant. It is also computationally less expensive than other feature point detectors such as SIFT or SURF. Note that if there are no feature points, as in large homogeneous areas, the registration fails.
Point alignment
Corresponding points are searched on two consecutive frames by the pyramidal optical flow algorithm. This step yields the positions of the feature points on the next image, thus the transformation between the frames can be calculated. The applied optical flow algorithm assumes small displacement and nearly constant pixel intensity across the consecutive frames. This constraint is fulfilled on the considered videos. The transformation is fitted by RANSAC to the extracted point correspondences to reduce the effect of outliers.
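A minimal sketch of this registration step, assuming OpenCV is available; the detector and flow parameters are illustrative values, not ones given in the source.

```python
import cv2

def estimate_homography(prev_gray, next_gray):
    # Corner features: cheaper than SIFT/SURF, plentiful in man-made scenes.
    pts0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=400,
                                   qualityLevel=0.01, minDistance=8)
    if pts0 is None:
        return None  # large homogeneous area: registration fails
    # Pyramidal Lucas-Kanade optical flow aligns the points on the next frame.
    pts1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts0, None)
    ok = status.ravel() == 1
    if ok.sum() < 4:
        return None  # a homography needs at least four correspondences
    # RANSAC suppresses outliers, e.g. points sitting on moving objects.
    H, _mask = cv2.findHomography(pts0[ok], pts1[ok], cv2.RANSAC, 3.0)
    return H
```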
If the transformation between two frames is available, the frames can be warped into a common coordinate system. The common coordinate system is a periodically chosen reference frame. There is a homography matrix $H_{n,n-1}$ between two consecutive frames $n$ and $n-1$, and a homography $H_{n,0}$ between the current frame and the reference frame:

$$H_{n,0} = H_{n,n-1} H_{n-1,n-2} \cdots H_{2,1} H_{1,0} \qquad (5)$$
The current image is warped into the coordinate system of the reference image by the homography matrix $H$; $I_d$ is the pixel value in the destination image, $I_s$ is the pixel value in the source image:

$$I_d(x, y) = I_s\left(\frac{H_{11}x + H_{12}y + H_{13}}{H_{31}x + H_{32}y + H_{33}},\; \frac{H_{21}x + H_{22}y + H_{23}}{H_{31}x + H_{32}y + H_{33}}\right) \qquad (6)$$
The warped image has artefacts because of the imperfect estimation of the transformation and discretization errors. The homography transformation results in continuous coordinate values which have to be discretized into pixel positions; this may be relevant in strong perspective cases, when a given source pixel is warped to several pixels of the output image. This reduces the effective resolution of the image.
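Again assuming OpenCV, accumulating the chain of equation (5) and warping per equation (6) might look like the sketch below; the multiplication order should be checked against the point convention in use, and cv2.warpPerspective resamples bilinearly, which hides the discretization just discussed.

```python
import cv2

def warp_to_reference(frame, H_n_nm1, H_nm1_0):
    """Accumulate eq. (5) and warp the frame into the reference system."""
    H_n_0 = H_n_nm1 @ H_nm1_0  # order as written in eq. (5); verify convention
    h, w = frame.shape[:2]
    return cv2.warpPerspective(frame, H_n_0, (w, h)), H_n_0
```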
FOREGROUND EXTRACTION
The image registration yields an image that looks like a window in a global image (Figure 2). If the registration were optimal, only the pixels belonging to a moving object would change; this cannot be fully achieved due to image registration and parallax errors. Working on the considered videos, these errors are typically located along the edges and their spatial extent is narrow (a few pixels).
Background model
The background image is synthesized and updated in the common coordinate system, calculating the pixel-by-pixel running average and variance of the consecutive warped video frames.
We calculate the mean value $\bar{x}_n$ and the variance $\sigma_n^2$ for every pixel on-line:

$$\bar{x}_n = (1 - \alpha)\bar{x}_{n-1} + \alpha x_n \qquad (7)$$

where $\alpha$ is a constant that gives the refresh rate, and

$$\sigma_n^2 = (1 - \alpha)\sigma_{n-1}^2 + \alpha (x_n - \bar{x}_n)(x_n - \bar{x}_{n-1}) \qquad (8)$$
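A minimal NumPy sketch of the per-pixel updates of equations (7) and (8); the refresh rate value is an illustrative assumption.

```python
import numpy as np

ALPHA = 0.05  # refresh rate alpha; illustrative value, not from the source

def update_background(mean, var, frame):
    """Per-pixel running mean (7) and variance (8) over the warped frames."""
    f = frame.astype(np.float32)
    prev_mean = mean
    mean = (1.0 - ALPHA) * mean + ALPHA * f                           # eq. (7)
    var = (1.0 - ALPHA) * var + ALPHA * (f - mean) * (f - prev_mean)  # eq. (8)
    return mean, var
```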
Foreground detection

The pixels of the actual frame are classified either as foreground or background based on the normalized Euclidean distance from the background pixel values in the CIE L*u*v* colour space. This is the Mahalanobis distance for a diagonal covariance matrix:

$$d(p_n) = \sqrt{\sum_i \frac{(p_{n,i} - \bar{p}_{n-1,i})^2}{\sigma_{n-1,i}^2}} \qquad (9)$$
This yields a distance image, which is noisy, as one can see in Figure 3. This noisy image is filtered by a special Difference of Gaussians filter, applying a Gaussian blur and a threshold. The blurring spreads the narrow pixel errors, so the corresponding values in the difference image drop below the threshold level. The moving objects correspond to blobs on the foreground mask, though the mask of an object can be split and incomplete. Figure 4 shows the foreground mask marked by red colour on the image.
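A sketch of this classification step, assuming the frame has already been converted to CIE L*u*v* (e.g. with cv2.cvtColor(frame, cv2.COLOR_BGR2Luv)) and that bg_mean and bg_var come from the background model above; the kernel size and threshold are illustrative values.

```python
import cv2
import numpy as np

def foreground_mask(frame_luv, bg_mean, bg_var, ksize=7, thresh=2.5):
    # Normalized Euclidean distance of eq. (9), summed over colour channels.
    d = np.sqrt(((frame_luv.astype(np.float32) - bg_mean) ** 2
                 / (bg_var + 1e-6)).sum(axis=2))
    # Blur + threshold: narrow registration errors are spread out and fall
    # below the threshold, while genuine moving blobs survive.
    d = cv2.GaussianBlur(d, (ksize, ksize), 0)
    return (d > thresh).astype(np.uint8)
```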
MOVING OBJECT DETECTION
The foreground detection yields a binary mask on which the moving objects, e.g. cars and pedestrians, are blobs. These blobs can be noisy, split, etc., and there can be falsely detected blobs which do not belong to moving objects. So in general the blob detection is under-constrained. By using a priori knowledge the blob detection can be restricted to special blobs, e.g. by shape or size.
We propose a fast object detection algorithm which is based on the foreground mask, while it considers split and incomplete object blobs. The input is the size of a car, which can be defined quite precisely: if we know the altitude, the angles (yaw, pitch) and the focal length of the camera (the airborne vehicles have an inertial navigation system which can provide these parameters), the size of the car can be approximately calculated by assuming that it is moving on the ground.
The initial step of the object detector algorithm divides the mask image into disjoint rectangles of size x*x, where x is the size of the car in pixels, and the foreground covering ratio is calculated for each rectangle. The rectangles containing foreground pixels above a threshold are kept as object candidates (OC).
Next, the OCs are shifted and/or merged by an iterative algorithm. An OC is shifted by mean shift based on the binary mask values. This shift moves the OC rectangles toward the dense foreground pixel regions. Thereafter, two OCs are merged if in the shifted positions their overlapping area is above a threshold. The mean shifting and merging steps are repeated several times until convergence.
At the end the OCs are located around areas containing a large number of foreground pixels. Since the binary mean shifting and intersection calculations are very simple operations, the detection algorithm is significantly quicker than similar approaches, e.g. the colour and texture based methods. However, since it is based only on the foreground mask, it can fail in cases when the foreground mask has errors: large objects, e.g. buses, can be detected as several cars if their silhouette is split. Pedestrians can also cause false positive errors if several moving pedestrians are close to each other, because in this case their foreground blob's size is close to the size of a car. The steps of the algorithm can be followed in Fig. 4.
TRACKING
The object detection is processed for each frame independently. Thus these detections have to be assigned across the frames to yield the tracks of the objects. The difficulty is that in general the number of object detections for consecutive frames can vary even in the case of perfect detection, e.g. objects enter and leave the scene, or objects are occluded by buildings or bridges. To handle the disappearing and later reappearing objects, Kalman filtering is used.
The steps of the detection:
1. Calculate the number of foreground pixels $n_{fore}$ in the x*x sized rectangles $R_i$.
2. If $n_{fore} > \beta A(R_i)$ (where $A$ denotes the area of the rectangle and $\beta$ is the threshold parameter), $R_i$ will be an object candidate and the "mass center" is calculated for $R_i$.
3. Find the $n$ closest neighbours $N_j$ for every object candidate $R_i$; we used $n = 4$.
4. Calculate the areas $A_{ij}$ of the intersections between the neighbours and the object candidate: $A_{ij} = A(R_i \cap R_j)$.
5. If $A_{ij} > f$, where $f$ is a threshold parameter, the OCs $R_i, R_j$ are merged; this means that the OC with fewer foreground pixels is deleted and the size of the other is increased.
6. The position of the object candidates is modified by mean shifting it $m$ times; $m$ is a parameter which differs between the iteration steps. In the first iterations it is low (2-3), later it is increased to 4-5. This accelerates the algorithm, since in the first iterations, when the number of OCs is high, less mean shifting is needed.
7. Steps 4-6 are repeated until convergence with different $f$ and $m$ parameters. Practically, convergence can be achieved by repeating the steps several times.
8. The remaining OC rectangles are the objects; a one-dimensional histogram is calculated for every colour channel in the HSV colour space for each object.
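The following NumPy sketch condenses steps 1-6 on a binary mask. The parameter values (beta, the overlap threshold tau standing in for f above, and the iteration counts) are illustrative assumptions; the merge keeps the candidate with more foreground pixels, as step 5 requires.

```python
import numpy as np

def detect_objects(mask, x, beta=0.3, tau=0.5, iters=4, m=3):
    """Grid + binary mean shift + merge detector; mask is a 0/1 uint8 array,
    x is the car size in pixels. Parameter values are illustrative."""
    H, W = mask.shape
    fg = lambda oc: int(mask[oc[0]:oc[0] + x, oc[1]:oc[1] + x].sum())
    # Steps 1-2: keep grid cells whose foreground coverage exceeds beta.
    ocs = [[r, c] for r in range(0, H - x + 1, x)
                  for c in range(0, W - x + 1, x)
           if mask[r:r + x, c:c + x].sum() > beta * x * x]
    for _ in range(iters):
        # Step 6: mean shift each OC toward the local foreground mass centre.
        for oc in ocs:
            for _ in range(m):
                ys, xs = np.nonzero(mask[oc[0]:oc[0] + x, oc[1]:oc[1] + x])
                if len(ys) == 0:
                    break
                oc[0] = int(np.clip(oc[0] + ys.mean() - x / 2, 0, H - x))
                oc[1] = int(np.clip(oc[1] + xs.mean() - x / 2, 0, W - x))
        # Steps 4-5: merge overlapping OCs, keeping the stronger candidate.
        ocs.sort(key=fg, reverse=True)
        kept = []
        for oc in ocs:
            if all(max(0, x - abs(oc[0] - k[0])) * max(0, x - abs(oc[1] - k[1]))
                   <= tau * x * x for k in kept):
                kept.append(oc)
        ocs = kept
    return ocs  # top-left corners of the detected object rectangles
```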
KALMAN FILTER
The Kalman filter is an efficient tool for filtering a noisy dynamic system. It predicts the new state of the system and then corrects it with the measurements. In tracking we do not have information about the control of the motion; therefore the acceleration is assumed to be zero, and the change in velocity is modelled by the process noise. Consequently, we do not include the acceleration in the process equation, and the effect of the acceleration noise is described by the velocity noise. The motion can be described by the following equations:
$$\begin{bmatrix} x_k \\ \dot{x}_k \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_{k-1} \\ \dot{x}_{k-1} \end{bmatrix} + v_{k-1} \qquad (10)$$

$$z_k = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} x_k \\ \dot{x}_k \end{bmatrix} + w_k \qquad (11)$$
where $x_k$ is the position coordinate in one direction, $\dot{x}_k$ is the velocity, $z_k$ is the measured position, $v_{k-1}$ is the process noise, and $w_k$ is the measurement noise.
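A minimal NumPy sketch of one predict/correct cycle for this constant-velocity model; the noise magnitudes q and r are illustrative assumptions, and one such filter would be run per object and per coordinate.

```python
import numpy as np

F = np.array([[1.0, 1.0],
              [0.0, 1.0]])    # process matrix of eq. (10): position, velocity
Hm = np.array([[1.0, 0.0]])   # measurement matrix of eq. (11): position only

def kf_step(s, P, z, q=1e-2, r=1.0):
    """One predict/correct cycle; s is the 2x1 state, P its covariance.
    Pass z=None while the object is occluded to predict only."""
    Q, R = q * np.eye(2), np.array([[r]])
    s, P = F @ s, F @ P @ F.T + Q                        # predict
    if z is not None:
        y = z - Hm @ s                                   # innovation
        K = P @ Hm.T @ np.linalg.inv(Hm @ P @ Hm.T + R)  # Kalman gain
        s, P = s + K @ y, (np.eye(2) - K @ Hm) @ P       # correct
    return s, P

# Usage: start from the first detection, then feed measured positions.
s, P = np.array([[104.0], [0.0]]), np.eye(2)
s, P = kf_step(s, P, np.array([[106.5]]))
```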
ASSIGNMENT
On each current frame the $k$ detected objects have to be assigned to the $n$ tracked objects from the previous frames. If $n = k$, this can be done in $n!$ ways. The Hungarian method solves this in $O(n^3)$ running time. We solve the assignment problem with a greedy algorithm, which is computationally simple and gives good results in most cases. The cases in which the greedy algorithm fails can be neglected, since they last only a few frames, and in the long term the Kalman filtering corrects these errors.
We construct an $n \times k$ score matrix $S$, whose elements are calculated based on the Euclidean distance of the predicted and detected positions and the objects' colour histograms. The elements of the matrix are fitness values which describe how well the objects from the previous frames match the objects detected on the current frame:

$$S_{ij} = \vartheta \frac{1}{d_{pos}(O_i, D_j)} + (1 - \vartheta)\, d_{hist}(O_i, D_j) \qquad (12)$$
where $d_{pos}$ is the Euclidean distance of the positions, $d_{hist}$ is the histogram similarity and $\vartheta$ is a weight value.
$$d_{pos}(O_i, D_j) = \sqrt{(\hat{p}_{i,x} - p_{j,x})^2 + (\hat{p}_{i,y} - p_{j,y})^2} \qquad (13)$$

where $\hat{p}_i$ is the predicted position of object $O_i$ and $p_j$ is the position of detected object $D_j$.
The steps of the score table $S_{ij}$ calculation for every $i,j$ pair:
1. Calculate the distance $d_{pos}$ of the predicted position $\hat{p}_i$ of object $O_i$ and the position $p_j$ of the detected object $D_j$.
2. If $d_{pos,ij} > \zeta$, set $S_{ij} = 0$ and terminate; $\zeta$ is a threshold parameter.
3. If $d_{pos,ij} \le \zeta$, calculate $S_{ij}$ according to (12).
If the number of detected objects is equal to or greater than the number of tracked objects from the previous frames, the assignment is done forward; this means that the tracked objects are assigned to the detected ones.
The steps of forward assignment:
1. $i = 1$
2. For $O_i$ find the maximum $m_i$ in $S_{i,1 \dots k}$; $m_i = S_{ij}$.
3. If $m_i > \zeta$, assign $D_j$ to $O_i$ and set the column $S_{1 \dots n, j} = 0$.
4. $i = i + 1$; go to step 2 until $i > n$.
5. The tracked objects which are not assigned to any detection become passive.
6. Create new objects $O_{new}$ for the detections $D_{na}$ which are not assigned to an object, and assign these new objects to them: $O_{new} \to D_{na}$.
The objects which are passive for more than $N_{passive}$ time are deleted.
If the number of detected objects is less than the number of tracked objects from the previous frames, the assignment is done backward; this means that the detected objects are assigned to the tracked ones. Distinguishing between the two assignments is needed because the algorithm is greedy, thus the first objects in the order have priority.
Assignment backward:
1. $j = 1$
2. For $D_j$ find the maximum $m_j$ in $S_{1 \dots n, j}$; $m_j = S_{ij}$.
3. If $m_j > \zeta$, assign $O_i$ to $D_j$ and set the row $S_{i,1 \dots k} = 0$.
4. $j = j + 1$; go to step 2 until $j > k$.
5. The tracked objects which are not assigned to any detection become passive.
6. Create new objects $O_{new}$ for the detections which are not assigned to an object $D_{na}$, and assign these new objects to them: $O_{new} \to D_{na}$.
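A minimal sketch of the greedy forward pass described above; the backward pass runs the same loop over the columns of S (i.e. on S.T with the roles of the indices swapped), and zeta is the same threshold as in the score-table steps.

```python
import numpy as np

def greedy_assign(S, zeta):
    """Forward greedy assignment on an n-by-k score matrix S (higher is
    better, per eq. (12)). Returns {tracked index i: detection index j}."""
    S = S.copy()
    assignment = {}
    for i in range(S.shape[0]):      # earlier objects have priority (greedy)
        j = int(np.argmax(S[i]))
        if S[i, j] > zeta:
            assignment[i] = j
            S[:, j] = 0.0            # zero the column: D_j is taken
    return assignment
```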
We claim:
1. A system in an Unmanned Aerial System having one or multiple aerial vehicles which is capable of tracking one or multiple moving objects on the ground.
2. A system in an Unmanned Aerial System which is capable of mapping the respective position of moving objects with respect to their previous position on ground, water or air.
3. A system as claimed in Claim 1 and Claim 2 which is capable of tracking both aerial and ground targets which may be static or moving.
4. A system in an Unmanned Aerial Vehicle which is capable of tracking, locating and identifying moving objects from a UAV on the ground using image processing, by calculating the difference between two captured frames.
5. A system as claimed in Claims 1, 2 and 3 which is capable of tracking and surveillance of multiple ground and aerial targets using multiple aerial vehicles in swarm formation.