
Aerial Video Analytics Based Dynamic Non-Linear Distance Measurement Between On-Ground Subjects

Abstract: Drones are adopted for aerial video-based applications. An important aspect in analyzing drone camera feed is perspective correction to understand linearity of images. Unlike a static camera, in a drone (aerial view), the camera may move up-down or forward-backward, continuously changing the position of the camera in a 3-dimensional environment while the subject(s) under consideration may also be moving. Since the distance between pixels is non-linear, determining a vanishing point and subsequently a perspective correct image is challenging. Conventional approaches have relied on process-intensive deep learning methods, or a camera mounted on a vehicle where the distance from the ground is fixed, and hence do not address the problems in drone video analytics. In the present disclosure, an optical flow-based approach is provided, wherein corner features from a set of n consecutive frames are tracked to dynamically determine the vanishing point. [To be published with FIG.15]


Patent Information

Application #: 202121061341
Filing Date: 28 December 2021
Publication Number: 26/2023
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email:
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2025-09-17
Renewal Date:

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. DAS, Apurba
Tata Consultancy Services Limited, Anchor Building, ITPL, Whitefield, Bangalore, Karnataka 560066, India
2. BHOI, Jaimin Ashokbhai
Tata Consultancy Services Limited, Anchor Building, ITPL, Whitefield, Bangalore, Karnataka 560066, India

Specification

Claims:

We Claim:
1. A processor implemented method (200) comprising:
receiving, via one or more hardware processors, an aerial video comprising a plurality of frames, wherein the aerial video captures one or more on-ground subjects to be tracked by an aerial vehicle operating at one or more of varying speeds, varying angle with a horizontal plane, and varying distance from ground (202);
detecting, via the one or more hardware processors, a plurality of corner features in a set of n consecutive frames in the plurality of frames (204);
eliminating, via the one or more hardware processors, one or more of the plurality of corner features representing noise, based on two displacement thresholds to obtain a plurality of flow anchors (206);
tracking, via the one or more hardware processors, optical flows of the plurality of flow anchors in the set of n consecutive frames, wherein the optical flows are associated with optical flow vectors and cardinality n of the set of consecutive frames is based on a gross average optical flow vector magnitude associated with the plurality of flow anchors (208); and
dynamically determining, via the one or more hardware processors, a vanishing point based on (i) an envelope of the optical flow vectors generated using the plurality of flow anchors or (ii) extrapolation of the optical flows moving in a same direction (210).

2. The processor implemented method of claim 1, wherein the step of detecting a plurality of corner features comprises sequentially obtaining the plurality of corner features using a sliding window method, wherein a first frame from the set of n consecutive frames is eliminated and an (n+1)th frame is included into the set of n consecutive frames such that the cardinality n of the set of consecutive frames is maintained.

3. The processor implemented method of claim 1, wherein the two displacement thresholds represent (i) an upper value and (ii) a lower value respectively, compared to the gross average optical flow vector magnitude, and are percentage values with reference to each of the optical flow vectors.

4. The processor implemented method of claim 1, wherein the step of determining a vanishing point based on an envelope of the optical flow vectors generated using the plurality of flow anchors comprises:
reproducing the optical flows associated with the flow anchors on a binary image having a size matching a size of a frame in the plurality of frames, wherein the plurality of flow anchors in a first frame from the set of n consecutive frames are represented by white pixels on a black background of the binary image;
processing the binary image to detect contours; and
determining the vanishing point as a center of a largest contour from the detected contours representing the envelope of the optical flow vectors.

5. The processor implemented method of claim 4, wherein processing the binary image comprises performing one or more of blur, Sobel, morphologyEx and erosion sequentially to generate the envelope of the optical flow vectors associated with the white pixels on the binary image.

6. The processor implemented method of claim 1, wherein the step of determining a vanishing point based on extrapolation of the optical flows moving in a same direction comprises:
identifying the optical flows associated with the flow anchors moving in a same direction based on an angle of each of the optical flows with the horizontal plane; and
determining the vanishing point as a point of intersection of extrapolated optical flows.

7. The processor implemented method of claim 1, further comprising performing, via the one or more hardware processors, perspective correction of a first frame in the set of n consecutive frames using the vanishing point, wherein the perspective correction comprises:
determining four farthest flow anchors from the center of a frame in the plurality of frames by:
generating two lines, one on either side of a central line, being perpendicular to the horizontal plane and passing through a center of the frame in the plurality of frames, wherein the two lines are generated along the flow anchors located farthest from the central line on either side thereof; and
determining four intersecting points of a rectangular slice of the frame with the two generated lines as the four farthest flow anchors; and
applying inverse perspective transform using (i) the four farthest flow anchors; and (ii) the first frame from the set of n consecutive frames as an input to obtain a corresponding perspective corrected frame.

8. The processor implemented method of claim 7, further comprising performing, via the one or more hardware processors, one or more of (i) detecting one or more subjects in the aerial video using the perspective corrected frames as an input to a deep learning method; (ii) computing a Euclidean distance between the detected one or more subjects using the perspective corrected frames; and (iii) generating a heatmap using the detected one or more subjects and distances therebetween.

9. A system (100) comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
receive, an aerial video comprising a plurality of frames, wherein the aerial video captures one or more on-ground subjects to be tracked by an aerial vehicle operating at one or more of varying speeds, varying angle with a horizontal plane, and varying distance from ground;
detect, a plurality of corner features in a set of n consecutive frames in the plurality of frames;
eliminate one or more of the plurality of corner features representing noise, based on two displacement thresholds to obtain a plurality of flow anchors;
track optical flows of the plurality of flow anchors in the set of n consecutive frames, wherein the optical flows are associated with optical flow vectors and cardinality n of the set of consecutive frames is based on a gross average optical flow vector magnitude associated with the plurality of flow anchors; and
dynamically determine a vanishing point based on (i) an envelope of the optical flow vectors generated using the plurality of flow anchors or (ii) extrapolation of the optical flows moving in a same direction.

10. The system of claim 9, wherein the one or more processors are configured by the instructions to detect a plurality of corner features by sequentially obtaining the plurality of corner features using a sliding window method, wherein a first frame from the set of n consecutive frames is eliminated and an (n+1)th frame is included into the set of n consecutive frames such that the cardinality n of the set of consecutive frames is maintained.

11. The system of claim 9, wherein the two displacement thresholds represent (i) an upper value and (ii) a lower value respectively, compared to the gross average optical flow vector magnitude, and are percentage values with reference to each of the optical flow vectors.

12. The system of claim 9, wherein the one or more processors are configured by the instructions to determine the vanishing point, based on an envelope of the optical flow vectors generated using the plurality of flow anchors, by:
reproducing the optical flows associated with the flow anchors on a binary image having a size matching a size of a frame in the plurality of frames, wherein the plurality of flow anchors in a first frame from the set of n consecutive frames are represented by white pixels on a black background of the binary image;
processing the binary image to detect contours; and
determining the vanishing point as a center of a largest contour from the detected contours representing the envelope of the optical flow vectors.

13. The system of claim 12, wherein the one or more processors are configured by the instructions to process the binary image by performing one or more of blur, Sobel, morphologyEx and erosion sequentially, to generate the envelope of the optical flow vectors associated with the white pixels on the binary image.

14. The system of claim 9, wherein the one or more processors are configured by the instructions to determine the vanishing point based on extrapolation of the optical flows moving in a same direction, by:
identifying the optical flows associated with the flow anchors moving in a same direction based on an angle of each of the optical flows with the horizontal plane; and
determining the vanishing point as a point of intersection of extrapolated optical flows.


15. The system of claim 9, wherein the one or more processors are further configured by the instructions to perform perspective correction of a first frame in the set of n consecutive frames using the vanishing point, by:
determining four farthest flow anchors from the center of a frame in the plurality of frames by:
generating two lines, one on either side of a central line, being perpendicular to the horizontal plane and passing through a center of the frame in the plurality of frames, wherein the two lines are generated along the flow anchors located farthest from the central line on either side thereof; and
determining four intersecting points of a rectangular slice of the frame with the two generated lines as the four farthest flow anchors; and
applying inverse perspective transform using (i) the four farthest flow anchors; and (ii) the first frame from the set of n consecutive frames as an input to obtain a corresponding perspective corrected frame.

16. The system of claim 15, wherein the one or more processors are further configured by the instructions to perform one or more of (i) detecting one or more subjects in the aerial video using the perspective corrected frames as an input to a deep learning method; (ii) computing a Euclidean distance between the detected one or more subjects using the perspective corrected frames; and (iii) generating a heatmap using the detected one or more subjects and distances therebetween.

Dated this 28th day of December 2021
Tata Consultancy Services Limited
By their Agent & Attorney

(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086
Description:

FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
AERIAL VIDEO ANALYTICS BASED DYNAMIC NON-LINEAR DISTANCE MEASUREMENT BETWEEN ON-GROUND SUBJECTS

Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

Preamble to the description:
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The disclosure herein generally relates to the field of non-linear distance measurement between on-ground subjects, and, more particularly, to systems and methods for aerial video analytics based dynamic non-linear distance measurement between on-ground subjects.

BACKGROUND
Heatmaps or social distancing-based applications deal with detection of persons and their proximity to each other. In case of an epidemic or a pandemic situation like Covid-19, it is important to maintain social distancing in public as well as private indoor spaces. Cameras are typically used to detect persons in a space being monitored. In case of a static camera, it is easy to understand the linearity of objects and distances between them. In most solutions dealing with proximity distance computation, a Euclidean distance between two points in an image provides an approximate distance based on the number of pixels between them. However, in case of a moving camera or a drone video, the objects are moving along with the camera. Also, the camera's height from the ground may not be static, which creates a problem in understanding linearity for measuring distance between the objects from an aerial view.

SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
In an aspect, there is provided a processor implemented method comprising the steps of: receiving, via one or more hardware processors, an aerial video comprising a plurality of frames, wherein the aerial video captures one or more on-ground subjects to be tracked by an aerial vehicle operating at one or more of varying speeds, varying angle with a horizontal plane, and varying distance from ground; detecting, via the one or more hardware processors, a plurality of corner features in a set of n consecutive frames in the plurality of frames; eliminating, via the one or more hardware processors, one or more of the plurality of corner features representing noise, based on two displacement thresholds to obtain a plurality of flow anchors; tracking, via the one or more hardware processors, optical flows of the plurality of flow anchors in the set of n consecutive frames, wherein the optical flows are associated with optical flow vectors and cardinality n of the set of consecutive frames is based on a gross average optical flow vector magnitude associated with the plurality of flow anchors; and dynamically determining, via the one or more hardware processors, a vanishing point based on (i) an envelope of the optical flow vectors generated using the plurality of flow anchors or (ii) extrapolation of the optical flows moving in a same direction.
In another aspect, there is provided a system comprising: memory storing instructions; one or more communication interfaces; one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive, an aerial video comprising a plurality of frames, wherein the aerial video captures one or more on-ground subjects to be tracked by an aerial vehicle operating at one or more of varying speeds, varying angle with a horizontal plane, and varying distance from ground; detect, a plurality of corner features in a set of n consecutive frames in the plurality of frames; eliminate one or more of the plurality of corner features representing noise, based on two displacement thresholds to obtain a plurality of flow anchors; track optical flows of the plurality of flow anchors in the set of n consecutive frames, wherein the optical flows are associated with optical flow vectors and cardinality n of the set of consecutive frames is based on a gross average optical flow vector magnitude associated with the plurality of flow anchors; and dynamically determine a vanishing point based on (i) an envelope of the optical flow vectors generated using the plurality of flow anchors or (ii) extrapolation of the optical flows moving in a same direction.
In yet another aspect, there is provided a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: receive, an aerial video comprising a plurality of frames, wherein the aerial video captures one or more on-ground subjects to be tracked by an aerial vehicle operating at one or more of varying speeds, varying angle with a horizontal plane, and varying distance from ground; detect, a plurality of corner features in a set of n consecutive frames in the plurality of frames; eliminate one or more of the plurality of corner features representing noise, based on two displacement thresholds to obtain a plurality of flow anchors; track optical flows of the plurality of flow anchors in the set of n consecutive frames, wherein the optical flows are associated with optical flow vectors and cardinality n of the set of consecutive frames is based on a gross average optical flow vector magnitude associated with the plurality of flow anchors; and dynamically determine a vanishing point based on (i) an envelope of the optical flow vectors generated using the plurality of flow anchors or (ii) extrapolation of the optical flows moving in a same direction.
In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured by the instructions to detect a plurality of corner features by sequentially obtaining the plurality of corner features using a sliding window method, wherein a first frame from the set of n consecutive frames is eliminated and an (n+1)th frame is included into the set of n consecutive frames such that the cardinality n of the set of consecutive frames is maintained.
In accordance with an embodiment of the present disclosure, the two displacement thresholds represent (i) an upper value and (ii) a lower value respectively, compared to the gross average optical flow vector magnitude, and are percentage values with reference to each of the optical flow vectors.
In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured by the instructions to determine the vanishing point, based on an envelope of the optical flow vectors generated using the plurality of flow anchors, by: reproducing the optical flows associated with the flow anchors on a binary image having a size matching a size of a frame in the plurality of frames, wherein the plurality of flow anchors in a first frame from the set of n consecutive frames are represented by white pixels on a black background of the binary image; processing the binary image to detect contours; and determining the vanishing point as a center of a largest contour from the detected contours representing the envelope of the optical flow vectors.
In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured by the instructions to process the binary image by performing one or more of blur, Sobel, morphologyEx and erosion sequentially, to generate the envelope of the optical flow vectors associated with the white pixels on the binary image.
In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured by the instructions to determine the vanishing point based on extrapolation of the optical flows moving in a same direction, by: identifying the optical flows associated with the flow anchors moving in a same direction based on an angle of each of the optical flows with the horizontal plane; and determining the vanishing point as a point of intersection of extrapolated optical flows.
In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured by the instructions to perform perspective correction of a first frame in the set of n consecutive frames using the vanishing point, by: determining four farthest flow anchors from the center of a frame in the plurality of frames by: generating two lines, one on either side of a central line, being perpendicular to the horizontal plane and passing through a center of the frame in the plurality of frames, wherein the two lines are generated along the flow anchors located farthest from the central line on either side thereof; and determining four intersecting points of a rectangular slice of the frame with the two generated lines as the four farthest flow anchors; and applying inverse perspective transform using (i) the four farthest flow anchors; and (ii) the first frame from the set of n consecutive frames as an input to obtain a corresponding perspective corrected frame.
In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured by the instructions to perform one or more of (i) detecting one or more subjects in the aerial video using the perspective corrected frames as an input to a deep learning method; (ii) computing a Euclidean distance between the detected one or more subjects using the perspective corrected frames; and (iii) generating a heatmap using the detected one or more subjects and distances therebetween.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG.1 illustrates an exemplary block diagram of a system for aerial video analytics based dynamic non-linear distance measurement between on-ground subjects, in accordance with some embodiments of the present disclosure.
FIG.2 illustrates an exemplary flow diagram of a computer implemented method for aerial video analytics based dynamic non-linear distance measurement between on-ground subjects, in accordance with some embodiments of the present disclosure.
FIG.3 illustrates an optical flow as a traversal of a pixel P0 across 4 consecutive frames, in accordance with some embodiments of the present disclosure.
FIG.4 illustrates corners in an image that are tracked, in accordance with some embodiments of the present disclosure.
FIG.5 illustrates a sliding window method for tracking optical flows, in accordance with some embodiments of the present disclosure.
FIG.6 illustrates a flowchart for tracking optical flows using n consecutive frames, in accordance with some embodiments of the present disclosure.
FIG.7 illustrates a sample output of optical flow tracking based on a scene from a parking lot, in accordance with some embodiments of the present disclosure.
FIG.8 illustrates a vanishing point, in accordance with some embodiments of the present disclosure.
FIG.9 illustrates a flowchart for determining a vanishing point based on the envelope of the optical flow vectors generated using the plurality of corner features and perspective correction, in accordance with some embodiments of the present disclosure.
FIG.10 illustrates processing of a binary image for detecting contours, in accordance with some embodiments of the present disclosure.
FIG.11 illustrates the detected contours and the envelope of the optical flow vectors, in accordance with some embodiments of the present disclosure.
FIG.12 illustrates a flowchart for determining a vanishing point based on extrapolation of the optical flows moving in a same direction and perspective correction, in accordance with some embodiments of the present disclosure.
FIG.13 illustrates a sample output of a scene from a parking lot with a vanishing point region, the vanishing point and four points determined for perspective correction, in accordance with some embodiments of the present disclosure.
FIG.14 illustrates a perspective corrected image of the image in FIG.13, in accordance with some embodiments of the present disclosure.
FIG.15 provides a combined representation of FIG.7, FIG.11, FIG.13 and FIG.14 respectively for enhanced clarity on the processed image, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Drones are increasingly being adopted for several aerial video-based applications such as search and rescue, livestock management, crop monitoring, surveillance, disaster management and the like. An important aspect in analyzing drone camera feed is perspective correction to understand linearity of images. Perspective correction works well when a static camera is used since the camera’s height and distance from the object is fixed. But in case of a drone (aerial view), the camera can be moving up-down or forward-backward, continuously changing the position of the camera in a 3-dimensional environment while the subject(s) under consideration may also be moving. Since the distance between pixels is non-linear, determining a vanishing point and subsequently a perspective correct image is challenging.
In accordance with the present disclosure, an optical flow-based approach has been provided to address the technical problem mentioned above. There has been some work in this area using an optical flow-based approach with a camera mounted on a vehicle, or by employing deep learning for detection of depth optical flow. When a camera is mounted on a vehicle, the height of the camera is (approximately) fixed. Moreover, the vehicle mostly moves in one direction (forward) and rarely in a backward direction. When it comes to a drone camera feed, there are many variables which affect vanishing point detection - the height of the drone camera may change at any time; the drone camera may tilt at an angle; the drone may move in any direction possible (360 degrees); and the speed of the drone may also vary. Applying methods of the art to a drone video feed therefore does not address all the problems in drone video analytics.
Using deep learning in the applications mentioned above may be costly in terms of processing time, whereas the traditional method (the sparse optical flow method of the present disclosure) provides results fast, with a trade-off in accuracy of the optical flow. Since the applications considered in this disclosure require only an approximate vanishing point, to further perform perspective correction for resolving non-linearity when calculating the distance between two points in an image, a deep learning model may not provide an economically viable solution.
Referring now to the drawings, and more particularly to FIG. 1 through FIG.15, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG.1 illustrates an exemplary block diagram of a system 100 for aerial video analytics based dynamic non-linear distance measurement between on-ground subjects, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes one or more hardware processors 104, communication interface(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more hardware processors 104. The one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) are configured to fetch and execute computer-readable instructions stored in the memory. In the context of the present disclosure, the expressions 'processors' and 'hardware processors' may be used interchangeably. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
The communication interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface(s) can include one or more ports for connecting a number of devices to one another or to another server.
The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 102.
FIG.2 illustrates an exemplary flow diagram of a computer implemented method 200 for aerial video analytics based dynamic non-linear distance measurement between on-ground subjects, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions configured for execution of steps of the method 200 by the one or more hardware processors 104. The steps of the method 200 will now be explained in detail with reference to the components of the system 100 of FIG.1. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to receive, at step 202, an aerial video comprising a plurality of frames, wherein the aerial video captures one or more on-ground subjects to be tracked by an aerial vehicle operating at one or more of varying speeds, varying angle with a horizontal plane, and varying distance from ground. In the context of the present disclosure, the expression 'subjects' refers to living beings or things being tracked by aerial vehicles such as drones.
Pixels in the plurality of frames move when a subject moves while the camera is static, or when the subject is static and the camera moves. Pixel movement from one position to another creates an optical track from a previous location to a new location and is referred to as the optical flow. In conventional optical flow-based methods, only a current frame and a previous frame are used to track pixel points. The optical flow approach of the present disclosure tracks corner features using n consecutive frames in the plurality of frames of the received aerial video, for a better understanding of the optical flow. FIG.3 illustrates the optical flow as a correspondence between pixel P1 and pixel P0 based on a traversal of pixel P0 across 4 consecutive frames, in accordance with some embodiments of the present disclosure. Pixel P0 having coordinates (x0,y0) in Frame-1 is tracked to position P1 (x1,y1) in Frame-2. P0 is now P1 (x1,y1) that moves to P1 (x2,y2) in Frame-3. Accordingly, P0 is now P1 (x2,y2) that moves to P1 (x3,y3) in Frame-4, and so on. The updated positions of the pixel P0 from Frame-1 to Frame-4 have created an optical flow. The updated positions of the pixels provide not only the location information, but also the direction (and angle) in which the pixels in a frame are moving.
In accordance with the present disclosure, corners or corner features (used interchangeably) in an image are considered good features to be tracked. Corners are those locations where two or more edges merge or intersect. Ends of lines are also considered corners because they create an edge between the frame background and the line, which enables identifying the end of the line. FIG.4 illustrates corners in an image that are tracked, in accordance with some embodiments of the present disclosure. Accordingly, in an embodiment of the present disclosure, the one or more hardware processors 104, are configured to detect, at step 204, a plurality of corner features in a set of n consecutive frames in the plurality of frames. Typically, corner features are detected using methods like Harris Corner Detection or Shi-Tomasi Corner Detection.
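By way of illustration only, a minimal sketch of this corner detection step using OpenCV's Shi-Tomasi detector (cv2.goodFeaturesToTrack) is given below; the helper name detect_corner_features and the parameter values are illustrative assumptions and are not prescribed by the disclosure.

```python
import cv2
import numpy as np

def detect_corner_features(frame, max_corners=200, quality=0.01, min_dist=10):
    """Detect Shi-Tomasi corner features in a single frame (illustrative parameters)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Returns an array of shape (k, 1, 2) with (x, y) corner coordinates, or None.
    corners = cv2.goodFeaturesToTrack(gray,
                                      maxCorners=max_corners,
                                      qualityLevel=quality,
                                      minDistance=min_dist)
    return corners if corners is not None else np.empty((0, 1, 2), dtype=np.float32)
```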
In an embodiment, the step of detecting a plurality of corner features comprises sequentially obtaining the plurality of corner features using a sliding window method, wherein a first frame from the set of n consecutive frames is eliminated and an (n+1)th frame is included into the set of n consecutive frames such that the cardinality n of the set of consecutive frames is maintained. FIG.5 illustrates the sliding window method for tracking optical flows, in accordance with some embodiments of the present disclosure. This improves the accuracy of tracking optical flows. The sliding window method takes n consecutive frames at a time (in FIG.5, n=5), processes the optical flows and moves the window to the right one frame at a time, as illustrated.
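A minimal sketch of such a sliding window over the incoming frames is shown below, assuming the frames are available as an iterable; the helper name sliding_windows and the choice of a deque are illustrative, not part of the disclosure.

```python
from collections import deque

def sliding_windows(frames, n=5):
    """Yield successive windows of n consecutive frames (n=5 as in FIG.5).

    The first frame of the previous window is dropped and the (n+1)th frame
    is appended, so the cardinality n of the window is maintained."""
    window = deque(maxlen=n)          # maxlen drops the oldest frame automatically
    for frame in frames:
        window.append(frame)
        if len(window) == n:
            yield list(window)
```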
Not all pixels can be tracked due to common properties of pixel intensities. In the exemplary optical flow of pixel P0 shown in FIG.3, there can be other pixels having similar values, implying there can be more than one P1 pixel in every frame. When tracking the corners in the set of n consecutive frames in the plurality of frames, it is possible that a candidate P1 in the next frame lies at a different corner altogether, while another candidate P1 pixel lies near the original P0 pixel. It is highly unlikely (though not impossible) that the pixel at the other corner is the same pixel that has moved in the next frame. To eliminate noise (pixels having similar values) and identify the P1 pixels in a subsequent frame for every tracked pixel (corner) in a previous frame, in accordance with the present disclosure, the one or more hardware processors 104, are configured to eliminate, at step 206, one or more of the plurality of corner features representing noise using two displacement thresholds to obtain a plurality of flow anchors.
In accordance with the present disclosure, eliminating noise is based on outlier rejection. A fast-moving group of optical flows may contain a small number of slow-moving optical flows (outliers). Likewise, a slow-moving group of optical flows may contain a small number of fast-moving optical flows (outliers). Group association is typically created based on spatial neighborhood.
In accordance with the present disclosure, the two displacement thresholds (one each for the faster moving group and the slower moving group respectively) are empirically determined and represent (i) an upper value and (ii) a lower value respectively, compared to the gross average optical flow vector magnitude, and are percentage values with reference to each of the optical flow vectors. The upper value eliminates corner features that have a high possibility of not having correspondence with corner features of a previous frame (refer to the pixel in another corner explained above). The lower value typically enables eliminating corner features corresponding to the aerial vehicle body appearing in a camera view. Once the one or more of the plurality of corner features are eliminated, the remaining corner features are referred to as flow anchors.
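The following sketch illustrates one plausible form of this thresholding, assuming the tracked points of two consecutive frames are available as arrays of (x, y) coordinates; the helper name filter_flow_anchors and the upper_pct and lower_pct values are hypothetical, empirically tuned percentages.

```python
import numpy as np

def filter_flow_anchors(p0, p1, upper_pct=1.5, lower_pct=0.2):
    """Keep corner features whose displacement lies between a lower and an upper
    threshold defined relative to the gross average optical flow vector magnitude.
    upper_pct / lower_pct are illustrative, empirically tuned values."""
    disp = np.linalg.norm(p1 - p0, axis=1)          # per-feature displacement magnitude
    gross_avg = disp.mean() if len(disp) else 0.0   # gross average optical flow magnitude
    keep = (disp <= upper_pct * gross_avg) & (disp >= lower_pct * gross_avg)
    return p0[keep], p1[keep]
```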
Tracking a single pixel may not provide a desired outcome considering there is a possibility that the tracked pixel may get lost in subsequent frames. Hence, in accordance with the present disclosure, a plurality of flow anchors is tracked in a set of n consecutive frames. Accordingly, in an embodiment of the present disclosure, the one or more hardware processors 104, are configured to track, at step 208, optical flows of the plurality of flow anchors in the set of n consecutive frames, wherein the optical flows are associated with optical flow vectors and cardinality n of the set of consecutive frames is based on a gross average optical flow vector magnitude associated with the plurality of flow anchors. The gross average optical flow vector magnitude represents an average of all distances between P0 and P1. If the gross average optical flow vector magnitude (corresponding to the speed of the aerial vehicle) is very high, then a larger number of frames may have to be selected, and vice-versa.
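A minimal sketch of tracking the flow anchors across the window is given below, assuming a sparse Lucas-Kanade tracker (cv2.calcOpticalFlowPyrLK) as a representative sparse optical flow method; the disclosure does not mandate this particular tracker.

```python
import cv2
import numpy as np

def track_optical_flows(frames, p0):
    """Chain sparse Lucas-Kanade tracking of the flow anchors p0 across a window
    of consecutive frames; returns one optical flow (list of positions) per anchor."""
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    flows = [[tuple(pt)] for pt in p0.reshape(-1, 2)]
    pts = p0.astype(np.float32).reshape(-1, 1, 2)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        for flow, pt, ok in zip(flows, nxt.reshape(-1, 2), status.ravel()):
            if ok:                      # keep only anchors that were successfully tracked
                flow.append(tuple(pt))
        pts, prev_gray = nxt, gray
    return flows
```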
In accordance with the present disclosure, a new set of corner features is initialized if there is a change in the scene observed by the camera. The scene change is determined through histogram variation of the scene and discontinuity in optical flow tracking. FIG.6 illustrates a flowchart for tracking the optical flows using the n consecutive frames, in accordance with some embodiments of the present disclosure.
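One plausible way to detect such a scene change from histogram variation is sketched below using cv2.compareHist with the correlation metric; the metric, bin count and threshold are illustrative assumptions.

```python
import cv2

def scene_changed(prev_frame, curr_frame, threshold=0.7):
    """Flag a scene change when the grayscale histogram correlation between two
    frames drops below an (illustrative) threshold; a change triggers
    re-initialization of the corner features."""
    h1 = cv2.calcHist([cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)], [0], None, [64], [0, 256])
    h2 = cv2.calcHist([cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)], [0], None, [64], [0, 256])
    correlation = cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)
    return correlation < threshold
```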
FIG.7 illustrates a sample output of optical flow tracking based on a scene from a parking lot, in accordance with some embodiments of the present disclosure, wherein P0 pixels are referenced by white rectangles and P1 pixels are referenced by black points.
In an embodiment of the present disclosure, the one or more hardware processors 104, are configured to dynamically determine, at step 210, a vanishing point based on (i) an envelope of the optical flow vectors generated using the plurality of flow anchors or (ii) extrapolation of the optical flows moving in a same direction.
FIG.8 illustrates a vanishing point, in accordance with some embodiments of the present disclosure. A vanishing point is the point at which lines that are parallel in a 3-Dimensional (3D) scene appear to intersect in a 2-Dimensional (2D) image plane. The left-side image in FIG.8 shows a road with middle lane marks. Though the lane marks are parallel to each other, they appear to intersect at a point. The right-side image shows a perspective corrected image. It may also be noted that the lengths of the middle lane marks in the left-side image are not uniform; they decrease as the marks move closer to the vanishing point. In reality (bird's eye view), all lane marks have the same length.
FIG.9 illustrates a flowchart for determining a vanishing point based on the envelope of the optical flow vectors generated using the plurality of corner features and perspective correction, in accordance with some embodiments of the present disclosure. The optical flows associated with the flow anchors are reproduced on a binary image having a size matching a size of a frame in the plurality of frames, wherein the plurality of flow anchors in a first frame from the set of n consecutive frames are represented by white pixels on a black background of the binary image. It may be noted that the frames in the plurality of frames are of the same size and may also be referred to as an input frame. The binary image is then processed to detect contours. In an embodiment, processing the binary image comprises performing one or more of blur, Sobel, morphologyEx and erosion sequentially to generate the envelope of the optical flow vectors (a blob created by contours or boundaries) associated with the white pixels on the binary image.
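A minimal sketch of reproducing the tracked optical flows on a black binary image of the frame size is given below; the helper name flows_to_binary_image and the use of cv2.line to draw each flow are illustrative.

```python
import cv2
import numpy as np

def flows_to_binary_image(frame_shape, flows):
    """Reproduce the tracked optical flows as white pixels/lines on a black
    binary image having the same size as the input frame."""
    binary = np.zeros(frame_shape[:2], dtype=np.uint8)
    for flow in flows:                                   # flow: list of (x, y) positions
        for (x0, y0), (x1, y1) in zip(flow[:-1], flow[1:]):
            cv2.line(binary, (int(x0), int(y0)), (int(x1), int(y1)), 255, 1)
    return binary
```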
FIG.10 illustrates processing of a binary image for detecting contours, in accordance with some embodiments of the present disclosure. Suitable functions such as findContours() from libraries such as OpenCV may be employed in the process of detecting contours. The center of the largest contour (based on the number of pixels occupied by the contour) from the detected contours representing the envelope of the optical flow vectors is then determined as the vanishing point. FIG.11 illustrates the detected contours and the envelope of the optical flow vectors, in accordance with some embodiments of the present disclosure.
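The processing chain described above may look roughly as follows; the kernel sizes and the exact sequence of OpenCV calls are illustrative assumptions consistent with the blur, Sobel, morphologyEx and erosion steps named in the disclosure.

```python
import cv2
import numpy as np

def vanishing_point_from_envelope(binary):
    """Process the binary flow-anchor image (white anchors on black) with
    blur -> Sobel -> morphologyEx -> erosion, detect contours, and return the
    center of the largest contour as the vanishing point."""
    img = cv2.GaussianBlur(binary, (5, 5), 0)
    grad = cv2.convertScaleAbs(cv2.Sobel(img, cv2.CV_16S, 1, 1, ksize=3))
    closed = cv2.morphologyEx(grad, cv2.MORPH_CLOSE, np.ones((9, 9), np.uint8))
    eroded = cv2.erode(closed, np.ones((3, 3), np.uint8))
    contours, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)    # envelope of the optical flow vectors
    x, y, w, h = cv2.boundingRect(largest)
    return (x + w // 2, y + h // 2)                 # center of the largest contour
```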
FIG.12 illustrates a flowchart for determining a vanishing point based on extrapolation of the optical flows moving in a same direction and perspective correction, in accordance with some embodiments of the present disclosure. The optical flows associated with the flow anchors moving in a same direction (within a predefined threshold) are identified based on an angle of each of the optical flows with the horizontal plane. For instance, if there are 10 pixels and 7 pixels are moving at an angle of 45 degrees while 3 pixels are moving at an angle of 90 degrees with the horizontal plane, then the 3 pixels are removed because the majority of the pixels are moving at an angle of 45 degrees. The vanishing point is then determined as a point of intersection of the extrapolated optical flows.
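One plausible realization of this extrapolation is sketched below: flows deviating from the dominant (median) angle are discarded and the remaining flow lines are intersected in a least-squares sense. The angle tolerance and the least-squares formulation are assumptions, since the disclosure only specifies a point of intersection of the extrapolated flows.

```python
import numpy as np

def vanishing_point_from_extrapolation(starts, ends, angle_tol_deg=15.0):
    """Keep flows whose angle with the horizontal is close to the dominant (median)
    angle, then return the least-squares intersection of the extrapolated lines.
    Assumes the retained flows are not all parallel."""
    d = ends - starts
    angles = np.degrees(np.arctan2(d[:, 1], d[:, 0]))
    keep = np.abs(angles - np.median(angles)) <= angle_tol_deg
    starts, d = starts[keep], d[keep]
    # Least-squares point minimizing distance to all lines p = start + t * direction.
    n = d / np.linalg.norm(d, axis=1, keepdims=True)          # unit directions
    projections = np.eye(2) - n[:, :, None] * n[:, None, :]   # I - n n^T per line
    A = projections.sum(axis=0)
    b = (projections @ starts[:, :, None]).sum(axis=0)
    return np.linalg.solve(A, b).ravel()                      # (x, y) of the vanishing point
```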
To understand the linearity of the image, perspective correction (bird's eye view) is performed. Accordingly, in an embodiment of the present disclosure, the one or more hardware processors 104, are configured to perform perspective correction of the first frame in the set of n consecutive frames using the vanishing point. For performing perspective correction, four farthest flow anchors from the center of a frame in the plurality of frames are determined.
Ideally, the optical flows converge at a point as illustrated in FIG.8. In practice, however, this may not happen. The vanishing point is determined as a center of a rectangle (vanishing point region in FIG.13) around the largest contour illustrated in FIG.11. If a line is considered passing through the center of the frame, perpendicular to the horizontal plane, there are flow anchors on either side of this line. Two lines are generated, one on either side of the central line, such that the lines are generated along the flow anchors located farthest from the central line. Four intersecting points (Point 1, Point 2, Point 3 and Point 4 in FIG.13) of a rectangular slice of the frame with the two generated lines serve as the four farthest flow anchors for performing perspective correction. FIG.13 illustrates a sample output of a scene from a parking lot with a vanishing point region, the vanishing point and the four points determined for perspective correction, in accordance with some embodiments of the present disclosure.
In accordance with the present disclosure, an inverse perspective transform is applied using (i) the four farthest flow anchors; and (ii) the first frame from the set of n consecutive frames as an input, to obtain a corresponding perspective corrected frame. FIG.14 illustrates a perspective corrected image of the image in FIG.13, in accordance with some embodiments of the present disclosure. FIG.15 provides a combined representation of FIG.7, FIG.11, FIG.13 and FIG.14 respectively for enhanced clarity on the processed image. It may be noted that the two tyres of the car in the middle are relatively further apart from each other in FIG.13 as compared to the perspective corrected image of FIG.14 due to non-linearity.
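A minimal sketch of the inverse perspective transform using the four points is given below with OpenCV's getPerspectiveTransform and warpPerspective; mapping the four points onto the full frame rectangle is an assumption about the destination geometry.

```python
import cv2
import numpy as np

def perspective_correct(frame, four_points):
    """Apply an inverse perspective transform mapping the four farthest flow
    anchors (Point 1..4, ordered top-left, top-right, bottom-right, bottom-left)
    onto the full frame rectangle (assumed destination)."""
    h, w = frame.shape[:2]
    src = np.float32(four_points)
    dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame, M, (w, h))   # bird's-eye (perspective corrected) view
```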
Once the perspective corrected image is obtained, distance between the pixels in the frame is linear. Hence, distance between subjects detected in the frame will be more accurate than the original aerial view captured by the aerial vehicle.
Practical applications of the method 200 of the present disclosure include detecting proximity between subjects or creating a heatmap for say, maintaining social distancing. Accordingly, the one or more hardware processors 104, are configured to perform one or more of (i) detecting one or more subjects in the aerial video using the perspective corrected frames as an input to a deep learning method [e.g. MobileNet, Single Shot Detector (SSD), You Only Look Once (YOLO)]; (ii) computing a Euclidean distance between the detected one or more subjects using the perspective corrected frames; and (iii) generating a heatmap using the detected one or more subjects and distances therebetween.
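A minimal sketch of the distance computation and heatmap accumulation on the perspective corrected frame is given below; the subject detector itself (e.g., an SSD or YOLO model) is assumed to be available separately, and the radius used for the heatmap is an illustrative choice.

```python
import numpy as np

def pairwise_distances(centroids):
    """Euclidean distances (in pixels of the perspective corrected frame)
    between every pair of detected subject centroids."""
    pts = np.asarray(centroids, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def accumulate_heatmap(heatmap, centroids, radius=15):
    """Add a simple occupancy count around each detected subject centroid;
    a Gaussian spread could be substituted for the square patch used here."""
    h, w = heatmap.shape
    for x, y in centroids:
        x, y = int(round(x)), int(round(y))
        heatmap[max(0, y - radius):min(h, y + radius),
                max(0, x - radius):min(w, x + radius)] += 1
    return heatmap
```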
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more hardware processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

Documents

Application Documents

# Name Date
1 202121061341-FORM 1 [28-12-2021(online)].pdf 2021-12-28
2 202121061341-COMPLETE SPECIFICATION [28-12-2021(online)].pdf 2021-12-28
3 202121061341-DRAWINGS [28-12-2021(online)].pdf 2021-12-28
4 202121061341-FIGURE OF ABSTRACT [28-12-2021(online)].jpg 2021-12-28
5 202121061341-DECLARATION OF INVENTORSHIP (FORM 5) [28-12-2021(online)].pdf 2021-12-28
6 202121061341-STATEMENT OF UNDERTAKING (FORM 3) [28-12-2021(online)].pdf 2021-12-28
7 202121061341-FORM 18 [28-12-2021(online)].pdf 2021-12-28
8 202121061341-REQUEST FOR EXAMINATION (FORM-18) [28-12-2021(online)].pdf 2021-12-28
9 Abstract1.jpg 2022-03-23
10 202121061341-FORM-26 [20-04-2022(online)].pdf 2022-04-20
11 202121061341-Proof of Right [21-04-2022(online)].pdf 2022-04-21
12 202121061341-FER.pdf 2025-01-27
13 202121061341-OTHERS [03-06-2025(online)].pdf 2025-06-03
14 202121061341-FER_SER_REPLY [03-06-2025(online)].pdf 2025-06-03
15 202121061341-DRAWING [03-06-2025(online)].pdf 2025-06-03
16 202121061341-CLAIMS [03-06-2025(online)].pdf 2025-06-03
17 202121061341-PatentCertificate17-09-2025.pdf 2025-09-17
18 202121061341-IntimationOfGrant17-09-2025.pdf 2025-09-17

Search Strategy

1 drone_serE_22-01-2024.pdf

ERegister / Renewals

3rd: 26 Sep 2025

From 28/12/2023 - To 28/12/2024

4th: 26 Sep 2025

From 28/12/2024 - To 28/12/2025

5th: 26 Sep 2025

From 28/12/2025 - To 28/12/2026