
System And Method For 3D Modeling Of Power Grid Corridor Using An Aerial Video

Abstract: Disclosed is a system (102) for 3D modeling of one or more videos. According to the system (102), a scene classification module (214) receives a video captured by a camera and classifies the video into scenes. A key frame identification module (216) identifies key frames from the set of frames of each scene. A depth determination module (218) determines a depth map associated with each scene, using key frames of that scene. An interpolation module (220) segments the depth map in order to obtain a segmented depth map, interpolates holes in the segmented depth map and fits the segmented depth map into a plane. An infringement determination module (222) determines an infringement of one or more objects in the plane with the predefined reference plane in the scene based upon a minimum distance from a point in the predefined reference plane to one of the points in the plane.


Patent Information

Application #
Filing Date
24 March 2015
Publication Number
41/2016
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
ip@legasis.in
Parent Application
Patent Number
Legal Status
Grant Date
2024-03-19
Renewal Date

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India.

Inventors

1. DUTTA, Tanima
Tata Consultancy Services Limited, Abhilash Building, Plot No. 96 EP-IP Industrial Area, Whitefield Road, Bangalore 560 066, Karnataka, India
2. PURUSHOTHAMAN, Balamuralidhar
Tata Consultancy Services Limited, Abhilash Building, Plot No. 96 EP-IP Industrial Area, Whitefield Road, Bangalore 560 066, Karnataka, India
3. SHARMA, Hrishikesh
Tata Consultancy Services Limited, Abhilash Building, Plot No. 96 EP-IP Industrial Area, Whitefield Road, Bangalore 560 066, Karnataka, India

Specification

WE CLAIM:

1. A system (102) for 3D modeling of one or more videos, the system comprising:
a processor (202); and
a memory (206) coupled to the processor (202), wherein the processor (202) is configured to execute a plurality of modules (208) stored in the memory (206), and wherein the plurality of modules (208) comprising:
a scene classification module (214) to
receive a video captured by a camera, wherein the video comprises a plurality of frames; and
classify the video into a plurality of scenes using contextual information associated with the video, wherein a scene comprises a set of frames of the plurality of frames;
a key frame identification module (216) to identify key frames from the set of frames of the scene;
a depth determination module (218) to determine a depth map associated with the scene, wherein the depth map is a fusion of a plurality of disparity maps associated with the key frames;
an interpolation module (220) to
segment the depth map using a color segmentation technique in order to obtain a segmented depth map;
interpolate holes in the segmented depth map, wherein the holes indicate noisy depth values in the segmented depth map, wherein the holes are identified based upon the segmentation; and
fit the segmented depth map into a coordinate plane; and
an infringement determination module (222) to
compute a minimum distance from a point in a predefined reference plane in the scene to one of a plurality of points in the coordinate plane;
compare the minimum distance with a predefined threshold; and
determine infringement of one or more objects in the coordinate plane with the predefined reference plane based upon the comparison of the minimum distance with the predefined threshold.

2. The system (102) of claim 1 further comprising a calibration module (212) configured to calibrate one or more parameters of the camera based upon an input checker board technique, wherein the one or more parameters comprise at least one of focal length, principal point offset, rotation and translation matrices with respect to world coordinate system.

3. The system (102) of claim 1, wherein the key frame identification module (216) identifies the key frames by:
analyzing each frame of the set of frames using Markov Random Field technique;
assigning a weight to each frame of the set of frames based upon the analysis of the Markov Random Field technique; and
generating the key frames based upon an average of the weights assigned to each frame.

4. The system (102) of claim 1, wherein the depth determination module (218) is further configured to determine a disparity map using a semi-global block matching technique, and wherein the semi-global block matching technique comprises:
dividing a key frame into multiple blocks;
comparing an energy function of a block in a key frame with an energy function of every block in an adjacent key frame in order to identify a matching block;
determining the disparity map corresponding to the key frame and the adjacent key frame, wherein the disparity map comprises depth values, and wherein a depth value indicates a distance of a pixel present in the matching block from the camera, and wherein the depth value is computed using a triangulation technique based upon the position of the pixel and positions of the camera.

5. The system (102) of claim 1, wherein the interpolation module (220) further represents the coordinate plane by plane parameters η, μ and Ψ of a plane equation.

6. The system (102) of claim 5, wherein the holes are interpolated using a combination of a RANSAC technique and a weighted kernel voting technique by estimating the value of each of the plane parameters for the holes as well as for other significant blocks indicating the PLC.

7. A method (300) for 3D modeling of one or more videos, the method comprising:
receiving (304), by a processor, a video captured by a camera, wherein the video comprises a plurality of frames;
classifying (306), by the processor, the video into a plurality of scenes using contextual information associated with the video, wherein a scene comprises a set of frames of the plurality of frames;
identifying (308), by the processor, key frames from the set of frames of the scene;
determining (310), by the processor, a depth map associated with the scene, wherein the depth map is a fusion of a plurality of disparity maps associated with the key frames;
segmenting (312), by the processor, the depth map using a color segmentation technique in order to obtain a segmented depth map;
interpolating (314), by the processor, holes in the segmented depth map, wherein the holes indicate noisy depth values in the segmented depth map, wherein the holes are identified based upon the segmentation;
fitting (316), by the processor, the segmented depth map into a coordinate plane;
computing (318), by the processor, a minimum distance from a point in a predefined reference plane in the scene to one of a plurality of points in the coordinate plane;
comparing (320), by the processor, the minimum distance with a predefined threshold; and
determining (322), by the processor, an infringement of one or more objects in the coordinate plane with the predefined reference plane based upon the comparison of the minimum distance with the predefined threshold.

8. The method of claim 7 further comprising calibrating (302), by the processor, one or more parameters of the camera based upon an input checker board technique, wherein the one or more parameters comprise at least one of focal length, principal point offset, rotation and translation matrices with respect to world coordinate system.

9. The method of claim 7, wherein the camera is present in an unmanned aerial vehicle (UAV), and wherein the video is associated with a target environment being monitored by the UAV, wherein the target environment includes a power line corridor surrounded by the one or more objects, wherein the power line corridor is indicative of the predefined reference plane in the scene.

10. The method of claim 7, wherein the contextual information comprises at least one of GPS data associated with each frame in the video, a number of frames captured by the camera per second, a velocity of the UAV, a height of the UAV, an angle subtended by the entire view of the camera, and an angle subtended by an angular bisector of the field of view with the vertical.

11. The method of claim 7, wherein the identifying (308) the key frames further comprises:
analyzing each frame of the set of frames using Markov Random Field technique;
assigning a weight to each frame of the set of frames based upon the analysis of the Markov Random Field technique; and
generating the key frames based upon an average of the weights assigned to each frame.

12. The method of claim 7, wherein the determining (310) further comprises obtaining a disparity map using a semi-global block matching technique, and wherein the semi-global block matching technique comprises:
dividing a key frame into multiple blocks;
comparing an energy function of a block in a key frame with an energy function of every block in an adjacent key frame in order to identify a matching block;
determining the disparity map corresponding to the key frame and the adjacent key frame, wherein the disparity map comprises depth values, and wherein a depth value indicates a distance of a pixel present in the matching block from the camera, and wherein the depth value is computed using a triangulation technique based upon the position of the pixel and positions of the camera.

13. The method of claim 7, wherein the color segmentation technique facilitates representing one or more pixels, in the depth map, having similar depth values with a similar color.

14. The method of claim 7, wherein the coordinate plane is represented by plane parameters η, μ and Ψ of a plane equation.

15. The method of claim 14, wherein the holes are interpolated using a combination of a RANSAC technique and a weighted kernel voting technique by estimating the value of each of the plane parameters for the holes.

16. The method of claim 7, wherein the infringement of the one or more objects is determined when the minimum distance is less than the predefined threshold.

17. A non-transitory computer readable medium embodying a program executable in a computing device for 3D modeling of one or more videos, the program comprising:
a program code for receiving a video captured by a camera, wherein the video comprises a plurality of frames;
a program code for classifying the video into a plurality of scenes using contextual information associated with the video, wherein a scene comprises a set of frames of the plurality of frames;
a program code for identifying key frames from the set of frames of the scene;
a program code for determining a depth map associated with the scene, wherein the depth map is a fusion of a plurality of disparity maps associated with the key frames;
a program code for segmenting the depth map using a color segmentation technique in order to obtain a segmented depth map;
a program code for interpolating holes in the segmented depth map, wherein the holes indicate noisy depth values in the segmented depth map, wherein the holes are identified based upon the segmentation;
a program code for fitting the segmented depth map into a coordinate plane;
a program code for computing a minimum distance from a point in a predefined reference plane in the scene to one of a plurality of points in the coordinate plane;
a program code for comparing the minimum distance with a predefined threshold; and
a program code for determining an infringement of one or more objects in the coordinate plane with the predefined reference plane based upon the comparison of the minimum distance with the predefined threshold.

FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
SYSTEM AND METHOD FOR 3D MODELING OF POWER GRID CORRIDOR USING AN AERIAL VIDEO

APPLICANT:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

The following specification describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application does not claim priority to any application.

TECHNICAL FIELD
[002] The present disclosure, in general, relates to video processing, and more particularly to a system and method for 3D modeling of one or more videos captured via a camera present in an unmanned aerial vehicle (UAV).

BACKGROUND
[003] Power line corridors (PLC) include distribution poles carrying transmission lines for distributing electricity across different households. The PLC infrastructure has properties of vastness, such as long distance and large coverage, and entails high security and reliability requirements due to its impact on the life of the general public. A few of the factors that may impact PLC safety include encroaching vegetation, ambient temperature, structural faults of insulators and towers, and the like. It has been observed that accidents are caused due to over-sagging of the transmission lines and interference by trees encroaching on the right of way, or leaning dangerously due to storms, floods, etc. Therefore, maintenance, both preventive and breakdown, of the power line corridors is required. Further, regular monitoring of the PLC is required for uninterrupted power supply.
[004] In the existing art, the automated monitoring of power line corridors is facilitated using aerial remote surveillance platforms such as unmanned aerial vehicles (UAVs) and vision-based techniques. Typically, these techniques may be used to detect infringements/interference caused by trees, buildings, pavements, etc. on the PLC infrastructure. Such detection is possible based on 3D reconstruction of objects such as trees, buildings, and pavements from aerial images captured through cameras present in the UAV. However, there exist technical challenges with respect to the utilization of the existing remote surveillance platforms, such as unmanned aerial vehicles (UAVs), and the vision-based techniques.
[005] One of the technical challenges is that the motion of the UAV is highly turbulent and flickers due to the presence of wind, vibration of motors, electromagnetic interference with power lines, etc. Even the IMU data, which is available from the UAV, is not very accurate. None of the existing video stabilization techniques may be suited for stabilizing such a video. Another technical challenge is that the power lines in aerial images do not expose a significant surface; especially in UAV images, the power lines are not even in focus in many frames. In addition, the components of PLCs, such as transmission lines and distribution poles, are texture-less (wiry) objects, and the UAVs cannot fly very close to the power line due to electromagnetic interference with power lines. In such a scenario, 3-D reconstruction of PLCs is a challenging task.
[006] Yet another technical challenge is that the video collected using the UAV is captured in a one-way trip following the PLCs. To provide a cost-effective sensing scenario, the camera used may be a simple digital camera and not a stereo-rig camera. This may further lead to difficulties in the selection of appropriate stereo images for reconstruction. In aerial imaging, a fish-eye lens or a wide-angle lens camera is generally used to capture video, which generates barrel distortion. While rectifying such a video there may be a loss of information, such as resolution and depth information, and thus the quality degrades.
[007] Further, in stereo vision, the viewing angle of the camera is generally taken perpendicular to the baseline. However, in order to minimize occlusions of the power line, if a front-looking camera is used for overhead monitoring of the PLC, the viewing angle of the camera is not perpendicular to the baseline. Such a scenario leads to the problem of a non-consistent baseline due to the turbulent motion of the UAV, which makes the 3-D reconstruction process more complex.
SUMMARY
[008] Before the present systems and methods, are described, it is to be understood that this application is not limited to the particular systems, and methodologies described, as there can be multiple possible embodiments which are not expressly illustrated in the present disclosures. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present application. This summary is provided to introduce concepts related to systems and methods for 3D modeling of one or more videos and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the disclosure nor is it intended for use in determining or limiting the scope of the disclosure.
[009] In one implementation, a system for 3D modeling of one or more videos is disclosed. In one aspect, the system may comprise a processor and a memory coupled to the processor. The processor may be configured to execute a plurality of modules stored in the memory. The plurality of modules may comprise a calibration module, a scene classification module, a key frame identification module, a depth determination module, an interpolation module and an infringement determination module. The calibration module may calibrate one or more parameters of a camera based upon an input checker board technique. The scene classification module may receive a video captured by the camera. The video may further comprise a plurality of frames, in a time sequence. Further, the scene classification module may classify the video into a plurality of scenes using contextual information associated with the video. In an aspect, the classification is done in a manner such that a scene of the plurality of scenes comprises a subsequence of frames. The key frame identification module may identify key frames from the set of frames of the scene. The depth determination module may determine a depth map associated with the scene. In an aspect, the depth map may be a fusion of a plurality of disparity maps associated with various key frames, in order to improve reliability. The interpolation module may segment the depth map using a color segmentation technique in order to obtain a segmented depth map. Further, the interpolation module may interpolate holes in the segmented depth map, wherein the holes indicate noisy depth values in the segmented depth map. In an aspect, the holes may be identified based upon the segmentation. Furthermore, the interpolation module may fit the segmented depth map into a coordinate plane. The infringement determination module may compute a minimum distance from a point in a predefined reference plane in the scene to one of a plurality of points in the coordinate plane. Further, the infringement determination module may compare the minimum distance with a predefined threshold. Further, the infringement determination module may determine an infringement of one or more objects in the coordinate plane with the predefined reference plane based upon the comparison of the minimum distance with the predefined threshold.
[0010] In one implementation, a method for 3D modeling of one or more videos is disclosed. The method may comprise calibrating, by a processor, one or more parameters of the camera. Further, the method may comprise receiving, by the processor, a video captured by the camera. The video may further comprise a plurality of frames. The method may further comprise classifying, by the processor, the video into a plurality of scenes using contextual information associated with the video. In an aspect, the classification is done in a manner such that a scene of the plurality of scenes comprises a set of frames of the plurality of frames. Further, the method may comprise identifying, by the processor, key frames from the set of frames of the scene. The method may comprise determining, by the processor, a depth map associated with the scene. In an aspect, the depth map may be a fusion of a plurality of disparity maps associated with key frames. The method may further comprise segmenting, by the processor, the depth map using a color segmentation technique in order to obtain a segmented depth map. The method may further comprise interpolating holes in the segmented depth map, wherein the holes indicate noisy depth values in the segmented depth map. In an aspect, the holes may be identified based upon the segmentation. Further, the method may comprise fitting the segmented depth map into a coordinate plane. Further, the method may comprise computing, by the processor, a minimum distance from a point in a predefined reference plane in the scene to one of a plurality of points in the coordinate plane. The method may further comprise comparing, by the processor, the minimum distance with a predefined threshold. Further, the method may comprise determining, by the processor, an infringement of one or more objects in the coordinate plane with the predefined reference plane based upon the comparison of the minimum distance with the predefined threshold.
[0011] In yet another implementation, a non-transitory computer readable medium embodying a program executable in a computing device for 3D modeling of one or more videos is disclosed. The program may comprise a program code for calibrating one or more parameters of the camera. Further, the program may comprise a program code for receiving a video captured by the camera. The video may further comprise a plurality of frames. The program may further comprise a program code for classifying the video into a plurality of scenes using contextual information associated with the video. In an aspect, the classification is done in a manner such that a scene of the plurality of scenes comprises a set of frames of the plurality of frames. Further, the program may comprise a program code for identifying key frames from the set of frames of the scene. The program may comprise a program code for determining a depth map associated with the scene. In an aspect, the depth map may be a fusion of a plurality of disparity maps associated with key frames. The program may further comprise a program code for segmenting the depth map using a color segmentation technique in order to obtain a segmented depth map. The program may further comprise a program code for interpolating holes in the segmented depth map, wherein the holes indicate noisy depth values in the segmented depth map. In an aspect, the holes may be identified based upon the segmentation. Further, the program may comprise a program code for fitting the segmented depth map into a coordinate plane. Further, the program may comprise a program code for computing a minimum distance from a point in a predefined reference plane in the scene to one of a plurality of points in the coordinate plane. The program may further comprise a program code for comparing the minimum distance with a predefined threshold. Further, the program may comprise a program code for determining an infringement of one or more objects in the coordinate plane with the predefined reference plane based upon the comparison of the minimum distance with the predefined threshold.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] The foregoing summary as well as detailed description of embodiments of the present disclosure is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the disclosure, there is shown in the present document example constructions of the disclosure; however, the disclosure is not limited to the specific methods and apparatus disclosed in the document and the drawings.
[0013] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[0014] Fig. 1 illustrates a network implementation of a system for 3D modeling of one or more videos, in accordance with an embodiment of the present disclosure.
[0015] Fig. 2 illustrates the system, in accordance with an embodiment of the present subject matter.
[0016] Fig. 3 illustrates a method for 3D modeling of one or more videos, in accordance with an embodiment of the present disclosure.
[0017] Fig. 4 illustrates blocks of adaptive size used in implementing a semiglobal block matching technique for determining a depth map, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0018] Some embodiments of this disclosure, illustrating all its features, will now be discussed in detail. The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the exemplary systems and methods are now described. The disclosed embodiments are merely exemplary of the disclosure, which may be embodied in various forms.
[0019] While aspects of described system and method for 3D modeling of one or more videos may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
[0020] Referring now to Figure 1, a network implementation 100 of a system 102 for 3D modeling of one or more videos is illustrated, in accordance with an embodiment of the present subject matter. The one or more videos may be captured by a camera (not shown in Figure 1) present in an unmanned aerial vehicle (UAV) 104. Though the system 102 has been shown communicatively coupled with the UAV 104, in some embodiments, the system 102 may be present in the UAV 104 itself. In one embodiment, the system 102 may initially calibrate one or more parameters of the camera. In one embodiment, the one or more parameters may be calibrated based upon an input checker board technique. The one or more parameters calibrated may comprise at least one of a focal length, principal point offset, rotation, translation matrices with respect to world coordinate system and the like. In an embodiment, after the calibration of the one or more parameters, the camera may capture one or more videos (hereinafter referred to as a video) associated with a target environment 106 being monitored by the UAV 104. The target environment 106 may include a power line corridor (PLC) surrounded by one or more objects. The one or more objects may include buildings, trees, pavement, and the like.
[0021] The system 102 may receive the video captured by the camera. The video may further comprise a plurality of frames. The system 102 may further classify the video into a plurality of scenes using contextual information associated with the video. In one embodiment, the contextual information, used for the classification, may comprise at least one of GPS data associated with each frame in the video, a number of frames captured by the camera per second, a velocity of the UAV 104, a height of the UAV 104, an angle subtended by the entire view of the camera, and an angle subtended by the angular bisector of the field of view with the vertical, wherein the angle subtended by the angular bisector may also be interpreted as the "pitch" angle of the camera mount. In an aspect, the classification is done in a manner such that a scene may comprise a set of frames of the plurality of frames.
[0022] Further, the system 102 may identify key frames from the set of frames of the scene. The key frames may be identified based upon analyzing each frame of the set of frames using Markov Random Field technique, assigning a weight to each frame of the set of frames based upon the analysis of the Markov Random Field technique and generating the key frames based upon an average of the weights assigned to each frame. The system 102 may determine a depth map associated with the scene. In an aspect, the depth map may be a fusion of a plurality of disparity maps associated with key frames.
[0023] In one embodiment, a disparity map may be determined using a semi-global block matching technique. The system 102, using the semi-global block matching technique, may divide a key frame into multiple blocks. Further, the system 102, using the semi-global block matching technique, may compare an energy function of a block in a key frame with an energy function of every block in an adjacent key frame in order to identify a matching block. Further, the system 102, using the semi-global block matching technique, may determine the disparity map corresponding to the key frame and the adjacent key frame. The disparity map further comprises depth values, wherein a depth value indicates a distance of a pixel present in the matching block from the camera. In an embodiment, the depth value may be computed, using a triangulation technique, based upon the position of the pixel and positions of the camera. The fusion of the plurality of disparity maps results in the generation of the depth map.
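By way of illustration, the sketch below computes a disparity map for a rectified key-frame pair with OpenCV's semi-global matching implementation and reprojects it to depth using the disparity-to-depth matrix Q obtained during rectification; the matcher parameter values and the median-based fusion rule are illustrative assumptions and not values fixed by the present disclosure.

```python
import cv2
import numpy as np

def disparity_and_depth(left_img, right_img, Q, block_size=5, num_disp=64):
    """Disparity for a rectified key-frame pair via semi-global matching,
    followed by triangulation to depth through the Q matrix."""
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=num_disp,           # must be a multiple of 16
        blockSize=block_size,
        P1=8 * block_size * block_size,    # penalties on small/large disparity changes
        P2=32 * block_size * block_size,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2)
    disp = sgbm.compute(left_img, right_img).astype(np.float32) / 16.0
    points_3d = cv2.reprojectImageTo3D(disp, Q)   # triangulated 3-D point per pixel
    depth = points_3d[:, :, 2]                    # distance of each pixel from the camera
    return disp, depth

def fuse_depth_maps(depth_maps):
    # Per-pixel median fusion of the per-key-frame depth maps (one possible
    # fusion rule; the disclosure does not fix a specific operator here).
    return np.median(np.stack(depth_maps, axis=0), axis=0)
```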
[0024] The system 102 may further segment the depth map using a color segmentation technique in order to obtain a segmented depth map. The color segmentation technique may facilitate representing one or more pixels in the depth map having similar depth values with a similar color. Further, the system 102 may fit the segmented depth map into a coordinate plane. In an aspect, the system 102 may represent the coordinate plane by plane parameters η, μ and Ψ of a plane equation.
[0025] It must be understood that, like the standard equation for a line in coordinate geometry, y = m*x + c, the coordinate plane is a 2-D artifact in coordinate geometry. Hence there are two axes, denoted by u and v. The plane cuts the u-v plane in a line. The angle between the said line and the line corresponding to the u axis is represented by η. Similarly, the angle between the line of the plane's intersection and the line corresponding to the v axis is μ. Ψ is the normal/perpendicular offset of the plane from the origin.
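A minimal sketch of fitting such a plane to one colour segment is given below. It assumes the plane equation takes the linear form d(u, v) = η·u + μ·v + Ψ over the image coordinates (u, v), a common choice for segment-wise plane fitting; that linear form is an assumption made for illustration, since the exact equation is not reproduced above.

```python
import numpy as np

def fit_segment_plane(u, v, d):
    """Least-squares fit of d(u, v) = eta*u + mu*v + psi to the valid depth
    values d of one colour segment at pixel coordinates (u, v)."""
    A = np.column_stack([u, v, np.ones_like(u, dtype=np.float64)])
    (eta, mu, psi), *_ = np.linalg.lstsq(A, d.astype(np.float64), rcond=None)
    return eta, mu, psi

def evaluate_plane(eta, mu, psi, u, v):
    # Value of the fitted plane at (u, v), e.g. to interpolate hole pixels.
    return eta * u + mu * v + psi
```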
[0026] Before fitting the segmented depth map into the coordinate plane, the system 102 may interpolate holes in the segmented depth map, wherein the holes indicate noisy depth values in the segmented depth map. In an aspect, the holes may be identified based upon the segmentation. In one embodiment, the holes may be interpolated using a combination of a Random Sample Consensus (RANSAC) technique and a weighted kernel voting technique. Particularly, the system 102 may estimate the value of each of the plane parameters for the holes. Further, the system 102 may compute a minimum distance from a point in a predefined reference plane in the scene to one of a plurality of points in the coordinate plane. In an aspect, the predefined reference plane may be indicative of the power line corridor (PLC) in the scene. The system 102 may further compare the minimum distance with a predefined threshold. Further, the system 102 may determine an infringement of the one or more objects in the coordinate plane with the predefined reference plane based upon the comparison of the minimum distance with the predefined threshold. The infringement of the one or more objects may be determined when the minimum distance is less than the predefined threshold.
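The minimum-distance test above can be sketched as a brute-force search between the 3-D points sampled from the fitted coordinate plane and the points of the predefined reference plane (the PLC); the brute-force search and the function names are illustrative choices only.

```python
import numpy as np

def min_distance_infringement(object_points, plc_points, threshold):
    """object_points and plc_points are (N, 3) and (M, 3) arrays of 3-D points
    sampled from the fitted coordinate plane and from the predefined reference
    plane, respectively. Returns the minimum distance and whether it falls
    below the predefined threshold, i.e., an infringement."""
    diffs = plc_points[:, None, :] - object_points[None, :, :]   # (M, N, 3)
    dists = np.linalg.norm(diffs, axis=2)                        # pairwise distances
    d_min = float(dists.min())
    return d_min, d_min < threshold

# Example usage with placeholder data:
# d_min, infringed = min_distance_infringement(tree_points, plc_points, threshold=3.0)
```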
[0027] Although the present subject matter is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, a cloud-based computing environment and the like. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 108-1, 108-2…108-N, collectively referred to as user devices 108 hereinafter, or applications residing on the user devices 108. In one implementation, the system 102 may comprise the cloud-based computing environment in which a user may operate individual computing systems configured to execute remotely located applications. Examples of the user devices 108 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 108 are communicatively coupled to the system 102 through a network 110.
[0028] In one implementation, the network 110 may be a wireless network, a wired network or a combination thereof. The network 110 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 110 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 110 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0029] Referring now to Figure 2, the system 102 is illustrated in accordance with an embodiment of the present disclosure. In one embodiment, the system 102 may include a processor 202, an input/output (I/O) interface 204, and a memory 206. The processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 202 is configured to fetch and execute computer-readable instructions stored in the memory 206.
[0030] The I/O interface 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 204 may allow the system 102 to interact with the user directly or through the user devices 108. Further, the I/O interface 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 204 may include one or more ports for connecting a number of devices to one another or to another server.
[0031] The memory 206 may include any computer-readable medium and computer program product known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 206 may include modules 208 and data 210.
[0032] The modules 208 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 208 may include a calibration module 212, a scene classification module 214, a key frame identification module 216, a depth determination module 218, an interpolation module 220, an infringement determination module 222 and other modules 224. The modules 208 may include programs or coded instructions that supplement applications and functions of the system 102. The modules 208 described herein may be implemented as software modules that may be executed in the cloud-based computing environment of the system 102.
[0033] The data 210, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the modules 208. The data 210 may also include a database 226 and other data 228. The other data 228 may include data generated as a result of the execution of one or more modules in the other modules 224.
[0034] In one implementation, at first, a user may use one of the user devices 108 to access the system 102 via the I/O interface 204. The user may register themselves using the I/O interface 204 in order to use the system 102. In one aspect, the system 102 may enable 3D modeling of the one or more videos using the plurality of modules 208. Further, the system 102 may implement a method 300 illustrated in Figure 3 using the plurality of modules 208. The detailed description of the method 300 along with functionalities of each of the plurality of modules 208 implementing the method 300 is further described referring to Figures 2 and 3 as below.
[0035] Referring to Figure 3, the method 300 for 3D modeling of the one or more videos is shown, in accordance with an embodiment of the present disclosure. The method 300 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 300 may be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0036] The order in which the method 300 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 300 or alternate methods. Additionally, individual blocks may be deleted from the method 300 without departing from the spirit and scope of the disclosure described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 300 may be considered to be implemented in the above-described system 102.
[0037] At block 302, one or more parameters of the camera of the UAV 104 may be calibrated. In one implementation, the one or more parameters of the camera may be calibrated by the calibration module 212. Further, the calibration of the one or more parameters using the calibration module 212 is explained in detail hereinafter.
[0038] In one embodiment, the camera present in the UAV 104 may typically be a fish-eye lens or a wide-angle lens camera. Such lenses may distort the video being captured. Furthermore, the fish-eye lens may generate barrel distortion. Therefore, the calibration module 212 may be configured to perform calibration of the camera based on an input checker board video technique. The camera may be calibrated by changing the angles and the position of the chosen objects in a video. For a cost-effective sensing solution, instead of capturing snapshots from a stereo-rig, the UAV 104 may use a normal aerial (digital) camera to capture images. However, such aerial cameras have fish-eye or wide-angle lenses which may cause distortion in the image. Therefore, the calibration module 212 may be configured to use the same camera to capture a checker board video for calibration. The calibration video may be captured by placing the camera at a fixed position and moving the checker board to different locations; at each location, denoted by a shot, a set of images may be captured.
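A minimal sketch of this step, using OpenCV's standard chessboard routines on frames sampled from the calibration video, is shown below. The board dimensions, the square size, and the use of the plain pinhole model (rather than a dedicated fish-eye model) are placeholder assumptions.

```python
import cv2
import numpy as np

def calibrate_from_checkerboard(frames, board_size=(9, 6), square_size=0.025):
    """Estimate the intrinsics (focal length, principal point offset) and the
    per-view rotation/translation of the camera from checker-board frames
    sampled from the calibration video. board_size and square_size (metres)
    are placeholders for the actual board used."""
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    objp *= square_size
    image_size = (frames[0].shape[1], frames[0].shape[0])
    obj_points, img_points = [], []
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
            obj_points.append(objp)
            img_points.append(corners)
    # rms is the overall re-projection error referred to in the text;
    # rvecs/tvecs hold the rotation and translation of every view.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)
    return rms, K, dist, rvecs, tvecs
```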
[0039] In this embodiment, assume a set of shots, denoted by Gj, where each Gj = {Y1, Y2, Y3, …, Yi} may be selected from the calibration video U such that U = {G1, G2, G3, …, Gj}. It is to be noted that a shot Yi = {C1, C2, C3, …, Ck} may be a sequence of images and 'k' is the number of images in a shot. For each Yi, the calibration module 212 may filter images to select the stereo images. Such stereo images in all Yi collectively form a best possible Gj, which reduces the total re-projection error for all the points in all the available views of the video in the process of calibration. The calibration module 212 may assume that the intrinsic parameters of the camera for each of the views and the external pose information are already estimated using a technique proposed by Z. Zhang in a publication titled "A flexible new technique for camera calibration" (IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000) and, for each stereo pair in Yi, the calibration module 212 may compute the following extrinsic parameters:

[0040] Where Q1i, Q2i, T1i and T2i are the rotation matrices and translation vectors of a stereo pair {C1, C2}, respectively, and Qi and Ti are the relative rotation matrix and translation vector of the stereo pair {C1, C2}, respectively. After computing all such stereo pairs, the Q and T matrices may finally be estimated to obtain the overall relative rotation and translation matrices of the camera with respect to that video, using the methodology proposed in a publication titled "Opencv 3d reconstruction documentation," 2014 (online, available: http://docs.opencv.org/modules/calib3d/doc/calib3d.html). Furthermore, the calibration module 212 may rectify the video based on the calibration parameters such that all the epipolar lines become parallel, which simplifies the dense stereo correspondence problem, using the technique proposed by A. Fusiello, E. Trucco, and A. Verri in a publication titled "A Compact Algorithm for Rectification of Stereo Pairs" (Mach. Vision Appl., vol. 12, no. 1, pp. 16–22, 2000). It reduces the search for correspondences to the epipolar lines. The key feature of this type of correspondence processing is that the epipolar lines are easily determined and the emphasis is on increasing the speed and accuracy of the search itself. The outputs of the above process help to estimate the rectification transform matrix, the projection matrix, and the disparity-to-depth mapping matrix (perspective transformation matrix), as proposed in the publication titled "A Compact Algorithm for Rectification of Stereo Pairs". After the calibration of the parameters of the camera, the camera may be enabled to capture the video associated with the Power Line Corridor (PLC) 106 shown in Figure 1.
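The composition of the relative rotation and translation of a stereo pair from its per-view poses, followed by rectification, can be sketched as follows; the composition formula shown is the standard stereo convention used by OpenCV and is assumed here, since the equations themselves are not reproduced above.

```python
import cv2
import numpy as np

def relative_pose(R1, T1, R2, T2):
    # Relative rotation Q and translation T of a stereo pair from the per-view
    # rotations/translations (standard OpenCV-style composition, assumed here).
    Q = R2 @ R1.T
    T = T2 - Q @ T1
    return Q, T

def rectify_pair(K, dist, image_size, R, T):
    """Rectification makes the epipolar lines parallel; the returned Q matrix
    here is the disparity-to-depth (perspective transformation) matrix
    mentioned in the text, not the relative rotation above."""
    R1r, R2r, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K, dist, K, dist, image_size, R, T)
    return R1r, R2r, P1, P2, Q
```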
[0041] Now referring to Figure 3, at block 304, the video captured by the camera may be received. The video may be received by the scene classification module 214. The video may further comprise a plurality of frames. Further, at block 306, the video may be classified into a plurality of scenes using contextual information associated with the video captured. The video may be classified into the plurality of scenes by the scene classification module 214. Further, the classification of the video into the plurality of scenes using the scene classification module 214 is explained in detail hereinafter.
[0042] The captured video may be further processed by the system 102 in order to reduce the complexity of all further processes such that the turbulent motion of the UAV 104 and the complex heterogeneous background will not affect the 3-D reconstruction of objects in the video. It is to be noted that 3-D modeling of the video of the PLC may not be feasible due to a lack of depth information in many parts of the video, caused by the complex background and the erratic camera shake that gives rise to the complex problem of a non-consistent baseline. In order to handle this problem, the scene classification module 214 may be configured to divide the video associated with the PLC 106 into scenes so that depth information for each scene may be estimated. In one embodiment, the scene classification module 214 may implement a mathematical analysis for the scene classification based on the contextual information, including but not limited to a speed, a height, and an orientation of the UAV 104, a frame rate associated with the camera, and a direction of the field of view.
[0043] In one embodiment, the scene classification module 214 may divide the video into the plurality of scenes in a manner such that a scene is defined as a set of frames which contain information on a certain real-world geographical location. All frames in the scene are assumed to be highly correlated. While the input is a video sequence, the primary task is to divide the sequence into scenes. The camera, which acts as a front-facing camera placed on the UAV 104, may capture the video with its field of view at different instants. The spatial displacement of the foreground objects in consecutive frames is quite high and also random in nature due to the turbulent motion of the UAV 104. The scene classification module 214 implements the mathematical analysis which calculates the number of frames Ff captured by the camera until the last frame retains a fraction V of a particular scene from its initial point of capture, such that,

Where 'fc' is the frames per second of the camera assuming uniform video recording, 'ω' is the velocity (in m/s) of the UAV 104, 'h' is the height (in meters) of the UAV 104, 'θ' is the angle subtended by the entire field of view of the camera, and 'Ф' is the angle subtended by the angular bisector of the field of view with the vertical, i.e., in the direction of the field of view. It is to be noted that the aforementioned contextual information is available from the UAV 104. Further, in accordance with this embodiment, the value of V is assumed to be 50%. That means each scene comprises content that is 50% overlapped with the subsequent scene. However, the value of 'V' is highly dependent on the GPS/IMU information available from the UAV 104. Moreover, by adopting a naive approach, the total number of scenes in the video is given by S = Ft/Ff, where Ft represents the total number of frames in the video sequence. Each scene initially contains Ff consecutive frames in the order of the input video sequence. Further, after the classification of the video into the plurality of scenes, each scene may further be processed using a key frame identification module 216 for identification of key frames for each scene, as explained in detail below.
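For illustration, one plausible geometric form of this relation is sketched below, assuming flat ground and a simple pinhole footprint; the exact equation of the disclosure is not reproduced above, so the formula inside the function is an assumption.

```python
import math

def frames_per_scene(fc, omega, h, theta, phi, V=0.5):
    """Number of frames F_f grouped into one scene (assumed geometric model).

    The front-looking camera at height h (m), pitched phi rad from the vertical
    with full field of view theta rad, covers an along-track ground strip of
    length L = h * (tan(phi + theta/2) - tan(phi - theta/2)). With overlap
    fraction V between consecutive scenes, the UAV travelling at omega m/s
    starts a new scene every (1 - V) * L metres, i.e. after
    F_f = fc * (1 - V) * L / omega frames. This form is an assumption made for
    this sketch, not the disclosure's exact relation.
    """
    L = h * (math.tan(phi + theta / 2.0) - math.tan(phi - theta / 2.0))
    return max(1, int(round(fc * (1.0 - V) * L / omega)))

# Example: 25 fps camera, UAV at 10 m/s and 60 m altitude,
# 60 degree FOV pitched 30 degrees forward, 50% scene overlap.
Ff = frames_per_scene(fc=25, omega=10.0, h=60.0,
                      theta=math.radians(60), phi=math.radians(30), V=0.5)
```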
[0044] Referring to Figure 3, at block 308, the key frame identification module 216 may be configured to identify key frames from the set of frames of a scene. It is to be noted that, because the erratic camera shake present in the videos is amplified by the speed-up, simple frame sub-sampling coupled with existing video stabilization methods may not work. Most registration methods deal with sequences recorded simultaneously by cameras rigidly attached to each other, which involve a fixed parametric model and a fixed geometric transformation across the frames. The problem may be formulated as some minimization over a small number of parameters related to geometric transformations. However, the frame alignment problem becomes more challenging when such a constraint for spatial mapping is unstable because the input video has been acquired by a non-uniformly moving camera on the UAV 104, i.e., the video does not have a consistent baseline. The key frame identification module 216 herein implements a scene-based key frame estimation method, which uses a frame alignment process to handle the challenging problem of aligning frames obtained from a non-uniformly moving camera in a turbulent environment and estimates the key frames for the scene. First, the key frame identification module 216 addresses the problem of frame alignment, which is formulated as a unique inference problem on a huge number of parameters for a non-fixed geometric transformation. The key frame identification module 216 may integrate the estimation of the spatial parameters into a standard Markov Random Field (MRF), thus restricting the spatial transformation according to the neighborhood. Next, the key frame identification module 216 may assign weight maps for each of the candidate frames to finally estimate the key frames for each scene. The details of the scene-based key frame estimation are explained hereinafter.
[0045] In one embodiment, consider an individual scene Z extracted using the scene classification module 214 as explained above. The key frame identification module 216 may select two frames, namely a reference frame i and any other frame j, from the total number of frames fz in the scene Z without loss of generality. The registration, or spatial alignment, estimates a parameter θi,j of the geometric transformation model which relates the image coordinate systems of such a pair of corresponding frames (i, j). For instance, θi,j would have 8 parameters defining the homography between the pair of corresponding frames. Therefore, the problem of registration deals with estimating a non-parametric and unknown geometric transformation, e.g., a homography or affine transformation. In such a scenario, the key frame identification module 216 assumes a pair of corresponding frames to be recorded at the same position but with a different camera pose, where the principal point, i.e., the origin of the image coordinate system, is at the image center and the focal lengths of the axes are equal. The image coordinates of such a pair are related by a conjugate rotation H = X·R(θi,j)·X-1, where X is diag(f, f, 1), X-1 is the inverse of X, and f is the focal length. The relative pose R(θi,j) is parameterized by the Euler angles θi,j = [θx;i,j θy;i,j θz;i,j], as proposed by F. Diego, D. Ponsa, J. Serrat, and A. Lopez in a publication titled "Video Alignment for Change Detection" (IEEE Trans. on Image Processing, vol. 20, no. 7, pp. 1858–1869, 2011). The spatial alignment estimates the parameter Θ, such that Θ = [θi, 1: fz], where fz denotes the number of frames in the scene Z. Given the scene Z, the scene alignment Θ* is posed as a maximum a posteriori Bayesian inference problem, i.e.,

where Θ* is the most likely spatial mapping between the corresponding frames and τ is the set of all geometric transformation parameters, as proposed by F. Diego, J. Serrat, and A. Lopez in a publication titled "Joint Spatio-Temporal Alignment of Sequences" (IEEE Trans. on Multimedia, vol. 15, no. 6, pp. 1377–1387, 2013). The posterior probability density is decomposed assuming Gibbs distributions, as proposed by S. Geman and D. Geman in a publication titled "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images" (IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721–741, 1984), since the probability density function (PDF) is described as a sum of energy function terms in the distribution. Such decomposition is written as follows:

Where N(Θ), E(Z | Θ) = −log p(Z | Θ), E(Θ), and E(θi,1:fz) = −log p(θi,1:fz) are the partition function, the spatial alignment energy of the scene Z, the initial spatial alignment energy, and the energy of the regularization terms, respectively. Therefore, based on (4), Θ* in (3) is reformulated as follows:

Simplifying, the key frame identification module 216 may evaluate the spatial alignment energy independently from a given pair of corresponding frames, such that,

In order to make the alignment more robust to outliers, the key frame identification module 216 may compute the similarity as the sum of intensity differences evaluated by a robust function δ as follows:

Where Zi(u; θi,j) = Zi(u + v(u; θi,j)) and v(u; θi,j) = φ(u) × θi,j is the quadratic approximation of the geometric transformation between corresponding frames. φ(u) is a matrix which depends only on the pixel coordinates and Zj is a frame in the scene Z. Under the assumptions that these angles are small and the focal length is large enough, Zi(u; θi,j) is approximated using a Taylor series. Therefore, without loss of generality, the key frame identification module 216 may perform the quadratic approximation of the conjugate rotation, as proposed by F. Diego, D. Ponsa, J. Serrat, and A. Lopez in the publication titled "Video Alignment for Change Detection", as follows:

For the smoothness constraint, the spatial regularization term E(Θ) penalizes high variations among successive parameters of the geometric transformation θi,j. The spatial regularization term may therefore be evaluated as follows:

Where λz, Nd, and ρz are, respectively, the parameter that maintains the influence of the regularization term to restrict the smoothness of the geometric transformation among successive frames, the number of variables in the geometric transformation (e.g., 3 parameters in a conjugate rotation), and the function used to evaluate each geometric variable, e.g., the Charbonnier penalty proposed by D. Sun, S. Roth, and M. Black in a publication titled "Secrets of optical flow estimation and their principles" (CVPR, 2010, pp. 2432–2439) or an L-2 penalty function. In an embodiment, the key frame identification module 216 may estimate the alignment parameters Θ by minimizing the following equation:

It is to be noted that the optimization of equation (10) may be non-convex and difficult to minimize, depending on the choice of the robust penalty functions δ(.) and ρz(.). Therefore, instead of finding a global optimum of equation (10), the key frame identification module 216 may perform a simple local optimization by computing the gradient and setting it to zero, as proposed by F. Diego, J. Serrat, and A. Lopez in the publication titled "Joint Spatio-Temporal Alignment of Sequences" (IEEE Trans. on Multimedia, vol. 15, no. 6, pp. 1377–1387, 2013). In the next step, the task is to determine the optimal number of input frames for every individual scene, which generates depth maps with acceptable quality. This is required both for efficiency and to reduce outliers, which might occur in 3-D depth estimation if each pixel were to be chosen from a different frame. The key frame identification module 216 may assign weight maps for each of these candidate frames, such that,

Where Wk,l(x, y) denotes a weight for using frame k for scene l and αmax is an upper threshold above which the key frame identification module 216 considers the quality "good enough". In one embodiment, the value of αmax = 0.6. However, if pixels in a frame have χk,l below the lower threshold αmin = 0.2, then the quality of the frame is too poor and such pixels are not considered in the frame; hence the value of Wk,l is set to 0. It may further be noted that the depth data retrieved from a pair of images fails when their optical axis is parallel to the baseline, i.e., when there is pure forward motion with respect to the camera. The baseline refers to the line joining the optical centres of the two frames under consideration. Hence, it is optimal to use the angular error as a measure to define χ. For a pair of frames from a scene, i.e.:


Where the terms denote the unit direction vectors joining the optical centres of the frames, i.e., cb and ca, to the corresponding pixels on frames b and a which represent the same real point w, respectively. Thus, weight maps are assigned to all the frames of the scene in a pair-wise fashion. In an embodiment, the frames that give the highest quality may be selected as:

In an embodiment, the key frame identification module 216 may keep selecting frames from a scene that give the most improvement over the previously selected subset,

Where an, which contains the previously selected best value for every pixel, is given by:

In an embodiment, the key frame identification module 216 may keep selecting source frames in this manner until the average improvement per pixel falls below a threshold. Moreover, some video frames are poor because of camera-shake-induced motion blur. The key frame identification module 216 may reduce the effect of motion blur by adjusting the weights. Let σk(x, y) be a per-pixel blur measure, which is obtained by low-pass filtering the gradient magnitude of the texture of an image k. The key frame identification module 216 may now replace the weights used above by the following equation:

Where the normalized blur measure is obtained by normalizing σk(x, y). In this process, the key frame identification module 216 may select 3 to 5 source frames on average for every output key frame. While this is done independently for every key frame, it is observed that in practice similar source frames are selected for nearby output key frames, which is important for achieving more coherent and smooth results when estimating depth. After the identification of the key frames, the system 102 may configure the depth determination module 218 to determine the depth map associated with the key frames, as explained in detail below.
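By way of illustration, a minimal Python sketch of the per-pixel weighting described above is given below. The function names, the linear ramp between αmin and αmax, and the multiplicative scaling by the normalized blur measure are assumptions introduced only for this sketch, since the corresponding weighting equations are not reproduced here; only the thresholds αmin = 0.2 and αmax = 0.6 and the low-pass filtered gradient magnitude follow the description.

import numpy as np
from scipy.ndimage import gaussian_filter

ALPHA_MIN, ALPHA_MAX = 0.2, 0.6   # quality thresholds from the description

def weight_map(chi):
    # Pixels with quality below ALPHA_MIN are rejected (weight 0); pixels above
    # ALPHA_MAX are treated as "good enough". The linear ramp in between is assumed.
    return np.clip((chi - ALPHA_MIN) / (ALPHA_MAX - ALPHA_MIN), 0.0, 1.0)

def blur_measure(gray):
    # Per-pixel blur/sharpness measure: low-pass filtered gradient magnitude.
    gy, gx = np.gradient(gray.astype(np.float64))
    return gaussian_filter(np.hypot(gx, gy), sigma=3.0)

def blur_adjusted_weight(chi, gray):
    # Down-weight pixels from motion-blurred (low-gradient) frames; the
    # multiplicative normalization used here is a hypothetical choice.
    sigma = blur_measure(gray)
    sigma_norm = sigma / (sigma.max() + 1e-12)
    return weight_map(chi) * sigma_norm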
[0046] Referring again to figure 3, at block 310, the depth map associated with the scene may be determined by the depth determination module 218. The depth map may be determined as described in detail below:
[0047] In an embodiment, the depth determination module 218 may implement an adaptive semiglobal block matching technique for estimating a disparity image for each scene. First, the depth determination module 218 may process the images block-wise for faster processing. The block size may be locally adaptive: regions with uniform depth may be estimated using larger blocks, while blocks containing edges will have a smaller size. This makes the process faster and helps in edge preservation. Next, the depth determination module 218 may apply a hybrid median filter on the disparity images for smoothening and removing irregularities, such as outliers and peaks. The hybrid median filter also helps in preserving edges. Further, the depth determination module 218 may perform consistency checks to further minimize outliers and false correspondences by exchanging the left image and the right image in a stereo pair. Finally, the depth determination module 218 may fuse the disparity images in order to obtain a final disparity map, or depth map, associated with each scene. The details of the adaptive semiglobal block matching technique for estimating the depth map are explained below.
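For illustration only, the following Python sketch shows one way the locally adaptive block size described above could be chosen, using the mean gradient magnitude inside a candidate block as an edge indicator; the function names and the threshold value are assumptions and not part of the described technique itself.

import numpy as np

def gradient_magnitude(gray):
    # Texture gradient magnitude used as an edge indicator.
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy)

def choose_block_size(mag, y, x, sizes=(8, 4, 2), edge_thresh=20.0):
    # Try the largest block first; accept it if the region looks uniform
    # (low mean gradient), otherwise fall back to smaller blocks near edges.
    for s in sizes:
        patch = mag[y:y + s, x:x + s]
        if patch.size and patch.mean() < edge_thresh:
            return s
    return sizes[-1]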
[0048] After estimating the key stereo images (or the key frames) using the key frame identification module 216, it may be assumed by the depth determination module 218 that there exist a left image and a set of right images for each scene, denoted by IL and IR(k), respectively, where k is the number of key images in a scene. The depth determination module 218 may downscale the image pairs (IL and IR(k)) and the respective disparity image, denoted by Dini, by a value s using a hierarchical approach which recursively down-scales the images to reduce the runtime complexity. Initially, the depth determination module 218 may consider a random disparity image at a resolution of 1/16 of the original. The algorithm operates on blocks, and the radius of the blocks is locally adaptive. The blocks are of different sizes, such as {8x8, 4x4, 2x2}, as shown in Figure 4. Block-wise cost calculation is generally ambiguous, and wrong matches can easily have a lower cost than correct ones due to the presence of noise, especially in remotely sensed aerial images. Therefore, the depth determination module 218 uses, as an additional constraint, the energy function proposed by H. Hirschmuller in the publication titled “Stereo Processing by Semiglobal Matching and Mutual Information” (IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008), which helps to smoothen the result by penalizing changes of neighboring disparities. However, the energy function is modified to operate on blocks rather than over pixels, and it depends on the disparity image D. Next, the depth determination module 218 may perform the stereo matching to find the disparity image that minimizes the energy function.
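For reference, the energy function of the cited semiglobal matching publication has the well-known form below, written here with p ranging over blocks rather than pixels, as described above; P1 and P2 are penalties for small and large disparity changes between neighbouring blocks q in Np, and T[.] equals 1 when its argument is true and 0 otherwise:

E(D) = \sum_{p} \Big( C(p, D_p)
     + \sum_{q \in N_p} P_1 \, T\big[\,|D_p - D_q| = 1\,\big]
     + \sum_{q \in N_p} P_2 \, T\big[\,|D_p - D_q| > 1\,\big] \Big)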
[0049] To solve occlusion problems in the block-wise matching, the depth determination module 218 may perform the block-wise matching for all possible pairs. Let the disparity Dk be the result of matching the left image IL against a right image IR(k). The disparities of the images Dk are scaled differently, according to some factor tk. The factor tk may be linear in the length of the baseline between IL and IR(k) if all images are rectified against each other, i.e., if all images are projected onto a common plane that has the same distance to all optical centers. Thus, the disparities may be normalized as Dk(p)/tk.
[0050] All the disparity images may be filtered by the depth determination module 218 using a hybrid median filter. The filtering helps in smoothening and removing irregularities in the disparity images. The depth determination module 218 may use a two-way hybrid median kernel of size M = 5. Accordingly, two median values (Mr and Md) may be calculated, where Mr is the median of the horizontal and vertical pixels R and Md is the median of the diagonal pixels D. The filter value is the arithmetic mean of the two median values and the central pixel C, i.e., (Mr + Md + C) / 3. The hybrid median filter preserves edges much better than a square-box median filter because it is a three-step ranking operation in which data from different spatial directions are ranked separately.
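A minimal Python sketch of the two-way hybrid median filter described in this paragraph is given below; it follows the description literally (arithmetic mean of Mr, Md, and the central pixel C), and the function name and the edge padding are choices made only for this sketch.

import numpy as np

def hybrid_median_filter(img, M=5):
    # Two-way hybrid median kernel of size M (M = 5 above).
    # Mr: median of the horizontal/vertical ("plus"-shaped) pixels R.
    # Md: median of the diagonal ("x"-shaped) pixels D.
    # Output: arithmetic mean (Mr + Md + C) / 3 with C the central pixel.
    r = M // 2
    pad = np.pad(img.astype(np.float64), r, mode='edge')
    out = np.empty(img.shape, dtype=np.float64)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            cy, cx = y + r, x + r
            plus = np.concatenate([pad[cy, cx - r:cx + r + 1],   # horizontal arm
                                   pad[cy - r:cy + r + 1, cx]])  # vertical arm
            diag = np.array([pad[cy + k, cx + k] for k in range(-r, r + 1)] +
                            [pad[cy + k, cx - k] for k in range(-r, r + 1)])
            mr, md, c = np.median(plus), np.median(diag), pad[cy, cx]
            out[y, x] = (mr + md + c) / 3.0
    return out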
[0051] In an embodiment, to further reduce outliers and false correspondences, the depth determination module 218 may re-run the algorithm by exchanging the left and right images and filter the noise as expressed below:

Where th1 and th2 are positive thresholds, and Dinv, DLp, and DRp represent an invalid disparity, the disparity map of the left image corresponding to block p, and the disparity map of the right image corresponding to block p, respectively. The effective disparity maps of the left and right images and the overall disparity map corresponding to block p are obtained accordingly. Depending on the application, the depth determination module 218 may set the thresholds th1 and th2 to 1 and 3, respectively, to obtain a dense disparity map. The depth determination module 218 may perform a fusion of disparity values by calculating the weighted mean of the disparities using the factors tk as weights. Possible outliers may be discarded by considering only those disparities that are within a 1 pixel interval around the median of all disparity values for a certain block:

Where the median value is taken over i. The depth determination module 218 therefore increases robustness due to the median, as well as accuracy due to the weighted mean. Additionally, if enough match images are available, a certain minimum size of the set Θp may be enforced to increase the reliability of the resulting disparities; blocks that do not fulfill the criteria are set to invalid. The depth determination module 218 may then upscale the disparity map by the scale s. The process is iterative: until the value of s converges to unity, the depth determination module 218 repeats the process recursively to obtain the final disparity map Dfin. The fusion of disparity images, the peak filtering, and the left-right consistency check may invalidate some disparities, which leads to holes in the disparity image. The holes therefore need to be interpolated for a dense result. In order to interpolate the holes, the system 102 first segments the depth map using a color segmentation technique, as described below.
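The following Python sketch illustrates the left-right consistency check and the weighted fusion described above. The exact invalidation rule is not reproduced here, so a standard left-right test is used with th1 = 1 (or th2 = 3 as a looser tolerance); the fusion keeps only disparities within a 1-pixel interval around the per-block median and weights them by the factors tk, as described. The function names and the invalid marker are assumptions.

import numpy as np

def lr_consistency(d_left, d_right, th=1.0, invalid=-1.0):
    # Invalidate disparities that disagree with the right-to-left match
    # by more than th pixels (th1 = 1, th2 = 3 in the description above).
    h, w = d_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    xr = np.clip(np.round(xs - d_left).astype(int), 0, w - 1)
    dr = d_right[np.arange(h)[:, None], xr]
    ok = (d_left != invalid) & (np.abs(d_left - dr) <= th)
    return np.where(ok, d_left, invalid)

def fuse_disparities(disp_stack, t_k, invalid=-1.0):
    # Weighted mean of the normalized disparities using t_k as weights,
    # keeping only values within a 1-pixel interval around the median.
    disp_stack = np.asarray(disp_stack, dtype=np.float64)   # shape (K, H, W)
    t_k = np.asarray(t_k, dtype=np.float64)[:, None, None]
    valid = disp_stack != invalid
    med = np.nanmedian(np.where(valid, disp_stack, np.nan), axis=0)
    keep = valid & (np.abs(disp_stack - med) <= 1.0)
    w = np.where(keep, t_k, 0.0)
    den = w.sum(axis=0)
    return np.where(den > 0,
                    (w * disp_stack).sum(axis=0) / np.maximum(den, 1e-12),
                    invalid)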
[0052] Referring to Figure 3, at block 312, after the determination of the depth map, the depth map may be segmented using the color segmentation technique in order to obtain a segmented depth map. The segmentation is followed by the coordinate plane fitting step, at block 316. In an implementation, the color segmentation and the coordinate plane fitting steps are implemented by the interpolation module 220 as explained in detail below.
[0053] In an embodiment, in order to interpolate the holes in the disparity image, the interpolation module 220 may segment the depth map using the modified mean-shift algorithm proposed by C. Bing, Z. Nanning, W. Ying, Z. Yongping, and Z. Zhihua in the publication titled “Color Image Segmentation Based on Edge-preservation Smoothing and Soft C-means Clustering” (Machine Graphics & Vision International Journal, vol. 11, no. 2/3, pp. 183–194, 2002). The modified mean-shift algorithm comprises (a) mean shift in the spatial dimensions, (b) anisotropic diffusion for edge-preserving smoothing, and (c) joint bilateral filtering of both the intensity and the position of each pixel by replacing them with the weighted average of their neighbors. This preserves the linearity property of power lines. The advantage of the color segmentation is that the segments are more accurate and that occlusions may be better optimized. The color segmentation is followed by the coordinate plane fitting step. In an embodiment, the interpolation module 220 may represent a coordinate plane by the three parameters η, μ, and φ of the following equation:

By using the matching reliability of each pixel as a weight, the normal direction of the disparity plane may be given by the eigenvector belonging to the minimum eigenvalue of the matrix K, such that,

Where wi is the weight of the matching pixel (ui, vi), di is the disparity of the pixel (ui, vi), and J is the number of matching pixels in the region. The above matrix is in accordance with the technique proposed by M. Humenberger, T. Engelke, and W. Kubinger in the publication titled “A census-based stereo vision algorithm using modified Semi-Global Matching and plane fitting to improve matching quality” (Computer Vision and Pattern Recognition Workshops, 2010, pp. 77–84). However, it must be noted that some false matches will be included in the disparity map Dfin. These outliers or holes may then cause false disparity fitting results. Therefore, the system 102 may need to remove the outliers from the disparities before fitting. Specifically, the system 102 may employ the interpolation module 220 to interpolate the holes in the segmented depth map at block 314. The RANSAC algorithm may be used to remove the holes or the outliers. However, it is to be noted that the RANSAC algorithm needs a threshold to distinguish whether a point is an outlier according to the distance of the point from the estimated plane; when a wrong threshold is selected, the fitting result will be incorrect. In order to avoid this limitation, the interpolation module 220 may adopt a more robust method based on a combination of RANSAC and a weighted kernel voting algorithm in order to remove the outliers, explained in detail below.
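By way of illustration, a Python sketch of the weighted plane fitting step is given below, assuming the disparity plane has the form d = ηu + μv + φ and that K is a weighted scatter matrix of the points (ui, vi, di); the exact construction of K is not reproduced above, so both the matrix and the function name are assumptions of this sketch. The plane normal is taken as the eigenvector of the minimum eigenvalue, as described.

import numpy as np

def fit_disparity_plane(u, v, d, w):
    # Weighted fit of the disparity plane d = eta*u + mu*v + phi.
    pts = np.column_stack([u, v, d]).astype(np.float64)
    w = np.asarray(w, dtype=np.float64)
    mean = (w[:, None] * pts).sum(axis=0) / w.sum()
    centered = pts - mean
    # Assumed weighted scatter matrix K of the matching pixels.
    K = (w[:, None, None] * centered[:, :, None] * centered[:, None, :]).sum(axis=0)
    eigvals, eigvecs = np.linalg.eigh(K)     # eigenvalues in ascending order
    n = eigvecs[:, 0]                        # eigenvector of the minimum eigenvalue
    eta, mu = -n[0] / n[2], -n[1] / n[2]     # plane normal -> plane parameters
    phi = mean[2] - eta * mean[0] - mu * mean[1]
    return eta, mu, phi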
[0054] First, the interpolation module 220 may use context-based weighted intensity filtering using a data-driven approach (i.e., the RANSAC algorithm), and then a weighted kernel voting algorithm is implemented to identify the power line corridor (PLC) regions of the scene, rather than using any heuristic solution. The detection of line/edge based spatial variations in the scene is an intrinsic characteristic of the PLC and is important for 3-D modeling, as it helps to align the depth discontinuities with the intensity edges from a stereo pair of images. Therefore, the interpolation module 220 may adopt the context-based weighted intensity filtering using RANSAC on the obtained disparity values to highlight transmission lines and distribution poles,

where αp is a set of pixel blocks in which the transmission lines and distribution poles are detected and lp is the weighting factor. However, over-weighting may highlight the linear components in PLCs but may destroy the background information. The interpolation module 220 may run the RANSAC algorithm only once to obtain an inlier set and the initial values of the parameters η, μ, and φ. The second step is used to collect “votes”. The interpolation module 220 may estimate the plane parameter η of all significant regions by calculating it from a pair of points on a line along the corresponding axis belonging to the region. The interpolation module 220 may build a one-dimensional histogram by a voting operation, where the x-coordinate is η and the y-coordinate is the count of η. Each vote, i.e., each estimated value of η, is weighted by the support of the peaks in the histogram. The interpolation module 220 may further perform a smoothening operation on the histogram using a Gaussian filter. Finally, the maximum of the histogram may be regarded as the final estimate of η. Similarly, the interpolation module 220 may obtain one-dimensional histograms for the estimation of μ and φ by a similar voting operation.
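A minimal Python sketch of the voting step for a single parameter (η, and similarly μ and φ) is given below: each weighted estimate is accumulated into a one-dimensional histogram, the histogram is smoothed with a Gaussian filter, and its maximum is taken as the final estimate. The bin count, the filter sigma, and the function name are assumptions of this sketch.

import numpy as np
from scipy.ndimage import gaussian_filter1d

def vote_parameter(estimates, weights, bins=256, sigma=2.0):
    # Weighted kernel voting for one plane parameter.
    estimates = np.asarray(estimates, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    hist, edges = np.histogram(estimates, bins=bins, weights=weights)
    smoothed = gaussian_filter1d(hist.astype(np.float64), sigma=sigma)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[int(np.argmax(smoothed))]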
[0055] After the interpolation of the holes and the coordinate plane fitting, the system 102 may further detect interferences/infringements in the PLC. In order to detect the interferences/infringements, the PLC and its corresponding planes (hereinafter referred to as the predefined reference plane) may be predefined and known beforehand. More specifically, out of all the planes, the system 102 may know beforehand that the predefined reference plane in the scene indicates the PLC. The PLC may have different components and/or objects, including transmission lines, distribution poles, transmission towers, insulators, etc. Each of these components in a single image lies in a different plane. In accordance with this embodiment, the predefined reference plane, or the PLC plane, indicates the union of all such planes. Similarly, the parts of a tree or a greenery patch will lie in different planes fitted using the plane fitting step explained above.
[0056] Now, again referring to figure 3, at block 318, a minimum distance from a point in the predefined reference plane in the scene to one of a plurality of points in the plane may be computed using the infringement determination module 222. Specifically, the distance from a point (i) in the predefined reference plane (where the PLC lies) to a neighboring plane Пj is the smallest distance from the point to one of the infinite points on the plane. This distance corresponds to the perpendicular line from the point (i) to the coordinate plane. In one embodiment, the distance δ(i, Пj) between the point i = (x0, y0, z0) and the neighboring plane Пj: Ax + By + Cz + D = 0 is

δ(i, Пj) = |Ax0 + By0 + Cz0 + D| / √(A^2 + B^2 + C^2)
[0057] After the computation of the distance, at block 320, the distance may be compared with a predefined threshold δth. Further, at block 322, interference/infringement of the one or more objects with the predefined reference plane may be determined based upon the comparison. In one embodiment, the interference/infringement may be determined when the minimum distance is less than the predefined threshold δth. In an embodiment, assume that the infringement determination module 222 divides the PLC plane into patches containing q points. In this embodiment, if the minimum distance of g points out of the q points of a patch k to the neighboring plane j is less than δth, then that patch may have some infringement, i.e., patch k may be in a danger zone indicative of causing interference to the PLC plane. In this embodiment, the infringement determination in the PLC is based upon a probability of patch k being in the danger zone with respect to plane j, a probability of plane j becoming an infringement, and a probability of the expected value of patch k being in the danger zone. Each of these probabilities may be computed using the equations below:
The probability of the patch k to be in the danger zone with respect to the plane j is given by:

Further, the probability of plane j to become an infringement is given by:

Where,

Further, the probability of the expected value of the patch k being in the danger zone is given by:

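The threshold test underlying the probabilities above can be sketched in Python as follows; the probability equations themselves are not reproduced here, so only the point-to-plane distance and the g-out-of-q danger-zone test are shown, and the function names are illustrative.

import numpy as np

def point_plane_distance(p, plane):
    # Perpendicular distance from point p = (x0, y0, z0) to Ax + By + Cz + D = 0.
    A, B, C, D = plane
    x0, y0, z0 = p
    return abs(A * x0 + B * y0 + C * z0 + D) / np.sqrt(A * A + B * B + C * C)

def patch_in_danger_zone(patch_points, plane, delta_th, g):
    # Patch k is flagged when at least g of its q points lie closer to the
    # neighbouring plane j than the threshold delta_th.
    dists = np.array([point_plane_distance(p, plane) for p in patch_points])
    return int((dists < delta_th).sum()) >= g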
[0058] Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include those provided by the following features.
[0059] Some embodiments enable estimating key frames for every scene using a Markov Random Field and weight maps, thereby handling the problem of aligning frames obtained from a non-uniformly moving camera in a turbulent environment.
[0060] Some embodiments enable minimizing the effects of camera shake-induced motion blur and a non-consistent baseline, which are unavoidable in UAV-based aerial imaging.
[0061] Some embodiments enable estimating a rectified disparity map from the extracted key frames using an adaptive semiglobal block matching technique. The blocks are locally adaptive. The adaptive semiglobal block matching technique preserves and accentuates the edges of linear structures, and it smoothens and removes irregularities in the disparity maps. A depth map is obtained after a consistency check by the fusion of disparity maps.
[0062] Some embodiments enable detection of infringements based on data extracted from the depth maps and different contextual information, including the height of power towers, the thickness of wires, etc.
[0063] Although implementations for methods and systems for 3D modeling of one or more videos have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for the 3D modeling of the one or more videos.

Documents

Orders

Section Controller Decision Date

Application Documents

# Name Date
1 975-MUM-2015-FORM 26-(27-04-2015).pdf 2015-04-27
2 975-MUM-2015-CORRESPONDENCE-(27-04-2015).pdf 2015-04-27
3 Form 3.pdf 2018-08-11
4 Form 2.pdf 2018-08-11
5 Figure of Abstract.jpg 2018-08-11
6 Drawing.pdf 2018-08-11
7 975-MUM-2015-FORM 1(2-7-2015).pdf 2018-08-11
8 975-MUM-2015-CORREPONDENCE(2-7-2015).pdf 2018-08-11
9 975-MUM-2015-Defence-20-09-2021.pdf 2021-09-20
10 975-MUM-2015-FER.pdf 2021-10-18
11 975-MUM-2015-CLAIMS [24-12-2021(online)].pdf 2021-12-24
12 975-MUM-2015-COMPLETE SPECIFICATION [24-12-2021(online)].pdf 2021-12-24
13 975-MUM-2015-FER_SER_REPLY [24-12-2021(online)].pdf 2021-12-24
14 975-MUM-2015-OTHERS [24-12-2021(online)].pdf 2021-12-24
15 975-MUM-2015-DEFENCE REPLY-28-03-2023.pdf 2023-03-28
16 975-MUM-2015-US(14)-HearingNotice-(HearingDate-28-02-2024).pdf 2024-02-05
17 975-MUM-2015-FORM-26 [27-02-2024(online)].pdf 2024-02-27
18 975-MUM-2015-FORM-26 [27-02-2024(online)]-1.pdf 2024-02-27
19 975-MUM-2015-Correspondence to notify the Controller [27-02-2024(online)].pdf 2024-02-27
20 975-MUM-2015-Written submissions and relevant documents [14-03-2024(online)].pdf 2024-03-14
21 975-MUM-2015-PatentCertificate19-03-2024.pdf 2024-03-19
22 975-MUM-2015-IntimationOfGrant19-03-2024.pdf 2024-03-19

Search Strategy

1 SearchHistoryE_29-09-2021.pdf

ERegister / Renewals

3rd: 19 Jun 2024 (From 24/03/2017 - To 24/03/2018)
4th: 19 Jun 2024 (From 24/03/2018 - To 24/03/2019)
5th: 19 Jun 2024 (From 24/03/2019 - To 24/03/2020)
6th: 19 Jun 2024 (From 24/03/2020 - To 24/03/2021)
7th: 19 Jun 2024 (From 24/03/2021 - To 24/03/2022)
8th: 19 Jun 2024 (From 24/03/2022 - To 24/03/2023)
9th: 19 Jun 2024 (From 24/03/2023 - To 24/03/2024)
10th: 19 Jun 2024 (From 24/03/2024 - To 24/03/2025)
11th: 06 Mar 2025 (From 24/03/2025 - To 24/03/2026)