Abstract: Disclosed herein is a system and method for monitoring burning forest status through lightweight hybrid deep learning based aerial visuals analysis, which helps a forest surveillance team easily visualize smoke-free images to estimate the actual damage to wildlife and plants due to fire incidents and take corrective measures. The system comprises a drone mounted sensor module (100), a server (200), and a handheld user device (300) communicatively linked with one another. The drone mounted sensor module (100) acquires input data comprising aerial visuals together with thermal and infrared radiation data of a target forest (F). A visual segmentation module (400) embedded in the drone sensor module (100) performs pixel-wise segmentation on the input visuals (I) based on semantic feature extraction via a convolutional neural network to output smoke and fire binary masks. A visual refinement module (500) is configured to apply an auto encoder-decoder based inpainting function to extract contextual features using the binary masks, followed by restoration of missing pixels based on the contextual features, and residual correction with edge enhancement and noise suppression in the visual frames to obtain refined visual frames. A pixel compensation network (600) is configured to bring structural and textural consistency to the final visual status (VS) displayed on the user device (300) in real-time. Fig. 1
Description:
FIELD OF THE INVENTION
The present invention broadly relates to forest image/video monitoring. More specifically, the present invention relates to a system and method for monitoring burning forest status through lightweight hybrid deep learning based aerial visuals analysis, which helps a forest surveillance authority easily visualize and estimate the actual damage to wildlife and plants caused by fire incidents and accordingly take corrective measures. The hybrid deep learning models are trained to transform smoky and hazy visuals into smoke-free and haze-free visuals in real-time.
BACKGROUND OF THE INVENTION
Forest fires, also known as wildfires, are caused both by natural factors such as lightning and by human activities such as carelessness or arson. They spread uncontrollably through natural vegetation such as plants, grasslands, and brushlands that fall in their path, along with the animals living there. They can be devastating, causing significant environmental damage, property destruction, and even loss of life. The wind spreads the fire rapidly, causing significant air pollution. Generally, fires that persist longer or burn more intensely are driven by climatic changes. High atmospheric temperatures and dryness (low humidity), and sometimes lightning, offer favourable circumstances for a fire to start. Burning of forests can impact the economy, as many families and communities depend on the forest for food, fodder, and fuel. It burns down small shrubs and grasses, leading to landslides and soil erosion. It causes smoke and poisonous gas emissions that result in significant health issues in humans. Forest fire further causes imbalances in nature and endangers biodiversity by reducing faunal and floral wealth. Therefore, forest departments (government agencies) and other environmental protection groups always keep a vigilant eye to protect forest resources from any fire outbreak situation.
The existing forest fire surveillance approaches include manual on-ground inspection, remote video monitoring, satellite remote sensing, and unmanned aerial vehicle (UAV) patrol. In the manual inspection mode, many fires cannot be discovered early owing to a watchman's (security personnel's) negligence or mistake, the fire-extinguishing response is delayed, and serious consequences follow. The remote video monitoring mode employs a large number of cameras installed at different locations in the forest, and real-time pictures are transmitted to a monitoring centre through a wired or wireless network, but such images very often lack visual clarity due to various environmental factors. The satellite remote sensing mode discovers forest fires by processing remote sensing photos, but a satellite can only discover forest fires over a large area and cannot discover a forest fire in its early stage. Conventional fire sensors, on the other hand, are not feasible for monitoring an area as wide as a forest.
During UAV patrolling, drones are deployed over the target forest regions to capture aerial videography, which is visually monitored at distant ground stations. UAV air patrol is comparatively superior, offering good adaptability and real-time performance. However, drone-captured videos/images often show a fire sign/indication even when there is no actual fire, since noise (moving objects, reflected light, artificial light, drone motion, etc.) and fire-like information (smoke, flame, fog, environmental-pollution-related haziness) are not properly removed in the course of video/image processing and analysis. Further, it becomes quite challenging to locate the origin of a fire outbreak and estimate the actual damage to forest resources. Therefore, a few researchers have explored various image/video processing techniques to obtain visual clarity with reliable information that can help forest/environmental departments quickly assess and mitigate forest-fire emergency situations.
A reference may be made to CN104834920A that discloses an intelligent forest fire recognition method and device based on a multispectral image of an unmanned plane, in which Deep Belief Networks made of limited Boltzmann machine network of multilayer and one deck counter propagation network is trained to detect smoke/flame regions in the images.
Another reference may be made to CN108416963B that discloses a forest fire early warning method and system, in which the UAV-captured images are processed by selective search and R-CNN (region-based convolutional neural network) techniques.
One more reference may be made to US20210049885A1 that discloses a fire detection method and system, in which the video images of target regions are converted into a preset YCbCr colour space (Y: luminance, CbCr: colour-difference signals, defined by a sampling-frequency ratio of the luminance and colour-difference signals), the harmful illuminance effects in the YCbCr colour space are removed, and the smoke area is detected by applying a random forest classification technique (classification and regression trees) to the video image.
Therefore, in view of the above limitations of the conventional/existing forest fire surveillance approaches, techniques, devices and methods, there exists a need to develop an improved fire dehazing and de-smoking technique which would in turn address a variety of issues including, but not limited to, inherent noise and smoke/haziness found in the aerial videos/images, finding exact fire origin, underlying damage assessment, and software-hardware incompatibility. Moreover, it is desired to develop a system and method for monitoring burning forest status through lightweight hybrid deep learning based aerial visuals analysis, that can transform smoky and hazy visuals to smoke and haze free visuals in real-time, which includes all the advantages of the conventional/existing techniques/methodologies and overcomes the deficiencies of such techniques/methodologies.
OBJECT OF THE INVENTION
It is an object of the present invention to remotely monitor drone aerial visuals of forest fire incidents through handheld computing devices (e.g., smartphone, tablet, laptop, etc.).
It is another object of the present invention to accurately locate exact fire breakout regions/places and perform damage assessment by removing unwanted noise and fire-like irrelevant information (smoke, flame, fog, similar environmental-pollution-related haziness, etc.) from the aerial visuals.
It is one more object of the present invention to develop a unique advanced fire dehazing and de-smoking technique to process and analyse raw burning forest visuals in real-time.
It is a further object of the present invention to provide a reliable, safe, and cost-effective system and method for monitoring of burning forest status so that the necessary actions can be taken immediately to mitigate the fire spread, thus saving forest resources and wildlife.
SUMMARY OF THE INVENTION
In one aspect, the present invention provides a system for monitoring burning forest status through lightweight hybrid deep learning based aerial visuals analysis, which helps a forest surveillance team easily assess and estimate the actual damage to wildlife and plants due to fire incidents and take corrective measures. The system comprises a drone mounted sensor module, a server, and a handheld user device communicatively linked with one another. The drone mounted sensor module acquires input data comprising aerial visuals together with thermal and infrared radiation data of a target forest. The system further deploys a hybrid deep learning architecture made of a visual segmentation module, a visual refinement module, and a pixel compensation network. The visual segmentation module is embedded in the drone sensor module to perform pixel-wise segmentation on the input visuals based on semantic feature extraction via a convolutional neural network to output smoke and fire binary masks. The visual refinement module is configured to apply an auto encoder-decoder based inpainting function to extract contextual features using the binary masks, followed by restoration of missing pixels based on the contextual features, and residual correction with edge enhancement and noise suppression in the visual frames to obtain refined visual frames. The pixel compensation network is configured to compute the thickness of dense smoke regions in the refined visual frames, followed by estimation of residue components associated with missing pixel values of the dense smoke regions having a thickness greater than a threshold value, and addition of the residue components to the binary mask outputs and the refined visual frame outputs to bring structural and textural consistency to the final visual status. The final visual status (smoke-free images) is displayed on the user device in real-time.
Other aspects, advantages, and salient features of the present invention will become apparent to those skilled in the art from the following detailed description, which delineates the present invention in different embodiments.
BRIEF DESCRIPTION OF DRAWINGS
These and other features, aspects, and advantages of the present invention will become better understood when the following detailed description is read with reference to the accompanying figures.
Fig. 1 illustrates various hardware components employed in the system for visual inspection of fire affected forests, in accordance with an embodiment of the present invention.
Fig. 2 illustrates various modules used in transforming smoky visuals into smoke/haze free visuals, in accordance with an embodiment of the present invention.
Fig. 3 illustrates various method steps for visual inspection of fire affected forests, in accordance with an embodiment of the present invention.
Fig. 4 illustrates a CNN architecture used for visual pixel segmentation, in accordance with an embodiment of the present invention.
Fig. 5 illustrates visual data processing through various deep learning modules, in accordance with an embodiment of the present invention.
List of reference numerals
100 drone mounted sensor module
102 180-degree camera
104 thermal sensor
106 infrared sensor
108 microprocessor
200 server
300 handheld user device
400 visual segmentation module (CNN)
500 visual refinement module (auto encoder-decoder based inpainting function)
600 pixel compensation network module
F target forest
I input data
VS output visual status
DETAILED DESCRIPTION OF THE INVENTION
Various embodiments described herein are intended only for illustrative purposes and subject to many variations. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but are intended to cover the application or implementation without departing from the scope of the present invention. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting.
The use of the terms “includes,” “comprises,” or “having” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. The terms “an” and “a” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items. The terms “at least one” and “one or more” herein are used to indicate the minimum number of components/features that must essentially be present in the invention. The term ‘smoke/smoky’ used herein refers to smoke, flame, fume, vapour, fog, and fire- or environmental-pollution-related haziness, etc., which are usually visible in the atmosphere over any fire/burning incident.
According to an embodiment of the present invention, as shown in Fig. 1-2, the system for monitoring of burning forest status is depicted. The system comprises a drone mounted sensor module (100), a server (200), and a handheld user device (300) communicatively linked with one another. The drone mounted sensor module (100) includes a 180-degree camera (102) for capturing the aerial visuals of a target forest (F), a thermal sensor (104) for measuring the temperatures of the target forest (F), an infrared sensor (106) for measuring the infrared radiation emitted from the target forest (F), and a microprocessor (108) for performing the input data segmentation. The handheld user device (300) is a portable computing device (such as a smartphone, tablet, laptop, or computer) having a processor, a memory, and a display. The server is either a ground control station server or a cloud server. The system further deploys a hybrid deep learning architecture consisting of a visual segmentation module (400), a visual refinement module (500), and a pixel compensation network module (600), where each module employs one or more uniquely designed machine/deep learning tools to carry out specific operations in specific capacities to transform smoky visual inputs (I) into high-quality smoke-free output visual status (VS) in real-time. The visual segmentation module (400) is embedded in the microprocessor (108) of the drone sensor module (100). The visual refinement module (500) and the pixel compensation network (600) are embedded in the server (200) or in the processor of the handheld user device (300).
According to an embodiment of the present invention, the drone mounted sensor module (100) acquires data comprising aerial visuals together with thermal and infrared radiation data of the target forest (F). The visual frames captured from different angles are meticulously stitched together, providing a comprehensive view crucial for total situational awareness. This comprehensive data acquisition is crucial for predicting and analysing incidents such as the fire's behaviour, planning evacuation routes, and deploying firefighting efforts effectively. By leveraging such sophisticated sensor technology, the drone provides a robust tool for emergency responders, enabling faster, safer, and more informed decision-making during fire-related emergencies. The acquired data are initially segmented in the drone processor using the visual segmentation module (400). The segmented image data are wirelessly streamed from the drone (100) to the server (200) or directly to the handheld user device (300) over a link optimized for low latency, ensuring the footage is relayed in real-time. The output of the visual segmentation module (400) is fed into the visual refinement module (500) for smoke removal with edge enhancement and noise suppression in the visual frames. The outputs of the visual segmentation module (400) and the visual refinement module (500) are fed into the pixel compensation network module (600) for image reconstruction with fine-tuning of structural and textural consistency in the final visual frames. The final visual frames are displayed on the handheld user device (300) in real-time.
According to an embodiment of the present invention, as shown in Fig. 2-3, the method for monitoring of burning forest status is depicted. The method employs a drone mounted sensor module (100), a server (200), and a handheld user device (300) communicatively linked with one another, and a hybrid deep learning architecture consisting of a visual segmentation module (400), a visual refinement module (500), and a pixel compensation network module (600). The method comprises steps of: acquiring (S1) aerial visuals with thermal and infrared radiation data of a target forest as input data by the drone mounted sensor module (100); configuring (S2) the handheld user device (300) in communication with the drone mounted sensor module (100) via the server (200) to display burning forest visual status in real-time; performing (S3) pixel-wise segmentation of visual frames by the visual segmentation module (400) to output smoke and fire binary masks; applying (S4) an auto encoder-decoder based inpainting function by the visual refinement module (500) to obtain refined visual frames from the binary masks; and deploying (S5) a contrast attenuation function on outputs of the binary masks and the refined visual frames by the pixel compensation network module (600) to bring structural and textural consistency in the final visual status.
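By way of a non-limiting illustration, the overall S1-S5 flow may be sketched in Python as below. The module interfaces (segmenter, refiner, compensator) are hypothetical placeholders for the visual segmentation module (400), the visual refinement module (500), and the pixel compensation network module (600) described in the following paragraphs; this is a sketch, not the claimed implementation.

```python
# Minimal pipeline sketch (hypothetical interfaces, not the claimed implementation).
def monitor_burning_forest(frame, thermal_map, ir_map,
                           segmenter, refiner, compensator):
    # S3: pixel-wise segmentation into smoke/fire binary masks (runs on the drone)
    m_smoke, m_fire, features = segmenter(frame, thermal_map, ir_map)
    # S4: encoder-decoder inpainting with residual correction (server or handheld)
    refined = refiner(features, m_fire, m_smoke)
    # S5: contrast-attenuation based compensation of dense smoke regions
    visual_status = compensator(refined, m_fire, m_smoke)
    return visual_status  # final visual status (VS) shown on the user device (300)
```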
According to an embodiment of the present invention, as shown in Fig. 4, the visual segmentation module (400) is built on a lightweight convolutional neural network (CNN) model architecture comprising an initial (feature extraction) block, three sequential inverted residual blocks, a final (feature extraction) block, and a segmentation head. The initial block consists of a convolutional layer followed by a normalization and an activation function. Each residual block consists of an expansion convolution layer, a first normalization, and a first activation function, followed by a depth-wise convolution, a second normalization, and a second activation function. The final block consists of a convolution layer, a sequence of inverted residual layers, and a global average pooling layer. As the input data pass through all the said blocks/layers, low- and high-level semantic features are extracted from the raw visual frames and from the temperature and infrared radiation data. The segmentation head performs pixel-wise segmentation to compute a smoke probability score and a fire probability score using the extracted semantic features, thereby resulting in a smoke binary mask and a fire binary mask.
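A minimal PyTorch sketch of such a lightweight segmentation backbone is given below for illustration only. The channel counts, the five-channel input (RGB plus thermal and infrared maps), and the 1x1 projection layer inside each residual block are assumptions; the global average pooling layer of the final block is omitted here so that the per-pixel map needed by the segmentation head is preserved, and both heads use a sigmoid for simplicity even though equation 2 below uses a ReLU6 activation for the fire score.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expansion conv -> norm -> activation -> depth-wise conv -> norm -> activation,
    with an assumed 1x1 projection so that a residual skip connection can be applied."""
    def __init__(self, ch, expand=4):
        super().__init__()
        hidden = ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False),            # expansion convolution
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),   # first norm + activation
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),             # depth-wise convolution
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),   # second norm + activation
            nn.Conv2d(hidden, ch, 1, bias=False),             # projection (assumption)
        )

    def forward(self, x):
        return x + self.block(x)

class SegmentationCNN(nn.Module):
    def __init__(self, in_ch=5, width=16):                    # RGB + thermal + IR (assumed)
        super().__init__()
        self.initial = nn.Sequential(                          # initial feature-extraction block
            nn.Conv2d(in_ch, width, 3, padding=1, bias=False),
            nn.BatchNorm2d(width), nn.ReLU6(inplace=True))
        self.residuals = nn.Sequential(*[InvertedResidual(width) for _ in range(3)])
        self.final = nn.Sequential(                            # final block (pooling omitted)
            nn.Conv2d(width, width, 3, padding=1, bias=False),
            InvertedResidual(width))
        self.head = nn.Conv2d(width, 2, 1)                     # segmentation head: smoke, fire

    def forward(self, x):
        features = self.final(self.residuals(self.initial(x)))
        scores = torch.sigmoid(self.head(features))            # per-pixel probability maps
        return scores[:, 0:1], scores[:, 1:2], features        # P_smoke, P_fire, features A
```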
The smoke detection relies on the aerial image/video frame data captured by the drone camera. The smoke probability score [Psmoke(x,y)], indicating the likelihood of each pixel being part of the smoke region, is computed using the extracted feature vector F(x,y) at the corresponding pixel (x,y), a smoke learned weight (Ws), a bias term (bs), and a sigmoid activation function (σ), as expressed in equation 1.
Psmoke(x,y)=σ(Ws∗F(x,y)+bs) equation 1
The fire detection relies on a combination/fusion of the thermal and infrared sensor data captured by the drone. The thermal sensor outputs a 2D temperature map where each pixel [Sthermal(x,y)] corresponds to an absolute or normalized temperature. The infrared sensor produces an intensity map [Sinfrared(x,y)] that reflects emitted or reflected heat energy, helping to distinguish fire from other heat-emitting objects. The CNN model learns a pixel-wise heat intensity mapping function, where heat intensity refers to a fused measure of the absolute temperature and the emitted radiation from the thermal and IR sensors. This composite heat signal enables the model to assign a fire probability score [Pfire(x,y)] indicating the likelihood of each pixel being part of the fire region, which is computed using the temperature map value [Sthermal(x,y)] and the infrared intensity map value [Sinfrared(x,y)] at the corresponding pixel (x,y), a bias term (bf), thermal and infrared learned weights (Wf, Wi), and a rectified linear unit (ReLU6) activation function (σ′), as expressed in equation 2.
Pfire(x,y)=σ′(Wf∗Sthermal(x,y)+Wi∗Sinfrared(x,y)+bf) equation 2
The probability scores are compared against a predefined threshold value (for example, 0.65). If the smoke probability score exceeds the predefined threshold value of 0.65, the visual segmentation module (400) classifies the corresponding pixel region as a smoke (non-fire) region. If the fire probability score exceeds the predefined threshold value of 0.65, the visual segmentation module (400) classifies the corresponding pixel region as a fire region. Based on the pixel-wise segmentation and the computed probability scores, the visual segmentation module (400) (CNN model) outputs the binary fire mask (Mfire) and the binary smoke mask (Msmoke).
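For illustration, equations 1 and 2 and the threshold rule can be expressed as the following NumPy sketch. Two points are assumptions rather than statements from the specification: the ReLU6 output is divided by 6 so it can be compared against the 0.65 threshold, and fire takes precedence over smoke where both scores exceed the threshold.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu6(x):
    return np.clip(x, 0.0, 6.0)

def pixel_scores(F, S_thermal, S_infrared, Ws, bs, Wf, Wi, bf):
    """F: HxWxC feature map; S_thermal, S_infrared: HxW sensor maps; weights are learned."""
    # Equation 1: P_smoke(x, y) = sigma(Ws * F(x, y) + bs)
    p_smoke = sigmoid(np.tensordot(F, Ws, axes=([-1], [0])) + bs)
    # Equation 2: P_fire(x, y) = sigma'(Wf * S_thermal + Wi * S_infrared + bf), scaled to [0, 1]
    p_fire = relu6(Wf * S_thermal + Wi * S_infrared + bf) / 6.0
    return p_smoke, p_fire

def binary_masks(p_smoke, p_fire, threshold=0.65):
    m_fire = (p_fire > threshold).astype(np.uint8)                      # fire region
    m_smoke = ((p_smoke > threshold) & (m_fire == 0)).astype(np.uint8)  # smoke (non-fire) region
    return m_smoke, m_fire
```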
The key advantages of employing the said lightweight yet highly accurate smoke and fire segmentation (CNN) model are that it achieves real-time segmentation (<25 ms per frame) on the drone hardware, making it suitable for edge deployment. The inverted residual blocks and squeeze-excite mechanisms of the segmentation model reduce redundant feature computations while preserving critical information. The segmentation model is optimized for on-drone inference using quantization (INT8/FP16) and pruning techniques. Further, instead of transmitting full-resolution frames to the handheld user device, only the binary masks (Mfire, Msmoke) are wirelessly sent using a low-latency communication protocol. This drastically reduces the data transmission overhead, allowing the handheld user device to perform the subsequent operations efficiently.
According to an embodiment of the present invention, as shown in Fig. 5, the outputs of the visual segmentation module (400), i.e., the binary masks (Mfire, Msmoke) and the learned features (A), are transmitted to the visual refinement network/module (500), which is responsible for reconstructing smoke-free and fire-free frames. Once the binary segmentation masks (Mfire, Msmoke) are received from the drone (100), the handheld user device (300) applies the visual refinement network (500) to intelligently fill the occluded regions. An auto encoder-decoder based inpainting function is applied to extract contextual features from the surrounding clean pixel regions in the visual frames using the segmented binary masks, following a patch-based texture synthesis strategy. This strategy examines adjacent pixels around the masked region and infers what the missing content should look like by learning texture continuity, edge gradients, and structure propagation. Rather than regenerating the whole frame, the model processes smaller image patches, making the inference faster, reducing computational operations, and ensuring high-quality inpainting. The restored frame after inpainting (I) is computed using equation 3.
I =Ginpaint(A,Mfire,Msmoke) equation 3
where ‘Ginpaint’ is the pixel reconstruction generator network responsible for the reconstruction, and ‘A’ is the learned contextual features.
To enhance blending accuracy, lightweight attention mechanisms are incorporated to ensure that the inpainted regions match the surrounding areas in both texture and intensity. Unlike standard convolution-based inpainting, which often introduces visible artifacts, the attention module selectively weighs the most relevant spatial features to generate coherent, seamless reconstructions. This ensures that sharp edges and natural textures are maintained, particularly in high-contrast regions where fire previously existed.
Following the inpainting step, a shallow residual learning module is applied for edge enhancement and noise suppression, ensuring that the reconstructed image/video maintains high visual quality. The residual learning module is trained to capture the fine-grained feature details that the initial inpainting may miss. The final output frame (Ifinal) after applying residual learning is computed using equation 4.
Ifinal=I+Rresidual(I) equation 4
where ‘I’ is the coarse inpainted frame output from the generator Ginpaint, and Rresidual(I) is the residual correction map that adds texture sharpness and enhances structural boundaries, thus refining object contours and improving visual continuity.
To suppress reconstruction noise and further improve clarity in complex regions (e.g., where both smoke and fire exist), a filtering layer is added post-residual enhancement. This filtering layer employs a lightweight guided bilateral filter that smooths flat regions while preserving edges. Mathematically, the smoothed frame Ismooth is represented in equation 5.
Ismooth=Bilateral(Ifinal,G) equation 5
where G is the guidance map generated using edge intensity features.
This filtering ensures that high-contrast areas retain definition during smoothing. As a result, the final smoke-free and fire-free image/video maintains not only structural consistency but also temporal stability, essential for real-time applications. The lightweight nature of the residual and filtering layers ensures that all operations remain feasible for the handheld device without introducing latency. The output video is then streamed in real-time to the handheld device, offering emergency responders a clear, unobstructed view of the hazardous area. This clear visualization is crucial for accurate assessment, navigation, and decision-making in dynamic fire scenarios.
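The refinement chain of equations 3, 4, and 5 can be summarized by the short sketch below. Ginpaint and Rresidual are assumed to be already-trained networks passed in as callables, and a plain OpenCV bilateral filter is used as a stand-in for the guided bilateral filter described above; this is an illustrative sketch, not the claimed implementation.

```python
import numpy as np
import cv2

def refine_frame(features_a, m_fire, m_smoke, g_inpaint, r_residual):
    """Sketch of equations 3-5 under the stated assumptions."""
    # Equation 3: coarse inpainted frame from the reconstruction generator
    coarse = g_inpaint(features_a, m_fire, m_smoke)
    # Equation 4: add the residual correction map for edge/texture enhancement
    corrected = np.clip(coarse + r_residual(coarse), 0.0, 1.0).astype(np.float32)
    # Equation 5 (stand-in): edge-preserving smoothing; the specification drives this
    # step with a guidance map G built from edge-intensity features
    smoothed = cv2.bilateralFilter(corrected, d=5, sigmaColor=0.1, sigmaSpace=5)
    return smoothed
```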
According to an embodiment of the present invention, as shown in Fig. 5, the outputs of the visual segmentation module (400) and the visual refinement network/module (500) are transmitted to the pixel compensation network module (600), which handles residual smoke and missing content in areas obscured by thick/dense smoke. Dense smoke regions tend to have low contrast (negligible difference in brightness and colour between the smoke and the background, causing edges and object boundaries within the affected regions to appear blurred) and a high degree of opacity (obscuring most of the visual content behind it), making segmentation challenging. The thickness of the smoke is estimated using a contrast attenuation function based on local image gradients and intensity variance; this generates a binary mask identifying the regions affected by dense smoke. This binary mask ensures that the network focuses only on areas needing reconstruction, leaving the already-clear parts untouched. Once these regions are identified, the residue estimation module predicts the missing pixel values using an encoder-decoder architecture with skip connections. The residue component is added to the segmented binary mask outputs and the refined visual frame outputs to bring structural and textural consistency to the final visual status (fully restored image/video). This combination of thick-smoke binary masking and residue estimation ensures that even the most challenging areas are accurately reconstructed, making the final image/video both smoke-free and visually complete. The pixel compensation network ensures that highly occluded areas are reconstructed with enhanced detail and natural transitions, effectively eliminating artifacts, colour mismatches, and unnatural textures. The final restored image/video significantly improves visibility, ensuring that the processed output maintains high visual fidelity and seamless integration of the inpainted regions with real-world elements.
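One possible reading of this step is sketched below. The exact contrast attenuation formula is not given in the text, so the combination of local gradient magnitude and local intensity variance, the window size, and the 0.6 thickness threshold are illustrative assumptions, and residue_net stands for the assumed encoder-decoder with skip connections.

```python
import numpy as np
import cv2

def dense_smoke_mask(gray, thickness_thresh=0.6, win=15):
    """Estimate smoke thickness from contrast attenuation: low local gradient energy
    and low local intensity variance are treated as cues for thick smoke (assumed form)."""
    gray = gray.astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    grad = cv2.blur(np.abs(gx) + np.abs(gy), (win, win))                    # local gradient magnitude
    mean = cv2.blur(gray, (win, win))
    var = np.maximum(cv2.blur(gray * gray, (win, win)) - mean * mean, 0.0)  # local intensity variance
    thickness = 1.0 - np.minimum(grad / (grad.max() + 1e-6), var / (var.max() + 1e-6))
    return (thickness > thickness_thresh).astype(np.uint8)                  # 1 = dense smoke region

def compensate(refined, gray, residue_net):
    """Add the predicted residue only inside the dense-smoke mask."""
    m_dense = dense_smoke_mask(gray)
    residue = residue_net(refined, m_dense)              # assumed trained encoder-decoder
    return refined + residue * m_dense[..., None]        # already-clear areas stay untouched
```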
According to an embodiment of the present invention, the primary communication link between the drone and the Ground Control Station (GCS) server is established using low-power, long-range wireless technologies suited for forest environments, where cellular connectivity is often unavailable. To transmit critical information such as the binary segmentation masks (Mfire and Msmoke), thermal and infrared sensor data, and GPS telemetry, the drone can utilize LoRa (Long Range) radios or sub-GHz ISM band radios, which are well suited for transmitting low-bandwidth data over several kilometres. For higher-bandwidth requirements such as image patches or live low-resolution video snippets, directional Wi-Fi (802.11ac/n) links are used within line-of-sight range, typically up to 1-2 km. If both these options are infeasible due to terrain or weather, the drone may optionally relay data via satellite-based systems such as Starlink or other LEO satellite services, especially when continuous operation is needed in deep forest regions.
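Purely as an illustrative policy (the payload categories and fallback order below are assumptions, not part of the specification), the link selection described above could be expressed as:

```python
def select_downlink(payload_kind, lora_available, wifi_los_available, satellite_available):
    """Pick a drone-to-GCS link for a given payload type (illustrative sketch only)."""
    low_bandwidth = {"binary_masks", "gps_telemetry", "thermal_ir_summary"}
    if payload_kind in low_bandwidth and lora_available:
        return "LoRa / sub-GHz ISM radio"         # several kilometres, low bandwidth
    if wifi_los_available:
        return "directional Wi-Fi 802.11ac/n"     # line of sight, roughly 1-2 km
    if satellite_available:
        return "LEO satellite relay"              # deep-forest fallback
    return "store-and-forward"                    # buffer until a link becomes available
```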
According to an embodiment of the present invention, the GCS acts as a bridge between the drone and the handheld device carried by emergency response personnel. A local wireless hotspot is created by the GCS using a rugged LTE router (when cellular is available) or a Wi-Fi access point. This allows the handheld device to connect locally and receive segmentation data, sensor fusion outputs, and other metadata in real time. In scenarios with no internet access, the GCS uses offline local networking to maintain direct communication with handheld devices via Wi-Fi Direct. The GCS does not host the application logic itself—it simply forwards received data from the drone to the app running on the handheld device. This setup ensures minimum latency and a decentralized operation model where each handheld unit can operate independently as long as it is within wireless range of the GCS.
According to an embodiment of the present invention, the deep learning architectures/frameworks as used in the visual refinement network (auto encoder-decoder based inpainting) and the pixel compensation network are combined and converted into a lightweight, cross-platform application hosted directly on the handheld user device. The application is designed for on-device execution, ensuring the user can operate independently even when disconnected from central servers or the internet. The application is optimized for edge processing and minimal power consumption, making it suitable for field operatives in forest fire scenarios. It supports on-device TensorFlow Lite or ONNX Runtime inference engines to apply the autoencoder-based inpainting model in real-time. No server-based hosting is required; however, when internet or satellite access is available, the app can sync data with cloud servers or emergency dashboards using secure HTTPS (Hypertext Transfer Protocol Secure) or MQTT (Message Queuing Telemetry Transport) protocols. The application has a real-time viewer interface which displays the dehazed image/video feed in real-time with frame-level overlays for detected fire/smoke regions. All communication with the GCS or the drone is encrypted using AES (Advanced Encryption Standard) level security.
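As one possible on-device integration (the model file name and tensor layout below are hypothetical), the converted inpainting model could be invoked through the standard TensorFlow Lite interpreter as follows:

```python
import numpy as np
import tensorflow as tf  # tflite_runtime.Interpreter may replace tf.lite.Interpreter on-device

# Hypothetical file name for the converted visual refinement model.
interpreter = tf.lite.Interpreter(model_path="inpainting_refiner.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

def run_refinement(masked_frame: np.ndarray) -> np.ndarray:
    """Run one masked frame (H x W x C, float32) through the on-device model."""
    interpreter.set_tensor(input_detail["index"], masked_frame.astype(np.float32)[None, ...])
    interpreter.invoke()
    return interpreter.get_tensor(output_detail["index"])[0]
```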
The proposed hybrid deep learning model is tested on 5000 real-time wildfire images available online. The hyperparameters are: learning rate = 0.0006, optimizer = Adam, epochs = 800, training-validation-testing split = 60%-20%-20%. Each convolution layer has a batch-normalization layer (Norm) and a ReLU activation function. The evaluation metrics used are the accuracy rate, the structural similarity index (SSIM), the peak signal-to-noise ratio (PSNR), and the gradient magnitude similarity deviation (GMSD). The model achieves a 96.6±1.89% accuracy rate in removing smoke and haze, and a 98.24±1.06% accuracy rate in restoring the video frame image, in comparison to the available baseline CNN models. The PSNR, SSIM, and GMSD values obtained by the proposed model are 24.63±4.47, 0.91±0.26, and 0.88±1.59 (×10-5) respectively, whereas the values obtained by the baseline models are 16.34±0.77, 0.59±0.75, and 14.76±6.43 (×10-5) respectively. Hence, the values obtained by the proposed model are better than those of the baseline models. Higher PSNR and SSIM values mean better performance, while lower GMSD values indicate less colour distortion and better texture preservation.
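For reference, PSNR and SSIM as used in this evaluation can be computed with scikit-image as sketched below (GMSD has no scikit-image implementation and is omitted); the sketch assumes frames are multi-channel float arrays scaled to [0, 1].

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality(reference: np.ndarray, restored: np.ndarray) -> dict:
    """PSNR/SSIM between a ground-truth smoke-free frame and the restored frame."""
    return {
        "psnr": peak_signal_noise_ratio(reference, restored, data_range=1.0),
        "ssim": structural_similarity(reference, restored, data_range=1.0, channel_axis=-1),
    }
```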
The foregoing descriptions of exemplary embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling persons skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but these are intended to cover the application or implementation without departing from the scope of the claims of the present invention.
Claims:
We claim:
1. A system for monitoring of burning forest status, the system comprises:
a drone mounted sensor module (100) adapted to acquire input data associated with aerial visuals with thermal and infrared radiation data of a target forest (F);
a handheld user device (300) communicatively linked to the drone mounted sensor module (100) via a server (200) to display burning forest visual status (VS) in real-time;
a visual segmentation module (400) embedded in the drone mounted sensor module (100) to pass the input data through a convolutional neural network comprising:
an initial block consisting of a convolutional layer followed by a normalization and an activation function;
three sequential inverted residual blocks, each consisting of an expansion convolution layer, a first normalization, a first activation function, followed by a depth-wise convolution, a second normalization, and a second activation function;
a final block consisting of a convolution layer, a sequence of inverted residual layers, and a global average pooling layer; and
a segmentation head adapted to perform pixel-wise segmentation using semantic features extracted from the visual frames with the temperature and infrared radiation data while passing through all the blocks, and output a smoke binary mask and a fire binary mask based on the pixel-wise segmentation results;
a visual refinement module (500) embedded in the server (200) or the handheld user device (300) to:
receive the segmented binary mask outputs;
apply an auto encoder-decoder based inpainting function to extract contextual features from surrounding clean pixel regions in the visual frames using the binary masks;
restore missing pixels in the visual frames based on the extracted contextual features; and
deploy a residual correction map followed by a bilateral filter for edge enhancement and noise suppression in the visual frames; and
a pixel compensation network module (600) embedded in the server (200) or the handheld user device (300) to:
receive the segmented binary mask outputs and the refined visual frame outputs;
compute thickness of dense smoke regions in the visual frames through a contrast attenuation function;
estimate residue components associated with missing pixel values of the dense smoke regions having the thickness greater than a threshold value; and
add the residue components to the segmented binary mask outputs and the refined visual frame outputs to bring structural and textural consistency in the final visual status.
2. The system as claimed in claim 1, wherein the drone mounted sensor module (100) includes a 180-degree camera (102) for capturing the aerial visuals of the target forest (F), a thermal sensor (104) for measuring the temperatures of the target forest (F), an infrared sensor (106) for measuring the infrared radiation emitted from the target forest (F), and a microprocessor (108) for performing the input data segmentation.
3. The system as claimed in claim 1, wherein the visual segmentation module (400) is configured to: compute a smoke probability score indicating likelihood of each pixel being part of the smoke region, using extracted feature vector at corresponding pixel, a smoke learned weight, a bias term, and a sigmoid activation function; and compare the smoke probability score against a threshold value based on which the corresponding pixels are determined as the smoke region.
4. The system as claimed in claim 1, wherein the visual segmentation module (400) is configured to compute a fire probability score indicating likelihood of each pixel being part of the fire region, using temperature and infrared intensity at corresponding pixel, a bias term, thermal and infrared learned weights, and rectified linear unit (ReLU6) activation function; and compare the fire probability score against a threshold value based on which the corresponding pixels are determined as the fire region.
5. The system as claimed in claim 1, wherein the visual refinement module (500) is configured to derive inpainting function output using a pixel reconstruction generator network, the learned contextual features, and the binary masks; add the inpainting function output with the residual correction map to obtain a final output frame; and smoothen the final output frame using edge intensity feature based guidance map.
6. A method for monitoring of burning forest status, the method comprises steps of:
acquiring (S1) aerial visuals with thermal and infrared radiation data of a target forest (F) as input data by a drone mounted sensor module (100);
configuring (S2) a handheld user device (300) in communication with the drone mounted sensor module (100) via a server (200) to display burning forest visual status in real-time;
performing (S3) pixel-wise segmentation of the visual frames by a visual segmentation module (400) to output a smoke binary mask and a fire binary mask; wherein the pixel-wise segmentation is based on semantic features extracted from the input data while passing through a convolutional neural network;
applying (S4) an auto encoder-decoder based inpainting function by a visual refinement module (500) to extract contextual features from surrounding clean pixel regions in the visual frames using the binary masks, followed by missing pixels restoration based on contextual features, and residual correction with edge enhancement and noise suppression in the visual frames to obtain refined visual frames; and
deploying (S5) a contrast attenuation function by a pixel compensation network module (600) to compute thickness of dense smoke regions in the refined visual frames, followed by residue component estimation associated with missing pixel values of the dense smoke regions having the thickness greater than a threshold value, and addition of the residue components to the binary mask outputs and the refined visual frame outputs to bring structural and textural consistency in the final visual status.
7. The method as claimed in claim 6, wherein the configuring step (S2) includes embedding the visual segmentation module (400) in the drone mounted sensor module (100).
8. The method as claimed in claim 6, wherein the configuring step (S2) includes embedding the visual refinement module (500) and the pixel compensation network module (600) in the server (200) or in the handheld user device (300).
9. The method as claimed in claim 6, wherein the pixel-wise segmentation step (S3) comprises:
computing a smoke probability score indicating likelihood of each pixel being part of the smoke region, using extracted feature vector at corresponding pixel, a smoke learned weight, a bias term, and a sigmoid activation function; and
comparing the smoke probability score against a threshold value based on which the corresponding pixels are determined as the smoke region.
10. The method as claimed in claim 6, wherein the pixel-wise segmentation step (S3) comprises:
computing a fire probability score indicating likelihood of each pixel being part of the fire region, using temperature and infrared intensity at corresponding pixel, a bias term, thermal and infrared learned weights, and rectified linear unit (ReLU6) activation function; and
comparing the fire probability score against a threshold value based on which the corresponding pixels are determined as the fire region.