Abstract: A MULTI-SCALING SYSTEM WITH ACUTE PIXELATION IN VISION TRANSFORMER FOR ENHANCING DISEASE DETECTION IN BELL PEPPER LEAF IMAGES
A system and method for detecting and classifying bell pepper plant diseases are disclosed. The invention receives leaf images, preprocesses them to normalise them and enhance their quality, and applies an acute-pixel attention encoder to adaptively partition the images into semantically meaningful regions. A multi-scale context transformer encoder comprising multi-head self-attention at distinct scales captures both fine-grained and broad contextual features, and a cross-attention mechanism integrates information across scales. The processed features are classified to identify healthy or diseased regions and the disease type. Results, including segmented disease areas and textual classification outputs, are presented to the user. By combining adaptive pixel-level attention with multi-scale transformer processing, the invention achieves high accuracy and efficiency in bell pepper disease prediction under adverse imaging conditions, providing a robust, real-time tool for crop monitoring and management.
Description:
FIELD OF THE INVENTION
This invention relates to computer vision in agriculture, specifically to a system and method using hierarchical attention and pixelated multi-scale vision transformers for enhancing disease detection in bell pepper leaf images. It integrates adaptive pixel-level partitioning with multi-scale transformer encoders to achieve accurate and efficient plant disease prediction.
BACKGROUND OF THE INVENTION
Bell pepper plant diseases can impair plant growth, reduce yield and quality, and lead to significant economic loss for farmers. Early disease prediction is a challenging task due to the complexity and variability of symptoms. Deep learning (DL) models provide promising results in plant disease prediction. In particular, the Inception-V3 (IV3) model delivers the best performance among these models, as its inception modules effectively extract multi-scale features. However, it struggles with images degraded by noise, poor image sensor quality, capture imperfections or adverse weather conditions that hide important information. The Vision Transformer (ViT) is an emerging technique for visual recognition that provides impressive results in handling these distortions. However, the advantages of a transformer often come with a large number of parameters and high computational cost, making it difficult to achieve an optimal balance between accuracy and model complexity. In addition, diverse and complex crop disease types hinder precise segmentation and noise elimination in the affected regions, which lowers the quality of the output representation in ViT. An effective hierarchical mechanism is therefore required to address this ViT limitation and improve efficiency in bell pepper disease prediction.
US20210073692A1: Systems and methods for utility infrastructure condition monitoring, detection, and response are disclosed. One exemplary system includes a sensor package and a monitoring and control module. The sensor package includes a plurality of sensors such as, for example, an image sensor, a video sensor, and a LiDAR sensor. The sensors may each be configured to capture data indicative of one or more conditions (e.g., an environmental condition, a structural condition, etc.) in the vicinity of the utility infrastructure. The monitoring and control module includes a detection module and an alert module. The detection module is configured to receive data captured by each sensor and, based on the captured data, determine one or more conditions in the vicinity of the utility infrastructure. The detection module may then be configured to provide an alert for the determined condition using the alert module.
US12342746B2: The present invention discloses a method for selective crop management in real time. The method comprises steps of: (a) producing a biosensor plant, said biosensor plant comprises a visual biomarker, said biomarker is encoded by at least one modified genetic locus comprising (i) preselected reporter gene allele having a phenotype detectable by a sensor, and (ii) a regulatory region of a preselected gene allele responsive to at least one parameter or condition of said plant or its environment, said regulatory region is operably linked to said reporter gene, such that the expression of said reporter gene phenotype is correlated with the status of said at least one parameter or condition of said biosensor plant or its environment; (b) acquiring image data of a target area comprising a plurality of said biosensor plants via said sensor and processing said data to generate a signal indicative of the phenotypic expression of said reporter gene allele of said biosensor plant; and (c) communicating said signal to an execution unit communicably linked to the sensor, said execution unit is capable of exerting in real time a selective monitoring and/or treatment of said target area or a portion thereof comprising said biosensor plants, said treatment is being responsive to said status of said parameter or condition of the biosensor plant or its environment. The present invention further discloses systems and plants related to the aforementioned method.
Bell-pepper plant diseases cause significant yield loss and are difficult to predict early due to variability of symptoms and image quality issues such as noise, occlusions, and rotation. Traditional convolutional or transformer models either miss fine-grained features or are computationally expensive. Existing vision transformer approaches use fixed-size patch division and single-scale attention, reducing their ability to localise disease-affected regions. The present invention solves these problems by introducing an adaptive pixel attention encoder and a multi-scale context transformer encoder with multi-head self-attention at different scales, improving robustness, accuracy, and efficiency in disease prediction from bell pepper leaf images.
SUMMARY OF THE INVENTION
This summary is provided to introduce a selection of concepts, in a simplified format, that are further described in the detailed description of the invention.
This summary is not intended to identify key or essential inventive concepts of the invention, nor is it intended for determining the scope of the invention.
The invention provides a hierarchical attention pixelated multi-scale vision transformer model combined with a convolutional feature extractor for bell pepper plant disease prediction. It adaptively partitions input images into semantically meaningful regions, applies multi-scale context transformer encoders to capture both local and global features, and integrates these representations for accurate disease classification.
An acute-pixel attention encoder segments the image based on pixel-level saliency, focusing on visually important leaf textures and disease patterns. Multi-scale transformer encoders operating at different scales use distinct attention heads to capture fine-grained and broad contextual information. Cross-attention mechanisms exchange information between scales, providing a holistic view of the input image.
The extracted features are then fed into a classification module to distinguish between healthy and diseased leaf regions. This architecture improves segmentation and classification even under adverse imaging conditions, balancing high accuracy with computational efficiency for deployment in real-time agricultural settings.
To further clarify the advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The illustrated embodiments of the subject matter will be understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and methods that are consistent with the subject matter as claimed herein, wherein:
FIGURE 1: SYSTEM ARCHITECTURE
The figures depict embodiments of the present subject matter for the purposes of illustration only. A person skilled in the art will easily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION OF THE INVENTION
Various exemplary embodiments of the disclosure are described herein with reference to the accompanying drawings. It should be noted that the embodiments are described herein in such detail as to clearly communicate the disclosure. However, the amount of detail provided herein is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
It is also to be understood that various arrangements may be devised that, although not explicitly described or shown herein, embody the principles of the present disclosure. Moreover, all statements herein reciting principles, aspects, and embodiments of the present disclosure, as well as specific examples, are intended to encompass equivalents thereof.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two blocks shown in succession in a figure may, in fact, be executed concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
In addition, the descriptions of "first", "second", “third”, and the like in the present invention are used for the purpose of description only, and are not to be construed as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. Thus, features defining "first" and "second" may include at least one of the features, either explicitly or implicitly.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To solve the defined problem, a Hierarchical Attention Pixelated Multi-Scale Vision Transformer with InceptionV3 (HAPMSViT-IV3) model is developed for efficient bell pepper plant disease prediction. A traditional ViT suffers from irregular pixel distribution because its fixed-size, grid-based patch partitioning leads to weak spatial localisation of disease-affected regions. To address this limitation, the proposed model introduces an Acute-Pixel Attention Encoder (APAE) within the ViT framework, which adaptively partitions the image into regular and semantically meaningful regions based on pixel-level information. The APAE also operates on each patch, capturing intricate leaf textures and disease patterns while focusing strongly on visually salient areas. This enables more effective feature extraction at both the initial and intermediate stages of processing. The extracted acute-pixelated features are then fed into a Multi-Scale Context Transformer Encoder (MSCTE) to aggregate information across the adaptively partitioned regions. Multiple transformer encoders operate in parallel at different scales, enabling the model to capture both fine-grained local details and broader contextual information. A cross-attention mechanism allows these scales to exchange information, giving a holistic understanding of the spatial and semantic structure of the input image. This multi-scale representation significantly improves the segmentation and classification of healthy versus diseased leaf regions, even in the presence of occlusions or rotational variance. Whereas the standard transformer encoder uses Multi-Head Self-Attention (MHSA), the proposed MSCTE replaces it with a Multi-Scale Context Multi-Head Self-Attention (MSCMHSA) module. Unlike conventional self-attention, which operates with a fixed receptive field, MSCMHSA allows each attention head to operate at a distinct scale.
The scale of each head determines the spatial extent of its attention: larger scales capture broader contextual information, resulting in smoother feature representations, while smaller scales focus on local details, yielding sharper and more discriminative features. This diversity of attention scales across heads enhances the model's ability to distinguish subtle disease symptoms. Combined with a multi-layer perceptron (MLP), the MSCMHSA block supports robust feature learning for accurate disease prediction. Finally, the aggregated features are fed into the IV3 model for efficient bell pepper disease prediction. The proposed model supports decision-making through its resilience to occlusions and rotations, enabling effective bell pepper disease prediction.
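The per-head scale can be made concrete with a masked-attention formulation. The equations below are an illustrative assumption only, since the specification does not give explicit MSCMHSA equations: the N input tokens X (each of dimension D) sit at grid positions p_i, head h (of H heads, each of dimension d_k = D/H) has a hypothetical scale radius s_h, and keys farther than s_h from the query are masked out.

```latex
\begin{aligned}
Q_h &= X W_h^{Q}, \quad K_h = X W_h^{K}, \quad V_h = X W_h^{V}, \\
M^{(h)}_{ij} &= \begin{cases} 0, & \lVert p_i - p_j \rVert_\infty \le s_h, \\ -\infty, & \text{otherwise}, \end{cases} \\
\mathrm{head}_h &= \operatorname{softmax}\!\left( \frac{Q_h K_h^{\top}}{\sqrt{d_k}} + M^{(h)} \right) V_h, \\
\mathrm{MSCMHSA}(X) &= \operatorname{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_H)\, W^{O}.
\end{aligned}
```

A larger s_h lets each softmax average over more tokens, which smooths that head's output, while a small s_h keeps the head local and sharp, matching the behaviour described above.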
The proposed HAPMSViT-IV3 model incorporates the APAE and the MSCTE with the MSCMHSA framework for efficient bell pepper plant disease prediction. The APAE enables regular, content-aware partitioning of patches by replacing the fixed-size, grid-based patch division used in standard ViTs. This improves the model's ability to detect disease-affected regions by enhancing its robustness to occlusions, rotations and noise, addressing the complexities of leaf imagery in real-world agricultural settings. The MSCMHSA is then applied within the MSCTE, where each attention head operates at a different spatial scale. This enables the model to capture fine-grained local features as well as broader contextual information within the same processing layer. This multi-scale strategy enhances the model's capacity to differentiate visually similar disease categories and significantly improves feature representation. Among the prediction models considered, IV3 provides efficient results in disease prediction tasks. The proposed model combines an adaptive attention mechanism with a scalable architecture, achieving a balance between high accuracy and computational efficiency for effective disease prediction in bell pepper plants.
The invention comprises an input interface for uploading bell pepper leaf images captured under field conditions.
A preprocessing module standardises images by resizing, normalising and enhancing contrast to reduce noise and illumination variability.
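A minimal sketch of the preprocessing step, assuming NumPy, an RGB image of shape (H, W, 3), a 224 x 224 target size, a 2nd to 98th percentile contrast stretch and mean/std normalisation to roughly [-1, 1]. These settings and the function names are hypothetical; the specification names only resizing, normalisation and contrast enhancement. Nearest-neighbour resizing is used only to keep the sketch dependency-free.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) array (dependency-free stand-in)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]

def contrast_stretch(img: np.ndarray, low_pct: float = 2.0, high_pct: float = 98.0) -> np.ndarray:
    """Per-channel percentile stretch to [0, 1] to reduce illumination variability."""
    out = np.empty(img.shape, dtype=np.float32)
    for c in range(img.shape[2]):
        ch = img[..., c].astype(np.float32)
        lo, hi = np.percentile(ch, [low_pct, high_pct])
        out[..., c] = np.clip((ch - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return out

def preprocess(img: np.ndarray, size: int = 224,
               mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)) -> np.ndarray:
    """Resize, contrast-enhance and normalise a leaf image (hypothetical settings)."""
    resized = resize_nearest(img, size, size)
    stretched = contrast_stretch(resized)
    return (stretched - np.asarray(mean, np.float32)) / np.asarray(std, np.float32)
```

For example, `preprocess(leaf_rgb)` on a uint8 field photograph returns a float32 array of shape (224, 224, 3) ready for patch partitioning.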
An acute-pixel attention encoder adaptively partitions the image into regular and content-aware patches based on pixel-level information. This replaces fixed grid patch division used in standard transformers, allowing more precise localisation of disease-affected regions.
Within each patch, the encoder extracts fine textures and subtle disease patterns while focusing on visually salient areas.
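The specification does not disclose how the acute-pixel attention encoder computes its partition. As a hypothetical stand-in, the sketch below uses a gradient-magnitude saliency map and a quadtree split: a square region is subdivided while its mean saliency exceeds a threshold, so flat leaf areas stay as large patches and textured or lesioned areas are broken into small ones. The saliency measure, `min_size`, `max_size` and `threshold` are assumptions, not the patented method.

```python
import numpy as np

def saliency_map(gray: np.ndarray) -> np.ndarray:
    """Gradient-magnitude saliency, a simple stand-in for learned pixel attention."""
    gy, gx = np.gradient(gray.astype(np.float32))
    return np.hypot(gx, gy)

def adaptive_partition(gray: np.ndarray, min_size: int = 8, max_size: int = 64,
                       threshold: float = 0.1) -> list:
    """Quadtree partition into square (row, col, size) patches.

    Assumes both image sides are multiples of max_size and max_size / min_size is a power of two.
    """
    sal = saliency_map(gray)
    sal = sal / (sal.max() + 1e-6)          # scale saliency to [0, 1)
    h, w = gray.shape
    stack = [(r, c, max_size) for r in range(0, h, max_size) for c in range(0, w, max_size)]
    patches = []
    while stack:
        r, c, s = stack.pop()
        if s > min_size and sal[r:r + s, c:c + s].mean() > threshold:
            half = s // 2                   # salient region: split into four quadrants
            stack += [(r, c, half), (r, c + half, half),
                      (r + half, c, half), (r + half, c + half, half)]
        else:
            patches.append((r, c, s))       # flat or minimum-size region: keep as one patch
    return patches
```

Because every patch is a square of side `max_size / 2**k`, the partition stays regular while still adapting to image content, which is one way to read the "regular and content-aware" wording above.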
The encoded patches are fed into a multi-scale context transformer encoder comprising several transformer layers operating in parallel at different spatial scales. Each transformer layer includes a multi-scale context multi-head self-attention module where each head attends at a distinct scale.
Large-scale attention heads capture broad contextual information such as overall leaf structure, while small-scale heads focus on local details like lesions or edge distortions. This diversity yields sharper, more discriminative features.
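A hedged NumPy sketch of multi-scale multi-head self-attention, assuming the tokens are laid out on a grid and each head is assigned one scale. Here a head's scale is modelled as a hypothetical Chebyshev-distance radius on the token grid, and random projections stand in for learned weights; `scale_mask`, `init_params` and `multi_scale_mhsa` are illustrative names, not part of the specification.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax; -inf entries receive zero weight."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scale_mask(grid_h, grid_w, radius):
    """Additive mask: 0 where two tokens lie within Chebyshev distance `radius`, -inf elsewhere."""
    rows, cols = np.divmod(np.arange(grid_h * grid_w), grid_w)
    dist = np.maximum(np.abs(rows[:, None] - rows[None, :]),
                      np.abs(cols[:, None] - cols[None, :]))
    return np.where(dist <= radius, 0.0, -np.inf)

def init_params(d, rng):
    """Random projections standing in for learned weights (illustration only)."""
    return {name: rng.standard_normal((d, d)) / np.sqrt(d) for name in ("q", "k", "v", "o")}

def multi_scale_mhsa(x, grid_h, grid_w, radii, params):
    """One attention head per radius; x has shape (grid_h * grid_w, d)."""
    _, d = x.shape
    n_heads = len(radii)
    if d % n_heads:
        raise ValueError("embedding dim must be divisible by the number of heads")
    dh = d // n_heads
    q, k, v = x @ params["q"], x @ params["k"], x @ params["v"]
    heads, attn_maps = [], []
    for h, radius in enumerate(radii):
        sl = slice(h * dh, (h + 1) * dh)     # this head's slice of the projections
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(dh) + scale_mask(grid_h, grid_w, radius)
        attn = softmax(scores)
        heads.append(attn @ v[:, sl])
        attn_maps.append(attn)
    return np.concatenate(heads, axis=-1) @ params["o"], attn_maps
```

On an 8 x 8 token grid with `radii=(1, 2, 4, 8)`, the radius-1 head at the centre token attends to its 3 x 3 neighbourhood (9 tokens), while the radius-8 head attends to all 64 tokens, which mirrors the local-versus-global split described above.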
A cross-attention mechanism links the outputs of different scales, enabling information exchange and a unified feature representation.
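One plausible form of the cross-scale link, sketched under assumptions: a coarse scale is built by average-pooling the fine token grid, fine-scale tokens act as queries, coarse-scale tokens supply keys and values, and the result is added back to the fine tokens. The pooling factor, the residual fusion and the function names are hypothetical; the specification states only that cross-attention exchanges information between scales.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool_tokens(tokens, grid_h, grid_w, factor):
    """Build a coarser scale by averaging factor x factor blocks of a row-major token grid."""
    d = tokens.shape[-1]
    blocks = tokens.reshape(grid_h // factor, factor, grid_w // factor, factor, d)
    return blocks.mean(axis=(1, 3)).reshape(-1, d)

def cross_attention(fine, coarse, wq, wk, wv):
    """Fine-scale tokens (queries) attend to coarse-scale tokens (keys and values)."""
    q, k, v = fine @ wq, coarse @ wk, coarse @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # Residual fusion keeps the fine-grained detail and adds coarse context.
    return fine + attn @ v, attn
```

With an 8 x 8 fine grid and `factor=2`, each of the 64 fine tokens receives a weighted summary of the 16 coarse tokens, so every local feature is informed by leaf-level context.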
Multi-layer perceptrons following the attention blocks ensure robust feature learning and non-linear transformation of the aggregated features.
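A minimal pre-norm MLP block in the common transformer style is sketched below to illustrate the non-linear transformation that follows the attention blocks. The hidden width, the tanh approximation of GELU and the omission of learnable layer-norm gain and bias are simplifying assumptions, not details taken from the specification.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Layer normalisation over the feature axis (no learnable gain or bias here)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x: np.ndarray) -> np.ndarray:
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def mlp_block(x, w1, b1, w2, b2):
    """Pre-norm transformer MLP with a residual connection: x + W2 GELU(W1 LN(x) + b1) + b2."""
    hidden = gelu(layer_norm(x) @ w1 + b1)   # expand, e.g. D -> 4D
    return x + hidden @ w2 + b2              # project back to D and add the residual
```

With zero output weights the block returns its input unchanged, showing that the residual path carries the attention features through and the MLP only adds a learned correction.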
Extracted features are then passed to a classification module that predicts whether a leaf region is healthy or diseased and, if diseased, categorises the disease stage or type.
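The specification describes the aggregated features being passed to an IV3 model for the final prediction. The sketch below replaces that with a simplified linear head so that the two-stage decision (healthy versus diseased, then disease type) is easy to follow. The mean pooling, the 0.5 threshold and the label names are hypothetical placeholders, not disease classes taken from the specification.

```python
import numpy as np

# Hypothetical placeholder labels; the specification does not enumerate disease types.
DISEASE_LABELS = ("disease_type_1", "disease_type_2", "disease_type_3")

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classify(features, w_health, b_health, w_type, b_type, threshold=0.5):
    """Pool tokens, predict healthy vs diseased, then the disease type if diseased."""
    pooled = features.mean(axis=0)                        # global average pooling over tokens
    p_diseased = 1.0 / (1.0 + np.exp(-(pooled @ w_health + b_health)))
    if p_diseased < threshold:
        return {"status": "healthy", "p_diseased": float(p_diseased)}
    type_probs = softmax(pooled @ w_type + b_type)        # only evaluated for diseased leaves
    k = int(np.argmax(type_probs))
    return {"status": "diseased", "p_diseased": float(p_diseased),
            "type": DISEASE_LABELS[k], "p_type": float(type_probs[k])}
```

Evaluating the type head only when the leaf is judged diseased matches the "if diseased, categorises the disease stage or type" behaviour described above.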
The architecture combines adaptive attention with a scalable transformer backbone, achieving a balance between accuracy and computational cost suitable for deployment on edge devices or cloud servers.
Periodic training with augmented and annotated datasets enhances model generalisation to new disease patterns and environmental conditions.
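The augmentations used for periodic retraining are not listed in the specification. Given the robustness to rotation, occlusion and noise that the invention targets, a hypothetical augmentation routine might combine 90-degree rotations, horizontal flips, Gaussian noise and a cutout-style occlusion, as sketched below for a square image with values in [0, 1]. The function name and all parameter values are assumptions.

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator,
            noise_std: float = 0.05, occlusion_frac: float = 0.25) -> np.ndarray:
    """Randomly rotate, flip, add noise to, and occlude a square (H, W, C) image in [0, 1]."""
    out = np.rot90(img, k=int(rng.integers(4)), axes=(0, 1))  # rotation by a multiple of 90 degrees
    if rng.random() < 0.5:
        out = out[:, ::-1]                                     # horizontal flip
    out = out + rng.normal(0.0, noise_std, size=out.shape)     # sensor-style noise (new array)
    h, w = out.shape[:2]
    oh, ow = int(h * occlusion_frac), int(w * occlusion_frac)
    r = int(rng.integers(0, h - oh + 1))
    c = int(rng.integers(0, w - ow + 1))
    out[r:r + oh, c:c + ow] = 0.0                              # cutout-style occlusion
    return np.clip(out, 0.0, 1.0)
```

Applying `augment` afresh to each annotated image in every training epoch exposes the model to many rotated, noisy and partly occluded variants of the same leaf.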
The system’s interface displays results to the user, including segmented disease areas and classification outcomes, supporting decision-making in crop management.
Security and privacy measures ensure that uploaded images are stored and processed securely.
The model can be integrated into a mobile or web application for farmers and agronomists, providing real-time disease prediction in the field.
This framework significantly improves early detection and precise segmentation of bell pepper diseases, reducing manual inspection time and increasing diagnostic reliability.
BEST METHOD OF WORKING
The preferred embodiment deploys the model on a cloud or edge server accessible via a mobile application. Leaf images captured in the field are uploaded and preprocessed. The acute-pixel attention encoder adaptively partitions images and extracts salient features, which are processed by the multi-scale context transformer encoder. Outputs are classified by the disease prediction module and displayed to the user as segmented images and textual results. This configuration achieves high prediction accuracy with low latency and is resilient to occlusions, rotations, and noise.
Claims:
1. A system for detecting and classifying bell pepper plant diseases comprising:
an input module configured to receive leaf images; a preprocessing module configured to normalise and enhance the images; an acute-pixel attention encoder configured to adaptively partition the images into semantically meaningful regions and extract salient features; a multi-scale context transformer encoder comprising multiple transformer layers operating at different spatial scales, each layer including a multi-head self-attention module with heads attending at distinct scales; a cross-attention mechanism configured to exchange information between different scales; a classification module configured to predict healthy or diseased regions and categorise disease type; and
an output interface configured to present segmented disease regions and classification results to a user.
2. The system as claimed in claim 1, wherein the preprocessing module performs resizing, normalisation and contrast enhancement to reduce noise and illumination variability.
3. The system as claimed in claim 1, wherein the acute-pixel attention encoder focuses on visually salient areas to improve localisation of disease-affected regions.
4. The system as claimed in claim 1, wherein the multi-scale context transformer encoder uses attention heads of different receptive fields to capture both local and global features.
5. The system as claimed in claim 1, wherein the classification module outputs segmented disease areas and disease stage or type.
6. A method for detecting and classifying bell pepper plant diseases comprising:
receiving leaf images from a user;
preprocessing the images to normalise and enhance them;
adaptively partitioning the images into semantically meaningful regions using an acute-pixel attention encoder;
processing the partitioned regions with a multi-scale context transformer encoder comprising multi-head self-attention at distinct scales;
applying a cross-attention mechanism to integrate information across scales;
classifying the processed features into healthy or diseased categories and, if diseased, predicting disease type; and
displaying segmented disease regions and classification results to a user.
7. The method as claimed in claim 6, wherein the acute-pixel attention encoder extracts fine textures and subtle disease patterns from visually salient areas.
8. The method as claimed in claim 6, wherein the multi-scale context transformer encoder captures both local details and broad contextual information within the same processing layer.
9. The method as claimed in claim 6, wherein the cross-attention mechanism merges outputs from different scales for a unified feature representation.
10. The method as claimed in claim 6, wherein the classification results include segmented disease regions overlaid on the original image and textual information on disease type.
| # | Name | Date |
|---|---|---|
| 1 | 202541090649-STATEMENT OF UNDERTAKING (FORM 3) [23-09-2025(online)].pdf | 2025-09-23 |
| 2 | 202541090649-REQUEST FOR EARLY PUBLICATION(FORM-9) [23-09-2025(online)].pdf | 2025-09-23 |
| 3 | 202541090649-POWER OF AUTHORITY [23-09-2025(online)].pdf | 2025-09-23 |
| 4 | 202541090649-FORM-9 [23-09-2025(online)].pdf | 2025-09-23 |
| 5 | 202541090649-FORM FOR SMALL ENTITY(FORM-28) [23-09-2025(online)].pdf | 2025-09-23 |
| 6 | 202541090649-FORM 1 [23-09-2025(online)].pdf | 2025-09-23 |
| 7 | 202541090649-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [23-09-2025(online)].pdf | 2025-09-23 |
| 8 | 202541090649-EVIDENCE FOR REGISTRATION UNDER SSI [23-09-2025(online)].pdf | 2025-09-23 |
| 9 | 202541090649-EDUCATIONAL INSTITUTION(S) [23-09-2025(online)].pdf | 2025-09-23 |
| 10 | 202541090649-DRAWINGS [23-09-2025(online)].pdf | 2025-09-23 |
| 11 | 202541090649-DECLARATION OF INVENTORSHIP (FORM 5) [23-09-2025(online)].pdf | 2025-09-23 |
| 12 | 202541090649-COMPLETE SPECIFICATION [23-09-2025(online)].pdf | 2025-09-23 |