
A Hybrid System of Transformer-CNN Framework for Automated Glaucoma Detection with Hybrid-Attentive U-Net Segmentation

Abstract: The invention discloses GLAUCO-VITNET, a hybrid transformer-convolutional neural network framework for automated glaucoma detection from retinal fundus images. The system comprises a preprocessing module for contrast enhancement, normalization, and noise reduction; a Hybrid-Attentive U-Net segmentation module with channel and spatial attention for accurate optic disc and cup segmentation; a data augmentation module to generate synthetic training samples; a hybrid feature extraction module combining wavelet transforms, convolutional features, and transformer-based contextual features; a feature selection module employing swarm optimization; and a classification module integrating vision transformers and CNNs for glaucoma detection. The method involves preprocessing retinal images, segmenting optic disc and cup, extracting hybrid features, selecting optimal discriminative attributes, and classifying the image as glaucomatous or non-glaucomatous. The invention provides accurate glaucoma diagnosis through simultaneous segmentation and classification, enhancing interpretability and clinical relevance. It reduces subjectivity in manual assessment, improves early detection, and supports large-scale screening in ophthalmic healthcare.


Patent Information

Application #
Filing Date
22 September 2025
Publication Number
43/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

SR UNIVERSITY
ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Inventors

1. RAMAVATH VINOD KUMAR
RESEARCH SCHOLAR, SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
2. DR. N. SHARMILA BANU
ASSISTANT DEAN (RESEARCH) & ASSISTANT PROFESSOR (CS&AI), SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Specification

Description: FIELD OF THE INVENTION
The present invention relates to medical imaging and artificial intelligence, particularly to ophthalmic disease detection. More specifically, it concerns a hybrid deep learning framework combining convolutional neural networks (CNNs) and vision transformers (ViTs) for automated glaucoma detection. The invention further provides a hybrid-attentive U-Net architecture for segmentation of optic disc and optic cup regions from retinal fundus images, enabling accurate clinical biomarker extraction.
BACKGROUND OF THE INVENTION
Glaucoma, a major contributor to irreversible blindness, is chiefly caused by elevated intraocular pressure (IOP) that damages the optic nerve. Early detection is crucial to preventing vision loss, yet traditional diagnosis relies on manual examination by ophthalmologists, which is subjective, laborious, and variable. Although deep learning for glaucoma diagnosis has demonstrated promise, existing models struggle to both correctly classify glaucomatous conditions and segment the optic disc (OD) and optic cup (OC). Key challenges include extraction of biomarkers such as the Cup-to-Disc Ratio (CDR), handling data variability, and achieving model generalizability across multiple datasets. The objective of this work is to build a scalable and impactful framework that utilizes state-of-the-art deep learning approaches such as CNNs, the U-Net architecture, attention blocks, transfer learning, and multi-task learning. Several segmentation and classification models have been designed using publicly available datasets such as ORIGA, REFUGE, RIM-ONE, and DRISHTI-GS; yet additional effort is required to enhance robustness, accuracy, and adaptability. The primary challenge is developing a comprehensive system that is effective with respect to data management, correct glaucoma classification, and high segmentation accuracy. This research presents new deep learning models to improve classification accuracy, segmentation performance, and feature extraction. The long-term goal is a reliable automatic system that helps ophthalmologists diagnose glaucoma early and decreases diagnosis time while enhancing patient care.
US11972568B2: Systems and methods for assessing glaucoma loss using optical coherence topography. One method according to an aspect comprises receiving optical coherence image data and assessing functional glaucoma damage from retinal optical coherence image data. In an aspect, the systems and methods can map regions and layers of the eye to determine structural characteristics to compare to functional characteristics.
US12170147: Systems, methods, and computer program products for predicting and detecting the onset of retinal diseases are provided. A method of detecting glaucoma includes: pre-training at least one neural network model of a plurality of neural network models based on a small data classifier; training the plurality of neural network models based on a plurality of indications of glaucoma based on retinal data including at least two of a peripapillary atrophy value, a disc hemorrhage value, and a blood vessel structure analysis value; simultaneously generating a risk score associated with each of the plurality of indications based on the trained plurality of neural network models; combining the risk score associated with each of the plurality of indications based on a classification model to produce a likelihood of glaucoma; and determining whether glaucoma is present based on the likelihood of glaucoma.
SUMMARY OF THE INVENTION
This summary is provided to introduce a selection of concepts, in a simplified format, that are further described in the detailed description of the invention.
This summary is neither intended to identify key or essential inventive concepts of the invention and nor is it intended for determining the scope of the invention.
Glaucoma is a leading cause of irreversible blindness worldwide, primarily caused by elevated intraocular pressure that damages the optic nerve. Early detection is critical but conventional diagnosis depends on manual assessment by ophthalmologists, which is time-consuming, subjective, and prone to variability. Existing automated methods using deep learning show promise but have limitations. CNN-only models fail to capture global context, while transformer-only models lack strong local feature extraction. Furthermore, optic disc and cup segmentation remains challenging, affecting accurate measurement of the cup-to-disc ratio (CDR), a key biomarker. There is thus a pressing need for a robust, accurate, and interpretable automated system capable of performing both segmentation and classification across diverse datasets. The present invention addresses these issues by introducing a hybrid CNN-Transformer framework with dual-attention segmentation, improving detection accuracy and reliability.
To further clarify advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which is illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
This invention introduces a new GLAUCO-ViTNet model for an Automated Glaucoma Detection with Segmentation framework. The new glaucoma detection process starts with preprocessing, where image normalization and contrast enhancement are done using AHE and Gamma Correction, followed by noise reduction using an HNLMF and Guided Filter. Next, ROI identification is achieved via a Hybrid-Attentive U-Net (HAU-Net), incorporating a Hybrid U-Net-MobileNetV2 Encoder with a Dual-Attention Mechanism (integrating Channel and Spatial Attention) and an MSFFM for accurate segmentation of the optic disc, blood vessels, and lesions. Data augmentation is done using geometric transformations and elastic deformations via a GADM to create high-quality synthetic samples. In the feature extraction phase, a hybrid approach blends WST with RSA-based CNN to extract multi-scale texture and structural features. The process of feature selection uses the SFOA and WSO to optimize discriminative features. Lastly, detection using deep learning is done by GLAUCO-ViTNet, a hybrid Transformer-CNN architecture, combining an LSA Transformer for spatial context, CAMV-CNN for multi-view glaucoma classification, DGCN for learning inter-feature relationships, and Hybrid Vision Transformer-CNN Fusion (ViT-CNN-Fusion) for extracting both spatial and contextual representations to ensure precise glaucoma detection. Figure 1 illustrates the proposed glaucoma detection framework architecture.
BRIEF DESCRIPTION OF THE DRAWINGS
The illustrated embodiments of the subject matter will be understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and methods that are consistent with the subject matter as claimed herein, wherein:
FIGURE 1: SYSTEM ARCHITECTURE
The figures depict embodiments of the present subject matter for the purposes of illustration only. A person skilled in the art will easily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION OF THE INVENTION
The detailed description of various exemplary embodiments of the disclosure is described herein with reference to the accompanying drawings. It should be noted that the embodiments are described herein in such details as to clearly communicate the disclosure. However, the amount of details provided herein is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
It is also to be understood that various arrangements may be devised that, although not explicitly described or shown herein, embody the principles of the present disclosure. Moreover, all statements herein reciting principles, aspects, and embodiments of the present disclosure, as well as specific examples, are intended to encompass equivalents thereof.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
In addition, the descriptions of "first", "second", “third”, and the like in the present invention are used for the purpose of description only, and are not to be construed as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. Thus, features defining "first" and "second" may include at least one of the features, either explicitly or implicitly.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
This study introduces a new GLAUCO-ViTNet model for an Automated Glaucoma Detection with Segmentation framework. The new glaucoma detection process starts with preprocessing, where image normalization and contrast enhancement are done using AHE and Gamma Correction, followed by noise reduction using an HNLMF and Guided Filter. Next, ROI identification is achieved via a Hybrid-Attentive U-Net (HAU-Net), incorporating a Hybrid U-Net-MobileNetV2 Encoder with a Dual-Attention Mechanism (integrating Channel and Spatial Attention) and an MSFFM for accurate segmentation of the optic disc, blood vessels, and lesions. Data augmentation is done using geometric transformations and elastic deformations via a GADM to create high-quality synthetic samples. In the feature extraction phase, a hybrid approach blends WST with RSA-based CNN to extract multi-scale texture and structural features. The process of feature selection uses the SFOA and WSO to optimize discriminative features. Lastly, detection using deep learning is done by GLAUCO-ViTNet, a hybrid Transformer-CNN architecture, combining an LSA Transformer for spatial context, CAMV-CNN for multi-view glaucoma classification, DGCN for learning inter-feature relationships, and Hybrid Vision Transformer-CNN Fusion (ViT-CNN-Fusion) for extracting both spatial and contextual representations to ensure precise glaucoma detection. Figure 1 illustrates the proposed glaucoma detection framework architecture.
GLAUCO-VITNET lies in its hybrid design that combines the strengths of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) for automated glaucoma detection. While CNNs excel at capturing local spatial features, the integrated ViT modules contribute global contextual awareness, allowing the model to better understand complex retinal structures. Additionally, the framework introduces a novel Hybrid-Attentive U-Net architecture for optic disc and cup segmentation, which incorporates both spatial and channel attention mechanisms to focus on diagnostically relevant regions. This dual-attention mechanism significantly improves segmentation precision, even in challenging fundus images. Another key innovation is the fusion of deep features with clinically interpretable handcrafted metrics such as the cup-to-disc ratio (CDR), enhancing both performance and explainability. Furthermore, GLAUCO-VITNET employs a multi-task learning pipeline that unifies segmentation and classification tasks, enabling end-to-end optimization and improved generalization. These contributions collectively make GLAUCO-VITNET a robust, interpretable, and clinically relevant system for early glaucoma screening and diagnosis.
The invention provides a system and method for automated glaucoma detection from retinal fundus images. The system architecture begins with a preprocessing module that performs image normalization, adaptive histogram equalization, gamma correction, and denoising using hybrid filters. These operations enhance image contrast, reduce noise, and prepare data for reliable feature extraction.
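By way of non-limiting illustration, the normalization and gamma-correction steps of the preprocessing module can be sketched as follows. This is a minimal numpy sketch only: the specification's adaptive histogram equalization and guided/hybrid non-local-means filtering would typically rely on additional libraries (e.g., OpenCV), and the gamma value shown is an assumed example, not a value taken from the disclosure.

```python
import numpy as np

def normalize(img):
    """Min-max normalize an image to the [0, 1] range."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def gamma_correct(img, gamma=0.8):
    """Apply gamma correction to a [0, 1] image; gamma < 1 brightens dark regions."""
    return np.power(img, gamma)

def preprocess(img):
    """Normalize then gamma-correct a raw fundus image array."""
    return gamma_correct(normalize(img))
```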
The segmentation stage employs a Hybrid-Attentive U-Net that integrates CNN encoders with transformer-based attention modules. A MobileNetV2 encoder is used to capture local features, while dual-attention blocks consisting of channel and spatial attention guide the network to focus on diagnostically relevant regions. This enables accurate segmentation of the optic disc and cup boundaries.
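The dual-attention gating described above can be illustrated with a parameter-free sketch. In practice such blocks (as in SE/CBAM-style designs) carry learned weights; the sketch below keeps only the gating arithmetic, channel attention from global average pooling and spatial attention from the channel-mean map, and is an assumption-laden simplification rather than the disclosed block.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Gate each channel by its global average response. feat: (C, H, W)."""
    weights = sigmoid(feat.mean(axis=(1, 2)))   # one gate per channel
    return feat * weights[:, None, None]

def spatial_attention(feat):
    """Gate each spatial location by its mean-over-channels response."""
    attn = sigmoid(feat.mean(axis=0))           # one gate per (H, W) location
    return feat * attn[None, :, :]

def dual_attention(feat):
    """Apply channel then spatial attention in sequence."""
    return spatial_attention(channel_attention(feat))
```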
The segmentation results allow computation of the cup-to-disc ratio, an essential biomarker for glaucoma diagnosis. The network also incorporates a multi-scale feature fusion module, combining features at different resolutions to improve boundary delineation of the optic disc and cup.
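Given binary optic disc and cup masks, the vertical cup-to-disc ratio can be computed directly from the masks' vertical extents, as sketched below. The specification does not fix a particular CDR formula; the vertical-extent definition here is one common clinical convention, shown for illustration.

```python
import numpy as np

def vertical_cdr(disc_mask, cup_mask):
    """Vertical cup-to-disc ratio from binary segmentation masks (2-D 0/1 arrays)."""
    def vertical_extent(mask):
        rows = np.where(mask.any(axis=1))[0]   # rows containing any foreground
        return 0 if rows.size == 0 else rows[-1] - rows[0] + 1
    disc_h = vertical_extent(np.asarray(disc_mask) > 0)
    cup_h = vertical_extent(np.asarray(cup_mask) > 0)
    return cup_h / disc_h if disc_h else 0.0
```

A CDR well above the typical normal range (roughly 0.3 to 0.5) is the key structural indicator the classifier and clinicians both rely on.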
Data augmentation is performed using geometric transformations and elastic deformations, creating diverse synthetic samples. This step improves model generalization and robustness in real-world clinical scenarios.
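The geometric portion of the augmentation step can be sketched as below. Elastic deformations, as named in the disclosure, would typically require a smoothed random displacement field (e.g., via scipy's map_coordinates); this numpy-only sketch covers flips, 90-degree rotations, and small translations of a square image, with assumed (not disclosed) parameter ranges.

```python
import numpy as np

def augment(img, rng):
    """One geometrically transformed copy of a square 2-D image:
    random flips, a 90-degree rotation, and a small circular translation."""
    out = img.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]                    # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]                    # vertical flip
    out = np.rot90(out, k=rng.integers(0, 4))  # square input keeps its shape
    dy, dx = rng.integers(-3, 4, size=2)      # translation of up to 3 pixels
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))
    return out
```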
Feature extraction combines traditional wavelet scattering transforms with CNN-derived hierarchical features and transformer-derived global features. This hybrid feature representation improves discriminatory power between glaucomatous and non-glaucomatous images.
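The wavelet component of this hybrid representation can be illustrated with a single-level 2-D Haar decomposition. A full wavelet scattering transform (as named in the disclosure) cascades such filters with modulus nonlinearities and averaging; the sketch below shows only the first level and a simple energy summary, as an assumed simplification.

```python
import numpy as np

def haar2d(img):
    """One-level 2-D Haar decomposition of an even-sized grayscale image.
    Returns (LL, LH, HL, HH) sub-bands: coarse structure plus detail."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # approximation (low-low)
    lh = (a - b + c - d) / 4.0   # horizontal detail
    hl = (a + b - c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

def wavelet_features(img):
    """Mean absolute energy per sub-band -> 4-D texture descriptor."""
    return np.array([np.abs(band).mean() for band in haar2d(img)])
```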
Feature selection is optimized using swarm intelligence algorithms such as whale swarm optimization or seagull optimization, which refine the feature space to retain only the most informative attributes.
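The population-based search underlying such swarm selectors can be sketched generically: each agent holds a binary feature mask, agents drift toward the best mask found so far with random perturbations, and a class-separation score drives the search. This toy stand-in uses a Fisher-score fitness and simplified update rules; it is not the disclosed WSO or SFOA update equations.

```python
import numpy as np

def fisher_fitness(X, y, mask):
    """Class-separation score for the selected feature subset (higher is better)."""
    if not mask.any():
        return -np.inf
    Xs = X[:, mask]
    m0, m1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    v0, v1 = Xs[y == 0].var(axis=0), Xs[y == 1].var(axis=0)
    return float((((m0 - m1) ** 2) / (v0 + v1 + 1e-9)).mean())

def select_features(X, y, n_agents=20, n_iters=30, seed=0):
    """Population-based binary feature selection."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    agents = rng.random((n_agents, n_feat)) < 0.5
    best_mask, best_fit = agents[0].copy(), fisher_fitness(X, y, agents[0])
    for _ in range(n_iters):
        for i in range(n_agents):
            # pull the agent toward the global best, then apply random bit flips
            copy_from_best = rng.random(n_feat) < 0.5
            agents[i] = np.where(copy_from_best, best_mask, agents[i])
            agents[i] ^= rng.random(n_feat) < 0.1
            fit = fisher_fitness(X, y, agents[i])
            if fit > best_fit:
                best_fit, best_mask = fit, agents[i].copy()
    return best_mask
```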
The classification stage employs GLAUCO-VITNET, a hybrid framework combining vision transformers and CNNs. Local spatial features are captured through convolutional modules, while transformers capture long-range dependencies and global context. The fusion of these complementary representations ensures robust classification even under variable imaging conditions.
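The late-fusion step of this stage can be reduced to a simple sketch: concatenate a local CNN feature vector with a global transformer feature vector and apply a linear sigmoid head. The backbones producing these vectors, and the weights W and b, are assumed placeholders here, not the disclosed architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_and_classify(cnn_feat, vit_feat, W, b):
    """Concatenate local (CNN) and global (ViT) feature vectors, then apply
    a linear + sigmoid head. Returns the predicted probability of glaucoma."""
    fused = np.concatenate([cnn_feat, vit_feat])
    return float(sigmoid(W @ fused + b))
```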
The detection process is further enhanced by the inclusion of contextual modules such as graph convolutional networks to capture inter-feature relationships. The system outputs both segmentation masks for optic disc and cup and a classification label indicating presence or absence of glaucoma.
This invention is designed to operate across multiple publicly available datasets, including ORIGA, REFUGE, RIM-ONE, and DRISHTI-GS, ensuring high generalizability. Its modular architecture also allows integration with new datasets and adaptation to other retinal diseases.
The novelty lies in its hybridization of CNN and transformer paradigms within a single unified framework, dual-attentive U-Net segmentation, integration of handcrafted clinical features, and explainable attention mechanisms. This ensures clinical interpretability while maintaining high diagnostic performance.
The invention is applicable to healthcare settings, screening programs, and telemedicine platforms, enabling early glaucoma detection and reducing the burden on ophthalmologists.
Best Method of Working
The best method of working involves implementing the GLAUCO-VITNET system on retinal fundus images acquired from clinical settings or publicly available datasets. Images are preprocessed through normalization, contrast enhancement, and denoising. Segmentation of optic disc and cup is carried out using the Hybrid-Attentive U-Net with channel and spatial attention. The extracted features are processed through wavelet scattering transforms, CNN layers, and transformer blocks. Feature selection is optimized using swarm-based algorithms, and classification is performed using the hybrid CNN-transformer fusion module. Performance is evaluated by computing accuracy, sensitivity, specificity, and area under the curve (AUC) metrics. The method is best implemented with GPU acceleration for training and inference, ensuring scalability in real-time clinical screening environments.
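The evaluation metrics named above can be computed from ground-truth labels and predicted probabilities as sketched below, with AUC obtained via the rank-sum (Mann-Whitney) formulation; the 0.5 decision threshold is an assumed default.

```python
import numpy as np

def screening_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity (TPR), specificity (TNR), and AUC for binary
    glaucoma screening; scores are predicted probabilities in [0, 1]."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pred = scores >= threshold
    tp = int(np.sum(pred & (y_true == 1)))
    tn = int(np.sum(~pred & (y_true == 0)))
    fp = int(np.sum(pred & (y_true == 0)))
    fn = int(np.sum(~pred & (y_true == 1)))
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    # AUC = probability a positive case outscores a negative case (ties count half)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if len(pos) and len(neg) else 0.0
    return acc, sens, spec, auc
```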
ADVANTAGES OF THE INVENTION
1. Superior Feature Representation
• Combines CNN’s ability to extract low-level features (edges, textures) with the Transformer’s global context awareness.
• Leads to better discrimination between glaucomatous and healthy eyes, especially in subtle cases.
2. Hybrid-Attentive U-Net for Better Segmentation
• Attention modules within U-Net enhance focus on relevant regions (optic disc and cup), improving segmentation precision.
• Helps in accurate cup-to-disc ratio (CDR) estimation, which is crucial for glaucoma detection.
3. Improved Classification Accuracy
• Leveraging both CNN and ViT leads to higher classification accuracy, F1-score, and AUC.
• Better at handling imbalanced data and noisy fundus images compared to pure CNNs.
4. Greater Generalization
• Hybrid models tend to generalize better across diverse datasets and patient demographics.
• The combination avoids overfitting to local textures (a known issue in CNNs alone).

Claims:
1. A system for automated glaucoma detection from retinal images, comprising:
• a preprocessing module configured for normalization, contrast enhancement, and noise reduction;
• a segmentation module comprising a Hybrid-Attentive U-Net with CNN encoder, channel attention, and spatial attention for optic disc and cup segmentation;
• a data augmentation module configured to generate synthetic retinal images using geometric and elastic transformations;
• a feature extraction module configured to combine wavelet scattering transforms, convolutional features, and transformer-based contextual features;
• a feature selection module employing swarm optimization algorithms to refine discriminative attributes; and
• a classification module comprising a hybrid vision transformer-convolutional neural network fusion for glaucoma detection.
2. The system as claimed in claim 1, wherein the preprocessing module applies adaptive histogram equalization, gamma correction, and hybrid noise reduction filters.
3. The system as claimed in claim 1, wherein the segmentation module enables accurate estimation of cup-to-disc ratio for glaucoma assessment.
4. The system as claimed in claim 1, wherein the data augmentation module applies geometric transformations and elastic deformations to increase training diversity.
5. The system as claimed in claim 1, wherein the classification module integrates graph-based contextual learning for inter-feature relationship modeling.
6. A method for automated glaucoma detection from retinal fundus images, comprising the steps of:
• preprocessing retinal images by applying normalization, contrast enhancement, and denoising;
• segmenting optic disc and optic cup using a Hybrid-Attentive U-Net with CNN encoder, channel attention, and spatial attention;
• performing data augmentation to generate synthetic samples for training;
• extracting hybrid features combining wavelet transforms, convolutional layers, and transformer-based global representations;
• selecting optimal features using swarm intelligence optimization techniques; and
• classifying the retinal image using a hybrid transformer-convolutional neural network fusion model to detect glaucoma.
7. The method as claimed in claim 6, wherein the segmentation step includes multi-scale feature fusion for precise boundary delineation.
8. The method as claimed in claim 6, wherein the feature selection employs whale swarm optimization or equivalent algorithms.
9. The method as claimed in claim 6, wherein classification integrates both handcrafted clinical features and deep learning-derived features.
10. The method as claimed in claim 6, wherein the method outputs both glaucoma classification results and segmented optic disc and cup masks for clinical interpretation.

Documents

Application Documents

# Name Date
1 202541090185-STATEMENT OF UNDERTAKING (FORM 3) [22-09-2025(online)].pdf 2025-09-22
2 202541090185-REQUEST FOR EARLY PUBLICATION(FORM-9) [22-09-2025(online)].pdf 2025-09-22
3 202541090185-POWER OF AUTHORITY [22-09-2025(online)].pdf 2025-09-22
4 202541090185-FORM-9 [22-09-2025(online)].pdf 2025-09-22
5 202541090185-FORM FOR SMALL ENTITY(FORM-28) [22-09-2025(online)].pdf 2025-09-22
6 202541090185-FORM 1 [22-09-2025(online)].pdf 2025-09-22
7 202541090185-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [22-09-2025(online)].pdf 2025-09-22
8 202541090185-EVIDENCE FOR REGISTRATION UNDER SSI [22-09-2025(online)].pdf 2025-09-22
9 202541090185-EDUCATIONAL INSTITUTION(S) [22-09-2025(online)].pdf 2025-09-22
10 202541090185-DRAWINGS [22-09-2025(online)].pdf 2025-09-22
11 202541090185-DECLARATION OF INVENTORSHIP (FORM 5) [22-09-2025(online)].pdf 2025-09-22
12 202541090185-COMPLETE SPECIFICATION [22-09-2025(online)].pdf 2025-09-22