Abstract: The present invention discloses a hybrid computer-implemented system for automated glaucoma detection using retinal fundus images. The system integrates an EfficientNetB0-based convolutional feature extractor with a Transformer encoder to capture both local spatial details and global contextual dependencies of retinal structures. Preprocessing includes image resizing and normalization to ensure uniformity, while the EfficientNetB0 backbone generates robust feature vectors that are further refined by multi-head self-attention and positional encoding within the Transformer module. A classification head employing pooling and dense layers produces probabilistic outputs for glaucoma and non-glaucoma categories. Experimental validation on composite datasets demonstrates superior performance, achieving accuracy above 99% with high sensitivity and specificity, surpassing conventional CNN-only or transfer learning models. The invention is lightweight, scalable, and adaptable for clinical use, portable ophthalmic devices, and tele-ophthalmology platforms, thereby enabling early detection and prevention of vision loss. Accompanying Drawings [FIGS. 1-2]
Description: [001] The present invention relates generally to the field of medical imaging and computer-assisted diagnostics. More specifically, the invention concerns a hybrid artificial intelligence (AI)-based system that integrates a convolutional neural network (CNN) architecture, namely EfficientNet, with Transformer-based attention mechanisms for the automated detection of glaucoma from retinal fundus images.
[002] The invention lies at the intersection of ophthalmology, deep learning, and healthcare automation, addressing the need for accurate, scalable, and efficient diagnostic solutions in eye-care. It enables the development of automated screening tools, particularly suitable for clinical use, tele-ophthalmology, and deployment in resource-constrained environments where expert ophthalmologists may not be readily available.
BACKGROUND OF THE INVENTION
[003] Glaucoma is a chronic, progressive optic neuropathy and one of the leading causes of irreversible blindness worldwide. It is primarily characterized by damage to the optic nerve, often associated with elevated intraocular pressure, although normal-tension glaucoma also exists. In its early stages, glaucoma is typically asymptomatic, which makes early detection difficult and results in late diagnosis in many patients. Delayed identification significantly reduces the effectiveness of available treatments and may ultimately lead to permanent vision loss.
[004] Traditional clinical methods for glaucoma diagnosis include tonometry for intraocular pressure measurement, gonioscopy for angle assessment, pachymetry for corneal thickness evaluation, perimetry for visual field testing, and optical coherence tomography (OCT) for structural imaging of the optic nerve head and retinal nerve fiber layer. While these techniques are well established, they require expensive equipment, trained ophthalmologists, and are time-consuming, limiting their applicability in population-scale screening programs.
[005] Fundus photography, which captures high-resolution images of the retina, has emerged as a valuable, non-invasive, and relatively cost-effective modality for glaucoma assessment. Clinicians rely on features such as the cup-to-disc ratio, optic nerve head morphology, and retinal nerve fiber layer thinning for diagnosis. However, manual interpretation of fundus images is highly subjective, operator-dependent, and prone to inter-observer variability, which can lead to inconsistent diagnostic outcomes.
[006] The application of machine learning and, more recently, deep learning techniques to medical image analysis has demonstrated significant promise in addressing these challenges. Convolutional Neural Networks (CNNs) have achieved remarkable performance in image classification and segmentation tasks, enabling the automated detection of ocular pathologies, including glaucoma. Nevertheless, CNN-based approaches have inherent limitations, particularly their inability to fully capture long-range dependencies and global contextual relationships across an image, which are crucial in detecting subtle manifestations of glaucoma.
[007] Transformers, originally introduced for natural language processing, have recently been adapted to vision tasks and have proven capable of modeling global dependencies across image regions through multi-head self-attention mechanisms. Although these architectures achieve superior accuracy, they typically require large datasets and high computational resources, making them challenging to implement in resource-limited clinical environments.
[008] Hybrid architectures that combine CNNs and Transformers have therefore been proposed to leverage the strengths of both approaches. CNNs excel at local feature extraction, while Transformers capture global dependencies. However, existing hybrid solutions often suffer from excessive computational complexity, limited generalizability across diverse datasets, or lack of clinical validation.
[009] There is thus a pressing need for a diagnostic system that integrates computational efficiency, high accuracy, and clinical scalability. Such a system should be capable of analyzing large collections of retinal fundus images, provide consistent results across heterogeneous datasets, and be deployable in real-world settings ranging from specialized hospitals to rural screening programs.
[010] The present invention addresses these unmet needs by introducing a Hybrid EfficientNet–Transformer architecture specifically tailored for glaucoma detection. This architecture combines the lightweight and efficient design of EfficientNetB0 for local spatial feature extraction with the global attention capabilities of a Transformer encoder. The resulting system achieves high classification accuracy, reduces overfitting, and demonstrates robustness across multiple benchmark datasets.
[011] By offering a scalable, accurate, and computationally feasible solution, the invention fills a critical gap in the field of ophthalmic diagnostics. It provides an automated, objective, and accessible means of glaucoma detection, ultimately enabling early intervention, reducing the burden of blindness, and contributing to improved public health outcomes.
SUMMARY OF THE INVENTION
[012] The present invention provides a computer-implemented hybrid architecture for the automated detection of glaucoma using retinal fundus images. The system integrates a convolutional neural network, specifically EfficientNetB0, with a Transformer encoder to achieve high-precision classification of glaucomatous and normal cases. This combination effectively overcomes the limitations of conventional CNN-only models by incorporating both local feature extraction and global dependency modeling in a single, lightweight framework.
[013] In one embodiment, the invention utilizes EfficientNetB0 as the feature extraction backbone. The EfficientNetB0 model is designed with compound scaling, depthwise separable convolutions, and squeeze-and-excitation blocks, enabling it to capture fine-grained retinal structures while remaining computationally efficient. The extracted feature vectors are subsequently refined through Transformer encoder layers, which employ multi-head self-attention mechanisms and positional encoding to capture global structural relationships across the optic nerve head and surrounding regions.
[014] The invention further comprises a classification head, which aggregates the refined feature vectors through pooling operations, followed by dense layers and a softmax activation layer to provide probabilistic outputs corresponding to glaucoma and non-glaucoma categories. The architecture is trained using labeled datasets comprising a wide variety of fundus images, including those from ACRIMA, Drishti-GS1, HRF, RIM-One, and ORIGA-LIGHT, ensuring robustness and generalizability across diverse populations.
[015] Experimental validation of the system demonstrates its superior performance, achieving an accuracy exceeding 99% with minimal overfitting. Comparative evaluations against transfer learning models such as VGG16 and ResNet50, as well as traditional classifiers such as support vector machines, show that the proposed hybrid model significantly outperforms existing approaches in terms of sensitivity, specificity, and F1-score.
[016] The invention also provides scalability and adaptability. It is designed for deployment in multiple environments, including clinical diagnostic centers, portable ophthalmic screening devices, and tele-ophthalmology platforms. Its lightweight design ensures compatibility with standard computational resources, making it practical for real-time application and use in low-resource healthcare settings.
[017] The hybrid EfficientNet–Transformer system disclosed herein thus represents a novel and effective solution to the long-standing challenge of early glaucoma detection. By combining advanced machine learning techniques with clinical applicability, the invention enables accurate, consistent, and objective diagnosis, thereby facilitating timely treatment and reducing the risk of irreversible blindness.
[018] Accordingly, the invention offers substantial advantages over prior art systems, including improved diagnostic accuracy, reduced dependency on human expertise, efficient computational requirements, and the potential for extension to other ophthalmic diseases.
BRIEF DESCRIPTION OF THE DRAWINGS
[019] The accompanying figures, which are incorporated herein and form part of the present specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention:
[020] Figure 1 illustrates a representative retinal fundus image obtained from the ACRIMA dataset, depicting the optic disc and optic cup regions.
[021] Figure 2 provides a schematic block diagram of the proposed Hybrid EfficientNet–Transformer system architecture.
DETAILED DESCRIPTION OF THE INVENTION
[022] The present invention provides a hybrid computer-implemented system for the automated detection of glaucoma from retinal fundus images. The invention addresses limitations of existing diagnostic techniques by integrating EfficientNetB0-based convolutional neural networks for local feature extraction with Transformer encoders for global context modeling. This unique combination allows the system to capture both fine-grained structural details and long-range dependencies across the retina, thereby improving diagnostic accuracy.
[023] Figure 1 illustrates a representative fundus image used in glaucoma detection, obtained from the ACRIMA dataset. The optic disc (OD) and optic cup (OC) regions are indicated, as these are critical clinical markers in assessing glaucomatous damage. A normal eye typically exhibits a cup-to-disc ratio (CDR) in the range of 0.2 to 0.3, whereas glaucomatous eyes often present with a CDR greater than 0.6. The invention automates the detection of such abnormalities, reducing reliance on manual assessment by ophthalmologists.
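The CDR-based screening rule described above can be sketched as a small helper. This is a hypothetical illustration, not the invention's actual measurement pipeline: the function names, the assumption of manually measured vertical diameters, and the 0.6 threshold parameter are illustrative; the disclosed system infers glaucomatous status directly from the image.

```python
def cup_to_disc_ratio(cup_diameter: float, disc_diameter: float) -> float:
    """Return the cup-to-disc ratio (CDR) from measured vertical diameters.

    Hypothetical helper: the invention derives glaucomatous status
    automatically from the fundus image rather than from such measurements.
    """
    if disc_diameter <= 0:
        raise ValueError("disc diameter must be positive")
    return cup_diameter / disc_diameter


def is_glaucoma_suspect(cdr: float, threshold: float = 0.6) -> bool:
    # CDR above ~0.6 is the suspect range cited in the description;
    # 0.2-0.3 is typical of healthy eyes.
    return cdr > threshold
```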
[024] Figure 2 provides a schematic block diagram of the invention. The system comprises:
(a) an input module for receiving and preprocessing retinal fundus images;
(b) a feature extraction module utilizing EfficientNetB0;
(c) a global context modeling module based on Transformer encoders; and
(d) a classification head for generating diagnostic predictions.
[025] In operation, retinal fundus images are first acquired from digital fundus cameras or publicly available datasets. The input module preprocesses these images by converting them into RGB format, resizing them to 224×224 pixels, and normalizing pixel intensities using ImageNet mean and standard deviation values. These preprocessing steps ensure consistency and compatibility with the EfficientNetB0 backbone. Data augmentation techniques such as rotation, flipping, and scaling may also be applied to improve robustness.
[026] The EfficientNetB0 feature extraction module comprises depthwise separable convolutions organized into Mobile Inverted Bottleneck Convolution (MBConv) blocks. Each MBConv block incorporates squeeze-and-excitation units that apply channel-wise attention to emphasize critical features, such as the optic cup boundary and retinal nerve fiber layer variations. Mathematically, each block first expands channels through a 1×1 pointwise convolution, applies a depthwise convolution over the expanded channels, reweights the channels through squeeze-and-excitation attention, and finally projects the result back to the block's output dimensionality.
The EfficientNetB0 backbone outputs a 1280-dimensional feature vector for each input image.
[027] The Transformer encoder module refines the feature vector obtained from EfficientNetB0 by applying multi-head self-attention and positional encoding. Each encoder block consists of eight attention heads, feed-forward layers, residual connections, and normalization layers. The self-attention mechanism enables the model to capture long-range dependencies across the optic nerve head, blood vessels, and retinal structures that may not be locally connected but are clinically relevant to glaucoma diagnosis.
[028] Positional encoding is applied to preserve spatial information in the sequence of features, ensuring that the Transformer can differentiate between structural regions of the retina. The feed-forward network introduces nonlinearity and improves the representation power of the model. Residual connections aid gradient flow during training, while normalization stabilizes and accelerates convergence. The refined features maintain the same dimensionality (B, 1, 1280), allowing seamless integration with the classification head.
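The encoder stage described in paragraphs [027]–[028] can be sketched with PyTorch's built-in Transformer encoder. Eight heads follow paragraph [027] and two layers follow claim 4; the feed-forward width is an assumption not stated in the text. With a sequence length of one (the (B, 1, 1280) shape above), the positional encoding is trivial, so a zero tensor stands in for it here.

```python
import torch
import torch.nn as nn

D_MODEL, N_HEADS, N_LAYERS = 1280, 8, 2   # heads per [027]; layer count per claim 4

layer = nn.TransformerEncoderLayer(
    d_model=D_MODEL,
    nhead=N_HEADS,
    dim_feedforward=2048,                 # assumed width; not stated in the text
    batch_first=True,                     # inputs shaped (B, S, D)
)
encoder = nn.TransformerEncoder(layer, num_layers=N_LAYERS)

x = torch.randn(4, 1, D_MODEL)            # backbone features reshaped to (B, 1, 1280)
pos = torch.zeros_like(x)                 # positional encoding; trivial when S == 1
out = encoder(x + pos)                    # refined features, same shape as the input
```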
[029] The classification head comprises an average pooling layer, fully connected dense layers, and a final softmax output layer. The average pooling operation reduces feature redundancy and aggregates information across channels. The dense layers map refined features into class probabilities, and the softmax function generates final outputs corresponding to the categories "glaucoma" and "normal."
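The classification head of paragraph [029] can be sketched as follows. The hidden width and dropout rate are illustrative assumptions; only the pooling-dense-softmax structure and the two output categories come from the specification.

```python
import torch
import torch.nn as nn


class ClassificationHead(nn.Module):
    """Average pooling + dense layers + softmax, per [029].

    The hidden width (256) and dropout rate (0.3) are assumed values.
    """

    def __init__(self, dim: int = 1280, hidden: int = 256, num_classes: int = 2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, S, dim) refined features; average-pool over the sequence axis
        pooled = x.mean(dim=1)                        # (B, dim)
        return torch.softmax(self.fc(pooled), dim=-1)  # class probabilities


head = ClassificationHead()
head.eval()
probs = head(torch.randn(3, 1, 1280))                 # "glaucoma" vs "normal"
```

At training time the raw logits (the output of `self.fc`) would be fed to a cross-entropy loss directly, with softmax reserved for inference.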
[030] The system is trained using a composite dataset that includes ACRIMA, Drishti-GS1, HRF, RIM-One, and ORIGA-LIGHT. The dataset is divided into training (70%), validation (15%), and testing (15%) subsets. Label encoding is employed to convert categorical labels into numerical values compatible with the cross-entropy loss function. Training is performed using the Adam optimizer with a learning rate adjusted via a scheduling policy.
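The split-and-train procedure of paragraph [030] can be sketched as follows, using synthetic tensors in place of the composite fundus dataset. The StepLR schedule, batch size, and the small stand-in classifier are assumptions; the 70/15/15 split, integer label encoding, cross-entropy loss, and Adam optimizer are as disclosed.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader, random_split

# Synthetic stand-in for extracted features + integer-encoded labels;
# the real system trains on the composite fundus dataset described above.
X, y = torch.randn(200, 1280), torch.randint(0, 2, (200,))
dataset = TensorDataset(X, y)

# 70% / 15% / 15% split per [030]
n_train = int(0.70 * len(dataset))
n_val = int(0.15 * len(dataset))
train_ds, val_ds, test_ds = random_split(
    dataset,
    [n_train, n_val, len(dataset) - n_train - n_val],
    generator=torch.Generator().manual_seed(0),
)

model = nn.Sequential(nn.Linear(1280, 256), nn.ReLU(), nn.Linear(256, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# StepLR stands in for the unspecified learning-rate scheduling policy
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
loss_fn = nn.CrossEntropyLoss()           # expects raw logits + integer labels

for epoch in range(2):                    # short run for illustration only
    for xb, yb in DataLoader(train_ds, batch_size=32, shuffle=True):
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
    scheduler.step()
```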
[031] The invention achieves high diagnostic performance, with training and validation losses approaching zero and minimal overfitting observed. Comparative experiments demonstrate that the proposed hybrid system achieves 99.7% accuracy, outperforming conventional CNN models such as VGG16, ResNet50, and support vector machine classifiers. Confusion matrix analysis shows accurate classification of 2250 glaucomatous and 2249 normal cases out of 4500 test samples, with only a single misclassification.
[032] The system is suitable for real-world deployment in various environments. It can be integrated into clinical workflows as a decision-support tool for ophthalmologists, embedded in portable fundus cameras for community screening programs, or incorporated into cloud-based tele-ophthalmology platforms for remote diagnosis. Its lightweight architecture ensures feasibility even in low-resource healthcare settings.
[033] The invention is not limited to glaucoma detection. By retraining the hybrid EfficientNet–Transformer architecture with appropriate datasets, the system can be extended to other ophthalmic diseases such as diabetic retinopathy, age-related macular degeneration, and hypertensive retinopathy. This adaptability makes the invention a scalable platform for AI-powered ophthalmic screening.
[034] In summary, the detailed architecture of the present invention provides an efficient and highly accurate means of automated glaucoma detection. By combining the strengths of CNN-based local feature extraction and Transformer-based global context modeling, the invention delivers superior performance compared to existing methods, ensures scalability for real-world use, and contributes significantly to the prevention of irreversible blindness.
[035] The present invention introduces a novel hybrid EfficientNet–Transformer architecture designed specifically for the automated detection of glaucoma from retinal fundus images. By combining lightweight convolutional neural network modules for local feature extraction with Transformer encoders for global contextual modeling, the invention achieves superior diagnostic accuracy, robustness, and scalability compared to existing approaches. The system demonstrates a diagnostic accuracy exceeding 99%, with minimal misclassification, thereby establishing its reliability for clinical deployment.
[036] One of the key advantages of the invention lies in its computational efficiency and adaptability. Unlike conventional deep learning architectures that demand significant hardware resources, the proposed system is optimized for use in diverse healthcare environments, including hospitals, diagnostic centers, mobile screening units, and tele-ophthalmology platforms. This ensures that early glaucoma detection becomes accessible to both urban and rural populations, particularly in regions with limited access to specialist ophthalmologists.
[037] The invention also provides a scalable framework for broader medical applications. While the system has been demonstrated for glaucoma detection, the hybrid architecture can be retrained with domain-specific datasets to identify other ocular diseases such as diabetic retinopathy, macular degeneration, and hypertensive retinopathy. Beyond ophthalmology, the same architecture may be extended to other medical imaging tasks, including dermatology, radiology, and oncology, where both local structural features and global dependencies are critical for accurate diagnosis.
[038] Future developments of the invention may include integration with multi-modal imaging data, such as optical coherence tomography (OCT) and scanning laser ophthalmoscopy (SLO), to enhance diagnostic precision. Incorporating explainable AI (XAI) techniques will further allow clinicians to interpret system outputs, thereby increasing trust and facilitating adoption in clinical practice. Additionally, cloud-based implementations of the invention may support large-scale screening programs, enabling real-time population-level monitoring of glaucoma and other eye diseases.
[039] In conclusion, the invention represents a significant advancement in AI-driven ophthalmic diagnostics, addressing longstanding limitations in manual screening and CNN-only models. Its combination of accuracy, efficiency, and adaptability positions it as a transformative solution in the fight against preventable blindness. With further refinements and clinical integration, the invention is expected to play a pivotal role in reshaping eye-care delivery, empowering healthcare systems worldwide to provide timely, affordable, and accurate glaucoma detection.
Claims: 1. A computer-implemented system for automated glaucoma detection comprising:
(a) an input module configured to receive and preprocess retinal fundus images;
(b) a convolutional feature extraction module employing an EfficientNetB0 backbone for generating spatial feature vectors;
(c) a Transformer encoder module configured to refine said feature vectors using multi-head self-attention and positional encoding; and
(d) a classification head comprising pooling layers, dense layers, and a softmax activation function for generating diagnostic outputs.
2. The system of claim 1, wherein the preprocessing includes resizing input images to 224×224 pixels and normalizing pixel values based on ImageNet statistics to ensure consistency across datasets.
3. The system of claim 1, wherein the EfficientNetB0 feature extractor comprises Mobile Inverted Bottleneck Convolution (MBConv) blocks with squeeze-and-excitation units for enhancing channel-wise feature representation.
4. The system of claim 1, wherein the Transformer encoder comprises at least two encoder layers with eight attention heads each, configured to capture long-range dependencies across the optic disc and optic cup regions.
5. The system of claim 1, wherein the classification head is trained using a composite dataset comprising ACRIMA, Drishti-GS1, HRF, RIM-One, and ORIGA-LIGHT fundus images, thereby improving robustness across heterogeneous populations.
6. The system of claim 1, wherein the system achieves a diagnostic accuracy of at least 99%, with sensitivity and specificity exceeding 99% when evaluated on a test dataset of fundus images.
7. The system of claim 1, wherein the system is deployed on a portable ophthalmic screening device or tele-ophthalmology platform, enabling real-time glaucoma detection in resource-constrained healthcare environments.
8. A method for automated detection of glaucoma, comprising the steps of:
(a) receiving retinal fundus images as input;
(b) preprocessing the images into a standardized format;
(c) extracting spatial features using an EfficientNetB0 backbone;
(d) refining said features using a Transformer encoder; and
(e) classifying the processed features into glaucoma and non-glaucoma categories using a softmax classifier.
9. A non-transitory computer-readable medium storing instructions which, when executed by one or more processors, cause the processor(s) to perform the method according to claim 8.
10. The system of claim 1, wherein the hybrid architecture is further configured to be extended for detection of other ophthalmic diseases, including diabetic retinopathy, age-related macular degeneration, and hypertensive retinopathy, by retraining on corresponding annotated datasets.
| # | Name | Date |
|---|---|---|
| 1 | 202541080997-STATEMENT OF UNDERTAKING (FORM 3) [26-08-2025(online)].pdf | 2025-08-26 |
| 2 | 202541080997-REQUEST FOR EARLY PUBLICATION(FORM-9) [26-08-2025(online)].pdf | 2025-08-26 |
| 3 | 202541080997-FORM-9 [26-08-2025(online)].pdf | 2025-08-26 |
| 4 | 202541080997-FORM 1 [26-08-2025(online)].pdf | 2025-08-26 |
| 5 | 202541080997-DRAWINGS [26-08-2025(online)].pdf | 2025-08-26 |
| 6 | 202541080997-DECLARATION OF INVENTORSHIP (FORM 5) [26-08-2025(online)].pdf | 2025-08-26 |
| 7 | 202541080997-COMPLETE SPECIFICATION [26-08-2025(online)].pdf | 2025-08-26 |