Abstract: A MULTI-MODAL AI SYSTEM INTEGRATING FUNDUS, OCT, AND HYPERSPECTRAL IMAGES FOR SUPERIOR DR DETECTION A multi-modal artificial intelligence system and method for diabetic retinopathy detection are disclosed. Fundus, optical coherence tomography and hyperspectral retinal images are acquired and preprocessed to remove artefacts and enhance quality. Modality-specific convolutional neural network encoders extract complementary structural, cross-sectional and biochemical features from each image type. An attention-based fusion module concatenates and dynamically weights the extracted features to form a fused representation. A classification module uses the fused representation to assign a diabetic retinopathy stage, and an output module displays the result along with explanatory heatmaps highlighting the image regions that contributed to the decision. By integrating fundus, optical coherence tomography and hyperspectral imaging within a single end-to-end deep learning architecture, the invention achieves improved sensitivity, specificity and clinical interpretability compared to single-modality or dual-modality systems, enabling earlier and more reliable diabetic retinopathy screening in hospitals, clinics and telemedicine settings.
Description: FIELD OF THE INVENTION
The present invention relates to the field of artificial intelligence (AI)-based medical imaging and diagnostics, particularly to the automated detection of Diabetic Retinopathy (DR). More specifically, it pertains to a novel multi-modal system that integrates Fundus Photography, Optical Coherence Tomography (OCT), and Hyperspectral Imaging (HSI) using advanced deep learning techniques for improved accuracy and early-stage detection of DR.
BACKGROUND OF THE INVENTION
Current diagnostic systems often rely on a single imaging modality such as fundus images or OCT. While effective to an extent, each modality alone is limited by resolution, field of view, or contrast detail. Hyperspectral imaging has recently shown potential in capturing subtle biochemical and structural changes in retinal tissue that are not visible in standard imaging.
However, no current system integrates all three modalities using a unified AI architecture. This invention addresses this gap by combining the strengths of each modality to deliver significantly enhanced diagnostic performance.
Diabetic Retinopathy (DR) is a leading cause of vision loss globally, especially among working-age populations. Early and accurate detection is critical for timely intervention and preventing irreversible damage. Traditional DR screening methods often rely on single imaging modalities such as fundus photography, which, although useful, may miss subtle pathological features visible in other imaging techniques. Optical Coherence Tomography (OCT) provides depth-resolved information, while hyperspectral imaging offers spectral data that can reveal biochemical changes at the cellular level. However, these modalities are typically analyzed in isolation, limiting diagnostic accuracy and efficiency.
There exists a critical gap in leveraging the complementary strengths of these imaging technologies in an integrated, AI-powered diagnostic system. Current approaches do not effectively utilize multi-modal data fusion, leading to suboptimal sensitivity and specificity in DR detection. Furthermore, there is a lack of robust, scalable frameworks that can process and interpret this complex data in a clinically meaningful way.
US20250014155A1: Embodiments disclose systems and methods that aid in screening, diagnosis and/or monitoring of medical conditions. The systems and methods may allow, for example, for automated identification and localization of lesions and other anatomical structures from medical data obtained from medical imaging devices, computation of image-based biomarkers including quantification of dynamics of lesions, and/or integration with telemedicine services, programs, or software.
US9749547B2: Systems and methods for implementing array cameras configured to perform super-resolution processing to generate higher resolution super-resolved images using a plurality of captured images and lens stack arrays that can be utilized in array cameras are disclosed. An imaging device in accordance with one embodiment of the invention includes at least one imager array, and each imager in the array comprises a plurality of light sensing elements and a lens stack including at least one lens surface, where the lens stack is configured to form an image on the light sensing elements, control circuitry configured to capture images formed on the light sensing elements of each of the imagers, and a super-resolution processing module configured to generate at least one higher resolution super-resolved image using a plurality of the captured images.
Diabetic retinopathy is a leading cause of vision loss worldwide. Early detection can prevent irreversible damage, but current screening tools typically rely on a single imaging modality such as fundus photography or OCT. This siloed approach misses complementary information and subtle features, resulting in reduced sensitivity and specificity. Hyperspectral imaging can reveal biochemical changes but has not been integrated into mainstream diagnostic systems. Existing AI models are either single-modality or lack real-time multi-modal fusion. The present invention addresses these shortcomings by providing an AI-powered platform that fuses structural, spatial and spectral data from three distinct modalities, significantly improving diagnostic accuracy and enabling early-stage detection of diabetic retinopathy.
SUMMARY OF THE INVENTION
This summary is provided to introduce a selection of concepts, in a simplified format, that are further described in the detailed description of the invention.
This summary is not intended to identify key or essential inventive concepts of the invention, nor is it intended to determine the scope of the invention.
The invention provides a deep learning-based diagnostic system that ingests and processes fundus, OCT and hyperspectral retinal images. Each modality is preprocessed to remove artefacts and normalise inputs. Modality-specific convolutional neural network (CNN) encoders extract features unique to each imaging technique.
An attention-based fusion layer dynamically weights the extracted features according to their relevance in a given case. The fused representation is passed through fully connected layers to classify diabetic retinopathy into its various stages (no DR, mild, moderate, severe, proliferative).
An optional explainability module generates heatmaps or attention maps to show clinicians which regions of which modality contributed most to the prediction.
By integrating multiple imaging modalities and advanced feature fusion within one end-to-end trainable architecture, the invention delivers higher sensitivity, specificity and clinical trustworthiness than existing single-modality or dual-modality systems.
To further clarify advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The illustrated embodiments of the subject matter will be understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and methods that are consistent with the subject matter as claimed herein, wherein:
FIGURE 1: SYSTEM ARCHITECTURE
The figures depict embodiments of the present subject matter for the purposes of illustration only. A person skilled in the art will easily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION OF THE INVENTION
The detailed description of various exemplary embodiments of the disclosure is described herein with reference to the accompanying drawings. It should be noted that the embodiments are described herein in such details as to clearly communicate the disclosure. However, the amount of details provided herein is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
It is also to be understood that various arrangements may be devised that, although not explicitly described or shown herein, embody the principles of the present disclosure. Moreover, all statements herein reciting principles, aspects, and embodiments of the present disclosure, as well as specific examples, are intended to encompass equivalents thereof.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
In addition, the descriptions of “first,” “second,” “third,” and the like in the present invention are used for the purpose of description only, and are not to be construed as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. Thus, features defining “first” and “second” may include at least one of the features, either explicitly or implicitly.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The proposed invention introduces a deep learning-based diagnostic system that ingests and processes three different image types—fundus, OCT, and hyperspectral images—to detect and classify the severity of diabetic retinopathy.
This multi-modal system uses a feature-level fusion approach, where features extracted from each modality via modality-specific CNN backbones are fused and passed through an attention-based fusion network. The output is a probabilistic prediction of DR stage (No DR, Mild, Moderate, Severe, Proliferative).
Key components include:
• Image preprocessing module for normalization, denoising, and contrast enhancement specific to each modality.
• Modality-specific encoders (e.g., ResNet, EfficientNet, or custom CNNs).
• Fusion layer that performs concatenation and attention-weighted feature integration.
• Classification layer employing dense layers and softmax output.
• Optional explainability module using Grad-CAM or SHAP for clinical interpretation.
Detailed Operation:
a. Data Acquisition:
• Fundus images are captured using traditional fundus cameras.
• OCT scans are obtained from cross-sectional retinal imaging devices.
• Hyperspectral images are acquired using hyperspectral cameras tuned for ophthalmic use.
b. Preprocessing:
• Each modality undergoes custom preprocessing pipelines: resizing, spectral denoising (for HSI), and artifact removal.
c. Feature Extraction:
• CNN models extract features independently from each image type.
• OCT features may include retinal layer segmentation; fundus features capture lesions (microaneurysms, hemorrhages); HSI features highlight biochemical markers.
d. Feature Fusion and Learning:
• Features are concatenated and passed through an attention module that learns modality importance weights dynamically.
• Final classification is performed using fully connected layers with softmax activation.
e. Explainability:
• Grad-CAM or attention heatmaps are generated for each input to highlight regions contributing to the prediction.
System Implementation Workflow:
Step 1: Image Acquisition and Preprocessing
• Fundus images: resized, contrast-enhanced, and color-normalized.
• OCT images: undergo retinal layer segmentation and denoising.
• HSI data: spectral bands are denoised, normalized, and selected using PCA or band-selection filters to reduce dimensionality.
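The PCA-based dimensionality reduction in Step 1 can be sketched as follows. This is a minimal illustration, assuming a synthetic hyperspectral cube of shape (height, width, bands); the cube size, band count and number of retained components are illustrative assumptions, not values prescribed by the invention:

```python
import numpy as np

def reduce_hsi_bands(cube: np.ndarray, n_components: int = 16) -> np.ndarray:
    """Reduce a hyperspectral cube (H, W, bands) to n_components
    spectral channels via PCA along the band axis."""
    h, w, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)
    pixels -= pixels.mean(axis=0)                 # centre each spectral band
    # SVD of the centred pixel matrix yields the principal spectral directions
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    projected = pixels @ vt[:n_components].T      # project onto top components
    return projected.reshape(h, w, n_components)

cube = np.random.rand(64, 64, 100)                # synthetic 100-band cube
reduced = reduce_hsi_bands(cube, n_components=16)
print(reduced.shape)                              # (64, 64, 16)
```

In practice a learned band-selection filter could replace the PCA projection; the key point is that the spectral axis is compressed before the HSI encoder sees the data.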
Step 2: Feature Extraction via CNNs
• Each modality is passed through a modality-specific convolutional neural network (CNN):
o Fundus: ResNet50 or EfficientNet
o OCT: U-Net or 3D CNN for layer features
o HSI: Spectral CNN (3D-CNN or spectral-spatial hybrid model)
These CNNs extract deep feature vectors that represent structural and biochemical patterns.
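The per-modality encoding in Step 2 can be sketched with small stand-in CNNs rather than the full ResNet50/U-Net/3D-CNN backbones named above; all channel counts, image sizes and the 128-dimensional feature width are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SmallEncoder(nn.Module):
    """Stand-in for a modality-specific backbone (e.g. ResNet50 for fundus);
    maps an image tensor to a fixed-length feature vector."""
    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # global average pool
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x).flatten(1))

# One encoder per modality; channel counts are illustrative assumptions.
fundus_enc = SmallEncoder(in_channels=3)    # RGB fundus photograph
oct_enc    = SmallEncoder(in_channels=1)    # greyscale OCT B-scan
hsi_enc    = SmallEncoder(in_channels=16)   # PCA-reduced hyperspectral cube

feats = [fundus_enc(torch.randn(2, 3, 224, 224)),
         oct_enc(torch.randn(2, 1, 224, 224)),
         hsi_enc(torch.randn(2, 16, 224, 224))]
print([f.shape for f in feats])             # each (2, 128)
```

The essential property illustrated is that three differently shaped inputs are mapped into feature vectors of a common width, so that downstream fusion can operate uniformly.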
Step 3: Attention-Based Feature Fusion
• Extracted features are concatenated and passed through a multi-head self-attention module.
• The attention layer learns the importance of each modality dynamically per case.
• This allows, for example, hyperspectral features to be weighted more heavily in early DR, while OCT features may dominate in advanced stages.
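The fusion step in Step 3 can be sketched with PyTorch's built-in multi-head attention, treating each modality's feature vector as one token; the 128-dimensional feature width and four attention heads are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Treats each modality's feature vector as a token and applies
    multi-head self-attention, so per-case modality weights are learned."""
    def __init__(self, feat_dim: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)

    def forward(self, fundus, oct_, hsi):
        tokens = torch.stack([fundus, oct_, hsi], dim=1)  # (B, 3, feat_dim)
        fused, weights = self.attn(tokens, tokens, tokens)
        # mean-pool the attended tokens into one fused vector per case
        return fused.mean(dim=1), weights

fusion = AttentionFusion()
fused, w = fusion(torch.randn(2, 128), torch.randn(2, 128), torch.randn(2, 128))
print(fused.shape, w.shape)   # (2, 128) and (2, 3, 3)
```

The returned (3 × 3) attention matrix exposes how strongly each modality attended to each other modality for a given case, which is the quantity the explainability layer can surface.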
Step 4: Classification & Severity Grading
• The fused feature vector is passed through fully connected layers.
• Final output is a softmax-based classification into stages:
o No DR
o Mild Non-Proliferative DR
o Moderate NPDR
o Severe NPDR
o Proliferative DR
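The classification head in Step 4 can be sketched as a small fully connected stack over the fused vector; the layer sizes, dropout rate and 128-dimensional input are illustrative assumptions:

```python
import torch
import torch.nn as nn

STAGES = ["No DR", "Mild NPDR", "Moderate NPDR", "Severe NPDR",
          "Proliferative DR"]

classifier = nn.Sequential(        # fully connected head over the fused vector
    nn.Linear(128, 64), nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(64, len(STAGES)),    # logits; softmax applied at inference
)

fused = torch.randn(2, 128)        # fused features from the attention module
probs = torch.softmax(classifier(fused), dim=1)
pred = [STAGES[i] for i in probs.argmax(dim=1)]
print(probs.shape, probs.sum(dim=1))   # (2, 5); each row sums to 1
```

During training the softmax would normally be folded into a cross-entropy loss over the five stage labels rather than applied explicitly.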
Step 5: Explainability Layer (Optional but Important)
• Using Grad-CAM or attention heatmaps, the system highlights key retinal regions from each modality that contributed to the prediction.
• This output helps clinicians verify and trust the diagnosis.
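The heatmap generation in Step 5 can be sketched with a minimal hook-based Grad-CAM. The toy model below is a stand-in for a real modality encoder, and hook-based gradient capture is one common way to implement Grad-CAM, not necessarily the invention's exact mechanism:

```python
import torch
import torch.nn as nn

def grad_cam(model: nn.Module, target_layer: nn.Module,
             image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Minimal Grad-CAM: weight the target layer's activations by the
    spatially pooled gradient of the class score, then apply ReLU."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    score = model(image)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = torch.relu((weights * acts["a"]).sum(dim=1))   # weighted activations
    return cam / (cam.max() + 1e-8)                      # normalise to [0, 1]

# Toy model standing in for a modality encoder plus 5-class head.
conv = nn.Conv2d(3, 8, 3, padding=1)
model = nn.Sequential(conv, nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 5))
heatmap = grad_cam(model, conv, torch.randn(1, 3, 32, 32), class_idx=2)
print(heatmap.shape)   # (1, 32, 32)
```

The resulting map is typically upsampled to the input resolution and overlaid on the original image for clinician review.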
ADVANTAGES OF THE INVENTION:
• Improved diagnostic accuracy compared to single-modality models.
• Early-stage DR detection enabled by HSI’s biochemical sensitivity.
• Clinically interpretable results via explainable AI tools.
• Generalizable architecture suitable for other ophthalmic diseases (e.g., AMD, glaucoma).
The proposed invention presents a novel and non-obvious approach to diabetic retinopathy (DR) detection by introducing a unified AI-based diagnostic system that integrates three distinct ophthalmic imaging modalities—fundus photography, optical coherence tomography (OCT), and hyperspectral imaging (HSI)—into a single end-to-end deep learning framework.
The invention consists of three main input sources: fundus images, OCT scans and hyperspectral retinal images acquired using standard ophthalmic imaging equipment.
Each modality undergoes a dedicated preprocessing pipeline. Fundus images are resized, denoised and contrast-enhanced. OCT scans may include retinal layer segmentation and speckle noise removal. Hyperspectral images are spectrally denoised, normalised and band-selected to reduce dimensionality.
Following preprocessing, each modality is processed by its own convolutional neural network encoder optimised for that image type. The fundus encoder captures wide-field structural lesions such as microaneurysms and hemorrhages. The OCT encoder extracts cross-sectional features of retinal layers and macular edema. The hyperspectral encoder captures biochemical and metabolic signatures invisible in standard images.
The feature vectors output by the three encoders are concatenated and passed into an attention-based fusion module. This module learns dynamic weights that emphasise the most informative modality for a given patient or disease stage.
The fused feature representation is input to a fully connected classification head that produces a softmax probability over diabetic retinopathy stages.
An optional explainability component, such as Grad-CAM or attention heatmaps, highlights important image regions across all modalities for clinician review, increasing trust and transparency. The system can be deployed on cloud infrastructure or on-premise servers and accessed via a clinician-facing interface. The interface displays input images, predicted stage, and explanatory maps.
Training uses labelled multi-modal datasets of retinal images. The model can be fine-tuned for local populations or different imaging hardware.
The architecture tolerates missing modalities; if one imaging type is unavailable, the remaining modalities can still be processed with reduced weighting.
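One simple way this reduced-weighting behaviour could be realised is to renormalise per-modality base weights over the modalities actually present; the base weights below are purely illustrative assumptions, not values taught by the invention:

```python
import numpy as np

def fuse_with_missing(features: dict, base_weights: dict) -> np.ndarray:
    """Fuse available modality feature vectors; the weight of any missing
    modality is redistributed proportionally over those present."""
    present = {m: f for m, f in features.items() if f is not None}
    total = sum(base_weights[m] for m in present)
    return sum((base_weights[m] / total) * f for m, f in present.items())

weights = {"fundus": 0.4, "oct": 0.35, "hsi": 0.25}   # illustrative weights
feats = {"fundus": np.ones(128),       # fundus present
         "oct": None,                  # OCT unavailable for this patient
         "hsi": 2 * np.ones(128)}      # HSI present
fused = fuse_with_missing(feats, weights)
print(fused[0])   # 0.4/0.65 * 1 + 0.25/0.65 * 2 ≈ 1.3846
```

In the full attention-based system the same effect would be achieved by masking the missing modality's token so the attention weights renormalise over the remaining tokens.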
Security and privacy measures ensure all medical data are encrypted and handled according to healthcare standards.
The invention may be extended to other ophthalmic diseases such as age-related macular degeneration and glaucoma by retraining with appropriate data.
By combining structural, spatial and spectral information in one AI framework, the invention achieves earlier, more accurate and more interpretable diabetic retinopathy detection than current systems.
The preferred embodiment implements the system as a cloud-based service integrated with existing fundus cameras, OCT devices and hyperspectral imaging systems. Images from all three modalities are uploaded to the platform, preprocessed and passed through modality-specific CNN encoders. The attention-based fusion layer combines features and outputs a classification of diabetic retinopathy stage along with explanatory heatmaps. This configuration provides fast, automated, clinically interpretable screening suitable for hospitals, clinics and telemedicine environments.
Claims: 1. A multi-modal artificial intelligence system for diabetic retinopathy detection comprising:
an image acquisition module configured to receive fundus, optical coherence tomography and hyperspectral retinal images;
a preprocessing module configured to normalise, denoise and enhance each image modality;
a plurality of modality-specific convolutional neural network encoders configured to extract features from fundus, optical coherence tomography and hyperspectral images respectively;
an attention-based fusion module configured to concatenate and dynamically weight the extracted features to form a fused representation;
a classification module configured to classify diabetic retinopathy stage from the fused representation; and
an output module configured to display the classification result and explanatory maps to a user.
2. The system as claimed in claim 1, wherein the fundus encoder captures structural lesions, the optical coherence tomography encoder captures cross-sectional retinal features and the hyperspectral encoder captures biochemical signatures.
3. The system as claimed in claim 1, wherein the preprocessing module performs resizing, spectral denoising and band selection for hyperspectral data.
4. The system as claimed in claim 1, wherein the attention-based fusion module assigns modality weights dynamically according to disease stage.
5. The system as claimed in claim 1, wherein the output module provides heatmaps or attention maps highlighting regions contributing to the classification.
6. A method for multi-modal artificial intelligence-based diabetic retinopathy detection comprising:
a) acquiring fundus, optical coherence tomography and hyperspectral retinal images of a patient;
b) preprocessing each image modality to normalise, denoise and enhance it;
c) extracting features from each modality using modality-specific convolutional neural network encoders;
d) concatenating and dynamically weighting the extracted features in an attention-based fusion module to form a fused representation;
e) classifying diabetic retinopathy stage from the fused representation using a classification module; and
f) displaying the classification result and explanatory maps to a user.
7. The method as claimed in claim 6, wherein the hyperspectral images are denoised and band-selected to reduce dimensionality before feature extraction.
8. The method as claimed in claim 6, wherein the attention-based fusion module dynamically weights modalities according to their relevance for a given case.
9. The method as claimed in claim 6, wherein explanatory heatmaps highlight important retinal regions from each modality for clinical interpretation.
10. The method as claimed in claim 6, wherein the system tolerates missing modalities by adjusting fusion weights accordingly.
| # | Name | Date |
|---|---|---|
| 1 | 202541090646-STATEMENT OF UNDERTAKING (FORM 3) [23-09-2025(online)].pdf | 2025-09-23 |
| 2 | 202541090646-REQUEST FOR EARLY PUBLICATION(FORM-9) [23-09-2025(online)].pdf | 2025-09-23 |
| 3 | 202541090646-POWER OF AUTHORITY [23-09-2025(online)].pdf | 2025-09-23 |
| 4 | 202541090646-FORM-9 [23-09-2025(online)].pdf | 2025-09-23 |
| 5 | 202541090646-FORM FOR SMALL ENTITY(FORM-28) [23-09-2025(online)].pdf | 2025-09-23 |
| 6 | 202541090646-FORM 1 [23-09-2025(online)].pdf | 2025-09-23 |
| 7 | 202541090646-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [23-09-2025(online)].pdf | 2025-09-23 |
| 8 | 202541090646-EVIDENCE FOR REGISTRATION UNDER SSI [23-09-2025(online)].pdf | 2025-09-23 |
| 9 | 202541090646-EDUCATIONAL INSTITUTION(S) [23-09-2025(online)].pdf | 2025-09-23 |
| 10 | 202541090646-DRAWINGS [23-09-2025(online)].pdf | 2025-09-23 |
| 11 | 202541090646-DECLARATION OF INVENTORSHIP (FORM 5) [23-09-2025(online)].pdf | 2025-09-23 |
| 12 | 202541090646-COMPLETE SPECIFICATION [23-09-2025(online)].pdf | 2025-09-23 |