
An Advanced AI-Driven System for Multi-Modal, Interpretable, and Privacy-Preserving Detection of Abnormalities in Medical Imaging

Abstract: The invention discloses a system and method for multi-modal, interpretable, and privacy-preserving detection of abnormalities in medical imaging. The system integrates convolutional neural networks with transformer-based architectures to extract both local and global features, enabling accurate classification, segmentation, and localization of abnormalities. The invention supports multimodal imaging inputs, including MRI, CT, PET, and X-ray, in combination with patient metadata. Explainability is achieved through integrated Grad-CAM++, SHAP, and LIME modules, providing interpretable outputs for clinicians. The invention enables real-time deployment on edge devices and cloud platforms, ensuring adaptability across diverse healthcare settings. Federated learning allows collaborative model training across institutions without sharing raw patient data, thereby ensuring compliance with privacy regulations. By combining hybrid architectures, explainable AI, and privacy-preserving mechanisms, the invention provides a clinically feasible, scalable, and trustworthy diagnostic solution that enhances early and accurate detection of abnormalities in medical imaging.


Patent Information

Application #
Filing Date
22 September 2025
Publication Number
43/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

SR UNIVERSITY
ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Inventors

1. NATTALA SUBBARAYUDU
DEPARTMENT OF ECE, SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
2. SREEDHAR KOLLEM
DEPARTMENT OF ECE, SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
3. T. VENKATAKRISHNAMOORTHY
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Specification

Description:FIELD OF THE INVENTION
The present invention relates to the field of medical diagnostics using artificial intelligence. More particularly, it pertains to an advanced system and method for multi-modal, interpretable, and privacy-preserving detection of abnormalities in medical imaging through a hybrid convolutional neural network and transformer-based architecture, with federated learning capabilities and real-time explainability features.
BACKGROUND OF THE INVENTION
Despite advances in medical imaging systems, early and accurate detection of abnormalities remains a major clinical challenge. Conventional diagnostic methods depend on manual interpretation by radiologists and on the expertise of technicians, and are therefore susceptible to human error and constrained by inter-observer variability. Most artificial intelligence (AI) systems developed to date have shown promise in aiding diagnosis, but they typically do not operate in real time, do not translate readily into clinical workflows, and do not generalize beyond standard imaging patterns. Moreover, many conventional AI models behave as black boxes: their outputs and decisions are difficult to explain or justify, creating an obstacle to clinical acceptance. These systems also struggle to generalize across imaging modalities (e.g., MRI, CT, and ultrasound) and across patient populations. No existing AI-enhanced imaging system supports accurate, real-time interpretation and detection of abnormal conditions during active scans while also operating on multi-modal data, accounting for clinical context, and communicating with the operator without disrupting the diagnostic procedure.
US20220301666A1: This disclosure provides an efficient, hands-free system and method for capturing and recording patient treatment and physiological data in critical care environments. The systems and methods described herein enable clinicians to record and transcribe patient information and physiological data onto an individual disposable medical record tag, which accompanies the patient throughout initial stabilization and presentation to a treatment center. The data tag digitally stores a patient's health status, and displays a specific color based on a patient's degree of injury or if treatment is required. The data tag forms the center of a patient-centric network (PCN) of connected health devices. An artificial intelligence machine learning model is used in combination with predictive analytics to assess a patient's condition and provide clinical decision support for clinicians based on predictive analytical models.
US20240161035: A medical scan viewing system is configured to: generate inference data via at least one inference function, based on the at least one medical scan and further based on receiver operating characteristic (ROC) parameters that include at least one ROC set point; present for display, via an interactive user interface, medical image data corresponding to the at least one medical scan, the inference data and a ROC adjustment tool; generate, in response to user interaction with the ROC adjustment tool, at least one adjusted ROC set point; generate updated inference data via the at least one inference function, based on the at least one medical scan and further based on the at least one adjusted ROC set point; and present for display, via the interactive user interface, the medical image data corresponding to the at least one medical scan and the updated inference data.
Conventional diagnostic imaging relies heavily on human interpretation and is susceptible to error, delay, and inter-observer variation. Existing AI-based diagnostic systems lack explainability, are limited to single modalities, cannot perform multi-task detection, and are not optimized for real-time use in clinical workflows. Moreover, current systems often fail to ensure privacy in distributed medical data environments. The present invention addresses these limitations by providing a hybrid CNN-Transformer model capable of multimodal analysis, interpretable predictions, multi-task outputs, and federated learning to ensure data privacy and clinical adaptability.
SUMMARY OF THE INVENTION
This summary is provided to introduce a selection of concepts, in a simplified format, that are further described in the detailed description of the invention.
This summary is neither intended to identify key or essential inventive concepts of the invention, nor is it intended to determine the scope of the invention.
The present invention provides an AI-driven diagnostic system and method for detecting abnormalities in medical imaging with high accuracy, real-time interpretability, and multi-task functionality. The invention integrates convolutional neural networks with transformer-based architectures to capture both local and global features of medical images.
The system is designed to process multimodal inputs, including MRI, CT, PET, and X-ray, in combination with patient metadata, thereby enhancing diagnostic accuracy and clinical decision-making. It employs a multi-task learning framework that simultaneously supports classification, segmentation, and localization of abnormalities.
To overcome the black-box problem, the system integrates explainable AI techniques such as Grad-CAM++, SHAP, and LIME, generating visual and semantic outputs that aid clinicians in understanding predictions. Deployment can be carried out on both edge and cloud platforms, making the system adaptable to diverse healthcare environments.
A key feature of the invention is the incorporation of federated learning to enable collaborative model training across institutions without sharing raw medical data. This ensures compliance with data privacy regulations and enhances scalability. The invention thus provides a clinically feasible, secure, and highly interpretable solution to medical abnormality detection.
To further clarify advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
Compared to existing medical imaging solutions, the proposed invention overcomes their limitations by offering an end-to-end, explainable, high-performance AI-based diagnostic system that addresses the primary shortcomings of existing systems: poor accuracy, lack of interpretability, modality-specific design, and inability to adapt to varied clinical conditions.
BRIEF DESCRIPTION OF THE DRAWINGS
The illustrated embodiments of the subject matter will be understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and methods that are consistent with the subject matter as claimed herein, wherein:
FIGURE 1: SYSTEM ARCHITECTURE
FIGURE 2: FLOWCHART OF DIAGNOSTIC PIPELINE
The figures depict embodiments of the present subject matter for the purposes of illustration only. A person skilled in the art will easily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION OF THE INVENTION
The detailed description of various exemplary embodiments of the disclosure is described herein with reference to the accompanying drawings. It should be noted that the embodiments are described herein in such details as to clearly communicate the disclosure. However, the amount of details provided herein is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
It is also to be understood that various arrangements may be devised that, although not explicitly described or shown herein, embody the principles of the present disclosure. Moreover, all statements herein reciting principles, aspects, and embodiments of the present disclosure, as well as specific examples, are intended to encompass equivalents thereof.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
In addition, the descriptions of "first", "second", “third”, and the like in the present invention are used for the purpose of description only, and are not to be construed as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. Thus, features defining "first" and "second" may include at least one of the features, either explicitly or implicitly.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The invention introduces a hybrid CNN-Transformer architecture designed to leverage the strengths of both convolutional neural networks and transformer modules. CNN layers are employed to capture fine-grained local features, while transformer blocks provide global context modeling through attention mechanisms. This combination significantly improves accuracy in identifying abnormalities within complex or low-contrast imaging conditions.
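The local-plus-global pairing described above can be sketched in a few lines. The following NumPy example is illustrative only, not the claimed implementation: a 3x3 convolution stands in for the CNN layers (local texture), and a single-head self-attention step stands in for the transformer blocks (global context); all sizes and function names are assumptions for demonstration.

```python
import numpy as np

def conv3x3(image, kernel):
    """Local feature extraction: valid 3x3 convolution over a 2-D image."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
    return out

def self_attention(tokens):
    """Global context modeling: single-head self-attention with Q = K = V = tokens."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)          # pairwise similarity
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over all positions
    return weights @ tokens                          # each token attends to every other

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))                  # toy stand-in for a scan patch
local = conv3x3(image, np.ones((3, 3)) / 9.0)        # 6x6 local feature map
tokens = local.reshape(-1, 6)                        # flatten rows into 6-dim tokens
fused = self_attention(tokens)                       # globally contextualized features
print(local.shape, fused.shape)                      # (6, 6) (6, 6)
```

Each output row of `fused` is a convex combination of all input tokens, which is the sense in which the attention stage sees the whole image at once, while the convolution stage only ever sees a 3x3 neighborhood.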
The invention supports multimodal imaging input, enabling the system to process MRI, CT, PET, and X-ray images. Metadata such as patient age, clinical history, and symptoms can be integrated to enrich the diagnostic pipeline. The multimodal fusion framework ensures a holistic approach to abnormality detection and clinical decision-making.
The system features a multi-task head capable of producing classification outputs for disease type, segmentation maps for lesion boundaries, and localization of abnormal regions. These outputs are provided concurrently, thus reducing diagnostic delays and offering comprehensive insights to medical practitioners.
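The shared-backbone, three-head arrangement can be illustrated with plain linear maps. This is a minimal sketch under assumed sizes (64-dim shared features, 3 tumor classes, a coarse 16x16 segmentation grid, a 4-value bounding box); in the actual system these heads would be trained network layers.

```python
import numpy as np

rng = np.random.default_rng(1)
features = rng.standard_normal(64)          # shared backbone features (assumed size)

# Each head is a linear map from the same shared features (weights would be learned).
W_cls = rng.standard_normal((3, 64))        # 3 classes: glioma, meningioma, pituitary
W_seg = rng.standard_normal((16 * 16, 64))  # coarse 16x16 grid of segmentation logits
W_loc = rng.standard_normal((4, 64))        # bounding box: x, y, width, height

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class_probs = softmax(W_cls @ features)              # classification output
seg_map = (W_seg @ features).reshape(16, 16) > 0     # binary lesion mask
box = W_loc @ features                               # localization output

print(class_probs.shape, seg_map.shape, box.shape)   # (3,) (16, 16) (4,)
```

The point of the sketch is structural: one forward pass through the backbone yields all three outputs concurrently, which is what removes the diagnostic delay of running separate models.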
The explainability layer is a critical part of the invention. It employs techniques such as Grad-CAM++, SHAP, and LIME to generate pixel-level heatmaps and feature-importance maps. These outputs provide interpretable justifications for the system’s predictions, enhancing trust and adoption among clinicians.
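The heatmap mechanism can be sketched in the style of Grad-CAM: channel activations are weighted by the spatially averaged gradients of the class score, summed, and passed through a ReLU. The arrays below are random stand-ins for real activations and gradients; this is a schematic of the technique, not the invention's Grad-CAM++ module.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM-style heatmap: weight each channel's activation map by the
    spatial mean of its gradient, sum over channels, then apply ReLU."""
    weights = gradients.mean(axis=(1, 2))             # one importance weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels
    cam = np.maximum(cam, 0.0)                        # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for overlay
    return cam

rng = np.random.default_rng(2)
acts = rng.standard_normal((8, 7, 7))    # 8 channels of 7x7 feature maps (toy sizes)
grads = rng.standard_normal((8, 7, 7))   # gradients of the class score w.r.t. acts
heatmap = grad_cam(acts, grads)
print(heatmap.shape)                     # (7, 7)
```

The normalized map would then be upsampled to the scan's resolution and overlaid as the pixel-level justification shown to the clinician.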
Deployment of the system is highly flexible. It can operate on low-power diagnostic consoles, portable devices, and mobile tablets in addition to high-capacity cloud servers. This dual deployment strategy ensures accessibility in diverse healthcare settings, from urban hospitals to rural clinics with limited infrastructure.
The invention incorporates federated learning for training and refining models across multiple institutions. Instead of transferring sensitive patient data, the system aggregates model updates securely, preserving privacy while benefiting from large, diverse datasets. This approach complies with major data protection regulations such as GDPR and HIPAA.
The workflow of the invention begins with image preprocessing, including normalization and scaling. Feature extraction is then carried out through CNN and transformer layers. The multi-task learning framework subsequently produces classification, segmentation, and localization results. The explainability layer overlays heatmaps and generates text-based diagnostic reports.
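The preprocessing stage can be sketched as min-max normalization followed by scaling. The example below is a hedged illustration: it assumes a square input whose side is a multiple of the output size and uses block averaging as a stand-in for whatever resampling the real pipeline uses.

```python
import numpy as np

def preprocess(scan, out_size=64):
    """Min-max normalize intensities to [0, 1], then downscale by block averaging."""
    scan = scan.astype(np.float64)
    lo, hi = scan.min(), scan.max()
    norm = (scan - lo) / (hi - lo) if hi > lo else np.zeros_like(scan)
    factor = norm.shape[0] // out_size
    # Block-average resize (assumes the input side is a multiple of out_size).
    return norm.reshape(out_size, factor, out_size, factor).mean(axis=(1, 3))

scan = np.random.default_rng(3).integers(0, 4096, size=(256, 256))  # 12-bit CT-like values
img = preprocess(scan)
print(img.shape)  # (64, 64)
```

Normalizing before feature extraction keeps intensities from different scanners and modalities on a common scale, which is what makes a single backbone usable across MRI, CT, PET, and X-ray inputs.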
Outputs may include lesion boundaries, classification of tumor types such as glioma, meningioma, or pituitary tumor, and confidence levels for localization. The clinician receives both visual overlays and interpretable reports, which support informed decision-making.
The system architecture includes a deployment interface designed for real-time inference. On edge devices, optimizations reduce computational load, while cloud platforms offer scalable processing for high-throughput environments.
The invention is adaptable for integration into existing clinical workflows, including Picture Archiving and Communication Systems (PACS) and Radiology Information Systems (RIS). This interoperability ensures seamless adoption without disrupting established medical processes.
The federated learning engine employs secure aggregation protocols to ensure privacy-preserving collaboration. This mechanism allows hospitals and research centers to benefit from joint model improvement without exposing raw patient data.
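The aggregation step can be sketched with federated averaging (FedAvg), a standard scheme in which the server combines client parameter updates weighted by local dataset size. The hospitals and sizes below are hypothetical, and the real engine would additionally encrypt updates via its secure aggregation protocol.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg: dataset-size-weighted average of client model parameters.
    Only these parameter vectors leave each institution, never raw images."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hypothetical hospitals with different local dataset sizes.
w_a, w_b, w_c = np.full(5, 1.0), np.full(5, 2.0), np.full(5, 4.0)
global_w = fed_avg([w_a, w_b, w_c], [100, 100, 200])
print(global_w)  # each entry = (1*100 + 2*100 + 4*200) / 400 = 2.75
```

Weighting by dataset size means the hospital contributing half the total data contributes half the global model, so small clinics still benefit from the large centers' data without ever seeing it.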
The invention improves upon prior solutions by enabling hybrid feature extraction, multi-task learning, multimodal processing, and explainable outputs in a single integrated framework. Unlike existing black-box AI systems, it provides interpretable predictions in real time and supports flexible deployment across diverse healthcare infrastructures.
The diagnostic accuracy of the invention is enhanced by combining CNN-based local feature detection with transformer-based global attention mechanisms. This ensures that even subtle anomalies in complex imaging conditions are reliably identified.
The invention is further characterized by its adaptability. New imaging modalities, diagnostic tasks, or clinical settings can be easily accommodated by retraining or fine-tuning the model within the federated learning ecosystem.
This invention provides a robust, explainable, scalable, and privacy-preserving AI diagnostic tool that significantly improves upon conventional and existing AI-based systems. It is designed to aid clinicians with real-time, trustworthy, and actionable diagnostic outputs.
BEST METHOD OF WORKING
The best method of working the invention involves deploying the system in a hospital environment where medical images from MRI, CT, PET, and X-ray machines are directly fed into the diagnostic system. Images are preprocessed, and the hybrid CNN-Transformer architecture extracts local and global features. The multi-task head generates classification, segmentation, and localization outputs. Explainability modules overlay heatmaps and provide interpretable reports for the clinician.
The system is connected to a federated learning network across partner hospitals. Each institution trains the local model on its own dataset, and only encrypted model updates are shared to a central server. The aggregated updates refine the global model, ensuring continual improvement while preserving privacy. The final results are displayed on diagnostic consoles or portable devices in real time, enabling clinicians to make quick and accurate decisions.
Hybrid CNN-Transformer Architecture
• Combines CNNs (which model local texture and fine detail) with Transformers (which provide global attention and context modeling).
• The combination enables accurate, high-confidence localization of abnormalities such as tumors, lesions, and anatomical anomalies, even in difficult or low-contrast imaging conditions.
Multi-Task and Multimodal Functioning
• A shared backbone provides features used by all tasks: segmentation, classification, and localization.
• Multimodal fusion combines medical images (MRI, CT, PET) with structured patient data (age, symptoms, history), improving diagnostic outcomes and clinical decision-making.
Explainable AI (XAI) Features
• Generates explanations of its inferences using SHAP, LIME, and Grad-CAM++.
• Visual heatmaps and feature-importance maps accompany each prediction, making the system interpretable and supporting clinical confidence among medical practitioners.
Edge and Cloud: Dynamically Scalable Deployment
• Supports inference across edge devices (e.g., mobile tablets, diagnostic consoles) and cloud platforms, scaling to the available infrastructure.
• This flexibility suits rural clinics, telemedicine, and settings with fluctuating infrastructure.
Privacy-Conscious Training via Federated Learning
• Trains the model across a network of institutions using federated learning, without transferring patient data.
• Complies with data-privacy regulations (e.g., HIPAA, GDPR), secures sensitive health records, and enables collaborative model refinement.
Implementation Workflow:
• Input Preprocessing: Images (MRI, CT, or other modalities) are scaled and normalized.
• Feature Extraction: CNN layers capture local features, and Transformer blocks model global spatial relationships.
• Multi-Task Head: The model outputs a segmentation of the tumor area, the tumor type (e.g., glioma, meningioma), and the localization confidence.
• Explainability Layer: Explanations are generated via Grad-CAM++, SHAP, and LIME.
• Output: XAI overlays are combined with the images into visual and text reports.
• Edge/Cloud Inference: The network may run on clinic servers or on portable diagnostic platforms.
• Federated Updates: Model updates are aggregated securely across participating institutions and hospitals.
This invention offers a clinically feasible, scalable, and trustworthy automated approach to detecting abnormalities in medical imaging, based on a powerful hybrid design combined with multi-task learning, explainable AI, flexible deployment, and federated privacy-preserving mechanisms.
The present invention is a new and non-obvious, end-to-end diagnostic system that combines a hybrid CNN-Transformer architecture with explainable artificial intelligence (XAI) methods (Grad-CAM++, SHAP, LIME) to analyze a unified stream of medical images under complex, multi-task conditions. Unlike existing solutions that operate on a single data type and produce only classification or segmentation outputs, the system operates on multi-modal data and performs detection, classification, and localization of abnormalities concurrently. It is further distinguished by its privacy-preserving federated learning scheme and by its deployability at the edge or in the cloud, enabling intraoperative, interpretable, and secure diagnostics. The design integrates flexibility, interpretability, and precision for clinical practice.

Claims:1. A system for detecting abnormalities in medical imaging, comprising:
a hybrid deep neural network including convolutional neural network layers and transformer-based attention modules for feature extraction; a feature fusion module configured to integrate outputs of the convolutional neural network and transformer modules; a multi-task output unit configured to perform classification, segmentation, and localization of abnormalities;
an explainability module configured to generate interpretable outputs using Grad-CAM++, SHAP, and LIME; a deployment interface enabling real-time inference on edge devices and cloud platforms; and a federated learning engine configured to train models collaboratively across institutions without sharing raw medical data.
2. The system as claimed in claim 1, wherein the medical imaging data comprises one or more of MRI, CT, PET, or X-ray.
3. The system as claimed in claim 1, wherein the classification output includes identification of tumor types selected from glioma, meningioma, or pituitary tumor.
4. The system as claimed in claim 1, wherein the segmentation output defines lesion boundaries with pixel-level precision.
5. The system as claimed in claim 1, wherein the transformer-based attention module incorporates positional encoding and self-attention mechanisms.
6. The system as claimed in claim 1, wherein the explainability module provides visual heatmaps and feature-level importance maps.
7. The system as claimed in claim 1, wherein the federated learning engine employs secure aggregation protocols for privacy preservation.
8. The system as claimed in claim 1, wherein the deployment interface is optimized for low-power diagnostic devices.
9. A method for detecting abnormalities in medical imaging, comprising: receiving multimodal medical imaging inputs; preprocessing the inputs for normalization and scaling;
extracting local and global features using a hybrid CNN-Transformer model; performing classification, segmentation, and localization of abnormalities through a multi-task output unit; generating explainable outputs using Grad-CAM++, SHAP, and LIME; deploying results in real time on edge or cloud platforms; and training models collaboratively using federated learning without sharing raw data.
10. The method as claimed in claim 9, wherein the output comprises both visual overlays and interpretable diagnostic reports to support clinical decision-making.

Documents

Application Documents

# Name Date
1 202541090192-STATEMENT OF UNDERTAKING (FORM 3) [22-09-2025(online)].pdf 2025-09-22
2 202541090192-REQUEST FOR EARLY PUBLICATION(FORM-9) [22-09-2025(online)].pdf 2025-09-22
3 202541090192-POWER OF AUTHORITY [22-09-2025(online)].pdf 2025-09-22
4 202541090192-FORM-9 [22-09-2025(online)].pdf 2025-09-22
5 202541090192-FORM FOR SMALL ENTITY(FORM-28) [22-09-2025(online)].pdf 2025-09-22
6 202541090192-FORM 1 [22-09-2025(online)].pdf 2025-09-22
7 202541090192-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [22-09-2025(online)].pdf 2025-09-22
8 202541090192-EVIDENCE FOR REGISTRATION UNDER SSI [22-09-2025(online)].pdf 2025-09-22
9 202541090192-EDUCATIONAL INSTITUTION(S) [22-09-2025(online)].pdf 2025-09-22
10 202541090192-DRAWINGS [22-09-2025(online)].pdf 2025-09-22
11 202541090192-DECLARATION OF INVENTORSHIP (FORM 5) [22-09-2025(online)].pdf 2025-09-22
12 202541090192-COMPLETE SPECIFICATION [22-09-2025(online)].pdf 2025-09-22