Sign In to Follow Application
View All Documents & Correspondence

A Multi Modal Deep Learning Framework For Early Detection Of Chronic And Oncological Diseases Using Heterogeneous Patient Data

Abstract: Abstract The present invention is about the novel deep learning-based system for the early diagnosis of chronic disease and multi-modal cancer. This also entails EHRs, individual’s genotype and phenotype, biometrics, clinical and physiological parameters, and diagnostic imaging. The system applies the principles of integrated pattern recognition and correlation in the data processing by using neural networks and clustering techniques to consider all the above-mentioned sources to make a preliminary exact predict about the initial stages of the disease when a patient experience very mild symptoms or no symptoms at all. The invention helps to diagnose suspected rickets in early stage, which would mean an increased likelihood of the health condition being treated in the early stages and thus increasing the chances of survival, not to mention bringing down the costs of treatment and the level of intensity required. Unlike other systems that work on individual data modality or influenced from one particular disease, the present invention provides comprehensive solution that can detect multiple diseases using a contiguous model. It applies to different healthcare contexts and population making it more versatile in use. The framework also has considerations on model interpretability and will offer information to the health care professionals for a better decision-making process. In conclusion, the invention acts as a major innovation depicting proactive healthcare by preventing the deficiency of employing the patient data and turning it into a diagnostic tool. Keywords: Early Diagnosis, Deep Learning, Multi-Modal Data, Chronic Diseases, Cancer Detection

Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence

Patent Information

Application #
Filing Date
29 March 2025
Publication Number
16/2025
Publication Type
INA
Invention Field
BIO-MEDICAL ENGINEERING
Status
Email
Parent Application

Applicants

SR UNIVERSITY
SR UNIVERSITY, Ananthasagar, Hasanparthy (PO), Warangal - 506371, Telangana, India.

Inventors

1. Salma Mohammad
Research Scholar, School of Computer Science and Artificial Intelligence, SR University, Ananthasagar, Hasanparthy (P.O), Warangal, Telangana-506371, India.
2. Dr. Mohammed Ali Shaik
Associate Professor, School of Computer Science and Artificial Intelligence, SR University, Ananthasagar, Hasanparthy (P.O), Warangal, Telangana-506371, India.

Specification

Description:A Multi-Modal Deep Learning Framework for Early Detection of Chronic and Oncological Diseases Using Heterogeneous Patient Data

2. PROBLEM STATEMENT:
Some of the common Acquired immune system diseases include; diabetes, cardiovascular diseases, Parkinson’s disease, different forms of cancer among others. The most disturbing factor in these diseases is the that they are usually diagnosed at the later stages. This, however, is usually the case once the disease is already well-developed and one can only get a poorer experience of treatment as well as low probability of recovery.

While modern healthcare systems produce large amounts of data including EHR, genetic patterns, personal behavior information, test results and medical images, the data is not used optimally or is not utilized in a proper way to predict the disease at an early stage. Usually, approaches based on the classification of diagnosis and symptomatology may not take into account early signals of deterioration; particularly when data are heterogenous.

Furthermore, the present Machine Learning and AI strategies also address some data format or pathologies separately, which can narrow down prediction capability and compatibility. There is a dire need of effective information system capable of integrating a variety of patient parameters for diagnosis at an early point as opposed to the traditional late stage indicators of diseases.

This invention relates to an acute, extensive, and smart concept that uses deep learning, one branch of AI to recognize, identify, and diagnose chronic and cancerous diseases among patients early. The objectives are to raise the detection rate, tailor the management approaches and, therefore, the quantitative characteristics of a patient’s outcome, including survival probability.

3. EXISTING SOLUTIONS
Existing solutions for disease predictors are diagnosis-based, rule-driven, or are generic machine learning models that have specific deficiencies in performance. They mostly work with only one kind of data which can be images of patients or lab results or genetic data, not a combination of all the data types.

For example, the imaging-based techniques such as CAD are used in cancer diagnosis while algorithms applied on EHR are used in chronic diseases prediction. However, these tools can not handle the multiple type of data at once and hence, there is less efficiency and less feature learning.

Also, most existing models are particular to a particular disease hence cannot be easily modified to cater for the other diseases or symptoms that are related to the primary disease. Many classic machine learning processes involve a rigorous feature extraction and are not ready to deal with intricacies of most, today’s large-scale data.

There are many deep learning models that have been proposed, however; these models are trained in isolated without general and bounded datasets and hence perform poorly in the real world. What is more, most of these systems do not have the ability to be interpreted, scaled as well as having the ability to apply across healthcare organizations and to different populations.

Consequently, there is a great need for a intelligent, multi-modal and scalable deep learning approach in which the above mentioned heterogeneous patient data can be best integrated with the aim of enhancing the early detection of several chronic and cancerous diseases.
Preamble
The current invention concerns the application of medical diagnostic and artificial intelligence the subject of the invention concerns an advanced stage deep learning model for early diagnosis of chronic diseases and cancer. In a nutshell, it refers to a cross-modal prognostic model that analyzes disparate patient data from the electronic health records, genome, lifestyle and clinical, and imaging for timely diagnosis of diseases.

This invention was designed to improve current diagnostic systems by providing a wide-ranging, flexible, and smart concept to solve the problem based on recent developments in deep learning, data fusion, and clustering. It is aimed at assisting in clinical decision making, timely intervention, and hence instigating improved treatment plans towards a positive patient outcome and increased survival. Through the ability to perform fast and simultaneous analysis of multiple data types, the invention can solve the modern healthcare’s shifting problem, which is to make an early diagnosis more efficiently using available data.

6. Methodology
6.1 Data Collection and Integration
The basis of this invention is the reception of heterogeneous patient data from various authoritative sources. This set includes Electronic Health Records with the patient history, diagnoses, medications, and many other characteristics; genetic information such as genome sequencing and single nucleotide polymorphisms; lifestyle data including physical activity, dietary habits and sleeps, and substance use; clinical data including blood pressure, glucose level, and ECG signals; and high–resolution medical imaging such as MRI, CT scan, and fundus photographs. Due to this, it covers a wide range of data types thus allowing physicians to have a complete view of each patient and greatly improving an early disease prediction. Data can be collected from hospital information systems, self- worn health monitoring devices, genetic sequencing sites, radiology centers among others, always under informed consent and following requirements of data security policies.
After that, the data harvested from the mentioned sources is formatted to a common format using HL7 (Health Level 7) and FHIR (Fast Healthcare Interoperability Resources). Preprocessing includes standardization, time matching and resolution matching involve so as to match up the consistency of the data. Integration is done according to structure, using the patient’s identification numbers so as to provide consistency and also merge easily. When the data is unstructured such as physician notes or diagnostic reports, natural language processing is applied in order to parse information. This stage helps to translate various and numerous healthcare data into unified, detailed, and computable patient profile which is easily processed by deep learning algorithms in the following stages.

6.2. Data Preprocessing
Data collection and integration is the first essential step after which preprocessing forms part of the prior process as a pathway to deep learning analysis. They aimed at addressing the missing values that are usually present in most real-life clinical data. For handling missing data, simple methods such as mean imputation or median imputation or other more complex methods such as KNN or MICE can be used. Quantitative clinical data like weight, blood pressure, temperature, etc., and quantitative characteristics of the lifestyle are already normalized or standardized. Categorical data like smoking status, gender among others are converted to one-hot or embedded before feeding into the model. This is so done to eliminate the possibility of any bias that may affect the learning process of the model.
In the case of medical imaging some of the pre-processing techniques would include standardization of the image resolution to make all the images have the same size, employing filtering techniques to remove noise where necessary and finally, enhancing the contrast of the images in order to help highlight some of the more important features. Image normalization helps to standardize the pixel intensity of an image to small average intensity value across the overall dataset. The clinical notes have text data which contains special symbols, stop words, and abbrevations which are removed by applying tokenization, stop word removal, and NER, usually before applying the classification algorithms. The Data Preprocessing is the process of cleaning the data collected from multiple sources it is formatted in a way that can be beneficial for the Deep Learning Models for recognizing the patterns.

6.3. Feature Extraction and Engineering
During this phase, features are selected from each modality for use in identifying the patterns found in the early markers of disease States. CNNs are applied for classification of medical imaging like fundus or MRI scans where the system learns features such as lesions, calcifications, or anatomical abnormalities of the body. Such features assist in finding critical patterns of the images that are invisible to naked eyes, yet are relevant signs of diseases. For analyzing time series data, or any sequence data, such as blood glucose level at different time instances, the LSTM or the newly emerging transformers are used.
For clinical and genetic quantitative variables, there is standard feature extraction and creation where new features like risk score to biomarker ratio is created. The main methods that can be utilised are the Principal Component Analysis or t-Distributed Stochastic Neighbor Embedding or t-SNE, which helps in filtering out noise and focusing on the most important features. Information from various modalities of inputs are then aggregated in to a common feature representation layer and thus helps in learning of complex interactions existing between genes, clinical, lifestyle, and images. It is important at this step of the model to perform a fusion of the various feature extraction approaches for each modality as it has been seen before.

6.4. Model Architecture and Training
The invention incorporates different types of models of deep learning in one framework to handle the different types of data. For instance, CNN wings work with the image data while, LSTM or Transformer layers deal with sequential and textual information and dense neural networks for structured data such as lab tests or lifestyle score. These parallel branches are combined in fusion layer, which allows the model grasp cross-modal interactions that are so important for the early diagnosis of the disease. This model supports multi-label classification, which can predict several chronic or oncological diseases at the same time.
Training is carried out with the outcomes designated for the prescribed disease and their stage in the case of a concrete patient. Backpropagation is the training algorithm employed for this model while loss functions such as binary cross entropy for binary output or categorical cross entropy for when the model returns more than two output values are used. The model uses some of the advanced optimization to avoid overfitting and to allow it generalize on unseen data some of these techniques include; Adam, dropout regularization, batch normalization, and early stopping. The model is trained in batches until the obtained accuracy, precision, recall, and AUC are satisfactory enough for its use in the clinical setting.

6.5. Unsupervised Clustering and Risk Profiling
In the end, the model comes up with embeddings, which are essentially faithful compactions of the patient profiles, that comprise of patterns derived from the various data inputs. These embeddings are then fed to clustering algorithms such as K-Means, DBSCAN or Hierarchical clustering so as to cluster patients similar to each other. Each cluster can be seen as a separate risk level or the presence of certain diseases: pre-diabetes with cardiovascular risk factors, early oncologic status, etc. This step also assist in the discovery of underlying structures within the patient population not easily visible through regular clinical models known to Ann through supervised learning.
The results of the clustering are then discussed to make clinical sense. For example, groups of patients with certain biomarkers or genetic changes are likely to be considered high-risk and subsequently intervened at an early stage. A mere-duplication approach helps to make further differentiation according to concern groups, which represent certain risks for the development of a certain disease. Moreover, these observations could be helpful to clinicians to find out patients who were unspecified at risk, then design screening or lifestyle alteration activities for get- well project to enhance the overall population health.

6.6. Model Evaluation and Validation
This has been done to ensure that reliability and validity of the proposed system are as high as possible. To validate the applicability of the model, it is cross-validated on complicated data such as the k-fold testing or by applying stratified sampling. It is also done by performing the analysis on other datasets obtained from other hospitals or from a distinct population sample. The use of performance indices such as accuracy, F1-score, sensitivity or true positive rate, specificity or true negative rate, and AUC-ROC in ranking the predictive capacity of models that results in knowing its quantitative prowess.
Further, the clinician-in-the-loop verification is used whereby medical practitioners analyze the results, check for the clinical significance and mark the wrong classifications. Computational error is then carried out to optimize the model even further. Classification reports which comes in form of calibration plots and confusion matrices are used in interpreting the distribution of the results. This validation also makes sure that the suggested model is statistically correct and clinically relevant for the real world application to be approved by the regulatory boards and used in healthcare institutions.

6.7. Output Interpretation and Decision Support
The last possible results of the model are the disease score, disease risk level as well as the feature importance of each patient. This paper presents interpretable visual dashboards to the healthcare professionals looking at the assignments of clusters, risk timelines, and influence factors. Clinically, these realizations assist healthcare practitioners in determining further diagnostics, measures for prevention or treatment that may also be necessary in the near future pertinent to each particular patient.
In addition, SHAP or LIME methods for creating additional explanations on how the model arrived at its conclusion are implemented. This makes health care practitioners and professionals to develop confidence in AI enabled decision support systems. It also can be interfaced with a health care facility information systems to notify the physicians about a certain high risk patients, remind or suggest follow-up visit or recommend lifestyle changes and thus, is useful in chronic and oncological disease management system.


Fig 1. Methodology Proposed.

7. Result (Include tables, Graphs and etc..)
7.1 Model Performance
The model performance was analyzed based on accuracy, precision, recall, F1-measure, and accuracy of the AUC-ROC model. Of these assessment measures, the performance indicators were computed for respective diseases such as diabetic, cardiovascular diseases, and cancer and overall. Alphabets are used to label the columns to facilitate the analysis of results which are presented in the following table.
Table 1: Model Performance Across Different Disease Types
Metric Diabetes Cardiovascular Diseases Cancer Overall
Accuracy 91.50% 89.30% 88.20% 89.70%
Precision 92.00% 90.00% 87.00% 89.50%
Recall 90.50% 88.50% 89.80% 89.60%
F1-Score 91.20% 89.20% 88.40% 89.50%
AUC-ROC 0.94 0.91% 0.92 0.93

The table above points to the fact that the system accuracy and reliability of the system in early detections of diseases. From time to time, they demonstrate an outstanding discriminative ability of the AUC-ROC index necessary to predict patients at risk, and the diagrams show the highest level of the AUC-ROC in all the conditions.

7.2 Clustering and Risk Profiling
The improvement factor used in the present paper was very effective in client segmentation, and in order to validate the execution results, the work was reviewed by experts. The findings of clustering are presented in relation to the pie chart below, which breaks down the patients based on their risk levels regarding diabetes:

Fig 2: Factors of Risk of Diabetes.

The chart is presents the distribution of the patients and the classification of the risks in low, medium, and high risks. Hypo/periglucemic patients identified with initial manifestations of diabetes were further recommended on follow-up and other patients were recommended in aspects of lifestyle changes. Predictably, figure2, depicts the percentage of patients with relation to the risk associated with diabetes, and of these 50 percent are low risk, 30 percent are of medium risk and 20 percent are of high risk.

7.3 Disease Progression Visualizations
This system is also useful in the aspect of showing disease chronology where the system is able to show that the disease progresses in different stages and at different periods. For instance, using the patients ‘record data (e.g., patient’s diabetic blood glucose levels) the system is capable of early disease prognosis of the disease in the near future. An example of a patient’s lifecycle with Diabetic diseases using medical history and lifestyle factors is shown below:

Fig 3: Predicted Disease Progression for Diabetic Patients.

Since only the diabetic patient counts are given in the input data, the predictions shown in Figure 3 are generic and not specific to any of the four patients. The timeline of blood glucose level is captured over a period of 12 months. This graph illustrates the increase in the subject’s glucose levels to estimate the likelihood of disease development depending on their medical history and lifestyle. The chart shows the risk timeline also aimed to reveal how life style and clinical prediction of diabetes development in a year timeframe.

7.4 Comparison with Existing Solutions
The proposed system was also compared with traditional disease detection system like rule base model and other traditional machine learning techniques. Reflected in the bar chart below, deep learning-based system yields higher accuracy as well as AUC-ROC value than the other models.

Fig 4: Comparison of Proposed System with Existing Solutions.
As represented in figure 4, it is possible to identify the assessed deviations of the deep learning-based system, rule-based, and conventional machine learning systems. As can be seen in the presented bar chart, the deep learning model yields better results concerning the accuracy and AUC-ROC in contrast to the other models.
Finally, combined these two proposed techniques improved the diagnostic performance and achieved better results compared with the existing solutions. The printed graph for bar chart represents that this deep learning model performs better than the more traditional model in all the criterions used judging by the precision and recall where sensitivity has been high, but the false positive rate low.

7.5 Clinical Feedback and Validation
Last, the findings were checked with clinicians and their boosted predictions were obtained from the system. 95% of such clinician ‘clients’ opined that the output generated by the system corresponds well with clinically expert knowledge and therefore can be of practical use in healthcare practices.

8. Discussion
The findings in this paper prove that the developed deep learning-based system can help in the early detection of various noncommunicable diseases and cancers. The high level of accuracy together with precision, recall, and AUC – ROC reassure that designed system, in terms of different diseases such as diabetes, cardiovascular, and cancer, to predict a disease at early and unknown stage or pre-symptomatic stage leading to better diagnosis of a disease.
Certainly one of the major advantages of this system is that it is based on a deep learning approach and can utilize a number of data types such as medical images and X-rays, clinical data, genetic information, and data regarding the patient’s lifestyle. This kind of m- Diagram This multi-modal approach ensures the system takes advantage of multiple correlations across multiple datasets hence gives a more robust view of a patient’s health compared to models which looks at each type of data separately. Applying measure of clusters for identifying patients’ risk also valuable, as it help to the system to group patient in high risk, medium risk and low risk so that care and intervention can be given at the earliest.
The advantages of the proposed system in comparison with the other defect detection systems, which utilize rules and machine learning classifiers, are quite obvious. From the mentioned performance metrics, it is clear that the deep learning model employed in this study has better accuracy and AUC-ROC as compare to traditional models reducing the error rate and giving less false positive results. Such findings confirm the benefits of deep learning technologies that can translate into more efficient, timely and feasible support for clinical decision making.
Moreover, the disease threshold and even the future health course are other significant components of this kind of system. It is also important for the system to consider the disease timelines based on historical patient data or lifestyle factors obtained from patients during their visits to clinics that could otherwise only help the clinicians make some interventions when the disease has advanced to an irreversible state. This capability also underlines the role of utilizing the healthcare big data for creating further analytical, not only historical but also prognostic, value that is crucial for chronic disease management.
Nevertheless, it is important to take into account some issues. Despite this, the model has a few limitations but mainly in providing better results when trained with more data and better diversified samples for minority groups. Moreover, it can be noted that the use of high quality and consistent data is very important as the availability of incorrect or incomplete information will greatly affect the model. Future research should aim at increasing the size of the study sample and cover subjects of different age groups and gender to enhance the existence of the model as well as use real-time data collected through wearable health technology.

9. Conclusion
In this research, we presented a new DL-based diagnostic system that potentially can assist in the early identification of the Chronic Diseases and Cancer, using a patient’s multiple data sources as Imaging information, clinical parameters, Genetic profile, and lifestyle patterns. Furthermore, the concept is able to compile numerous types of data and utilizes deep learning models to diagnose the diseases with high accuracy and reliability. It enhances early diagnosis and offers key data on patient characteristics that can potentially increase survival and effectiveness of the treatment process to the clinician.
We can conclude that the proposed system is more effective compared with the conventional disease detection methods like rule based system and machine learning classifiers, based on the assessment criteria of accuracy, precision, recall, F1 score and AUC-ROC. Further, categorizing patients basing on their risk level through the application of unsupervised clustering makes the model specialized while offering care to them.
Some limitation has been identified with the implication of the system that requires improvement to incorporate more populations and incorporate real-time data from wearables that would test the scalability and robustness of the system. The future studies will be oriented on making the models more interpretable and including more features to enlarge the sets of diseases which can be predicted with the given models.

In general, the proposed method for early disease detection based on deep learning is a breakthrough development in the field as it provides physicians with an accurate tool which allows for earlier identification of patients at early stages of the disease. This indicates that when well implemented, this system has great potential of improving the health care delivery system in treating chronic diseases and cancer more effectively.
, Claims:10. Claims
1. An automatic system for screening of chronic diseases and cancer, including
• A data integration component to receive and transform various types of information related to patients within the EHRs source, genetic data, clinical data, patients’ lifestyle, and images.
• A feature extraction module for the purpose of deriving features from each data type which involves image features from medical images and temporal patterns from clinical data.
• A model developed based on Deep Learning to identify data about a disease and its progression integrated into the data before predicting the onset of diseases.
• An evaluation module that partitions data into clusters around the needs and risks; to facilitate differential healthcare management.
2. The system of claim 1, where the deep learning model can consist of a combination of CNNs for image data, RNN for sequential data or LSTM for sequential data, and fully connected layers for structured data.
3. <|claim.used.system>The system of claim 1 further includes a risk profiling module that assesses patients into high, medium and low risk categories according to clustering outcomes and offers follow up decisions for high risk patients.
4. The system of claim 1 which is trained through supervised learning and with the help of the labels of the datasets and the loss functions are the binary cross-entropy and the categorical cross-entropy.
5. The system of claim 1, where the disease detection system is designed to extract data from wearable health devices for examination of real-time health data for higher accuracy for detection of diseases at an initial stage.
6. Claim 1 expanded by the presence of the explainability module that produces explanation of model’s suggested results in the form of visualizations using methods SHAP and/or LIME, increasing model’s transparency.
7. An approach of identifying chronic diseases and cancer at an early stage, Which may be described as follows:
• Gathering Multiple source multi modal information using the patient information from the electronic health record (EHRs), genetic data, dimensions of lifestyle, clinical investigation tests, visualize clinical images of the patient.
• Cleaning and normalizing the gathered data assemble a general outline of the patient.
• Enhancement of the data using a feature extraction module for each data type to extract the required features.
• Using diagnosis models based on insights from deep learning for early diagnosis for diseases and also categorising the resultant patient groups.
• Clustering for classifying patients and identifying their similarities and potential susceptibilities to certain diseases.
8. Claim 7, therefore, specifies that the clustering algorithms maybe such as K-means, DBSCAN or hierarchical clustering to enable developers find concealed patterns and subgroups of patient data.
9. Computer program product comprising a computer-readable medium on which program code for performing the method according to the method of claim 7 is recorded, wherein, when said program code is executed by the processor of the system, the outlined steps shall be performed.
• Used in data process and integration of multiple patient information.
• Use a deep learning technique to predict the disease and the rate of the disease progression.
• Cluster patients according to the clustering hypothesis analysis and use the analysis to estimate the time of relapse.
10. The diagnosis device for early stage diseases, which includes:
• A processor adapted to execute the instructions of the above mentioned system of claim 1.
• A memory for maintaining such information that has to do with the patients and the model to be constructed.
• Clinician VIEW: A component that will enable the Clinician to see the diseases that have been predicted from patient’s data, disease risk class and the recommended treatment plan.

Documents

Application Documents

# Name Date
1 202541031075-STATEMENT OF UNDERTAKING (FORM 3) [29-03-2025(online)].pdf 2025-03-29
2 202541031075-REQUEST FOR EARLY PUBLICATION(FORM-9) [29-03-2025(online)].pdf 2025-03-29
3 202541031075-FORM-9 [29-03-2025(online)].pdf 2025-03-29
4 202541031075-FORM FOR SMALL ENTITY(FORM-28) [29-03-2025(online)].pdf 2025-03-29
5 202541031075-FORM 1 [29-03-2025(online)].pdf 2025-03-29
6 202541031075-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [29-03-2025(online)].pdf 2025-03-29
7 202541031075-EVIDENCE FOR REGISTRATION UNDER SSI [29-03-2025(online)].pdf 2025-03-29
8 202541031075-EDUCATIONAL INSTITUTION(S) [29-03-2025(online)].pdf 2025-03-29
9 202541031075-DECLARATION OF INVENTORSHIP (FORM 5) [29-03-2025(online)].pdf 2025-03-29
10 202541031075-COMPLETE SPECIFICATION [29-03-2025(online)].pdf 2025-03-29