An Integrated Machine And Deep Learning Framework For Enhanced Lung Cancer Detection: Optimized Feature Selection, Data Augmentation, And Robust Classification

Abstract: To improve patient outcomes, lung cancer must be diagnosed as early as possible and with high precision. However, high data dimensionality and class imbalance often make the classifications produced by CAD systems less accurate and less interpretable. This work proposes an enhanced multistage CAD framework that combines data science methods with machine learning and deep learning techniques to improve the accuracy and robustness of lung cancer classification. To increase classification accuracy and computational efficiency, the model first selects features using RFECV, reducing dimensionality by 30–40% while retaining highly relevant features. A gradient boosting machine (GBM) whose hyperparameters are tuned by Bayesian optimization then yields a classification accuracy of 92–95% with high sensitivity and specificity. The framework additionally integrates a hybrid CNN-RNN model that detects sequential patterns in multi-slice images, focusing on both critical tumor locations and the transitions between slices, together with an EfficientNet backbone with self-attention layers that refines spatial features. SMOTE-based data augmentation balances the dataset, and Non-Local Means denoising improves the accuracy of feature extraction. Performance is assessed with balanced metrics, namely AUC-ROC, F1-score, and MCC. The integrated strategy achieves sensitivity and specificity above 90%, classification accuracy above 95%, and an AUC-ROC above 0.95, indicating a high degree of diagnostic reliability for lung cancer. Beyond providing a robust automated diagnostic solution with considerable potential for clinical application in oncology, the proposed approach improves the interpretability and generalizability of CAD systems.

Patent Information

Filing Date: 26 September 2025
Publication Number: 44/2025
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

SR University
Ananthasagar, Hasanparthy (M), Warangal Urban, Telangana - 506371, India
M. Madhavi Latha
Research Scholar, Department of Computer Science and Artificial Intelligence, SR University, Warangal, Telangana-506371
Dr. Amit Kumar Yadav
School of Computer Science and Artificial Intelligence, SR University, Warangal - 506371, Telangana, India

Inventors

1. M. Madhavi Latha
Research Scholar, Department of Computer Science and Artificial Intelligence, SR University, Warangal, Telangana-506371
2. Dr. Amit Kumar Yadav
School of Computer Science and Artificial Intelligence, SR University, Warangal - 506371, Telangana, India

Specification

Description: The present invention relates to the field of computer-aided diagnosis (CAD) systems, and more particularly to an integrated machine learning and deep learning framework for automated detection of lung cancer. The invention leverages optimized feature selection, hybrid deep learning architectures, data augmentation, and robust classification strategies to enhance diagnostic precision, computational efficiency, and interpretability in clinical oncology applications.
BACKGROUND OF THE INVENTION
The following description of related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is to be used only to enhance the reader's understanding of the present disclosure, and not as an admission of prior art.

Lung cancer is one of the deadliest cancers worldwide, and early detection is critical to improving survival rates. Conventional diagnostic methods rely heavily on manual image interpretation, which is prone to error and subjectivity.

Computer-aided diagnosis (CAD) systems have emerged to assist radiologists in identifying cancerous lesions. However, the high dimensionality of medical imaging data and unbalanced class distributions often compromise the accuracy of existing CAD systems.

Traditional machine learning models struggle with redundant or irrelevant features, which lowers sensitivity and specificity in classification tasks. Moreover, class imbalance, arising from the relatively small number of positive cancer cases, reduces the reliability of predictions.

Deep learning models such as convolutional neural networks (CNNs) have demonstrated success in medical imaging, but they often lack mechanisms to capture sequential dependencies in multi-slice imaging data, thereby limiting their effectiveness.

Further, the interpretability of deep learning models remains a challenge, which restricts their acceptance in clinical decision-making. Techniques that improve both accuracy and transparency are required for clinical translation.

There exists a need for an integrated, multi-stage diagnostic system that optimally selects relevant features, balances class distributions, incorporates hybrid architectures for improved feature representation, and ensures robustness across diverse datasets while maintaining high sensitivity, specificity, and interpretability.

OBJECTIVE OF THE INVENTION

Some of the objects of the present disclosure, which are satisfied by at least one embodiment herein, are listed below.

To provide a computer-aided diagnostic framework that integrates machine learning and deep learning for enhanced detection of lung cancer.

To reduce feature dimensionality while preserving discriminative features through Recursive Feature Elimination with Cross Validation (RFECV).

To achieve optimized model training and improved accuracy through gradient boosting and Bayesian hyperparameter tuning.

To incorporate a hybrid CNN-RNN architecture for capturing both spatial and sequential imaging features in multi-slice scans.

To enhance classification robustness through the use of EfficientNet with self-attention mechanisms for spatial feature refinement.

To address data imbalance using Synthetic Minority Oversampling Technique (SMOTE) and improve feature clarity through Non-Local Means (NLM) denoising.

To ensure diagnostic reliability by validating system performance with balanced metrics (AUC-ROC, F1-score, MCC, sensitivity, and specificity) and by achieving sensitivity and specificity above 90%.

SUMMARY OF THE INVENTION
This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.

In an aspect, the invention discloses a multi-stage integrated CAD system combining feature selection, deep learning architectures, and augmentation techniques for accurate lung cancer detection. The system first applies RFECV for optimized feature reduction, followed by Bayesian-optimized gradient boosting for efficient training. A hybrid CNN-RNN model extracts sequential patterns from multi-slice CT images, while EfficientNet with self-attention enhances spatial focus on tumor regions.
Data balancing using SMOTE and image denoising via NLM preprocessing improve dataset robustness and feature extraction quality. The proposed system achieves diagnostic performance with accuracy greater than 95%, sensitivity and specificity above 90%, and an AUC-ROC exceeding 0.95, supporting its clinical reliability.
BRIEF DESCRIPTION OF DRAWINGS
The accompanying drawings, which are incorporated herein and constitute a part of this invention, illustrate exemplary embodiments of the disclosed methods and systems, in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present invention. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that the disclosure of such drawings includes the disclosure of electrical components, electronic components, or circuitry commonly used to implement such components.

FIG. 1 illustrates an exemplary computer-aided diagnostic system for lung cancer detection, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION OF THE INVENTION

In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.

The ensuing description provides exemplary embodiments only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.

Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.

Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.

The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.

Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

The present invention discloses a multi-stage computer-aided diagnostic (CAD) framework that integrates machine learning-based feature selection, advanced data augmentation, and hybrid deep learning models to enhance lung cancer detection accuracy and reliability. The system is designed to address critical challenges in medical imaging, such as high data dimensionality, class imbalance, poor interpretability, and limited generalizability across diverse datasets.

In the initial stage, the system preprocesses clinical images and the associated metadata. Features are extracted using image filters, radiomic analysis, and statistical descriptors. To reduce dimensionality and computational overhead, the framework then applies Recursive Feature Elimination with Cross Validation (RFECV), which repeatedly fits an estimator, discards the lowest-ranked features, and uses cross-validation to select the feature-subset size that gives the best validation performance. As a result, the dimensionality of the dataset is reduced by 30–40% while the discriminative features essential for cancer detection are retained.
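As a rough illustration of the RFECV step, the following Python sketch uses scikit-learn's `RFECV` with a gradient boosting estimator on synthetic data from `make_classification`, which stands in for the radiomic and clinical feature table. The feature counts, `step`, `min_features_to_select`, and the AUC scoring choice are assumptions, and the fraction of features removed depends on the data, so it will not necessarily fall in the 30–40% range reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the radiomic + clinical feature table (assumption):
# 40 features, 12 informative and 8 redundant (linear combinations of informative ones).
X, y = make_classification(n_samples=300, n_features=40, n_informative=12,
                           n_redundant=8, weights=[0.8, 0.2], random_state=0)

selector = RFECV(
    estimator=GradientBoostingClassifier(n_estimators=30, random_state=0),
    step=2,                        # drop the 2 lowest-importance features per round
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",             # subset size chosen by cross-validated AUC
    min_features_to_select=5,
)
selector.fit(X, y)

X_reduced = selector.transform(X)  # keep only the selected columns
reduction = 1 - selector.n_features_ / X.shape[1]
print(f"kept {selector.n_features_} of {X.shape[1]} features ({reduction:.0%} fewer)")
```

The gradient boosting estimator is used here because RFECV ranks features by the fitted model's `feature_importances_`, which matches the GBM classifier used later in the pipeline.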

The next stage focuses on dataset balancing and noise removal. Medical imaging datasets often exhibit a severe imbalance between malignant and non-malignant cases. The system employs the Synthetic Minority Oversampling Technique (SMOTE), which creates synthetic cancer-positive samples by interpolating between an existing minority-class sample and one of its nearest minority-class neighbours. The resulting balanced class distribution improves classification performance. To further improve the clarity of CT scan slices, Non-Local Means (NLM) denoising is integrated into the pipeline; it replaces each pixel with a weighted average of pixels whose surrounding patches are similar, reducing random image noise while preserving critical tumor structures and thereby improving the fidelity of feature extraction.
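The SMOTE interpolation rule can be sketched in a few lines of NumPy. This sketch assumes SMOTE is applied to tabular feature vectors (for example, the RFECV-selected features) rather than to raw CT slices; the toy data and `k=5` neighbours are illustrative choices, and `smote` and `balance_with_smote` are hypothetical helper names. The `SMOTE` class in the imbalanced-learn package implements the same idea for production use.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Create n_synthetic samples, each on the segment between a minority sample
    and one of its k nearest minority-class neighbours (the SMOTE rule)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                     # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]       # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_synthetic)     # seed samples
    pick = neighbours[base, rng.integers(0, k, size=n_synthetic)]  # one neighbour each
    gap = rng.random((n_synthetic, 1))              # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

def balance_with_smote(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    X_min = X[y == minority_label]
    deficit = int(np.sum(y != minority_label) - len(X_min))
    if deficit <= 0:
        return X, y
    X_new = smote(X_min, deficit, k=k, rng=seed)
    return (np.vstack([X, X_new]),
            np.concatenate([y, np.full(deficit, minority_label)]))

# Toy example: 90 negative vs 10 positive feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (90, 4)), rng.normal(3, 0.5, (10, 4))])
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = balance_with_smote(X, y)
print(np.bincount(y_bal))  # → [90 90]
```

Because synthetic samples are derived from the data they are added to, oversampling is normally applied only to the training portion of each split so that no synthetic points leak into evaluation.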

For classification and prediction, the invention employs a two-tier learning mechanism. In the first tier, gradient boosting classifiers, with hyperparameters tuned by Bayesian optimization, are trained on the reduced feature set. This combination improves generalization, helps prevent overfitting, and achieved an accuracy of 92–95% in preliminary evaluations. In the second tier, a deep learning architecture is used: a hybrid Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) that captures both spatial and sequential (inter-slice) dependencies. The CNN layers extract local tumor features from individual CT slices, while the RNN layers model dependencies between successive slices to track tumor patterns across multiple slices.

To enhance spatial representation, the system incorporates an EfficientNet backbone integrated with self-attention layers. This architecture not only provides computational efficiency but also improves interpretability by dynamically focusing on the most relevant tumor regions. The combination of CNN, RNN, and self-attention mechanisms results in a robust hybrid framework that captures both local and global imaging characteristics of lung tumors, ensuring improved diagnostic precision.
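The self-attention refinement can be illustrated with single-head, scaled dot-product attention over the spatial positions of a backbone feature map. In this NumPy sketch the feature map is random and merely stands in for an EfficientNet output (for a 224×224 input, EfficientNet-B0's final feature map is 7×7 with 1280 channels; 32 channels are used here to keep the example small). The untrained projection matrices, the residual connection, and the `saliency` heuristic are assumptions, not the disclosed design.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(fmap, Wq, Wk, Wv):
    """Single-head self-attention over the H*W positions of a C-channel feature map.

    fmap: (H, W, C) backbone output. Returns the refined map (H, W, C), computed with
    a residual connection, and the (H*W, H*W) attention matrix."""
    H, W, C = fmap.shape
    x = fmap.reshape(H * W, C)                       # one token per spatial position
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))    # each row sums to 1
    out = x + attn @ v                               # residual keeps original features
    return out.reshape(H, W, C), attn

rng = np.random.default_rng(0)
H, W, C, d = 7, 7, 32, 16
fmap = rng.normal(size=(H, W, C))                    # placeholder backbone features
Wq = rng.normal(size=(C, d)) / np.sqrt(C)
Wk = rng.normal(size=(C, d)) / np.sqrt(C)
Wv = rng.normal(size=(C, C)) / np.sqrt(C)            # C -> C so the residual adds up
refined, attn = spatial_self_attention(fmap, Wq, Wk, Wv)

# Coarse saliency map: average attention each position receives from all others.
saliency = attn.mean(axis=0).reshape(H, W)
```

Because every position attends to every other, this layer mixes global context into local CNN features; upsampling a map such as `saliency` onto the slice is one simple way to produce the kind of attention map mentioned for clinical review.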

Finally, the system evaluates performance using multiple metrics, including sensitivity, specificity, Area Under the Curve – Receiver Operating Characteristic (AUC-ROC), F1-score, and Matthews Correlation Coefficient (MCC). Experimental results demonstrate that the framework consistently achieves sensitivity and specificity above 90%, an overall classification accuracy exceeding 95%, and an AUC-ROC greater than 0.95. These results indicate a high level of diagnostic reliability and clinical applicability.
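For reference, the metrics named above can be computed from a confusion matrix and the raw scores as follows. This plain NumPy sketch uses a hypothetical ten-case example; the 0.5 decision threshold is an assumption, AUC-ROC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half), and zero-denominator guards are omitted for brevity. scikit-learn's `roc_auc_score`, `f1_score`, and `matthews_corrcoef` compute the same quantities.

```python
import numpy as np

def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, F1, MCC and AUC-ROC for binary labels."""
    y_true = np.asarray(y_true).astype(int)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    sens = tp / (tp + fn)            # sensitivity (recall on cancer cases)
    spec = tn / (tn + fp)            # specificity (recall on non-cancer cases)
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)
    mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Rank-based AUC: fraction of (positive, negative) pairs ordered correctly.
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))
    return dict(accuracy=(tp + tn) / len(y_true), sensitivity=sens,
                specificity=spec, f1=f1, mcc=mcc, auc=auc)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05, 0.15]
m = binary_metrics(y_true, y_score)
print({k: round(float(v), 3) for k, v in m.items()})
# → {'accuracy': 0.8, 'sensitivity': 0.75, 'specificity': 0.833, 'f1': 0.75, 'mcc': 0.583, 'auc': 0.917}
```

MCC and AUC-ROC are less sensitive to class imbalance than raw accuracy, which is why they are reported alongside sensitivity and specificity.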

In one embodiment, the system integrates feature selection and machine learning-based classification. Clinical metadata and imaging-derived radiomic features are processed through RFECV to eliminate redundant and non-informative attributes. The refined dataset is then subjected to classification using Gradient Boosting Machines (GBM). Hyperparameters, including learning rate, tree depth, and number of estimators, are optimized using Bayesian optimization techniques. This embodiment emphasizes efficiency, reducing computational complexity while achieving strong predictive performance. The results show improved sensitivity to early-stage cancer cases, making it suitable for structured patient datasets with limited imaging slices.
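The Bayesian tuning of learning rate, tree depth, and number of estimators can be sketched as Gaussian-process Bayesian optimization with an expected-improvement acquisition, assembled from scikit-learn and SciPy primitives. The document does not specify which Bayesian optimizer is used; the synthetic data, the search ranges, the 3-fold AUC objective, and the iteration counts below are all assumptions. Libraries such as scikit-optimize (`BayesSearchCV`) or Optuna provide comparable search loops ready-made.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the RFECV-reduced feature table (assumption).
X, y = make_classification(n_samples=200, n_features=15, n_informative=6,
                           weights=[0.7, 0.3], random_state=0)

# Assumed search box: log10(learning_rate), max_depth, n_estimators.
lo = np.array([-2.0, 2.0, 50.0])
hi = np.array([0.0, 6.0, 200.0])
rng = np.random.default_rng(0)

def to_params(u):
    """Map a point in the search box to GBM hyperparameters."""
    return dict(learning_rate=float(10 ** u[0]), max_depth=int(round(u[1])),
                n_estimators=int(round(u[2])))

def objective(u):
    """3-fold cross-validated AUC-ROC of a GBM with the given hyperparameters."""
    model = GradientBoostingClassifier(random_state=0, **to_params(u))
    return cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()

def sample(n):
    return lo + rng.random((n, 3)) * (hi - lo)

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the best score so far (maximisation)."""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

U = sample(5)                                  # initial random design
scores = np.array([objective(u) for u in U])
for _ in range(8):                             # Bayesian optimisation iterations
    gp = GaussianProcessRegressor(ConstantKernel(1.0) * Matern(nu=2.5),
                                  alpha=1e-6, normalize_y=True, random_state=0)
    gp.fit((U - lo) / (hi - lo), scores)       # GP surrogate on inputs scaled to [0, 1]
    cand = sample(500)                         # random candidates scored by EI
    mu, sigma = gp.predict((cand - lo) / (hi - lo), return_std=True)
    u_next = cand[np.argmax(expected_improvement(mu, sigma, scores.max()))]
    U = np.vstack([U, u_next])
    scores = np.append(scores, objective(u_next))

best_params = to_params(U[np.argmax(scores)])
print(best_params, round(float(scores.max()), 3))
```

The learning rate is searched on a log scale because its useful values span orders of magnitude; a held-out test set that the search never sees is still needed to report final accuracy.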

In another embodiment, the invention focuses on deep learning-driven analysis of CT imaging data. Multi-slice CT scans are processed through CNN layers to extract spatial tumor features such as shape, texture, and density variations. The extracted feature maps are subsequently fed into RNN layers, specifically Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) networks, which capture sequential dependencies between adjacent slices. This hybrid CNN-RNN model enables accurate detection of tumor progression across multiple slices, addressing limitations of purely spatial models. The embodiment is particularly effective for identifying subtle patterns in early-stage nodules and monitoring tumor development over time.
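The data flow of this embodiment can be sketched as a NumPy forward pass: each slice goes through a convolution, a ReLU, and global average pooling to give a feature vector, and the sequence of slice vectors is consumed in order by a GRU whose final hidden state feeds a logistic output. The single convolutional layer, the GRU (rather than LSTM) choice, the layer sizes, and the random untrained weights are assumptions for illustration; a trainable version would normally be built with a deep learning framework, for example `torch.nn.Conv2d` and `torch.nn.GRU` in PyTorch.

```python
import numpy as np

def conv2d_valid(x, kernels):
    """x: (H, W) slice; kernels: (F, k, k) -> (F, H-k+1, W-k+1) 'valid' cross-correlation."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, (k, k))  # (H-k+1, W-k+1, k, k)
    return np.einsum("ijab,fab->fij", windows, kernels)

def slice_encoder(slice_2d, kernels):
    """CNN stage: convolution -> ReLU -> global average pooling -> (F,) vector."""
    fmap = np.maximum(conv2d_valid(slice_2d, kernels), 0.0)
    return fmap.mean(axis=(1, 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, P):
    """One GRU step in a common update/reset-gate formulation."""
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h + P["bz"])           # update gate
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h + P["br"])           # reset gate
    h_tilde = np.tanh(P["Wh"] @ x + P["Uh"] @ (r * h) + P["bh"])
    return (1 - z) * h + z * h_tilde

def cnn_gru_forward(volume, kernels, P, w_out, b_out):
    """volume: (S, H, W) stack of adjacent CT slices -> malignancy probability."""
    h = np.zeros(P["Uz"].shape[0])
    for s in volume:                  # slices in order, so h carries inter-slice context
        h = gru_step(slice_encoder(s, kernels), h, P)
    return sigmoid(w_out @ h + b_out)

rng = np.random.default_rng(0)
F, k, hidden = 8, 3, 16
kernels = rng.normal(scale=0.3, size=(F, k, k))
# Input weights W* are (hidden, F); recurrent weights U* are (hidden, hidden).
P = {name: rng.normal(scale=0.3, size=(hidden, F if name[0] == "W" else hidden))
     for name in ["Wz", "Wr", "Wh", "Uz", "Ur", "Uh"]}
P.update({b: np.zeros(hidden) for b in ["bz", "br", "bh"]})
w_out, b_out = rng.normal(scale=0.3, size=hidden), 0.0

volume = rng.normal(size=(12, 32, 32))   # 12 placeholder 32x32 slices
p = cnn_gru_forward(volume, kernels, P, w_out, b_out)
print(f"untrained malignancy score: {p:.3f}")
```

Sharing one slice encoder across all slices keeps the parameter count independent of scan depth, while the recurrent state is what lets the model relate a nodule's appearance on one slice to its neighbours.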

In a further embodiment, the invention integrates EfficientNet as the primary deep learning backbone to optimize the accuracy-to-computation ratio. EfficientNet layers are combined with self-attention mechanisms, allowing the model to prioritize tumor-relevant regions while suppressing irrelevant background information. This embodiment is complemented by SMOTE-based data augmentation for balancing positive and negative classes and by Non-Local Means (NLM) denoising for improving image clarity. This combined approach improves classification performance on imbalanced and noisy datasets and enhances the model's generalizability across different clinical imaging conditions. The embodiment supports interpretability by generating attention maps that highlight critical tumor regions for clinical validation.
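Non-Local Means can be written out directly: each pixel is replaced by a weighted average of pixels in a search window, with weights that decay with the squared difference between the patches surrounding the two pixels. The loop-based NumPy sketch below is deliberately unoptimised and uses a synthetic bright square on a dark background with added Gaussian noise; the patch size, search window, and filtering parameter `h` are assumptions that would need tuning to the noise level of real CT data. For practical use, scikit-image provides `skimage.restoration.denoise_nl_means`.

```python
import numpy as np

def nl_means(img, patch=3, search=7, h=0.1):
    """Basic Non-Local Means for a 2-D float image.

    Each output pixel is a weighted mean of the pixels in a search x search window,
    with weight exp(-mean((P_i - P_j)^2) / h^2) comparing the patch around the pixel
    being denoised (P_i) with the patch around each candidate pixel (P_j)."""
    pr, sr = patch // 2, search // 2
    padded = np.pad(img, pr + sr, mode="reflect")
    out = np.zeros(img.shape, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            ci, cj = i + pr + sr, j + pr + sr           # centre in padded coordinates
            ref = padded[ci - pr:ci + pr + 1, cj - pr:cj + pr + 1]
            weights, values = [], []
            for di in range(-sr, sr + 1):
                for dj in range(-sr, sr + 1):
                    ni, nj = ci + di, cj + dj
                    cand = padded[ni - pr:ni + pr + 1, nj - pr:nj + pr + 1]
                    dist2 = np.mean((ref - cand) ** 2)  # mean squared patch difference
                    weights.append(np.exp(-dist2 / h ** 2))
                    values.append(padded[ni, nj])
            w = np.asarray(weights)
            out[i, j] = np.dot(w, values) / w.sum()
    return out

rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[10:22, 10:22] = 1.0                            # bright "lesion" on dark background
noisy = clean + rng.normal(0, 0.1, clean.shape)
denoised = nl_means(noisy, patch=3, search=7, h=0.1)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

Because pixels across an edge have very different patches, they receive near-zero weight, which is why NLM can smooth flat regions without blurring lesion boundaries as much as a plain local average would.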

While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
Claims:
1. A computer-aided diagnostic system for lung cancer detection comprising:
a feature selection module employing Recursive Feature Elimination with Cross Validation (RFECV) for dimensionality reduction;
a data augmentation module employing Synthetic Minority Oversampling Technique (SMOTE) for balancing datasets;
a denoising module using Non-Local Means (NLM) for improving image quality;
a classification module employing gradient boosting with Bayesian hyperparameter optimization; and
a hybrid deep learning architecture comprising a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN), optionally integrated with an EfficientNet backbone and self-attention layers.

2. The system of claim 1, wherein RFECV reduces feature dimensions by 30–40% while retaining clinically relevant features.
3. The system of claim 1, wherein SMOTE generates synthetic samples of minority cancer cases to balance positive and negative class distributions.
4. The system of claim 1, wherein the hybrid CNN-RNN architecture captures both spatial tumor features and sequential dependencies across multi-slice imaging data.
5. The system of claim 1, wherein the EfficientNet backbone with self-attention enhances interpretability by focusing on tumor-relevant spatial features.
6. The system of claim 1, wherein Bayesian optimization tunes hyperparameters of the gradient boosting classifier to achieve improved diagnostic performance.
7. The system of claim 1, wherein the CAD framework achieves sensitivity and specificity above 90%, classification accuracy above 95%, and an AUC-ROC score exceeding 0.95.
8. The system of claim 1, wherein performance is evaluated using balanced metrics including F1-score, Matthews Correlation Coefficient (MCC), sensitivity, specificity, and AUC-ROC.

Documents

Application Documents

# Name Date
1 202541092417-STATEMENT OF UNDERTAKING (FORM 3) [26-09-2025(online)].pdf 2025-09-26
2 202541092417-REQUEST FOR EARLY PUBLICATION(FORM-9) [26-09-2025(online)].pdf 2025-09-26
3 202541092417-FORM-9 [26-09-2025(online)].pdf 2025-09-26
4 202541092417-FORM 1 [26-09-2025(online)].pdf 2025-09-26
5 202541092417-DRAWINGS [26-09-2025(online)].pdf 2025-09-26
6 202541092417-DECLARATION OF INVENTORSHIP (FORM 5) [26-09-2025(online)].pdf 2025-09-26
7 202541092417-COMPLETE SPECIFICATION [26-09-2025(online)].pdf 2025-09-26