Abstract: The present invention relates to an automated system and a method for lung cancer detection which produces more meaningful samples and enables early detection and accurate diagnosis of lung cancer. The present invention implements Vector Quantization Variational Auto Encoder (VQ-VAE), which produces more meaningful samples, especially in the context of small data sets. Then, a hybrid deep learning model [ResNet+GRU] is used to train on the augmented dataset and used for the detection and categorization of lung cancer. The results of this study contribute to the early detection and accurate diagnosis of lung cancer leading to improved treatment and survival rates for patients.
Description:
FIELD OF THE INVENTION
The present invention relates to an automated system for cancer detection and a method thereof. More particularly, the present invention relates to an automated system and a method for lung cancer detection which produces more meaningful samples and enables early detection and accurate diagnosis of lung cancer.
BACKGROUND OF THE INVENTION
Lung cancer poses a substantial public health challenge in low- and middle-income countries, including India. As per global statistics, pre-COVID, lung cancer emerged as the most diagnosed cancer worldwide, with approximately 2.2 million new cases and 1.8 million fatalities. Lung cancer develops when abnormal cells grow uncontrollably in the lungs, usually in the cells lining the air passages. It is associated with a high mortality rate, health complications, limited treatment options, and side effects of treatment. Lung cancer is often detected at an advanced stage when it has already spread beyond the lungs, making it more difficult to treat effectively. Additionally, certain types of lung cancer, such as small cell lung cancer, tend to be more aggressive and have lower survival rates compared to non-small cell lung cancer. Despite advancements in diagnostic techniques and treatment options for lung cancer, survival outcomes remain challenging.
This type of medical research often relies on large datasets to draw meaningful conclusions and develop accurate models. The major challenges in this field are as follows:
1. Deep learning models require a large quantity of labeled medical images for training. This is a major challenge in the medical field, where obtaining a large number of labeled medical images can be difficult.
2. Class imbalance is another major drawback that refers to a situation where one or more classes (categories) of data are underrepresented in the dataset, while others are overrepresented. This can pose a challenge for machine learning models as they may have the tendency to favor the majority class, leading to poor performance in the minority class.
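The effect of class imbalance described above can be illustrated with a minimal sketch. The class names and counts below are hypothetical and are not taken from any dataset in this specification. The sketch only shows that a classifier which always predicts the majority class attains high overall accuracy while never detecting the minority class.

```python
from collections import Counter

# Hypothetical label list: 90 majority-class images, 10 minority-class images.
labels = ["majority"] * 90 + ["minority"] * 10

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]

# A trivial classifier that always predicts the majority class.
predictions = [majority_class] * len(labels)

# Overall accuracy looks good because the majority class dominates.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def recall(cls):
    """Fraction of true samples of `cls` that were predicted as `cls`."""
    hits = sum(1 for p, y in zip(predictions, labels) if y == cls and p == cls)
    return hits / counts[cls]


# Ratio of the largest class to the smallest class.
imbalance_ratio = majority_count / min(counts.values())

print(accuracy, recall("minority"), imbalance_ratio)
```

Overall accuracy alone can therefore hide a complete failure on the minority class, which is why per-class metrics are reported alongside it.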
Lung cancer is the leading cause of cancer-related deaths worldwide, accounting for 25% of all cancer deaths. Medical imaging techniques such as chest X-ray, CT scan, Magnetic Resonance Imaging (MRI), and Positron Emission Tomography (PET) have been used in the detection and diagnosis of lung cancer. The analysis of medical images requires significant expertise and is time-consuming. The lack of real-time annotated images and their high dimensionality remain key issues in the medical field. In addition, the wide use of deep learning frameworks, which require a large amount of data, has made data augmentation (DA) crucial to avoid poor performance or over-fitting. However, these augmentation techniques are data-dependent and cannot be generalized. Class imbalance is another major setback for deep learning models. For example, non-cancerous images may be less readily available than cancerous images. In such cases, these models generate results that favor the class containing the largest number of images.
Multiple patent and non-patent documents have been published in this area. For instance, US patent application no. US-202217944881-A, titled “Systems, Methods, And Apparatuses For Systematically Determining An Optimal Approach For The Computer-Aided Diagnosis Of A Pulmonary Embolism”, discloses computer-aided diagnosis of Pulmonary Embolism (PE). In this prior art, a modified Convolutional Neural Network (CNN) architecture with a "squeeze and excitation (SE) block" is implemented for PE diagnosis. The model renders a prediction as to the presence or absence of PE.
Another patent application, no. CN-201910894463-A, titled “Method, device, equipment and storage medium for evaluating curative effect of cancer targeted therapy”, discloses a tool for evaluating cancer therapy efficacy. The invention involves three main steps:
a) Getting 3D tumor data that matches the tumor size in each image.
b) Analyzing the data using a characteristic learning network to obtain time sequence characteristic data.
c) Using a classification network with gating circulation to classify and output the evaluation result for the treatment's effectiveness.
Several deep learning models such as CNN, RNN and GAN were developed and are used for lung cancer detection. The major setbacks of the existing models are:
• They require a large amount of labelled data to train effectively. This can be a major challenge in the medical field, where obtaining a large amount of labeled medical images can be difficult.
• They generate false positives and false negatives which can be a concern in the context of lung cancer detection and treatment.
• CNNs are trained on specific datasets and may not generalize well to other datasets/populations. RNNs are computationally expensive, especially when analyzing the large size of medical images.
• Also, variable-length images cannot be handled well.
• GANs can generate images that look realistic but do not accurately reflect the desired characteristics of the underlying medical data.
Accordingly, the lack of large annotated image datasets and class imbalance are the two major challenges faced by the existing deep learning frameworks.
To overcome the above issues, the present invention provides a Vector Quantization Variational Auto Encoder (VQ-VAE), which produces more meaningful samples, especially in the context of small data sets. Then, a hybrid deep learning model combining ResNet (Residual Network) with GRU (Gated Recurrent Unit neural network) [ResNet+GRU] is trained on the augmented dataset and used for the detection and categorization of lung cancer. The results of this study can contribute to the early detection and accurate diagnosis of lung cancer, leading to improved treatment and survival rates for patients.
OBJECT OF THE INVENTION
In order to obviate the drawbacks in the existing state of art, the main object of the invention is to provide an automated system and method for cancer detection.
Yet another object of the invention is to provide an automated system and method for lung cancer detection using synthetic data generated by a Vector Quantization Variational Auto Encoder (VQ-VAE).
Yet another object of the invention is to provide an automated system and method for cancer detection using a hybrid deep learning model, a combination of ResNet (Residual Network) with GRU (Gated Recurrent Unit neural network).
Yet another object of the present invention is to provide an efficient system for early detection and accurate diagnosis of lung cancer.
Yet another object of the present invention is to provide an economical system for detection and categorization of lung cancer.
SUMMARY OF THE INVENTION:
Accordingly, the present invention discloses an automated system and a method thereof for detection and classification of cancer using generated synthetic data. More particularly, the present invention discloses a system and a method for lung cancer detection, classification and categorization as shown in Figure 1.
The present invention implements Vector Quantization Variational Auto Encoder (VQ-VAE), which produces more meaningful samples, especially in the context of small data sets. Then, a hybrid deep learning model [ResNet+GRU] is used to train on the augmented dataset and used for the detection and categorization of lung cancer. The results of this study can contribute to the early detection and accurate diagnosis of lung cancer leading to improved treatment and survival rates for patients.
The proposed VQ-VAE (Vector Quantization Variational Auto Encoder) model overcomes the above drawbacks of the existing methods by reconstructing the medical images using generative models, thereby increasing the number of samples on which the model can be trained. It also aims to handle class imbalance.
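As a minimal sketch of how generated samples could be used to balance the classes, the snippet below computes how many synthetic images each class would need to reach the size of the largest class. The "top up to the largest class" policy, the function name `synthetic_needed`, and the class counts are assumptions made for illustration; the specification does not state the balancing policy actually used.

```python
def synthetic_needed(class_counts):
    """Return, per class, the number of generated images required
    to match the size of the largest class (assumed policy)."""
    target = max(class_counts.values())
    return {name: target - n for name, n in class_counts.items()}


# Hypothetical per-class image counts of a small, imbalanced dataset.
counts = {"class_a": 120, "class_b": 45, "class_c": 80}
needed = synthetic_needed(counts)

# After adding the generated images, every class has the same size.
balanced = {name: counts[name] + needed[name] for name in counts}
print(needed, balanced)
```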
A small, imbalanced image dataset is obtained from the samples and pre-processed. Pre-processing resizes the images and removes noise using matched filtering. Image generation on the processed dataset is performed using an auto-encoder to obtain a balanced dataset. The auto-encoder used is the VQ-VAE (Vector Quantization Variational Auto Encoder). The images obtained from the auto-encoder are then segmented to extract the Region Of Interest (ROI). Segmentation is done by a fuzzy C-means segmentation technique. Thereafter, feature extraction and selection are performed by a deep learning system comprising a plurality of trained machine learning (“ML”) models, one of which is a hybrid deep learning model [ResNet+GRU].
Based on said extraction and selection, images are categorized and classified for cancer. In the case of lung cancer, classification determines whether it is small cell lung cancer or non-small cell lung cancer.
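The fuzzy C-means step mentioned above can be sketched on scalar pixel intensities as follows. This is a minimal, pure-Python illustration of the standard fuzzy C-means updates, not the implementation used in the invention. The fuzzifier `m = 2`, the deterministic initialisation, the sample intensities, and the choice of the brightest cluster as the ROI are all assumptions made for the example.

```python
def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=50, eps=1e-9):
    """Cluster scalar intensities with fuzzy C-means.

    Returns (centers, memberships), where memberships[k][i] is the degree
    to which values[k] belongs to cluster i, computed in the final
    iteration. Each membership row sums to 1.
    """
    lo, hi = min(values), max(values)
    # Deterministic initialisation: spread centers over the intensity range.
    centers = [lo + (hi - lo) * (i + 0.5) / n_clusters for i in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: closer centers receive higher membership.
        memberships = []
        for x in values:
            dists = [abs(x - c) + eps for c in centers]
            row = [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
            memberships.append(row)
        # Center update: mean of the values weighted by membership ** m.
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            centers[i] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return centers, memberships


# Hypothetical pixel intensities: dark background and a bright region.
pixels = [10, 12, 11, 9, 200, 198, 205, 202]
centers, memberships = fuzzy_c_means(pixels)

# Assumed ROI rule: pixels that mostly belong to the brightest cluster.
brightest = max(range(len(centers)), key=lambda i: centers[i])
roi_mask = [row[brightest] > 0.5 for row in memberships]
print(sorted(centers), roi_mask)
```

Unlike hard clustering, each pixel keeps a graded membership in every cluster, so the ROI threshold (0.5 here) can be tuned for ambiguous boundary pixels.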
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 displays a block diagram of the system.
Figure 2 (A) displays images depicting the image generation using VQ-VAE.
Figure 2 (B) displays the block diagram depicting image reconstruction using VQ-VAE Autoencoder.
Figure 3 displays the flowchart for lung cancer detection, classification, and categorization.
Figure 4 displays the evaluation results.
Figure 5 displays the original images and the corresponding reconstructed images.
DETAILED DESCRIPTION OF THE INVENTION WITH ILLUSTRATIONS AND EXAMPLES
While the invention has been disclosed with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made, and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from its scope.
Throughout the specification and claims, the following terms take the meanings explicitly associated herein unless the context clearly dictates otherwise. The meaning of “a”, “an”, and “the” include plural references. Additionally, a reference to the singular includes a reference to the plural unless otherwise stated or inconsistent with the disclosure herein.
Following is the non-limiting design and development of the system of the present invention.
The main purpose of the innovation is to provide an efficient automated system and method thereof for lung cancer detection, classification and categorization as shown in Figure 1.
The said method comprises the following steps:
The trained variational autoencoder is tested to re-generate the samples from the codebook as shown in Figure 2 (A) and (B).
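The codebook step can be sketched as a nearest-neighbour lookup, which is the standard quantization operation in a VQ-VAE: each encoder output vector is replaced by the closest codebook entry, and the decoder later receives the vectors recovered from the stored indices. The encoder and decoder networks are omitted, and the codebook and latent values below are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def lookup(indices, codebook):
    """Recover the quantized vectors (decoder input) from codebook indices."""
    return [codebook[k] for k in indices]


# Hypothetical 2-D codebook with three entries and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

indices = quantize(latents, codebook)
quantized = lookup(indices, codebook)
print(indices, quantized)
```

Because an image is represented by a grid of discrete indices, new samples can be produced by decoding index grids rather than continuous latent vectors.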
The design flow of the system is shown in Figure 3. The hardware components used in the invention include, but are not limited to, a CT scanner, which is used to capture the lung image for analysis, and a highly configured system to analyze the images.
The various software components as used in the invention are as follows:
• Deep Learning Toolbox: Used to detect lung cancer from the CT image and to classify and categorize the cancer.
• TensorFlow: An open-source machine learning framework used for training deep learning models, image processing and augmentation, transfer learning, model optimization, and deployment.
• Keras: A high-level deep-learning library used for model architecture design, integration with the TensorFlow backend, data augmentation and pre-processing, and training and evaluation of the model.
• Python: A versatile and widely used programming language that offers several tools and libraries that can be leveraged for lung cancer detection tasks. It enables data processing and analysis and aids in the visualization of the data.
The VQ-VAE model has various advantages over other automated techniques: improved accuracy, reduced computational cost, robustness to noise, improved interpretability, and efficient transfer learning.
Overall, the advantages of VQ-VAE over other automated techniques for lung cancer detection make it a promising approach for improving the accuracy and efficiency of lung cancer diagnoses. The goal of this research is to develop a hybrid deep-learning model which is used to reconstruct, detect, and classify lung cancer images. In the first phase, the VQ-VAE model was developed for image reconstruction. In the second phase, a deep learning model was trained on the reconstructed images generated in the first phase. Then lung cancer segmentation, classification, and categorization were performed. The developed model was able to classify and predict the stage of cancer in a very efficient manner.
EXAMPLES:
Examples are described herein in the context of systems and methods for classifying and predicting the stage of cancer in the patients. Those of ordinary skill in the art will realize that the following description is illustrative only and is not intended to be in any way limiting.
To experiment with and validate the proposed model, the “Chest CT-scan image dataset” [open-source dataset] is used. This dataset comprises diverse chest cancer images along with normal cell images, categorized into Adenocarcinoma, Large cell carcinoma, and Squamous cell carcinoma types.
The original images and the corresponding reconstructed images are shown in Figure 5.
The proposed model is validated under the following scenarios.
(i) Case I: Chest CT-Scan dataset – 80% training and 20% testing.
(ii) Case II: Augmented dataset (Chest CT-Scan dataset and generated reconstructed images) – 80% training and 20% testing.
(iii) Case III: The training dataset comprises 80% of the data from the original dataset and the testing dataset comprises 20% of the data from the reconstructed images.
(iv) Case IV: The training dataset comprises 80% of the data from the reconstructed images and the testing dataset comprises 20% of the data from the original dataset.
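The four scenarios above can be sketched as follows. This is one possible reading of the splits, not the procedure actually used: the file names, the dataset sizes, the fixed random seed, and the helper `split_80_20` are hypothetical.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and return (first 80%, remaining 20%)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 80 // 100
    return shuffled[:cut], shuffled[cut:]


# Hypothetical image identifiers for the original and reconstructed sets.
original = [f"orig_{i}" for i in range(100)]
reconstructed = [f"recon_{i}" for i in range(100)]

orig_train, orig_test = split_80_20(original)
recon_train, recon_test = split_80_20(reconstructed)
aug_train, aug_test = split_80_20(original + reconstructed)

# Each scenario is a (training set, testing set) pair.
scenarios = {
    "Case I": (orig_train, orig_test),
    "Case II": (aug_train, aug_test),
    "Case III": (orig_train, recon_test),
    "Case IV": (recon_train, orig_test),
}

for name, (train, test) in scenarios.items():
    print(name, len(train), len(test))
```

In Cases III and IV the training and testing images come from different sources, so these cases indicate how well a model trained on one kind of image transfers to the other.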
The corresponding confusion matrix and evaluation results are shown in Table 1 below and in Figure 4.
Table 1:

Case (I)

| Class | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Adenocarcinoma | 0.75 | 0.75 | 0.75 | 0.75 |
| Large cell | 0.67 | 0.75 | 0.71 | 0.75 |
| Normal | 1.00 | 1.00 | 1.00 | 1.00 |
| Squamous | 1.00 | 0.88 | 0.93 | 0.88 |

Case (II)

| Class | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Adenocarcinoma | 0.88 | 0.94 | 0.91 | 0.94 |
| Large cell | 0.88 | 0.94 | 0.91 | 0.94 |
| Normal | 1.00 | 1.00 | 1.00 | 1.00 |
| Squamous | 1.00 | 0.85 | 0.92 | 0.85 |

Case (III)

| Class | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Adenocarcinoma | 0.89 | 0.81 | 0.85 | 0.81 |
| Large cell | 0.87 | 1.00 | 0.93 | 1.00 |
| Normal | 1.00 | 1.00 | 1.00 | 1.00 |
| Squamous | 0.84 | 0.80 | 0.82 | 0.80 |

Case (IV)

| Class | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Adenocarcinoma | 0.61 | 0.85 | 0.71 | 0.85 |
| Large cell | 0.93 | 0.65 | 0.76 | 0.65 |
| Normal | 1.00 | 1.00 | 1.00 | 1.00 |
| Squamous | 0.89 | 0.80 | 0.84 | 0.80 |
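Per-class metrics of the kind reported in Table 1 are conventionally derived from a confusion matrix. The following is a minimal sketch of that derivation; the two-class confusion matrix is hypothetical and is not the one obtained in the experiments. The last lines recompute one F1-score from the precision and recall reported in Table 1.

```python
def per_class_metrics(confusion, class_names):
    """Compute precision, recall and F1 per class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(class_names)
    metrics = {}
    for i, name in enumerate(class_names):
        tp = confusion[i][i]
        predicted = sum(confusion[r][i] for r in range(n))  # column sum
        actual = sum(confusion[i])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical confusion matrix: rows are true classes, columns predictions.
classes = ["cancer", "normal"]
confusion = [[8, 2],
             [1, 9]]
m = per_class_metrics(confusion, classes)
print(m)

# F1 recomputed from the Case (I) "large cell" precision and recall in Table 1.
f1_check = round(2 * 0.67 * 0.75 / (0.67 + 0.75), 2)
print(f1_check)  # 0.71, matching the F1-score reported in that row
```

In every row of Table 1 the Accuracy value coincides with the Recall value, which is consistent with per-class accuracy being computed as the fraction of each class's images that were correctly identified.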
Claims:
We claim:
1. A method for early detection and diagnosis of lung cancer comprising:
- obtaining one or more imbalanced image datasets of a sample;
- pre-processing a plurality of imbalanced image datasets from one or more images;
- image generation of the processed dataset using an Auto-encoder to obtain a balanced dataset;
- segmentation of the generated images as received from Auto-encoder;
- extraction and selection of requisite images as received from Auto-encoder;
- determining and classifying cancer by the deep learning system comprising a plurality of trained machine learning (“ML”) models;
wherein the method is able to classify and predict the stage of cancer in a very efficient manner.
2. A method as claimed in claim 1 wherein said Auto-encoder is a Vector-Quantization Variational Auto Encoder (VQ-VAE).
3. A method as claimed in claim 1 wherein said deep machine learning model is a hybrid of Residual Neural Network + Gated Recurrent Unit [ResNet+GRU].
4. A method as claimed in claim 1 wherein pre-processing resizes the images and removes the noise by using matched filtering.
5. A method as claimed in claim 1 wherein segmentation is performed by a fuzzy C-means segmentation technique.
6. A method as claimed in claim 1 wherein a hardware component such as a CT scanner is used to capture the lung image for analysis.
7. An automated system for early detection and diagnosis of lung cancer, wherein said system comprises:
- a memory to store instructions;
- a processor to execute the instructions stored in the memory;
- an autoencoder for producing meaningful samples;
- a hybrid deep learning model for detection and categorization of lung cancer for early and improved diagnosis;
wherein the system is configured to receive a plurality of medical images for early detection and classification of lung cancer.
| # | Name | Date |
|---|---|---|
| 1 | 202341084626-STATEMENT OF UNDERTAKING (FORM 3) [12-12-2023(online)].pdf | 2023-12-12 |
| 2 | 202341084626-REQUEST FOR EXAMINATION (FORM-18) [12-12-2023(online)].pdf | 2023-12-12 |
| 3 | 202341084626-REQUEST FOR EARLY PUBLICATION(FORM-9) [12-12-2023(online)].pdf | 2023-12-12 |
| 4 | 202341084626-FORM-9 [12-12-2023(online)].pdf | 2023-12-12 |
| 5 | 202341084626-FORM FOR SMALL ENTITY(FORM-28) [12-12-2023(online)].pdf | 2023-12-12 |
| 6 | 202341084626-FORM FOR SMALL ENTITY [12-12-2023(online)].pdf | 2023-12-12 |
| 7 | 202341084626-FORM 18 [12-12-2023(online)].pdf | 2023-12-12 |
| 8 | 202341084626-FORM 1 [12-12-2023(online)].pdf | 2023-12-12 |
| 9 | 202341084626-FIGURE OF ABSTRACT [12-12-2023(online)].pdf | 2023-12-12 |
| 10 | 202341084626-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [12-12-2023(online)].pdf | 2023-12-12 |
| 11 | 202341084626-EVIDENCE FOR REGISTRATION UNDER SSI [12-12-2023(online)].pdf | 2023-12-12 |
| 12 | 202341084626-DRAWINGS [12-12-2023(online)].pdf | 2023-12-12 |
| 13 | 202341084626-DECLARATION OF INVENTORSHIP (FORM 5) [12-12-2023(online)].pdf | 2023-12-12 |
| 14 | 202341084626-COMPLETE SPECIFICATION [12-12-2023(online)].pdf | 2023-12-12 |
| 15 | 202341084626-ENDORSEMENT BY INVENTORS [12-01-2024(online)].pdf | 2024-01-12 |
| 16 | 202341084626-Proof of Right [30-01-2024(online)].pdf | 2024-01-30 |
| 17 | 202341084626-ENDORSEMENT BY INVENTORS [30-01-2024(online)].pdf | 2024-01-30 |
| 18 | 202341084626-FORM-26 [20-02-2024(online)].pdf | 2024-02-20 |
| 19 | 202341084626-FER.pdf | 2025-08-28 |
| 1 | 202341084626_SearchStrategyNew_E_SearchHistory-202341084626E_21-07-2025.pdf |