Deep Learning Method To Detect Chest X Ray Or Ct Scan Images Based On Hybrid Yolo Model

Abstract: The present invention relates to a system and method for building a novel model that accurately, quickly and automatically quantifies the severity of seventeen lung diseases, provides RGB-coloured heat-map images to represent disease severity, and handles both CT scan and X-ray images in the same system. It is a deep learning method based on a hybrid YOLO CNN model, which is built on multiple neural network parameters and powered by the YOLO framework, in which YOLO is a convolutional neural network consisting of at least twenty-four convolutional layers followed by at least two fully connected layers. After analysing the images, the system carries out the diagnosis process, in which disease severity is predicted, and it plots bounding boxes where diseases are predicted, using differently coloured square boxes together with their confidence.

Patent Information

Application #
Filing Date
31 March 2022
Publication Number
15/2022
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

Manentia Advisory Private Limited
A-44 Rosedale County-I, Sundarpura, Taluka- Vadodara Vadodara Gujarat India 391240
Pandit Deendayal Energy University
Pandit Deendayal Energy University PDEU Road, Raisan Gandhinagar Gujarat India 382426
PDEU Innovation and Incubation Centre
Pandit Deendayal Energy University PDEU Road, Raisan Gandhinagar Gujarat India 382426

Inventors

1. ANUJ CHANDALIA
2 S.V.P ROAD JAMNAGAR Gujarat India 361001
2. HITESH GUPTA
2 S.V.P ROAD JAMNAGAR Gujarat India 361001

Specification

Claims:

We Claim:
1. A Deep Learning Method to Detect Chest X-Ray or CT Scan Images based on Hybrid YOLO Model, wherein the hybrid model has following characteristics and is built as follows:
wherein the XChesNet Model is built on top of YOLO by tuning its hyper parameter on the torch-1.6.0 framework;
Wherein for CNN Architecture, the YOLOV5 backbone is used;
wherein the X-Ray or CT Scan image has resolution of at least 224 x 224 × 3;
wherein the present algorithm applies a single neural network to entire X-Ray or CT Scan image; then the said network divides the said image into regions to provide the bounding boxes and to predict probabilities for each region;
wherein it uses non-max suppression technique to detect each object only once, and it discards any false detections;
wherein the said YOLO model consists of at least twenty-four convolutional layers, followed by at least two fully connected layers and the said layers are separated by their functionality;
wherein the Input layer tuned for taking specified Image dimension, and the output layer is tuned to seventeen neurons, which are used for seventeen classes;
wherein the Weights of the said convo layer are updated by binary cross entropy losses;
wherein the first twenty convolutional layers were followed by an average pooling layer;
wherein the fully connected layer is pre-trained on the ImageNet dataset, which is a 1000-class classification dataset;
wherein the said layers comprise 3×3 convolutional layers and 1x1 reduction layers;
wherein for object detection, four convolutional layers followed by two fully connected layers are added in the end;
wherein the resolution of the dataset is increased to at least 448 x 448;
wherein the final layer predicts the class probabilities and bounding boxes;
wherein the sigmoid activation function is used in the final detection layer and the Leaky ReLU activation function is used in the middle/hidden layers;
wherein the input is at least 448 x 448 for the X-Ray or CT Scan image and the output is the class prediction of the detected object enclosed in the bounding box;
wherein it provides YOLOV5 with FP16 support for faster training, quantization support, and a flexible codebase;
wherein the Cross Stage Partial Networks are used as a backbone in YOLOV5 to extract rich, useful characteristics from an input image;
wherein PANet is used as a neck in YOLOV5 to obtain feature pyramids, which are useful in identifying the same object at different sizes and scales; and
wherein model Head uses anchor boxes to construct final output vectors with class probabilities, objectness scores, and bounding boxes for the final detection step.
2. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein the system specifies the level of severity (mild, moderate, severe) and the areas of infection for one or more of the following lung diseases, using either X-ray images or CT scan images: Aortic enlargement, Atelectasis, Calcification, Cardiomegaly, ILD, Infiltration, Lung Opacity, Nodule/Mass, Other lesion, Pleural effusion, Pleural thickening, Pneumothorax, Pulmonary fibrosis, Covid, Edema, Pneumonia, Tuberculosis, or a normal condition, by marking infected areas with a heatmap and bounding box.
3. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein for training, the initial learning rate is set to 0.01 (SGD=1E-2, Adam=1E-3), the final one cycle learning rate is set to 0.2; the momentum is taken as 0.937 with an optimizer weight decay of 0.0005; set epochs to 3, and set the box loss gain to 0.05, class loss gains to 0.5, and object loss gain to 1.0.
4. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein the SGD is used for optimization function for training.
5. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein, between the target and the output, the class BCELoss and the object BCELoss are set to 1.0; the IoU (Intersection over Union) training threshold is set to 0.20; and the number of anchors per output layer is set to 4.
6. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein the focal loss gamma is set to 0.0; set the hue to 0.015, saturation to 0.7, and value to 0.4; set degrees to 0.0; set translation to 0.1; set scale and shear to 0.5 and 0.0; set the perspective range to 0-0.001; set the probability of flipping images from left to right to 50%; set the mosaic probability to 1; set the probability of flipping images from up-down to 0.
7. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein the Initial Learning Rate set to (1, 1e-5, 1e-1) with SGD class and Adam, whereas final learning rate is maximized to (1, 0.01, 1.0).
8. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein the SGD momentum, Optimizer weight decay, warmup epochs, warmup initial momentum and warmup initial bias learning rate are set to (0.3, 0.6, 0.98), (1, 0.0, 0.001), (1, 0.0, 5.0), (1, 0.0, 0.95) and (1, 0.0, 0.2) respectively.
9. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein the box loss gain, cls loss gain, class BCELoss positive weight, obj loss gain (scaled with pixels) and obj BCELoss positive_weight is set to (1, 0.02, 0.2), (1, 0.2, 4.0), (1, 0.5, 2.0), (1, 0.2, 4.0) and (1, 0.5, 2.0) respectively.
10. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein the Intersection over Union (IoU) training threshold for Object Detection is initialized with (0, 0.1, 0.7); the anchor-multiple threshold and anchors per output grid (0 to ignore) are initialized to (1, 2.0, 8.0) and (2, 2.0, 10.0) respectively.
11. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein for image tuning, focal loss gamma (efficientDet default gamma=1.5), image HSV-Hue augmentation (fraction), image HSV-Saturation augmentation (fraction), image rotation (+/- deg), image translation (+/- fraction), image scale (+/- gain), image shear (+/- deg) and image perspective (+/- fraction), range 0-0.001 are initialized to (0, 0.0, 2.0), (1, 0.0, 0.1), (1, 0.0, 0.9), (1, 0.0, 0.9), (1, 0.0, 45.0), (1, 0.0, 0.9), (1, 0.0, 0.9), (1, 0.0, 10.0) and (0, 0.0, 0.001) respectively.
12. The Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model as claimed in claim 1, wherein the image flip and mixup parameters are image flip up-down (probability), image flip left-right (probability), image mixup (probability) and image mixup (probability) are initialized with (1, 0.0, 1.0), (0, 0.0, 1.0), (1, 0.0, 1.0) and (1, 0.0, 1.0) respectively.
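As an illustration of the training settings in claim 3, the following is a minimal sketch of how a one-cycle learning rate could move from the initial value 0.01 to the final value over 3 epochs. It assumes that the final one-cycle rate 0.2 acts as a multiplier on the initial rate and that the ramp is cosine shaped; both are assumptions made for illustration, and the function name one_cycle_lr is hypothetical rather than taken from the specification.

```python
import math

# Values quoted from claim 3.
LR0 = 0.01     # initial learning rate (SGD)
LRF = 0.2      # final one-cycle learning rate (ASSUMED to multiply LR0)
EPOCHS = 3     # number of training epochs

def one_cycle_lr(epoch, lr0=LR0, lrf=LRF, epochs=EPOCHS):
    """Cosine ramp from lr0 at epoch 0 to lr0 * lrf at the last epoch.

    The cosine shape is an assumption; the claims do not state it.
    """
    ramp = (1 - math.cos(math.pi * epoch / epochs)) / 2  # goes 0 -> 1
    return lr0 * (1 + ramp * (lrf - 1))

# Learning rate at the start of each epoch and at the end of training.
schedule = [one_cycle_lr(e) for e in range(EPOCHS + 1)]
```

Under these assumptions the rate starts at 0.01 and ends at 0.002, decreasing at every step in between.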
Description:
Title of Invention
Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model

The present patent application is a patent of addition to patent application number 202221013695, filed on 14.03.2022. The said basic patent application is entitled “Deep Learning Method to Diagnose Chest X-Ray or CT Scan Images based on Hybrid ResNet”.

Field of Invention
The invention relates to a method for detecting nodules in a lung CT image based on an improved YOLO algorithm and belongs to the technical field of computer medical images.

Background of Invention
Medical imaging is an important source of information in disease diagnosis. Imaging technologies include Computed Tomography (CT), Single Photon Emission Computed Tomography (SPECT), MRI, ultrasound imaging and traditional X-ray, of which X-ray remains one of the important bases for radiological diagnosis. The chest cavity is regarded as a mirror of human health and disease because it contains many important tissue structures and can provide varied information about the body, such as the diagnosis of lung diseases, rib fractures and injuries. Chest imaging may also be used to help diagnose and monitor treatment of a variety of lung conditions such as pneumonia, emphysema and cancer. Chest radiography (X-ray) is an essential part of diagnostic and monitoring examination and is the first step in the radiological evaluation of patients with suspected respiratory diseases. Because a chest X-ray is fast and easy, it is particularly useful in emergency diagnosis and treatment. Chest X-rays are among the most common medical imaging procedures, with 2 to 10 times more scans than other imaging modalities such as MRI and CT. This volume of chest X-rays places a significant workload on radiologists and medical practitioners. At present, reading CT scans and X-ray images depends on the experience and judgement of the medical staff. In the prior art, lung disease is identified by manual observation, so identification efficiency is low and the result depends directly on the experience of the medical staff. Because doctors in hospitals currently judge lung nodule size from experience by visually inspecting CT scan and X-ray images, interns need a long time to accumulate such experience. The experience and technical level of each doctor also differ, so different doctors may reach different results.
Visual inspection is, moreover, subjective to a degree and prone to error.
In comparison, lung CT examination is real-time and rapid, offers a high positive rate, and shows a high correlation between lung lesion areas and clinical symptoms. CT examination was therefore adopted as the main diagnostic tool during the epidemic outbreak period, so that missed and delayed diagnoses, and the resulting delays in isolation, are avoided and the infection source can be cut off for prevention and control.
Medical CT (computed tomography) is one of the most common and effective imaging examinations. It uses an X-ray collimation system to obtain clear cross-sectional images with accurate layer thickness, high density resolution, and no interference from out-of-layer structures. With the widespread use of medical CT, the volume of lung CT image data is growing explosively, and a whole-lung CT usually contains 100-500 sectional images. In addition, early pulmonary nodules are generally small, have blurred edges and are difficult to distinguish with the naked eye. Identifying early pulmonary nodules and their positions from CT images therefore greatly increases the workload of doctors and places very high demands on their professional level and experience, and the results vary from doctor to doctor. A method for processing CT image data is thus needed that can assist doctors in quickly and accurately identifying pulmonary nodules and their positions.
CT imaging technology is mature. On CT, novel coronavirus pneumonia appears as single or bilateral, multiple ground-glass opacities with a grid-like texture. In the early stage the findings are mainly local patchy opacities in a subpleural distribution, in the progressive stage bilateral multiple areas of consolidation appear, and diffuse change in both lungs presents as "white lung". Experienced doctors can read a patient's CT images accurately, but a method is urgently needed that reduces the pressure on medical workers and diagnoses lung CT images quickly and accurately.
Chest CT can verify the position and the affected area of a lesion and can roughly distinguish benign from malignant lesions. Because CT images have good density resolution for various lung lesions, CT has become an important means of diagnosing lung cancer. Pulmonary nodules are the lesion manifestation of early-stage lung cancer and appear on CT images as opaque shadows in the lung. Malignant nodules show irregular or spiculated edges and needle-point or eccentric calcification, whereas benign nodules have smooth edges without obvious lobulation or spiculation. However, since CT is a tomographic technique, the total lung CT of one case usually contains 200-500 images. Identifying tiny lung nodules among so many images and distinguishing benign lesions from malignant tumours or other lung lesions can be a significant physical and mental challenge for imaging diagnosticians. Moreover, in early CT images pulmonary nodules are small, have indistinct edges and are difficult to distinguish with the naked eye, so missed diagnoses and misdiagnoses increase. To reduce the image-reading burden on imaging department doctors, improve diagnostic accuracy, and reduce the missed diagnosis and misdiagnosis rates, Computer Aided Diagnosis (CAD) systems have been developed. They have become an effective auxiliary tool for imaging department doctors and play an important role in improving working efficiency and diagnostic accuracy.
Patent application number CN112233117A discloses a CT detection, identification and positioning system and computing equipment for novel coronavirus pneumonia. The system comprises an image acquisition unit, a model building unit, a lesion recognition unit, and a lesion positioning unit. The image acquisition unit acquires the CT image of novel coronavirus pneumonia to be identified and detected, a lesion segmentation training data set of such CT images, and an identification training set of such CT images. The model building unit establishes a U_Net convolutional neural network model, an InceptionV3 network with an added attention mechanism, and a target detection model. The lesion recognition unit identifies the contour feature image of the segmented lesion, and the lesion positioning unit determines the position of the lesion in the lung of the human body. In this system, the U_Net convolutional neural network model detects and segments the lesion, the network with the attention mechanism identifies novel coronavirus pneumonia, and the target detection model locates the position of the lesion in the lung; the disclosure reports high identification accuracy and high calculation speed.
Patent application number CN111127438A discloses a lung CT image nodule detection method based on an improved YOLO algorithm, belonging to the technical field of computer medical images. The method comprises the following steps: pre-processing a CT image; dividing grids; carrying out clustering analysis on a data set by using a K-means algorithm; and constructing a deep convolutional neural network with dense connections among multiple scales, drawing on the ideas of the Darknet-53 and DenseNet networks. The experimental results reported there show that both the accuracy and the detection efficiency of pulmonary nodule detection are improved by the improved deep convolutional neural network. According to that disclosure, the precision, recall and efficiency of pulmonary nodule detection are greatly improved, which provides conditions for real-time detection of pulmonary nodules in lung CT images.
Patent application number CN112184684A discloses an improved YOLO-v3 algorithm and its application in pulmonary nodule detection. The method comprises the following steps: first, optimizing the detection-frame loss function of the YOLO-v3 algorithm and optimizing the anchor frames according to the average intersection ratio; scaling the images of the obtained data set, and performing cell division and feature extraction on an input image to complete the improvement of the YOLO-v3 algorithm; then obtaining the Luna16 data set and pre-processing its images, wherein the pre-processing converts the gray value of a CT image into an HU value, generates a mask, and finally performs normalization and size unification; rotating the pre-processed images by 90, 180 and 270 degrees respectively, renaming the images according to the PASCAL VOC naming rule, and converting all data into a data set in VOC format; and inputting a training set divided from the data set into the improved YOLO-v3 algorithm for feature extraction, generating a nodule prediction box and a prediction confidence probability, thereby improving detection accuracy.
The present novel invention, Deep Learning Method to Detect Chest X Ray or CT Scan Images based on Hybrid YOLO Model, is built with tele-screening software which, as an add-on feature, can show the severity of an infection, lung opacity, the confidence of the disease, and the location of the disease. The present novel system is less time-consuming, as it takes a maximum of 20 seconds for the diagnosis. Doctors can therefore diagnose more patients and start their treatment as soon as possible. The present novel system can also diagnose early-stage infection. Hospitals, small clinics, diagnosis centres and healthcare professionals can all use the present novel system, and it can also be helpful in rural areas and villages. The feature map is processed by a convolutional neural network to obtain its feature information: the input is the feature map output by the ResNet system, which is passed through the present YOLOV5 network (the present novel invention is compatible with YOLOV5 and above versions) to detect and obtain the feature information of the feature map. The YOLOV5 network comprises a plurality of convolution layers and a plurality of fully connected layers, wherein each convolution layer is used for extracting the features in the feature map and each fully connected layer is used for predicting the image position and the class probability of the features. An automated diagnosis report is generated that specifies the level of severity (mild, moderate, severe) as well as the areas of infection (high, low). The automated report also calculates the parameters of lung opacity, cardiomegaly, pulmonary fibrosis and aortic enlargement.

Objectives of the Invention

• The principal objective of the present novel invention is to identify lung disease stages, severity level (mild, moderate, severe) as well as areas of infection using improved YOLOV5.
• Another objective of the present invention is to determine the infection level of diseases in lung areas in a right to left manner.
• Another objective of the present invention is to use ground-based lung segmentation for abnormal detection and bifurcation in different ways.
• Another objective of the present invention is to determine the severity of infection by identifying and labelling the clear infected areas in the mild, moderate, and severe stages.
• Another objective of the present invention is to compute the parameters of lung opacity, cardiomegaly, pulmonary fibrosis and aortic enlargement and to show the bounding box areas in X-ray images.
• Another objective of the present invention is to be usable by hospitals, small clinics, diagnosis centres and healthcare professionals, including in rural areas and villages.
• Another objective of the present invention is to use a hybrid deep learning approach to achieve promising accuracy above 90% on lung disease.

List of Drawings
Figure 1: Overall Performance Graph
Figure 3: Sample outputs with marking infected area (output of the present model building, of scans with known label)
Figure 4: Sample outputs with marking infected area (output of the present model building, of the clinical unknown samples)
Figure 5: Flow chart for present novel hybrid YOLOV5 deep learning model building
Figure 6: YOLO CNN Architecture
Figure 7: YOLO CNN Architecture
Figure 8: System Process

Detailed Description of Invention
To further clarify the objects, technical solutions, and advantages of the present application, the present invention will be further described with reference to the accompanying drawings and examples, and embodiments of the present invention include but are not limited to, the following examples. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be understood that this invention is not limited to the particular methodology, protocols, systems, etc., described herein and as such may vary. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present invention, which is defined solely by the claims. As used in the specification and appended claims, unless specified to the contrary, the following terms have the meaning indicated below:
“Architecture” refers to a set of rules and methods that describe the functionality, organization, and implementation of computer systems.
"Convolutional Neural Network (CNN)” refers to a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal pre-processing. A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers (also known as convo layers), pooling layers, fully connected layers, and normalization layers. Convolutional layers apply a convolution operation to the input, passing the result to the next layer. Local or global pooling layers combine the outputs of neuron clusters at one layer into a single neuron in the next layer. Fully connected layers connect every neuron in one layer to every neuron in another layer. CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage.
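To make the convolution and pooling operations named in this definition concrete, the following is a minimal pure-Python sketch. It is an illustration only, not the implementation used in the present invention: the function names conv2d_valid and max_pool2d, the example image and the example kernel are all hypothetical, and the convolution is computed without kernel flipping, as is usual in deep learning practice.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool2d(fmap, size=2):
    """Combine each non-overlapping size x size cluster into a single value."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]

# Hypothetical 4 x 4 single-channel image and 2 x 2 kernel.
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
kernel = [[1, 0],
          [0, 1]]

feature_map = conv2d_valid(image, kernel)   # 3 x 3 feature map
pooled = max_pool2d(feature_map, size=2)    # 1 x 1 after pooling
```

In a trained CNN the kernel values are learned rather than hand-chosen, which is the advantage the definition above describes.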
The present novel XChesNet Model has been built on top of YOLO (XChes13Net2.0 and YOLOV5; the present novel system is also compatible with higher versions of the same) by tuning its hyperparameters on the torch-1.6.0 framework.
Below are the diseases or infections which can be detected using a present novel hybrid deep learning integrated interface to inspect machine-based Chest CT scan or X-ray images:
1. Aortic enlargement
2. Atelectasis
3. Calcification
4. Cardiomegaly
5. ILD (Interstitial Lung Disease)
6. Infiltration
7. Lung Opacity
8. Nodule/Mass
9. Other lesion
10. Pleural effusion
11. Pleural thickening
12. Pneumothorax
13. Pulmonary fibrosis
14. Covid
15. Edema
16. Pneumonia
17. Tuberculosis

The present novel model is also capable of detecting whether the Chest CT scan or X-ray images of the patient are in normal condition.
A further detailed description of the working of the present novel invention, a novel YOLO deep learning method for detection of critical findings in the lung using chest X-ray or CT scan images based on ResNet, is given in the following outline, together with examples:
1.1 Datasets
1.1.1 NIH Chest X-rays 112k Dataset
1.1.2 NIH Chest CT32K Dataset
1.1.3 VINBIG Chest Xray 18k Dataset
1.2 Reading the Scans
1.3 Developing the YOLOV5 Deep Learning Novel Model
1.3.1 Model Architecture
1.3.1.1 Backbone:
1.3.1.2 Neck
1.3.1.3 Head
1.3.2 While training, hyperparameters evolve over time, based on the accuracy
1.4 Machine Learning Algorithm Report (Results and Conclusion)

1.1 Datasets
For the development of the present novel hybrid deep learning integrated interface, which inspects machine-based chest CT scan or X-ray images to identify and diagnose seventeen types of pathological labels and clinical findings, the following three datasets were used: NIH Chest X-rays 112k Dataset, NIH Chest CT32K Dataset, and VINBIG Chest Xray 18k Dataset.
1.1.1 NIH Chest X-rays 112k Dataset
National Institute of Health (NIH), Chest X-ray Dataset comprises 112,120 X-ray images with seventeen disease labels. In total, 108,948 frontal-view X-ray images are in the database, of which 24,636 images contain one or more pathologies. The remaining 84,312 images are normal cases. The main body of each chest X-ray report is generally structured as “Comparison”, “Indication”, “Findings”, and “Impression” sections. The said dataset is available on Kaggle platform at: https://www.kaggle.com/nih-chest-xrays/data, which the present novel system has used for the machine learning of the present novel invention. In the present invention, applicants have used this dataset for novel model building and for training purposes, also in the present novel invention development, applicants have used about 30k unique images from the said dataset for validation and testing of the novel developed algorithm. The present invention focuses on detecting disease concepts in the findings and impression sections.
1.1.2 NIH Chest CT32K Dataset
The National Institutes of Health’s Clinical Center has made a large-scale dataset of CT images publicly available to help the scientific community improve the detection accuracy of lesions. This dataset by NIH, named DeepLesion, has over 32,000 annotated lesions identified on CT images. The images, which have been thoroughly anonymized, represent 4,400 unique patients, who are partners in research at the NIH. The said dataset is available on Kaggle platform at: https://www.kaggle.com/kmader/nih-deeplesion-subset which the applicants have used for the machine learning and model building of the present novel invention. The dataset released is large enough to train a deep neural network – to develop a universal lesion detector that is helping radiologists to find all types of lesions. Based on this the present novel invention is developed to create a large-scale universal lesion detector with one unified framework. The present novel invention is capable of more accurately and automatically measuring sizes of all lesions a patient may have, enabling the whole body assessment in an easy, accurate, and efficient manner.
1.1.3 VINBIG Chest Xray 18k Dataset
The present invention uses preprocessed images from VinBigData. For the said dataset, the original DICOM files are converted to PNG images, preserving the resolution and aspect ratio. The preprocessing algorithm helps to augment the images, which are uploaded for kernel processing at their original size as lossless PNG. The VinDr-CXR dataset was built to provide the research community with a large dataset of chest X-ray (CXR) images with high-quality labels. It draws on more than 100,000 raw images in DICOM format that were retrospectively collected from Hospital 108 and the Hanoi Medical University Hospital in Vietnam. The published dataset consists of 18,000 posteroanterior (PA) view CXR scans that come with both the localization of critical findings and the classification of common thoracic diseases. These images were annotated by a group of 17 radiologists with at least 8 years of experience for the presence of 22 critical findings (local labels) and 6 diagnoses (global labels); each finding is localized with a bounding box. The local and global labels correspond to the “Findings” and “Impressions” sections, respectively, of a standard radiology report. This dataset is divided into two parts: a training set of about 15,000 scans and a test set of about 3,000 scans. The said dataset is available on the Kaggle platform at: https://www.kaggle.com/corochann/vinbigdata-chest-xray-original-png, which the present novel system has used for the machine learning and model building of the present novel invention.
1.2 Reading the Scans
For the development of the present novel hybrid YOLOV5 model, about 50% of images with known labels from above-mentioned dataset of Chest CT scan images and X-ray images from various sources (NIH Chest X-rays 112k Dataset, NIH Chest CT32K Dataset, and VINBIG Chest Xray 18k Dataset) were imported to the dataset. The said dataset has measurement and markings for clinically meaningful findings by the experienced radiologists, by an electronic bookmark tool, and was used to make the system learn and for model training.
Random images and reports of the Chest CT Scan or X-ray from the said dataset are further evaluated by a group of experienced radiologists to check the appropriateness of the said dataset.
In compiling the master dataset for the development of the present novel hybrid deep learning integrated interface, which inspects machine-based chest CT scan or X-ray images to identify and diagnose seventeen types of pathological labels and clinical findings, the findings and bookmarks mentioned in the said clinical reports written by radiologists were considered the gold standard.
The present novel hybrid YOLOV5 deep learning integrated interface was tested using 40% images with known labels and validated by 10% images with known labels based on the reports from the subset of said datasets (NIH Chest X-rays 112k Dataset, NIH Chest CT32K Dataset, and VINBIG Chest Xray 18k Dataset) to ensure that the inferred information was accurate and could be used as the gold standard.
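The 50% training, 40% testing and 10% validation proportions described in this section can be sketched as a simple shuffled split. This is an illustrative sketch only: the function name split_dataset, the fixed random seed and the integer image identifiers are assumptions, and the specification does not state how images were assigned to each part.

```python
import random

def split_dataset(items, train=0.5, test=0.4, seed=0):
    """Shuffle once, then cut into train / test / validation parts.

    Whatever remains after the train and test parts becomes the
    validation part (10% with the default fractions).
    """
    items = list(items)
    random.Random(seed).shuffle(items)   # seeded for reproducibility
    n_train = round(len(items) * train)
    n_test = round(len(items) * test)
    train_part = items[:n_train]
    test_part = items[n_train:n_train + n_test]
    val_part = items[n_train + n_test:]
    return train_part, test_part, val_part

# Hypothetical identifiers standing in for labelled scans.
image_ids = range(100)
train_ids, test_ids, val_ids = split_dataset(image_ids)
```

Shuffling before cutting keeps each part a random sample of the labelled images, so no part is biased by the order in which the source datasets were imported.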
1.3 Developing the Hybrid YOLOV5 Deep Learning Novel Model
Unlike detectors that apply a classifier to many candidate regions of an image in turn, the present novel YOLOV5 algorithm applies a single neural network to the full image. The network divides the image into regions and predicts bounding boxes and probabilities for each region. The generated bounding boxes are weighted by the predicted probabilities.
The non-max suppression technique ensures that the object detection algorithm detects each object only once by discarding redundant overlapping detections; it then outputs the recognized objects along with their bounding boxes.
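Non-max suppression can be sketched as a greedy procedure: keep the highest-scoring box, drop every remaining box that overlaps it too strongly, and repeat. The following is a minimal class-agnostic sketch, assuming boxes in `(x1, y1, x2, y2)` corner format and an overlap threshold of 0.5; neither the box format nor the threshold is stated in the specification, and practical detectors typically apply the procedure per class.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes cover the same finding.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.90, 0.75, 0.60]
kept = non_max_suppression(boxes, scores)
```

In this example the second box overlaps the first almost completely and is discarded, while the third box, which does not overlap either of them, is retained.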
YOLOV5 is a convolutional neural network that consists of at least twenty-four convolutional layers followed by at least two fully connected layers. Each layer has its own importance, and the layers are separated by their functionality, as shown in Figure 7.
— The first twenty convolutional layers, followed by an average pooling layer and a fully connected layer, are pre-trained on the ImageNet dataset, which is a 1000-class classification dataset.
— The pretraining for classification is performed on the dataset with an image resolution of at least 224 × 224 × 3.
— The layers comprise 3 × 3 convolutional layers and 1 × 1 reduction layers.
— For object detection, in the end, the last four convolutional layers followed by two fully connected layers are added to train the network.
— Object detection requires finer detail, hence the resolution of the dataset is increased to at least 448 × 448.
— Then the final layer predicts the class probabilities and bounding boxes.
— All the other convolutional layers use leaky ReLU activation whereas the final layer uses a linear activation.
— The input is a 512 × 512 (or 448 × 448) image, and the output is the class prediction of the detected object enclosed in the bounding box.
Figure 5 represents the flow chart for CheXNet and CTxNet based on the present novel hybrid deep learning model building; the detailed description is as follows:
Deep learning is a form of machine learning where the model used is a neural network with a large number of (usually convolutional) layers. Training this model requires a large amount of data for which the truth is already known. Training is usually performed by an algorithm called backpropagation. In this algorithm, the model is iteratively modified to minimize the error between predictions of the model and the known ground truth for each data point.
Before training, the dataset is separated into train, test and validation sets, each consisting of images and their annotation files.
The present novel invention provides a developed hybrid YOLOV5 CNN, which is based on multiple neural network parameters and powered by the YOLOV5 framework.
In the present novel invention, the Deep Learning Method to Diagnose Chest X-ray or CT scan Images based on the Hybrid YOLOV5 CNN is also trained to identify lung infection, lung opacity, lung volume, bone density, and rib fractures.
1.3.1 Model Architecture:
Previous versions of YOLO are trained on images supplied together with their annotations, and YOLOV5 (the present novel system is compatible with higher versions as well) uses the same mechanism. For the development of the present novel invention, YOLOV5 is used because it offers FP16 support for faster training, quantization support, and a flexible codebase. Here FP16 refers to 16-bit floating point precision, i.e., 16 bits or 2 bytes are used to store each decimal value. As most weights are long decimals, floating point precision is important in deep learning. Because of these features, the YOLOV5 framework is used to train the present model.
As shown in Figure 6, it is a novel convolutional neural network (CNN) that detects objects in real time with better accuracy. This approach uses a single neural network to process the entire picture, then separates it into parts and predicts bounding boxes and probabilities for each part. These bounding boxes are weighted by the predicted probabilities. The method “just looks once” at the image in the sense that it makes predictions after only one forward propagation pass through the neural network. It then delivers the detected items after non-max suppression (which ensures that the object detection algorithm identifies each object only once).
Its architecture consists mainly of three parts, namely:
1.3.1.1. Backbone: The model Backbone is mostly used to extract key features from an input image. CSP (Cross Stage Partial Networks) are used as the backbone in YOLOV5 to extract rich, useful features from an input image.
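The storage cost and rounding behaviour of FP16 can be checked with the Python standard library, whose `struct` module supports the IEEE 754 half-precision format (`"e"`) alongside single precision (`"f"`). This is a small illustration only; the sample value 0.1 is an arbitrary choice and is not taken from the specification.

```python
import struct

value = 0.1  # arbitrary sample "weight"

half = struct.pack("<e", value)    # IEEE 754 half precision (FP16)
single = struct.pack("<f", value)  # IEEE 754 single precision (FP32)

half_size = len(half)      # bytes used by FP16
single_size = len(single)  # bytes used by FP32

# Decode again to see how much of the value survives the rounding.
half_error = abs(struct.unpack("<e", half)[0] - value)
single_error = abs(struct.unpack("<f", single)[0] - value)
```

FP16 halves the memory per weight relative to FP32 at the cost of a larger rounding error, which is the trade-off behind its use for faster training.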
1.3.1.2. Neck: The Model Neck is mostly used to create feature pyramids. Feature pyramids aid models in generalizing successfully when it comes to object scaling. It aids in the identification of the same object in various sizes and scales.
Feature pyramids are quite beneficial in assisting models to perform effectively on previously unseen data. Techniques such as FPN, BiFPN, and PANet offer different feature pyramid approaches.
For the development of the present novel model, PANet is used as the neck in YOLOV5 to obtain feature pyramids.
1.3.1.3. Head: The model Head is mostly responsible for the final detection step. It uses anchor boxes to construct final output vectors with class probabilities, objectness scores, and bounding boxes.
In summary, the data are first input to CSPDarknet for feature extraction and then fed to PANet for feature fusion. Finally, YOLOV5 Layer outputs detection results (class, score, location, size).
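The size of the detection output can be estimated from the quantities named above: each anchor box predicts four box coordinates, one objectness score and one score per class. The sketch below assumes three output scales with strides of 8, 16 and 32 pixels, which is a common configuration but is not stated in the specification; the seventeen classes, the four anchors per output layer and the 512 × 512 input are taken from the text. The function names are hypothetical.

```python
def head_output_channels(num_classes, anchors_per_scale):
    """Values predicted per grid cell: anchors x (4 box + 1 objectness + classes)."""
    return anchors_per_scale * (5 + num_classes)


def grid_sizes(image_size, strides=(8, 16, 32)):
    """Side length of the prediction grid at each output scale (strides assumed)."""
    return [image_size // s for s in strides]


def total_predictions(image_size, anchors_per_scale, strides=(8, 16, 32)):
    """Number of candidate boxes produced before non-max suppression."""
    return sum(anchors_per_scale * g * g for g in grid_sizes(image_size, strides))


channels = head_output_channels(num_classes=17, anchors_per_scale=4)
grids = grid_sizes(512)
candidates = total_predictions(512, anchors_per_scale=4)
```

Under these assumptions a single 512 × 512 image yields many thousands of candidate boxes, which is why the confidence weighting and non-max suppression steps are needed to reduce them to a few final detections.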
For CNN Architecture, the present invention uses YOLOV5’s backbone.
Before the image enters the CNN model, it is flattened and augmented using the hyper-parameters given in the following section. After flattening, the image is passed to the input layer, which accepts a matrix of at least 512 × 512.
The layer is pre-defined with weights of YOLOV5.
The CNN layers are built from repeated (Conv layer + BottleneckCSP) blocks with sparsification (removing redundant information from the model). The number of classes is defined as seventeen, which is used in the last layer of YOLOV5.
Here, the number of classes refers to the total number of diseases or ailments that will be detected by the present model. The present invention uses the IMAGE resize variable to specify the image size when it is inserted into the model. Due to hardware limitations, the entire available data cannot be processed at once, so in the present invention it is divided into batches for processing. Epochs indicate the number of passes over the entire training dataset that the machine learning algorithm has completed. While training, set the initial learning rate to 0.01 (SGD=1E-2, Adam=1E-3) and the final one-cycle learning rate to 0.2; the momentum is taken as 0.937 with an optimizer weight decay of 0.0005. For warmup, set epochs to 3, momentum to 0.8, and initial bias learning rate to 0.1. For best results, set the box loss gain to 0.05, the class loss gain to 0.5, and the object loss gain to 1.0 to keep it in scale with pixels.
BCELoss creates a criterion that measures the binary cross entropy between the target and the output; set the class BCELoss positive weight and the object BCELoss positive weight to 1.0. Set the IoU (Intersection over Union) training threshold to 0.20, and the anchors per output layer to 4. In the present invention, focal loss gamma is set to 0.0 (the EfficientDet default is 1.5). For colour augmentation, set the hue to 0.015, saturation to 0.7, and value to 0.4. For geometric augmentation, set the rotation degrees to 0.0, translation to 0.1, scale to 0.5, and shear to 0.0. In the present invention the perspective range is set to 0-0.001. For further image augmentation, set the probability of flipping images from left to right to 50% and set the mosaic probability to 1 so that it is applied to all the images. Set the probability of flipping images up-down to 0 so that it never happens.
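For reference, the training and augmentation settings listed above can be collected in a single configuration mapping. The values are those given in the text; the key names are illustrative assumptions modelled on common YOLOV5 hyper-parameter naming and are not quoted from the specification, and the small `check_hyperparameters` helper is hypothetical.

```python
# Values are from the text above; key names are illustrative assumptions.
HYP = {
    "lr0": 0.01,              # initial learning rate (SGD)
    "lrf": 0.2,               # final one-cycle learning rate
    "momentum": 0.937,        # SGD momentum
    "weight_decay": 0.0005,   # optimizer weight decay
    "warmup_epochs": 3,
    "warmup_momentum": 0.8,
    "warmup_bias_lr": 0.1,
    "box": 0.05,              # box loss gain
    "cls": 0.5,               # class loss gain
    "obj": 1.0,               # object loss gain
    "cls_pw": 1.0,            # class BCELoss positive weight
    "obj_pw": 1.0,            # object BCELoss positive weight
    "iou_t": 0.20,            # IoU training threshold
    "anchors": 4,             # anchors per output layer
    "fl_gamma": 0.0,          # focal loss gamma
    "hsv_h": 0.015,           # hue augmentation
    "hsv_s": 0.7,             # saturation augmentation
    "hsv_v": 0.4,             # value augmentation
    "degrees": 0.0,           # rotation
    "translate": 0.1,
    "scale": 0.5,
    "shear": 0.0,
    "perspective_range": (0.0, 0.001),
    "fliplr": 0.5,            # left-right flip probability
    "flipud": 0.0,            # up-down flip probability
    "mosaic": 1.0,            # mosaic probability
}


def check_hyperparameters(hyp):
    """Basic sanity check: probabilities in [0, 1], perspective range ordered."""
    for key in ("fliplr", "flipud", "mosaic"):
        if not 0.0 <= hyp[key] <= 1.0:
            raise ValueError(f"{key} must be a probability, got {hyp[key]}")
    low, high = hyp["perspective_range"]
    if not 0.0 <= low <= high:
        raise ValueError("perspective range must be ordered and non-negative")
    return True


is_valid = check_hyperparameters(HYP)
```

Keeping all settings in one mapping makes it straightforward to log the exact configuration used for a training run and to compare runs with different settings.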
Leaky ReLU and sigmoid activation functions are used in this architecture. The Leaky ReLU activation function is used in the middle/hidden layers, and the sigmoid activation function is used in the final detection layer.
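The two activation functions can be written in a few lines. In this sketch the negative slope of 0.1 for Leaky ReLU is an assumed value, since the specification does not state the slope that is used; the sigmoid is written in a form that avoids overflow for large negative inputs.

```python
import math


def leaky_relu(x, negative_slope=0.1):
    """Pass positive inputs unchanged; scale negative inputs by a small slope."""
    return x if x >= 0 else negative_slope * x


def sigmoid(x):
    """Map any real input to (0, 1); suitable for scores and probabilities."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for very negative x
    return z / (1.0 + z)


hidden = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
scores = [sigmoid(v) for v in (-1000.0, 0.0, 1000.0)]
```

Leaky ReLU keeps a small gradient for negative inputs in the hidden layers, whereas the sigmoid bounds the outputs of the detection layer between 0 and 1 so that they can be read as objectness and class confidences.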
YOLOV5 offers two options for the optimization function: SGD and Adam. The optimization function used for training is SGD.
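A single SGD update with momentum and weight decay can be sketched as follows, using the momentum of 0.937, the weight decay of 0.0005 and the learning rate of 0.01 given earlier. This is the plain (non-Nesterov) momentum formulation, chosen here as an assumption; the specification does not state which variant is used. The toy objective f(w) = (w - 3)^2 is purely illustrative.

```python
def sgd_momentum_step(params, grads, velocity,
                      lr=0.01, momentum=0.937, weight_decay=0.0005):
    """One SGD step with momentum; weight decay is added to the gradient."""
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        g = g + weight_decay * p   # L2 weight decay folded into the gradient
        v = momentum * v + g       # accumulate a running direction
        new_params.append(p - lr * v)
        new_velocity.append(v)
    return new_params, new_velocity


# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, vel = [0.0], [0.0]
for _ in range(500):
    grad = [2.0 * (w[0] - 3.0)]
    w, vel = sgd_momentum_step(w, grad, vel)
final_w = w[0]
```

After 500 steps the parameter settles very close to 3; the small remaining offset comes from the weight decay term, which pulls the weights slightly towards zero.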
1.3.2. While training, hyperparameters evolve over time, based on accuracy.
Along with this, during training of the present model, the hyper-parameters evolve over time, based on accuracy. Note: The following values are listed in tuple form, i.e., (mutation scale 0-1, lower limit, upper limit).
The initial learning rate (for both SGD and Adam) was set to (1, 1e-5, 1e-1), whereas the final learning rate was set to (1, 0.01, 1.0). SGD momentum, optimizer weight decay, warmup epochs, warmup initial momentum and warmup initial bias learning rate were set to (0.3, 0.6, 0.98), (1, 0.0, 0.001), (1, 0.0, 5.0), (1, 0.0, 0.95) and (1, 0.0, 0.2) respectively.
As the present invention uses the YOLOV5 framework, box loss gain, cls loss gain, class BCELoss positive weight, obj loss gain (scaled with pixels) and obj BCELoss positive weight were set to (1, 0.02, 0.2), (1, 0.2, 4.0), (1, 0.5, 2.0), (1, 0.2, 4.0) and (1, 0.5, 2.0) respectively.
The Intersection over Union (IoU) training threshold for Object Detection is initialized with (0, 0.1, 0.7). Apart from this, anchor-multiple threshold and anchors per output grid (0 to ignore) are initialized to (1, 2.0, 8.0) and (2, 2.0, 10.0) respectively.
Regarding image tuning, focal loss gamma (efficientDet default gamma=1.5), image HSV-Hue augmentation (fraction), image HSV-Saturation augmentation (fraction), image HSV-Value augmentation (fraction), image rotation (+/- deg), image translation (+/- fraction), image scale (+/- gain), image shear (+/- deg) and image perspective (+/- fraction, range 0-0.001) are initialized to (0, 0.0, 2.0), (1, 0.0, 0.1), (1, 0.0, 0.9), (1, 0.0, 0.9), (1, 0.0, 45.0), (1, 0.0, 0.9), (1, 0.0, 0.9), (1, 0.0, 10.0) and (0, 0.0, 0.001) respectively.
Continuing from this, the flip, mosaic and mixup parameters are image flip up-down (probability), image flip left-right (probability), image mosaic (probability) and image mixup (probability), which are initialized with (1, 0.0, 1.0), (0, 0.0, 1.0), (1, 0.0, 1.0) and (1, 0.0, 1.0) respectively.
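The role of the (mutation scale, lower limit, upper limit) tuples can be illustrated with a simplified mutation step: each hyper-parameter is multiplied by a random factor whose spread is controlled by its mutation scale, and the result is clipped to its limits, so that a mutation scale of 0 leaves the value unchanged. This is a simplified sketch under stated assumptions, not the evolution procedure of the invention; the Gaussian perturbation, the `sigma` of 0.2, the factor bounds of 0.3 to 3.0 and the names `META` and `mutate` are illustrative choices. The tuples and starting values are taken from the text.

```python
import random

# (mutation scale, lower limit, upper limit), as listed in the text.
META = {
    "lr0": (1, 1e-5, 1e-1),
    "momentum": (0.3, 0.6, 0.98),
    "weight_decay": (1, 0.0, 0.001),
    "iou_t": (0, 0.1, 0.7),
    "fl_gamma": (0, 0.0, 2.0),
}


def mutate(hyp, meta, sigma=0.2, seed=None):
    """Return a mutated copy of hyp, clipped to the limits given in meta."""
    rng = random.Random(seed)
    child = {}
    for name, value in hyp.items():
        scale, lower, upper = meta[name]
        # A scale of 0 gives factor == 1.0, i.e. the value is not mutated.
        factor = 1.0 + scale * rng.gauss(0.0, 1.0) * sigma
        factor = min(max(factor, 0.3), 3.0)  # bound the change (assumed)
        child[name] = min(max(value * factor, lower), upper)
    return child


parent = {"lr0": 0.01, "momentum": 0.937, "weight_decay": 0.0005,
          "iou_t": 0.20, "fl_gamma": 0.0}
child = mutate(parent, META, seed=1)
```

In an evolution loop, each child would be used for a training run and retained as a parent for later generations only if it improves the chosen accuracy measure.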
The input layer and output layer of the YOLOV5 CNN architecture have been modified.
The input layer is tuned to take the specified image dimension, and the output layer is tuned to seventeen neurons, one for each of the seventeen classes.
The rest of the YOLOV5 CNN architecture is left untouched, because this arrangement works well. In addition, as the present invention tunes the hyper-parameters, the CNN structure has been tuned accordingly.
1.4 Machine Learning Algorithm Report (Results and Conclusions)
Overall, the present novel model performed well compared with the previous algorithm, which in addition was not able to locate the disease. For some diseases the AP (Average Precision) is around 70%, which leaves scope for significant further improvement. The above-mentioned work is summarized below.
The Machine Learning Algorithm named “XChesNet” is able to classify seventeen critical radiographic diseases. The present novel model is also capable of identifying whether or not the patient is in Normal Condition. The algorithm is trained on 18,000 images in total. Overall performance is shown in Figure 1. The following formulae have been used to calculate precision and sensitivity/recall. Moreover, the algorithm has been tested on a few sample images that were not used to train the model, and its output is shown in Figures 3 and 4.
Figure 3 represents - Sample outputs with marking infected area (output of the present model building, of scans with known label)
Figure 4 represents - Sample outputs with marking infected area (output of the present model building, of the clinical unknown samples)

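Precision Calculation:

\[ \text{Precision} = \frac{TP}{TP + FP} \]

Equation 1: Precision Calculation

Sensitivity or Recall Calculation:

\[ \text{Recall} = \frac{TP}{TP + FN} \]

Equation 2: Recall calculation

Here TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively.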
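Using the standard definitions, precision is the fraction of predicted findings that are correct, TP / (TP + FP), and recall (sensitivity) is the fraction of actual findings that are detected, TP / (TP + FN). The short sketch below computes both from raw counts; the counts in the example are made-up numbers for illustration and are not results of the present model.

```python
def precision(tp, fp):
    """Fraction of predicted findings that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual findings that were detected (sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
p = precision(tp=90, fp=10)
r = recall(tp=90, fn=30)
```

With these hypothetical counts, nine out of ten predicted findings are correct, while three out of four actual findings are found.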

The bounding box represents and identifies the pathology label for CT Scan or X-ray images to represent particular disease conditions as shown in figure 3 and 4.
The AP (average precision) values of the trained model for locating each disease individually in an image are listed below:
AP: Calcification: 91% ± 0.2%, Cardiomegaly: 89% ± 0.2%, Interstitial lung disease (ILD): 71% ± 0.2%, Infiltration: 85% ± 0.2%, Lung Opacity: 88% ± 0.2%, Nodule/Mass: 92% ± 0.2%, Other lesion: 85% ± 0.2%, Pleural effusion: 79% ± 0.2%, Pulmonary fibrosis: 93% ± 0.2%, Covid: 94% ± 0.2%, Edema: 93% ± 0.2%, Pneumonia: 89% ± 0.2%, Tuberculosis: 96% ± 0.2%, Aortic enlargement: 95% ± 0.2%, Atelectasis: 92% ± 0.2%, Cardiomegaly: 91% ± 0.2% and Normal: 100% ± 0.2%

The detailed process to detect chest X-Ray or CT Scan images for seventeen lung diseases based on the Hybrid YOLO Model is described below:

Step 1: Upload the X-ray or CT-Scan image in one of the given formats, i.e., a .jpg, DICOM or .png file, to the present novel system.
Step 2: The said image is then fed into the present novel system, where the algorithm first checks whether the image is an X-ray or CT-Scan by feeding it into a VGG-19 CNN. Based on the prediction of this network, if the image is not an X-ray or CT-Scan, the system rejects it and notifies the user to upload X-ray or CT-Scan images; otherwise it continues by feeding the image into the XChesNet Deep CNN.
Step 3: If the uploaded image is an X-ray or CT-Scan, the algorithm of the present novel system accepts it and sends it for the bifurcation of the disease analysis. The algorithm does not compress the image, as compression might lose features.
Step 4: After the analysis of the image, the present novel system carries out the diagnosis process, where the disease prediction takes place; it also plots bounding boxes where diseases are predicted, using square boxes of different colors. The boxes on the image show the predicted diseases and their confidence.
Step 5: After analyzing and predicting the uploaded image, the present novel system returns the results to the user with a message stating where to download the generated report in PDF format.
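The control flow of Steps 1 to 4 can be outlined as below. This is a sketch of the gating logic only: `classify_modality` and `detect_findings` are hypothetical placeholders standing in for the VGG-19 check and the XChesNet detector, the `.dcm` extension for DICOM files is an assumption, and the stub functions return made-up values so that the example runs on its own.

```python
import os

SUPPORTED_EXTENSIONS = {".jpg", ".png", ".dcm"}  # ".dcm" assumed for DICOM


def run_pipeline(path, classify_modality, detect_findings):
    """Gate an uploaded file through the format check, modality check and detector."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:  # Step 1: accepted formats
        return {"status": "rejected", "message": "Unsupported file format."}
    modality = classify_modality(path)   # Step 2: X-ray / CT-Scan check
    if modality not in ("xray", "ct"):
        return {"status": "rejected",
                "message": "Please upload an X-ray or CT-Scan image."}
    findings = detect_findings(path)     # Steps 3-4: disease detection
    return {"status": "ok", "modality": modality, "findings": findings}


# Stub models for demonstration only; they do not inspect the file.
def fake_modality(path):
    return "xray" if "chest" in path else "other"


def fake_detector(path):
    # (label, confidence, (x1, y1, x2, y2)) with made-up values
    return [("Cardiomegaly", 0.91, (120, 200, 380, 420))]


accepted = run_pipeline("chest_001.png", fake_modality, fake_detector)
wrong_image = run_pipeline("holiday.jpg", fake_modality, fake_detector)
wrong_format = run_pipeline("chest_001.txt", fake_modality, fake_detector)
```

Passing the two models in as arguments keeps the gating logic independent of any particular network, so the same flow can be exercised with stubs before the trained models are attached.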

Documents

Application Documents

# Name Date
1 202223019813-STARTUP [31-03-2022(online)].pdf 2022-03-31
2 202223019813-FORM28 [31-03-2022(online)].pdf 2022-03-31
3 202223019813-FORM-9 [31-03-2022(online)].pdf 2022-03-31
4 202223019813-FORM-26 [31-03-2022(online)].pdf 2022-03-31
5 202223019813-FORM FOR STARTUP [31-03-2022(online)].pdf 2022-03-31
6 202223019813-FORM FOR SMALL ENTITY(FORM-28) [31-03-2022(online)].pdf 2022-03-31
7 202223019813-FORM 3 [31-03-2022(online)].pdf 2022-03-31
8 202223019813-FORM 18A [31-03-2022(online)].pdf 2022-03-31
9 202223019813-FORM 1 [31-03-2022(online)].pdf 2022-03-31
10 202223019813-FIGURE OF ABSTRACT [31-03-2022(online)].jpg 2022-03-31
11 202223019813-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [31-03-2022(online)].pdf 2022-03-31
12 202223019813-ENDORSEMENT BY INVENTORS [31-03-2022(online)].pdf 2022-03-31
13 202223019813-DRAWINGS [31-03-2022(online)].pdf 2022-03-31
14 202223019813-COMPLETE SPECIFICATION [31-03-2022(online)].pdf 2022-03-31
15 Abstract.jpg 2022-04-11
16 202223019813-FER.pdf 2022-07-13
17 202223019813-FER_SER_REPLY [23-08-2022(online)].pdf 2022-08-23
18 202223019813-Request Letter-Correspondence [13-03-2023(online)].pdf 2023-03-13
19 202223019813-Power of Attorney [13-03-2023(online)].pdf 2023-03-13
20 202223019813-FORM28 [13-03-2023(online)].pdf 2023-03-13
21 202223019813-Form 1 (Submitted on date of filing) [13-03-2023(online)].pdf 2023-03-13
22 202223019813-Covering Letter [13-03-2023(online)].pdf 2023-03-13
23 202223019813-US(14)-HearingNotice-(HearingDate-26-04-2024).pdf 2024-02-21
24 202223019813-Correspondence to notify the Controller [06-04-2024(online)].pdf 2024-04-06
25 202223019813-US(14)-ExtendedHearingNotice-(HearingDate-27-02-2025)-1430.pdf 2025-02-17
26 202223019813-Correspondence to notify the Controller [21-02-2025(online)].pdf 2025-02-21
27 202223019813-Written submissions and relevant documents [10-03-2025(online)].pdf 2025-03-10
28 202223019813-PatentCertificate15-10-2025.pdf 2025-10-15
29 202223019813-IntimationOfGrant15-10-2025.pdf 2025-10-15

Search Strategy

1 202221013695E_15-06-2022.pdf