
Dual-Head Cross Attention Fusion-Based Iterative Dense RNN Architecture for Multimodal Lung Cancer Detection with Feedback-Enabled Optimization

Abstract: The present invention discloses an advanced deep learning-based system for early and accurate detection of lung cancer using multimodal imaging data, specifically CT and histopathology images. The invention utilizes a Pyramid Dilated Spatial Pyramid Pooling-based R2Unet++ (PDSPP-R2Unet++) model for precise tumor segmentation, followed by the extraction of spatial, semantic, and morphological features. These features are integrated using a dual-head cross attention fusion mechanism and classified through an Iterative Dense Recurrent Neural Network (IDRNN). To enhance performance and reduce overfitting, an Enhanced Wild Gibbon Optimization Algorithm (EWGOA) is employed to dynamically tune model parameters. The proposed architecture demonstrates superior accuracy, interpretability, and robustness compared to conventional methods, making it a clinically viable solution for early lung cancer diagnosis.


Patent Information

Application #
Filing Date
14 July 2025
Publication Number
38/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

SR University
Ananthasagar, Hasanparthy (M), Warangal Urban, Telangana 506371, India.
Mrs. Kasi Sailaja
Research Scholar, School of CS & AI, SR University, Ananthasagar, Hasanparthy (M), Warangal Urban, Telangana 506371, India.
Dr. Anurodh Kumar
Assistant Professor, School of CS & AI, SR University, Ananthasagar, Hasanparthy (M), Warangal Urban, Telangana 506371, India.

Inventors

1. Mrs. Kasi Sailaja
Research Scholar, School of CS & AI, SR University, Ananthasagar, Hasanparthy (M), Warangal Urban, Telangana 506371, India.
2. Dr. Anurodh Kumar
Assistant Professor, School of CS & AI, SR University, Ananthasagar, Hasanparthy (M), Warangal Urban, Telangana 506371, India.

Specification

Description: The present invention relates to the field of artificial intelligence (AI) and medical imaging, specifically to a multimodal lung cancer detection system using deep learning architectures. More particularly, it concerns a novel dual-head cross attention fusion-based iterative dense recurrent neural network that integrates CT and histopathology images with an advanced optimization algorithm to achieve high-accuracy lung cancer detection and classification.
BACKGROUND OF THE INVENTION
The following description of related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section should be used only to enhance the reader's understanding of the present disclosure, and not as an admission of prior art.

Lung cancer is one of the deadliest and most prevalent cancers worldwide, responsible for a significant percentage of cancer-related deaths. Early diagnosis of lung cancer is crucial, as it increases the likelihood of successful treatment and improves patient survival rates. However, current diagnostic practices often detect the disease at advanced stages, limiting treatment options and reducing effectiveness. Despite the availability of imaging tools like X-rays, CT scans, PET, and biopsies, the process is often invasive, expensive, and prone to delays and inaccuracies.

Traditional imaging techniques such as CT and PET scans can be useful for detecting lung nodules, but they suffer from limitations in terms of resolution, contrast, and diagnostic certainty. PET scans have lower spatial resolution and often produce blurred tissue boundaries, while CT scans, despite higher resolution, lack the functional metabolic information needed to confirm malignancy. Moreover, manual interpretation of these images is time-consuming and prone to error, especially in identifying small or complex tumor regions.

Deep learning has emerged as a transformative tool in medical image analysis, especially for cancer detection. Convolutional Neural Networks (CNNs) have shown promising results by automatically learning hierarchical image features. However, they often lack the ability to capture long-range dependencies due to their limited receptive fields. Vision Transformers (ViTs) offer an alternative by introducing attention mechanisms for better global context understanding, although they require large datasets and extensive training.

Despite these advances, existing methods struggle with effectively integrating multimodal data, such as combining structural information from CT scans with cellular details from histopathology images. Many approaches either ignore one of the modalities or apply simplistic fusion techniques that do not fully capture the complementary nature of the data. Moreover, most models suffer from challenges such as overfitting, lack of interpretability, and high computational complexity, which hinders their real-world applicability.

The segmentation of tumor regions is another critical step in lung cancer detection. Popular methods like U-Net fail to capture multi-scale contextual information, especially in heterogeneous or irregular tumor regions. They often misclassify complex structures due to inadequate feature learning and poor generalization across different datasets. Additionally, segmentation methods often operate independently of the classification stage, resulting in disjointed pipelines that reduce overall accuracy.

Furthermore, many current fusion models do not employ attention mechanisms to prioritize critical spatial, semantic, and morphological features. This leads to information loss, especially when integrating features from multiple imaging modalities. Without effective optimization strategies, models tend to underperform on low-quality or low-contrast images, missing key indicators of early-stage malignancy.

OBJECTIVE OF THE INVENTION

Some of the objects of the present disclosure, which at least one embodiment herein satisfies, are listed below.

The primary objective of the present invention is to design a robust and accurate system for early detection and classification of lung cancer using multimodal data integration. By combining CT and histopathology images, the invention aims to capture both macroscopic and microscopic features of tumors, providing a comprehensive diagnostic solution.

Another objective is to enhance the segmentation of tumor regions by using the Pyramid Dilated Spatial Pyramid Pooling-based R2Unet++ (PDSPP-R2Unet++) model. This model is designed to overcome limitations of existing segmentation networks by capturing multi-scale contextual information, thereby improving the accuracy of lesion boundary detection.

Another objective is to extract spatial, semantic, and morphological features from both CT and histopathology images and fuse them using a multi-head attention mechanism. This process ensures that critical features from both modalities are preserved and prioritized for improved classification performance.

The invention also seeks to incorporate a novel Dual-Head Cross Attention Fusion-based Iterative Dense Recurrent Neural Network (DCAF-IDRNN) for robust and interpretable lung cancer classification. The model's architecture is designed to learn inter-modality relationships and temporal dependencies across feature representations.

Another objective is to implement a feedback-enabled optimization strategy using an Enhanced Wild Gibbon Optimization Algorithm (EWGOA). This algorithm dynamically adjusts model parameters, reducing overfitting and enhancing generalization across different datasets and image qualities.

The invention also aims to reduce computational complexity while maintaining high accuracy and interpretability. The proposed model is lightweight and scalable, making it suitable for deployment in real-time clinical settings without requiring high-end infrastructure.

The invention aspires to establish a comprehensive evaluation protocol using standard positive (e.g., accuracy, sensitivity, specificity) and negative (e.g., false positive rate) performance measures. This ensures that the model’s effectiveness is rigorously validated, paving the way for regulatory compliance and clinical adoption.
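The positive and negative measures named above all follow directly from the counts of a binary confusion matrix. As an illustrative, framework-free sketch (the counts used in the example are hypothetical and are not results of the invention):

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Standard positive and negative performance measures from a confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    fpr = fp / (fp + tn)                  # false positive rate = 1 - specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "fpr": fpr, "f1": f1}

# Hypothetical counts for illustration only.
m = evaluation_metrics(tp=90, fp=5, tn=95, fn=10)
```

Note that the false positive rate is the complement of specificity, which is why the two are reported as a positive/negative pair.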
SUMMARY OF THE INVENTION
This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.

The present invention introduces an intelligent multimodal lung cancer detection framework that integrates CT and histopathology imaging data. The process begins with segmentation using the PDSPP-R2Unet++ model, followed by the extraction of spatial, semantic, and morphological features from both image types. These features are then fused via multi-head attention and further refined using a dual-head cross attention fusion module, enabling deep inter-modal correlation learning.

The fused features are processed by an iterative dense recurrent neural network (DCAF-IDRNN) for final classification, while an Enhanced Wild Gibbon Optimization Algorithm (EWGOA) dynamically tunes the model's parameters for improved performance. The architecture outperforms existing systems by addressing overfitting, enhancing interpretability, and offering a reliable, scalable, and clinically viable solution for early lung cancer detection.

BRIEF DESCRIPTION OF DRAWINGS
The accompanying drawings, which are incorporated herein and constitute a part of this invention, illustrate exemplary embodiments of the disclosed methods and systems, in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present invention. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that the inclusion of such drawings includes the inclusion of electrical components, electronic components, or circuitry commonly used to implement such components.

FIG. 1 illustrates an exemplary structural description of the proposed lung cancer detection model, in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION OF THE INVENTION

In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.

The ensuing description provides exemplary embodiments only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.

Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.

Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.

The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.

Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

The present invention discloses a novel deep learning architecture for the early detection and classification of lung cancer using multimodal imaging data, specifically computed tomography (CT) and histopathology images. The proposed pipeline is designed to extract, fuse, and interpret high-dimensional features across both modalities using attention mechanisms and iterative recurrent layers. The method introduces a systematic approach starting from segmentation to optimization, ensuring precision and clinical relevance.

Initially, both CT and histopathology images undergo a preprocessing step to normalize intensity and remove artifacts. These preprocessed images are then passed through a Pyramid Dilated Spatial Pyramid Pooling-based R2Unet++ (PDSPP-R2Unet++) segmentation network. This network enhances lesion boundary detection by capturing multi-scale contextual information through dilated convolutions and spatial pyramid pooling layers. The segmentation results in delineated tumor regions that are further used for feature extraction.
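The two core operations named above, dilated convolution and spatial pyramid pooling, can be illustrated with a minimal NumPy sketch. The kernel, dilation rate, and pyramid levels below are illustrative assumptions, not the actual configuration of PDSPP-R2Unet++:

```python
import numpy as np

def dilated_conv2d(image, kernel, dilation=1):
    """2D 'valid' convolution with a dilated kernel (single channel, illustrative)."""
    kh, kw = kernel.shape
    # The effective receptive field grows with dilation without adding parameters.
    eh, ew = (kh - 1) * dilation + 1, (kw - 1) * dilation + 1
    H, W = image.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a feature map over several grid resolutions and concatenate."""
    H, W = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                block = feature_map[i * H // n:(i + 1) * H // n,
                                    j * W // n:(j + 1) * W // n]
                pooled.append(block.max())
    return np.array(pooled)  # fixed-length vector: 1 + 4 + 16 = 21 values

img = np.random.rand(32, 32)
k = np.ones((3, 3)) / 9.0
feat = dilated_conv2d(img, k, dilation=2)  # effective 5x5 kernel -> 28x28 output
vec = spatial_pyramid_pool(feat)
```

The pyramid output has a fixed length regardless of input size, which is what lets multi-scale context feed a fixed-width downstream stage.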

From the segmented tumor regions, three types of features are extracted: spatial, semantic, and morphological. Spatial features are captured using Convolutional Neural Networks (CNNs), semantic features represent the contextual meaning and location of structures, while morphological features include shape, size, and contour characteristics of the tumor. These features are independently extracted from both CT and histopathology modalities to ensure modality-specific richness.
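Of the three feature streams, the morphological one is the most self-contained: descriptors such as area, perimeter, and compactness follow directly from a binary tumor mask. A minimal sketch, where the compactness formula is a standard shape descriptor assumed for illustration rather than taken from the specification:

```python
import numpy as np

def morphological_features(mask):
    """Compute area, perimeter, and compactness from a binary tumor mask."""
    area = int(mask.sum())
    # Perimeter: foreground pixels with at least one 4-connected background neighbour.
    padded = np.pad(mask, 1)
    neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                  padded[1:-1, :-2] + padded[1:-1, 2:])
    boundary = (mask == 1) & (neighbours < 4)
    perimeter = int(boundary.sum())
    # Isoperimetric compactness: ~1 for a disc, larger for irregular contours.
    compactness = perimeter ** 2 / (4 * np.pi * area) if area else 0.0
    return {"area": area, "perimeter": perimeter, "compactness": compactness}

# A filled 5x5 square region inside a 9x9 grid.
mask = np.zeros((9, 9), dtype=int)
mask[2:7, 2:7] = 1
feats = morphological_features(mask)
```

Irregular, spiculated lesion contours raise the compactness score relative to smooth ones, which is why such descriptors carry diagnostic signal.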

To integrate these diverse features, a Multihead Attention Fusion mechanism is applied separately to each modality. This mechanism enables the model to assign appropriate weights to different feature types based on their importance in cancer detection. Following intra-modality fusion, a Dual-Head Cross Attention Fusion (DCAF) module is used to merge the feature vectors of CT and histopathology images. This cross-modal fusion captures correlations between modalities and generates a comprehensive feature representation for final classification.
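The dual-head cross-attention idea, one head with CT features as queries over histopathology features and a second head in the reverse direction, can be sketched with plain scaled dot-product attention. Token counts, dimensions, and mean-pooling below are illustrative assumptions, not the claimed architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats):
    """Scaled dot-product attention: queries from one modality attend to the other."""
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)          # each query row sums to 1
    return weights @ context_feats

def dual_head_cross_fusion(ct_feats, histo_feats):
    """Two heads: CT attends to histopathology and vice versa; outputs concatenated."""
    ct_enriched = cross_attention(ct_feats, histo_feats)     # head 1: CT as queries
    histo_enriched = cross_attention(histo_feats, ct_feats)  # head 2: histo as queries
    return np.concatenate([ct_enriched.mean(axis=0),
                           histo_enriched.mean(axis=0)])     # pooled joint vector

rng = np.random.default_rng(0)
ct = rng.standard_normal((6, 16))     # 6 CT feature tokens, dim 16
histo = rng.standard_normal((8, 16))  # 8 histopathology tokens, dim 16
fused = dual_head_cross_fusion(ct, histo)  # shape (32,)
```

Running both attention directions is what makes the interaction bi-directional: each modality's representation is re-expressed in terms of the other's.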

The fused multimodal feature vector is passed through an Iterative Dense Recurrent Neural Network (IDRNN) which models long-range dependencies across the fused data and enables robust temporal learning. This network is designed to refine predictions by iteratively adjusting neuron activations across multiple dense layers, reducing the noise and improving classification confidence. The classification output includes cancer presence, type, and severity.
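The iterative refinement behaviour described above can be sketched as a recurrent update that re-injects the fused feature vector (a dense skip connection) at every step before a sigmoid head produces a probability. The weight shapes and step count are illustrative assumptions, not the claimed IDRNN:

```python
import numpy as np

def iterative_dense_refine(fused_vec, W_hh, W_xh, w_out, steps=5):
    """Iteratively refine a hidden state from the fused multimodal vector.

    Each step recurs on the hidden state and re-injects the fused input
    (dense skip); a sigmoid head then yields a class probability.
    """
    h = np.zeros(W_hh.shape[0])
    for _ in range(steps):
        h = np.tanh(W_hh @ h + W_xh @ fused_vec)  # recurrent + dense update
    logit = w_out @ h
    return 1.0 / (1.0 + np.exp(-logit))           # probability of malignancy

rng = np.random.default_rng(1)
fused = rng.standard_normal(32)                   # fused multimodal feature vector
W_hh = rng.standard_normal((16, 16)) * 0.1        # recurrent weights
W_xh = rng.standard_normal((16, 32)) * 0.1        # input (dense skip) weights
w_out = rng.standard_normal(16)                   # classification head
p = iterative_dense_refine(fused, W_hh, W_xh, w_out)
```

Repeated passes over the same input let later iterations smooth out activations driven by noise in any single pass, which is the intuition behind the improved classification confidence.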

To optimize the model's learning process, an Enhanced Wild Gibbon Optimization Algorithm (EWGOA) is introduced. This metaheuristic algorithm fine-tunes key parameters such as learning rate, dropout ratio, and weight initializations. The feedback mechanism within EWGOA iteratively minimizes classification error by simulating intelligent gibbon behavior during exploration and exploitation phases. This ensures improved convergence, reduced overfitting, and better generalization across diverse datasets.
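EWGOA is not specified here in algorithmic detail, but a feedback-driven population search in its spirit, exploitation swings toward the best-known candidate plus occasional exploratory leaps, can be sketched as follows. The update rules, rates, and toy objective are assumptions for illustration only:

```python
import numpy as np

def gibbon_style_search(loss_fn, bounds, pop=12, iters=40, seed=0):
    """Generic population search in the spirit of EWGOA (illustrative only).

    Candidates 'swing' toward the best-known solution (exploitation) with
    occasional random leaps (exploration); loss feedback gates every move.
    """
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds).T
    X = rng.uniform(low, high, size=(pop, len(bounds)))
    fitness = np.array([loss_fn(x) for x in X])
    best = X[fitness.argmin()].copy()
    for t in range(iters):
        step = 0.9 * (1 - t / iters)               # shrink step size over time
        for i in range(pop):
            if rng.random() < 0.2:                 # exploration: random leap
                cand = rng.uniform(low, high)
            else:                                  # exploitation: swing toward best
                cand = X[i] + step * (best - X[i]) + rng.normal(0, 0.05, len(bounds))
            cand = np.clip(cand, low, high)
            f = loss_fn(cand)
            if f < fitness[i]:                     # feedback: keep only improvements
                X[i], fitness[i] = cand, f
        best = X[fitness.argmin()].copy()
    return best, fitness.min()

# Toy stand-in for validation loss over (learning_rate, dropout).
loss = lambda x: (x[0] - 0.01) ** 2 + (x[1] - 0.3) ** 2
best, best_loss = gibbon_style_search(loss, bounds=[(1e-4, 0.1), (0.0, 0.8)])
```

In the invention the objective would be validation error reported back from the classifier each epoch, rather than the closed-form toy function used here.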

In one embodiment, CT and histopathology images are segmented independently using the PDSPP-R2Unet++ model. The segmentation network applies spatial pyramid pooling and dilated convolutions to capture multi-resolution information. This setup allows for precise delineation of heterogeneous tumor regions in both modalities. Once segmented, three streams of feature extraction occur: CNN layers extract spatial texture information, semantic features are derived via encoder-decoder pathways, and morphological properties such as perimeter and shape compactness are computed through geometric analysis.

This embodiment addresses the limitations of traditional U-Net and CNN models that fail to recognize complex tumor boundaries or miss high-level contextual features. By using PDSPP-R2Unet++ for both modalities, the system ensures consistency in region-of-interest identification, critical for reliable multimodal integration.

In the second embodiment, the extracted features from both modalities undergo attention-based fusion. Initially, multi-head self-attention is employed within each modality to identify the most significant intra-modal features. This allows the model to focus on spatially and contextually important regions. Afterward, a Dual-Head Cross Attention Fusion module is implemented, wherein features from CT images are used as queries while histopathology features act as keys and values, and vice versa, allowing bi-directional interaction.

This architecture effectively learns deep inter-modality relationships, which are particularly important when features such as vascular spread in CT and cellular anomalies in histopathology are jointly considered. The cross-attention mechanism ensures that modality-specific limitations are compensated by complementary strengths from the other modality.

In yet another embodiment, the system incorporates a feedback-enabled optimization module using Enhanced Wild Gibbon Optimization Algorithm (EWGOA) to refine model parameters. EWGOA mimics the intelligent foraging behaviour of wild gibbons to explore the parameter space efficiently. It dynamically updates the hyperparameters during training, such as the learning rate, number of hidden units, and dropout values, to prevent overfitting and ensure smooth convergence.

This optimization module includes a feedback loop that monitors classification error, precision, and F1-score during each training epoch. Based on this feedback, the algorithm adjusts search paths in the solution space to locate globally optimal configurations. As a result, the system demonstrates strong generalization on unseen data and performs well on low-contrast, noisy, or partially labelled medical images.

Moreover, the output of the model is not limited to binary classification. It can include probabilistic interpretation, histological subtype classification, and even provide heatmaps using Grad-CAM for clinical interpretability. This embodiment ensures that the system is not only technically accurate but also user-friendly and clinically actionable.

While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.

We claim:

1. A system for multimodal lung cancer detection and classification, comprising:
• a segmentation module configured to process computed tomography (CT) and histopathology images using a Pyramid Dilated Spatial Pyramid Pooling-based R2Unet++ (PDSPP-R2Unet++) model for identifying tumor regions;
• a feature extraction module configured to derive spatial, semantic, and morphological features from the segmented images;
• a fusion module comprising multi-head attention mechanisms configured to perform intra-modality and inter-modality feature fusion using a Dual-Head Cross Attention Fusion structure;
• a classification module comprising an Iterative Dense Recurrent Neural Network (IDRNN) configured to process the fused features and output a diagnostic classification; and
• an optimization module configured to dynamically tune one or more hyperparameters of the classification module using an Enhanced Wild Gibbon Optimization Algorithm (EWGOA),
• wherein the system is operable to detect lung cancer with improved accuracy, robustness, and interpretability.

2. The system of claim 1, wherein the segmentation module uses dilated convolutions and spatial pyramid pooling to capture multi-scale contextual information from the input images.
3. The system of claim 1, wherein the feature extraction module employs convolutional neural networks (CNNs) to extract spatial features and computes morphological features based on the shape, size, and texture of tumor boundaries.
4. The system of claim 1, wherein the fusion module applies multi-head self-attention within each modality and dual-head cross attention across CT and histopathology modalities to enhance feature correlation.
5. The system of claim 1, wherein the classification module processes the fused features in an iterative manner to capture long-range dependencies and temporal relationships among the features.
6. The system of claim 1, wherein the optimization module adjusts learning rate, dropout ratio, and neuron weights of the IDRNN during training using population-based exploration and exploitation strategies of EWGOA.
7. The system of claim 1, wherein the system generates output comprising cancer type classification selected from the group consisting of benign, malignant, or indeterminate categories.
8. The system of claim 1, wherein the classification output includes probability scores and visual interpretability maps generated using gradient-based class activation mapping (Grad-CAM).
9. The system of claim 1, wherein the system performance is evaluated using at least one metric selected from the group consisting of accuracy, sensitivity, specificity, precision, F1-score, negative predictive value (NPV), and Matthews correlation coefficient (MCC).

Documents

Application Documents

# Name Date
1 202541067119-STATEMENT OF UNDERTAKING (FORM 3) [14-07-2025(online)].pdf 2025-07-14
2 202541067119-REQUEST FOR EARLY PUBLICATION(FORM-9) [14-07-2025(online)].pdf 2025-07-14
3 202541067119-FORM-9 [14-07-2025(online)].pdf 2025-07-14
4 202541067119-FORM 1 [14-07-2025(online)].pdf 2025-07-14
5 202541067119-DRAWINGS [14-07-2025(online)].pdf 2025-07-14
6 202541067119-DECLARATION OF INVENTORSHIP (FORM 5) [14-07-2025(online)].pdf 2025-07-14
7 202541067119-COMPLETE SPECIFICATION [14-07-2025(online)].pdf 2025-07-14
8 202541067119-FORM-26 [16-09-2025(online)].pdf 2025-09-16