A Cardiac MRI Segmentation System Using Dual Attention Transformer U-Net (Cardio DA-TUNet)

Abstract: The invention discloses a system and method for automated segmentation of cardiac structures from cardiac magnetic resonance imaging (MRI) using a novel architecture termed Cardio DA-TUNet. The system comprises a preprocessing module for normalization and augmentation, an encoder integrating convolutional layers with a transformer encoder block for local and global feature extraction, and a decoder with up-sampling layers for segmentation reconstruction. Dual attention modules combining channel attention and spatial attention are incorporated into skip connections to refine features and enhance boundary localization. Hierarchical feature fusion preserves multi-scale representations, improving accuracy in segmenting the left ventricle, right ventricle, and myocardium. The method involves preprocessing, feature extraction, attention-based refinement, and segmentation map generation. The invention provides robust and accurate segmentation across diverse datasets and imaging conditions, addressing limitations of CNN-only or transformer-only models. It enhances consistency, interpretability, and clinical applicability in cardiovascular diagnosis and treatment planning.

Patent Information

Application #

Filing Date

22 September 2025

Publication Number

43/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

SR UNIVERSITY

ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Inventors

1. D RAJANI

DEPT. OF ECE, SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

2. SREEDHAR KOLLEM

DEPT. OF ECE, SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Claims

1. A system for automated cardiac MRI segmentation, comprising:
• a preprocessing module configured to normalize and augment cardiac MRI images;
• an encoder comprising convolutional layers and a transformer encoder block for extracting local features and modeling global dependencies;
• a decoder configured with up-sampling layers for reconstructing segmentation maps;
• dual attention modules comprising channel attention and spatial attention, integrated into skip connections for feature refinement;
• a hierarchical feature fusion mechanism combining multi-scale representations; and
• an output module configured to generate segmentation masks of left ventricle, right ventricle, and myocardium.

2. The system as claimed in claim 1, wherein the preprocessing module applies normalization, denoising, and elastic deformation for training augmentation.

3. The system as claimed in claim 1, wherein the transformer encoder employs self-attention to capture long-range spatial dependencies.

4. The system as claimed in claim 1, wherein the dual attention modules enhance boundary localization of cardiac structures by emphasizing relevant channels and regions.

5. The system as claimed in claim 1, wherein the hierarchical feature fusion combines semantic and edge-level features for accurate segmentation.

6. A method for automated cardiac MRI segmentation, comprising the steps of:
• preprocessing cardiac MRI images by applying normalization and augmentation;
• extracting local and global features through convolutional layers and a transformer encoder block;
• segmenting cardiac structures using a decoder with up-sampling layers and skip connections;
• refining segmentation through dual attention modules comprising channel and spatial attention;
• fusing hierarchical features from multiple scales; and
• generating segmentation masks of the left ventricle, right ventricle, and myocardium.

7. The method as claimed in claim 6, wherein the preprocessing step includes elastic deformation, rotation, and intensity scaling.

8. The method as claimed in claim 6, wherein the transformer encoder captures anatomical dependencies across distant regions of MRI slices.

9. The method as claimed in claim 6, wherein the dual attention modules improve accuracy of boundary delineation in low-contrast regions.

10. The method as claimed in claim 6, wherein the method operates in 2D, 2.5D, or 3D modes depending on dataset and clinical requirements.

Specification

Description:
FIELD OF THE INVENTION
The present invention relates to medical imaging and artificial intelligence. More particularly, it relates to a deep learning-based system and method for automated segmentation of cardiac structures from cardiac magnetic resonance imaging (MRI). Specifically, it introduces a hybrid architecture combining dual attention mechanisms and transformer-based encoding within a U-Net framework to achieve accurate and robust segmentation of the left ventricle, right ventricle, and myocardium.
BACKGROUND OF THE INVENTION
Accurate segmentation of cardiac structures, i.e., the left ventricle (LV), right ventricle (RV), and myocardium (MYO), from cardiac magnetic resonance imaging (MRI) is vital to clinical diagnosis and treatment planning for cardiovascular diseases. Traditional convolutional neural network (CNN) based models are limited in their ability to capture global contextual dependencies and multi-scale spatial patterns, especially in images with low contrast, motion artifacts, and anatomical differences between patients.
Current segmentation approaches either use solely convolutional encoders, which are incapable of modelling long-range dependencies, or use transformer architectures without incorporating local feature attention, resulting in poor performance in identifying fine structural boundaries. In addition, most models suffer from a lack of generalizability and robustness to different cardiac MRI datasets.
Hence, an innovative deep learning model that seamlessly integrates global attention mechanisms with local feature enhancement is needed to attain robust, accurate, and consistent cardiac structure segmentation from MRI images.
US8494236B2: A method for cardiac segmentation in magnetic resonance (MR) cine data, includes providing a time series of 3D cardiac MR images acquired at a plurality of phases over at least one cardiac cycle, in which each 3D image includes a plurality of 2D slices, and a heart and blood pool has been detected in each image. Gray scales of each image are analyzed to compute histograms of the blood pool and myocardium. Non-rigid registration deformation fields are calculated to register a selected image slice with corresponding slices in each phase. Endocardium and epicardium gradients are calculated for one phase of the selected image slice. Contours for the endocardium and epicardium are computed from the gradients in the one phase, and the endocardium and epicardium contours are recovered in all phases of the selected image slice. The recovered endocardium and epicardium contours segment the heart in the selected image slice.
US9412044B2: A method (10) for respiratory motion compensation by applying principal component analysis (PCA) on cardiac imaging samples obtained using 2D/3D registration of a pre-operative 3D segmentation of the coronary arteries.
Existing convolutional neural networks and U-Net based models are constrained by limited receptive fields, preventing them from fully capturing global dependencies in cardiac MRI images. Attention-based U-Nets enhance focus on relevant structures but typically employ either spatial or channel attention in isolation, which reduces their ability to distinguish fine boundaries in complex cardiac anatomy. Transformer-based methods capture long-range dependencies but often neglect local refinement, performing poorly on small or ambiguous structures. Furthermore, many existing solutions lack robustness to inter-patient variability, imaging artifacts, and low-contrast regions. The present invention overcomes these limitations by combining transformer encoding with dual attention modules in a unified segmentation architecture that ensures both global contextual modeling and local feature refinement.
SUMMARY OF THE INVENTION
This summary is provided to introduce a selection of concepts, in a simplified format, that are further described in the detailed description of the invention.
This summary is neither intended to identify key or essential inventive concepts of the invention nor intended to determine the scope of the invention.
The invention provides an automated segmentation framework called Cardio DA-TUNet that combines convolutional, attention, and transformer-based modules for precise delineation of cardiac structures. The framework utilizes a transformer encoder to model global dependencies, while dual attention modules—channel attention and spatial attention—refine features at local levels. These modules are integrated into the encoder-decoder pipeline with skip connections, ensuring preservation of multi-scale features.
The system preprocesses cardiac MRI images for normalization and augmentation before segmentation. Hierarchical feature fusion ensures both edge-level and semantic-level representations are maintained. Training is optimized using a hybrid loss function combining Dice loss and cross-entropy loss, improving accuracy in challenging datasets.
The method is adaptable for 2D, 2.5D, or 3D implementations, making it suitable for both slice-wise and volumetric segmentation. It generalizes across multiple datasets, including ACDC, Sunnybrook, and M&Ms, and is robust to anatomical variations and imaging conditions.
The novelty lies in combining dual attention mechanisms with a transformer encoder in a U-Net-based framework, providing significant improvement in segmentation accuracy, consistency, and robustness compared to existing CNN-only or transformer-only architectures.
To further clarify the advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
The present invention introduces a new deep learning architecture for automatically segmenting cardiac structures, specifically the left ventricle (LV), right ventricle (RV), and myocardium (MYO) from cardiac magnetic resonance imaging (CMRI) data. This model, called Cardio DA-TUNet (Dual Attention Transformer U-Net), combines the benefits of dual attention methods and transformer-based encoding within a single U-Net-inspired framework.
BRIEF DESCRIPTION OF THE DRAWINGS
The illustrated embodiments of the subject matter will be understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and methods that are consistent with the subject matter as claimed herein, wherein:
FIGURE 1: SYSTEM ARCHITECTURE
The figures depict embodiments of the present subject matter for the purposes of illustration only. A person skilled in the art will easily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION OF THE INVENTION
Various exemplary embodiments of the disclosure are described in detail herein with reference to the accompanying drawings. It should be noted that the embodiments are described in such detail as to clearly communicate the disclosure. However, the amount of detail provided herein is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
It is also to be understood that various arrangements may be devised that, although not explicitly described or shown herein, embody the principles of the present disclosure. Moreover, all statements herein reciting principles, aspects, and embodiments of the present disclosure, as well as specific examples, are intended to encompass equivalents thereof.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
In addition, the descriptions of “first”, “second”, “third”, and the like in the present invention are used for the purpose of description only, and are not to be construed as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. Thus, features defining “first” and “second” may include at least one of the features, either explicitly or implicitly.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Architectural Overview
The invention features an encoder-decoder structure improved with these key innovations:
• Dual Attention Modules: Placed in the skip connections and decoder stages, these modules have both spatial attention and channel attention branches. Channel attention, such as Squeeze-and-Excitation, models relationships between channels, which strengthens the features that distinguish different modalities. Spatial attention focuses on important anatomical areas, suppresses background noise, and helps locate cardiac boundaries more precisely.
• Transformer Encoder Block: Placed after the initial convolutional layers, the transformer encoder captures global context among positions in the CMRI slices. The model uses self-attention, which lets the network learn long-distance interactions among pixels. This is vital in situations with weak boundary contrast or large anatomical variation.
• Hierarchical Feature Fusion: Features from different resolutions in the encoder are combined through up-sampling paths and attention-modulated skip connections. This keeps both low-level edge features and high-level semantic representations intact.
• Optimization and Training: The architecture is trained with a composite loss function, such as a combination of Dice loss and cross-entropy loss, to improve segmentation accuracy. Optional data augmentation, including elastic deformation and intensity changes, is used to improve generalization across various CMRI datasets.
• Deployment Flexibility: The model supports 2D and 2.5D implementations for slice-wise segmentation, and can be extended to 3D volume processing. It is compatible with industry-standard CMRI datasets (e.g., ACDC, M&Ms, Sunnybrook) and DICOM-formatted clinical data.
The proposed invention presents a hybrid deep learning architecture that combines dual attention mechanisms (channel and spatial attention) with a transformer-based encoder within a U-Net-like segmentation framework. It is specifically optimized for cardiac MRI (CMRI) segmentation.
The invention introduces a deep learning system for cardiac MRI segmentation, designed with an encoder-decoder structure inspired by the U-Net architecture.
The encoder initially applies convolutional layers to extract low-level features from MRI inputs. A transformer encoder block follows, applying self-attention mechanisms to model global dependencies across the image. This enables the system to capture long-range spatial relationships essential for accurate cardiac segmentation.
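By way of illustration only, the self-attention step of such a transformer encoder block may be sketched as follows. This is a minimal single-head sketch in NumPy, not the claimed implementation: the feature-map size, the random projection matrices `w_q`, `w_k`, `w_v`, and the function name `self_attention` are assumptions introduced for the example, and a practical block would additionally include multiple heads, positional encoding, normalization, and a feed-forward layer.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over an (N, D) token matrix."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (N, N) pairwise affinities
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ v, weights

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((16, 8, 8))       # hypothetical (C, H, W) conv output
c, h, w = feature_map.shape
tokens = feature_map.reshape(c, h * w).T            # (H*W, C): one token per position
# Hypothetical projection weights; in a trained model these are learned.
w_q, w_k, w_v = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))

attended, weights = self_attention(tokens, w_q, w_k, w_v)
attended_map = attended.T.reshape(c, h, w)          # back to (C, H, W)
```

Each row of `weights` relates one spatial position to every other position in the slice, which is how distant regions can influence one another regardless of the convolutional receptive field.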
The decoder reconstructs the segmented image using up-sampling layers. Skip connections between encoder and decoder stages preserve intermediate representations. These skip pathways are augmented with dual attention modules comprising both channel attention and spatial attention. Channel attention emphasizes inter-channel feature relevance, while spatial attention highlights important regions within the MRI slices, improving boundary localization of cardiac structures.
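As a non-limiting sketch, a dual attention module applied to one skip-connection feature map may look like the following. The channel branch follows the Squeeze-and-Excitation pattern named above. The spatial branch, the sequential ordering (channel first, then spatial), the reduction ratio, and all weights and function names are assumptions made for the example; in particular, the spatial gate here mixes channel-wise mean and max maps with two scalars rather than a learned convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """SE-style gate: global average pool -> bottleneck MLP -> per-channel weight."""
    squeezed = x.mean(axis=(1, 2))                  # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)         # ReLU bottleneck, (C // r,)
    gate = sigmoid(w2 @ hidden)                     # (C,), each value in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x, w_avg=1.0, w_max=1.0):
    """Per-pixel gate built from the channel-wise mean and max maps."""
    gate = sigmoid(w_avg * x.mean(axis=0) + w_max * x.max(axis=0))  # (H, W)
    return x * gate[None, :, :]

def dual_attention(skip, w1, w2):
    """Assumed ordering: channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(skip, w1, w2))

rng = np.random.default_rng(0)
skip = rng.standard_normal((8, 16, 16))             # hypothetical (C, H, W) skip features
w1 = rng.standard_normal((2, 8)) * 0.1              # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((8, 2)) * 0.1
refined = dual_attention(skip, w1, w2)
```

Because both gates lie strictly between 0 and 1, the module can only attenuate features; the refined map keeps the shape of the skip features and can be concatenated with the decoder path unchanged.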
The architecture supports hierarchical feature fusion, combining features from multiple scales to ensure both fine detail and semantic information are preserved during reconstruction. This is particularly beneficial for segmenting thin myocardial walls and differentiating between left and right ventricles.
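One simple way to picture the fusion of features from multiple scales is up-sampling followed by channel-wise concatenation, as sketched below. The use of nearest-neighbour up-sampling and concatenation, the three feature-map sizes, and the helper names `upsample_nearest` and `fuse` are assumptions for illustration; the specification does not fix the fusion operator, and a practical decoder would typically apply a convolution after each concatenation.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour up-sampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse(coarse, skip):
    """Up-sample the coarse map to the skip resolution and concatenate on channels."""
    factor = skip.shape[1] // coarse.shape[1]
    return np.concatenate([upsample_nearest(coarse, factor), skip], axis=0)

rng = np.random.default_rng(0)
deep = rng.standard_normal((32, 4, 4))       # semantic, low-resolution features
mid = rng.standard_normal((16, 8, 8))
shallow = rng.standard_normal((8, 16, 16))   # edge-level, high-resolution features

fused_mid = fuse(deep, mid)                  # shape (48, 8, 8)
fused_full = fuse(fused_mid, shallow)        # shape (56, 16, 16)
```

The fused tensor at full resolution carries the shallow edge-level channels unchanged alongside the up-sampled semantic channels, which is the property relied on for thin structures such as the myocardial wall.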
To improve model generalization, the system incorporates data augmentation methods, including elastic deformation, rotation, and intensity scaling. Preprocessing also includes normalization and denoising to enhance contrast in low-visibility regions.
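The normalization, rotation, and intensity-scaling steps may be sketched as below. This is a simplified example under stated assumptions: z-score normalization is one possible normalization, rotations are restricted to multiples of 90 degrees so that no interpolation is needed, the scaling range of 0.9 to 1.1 is hypothetical, and elastic deformation and denoising are omitted because they require an interpolation or filtering routine outside NumPy.

```python
import numpy as np

def normalize(image, eps=1e-8):
    """Z-score normalization of a single slice."""
    return (image - image.mean()) / (image.std() + eps)

def augment(image, mask, rng, scale_range=(0.9, 1.1)):
    """Rotate image and mask together, then scale image intensities only."""
    k = int(rng.integers(0, 4))                      # number of 90-degree turns
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    scale = rng.uniform(*scale_range)                # hypothetical scaling range
    return image * scale, mask

rng = np.random.default_rng(0)
image = rng.uniform(0, 4095, size=(64, 64))          # synthetic stand-in for an MRI slice
mask = rng.integers(0, 4, size=(64, 64))             # labels 0..3 (assumed label ids)

norm = normalize(image)
aug_image, aug_mask = augment(norm, mask, rng)
```

Geometric transforms are applied identically to the image and its label mask so that annotations stay aligned, while intensity changes touch the image alone.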
Training of the model employs composite loss functions, combining Dice similarity coefficient loss with categorical cross-entropy. This balance enhances performance in both large and small structures.
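A composite loss of this kind may be sketched as follows. The equal weighting `alpha=0.5`, the soft (probability-based) Dice formulation averaged over classes, the label ids, and the function name `dice_ce_loss` are assumptions for the example; the specification states only that Dice loss and categorical cross-entropy are combined.

```python
import numpy as np

def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dice_ce_loss(logits, target, num_classes, alpha=0.5, eps=1e-6):
    """logits: (K, H, W) class scores; target: (H, W) integer labels."""
    probs = softmax(logits, axis=0)
    onehot = np.eye(num_classes)[target].transpose(2, 0, 1)       # (K, H, W)
    ce = -np.mean(np.sum(onehot * np.log(probs + eps), axis=0))   # per-pixel CE
    inter = np.sum(probs * onehot, axis=(1, 2))
    denom = np.sum(probs, axis=(1, 2)) + np.sum(onehot, axis=(1, 2))
    dice = np.mean((2.0 * inter + eps) / (denom + eps))           # mean soft Dice
    return alpha * (1.0 - dice) + (1.0 - alpha) * ce

# Synthetic label map with four classes (ids are hypothetical).
target = np.zeros((16, 16), dtype=int)
target[4:12, 4:12] = 1
target[6:10, 6:10] = 2
target[0:3, 0:3] = 3

confident = 10.0 * np.eye(4)[target].transpose(2, 0, 1)   # near-perfect prediction
rng = np.random.default_rng(0)
random_logits = rng.standard_normal((4, 16, 16))          # uninformed prediction

loss_good = dice_ce_loss(confident, target, num_classes=4)
loss_bad = dice_ce_loss(random_logits, target, num_classes=4)
```

The Dice term is computed per class and then averaged, so a small structure contributes as much as a large one, while the cross-entropy term supplies a per-pixel gradient.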
Deployment flexibility is provided by enabling the model to function in 2D mode for slice-wise segmentation, 2.5D mode for context-aware slice groups, and 3D mode for volumetric segmentation. The system is tested across publicly available datasets and clinical data in DICOM format, demonstrating robustness to variations in patient anatomy, scanner hardware, and imaging artifacts.
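The 2.5D mode can be illustrated by stacking each slice with its neighbours as input channels, as in the sketch below. The context width of one slice on each side, the edge-replication padding, and the function name `make_2p5d_inputs` are assumptions for the example.

```python
import numpy as np

def make_2p5d_inputs(volume, context=1):
    """volume: (S, H, W) -> (S, 2*context+1, H, W), with edge slices replicated."""
    padded = np.pad(volume, ((context, context), (0, 0), (0, 0)), mode="edge")
    width = 2 * context + 1
    return np.stack([padded[i:i + width] for i in range(volume.shape[0])])

# Synthetic volume of 5 slices, 4 x 4 pixels each.
volume = np.arange(5 * 4 * 4, dtype=float).reshape(5, 4, 4)
stacks = make_2p5d_inputs(volume, context=1)   # shape (5, 3, 4, 4)
```

Each sample then has the target slice in the centre channel and adjacent slices on either side, giving a 2D network through-plane context at lower cost than full volumetric processing.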
Comparative evaluations show that Cardio DA-TUNet outperforms conventional U-Net, Attention U-Net, and transformer-only architectures in segmentation accuracy and structural consistency. The modular design allows the system to be adapted to other medical image segmentation tasks beyond cardiac MRI by retraining with relevant datasets.
The invention can be deployed in clinical workflows for automated cardiac structure analysis, supporting diagnosis, treatment planning, and follow-up monitoring.
The system offers significant advantages, including reduced need for manual correction, improved consistency across datasets, and computational efficiency balanced with high accuracy.
This hybrid approach addresses the limitations of current methods by combining local and global feature learning in a unified and interpretable model.
Best Method of Working
The best method of working involves training the Cardio DA-TUNet on a large dataset of annotated cardiac MRI scans, using preprocessing steps such as normalization and augmentation. The model is implemented in 2.5D mode to balance computational efficiency and contextual accuracy. The transformer encoder is configured to process intermediate-level features, while dual attention modules are integrated into skip connections for refinement. The trained model is deployed on GPU-enabled servers and integrated into clinical imaging systems for real-time segmentation. This ensures accurate delineation of cardiac structures, supports clinical decision-making, and maintains robustness across varied datasets.


Documents

Application Documents

#	Name	Date
1	202541090179-STATEMENT OF UNDERTAKING (FORM 3) [22-09-2025(online)].pdf	2025-09-22
2	202541090179-REQUEST FOR EARLY PUBLICATION(FORM-9) [22-09-2025(online)].pdf	2025-09-22
3	202541090179-POWER OF AUTHORITY [22-09-2025(online)].pdf	2025-09-22
4	202541090179-FORM-9 [22-09-2025(online)].pdf	2025-09-22
5	202541090179-FORM FOR SMALL ENTITY(FORM-28) [22-09-2025(online)].pdf	2025-09-22
6	202541090179-FORM 1 [22-09-2025(online)].pdf	2025-09-22
7	202541090179-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [22-09-2025(online)].pdf	2025-09-22
8	202541090179-EVIDENCE FOR REGISTRATION UNDER SSI [22-09-2025(online)].pdf	2025-09-22
9	202541090179-EDUCATIONAL INSTITUTION(S) [22-09-2025(online)].pdf	2025-09-22
10	202541090179-DRAWINGS [22-09-2025(online)].pdf	2025-09-22
11	202541090179-DECLARATION OF INVENTORSHIP (FORM 5) [22-09-2025(online)].pdf	2025-09-22
12	202541090179-COMPLETE SPECIFICATION [22-09-2025(online)].pdf	2025-09-22