
A System For Segmentation Of Pancreatic Tumors In Volumetric Computed Tomography Images

Abstract: A system (100) and method implemented on a computing device (D) configured to segment pancreatic ductal adenocarcinoma in three-dimensional computed-tomography scans. An input module (101) receives the scan and a preprocessing module (102) normalizes, resizes and augments the data by random flipping and Gaussian noise. A synthetic modality generation module (103) builds a four-channel volume from the original image, a histogram-equalized image, a Gaussian-blurred image and a percentile-threshold image. A processing module (104) comprises a deep learning module (105) with an encoder module (105A), an attention module (105C) at skip connections and a decoder module (105B). The output module (106) applies a one-by-one convolution and sigmoid activation to create a single-channel probability map, then thresholds the map to give a binary mask of tumor voxels, which is output via a display interface, GUI or storage medium operatively coupled with the computing device (D).


Patent Information

Application #: 202541067838
Filing Date: 16 July 2025
Publication Number: 31/2025
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

AMRITA VISHWA VIDYAPEETHAM
Amrita Vishwa Vidyapeetham, Amritapuri Campus, Amritapuri, Clappana PO, Kollam, Kerala - 690525

Inventors

1. PANICKER, Dr. Manju Bhaskara Radha
Srinilayam, Power House road, Tholicode PO, Punalur, Kollam, Kerala – 691333, India
2. REMA, Parvathy
Sreebhavanam, Temple Nagar 115, Padinjattinkara, Kottarakara PO, Kollam, Kerala – 691506, India

Specification

Description: FIELD OF THE INVENTION
The present invention relates to the field of medical image analysis and imaging systems. More particularly, the present invention relates to a system and method for automatic segmentation of pancreatic ductal adenocarcinoma (PDAC) tumors in three-dimensional computed tomography (CT) image volumes using a condition-aware attention-gated three-dimensional UNet (CAAG-UNet3D) model to improve segmentation accuracy and clinical usability.

BACKGROUND OF THE INVENTION
Pancreatic ductal adenocarcinoma (PDAC) is a devastating malignancy, known for its high mortality rate and resistance to current therapies. Despite advances in oncology, the prognosis for PDAC remains grim, with a five-year survival rate below 10%. PDAC presents significant segmentation challenges due to its heterogeneous structure and low contrast with surrounding tissues, and its tumor boundaries are difficult to delineate in standard computed tomography (CT) images. Accurate segmentation of PDAC tumors from CT scans is crucial for staging, treatment planning and monitoring, but manual segmentation is time-consuming, prone to inter-observer variability and lacks consistency across clinical settings.
Conventional methods for automatic tumor segmentation either rely on two-dimensional slices or require multiple scans with different acquisition protocols to enhance contrast and tumor visibility. These approaches are often constrained by the limitations of standard hardware and clinical imaging protocols and may suffer from inconsistent spatial feature representation and high false positive rates in regions with noise or background tissues.
One of the primary limitations associated with conventional segmentation methods is their dependency on single-modality inputs and their inability to adapt to patient-specific imaging features. Volumetric CT images present challenges related to intensity variation, noise and spatial complexity. Existing deep learning models applied to medical imaging typically operate on single-channel inputs, struggle with poor boundary resolution and lack mechanisms to adapt to patient-specific imaging variations. These models fail to focus adequately on tumor-relevant regions within complex anatomical structures such as the pancreas, especially when tumors exhibit low contrast against neighboring anatomical structures, and are therefore susceptible to degraded performance, particularly in cases where tumors are small, irregularly shaped or partially obscured by adjacent organs. Without mechanisms to emphasize tumor zones dynamically, these systems tend to generate inaccurate boundaries and include false positives.
Reference is made to Patent application no. CN118485643A titled “A medical image analysis and processing system based on image analysis”, disclosing a medical image analysis and processing system based on image analysis, which relates to the technical field of medical image analysis and specifically includes an image acquisition module, a data preprocessing module, an image segmentation module, a feature extraction module, a decision analysis module and a result output module. The image segmentation module segments the lesion area image and the organ area image from the preprocessed medical image data through a fusion model and the fusion model is specifically a fusion model of multi-scale U-Net, PSPNet, hybrid CNN-Transformer and multi-task learning; the decision analysis module uses PU semi-supervised learning to combine positive samples and unlabeled data to train the classification model and implements a semi-supervised disease classification strategy through the classification model. The present invention uses PU semi-supervised learning to effectively utilize limited labeled data and a large amount of unlabeled data, thereby improving the ability to extract image features and the accuracy of classification and segmentation.
Another reference is made to Patent application no. CN111612754B titled “MRI tumor optimization segmentation method and system based on multi-modal image fusion”, disclosing an MRI tumor optimization segmentation method and system based on multi-modal image fusion, comprising the following steps: step 1: constructing an MRI tumor multi-modal image fusion network; step 2: constructing a multi-modal 3D network for enhancing tumor image segmentation; step 3: constructing a significance loss function based on GAN image fusion; step 4: constructing a Mask attention mechanism contrast loss function; step 5: constructing an SSIM loss function; and step 6: performing MRI tumor optimized segmentation according to the MRI tumor multi-modal image fusion network, the multi-modal 3D network and the three loss functions. When the deep architecture is trained, a residual unit aids convergence; a recursive residual convolution layer is used for feature accumulation, providing better feature representation for the segmentation task; and a U-Net architecture with the same network parameters and better performance is designed for medical image segmentation.
Therefore, there exists a pressing need for a dedicated system for PDAC segmentation that operates efficiently on standard single-scan CT inputs, enhances tumor contrast using synthetic modality construction and adapts to patient-specific imaging characteristics using an intelligent, condition-aware attention mechanism. Such a system must provide robust and clinically deployable segmentation outputs, optimized through loss functions that can balance precision, recall and spatial accuracy under challenging class imbalance conditions.

ADVANTAGES OF THE INVENTION OVER THE EXISTING STATE OF ART
The present invention provides a technically advanced system for the automatic segmentation of pancreatic ductal adenocarcinoma in volumetric computed tomography images. The present invention addresses the gaps in the existing state-of-the-art models by providing a condition-aware attention-gated three-dimensional UNet model that enables patient-specific feature modulation and real-time segmentation using synthetic multimodal inputs generated from a single scan.
The system derives a synthetic multimodal representation from a single computed tomography scan, eliminating the need for multiple acquisitions while preserving complementary tumor-relevant contrast features. This enhances tumor visibility without introducing scan-to-scan variability and increases consistency and applicability in clinical settings. The three-dimensional encoder–decoder architecture incorporates residual convolutional blocks for robust feature extraction and reconstruction. The system is designed to operate on volumetric data, which improves spatial continuity in segmentation, and condition-specific attention modules embedded at every skip connection within the network dynamically adapt to patient-specific imaging features. These modules selectively emphasize tumor-relevant regions while suppressing irrelevant background information before feature fusion, resulting in improved segmentation precision in complex anatomical contexts and enhanced sensitivity to pancreatic ductal adenocarcinoma heterogeneity.

OBJECTS OF THE INVENTION
In order to overcome the limitations in the existing state of the art, the primary object of the present invention is to provide a computer implemented system and method for automatic segmentation of pancreatic ductal adenocarcinoma in three-dimensional computed tomography images using a condition-specific attention-based deep learning architecture.
Yet another object of the present invention is to employ a condition-aware attention-gated three-dimensional UNet (CAAG-UNet3D) model comprising residual convolution blocks and skip connections, configured to extract, transform and reconstruct spatial features from the synthetic multimodal volume with high fidelity.
Another object of the present invention is to derive a synthetic multimodal image volume from a single computed-tomography scan by stacking four processed representations of the original image volume, thereby enhancing tumor contrast without multiple scans.
Yet another object of the present invention is to employ a three-dimensional encoder–decoder model with residual convolution blocks and skip connections, configured to extract, transform and reconstruct spatial features from the synthetic multimodal volume with high fidelity.
Yet another object of the present invention is to integrate a condition-specific attention mechanism at each skip connection that adapts to patient-specific features so tumor relevant regions are emphasized and background features are suppressed before merging with decoder features.
Yet another object of the present invention is to generate a voxel-level probability map by applying a one-by-one convolution followed by sigmoid activation and to obtain a binary segmentation mask of pancreatic tumor regions by thresholding the probability values.
Yet another object of the present invention is to train the segmentation model with a hybrid loss composed of Focal Tversky loss and Dice loss addressing class imbalance and improving boundary accuracy.
Yet another object of the present invention is to improve training efficiency and numerical stability through automatic mixed-precision computation, gradient clipping and adaptive learning rate scheduling.
Yet another object of the present invention is to output binary segmentation masks in a format suitable for clinical imaging systems.
Yet another object of the present invention is to validate segmentation performance on annotated pancreatic tumor datasets using the Dice score.
Yet another object of the present invention is to preprocess the original CT image volumes using normalization, resizing and augmentation techniques to standardize input quality and enhance model robustness.

SUMMARY OF THE INVENTION
The present invention provides a computer-implemented system and method for segmentation of pancreatic ductal adenocarcinoma in three-dimensional computed-tomography images implemented on a computing device (D) comprising at least one processor, at least one memory and at least one non-transitory storage medium, configured to execute a pipeline of steps from image preprocessing and synthetic multimodal volume generation to segmentation and generation of tumor masks. The invention comprises a condition-aware attention-gated three-dimensional UNet (CAAG-UNet3D) model designed to adapt to patient-specific imaging characteristics and deliver high-fidelity segmentation.
An input module first receives the original computed-tomography image volume obtained from the patient. A preprocessing module prepares this volume by scaling voxel intensities to a common range, resizing every image and its matching mask to one fixed three-dimensional grid and introducing controlled data variation through random flips along the three anatomical axes and the addition of mild Gaussian noise. These steps create uniform input data while preserving anatomical detail and adding realistic variations that help the model perform reliably on scans acquired with different scanners and settings.
After preprocessing, a synthetic modality generation module forms a four-channel volume from the single prepared scan. The first channel is the unaltered image. The second channel is produced through histogram equalization, which increases soft-tissue contrast. The third channel is generated by applying Gaussian blur, revealing broader intensity patterns useful for rough localization. The fourth channel is constructed by keeping only the brightest twenty per cent of voxels, which highlights hyper-dense regions that often correspond to tumor tissue. By stacking these four complementary channels the system gains richer information from a single acquisition, eliminating the need for further scans or contrast injections.
The four-channel volume is forwarded to a processing module that contains a three-dimensional encoder–decoder network. The processing module of the invention comprises a condition-aware attention-gated three-dimensional UNet (CAAG-UNet3D) model. This model comprises an encoder module with residual convolution blocks that extract hierarchical spatial features and a decoder module that reconstructs the original spatial resolution through upsampling operations and skip connections.
The encoder extracts features through residual convolution blocks and progressively reduces spatial resolution while increasing feature depth, so that both local detail and global context are retained.
At each skip connection the network applies an attention module that computes a spatial weighting map, suppressing background features and retaining tumor relevant information before those features are merged with decoder features. The decoder restores spatial resolution through trilinear up-sampling and skip connections, producing a dense feature map that still preserves fine boundaries. The integration of condition-specific attention modules at the skip connections, which use patient-specific gating signals to selectively enhance tumor-relevant features while suppressing irrelevant background data, results in improved focus on clinically significant regions, thereby improving segmentation accuracy and generalization.
An output module converts the final feature map to a single-channel probability volume by means of a one-by-one convolution followed by sigmoid activation, assigning every voxel a likelihood of being part of the tumor. A fixed threshold is applied to the probability values to obtain a binary segmentation mask that clearly separates tumor and non-tumor tissue. The resulting mask can be stored alongside the original computed-tomography image volume in a format that can be displayed for review.
The present invention provides a significant technical advancement over existing methods by delivering accurate tumor segmentation from a single computed-tomography scan, eliminating the need for multiple acquisitions or manual boundary tracing. The synthetic four-channel input brings out contrast that single-channel systems fail to deliver, while the attention mechanism adjusts to patient-specific tumor patterns that fixed-weight networks cannot track. The balanced loss function sharpens boundaries in the face of severe class imbalance and the combined use of gradient clipping, mixed-precision computation and adaptive learning rates reduces training time while keeping calculations stable.
The deep learning module of the present invention is trained using a hybrid loss function comprising Focal Tversky and Dice losses, which specifically addresses the class imbalance often seen in medical image segmentation tasks, enabling improved learning at tumor boundaries and reducing false negatives, which are critical in oncological applications. The use of synthetic multi-modal inputs enriches feature diversity, improving tumor boundary delineation even in low-contrast or infiltrative cases. Residual convolutional blocks ensure stable gradient flow and deeper feature learning, while minimizing the risk of overfitting. The model maintains computational efficiency through lightweight encoder designs and mixed precision training, making it suitable for processing high-resolution three-dimensional CT data.
The hardware and software integration in the present invention yields a concrete technical solution providing a robust, efficient and clinically advanced system that improves delineation accuracy and speeds deployment compared with inventions in the existing state of the art.

BRIEF DESCRIPTION OF DRAWINGS
Figure 1 illustrates a structural block diagram of the pancreatic tumor segmentation system, showing the sequential operation of modules responsible for image acquisition, preprocessing, synthetic modality generation, segmentation and tumor-mask output.
Figure 2 illustrates a flowchart depicting the working methodology of the present invention for PDAC segmentation, outlining the sequential steps from image acquisition to the generation of binary tumor masks.
Figure 3 illustrates representative middle axial CT slices corresponding to the four synthetic imaging modalities derived from a single CT scan.
Figure 4 illustrates the architecture of the condition-aware attention-gated UNet-3D (CAAG-UNet3D), including residual convolutional blocks, skip connections and integrated attention modules across the encoder–decoder pathway.
Figure 5 illustrates voxel-wise comparisons between expert-annotated ground-truth tumor masks and the predicted segmentation outputs generated by the present invention for selected patient cases.
Figure 6 illustrates CT slices alongside three-dimensional voxelised renderings of predicted and ground-truth tumor volumes, highlighting spatial conformity and segmentation accuracy.
Figure 7 illustrates qualitative segmentation results on representative CT slices, comparing the performance of the present invention with other 3D segmentation networks in terms of boundary delineation.

DETAILED DESCRIPTION OF THE INVENTION WITH ILLUSTRATIONS AND NON-LIMITING EXAMPLES
While the invention has been disclosed with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from its scope. However, one of ordinary skill in the art will readily recognize that the present disclosure, including the definitions listed here below, is not intended to be limited to the embodiments illustrated but is to be accorded the widest scope consistent with the principles and features described herein.
Throughout the specification and claims, the following terms take the meanings explicitly associated herein unless the context clearly dictates otherwise. The meaning of “a”, “an” and “the” include plural references. Additionally, a reference to the singular includes a reference to the plural unless otherwise stated or inconsistent with the disclosure herein.
A person of ordinary skill in the art will readily ascertain that the illustrated steps detailed in the figures and here below are set out to explain the exemplary embodiments shown and it should be anticipated that ongoing technological development will change the way functions are performed. It is to be noted that the drawings are to be regarded as schematic representations and elements are not necessarily shown to scale. Rather, the various elements are represented such that their function and general purpose become apparent to a person skilled in the art.
The present invention relates to a system and method for the automatic segmentation of pancreatic ductal adenocarcinoma (PDAC) from three-dimensional computed tomography (CT) image volumes using a condition-aware attention-gated three-dimensional UNet model. The system (100) is implemented on a computing device (D) comprising at least one processor, at least one memory and at least one non-transitory computer readable storage medium configured to store and execute software modules described herein.
The computing device (D) may further include a graphical user interface, a display interface and a communication interface such as a DICOM adapter or network interface, enabling integration with standard medical imaging infrastructure. The graphical user interface is configured to display the resulting binary segmentation mask together with the corresponding CT scan for review and interpretation.
In certain embodiments, the computing device (D) may further include a graphical processing unit (GPU) configured to accelerate matrix operations and deep-learning inference, particularly within the processing module (104). The GPU enables efficient handling of large 3D CT volumes and improves real time segmentation performance.

COMPONENTS OF THE INVENTION:
Input Module (101)
The system (100) comprises at least one input module (101) configured to receive original volumetric CT image data of pancreatic ductal adenocarcinoma. The input module (101) may be implemented as or operatively coupled to a DICOM (Digital Imaging and Communications in Medicine) interface or a communication interface, enabling the system to transmit or receive computed tomography image data to and from external imaging systems, servers, Picture Archiving and Communication Systems (PACS) or Hospital Information Systems (HIS). The received image volumes serve as the foundation for downstream preprocessing, synthetic channel generation and segmentation tasks. The module ensures compatibility with standard clinical imaging formats and serves as the entry point for patient-specific imaging data into the system.
Preprocessing Module (102)
The preprocessing module (102) is responsible for preparing the raw input data for subsequent processing. The preprocessing steps include:
Normalization of intensity values: The intensity values of the CT volume are standardized to a uniform scale, which mitigates the variation in contrast and brightness across scans from different sources or equipment.
Resizing to a fixed three-dimensional voxel dimension: All volumes are resized to a standardized shape using bilinear interpolation for CT image inputs and nearest-neighbor interpolation for segmentation masks. This harmonization ensures that the data can be batch-processed during training and inference without introducing shape-related inconsistencies.
Application of image augmentation techniques: Specifically, random flipping along different axes and Gaussian noise injection are used to increase the diversity of training data, improve model generalization and enhance robustness to input variations.
Synthetic Modality Generation Module (103)
The synthetic modality generation module (103) constructs a four-channel synthetic image volume by deriving four different processed versions of the original CT image:
First channel: The original CT scan is retained as-is to preserve raw imaging features.
Second channel: A histogram equalized version of the original scan is generated to enhance contrast and visibility of the tumor boundary.
Third channel: A Gaussian-blurred version is created to smooth fine-grained noise and highlight coarse structures.
Fourth channel: A high-intensity map produced by thresholding voxel values to keep the top twenty per cent of intensities, thereby accentuating hyper-dense regions frequently associated with tumor tissue.
All four channels are derived from the same input volume. This allows extraction of diverse, complementary contrast features from a single scan, removing the need for multi-modal acquisition. The resulting synthetic volume forms a rich multimodal representation that serves as input to the deep learning segmentation model.
Processing Module (104) and Deep Learning Module (105)
The processing module (104) comprises a processor operatively coupled to a deep learning module (105), the processor configured to execute a condition-aware attention-gated three-dimensional UNet model for segmentation of pancreatic tumors.
The deep learning module (105) comprises:
Encoder module (105A): This consists of residual convolution blocks, each including two three-dimensional convolutional layers, normalization, ReLU activations and identity skip connections. These blocks progressively extract high-level spatial features while reducing the resolution through down-sampling, enabling hierarchical feature learning across spatial scales.
Decoder module (105B): The decoder reconstructs the spatial resolution through trilinear upsampling followed by three-dimensional convolution operations. This process mirrors the encoder’s structure in reverse and aims to restore fine-grained spatial details lost during downsampling.
Attention module (105C): Integrated at each skip connection between the encoder and decoder, the condition-aware attention mechanism dynamically adapts the skip-connection features based on patient-specific imaging characteristics. This allows the model to emphasize tumor-relevant regions and suppress irrelevant background features before merging encoder and decoder features. The attention module operates as a filtering mechanism, allowing selective feature propagation during decoding, thereby improving segmentation precision for anatomically complex tumor regions.
This condition-aware attention-gated architecture is critical to the system’s novelty, enabling personalized segmentation aligned with the specific anatomical and imaging features of each patient.
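By way of a non-limiting illustration, the following is a minimal sketch of a three-dimensional attention gate of the kind applied at each skip connection, assuming a PyTorch implementation. The exact layer sizes and the precise form of the patient-specific gating signal are not specified here, so the additive gating shown (projecting the skip and gating features to a common space and producing a spatial weighting map in [0, 1]) is an illustrative assumption.

import torch
import torch.nn as nn

class AttentionGate3D(nn.Module):
    def __init__(self, skip_channels, gating_channels, inter_channels):
        super().__init__()
        # Project encoder (skip) and decoder (gating) features to a common space.
        self.theta = nn.Conv3d(skip_channels, inter_channels, kernel_size=1)
        self.phi = nn.Conv3d(gating_channels, inter_channels, kernel_size=1)
        # Collapse to a single-channel spatial weighting map in [0, 1].
        self.psi = nn.Conv3d(inter_channels, 1, kernel_size=1)

    def forward(self, skip, gating):
        # skip:   encoder features at this resolution (B, C_skip, D, H, W)
        # gating: decoder features brought to the same resolution (B, C_gate, D, H, W)
        attn = torch.relu(self.theta(skip) + self.phi(gating))
        attn = torch.sigmoid(self.psi(attn))   # spatial weighting map
        return skip * attn                     # suppress background, keep tumor-relevant voxels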
Output Module (106)
The output module (106) processes the final feature map from the decoder and produces the segmentation output:
A one-by-one convolution is applied to the final decoded feature map, followed by a sigmoid activation function, resulting in a single-channel probability map. Each voxel in this map represents the predicted likelihood of being part of a tumor.
A thresholding operation is then applied to this probability map (typically at 0.5 as per standard binary segmentation practices) to generate a binary segmentation mask, which separates tumor regions from healthy tissue.
This binary mask is formatted for compatibility with clinical imaging systems so the mask can be displayed alongside the original scan.

Training Strategies and Loss Functions
The model is trained using a hybrid loss function, combining Focal Tversky Loss and Dice Loss, to balance sensitivity to boundary details and class imbalance.
Training is further stabilized and accelerated by employing automatic mixed-precision (AMP) computation, gradient clipping and an adaptive OneCycleLR learning rate schedule.

WORKING OF THE INVENTION:
The condition-aware attention-gated three-dimensional UNet model of the present invention leverages residual learning, attention mechanisms and multi-scale feature extraction to achieve accurate and robust tumor segmentation. The methodology consists of several key stages: data acquisition, data preprocessing, synthetic modality generation, model architecture, loss function optimization, training procedures and evaluation metrics. Figure 1 illustrates the structural block diagram of the system for segmentation of pancreatic ductal adenocarcinoma in three-dimensional computed-tomography images and Figure 2 illustrates the flowchart of the methodology of the present invention.
Data acquisition
The dataset used in the present invention was sourced from various clinical settings and consists of contrast-enhanced CT scans along with corresponding ground-truth segmentation masks that were manually annotated by expert medical professionals.
Data Preprocessing
The preprocessing pipeline involves normalization, resizing and data augmentation to enhance numerical stability and improve generalization. CT images exhibit inconsistent intensity ranges due to different acquisition protocols and scanners. To address this, min-max normalization is applied to scale pixel values to the range [0,1]:
I′ = (I − I_min) / (I_max − I_min + ε)
where I' is the normalized image, I_min and I_max are the minimum and maximum pixel intensities and ε is a small constant to prevent division by zero. This transformation ensures that images have a uniform intensity distribution, improving stability during training.
Since ground truth segmentation masks may have varying intensity levels, they are binarized using a threshold function:
M′ = L(M > 0.5)
where L(x) is the indicator function defined as:
L(x) = 1, if x > 0.5; 0, if x ≤ 0.5
This ensures that all non-zero values in the mask are set to 1, while the background remains 0.
Given that CT scans vary in spatial dimensions, images and masks are resized to a uniform shape of (64, 96, 96) voxels. To retain anatomical structures in images, bilinear interpolation is used:
I_resized(x, y, z) = Σ_{i=0}^{1} Σ_{j=0}^{1} Σ_{k=0}^{1} w_ijk · I(i, j, k)
where w_ijk are the interpolation weights determined by the neighboring voxel values. Conversely, since masks contain discrete values representing tumor regions, they are resized using nearest-neighbor interpolation:
M_resized(x, y, z) = M(⌊x⌋, ⌊y⌋, ⌊z⌋)
which assigns each pixel to the nearest available voxel in the original mask.
To increase data diversity and prevent overfitting, random transformations are applied to the images. The augmentations include random flipping along different axes: I′ = I(H − x, y, z) or I′ = I(x, W − y, z), where H and W are the height and width of the image. Additionally, Gaussian noise is injected into the images to simulate scanner variations:
I′ = I + N(0, σ²), σ = 0.01
where N(0, σ²) is a Gaussian distribution with a mean of zero and a standard deviation of 0.01. This prevents the network from learning scanner-dependent artifacts.
Normalizing clinical data C_k ∈ R^F to have zero mean and unit variance:
C′_k = (C_k − μ_c) / σ_c
where μ_c = (1/n) Σ_{k=1}^{n} C_k and σ_c = √((1/n) Σ_{k=1}^{n} (C_k − μ_c)²)
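By way of a non-limiting illustration, the following is a minimal sketch of the image preprocessing and augmentation steps described above, assuming a PyTorch implementation. The target grid of (64, 96, 96) and the noise level σ = 0.01 follow the description; the helper names and the use of trilinear interpolation for the three-dimensional image volume (the volumetric counterpart of the bilinear interpolation referred to above) are illustrative assumptions.

import torch
import torch.nn.functional as F

def preprocess(image, mask, size=(64, 96, 96), eps=1e-8):
    # Min-max normalization of the CT volume to [0, 1].
    image = (image - image.min()) / (image.max() - image.min() + eps)
    # Binarize the ground-truth mask with the 0.5 indicator function.
    mask = (mask > 0.5).float()
    # Resize: smooth interpolation for the image, nearest-neighbor for the mask.
    image = F.interpolate(image[None, None], size=size, mode="trilinear", align_corners=False)[0, 0]
    mask = F.interpolate(mask[None, None], size=size, mode="nearest")[0, 0]
    return image, mask

def augment(image, mask, sigma=0.01):
    # Random flips along the three anatomical axes.
    for axis in range(3):
        if torch.rand(1) < 0.5:
            image, mask = torch.flip(image, dims=[axis]), torch.flip(mask, dims=[axis])
    # Mild Gaussian noise to simulate scanner variation (applied to the image only).
    image = image + sigma * torch.randn_like(image)
    return image, mask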
Synthetic modality generation
Synthetic multi-modal medical images were generated from the pre-processed images and utilized as input channels for the condition-aware attention-gated three-dimensional UNet model (CAAG-UNet3D). The T1-weighted imaging (T1) modality remained unmodified, serving as the baseline reference. Contrast-Enhanced T1 (T1ce) images were processed using histogram equalization to enhance contrast by transforming pixel intensities based on their cumulative distribution function (CDF), ensuring a uniform intensity distribution. The T2-weighted imaging modality was synthetically simulated by applying a Gaussian blur with a standard deviation of σ = 1, mimicking the soft-tissue contrast characteristic of T2 imaging. The Fluid-Attenuated Inversion Recovery (FLAIR) modality was generated by thresholding the top 20% of the brightest intensities, where voxel intensities above the 80th percentile were retained while the rest were set to zero, emphasizing hyperintense regions. These synthesized imaging modalities were then stacked as separate input channels to create a multi-modal representation. Figure 3 illustrates representative middle axial CT slices corresponding to the four synthetic imaging modalities derived from a single CT scan: T1-weighted, T1 contrast-enhanced (T1ce), T2-weighted and FLAIR.
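By way of a non-limiting illustration, the following is a minimal sketch of the synthetic modality construction described above, assuming NumPy and SciPy and an input volume already normalized to [0, 1]. The CDF-based histogram equalization, σ = 1 Gaussian blur and 80th-percentile threshold follow the description; the function name and implementation details are illustrative assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def synthetic_modalities(volume):
    # Channel 1: original volume ("T1" baseline).
    t1 = volume
    # Channel 2: histogram equalization ("T1ce") via the cumulative distribution function.
    hist, bin_edges = np.histogram(volume.ravel(), bins=256, range=(0.0, 1.0))
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min() + 1e-8)
    t1ce = np.interp(volume.ravel(), bin_edges[:-1], cdf).reshape(volume.shape)
    # Channel 3: Gaussian blur with sigma = 1 ("T2").
    t2 = gaussian_filter(volume, sigma=1.0)
    # Channel 4: keep only the brightest 20% of voxels ("FLAIR").
    threshold = np.percentile(volume, 80)
    flair = np.where(volume >= threshold, volume, 0.0)
    # Stack the four complementary representations as input channels.
    return np.stack([t1, t1ce, t2, flair], axis=0)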
The condition-aware attention-gated three-dimensional UNet model of the present invention is an advanced 3D U-Net architecture specifically designed for the segmentation of PDAC tumors in volumetric CT scans. This model extends the conventional U-Net by integrating residual connections, channel-wise attention mechanisms and skip connections, significantly enhancing segmentation accuracy.
Figure 4 illustrates the architecture of the condition-aware attention-gated UNet-3D (CAAG-UNet3D), including residual convolutional blocks, skip connections and integrated attention modules across the encoder–decoder pathway executed by at least one processor of the computing device (D) within the deep learning module (105) of the processing module (104).
Training deep neural networks often suffers from gradient degradation, which slows down convergence and makes deeper layers ineffective. Residual connections address this by allowing a direct shortcut path for gradients, ensuring efficient backpropagation and enabling deeper feature representations. PDAC tumors exhibit high heterogeneity, making it difficult for conventional U-Net models to distinguish them from surrounding pancreatic tissue. To overcome this, attention mechanisms are incorporated at every skip connection to enhance the focus on tumor specific features while suppressing irrelevant regions. Also, skip connections bridge the encoder and decoder, preventing the loss of high-resolution anatomical information that is crucial for accurate segmentation.
Encoder module: Feature Extraction Using Residual Convolutions
Each Residual Convolution Block (RCB) consists of two consecutive 3D convolutional layers with kernel size 3×3×3, along with batch normalization to stabilize training and ReLU activation for non-linearity. The residual mapping in each RCB ensures that the transformation learns only the residual function, allowing efficient gradient flow during backpropagation.
Mathematically, an RCB is represented as:
Y=ReLU(BN(W_2*ReLU(BN(W_1*X))+X))
Where X is the input feature tensor of shape C×D×H×W, where C represents the number of channels and D,H,W are the depth, height and width of the 3D volume, W_1 and W_2 are the learned 3D convolutional kernels of size 3×3×3 and BN(⋅) denotes batch normalization applied to each convolutional output to normalize feature distributions.
Each stage of the encoder progressively reduces the spatial resolution of feature maps through stride-2 convolutions, which down-sample the input feature maps while increasing the number of channels. Given an input volume X, the down-sampled representation X′ is obtained as:
X^'=W_d*X
where W_d is a 3D convolution kernel with a stride of 2, reducing the spatial resolution by half. This operation captures high-level semantic information while preserving computational efficiency.
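By way of a non-limiting illustration, the following is a minimal sketch of a residual convolution block and a stride-2 down-sampling stage, assuming a PyTorch implementation. Channel counts are illustrative, and the 1×1×1 projection on the identity path (used when the channel count changes) is an assumption not prescribed by the description.

import torch.nn as nn

class ResidualConvBlock3D(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm3d(out_ch)
        self.conv2 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.shortcut = nn.Conv3d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        residual = self.shortcut(x)
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.conv2(y)
        # Y = ReLU(BN(W2 * ReLU(BN(W1 * X)) + X)), as in the residual mapping above.
        return self.relu(self.bn2(y + residual))

def downsample(in_ch, out_ch):
    # Stride-2 convolution that halves spatial resolution while increasing channels.
    return nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)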
Decoder Module: Up-Sampling and Skip Connections
The decoder reconstructs the segmentation mask by progressively upsampling the encoded feature maps and concatenating them with the corresponding high-resolution feature maps from the encoder through skip connections. These skip connections help retain fine spatial details lost during down-sampling.
The up-sampling operation is performed using trilinear interpolation defined as:
U(X)=interp(X,scale factor=2)
where U(X) represents the up-sampled feature map and interp() denotes interpolation-based up-sampling with a factor of 2 along each dimension.
Each up-sampled feature map is concatenated with the corresponding feature map from the encoder, ensuring that fine details from early layers are preserved. Mathematically, the concatenation operation can be expressed as:
X_concat=concat(X_up,X_skip)
where X_up is the up-sampled feature map and X_skip is the feature map from the encoder.
The concatenated feature maps are passed through Residual Convolution Blocks (RCBs) for refinement, like the encoder, ensuring that high-frequency details are maintained throughout the reconstruction process.
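By way of a non-limiting illustration, the following is a minimal sketch of a single decoder stage combining trilinear up-sampling, attention-gated skip-connection fusion and residual refinement, assuming a PyTorch implementation and reusing the AttentionGate3D and ResidualConvBlock3D sketches given earlier; the class name and channel handling are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStage3D(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.attn = AttentionGate3D(skip_ch, in_ch, inter_channels=out_ch)
        self.refine = ResidualConvBlock3D(in_ch + skip_ch, out_ch)

    def forward(self, x, skip):
        # Trilinear up-sampling by a factor of 2 along each spatial dimension.
        x_up = F.interpolate(x, scale_factor=2, mode="trilinear", align_corners=False)
        # Suppress background in the skip features before fusion.
        skip = self.attn(skip, x_up)
        # Concatenate and refine with a residual convolution block.
        return self.refine(torch.cat([x_up, skip], dim=1))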
Final Output Layer
The final segmentation output is generated using a 1×1×1 convolution, which projects the multi-channel feature representation to a single-channel probability map. This operation is expressed as:
P(X)=σ(W_out*X)
Where W_out is a learnable 1×1×1 convolution kernel that maps the feature channels to a single output channel and σ() is the sigmoid activation function which ensures that the output probability values are within the range [0,1] suitable for binary segmentation.
The final output P(X) represents the predicted probability of each voxel belonging to the PDAC region. A thresholding operation is applied to obtain the final binary segmentation mask:
M(x) = 1, if P(x) ≥ 0.5; 0, if P(x) < 0.5
Here, the threshold value is set to 0.5, to classify each voxel as tumor or background.
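By way of a non-limiting illustration, the following is a minimal sketch of the output layer described above, assuming a PyTorch implementation: a 1×1×1 convolution followed by a sigmoid produces the voxel-wise probability map P(X), and thresholding at 0.5 yields the binary mask M(x).

import torch
import torch.nn as nn

class OutputHead3D(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        # 1x1x1 convolution projecting the feature channels to a single output channel.
        self.proj = nn.Conv3d(in_ch, 1, kernel_size=1)

    def forward(self, features, threshold=0.5):
        prob = torch.sigmoid(self.proj(features))   # P(x) in [0, 1] for every voxel
        mask = (prob >= threshold).float()          # binary segmentation mask M(x)
        return prob, mask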

Loss Function Design
To effectively address the class imbalance inherent in PDAC segmentation, a hybrid loss function combining Focal Tversky Loss (FTL) and Dice Loss is employed. The FTL component enhances sensitivity to hard-to-segment regions by penalizing false negatives more than false positives, while Dice Loss ensures maximal overlap between predicted and ground truth masks, thereby improving segmentation accuracy.
Focal Tversky Loss (FTL)
Focal Tversky Loss (FTL) is an extension of the Tversky Index, which generalizes the Dice coefficient by allowing differential weighting of false negatives (FN) and false positives (FP). The FTL function is mathematically defined as:
F(FTL) = (1 − (TP + ε) / (TP + α·FN + β·FP + ε))^γ
Here, TP and FN represent correctly segmented tumor regions and missed tumor regions respectively, α and β are weighting factors that control the trade-off between false negatives and false positives and γ is the focusing parameter which enhances the contribution of hard-to-segment examples.
By raising the complement of the Tversky Index ((TP + ε) / (TP + α·FN + β·FP + ε)) to the power of γ, the FTL prioritizes difficult-to-segment tumor regions, ensuring improved sensitivity in detecting small tumor structures.
Dice Loss
Dice Loss is included in the loss function to maximize the overlap between the predicted segmentation mask and the ground truth. It is derived from the Dice Similarity Coefficient (DSC) and is formulated as:
F(Dice)=1-(2*|X∩Y|+ε)/(|X|+|Y|+ε)
Here, X is the predicted segmentation mask, Y is the ground truth segmentation mask. |X∩Y| represents the intersection (true positive region) between the prediction and ground truth, |X| and |Y| denote the total number of predicted and actual tumor voxels, respectively.
Total Loss Function
The final loss function is formulated as a weighted combination of Focal Tversky Loss and Dice Loss:
Loss=θ_1 F(FTL)+θ_2 F(Dice)
Here, θ_1 and θ_2 are hyperparameters that control the relative contribution of FTL and Dice Loss, respectively.
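By way of a non-limiting illustration, the following is a minimal sketch of the hybrid loss combining Focal Tversky Loss and Dice Loss, assuming a PyTorch implementation operating on voxel-wise probabilities. The values of α, β, γ, θ_1 and θ_2 are hyperparameters; the defaults shown here are illustrative assumptions rather than values prescribed by the description.

import torch

def hybrid_loss(prob, target, alpha=0.7, beta=0.3, gamma=0.75,
                theta1=0.5, theta2=0.5, eps=1e-6):
    prob, target = prob.reshape(-1), target.reshape(-1)
    tp = (prob * target).sum()
    fn = ((1 - prob) * target).sum()
    fp = (prob * (1 - target)).sum()
    # Focal Tversky Loss: complement of the Tversky index raised to the power gamma.
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    ftl = (1 - tversky) ** gamma
    # Dice Loss: one minus the soft Dice similarity coefficient.
    dice = 1 - (2 * tp + eps) / (prob.sum() + target.sum() + eps)
    # Weighted combination: Loss = theta1 * F(FTL) + theta2 * F(Dice).
    return theta1 * ftl + theta2 * dice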
Training strategies and optimization
The input to the model comprises synthetic multi-modal medical images, specifically T1-weighted, contrast-enhanced T1, T2-weighted and FLAIR images. These four modalities are concatenated along the channel dimension, forming a four-channel 3D input volume. The model is trained using a batch size of 2, suitable for large 3D volumes and GPU memory limitations. Training is conducted for 200 epochs, using the AdamW optimizer, which decouples weight decay from gradient updates for better generalization. A OneCycleLR scheduler is employed to dynamically vary the learning rate, promoting fast convergence while reducing the risk of getting trapped in suboptimal minima.
Learning Rate Scheduling
The OneCycleLR scheduler adjusts the learning rate over time t using a cosine annealing strategy, defined as:
LR(t) = LR_max · (1 + cos(πt/T)) / 2
where LR_max is the maximum learning rate and T is the total number of epochs. This cyclical adjustment improves learning efficiency by initially increasing the learning rate to escape shallow minima and gradually decreasing it to refine the model during the later stages of training.
Gradient Clipping
To prevent gradient explosion, especially during backpropagation through deep residual networks, gradient clipping is applied. The gradients θ are clipped using the following rule:
θ = θ / max(1, ‖θ‖/c)
where ‖θ‖ is the norm of the gradient and c=1 is the clipping threshold. This ensures that excessively large gradients do not destabilize the training process, leading to smoother convergence.
Mixed Precision Training (AMP)
To reduce memory consumption and accelerate training, Automatic Mixed Precision (AMP) is utilized. This involves performing computations in FP16 (16-bit floating point) while maintaining model stability using FP32 (32-bit) master weights. The conversion between formats is expressed as:
X_FP16=X_FP32*2^(-scale)
where the scale factor is dynamically adjusted to maintain numerical stability and avoid underflow. Mixed precision training significantly improves computational efficiency, allowing for larger batch sizes and faster model iterations without sacrificing accuracy.
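By way of a non-limiting illustration, the following is a minimal sketch of the training loop described above, assuming a PyTorch implementation and the hybrid_loss sketch given earlier, and assuming the model returns the voxel-wise probability map as its first output. The batch size of 2, 200 epochs and clipping threshold of 1 follow the description; the maximum learning rate and weight decay are illustrative assumptions.

import torch

def train(model, loader, device="cuda", epochs=200, max_lr=1e-3):
    optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr, weight_decay=1e-2)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=max_lr, epochs=epochs, steps_per_epoch=len(loader))
    scaler = torch.cuda.amp.GradScaler()              # automatic mixed precision
    model.to(device).train()
    for epoch in range(epochs):
        for image, mask in loader:                    # image: (2, 4, 64, 96, 96)
            image, mask = image.to(device), mask.to(device)
            optimizer.zero_grad(set_to_none=True)
            with torch.cuda.amp.autocast():           # FP16 forward pass
                prob, _ = model(image)
                loss = hybrid_loss(prob, mask)
            scaler.scale(loss).backward()
            scaler.unscale_(optimizer)                # unscale gradients before clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            scaler.step(optimizer)
            scaler.update()
            scheduler.step()                          # OneCycleLR is stepped every batch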
Evaluation Metrics and Results
To assess the performance of the segmentation model of the present invention, standard voxel-wise evaluation metrics are used, focusing on spatial overlap and segmentation accuracy. The primary evaluation metric is the Dice Score, which quantifies the similarity between the predicted segmentation and the ground truth.
In addition to Dice, the Intersection over Union (IoU), also known as the Jaccard Index, along with Volume Overlap Error (VOE) is used to evaluate segmentation quality. IoU is defined as:
IoU=(TP+ε)/(TP+FN+FP+ε)

Here, TP, FN and FP are true positives, false negatives and false positives, respectively. This metric complements Dice by penalizing over-segmentation and under-segmentation. VOE quantifies the dissimilarity between two segmented volumes by measuring the percentage of non-overlapping voxels relative to the union of both segmentations, with higher values indicating greater segmentation discrepancy.
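By way of a non-limiting illustration, the following is a minimal sketch of the voxel-wise evaluation metrics described above (Dice, IoU and VOE), assuming binary NumPy masks of identical shape.

import numpy as np

def segmentation_metrics(pred, truth, eps=1e-6):
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    dice = (2 * tp + eps) / (2 * tp + fp + fn + eps)   # Dice similarity coefficient
    iou = (tp + eps) / (tp + fp + fn + eps)            # Jaccard index
    voe = 100.0 * (1 - iou)                            # volume overlap error, in percent
    return dice, iou, voe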
During training, model checkpointing is employed to save the model whenever an improvement in the test Dice Score is observed, ensuring that the best-performing weights are preserved. Furthermore, to avoid overfitting and reduce computational cost, early stopping is applied with a patience of 20 epochs, stopping training if no improvement in validation performance is detected over this period.
RESULTS:
Quantitative and Qualitative Evaluation
To assess the segmentation performance of the condition-aware attention-gated three-dimensional UNet model of the present invention on the synthetic multi-modal CT dataset, the predictions were evaluated using three standard overlap-based metrics: Dice Similarity Coefficient (Dice), Volume overlap error (VOE) and Intersection over Union (IoU). These metrics were computed on the held-out test set consisting of 20% of the full dataset, with each subject’s prediction compared against its corresponding expert-annotated ground truth mask.
The model achieved a mean Dice score of 0.832 ± 0.027 and a mean IoU of 0.766 ± 0.034, with a VOE of 12%, indicating strong agreement between predicted and ground truth segmentations. These results suggest that the model is capable of reliably localizing pancreatic tumors, despite variability in size, shape and contrast across patients.
To qualitatively evaluate the effectiveness of the condition-aware attention-gated three-dimensional UNet model, a visual comparison is performed between the input CT image, ground truth tumor mask and the predicted segmentation output. Representative slices from four different patient cases are presented in Figure 5, including both high-performing cases (with high Dice scores) and challenging examples (with low-to-moderate overlap due to complex tumor boundaries).
Figure 5 illustrates voxel-wise comparisons between expert-annotated ground-truth tumor masks and the predicted segmentation outputs generated by the present invention for selected patient cases. The model accurately captures the tumor boundaries in cases with well-defined lesion margins and clear contrast differentiation, demonstrating a high degree of spatial overlap with the reference masks. In challenging cases, where tumor boundaries are ambiguous due to infiltration, low contrast or perivascular extension, the model still manages to localize the core region of the tumor with reasonable accuracy. However, slight under-segmentation or over-segmentation may occur, particularly near the periphery of the lesion or at interfaces with adjacent anatomical structures.
Beyond 2D comparisons, the 3D tumor structure was examined using volumetric renderings to evaluate shape consistency, completeness and spatial coverage. Figure 6 illustrates CT slices alongside three-dimensional voxelised views of ground-truth (blue) tumor volumes and predicted (green) tumor volumes highlighting spatial conformity for a representative subject. These visualizations clearly depict the topological conformity between prediction and ground truth. The model successfully reconstructs the overall morphology and volume of the tumor, preserving its spatial continuity and anatomical extent. Minor discrepancies are primarily located at boundary voxels, which are tolerable for clinical interpretability.
Component-wise Contribution to Segmentation Performance
To evaluate the individual contribution of each component within the segmentation pipeline, a detailed study is conducted. This analysis involved systematically modifying or removing specific architectural and training elements, namely the use of synthetic multi-modal input, the Focal Tversky loss function, residual convolutional blocks and data augmentation strategies, while keeping all other configurations constant. The mean Dice score, IoU score and VOE score on the test set were used as the primary performance indicators. Table 1 shows the results of the performance comparison.

When the model was trained using only the T1 synthetic modality, excluding the additional channels from T1ce, T2 and FLAIR, the Dice score dropped significantly to 0.743 (±0.031), compared to 0.832 (±0.027) with the full multi-modal input, along with an IoU score of 0.691 (±0.057) and a VOE of 17%. This highlights the importance of leveraging complementary imaging contrasts provided by the synthetic modalities, which aid in better tumor boundary delineation and structural interpretation.
Replacing the Focal Tversky loss with the standard Dice loss also led to a noticeable performance reduction, yielding a Dice score of 0.672 (±0.030), an IoU of 0.624 (±0.047) and a VOE of 19%. This finding suggests that the Focal Tversky loss has an important role in handling class imbalance and improving sensitivity to small or irregular tumor regions, particularly in cases where false negatives are more critical than false positives.
Furthermore, removing residual convolutional blocks and replacing them with standard sequential convolution layers resulted in a Dice score of 0.625 (±0.028), an IoU of 0.618 (±0.018) and a VOE of 20%. This drop indicates the benefit of residual learning in maintaining gradient flow and enhancing deeper feature reuse, which is essential for complex 3D segmentation tasks involving heterogeneous tumor structures.
Finally, when data augmentation techniques were disabled during training, the model’s performance degraded further, with a Dice score of just 0.621 (±0.034), an IoU of 0.604 (±0.087) and a VOE of 22%. This highlights the significance of augmentation in preventing overfitting and promoting generalization, especially when working with limited datasets.
Comparative Analysis with Existing Models
To benchmark the performance of the condition-aware attention-gated three-dimensional UNet model with residual learning and synthetic multi-modal input, five widely used 3D segmentation architectures are implemented and evaluated on the same dataset. These include TransBTS, CBAM-UNet, nnUNet, 3D-ResUNet and UNet++3D. All models were trained under identical experimental settings, using the same training and test splits, optimization parameters and loss function (Focal Tversky Loss) to ensure a fair comparison. Figure 7 presents a visual comparison of how each of the UNet-based segmentation models differs in segmenting tumors from four different samples.
Figure 7 illustrates qualitative segmentation results on four representative CT slices, displaying the ground-truth tumor masks alongside predictions from CAAG-UNet3D and five other state-of-the-art 3D segmentation models. Among these approaches, the condition-aware attention-gated three-dimensional UNet model most faithfully reproduces the true tumor contours, accurately capturing both core and peripheral regions. In contrast, TransBTS and CBAM-UNet often under-segment the lesion margins, while nnUNet occasionally introduces small false positives in non-tumorous areas. Although 3D-ResUNet and UNet++3D generate smooth, cohesive masks overall, they sometimes miss irregular protrusions. This highlights the superior boundary delineation and segmentation capability achieved by the condition-aware attention-gated three-dimensional UNet model of the present invention.
Table 2. Comparison of state-of-the-art UNet-based 3D segmentation models with the present invention

Table 2 shows the results of the evaluation and comparison of other well-known 3D segmentation models with the condition-aware attention-gated three-dimensional UNet model. The comparative results report three evaluation metrics: Dice score, Jaccard Index (IoU) and Volume Overlap Error (VOE). Among all models, CAAG-UNet3D achieved the highest performance across all metrics, with a Dice score of 0.832, Jaccard Index of 0.766 and the lowest VOE at 12%. This indicates a high degree of overlap with the ground truth and minimal volumetric discrepancy. In contrast, the next best-performing model, 3D-ResUNet, achieved a Dice score of 0.814, followed by CBAM-UNet at 0.795. Models like TransBTS and nnUNet showed relatively lower Dice scores (0.761 and 0.787, respectively), along with higher VOE values, suggesting more prominent segmentation errors. While UNet++3D showed moderate performance (Dice: 0.782), it still fell short of the present method in both overlap accuracy and volume agreement. This comparative analysis underscores the superior segmentation performance of the condition-aware attention-gated three-dimensional UNet model of the present invention.

COMMERCIAL APPLICATIONS:
Clinical decision support systems (CDSS):
The integration of imaging and clinical data for precise tumor segmentation and outcome prediction can be embedded into CDSS platforms. Hospitals and clinics can use such systems to provide visual and analytical support for tumor boundary identification and assessment of anatomical characteristics relevant to surgical and treatment planning workflows.

AI-powered Radiology Workstations:
Incorporating this model into radiology software can enhance the efficiency and accuracy of radiologists. The model's ability to process 3D CT scans and highlight clinically relevant regions reduces the time and effort required for manual segmentation.

Surgical planning and navigation systems:
By providing precise and biologically plausible tumor segmentation, the model can be utilized to provide preoperative image analysis and anatomical visualization support in surgical planning workflows.

Pharmaceutical research and trials:
Pharmaceutical companies conducting oncology research can leverage this model to assess tumor responses to drugs. The accurate segmentation and outcome prediction features are particularly valuable for monitoring tumor progression or regression during clinical trials.

Telemedicine:
This technology can be integrated into telemedicine platforms to enable remote visualization and computational analysis of CT images, aiding healthcare providers in reviewing patient data alongside clinical context.
Claims:
1. A system (100) for segmentation of pancreatic tumors in volumetric computed tomography images, the system (100) comprising:
• at least one input module (101) configured to receive original computed tomography image volumes;
• at least one preprocessing module (102) operatively coupled to the input module (101), said preprocessing module (102) configured to:
 normalize the intensity values of the received original computed tomography image volumes;
 resize the image volumes to a fixed three-dimensional voxel dimension;
 apply image augmentation techniques including random flipping along different axes and Gaussian noise injection;
• at least one synthetic modality generation module (103) operatively coupled to the preprocessing module (102), said synthetic modality generation module (103) configured to generate a four-channel synthetic image volume derived from the pre-processed image volume, wherein each channel represents a distinct processed version of the received image volume;
• at least one processing module (104) operatively coupled to the synthetic modality generation module (103), said processing module (104) being executed by at least one processor and comprising a deep learning module (105), wherein said deep learning module (105) is based on a condition-aware attention-gated three-dimensional UNet model and comprises:
 an encoder module (105A) comprising residual convolution blocks for feature extraction;
 a decoder module (105B) comprising upsampling operations and skip connections for reconstructing spatial dimensions; and
 an attention module (105C) integrated between the encoder and decoder for modulating skip connection features;
• at least one output module (106) operatively coupled to the processing module (104), said output module (106) configured to:
 project a final decoded feature map into a single-channel probability map by applying a one-by-one convolution followed by sigmoid activation;
 apply a threshold to the probability map to generate a binary segmentation mask representing pancreatic tumor regions; and
 output the binary segmentation mask to a display device, graphical user interface or storage medium within or operatively coupled to the computing device (D),
wherein the system is implemented on a computing device (D) comprising at least one processor, at least one memory and at least one non-transitory storage medium configured to execute each of the modules (101) to (106) of the system (100).
2. The system (100) as claimed in claim 1, wherein the computing device (D) further comprises a graphical processing unit (GPU), a display interface and a graphical user interface configured to display the generated binary segmentation mask together with the original computed tomography image and a communication interface configured to transmit or receive image data or segmentation results to or from an external device, server or imaging system.

3. The system (100) as claimed in claim 1, wherein the attention module (105C) in the skip connections is configured to dynamically adapt to patient-specific features to enable condition-specific segmentation of pancreatic ductal adenocarcinoma based on the four-channel synthetic image volume thereby emphasizing tumor relevant regions and suppressing background features before merging with decoder features.
4. The system (100) as claimed in claim 1, wherein the residual convolution blocks comprise two three-dimensional convolutional layers, normalization layers, ReLU activations and residual identity connections.

5. The system (100) as claimed in claim 1, wherein the decoder module (105B) performs trilinear upsampling followed by three-dimensional convolution to reconstruct spatial dimensions.

6. The system (100) as claimed in claim 1, wherein the preprocessing module (102) resizes image volumes using bilinear interpolation for computed tomography inputs and nearest-neighbor interpolation for segmentation masks.

7. The system as claimed in claim 1, wherein the pre-processing module (102) normalizes voxel intensities to a standardized range prior to resizing.

8. The system (100) as claimed in claim 1, wherein the pre-processing module (102) resizes each image and mask to a common three-dimensional voxel dimension.

9. The system (100) as claimed in claim 1, wherein the synthetic modality generation module (103) is configured to generate all four channels from the original computed tomography image volume of the patient.

10. The system (100) as claimed in any of the preceding system claims, wherein successive down-sampling stages within the encoder module progressively reduce spatial resolution while increasing the number of feature channels in the encoded representation.

11. A computer implemented method for segmenting pancreatic tumors in volumetric computed tomography images, the method comprising the steps of:
• receiving a computed tomography image volume via an input module (101);
• preprocessing the image volume using a preprocessing module (102) by:
 normalizing intensity values;
 resizing the volume to a fixed voxel dimension; and
 applying augmentation techniques including random flipping along different axes and Gaussian noise injection;

• generating a four-channel synthetic image volume using a synthetic modality generation module (103) by:
 retaining the original image as a first modality;
 applying histogram equalization to create a second modality;
 applying Gaussian blur to create a third modality;
 performing voxel intensity thresholding to retain the top twenty percent of voxels as a fourth modality;

• processing the synthetic image volume using a processing module (104), said processing module (104) being executed by a processor and comprising a deep learning module (105) based on a condition-aware attention-gated three-dimensional UNet model, by:
 extracting features using an encoder module (105A) with residual convolution blocks;
 modulating skip connections using an attention module (105C) integrated between the encoder module (105A) and decoder module (105B);
 reconstructing spatial dimensions using a decoder module (105B) with upsampling operations and skip connections;
• generating a single-channel probability map using a one-by-one convolution followed by sigmoid activation;
• applying a thresholding operation to generate a binary segmentation mask representing tumor regions;
• exporting the segmentation mask using an output module (106) in a format compatible with clinical imaging systems to a display device, graphical user interface or non-transitory storage medium operatively coupled to the computing device (D),
wherein said method is executed on a computing device (D) comprising at least one processor, at least one memory and at least one non-transitory storage medium or a radiology workstation operatively coupled to Hospital Information Systems (HIS) or Picture Archiving and Communication Systems (PACS) via a communication interface configured for automatic and robust segmentation of pancreatic tumors from computed tomography images.

12. The method as claimed in claim 11, wherein the deep learning module (105) is trained using a hybrid loss function that combines Focal Tversky Loss and Dice Loss and employs automatic mixed precision, gradient clipping and an adaptive learning-rate schedule.

Documents

Application Documents

# Name Date
1 202541067838-STATEMENT OF UNDERTAKING (FORM 3) [16-07-2025(online)].pdf 2025-07-16
2 202541067838-FORM FOR SMALL ENTITY(FORM-28) [16-07-2025(online)].pdf 2025-07-16
3 202541067838-FORM 1 [16-07-2025(online)].pdf 2025-07-16
4 202541067838-FIGURE OF ABSTRACT [16-07-2025(online)].pdf 2025-07-16
5 202541067838-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [16-07-2025(online)].pdf 2025-07-16
6 202541067838-EVIDENCE FOR REGISTRATION UNDER SSI [16-07-2025(online)].pdf 2025-07-16
7 202541067838-EDUCATIONAL INSTITUTION(S) [16-07-2025(online)].pdf 2025-07-16
8 202541067838-DRAWINGS [16-07-2025(online)].pdf 2025-07-16
9 202541067838-DECLARATION OF INVENTORSHIP (FORM 5) [16-07-2025(online)].pdf 2025-07-16
10 202541067838-COMPLETE SPECIFICATION [16-07-2025(online)].pdf 2025-07-16
11 202541067838-FORM-9 [30-07-2025(online)].pdf 2025-07-30
12 202541067838-FORM 18 [30-07-2025(online)].pdf 2025-07-30
13 202541067838-Proof of Right [13-10-2025(online)].pdf 2025-10-13
14 202541067838-FORM-26 [13-10-2025(online)].pdf 2025-10-13