Abstract: The present invention discloses an integrated deep learning framework for automatic identification, layered segmentation and staging of chronic kidney disease from high-resolution magnetic resonance imaging scans. The system incorporates a volumetric convolutional encoder interleaved with spatial and channel attention modules, and a symmetric U-Net decoder to generate multi-class probability maps for the renal cortex, medulla, pelvis and cysts. A parallel classification head at the network bottleneck assigns disease stages 1–5. Preprocessing routines standardize voxel intensities and correct imaging artifacts, while post-processing refines segmentation boundaries. A unified loss function combining Dice and cross-entropy terms optimizes segmentation and staging in a single training pipeline. Containerized deployment and a web-based interface enable scalable integration within clinical workflows, enhancing early CKD detection and decision support. Accompanying Drawing: [Fig. 1]
Description: [001] The present invention relates to the field of medical image analysis and diagnostic systems, and more particularly to a neural network–based framework for automated processing of volumetric magnetic resonance imaging scans of the kidney. This system is configured to integrate preprocessing of DICOM format data, post-processing of segmentation maps, and seamless deployment within clinical radiology workflows under standard communication protocols.
BACKGROUND OF THE INVENTION
[002] Background description includes information that may be useful in understanding the present disclosure. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed disclosure, or that any publication specifically or implicitly referenced is prior art.
[003] The human kidney is a highly vascularized organ responsible for filtration, reabsorption and excretion functions critical to homeostasis. Chronic deterioration of renal parenchyma leads to progressive loss of function, often remaining clinically silent until advanced stages. Magnetic resonance imaging (MRI) of the kidney offers high soft tissue contrast and the ability to visualize cortical thickness, medullary architecture and cystic lesions without ionizing radiation. Accurate delineation of renal substructures and early detection of subtle morphological changes are therefore essential for timely intervention and improved patient outcomes under nephrology care.
[004] Advances in medical imaging have produced numerous computational approaches for renal analysis. Traditional image-processing pipelines rely on handcrafted filters, thresholding and region-growing algorithms to segment gross kidney outlines. While such techniques can isolate full organ boundaries, they typically fail to discriminate fine structures such as the cortex, medulla and early fibrotic zones. Moreover, manual or semi-automated segmentation remains laborious and subject to inter-operator variability, limiting consistency across longitudinal studies and multi-center trials.
[005] Several volumetric segmentation frameworks have been proposed to address these challenges. U Net architectures employing encoder–decoder schemes have demonstrated improved pixel wise accuracy in biomedical imaging tasks, including liver, brain and prostate segmentation. Extensions incorporating three dimensional convolutions permit inter slice continuity, yet standard U Nets often struggle with small or low‐contrast lesions typical of early kidney disease. Similarly, feature‐extracting convolutional backbones—such as ResNet variants—can capture hierarchical textural cues but tend to lose spatial resolution without careful skip‐connection design. Other efforts have introduced attention gates to highlight regions of interest, but many implementations remain limited to two dimensional slices or impose excessive computational overhead, constraining real‐time clinical applicability.
[006] Despite these developments, existing solutions exhibit notable shortcomings. Handcrafted methods lack robustness to anatomical variability and MRI artifacts. Pure U Net or convolutional encoder systems either under segment subtle pathological regions or produce fragmented outputs when attempting multi class delineation. Attention based enhancements often operate at a single stage, failing to integrate both spatial and channel information throughout the network hierarchy. Moreover, few models deliver simultaneous disease‐stage classification alongside segmentation, necessitating separate pipelines and hindering streamlined workflow integration.
[007] The present invention overcomes these limitations by introducing an end to end framework that synergizes volumetric convolutional encoding, symmetric decoding and interleaved attention modules at every hierarchy level. Spatial and channel attention maps recalibrate features in situ, preserving critical diagnostic cues for layered segmentation of cortex, medulla, pelvis and cystic structures.
SUMMARY OF THE INVENTION
[008] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
[009] The present invention provides a unified deep‐learning architecture for fully automated detection, layered segmentation and staging of chronic kidney disease from volumetric magnetic resonance imaging data. The system comprises a preprocessing module that ingests DICOM‐formatted MRI volumes, corrects intensity nonuniformities and normalizes voxel size to an isotropic grid. A four‐stage volumetric convolutional encoder extracts hierarchical features, each stage augmented by spatial and channel attention modules that amplify diagnostically relevant regions. These encoded features are passed via high‐dimensional skip connections to a symmetric decoder employing transposed convolutions and cascaded 3×3×3 convolutions, yielding multi‐class probability maps for the renal cortex, medulla, pelvis and cystic structures. A parallel classification head at the encoder–decoder nexus performs global pooling and fully connected analysis to assign disease stages 1–5. End‐to‐end training uses a composite loss combining per‐class Dice segmentation loss and categorical cross‐entropy for staging, with on‐the‐fly data augmentation to ensure robust generalization.
[010] The invention further includes post‐processing routines to eliminate spurious regions, smooth boundaries and overlay refined segmentations on original slices for clinician validation. A containerized deployment framework and web‐based dashboard enable secure, scalable integration within hospital workflows, interfacing via DICOM and RESTful APIs. Real‐time inference on GPU‐accelerated servers delivers segmentation and staging results in under three seconds per volume, facilitating early diagnosis and longitudinal monitoring of renal health. Modular design permits substitution of encoder or decoder backbones and expansion to multi‐modal inputs, ensuring adaptability to emerging imaging modalities and diverse clinical settings.
BRIEF DESCRIPTION OF DRAWINGS
[011] The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in, and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure, and together with the description, serve to explain the principles of the present disclosure.
[012] In the figures, similar components, and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
[013] Fig. 1 illustrates a working block flowchart of an attention-augmented CNN U-Net system for chronic kidney disease detection and segmentation from MRI scans, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[014] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit, and scope of the present disclosure as defined by the appended claims.
[015] In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent to one skilled in the art that embodiments of the present invention may be practiced without some of these specific details.
[016] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail to avoid obscuring the embodiments.
[017] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[018] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.
[019] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[020] Referring to Fig. 1, the Attention-Augmented CNN U-Net system disclosed herein is an integrated deep learning framework specifically engineered for the automated identification, layered segmentation, and staging of Chronic Kidney Disease (CKD) from high-resolution magnetic resonance imaging (MRI) scans. Unlike conventional single-architecture methods, this invention synergistically couples a feature-extracting convolutional neural network (CNN) backbone with a U-Net based decoder, enriched by interleaved attention modules. The resulting architecture exhibits superior sensitivity to subtle textural and morphological variations in renal tissues, thereby facilitating early detection of CKD manifestations such as cortical thinning, medullary fibrosis, and cystic formations.
[021] The system’s primary hardware configuration comprises a high-throughput GPU server equipped with at least 16 GB of dedicated memory, a central processing unit (CPU) cluster for pre- and post-processing tasks, and secure data storage arrays for acquired MRI volumes. MRI images are transferred via the Digital Imaging and Communications in Medicine (DICOM) protocol to the CPU cluster, where preprocessing pipelines standardize voxel intensity, correct magnetic field inhomogeneities, and normalize spatial resolution to 1 mm³ isotropic voxels. Thereafter, preprocessed volumes are streamed to the GPU server for deep feature extraction by the CNN encoder.
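By way of illustration only, the intensity normalization and isotropic resampling steps described above can be sketched as follows; z-score normalization and trilinear interpolation are assumptions, as the specification does not mandate particular algorithms:

```python
import numpy as np
from scipy import ndimage

def normalize_intensity(volume: np.ndarray) -> np.ndarray:
    """Z-score normalize voxel intensities; a simple stand-in for the
    full bias-field-corrected preprocessing pipeline described above."""
    return (volume - volume.mean()) / (volume.std() + 1e-8)

def resample_isotropic(volume: np.ndarray, spacing_mm, target_mm=1.0) -> np.ndarray:
    """Resample a volume to an isotropic grid (default 1 mm^3) using
    trilinear interpolation; spacing_mm is the per-axis source spacing."""
    factors = [s / target_mm for s in spacing_mm]
    return ndimage.zoom(volume, zoom=factors, order=1)
```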
[022] The CNN encoder is arranged in four hierarchical blocks, each block comprising two convolutional layers followed by batch normalization and rectified linear unit (ReLU) activations. Each convolution employs a 3×3×3 kernel to extract volumetric features, with down sampling performed via max pooling layers of stride two. The choice of 3D convolutions preserves inter slice continuity and ensures spatial coherence across axial, coronal, and sagittal planes. Outputs from each block are forwarded to both the subsequent block and to corresponding levels of the U Net decoder through high dimensional skip connections.
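A minimal PyTorch sketch of one such encoder block follows; the channel widths and padding are illustrative assumptions, while the two-convolution structure, batch normalization, ReLU activations and stride-two max pooling follow the description above:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One hierarchical encoder block: two 3x3x3 convolutions, each
    followed by batch normalization and ReLU, then stride-2 max pooling.
    Returns both the pooled output (for the next block) and the
    pre-pooling features (forwarded to the decoder via skip connection)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool3d(kernel_size=2, stride=2)

    def forward(self, x):
        skip = self.features(x)
        return self.pool(skip), skip
```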
[023] Nested within each encoder block is an attention module that computes spatial and channel attention maps. Spatial attention is achieved by applying a 1×1×1 convolution followed by softmax normalization over spatial dimensions, accentuating voxels with diagnostic relevance. Channel attention employs global average pooling and a two layer fully connected network to recalibrate channel weights, thus emphasizing feature maps indicative of pathological alterations. The combined attention map multiplies element wise with the encoder output, producing refined feature volumes for subsequent processing.
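The attention module described above may be sketched as follows; the channel-reduction ratio and the sigmoid gating on the channel branch are assumptions not fixed by the specification, whereas the 1×1×1 spatial convolution with softmax over spatial positions, the global average pooling, and the element-wise application of the combined map follow the text:

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Spatial + channel attention, applied element-wise to the encoder
    output. Reduction ratio and sigmoid gating are illustrative choices."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.spatial_conv = nn.Conv3d(channels, 1, kernel_size=1)
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, d, h, w = x.shape
        # Spatial attention: 1x1x1 convolution, softmax over all voxels.
        s = self.spatial_conv(x).view(n, 1, -1)
        s = torch.softmax(s, dim=-1).view(n, 1, d, h, w)
        # Channel attention: global average pooling + two-layer FC recalibration.
        ch = self.channel_fc(x.mean(dim=(2, 3, 4))).view(n, c, 1, 1, 1)
        # Combined attention map multiplies element-wise with the features.
        return x * s * ch
```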
[024] The decoder path of the U Net is symmetrically organized with four up sampling blocks. Each block performs 2×2×2 transposed convolution to double the spatial dimensions, followed by concatenation with the attention augmented feature volumes from the encoder’s corresponding level. Two 3×3×3 convolutions then integrate the concatenated features, restoring fine grained spatial details for pixel precise segmentation. The final decoder block outputs a multi channel probability map representing the cortex, medulla, renal pelvis, cysts, and background.
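One up-sampling block of the decoder may be sketched as follows; channel counts are illustrative, while the 2×2×2 transposed convolution, skip concatenation and two 3×3×3 convolutions follow the description above:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder block: 2x2x2 transposed convolution doubling each
    spatial dimension, concatenation with the attention-augmented skip
    features, then two 3x3x3 convolutions."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2)
        self.convs = nn.Sequential(
            nn.Conv3d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)  # spatial dimensions doubled
        return self.convs(torch.cat([x, skip], dim=1))
```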
[025] In order to stage CKD, a parallel classification head is appended at the bottleneck of the U Net. This head comprises global average pooling of the bottleneck features, a dense layer of 512 neurons with ReLU activation, a dropout layer of 0.4 to mitigate overfitting, and a softmax output layer with five units corresponding to CKD stages 1 through 5. The classification head shares intermediate representations with the segmentation pathway, forming a multi task learning paradigm that jointly optimizes both objectives.
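A sketch of the classification head, following the stated global average pooling, 512-neuron dense layer with ReLU, dropout of 0.4, and five-unit softmax output:

```python
import torch
import torch.nn as nn

class StagingHead(nn.Module):
    """Parallel classification head at the U-Net bottleneck: global
    average pooling, a 512-unit dense layer with ReLU, dropout of 0.4,
    and a 5-way softmax over CKD stages 1-5."""
    def __init__(self, bottleneck_ch: int, n_stages: int = 5):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(bottleneck_ch, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.4),
            nn.Linear(512, n_stages),
        )

    def forward(self, x):
        pooled = x.mean(dim=(2, 3, 4))  # global average pooling over D, H, W
        return torch.softmax(self.fc(pooled), dim=1)
```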
[026] The end to end differentiable architecture is trained on a curated dataset of 1,200 MRI volumes, balanced across five CKD stages and healthy controls. Input volumes are cropped to 128×128×64 voxels centered on kidney regions, as determined by a preliminary region proposal network. Data augmentation techniques—including random rotations (±15°), elastic deformations, and intensity scaling (±10%)—are applied on the fly to enhance model generalizability.
[027] The loss function L_total is defined as:

L_total = α·L_seg + β·L_cls

[028] where L_seg is the sum of the pixel-wise Dice losses over the segmentation classes, L_cls is the categorical cross-entropy loss for CKD staging, and α and β are empirically determined weighting coefficients (0.7 and 0.3, respectively) chosen to balance segmentation accuracy against classification performance.
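Under the stated weights, the composite loss can be sketched in PyTorch as follows; the tensor shapes, soft-Dice formulation and softmax placement are assumptions, as the specification fixes only the weighted Dice-plus-cross-entropy structure:

```python
import torch
import torch.nn.functional as F

def dice_loss(probs, target_onehot, eps=1e-6):
    """Soft Dice loss summed over segmentation classes."""
    dims = (0, 2, 3, 4)  # sum over batch and spatial dimensions
    inter = (probs * target_onehot).sum(dims)
    denom = probs.sum(dims) + target_onehot.sum(dims)
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).sum()

def total_loss(seg_logits, seg_target, cls_logits, stage_target,
               alpha=0.7, beta=0.3):
    """L_total = alpha * L_seg + beta * L_cls, with the empirically
    chosen weights 0.7 and 0.3 from the description."""
    probs = torch.softmax(seg_logits, dim=1)
    onehot = F.one_hot(seg_target, num_classes=seg_logits.shape[1])
    onehot = onehot.permute(0, 4, 1, 2, 3).float()
    l_seg = dice_loss(probs, onehot)
    l_cls = F.cross_entropy(cls_logits, stage_target)
    return alpha * l_seg + beta * l_cls
```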
[029] Table 1 presents the averaged performance metrics obtained during five fold cross validation:
Table 1: Segmentation and classification performance of the Attention Augmented CNN U Net system.
[030] The interconnection of components is realized through a unified software stack developed in Python using PyTorch. The MRI loader module interfaces with the DICOM network to ingest images, the preprocessing module standardizes and normalizes data, and the PyTorch DataLoader orchestrates batched inputs to the GPU. The CNN encoder and U Net decoder layers are defined within a single model class, ensuring shared memory for skip connections. Attention modules are implemented as custom PyTorch nn.Modules, seamlessly integrated via forward hooks at designated encoder layers.
[031] Example 1 (Embodiment A) details the application of the system to a patient cohort with stage 3 CKD exhibiting medullary fibrosis. The system successfully segmented the fibrotic regions with a Dice coefficient of 0.876 and correctly classified the cohort with 95% accuracy. Qualitative analysis revealed crisp delineation of fibrotic boundaries, as confirmed by expert radiologists.
[032] Example 2 (Embodiment B) concerns a pediatric dataset with congenital kidney anomalies. Despite the atypical morphology, the attention augmented architecture adapted to highlight abnormal regions, achieving a mean Dice score of 0.889 across all classes and stage classification accuracy of 91.7%. This demonstrates the system’s robustness to anatomical variability.
[033] To further substantiate efficacy, Table 2 compares the proposed system against baseline architectures:
Table 2: Comparative performance of model variants.
[034] The integration of attention mechanisms resulted in a 3.0% improvement in classification accuracy and a 1.2% increase in overall Dice coefficient relative to the CNN U Net without attention, at the cost of a marginal 0.2 s increase in inference time.
[035] Example 3 (Embodiment C) illustrates cloud deployment via containerization. The entire software stack is encapsulated within Docker images and orchestrated using Kubernetes, enabling scalable processing of MRI batches submitted through a secure RESTful API. Latency was measured at an average of 2.5 s per volume for segmentation and staging on a GPU enabled node in a public cloud environment.
[036] Post processing routines refine segmentation maps: small isolated regions below 50 voxels are eliminated through connected component analysis; morphological closing operations smooth object boundaries; and soft tissue filters suppress spurious noise. The refined maps are color overlaid on the original MR slices for clinician review.
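The connected-component filtering and morphological closing described above can be sketched with SciPy; the 3×3×3 structuring element is an assumption, while the 50-voxel minimum size follows the text:

```python
import numpy as np
from scipy import ndimage

def refine_mask(mask: np.ndarray, min_voxels: int = 50) -> np.ndarray:
    """Eliminate connected components smaller than min_voxels, then
    apply morphological closing to smooth object boundaries."""
    labeled, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))
    keep = np.zeros_like(mask, dtype=bool)
    for i, size in enumerate(sizes, start=1):
        if size >= min_voxels:
            keep |= labeled == i
    return ndimage.binary_closing(keep, structure=np.ones((3, 3, 3)))
```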
[037] The user interface (UI) is a web based dashboard developed in React.js, presenting axial, coronal, and sagittal views with synchronized cursors. Segmented layers are toggled via checkboxes, and staging results are displayed alongside quantitative metrics such as cortical thickness and cyst volume.
[038] Example 4 (Embodiment D) demonstrates integration with hospital PACS. Using the DICOMweb standard, the system automatically retrieves scheduled MRI studies, processes them overnight, and pushes segmentation reports back into the radiology workflow. This automation reduced reporting turnaround time by 28% in a pilot study at a tertiary care hospital.
[039] The security framework employs end to end encryption (TLS 1.3) for data in transit and AES 256 encryption for data at rest. Role based access control (RBAC) ensures only authorized personnel can access patient data, adhering to HIPAA and local data protection regulations.
[040] Scalability is addressed via model parallelism: large volumes can be partitioned across multiple GPUs, with inter GPU communication handled by NVIDIA’s NCCL library. Dynamic load balancing ensures optimal utilization of computational resources.
[041] Continuous learning is facilitated by a federated learning module. Anonymized model updates from participating hospitals are aggregated on a central server, updating the global model without sharing raw imaging data, thus preserving patient privacy.
[042] Embodiment E pertains to real time intraoperative use. A GPU equipped mobile cart stationed in the operating room streams MRI slices directly from the scanner, processes them within 5 s, and projects segmentation overlays onto augmented reality goggles worn by the surgeon, enhancing intraoperative anatomical guidance.
[043] The modular design allows substitution of the CNN backbone with more advanced architectures (e.g., ResNet, DenseNet) and the U Net decoder with nested U Nets or transformer based decoders, without altering the attention integration strategy.
[044] In another embodiment, multi modal fusion is enabled by extending the input channels to incorporate computed tomography (CT) scans and ultrasound images, concatenated with MRI feature maps at the encoder stage, yielding superior diagnostic accuracy for complex renal pathologies.
[045] To validate generalizability, the system was tested on an external dataset from a different imaging vendor. The performance drop was limited to 1.5% in mean Dice, underscoring the model’s resilience to scanner variations when coupled with histogram equalization preprocessing.
[046] The system supports longitudinal analysis by registering baseline and follow up scans using a deformable registration module. Changes in segmentation volumes across time points are quantified to objectively measure disease progression or response to therapy.
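The volume-change quantification may be sketched as follows, assuming the two segmentation masks have already been co-registered by the deformable registration module; the function name and the isotropic voxel-volume parameter are illustrative:

```python
import numpy as np

def volume_change_percent(baseline_mask: np.ndarray,
                          followup_mask: np.ndarray,
                          voxel_volume_mm3: float = 1.0) -> float:
    """Percent change in a segmented structure's volume between two
    co-registered time points (negative values indicate atrophy)."""
    v0 = baseline_mask.sum() * voxel_volume_mm3
    v1 = followup_mask.sum() * voxel_volume_mm3
    return 100.0 * (v1 - v0) / v0
```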
[047] Example 5 (Embodiment F) involves quantitative monitoring of renal atrophy in patients receiving nephrotoxic chemotherapy. The system detected a 12% reduction in cortical volume after three treatment cycles, correlating with laboratory markers of renal function.
[048] The invention’s software architecture adheres to international standards, employing the Model View Controller (MVC) pattern for maintainability, automated unit tests with >90% code coverage, and continuous integration pipelines for rapid deployment of updates.
[049] A fail safe mechanism is included to flag cases where confidence scores fall below a threshold (0.6), automatically routing such studies to manual review by radiologists to prevent misdiagnosis.
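The fail-safe mechanism amounts to a simple threshold check on the staging confidence; the function and label names are illustrative, while the 0.6 threshold follows the text:

```python
def route_study(confidence: float, threshold: float = 0.6) -> str:
    """Route a study to manual radiologist review when the model's
    confidence score falls below the fail-safe threshold."""
    return "manual_review" if confidence < threshold else "auto_report"
```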
[050] This invention’s comprehensive approach—from data ingestion through attention enhanced segmentation to multi stage classification and continuous learning—embodies a transformative diagnostic tool for CKD, poised to improve patient outcomes through early detection, precise staging, and seamless clinical integration.
[051] The system’s adaptability allows application to other organ systems and pathologies, such as liver fibrosis or lung nodules, by retraining with organ specific imaging datasets, leveraging the same attention augmented CNN U Net framework.
[052] The foregoing embodiments demonstrate the invention’s novel combination of deep learning architectures, attention mechanisms, and clinical workflow integration, delivering an unprecedented level of accuracy, efficiency, and clinical utility in CKD detection and management.
Claims:
1. An integrated deep learning system for automated detection, layered segmentation and staging of chronic kidney disease from magnetic resonance imaging scans, comprising:
a) a preprocessing module configured to receive DICOM format MRI volumes and perform intensity normalization, field inhomogeneity correction and spatial resampling to isotropic voxels;
b) a feature extracting convolutional neural network encoder having a plurality of hierarchical 3D convolutional blocks, each block including convolution layers, batch normalization, ReLU activation and max pooling;
c) attention modules interleaved within each encoder block, each attention module computing spatial attention via 1×1×1 convolution and softmax over spatial dimensions and channel attention via global average pooling and a two layer fully connected recalibration network, and applying the combined attention map element wise to the encoder output;
d) a U Net decoder path comprising a plurality of up sampling blocks, each block performing transposed convolution to restore spatial dimensions, concatenation with corresponding attention augmented encoder features via skip connections, and successive 3×3×3 convolution layers for pixel wise segmentation;
e) a multi channel output layer configured to generate a probability map for renal cortex, medulla, renal pelvis, cysts and background;
f) a classification head appended at the encoder decoder bottleneck, including global average pooling of bottleneck features, a fully connected layer, a dropout layer and a softmax output layer to assign a CKD stage from 1 to 5; and
g) a training regimen employing a combined loss function of multi class Dice loss for segmentation and categorical cross entropy for staging, optimized in an end to end differentiable framework.
2. The system as claimed in claim 1, wherein the preprocessing module employs a region proposal network to crop each input volume to 128×128×64 voxels centered on the kidneys prior to encoding.
3. The system as claimed in claim 1, wherein each 3D convolutional block comprises two successive convolutions with 3×3×3 kernels, each followed by batch normalization and ReLU, and a max pooling layer of stride two.
4. The system as claimed in claim 1, wherein the attention modules utilize softmax normalization for spatial attention and a two layer fully connected network with ReLU intermediate activation for channel recalibration.
5. The system as claimed in claim 1, wherein the classification head comprises a dense layer of 512 neurons, a dropout rate of 0.4, and a softmax layer with five output units corresponding to CKD stages 1–5.
6. The system as claimed in claim 1, further comprising an on the fly data augmentation pipeline applying random rotations up to ±15°, elastic deformations and intensity scaling up to ±10% during training.
7. The system as claimed in claim 1, wherein the combined loss function L_total is defined as L_total = 0.7·L_seg + 0.3·L_cls, L_seg being the sum of per class Dice losses and L_cls being the categorical cross entropy loss.
8. The system as claimed in claim 1, wherein post processing routines eliminate isolated regions below 50 voxels, perform morphological closing on segmentation boundaries, and overlay refined segmentation maps on original MRI slices for clinician review.
9. The system as claimed in claim 1, further comprising a web based user interface presenting synchronized axial, coronal and sagittal views, toggles for segmented layers and quantitative metrics including cortical thickness and cyst volume.
10. The system as claimed in claim 1, wherein the system is containerized in Docker images and orchestrated via Kubernetes to enable scalable batch processing of MRI volumes through a secure RESTful API.
| # | Name | Date |
|---|---|---|
| 1 | 202541047538-STATEMENT OF UNDERTAKING (FORM 3) [16-05-2025(online)].pdf | 2025-05-16 |
| 2 | 202541047538-REQUEST FOR EARLY PUBLICATION(FORM-9) [16-05-2025(online)].pdf | 2025-05-16 |
| 3 | 202541047538-FORM-9 [16-05-2025(online)].pdf | 2025-05-16 |
| 4 | 202541047538-FORM 1 [16-05-2025(online)].pdf | 2025-05-16 |
| 5 | 202541047538-DRAWINGS [16-05-2025(online)].pdf | 2025-05-16 |
| 6 | 202541047538-DECLARATION OF INVENTORSHIP (FORM 5) [16-05-2025(online)].pdf | 2025-05-16 |
| 7 | 202541047538-COMPLETE SPECIFICATION [16-05-2025(online)].pdf | 2025-05-16 |