
A Transformer Augmented System Of Efficient Unet++ Architecture For Advanced Medical Image Denoising And Diagnostic Enhancement

Abstract: The invention relates to a medical image denoising system and method integrating transformer encoders with an Efficient UNet++ architecture to achieve high-fidelity diagnostic enhancement. The system includes a preprocessing unit for noise isolation, efficient encoder blocks for multi-resolution feature extraction, transformer modules for global contextual awareness, and a UNet++ skip decoder for structural preservation. Channel attention is incorporated through squeeze-and-excitation units, while deep supervision ensures stability and interpretability. The output includes a denoised medical image and an optional uncertainty heat map to aid clinician trust. Unlike conventional denoising approaches, the invention maintains edge details, adapts across imaging modalities, and operates efficiently in real-time clinical workflows. The invention is suitable for integration into PACS infrastructure, enabling artifact-free and diagnostically reliable imaging for MRI, CT, X-ray, mammogram, and PET modalities.


Patent Information

Application #
Filing Date
22 September 2025
Publication Number
43/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

SR UNIVERSITY
ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Inventors

1. NAGARAJU PANAGNATI
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA
2. SREEDHAR KOLLEM
SR UNIVERSITY, ANANTHSAGAR, HASANPARTHY (M), WARANGAL URBAN, TELANGANA - 506371, INDIA

Specification

Description:
FIELD OF THE INVENTION
The present invention relates to the field of medical imaging and diagnostic enhancement. More specifically, it concerns a transformer-augmented Efficient UNet++ based system and method for advanced medical image denoising. The invention integrates convolutional and transformer-based architectures with efficient attention modules and deep supervision to achieve artifact-free, edge-preserving, and modality-agnostic denoising suitable for clinical workflows.
BACKGROUND OF THE INVENTION
Medical imaging plays a critical role in diagnosis, treatment planning, and disease monitoring. However, noise artifacts introduced during acquisition—due to low-dose protocols, hardware limitations, or patient motion—significantly degrade image quality in modalities like MRI, CT, X-rays, and mammograms. This noise compromises diagnostic precision, obscures subtle pathology, and leads to potential misinterpretations.
Traditional denoising methods (e.g., median filtering, anisotropic diffusion, or wavelet thresholding) and early deep learning approaches (CNNs, autoencoders) suffer from the following limitations:
Over-smoothing of fine anatomical structures and edges.
Poor generalization across imaging modalities or institutions.
Inability to separate structured noise from diagnostic texture.
Lack of global context awareness for contextual denoising.
Absence of model interpretability or feedback for clinical trust.
Despite advancements, most current models (e.g., U-Net, GAN-based denoisers) lack the ability to integrate both local detail preservation and global contextual understanding, essential for reliable clinical decision-making. Thus, a next-generation solution is urgently needed to provide artifact-free, edge-preserving, and modality-adaptable denoising for modern diagnostic workflows.
US11207035B2: A framework for sensor-based patient treatment support. In accordance with one aspect, one or more sensors are used to acquire sensor data of one or more objects of interest. The sensor data is then automatically interpreted to generate processing results. One or more actions may be triggered based on the processing results to support treatment of a patient, including supporting medical scanning of the patient.
US11769229B2: A computer-implemented method is provided for improving live video quality. The method comprises: (a) acquiring, using a medical imaging apparatus, a stream of consecutive image frames of a subject; (b) feeding the stream of consecutive image frames to a first set of denoising components, wherein each of the first set of denoising components is configured to denoise an image frame from the stream of consecutive image frames in a spatial domain to output an intermediate image frame; (c) feeding a plurality of the intermediate image frames to a second denoising component, wherein the second denoising component is configured to (i) denoise the plurality of the intermediate image frames in a temporal domain and (ii) generate a weight map; and outputting a final image frame with improved quality in both temporal domain and spatial domain based at least in part on the weight map.
Medical imaging is indispensable for diagnosis and treatment planning. However, images often suffer from noise due to low-dose acquisition protocols, hardware limitations, or patient motion. Existing denoising approaches either smooth out essential anatomical details, lack global contextual awareness, or fail to generalize across imaging modalities. Moreover, current solutions provide limited interpretability for clinicians and cannot consistently preserve subtle pathologies such as lesions or calcifications. The present invention addresses these problems by combining transformer-based global context modeling with efficient convolutional pathways and deep supervision, resulting in reliable, interpretable, and high-quality denoising across diverse medical modalities.
SUMMARY OF THE INVENTION
This summary is provided to introduce a selection of concepts, in a simplified format, that are further described in the detailed description of the invention.
This summary is intended neither to identify key or essential inventive concepts of the invention nor to determine the scope of the invention.
The invention proposes MedCleanNet++, a next-generation medical image denoising system that combines a transformer-augmented encoder, Efficient UNet++ architecture, channel attention through squeeze-and-excitation blocks, and deep supervision for enhanced reliability. The architecture ensures both global contextual understanding and fine-grained structural preservation.
The system processes raw medical images through a preprocessing stage, optionally applying wavelet transforms to isolate noise-frequency features. Features are then extracted using efficient encoder blocks, followed by transformer modules that capture long-range spatial dependencies. Nested skip connections in the UNet++ decoder allow effective feature fusion, while channel attention modules emphasize diagnostically relevant structures. Deep supervision stabilizes training and ensures interpretability, while the output layer provides both denoised images and optional uncertainty maps.
This approach improves adaptability across imaging modalities, reduces computational cost, and enhances diagnostic usability in real-time workflows. The novelty lies in the integrated use of transformer attention within an Efficient UNet++ framework, reinforced by deep clinical optimization.
To further clarify the advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The illustrated embodiments of the subject matter will be understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and methods that are consistent with the subject matter as claimed herein, wherein:
Figure 1: System block diagram of MedCleanNet++
Figure 2: Architecture showing Transformer-UNet++ integration
Figure 3: Output comparison (noisy vs. denoised vs. ground truth)
The figures depict embodiments of the present subject matter for the purposes of illustration only. A person skilled in the art will easily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.
DETAILED DESCRIPTION OF THE INVENTION
Various exemplary embodiments of the disclosure are described herein with reference to the accompanying drawings. It should be noted that the embodiments are described herein in such detail as to clearly communicate the disclosure. However, the amount of detail provided herein is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
It is also to be understood that various arrangements may be devised that, although not explicitly described or shown herein, embody the principles of the present disclosure. Moreover, all statements herein reciting principles, aspects, and embodiments of the present disclosure, as well as specific examples, are intended to encompass equivalents thereof.
The terminology used herein is for the purpose of describing embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
In addition, the descriptions of “first”, “second”, “third”, and the like in the present invention are used for the purpose of description only, and are not to be construed as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. Thus, features defining “first” and “second” may include at least one of the features, either explicitly or implicitly.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The invention introduces MedCleanNet++, a next-generation Transformer-enhanced Efficient UNet++ model designed for precision medical image denoising. The architecture integrates:
• Transformer encoders for global contextual learning,
• Dense skip pathways from UNet++ for feature reuse and resolution preservation,
• EfficientNet-based encoder blocks for parameter efficiency,
• Squeeze-and-Excitation (SE) blocks for channel-wise attention,
• Deep supervision layers for stabilized and interpretable training,
• Optional uncertainty maps for clinician trust and risk visualization.
Architecture Workflow:
1. Input Preprocessing:
o Raw medical images (MRI, X-ray, CT) are normalized and optionally passed through a wavelet transform to isolate noise-frequency features.
2. Efficient Encoder (EfficientNet Blocks):
o Extract hierarchical features using mobile inverted bottlenecks and depthwise convolutions.
3. Transformer Encoder Module:
o Applies self-attention mechanisms using Axial Transformer or Swin Transformer layers to learn spatially aware, long-range dependencies.
o Helps distinguish contextual noise from texture patterns.
4. UNet++ Skip Decoder:
o Employs densely nested skip connections to facilitate feature fusion across different resolutions.
o Maintains structural integrity and anatomical detail.
5. SE Attention Units:
o Channel-wise recalibration improves sensitivity to diagnostically relevant features (e.g., calcifications, lesions).
6. Deep Supervision:
o Intermediate predictions are supervised to guide the network at multiple depths, improving convergence and reliability.
7. Output:
o Final denoised image,
o Optional uncertainty heat map for diagnostic confidence.
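The densely nested skip connections of step 4 can be made concrete by listing which nodes feed which. The sketch below follows the node layout of the published UNet++ design, in which node (i, j) sits at resolution level i and position j along the skip pathway; the function name `unetpp_inputs` and the dictionary representation are illustrative assumptions, not part of the specification, and the sketch describes wiring only, not the layers themselves.

```python
def unetpp_inputs(depth):
    """Map each UNet++ node (i, j) to the nodes whose outputs it consumes.

    i is the resolution level (0 = full resolution, depth = coarsest);
    j is the position along the skip pathway (0 = encoder column).
    """
    graph = {}
    for j in range(depth + 1):
        for i in range(depth + 1 - j):
            if j == 0:
                # Encoder column: fed by the next finer level, downsampled.
                graph[(i, 0)] = [(i - 1, 0)] if i > 0 else []
            else:
                # Dense skips from every earlier node at the same level,
                # plus the upsampled output of the node one level below.
                same_level = [(i, k) for k in range(j)]
                from_below = [(i + 1, j - 1)]
                graph[(i, j)] = same_level + from_below
    return graph


graph = unetpp_inputs(depth=4)
final_node_inputs = graph[(0, 4)]  # inputs of the last full-resolution node
```

With `depth=4` the grid has 15 nodes, and the final full-resolution node receives all four earlier full-resolution nodes plus one upsampled node from the level below, which is what "feature fusion across different resolutions" amounts to in wiring terms.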
Implementation and Compatibility:
• Optimized using mixed precision (FP16) for inference efficiency.
• Deployable on GPUs, edge devices, and PACS-integrated workstations.
• Modular design allows fine-tuning per modality and acquisition protocol.
The invention is novel in its integration of global Transformers, Efficient UNet++ structure, and deep clinical optimization. Key innovations include:
• Use of Transformer modules within UNet++ for context-aware denoising.
• Application of SE-blocks and deep supervision for anatomical fidelity and edge preservation.
• Modality-agnostic pipeline adaptable to MRI, CT, X-ray, mammogram, and PET scans.
• Optional integration of wavelet preprocessing and uncertainty estimation, not seen in conventional architectures.
• End-to-end learnable system optimized for real-world clinical images with varying acquisition quality.
The invention introduces a multimodal system for medical image denoising. The system comprises an input unit configured to receive raw medical images obtained from modalities such as MRI, CT, X-ray, mammogram, or PET scans. The input data is normalized and optionally processed through wavelet transforms to isolate and analyze noise components.
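The normalization and optional wavelet step can be illustrated with a minimal sketch. The specification does not name a normalization scheme or a wavelet family, so the min-max scaling and the single-level Haar transform used here are assumptions chosen for brevity; the function names `normalize` and `haar2d` are hypothetical.

```python
def normalize(img):
    """Min-max scale a 2D list of intensities to the range [0, 1]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # constant image: avoid dividing by zero
    return [[(v - lo) / span for v in row] for row in img]


def haar2d(img):
    """Single-level orthonormal 2D Haar transform (even height and width).

    Returns four half-size sub-bands: a local average and three detail
    bands holding left-right, top-bottom, and diagonal differences.
    """
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, len(img), 2):
        rows = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # low-pass average
            rows[1].append((a - b + c - d) / 2)  # left-right difference
            rows[2].append((a + b - c - d) / 2)  # top-bottom difference
            rows[3].append((a - b - c + d) / 2)  # diagonal difference
        for band, row in zip((approx, horiz, vert, diag), rows):
            band.append(row)
    return approx, horiz, vert, diag


image = [[0, 0, 4, 4], [0, 0, 4, 4], [8, 8, 8, 8], [8, 8, 8, 8]]
approx, horiz, vert, diag = haar2d(normalize(image))
```

On this piecewise-constant test image every detail band is zero. Perturbing a single pixel produces non-zero entries in all three detail bands, which is the sense in which the wavelet step separates high-frequency, noise-like content from smooth structure.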
An efficient encoder block, based on mobile inverted bottlenecks and depthwise convolutions, is employed to extract hierarchical features at multiple resolutions. These features are then passed into transformer encoder modules that apply self-attention to capture long-range spatial and contextual dependencies. The use of axial or Swin transformer layers ensures adaptability to high-resolution images without excessive computational demand.
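The self-attention operation at the core of the transformer modules can be sketched on a handful of feature vectors. This is plain scaled dot-product attention with identity query, key, and value projections, which is a simplifying assumption: a trained layer would use learned projections, and axial or Swin layers would additionally restrict which tokens attend to each other. The name `self_attention` is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of equal-length feature vectors (one per image position).
    Returns (outputs, weights), where weights[q][k] is how strongly
    position q attends to position k.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all token vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, tokens))
                        for c in range(d)])
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

The first and third tokens are identical, so the first token attends to both equally and more strongly than to the dissimilar second token, regardless of how far apart the positions are. That distance-independent matching is what "long-range dependencies" refers to.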
The UNet++ based skip-decoder incorporates densely nested connections, allowing fine-grained fusion of features across scales. This design preserves subtle anatomical structures while enhancing feature reuse for efficiency. Channel recalibration is performed using squeeze-and-excitation modules, which strengthen sensitivity toward diagnostically significant image features such as calcifications and lesions.
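Channel recalibration by a squeeze-and-excitation unit reduces to three steps: average each channel to one number, pass those numbers through a small two-layer gate, and rescale each channel by its gate value. The sketch below assumes bias-free layers and takes the weight matrices `w1` and `w2` as arguments; in a trained network they would be learned, and the values used here are purely illustrative.

```python
import math


def se_recalibrate(channels, w1, w2):
    """Squeeze-and-excitation over a list of C two-dimensional feature maps.

    w1: reduction weights, shape (C // r) x C
    w2: expansion weights, shape C x (C // r)
    Returns (rescaled_channels, gates) with one gate in (0, 1) per channel.
    """
    # Squeeze: global average pooling, one scalar per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]
    # Excitation: linear -> ReLU -> linear -> sigmoid.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(channels, gates)]
    return scaled, gates


feature_maps = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
scaled, gates = se_recalibrate(feature_maps, w1=[[1.0, 0.0]], w2=[[2.0], [-2.0]])
```

With these hand-picked weights the first channel is emphasized (gate above 0.5) and the second is suppressed (gate below 0.5), which mirrors how a trained unit would strengthen channels that respond to diagnostically significant features.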
Intermediate feature maps are supervised through deep supervision layers that guide network optimization at multiple depths, thereby improving convergence, stability, and interpretability. The final output module generates a denoised medical image and, where enabled, an uncertainty heat map that indicates regions of diagnostic risk or low confidence.
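Deep supervision is commonly implemented as a weighted sum of losses, one per intermediate prediction, each measured against the same clean target. The specification does not state the loss function or the weighting, so the mean squared error and the equal default weights below are assumptions; `deep_supervision_loss` is a hypothetical name.

```python
def mse(pred, target):
    """Mean squared error between two same-sized 2D lists."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n


def deep_supervision_loss(head_outputs, target, head_weights=None):
    """Weighted sum of per-head losses against one clean target image.

    head_outputs: predictions from intermediate depths, already resized
    to the target resolution. Equal weights are used unless given.
    """
    if head_weights is None:
        head_weights = [1.0 / len(head_outputs)] * len(head_outputs)
    return sum(w * mse(out, target)
               for w, out in zip(head_weights, head_outputs))


clean = [[0.0, 1.0], [1.0, 0.0]]
shallow_head = [[0.5, 0.5], [0.5, 0.5]]  # blurry early prediction
deep_head = [[0.0, 1.0], [1.0, 0.0]]     # exact final prediction
loss = deep_supervision_loss([shallow_head, deep_head], clean)
```

Because every head contributes to the total, gradients reach shallow layers directly instead of only through the full depth of the network, which is the usual explanation for the improved convergence described above.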
The system is optimized for inference efficiency using mixed precision computations, making it deployable in clinical environments including edge devices, GPU workstations, and PACS-integrated hospital systems. Its modular design allows adaptation for modality-specific requirements without retraining the entire architecture.
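The precision cost of FP16 can be checked with the standard library alone. The sketch below rounds a 12-bit intensity ramp, scaled to [0, 1], through IEEE 754 half precision; the 12-bit depth is an assumed example, and this illustrates numeric rounding only, not any framework's mixed-precision machinery.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# A 12-bit intensity ramp scaled to [0, 1] (assumed bit depth).
pixels = [i / 4095 for i in range(4096)]
rounded = [to_fp16(p) for p in pixels]

worst_error = max(abs(p - r) for p, r in zip(pixels, rounded))
distinct_levels = len(set(rounded))
```

Half precision keeps 11 significant bits, so the worst-case rounding error on this ramp is bounded by 2^-12, and some adjacent 12-bit levels near the top of the range collapse to the same FP16 value. This is one reason mixed-precision schemes typically keep precision-sensitive operations in FP32 while running the bulk of the arithmetic in FP16.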
The invention further incorporates methods for ensuring robustness against varying acquisition protocols and noise patterns. The transformer component provides the global context necessary to distinguish between structured diagnostic texture and noise, while the convolutional components preserve fine local details. This hybrid approach results in superior edge preservation and improved clarity of diagnostic features.
The invention achieves computational efficiency by using lightweight encoder blocks and selective attention, reducing runtime overhead compared to pure transformer-based architectures. Its adaptability allows deployment in low-resource environments where hardware constraints exist.
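The efficiency argument can be made quantitative with two simple counts: the weights in a depthwise-separable convolution versus a standard one, and the number of query-key pairs under full, axial, and windowed attention. The layer sizes below are arbitrary examples rather than values from the specification, biases and the expansion layers of a full mobile inverted bottleneck are ignored, and the function names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return c_in * k * k + c_in * c_out


def attention_pairs(h, w, mode="full", window=8):
    """Query-key pairs evaluated on an h x w grid of tokens."""
    n = h * w
    if mode == "full":    # every token attends to every token
        return n * n
    if mode == "axial":   # each token attends along its row and its column
        return n * (h + w)
    if mode == "window":  # each token attends within a window x window patch
        return n * window * window
    raise ValueError(f"unknown mode: {mode}")


standard = conv_params(64, 128, 3)                  # 73728 weights
separable = depthwise_separable_params(64, 128, 3)  # 8768 weights
full = attention_pairs(64, 64, "full")              # 16777216 pairs
axial = attention_pairs(64, 64, "axial")            # 524288 pairs
windowed = attention_pairs(64, 64, "window")        # 262144 pairs
```

For this example the separable convolution uses roughly one eighth of the weights, and axial or windowed attention evaluates 32 to 64 times fewer pairs than full attention on a 64 x 64 token grid. The gap widens as resolution grows, since full attention scales quadratically in the number of tokens.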
The optional uncertainty visualization provides additional clinical trust by indicating areas where diagnostic reliability may be reduced. This feature is particularly beneficial for radiologists in decision-making, as it allows cross-verification of potentially ambiguous regions.
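The specification does not say how the uncertainty heat map is computed. One common approach, assumed here purely for illustration, is to run several stochastic forward passes on the same input (for example with dropout left active) and report the per-pixel standard deviation of the outputs. The name `uncertainty_map` is hypothetical.

```python
from statistics import mean, pstdev


def uncertainty_map(predictions):
    """Per-pixel mean and standard deviation across repeated predictions.

    predictions: list of N same-sized 2D outputs for one input image.
    Returns (mean_image, std_map); larger std values mark pixels where
    the repeated predictions disagree.
    """
    height, width = len(predictions[0]), len(predictions[0][0])
    mean_image = [[mean([p[i][j] for p in predictions]) for j in range(width)]
                  for i in range(height)]
    std_map = [[pstdev([p[i][j] for p in predictions]) for j in range(width)]
               for i in range(height)]
    return mean_image, std_map


# Three hypothetical outputs for a 1 x 2 image: the passes agree on the
# first pixel and disagree on the second.
passes = [[[0.2, 0.5]], [[0.2, 0.7]], [[0.2, 0.3]]]
mean_image, std_map = uncertainty_map(passes)
```

The first pixel gets zero uncertainty and the second a positive value, so a heat map rendered from `std_map` would highlight only the second pixel as a region to cross-check.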
The novelty resides in combining the strengths of convolutional and transformer networks with UNet++ dense skip pathways, attention modules, and deep supervision. The design ensures anatomical fidelity, adaptability across imaging modalities, and real-time usability in clinical diagnostics.
The best method of working involves deploying the MedCleanNet++ system within a PACS-integrated clinical workstation. Raw MRI, CT, or X-ray scans are input to the system, which preprocesses them through normalization and optional wavelet decomposition. The efficient encoder extracts hierarchical features that are globally contextualized using transformer modules. Decoder fusion through UNet++ pathways reconstructs high-quality images, while squeeze-and-excitation units refine diagnostically important details. Deep supervision stabilizes the model during training and ensures interpretability. The denoised output, along with an optional uncertainty heat map, is generated in real time and can be directly used by radiologists for diagnosis. This configuration provides the most effective and clinically beneficial performance of the invention.

Claims: We Claim:
1. A medical image denoising system comprising:
an input preprocessing module configured to normalize raw medical images and optionally apply wavelet transformations;
an efficient encoder comprising hierarchical convolutional blocks for extracting multi-resolution features;
a transformer encoder module configured to apply self-attention for global contextual learning;
a UNet++ skip decoder incorporating densely nested skip pathways for feature fusion;
at least one squeeze-and-excitation attention unit for channel-wise recalibration;
a deep supervision unit for intermediate optimization and interpretability; and
an output module configured to generate a denoised image and an optional diagnostic uncertainty map.
2. The system as claimed in claim 1, wherein the transformer encoder comprises a Swin transformer or axial transformer configured to process high-resolution medical images efficiently.
3. The system as claimed in claim 1, wherein the preprocessing module performs wavelet-based decomposition to isolate noise-frequency features prior to encoding.
4. The system as claimed in claim 1, wherein the squeeze-and-excitation attention unit enhances diagnostically relevant features including calcifications, lesions, and tumors.
5. The system as claimed in claim 1, wherein the UNet++ skip decoder fuses features across multiple scales to preserve anatomical structures.
6. The system as claimed in claim 1, wherein the deep supervision unit stabilizes network training through intermediate predictions at multiple depths.
7. The system as claimed in claim 1, wherein the system is optimized using mixed-precision inference for deployment on GPUs, edge devices, or PACS-integrated workstations.
8. The system as claimed in claim 1, wherein the output module generates an uncertainty heat map indicating regions of reduced diagnostic confidence.
9. A method for denoising medical images comprising:
receiving raw medical image data from one or more imaging modalities;
preprocessing the input images by normalization and optional wavelet transformation;
extracting hierarchical features through efficient encoder blocks;
applying transformer-based self-attention to capture long-range dependencies;
reconstructing denoised images through a UNet++ skip decoder with densely nested connections;
recalibrating channels using squeeze-and-excitation attention modules;
supervising intermediate feature maps through deep supervision layers; and
generating a final denoised image and optional uncertainty heat map.
10. The method as claimed in claim 9, wherein the method is applied across multiple imaging modalities including MRI, CT, mammogram, and PET scans to provide modality-agnostic denoising.

Documents

Application Documents

# Name Date
1 202541090180-STATEMENT OF UNDERTAKING (FORM 3) [22-09-2025(online)].pdf 2025-09-22
2 202541090180-REQUEST FOR EARLY PUBLICATION(FORM-9) [22-09-2025(online)].pdf 2025-09-22
3 202541090180-POWER OF AUTHORITY [22-09-2025(online)].pdf 2025-09-22
4 202541090180-FORM-9 [22-09-2025(online)].pdf 2025-09-22
5 202541090180-FORM FOR SMALL ENTITY(FORM-28) [22-09-2025(online)].pdf 2025-09-22
6 202541090180-FORM 1 [22-09-2025(online)].pdf 2025-09-22
7 202541090180-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [22-09-2025(online)].pdf 2025-09-22
8 202541090180-EVIDENCE FOR REGISTRATION UNDER SSI [22-09-2025(online)].pdf 2025-09-22
9 202541090180-EDUCATIONAL INSTITUTION(S) [22-09-2025(online)].pdf 2025-09-22
10 202541090180-DRAWINGS [22-09-2025(online)].pdf 2025-09-22
11 202541090180-DECLARATION OF INVENTORSHIP (FORM 5) [22-09-2025(online)].pdf 2025-09-22
12 202541090180-COMPLETE SPECIFICATION [22-09-2025(online)].pdf 2025-09-22