
A System And Method For Denoising Coronary Angiography Images

Abstract: The present invention relates to a system (100) and method (300) for denoising coronary angiography images. The system (100) comprises an input acquisition unit (101) configured to receive noisy fluoroscopic image frames. A processing unit (102) employs a residual encoder–decoder neural network to analyze and clean the images. The encoder module (103) progressively transforms each input frame into a multi-scale representation. A central module (104) enhances the feature maps by increasing channel depth without altering spatial dimensions. The decoder module (105) then reconstructs the spatial dimensions and reduces feature depth to produce refined image representations. An output estimation module (106) generates a noise estimate, which is subtracted from the original input frame by an image reconstruction module (107) to yield a denoised output image. The output unit (108) delivers the final denoised frame with restored anatomical clarity and preserved bit depth, facilitating real-time clinical use without increasing radiation dose. Fig. 1


Patent Information

Filing Date: 26 August 2025
Publication Number: 36/2025
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

INNVOLUTION HEALTHCARE PRIVATE LIMITED
Plot No.143-A1, Bommasandra Industrial Area, Hebbagodi Village, Attibele Hobli Anekal Taluk, Bommasandra Industrial Estate, Bangalore, Bangalore South, Karnataka, India, 560099.

Inventors

1. Susmita Shivaprasad
Plot No.143-A1, Bommasandra Industrial Area, Hebbagodi Village, Attibele Hobli Anekal Taluk, Bommasandra Industrial Estate, Bangalore, Bangalore South, Karnataka, India, 560099
2. Anisha Subodh Palkar
Plot No.143-A1, Bommasandra Industrial Area, Hebbagodi Village, Attibele Hobli Anekal Taluk, Bommasandra Industrial Estate, Bangalore, Bangalore South, Karnataka, India, 560099
3. Veerakoti Reddy Bheemavarapu
Plot No.143-A1, Bommasandra Industrial Area, Hebbagodi Village, Attibele Hobli Anekal Taluk, Bommasandra Industrial Estate, Bangalore, Bangalore South, Karnataka, India, 560099.
4. Anubhav Agrawal
Plot No.143-A1, Bommasandra Industrial Area, Hebbagodi Village, Attibele Hobli Anekal Taluk, Bommasandra Industrial Estate, Bangalore, Bangalore South, Karnataka, India, 560099.

Specification

Description: A SYSTEM AND METHOD FOR DENOISING CORONARY ANGIOGRAPHY IMAGES

FIELD OF INVENTION
The present invention relates to medical image processing. Specifically, it provides a system and method for denoising low-dose fluoroscopy frames using a deep learning algorithm that operates in real time, sharpening vessel detail and overall image clarity while enabling meaningful X-ray dose reductions for both patients and clinicians.

BACKGROUND OF THE INVENTION
Coronary artery disease (CAD) remains the world’s leading cause of mortality and continues to place a significant burden on global healthcare systems. One of the most vital diagnostic and interventional procedures for managing CAD is coronary angiography, which enables real-time imaging of the coronary arteries to identify blockages and guide catheter-based interventions such as stent placement. During such procedures, clinicians rely on continuous fluoroscopic video to navigate catheters, deploy devices, and monitor for complications with high precision in real time.

However, prolonged fluoroscopy use contributes to cumulative X-ray radiation exposure for both patients and medical staff. In an effort to reduce this radiation burden, modern catheterisation laboratories implement low-dose protocols by reducing tube current, pulse width, or frame rate. Although effective in minimizing radiation risk, these reductions significantly increase quantum (shot) noise and structured artefacts, which can obscure the fine-scale anatomical features, such as stenoses, dissections, or thrombi, that are critical for safe clinical decision-making and successful treatment.

Conventional noise reduction methods offer only partial solutions. Spatial filters such as Gaussian or median blurring may suppress noise but often degrade edge fidelity, erasing the narrow lumens of distal vessels. More advanced techniques, such as frequency-domain filtering or non-local means algorithms, require manual parameter tuning based on patient size, imaging angles, or contrast medium dynamics—making them inconsistent and difficult to use during real-time procedures. Additionally, multi-frame averaging improves signal-to-noise ratio (SNR) but introduces motion ghosting due to heartbeats or table movement, further compromising image clarity when clarity is needed most.

Emerging solutions involving convolutional neural networks (CNNs) have shown promise; however, many of these models are trained on low-resolution images (8-bit down-sampled images) or synthetic noise profiles, failing to accurately represent the Poisson noise characteristics inherent in real fluoroscopic data. Furthermore, their high computational requirements often prevent real-time performance on full-resolution clinical images, limiting practical deployment in catheterisation laboratories.

As a result, clinicians are faced with an unsatisfactory trade-off—either accept grainy, low-dose images or increase radiation exposure to achieve diagnostic clarity. Therefore, there exists a clear and urgent need for a single, AI-driven denoising system that is trained on realistic X-ray noise, preserves critical sub-millimeter vessel detail, and operates in real-time on the existing GPU infrastructure found in most interventional settings. Such a solution would enhance procedural safety and accuracy without increasing radiation dose.

The prior art US20230099663A1 discloses a system for denoising fluoroscopic images using deep learning, particularly convolutional neural networks, to enhance image quality under low-dose conditions. It primarily focuses on learning spatial and temporal features and is typically trained on down-sampled, low-bit-depth data. However, it does not disclose a structured, modular architecture comprising an input acquisition unit, an encoder–decoder pipeline with a central module for channel enhancement, or an output estimation module generating a per-pixel noise map for subtraction. The prior art lacks the real-time denoising capability of full-resolution coronary angiography frames with explicit modeling of Poisson noise as taught by the present invention.

The prior art, such as CN116596796A, discloses a neural network-based medical image denoising method that utilizes attention mechanisms and dual-branch architectures to enhance noise suppression across medical scans. However, it does not teach or suggest a system specifically structured for denoising coronary angiography images captured via low-dose fluoroscopy. In particular, it lacks the defined modular architecture comprising an encoder–decoder structure with a central module for enhancing channel depth without altering spatial dimension, as well as an output estimation module that generates and subtracts a per-pixel noise map from the original frame. These system-level and functional distinctions enable real-time, full-resolution denoising and form a key aspect of the present invention; they are absent from the cited prior art.

The prior art, such as CN112801887A, discloses a denoising method for medical images based on residual learning and feature enhancement networks, primarily focusing on improving image quality in general radiology contexts. However, it does not teach or suggest a dedicated system for denoising coronary angiography images obtained through low-dose fluoroscopy. Specifically, the prior art lacks the defined encoder–decoder architecture incorporating a central module to enhance feature depth while preserving spatial dimensions and an output estimation module designed to compute a per-pixel noise map for subtraction from the input frame. These modular and processing features, enabling real-time denoising of full-resolution fluoroscopic sequences, are distinctive attributes of the present invention and are not disclosed in the cited document.

The prior art, such as CN115988995A, discloses a denoising method for low-dose medical images using attention mechanisms and residual networks to enhance image clarity. However, it does not disclose or suggest a modular system specifically tailored for denoising coronary angiography images via an integrated encoder–decoder architecture. The cited document lacks a central module that enhances channel depth while maintaining spatial dimensions, and it does not include an output estimation module that generates a per-pixel noise map for direct subtraction from the input frame. These structural and functional distinctions form the core of the present invention and are not addressed in the prior art.

Currently, fluoroscopic coronary angiography suffers from a trade-off between image clarity and patient safety due to X-ray dose limitations. Low-dose imaging protocols, while reducing radiation exposure, result in high quantum noise that obscures critical anatomical details, such as narrow lumens and vessel edges essential for diagnosing coronary artery disease. Existing denoising techniques—including spatial filters, adaptive models, and CNNs trained on synthetic data—either degrade image features or fail to perform in real time. The present invention addresses these limitations by introducing an AI-based system that denoises low-dose coronary angiography images using a residual encoder–decoder architecture. This system learns to estimate and subtract noise while preserving diagnostic features such as vessel edges and grayscale fidelity. Trained using real cine–low fluoro image pairs and optimized with a hybrid loss function reflecting X-ray photon statistics, the invention enables real-time, high-quality denoising on standard GPU hardware—improving safety, clinical accuracy, and workflow efficiency in catheterization labs.

OBJECTS OF THE INVENTION
The principal object of the invention is to provide a system and method for denoising coronary angiography images obtained via low-dose fluoroscopy using a deep learning-based encoder–decoder architecture that enhances diagnostic quality without increasing radiation exposure.

Another object of the invention is to enable real-time generation of denoised output frames by estimating and subtracting noise from each fluoroscopic image frame.

Another object of the invention is to provide flexible input support for fluoroscopy frames in formats such as 16-bit PNG and NumPy arrays, ensuring compatibility with clinical imaging data.

Said and other objects of the present disclosure will be apparent to a person skilled in the art after consideration of the following summary of subject matter as claimed, detailed description taken into consideration with accompanying drawings in which preferred embodiments of the present disclosure are illustrated.

SUMMARY OF THE INVENTION
It is therefore a general aspect of the present invention to provide a system and method for denoising coronary angiography images acquired from low-dose fluoroscopy using an AI-driven encoder–decoder architecture. The system comprises an input acquisition unit configured to receive fluoroscopic image frames represented as pixel matrices. A processing unit processes these frames through an encoder–decoder architecture, wherein an encoder module transforms each input frame into hierarchical feature maps by increasing feature depth and reducing spatial dimensions. A central module further enhances these encoded features by increasing channel depth without altering spatial dimensions. A decoder module then reconstructs the original spatial dimension while reducing the number of feature maps to generate refined feature representations. An output estimation module uses these refined representations to generate a predicted noise map, which is subtracted from the input frame by an image reconstruction module to produce a denoised output. Finally, an output unit provides the denoised frames, delivering enhanced image clarity in real time while maintaining diagnostic accuracy and compatibility with clinical imaging workflows.

According to an embodiment, the encoder module comprises a convolution layer followed by a max pooling operation for spatial downsampling. The convolution layer is configured to increase the number of feature maps derived from the input fluoroscopic image frames, thereby enhancing feature representation and capturing detailed patterns. The max pooling operation that follows reduces the spatial dimensions of the feature maps, enabling hierarchical abstraction and efficient multi-scale analysis. This combination facilitates robust feature extraction necessary for accurate noise detection and effective denoising in coronary angiography imaging.

According to an embodiment, the decoder module comprises a convolution layer preceded by an inverse convolution operation. The inverse convolution operation is adapted to increase the spatial dimensions of the image frame, effectively reconstructing the resolution reduced during encoding. The subsequent convolution layer reduces the number of feature maps, refining the reconstructed features while preserving structural integrity. This configuration enables the generation of high-quality denoised images that retain anatomical detail critical for coronary angiography interpretation.
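As a minimal illustration of the upscaling step, any 2× spatial upsampling can stand in for the inverse convolution; the sketch below uses nearest-neighbour repetition in NumPy purely to show the dimension change (a trained transposed convolution would additionally learn filter weights):

```python
import numpy as np

def upsample_2x(fmap):
    # Double each spatial dimension of a C x H x W feature map by
    # nearest-neighbour repetition; a stand-in for the learned
    # inverse (transposed) convolution described in the text.
    return fmap.repeat(2, axis=-2).repeat(2, axis=-1)
```

A subsequent convolution would then halve the channel count, mirroring the encoder path.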

According to an embodiment, the encoder module is configured to process the input image frame through a sequence of operations, each comprising a convolution layer followed by a max pooling operation. These operations are arranged consecutively and repeated more than once, enabling progressive enhancement of feature representation and gradual reduction in spatial dimension. This layered architecture allows for deep hierarchical encoding of image characteristics, supporting robust noise isolation and improved accuracy in denoising coronary angiography images.

According to an embodiment, the decoder module is configured to reconstruct the spatial dimensions of the image frame by applying a sequence of operations, each comprising a convolution layer preceded by an inverse convolution operation. The sequence includes more than one such operation arranged consecutively, allowing for progressive spatial upscaling and simultaneous refinement of feature maps. This structured reconstruction pathway ensures the recovery of original resolution with enhanced clarity, thereby enabling accurate and high-fidelity denoising of coronary angiography images.

According to an embodiment, the central module is configured to apply at least one convolution layer to the encoded feature maps. This processing step enhances the level of feature abstraction and improves the contextual understanding of image content.
According to an embodiment, the convolution layer is implemented as a double convolution layer, wherein each double convolution layer comprises two sequential two-dimensional convolutional layers (Conv2D), each followed by a non-linear activation function such as ReLU. This configuration enhances non-linear feature extraction by allowing deeper representation learning within each stage of the encoder and decoder. The use of ReLU activation after each convolution facilitates the modeling of complex patterns in the image data, thereby improving the system’s ability to distinguish noise from relevant anatomical structures in coronary angiography frames.
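The double convolution layer described above can be sketched as follows. This is an illustrative single-channel NumPy implementation with caller-supplied kernels; a real implementation would use a deep learning framework with learned multi-channel weights:

```python
import numpy as np

def conv2d_same(x, kernel):
    # Single-channel 'same' 3x3 cross-correlation (the Conv2D primitive):
    # zero-pad, then accumulate shifted, weighted copies of the input.
    kh, kw = kernel.shape
    p = np.pad(x, ((kh // 2,), (kw // 2,)))
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * p[i:i + h, j:j + w]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def double_conv(x, k1, k2):
    # Two sequential 3x3 convolutions, each followed by ReLU,
    # matching the double convolution layer of the embodiment.
    return relu(conv2d_same(relu(conv2d_same(x, k1)), k2))
```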

According to an embodiment, each double convolution layer in the encoder module is configured to increase the number of feature maps by a factor of two, enhancing the depth and richness of learned features at each level. Correspondingly, each max pooling operation reduces the spatial dimensions of the image frame by a factor of two, enabling efficient downsampling. This coordinated scaling of features and dimensions supports effective multi-scale feature abstraction essential for accurate denoising of coronary angiography images.

According to an embodiment, each inverse convolution operation in the decoder module is configured to increase the spatial dimensions of the image frame by a factor of two, progressively reconstructing the original dimension. Each double convolution layer that follows is configured to reduce the number of feature maps by a factor of two, refining the feature representation at each stage. This symmetric design ensures balanced upscaling and feature reduction, enabling precise restoration of image details while minimizing noise in coronary angiography outputs.
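The symmetric scaling in these embodiments can be traced numerically. The sketch below assumes the 16-channel, 1344×1344 starting point and three downsampling stages described elsewhere in the specification:

```python
def unet_shapes(in_ch=16, size=1344, levels=3):
    # Trace (channels, spatial size) through the encoder (double the
    # channels, halve the size at each level) and the mirrored decoder
    # (halve the channels, double the size back to the original).
    enc = [(in_ch, size)]
    c, s = in_ch, size
    for _ in range(levels):
        c, s = c * 2, s // 2
        enc.append((c, s))
    dec = enc[-2::-1]  # decoder mirrors the encoder path
    return enc, dec
```

With the defaults this reproduces the 128×168×168 bottleneck mentioned later in the description.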

According to an embodiment, the system comprises a training module configured to train the encoder–decoder architecture using datasets containing pairs of noisy low-dose fluoroscopic image frames and their corresponding reference image frames. The training process optimizes a loss function designed to accurately capture noise characteristics and guide the network toward minimizing reconstruction error. This enables the model to effectively learn the mapping between noisy inputs and denoised outputs, thereby enhancing the quality and diagnostic reliability of the resulting coronary angiography images.

According to an embodiment, the loss function used in the training module is a fixed weighted sum of four components. These include a Poisson–ROF loss term configured to model the statistical characteristics of X-ray photon noise; a mean-squared-error (MSE) term to ensure global brightness consistency between the denoised and reference frames; a Laplacian loss term to preserve the sharpness of anatomical edges; and a high-pass filter loss term to retain fine textures and spatial features. This composite loss structure facilitates robust learning and high-fidelity reconstruction in denoised coronary angiography images.
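The composite loss is described only at the level of its four components; the exact weights and filter choices are not specified. The NumPy sketch below shows one plausible instantiation, with illustrative weights, a 5-point Laplacian, a total-variation term standing in for the ROF regulariser, and a 3×3 box-blur residual standing in for the high-pass filter:

```python
import numpy as np

def laplacian(img):
    # 5-point discrete Laplacian with edge replication at the border
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def total_variation(img):
    # anisotropic TV, the regulariser used in ROF-style models
    return np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()

def box3(img):
    # 3x3 box blur; its residual acts as a crude high-pass filter here
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def composite_loss(denoised, reference, w=(1.0, 1.0, 0.5, 0.5),
                   tv_weight=1e-4, eps=1e-6):
    d = denoised.astype(np.float64)
    r = reference.astype(np.float64)
    # (1) Poisson negative log-likelihood plus a TV (ROF-style) term
    poisson_rof = np.mean(d - r * np.log(d + eps)) + tv_weight * total_variation(d)
    # (2) global brightness consistency
    mse = np.mean((d - r) ** 2)
    # (3) edge preservation via Laplacian agreement
    lap = np.mean(np.abs(laplacian(d) - laplacian(r)))
    # (4) fine-texture preservation via high-pass agreement
    hp = np.mean(np.abs((d - box3(d)) - (r - box3(r))))
    return w[0] * poisson_rof + w[1] * mse + w[2] * lap + w[3] * hp
```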

According to an embodiment, the training module is configured to extract random patches of predefined dimensions from the input image frames during each training iteration. The location of these patches is varied across training epochs to ensure that the network is exposed to diverse regions within each image. This strategy enhances generalization by allowing the model to learn noise patterns and structural features from multiple spatial contexts, thereby improving its denoising performance across varied coronary angiography data.
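Random patch extraction can be sketched as below; the patch size of 256 and the use of `numpy.random.default_rng` are illustrative choices, not taken from the specification:

```python
import numpy as np

def random_patch_pair(noisy, reference, size=256, rng=None):
    # Sample the same random window from a noisy frame and its
    # reference, so the pair stays spatially aligned for training.
    rng = np.random.default_rng() if rng is None else rng
    h, w = noisy.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    window = (slice(top, top + size), slice(left, left + size))
    return noisy[window], reference[window]
```

Varying the window position across epochs exposes the network to diverse image regions, as the embodiment describes.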

According to an embodiment, the input acquisition unit is configured to ingest fluoroscopy image frames provided in the form of NumPy array (.npy) files or 16-bit PNG images. This allows compatibility with commonly used medical imaging data formats and ensures seamless integration with existing clinical data pipelines for efficient preprocessing and denoising operations in coronary angiography workflows.
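A minimal ingestion routine for the `.npy` case might look as follows; normalising the 16-bit range to [0, 1] and adding a channel axis are illustrative preprocessing choices (decoding a 16-bit PNG would additionally require an image library):

```python
import numpy as np

def load_fluoro_frame(path):
    # Load a .npy fluoroscopy frame and present it as a 1 x H x W
    # float32 array in [0, 1], preserving the 16-bit dynamic range.
    frame = np.load(path)
    if frame.dtype == np.uint16:
        frame = frame.astype(np.float32) / 65535.0
    else:
        frame = frame.astype(np.float32)
    return frame[np.newaxis, :, :]  # add channel axis: 1 x H x W
```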

A method for denoising coronary angiography images obtained via low-dose fluoroscopy is provided. The method comprises receiving a plurality of fluoroscopic image frames, each represented as a matrix of pixel values defining spatial dimensions. Each image frame is processed through an encoder–decoder architecture that first encodes the input by increasing the number of feature maps while reducing spatial dimensions to extract hierarchical and multi-scale features. The encoded feature maps are transmitted to a central module, where the number of feature maps is increased further without altering spatial dimensions to enhance representational depth. These enhanced features are then decoded by progressively reducing feature maps and restoring the original spatial dimensions to generate reconstructed feature maps. An output estimation module then estimates the noise from the reconstructed features, and this noise is subtracted from the corresponding input frame to produce a denoised output frame. The final denoised frame is then outputted, enabling high-quality, low-radiation coronary angiographic visualization.

The figures depict embodiments of the present subject matter for the purposes of illustration only. A person skilled in the art will easily recognize from the following description that alternative embodiments of the device and process illustrated herein may be employed without departing from the principles of the disclosure described herein.

BRIEF DESCRIPTION OF DRAWING:
Fig. 1 is a block diagram illustrating a system architecture for denoising coronary angiography images obtained via low-dose fluoroscopy.
Fig. 2 shows the internal architecture of the processing unit, which is based on a U-Net-inspired residual encoder–decoder neural network.
Fig. 3 shows a flow chart depicting a method for denoising coronary angiography images obtained via low-dose fluoroscopy.
Fig. 4 shows an example coronary-angiography frame pair: the original noisy image (left) and its AI-denoised counterpart (right).

DETAILED DESCRIPTION
The best and other modes for carrying out the present invention are presented in terms of the embodiments, herein depicted in Drawings provided. The embodiments are described herein for illustrative purposes and are subject to many variations. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but are intended to cover the application or implementation without departing from the spirit or scope of the present invention. Further, it is to be understood that the phraseology and terminology employed herein are for the purpose of the description and should not be regarded as limiting. Any heading utilized within this description is for convenience only and has no legal or limiting effect.

The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items.

The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other, sub-systems, elements, structures, components, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.

For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as would normally occur to those skilled in the art are to be construed as being within the scope of the present invention.

It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the art to which this invention belongs.

The system, method and examples provided herein are only illustrative and not intended to be limiting. Embodiments of the present invention will be described below in detail with reference to the accompanying figures.

The invention provides a system and method for denoising coronary angiography images acquired through low-dose fluoroscopy, enabling enhanced image quality without increasing radiation exposure or requiring additional hardware. By delivering clear, noise-suppressed angiographic frames in real time, the system allows interventional cardiologists to visualize sub-millimetre vascular details with greater clarity and safety, thereby supporting more accurate diagnosis and procedure planning.

The proposed system comprises an input acquisition unit, a processing unit and an output unit to convert noisy, low-dose fluoroscopy frames into denoised coronary-angiography frames in real time. At its core, the processing unit implements a deep learning architecture based on a U-Net-inspired residual encoder–decoder neural network. This multi-layered AI model has been specifically tailored to the domain of coronary imaging and trained with a composite loss function that incorporates X-ray photon noise statistics to suppress fluoroscopic noise, preserve vessel edges, and maintain fine textures. It consistently raises signal-to-noise ratio and sharpness metrics across unseen angiography sequences, producing cleaner images with crisper vessel borders in real time.

For each input fluoroscopy frame, the network estimates the residual noise component and subtracts it from the original input, thereby isolating and removing only the unwanted noise. This approach preserves the full grayscale fidelity and dynamic range of the raw images, ensuring anatomical accuracy.
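This residual subtraction step can be sketched as follows, with `predict_noise` standing in for the trained network's forward pass and clipping to the 16-bit range as an illustrative safeguard:

```python
import numpy as np

def residual_denoise(frame, predict_noise):
    # The network predicts a per-pixel noise map for the frame;
    # subtracting it from the input and clipping to the valid
    # 16-bit range yields the denoised output frame.
    noise = predict_noise(frame)
    return np.clip(frame - noise, 0.0, 65535.0)
```

Because only the estimated noise is removed, the signal content of the original frame passes through unchanged.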

Training is guided by a carefully weighted loss function that captures the statistical behaviour of authentic X-ray noise and spatial variability. The model aligns its outputs with high-quality cine frames used as ground truth, preserving global brightness and contrast while maintaining diagnostic integrity. Edge-aware and high-pass loss components further ensure the preservation of vascular features critical for detecting stenoses, bifurcations, and other fine structures. The training procedure is designed for robustness and seamless recovery, supporting scalable deployment.

Crucially, the network is optimized to run on standard GPU hardware already available in catheterisation labs, enabling real-time operation without extra hardware or modification of the existing imaging infrastructure. By learning directly from real coronary angiography datasets, the model internalizes the unique noise characteristics of these images—significantly outperforming generic denoisers—and delivers substantial clinical value by enhancing both image clarity and patient safety at reduced radiation levels.

As illustrated in Figure 1, the system (100) architecture is designed for denoising coronary angiography images obtained via low-dose fluoroscopy in real time, while preserving the full 16-bit dynamic range of the original input. The system (100) comprises three principal modules: an input acquisition unit (101), a processing unit (102), and an output unit (108).

The input acquisition unit (101) is configured to receive noisy fluoroscopic image frames acquired at reduced X-ray dose levels. These frames are intended to minimize radiation exposure to the patient while still capturing diagnostically relevant anatomical information. Each fluoroscopic image frame is structured as a two-dimensional matrix of pixel intensity values representing vascular and anatomical structures, and is defined by spatial dimensions in terms of height and width. In one embodiment, the input acquisition unit (101) is adapted to ingest images supplied in the form of NumPy arrays (.npy) or 16-bit PNG files, allowing for high-fidelity input handling and ensuring compatibility with real-time digital processing. A typical input image frame is a single-channel grayscale image represented in the format 1×1344×1344, where ‘1’ denotes the number of input channels and ‘1344×1344’ defines the pixel resolution, i.e., a height and width of 1344 pixels. This representation contains rich information extracted from various feature kernels, including texture, contrast, and structural patterns associated with both signal and noise.

Once acquired, the fluoroscopic image frames are transmitted to the processing unit (102), which performs denoising using a deep learning-based residual encoder–decoder neural network, as depicted in Figure 2. This unit (102) comprises five sub-modules: the encoder module (103), central module (104), decoder module (105), output estimation module (106), and image reconstruction module (107).

The encoder module (103) receives the input image frame and serves as the initial feature extraction stage of the denoising system; it is designed to transform the raw fluoroscopic input image into a rich set of hierarchical feature representations. The process begins with an input grayscale image of dimension 1×1344×1344, representing a single-channel image with defined spatial dimensions. This input is passed through a convolution layer and a max pooling operation to transform the image into a set of encoded feature maps. Here, each convolution operation is organized into a double convolution layer, comprising two sequential two-dimensional convolution (Conv2D) layers (3×3) activated through non-linear activation functions such as ReLU, which enhances local feature extraction. The first convolution layer operates on the native 16-bit intensity range and produces a set of 16 feature maps of size 1344×1344. These filters learn to detect various localized patterns such as edges, textures, and vessel boundaries across the entire image.

In an embodiment, other nonlinear activation functions can be used to optimize performance, depending on the dataset and hardware constraints. The proposed architecture supports any nonlinear activation function suited for improving learning stability and performance.

This output is then processed by a max pooling operation, which reduces the spatial dimension of the feature maps by a factor of two (from 1344×1344 to 672×672), effectively summarizing information and focusing on the most salient activations within each localized region. The max pooling operation not only helps reduce computational load in subsequent layers but also provides a degree of translational invariance, which is beneficial in capturing anatomy across variable positions.
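The 2×2 max pooling operation itself is straightforward; a minimal NumPy sketch for even-sized feature maps:

```python
import numpy as np

def max_pool_2x2(fmap):
    # 2x2 max pooling with stride 2 on a C x H x W feature map:
    # halves each spatial dimension, keeping the strongest
    # activation in every 2x2 window.
    c, h, w = fmap.shape
    return fmap.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))
```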

The encoder module comprises multiple sequences of a double convolution layer followed by a max pooling operation for spatial downsampling. Each double convolution increases the depth of the feature representation by doubling the number of feature maps (e.g., from 16 to 32, then 64, and up to 128), thereby enabling the network to learn increasingly abstract and complex patterns. In parallel, each max pooling step halves the spatial dimension (e.g., from 672×672 to 336×336, and further to 168×168), gradually compressing the image while preserving critical semantic and contextual information.

The role of the double convolution blocks can be fulfilled by other architectural components that provide comparable feature extraction and refinement, for example one or more convolutional layers or equivalent feature extraction structures, including residual, dilated, separable, or attention-enhanced blocks. Each of these structures can serve the same functional purpose of extracting hierarchical features and refining details during denoising.

In an embodiment, alternative downsampling operations could be substituted. Strided convolutions may be used to reduce resolution while learning additional filters, and average pooling can offer smoother feature maps in some cases.

In the current implementation, double convolution layers are used because two sequential convolutions allow the network to extract both low-level and progressively refined features within each stage. The first convolution captures local edge and texture information, while the second convolution refines those features and deepens the receptive field without increasing stride or kernel size. While a single convolution layer could technically be substituted, it generally offers reduced feature extraction capability, which may degrade performance on fine-detail tasks such as vessel preservation during denoising.

By the end of the encoder path, the image data has been transformed into a compact, information-rich set of feature maps with higher channel depth (e.g., 128×168×168). This multi-scale representation enables the model to capture both fine anatomical detail and abstract contextual features necessary for denoising.

The central module (104) acts as a transitional and computationally dense processing module situated between the encoder and decoder paths in the denoising architecture. Its primary function is to aggregate and transform the compressed, multi-scale features generated by the encoder module (103) into a globally informative representation without further altering the spatial dimension. Upon receiving the encoded feature maps from the encoder, typically of dimension 128×168×168, the central module initiates a process of deep feature refinement and contextual integration. Unlike the encoder (103), which reduces spatial dimension, the central module (104) is specifically designed to preserve spatial dimensions while enhancing the depth and expressiveness of the learned features.

At the core of the central module (104) lies a 128-channel bottleneck that increases the receptive field without further reducing spatial dimension. This module (104) is designed to encode global context and high-level semantic representations, allowing the system (100) to distinguish relevant anatomical structures from noise patterns. The central module (104) employs a convolution layer that maintains the existing spatial dimensions while increasing the depth and abstraction of the feature maps, generating enhanced feature maps. Further, each convolution operation is organized into a double convolution layer, comprising two sequential Conv2D operations, each utilizing 3×3 filters. These convolutions are typically followed by non-linear activation functions such as ReLU (Rectified Linear Unit), enabling the network to model complex non-linearities and suppress irrelevant patterns.
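For illustration only, a single-channel double convolution with zero padding and ReLU can be sketched in NumPy; the averaging kernels below are toy stand-ins for learned 3×3 filters:

```python
import numpy as np

def conv3x3_relu(x, kernel):
    """One zero-padded 3x3 convolution plus ReLU on a single-channel
    (H, W) map; the spatial size is preserved."""
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return np.maximum(out, 0.0)    # ReLU suppresses negative responses

def double_conv(x, k1, k2):
    # Two stacked 3x3 convolutions widen the receptive field to 5x5
    # without changing H or W.
    return conv3x3_relu(conv3x3_relu(x, k1), k2)

x = np.ones((4, 4), dtype=np.float32)
k = np.full((3, 3), 1 / 9, dtype=np.float32)  # toy averaging kernel
y = double_conv(x, k, k)
print(y.shape)  # (4, 4)
```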

Importantly, the spatial dimension is maintained throughout the central module (104) meaning the input and output dimensions of the feature maps remain consistent (e.g., 128×168×168). This ensures that no further down-sampling occurs. The output of this module (104) is a refined set of feature maps enriched with both local and contextual information.

The decoder module (105) plays a critical role in the denoising system by performing the reverse operation of the encoder (103); namely, it restores the spatial dimension of the compressed feature maps back to the original image dimensions while simultaneously refining and reducing the number of feature channels. The decoder module (105) is tasked with reconstructing a high-resolution, denoised output image from the deep, abstract features produced by the central module (104), ensuring the preservation of anatomical accuracy and subtle vascular detail. To achieve this, the decoder (105) operates through a sequence of learned upsampling steps, where each step comprises two key operations: an inverse convolution (also known as transposed convolution or deconvolution) (2 × 2) followed by a convolution layer (3 × 3), similar to the encoder. Further, each convolution operation is organized into a double convolution layer, comprising two sequential Conv2D layers activated by non-linear functions such as ReLU.

The inverse convolution operations double the spatial dimension (e.g., from 168×168 to 336×336) by performing learned upsampling, while the subsequent convolutional layers refine the restored features. As a result, the decoder module (105) progressively reconstructs the spatial dimensions from compressed representations (e.g., 64×336×336, 32×672×672, 16×1344×1344) while reducing the number of feature maps. This upsampling and reconstruction process restores fine-grained anatomical details necessary for producing high-quality denoised output.
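A minimal single-channel sketch of the 2×2, stride-2 transposed convolution (the kernel here is a stand-in for learned weights):

```python
import numpy as np

def transposed_conv_2x2(x, kernel):
    """2x2 transposed convolution with stride 2 on a (H, W) map.

    Every input pixel is scattered into a 2x2 output block weighted
    by the kernel, doubling both spatial dimensions.
    """
    h, w = x.shape
    out = np.zeros((h * 2, w * 2), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            out[2 * i:2 * i + 2, 2 * j:2 * j + 2] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
k = np.full((2, 2), 0.25)          # stand-in for learned weights
up = transposed_conv_2x2(x, k)
print(up.shape)  # (4, 4)
```

With stride 2 and a 2×2 kernel the scattered blocks do not overlap, which is why each upsampling step exactly doubles the resolution (e.g., 168×168 to 336×336).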

In an embodiment, the network’s convolutional layers all use 'same' padding alongside balanced downsampling and upsampling, so input and output frames share identical spatial dimensions.

The dotted connections shown in the figure between the encoder convolution layers and the corresponding decoder convolution layers represent skip connections, a core design element in U-Net and residual encoder–decoder architectures. These connections are crucial for enhancing the model's ability to preserve fine-grained spatial and anatomical information during the reconstruction process in image denoising.

Specifically, during encoding, the input image frame undergoes a series of convolution and max-pooling operations that progressively reduce its spatial dimension while increasing the number of feature maps. This transformation extracts high-level semantic features but also tends to lose some low-level spatial details that are vital for accurate image reconstruction—especially in medical imaging tasks like coronary angiography, where structural precision is essential. To counteract this, skip connections directly link the output of each encoder convolution layer to the input of its mirrored decoder layer. For example, the output from the first convolutional layer in the encoder (e.g., 16 × 1344 × 1344) is passed—not just forward through the encoder—but also laterally to the corresponding decoder layer that processes the same resolution. This lateral transfer of feature maps allows the decoder to concatenate or merge these early-stage, high-resolution features with the upsampled features it is currently processing.
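A minimal sketch of this lateral merge, assuming channel-wise concatenation as in U-Net; small 64×64 maps stand in for the full 1344×1344 resolution:

```python
import numpy as np

# Decoder features upsampled back to an encoder stage's resolution
# (64x64 toy maps stand in for the full 1344x1344 frames).
decoder_feats = np.zeros((16, 64, 64), dtype=np.float32)
# High-resolution features saved from the matching encoder layer.
encoder_feats = np.ones((16, 64, 64), dtype=np.float32)

# The skip connection concatenates both along the channel axis, so
# later convolutions can mix coarse context with fine detail.
merged = np.concatenate([decoder_feats, encoder_feats], axis=0)
print(merged.shape)  # (32, 64, 64)
```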

The inclusion of these skip connections significantly boosts the decoder’s ability to reconstruct precise structural details. It ensures that the high-resolution context lost during downsampling is reintroduced at each stage of upsampling.

Hence, the encoder and decoder are inherently modular. The architecture is constructed so that additional convolutional blocks, pooling layers, or upsampling layers can be added or removed based on the resolution of the input images, computational resources, or the specific application domain. For example, processing higher resolution medical scans may require adding deeper levels to capture hierarchical features, while lower-resolution data can be processed with fewer blocks to save computation. This modularity ensures that the same framework can be adapted across different use cases without changing the fundamental design.

Following the decoder module (105), the output estimation module (106) receives the reconstructed/final feature maps and applies an additional 1×1 convolution layer to estimate a per-pixel residual noise map. The convolution layer performs a weighted sum of the 16 input feature values at each pixel location using a set of learnable weights, but it does not alter the spatial dimensions. As a result, for every pixel position (i, j), the 1×1 convolution combines the 16 channel values into a single-channel residual, effectively reducing the channel depth from 16 to 1 while retaining the spatial dimensions. The output of this operation is a residual noise map of shape (1 × 1344 × 1344), matching the dimensions of the original input frame. This map contains an estimate of the noise present in the input low-dose fluoroscopic frame.
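The per-pixel weighted sum performed by the 1×1 convolution can be sketched in NumPy; a toy 16×8×8 feature stack stands in for the full 16×1344×1344 maps:

```python
import numpy as np

def conv_1x1(feats, weights, bias=0.0):
    """1x1 convolution collapsing (C, H, W) features to one channel.

    At every pixel (i, j) it takes a weighted sum over the C channel
    values; the spatial dimensions are untouched.
    """
    return np.tensordot(weights, feats, axes=([0], [0])) + bias

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8, 8)).astype(np.float32)  # toy 16-channel map
w = rng.standard_normal(16).astype(np.float32)              # 16 learnable weights
residual = conv_1x1(feats, w)
print(residual.shape)  # (8, 8)
```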

The image reconstruction module (107) subsequently subtracts the residual noise map from the input image to produce a denoised output frame. Since the subtraction is performed pixel-wise and only modifies intensity values, the output frame retains the original size and bit depth of the input image.
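A minimal NumPy sketch of the pixel-wise subtraction; the clipping step is an assumption added here so the result is guaranteed to stay within the 16-bit range:

```python
import numpy as np

# Toy 16-bit input frame and a predicted residual noise map.
frame = np.array([[40000.0, 41000.0], [39000.0, 42000.0]], dtype=np.float32)
noise = np.array([[500.0, -300.0], [200.0, -100.0]], dtype=np.float32)

# Pixel-wise subtraction; clipping (an added safeguard) keeps the
# result inside the valid 16-bit range before casting back.
denoised = np.clip(frame - noise, 0, 65535).astype(np.uint16)
print(denoised)  # [[39500 41300] [38800 42100]]
```

Because only intensity values change, the denoised frame keeps the input's dimensions and uint16 bit depth, as stated above.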

Finally, the output unit (108) is responsible for delivering the denoised image frames for display or archival. Because the denoising operation alters only pixel intensity values and does not modify the spatial or structural characteristics of the image, the output frame is fully compatible with existing clinical viewing systems. It can be directly displayed on diagnostic monitors or stored alongside original image sequences for retrospective analysis. The preservation of the 16-bit dynamic range ensures that even subtle grayscale variations, critical in vascular diagnostics, are maintained, supporting accurate and reliable clinical interpretation.

Hence, the proposed system (100) enhances overall image quality by suppressing noise and thereby sharpening the vessel structures, yielding clearer, more diagnostic images at the same resolution and bit depth.

Further, the training module (109) is designed to train the core residual encoder–decoder neural network that powers the system’s real-time denoising capability. It plays a critical role in enabling the network to effectively distinguish true anatomical content from quantum noise present in low-dose fluoroscopy images. The training data is organized into two parallel directories: one containing noisy low-dose fluoroscopic image frames, and the other containing corresponding high-dose cine reference frames, which serve as ground truth. Each pair is aligned by index, ensuring that the model learns from accurately matched inputs and targets. Both image types are stored as NumPy arrays (.npy) or optionally as 16-bit PNG images, and during pre-processing, the loader converts these arrays into 32-bit floating-point format while preserving the original 0–65535 grayscale intensity range to maintain dynamic fidelity.

To balance memory constraints with contextual awareness, the training module (109) randomly crops 256×256 patches from different regions of each image pair during every training epoch. This strategy exposes the network to all parts of the training images over time and enhances generalization across spatial variations. A batch size of 2 is used, which refers to the number of image patches processed together during one forward and backward pass. This small batch size is often chosen in high-resolution medical imaging due to the large memory requirements of the model and data.
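The paired random-crop strategy can be sketched as follows (the function name and RNG handling are illustrative, not taken from the implementation):

```python
import numpy as np

def random_paired_crop(noisy, clean, patch=256, rng=None):
    """Crop the same random patch from an index-aligned noisy/clean pair.

    Using a single offset for both frames keeps the pair pixel-aligned,
    which is essential when the clean frame serves as the target.
    """
    rng = rng or np.random.default_rng()
    h, w = noisy.shape
    top = int(rng.integers(0, h - patch + 1))
    left = int(rng.integers(0, w - patch + 1))
    window = (slice(top, top + patch), slice(left, left + patch))
    return noisy[window], clean[window]

noisy = np.zeros((1344, 1344), dtype=np.float32)
clean = np.zeros((1344, 1344), dtype=np.float32)
noisy_patch, clean_patch = random_paired_crop(noisy, clean)
print(noisy_patch.shape, clean_patch.shape)  # (256, 256) (256, 256)
```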

Additionally, training is driven by a composite loss function, specifically engineered to model the statistical and visual characteristics of coronary angiography. It includes four weighted components: a Poisson–ROF loss that simulates realistic X-ray photon noise along with mild total variation regularization; a mean squared error (MSE) loss to preserve global brightness and contrast; a Laplacian loss that strengthens the alignment of vessel edges; and a high-pass filter loss that ensures preservation of fine textures such as distal branches and subtle stenoses. The weighting of these components is empirically determined to maximize signal-to-noise ratio and sharpness on validation data.
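A simplified sketch of such a composite loss in NumPy. The Poisson–ROF term is reduced here to its total-variation (ROF) component only, the Laplacian and high-pass filters are crude 4-neighbour approximations, and the weights are hypothetical placeholders for the empirically tuned values:

```python
import numpy as np

def mse(pred, target):
    # Global brightness/contrast fidelity term.
    return float(np.mean((pred - target) ** 2))

def total_variation(img):
    # ROF-style regularizer: penalizes differences between neighbours.
    return float(np.abs(np.diff(img, axis=0)).mean()
                 + np.abs(np.diff(img, axis=1)).mean())

def laplacian_loss(pred, target):
    # Compare 4-neighbour Laplacians so vessel edges stay aligned.
    def lap(x):
        return (np.roll(x, 1, 0) + np.roll(x, -1, 0)
                + np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4 * x)
    return mse(lap(pred), lap(target))

def high_pass_loss(pred, target):
    # High-pass residual = image minus a crude 4-neighbour local mean.
    def hp(x):
        blur = (np.roll(x, 1, 0) + np.roll(x, -1, 0)
                + np.roll(x, 1, 1) + np.roll(x, -1, 1)) / 4.0
        return x - blur
    return mse(hp(pred), hp(target))

def composite_loss(pred, target, w=(1.0, 1.0, 0.5, 0.5)):
    # Hypothetical weights; the description states they are tuned
    # empirically on validation data.
    return (w[0] * mse(pred, target)
            + w[1] * total_variation(pred)
            + w[2] * laplacian_loss(pred, target)
            + w[3] * high_pass_loss(pred, target))
```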

Optimization is conducted using the AdamW algorithm, which decouples weight decay from the gradient update, allowing for more effective regularization. The initial learning rate is set to 1×10⁻³, and a weight decay of 1×10⁻⁵ is applied to prevent overfitting by penalizing large weight magnitudes. Training is performed over 150 epochs, enabling the model to iterate multiple times over the full training dataset to refine its parameters. A Reduce-on-Plateau scheduler adaptively lowers the learning rate when performance stagnates: it monitors a chosen performance metric, typically the validation loss, and reduces the learning rate by a factor of 0.5 if no improvement is observed over 5 consecutive epochs. To ensure numerical stability during backpropagation and prevent exploding gradients, global gradient clipping is applied with a maximum norm of 0.2. Once trained, the model can process full-resolution images (e.g., 1344×1344 pixels) in a single pass or in tiled segments, depending on the GPU’s memory capacity. The output is then converted back to the valid 16-bit range and saved as .npy or PNG files for seamless integration into clinical systems. The training module enables real-time operation by internalizing the actual noise signatures present in coronary fluoroscopy, far surpassing generic denoisers. As such, it allows for reduced radiation doses without compromising diagnostic clarity and integrates effortlessly into existing catheterization lab workflows, providing clinicians with sharper visualization of the coronary anatomy and increased procedural confidence.
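The gradient clipping and Reduce-on-Plateau behaviour described above can be sketched in simplified form; these mirror, but do not reproduce, the standard PyTorch utilities:

```python
import numpy as np

def clip_grad_norm(grads, max_norm=0.2):
    """Scale all gradients so their global L2 norm is at most max_norm."""
    total = float(np.sqrt(sum((g ** 2).sum() for g in grads)))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads, total

class ReduceOnPlateau:
    """Halve the learning rate after `patience` epochs without improvement."""
    def __init__(self, lr=1e-3, factor=0.5, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

grads, before = clip_grad_norm([np.array([3.0, 4.0])])  # global norm 5.0
sched = ReduceOnPlateau()
lrs = [sched.step(1.0) for _ in range(7)]  # validation loss stagnates
print(before, float(np.linalg.norm(grads[0])), lrs[-1])
```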

Although the present code primarily uses 16-bit .npy files for paired noisy-clean training and PNG files for evaluation, the architecture itself does not process file formats directly. It operates on arrays of 32-bit, 16-bit, 8-bit, or 4-bit data, with all inputs automatically converted to floating-point tensors internally.

Following the neural network’s denoising operation, the system (100) does not require any mandatory post-processing to finalize the output image. However, optional enhancements—such as contrast-limited adaptive histogram equalization (CLAHE) or edge-sharpening kernels—may be applied if the display requirements of a specific workstation call for a customized visual appearance. The output frame retains the same spatial dimensions and 16-bit grayscale depth as the original fluoroscopy input, ensuring full compatibility with existing clinical infrastructure. As a result, these denoised images can be seamlessly stored, reviewed, or transmitted through any system that currently supports raw fluoroscopic data, avoiding the need for integration changes.

The system (100) is trained specifically on real-world coronary fluoroscopy data. It accurately learns the statistical properties and spatial behaviour of X-ray noise in low-dose imaging scenarios. Consequently, the neural network can effectively suppress granular noise while preserving diagnostically critical features such as small arterial branches and subtle luminal stenoses. This preservation of fine vascular detail is crucial for the accurate identification of coronary pathology. Moreover, the system (100) operates entirely within the hardware constraints of standard catheterization lab workstations, utilizing GPUs that are already deployed, without requiring any additional computing equipment or image acquisition hardware. By enabling lower radiation dose protocols without compromising diagnostic quality, the system directly contributes to enhanced patient safety, improved procedural clarity, and greater clinical confidence for interventional cardiologists. In conclusion, the described invention represents a complete, self-contained AI solution for improving coronary angiography under low-dose fluoroscopy.

Fig. 3 shows a method (300) for denoising coronary angiography images obtained via low-dose fluoroscopy. The method initiates with receiving a plurality of fluoroscopic image frames at step 301. Each image frame comprises a two-dimensional matrix of pixel values that define spatial dimensions such as height and width, and represents vascular structures captured under reduced X-ray dose settings to limit radiation exposure. At step 302, the method involves processing each input image frame through an encoder–decoder architecture designed for noise reduction. This architecture includes multiple interconnected modules that collectively transform and refine the input image data.

At step 303, the method proceeds by receiving the input image frame in an encoder module. This encoder module begins the transformation process by breaking down the input image into more abstract representations. Next, at step 304, the method involves transforming the input image into a set of encoded feature maps using the encoder module. This transformation is performed by increasing the number of feature maps (channels) and progressively reducing the spatial dimensions (resolution) through a series of double convolution and max pooling operations. These operations extract hierarchical and multi-scale features necessary to isolate vessel structures from noise artifacts.

At step 305, the method includes transmitting the encoded feature maps from the encoder module to a central module. Then, at step 306, the method continues by processing the encoded feature maps in the central module. This central module performs further convolutional operations to increase the number of feature maps (i.e., channel depth) without altering the spatial dimensions. The goal of this operation is to generate enhanced feature representations that contain both local detail and broader anatomical context, critical for distinguishing subtle noise patterns from true signal.

At step 307, the method proceeds by transmitting the enhanced feature representations from the central module to a decoder module. Following this, at step 308, the method involves processing the enhanced feature representations in the decoder module. This operation entails progressively reducing the number of feature maps and restoring the spatial dimensions back to their original values. The decoder performs an inverse convolution operation (upsampling) followed by convolution to refine the output, thereby generating reconstructed feature maps that mirror the dimensions of the original input image frame while preserving anatomical accuracy.

At step 309, the method includes estimating noise from the reconstructed feature maps using an output estimation module. This estimation produces a single-channel residual map that indicates the predicted noise intensity at each pixel location. Then, at step 310, the method performs the operation of subtracting the estimated noise from the corresponding input image frame. This subtraction yields a denoised output frame that retains original grayscale fidelity while minimizing noise artifacts.

Finally, at step 311, the method includes outputting the denoised output frame for subsequent visualization, archival, or diagnostic evaluation. The output frame maintains the same resolution and dynamic range as the original input, ensuring clinical integrity.

Fig. 4 illustrates a comparative visual output of the AI-denoising system applied to low-dose coronary angiography images. The left panel of the figure represents a typical fluoroscopic image frame acquired under reduced X-ray exposure. This image, though clinically necessary to minimize patient and staff radiation risk, exhibits pronounced quantum noise, graininess, and structural artefacts that obscure fine vascular detail. Such noise severely impacts the visibility of narrow vessel lumens and subtle anatomical features critical for diagnosing stenosis, dissection, or thrombus during interventional cardiology procedures.

The right panel shows the corresponding AI-denoised image, generated by the residual encoder–decoder neural network described in the invention. Here, the system has accurately predicted and subtracted noise from the original input, resulting in a markedly cleaner image while preserving the anatomical fidelity and grayscale integrity. Fine vessel boundaries that were partially or fully obscured in the noisy frame are now sharply delineated, allowing clinicians to interpret the coronary tree with improved confidence.

This transformation is achieved without increasing the X-ray dose or relying on temporal averaging, which could introduce motion artefacts. Importantly, the denoised image retains the full 16-bit dynamic range of the original input, ensuring that no information is lost in the cleaning process. The comparison in Fig. 4 underscores the efficacy of the AI model in enhancing diagnostic clarity under low-dose conditions, fulfilling the clinical goal of maintaining image quality while reducing radiation exposure.
Claims:
I/We Claim:
1. A system (100) for denoising coronary angiography images obtained via low-dose fluoroscopy, the system (100) comprising:
• an input acquisition unit (101), configured to receive a plurality of fluoroscopic image frames, wherein each image frame comprises a matrix of pixel values to define spatial dimensions;
• a processing unit (102) configured to
o receive one or more input fluoroscopic image frames as input;
o process each input image frame through an encoder–decoder architecture, the architecture comprising:
 an encoder module (103) adapted to receive the input image frames and process the same to transform the image into a set of encoded feature maps by increasing the number of feature maps, while progressively reducing the spatial dimensions of the image frame, thereby enabling the extraction of hierarchical and multi-scale image features;
 a central module (104) adapted to receive the encoded feature maps from the encoder module (103) and configured to further process them to increase the number of feature maps without altering their spatial dimensions to generate enhanced feature representations with increased channel depth;
 a decoder module (105) configured to receive the enhanced feature representations from the central module (104), and to process them progressively to reduce the number of feature maps and restore the spatial dimension to original values and to generate reconstructed feature maps;
 an output estimation module (106) adapted to receive the reconstructed feature maps from the decoder module (105) and estimate noise corresponding to the predicted noise content present in the input image frame; and
 an image reconstruction module (107) adapted to subtract the estimated noise from the corresponding input image frame to generate a denoised output frame; and
• an output unit (108), configured to provide the denoised output frame corresponding to each received fluoroscopic image frame.
2. The system (100) as claimed in claim 1, wherein the encoder module (103) comprises a convolution layer followed by a max pooling operation, wherein the convolution layer is adapted to increase the number of feature maps and the max pooling operation is adapted to reduce the spatial dimension of the image frame.
3. The system (100) as claimed in claim 1, wherein the decoder module (105) comprises a convolution layer preceded by an inverse convolution operation, wherein the inverse convolution operation is adapted to increase the spatial dimensions of the image frame and the convolution layer reduces the number of feature maps.
4. The system (100) as claimed in claim 2, wherein the encoder module (103) is configured to process the input image frame through a sequence of operations, each operation comprising a convolution layer followed by a max pooling operation, the sequence including more than one such operation arranged consecutively.
5. The system (100) as claimed in claim 3, wherein the decoder module (105) is configured to reconstruct spatial dimension by applying a sequence of operations, each operation comprising a convolution layer preceded by an inverse convolution operation, the sequence including more than one such operation arranged consecutively.
6. The system (100) as claimed in claim 1, wherein the central module (104) is configured to process the encoded feature map through at least one convolution layer, thereby enhancing feature abstraction and contextual understanding without further down-sampling.
7. The system (100) as claimed in any one of claims 1 to 6, wherein the convolution layer is a double convolution layer, wherein each double convolution layer comprises two sequential two-dimensional convolutional layers (Conv2D), each followed by a non-linear activation function (ReLU) configured to enhance non-linear feature extraction.
8. The system (100) as claimed in claim 2, wherein each double convolution layer in the encoder module (103) is configured to increase the number of feature maps by a factor of two, and each max pooling operation is configured to reduce the spatial dimensions of the image frame by a factor of two.
9. The system (100) as claimed in claim 3, wherein each inverse convolution operation in the decoder module (105) is configured to increase the spatial dimensions of the image frame by a factor of two, and each double convolution layer is configured to reduce the number of feature maps by a factor of two.
10. The system (100) as claimed in claim 1, comprising a training module (109) configured to train the encoder–decoder architecture using datasets comprising pairs of noisy fluoroscopic image frames and corresponding reference image frames, wherein the training optimizes a loss function to enable accurate estimation of noise characteristics and enhance image quality.
11. The system (100) as claimed in claim 10, wherein the loss function is a fixed weighted sum of four components, comprising a Poisson–ROF loss term, configured to model X-ray photon noise characteristics; a mean-squared-error (MSE) term, configured to maintain global brightness consistency between denoised and reference frames; a Laplacian loss term, configured to preserve edge sharpness; and a high-pass filter loss term, configured to retain textures and spatial features.
12. The system (100) as claimed in claim 10, wherein the training module (109) is configured to extract random patches of predefined dimensions from the input image frames during each training iteration, the patch location being varied across training epochs to ensure that the network is exposed to different regions of each image.
13. The system (100) as claimed in claim 1, wherein the input acquisition unit (101) is configured to ingest fluoroscopy image frames supplied in the form of NumPy array (.npy) files or 16-bit PNG images.
14. A method (300) for denoising coronary angiography images obtained via low-dose fluoroscopy, the method comprising:
• receiving (301) a plurality of fluoroscopic image frames, each image frame comprising a matrix of pixel values defining spatial dimensions;
• processing (302) each input image frame through an encoder–decoder architecture comprising:
o receiving (303) the input image frame in an encoder module;
o transforming (304) the input image frame into a set of encoded feature maps by increasing the number of feature maps and progressively reducing the spatial dimensions of the image frame to extract hierarchical and multi-scale features;
o transmitting (305) the encoded feature maps to a central module;
o processing (306) the encoded feature maps in the central module by increasing the number of feature maps without altering spatial dimensions to generate enhanced feature representations with increased channel depth;
o transmitting (307) the enhanced feature representations to a decoder module;
o processing (308) the enhanced feature representations in the decoder module by progressively reducing the number of feature maps and restoring the spatial dimensions to their original values, thereby generating reconstructed feature maps;
• estimating (309) noise from the reconstructed feature maps using an output estimation module;
• subtracting (310) the estimated noise from the corresponding input image frame to generate a denoised output frame; and
• outputting (311) the denoised output frame.

Documents

Application Documents

# Name Date
1 202541080967-STATEMENT OF UNDERTAKING (FORM 3) [26-08-2025(online)].pdf 2025-08-26
2 202541080967-REQUEST FOR EARLY PUBLICATION(FORM-9) [26-08-2025(online)].pdf 2025-08-26
3 202541080967-PROOF OF RIGHT [26-08-2025(online)].pdf 2025-08-26
4 202541080967-POWER OF AUTHORITY [26-08-2025(online)].pdf 2025-08-26
5 202541080967-FORM-9 [26-08-2025(online)].pdf 2025-08-26
6 202541080967-FORM FOR SMALL ENTITY(FORM-28) [26-08-2025(online)].pdf 2025-08-26
7 202541080967-FORM FOR SMALL ENTITY [26-08-2025(online)].pdf 2025-08-26
8 202541080967-FORM 1 [26-08-2025(online)].pdf 2025-08-26
9 202541080967-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [26-08-2025(online)].pdf 2025-08-26
10 202541080967-EVIDENCE FOR REGISTRATION UNDER SSI [26-08-2025(online)].pdf 2025-08-26
11 202541080967-DRAWINGS [26-08-2025(online)].pdf 2025-08-26
12 202541080967-DECLARATION OF INVENTORSHIP (FORM 5) [26-08-2025(online)].pdf 2025-08-26
13 202541080967-COMPLETE SPECIFICATION [26-08-2025(online)].pdf 2025-08-26
14 202541080967-MSME CERTIFICATE [28-08-2025(online)].pdf 2025-08-28
15 202541080967-FORM28 [28-08-2025(online)].pdf 2025-08-28
16 202541080967-FORM 18A [28-08-2025(online)].pdf 2025-08-28