Abstract
METHOD AND SYSTEM FOR MULTIMODAL IMAGE SUPER-RESOLUTION USING CONVOLUTIONAL DICTIONARY LEARNING
This disclosure relates generally to the field of image processing, and, more particularly, to a method and system for Multimodal Image Super-Resolution (MISR) using convolutional dictionary learning. Existing sparse representation learning based techniques for MISR have certain limitations which impact the quality of the reconstructed image. The present disclosure performs MISR using convolutional dictionaries, which are translation invariant. The low-resolution image of the target modality and the high-resolution image of the guidance modality are modelled using their respective convolutional dictionaries and associated coefficients. Additionally, two coupling convolutional dictionaries are learned to model the relationship between them and synthesize the high-resolution image of the target modality more efficiently.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR MULTIMODAL IMAGE
SUPER-RESOLUTION USING CONVOLUTIONAL
DICTIONARY LEARNING
Applicant
Tata Consultancy Services Limited
A company incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
Preamble to the description:
The following specification particularly describes the invention and the manner in
which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application claims priority from Indian provisional application no. 202321089711, filed on December 29, 2023. The entire contents of the aforementioned application are incorporated herein by reference.
TECHNICAL FIELD
[002] The disclosure herein generally relates to the field of image processing, and, more particularly, to a method and system for multimodal image super-resolution using convolutional dictionary learning.
BACKGROUND
[003] Image super-resolution (ISR) refers to enhancing pixel-based image resolution while minimizing visual artifacts. Generating a high-resolution (HR) version of a low-resolution (LR) image is a complex task that involves inferring missing pixel values, which makes ISR a challenging and ill-posed problem. Different approaches have been proposed to regularize this ill-posed problem by incorporating prior knowledge, such as natural priors, local and non-local similarity, and features based on dictionary learning and deep learning. But all these methods primarily focus on single-modality images without exploiting the information available from other imaging modalities that can be utilized as guidance for ISR. In many practical applications with multi-modal imaging systems in place, the same scene is captured by different imaging modalities. Remote sensing for earth observation is one such application, where panchromatic and multi-spectral images are acquired at different resolutions to manage the cost, bandwidth, and complexity. This drives the need for Multi-modal Image Super-Resolution (MISR) techniques that can enhance the LR images of the target modality using the information from HR images of other modalities (referred to as the guidance modality) that share salient features like boundaries, textures, edges, etc. Depth map super-resolution with guidance from RGB, medical image super-resolution using multi-modal Magnetic Resonance (MR) images, and cross-modal Computed Tomography (CT) and MR images are some other applications where MISR is required.
[004] Existing works in the field of MISR include compressive sensing, Guided image Filtering (GF), Joint Bilateral Filtering (JBF), Joint image Restoration (JR), Deep Joint image Filtering (DJF), Coupled Dictionary Learning (Coupled DL), Joint Coupled Transform Learning (JCTL) and Joint Coupled Deep Transform Learning (JCDTL), and the recent Multi-modal Convolutional Dictionary Learning (MCDL) technique. All these techniques differ in the way they transfer the structural details of the guidance image to enhance the LR image of the target modality. Their performance depends on how well they are able to identify and model the complex dependencies between the two imaging modalities. Sparse representation learning using Dictionary Learning (DL) has gained popularity for addressing inverse problems, including MISR, but it has certain constraints. One notable limitation is that the learned dictionaries are inherently not translation invariant, i.e., the basis elements often appear as shifted versions of one another. Additionally, as these dictionaries operate on individual patches rather than the entire image, they reconstruct and sparsify patches independently, due to which the underlying structure of the signal may get lost. Both these limitations impact the quality of the reconstructed image. Convolutional Dictionary Learning (CDL), on the other hand, employs translation-invariant dictionaries, called convolutional dictionaries, which can effectively mitigate these limitations. These dictionaries are well-suited for representing signals that exhibit translation invariance, such as natural images and sounds, making them applicable to a range of image processing and computer vision tasks.
[005] Some of the works in the literature have attempted usage of CDL for MISR, but they have certain drawbacks. One such work discloses an image super-resolution analysis method based on a multi-modal convolutional sparse coding network which comprises learning five modules (one multi-modal convolutional sparse encoder, two side information encoders and two convolutional decoders) to generate a HR image of the target modality. This network has a large number of parameters that have to be trained and therefore requires a huge amount of training data and a longer training time.
SUMMARY
[006] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for multimodal image super-resolution using convolutional dictionary learning is provided. The method includes obtaining a plurality of training images comprising a set of low-resolution images 'X' of a target modality, a set of high-resolution images 'Y' of a guidance modality, and a set of high-resolution images 'Z' of the target modality. Further, the method includes initializing a plurality of dictionaries and associated plurality of sparse coefficients. The plurality of dictionaries comprise i) a first convolutional dictionary 'D' associated with a first set of sparse coefficients 'A' among the plurality of sparse coefficients, ii) a second convolutional dictionary 'G' associated with a second set of sparse coefficients 'B' among the plurality of sparse coefficients, iii) a first coupling convolutional dictionary 'W', and iv) a second coupling convolutional dictionary 'V'. Furthermore, the method includes jointly training the initialized plurality of dictionaries and the associated plurality of sparse coefficients using the plurality of training images by performing a plurality of steps iteratively until convergence of an objective function is achieved. The plurality of steps comprise training the plurality of sparse coefficients by keeping the plurality of dictionaries fixed; and training the plurality of dictionaries by keeping the plurality of sparse coefficients fixed. The trained plurality of dictionaries and the associated plurality of sparse coefficients obtained upon achieving the convergence of the objective function are used for performing a multimodal image super resolution.
[007] In another aspect, a system for multimodal image super-resolution using convolutional dictionary learning is provided. The system includes: a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to obtain a plurality of training images comprising a set of low-resolution images 'X' of a target modality, a set of high-resolution images 'Y' of a guidance modality, and a set of high-resolution images 'Z' of the target modality. Further, the one or more hardware processors are configured by the instructions to initialize a plurality of dictionaries and associated plurality of sparse coefficients. The plurality of dictionaries comprise i) a first convolutional dictionary 'D' associated with a first set of sparse coefficients 'A' among the plurality of sparse coefficients, ii) a second convolutional dictionary 'G' associated with a second set of sparse coefficients 'B' among the plurality of sparse coefficients, iii) a first coupling convolutional dictionary 'W', and iv) a second coupling convolutional dictionary 'V'. Furthermore, the one or more hardware processors are configured by the instructions to jointly train the initialized plurality of dictionaries and the associated plurality of sparse coefficients using the plurality of training images by performing a plurality of steps iteratively until convergence of an objective function is achieved. The plurality of steps comprise training the plurality of sparse coefficients by keeping the plurality of dictionaries fixed; and training the plurality of dictionaries by keeping the plurality of sparse coefficients fixed. The trained plurality of dictionaries and the associated plurality of sparse coefficients obtained upon achieving the convergence of the objective function are used for performing a multimodal image super resolution.
[008] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which, when executed by one or more hardware processors, cause a method for multimodal image super-resolution using convolutional dictionary learning. The method includes obtaining a plurality of training images comprising a set of low-resolution images 'X' of a target modality, a set of high-resolution images 'Y' of a guidance modality, and a set of high-resolution images 'Z' of the target modality. Further, the method includes initializing a plurality of dictionaries and associated plurality of sparse coefficients. The plurality of dictionaries comprise i) a first convolutional dictionary 'D' associated with a first set of sparse coefficients 'A' among the plurality of sparse coefficients, ii) a second convolutional dictionary 'G' associated with a second set of sparse coefficients 'B' among the plurality of sparse coefficients, iii) a first coupling convolutional dictionary 'W', and iv) a second coupling convolutional dictionary 'V'. Furthermore, the method includes jointly training the initialized plurality of dictionaries and the associated plurality of sparse coefficients using the plurality of training images by performing a plurality of steps iteratively until convergence of an objective function is achieved. The plurality of steps comprise training the plurality of sparse coefficients by keeping the plurality of dictionaries fixed; and training the plurality of dictionaries by keeping the plurality of sparse coefficients fixed. The trained plurality of dictionaries and the associated plurality of sparse coefficients obtained upon achieving the convergence of the objective function are used for performing a multimodal image super resolution.
[009] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[010] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[011] FIG. 1 illustrates an exemplary system for multimodal image super-resolution using convolutional dictionary learning, according to some embodiments of the present disclosure.
[012] FIG. 2 illustrates a flow diagram of a processor implemented method for multimodal image super-resolution using convolutional dictionary learning, according to some embodiments of the present disclosure.
[013] FIG. 3 is a block diagram of an example implementation of the method of FIG. 2, according to some embodiments of the present disclosure.
[014] FIG. 4 illustrates a convergence plot of an example implementation of the method of FIG. 2, according to some embodiments of the present disclosure.
[015] FIGS. 5A to 5D, collectively referred to as FIG. 5, illustrate examples of the trained plurality of dictionaries obtained from the method of FIG. 2, according to some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[016] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[017] In multi-modal imaging systems, different modalities often capture images of the same scene. While different sensors capture unique features, they still share some common features, for example, edges, texture, and shapes, that can be leveraged for super-resolution tasks. The objective of Multi-modal Image Super-resolution (MISR) is to reconstruct a High Resolution (HR) image Z of the target modality from a Low Resolution (LR) image X of the target modality with the guidance of an HR image Y from another modality, referred to as the guidance modality, by modelling the cross-modal dependencies between the different modalities. The existing techniques of MISR differ in the way they transfer the structural details of the guidance image to enhance the LR image of the target modality. Their performance depends on how well they are able to identify and model the complex dependencies between the two imaging modalities. To better handle the disparities between the guidance and target modalities, learning-based methods like deep learning, and sparse representation learning employing dictionaries and transforms, are more popular. In general, deep learning methods require more training data and massive compute resources for good reconstruction. Also, they lack interpretability, and the trained models are not guaranteed to enforce measurement consistency between the inputs and the output during testing. In contrast, sparse representation learning-based methods do not suffer from these drawbacks and offer improved performance compared to deep learning techniques, especially with limited training data.
[018] However, sparse representation learning based techniques also have certain constraints. The learned dictionaries are inherently not translation invariant. Additionally, as these dictionaries operate on individual patches rather than the entire image, they reconstruct and sparsify patches independently, due to which the underlying structure of the signal may get lost. Both these limitations impact the quality of the reconstructed image. Thus, the present disclosure provides a method and system for multi-modal image super-resolution using convolutional dictionaries, which are translation invariant. The convolutional dictionaries consist of a set of M filters and associated sparse coefficients that are learned to capture features (such as edges, corners, color gradients, boundaries, etc.) of input images from different modalities. As understood by a person skilled in the art, in convolutional dictionary learning, a signal (i.e., an image, in the context of the present disclosure) $\{x_s\}_{s=1}^{S}$ with S measurements, each of N dimensions, is reconstructed using a set of M linear filters $\{d_m\}_{m=1}^{M}$ and an associated set of coefficients $\{a_{m,s}\}_{m=1}^{M}$ by optimizing equation 1.

$\min_{d,a} \frac{1}{2}\sum_{s=1}^{S}\left\|x_s - \sum_{m=1}^{M} d_m * a_{m,s}\right\|_2^2 + \lambda \sum_{m,s}\left\|a_{m,s}\right\|_1, \;\; \text{subject to } \|d_m\|_2 = 1 \;\; \forall m$ ..... (1)

In equation 1, $*$ denotes the convolution operation, the $\ell_1$ norm on $a_{m,s}$ is used to enforce sparsity, and the $\ell_2$ norm constraint on $d_m$ is employed to compensate for the scaling ambiguity between dictionary atoms and the coefficients. Defining $D_m$ as a linear operator such that $D_m a_{m,s} = d_m * a_{m,s}$, and taking $D = (D_1, \ldots, D_M)$, $X = (x_1, \ldots, x_S)$ and $A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,S} \\ \vdots & & \vdots \\ a_{M,1} & \cdots & a_{M,S} \end{pmatrix}$, equation 1 is rewritten as equation 2.

$\min_{D,A} \frac{1}{2}\|X - DA\|_2^2 + \lambda\|A\|_1$ ..... (2)

The optimization problem in equation 2 is not jointly convex in both the dictionary filters and the coefficients. Hence, the Alternating Minimization (AM) technique is employed to estimate them. While the convolutional sparse coefficients can be solved by the Alternating Direction Method of Multipliers (ADMM) in the DFT domain, the convolutional dictionaries can be solved using the Convolutional Constrained Method of Optimal Directions (CCMOD).
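For orientation only, the following is a minimal sketch of solving the single-modality CDL problem of equations 1 and 2 with the Python port of the SPORCO library (the experiments in this disclosure use SPORCO in MATLAB); the data shapes, filter counts, and option values below are illustrative assumptions, not part of the disclosed method.

```python
import numpy as np
from sporco.dictlrn import cbpdndl  # alternating CSC (ADMM) + CCMOD solver

# Placeholder training set: S = 10 images of size 256 x 256, stacked on axis 2.
S_train = np.random.rand(256, 256, 10).astype(np.float32)

# Initial dictionary: M = 4 filters of atom size 8 x 8 (the sizes reported in
# the experiments of this disclosure), drawn from a uniform distribution.
D0 = np.random.rand(8, 8, 4).astype(np.float32)

lmbda = 0.1  # sparsity weight (lambda in equations 1 and 2)
opt = cbpdndl.ConvBPDNDictLearn.Options({'MaxMainIter': 50})

# Alternating minimization: ADMM for the coefficients, CCMOD for the filters.
learner = cbpdndl.ConvBPDNDictLearn(D0, S_train, lmbda, opt)
learner.solve()
D_learned = learner.getdict()   # trained filters d_m (unit l2 norm)
A_learned = learner.getcoef()   # sparse coefficient maps a_{m,s}
```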
[019] Embodiments of the present disclosure provide a method and system for multi-modal image super-resolution using convolutional dictionaries. Initially, a plurality of training images are obtained. The plurality of training images comprise a set of low-resolution (LR) images 'X' of a target modality, a set of high-resolution (HR) images 'Y' of a guidance modality, and a set of high-resolution (HR) images 'Z' of the target modality. Then, a plurality of dictionaries and associated plurality of sparse coefficients are initialized. The plurality of dictionaries comprise i) a first convolutional dictionary 'D' associated with a first set of sparse coefficients 'A' among the plurality of sparse coefficients; the first convolutional dictionary comprises M filters to extract features from the set of low-resolution images of the target modality; ii) a second convolutional dictionary 'G' associated with a second set of sparse coefficients 'B' among the plurality of sparse coefficients; the second convolutional dictionary comprises M filters to extract features from the set of high-resolution images of the guidance modality; iii) a first coupling convolutional dictionary 'W'; and iv) a second coupling convolutional dictionary 'V'. The first and second coupling convolutional dictionaries model the relationship between the target and guidance modalities. Initializing the dictionaries and coefficients provides a starting point for the training process. The initialized plurality of dictionaries and associated plurality of sparse coefficients are jointly trained (updated) iteratively using the plurality of training images until convergence of an objective function is achieved. At each iteration, the sparse coefficients are trained by keeping the dictionaries fixed, and then the dictionaries are trained by keeping the sparse coefficients fixed. The trained plurality of dictionaries and the associated plurality of sparse coefficients obtained upon achieving the convergence of the objective function are used for performing multimodal image super resolution.
[020] When a new low-resolution image of the target modality and a high-resolution image of the guidance modality are obtained, the plurality of sparse coefficients are computed based on the trained plurality of dictionaries and the obtained images. A first set of coefficients A is computed based on the trained first convolutional dictionary D and the low-resolution image of the target modality by using a standard convolutional sparse coding update. Similarly, a second set of coefficients B is computed based on the trained second convolutional dictionary G and the high-resolution image of the guidance modality by using the standard convolutional sparse coding update. Then, a high-resolution image $Z_{test}$ of the target modality is generated using the trained first coupling convolutional dictionary, the trained second coupling convolutional dictionary, the first set of test coefficients, and the second set of test coefficients as $Z_{test} = W A_{test} + V B_{test}$.
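The inference just described admits a short sketch; this is only an illustrative assumption of how the pieces fit together (SPORCO's ConvBPDN stands in for the "standard convolutional sparse coding update", and the synthesis $Z_{test} = W A_{test} + V B_{test}$ is written out as a sum of per-filter convolutions):

```python
import numpy as np
from scipy.signal import fftconvolve
from sporco.admm import cbpdn  # standard convolutional sparse coding (CSC)

def super_resolve(x_test, y_test, D, G, W, V, lam_a=0.1, lam_b=0.1):
    """Hypothetical inference step. x_test: LR target image; y_test: HR
    guidance image; D, G, W, V: trained filter stacks of shape (k, k, M)."""
    opt = cbpdn.ConvBPDN.Options({'MaxMainIter': 100})
    a = cbpdn.ConvBPDN(D, x_test, lam_a, opt).solve()  # A_test (equation 15)
    b = cbpdn.ConvBPDN(G, y_test, lam_b, opt).solve()  # B_test (equation 16)
    # Z_test = W * A_test + V * B_test, filter by filter.
    z = np.zeros_like(x_test, dtype=np.float64)
    for m in range(D.shape[-1]):
        z += fftconvolve(a[..., m].squeeze(), W[..., m], mode='same')
        z += fftconvolve(b[..., m].squeeze(), V[..., m], mode='same')
    return z
```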
[021] Referring now to the drawings, and more particularly to FIG. 1
through 5, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[022] FIG. 1 illustrates an exemplary system for multimodal image super-resolution using convolutional dictionary learning, according to some embodiments of the present disclosure. In an embodiment, the system 100 includes one or more processors (104), communication interface device(s) (106) or Input/Output (I/O) interface(s) (106) or user interface (106), and one or more data storage devices or memory (102) operatively coupled to the one or more processors (104). The one or more processors (104), which are hardware processors, can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud, and the like.
[023] The I/O interface device(s) (106) can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like, and can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) (106) receives the low-resolution image of the target modality and the high-resolution image of the guidance modality as input and provides the high-resolution image of the target modality as output. The memory (102) may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. Functions of the components of system 100 are explained in conjunction with the flow diagram depicted in FIG. 2 and the examples illustrated in FIGS. 3 and 5 for multimodal image super-resolution using convolutional dictionary learning.
[024] In an embodiment, the system 100 comprises one or more data storage devices or the memory (102) operatively coupled to the processor(s) (104) and is configured to store instructions for execution of steps of the method (200) depicted in FIG. 2 by the processor(s) or one or more hardware processors (104). The steps of the method of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1, the steps of the flow diagram as depicted in FIG. 2, and the examples illustrated in FIGS. 3 and 5. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods, and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[025] FIG. 2 illustrates a flow diagram of a processor implemented method (200) for multimodal image super-resolution using convolutional dictionary learning, according to some embodiments of the present disclosure. At step 202 of the method 200, the one or more hardware processors are configured to obtain a plurality of training images ($s = 1, 2, \ldots, S$ training images) comprising a set of LR images 'X' of a target modality, a set of HR images 'Y' of a guidance modality, and a set of HR images 'Z' of the target modality. For example, the target modality is NIR (Near InfraRed) or MS (Multispectral) and the guidance modality is RGB. Further, at step 204 of the method 200, the one or more hardware processors are configured to initialize a plurality of dictionaries and associated plurality of sparse coefficients. The plurality of dictionaries have M filters each to extract features from images of different modalities. The plurality of dictionaries comprise i) a first convolutional dictionary 'D' (mathematically represented as $\{d_m\}_{m=1}^{M}$) associated with a first set of sparse coefficients 'A' ($\{a_{m,s}\}_{m=1}^{M}$), ii) a second convolutional dictionary 'G' ($\{g_m\}_{m=1}^{M}$) associated with a second set of sparse coefficients 'B' ($\{b_{m,s}\}_{m=1}^{M}$), iii) a first coupling convolutional dictionary 'W' ($\{w_m\}_{m=1}^{M}$), and iv) a second coupling convolutional dictionary 'V' ($\{v_m\}_{m=1}^{M}$). Initializing the dictionaries and coefficients provides a starting point for the training process. In an embodiment, all the dictionary filters are initialized with a random matrix with real numbers between 0 and 1 drawn from a uniform distribution. The coefficients are initialized with a matrix of 0's. Different initialization techniques may be used in alternate embodiments. The first convolutional dictionary comprises M filters to extract features from the low-resolution images of the target modality. The second convolutional dictionary comprises M filters to extract features from the high-resolution images of the guidance modality. The first coupling convolutional dictionary and the second coupling convolutional dictionary are used to model the relationship between the target and guidance modalities, i.e., between the first set of sparse coefficients and the second set of sparse coefficients, such that equation 3 is satisfied. In equation 3, $z_s$ is an image from the set of HR images 'Z' of the target modality.

$z_s = \sum_{m=1}^{M} w_m * a_{m,s} + \sum_{m=1}^{M} v_m * b_{m,s}$ ..... (3)
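A minimal initialization sketch matching this embodiment (uniform random filters, zero coefficients); the shapes are assumptions chosen to match the atom size and filter counts reported later in the experiments:

```python
import numpy as np

def init_dictionaries(M=4, k=8, img_shape=(256, 256), S=10, seed=0):
    """Initialize the filter banks D, G, W, V with uniform random entries in
    [0, 1) and the coefficient maps A, B with zeros, as described above.
    Other initialization techniques may be used in alternate embodiments."""
    rng = np.random.default_rng(seed)
    dicts = {name: rng.uniform(0.0, 1.0, size=(k, k, M))
             for name in ('D', 'G', 'W', 'V')}
    coeffs = {name: np.zeros(img_shape + (M, S))  # one map per filter, image
              for name in ('A', 'B')}
    return dicts, coeffs
```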
[026] Once the plurality of dictionaries and associated plurality of sparse coefficients are initialized, at step 206 of the method 200, the one or more hardware processors are configured to jointly train (or update) the initialized plurality of dictionaries and the associated plurality of sparse coefficients using the plurality of training images by performing steps 206A and 206B iteratively until convergence of an objective function is achieved. The objective function is given by equation 4. Convergence of the objective function is achieved when the loss given by equation 4 does not change significantly over subsequent iterations, with the absolute value of the change being less than an empirically calculated threshold value. The trained plurality of dictionaries and the associated plurality of sparse coefficients obtained upon achieving the convergence of the objective function are used for performing a multimodal image super resolution.
$\min_{D,G,W,V,A,B} \frac{1}{2}\|X - DA\|_2^2 + \frac{1}{2}\|Y - GB\|_2^2 + \frac{\mu}{2}\|Z - WA - VB\|_2^2 + \lambda_A\|A\|_1 + \lambda_B\|B\|_1$ ..... (4)

Equation 4 is derived from equation 5 by using the block structured notations for dictionaries and coefficients as defined in equation 2 and considering $D_m$, $G_m$, $W_m$, $V_m$ as linear operators such that $D_m a_{m,s} = d_m * a_{m,s}$, $G_m b_{m,s} = g_m * b_{m,s}$, $W_m a_{m,s} = w_m * a_{m,s}$ and $V_m b_{m,s} = v_m * b_{m,s}$.

$\min \frac{1}{2}\sum_s \left\|x_s - \sum_m d_m * a_{m,s}\right\|_2^2 + \frac{1}{2}\sum_s \left\|y_s - \sum_m g_m * b_{m,s}\right\|_2^2 + \frac{\mu}{2}\sum_s \left\|z_s - \sum_m w_m * a_{m,s} - \sum_m v_m * b_{m,s}\right\|_2^2 + \lambda_A \sum_{m,s}\|a_{m,s}\|_1 + \lambda_B \sum_{m,s}\|b_{m,s}\|_1$, subject to $\|d_m\|_2^2 = 1, \|g_m\|_2^2 = 1, \|w_m\|_2^2 = 1, \|v_m\|_2^2 = 1$ ..... (5)

[027] The first two terms in equations 4 and 5 ensure that each of the dictionary filters and coefficients is learnt in such a way that they reconstruct the images of the respective modality well. The third term defines the coupling between the coefficients of the different image modalities to reconstruct the HR image of the target modality. The remaining terms constrain the learned coefficients to be sparse.
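As a concreteness aid, a hedged sketch of evaluating the loss of equation 4 for the convergence test of paragraph [026]; applying a block operator such as DA is written out as an explicit sum of per-filter convolutions, and the weight mu on the coupling term is carried over from equations 6 and 7 (default values below are assumptions):

```python
import numpy as np
from scipy.signal import fftconvolve

def conv_synth(filters, coeffs):
    """Apply a block operator such as DA: sum_m d_m * a_{m,s}, per image s.
    filters: (k, k, M); coeffs: (n0, n1, M, S); returns (n0, n1, S)."""
    out = np.zeros(coeffs.shape[:2] + (coeffs.shape[-1],))
    for s in range(coeffs.shape[-1]):
        for m in range(filters.shape[-1]):
            out[..., s] += fftconvolve(coeffs[..., m, s], filters[..., m],
                                       mode='same')
    return out

def objective(X, Y, Z, D, G, W, V, A, B, mu=1.0, lam_a=0.1, lam_b=0.1):
    """Loss of equation 4; training stops when successive values differ by
    less than an empirically chosen threshold (paragraph [026])."""
    data = 0.5 * np.sum((X - conv_synth(D, A)) ** 2)
    guide = 0.5 * np.sum((Y - conv_synth(G, B)) ** 2)
    couple = 0.5 * mu * np.sum((Z - conv_synth(W, A) - conv_synth(V, B)) ** 2)
    sparse = lam_a * np.abs(A).sum() + lam_b * np.abs(B).sum()
    return data + guide + couple + sparse
```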
[028] At step 206A, the plurality of sparse coefficients are trained (alternatively referred to as learnt or updated) by keeping the plurality of dictionaries fixed. The first set of sparse coefficients is updated based on the set of LR images of the target modality, the set of HR images of the target modality, the first convolutional dictionary, the first coupling convolutional dictionary, the second coupling convolutional dictionary, and the second set of sparse coefficients by solving equation 6 using the Alternating Direction Method of Multipliers (ADMM) technique. The second set of sparse coefficients is updated based on the set of HR images of the guidance modality, the set of HR images of the target modality, the second convolutional dictionary, the first coupling convolutional dictionary, the second coupling convolutional dictionary, and the first set of sparse coefficients by solving equation 7 using the ADMM technique.

$\min_A \frac{1}{2}\|X - DA\|_2^2 + \frac{\mu}{2}\|Z' - WA\|_2^2 + \lambda_A\|A\|_1$, wherein $Z' = Z - VB$ ..... (6)

$\min_B \frac{1}{2}\|Y - GB\|_2^2 + \frac{\mu}{2}\|Z' - VB\|_2^2 + \lambda_B\|B\|_1$, wherein $Z' = Z - WA$ ..... (7)
[029] Applying ADMM to equation 6 using variable splitting, by introducing an auxiliary variable $\tilde{A}$ constrained to be equal to the primary variable $A$, results in equation 8.

$\min_{A,\tilde{A}} \frac{1}{2}\|X - DA\|_2^2 + \frac{\mu}{2}\|Z' - WA\|_2^2 + \lambda_A\|\tilde{A}\|_1 \;\; \text{subject to } A = \tilde{A}$ ..... (8)

With $U_A$ as the dual variable, the ADMM iterations are given by equations 9 to 11, where $\rho$ controls the convergence rate.

$A^{k+1} = \arg\min_A \frac{1}{2}\|X - DA\|_2^2 + \frac{\mu}{2}\|Z' - WA\|_2^2 + \frac{\rho}{2}\|A - \tilde{A}^k + U_A^k\|_2^2$ ..... (9)

$\tilde{A}^{k+1} = \arg\min_{\tilde{A}} \frac{\rho}{2}\|A^{k+1} - \tilde{A} + U_A^k\|_2^2 + \lambda_A\|\tilde{A}\|_1$ ..... (10)

$U_A^{k+1} = U_A^k + A^{k+1} - \tilde{A}^{k+1}$ ..... (11)

Taking $T_A = \tilde{A}^k - U_A^k$, the solution of $A$ is obtained by taking the derivative of equation 9 with respect to $A$ and equating it to zero, which results in equation 12.

$(D^T D + \mu W^T W + \rho I)A = D^T X + \mu W^T Z' + \rho T_A$ ..... (12)

Applying the Discrete Fourier Transform (DFT) to equation 12 results in equation 13, where $\hat{\;}$ indicates the DFT of the respective parameter.

$(\hat{D}^H \hat{D} + \mu \hat{W}^H \hat{W} + \rho I)\hat{A} = \hat{D}^H \hat{X} + \mu \hat{W}^H \hat{Z}' + \rho \hat{T}_A$ ..... (13)

The solution to equation 13 is obtained using the iterated Sherman-Morrison algorithm. The solution to equation 10 is obtained by soft-thresholding, similar to the work in [Fangyuan Gao et al., "Multi-modal convolutional dictionary learning," IEEE Transactions on Image Processing, vol. 31, pp. 1325–1339, 2022], and equation 11 is solved by simple arithmetic operations. Similarly, the solution to equation 7 is given by equation 14, where $T_B = \tilde{B}^k - U_B^k$.

$(\hat{G}^H \hat{G} + \mu \hat{V}^H \hat{V} + \rho I)\hat{B} = \hat{G}^H \hat{Y} + \mu \hat{V}^H \hat{Z}' + \rho \hat{T}_B$ ..... (14)

The auxiliary variable $\tilde{B}$ and dual variable $U_B$ associated with the second set of coefficients $B$ follow standard updates similar to equations 10 and 11.
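A compact sketch of one ADMM iteration for A (equations 9 to 11), written for 1-D signals so the per-frequency normal equations of equation 13 can be shown as small dense solves; for the small M used here a direct solver replaces the iterated Sherman-Morrison algorithm named above, and all names and shapes are illustrative assumptions:

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding: the closed-form solution of equation 10."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_iter_A(X, Zp, D, W, a_tilde, u, mu, rho, lam_a):
    """One ADMM iteration for equation 6 (1-D signals of length n; images use
    2-D FFTs the same way).
    X, Zp: (n, S) arrays holding X and Z' = Z - VB.
    D, W:  (n, M) zero-padded filters.
    a_tilde, u: (n, M, S) auxiliary and scaled dual variables."""
    n, M, S = a_tilde.shape
    Xf, Zf = np.fft.fft(X, axis=0), np.fft.fft(Zp, axis=0)
    Df, Wf = np.fft.fft(D, axis=0), np.fft.fft(W, axis=0)
    Tf = np.fft.fft(a_tilde - u, axis=0)          # DFT of T_A = A~^k - U_A^k
    Af = np.empty((n, M, S), dtype=complex)
    for p in range(n):                            # equation 13, per frequency
        d, w = Df[p][:, None], Wf[p][:, None]     # (M, 1) filter responses
        lhs = d.conj() @ d.T + mu * (w.conj() @ w.T) + rho * np.eye(M)
        rhs = d.conj() * Xf[p] + mu * w.conj() * Zf[p] + rho * Tf[p]
        Af[p] = np.linalg.solve(lhs, rhs)
    a = np.fft.ifft(Af, axis=0).real              # coefficient update (eq. 9)
    a_tilde_new = soft(a + u, lam_a / rho)        # sparsifying step (eq. 10)
    u_new = u + a - a_tilde_new                   # dual ascent (eq. 11)
    return a, a_tilde_new, u_new
```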
[030] Once the plurality of sparse coefficients are updated, at step 206B, the plurality of dictionaries are trained by keeping the plurality of sparse coefficients fixed. The first convolutional dictionary is updated based on the set of LR images of the target modality using the Convolutional Constrained Method of Optimal Directions (CCMOD) technique. The second convolutional dictionary is updated based on the set of HR images of the guidance modality using the CCMOD technique. The first coupling convolutional dictionary is updated based on the set of HR images of the target modality, the first set of sparse coefficients, the second set of sparse coefficients and the second coupling convolutional dictionary by converting into a standard Convolutional Dictionary Learning (CDL) problem, i.e., by minimizing $\frac{1}{2}\|Z' - WA\|_2^2$, wherein $Z' = Z - VB$. Similarly, the second coupling convolutional dictionary is updated based on the set of HR images of the target modality, the first set of sparse coefficients, the second set of sparse coefficients and the first coupling convolutional dictionary by converting into a standard CDL problem, i.e., by minimizing $\frac{1}{2}\|Z' - VB\|_2^2$, wherein $Z' = Z - WA$. The plurality of dictionaries obtained after training are used to perform MISR.
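For illustration, a sketch of the first coupling-dictionary update on the residual Z' = Z - VB; as a simplification of the CCMOD solver named above, this version solves the unconstrained least squares in the DFT domain and then projects each filter onto its support with unit-norm renormalization (again 1-D for readability, with assumed shapes). The second coupling dictionary V is updated symmetrically on Z' = Z - WA.

```python
import numpy as np

def update_W(Zp, A, k, eps=1e-6):
    """Approximately minimize (1/2)||Z' - WA||_2^2 over W (a stand-in for the
    constrained CCMOD update). Zp: (n, S) residual Z' = Z - VB;
    A: (n, M, S) coefficient maps; k: filter support length."""
    n, M, S = A.shape
    Zf, Af = np.fft.fft(Zp, axis=0), np.fft.fft(A, axis=0)
    Wf = np.empty((n, M), dtype=complex)
    for p in range(n):                     # per-frequency least squares
        Ap = Af[p]                         # (M, S) coefficient responses
        lhs = Ap.conj() @ Ap.T + eps * np.eye(M)   # small ridge for stability
        rhs = Ap.conj() @ Zf[p]
        Wf[p] = np.linalg.solve(lhs, rhs)
    W = np.fft.ifft(Wf, axis=0).real
    W[k:, :] = 0.0                         # project onto k-sample support
    W /= np.maximum(np.linalg.norm(W, axis=0, keepdims=True), 1e-12)  # ||w||=1
    return W                               # (n, M) zero-padded filters w_m
```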
[031] FIGS. 5A to 5D, collectively referred to as FIG. 5, illustrate examples of the trained plurality of dictionaries obtained from the method of FIG. 2, according to some embodiments of the present disclosure. FIG. 5A illustrates the first convolutional dictionary $\{d_m\}_{m=1}^{M}$ that is trained to extract features from images of the target modality (Multispectral (MS) images in this example). FIG. 5B illustrates the second convolutional dictionary $\{g_m\}_{m=1}^{M}$ that is trained to extract features from images of the guidance modality (RGB (Red, Green, Blue) images in this example). FIGS. 5C and 5D illustrate the first coupling convolutional dictionary $\{w_m\}_{m=1}^{M}$ and the second coupling convolutional dictionary $\{v_m\}_{m=1}^{M}$, respectively, that are trained to model the relationship between the target and guidance modalities. FIG. 5 illustrates each of the trained plurality of dictionaries employing different numbers of filters M = 4, 8 and 12 of atom size k = 8×8 on RGB-MS data. Low-resolution MS images are known to be dominated by low-frequency components, resulting in a smoothed dictionary $d_m$ emphasizing low frequencies for the LR target modality. On the other hand, high-resolution guidance modality (RGB) images contain both high- and low-frequency components, which enables the learned filters $g_m$ to accommodate a wider frequency range. A high correlation between $d_m$ and $w_m$, as well as between $g_m$ and the corresponding $v_m$, can be observed from FIGS. 5A, 5C and FIGS. 5B, 5D, respectively. This correlation arises from the shared sets of sparse codes (coefficients) $a_{m,s}$ and $b_{m,s}$, respectively. Also, it can be observed that the learned coupling convolutional dictionaries effectively combine the high-frequency information from the HR guidance and the low-frequency information from the LR target to synthesize the HR target image.
[032] The plurality of dictionaries obtained upon achieving the convergence of the objective function are used for performing multimodal image super resolution on a new low-resolution image $X_{test}$ of the target modality by using a HR image $Y_{test}$ of the guidance modality. Initially, a first set of test coefficients $A_{test}$ is computed based on the learned first convolutional dictionary D and the LR image of the target modality $X_{test}$ following a standard convolutional sparse coding update by solving equation 15.

$\min_{A_{test}} \frac{1}{2}\|X_{test} - D A_{test}\|_2^2 + \lambda_A\|A_{test}\|_1$ ..... (15)

Similarly, a second set of test coefficients $B_{test}$ is computed based on the learned second convolutional dictionary G and the HR image of the guidance modality $Y_{test}$ following the standard convolutional sparse coding update by solving equation 16.

$\min_{B_{test}} \frac{1}{2}\|Y_{test} - G B_{test}\|_2^2 + \lambda_B\|B_{test}\|_1$ ..... (16)

Finally, a high-resolution image $Z_{test}$ of the target modality is generated using the learned first coupling convolutional dictionary W, the second coupling convolutional dictionary V, the first set of test coefficients, and the second set of test coefficients as $Z_{test} = W A_{test} + V B_{test}$.
FIG. 3 is a block diagram of an example implementation of the method of FIG. 2,
according to some embodiments of the present disclosure. Images of a scene in the target modality (MS) and guidance modality (RGB) are obtained. Then, a first set of coefficients A and a second set of coefficients B are computed using the trained first convolutional dictionary D (illustrated in FIG. 5A) and the second convolutional dictionary G (illustrated in FIG. 5B), respectively. Finally, a high-resolution image of the target modality is generated using the computed first set of coefficients A, the second set of coefficients B, the trained first coupling convolutional dictionary (illustrated in FIG. 5C) and the trained second coupling convolutional dictionary (illustrated in FIG. 5D).

EXPERIMENTAL RESULTS
[033] Data Description: The datasets used for evaluating the performance of the method 200 are: (1) the RGB-Multispectral dataset [Ayan Chakrabarti and Todd Zickler, "Statistics of real-world hyperspectral images," in CVPR 2011. IEEE, 2011, pp. 193–200] and (2) the RGB-NIR dataset [Matthew Brown and Sabine Susstrunk, "Multi-spectral SIFT for scene category recognition," in CVPR 2011. IEEE, 2011, pp. 177–184]. In both datasets, the RGB image is considered as the guidance modality. The multispectral image is considered as the target modality in the first dataset, and the NIR (Near InfraRed) image is considered as the target modality in the second dataset. The two datasets contain the HR images of both the guidance (Y) and target (Z) modalities. For experimentation purposes, the LR image of the target modality is generated by downsizing the HR image by a required factor and then applying bicubic interpolation on the downsampled image to upscale it by the same factor. A factor of 4 is considered for downsampling both RGB/Multispectral and RGB/NIR for a fair comparison with the benchmark techniques. Here, the RGB image used as the guidance modality is converted to grayscale. Also, the multispectral image at 640 nm is considered in the experiments for the RGB-Multispectral dataset.
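A brief sketch of this LR simulation protocol (downsample by a factor, then bicubic upsampling back to the original size); Pillow and its bicubic resampler are assumed tooling, not named in the disclosure:

```python
import numpy as np
from PIL import Image

def simulate_lr(hr, factor=4):
    """Downsize the HR image by `factor`, then upscale back to the original
    size with bicubic interpolation, as described above."""
    img = Image.fromarray(hr)
    small = img.resize((img.width // factor, img.height // factor),
                       Image.BICUBIC)
    lr = small.resize((img.width, img.height), Image.BICUBIC)
    return np.asarray(lr)
```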
[034] Benchmark Methods: The method 200 is compared with eight state-of-the-art MISR techniques including:
1) Guided image Filtering (GF [Kaiming He, Jian Sun, and Xiaoou Tang, "Guided image filtering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, pp. 1397–1409, 2013]),
2) Joint Bilateral Filtering (JBF [Johannes Kopf, Michael F Cohen, Dani Lischinski, and Matt Uyttendaele, "Joint bilateral upsampling," ACM Transactions on Graphics (ToG), vol. 26, no. 3, pp. 96–es, 2007]),
3) Joint image Restoration (JR [Xiaoyong Shen, Qiong Yan, Li Xu, Lizhuang Ma, and Jiaya Jia, "Multispectral joint image restoration via optimizing a scale map," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 12, pp. 2518–2530, 2015]),
4) Deep Joint image Filtering (DJF [Yijun Li, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang, "Deep joint image filtering," in European Conference on Computer Vision. Springer, 2016, pp. 154–169]),
5) Coupled Dictionary Learning (Coupled DL [Pingfan Song, Xin Deng, Joao FC Mota, Nikos Deligiannis, Pier Luigi Dragotti, and Miguel RD Rodrigues, "Multimodal image super-resolution via joint sparse representations induced by coupled dictionaries," IEEE Transactions on Computational Imaging, vol. 6, pp. 57–72, 2019]),
6) shallow variant of Joint Coupled Transform Learning (JCTL [Andrew Gigie, A Anil Kumar, Angshul Majumdar, Kriti Kumar, and M Girish Chandra, "Joint coupled transform learning framework for multimodal image super-resolution," in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021, pp. 1640–1644]),
7) Joint Coupled Deep Transform Learning (JCDTL [R Krishna Kanth, Andrew Gigie, Kriti Kumar, A Anil Kumar, Angshul Majumdar, and Balamuralidhar P, "Multi-modal image super-resolution with joint coupled deep transform learning," in 2022 30th European Signal Processing Conference (EUSIPCO), 2022, pp. 474–478]), and
8) Multi-modal Convolutional Dictionary Learning (MCDL [Fangyuan Gao, Xin Deng, Mai Xu, Jingyi Xu, and Pier Luigi Dragotti, "Multi-modal convolutional dictionary learning," IEEE Transactions on Image Processing, vol. 31, pp. 1325–1339, 2022]).
[035] Results: The reconstruction quality of the HR image of the target modality is assessed using the Structural SIMilarity (SSIM) and Peak Signal to Noise Ratio (PSNR) metrics. The benchmark methods are trained on 31 image pairs for RGB/NIR and 35 image pairs for RGB/Multispectral. Each 512 × 512 image is divided into non-overlapping patches of size 16 × 16. During testing, the patches of the test image are reconstructed individually and combined to create the full image. On the other hand, the techniques using convolutional dictionary learning (method 200 and MCDL) consider non-overlapping patches of size 256 × 256 for training, and testing is conducted on the full image. The method 200 is trained only on 10 image pairs for each dataset instead of the 31 and 35 image pairs, respectively, considered for the benchmark methods. The hyperparameters associated with all the techniques are tuned using grid search. The method 200 is implemented using the SPORCO library in MATLAB. The results for method 200 and MCDL are reported using 4 filters of size 8 × 8, which gave optimal performance for both datasets.
Table 1: RGB/NIR Dataset (4x upsampling). Each cell reports PSNR / SSIM.

| Method | Indoor 4 | Indoor 5 | Indoor 11 | Indoor 16 | Indoor 21 |
|---|---|---|---|---|---|
| Method 200 | 31.423 / 0.925 | 32.263 / 0.933 | 31.387 / 0.908 | 32.128 / 0.930 | 31.257 / 0.911 |
| MCDL | 28.932 / 0.902 | 28.307 / 0.908 | 28.874 / 0.842 | 30.601 / 0.914 | 28.787 / 0.865 |
| JCDTL | 28.783 / 0.899 | 30.414 / 0.915 | 30.384 / 0.903 | 32.018 / 0.916 | 29.442 / 0.893 |
| JCTL | 27.961 / 0.915 | 30.580 / 0.937 | 26.808 / 0.893 | 30.438 / 0.925 | 26.554 / 0.895 |
| Coupled DL | 30.629 / 0.942 | 29.865 / 0.951 | 28.879 / 0.896 | 33.070 / 0.950 | 29.668 / 0.915 |
| DJF | 26.958 / 0.898 | 27.804 / 0.899 | 27.015 / 0.817 | 30.149 / 0.890 | 27.619 / 0.853 |
| JR | 22.271 / 0.841 | 25.076 / 0.939 | 22.864 / 0.815 | 23.502 / 0.867 | 21.626 / 0.794 |
| GF | 29.854 / 0.946 | 32.058 / 0.971 | 27.589 / 0.901 | 31.916 / 0.938 | 27.133 / 0.909 |
| JBF | 26.354 / 0.919 | 31.283 / 0.968 | 26.480 / 0.906 | 30.431 / 0.929 | 25.746 / 0.902 |
Table 2: RGB/Multispectral Dataset (4x upsampling). Each cell reports PSNR / SSIM.

| Method | Imge6 | Imge7 | Imgf5 | Imgf7 | Imgh3 |
|---|---|---|---|---|---|
| Method 200 | 31.779 / 0.856 | 36.732 / 0.928 | 38.984 / 0.954 | 34.830 / 0.907 | 40.361 / 0.961 |
| MCDL | 25.477 / 0.605 | 28.955 / 0.709 | 33.426 / 0.880 | 27.894 / 0.722 | 35.564 / 0.896 |
| JCDTL | 31.441 / 0.841 | 35.496 / 0.899 | 37.522 / 0.947 | 33.339 / 0.888 | 39.403 / 0.948 |
| JCTL | 28.793 / 0.814 | 32.669 / 0.889 | 36.277 / 0.939 | 31.964 / 0.864 | 37.140 / 0.941 |
| Coupled DL | 31.049 / 0.835 | 33.222 / 0.877 | 34.239 / 0.906 | 31.401 / 0.878 | 36.107 / 0.920 |
| DJF | 20.968 / 0.828 | 26.732 / 0.938 | 32.588 / 0.824 | 23.851 / 0.902 | 30.788 / 0.924 |
| JR | 26.519 / 0.814 | 32.781 / 0.889 | 33.933 / 0.890 | 29.295 / 0.804 | 33.999 / 0.922 |
| GF | 25.332 / 0.774 | 29.709 / 0.869 | 31.706 / 0.901 | 28.045 / 0.880 | 33.518 / 0.807 |
| JBF | 25.535 / 0.746 | 29.655 / 0.799 | 32.411 / 0.886 | 28.874 / 0.839 | 34.461 / 0.902 |
[036] Tables 1 and 2 summarize the MISR results for the first and second datasets, respectively, obtained with 5 test image pairs. It can be observed that the joint learning-based approaches (Coupled DL, JCTL, JCDTL, and the method 200) display superior performance compared to the other filtering-based (DJF, JR, GF, and JBF) and two-stage (MCDL) approaches for most of the images. This is because joint learning enables effective learning of discriminative and common features or representations from each modality (LR image of the target and HR image of the guidance) that assists in improved reconstruction of the HR image of the target modality. Among the joint learning methods, the method 200 shows improved reconstruction compared to the other benchmark methods for most of the images, despite training with limited data. This demonstrates the potential of using shift-invariant dictionaries, i.e., convolutional dictionaries, in learning representations for robust and effective modelling for MISR. It is important to note that, compared to the two-stage CDL-based MCDL approach, which requires learning 6 convolutional dictionaries and 3 associated sets of sparse coefficients (1 common and 2 unique to the respective modalities), the method 200 requires learning only 4 convolutional dictionaries and 2 associated sets of sparse coefficients. Thus, even with reduced complexity, the disclosed method provides improved performance over the MCDL approach, as shown in Tables 1 and 2. The convergence plot of the method 200 is given in FIG. 4, which shows that the method converges within a few iterations. The method 200 took approximately 10 minutes for training over 10 image pairs with 50 iterations and approximately 0.9 seconds for testing on one image pair on an AMD Ryzen 5 4500U CPU @ 2.3 GHz with 16 GB RAM, without a GPU. In contrast, existing deep learning methods, like DJF, require approximately 2 hours to train and 2 seconds for testing, despite using a GPU accelerator.
[037] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[038] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[039] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[040] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words "comprising," "having," "containing," and "including," and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise.
[041] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term "computer-readable medium" should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[042] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor implemented method (200), comprising:
obtaining (202), via one or more hardware processors, a plurality of training images comprising a set of low-resolution images 'X' of a target modality, a set of high-resolution images 'Y' of a guidance modality, and a set of high-resolution images 'Z' of the target modality;
initializing (204), via the one or more hardware processors, a plurality of dictionaries and associated plurality of sparse coefficients, wherein the plurality of dictionaries comprise i) a first convolutional dictionary 'D' associated with a first set of sparse coefficients 'A' among the plurality of sparse coefficients, ii) a second convolutional dictionary 'G' associated with a second set of sparse coefficients 'B' among the plurality of sparse coefficients, iii) a first coupling convolutional dictionary 'W', and iv) a second coupling convolutional dictionary 'V'; and
jointly training (206), via the one or more hardware processors, the initialized plurality of dictionaries and the associated plurality of sparse coefficients using the plurality of training images by performing a plurality of steps iteratively until convergence of an objective function is achieved, wherein the trained plurality of dictionaries and the associated plurality of sparse coefficients obtained upon achieving the convergence of the objective function are used for performing a multimodal image super resolution, and wherein the plurality of steps comprise:
training (206A) the plurality of sparse coefficients by keeping the plurality of dictionaries fixed; and
training (206B) the plurality of dictionaries by keeping the plurality of sparse coefficients fixed.
2. The method as claimed in claim 1, wherein the objective function is represented as: $\min_{D,G,W,V,A,B} \frac{1}{2}\|X - DA\|_2^2 + \frac{1}{2}\|Y - GB\|_2^2 + \frac{\mu}{2}\|Z - WA - VB\|_2^2 + \lambda_A\|A\|_1 + \lambda_B\|B\|_1$.
3. The method as claimed in claim 1, wherein training the plurality of sparse coefficients by keeping the plurality of dictionaries fixed comprises:
updating the first set of sparse coefficients based on the set of low-resolution images of the target modality, the set of high-resolution images of the target modality, the first convolutional dictionary, the first coupling convolutional dictionary, the second coupling convolutional dictionary, and the second set of sparse coefficients by solving $\min_A \frac{1}{2}\|X - DA\|_2^2 + \frac{\mu}{2}\|Z' - WA\|_2^2 + \lambda_A\|A\|_1$, wherein $Z' = Z - VB$; and
updating the second set of sparse coefficients based on the set of high-resolution images of the guidance modality, the set of high-resolution images of the target modality, the second convolutional dictionary, the first coupling convolutional dictionary, the second coupling convolutional dictionary, and the first set of sparse coefficients by solving $\min_B \frac{1}{2}\|Y - GB\|_2^2 + \frac{\mu}{2}\|Z' - VB\|_2^2 + \lambda_B\|B\|_1$, wherein $Z' = Z - WA$.
4. The method as claimed in claim 1, wherein training the plurality of dictionaries by keeping the plurality of sparse coefficients fixed comprises:
updating the first convolutional dictionary based on the set of low-resolution images of the target modality;
updating the second convolutional dictionary based on the set of high-resolution images of the guidance modality;
updating the first coupling convolutional dictionary based on the set of high-resolution images of the target modality, the first set of sparse coefficients, the second set of sparse coefficients and the second coupling convolutional dictionary by converting into a standard Convolutional Dictionary Learning (CDL) problem; and
updating the second coupling convolutional dictionary based on the set of high-resolution images of the target modality, the first set of sparse coefficients, the second set of sparse coefficients and the first coupling convolutional dictionary by converting into a standard CDL problem.
5. The method as claimed in claim 4, wherein converting the first coupling convolutional dictionary into a standard CDL problem comprises minimizing $\frac{1}{2}\|Z' - WA\|_2^2$, wherein $Z' = Z - VB$, and wherein converting the second coupling convolutional dictionary into a standard CDL problem comprises minimizing $\frac{1}{2}\|Z' - VB\|_2^2$, wherein $Z' = Z - WA$.
6. The method as claimed in claim 1, comprising performing multimodal image super resolution on a new low resolution image of the target modality by:
obtaining the new low-resolution image $X_{test}$ of the target modality and a high-resolution image $Y_{test}$ of the guidance modality;
computing a first set of test coefficients $A_{test}$ based on the trained first convolutional dictionary D and the low-resolution image of the target modality $X_{test}$ by using a standard convolutional sparse coding update;
computing a second set of test coefficients $B_{test}$ based on the trained second convolutional dictionary G and the high-resolution image of the guidance modality $Y_{test}$ by using the standard convolutional sparse coding update; and
generating a high resolution image $Z_{test}$ of the target modality using the trained first coupling convolutional dictionary, the trained second coupling convolutional dictionary, the first set of test coefficients, and the second set of test coefficients as $Z_{test} = W A_{test} + V B_{test}$.
7. A system (100), comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
obtain a plurality of training images comprising a set of low-resolution images 'X' of a target modality, a set of high-resolution images 'Y' of a guidance modality, and a set of high-resolution images 'Z' of the target modality;
initialize a plurality of dictionaries and associated plurality of sparse coefficients, wherein the plurality of dictionaries comprise i) a first convolutional dictionary 'D' associated with a first set of sparse coefficients 'A' among the plurality of sparse coefficients, ii) a second convolutional dictionary 'G' associated with a second set of sparse coefficients 'B' among the plurality of sparse coefficients, iii) a first coupling convolutional dictionary 'W', and iv) a second coupling convolutional dictionary 'V'; and
jointly train the initialized plurality of dictionaries and the associated plurality of sparse coefficients using the plurality of training images by performing a plurality of steps iteratively until convergence of an objective function is achieved, wherein the trained plurality of dictionaries and the associated plurality of sparse coefficients obtained upon achieving the convergence of the objective function are used for performing a multimodal image super resolution, and wherein the plurality of steps comprise:
training the plurality of sparse coefficients by keeping the plurality of dictionaries fixed; and
training the plurality of dictionaries by keeping the plurality of sparse coefficients fixed.
8. The system as claimed in claim 7, wherein the objective function is represented as: $\min_{D,G,W,V,A,B} \frac{1}{2}\|X - DA\|_2^2 + \frac{1}{2}\|Y - GB\|_2^2 + \frac{\mu}{2}\|Z - WA - VB\|_2^2 + \lambda_A\|A\|_1 + \lambda_B\|B\|_1$.
9. The system as claimed in claim 7, wherein training the plurality of sparse coefficients by keeping the plurality of dictionaries fixed comprises:
updating the first set of sparse coefficients based on the set of low-resolution images of the target modality, the set of high-resolution images of the target modality, the first convolutional dictionary, the first coupling convolutional dictionary, the second coupling convolutional dictionary, and the second set of sparse coefficients by solving $\min_A \frac{1}{2}\|X - DA\|_2^2 + \frac{\mu}{2}\|Z' - WA\|_2^2 + \lambda_A\|A\|_1$, wherein $Z' = Z - VB$; and
updating the second set of sparse coefficients based on the set of high-resolution images of the guidance modality, the set of high-resolution images of the target modality, the second convolutional dictionary, the first coupling convolutional dictionary, the second coupling convolutional dictionary, and the first set of sparse coefficients by solving $\min_B \frac{1}{2}\|Y - GB\|_2^2 + \frac{\mu}{2}\|Z' - VB\|_2^2 + \lambda_B\|B\|_1$, wherein $Z' = Z - WA$.
10. The system as claimed in claim 7, wherein training the plurality of dictionaries by keeping the plurality of sparse coefficients fixed comprises:
updating the first convolutional dictionary based on the set of low-resolution images of the target modality;
updating the second convolutional dictionary based on the set of high-resolution images of the guidance modality;
updating the first coupling convolutional dictionary based on the set of high-resolution images of the target modality, the first set of sparse coefficients, the second set of sparse coefficients and the second coupling convolutional dictionary by converting into a standard Convolutional Dictionary Learning (CDL) problem; and
updating the second coupling convolutional dictionary based on the set of high-resolution images of the target modality, the first set of sparse coefficients, the second set of sparse coefficients and the first coupling convolutional dictionary by converting into a standard CDL problem.
11. The system as claimed in claim 10, wherein converting the first coupling convolutional dictionary into a standard CDL problem comprises minimizing $\frac{1}{2}\|Z' - WA\|_2^2$, wherein $Z' = Z - VB$, and wherein converting the second coupling convolutional dictionary into a standard CDL problem comprises minimizing $\frac{1}{2}\|Z' - VB\|_2^2$, wherein $Z' = Z - WA$.
12. The system as claimed in claim 7, comprising performing multimodal image super resolution on a new low resolution image of the target modality by:
obtaining the new low-resolution image $X_{test}$ of the target modality and a high-resolution image $Y_{test}$ of the guidance modality;
computing a first set of test coefficients $A_{test}$ based on the trained first convolutional dictionary D and the low-resolution image of the target modality $X_{test}$ by using a standard convolutional sparse coding update;
computing a second set of test coefficients $B_{test}$ based on the trained second convolutional dictionary G and the high-resolution image of the guidance modality $Y_{test}$ by using the standard convolutional sparse coding update; and
generating a high resolution image $Z_{test}$ of the target modality using the trained first coupling convolutional dictionary, the trained second coupling convolutional dictionary, the first set of test coefficients, and the second set of test coefficients as $Z_{test} = W A_{test} + V B_{test}$.