Specification
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR A LOW-POWER LOSSLESS IMAGE COMPRESSION USING A SPIKING NEURAL NETWORK
Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
The present application claims priority from Indian provisional patent application number 202321040920, filed on June 15, 2023. The entire content of the abovementioned application is incorporated herein by reference.
TECHNICAL FIELD
The disclosure herein generally relates to the field of image compression, and, more particularly, to a method and system for reducing earth-bound image volume with an efficient lossless compression technique that is based on a highly energy efficient neural network.
BACKGROUND
The emergence of small remote sensing satellites has radically changed the landscape of space applications by enabling commercial organizations to launch Low Earth Orbit (LEO) CubeSat constellations for various earth observation missions such as weather monitoring, disaster monitoring, and precision agriculture. Generally, LEO satellites are so constrained in terms of memory, computing resources, and power that they are unable to carry out even low-scale analytics on captured earth observation images. Large high-resolution images (on the order of gigabytes) are usually sent to receiving stations for further analysis, such as segmentation, object detection, and classification. However, the limited line-of-sight window for satellites in low orbit, the large downstream data volume, and the high power requirement for transmission often jeopardize this earth-bound communication and delay the overall task of earth observation.
Orbital Edge Computing (OEC) is a recent approach that enables sensed data to be processed on-board to generate real-time alerts, thereby improving application latency. However, due to size and weight restrictions, these small satellites cannot carry large batteries or solar panels, resulting in an inadequate power supply. At the same time, these satellites cannot carry large computing platforms either, since such platforms are power hungry and often heavy. These limitations have, to a large extent, adversely affected implementations of OEC.
Existing systems and methods follow two paths: firstly, ways to reduce the size, weight, and power requirements of the hardware that can be put on-board, and secondly, ways to reduce the downstream data volume (various types of high-resolution images and other sensed data) by different compression techniques that may consume less power. Lossy image compression techniques can reduce the data volume effectively, but they also introduce noise artefacts and lose image details, resulting in poor observation results.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for reducing earth-bound image volume with an efficient lossless compression technique that is based on a highly energy efficient neural network is provided. The processor-implemented method includes receiving, via an input/output interface, a high resolution image with one or more channel spectral information as an input, wherein the one or more channel spectral information comprises a plurality of predefined square sized patches. Further, the processor-implemented method comprises extracting a plurality of neighborhood pixels from rows above a center pixel and immediate left pixels of the center pixel in the patch from each of the plurality of predefined square sized patches associated to each spectral information channel. The extracted neighborhood pixels are normalized based on a predefined range. Furthermore, the processor-implemented method comprises encoding values of the received one or more channel spectral information into one or more spike trains after normalizing information values for each spectral information channel and training a Spiking Neural Network (SNN) model using a set of received high resolution images with one or more channel spectral information to predict values of one or more non-boundary pixels of the image from the plurality of neighborhood pixels.
Further, the processor-implemented method comprises computing a residual error between predicted values of one or more non-boundary pixels of the image by SNN model and the actual values for each pixel of the plurality of neighborhood pixels and creating a residual error vector for the received high resolution image by combining the computed residual error for each pixel. Furthermore, the processor-implemented method comprises compressing the created residual error vector using a classical Arithmetic Encoder (AE) to send a compressed residual error vector along with a plurality of boundary pixels of the received high resolution image to a receiving station and predicting one or more missing pixels using the plurality of boundary pixels of the received high resolution image. Further, the processor-implemented method comprises decoding the compressed residual error vector using an arithmetic decoder. Furthermore, the processor-implemented method comprises determining a pixel value for a first missing pixel of the predicted one or more missing pixels using the spike trains and the plurality of boundary pixels and determining the next missing pixel of the one or more missing pixels via the SNN model using the determined pixel value of the first missing pixel. The prediction of next missing pixel continues till each of the one or more missing pixels are predicted. Finally, a high resolution image with one or more channel spectral information is reconstructed by determining the pixel values for each of the one or more missing pixels.
In another aspect, a system for reducing earth-bound image volume with an efficient lossless compression technique is provided. The system comprises a memory storing a plurality of instructions and one or more Input/Output (I/O) interfaces to receive a high resolution image with one or more channel spectral information as an input. The one or more channel spectral information comprises a plurality of predefined square sized patches. Further, the system comprises one or more hardware processors coupled to the memory via the one or more I/O interfaces, wherein the one or more hardware processors are configured by the instructions to execute the plurality of instructions stored in the at least one memory.
Further, the system is configured to extract a plurality of neighborhood pixels from rows above a center pixel and immediate left pixels of the center pixel in the patch from each of the plurality of predefined square sized patches associated to each spectral information channel. The extracted neighborhood pixels are normalized based on a predefined range. Furthermore, the one or more hardware processors are configured by the instructions to encode values of the received one or more channel spectral information into one or more spike trains after normalizing information values for each spectral information channel and to train a Spiking Neural Network (SNN) model using a set of received high resolution images with one or more channel spectral information to predict values of one or more non-boundary pixels of the image from the plurality of neighborhood pixels.
Further, the one or more hardware processors are configured by the instructions to compute a residual error between predicted values of one or more non-boundary pixels of the image by the SNN model and the actual values for each pixel of the plurality of neighborhood pixels and to create a residual error vector for the received high resolution image by combining the computed residual error for each pixel. Furthermore, the one or more hardware processors are configured by the instructions to compress the created residual error vector using a classical Arithmetic Encoder (AE), to send a compressed residual error vector along with a plurality of boundary pixels of the received high resolution image to a receiving station, and to predict one or more missing pixels using the plurality of boundary pixels of the received high resolution image. Further, the one or more hardware processors are configured by the instructions to decode the compressed residual error vector using an arithmetic decoder. Furthermore, the one or more hardware processors are configured by the instructions to determine a pixel value for a first missing pixel of the predicted one or more missing pixels using the spike trains and the plurality of boundary pixels and to determine the next missing pixel of the one or more missing pixels via the SNN model using the determined pixel value of the first missing pixel. The prediction of the next missing pixel continues until each of the one or more missing pixels is predicted. Finally, a high resolution image with one or more channel spectral information is reconstructed by determining the pixel values for each of the one or more missing pixels.
In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions, which when executed by one or more hardware processors cause a method for a low-power lossless image compression using a Spiking Neural Network (SNN) to be performed. The processor-implemented method includes receiving, via an input/output interface, a high resolution image with one or more channel spectral information as an input, wherein the one or more channel spectral information comprises a plurality of predefined square sized patches. Further, the processor-implemented method comprises extracting a plurality of neighborhood pixels from rows above a center pixel and immediate left pixels of the center pixel in the patch from each of the plurality of predefined square sized patches associated to each spectral information channel. The extracted neighborhood pixels are normalized based on a predefined range. Furthermore, the processor-implemented method comprises encoding values of the received one or more channel spectral information into one or more spike trains after normalizing information values for each spectral information channel and training a Spiking Neural Network (SNN) model using a set of received high resolution images with one or more channel spectral information to predict values of one or more non-boundary pixels of the image from the plurality of neighborhood pixels.
Further, the processor-implemented method comprises computing a residual error between predicted values of one or more non-boundary pixels of the image by SNN model and the actual values for each pixel of the plurality of neighborhood pixels and creating a residual error vector for the received high resolution image by combining the computed residual error for each pixel. Furthermore, the processor-implemented method comprises compressing the created residual error vector using a classical Arithmetic Encoder (AE) to send a compressed residual error vector along with a plurality of boundary pixels of the received high resolution image to a receiving station and predicting one or more missing pixels using the plurality of boundary pixels of the received high resolution image. Further, the processor-implemented method comprises decoding the compressed residual error vector using an arithmetic decoder. Furthermore, the processor-implemented method comprises determining a pixel value for a first missing pixel of the predicted one or more missing pixels using the spike trains and the plurality of boundary pixels and determining the next missing pixel of the one or more missing pixels via the SNN model using the determined pixel value of the first missing pixel. The prediction of next missing pixel continues till each of the one or more missing pixels are predicted. Finally, a high resolution image with one or more channel spectral information is reconstructed by determining the pixel values for each of the one or more missing pixels.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 illustrates an exemplary system for reducing earth-bound image volume with an efficient lossless compression technique that is based on a highly energy efficient neural network, in accordance with some embodiments of the present disclosure.
FIG. 2 is a functional block diagram to illustrate a low-power lossless image compression using a Spiking Neural Network (SNN), in accordance with some embodiments of the present disclosure.
FIGS. 3A and 3B (collectively referred to as FIG. 3) are an exemplary flow diagram illustrating a processor-implemented method for a low-power lossless image compression using a Spiking Neural Network (SNN) implemented by the system of FIG. 1, in accordance with some embodiments of the present disclosure.
FIG. 4 is a schematic diagram illustrating three channels for a neighborhood quartet of a given target pixel with predicted value for each color channel in accordance with some embodiments of the present disclosure.
FIG. 5 is a schematic diagram illustrating a feed-forward Spiking Neural Network (SNN) in accordance with some embodiments of the present disclosure.
FIG. 6 is a schematic diagram illustrating the SNN based lossless compression framework as a system workflow, in accordance with some embodiments of the present disclosure.
FIG. 7 is a schematic diagram illustrating iterations of pixel prediction using a sliding window mechanism on the neighborhood quartets in accordance with some embodiments of the present disclosure.
FIGS. 8A, 8B, 8C, and 8D (collectively referred to as FIG. 8) are graphical representations illustrating variation of compression ratio, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
It would be appreciated that the lossless image compression techniques enable reconstruction of the image data without incurring any loss of information, thus significantly reducing the downstream data volume during space-to-ground data transmission. Large multi-layer Deep Neural Network (DNN) based solutions are found to achieve high compression ratio. In the DNN based flow, there is usually a Density Estimator (DE) block that learns to predict the pixels of input image patches, followed by an Arithmetic Encoder (AE) block that encodes the residual error (between original and predicted pixel values) and finally sends it as downstream data to the receiving station. This Density Estimator is usually designed either as a fully connected network or as a Recurrent Neural Network (RNN) such as LSTM. However, DNNs incur the cost of large memory and power requirements thus making them unfit for running on-board LEO satellites.
The lossless compression techniques can broadly be categorized as follows: (i) Prediction-based, (ii) Transform-based, (iii) Vector Quantization-based, and (iv) Deep Learning based techniques. Several recent deep learning based methods address the requirement for lossless image compression. Learning-based lossless image compression techniques use autoregressive models. The current approach follows a two-step method: a density estimator model followed by a conventional encoder such as an Arithmetic Encoder (AE) or a Lempel–Ziv–Markov Chain Algorithm (LZMA) encoder. Density estimator models learn to predict the pixel values of the input image. These can be further categorized into fully connected and recurrent neural networks, such as DeepZip, and hierarchical probabilistic models that can learn both the image distribution and its auxiliary representations (L3C). Another approach employs deep learning based frameworks as predictors to estimate the residuals, which substantially improves the prediction accuracy compared to other prediction-based methods. These residual errors are then encoded using a novel context-tree-based bit-plane codec.
Other prior arts provide a compression method by means of a Convolutional Neural Network (CNN) to learn a compact representation of the original image and then encode it using the Lempel-Ziv Markov chain algorithm. The encoded image is reconstructed at the receiver end with the same quality as that of the original image. A learning-based predictive model is created for lossless image compression. These models offer voxel-wise prediction and train MLP and LSTM models, respectively. Predictions obtained by these models are used for calculating the residual error which are encoded by the arithmetic encoder.
Embodiments herein provide a method and system for reducing an earth-bound image volume with an efficient lossless compression technique that is based on a highly energy efficient neural network approach known as a Spiking Neural Network (SNN). The SNN, a 3rd generation Artificial Intelligence (AI) paradigm, achieves at least 100x power benefit while running on neuromorphic computing platforms. Neuromorphic computing platforms follow a non-von Neumann architecture that closely mimics the functionality of the mammalian brain. Owing to the collocation of compute and memory at the neurons and synapses of a neuromorphic processor and to event based sparse asynchronous data processing, SNNs running on neuromorphic platforms are less computationally intensive, which makes them a suitable candidate for on-board processing in LEO satellites.
In another aspect, the classical DNN based Density Estimator is replaced with a corresponding SNN version. In the present disclosure, the complete lossless compression framework comprises an SNN based Density Estimator (DE) followed by a classical Arithmetic Encoder (AE). The SNN model is used to obtain residual errors (the differences between the SNN predicted image pixels and the ground truth image pixels), which are compressed by the AE and thereafter transmitted to the receiving station.
Referring now to the drawings, and more particularly to FIG. 1 through FIG. 8D, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates a block diagram of a system 100 for reducing earth-bound image volume with an efficient lossless compression technique that is based on a highly energy efficient neural network, in accordance with an example embodiment. Although the present disclosure is explained considering that the system 100 is implemented on a server, it may be understood that the system 100 may comprise one or more computing devices 102, such as a laptop computer, a desktop computer, a notebook, a workstation, a cloud-based computing environment and the like. It will be understood that the system 100 may be accessed through one or more input/output interfaces 104-1, 104-2... 104-N, collectively referred to as I/O interface 104. Examples of the I/O interface 104 may include, but are not limited to, a user interface, a portable computer, a personal digital assistant, a handheld device, a smartphone, a tablet computer, a workstation, and the like. The I/O interface 104 are communicatively coupled to the system 100 through a network 106.
In an embodiment, the network 106 may be a wireless or a wired network, or a combination thereof. In an example, the network 106 can be implemented as a computer network, as one of the different types of networks, such as virtual private network (VPN), intranet, local area network (LAN), wide area network (WAN), the internet, and such. The network 106 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), and Wireless Application Protocol (WAP), to communicate with each other. Further, the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices. The network devices within the network 106 may interact with the system 100 through communication links.
The system 100 supports various connectivity options such as BLUETOOTH®, USB, ZigBee, and other cellular services. The network environment enables connection of various components of the system 100 using any communication link including Internet, WAN, MAN, and so on. In an exemplary embodiment, the system 100 is implemented to operate as a stand-alone device. In another embodiment, the system 100 may be implemented to work as a loosely coupled device to a smart computing environment. Further, the system 100 comprises at least one memory with a plurality of instructions, one or more databases 112, and one or more hardware processors 108 which are communicatively coupled with the at least one memory to execute a plurality of modules 114 therein. The components and functionalities of the system 100 are described further in detail.
FIG. 2 is a functional block diagram illustrating a low-power lossless image compression using a Spiking Neural Network (SNN) implemented by the system 100 of FIG. 1. The workflow is divided into two main functional blocks, namely the sender, i.e., the satellite, and the receiver, i.e., a ground station. The sender comprises three main components: (i) a Spike Encoder for encoding real-valued image data into spikes, (ii) a Feed-forward Spiking Neural Network (SNN) for actual data processing, and (iii) an Arithmetic Encoder for encoding the image data. The receiver block comprises the same pre-trained SNN model along with the corresponding arithmetic decoder.
There are different spike encoding schemes such as rate encoding, temporal encoding, and phase encoding. Rate encoding and direct encoding techniques are well suited for SNNs processing large datasets on constrained edge devices such as satellites. In the process, the RGB values of the image pixels are encoded into spike trains. The RGB values for each channel lie within the range 0-255.
Rate encoding, in which the input spikes are generated based on the pixel intensity, is robust and more energy efficient than direct encoding. Direct encoding becomes more advantageous than rate encoding when the number of layers in the network increases and the dataset is larger, and in that regime it performs better than rate encoding. As the weights of the input coding layer are trained during the direct encoding mechanism, the input pixels are converted into an optimal spike train.
Arithmetic coding is a popular compression technique that is used in a variety of lossless compression applications. In arithmetic coding, the arithmetic encoder encodes a message as an interval of numbers between 0 and 1. Each successive symbol in the message reduces the interval size in accordance with the probability of that symbol as generated by the model.
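The interval narrowing described above can be illustrated with a minimal sketch. The function name `narrow_interval`, the two-symbol alphabet, and the fixed probabilities are hypothetical and chosen only for illustration; a practical arithmetic encoder additionally emits bits incrementally and uses finite-precision arithmetic, which this sketch omits.

```python
from fractions import Fraction

def narrow_interval(message, probs):
    """Return the [low, high) interval that arithmetic coding assigns to message."""
    # Assign each symbol a sub-range of [0, 1) proportional to its probability.
    cum = {}
    total = Fraction(0)
    for sym in sorted(probs):
        cum[sym] = (total, total + probs[sym])
        total += probs[sym]
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        sym_low, sym_high = cum[sym]
        # Shrink the current interval to the sub-range of this symbol.
        low, high = low + width * sym_low, low + width * sym_high
    return low, high

# Hypothetical model: 'a' is three times as likely as 'b'.
probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
low, high = narrow_interval("aab", probs)
print(low, high)  # prints "27/64 9/16"
```

The final width, 9/64, equals the product of the symbol probabilities (3/4 x 3/4 x 1/4), so likely messages keep a wide interval and need fewer bits to identify.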
FIGS. 3A and 3B (collectively referred to as FIG. 3) are a flow diagram illustrating a processor-implemented method 300 for a low-power lossless image compression using a Spiking Neural Network (SNN) implemented by the system 100 of FIG. 1. Functions of the components of the system 100 are now explained with reference to FIG. 2 through the steps of the flow diagram in FIG. 3, according to some embodiments of the present disclosure.
Initially, at step 302 of the processor-implemented method 300, the one or more hardware processors 108 are configured by the programmed instructions to receive, via an input/output interface, a high resolution image with one or more channel spectral information as an input. The one or more channel spectral information comprises a plurality of predefined square sized patches. The images can be multi-channel (RGB/MSI/SAR). The square sized patches can have any odd side length, such as 3x3, 5x5, 7x7, and 11x11.
At the next step 304 of the processor-implemented method 300, the one or more hardware processors 108 are configured by the programmed instructions to extract a plurality of neighborhood pixels from each of the plurality of predefined square sized patches associated to each spectral information channel. The plurality of neighborhood pixels is extracted from rows above a center pixel and immediate left pixels of the center pixel in each predefined square sized patch. Every non-boundary image pixel can be predicted as $\hat{y}_{i,j}$ based on the values of the neighborhood quartet (four neighboring pixels) with pixel values $(y_{i-1,j-1},\ y_{i-1,j},\ y_{i-1,j+1},\ y_{i,j-1})$.
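As a hedged illustration of the quartet extraction for a single channel, the sketch below gathers the four neighbors of every pixel that has a row above it and a column on each side. The function name `extract_quartets` and the use of NumPy are assumptions made for this example and are not taken from the disclosure.

```python
import numpy as np

def extract_quartets(channel):
    """Return (quartets, targets) for every non-boundary pixel of a 2-D channel.

    Row k of `quartets` holds (y[i-1,j-1], y[i-1,j], y[i-1,j+1], y[i,j-1])
    for the pixel y[i,j] stored in targets[k], in raster order.
    """
    c = np.asarray(channel)
    up_left = c[:-1, :-2]    # y[i-1, j-1]
    up = c[:-1, 1:-1]        # y[i-1, j]
    up_right = c[:-1, 2:]    # y[i-1, j+1]
    left = c[1:, :-2]        # y[i, j-1]
    targets = c[1:, 1:-1]    # y[i, j]
    quartets = np.stack([up_left, up, up_right, left], axis=-1)
    return quartets.reshape(-1, 4), targets.reshape(-1)

# Toy 3x4 channel: the non-boundary pixels are those at (1,1), (1,2), (2,1), (2,2).
quartets, targets = extract_quartets(np.arange(12).reshape(3, 4))
```

Under this convention the first row, first column, and last column act as boundary pixels, since their quartets would fall outside the image.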
At the next step 306 of the processor-implemented method 300, the one or more hardware processors 108 are configured by the programmed instructions to normalize the extracted plurality of neighborhood pixels based on a predefined range.
At the next step 308 of the method 300, the one or more hardware processors 108 are configured by the programmed instructions to encode values of the received one or more channel spectral information into one or more spike trains after normalizing information values for each spectral information channel. It is to be noted that the SNNs work in a spike domain. Any real valued input to the model must be converted into spike trains before being consumed by the SNN. There are different spike encoding schemes such as rate encoding, temporal encoding, phase encoding etc.
In one illustration, Red-Green-Blue (RGB) values of image pixels are encoded into spike trains. The range of RGB values for each channel lies within 0-255 and they are normalized by the maximum value (i.e., 255) before encoding.
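A minimal sketch of normalization followed by rate encoding is given below. It assumes a Bernoulli spike generator in which, at every time step, a spike is emitted with probability equal to the normalized intensity; the function name `rate_encode`, the number of time steps, and the random seed are hypothetical choices, and the disclosure does not fix a particular generator.

```python
import numpy as np

def rate_encode(pixels, num_steps=100, seed=0):
    """Encode 8-bit pixel values as binary spike trains of length num_steps."""
    rng = np.random.default_rng(seed)
    # Normalize by the maximum channel value so that rates lie in [0, 1].
    rates = np.asarray(pixels, dtype=np.float64) / 255.0
    # One Bernoulli draw per time step and pixel: spike when the draw is below the rate.
    draws = rng.random((num_steps,) + rates.shape)
    return (draws < rates).astype(np.uint8)

spikes = rate_encode([0, 255, 128], num_steps=200)
# Observed firing rates approximate the normalized intensities 0.0, 1.0 and about 0.5.
firing_rates = spikes.mean(axis=0)
```

A value of 0 never spikes and a value of 255 spikes at every step, while intermediate values spike at a proportional average rate.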
At the next step 310 of the method 300, the one or more hardware processors 108 are configured by the programmed instructions to train a Spiking Neural Network (SNN) model using a set of received high resolution images with one or more channel spectral information to predict values of one or more non-boundary pixels of the image from the plurality of neighborhood pixels. The main objective of the SNN is to learn a mapping function that can predict target pixels from a given quartet of input pixels. A sliding window mechanism with this quartet helps the SNN train on every pixel of each image in the training set.
FIG. 4 illustrates a neighborhood quartet of a given target pixel with the predicted value for each color channel, in accordance with some embodiments of the present disclosure. Considering a quartet for each color channel of RGB, the number of input neurons $N_i$ in the trained SNN is 12 (four neighbors for each of the three channels). The number of neurons in the hidden layer, $N_h$, is subject to performance tuning and is taken as 100. An empirical loss is minimized over the training set of each image:
$$L = \frac{1}{n}\sum_{l=1}^{n} \lVert y - \hat{y} \rVert \qquad (1)$$
wherein $y$ is the ground truth value of the $l$-th pixel, $\hat{y}$ is the neuron membrane potential, and $n$ is the total number of data samples. The SNN is trained to minimize the residual error ($R = y - \hat{y}$) for each pixel. The compression ratio is inversely proportional to the residual error.
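To illustrate why transmitting the residual errors together with the boundary pixels is sufficient for exact reconstruction, the sketch below runs a single-channel round trip. The predictor here is a stand-in (the rounded-down mean of the quartet) and is not the SNN of the disclosure; the function names are hypothetical, and the arithmetic encoding of the residual vector is omitted. Any deterministic predictor shared by sender and receiver would behave the same way.

```python
import numpy as np

def predict(quartet):
    # Stand-in predictor (NOT the SNN): rounded-down mean of the four neighbours.
    return sum(int(v) for v in quartet) // 4

def quartet_at(img, i, j):
    return (img[i - 1, j - 1], img[i - 1, j], img[i - 1, j + 1], img[i, j - 1])

def encode(img):
    """Sender: return the boundary pixels and the residual vector R = y - y_hat."""
    h, w = img.shape
    residuals = [int(img[i, j]) - predict(quartet_at(img, i, j))
                 for i in range(1, h) for j in range(1, w - 1)]
    boundary = img.copy()
    boundary[1:, 1:-1] = 0  # keep only the first row, first column and last column
    return boundary, residuals

def decode(boundary, residuals):
    """Receiver: rebuild missing pixels one at a time in raster order."""
    out = boundary.astype(np.int64)
    h, w = out.shape
    it = iter(residuals)
    for i in range(1, h):
        for j in range(1, w - 1):
            # Each newly recovered pixel feeds the quartets of later pixels.
            out[i, j] = predict(quartet_at(out, i, j)) + next(it)
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(6, 7), dtype=np.uint8)
boundary, residuals = encode(image)
restored = decode(boundary, residuals)
```

Because the receiver reproduces exactly the same predictions as the sender, adding each residual recovers the original pixel value, and a better predictor only makes the residuals smaller and therefore cheaper to encode.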
FIG. 5 is a schematic diagram illustrating a feed-forward Spiking Neural Network (SNN) in accordance with some embodiments of the present disclosure. The SNN is trained using the Deep Globe Land Cover (DGLC) and the Signal and Image Processing Institute (USC-SIPI) datasets. From the DGLC dataset, 31 satellite images with dimensions 2448 x 2448 pixels have been selected, out of which 25 images are used for training and 6 images are used for testing. The SIPI dataset comprises 25 aerial images with dimensions 1024 x 1024 pixels, of which 19 images are used for training and the remaining 6 for testing.
In one aspect, the encoded spike trains are consumed by the network that follows the spike encoder module. A spiking feed-forward neural network is designed, where each layer comprises leaky integrate-and-fire (LIF) neurons. The equation for the LIF neuron is expressed below:
$$\tau_m \frac{dV}{dt} = (V_{rest} - V) + IR \qquad (2)$$
$$s = \begin{cases} 1, & V \geq V_{thresh} \\ 0, & V \ldots \end{cases}$$
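The LIF dynamics of equation (2) can be sketched with a forward-Euler integration, as below. The parameter values, the function name `simulate_lif`, and the hard reset of the membrane potential to the resting value after a spike are all assumptions made for illustration; the disclosure as reproduced here does not state a reset rule or specific constants.

```python
def simulate_lif(currents, tau_m=10.0, v_rest=0.0, v_thresh=1.0,
                 resistance=1.0, dt=1.0):
    """Simulate one LIF neuron; return (spikes, membrane potential trace)."""
    v = v_rest
    spikes, trace = [], []
    for i_in in currents:
        # Forward-Euler step of tau_m * dV/dt = (V_rest - V) + I * R.
        v += (dt / tau_m) * ((v_rest - v) + i_in * resistance)
        if v >= v_thresh:
            spikes.append(1)
            v = v_rest  # assumed hard reset after a spike
        else:
            spikes.append(0)
        trace.append(v)
    return spikes, trace

# A constant supra-threshold input makes the neuron fire periodically,
# while an input whose steady state stays below threshold never fires.
spikes_strong, _ = simulate_lif([2.0] * 50)
spikes_weak, _ = simulate_lif([0.5] * 50)
```

With these assumed constants the strong input drives the potential toward 2.0, so it crosses the threshold of 1.0 every seventh step, whereas the weak input saturates at 0.5 and produces no spikes.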
Documents
Application Documents
| # | Name | Date |
| 1 | 202321040920-STATEMENT OF UNDERTAKING (FORM 3) [15-06-2023(online)].pdf | 2023-06-15 |
| 2 | 202321040920-PROVISIONAL SPECIFICATION [15-06-2023(online)].pdf | 2023-06-15 |
| 3 | 202321040920-FORM 1 [15-06-2023(online)].pdf | 2023-06-15 |
| 4 | 202321040920-DRAWINGS [15-06-2023(online)].pdf | 2023-06-15 |
| 5 | 202321040920-DECLARATION OF INVENTORSHIP (FORM 5) [15-06-2023(online)].pdf | 2023-06-15 |
| 6 | 202321040920-Proof of Right [03-07-2023(online)].pdf | 2023-07-03 |
| 7 | 202321040920-FORM-26 [16-08-2023(online)].pdf | 2023-08-16 |
| 8 | 202321040920-FORM 3 [14-09-2023(online)].pdf | 2023-09-14 |
| 9 | 202321040920-FORM 18 [14-09-2023(online)].pdf | 2023-09-14 |
| 10 | 202321040920-ENDORSEMENT BY INVENTORS [14-09-2023(online)].pdf | 2023-09-14 |
| 11 | 202321040920-DRAWING [14-09-2023(online)].pdf | 2023-09-14 |
| 12 | 202321040920-COMPLETE SPECIFICATION [14-09-2023(online)].pdf | 2023-09-14 |
| 13 | Abstract1.jpg | 2024-01-20 |
| 14 | 202321040920-FORM 3 [19-07-2024(online)].pdf | 2024-07-19 |
| 15 | 202321040920-REQUEST FOR CERTIFIED COPY [13-08-2024(online)].pdf | 2024-08-13 |
| 16 | 202321040920-REQUEST FOR CERTIFIED COPY [13-08-2024(online)]-1.pdf | 2024-08-13 |
| 17 | 202321040920-CORRESPONDENCE(IPO)-(CERTIFIED LATTER)-23-08-2024.pdf | 2024-08-23 |