
Sparsity Structure Specific Storage Of Optimized Sparse Deep Learning (DL) Models For Edge Devices

Abstract: This disclosure relates generally to a method and system for sparsity structure specific storage of optimized deep learning (DL) models. State-of-the-art methods for handling resource-constrained edge devices offer storing only the non-zero elements of a sparse tensor in order to reduce model size. However, storing such non-zero elements severely affects the latency of the sparse model. The disclosed method provides at least two storage options: the first option stores the sparse model by way of compression, and the second option stores the sparse model by way of encoding followed by compression. The preferential selection of the storage format is based on the capacity of the edge device.


Patent Information

Application #
Filing Date
02 June 2023
Publication Number
49/2024
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India

Inventors

1. BASAK, Barnali
Tata Consultancy Services Limited, Block -1A, Eco Space, Plot No. IIF/12 (Old No. AA-II/BLK 3. I.T) Street 59 M. WIDE (R.O.W.) Road,New Town, Rajarhat, P.S. Rajarhat, Dist - N. 24 Parganas, Kolkata - 700160, West Bengal, India
2. KULKARNI, Gitesh
Tata Consultancy Services Limited, Unit-III, No 18, SJM Towers, Seshadri Road, Gandhinagar, Bangalore - 560009, Karnataka, India
3. MUKHERJEE, Arijit
Tata Consultancy Services Limited, Global Development Centre (GDC), Plot C, Block EP, Sector V, Salt Lake Electronic Complex, Kolkata - 700091, West Bengal, India
4. PAL, Arpan
Tata Consultancy Services Limited, Global Development Centre (GDC), Plot C, Block EP, Sector V, Salt Lake Electronic Complex, Kolkata - 700091, West Bengal, India
5. DASGUPTA, Pallab
IIT Kharagpur, Indian Institute of Technology, Kharagpur, Kolkata - 721602, West Bengal, India

Specification

FORM 2
THE PATENTS ACT, 1970 (39 of 1970) & THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention: SPARSITY STRUCTURE SPECIFIC STORAGE OF OPTIMIZED SPARSE DEEP LEARNING (DL) MODELS FOR EDGE DEVICES
Applicant: Tata Consultancy Services Limited, a company incorporated in India under the Companies Act, 1956, having address: Nirmal Building, 9th Floor, Nariman Point, Mumbai 400021, Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
The present application claims priority from Indian provisional patent application no. 202321037939, filed on June 02, 2023. The entire contents of the aforementioned application are incorporated herein by reference.
TECHNICAL FIELD
The disclosure herein generally relates to a sparsified Deep Learning (DL) model, and, more particularly, to a selective storage of a sparsified DL model that offers reduced size and desired latency.
BACKGROUND
Emerging Internet of Things (IoT) and intelligent embedded applications necessitate the deployment of neural network models in resource-constrained tiny edge devices without affecting accuracy. Tiny edge devices, such as microcontroller units (MCUs) like Arduino Nano, STM32 32-bit Arm Cortex, etc., typically have memory in kilobytes (KB), whereas state-of-the-art machine learning models, such as GoogleNet, ResNet, BigBird, and Switch Transformer, have parameters in millions or billions, and the model sizes can be in hundreds of MBs. Although such models are highly accurate and ideal for complex multi-class classification problems, they cannot be directly deployed in tiny devices to solve less complex but important classification problems at the edge. Migrating such models to useful counterparts for edge devices requires both data science and system expertise, and typically a substantial amount of human effort. A model compression technique like pruning automatically identifies neurons and connections within a neural network with minimal significance and prunes them. Structured pruning removes the less significant neurons of a network and the associated connections, resulting in a reduced dense neural network. Unstructured pruning modifies the weights of the less salient connections to zero, without altering the neurons, resulting in a sparse network with a substantial amount of zero-weighted connections and unstructured sparsity. The number of pruned connections varies widely across the layers, typically introducing sparsity within the range of 20% to 80% in many networks. The major limitation in deploying methods that result in unstructured pruning is that the available Deep Learning frameworks have no support for leveraging sparsity. In other words, currently available deployment frameworks do not differentiate between zero and non-zero elements, i.e., they store all the zero elements in the memory during deployment and use them in the computation, even though such computations are redundant. For example, TFLite for Microcontrollers, the de-facto standard for deploying neural network models in tiny edge devices like Micro Controller Units (MCUs), lacks the support to effectively remove the zero elements. The lack of sparsity support in available frameworks makes the unstructured pruning approach a less appealing strategy in practice.
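By way of illustration only, the following minimal sketch shows how magnitude-based unstructured pruning of the kind discussed above produces a sparse weight tensor; the function name, the NumPy dependency, and the thresholding rule are assumptions made for this example and are not part of the disclosure, which assumes an already pruned (sparse) model as input.

import numpy as np

def unstructured_prune(weights, sparsity=0.5):
    # Zero out roughly `sparsity` fraction of the smallest-magnitude weights,
    # leaving the tensor shape (and hence the neurons) unchanged.
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

The resulting tensor has unstructured sparsity: the zero weights are scattered irregularly across the tensor, which is the situation that the storage formats described later are designed to exploit.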
The fundamental challenge in exploiting sparsity is the strategy to store sparse data, as this choice primarily affects the trade-off between model size reduction and access overhead of non-zero elements. The choice of the storage format is critical if the target is a tiny edge device with stringent resource constraints. The effect of storage selection assumes more significance for sparse neural networks, as they have multiple sparse layers with varied sparsity. Consequently, a storage format with minimal overhead per non-zero element access may not reduce the model size significantly, whereas a highly compressed storage format may actually cause significant computation overhead per non-zero element access, affecting the latency of sparse DL/CNN models. The latency is critical as it affects power as well as performance. Existing approaches hardly provide a fine balance between size and latency.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 is a functional block diagram of a system for sparsity structure specific storage of an optimized sparse Deep Learning (DL) model, in accordance with some embodiments of the present disclosure.
FIG. 2 is a schematic overview of essential elements of sparsity structure specific storage of the optimized sparse DL model, in accordance with some embodiments of the present disclosure.
FIGS. 3A through 3B depict flow diagrams of an illustrative method for storing an optimized sparse DL model based on sparsity structure specific storage selection, in accordance with some embodiments of the present disclosure.
FIGS. 4A through 4C illustrate partition types of a Partitioned Hybrid Sparse (PaHS) format for three-dimensional sparse tensors, in accordance with some embodiments of the present disclosure.
FIGS. 5A through 5G illustrate the PaHS format for three-dimensional sparse tensors, in accordance with some embodiments of the present disclosure.
FIGS. 6A through 6I illustrate the PaHS format for four-dimensional sparse tensors, in accordance with some embodiments of the present disclosure.
FIGS. 7A through 7H illustrate piece-wise encoding of the partitions generated from the 4-dimensional tensor of size 4 × 2 × 3 × 3 and shown in FIG. 6B, in accordance with some embodiments of the present disclosure.
FIG. 8 shows a graph depicting the new Pareto optimal frontier of gain (%) in size vs gain (%) in latency for a sparse ResNet50 model, in accordance with some embodiments of the present disclosure.
FIG. 9 shows a graph depicting the new Pareto optimal frontier of gain (%) in size vs gain (%) in latency for an ECG model with low sparsity, in accordance with some embodiments of the present disclosure.
FIG. 10 shows a graph depicting the new Pareto optimal frontier of gain (%) in size vs gain (%) in latency for a sparse ResNet50 model using the EPaHS storage format, in accordance with some embodiments of the present disclosure.
FIG. 11 shows a graph depicting the Pareto optimal frontier of gain (%) in size vs gain (%) in latency for a sparse ECG model with low sparsity, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears.
Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the embodiments described herein. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Referring now to the drawings, and more particularly to FIGS. 1 through 11, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments, and these embodiments are described in the context of the following exemplary system and/or method. In the following description, the terms "sparse" and "pruned" are used interchangeably, and both refer to a Deep Learning (DL) model with zero elements present in the sparse tensor. As used herein, the term "size optimized", with reference to an optimized sparse DL model, means a model architecture optimized for size, such as a compact network with fewer layers. As used herein, the term "latency optimized", with reference to an optimized sparse DL model, means a model architecture optimized for latency, such as a compact network with fewer layers that reduces the time it takes to make inferences. Given the target hardware constraints, one requires a suitable storage strategy for accommodating the sparse neural network model without adversely affecting the latency. In the present disclosure, the proposed method addresses the sparsity sensitivity issue in edge devices and provides an automated pathway for leveraging sparsity in model compaction without adversely affecting model latency. A heuristic-based partitioning strategy is provided that customizes the storage representation to the individual sparsity structure of the sparse model. The partitioning enables block compression of the partitions, allowing the use of a hybrid storage format named the Partitioned Hybrid Sparse (PaHS) format, where each partition uses a different storage format. The block compression in turn enables piece-wise encoding of the partitions, allowing the use of an encoded storage format named the Encoded Partitioned Hybrid Sparse (EPaHS) format, where each partition is encoded and stored differently. The partition-based storage representations PaHS and EPaHS benefit sparse DL models with low to medium sparsity by optimizing both size and latency, making them deployable at edge devices.
FIG. 1 is a functional block diagram of a system 100 for sparsity structure specific storage of optimized Deep Learning (DL) models, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes a processor(s) 104, communication interface device(s), alternatively referred to as input/output (I/O) interface(s) 106, and one or more data storage devices or a memory 102 operatively coupled to the processor(s) 104. The system 100 with one or more hardware processors is configured to execute functions of one or more functional blocks of the system 100. Referring to the components of the system 100, in an embodiment, the processor(s) 104 can be one or more hardware processors 104.
In an embodiment, the one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 104 are configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems that are resource constrained, including laptop computers, notebooks, hand-held devices such as mobile phones, workstations, mainframe computers, servers, and the like. The system 100, such as a handheld device or mobile phone, is equipped for partitioning the weight tensors and predicting the right kind of storage for the optimized sparse DL model. The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like, and can facilitate multiple communications within a wide variety of network (N/W) and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, and the like. In an embodiment, the I/O interface(s) 106 can include one or more ports for connecting to a number of external devices or to other servers or devices. The servers are used to process requests and send responses, and to execute the algorithms of the storage selection approach disclosed herein. The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. Further, the memory 102 includes a plurality of modules, for example, modules as depicted in FIG. 2, which include but are not limited to a Partitioned Hybrid Sparse (PaHS) storage format 110A that offers block compression of an n-dimensional sparse tensor to obtain compressed partitions suitable for storing the sparse DL model, and an Encoded Partitioned Hybrid Sparse (EPaHS) format 110B that offers block encoding of the compressed partitions obtained by the first storage format to obtain compressed-cum-encoded partitions suitable for storing the sparse DL model. Further, the memory 102 includes a database 108 to store the received unstructurally sparse DL model to be optimized for size and latency and to output a Convolutional Neural Network (CNN) based trained DL model (not shown) and the like. The CNN-based DL model has sparsity structure specific storage formats which customize the storage based on the distribution of non-zero elements. Two storage formats are provided. A first storage format 110A is the Partitioned Hybrid Sparse (PaHS) format that essentially splits the sparse data into multiple pieces based on the maximum possible compression and stores each piece using a customized storage format. A second storage format 110B is the Encoded Partitioned Hybrid Sparse (EPaHS) storage format that stores each of the encoded partitions.
In the EPaHS, indices of an encoded partition with no possible grouping are stored using a single one-dimensional vector, whereas the rest of the encoded partitions with possible groupings are stored using three one-dimensional vectors. Such a hybrid way of storing sparse data compacts the model without affecting the latency. Further, the memory 102 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 104 of the system 100 and methods of the present disclosure. In an embodiment, the database 108 may be external (not shown) to the system 100 and coupled to the system via the I/O interface 106. Functions of the components of the system 100 are explained in conjunction with the schematic overview of FIG. 2.
FIG. 2 is a schematic overview of essential elements of sparsity structure specific storage of the optimized sparse DL model, in accordance with some embodiments of the present disclosure. As illustrated, an unstructurally sparse DL model 200 is received for size and latency optimization using the approach of the system 100 disclosed herein. The unstructurally sparse DL model 200 has multiple sparse layers with varied sparsity and unstructured distributions of non-zero elements. Each layer handles an n-dimensional sparse weight tensor having non-zero elements resulting in sparsity, and each such tensor is split into multiple partitions in the partitioning module 202 for the most efficient storage. The n-dimensional sparse weight tensor selected for storage according to the present disclosure has low to medium sparsity, having 30% to 70% zero elements present in the sparse tensor. At the partitioning module 202, the n-dimensional indices of non-zero elements are parsed to create partitions of n-dimensional indices based on the maximum possible compression of each partition. The partitions of the parsed tensor are subjected to block compression 204, compressing the partitions so that each partition is maximally compressed, resulting in compact storage and optimized latency. This results in an optimized sparse DL model which is to be stored efficiently. The optimized sparse DL model thus obtained is taken for prediction at 206, wherein the size and latency of the optimized sparse DL model are predicted. If the predicted size and latency of the optimized sparse DL model fall within the target hardware constraints, then the optimized sparse DL model is stored in the first storage format, which is the Partitioned Hybrid Sparse (PaHS) format. Otherwise, based on the predicted size and latency, the system 100 offers the second storage format, wherein the optimized sparse DL model undergoes piece-wise encoding of the block-compressed partitions at 208 and is stored in the second storage format 212, which is the Encoded Partitioned Hybrid Sparse (EPaHS) storage; if the model still cannot be optimized to fit the target hardware constraints, the optimization process is aborted. Storing sparse data is challenging as the choice of storage affects the trade-off between model size reduction and access overhead of non-zero elements. Selecting a suitable storage format does not pose concerns of paramount significance if the resource is not constrained, for example when the target hardware is in a cloud with abundant resources. On the other hand, the choice of the storage format is critical if the target is a tiny edge device with stringent resource constraints.
The effect of storage selection assumes more significance for sparse neural networks, as they have multiple sparse layers with varied sparsity. Consequently, a storage format with minimal overhead per non-zero element access may not reduce the model size significantly, whereas a highly compressed storage format may actually cause significant computation overhead per non-zero element access, affecting the latency. As latency affects both power and performance, embodiments of the present disclosure focus on balancing the trade-off between size and latency.
FIGS. 3A through 3B depict flow diagrams of an illustrative method 300 for storing the optimized sparse DL model based on sparsity structure specific storage selection. At step 302 of the method 300, the one or more hardware processors 104 are configured to receive the unstructurally sparse DL model having multiple sparse layers, each handling an n-dimensional sparse weight tensor of specific size, the tensor having non-zero elements resulting in sparsity. The unstructurally sparse DL model to be stored in size and latency optimized storage has overall low to medium sparsity, and multiple sparse layers with varied sparsity and sparsity structures. All the sparse weight tensors of the pruned DL model from different sparse layers have varied sparsity and sparsity structures, where the sparsity structure denotes the set of indices of the non-zero elements. Consider an unstructurally pruned DL model M having multiple sparse layers, each handling an n-dimensional sparse weight tensor Ti(d1, …, dn) of size s_1^i × … × s_n^i, where s_k^i is the size of the tensor Ti along dimension dk, the tensor having m^i non-zero elements resulting in sparsity, computed as:
(∏_{k=1}^{n} s_k^i − m^i) / ∏_{k=1}^{n} s_k^i     (1)
At step 304 of the method 300, the one or more hardware processors 104 are configured to parse the n-dimensional indices of non-zero elements and partition the n-dimensional indices into n partitions based on the maximum possible compression of each partition. The parser executing this step parses the sparsity structure of the sparse data (i.e., the indices of non-zero elements) and categorizes the indices into multiple partitions based on the maximum number of outer dimensions used to group the indices. Given an n-dimensional index J from a partition where all n-dimensional indices are grouped at maximum k outermost dimensions, an index I belongs to the same partition if I[1] = J[1], ..., I[k] = J[k], and I[k+1] ≠ J[k+1]. As a result of parsing the n-dimensional indices of the m^i non-zero elements and partitioning said n-dimensional indices into n partitions P_0^i, . . . , P_(n-1)^i based on the maximum possible compression of each partition, partition P_0^i does not showcase any feasible compression, partition P_1^i showcases compression at the outermost dimension, partition P_2^i showcases compression at the outer two dimensions, and so on. The parsing of the sparsity pattern of the sparse tensors splits the data into multiple pieces where each piece is grouped at a maximum number of dimensions. Pieces with no possible grouping are stored using the COOrdinate (COO) format, whereas other pieces are stored using variations of the Compressed Sparse Fibre (CSF) format, based on the number of dimensions used for grouping. At step 306 of the method 300, the one or more hardware processors 104 are configured to block compress the partitions so that each partition is maximally compressed, resulting in an optimized sparse DL model with compact storage and optimized latency.
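As an aside, a minimal sketch of the sparsity computation of Eq. (1) and of the partition-membership test described at step 304 is given below; the function names and the NumPy dependency are assumptions made for this illustration and are not part of the disclosed method.

import numpy as np

def sparsity(weight_tensor):
    # Eq. (1): (prod_k s_k^i - m^i) / prod_k s_k^i, where m^i counts non-zeros.
    total = weight_tensor.size
    m = np.count_nonzero(weight_tensor)
    return (total - m) / total

def same_partition(index_i, index_j, k):
    # Step 304: indices I and J fall in the same partition grouped at the k
    # outermost dimensions if they agree on dimensions 1..k and differ at k+1.
    return tuple(index_i[:k]) == tuple(index_j[:k]) and index_i[k] != index_j[k]

For example, a hypothetical 5 × 3 × 3 layer with 18 non-zero weights has sparsity (45 − 18)/45 = 0.6, i.e., within the 30% to 70% low-to-medium range targeted by the disclosure.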
In an embodiment, the generation of partitions P_(n-1)^i, …, P_0^i of the n-dimensional indices of the m^i non-zero elements of sparse tensor Ti, over which block compression is provided, comprises the following steps:
(a) Assume ipList is the set of n-dimensional indices of the m^i non-zero elements.
(b) For a partition P_j^i, set P_j^i to ∅.
(c) For indices e, e′ ∈ ipList, if e[d1] = e′[d1], …, e[dj] = e′[dj], and e[dj+1] ≠ e′[dj+1], then P_j^i = P_j^i ∪ {e} ∪ {e′}.
(d) Update ipList to ipList \ P_j^i.
(e) Repeat steps (b) to (d) for generation of partitions P_(n-1)^i to P_1^i; the remaining indices form P_0^i.
The block compression of partition P_0^i does not compress data at any dimension, whereas the block compression of each partition in {P_j^i | 1 ≤ j ≤ n−1} compresses index elements at the outer j dimensions and comprises the following steps:
(a) Assume that tensor Ti has m^i non-zero elements and partition P_j^i of Ti has m_j^i n-dimensional indices, where m_j^i < m^i.
(b) For a partition P_j^i, assume C_j^i holds the j-dimensional index elements compressed at the outer j dimensions and U_j^i holds the (n−j)-dimensional uncompressed index elements at the inner n−j dimensions. Set C_j^i and U_j^i to ∅.
(c) For n-dimensional indices e, e′ ∈ P_j^i, if e[d1] ≠ e′[d1], or e[d2] ≠ e′[d2], and so on till e[dj] ≠ e′[dj], then C_j^i = C_j^i ∪ {(e[d1], …, e[dj])} ∪ {(e′[d1], …, e′[dj])}.
(d) For n-dimensional indices e, e′ ∈ P_j^i, U_j^i = U_j^i ∪ {(e[dj+1], …, e[dn])} ∪ {(e′[dj+1], …, e′[dn])}.
(e) Repeat steps (a) to (d) for partitions P_1^i to P_(n-1)^i.
An algorithm or pseudocode outlining the way to partition the indices of the non-zero elements of an n-dimensional sparse tensor T(d1, . . . , dn) is described below. The algorithm takes ipList, the list of indices of the non-zero elements, as the input and constructs at most n partitions, P0, . . . , Pn-1. For indices ei and ek in ipList, if they are identical at dimensions d1, . . . , dj, i.e., ei[dl] = ek[dl] for 1 ≤ l ≤ j and ei[dj+1] ≠ ek[dj+1], then ei and ek belong to the same partition Pj. Once Pj is computed, ipList is updated as ipList = ipList \ Pj. This process repeats from Pn-1 to P1. Finally, the remaining indices in ipList form Partition P0. The cost of the partitioning algorithm is O(n × m), where m is the size of ipList. The partitioning overhead is incurred during the compilation of the model and thus does not affect its latency.
Algorithm 1: Algorithm to partition the set of n-dimensional indices for PaHS storage
Input: ipList, ordered list of n-dimensional indices of non-zero elements.
Output: Partitions P0, P1, …, Pn-1, where P0 contains indices without any feasible grouping, P1 contains indices grouped at dimension d1, P2 contains indices grouped at dimensions d1 and d2, and so on.
for each partition j = n−1 down to 1 do
    Pj = ∅
    for ei, ek ∈ ipList do    ▷ construct Partition Pj
        if ei and ek are identical at dimensions d1, …, dj then
            Pj = Pj ∪ {ei} ∪ {ek}
        end if
    end for
    ipList = ipList \ Pj
end for
P0 = ipList    ▷ construct Partition P0 with the remaining indices
At step 308 of the method 300, the one or more hardware processors 104 are configured to predict the size of the optimized sparse DL model from the size of the block compressed partitions. The predicted size is evaluated against the hardware size constraints. The method provides two mechanisms to store the optimized sparse DL model. In the first mechanism, if the size of an optimized sparse DL model obtained by compression of the sparse tensor in the form of multiple maximally compressed partitions falls within the capacity of the target hardware, the method suggests the Partitioned Hybrid Sparse (PaHS) format for storage.
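By way of illustration, a minimal Python rendering of the partitioning of Algorithm 1 and the block compression of step 306 is sketched below; the helper names and the grouping-by-prefix shortcut are assumptions of this sketch and not the literal implementation of the disclosure.

from collections import defaultdict

def partition_indices(ip_list, n):
    # Algorithm 1: build P_{n-1}, ..., P_1 by grouping the remaining index
    # tuples on their outer-j prefixes; whatever cannot be grouped forms P_0.
    remaining = list(ip_list)
    partitions = {j: [] for j in range(n)}
    for j in range(n - 1, 0, -1):
        groups = defaultdict(list)
        for idx in remaining:
            groups[idx[:j]].append(idx)
        grouped = [idx for g in groups.values() if len(g) > 1 for idx in g]
        partitions[j] = grouped
        grouped_set = set(grouped)
        remaining = [idx for idx in remaining if idx not in grouped_set]
    partitions[0] = remaining
    return partitions

def block_compress(partition, j):
    # Step 306 (for j >= 1): unique outer-j prefixes, a cumulative pointer
    # vector, and the uncompressed inner (n - j)-dimensional index parts.
    outer, ptr, inner = [], [0], []
    for idx in sorted(partition):
        prefix, suffix = idx[:j], idx[j:]
        if not outer or outer[-1] != prefix:
            outer.append(prefix)
            ptr.append(ptr[-1])
        inner.append(suffix)
        ptr[-1] += 1
    return outer, ptr, inner

Applied to the non-zero indices listed in the later discussion of FIGS. 5B-5G, partition_indices reproduces Partition-2 = {⟨0, 0, 0⟩, ⟨0, 0, 1⟩, ⟨0, 0, 2⟩, ⟨3, 1, 0⟩, ⟨3, 1, 1⟩}, Partition-1 = {⟨1, 0, 0⟩, ⟨1, 1, 0⟩}, and Partition-0 = {⟨2, 2, 1⟩, ⟨3, 2, 1⟩, ⟨4, 1, 1⟩}, and block_compress(partition, 2) yields the two outer groups, a jPtr-style pointer vector, and the inner indices of that example.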
In the second mechanism, if the size of an optimized sparse DL model obtained by compression of the sparse tensor in the form of multiple maximally compressed partitions falls beyond the capacity of the target hardware, the method suggests storing the optimized sparse DL model in the Encoded Partitioned Hybrid Sparse (EPaHS) format. At step 310 of the method 300, the one or more hardware processors 104 are configured to store the optimized sparse DL model in the first storage format (PaHS) if the predicted size of the optimized sparse DL model falls within the target hardware constraint. Hardware capacity poses a constraint in deploying the optimized sparse DL model in edge devices due to the larger size of neural networks. A suitable storage strategy is required for accommodating the sparse neural network model without adversely affecting the latency. Practitioners typically follow a trial-and-error method by implementing an individual storage representation for all the sparse layers of the model and executing the model to evaluate the corresponding storage consumption and latency overhead, eventually either accepting or rejecting the model based on the target constraints. This approach suffers from the following shortcomings: (a) the selected storage format is agnostic to the varied sparsity and sparsity structures across different layers of the neural network model, and (b) the effort for sparse acceleration is entirely manual and requires a substantial amount of computing power. Storing the optimized sparse DL model in the first storage format (PaHS) as compressed partitions overcomes the above shortcomings and supports efficient storage. The optimized sparse DL model is stored in the second storage format (EPaHS) if the predicted size of the optimized sparse DL model falls beyond the target hardware constraint. In that case, the block compressed partitions are processed by piece-wise encoding, which is initiated once the size of the optimized model is found to be above the target hardware capacity. The second storage format of the sparsity structure specific storage stores the indices of non-zero elements by performing piece-wise encoding of the block compressed partitions. When the predicted size of the optimized sparse DL model is above the target hardware constraint, the plurality of maximally compressed partitions of n-dimensional indices, grouped at a particular level, are encoded into one-dimensional indices, thereby reducing the number of elements per dimension and the total number of dimensions. The piece-wise encoding of the block-compressed partitions P_0^i, …, P_(n-1)^i results in an optimized storage vs latency trade-off. The data encoding strategy reduces the size without incurring significant latency overhead. To meet the hardware capacity constraint, the PaHS storage is advanced with the encoding strategy. The partitioning strategy leads to block compression of partitions, which in turn leads to the piece-wise encoding of the partitioned data. A partition with grouping at Level Lk showcases block compression at dimensions d1 to dk, which are encoded into a single dimension; the rest of the n − k dimensions are encoded into another single dimension, resulting in the piece-wise encoding of the partition.
Assuming an n-dimensional index ⟨i1, . . . , in⟩ of a tensor of size S1 × . . . × Sn belongs to a partition with grouping at level Lk, the outer k-dimensional index values are encoded as Σ_{l=1}^{k} i_l × ∏_{j=l+1}^{k} S_j and the remaining n−k dimensional index values are encoded as Σ_{l=k+1}^{n} i_l × ∏_{j=l+1}^{n} S_j, whereas the pointer information at level Lk remains unchanged. A partition with no feasible grouping can be encoded by the standard RLE so that all n-dimensional indices reduce into one-dimensional ones. All other partitions with possible groupings are reduced into three-dimensional data, two dimensions storing piece-wise encoded indices and the third dimension storing the pointer information. The compressed-cum-encoded partitions are stored in the Encoded Partitioned Hybrid Sparse (EPaHS) storage format. Indices of an encoded partition with no possible grouping are stored using a single one-dimensional vector, whereas the rest of the encoded partitions with possible groupings are stored using three one-dimensional vectors. The storage of the block-compressed and piece-wise encoded partitions P_0^i, …, P_(n-1)^i is thus customized so that both the model size and the latency of M′ are optimized. At step 312 of the method 300, the one or more hardware processors 104 are configured to deploy the optimized sparse DL model in the target hardware. The optimized model thus obtained is deployable on various edge devices (hardware constrained devices) using the sparsity structure specific storage formats, PaHS and EPaHS.
FIGS. 4A through 4C illustrate the partition types of the PaHS format for three-dimensional sparse tensors. As shown in FIGS. 4A-4C, the indices of non-zero elements of a three-dimensional tensor are partitioned into multiple pieces. Indices ⟨i0, j0, k0⟩ and ⟨i1, j1, k1⟩ in partition (P0) 400 showcase no possible grouping. Indices ⟨i2, j0, k0⟩, ⟨i2, j1, k1⟩, and ⟨i2, j2, k2⟩ in partition (P1) at 402 are grouped by T(i2, :, :), generating groups at Level L1. Similarly, indices ⟨i3, j0, k0⟩, ⟨i3, j0, k1⟩, and ⟨i3, j0, k2⟩ in partition (P2) at 404 are grouped by T(i3, j0, :), generating groups at Level L2. The hierarchical groupings essentially construct partitions holding tree-like structures where each partition must satisfy one of the following properties: (1) Each node at level L1 should have a single and unique child node at level L2, and each child node at level L2 should have a single and unique child node at level L3. This essentially means that all the nodes should follow a single and unique child policy. For example, node i0 at level L1 in FIG. 4A has a single and unique child node j0 at level L2, which also has a single and unique child node k0 at level L3. (2) Each node at level L1 has multiple child nodes at level L2, but each child node at level L2 should have a single and unique child node at level L3. In FIG. 4B, node i2 at level L1 has multiple child nodes j0, j1, and j2, but each child node at level L2 has a single and unique child node at level L3, like j0 has k0, j1 has k1, and j2 has k2. (3) Each node at level L1 should have a single and unique child node at level L2, but each child node at level L2 should have multiple child nodes at level L3. As shown in FIG. 4C, the node i3 at level L1 has a single and unique child node j0 at level L2, which in turn has multiple child nodes k0, k1, and k2 at level L3. Any other form of the tree structure is not valid in the case of partitioning three-dimensional sparse tensors. Such partitioning of indices enables block compression of each partition at different levels of the tree structure.
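Returning briefly to the piece-wise encoding formulas above, a minimal sketch is given below; the function name is an assumption, and the asserted values are the worked encodings discussed later with reference to FIGS. 7A-7H for the 4 × 2 × 3 × 3 tensor.

from math import prod

def piecewise_encode(index, shape, k):
    # Encode an n-dimensional index from a partition grouped at Level Lk:
    # the outer k coordinates collapse into one value and the inner n - k
    # coordinates into another; k = 0 degenerates to a single flat index.
    n = len(shape)
    outer = sum(index[l] * prod(shape[l + 1:k]) for l in range(k))
    inner = sum(index[l] * prod(shape[l + 1:n]) for l in range(k, n))
    return (inner,) if k == 0 else (outer, inner)

shape = (4, 2, 3, 3)
assert piecewise_encode((2, 1, 2, 1), shape, 0) == (52,)    # Partition-0
assert piecewise_encode((1, 1, 1, 0), shape, 1) == (1, 12)  # Partition-1
assert piecewise_encode((3, 1, 2, 1), shape, 2) == (7, 7)   # Partition-2
assert piecewise_encode((3, 0, 0, 2), shape, 3) == (18, 2)  # Partition-3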
A partition with no feasible grouping does not benefit from compression at all, whereas a partition with groups at level Lk block compresses indices at the k outer dimensions. For example, the indices of Partition (P2) 404 in FIG. 4C are grouped at level L2 and thus are block compressed at the outer two dimensions. Block compression is highly beneficial compared to hierarchical compression as block compression removes redundant pointer information. Hierarchical compression at each dimension results in pointer information pointing to the next dimension, as a result creating pointer information to be stored at each of the n−1 dimensions. On the contrary, block compression at the outer k dimensions removes redundant pointer information at the outer k−1 dimensions. As the partitions are created based on the maximum compression, a partition with groups at level Lk showcases no possible compression at the remaining n−k dimensions, therefore removing the need to store any pointer information at the inner n−k−1 dimensions. However, the k-th dimension should store the pointer information pointing to the multiple children at Level Lk+1. For example, the hierarchical compression of Partition (P2) 404 in FIG. 4C results in pointer information at levels L1 and L2, whereas the block compression only stores pointer information at level L2. Block compression leads to customized storage of each partition, resulting in the Partitioned Hybrid Sparse (PaHS) format. The partition with the first property is stored using the COO format because storing this partition using CSF consumes redundant memory due to the unnecessary pointers at level L1 and level L2. The partition satisfying the second property is best stored using CSL, which stores pointers at level L1 but avoids storing redundant pointers at level L2. The partition satisfying the third property is best stored using a variation of CSF that only stores pointers at Level L2 but avoids storing pointers at level L1.
FIGS. 5A through 5G illustrate the PaHS format for a three-dimensional sparse tensor. FIGS. 5A-5G show the partition types of the PaHS format for a 3-dimensional sparse tensor T(d1, d2, d3) that is partitioned into multiple pieces. FIG. 5A is a three-dimensional sparse tensor T(d1, d2, d3) of size 5 × 3 × 3. FIGS. 5B-5D are the partitions generated by grouping at different dimensions. FIG. 5B shows Partition-0 with no possible grouping, FIG. 5C shows Partition-1 with grouping at dimension d1, and FIG. 5D shows Partition-2 with grouping at dimensions d1 and d2. FIGS. 5E-5G show the sparse data stored using PaHS. FIG. 5E shows Partition-0 stored using COO, FIG. 5F shows Partition-1 stored using CSL, and FIG. 5G shows Partition-2 stored using a variation of CSF. In this example, the indices ⟨0, 0, 0⟩, ⟨0, 0, 1⟩, and ⟨0, 0, 2⟩ are grouped at dimensions d1 and d2, resulting in groups at Level L2. Similarly, the indices ⟨3, 1, 0⟩ and ⟨3, 1, 1⟩ are also grouped at Level L2. The resulting Partition-2 is shown in FIG. 5D and the corresponding customized storage using a variation of CSF is shown in FIG. 5G. Note that only the jPtr vector holds the pointer information pointing to the multiple children at Level L3. Indices ⟨1, 0, 0⟩ and ⟨1, 1, 0⟩ in Partition-1 are grouped at dimension d1, resulting in groups at Level L1. Partition-1 in FIG. 5C is stored using the CSL storage format, as shown in FIG. 5F. Indices ⟨2, 2, 1⟩, ⟨3, 2, 1⟩, and ⟨4, 1, 1⟩
cannot be grouped at any level, thereby creating Partition-0 in FIG. 5B. This partition is stored using the COO storage format, as shown in FIG. 5E. The partitioning strategy and formulation of PaHS can be easily extended to tensors of higher dimensions. For an n-dimensional tensor T(d1, . . . , dn) where n > 3, the valid partitioning results in tree structures with the following properties: (1) All nodes of a tree must adhere to the single and unique child policy. (2) Nodes at a maximum of a single level can have multiple children. As a result, no node in a valid tree structure has multiple parents. The block compression of each partition leads to individual storage customized to the level of grouping permitted in the partition. In general, a partition with grouping at Level Lk is stored using a variation of CSF, namely CSF-Lk, that avoids the pointer information at levels L1 to Lk-1 and levels Lk+1 to Ln-1, storing the pointer information only at Level Lk.
FIGS. 6A through 6I illustrate the PaHS format for a four-dimensional sparse tensor, showing the indices stored in different partitions. FIG. 6A is a 4-dimensional sparse tensor T(d1, d2, d3, d4) of size 4 × 2 × 3 × 3. FIGS. 6B-6E show the partitions generated by grouping at different dimensions. FIG. 6B represents Partition-0 with no possible grouping, FIG. 6C represents Partition-1 with grouping at dimension d1, FIG. 6D represents Partition-2 with grouping at dimensions d1 and d2, and FIG. 6E represents Partition-3 with grouping at dimensions d1, d2, and d3. FIGS. 6F-6I represent the sparse data stored using PaHS, where FIG. 6F shows Partition-0 stored using COO, FIG. 6G shows Partition-1 stored using CSF-L1, FIG. 6H shows Partition-2 stored using CSF-L2, and FIG. 6I shows Partition-3 stored using CSF-L3. The tree structures formed by the partitioning of a four-dimensional sparse tensor either strictly follow the single and unique child policy or allow multiple children only at a single level. The tree structure of Partition-1 in FIG. 6C has multiple children only at Level L2, whereas the tree structures in FIG. 6D have multiple children only at Level L3. Similarly, the tree in FIG. 6E has multiple children only at Level L4. Each child node has a single parent. The CSF-L1 storage for Partition-1 in FIG. 6G stores the pointer information at Level L1 pointing to the child nodes at Level L2. Similarly, the CSF-L2 storage for Partition-2 in FIG. 6H stores pointer information at Level L2 pointing to the nodes at Level L3. Likewise, the CSF-L3 storage for Partition-3 in FIG. 6I stores pointer information at Level L3 pointing to the nodes at Level L4. Note that the storage for Partition-0 in FIG. 6F does not hold any pointer information, thus customizing it to COO. The numbers of elements stored by the COO, CSF, and PaHS formats for each partition of the 4-dimensional sparse tensor in FIGS. 6A-6I are shown in Table 1. The COO format stores 8 elements for each of Partition-0, Partition-1, and Partition-3, and 20 elements for Partition-2, requiring a total of 44 elements. The CSF format stores 14 elements for Partition-0, 12 elements for Partition-1, 23 elements for Partition-2, and 10 elements for Partition-3, requiring 59 elements in total. The PaHS format stores 8 elements for Partition-0, 9 elements for Partition-1, 17 elements for Partition-2, and 7 elements for Partition-3, requiring 41 elements in total.
The latency overhead by PaHS is the same as the number of elements stored by the format, assuming accessing each element takes one unit of time.
Table 1
Format    Partition-0    Partition-1    Partition-2    Partition-3    Total
COO       8              8              20             8              44
CSF       14             12             23             10             59
PaHS      8              9              17             7              41
FIGS. 7A through 7H illustrate the piece-wise encoding of the partitions generated from the 4-dimensional tensor of size 4 × 2 × 3 × 3 and shown in FIG. 6B. FIGS. 7A-7D represent the piece-wise encoding of Partitions 0, 1, 2, and 3, respectively. Indices ⟨2, 1, 2, 1⟩ and ⟨3, 0, 1, 0⟩ from Partition-0 are encoded as ⟨2 × 2 × 3 × 3 + 1 × 3 × 3 + 2 × 3 + 1⟩ and ⟨3 × 2 × 3 × 3 + 1 × 3⟩, resulting in encoded values ⟨52⟩ and ⟨57⟩ respectively. Here, the 4-dimensional indices are encoded into one-dimensional ones and are stored using a one-dimensional vector, as shown in FIG. 7E. Indices ⟨1, 0, 0, 1⟩ and ⟨1, 1, 1, 0⟩ in Partition-1 are encoded as ⟨1, 1⟩ and ⟨1, 12⟩. Here, indices at all but the outermost dimension d1 are encoded, generating 2-dimensional encoded indices. For example, the index ⟨1, 1, 1, 0⟩ is encoded as ⟨1, 1 × 3 × 3 + 1 × 3⟩, resulting in the encoded value ⟨1, 12⟩. The encoded indices and the pointer information are stored using three one-dimensional vectors, as shown in FIG. 7F. Similarly, encoding of indices ⟨0, 0, 0, 0⟩, ⟨0, 0, 1, 0⟩, ⟨0, 0, 2, 1⟩, ⟨3, 1, 1, 1⟩, and ⟨3, 1, 2, 1⟩ from Partition-2 generates indices ⟨0, 0⟩, ⟨0, 3⟩, ⟨0, 7⟩, ⟨7, 4⟩, and ⟨7, 7⟩ respectively. Here, values at dimensions d1 and d2 are encoded separately from dimensions d3 and d4. For example, the index ⟨3, 1, 2, 1⟩ is encoded as ⟨3 × 2 + 1, 2 × 3 + 1⟩, resulting in the encoded value ⟨7, 7⟩. The encoded indices with pointer information are stored using three one-dimensional vectors, as shown in FIG. 7G. Finally, indices ⟨3, 0, 0, 1⟩ and ⟨3, 0, 0, 2⟩ from Partition-3 are encoded into ⟨18, 1⟩ and ⟨18, 2⟩, and the encoded indices with pointer information are stored using three one-dimensional vectors, as shown in FIG. 7H. Here, all but the innermost dimension d4 are encoded. For example, the index ⟨3, 0, 0, 2⟩ is encoded as ⟨3 × 2 × 3, 2⟩. The piece-wise encoding approach stems from the combination of data compression and data encoding. Compressing data along each dimension reduces the number of elements to be stored per dimension, whereas encoding the compressed elements reduces the number of dimensions to be stored. As a result, the piece-wise encoding reduces both the number of elements per dimension and the total number of dimensions. However, such piece-wise encoding cannot be applied to hierarchically compressed data, as this compression generates tree-like structures where each level contains pointers to the next level. On the contrary, block compression, enabled by the sparsity structure specific partitioning technique, generates two block-like structures, with a single level of pointer information, that can be individually encoded into single dimensions. The space requirement of a partition with grouping at Level Lk reduces from (k + 1) × lk + (n − k) × mk, when stored using the PaHS format, to 2lk + mk, when stored using the EPaHS format, assuming the grouping of mk indices at Level Lk creates lk groups. For the same partition, the latency overhead caused by EPaHS is 2k × lk + (2n − 2k + 1) × mk, assuming each memory access and computation takes one unit of time.
Out of this, 2k × lk latency overhead occurs due to the access and decode of the lk encoded elements generated by grouping at k dimensions, and (2n − 2k + 1) × mk overhead occurs due to the access and decode of the mk encoded elements of the remaining n − k dimensions. Assuming δRLE, δPaHS, and δEPaHS denote the latency overheads caused by the RLE, PaHS, and EPaHS storage formats for a partition of mk indices with lk possible groups at Level Lk: (a) δRLE is set to (2n − 1) × mk, (b) δPaHS is set to (k + 1) × lk + (n − k) × mk, and (c) δEPaHS is set to 2k × lk + (2n − 2k + 1) × mk. It can be shown that δEPaHS < δRLE. This follows from the fact that the inequality reduces to:
lk / mk < (k − 1) / k     (2)
which is valid as long as k > 1, since grouping ensures lk is at most mk/2 (for k = 1, the right-hand side of (2) is zero while lk / mk is positive, so the inequality cannot hold). On the contrary, the inequality δEPaHS < δPaHS reduces to:
lk / mk < (k − n − 1) / (k − 1)     (3)
which is invalid, since k ≤ n − 1 makes k − n − 1 negative, whereas lk / mk is always positive.
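To make the size vs latency comparison concrete, the following sketch (assumed helper names; not part of the disclosure) evaluates the three overhead expressions above for Partition-2 of the 4 × 2 × 3 × 3 example, where n = 4, k = 2, lk = 2 groups and mk = 5 indices.

def delta_rle(n, m):
    # (a) latency overhead of standard RLE for a partition of m indices
    return (2 * n - 1) * m

def delta_pahs(n, k, l, m):
    # (b) latency overhead of PaHS for l groups at Level Lk over m indices
    return (k + 1) * l + (n - k) * m

def delta_epahs(n, k, l, m):
    # (c) latency overhead of EPaHS for the same partition
    return 2 * k * l + (2 * n - 2 * k + 1) * m

n, k, l, m = 4, 2, 2, 5
print(delta_rle(n, m), delta_pahs(n, k, l, m), delta_epahs(n, k, l, m))
# prints 35 16 33: EPaHS beats RLE here because lk/mk = 0.4 < (k - 1)/k = 0.5,
# as per inequality (2), while PaHS remains cheaper than EPaHS, consistent
# with inequality (3) never holding.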

Documents

Application Documents

# Name Date
1 202321037939-STATEMENT OF UNDERTAKING (FORM 3) [02-06-2023(online)].pdf 2023-06-02
2 202321037939-PROVISIONAL SPECIFICATION [02-06-2023(online)].pdf 2023-06-02
3 202321037939-FORM 1 [02-06-2023(online)].pdf 2023-06-02
4 202321037939-DRAWINGS [02-06-2023(online)].pdf 2023-06-02
5 202321037939-DECLARATION OF INVENTORSHIP (FORM 5) [02-06-2023(online)].pdf 2023-06-02
6 202321037939-FORM-26 [16-08-2023(online)].pdf 2023-08-16
7 202321037939-Proof of Right [01-12-2023(online)].pdf 2023-12-01
8 202321037939-FORM 3 [31-05-2024(online)].pdf 2024-05-31
9 202321037939-FORM 18 [31-05-2024(online)].pdf 2024-05-31
10 202321037939-ENDORSEMENT BY INVENTORS [31-05-2024(online)].pdf 2024-05-31
11 202321037939-DRAWING [31-05-2024(online)].pdf 2024-05-31
12 202321037939-COMPLETE SPECIFICATION [31-05-2024(online)].pdf 2024-05-31
13 Abstract1.jpg 2024-06-28