Abstract: A method (600) and system (100) for compressing a deep learning (DL) model are disclosed. A processor receives a verified DL model. The verified DL model is converted into a standard DL model based on a framework corresponding to a plurality of provisional compression types. A compression strategy is selected from a plurality of compression strategies using a neural network (NN) based on a compression feature vector determined from a knowledge graph. A concatenated vector is determined from a model feature vector, a dataset feature vector, and the compression feature vector. The NN is trained based on the concatenated vector. A bias of the NN is trained based on a model score corresponding to the standard DL model. A compression embedding corresponding to the selected compression strategy is determined. [To be published with FIG. 1]
DESCRIPTION
Technical Field
[001] This disclosure relates generally to deep learning models and more particularly to a
method and system of compressing deep learning models.
BACKGROUND
[002] Many real-life situations require immediate processing capabilities directly on the device. Consider a home security camera with on-device artificial intelligence (AI) that must analyze, in real time, the possibility of an intruder in order to warn of a possible intrusion quickly. The primary hurdle in employing cutting-edge AI for real-time applications lies in the limitations of deployment devices, which are constrained by limited resources in both memory and processing power. Most edge devices rely on battery power and are unable to accommodate high-computation tasks, which rapidly deplete the battery. Highly effective deep learning models tend to be sizable, posing a challenge due to their substantial storage requirements and rendering deployment on resource-restricted devices arduous. Furthermore, larger models contribute to extended inference times and increased energy consumption, rendering them impractical for widespread use in real-world applications despite their impressive performance.
[003] Smaller models typically require less computational power, resulting in faster inference times. This is crucial for real-time applications where quick decisions are necessary, such as in autonomous vehicles or real-time image processing. Heavy models, by contrast, face challenges such as network delay and power budget constraints when deployed on cloud infrastructure. In scenarios where models need to be transferred over a network, heavy models increase the amount of data that needs to be transmitted. This is a drawback in situations with limited bandwidth or where data transfer costs are a concern.
[004] Therefore, there is a requirement for an efficient and effective methodology for compressing deep learning models.
SUMMARY OF THE INVENTION
[005] In an embodiment, a method for compressing a deep learning (DL) model is disclosed. The method may include receiving, by a computing device, a verified DL model. In an embodiment, the verified DL model may be determined based on a data verification of a DL model based on a plurality of test datasets. The method may further include converting, by the computing device, the verified DL model into a standard DL model based on a framework corresponding to a plurality of provisional compression types. The method may further include determining, by the computing device, a compression feature vector based on a knowledge graph corresponding to the plurality of provisional compression types. The method may further include determining, by the computing device, a model feature vector based on a set of model attributes corresponding to the standard DL model. The method may further include determining, by the computing device, a dataset feature vector based on a set of input dataset attributes corresponding to an input dataset. The method may further include determining, by the computing device, a concatenated vector of the compression feature vector, the model feature vector, and the dataset feature vector. The method may further include selecting, by the computing device, a compression strategy from a plurality of compression strategies using a neural network (NN). In an embodiment, the NN may be trained based on the concatenated vector. In an embodiment, the trained NN may determine a predicted accuracy rate and a predicted parameter reduction information of each of the plurality of compression strategies. In an embodiment, the compression strategy selected by the trained NN may correspond to one of the plurality of compression strategies having a predicted accuracy rate and a predicted parameter reduction information about equal to a reference accuracy rate and a reference parameter reduction information, respectively, of one or more corresponding provisional compression types from the plurality of provisional compression types. The method may further include determining, by the computing device, a compression embedding corresponding to the selected compression strategy.
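By way of illustration only, the pipeline summarized above may be sketched in Python as follows. This is a minimal sketch, assuming a PyTorch source model and an ONNX export as one plausible standardization route; every name below (standardize, select_strategy, selector_nn, the strategy dictionaries and their ref_accuracy/ref_reduction keys) is a hypothetical placeholder, not part of the disclosure.

```python
# Minimal, illustrative sketch of the summarized pipeline. All names are
# hypothetical placeholders; the disclosure does not prescribe these APIs.
import numpy as np
import torch

def standardize(verified_model, sample_input, path="standard_model.onnx"):
    # Convert the verified DL model into a framework-neutral standard form;
    # ONNX export is assumed here as one possible standardization route.
    torch.onnx.export(verified_model, sample_input, path)
    return path

def select_strategy(compression_vec, model_vec, dataset_vec,
                    selector_nn, strategies):
    # Concatenate the compression, model, and dataset feature vectors.
    x = torch.from_numpy(
        np.concatenate([compression_vec, model_vec, dataset_vec])
    ).float().unsqueeze(0)
    # The trained NN predicts an accuracy rate and a parameter-reduction
    # figure for every candidate compression strategy.
    pred_acc, pred_red = selector_nn(x)
    # Select the strategy whose predictions are about equal to the reference
    # values of its corresponding provisional compression type.
    best, best_gap = None, float("inf")
    for i, s in enumerate(strategies):
        gap = (abs(pred_acc[0, i].item() - s["ref_accuracy"])
               + abs(pred_red[0, i].item() - s["ref_reduction"]))
        if gap < best_gap:
            best, best_gap = s, gap
    return best
```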
[006] In another embodiment, a system of compressing a deep learning (DL) model is disclosed. The system may include a processor and a memory communicably coupled to the processor, wherein the memory may store processor-executable instructions, which, when executed by the processor, may cause the processor to receive a verified DL model. In an embodiment, the verified DL model may be determined based on a data verification of a DL model based on a plurality of test datasets. The processor may further convert the verified DL model into a standard DL model based on a framework corresponding to a plurality of provisional compression types. The processor may further determine a compression feature vector based on a knowledge graph corresponding to the plurality of provisional compression types. The processor may further determine a model feature vector based on a set of model attributes corresponding to the standard DL model. The processor may further determine a dataset feature vector based on a set of input dataset attributes corresponding to an input dataset. The processor may further determine a concatenated vector of the compression feature vector, the model feature vector, and the dataset feature vector. The processor may further select a compression strategy from a plurality of compression strategies using a neural network (NN). In an embodiment, the NN may be trained based on the concatenated vector. In an embodiment, the trained NN may determine a predicted accuracy rate and a predicted parameter reduction information of each of the plurality of compression strategies. In an embodiment, the compression strategy selected by the trained NN may correspond to one of the plurality of compression strategies having a predicted accuracy rate and a predicted parameter reduction information about equal to a reference accuracy rate and a reference parameter reduction information, respectively, of a provisional compression type from the plurality of provisional compression types. The processor may further determine a compression embedding corresponding to the selected compression strategy.
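One plausible shape for the selector NN referenced above is a small multilayer perceptron over the concatenated vector with two output heads, one per predicted quantity. The class name, layer sizes, and activations below are assumptions for illustration; the disclosure does not fix an architecture.

```python
import torch
import torch.nn as nn

class StrategySelector(nn.Module):
    # Hypothetical selector NN: a shared trunk over the concatenated vector
    # and two heads emitting, per compression strategy, a predicted accuracy
    # rate and a predicted parameter-reduction fraction.
    def __init__(self, in_dim, num_strategies, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.accuracy_head = nn.Linear(hidden, num_strategies)
        self.reduction_head = nn.Linear(hidden, num_strategies)

    def forward(self, x):
        h = self.trunk(x)
        # Sigmoids keep both predictions in [0, 1].
        return (torch.sigmoid(self.accuracy_head(h)),
                torch.sigmoid(self.reduction_head(h)))
```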
WE CLAIM:
1. A method (600) of compressing a deep learning (DL) model, the method comprising:
receiving (602), by a computing device (102), a verified DL model, wherein the verified
DL model is determined based on a data verification of a DL model based on a plurality of test
datasets;
converting (604), by the computing device (102), the verified DL model into a standard
DL model based on a framework corresponding to a plurality of provisional compression types;
determining (606), by the computing device (102), a compression feature vector based
on a knowledge graph corresponding to the plurality of provisional compression types;
determining (612), by the computing device (102), a model feature vector based on a
set of model attributes corresponding to the standard DL model;
determining (614), by the computing device (102), a dataset feature vector based on a
set of input dataset attributes corresponding to an input dataset;
determining (616), by the computing device (102), a concatenated vector of the
compression feature vector, the model feature vector and the dataset feature vector;
selecting (622), by the computing device (102), a compression strategy from a plurality
of compression strategies using a neural network (NN),
wherein the NN is trained based on the concatenated vector,
wherein the NN determines a predicted accuracy rate and a predicted parameter
reduction information of each of the plurality of compression strategies, and
wherein the compression strategy selected by the trained NN corresponds to one
of the plurality of compression strategies having a predicted accuracy rate and a
predicted parameter reduction information about equal to a reference accuracy rate and
a reference parameter reduction information respectively of one or more corresponding
provisional compression types from the plurality of provisional compression types; and
determining (624), by the computing device (102), a compression embedding
corresponding to the selected compression strategy.
2. The method (600) as claimed in claim 1, further comprising:
training (620), by the computing device (102), a bias of the NN based on one or more
of a model score corresponding to the verified DL model, a dataset score corresponding to the
input dataset, and a model performance score determined based on the data verification of the
DL model.
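One hedged reading of claim 2 is that the three scores seed the NN's bias terms, for example by initializing an output-head bias from a weighted combination of the scores before training. The function name, the 0.4/0.3/0.3 weights, and the additive form below are assumptions for illustration only, with selector being, e.g., an instance of the StrategySelector sketched earlier.

```python
import torch

def train_bias(selector, model_score, dataset_score, performance_score):
    # Fold the model, dataset, and performance scores into a single prior
    # and write it into the accuracy head's bias. The weights below are
    # illustrative assumptions, not taken from the disclosure.
    prior = 0.4 * model_score + 0.3 * dataset_score + 0.3 * performance_score
    with torch.no_grad():
        selector.accuracy_head.bias.fill_(prior)
```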
3. The method (600) as claimed in claim 1, wherein the knowledge graph is determined by:
determining (608), by the computing device (102), a search space by aggregating a set
of performance parameters of the plurality of provisional compression types for each of a
plurality of domains from domain knowledge corresponding to the plurality of provisional
compression types,
wherein the plurality of domains corresponds to a plurality of technical areas of
application of the DL model; and
determining (610), by the computing device (102), the knowledge graph based on the
search space,
wherein the knowledge graph comprises a plurality of nodes corresponding to
the set of performance parameters, and
wherein each of the plurality of nodes is connected to determine a plurality of
relationships between each of the set of performance parameters of the plurality of
provisional compression types for each of the plurality of domains.
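A minimal sketch of the knowledge-graph construction of claim 3, assuming networkx as the graph library; the node and edge schema, function name, and example parameter values are illustrative assumptions only.

```python
import itertools
import networkx as nx

def build_knowledge_graph(search_space):
    # search_space maps (domain, provisional_compression_type) to that
    # pair's set of performance parameters.
    kg = nx.Graph()
    for (domain, ctype), params in search_space.items():
        nodes = [(domain, ctype, name) for name in params]
        for node, value in zip(nodes, params.values()):
            kg.add_node(node, value=value)
        # Connect the performance-parameter nodes pairwise so their
        # relationships can be traversed per domain and compression type.
        kg.add_edges_from(itertools.combinations(nodes, 2))
    return kg

# Example with hypothetical values:
kg = build_knowledge_graph({
    ("vision", "pruning"): {"ref_accuracy": 0.92,
                            "ref_latency_ms": 14.0,
                            "ref_parameter_reduction": 0.60},
})
```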
4. The method (600) as claimed in claim 3, wherein the set of performance parameters
comprises a plurality of model compression techniques, one or more hyperparameters, one or
more hardware deployment parameters, a reference accuracy, a reference error, a reference
latency time, and a reference parameter reduction corresponding to each of the plurality of
provisional compression types.
5. The method (600) as claimed in claim 4, wherein each of the plurality of compression
strategies comprises a unique combination of the set of performance parameters for
compressing the DL model based on each of the plurality of provisional compression types.
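Claim 5 reads naturally as a Cartesian product over the performance parameters, as in the short sketch below; the specific techniques, hyperparameter values, and hardware targets are hypothetical placeholders.

```python
from itertools import product

# Hypothetical parameter choices; each unique combination below is one
# compression strategy in the sense of claim 5.
techniques = ["pruning", "quantization", "knowledge_distillation"]
hyperparameters = [{"sparsity": 0.5}, {"sparsity": 0.8}]
hardware_targets = ["cpu", "edge_gpu"]

strategies = [
    {"technique": t, "hyperparameters": h, "hardware": hw}
    for t, h, hw in product(techniques, hyperparameters, hardware_targets)
]
```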
6. The method (600) as claimed in claim 1, wherein the plurality of provisional compression
types comprises pruning, quantization, knowledge distillation, network architecture search, and
low-rank approximation.
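Two of the provisional compression types named in claim 6 can be illustrated with stock PyTorch utilities; this shows the compression types themselves, not the claimed selection logic, and the toy model and amounts are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

# Pruning: zero the 50% smallest-magnitude weights of the first layer,
# then make the pruning permanent.
prune.l1_unstructured(model[0], name="weight", amount=0.5)
prune.remove(model[0], "weight")

# Quantization: dynamic int8 quantization of all Linear layers.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```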
7. A system (100) of compressing a deep learning model (DL model), comprising:
a processor (104); and
a memory (106) communicably coupled to the processor (104), wherein the memory
stores processor-executable instructions, which, on execution, cause the processor (104) to:
receive a verified DL model, wherein the verified DL model is determined based
on a data verification of a DL model based on a plurality of test datasets;
convert the verified DL model into a standard DL model based on a framework
corresponding to a plurality of provisional compression types;
determine a compression feature vector based on a knowledge graph
corresponding to the plurality of provisional compression types;
determine a model feature vector based on a set of model attributes
corresponding to the standard DL model;
determine a dataset feature vector based on a set of input dataset attributes
corresponding to an input dataset;
determine a concatenated vector of the compression feature vector, the model
feature vector and the dataset feature vector;
select a compression strategy from a plurality of compression strategies using a
neural network (NN),
wherein the NN is trained based on the concatenated vector,
wherein the NN determines a predicted accuracy rate and a predicted
parameter reduction information of each of the plurality of compression
strategies, and
wherein the compression strategy selected by the trained NN
corresponds to one of the plurality of compression strategies having a predicted
accuracy rate and a predicted parameter reduction information about equal to a
reference accuracy rate and a reference parameter reduction information
respectively of one or more corresponding provisional compression types from
the plurality of provisional compression types; and
determine a compression embedding corresponding to the selected compression
strategy.
8. The system (100) as claimed in claim 7, wherein the processor (104) is configured to:
train a bias of the NN based on one or more of a model score corresponding to the
verified DL model, a dataset score corresponding to the input dataset, and a model performance
score determined based on the data verification of the DL model.
9. The system (100) as claimed in claim 7, wherein the processor (104) is configured to:
determine a search space by aggregating a set of performance parameters of the
plurality of provisional compression types for each of a plurality of domains from domain
knowledge corresponding to the plurality of provisional compression types,
wherein the plurality of domains corresponds to a plurality of technical areas of
application of the DL model; and
determine the knowledge graph based on the search space,
wherein the knowledge graph comprises a plurality of nodes corresponding to
the set of performance parameters, and
wherein each of the plurality of nodes is connected to determine a plurality of
relationships between each of the set of performance parameters of the plurality of
provisional compression types for each of the plurality of domains.
10. The system (100) as claimed in claim 9, wherein the set of performance parameters
comprises a plurality of model compression techniques, one or more hyperparameters, one or
more hardware deployment parameters, a reference accuracy, a reference error, a reference
latency time, and a reference parameter reduction corresponding to each of the plurality of
provisional compression types.
11. The system (100) as claimed in claim 10, wherein each of the plurality of compression
strategies comprises a unique combination of the set of performance parameters for
compressing the DL model based on each of the plurality of provisional compression types.
12. The system (100) as claimed in claim 7, wherein the plurality of provisional compression
types comprises pruning, quantization, knowledge distillation, and low-rank approximation.