Abstract: A method and a system for partitioning a neural network model for offloading computational load between a local device and a server is disclosed. While performing tasks/operations in a distributed environment, a lot of resources of the computing devices are utilized. These computationally intensive tasks/operations are processed using neural network models. The system monitors various performance parameters (battery utilization, CPU utilization, GPU utilization) of the local device to determine performance parameter values. If the performance parameter values satisfy conditions such as the remaining battery power being less than 30% and the CPU utilization being around 60-70%, then the system partitions the neural network model running on the local device, i.e., partitions the hidden layers. Based on the partitioning, the system selects one or more hidden layers to be executed by the server.
The present invention generally relates to a neural network and more particularly, to a method
and a system for partitioning a neural network model.
Background of the Invention
This section is intended to provide information relating to the field of the invention and thus
any approach/functionality described below should not be assumed to be qualified as prior art
merely by its inclusion in this section.
A neural network is a machine learning model which is made up of multiple neurons that act
as processing/computing units. These neurons are primarily arranged in three layers, i.e., an input
layer, a hidden layer and an output layer. Processing of data is mainly performed by the hidden layer
in such a manner that it takes an input from one or more neurons of one layer, processes the input,
and passes an output to one or more neurons of the next layer. Based on the correctness of the
output, the neurons pass feedback to the one or more neurons in the backward direction. This
whole process helps the neural network learn any task/operation over time.
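Purely by way of illustration (and not as part of any claimed embodiment), the following minimal Python sketch shows such a layered arrangement: a small feedforward network whose hidden layers take the output of the previous layer, process it through weight matrices, and pass the result on. All dimensions, weights and the activation function are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Weight matrices of an assumed 3-4-4-2 network: an input layer with 3
# neurons, two hidden layers with 4 neurons each, and an output layer
# with 2 neurons.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)),   # input layer    -> hidden layer 1
           rng.normal(size=(4, 4)),   # hidden layer 1 -> hidden layer 2
           rng.normal(size=(4, 2))]   # hidden layer 2 -> output layer

def forward(x, weights):
    """Each layer processes the output of the previous layer and passes
    its own output on to the next layer."""
    activation = x
    for w in weights:
        activation = sigmoid(activation @ w)
    return activation

print(forward(np.array([0.5, -1.2, 0.3]), weights))
```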
In this way, neural networks are trained over time for performing various tasks/operations
in various computing devices. In many applications, these computing devices work in a
distributed computing environment, for example a user device and a server. The
tasks/operations are computationally intensive, and various resources of the computing devices,
like the battery/power source, central processing unit (CPU), graphics processing unit (GPU),
memory and the like, are utilized for performing them. Optimizing consumption of these
resources becomes a challenge when the number of tasks/operations increases and the devices
become overloaded. Another challenge is to offload or rebalance the tasks/operations between
the devices in the distributed computing environment. However, such offloading/rebalancing
becomes difficult when these operations are linked with the neural network model.
The present invention aims at overcoming the drawbacks in the conventional techniques of
offloading/rebalancing computational load and also provides additional advantages that enhance
the user's experience of running computationally intensive applications on the user's device.
Objects of the Invention
An object of the present invention is to provide a method for partitioning the neural network
model into parts to optimize the usage of the computing devices.
Another object of the present invention is to provide a method of rebalancing the computational
load between the computing devices based on the partitioning of the neural network model.
Summary of the Invention
The present disclosure overcomes one or more shortcomings of the prior art and provides
additional advantages discussed throughout the present disclosure. Additional features and
advantages are realized through the techniques of the present disclosure. Other embodiments
and aspects of the disclosure are described in detail herein and are considered a part of the
claimed disclosure.
In one non-limiting embodiment of the present disclosure, a method for partitioning of a neural
network model for offloading computational load between a local device and a server is
disclosed. The method comprises monitoring a performance of the local device which requires
the neural network model for executing an application while being in a communication with
the server. The neural network model comprises input layers, hidden layers, and output layers.
The method further comprises determining a plurality of performance parameter values of the
local device, while monitoring, corresponding to a plurality of performance parameters
associated with a plurality of components of the local device. The plurality of performance
parameter values indicates computational load borne by the local device while executing the
application. Further, the method comprises partitioning, based on at least one of the plurality
of performance parameter values, the hidden layers into a plurality of hidden layers such that
at least one hidden layer, amongst the plurality of hidden layers, is selected for being executed
by the server for supporting the application running on the local device. The server performs
the execution of the at least one hidden layer by processing corresponding weights associated
with the at least one hidden layer.
In one non-limiting embodiment of the present disclosure, a system for partitioning of a neural
network model for offloading computational load between a local device and a server is
disclosed. The system comprises a monitoring unit to monitor performance of the local device
which requires the neural network model for executing an application while being in a
communication with the server. The neural network model comprises input layers, hidden
layers, and output layers. The system further comprises a determining unit to determine a
plurality of performance parameter values of the local device, while monitoring, corresponding
to a plurality of performance parameters associated with a plurality of components of the local
device. The plurality of performance parameter values indicates computational load borne by
the local device while executing the application. Further, the system comprises partitioning
unit to partition, based on at least one of the plurality of performance parameter values, the
hidden layers into a plurality of hidden layers such that at least one hidden layer, amongst the
plurality of hidden layers, is selected for being executed by the server for supporting the
application running on the local device. The server performs the execution of the at least one
hidden layer by processing corresponding weights associated with the at least one hidden layer.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In
addition to the illustrative aspects, embodiments, and features described above, further aspects,
embodiments, and features will become apparent by reference to the drawings and the
following detailed description.
Brief Description of the Drawings
The embodiments of the disclosure, as well as a preferred mode of use, further objectives
and advantages thereof, will best be understood by reference to the following detailed
description of an illustrative embodiment when read in conjunction with the accompanying
drawings. One or more embodiments are now described, by way of example only, with
reference to the accompanying drawings in which:
Figure 1 shows an exemplary environment 100 of a system for partitioning a neural network
model for offloading computational load between a local device and a server in accordance
with an embodiment of the present disclosure;
Figure 2 shows a block diagram 200 illustrating a system for partitioning the neural network
model in accordance with an embodiment of the present disclosure;
Figure 3 shows a partitioning of the hidden layers of the neural network model in accordance
with an embodiment of the present disclosure; and
Figure 4 shows a method 400 for partitioning the neural network model for offloading
computational load between the local device and the server in accordance with an embodiment
of the present disclosure.
The figures depict embodiments of the disclosure for purposes of illustration only. One skilled
in the art will readily recognize from the following description that alternative embodiments of
the structures and methods illustrated herein may be employed without departing from the
principles of the disclosure described herein.
Detailed Description of the Invention
The foregoing has broadly outlined the features and technical advantages of the present
disclosure in order that the detailed description of the disclosure that follows may be better
understood. It should be appreciated by those skilled in the art that the conception and specific
embodiment disclosed may be readily utilized as a basis for modifying or designing other
structures for carrying out the same purposes of the present disclosure.
The novel features which are believed to be characteristic of the disclosure, both as to its
organization and method of operation, together with further objects and advantages will be
better understood from the following description when considered in connection with the
accompanying figures. It is to be expressly understood, however, that each of the figures is
provided for the purpose of illustration and description only and is not intended as a definition
of the limits of the present disclosure.
Disclosed herein is a method and a system for partitioning the neural network model for
offloading computational load between two or more computing devices, for example a local
device and a server. In a distributed computing environment, many devices work in
coordination with each other. While performing tasks/operations which are highly
computational, a lot of resources of these computing devices are utilized. These
tasks/operations are processed using neural network models, and therefore performing them
becomes computationally intensive. The present disclosure aims to partition
the neural network model running on the computing devices and offload the computational
load between them. For example, the neural network model of the local device may be partitioned (if
the local device faces a resource crunch, such as battery draining or high CPU utilization) in
order to offload the tasks/operations to the server. According to an embodiment of the present
disclosure, the partitioning of the neural network model is performed by splitting the hidden layers
into a plurality of layers and determining which hidden layers should be executed by the local
device and which of them should be executed by the server.
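As a purely illustrative sketch of this idea (not the disclosed implementation), the hidden layers may be pictured as an ordered list that is split at a chosen point into a server-side group and a local group; the layer names and the split point below are hypothetical.

```python
# Hypothetical ordered list of the hidden layers of the model on the
# local device (mirroring HL1-HL3 of figure 3).
hidden_layers = ["HL1", "HL2", "HL3"]

def partition(hidden_layers, offload_count):
    """Split the hidden layers into a group to be executed by the server
    and a group kept on the local device. The value of offload_count is
    what the performance monitoring described below would drive."""
    server_side = hidden_layers[:offload_count]
    local_side = hidden_layers[offload_count:]
    return server_side, local_side

server_side, local_side = partition(hidden_layers, offload_count=2)
print("executed by the server:", server_side)        # ['HL1', 'HL2']
print("executed by the local device:", local_side)   # ['HL3']
```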
Figure 1 shows an exemplary environment 100 of a system for partitioning the neural network
model for offloading computational load between a local device and a server in accordance
with an embodiment of the present disclosure. It must be understood by a person skilled in the art
that the present invention may also be implemented in various environments other than as
shown in figure 1.
The environment 100 comprises the system 102 connected with the local device 104 and the
server 106 through a network 108. The system 102 may be any computing device capable of
monitoring the activities on the local device 104 and the server 106. The local device 104 may
be a user device like a laptop, a mobile device (smartphone), PDA, tablet, and the like. The
server 106 may be any type of server or any computing device remotely connected with the
local device 104 of the user. It may be understood by a skilled person that the functionality of
the system 102 may be implemented in the local device 104 itself, or in the server 106 itself, or
in a combination of the local device 104 and the server 106. In other words, the local device 104
or the server 106 may also be capable of determining how to partition the neural network model
for offloading the computational load between them. From figure 1, it can be observed that the
Neural Network Model (Local Device) 110 is running on the local device 104, whereas the Neural
Network Model (Server) 112 is running on the server 106.
Now, figure 1 is explained in conjunction with figure 2 which shows a block diagram 200
illustrating the system 102 and its components in detail. The system 102 comprises an
input/output interface 202, a processor 204, a memory 206 and various units 210. The I/O
interface 202 may include a variety of software and hardware interfaces, for example, a web
interface, a graphical user interface, input device, output device and the like. The I/O interface
202 may allow the system 102 to interact with users directly or through other devices. The
memory 206 is communicatively coupled to the processor 204. Further, the memory 206
comprises data 208 such as the performance parameter values 212, the Neural Network Model (Local
Device) 110, and the Neural Network Model (Server) 112. Further, the units 210 comprise a monitoring
unit 214, a determining unit 216, a partitioning unit 218 and a synchronizing unit 220. All these
units may be dedicated hardware units capable of performing various operations of the system
102. However, according to an embodiment, the units 214-218 may be a processor or an
application-specific integrated circuit (ASIC) or any circuitry capable of executing instructions
stored in the memory 206 of the system 102.
In the example shown in figure 1, the local device 104 and the server 106 are connected via the
network 108. The local device 104 and the server 106 may be capable of performing various
tasks/operations which utilize their corresponding resources like the battery, central processing
unit (CPU), graphics processing unit (GPU) and the like. However, when these tasks/operations
become computationally intensive, they demand more utilization of the resources, which may
affect the performance of the computing devices, especially the local device 104.
Consider an example in which a user of the local device 104 is playing a game on social media
which requires face processing (image processing), audio processing or video processing. As
these operations require high usage of computing resources (e.g., memory, battery, CPU,
GPU), the game developer may always try to let the local device 104 perform these operations
locally instead of the game developer's server, in order to save cost. However, there could be a situation
where the battery of the user's local device 104 drains, the CPU utilization goes high, or the memory
requirement increases, due to which the local device 104 may not be able to support these
computationally intensive operations, and therefore the user may not be able to play the game. To tackle
this issue, the present disclosure provides a solution in which the Neural Network Model (Local
Device) 110 running on the local device 104 is partitioned and a portion of the same is then
computed on the game developer's server. Since these tasks/operations are processed by the
neural network models, the present disclosure provides a technique of splitting the neural
network model running on the local device 104 so that some of the computational tasks can
be offloaded to the server 106.
When the local device 104 starts running any application, the monitoring unit 214 of the system
102 also starts monitoring the performance of the local device 104 which requires the Neural
Network Model (Local Device) 110 for executing the application while being in a communication
with the server 106. The Neural Network Model (Local Device) 110 comprises input layers, hidden
layers, and output layers as shown in figure 3. The monitoring is done to continuously check
how the local device 104 would perform while running the application. The application may
be any gaming application which requires intensive image/audio/video processing, or any data
analysis application which may demand high usage of the local device 104’s resources or any
other application.
8
For checking the performance, the local device 104 is monitored based on different
performance parameters like processing speed, Central Processing Unit (CPU) utilization,
power utilization, network availability, memory utilization, Graphics Processing Unit
(GPU) utilization, waiting time for existing processes to complete, number of background
applications running on the local device 104, and the like. It may be understood by a skilled
person that various parameters other than those discussed above may also
be monitored by the monitoring unit 214 of the system 102.
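By way of a hedged illustration only, the following sketch shows one way such parameters could be sampled on a local device; it assumes the third-party psutil library and covers only a subset of the parameters listed above.

```python
import psutil  # third-party library; assumed here as one way to read OS-level metrics

def read_performance_parameters():
    """Sample a few of the performance parameters named above.

    Which parameters are sampled, and how, is an assumption for
    illustration; a real monitoring unit could also read GPU utilization,
    waiting times, or the number of background applications from
    platform-specific APIs.
    """
    battery = psutil.sensors_battery()  # may be None on devices without a battery
    return {
        "cpu_utilization_percent": psutil.cpu_percent(interval=1.0),
        "memory_utilization_percent": psutil.virtual_memory().percent,
        "battery_percent": battery.percent if battery else None,
    }

print(read_performance_parameters())
```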
Based on the monitoring, the determining unit 216 determines a plurality of performance
parameter values 212 of the local device 104 corresponding to the plurality of performance
parameters. The plurality of performance parameter values 212 so determined indicates the current
state of the computational load borne by the local device 104 while executing the application.
An example of the plurality of performance parameter values 212 is shown in the table below:
| Performance parameters | Performance parameter values |
|---|---|
| CPU utilization | 60-70% for the last 1 minute |
| Power utilization (Battery power) | 20%-25% |
| Network availability | Minimum 2GS speed |
It can be understood by a skilled person that there may be other performance parameter values
212 different from those shown in the above table. Based on the above values, the system
102 understands that the power of the local device 104 is draining and also that the CPU utilization
has increased. The system 102 may also derive, based on these values, that the application
running on the local device 104 requires immediate support from the server 106 as the local
device 104 may not be able to bear the computational load for a longer period.
To address this, the partitioning unit 218 of the system 102 partitions the hidden layers of the Neural
Network Model (Local Device) 110 into a plurality of hidden layers based on the plurality of
performance parameter values 212 determined in the above table. It can be understood by the
skilled person that the partitioning may be performed based on all, or any one, of the performance
parameter values determined while monitoring the performance of the local device 104. For
example, even if the CPU utilization is not very high, but the system 102 determines that the
remaining power is now less than 20% or 15%, the system 102 immediately performs the
partitioning of the neural network model of the local device 104.
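The following sketch is an assumed, simplified decision rule reflecting the example figures above (battery below roughly 20%, CPU utilization around 60-70%); the disclosure itself does not fix particular threshold values.

```python
# Threshold values are assumptions that mirror the example figures in the text.
BATTERY_THRESHOLD_PERCENT = 20
CPU_THRESHOLD_PERCENT = 60

def should_partition(values):
    """Return True when partitioning should be triggered.

    A single parameter crossing its threshold is sufficient, matching the
    example where a low battery alone triggers partitioning even if the
    CPU utilization is not high.
    """
    battery = values.get("battery_percent")
    low_battery = battery is not None and battery < BATTERY_THRESHOLD_PERCENT
    high_cpu = values.get("cpu_utilization_percent", 0) > CPU_THRESHOLD_PERCENT
    return low_battery or high_cpu

print(should_partition({"battery_percent": 15, "cpu_utilization_percent": 40}))  # True
```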
The partitioning performed on the neural network model is shown in figure 3. It can be observed
that the hidden layers are partitioned into three hidden layers HL1, HL2, and HL3 which are
connected through lines, i.e., the assigned weights. It can be understood by the skilled person
that the partitioning of the neural network model shown in figure 3 is merely an example and the
structure and weights of the neural network model may differ.
Now the system 102 analyses the plurality of weights associated with the hidden layers to
understand which hidden layer, or which pair of neurons, performs more computationally
intensive operations/tasks while running the application on the local device
104. Based upon the analysis, the system 102 may select one or more hidden layers, amongst the
plurality of hidden layers, for being executed by the server 106 for supporting the application
running on the local device 104. In the example shown in figure 3, HL1 and HL2 are selected
for being executed on the server 106.
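For illustration only, the analysis and selection could resemble the following sketch, which ranks hypothetical hidden layers by the number of weights they hold and picks the heaviest ones for server execution; the cost measure and the weight shapes are assumptions.

```python
import numpy as np

# Hypothetical weight matrices for the three hidden layers HL1-HL3.
rng = np.random.default_rng(1)
layer_weights = {
    "HL1": rng.normal(size=(128, 256)),
    "HL2": rng.normal(size=(256, 256)),
    "HL3": rng.normal(size=(256, 32)),
}

def select_layers_for_server(layer_weights, how_many):
    """Rank the hidden layers by a simple cost proxy (number of weights)
    and pick the most expensive ones for execution on the server.

    Using the weight count as the cost measure is an assumption; the
    analysis could equally use weight magnitudes or profiled latency.
    """
    ranked = sorted(layer_weights, key=lambda name: layer_weights[name].size,
                    reverse=True)
    return ranked[:how_many]

print(select_layers_for_server(layer_weights, how_many=2))  # ['HL2', 'HL1']
```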
The system 102 may decide to select both hidden layers (HL1 and HL2) in a single
instance or in two instances, depending upon the variation of the performance parameter
values 212 determined. For example, if the system 102 determines that the CPU
utilization is higher than, or beyond, a predefined CPU usage threshold but the battery power is still in
a decent condition (based on a predefined battery threshold), the system 102 only selects the first
hidden layer HL1 for being executed on the server 106. However, over time, if the user
continues to run/use the application on his/her local device 104 and the battery now comes
down (e.g., to 20%-25%), the system 102 further selects another hidden layer HL2 for being
executed on the server 106. Both the local device 104 and the server 106 are aware of the
partitioning scheme applied to the neural network model.
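A hedged sketch of such staged selection, mirroring the example above, might look as follows; the threshold figures and the mapping to HL1/HL2 are assumptions.

```python
def layers_to_offload(values):
    """Staged selection mirroring the example above: one hidden layer is
    offloaded when only the CPU threshold is crossed, and a second one
    once the battery also drops; all threshold figures are assumptions."""
    count = 0
    if values.get("cpu_utilization_percent", 0) > 60:
        count = 1                              # offload HL1 only
    if values.get("battery_percent", 100) <= 25:
        count = 2                              # battery is also low: offload HL1 and HL2
    return count

print(layers_to_offload({"cpu_utilization_percent": 75, "battery_percent": 80}))  # 1
print(layers_to_offload({"cpu_utilization_percent": 75, "battery_percent": 22}))  # 2
```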
Now, the Neural Network Model (Server) 112 running on the server 106, upon processing the
hidden layers (HL1 and HL2) and their corresponding weights, gives back the response (server
output) to the local device 104. According to an embodiment, the synchronization unit 220 of
the system 102 synchronizes the server output (pertaining to the execution of the hidden layers
HL1 and HL2) with the local device output (pertaining to the execution of the remaining hidden layer
HL3) for running the application on the local device 104.
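Purely as an illustrative sketch (not the disclosed implementation), split inference with such synchronization could proceed as below: the offloaded hidden layers HL1 and HL2 are evaluated by a stand-in for the server, and the local device continues with the remaining layer. The weights are random placeholders and the network round trip is simulated by a plain function.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical weights for HL1 and HL2 (offloaded) and HL3 (kept local).
rng = np.random.default_rng(2)
W_HL1, W_HL2, W_HL3 = (rng.normal(size=(4, 8)),
                       rng.normal(size=(8, 8)),
                       rng.normal(size=(8, 2)))

def server_execute(x):
    """Stand-in for the server-side model 112 running the offloaded hidden
    layers HL1 and HL2; in practice this would be a network round trip."""
    return sigmoid(sigmoid(x @ W_HL1) @ W_HL2)

def local_infer(x):
    server_output = server_execute(x)              # offloaded portion
    local_output = sigmoid(server_output @ W_HL3)  # remaining layer runs locally
    return local_output                            # combined result used by the application

print(local_infer(np.array([0.1, -0.4, 0.9, 0.3])))
```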
Further, the system 102 continuously keeps monitoring the changes in the performance of the
local device 104. When it is determined that the performance parameter values return to a
normal/decent usage state, the system 102 may switch back the execution of the hidden layer(s)
from the server 106 to the local device 104. In other words, when the local device 104 becomes
capable of executing all the hidden layers of the neural network model locally, the processing
is given back to the local device 104. This way, the system 102 manages the computational
load between the local device 104 and the server 106.
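For completeness, a hedged sketch of the switch-back decision is shown below; the recovery thresholds are assumptions, deliberately set away from the offload thresholds so that execution does not oscillate between the server and the local device.

```python
def should_switch_back(values):
    """Return execution of the offloaded hidden layer(s) to the local
    device once its resources have recovered; thresholds are assumptions."""
    battery_ok = values.get("battery_percent", 0) >= 40
    cpu_ok = values.get("cpu_utilization_percent", 100) <= 40
    return battery_ok and cpu_ok

print(should_switch_back({"battery_percent": 85, "cpu_utilization_percent": 25}))  # True
```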
Figure 4 shows a method 400 for partitioning the neural network model in accordance with an
embodiment of the present disclosure.
As illustrated in Figure 4, the method 400 includes one or more blocks illustrating a method
for selectively offloading computational load between the local device and the server. The
method 400 may be described in the general context of computer executable instructions.
Generally, computer executable instructions can include routines, programs, objects,
components, data structures, procedures, modules, and functions, which perform specific
functions or implement specific abstract data types.
The order in which the method 400 is described is not intended to be construed as a limitation,
and any number of the described method blocks can be combined in any order to implement
the method. Additionally, individual blocks may be deleted from the methods without
departing from the spirit and scope of the subject matter described herein.
At block 402, the method 400 may include monitoring a performance of the local device 104
which requires the neural network model 110 for executing an application while being in a
communication with the server 106. The neural network model 110 comprises input layers,
hidden layers, and output layers. Further, the plurality of performance parameter values 212
comprises at least one of a processing speed value, a power utilization value, a network availability
value, and a memory utilization value associated with the local device 104.
At block 404, the method 400 may include determining a plurality of performance parameter
values 212 of the local device 104, while monitoring, corresponding to a plurality of
performance parameters associated with a plurality of components of the local device 104. The
plurality of performance parameter values 212 indicate computational load borne by the local
device 104 while executing the application.
At block 406, the method 400 may include partitioning, based on at least one of the plurality
of performance parameter values 212, the hidden layers into a plurality of hidden layers such
that at least one hidden layer, amongst the plurality of hidden layers, is selected for being
executed by the server 106 for supporting the application running on the local device 104. The
server 106 performs the execution of the at least one hidden layer by processing corresponding
weights associated with the at least one hidden layer.
The terms "including", "comprising", “having” and variations thereof mean "including but not
limited to", unless expressly specified otherwise.
The enumerated listing of items does not imply that any or all the items are mutually exclusive,
unless expressly specified otherwise.
The terms "a", "an" and "the" mean "one or more", unless expressly specified otherwise.
A description of an embodiment with several components in communication with each other
does not imply that all such components are required. On the contrary, a variety of optional
components are described to illustrate the wide variety of possible embodiments of the
invention.
When a single device or article is described herein, it will be readily apparent that more than
one device/article (whether or not they cooperate) may be used in place of a single
device/article. Similarly, where more than one device or article is described herein (whether or
not they cooperate), it will be readily apparent that a single device/article may be used in place
of the more than one device or article or a different number of devices/articles may be used
instead of the shown number of devices or programs. The functionality and/or the features of
a device may be alternatively embodied by one or more other devices which are not explicitly
described as having such functionality/features. Thus, other embodiments of the invention need
not include the device itself.
Finally, the language used in the specification has been principally selected for readability and
instructional purposes, and it may not have been selected to delineate or circumscribe the
inventive subject matter. It is therefore intended that the scope of the invention be limited not
by this detailed description, but rather by any claims that issue on an application based hereon.
Accordingly, the embodiments of the present invention are intended to be illustrative, but not
limiting, of the scope of the invention, which is set forth in the following claims.
While various aspects and embodiments have been disclosed herein, other aspects and
embodiments will be apparent to those skilled in the art. The various aspects and embodiments
disclosed herein are for purposes of illustration and are not intended to be limiting, with the
true scope and spirit being indicated by the following claims.
We claim:
1. A method for partitioning of a neural network model for offloading computational load
between a local device (104) and a server (106), the method comprising:
monitoring a performance of the local device (104) which requires the neural network
model (110) for executing an application while being in a communication with the server (106),
wherein the neural network model (110) comprises input layers, hidden layers, and output
layers;
determining a plurality of performance parameter values (212) of the local device (104),
while monitoring, corresponding to a plurality of performance parameters associated with a
plurality of components of the local device (104), wherein the plurality of performance
parameter values (212) indicate computational load borne by the local device (104) while
executing the application; and
partitioning, based on at least one of the plurality of performance parameter values
(212), the hidden layers into a plurality of hidden layers such that at least one hidden layer,
amongst the plurality of hidden layers, is selected for being executed by the server (106) for
supporting the application running on the local device (104), wherein the server (106) performs
the execution of the at least one hidden layer by processing corresponding weights associated
with the at least one hidden layer.
2. The method as claimed in claim 1, wherein the plurality of performance parameter
values (212) comprises at least one of CPU utilization value, power utilization value, network
availability value, and memory utilization value associated with the local device (104).
3. The method as claimed in claim 1, wherein the selection of the at least one hidden layer
for being executed by the server (106) is performed by analyzing a plurality of weights
associated with the plurality of hidden layers after being partitioned.
4. The method as claimed in claim 1, further comprising synchronizing server output,
received upon the execution of the at least one hidden layer by the server (106), with local
device output pertaining to the execution of remaining hidden layers, of the plurality of hidden
layers, for running the application.
5. The method as claimed in claim 1, further comprising switching back the execution of
the at least one hidden layer from the server (106) to the local device (104) upon observing
changes in the at least one of the performance parameter values (212), wherein the changes
indicate the capability of the local device (104) for being able to execute the at least one hidden
layer along with the remaining hidden layers of the plurality of hidden layers of the neural
network model (110).
6. A system (102) for partitioning of a neural network model for offloading computational
load between a local device (104) and a server (106), the system (102) comprises:
a monitoring unit (214) to monitor performance of the local device (104) which requires
the neural network model (110) for executing an application while being in a communication
with the server (106), wherein the neural network model (110) comprises input layers, hidden
layers, and output layers;
a determining unit (216) to determine a plurality of performance parameter values (212)
of the local device (104), while monitoring, corresponding to a plurality of performance
parameters associated with a plurality of components of the local device (104), wherein the
plurality of performance parameter values (212) indicate computational load borne by the local
device (104) while executing the application; and
a partitioning unit (218) to partition, based on at least one of the plurality of performance
parameter values (212), the hidden layers into a plurality of hidden layers such that at least one
hidden layer, amongst the plurality of hidden layers, is selected for being executed by the server
(106) for supporting the application running on the local device (104), wherein the server (106)
performs the execution of the at least one hidden layer by processing corresponding weights
associated with the at least one hidden layer.
7. The system (102) as claimed in claim 6, wherein the plurality of performance parameter
values (212) comprises at least one of CPU utilization value, power utilization value, network
availability value, and memory utilization value associated with the local device (104).
8. The system (102) as claimed in claim 6, wherein the partitioning unit (218) selects the
at least one hidden layer for being executed by the server (106) by analyzing a plurality of
weights associated with the plurality of hidden layers after being partitioned.
9. The system (102) as claimed in claim 6, further comprises a synchronization unit (220)
to synchronize server output, received upon the execution of the at least one hidden layer by
the server (106), with local device output pertaining to the execution of remaining hidden
layers, of the plurality of hidden layers, for running the application.
10. The system (102) as claimed in claim 6 is further configured to switch back the
execution of the at least one hidden layer from the server (106) to the local device (104) upon
observing changes in the at least one of the performance parameter values (212), wherein the
changes indicate the capability of the local device (104) for being able to execute the at least
one hidden layer along with the remaining hidden layers of the plurality of hidden layers of the
neural network model.