Abstract: With the growing demand for accurate Deep Learning (DL) systems, larger Convolutional Neural Network (CNN) models have been proposed for various tasks. Due to extreme quantization of network parameters, conventional Binarized CNNs (BCNNs) suffer from significant accuracy loss, which is still unaddressed. The present disclosure creates a binary convolutional neural network (BCNN) as the baseline architecture for a vision based system such as gesture recognition. The BCNN uses single-bit precision computation, replacing floating point arithmetic operations with bit-wise operations that lead to high memory and computational efficiency for DL inference. Although BCNNs can theoretically achieve the maximum possible accuracy due to their functional completeness, in practice binarization reduces BCNN accuracy compared to their full precision CNN counterparts. The present disclosure derives an optimal BCNN architecture for a hand gesture recognition task by optimally applying skip connections within the network using a bi-level optimization technique.
Description:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR OPTIMIZED BINARIZATION OF CONVOLUTIONAL NEURAL NETWORK
Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The disclosure herein generally relates to the field of machine learning and, more particularly, to a method and system for optimized binarization of convolutional neural network.
BACKGROUND
With the growing demand for accurate Deep Learning (DL) systems, larger Convolutional Neural Network (CNN) models have been proposed for various tasks. However, the large CNN models are not always deployable on smart phones or tiny Internet of Things (IoT) devices, which require compact, lightweight, power and energy efficient DL models. During CNN inferencing, convolutional layers involve many floating point operations that result in high computation time and often overload the edge, especially the constrained tiny-edge boards.
Conventional Binary Neural Network (BNN) architectures address this problem through binarization of weights and biases with the sign function, and through bit-wise operations with XNOR and pop count (population count, i.e., the number of bits set in a binary value) replacing the expensive matrix multiplications. Binarization of CNNs, i.e., extreme quantization of 32-bit activations and weights to 1-bit values, can significantly reduce the inference latency, which is particularly important for systems designed with edge or tiny-edge devices as target hardware. Theoretically, BNNs consume 32x less memory and provide 58x faster inference speed compared to conventional CNNs. Additionally, the Straight Through Estimator (STE) is used in BNNs; it treats the binarizing threshold function (sign) as an identity function and directly passes on the gradient during back propagation. This is necessary because the derivative of the sign function is zero almost everywhere, so standard gradient descent is inapplicable to binary network parameters. Hence, during training, the forward pass is through binary parameters whereas the backward pass is through full precision parameters, which are discarded after training completes. Due to extreme quantization of network parameters, Binarized CNNs (BCNNs) tend to suffer from significant accuracy loss compared to their CNN counterparts, a problem that is still unaddressed.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for optimized binarization of convolutional neural network is provided. The method includes receiving, by one or more hardware processors, a Convolutional Neural Network (CNN) model to be optimized, wherein the CNN model is associated with a vision based deep learning solution. Further, the method includes extracting, by the one or more hardware processors, a network configuration and a layer-wise configuration associated with the CNN model using a parsing technique. Furthermore, the method includes generating, by the one or more hardware processors, a Binarized CNN (BCNN) skeleton based on the network configuration and layer-wise configuration, wherein the BCNN skeleton comprises a plurality of Binarized Convolutional (BC) blocks. Furthermore, the method includes computing, by the one or more hardware processors, a plurality of skip connection parameters associated with each of the plurality of BC blocks using a skip connection optimization technique, wherein the plurality of skip connection parameters comprises (i) the number of skip connections from previous BC blocks to the current BC block, (ii) the BC block number from which the skip connections are made, and (iii) the number of channels to be forwarded from a particular BC block. Furthermore, the method includes concatenating, by the one or more hardware processors, intermediate outputs of previous BC blocks from among the plurality of BC blocks to a current BC block of the BCNN based on the plurality of skip connection parameters. Finally, the method includes generating, by the one or more hardware processors, an optimized BCNN using a Bi-level optimization technique, wherein the Bi-level optimization technique trains the BCNN model until a predefined training loss and a predefined validation loss are obtained.
In another aspect, a system for optimized binarization of convolutional neural network is provided. The system includes at least one memory storing programmed instructions, one or more Input/Output (I/O) interfaces, and one or more hardware processors operatively coupled to the at least one memory, wherein the one or more hardware processors are configured by the programmed instructions to receive a Convolutional Neural Network (CNN) model to be optimized, wherein the CNN model is associated with a vision based deep learning solution. Further, the one or more hardware processors are configured by the programmed instructions to extract a network configuration and a layer-wise configuration associated with the CNN model using a parsing technique. Furthermore, the one or more hardware processors are configured by the programmed instructions to generate a Binarized CNN (BCNN) skeleton based on the network configuration and layer-wise configuration, wherein the BCNN skeleton comprises a plurality of Binarized Convolutional (BC) blocks. Furthermore, the one or more hardware processors are configured by the programmed instructions to compute a plurality of skip connection parameters associated with each of the plurality of BC blocks using a skip connection optimization technique, wherein the plurality of skip connection parameters comprises (i) the number of skip connections from previous BC blocks to the current BC block, (ii) the BC block number from which the skip connections are made, and (iii) the number of channels to be forwarded from a particular BC block. Furthermore, the one or more hardware processors are configured by the programmed instructions to concatenate intermediate outputs of previous BC blocks from among the plurality of BC blocks to a current BC block of the BCNN based on the plurality of skip connection parameters. Finally, the one or more hardware processors are configured by the programmed instructions to generate an optimized BCNN using a Bi-level optimization technique, wherein the Bi-level optimization technique trains the BCNN model until a predefined training loss and a predefined validation loss are obtained.
In yet another aspect, a computer program product including a non-transitory computer-readable medium having embodied therein a computer program for optimized binarization of convolutional neural network is provided. The computer readable program, when executed on a computing device, causes the computing device to receive a Convolutional Neural Network (CNN) model to be optimized, wherein the CNN model is associated with a vision based deep learning solution. Further, the computer readable program, when executed on a computing device, causes the computing device to extract a network configuration and a layer-wise configuration associated with the CNN model using a parsing technique. Furthermore, the computer readable program, when executed on a computing device, causes the computing device to generate a Binarized CNN (BCNN) skeleton based on the network configuration and layer-wise configuration, wherein the BCNN skeleton comprises a plurality of Binarized Convolutional (BC) blocks. Furthermore, the computer readable program, when executed on a computing device, causes the computing device to compute a plurality of skip connection parameters associated with each of the plurality of BC blocks using a skip connection optimization technique, wherein the plurality of skip connection parameters comprises (i) the number of skip connections from previous BC blocks to the current BC block, (ii) the BC block number from which the skip connections are made, and (iii) the number of channels to be forwarded from a particular BC block. Furthermore, the computer readable program, when executed on a computing device, causes the computing device to concatenate intermediate outputs of previous BC blocks from among the plurality of BC blocks to a current BC block of the BCNN based on the plurality of skip connection parameters. Finally, the computer readable program, when executed on a computing device, causes the computing device to generate an optimized BCNN using a Bi-level optimization technique, wherein the Bi-level optimization technique trains the BCNN model until a predefined training loss and a predefined validation loss are obtained.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 is a functional block diagram of a system for optimized binarization of Convolutional Neural Network (CNN), in accordance with some embodiments of the present disclosure.
FIG. 2 illustrates a flow diagram for a processor implemented method for optimized binarization of CNN, in accordance with some embodiments of the present disclosure.
FIG. 3 (FIG. 3A and FIG. 3B) illustrates the architecture of the Binarized CNN for the processor implemented method for optimized binarization of CNN, in accordance with some embodiments of the present disclosure.
FIG. 4 and FIG. 5 illustrate experimental results for the processor implemented method for optimized binarization of CNN, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments.
With the growing demand for accurate Deep Learning (DL) systems, larger Convolutional Neural Network (CNN) models have been proposed for various tasks. However, the large models are not always deployable on smart phones or tiny Internet of Things (IoT) devices, which require compact, lightweight, power and energy efficient DL models. During CNN inferencing, convolutional layers involve many floating point operations that result in high computation time and often overload the edge, especially the constrained tiny-edge boards.
Conventional Binary Neural Network (BNN) architectures address this problem through binarization of weights and biases with the sign function, and through bit-wise operations with XNOR and pop count (population count, i.e., the number of bits set in a binary value) replacing the expensive matrix multiplications. Binarization of CNNs, i.e., extreme quantization of 32-bit activations and weights to 1-bit values, can significantly reduce the inference latency, which is particularly important for systems designed with edge or tiny-edge devices as target hardware. Theoretically, BNNs consume 32x less memory and provide 58x faster inference speed compared to conventional CNNs. Additionally, the Straight Through Estimator (STE) is used in BNNs; it treats the binarizing threshold function (sign) as an identity function and directly passes on the gradient during back propagation. This is necessary because the derivative of the sign function is zero almost everywhere, so standard gradient descent is inapplicable to binary network parameters. Hence, during training, the forward pass is through binary parameters whereas the backward pass is through full precision parameters, which are discarded after training completes. Due to extreme quantization of network parameters, BCNNs tend to suffer from significant accuracy loss compared to their CNN counterparts, a problem that is still unaddressed.
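For exposition only, the following is a minimal sketch of such an STE-based binarizer, assuming a PyTorch-style autograd framework (the disclosure itself is framework-agnostic, and the class name BinarizeSTE is illustrative): the forward pass applies the sign function, while the backward pass lets the gradient through as if sign were the identity, zeroed outside [-1, 1].

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a Straight Through Estimator (STE) backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # keep the full-precision input for gradient masking
        return torch.sign(x)       # forward pass uses 1-bit (+1/-1) values

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # STE: treat sign() as identity, but zero the gradient outside [-1, 1]
        return grad_output * (x.abs() <= 1).float()

# Forward pass is through binary values; gradients update the latent
# full-precision weights, which are discarded once training completes.
w = torch.randn(8, requires_grad=True)
BinarizeSTE.apply(w).sum().backward()
```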
In order to overcome the challenges of the conventional approaches, embodiments herein provide a method and system for optimized binarization of convolutional neural network. The present disclosure creates an edge-friendly hand-gesture DL pipeline, for example, a binary convolutional neural network (BCNN) as the baseline architecture for a vision based system such as gesture recognition. The BCNN uses single-bit precision computation, replacing the floating point arithmetic operations with bit-wise operations that lead to high memory and computational efficiency for DL inference. Although BCNNs can theoretically achieve the maximum possible accuracy due to their functional completeness, in practice binarization reduces BCNN accuracy compared to their full precision CNN counterpart. The present disclosure derives an optimal BCNN architecture for the hand gesture recognition task by optimally applying skip connections within the network using the bi-level optimization technique. In an embodiment, the present disclosure recovered 36.4% of the accuracy drop due to binarization of the CNN, and further improved the inference speed to 6.7x more and the memory usage to 20x less than that of the full precision CNN.
Referring now to the drawings, and more particularly to FIG. 1 through FIG. 5, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments, and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 is a functional block diagram of a system for optimized binarization of convolutional neural network, in accordance with some embodiments of the present disclosure. The system 100 includes or is otherwise in communication with hardware processors 102, at least one memory such as a memory 104, and an I/O interface 112. The hardware processors 102, memory 104, and the Input/Output (I/O) interface 112 may be coupled by a system bus such as a system bus 108 or a similar mechanism. In an embodiment, the hardware processors 102 can be one or more hardware processors.
The I/O interface 112 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and interfaces for peripheral device(s) such as a keyboard, a mouse, an external memory, and a printer. Further, the I/O interface 112 may enable the system 100 to communicate with other devices, such as web servers and external databases.
The I/O interface 112 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For this purpose, the I/O interface 112 may include one or more ports for connecting several computing systems or devices with one another, or to another server.
The one or more hardware processors 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, node machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 102 are configured to fetch and execute computer-readable instructions stored in the memory 104.
The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 106. The memory 104 also includes a data repository (or repository) 110 for storing data processed, received, and generated by the plurality of modules 106.
The plurality of modules 106 include programs or coded instructions that supplement applications or functions performed by the system 100 for optimized binarization of convolutional neural network. The plurality of modules 106, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The plurality of modules 106 may also be used as signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 106 can be implemented in hardware, by computer-readable instructions executed by the one or more hardware processors 102, or by a combination thereof. The plurality of modules 106 can include various sub-modules (not shown).
The data repository (or repository) 110 may include a plurality of abstracted pieces of code for refinement, and data that is processed, received, or generated as a result of the execution of the plurality of modules in the module(s) 106.
Although the data repository 110 is shown internal to the system 100, it will be noted that, in alternate embodiments, the data repository 110 can also be implemented external to the system 100, where the data repository 110 may be stored within a database (repository 110) communicatively coupled to the system 100. The data contained within such an external database may be periodically updated. For example, new data may be added into the database (not shown in FIG. 1) and/or existing data may be modified and/or non-useful data may be deleted from the database. In one example, the data may be stored in an external system, such as a Lightweight Directory Access Protocol (LDAP) directory or a Relational Database Management System (RDBMS). The working of the components of the system 100 is explained with reference to the method steps depicted in FIG. 2.
FIG. 2 is an exemplary flow diagram illustrating a method 200 for optimized binarization of convolutional neural network implemented by the system of FIG. 1, according to some embodiments of the present disclosure. In an embodiment, the system 100 includes one or more data storage devices or the memory 104 operatively coupled to the one or more hardware processor(s) 102 and is configured to store instructions for execution of steps of the method 200 by the one or more hardware processors 102. The steps of the method 200 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and the steps of the flow diagram as depicted in FIG. 2. The method 200 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 200 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communication network. The order in which the method 200 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 200, or an alternative method. Furthermore, the method 200 can be implemented in any suitable hardware, software, firmware, or combination thereof.
At step 202 of the method 200, the one or more hardware processors 102 are configured by the programmed instructions to receive a Convolutional Neural Network (CNN) model to be optimized, wherein the CNN model is associated with a vision based deep learning solution. Examples of vision based deep learning solutions include hand-gesture recognition systems, interactions in virtual reality, gaming control, communication through sign language, medical rehabilitation, and the like.
At step 204 of the method 200, the one or more hardware processors 102 are configured by the programmed instructions to extract a network configuration and a layer-wise configuration associated with the CNN model using a parsing technique. For example, the network configuration includes the number of layers in the network, the number of trainable parameters, the types of layers in the network, and the like. Similarly, the layer-wise configuration includes the type of layer operation, kernel sizes and numbers of filters for convolutional layers, stride values, padding values for pool layers, and the like.
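By way of a non-limiting illustration, one possible parsing technique is sketched below, assuming the received CNN model is a PyTorch nn.Module; the function name parse_cnn and the configuration fields are illustrative rather than prescribed by the disclosure.

```python
import torch.nn as nn

def parse_cnn(model: nn.Module):
    """Extract a network configuration and a layer-wise configuration from a CNN."""
    layer_configs = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            layer_configs.append({"name": name, "type": "conv2d",
                                  "kernel_size": module.kernel_size,
                                  "filters": module.out_channels,
                                  "stride": module.stride,
                                  "padding": module.padding})
        elif isinstance(module, nn.MaxPool2d):
            layer_configs.append({"name": name, "type": "maxpool2d",
                                  "stride": module.stride,
                                  "padding": module.padding})
    network_config = {
        "num_layers": len(layer_configs),
        "num_trainable_params": sum(p.numel() for p in model.parameters()
                                    if p.requires_grad),
        "layer_types": sorted({c["type"] for c in layer_configs}),
    }
    return network_config, layer_configs
```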
At step 206 of the method 200, the one or more hardware processors 102 are configured by the programmed instructions to generate a Binarized CNN (BCNN) skeleton based on the network configuration and layer-wise configuration, wherein the BCNN skeleton comprises a plurality of Binarized Convolutional (BC) blocks.
In an embodiment, the BCNN skeleton is shown in FIG. 3A, and the structure of the BC block is shown in FIG. 3B. Now referring to FIG. 3A, the BCNN skeleton includes a stem block and a plurality of BC blocks. Now referring to FIG. 3B, the BC block includes a binary convolutional 3x3 2D layer, a binary convolutional 1x1 2D layer, a concatenation layer, a MaxPool 2D layer, a binary batch normalization layer, and a binary activation layer. The stem block includes a 3x3 Conv2D layer with 16 filters. The intermediate outputs of previous BC-Blocks are concatenated to the current block using skip connections in the network, both to achieve higher accuracy on the target dataset and to reduce the overall computation during inference and the total network parameters.
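As a non-limiting sketch, a BC block along the lines of FIG. 3B might be expressed as follows in PyTorch; the conv argument stands in for a weight-binarized Conv2d (e.g., built on an STE binarizer) and defaults to a plain nn.Conv2d only so the sketch runs, while the skip tensors are assumed to be pooled to a matching spatial size.

```python
import torch
import torch.nn as nn

class BCBlock(nn.Module):
    """Illustrative BC block per FIG. 3B; binary layers are approximated (see comments)."""

    def __init__(self, in_ch, out_ch, skip_ch=0, conv=nn.Conv2d):
        super().__init__()
        half = out_ch // 2
        self.conv3x3 = conv(in_ch, half, kernel_size=3, padding=1)   # one half of out_ch
        self.conv1x1 = conv(in_ch, out_ch - half, kernel_size=1)     # the other half
        self.pool = nn.MaxPool2d(2)
        self.bn = nn.BatchNorm2d(out_ch + skip_ch)  # stands in for binary batch norm

    def forward(self, x, skips=()):
        # `skips` holds channels forwarded from earlier BC blocks via skip
        # connections, assumed to match the spatial size of the branch outputs.
        y = torch.cat([self.conv3x3(x), self.conv1x1(x), *skips], dim=1)
        return torch.sign(self.bn(self.pool(y)))    # binary activation
```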
At step 208 of the method 200, the one or more hardware processors 102 are configured by the programmed instructions to compute a plurality of skip connection parameters associated with each of the plurality of BC blocks using a skip connection optimization technique, wherein the plurality of skip connection parameters comprises (i) the number of skip connections from previous BC blocks to the current BC block, (ii) the BC block number from which the skip connections are made, and (iii) the number of channels to be forwarded from a particular BC block. The skip connections associated with each BC block are optimized during training based on a plurality of optimization parameters, wherein the plurality of optimization parameters comprises an accuracy threshold and a network complexity threshold. The skip connection optimization technique is explained as follows.
In an embodiment, the present disclosure introduces skip connections in BCNNs to generate network outputs as a linear or non-linear combination of the outputs of intermediate layers. Assume that a CNN has dense connections, such as those in a dense block. Then the output of the $k$-th convolutional layer of such a CNN can be represented as:
$y_k = H_k\big(x_{k-1},\, H_{k-1}(x_{k-2}),\, H_{k-2}(x_{k-3}),\, \ldots,\, H_1(x_0),\, x_0\big)$ … (1)
where $H_k$ represents a convolutional layer that maps the network input $x_0$ and the outputs from the $(k-1)$ previous layers to the $k$-th layer output $y_k$, i.e., layer $H_k$ receives channels from all its previous convolutional layers as input. As evident from Equation 1, with increasing $k$, the number of channels at each layer increases exponentially and becomes a network inference bottleneck. One solution is to split the feature maps into two halves and feed-forward one half, reducing the number of channels at each layer's output. This feed-forwarding technique is used in the BCNN design. Although feed-forwarding intermediate outputs to the fore-coming layers in the network helps reduce network parameters, it also reduces the overall model capacity due to sharing of trainable weights.
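For concreteness, the half-split feed-forwarding can be expressed by slicing the channel dimension, as in the following fragment (PyTorch assumed; the helper name is illustrative, and layer is assumed to preserve spatial dimensions):

```python
import torch

def split_feed_forward(x: torch.Tensor, layer):
    """Process one half of the channels; feed the other half forward unchanged."""
    half = x.shape[1] // 2              # channels are dimension 1 in NCHW layout
    processed = layer(x[:, :half])      # first half goes through the next layer
    forwarded = x[:, half:]             # second half is forwarded as-is
    return torch.cat([processed, forwarded], dim=1)
```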
Let the $(k+1)$-th layer of a BCNN require $c_{k+1}$ channels as input. Following Equation 1, the output of the $k$-th layer in the network can be rewritten as:
$y_k = H_{k,3\times 3}(x_k) \oplus H_{k,1\times 1}\big(\big\Vert_{i \in I_k}\, x_i[1:2^i]\big)$ … (2)
where $\oplus$ denotes the concatenation operation and $I_k$ is a set of $r$ elements drawn from the set $S = \{2, 3, 4, \ldots, (n+1)\}$. The value $r$ and the elements $i \in I_k$ are searched such that $(k - i) > 0$. The 3x3 Conv2D in Equation 2 generates half of the $c_{k+1}$ channels, and the other half is concatenated from the output of the 1x1 Conv2D. The value $c_{k+1}$ is taken from the set $\{16, 32, 64, 128\}$. The content of this set is fixed, since increasing the number of its elements would increase the total search time of the optimization process; moreover, a higher number of filters can result in wider models unsuitable for edge/tiny-edge hardware. The subset $I_k$ can be represented as one of the combinations in $\binom{n}{r}$, where $n = \min\big((k-2), \sqrt{c_{k+1}/2}\big)$. The value of $r$ indicates the number of skip connections from previous blocks to the current block. The elements $i \in I_k$ indicate the block number from which the skip connections are made and the number of channels to be forwarded from that block ($2^i$ channels from the $i$-th previous block). The combinations in $I_k$, the value $c_{k+1}$, and the value of $r$ are optimized in the search. Following the conventional search methodology, the categorical choice of a combination in $I_k$ is relaxed to a softmax over all possible combinations in $C$ (Equation 3), and the search is then represented as a bi-level optimization problem, as shown in Equation 4.
$\bar{I}_k = \sum_{I_k \in C} \frac{\exp(\alpha_{I_k})}{\sum_{I_{k'} \in C} \exp(\alpha_{I_{k'}})}\, I_k$ … (3)
$\min_{\alpha} L_{val}\big(\bar{w}(\alpha), \alpha\big)$ subject to $\bar{w}(\alpha) = \operatorname{argmin}_{w} L_{train}(w, \alpha)$ … (4)
Equation 3 gives the optimization of skip connections in a BC-Block. For the $k$-th BC-Block, $\bar{I}_k$ is the set of optimal combinations from all $C$ possible combinations, and $\alpha_{I_k}$ is the BC-Block architecture parameter for the $I_k$ combination of skip connections from its previous BC-Blocks. The overall optimization of the BCNN skeleton architecture $\alpha$ is represented by Equation 4. The weights $w$ and the architecture $\alpha$ determine the training loss $L_{train}$ and the validation loss $L_{val}$. Equations 3 and 4 are optimized together to obtain the optimal architecture $\bar{\alpha}$ with weights $\bar{w}$. The objective here is the minimization of $L_{train}$ and $L_{val}$. The methodology explained above is summarized in Algorithm 1.
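To make the search space concrete, the candidate set of skip-connection combinations for the $k$-th BC-Block can be enumerated along the following lines (a Python sketch; the function name is illustrative, and the integer square root floors $\sqrt{c_{k+1}/2}$):

```python
import itertools
import math

def candidate_combinations(k: int, c_next: int):
    """Enumerate candidate skip-connection combinations I_k for the k-th BC-Block."""
    n = min(k - 2, math.isqrt(c_next // 2))  # n = min((k-2), sqrt(c_{k+1}/2)), floored
    S = range(2, n + 2)                      # S = {2, 3, ..., (n+1)}
    combos = []
    for r in range(1, n + 1):                # r = number of skip connections
        # each i in I_k names a previous block and implies 2^i forwarded
        # channels, subject to the constraint (k - i) > 0
        combos.extend(c for c in itertools.combinations(S, r)
                      if all(k - i > 0 for i in c))
    return combos

# Example: candidate combinations for block k = 5 with c_{k+1} = 64
print(candidate_combinations(5, 64))
```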
At step 210 of the method 200, the one or more hardware processors 102 are configured by the programmed instructions to concatenate intermediate outputs of previous BC blocks from among the plurality of BC blocks to a current BC block of the BCNN based on the plurality of skip connection parameters.
At step 212 of the method 200, the one or more hardware processors 102 are configured by the programmed instructions to generate an optimized BCNN using a Bi-level optimization technique, wherein the Bi-level optimization technique trains the BCNN model until a predefined training loss and a predefined validation loss are obtained. The predefined training loss and the predefined validation loss are the minimum loss values.
Algorithm 1: Finding optimal BCNN architecture from BCNN skeleton
Result: Optimal BCNN architecture $\bar{\alpha}$ with $\bar{I}_k$ optimal combinations at any $k$-th BC-Block/convolutional layer in the skeleton.
$\alpha \leftarrow$ initial BCNN architecture skeleton;
$K \leftarrow$ number of BC-Blocks in BCNN skeleton;
$C \leftarrow$ total combinations for each block;
while not converged do
    for $k$ in $2 \ldots (K-1)$ do
        for $I_k$ in $C$ do
            update weights $w_k$ by $\nabla L_{train}$;
        end
        update architecture $\alpha_{I_k}$ by $\nabla L_{val}$;
    end
end
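For illustration only, Algorithm 1 can be rendered as the following alternating loop in PyTorch, in the style of differentiable architecture search; the model is assumed to accept the softmax-relaxed choice of Equation 3 as an extra input, and all identifiers (including the 15-epoch proxy-training default) are illustrative.

```python
import torch
import torch.nn.functional as F

def bilevel_search(model, alpha, train_loader, val_loader,
                   epochs=15, lr_w=0.01, lr_a=3e-4):
    """Alternate weight updates on L_train with architecture updates on L_val (Eq. 4)."""
    opt_w = torch.optim.SGD(model.parameters(), lr=lr_w)   # lower level: weights w
    opt_a = torch.optim.Adam([alpha], lr=lr_a)             # upper level: logits alpha
    for _ in range(epochs):
        for (x_tr, y_tr), (x_va, y_va) in zip(train_loader, val_loader):
            # Lower level: w(alpha) = argmin_w L_train(w, alpha)
            opt_w.zero_grad()
            F.cross_entropy(model(x_tr, F.softmax(alpha, dim=-1)), y_tr).backward()
            opt_w.step()
            # Upper level: min_alpha L_val(w(alpha), alpha)
            opt_a.zero_grad()
            F.cross_entropy(model(x_va, F.softmax(alpha, dim=-1)), y_va).backward()
            opt_a.step()
```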
In an embodiment, the present disclosure was evaluated experimentally as follows: a customized dataset was created for car music control using hand gestures, comprising 6 categories: Stop, Play, Pause, Rewind, Forward, and Continue. These are represented by the actual hand gestures 'palm', 'one', 'fist', 'dislike', 'like', and 'okay', respectively. The dataset includes images collected from a Raspberry Pi camera as well as from a public dataset. The complete dataset was split in a 6:2:2 ratio for training, validation and testing. For training purposes, an Ubuntu 18.04 workstation with an Intel Xeon 32-core processor, 16GB DRAM and 2 NVIDIA GTX 1080 8GB GPUs was used, and for edge implementation a Raspberry Pi 3 Model B V1.2 (1GB RAM) was used. The present disclosure was also deployed on the Arduino Nano BLE (Bluetooth Low Energy) Sense platform.
FIG. 4 illustrates how the validation loss decreases as the search for the optimal BCNN configuration progresses with time on the training GPU. During the search, a proxy-training of 15 epochs is performed on each candidate BCNN architecture. After exploring approximately 300 architectures in 3 GPU-hours, the optimal BCNN was obtained. FIG. 5 is a confusion matrix indicating the outcome of the present disclosure in the gesture recognition system, where the present disclosure (opt-BCNN) reduces the number of false predictions that would otherwise arise from complete binarization of the network.
Further, it is evident from the min-max normalized representation of average memory consumption and speed of the models during inference on two edge devices, RPi3 and Arduino Nano BLE, that the optimal BCNN is 6.7x more time efficient and 20x more memory efficient than conventional approaches.
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
The embodiments of the present disclosure herein address the unresolved problem of optimized binarization of convolutional neural networks. The present disclosure provides a model to downsize a complex, multi-component, Deep Learning based vision recognition pipeline by replacing some of the full precision Convolutional Neural Network components with binary neural networks. Further, the present disclosure addresses the drastic accuracy drop resulting from the conversion by proposing unique architectural modifications. In an embodiment, the present disclosure provides a 6.2x increase in network inference speed and a 15x reduction in inference memory usage, but an accuracy drop of 48% on the gesture data due to optimized binarization of the CNN.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein such computer-readable storage means contain program-code means for implementation of one or more steps of the method when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs, GPUs and edge computing devices.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e. non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
Claims:
1. A processor implemented method (200), the method comprising:
receiving (202), by one or more hardware processors, a Convolutional Neural Network (CNN) model to be optimized, wherein the CNN model is associated with a vision based deep learning solution;
extracting (204), by the one or more hardware processors, a network configuration and a layer-wise configuration associated with the CNN model using a parsing technique;
generating (206), by the one or more hardware processors, a Binarized CNN (BCNN) skeleton based on the network configuration and layer-wise configuration, wherein the BCNN skeleton comprises a plurality of Binarized Convolutional (BC) blocks;
computing (208), by the one or more hardware processors, a plurality of skip connection parameters associated with each of the plurality of BC blocks using a skip connection optimization technique, wherein the plurality of skip connection parameters comprises (i) the number of skip connections from previous BC blocks to the current BC block, (ii) the BC block number from which the skip connections are made, and (iii) the number of channels to be forwarded from a particular BC block;
concatenating (210), by the one or more hardware processors, intermediate outputs of previous BC blocks from among the plurality of BC blocks to a current BC block of the BCNN based on the plurality of skip connection parameters; and
generating (212), by the one or more hardware processors, an optimized BCNN using a Bi-level optimization technique, wherein the Bi-level optimization technique trains the BCNN model until a predefined training loss and a predefined validation loss are obtained.
2. The method as claimed in claim 1, wherein the number of BC-Blocks in the BCNN skeleton is equivalent to the number of convolutional layers in the CNN model and, wherein an intermediate output of a previous BC block is given as input to current BC blocks.
3. The method as claimed in claim 1, wherein each of the plurality of BC-blocks comprises m x m convolutional 2D, 1x1 convolutional 2D, maxpool 2D, binary batch normalization and binary activation layers, wherein 'm' is derived from the layer-wise configuration of the CNN model.
4. The method as claimed in claim 1, wherein the skip connections associated with each BC block are optimized during training based on a plurality of optimization parameters, wherein the plurality of optimization parameters comprises an accuracy threshold and a network complexity threshold.
5. A system (100) comprising:
at least one memory (104) storing programmed instructions; one or more Input /Output (I/O) interfaces (112); and one or more hardware processors (102) operatively coupled to the at least one memory (104), wherein the one or more hardware processors (102) are configured by the programmed instructions to:
receive a Convolutional Neural Network (CNN) model to be optimized, wherein the CNN model is associated with a vision based deep learning solution;
extract a network configuration and a layer-wise configuration associated with the CNN model using a parsing technique;
generate a Binarized CNN (BCNN) skeleton based on the network configuration and layer-wise configuration, wherein the BCNN skeleton comprises a plurality of Binarized Convolutional (BC) blocks;
compute a plurality of skip connection parameters associated with each of the plurality of BC blocks using a skip connection optimization technique, wherein the plurality of skip connection parameters comprises (i) the number of skip connections from previous BC blocks to the current BC block, (ii) the BC block number from which the skip connections are made, and (iii) the number of channels to be forwarded from a particular BC block;
concatenate intermediate outputs of previous BC blocks from among the plurality of BC blocks to a current BC block of the BCNN based on the plurality of skip connection parameters; and
generate an optimized BCNN using a Bi-level optimization technique, wherein the Bi-level optimization technique trains the BCNN model until a predefined training loss and a predefined validation loss are obtained.
6. The system of claim 5, wherein the number of BC-Blocks in the BCNN skeleton is equivalent to the number of convolutional layers in the CNN model and, wherein an intermediate output of a previous BC block is given as input to current BC blocks.
7. The system of claim 5, wherein each of the plurality of BC-blocks comprises m x m convolutional 2D, 1x1 convolutional 2D, maxpool 2D, binary batch normalization and binary activation layers, wherein 'm' is derived from the layer-wise configuration of the CNN model.
8. The system of claim 5, wherein the skip connections associated with each BC block are optimized during training based on a plurality of optimization parameters, wherein the plurality of optimization parameters comprises an accuracy threshold and a network complexity threshold.