
Methods And Systems For Implementing Privacy Enabled Artificial Intelligence Applications Using Fully Homomorphic Encryption

Abstract: The use of convolutional neural networks (CNNs) is becoming ubiquitous in solving computer vision problems. However, training a CNN to perform a computer vision task is a challenge because a large amount of training data is required to train the model. Further, for training purposes, the data needs to be shared with a different entity who works on the CNN model, thus giving that entity access to the shared data. As the data consists of raw images, there is always a threat of compromising the privacy of the users associated with the data. The present application provides methods and systems for implementing privacy enabled AI applications using fully homomorphic encryption (FHE). The system first applies pruning algorithms over a trained CNN to obtain corresponding pruned CNNs. The system then uses a variable length packing technique of FHE to optimize private inference for the pruned CNNs. Thereafter, the system uses a multi-criteria decision analysis algorithm to obtain the best network solution for the trained CNN based on user requirements. [To be published with FIG. 4]


Patent Information

Filing Date
27 October 2021
Publication Number
17/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Grant Date
2024-04-24

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point Mumbai Maharashtra India 400021

Inventors

1. GUBBI LAKSHMINARASIMHA, Jayavardhana Rama
Tata Consultancy Services Limited Gopalan Global Axis, SEZ "H" Block, No. 152 (Sy No. 147,157 & 158), Hoody Village, Bangalore Karnataka India 560066
2. BHATTACHAR, Rajan Mindigal Alasingara
Tata Consultancy Services Limited Unit-III, No 18, 4th Floor, Cubicle No:3, SJM Towers, Seshadri Road, Gandhinagar, Bangalore Karnataka India 560009
3. SHAIK, Imtiyazuddin
Tata Consultancy Services Limited Deccan Park, Plot No 1, Survey No. 64/2, Software Units Layout, Serilingampally Mandal, Madhapur, Hyderabad Telangana India 500081
4. CHAUDHARI, Raj Anil
Tata Consultancy Services Limited Gopalan Global Axis, SEZ "H" Block, No. 152 (Sy No. 147,157 & 158), Hoody Village, Bangalore Karnataka India 560066
5. RANGARAJAN, Mahesh
Tata Consultancy Services Limited Gopalan Global Axis, SEZ "H" Block, No. 152 (Sy No. 147,157 & 158), Hoody Village, Bangalore Karnataka India 560066
6. KUMAR, Sahil
Tata Consultancy Services Limited Gopalan Global Axis, SEZ "H" Block, No. 152 (Sy No. 147,157 & 158), Hoody Village, Bangalore Karnataka India 560066
7. SANTHANAM, Sivakumar Kuppusamy
Tata Consultancy Services Limited No :-185/188 Lloyds Road, Gopalapuram, Chennai Tamil Nadu India 600086
8. PURUSHOTHAMAN, Balamuralidhar
Tata Consultancy Services Limited Gopalan Global Axis, SEZ "H" Block, No. 152 (Sy No. 147,157 & 158), Hoody Village, Bangalore Karnataka India 560066
9. LODHA, Sachin
Tata Consultancy Services Limited Tata Research Development & Design Centre, 54-B, Hadapsar Industrial Estate, Hadapsar, , Pune Maharashtra India 411013

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
METHODS AND SYSTEMS FOR IMPLEMENTING PRIVACY
ENABLED ARTIFICIAL INTELLIGENCE APPLICATIONS USING
FULLY HOMOMORPHIC ENCRYPTION
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD [001] The disclosure herein generally relates to data privacy, and, more particularly, to methods and systems for implementing privacy enabled artificial intelligence (AI) applications using Fully Homomorphic Encryption (FHE).
BACKGROUND
[002] Computer vision is one of the fast-advancing technologies due to its application in multiple areas, such as face recognition and identification, crowd analytics, smart cars, medical imaging, etc. Nowadays, machine learning powered by convolutional neural networks (CNNs) is widely used in computer vision and is becoming ubiquitous in solving computer vision problems. But training CNNs for performing a task associated with computer vision is another challenge, as it requires a large amount of data to train a model to perform that particular task.
[003] In a conventional setting, once a machine is trained and the model is created, raw test images are provided as input at the inference stage. The trained model is then deployed during inferencing either in a Cloud or on a client machine. But most of the time, a business that is an expert in creating machine learning models creates the model for performing the task, and a company (referred to as the client) that has hired the business provides the company data to the creator of the model for training as well as for inference purposes. This sharing of client data with the business is not always welcome, as it sometimes compromises the privacy of the users whose data is collected as part of the client data. Further, in case the client data that is to be shared with the business is subject to regulatory compliance, there is no option to share the client data, thus making the client data unusable. Similarly, during testing also, the client data must be shared with the business. So, irrespective of the deployment scenario (Cloud or Edge), client data is required to be shared. On the other hand, the business developing the machine learning model cannot deploy the model code at the client side due to copyright infringement issues.
[004] To overcome the above-mentioned challenges, encryption can be used to ensure security during data transmission, but encryption alone cannot ensure data privacy, as computations cannot be performed on the encrypted data. The encrypted data needs to be decrypted first before computations can be performed, which again causes privacy issues. One more issue associated with encryption is that performing private inference of a CNN on encrypted data is computationally expensive.
[005] Additionally, cameras used to collect data for the computer vision models sometimes create the additional challenge of exposing people and their activities, which may not be allowed under General Data Protection Regulation (GDPR) laws.
SUMMARY [006] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, there is provided a processor implemented method for implementing privacy enabled artificial intelligence (AI) applications using Fully Homomorphic Encryption (FHE). The method comprises receiving, by a private inferencing system via one or more hardware processors, (i) a trained convolutional neural network (CNN), the trained CNN comprising a plurality of layers, each layer of the plurality of layers comprising a plurality of neurons, (ii) one or more pruning algorithms, (iii) a list of user requirement constraints, the list comprising a range of required time constraint values, a range of required memory constraint values, a range of required accuracy constraint values, and a range of required security constraint values, (iv) a trained CNN layer number, and (v) a server side layer number; determining, by the private inferencing system via one or more hardware processors, a plurality of estimated time values, a plurality of estimated memory values, a plurality of estimated accuracy values and a plurality of estimated security values for each pruning algorithm of the one or more pruning algorithms by iteratively performing: applying, by the private inferencing system via one or more hardware processors, a pruning algorithm of the one or more pruning algorithms on the trained CNN to obtain a pruned CNN corresponding to the pruning algorithm; determining, by the private inferencing system via one or more hardware
processors, a minimum in-degree of the pruned CNN; assigning, by the private inferencing system via one or more hardware processors, a packing length for the pruned CNN based on the minimum in-degree of the pruned CNN, wherein the packing length corresponds to a number of neuron inputs to be packed in each layer of the pruned CNN; iteratively performing: performing, by the private inferencing system via one or more hardware processors, packing of the plurality of neuron inputs in each layer of the pruned CNN based on the packing length to obtain a packed CNN; estimating, by the private inferencing system via one or more hardware processors, an estimated time value, an estimated memory value, an estimated accuracy value and an estimated security value for the packed CNN based on the pruned CNN, the trained CNN layer number, and the server side layer number; adding, by the private inferencing system via one or more hardware processors, the estimated time value in a list of estimated time values, the estimated memory value in a list of estimated memory values, the estimated accuracy value in a list of estimated accuracy values and the estimated security value in a list of estimated security values; and increasing, by the private inferencing system via one or more hardware processors, the packing length based on a pre-defined criteria, until the packing length reach a maximum packing length for the packed CNN; identifying, by the private inferencing system via one or more hardware processors, time values present in the list of estimated time values as the plurality of estimated time values, memory values present in the list of estimated memory values as the plurality of estimated memory values, accuracy values present in the list of estimated accuracy values as the plurality of estimated accuracy values and security values present in the list of estimated security values as the plurality of estimated security values for the corresponding pruning algorithm; identifying, by the private inferencing system via one or more hardware processors, another pruning algorithm among the one or more pruning algorithms as the pruning algorithm, until all the pruning algorithms in the one or more pruning algorithms are identified; determining, by the private inferencing system via one or more hardware processors, a network solution for the trained CNN based, at least in part, on the plurality of estimated time values, the plurality of estimated memory values, the
plurality of estimated accuracy values and the plurality of estimated security values that are determined for each pruning algorithm and the list of user requirement constraints using a multi-criteria decision analysis algorithm; and displaying, by the private inferencing system via one or more hardware processors, the network solution, wherein the network solution comprises a recommended pruning algorithm among the one or more pruning algorithms, an optimal packing length and a Fully Homomorphic Encryption (FHE) parameter.
[007] In another aspect, there is provided a private inferencing system for implementing privacy enabled AI applications using FHE. The system comprises a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive (i) a trained convolutional neural network (CNN), the trained CNN comprising a plurality of layers, each layer of the plurality of layers comprising a plurality of neurons, (ii) one or more pruning algorithms, (iii) a list of user requirement constraints, the list comprising a range of required time constraint values, a range of required memory constraint values, a range of required accuracy constraint values, and a range of required security constraint values, (iv) a trained CNN layer number, and (v) a server side layer number; determine a plurality of estimated time values, a plurality of estimated memory values, a plurality of estimated accuracy values and a plurality of estimated security values for each pruning algorithm of the one or more pruning algorithms by iteratively performing: applying a pruning algorithm of the one or more pruning algorithms on the trained CNN to obtain a pruned CNN corresponding to the pruning algorithm; determining a minimum in-degree of the pruned CNN; assigning a packing length for the pruned CNN based on the minimum in-degree of the pruned CNN, wherein the packing length corresponds to a number of neuron inputs to be packed in each layer of the pruned CNN; iteratively performing: performing packing of the plurality of neuron inputs in each layer of the pruned CNN based on the packing length to obtain a packed CNN; estimating an estimated time value, an estimated memory value, estimated accuracy value and an estimated
security value for the packed CNN based on the pruned CNN, the trained CNN layer number, and the server side layer number; adding the estimated time value in a list of estimated time values, the estimated memory value in a list of estimated memory values, the estimated accuracy value in a list of estimated accuracy values and the estimated security value in a list of estimated security values; increasing, by the private inferencing system via one or more hardware processors, the packing length based on a pre-defined criteria, until the packing length reaches a maximum packing length for the packed CNN; identifying time values present in the list of estimated time values as the plurality of estimated time values, memory values present in the list of estimated memory values as the plurality of estimated memory values, accuracy values present in the list of estimated accuracy values as the plurality of estimated accuracy values and security values present in the list of estimated security values as the plurality of estimated security values for the corresponding pruning algorithm; identifying another pruning algorithm among the one or more pruning algorithms as the pruning algorithm, until all the pruning algorithms in the one or more pruning algorithms are identified; determine a network solution for the trained CNN based, at least in part, on the plurality of estimated time values, the plurality of estimated memory values, the plurality of estimated accuracy values and the plurality of estimated security values that are determined for each pruning algorithm and the list of user requirement constraints using a multi-criteria decision analysis algorithm; and display the network solution, wherein the network solution comprises a recommended pruning algorithm among the one or more pruning algorithms, an optimal packing length and a Fully Homomorphic Encryption (FHE) parameter.
[008] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause a method for implementing privacy enabled AI applications using FHE. The method comprises receiving, by a private inferencing system via one or more hardware processors, (i) a trained convolutional neural network (CNN), the trained CNN comprising a plurality of layers, each layer of the plurality of layers comprising a
plurality of neurons, (ii) one or more pruning algorithms, (iii) a list of user requirement constraints, the list comprising a range of required time constraint values, a range of required memory constraint values, a range of required accuracy constraint values and a range of required security constraint values, (iv) a trained CNN layer number, and (v) a server side layer number; determining, by the private inferencing system via one or more hardware processors, a plurality of estimated time values, a plurality of estimated memory values, a plurality of estimated accuracy values and a plurality of estimated security values for each pruning algorithm of the one or more pruning algorithms by iteratively performing: applying, by the private inferencing system via one or more hardware processors, a pruning algorithm of the one or more pruning algorithms on the trained CNN to obtain a pruned CNN corresponding to the pruning algorithm; determining, by the private inferencing system via one or more hardware processors, a minimum in-degree of the pruned CNN; assigning, by the private inferencing system via one or more hardware processors, a packing length for the pruned CNN based on the minimum in-degree of the pruned CNN, wherein the packing length corresponds to a number of neuron inputs to be packed in each layer of the pruned CNN; iteratively performing: performing, by the private inferencing system via one or more hardware processors, packing of the plurality of neuron inputs in each layer of the pruned CNN based on the packing length to obtain a packed CNN; estimating, by the private inferencing system via one or more hardware processors, an estimated time value, an estimated memory value, an estimated accuracy value and an estimated security value for the packed CNN based on the pruned CNN, the trained CNN layer number, and the server side layer number; adding, by the private inferencing system via one or more hardware processors, the estimated time value in a list of estimated time values, the estimated memory value in a list of estimated memory values, the estimated accuracy value in a list of estimated accuracy values and the estimated security value in a list of estimated security values; and increasing, by the private inferencing system via one or more hardware processors, the packing length based on a pre-defined criteria, until the packing length reaches a maximum packing length for the packed CNN; identifying, by the private
inferencing system via one or more hardware processors, time values present in the list of estimated time values as the plurality of estimated time values, memory values present in the list of estimated memory values as the plurality of estimated memory values, accuracy values present in the list of estimated accuracy values as the plurality of estimated accuracy values and security values present in the list of estimated security values as the plurality of estimated security values for the corresponding pruning algorithm; identifying, by the private inferencing system via one or more hardware processors, another pruning algorithm among the one or more pruning algorithms as the pruning algorithm, until all the pruning algorithms in the one or more pruning algorithms are identified; determining, by the private inferencing system via one or more hardware processors, a network solution for the trained CNN based, at least in part, on the plurality of estimated time values, the plurality of estimated memory values, the plurality of estimated accuracy values and the plurality of estimated security values that are determined for each pruning algorithm and the list of user requirement constraints using a multi-criteria decision analysis algorithm; and displaying, by the private inferencing system via one or more hardware processors, the network solution, wherein the network solution comprises a recommended pruning algorithm among the one or more pruning algorithms, an optimal packing length and a Fully Homomorphic Encryption (FHE) parameter.
[009] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS [010] The accompanying drawings, which are incorporated in and
constitute a part of this disclosure, illustrate exemplary embodiments and, together
with the description, serve to explain the disclosed principles:
[011] FIG. 1 is an example representation of an environment, related to at
least some example embodiments of the present disclosure.

[012] FIG. 2 illustrates an exemplary block diagram of a system for implementing privacy enabled artificial intelligence (AI) applications using Fully Homomorphic Encryption (FHE), in accordance with an embodiment of the present disclosure.
[013] FIGS. 3A, 3B and 3C, collectively, represent an exemplary flow diagram of a method for implementing privacy enabled AI applications using Fully Homomorphic Encryption, in accordance with an embodiment of the present disclosure.
[014] FIG. 4 illustrates an example representation of a system architecture used for implementing privacy enabled AI applications using FHE, in accordance with an embodiment of the present disclosure.
[015] FIG. 5 illustrates an example representation of a variable length packed convolutional neural network (CNN) used for implementing privacy enabled AI applications along with a regular CNN and a pruned CNN, in accordance with an embodiment of the present disclosure.
[016] FIG. 6 is a tabular representation illustrating impact of variable length packing on performance of fully connected layer computations, in accordance with an embodiment of the present disclosure.
[017] FIG. 7 is a tabular representation illustrating impact of pruning on accuracy and computation, in accordance with an embodiment of the present disclosure.
[018] FIG. 8 is a tabular representation illustrating performance of private inference for different models, in accordance with an embodiment of the present disclosure.
[019] FIG. 9 is a tabular representation illustrating different network solutions recommended by the system in different scenarios, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS [020] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number
identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[021] Nowadays, convolutional neural networks (CNNs) are widely used in computer vision because of their robust performance in a variety of applications. As discussed previously, computer vision-based algorithms, such as image detection algorithms, classification algorithms, etc., are generally offered as a service by cloud service providers, thus giving the service providers access to the rich data that is sent as a query. In the case of computer vision-based algorithms, the rich data typically consists of images, thereby posing a serious threat to the privacy of the user. To handle this concern, many service providers have started using encryption to ensure security of the rich data during transmission. However, encryption has its own limitation, as encryption can only ensure security of the data during transmission, not data privacy.
[022] In the recent past, several techniques have been introduced in the field of private inference. For example, in a private inferencing technique by Nathan Dowlin et al. (e.g., refer "CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. Proceedings of the 33rd ICML (2016)"), the authors applied neural networks to encrypted data with high throughput. The authors demonstrated predictions on MNIST data and explained how each layer in a fully-connected block can be computed on encrypted data efficiently, thus achieving a throughput of 58,982 predictions per hour. A few other authors also followed the same approach to private inference, making some changes in approximations and using new optimization methods. However, these techniques require constant communication between each layer of the CNN.
[023] To overcome the above challenge, Riazi et al. in XONN came up with a constant communication technique that can perform communication up to 21 layers of a fully connected (FC) CNN using Oblivious Transfer (OT). Further, a few authors came up with a technique named HCNN (the first homomorphic CNN) that uses graphics processing unit support for private inference.
[024] Recently, some techniques based on end-to-end neural networks (CNN+FC) have been introduced for private inference using fully homomorphic encryption (FHE), in which the need for communication is eliminated. Similarly, a few more techniques have been introduced for inference in which a machine learning model is trained on encrypted data to ensure data privacy.
[025] Though the above mentioned techniques have shown substantially improved performance in ensuring the security of user data, their focus remains on optimization of encryption implementations and the corresponding libraries. Private inferencing techniques that ensure security and privacy of user data by design itself, i.e., techniques that can perform a computer vision task on encrypted data, are still to be explored.
[026] Embodiments of the present disclosure overcome the above-mentioned disadvantages, such as compromised privacy of user, higher computational expense, etc., by providing systems and methods for implementing privacy enabled AI applications using fully homomorphic encryption (FHE). More specifically, the systems and methods of the present disclosure follow a divide and conquer strategy for achieving private inference using FHE. Firstly, the system analyses the effect of a plurality of pruning algorithms on performance of a trained encrypted fully connected CNN for private inference by applying the plurality of pruning algorithms over the CNN to obtain the corresponding pruned CNNs. Secondly, the system uses a variable length packing technique of FHE to optimize private inference for the pruned CNNs. Thirdly, the system uses a multi-criteria decision analysis algorithm to obtain the best network solution for the trained encrypted CNN based on user requirements. The multi-criteria decision analysis algorithm takes user requirements like latency, communication cost, accuracy, security as inputs and analyses all possible solutions obtained using the plurality of pruned CNNs and variable length packing technique to output the best solution to realize private inference for the given CNN. The best solution includes a recommended pruning algorithm among the one or more pruning algorithms, an
optimal packing length that works best for the recommended pruning algorithm and a FHE parameter.
[027] In the present disclosure, the system and method ensure security and privacy of data during transmission by providing a private inferencing system (explained in detail with reference to FIG. 1) that automatically recommends appropriate parameters in pruning and variable length packing while considering the preferences of users for the desired quality of service. The recommended parameters are then used to create an optimal network design of the CNN model that ensures the privacy of data while reducing computational and memory overhead.
[028] Referring now to the drawings, and more particularly to FIGS. 1 through 9, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[029] FIG. 1 illustrates an exemplary representation of an environment 100 related to at least some example embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise depending on, for example, extracting features from user data, encrypting user data, sharing the encrypted user data, etc. The environment 100 generally includes a user device 102 and a server 106, each coupled to, and in communication with (and/or with access to) a network 104. It should be noted that one user device is shown for the sake of explanation; there can be a greater number of user devices.
[030] The network 104 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among two or more of the parts or users illustrated in FIG. 1, or any combination thereof.

[031] Various entities in the environment 100 may connect to the network 104 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof.
[032] The user device 102 is associated with a data owner (e.g., a user or an entity such as an organization) who has hired an expert company to create a computer vision model 108 for performing a computer vision task. Examples of the user device 102 include, but are not limited to, a personal computer (PC), a mobile phone, a tablet device, a Personal Digital Assistant (PDA), a voice activated assistant, a server, a sensor, a smartphone, and a laptop.
[033] The server 106 is associated with an entity (e.g., the expert company) that has been hired by the data owner for generating the computer vision model 108. The server 106 may be, but is not limited to, a network server, a data storage server, a web server, an interface/gateway server, an application server, a cloud server, a virtual server, etc. The server 106 includes the created computer vision model 108 (also referred to as the model 108). In one embodiment, the model is a fully connected CNN model, i.e., the model includes a plurality of fully connected layers and a decision layer for performing computation over encrypted data.
[034] For performing inference using the model 108, the model 108 requires user data. In traditional approaches, the user device 102 shares the user data with the server 106 using the network 104. In an embodiment, as the model 108 is the computer vision model, the user data may include raw-images/preprocessed raw images. The server, upon receiving the user data, performs the feature extraction and then the classification task is performed based on the extracted features using the model 108. As discussed earlier the traditional approaches suffer from multiple disadvantages like data privacy issues, high computational requirements, etc.
[035] To overcome the disadvantages of the traditional approaches, a three-stage process is followed. In a first stage, the user data is shared with a local
feature extractor for performing feature extraction. In an embodiment, the feature extractor is provided as part of the user device 102. In another embodiment, the feature extractor is provided in the server 106. In case the feature extractor is provided in the server 106, the user data is first encrypted using an encryption mechanism and then shared with the server 106 using the network 104. In a second stage, the extracted features are first encrypted to obtain encrypted features which are then shared with the server 106 for classification. In an embodiment, FHE encryption is performed over the extracted features. The model 108 included in the server 106 then makes the inference using the encrypted features to provide an encrypted decision, as the model is already trained to make inferences using plain as well as encrypted/FHE encrypted data. In a third stage, the encrypted decision is sent back to the user device 102 using the network 104. The data owner, upon receiving the encrypted decision on the user device 102, decrypts the encrypted decision using an original key.
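A minimal sketch of this three-stage flow is shown below; every function here is an illustrative stand-in (the "encryption" is a toy additive masking, not real cryptography, and the server-side classifier is a placeholder for the model 108), intended only to show where encryption and decryption occur relative to the server 106.

```python
import numpy as np

def extract_features(image: np.ndarray) -> np.ndarray:
    """Stage 1 stand-in: feature extraction performed locally on the user device 102."""
    return image.astype(float).flatten()

def toy_encrypt(features: np.ndarray, key: float) -> np.ndarray:
    """Stand-in for FHE encryption (toy additive masking, NOT secure cryptography)."""
    return features + key

def toy_decrypt(ciphertext: np.ndarray, key: float) -> np.ndarray:
    return ciphertext - key

class Server:
    """Stand-in for server 106 hosting model 108; it only ever sees ciphertexts."""
    bias = 0.5
    def classify_encrypted(self, ct: np.ndarray) -> np.ndarray:
        return ct + self.bias              # additive op commutes with the toy "encryption"

def private_inference(image: np.ndarray, key: float = 7.0) -> np.ndarray:
    features = extract_features(image)                  # stage 1: extract locally
    ct = toy_encrypt(features, key)                     # stage 2: encrypt, send to server
    encrypted_decision = Server().classify_encrypted(ct)
    return toy_decrypt(encrypted_decision, key)         # stage 3: decrypt with original key

print(private_inference(np.ones((2, 2))))               # -> [1.5 1.5 1.5 1.5]
```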
[036] The environment 100 ensures that the server 106 does not have access to the user data, thereby ensuring privacy of the user data, as the inference is made using the encrypted data, and security of the user data, as only the data owner possesses the original key for decryption. However, a traditional CNN model/fully connected model available in standard libraries cannot be employed for the same purpose because of its inability to work on encrypted data.
[037] Further, the complexity of the computation overhead of the fully connected (FC) layers depends on the number of layers, the number of neurons in each layer and the type of activation functions used in the fully connected CNN. So, to achieve an efficient implementation for the model 108, there is a need to reduce the network topology complexity, and a good approximation for the activation function is required to replicate the results of a normal model. Reduction in the network topology complexity can be achieved by performing pruning of the fully connected CNN network.
[038] The term ‘pruning’, used herein, refers to a technique used to reduce weights (edges) i.e., the weights are reduced to zero for edges in a neural network that do not contribute to classification that is to be performed by the neural network.
In the neural network, for a given set of edges, some edges might have weights that are so small in magnitude that they do not contribute to or significantly change the output of a neuron for any given input. Hence, such edges can be ignored, and their weights can be considered as zero to reduce the amount of computation that needs to be performed by the neural network. Similarly, if a neuron in the neural network has no outgoing edges to any other neurons of the next layer, that neuron itself can be removed and, subsequently, all the incoming edges to it are removed as well to further reduce the amount of computation. The pruning technique is generally applied after completion of training of a neural network model to remove such edges from the neural network. In some cases, pruning reduces the computational cost by up to 80% without significantly reducing the accuracy of the neural network model. Briefly, for a given neuron k at layer i, dik+ and dik- denote the number of incoming and outgoing edges. Then, after pruning, we get:
dik′+ ≤ dik+ and dik′- ≤ dik-, where dik′+ and dik′- denote the number of incoming and outgoing edges for neuron k at layer i after pruning is completed.
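By way of illustration only, the following is a minimal sketch of magnitude-based pruning of a single fully connected layer (the threshold rule and helper names are illustrative assumptions and do not correspond to the specific pruning algorithms received at step 302); it also checks that the in-degree of every neuron can only decrease after pruning, consistent with the relation above.

```python
import numpy as np

def prune_fc_layer(weights: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Zero out incoming edges whose magnitude is too small to influence the neuron output.

    weights has shape (n_out, n_in): row k holds the incoming edge weights of neuron k,
    so the in-degree dik+ of neuron k is the number of non-zero entries in row k.
    """
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(4, 8))              # toy FC layer: 8 inputs -> 4 neurons
w_pruned = prune_fc_layer(w)

in_deg_before = np.count_nonzero(w, axis=1)          # dik+  for each neuron k
in_deg_after = np.count_nonzero(w_pruned, axis=1)    # dik'+ after pruning
assert np.all(in_deg_after <= in_deg_before)         # pruning never adds edges
print(in_deg_before, in_deg_after)
```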
[039] As there are multiple pruning algorithms available for pruning fully connected CNN, a pruning algorithm that will work best for the CNN model 108 needs to be determined and applied to the model 108.
[040] To further improve the effect of pruning (i.e., the reduction in computational complexity), an efficient packing technique of FHE (which helps in reducing memory and communication cost) can be used to optimize private inference for the fully connected CNN. However, very few packing techniques are available for a pruned network, as there is a chance of introducing irregularity in the network topology. So, the server 106 can use a new packing technique, referred to as 'variable length packing', in which the number of neuron inputs to be packed in each layer of the network differs, for further optimizing private inference for the pruned CNN network.
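For illustration, a minimal sketch of the variable length packing idea is given below, using plain arrays in place of FHE ciphertext slots (an assumption for exposition only): the feature vector is split into packs of a chosen length, and a neuron whose weight slice inside a pack is entirely zero can skip that pack.

```python
import numpy as np

def pack_vector(fv: np.ndarray, length: int) -> list:
    """Split a feature vector into ceil(|fv| / length) packs of `length` slots each."""
    n_packs = -(-len(fv) // length)                   # ceiling division
    padded = np.zeros(n_packs * length)
    padded[:len(fv)] = fv
    return [padded[k * length:(k + 1) * length] for k in range(n_packs)]

def neuron_dot(weights: np.ndarray, packs: list, length: int) -> float:
    """Dot product computed pack-by-pack, skipping packs whose weight slice is all zero."""
    total = 0.0
    for k, pack in enumerate(packs):
        w = np.zeros(length)
        chunk = weights[k * length:(k + 1) * length]
        w[:len(chunk)] = chunk
        if np.any(w):                                  # all-zero slice: the pack is skipped
            total += float(w @ pack)
    return total

fv = np.arange(8.0)                                    # toy feature vector, |fv| = 8
weights = np.array([0, 0, 0, 0, 1.0, 2.0, 0, 0])       # pruned neuron: only pack 1 matters
packs = pack_vector(fv, length=4)
print(neuron_dot(weights, packs, length=4))            # -> 1*4 + 2*5 = 14.0
```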
[041] Based on above discussion, it can be inferred that an efficient design (with respect to performance, accuracy, etc.) to implement private inference for the CNN model 108 depends on the efficiency of a pruning algorithm, packing
technique, approximation functions for activation function, split architecture and FHE scheme used.
[042] Further, due to the availability of a plurality of pruning, packing and FHE techniques, a large set of solutions is possible for implementing private inference. To address this, a private inferencing system (discussed in detail with reference to FIGS. 2 and 3A-3C) is provided that provides the best network solution for a CNN model, such as the CNN model 108, while taking into consideration requirements, such as latency, communication cost, accuracy, etc., of the data owner. The best network solution provided by the private inferencing system may then be used to update the trained CNN model, such as the CNN model 108, for implementing private inference. An application using the trained CNN model for performing the private inference task may then be referred to as a privacy enabled AI application, as the application will ensure user data security and privacy.
[043] FIG. 2 illustrates an exemplary block diagram of a private inferencing system 200 for implementing privacy enabled artificial intelligence (AI) applications using fully homomorphic encryption (FHE), in accordance with an embodiment of the present disclosure. In an embodiment, the private inferencing system 200 may also be referred to as a system, and the terms may be used interchangeably herein. In some embodiments, the system 200 is embodied as a cloud-based and/or SaaS-based (software as a service) architecture. In some embodiments, the system 200 may be implemented in a server system. In some embodiments, the system 200 may be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, and the like.
[044] In an embodiment, the system 200 includes one or more processors 204, communication interface device(s) or input/output (I/O) interface(s) 206, and one or more data storage devices or memory 202 operatively coupled to the one or more processors 204. The one or more processors 204 may be one or more software processing modules and/or hardware processors. In an embodiment, the hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational
instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 200 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[045] The I/O interface device(s) 206 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[046] The memory 202 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, a database 208 can be stored in the memory 202, wherein the database 208 may comprise, but is not limited to, inputs received from one or more user devices, such as user data. In an embodiment, the memory 202 may store information pertaining to user requirement constraints, configuration of the trained CNN, one or more pruning algorithms, a pre-defined formula, and the like. The memory 202 further comprises (or may further comprise) information pertaining to input(s)/output(s) of each step performed by the systems and methods of the present disclosure. In other words, input(s) fed at each step and output(s) generated at each step are comprised in the memory 202 and can be utilized in further processing and analysis.
[047] FIGS. 3A, 3B and 3C, with reference to FIG. 1 and FIG. 2, collectively, represent an exemplary flow diagram of a method 300 for implementing privacy enabled AI applications using FHE, in accordance with an embodiment of the present disclosure. The method 300 may use the system 200 of
FIG. 2 for execution. In an embodiment, the system 200 comprises one or more data storage devices or the memory 202 operatively coupled to the one or more hardware processors 204 and is configured to store instructions for execution of steps of the method by the one or more processors 204. The sequence of steps of the flow diagram may not be necessarily executed in the same order as they are presented. Further, one or more steps may be grouped together and performed in form of a single step, or one step may have several sub-steps that may be performed in parallel or in sequential manner. The steps of the method 300 of the present disclosure will now be explained with reference to the components of the system 200 as depicted in FIG. 2, and the flow diagram.
[048] In an embodiment of the present disclosure, at step 302, the one or more hardware processors 204 comprised in the system 200 receive (i) a trained convolutional neural network (CNN), (ii) one or more pruning algorithms, (iii) a list of user requirement constraints, (iv) a trained CNN layer number, and (v) a server side layer number from the user device 102. The trained CNN includes a plurality of layers, and each layer of the plurality of layers includes a plurality of neurons. The list includes a range of required time constraint values, a range of required memory constraint values, a range of required accuracy constraint values and a range of required security constraint values as provided by the data owner using the user device 102. In an embodiment, the server side layer number denotes the number of layers to be executed at a server, such as the server 106, for making the inference.
[049] At step 304 of the present disclosure, the one or more hardware processors 204 of the system 200 determine a plurality of estimated time values, a plurality of estimated memory values, a plurality of estimated accuracy values and a plurality of estimated security values for each pruning algorithm of the one or more pruning algorithms by iteratively performing a plurality of steps 304a through 304f until all the pruning algorithms in the one or more pruning algorithms are identified.
[050] More specifically, at step 304a of the present disclosure, the one or more hardware processors 204 of the system 200 apply a pruning algorithm of the
one or more pruning algorithms on the trained CNN to obtain a pruned CNN corresponding to the pruning algorithm. In an embodiment, the pruned CNN may include a smaller number of edges as compared to the trained CNN.
[051] At step 304b of the present disclosure, the one or more hardware processors 204 of the system 200 determine a minimum in-degree of the pruned CNN. The in-degree of a neuron in the pruned CNN represents its number of incoming edges, and the minimum in-degree is the smallest in-degree across the neurons of the pruned CNN.
[052] At step 304c of the present disclosure, the one or more hardware processors 204 of the system 200 assign a packing length for the pruned CNN based on the minimum in-degree of the pruned CNN. The packing length can also be referred to as the minimum packing length, as it corresponds to a minimum number of neuron inputs to be packed in each layer of the pruned CNN. At this step, the one or more hardware processors 204 of the system 200 also determine the maximum packing length for the packed CNN based on the received range of required security constraint values. The maximum packing length denotes the maximum number of neuron inputs that can be packed together in each layer of the pruned CNN. The maximum packing length, denoted by lmax, depends on the ring modulus of the FHE scheme.
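As a rough illustration of that dependence, and assuming a CKKS-style scheme in which the number of packing slots is typically half the polynomial ring dimension (other schemes and parameter choices differ), lmax may be bounded as follows:

```python
def max_packing_length(ring_dimension: int, scheme: str = "CKKS") -> int:
    """Upper bound l_max on the number of neuron inputs packed into one ciphertext."""
    if scheme.upper() == "CKKS":
        return ring_dimension // 2        # CKKS exposes N/2 complex slots
    return ring_dimension                 # BFV/BGV batching exposes up to N slots

print(max_packing_length(8192))           # -> 4096 packed inputs per ciphertext
```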
[053] In an embodiment, at step 304d of the present disclosure, the one or more hardware processors 204 of the system 200 iteratively perform a plurality of steps 304d1 through 304d4 until the packing length reaches the maximum packing length for the packed CNN. The objective of this step is to try different packing schemes by varying the packing length on the pruned CNN and determining the estimated time, memory, accuracy and security values for each packing scheme used. The above step can be better understood by way of the following description.
[054] As discussed earlier, pruning leads to a reduction in the number of neurons and edges of the FC layers, which can further lead to a reduction in the computation complexity of the CNN. So, in order to take advantage of the pruned CNN, the optimal ciphertext packing of elements needs to be determined, i.e., an optimal packing length lopt that works best for the pruned CNN needs to be determined.

[055] For example, let wikj be the set of all the weights of the incoming edges from the neurons of layer i to the kth neuron of layer j. These weights are multiplied with the feature vector fv. Assuming all the elements of fv are encrypted as a single ciphertext, an entire neuron computation can be skipped if all its incoming edges have weight zero after pruning. However, even if one of the incoming edges has a non-zero weight, wikj * fv needs to be computed. Due to this, there can be computation overhead. Hence, there is a need to determine the optimal packing length lopt that minimizes the computation overhead.
[056] Variable length packing is proved to be very useful in scenarios where |fv| ≥ lmax, where |fv| represents the dimension of the feature vector. If |fv| < lmax, then the variable length packing by default is set to lmax. It is observed that the size of the packing can affect the performance of the computation. Since |fv| ≥ lmax, each neuron needs to operate on a number of ciphertext packs, thus leading to computation/memory overhead due to a greater number of operations to be performed. For instance, for an FC network with p server side layers, m as the trained CNN layer number and ni neurons in layer i, the total number of ciphertext packs (with the maximum number of elements that can be packed) that are part of the FC computations is given by:
totalPackc = ∑(i = m-p+1 to m) ni * ⌈ni-1 / lmax⌉
[057] The number of ciphertext packs can be reduced by increasing N, but an increase in N further increases the computation overhead. So, the higher the N, the higher is the computation complexity of the FHE operations. Thus, to arrive at an optimal packing length lopt, all possible packing lengths (which are generally powers of 2) need to be explored. Then, the complexity (computation/memory) for the FC network is estimated to arrive at lopt for the pruned CNN. Further, lopt is bounded by:
δ+ ≤ lopt ≤ lmax, where δ+ is the minimum in-degree of the pruned CNN, which is given by: δ+ = min(dik+ | m - p + 1 ≤ i ≤ m, 1 ≤ k ≤ ni)
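A small sketch of these two quantities is given below, under the assumption that layer i has ni neurons and receives ni-1 inputs, so that every server side neuron operates on ⌈ni-1/l⌉ packs at packing length l; the layer sizes and pruned weights are illustrative only.

```python
import math
import numpy as np

def total_ciphertext_packs(layer_sizes, p, length):
    """Ciphertext packs touched by the p server-side FC layers at a given packing length.

    layer_sizes[i] is the number of neurons in layer i (layer_sizes[0] is the input width),
    so each neuron of layer i operates on ceil(layer_sizes[i-1] / length) packs.
    """
    m = len(layer_sizes) - 1
    return sum(layer_sizes[i] * math.ceil(layer_sizes[i - 1] / length)
               for i in range(m - p + 1, m + 1))

def min_in_degree(pruned_fc_weights):
    """delta+: smallest number of non-zero incoming edges over all server-side FC neurons."""
    return min(int(np.count_nonzero(row)) for w in pruned_fc_weights for row in w)

sizes = [256, 128, 64, 10]                         # toy FC block: 256 inputs -> 128 -> 64 -> 10
pruned = [np.eye(n_out, n_in) for n_out, n_in in zip(sizes[1:], sizes[:-1])]   # heavily pruned toy
delta, l_max = min_in_degree(pruned), 128          # candidate lengths live in [delta, l_max]
length = delta
while length <= l_max:                             # powers-of-two sweep, as noted in [057]
    print(length, total_ciphertext_packs(sizes, p=3, length=length))
    length *= 2
```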

[058] Thus, for determining the optimal packing length, the system 200 needs to try different packing lengths, starting from the minimum packing length till the maximum packing length, on the pruned CNN. More specifically, at step 304d1 of the present disclosure, the one or more hardware processors 204 of the system 200 perform packing of the plurality of neuron inputs in each layer of the pruned CNN based on the packing length to obtain the packed CNN. The objective of this step is to try the variable length packing scheme on the pruned CNN, starting from the minimum packing length, to obtain the corresponding packed CNN.
[059] In an embodiment, at step 304d2 of the present disclosure, the one or more hardware processors 204 of the system 200 estimate an estimated time value, an estimated memory value, an estimated accuracy value and an estimated security value for the packed CNN based on the pruned CNN, the trained CNN layer number, and the server side layer number. The above step can be better understood by way of following description.
[060] For determining the estimated time value, the estimated memory value, the estimated accuracy value and the estimated security value for the packed CNN, the one or more hardware processors 204 of the system 200 first determine a starting CNN layer number by applying a pre-defined formula on the trained CNN layer number and the server side layer number. For example, if there are m CNN layers and p server side layers, i.e., the layers that are to be executed by the server 106, then the pre-defined formula for determining the starting CNN layer number i can be presented as:
i = m - p + 1
[061] Once the starting CNN layer number is determined, the one or more hardware processors 204 of the system 200, for each layer in the packed CNN from the starting CNN layer number, iteratively perform a plurality of steps until the layer reaches the trained CNN layer number. A first step of the plurality of steps includes iteratively performing, for each packed neuron in the respective layer of the packed CNN, another set of steps until all the packed neurons with non-zero weight vectors in the respective layer are covered.

[062] In an embodiment, for performing a first step in the set of steps, the one or more hardware processors 204 of the system 200 calculate the weight vector of the respective packed neuron input. The objective of this step is to calculate the weight vector for each packed neuron input in each layer of the packed CNN starting from the starting CNN layer i.
[063] Thereafter, the one or more hardware processors 204 of the system 200, as part of a second step in the set of steps, determine whether the weight vector for the respective packed neuron input is non-zero. This step is performed to check whether the packed neuron input can lead to an increase in computation overhead, as the presence of a non-zero weight vector indicates that there will be computation overhead.
[064] Upon determining that the weight vector for the respective packed neuron input is non-zero, the one or more hardware processors 204 of the system 200, as part of a third step in the set of steps, estimate a neuron time value, a neuron memory value, a neuron accuracy value and a neuron security value for the respective packed neuron. In an embodiment, the neuron time value, the neuron memory value, the neuron accuracy value and the neuron security value help in determining computation overhead occurring due to the respective packed neuron.
[065] The neuron time values, the neuron memory values, the neuron accuracy values and the neuron security values estimated for each packed neuron in each layer of the packed CNN are then added by the one or more hardware processors 204 of the system 200 i.e., the neuron time values are added to obtain the estimated time value, the neuron memory values are added to obtain the estimated memory value, the neuron accuracy values are added to obtain the estimated accuracy value and the neuron security values are added to obtain the estimated security value for the packed CNN as part of a fourth step in the set of steps until all the packed neurons with non-zero weight vectors in the particular layer of the packed CNN are covered.
[066] Once the computing overhead calculation for one layer completes, the one or more hardware processors 204 of the system 200, as part of a second step of the plurality of steps, identify next layer after the respective layer in the packed
CNN until all layers in the packed CNN are covered, i.e., until the layer reaches the trained CNN layer number.
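A minimal sketch of this per-layer, per-neuron accumulation over the server side layers is given below; the per-neuron cost model (toy_cost) is a placeholder assumption, as the disclosure does not fix a particular cost formula.

```python
import numpy as np

def estimate_packed_cnn(packed_weights, m, p, length, estimate_neuron):
    """Sum per-neuron estimates over layers i = m - p + 1 .. m of the packed CNN.

    packed_weights[i] holds one row of incoming weights per neuron of layer i;
    estimate_neuron(row, length) -> (time, memory, accuracy, security) for one neuron.
    """
    totals = np.zeros(4)                               # time, memory, accuracy, security
    for i in range(m - p + 1, m + 1):                  # starting layer is i = m - p + 1
        for row in packed_weights[i]:
            if np.any(row):                            # only non-zero weight vectors add cost
                totals += estimate_neuron(row, length)
    return totals

def toy_cost(row, length):
    packs = -(-len(row) // length)                     # ceil(|row| / length)
    return np.array([packs, packs, 0.0, 0.0])          # toy time/memory; accuracy, security elsewhere

weights = {2: np.array([[1.0, 0, 0, 2.0], [0, 0, 0, 0]]), 3: np.array([[0.5, 0.0]])}
print(estimate_packed_cnn(weights, m=3, p=2, length=2, estimate_neuron=toy_cost))
```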
[067] Referring now to steps of FIGS. 3A and 3B, at step 304d3 of the present disclosure, the one or more hardware processors 204 of the system 200 add the estimated time value in a list of estimated time values, the estimated memory value in a list of estimated memory values, the estimated accuracy value in a list of estimated accuracy values and the estimated security value in a list of estimated security values.
[068] At step 304d4 of the present disclosure, the one or more hardware processors 204 of the system 200 increase the packing length based on a pre-defined criteria. In an embodiment, the pre-defined criteria includes doubling the packing length, until the packing length reaches the maximum packing length for the packed CNN. For example, if a starting packing length is x, then the new packing length will be x*2.
[069] At step 304e of the present disclosure, the one or more hardware processors 204 of the system 200 identify time values present in the list of estimated time values as the plurality of estimated time values, memory values present in the list of estimated memory values as the plurality of estimated memory values, accuracy values present in the list of estimated accuracy values as the plurality of estimated accuracy values and security values present in the list of estimated security values as the plurality of estimated security values for the corresponding pruning algorithm. The objective of this step is to get the list of estimated time values, the list of estimated memory values, the list of estimated accuracy values and the list of estimated security values for each pruning algorithm of the one or more pruning algorithms.
[070] At step 304f of the present disclosure, the one or more hardware processors 204 of the system 200 identify another pruning algorithm among the one or more pruning algorithms as the pruning algorithm until all the pruning algorithms in the one or more pruning algorithms are identified. Once the list of estimated time values, the list of estimated memory values, the list of estimated accuracy values and the list of estimated security values are available for the particular pruning
algorithm, the system identifies the next pruning algorithm among the one or more pruning algorithms and repeats the steps 304a to 304f.
[071] In an embodiment, at step 306 of the present disclosure, the one or more hardware processors 204 of the system 200 determine a network solution for the trained CNN based, at least in part, on the plurality of estimated time values, the plurality of estimated memory values, the plurality of estimated accuracy values and the plurality of estimated security values that are determined for each pruning algorithm and the list of user requirement constraints using a multi-criteria decision analysis algorithm, such as the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) or any other algorithm known in the art. In an embodiment, the network solution comprises a recommended pruning algorithm among the one or more pruning algorithms, an optimal packing length and a Fully Homomorphic Encryption (FHE) parameter. The objective of this step is to analyze all possible solutions and output the best solution to realize private inference for the given CNN model, i.e., the CNN model 108, while taking into consideration the user requirement constraints. The above step can be better understood by way of the following description.
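For illustration, a compact TOPSIS sketch over the tabulated estimates is shown below; the criterion weights, and the treatment of time/memory as costs and accuracy/security as benefits, are assumptions, and any other multi-criteria decision analysis algorithm known in the art may be substituted.

```python
import numpy as np

def topsis(estimates: np.ndarray, weights: np.ndarray, benefit: np.ndarray) -> int:
    """Return the index of the best candidate network solution.

    estimates: one row per candidate (pruning algorithm, packing length) pair,
               columns = [time, memory, accuracy, security].
    benefit:   True where larger is better (accuracy, security), False for costs.
    """
    norm = estimates / np.linalg.norm(estimates, axis=0)       # vector-normalise columns
    v = norm * weights
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))    # best value per criterion
    worst = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_best = np.linalg.norm(v - ideal, axis=1)
    d_worst = np.linalg.norm(v - worst, axis=1)
    closeness = d_worst / (d_best + d_worst)                   # higher = closer to ideal
    return int(np.argmax(closeness))

# rows: candidate solutions; columns: estimated time, memory, accuracy, security
estimates = np.array([[12.0, 300.0, 0.91, 128.0],
                      [30.0, 500.0, 0.95, 128.0],
                      [ 8.0, 250.0, 0.88, 112.0]])
weights = np.array([0.3, 0.2, 0.3, 0.2])
benefit = np.array([False, False, True, True])
print(topsis(estimates, weights, benefit))   # index of the recommended network solution
```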
[072] Let τ, μ and γ be the computation time, memory and communication cost required to realize private inference of a CNN, such as the CNN used in the CNN model 108, with accuracy η and security λ respectively. Assume that the CNN D consists of m layers, with ni being the number of neurons in a layer i. Let there be p FC layers (layers from m - p + 1 to m). Let eiklj denote that there is an edge from the kth neuron of layer i to the lth neuron of layer j, which means the output of the kth neuron goes to the lth neuron. Let dik+ and dik- denote the number of incoming and outgoing edges related to the kth neuron of layer i. Let wiklj be the weight associated with the edge eiklj. Let Aik be the activation function for the kth neuron of layer i. Let fik be the computation performed at the kth neuron of layer i. Let fikτ, fikμ and fikγ, and Aikτ, Aikμ and Aikγ be the computing, memory and communication requirements to realize fik and Aik respectively for the kth neuron of layer i. Let F denote the total computation performed to realize the CNN D. Let Fτ, Fμ and Fγ be the computing, memory, and communication requirements to realize the CNN D. Let Fη be the accuracy of the CNN realized using F. Note that in this formulation, the system 200 assumes that the accuracy of the CNN inherently depends on the topology of the CNN and the approximation used for the activation function. The above description states an optimization problem whose main objective is to arrive at an optimal network design (also referred to as a network solution) of the CNN model that minimizes the computation, memory and communication cost while maximizing the accuracy of the CNN model for a required security:
argmin{Fτ}, argmin{Fμ}, argmin{Fγ}, argmax{Fη}, with Fτ = ∑(i=1 to m) ∑(k=1 to ni) (fikτ + Aikτ)
subject to
Fτ ≤ τ; Fμ ≤ μ; Fγ ≤ γ; Fη ≥ η
[073] Below is an exemplary algorithm used by the system (S) for determining a network solution for the trained CNN for facilitating private inference as described by the present disclosure. The exemplary algorithm estimates the constraint parameters, such as time, memory, accuracy, and security values, based on different pruning algorithms for a given CNN network. For each pruned network, different estimates of the performance of the private inference are tabulated by varying the packing lengths (see steps 5 to 22, where a total number of ciphertext packings (totalPackc) for the fully connected layers is determined). Using the aggregated estimates of constraints for all the pruning algorithms, an optimization problem is solved using the multi-criteria decision analysis algorithm, such as TOPSIS (see step 26). TOPSIS provides the best network solution as a recommendation to realize private inference based on the quality of service required by the user. The network solution includes a recommended pruning algorithm among the available pruning algorithms, an optimal packing length and a Fully Homomorphic Encryption (FHE) parameter.
1: procedure S(CNN, nPrune, nPruneAlg[])
2: INPUT CNN, m, p, constraints {τ, μ and γ}
3: count = 0
4: for z in {1, 2, ..., nPrune} do
5: CNNprune = nPruneAlg[z](CNN)
6: Find δ+, the minimum in-degree of the FC layers of CNNprune
7: for length in {δ+, δ+ * 2, ..., l_max} do
8: i_pack = ⌈|fv| / length⌉
9: totalPackc = 0
10: for i in m − p + 1 ... m do
11: for l in 0 ... n_i − 1 do
12: for k in 0 ... i_pack − 1 do
13: w = w_i^l[k * length : (k + 1) * length]
14: if nonzero(w) then
15: totalPackc = totalPackc + 1
16: end if
17: end for
18: end for
19: end for
20: N = length * 2
21: estimate[count] = evaluate_constraint(CNNprune,
22: m, p, N, length, totalPackc)
23: count = count + 1
24: end for
25: end for
26: RecommendedSolution = decisionanalysisalgorithm(estimate[])
27: end procedure
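By way of a non-limiting illustration, steps 5 to 22 of the above algorithm, which count the non-zero ciphertext packings of the fully connected layers for a candidate packing length, may be sketched in Python as follows; the function and variable names are hypothetical and the pruned weight matrix is randomly generated for illustration only.

import numpy as np

def count_ciphertext_packings(fc_weights, length):
    # Count weight-vector slices of size `length` that contain at least one
    # non-zero entry; only those packs require a homomorphic operation.
    # fc_weights: list of 2-D arrays, one per FC layer, shape (n_neurons, n_inputs),
    # with pruned weights set to zero.
    total_packs = 0
    for W in fc_weights:
        n_neurons, n_inputs = W.shape
        n_packs = -(-n_inputs // length)          # ceil(n_inputs / length)
        for l in range(n_neurons):
            for k in range(n_packs):
                pack = W[l, k * length:(k + 1) * length]
                if np.any(pack != 0):             # skip all-zero packs
                    total_packs += 1
    return total_packs

# Sweep candidate packing lengths and tabulate estimates, roughly mirroring
# the length loop of the algorithm above.
rng = np.random.default_rng(0)
W = rng.random((8, 64)) * (rng.random((8, 64)) > 0.8)    # roughly 80% pruned FC layer
for length in (4, 8, 16, 32):
    packs = count_ciphertext_packings([W], length)
    print(f"length={length:2d}  packings={packs}  ring dimension N={2 * length}")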
[074] The complexity of the above algorithm can be expressed as $O(\mathrm{nPrune} \cdot \log(l_{\max}) \cdot m \cdot n_i \cdot \mathrm{length})$, where nPrune is the number of pruning algorithms, m is the trained CNN layer number, and $n_i$ is the maximum number of neurons in any given FC layer.
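Step 26 of the above algorithm passes the tabulated estimates to the multi-criteria decision analysis algorithm. By way of a non-limiting illustration, a TOPSIS-style ranking over such estimates may be sketched in Python as follows; the function name topsis_rank, the candidate values and the weights are hypothetical and only indicate how the estimated time, memory, accuracy and security values could be combined with user-supplied preferences, and are not the claimed implementation.

import numpy as np

def topsis_rank(decision_matrix, weights, benefit_mask):
    # Rank candidate network solutions with TOPSIS.
    # decision_matrix: (n_candidates, n_criteria) estimated time, memory, accuracy, security.
    # weights: importance of each criterion.
    # benefit_mask: True where larger is better (accuracy, security),
    #               False where smaller is better (time, memory).
    X = np.asarray(decision_matrix, dtype=float)
    norm = X / np.linalg.norm(X, axis=0)                  # vector-normalise each column
    v = norm * np.asarray(weights, dtype=float)           # apply user weights
    best = np.where(benefit_mask, v.max(axis=0), v.min(axis=0))
    worst = np.where(benefit_mask, v.min(axis=0), v.max(axis=0))
    d_best = np.linalg.norm(v - best, axis=1)             # distance to ideal best
    d_worst = np.linalg.norm(v - worst, axis=1)           # distance to ideal worst
    closeness = d_worst / (d_best + d_worst)              # higher is better
    return np.argsort(closeness)[::-1]                    # candidate indices, best first

# Each row: one (pruning algorithm, packing length) candidate with hypothetical
# [time (s), memory (MB), accuracy (%), security (bits)] estimates.
candidates = [[120, 900, 91.0, 128],
              [ 60, 500, 90.2, 128],
              [ 45, 400, 88.5, 128]]
order = topsis_rank(candidates, weights=[0.25, 0.25, 0.25, 0.25],
                    benefit_mask=[False, False, True, True])
print("recommended candidate index:", order[0])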
[075] FIG. 4, with reference to FIGS. 1 through 3, illustrates an example representation 400 of a system architecture used for implementing privacy enabled AI applications using FHE, in accordance with an embodiment of the present disclosure.
[076] The representation 400 also includes an architecture of a system used in the prior art for implementing inference.
[077] FIG. 5, with reference to FIGS. 1 through 4, illustrates an example representation 500 of a variable length packed CNN used for implementing privacy enabled AI applications, along with a regular CNN and a pruned CNN on a CIFAR-10 dataset, in accordance with an embodiment of the present disclosure.
[078] As seen in FIG. 5, the number of edges in the pruned CNN is smaller than in the regular CNN. Once variable length packing is performed over the pruned CNN, the variable length packed CNN is obtained based on the packing length used for performing the packing.
[079] FIG. 6 is a tabular representation illustrating the impact of variable length packing on the performance of fully connected layer computations, in accordance with an embodiment of the present disclosure.
[080] As seen in FIG. 6, even though the CNN network is pruned with higher sparsity, the total number of ciphertext packings is not reduced in proportion, and thus no significant reduction in the complexity of the computation is achieved using the variable length packing alone. This can be attributed to the random pattern of pruned edges across the different fully connected layers. Hence, there is a need to run different pruning algorithms for the given CNN network to obtain an optimal pruned network. It should be noted that a ‘pruned7’ model represents that the baseline model is pruned by 70%, i.e., the number of edges in the neural network is reduced by 70% as compared to the baseline model. Similarly, ‘pruned8’ and ‘pruned9’ represent that the baseline model is pruned by 80% and 90%, respectively.
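By way of a non-limiting illustration, the effect of the pruning pattern on the packing count can be seen with the following short Python sketch, in which two hypothetical weight vectors of roughly the same sparsity are packed with the same length; the scattered pattern leaves almost every pack non-zero, whereas the structured pattern empties whole packs.

import numpy as np

def nonzero_packs(w, length):
    # Number of slices of `w` of size `length` that contain at least one non-zero weight.
    return sum(np.any(w[k:k + length] != 0)
               for k in range(0, len(w), length))

length, n = 8, 64
rng = np.random.default_rng(1)

# Roughly 90% sparsity with zeros scattered at random: most packs still contain a weight.
scattered = np.where(rng.random(n) < 0.1, rng.random(n), 0.0)

# Similar sparsity with the surviving weights grouped together: whole packs vanish.
structured = np.zeros(n)
structured[:8] = rng.random(8)

print("scattered :", nonzero_packs(scattered, length), "of", n // length, "packs")
print("structured:", nonzero_packs(structured, length), "of", n // length, "packs")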

[081] FIG. 7 is a tabular representation illustrating the impact of pruning on accuracy and computation, in accordance with an embodiment of the present disclosure.
[082] As seen in FIG. 7, the computation requirement comes down drastically with pruning, while the impact on the accuracy of the CNN network is almost negligible.
[083] FIG. 8 is a tabular representation illustrating the performance of private inference for different models, in accordance with an embodiment of the present disclosure.
[084] As seen in FIG. 8, the approximation ReLU-3 (i.e., the pruned and packed model) performs better in all quality of service aspects, such as memory, accuracy, and time.
[085] FIG. 9 is a tabular representation illustrating different network solutions recommended by the system 200 in different scenarios, in accordance with an embodiment of the present disclosure.
[086] As seen in FIG. 9, when the user wishes to give equal importance to all quality of service parameters, such as accuracy, time, and memory, the baseline is recommended by the system as the best choice. Similarly, when the user wishes to have higher accuracy with lower time and memory, i.e., negative weights are assigned to time and memory while all parameters remain equally important, then pruned9 is selected and recommended as the best choice by the system. However, if the user wishes to give even more importance to accuracy along with low time and memory, i.e., when 10 times higher importance is given to accuracy and negative weights are assigned to time and memory, then the system 200 recommends pruned8 as the best solution.
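By way of a non-limiting illustration, the three scenarios of FIG. 9 could correspond to weight settings such as the following, intended for a TOPSIS-style ranking like the topsis_rank sketch shown earlier (criteria order: time, memory, accuracy, security); the numeric weights are hypothetical, and the negative weighting of time and memory described above is modelled in that sketch by flagging those criteria as cost criteria through benefit_mask.

# Hypothetical weight settings for the three FIG. 9 scenarios
# (criteria order: time, memory, accuracy, security).
scenarios = {
    "equal importance":           [0.25, 0.25, 0.25, 0.25],
    "accuracy over time/memory":  [0.15, 0.15, 0.55, 0.15],
    "accuracy weighted 10x":      [0.08, 0.08, 0.80, 0.04],
}
cost_mask = [False, False, True, True]   # time and memory: smaller is better
for name, weights in scenarios.items():
    print(name, weights, cost_mask)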
[087] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do

not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[088] As discussed earlier, convolutional neural networks (CNNs) are widely used for solving computer vision problems. In order to protect the privacy and security of data, CNN models that can work on encrypted data are required. However, CNN models available in the art that can work on encrypted data considerably increase the computation cost. To overcome these disadvantages, embodiments of the present disclosure provide methods and systems for implementing privacy enabled artificial intelligence (AI) applications using Fully Homomorphic Encryption (FHE). More specifically, the system first analyses the effect of a plurality of pruning algorithms on the performance of a trained encrypted fully connected CNN for private inference by applying the plurality of pruning algorithms over the CNN to obtain the corresponding pruned CNNs. The system then uses a variable length packing technique of FHE to optimize private inference for the pruned CNNs. Thereafter, the system uses a multi-criteria decision analysis algorithm to obtain the best network solution for the trained encrypted CNN based on user requirements. The multi-criteria decision analysis algorithm takes user requirements, such as latency, communication cost, accuracy, and security, as inputs and analyses all possible solutions obtained using the plurality of pruned CNNs and the variable length packing technique to output the best solution to realize private inference for the given CNN. The best solution includes a recommended pruning algorithm among the one or more pruning algorithms, an optimal packing length that works best for the recommended pruning algorithm, and an FHE parameter. In performing the above method, the present disclosure ensures that the privacy of data is maintained during data transmission and computation, while also eliminating the need to perform expensive and unnecessary computation.
[089] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device

can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[090] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[091] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are

intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[092] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[093] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

We Claim:
1. A processor implemented method, comprising:
receiving, by a private inferencing system via one or more hardware processors, (i) a trained convolutional neural network (CNN), the trained CNN comprising a plurality of layers, each layer of the plurality of layers comprising a plurality of neurons, (ii) one or more pruning algorithms, (iii) a list of user requirement constraints, the list comprising a range of required time constraint values, a range of required memory constraint values, a range of required accuracy constraint values, and a range of required security constraint values, (iv) a trained CNN layer number, and (v) a server side layer number;
determining, by the private inferencing system via one or more hardware processors, a plurality of estimated time values, a plurality of estimated memory values, a plurality of estimated accuracy values, and a plurality of estimated security values for each pruning algorithm of the one or more pruning algorithms by iteratively performing:
applying, by the private inferencing system via one or more hardware processors, a pruning algorithm of the one or more pruning algorithms on the trained CNN to obtain a pruned CNN corresponding to the pruning algorithm;
determining, by the private inferencing system via one or more hardware processors, a minimum in-degree of the pruned CNN;
assigning, by the private inferencing system via one or more hardware processors, a packing length for the pruned CNN based on the minimum in-degree of the pruned CNN, wherein the packing length corresponds to a number of neuron inputs to be packed in each layer of the pruned CNN;
iteratively performing:
performing, by the private inferencing system via one or more hardware processors, packing of the plurality of neuron inputs in each layer of the pruned CNN based on the packing length to obtain a packed CNN;

estimating, by the private inferencing system via one or more hardware processors, an estimated time value, an estimated memory value, an estimated accuracy value, and an estimated security value for the packed CNN based on the pruned CNN, the trained CNN layer number, and the server side layer number;
adding, by the private inferencing system via one or more hardware processors, the estimated time value in a list of estimated time values, the estimated memory value in a list of estimated memory values, the estimated accuracy value in a list of estimated accuracy values, and the estimated security value in a list of estimated security values; and
increasing, by the private inferencing system via one or more hardware processors, the packing length based on a pre-defined criteria,
until the packing length reaches a maximum packing length for the packed CNN;
identifying, by the private inferencing system via one or more hardware processors, time values present in the list of estimated time values as the plurality of estimated time values, memory values present in the list of estimated memory values as the plurality of estimated memory values, accuracy values present in the list of estimated accuracy values as the plurality of estimated accuracy values and security values present in the list of estimated security values as the plurality of estimated security values for the corresponding pruning algorithm; and
identifying, by the private inferencing system via one or more hardware processors, another pruning algorithm among the one or more pruning algorithms as the pruning algorithm,
until all the pruning algorithms in the one or more pruning algorithms are identified;
determining, by the private inferencing system via one or more hardware processors, a network solution for the trained CNN based, at least in part, on the

plurality of estimated time values, the plurality of estimated memory values, the plurality of estimated accuracy values and the plurality of estimated security values that are determined for each pruning algorithm and the list of user requirement constraints using a multi-criteria decision analysis algorithm; and
displaying, by the private inferencing system via one or more hardware processors, the network solution.
2. The processor implemented method of claim 1, wherein the network solution comprises a recommended pruning algorithm among the one or more pruning algorithms, an optimal packing length and a Fully Homomorphic Encryption (FHE) parameter.
3. The processor implemented method of claim 1, wherein the step of iteratively performing is preceded by:
determining, by the private inferencing system via one or more hardware processors, the maximum packing length for the packed CNN based on the received range of required security constraint values.
4. The processor implemented method of claim 1, wherein the step of
estimating, by the private inferencing system via one or more hardware processors,
the estimated time value, the estimated memory value, the estimated accuracy value
and the estimated security value for the packed CNN based on the pruned CNN,
trained CNN layer number, and the server side layer number comprises:
determining, by the private inferencing system via one or more hardware processors, a starting CNN layer number by applying a pre-defined formula on the trained CNN layer number and the server side layer number;
for each layer in the packed CNN from the starting CNN layer number, iteratively performing:
for each packed neuron input in the respective layer of the packed
CNN, iteratively performing:

calculating, by the private inferencing system via one or more hardware processors, weight vector of the respective packed neuron input;
determining, by the private inferencing system via one or more hardware processors, whether the weight vector for the respective packed neuron input is non-zero;
upon determining that the weight vector for the respective packed neuron input is non-zero, estimating, by the private inferencing system via one or more hardware processors, a neuron time value, a neuron memory value, a neuron accuracy value, and a neuron security value for the respective packed neuron input; and
adding, by the private inferencing system via one or more hardware processors, the neuron time value to obtain the estimated time value, the neuron memory value to obtain the estimated memory value, the neuron accuracy value to obtain the estimated accuracy value and the neuron security value to obtain the estimated security value for the packed CNN,
until all the packed neuron inputs with non-zero weight vectors in the respective layer are covered; identifying, by the private inferencing system via one or more hardware processors, next layer after the respective layer in the packed CNN, until the layer reaches the trained CNN layer number.
5. The processor implemented method of claim 1, wherein the multi-criteria
decision analysis algorithm is a Technique for Order of Preference by Similarity to
Ideal Solution (TOPSIS).
6. A private inferencing system, comprising:
a memory storing instructions;
one or more communication interfaces; and

one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to:
receive (i) a trained convolutional neural network (CNN), the trained CNN comprising a plurality of layers, each layer of the plurality of layers comprising a plurality of neurons, (ii) one or more pruning algorithms, (iii) a list of user requirement constraints, the list comprising a range of required time constraint values, a range of required memory constraint values, a range of required accuracy constraint values, and a range of required security constraint values, (iv) a trained CNN layer number, and (v) a server side layer number;
determine a plurality of estimated time values, a plurality of estimated memory values, a plurality of estimated accuracy values, and a plurality of estimated security values for each pruning algorithm of the one or more pruning algorithms by iteratively performing:
applying a pruning algorithm of the one or more pruning algorithms on the trained CNN to obtain a pruned CNN corresponding to the pruning algorithm;
determining a minimum in-degree of the pruned CNN; assigning a packing length for the pruned CNN based on the minimum in-degree of the pruned CNN, wherein the packing length corresponds to a number of neuron inputs to be packed in each layer of the pruned CNN;
iteratively performing:
performing packing of the plurality of neuron inputs in each layer of the pruned CNN based on the packing length to obtain a packed CNN;
estimating an estimated time value, an estimated memory value, an estimated accuracy value and an estimated security value for the packed CNN based on the pruned CNN, trained CNN layer number, and the server side layer number;

adding the estimated time value in a list of estimated time values, the estimated memory value in a list of estimated memory values, the estimated accuracy value in a list of estimated accuracy values and the estimated security value in a list of estimated security values;
increasing the packing length based on a pre-defined criteria,
until the packing length reaches a maximum packing length for the packed CNN;
identifying time values present in the list of estimated time values as the plurality of estimated time values, memory values present in the list of estimated memory values as the plurality of estimated memory values, accuracy values present in the list of estimated accuracy values as the plurality of estimated accuracy values, and security values present in the list of estimated security values as the plurality of estimated security values for the corresponding pruning algorithm;
identifying another pruning algorithm among the one or more pruning algorithms as the pruning algorithm,
until all the pruning algorithms in the one or more pruning algorithms are identified;
determine a network solution for the trained CNN based, at least in part, on the plurality of estimated time values, the plurality of estimated memory values, the plurality of estimated accuracy values, and the plurality of estimated security values that are determined for each pruning algorithm and the list of user requirement constraints using a multi-criteria decision analysis algorithm; and
display the network solution.
7. The system as claimed in claim 6, wherein the network solution comprises
a recommended pruning algorithm among the one or more pruning algorithms, an optimal packing length and a Fully Homomorphic Encryption (FHE) parameter.

8. The system as claimed in claim 6, wherein the step of iteratively performing is
preceded by:
determining the maximum packing length for the packed CNN based on the received range of required security constraint values.
9. The system as claimed in claim 6, wherein the step of estimating, by the
private inferencing system via one or more hardware processors, the estimated time
value, the estimated memory value, the estimated accuracy value, and the estimated
security value for the packed CNN based on the pruned CNN, trained CNN layer
number, and the server side layer number comprises:
determining a starting CNN layer number by applying a pre-defined formula on the trained CNN layer number and the server side layer number;
for each layer in the packed CNN from the starting CNN layer number, iteratively performing:
for each packed neuron input in the respective layer of the packed CNN, iteratively performing:
calculating weight vector of the respective packed neuron input;
determining whether the weight vector for the respective packed neuron input is non-zero;
upon determining that the weight vector for the respective packed neuron input is non-zero, estimating a neuron time value, a neuron memory value, a neuron accuracy value, and a neuron security value for the respective packed neuron input; and
adding the neuron time value to obtain the estimated time value, the neuron memory value to obtain the estimated memory value, the neuron accuracy value to obtain the estimated accuracy value, and the neuron security value to obtain the estimated security value for the packed CNN,
until all the packed neuron inputs with non-zero weight vectors in the respective layer are covered; identifying next layer after the respective layer in the packed CNN,

until the layer reaches the trained CNN layer number.
10. The system as claimed in claim 6, wherein the multi-criteria decision
analysis algorithm is a Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS).

Documents

Application Documents

# Name Date
1 202121049153-STATEMENT OF UNDERTAKING (FORM 3) [27-10-2021(online)].pdf 2021-10-27
2 202121049153-REQUEST FOR EXAMINATION (FORM-18) [27-10-2021(online)].pdf 2021-10-27
3 202121049153-PROOF OF RIGHT [27-10-2021(online)].pdf 2021-10-27
4 202121049153-FORM 18 [27-10-2021(online)].pdf 2021-10-27
5 202121049153-FORM 1 [27-10-2021(online)].pdf 2021-10-27
6 202121049153-FIGURE OF ABSTRACT [27-10-2021(online)].jpg 2021-10-27
7 202121049153-DRAWINGS [27-10-2021(online)].pdf 2021-10-27
8 202121049153-DECLARATION OF INVENTORSHIP (FORM 5) [27-10-2021(online)].pdf 2021-10-27
9 202121049153-COMPLETE SPECIFICATION [27-10-2021(online)].pdf 2021-10-27
10 Abstract1.jpg 2021-12-16
11 202121049153-FORM-26 [20-04-2022(online)].pdf 2022-04-20
12 202121049153-FER.pdf 2023-09-21
13 202121049153-OTHERS [12-02-2024(online)].pdf 2024-02-12
14 202121049153-FER_SER_REPLY [12-02-2024(online)].pdf 2024-02-12
15 202121049153-CLAIMS [12-02-2024(online)].pdf 2024-02-12
16 202121049153-PatentCertificate24-04-2024.pdf 2024-04-24
17 202121049153-IntimationOfGrant24-04-2024.pdf 2024-04-24

Search Strategy

1 Search_202121049153E_15-09-2023.pdf

ERegister / Renewals

3rd: 02 May 2024

From 27/10/2023 - To 27/10/2024

4th: 30 Sep 2024

From 27/10/2024 - To 27/10/2025

5th: 24 Sep 2025

From 27/10/2025 - To 27/10/2026