Abstract: A CONTROLLER AND METHOD TO GENERATE A BALANCED DATASET FOR VULNERABILITY ASSESSMENT OF A TARGET MODEL Abstract The controller 110 is part of a system hosted in a server or cloud. The controller 110 is configured to generate input vectors using a combination of a first noise vector 104 and a second noise vector 106 from a group 102, and store them as an initial dataset 108. The first noise vector 104 is different from the second noise vector 106. The controller 110 further queries a target model 112 by random selection of input vectors from the initial dataset 108. The target model 112 is an AI model. The controller 110 is further configured to identify a subset of output classes of the target model 112 which are predicted less often than a threshold. The controller 110 is then configured to process the input vectors through a targeted Fast Gradient Sign Method (FGSM) 114 in order to predict the subset of output classes, and to generate the balanced dataset 116 of input vectors. Figure 1
Description: Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.
Field of the invention:
[0001] The present invention relates to a controller to generate a balanced dataset of input vectors for vulnerability assessment of a target model, and a method thereof.
Background of the invention:
[0002] With the advent of data science, data processing and decision-making systems are implemented using Artificial Intelligence (AI) modules. The AI modules use different techniques like machine learning, neural networks, deep learning, etc. Most AI based systems receive large amounts of data and process the data to train AI models. The trained AI models generate output based on the use cases requested by the user. Typically, AI systems are used in the fields of computer vision, speech recognition, natural language processing, audio recognition, healthcare, autonomous driving, manufacturing, robotics, etc., where data is processed to generate the required output based on certain rules/intelligence acquired through training.
[0003] To process the inputs and give a desired output, the AI systems use various models/algorithms which are trained using the training data. Once the AI system is trained using the training data, the AI system uses the models to analyze real-time data and generate an appropriate result. The models may be fine-tuned in real time based on the results. The AI models in the AI systems form the core of the system. A lot of effort, resources (tangible and intangible), and knowledge go into developing these models.
[0004] It is possible that some adversary may try to tamper with/manipulate/evade the AI model to create incorrect outputs. The adversary may use different techniques to manipulate the output of the model. One of the simplest techniques is where the adversary sends queries to the AI system using his own test data to compute or approximate the gradients through the model. Based on these gradients, the adversary can then manipulate the input in order to manipulate the output of the model. In another technique, the adversary may manipulate the input data to produce an artificial output. This causes hardships to the original developer of the AI in the form of business disadvantages, loss of confidential information, loss of lead time spent in development, loss of intellectual property, loss of future revenues, etc. Hence there is a need to identify samples in the test data, or generate samples, that can efficiently extract internal information about the working of the models, and to assess the vulnerability of the AI system against those sample-based queries.
[0005] AI models are prone to vulnerabilities and attacks. In an extraction attack, the attacker uses intelligent attack vectors to query the model. The attacker then builds a learnt labelled dataset to train a surrogate model. Through such a method, the attacker is able to produce a functional equivalent of the original model. However, the attack vectors generated are sometimes prone to class imbalance, meaning only one class will fire while another class will not fire at all. In order to generate a surrogate model with good accuracy, it is better to have class balance in the attack vectors. Addressing class imbalance in the attack vectors helps ensure that the surrogate model captures the nuanced behavior of the original model across different classes. By incorporating a balanced representation, the attacker can create a more reliable and functional equivalent model, reducing potential biases and inaccuracies.
[0006] Prior art WO2021095984 discloses an apparatus and method for retraining a substitute model for evasion. The method describes retraining a substitute model that partially imitates the target model by allowing the target model to misclassify specific attack data. However, in a classifier-type AI model there is a need to identify adversarial input attack vectors spread across all classes and to test/assess the vulnerability of the AI model against them.
Brief description of the accompanying drawings:
[0007] An embodiment of the disclosure is described with reference to the following accompanying drawings,
[0008] Fig. 1 illustrates a block diagram of a controller to generate a balanced dataset of input vectors for vulnerability assessment of a target model, according to an embodiment of the present invention, and
[0009] Fig. 2 illustrates a method for generating a balanced dataset of input vectors for vulnerability assessment of a target model, according to the present invention.
Detailed description of the embodiments:
[0010] It is important to understand some aspects of Artificial Intelligence (AI) technology and AI based systems, which can be explained as follows. Depending on the architecture of the implementation, AI systems may include many components. One such component is an AI model. The AI model can be defined as a reference or an inference set of data, which uses different forms of correlation matrices. Using these AI models and the data from these AI models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI models such as linear regression, naïve Bayes classifier, support vector machine, neural networks, and the like. It must be understood that this disclosure is not specific to the type of model being executed and can be applied to any AI module irrespective of the AI model being executed. A person skilled in the art will also appreciate that the AI model may be implemented as a set of software instructions, a combination of software and hardware, or any combination of the same.
[0011] Some of the typical tasks performed by AI systems are classification, clustering, regression, etc. The majority of classification tasks depend upon labeled datasets; that is, the datasets are labelled manually in order for a neural network to learn the correlation between labels and data. This is known as supervised learning. Some typical applications of classification are face recognition, object identification, gesture recognition, voice recognition, etc. In a regression task, the model is trained based on labeled datasets where the target labels are numeric values. Some typical applications of regression are weather forecasting, stock price prediction, house price estimation, energy consumption forecasting, etc. Clustering or grouping is the detection of similarities in the inputs. Cluster learning techniques do not require labels to detect similarities.
[0012] As the AI module forms the core of the AI system, the module needs to be protected against attacks. AI adversarial threats can be largely categorized into model extraction attacks, inference attacks, evasion attacks, and data poisoning attacks. In poisoning attacks, the adversary carefully injects crafted data to contaminate the training data, which eventually affects the functionality of the AI system. In inference attacks, an attempt is made to infer the training data from the corresponding output or other information leaked by a target model 112. Studies have shown that it is possible to recover training data associated with arbitrary model output. The ability to extract this data further poses data privacy issues. Evasion attacks are the most prevalent kind of attack that may occur during AI system operations. In this method, the attacker works on the AI algorithm's inputs to find small perturbations leading to large modifications of its outputs (e.g., decision errors), which leads to evasion of the AI model.
[0013] In Model Extraction Attacks (MEA), the attacker gains information about the model internals through analysis of input, output, and other external information. Stealing such a model reveals important intellectual property of the organization and enables the attacker to craft other adversarial attacks such as evasion attacks. This attack is initiated through an attack vector. In computing technology, a vector may be defined as a method by which malicious code/virus data propagates itself, such as to infect a computer, a computer system, or a computer network. Similarly, an attack vector is defined as a path or means by which a hacker can gain access to a computer or a network in order to deliver a payload or a malicious outcome. A model stealing attack uses a kind of attack vector that can make a digital twin/replica/copy of an AI module.
[0014] The attacker typically generates random queries of the size and shape of the input specifications and starts querying the model with these arbitrary queries. This querying produces input-output pairs for the random queries and generates a secondary dataset that is inferred from the pre-trained model. The attacker then takes these input-output pairs and trains a new model from scratch using this secondary dataset. This is a black-box attack vector where no prior knowledge of the original model is required. As more prior information regarding the model becomes available, the attacker moves towards more intelligent attacks.
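By way of illustration only, such black-box querying may be sketched in Python as follows. This is a minimal sketch, assuming the target model is reachable through a generic predict() callable returning a class label; the function names and the uniform query distribution are assumptions, not part of the disclosure.

import numpy as np

def build_secondary_dataset(predict, input_shape, n_queries=1000, seed=0):
    """Black-box extraction sketch: query the target model with arbitrary
    inputs of the specified shape and record the input-output pairs."""
    rng = np.random.default_rng(seed)
    queries = rng.uniform(0.0, 1.0, size=(n_queries,) + input_shape)
    labels = np.array([predict(q) for q in queries])  # predicted class per query
    return queries, labels  # secondary dataset used to train a surrogate model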
[0015] The attacker chooses the most relevant dataset at his disposal to extract the model more efficiently. Our aim through this disclosure is to identify queries that give the best input/output pairs needed to evade the trained model. Once the set of queries in the dataset that can efficiently evade the model is identified, we assess the vulnerability of the AI system against those queries. For the purposes of this disclosure, our objective is to assess the vulnerability of a classifier AI model (or target model 112) against attack vectors spread across all output classes.
[0016] Fig. 1 illustrates a block diagram of a controller to generate a balanced dataset of input vectors for vulnerability assessment of a target model, according to an embodiment of the present invention. The controller 110 is part of a system hosted in a server or cloud. The controller 110 is configured to generate input vectors using a combination of a first noise vector 104 and a second noise vector 106 from a group 102, and store them as an initial dataset 108. The first noise vector 104 is different from the second noise vector 106. The controller 110 further queries a target model 112 by random selection of input vectors from the initial dataset 108. The target model 112 is the AI model as described earlier. The controller 110 is further configured to identify a subset of output classes of the target model 112 which are predicted less often than a threshold. The identification is either manual or automatic. The controller 110 is then configured to process the input vectors through a targeted Fast Gradient Sign Method (FGSM) 114 in order to predict the subset of output classes, and generate the balanced dataset 116 of input vectors. The FGSM 114 is an evasion technique.
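By way of illustration only, the identification step may be sketched as follows. This is a minimal sketch, assuming the threshold is a minimum prediction count per class; the helper name and the counting scheme are assumptions, not part of the disclosure.

from collections import Counter

def underrepresented_classes(predicted_labels, n_classes, threshold):
    """Return the subset of output classes whose prediction count over the
    queried input vectors falls below the threshold."""
    counts = Counter(predicted_labels)
    return [c for c in range(n_classes) if counts.get(c, 0) < threshold]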
[0017] According to an embodiment of the present invention, to process the input vectors through the targeted FGSM 114 and generate the balanced dataset 116, the controller 110 is configured to modify the input vector through application of the targeted FGSM 114. The modification indicates the addition of noise or perturbations. The controller 110 then queries the target model 112 with the modified input vector in order to flip the prediction/label in favor of the identified subset of output classes which were predicted less often than the threshold. Once the prediction/label is flipped, the controller 110 accumulates and generates the balanced dataset 116 of input vectors after the targeted FGSM 114, i.e. the controller 110 accumulates the input vectors from the initial dataset 108 for which the output classes were predicted, together with the input vectors processed by the targeted FGSM 114 for obtaining the identified subset of output classes, to generate/create the balanced dataset 116 of input vectors.
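By way of illustration only, a targeted FGSM step may be sketched in PyTorch as follows. This is a minimal sketch; the step size epsilon, the [0, 1] input range, and the single-step perturbation are assumptions, not requirements of the disclosure.

import torch
import torch.nn.functional as F

def targeted_fgsm(model, x, target_class, epsilon=0.05):
    """Targeted FGSM sketch: perturb x so that the model's prediction is
    pushed towards target_class (an underrepresented output class)."""
    x_adv = x.clone().detach().requires_grad_(True)
    logits = model(x_adv)
    target = torch.full((x_adv.shape[0],), target_class,
                        dtype=torch.long, device=x.device)
    loss = F.cross_entropy(logits, target)
    loss.backward()
    # step against the gradient to decrease the loss w.r.t. the target class,
    # i.e. to flip the prediction in favor of that class
    x_adv = x_adv - epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()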
[0018] According to an embodiment of the present invention, the generated input vectors are shuffled, by a shuffling module, within the initial dataset 108 before the target model 112 is queried by the controller 110. The controller 110 is configured to monitor labels of output classes of the target model 112 for each input vector which is queried to the target model 112 to obtain the subset of output classes.
[0019] According to an embodiment of the present invention, the controller 110 is applicable to image data and time series data. The first noise vector 104 and the second noise vector 106 for the image data are selected from a group comprising gaussian noise, a square box, circles, a checker box, circles of a different color, an inverted square box, an inverted circular blob, Laplacian, synthetic, uniform, and gaussian filter noise, or other attack vectors known in the art. The first noise vector 104 and the second noise vector 106 for the time series data are selectable from a group comprising random noise, sine waves with chirp, and other attack vectors known in the art.
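By way of illustration only, two such noise vectors for image data may be generated and combined as follows. This is a minimal sketch, assuming a 2-D grayscale image shape, an additive combination rule, and nominal amplitudes; none of these choices is mandated by the disclosure.

import numpy as np

def gaussian_noise_vector(shape, sigma=0.1, seed=None):
    """First noise vector: values drawn from a gaussian distribution."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma, size=shape)

def square_box_vector(shape, box=(8, 8), fill=1.0):
    """Second noise vector: a 2-D (H, W) image with a solid square region."""
    v = np.zeros(shape)
    v[:box[0], :box[1]] = fill
    return v

def combine(first, second):
    """Input vector as a combination of two different noise vectors,
    clipped to a nominal [0, 1] input range."""
    return np.clip(first + second, 0.0, 1.0)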
[0020] In accordance with an embodiment of the present invention, the controller 110 is provided with the necessary signal detection, acquisition, and processing circuits. The controller 110 comprises input/output interfaces having pins or ports, a memory element (not shown) such as Random Access Memory (RAM) and/or Read Only Memory (ROM), an Analog-to-Digital Converter (ADC) and a Digital-to-Analog Converter (DAC), clocks, timers, counters, and at least one processor (capable of implementing machine learning) connected with each other and to other components through communication bus channels. The memory element is pre-stored with logics or instructions or programs or applications or modules/models and/or threshold values/ranges, reference values, and predefined/predetermined criteria/conditions, which is/are accessed by the at least one processor as per the defined routines. The internal components of the controller 110 are not explained, being state of the art, and the same must not be understood in a limiting manner. The controller 110 may also comprise communication units such as transceivers to communicate through wireless or wired means such as Global System for Mobile Communications (GSM), 3G, 4G, 5G, Wi-Fi, Bluetooth, Ethernet, serial networks, and the like. The controller 110 is implementable in the form of a System-in-Package (SiP) or System-on-Chip (SoC) or any other known type. Examples of the controller 110 include, but are not limited to, a microcontroller, a microprocessor, a microcomputer, etc.
[0021] Further, the processor may be implemented as any or a combination of one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The processor is configured to exchange and manage the processing of various AI models.
[0022] According to an embodiment of the present invention, a working example of the controller 110 is explained. Consider that an AI framework is hosted in a server/cloud by a service provider. The target model 112 is provided/uploaded by a customer to the AI framework for vulnerability assessment along with the respective input-output pair data. Once the target model 112 is received, the controller 110 in the cloud generates the input vectors (or attack vectors) using a combination of the first noise vector 104 and the second noise vector 106. For example, for generating gaussian noise attack vectors, random values from a gaussian distribution are generated and added to the input data. Similarly, for square box attack vectors, an image/input can be created where a specific region is filled with a solid color or pattern. The controller 110 then randomly shuffles the generated input vectors to ensure a diverse distribution of samples. The controller 110 now predicts against the target model 112, i.e. uses the shuffled input vectors to query the target model 112 and observes the predicted class for each input vector. The controller 110 then checks for class imbalance, i.e. analyzes the distribution of predicted classes over the input vectors. The analysis is either aided by a human or performed automatically by the controller 110. If a significant class imbalance is detected, the controller 110 uses the targeted FGSM 114 to flip labels of the predicted output. The controller 110 applies the FGSM 114 to flip the labels of selected input vectors. The controller 110 calculates the gradients of the loss function with respect to the input data, and then adjusts the input data along the sign of those gradients (for the targeted variant, decreasing the loss with respect to the desired class) in order to force a different prediction. The controller 110 is configured to apply the FGSM 114 to a selected number of samples in the input vectors to balance the class distribution. Once done, the controller 110 accumulates the input vectors of the initial dataset with the modified input vectors and forms the balanced dataset 116 of input vectors. The balanced dataset 116 is now ready to be used to build a surrogate model of the target model 112 to check for vulnerability.
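By way of illustration only, the steps of this working example may be tied together as follows. This is a minimal end-to-end sketch, assuming numpy arrays, a per-class count threshold, and a flip_fn helper such as the targeted FGSM sketch above; all names here are hypothetical.

import numpy as np

def generate_balanced_dataset(predict, initial_dataset, n_classes, threshold, flip_fn):
    """End-to-end sketch: shuffle, query, detect class imbalance, flip labels
    via the targeted FGSM helper, and accumulate the balanced dataset."""
    rng = np.random.default_rng(0)
    rng.shuffle(initial_dataset)  # optional shuffling step, in place
    labels = np.array([predict(x) for x in initial_dataset])
    counts = np.bincount(labels, minlength=n_classes)
    rare = [c for c in range(n_classes) if counts[c] < threshold]
    flipped = []
    for c in rare:  # perturb donor vectors towards each underrepresented class
        donors = initial_dataset[labels != c][: threshold - counts[c]]
        flipped.extend(flip_fn(x, c) for x in donors)
    # accumulate the initial vectors together with the modified ones
    if not flipped:
        return initial_dataset
    return np.concatenate([initial_dataset, np.array(flipped)])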
[0023] According to an embodiment of the present invention, the balanced dataset 116 of input vectors is usable to build the surrogate model or a replica model of the target model 112, as the balanced dataset 116 covers all the output classes in a proper distribution. The same is used to assess or analyze the vulnerability of the target model 112 and is then usable to build a defense model.
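By way of illustration only, this assessment step may be sketched as follows. This is a minimal sketch; the scikit-learn classifier, its hyperparameters, and the labelling scheme are assumptions, not part of the disclosure.

from sklearn.neural_network import MLPClassifier

def train_surrogate(balanced_inputs, target_predict):
    """Fit a surrogate on the balanced dataset: label each input vector with
    the target model's own prediction, then train a replica classifier."""
    labels = [target_predict(x) for x in balanced_inputs]
    surrogate = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300)
    surrogate.fit([x.ravel() for x in balanced_inputs], labels)
    return surrogate  # high agreement with the target indicates extraction vulnerability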
[0024] Fig. 2 illustrates a method for generating a balanced dataset of input vectors for vulnerability assessment of a target model, according to the present invention. The method comprises a plurality of steps, of which a step 202 comprises generating input vectors by combining the first noise vector 104 and the second noise vector 106, and storing them as the initial dataset 108. The first noise vector 104 is different from the second noise vector 106, and both are selected from the group 102. A step 204 comprises querying the target model 112 by randomly selecting input vectors from the initial dataset 108. A step 206 comprises identifying the subset of output classes which are predicted less often than the threshold by the target model 112. The subset of output classes is obtained by observing labels of output classes of the target model 112 for each input vector which is queried to the target model 112. A step 208 comprises processing the input vectors through the targeted Fast Gradient Sign Method (FGSM) 114 in order to predict the subset of output classes, and generating the balanced dataset 116 of input vectors.
[0025] According to the method, the step 208 of processing through the targeted FGSM 114 and generating the balanced dataset 116 comprises a step 210 of modifying the input vector by applying the targeted FGSM 114. A further step 212 comprises querying the target model 112 with the modified input vector in order to flip the prediction in favor of the subset of output classes. A step 214 comprises accumulating the modified input vector together with the initial dataset 108 to generate the balanced dataset 116 of input vectors (or attack vectors).
[0026] According to the method, before the step 204, a step 216 is optionally executable which comprises shuffling the generated input vectors within the initial dataset 108 before querying the target model 112.
[0027] According to the present invention, the method is applicable to image data and time series data. The first noise vector 104 and the second noise vector 106 for the image data are selected from the group comprising gaussian noise, a square box, circles, a checker box, circles of a different color, an inverted square box, an inverted circular blob, Laplacian, synthetic, uniform, and gaussian filter noise. The first noise vector 104 and the second noise vector 106 for the time series data are selectable from the group comprising random noise and sine waves with chirp.
[0028] According to the present invention, to tackle the class imbalance issue pertaining to the input vectors (also known as attack vectors), the targeted FGSM 114 is employed. By utilizing the targeted FGSM 114, the labels of a desired set of input vectors are effectively flipped. This approach allows for manipulating the predictions and perturbing the inputs along the gradients to create a balanced distribution among the classes. By applying this technique, the class imbalance problem is addressed comprehensively. The targeted FGSM 114 is a widely used technique in the field of adversarial machine learning, providing a robust method for flipping labels and introducing controlled perturbations. With this solution in place, the resulting distribution is balanced, ensuring fair representation of all output classes among the input vectors. This approach is instrumental in creating a more reliable and robust system for handling class-imbalanced input vectors.
[0029] It should be understood that the embodiments explained in the description above are only illustrative and do not limit the scope of this invention. Many such embodiments and other modifications and changes in the embodiment explained in the description are envisaged. The scope of the invention is only limited by the scope of the claims.
Claims: We claim:
1. A controller (110) to generate a balanced dataset (116) of input vectors for vulnerability assessment of a target model (112), characterized in that said controller (110) is configured to:
generate input vectors using a combination of a first noise vector (104) and a second noise vector (106), and store said generated input vectors as an initial dataset (108), wherein said first noise vector (104) is different from said second noise vector (106);
query said target model (112) by random selection of input vectors from said initial dataset (108);
identify a subset of output classes of said target model (112) which are predicted less often than a threshold; and
process said input vectors through a targeted Fast Gradient Sign Method (FGSM) (114) in order to predict said subset of output classes, and generate said balanced dataset (116) of input vectors.
2. The controller (110) as claimed in claim 1, wherein to process said input vectors through said targeted FGSM (114) and generate said balanced dataset (116), said controller (110) is configured to:
modify said input vector through said targeted FGSM (114);
query said target model (112) with said modified input vector in order to flip the prediction in favor of said identified subset of output classes, and
accumulate said modified input vector together with said initial dataset (108) to generate said balanced dataset (116) of input vectors.
3. The controller (110) as claimed in claim 1, wherein said generated input vectors are shuffled within said initial dataset (108) before said target model (112) is queried.
4. The controller (110) as claimed in claim 1 is configured to monitor labels of output classes of said target model (112) for each input vector which is queried to said target model (112) to obtain said subset of output classes.
5. The controller (110) as claimed in claim 1 is applicable to image data and time series data, wherein said first noise vector (104) and said second noise vector (106) for said image data are selected from a group comprising gaussian noise, a square box, circles, a checker box, circles of a different color, an inverted square box, an inverted circular blob, Laplacian, synthetic, uniform, and gaussian filter noise, and wherein said first noise vector (104) and said second noise vector (106) for said time series data are selectable from a group comprising random noise and sine waves with chirp.
6. A method for generating a balanced dataset (116) of input vectors for vulnerability assessment of a target model (112), characterized by said method comprising the steps of:
generating input vectors by combining a first noise vector (104) and a second noise vector (106), and storing them as an initial dataset (108), wherein said first noise vector (104) is different from said second noise vector (106);
querying said target model (112) by randomly selecting input vectors from said initial dataset (108);
identifying a subset of output classes of said target model (112) which are predicted less often than a threshold, and
processing said input vectors through a targeted Fast Gradient Sign Method (FGSM) (114) in order to predict said subset of output classes, and generating said balanced dataset (116) of input vectors.
7. The method as claimed in claim 6, wherein processing through said targeted FGSM (114) and generating said balanced dataset (116) comprises:
modifying said input vector by applying said targeted FGSM (114);
querying said target model (112) with said modified input vector in order to flip the prediction in favor of said subset of output classes, and
accumulating said modified input vector together with said initial dataset (108) to generate said balanced dataset (116) of input vectors.
8. The method as claimed in claim 6 comprises shuffling said generated input vectors within said initial dataset (108) before querying said target model (112).
9. The method as claimed in claim 6, wherein said subset of output classes is obtained by observing labels of output classes of said target model (112) for each input vector which is queried to said target model (112).
10. The method as claimed in claim 6, wherein said method is applicable to image data and time series data, wherein said first noise vector (104) and said second noise vector (106) for said image data are selected from a group comprising gaussian noise, a square box, circles, a checker box, circles of a different color, an inverted square box, an inverted circular blob, Laplacian, synthetic, uniform, and gaussian filter noise, and wherein said first noise vector (104) and said second noise vector (106) for said time series data are selectable from a group comprising random noise and sine waves with chirp.