Abstract: TITLE: A method (200) of assessing vulnerability of an AI Model (M) and a framework (100) thereof. ABSTRACT The present invention proposes a method (200) and a framework (100) to assess the vulnerability of an AI Model (M). The framework (100) comprises the AI model (M), a stolen AI model (30), at least one generator AI model (40) and at least a processor (20). The framework (100) is designed to generate a wide range of attack vectors that test the vulnerability of the AI model trained on regression algorithms. The processor (20) is configured to feed a first set of attack vectors generated by a generator AI model (40) to the trained AI model (M) and the stolen AI model (30) to generate a first output and a second output respectively. The first and second outputs are compared to compute a loss function that is fed back to the generator AI model (40) to improve the accuracy of the first set of attack vectors. Figure 1.
Description:
Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed
Field of the invention
[0001] The present disclosure relates to the field of Artificial Intelligence security. In particular, it proposes a method of assessing vulnerability of an AI Model and a framework thereof.
Background of the invention
[0002] With the advent of data science, data processing and decision making systems are implemented using artificial intelligence modules. The artificial intelligence modules use different techniques like machine learning, neural networks, deep learning etc. Most AI based systems receive large amounts of data and process the data to train AI models. Trained AI models generate output based on the use cases requested by the user. Typically, AI systems are used in the fields of computer vision, speech recognition, natural language processing, audio recognition, healthcare, autonomous driving, manufacturing, robotics etc., where they process data to generate the required output based on certain rules/intelligence acquired through training.
[0003] To process the inputs and give a desired output, the AI systems use various models/algorithms which are trained using the training data. Once the AI system is trained using the training data, it uses the models to analyze real time data and generate the appropriate result. The models may be fine-tuned in real time based on the results. The AI models in the AI systems form the core of the system. A lot of effort, resources (tangible and intangible), and knowledge go into developing these models.
[0004] It is possible that some adversary may try to tamper with, manipulate or evade the AI model to create incorrect outputs. The adversary may use different techniques to manipulate the output of the model. One of the simplest techniques is where the adversary sends queries to the AI system using his own test data to compute or approximate the gradients through the model. Based on these gradients, the adversary can then manipulate the input in order to manipulate the output of the model. In another technique, the adversary may manipulate the input data to produce an artificial output. This causes hardships to the original developer of the AI in the form of business disadvantages, loss of confidential information, loss of lead time spent in development, loss of intellectual property, loss of future revenues etc. Hence there is a need to identify samples in the test data, or generate samples, that can efficiently extract internal information about the working of the models, to assess the vulnerability of the AI system against queries based on such samples, and thereby to identify such manipulations.
[0005] Methods of attacking an AI system are known in the prior art. The prior art WO 2021/095984 A1 – Apparatus and Method for Retraining Substitute Model for Evasion Attack and Evasion Attack Apparatus – discloses one such method. It describes retraining a substitute model that partially imitates the target model by causing the target model to misclassify specific attack data. However, for a classifier type AI model there is a need to identify adversarial input attack vectors spread across all classes and to test the vulnerability of the AI model against them.
Brief description of the accompanying drawings
[0006] An embodiment of the invention is described with reference to the following accompanying drawings:
[0007] Figure 1 depicts a framework (100) for assessing vulnerability of an AI Model (M);
[0008] Figure 2 depicts an AI system (10);
[0009] Figure 3 illustrates method steps (200) of assessing vulnerability of an AI model (M).
Detailed description of the drawings
[0010] It is important to understand some aspects of artificial intelligence (AI) technology and artificial intelligence (AI) based systems, herein also referred to as AI systems. Some important aspects of the AI technology and AI systems can be explained as follows. Depending on the architecture of the implementation, AI systems may include many components. One such component is an AI model. A model can be defined as a reference or an inference set of data, which uses different forms of correlation matrices. Using these models and the data from these models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI models such as linear regression, naïve Bayes classifier, support vector machine, neural networks and the like. It must be understood that this disclosure is not specific to the type of model being executed and can be applied to any AI module irrespective of the AI model being executed. A person skilled in the art will also appreciate that the AI model may be implemented as a set of software instructions, a combination of software and hardware, or any combination of the same.
[0011] Some of the typical tasks performed by AI systems are classification, clustering, regression etc. The majority of classification tasks depend upon labeled datasets; that is, the datasets are labeled manually in order for a neural network to learn the correlation between labels and data. This is known as supervised learning. Some of the typical applications of classification are: face recognition, object identification, gesture recognition, voice recognition etc. In a regression task, the model is trained on labeled datasets where the target labels are numeric values. Some of the typical applications of regression are: weather forecasting, stock price prediction, house price estimation, energy consumption forecasting etc. Clustering or grouping is the detection of similarities in the inputs. Clustering techniques do not require labels to detect similarities.
[0012] As the AI module forms the core of the AI system, the module needs to be protected against attacks. AI adversarial threats can be largely categorized into model extraction attacks, inference attacks, evasion attacks, and data poisoning attacks. In poisoning attacks, the adversary carefully injects crafted data to contaminate the training data, which eventually affects the functionality of the AI system. Inference attacks attempt to infer the training data from the corresponding output or other information leaked by the target model. Studies have shown that it is possible to recover training data associated with arbitrary model output; the ability to extract this data further poses data privacy issues. Evasion attacks are the most prevalent kind of attack that may occur during AI system operations. In this method, the attacker works on the AI algorithm's inputs to find small perturbations leading to large modifications of its outputs (e.g., decision errors), which leads to evasion of the AI model.
[0013] In Model Extraction Attacks (MEA), the attacker gains information about the model internals through analysis of input, output, and other external information. Stealing such a model reveals important intellectual property of the organization and enables the attacker to craft other adversarial attacks such as evasion attacks. This attack is initiated through an attack vector. In computing technology, a vector may be defined as a method by which malicious code or a virus propagates itself, such as to infect a computer, a computer system or a computer network. Similarly, an attack vector is defined as a path or means by which a hacker can gain access to a computer or a network in order to deliver a payload or a malicious outcome. A model stealing attack uses a kind of attack vector that can make a digital twin/replica/copy of an AI module.
[0014] The attacker typically generates random queries of the size and shape of the input specification and starts querying the model with these arbitrary queries. This querying produces input-output pairs for the random queries and generates a secondary dataset that is inferred from the pre-trained model. The attacker then takes these input-output pairs and trains a new model from scratch using this secondary dataset. This is a black box attack vector where no prior knowledge of the original model is required. As more prior information regarding the model becomes available, the attacker moves towards more intelligent attacks.
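The random-query extraction outlined above can be illustrated with a minimal sketch. It assumes a scikit-learn style target model exposed only through a prediction interface; the helper names query_target and extract_model are hypothetical and serve only to illustrate building a secondary dataset and training a substitute from scratch.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def query_target(target_model, queries):
    # Black-box access: the attacker only observes outputs for chosen inputs.
    return target_model.predict(queries)

def extract_model(target_model, input_dim, n_queries=10000, seed=0):
    rng = np.random.default_rng(seed)
    # Random queries matching the size and shape of the input specification.
    queries = rng.normal(size=(n_queries, input_dim))
    # Secondary dataset of input-output pairs inferred from the pre-trained model.
    outputs = query_target(target_model, queries)
    # Train a substitute ("stolen") model from scratch on the secondary dataset.
    substitute = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
    substitute.fit(queries, outputs)
    return substitute
```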
[0015] The attacker chooses a relevant dataset at his disposal to extract the model more efficiently. Our aim through this disclosure is to identify queries that give the best input/output pairs needed to evade the trained model. Once the set of queries in the dataset that can efficiently evade the model is identified, we test the vulnerability of the AI system against those queries. For the purposes of this disclosure, our aim is to test the vulnerability of a classifier AI model against all classes of attack vectors.
[0016] Figure 1 depicts a framework (100) for assessing vulnerability of a trained AI model (M). The framework (100) comprises the trained AI model (M), a stolen AI model (30), at least one generator AI model (40) and at least a processor (20). The framework (100) is designed to generate quality attack vectors that can test the vulnerability of the trained AI model (M).
[0017] The AI model (M) is a trained regressor type model. In an embodiment of the present disclosure, the AI model (M) is configured to process images and identify various objects by classifying them into a class within bounding boxes having coordinate values. For example, in an image of a street, the AI model extracts objects such as a car, a cat or a motorbike, with boundaries given by various bounding boxes.
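For illustration, the kind of output such a model is assumed to produce here pairs a class label with bounding box coordinates; the Detection structure below is a hypothetical sketch, not a format prescribed by the disclosure.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    label: str           # e.g. "car", "cat", "motorbike"
    score: float         # classification confidence
    box: List[float]     # bounding box coordinates [x_min, y_min, x_max, y_max]

# A street image might yield detections such as:
example_output = [
    Detection(label="car", score=0.92, box=[14.0, 60.0, 210.0, 180.0]),
    Detection(label="motorbike", score=0.81, box=[230.0, 95.0, 320.0, 190.0]),
]
```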
[0018] Figure 2 depicts an AI system (10). The AI model (M) may additionally be part of the AI system (10) comprising other components and modules. The AI system (10) additionally comprises an input interface (11), an output interface (22), a submodule (14) and at least a blocker module (18). The submodule (14) is trained using various techniques to identify an attack vector in the input. The blocker module (18) is configured to block a user or modify the output when an input query is determined to be an attack vector. The blocker module (18) is configured to at least restrict a user of the AI system (10) in dependence on the assessment. It is further configured to modify the original output generated by the AI model (M) on identification of an input query, or a batch of input queries, as an attack vector.
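A minimal sketch of the blocker behaviour described above is given below, assuming a hypothetical is_attack_vector check provided by the submodule (14) and hypothetical block_user and modify_output handlers; the actual blocking and modification policies are implementation specific.

```python
def handle_query(query, ai_model, is_attack_vector, block_user, modify_output):
    """Route a query through the AI system (10) with the blocker module (18)."""
    output = ai_model(query)
    if is_attack_vector(query):
        # Restrict the user and/or return a modified output instead of the original.
        block_user()
        return modify_output(output)
    return output
```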
[0019] As used in this application, the terms "component," "system," "module," and "interface" are intended to refer to a computer-related entity or an entity related to, or that is part of, an operational apparatus with one or more specific functionalities, wherein such entities can be either hardware, a combination of hardware and software, software, or software in execution. As yet another example, interface(s) can include input/output (I/O) components as well as associated processor (20), application, or Application Programming Interface (API) components. The AI system (10) could be a hardware combination of these modules or could be deployed remotely on a cloud or server. Similarly, the framework (100) could be a hardware or a software combination of these modules or could be deployed remotely on a cloud or server.
[0020] The generator AI model (40) is fed with a training dataset and generates a first set of attack vectors. The training dataset comprises either one or both of a set of pre-determined attack vectors and random vector points. For example, in the embodiment where the AI model (M) is configured to process images and identify various objects by classifying them into a class within bounding boxes, random vector points, say z1 and z2, are sampled from a given set of distributions (such as Gaussian and Laplacian distributions) for an image and fed into the generator AI model (40).
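The sampling of random vector points into the generator could look like the following sketch. PyTorch is used for illustration; the generator architecture, the latent dimension and the mixture of Gaussian and Laplacian latents are assumptions made only for this example.

```python
import torch
from torch import nn

class AttackVectorGenerator(nn.Module):
    """Hypothetical generator AI model (40) mapping latent points to attack vectors."""
    def __init__(self, latent_dim=64, image_shape=(3, 64, 64)):
        super().__init__()
        self.image_shape = image_shape
        out_dim = image_shape[0] * image_shape[1] * image_shape[2]
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(-1, *self.image_shape)

latent_dim, batch = 64, 8
# Random vector points z1 and z2 sampled from Gaussian and Laplacian distributions.
z1 = torch.randn(batch, latent_dim)
z2 = torch.distributions.Laplace(0.0, 1.0).sample((batch, latent_dim))
generator = AttackVectorGenerator(latent_dim)
attack_vectors = generator(torch.cat([z1, z2], dim=0))  # first set of attack vectors
```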
[0021] The stolen model (30) is another AI model initialized using a limited number of known input-output queries for the trained AI model (M). Using a pre-determined set of attack vectors fed to the AI model (M) and the corresponding output generated by it, a learnt-labelled dataset is created. This learnt-labelled dataset is used to create a replica of the AI model using a model extraction technique as elucidated above. However, this stolen AI model (30) would not be precise, as the learnt-labelled dataset does not cover the whole spectrum of inputs. Hence the stolen model (30) is a reverse engineered replica of the trained AI model (M), initialized by examining some predetermined attack vectors and the corresponding first output. The generator AI model (40) is deployed in an adversarial framework wherein it is pitted against the trained AI model (M) and the stolen AI model (30). The trained AI model (M) and the stolen AI model (30) act as discriminators to the generator AI model (40).
[0022] Generally, the processor (20) may be implemented as any or a combination of: one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The processor (20) is configured to exchange data with and manage the processing of the various AI models in the framework (100), namely the AI model (M), the stolen AI model (30) and the generator AI model (40).
[0023] The processor (20) is configured to: feed a training dataset to the generator AI model (40) to get a first set of attack vectors; feed the first set of attack vectors to the trained AI model (M) and the stolen AI model (30) to generate a first output and a second output respectively; compare the first and second outputs to compute a loss function; input the computed loss as feedback to the generator AI model (40) to improve the accuracy of the first set of attack vectors; and feed the first set of attack vectors to the AI model (M) to assess the vulnerability of the AI model (M). The loss function further comprises a classification loss and at least a regression loss. The processor (20) is further configured to store the second set of attack vectors in a database. The processor (20) further analyzes the output of the trained AI model (M) for the second set of attack vectors to assess the vulnerability of the AI model.
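One way to read the processor's orchestration of the three models is the loop sketched below. It assumes PyTorch-style models and a hypothetical loss_fn comparing the first and second outputs; whether the generator minimizes or maximizes the discrepancy between the two outputs is likewise an assumption of this sketch, not something the disclosure fixes.

```python
import torch

def assess_vulnerability(generator, trained_model, stolen_model, loss_fn,
                         latent_dim=64, steps=1000, batch=16):
    """Sketch: feedback loop between generator (40), trained model (M) and stolen model (30)."""
    optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
    for _ in range(steps):
        z = torch.randn(batch, latent_dim)
        attack_vectors = generator(z)                      # first set of attack vectors
        with torch.no_grad():
            first_output = trained_model(attack_vectors)   # output of the trained AI model (M)
        second_output = stolen_model(attack_vectors)       # output of the stolen AI model (30)
        loss = loss_fn(first_output, second_output)        # classification + regression loss
        optimizer.zero_grad()
        loss.backward()                                    # computed loss fed back to the generator
        optimizer.step()
    # The improved attack vectors are then fed to model (M) to assess its vulnerability.
    return generator(torch.randn(batch, latent_dim)).detach()
```

In this reading, the stolen model (30) provides a differentiable stand-in for the trained model (M), so the loss computed between the two outputs can be propagated back to the generator (40).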
[0024] It should be understood at the outset that, although exemplary embodiments are illustrated in the figures and described below, the present disclosure should in no way be limited to the exemplary implementations and techniques illustrated in the drawings and described below.
[0025] Figure 3 illustrates method steps of assessing vulnerability of an AI model (M). The framework (100) used to assess vulnerability of the AI model (M) and its components have been explained in accordance with figure 1 and figure 2.
[0026] Method step 201 comprises feeding a training dataset to a generator AI model (40) to get a first set of attack vectors. This training dataset comprises a limited distribution of inputs. The inputs are fetched from a database comprising identified attack vectors. In an embodiment of the present invention, such a database resides inside the processor (20). In another embodiment, the processor (20) is configured to retrieve information from a remotely located database.
[0027] Method step 202 comprises feeding the first set of attack vectors to the trained AI model (M) and a stolen AI model (30) to generate a first output and a second output respectively. Method step 203 comprises comparing the first and second outputs to compute a loss function by means of a processor (20). This loss is a distance measure that defines the difference between the estimated and true values. This difference is used to improve the results by estimating the inaccuracy of predictions. The loss function further comprises a classification loss and at least a regression loss. Taking a cue from the previous example, wherein the trained AI model (M) is configured to process images and identify various objects by classifying them into a class within bounding boxes having coordinate values, the classification loss is the difference in the class of the object identified and the regression loss is the difference with respect to the coordinates of the boundary of the identified object.
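As a worked illustration of the combined loss in step 203, the sketch below assumes each output packs class logits followed by four bounding box coordinates, and uses cross-entropy for the classification part and an L1 distance for the regression part; both the layout and the choice of distances are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def combined_loss(first_output, second_output, num_classes=20):
    """Classification loss on the predicted class plus regression loss on box coordinates."""
    # Assumed layout per prediction: [class logits (num_classes) | box coordinates (4)].
    cls_first, box_first = first_output[:, :num_classes], first_output[:, num_classes:]
    cls_second, box_second = second_output[:, :num_classes], second_output[:, num_classes:]
    # Classification loss: difference in the class of the identified object.
    classification_loss = F.cross_entropy(cls_second, cls_first.argmax(dim=1))
    # Regression loss: difference with respect to the bounding box coordinates.
    regression_loss = F.l1_loss(box_second, box_first)
    return classification_loss + regression_loss
```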
[0028] Method step 204 comprises inputting the computed loss as feedback to the generator AI model (40) to improve the accuracy of the first set of attack vectors. Method step 205 comprises feeding the first set of attack vectors to the trained AI model (M) to assess the vulnerability of the trained AI model (M). For a robust defense mechanism of the trained AI model (M) in an AI system (10), it is expected that a component of the AI system (10) recognizes the majority of the attack vectors. Thereafter the blocker module (18) of the AI system (10) is supposed to block a user or modify the output when a batch of input queries or an input query is determined to be an attack vector. The processor (20) further analyzes the output of the trained AI model (M) for the second set of attack vectors to assess the vulnerability of the AI model. Hence, while assessing the vulnerability of the AI model, the output of the AI system (10) is recorded, and the processor (20) determines the percentage and severity of the modified outputs to assess the vulnerability of the AI system (10).
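The final assessment could be summarized by a small scoring routine such as the one below, assuming per-query flags indicating whether the AI system (10) modified its output and a numeric severity assigned to each modification; both the inputs and the aggregation are hypothetical.

```python
def assess_system(modified_flags, severities):
    """Hypothetical assessment: percentage of modified outputs and their average severity."""
    total = len(modified_flags)
    modified = [s for flag, s in zip(modified_flags, severities) if flag]
    modified_pct = 100.0 * len(modified) / total if total else 0.0
    avg_severity = sum(modified) / len(modified) if modified else 0.0
    return modified_pct, avg_severity

# Example: the blocker module (18) modified 3 of 5 outputs flagged as attack vectors.
print(assess_system([True, False, True, False, True], [0.2, 0.9, 0.1, 0.7, 0.3]))
```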
[0029] It must be understood that the embodiments explained in the above detailed description are only illustrative and do not limit the scope of this invention. Any modification of the framework (100) and adaptation of the method of assessing vulnerability of an AI model are envisaged and form a part of this invention. The scope of this invention is limited only by the claims.
Claims:
We Claim:
1. A method (200) of assessing vulnerability of a trained AI model (M), said AI model (M) being trained on regression algorithms, the method comprising:
feeding (201) a training dataset to a generator AI model (40) to get a first set of attack vectors;
feeding (202) the first set of attack vectors to the trained AI model (M) and a stolen AI model (30) to generate a first output and a second output respectively;
comparing (203) the first and second outputs to compute a loss function by means of a processor (20);
inputting (204) the computed loss as feedback to the generator AI model (40) to improve the accuracy of the first set of attack vectors;
feeding (205) the first set of attack vectors to the AI Model (M) to assess the vulnerability of the AI Model (M).
2. The method (200) of assessing vulnerability of an AI model as claimed in claim 1, wherein the stolen model is another AI model initialized using a limited number of known input-output queries for the trained AI model (M).
3. The method (200) of assessing vulnerability of an AI model (M) as claimed in claim 1, wherein the loss function further comprises a classification loss and at least a regression loss.
4. The method (200) of assessing vulnerability of an AI model (M) as claimed in claim 1, wherein the output of the AI Model for the second set of attack vectors is analyzed to assess the vulnerability of the AI Model (M).
5. A framework (100) for assessing the vulnerability of a trained AI model (M), the framework (100) comprising a stolen AI model (30), at least one generator AI model (40) and at least a processor (20), characterized in that, in the framework (100),
the processor (20) is configured to:
feed a training dataset to a generator AI model (40) to get a first set of attack vectors;
feed the first set of attack vectors to the trained AI model (M) and a stolen AI model (30) to generate a first output and a second output respectively;
compare the first and second outputs to compute a loss function;
input the computed loss as feedback to the generator AI model (40) to improve the accuracy of the first set of attack vectors;
feed the first set of attack vectors to the AI Model (M) to assess the vulnerability of the AI Model (M).
6. The framework (100) for assessing the vulnerability of the AI model as claimed in claim 5, wherein the stolen model is another AI model initialized using a limited number of known input-output queries for the trained AI model (M).
7. The framework (100) for assessing the vulnerability of the AI model as claimed in claim 5, wherein the loss function further comprises a classification loss and at least a regression loss.
8. The framework (100) for assessing the vulnerability of the AI model as claimed in claim 5, wherein the processor (20) is further configured to store the first set of attack vectors in a database.
9. The framework (100) for assessing the vulnerability of the AI model as claimed in claim 5, wherein the processor (20) further analyzes the output of the trained AI model (M) for the second set of attack vectors to assess the vulnerability of the trained AI model (M).