Abstract: TITLE: A method (200) of assessing vulnerability of an AI Model (M) and a framework (100) thereof. ABSTRACT This invention discloses a framework (100) for assessing vulnerability of an AI model (M) and a method (200) thereof. The framework (100) comprises a stolen AI Model (S), an XAI module (30) and at least a processor (20). The AI Model (M) is fed with a first set of pre-determined attack vectors to generate a first output by means of the processor (20). The stolen AI Model (S) is initialized by examining the input pre-determined attack vectors and the corresponding first output. The processor (20) is configured to update the stolen AI model (S) to an updated stolen AI model (S1…Sn) after performing multiple iterations of method step (203) by using the XAI module (30) for the stolen AI model (S). The processor (20) analyzes responses of the updated stolen AI model (Sn) for random inputs to assess the vulnerability of the AI model (M).
Description: Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.
Field of the invention
[0001] The present disclosure relates to the field of Artificial Intelligence security. In particular, it proposes a method of assessing vulnerability of an AI Model and a framework thereof.
Background of the invention
[0002] With the advent of data science, data processing and decision-making systems are implemented using artificial intelligence modules. The artificial intelligence modules use different techniques like machine learning, neural networks, deep learning, etc. Most AI-based systems receive large amounts of data and process the data to train AI models. Trained AI models generate output based on the use cases requested by the user. Typically, AI systems are used in the fields of computer vision, speech recognition, natural language processing, audio recognition, healthcare, autonomous driving, manufacturing, robotics, etc., where they process data to generate the required output based on certain rules/intelligence acquired through training.
[0003] To process the inputs and give a desired output, the AI systems use various models/algorithms which are trained using the training data. Once the AI system is trained using the training data, it uses the models to analyze real-time data and generate an appropriate result. The models may be fine-tuned in real time based on the results. The AI models form the core of the AI system. A lot of effort, resources (tangible and intangible), and knowledge go into developing these models.
[0004] It is possible that some adversary may try to tamper with/manipulate/evade the AI model to create incorrect outputs. The adversary may use different techniques to manipulate the output of the model. One of the simplest techniques is one in which the adversary sends queries to the AI system using his own test data to compute or approximate the gradients through the model. Based on these gradients, the adversary can then manipulate the input in order to manipulate the output of the model. In another technique, the adversary manipulates the input data to produce an artificial output. This causes hardships to the original developer of the AI in the form of business disadvantages, loss of confidential information, loss of lead time spent in development, loss of intellectual property, loss of future revenues, etc. Hence there is a need to identify samples in the test data, or to generate samples, that can efficiently extract internal information about the working/architecture of these models, and to assess the vulnerability of the AI system against those sample-based queries.
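By way of a purely illustrative, non-limiting sketch (and not as part of the claimed method), the gradient-approximation technique mentioned above may be pictured as follows. The function query_model is a hypothetical stand-in for the target AI system's prediction interface, and all numeric choices are arbitrary.

```python
# Illustrative only: an adversary approximates gradients of a black-box model by
# finite differences using query access alone, then perturbs the input.
import numpy as np

def query_model(x: np.ndarray) -> float:
    # Hypothetical black-box score returned by the target AI system.
    return float(1.0 / (1.0 + np.exp(-x.sum())))

def estimate_gradient(x: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Finite-difference estimate of d(score)/dx using only queries."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step.flat[i] = eps
        grad.flat[i] = (query_model(x + step) - query_model(x - step)) / (2 * eps)
    return grad

x = np.random.rand(8)                                  # adversary's own test sample
x_adv = x - 0.1 * np.sign(estimate_gradient(x))        # nudge input against the score
print(query_model(x), query_model(x_adv))
```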
[0005] Methods of attacking an AI system are known in the prior art. The prior art WO 2021/095984 A1 – Apparatus and Method for Retraining Substitute Model for Evasion Attack and Evasion Attack Apparatus – discloses one such method. It describes retraining a substitute model that partially imitates the target model by causing the target model to misclassify specific attack data. However, in a classifier-type AI model there is a need to identify adversarial input attack vectors spread across all classes and to test the vulnerability of the AI model against them.
Brief description of the accompanying drawings
[0006] An embodiment of the invention is described with reference to the following accompanying drawings:
[0007] Figure 1 depicts a framework (100) for assessing vulnerability of an AI Model (M);
[0008] Figure 2 depicts an AI system (10);
[0009] Figure 3 illustrates method steps (200) of assessing vulnerability of the AI model (M).
[0010] Figure 4 is a process flow diagram for method step 203.
Detailed description of the drawings
[0011] It is important to understand some aspects of artificial intelligence (AI) technology and artificial intelligence (AI) based systems, or AI systems. Some important aspects of the AI technology and AI systems can be explained as follows. Depending on the architecture of the implementation, AI systems may include many components. One such component is an AI model. A model can be defined as a reference or an inference set of data, which uses different forms of correlation matrices. Using these models and the data from these models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI models such as linear regression, naïve Bayes classifier, support vector machine, neural networks and the like. It must be understood that this disclosure is not specific to the type of model being executed and can be applied to any AI module irrespective of the AI model being executed. A person skilled in the art will also appreciate that the AI model may be implemented as a set of software instructions, a combination of software and hardware, or any combination of the same.
[0012] Some of the typical tasks performed by AI systems are classification, clustering, regression, etc. The majority of classification tasks depend upon labelled datasets; that is, the datasets are labelled manually in order for a neural network to learn the correlation between labels and data. This is known as supervised learning. Some of the typical applications of classification are face recognition, object identification, gesture recognition, voice recognition, etc. In a regression task, the model is trained on labelled datasets where the target labels are numeric values. Some of the typical applications of regression are weather forecasting, stock price prediction, house price estimation, energy consumption forecasting, etc. Clustering or grouping is the detection of similarities in the inputs. Clustering techniques do not require labels to detect similarities.
[0013] As the AI module forms the core of the AI system, the module needs to be protected against attacks. AI adversarial threats can be largely categorized into model extraction attacks, inference attacks, evasion attacks, and data poisoning attacks. In poisoning attacks, the adversary injects carefully crafted data to contaminate the training data, which eventually affects the functionality of the AI system. Inference attacks attempt to infer the training data from the corresponding output or other information leaked by the target model. Studies have shown that it is possible to recover training data associated with arbitrary model output. The ability to extract this data further poses data privacy issues. Evasion attacks are the most prevalent kind of attack that may occur during AI system operations. In this method, the attacker works on the AI algorithm's inputs to find small perturbations leading to large modifications of its outputs (e.g., decision errors), which leads to evasion of the AI model.
[0014] In Model Extraction Attacks (MEA), the attacker gains information about the model internals through analysis of input, output, and other external information. Stealing such a model reveals important intellectual property of the organization and enables the attacker to craft other adversarial attacks such as evasion attacks. This attack is initiated through an attack vector. In computing, a vector may be defined as a method by which malicious code/virus data propagates itself, such as to infect a computer, a computer system, or a computer network. Similarly, an attack vector is defined as a path or means by which a hacker can gain access to a computer or a network to deliver a payload or a malicious outcome. A model stealing attack uses a kind of attack vector that can make a digital twin/replica/copy of an AI module.
[0015] The attacker typically generates random queries of the size and shape of the input specification and starts querying the model with these arbitrary queries. This querying produces input-output pairs for random queries and generates a secondary dataset that is inferred from the pre-trained model. The attacker then takes these I/O pairs and trains a new model from scratch using this secondary dataset. This is a black-box attack vector where no prior knowledge of the original model is required. As prior information regarding the model becomes available and increases, the attacker moves towards more intelligent attacks.
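As a non-limiting illustration of this black-box extraction, the following sketch assumes a scikit-learn-style interface; the target model, the surrogate architecture and the query budget are placeholders and do not form part of the claimed invention.

```python
# Illustrative only: random queries of the input shape are sent to the target
# model, the resulting input/output pairs form a secondary dataset, and a new
# ("stolen") model is trained from scratch on it.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-in for the pre-trained target model M (internals unknown to the attacker).
target_model = LogisticRegression().fit(rng.normal(size=(200, 16)),
                                        rng.integers(0, 2, size=200))

queries = rng.normal(size=(1000, 16))            # 1. arbitrary queries of the input shape
labels = target_model.predict(queries)           # 2. secondary dataset of I/O pairs
stolen = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(queries, labels)  # 3. surrogate
print("agreement with target:", (stolen.predict(queries) == labels).mean())          # 4. fidelity
```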
[0016] The attacker chooses a relevant dataset at his disposal to extract the model more efficiently. Our aim through this disclosure is to identify queries that give the best input/output pairs needed to extract the maximum information about the working and architecture of the trained model. Once these model extraction queries/attack vectors in the dataset are identified, we test the vulnerability of the AI system against those queries. For the purposes of this disclosure our aim is to test the vulnerability of an AI model against such model extraction attack vectors.
[0017] Figure 1 depicts a framework (100) for assessing vulnerability of an AI model (M). The framework (100) comprises a stolen AI Model (S), an XAI module (30) and at least a processor (20).
[0018] The AI Model (M) is fed with a first set of pre-determined attack vectors to generate a first output by means of the processor (20). The AI model (M) can be a standalone component or part of an AI system. Figure 2 depicts such an AI system. The AI model (M) here is part of the AI system comprising other components and modules. The AI system additionally comprises an input interface (11), an output interface (22), a submodule (14) and at least a blocker module (18). The submodule (14) is trained using various techniques to identify an attack vector in the input. The blocker module (18) is configured to block a user or modify the output when an input query is determined to be an attack vector. The blocker module (18) is configured to at least restrict a user of the AI system in dependence on the assessment. It is further configured to modify the original output generated by the AI model (M) on identification of an input or a batch of input queries as an attack vector.
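By way of a simplified, non-limiting sketch, the interplay of the blocker module (18) and the detection submodule (14) around the AI model (M) could, under assumed interfaces, look as follows; the detector, the model stand-in and the blocking policy are hypothetical placeholders.

```python
# Illustrative only: a wrapper that restricts the user or modifies the output
# whenever the detection submodule flags the incoming query as an attack vector.
import numpy as np

class BlockerWrapper:
    def __init__(self, model, detector, max_flags: int = 3):
        self.model, self.detector = model, detector
        self.flags, self.max_flags = 0, max_flags

    def predict(self, x: np.ndarray) -> np.ndarray:
        if self.detector(x):                      # submodule (14) flags an attack vector
            self.flags += 1
            if self.flags >= self.max_flags:      # restrict the user after repeated flags
                raise PermissionError("user blocked by blocker module")
            return np.random.permutation(self.model(x))   # return a modified output
        return self.model(x)                      # benign query: original output

# Usage with toy stand-ins for the AI model and the detection submodule.
wrapped = BlockerWrapper(model=lambda x: np.array([0.9, 0.1]),
                         detector=lambda x: float(np.abs(x).max()) > 5.0)
print(wrapped.predict(np.array([0.2, -0.3])))
```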
[0019] The XAI module (30) implements algorithms whose outputs help humans understand the reasoning behind decisions or predictions made by the AI. It contrasts with the "black box" concept in machine learning, where even the AI's designers cannot explain why it arrived at a specific decision. The basic goal of XAI is to describe in detail how AI models produce their predictions, since this is of much help for different reasons. The XAI module (30) is configured to give a saliency map for a random input for the stolen AI model (S, S1, …, Sn). Saliency refers to unique features (pixels, resolution, etc.) of the image in the context of visual processing. The XAI method can include one or more of the following techniques and the like: Grad-CAM, Grad-CAM++, Guided BackProp, Integrated Gradients, SHAP or LIME.
[0020] In the context of the present invention, a saliency map is an image that highlights the region on which the AI model focuses first, i.e., the high importance region, when giving an output. In an exemplary embodiment of the present invention, the XAI module (30) derives the saliency maps using the Grad-CAM technique. A Grad-CAM heat-map is a weighted combination of feature maps. Grad-CAM converts the gradients at the final convolutional layer into a heat-map that highlights the important regions (at a broad-region level).
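A minimal sketch of how such a Grad-CAM style saliency map may be computed is given below, assuming a PyTorch convolutional classifier; the network, the chosen layer and the random input are hypothetical placeholders rather than the claimed AI model (M) or XAI module (30).

```python
# Illustrative only: Grad-CAM weights the final convolutional feature maps by
# their pooled gradients to obtain a coarse saliency heat-map.
import torch
import torchvision

feats, grads = {}, {}
model = torchvision.models.resnet18(weights=None).eval()   # stand-in classifier

def save_activation(module, inputs, output):
    feats["a"] = output                                     # final conv feature maps
    output.register_hook(lambda g: grads.update(a=g))       # capture their gradients

model.layer4.register_forward_hook(save_activation)

x = torch.randn(1, 3, 224, 224)                             # placeholder input image
score = model(x)[0].max()                                   # score of the top class
score.backward()                                            # back-propagate to the feature maps

weights = grads["a"].mean(dim=(2, 3), keepdim=True)         # channel-wise pooled gradients
cam = torch.relu((weights * feats["a"]).sum(dim=1))         # weighted combination of maps
cam = cam / (cam.max() + 1e-8)                              # normalised coarse saliency map
print(cam.shape)                                            # e.g. torch.Size([1, 7, 7])
```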
[0021] The stolen AI Model is the reverse engineered replica of the AI model (M) initialized by examining the input predetermined attack vectors and the corresponding first output. Based on the correlation established between the output and the corresponding input (pre-determined attack vector), a temporary architecture/internal working of the AI Model (M) is guessed. This is deemed as the initial stolen AI Model (S). The processor (20) is configured to update the stolen AI model (S) to an updated stolen AI model (S1…Sn) after performing multiple iterations of method step (203) by using the XAI module (30) for the stolen AI model (S).
[0022] Generally, the processor (20) may be implemented as any or a combination of one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The processor (20) is configured to exchange and manage the processing of information between the components of the framework (100) such as the AI model (M), the stolen AI model (S, S1…Sn), and the XAI module (30). The processor (20) analyzes responses of the updated stolen AI model (Sn) for random inputs to assess vulnerability of the AI model (M).
[0023] While updating the stolen AI model (S) to an updated stolen AI model (S1…Sn), the processor (20) is configured to: provide a random input chosen from a test dataset to the XAI module (30) for the stolen AI model (S) to get a saliency map (SM1); compare the saliency map (SM1) with the random input to identify low importance and high importance features; add perturbations in the low importance features of the random input to generate a refined attack vector (AV); feed the refined attack vector (AV) as input to the AI Model (M) to generate a second output by means of the processor (20); update the stolen AI model (from S to S1) by examining the input refined attack vector and the corresponding second output. The processor (20) performs multiple iterations of the afore-mentioned sub-steps for different random inputs on the latest update of the stolen AI model (S1) to get the eventual updated stolen AI model (Sn).
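By way of a non-limiting sketch, a single refinement sub-step of the kind listed above may, under assumed shapes and a hypothetical importance threshold, be written as follows; the arrays standing in for the random input and the saliency map (SM1) are placeholders.

```python
# Illustrative only: the saliency map splits the input into high and low importance
# regions; perturbations are added only to the low importance region, so the
# refined attack vector (AV) retains the features the stolen model relies on.
import numpy as np

def refine_attack_vector(x: np.ndarray, saliency: np.ndarray,
                         threshold: float = 0.5, noise: float = 0.1) -> np.ndarray:
    low_importance = saliency < threshold                   # regions the stolen model ignores
    perturbation = noise * np.random.randn(*x.shape)
    av = x.copy()
    av[low_importance] += perturbation[low_importance]      # high-importance pixels kept intact
    return np.clip(av, 0.0, 1.0)

x = np.random.rand(28, 28)                                  # random input from a test dataset
saliency_map = np.random.rand(28, 28)                       # stands in for the XAI output SM1
av = refine_attack_vector(x, saliency_map)
# The pair (av, output of the AI model (M) for av) is then used to update S to S1.
```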
[0024] As used in this application, the terms "component," "system," "module," "interface," are intended to refer to a computer-related entity or an entity related to, or that is part of, an operational apparatus with one or more specific functionalities, wherein such entities can be either hardware, a combination of hardware and software, software, or software in execution. As further yet another example, interface(s) can include input/output (I/O) components as well as associated processor (20), application, or Application Programming Interface (API) components. The AI system could be a hardware combination of these modules or could be deployed remotely on a cloud or server. Similarly, the framework (100) could be a hardware or a software combination of these modules or could be deployed remotely on a cloud or server.
[0025] It should be understood at the outset that, although exemplary embodiments are illustrated in the figures and described below, the present disclosure should in no way be limited to the exemplary implementations and techniques illustrated in the drawings and described below.
[0026] Figure 3 illustrates method steps of assessing vulnerability of an AI model (M). The framework (100) used to assess vulnerability of the AI model (M) has been explained in accordance with figure 1 and figure 2. For the purposes of clarity, it is reiterated that the framework (100) comprises a stolen AI Model (S), an XAI module (30) and at least a processor (20).
[0027] Method step 201 comprises feeding a first set of pre-determined attack vectors to the AI Model (M) to generate a first output by means of the processor (20). Figure 4 is a process flow diagram for method step 203. Method step 202 comprises initializing a stolen AI Model (S) by examining the input pre-determined attack vectors and the corresponding first output. The stolen AI Model is the reverse-engineered replica of the AI model (M). Based on the correlation established between the output and the corresponding input (pre-determined attack vector), a temporary architecture/internal working of the AI Model (M) is guessed, which is deemed the initial stolen AI Model (S).
[0028] Method step 203 comprises updating the stolen AI model (S, S1, …, Sn) using an XAI module (30) for the stolen AI model (S, S1, …, Sn). The updating (203) of the stolen AI model further comprises the following sub-steps, which are depicted in Iteration 01 of the process flow diagram in figure 4. First, a random input chosen from a test dataset is provided to the XAI module (30) for the stolen AI model (S) to get a saliency map (SM1). Then the saliency map (SM1) is compared with the random input to identify low importance and high importance features. Then perturbations are added in the low importance features of the random input (while retaining the high importance features) to generate a refined attack vector (AV). This essentially means overlaying the saliency map on the random input to determine an attack pattern. The refined attack vector (AV) is then fed as input to the AI Model (M) to generate a second output by means of the processor (20). Finally, the stolen AI model is updated (from S to S1) by examining the input refined attack vector (AV) and the corresponding second output.
[0029] Multiple iterations of the sub-steps described above are performed for different random inputs chosen from the test dataset on the latest update of the stolen AI model (S1) to get the eventual updated stolen AI model (Sn), for example as shown in Iteration 02 in figure 4. Again, a random input is provided to the XAI module (30) for the stolen AI model (S1) to get a saliency map (SM2). Then the saliency map (SM2) is compared with the random input to identify low importance and high importance features. Then perturbations are added in the low importance features of the random input to generate a refined attack vector (AV2). The refined attack vector (AV2) is then fed as input to the AI Model (M) to generate a second output by means of the processor (20). Finally, the stolen AI model is updated (from S1 to S2) by examining the input refined attack vector (AV2) and the corresponding second output. Multiple such iterations are performed until the stolen AI model (S, S1, …, Sn) stops learning any further, i.e., there are no further updates to the stolen AI model.
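A condensed, non-limiting sketch of this iterative loop is given below; explain, refine_attack_vector, target_predict and partial_fit_stolen are hypothetical placeholders for the XAI module (30), the perturbation sub-step, the AI model (M) and the surrogate update, and the stopping test assumes a scikit-learn-style MLP surrogate exposing coefs_.

```python
# Illustrative only: each iteration explains the current stolen model, refines an
# attack vector, queries the target model and updates the stolen model, stopping
# once the stolen model's parameters no longer change appreciably.
import numpy as np

def extract(stolen, test_inputs, explain, refine_attack_vector,
            target_predict, partial_fit_stolen, tol: float = 1e-4, max_iters: int = 100):
    prev_params = None
    for x in test_inputs[:max_iters]:
        sm = explain(stolen, x)                      # saliency map from the XAI module (30)
        av = refine_attack_vector(x, sm)             # perturb only low-importance features
        y = target_predict(av)                       # output of the AI model (M) for AV
        partial_fit_stolen(stolen, av, y)            # S -> S1 -> ... -> Sn
        params = np.concatenate([w.ravel() for w in stolen.coefs_])
        if prev_params is not None and np.linalg.norm(params - prev_params) < tol:
            break                                    # stolen model has stopped learning
        prev_params = params
    return stolen                                    # eventual updated stolen model Sn
```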
[0030] Method step 204 comprises analyzing (204) responses of the updated stolen AI model (Sn) by means of the processor (20) for random inputs to assess vulnerability of the AI model (M). If the AI model is less vulnerable, it is expected that a component of the AI system (10) recognizes the majority of the attack vectors and that the blocker module (18) of the AI system (10) thereafter blocks such attack vectors or modifies the output. The updated stolen AI model (Sn) would then not have extracted the true architecture/function of the AI model (M), as it received modified responses. Hence, the poorer the response of the updated stolen AI model (Sn), the less vulnerable the AI model (M).
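As a non-limiting illustration, the analysis of step 204 could, under assumed interfaces and an assumed agreement cut-off, be realised as the following sketch; stolen_predict and target_predict are hypothetical placeholders for the updated stolen AI model (Sn) and the AI model (M).

```python
# Illustrative only: the updated stolen model and the target model are queried
# with random inputs; low agreement suggests the AI system blocked or modified
# most attack vectors (less vulnerable), high agreement suggests successful
# extraction (more vulnerable).
import numpy as np

def assess_vulnerability(stolen_predict, target_predict,
                         n_samples: int = 500, dim: int = 16):
    x = np.random.randn(n_samples, dim)              # random probe inputs
    agreement = float(np.mean(stolen_predict(x) == target_predict(x)))
    verdict = "vulnerable" if agreement > 0.8 else "less vulnerable"   # assumed cut-off
    return agreement, verdict
```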
[0031] To assess the vulnerability of the model, it is necessary to understand how easy or difficult it is to extract an AI model (M). Using the method steps (200), our aim is to build the stolen AI Model (Sn) which is structurally and functionally the same as the AI model (M). The method described in this disclosure combines multiple explainable AI techniques, such as saliency maps and perturbation analysis. The proposed concept in this disclosure provides a more comprehensive understanding of a model's behavior and decision-making processes, which can then be leveraged to generate a more effective attack vector. The use of explainable AI in this manner can help improve the security of machine learning models against model extraction attacks. The insights from the extraction can then be used for building defenses to protect AI models.
[0032] Adversarial attacks involve intentionally manipulating inputs to deceive the model or exploit its weaknesses. The proposed XAI method can help in identifying areas where the model is susceptible to such attacks. Consider an image classification system used for security purposes, such as identifying potential threats in airport X-ray scans. The model used in this system is a complex deep learning model that is difficult to interpret due to its black-box nature. However, by applying XAI techniques, an interpretable version of the model can be extracted. With the extracted model, XAI methods like saliency maps, gradient-based methods, or rule-based explanations can be applied to identify the important features or regions in an input image that the model relies on for classification. These explanations provide insights into the decision-making process of the model. With knowledge of adversarial examples, a defense model can be employed to analyze the explanations and detect any abnormal or unexpected behavior. Image/computer vision tasks could include segmentation (object identification in airport X-ray), object detection (security camera with person detection), and image classification (such as in automated optical inspection).
[0033] It must be understood that the embodiments explained in the above detailed description are only illustrative and do not limit the scope of this invention. Any modification of the framework (100) and adaptation of the method of assessing vulnerability of an AI model are envisaged and form a part of this invention. The scope of this invention is limited only by the claims.
Claims: We Claim:
1. A method (200) of assessing vulnerability of an AI model (M), the method comprising:
feeding (201) a first set of pre-determined attack vectors to the AI Model (M) to generate a first output by means of a processor (20);
initializing (202) a stolen AI Model (S) by examining the input predetermined attack vectors and the corresponding first output;
updating (203) the stolen AI model (S,S1…Sn) using an XAI module (30) for the stolen AI model (S,S1…Sn);
analyzing (204) responses of the updated stolen AI model (Sn) by means of the processor (20) for random input to assess vulnerability of the AI model (M).
2. The method (200) of assessing vulnerability of an AI model (M) as claimed in claim 1, wherein the updating (203) the stolen AI model further comprises the sub-steps:
providing a random input chosen from a test dataset to the XAI module (30) for the stolen AI model (S) to get a saliency map (SM);
comparing the saliency map (SM) with the random input to identify low importance and high importance features;
adding perturbations in the low importance features of the random input to generate a refined attack vector (AV);
feeding the refined attack vector (AV) as input to the AI Model (M) to generate a second output by means of the processor (20);
updating the stolen AI model (S to S1) by examining the input refined attack vector (AV) and the corresponding second output.
3. The method (200) of assessing vulnerability of an AI model (M) as claimed in claim 1, wherein multiple iterations of the sub-steps claimed in claim 2 are performed for different random inputs chosen from the test dataset on the latest update of the stolen AI model (S1) to get the eventual updated stolen AI model (Sn).
4. A framework (100) for assessing the vulnerability of an AI Model (M), the framework (100) comprising a stolen AI Model (S), an XAI module (30) in communication with the stolen AI model (S) and at least a processor (20), said processor (20) in communication with the AI model (M), characterized in that:
the processor (20) configured to:
feed a first set of pre-determined attack vectors to the AI Model (M) to generate a first output;
initialize a stolen AI Model (S) by examining the input predetermined attack vectors and the corresponding first output;
update the stolen AI model (S,S1…Sn) using the XAI module (30) for the stolen AI model (S,S1,…Sn);
analyze responses of the updated stolen AI model (Sn) for random inputs to assess vulnerability of the AI model (M).
5. The framework (100) for assessing the vulnerability of an AI Model (M) as claimed in claim 4, wherein the processor (20) is configured to:
provide a random input chosen from a test dataset to the XAI module (30) for the stolen AI model (S) to get a saliency map (SM1);
compare the saliency map (SM1) with the random input to identify low importance and high importance features;
add perturbations in the low importance features of the random input to generate a refined attack vector (AV);
feed the refined attack vector (AV) as input to the AI Model (M) to generate a second output by means of the processor (20);
update the stolen AI model (from S to S1) by examining the input refined attack vector and the corresponding second output.
6. The framework (100) for assessing the vulnerability of an AI Model (M) as claimed in claim 5, wherein the processor (20) performs multiple iterations of the sub-steps claimed in claim 5 for different random inputs chosen from the test dataset on the latest update of the stolen AI model (S1) to get the eventual updated stolen AI model (Sn).