Abstract: TITLE: A system (100) adapted to identify an attack vector from an input and a method (200) thereof. ABSTRACT The present invention proposes a system (100) adapted to identify an attack vector from an input to a classifier AI model (M) and a method (200) thereof. The system (100) comprises an XAI module (30) in communication with a processor (20), said processor (20) in communication with the classifier AI model (M). The XAI module (30) is configured to give a saliency map (InXT) for the input and a set of saliency maps for a plurality of transformed inputs. The processor (20) is configured to compute a cumulative of the inversely transformed saliency maps (InA) and compare the (InA) with the saliency map for the input (InXT). An input is identified as an attack vector based on said comparison. Figure 1.
Description: Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.
Field of the invention
[0001] The present disclosure relates to the field of Artificial Intelligence security. In particular, it proposes a system adapted to identify an attack vector from an input fed to a classifier AI model and a method thereof.
Background of the invention
[0002] With the advent of data science, data processing and decision-making systems are implemented using artificial intelligence modules. The artificial intelligence modules use different techniques like machine learning, neural networks, deep learning etc. Most AI-based systems receive large amounts of data and process the data to train AI models. Trained AI models generate output based on the use cases requested by the user. Typically, AI systems are used in the fields of computer vision, speech recognition, natural language processing, audio recognition, healthcare, autonomous driving, manufacturing, robotics etc., where they process data to generate the required output based on certain rules/intelligence acquired through training.
[0003] To process the inputs and give a desired output, AI systems use various models/algorithms which are trained using the training data. Once an AI system is trained using the training data, it uses the models to analyze real-time data and generate appropriate results. The models may be fine-tuned in real time based on the results. The models in an AI system form the core of the system. A lot of effort, resources (tangible and intangible), and knowledge go into developing these models.
[0004] It is possible that some adversary may try to tamper with, manipulate, or evade the model in an AI system to create incorrect outputs. The adversary may use different techniques to manipulate the output of the model. One of the simplest techniques is where the adversary sends queries to the AI system using his own test data to compute or approximate the gradients through the model. Based on these gradients, the adversary can then manipulate the input in order to manipulate the output of the model. In another technique, the adversary may manipulate the input data to force an artificial output. This causes hardships to the original developer of the AI in the form of business disadvantages, loss of confidential information, loss of lead time spent in development, loss of intellectual property, loss of future revenues etc. Hence there is a need to assess the input that is fed to the AI model.
[0005] Methods of attacking an AI system are known in the prior art. The prior art WO2021/095984 A1 – Apparatus and Method for Retraining Substitute Model for Evasion Attack and Evasion Attack Apparatus – discloses one such method. It describes retraining a substitute model that partially imitates the target model by allowing the target model to misclassify specific attack data.
Brief description of the accompanying drawings
[0006] An embodiment of the invention is described with reference to the following accompanying drawings:
[0007] Figure 1 depicts a system (100) adapted to identify an attack vector from an input fed to a classifier AI model (M);
[0008] Figure 2 illustrates method steps (200) to identify an attack vector from an input fed to a classifier AI model (M);
[0009] Figure 3 is a visual illustration of method steps (200) to identify an attack vector from an input fed to a classifier AI model (M).
Detailed description of the drawings
[0010] It is important to understand some aspects of artificial intelligence (AI) technology and artificial intelligence (AI) based systems, or AI systems. Some important aspects of AI technology and AI systems can be explained as follows. Depending on the architecture of the implementation, AI systems may include many components. One such component is an AI model. A model can be defined as a reference or an inference set of data, which uses different forms of correlation matrices. Using these models and the data from these models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI models such as linear regression, naïve Bayes classifier, support vector machine, neural networks and the like. It must be understood that this disclosure is not specific to the type of model being executed and can be applied to any AI module irrespective of the AI model being executed.
[0011] Some of the typical tasks performed by AI systems are classification, clustering, regression etc. The majority of classification tasks depend upon labeled datasets; that is, the datasets are labelled manually in order for a neural network to learn the correlation between labels and data. This is known as supervised learning. Some typical applications of classification are face recognition, object identification, gesture recognition, voice recognition etc. In a regression task, the model is trained on labeled datasets where the target labels are numeric values. Clustering or grouping is the detection of similarities in the inputs. Clustering techniques do not require labels to detect similarities; learning without labels is called unsupervised learning. Unlabeled data constitutes the majority of data in the world.
[0012] As the AI model forms the core of the AI system, it needs to be protected against attacks. AI adversarial threats can be largely categorized into model extraction attacks, inference attacks, evasion attacks, and data poisoning attacks.
[0013] In model extraction attacks, the attacker gains information about the model internals through analysis of inputs, outputs, and other external information. Stealing such a model reveals important intellectual property of the organization and enables the attacker to craft other adversarial attacks such as evasion attacks. Inference attacks attempt to infer the training data from the corresponding output or other information leaked by the target model. Studies have shown that it is possible to recover training data associated with arbitrary model outputs. The ability to extract this data further poses data privacy issues. In poisoning attacks, the adversary carefully injects crafted data to contaminate the training data, which eventually affects the functionality of the AI system.
[0014] Evasion attacks are the most prevalent kind of attack that may occur during AI system operations. In this method, the attacker works on the AI algorithm's inputs to find small perturbations leading to large modifications of its outputs (e.g., decision errors), which leads to evasion of the AI model. In the context of an image classifier, a small perturbation, for example the addition of whiskers to the image of a dog, may fool the model into misclassifying it as a cat.
[0015] Figure 1 depicts a system (100) to identify an attack vector from an input fed to a classifier AI model (M). The system (100) comprises an XAI module (30) in communication with a processor (20), said processor (20) in communication with the classifier AI model (M).
[0016] The XAI module (30) implements algorithms whose outputs allow humans to understand the reasoning behind decisions or predictions made by the AI. This contrasts with the "black box" concept in machine learning, where even the AI's designers cannot explain why it arrived at a specific decision. The basic goal of XAI is to describe in detail how AI models produce their predictions, since this is helpful for different reasons. The XAI module (30) is configured to give a saliency map (InXT) for the input and a set of saliency maps for a plurality of transformed inputs. Saliency refers to unique features (pixels, resolution etc.) of the image in the context of visual processing. The XAI method can include one or more techniques such as GradCAM, GradCAM++, Guided-BackProp, Integrated Gradients, SHAP or LIME, and the like.
[0017] In the context of the present invention, a saliency map is an image that highlights the region on which the AI model focuses first when giving an output. In an exemplary embodiment of the present invention, the XAI module (30) derives the saliency maps using a Grad-CAM technique. A Grad-CAM heat-map is a weighted combination of feature maps. Grad-CAM converts the gradients at the final convolutional layer into a heat-map that highlights the important regions (at a broad-region level).
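By way of a non-limiting illustration, the following is a minimal sketch of how such a Grad-CAM saliency map could be computed for a convolutional classifier. It assumes PyTorch and torchvision are available; the choice of an untrained ResNet-18, its last residual block as the target layer, and the placeholder input are assumptions made purely for illustration and are not part of the claimed subject matter.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_layer):
    """Return a normalised Grad-CAM heat-map (saliency map) for one input image.

    image: tensor of shape (1, 3, H, W); target_layer: a convolutional block of `model`.
    """
    store = {}

    def fwd_hook(_module, _input, output):
        store["act"] = output                                        # feature maps of target_layer
        output.register_hook(lambda grad: store.update(grad=grad))   # their gradients

    handle = target_layer.register_forward_hook(fwd_hook)
    logits = model(image)
    handle.remove()

    model.zero_grad()
    logits[0, logits.argmax(dim=1).item()].backward()                # gradient of the top-class score

    # Grad-CAM heat-map: weighted combination of feature maps, weights = spatial mean of gradients.
    weights = store["grad"].mean(dim=(2, 3), keepdim=True)           # (1, C, 1, 1)
    cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True)).detach()
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)        # scale to [0, 1]

# Exemplary usage with a placeholder (untrained) network and a random input image.
model = models.resnet18(weights=None).eval()
x = torch.rand(1, 3, 224, 224)
In_XT = grad_cam(model, x, model.layer4[-1])                         # saliency map of the input
```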
[0018] The processor (20) may be implemented as any or a combination of: one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA).
[0019] The processor (20) is configured to: perform a plurality of transformations on the input to get the plurality of transformed inputs; perform an inverse transformation on each of the saliency maps obtained for the plurality of transformed inputs; compute a cumulative of the inversely transformed saliency maps (InA); compare the InA with the saliency map for the input (InXT); and identify the input as an attack vector based on said comparison. The processor (20) performs at least spatial augmentation techniques as part of the plurality of transformations. The processor (20) takes an average of each of the inversely transformed saliency maps when computing the cumulative of the inversely transformed saliency maps (InA). Further, the processor (20) examines whether a distance measure between InA and InXT is above a pre-defined threshold to identify an attack vector.
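Purely as an illustrative sketch of the above processor logic: the helper callables, the mean-absolute-difference distance measure, and the threshold value below are assumptions introduced for illustration, not limitations of the invention.

```python
import numpy as np

def identify_attack_vector(x, explain, transforms, inverses, threshold=0.25):
    """Return True when the input `x` is identified as an attack vector.

    explain(img)      -> 2-D saliency map for `img` (e.g. a Grad-CAM wrapper, XAI module 30)
    transforms[i](x)  -> i-th transformed input (spatial augmentation)
    inverses[i](map)  -> inverse of the i-th transformation, applied to a saliency map
    """
    In_XT = explain(x)                                   # saliency map of the original input

    inverted_maps = []
    for t, t_inv in zip(transforms, inverses):
        saliency_t = explain(t(x))                       # saliency map of the transformed input
        inverted_maps.append(t_inv(saliency_t))          # map it back to the input frame

    In_A = np.mean(inverted_maps, axis=0)                # cumulative (average) saliency map

    # Assumed distance measure: mean absolute difference between the two maps.
    distance = np.mean(np.abs(In_A - In_XT))
    return distance > threshold                          # above threshold => attack vector
```

On a clean input the inversely transformed maps largely coincide with InXT and the distance stays small; for an attack vector the hidden perturbation produces divergent maps and the distance is expected to exceed the threshold.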
[0020] The classifier AI model (M) is configured to classify an input into at least two classes. In a real-world application, it can be employed in a binary classification task for disease diagnosis: given an image, the classifier AI model classifies the input as Pneumonia or No-Pneumonia. The classifier AI model (M) can reside in an AI system (100). The AI system (100) further comprises at least a defense model configured to block a user or modify the output when a batch of input queries or an input query is determined to be an attack vector. This defense model is configured to identify the input fed to the classifier AI model (M) as an attack vector. It is further configured to modify the original output generated by the classifier AI model (M) on identification of an input or a batch of input queries as an attack vector. For simplicity, only components having a bearing on the methodology disclosed in the present invention have been elucidated.
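A minimal sketch of such a defense behaviour is given below; it assumes the `identify_attack_vector` helper from the previous sketch and a hypothetical blocking policy, and is not intended as the definitive defense model.

```python
class DefendedClassifier:
    """Wraps the classifier AI model (M) with the attack-vector check (illustrative only)."""

    def __init__(self, model_predict, detector):
        self.model_predict = model_predict   # callable: input -> class label
        self.detector = detector             # callable: input -> True if attack vector

    def predict(self, x):
        if self.detector(x):
            # Defense action: withhold or modify the genuine output for a suspected attacker.
            return {"status": "blocked", "reason": "input identified as attack vector"}
        return {"status": "ok", "prediction": self.model_predict(x)}
```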
[0021] As used in this application, the terms "component," "system," "module," and "interface" are intended to refer to a computer-related entity or an entity related to, or that is part of, an operational apparatus with one or more specific functionalities, wherein such entities can be either hardware, a combination of hardware and software, software, or software in execution. As yet another example, interface(s) can include input/output (I/O) components as well as associated processor, application, or Application Programming Interface (API) components. A module with reference to this disclosure refers to a logic circuit or a set of software programs that responds to and processes logical instructions to get a meaningful result. The system (100) could be a hardware combination of these modules or could be deployed remotely on a cloud or server.
[0022] It should be understood at the outset that, although exemplary embodiments are illustrated in the figures and described below, the present disclosure should in no way be limited to the exemplary implementations and techniques illustrated in the drawings and described below.
[0023] Figure 2 illustrates the method steps to identify an attack vector from an input fed to a classifier AI model (M). A person skilled in the art would appreciate that the method steps are implemented by the system (100) and its components as explained in accordance with figure 1. For clarity, it is reiterated that the system (100) comprises an XAI module (30) in communication with a processor (20), said processor (20) in communication with the classifier AI model (M). Figure 3 is a visual illustration of the method steps (200) to identify an attack vector from an input fed to a classifier AI model (M).
[0024] Method step 201 comprises obtaining a saliency map (InXT) for the input by means of an XAI module. In an exemplary implementation of the method steps, the saliency maps are derived by the XAI module (30) using a Grad-CAM technique. As seen from figure 3, the Grad-CAM map for a danger sign highlights the exclamation mark in the image, i.e., this exclamation mark is the region on which the AI model focuses the most while giving the output.
[0025] Method step 202 comprises performing a plurality of transformations on the input by means of a processor (20) to get a plurality of transformed inputs. The plurality of transformations comprise at least spatial augmentation techniques. As seen in figure 3, these spatial augmentation techniques include scaling, rotation, skew, flipping and combinations of these techniques, as illustrated in the sketch below.
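For instance, a minimal sketch of such forward transformations paired with the inverses used later in step 204 is given below; the particular angles, the selection of transformations and the use of scipy are assumptions made for illustration only.

```python
import numpy as np
from scipy.ndimage import rotate

# Each entry pairs a forward spatial transformation with its inverse.
transform_pairs = [
    (lambda img: rotate(img, angle=10, reshape=False),              # rotate by +10 degrees
     lambda sal: rotate(sal, angle=-10, reshape=False)),            # undo: rotate by -10 degrees
    (lambda img: np.fliplr(img),                                    # horizontal flip
     lambda sal: np.fliplr(sal)),                                   # flipping is its own inverse
    (lambda img: rotate(np.fliplr(img), angle=15, reshape=False),   # combined flip + rotate
     lambda sal: np.fliplr(rotate(sal, angle=-15, reshape=False))), # inverse applied in reverse order
]

transforms = [fwd for fwd, _ in transform_pairs]   # passed as `transforms` in the earlier sketch
inverses   = [inv for _, inv in transform_pairs]   # passed as `inverses` in the earlier sketch
```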
[0026] Method step 203 comprises obtaining saliency maps for each of the plurality of transformed inputs by means of the XAI module. Method step 204 comprises performing an inverse transformation on each of the saliency maps obtained for the plurality of transformed inputs. The idea here is that performing the inverse transformation should help revert the saliency map back to the saliency map of the original input image, unless there is a hidden perturbation that is trying to evade the classifier AI model (M).
[0027] Method step 205 comprises computing a cumulative of the inversely transformed saliency maps (InA) by means of the processor (20). Computing the cumulative of the inversely transformed saliency maps (InA) comprises taking an average of each of the inversely transformed saliency maps.
[0028] Method step 206 comprises comparing the InA with InXT by means of the processor (20). Method step 207 comprises identifying the input as an attack vector based on said comparison.
[0029] The underlying concept here is that, for a clean input image, the XAI methods show the contribution of the features to the output decision of the AI model. The explanations (contributions of the features) for an image remain consistent irrespective of the augmented nature of the input image. Example: if an input image is rotated by 10°, the corresponding explanation is also rotated by 10°. This does not hold for attack vectors. An attack vector carries a hidden perturbation (say, some features of a cat hidden behind the image of a dog) which, when augmented, is highlighted by the XAI. Hence, under augmentations, the explanations for an attack vector will vary vis-à-vis those for the clean input.
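Tying the earlier sketches together, an exemplary and purely hypothetical end-to-end usage could look as follows; the adapter function, the placeholder input and the untrained network from the earlier sketch are assumptions, so the printed outcome is not meaningful in itself and depends entirely on the actual model, data and threshold.

```python
import numpy as np
import torch

def explain(img_np):
    """Adapter: (H, W, 3) numpy image -> (H, W) numpy saliency map via the Grad-CAM sketch."""
    tensor = torch.from_numpy(np.ascontiguousarray(img_np)).permute(2, 0, 1)
    return grad_cam(model, tensor.unsqueeze(0).float(), model.layer4[-1]).numpy()

candidate = np.random.rand(224, 224, 3).astype(np.float32)      # placeholder input image
flagged = identify_attack_vector(candidate, explain, transforms, inverses)
print("identified as attack vector" if flagged else "treated as clean input")
```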
[0030] In a real-world application, the classifier AI model (M) could be a classification model for road signs in a traffic driving scenario. The classifier AI model (M) could classify the given input as one of different classes such as Stop, Yield, Speed Limit etc., which are typical in a road scene. An attacker could try to extract the classifier AI model (M) using advanced attack vectors that could be created through a number of techniques. The method described in the present invention is performed while the model is in deployment, where the input samples (In) passed through the model are passed through this technique as well. When the attacker uses an attack vector, the present invention detects it. Based on the detection, appropriate actions can be taken to prevent further attacks or deterioration of the system. The remediations could include blocking the user, operating the system with reduced privileges or reduced functionality, gradual decay of system features (such as a limp-home mode), passing a modified output, or operating in back-up mode.
[0031] It must be understood that the invention in particular discloses methodology used for identifying an attack vector from an input fed to a classifier AI model (M). While these methodologies describe only a series of steps to accomplish the objectives, these methodologies are implemented in the system (100), which may be a combination of hardware or software or a combination thereof wherein the components of the system (100) may be altered according to requirement.
[0032] It must be understood that the embodiments explained in the above detailed description are only illustrative and do not limit the scope of this invention. Any adaptation and modification of the system (100) or the method to identify an attack vector from an input fed to a classifier AI model (M) is envisaged and forms a part of this invention. The scope of this invention is limited only by the claims.
Claims: We Claim:
1. A system (100) adapted to identify an attack vector from an input fed to a classifier AI model, the system (100) comprising an XAI module (30) in communication with a processor (20), said processor (20) in communication with the classifier AI model (M), characterized in that the system (100):
the XAI module (30) configured to give a saliency map (InXT) for the input and a set of saliency maps for a plurality of transformed inputs;
the processor (20) configured to:
perform a plurality of transformations on the input by means of a processor (20) to get the plurality of transformed inputs;
perform an inverse transformation on each of the saliency maps obtained for the plurality of transformed inputs;
compute a cumulative of the inversely transformed saliency maps (InA) by means of the processor (20);
compare the InA with InXT by means of the processor (20);
identify the input as attack vector based on said comparison.
2. The system (100) adapted to identify an attack vector as claimed in claim 1, wherein the saliency maps are derived by the XAI module (30) using a Grad-Cam technique.
3. The system (100) adapted to identify an attack vector as claimed in claim 1, wherein the processor (20) performs at least spatial augmentation techniques as part of the plurality of transformations.
4. The system (100) adapted to identify an attack vector as claimed in claim 1, wherein the processor (20) takes an average of each of the inversely transformed saliency maps when computing the cumulative of the inversely transformed saliency maps (InA).
5. The system (100) adapted to identify an attack vector as claimed in claim 1, wherein the processor (20) examines if the distance measure between InA and InXT is above a pre-defined threshold to identify an attack vector.
6. A method (200) to identify an attack vector from an input fed to a classifier AI model, the method comprising:
obtaining (201) saliency map (InXT) for the input by means of an XAI module;
performing (202) a plurality of transformations on the input by means of a processor (20) to get a plurality of transformed inputs;
obtaining (203) saliency maps for each of the plurality of transformed inputs by means of the XAI module;
performing (204) an inverse transformation on each of the saliency maps obtained for the plurality of transformed inputs;
computing (205) a cumulative of the inversely transformed saliency maps (InA) by means of the processor (20);
comparing (206) the InA with InXT by means of the processor (20);
identifying (207) the input as attack vector based on said comparison.
7. The method (200) to identify an attack vector as claimed in claim 6, wherein the saliency maps are derived by the XAI module (30) using a Grad-Cam technique.
8. The method (200) to identify an attack vector as claimed in claim 6, wherein the plurality of transformations comprise at least spatial augmentation techniques.
9. The method (200) to identify an attack vector as claimed in claim 6, wherein computing the cumulative of the inversely transformed saliency maps (InA) comprises taking an average of each of the inversely transformed saliency maps.
10. The method (200) to identify an attack vector as claimed in claim 6, wherein identifying an attack vector comprises examining if the distance measure between InA and InXT based on the comparison is above a pre-defined threshold.