Abstract: TITLE: A processor (11) adapted to assess the vulnerability of an AI system (10) and a method (200) thereof. ABSTRACT The present invention proposes a processor (11) adapted to assess the vulnerability of an AI system (10) and a method (200) thereof. The AI system (10) comprises an AI model (M) that processes images as input queries and at least a defense model (16). The processor (11) is configured to segregate the values of pixels in the input image into three R, G, B channels, calculate a perturbation for each channel, and calculate an average of the perturbations. The averaged perturbation is added to the input image to obtain a manipulated image, which is fed as input to the AI model (M). The response of the defense model (16) to said manipulated image is analyzed to assess the vulnerability of the AI system (10). Figure 1.
Description:
Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed
Field of the invention
[0001] The present disclosure relates to the field of Artificial Intelligence security. In particular, it proposes a method of assessing vulnerability of an AI system and a processor thereof.
Background of the invention
[0002] With the advent of data science, data processing and decision-making systems are increasingly implemented using artificial intelligence modules. The artificial intelligence modules use different techniques like machine learning, neural networks, deep learning etc. Most AI based systems receive large amounts of data and process the data to train AI models. Trained AI models generate output based on the use cases requested by the user. Typically, AI systems are used in the fields of computer vision, speech recognition, natural language processing, audio recognition, healthcare, autonomous driving, manufacturing, robotics etc., where they process data to generate the required output based on certain rules/intelligence acquired through training.
[0003] To process the inputs and give a desired output, the AI systems use various models/algorithms which are trained using training data. Once trained, the AI systems use the models to analyze real-time data and generate appropriate results. The models may be fine-tuned in real time based on the results. The models in the AI systems form the core of the system. A great deal of effort, resources (tangible and intangible), and knowledge goes into developing these models.
[0004] It is possible that an adversary may try to tamper with, manipulate, or evade the model in an AI system to create incorrect outputs. The adversary may use different techniques to manipulate the output of the model. One of the simplest techniques is for the adversary to send queries to the AI system using his own test data to compute or approximate the gradients through the model. Based on these gradients, the adversary can then manipulate the input in order to manipulate the output of the model. In another technique, the adversary may manipulate the input data to induce an incorrect output. This causes hardships to the original developer of the AI in the form of business disadvantages, loss of confidential information, loss of lead time spent in development, loss of intellectual property, loss of future revenues etc. Hence there is a need to identify samples in the test data, or generate samples, that can efficiently extract internal information about the working of the models, to assess the vulnerability of the AI system against queries based on such samples, and thereby to identify such manipulations.
[0005] Methods of attacking an AI system are known in the prior art. The prior art WO 2021/095984 A1, "Apparatus and Method for Retraining Substitute Model for Evasion Attack and Evasion Attack Apparatus", discloses one such method. It discloses retraining a substitute model that partially imitates the target model by causing the target model to misclassify specific attack data.
Brief description of the accompanying drawings
[0006] An embodiment of the invention is described with reference to the following accompanying drawings:
[0007] Figure 1 depicts a framework for assessing vulnerability of an AI system (10);
[0008] Figure 2 illustrates method steps (200) of assessing vulnerability of an AI system (10);
[0009] Figure 3 illustrates an example of an evasion attack on an AI model (M).
Detailed description of the drawings
[0010] It is important to understand some aspects of artificial intelligence (AI) technology and artificial intelligence (AI) based systems, or artificial intelligence (AI) systems. Some important aspects of the AI technology and AI systems can be explained as follows. Depending on the architecture of the implementation, AI systems may include many components. One such component is an AI module. An AI module, with reference to this disclosure, can be explained as a component which runs a model. A model can be defined as a reference or an inference set of data, which uses different forms of correlation matrices. Using these models and the data from these models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI models such as linear regression, naïve Bayes classifiers, support vector machines, neural networks and the like. It must be understood that this disclosure is not specific to the type of model being executed in the AI module and can be applied to any AI module irrespective of the AI model being executed. A person skilled in the art will also appreciate that the AI module may be implemented as a set of software instructions, a combination of software and hardware, or any combination of the same.
[0011] Some of the typical tasks performed by AI systems are classification, clustering, regression etc. The majority of classification tasks depend upon labeled datasets; that is, the datasets are labeled manually in order for a neural network to learn the correlation between labels and data. This is known as supervised learning. Some of the typical applications of classification are face recognition, object identification, gesture recognition, voice recognition etc. In a regression task, the model is trained on labeled datasets where the target labels are numeric values. Some of the typical applications of regression are weather forecasting, stock price prediction, house price estimation, energy consumption forecasting etc. Clustering or grouping is the detection of similarities in the inputs. Clustering techniques do not require labels to detect similarities; learning without labels is called unsupervised learning. The majority of data in the world is unlabeled.
[0012] As the AI model forms the core of the AI system, it needs to be protected against attacks. AI adversarial threats can be largely categorized into model extraction attacks, inference attacks, evasion attacks, and data poisoning attacks.
[0013] In model extraction attacks, the attacker gains information about the model internals through analysis of input, output, and other external information. Stealing such a model reveals important intellectual property of the organization and enables the attacker to craft other adversarial attacks such as evasion attacks. Inference attacks attempt to infer the training data from the corresponding output or other information leaked by the target model. Studies have shown that it is possible to recover training data associated with arbitrary model output; the ability to extract this data further poses data privacy issues. In poisoning attacks, the adversary carefully injects crafted data to contaminate the training data, which eventually affects the functionality of the AI system.
[0014] Evasion attacks are the most prevalent kind of attack that may occur during AI system operation. In this method, the attacker works on the AI algorithm's inputs to find small perturbations leading to large modifications of its outputs (e.g., decision errors), which lead to evasion of the AI model.
[0015] Figure 1 depicts a framework for assessing vulnerability of an AI system (10). The framework comprises the AI system (10) in communication with a processor (11). Generally, the processor (11) may be implemented as any or a combination of: one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA).
[0016] The AI system (10) is configured to process images as input queries by means of an AI model (M) and give an output. A monochrome image has equal values in the three channels, i.e. R, G, B. For computer vision tasks, the input attack vectors are expected to be visually imperceptible. Hence it is an objective of the presently claimed invention to create visually imperceptible attack vectors for monochrome images and to test the vulnerability of the AI system (10) against them.
[0017] The AI system (10) comprises the AI model (M) and at least a defense model (16), amongst other components known to a person skilled in the art such as an input interface, an output interface (22) and the like. For simplicity, only components having a bearing on the methodology disclosed in the present invention have been elucidated.
[0018] As used in this application, the terms "component," "model," "module," and "interface" are intended to refer to a computer-related entity or an entity related to, or that is part of, an operational apparatus with one or more specific functionalities, wherein such entities can be either hardware, a combination of hardware and software, software, or software in execution. As yet another example, interface(s) can include input/output (I/O) components as well as associated processor, application, or Application Programming Interface (API) components. These various modules can either be software embedded in a single chip or a combination of software and hardware where each module and its functionality is executed by a separate independent chip, the chips being connected to each other to function as a system. The AI model (M), when implemented as a neural network, could be embedded on a separate neural network chip.
[0019] The defense model (16) is configured to identify an attack vector. It can be designed or built in multiple ways to achieve the ultimate functionality of identifying an attack vector from amongst the input queries. It can further be configured to block a user or modify the output when a batch of input queries is determined to be an attack vector. It is further configured to modify the original output generated by the AI model (M) on identification of a batch of input queries as an attack vector.
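The specification does not prescribe any particular implementation of the defense model (16). The following is a purely illustrative, minimal sketch of how a detector-style defense could gate the output of the AI model; the names model, defense and the placeholder REJECT are assumptions introduced here for illustration only and are not part of the claimed invention.

```python
import torch

REJECT = None  # hypothetical placeholder returned when an attack vector is detected


def guarded_inference(model, defense, x):
    """Return the model's prediction only if the defense model does not flag the input.

    Assumptions: `defense(x)` returns True when it deems `x` an attack vector,
    and `model(x)` returns class logits of shape (batch, num_classes).
    """
    if defense(x):
        # Block the query or modify the output instead of returning the true prediction.
        return REJECT
    with torch.no_grad():
        return model(x).argmax(dim=1)
```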
[0020] The processor (11) is configured to: segregate the values of pixels into three R, G, B channels; calculate a perturbation for each of the R, G, B channels for an input image, using gradient methods or approximated methods; calculate an average of the perturbations over the R, G, B channels; add the averaged perturbation to the input image to get a manipulated image; feed the manipulated image as input to the AI model (M); and record the behavior of the AI system (10) to assess its vulnerability. The processor (11) analyzes the response of the defense model (16) to said manipulated image to assess the vulnerability of the AI system (10), and is further configured to update the defense model (16) based on that response. The functionality of the processor (11) is further elaborated in accordance with the method steps (200).
[0021] It should be understood at the outset that, although exemplary embodiments are illustrated in the figures and described below, the present disclosure should in no way be limited to the exemplary implementations and techniques illustrated in the drawings and described below.
[0022] Figure 2 illustrates the method steps (200) of assessing vulnerability of an AI system (10). The AI system (10) and the processor (11) used to assess the vulnerability of the AI system (10) have been explained with reference to Figure 1.
[0023] Method step 201 comprises segregating the values of pixels into three R, G, B channels by means of the processor (11). An RGB image, sometimes referred to as a true color image, is stored as an m-by-n-by-3 data array that defines red, green, and blue color components for each individual pixel. For a monochrome or grayscale image, R = G = B.
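By way of illustration only, and assuming the image is held as an m-by-n-by-3 NumPy array (a representation not mandated by the specification; the function names are hypothetical), step 201 may be sketched as follows.

```python
import numpy as np


def segregate_channels(image: np.ndarray):
    """Split an m-by-n-by-3 RGB array into its R, G and B channel planes."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    return r, g, b


def is_monochrome(image: np.ndarray) -> bool:
    """A grayscale image stored in RGB form has identical channel values (R = G = B)."""
    r, g, b = segregate_channels(image)
    return np.array_equal(r, g) and np.array_equal(g, b)
```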
[0024] Method step 202 comprises calculating a perturbation for each of the R, G, B channels for an input image by means of the processor (11). The calculation of the perturbation uses gradient methods (FGSM, PGD, BIM, AutoPGD) or approximated methods such as decision boundary, ZOO or similar. The perturbation is calculated such that a small perturbation has maximum impact on the output. The intended impact on the output is to evade the model through misclassification (for classification) or incorrect detection (in object detection or segmentation tasks). The amount of perturbation along each of the three axes can be visualized as a sphere in three dimensions, where the radius of the sphere represents the maximum amount of perturbation. The perturbation is crafted such that it is a point inside the defined sphere (the boundary condition for maximum perturbation), so as to ensure that the attack is visually imperceptible.
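As a non-authoritative sketch of one of the gradient methods named above (FGSM), and assuming a differentiable PyTorch classifier model, an input tensor x of shape (1, 3, H, W) with values in [0, 1], and its integer class label tensor y, a per-channel perturbation bounded by epsilon could be computed as below. Note that FGSM bounds each element of the perturbation individually (an L-infinity bound) rather than by the spherical bound described above; the choice of bound and of epsilon are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def per_channel_perturbation(model, x, y, epsilon=0.03):
    """FGSM-style perturbation: one bounded value per pixel and per R, G, B channel."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)  # loss the attacker wants to increase
    loss.backward()
    # The signed gradient scaled by epsilon keeps every element within the chosen bound,
    # so the change to the image remains small.
    return epsilon * x.grad.sign()
```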
[0025] Method step 203 comprises calculating an average of the perturbations over the R, G, B channels by means of the processor (11). Method step 204 comprises adding the averaged perturbation to the input image to get a manipulated image. Figure 3 is a visual explanation of the aforementioned method steps. This manipulated image is an attack vector containing the monochrome perturbation. Since the perturbation is monochrome, it is imperceptible to the naked eye, especially in monochrome images.
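A minimal sketch of steps 203 and 204, continuing the assumptions of the previous sketch (delta is the per-channel perturbation of shape (1, 3, H, W) and x the clean input in [0, 1]), could look as follows.

```python
import torch


def monochrome_attack(x: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    """Average the perturbation over the R, G, B channels and add it to the image."""
    # Averaging over the channel dimension yields a single plane, which is then
    # broadcast back to all three channels: the added perturbation is monochrome.
    delta_mono = delta.mean(dim=1, keepdim=True).expand_as(delta)
    # Clip back to the valid pixel range so the result is still a well-formed image.
    return torch.clamp(x + delta_mono, 0.0, 1.0)
```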
[0026] Method step 205 comprises feeding the manipulated image as input to the AI model (M). Method step 206 comprises recording the behavior of the AI system (10) to assess its vulnerability. Recording the behavior comprises analyzing the response of the defense model (16) to said manipulated image to assess the vulnerability of the AI system (10). The processor (11) is configured to update the defense model (16) based on the response of the defense model (16) to said manipulated image.
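Under the same assumptions as the earlier sketches (model, defense, clean input x, manipulated input x_adv, label y), steps 205 and 206 could be recorded as follows. The dictionary keys and the notion of defense(x_adv) returning a Boolean flag are illustrative assumptions, not prescribed by the specification.

```python
import torch


def assess_vulnerability(model, defense, x, x_adv, y):
    """Feed the manipulated image to the model and record how the system behaves."""
    with torch.no_grad():
        clean_pred = model(x).argmax(dim=1)
        adv_pred = model(x_adv).argmax(dim=1)
    record = {
        "clean_correct": bool((clean_pred == y).all()),   # sanity check on the clean image
        "evaded": bool((adv_pred != y).any()),            # did the model misclassify?
        "flagged_by_defense": bool(defense(x_adv)),       # did the defense model react?
    }
    # The AI system is treated as vulnerable when the manipulated image both evades
    # the model and is not flagged by the defense model.
    record["vulnerable"] = record["evaded"] and not record["flagged_by_defense"]
    return record
```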
[0027] Figure 3 illustrates an example of an evasion attack on an AI model (M). The example illustrates how an attacker can try to manipulate the output of an AI model (M) that is trained to classify medical diagnostic images. Since X-ray images are monochrome, the real-world effect of such imperceptible perturbations could have life-threatening implications. For example, as shown in the figure, imperceptible perturbations added to a chest X-ray can fool the model into misclassifying the X-ray image as pneumonia. Hence the defense model (16) of the AI system (10) must be trained to identify and block such manipulated images with imperceptible perturbations as attack vectors. A defense model trained on grayscale attack vectors is also robust against 3-channel attack vectors, thereby protecting the AI system against a larger variety of attack vectors.
[0028] It must be understood that the invention in particular discloses the methodology used for assessing vulnerability of an AI system (10). The embodiments explained in the above detailed description are only illustrative and do not limit the scope of this invention. Any modification of the processor (11) and any adaptation of the method for assessing vulnerability of an AI system (10) are envisaged and form a part of this invention. The scope of this invention is limited only by the claims.
Claims:
We Claim:
1. A processor (11) adapted to assess vulnerability of an AI system (10), the AI system (10) comprising an AI model (M) and at least a defense model (16), said AI model (M) configured to classify an image, the processor (11) adapted to segregate the values of pixels in the image into three R, G, B channels; the processor (11) characterized in that it is configured to:
calculate a perturbation for each of the R, G, B channels for an input image;
calculate an average of the perturbations of the R, G, B channels;
add the averaged perturbation to the input image to get a manipulated image;
feed the manipulated image as input to the AI model (M);
record the behavior of the AI system (10) to assess the vulnerability of the AI system (10).
2. The processor (11) adapted to assess vulnerability of an AI system (10) as claimed in claim 1, wherein the processor (11) is configured to calculate the perturbation using gradient methods or approximated methods.
3. The processor (11) adapted to assess vulnerability of an AI system (10) as claimed in claim 1, wherein the processor (11) analyzes the response of the defense model (16) to said manipulated image to assess the vulnerability of the AI system (10).
4. The processor (11) adapted to assess vulnerability of an AI system (10) as claimed in claim 1, wherein the processor (11) is configured to update the defense model (16) based on the response of the defense model (16) to said manipulated image.
5. A method (200) to assess vulnerability of an AI system (10), said AI system (10) being configured to classify an RGB image, the AI system (10) comprising an AI model (M) and at least a defense model (16), said AI model (M) being configured to classify the image, a processor (11) being in communication with the AI system (10), the method comprising:
segregating (201) the values of pixels in the image into three R, G, B channels by means of the processor (11);
calculating (202) a perturbation for each of the R, G, B channels for an input image by means of the processor (11);
calculating (203) an average of the perturbations of the R, G, B channels by means of the processor (11);
adding (204) the averaged perturbation to the input image to get a manipulated image;
feeding (205) the manipulated image as input to the AI model (M);
recording (206) the behavior of the AI system (10) to assess the vulnerability of the AI system (10).
6. The method (200) to assess vulnerability of an AI system (10) as claimed in claim 5, wherein calculating the perturbation uses gradient methods or approximated methods.
7. The method (200) to assess vulnerability of an AI system (10) as claimed in claim 5, wherein recording the behavior comprises analyzing the response of the defense model (16) to said manipulated image to assess the vulnerability of the AI system (10).
8. The method (200) to assess vulnerability of an AI system (10) as claimed in claim 5, wherein the processor (11) is configured to update the defense model (16) based on the response of the defense model (16) to said manipulated image.