
A Method of Assessing Vulnerability of an AI Model and a Framework Thereof

Abstract: This invention discloses a framework (100) for assessing vulnerability of an AI model (M) and a method (200) thereof. The framework (100) comprises a surrogate AI model (S), an XAI module (30) and at least a processor (20). The processor (20) generates a feature ranking list based on application of the XAI module (30) on an output generated by the surrogate AI model (S). The processor (20) then determines an optimal noise ("e"), using the method steps (300), needed to generate a manipulated input based on the feature ranking list. Finally, the optimal noise ("e") is analyzed to assess the vulnerability of the AI model (M). Figure 1.


Patent Information

Application #
202341051282
Filing Date
31 July 2023
Publication Number
28/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

Bosch Global Software Technologies Private Limited
123, Industrial Layout, Hosur Road, Koramangala, Bangalore – 560095, Karnataka, India
Robert Bosch GmbH
Postfach 30 02 20, D-70442, Stuttgart, Germany

Inventors

1. Manojkumar Somabhai Parmar
#202, Nisarg, Apartment, Nr - . L G Corner, Maninagar, Ahmedabad, Gujarat 380008, India
2. Yuvaraj Govindarajulu
#816, 16th A Main, 23rd B Cross, Sector-3, HSR Layout, Bengaluru, Karnataka 560102, India
3. Pavan Kulkarni
#33, "KALPAVRUKSHA", 2nd cross, Shreya Estate, Gokul road, Hubli - 580030, Dharwad, Karnataka, India
4. Vidit Khazanchi
Flat 1D,Tribhuvan Apartment, 137 Shyama Prasad Mukherjee Road Kolkata – 700026, West Bengal, India

Specification

Description:
Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed

Field of the invention
[0001] The present disclosure relates to the field of Artificial Intelligence security. In particular, it proposes a method of assessing vulnerability of an AI Model and a framework thereof.

Background of the invention
[0002] With the advent of data science, data processing and decision-making systems are implemented using artificial intelligence modules. The artificial intelligence modules use different techniques like machine learning, neural networks, deep learning etc. Most of the AI-based systems receive large amounts of data and process the data to train AI models. Trained AI models generate output based on the use cases requested by the user. Typically, the AI systems are used in the fields of computer vision, speech recognition, natural language processing, audio recognition, healthcare, autonomous driving, manufacturing, robotics etc., where they process data to generate the required output based on certain rules/intelligence acquired through training.

[0003] To process the inputs and give a desired output, the AI systems use various models/algorithms which are trained using the training data. Once the AI system is trained using the training data, the AI systems use the models to analyze real-time data and generate an appropriate result. The models may be fine-tuned in real time based on the results. The AI models in the AI systems form the core of the system. A lot of effort, resources (tangible and intangible), and knowledge go into developing these models.

[0004] It is possible that some adversary may try to tamper with/manipulate/evade the AI model to create incorrect outputs. The adversary may use different techniques to manipulate the output of the model. One of the simplest techniques is one in which the adversary sends queries to the AI system using his own test data to compute or approximate the gradients through the model. Based on these gradients, the adversary can then manipulate the input to manipulate the output of the model. In another technique, the adversary may manipulate the input data to produce an artificial output. This causes hardships to the original developer of the AI in the form of business disadvantages, loss of confidential information, loss of lead time spent in development, loss of intellectual property, loss of future revenues etc. Hence, there is a need to identify such manipulated samples in the test data that try to fool the AI model. It is also imperative that we assess the vulnerability of the AI model in respect of these manipulated samples.

[0005] Methods of attacking an AI system are known in the prior art. The prior art WO 2021/095984 A1, Apparatus and Method for Retraining Substitute Model for Evasion Attack and Evasion Attack Apparatus, discloses one such method. It discloses retraining a substitute model that partially imitates the target model by causing the target model to misclassify specific attack data. However, in a classifier-type AI model there is a need to identify adversarial inputs or attack vectors spread across all classes and to test the vulnerability of the AI model against them.

Brief description of the accompanying drawings
[0006] An embodiment of the invention is described with reference to the following accompanying drawings:
[0007] Figure 1 depicts a framework (100) for assessing vulnerability of an AI model (M);
[0008] Figure 2 depicts an AI system (10);
[0009] Figure 3 illustrates method steps (200) of assessing vulnerability of the AI model (M);
[0010] Figure 4 illustrates the method steps (300) of determining optimal noise (“e”).

Detailed description of the drawings
[0011] It is important to understand some aspects of artificial intelligence (AI) technology and AI-based systems, or AI systems. Some important aspects of the AI technology and AI systems can be explained as follows. Depending on the architecture of the implementation, AI systems may include many components. One such component is an AI model (M). A model can be defined as a reference or inference set of data which uses different forms of correlation matrices. Using these models and the data from these models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI models (M) such as linear regression, naïve Bayes classifier, support vector machine, neural networks and the like. It must be understood that this disclosure is not specific to the type of model being executed and can be applied to any AI module irrespective of the AI model (M) being executed. A person skilled in the art will also appreciate that the AI model (M) may be implemented as a set of software instructions, a combination of software and hardware, or any combination of the same.

[0012] Some of the typical tasks performed by AI systems are classification, clustering, regression etc. The majority of classification tasks depend upon labelled datasets; that is, the datasets are labelled manually in order for a neural network to learn the correlation between labels and data. This is known as supervised learning. Some of the typical applications of classification are face recognition, object identification, gesture recognition, voice recognition etc. In a regression task, the model is trained based on labelled datasets where the target labels are numeric values. Some of the typical applications of regression are weather forecasting, stock price prediction, house price estimation, energy consumption forecasting etc. Clustering or grouping is the detection of similarities in the inputs. Clustering techniques do not require labels to detect similarities.

[0013] As the AI module forms the core of the AI system, the module needs to be protected against attacks. AI adversarial threats can be largely categorized into model extraction attacks, inference attacks, evasion attacks, and data poisoning attacks. In poisoning attacks, the adversary carefully injects crafted data to contaminate the training data, which eventually affects the functionality of the AI system. Inference attacks attempt to infer the training data from the corresponding output or other information leaked by the target model. Studies have shown that it is possible to recover training data associated with arbitrary model output. The ability to extract this data further poses data privacy issues.

[0014] In model extraction attacks (MEA), the attacker gains information about the model internals through analysis of the input, output, and other external information. Stealing such a model reveals the important intellectual property of the organization and enables the attacker to craft other adversarial attacks such as evasion attacks. This attack is initiated through an attack vector. In computing technology, a vector may be defined as a method by which malicious code/virus data propagates itself, such as to infect a computer, a computer system, or a computer network. Similarly, an attack vector is defined as a path or means by which a hacker can gain access to a computer or a network to deliver a payload or a malicious outcome. A model stealing attack uses a kind of attack vector that can make a digital twin/replica/copy of an AI module.

[0015] Evasion attacks are the most prevalent kind of attack that may occur during AI system operation. In this method, the attacker works on the AI algorithm's inputs to find small perturbations leading to large modifications of its outputs (e.g., decision errors), which leads to evasion of the AI model (M). The attacker typically generates random queries of the size and shape of the input specifications and starts querying the model with these arbitrary queries. The attacker chooses a relevant dataset at his disposal to evade or fool the model more efficiently. Our aim through this disclosure is to identify queries that are successful in fooling the AI model (M). Once these model evasion queries/attack vectors in the dataset are identified, we identify the optimal noise or epsilon that, when added to any input, makes a successful evasion attack. By pinpointing where the AI model (M) is susceptible to evasion, we can make informed modifications to the model's architecture or learning algorithm.
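For illustration only, the arbitrary querying behaviour described above can be pictured with a short, non-limiting Python sketch. The function name random_queries, the value range and the fixed seed are assumptions made for the sketch and are not part of the disclosed method.

```python
# Toy illustration of arbitrary query generation matching the input specification.
# The uniform range [low, high) and the seed are assumptions for this sketch.
import numpy as np

def random_queries(n_queries, input_shape, low=0.0, high=1.0, seed=0):
    """Generate n_queries arbitrary inputs of the given shape, as an attacker might."""
    rng = np.random.default_rng(seed)
    return rng.uniform(low, high, size=(n_queries, *input_shape))
```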

[0016] Figure 1 depicts a framework (100) for assessing vulnerability of an AI model (M). The framework (100) comprises a surrogate AI model (S), an XAI module (30) and at least a processor (20).

[0017] The AI model (M) is fed with a first set of pre-determined attack vectors to generate a first output by means of the processor (20). The AI model (M) can be a standalone component or part of an AI system. Figure 2 depicts such an AI system. In an exemplary embodiment of the present system, the AI model (M) is part of an AI system comprising other components and modules. The AI system additionally comprises an input interface (11), an output interface (22), a submodule (14) and at least a blocker module (18). The submodule (14) is trained using various techniques to identify an attack vector in the input. The blocker module (18) is configured to block a user or modify the output when an input query is determined to be an attack vector. The blocker module (18) is configured to at least restrict a user of the AI system in dependence on the assessment. It is further configured to modify the original output generated by the AI model (M) on identification of an input or a batch of input queries as an attack vector.
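As a non-limiting illustration of how the submodule (14) and blocker module (18) could wrap the AI model (M), the following minimal Python sketch is given. The names AISystem, is_attack_vector, model and user_id are hypothetical placeholders introduced only for the sketch.

```python
# Illustrative sketch of an AI system wrapper with a detection submodule (14)
# and a blocker module (18); all names are hypothetical placeholders.
class AISystem:
    def __init__(self, model, is_attack_vector):
        self.model = model                        # the protected AI model (M)
        self.is_attack_vector = is_attack_vector  # submodule (14): flags suspicious inputs
        self.blocked_users = set()                # users restricted by the blocker module (18)

    def query(self, user_id, x):
        # Blocker module (18): restrict previously flagged users.
        if user_id in self.blocked_users:
            return None
        # Submodule (14): screen the incoming query for attack-vector patterns.
        if self.is_attack_vector(x):
            self.blocked_users.add(user_id)
            return None                           # or return a modified output instead
        return self.model.predict([x])[0]
```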

[0018] The surrogate AI model (S) is a reverse-engineered replica of the AI model (M), initialized by examining the input pre-determined attack vectors and the corresponding first output. Based on the correlation established between the output and the corresponding input (pre-determined attack vector), the architecture/internal working of the AI model (M) is guessed. This is deemed the surrogate AI model (S).
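A minimal sketch of this initialization is given below, assuming a tabular classifier: the target model (M) is queried with the pre-determined attack vectors and a replica is fitted on the resulting (input, label) pairs. The helper name init_surrogate, the callable target_predict and the choice of MLPClassifier are assumptions; any sufficiently expressive learner could play the role of the surrogate (S).

```python
# Minimal, non-limiting sketch: initialize the surrogate AI model (S) from the
# target model's responses to the pre-determined attack vectors.
import numpy as np
from sklearn.neural_network import MLPClassifier

def init_surrogate(target_predict, attack_vectors):
    """target_predict maps an (n, d) array to class labels (the first output)."""
    X = np.asarray(attack_vectors)
    y = target_predict(X)                                   # first output of the AI model (M)
    surrogate = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000)
    surrogate.fit(X, y)                                     # the surrogate AI model (S)
    return surrogate
```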

[0019] The XAI module (30) implements algorithms whose outputs help humans understand the reasoning behind decisions or predictions made by the AI. It contrasts with the "black box" concept in machine learning, where even the AI's designers cannot explain why it arrived at a specific decision. The basic goal of XAI is to describe in detail how AI models (M) produce their predictions, since this is of much help for different reasons.

[0020] In the context of the present invention, the XAI module (30) implements a SHAP (SHapley Additive exPlanations) algorithm. It assigns values to features in a prediction, revealing their contributions to the AI model (M)'s output. By assigning values to features, SHAP identifies the most influential features in AI predictions, highlighting their impact on the output. Hence, here the XAI module (30) is applied on an output generated by the surrogate AI model (S) to generate a feature ranking list.
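One possible way to obtain such a feature ranking list is sketched below using the shap library's model-agnostic KernelExplainer applied to the surrogate (S). The disclosure does not fix a particular explainer or aggregation, so the helper feature_ranking, its background parameter and the absolute-value aggregation are assumptions for illustration only.

```python
# Non-limiting sketch: rank features by mean absolute SHAP value computed on the
# surrogate AI model (S). Return shapes differ slightly across shap versions, so
# the feature axis is located by its length before aggregating.
import numpy as np
import shap

def feature_ranking(surrogate, background, random_input):
    """background: array of reference samples (e.g., the attack vectors)."""
    explainer = shap.KernelExplainer(surrogate.predict_proba, background)
    sv = np.asarray(explainer.shap_values(np.asarray([random_input])))
    feat_axis = [ax for ax, n in enumerate(sv.shape) if n == len(random_input)][0]
    other_axes = tuple(ax for ax in range(sv.ndim) if ax != feat_axis)
    importance = np.abs(sv).sum(axis=other_axes)   # influence score per feature
    return list(np.argsort(importance)[::-1])      # feature indices, most influential first
```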

[0021] Generally, the processor (20) may be implemented as any or a combination of one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor (20), firmware, an application-specific integrated circuit (ASIC), and/or a field-programmable gate array (FPGA). The processor (20) is configured to exchange and manage the processing of information between the components of the framework (100) such as the AI model (M), the surrogate AI model (S), and the XAI module (30).

[0022] The processor (20) is configured to: feed a first set of pre-determined attack vectors to the AI model (M) to generate a first output; initialize the surrogate AI model (S) by examining the input pre-determined attack vectors and the corresponding first output; feed a random input to the surrogate AI model (S); generate a feature ranking list based on the application of the XAI module (30) on an output generated by the surrogate AI model (S); determine an optimal noise ("e") needed to generate a manipulated input based on the feature ranking list; and analyze the optimal noise ("e") to assess the vulnerability of the AI model (M).
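For illustration, one possible ordering of these configured steps is sketched below. It reuses the hypothetical helpers sketched in this description (init_surrogate and feature_ranking above, find_optimal_noise in the sketch accompanying Figure 4) and is not a definitive implementation of the framework (100).

```python
# Non-limiting orchestration sketch of the processor (20)'s configured steps.
def assess_vulnerability(target_predict, attack_vectors, background, random_input,
                         first_class, second_class):
    surrogate = init_surrogate(target_predict, attack_vectors)      # feed attack vectors, initialize (S)
    ranking = feature_ranking(surrogate, background, random_input)  # XAI feature ranking list
    alterations = find_optimal_noise(target_predict, random_input, ranking,
                                     first_class, second_class)     # optimal noise ("e") search
    # Analysis: the smaller the successful noise, the more vulnerable the model (M).
    return min((e for _, e in alterations), default=None)
```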

[0023] The manipulated input induces the AI model (M) to misclassify the input. The processor (20) determines the optimal noise ("e") using the following method steps: add a pre-determined noise (e0) to a combination of one or more features in the ranking list for a random input belonging to the first class to generate a perturbed input; feed the perturbed input to the AI model (M) to get a second output; change the value of the pre-determined noise (e0 to e1) and/or the combination of one or more features in the ranking list in dependence on the second output; and ascertain the value of the pre-determined noise (en) as the optimal noise when the second output belongs to the second class.

[0024] As used in this application, the terms "component," "system," "module," and "interface" are intended to refer to a computer-related entity or an entity related to, or that is part of, an operational apparatus with one or more specific functionalities, wherein such entities can be either hardware, a combination of hardware and software, software, or software in execution. As yet another example, interface(s) can include input/output (I/O) components as well as associated processor (20), application, or Application Programming Interface (API) components. The AI system could be a hardware combination of these modules or could be deployed remotely on a cloud or server. Similarly, the framework (100) could be a hardware or software combination of these modules or could be deployed remotely on a cloud or server.

[0025] It should be understood at the outset that, although exemplary embodiments are illustrated in the figures and described below, the present disclosure should in no way be limited to the exemplary implementations and techniques illustrated in the drawings and described below.

[0026] Figure 3 illustrates method steps of assessing vulnerability of an AI model (M). The framework (100) used to assess vulnerability of the AI model (M) has been explained in accordance with figure 1 and figure 2. For the purposes of clarity, it is reiterated that the framework (100) comprises a surrogate AI model (S), an XAI module (30) and at least a processor (20).

[0027] Method step 201 comprises feeding a first set of pre-determined attack vectors to the AI model (M) to generate a first output by means of the processor (20). Method step 202 comprises initializing a surrogate AI model (S) by examining the input pre-determined attack vectors and the corresponding first output. The surrogate AI model (S) is a reverse-engineered replica of the AI model (M). Based on the correlation established between the output and the corresponding input (pre-determined attack vector), the internal working of the AI model (M) is guessed, which is deemed the surrogate AI model (S).

[0028] Method step 203 comprises feeding a random input to the surrogate AI model (S). Method step 204 comprises applying the XAI module (30) on an output generated by the surrogate AI model (S) to generate a feature ranking list. The XAI algorithm used here is the SHAP (SHapley Additive exPlanations) algorithm, which assigns values to features in a prediction, revealing their contributions to the AI model (M)'s output. By assigning values to features, SHAP identifies the most influential features in AI predictions, highlighting their impact on the output. Based on the outcome of SHAP we get the feature ranking list, i.e. which features contribute the most to the output of the AI model (M).

[0029] Method step 205 comprises determining an optimal noise ("e") needed to generate a manipulated input based on the feature ranking list. The manipulated input induces the AI model (M) to misclassify the input. For example, the AI model (M) is configured to classify inputs into two classes, cats and dogs. A manipulated input is such that, although it is a dog, it induces the AI model (M) to classify it as a cat, and vice versa.

[0030] Figure 4 illustrates the method steps of determining the optimal noise ("e"). In method step 301, the processor (20) adds a pre-determined noise (e0) to a combination of one or more features in the ranking list for a random input belonging to the first class to generate a perturbed input. Usually, the top features in the ranking list are perturbed first. In method step 302, the processor (20) feeds the perturbed input to the AI model (M) to get a second output. The output is analyzed, i.e. whether or not the manipulated input based on (e0) was successful in fooling the model into misclassifying.

[0031] If the AI model (M) does not misclassify, the processor (20) changes (method step 303) the value of the pre-determined noise (e0 to e1) and the combination of one or more features in the ranking list. In an alternate method of determining the optimal noise ("e"), the processor (20) changes the value of the pre-determined noise (e0 to e1) or the combination of one or more features in the ranking list. The alterations are made to the features in a specific order, and the size of the alteration is determined by the surrogate model's understanding of how the AI model (M) will react to changes in the feature. If a successful alteration is found, the size of the alteration and the number of features that had to be altered are recorded. The process returns a list of all successful alterations, each being a pair of the number of features altered and the size of the alteration. This efficient use of binary search allows the smallest alterations that can cause the target model to make a mistake to be found quickly, saving time and computational resources. In method step 304, the processor (20) ascertains the value of the pre-determined noise (en) as the optimal noise when the second output belongs to the second class.
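A minimal sketch of method steps 301 to 304 for a tabular, numeric input is given below. It perturbs the top-k ranked features and binary-searches the noise magnitude until the AI model (M) misclassifies the input into the second class. The helper name find_optimal_noise, the additive perturbation, the bound e_max and the tolerance are assumptions; the exact search schedule used by the framework (100) is one plausible reading of this description, not a definitive implementation.

```python
# Non-limiting sketch of determining the optimal noise ("e") per Figure 4.
import numpy as np

def find_optimal_noise(target_predict, x, ranking, first_class, second_class,
                       e_max=1.0, tol=1e-3):
    """x is a random input belonging to the first class.
    Returns (number of features altered, smallest flipping noise) pairs."""
    x = np.asarray(x, dtype=float)

    def flips(features, e):
        perturbed = x.copy()
        perturbed[features] += e                                 # step 301: perturb ranked features
        second_output = target_predict(perturbed[None, :])[0]    # step 302: query the AI model (M)
        return second_output == second_class                     # misclassified into the second class?

    successes = []
    for k in range(1, len(ranking) + 1):
        features = list(ranking[:k])              # combination of one or more features
        if not flips(features, e_max):
            continue                              # even the largest tried noise fails; alter more features
        lo, hi = 0.0, e_max
        while hi - lo > tol:                      # step 303: adjust the noise value by binary search
            mid = (lo + hi) / 2.0
            if flips(features, mid):
                hi = mid
            else:
                lo = mid
        successes.append((k, hi))                 # step 304: ascertain the optimal noise for k features
    return successes
```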

[0032] Method step 206 comprises analyzing the optimal noise ("e") to assess the vulnerability of the AI model (M). This technique calculates the average min-max epsilon for multiple inputs, representing the smallest change needed to cause the AI model (M) to misclassify a sample. This information, averaged over all samples, can be utilized for various purposes, including plotting and analysis. A significant application of these average optimal epsilon values is in the testing phase. By comparing these values for various subsets of input, one can determine which subset is more robust and resistant to the attacking technique. The subset with the higher average optimal epsilon is considered more resilient towards evasion attacks, as it necessitates larger alterations to the samples to induce misclassification. Thus, this method not only illuminates the vulnerabilities of the AI model (M) but also provides a quantitative measure of its robustness.
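The averaging described in this step can be sketched as follows, assuming the hypothetical find_optimal_noise helper given above. The helper name average_optimal_epsilon and the use of the minimum successful epsilon per sample are assumptions for illustration.

```python
# Non-limiting sketch of step 206: average the smallest successful epsilon over
# many first-class inputs; a larger average means a more evasion-resistant subset.
import numpy as np

def average_optimal_epsilon(target_predict, inputs, ranking, first_class, second_class):
    eps = []
    for x in inputs:
        alterations = find_optimal_noise(target_predict, x, ranking,
                                         first_class, second_class)
        if alterations:
            eps.append(min(e for _, e in alterations))   # smallest noise that fooled the model
    return float(np.mean(eps)) if eps else float("inf")

# Example comparison of two input subsets (higher value = more robust subset):
#   robustness_a = average_optimal_epsilon(target_predict, subset_a, ranking, 0, 1)
#   robustness_b = average_optimal_epsilon(target_predict, subset_b, ranking, 0, 1)
```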

[0033] Consider a scenario of an image classification model used by self-driving cars to recognize traffic signs. An attacker could use an adversarial attack with an optimal epsilon to slightly perturb an image of a "Stop" sign. The perturbation is almost imperceptible to humans but may cause the model to misclassify it as a "Speed Limit" sign, leading to potentially dangerous consequences. Another real-time situation could be a medical diagnosis system, where a machine learning model is used to diagnose medical conditions based on patient data. An adversary could introduce subtle changes to a patient's medical records with an epsilon. The perturbations may not be noticeable to doctors or patients but could cause the model to make incorrect diagnoses, potentially leading to improper treatment. Hence, there is a need for the present invention in the real world to identify this optimal epsilon for AI models deployed in critical situations.

[0034] It must be understood that the embodiments explained in the above detailed description are only illustrative and do not limit the scope of this invention. Any modification of the framework (100) and adaptation of the method (200) of assessing vulnerability of an AI model (M) are envisaged and form a part of this invention. The scope of this invention is limited only by the claims.

Claims:

We Claim:

1. A method (200) of assessing vulnerability of an AI model (M), said AI model (M) configured to classify an input into at least two classes, a first class and a second class, the method comprising:
feeding a first set of pre-determined attack vectors to the AI model (M) to generate a first output by means of a processor (20);
initializing a surrogate AI model (S) by examining the input predetermined attack vectors and the corresponding first output;
feeding a random input to the surrogate AI model (S);
applying an XAI module (30) on an output generated by the surrogate AI model (S) to generate a feature ranking list;
determining an optimal noise (“e”) needed to generate a manipulated input based on the feature ranking list by means of the processor (20);
analyzing the optimal noise (“e”) by means of the processor (20) to assess the vulnerability of the AI model (M).

2. The method (200) of assessing vulnerability of an AI model (M) as claimed in claim 1, wherein the manipulated input induces the AI model (M) to misclassify the input.

3. The method (200) of assessing vulnerability of an AI model (M) as claimed in claim 1, wherein the determining an optimal noise (“e”) (205) further comprises:
adding (301) a pre-determined noise (e0) to a combination of one or more features in the ranking list for a random input belonging to the first class to generate a perturbed input;
feeding (302) the perturbed input to the AI model (M) to get a second output;
changing (303) the value of the pre-determined noise (e0 to e1) and the combination of one or more features in the ranking list in dependence on the second output;
ascertaining (304) the value of the pre-determined noise (en) as the optimal noise when the second output belongs to the second class.

4. The method (200) of assessing vulnerability of an AI model (M) as claimed in claim 1, wherein the determining an optimal noise (“e”) further comprises:
adding (301) a pre-determined noise (e0) to a combination of one or more features in the ranking list for a random input belonging to the first class to generate a perturbed input;
feeding (302) the perturbed input to the AI model (M) to get a second output;
changing (303) the value of the pre-determined noise (e0 to e1) or the combination of one or more features in the ranking list in dependence on the second output;
ascertaining (304) the value of the pre-determined noise (en) as the optimal noise when the second output belongs to the second class.

5. A framework (100) for assessing the vulnerability of an AI model (M), said AI model (M) configured to classify an input into at least two classes, a first class and a second class, the framework comprising a surrogate AI model (S), an XAI module (30) in communication with the surrogate AI model (S) and at least a processor (20), said processor (20) in communication with the AI model (M), characterized in that:
the processor (20) is configured to:
feed a first set of pre-determined attack vectors to the AI model (M) to generate a first output;
initialize a surrogate AI model (S) by examining the input predetermined attack vectors and the corresponding first output;
feed a random input to the surrogate AI model (S);
generate a feature ranking list based on the application of XAI module (30) on an output generated by the surrogate AI model (S);
determine an optimal noise (“e”) needed to generate a manipulated input based on the feature ranking list;
analyze the optimal noise (“e”) to assess the vulnerability of the AI model (M).

6. The framework (100) for assessing the vulnerability of an AI model (M) as claimed in claim 5, wherein the manipulated input induces the AI model (M) to misclassify the input.

7. The framework (100) for assessing the vulnerability of an AI model (M) as claimed in claim 5, wherein the processor (20) determines an optimal noise (“e”) using method steps:
add a pre-determined noise (e0) to a combination of one or more features in the ranking list for a random input belonging to the first class to generate a perturbed input;
feed the perturbed input to the AI model (M) to get a second output;
change the value of the pre-determined noise (e0 to e1) and the combination of one or more features in the ranking list in dependence on the second output;
ascertain the value of the pre-determined noise (en) as the optimal noise when the second output belongs to the second class.

8. The framework (100) for assessing the vulnerability of an AI model (M) as claimed in claim 5, wherein the processor (20) determines an optimal noise (“e”) using method steps:
add a pre-determined noise (e0) to a combination of one or more features in the ranking list for a random input belonging to the first class to generate a perturbed input;
feed the perturbed input to the AI model (M) to get a second output;
change the value of the pre-determined noise (e0 to e1) or the combination of one or more features in the ranking list in dependence on the second output;
ascertain the value of the pre-determined noise (en) as the optimal noise when the second output belongs to the second class.

Documents

Application Documents

# Name Date
1 202341051282-POWER OF AUTHORITY [31-07-2023(online)].pdf 2023-07-31
2 202341051282-FORM 1 [31-07-2023(online)].pdf 2023-07-31
3 202341051282-DRAWINGS [31-07-2023(online)].pdf 2023-07-31
4 202341051282-DECLARATION OF INVENTORSHIP (FORM 5) [31-07-2023(online)].pdf 2023-07-31
5 202341051282-COMPLETE SPECIFICATION [31-07-2023(online)].pdf 2023-07-31
6 202341051282-Power of Attorney [10-05-2024(online)].pdf 2024-05-10
7 202341051282-Covering Letter [10-05-2024(online)].pdf 2024-05-10