Abstract: TITLE: A method (200) and framework (10) of identifying a vulnerable AI Model in a repository (16). Abstract The present disclosure proposes a method of identifying a vulnerable AI model in a repository (16) and a framework (10) thereof. The method steps (200) include comparing the checksum of at least one searched AI model with the checksum of the reference AI model by means of a processor (14). This is followed by a comparison of the internal architecture of the searched AI model with that of the reference AI model. The searched AI model is identified as vulnerable in dependence on the said comparison, and a vulnerability assessment is performed or recommended to the user. Figure 3.
Description: Complete Specification:
The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.
Field of the invention
[0001] The present disclosure relates to the field of artificial intelligence (AI) and AI security along with cybersecurity. In particular, the present invention discloses a method and a framework for identifying a vulnerable AI Model in a repository.
Background of the invention
[0002] With the advent of data science, data processing and decision-making systems are implemented using artificial intelligence modules. The artificial intelligence modules use different techniques like machine learning, neural networks, deep learning, etc. Most AI-based systems receive large amounts of data and process the data to train AI models. Trained AI models generate output based on the use cases requested by the user. Typically, AI systems are used in the fields of computer vision, speech recognition, natural language processing, audio recognition, healthcare, autonomous driving, manufacturing, robotics, etc., where they process data to generate the required output based on certain rules/intelligence acquired through training.
[0003] To process the inputs and give a desired output, the AI systems use various models/algorithms which are trained using the training data. Once the AI system is trained using the training data, it uses the models to analyze real-time data and generate an appropriate result. The models may be fine-tuned in real time based on the results. The models in the AI systems form the core of the system. A great deal of effort, resources (tangible and intangible), and knowledge goes into developing these models.
[0004] Companies working on AI development have a tough time identifying models in their repositories and then configuring and triggering API calls for vulnerability analysis of the models and artefacts. Repositories contain trained models, Python scripts, package managers and other key artefacts necessary for model development and deployment. Traditional file repository scanning tools do not specialize in checking AI models and their internal contents. Additionally, identification of trivial and substantial changes within the AI models is key in deciding between a complete repository scan and scans of individual models. There is a need for a method to identify or discover AI models in the repository that require scanning afresh.
Brief description of the accompanying drawings
[0005] An embodiment of the invention is described with reference to the following accompanying drawings:
[0006] Figure 1 depicts a framework (10) for identifying a vulnerable AI Model;
[0007] Figure 2 illustrates method steps for identifying a vulnerable AI Model in a repository (16);
[0008] Figure 3 is a process flow diagram for the method step 200.
Detailed description of the drawings
[0009] Figure 1 depicts a framework (10) for identifying a vulnerable AI Model. The framework (10) comprises a repository (16) in communication with an input/output interface (12) via a processor (14). The repository (16) refers to the centralized digital storage that developers use to make and manage changes to AI model(s). The repository (16) comprises said AI model(s), associated artefacts, model files, configuration files and like collections of objects. For example, the repository contains "n" AI models, each having a different number of layers (say n1, n2 and the like). These AI models may be different versions of one original AI model M that has been altered differently by different developers.
[0010] An AI Model is a program or algorithm, which may be embedded in specialized hardware, that utilizes a set of datasets to recognize certain patterns. Using these models and the data from these models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI models such as linear regression, naïve Bayes classifier, support vector machine, neural networks and the like. It must be understood that this disclosure is not specific to the type of model being executed in the AI module and can be applied to any AI module irrespective of the AI model being executed.
[0011] A person skilled in the art will also appreciate that the AI module may be implemented as a set of software instructions, a combination of software and hardware, or any combination of the same. For example, a neural network may be embodied within an electronic chip. Such neural network chips are specialized silicon chips, which incorporate AI technology and are used for machine learning. Other AI models such as linear regression, naïve Bayes classifier and support vector machine can be a set of software instructions residing in a cloud.
[0012] The processor (14) is configured to exchange and manage the processing of information between the components of the framework (10) such as the repository (16) and input-output interface (12). The processor (14) may be implemented as any or a combination of one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA).
[0013] In particular, the processor (14) is configured to: search for AI model files in the repository (16); designate a previously scanned AI model as the reference AI model; compare the checksum of at least one searched AI model with the checksum of the reference AI model; execute the searched AI model if the checksum of the searched AI model is different from that of the reference AI model; compare the internal architecture of the searched AI model with that of the reference AI model in dependence on the successful execution of the searched AI model; and identify the searched AI model as vulnerable in dependence on the said comparison.
[0014] Further, the processor (14) compares the internal architecture by comparing the number of layers and the dimension of each layer of the searched AI model with those of the reference AI model. The processor (14) performs a vulnerability analysis on the searched AI model if the number of layers and the dimensions of the searched AI model differ from those of the reference AI model beyond a pre-determined threshold. Furthermore, the processor (14) further compares the internal architecture by: comparing the weights and attributes of each layer of the searched AI model with those of the reference AI model; and calculating a distance between the weights using mean square error. The processor (14) performs a vulnerability analysis on the searched AI model if the calculated distance is beyond an acceptable limit.
[0015] The input-output interface (12) is a combination of both software and hardware that receives input from at least one user and displays the corresponding output. The input-output interface (12) displays the list of AI models identified as vulnerable. It further displays the results of the vulnerability assessment performed on each of the AI model(s).
[0016] As used in this application, the terms "component," "framework (10)," "module," and "interface" are intended to refer to a computer-related entity or an entity related to, or that is part of, an operational apparatus with one or more specific functionalities, wherein such entities can be either hardware, a combination of hardware and software, software, or software in execution. As yet another example, interface(s) can include input/output (I/O) components as well as associated processor, application, or Application Programming Interface (API) components. The framework (10) could be a hardware combination of these modules or could be deployed remotely on a cloud or server.
[0017] It should be understood at the outset that, although exemplary embodiments are illustrated in the figures and described below, the present disclosure should in no way be limited to the exemplary implementations and techniques illustrated in the drawings and described below.
[0018] Figure 2 illustrates method steps for identifying a vulnerable AI Model in a repository (16). The framework (10) on the repository (16) that is used to implement these method steps has been explained in accordance with figure 1. For the purposes of clarity, it is reiterated that the framework (10) comprises the repository (16) that is in communication with an input/output interface (12) via a processor (14).
[0019] Method step 201 comprises searching for AI model files in the repository (16) by means of the processor (14). In this method step, all files, artefacts and collections of like objects in the repository (16) are scanned based on their file extension (for example .pdf, .doc, etc.). Files that are actually AI models are listed out separately.
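By way of a non-limiting illustration, the following is a minimal sketch in Python of such a repository scan. The set of model file extensions and the helper name are assumptions made for illustration only and do not limit the disclosure.

```python
from pathlib import Path

# Assumed, non-limiting set of file extensions commonly used for serialized AI models.
MODEL_EXTENSIONS = {".pt", ".pth", ".h5", ".onnx", ".pkl", ".joblib"}

def search_model_files(repo_path):
    """Method step 201 (sketch): walk the repository and list candidate AI model files."""
    model_files = []
    for path in Path(repo_path).rglob("*"):           # scan all files and artefacts
        if path.is_file() and path.suffix.lower() in MODEL_EXTENSIONS:
            model_files.append(path)                   # files that appear to be AI models
    return model_files
```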
[0020] Method step 202 comprises designating a previously scanned AI model as the reference AI model. The reference AI model refers to the predecessor of the searched AI model, against which the searched AI model will be compared. The reference model is the previously saved version (a snapshot in time) of the model in the repository. Typically, a developer would work on this reference AI model for improvements and updates and save the next version to the repository. In the absence of a reference AI model, the searched model is identified as vulnerable and processed appropriately.
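A minimal sketch of designating the reference AI model is given below, assuming a simple registry that maps a model file name to the path of its previously scanned version; the registry structure is an illustrative assumption and not part of the claimed method.

```python
def designate_reference(searched_path, scan_registry):
    """Method step 202 (sketch): look up the previously scanned version of the model.

    scan_registry is an assumed dict mapping a model file name to the path of the
    last scanned (reference) version. Returning None signals that no reference
    exists, in which case the searched model is treated as vulnerable.
    """
    return scan_registry.get(searched_path.name)
```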
[0021] Method step 203 comprises comparing the checksum of at least one searched AI model with the checksum of the reference AI model by means of the processor (14). A checksum is a unique sequence of numbers and letters assigned to every version of every file that is uploaded to the repository (16). By checking the checksum, we essentially check whether the searched AI model is the same as the original/reference AI model or not. Every time the AI model is altered, a new checksum is created for each version.
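The checksum comparison may, for example, be realized with a cryptographic hash; the sketch below uses SHA-256 as an assumed, non-limiting choice of algorithm.

```python
import hashlib

def file_checksum(path, algorithm="sha256"):
    """Compute a checksum of a model file in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checksums_differ(searched_path, reference_path):
    """Method step 203 (sketch): True when the searched model is not byte-identical to the reference."""
    return file_checksum(searched_path) != file_checksum(reference_path)
```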
[0022] Method step 204 comprises executing the searched AI model if the checksum of the searched AI model is different from that of the reference AI model. This is essentially to ascertain the legitimacy of the searched AI model, i.e., to validate that the searched AI model is not corrupt.
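The sanity check of method step 204 may be sketched as attempting to load the searched model with a format-specific loader; the loader callable is an assumption for illustration (for example, torch.load for PyTorch checkpoints or onnx.load for ONNX graphs).

```python
def sanity_check(model_path, loader):
    """Method step 204 (sketch): execute/load the searched model to rule out corruption.

    `loader` is an assumed callable appropriate to the model format.
    """
    try:
        return loader(model_path)      # successful execution: proceed to method step 205
    except Exception:
        return None                    # execution failed: the model is treated as corrupt
```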
[0023] Method step 205 comprises comparing the internal architecture of the searched AI model with that of the reference AI model, in dependence on the successful execution of the searched AI model, by means of the processor (14). Comparing the internal architecture follows a top-down approach, i.e., the overall architecture is compared first, followed by a comparison within each level of the AI model. Method step 206 comprises identifying the searched AI model as vulnerable in dependence on the said comparison by means of the processor (14).
[0024] The first level of comparison comprises comparing the number of layers and the dimension of each layer of the searched AI model with those of the reference AI model. A vulnerability analysis is performed on the searched AI model, or recommended to the user via the input-output interface (12), if the number of layers and the dimensions of the searched AI model differ from those of the reference AI model beyond a pre-determined threshold.
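A minimal sketch of this first-level comparison follows; the layer lists are assumed to be sequences of layer shape tuples extracted from the respective models, and the threshold value is an illustrative placeholder for the pre-determined threshold.

```python
def architectures_differ(searched_shapes, reference_shapes, layer_threshold=0):
    """First level of comparison (sketch): number of layers and dimension of each layer."""
    if abs(len(searched_shapes) - len(reference_shapes)) > layer_threshold:
        return True                                    # layer count differs beyond the threshold
    for searched, reference in zip(searched_shapes, reference_shapes):
        if searched != reference:                      # dimension of a corresponding layer differs
            return True
    return False
```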
[0025] The second level of comparison of the internal architecture comprises comparing the weights and attributes of each layer of the searched AI model with those of the reference AI model and calculating a distance between the weights using mean square error. A vulnerability analysis is performed on the searched AI model, or recommended to the user via the input-output interface (12), if the calculated distance is beyond an acceptable limit.
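The second-level comparison may be sketched as follows, assuming the weights of corresponding layers are available as NumPy arrays of equal shape; the acceptable limit shown is an illustrative value only.

```python
import numpy as np

def weight_distance(searched_weights, reference_weights):
    """Second level of comparison (sketch): mean square error between corresponding layer weights."""
    errors = [np.mean((s - r) ** 2) for s, r in zip(searched_weights, reference_weights)]
    return float(np.mean(errors)) if errors else 0.0

def exceeds_acceptable_limit(searched_weights, reference_weights, acceptable_limit=1e-3):
    """Recommend a vulnerability analysis when the calculated distance is beyond the limit."""
    return weight_distance(searched_weights, reference_weights) > acceptable_limit
```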
[0026] Figure 3 is a process flow diagram for the method steps (200). The model input, or Mi, is the searched AI model. If the checksum of Mi matches that of the reference AI model Mref, we exit the process, i.e., there has been no change in the reference AI model or the searched AI model is the same as the reference AI model; hence, no vulnerability assessment is warranted. If the checksum differs, we proceed with the sanity check, that is, method step 204. If the execution fails, it means the searched AI model is corrupt; hence, no vulnerability assessment is warranted. On successful execution, we proceed to method step 205.
[0027] First, the overall AI model architecture is compared as mentioned in para [0024]. If no substantial change is detected, it means the searched AI model has not been altered significantly; hence, no vulnerability assessment is warranted. If a substantial change is detected, the internals of the model architecture are compared as mentioned in para [0025]. Finally, based on this comparison, we know whether or not a vulnerability assessment is warranted.
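The overall flow of Figure 3 can be summarized by the following sketch, which reuses the helper functions sketched above; the loader, extract_shapes and extract_weights callables are assumed, format-specific helpers introduced only for illustration.

```python
def scan_model(searched_path, reference_path, loader, extract_shapes, extract_weights):
    """Illustrative end-to-end flow of method steps 203-206 for one searched model Mi."""
    if not checksums_differ(searched_path, reference_path):
        return "no assessment: searched model is identical to the reference"    # step 203 exit

    searched = sanity_check(searched_path, loader)                              # step 204
    if searched is None:
        return "no assessment: searched model is corrupt"

    reference = loader(reference_path)
    if not architectures_differ(extract_shapes(searched), extract_shapes(reference)):    # step 205, level 1
        return "no assessment: no substantial change detected"

    if exceeds_acceptable_limit(extract_weights(searched), extract_weights(reference)):  # step 205, level 2
        return "vulnerable: trigger vulnerability assessment"                   # step 206
    return "no assessment: change within acceptable limit"
```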
[0028] A person skilled in the art will appreciate that while these method steps describe only one series of steps to accomplish the objectives, these methodologies may be implemented with modifications to the method steps and customization of the framework (10).
[0029] The proposed method and framework (10) of identifying a vulnerable AI Model in a repository (16) can automatically identify models, their formats, and metadata within an organization's complex collection of AI models in the repository (16). It can further help to automatically trigger an API call when a new model is discovered or when an existing model undergoes a significant change. Additionally, it provides comprehensive reporting detailing an overview of all models present in the repository (16). This method will streamline and automate the vulnerability scanning process, which will make further adversarial vulnerability assessment of the AI models smooth.
[0030] It must be understood that the embodiments explained in the above detailed description are only illustrative and do not limit the scope of this invention. Any modifications to the method and framework (10) of identifying a vulnerable AI Model in a repository (16) are envisaged and form a part of this invention. The scope of this invention is limited only by the claims.
Claims: We Claim:
1. A method (200) of identifying a vulnerable AI Model in a repository (16), said AI model stored in the repository (16) amongst other artefacts and collections of objects, the repository (16) in communication with a processor (14), the method comprising: searching (201) for AI model files in the repository (16) by means of the processor (14), characterized in that the method comprises:
designating (202) a previously scanned AI model as the reference AI model;
comparing (203) the checksum of at least one searched AI model with the checksum of the reference AI model by means of the processor (14);
executing (204) the searched AI model if the checksum of the searched AI model is different from the reference AI model;
comparing (205) the internal architecture of the searched AI model with the reference AI model in dependence on the successful execution of the searched AI model by means of the processor (14);
identifying (206) the searched AI model as vulnerable in dependence on the said comparison by means of the processor (14).
2. The method (200) of identifying a vulnerable AI Model in a repository (16) as claimed in claim 1, wherein comparing (205) the internal architecture comprises comparing the number of layers and dimension of each layer of the searched AI model with the reference AI model.
3. The method (200) of identifying a vulnerable AI Model in a repository (16) as claimed in claim 1, wherein a vulnerability analysis is performed on the searched AI model if the number of layers and dimension of the searched AI model differs from the reference AI model beyond a pre-determined threshold.
4. The method (200) of identifying a vulnerable AI Model in a repository (16) as claimed in claim 1, wherein comparing (205) the internal architecture further comprises:
comparing the weights and attributes of each layer of the searched AI model with the reference AI model;
calculating a distance between the weights using mean square error.
5. The method (200) of identifying a vulnerable AI Model in a repository (16) as claimed in claim 1, wherein a vulnerability analysis is performed on the searched AI model if the calculated distance is beyond an acceptable limit.
6. A framework (10) for identifying a vulnerable AI Model in a repository (16), the framework (10) comprising a repository (16) in communication with at least a processor (14), the framework (10) characterized by the processor (14) configured to:
search for AI model files in the repository (16);
designate a previously scanned AI model as the reference AI model;
compare the checksum of at least one searched AI model with the checksum of the reference AI model;
execute the searched AI model if the checksum of the searched AI model is different from the reference AI model;
compare the internal architecture of the searched AI model with the reference AI model in dependence on the successful execution of the searched AI model;
identify the searched AI model as vulnerable in dependence on the said comparison.
7. The framework (10) for identifying a vulnerable AI Model in a repository (16) as claimed in claim 6, wherein the processor (14) compares the internal architecture by comparing the number of layers and dimension of each layer of the searched AI model with the reference AI model.
8. The framework (10) for identifying a vulnerable AI Model in a repository (16) as claimed in claim 6, wherein the processor (14) performs a vulnerability analysis on the searched AI model if the number of layers and dimension of the searched AI model differs from the reference AI model beyond a pre-determined threshold.
9. The framework (10) for identifying a vulnerable AI Model in a repository (16) as claimed in claim 6, wherein the processor (14) further compares the internal architecture by:
comparing the weights and attributes of each layer of the searched AI model with the reference AI model;
calculating a distance between the weights using mean square error.
10. The framework (10) for identifying a vulnerable AI Model in a repository (16) as claimed in claim 6, wherein the processor (14) performs a vulnerability analysis on the searched AI model if the calculated distance is beyond an acceptable limit.