
Method And System For Generating A Data Model For Adversarial Sample Detection

Abstract: State-of-the-art mechanisms for adversarial attack detection require complex re-training procedures and parameter tuning. Existing techniques like adversarial training and other adversarial detection methods require a large number of adversarial samples to train or re-train the models to achieve a required level of robustness. This process could degrade the original performance of the models. The disclosure herein generally relates to adversarial sample detection, and, more particularly, to a method and system for generating a data model for adversarial sample detection. The system allows training of the data model using varying strengths of different adversarial attacks on a generated training data, such that a plurality of optimal detector parameters generated by virtue of the training can be used for the adversarial sample detection. [To be published with FIG. 3]


Patent Information

Application #
Filing Date
28 February 2022
Publication Number
35/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. BODEMPUDI, Vineetha
Tata Consultancy Services Limited, GS252, Deccan Park, Plot No 1, Survey No. 64/2, Software Units Layout , Serilingampally Mandal, Madhapur, Hyderabad 500081, Telangana, India
2. CHALAMALA, Srinivasa Rao
Tata Consultancy Services Limited, GS255, Deccan Park, Plot No 1, Survey No. 64/2, Software Units Layout , Serilingampally Mandal, Madhapur, Hyderabad 500081, Telangana, India
3. SINGH, Ajeet Kumar
Tata Consultancy Services Limited, Tata Research Development & Design Centre, Cubicle 271, 54-B, Hadapsar Industrial Estate, Hadapsar, Pune 411013, Maharashtra, India
4. PATHIVADA, Kanaka Mahalakshmi
Tata Consultancy Services Limited, 3N2-07, Deccan Park, Plot No 1, Survey No. 64/2, Software Units Layout , Serilingampally Mandal, Madhapur, Hyderabad 500081, Telangana, India
5. VANGAVOLU, Kumaramangalam
Tata Consultancy Services Limited, 3N2-04, Deccan Park, Plot No 1, Survey No. 64/2, Software Units Layout , Serilingampally Mandal, Madhapur, Hyderabad 500081, Telangana, India

Specification

Claims:

We Claim:
1. A processor implemented method (200) of generating a data model for adversarial sample detection, comprising:
generating (202) a training data, via one or more hardware processors, wherein generating the training data comprising:
fetching (302) a first clean sample set (S1), a second clean sample set (S2), and a third clean sample set (S3), wherein each of the S1, the S2, and the S3 comprises of a plurality of clean data samples;
generating (304) a plurality of adversarial data samples, wherein the plurality of adversarial data samples are generated by applying a plurality of adversarial attacks on the clean data samples in the S3; and
generating (306) a plurality of support data sets from the clean data samples in the S1, by applying the plurality of adversarial attacks on the clean data samples in the S1;
training (204) the data model using the generated training data, via the one or more hardware processors, comprising:
initializing the plurality of adversarial attacks with an assigned value of split percentage for each of the plurality of adversarial attacks such that summation of value of the split percentage of the plurality of adversarial attacks is equal to 1;
determining an optimum value of the split percentage for each of the plurality of adversarial attacks;
splitting a training dataset into a plurality of splits, based on the optimum value of the split percentage;
updating the plurality of adversarial samples by applying each of the plurality of the adversarial attacks on the plurality of splits;
generating an adversarial class using the updated plurality of adversarial samples;
generating the data model using the generated adversarial class and the clean data samples in the second clean sample set; and
training the generated data model using the generated training data to obtain a plurality of optimal detector parameters, wherein the plurality of optimal detector parameters are used for the adversarial sample detection.

2. The method as claimed in claim 1, wherein the composition of the adversarial class is fixed to satisfy a saturation point, wherein the saturation point is satisfied in terms of (a) a measured difference between ADR of each of the test attacks before and after updating the split percentage of the test attack, and (b) sum of differences of the ADR of the attacks other than the test attack before and after updating the split percentage.

3. The method as claimed in claim 1, wherein determining the optimum value of the split percentage for each of the plurality of adversarial attacks comprising:

obtaining an updated training dataset by training the data model using data in an adversarial class after fixing a data composition in the adversarial class;
determining Attack Detection Rates (ADR) on each of a plurality of support sets obtained from the training dataset, after training the data model using the updated training dataset;
selecting each attack from among the plurality of adversarial attacks as a test attack, in each of a plurality of iterations till all the plurality of adversarial attacks are selected as the test attack and the optimum value of the split percentage for each of the plurality of adversarial attacks is determined; and
processing the test attack selected in each iteration, comprising:
determining if the computed value of ADR for the test attack is less than a threshold, for an attack selected as a test attack from among the plurality of adversarial attacks; and
increasing the split percentage of the test attack by a pre-defined percentage m, comprising reducing the split percentage of the attacks other than the test attack from among the plurality of adversarial attacks by an amount to compensate for the increase in split percentage of the test attack.

4. The method as claimed in claim 1, further comprises:
receiving, via the one or more hardware processors, a test data sample as input;
determining, via the one or more hardware processors, whether the received test data sample is an adversarial class or a clean class, by processing the test data sample using the data model; and
performing, via the one or more hardware processors, one of:
discarding the received test data sample if the received test data sample is determined as the adversarial class; or
sending the received test data sample to a target model if the received test data sample is determined as the clean class.

5. A system (100) for generating a data model for adversarial sample detection, comprising:
one or more hardware processors (102);
a communication interface (106); and
a memory storing a plurality of instructions, wherein the plurality of instructions when executed, cause the one or more hardware processors to:
generate a training data, by:
fetching a first clean sample set (S1), a second clean sample set (S2), and a third clean sample set (S3), wherein each of the S1, the S2, and the S3 comprises of a plurality of clean data samples;
generating a plurality of adversarial data samples, wherein the plurality of adversarial data samples are generated by applying a plurality of adversarial attacks on the clean data samples in the S3; and
generating a plurality of support data sets from the clean data samples in the S1, by applying the plurality of adversarial attacks on the clean data samples in the S1;
train the data model using the generated training data, by:
initializing the plurality of adversarial attacks with an assigned value of split percentage for each of the plurality of adversarial attacks such that summation of value of the split percentage of the plurality of adversarial attacks is equal to 1;
determining an optimum value of the split percentage for each of the plurality of adversarial attacks;
splitting a training dataset into a plurality of splits, based on the optimum value of the split percentage;
updating the plurality of adversarial samples by applying each of the plurality of the adversarial attacks on the plurality of splits;
generating an adversarial class using the updated plurality of adversarial samples;
generating the data model using the generated adversarial class and the clean data samples in the second clean sample set; and
training the generated data model using the generated training data to obtain a plurality of optimal detector parameters, wherein the plurality of optimal detector parameters are used for the adversarial sample detection.

6. The system as claimed in claim 5, wherein the composition of the adversarial class is fixed to satisfy a saturation point, wherein the saturation point is satisfied in terms of (a) a measured difference between ADR of each of the test attacks before and after updating the split percentage of the test attack, and (b) sum of differences of the ADR of the attacks other than the test attack before and after updating the split percentage.

7. The system as claimed in claim 5, wherein the one or more hardware processors are configured to determine the optimum value of the split percentage for each of the plurality of adversarial attacks by:

obtaining an updated training dataset by training the data model using data in an adversarial class after fixing a data composition in the adversarial class;
determining Attack Detection Rates (ADR) on each of a plurality of support sets obtained from the training dataset, after training the data model using the updated training dataset;
selecting each attack from among the plurality of adversarial attacks as a test attack, in each of a plurality of iterations till all of the plurality of adversarial attacks are selected as the test attack and the optimum value of the split percentage for each of the plurality of adversarial attacks is determined; and
processing the test attack selected in each iteration, comprising:
determining if the computed value of ADR for the test attack is less than a threshold, for an attack selected as a test attack from among a plurality of adversarial attacks; and
increasing the split percentage of the test attack by a pre-defined percentage m, comprising reducing the split percentage of the attacks other than the test attack from among a plurality of adversarial attacks by an amount to compensate for the increase in split percentage of the test attack.

8. The system as claimed in claim 5, wherein the one or more hardware processors are configured to perform the adversarial sample detection by:
receiving a test data sample as input;
determining whether the received test data sample is an adversarial class or a clean class, by processing the test data sample using the data model; and
performing one of:
discarding the received test data sample if the received test data sample is determined as the adversarial class; or
sending the received test data sample to a target model if the received test data sample is determined as the clean class.

Dated this 28th day of February 2022

Tata Consultancy Services Limited
By their Agent & Attorney

(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086

Description:

FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:

METHOD AND SYSTEM FOR GENERATING A DATA MODEL FOR ADVERSARIAL SAMPLE DETECTION

Applicant

Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The disclosure herein generally relates to adversarial sample detection, and, more particularly, to a method and system for generating a data model for adversarial sample detection.

BACKGROUND
Performance of any machine learning data model depends on quality of data used to generate and train the data model. While accurate training data allows the data model to generate accurate predictions, false or inaccurate training data reduces accuracy and efficiency with which various predictions are made by the data model.
An adversarial attack is a method to generate adversarial examples or data samples. Adversarial attacks add very minute perturbations to clean data, fool deep learning models, and result in security and safety concerns when such models are used in critical applications. These adversarial examples, which are inputs to a machine learning model, are purposely designed to cause the model to make a mistake in its predictions.
Several defense mechanisms for adversarial attacks have been proposed in the literature, but they require complex re-training procedures and parameter tuning. Existing techniques like adversarial training and other adversarial detection methods require a large number of adversarial samples to train or re-train the models to achieve a required level of robustness. This process could degrade the original performance of the models.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method of generating a data model for adversarial sample detection is provided. The method includes generating a training data, via one or more hardware processors. Generating the training data includes the following steps. Initially, a first clean sample set (S1), a second clean sample set (S2), and a third clean sample set (S3) are fetched as input, wherein each of the S1, S2, and S3 comprises of a plurality of clean data samples. Further, a plurality of adversarial data samples are generated by applying a plurality of adversarial attacks on the clean data samples in S3. Further, a plurality of support data sets are generated from the clean data samples in S1, by applying a plurality of adversarial attacks on the clean data samples in S1. The clean data samples in S1, S2, and S3, the plurality of adversarial data samples, and the plurality of support data sets form the training data of target/black-box model. Further, the data model is trained using the generated training data, via the one or more hardware processors. Training the data model includes the following steps. Initially, the plurality of adversarial attacks are initialized with an assigned value of split percentage for each of the plurality of adversarial attacks such that summation of value of the split percentage of the plurality of adversarial attacks is equal to 1. Further, an optimum value of the split percentage is determined for each of the plurality of adversarial attacks. Further, the training dataset is split into a plurality of splits, based on the optimum value of the split percentage. Further, the plurality of adversarial samples are updated by applying each of the plurality of the adversarial attacks on the plurality of splits. Further, an adversarial class is generated using the updated plurality of adversarial samples. Then the data model is generated using the generated adversarial class and the clean data samples in the second clean sample set. Further, the generated data model is trained using the generated training data to obtain a plurality of optimal detector parameters, wherein the plurality of optimal detector parameters are used for the adversarial sample detection.
In another embodiment, determining the optimum value of the split percentage for each of the plurality of adversarial attacks comprises of the following steps. Initially, an updated training dataset is obtained by training the data model using data in an adversarial class after fixing a data composition in the adversarial class. Further, an Attack Detection Rates (ADR) on each of a plurality of support sets obtained from the training dataset is determined, after training the data model using the updated training dataset. Further, each attack from among the plurality of adversarial attacks is selected as a test attack, in each of a plurality of iterations till all of the plurality of attacks are selected as the test attack and the optimum value of the split percentage for each of the plurality of adversarial attacks is determined, and the test attack selected in each iteration is processed, wherein processing the test attack includes the following steps. Initially, it is determined if the computed value of ADR for the test attack is less than a threshold, for an attack selected as a test attack from among a plurality of attacks. Further, the split percentage of the test attack is increased by a pre-defined percentage m, comprising reducing the split percentage of the attacks other than the test attack from among a plurality of attacks by an amount to compensate for the increase in split percentage of the test attack.
In yet another embodiment, the method comprises receiving, via the one or more hardware processors, a test data sample as input. Further, it is determined via the one or more hardware processors, whether the received test data sample is an adversarial class or a clean class, by processing the test data sample using the data model. The received test data sample is discarded if identified as the adversarial class, or is sent to a target model if the received test data sample is determined as the clean class.
In yet another embodiment, a system for generating a data model for adversarial sample detection is provided. The system includes one or more hardware processors, a communication interface, and a memory storing a plurality of instructions. The plurality of instructions when executed, cause the one or more hardware processors to initially generate a training data. In the process of generating the training data, a first clean sample set (S1), a second clean sample set (S2), and a third clean sample set (S3) are fetched as input, wherein each of the S1, S2, and S3 comprises of a plurality of clean data samples. Further, a plurality of adversarial data samples are generated by applying a plurality of adversarial attacks on the clean data samples in S3. Further, a plurality of support data sets are generated from the clean data samples in S1, by applying a plurality of adversarial attacks on the clean data samples in S1. The clean data samples in S1, S2, and S3, the plurality of adversarial data samples, and the plurality of support data sets form the training data of target/black-box model. Further, the system trains the data model using the generated training data, via the one or more hardware processors. Training the data model includes the following steps. Initially, the plurality of adversarial attacks are initialized with an assigned value of split percentage for each of the plurality of adversarial attacks such that summation of value of the split percentage of the plurality of adversarial attacks is equal to 1. Further, an optimum value of the split percentage is determined for each of the plurality of adversarial attacks. Further, the training dataset is split into a plurality of splits, based on the optimum value of the split percentage. Further, the plurality of adversarial samples are updated by applying each of the plurality of the adversarial attacks on the plurality of splits. Further, an adversarial class is generated using the updated plurality of adversarial samples. Then the data model is generated using the generated adversarial class and the clean data samples in the second clean sample set. Further, the generated data model is trained using the generated training data to obtain a plurality of optimal detector parameters, wherein the plurality of optimal detector parameters are used for the adversarial sample detection.
In yet another embodiment, the one or more hardware processors in the system are configured to determine the optimum value of the split percentage for each of the plurality of adversarial attacks by executing the following steps. Initially, an updated training dataset is obtained by training the data model using data in an adversarial class after fixing a data composition in the adversarial class. Further, an Attack Detection Rates (ADR) on each of a plurality of support sets obtained from the training dataset is determined, after training the data model using the updated training dataset. Further, each attack from among the plurality of adversarial attacks is selected as a test attack, in each of a plurality of iterations till all of the plurality of attacks are selected as the test attack and the optimum value of the split percentage for each of the plurality of adversarial attacks is determined, and the test attack selected in each iteration is processed, wherein processing the test attack includes the following steps. Initially, it is determined if the computed value of ADR for the test attack is less than a threshold, for an attack selected as a test attack from among a plurality of attacks. Further, the split percentage of the test attack is increased by a pre-defined percentage m, comprising reducing the split percentage of the attacks other than the test attack from among a plurality of attacks by an amount to compensate for the increase in split percentage of the test attack.
In yet another embodiment, the one or more hardware processors in the system are configured to receive a test data sample as input. Further, it is determined via the one or more hardware processors, whether the received test data sample is an adversarial class or a clean class, by processing the test data sample using the data model. The received test data sample is discarded if identified as the adversarial class, or is sent to a target model if the received test data sample is determined as the clean class.
In yet another embodiment, a non-transitory computer readable medium for adversarial sample detection is provided. The non-transitory computer readable medium includes a plurality of instructions, which when executed, cause one or more hardware processors to initially generate a training data. Generating the training data includes the following steps. Initially, a first clean sample set (S1), a second clean sample set (S2), and a third clean sample set (S3) are fetched as input, wherein each of the S1, S2, and S3 comprises of a plurality of clean data samples. Further, a plurality of adversarial data samples are generated by applying a plurality of adversarial attacks on the clean data samples in S3. Further, a plurality of support data sets are generated from the clean data samples in S1, by applying a plurality of adversarial attacks on the clean data samples in S1. The clean data samples in S1, S2, and S3, the plurality of adversarial data samples, and the plurality of support data sets form the training data of target/black-box model. Further, the data model is trained using the generated training data, via the one or more hardware processors. Training the data model includes the following steps. Initially, the plurality of adversarial attacks are initialized with an assigned value of split percentage for each of the plurality of adversarial attacks such that summation of value of the split percentage of the plurality of adversarial attacks is equal to 1. Further, an optimum value of the split percentage is determined for each of the plurality of adversarial attacks. Further, the training dataset is split into a plurality of splits, based on the optimum value of the split percentage. Further, the plurality of adversarial samples are updated by applying each of the plurality of the adversarial attacks on the plurality of splits. Further, an adversarial class is generated using the updated plurality of adversarial samples. Then the data model is generated using the generated adversarial class and the clean data samples in the second clean sample set. Further, the generated data model is trained using the obtained training data to obtain a plurality of optimal detector parameters, wherein the plurality of optimal detector parameters are used for the adversarial sample detection.
In another embodiment, the non-transitory computer readable medium determines the optimum value of the split percentage for each of the plurality of adversarial attacks by executing the following steps. Initially, an updated training dataset is obtained by training the data model using data in an adversarial class after fixing a data composition in the adversarial class. Further, an Attack Detection Rates (ADR) on each of a plurality of support sets obtained from the training dataset is determined, after training the data model using the updated training dataset. Further, each attack from among the plurality of adversarial attacks is selected as a test attack, in each of a plurality of iterations till all of the plurality of attacks are selected as the test attack and the optimum value of the split percentage for each of the plurality of adversarial attacks is determined, and the test attack selected in each iteration is processed, wherein processing the test attack includes the following steps. Initially, it is determined if the computed value of ADR for the test attack is less than a threshold, for an attack selected as a test attack from among a plurality of attacks. Further, the split percentage of the test attack is increased by a pre-defined percentage m, comprising reducing the split percentage of the attacks other than the test attack from among a plurality of attacks by an amount to compensate for the increase in split percentage of the test attack.
In yet another embodiment, the non-transitory computer readable medium receives via the one or more hardware processors, a test data sample as input. Further, it is determined via the one or more hardware processors, whether the received test data sample is an adversarial class or a clean class, by processing the test data sample using the data model. The received test data sample is discarded if identified as the adversarial class, or is sent to a target model if the received test data sample is determined as the clean class. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 illustrates an exemplary system for generating a data model for adversarial sample detection, according to some embodiments of the present disclosure.
FIGS. 2A and 2B (collectively referred to as FIG. 2) is a flow diagram depicting steps involved in the process of generating a data model for adversarial sample detection, using the system of FIG. 1, according to some embodiments of the present disclosure.
FIG. 3 is a flow diagram depicting steps involved in the process of generating a training data set, using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
FIG. 4 is an example functional flow diagram for preparation of support data sets while generating the training data, using the system of FIG. 1, according to some embodiments of the present disclosure.
FIG. 5 is an example functional flow diagram for generating the training data, using the system of FIG. 1, according to some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Performance of any machine learning data model depends on quality of data used to generate and train the data model. While accurate training data allows the data model to generate accurate predictions, false or inaccurate training data reduces accuracy and efficiency with which various predictions are made by the data model. Adversarial attacks that add very minute perturbations to clean data fool deep learning models and result in security and safety concerns when such models are used in critical applications. These adversarial examples, which are inputs to a machine learning model, are purposely designed to cause the model to make a mistake in its predictions.
The state-of-the-art defense mechanisms for adversarial attacks require complex re-training procedures and parameter tuning. They require a large number of adversarial samples to train or re-train the models to achieve a required level of robustness. This process could degrade the original performance of the models.
The method and system disclosed herein provide a mechanism for generating a data model for adversarial sample detection. The data model is generated by adaptively varying the split percentage of different adversarial attacks used for training the data model.
Referring now to the drawings, and more particularly to FIG. 1 through FIG. 5, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates an exemplary system for generating a data model for adversarial sample detection, according to some embodiments of the present disclosure. The system 100 includes or is otherwise in communication with hardware processors 102, at least one memory such as a memory 104, and an I/O interface 112. The hardware processors 102, the memory 104, and the Input/Output (I/O) interface 112 may be coupled by a system bus such as a system bus 108 or a similar mechanism. In an embodiment, the hardware processors 102 can be one or more hardware processors.
The I/O interface 112 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 112 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a printer and the like. Further, the I/O interface 112 may enable the system 100 to communicate with other devices, such as web servers, and external databases.
The I/O interface 112 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For the purpose, the I/O interface 112 may include one or more ports for connecting several computing systems with one another or to another server computer. The I/O interface 112 may include one or more ports for connecting several devices to one another or to another server.
The one or more hardware processors 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, node machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 102 is configured to fetch and execute computer-readable instructions stored in the memory 104.
The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 106.
The plurality of modules 106 include programs or coded instructions that supplement applications or functions performed by the system 100 for executing different steps involved in generating the data model and further for adversarial sample detection using the generated data model. The plurality of modules 106, amongst other things, can include routines, programs, objects, components, and data structures, which performs particular tasks or implement particular abstract data types. The plurality of modules 106 may also be used as, signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 106 can be used by hardware, by computer-readable instructions executed by the one or more hardware processors 102, or by a combination thereof. The plurality of modules 106 can include various sub-modules (not shown). The plurality of modules 106 may include computer-readable instructions that supplement applications or functions performed by the system 100 for executing the different steps involved in generating the data model and further for adversarial sample detection using the generated data model.
The data repository (or repository) 110 may include a plurality of abstracted piece of code for refinement and data that is processed, received, or generated as a result of the execution of the plurality of modules in the module(s) 106.
Although the data repository 110 is shown internal to the system 100, it will be noted that, in alternate embodiments, the data repository 110 can also be implemented external to the system 100, where the data repository 110 may be stored within a database (repository 110) communicatively coupled to the system 100. The data contained within such external database may be periodically updated. For example, new data may be added into the database (not shown in FIG. 1) and/or existing data may be modified and/or non-useful data may be deleted from the database. In one example, the data may be stored in an external system, such as a Lightweight Directory Access Protocol (LDAP) directory and a Relational Database Management System (RDBMS). Functions of the components of the system 100 are now explained with reference to steps in flow diagrams in FIGS. 2 and 3, and the functional flow diagrams in FIGS. 4 and 5.
FIGS. 2A and 2B (collectively referred to as FIG. 2) is a flow diagram depicting a method 200 involved in the process of generating a data model for adversarial sample detection, using the system of FIG. 1, according to some embodiments of the present disclosure.
In an embodiment, the system 100 includes one or more data storage devices or the memory 104 operatively coupled to the one or more hardware processor(s) 102 and is configured to store instructions for execution of steps of the method 200 by the one or more hardware processors 102. The steps of the method 200 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and the steps of flow diagram as depicted in FIG. 2. The method 200 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 200 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communication network. The order in which the method 200 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 200, or an alternative method. Furthermore, the method 200 can be implemented in any suitable hardware, software, firmware, or combination thereof.
At step 202 of the method 200, the system 100 generates a training data so as to train the data model. Various steps involved in the process of generating the training data are depicted in FIG. 3. At step 302, the system 100 fetches/obtains a first clean sample set (S1), a second clean sample set (S2), and a third clean sample set (S3), each comprising a plurality of data samples. In various embodiments, the number of data samples in each of S1, S2, and S3 may be the same or different, and may be configured depending on various implementation requirements. Further, at step 304, the system 100 generates a plurality of adversarial data samples by applying a plurality of adversarial attacks on the clean data samples in at least one of the clean sample sets S1, S2, and S3. As an example, it is considered that the adversarial attacks are applied on the clean data samples in S3. Examples of some of the adversarial attacks that may be applied on the clean data samples in S3 are:
Fast Gradient Sign Method (FGSM): In this attack, the gradients of the loss function with respect to the input of the neural network are calculated. The attack is generated using the following equation:
x_adv = x + ε * sign(∇_x L(x, y_true))
where L is the loss function of a trained data model, ∇_x L is the gradient of the loss with respect to the input, x is a normal (clean) sample with true label y_true, ε is the amount of perturbation (attack strength), and x_adv is the adversarial sample.
Basic Iterative Method (BIM): BIM adds perturbations to the input samples iteratively, using a small step size, defined as:
x*_0 = x;  x*_(n+1) = Clip_(x,ε){ x*_n + α * sign(∇_x L(x*_n, y_true)) }
where α is the step size and Clip_(x,ε){A} denotes element-wise clipping of A, such that each element A_(i,j) after clipping lies in the interval [x_(i,j) − ε, x_(i,j) + ε].
Projected Gradient Descent (PGD): PGD is similar to the BIM attack. The difference is that PGD initializes the adversarial example at a random point within the L∞ norm ball around the clean sample and performs random restarts.
DeepFool (DF): DF computes a minimal-norm adversarial perturbation for a given clean image in an iterative manner. The algorithm initializes with the clean image and adds a small perturbation vector at each iteration so that the image moves toward the decision boundary of the classifier. Once it crosses the boundary, the image is marked as an adversarial image.
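By way of a non-limiting illustration only, a minimal sketch of the FGSM and BIM update rules described above is given below. The sketch assumes a differentiable PyTorch classifier (model), a cross-entropy loss, and inputs scaled to the range [0, 1]; the function and parameter names (fgsm, bim, eps, alpha, steps) are illustrative and do not form part of the claimed method.

import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps):
    # x_adv = x + eps * sign(grad_x L(x, y_true)), clipped back to the valid input range
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

def bim(model, x, y_true, eps, alpha, steps):
    # Iterative variant: small steps of size alpha, with element-wise clipping so the
    # result stays within the eps-ball around the original sample x
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()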
The impact of each of the adversarial attacks, when applied on the clean data samples, is measured in terms of the following metrics:
Attack Success Rate (ASR): the ratio of the number of adversarial examples generated on a test set of clean data samples that successfully fool the target model to the total number of adversarial examples generated on the test set. ASR in percentage is defined as (ASR*100).
Attack Detection Rate (ADR): the ratio of the number of adversarial examples generated on the test set that are successfully detected by the detector to the total number of adversarial examples generated on the test set. ADR in percentage is defined as (ADR*100).
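As a non-limiting sketch, the two metrics may be computed as simple ratios as shown below, assuming target_model returns predicted labels and detector returns a binary adversarial/clean flag for each sample; the names and the array-based interface are illustrative assumptions.

import numpy as np

def attack_success_rate(target_model, x_adv, y_true):
    # Fraction of adversarial examples that fool the target model; percentage = ASR * 100
    preds = np.asarray(target_model(x_adv))
    return float(np.mean(preds != np.asarray(y_true)))

def attack_detection_rate(detector, x_adv):
    # Fraction of adversarial examples flagged as adversarial by the detector; percentage = ADR * 100
    flags = np.asarray(detector(x_adv))
    return float(np.mean(flags == 1))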
Further, at step 306, the system 100 generates a plurality of support data sets from the clean data samples in at least one of the clean data sets (for example, S1), by applying the plurality of adversarial attacks on the clean data samples in S1. During this process, the system 100 applies the plurality of adversarial attacks on the clean data samples in the one or more clean data sets. Each attack forms a corresponding adversarial class of a support set. The clean data samples in one or more of the clean data sets act as the clean class of all the support sets. Each combination of the clean class and an adversarial class of a support set forms a support set. Samples of K different attacks are generated, and the adversarial samples of the k-th attack form the adversarial class of support set k. In this way, K support sets are formed. An example functional flow diagram depicting the steps involved in the process of generating the support data sets is given in FIG. 4. The clean data samples in S1, S2, S3, the adversarial data samples, and the support sets generated together form the training data. An example functional flow diagram depicting the steps involved in the process of generating the training data sets is given in FIG. 5.
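The preparation of the support data sets described above may be sketched, purely by way of illustration, as follows, assuming attacks is a list of K attack callables (such as the FGSM/BIM sketches above), s1_x and s1_y hold the clean samples and labels of S1, and each attack returns an array of adversarial samples; all names are illustrative.

def build_support_sets(attacks, s1_x, s1_y, model):
    # Each support set pairs the common clean class (the clean samples of S1) with
    # the adversarial class produced by one attack applied on S1, giving K support sets.
    support_sets = []
    for attack in attacks:                          # k = 1 .. K
        adv_k = attack(model, s1_x, s1_y)           # adversarial class of support set k
        support_sets.append({"clean": s1_x, "adversarial": adv_k})
    return support_sets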
Once the training data is generated at step 202, further at step 204 the system 100 trains a data model using the training data. Various steps in training the data model are depicted in steps 204a through 204g.
At step 204a, the system 100 initializes the plurality of adversarial attacks with an assigned value of split percentage for each of the plurality of adversarial attacks, such that summation of value of the split percentages is equal to 1. The split percentage varies the attack strengths of each of the adversarial attacks, hence plays a key role in forming the adversarial class of the training dataset of adversarial detector. Further, at step 204b, the system 100 determines an optimum value of the split percentage, for each of the plurality of adversarial attacks.
Updating the composition of data in the adversarial class is crucial in training of the adversarial detector. The composition decides the attacks, the attack strengths, and the corresponding number of samples of each attack that are required for training the detector. If the adversarial class data composition is not learned properly, the adversarial detector fails to detect attacks with different attack strengths. Also, if the adversarial class size is increased by generating adversarial samples using multiple attacks and at different attack strengths, then the training dataset used for generating the data model becomes imbalanced and the data model overfits to the adversarial samples.
For learning the composition of data in the adversarial class, the detector is kept stationary, i.e., the detector is not updated while the composition of the data is being updated. The system 100 at this stage obtains an updated training dataset by training the data model using data in an adversarial class after fixing a data composition in the adversarial class. Further, attack detection rates (ADR_1, ADR_2, ..., ADR_K) are computed against the plurality of support sets (SS_1, SS_2, ..., SS_K), after training the data model using the updated training dataset. Further, from among the plurality of attacks, each attack is selected as a test attack in different iterations, till all of the plurality of attacks are selected as the test attack and the optimum value of the split percentage is determined for each of the test attacks in each iteration. The test attack selected in each iteration is further processed, wherein processing the test attack includes the following steps. Initially, it is checked whether the computed ADR is less than a threshold for any i-th attack (from a predefined set of K attacks). If yes, then the split percentage of that i-th attack is increased by a percentage. Here, the percentage is denoted by m, and the increased amount m is compensated by the remaining K-1 attacks, which means the split percentage of the rest of the K-1 attacks is reduced by an amount of m/(K-1), such that the summation of the split percentages of the plurality of attacks always remains equal to 1. The percentage value by which the split percentage is increased for the i-th attack is determined by the system 100 by incrementing the percentage value over a plurality of iterations, wherein in each iteration the percentage value is increased by a pre-defined step size (for example, 2%) in comparison with the split percentage in the previous iteration. For example, where the difference between the ADR on support set 1 and the ADR on support set 2 is very significant (about 20%), the increase in the number of samples for the first attack will be high when compared to the second attack. The logic is based on the difference in the ADR metric between the adversarial attacks, based on which the split percentage is increased. However, if the ADR of the i-th attack exceeds the threshold, or is at its maximum and cannot be improved further, then the split percentage of that i-th attack is fixed.
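Purely as an illustrative sketch of the split-percentage update described above (and not as a definitive implementation), one pass of the update may be written as follows, where split holds the current split percentages of the K attacks, adr holds the attack detection rates computed on the corresponding support sets, threshold is the ADR threshold, and m is the pre-defined increment; the final re-normalisation is an added safeguard that is not part of the described procedure.

import numpy as np

def update_split_percentages(split, adr, threshold, m):
    # One pass of the update: every test attack whose ADR on its support set is below
    # the threshold gets its split percentage raised by m, and the remaining K-1 split
    # percentages are each reduced by m/(K-1) so that the percentages still sum to 1.
    split = np.asarray(split, dtype=float).copy()
    K = len(split)
    for i in range(K):                              # each attack takes a turn as the test attack
        if adr[i] < threshold:
            split[i] += m
            others = [j for j in range(K) if j != i]
            split[others] -= m / (K - 1)
    split = np.clip(split, 0.0, None)               # safeguard against negative splits (added here)
    return split / split.sum()                      # re-normalise so the splits sum to 1

# Example with three attacks initialised to equal splits:
# update_split_percentages([1/3, 1/3, 1/3], adr=[0.62, 0.91, 0.88], threshold=0.85, m=0.02)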
Further, after determining the optimum value of the split percentage for each of the attacks, at step 204c, the system 100 splits the training data into a plurality of splits, based on the determined optimum split percentages, i.e., the clean data samples in the dataset S_3 are split into K splits (D_1, D_2, D_3, ..., D_K) with split percentages s_1, s_2, ..., s_K, which are the determined optimum split percentages.
Further, at step 204d, the system 100 updates the generated adversarial samples by applying each of the plurality of adversarial attacks on the plurality of splits, i.e., each attack i (A_i) is applied on the corresponding split D_i, for i = 1, 2, ..., K, and adversarial samples are generated. Using all the adversarial samples, the adversarial class (A) is generated at step 204e. At this stage the composition of the adversarial class is fixed/stationary and the adversarial detector is trained with the clean class and the adversarial class (A). The above steps are repeated iteratively until the optimal data composition of the adversarial class and the optimal detector parameters are found, which means until the adversarial detector and the process of learning the data composition of the adversarial class reach a corresponding saturation point and cannot improve their metrics further. Here, the saturation point is evaluated in terms of (i) the difference between the ADR of the i-th attack before and after updating the split percentage of the i-th attack, and (ii) the sum of the differences of the ADR of the remaining K-1 attacks before and after the split percentage update. It is checked whether the first metric improves while the second metric is maintained close to '0'. If the ADR is not improved further for the attacks, then that point is considered as the saturation point. Further, at step 204f, the system 100 generates the data model using the adversarial class and the clean data samples from the second clean sample set S2. Further, at step 204g, the system 100 trains the data model using the generated training data to obtain a plurality of optimal detector parameters, wherein the optimal detector parameters are used for adversarial sample detection. The training process is terminated when the metrics are at their respective saturation levels.
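By way of a non-limiting sketch, the splitting of S3 according to the learned split percentages and the assembly of the adversarial class (steps 204c through 204e) may be expressed as follows, assuming the samples are NumPy arrays and each attack is a callable returning an array of adversarial samples; the helper name build_adversarial_class is illustrative.

import numpy as np

def build_adversarial_class(attacks, split, s3_x, s3_y, model):
    # Split the clean samples of S3 into K parts (D_1 .. D_K) according to the optimum
    # split percentages and apply attack A_i on split D_i only; the union of the
    # resulting adversarial samples forms the adversarial class (A).
    n = len(s3_x)
    counts = np.round(np.asarray(split, dtype=float) * n).astype(int)
    bounds = np.cumsum(counts)[:-1]
    splits_x = np.split(s3_x, bounds)
    splits_y = np.split(s3_y, bounds)
    adv_parts = [attack(model, dx, dy)
                 for attack, dx, dy in zip(attacks, splits_x, splits_y)]
    return np.concatenate(adv_parts)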
When the detector trained using the generated data model is used for the adversarial sample detection, the following steps are executed. Initially, the data for which the adversarial sample detection is to be performed is collected/received as a test data sample. Further, the received test data sample is processed using the data model, by the detector, to determine whether the received test data sample is an adversarial class or a clean class. Further, one of the following options is performed. If the test data sample is identified as the adversarial class, the system discards the received test data sample. If the received test data sample is determined as the clean class, the same is sent to a target model.
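At inference time, the use of the trained detector as a filter in front of the target model, as described above, may be sketched as follows; detector and target_model are assumed to be callables, and the labels and function name are illustrative.

ADVERSARIAL, CLEAN = 1, 0

def filter_and_predict(detector, target_model, x_test):
    # The detector classifies the test sample as adversarial or clean; adversarial
    # samples are discarded, clean samples are forwarded to the target model.
    if detector(x_test) == ADVERSARIAL:
        return None                       # sample discarded
    return target_model(x_test)           # clean sample sent to the target model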
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
The embodiments of present disclosure herein address unresolved problem of adversarial sample detection. The embodiment thus provides a mechanism to generate a data model that may be used by a detector for performing the adversarial sample detection. Moreover, the embodiments herein further provide a mechanism of performing the adversarial sample detection in a test data sample, using the generated data model.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

Documents

Application Documents

# Name Date
1 202221010821-STATEMENT OF UNDERTAKING (FORM 3) [28-02-2022(online)].pdf 2022-02-28
2 202221010821-REQUEST FOR EXAMINATION (FORM-18) [28-02-2022(online)].pdf 2022-02-28
3 202221010821-FORM 18 [28-02-2022(online)].pdf 2022-02-28
4 202221010821-FORM 1 [28-02-2022(online)].pdf 2022-02-28
5 202221010821-FIGURE OF ABSTRACT [28-02-2022(online)].jpg 2022-02-28
6 202221010821-DRAWINGS [28-02-2022(online)].pdf 2022-02-28
7 202221010821-DECLARATION OF INVENTORSHIP (FORM 5) [28-02-2022(online)].pdf 2022-02-28
8 202221010821-COMPLETE SPECIFICATION [28-02-2022(online)].pdf 2022-02-28
9 202221010821-Proof of Right [22-04-2022(online)].pdf 2022-04-22
10 202221010821-FORM-26 [22-06-2022(online)].pdf 2022-06-22
11 Abstract1.jpg 2022-07-12
12 202221010821-FER.pdf 2025-03-13
13 202221010821-PETITION UNDER RULE 137 [14-08-2025(online)].pdf 2025-08-14
14 202221010821-OTHERS [14-08-2025(online)].pdf 2025-08-14
15 202221010821-FER_SER_REPLY [14-08-2025(online)].pdf 2025-08-14
16 202221010821-CLAIMS [14-08-2025(online)].pdf 2025-08-14
17 202221010821-ABSTRACT [14-08-2025(online)].pdf 2025-08-14

Search Strategy

1 SearchHistory(3)E_08-03-2024.pdf