Abstract: Embodiments provide methods and systems for generating proxy sensitive attribute labels for bias mitigation. The method includes feeding input data to a neural network model for generating embeddings. The method includes computing a differentiable loss from an output data and feeding the embeddings to a classification neural network model to generate classified downstream task labels. The method includes computing a discrepancy loss between a probability distribution function of the classified downstream task labels and an expected probability distribution function of downstream task labels for removing downstream task label information from the embeddings. The method includes training the neural network model based on an overall loss computed from the differentiable loss and the discrepancy loss. The trained neural network model is configured to generate learned embeddings to be fed to an unsupervised clustering algorithm to generate a corresponding plurality of proxy labels for mitigating bias at any developmental stage of a biased neural network model.
Description: TECHNICAL FIELD
The present disclosure relates to artificial intelligence processing systems and, more particularly, to electronic methods and complex processing systems for generating proxy labels for protected / sensitive attributes that are compatible with all types of debiasing techniques and that support bias mitigation at any stage of neural network model development.
BACKGROUND
Machine Learning has attained high success rates in practically every field, including healthcare, finance, and education, based on the accuracy and efficiency of a neural network model’s outcome. Deep learning data-driven methodologies automatically recognize patterns in large amounts of data. However, this leaves the deep learning model vulnerable to biases in the model itself and to biases in the data. For example, user groups with sensitive (or protected) attributes may be over- or under-represented in the data used by the deep learning model. Examples of the protected attributes include age, gender, ethnicity, income, sexual orientation, religion, and the like. As a result, deep learning models tend to replicate and even amplify this data bias. For example, the models may be biased and exhibit a propensity to favor one demographic group over another in various applications, including credit and loan approval, criminal justice, resume-based candidate shortlisting, medical image processing, and the like. If the decision-making is even partially based on the values of sensitive attributes, the consequences may be irreversible.
In recent years, various techniques have been developed to mitigate the unfairness/bias of machine learning models. However, most of the current debiasing algorithms have restrictions on their use in real-world scenarios since they need access to protected attributes for bias mitigation. Further, governments impose privacy regulations on payment networks that restrict storage of, and access to, the protected attributes for machine learning model training and inference. For example, the General Data Protection Regulation (GDPR), adopted by the European Union in May 2018, prohibits the processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, the processing of genetic data and biometric data for the purpose of uniquely identifying a natural person, and the processing of data concerning health, etc. Further, the Equal Credit Opportunity Act prohibits credit institutions from asking credit applicants about, or accessing, information regarding their race.
Thus, there is a need for a technical solution for generating proxy labels for sensitive attributes to make the present bias mitigation approaches suitable for real-world applications where access to protected attributes during the model training is constrained.
SUMMARY
Various embodiments of the present disclosure provide systems, methods, electronic devices, and computer program products for generating proxy sensitive attribute labels for bias mitigation.
In an embodiment, a computer-implemented method is disclosed. The computer-implemented method includes feeding, by a processor, an input data to a neural network model configured to generate a plurality of embeddings. The input data includes only a plurality of non-protected features. The method includes computing a differentiable loss from an output data generated by the neural network model. The method includes feeding the plurality of embeddings to a classification neural network model configured to assign a plurality of downstream task labels to the corresponding plurality of embeddings to generate a plurality of classified downstream task labels. The plurality of downstream task labels belongs to a biased neural network model having at least one bias in at least one developmental stage of the biased neural network model. The method includes computing a probability distribution function of the plurality of classified downstream task labels. The method includes computing a discrepancy loss between the probability distribution function of the plurality of classified downstream task labels and an expected probability distribution function of the plurality of downstream task labels. The discrepancy loss is computed for removing a downstream task label information from the plurality of embeddings. The method includes training the neural network model based on an overall loss computed from the differentiable loss and the discrepancy loss. The trained neural network model is configured to generate a plurality of learned embeddings to be fed to an unsupervised clustering algorithm to generate a corresponding plurality of proxy labels for mitigating the at least one bias in the at least one developmental stage of the biased neural network model.
In another embodiment, a system is disclosed. The system includes a communication interface, a memory including executable instructions, and a processor communicably coupled to the communication interface. The processor is configured to execute the executable instructions to cause the system to at least feed an input data to a neural network model configured to generate a plurality of embeddings. The input data includes only a plurality of non-protected features. The system is further caused to compute a differentiable loss from an output data generated by the neural network model. The system is further caused to feed the plurality of embeddings to a classification neural network model configured to assign a plurality of downstream task labels to the corresponding plurality of embeddings to generate a plurality of classified downstream task labels. The plurality of downstream task labels belongs to a biased neural network model having at least one bias in at least one developmental stage of the biased neural network model. The system is further caused to compute a probability distribution function of the plurality of classified downstream task labels. The system is further caused to compute a discrepancy loss between the probability distribution function of the plurality of classified downstream task labels and an expected probability distribution function of the plurality of downstream task labels. The discrepancy loss is computed for removing a downstream task label information from the plurality of embeddings. The system is further caused to train the neural network model based on an overall loss computed from the differentiable loss and the discrepancy loss. The trained neural network model is configured to generate a plurality of learned embeddings to be fed to an unsupervised clustering algorithm to generate a corresponding plurality of proxy labels for mitigating the at least one bias in the at least one developmental stage of the biased neural network model.
In yet another embodiment, a computer-implemented method is disclosed. The computer-implemented method performed by a processor includes feeding, by a processor, a plurality of learned embeddings generated by a trained neural network model to an unsupervised learning algorithm. The trained neural network model is previously trained using an overall loss computed from a differentiable loss and a discrepancy loss. The discrepancy loss is computed between a probability distribution function of a plurality of classified downstream task labels and an expected probability distribution function of a plurality of downstream task labels. The method includes generating two clusters using the unsupervised learning algorithm. The method includes assigning each of the two clusters as a favorable cluster and an unfavorable cluster to generate a plurality of proxy labels. The method includes feeding the plurality of proxy labels to at least one bias mitigation algorithm. The method includes applying the at least one bias mitigation algorithm for mitigating at least one bias in at least one developmental stage of a biased neural network model.
BRIEF DESCRIPTION OF THE FIGURES
For a more complete understanding of example embodiments of the present technology, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
FIG. 1 is an example representation of an environment, related to at least some example embodiments of the present disclosure;
FIG. 2 is a simplified block diagram of a system, in accordance with one embodiment of the present disclosure;
FIG. 3 is a schematic block diagram representation of a proxy label generation system, in accordance with an example embodiment;
FIG. 4 represents a schematic block diagram representation of data pre-processing, in accordance with an example embodiment;
FIG. 5 represents a schematic block diagram representation of generating a plurality of learned embeddings using an autoencoder, in accordance with an embodiment of the present disclosure;
FIG. 6 represents a schematic block diagram representation of generating a plurality of proxy labels using an unsupervised learning algorithm, in accordance with one embodiment of the present disclosure;
FIG. 7 represents a schematic block diagram representation of generating a plurality of learned embeddings using a transformer, in accordance with an embodiment of the present disclosure;
FIG. 8 represents a schematic block diagram representation of utilizing the plurality of proxy labels for applying a plurality of bias mitigation algorithms for mitigating one or more biases in any developmental stage of a biased neural network model, in accordance with an embodiment of the present disclosure;
FIG. 9 represents a flow diagram of a computer-implemented method for generating proxy sensitive attribute labels for bias mitigation, in accordance with an example embodiment;
FIG. 10 represents a flow diagram of another computer-implemented method for generating proxy sensitive attribute labels for bias mitigation, in accordance with an example embodiment; and
FIG. 11 is a simplified block diagram of a server system, in accordance with an example embodiment of the present disclosure.
The drawings referred to in this description are not to be understood as being drawn to scale except if specifically noted, and such drawings are only exemplary in nature.
DETAILED DESCRIPTION
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure can be practiced without these specific details.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. The appearance of the phrase “in an embodiment” in various places in the specification is not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
Moreover, although the following description contains many specifics for the purposes of illustration, anyone skilled in the art will appreciate that many variations and/or alterations to said details are within the scope of the present disclosure. Similarly, although many of the features of the present disclosure are described in terms of each other, or in conjunction with each other, one skilled in the art will appreciate that many of these features can be provided independently of other features. Accordingly, this description of the present disclosure is set forth without any loss of generality to, and without imposing limitations upon, the present disclosure.
The term "payment network", used throughout the description, refers to a network or collection of systems used for the transfer of funds through the use of cash-substitutes. Payment networks may use a variety of different protocols and procedures in order to process the transfer of money for various types of transactions. Transactions that may be performed via a payment network may include product or service purchases, credit purchases, debit transactions, fund transfers, account withdrawals, etc. Payment networks may be configured to perform transactions via cash-substitutes, which may include payment cards, letters of credit, checks, financial accounts, etc. Examples of networks or systems configured to perform as payment networks include those operated by such as Mastercard®.
OVERVIEW
Various example embodiments of the present disclosure provide methods, systems, user devices, and computer program products for generating proxy sensitive attribute labels for bias mitigation in neural network models. Bias occurs when a neural network model or a machine learning algorithm makes assumptions that are not compatible with the real-life problems for which the model/algorithm is used. Bias skews the result of the model in favor of or against a demographic or protected group.
In various example embodiments, the present disclosure describes a computing system that facilitates the generation of a plurality of proxy labels for use in various bias mitigation techniques (hereinafter alternatively referred to as “proxy label generation system” or “system”). The system includes at least a processor and a memory. The processor is configured to feed an input data to a neural network model configured to generate a plurality of embeddings. The input data includes a plurality of non-protected features that are transformed and pre-processed by a data-preprocessing engine from a plurality of raw features. The neural network model is configured to generate an output data. The processor is configured to compute a differentiable loss from the output data generated by the neural network model. Examples of the neural network model include an autoencoder, a transformer, and the like. Accordingly, the examples of the differentiable loss include a reconstruction loss, a cross-entropy loss, and the like.
The plurality of embeddings includes downstream task information that should be removed in order to generate a plurality of proxy labels accurately. In order to achieve this, the processor is configured to feed the plurality of embeddings to a classification neural network model such as a Multi-Layer Perceptron (MLP) neural network model (hereinafter alternatively referred to as “MLP engine / MLP model”). The MLP model is configured to generate a plurality of classified downstream task labels by assigning a plurality of downstream task labels to the corresponding plurality of embeddings. In one embodiment, the plurality of downstream task labels belongs to a biased neural network model having one or more biases in any of the developmental stages (e.g., training, testing, validation, execution, etc.) of the biased neural network model. There exists an expected probability distribution function of the plurality of downstream task labels. A downstream task is the end task for which a pretrained model is fine-tuned.
In one embodiment, the processor is further configured to compute a probability distribution function of the plurality of classified downstream task labels. The processor computes a discrepancy loss between the probability distribution function of the plurality of classified downstream task labels and the expected probability distribution function of the plurality of downstream task labels. In at least one embodiment, the discrepancy loss is computed for removing the downstream task label information from the plurality of embeddings. In one embodiment, the discrepancy loss is a Kullback–Leibler Divergence (KLD) loss.
The neural network model is trained based on an overall loss computed from the differentiable loss and the discrepancy loss. The trained neural network model is configured to generate a plurality of learned embeddings that do not contain the downstream task label information. In at least one embodiment, the processor is configured to feed the plurality of learned embeddings to an unsupervised learning algorithm that generates two clusters. Each of the two clusters is assigned as a favorable cluster and an unfavorable cluster to generate a plurality of proxy labels.
In at least one embodiment, the processor is configured to feed the plurality of proxy labels to one or more bias mitigation algorithms. Bias mitigation algorithms are categorized based on where in the machine learning pipeline they are deployed, for example, pre-processing bias mitigation algorithms, in-processing bias mitigation algorithms, and post-processing bias mitigation algorithms. The proxy labels generated from the two clusters are fed as input to a bias mitigation algorithm. The bias mitigation algorithm is applied for mitigating bias in any developmental stage of a biased neural network model. The proxy labels generated for the protected attributes improve the existing debiasing techniques, specifically, in cases when the biased neural network model is not fed protected / sensitive attributes due to legal restrictions such as privacy laws or the unavailability of enough data. In one embodiment, the fairness of the trained neural network models is evaluated using various metrics such as Statistical Parity Difference (SPD) and Equalized Odds Difference (EOD).
Various embodiments of the present disclosure offer multiple advantages and technical effects. The protected attributes are required for most of the bias mitigation algorithms, and therefore generation of the proxy labels for the same drastically improves the mitigation of bias from the neural network models. Apart from capturing latent protected attribute information, the proxy labels also help in downsizing the number of inputs. Further, the present disclosure provides multiple ways to generate the proxy labels for protected attributes by leveraging the information embedded in the available features without actually using the protected features in the input data. Moreover, fairness is achieved in modeling outcomes while having limited or no access to the protected information, specifically where payment networks do not have consent to use or store protected attributes of cardholders.
Various example embodiments of the present disclosure are described hereinafter with reference to FIGS. 1 to 11.
FIG. 1 illustrates an exemplary representation of an environment 100 related to at least some example embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise in a different manner. The environment 100 generally includes a payment network 112 including a payment server 114, a proxy label generation system 102, a model debiasing system 104, a neural network model training system 106, and a fairness evaluation system 108 each connected to, and in communication with (and/or with access to) a network 110. The network 110 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber-optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among the entities illustrated in FIG. 1, or any combination thereof.
Various entities in the environment 100 may connect to the network 110 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof. The network 110 may include, without limitation, a local area network (LAN), a wide area network (WAN) (e.g., the Internet), a mobile network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among two or more of the entities illustrated in FIG. 1, or any combination thereof. For example, the network 110 may include multiple different networks, such as a private network made accessible by the payment network 112 and separately, a public network (e.g., the Internet) through which the proxy label generation system 102, the model debiasing system 104, the neural network model training system 106 and the fairness evaluation system 108 may communicate.
In one embodiment, the Neural Network (NN) model training system 106 (hereinafter alternatively referred to as, “NN model training system 106”) facilitates the development of a neural network model. In at least one embodiment, the neural network model is a biased neural network model that generates biased output due to various limitations such as, but not limited to, biased input, unavailability of required data inputs, and privacy law restrictions on using protected attributes such as gender, race, etc. Many times, bias propagates to the neural network model even when protected attributes are not used during the training phase. This is attributed to the frequent incorporation of protected attribute data into other correlated non-protected attributes. Zip codes, for instance, can be associated with the race attribute. In at least one embodiment, the neural network model/model being developed in the NN model training system 106 is referred to as a biased neural network model due to the presence of one or more biases in the model at any developmental stage of the model i.e., training, testing or execution.
In one embodiment, the model debiasing system 104 (hereinafter alternatively referred to as, the debiasing system 104) is used via a communication interface over the network 110 to mitigate/remove one or more biases from the biased neural network model being developed in the NN model training system 106. The debiasing system 104 includes various debiasing algorithms/bias mitigation algorithms, generally classified into three categories, namely, pre-processing, in-processing, and post-processing. While pre-processing bias mitigation techniques attempt to transform the input before feeding it to the biased neural network model for training, post-processing strategies filter the output through certain transformations. To produce fair output, in-processing strategies strive to learn bias-invariant models by imposing certain constraints during training. Most state-of-the-art bias mitigation algorithms require information about sensitive attributes to produce an unbiased model. However, in practice, these sensitive attributes are inaccessible due to difficulties in data collection, privacy, and legal constraints imposed by governments.
Various embodiments of the present disclosure provide ways to generate proxy labels for sensitive attributes (i.e., proxy sensitive attribute labels) that can be fed to the debiasing system 104 to eventually be used for mitigating the bias from the biased output generated by the biased neural network model. The proxy label generation system 102 (hereinafter alternatively referred to as, “proxy system 102” or “system 102”) includes a processor and a memory. The proxy system 102 is configured to utilize non-protected attributes to obtain proxy sensitive labels. More specifically, the latent information associated with the protected attributes that is embedded into the available non-protected features/attributes is recovered. In one embodiment, the proxy system 102 includes a self-supervised learning algorithm to produce the contextual embeddings of input samples. An embedding with maximum information about the protected attribute is learned. Next, proxy labels are generated for favorable and unfavorable groups using an unsupervised clustering approach on the embedding obtained using the self-supervised learning algorithm.
Fairness in machine learning may be assessed as group fairness, which measures the degree of disparate treatment across different groups (e.g., female vs. male), or as individual fairness, which emphasizes that similar individuals should be treated similarly. In one embodiment, a neural network model debiased using the proxy labels is evaluated for fairness using the fairness evaluation system 108. The fairness evaluation system 108 includes various metrics to quantify fairness, each focusing on different aspects of fairness of the debiased neural network model.
In one embodiment, the proxy system 102 is a separate part of the environment 100 and may operate apart from (but still in communication with, for example, via the network 110) the payment server 114. However, in other embodiments, the proxy system 102 may actually be incorporated, in whole or in part, into one or more parts of the environment 100, for example, the payment server 114. In addition, the proxy system 102 should be understood to be embodied in at least one computing device in communication with the network 110, which may be specifically configured, via executable instructions, to perform as described herein, and/or embodied in at least one non-transitory computer-readable media. In one embodiment, the payment server 114 associated with the payment network 112 is shown. The payment network 112 may be used by the payment card issuing authorities as a payment interchange network. Examples of payment interchange networks include, but are not limited to, Mastercard® payment system interchange network. The Mastercard® payment system interchange network is a proprietary communications standard promulgated by Mastercard International Incorporated® for the exchange of financial transaction data between financial institutions that are members of Mastercard International Incorporated®. (Mastercard is a registered trademark of Mastercard International Incorporated located in Purchase, N.Y.).
The number and arrangement of systems, devices, and/or networks shown in FIG. 1 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks, and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 100 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 100.
Referring now to FIG. 2, a simplified block diagram of a system 200 is shown, in accordance with an embodiment of the present disclosure. The system 200 is similar to the proxy system 102. In some embodiments, the system 200 is embodied as a cloud-based and/or SaaS-based (software as a service) architecture. In some embodiments, the system 200 may be implemented in a server system. In one embodiment, the system 200 is a part of the payment network 112 or is integrated within the payment server 114. In another embodiment, the system 200 is similar to the model debiasing system 104, the NN model training system 106, or the fairness evaluation system 108.
The system 200 includes a computer system 202 and a database 204. The computer system 202 includes at least one processor 206 for executing instructions, a memory 208, a communication interface 210, and a user interface 216 that communicate with each other via a bus 212.
In some embodiments, the database 204 is integrated within the computer system 202. For example, the computer system 202 may include one or more hard disk drives as the database 204. A storage interface 214 is any component capable of providing the processor 206 with access to the database 204. The storage interface 214 may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a RAID controller, a SAN adapter, a network adapter, and/or any component providing the processor 206 with access to the database 204.
In one embodiment, the database 204 is configured to store a plurality of attributes inclusive of protected attributes (e.g., gender, ethnicity, etc.) and non-protected attributes (e.g., transactional features such as a transaction amount, a transaction status, a transaction time, a transaction type, etc. or card features such as a card type, a card information, etc., or merchant features such as a merchant name, MCC, an aggregate merchant type, industry, super industry, etc.), a plurality of embeddings, a plurality of learned embeddings, loss functions, probability distribution functions, downstream task labels and a plurality of proxy labels.
The processor 206 includes suitable logic, circuitry, and/or interfaces to execute operations for generating proxy sensitive attribute labels for bias mitigation. Examples of the processor 206 include, but are not limited to, a Graphics Processing Unit (GPU), an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a field-programmable gate array (FPGA), and the like. The memory 208 includes suitable logic, circuitry, and/or interfaces to store a set of computer-readable instructions for performing operations. Examples of the memory 208 include a random-access memory (RAM), a read-only memory (ROM), a removable storage drive, a hard disk drive (HDD), and the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the memory 208 in the system 200, as described herein. In another embodiment, the memory 208 may be realized in the form of a database server or cloud storage working in conjunction with the system 200, without departing from the scope of the present disclosure.
The processor 206 is operatively coupled to the communication interface 210 such that the processor 206 is capable of communicating with a remote device 218 such as the payment server 114, the proxy system 102, the debiasing system 104, the NN model training system 106, or the fairness evaluation system 108, or with any entity connected to the network 110 (as shown in FIG. 1).
It is noted that the system 200 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the system 200 may include fewer or more components than those depicted in FIG. 2.
In one embodiment, the processor 206 includes a data pre-processing engine 220, an autoencoder 222, a transformer 224, a Multi-Layer Perceptron (MLP) engine 226, a Kullback–Leibler Divergence (KLD) loss computation engine 228, an unsupervised learning algorithm 230, a bias mitigation engine 232, a trained neural network engine 234, and a fairness evaluation engine 236. It should be noted that the components, described herein, can be configured in a variety of ways, including electronic circuitries, digital arithmetic, and logic blocks, and memory systems in combination with software, firmware, and embedded technologies.
The data pre-processing engine 220 includes suitable logic and/or interfaces for pre-processing raw data and converting the raw data into structured input data that is further split into a training dataset and a test dataset. The data pre-processing engine 220 includes one or more data pre-processing algorithms for performing various tasks such as, but not limited to, noise filtering, missing value imputation, feature selection, space transformation, normalization, removal of protected attributes, and the like. Some non-exhaustive examples of data pre-processing algorithms include Iterative Partitioning Filter (IPF), K–Nearest Neighbor Imputation (KNNI), relief algorithm, Mutual Information based Features Selection (MIFS), Locally Linear Embedding (LLE), and the like.
The processor 206 is configured to utilize the input data generated excluding the protected attributes to feed to a neural network model such as the autoencoder 222 in at least one embodiment. The autoencoder 222 is trained to generate learned embeddings. The autoencoder 222 involves an encoder-decoder architecture with at least one hidden layer. For a given training input data, all the features are compressed using an encoder to an embedding layer and later decoded using a decoder. The processor 206 is configured to feed the output of the hidden layer, which is a plurality of embeddings, to the decoder and the MLP engine 226 in parallel in at least one embodiment.
The MLP engine 226 is configured to classify the plurality of embeddings into a plurality of classified downstream task labels. The plurality of downstream task labels represents expected end-tasks to be performed by the trained neural network engine 234. In at least one embodiment, the trained neural network engine 234 is a biased neural network model having at least one bias in at least one developmental stage of the biased neural network model. The MLP engine 226 consists of three types of layers, namely, an input layer, an output layer, and a hidden layer. The input layer receives the input signal / the embeddings to be processed. The required task such as prediction and classification is performed by the output layer. An arbitrary number of hidden layers that are placed in between the input and output layer are the true computational engine of the MLP engine 226. Similar to a feed-forward network, in the MLP engine 226, the data flows in the forward direction from input to the output layer. The neurons in the MLP engine 226 are trained with the backpropagation learning algorithm.
The processor 206 is configured to compute the probability distribution function of the classified downstream task labels generated by the MLP engine 226. Thereafter, the processor 206 is configured to compute a Kullback-Leibler Divergence (KLD) loss (an example of a discrepancy loss) between the computed probability distribution function and an expected probability distribution function of the plurality of downstream task labels using the KLD loss computation engine 228. Kullback-Leibler Divergence (KLD) loss is an information-theoretic measure that evaluates the discrepancy between two probability distribution functions.
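As a non-limiting illustration of how such a discrepancy loss might be computed in practice, the following is a minimal PyTorch sketch; the function name, tensor shapes, and the use of PyTorch are assumptions made only for illustration and are not part of the claimed embodiments.

```python
# Illustrative sketch: KL-divergence between the classifier's predicted label
# distribution and an expected (target) label distribution.
import torch
import torch.nn.functional as F

def kld_loss(logits: torch.Tensor, expected_dist: torch.Tensor) -> torch.Tensor:
    """KLD(expected || predicted), averaged over the batch.

    logits:        raw classifier outputs, shape (batch, num_classes)
    expected_dist: expected probability distribution over the downstream
                   task labels, shape (batch, num_classes)
    """
    log_pred = F.log_softmax(logits, dim=-1)  # log of the predicted distribution
    # F.kl_div expects log-probabilities as input and probabilities as target
    return F.kl_div(log_pred, expected_dist, reduction="batchmean")
```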
The processor 206 is also configured to compute the reconstruction loss (an example of a differentiable loss) at the output of the decoder. The overall loss is computed by the processor 206 using the reconstruction loss and the KLD loss to train the autoencoder 222 and eventually get the learned embeddings.
In an alternate example embodiment, the processor 206 is configured to utilize the input data generated including only a plurality of non-protected features to feed to another neural network model such as the transformer 224. The transformer 224 is trained to generate learned embeddings. The transformer 224 applies a self-attention mechanism to learn the inter-feature relationships. Further, the transformer 224 works on a Masked Language Modeling (MLM) concept by randomly masking some fields to get the embeddings. The masked fields are then predicted. The processor 206 is also configured to compute the cross-entropy loss (another example of the differentiable loss).
Parallelly, the embeddings are fed to the MLP engine 226. The MLP engine 226 is configured to classify the embeddings into the classified downstream task labels. The KLD loss computation engine 228 is configured to compute a KLD loss between the computed probability distribution function of the classified downstream task labels and an expected probability distribution function of the plurality of downstream task labels. The overall loss is computed by the processor 206 using the cross-entropy loss and the KLD loss to train the transformer 224 and eventually get the learned embeddings.
In at least one embodiment, the unsupervised learning algorithm 230 is utilized to identify various groups in the learned embeddings obtained either from the autoencoder 222 or the transformer 224. Clustering is a subjective statistical analysis, and there are many algorithms suitable for each data set and problem type. In one embodiment, centroid-based and hierarchical clustering unsupervised learning algorithms are used. More particularly, unsupervised learning algorithms such as K-means, hierarchical clustering, or BIRCH are used to obtain two clusters that serve as a proxy for favorable and unfavorable groups.
In one embodiment, the performance of the generated proxy labels from the unsupervised clustering/learning algorithm 230 is evaluated on existing bias mitigation techniques. The processor 206 is configured to feed the proxy labels to the bias mitigation engine 232. The bias mitigation engine 232 includes various bias mitigation algorithms such as, but not limited to, Fair Mix-up, Fairness via Representation Neutralization, Adversarial Debiasing, Optimized Pre-processing, Data Augmentation, K% removal, On Fairness and Calibration, and the like. The bias mitigation engine 232 is configured to utilize the proxy labels generated for the protected attributes to de-bias the trained Neural Network (NN) engine 234.
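By way of a hedged illustration only, the sketch below shows how the generated proxy labels could stand in for a protected attribute in one simple pre-processing mitigation, namely sample reweighing that makes the proxy group and the downstream label statistically independent; the function name and inputs are hypothetical and do not correspond to any specific algorithm in the bias mitigation engine 232.

```python
# Illustrative sketch: per-sample weights w(g, y) = P(g) * P(y) / P(g, y),
# computed with the proxy label standing in for the protected attribute.
import numpy as np

def reweigh(proxy_labels: np.ndarray, task_labels: np.ndarray) -> np.ndarray:
    """Return one weight per sample so that group and label become independent."""
    weights = np.empty(len(task_labels), dtype=float)
    for g in np.unique(proxy_labels):
        for y in np.unique(task_labels):
            mask = (proxy_labels == g) & (task_labels == y)
            p_g = (proxy_labels == g).mean()             # marginal probability of the group
            p_y = (task_labels == y).mean()              # marginal probability of the label
            p_gy = mask.mean() if mask.any() else 1e-12  # joint probability of (group, label)
            weights[mask] = p_g * p_y / p_gy
    return weights
```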
The fairness evaluation engine 236 includes suitable logic and/or interfaces for assessing the performance of the fair model (i.e., the trained NN engine 234 / model 234), which indicates no discrimination with respect to the bias or the protected attribute. Various fairness metrics such as the Demographic Parity or Statistical Parity Difference (SPD) metric, Equalized Odds Difference (EOD) metric, Average Odds Difference (AOD), and the like are included in the fairness evaluation engine 236 for evaluating the fairness of the model 234 after utilizing the proxy labels in any one of the debiasing algorithms as mentioned hereinabove. The fair model provides predictions that are not influenced by protected attributes. In one embodiment, the test data is used for bias testing of the trained NN model 234.
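For illustration, a minimal sketch of two such metrics is given below, assuming binary predictions and binary (proxy) group labels; the definitions follow one common formulation and the helper names are assumptions rather than the exact metrics of the fairness evaluation engine 236.

```python
# Illustrative sketch: Statistical Parity Difference and Equalized Odds Difference.
import numpy as np

def statistical_parity_difference(y_pred, group):
    """P(y_pred = 1 | group = 0) - P(y_pred = 1 | group = 1)."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

def equalized_odds_difference(y_true, y_pred, group):
    """Largest gap between the groups in true-positive rate or false-positive rate."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))

    def rate(g, positive):
        mask = (group == g) & (y_true == (1 if positive else 0))
        return y_pred[mask].mean()

    tpr_gap = abs(rate(0, True) - rate(1, True))
    fpr_gap = abs(rate(0, False) - rate(1, False))
    return max(tpr_gap, fpr_gap)
```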
Referring now to FIG. 3, a schematic block diagram representation 300 of the proxy label generation system 102 is shown, in accordance with an example embodiment. As shown, an input data 302 is prepared using the data pre-processing engine 220 of FIG. 2 without including protected / sensitive attributes. Pre-processing of the input data 302 is explained in detail later with reference to FIG. 4. In one embodiment, one or more self-supervised learning algorithms 304 are utilized to generate a plurality of learned embeddings 310. Some non-exhaustive examples of the self-supervised learning algorithms 304 include training one or more neural network models such as the autoencoder 222, the transformer 224, and the like. When any neural network model generates embeddings from the given features, the generated embeddings include label information (e.g., the downstream task information) and protected attribute information.
In one embodiment, the autoencoder 222 also includes the MLP engine 226 and the KLD loss computation engine 228. The MLP engine 226 provides classified downstream task labels. In various embodiments, other classification neural network models such as a Support Vector Machine (SVM), random forests, a restricted Boltzmann machine, etc. may be used instead of the MLP engine 226 without deviating from the scope of the present disclosure. The addition of the MLP engine 226 and the KLD loss computation engine 228 makes sure that the learned embeddings 310 do not represent the downstream task label information.
When the autoencoder 222 is utilized to generate the learned embeddings 310, the reconstruction loss and the KLD loss are utilized to compute the overall loss and to train the autoencoder 222. In various embodiments, Jensen–Shannon divergence, Bhattacharyya distance, f-divergence, etc. may be used instead of the KLD loss computation engine 228 (hereinafter alternatively referred to as “KLD engine 228”) without deviating from the scope of the present disclosure. The learned embeddings 310 are then fed to the unsupervised learning algorithm 230 which is configured to provide two groups of clusters, namely, favorable groups and unfavorable groups. From these clusters, a plurality of suitable proxy labels 314 is generated. This is explained in detail later with reference to FIG. 6. In one embodiment, the proxy labels 314 are then fed to the bias mitigation engine 232 for further use in debiasing the trained NN engine 234. The autoencoder 222 architecture is further explained in detail with reference to FIG. 5 later.
In one embodiment, the transformer 224 also includes the MLP engine 226 and the KLD loss computation engine 228. When the transformer 224 is utilized to generate the learned embeddings 310, the cross-entropy loss and the KLD loss are utilized to compute the overall loss to train the transformer 224. The learned embeddings 310 are then fed to the unsupervised learning algorithm 230 that is configured to provide two groups of clusters, namely, the favorable groups/clusters and the unfavorable groups/clusters. From these clusters, a plurality of suitable proxy labels 314 is generated. This is explained in detail later with reference to FIG. 6. In one embodiment, the proxy labels 314 are then fed to the bias mitigation engine 232 for further use in debiasing the trained NN engine 234. The transformer 224 architecture is further explained in detail with reference to FIG. 7 later.
FIG. 4 is a block diagram representation 400 of data pre-processing, in accordance with an example embodiment of the present disclosure. Depending on the use case and the objective to be achieved using a machine learning model, a corresponding type of data is collected. The data can be collected from various sources such as databases, files, libraries, web searches, and external repositories, and therefore, there are higher chances of bias present in such data. As shown, the raw data 402 is collected from various sources such as a source 402a, a source 402b … a source 402n (hereinafter alternatively referred to as “sources 402a-n”). The raw data 402 may be any data that is machine-readable or that may be converted into an embedding space. In the real world, the raw data 402 is generally incomplete i.e., lacking attribute values, lacking certain attributes of interest, or containing only aggregate data. Sometimes the raw data 402 is noisy i.e., containing errors or outliers. The raw data 402 may also be inconsistent i.e., containing discrepancies in codes or names. Further, the raw data 402 may or may not include the protected attributes. Since the collected raw data 402 may be in an undesired format, unorganized, or extremely large, further steps are needed to enhance its quality. Data pre-processing is a data mining technique that involves transforming the raw data 402 into an understandable format.
The raw data 402 is fed to the data pre-processing engine 220 for being converted to a suitable form (i.e., an input data 410) for a specific machine learning algorithm such as the trained NN engine 234. The raw data 402 includes a plurality of features represented as {f_1, f_2, …, f_n}. The data pre-processing engine 220 is configured to perform various operations on the raw data 402 such as importing applicable libraries, checking for any missing values, checking categorical values, splitting the raw data 402 into a training dataset and a test dataset, performing feature scaling, removing duplicates, correcting errors, normalization, data type conversions, visualizing data to help detect relevant relationships between variables or class imbalances, removing protected attributes, and the like. For example, normalization is performed for continuous features, and one-hot encoding is performed for categorical features. Alternatively, the continuous features may be binned and the categorical features may be label encoded.
Some non-exhaustive examples of the data pre-processing algorithms include missing values imputation, noise filtering, dimensionality reduction (including feature selection and space transformations), instance reduction (including selection and generation), discretization and treatment of data for imbalanced pre-processing, and the like.
The three common steps for pre-processing the raw data 402 are formatting, cleaning, and sampling. Formatting is required to ensure that all variables within the same attribute are consistently written. For example, all phone numbers, addresses, or sums of money should be written in the same format. Data cleaning is applied to remove messy data and manage missing values. Data cleaning also includes filling in the missing values with mean values or the most frequent items or just dummy values (e.g., 0). Sampling might be required if there is a huge amount of raw data 402. During exploring and prototyping, a smaller representative sample can be fed into a model to save time and costs. Thereafter, the input data 410 (i.e., the input data 302 of FIG. 3) is generated by transforming the cleaned and formatted data (not shown) into a form that is appropriate for feeding into a model. The input data 410 includes a plurality of transformed features represented as {f'_1, f'_2, …, f'_n}. The input data 410 may be generated through scaling, decomposition, or aggregation. In at least one embodiment, the input data 410 includes features that are not sensitive or protected. To obtain contextual embeddings of the input data 410, neural network architectures such as the autoencoder 222 or the transformer 224 are trained in a self-supervised fashion to efficiently encode inter-feature relationships.
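A minimal pre-processing sketch along these lines is shown below, assuming the raw data is available as a pandas DataFrame; the file name and the protected column names are hypothetical, and the sketch intentionally omits many of the steps listed above.

```python
# Illustrative sketch: drop protected attributes, impute, normalize continuous
# features, one-hot encode categorical features, and split into train/test sets.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

PROTECTED = ["gender", "race"]                 # hypothetical protected columns
raw_df = pd.read_csv("raw_data.csv")           # hypothetical data source

df = raw_df.drop(columns=PROTECTED)            # remove protected attributes
df = df.fillna(df.mode().iloc[0])              # simple missing-value imputation

continuous = df.select_dtypes(include="number").columns
categorical = df.select_dtypes(include="object").columns
df[continuous] = StandardScaler().fit_transform(df[continuous])   # normalization
df = pd.get_dummies(df, columns=list(categorical))                # one-hot encoding

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
```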
FIG. 5 represents a schematic block diagram representation 500 of generating a plurality of learned embeddings using the autoencoder 222, in accordance with an embodiment of the present disclosure. The autoencoder 222 is trained on the reconstruction loss (i.e., a differentiable loss) to obtain learned embeddings containing crucial input data details. The autoencoder 222 consists of an encoder 504, a hidden layer 508, and a decoder 506. An input data (x) 502 includes the transformed features {f'_1, f'_2, …, f'_n} (e.g., the transformed features 410 of FIG. 4), not including the protected features. The input data (x) 502 corresponds to the input data 410 as explained with reference to FIG. 4. During the encoding operation, each input feature vector of the input data (x) 502 gets mapped to a lower dimensional latent representation. In the decoding operation, the original input data (x) 502 gets reconstructed back from the latent representation. The autoencoder 222 is trained on a reconstruction loss that minimizes the mean absolute error between the input data (x) 502 and an output data (x̂) 510.
In one embodiment, an input data X (i.e., the input data (x) 502) is passed through the encoder 504 to get a latent representation in terms of a plurality of learned embeddings (h) 522 via the hidden layer 508 and is then reconstructed as a reconstructed output data (x̂) 510 by the decoder 506. The reconstructed output data (x̂) 510 includes reconstructed features represented by {f̂'_1, f̂'_2, …, f̂'_n}. This is shown in equations (1) and (2) below.
h = f1(W_i * X + b_i) …………………………. (1)
X̂ = f2(W_j * h + b_j) …………………………. (2)
where f1 and f2 are activation functions, W_i and W_j are weight matrices, and b_i and b_j are bias vectors.
The autoencoder 222 is trained on a reconstruction loss 512 represented as Loss_AE in equation (3) below.
Loss_AE = (1/n) Σ_{i=1}^{n} |X_i − X̂_i| ………………………. (3)
where n represents the number of data points in a batch.
When features are inputted into any neural network model, the generated embeddings include labels i.e., downstream task labels / downstream task information and protected attributes. The latent embeddings obtained from the encoder 504 during training contain information about the protected attribute as it is generated from features that are correlated with the protected attribute. To ensure that the generated embeddings do not contain data relevant to the downstream classification task, a Kullback-Leibler Divergence (KLD) loss 518 is incorporated into the neural network model i.e., the autoencoder 222 in at least one embodiment. To facilitate the training of the proposed neural network architecture (i.e., the autoencoder 222) on the KLD loss 518, the generated embeddings are fed into a linear layer such as the MLP engine 514. This layer/the engine 514 is employed to predict the class labels of the downstream classification task.
In at least one embodiment, the embeddings generated by the hidden layer 508 are parallelly fed to the MLP engine 514. The MLP engine 514 is similar to the MLP engine 226 of FIG. 2. As explained with reference to FIG. 2, the MLP engine 226 includes suitable logic and/or instructions to classify the embeddings into the classified downstream task labels. In one embodiment, the MLP engine 514 is a Multi-layer Perceptron classifier that connects to a Neural Network (i.e., the autoencoder 222). Unlike other classification algorithms such as Support Vectors or Naive Bayes Classifiers, the MLP Classifier relies on an underlying Neural Network (i.e., the autoencoder 222) to perform the task of classification. MLP Classifier trains iteratively since at each time step the partial derivatives of the loss function with respect to the model parameters are computed to update the parameters. Thereafter, the processor 206 of FIG. 2 is configured to compute the probability distribution function (shown as Probability (p̂) 516) of the classified downstream task labels generated by the MLP engine 514.
The KLD engine 228 is configured to compute the KLD loss 518. The KLD loss 518 is computed between the probability distribution (p̂) 516 and the target probability distribution (y). KLD evaluates the discrepancy between two probability distribution functions, the first being the probability distribution (p̂) 516 of the classified downstream task labels generated by the MLP engine 514, and the second being the expected probability distribution (exemplarily represented by ‘y’) of the downstream task labels that belong to the trained neural network model/engine 234. The KLD loss 518 (i.e., KLD(y ‖ p̂)) ensures that the autoencoder 222 can be trained to generate the learned embeddings (h) 522 that do not contain information related to the downstream task. The gradients produced from the KLD loss 518 are then propagated through the MLP engine 514 and the encoder 504 along with the reconstruction loss 512. In one embodiment, the overall loss is computed by the processor 206 using the reconstruction loss 512 and the KLD loss 518 to train the autoencoder 222 and eventually obtain the learned embeddings (h) 522. The overall loss is represented in equation (4) below.
Overall Loss = MAE(x, x̂) + KLD(y ‖ p̂) ……………… (4)
where MAE is the reconstruction loss 512 computed as a Mean Absolute Error (MAE) loss.
This ensures that the learned embeddings (h) 522 are free from information that could be used to differentiate between different classes of the downstream task.
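For illustration, a minimal PyTorch sketch of this training objective is shown below: the reconstruction (MAE) loss of equation (3) is combined with the KLD term of equation (4) computed on the MLP head's predictions. The layer sizes, the uniform expected label distribution, and all names are assumptions made for the sketch rather than the claimed implementation.

```python
# Illustrative sketch: autoencoder with an MLP head, trained on MAE + KLD.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyAutoencoder(nn.Module):
    def __init__(self, n_features: int, emb_dim: int, n_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, emb_dim))
        self.decoder = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))
        self.mlp_head = nn.Sequential(nn.Linear(emb_dim, 32), nn.ReLU(),
                                      nn.Linear(32, n_classes))

    def forward(self, x):
        h = self.encoder(x)          # learned embedding (h)
        x_hat = self.decoder(h)      # reconstruction (x̂)
        logits = self.mlp_head(h)    # downstream-label prediction
        return h, x_hat, logits

def overall_loss(x, x_hat, logits, expected_dist):
    mae = F.l1_loss(x_hat, x)                                # reconstruction loss, equation (3)
    kld = F.kl_div(F.log_softmax(logits, dim=-1),            # KLD(y || p̂) term of equation (4)
                   expected_dist, reduction="batchmean")
    return mae + kld

# One hypothetical training step on a random batch of non-protected features.
model = ProxyAutoencoder(n_features=20, emb_dim=16, n_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(128, 20)
expected = torch.full((128, 2), 0.5)                         # assumed expected label distribution
h, x_hat, logits = model(x)
loss = overall_loss(x, x_hat, logits, expected)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```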
Referring now to FIG. 6, a schematic block diagram representation 600 of generating a plurality of proxy labels, i.e., the proxy labels 314, using the unsupervised learning algorithm 230 is shown, in accordance with one embodiment of the present disclosure. The unsupervised learning algorithm 230 is used to identify various groups in the learned embeddings 522 obtained from the trained autoencoder 222. Some non-exhaustive examples of the unsupervised learning algorithm 230 include K-means, hierarchical clustering, BIRCH, and the like.
In one example embodiment, the unsupervised learning algorithm 230 is Lloyd's k-means algorithm. ‘K’ refers to the total number of clusters to be defined in the entire dataset. There is a centroid chosen for a given cluster type which is used to calculate the distance of a given data point. The distance essentially represents the similarity of features of a data point to a cluster type. In other words, the K-means algorithm identifies ‘K’ number of centroids, and then allocates every data point to the nearest cluster, while keeping the centroids as small as possible. The ‘means’ in K-means refers to averaging of the data, i.e., finding the centroid. The goal is to group data into ‘K’ clusters. The K-means clustering algorithm uses iterative refinement to produce a final result. The algorithm inputs are the number of clusters ‘K’ and the data set. The data set is a collection of features for each data point (i.e., the learned embeddings (h) 522). Determining the right number of clusters in a data set is important, not only because some clustering algorithms like K-means require such a parameter, but also because the appropriate number of clusters controls the proper granularity of cluster analysis. There are many possible ways to estimate the number of clusters such as cross-validation, information criteria, the information-theoretic jump method, the G-means algorithm, and the like.
As shown by a clustering graph 608, two clusters namely, a favorable cluster 602 and an unfavorable cluster 604 are obtained that serve as the proxy labels 314. In one embodiment, a proportion of positive samples in each of the clusters is determined and the cluster having a high proportion of the positive samples is tagged as the favorable cluster 602. In other words, the group which has more likelihood of getting a positive outcome just because of their protected attributes is referred to as the favorable cluster 602, and the group which has more likelihood of getting a negative outcome just because of their protected attributes is referred to as the unfavorable cluster 604. The proxy labels 314 are determined for the favorable cluster 602 and the unfavorable cluster 604 by leveraging the bias information embedded in the non-sensitive features available in the given input data (i.e., input data (x) 502 of FIG. 5).
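A minimal sketch of this clustering step is shown below, assuming the learned embeddings and the downstream task labels are available as NumPy arrays; the use of scikit-learn's K-means and the variable names are illustrative choices only.

```python
# Illustrative sketch: cluster the learned embeddings into two groups and tag
# the cluster with the higher proportion of positive outcomes as favorable.
import numpy as np
from sklearn.cluster import KMeans

def generate_proxy_labels(embeddings: np.ndarray, task_labels: np.ndarray) -> np.ndarray:
    """Return 1 for samples in the favorable cluster and 0 otherwise."""
    cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
    # proportion of positive downstream outcomes inside each cluster
    positive_rate = [task_labels[cluster_ids == c].mean() for c in (0, 1)]
    favorable = int(np.argmax(positive_rate))
    return (cluster_ids == favorable).astype(int)
```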
In one example embodiment, a customized loss function is utilized for clustering to ensure that the clusters do not represent class labels / downstream task labels. The customized loss function includes two parts, one for focusing on the quality of the clusters and the second one for making sure the clusters do not represent class labels using the KLD loss. The customized loss function is represented by equation (5) as shown below.
J = Σ_{j=1}^{k} Σ_{i=1}^{n} ‖x_i^{(j)} − c_j‖² + Σ_{i=1}^{n} Σ_{j=1}^{k} KL(L_i ‖ C_j) ………………… (5)
where J is the objective function, k is the number of clusters, n is the number of cases, x_i^{(j)} is the case i, c_j is the centroid for cluster j, and ‖x_i^{(j)} − c_j‖ is the distance function.
FIG. 7 represents a schematic block diagram representation 700 of generating a plurality of learned embeddings using the transformer 224, in accordance with an embodiment of the present disclosure. In one embodiment, the input data 410 including the transformed features {f'_1, f'_2, …, f'_n} is fed to the transformer architecture. As explained with reference to FIG. 4, the transformed features are pre-processed using the data pre-processing engine 220 and do not contain the protected attributes. Further, the transformed features {f'_1, f'_2, …, f'_n} are represented as f'_1 410a, f'_2 410b, …, f'_n 410n. In one embodiment, the transformer 224 is trained on a self-supervised learning task called Masked Language Modelling (MLM). Towards this, 15% of the input data fields/features are chosen randomly and replaced with a mask token. In at least one embodiment, f'_2 410b is a randomly masked transformed feature as shown.
The input data 410 is fed to a static embedding layer 702 of the transformer 224. The static embedding layer 702 is configured to process the features/samples to produce contextual row embeddings. The corresponding embeddings of the features are represented as field1 704a, field2 704b, ..., fieldn 704n. For example, the input data 410 may include ten features such as age, work class, education, marital status, hours per week, capital gain, capital loss, and so on, represented as $\{f_1', f_2', \ldots, f_{10}'\}$. Each field contains an embedding of that particular feature. For example, if there are ten features and each field contains an embedding of 16 dimensions, then the input (i.e., 704a, 704b, ..., 704n) given to a multiheaded self-attention module 750 of the transformer 224 is of size 10x16.
In at least one embodiment, the transformer 224 utilizes the multiheaded self-attention module 750 (hereinafter alternatively referred to as “attention module 750”) to learn the embeddings and the inter-feature relationship. The attention module 750 repeats its computations multiple times in parallel. Each of these is called an attention head. The attention module 750 splits its Query, Key, and Value parameters N-ways and passes each split independently through a separate Head. All of these similar attention calculations are then combined together to produce a final attention score. This is called multi-head attention and gives the transformer 224 greater power to encode multiple relationships and nuances for each feature. To compute self-attention, first, three vectors, Query (Q), Key (K), and Value (V), are learned corresponding to each feature in the input, and then the attention is computed as shown in the equations (6) and (7).
$Attention(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$ …………. (6)
where d_k is the dimension of the Query (Q) and the Key (K) vectors. The product of Q and K is a vectorized implementation of the dot product which measures the similarity between the two vectors.
$head_i = Attention\left(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}\right)$ …………… (7)
There are three separate linear layers for the Query, Key, and Value, and each linear layer has its own weight matrix, as represented in equation (7). The input is passed through these linear layers to produce the Q, K, and V matrices, which are three distinct matrices corresponding to three separate linear transformations. The weight matrices are initialized randomly and learned during training. Finally, the self-attended embeddings are obtained as shown in equation (8).
$h = Concat\left(head_1, \ldots, head_h\right)W^{O}$ …………… (8)
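The following is a compact, illustrative PyTorch sketch of the multi-headed self-attention of equations (6) to (8), using the 10-field by 16-dimension example above; it is not the transformer 224 itself, and the layer sizes and head count are assumptions.

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-headed self-attention following equations (6) to (8)."""
    def __init__(self, d_model: int = 16, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_k = n_heads, d_model // n_heads
        # Separate linear layers produce Q, K, V; W^O recombines the heads.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_fields, d_model)
        b, n, d = x.shape
        # Split into heads: (batch, n_heads, n_fields, d_k)
        def split(t):
            return t.view(b, n, self.n_heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        # Equation (6): scaled dot-product attention per head.
        scores = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), dim=-1)
        heads = scores @ v
        # Equation (8): concatenate the heads and project with W^O.
        h = heads.transpose(1, 2).contiguous().view(b, n, d)
        return self.w_o(h)

# Example: row embeddings of 10 fields x 16 dims for a batch of 32 rows.
# out = MultiHeadSelfAttention()(torch.randn(32, 10, 16))
```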
Next, an MLM (Masked Language Modelling) head 708, including a plurality of MLP layers (such as the MLP engine 226 of FIG. 2), reconstructs the original fields, including the masked features, from the row embeddings (i.e., 704a, 704b, ..., 704n). An output data 710 representing the reconstructed features is shown as $\hat{f}_1'$ 710a, $\hat{f}_2'$ 710b, ..., $\hat{f}_n'$ 710n. The MLM head 708 trains the transformer 224 to predict a random sample of input tokens that have been replaced by a mask placeholder (e.g., the randomly masked transformed feature $f_2'$) in a multi-class setting over the entire input data 410.
$p_i = \mathrm{Softmax}\left(MLP(h)\right)$ …………… (9)
where $p_i$ represents the probability of predicting $y_i$, which is a masked feature (e.g., $f_2'$), from the MLM head 708.
The transformer 224 is trained end-to-end by minimizing the cross-entropy loss as shown in equation (10). In one embodiment, the loss is calculated only on the masked fields.
$Loss_T = -\sum_{i=1}^{M} y_i \log(p_i)$ …………… (10)
where $y_i$ is a masked feature.
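A minimal sketch of computing the loss of equation (10) only on the masked fields, assuming the MLM head outputs per-field logits; names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def masked_mlm_loss(logits: torch.Tensor, targets: torch.Tensor, mask: torch.Tensor):
    """Cross-entropy of equation (10), averaged only over the masked positions.

    logits  : (batch, n_fields, vocab_size) predictions from the MLM head.
    targets : (batch, n_fields) original field token ids.
    mask    : (batch, n_fields) boolean, True where the field was masked.
    """
    # Keep only the masked positions so the loss ignores unmasked fields.
    masked_logits = logits[mask]      # (n_masked, vocab_size)
    masked_targets = targets[mask]    # (n_masked,)
    return F.cross_entropy(masked_logits, masked_targets)
```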
As explained with reference to FIG. 5, when features are inputted to any neural network model, the generated embeddings include label information, i.e., downstream task labels / information, as well as protected attributes. To ensure that the generated embeddings do not contain data relevant to the downstream classification task, in at least one embodiment, a Kullback-Leibler Divergence (KLD) loss 716 is incorporated into the transformer 224. To train the transformer 224 on the KLD loss 716, the embeddings are fed to an MLP engine 712 (similar to the MLP engine 226 of FIG. 2). The engine 712 predicts the class labels of the downstream classification task, i.e., the classified downstream task labels.
In at least one embodiment, the self-attended latent embeddings generated by the multiheaded self-attention module 750 are fed in parallel to the MLP engine 712. The MLP engine 712 is similar to the MLP engine 226 of FIG. 2 and the MLP engine 514 of FIG. 5. As explained earlier with reference to FIG. 2, the MLP engine 226 includes suitable logic and / or instructions to assign a plurality of downstream task labels to the corresponding plurality of embeddings to generate a plurality of classified downstream task labels. Thereafter, the processor 206 of FIG. 2 is configured to compute the probability distribution function (shown as probability ($\hat{p}$) 714) of the classified downstream task labels generated by the MLP engine 712.
In one embodiment, the KLD engine 228 is configured to compute the KLD loss 716. The KLD loss 716 is computed between the probability distribution function ($\hat{p}$) 714 and the target / expected probability distribution function ($y_2$) of the plurality of downstream task labels. KLD evaluates the discrepancy between the probability distribution ($\hat{p}$) 714 of the classified downstream task labels generated by the MLP engine 712 and the expected probability distribution function (exemplarily represented by '$y_2$') of the downstream task labels. The KLD loss 716 (i.e., $KLD(y_2 \,\|\, \hat{p})$) ensures that the transformer 224 can be trained to generate a plurality of learned embeddings (h) 706 that do not contain information related to the downstream classification task. In one embodiment, the overall loss is computed by the processor 206 using the cross-entropy loss computed in equation (10) and the KLD loss 716 to train the transformer 224 and eventually obtain the learned embeddings (h) 706. The overall loss is represented in equation (11) below.
$Overall\ loss = -\sum_{i=1}^{M} y_i \log(p_i) + KLD\left(y_2 \,\|\, \hat{p}\right)$ …………………. (11)
This ensures that the learned embeddings (h) 706 are free from information that could be used to differentiate between different classes of the downstream task. The learned embeddings (h) 706 obtained from the transformer 224 contain information about the protected attribute due to the transformer's inherent ability to learn the inter-feature relationships.
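A minimal sketch of the overall loss of equation (11), assuming $\hat{p}$ and $y_2$ are available as probability tensors; the use of torch.nn.functional.kl_div here is one way to realize the KLD term and is an assumption, not the disclosed implementation.

```python
import torch
import torch.nn.functional as F

def overall_loss(logits, targets, mask, p_hat, y2, eps=1e-9):
    """Overall loss of equation (11): masked cross-entropy plus KLD(y_2 || p_hat).

    p_hat : (batch, n_classes) predicted downstream-label distribution (MLP head).
    y2    : (batch, n_classes) expected downstream-label distribution.
    """
    # Cross-entropy term, computed only on the masked fields (equation (10)).
    ce = F.cross_entropy(logits[mask], targets[mask])
    # F.kl_div expects log-probabilities as input and probabilities as target,
    # giving KL(target || input) = KLD(y_2 || p_hat); eps guards against log(0).
    kld = F.kl_div((p_hat + eps).log(), y2 + eps, reduction="batchmean")
    return ce + kld
```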
In at least one embodiment, the unsupervised learning algorithm 230 is used to identify various groups/clusters in the learned embeddings 706 obtained from the trained transformer 224. As explained with reference to FIG. 6, one or more unsupervised learning algorithms 230 such as K-means, hierarchical clustering, BIRCH, and the like are utilized to generate two clusters, namely a favorable cluster and an unfavorable cluster, that serve as the proxy labels such as the proxy labels 314. In one example embodiment, a customized loss function is utilized for clustering to ensure that the clusters do not represent class labels / downstream task labels, as explained hereinabove with reference to FIG. 6 and equation (5).
Various embodiments of the present disclosure aim to provide proxy labels for sensitive attributes to make the present bias mitigation approaches suitable for real-world applications where access to protected attributes during model training is constrained. Ideally, the likelihood of a positive outcome should be the same regardless of a person's protected group. However, in real life, this does not hold true: the bias propagates through non-sensitive attributes that are correlated with the sensitive attributes. When such data is mapped to the high-dimensional latent space, clusters of the different demographic groups that exist in the data are produced to generate the proxy labels. Apart from revealing protected attribute information, the proxy labels also help in downsizing the number of inputs, i.e., providing a solution for extreme classification problems. For example, if the input marker is set as aggregate merchant type, then the number of inputs would be 5000 classes. Labeling such a huge number of inputs using a supervised learning algorithm is very difficult. Using the unsupervised learning algorithm 230, the number of inputs is gradually reduced to a smaller number, which further increases the overall accuracy of the trained neural network engine 2346. Over-clustering regularisation positively affects the decision boundary and feature generalization.
FIG. 8 represents a schematic block diagram representation 800 of utilizing the plurality of proxy labels for applying a plurality of bias mitigation algorithms for mitigating one or more biases in any developmental stage of a biased neural network model, in accordance with an embodiment of the present disclosure. Bias is the inability of the machine learning algorithm to capture the true relationship present in the data. Bias could be introduced at various phases of the development of a biased neural network model 812 (hereinafter alternatively referred to as "biased model 812"), including through insufficient data, inconsistent data collection, and poor data practices. If the machine learning pipeline contains inherent biases, the biased model 812 makes wrong predictions. When creating a new machine learning model, it is vital to identify, assess, and eliminate any biases that may influence its predictions. Various embodiments of the present disclosure provide means to mitigate biases introduced in the pipeline of developing the biased neural network model 812. The proxy labels 314 generated for the protected attributes are used instead of the truly protected labels due to limitations such as privacy laws and the like. The proxy labels 314 are fed to various bias mitigation techniques at different stages of the model development to remove the biases.
As shown, a pre-processing bias mitigation algorithm 804 mitigates bias by removing the underlying discrimination from a training data 802 which is used in the first phase of the AI development process of the biased model 812. Bias occurs when the collected data is not representative of the environment in which the program is expected to operate. Further, bias occurs when some feature(s) are excluded from the dataset, usually during data wrangling. When there is a large amount of data, e.g., petabytes of data, choosing a small sample for training purposes is the best option, but while doing so, features might be accidentally excluded from the sample, resulting in a biased sample. There can also be exclusion bias due to removing duplicates from the sample. Also, the exclusion of the protected attributes (e.g., a specific gender being more or less likely to get car insurance) makes the training data 802 biased. Optimized Pre-processing, Data Augmentation, or K% removal may be used as the pre-processing bias mitigation algorithm 804. The proxy labels 314 generated for any protected attributes are fed to such a pre-processing bias mitigation algorithm 804 to generate a pre-processed data 806. The pre-processed data 806 does not include any bias in the input data and is ready to be fed for training the biased model 812.
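As a non-limiting illustration of feeding the proxy labels 314 into a pre-processing stage, the following sketch uses a simple reweighing-style scheme as a stand-in for the pre-processing algorithms named above; it is not any of those algorithms, and the names and inputs are illustrative.

```python
import numpy as np

def reweigh_samples(proxy: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Give each (proxy group, outcome) pair the weight
    P(group) * P(outcome) / P(group, outcome), so that the proxy label and the
    outcome become statistically independent in the weighted training data."""
    weights = np.ones(len(y), dtype=float)
    for g in np.unique(proxy):
        for c in np.unique(y):
            idx = (proxy == g) & (y == c)
            expected = (proxy == g).mean() * (y == c).mean()
            observed = idx.mean()
            if observed > 0:
                weights[idx] = expected / observed
    return weights

# The weights can be passed to most classifiers,
# e.g. model.fit(X, y, sample_weight=reweigh_samples(proxy_labels, y)).
```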
In various cases, the bias in the biased model 812 is not only due to the biased training data 802, and therefore, despite generating the pre-processed data 806, bias may still exist in the biased model 812 for other reasons. As shown, an in-processing bias mitigation algorithm 810 provides modifications to the biased model 812 to mitigate bias during the model training phase and generate an unbiased trained neural network model 814 (hereinafter alternatively referred to as "unbiased model 814"). The in-processing bias mitigation algorithm 810 describes the set of interventions and constraints enforced during the learning process of the biased model 812. Classification with Fairness Constraints, Prejudice Remover Regularizer, Adversarial Debiasing, Fair Mixup, RNF, etc. are some non-exhaustive examples of the in-processing bias mitigation algorithm 810. For example, consider a bank attempting to assess a customer's "ability to repay" before approving a loan. The model 812 may predict this ability based on sensitive variables such as race, gender, and the like. This can be overcome by feeding the proxy labels 314 generated for such sensitive attributes to adversarial debiasing or the prejudice remover. As shown, a test data 820 is fed to the unbiased model 814, which is capable of generating unbiased predictions 824 after the bias removal during the training phase.
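As a non-limiting illustration of an in-processing approach, the following sketch shows a highly simplified adversarial-debiasing training step in which an adversary tries to recover the proxy label 314 from the model's output; all architectures, dimensions, and hyperparameters are assumptions, and this is not the reference implementation of any of the algorithms named above.

```python
import torch
import torch.nn as nn

# Illustrative predictor and adversary; shapes and sizes are assumed.
predictor = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
adversary = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(x, y, proxy, lam=1.0):
    """x: (batch, 16) features; y, proxy: (batch, 1) floats in {0, 1}."""
    # 1) Update the adversary to predict the proxy label from the task logits.
    logits = predictor(x).detach()
    opt_a.zero_grad()
    adv_loss = bce(adversary(logits), proxy)
    adv_loss.backward()
    opt_a.step()
    # 2) Update the predictor to fit the task while fooling the adversary.
    opt_p.zero_grad()
    logits = predictor(x)
    task_loss = bce(logits, y)
    fairness_penalty = -bce(adversary(logits), proxy)  # maximise adversary error
    (task_loss + lam * fairness_penalty).backward()
    opt_p.step()
    return task_loss.item()
```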
Many times, certain parameters of a trained model create unfair or subjective outcomes, for example, the model unfairly favors someone or something over another person or thing. The bias can exist because of the design of the trained model. For example, a trained model decides whether to approve credit card applications, and the data fed to it includes the gender of the applicant. On this basis, the trained model might infer that women earn less than men, and therefore reject women's applications. Such biases can be removed from a trained model using the proxy labels 314 in the post-processing stage of the model development. In an alternate example embodiment, the biased model 812 is a trained biased model (not shown) which is not capable of generating unbiased predictions 824. In order to make the trained biased model unbiased, a post-processing bias mitigation algorithm (not shown) can be used to mitigate bias by equalizing the odds post-training in the trained biased model. To apply a post-processing bias mitigation algorithm in practice, test data is used to determine how outputs should be modified in order to limit bias and generate an unbiased output. Post-processing bias mitigation algorithms such as On Fairness and Calibration, Equalized Odds, Reject Option Classification, and the like may be used along with the proxy labels 314 to generate the unbiased output.
For example, equality of odds is enforced on a trained biased model for the task of predicting the income of a person, in particular, predicting whether the income is >$50,000 for various attributes about the person as made available in the University of California, Irvine (UCI) machine learning repository adult dataset. Further, there are 14 attributes such as income >50,000, income <=50,000, age, work-class, education, marital status, occupation, relationship, race, sex, capital gain, capital loss, hours per week, and native country. As can be seen, race, sex, age, etc. are sensitive attributes for which the proxy labels can be generated using the autoencoder 222 or the transformer 224 by feeding the learned embeddings to the unsupervised learning algorithm 230 as explained earlier with reference to FIGS. 5, 6, and 7.
In at least one embodiment, the fairness of the unbiased model 814, when the proxy labels 314 are used along with the bias mitigation algorithms, is evaluated using the Statistical Parity Difference (SPD) and the Equalized Odds Difference (EOD). The Statistical Parity Difference (SPD) is computed to determine whether the prediction $\hat{Y}$ on input features X is independent of the protected attribute S for a classifier. SPD is computed using the equation (12).
$SPD = \left| P(\hat{Y} = 1 \mid S = 0) - P(\hat{Y} = 1 \mid S = 1) \right|$ …………. (12)
The Equalized Odds Difference (EOD) is computed to assess whether the unbiased model 814 is fair across both the privileged and unprivileged groups. The predictor $\hat{Y}$ should have an equal False Positive Rate (FPR) and False Negative Rate (FNR) across the groups. This constraint enforces that accuracy is equally high in all demographics since the rates of positive and negative classification are equal across the groups. The notion of fairness here is that the chances of being correctly or incorrectly classified as positive should be equal for every group. EOD is computed as shown below in equations (13) to (15).
$\Delta FPR = \left| P(\hat{Y} = 1 \mid S = 1, Y = 0) - P(\hat{Y} = 1 \mid S = 0, Y = 0) \right|$ …………… (13)
………… (14)
……….. (15)
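A minimal sketch of computing the SPD of equation (12) and the FPR gap of equation (13) from predictions, ground-truth labels, and the proxy label used as the protected attribute S; the function names and inputs are illustrative.

```python
import numpy as np

def spd(y_pred: np.ndarray, s: np.ndarray) -> float:
    """Statistical Parity Difference, equation (12):
    |P(y_pred = 1 | s = 0) - P(y_pred = 1 | s = 1)|, with s the proxy label."""
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())

def delta_fpr(y_pred: np.ndarray, y_true: np.ndarray, s: np.ndarray) -> float:
    """False-positive-rate gap of equation (13):
    |P(y_pred = 1 | s = 1, y = 0) - P(y_pred = 1 | s = 0, y = 0)|."""
    fpr_priv = y_pred[(s == 1) & (y_true == 0)].mean()
    fpr_unpriv = y_pred[(s == 0) & (y_true == 0)].mean()
    return abs(fpr_priv - fpr_unpriv)
```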
FIG. 9 represents a flow diagram of a computer-implemented method 900 for generating proxy sensitive attribute labels for bias mitigation, in accordance with an example embodiment. The method 900 depicted in the flow diagram may be executed by the system 102 or the system 200. Operations of the method 900, and combinations of operations in the method 900, may be implemented by, for example, hardware, firmware, a processor, circuitry and/or a different device associated with the execution of software that includes one or more computer program instructions. The method 900 starts at operation 902.
At operation 902, the method 900 includes feeding, by a processor 206, an input data to a neural network model configured to generate a plurality of embeddings. The input data includes only a plurality of non-protected features. The neural network model is the autoencoder 222 or the transformer 224. The input data is pre-processed by the data pre-processing engine 220 of FIG. 2.
At 904, the method 900 includes computing, by the processor 206, a differentiable loss from an output data generated by the neural network model. In case of the autoencoder 222, the differentiable loss is a reconstruction loss computed for the autoencoder 222. In case of the transformer 224, the differentiable loss is a cross-entropy loss computed for the transformer 224.
At 906, the method 900 includes feeding, by the processor 206, the plurality of embeddings to a classification neural network model configured to assign a plurality of downstream task labels to the corresponding plurality of embeddings to generate a plurality of classified downstream task labels. The plurality of downstream task labels belongs to a biased neural network model having at least one bias in at least one developmental stage of the biased neural network model. The classification neural network model is a Multi-Layer Perceptron (MLP) neural network model.
At 908, the method 900 includes computing, by the processor 206, a probability distribution function of the plurality of classified downstream task labels.
At 910, the method 900 includes computing, by the processor 206, a discrepancy loss between the probability distribution function of the plurality of classified downstream task labels and an expected probability distribution function of the plurality of downstream task labels. The discrepancy loss is computed for removing a downstream task label information from the plurality of embeddings. The discrepancy loss is a Kullback–Leibler Divergence (KLD) loss.
At 912, the method 900 includes training, by the processor 206, the neural network model based on an overall loss computed from the differentiable loss and the discrepancy loss. The trained neural network model is configured to generate a plurality of learned embeddings to be fed to an unsupervised clustering algorithm to generate a corresponding plurality of proxy labels for mitigating the at least one bias in the at least one developmental stage of the biased neural network model. The method 900 stops at the step 912.
FIG. 10 represents a flow diagram of another computer-implemented method 1000 for generating proxy sensitive attribute labels for bias mitigation, in accordance with an example embodiment. The method 1000 depicted in the flow diagram may be executed by the system 102 or the system 200. Operations of the method 1000, and combinations of operations in the method 1000, may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or a different device associated with the execution of software that includes one or more computer program instructions. The method 1000 starts at operation 1002.
At operation 1002, the method 1000 includes feeding, by a processor 206, a plurality of learned embeddings generated by a trained neural network model to an unsupervised learning algorithm. The trained neural network model is previously trained using an overall loss computed from a differentiable loss and a discrepancy loss. The discrepancy loss is computed between a probability distribution function of a plurality of classified downstream task labels and an expected probability distribution function of a plurality of downstream task labels.
At operation 1004, the method 1000 includes generating, by the processor 206, two clusters using the unsupervised learning algorithm. As explained hereinabove with reference to FIG. 6, a customized loss function may be utilized for training the unsupervised learning algorithm to ensure that the clusters do not represent the downstream task labels.
At operation 1006, the method 1000 includes assigning, by the processor 206, each of the two clusters as a favorable cluster and an unfavorable cluster to generate a plurality of proxy labels.
At operation 1008, the method 1000 includes feeding, by the processor 206, the plurality of proxy labels to at least one bias mitigation algorithm.
At operation 1010, the method 1000 includes applying, by the processor 206, the at least one bias mitigation algorithm for mitigating at least one bias in at least one developmental stage of a biased neural network model. The method 1000 ends at operation 1010.
Various embodiments of the present disclosure provide multiple advantages. Various embodiments provide multiple ways to generate proxy labels for protected attributes by leveraging the information embedded in the available features without actually using the protected features in the input data. Further, fairness is achieved in modeling outcomes while having limited/no access to the protected information, specifically, payment networks do not have consent to use or store protected attributes of cardholders. As the protected attributes are required for most of the bias mitigation algorithms, the generation of the proxy labels for the same improves the mitigation of bias from the neural network models drastically. A customized loss function is utilized for clustering to ensure that clusters do not represent downstream task labels.
FIG. 11 is a simplified block diagram of a server system 1100, in accordance with one embodiment of the present disclosure. In one embodiment, the server system 1100 is an example of a server system that includes a proxy label generation system 1102a. The proxy label generation system 1102a is the same as the proxy label generation system 102 shown and explained with reference to FIG. 1. In one embodiment, the server system 1100 is the payment server 114 of FIG. 1. The server system 1100 includes a processing system 1102 configured to extract programming instructions from a memory 1104 to provide various features of the present disclosure. In at least one embodiment, the processing system 1102 is a Graphics Processing Unit (GPU). In one embodiment, the server system 1100 is configured to generate proxy labels for protected attributes for bias mitigation in an artificial intelligence model.
Via a communication interface 1106, the processing system 1102 receives information from a remote device 1108 such as the neural network training system 106, the model debiasing system 104, the fairness evaluation system 108, the payment server 114, and the like. The processing system 1102 also includes the proxy label generation system 1102a. The server system 1100 may perform similar operations as performed by the system 200 for data pre-processing, generating the plurality of embeddings, generating the plurality of learned embeddings, computing various loss functions, computing various probability distribution functions, generating clusters of favorable and unfavorable groups, utilizing the generated proxy labels for input to the bias mitigation algorithms, removing bias from the biased neural network model and the like.
The components of the server system 1100 provided herein may not be exhaustive, and the server system 1100 may include more or fewer components than those depicted in FIG. 11. Further, two or more components may be embodied in one single component, and/or one component may be configured using multiple sub-components to achieve the desired functionalities. Some components of the server system 1100 may be configured using hardware elements, software elements, firmware elements, and/or a combination thereof.
The disclosed method with reference to FIG. 9 and FIG. 10, or one or more operations of the system 200 may be implemented using software including computer-executable instructions stored on one or more computer-readable media (e.g., non-transitory computer-readable media, such as one or more optical media discs, volatile memory components (e.g., DRAM or SRAM)), or nonvolatile memory or storage components (e.g., hard drives or solid-state nonvolatile memory components, such as Flash memory components) and executed on a computer (e.g., any suitable computer, such as a laptop computer, netbook, Web book, tablet computing device, smartphone, or other mobile computing devices). Such software may be executed, for example, on a single local computer or in a network environment (e.g., via the Internet, a wide-area network, a local-area network, a remote web-based server, a client-server network (such as a cloud computing network), or other such networks) using one or more network computers. Additionally, any of the intermediate or final data created and used during the implementation of the disclosed methods or systems may also be stored on one or more computer-readable media (e.g., non-transitory computer-readable media) and are considered to be within the scope of the disclosed technology. Furthermore, any of the software-based embodiments may be uploaded, downloaded, or remotely accessed through a suitable communication means. Such suitable communication means include, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber optic cable), magnetic communications, electromagnetic communications (including RF, microwave, and infrared communications), electronic communications, or other such communication means.
Although the invention has been described with reference to specific exemplary embodiments, it is noted that various modifications and changes may be made to these embodiments without departing from the broad spirit and scope of the invention. For example, the various operations, blocks, etc., described herein may be enabled and operated using hardware circuitry (for example, complementary metal-oxide semiconductor (CMOS) based logic circuitry), firmware, software, and/or any combination of hardware, firmware, and/or software (for example, embodied in a machine-readable medium). For example, the apparatuses and methods may be embodied using transistors, logic gates, and electrical circuits (for example, application-specific integrated circuit (ASIC) circuitry and/or in Digital Signal Processor (DSP) circuitry).
Particularly, the system 200 and its various components may be enabled using software and/or using transistors, logic gates, and electrical circuits (for example, integrated circuit circuitry such as ASIC circuitry). Various embodiments of the invention may include one or more computer programs stored or otherwise embodied on a computer-readable medium, wherein the computer programs are configured to cause a processor of the computer to perform one or more operations. A computer-readable medium storing, embodying, or encoded with a computer program, or similar language, may be embodied as a tangible data storage device storing one or more software programs that are configured to cause a processor or computer to perform one or more operations. Such operations may be, for example, any of the steps or operations described herein. In some embodiments, the computer programs may be stored and provided to a computer using any type of non-transitory computer-readable media. Non-transitory computer-readable media include any type of tangible storage media. Examples of non-transitory computer-readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g. magneto-optical disks), CD-ROM (compact disc read-only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (BLU-RAY® Disc), and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash memory, RAM (random access memory), etc.). Additionally, a tangible data storage device may be embodied as one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices. In some embodiments, the computer programs may be provided to a computer using any type of transitory computer-readable media. Examples of transitory computer-readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer-readable media can provide the program to a computer via a wired communication line (e.g., electric wires, and optical fibers) or a wireless communication line.
Various embodiments of the invention, as discussed above, may be practiced with steps and/or operations in a different order, and/or with hardware elements in configurations, which are different than those which are disclosed. Therefore, although the invention has been described based upon these exemplary embodiments, it is noted that certain modifications, variations, and alternative constructions may be apparent and well within the spirit and scope of the invention.
Although various exemplary embodiments of the invention are described herein in a language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as exemplary forms of implementing the claims.
Claims:
WE CLAIM:
1. A computer-implemented method comprising:
feeding, by a processor, an input data to a neural network model configured to generate a plurality of embeddings, wherein the input data comprises only a plurality of non-protected features;
computing, by the processor, a differentiable loss from an output data generated by the neural network model;
feeding, by the processor, the plurality of embeddings to a classification neural network model configured to assign a plurality of downstream task labels to the corresponding plurality of embeddings to generate a plurality of classified downstream task labels, wherein the plurality of downstream task labels belongs to a biased neural network model having at least one bias in at least one developmental stage of the biased neural network model;
computing, by the processor, a probability distribution function of the plurality of classified downstream task labels;
computing, by the processor, a discrepancy loss between the probability distribution function of the plurality of classified downstream task labels and an expected probability distribution function of the plurality of downstream task labels, the discrepancy loss computed for removing a downstream task label information from the plurality of embeddings; and
training, by the processor, the neural network model based on an overall loss computed from the differentiable loss and the discrepancy loss, wherein the trained neural network model is configured to generate a plurality of learned embeddings to be fed to an unsupervised clustering algorithm to generate a corresponding plurality of proxy labels for mitigating the at least one bias in the at least one developmental stage of the biased neural network model.
2. The method as claimed in claim 1, wherein the classification neural network model is a Multi-Layer Perceptron (MLP) neural network model.
3. The method as claimed in claim 2, wherein the discrepancy loss is a Kullback–Leibler Divergence (KLD) loss.
4. The method as claimed in claim 3, wherein the neural network model is an autoencoder.
5. The method as claimed in claim 4, wherein the differentiable loss is a reconstruction loss computed for the autoencoder.
6. The method as claimed in claim 5, wherein the overall loss is computed using the reconstruction loss and the Kullback–Leibler Divergence (KLD) loss.
7. The method as claimed in claim 3, wherein the neural network model is a transformer.
8. The method as claimed in claim 7, wherein the differentiable loss is a cross-entropy loss computed for the transformer.
9. The method as claimed in claim 8, wherein the overall loss is computed based on the cross-entropy loss and the Kullback–Leibler Divergence (KLD) loss.
10. The method as claimed in claim 1, further comprising:
generating two clusters using the unsupervised learning algorithm by feeding the plurality of learned embeddings to the unsupervised learning algorithm; and
assigning each of the two clusters as a favorable cluster and an unfavorable cluster to generate a plurality of proxy labels.
11. The method as claimed in claim 10, further comprising:
feeding the plurality of proxy labels to at least one bias mitigation algorithm; and
applying the at least one bias mitigation algorithm for mitigating at least one bias in at least one developmental stage of a biased neural network model.
12. A system comprising:
a communication interface;
a memory comprising executable instructions; and
a processor communicably coupled to the communication interface and configured to execute the instructions to cause the system to at least:
feed an input data to a neural network model configured to generate a plurality of embeddings, wherein the input data comprises only a plurality of non-protected features;
compute a differentiable loss from an output data generated by the neural network model;
feed the plurality of embeddings to a classification neural network model configured to assign a plurality of downstream task labels to the corresponding plurality of embeddings to generate a plurality of classified downstream task labels, wherein the plurality of downstream task labels belongs to a biased neural network model having at least one bias in at least one developmental stage of the biased neural network model;
compute a probability distribution function of the plurality of classified downstream task labels;
compute a discrepancy loss between the probability distribution function of the plurality of classified downstream task labels and an expected probability distribution function of the plurality of downstream task labels, the discrepancy loss computed for removing a downstream task label information from the plurality of embeddings; and
train the neural network model based on an overall loss computed from the differentiable loss and the discrepancy loss, wherein the trained neural network model is configured to generate a plurality of learned embeddings to be fed to an unsupervised clustering algorithm to generate a corresponding plurality of proxy labels for mitigating the at least one bias in the at least one developmental stage of the biased neural network model.
13. The system as claimed in claim 12, wherein the classification neural network model is a Multi-Layer Perceptron (MLP) neural network model.
14. The system as claimed in claim 13, wherein the discrepancy loss is a Kullback–Leibler Divergence (KLD) loss.
15. The system as claimed in claim 14, wherein the neural network model is an autoencoder.
16. The system as claimed in claim 15, wherein the differentiable loss is a reconstruction loss computed for the autoencoder.
17. The system as claimed in claim 16, wherein the overall loss is computed based on the reconstruction loss and the Kullback–Leibler Divergence (KLD) loss.
18. The system as claimed in claim 14, wherein the neural network model is a transformer.
19. The system as claimed in claim 12, wherein the system is further caused to:
generate two clusters using the unsupervised learning algorithm by feeding the plurality of learned embeddings to the unsupervised learning algorithm;
assign each of the two clusters as a favorable cluster and an unfavorable cluster to generate a plurality of proxy labels;
feed the plurality of proxy labels to at least one bias mitigation algorithm; and
apply the at least one bias mitigation algorithm for mitigating at least one bias in at least one developmental stage of a biased neural network model.
20. A computer-implemented method comprising:
feeding, by a processor, a plurality of learned embeddings generated by a trained neural network model to an unsupervised learning algorithm, the trained neural network model previously trained using an overall loss computed from a differentiable loss and a discrepancy loss, the discrepancy loss computed between a probability distribution function of a plurality of classified downstream task labels and an expected probability distribution function of a plurality of downstream task labels;
generating, by the processor, two clusters using the unsupervised learning algorithm;
assigning, by the processor, each of the two clusters as a favorable cluster and an unfavorable cluster to generate a plurality of proxy labels;
feeding, by the processor, the plurality of proxy labels to at least one bias mitigation algorithm; and
applying, by the processor, the at least one bias mitigation algorithm for mitigating at least one bias in at least one developmental stage of a biased neural network model.