
Methods And Systems For Bias Mitigation For Credit Risk Assessment

Abstract: Embodiments provide methods and systems for bias mitigation for credit risk assessment. The method includes accessing training data including a plurality of training samples from a database. Each training sample represents historical credit event timeline data associated with a cardholder of a plurality of cardholders. The method includes filtering biased training data from the training data to generate unbiased training data based on a bias mitigation model and evaluating fairness evaluation metrics associated with a credit risk assessment model that is trained based on the unbiased training data. The credit risk assessment model is trained for determining a decision whether to extend credit to a particular cardholder or not. The method further includes training a neural network model based on the unbiased training data. The neural network model is trained based on exploring fairness-related metrics corresponding to two or more cardholder groups and a delayed reward function.


Patent Information

Application #
Filing Date
03 August 2021
Publication Number
08/2024
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

MASTERCARD INTERNATIONAL INCORPORATED
2000 Purchase Street, Purchase, NY 10577, United States of America

Inventors

1. Aakash Agarwal
C3 - 604, Uniworld Gardens II, Sector 47, Gurugram 122018, Haryana, India
2. Debasmita Das
28, South Road, Santoshpur, Kolkata 700075, West Bengal, India
3. Tanmoy Bhowmik
206 United Elysium, Seegehalli, Bangalore 560067, Karnataka, India
4. Kamna Meena
Flat No. B-208, Nandani Bhawan, Sant Nagar, Main Road, Burari, Delhi 110084, India
5. Aditi Rai
Rai Saw Mill parisar, Agasoud Road, Kanoongo ward, Bina 470113, Madhya Pradesh, India
6. Yatin Katyal
23 A, Mansarovar Colony, Rohtak 124001, Haryana, India
7. Anubha Pandey
A-88, Kanta Khaturiya Colony, Near Mann Mandir, Bikaner 334001, Rajasthan, India
8. Deepak Bhatt
H 274, Nehru Colony, Dharampur, Dehradun 248001, Uttarakhand, India
9. Ram Ganesh V
Sree Hari Ram, Kra 194, New Road, Infort, Statue Junction, Tripunithura, Ernakulam 682301, Kerala, India
10. Bhushan Jayant Chaudhari
Vinayak Housing Society, Plot number-29 P-5, N-8, CIDCO Aurangabad, Aurangabad 431003, Maharashtra, India

Specification

FORM 2
THE PATENTS ACT 1970
(39 of 1970)
&
The Patent Rules 2003
COMPLETE SPECIFICATION
(refer section 10 & rule 13)

TITLE OF THE INVENTION:
METHODS AND SYSTEMS FOR BIAS MITIGATION FOR CREDIT RISK ASSESSMENT

APPLICANT(S):

Name:

Nationality:

Address:

MASTERCARD INTERNATIONAL INCORPORATED

United States of America

2000 Purchase Street, Purchase, NY 10577, United States of America

PREAMBLE TO THE DESCRIPTION

The following specification particularly describes the invention and the manner in which it is to be performed.

DESCRIPTION
(See next page)


METHODS AND SYSTEMS FOR BIAS MITIGATION FOR CREDIT RISK ASSESSMENT

TECHNICAL FIELD
The present disclosure relates to artificial intelligence processing systems and, more particularly to, electronic methods and complex processing systems for bias mitigation in credit risk assessment models.

BACKGROUND
Credit risk assessment, or loan default risk evaluation, is a critical component of loan approval, loan monitoring, and the pricing process. Credit risk assessment is important to financial institutions that provide loans to businesses and individuals. Credit and loans carry the risk of default. To understand the risk levels of credit users (corporations and individuals), credit loan providers may collect a vast amount of information from borrowers. Thereafter, statistical predictive analytic techniques may be used to analyze or determine the risk levels involved in loans. In banking, credit risk assessment often relies on credit scoring models.
In an example, credit scoring models provide estimates of the probability of default (PD) of a borrower, usually over a one-year period. These models output a score that reflects the probability of a given entity, whether a private individual or a company, becoming a defaulter in a future period. The credit scoring models may perform classification of a borrower to distinguish potential defaulters (non-payers) from non-defaulters (payers) based on information stored within a set of features of the borrower involved in credit or loan transactions.
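As an illustration of this kind of scoring (a minimal sketch under assumed weights, not the disclosed model), a logistic function can map borrower features to a PD estimate, which is then thresholded into defaulter / non-defaulter classes:

```python
import math

def probability_of_default(features, weights, bias):
    # Logistic score: maps borrower features to a one-year PD estimate.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def classify(pd_score, threshold=0.5):
    # Label a borrower as a potential defaulter when PD exceeds the threshold.
    return "defaulter" if pd_score > threshold else "non-defaulter"

# Hypothetical borrower features (e.g., utilization ratio, count of late
# payments) and illustrative weights -- not calibrated values.
pd_score = probability_of_default([0.8, 3.0], weights=[1.2, 0.7], bias=-2.0)
label = classify(pd_score)
```

In practice such weights would be calibrated on historical repayment data; the values above are placeholders chosen only to make the example concrete.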
The models for credit scoring and profiling require historical and reference data to accurately calibrate the credit algorithm. Algorithm design plays a crucial role in potentially introducing bias into the classifier’s prediction. In certain cases, much historical data may not be available for a certain business, such as a new business, or for a community, such as an underprivileged group. Moreover, distributions in the algorithm differ across groups due to under-representation or historical bias (for example, males and females with similar features may have different salaries and thus different interest rates, new businesses may lack historical financial data and be classified as potential defaulters, under-represented groups may lack historical financial data and be classified as potential defaulters, and so on). Typically, algorithms tend to exploit this difference in the distribution among the groups to lift overall accuracy and, in the process, may exhibit bias towards certain groups. Therefore, a large number of unrealistic assumptions may be imposed in the credit algorithm. This may result in biased classifications and error rates.
In certain cases, synthetic data may be generated for under-represented groups. In other words, synthetic data may be used to supplement a lack of historical data for fraud or credit risk assessment of under-represented groups. With the advent of generative modeling techniques, synthetic data and its use have penetrated various domains, from unstructured data such as images and text to structured datasets modeling healthcare outcomes, risk decisions in the financial domain, and many more. Synthetic data overcomes various challenges such as limited training data, class imbalance, etc. Generative adversarial networks (GANs) have become a popular choice for synthetic data generation. However, the increasing reliance on synthetic data may lead to risks of unintended bias. In particular, the algorithm of the GAN model may also exploit differences in distribution among various groups and, in the process, amplify these biases while generating synthetic data.
Thus, there exists a technological need for technical solutions to suppress or mitigate unintended harms owing to bias introduced in algorithms of trained decision-making models used for credit risk assessment.

SUMMARY
Various embodiments of the present disclosure provide methods and server systems for mitigating bias in credit risk assessment.
In an embodiment, a computer-implemented method is disclosed. The method includes accessing, by a server system, training data comprising a plurality of training samples from a database. Each training sample represents historical credit event timeline data associated with a cardholder of a plurality of cardholders. The method includes filtering, by the server system, biased training data from the training data to generate unbiased training data based, at least in part, on a bias mitigation model and evaluating, by the server system, fairness evaluation metrics associated with a credit risk assessment model that is trained based, at least in part, on the unbiased training data. The credit risk assessment model is trained for determining a decision whether to extend credit to a particular cardholder or not. The method further includes training, by the server system, a neural network model based, at least in part, on the unbiased training data. The neural network model is trained based on exploring a plurality of fairness-related metrics corresponding to two or more cardholder groups and a delayed reward function.

BRIEF DESCRIPTION OF THE FIGURES
For a more complete understanding of example embodiments of the present technology, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
FIG. 1 is an example representation of an environment, related to at least some example embodiments of the present disclosure;
FIG. 2 is a simplified block diagram of a server system, in accordance with one embodiment of the present disclosure;
FIG. 3 is a block diagram representation of assessing the credit worthiness of a cardholder using different models, in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic representation for generating unbiased training data and fair synthetic training data for under-represented groups, in accordance with an embodiment of the present disclosure;
FIG. 5 shows a process flow for generating unbiased training data and fair synthetic training data, in accordance with an embodiment of the present disclosure;
FIG. 6 is an exemplary representation of plots depicting data points spread after and before applying the bias mitigation model, in accordance with an embodiment of the present disclosure;
FIG. 7 is a simplified block diagram of a classification model, in accordance with an embodiment of the present disclosure;
FIGS. 8A and 8B show experiment results for output comparison of the classification model with the Mix-group neural layer, in accordance with an embodiment of the present disclosure;
FIGS. 9A and 9B, collectively, represent a flow diagram of a training phase for implementing bias mitigation in credit risk assessment, in accordance with an embodiment of the present disclosure; and
FIG. 10 is a flow diagram of a computer-implemented method for bias mitigation for the credit risk assessment model, in accordance with an embodiment of the present disclosure.
The drawings referred to in this description are not to be understood as being drawn to scale except if specifically noted, and such drawings are only exemplary in nature.

DETAILED DESCRIPTION
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure can be practiced without these specific details.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. The appearance of the phrase “in an embodiment” in various places in the specification is not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
Moreover, although the following description contains many specifics for the purposes of illustration, anyone skilled in the art will appreciate that many variations and/or alterations to said details are within the scope of the present disclosure. Similarly, although many of the features of the present disclosure are described in terms of each other, or in conjunction with each other, one skilled in the art will appreciate that many of these features can be provided independently of other features. Accordingly, this description of the present disclosure is set forth without any loss of generality to, and without imposing limitations upon, the present disclosure.
The term “credit loan provider”, used throughout the description, refers to a financial institution normally called a "credit lending company" in which an individual or an institution can request a credit loan. The credit loan provider may also perform internal screening to check the credit worthiness of the cardholders.
The terms "cardholder" or "the user", used throughout the description, refer to a person or an organization who requests a credit from the credit loan provider. In certain cases, certain financial information, such as transaction details, and user-related information, such as demographic information, etc., relating to the cardholder may be stored by the credit loan provider.
The term "delinquency rate", used throughout the description, refers, in its economic sense, to failure in or neglect of a duty or obligation, such as a past-due debt. For example, the term “user delinquency rate” is a number that may be represented as a percentage to show the probability that the user will become delinquent in the future.
The term "payment network", used throughout the description, refers to a network or collection of systems used for the transfer of funds through the use of cash-substitutes. Payment networks may use a variety of different protocols and procedures to process the transfer of money for various types of transactions. Transactions that may be performed via a payment network may include product or service purchases, credit purchases, debit transactions, fund transfers, account withdrawals, etc. Payment networks may be configured to perform transactions via cash-substitutes, which may include payment cards, letters of credit, checks, financial accounts, etc. Examples of networks or systems configured to perform as payment networks include those operated by Mastercard®.
The term “credit risk assessment model” used throughout the description, refers to a group of decision models and their underlying techniques which give support to credit providers when providing credit or loans to borrowers or customers. The credit risk assessment model may be alternatively used with “credit risk scoring model” or “credit scoring model”. The credit scoring model may be used on decisions related to credit admission evaluation. The credit scoring model is developed to classify credit applications as “accepted” or “rejected” with respect to a set of features associated with applicants or borrowers. Such a set of features may include, for example, demographic information and financial transaction information.
It may be understood that an application is accepted or rejected based on the expectation that an applicant is or is not able to repay his financial obligation. The credit scoring model may include a plurality of classification rules based on previously accepted and rejected applications. Furthermore, these rules are used to predict borrowers’ credit risk. In an example, the credit scoring model is built using a plurality of artificial neural networks (ANNs) operating as various classification algorithms for data mining.

OVERVIEW
Various example embodiments of the present disclosure provide methods and systems for bias mitigation in credit risk assessment.
In existing solutions, much of the data of under-represented communities is not reflected in credit risk scoring models. Most statistical learning algorithms or models strongly rely on an oversimplified assumption that the source and target data are independent and identically distributed. However, data from different sources or domains have different distributions, and a model trained on data from one domain (i.e., the source domain) might not perform well for another domain with a different distribution (i.e., the target domain). The use of synthetic data to supplement a lack of historical data for fraud or risk monitoring AIDA solutions is becoming common. However, this increasing reliance on synthetic data may lead to risks of unintended bias owing to bias in the generated synthetic data and may create problems in credit risk assessments.
To solve the above technical problems in credit risk assessment, the present disclosure provides methods and systems to generate synthetic data which is fair and bias-free using certain data pre-processing techniques. In one embodiment, the present disclosure provides methods and systems to develop a universal pipeline to generate fair and unbiased synthetic data independent of the GAN architectures. The proposed methods and systems generate fair synthetic data independent of the GAN architecture, as the biased samples are removed from the training data and the synthetic data generated has better information to capture the underlying distribution of the training data. Further, the model developed using fair synthetic data achieves better performance on the test sample. The proposed product allows for fair data generation, which is of high relevance across data-sensitive industries such as finance, healthcare, etc. As may be understood, certain training data for training a model for synthetic data generation may include protected attributes that may not be allowed for use. Subsequently, the proposed methods and systems aim to develop and validate a method for synthetic data generation which minimizes the error between clusters being discriminated differently in an unsupervised setting owing to protected attributes.
In general, a neural network model represents a function from inputs to outputs. Each unit sums its inputs and adds a constant (the ‘bias’) to form a total input. Thereafter, a function is applied to give the output. The links have weights that multiply the signals traveling along them by that factor. The output may be fed back to learn from the loss and gradient of the function. The weight is subsequently updated. In this regard, bias in the training data is a type of error in which certain elements of the training data are more heavily weighted and/or represented than others. A biased dataset does not accurately represent a model's use case, resulting in skewed outcomes, low accuracy levels, and analytical errors.
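The unit computation described above can be sketched directly; the sigmoid activation, squared-error loss, and learning rate below are illustrative choices, not specifics from the disclosure:

```python
import math

def unit_forward(inputs, weights, bias):
    # Each unit sums its weighted inputs and adds a constant (the 'bias')
    # to form a total input.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # A function (here the logistic sigmoid) is applied to give the output.
    return 1.0 / (1.0 + math.exp(-total))

def unit_update(inputs, weights, bias, target, lr=0.1):
    # The output is fed back: the loss gradient drives the weight update.
    out = unit_forward(inputs, weights, bias)
    grad = (out - target) * out * (1.0 - out)  # d(squared loss)/d(total input)
    new_weights = [w - lr * grad * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * grad
    return new_weights, new_bias
```

A single update step nudges the unit's output toward the target, which is the sense in which "the weight is subsequently updated" in the passage above.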
In addition to the bias mitigation in credit risk assessment, the present disclosure provides methods and systems to analyze various state-of-the-art pre-processing-based bias mitigation techniques and to develop a universal pipeline to generate fair and unbiased synthetic data independent of the GAN architectures, as set forth above.
In various example embodiments, the present disclosure describes a server system for data pre-processing using a bias mitigation model. The server system includes a processor and memory. The server system is configured to access training data from a database. Alternatively, the server system may access training data from a plurality of sources. The training data includes a plurality of training samples associated with one or more population groups from a set of population groups. The server system uses the received training data to perform data pre-processing techniques to remove biased samples in the training data.
In one embodiment, the server system is configured to access training data from a database. The training data includes a plurality of training samples where each training sample represents historical credit event timeline data associated with each cardholder. The historical credit event timeline data includes an event timeline including credit events associated with the cardholder that occurred during a particular time interval. The server system is configured to filter biased training data from the training data to generate unbiased training data based, at least in part, on a bias mitigation model. The bias mitigation model is based on the K% removal method.
To perform bias mitigation on the training data, the server system is configured to determine a set of credit features associated with each cardholder based, at least in part, on the plurality of training samples. Thereafter, the server system is configured to remove one or more credit features from the set of credit features whose correlation with a particular protected attribute is equal to or above a threshold correlation value. The plurality of training samples is also divided into two or more cardholder groups based on one or more protected attributes and labeled data associated with the plurality of training samples. The one or more protected attributes include at least one of: gender, ethnicity, age, or demographics. The two or more cardholder groups include at least a privileged group and an unprivileged group. The server system is configured to calculate a cosine similarity score of each training sample from the unprivileged group to each training sample of the privileged group and to identify similar training samples from the privileged group and the unprivileged group based, at least in part, on a similarity threshold value and the cosine similarity score. Based on the identification, the server system is configured to remove the top K% biased training samples from each of the privileged group and the unprivileged group to generate the unbiased training data.
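The similarity-based filtering steps above can be sketched as follows. The criterion used here for flagging a pair as biased (high cross-group cosine similarity with disagreeing labels) and the specific threshold values are illustrative assumptions, since the disclosure does not fix them:

```python
import math

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def remove_top_k_biased(privileged, unprivileged, sim_threshold=0.9, k_pct=10):
    # Each sample is (feature_vector, label). Score every unprivileged sample
    # against every privileged sample; pairs above the similarity threshold
    # whose labels disagree are treated as candidate biased pairs (an assumed
    # criterion -- the source describes similarity and labels, not the rule).
    scored = []
    for i, (uf, ul) in enumerate(unprivileged):
        for j, (pf, pl) in enumerate(privileged):
            sim = cosine(uf, pf)
            if sim >= sim_threshold and ul != pl:
                scored.append((sim, i, j))
    scored.sort(reverse=True)  # most similar (most suspect) pairs first
    n_remove = max(1, len(scored) * k_pct // 100) if scored else 0
    drop_unpriv = {i for _, i, _ in scored[:n_remove]}
    drop_priv = {j for _, _, j in scored[:n_remove]}
    # Remove the top K% biased samples from each group.
    clean_priv = [s for j, s in enumerate(privileged) if j not in drop_priv]
    clean_unpriv = [s for i, s in enumerate(unprivileged) if i not in drop_unpriv]
    return clean_priv, clean_unpriv
```

The intuition: when two cardholders look nearly identical in feature space yet received different labels across the privileged/unprivileged split, that pair is the most likely carrier of historical bias, so such samples are the first removed.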
In one embodiment, the server system is configured to generate fair synthetic training data corresponding to under-represented groups from the two or more cardholder groups based, at least in part, on the unbiased training data and a generative adversarial network (GAN) model.
In one embodiment, the server system is configured to provide unbiased training data and fair synthetic training data to a credit risk assessment model for training. The credit risk assessment model is a classification model (e.g., feed-forward neural network). The classification model includes a plurality of feed-forward layers and a Mix-group neural layer, where the Mix-group neural layer is configured to mitigate bias in classification output and enable learning representations that are invariant for two or more cardholder groups. The server system is also configured to evaluate fairness evaluation metrics for the credit risk assessment model to learn fairness and bias-free predictions.
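The fairness evaluation metrics are not enumerated at this point in the text; as a hedged illustration, two commonly used group-fairness measures, disparate impact and equal opportunity difference, can be computed over the classifier's outputs as follows:

```python
def disparate_impact(preds_unpriv, preds_priv):
    # Ratio of favourable-outcome rates between the unprivileged and
    # privileged groups; 1.0 indicates parity.
    rate_u = sum(preds_unpriv) / len(preds_unpriv)
    rate_p = sum(preds_priv) / len(preds_priv)
    return rate_u / rate_p if rate_p else float("inf")

def equal_opportunity_diff(preds, labels, groups):
    # Difference in true positive rates between the privileged (group 1)
    # and unprivileged (group 0) groups; 0.0 indicates equal opportunity.
    def tpr(g):
        pos = [p for p, y, gr in zip(preds, labels, groups) if gr == g and y == 1]
        return sum(pos) / len(pos) if pos else 0.0
    return tpr(1) - tpr(0)
```

A training loop could monitor these values on held-out data per epoch and reject models whose disparity exceeds a chosen tolerance; the choice of metrics and tolerance here is an assumption, not part of the disclosure.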
In one embodiment, the server system is configured to train a neural network model (i.e., reinforcement learning (RL) model) based on unbiased training data and fair synthetic training data. The RL model is trained based at least on a plurality of fairness-related metrics corresponding to two or more cardholder groups that are determined in simulated delayed episodes. The RL model is further trained based on a delayed reward function that includes at least the credit risk score gain of the cardholder within an observation period and the delinquency rate of the cardholder.
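A minimal sketch of such a delayed reward is shown below. The source states only that the reward combines the cardholder's credit score gain within an observation period and the cardholder's delinquency rate; the linear form, the weights, and the discounting over a simulated delayed episode are all illustrative assumptions:

```python
def delayed_reward(score_gain, delinquency_rate, w_gain=1.0, w_delinq=2.0):
    # Reward a cardholder's credit score gain over the observation period
    # and penalize delinquency; weights are illustrative, not disclosed.
    return w_gain * score_gain - w_delinq * delinquency_rate

def episode_return(events, gamma=0.95):
    # Discounted return over a simulated delayed episode, where each event
    # is a (credit_score_gain, delinquency_rate) pair at successive steps.
    total, discount = 0.0, 1.0
    for gain, delinq in events:
        total += discount * delayed_reward(gain, delinq)
        discount *= gamma
    return total
```

Because the reward only materializes after the observation period, the RL agent must credit earlier lending decisions with outcomes observed much later, which is what the discounted episode return captures.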
Various embodiments of the present disclosure offer multiple advantages and technical effects. For instance, the present disclosure allows removal of biased training samples from the training data to mitigate bias in the credit risk assessment model or credit risk scoring model. As a result, the analytics provided by the credit risk scoring model are not skewed and hence are reliable. Moreover, the present disclosure allows the generation of an unbiased synthetic dataset. The proposed solution generates fair synthetic data independent of the GAN architecture. The synthetic training samples generated have better information to capture the underlying distribution of the training data. The fairness objective is to ensure that systems developed using synthetic samples are constrained to not increase disparate harms to different groups in society. In one embodiment, the systems developed using synthetic samples are constrained to not increase the true positive rate and false positive rate ratio among different groups in society. In another embodiment, the objective is to constrain the disparity for groups created using the intersection of protected attributes like gender-race, etc. Further, the reinforcement learning model developed using the fair synthetic training data achieves better performance on test data.
Various example embodiments of the present disclosure are described hereinafter with reference to FIGS. 1 to 10.
FIG. 1 illustrates an exemplary representation of an environment 100 related to at least some example embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise depending on, for example, generating fair credit risk assessment models by performing bias mitigation techniques, etc. The environment 100 generally includes a server system 102, an issuer server 104, a plurality of users 106a, 106b, and 106c, a payment network 108 including a payment server 110, a credit loan provider 118, each coupled to, and in communication with (and/or with access to) a network 112. The network 112 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among two or more of the parts or users illustrated in FIG. 1, or any combination thereof.
Various entities in the environment 100 may connect to the network 112 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, future communication protocols or any combination thereof. For example, the network 112 may include multiple different networks, such as a private network made accessible by the server system 102 and a public network (e.g., the Internet, etc.) through which the server system 102, the issuer server 104, and the payment server 110 may communicate.
In one embodiment, the issuer server 104 is a financial institution that manages accounts of one or more of the plurality of users 106a, 106b, and 106c. Account details of the accounts established with the credit provider are stored in user profiles of the users in memory of the issuer server 104 or on a cloud server associated with the issuer server 104. The terms “issuer server”, “issuer”, or “issuing bank” will be used interchangeably herein.
In one embodiment, the server system 102 is configured to perform one or more of the operations described herein. In one example, the server system 102 coupled with a database 114 is embodied in the payment network 108. In general, the server system 102 is configured to train the credit risk assessment model 116 without any bias. The server system 102 is configured to filter biased training data from the training data to generate unbiased training data. The server system 102 is configured to perform various bias mitigation techniques to make the credit risk assessment models unbiased so that the prediction output of the credit risk assessment models is fair for under-represented cardholder groups as well. In one non-limiting example, the server system 102 accesses training data from the database 114 or third-party data sources. The training data includes historical credit event timeline data associated with a plurality of cardholders of two or more cardholder groups. The two or more cardholder groups are determined based, at least in part, on one or more protected attributes associated with each cardholder. The server system 102 then filters the biased training data from the training data to generate unbiased training data based on a bias mitigation model. The server system 102 evaluates fairness evaluation metrics associated with a classification model that is trained based on the unbiased training data. The classification model is trained for determining a decision whether to extend credit to a particular cardholder or not. The server system 102 also trains a neural network model based on the unbiased training data. The neural network model is trained by exploring a plurality of fairness-related metrics corresponding to the two or more cardholder groups and a delayed reward function. The neural network model determines whether the cardholder is given equal preference among the one or more cardholder groups.
The server system 102 is a separate part of the environment 100, and may operate apart from (but still in communication with, for example, via the network 112) the issuer server 104, the payment server 110, and any third-party external servers (to access data to perform the various operations described herein). However, in other embodiments, the server system 102 may actually be incorporated, in whole or in part, into one or more parts of the environment 100, for example, the payment server 110 and the issuer server 104. In addition, the server system 102 should be understood to be embodied in at least one computing device in communication with the network 112, which may be specifically configured, via executable instructions, to perform steps as described herein, and/or embodied in at least one non-transitory computer-readable media.
In one embodiment, the database 114 is configured to store training data and a credit risk assessment model 116. The training data includes a plurality of training samples associated with one or more population groups from one or more cardholder groups. The plurality of training samples may be collected from a plurality of sources, such as survey data, census data, the issuer server 104, third parties, and so forth. The plurality of training samples may indicate transaction features and demographic features of an applicant or borrower. The applicant may belong to one of the one or more cardholder groups.
In one embodiment, the payment network 108 may be used by the payment card issuing authorities as a payment interchange network. The payment network 108 may include a plurality of payment servers such as, the payment server 110. Examples of payment interchange networks include, but are not limited to, Mastercard® payment system interchange network. The Mastercard® payment system interchange network is a proprietary communications standard promulgated by Mastercard International Incorporated® for the exchange of financial transactions among a plurality of financial activities that are members of Mastercard International Incorporated®. (Mastercard is a registered trademark of Mastercard International Incorporated located in Purchase, N.Y.).
The number and arrangement of systems, devices, and/or networks shown in FIG. 1 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 100 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 100.
Referring now to FIG. 2, a simplified block diagram of a server system 200 is shown, in accordance with an embodiment of the present disclosure. The server system 200 is similar to the server system 102. In some embodiments, the server system 200 is embodied as a cloud-based and/or SaaS-based (software as a service) architecture. In one embodiment, the server system 200 is a part of the payment network 108 or is integrated within the payment server 110. In another embodiment, the server system 200 is embodied within the issuer server 104.
Although the present disclosure explains the concept of bias mitigation in the context of credit risk assessment models, the concept can be applied to any machine learning model. In one embodiment, the server system 200 is configured to perform data pre-processing algorithms to filter biased training data and to generate fair and unbiased synthetic training samples independent of any GAN (Generative Adversarial Network) architecture.
In one embodiment, the server system 200 includes a computer system 202 and a database 204. The computer system 202 includes at least one processor 206 for executing instructions, a memory 208, and a communication interface 210. The one or more components of the computer system 202 communicate with each other via a bus 212.
In some embodiments, the database 204 is integrated within computer system 202. For example, the computer system 202 may include one or more hard disk drives as the database 204. A storage interface 214 is any component capable of providing the processor 206 with access to the database 204. The storage interface 214 may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a RAID controller, a SAN adapter, a network adapter, and/or any component providing the processor 206 with access to the database 204. In one embodiment, the database 204 is configured to store a credit risk assessment model 226 and reinforcement learning (RL) model 228.
The processor 206 includes suitable logic, circuitry, and/or interfaces to execute computer-readable instructions for implementing a credit risk model with improved accuracy and determining a credit risk score of a user without any biases in prediction. Examples of the processor 206 include, but are not limited to, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphical processing unit (GPU) processor, a field-programmable gate array (FPGA), and the like. The memory 208 includes suitable logic, circuitry, and/or interfaces to store a set of computer-readable instructions for performing operations. Examples of the memory 208 include a random-access memory (RAM), a read-only memory (ROM), a removable storage drive, a hard disk drive (HDD), and the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the memory 208 in the server system 200, as described herein. In another embodiment, the memory 208 may be realized in the form of a database server or cloud storage working in conjunction with the server system 200, without departing from the scope of the present disclosure.
The processor 206 is operatively coupled to the communication interface 210 such that the processor 206 is capable of communicating with a remote device 216 such as, the plurality of users 106a, 106b, and 106c, or communicating with any entity connected to the network 112 (as shown in FIG. 1). It is noted that the server system 200 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the server system 200 may include fewer or more components than those depicted in FIG. 2.
In one embodiment, the processor 206 includes a data pre-processing engine 218, a credit application assessment engine 220, a reinforcement learning (RL) engine 222, and a fairness testing engine 224. It should be noted that components, described herein, such as the data pre-processing engine 218, the credit application assessment engine 220, the RL engine 222, and a fairness testing engine 224 can be configured in a variety of ways, including electronic circuitries, digital arithmetic and logic blocks, and memory systems in combination with software, firmware, and embedded technologies.
The data pre-processing engine 218 includes suitable logic and/or interfaces for accessing historical credit event timeline data associated with a plurality of cardholders 106a-106c of one or more cardholder groups. In one example, the historical credit event timeline data may indicate past credit events associated with each cardholder within a particular time interval. For example, past credit events may include credit loans approved for the cardholder, total premiums paid by the cardholder to existing credit loans, transaction history, etc.
To train a credit risk assessment model, the data pre-processing engine 218 is configured to pre-process a plurality of training samples in such a way that discrimination or bias is reduced. Each training sample includes the past credit events associated with a cardholder within a particular time interval. At first, the data pre-processing engine 218 may identify a plurality of cardholder groups based on past credit profiles, a number of data representations in the training data, etc. Each cardholder group may represent credit profile characteristics with at least one similar attribute (for example, location, credit scores, etc.). For example, a protected attribute may divide the cardholders into two groups (privileged and unprivileged).
In one example, when a particular cardholder group has a greater number of data representations in the received training samples, high credit limits and credit scores, or high salaries, the particular cardholder group can be considered a 'privileged' or 'favorable' group. In contrast, when a particular cardholder group has a smaller number of data representations in the received training data, low credit limits, low credit scores, or low salaries, the particular cardholder group can be considered an 'unprivileged' or 'unfavorable' group.
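The grouping described above can be sketched as follows. This is a minimal Python illustration, assuming (for simplicity) that representation count alone decides the label; the key name "group" and the sample layout are hypothetical, not from the disclosure.

```python
from collections import Counter

def label_groups(samples, group_key="group"):
    """Label each cardholder group as 'privileged' or 'unprivileged' based
    on its representation count in the training data (an illustrative
    simplification; the disclosure also considers credit limits, scores,
    and salaries)."""
    counts = Counter(s[group_key] for s in samples)
    majority = max(counts, key=counts.get)
    return {g: ("privileged" if g == majority else "unprivileged") for g in counts}

samples = [{"group": "A"}] * 7 + [{"group": "B"}] * 3
print(label_groups(samples))  # {'A': 'privileged', 'B': 'unprivileged'}
```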
In one embodiment, the data pre-processing engine 218 is configured to apply a fair data generation model to filter biased training data from the training data. The fair data generation model defines a pipeline to generate fair and unbiased synthetic training data. Initially, the data pre-processing engine 218 is configured to identify a set of training samples that introduce bias and then drop a portion of these training samples to obtain a fairer data representation of different cardholder groups. In other words, the data pre-processing engine 218 is configured to identify and drop the set of training samples from different cardholder groups that have similar data representations but opposite outcomes, and to use the remaining training samples for synthetic fair data generation. In one embodiment, the data pre-processing engine 218 is configured to apply a data pre-processing technique (e.g., a K% removal method, data augmentation, etc.). In the K% removal method, the data pre-processing engine 218 is configured to remove the top K% biased training samples from both cardholder groups. Thereafter, the data pre-processing engine 218 provides the pre-processed training data to a generative adversarial network (GAN) to create fair synthetic training data for the underprivileged group that has fewer data representations.
The credit application assessment engine 220 includes suitable logic and/or interfaces for determining a decision whether to extend a credit loan to a particular cardholder or not. The credit application assessment engine 220 may implement or build a credit risk assessment model 226 that is trained based on the unbiased training data and the fair synthetic training data generated by the fair data generation model. The credit risk assessment model 226 is a classification model. In one example, the classification model is an artificial neural network (ANN). The credit risk assessment model 226 is trained with a fairness objective to ensure that the credit risk assessment model 226 trained using the fair synthetic data does not increase the true positive rate and false positive rate ratio among different cardholder groups. In other words, the credit risk assessment model 226 is trained in such a way that the credit application decision task does not introduce unintended bias against a particular cardholder group in the one or more protected attributes.
In one embodiment, the credit risk assessment model 226 aims to learn a data distribution that is invariant to different cardholder groups so that the credit risk assessment model does not exploit differences in data distribution among different cardholder groups and does not give accurate predictions for only certain cardholder groups. The credit risk assessment model captures domain generalization concepts to eliminate biases in its prediction outputs.
In particular, a Mix-group neural layer is introduced in a plurality of neural layers of the classification model that probabilistically mixes group-level feature statistics of training samples across different cardholder groups in the protected attribute. In this way, all the training samples in a batch share the same mean and variance. Since the information of the protected attribute is more prominent in the initial neural layers of the ANN, the Mix-group neural layer is introduced in the initial layers to handle the bias.
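The statistics-mixing idea can be sketched numerically. This is a minimal NumPy sketch, assuming exactly two cardholder groups and a fixed mixing coefficient `lam` (the disclosure mixes probabilistically inside a neural layer; this standalone function only shows the effect on batch statistics).

```python
import numpy as np

def mix_group(features, groups, lam=0.5):
    """Mix group-level feature statistics across two cardholder groups
    (a sketch of the Mix-group layer idea). Each sample is normalized with
    its own group's mean/std, then re-scaled with statistics mixed across
    both groups, so every sample in the batch shares the same mean and
    variance regardless of its group. Assumes exactly two groups."""
    out = np.empty_like(features, dtype=float)
    stats = {g: (features[groups == g].mean(0), features[groups == g].std(0) + 1e-6)
             for g in np.unique(groups)}
    mus, sigmas = zip(*stats.values())
    # Convex mixture of the two groups' statistics, shared by the whole batch
    mix_mu = lam * mus[0] + (1 - lam) * mus[1]
    mix_sigma = lam * sigmas[0] + (1 - lam) * sigmas[1]
    for g, (mu, sigma) in stats.items():
        idx = groups == g
        out[idx] = (features[idx] - mu) / sigma * mix_sigma + mix_mu
    return out
```

After mixing, the per-group feature means coincide, so a downstream layer can no longer separate the groups by their first- and second-order statistics.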
In one embodiment, model parameters of the credit risk assessment model 226 are updated based on a plurality of fairness evaluation metrics. When the credit risk assessment model is fully-trained, the credit risk assessment model 226 is utilized for determining whether a credit application for a particular cardholder should be approved or declined.
In one scenario, when the credit risk assessment model 226 determines that the credit application for a particular cardholder should be declined, the processor 206 is configured to trigger the trained reinforcement learning (RL) model to explore the credit event data of the particular cardholder for a particular time interval (e.g., the next six months) and decide whether the credit application for the particular cardholder should have been approved or declined.
The reinforcement learning (RL) engine 222 includes suitable logic and/or interfaces for determining an optimal action (e.g., providing a credit loan to a particular user or not) based, at least in part, on a deep reinforcement learning model. In one embodiment, the RL engine 222 is configured to perform offline training based on the unbiased training data and the fair synthetic training data associated with the two or more cardholder groups. The RL engine 222 generates a policy that yields a fair distribution of rewards over both the long term and the short term, safeguarded by continuous fairness testing, business-metric evaluation, and A/B testing.
In one example, the RL engine 222 is configured to predict or determine whether a particular user can be issued a credit loan by a credit provider or not. Initially, the RL model 228 is configured to identify a particular cardholder group associated with the particular user and simulate multiple episodes with exploration techniques to identify an episode with an optimal reward value. During exploration, the fairness related metrics of the applicant are provided as feedback for each simulated episode. In one example, it is assumed that a particular applicant 'A' raises a request for a credit loan. For the credit loan request, the RL engine 222 is configured to determine whether the user 'A' will be able to pay back the credit loan within a particular time frame or not. In this example task, a delayed reward arises when the RL engine 222 decides to give the credit loan to the user 'A' without yet knowing the future transaction behavior of the user 'A'. The delayed reward function is introduced to enable the RL engine 222 to learn from the future transaction behavior of the user 'A': the RL engine 222 is penalized or incentivized based on the transaction behavior of the user 'A' over the last one year.
In an example, algorithm of the RL model 228 employs trial and error to come up with a solution to a problem. For example, the problem may be to classify a training sample as accepted (fit for credit or loan) or rejected (unfit for credit or loan). Each action in a decision sequence may have a corresponding reward or penalty. The goal of the algorithm is to maximize the total reward or minimize the total penalty.
The RL engine 222 performs training based on a reward function and a delayed reward function. The reward function is based on credit approval and decline probability scores of a training sample from the plurality of training samples associated with a cardholder (who applies for credit or a loan). In particular, the reward function may be based on the credit risk score of the cardholder over a pre-defined time period, such as 12 months, and the delinquency rate. In an example, the reward function includes a positive reward state for effective credit risk score generation and a negative reward state for ineffective credit risk score generation. For example, the reward function may provide a positive reward state when the algorithm classifies a training sample of a borrower as approved and the borrower pays back the loan in time, or when the algorithm classifies a training sample of a borrower as rejected and the borrower does not pay back a loan in time. Moreover, the reward function may provide a negative reward state when the algorithm classifies a training sample of a borrower as approved and the borrower fails to pay back the loan in time, or when the algorithm classifies a training sample of a borrower as rejected and the borrower takes a loan from another financial institution or credit provider and pays back that loan in time.
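The four reward cases above can be condensed into a short sketch. This is a minimal Python illustration; the reward magnitudes (+1/-1) are assumptions for illustration, not values from the disclosure.

```python
def reward(decision, repaid_on_time):
    """Reward states for the four cases described above (values +1/-1 are
    illustrative assumptions):
      approve & repaid in time        -> positive
      reject  & not repaid (default)  -> positive (correct rejection)
      approve & not repaid in time    -> negative
      reject  & repaid elsewhere      -> negative (lost good borrower)"""
    if decision == "approve":
        return 1.0 if repaid_on_time else -1.0
    # decision == "reject": repayment elsewhere means the rejection was wrong
    return -1.0 if repaid_on_time else 1.0
```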
To express the use of reinforcement learning in determining the credit score of multiple users for lending, the present disclosure explains theoretical models of deep Q learning, and the Markov Decision Process (MDP) in more detail. It would be apparent to those skilled in the art that several deep reinforcement learning models may be applied to accomplish the spirit of the present disclosure. In general, the goal of the RL model 228 is to learn a policy (control strategy) that maximizes the expected return (cumulative, discounted reward).
The RL model 228 implements a Markov Decision Process (MDP). The MDP may be represented by a four-tuple (S, A, R, γ), where γ denotes a discount factor, and:
1) S is a State Space, which includes a set of environmental states that the agent or the RL engine 222 may perceive. Herein, at any time t, a state of the RL model 228 refers to a state of credit features corresponding to a cardholder who is applying for a credit loan.
2) A is an Action Space, which includes a set of actions that the agent may take on each state of the environment. Herein, the set of actions includes providing a credit loan to the user upon request or rejecting the credit loan request of the cardholder.
3) R is a reward function, and R(s, a, s′) represents a delayed reward function that the agent obtains from the environment when the action 'a' is performed on the state 's' and the state changes to state 's′'. Herein, the reward function consists of credit risk scores of the cardholder in an observation period (e.g., the next 12 months) and the delinquency rate associated with the cardholder.
Further, the delayed reward function includes a delayed positive reward state and a delayed negative reward state. The delayed positive reward state and the delayed negative reward state may be determined based on a plurality of fairness metrics. The delayed reward state may be provided as delayed feedback to the RL engine 222. For example, a delayed negative reward state may be provided against a borrower (or a training sample) when the borrower gets approved for a loan but defaults several times or fails to pay the loan amount. In another example, a delayed negative reward state may be provided against a borrower when the borrower gets approved for a loan and takes one or more other loans to pay back the loan. Beneficially, such a delayed reward function prevents harm to a community owing to the action of one borrower, especially in an under-represented and/or under-privileged community.
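The delayed feedback can be folded into a standard value update once it arrives. This is a minimal tabular Q-learning sketch, assuming discrete states and a dictionary-backed Q table; the disclosure's agent uses deep Q-learning over credit-feature states, so the tabular form here is an illustrative simplification.

```python
def q_update(Q, state, action, delayed_reward, alpha=0.1, gamma=0.9, next_max=0.0):
    """One Q-learning update applied when the delayed reward (e.g.,
    repayment behaviour observed over the following year) finally arrives.
    Q is a dict keyed by (state, action); alpha is the learning rate and
    gamma the discount factor from the MDP tuple."""
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (delayed_reward + gamma * next_max - old)
    return Q
```

For a one-shot approve/decline decision the episode ends after the action, so `next_max` is zero and the update simply moves the estimate toward the delayed reward.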
The fairness testing engine 224 includes suitable logic and/or interfaces for evaluating fairness evaluation metrics associated with the credit risk assessment model 226 that is trained based, at least in part, on the unbiased training data. The fairness testing engine 224 may apply an A/B testing algorithm and a plurality of fairness and business metrics on outputs of the credit risk assessment model 226 and provide the fairness evaluation metrics as feedback to the credit risk assessment model 226. The fairness testing engine 224 evaluates the business metrics and the fairness metrics iteratively after scoring a fixed number of cardholders and updates the credit risk status determined using the credit risk assessment model 226 and the RL model 228. The weightage given to each model for credit risk assessment may be changed as per the business-metric and fairness-metric feedback to maximize the business metrics while minimizing unintended harm to credit loan providers.
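Two fairness evaluation metrics commonly used in this setting can be computed as follows. This is a minimal Python sketch; the disclosure does not name its specific metrics, so disparate impact and equal-opportunity difference are assumed here as representative examples, with the protected attribute encoded as 1 (privileged) and 0 (unprivileged).

```python
def fairness_metrics(y_true, y_pred, protected):
    """Compute two example fairness evaluation metrics:
    - disparate impact: ratio of favorable-outcome rates,
      unprivileged (protected == 0) over privileged (protected == 1);
    - equal-opportunity difference: TPR(unprivileged) - TPR(privileged)."""
    def rate(vals):
        return sum(vals) / len(vals) if vals else 0.0

    priv_pred = [p for p, a in zip(y_pred, protected) if a == 1]
    unpriv_pred = [p for p, a in zip(y_pred, protected) if a == 0]
    disparate_impact = rate(unpriv_pred) / max(rate(priv_pred), 1e-9)

    def tpr(group):
        # True positive rate restricted to one group
        pos = [p for t, p, a in zip(y_true, y_pred, protected)
               if a == group and t == 1]
        return rate(pos)

    equal_opportunity_diff = tpr(0) - tpr(1)
    return disparate_impact, equal_opportunity_diff
```

A disparate impact near 1.0 and an equal-opportunity difference near 0.0 indicate parity between the two groups.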
FIG. 3 is a block diagram representation 300 of assessing the creditworthiness of a cardholder using different models, in accordance with an embodiment of the present disclosure. The block diagram representation 300 may include a rule-based credit model 302, a reinforcement learning (RL) model 304, a classification model 306, and a fairness testing model 308. As previously mentioned, the processor 206 is configured to train the RL model 304 and the classification model 306 based on stratified training data containing equal representation from all segments of cardholder groups.
In one embodiment, the processor 206 is configured to receive a credit application request for a loan/credit associated with a cardholder 106a. In one example, suppose, the cardholder 106a requests a credit lending bank ‘A’ to loan ‘X’ dollar amount. The credit lending bank ‘A’ sends the request to the server system 200 which acts as a fair credit lending determination platform.
Upon receiving the credit loan request, the processor 206 is configured to decide to which of the two models (see, 302 and 306) to pass the credit loan request, based on an initial screening and the preferences of the credit lending bank 'A'.
In one embodiment, the processor 206 is configured to send the credit loan request of the cardholder 106a to the rule-based credit model 302. The rule-based credit model 302 is a conventional model which is configured to store credit risk rules for borrowers based on a plurality of business metrics and existing credit risk assessment rules. For example, the existing credit risk assessment rules are defined based on past transaction patterns and a set of protected attributes associated with the borrowers. In an example, the set of protected attributes associated with a borrower may include, but are not limited to, demographic information, such as age, gender, education, income, geo-location, population group or race, housing, marital status, and so forth, and financial transaction information, such as credit history, FICO score, financial transaction data, and so forth.
In another embodiment, the processor 206 is configured to send the credit loan request to the classification model 306 (i.e., credit risk assessment model). The classification model 306 is trained based on unbiased training data and is implemented as a feed-forward neural network. The classification model 306 is configured to determine whether the credit loan request should be approved or declined. When the classification model 306 predicts that the credit loan request can be approved, the processor 206 is configured to send a notification to the credit lending bank.
In one scenario, when the classification model 306 predicts that the credit loan request should be declined, the processor 206 is configured to trigger the RL model 304 and provide the credit loan request of the cardholder 106a to the RL model 304. During offline training, the RL model 304 learns a fair policy that treats each cardholder equitably. The RL model 304 simulates delayed episodes and determines a plurality of fairness related metrics corresponding to under-represented groups. The RL model 304 also utilizes delayed feedback to develop fair models to handle any special situations like Covid-related loans, loans to thin-file customers, etc. by dynamically changing the configuration of the RL model 304 through the fairness testing model 308.
In one embodiment, upon receiving the credit loan request, the RL model 304 determines a total reward value by tracking credit risk score gain over an observation period (e.g., the next six months) and delinquency rate patterns. The RL model 304 also takes the fairness evaluation metrics as feedback and incorporates them into the total reward value. Based on the total reward value, the RL model 304 decides whether the credit loan request of the cardholder 106a should have been approved or declined.
The fairness testing model 308 performs a randomized testing of all the outputs of the rule-based credit model 302, the RL model 304, and the classification model 306. The server system 200 may evaluate a plurality of business metrics and a plurality of fairness metrics for outputs of the rule-based credit model 302, the RL model 304, and the classification model 306, iteratively. The fairness testing model 308 assesses business metrics as well as fairness metrics corresponding to model prediction, thereby ensuring that the credit risk assessment model provides an unbiased output.
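The routing among the models of FIG. 3 can be sketched as follows. This is a minimal Python illustration; the model interfaces (plain callables returning "approve"/"decline") and the `use_rule_based` flag are assumptions for illustration, not interfaces defined in the disclosure.

```python
def assess_credit_request(request, rule_model, classifier, rl_model):
    """Route a credit loan request per the FIG. 3 flow (a sketch):
    - the request goes to the rule-based model or the trained classifier,
      based on initial screening / bank preference;
    - a decline from the classifier triggers the RL model for a second,
      fairness-aware review using delayed rewards."""
    if request.get("use_rule_based"):
        return rule_model(request)
    decision = classifier(request)
    if decision == "decline":
        decision = rl_model(request)  # delayed-reward, fairness-aware review
    return decision
```

In a real deployment the fairness testing model would additionally sample outputs from all three paths for randomized evaluation, which is omitted here.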
FIG. 4 is a schematic representation 400 of a process for generating unbiased training data and fair synthetic training data for under-represented groups, in accordance with an embodiment of the present disclosure.
As mentioned earlier, the server system 200 is configured to generate fair synthetic training data based on data pre-processing techniques and the GAN model. The pre-processing-based bias mitigation approach filters the biased training data from the training data and then the GAN model generates fair synthetic training data. The server system 200 generates the fair synthetic training data corresponding to under-represented groups from the one or more cardholder groups and trains the RL model and the credit risk assessment model based, at least in part, on the unbiased training data and the fair synthetic training data.
Further, the server system 200 develops a robust fairness assessment using intersectional bias testing to evaluate the models. The technical objective is to ensure that systems developed using synthetic training data are constrained not to increase disparate harms to different population groups. The fairness assessment is performed by evaluating fairness evaluation metrics associated with the models. Further, the systems are constrained, using the synthetic training data, not to increase the true positive rate and false positive rate ratio among different population groups in society. Another objective is to constrain the disparity for groups created using the intersection of protected attributes (e.g., gender, race, privileged, unprivileged) and labels (e.g., favorable, unfavorable). Some data models become gender biased after training on large volumes of training data; even with similar attributes, the male population is given a higher preference by such a model and considered a 'privileged group'. In an equitable world, the outcome for a person should be independent of any protected attribute. In one embodiment, two groups are formed, namely a privileged group with favorable outcomes and an under-privileged group with unfavorable outcomes. Since a data model should not consider any protected attribute of a user during prediction, the biased training data should be dropped from the training data, thereby mitigating biases.
The method allows for unbiased training data generation, which is of high relevance across data-sensitive industries such as finance, healthcare, etc. Existing methods rely on utilizing protected attributes, which at times may not be readily available or may not be allowed to be used. For instance, where protected attribute usage is not allowed, the proposed method develops and validates techniques that minimize the error between clusters being discriminated differently in an unsupervised setting.
The method for filtering biased training data is independent of the specific GAN architecture; in the illustrated embodiment, the GAN model 408 is used. The GAN model 408 includes a generator that generates synthetic training data based on input training data and a discriminator whose function is to distinguish real training data from the synthetic training data produced by the generator. As the biased training data is removed from the training data, the generated synthetic training data better captures the underlying distribution of the training data. Further, any AI model developed using the fair synthetic training data will achieve better performance on test data.
At first, the server system 200 provides a plurality of training samples associated with one or more population groups (for example, a privileged group, an under-privileged/under-represented group, etc.) as input (see, 402) to a bias mitigation model 404. The unbiased training data includes a plurality of synthetic training samples associated with the under-represented group generated by the bias mitigation model 404. The bias mitigation model 404 utilizes a pre-processing technique to create fairer data and to understand the type of bias existing in the training data. In one example, the pre-processing technique may be, but is not limited to, raw data processing, a K% removal method, a data augmentation method, etc. A detailed explanation of the fair data generation model using the K% removal method is provided with reference to FIG. 5. Thereafter, the server system 200 provides the pre-processed training data 406 to the GAN model 408 to create the fair synthetic training data 410.
Referring now to FIG. 5, a process flow 500 for generating unbiased training data and fair synthetic training data is described, in accordance with an embodiment of the present disclosure. The sequence of operations of the process flow 500 may not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped together and performed in form of a single step, or one operation may have several sub-steps that may be performed in parallel or a sequential manner.
At step 502, the server system 200 accesses training data from a database. The training data includes historical credit event timeline data associated with a plurality of cardholders. The server system 200 determines a set of credit features associated with each cardholder based on the historical credit event timeline data. The set of credit features may correspond with one or more protected attributes including age, gender, community, race, etc.
At step 504, the server system 200 removes, from the set of credit features, one or more credit features whose correlation with a particular protected attribute is equal to or above a threshold correlation value. In one example, the threshold correlation value is 0.7, and features having a correlation of 0.7 or higher with a protected attribute are filtered from the plurality of training data. In one example, one feature of any feature pair with a correlation higher than 0.7 is dropped.
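Step 504 can be sketched with Pearson correlation. This is a minimal NumPy sketch; the column names in the usage are hypothetical, and the 0.7 threshold is the example value from the text.

```python
import numpy as np

def drop_correlated(features, names, protected_col, threshold=0.7):
    """Keep only credit features whose absolute Pearson correlation with
    the protected attribute is below the threshold (step 504).
    `features` is an (n_samples, n_features) array; `protected_col` is the
    protected attribute values for the same samples."""
    keep = []
    for j, name in enumerate(names):
        r = np.corrcoef(features[:, j], protected_col)[0, 1]
        if abs(r) < threshold:
            keep.append(name)
    return keep
```

For example, a feature that exactly mirrors the protected attribute (correlation 1.0) is dropped, while a weakly correlated feature survives.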
At 506, the server system 200 divides the training samples into two or more cardholder groups based, at least in part, on the one or more protected attributes and labeled data associated with the training samples. In one embodiment, the two or more cardholder groups based on the protected attributes include a privileged group and an unprivileged group, and the groups based on the labels include favorable and unfavorable groups. In one example embodiment, the groups formed may be at least one of privileged-favorable, unprivileged-favorable, privileged-unfavorable, unprivileged-unfavorable, etc.
At 508, the server system 200 normalizes the continuous features of the training data and generates one-hot encoded vectors representing the categorical features.
At 510, the server system 200 calculates a cosine similarity score of each training sample (or instance) from the unprivileged or unfavorable group to each training sample of the privileged or favorable group.
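Step 510 amounts to a cross-group cosine similarity matrix. This is a minimal NumPy sketch, assuming the features from step 508 are already normalized/one-hot encoded into row vectors.

```python
import numpy as np

def cross_group_similarity(unpriv, priv):
    """Cosine similarity of every unprivileged/unfavorable training sample
    (rows of `unpriv`) to every privileged/favorable training sample
    (rows of `priv`). Returns a matrix of shape (len(unpriv), len(priv))."""
    un = unpriv / np.linalg.norm(unpriv, axis=1, keepdims=True)
    pr = priv / np.linalg.norm(priv, axis=1, keepdims=True)
    return un @ pr.T
```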
At 512, the server system 200 flags or identifies similar training samples or instances from both privileged and unprivileged groups based on a similarity threshold value and the cosine similarity score.
At 514, the server system 200 ranks each training sample or instance as per the count of similar training samples associated with an opposite group. The higher the count of instances similar to the opposite group, the higher the rank of the training sample.
At 516, the server system 200 removes the top K% instances or biased training samples from each of the unprivileged and the privileged group based on the ranking of the count of training samples that are similar to a training sample.
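Steps 512-516 can be combined into one sketch over the similarity matrix from step 510. This is a minimal NumPy illustration; the similarity threshold and the K value are assumptions, as the disclosure leaves them unspecified.

```python
import numpy as np

def remove_top_k_biased(similarity, k_pct=10.0, sim_threshold=0.9):
    """Steps 512-516 as a sketch. `similarity` has one row per unprivileged
    sample and one column per privileged sample (from step 510).
    Flags cross-group pairs above the threshold (512), ranks each sample by
    how many opposite-group samples it resembles (514), and returns the
    indices of the top K% most biased samples on each side (516)."""
    flags = similarity >= sim_threshold        # step 512: flag similar pairs
    unpriv_counts = flags.sum(axis=1)          # step 514: rank unprivileged side
    priv_counts = flags.sum(axis=0)            # step 514: rank privileged side

    def top_k(counts):
        k = max(1, int(len(counts) * k_pct / 100))
        return {int(i) for i in np.argsort(counts)[::-1][:k]}  # step 516

    return top_k(unpriv_counts), top_k(priv_counts)
```

The returned index sets identify the training samples to drop before the remaining data is handed to the GAN model 408.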
At 518, the server system 200 generates fair synthetic training data using the GAN model 408 after removing the top K%-biased training samples from each of the unprivileged group and the privileged group. The GAN model 408 uses the unbiased training data of the training data set to generate fair synthetic training data. In one embodiment, the GAN model is the FairGAN model.
Further, the server system 200 may classify the fair synthetic training data generated by the GAN model 408 and the test data set to generate accurate and fair test training data. The classification model classifies the unbiased synthetic training data and the test training data so as to identify and remove the biased data present in the test training data set. The biased data removed using the K% bias mitigation technique is used as label noise data. The label noise data consists of similar training data having opposite outcomes. The label noise data is used for training the generative adversarial network (GAN) model. In one embodiment, the server system 200 checks for accuracy and fairness on the test training data set to ensure that the test training data contains no biased training data. The server system 200 performs a robust fairness assessment using intersectional bias testing to evaluate the models.
FIG. 6 is an exemplary representation 600 of plots depicting bias mitigation by filtering biased training data before and after applying the bias mitigation model 404, in accordance with an embodiment of the present disclosure. In one embodiment, the bias mitigation model 404 is a K% removal technique performed by the server system 200 on the plurality of training samples. The bias mitigation techniques include the generation of synthetic training samples, where the synthetic training samples are unbiased training samples. To generate the synthetic training samples, the server system 200 classifies each training sample from the plurality of training samples into one of a well-represented group and an under-represented group based, at least in part, on a corresponding population group.
Further, the server system 200 identifies a pair of biased training samples from the plurality of training samples, the pair of biased training samples relating to different groups from among the well-represented group and the under-represented group, and the pair of biased training samples having similar values for a set of prediction features but opposite outputs, indicating a difference in output within the pair owing to the corresponding population group. Further, the server system 200 removes the pair of biased training samples from the plurality of training samples to generate updated training data. Furthermore, the server system 200 provides the updated training data to a data generation model, where the data generation model is configured to generate a plurality of synthetic training samples for the under-represented group based, at least in part, on the updated training data. The exemplary plot shows two different groups of training samples plotted on an X-Y graph. The two different groups are the privileged group and the unprivileged or under-privileged group.
In plots 602 and 604, the triangular entities belong to the privileged group and the circular entities belong to the unprivileged group. The shaded entities correspond to the favorable output and the unshaded entities correspond to the unfavorable output. The overlapped entities are considered to be biased data and the other entities are considered to be unbiased training samples. In the exemplary plot 602, the biased training samples are removed or eliminated using the K% technique to leave only unbiased training samples.
FIG. 7 is a simplified block diagram of a classification model 700, in accordance with an embodiment of the present disclosure. In one embodiment, the classification model 700 is an artificial neural network (ANN) based neural network architecture. In one example, the classification model 700 is a feed-forward neural network. Pursuant to the present example, a Mix-group neural layer is introduced within the classification model 700 to mitigate bias in prediction output or classification output. The Mix-group neural layer provides domain generalization properties to the classification model 700.
For example, unbiased training data corresponding to one or more cardholder groups may be distributed differently. In certain cases, protected attributes of the borrowers may determine the distribution of the unbiased training data. The distributions differ across cardholder groups in the protected attributes due to under-representation or historical bias. However, protected attributes may not, by law, be used as a basis for classification or decision-making. Such protected attributes may be, for example, gender, religion, and race. The classification model 700 may tend to explore this difference in the distribution among the groups to lift the overall accuracy and, in the process, become biased towards certain groups. For example, the classification model 700 may learn that male and female groups with similar features may have different salaries. In another example, geo-location and transaction patterns may be explored to identify suspicious activities or behaviors. In this regard, the classification model 700 may learn unknown or unintended biases (e.g., a particular demographic group being flagged). As a result, the classification model 700 may describe some groups better than others. For example, a linear model may be suitable for one group but not for another. Hence, it is important to put additional constraints on the training of the classification model 700 to handle bias.
Further, the design of the classification model 700 plays a crucial role in introducing bias into the classification model's prediction output. The model architecture may describe some groups better than others. For example, a linear model may be suitable for one group but not for another. Hence, it is important to put additional constraints on the training of the AI-based model to handle bias.
Therefore, the present disclosure introduces group generalization by setting a group mixing neural network in the classification model 700. The group mixing neural network reduces the difference in the distribution between the groups present in the protected attribute and enables learning representations that are invariant across different cardholder groups, such that the classification model 700 does not explore the difference in distribution and give accurate predictions for certain groups only.
In one embodiment, the classification model 700 includes a plurality of feed-forward layers 702 such as layer 1 702a, Mix-group neural layer 702b, layer 2 702c, and layer 3 702d. While four feed-forward layers are depicted in the figure, this is not meant to be limiting. In other embodiments of the present disclosure, a greater or smaller number of feed-forward layers may be present in the classification model 700. The Mix-group neural layer 702b is configured to probabilistically mix group-level features of the unbiased training data across a set of one or more cardholder groups.
As it may be understood, the classification model 700 may be trained in batches. For example, in the first batch, some of the unbiased training data may be provided as input to the classification model 700. The classification model 700 learns learnable parameters from the input, performs an action based on the input, and generates output. The output may be fed back to the classification model 700 in a second batch along with the unbiased training data for the second batch. To this end, the unbiased training data is segmented into a plurality of batches for training the classification model 700 in different batches. For example, the classification model 700 may also be trained based on a plurality of synthetic training data.
Thereafter, a batch-based mean and a batch-based variance are calculated for each of the plurality of batches based, at least in part, on the training data within the corresponding batches. In particular, based on which training data belongs to which batch, the batch-based mean and the batch-based variance are calculated. Thereafter, the plurality of batch-based means and the plurality of batch-based variances associated with the plurality of batches are probabilistically mixed to determine a standard mean and a standard variance. In this manner, all the training data in a batch get the same mean and variance. The different domains are the different groups in the protected attribute, for example, male and female. The male group and the female group have different distributions for similar features across different classes.
The standard mean and the standard variance across all the unbiased training data within the corresponding plurality of batches are normalized. In this manner, the classification model 700 may learn domain invariant representation of the unbiased training data. Based on the learning domain invariant representation of the unbiased training data, the classification model 700 may be fine-tuned.
The Mix-group neural layer 702b is easy to implement, fits into mini-batch training, and does not introduce any additional learnable parameters into the underlying classification model architecture. In other words, the Mix-group neural layer 702b unlearns differences among two or more groups and causes the classification model 700 to focus more on classification, rather than on an exploration of differences among the two or more groups.
In one embodiment, the processor 206 is configured to access input training data 704 pertaining to two different domains (groups) and having two different classes from a database. In an example embodiment, the two different groups may include a male group and a female group. The input training data 704 includes data pertaining to the two different groups and the two classes. For instance, the input training data 704 includes data from the male group whose income is less than or greater than a certain threshold, and from the female group whose income is less than or greater than a certain threshold.
In an embodiment, the Mix-group neural layer 702b is preferably placed between the layers 702a and 702c to easily train the model to mitigate the bias in the input training data. The classification model 700 probabilistically mixes group-level feature statistics of the input training data 704 across different groups in the protected attribute. This method helps the samples in the dataset to get the same mean and variance. The model is then fine-tuned using fewer training data to generate unbiased datasets.
At first, the classification model 700 inputs the input training data (e.g., X ∈ R^D) of batch size N and a group mask M of size N×1 (e.g., [1, 0, 0, …, 1]). In one example, the sample batch X may include:
X = [X_1, X_2, X_3, X_4, …, X_N] ….Eqn. (1)
The classification model 700 feeds the input training data into the layer 1 702a which has the following weights and biases:
W_1 ∈ R^(D×D¹) (Weights) ….Eqn. (2)
b_1 ∈ R^(D¹) (Bias) ….Eqn. (3)
Thereafter, the output (e.g., X¹ ∈ R^(D¹)) of the layer 1 702a is fed into the Mix-group neural layer 702b. The Mix-group neural layer 702b defines an algorithm to help mitigate bias as follows:
def _mixGroup(X¹, M):
λ = Beta(α, β)
X_1 = X¹ × M
X_2 = X¹ × (1 − M)
µ_1 = get_mean(X_1)
var_1 = get_variance(X_1)
s_1 = sqrt(var_1)
µ_2 = get_mean(X_2)
var_2 = get_variance(X_2)
s_2 = sqrt(var_2)
x_norm = ((X_1 − µ_1)/s_1) × M + ((X_2 − µ_2)/s_2) × (1 − M)
µ_mix = µ_1 × λ + µ_2 × (1 − λ)
s_mix = s_1 × λ + s_2 × (1 − λ)
x_norm = x_norm × s_mix + µ_mix
return x_norm
Here, X_1 represents the data points corresponding to group 1. The algorithm calculates the mean and variance of group 1. In a similar manner, the algorithm also calculates the mean and variance of group 2. The algorithm therefore generates x_norm ∈ R^(D¹) by re-scaling both groups with the mixed mean and mixed variance.
The output of the Mix-group neural layer 702b is fed into the layer 2 702c of the classification model 700 that has the following weights and biases:
W_2 ∈ R^(D¹×1) (Weights) …Eqn. (4)
b_2 ∈ R^1 (Bias) …Eqn. (5)
The classification model provides an output 706. Hence, the overall algorithm of the classification model 700 can be defined as follows:
(1) def _classifier(X, M):
(2) X¹ = dropout(relu((X × W_1) + b_1))
(3) X_norm = _mixGroup(X¹, M)
(4) output = sigmoid((X_norm × W_2) + b_2)
(5) return output
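For illustration only, the _mixGroup routine above can be sketched in runnable NumPy as follows. Computing each group's statistics over its own rows (rather than over the zero-masked full batch) and the presence of both groups in every batch are assumptions made here for numerical clarity:

```python
import numpy as np

def mix_group(x, m, alpha=0.1, beta=0.1, eps=1e-6, rng=None):
    """Sketch of the _mixGroup routine of the Mix-group neural layer:
    per-group normalization followed by re-scaling with Beta-mixed statistics.
    Assumes both groups are represented in the batch."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, beta)                  # lambda ~ Beta(alpha, beta)
    g1, g2 = (m == 1), (m == 0)                  # boolean row masks per group

    mu1, s1 = x[g1].mean(0), x[g1].std(0) + eps  # group-1 batch statistics
    mu2, s2 = x[g2].mean(0), x[g2].std(0) + eps  # group-2 batch statistics

    x_norm = np.empty_like(x)
    x_norm[g1] = (x[g1] - mu1) / s1              # normalize each group by its own stats
    x_norm[g2] = (x[g2] - mu2) / s2

    mu_mix = lam * mu1 + (1 - lam) * mu2         # probabilistically mixed mean
    s_mix = lam * s1 + (1 - lam) * s2            # probabilistically mixed std
    return x_norm * s_mix + mu_mix               # whole batch gets the same statistics
```

Because the mixed mean and standard deviation are applied to both groups, every sample in the batch ends up with the same first- and second-order statistics, which is the domain-generalization property the layer is intended to provide.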

FIGS. 8A and 8B show experimental results 800 and 810 for output comparison of the classification model with the Mix-group neural layer, in accordance with an embodiment of the present disclosure. The experimental results show the values of different fairness metrics for different existing classification models and for the classification model with the Mix-group layer. In the experimental results, the models are tested for fairness evaluation using the Adult Census Income dataset.
In one embodiment, the graph is plotted for two protected attributes, namely gender and race. The fairness thresholds are defined using Demographic Parity Difference (DPD), Average Odds Difference (AOD), Equality of Opportunity (EOP), and Demographic Parity Ratio (DPR), where DPD, AOD, and EOP are required to be below 0.1 and DPR is required to be between 0.8 and 1.25. The data is tested against a plain classifier, and the classification model with the Mix-group layer is shown to provide better results than the other models. The performance of the classification model with the Mix-group layer is evaluated and compared with baseline in-processing techniques on public datasets such as the UCI Adult Census and COMPAS datasets, and the classification model with the Mix-group neural layer was more effective in producing fair results.
The fairness evaluation of the group mixing model is performed using two methods: (1) in-processing and (2) intra-processing.
In one embodiment an intra-processing method is defined as an algorithm which has access to a trained model and a dataset (which typically differs from the original training dataset), and outputs a new model which gives debiased predictions on the test data (typically by updating or augmenting the weights of the original model).
The proposed solution produces fairer results when evaluated for new population groups at test time. The dataset evolves with time, and it is possible that new population groups get introduced. For example, gender attribute values may change from binary to non-binary population groups. When some new population groups get introduced into the protected attribute, existing methods would require re-training the model from scratch on the new population groups. However, the present method generalizes well on the new population groups at test time and does not require re-training the model.
The fairness evaluation for an individual protected attribute uses the following fairness metrics:
Classification Accuracy: CA = (TP + TN) / (TP + TN + FP + FN)
Balanced Classification Accuracy: BCA = 1/2 × (TPR + TNR)
Demographic Parity Ratio: DPR = P(Ŷ = 1 | S ≠ 1) / P(Ŷ = 1 | S = 1) = 1 − ε
Demographic Parity Difference: DPD/SPD = |P(Ŷ = 1 | S = 1) − P(Ŷ = 1 | S ≠ 1)| = ε
Equality of Opportunity: EOP = |P(Ŷ = 1 | S = 1, Y = 1) − P(Ŷ = 1 | S ≠ 1, Y = 1)| = ε
Average Odds Difference: AOD = 1/2 × (|FPR(S = 1) − FPR(S = 0)| + |TPR(S = 1) − TPR(S = 0)|)
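Assuming binary predictions Ŷ, labels Y, and a binary protected attribute S encoded as 1 for the privileged group, the metrics above may be computed as in the following sketch; the function names and array encoding are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def demographic_parity_difference(y_pred, s):
    # DPD = |P(Yhat=1 | S=1) - P(Yhat=1 | S!=1)|
    return abs(y_pred[s == 1].mean() - y_pred[s != 1].mean())

def demographic_parity_ratio(y_pred, s):
    # DPR = P(Yhat=1 | S!=1) / P(Yhat=1 | S=1)
    return y_pred[s != 1].mean() / y_pred[s == 1].mean()

def equality_of_opportunity(y_pred, y_true, s):
    # EOP = |TPR(S=1) - TPR(S!=1)|, i.e., the true-positive-rate gap
    tpr_priv = y_pred[(s == 1) & (y_true == 1)].mean()
    tpr_unpriv = y_pred[(s != 1) & (y_true == 1)].mean()
    return abs(tpr_priv - tpr_unpriv)

def average_odds_difference(y_pred, y_true, s):
    # AOD = 1/2 * (|FPR gap| + |TPR gap|) between S=1 and S=0
    def rate(group_mask, label):
        sel = group_mask & (y_true == label)
        return y_pred[sel].mean()
    fpr_gap = abs(rate(s == 1, 0) - rate(s == 0, 0))
    tpr_gap = abs(rate(s == 1, 1) - rate(s == 0, 1))
    return 0.5 * (fpr_gap + tpr_gap)
```

A perfectly fair classifier under these definitions yields DPD, EOP, and AOD of 0 and a DPR of 1, consistent with the thresholds (DPD, AOD, EOP below 0.1; DPR between 0.8 and 1.25) described with respect to FIGS. 8A and 8B.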
In the present solution, the fine-tuning of the pre-trained biased model can be performed on fewer samples, which results in unbiased predictions. Hence, there is no need to train the pre-trained biased models from scratch on the whole dataset to mitigate bias.
FIGS. 9A and 9B, collectively, represent a flow diagram 900 of a training phase for implementing bias mitigation in credit risk assessment, in accordance with an embodiment of the present disclosure. The sequence of operations of the flow diagram 900 may not necessarily be executed in the same order as they are presented. Further, one or more operations may be grouped together and performed in the form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner.
At 902, a server system 200 accesses training data from a database. The training data includes historical credit event timeline data associated with a plurality of cardholders 106a-106c of one or more cardholder groups. In an example, the training data includes census and survey data. Moreover, the training data may include financial data stored by the server system, a credit loan provider (a financial institution), or a third party associated with banking and finance. For example, a training sample may relate to a borrower or an applicant that applied for a loan and got accepted or rejected. The training data may have biased samples, i.e., training data that may affect the outcome of the training process of a credit risk assessment model.
At 904, the server system 200 classifies a plurality of training samples into a privileged group and an unprivileged group. In one example, when a particular cardholder group has a greater number of data representations in the received training samples, high credit limits and credit scores, or high salaries, the particular cardholder group can be considered a 'privileged' or 'favorable' group. In contrast, when a particular cardholder group has a smaller number of data representations in the received training data, low credit limits, low credit scores, or low salaries, the particular cardholder group can be considered an 'unprivileged' or 'unfavorable' group. The server system 200 generates a feature-output plot including data points associated with the training samples. In particular, the feature-output plot is an X-Y plot (such as the plots shown in FIG. 6). A feature axis of the feature-output plot represents a value for a set of prediction features required to predict a credit risk score for a training sample.
At 906, the server system 200 filters biased training data from the training data to generate unbiased training data based, at least in part, on a bias mitigation model (e.g., the K% removal method). In particular, the server system 200 identifies a pair of biased training data from the plurality of training data using the bias mitigation model. The server system 200 removes the pair of biased training data from the training data to generate the unbiased training data. It may be noted that several pairs of biased training data, or one or more clusters including multiple biased training data, may be identified by the server system using the bias mitigation model. In this manner, the server system 200 filters the biased training data to obtain unbiased training data.
At 908, the server system 200 provides the unbiased training data to a fair data generation model to generate fair synthetic training data corresponding to under-represented or unprivileged groups of the one or more cardholder groups. In particular, to overcome the lack of data for the under-represented groups and to train fairly based on each of the one or more cardholder groups, the server system 200 generates the fair synthetic training data. In an example, the fair data generation model is one of a generative adversarial network (GAN) model, a conditional GAN (CTGAN) model, a copula GAN model, and a Gaussian copula GAN model. As it may be understood, each of the synthetic training data indicates financial transaction features and a set of demographic features of a synthetic cardholder from the under-represented group.
Thus, the pre-processing based bias mitigation approach removes biased samples from training data and thereby generates fair synthetic training data independent of the underlying GAN architecture. Models trained using this synthetic training data are fair and unbiased. Further, the server system 200 develops a robust fairness assessment using intersectional bias testing to evaluate the models. The technical objective is to ensure that systems developed using synthetic training data are constrained so as to not increase disparate harms to different community groups. Further, the systems are constrained using synthetic training data so as to not increase the true positive rate and false positive rate ratios among different groups in society. Another objective is to constrain the disparity for groups created using the intersection of protected attributes, such as gender-race, etc. Some data models become gender biased after training with large amounts of data. Even with similar attributes, the male community is given higher preference by the data model and considered the 'privileged group'. In an equitable world, the outcome for a person should be independent of protected attributes. Two groups are formed: a privileged group with favorable outcomes and an under-privileged group with unfavorable outcomes. Since a data model should not consider any protected attribute of a user during prediction, the biased training data is dropped or removed from the training data, thus mitigating biases. The proposed technique allows for fair data generation, which is of high relevance across data-sensitive industries such as finance, healthcare, etc. Conventional methods may rely on utilizing a protected attribute, which at times may not be readily available or may not be allowed to be used. For instances where the protected attribute usage is not allowed, the proposed method develops and validates methods that minimize error between clusters being discriminated differently in an unsupervised setting.
At 910, the server system 200 identifies a minimal ratio of training samples from each of the one or more cardholder groups, wherein the minimal ratio is the same for each cardholder group from the one or more cardholder groups.
At 912, the server system 200 removes a plurality of excess training data from the unbiased training data based on the minimal ratio. Thus, the remaining unbiased training data represents a stratified training dataset with an equal number of training samples for each of the one or more cardholder groups. The unbiased training data may also include synthetic training data corresponding to the under-represented group.
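A minimal sketch of steps 910 and 912, under the assumption that group membership is available as an integer array aligned with the training data, subsamples every group down to the smallest group's size; the function name and parameters are illustrative:

```python
import numpy as np

def stratify_groups(data, groups, rng=None):
    """Keep an equal number of samples per cardholder group (steps 910-912):
    find the smallest group size and subsample every group down to it."""
    rng = rng or np.random.default_rng()
    labels, counts = np.unique(groups, return_counts=True)
    n_min = counts.min()                          # minimal count, same for all groups
    keep = []
    for g in labels:
        idx = np.flatnonzero(groups == g)
        # Step 912: drop excess samples so every group has exactly n_min entries.
        keep.extend(rng.choice(idx, size=n_min, replace=False))
    keep = np.sort(np.asarray(keep))
    return data[keep], groups[keep]
```

The resulting stratified dataset feeds the RL model training at step 914 with a balanced representation of each cardholder group.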
At 914, the server system 200 trains the RL model based, at least in part, on the unbiased training data. The RL model is trained based at least on a plurality of fairness related metrics corresponding to two or more cardholder groups (e.g., the unprivileged group) that are determined in simulated delayed episodes. The training of the RL model may be performed in an offline manner. The training of the RL model is performed at steps 914a-914d in an iterative manner.
At 914a, the server system 200 defines state space and action space of the RL model. The state space may include a plurality of states. Each state corresponds to a plurality of credit features corresponding to cardholders. Each action corresponds to providing a credit loan approval to a particular cardholder or credit loan rejection for the particular cardholder.
At 914b, the server system 200 simulates delayed episodes for each training sample based on an exploration factor.
At 914c, the server system 200 calculates reward values corresponding to each simulated episode based on the delayed reward function.
At 914d, the server system 200 provides the plurality of fairness metrics as feedback for fine-tuning the RL model.
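Purely as an illustration of steps 914a through 914d, a tabular Q-learning sketch with an epsilon-greedy exploration factor and a reward observed only at episode end might look like the following; the state encoding, reward values, and hyper-parameters are hypothetical and not part of the disclosure:

```python
import numpy as np

def train_rl_offline(samples, q_table, episodes=100, epsilon=0.1,
                     gamma=0.9, lr=0.1, rng=None):
    """Illustrative sketch of steps 914a-914d: epsilon-greedy exploration over
    approve/decline actions with a simulated delayed reward per episode.
    `samples` is a list of (state_index, delayed_reward) pairs."""
    rng = rng or np.random.default_rng()
    n_actions = 2                                   # 914a: 0 = decline, 1 = approve
    for _ in range(episodes):
        state, delayed_reward = samples[rng.integers(len(samples))]
        if rng.random() < epsilon:                  # 914b: exploration factor
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(q_table[state]))
        # 914c: reward observed only after the simulated delayed episode ends
        reward = delayed_reward if action == 1 else 0.0
        # 914d: feedback used to fine-tune the action-value estimates
        q_table[state, action] += lr * (reward + gamma * q_table[state].max()
                                        - q_table[state, action])
    return q_table
```

In practice, the disclosure's reward would additionally fold in the plurality of fairness metrics across cardholder groups; this sketch shows only the delayed-reward and exploration mechanics.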
At 916, the server system 200 trains a credit risk assessment model, i.e., the classification model, based on the unbiased training data and the fair synthetic training data. Further, to mitigate bias in the prediction output of the classification model, the server system 200 introduces a Mix-group neural layer within initial layers of the classification model. In an example, the Mix-group neural layer may be introduced in the initial layers of the classification model to remove protected attribute information associated with the unbiased training data, which is more prominent in the initial layers. The Mix-group neural layer probabilistically mixes group-level feature statistics of the training data across different groups in the protected attribute. In this manner, all the training data in a batch of classification model training get the same mean and variance.
The training of the classification model using the group mixing network may enable fine-tuning the pre-trained biased model on lesser unbiased training data to get unbiased prediction from the classification model. Hence, training of the pre-trained biased models from scratch on the whole unbiased training data is not required to mitigate bias in the classification model. This makes the process of bias mitigation more time, cost and energy effective.
At 918, the server system 200 evaluates a plurality of fairness metrics, an A/B testing metric, and a plurality of business metrics based on outputs of the classification model and the RL model. Further, based on the plurality of fairness metrics and the plurality of business metrics, the server system may learn to determine importance scores of the classification model and the reinforcement learning model for predicting whether a future credit loan application for a cardholder should be approved or declined.
FIG. 10 is a flow diagram of a computer-implemented method 1000 for bias mitigation for a credit risk assessment model, in accordance with an embodiment of the present disclosure. The method 1000 depicted in the flow diagram may be executed by the server system 200, which may be a standalone server or a server incorporated as a whole within another server system. Operations of the method 1000, and combinations of operations in the method 1000, may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or a different device associated with the execution of software that includes one or more computer program instructions.
In certain implementations, the method 1000 may be performed by a single processing thread. Alternatively, the method 1000 may be performed by two or more processing threads, each processing thread implementing one or more individual functions, routines, subroutines, or operations of the method. In an illustrative example, the processing threads implementing the method 1000 may be synchronized (e.g., using semaphores, critical sections, and/or other thread synchronization mechanisms). Alternatively, the processing threads implementing the method 1000 may be executed asynchronously with respect to each other. The method 1000 starts at operation 1002.
Although the present disclosure is described in conjunction with implementation in the banking and finance sector for credit risk assessment, this use-case should not be construed as a limitation. In other examples of the present disclosure, the techniques described herein may be used in the healthcare sector to assess insurance, etc.
At 1002, the method 1000 includes accessing, by the server system 200, training data from a database 114. The training data includes a plurality of training samples. Each training sample represents historical credit event timeline data associated with a cardholder. In an example, the training data may be divided into two or more cardholder groups. In such a case, the two or more cardholder groups may be defined based on protected attributes, such as gender, age, race, and the like. Such a distribution of the training data is unwanted and leads to bias in the credit risk assessment model. To this end, the training data may include biased training samples.
At 1004, the method 1000 includes filtering, by the server system 200, biased training data from the training data to generate unbiased training data based, at least in part, on a bias mitigation model. In an example, the server system 200 divides the training data into different groups, i.e., a privileged or well-represented group and unprivileged or under-represented groups. Based on this classification, the server system 200 identifies biased training data that have similar features, belong to different groups from among the well-represented group and the under-represented group, and have different outputs. For example, such biased training data may be identified in pairs or clusters as instances of bias. Based on the difference in outputs, the server system 200 may rank the instances of bias. Subsequently, a pre-defined ratio, for example, the top K% of such bias instances, may be removed to filter the training data.
In certain cases, the training data may lack enough samples for the under-represented or unprivileged groups. In such a case, the method 1000 includes generating, by the server system, fair synthetic training data corresponding to under-represented groups from the two or more cardholder groups based, at least in part, on the unbiased training data and a generative adversarial network (GAN) model.
At 1006, the method 1000 includes evaluating, by the server system 200, fairness evaluation metrics associated with a credit risk assessment model that is trained based, at least in part, on the unbiased training data. The credit risk assessment model is trained for determining a decision whether to extend a credit to a particular cardholder or not. In certain cases, Mix-group neural layer may also be introduced in initial layers of the credit risk assessment model to probabilistically mix group-level feature statistics of unbiased training data of a batch across different cardholder groups in the protected attribute. Subsequently, all the unbiased training data of the batch has a standard mean and a standard variance. In this way, effect of protected attributes on the training data is mitigated from credit risk assessment model.
At 1008, the method 1000 includes training, by the server system 200, a neural network model based, at least in part, on the unbiased training data. The neural network model is trained based at least on a plurality of fairness related metrics corresponding to two or more cardholder groups that are determined in simulated delayed episodes. In particular, based on the unbiased training data, the neural network model, i.e., the RL model, learns to classify training data (or loan applications of applicants) as approve or decline. Such reinforcement learning is performed based on a reward function (that is dependent on fairness related metrics in each episode) and a delayed reward function. The delayed reward function provides delayed reward feedback for a prediction or classification, for example, based on the credit risk score and delinquency rate of a cardholder after a period of time. In an example, an equal number of training samples from each of the set of different groups, i.e., an equal number of training samples associated with the under-privileged and privileged groups, may be used for training.
Although the invention has been described with reference to specific exemplary embodiments, it is noted that various modifications and changes may be made to these embodiments without departing from the broad spirit and scope of the invention. For example, the various operations, blocks, etc., described herein may be enabled and operated using hardware circuitry (for example, complementary metal oxide semiconductor (CMOS) based logic circuitry), firmware, software and/or any combination of hardware, firmware, and/or software (for example, embodied in a machine-readable medium). For example, the apparatuses and methods may be embodied using transistors, logic gates, and electrical circuits (for example, application specific integrated circuit (ASIC) circuitry and/or in Digital Signal Processor (DSP) circuitry).
Particularly, the server system 200 and its various components may be enabled using software and/or using transistors, logic gates, and electrical circuits (for example, integrated circuit circuitry such as ASIC circuitry). Various embodiments of the invention may include one or more computer programs stored or otherwise embodied on a computer-readable medium, wherein the computer programs are configured to cause a processor or computer to perform one or more operations. A computer-readable medium storing, embodying, or encoded with a computer program, or similar language, may be embodied as a tangible data storage device storing one or more software programs that are configured to cause a processor or computer to perform one or more operations. Such operations may be, for example, any of the steps or operations described herein. In some embodiments, the computer programs may be stored and provided to a computer using any type of non-transitory computer readable media. Non-transitory computer readable media include any type of tangible storage media. Examples of non-transitory computer readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g. magneto-optical disks), CD-ROM (compact disc read only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (BLU-RAY® Disc), and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash memory, RAM (random access memory), etc.). Additionally, a tangible data storage device may be embodied as one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices. In some embodiments, the computer programs may be provided to a computer using any type of transitory computer readable media. 
Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g., electric wires, and optical fibers) or a wireless communication line.
Various embodiments of the invention, as discussed above, may be practiced with steps and/or operations in a different order, and/or with hardware elements in configurations which are different than those which are disclosed. Therefore, although the invention has been described based upon these exemplary embodiments, it is noted that certain modifications, variations, and alternative constructions may be apparent and well within the spirit and scope of the invention.
Although various exemplary embodiments of the invention are described herein in a language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as exemplary forms of implementing the claims.
CLAIMS
We claim:

1. A computer-implemented method, comprising:
accessing, by a server system, training data comprising a plurality of training samples from a database, each training sample representing historical credit event timeline data associated with a cardholder of a plurality of cardholders;
filtering, by the server system, biased training data from the training data to generate unbiased training data based, at least in part, on a bias mitigation model;
evaluating, by the server system, fairness evaluation metrics associated with a credit risk assessment model that is trained based, at least in part, on the unbiased training data, the credit risk assessment model trained for determining a decision of whether to extend credit to a particular cardholder; and
training, by the server system, a neural network model based, at least in part, on the unbiased training data, the neural network model trained based at least on a plurality of fairness related metrics corresponding to two or more cardholder groups that are determined in simulated delayed episodes.

2. The computer-implemented method as claimed in claim 1, wherein the historical credit event timeline data comprises an event timeline including credit events associated with the cardholder that occurred during a particular time interval.

3. The computer-implemented method as claimed in claim 1, wherein the bias mitigation model is based, at least in part, on a K% removal method, and wherein filtering the biased training data from the training data comprises:
determining, by the server system, a set of credit features associated with each cardholder based, at least in part, on the plurality of training samples;
removing, by the server system, one or more credit features from the set of credit features whose correlation with a particular protected attribute is equal to or above a threshold correlation value;
dividing, by the server system, the plurality of training samples into the two or more cardholder groups based, at least in part, on one or more protected attributes and labeled data associated with the plurality of training samples, the two or more cardholder groups comprising at least a privileged group and an unprivileged group;
calculating, by the server system, a cosine similarity score of each training sample from the unprivileged group to each training sample of the privileged group;
identifying, by the server system, similar training samples from the privileged group and the unprivileged group based, at least in part, on a similarity threshold value and the cosine similarity score; and
removing, by the server system, top K% biased training samples from each of the privileged group and the unprivileged group to generate the unbiased training data, the top K% training samples removed based, at least in part, on the identifying step.

4. The computer-implemented method as claimed in claim 3, further comprising:
generating, by the server system, fair synthetic training data corresponding to under-represented groups from the two or more cardholder groups based, at least in part, on the unbiased training data and a generative adversarial network (GAN) model.

5. The computer-implemented method as claimed in claim 3, wherein the one or more protected attributes comprise at least one of: gender, ethnicity, age, or demographics.

6. The computer-implemented method as claimed in claim 1, wherein the credit risk assessment model is a classification model.

7. The computer-implemented method as claimed in claim 6, wherein the classification model is an artificial neural network (ANN) and wherein the classification model comprises a plurality of feed-forward layers and a Mix-group neural layer, the Mix-group neural layer configured to mitigate bias in classification output and enable learning representations that are invariant for the two or more cardholder groups.

8. The computer-implemented method as claimed in claim 1, wherein the neural network model is a reinforcement learning (RL) model configured to learn data representations of an under-represented group of the two or more cardholder groups in an offline manner.

9. The computer-implemented method as claimed in claim 8, wherein the RL model is further trained based at least on a delayed reward function, the delayed reward function comprising at least credit risk score gain of the cardholder within an observation period and delinquency rate of the cardholder.

10. A server system configured to perform the computer-implemented method as claimed in any of the claims 1-9.
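For context outside the formal claims, the K% removal procedure of claim 3 can be expressed as a short, non-authoritative sketch. All function names, array layouts, threshold defaults, and the particular "biased sample" scoring rule below are illustrative assumptions, not part of the claimed method; the claims do not specify how similar samples are ranked for removal.

```python
import numpy as np

def k_percent_removal(X, y, protected, corr_threshold=0.8, sim_threshold=0.95, k=5):
    """Illustrative sketch of the claim-3 filtering steps (hypothetical API).

    X         : (n_samples, n_features) credit-feature matrix
    y         : (n_samples,) labels
    protected : (n_samples,) binary protected attribute (1 = privileged group)
    """
    # Step 1: drop credit features whose correlation with the protected
    # attribute meets or exceeds the threshold correlation value.
    corrs = np.array([abs(np.corrcoef(X[:, j], protected)[0, 1])
                      for j in range(X.shape[1])])
    X = X[:, corrs < corr_threshold]

    # Step 2: divide samples into privileged and unprivileged groups.
    priv_idx = np.where(protected == 1)[0]
    unpriv_idx = np.where(protected == 0)[0]

    # Step 3: cosine similarity of each unprivileged sample to each
    # privileged sample (rows: unprivileged, columns: privileged).
    def normalize(A):
        return A / np.linalg.norm(A, axis=1, keepdims=True)
    sims = normalize(X[unpriv_idx]) @ normalize(X[priv_idx]).T

    # Step 4 (assumed scoring rule): count near-duplicates across groups,
    # then remove the top K% highest-scoring samples from each group.
    unpriv_score = (sims >= sim_threshold).sum(axis=1)
    priv_score = (sims >= sim_threshold).sum(axis=0)

    def keep(idx, score):
        n_drop = int(len(idx) * k / 100)
        drop = idx[np.argsort(score)[::-1][:n_drop]]
        return np.setdiff1d(idx, drop)

    keep_idx = np.concatenate([keep(priv_idx, priv_score),
                               keep(unpriv_idx, unpriv_score)])
    return X[keep_idx], y[keep_idx]
```

Under this sketch, the output is the unbiased training data of the claims: the feature matrix with protected-correlated columns removed and the top K% most cross-group-similar samples dropped from each group.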
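The delayed reward function of claim 9 combines the cardholder's credit risk score gain within an observation period with the cardholder's delinquency rate. A minimal sketch, assuming a simple weighted difference; the weights `alpha` and `beta` and the linear form are hypothetical choices not specified by the claims:

```python
def delayed_reward(score_start: float, score_end: float,
                   delinquency_rate: float,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """Reward observed only at the end of a simulated delayed episode:
    credit-score gain over the observation period, penalized by the
    cardholder's delinquency rate. alpha and beta are assumed weights."""
    score_gain = score_end - score_start
    return alpha * score_gain - beta * delinquency_rate

# Example: a 50-point score gain with a 10% delinquency rate, beta = 100
reward = delayed_reward(600.0, 650.0, 0.10, alpha=1.0, beta=100.0)  # 40.0
```

Because the reward only materializes at the end of the observation period, the RL model of claim 8 must be trained offline on simulated delayed episodes rather than on immediate per-decision feedback.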

Documents

Application Documents

# Name Date
1 202141035017-STATEMENT OF UNDERTAKING (FORM 3) [03-08-2021(online)].pdf 2021-08-03
2 202141035017-PROVISIONAL SPECIFICATION [03-08-2021(online)].pdf 2021-08-03
3 202141035017-POWER OF AUTHORITY [03-08-2021(online)].pdf 2021-08-03
4 202141035017-FORM 1 [03-08-2021(online)].pdf 2021-08-03
5 202141035017-DRAWINGS [03-08-2021(online)].pdf 2021-08-03
6 202141035017-DECLARATION OF INVENTORSHIP (FORM 5) [03-08-2021(online)].pdf 2021-08-03
7 202141035017-Correspondence_Power of Attorney_09-08-2021.pdf 2021-08-09
8 202141035017-Proof of Right [19-11-2021(online)].pdf 2021-11-19
9 202141035017-Correspondence_Copy of Assignment_06-12-2021.pdf 2021-12-06
10 202141035017-DRAWING [02-08-2022(online)].pdf 2022-08-02
11 202141035017-CORRESPONDENCE-OTHERS [02-08-2022(online)].pdf 2022-08-02
12 202141035017-COMPLETE SPECIFICATION [02-08-2022(online)].pdf 2022-08-02
13 202141035017-FORM 18 [28-07-2025(online)].pdf 2025-07-28