
Methods And Systems For Training A Fraud Detection Model With Curriculum Learning

Abstract: Embodiments provide methods and systems for training a fraud detection model with curriculum learning. The method performed by a server system includes accessing a plurality of electronic payment transactions from a transaction database. The method includes pre-processing the plurality of electronic payment transactions for extracting a set of transaction data points corresponding to the plurality of electronic payment transactions. The method includes calculating curriculum level scores corresponding to the set of transaction data points based on an auto-encoder. The method includes sorting the set of transaction data points based on a curriculum level score associated with each transaction data point. The method includes inputting the set of transaction data points into the fraud detection model based on the curriculum level scores. The fraud detection model is trained in an order of the sorted transaction data points from easy training data points to hard training data points.


Patent Information

Application #
Filing Date
17 December 2021
Publication Number
25/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
ipo@epiphanyipsolutions.com
Parent Application

Applicants

MASTERCARD INTERNATIONAL INCORPORATED
2000 Purchase Street, Purchase, NY 10577, United States of America

Inventors

1. Kanishka Kayathwal
100, Vallabhbari, Gumanpura, Kota 324007, Rajasthan, India
2. Hardik Wadhwa
H.No. 125 B.Block Sirsa 125055, Haryana, India
3. Nitish Kumar
Flat No -316 E, Aditya syndicate, Adityapur, Jamshedpur 831014, Jharkhand, India
4. Karthikeswaren R
No. 1 K.P. Koil Street Saidapet, Chennai 600015, Tamil Nadu, India

Specification

CLAIMS
We claim:

1. A computer-implemented method, comprising:
accessing, by a server system, a plurality of electronic payment transactions for a particular time duration from a transaction database;
pre-processing, by the server system, the plurality of electronic payment transactions for extracting a set of transaction data points corresponding to the plurality of electronic payment transactions;
calculating, by the server system, curriculum level scores corresponding to the set of transaction data points, the curriculum level scores being calculated based, at least in part, on an auto-encoder;
sorting, by the server system, the set of transaction data points based, at least in part, on the curriculum level scores corresponding to the set of transaction data points; and
inputting, by the server system, the set of transaction data points into a fraud detection model for training the fraud detection model based, at least in part, on the curriculum level scores corresponding to the set of transaction data points, the fraud detection model trained in an order of the sorted transaction data points from easy training data points to hard training data points.

2. The computer-implemented method as claimed in claim 1, further comprising:
segmenting, by the server system, the set of transaction data points into easy training data points and hard training data points based, at least in part, on the curriculum level scores and fine-tunable hyper-parameters; and
generating, by the server system, an input vector associated with each transaction data point of the set of transaction data points based, at least in part, on the plurality of electronic payment transactions.

3. The computer-implemented method as claimed in claim 1, wherein the fraud detection model is trained for classifying the plurality of electronic payment transactions into fraudulent transactions or non-fraudulent transactions.

4. The computer-implemented method as claimed in claim 1, further comprising:
generating, by the server system, a plurality of data features based, at least in part, on the set of transaction data points, wherein the plurality of data features is divided into numerical data features and categorical data features, wherein the categorical data features are label encoded.

5. The computer-implemented method as claimed in claim 4, further comprising:
passing, by the server system, the categorical data features through an embedding layer of the fraud detection model and the numerical data features through a dense layer of the fraud detection model;
concatenating, by the server system, output of the embedding layer and the dense layer to generate a concatenated output; and
passing, by the server system, the concatenated output through a series of dense layers for training the fraud detection model.

6. The computer-implemented method as claimed in claim 2, wherein the fine-tunable hyper-parameters comprise an upper threshold value and a lower threshold value.

7. The computer-implemented method as claimed in claim 6, wherein the easy training data points comprise data samples having a set of curriculum level scores between the upper threshold value and the lower threshold value.

8. The computer-implemented method as claimed in claim 6, wherein the hard training data points comprise data samples having a set of curriculum level scores above the upper threshold value or below the lower threshold value.

9. The computer-implemented method as claimed in claim 2, wherein the fraud detection model is initially trained with the easy training data points and the hard training data points are gradually introduced in a ratio until the fraud detection model is completely trained, and wherein the ratio is a proportion of the number of the easy training data points to the number of the hard training data points.

10. A server system configured to perform the computer-implemented method as claimed in any one of claims 1-9.

11. A computer-implemented method of training a fraud detection model with curriculum learning in payment transactions, the computer-implemented method comprising:
accessing, by a server system associated with a payment network, a plurality of electronic payment transactions between a plurality of cardholders and one or more merchants for a particular time duration from a transaction database;
pre-processing, by the server system, the plurality of electronic payment transactions for extracting a set of transaction data points corresponding to the plurality of electronic payment transactions;
calculating, by the server system, curriculum level scores corresponding to the set of transaction data points, the curriculum level scores being calculated based, at least in part, on an auto-encoder;
sorting, by the server system, the set of transaction data points based, at least in part, on the curriculum level scores corresponding to the set of transaction data points;
accessing, by the server system, a fraud detection model from a model database; and
inputting, by the server system, the set of transaction data points into the fraud detection model for training the fraud detection model based, at least in part, on the curriculum level scores corresponding to the set of transaction data points, the fraud detection model trained in an order of the sorted transaction data points from easy training data points to hard training data points, wherein the fraud detection model is trained for classifying the plurality of electronic payment transactions into fraudulent transactions or non-fraudulent transactions.
Description:
FORM 2
THE PATENTS ACT 1970
(39 of 1970)
&
The Patent Rules 2003
COMPLETE SPECIFICATION
(refer section 10 & rule 13)

1. TITLE OF THE INVENTION:
METHODS AND SYSTEMS FOR TRAINING A FRAUD DETECTION MODEL WITH CURRICULUM LEARNING

2. APPLICANT(S):

(a) Name:

(b) Nationality:

(c) Address:

MASTERCARD INTERNATIONAL INCORPORATED

United States of America

2000 Purchase Street, Purchase, NY 10577, United States of America

3. PREAMBLE TO THE DESCRIPTION

The following specification particularly describes the invention and the manner in which it is to be performed.

4. DESCRIPTION
(See next page)


METHODS AND SYSTEMS FOR TRAINING A FRAUD DETECTION MODEL WITH CURRICULUM LEARNING

TECHNICAL FIELD
[0001] The present disclosure relates to artificial intelligence systems and, more particularly, to electronic methods and complex processing systems for training fraud detection models with curriculum learning.

BACKGROUND
[0002] Over the last few years, there has been an increase in fraudulent transactions performed with payment cards. In general, a payment card is issued by a financial institution, such as a bank, to enable its owner (i.e., cardholder) to perform financial transactions or payments through electronic funds transfer or access currency notes through an automated teller machine (ATM) based on funds stored in the payment account of the cardholder. With the use of information stored on a payment card, fraudsters may perform card-present or card-not-present (CNP) fraud. Tracking fraud or default in electronic payment transactions is a very challenging task because fraudsters keep utilizing very sophisticated techniques in online payment account frauds, where such transactions appear to be legitimate transactions to the parties involved. In fact, fraudsters can look and behave exactly how an authentic customer might be expected to look and behave while performing online and/or offline transactions.
[0003] Fraudulent transactions impact card issuers in a negative way. Earlier, fraudulent card transactions could happen only when a stolen card was used by a fraudster at a merchant’s location. In later years, with better access to the internet in all parts of the world and a booming e-commerce market, the number of card-not-present (CNP) transactions increased significantly, making it easier for fraudsters to attack. To detect fraudulent transactions, companies (e.g., payment card companies) started leveraging technologies (e.g., artificial intelligence (AI), machine learning (ML), neural network (NN) models, etc.) to their advantage, and it worked in their favor for the most part.
[0004] However, existing fraud detection models face the problem of data sparsity. The problem of data sparsity arises because the number of fraudulent transactions is very small compared to the number of non-fraudulent transactions. Due to the unbalanced classes (fraudulent transactions and non-fraudulent transactions), the model finds it very difficult to accurately separate the two classes. This further leads to reduced accuracy of the fraud detection model, as the model is not trained on enough examples to predict fraudulent transactions with accuracy during the implementation stage.
[0005] In view of the above discussion, there exists a technological need for a method of training a fraud detection model with improved accuracy.

SUMMARY
[0006] Various embodiments of the present disclosure provide methods and systems for improving merchant fraud detection with curriculum learning.
[0007] In an embodiment, a computer-implemented method is disclosed. The computer-implemented method performed by a server system includes accessing, by a server system, a plurality of electronic payment transactions for a particular time duration from a transaction database. The computer-implemented method includes pre-processing, by the server system, the plurality of electronic payment transactions for extracting a set of transaction data points corresponding to the plurality of electronic payment transactions. The computer-implemented method includes calculating, by the server system, curriculum level scores corresponding to the set of transaction data points. The curriculum level scores are calculated based, at least in part, on an auto-encoder. The computer-implemented method includes sorting, by the server system, the set of transaction data points based, at least in part, on a curriculum level score associated with each of the set of transaction data points. The computer-implemented method includes inputting, by the server system, the set of transaction data points into the fraud detection model based, at least in part, on the curriculum level scores corresponding to the set of transaction data points. The fraud detection model is trained in an order of the sorted transaction data points from the easy training data points to the hard training data points.
[0008] Other aspects and example embodiments are provided in the drawings and the detailed description that follows.

BRIEF DESCRIPTION OF THE FIGURES
[0009] For a more complete understanding of example embodiments of the present technology, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
[0010] FIG. 1 illustrates an exemplary representation of an environment related to at least some example embodiments of the present disclosure;
[0011] FIG. 2 illustrates a simplified block diagram of a server system, in accordance with an embodiment of the present disclosure;
[0012] FIG. 3 is an example representation of communication data flow in various modules of the server system for training a fraud detection model with curriculum learning, in accordance with an embodiment of the present disclosure;
[0013] FIG. 4 is a schematic representation of a process for sorting and segmenting a set of transaction data points into easy training data points and hard training data points, in accordance with an embodiment of the present disclosure;
[0014] FIG. 5 represents a flow chart of a process flow for the training of the fraud detection model with curriculum learning, in accordance with an embodiment of the present disclosure; and
[0015] FIG. 6 illustrates a flow diagram depicting a computer-implemented method for training of the fraud detection model with curriculum learning, in accordance with an embodiment of the present disclosure.
[0016] The drawings referred to in this description are not to be understood as being drawn to scale except if specifically noted, and such drawings are only exemplary in nature.

DETAILED DESCRIPTION
[0017] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure can be practiced without these specific details.
[0018] Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. The appearances of the phrase “in an embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
[0019] Moreover, although the following description contains many specifics for the purposes of illustration, anyone skilled in the art will appreciate that many variations and/or alterations to said details are within the scope of the present disclosure. Similarly, although many of the features of the present disclosure are described in terms of each other, or in conjunction with each other, one skilled in the art will appreciate that many of these features can be provided independently of other features. Accordingly, this description of the present disclosure is set forth without any loss of generality to, and without imposing limitations upon, the present disclosure.
[0020] The term "merchant", used throughout the description generally refers to a seller, a retailer, a purchase location, an organization, or any other entity that is in the business of selling goods or providing services, and it can refer to either a single business location or a chain of business locations of the same entity.
[0021] The terms "cardholder", “user”, and “customer” are used interchangeably throughout the description and refer to a person who holds a credit or a debit card that will be used by a merchant to perform a payment transaction.
[0022] The term "payment network", used herein, refers to a network or collection of systems used for the transfer of funds through the use of cash substitutes. Payment networks may use a variety of different protocols and procedures in order to process the transfer of money for various types of transactions. Transactions that may be performed via a payment network may include product or service purchases, credit purchases, debit transactions, fund transfers, account withdrawals, etc. Payment networks may be configured to perform transactions via cash-substitutes, which may include payment cards, letters of credit, checks, financial accounts, etc. Examples of networks or systems configured to perform as payment networks include those operated by such as Mastercard®.
[0023] The terms "easy training data points", "easy data points", "easy training examples", and "easy examples" are used interchangeably throughout the description and refer to fraudulent transactions that can easily be interpreted or recognized as fraudulent transactions by a fraud detection model. The fraudulent transactions may have certain attributes based on which the fraud detection model easily classifies such transactions as fraudulent transactions.
[0024] The terms "hard training data points", "hard data points", "hard training examples", and "hard examples" are used interchangeably throughout the description and refer to transactions that cannot be easily interpreted or recognized as a fraudulent transaction or non-fraudulent transaction by a fraud detection model. The fraudulent transactions may have certain attributes that make them difficult for the fraud detection model to classify such transactions as fraudulent transactions.
OVERVIEW
[0025] Various embodiments of the present disclosure provide methods, systems, electronic devices, and computer program products for improving a fraud detection model with curriculum learning. More specifically, embodiments of the present disclosure disclose a method for training the fraud detection model with curriculum learning. In one embodiment, the fraud detection model corresponds to any neural network (NN) architecture.
[0026] In general, curriculum learning is a method of training a machine learning (ML) model or NN model in which the model is initially trained with easier training examples and the difficulty level of the training examples is gradually increased until the model is completely trained. In other words, curriculum learning mimics how a human child typically learns. For example, a child is initially taught the alphabet and then words, or arithmetic is generally taught before teaching algebra to the child. In a similar manner, curriculum learning is a method of teaching any ML model or NN model initially with easy data points, after which the corresponding ML model or NN model is gradually trained with hard or difficult data points.
[0027] In general, the neural network is a network of artificial neurons or nodes that mimics the way a human brain operates to recognize patterns and solve problems in the field of artificial intelligence (AI), machine learning (ML), deep learning (DL), and the like. In addition, there are many different types of neural networks such as recurrent neural network (RNN), convolutional neural network (CNN), feed-forward neural network, etc.
[0028] As noted above, any fraud detection model based on neural networks faces a problem of data sparsity. Enough training data does not exist to improve the accuracy of the fraud detection model because the number of fraudulent transactions is very small compared to the number of non-fraudulent transactions. As a result, the fraud detection model is able to detect fraudulent transactions in real-time but with lower accuracy and thus may fail to detect a few fraudulent transactions, which may further lead to financial loss.
[0029] To overcome such problems or limitations, the present disclosure describes a server system that is configured to train the fraud detection model with curriculum learning. In one embodiment, the use of curriculum learning boosts the performance of the neural network model. The fraud detection model is initially trained on easy data points and hard data points are gradually added to train the fraud detection model over the next batches and epochs.
[0030] At least one of the technical problems addressed by the present disclosure includes: (i) improved accuracy of fraud detection models and (ii) performance boost of neural networks.
[0031] The server system includes at least a processor and memory. In one non-limiting example, the server system is a payment server. The server system is configured to access a plurality of electronic payment transactions for a particular time duration (e.g., 1 month, 3 months, 6 months, 1 year, etc.) from a transaction database. In one embodiment, the server system may send some queries to the transaction database to access the plurality of electronic payment transactions. The electronic payment transactions may further be performed between a plurality of cardholders and a plurality of merchants.
[0032] The server system is configured to pre-process the plurality of electronic payment transactions to extract a set of transaction data points corresponding to the plurality of electronic payment transactions. In one embodiment, each data point may refer to the transaction features of each electronic payment transaction. In addition, the server system is configured to generate a plurality of data features based, at least in part, on the set of transaction data points. Each data feature of the plurality of data features is categorized into numerical data features or categorical data features. The categorical data features are label encoded. In one embodiment, the server system is configured to generate an input vector associated with each transaction data point of the set of transaction data points based, at least in part, on the plurality of electronic payment transactions.
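The label-encoding step described above can be sketched in a few lines of Python. This is an illustrative sketch only; the feature names and values (e.g., merchant_category) are hypothetical and not taken from the specification:

```python
# Illustrative sketch of label encoding for categorical transaction
# features; feature names and values here are hypothetical examples.

def label_encode(values):
    """Map each distinct category to a small integer index."""
    mapping = {}
    encoded = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)  # assign the next unused index
        encoded.append(mapping[v])
    return encoded, mapping

# Toy transaction data points: a numerical feature used as-is and a
# categorical feature that must be label encoded before embedding.
amounts = [12.5, 830.0, 49.9]
merchant_category = ["grocery", "travel", "grocery"]

encoded, mapping = label_encode(merchant_category)
print(encoded)  # [0, 1, 0]
```

In practice a library encoder (e.g., scikit-learn's LabelEncoder) would typically be used; the point is only that each categorical value becomes an integer index suitable for an embedding lookup.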
[0033] The server system is configured to perform label encoding on the categorical data features. The categorical data features are passed through an embedding layer of the fraud detection model, and the numerical data features are passed through a dense layer of the fraud detection model. In addition, the outputs of the embedding layer and the dense layer are concatenated to generate a concatenated output. The concatenated output is further passed through a series of dense layers to train the fraud detection model.
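The layer arrangement in the paragraph above can be sketched with plain NumPy. The dimensions below are hypothetical (the specification does not fix layer sizes), and the random weights stand in for learned parameters; a real implementation would use a deep learning framework:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 10 category values embedded into 4 dimensions,
# and 3 numerical features projected through an 8-unit dense layer.
n_categories, embed_dim = 10, 4
n_numeric, dense_dim = 3, 8

embedding_table = rng.normal(size=(n_categories, embed_dim))  # embedding layer
W = rng.normal(size=(dense_dim, n_numeric))                   # dense layer weights
b = np.zeros(dense_dim)                                       # dense layer bias

def forward(cat_index, numeric_features):
    embedded = embedding_table[cat_index]                  # embedding lookup
    dense_out = np.maximum(0.0, W @ numeric_features + b)  # ReLU dense layer
    return np.concatenate([embedded, dense_out])           # concatenated output

x = forward(2, np.array([0.5, -1.0, 3.2]))
print(x.shape)  # (12,) -- this vector feeds the series of further dense layers
```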
[0034] The server system is configured to calculate curriculum level scores corresponding to the set of transaction data points. In addition, the curriculum level scores are calculated based, at least in part, on an auto-encoder. In one embodiment, a curriculum level score is calculated for each transaction data point of the set of transaction data points. The server system is configured to sort the set of transaction data points based, at least in part, on the curriculum level score associated with each of the set of transaction data points. In one embodiment, the server system sorts the set of transaction data points in decreasing order of the curriculum level score. In one embodiment, the value of the curriculum level scores defines the probability of fraudulent transactions.
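One plausible reading of the auto-encoder based scoring is to use reconstruction error as the curriculum level score and sort in decreasing order of that score. The sketch below assumes an already-trained linear auto-encoder (random weights stand in for trained ones) and is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical, already-trained linear auto-encoder weights; in the
# described method the auto-encoder would be trained on the transaction
# data points beforehand.
d, k = 6, 2                      # input dimension and bottleneck size
W_enc = rng.normal(size=(k, d))  # encoder weights
W_dec = rng.normal(size=(d, k))  # decoder weights

def curriculum_level_score(x):
    """Reconstruction error of the auto-encoder, used as the score."""
    code = W_enc @ x             # encoder: compress the input to a code
    recon = W_dec @ code         # decoder: reconstruct the input from the code
    return float(np.mean((x - recon) ** 2))

data_points = rng.normal(size=(5, d))
scores = np.array([curriculum_level_score(x) for x in data_points])

# Sort the data points in decreasing order of curriculum level score.
order = np.argsort(-scores)
sorted_points = data_points[order]
```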
[0035] The server system is further configured to segment the set of transaction data points into easy training data points and hard training data points based, at least in part, on the curriculum level scores and fine-tunable hyper-parameters. In one embodiment, the fine-tunable hyper-parameters may include an upper threshold value, a lower threshold value, and a curriculum learning function. The easy training data points include data samples having curriculum level scores between the upper threshold value and the lower threshold value. The hard training data points include data samples having curriculum level scores above the upper threshold value or below the lower threshold value.
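The segmentation by the two threshold hyper-parameters can be sketched as follows; the scores and threshold values below are hypothetical and would be fine-tuned in practice:

```python
import numpy as np

def segment(scores, lower, upper):
    """Split data point indices into easy (score within the thresholds)
    and hard (score outside the thresholds) groups."""
    scores = np.asarray(scores)
    easy = np.where((scores >= lower) & (scores <= upper))[0]
    hard = np.where((scores < lower) | (scores > upper))[0]
    return easy, hard

# Toy curriculum level scores with illustrative threshold values.
easy, hard = segment([0.1, 0.5, 0.9, 2.0, 0.02], lower=0.05, upper=1.0)
print(easy.tolist(), hard.tolist())  # [0, 1, 2] [3, 4]
```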
[0036] Furthermore, the server system is configured to train the fraud detection model. The server system is configured to input the set of transaction data points into the fraud detection model based, at least in part, on the curriculum level scores corresponding to the set of transaction data points. The fraud detection model is trained in an order of the sorted transaction data points from the easy training data points to the hard training data points. In one example, the fraud detection model is initially trained with the easy training data points, and the hard training data points are gradually introduced in a pre-defined ratio until the fraud detection model trains only on the hard training data points. In one embodiment, the pace of introduction of the hard training data points is determined based, at least in part, on the curriculum learning function. The fraud detection model is trained to classify an electronic payment transaction into a fraudulent transaction or non-fraudulent transaction in real-time. The pre-defined ratio is the proportion of the number of easy training data points to the number of hard training data points.
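The pacing of the training schedule can be sketched with a simple linear curriculum learning function. This is only one possible choice of pacing function; the specification leaves the curriculum learning function and the pre-defined ratio open:

```python
def hard_fraction(epoch, total_epochs):
    """Linear pacing: the fraction of hard examples in each batch grows
    from 0 at the first epoch to 1 at the last."""
    return min(1.0, epoch / max(1, total_epochs - 1))

def batch_composition(batch_size, epoch, total_epochs):
    """Number of (easy, hard) examples to draw for one training batch."""
    n_hard = int(round(batch_size * hard_fraction(epoch, total_epochs)))
    return batch_size - n_hard, n_hard

for epoch in range(5):
    print(epoch, batch_composition(32, epoch, total_epochs=5))
# epoch 0 trains on easy examples only; later epochs mix in
# progressively more hard examples until only hard ones remain.
```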
[0037] Various embodiments of the present disclosure offer multiple advantages and technical effects. For instance, the present disclosure performs training of a fraud detection neural network (NN) model with curriculum learning. In addition, the present disclosure describes training the fraud detection model initially with easy training data points and further increasing the difficulty level of training by gradually introducing the hard training data points. The present disclosure further provides a performance boost and improved accuracy over existing fraud detection models.
[0038] Various example embodiments of the present disclosure are described hereinafter with reference to FIGS. 1 to 6.
[0039] FIG. 1 illustrates an exemplary representation of an environment 100 related to at least some example embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise depending on, for example, improving merchant fraud detection with facilitation of curriculum learning, etc. The environment 100 generally includes a server system 102, a model database 104 including a fraud detection model 106, a transaction database 108 including information of one or more merchants 112, a cardholder 114, an acquirer server 116, and an issuer server 118, each coupled to, and in communication with (and/or with access to) a network 110. The network 110 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber-optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among the entities illustrated in FIG. 1, or any combination thereof.
[0040] Various entities in the environment 100 may connect to the network 110 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof. For example, the network 110 may include multiple different networks, such as a private network made accessible by the network 110 to the server system 102, and a public network (e.g., the Internet, etc.).
[0041] The server system 102 is configured to perform one or more of the operations described herein. The server system 102 is configured to apply curriculum learning to rare event detections such as transaction frauds, where classes are imbalanced and a detection model finds it difficult to separate the two classes. In one embodiment, the server system 102 is configured to train a fraud detection model 106 with the facilitation of curriculum learning methods. The server system 102 is configured to build a deep learning based fraud detection model to detect fraud at a transaction level using transaction attributes and transaction velocity features. The server system 102 is configured to enhance the training of the fraud detection model by introducing the curriculum learning based training.
[0042] The server system 102 is a separate part of the environment 100 and may operate apart from (but still in communication with, for example, via the network 110) any third-party external servers to access data to perform the various operations described herein. However, in other embodiments, the server system 102 may actually be incorporated, in whole or in part, into one or more parts of the environment 100, for example, a payment server 120. In addition, the server system 102 should be understood to be embodied in at least one computing device in communication with the network 110, which may be specifically configured, via executable instructions, to perform as described herein, and/or embodied in at least one non-transitory computer-readable media.
[0043] In one embodiment, the server system 102 is configured to access a plurality of electronic payment transactions for a particular time duration (e.g., 1 month, 3 months, 6 months, 1 year, etc.) from the transaction database 108. In one embodiment, the electronic payment transactions may be performed between the one or more merchants 112 and a plurality of cardholders (e.g., the cardholder 114). The information associated with the electronic payment transactions may be stored in the transaction database 108. In addition, the transaction database 108 may be a repository of electronic payment transactions that occurred between the one or more merchants 112 and the plurality of cardholders (e.g., the cardholder 114) in the past.
[0044] After accessing the plurality of electronic payment transactions, the server system 102 is configured to pre-process the plurality of electronic payment transactions to extract a set of transaction data points corresponding to the plurality of electronic payment transactions. In addition, the server system 102 may generate an input vector associated with each transaction data point of the set of transaction data points based, at least in part, on the plurality of electronic payment transactions. In one embodiment, the server system 102 is configured to categorize each of the set of transaction data points into a numerical data feature or a categorical data feature.
[0045] The server system 102 is configured to calculate a curriculum level score corresponding to each transaction data point based, at least in part, on an auto-encoder and, further, the set of transaction data points is sorted based on the value of the curriculum level score. In one example, the server system 102 is configured to calculate a first curriculum level score for the first transaction data point, a second curriculum level score for the second transaction data point, and so on. In general, an auto-encoder is a type of artificial neural network that is used to learn a compressed representation of raw data. In other words, an auto-encoder is a type of neural network that includes an encoder to convert an input into a code; based on this code, a decoder reconstructs the input.
[0046] In one embodiment, the server system 102 is configured to sort the set of transaction data points in decreasing order of value of the curriculum level scores of the set of transaction data points. The server system 102 is further configured to segment the set of transaction data points into easy training data points and hard training data points based, at least in part, on the curriculum level scores and fine-tunable hyper-parameters (i.e., an upper threshold value, a lower threshold value, and a curriculum learning function). The server system 102 is configured to initially train the fraud detection model 106 with the easy training data points and then moderately introduce the hard training data points in a pre-defined ratio until the fraud detection model 106 is completely trained. In one embodiment, the pre-defined ratio may be set by the server system 102 or an administrator.
[0047] The model database 104 includes the fraud detection model 106. In one embodiment, the model database 104 provides a storage location for the fraud detection model 106. In an example, information related to the fraud detection model 106 (e.g., metadata) is stored using tables in the model database 104. In another example, information related to the fraud detection model 106 is stored in an Extensible Markup Language (XML) file or a binary file. In one embodiment, the fraud detection model 106 may be associated with the issuer server 118.
[0048] In one embodiment, the fraud detection model 106 is a neural network (NN) model. In an example, the fraud detection model 106 is a three-layer NN model, a four-layer NN model, a five-layer NN model, …, n-layer neural network model, where n is a natural number. In addition, the fraud detection model 106 may include an input layer, various hidden layers, and an output layer.
[0050] The transaction database 108 includes information associated with the one or more merchants 112. In one embodiment, the transaction database 108 provides a storage location for information associated with the one or more merchants 112. In one example, information associated with the one or more merchants 112 may include data such as the name of various merchants, transaction information of cardholders (e.g., the cardholder 114) at a particular merchant, fraudulent or non-fraudulent transaction information performed at various merchants, various terminals (e.g., point-of-sale (POS) device, automated teller machine (ATM), etc.) associated with each merchant, and the like.
[0050] In one embodiment, the one or more merchants 112 may be associated with the acquirer server 116. In one embodiment, the acquirer server 116 is associated with a financial institution (e.g., a bank) that processes financial transactions. This can be an institution that facilitates the processing of payment transactions for physical stores, merchants (e.g., the one or more merchants 112), or an institution that owns platforms that make online purchases or purchases made via software applications possible (e.g., shopping cart platform providers and in-app payment processing providers). The terms “acquirer”, “acquiring bank”, or “acquirer server” will be used interchangeably herein.
[0051] The cardholder such as the cardholder 114 may be any individual, representative of a corporate entity, non-profit organization, or any other person. The cardholder 114 may have a payment account issued by corresponding issuing banks (associated with the issuer server 118) and may be provided a payment card with financial or other account information encoded onto the payment card such that the cardholder 114 may use the payment card to initiate and complete a transaction using a bank account at the issuing bank. Examples of the payment card may include, but are not limited to, a smartcard, a debit card, a credit card, etc.
[0052] In one embodiment, the cardholder 114 is associated with the issuer server 118. In one embodiment, the issuer server 118 is associated with a financial institution normally called an "issuer bank", an "issuing bank", or simply an "issuer", with which a cardholder (e.g., the cardholder 114) may have a payment account. The issuer also issues a payment card, such as a credit card or a debit card, and provides microfinance banking services (e.g., processing of payment transactions using credit/debit cards) to the cardholder (e.g., the cardholder 114).
[0053] In one embodiment, a payment network 122 may be used by the payment card issuing authorities as a payment interchange network. The payment network 122 may include a plurality of payment servers such as the payment server 120. Examples of payment interchange networks include, but are not limited to, the Mastercard® payment system interchange network. The Mastercard® payment system interchange network is a proprietary communications standard promulgated by Mastercard International Incorporated® for the exchange of financial transactions among a plurality of financial institutions that are members of Mastercard International Incorporated®. (Mastercard is a registered trademark of Mastercard International Incorporated located in Purchase, N.Y.).
[0054] The number and arrangement of systems, devices, and/or networks shown in FIG. 1 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 100 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 100.
[0055] Referring now to FIG. 2, a simplified block diagram of a server system 200 is shown, in accordance with an embodiment of the present disclosure. The server system 200 is identical to the server system 102. In some embodiments, the server system 200 is embodied as a cloud-based and/or SaaS-based (software as a service) architecture. The server system 200 includes a computer system 202 and a database 204. In one embodiment, the database 204 is identical to the model database 104. The computer system 202 includes at least one processor 206 for executing instructions, a memory 208, a communication interface 210, and a storage interface 214 that communicate with each other via a bus 212.
[0056] In some embodiments, the database 204 is integrated within the computer system 202. For example, the computer system 202 may include one or more hard disk drives as the database 204. A storage interface 214 is any component capable of providing the processor 206 with access to the database 204. The storage interface 214 may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a RAID controller, a SAN adapter, a network adapter, and/or any component providing the processor 206 with access to the database 204. In one embodiment, the database 204 is configured to store a fraud detection model 226 and an auto-encoder 228. The fraud detection model 226 is identical to the fraud detection model 106.
[0057] Examples of the processor 206 include, but are not limited to, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a field-programmable gate array (FPGA), and the like. The memory 208 includes suitable logic, circuitry, and/or interfaces to store a set of computer-readable instructions for performing operations. Examples of the memory 208 include a random-access memory (RAM), a read-only memory (ROM), a removable storage drive, a hard disk drive (HDD), and the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the memory 208 in the server system 200, as described herein. In another embodiment, the memory 208 may be realized in the form of a database server or cloud storage working in conjunction with the server system 200, without departing from the scope of the present disclosure.
[0058] The processor 206 is operatively coupled to the communication interface 210 such that the processor 206 is capable of communicating with a remote device 216 such as, the payment server 120, or communicating with any entity connected to the network 110 (as shown in FIG. 1). In one embodiment, the processor 206 is configured to access a plurality of electronic payment transactions for a particular time duration from the transaction database 108.
[0059] It is noted that the server system 200 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the server system 200 may include fewer or more components than those depicted in FIG. 2.
[0060] In one embodiment, the processor 206 includes a data pre-processing engine 218, a scoring engine 220, a pacing engine 222, and a model training engine 224. It should be noted that components, described herein, such as the data pre-processing engine 218, the scoring engine 220, the pacing engine 222, and the model training engine 224 can be configured in a variety of ways, including electronic circuitries, digital arithmetic, and logic blocks, and memory systems in combination with software, firmware, and embedded technologies.
[0061] The data pre-processing engine 218 includes suitable logic and/or interfaces for accessing a plurality of electronic payment transactions for a particular time duration (e.g., 1 month, 6 months, 1 year, 2 years, etc.) from the transaction database 108. The electronic payment transactions may have been performed between a cardholder (e.g., the cardholder 114) or the plurality of cardholders, and the one or more merchants 112. In one example, the electronic payment transactions may have been performed online through a website accessed on a web browser or an application installed in a user device (e.g., laptop, desktop, Android smartphone, iPhone, tablet, etc.) associated with the cardholder 114.
[0062] In addition, the electronic payment transactions may have been performed using a payment card (e.g., credit card, debit card, etc.) associated with the cardholder 114. The payment card may store information (e.g., name of the cardholder 114, card number, expiry date, etc.) on a magnetic stripe or a smart chip embedded in the payment card. The payment card may be issued by a financial institution normally called an "issuer bank", an "issuing bank", or simply an "issuer", with which the cardholder 114 may have a payment account and which provides microfinance banking services (e.g., processing of electronic payment transactions using credit/debit cards) to the cardholder 114.
[0063] In one embodiment, the plurality of electronic payment transactions may include fraudulent transactions (labeled as fraud transactions) and non-fraudulent transactions (labeled as non-fraud transactions). The data pre-processing engine 218 is configured to pre-process the plurality of electronic payment transactions for extracting a set of transaction data points corresponding to the plurality of electronic payment transactions (fraud label and non-fraud label transactions). Examples of data pre-processing operations performed by the data pre-processing engine 218 include normalization operations, splitting of datasets, merging of datasets, and other suitable preprocessing operations.
[0064] In one embodiment, the data pre-processing engine 218 may split the set of transaction data points into training data points, validation data points, and testing data points. In one embodiment, the training data points may be used during training of the fraud detection model 226, and the validation data points may be used at the end of each training epoch to check whether the performance metrics of the fraud detection model 226 are improving. The testing data points may be used during the testing of the fraud detection model 226. The data pre-processing engine 218 may sort the set of transaction data points chronologically and split them in the ratio of 70:10:20 into training, validation, and testing data points. The earliest data points (i.e., 70%) are used as training data points, the later data points (i.e., 10%) are used as validation data points, and the last data points (i.e., 20%) are used as testing data points.
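The chronological 70:10:20 split described above can be sketched as follows. This is a minimal illustration; the function and variable names are hypothetical and not part of the specification.

```python
# Hypothetical sketch of a chronological 70:10:20 split of time-ordered
# transaction data points into training, validation, and testing subsets.
def chronological_split(data_points, ratios=(0.70, 0.10, 0.20)):
    """Earliest 70% -> training, next 10% -> validation, last 20% -> testing."""
    n = len(data_points)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = data_points[:n_train]
    val = data_points[n_train:n_train + n_val]
    test = data_points[n_train + n_val:]
    return train, val, test

# Example with 300 time-ordered points (the index stands in for a timestamp).
train, val, test = chronological_split(list(range(300)))
print(len(train), len(val), len(test))  # 210 30 60
```

Splitting chronologically rather than randomly keeps the testing data points strictly later in time than the training data points, which is consistent with the out-of-time testing discussed later in the specification.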
[0065] The data pre-processing engine 218 is further configured to generate a plurality of data features based, at least in part, on the set of transaction data points. In one embodiment, each transaction data point is associated with each electronic payment transaction. In addition, the server system 200 is configured to generate one input vector for each transaction data point of the set of transaction data points. The data pre-processing engine 218 further categorizes each data feature of the plurality of data features into numerical data features or categorical data features. In one embodiment, the categorical data feature is label encoded and used in the entity embedding of the fraud detection model 226. In general, label encoding is a process of converting categorical features into integer values to convert them into machine-readable form.
[0066] In one embodiment, the plurality of data features may include merchant-level features and the like. In one embodiment, the merchant-level features may include features based on payment transactions of at least one merchant. In one example, the merchant-level features may include, but are not limited to, the name of each merchant, country code of each merchant, average card spend in the last ‘n’ number of days at each merchant, an approval rate of electronic payment transactions for each merchant, fraud rate for each merchant, total purchase amount for a pre-determined duration spent at each merchant, total purchase amount spent by various cardholders possessing various card types for the pre-determined duration, total number of transactions by the various cardholders having different card types within the pre-determined duration, total number of online transactions performed at each merchant within the pre-determined duration, and total numbers of transactions involving a payment card at each merchant within the pre-determined time duration.
[0067] In one example, the data pre-processing engine 218 is configured to remove special characters and numerals from the textual transaction data and convert the data into lowercase. The data pre-processing engine 218 separates the numerical data features. In general, a few of the numerical data features may have missing values up to a certain percentage level. For example, 1% of missing values represents a mild category, 10% of missing values represents a moderate category, and 30% of missing values represents a high category. The numerical data features with more than 30% of missing values (high category) are completely ignored by the data pre-processing engine 218. The data pre-processing engine 218 uses class-wise mean imputation to impute the mildly affected numerical features. Furthermore, all the numerical features are normalized before the moderately affected features are imputed using k-Nearest Neighbor (kNN) imputation.
[0068] In general, kNN imputation tries to identify k samples in a dataset that are similar in data space, and then these k samples are used to estimate the value of the missing data points. In addition, missing values in each sample are imputed using a mean value of the k neighbors found in the dataset.
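The kNN imputation idea described above can be sketched as follows. This is a simplified illustration assuming `None` marks a missing value and a plain Euclidean distance over jointly observed features; a real pipeline would operate on normalized features, and all names are hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Impute each None entry with the mean of that feature across the k
    nearest rows (Euclidean distance over jointly observed features).
    A simplified sketch of kNN mean imputation."""
    imputed = [row[:] for row in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Distance from this row to every other row.
        dists = []
        for m, other in enumerate(rows):
            if m == i:
                continue
            shared = [(a, b) for a, b in zip(row, other)
                      if a is not None and b is not None]
            if shared:
                d = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                dists.append((d, m))
        dists.sort()
        neighbors = [rows[m] for _, m in dists[:k]]
        for j in missing:
            vals = [nb[j] for nb in neighbors if nb[j] is not None]
            if vals:
                imputed[i][j] = sum(vals) / len(vals)
    return imputed

data = [[1.0, 2.0], [1.2, None], [5.0, 6.0]]
print(knn_impute(data, k=2))  # [[1.0, 2.0], [1.2, 4.0], [5.0, 6.0]]
```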
[0069] The data pre-processing engine 218 is further configured to pre-process the categorical data features. In one embodiment, the categorical data features with more than 60% of missing data are completely ignored by the data pre-processing engine 218. In one example, data features such as payment card numbers are split and label encoded based, at least in part, on the number of times they appear in the set of transaction data points. In addition, the remaining data features are also processed by the data pre-processing engine 218 using label encoding. In one embodiment, the data pre-processing engine 218 is configured to quantify the set of transaction data points by utilizing suitable techniques based on the type of data present in the dataset.
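The frequency-based label encoding mentioned above can be sketched as follows. This is an illustrative scheme that codes categories by how often they appear; the exact encoding used in the specification may differ, and all names are hypothetical.

```python
from collections import Counter

def frequency_label_encode(values):
    """Assign each category an integer code ordered by frequency of
    appearance (most frequent -> 0). A hedged sketch of label encoding
    based on how often a value appears in the data."""
    counts = Counter(values)
    # Most frequent first; ties broken by category value for determinism.
    ordered = sorted(counts, key=lambda c: (-counts[c], str(c)))
    mapping = {cat: code for code, cat in enumerate(ordered)}
    return [mapping[v] for v in values], mapping

codes, mapping = frequency_label_encode(["visa", "mc", "mc", "amex", "mc"])
print(codes)  # [2, 0, 0, 1, 0]
```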
[0070] In one embodiment, the data pre-processing engine 218 is configured to transmit the pre-processed transaction data to the fraud detection model 226.
[0071] In general, the fraud detection model 226 may include code, which, when executed by the processor 206, causes the processor 206 to compute an indicator in response to receipt of transaction data to indicate whether the transaction data represents a fraudulent transaction or a non-fraudulent transaction. In one embodiment, the fraud detection model 226 may include a plurality of neural network layers. To train the fraud detection model 226, each of the categorical data features (with or without high cardinality) is passed through an embedding layer of the fraud detection model 226 (after label encoding) and each of the numerical data features is passed through a dense layer of the fraud detection model 226.
[0072] In one embodiment, the processor 206 is configured to concatenate outputs of the embedding layer and the dense layer to generate a concatenated output. The concatenated output is passed through a series of dense layers with a batch normalization process. To reduce the highly imbalanced nature of the dataset containing the set of transaction data points, a weightage of "50" is added to the fraud class. However, the value of weightage is not limited to the above stated value.
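The weighting of the fraud class can be illustrated with a weighted binary cross-entropy loss. This is a hedged sketch: the specification does not name the loss function, so the use of cross-entropy here is an assumption, and the weight of 50 follows the example above but is tunable.

```python
import math

def weighted_bce(y_true, p_pred, fraud_weight=50.0):
    """Mean binary cross-entropy where the fraud (positive) class is
    up-weighted to counter class imbalance. fraud_weight=50 mirrors the
    example weightage; the loss function itself is an assumption."""
    eps = 1e-7
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip for numerical stability
        if y == 1:
            total += -fraud_weight * math.log(p)
        else:
            total += -math.log(1 - p)
    return total / len(y_true)
```

With this weighting, a misclassified fraudulent transaction contributes fifty times more to the loss than an equally misclassified non-fraudulent one, pushing the model to pay attention to the rare fraud class.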
[0073] The scoring engine 220 includes suitable logic and/or interfaces for calculating curriculum level scores corresponding to the set of transaction data points. In one embodiment, the scoring engine 220 is configured to calculate a curriculum level score for each transaction data point based, at least in part, on the auto-encoder 228.
[0074] In general, an auto-encoder is a specialized type of artificial neural network (ANN) that is configured to learn an efficient coding of unlabeled data. The encoding is validated and refined by attempting to regenerate the input from the encoding. In other words, auto-encoders are a type of deep neural network model that can be used to reduce data dimensionality. Deep neural network models are composed of many layers of neural units, and in auto-encoders, every pair of adjacent layers forms a full bipartite graph of connectivity. The layers of an auto-encoder collectively form an hourglass shape: the input layer is large, and subsequent layers reduce in size until the center-most layer is reached. From there until the output layer, the layer sizes expand back to the original input size.
[0075] In one embodiment, each of the transaction data points is passed as an input through the auto-encoder 228, and the auto-encoder 228 tries to reconstruct the input (i.e., the transaction data point). The auto-encoder 228 reconstructs the input, but with an associated loss value. In one embodiment, this loss value is termed the curriculum level score. In some embodiments, the curriculum level score may also be termed the reconstruction loss.
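The curriculum level score described above can be sketched as a per-point reconstruction error. Mean-squared error is assumed here for concreteness; the specification only calls it a loss or reconstruction loss, and the names are illustrative.

```python
def curriculum_level_score(x, reconstruction):
    """Curriculum level score for one transaction data point: the
    mean-squared reconstruction error between the auto-encoder's input
    and its output. MSE is an assumption made for this sketch."""
    return sum((a - b) ** 2 for a, b in zip(x, reconstruction)) / len(x)

# A point the auto-encoder reconstructs well scores low; a poorly
# reconstructed point scores high.
print(curriculum_level_score([1.0, 2.0], [1.0, 2.0]))  # 0.0
print(curriculum_level_score([1.0, 2.0], [0.0, 0.0]))  # 2.5
```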
[0076] The scoring engine 220 is further configured to sort the set of transaction data points based, at least in part, on the curriculum level score associated with each transaction data point of the set of transaction data points. In one embodiment, the scoring engine 220 is configured to sort the set of transaction data points in descending order of value of the curriculum level scores. The scoring engine 220 is further configured to segment the set of transaction data points into easy training data points and hard training data points based, at least in part, on the curriculum level scores and fine-tunable hyper-parameters. In one embodiment, the fine-tunable hyper parameters include an upper threshold value, a lower threshold value, and the curriculum learning function.
[0077] In one embodiment, the easy training data points refer to training data samples based on which the fraud detection model 226 can be easily classified. In other words, the easy training data points refer to fraudulent transactions that can easily be detected by the fraud detection model 226. Contrary to the easy training data points, the hard training data points refer to training data samples based on which the fraud detection model 226 faces difficulty to be trained. In other words, the hard training data points refer to fraudulent transactions that are so close or similar in attributes to non-fraudulent transactions that they cannot be easily detected by the fraud detection model 226 and thus, the fraud detection model 226 may classify these fraudulent transactions as non-fraudulent transactions.
[0078] The easy training data points include training data samples having curriculum level scores between the lower threshold value and the upper threshold value. In other words, these training data samples are associated with curriculum level scores that are neither unusually low nor unusually high. In addition, the hard training data points include training data samples having curriculum level scores above the upper threshold value or below the lower threshold value. In an embodiment, the upper threshold value and the lower threshold value may be set or decided automatically by the scoring engine 220. In another embodiment, the upper threshold value and the lower threshold value may be set manually by an administrator.
[0079] In one embodiment, the administrator may be any person, organization, or individual associated with the server system 200. In an embodiment, the administrator may be responsible for upkeep and maintenance of the server system 200. In another embodiment, the administrator may be responsible for troubleshooting the server system 200.
[0080] In one embodiment, the auto-encoder 228 receives embeddings of the training data points and validation data points from the fraud detection model 226. Along with the embeddings, numerical data features are also utilized as training data points and validation data points. Both the training data points and validation data points are pooled together and then split using the stratified k-fold method with shuffling before feeding them to the auto-encoder 228. In one embodiment, the k-th fold validation reconstruction error (i.e., the curriculum level score) for each transaction data point is calculated while saving the original order of the data. While preserving the original order of the dataset, the reconstruction error is again segmented or split into a training data points subset and a validation data points subset.
[0081] In one example, the lower threshold value is set to the 10th percentile and the upper threshold value is set to the 90th percentile. In one embodiment, each transaction data point with a curriculum level score below the lower threshold value or above the upper threshold value is considered a hard training data point. In addition, each transaction data point with a curriculum level score between the lower threshold value and the upper threshold value is considered an easy training data point. Accordingly, all transaction data points below the 10th percentile and above the 90th percentile are considered hard training data points, and all transaction data points between the 10th percentile and the 90th percentile are considered easy training data points.
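The percentile-based easy/hard segmentation can be sketched as follows. A simple nearest-rank percentile is assumed; a production system might use interpolated percentiles, and all names are illustrative.

```python
def segment_by_percentile(scores, lower_pct=10, upper_pct=90):
    """Segment indices of transaction data points into easy and hard sets:
    curriculum level scores between the lower and upper percentile
    thresholds are easy; scores outside that band are hard. The
    thresholds are the fine-tunable hyper-parameters."""
    ordered = sorted(scores)
    n = len(ordered)
    lower = ordered[int(n * lower_pct / 100)]          # nearest-rank 10th pct
    upper = ordered[min(int(n * upper_pct / 100), n - 1)]  # nearest-rank 90th pct
    easy = [i for i, s in enumerate(scores) if lower <= s <= upper]
    hard = [i for i, s in enumerate(scores) if s < lower or s > upper]
    return easy, hard
```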
[0082] In some embodiments, the auto-encoder 228 is a vanilla auto-encoder. In general, a vanilla auto-encoder is a three-layer neural network containing only one hidden layer.
[0083] The pacing engine 222 includes suitable logic and/or interfaces for determining the pace or rate of providing the easy training data points and the hard training data points during training of the fraud detection model 226. Initially, the fraud detection model 226 is trained with the easy training data points, and then the rate or pace of introducing the hard training data points is gradually increased. The pacing engine 222 determines this pace or rate of introduction of the hard training data points along with the easy training data points during training of the fraud detection model 226.
[0084] The pacing engine 222 is configured to calculate a ratio, based on which the hard training data points are introduced along with the easy training data points during training of the fraud detection model 226. In one embodiment, the ratio is a ratio of the number of the easy training data points to the number of the hard training data points. In addition, the ratio determines the difficulty level of training of the fraud detection model 226.
[0085] In one embodiment, the pace or rate of introduction of the hard training data points is determined based, at least in part, on the curriculum learning function. The curriculum learning function is a fine-tunable hyper-parameter that defines the rate of introduction of the hard data points during training of the fraud detection model 226.
[0086] The model training engine 224 includes suitable logic and/or interfaces for training the fraud detection model 226 based on the easy training data points and the hard training data points. The pace or rate of introduction of the set of data points is already determined by the pacing engine 222.
[0087] In one example, the scoring engine 220 is configured to sort the set of transaction data points into the easy training data points and the hard training data points. In addition, the pacing engine 222 is configured to calculate the curriculum learning function to train the fraud detection model 226 for a batch size of 512. Based on the batch size of 512, the pacing engine 222 may initially send only the easy training data points to the fraud detection model 226 for the first 365 batches. This ensures proper training of the fraud detection model 226 with the easy training data points. After the completion of 365 batches, the pacing engine 222 may start introducing the hard training data points to the fraud detection model 226 in a linear manner. In one example, the pacing engine 222 may introduce the hard training data points from the 366th batch until the 375th batch, after which the fraud detection model 226 trains only on the hard training data points.
[0088] For example, for the 366th batch, the pacing engine 222 may send 10 hard training data points along with 90 easy training data points to the fraud detection model 226. For the 367th batch, the pacing engine 222 may send 20 hard training data points along with 80 easy training data points to the fraud detection model 226. For the 368th batch, the pacing engine 222 may send 30 hard training data points along with 70 easy training data points to the fraud detection model 226, and so on. In a similar manner, the pacing engine 222 may gradually increase the difficulty level of training of the fraud detection model 226 until the 375th batch. Thereafter, the pacing engine 222 may send only the hard training data points to train the fraud detection model 226 till the 468th batch. The pacing engine 222 may further follow the same process for sending the hard training data points for other epochs as well.
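The linear pacing schedule in this example can be sketched as follows. The batch counts of 100 follow the worked example above (even though the stated batch size is 512), and all names and the rounding scheme are illustrative assumptions.

```python
def pacing_schedule(batch_idx, warmup_batches=365, ramp_batches=10,
                    batch_size=100):
    """Return (easy, hard) counts for a given batch index under linear
    pacing: only easy points during warm-up, a linearly growing hard
    fraction over the ramp, then only hard points. Counts of 100 per
    batch mirror the example above and are illustrative."""
    if batch_idx <= warmup_batches:
        hard = 0
    elif batch_idx <= warmup_batches + ramp_batches:
        step = batch_idx - warmup_batches          # 1..ramp_batches
        hard = round(batch_size * step / ramp_batches)
    else:
        hard = batch_size
    return batch_size - hard, hard

print(pacing_schedule(366))  # (90, 10)
print(pacing_schedule(367))  # (80, 20)
print(pacing_schedule(375))  # (0, 100)
```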
[0089] In a similar manner, the model training engine 224 is configured to train the fraud detection model 226 for other epochs as well. In one embodiment, the model training engine 224 is configured to train the fraud detection model 226 with the execution of the pacing engine 222. In one example, the model training engine 224 is configured to train the fraud detection model 226 for 50 epochs with early stopping. In general, early stopping is a method that first enables a user to specify an arbitrarily large number of training epochs and then stops the training once the performance of the model stops improving on a holdout validation data set. In one embodiment, the pacing engine 222 determines the number of iterations and epochs based on the batch size.
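The early-stopping behaviour described above can be sketched as follows. The patience value is an assumption; the specification only states that training stops once validation performance stops improving, and all names are illustrative.

```python
def train_with_early_stopping(epoch_val_losses, patience=3, max_epochs=50):
    """Return the epoch at which training stops: either max_epochs, or
    the first epoch after the validation loss has failed to improve for
    `patience` consecutive epochs. A generic early-stopping sketch."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(epoch_val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch  # stop: no improvement for `patience` epochs
    return min(len(epoch_val_losses), max_epochs)

# Validation loss stalls after epoch 2, so training halts at epoch 5.
print(train_with_early_stopping([1.0, 0.9, 0.95, 0.96, 0.97]))  # 5
```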
[0090] In one example, the model training engine 224 may use a custom data generator to generate custom data for training of the fraud detection model 226. Once the training of the fraud detection model 226 is complete, the fraud detection model 226 may be used to detect fraudulent transactions associated with a merchant of the one or more merchants 112 in real-time. In one example, a fraudster may try to perform financial fraud using a stolen payment card over the network. The fraud detection model 226 may detect the fraudulent transaction and further decline the transaction or block the payment card of the cardholder 114.
[0091] In one embodiment, the pacing engine 222 may segment the set of transaction data points into the easy training data points and the hard training data points only during training of the fraud detection model 226 on the training data points subset. Once training of the fraud detection model 226 is complete, there is no need to utilize the pacing engine 222 to segment the transaction data points for validation since the fraud detection model 226 is used to detect fraudulent transactions during the validation stage. In real-time, the fraud detection model 226 is trained for classifying an electronic payment transaction into a fraudulent transaction or non-fraudulent transaction. In one example, during the implementation stage, the fraud detection model 226 is provided with an electronic payment transaction as input. Based on the attributes of the electronic payment transaction and the training of the fraud detection model 226, the fraud detection model 226 detects the electronic payment transaction to be a fraudulent transaction or non-fraudulent transaction.
[0092] FIG. 3 is an example representation 300 of communication data flow in various modules of the server system 200 for training the fraud detection model 226 with curriculum learning, in accordance with an embodiment of the present disclosure.
[0093] The fraud detection model 226 is a neural network model that is configured to detect fraudulent transactions in real-time. In one embodiment, the fraud detection model 226 is a three-layer NN model, a four-layer NN model, …, an n-layer NN model, where n is a natural number. The fraud detection model 226 is trained based on a training data set containing various attributes of fraudulent transactions as well as non-fraudulent transactions to enable the fraud detection model 226 to detect whether a given electronic payment transaction in real-time is a fraudulent transaction or not. In one embodiment, the fraud detection model 226 is initially trained without curriculum learning, and thereafter, the fraud detection model 226 is trained with curriculum learning, and the performance metrics of training with and without curriculum learning are evaluated.
[0094] At first, the data pre-processing engine 218 is configured to access the plurality of electronic payment transactions for the particular time duration (e.g., 1 year, 2 years, 5 years, etc.) from the transaction database 108 (see, 302). In one embodiment, the data pre-processing engine 218 may send queries to the transaction database 108 to access the plurality of electronic payment transactions. The electronic payment transactions may include both the fraudulent transactions and non-fraudulent transactions that may have occurred between cardholders (e.g., the cardholder 114) and the one or more merchants 112 in the past.
[0095] The data pre-processing engine 218 is configured to pre-process the plurality of electronic payment transactions to extract a set of transaction data points corresponding to the plurality of electronic payment transactions. Examples of pre-processing operations performed by the data pre-processing engine 218 include normalization operations, splitting of datasets, merging of datasets, and other suitable preprocessing operations.
[0096] After performing data pre-processing operations, the processor 206 is configured to train the fraud detection model 226 (see, 304). In one embodiment, the processor 206 is configured to train the fraud detection model 226 without curriculum learning. In one example, the fraud detection model 226 is trained on 210k transaction data points, validated on 30k transaction data points and finally tested on 60k transaction data points. In total, the fraud detection model 226 is trained, validated, and tested on a total of 300k transaction data points. In addition, the fraud detection model 226 undergoes out-of-time testing to ensure that there is no leakage during the training of the fraud detection model 226. In general, out-of-time testing is performed with out-of-time validation samples, wherein the out-of-time validation samples include data from an entirely different time period than what was used during development of the fraud detection model 226.
[0097] In one embodiment, the data pre-processing engine 218 is configured to generate a plurality of data features based, at least in part, on the set of transaction data points. In one embodiment, each transaction data point is associated with each electronic payment transaction. In addition, the server system 200 is configured to generate one input vector for each transaction data point of the set of transaction data points. The data pre-processing engine 218 further categorizes each data feature as either a numerical data feature or a categorical data feature. In one embodiment, the numerical data features are used without any changes, and the categorical data features are label encoded for the embedding layer.
[0098] In one example, the data pre-processing engine 218 separates the numerical data features and the categorical data features. In one example, numerical data features with more than 30% missing values and categorical data features with more than 60% missing values are ignored entirely by the data pre-processing engine 218. In addition, the data pre-processing engine 218 may use class-wise mean imputation to impute the missing values in the retained data features.
[0099] To train the fraud detection model 226, the processor 206 is configured to pass each categorical data feature through an embedding layer of the fraud detection model 226 (after label encoding) and each numerical data feature through a dense layer of the fraud detection model 226. The processor 206 is further configured to concatenate the outputs of the embedding layer and the dense layer to generate a concatenated output, and this concatenated output is then passed through a series of dense layers with batch normalization and a dropout of 0.25 for training the fraud detection model 226. However, the value of the dropout is not limited to the above-stated value. In one embodiment, a weightage of 50 is also added to the fraud class to compensate for the highly imbalanced nature of the dataset containing the set of transaction data points. However, the value of the weightage is also not limited to the above-stated value.
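The forward pass described in this paragraph can be sketched in plain NumPy; the layer sizes, the single categorical feature, and the random weights are all hypothetical, and a real implementation would use a deep-learning framework with trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 transactions, 6 numerical features, one categorical
# feature with 50 labels embedded into 8 dimensions.
n_samples, n_numerical = 4, 6
vocab_size, embed_dim, hidden_dim = 50, 8, 16

x_num = rng.normal(size=(n_samples, n_numerical))    # numerical features, used as-is
x_cat = rng.integers(0, vocab_size, size=n_samples)  # label-encoded categorical feature

# Embedding layer: a lookup table with one row per category label.
E = rng.normal(size=(vocab_size, embed_dim))
embedded = E[x_cat]                                  # (n_samples, embed_dim)

# Dense layer over the numerical features (ReLU activation).
W = rng.normal(size=(n_numerical, hidden_dim))
dense_num = np.maximum(x_num @ W, 0.0)

# Concatenate both paths, then batch-normalize and apply a dropout of 0.25.
concat = np.concatenate([embedded, dense_num], axis=1)
normed = (concat - concat.mean(axis=0)) / (concat.std(axis=0) + 1e-5)
mask = rng.random(normed.shape) > 0.25               # training-time dropout mask
out = normed * mask / 0.75

# A class weightage of 50 on the fraud class in a weighted cross-entropy loss.
y = np.array([0, 1, 0, 1])
p = np.clip(rng.random(n_samples), 1e-7, 1 - 1e-7)   # stand-in predicted probabilities
loss = -np.mean(50 * y * np.log(p) + (1 - y) * np.log(1 - p))
```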
[00100] After training of the fraud detection model 226 without curriculum learning, the processor 206 is configured to train the fraud detection model 226 with curriculum learning. In one embodiment, curriculum learning allows the fraud detection model 226 to quickly move across the loss landscape first using the easy training data points and then settling down at a good minimum with introduction of the hard training data points.
[00101] For training the fraud detection model 226 with curriculum learning, the scoring engine 220 and the pacing engine 222 are utilized by the processor 206. In one embodiment, the processor 206 is configured to receive embeddings of the training data points and validation data points from the fraud detection model 226. Along with the embeddings, the processor 206 is configured to utilize the numerical data features as training data points and validation data points. Further, both the training data points and the validation data points are pooled together and then split using a stratified k-fold method with shuffling before being fed to the auto-encoder 228. In one embodiment, the kth fold validation reconstruction error (i.e., the curriculum level score) for each transaction data point is calculated while preserving the original order of the data. While preserving the original order of the dataset, the reconstruction errors are again segmented, or split, into a training data points subset and a validation data points subset.
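A minimal sketch of this out-of-fold scoring, assuming a rank-k linear auto-encoder (fit via SVD) as a stand-in for the neural auto-encoder 228, and plain shuffled folds in place of the stratified k-fold split:

```python
import numpy as np

def reconstruction_errors(train_X, test_X, k=2):
    """Fit a rank-k linear auto-encoder on train_X (via SVD of the centered
    data) and return the per-sample reconstruction error (MSE) on test_X."""
    mu = train_X.mean(axis=0)
    _, _, Vt = np.linalg.svd(train_X - mu, full_matrices=False)
    V = Vt[:k].T                                   # encode/decode weights
    recon = (test_X - mu) @ V @ V.T + mu
    return ((test_X - recon) ** 2).mean(axis=1)

def out_of_fold_scores(X, n_folds=5, seed=0):
    """Pool the data, split it into shuffled folds, and score each fold with
    an auto-encoder fit on the remaining folds, preserving the original
    order of the data."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    scores = np.empty(len(X))
    for val_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, val_idx)
        scores[val_idx] = reconstruction_errors(X[train_idx], X[val_idx])
    return scores  # curriculum level scores, aligned with the rows of X

X = np.random.default_rng(1).normal(size=(200, 10))
curriculum_scores = out_of_fold_scores(X)
```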
[00102] As stated above, the processor 206 is configured to calculate curriculum level scores corresponding to the set of transaction data points with execution of the scoring engine 220. The scoring engine 220 further calculates the curriculum level scores based, at least in part, on the auto-encoder 228. In one embodiment, a curriculum level score is calculated for each transaction data point. Thereafter, the processor 206 is configured to sort the set of transaction data points based on the calculated curriculum level scores. In other words, the processor 206 is configured to sort the set of transaction data points based on difficulty level of the set of transaction data points.
[00103] Based on the difficulty level of the set of transaction data points, the processor 206 is configured to segment each transaction data point into an easy training data point or a hard training data point (see, 306). The term 'easy training data point' represents a data sample (e.g., a fraudulent transaction) that can easily be detected as a fraudulent transaction by the fraud detection model 226 based, at least in part, on the attributes of that particular electronic payment transaction. In contrast, the term 'hard training data point' represents a data sample (e.g., a fraudulent transaction) that can easily be missed by the fraud detection model 226 based, at least in part, on the attributes of that particular electronic payment transaction. In other words, the attributes of the electronic payment transaction may be altered by a fraudster in such a way that, although the transaction is a fraudulent transaction, the fraud detection model 226 fails to detect it because the transaction mimics a non-fraudulent transaction.
[00104] The processor 206 is configured to segment the set of transaction data points based on the curriculum level scores and fine-tunable hyper-parameters (i.e., the upper threshold value and the lower threshold value). In one embodiment, the processor 206 is configured to set the lower threshold value to the 10th percentile and the upper threshold value to the 90th percentile. In addition, a hard training data point is a data sample whose curriculum level score is either very close to 0 or very high, i.e., any data sample sorted below the lower threshold value or above the upper threshold value by the processor 206. Any data sample sorted between the lower threshold value and the upper threshold value by the processor 206 is an easy training data point. Together, the easy training data points and the hard training data points constitute the training data for training of the fraud detection model 226.
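The percentile-based segmentation can be sketched as follows, with the lower and upper threshold values fixed at the 10th and 90th percentiles stated above:

```python
import numpy as np

def segment(scores, lower_pct=10, upper_pct=90):
    """Split curriculum level scores into easy/hard masks: scores between
    the 10th and 90th percentiles are easy, the extremes are hard."""
    lo = np.percentile(scores, lower_pct)
    hi = np.percentile(scores, upper_pct)
    easy = (scores >= lo) & (scores <= hi)
    return easy, ~easy

scores = np.random.default_rng(2).random(1000)  # stand-in curriculum level scores
easy, hard = segment(scores)
```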
[00105] In one example, any transaction data point with high curriculum level score represents electronic payment transaction with high probability of fraud and any transaction data point with curriculum level score closer to 0 represents electronic payment transaction with low probability of fraud.
[00106] The processor 206 is further configured to determine how to introduce the easy training data points and the hard training data points to train the fraud detection model 226. In one embodiment, the processor 206 is configured to determine the pace or rate of introduction of the hard training data points based, at least in part, on the curriculum learning function with execution of the pacing engine 222 (see, 308). Initially, the processor 206 is configured to train the fraud detection model 226 with only the easy training data points (i.e., the curriculum level scores between the lower threshold value and the upper threshold value), and then the hard training data points (i.e., the curriculum level scores below the lower threshold value and above the upper threshold value) are introduced gradually, in a linear manner, in batches in the ratio (e.g., a pre-defined ratio) to complete the training of the fraud detection model 226. In one embodiment, the ratio may be adjusted by the processor 206 to control the difficulty level of training of the fraud detection model 226. The ratio is the ratio of the number of easy training data points to the number of hard training data points, and it determines the difficulty level of training of the fraud detection model 226. In an embodiment, the ratio may be set automatically by the server system 200 based on the batch size. In another embodiment, the ratio may be set manually by the administrator based on the batch size.
[00107] In one embodiment, the curriculum learning function is denoted by δ percent. The processor 206 is configured to introduce only the easy training data points in the early batches during training of the fraud detection model 226. In later batches, δ percentage of the easy training data points is replaced by the hard training data points and towards the end, the fraud detection model 226 is trained with only the hard training data points in all batches.
[00108] In one embodiment, the value of δ may be varied in multiples of 5 (e.g., 5, 10, 15, and so on). In one embodiment, a random grid search is used to determine the optimal or best value of δ, and the optimal value of δ comes out to be 10%. Thus, the processor 206 is configured to introduce the hard training data points along with the easy training data points based on the optimal value of δ (i.e., 10).
[00109] In one example, for a batch size of 512, the pacing engine 222 may initially send only the easy training data points for the first 365 batches to ensure proper training of the fraud detection model 226. After completion of 365 batches, the pacing engine 222 may start sending the hard training data points to the fraud detection model 226 in a linear manner. For example, for the 366th batch, the pacing engine 222 may send 10 hard training data points along with 90 easy training data points to the fraud detection model 226. For the 367th batch, the pacing engine 222 may send 20 hard training data points along with 80 easy training data points to the fraud detection model 226. For the 368th batch, the pacing engine 222 may send 30 hard training data points along with 70 easy training data points to the fraud detection model 226, and so on. It is to be noted that the pacing engine 222 gradually increases the difficulty level of training of the fraud detection model 226 until the 375th batch based on the value of δ (i.e., 10). After the 375th batch, the pacing engine 222 may send only the hard training data points to train the fraud detection model 226 until the 468th batch. In one example, the 468th batch marks the end of one epoch, and the fraud detection model 226 is trained in a similar manner for 50 epochs in total.
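The pacing in this example can be sketched as a per-batch schedule of hard-data fractions, interpreting δ = 10 as the percentage of a batch replaced by hard training data points at each step (the batch boundaries 365, 375, and 468 follow the example above):

```python
def pacing_schedule(n_batches=468, easy_only=365, delta=0.10):
    """Return, for each batch, the fraction of hard training data points:
    0.0 during the easy-only warm-up, then a linear ramp that grows by
    delta per batch until the batches consist entirely of hard data."""
    fractions = []
    for batch in range(1, n_batches + 1):
        if batch <= easy_only:
            fractions.append(0.0)                        # easy data points only
        else:
            fractions.append(min(1.0, (batch - easy_only) * delta))
    return fractions

sched = pacing_schedule()
```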
[00110] Based on the pace or rate of introduction of the hard training data points determined by the pacing engine 222, the processor 206 is configured to train the fraud detection model 226 with execution of the model training engine 224 (see, 310). The model training engine 224 trains the fraud detection model 226, but this time, based on the curriculum learning with execution of the scoring engine 220 and the pacing engine 222. With facilitation of curriculum learning, there is a significant boost in performance of the neural network (i.e., the fraud detection model 226).
Performance Metrics
[00111] In general, performance metrics are used to infer performance of a neural network model. The fraud detection model 226 is trained without curriculum learning and thereafter trained using curriculum learning. The performance metrics of the fraud detection model 226 without curriculum learning and with curriculum learning are recorded for evaluation purposes. Values of various performance metrics after training of the fraud detection model 226 without curriculum learning are illustrated below in Table 1:
Precision | Recall | AUCPR | F1 Score
0.13      | 0.14   | 0.053 | 0.13
Table 1. Performance metrics of the fraud detection model without curriculum learning
[00112] Values of various performance metrics after training of the fraud detection model 226 with curriculum learning are illustrated below in Table 2:
Precision | Recall | AUCPR | F1 Score
0.17      | 0.2    | 0.078 | 0.18
Table 2. Performance metrics of the fraud detection model with curriculum learning
[00113] From Table 1 and Table 2, it can be observed that there is an increase of around 30% in the precision performance metric with curriculum learning. In general, precision measures the accuracy of positive predictions, i.e., the fraction of predicted positives that are truly positive. In addition, it can be observed that there is an increase of around 43% in the recall performance metric with curriculum learning. In general, recall measures the fraction of positive instances that are correctly identified by a classifier model (e.g., an NN model).
[00114] From Table 1 and Table 2, it can further be observed that there is an increase of around 48% in the area under the precision-recall curve (AUCPR) performance metric with curriculum learning. In general, AUCPR is the area under the precision-recall curve, and a larger area under the curve denotes better performance of the NN model. Furthermore, it can be observed that there is an increase of around 38% in the F1 score performance metric with curriculum learning. In general, the F1 score is the harmonic mean of precision and recall. Thus, it can easily be observed that training the fraud detection model 226 with curriculum learning results in overall better performance metrics as compared to training the fraud detection model 226 without curriculum learning.
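The relative gains discussed above can be recomputed directly from Tables 1 and 2, together with the harmonic-mean formula for the F1 score:

```python
# Metric values from Table 1 (without curriculum learning) and
# Table 2 (with curriculum learning).
without = {"precision": 0.13, "recall": 0.14, "aucpr": 0.053, "f1": 0.13}
with_cl = {"precision": 0.17, "recall": 0.20, "aucpr": 0.078, "f1": 0.18}

# Relative improvement of each metric, e.g., roughly 30% for precision.
gains = {m: (with_cl[m] - without[m]) / without[m] for m in without}

def f1_score(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```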
[00115] FIG. 4 is a schematic representation 400 of a process for sorting and segmenting the set of transaction data points into the easy training data points and the hard training data points, in accordance with an embodiment of the present disclosure.
[00116] The schematic representation 400 is explained herewith including entities such as, a set of transaction data points 405, a lower threshold value 410, an upper threshold value 415, easy training data points 420, and hard training data points 425. As mentioned previously, the processor 206 is configured to perform training of the fraud detection model 226 with curriculum learning. In general, with curriculum learning, neural network model (e.g., the fraud detection model 226) is initially trained with the easy training data points 420, and thereafter difficulty level of training is increased by gradually introducing the hard training data points 425 in a linear manner in the ratio over the next set of batches and epochs. Before training, the processor 206 is configured to determine the easy training data points 420 and the hard training data points 425 based on the calculation of curriculum level scores corresponding to the set of transaction data points 405. The processor 206 is configured to calculate the curriculum level scores with the execution of the auto-encoder 228.
[00117] The processor 206 is configured to pre-process the plurality of electronic payment transactions accessed from the transaction database 108 to extract the set of transaction data points 405. The steps for pre-processing the plurality of electronic transactions for extracting the set of transaction data points 405 are herein explained in detail with reference to FIG. 2, and therefore, they are not reiterated for the sake of brevity.
[00118] After pre-processing, the processor 206 is configured to trigger the scoring engine 220 to sort the set of transaction data points 405 into the easy training data points 420 and the hard training data points 425 (see, 402). In one embodiment, the scoring engine 220 includes the auto-encoder 228 (as shown in FIG. 4). The auto-encoder 228 is configured to calculate a curriculum level score corresponding to each transaction data point of the set of transaction data points 405. In other words, the auto-encoder 228 receives the set of transaction data points 405 as an input, and thereafter, the auto-encoder 228 tries to re-generate the set of transaction data points 405 along with some error values or reconstruction losses (i.e., the curriculum level scores).
[00119] For example, the auto-encoder 228 is fed with a transaction data point X as an input. Based on the input X, the auto-encoder 228 provides an output X’ which is re-generated based on the input X. In addition, the difference between the output X’ and the input X is the curriculum level score or reconstruction loss.
[00120] The processor 206 is configured to calculate the curriculum level score for each of the set of transaction data points 405 and thereafter, the curriculum level scores are sorted in descending order of value. However, the curriculum level scores can also be sorted in ascending order. The processor 206 is further configured to segment the set of transaction data points 405 into the easy training data points 420 and the hard training data points 425 based on the lower threshold value 410 and the upper threshold value 415 (see, 404).
[00121] In one embodiment, the upper threshold value 415 and the lower threshold value 410 may be denoted by 'X' and 'Y' percentiles, respectively. In one example, the top 'X' percentile and the bottom 'Y' percentile of the set of transaction data points 405 are labeled as the hard training data points 425. The remaining transaction data points (i.e., those between the two thresholds) are labeled as the easy training data points 420. In one embodiment, the value of the 'X' percentile may vary in a range of 80 to 95 in multiples of 5 (e.g., 80, 85, 90, or 95). In a similar manner, the value of the 'Y' percentile may vary in a range of 5 to 15 in multiples of 5 (e.g., 5, 10, or 15). However, to choose optimal values for the 'X' and 'Y' percentiles, a random grid search is performed, and the optimal value for 'X' comes out to be 90 and for 'Y', the optimal value comes out to be 10.
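A random grid search over these ranges can be sketched as below; the evaluate function is a hypothetical placeholder for training the fraud detection model at each ('X', 'Y') pair and returning a validation metric:

```python
import itertools
import random

# Candidate (X, Y) percentile pairs from the stated ranges.
grid = list(itertools.product([80, 85, 90, 95], [5, 10, 15]))

def evaluate(x_pct, y_pct):
    """Hypothetical objective; in practice, train and validate the model
    with these thresholds and return a metric such as AUCPR. This
    placeholder simply peaks at the reported optimum (X=90, Y=10)."""
    return -abs(x_pct - 90) - abs(y_pct - 10)

random.seed(0)
candidates = random.sample(grid, k=6)  # random subset of the full grid
best_x, best_y = max(candidates, key=lambda xy: evaluate(*xy))
```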
[00122] The scoring engine 220 is configured to sort the transaction data points between the easy training data points 420 and the hard training data points 425 (as shown in FIG. 4). The steps for sorting and segmenting the transaction data points into the easy training data points 420 and the hard training data points 425 are herein explained in detail with reference to FIG. 2, and therefore, they are not reiterated for the sake of brevity.
[00123] FIG. 5 represents a flow chart 500 of a process flow for the training of the fraud detection model 226 with curriculum learning, in accordance with an embodiment of the present disclosure. The sequence of operations of the flow chart 500 may not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped together and performed in form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner. It is to be noted that to explain the flow chart 500, references may be made to elements described in FIG. 1 and FIG. 2.
[00124] At 502, the server system 200 accesses the plurality of electronic payment transactions for the particular time duration (e.g., 1 month, 6 months, 1 year, 4 years, etc.) from the transaction database 108.
[00125] At 504, the server system 200 may pre-process the plurality of electronic payment transactions to extract a set of transaction data points corresponding to the plurality of electronic payment transactions. Examples of pre-processing operations may include, but are not limited to, normalization operations, splitting of datasets, merging of datasets, and other suitable preprocessing operations.
[00126] At 506, the server system 200 inputs each transaction data point in the auto-encoder 228 to calculate a curriculum level score corresponding to each transaction data point of the set of transaction data points. In one embodiment, the auto-encoder 228 is utilized by the scoring engine 220 to calculate the curriculum level score corresponding to each transaction data point. In some embodiments, the auto-encoder 228 is configured to calculate the curriculum level score corresponding to each data feature of the plurality of data features.
[00127] At 508, the server system 200 sorts the set of transaction data points in a descending order based, at least in part, on the curriculum level scores associated with the set of transaction data points with the execution of the scoring engine 220. In some embodiments, the server system 200 may sort the set of transaction data points in ascending order.
[00128] At 510, the server system 200 segments each transaction data point into either easy training data point or hard training data point based, at least in part, on the calculated curriculum level score and the upper threshold value and the lower threshold value. The segmentation is performed for segmenting the set of transaction data points into the easy training data points and the hard training data points.
[00129] At 512, the server system 200 determines the pace of introduction of the hard training data points along with the easy training data points during training of the fraud detection model 226. In one embodiment, the pace is determined based, at least in part, on the curriculum learning function.
[00130] At 514, the server system 200 trains the fraud detection model 226 initially with the easy training data points and gradually introduces the hard training data points in the ratio based on the curriculum learning function. The ratio may include a pre-defined ratio or may be set by the administrator in real-time. Thus, the server system 200 trains the fraud detection model 226 with curriculum learning with the execution of the scoring engine 220 and the pacing engine 222.
[00132] FIG. 6 illustrates a flow diagram depicting a computer-implemented method 600 for the training of the fraud detection model 226 with curriculum learning, in accordance with an embodiment of the present disclosure. The method 600 depicted in the flow diagram may be executed by, for example, the server system 200. Operations of the method 600, and combinations of operations in the method 600, may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or a different device associated with the execution of software that includes one or more computer program instructions. The operations of the method 600 described herein may be performed by an application interface that is hosted and managed with the help of the server system 200. The method 600 starts at operation 602.
[00133] At operation 602, the method 600 includes accessing, by the server system 200, the plurality of electronic payment transactions for the particular time duration from the transaction database 108.
[00134] At operation 604, the method 600 includes pre-processing, by the server system 200, the plurality of electronic payment transactions for extracting the set of transaction data points corresponding to the plurality of electronic payment transactions.
[00135] At operation 606, the method 600 includes calculating, by the server system 200, curriculum level scores corresponding to the set of transaction data points. The curriculum level scores are calculated based, at least in part, on the auto-encoder 228.
[00136] At operation 608, the method 600 includes sorting, by the server system 200, the set of transaction data points based, at least in part, on the curriculum level score associated with each of the set of transaction data points.
[00137] At operation 610, the method 600 includes inputting, by the server system 200, the set of transaction data points into the fraud detection model 226 based, at least in part, on the curriculum level scores corresponding to the set of transaction data points. The fraud detection model is accessed from the model database 104. The fraud detection model 226 is trained in an order of the sorted transaction data points from the easy training data points to the hard training data points.
[00138] The sequence of operations of the method 600 need not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped together and performed in form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner.
[00139] Without limiting the scope of the present disclosure, the one or more example embodiments disclosed herein provide methods and systems for training a fraud detection model with curriculum learning. More specifically, the method performed by the server system trains the fraud detection model initially with easier training examples, and thereafter gradually introduces hard training examples in a ratio based on a curriculum learning function. In addition, the server system applies the curriculum learning framework to rare-event detection tasks, such as payment fraud, where the classes are highly imbalanced. The utilization of curriculum learning for training the fraud detection model shows a significant boost in the performance of the underlying neural network.
[00140] The disclosed methods with reference to FIGS. 1 to 6, or one or more operations of the methods 500 and 600 may be implemented using software including computer-executable instructions stored on one or more computer-readable media (e.g., non-transitory computer-readable media, such as one or more optical media discs, volatile memory components (e.g., DRAM or SRAM), or nonvolatile memory or storage components (e.g., hard drives or solid-state nonvolatile memory components, such as Flash memory components) and executed on a computer (e.g., any suitable computer, such as a laptop computer, net book, Web book, tablet computing device, smart phone, or other mobile computing device). Such software may be executed, for example, on a single local computer or in a network environment (e.g., via the Internet, a wide-area network, a local-area network, a remote web-based server, a client-server network (such as a cloud computing network), or other such network) using one or more network computers. Additionally, any of the intermediate or final data created and used during implementation of the disclosed methods or systems may also be stored on one or more computer-readable media (e.g., non-transitory computer-readable media) and are considered to be within the scope of the disclosed technology. Furthermore, any of the software-based embodiments may be uploaded, downloaded, or remotely accessed through a suitable communication means. Such suitable communication means include, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber optic cable), magnetic communications, electromagnetic communications (including RF, microwave, and infrared communications), electronic communications, or other such communication means.
[00141] Although the disclosure has been described with reference to specific exemplary embodiments, it is noted that various modifications and changes may be made to these embodiments without departing from the broad spirit and scope of the disclosure. For example, the various operations, blocks, etc. described herein may be enabled and operated using hardware circuitry (for example, complementary metal oxide semiconductor (CMOS) based logic circuitry), firmware, software and/or any combination of hardware, firmware, and/or software (for example, embodied in a machine-readable medium). For example, the apparatuses and methods may be embodied using transistors, logic gates, and electrical circuits (for example, application-specific integrated circuit (ASIC) circuitry and/or in Digital Signal Processor (DSP) circuitry).
[00142] Particularly, the server system 200 (e.g., the server system 102) and its various components such as the computer system 202 and the database 204 may be enabled using software and/or using transistors, logic gates, and electrical circuits (for example, integrated circuit circuitry such as ASIC circuitry). Various embodiments of the disclosure may include one or more computer programs stored or otherwise embodied on a computer-readable medium, wherein the computer programs are configured to cause a processor or computer to perform one or more operations. A computer-readable medium storing, embodying, or encoded with a computer program, or similar language, may be embodied as a tangible data storage device storing one or more software programs that are configured to cause a processor or computer to perform one or more operations. Such operations may be, for example, any of the steps or operations described herein. In some embodiments, the computer programs may be stored and provided to a computer using any type of non-transitory computer readable media. Non-transitory computer readable media include any type of tangible storage media. Examples of non-transitory computer readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g., magneto-optical disks), CD-ROM (compact disc read only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (BLU-RAY® Disc), and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash memory, RAM (random access memory), etc.). Additionally, a tangible data storage device may be embodied as one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices. 
In some embodiments, the computer programs may be provided to a computer using any type of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g., electric wires, and optical fibers) or a wireless communication line.
[00143] Various embodiments of the invention, as discussed above, may be practiced with steps and/or operations in a different order, and/or with hardware elements in configurations, which are different than those which are disclosed. Therefore, although the invention has been described based upon these exemplary embodiments, it is noted that certain modifications, variations, and alternative constructions may be apparent and well within the spirit and scope of the invention.
[00144] Although various exemplary embodiments of the invention are described herein in a language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as exemplary forms of implementing the claims.
