
Methods And Systems For Training A Fraud Detection Model And Detecting Fraud

Abstract: Methods and systems for training a fraud detection model for detecting fraud are disclosed. The method performed by a server system includes accessing labeled data samples and unlabeled data samples from a database. The method includes training the fraud detection model based on the labeled samples, and performing operations iteratively until the performance of the fraud detection model reaches a predefined criterion. The operations include (1) predicting, by the fraud detection model, a pseudo-label for each unlabeled data sample, (2) generating pseudo-labeled data samples based on the unlabeled data samples and the pseudo-label predicted for each unlabeled data sample, (3) computing, by a first autoencoder model and a second autoencoder model, a confidence score for each unlabeled data sample, (4) generating, by the fraud detection model, soft pseudo-labeled data samples based on the pseudo-labeled data samples and the confidence score for each unlabeled data sample, and (5) re-training the fraud detection model based on the labeled data samples and the soft pseudo-labeled data samples.


Patent Information

Application #
Filing Date
04 November 2022
Publication Number
19/2024
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

MASTERCARD INTERNATIONAL INCORPORATED
2000 Purchase Street, Purchase, NY 10577, United States of America

Inventors

1. Soumyadeep Ghosh
Block H, Plot 20, Baishnabghata Patuli, Kolkata 700094, West Bengal, India
2. Ankur Saraswat
B-202, LIONS CGHS, Sector 56, Gurgaon 122011, Haryana, India
3. Awanish Kumar
13/65, Vikas Nagar, Near RLB school, Lucknow 226022, Uttar Pradesh, India
4. Janu Verma
House No. 21, Ward No. 10, Darshan Nagar, Nowshera 185151, District - Rajouri, Jammu and Kashmir, India

Specification

The present disclosure relates to artificial intelligence-based processing systems and, more particularly, to electronic methods and systems for training a fraud detection model and detecting fraudulent activities using artificial intelligence.
BACKGROUND
Various industries such as finance, e-commerce, social media, food delivery, and the like generally face various issues due to fraudulent activities. In various examples, fraud on payment networks and fake reviews on hotel booking/rating websites or food delivery/rating apps have become common occurrences in recent times. Such fraudulent activities must be detected and addressed as they result in both monetary loss and a loss of goodwill for the various entities involved in such industries. To that end, fraud detection is a key problem that needs to be addressed for several applications. In other words, it is crucial to detect and prevent fraudulent activities such as financial or credit card fraud (identity theft and merchant fraud), money laundering, loan fraud, insurance claim fraud, health care fraud (fake/overcharged bills), online reviewing fraud, and so on.
Conventionally, machine learning algorithms have been used to perform fraud detection to prevent such fraudulent activities from taking place. Although conventional machine learning algorithms yield high accuracies, they do so while requiring a large amount of labeled data to provide consistent performance. In fact, most machine learning models for performing any task require training using a large amount of labeled data. It should be noted that these machine learning algorithms are trained using large quantities of data owned by various entities that are facing fraudulent activities. However, most of the data available to these entities is generally unlabeled, and labeling this unlabeled data is an extremely complex, expensive, and time-consuming activity. Therefore, it is very difficult to optimize the performance of conventional machine learning algorithms due to this lack of labeled data, thereby leading to inadequate performance while performing tasks such as fraud detection.
Thus, there exists a need for a technical solution to address the above-mentioned issues faced during fraud detection.
SUMMARY
Various embodiments of the present disclosure provide methods and systems for training a fraud detection model and detecting fraud using the fraud detection model.
In an embodiment, a computer-implemented method for training a fraud detection model and detecting fraud using the fraud detection model is disclosed. The computer-implemented method performed by a server system includes accessing a set of historical fraud data from a database associated with the server system. Herein, the set of historical fraud data includes a set of labeled data samples and a set of unlabeled data samples. The method further includes training the fraud detection model based, at least in part, on the set of labeled samples. The method further includes performing a plurality of operations iteratively until the performance of the fraud detection model reaches predefined criteria.
The plurality of operations includes predicting, by the fraud detection model, a pseudo-label for each unlabeled data sample from the set of unlabeled data samples. Herein, the pseudo-label indicates whether a particular unlabeled data sample is one of a fraud sample or a non-fraud sample. The plurality of operations further includes generating a set of pseudo-labeled data samples based, at least in part, on the set of unlabeled data samples and the pseudo-label predicted for each unlabeled data sample. The plurality of operations further includes computing, by a first autoencoder model and a second autoencoder model associated with the server system, a confidence score for each unlabeled data sample. The plurality of operations further includes generating, by the fraud detection model, a set of soft pseudo-labeled data samples based, at least in part, on the set of pseudo-labeled data samples and the confidence score for each unlabeled data sample. The plurality of operations further includes re-training the fraud detection model based, at least in part, on the set of labeled data samples and the set of soft pseudo-labeled data samples, wherein the re-training fine-tunes the performance of the fraud detection model. Further, the method includes determining, by the fraud detection model, a final class label for each unlabeled data sample. Herein, the final class label indicates whether the particular unlabeled data sample is one of the fraud data sample or the non-fraud data sample.
In another embodiment, a server system is disclosed. The server system includes a communication interface and a memory including executable instructions. The server system also includes a processor communicably coupled to the memory and to the communication interface. The processor is configured to execute the instructions to cause the server system, at least in part, to access a set of historical fraud data from a database associated with the server system. Herein, the set of historical fraud data includes a set of labeled data samples and a set of unlabeled data samples. Further, the server system is caused to train a fraud detection model based, at least in part, on the set of labeled samples. Further, the server system is caused to perform a plurality of operations iteratively until the performance of the fraud detection model reaches predefined criteria.
The plurality of operations includes predicting, by the fraud detection model, a pseudo-label for each unlabeled data sample from the set of unlabeled data samples. Herein, the pseudo-label indicates whether a particular unlabeled data sample is one of a fraud sample or a non-fraud sample. The plurality of operations further includes generating a set of pseudo-labeled data samples based, at least in part, on the set of unlabeled data samples and the pseudo-label predicted for each unlabeled data sample. The plurality of operations further includes computing, by a first autoencoder model and a second autoencoder model associated with the server system, a confidence score for each unlabeled data sample. The plurality of operations further includes generating, by the fraud detection model, a set of soft pseudo-labeled data samples based, at least in part, on the set of pseudo-labeled data samples and the confidence score for each unlabeled data sample. The plurality of operations further includes re-training the fraud detection model based, at least in part, on the set of labeled data samples and the set of soft pseudo-labeled data samples, wherein the re-training fine-tunes the performance of the fraud detection model. Further, the server system is caused to determine, by the fraud detection model, a final class label for each unlabeled data sample. Herein, the final class label indicates whether the particular unlabeled data sample is one of the fraud data sample or the non-fraud data sample.
In yet another embodiment, a non-transitory computer-readable storage medium is disclosed. The non-transitory computer-readable storage medium includes computer-executable instructions that, when executed by at least a processor of a server system, cause the server system to perform a method. The method includes accessing a set of historical fraud data from a database associated with the server system. Herein, the set of historical fraud data includes a set of labeled data samples and a set of unlabeled data samples. The method further includes training a fraud detection model based, at least in part, on the set of labeled samples. The method further includes performing a plurality of operations iteratively until the performance of the fraud detection model reaches predefined criteria.
The plurality of operations includes predicting, by the fraud detection model, a pseudo-label for each unlabeled data sample from the set of unlabeled data samples. Herein, the pseudo-label indicates whether a particular unlabeled data sample is one of a fraud sample or a non-fraud sample. The plurality of operations further includes generating a set of pseudo-labeled data samples based, at least in part, on the set of unlabeled data samples and the pseudo-label predicted for each unlabeled data sample. The plurality of operations further includes computing, by a first autoencoder model and a second autoencoder model associated with the server system, a confidence score for each unlabeled data sample. The plurality of operations further includes generating, by the fraud detection model, a set of soft pseudo-labeled data samples based, at least in part, on the set of pseudo-labeled data samples and the confidence score for each unlabeled data sample. The plurality of operations further includes re-training the fraud detection model based, at least in part, on the set of labeled data samples and the set of soft pseudo-labeled data samples, wherein the re-training fine-tunes the performance of the fraud detection model. Further, the method includes determining, by the fraud detection model, a final class label for each unlabeled data sample. Herein, the final class label indicates whether the particular unlabeled data sample is one of the fraud data sample or the non-fraud data sample.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
BRIEF DESCRIPTION OF THE FIGURES
For a more complete understanding of example embodiments of the present technology, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
FIGS. 1A and 1B illustrate exemplary representations of different environments related to at least some example embodiments of the present disclosure;
FIG. 2 illustrates a simplified block diagram of a server system, in accordance with an embodiment of the present disclosure;
FIG. 3A illustrates a schematic block diagram representation for training a fraud detection model using labeled data and pseudo-labeled data, in accordance with an embodiment of the present disclosure;
FIG. 3B illustrates a schematic block diagram representation for training a fraud detection model incorporating a noise reduction layer using labeled data and pseudo-labeled data, in accordance with an embodiment of the present disclosure;
FIGS. 4A-4B, collectively, illustrate a method for training a fraud detection model, in accordance with an embodiment of the present disclosure;
FIG. 5A illustrates a graphical representation of samples selected/refined using a classification hint from autoencoders and an Elliptic dataset;
FIG. 5B illustrates a graphical representation of a percentage of samples selected using the classification hint from the autoencoders; and
FIGS. 6A-6B, collectively, illustrate a method for training a fraud detection model and detecting fraud using the fraud detection model, in accordance with an embodiment of the present disclosure.
The drawings referred to in this description are not to be understood as being drawn to scale except if specifically noted, and such drawings are only exemplary in nature.

DETAILED DESCRIPTION
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure can be practiced without these specific details.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. The appearance of the phrase “in an embodiment” in various places in the specification does not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
Moreover, although the following description contains many specifics for the purposes of illustration, anyone skilled in the art will appreciate that many variations and/or alterations to said details are within the scope of the present disclosure. Similarly, although many of the features of the present disclosure are described in terms of each other, or in conjunction with each other, one skilled in the art will appreciate that many of these features can be provided independently of other features. Accordingly, this description of the present disclosure is set forth without any loss of generality to, and without imposing limitations upon, the present disclosure.
The term "payment network", used herein, refers to a network or collection of systems used for the transfer of funds through the use of cash substitutes. Payment networks may use a variety of different protocols and procedures to process the transfer of money for various types of transactions. Transactions that may be performed via a payment network may include product or service purchases, credit purchases, debit transactions, fund transfers, account withdrawals, etc. Payment networks may be configured to perform transactions via cash substitutes, which may include payment cards, letters of credit, checks, financial accounts, etc. Examples of networks or systems configured to perform as payment networks include those operated by payment processors such as Mastercard®.
The terms “account holder”, “user”, “cardholder”, “consumer”, and “buyer” are used interchangeably throughout the description and refer to a person who holds a payment account or at least one payment card (e.g., a credit card, a debit card, etc.), which may or may not be associated with the payment account, and which may be used by a merchant to complete a payment transaction initiated by the cardholder. The payment account may be opened via an issuing bank or an issuer server.
The term "merchant", used throughout the description generally refers to a seller, a retailer, a purchase location, an organization, or any other entity that is in the business of selling goods or providing services, and it can refer to either a single business location or a chain of business locations of the same entity.
The term “issuer”, used throughout the description, refers to a financial institution normally called an “issuer bank” or “issuing bank” in which an individual or an institution may have an account. The issuer also issues a payment card, such as a credit card or a debit card, etc. Further, the issuer may also facilitate online banking services, such as electronic money transfer, bill payment, etc., to the account holders through a server called “issuer server” throughout the description. The terms “issuer”, “issuer bank”, “issuing bank” or “issuer server” will be used interchangeably herein.
Further, the term “acquirer” refers to a financial institution (e.g., a bank) that processes financial transactions for merchants. In other words, this can be an institution that facilitates the processing of payment transactions for physical stores, merchants, or institutions that own platforms that make either online purchases or purchases made via software applications possible (e.g., the shopping cart platform providers and the in-app payment processing providers). The terms “acquirer”, “acquirer bank”, “acquiring bank” or “acquirer server” will be used interchangeably herein.
The terms “payment network” and “card network” are used interchangeably throughout the description and refer to a network or collection of systems used for the transfer of funds through the use of cash substitutes. Payment networks may use a variety of different protocols and procedures in order to process the transfer of money for various types of transactions. Payment networks are companies that connect an issuing bank with an acquiring bank to facilitate online payment. Transactions that may be performed via the payment network may include product or service purchases, credit purchases, debit transactions, fund transfers, account withdrawals, etc. Payment networks may be configured to perform transactions via cash substitutes that may include payment cards, letters of credit, checks, financial accounts, etc. Examples of networks or systems configured to perform as payment networks include those operated by payment processors such as Mastercard®.
The term “payment card”, used throughout the description, refers to a physical or virtual card that may or may not be linked with a financial or payment account that may be presented to the merchant or any such facility to fund a financial transaction via the associated payment account. Examples of payment cards include, but are not limited to, debit cards, credit cards, prepaid cards, virtual payment numbers, virtual card numbers, forex cards, charge cards, e-wallet cards, and stored-value cards. The payment card may be the physical card that may be presented to the merchant for funding the payment. Alternatively, or additionally, the payment card may be embodied in the form of data stored in a user device, where the data is associated with the payment account such that the data can be used to process the financial transaction between the payment account and the merchant’s financial account.
The term “payment transaction” refers to an agreement that is carried out between a buyer and seller to exchange goods or services in exchange for assets in the form of a payment (e.g., cash, fiat-currency, digital asset, cryptographic currency, coins, tokens, etc.).
The terms “Neural Network”, “Artificial Neural Network”, and “Machine Learning Model” may have been used interchangeably throughout the description and refer to a software solution that leverages Machine Learning (ML) algorithms to mimic the operations of a human brain. Moreover, neural networks process data more efficiently and feature improved pattern recognition and problem-solving capabilities when compared to traditional computers.
OVERVIEW
Various embodiments of the present disclosure provide methods, systems, electronic devices, and computer program products for training a fraud detection model, i.e., an artificial intelligence (AI)/machine learning (ML) model, and using the trained fraud detection model to detect fraud. In a specific embodiment, the server system may be embodied within a payment server associated with a payment network. Further, in an embodiment, the server system is configured to access a set of historical fraud data from a database associated with the server system. In a non-limiting example, the set of historical fraud data includes a set of labeled data samples and a set of unlabeled data samples. Further, the set of labeled data samples includes a subset of fraud labeled data samples and a subset of non-fraud labeled data samples.
In another embodiment, the server system is configured to train a fraud detection model based, at least in part, on the set of labeled samples. In a non-limiting example, the fraud detection model may be a Deep Convolutional Neural Network (DCNN) based classifier. Further, the server system is configured to perform a plurality of operations iteratively until the performance of the fraud detection model reaches a predefined criterion. Herein, the predefined criterion is a stage in the iterative process where the performance of the model saturates or the difference in model performance between iterations diminishes. In some instances, the predefined criterion may also be defined by an administrator of the server system.
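The saturation check described above can be sketched as a small helper. The tolerance value and the use of a per-iteration performance history are illustrative assumptions, not fixed by the disclosure:

```python
def has_converged(history, tol=1e-3):
    """Return True once the predefined criterion is met: the
    improvement in model performance between consecutive
    iterations has fallen below a small tolerance, i.e. the
    performance has saturated."""
    if len(history) < 2:
        return False
    return abs(history[-1] - history[-2]) < tol
```

In practice, the performance metric would be computed on a fixed test set after each re-training round, and an administrator could substitute any other stopping rule.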
In an implementation, the plurality of operations includes predicting, by the fraud detection model, a pseudo-label for each unlabeled data sample from the set of unlabeled data samples. Herein, the pseudo-label indicates whether a particular unlabeled data sample is one of a fraud sample or a non-fraud sample. Further, the plurality of operations includes generating a set of pseudo-labeled data samples based, at least in part, on the set of unlabeled data samples and the pseudo-label predicted for each unlabeled data sample.
Further, the plurality of operations includes computing, by a first autoencoder model and a second autoencoder model associated with the server system, a confidence score for each unlabeled data sample. In a non-limiting example, the first autoencoder model and the second autoencoder model are autoencoder based machine learning models. More specifically, to compute the confidence score, the server system is configured to generate the first autoencoder model based, at least in part, on a subset of fraud labeled data samples. Then, the server system is configured to compute, by the first autoencoder model, a first reconstruction loss for each unlabeled data sample based, at least in part, on the set of unlabeled data samples. Then, the server system is configured to generate the second autoencoder model based, at least in part, on a subset of non-fraud labeled data samples. Then, the server system is configured to compute, by the second autoencoder model, a second reconstruction loss for each unlabeled data sample based, at least in part, on the set of unlabeled data samples. Finally, the server system is configured to determine the confidence score for each unlabeled data sample based, at least in part, on the first reconstruction loss corresponding to each unlabeled data sample and the second reconstruction loss corresponding to each unlabeled data sample.
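One way to combine the two reconstruction losses into a single confidence score is a softmax over the negative losses: a sample that only one autoencoder reconstructs well gets a score near 1, while a sample both reconstruct equally well gets 0.5. The disclosure specifies the inputs (the two reconstruction losses) but not the exact combination, so this particular formula is an illustrative assumption:

```python
import math

def confidence_score(fraud_loss, non_fraud_loss):
    """Confidence for one unlabeled sample, computed from the
    reconstruction loss of the fraud autoencoder and that of the
    non-fraud autoencoder. A large gap between the two losses means
    the sample clearly resembles one class, so confidence is high;
    near-equal losses mean the autoencoders give no clear hint."""
    a = math.exp(-fraud_loss)      # affinity to the fraud class
    b = math.exp(-non_fraud_loss)  # affinity to the non-fraud class
    return max(a, b) / (a + b)
```

The score is class-agnostic by design: it measures how decisively the pair of autoencoders separates the sample, while the pseudo-label itself still comes from the fraud detection model.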
Further, the plurality of operations includes generating, by the fraud detection model, a set of soft pseudo-labeled data samples based, at least in part, on the set of pseudo-labeled data samples and the confidence score for each unlabeled data sample. More specifically, to generate the set of soft pseudo-labeled data samples, the server system is configured to extract a subset of unlabeled data samples from the set of unlabeled data based, at least in part, on the confidence score for each unlabeled data sample being at least equal to a sorting threshold. Then, the server system is configured to determine the set of soft pseudo-labeled data samples based, at least in part, on the set of pseudo-labeled data samples and the subset of unlabeled data samples. Herein, each soft pseudo-label data sample is associated with each unlabeled data sample from the subset of unlabeled data samples.
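The extraction step above amounts to filtering the pseudo-labeled samples by their confidence scores against the sorting threshold. A minimal sketch, in which the function name and the default threshold are illustrative:

```python
def select_soft_pseudo_labeled(samples, pseudo_labels, scores, threshold=0.9):
    """Keep only the unlabeled samples whose confidence score is at
    least the sorting threshold, pairing each retained sample with
    its pseudo-label and its score (the score later serves as the
    soft weight on the pseudo-label)."""
    return [(x, y, c)
            for x, y, c in zip(samples, pseudo_labels, scores)
            if c >= threshold]
```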
Further, the plurality of operations includes re-training the fraud detection model based, at least in part, on the set of labeled data samples and the set of soft pseudo-labeled data samples. It is noted that the re-training process fine-tunes the performance of the fraud detection model. More specifically, to re-train the fraud detection model, the server system is configured to generate a set of final pseudo-labeled data samples based, at least in part, on applying a sharpening function to the confidence score for each unlabeled data sample associated with each soft pseudo-labeled data sample from the set of soft pseudo-labeled data samples. Then, the server system is configured to fine-tune the fraud detection model based, at least in part, on the set of labeled data samples and the set of final pseudo-labeled data samples. In another embodiment, the server system is configured to determine, by the fraud detection model, a final class label for each unlabeled data sample. It is noted that the final class label indicates whether the particular unlabeled data sample is one of the fraud sample or the non-fraud sample.
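The disclosure names a sharpening function applied to the confidence scores but does not fix its form. Temperature sharpening, a common choice in semi-supervised learning methods such as MixMatch, is shown here purely as an illustrative assumption:

```python
def sharpen(probs, temperature=0.5):
    """Raise each class probability to the power 1/T and
    renormalise. For T < 1 this pushes the distribution toward
    its mode, turning a soft pseudo-label into a more decisive
    one before the re-training step."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]
```

As the temperature approaches 0 the sharpened distribution approaches a hard one-hot label, so the temperature controls how much of the soft-label information survives into re-training.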
Various embodiments of the present disclosure offer multiple advantages and technical effects. For instance, the present disclosure aims to solve the technical problem of how to train an AI/ML model for detecting fraud when there is a lack of suitable labeled data samples during the training process. The present disclosure solves this technical problem by providing an approach for training a fraud detection model for detecting fraud. It is noted that the semi-supervised formulation of the fraud detection model is based on supervised learning, along with an auxiliary unsupervised model (i.e., the first autoencoder model and the second autoencoder model) guiding the semi-supervised learning process. As may be understood, the primary challenge faced during such a formulation is to utilize unlabeled data effectively and efficiently. The approach of the present disclosure solves this problem by performing a pseudo-labeling process. During the pseudo-labeling process, labels are assigned to the unlabeled data samples using the fraud detection model that has already been trained on the labeled data. Thereafter, a subset of unlabeled data is selected from the unlabeled data based on the corresponding confidence scores. Then, the set of labeled data and the set of pseudo-labeled data are used to re-train the existing model or train a new model, which is effectively an improved version or iteration of the existing model. This process may be repeated iteratively until adding more unlabeled data yields no more benefit in terms of the model’s performance on a fixed test data set.
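Putting the pieces together, the iterative process recapped above can be sketched as a skeleton loop. The model and the autoencoders are passed in as plain callables because the disclosure does not fix their architectures; every name, the threshold, and the stopping tolerance are illustrative assumptions:

```python
def pseudo_label_loop(fit, predict, score, evaluate,
                      labeled, unlabeled,
                      threshold=0.9, max_rounds=10, tol=1e-3):
    """fit(data) -> model; predict(model, x) -> pseudo-label;
    score(x) -> autoencoder confidence for sample x;
    evaluate(model) -> performance on a fixed test set.
    Iterates until the performance gain between rounds saturates."""
    model = fit(labeled)                      # initial supervised training
    prev = evaluate(model)
    for _ in range(max_rounds):
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        # keep only samples the autoencoders are confident about
        soft = [(x, y) for x, y in pseudo if score(x) >= threshold]
        model = fit(labeled + soft)           # re-train on both sets
        perf = evaluate(model)
        if abs(perf - prev) < tol:            # more unlabeled data yields no benefit
            break
        prev = perf
    return model
```

A usage example with toy stand-ins (a "model" that is just the size of its training set) shows the control flow; real deployments would plug in the DCNN classifier and the two autoencoders.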
Various embodiments of the present disclosure provide artificial intelligence-based methods, systems, electronic devices, and computer program products for performing fraud detection for various entities such as banks, e-commerce retailers, and the like. The present disclosure describes various machine learning-based fraud detection models that are configured to detect fraudulent activities such as illicit transactions, fake reviews, and the like. The fraud detection model may generate risk scores that may be used by various entities to identify fraudulent activity.
Various embodiments of the present disclosure are described hereinafter with reference to FIGS. 1A-1B to 6A-6B.
FIG. 1A illustrates an exemplary representation of an environment 100 related to at least some example embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise depending on, for example, determining an optimal subset of unlabeled data to generate pseudo-labeled data, training an Artificial intelligence (AI) model using labeled data and the pseudo-labeled data, etc., among other suitable operations for detecting and preventing fraudulent activities.
The environment 100 generally includes a server system 102, a user 104 associated with a user device 106, a merchant 108 associated with a merchant device 110, and a fraud database 112 each coupled to, and in communication with (and/or with access to) a network 114. The network 114 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among the entities illustrated in FIG. 1A, or any combination thereof.
Various entities in the environment 100 may connect to the network 114 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof.
The user 104 may refer to an individual, a representative of a corporate entity, a non-profit organization, or any other person. In one example, the user 104 may use the user device 106 to perform various activities, such as payment transactions, posting an online review, and the like at the merchant 108. In some non-limiting examples, the user device 106 may include a smartphone, a tablet computer, a handheld computer, a wearable device, a portable media player, a gaming device, a personal digital assistant (PDA), and the like.
The merchant 108 may refer to a seller, a retailer, a purchase location, an organization, or any other entity that is in the business of selling goods or providing services. In one example, the merchant 108 may be a single business location or a chain of business locations of the same entity. In one example, the merchant 108 may be associated with the merchant device 110, which may be used to complete transactions. In some non-limiting examples, the merchant device 110 may include a smartphone, a tablet computer, a handheld computer, a wearable device, a portable media player, a gaming device, a Personal Digital Assistant (PDA), Point-of-Sale (POS) devices, Point-of-Purchase (POP) devices, Point-of-Interaction (POI) devices, and the like.
In one embodiment, the fraud database 112 stores information related to various activities performed with the merchant 108, such as online reviews, historical payment transactions, and the like. For example, the fraud database 112 may store labeled and unlabeled data samples associated with various activities, such as online reviews, historical payment transactions, and the like. In one example, the labeled data may be further classified into fraud labeled data and non-fraud labeled data. In another instance, the labeled data includes a set of labeled data samples and the unlabeled data includes a set of unlabeled data samples. Further, the set of labeled data samples includes a subset of fraud labeled data samples and a subset of non-fraud labeled data samples. Herein, fraudulent activities may be classified under fraud labeled data, while non-fraudulent activities may be classified under non-fraud labeled data. In various non-limiting examples, the fraud database 112 may also include third-party datasets for validating the performance of the various models described later in the present disclosure; such third-party datasets may include Amazon™ reviews, Yelp™ reviews, Elliptic™ Bitcoin data, and the like.
As may be noted, fraud detection is conventionally performed using machine learning models. Although conventional machine learning algorithms yield high accuracy, they require a large amount of labeled data samples to provide consistent performance. It should be noted that these machine learning algorithms are trained using large quantities of data owned by various entities that are facing fraudulent activities. However, most of the data available to these entities is unlabeled, and labeling this unlabeled data is an extremely expensive and time-consuming activity. Therefore, it is very difficult to optimize the performance of machine learning algorithms due to a lack of labeled data, thereby leading to inadequate fraud detection.
To overcome the above-mentioned and other possible limitations, the present disclosure provides the server system 102. The server system 102 is configured to train an artificial intelligence (AI)-based fraud detection model 116 based, at least in part, on labeled data and pseudo-labels selected from unlabeled data. Further, the server system 102 is configured to determine whether fraudulent activities are ongoing, based at least in part on the trained fraud detection model 116.
In an embodiment, the server system 102 further includes at least the fraud detection model 116, a first autoencoder model 118, and a second autoencoder model 120. In an example, the server system 102 is associated with a database 122. In various non-limiting examples, the various models utilized by the server system 102 may be artificial intelligence (AI) models or machine learning (ML) models.
In one implementation, the database 122 provides the storage location for the various machine learning models or algorithms of the server system 102. The database 122 may be incorporated in the server system 102, may be an individual entity connected to the server system 102, or may be stored in cloud storage. In an embodiment, the server system 102 can be a separate part of the environment 100 and may operate as a separate component from (but still in communication with, for example, via the network 114) the user 104 and the merchant 108. In addition, the server system 102 should be understood to be embodied in at least one computing device in communication with the network 114, which may be specifically configured, via executable instructions, to perform functions as described herein, and/or embodied in at least one non-transitory computer-readable media.
The server system 102 is configured to perform one or more of the operations described herein. The server system 102 is configured to access a set of labeled data, a set of unlabeled data, and a set of historical fraud data from the database 122. In a non-limiting example, the fraud detection model 116 is generated or trained based, at least in part, on the set of labeled data. Then, the first autoencoder model 118 is generated or trained based, at least in part, on fraud labeled data, and the second autoencoder model 120 is generated or trained based, at least in part, on non-fraud labeled data. Further, the server system 102 is configured to compute a set of reconstruction losses via the first autoencoder model 118 and the second autoencoder model 120. Then, the server system 102 is configured to generate a set of pseudo-labeled data based, at least in part, on a subset of unlabeled data selected from the unlabeled data based on the set of reconstruction losses. Further, the server system 102 is configured to retrain the fraud detection model 116 based, at least in part, on the set of labeled data and the set of pseudo-labeled data. Further, the server system 102 is configured to detect ongoing fraudulent activities using the retrained fraud detection model 116.
Therefore, the present disclosure provides a hybrid solution for fraud detection. In a non-limiting example, the fraud detection model 116 may be a semi-supervised learning model. The supervised approach of the fraud detection model 116 utilizes recent progress in the field of semi-supervised learning from domains such as computer vision, natural language processing, and the like. It should be understood that not all fraudulent activities are outliers or anomalies. To that end, fraudulent activities are mostly novel data samples that have some peculiar aspect associated with them, and unsupervised learning algorithms may not allow the fraud detection model 116 to exploit and learn such peculiar aspects. It may be understood that such algorithms mostly create an understanding of the non-fraud data and then attempt to classify other data points as fraud data samples or non-fraud data samples. Herein, the terms 'data sample' and 'sample' may be used interchangeably. Thus, due to these reasons, the semi-supervised formulation of the fraud detection model 116 is based on supervised learning, along with an auxiliary unsupervised model (i.e., the first autoencoder model 118 and the second autoencoder model 120) guiding the semi-supervised learning process. As may be understood, the primary challenge faced in such a formulation is to utilize unlabeled data effectively and efficiently. This problem is solved by the approach of the present disclosure by performing a pseudo-labeling process. During the pseudo-labeling process, labels are assigned to the unlabeled data samples using the fraud detection model 116 that has already been trained on the labeled data. Thereafter, the subset of unlabeled data is selected from the unlabeled data based on the set of reconstruction losses (referred to hereinafter interchangeably as the 'confidence score').
Then, the set of labeled data and the set of pseudo-labeled data are used to re-train the existing model or train a new model, which is effectively an improved version or iteration of the existing model. This process may be repeated iteratively until adding more unlabeled data yields no more benefit in terms of the model’s performance on a fixed test data set.
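The iterative retraining described above can be sketched at a high level as follows. This is a minimal illustration only; the function names, callback signatures, and the stopping criterion (a minimum performance gain on the fixed test set) are assumptions for the sketch, not the claimed implementation.

```python
def self_training_loop(labeled, unlabeled, test_set, train_model,
                       pseudo_label, select_confident, evaluate,
                       min_gain=1e-3):
    """Illustrative self-training loop: retrain on labeled plus selected
    pseudo-labeled data until additional unlabeled data yields no more
    benefit on a fixed test set."""
    model = train_model(labeled)
    best = evaluate(model, test_set)
    while True:
        pseudo = pseudo_label(model, unlabeled)      # teacher step
        confident = select_confident(pseudo)         # guided selection
        model = train_model(labeled + confident)     # student step
        score = evaluate(model, test_set)
        if score - best < min_gain:                  # no more benefit
            return model
        best = score
```

The `train_model`, `pseudo_label`, `select_confident`, and `evaluate` callables are placeholders for the components described in the remainder of this disclosure.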
To that end, the present disclosure provides various guided self-training based semi-supervised learning methods for performing fraud detection. The method of the present disclosure can be used with a classification model that utilizes unlabeled data iteratively using a student-teacher based pseudo-labeling mechanism. Further, auxiliary autoencoders are used that give useful cues for improving or optimizing the pseudo-labeling mechanism while preventing label noise. Therefore, the present disclosure provides a hybrid solution for fraud detection. It should be understood that the approach of the present disclosure may be used for any supervised learning model apart from neural network-based models as well.
The number and arrangement of systems, devices, and/or networks shown in FIG. 1A are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1A. Furthermore, two or more systems or devices shown in FIG. 1A may be implemented within a single system or device, or the single system or device as shown in FIG. 1A may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 100 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 100.
FIG. 1B illustrates an exemplary representation of another environment 130 related to at least some example embodiments of the present disclosure. Although the environment 130 is presented in one arrangement, other embodiments may include the parts of the environment 130 (or other parts) arranged otherwise depending on, for example, determining an optimal subset of unlabeled data to generate pseudo-labeled data, training an Artificial intelligence (AI) or Machine Learning (ML) model using labeled data and the pseudo-labeled data, determining fraudulent transactions based on the trained AI/ML model, etc. It should be noted that although the environment 130 illustrated by FIG. 1B is specific to the financial domain, the same should not be construed to be a limitation of the present disclosure. In other words, the various embodiments of the present disclosure may be used in other domains and industries as well such as detecting fraud reviews and the like. It should be noted that FIG. 1B shares various common entities with FIG. 1A; and therefore, these entities are not explained hereinafter again for the sake of brevity. The environment 130 generally includes the server system 102, the user 104 associated with the user device 106, the merchant 108 associated with the merchant device 110, the fraud database 112, an issuer server 132, an acquirer server 134, a payment network 136 including a payment server 138, each coupled to, and in communication with (and/or with access to) the network 114. The network 114 may include, without limitation, light fidelity (Li-Fi) network, local area network (LAN), wide area network (WAN), metropolitan area network (MAN), satellite network, the Internet, fiber optic network, coaxial cable network, infrared (IR) network, radio frequency (RF) network, virtual network, and/or another suitable public and/or private network capable of supporting communication among the entities illustrated in FIG. 1B, or any combination thereof.
In one non-limiting implementation, the issuer server 132 is a financial institution that manages cardholder accounts (i.e., payment accounts) of multiple users, such as the user 104. Payment account details of the payment accounts established with the issuer server 132 are stored in user profiles of the user 104 in memory of the issuer server 132 or on a cloud server associated with the issuer server 132. The issuer server 132 approves or denies a payment authorization request, and then routes, via the payment network 136 (or the server system 102), a payment authorization response back to the acquirer server 134. The acquirer server 134 sends the approval to the merchant (e.g., the merchant 108). The terms “issuer”, “issuer bank”, “issuing bank” or “issuer server” will be used interchangeably herein. Further, the user 104 may also be interchangeably referred to as a cardholder, ‘account holder’, ‘consumer’, and a ‘buyer’.
In one non-limiting implementation, the acquirer server 134 is associated with a financial institution (e.g., a bank) that processes financial transactions. This can be an institution that facilitates the processing of payment transactions for physical stores, ATM terminals, merchants (such as the merchant 108), or an institution that owns platforms that make online purchases or purchases made via software applications possible (e.g., shopping cart platform providers and in-app payment processing providers). The terms “acquirer”, “acquirer bank”, “acquiring bank” or “acquirer server” will be used interchangeably herein. In one implementation, the server system 102 is in communication with the issuer server 132, the acquirer server 134, or the payment server 138.
In one embodiment, the payment network 136 may be used by the payment card issuing authorities as a payment interchange network. The payment network 136 may include a plurality of payment servers, such as the payment server 138. Examples of payment interchange networks include, but are not limited to, Mastercard® payment system interchange network. The Mastercard® payment system interchange network is a proprietary communications standard promulgated by Mastercard International Incorporated® for the exchange of financial transactions among a plurality of financial activities that are members of Mastercard International Incorporated®. (Mastercard is a registered trademark of Mastercard International Incorporated located in Purchase, N.Y.).
The number and arrangement of systems, devices, and/or networks shown in FIG. 1B is provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1B. Furthermore, two or more systems or devices shown in FIG. 1B may be implemented within a single system or device, or the single system or device as shown in FIG. 1B may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 130 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 130.
Referring now to FIG. 2, a simplified block diagram of a server system 200 is illustrated, in accordance with an embodiment of the present disclosure. The server system 200 is similar to the server system 102 of FIGS. 1A and 1B. In some embodiments, the server system 200 may be embodied as a cloud-based and/or SaaS-based (software as a service) architecture or by the payment server 138 of FIG. 1B. In one embodiment, the server system 200 is a part of the payment network 136. The server system 200 is configured to train an AI or ML model such as the fraud detection model 116 and determine fraudulent activities in the network 114.
In one embodiment, the server system 200 includes a computer system 202 and a database 204. The computer system 202 includes at least one processor 206 for executing instructions, a memory 208, a communication interface 210, and a user interface 216. The one or more components of the computer system 202 communicate with each other via a bus 212. The user interface 216 may enable any authorized entity (such as an administrator) to interact with the server system 200. The user interface 216 is an interface, such as a Human Machine Interface (HMI) or a software application that allows users such as an administrator (not shown) to interact with and control the server system 200 or one or more parameters associated with the server system 200. It may be noted that the user interface 216 may be composed of several components that vary based on the complexity and purpose of the application. Examples of components of the user interface 216 may include visual elements, controls, navigation, feedback and alerts, user input and interaction, responsive design, user assistance and help, accessibility features, and the like. More specifically, these components may correspond to icons, layout, color schemes, buttons, sliders, dropdown menus, tabs, links, error/success messages, mouse and touch interactions, keyboard shortcuts, tooltips, screen readers, and the like.
In some embodiments, the database 204 is integrated into the computer system 202. For example, the computer system 202 may include one or more hard disk drives as the database 204. A storage interface 214 is any component capable of providing the processor 206 with access to the database 204. The storage interface 214 may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a Redundant Array of Independent Disks (RAID) controller, a Storage Area Network (SAN) adapter, a network adapter, and/or any component providing the processor 206 with access to the database 204.
The processor 206 includes suitable logic, circuitry, and/or interfaces to execute computer-readable instructions for classifying and determining fraudulent activities in the network 114, etc. Examples of the processor 206 include, but are not limited to, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphical processing unit (GPU) processor, a field-programmable gate array (FPGA), and the like. The memory 208 includes suitable logic, circuitry, and/or interfaces to store a set of computer-readable instructions for performing operations. Examples of the memory 208 include a random-access memory (RAM), a read-only memory (ROM), a removable storage drive, a hard disk drive (HDD), and the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the memory 208 in the server system 200, as described herein. In another embodiment, the memory 208 may be realized in the form of a database server or cloud storage working in conjunction with the server system 200, without departing from the scope of the present disclosure.
The processor 206 is operatively coupled to the communication interface 210 such that the processor 206 can communicate with a remote device 218 such as the user device 106, the merchant device 110, the fraud detection model 116, the payment network 136, or communicate with any entity connected to the network 114 (as shown in FIGS. 1A-1B). It is noted that the server system 200 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the server system 200 may include fewer or more components than those depicted in FIG. 2.
In one embodiment, the processor 206 includes a data pre-processing engine 220, a fraud detection engine 222, an autoencoder engine 224, and a sharpening engine 226. It should be noted that the components, described herein, can be configured in a variety of ways, including electronic circuitries, digital arithmetic and logic blocks, and memory systems in combination with software, firmware, and embedded technologies. In an embodiment, the database 204 includes a fraud detection model 228, a first autoencoder model 230, and a second autoencoder model 232. It should be noted that the fraud detection model 228, the first autoencoder model 230, and the second autoencoder model 232 of FIG. 2 are similar to the fraud detection model 116, the first autoencoder model 118, and the second autoencoder model 120 of FIGS. 1A and 1B, respectively.
The data pre-processing engine 220 includes suitable logic and/or interfaces for accessing a set of historical fraud data from the database (such as the fraud database 112). The set of historical fraud data may further include a set of labeled data and a set of unlabeled data pertaining to a plurality of activities, such as payment transactions or reviews performed with an entity such as a business. It should be noted that the set of labeled data classifies the data points of the plurality of activities as either fraud or non-fraud activities, while the unlabeled data does not provide such insights; in other words, the data points in the unlabeled data will be unlabeled. In one non-limiting example, each of the plurality of activities can be associated with one or more features such as merchant identifier (ID), user ID, card present (CP) indicator, card-not-present (CNP) indicator, country code, city code, review date, review length, review relevance score, fraud scores associated with the user, fraud score associated with the merchant, transaction amount, settlement amount, Primary Account Number (PAN), security level indicator, PayPass flag, business type, industry code, Merchant Category Code (MCC), etc., among other possible suitable features. In an embodiment, the data pre-processing engine 220 is communicably coupled to the fraud detection engine 222.
The fraud detection engine 222 includes suitable logic and/or interfaces for generating the fraud detection model 228 based at least in part on the set of labeled data. In a non-limiting example, the fraud detection model 228 may be a Deep Convolutional Neural Network (DCNN) based classifier that is trained on the set of labeled data.
In a non-limiting example, a set of n examples is assumed, where X := {x_1, x_2, ..., x_l, x_{l+1}, ..., x_n} with x_i ∈ X. The first l examples X_L := {x_1, x_2, ..., x_l} are labeled as Y_L := {y_1, y_2, ..., y_l}, where y_i ∈ C, i.e., a discrete set over c classes such that C := {1, 2, ..., c}. As may be understood, in the present scenario, since only two classes are considered, i.e., the fraud class and the non-fraud class, the value of c is assumed to be 2. The remaining examples x_i for i ∈ U := {l + 1, l + 2, ..., n} are considered to be unlabeled. If the unlabeled set is denoted as X_U, then X is the disjoint union of the two sets, X = X_L ∪ X_U. In supervised learning, the fraud detection engine 222 uses the labeled examples with their corresponding labels (X_L, Y_L) to train the fraud detection model 228, which may be a classifier that learns to predict class labels for previously unseen examples. It should be noted that the guiding principle of semi-supervised learning is to leverage the unlabeled examples as well to train the classifier, i.e., the fraud detection model 228.
The fraud detection model 228 takes labeled data as input (X_L, Y_L); that is, it takes an example x_i ∈ X and generates a vector of class-label probabilities, i.e., f_θ(x_i) ∈ R^c, where θ are the parameters of the model. It should be noted that the fraud detection model 228 may be trained by minimizing the supervised loss. A non-limiting exemplary equation for determining the supervised loss is provided below:
L_S(X_L, Y_L, θ) = Σ_{i=1}^{l} l_s(f_θ(x_i), y_i) … Eqn. 1
In an example, the loss function used by the fraud detection model 228 for classification is the cross-entropy loss, l_s(ŷ, y) = −y log(ŷ). As may be understood, the deep neural network of the fraud detection model 228 may be considered a composition of two networks, i.e., a feature extraction network φ_θ : X → R^d that transforms input data to a vector of features, and a classification network that maps the feature vector to the class vector. Let v_i := φ_θ(x_i) be the feature vector of x_i.
In one example, the classification network is usually a fully connected layer on top of φ_θ. The output of the network for x_i is f_θ(x_i), and the final prediction is the class with the highest probability score, i.e., it may be given by the following non-limiting exemplary equation:
ŷ_i = arg max_j (f_θ(x_i))_j … Eqn. 2
It is noted that the trained classifier, i.e., the fraud detection model 228 is the starting point of the semi-supervised learning approach described by the present disclosure.
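As an illustration only, the supervised cross-entropy loss of Eqn. 1 and the arg-max prediction of Eqn. 2 might be sketched in NumPy as follows; `probs` stands in for the model outputs f_θ(x_i), and the small epsilon added inside the logarithm is an assumption for numerical safety:

```python
import numpy as np

def cross_entropy(probs, one_hot_labels):
    """Supervised loss of Eqn. 1: sum over labeled examples of
    l_s(y_hat, y) = -y * log(y_hat)."""
    return float(-np.sum(one_hot_labels * np.log(probs + 1e-12)))

def predict(probs):
    """Eqn. 2: the final prediction is the class with the highest
    probability score."""
    return np.argmax(probs, axis=1)
```

For example, with two examples classified as fraud/non-fraud, `predict` returns one class index per row of `probs`.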
In an embodiment, a subset of unlabeled data may be introduced into the set of labeled data during the training process of the fraud detection model 228. This is done by assigning pseudo-labels to the subset of unlabeled data and training the fraud detection model 228 on this labeled set (X_L ∪ X_U), (Y_L ∪ Ŷ_U) using the supervised loss for the true-labeled examples plus a similar loss for the pseudo-labeled examples. The same may be computed using the non-limiting exemplary equation given below:
L_p(X_U, Ŷ_U, θ) = Σ_{i=l+1}^{n} l_s(f_θ(x_i), ŷ_i) … Eqn. 3
In one implementation, the fraud detection model 228 is iteratively trained using a student-teacher training approach, i.e., a trained model (the teacher model) is iteratively used to pseudo-label a set of unlabeled examples, and the model (now a student) is then re-trained on the labeled plus the pseudo-labeled examples. It is noted that the same model assumes the roles of both the student (as a learner) and the teacher (it generates labels, which are then used by itself as a student for learning). Therefore, a model f_θ is trained on the labeled data X_L (using the supervised loss from Eqn. 1) and is then employed for inference on the unlabeled set X_U. The prediction vectors f_θ(x_i) ∀ x_i ∈ X'_U are converted to one-hot vectors, where X'_U ⊆ X_U. These examples X'_U, along with their corresponding (pseudo-)labels Ŷ'_U, are added to the original labeled set. This extended labeled set X_L ∪ X'_U is used to train another (student) model f'_θ. This procedure is repeated, and the current student model is used as a teacher in the next phase to get pseudo-labels for X''_U for training another (student) model f''_θ on the set X_L ∪ X'_U ∪ X''_U. In conventional self-training methods, the entire unlabeled set X_U is used in every iteration.
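One teacher-student phase of the procedure described above might be sketched as follows; the helper names and the generic `train_fn`/`teacher_predict` callables are illustrative assumptions, not the claimed implementation:

```python
import numpy as np

def one_hot(pred_probs):
    """Convert the teacher's prediction vectors to one-hot pseudo-labels."""
    labels = np.argmax(pred_probs, axis=1)
    out = np.zeros_like(pred_probs)
    out[np.arange(len(labels)), labels] = 1.0
    return out

def teacher_student_step(train_fn, X_L, Y_L, teacher_predict, X_U_sel):
    """One self-training phase: pseudo-label the selected unlabeled
    examples X'_U with the teacher, then train a student on the
    extended labeled set X_L union X'_U."""
    Y_hat = one_hot(teacher_predict(X_U_sel))
    X_ext = np.concatenate([X_L, X_U_sel])
    Y_ext = np.concatenate([Y_L, Y_hat])
    return train_fn(X_ext, Y_ext)
```

Repeating this step with the returned student acting as the next teacher yields the iterative procedure described above.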
However, as mentioned above, the most general form of self-training can have different sets of unlabeled data (X'_U, X''_U, and so on) in every iteration. The method of selecting X'_U from X_U can come from any utility function, the objective of which is to use the most appropriate unlabeled data samples in each iteration. It should be understood that some alternative implementations may even use weights for each (i.e., labeled/unlabeled) data sample, which are updated in every iteration, such as transductive semi-supervised learning methods, which borrow from the known concept of boosting used in statistics. In an embodiment, the fraud detection engine 222 is communicably coupled to the autoencoder engine 224.
In an embodiment, the autoencoder engine 224 includes suitable logic and/or interfaces for generating or training the first autoencoder model 230 based, at least in part, on the subset of fraud labeled data. The autoencoder engine 224 is further configured to generate or train the second autoencoder model 232 based, at least in part, on the subset of non-fraud labeled data. Therefore, the autoencoder engine 224 is configured to train one of the two autoencoder models for each class, i.e., fraudulent and non-fraudulent. It should be noted that, in order to improve the performance of the fraud detection engine 222, before the pseudo-labels are assigned using the prediction vectors, the subset of the unlabeled data is selected by the autoencoder engine 224 based on calculating a set of reconstruction losses from the first autoencoder model 230 and the second autoencoder model 232.
In one implementation, the autoencoder models are trained using the labeled data for each class. Let X_F and X_nF be the labeled training data from the two different classes, namely fraud and non-fraud, respectively. Let X_F = {x_F^(1), x_F^(2), x_F^(3), ..., x_F^(p)} be the training set including the fraud samples and X_nF = {x_nF^(1), x_nF^(2), x_nF^(3), ..., x_nF^(q)} be the non-fraud samples, where p + q = l. Using these samples, the first autoencoder model 230 and the second autoencoder model 232 are trained to learn the following mappings, namely A_F : X_F → X_F and A_nF : X_nF → X_nF.
In a particular example, the reconstruction loss for the first autoencoder model 230 may be denoted L1 (also interchangeably referred to as the 'first reconstruction loss') and that of the second autoencoder model 232 may be denoted L2 (also interchangeably referred to as the 'second reconstruction loss'). Now, since the first autoencoder model 230 is trained on the fraud labeled samples, it is likely to return a lower value of the reconstruction loss for fraud samples than for non-fraud samples. Thus, for an unlabeled sample x_i, both L1 and L2 are calculated. If x_i is a fraud sample, then it is likely that L1 < L2; else L1 > L2, in which case it is more likely to be a non-fraud sample. This is known as the classification hint. In addition to this, the autoencoder engine 224 may also calculate |L1 − L2|, the magnitude of which is directly proportional to the confidence of the autoencoders in the class of the sample. Now, these cues can be used by the fraud detection engine 222 during the pseudo-labeling process for the set of unlabeled data to improve the performance of the fraud detection engine 222. In particular, at first, sub-sampling may be done, followed by model training and sample selection, before proceeding to the pseudo-labeling process.
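A minimal sketch of the classification hint described above, assuming the two autoencoders are passed in as plain callables and mean squared error is used as the reconstruction loss (both assumptions for illustration):

```python
import numpy as np

def reconstruction_loss(autoencoder, x):
    """Mean squared reconstruction error of a sample."""
    return float(np.mean((autoencoder(x) - x) ** 2))

def classification_hint(ae_fraud, ae_nonfraud, x):
    """Classification hint: the class whose autoencoder reconstructs x
    with the lower loss (L1 vs. L2); |L1 - L2| measures the
    autoencoders' confidence in that hint."""
    L1 = reconstruction_loss(ae_fraud, x)
    L2 = reconstruction_loss(ae_nonfraud, x)
    hint = "fraud" if L1 < L2 else "non-fraud"
    return hint, abs(L1 - L2)
```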
In other words, the reconstruction loss corresponding to each unlabeled data sample received from the first autoencoder model 230 and the second autoencoder model 232, i.e., the first reconstruction loss and the second reconstruction loss are used to compute the confidence score corresponding to each unlabeled data sample. Then, the fraud detection model 228 is configured to take cues from the confidence score associated with each unlabeled data sample to determine if the pseudo-label associated with that unlabeled data sample may be selected to further train the fraud detection model 228.
The process begins by assuming a set of training examples (X_L, Y_L), where each x_i ∈ X_L has a corresponding label y_i ∈ Y_L. Then, k random samples of the training set are generated, each of size s, with s < |X_L|. An example can be present in more than one of the k samples, i.e., k * s being equal to |X_L| is not a requirement.
Further, k separate models are trained, one on each of the k samples of the training data. The models may also be chosen to have different architectures, with their separate parameters θ_1, θ_2, ..., θ_k. The unlabeled examples X_U are then fed to each of the k trained models to infer their corresponding probability vectors. Thus, (f_θ1(x), f_θ2(x), ..., f_θk(x)) is obtained for each x ∈ X_U, with f_θi(x) ∈ R^c.
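The sub-sampling and multi-model training step might be sketched as follows; the helper name, the fixed seed, and sampling without replacement within each subset are illustrative assumptions:

```python
import numpy as np

def train_bagged_models(X, Y, k, s, fit_fn, rng=None):
    """Train k models, each on a random size-s sample of the labeled
    training data; an example may appear in several of the k samples,
    so k * s need not equal len(X)."""
    rng = rng or np.random.default_rng(0)
    models = []
    for _ in range(k):
        idx = rng.choice(len(X), size=s, replace=False)
        models.append(fit_fn(X[idx], Y[idx]))
    return models
```

Each fitted model can then be applied to the unlabeled examples to obtain one probability vector per model per example.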
Once the initial models have been trained on the labeled data, the k models are used for pseudo-labeling on the set of unlabeled data. For this step, ideally, those samples are pseudo-labeled for which the pseudo-label is more likely to be correct, as incorrect pseudo-labels add label noise to the training data, which may negatively affect convergence. For selecting samples, the following steps are performed:
For each of the n − l unlabeled data samples x_i, the entropy H(x_i) of the prediction vector obtained from the fraud detection model 228 is calculated. The rationale for using entropy is that a lower-entropy prediction vector implies the model is more confident about that example. Some semi-supervised learning approaches (e.g., label propagation) similarly use entropy, as a measure of uncertainty, to assign weights to the pseudo-labels. Similarly, |L1 − L2| is calculated for every sample using the trained autoencoder models. The results are sorted in decreasing order of the confidence score, i.e., 1 − H(x_i) + |L1 − L2|, as this score quantifies the confidence of the fraud detection model 228, the first autoencoder model 230, and the second autoencoder model 232 in the pseudo-labels. Further, the top K% samples are selected from the sorted list for pseudo-labeling.
For all samples x_i that are selected by the above process, the autoencoder engine 224 determines whether the pseudo-label and the classification hint agree with each other, i.e., whether they are coherent. The subset of the samples selected by the previous step that passes this check is the set that will be used as pseudo-labeled data in the next iteration of the self-training process. In other words, the server system 200 is configured to extract a subset of unlabeled data samples from the set of unlabeled data based, at least in part, on the confidence score for each unlabeled data sample being at least equal to a sorting threshold. In a non-limiting example, the sorting threshold may indicate that the top x% of the unlabeled data samples, ordered by confidence score, may be selected for the extraction process.
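The selection steps above (prediction entropy, the confidence score 1 − H(x_i) + |L1 − L2|, the top-K% cut, and the coherence check against the classification hint) might be combined in a sketch like the following; the function name, the default threshold, and encoding the hints as class indices are assumptions for illustration:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a prediction vector."""
    return float(-np.sum(p * np.log(p + 1e-12)))

def select_for_pseudo_labeling(pred_probs, ae_gaps, hints, top_percent=50):
    """Score each unlabeled sample by 1 - H(x) + |L1 - L2|, keep the
    top-K% most confident samples, then keep only those whose
    pseudo-label agrees (is coherent) with the autoencoder hint."""
    scores = [1.0 - entropy(p) + g for p, g in zip(pred_probs, ae_gaps)]
    order = np.argsort(scores)[::-1]                # most confident first
    keep = order[: max(1, int(len(order) * top_percent / 100))]
    pseudo = np.argmax(pred_probs, axis=1)          # tentative pseudo-labels
    return [int(i) for i in keep if pseudo[i] == hints[i]]
```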
Now, for assigning pseudo-labels to the unlabeled examples, the ensemble of the predictions of the individual k models is computed. In one example, the following non-limiting exemplary equation may be used to determine the ensemble of predictions:
f_ensmbl(x) = (1/k) Σ_{j=1}^{k} f_θj(x) … Eqn. 4
The unlabeled examples are then sorted in decreasing order of the confidence score 1 − H(x_i) + |L1 − L2| of the ensemble prediction vectors f_ensmbl(x), and the first m examples with the lowest entropy are selected. The autoencoder engine 224 assigns soft pseudo-labels ŷ_i to these m examples as follows:
ŷ_i = f_ensmbl(x_i) … Eqn. 5
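A minimal sketch of Eqns. 4 and 5, assuming each ensemble member's predictions are available as an (n, c) probability array (the function name is hypothetical):

```python
import numpy as np

def ensemble_prediction(member_probs):
    """Eqn. 4: f_ensmbl(x) = (1/k) * sum over j of f_thetaj(x).

    member_probs: list of k arrays, each of shape (n, c), one per
    trained model. The returned average is used directly as the
    soft pseudo-label (Eqn. 5).
    """
    return np.mean(np.stack(member_probs, axis=0), axis=0)
```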
In another embodiment, the autoencoder engine 224 may be communicably coupled to the sharpening engine 226. The sharpening engine 226 includes suitable logic and/or interfaces for generating a set of adjusted reconstruction losses based, at least in part, on applying a sharpening function to the set of reconstruction losses. In particular, given the soft pseudo-label assigned to the unlabeled data, i.e., the ensemble vector of the individual model predictions, the sharpening function is applied to reduce the entropy of the label distribution. In one example, the strength of the sharpening process may be adjusted such that it is proportional to the confidence in the pseudo-label of that unlabeled sample. The sharpening function is controlled by a hyper-parameter T, which is generally chosen heuristically for most applications. In particular, the sharpening engine 226 uses the metric | L1 – L2 | as the hyper-parameter for sharpening. It is given by the following equation:
Sharpen(y, T)_i = y_i^(1/T) / ∑_(j=1)^C y_j^(1/T), where T = | L1 – L2 | … Eqn. 6
The soft pseudo-label is sharpened to obtain the final pseudo-label to be used for training the model. It is given by the following equation:
ŷ_i^sharp = Sharpen(ŷ_i, | L1 – L2 |) … Eqn. 7
Thus, each unlabeled sample from the set of unlabeled data is sharpened depending on the confidence of its pseudo-label. Therefore, the higher the confidence, the closer the pseudo-label is to a one-hot vector. This alleviates unnecessary label noise in the pseudo-labels and helps the model focus more on those samples where the pseudo-labels have a higher probability of being correct. In other words, a set of final pseudo-labeled data samples may be determined by the sharpening engine 226 based, at least in part, on applying the sharpening function to the confidence score for each unlabeled data sample associated with each soft pseudo-labeled data sample from the set of soft pseudo-labeled data samples.
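A sketch of the sharpening function of Eqns. 6 and 7, where the temperature T is set to | L1 – L2 |; the small floor on T is an added assumption to keep the exponent finite, not part of the disclosure:

```python
import numpy as np

def sharpen(y, temperature, t_floor=1e-3):
    """Eqn. 6: Sharpen(y, T)_i = y_i^(1/T) / sum_j y_j^(1/T).

    y:           probability vector (or batch of vectors) to sharpen.
    temperature: scalar T; here T = |L1 - L2|. T < 1 pushes the
                 distribution toward a one-hot vector.
    """
    t = max(temperature, t_floor)
    powered = np.power(y, 1.0 / t)
    return powered / powered.sum(axis=-1, keepdims=True)
```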
Now, the enriched training data, i.e., both the original labeled samples and the pseudo-labeled examples, is used to train the next-iteration classifier, i.e., the fraud detection model 228. Since the training data contains both hard (one-hot encoded) labels and soft (vector) labels, the loss function has two components: the supervised loss L_S(X_L, Y_L, θ), using cross-entropy as defined in Eqn. 1 for the hard-labeled examples, and the unsupervised loss L_U(X; θ), using squared Euclidean distance as defined in Eqn. 3 for the soft-labeled examples. The full model, i.e., the semi-supervised learning (SSL) model, is trained using a loss which is a linear combination of these two losses, which may be given by the following non-limiting exemplary equation:
L_SSL = L_S + λ L_U … Eqn. 8
where λ is the hyper-parameter to be chosen.
If there is no pseudo-labeled data, the loss term reduces to the supervised loss only. Finally, the sharpening engine 226 is configured to determine, by the fraud detection model 228 (i.e., the model obtained after the iterative process stops), a final class label for each unlabeled data sample, the final class label indicating whether the particular unlabeled data sample is one of a fraud data sample and a non-fraud data sample.
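The two-component loss of Eqn. 8 may be sketched as follows (cross-entropy and squared Euclidean distance stand in for Eqns. 1 and 3; the function names are hypothetical):

```python
import numpy as np

def cross_entropy(probs, onehot):
    """Supervised loss L_S (Eqn. 1) for hard, one-hot labels."""
    eps = 1e-12
    return -np.mean(np.sum(onehot * np.log(probs + eps), axis=1))

def squared_euclidean(probs, soft_labels):
    """Unsupervised loss L_U (Eqn. 3) for soft pseudo-labels."""
    return np.mean(np.sum((probs - soft_labels) ** 2, axis=1))

def ssl_loss(probs_l, y_l, probs_u, y_u_soft, lam):
    """Eqn. 8: L_SSL = L_S + lambda * L_U.

    With no pseudo-labeled data, the loss reduces to L_S alone.
    """
    l_s = cross_entropy(probs_l, y_l)
    l_u = squared_euclidean(probs_u, y_u_soft) if len(probs_u) else 0.0
    return l_s + lam * l_u
```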
FIG. 3A illustrates a schematic block diagram representation 300 for training a fraud detection model using labeled data and pseudo-labeled data, in accordance with an embodiment of the present disclosure. It should be noted that FIG. 3A describes the operation of the various engines of FIG. 2 with the help of a block diagram representation 300.
To that end, initially, the initial labeled data 302 is accessed by both a first autoencoder model 320 and a second autoencoder model 322, along with the fraud detection model 310 (for example, see the ‘N’ number of fraud detection models depicting each iteration of the fraud detection model 228). Here, class 0 data represents non-fraud labeled data and class 1 data represents fraud labeled data, or vice versa. At first, the labeled data is used to train the autoencoders as described with reference to FIG. 2.
Initially, the first iteration of the fraud detection model, i.e., Model 1 (may also be referred to as a first machine learning model), is used to predict labels for the set of unlabeled data 304. Then, the autoencoders are configured to access the set of unlabeled data 304 with the predicted labels. Further, the first autoencoder model 320 is configured to compute the reconstruction error or loss of the class 0 autoencoder A1, given by L1 (see, 324), for each data sample. Similarly, the second autoencoder model 322 is configured to compute the reconstruction error or loss of the class 1 autoencoder A2, given by L2 (see, 326), for each data sample.
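For illustration, the per-sample reconstruction losses L1 and L2 may be computed as squared reconstruction errors; the encode/decode callables below are hypothetical stand-ins for a trained autoencoder such as A1 or A2:

```python
import numpy as np

def reconstruction_losses(samples, encode, decode):
    """Per-sample squared reconstruction error ||x - D(E(x))||^2.

    Applied once with the class 0 autoencoder (giving L1) and once
    with the class 1 autoencoder (giving L2).
    """
    recon = np.array([decode(encode(x)) for x in samples])
    return np.sum((samples - recon) ** 2, axis=1)
```

A sample that a given autoencoder reconstructs well (low loss) is likely to belong to the class that autoencoder was trained on, which is what makes | L1 – L2 | usable as a classification hint.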
Then, the confidence score, i.e., 1 − H(p_i) + | L1 – L2 |, is computed by the server system 102, and the samples are arranged in increasing order of this score (see, 312). It should be noted that these values may be sorted in descending order as well. Further, the top K samples (i.e., per the sorting threshold) are selected from the sorted list for pseudo-labeling.
Then, for the top K samples that are selected by the above process, the server system 102 determines whether the pseudo-label and the classification hint agree with each other, i.e., whether they are coherent with each other. The samples that pass this check form the set that is used as pseudo-labeled data in the next iteration of the self-training process, i.e., at this step the server system 200 filters and selects the best pseudo-labeled samples (see, 314). These pseudo-labels may be called soft pseudo-labels (see, 316). Upon going through a guided sharpening procedure, a set of final pseudo-labeled data is obtained (see, 318). This operation may be carried out by the sharpening engine 226 of the server system 102. In particular, the set of final pseudo-labeled data samples is generated based, at least in part, on applying the sharpening function to the confidence score for each unlabeled data sample associated with each soft pseudo-labeled data sample from the set of soft pseudo-labeled data samples. Now, this set of final pseudo-labeled data is added to the training set of the N number of fraud detection models 310 to form the training data 308.
It should be noted that the initial labeled data is copied (see, 306) by the fraud detection model 310 and then separated as training data (see, 308). Here, the training data 308 is enriched training data, i.e., both the original labeled samples and the final pseudo-labeled samples, and is now used to train the next-iteration classifier, i.e., the fraud detection model 310. Therefore, the fraud detection model 310 is iteratively trained until the Nth step, where the performance of the fraud detection model 310 ceases to improve even after adding more pseudo-labeled samples to the training data 308. This stage in the iteration process may be referred to as a predefined criteria. In other words, once the performance of the fraud detection model 310 reaches the predefined criteria, the iterative process stops.
Further, training the N iterations of the fraud detection model 310 is done based on Eqn. 8 as described earlier (i.e., L_SSL = L_S + λ L_U).
In one non-limiting example, the pseudo-code for the algorithm utilized by the present disclosure is given below:
(1) Let D(0) = (X(0), Y(0)) = (X_L, Y_L) be the initial training data.
(2) Let X_U be the unlabeled data.
(3) For t = 0 to t = N, perform the following steps:

(a) Create k random samples of D(t) as described above.
(b) Train k separate models (denoted by ‘M’) M_θ1, M_θ2, . . . , M_θk on each of the k samples.
(c) For each x ∈ X_U, get its prediction vector from each of the k models: (f_θ1(x), f_θ2(x), . . . , f_θk(x)).
(d) Compute the ensemble probability vector f_ensmbl(x) for all x ∈ X_U.
(e) Sort the unlabeled examples in terms of the confidence score 1 − H(p_i) + | L1 – L2 |, and choose the first K examples.
(f) Select only those samples X_P ⊆ X_U that agree on the pseudo-label and the classification hint obtained from A1 and A2.
(g) Assign soft pseudo-labels ŷ_p = f_ensmbl(x) for all x ∈ X_P.
(h) Sharpen the soft pseudo-labels to obtain ŷ_p^sharp for all x ∈ X_P, using | L1 – L2 | as the sharpening parameter.
(i) Create the training data for the next iteration as: D(t+1) = (X(t+1), Y(t+1)) = (X(t) ∪ X_P, Y(t) ∪ ŷ_p^sharp).
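The pseudo-code above can be condensed into a single-iteration sketch; the callables, the lower-loss hint rule, and the clipping of the sharpening temperature are illustrative assumptions rather than requirements of the disclosure:

```python
import numpy as np

def self_training_iteration(x_l, y_l, x_u, train_fn, predict_fn,
                            ae_losses_fn, k_percent=10):
    """One iteration of the self-training loop (steps a-i), sketched
    with hypothetical callables:
      train_fn(x, y)       -> fitted model
      predict_fn(model, x) -> (n, c) probability vectors
      ae_losses_fn(x)      -> (L1, L2) reconstruction losses per sample
    Returns the selected pseudo-labeled samples and their sharpened
    soft labels, to be appended to the training set.
    """
    model = train_fn(x_l, y_l)
    probs = predict_fn(model, x_u)            # ensemble in the full method
    l1, l2 = ae_losses_fn(x_u)
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    conf = 1.0 - entropy + np.abs(l1 - l2)    # confidence score
    k = max(1, int(len(x_u) * k_percent / 100))
    top = np.argsort(conf)[::-1][:k]          # most confident samples
    # Classification hint: the class whose autoencoder reconstructs
    # the sample with lower loss (A1 was trained on class 0).
    hint = (l1 > l2).astype(int)
    agree = top[probs[top].argmax(axis=1) == hint[top]]
    t = np.clip(np.abs(l1 - l2)[agree], 1e-3, None)  # sharpening T
    sharp = np.power(probs[agree], 1.0 / t[:, None])
    sharp /= sharp.sum(axis=1, keepdims=True)
    return x_u[agree], sharp
```

Repeating this step, with the returned pseudo-labeled pairs appended to the training set, reproduces the loop over t = 0 … N.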

FIG. 3B illustrates a schematic block diagram representation 330 for training the fraud detection model 228 incorporating a noise reduction layer 328 using labeled data and pseudo-labeled data, in accordance with an embodiment of the present disclosure. It should be noted that incorrect pseudo-labels add label noise to the training data, which may negatively affect convergence. To avoid this issue, the noise reduction layer 328 may be added to the existing training mechanism for the fraud detection model 228 as described by FIG. 3A. For the sake of brevity, the various components of FIG. 3B that are similar to the components described with reference to FIG. 3A are not explained again.
The noise reduction layer 328 reduces the noise (or mislabeled data samples) from the pseudo samples. In an example, the noise reduction layer 328 may be implemented using a machine learning model that is a Self-Learning with Multi-Prototypes (SMP) model. The SMP model aims to train a robust network on a very noisy dataset without extra supervision. In various non-limiting examples, different methods for dealing with the noisy labels may also be employed by the server system 102 to reduce noise from the pseudo-labels to improve the performance of the fraud detection mechanism.
FIGS. 4A-4B, collectively, illustrate a method 400 for training a fraud detection model, in accordance with an embodiment of the present disclosure. The method 400 depicted in the flow diagram may be executed by, for example, the server system 200. The sequence of operations of the method 400 may not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped and performed in the form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner. Operations of the method 400, and combinations of operations in the method 400 may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or a different device associated with the execution of software that includes one or more computer program instructions. The plurality of operations is depicted in the process flow of the method 400. The process flow starts at operation 402.
At 402, the method 400 includes accessing, by a server system, such as server system 200, a set of historical fraud data from a database such as database. The set of historical fraud data includes a set of labeled data and a set of unlabeled data. The set of labeled data includes a subset of fraud labeled data and a subset of non-fraud labeled data.
At 404, the method 400 includes generating, by the server system 200, a first machine learning model (i.e., the fraud detection model’s initial state) based, at least in part, on the set of labeled data.
At 406, the method 400 includes generating, by the server system 200, a first autoencoder model 230 based, at least in part, on the subset of fraud labeled data and a second autoencoder model 232 based, at least in part, on the subset of non-fraud labeled data.
At 408, the method 400 includes determining, by the server system via the first autoencoder model 230 and the second autoencoder model 232, a set of reconstruction losses.
At 410, the method 400 includes performing, by the server system 200, the following steps i.e., steps 410A-410E, iteratively until the performance of a second machine learning model is fine-tuned, the steps including:
At 410A, the method 400 includes extracting, by the server system 200, the subset of unlabeled data from the set of unlabeled data based, at least in part, on the set of reconstruction losses.
At 410B, the method 400 includes generating, by the server system 200 via the first machine learning model, a set of soft pseudo-labeled data based, at least in part, on the subset of unlabeled data.
At 410C, the method 400 includes generating, by the server system 200, a set of final pseudo-labeled data based, at least in part, on applying a sharpening function to the set of soft pseudo-labeled data.
At 410D, the method 400 includes training, by the server system 200, the second machine learning model (i.e., the fraud detection model’s retrained state) based, at least in part, on the set of labeled data and the set of final pseudo-labeled data.
At 410E, the method 400 includes determining, by the server system 200, the performance of the second machine learning model based, at least in part, on the set of historical fraud data.
To verify the approach provided by the present disclosure, various experiments have been performed on publicly available datasets provided by entities such as Amazon™ online reviews, Yelp™ online reviews, the Elliptic™ Bitcoin Fraud dataset, and the like. Further, the prior-art approaches and the proposed approach have been tested on the same datasets so that their results may be compared.
In an example, the Amazon™ online reviews dataset contains product reviews under the musical instruments category on Amazon’s online marketplace. The total number of samples in this database is 11,944 out of which around 9.5% are fraudulent. These databases have been utilized previously to study the fraudster camouflage problem on which fraud detection can be evaluated.
In another example, Yelp™ online reviews dataset contains user reviews of hotels and restaurants. The total number of samples is 45,954 out of which around 14.5% are labeled as filtered (spam/fraud) and the rest are legitimate (non-fraud).
In another example, the Elliptic™ Bitcoin Fraud dataset contains bitcoin transactions and maps them to illicit (scams, malware, terrorist organizations, ransomware, Ponzi schemes, etc.) and licit (exchanges, wallet providers, miners, licit services, etc.) categories. The dataset is presented as a transaction graph, each node being a bitcoin transaction, and an edge represents the flow of bitcoins between one transaction and the other. The dataset contains 203,769 nodes/data samples, out of which 4,545 (around 2%) are labeled as illicit, 42,019 (around 21%) samples are labeled as licit and the rest are unlabeled.
The semi-supervised experiments have been performed using the proposed self-training setups. In order to perform semi-supervised learning experiments and, at the same time, compare the results of the proposed approach with the approaches of the prior art, similar testing sets were used across all the experiments on each database. During the training process, a part of the training data is used as labeled data and the rest is used as unlabeled data. This helps to evaluate the effectiveness of the proposed approach of utilizing unlabeled data when compared with the prior art. The evaluation criteria are based on several factors such as accuracy, precision, recall, F1 score, area under the ROC curve (AUC), area under the precision-recall curve (AUCPR), and micro-averaged F1 score. It should be noted that since fraud classification is a highly unbalanced problem (the fraction of fraud is very low compared to non-fraud), more emphasis has been placed on the F1 score and micro-averaged F1 score for evaluation.
In one exemplary experiment, the effectiveness of the proposed self-training algorithm for fraud detection is determined. The first step is to train a model on limited labeled data and then keep adding pseudo-labeled data from the unlabeled set in an iterative fashion. For this experiment, the Elliptic™ dataset was used. On the Elliptic™ dataset, fraudulent bitcoin transactions were detected by training the model (i.e., fraud detection model 228) on limited labeled training data and improving this model by iteratively utilizing pseudo-labeled data, which are unlabeled. The results for each iteration of the proposed self-training approach are depicted in Table 1. The first row of Table 1 depicts the baseline model, which is trained on limited training data (e.g., around thirty thousand data samples out of the two hundred and three thousand data samples available in this database). Thereafter, in each row, the amount of unlabeled data that was added (in the form of pseudo-labeled samples) is shown. Based on Table 1, it can be easily observed that the F1 score improves considerably. Further, both precision and recall improve, and the improvement in precision is noteworthy (from around 66% in the baseline model to approximately 65% to 75% in the last iteration). This dataset is ideal for the proposed semi-supervised paradigm since it has a large amount of unlabeled data.

Table 1: Results for the proposed approach on the Elliptic dataset.
Similarly, after performing experiments on the Yelp™ and Amazon™ reviews datasets, similar gains were achieved from utilizing unlabeled data, although a semi-supervised learning protocol had to be simulated in those datasets, as they did not have any unlabeled data samples. In order to do this, a part of the training data was treated as unlabeled data samples, and these samples were pseudo-labeled in an iterative way. In each iteration, the pseudo-labeled data samples were used along with the labeled data for training. As a result, on the Yelp™ dataset, the F1 score increased from about 50.9% to approximately 51% to 55% (see, Table 2), and on the Amazon™ reviews dataset the same improved from about 82.3% to about 83% to 87%, while only using about 35% to 45% of the training set as labeled data (see, Table 3).

Table 2: Results for the proposed approach on the Yelp reviews dataset, by using only 40% of the labeled samples.

Table 3: Results for the proposed approach on the Amazon reviews dataset, by using only 40% of the labeled samples.
In another exemplary experiment, fraud detection has primarily been performed with other prior art-based approaches such as graph neural networks (and their variants), tree-based models, and so on. The comparison of the proposed self-training approach with these methods is illustrated using this experiment.
The comparisons between the proposed self-training approach and the conventional approaches are shown for the online reviews datasets (Yelp™ and Amazon™) in Table 4 and the Elliptic Bitcoin Fraud dataset in Table 5. For experimental purposes, the rest of the training data has been utilized as pseudo-labeled data, the ground truth labels of which were never made available during training. For the Amazon™ and the Yelp™ online reviews dataset, a comparison with several graph-based approaches, using different proportions of training data has been performed. On both the online review databases, a Care-GNN method gave the previous state-of-the-art result, which the proposed semi-supervised algorithm outperforms by a significant margin on all three training data proportions. On the Elliptic Bitcoin Fraud dataset, a comparison (see, Table 5) with other shallow classification models along with graph-based methods namely GCN and Skip-GNN has been performed.

Table 4: Detection performance (%) (Area under ROC curve) on different proportions of labeled train data, on the Yelp and Amazon review databases.

Table 5: Comparative results on the Elliptic Bitcoin Fraud database
In yet another exemplary experiment, several ablation experiments examine each and every aspect of the proposed approach. Further, experiments have been performed using different modifications that can be made to the proposed approach.
In one experiment, the significance of the proportion of pseudo-labeled data classes is tested. In this experiment, the amount of unlabeled data that is added in every iteration is chosen according to the decreasing entropy of the class-wise probability scores given by the models. In this way, the different performances exhibited by various models, depending on what proportion of fraud and non-fraud data is added in each iteration, are also tested. In this experiment, only illicit pseudo-labeled data are added, and it is observed that the models perform poorly. This is contrary to the general intuition that, since it is an unbalanced classification problem, the performance should have improved by adding only the illicit (i.e., fraud) class. The reason behind this observation is that, since frauds are similar to anomalies, modeling anomalies may lead to poor performance from a classification model, which is exactly what happens in this case. The results of this experiment are shown in Table 6 below:

Table 6: Experimental results on the Bitcoin Fraud dataset, while adding only the illicit pseudo-labeled data during training.
In one experiment, the significance of the lambda (λ) hyper-parameter is tested. This experiment has been performed on the Elliptic dataset, where the value of lambda has been kept low in the initial iterations and then increased slowly in each iteration of self-training. It can be seen that, compared to the previous experiment on the same dataset (see, Table 1), the results in this case (benefit from the unlabeled data in each iteration) are comparatively better. In the first iteration of the self-training process, precision is at around 65% to 75%, whereas in the experiment shown in Table 1, the same is reached only after the 4th iteration. The results of this experiment are shown in Table 7 below:

Table 7: Experimental results on the Bitcoin Fraud dataset, where in each iteration 10 thousand new samples were added as pseudo-labeled data (herein, Lambda is increased from 0 to 20 as the training progresses).
In one experiment, the contribution of the autoencoders is tested. A natural ablation for the proposed approach is to observe the behavior and contribution of the autoencoders, which were learned separately on the fraud and non-fraud labeled data. During the pseudo-labeling process, the unlabeled data samples were sorted at first using the entropy of the predictions from the supervised model and the confidence metric obtained using the autoencoders. Once the top 8% to 11% of data samples are selected from the sorted list, the classification hint from the autoencoders is used along with the model’s classification output to further refine this list. It is noted that the various experimental results described with reference to Tables 1-7 are approximate in nature and their reproduction may vary based on the various factors chosen or assumed during the experimental setups. Therefore, these experimental results are only exemplary in nature and should not be construed as limiting the scope of the present disclosure.
FIG. 5A illustrates a graphical representation 500 of the samples selected/refined using the classification hint from the autoencoders, from the Elliptic dataset, in accordance with an embodiment of the present disclosure. It can be seen that a substantial number of samples are removed in each step where the classification hints from A1 (i.e., the first autoencoder model 230), A2 (i.e., the second autoencoder model 232), and M (i.e., the fraud detection model 228) do not agree.
FIG. 5B illustrates a graphical representation 530 of the percentage of samples selected using the classification hint from the autoencoders, in accordance with an embodiment of the present disclosure. It can be seen that the selection is stricter for fraud samples than non-fraud samples on the Amazon dataset, while the same for the Yelp dataset does not necessarily hold. This shows that a significant number of samples are removed in each step which prevents label noise during pseudo-labeling.
FIGS. 6A-6B, collectively, illustrate a method 600 for training a fraud detection model and detecting fraud using the fraud detection model, in accordance with an embodiment of the present disclosure. The method 600 depicted in the flow diagram may be executed by, for example, the server system 200. The sequence of operations of the method 600 may not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped and performed in the form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner. Operations of the method 600, and combinations of operations in the method 600 may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or a different device associated with the execution of software that includes one or more computer program instructions. The plurality of operations is depicted in the process flow of the method 600. The process flow starts at operation 602.
At operation 602, the method 600 includes accessing, by a server system (such as the server system 200), a set of historical fraud data from a database 204 associated with the server system 200. In a non-limiting example, the set of historical fraud data includes a set of labeled data samples and a set of unlabeled data samples.
At operation 604, the method 600 includes training, by the server system 200, a fraud detection model such as fraud detection model 228 based, at least in part, on the set of labeled samples.
At operation 606, the method 600 includes performing, by the server system 200, a plurality of operations iteratively until performance of the fraud detection model 228 reaches a predefined criteria. The plurality of operations includes operations 606A-606E.
At operation 606A, the method 600 includes predicting, by the fraud detection model 228, a pseudo-label for each unlabeled data sample from the set of unlabeled data samples, the pseudo-label indicating whether a particular unlabeled data sample is one of a fraud sample and a non-fraud sample.
At operation 606B, the method 600 includes generating a set of pseudo-labeled data samples based, at least in part, on the set of unlabeled data samples and the pseudo-label predicted for each unlabeled data.
At operation 606C, the method 600 includes computing, by a first autoencoder model and a second autoencoder model associated with the server system, a confidence score for each unlabeled data sample.
At operation 606D, the method 600 includes generating, by the fraud detection model 228, a set of soft pseudo-labeled data samples based, at least in part, on the set of pseudo-labeled data samples and the confidence score for each unlabeled data sample.
At operation 606E, the method 600 includes re-training the fraud detection model based, at least in part, on the set of labeled data samples and the set of soft pseudo-labeled data samples, wherein the re-training fine-tunes the performance of the fraud detection model.
At operation 608, the method 600 includes determining, by the fraud detection model 228, a final class label for each unlabeled data sample, the final class label indicating whether the particular unlabeled data sample is one of the fraud data sample and the non-fraud data sample.
The disclosed method with reference to FIGS. 4A-4B and FIGS. 6A-6B, or one or more operations of the server system 200 may be implemented using software including computer-executable instructions stored on one or more computer-readable media (e.g., non-transitory computer-readable media, such as one or more optical media discs, volatile memory components (e.g., dynamic random-access memory (dynamic RAM or DRAM) or static random-access memory (static RAM or SRAM)), or nonvolatile memory or storage components (e.g., hard drives or solid-state nonvolatile memory components, such as Flash memory components) and executed on a computer (e.g., any suitable computer, such as a laptop computer, netbook, Web book, tablet computing device, smartphone, or other mobile computing devices). Such software may be executed, for example, on a single local computer or in a network environment (e.g., via the Internet, a wide-area network, a local-area network, a remote web-based server, a client-server network (such as a cloud computing network), or other such networks) using one or more network computers. Additionally, any of the intermediate or final data created and used during the implementation of the disclosed methods or systems may also be stored on one or more computer-readable media (e.g., non-transitory computer-readable media) and are considered to be within the scope of the disclosed technology. Furthermore, any of the software-based embodiments may be uploaded, downloaded, or remotely accessed through a suitable communication means. Such suitable communication means includes, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber optic cable), magnetic communications, electromagnetic communications (including RF, microwave, and infrared communications), electronic communications, or other such communication means.
Although the invention has been described with reference to specific exemplary embodiments, it is noted that various modifications and changes may be made to these embodiments without departing from the broad scope of the invention. For example, the various operations, blocks, etc., described herein may be enabled and operated using hardware circuitry (for example, Complementary Metal Oxide Semiconductor (CMOS) based logic circuitry), firmware, software, and/or any combination of hardware, firmware, and/or software (for example, embodied in a machine-readable medium). For example, the apparatuses and methods may be embodied using transistors, logic gates, and electrical circuits (for example, Application-Specific Integrated Circuit (ASIC) circuitry and/or in Digital Signal Processor (DSP) circuitry).
Particularly, the server system 200 and its various components may be enabled using software and/or using transistors, logic gates, and electrical circuits (for example, integrated circuit circuitry such as ASIC circuitry). Various embodiments of the invention may include one or more computer programs stored or otherwise embodied on a computer-readable medium, wherein the computer programs are configured to cause a processor or computer to perform one or more operations. The computer-readable medium storing, embodying, or encoded with a computer program, or similar language, may be embodied as a tangible data storage device storing one or more software programs that are configured to cause the processor or computer to perform one or more operations. Such operations may be, for example, any of the steps or operations described herein. In some embodiments, the computer programs may be stored and provided to the computer using any type of non-transitory computer-readable media. Non-transitory computer-readable media include any type of tangible storage media. Examples of non-transitory computer-readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g. magneto-optical disks), CD-ROM (compact disc read-only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (BLU-RAY® Disc), and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash memory, RAM (random access memory), etc.). Additionally, a tangible data storage device may be embodied as one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices. In some embodiments, the computer programs may be provided to the computer using any type of transitory computer-readable media. 
Examples of transitory computer-readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer-readable media can provide the program to the computer via a wired communication line (e.g., electric wires, and optical fibers) or a wireless communication line.
Various embodiments of the invention, as discussed above, may be practiced with steps and/or operations in a different order, and/or with hardware elements in configurations different from those disclosed. Therefore, although the invention has been described based upon these exemplary embodiments, it is noted that certain modifications, variations, and alternative constructions may be apparent and well within the scope of the invention.
Although various exemplary embodiments of the invention are described herein in a language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as exemplary forms of implementing the claims.
CLAIMS:
1. A computer-implemented method, comprising:
accessing, by a server system, a set of historical fraud data from a database associated with the server system, the set of historical fraud data comprising a set of labeled data samples and a set of unlabeled data samples;
training, by the server system, a fraud detection model based, at least in part, on the set of labeled data samples; and
performing, by the server system, a plurality of operations iteratively until performance of the fraud detection model reaches a predefined criterion, the plurality of operations comprising:
predicting, by the fraud detection model, a pseudo-label for each unlabeled data sample from the set of unlabeled data samples, the pseudo-label indicating whether a particular unlabeled data sample is one of a fraud sample and a non-fraud sample;
generating a set of pseudo-labeled data samples based, at least in part, on the set of unlabeled data samples and the pseudo-label predicted for each unlabeled data sample;
computing, by a first autoencoder model and a second autoencoder model associated with the server system, a confidence score for each unlabeled data sample;
generating, by the fraud detection model, a set of soft pseudo-labeled data samples based, at least in part, on the set of pseudo-labeled data samples and the confidence score for each unlabeled data sample; and
re-training the fraud detection model based, at least in part, on the set of labeled data samples and the set of soft pseudo-labeled data samples, wherein the re-training fine-tunes the performance of the fraud detection model.
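For illustration only, and not as part of the claims, the iterative training loop recited in claim 1 can be sketched as follows. The `TinyClassifier` below is a hypothetical nearest-centroid stand-in for the claimed fraud detection model (the claims recite a DCNN-based classifier), and `confidence_fn` stands in for the autoencoder-based confidence computation of claim 4; the fixed round count stands in for the predefined performance criterion.

```python
import numpy as np

class TinyClassifier:
    """Hypothetical stand-in for the claimed fraud detection model."""
    def fit(self, X, y, sample_weight=None):
        w = np.ones(len(y)) if sample_weight is None else np.asarray(sample_weight)
        # Class centroids, weighted by per-sample confidence ("soft" labels).
        self.c0 = np.average(X[y == 0], axis=0, weights=w[y == 0])
        self.c1 = np.average(X[y == 1], axis=0, weights=w[y == 1])
        return self

    def predict(self, X):
        d0 = np.linalg.norm(X - self.c0, axis=1)
        d1 = np.linalg.norm(X - self.c1, axis=1)
        return (d1 < d0).astype(int)  # 1 = fraud, 0 = non-fraud

def self_train(X_lab, y_lab, X_unlab, confidence_fn, threshold=0.5, rounds=3):
    """Sketch of the claim 1 loop: pseudo-label, score, filter, re-train."""
    model = TinyClassifier().fit(X_lab, y_lab)      # initial supervised training
    for _ in range(rounds):                         # stand-in for the criterion
        pseudo = model.predict(X_unlab)             # (1) predict pseudo-labels
        conf = confidence_fn(X_unlab)               # (3) per-sample confidence
        keep = conf >= threshold                    # (4) keep confident samples
        X = np.vstack([X_lab, X_unlab[keep]])
        y = np.concatenate([y_lab, pseudo[keep]])
        w = np.concatenate([np.ones(len(y_lab)), conf[keep]])  # soft weights
        model = TinyClassifier().fit(X, y, sample_weight=w)    # (5) re-train
    return model
```

The confidence-weighted `sample_weight` is one plausible reading of "soft pseudo-labeled data samples"; the claims themselves do not fix this mechanism.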
2. The computer-implemented method as claimed in claim 1, further comprising:
determining, by the fraud detection model, a final class label for each unlabeled data sample, the final class label indicating whether the particular unlabeled data sample is one of the fraud sample and the non-fraud sample.
3. The computer-implemented method as claimed in claim 1, wherein the set of labeled data samples comprises a subset of fraud labeled data samples and a subset of non-fraud labeled data samples.
4. The computer-implemented method as claimed in claim 1, wherein computing the confidence score comprises:
generating, by the server system, the first autoencoder model based, at least in part, on a subset of fraud labeled data samples;
computing, by the first autoencoder model, a first reconstruction loss for each unlabeled data sample based, at least in part, on the set of unlabeled data samples;
generating, by the server system, the second autoencoder model based, at least in part, on a subset of non-fraud labeled data samples;
computing, by the second autoencoder model, a second reconstruction loss for each unlabeled data sample based, at least in part, on the set of unlabeled data samples; and
determining, by the server system, the confidence score for each unlabeled data sample based, at least in part, on the first reconstruction loss corresponding to each unlabeled data sample and the second reconstruction loss corresponding to each unlabeled data sample.
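Again for illustration only, the two-autoencoder confidence computation of claim 4 might look as follows. A linear (PCA-style) autoencoder is used here as a hypothetical stand-in for the claimed autoencoder models, and the mapping from the two reconstruction losses to a single confidence score is an assumption, since the claim does not fix a formula.

```python
import numpy as np

class LinearAutoencoder:
    """PCA-based stand-in for the claimed autoencoder models (hypothetical)."""
    def __init__(self, n_components=1):
        self.k = n_components

    def fit(self, X):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.W = Vt[:self.k]                      # tied encoder/decoder weights
        return self

    def reconstruction_loss(self, X):
        Z = (X - self.mean) @ self.W.T            # encode
        X_hat = Z @ self.W + self.mean            # decode
        return np.mean((X - X_hat) ** 2, axis=1)  # per-sample squared error

def confidence_scores(X_fraud, X_nonfraud, X_unlab):
    """Claim 4 sketch: confidence from the gap between the two losses."""
    ae_fraud = LinearAutoencoder().fit(X_fraud)      # first autoencoder (fraud)
    ae_clean = LinearAutoencoder().fit(X_nonfraud)   # second autoencoder (non-fraud)
    l1 = ae_fraud.reconstruction_loss(X_unlab)       # first reconstruction loss
    l2 = ae_clean.reconstruction_loss(X_unlab)       # second reconstruction loss
    # Assumed mapping: a large loss gap means the sample sits clearly on one
    # class manifold, hence a more confident pseudo-label; a small gap means
    # the sample is ambiguous.
    return np.abs(l1 - l2) / (l1 + l2 + 1e-12)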
5. The computer-implemented method as claimed in claim 1, wherein generating the set of soft pseudo-labeled data samples comprises:
extracting, by the server system, a subset of unlabeled data samples from the set of unlabeled data samples based, at least in part, on the confidence score for each unlabeled data sample being at least equal to a sorting threshold; and
determining, by the server system, the set of soft pseudo-labeled data samples based, at least in part, on the set of pseudo-labeled data samples and the subset of unlabeled data samples, wherein each soft pseudo-labeled data sample is associated with each unlabeled data sample from the subset of unlabeled data samples.
6. The computer-implemented method as claimed in claim 1, wherein re-training the fraud detection model comprises:
generating, by the server system, a set of final pseudo-labeled data samples based, at least in part, on applying a sharpening function to the confidence score for each unlabeled data sample associated with each soft pseudo-labeled data sample from the set of soft pseudo-labeled data samples; and
fine-tuning the fraud detection model based, at least in part, on the set of labeled data samples and the set of final pseudo-labeled data samples, wherein the re-training fine-tunes the performance of the fraud detection model.
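The claims do not specify the sharpening function of claim 6. A common choice in semi-supervised learning is temperature sharpening, sketched here for the binary fraud/non-fraud case purely as a hypothetical, non-limiting example; treating `p` as the confidence-weighted soft pseudo-label is an assumption.

```python
import numpy as np

def sharpen(p, T=0.5):
    """Temperature sharpening for a binary soft label p in [0, 1].

    T < 1 pushes p toward 0 or 1, turning soft pseudo-labels into
    more decisive "final" pseudo-labels; T = 1 leaves p unchanged.
    Works elementwise on scalars or numpy arrays.
    """
    p = np.asarray(p, dtype=float)
    q = p ** (1.0 / T)
    return q / (q + (1.0 - p) ** (1.0 / T))
```

For example, with the default T = 0.5, a soft label of 0.7 is sharpened to 0.49 / 0.58 (about 0.84), while 0.5 stays at 0.5.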
7. The computer-implemented method as claimed in claim 1, wherein the fraud detection model is a Deep Convolutional Neural Network (DCNN) based classifier.
8. The computer-implemented method as claimed in claim 1, wherein the first autoencoder model and the second autoencoder model are autoencoder based machine learning models.
9. The computer-implemented method as claimed in claim 1, wherein the server system is a payment server associated with a payment network.
10. A server system, comprising:
a communication interface;
a memory comprising executable instructions; and
a processor communicably coupled to the communication interface and the memory, the processor configured to cause the server system to at least:
access a set of historical fraud data from a database associated with the server system, the set of historical fraud data comprising a set of labeled data samples and a set of unlabeled data samples;
train a fraud detection model based, at least in part, on the set of labeled data samples; and
perform a plurality of operations iteratively until performance of the fraud detection model reaches a predefined criterion, the plurality of operations comprising:
predicting, by the fraud detection model, a pseudo-label for each unlabeled data sample from the set of unlabeled data samples, the pseudo-label indicating whether a particular unlabeled data sample is one of a fraud sample and a non-fraud sample;
generating a set of pseudo-labeled data samples based, at least in part, on the set of unlabeled data samples and the pseudo-label predicted for each unlabeled data sample;
computing, by a first autoencoder model and a second autoencoder model associated with the server system, a confidence score for each unlabeled data sample;
generating, by the fraud detection model, a set of soft pseudo-labeled data samples based, at least in part, on the set of pseudo-labeled data samples and the confidence score for each unlabeled data sample; and
re-training the fraud detection model based, at least in part, on the set of labeled data samples and the set of soft pseudo-labeled data samples, wherein the re-training fine-tunes the performance of the fraud detection model.
11. The server system as claimed in claim 10, wherein the server system is further caused to at least:
determine, by the fraud detection model, a final class label for each unlabeled data sample, the final class label indicating whether the particular unlabeled data sample is one of the fraud sample and the non-fraud sample.
12. The server system as claimed in claim 10, wherein the set of labeled data samples comprises a subset of fraud labeled data samples and a subset of non-fraud labeled data samples.
13. The server system as claimed in claim 10, wherein to compute the confidence score, the server system is further caused to at least:
generate the first autoencoder model based, at least in part, on a subset of fraud labeled data samples;
compute, by the first autoencoder model, a first reconstruction loss for each unlabeled data sample based, at least in part, on the set of unlabeled data samples;
generate the second autoencoder model based, at least in part, on a subset of non-fraud labeled data samples;
compute, by the second autoencoder model, a second reconstruction loss for each unlabeled data sample based, at least in part, on the set of unlabeled data samples; and
determine the confidence score for each unlabeled data sample based, at least in part, on the first reconstruction loss corresponding to each unlabeled data sample and the second reconstruction loss corresponding to each unlabeled data sample.
14. The server system as claimed in claim 10, wherein to generate the set of soft pseudo-labeled data samples, the server system is further caused to at least:
extract a subset of unlabeled data samples from the set of unlabeled data samples based, at least in part, on the confidence score for each unlabeled data sample being at least equal to a sorting threshold; and
determine the set of soft pseudo-labeled data samples based, at least in part, on the set of pseudo-labeled data samples and the subset of unlabeled data samples, wherein each soft pseudo-labeled data sample is associated with each unlabeled data sample from the subset of unlabeled data samples.
15. The server system as claimed in claim 10, wherein to re-train the fraud detection model, the server system is further caused to at least:
generate a set of final pseudo-labeled data samples based, at least in part, on applying a sharpening function to the confidence score for each unlabeled data sample associated with each soft pseudo-labeled data sample from the set of soft pseudo-labeled data samples; and
fine-tune the fraud detection model based, at least in part, on the set of labeled data samples and the set of final pseudo-labeled data samples, wherein the re-training fine-tunes the performance of the fraud detection model.

Documents

Application Documents

# Name Date
1 202241063209-STATEMENT OF UNDERTAKING (FORM 3) [04-11-2022(online)].pdf 2022-11-04
2 202241063209-PROVISIONAL SPECIFICATION [04-11-2022(online)].pdf 2022-11-04
3 202241063209-POWER OF AUTHORITY [04-11-2022(online)].pdf 2022-11-04
4 202241063209-FORM 1 [04-11-2022(online)].pdf 2022-11-04
5 202241063209-DRAWINGS [04-11-2022(online)].pdf 2022-11-04
6 202241063209-DECLARATION OF INVENTORSHIP (FORM 5) [04-11-2022(online)].pdf 2022-11-04
7 202241063209-Correspondence_Form 26_14-11-2022.pdf 2022-11-14
8 202241063209-Proof of Right [22-12-2022(online)].pdf 2022-12-22
9 202241063209-DRAWING [03-11-2023(online)].pdf 2023-11-03
10 202241063209-CORRESPONDENCE-OTHERS [03-11-2023(online)].pdf 2023-11-03
11 202241063209-COMPLETE SPECIFICATION [03-11-2023(online)].pdf 2023-11-03