Abstract: Various embodiments provide methods and systems for training and optimizing AI/ML models. The method performed by the system includes accessing a historical transaction dataset and generating a common feature set and a unique feature set based on the historical transaction dataset. The method includes generating a primary machine learning model based on the common feature set and the unique feature set. The primary machine learning model includes an encoder layer and a prediction head layer. The method includes generating, via the primary machine learning model, a server gradient dataset based on the common feature set and the unique feature set. The method includes generating and transmitting a client gradient request message to a plurality of client servers and receiving a plurality of client gradient datasets from the plurality of client servers. The method includes generating an aggregate gradient dataset based on the server gradient dataset and the plurality of client gradient datasets and optimizing weights of the primary machine learning model based on the aggregate gradient dataset.
Description:
TECHNICAL FIELD
The present disclosure relates to artificial intelligence-based processing systems and, more particularly to, electronic methods and systems for training and optimizing artificial intelligence-based learning models for different entities such as a plurality of clients and a central server without sharing training data.
BACKGROUND
Various industries such as finance, e-commerce, social media, food delivery, and the like rely on Artificial Intelligence (AI) or Machine Learning (ML) models for performing a variety of applications or tasks such as fraud detection, risk management, data analytics, algorithmic marketing, customer support, algorithmic trading, and the like. For example, a financial enterprise such as a payment processor (such as Mastercard®) can predict an ongoing payment transaction as fraudulent using a fraud detection model. The fraud detection model may be trained using AI and ML techniques for predicting the likelihood of the payment transaction being fraudulent based on the behavior of the cardholder. The designing of such AI or ML models includes various stages such as data collection, feature generation (or data preparation), model training, model testing, and model optimization (or fine-tuning). As may be understood, the initial stages of data collection and feature generation are very crucial for the development of robust and accurate AI or ML models. Therefore, it is highly desirable to generate AI or ML models that take insights from data collected from a wide variety of sources to improve the overall accuracy of these models while making any desired predictions. For example, to develop a machine learning-based fraud detection model, historical transaction data associated with a plurality of cardholders from a wide variety of issuing banks or acquiring banks is desired. In other words, rather than relying on a single data source for generating insights or features, multiple data sources are preferred since the features thus generated would lead to the generation of a fraud detection model that is more accurate in its predictions.
However, in privacy-critical industries such as finance, e-commerce, healthcare, and the like, it becomes very difficult to gather data from different sources due to privacy concerns. For example, financial transaction data related to a plurality of cardholders cannot be shared between different banking entities. Further, there may exist data privacy laws and regulations that prohibit the sharing of user data between different regions. For example, data localization laws of certain countries prohibit the financial data of their citizens from being moved beyond their territorial borders. Such laws restrict a single entity, or different entities operating in different countries, from using the data of users residing in different countries for training AI or ML models. This also leads to poor model performance.
To address the above-mentioned technical problems, federated learning-based AI or ML models have been developed. In the federated learning process, a single AI or ML model, such as a deep learning model, is shared with a plurality of clients by a central server. Each of the plurality of clients downloads the shared model and trains it individually. Then, the plurality of clients shares updates with the central server. Thereafter, the central server can update or optimize the shared model based on the received updates to generate a collective model. This process can be repeated iteratively until the model performance saturates. In other words, model updates are shared rather than the actual data. In this way, sharing of data between the clients and the central server is avoided, thus addressing the privacy issue while complying with the regulations. There are two types of federated learning methodologies, i.e., horizontal federated learning and vertical federated learning. In horizontal federated learning, the shared model at each client's end uses different data samples to generate the same features, whereas, in vertical federated learning, the same data samples are used to generate different features for training the shared model.
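The round of update sharing described above can be sketched as follows. The simple parameter-averaging rule and the toy local update below are illustrative assumptions for exposition only, not the specific scheme of the present disclosure; the key property shown is that raw client data never leaves the clients.

```python
# Illustrative sketch of one federated learning round: clients train the
# shared model locally and return updated weights; the central server
# aggregates them (here, by simple averaging) into a collective model.

def local_update(weights, client_data, lr=0.1):
    """Hypothetical local training step: each client nudges the shared
    weights toward its own data without revealing that data."""
    return [w - lr * (w - x) for w, x in zip(weights, client_data)]

def federated_round(server_weights, all_client_data):
    """Aggregate the clients' locally updated weights by averaging."""
    client_models = [local_update(server_weights, d) for d in all_client_data]
    n = len(client_models)
    return [sum(ws) / n for ws in zip(*client_models)]

shared = [0.0, 0.0]
clients_data = [[1.0, 2.0], [3.0, 4.0]]  # stays on each client
shared = federated_round(shared, clients_data)
```

In practice, this round would be repeated until model performance saturates, as described above.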
In some scenarios, at first glance, data samples of different client servers within the plurality of client servers may seem to be mutually exclusive; however, in reality, such data samples may share relationships with each other. The existing federated learning methodologies are unable to take advantage of such relationships to improve their prediction accuracy. For example, in a payment network, a client such as a first issuing bank associated with a first set of cardholders will only have transaction data related to payment transactions performed by the first set of cardholders at a first set of merchants. Further, another client such as a second issuing bank associated with a second set of cardholders will only have transaction data related to payment transactions performed by the second set of cardholders at a second set of merchants. When a shared machine learning-based fraud detection model is trained based on the insights or features generated by the first issuing bank and the second issuing bank, the shared model will not be able to yield high accuracy while predicting the fraud rate of any cardholder from the first set of cardholders performing transactions at any merchant from the second set of merchants. In other words, there exists a need to yield higher accuracy while performing predictions by training the collective model in a scenario where each of the plurality of clients has unique client data for generating unique client features for training the shared models.
Further, it is understood that the unique data samples and the unique features thus generated by different clients may differ in dimensionality from each other. Therefore, it becomes challenging to deploy the same AI or ML model for performing the shared training process between the plurality of clients by the central server. This is known as a feature dimensionality problem. For example, if a first client has 50 features while a second client has 20 features, then the same shared model cannot be used to generate the updates unless the first client only uses 20 features to train its shared model. However, this would lead to lower model accuracy since the insights from the 30 additional features will not be incorporated into the collective model.
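One common way to reconcile differing feature counts, sketched below under assumed dimensions (50 and 20 features, an 8-dimensional embedding), is to give each client its own encoder that projects its features into an embedding space of a shared size. The linear encoder here is a hypothetical stand-in for illustration, not the disclosed architecture.

```python
# Illustrative sketch of the feature dimensionality problem: clients with
# different feature counts each use a client-specific encoder whose output
# has a common dimensionality, so a single shared downstream layer can
# consume either client's embedding.
import random

random.seed(0)

EMBED_DIM = 8  # assumed dimensionality shared by every encoder's output

def make_encoder(n_features, embed_dim=EMBED_DIM):
    """Build a client-specific linear encoder: n_features -> embed_dim."""
    weights = [[random.uniform(-1, 1) for _ in range(n_features)]
               for _ in range(embed_dim)]
    def encode(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return encode

encoder_a = make_encoder(50)  # first client: 50 features
encoder_b = make_encoder(20)  # second client: 20 features

emb_a = encoder_a([1.0] * 50)
emb_b = encoder_b([1.0] * 20)
# Both embeddings have the same length, so neither client must discard
# features to fit a shared model.
```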
Thus, there exists a need for a technical solution to address the above-mentioned technical problem of training shared machine learning models to yield higher prediction accuracy while addressing the feature dimensionality problem and data privacy concerns.
SUMMARY
Various embodiments of the present disclosure provide methods and systems for training and optimizing a plurality of AI and ML models for different entities such as a payment server, an issuer server, an acquirer server, and the like.
In an embodiment of the present disclosure, a computer-implemented method is disclosed. The method performed by a server system includes accessing a historical transaction dataset from a database associated with the server system. The method further includes generating a common feature set and a unique feature set based, at least in part, on the historical transaction dataset. Herein, the common feature set indicates features that are shared in common with a plurality of client servers and the unique feature set indicates features that are unique to the server system. The method further includes generating a primary machine learning model based, at least in part, on the common feature set and the unique feature set. The primary machine learning model includes at least an encoder layer and a prediction head layer. The method further includes generating, via the primary machine learning model, a server gradient dataset based, at least in part, on the common feature set and the unique feature set. The method further includes generating and transmitting a client gradient request message to the plurality of client servers. In response to the client gradient request message, the method further includes receiving a plurality of client gradient datasets from the plurality of client servers. Herein, the plurality of client gradient datasets is generated by a plurality of secondary machine learning models associated with the plurality of client servers. Further, each secondary machine learning model of the plurality of secondary machine learning models includes a client level prediction head layer identical to the prediction head layer of the primary machine learning model. Further, the method includes generating an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets. 
The method further includes optimizing weights of the primary machine learning model based, at least in part, on the aggregate gradient dataset.
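The server-side steps of the method above may be illustrated by the following simplified sketch. The `ServerSystem` class, the stand-in gradient computations, and the averaging rule are hypothetical placeholders rather than the disclosed implementation; the sketch shows only the ordering of the steps and that nothing but gradients crosses the server/client boundary.

```python
# Hedged sketch of one server-side training round: generate server
# gradients from the common and unique feature sets, request client
# gradients, aggregate, and optimize the primary model's weights.

class ServerSystem:
    def __init__(self, weights, client_servers):
        self.weights = list(weights)
        self.clients = client_servers  # callables standing in for client servers

    def server_gradients(self, common_features, unique_features):
        # Stand-in: gradients derived from the server's own feature sets.
        return [c + u for c, u in zip(common_features, unique_features)]

    def training_round(self, common_features, unique_features, lr=0.1):
        server_grads = self.server_gradients(common_features, unique_features)
        # Client gradient request: each client replies with gradients only,
        # never with its underlying transaction data.
        client_grads = [client() for client in self.clients]
        all_grads = [server_grads] + client_grads
        n = len(all_grads)
        aggregate = [sum(gs) / n for gs in zip(*all_grads)]
        # Optimize the primary model's weights with the aggregate gradients.
        self.weights = [w - lr * g for w, g in zip(self.weights, aggregate)]
        return aggregate

clients = [lambda: [1.0, 1.0], lambda: [3.0, 3.0]]
server = ServerSystem([0.5, 0.5], clients)
aggregate = server.training_round([1.0, 0.0], [1.0, 2.0])
```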
In another embodiment of the present disclosure, a system for training and optimizing a plurality of AI and ML models for different entities is disclosed. The system includes a memory and a processor. The memory stores instructions that, when executed by the processor, cause the system to access a historical transaction dataset from a database associated with the server system. The system is further caused to generate a common feature set and a unique feature set based, at least in part, on the historical transaction dataset. The common feature set indicates features that are shared in common with a plurality of client servers and the unique feature set indicates features that are unique to the server system. The system is further caused to generate a primary machine learning model based, at least in part, on the common feature set and the unique feature set. The primary machine learning model includes at least an encoder layer and a prediction head layer. The system is further caused to generate, via the primary machine learning model, a server gradient dataset based, at least in part, on the common feature set and the unique feature set. The system is further caused to generate and transmit a client gradient request message to the plurality of client servers. In response to the client gradient request message, the system is further caused to receive a plurality of client gradient datasets from the plurality of client servers. Herein, the plurality of client gradient datasets is generated by a plurality of secondary machine learning models associated with the plurality of client servers. Further, each secondary machine learning model of the plurality of secondary machine learning models includes a client level prediction head layer identical to the prediction head layer of the primary machine learning model. 
The system is further caused to generate an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets. The system is further caused to optimize weights of the primary machine learning model based, at least in part, on the aggregate gradient dataset.
In yet another embodiment of the present disclosure, a non-transitory computer-readable storage medium is disclosed. The non-transitory computer-readable storage medium includes computer-executable instructions that, when executed by at least one processor of a system, cause the system to perform a method. The method includes accessing a historical transaction dataset from a database associated with the server system. The method further includes generating a common feature set and a unique feature set based, at least in part, on the historical transaction dataset. The common feature set indicates features that are shared in common with a plurality of client servers and the unique feature set indicates features that are unique to the server system. The method further includes generating a primary machine learning model based, at least in part, on the common feature set and the unique feature set. The primary machine learning model includes at least an encoder layer and a prediction head layer. The method further includes generating, via the primary machine learning model, a server gradient dataset based, at least in part, on the common feature set and the unique feature set. The method further includes generating and transmitting a client gradient request message to the plurality of client servers. In response to the client gradient request message, the method further includes receiving a plurality of client gradient datasets from the plurality of client servers. Herein, the plurality of client gradient datasets is generated by a plurality of secondary machine learning models associated with the plurality of client servers. Further, each secondary machine learning model of the plurality of secondary machine learning models includes a client level prediction head layer identical to the prediction head layer of the primary machine learning model. 
Further, the method includes generating an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets. The method further includes optimizing weights of the primary machine learning model based, at least in part, on the aggregate gradient dataset.
In yet another embodiment of the present disclosure, a computer-implemented method is disclosed. The method performed by a client server includes receiving a model type notification message from a server system. The model type notification message includes at least information related to a prediction head layer of a primary machine learning model associated with the server system. The method further includes accessing a client transaction dataset from a client database associated with the client server. The method further includes generating a common client feature set and a unique client feature set based, at least in part, on the client transaction dataset. Herein, the common client feature set indicates features that are shared in common with the server system, and the unique client feature set indicates features that are unique to the client server. The method further includes generating a secondary machine learning model based, at least in part, on the model type notification message, the common client feature set, and the unique client feature set. The secondary machine learning model includes at least a client level encoder layer and a client prediction head layer. The client prediction head layer is identical to the prediction head layer of the primary machine learning model. Further, in response to receiving a client gradient request message from the server system, the method includes generating, via the secondary machine learning model, a client gradient dataset based, at least in part, on the common client feature set and the unique client feature set. Further, in response to receiving an aggregate gradient dataset from the server system, the method includes optimizing weights of the secondary machine learning model based, at least in part, on the aggregate gradient dataset.
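The role of the model type notification message may be illustrated as follows. The message fields (`head_in_dim`, `head_out_dim`, `head_activation`) are assumed for exposition and are not part of the disclosure; the point shown is that when both server and client build their prediction heads from the same specification, the two heads are identical by construction.

```python
# Illustrative sketch (assumed message format): the client constructs its
# client prediction head layer from the server's model type notification,
# guaranteeing structural identity with the primary model's head while
# the client's encoder layer remains entirely local.

def build_prediction_head(notification):
    """Derive a head-layer configuration from the notification message."""
    return {"in_dim": notification["head_in_dim"],
            "out_dim": notification["head_out_dim"],
            "activation": notification["head_activation"]}

server_notification = {"head_in_dim": 8, "head_out_dim": 1,
                       "head_activation": "sigmoid"}

primary_head = build_prediction_head(server_notification)  # at the server
client_head = build_prediction_head(server_notification)   # at the client
# Identical configuration on both ends, by construction.
```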
BRIEF DESCRIPTION OF THE FIGURES
For a more complete understanding of example embodiments of the present technology, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
FIG. 1 illustrates an exemplary representation of an environment related to at least some example embodiments of the present disclosure;
FIG. 2 illustrates a simplified block diagram of a server system, in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates a simplified block diagram of a client server, in accordance with an embodiment of the present disclosure;
FIG. 4 is a sequence flow diagram for training and optimizing a primary machine learning model of the server system and a plurality of secondary machine learning models, in accordance with an embodiment of the present disclosure;
FIG. 5 illustrates a graphical representation of a diagonal federated learning process for training and optimizing a primary machine learning model and a plurality of secondary machine learning models, in accordance with an embodiment of the present disclosure;
FIGS. 6A and 6B, collectively, depict a flow diagram depicting a method for training and optimizing the primary machine learning model, in accordance with an embodiment of the present disclosure; and
FIG. 7 is a flow diagram depicting a method for training and optimizing the secondary machine learning model, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure can be practiced without these specific details.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. The appearances of the phrase “in an embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
Moreover, although the following description contains many specifics for the purposes of illustration, anyone skilled in the art will appreciate that many variations and/or alterations to said details are within the scope of the present disclosure. Similarly, although many of the features of the present disclosure are described in terms of each other, or in conjunction with each other, one skilled in the art will appreciate that many of these features can be provided independently of other features. Accordingly, this description of the present disclosure is set forth without any loss of generality to, and without imposing limitations upon, the present disclosure.
The term “payment network”, used herein, refers to a network or collection of systems used for the transfer of funds through the use of cash substitutes. Payment networks may use a variety of different protocols and procedures to process the transfer of money for various types of transactions. Transactions that may be performed via a payment network may include product or service purchases, credit purchases, debit transactions, fund transfers, account withdrawals, etc. Payment networks may be configured to perform transactions via cash substitutes, which may include payment cards, letters of credit, checks, financial accounts, etc. Examples of networks or systems configured to perform as payment networks include those operated by entities such as Mastercard®.
OVERVIEW
In an embodiment, a server system that may be implemented in a payment server associated with a payment network is configured to access a historical transaction dataset from a database associated with the server system. In another embodiment, the server system is configured to generate a common feature set and a unique feature set based, at least in part, on the historical transaction dataset. In an example, the common feature set may indicate or include features that are shared in common with a plurality of client servers, and the unique feature set may indicate or include features that are unique to the server system. In one embodiment, the plurality of client servers includes at least one of one or more issuer servers and one or more acquirer servers.
In another embodiment, the server system is configured to generate a primary machine learning model based, at least in part, on the common feature set and the unique feature set. In an example, the primary machine learning model includes at least an encoder layer and a prediction head layer. In one embodiment, the primary machine learning model is a diagonal federated learning model.
In one embodiment, the server system is configured to determine a model type of the primary machine learning model to be generated based, at least in part, on an application of the primary machine learning model. Then, the server system is configured to determine information related to the prediction head layer of the primary machine learning model based, at least in part, on the model type of the primary machine learning model. It is understood that this information can be used to determine the configuration of the prediction head layer of the primary machine learning model. Further, the server system is configured to generate and transmit a model type notification message to the plurality of client servers. Herein, the model type notification message includes at least the information related to the prediction head layer of the primary machine learning model. In another embodiment, the server system is configured to generate, via the primary machine learning model, a server gradient dataset based, at least in part, on the common feature set and the unique feature set.
Further, the server system is configured to generate and transmit a client gradient request message to the plurality of client servers. In response to the client gradient request message, the server system is configured to receive a plurality of client gradient datasets from the plurality of client servers. Herein, the plurality of client gradient datasets is generated by a plurality of secondary machine learning models associated with the plurality of client servers. Further, each secondary machine learning model of the plurality of secondary machine learning models includes a client level prediction head layer identical to the prediction head layer of the primary machine learning model. In one embodiment, each of the plurality of secondary machine learning models is a diagonal federated learning model. In one implementation, each of the plurality of client servers generates a client gradient dataset via a secondary machine learning model based, at least in part, on a common client feature set and a unique client feature set. In this implementation, the common client feature set is identical to the common feature set, and the unique client feature set includes features unique to each of the plurality of client servers. In one embodiment, the unique client feature set of each of the plurality of client servers is generated based, at least in part, on client transaction-related data. Herein, the client transaction-related data is unique to each of the plurality of client servers.
In another embodiment, the server system is configured to generate an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets. In one implementation, generating the aggregate gradient dataset includes first determining a plurality of common nodes of the plurality of secondary machine learning models based, at least in part, on the common feature set; then extracting a plurality of common gradients from the plurality of client gradient datasets based, at least in part, on determining that the plurality of common gradients is generated by the plurality of common nodes; and finally aggregating the plurality of common gradients to generate the aggregate gradient dataset. In another embodiment, the server system is configured to optimize weights of the primary machine learning model based, at least in part, on the aggregate gradient dataset. Further, the server system is configured to back-propagate the aggregate gradient dataset to the plurality of client servers for optimizing weights of the plurality of secondary machine learning models.
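The three-step aggregation described above can be sketched as follows, under the assumption that gradients are keyed by node name and that every client reports a gradient for every common node; the feature names and gradient values are hypothetical.

```python
# Minimal sketch of aggregate gradient generation: (1) identify the
# nodes driven by the common feature set, (2) extract only the gradients
# generated by those common nodes from each client gradient dataset,
# and (3) aggregate (here, average) them per node.

def aggregate_common_gradients(common_feature_names, client_gradient_sets):
    # Step 1: the common nodes correspond to the common feature set.
    common_nodes = set(common_feature_names)
    # Step 2: extract only the gradients generated by the common nodes;
    # gradients of client-unique nodes are left out of the aggregate.
    extracted = [{node: grads[node] for node in common_nodes if node in grads}
                 for grads in client_gradient_sets]
    # Step 3: aggregate the common gradients per node (assumes every
    # client reported a gradient for every common node).
    return {node: sum(g[node] for g in extracted) / len(extracted)
            for node in common_nodes}

client_grads = [
    {"txn_amount": 0.2, "txn_hour": 0.4, "issuer_only_feat": 0.9},
    {"txn_amount": 0.6, "txn_hour": 0.0, "acquirer_only_feat": 0.5},
]
agg = aggregate_common_gradients(["txn_amount", "txn_hour"], client_grads)
```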
In an embodiment, a client server is configured to receive a model type notification message from a server system. Herein, the model type notification message includes at least information related to a prediction head layer of a primary machine learning model associated with the server system. In another embodiment, the client server is configured to access a client transaction dataset from a client database associated with the client server. In another embodiment, the client server is configured to generate a common client feature set and a unique client feature set based, at least in part, on the client transaction dataset. Herein, the common client feature set may indicate or include features that are shared in common with the server system, and the unique client feature set may indicate or include features that are unique to the client server. In another embodiment, the client server is configured to generate a secondary machine learning model based, at least in part, on the model type notification message, the common client feature set, and the unique client feature set. Herein, the secondary machine learning model includes at least a client level encoder layer and a client prediction head layer. It is noted that the client prediction head layer is identical to the prediction head layer of the primary machine learning model. As may be understood, the client prediction head layer and the prediction head layer can be generated as identical by relying on the information related to the prediction head layer included in the model type notification message. Thereafter, in response to receiving a client gradient request message from the server system, the client server is configured to generate, via the secondary machine learning model, a client gradient dataset based, at least in part, on the common client feature set and the unique client feature set. 
Furthermore, in response to receiving an aggregate gradient dataset from the server system, the client server is configured to optimize weights of the secondary machine learning model based, at least in part, on the aggregate gradient dataset.
Various embodiments of the present disclosure provide multiple advantages and technical effects while addressing technical problems such as how to train shared AI or ML models to yield higher prediction accuracy while addressing the feature dimensionality problem and the data privacy concerns described earlier.
To that end, the various embodiments of the present disclosure provide an approach for training and optimizing a plurality of AI and ML models for different entities such as a payment server, an issuer server, an acquirer server, and the like. For instance, the present disclosure describes an improved diagonal federated learning-based approach for training and optimizing the plurality of AI and ML models associated with a server system and a plurality of client servers. As may be understood, a central server such as the server system has a broader understanding of the transactions since the historical transaction dataset provides data corresponding to both cardholders and merchants. For example, if the plurality of client servers are issuer servers, then although they will have insights into cardholders through the cardholders' transaction data, they will lack insights into merchant-related data. Thus, by relying on the historical transaction dataset, a secondary machine learning model of each of the issuers can be trained to learn these merchant-related insights as well since it is optimized based on the aggregate gradient dataset. Additionally, each of the plurality of secondary machine learning models learns from the insights of the other secondary machine learning models. Further, the primary machine learning model also learns from the insights of the plurality of secondary machine learning models. Therefore, as a result, the overall accuracy of the primary machine learning model and the plurality of secondary machine learning models is improved significantly. It is pertinent to note that since only gradients are shared between the plurality of client servers and the server system, the privacy of data between these entities is maintained. Further, it is understood that the training and optimizing approach of the present disclosure is decentralized in nature, thereby further ensuring data privacy and security.
Furthermore, it is noted that the dimensionality of an input layer of the client level encoder layer corresponds to the input client level features, and the dimensionality of an output layer of the client level encoder layer is set to be equal to the dimensionality of an input layer of the client level prediction head layer. Since the client level prediction head layer of each of the plurality of secondary machine learning models is set to be identical to the prediction head layer of the primary machine learning model, the outputs of all the models involved in the collective training and optimizing process will be compatible with each other. This aspect of the present disclosure eliminates the inter-model compatibility problems arising due to the distinct dimensionalities of different models.
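The dimensionality constraint above may be expressed as a simple configuration rule; the dimensions below (50 and 20 client features, an 8-wide head input) are assumed purely for illustration.

```python
# Hedged sketch of the compatibility constraint: each client's encoder
# output dimensionality is pinned to the shared prediction head's input
# dimensionality, so every head, on every client and on the server,
# consumes identically shaped inputs.

HEAD_IN_DIM = 8  # assumed input width of the shared prediction head layer

def make_client_model(n_client_features):
    """Configure a secondary model: the encoder's input matches the
    client's own feature count, its output matches the head's input."""
    return {
        "encoder": {"in_dim": n_client_features, "out_dim": HEAD_IN_DIM},
        "head": {"in_dim": HEAD_IN_DIM, "out_dim": 1},
    }

model_a = make_client_model(50)  # client with 50 features
model_b = make_client_model(20)  # client with 20 features
# The heads are identical even though the encoders differ, which is
# what makes the models' outputs mutually compatible.
```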
Various embodiments of the present disclosure provide artificial intelligence-based methods, systems, electronic devices, and computer program products for training and optimizing artificial intelligence-based learning models associated with a plurality of entities such as a plurality of clients and a central server without sharing training data. In particular, the present disclosure describes a novel diagonal federated learning-based approach for training and optimizing a plurality of AI or ML models associated with the plurality of entities.
Various embodiments of the present disclosure are described hereinafter with reference to FIGS. 1 to 7.
FIG. 1 illustrates an exemplary representation of an environment 100 related to at least some example embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise depending on, for example, training and optimizing artificial intelligence-based learning models for a plurality of entities such as a plurality of client servers and a central server without sharing training data, etc.
The environment 100 generally includes a server system 102 (i.e., a central server), a plurality of client servers 104 including a client server 1 104(1), a client server 2 104(2), … a client server N 104(N), wherein N is a natural number, a database 116, a payment network 112 including a payment server 114, each coupled to, and in communication with (and/or with access to) a network 110. The network 110 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among the entities illustrated in FIG. 1, or any combination thereof.
Various entities in the environment 100 may connect to the network 110 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof.
In one implementation, the plurality of client servers 104 (see, the client server 1 104(1), the client server 2 104(2), … the client server N 104(N), wherein N is a natural number) may refer to servers associated with any client or entity that is involved in training and implementing the AI or ML models for performing a variety of applications or tasks. In a non-limiting example, the variety of applications or tasks may include tasks such as fraud detection, risk management, data analytics, algorithmic marketing, customer support, algorithmic trading, and the like. In one embodiment, the plurality of client servers 104 can be or may include at least one of one or more issuer servers associated with one or more issuing banks, one or more acquirer servers associated with one or more acquiring banks, and a combination thereof. The issuer server can be a financial institution that manages cardholder accounts (i.e., payment accounts) of multiple account holders. Transaction-related data of the cardholders associated with the issuer server are stored in a memory of the issuer server or on a cloud server associated with the issuer server. The terms “issuer”, “issuer bank”, “issuing bank” or “issuer server” will be used interchangeably herein.
The acquirer server can be associated with a financial institution (e.g., a bank) that processes financial transactions. This can be an institution that facilitates the processing of payment transactions for physical stores, ATM terminals, merchants, or an institution that owns platforms that make online purchases or purchases made via software applications possible (e.g., shopping cart platform providers and in-app payment processing providers). The terms “acquirer”, “acquirer bank”, “acquiring bank” or “acquirer server” will be used interchangeably herein. The term ‘merchant’ may refer to a seller, a retailer, a purchase location, an organization, or any other entity that is in the business of selling goods or providing services. In one example, the merchant may be a single business location or a chain of business locations of the same entity. In one example, the merchant may be associated with a merchant device which may be used to complete transactions. In some non-limiting examples, the merchant device may include a smartphone, a tablet computer, a handheld computer, a wearable device, a portable media player, a gaming device, a personal digital assistant (PDA), Point-Of-Sale (POS) devices, Point-Of-Purchase (POP) devices, Point-Of-Interaction (POI) devices, and the like.
In one implementation, the plurality of client servers 104 can be configured to train a plurality of AI or ML models for determining a fraudulent transaction likelihood. In such a scenario, a client server such as the client server 1 104(1) can use transaction-related data for generating transaction features. Conventionally, the AI or ML models are generated by the client servers such as the client server 1 104(1) based on these transaction features. As may be understood, each of the plurality of client servers 104 may generate different transaction features since these features are generated using different data samples. For example, the client server 1 104(1) may be an issuer that utilizes its transaction-related data collected from a first group of cardholders associated with it to generate transaction features such as bank account-related features, demographic-related features, active cardholder product-related features, transaction amount-related features, transaction date-related features, and the like. On the other hand, another client server such as the client server 2 104(2) may be another issuer that utilizes its transaction-related data collected from a second group of cardholders associated with it to generate transaction features such as cardholder credit-related features, cardholder behavior-related features, transaction amount-related features, transaction date-related features, and the like.
It is understood that since distinct transaction features, created using distinct transaction-related data from different client servers, are used for generating the AI or ML models by the plurality of client servers 104, each of these AI or ML models will yield different accuracies while determining the fraudulent transaction likelihood, i.e., the performance of these models will differ even if the end goal of the models remains the same. Therefore, there exists a need for an approach that leverages the distinct transaction-related data (or transaction features) of the plurality of client servers 104 for generating the AI or ML models with even higher performance.
One such approach is to collate the transaction-related data from each of the plurality of client servers 104 and generate transaction features from the collated dataset. Further, these transaction features can be used to generate the AI or ML models with higher performance. However, this approach is riddled with various issues. At first, collating data from different sources can lead to privacy issues as described earlier. Further, generating the transaction features using the collated dataset will require a tremendous amount of processing resources and time which will, in turn, incur heavy financial costs.
Another approach is to use federated learning for training the AI or ML models in a decentralized manner as described earlier. However, this approach is unable to leverage the relationships between the transaction-related data from each of the plurality of client servers 104. Further, unique transaction-related data samples and the unique features thus generated by different clients may differ in dimensionality from each other. Therefore, the existing federated learning methods are unable to leverage all the distinct transaction features from the plurality of client servers 104 for improving the performance of the AI or ML models thus generated. It is noted that although the various embodiments of the present disclosure are described hereinafter with reference to payment transactions in payment networks, the same should not be construed to be limiting. To that end, various embodiments may be implemented to train and optimize the AI or ML models for any suitable tasks.
To overcome the above-mentioned and other possible limitations, the present disclosure provides a central server, i.e., the server system 102. The server system 102 is configured to train and optimize the AI or ML models for a plurality of entities such as the plurality of client servers 104 and the server system 102 without sharing training data between the plurality of entities. In particular, the server system 102 is configured to implement an improved diagonal federated learning-based approach for training and optimizing a plurality of AI or ML models associated with the plurality of entities.
In an embodiment, the server system 102 further includes at least a primary machine learning model 106. In an example, the server system 102 is associated with a database 116. In one implementation, the database 116 provides the storage location for the primary machine learning model 106 of the server system 102. The database 116 may be incorporated in the server system 102, may be an individual entity connected to the server system 102, or may be stored in cloud storage. In an embodiment, the server system 102 can be a separate part of the environment 100 and may operate as a separate component from (but still in communication with, for example, via the network 110) the plurality of client servers 104. In addition, the server system 102 should be understood to be embodied in at least one computing device in communication with the network 110, which may be specifically configured, via executable instructions, to perform functions as described herein, and/or embodied in at least one non-transitory computer-readable media.
In an embodiment, the server system 102 is configured to perform one or more of the operations described herein. The server system 102 is configured to access a historical transaction dataset from a database such as the database 116 associated with the server system 102. It is understood that the historical transaction dataset corresponds to data samples associated with payment transactions performed by any of a plurality of cardholders and a plurality of merchants associated with the plurality of client servers 104 with each other. Then, the server system 102 is configured to generate a common feature set and a unique feature set based, at least in part, on the historical transaction dataset. The term ‘common feature set’ refers to features (or transaction features) that are shared in common with the plurality of client servers 104. For example, a common feature set includes features such as transaction amount, transaction date, and the like. The term ‘unique feature set’ refers to features (or transaction features) that are unique to the server system 102. For example, a unique feature set includes merchant-related features, fraud rate in the past 30/60/90 days, fraud rate of the issuer, fraud rate of the acquirer, fraud rate of the merchant, and the like. It is understood that since the historical transaction dataset relates to data samples of all the transactions performed within the payment network 112, the unique features thus generated will provide insights that will not be shared by the plurality of client servers 104 (this aspect will be described later in the present disclosure).
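By way of illustration, the split between the common feature set and the unique feature set can be sketched as follows. This is a minimal sketch assuming each historical transaction record is a flat dictionary; the field names and the helper `build_feature_sets` are hypothetical and not part of the disclosure.

```python
# Illustrative split of one transaction record into a common feature set
# (shared with all client servers) and a unique feature set (available
# only to the central server). All field names are hypothetical.

COMMON_FIELDS = ["transaction_amount", "transaction_date"]
UNIQUE_FIELDS = ["merchant_fraud_rate_30d", "issuer_fraud_rate", "acquirer_fraud_rate"]

def build_feature_sets(record: dict) -> tuple[dict, dict]:
    """Split a historical transaction record into common and unique features."""
    common = {k: record[k] for k in COMMON_FIELDS if k in record}
    unique = {k: record[k] for k in UNIQUE_FIELDS if k in record}
    return common, unique

record = {
    "transaction_amount": 42.50,
    "transaction_date": "2023-07-01",
    "merchant_fraud_rate_30d": 0.02,
    "issuer_fraud_rate": 0.01,
}
common, unique = build_feature_sets(record)
```

In practice, each feature set would be computed over many records (e.g., rolling fraud rates), but the partitioning logic follows the same pattern.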
The server system 102 is further configured to generate a primary machine learning model (such as the primary machine learning model 106) based, at least in part, on the common feature set and the unique feature set. In an embodiment, the primary machine learning model 106 can be any AI or ML-based learning model with at least an encoder layer and a prediction head layer. Then, the server system 102 is configured to generate, via the primary machine learning model 106, a server gradient dataset based, at least in part, on the common feature set and the unique feature set. The server gradient dataset includes at least the gradients generated by each layer of the primary machine learning model 106. Thereafter, the server system 102 is configured to generate and transmit a client gradient request message to the plurality of client servers 104. In response to the client gradient request message, a plurality of client gradient datasets is received from the plurality of client servers 104. Herein, the plurality of client servers 104 are configured to generate the client gradient datasets via a plurality of secondary machine learning models 108(1), 108(2), …, 108(N), wherein N is a natural number, associated with the plurality of client servers 104. It is noted that the plurality of secondary machine learning models 108(1), 108(2), …, 108(N) are referred to hereinafter as a plurality of secondary machine learning models 108 for the sake of brevity.
In an embodiment, each secondary machine learning model (such as the secondary machine learning model 108(1)) of the plurality of secondary machine learning models 108 includes at least a client level prediction head layer along with a client level encoding head. The client level prediction head layer is identical to the prediction head layer of the primary machine learning model 106. It is pertinent to note that the client level encoding head is configured such that a dimensionality of an input layer of the client level encoding head corresponds to the input client level features and a dimensionality of an output layer of the client level encoding head is equal to a dimensionality of an input layer of the client level prediction head layer.
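The dimensionality constraint described above can be sketched as follows, assuming simple linear layers and hypothetical dimensions; the helper names are illustrative only. The point is that each client's encoding head absorbs that client's own feature dimensionality, so every model presents the same shape to the shared prediction head.

```python
import numpy as np

# Sketch of the layer shapes described above (hypothetical dimensions).
# The client level encoding head maps a client's own feature count onto
# the input dimensionality of the shared prediction head, so all models
# remain gradient-compatible at the head.

HEAD_IN = 16   # input dimensionality of the shared prediction head layer
HEAD_OUT = 1   # e.g., a fraud likelihood score

def make_head(rng):
    # Prediction head identical across the primary and secondary models.
    return rng.standard_normal((HEAD_IN, HEAD_OUT))

def make_client_encoder(n_client_features, rng):
    # Output dimensionality equals the prediction head's input dimensionality.
    return rng.standard_normal((n_client_features, HEAD_IN))

rng = np.random.default_rng(0)
head = make_head(rng)
enc_a = make_client_encoder(12, rng)  # a client with 12 input features
enc_b = make_client_encoder(7, rng)   # a client with a different feature count

x_a = rng.standard_normal((1, 12))
x_b = rng.standard_normal((1, 7))
# Despite differing input dimensionalities, both outputs share one shape.
out_a = x_a @ enc_a @ head
out_b = x_b @ enc_b @ head
```

Because the head's input and output shapes are fixed, the head gradients produced by any client have identical shapes, which is what makes the aggregation described later possible.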
The server system 102 is further configured to generate an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets. Further, the server system 102 is configured to optimize weights of the primary machine learning model 106 based, at least in part, on the aggregate gradient dataset. Furthermore, the server system 102 is configured to back-propagate the aggregate gradient dataset to the plurality of client servers 104 for optimizing weights of the plurality of secondary machine learning models 108. In other words, upon receiving the aggregate gradient dataset, each of the plurality of client servers 104 is configured to update the weights of each of the plurality of secondary machine learning models 108 to optimize their corresponding models.
Therefore, the present disclosure provides an improved diagonal federated learning method that trains and optimizes the AI or ML models associated with the server system 102 and the plurality of client servers 104. As may be understood, the central server, i.e., the server system 102, has a broader understanding of the transactions since the historical transaction dataset provides data corresponding to both cardholders and merchants. For example, if the plurality of client servers 104 are issuer servers, they will have insights into cardholders through the transaction data of the cardholders; however, the issuers will lack insights into merchant-related data. Thus, by relying on the historical transaction dataset, a secondary machine learning model of each of the issuers can be trained to learn these merchant-related insights as well since it is optimized based on the aggregate gradient dataset. Additionally, each of the plurality of secondary machine learning models 108 learns from the insights of the other secondary machine learning models. Further, the primary machine learning model 106 also learns from the insights of the plurality of secondary machine learning models 108. Therefore, as a result, the overall accuracy of the primary machine learning model 106 and the plurality of secondary machine learning models 108 is improved significantly. The same has been explained with the help of an experiment later in the present disclosure.
It is understood that the training and optimizing approach of the present disclosure is decentralized in nature, thereby ensuring data privacy and security. Further, by generating features at the server system 102 and the plurality of client servers 104, the processing requirements can be distributed while reducing the time required for performing the desired computations. Further, by configuring the client level encoding head of each of the plurality of secondary machine learning models 108, the inter-model compatibility problems arising due to distinct dimensionalities of different models are also eliminated. In particular, the dimensionality of an input layer of the client level encoding head corresponds to the input client level features and the dimensionality of an output layer of the client level encoding head is equal to a dimensionality of an input layer of the client level prediction head layer. Further, since the client level prediction head layer of each of the plurality of secondary machine learning models 108 is configured to be identical to the prediction head layer of the primary machine learning model 106, the output of each of the models involved in the collective training and optimizing process will be compatible with the others.
In one embodiment, the payment network 112 may be used by the payment card issuing authorities as a payment interchange network. The payment network 112 may include a plurality of payment servers such as the payment server 114. Examples of payment interchange networks include, but are not limited to, a Mastercard® payment system interchange network. The Mastercard® payment system interchange network is a proprietary communications standard promulgated by Mastercard International Incorporated® for the exchange of financial transactions among a plurality of financial institutions that are members of Mastercard International Incorporated®. (Mastercard is a registered trademark of Mastercard International Incorporated located in Purchase, N.Y.). It is noted that the server system 102 can be implemented within the payment server 114 of the payment network 112.
The number and arrangement of systems, devices, and/or networks shown in FIG. 1 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device as shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 100 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 100.
Referring now to FIG. 2, a simplified block diagram of a server system 200 is illustrated, in accordance with an embodiment of the present disclosure. The server system 200 is similar to the server system 102 of FIG. 1. In some embodiments, the server system 200 may be embodied as a cloud-based and/or SaaS-based (software as a service) architecture or by the payment server 114 of FIG. 1. In one embodiment, the server system 200 is a part of the payment network 112. The server system 200 is configured to train an AI or ML model such as the primary machine learning model 106 of FIG. 1 and collectively optimize the plurality of secondary machine learning models 108 associated with the plurality of client servers 104 in the network 110.
In one embodiment, the server system 200 includes a computer system 202 and a database 204. The computer system 202 includes at least one processor 206 for executing instructions, a memory 208, a communication interface 210, and a user interface 216. The one or more components of the computer system 202 communicate with each other via a bus 212. The user interface 216 may enable any authorized entity (such as an administrator) to interact with the server system 200 in order to change or tune the operating parameters of the server system 200.
In some embodiments, the database 204 is integrated into the computer system 202. For example, the computer system 202 may include one or more hard disk drives as the database 204. A storage interface 214 is any component capable of providing the processor 206 with access to the database 204. The storage interface 214 may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a RAID controller, a SAN adapter, a network adapter, and/or any component providing the processor 206 with access to the database 204.
The processor 206 includes suitable logic, circuitry, and/or interfaces to execute computer-readable instructions for training and optimizing various AI or ML models in the network 110. Examples of the processor 206 include, but are not limited to, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphical processing unit (GPU) processor, a field-programmable gate array (FPGA), and the like. The memory 208 includes suitable logic, circuitry, and/or interfaces to store a set of computer-readable instructions for performing operations. Examples of the memory 208 include a random-access memory (RAM), a read-only memory (ROM), a removable storage drive, a hard disk drive (HDD), and the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the memory 208 in the server system 200, as described herein. In another embodiment, the memory 208 may be realized in the form of a database server or cloud storage working in conjunction with the server system 200, without departing from the scope of the present disclosure.
The processor 206 is operatively coupled to the communication interface 210 such that the processor 206 is capable of communicating with a remote device 218 such as the plurality of client servers 104, the primary machine learning model 106, the plurality of secondary machine learning models 108, the payment network 112, the payment server 114 or communicate with any entity connected to the network 110 (as shown in FIG. 1). It is noted that the server system 200 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the server system 200 may include fewer or more components than those depicted in FIG. 2.
In one embodiment, the processor 206 includes a data pre-processing engine 220, a model generation engine 222, and a model optimization engine 224. It should be noted that the components, described herein, can be configured in a variety of ways, including electronic circuitries, digital arithmetic, logic blocks, and memory systems in combination with software, firmware, and embedded technologies. In an embodiment, the database 204 includes a historical transaction dataset 226, and a primary machine learning model 228 (similar to the primary machine learning model 106 of FIG. 1).
The data pre-processing engine 220 includes suitable logic and/or interfaces for accessing the historical transaction dataset 226 from a database such as the database 204 associated with the server system 200. In an example, the historical transaction dataset 226 includes account-holder or cardholder-related data pertaining to a plurality of payment transactions performed by the account holder or cardholder with a plurality of different account holders or a plurality of merchants. It should be noted that for applications apart from the payment domain such as medical, data analytics, and so on, the historical transaction dataset 226 can be interchanged with different datasets relevant to the application.
In one non-limiting example, the historical transaction dataset 226 includes information related to payment transactions such as, but not limited to, a cardholder identifier (ID), a merchant ID, a User ID, a card-present indicator, a card-not-present indicator, a country code, a city code, a review date, a review length, a review relevance score, fraud scores associated with a user, fraud scores associated with a merchant, a transaction amount, a settlement amount, Primary Account Number (PAN) details, a security level indicator, a PayPass flag, a business type, an industry code, a Merchant Category Code (MCC), etc., among other relevant transaction-related information.
In another embodiment, the data pre-processing engine 220 is further configured to generate a common feature set and a unique feature set based, at least in part, on the historical transaction dataset 226. The term ‘common feature set’ refers to a set of features that are shared in common with the feature set of each of the plurality of client servers 104. In other words, the common feature set includes features that are used in the generation or training of all the machine learning models within the payment network 112, i.e., the primary machine learning model 228 and the plurality of secondary machine learning models 108. In particular, the common feature set is a set of features that are available to different parties (i.e., both the server system 200 and each of the plurality of client servers 104) when a transaction is performed by the cardholders. In various non-limiting examples, the common feature set may include features such as but not limited to the amount of a transaction, the date of the payment transaction, and the like.
The term ‘unique feature set’ refers to a set of features that are unique to the server system 200. In other words, the unique feature set includes features that are used only in the generation or training of the primary machine learning model 228. In particular, the unique feature set is a set of features that are generated based on a vast amount of data from transactions performed by different cardholders in the payment network 112. It is understood that a particular client server such as the client server 1 104(1), will only have data related to the transactions performed by the cardholders associated with it with a merchant if the client server 1 104(1) is an issuer. However, it will not have data related to transactions performed by cardholders associated with another client server such as the client server 2 104(2) for the same merchant. It should be noted that since the server system 200 acts as a payment processor, it will have the transaction data that encompasses a broader range of cardholders, therefore it can generate a unique set of features that none of the plurality of client servers 104 can generate. In various non-limiting examples, the unique feature set may include features such as, but not limited to, merchant-related features, fraud rate in the past 30/60/90 days, fraud rate of the issuer, fraud rate of the acquirer, fraud rate of the merchant, and the like. In another embodiment, the data pre-processing engine 220 is communicably coupled to the model generation engine 222.
The model generation engine 222 includes suitable logic and/or interfaces for generating the primary machine learning model 228 based, at least in part, on the common feature set and the unique feature set. More specifically, at first, the model generation engine 222 determines a model type based, at least in part, on an application for which the primary machine learning model 228 is to be generated. For example, a classification model can be generated if the application of the model is to classify ongoing transactions as fraudulent or non-fraudulent. In other words, a model type is dependent on a specific task or application for which the model is generated. Once the model type is determined, the model generation engine 222 is configured to determine information related to the prediction head layer of the primary machine learning model 228 based, at least in part, on the model type of the primary machine learning model 228. In various non-limiting examples, the information related to the prediction head layer of the primary machine learning model 228 includes information such as, but not limited to, the number of layers, the number of neurons, and the like. Further, the model generation engine 222 is configured to generate the primary machine learning model 228. The primary machine learning model 228 includes at least an encoder layer and a prediction head layer. Herein, the prediction head layer of the primary machine learning model 228 is generated based on the information related to the prediction head layer determined based on the determined model type.
In one implementation, the model generation engine 222 is further configured to generate and transmit a model type notification message to the plurality of client servers 104. The model type notification message includes information related to the prediction head layer of the primary machine learning model 228. This notification message enables each of the plurality of client servers 104 to generate and configure their respective secondary machine learning models to have a client level prediction head layer identical to the prediction head layer of the primary machine learning model 228.
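A possible wire format for such a model type notification message is sketched below. The JSON field names and the `build_model_type_notification` helper are assumptions for illustration only; the disclosure does not prescribe a serialization format.

```python
import json

# Hypothetical wire format for the model type notification message: the
# central server shares only the prediction head's configuration (model
# type, neurons per head layer), never any training data.

def build_model_type_notification(model_type: str, head_layers: list[int]) -> str:
    """Serialize prediction-head metadata for the client servers."""
    return json.dumps({
        "message": "MODEL_TYPE_NOTIFICATION",
        "model_type": model_type,               # e.g., "classification"
        "prediction_head_layers": head_layers,  # neurons per head layer
    })

# A client server parses the message to configure its own client level
# prediction head layer identically.
msg = json.loads(build_model_type_notification("classification", [16, 8, 1]))
```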
Further, the model generation engine 222 is configured to generate, via the primary machine learning model 228, a server gradient dataset based, at least in part, on the common feature set and the unique feature set. In an example, the server gradient dataset includes a plurality of gradients corresponding to each layer of the primary machine learning model 228. This has been described further in reference to FIG. 5 later in the present disclosure. In another embodiment, the model generation engine 222 is communicably coupled to the model optimization engine 224.
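Generating a server gradient dataset can be sketched as one forward/backward pass through a toy two-layer linear model (an encoder followed by a prediction head) under a squared-error loss, collecting one gradient per layer. The dimensions, data, and layer names below are illustrative, not the disclosed architecture.

```python
import numpy as np

# Sketch of a server gradient dataset: the gradient of each layer of a
# two-layer linear model for the loss 0.5 * (pred - y)**2.

def server_gradients(x, y, enc, head):
    """Return {layer_name: gradient} for a two-layer linear model."""
    h = x @ enc            # encoder layer output
    pred = h @ head        # prediction head output
    err = pred - y         # d(loss)/d(pred)
    return {
        "prediction_head": h.T @ err,           # d(loss)/d(head)
        "encoder": x.T @ (err @ head.T),        # d(loss)/d(enc)
    }

x = np.array([[1.0, 2.0]])       # one sample, two input features
y = np.array([[1.0]])            # label (e.g., fraudulent = 1)
enc = np.ones((2, 3))            # encoder weights
head = np.ones((3, 1))           # prediction head weights
grads = server_gradients(x, y, enc, head)
```

Each entry of `grads` has the same shape as its layer's weights, which is the property the aggregation step relies on.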
The model optimization engine 224 includes suitable logic and/or interfaces for generating and transmitting a client gradient request message to the plurality of client servers 104. The client gradient request message is a message requesting the respective client gradient datasets from the plurality of client servers 104. The client gradient dataset of any client server is a dataset that includes a plurality of gradients corresponding to each layer of its respective secondary machine learning model. For example, a client gradient dataset of the client server 1 104(1) will include a plurality of gradients corresponding to each layer of the secondary machine learning model 108(1).
Upon receiving the plurality of client gradient datasets from the plurality of client servers 104, the model optimization engine 224 is configured to generate an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets. It is noted that since the prediction head layer of each of the plurality of secondary machine learning models 108 is identical to the prediction head layer of the primary machine learning model 228, each of the gradients within the plurality of client gradient datasets will be compatible with the gradients within the server gradient dataset, and therefore these gradients can be aggregated together. As may be understood, these identical prediction head layers between different machine learning models enable solving the dimensionality problem associated with federated learning where a different number of features are used to generate gradients. This is because identical prediction head layers will generate compatible gradient datasets regardless of the different or distinct dimensionalities.
More specifically, aggregating the server gradient dataset and the plurality of client gradient datasets further includes, at first, determining a plurality of common nodes of the plurality of secondary machine learning models 108 based, at least in part, on the common feature set. Then, the model optimization engine 224 is configured to extract a plurality of common gradients from the plurality of client gradient datasets based, at least in part, on determining that the plurality of common gradients is generated by the plurality of common nodes. Thereafter, the model optimization engine 224 is configured to aggregate the plurality of common gradients to generate the aggregate gradient dataset. As may be understood, the common feature set enables the model optimization engine 224 to determine the common nodes which further enables the aggregation of the gradients. To that end, the utilization of the common feature set of the server system 200 is vital for efficiently aggregating the gradients.
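The aggregation step above can be sketched as follows, assuming the common gradients have already been extracted and share the prediction head's shape. Element-wise averaging is one illustrative aggregation choice; the disclosure does not limit aggregation to a particular function.

```python
import numpy as np

# Sketch of aggregating common gradients: the server gradient and the
# shape-compatible client gradients are averaged element-wise. Shapes
# and values are hypothetical.

def aggregate_gradients(server_grad, client_grads):
    """Average the server gradient with shape-compatible client gradients."""
    compatible = [g for g in client_grads if g.shape == server_grad.shape]
    stacked = np.stack([server_grad] + compatible)
    return stacked.mean(axis=0)

server_grad = np.ones((16, 1))
client_grads = [np.full((16, 1), 3.0), np.full((16, 1), 5.0)]
agg = aggregate_gradients(server_grad, client_grads)
```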
Further, the model optimization engine 224 is configured to optimize weights of each layer of the primary machine learning model 228 based, at least in part, on the aggregate gradient dataset. It is understood that since the weights of the primary machine learning model 228 are optimized using the aggregate gradient dataset, the optimized primary machine learning model will be able to make predictions based on the insights gained from the data of both the server system 200 and the plurality of client servers 104. This optimization process can be further iterated until the performance of the primary machine learning model 228 saturates. As may be understood, since only gradients are shared between the plurality of client servers 104 and the server system 200, the privacy of data between them is maintained.
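The weight-optimization step can be sketched as a plain gradient-descent update of a layer's weights using the aggregate gradient; the learning rate is an illustrative choice, not one specified by the disclosure.

```python
import numpy as np

# Sketch of one optimization step: move a layer's weights against the
# aggregate gradient. Shapes and the learning rate are hypothetical.

def apply_update(weights, aggregate_grad, lr=0.1):
    """One gradient-descent step on a layer's weights."""
    return weights - lr * aggregate_grad

w = np.full((16, 1), 1.0)     # current prediction head weights
agg = np.full((16, 1), 2.0)   # aggregate gradient for that layer
w_new = apply_update(w, agg)
```

Iterating this step until performance saturates corresponds to the repeated optimization rounds described above.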
Furthermore, the model optimization engine 224 is configured to back-propagate the aggregate gradient dataset to the plurality of client servers 104. It is understood that upon receiving the aggregate gradient dataset, each of the plurality of client servers 104 can individually optimize their respective secondary machine learning models. In other words, back-propagating the aggregate gradient dataset enables the plurality of client servers 104 to optimize the weights of the plurality of secondary machine learning models 108. As explained earlier, since only gradients are shared between the plurality of client servers 104 and the server system 200, the privacy of data between them is maintained.
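The weight optimization performed with the back-propagated aggregate gradient dataset may be sketched as a single gradient-descent step. The weight names, gradient values, and learning rate below are hypothetical and serve only to illustrate the update rule:

```python
# Illustrative sketch: a party (server or client) applies the aggregate
# gradient dataset to its weights with one gradient-descent step.
# Learning rate and all values are invented for illustration.

def apply_update(weights, gradients, lr=0.1):
    """One gradient-descent step: w <- w - lr * g for each named weight."""
    return {name: w - lr * gradients[name] for name, w in weights.items()}

head_weights = {"w0": 1.0, "b0": 0.5}
aggregate_grads = {"w0": 0.4, "b0": -0.2}
new_head = apply_update(head_weights, aggregate_grads)
```

Iterating this step until the model's performance saturates corresponds to the optimization loop described above.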
Referring now to FIG. 3, a simplified block diagram of a client server 300 is illustrated, in accordance with an embodiment of the present disclosure. The client server 300 is similar to any one of the plurality of client servers 104 of FIG. 1. In some embodiments, the client server 300 may be embodied as a cloud-based and/or SaaS-based (software as a service) architecture or by any server associated with a client entity such as an issuing bank, an acquiring bank, a medical institute, and the like. In one embodiment, the client server 300 is a part of the payment network 112. The client server 300 is configured to train an AI or ML model such as the secondary machine learning model 108(1) of FIG. 1 and collectively optimize the primary machine learning model 228 associated with the server system 200 and the rest of the plurality of secondary machine learning models 108 associated with other client servers within the plurality of client servers 104 in the network 110. It is noted that the following description of FIG. 3 describes the operation of the embodiments of the present disclosure at the client’s end and the role of each of the plurality of client servers 104 in collectively training and optimizing the various AI or ML models in the payment network 112. As described earlier, although the various embodiments of the present disclosure have been described with reference to training AI or ML models within a payment ecosystem, it should be noted that the various embodiments described herein can be applied to different industries as well. Similarly, the client-end implementation can be applied to different industries as well, without departing from the scope of the present disclosure.
In one embodiment, the client server 300 includes a computer system 302 and a client database 304. The computer system 302 includes at least one processor 306 for executing instructions, a memory 308, a communication interface 310, and a user interface 316. The one or more components of the computer system 302 communicate with each other via a bus 312. The user interface 316 may enable any authorized entity (such as an administrator) to interact with the client server 300 in order to change or tune the operating parameters of the client server 300.
In some embodiments, the client database 304 is integrated into the computer system 302. For example, the computer system 302 may include one or more hard disk drives as the client database 304. A storage interface 314 is any component capable of providing the processor 306 with access to the client database 304. The storage interface 314 may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a RAID controller, a SAN adapter, a network adapter, and/or any component providing the processor 306 with access to the client database 304.
The processor 306 includes suitable logic, circuitry, and/or interfaces to execute computer-readable instructions for training and optimizing various AI or ML models in the network 110. Examples of the processor 306 include, but are not limited to, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphical processing unit (GPU) processor, a field-programmable gate array (FPGA), and the like. The memory 308 includes suitable logic, circuitry, and/or interfaces to store a set of computer-readable instructions for performing operations. Examples of the memory 308 include a random-access memory (RAM), a read-only memory (ROM), a removable storage drive, a hard disk drive (HDD), and the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the memory 308 in the client server 300, as described herein. In another embodiment, the memory 308 may be realized in the form of a database server or cloud storage working in conjunction with the client server 300, without departing from the scope of the present disclosure.
The processor 306 is operatively coupled to the communication interface 310 such that the processor 306 is capable of communicating with a remote device 318 such as other client servers of the plurality of client servers 104, the primary machine learning model 228, the plurality of secondary machine learning models 108 associated with the other client servers of the plurality of client servers 104, the payment network 112, the payment server 114 or communicate with any entity connected to the network 110 (as shown in FIG. 1). It is noted that the client server 300 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the client server 300 may include fewer or more components than those depicted in FIG. 3.
In one embodiment, the processor 306 includes a client data pre-processing engine 320, a model configuration engine 322, and a client model optimization engine 324. It should be noted that the components, described herein, can be configured in a variety of ways, including electronic circuitries, digital arithmetic and logic blocks, and memory systems in combination with software, firmware, and embedded technologies. In an embodiment, the client database 304 includes a client transaction dataset 326, and a secondary machine learning model 328 (similar to any of the plurality of secondary machine learning models 108 such as the secondary machine learning model 108(1) of FIG. 1). It is noted that the client transaction dataset will be unique to each of the plurality of client servers 104.
The client data pre-processing engine 320 includes suitable logic and/or interfaces for accessing the client transaction dataset 326 from a database (such as the client database 304) associated with the client server 300. In an example, the client transaction dataset 326 includes account holder or cardholder-related data pertaining to a plurality of payment transactions performed by the account holders or cardholders associated with the client server 300 with a plurality of different account holders or a plurality of merchants. It should be noted that for applications apart from the payment domain such as medical, data analytics, and so on, the client transaction dataset 326 can be interchanged with different datasets relevant to the application.
In one non-limiting example, the client transaction dataset 326 includes information related to payment transactions such as, but not limited to, cardholder identifier (ID), merchant ID, user ID, cardholder profile, account holder profile, cardholder/account holder demographics, active cardholder product-related information, card present indicator, card-not-present indicator, country code, city code, review date, review length, review relevance score, fraud scores associated with the cardholder, transaction amount, settlement amount, Primary Account Number (PAN), security level indicator, PayPass flag, business type, industry code, Merchant Category Code (MCC), etc., among other relevant transaction-related information for cardholders/account holders associated with the client server 300. It is noted that the client transaction dataset 326 is distinct from the historical transaction dataset 226 since the client transaction dataset 326 will include transaction data only for those cardholders that have an account with the client server 300. In other words, the client transaction dataset 326 includes client-specific transaction-related data.
In another embodiment, the client data pre-processing engine 320 is further configured to generate a common client feature set and a unique client feature set based, at least in part, on the client transaction dataset 326.
The term ‘common client feature set’ refers to a set of features that are shared in common with the feature sets of the plurality of client servers 104. In other words, the common client feature set includes features that are used in the generation or training of all the machine learning models within the payment network 112, i.e., the secondary machine learning model 328, the primary machine learning model 228, and other secondary machine learning models of the plurality of secondary machine learning models 108. In particular, the common client feature set is a set of features that are available to the different parties (i.e., the client server 300, the server system 200, and other client servers of the plurality of client servers 104) when a transaction is performed by the cardholders associated with the client server 300. In various non-limiting examples, the common client feature set may include features such as, but not limited to, the amount of a transaction, the date of the payment transaction, and the like. It is pertinent to note that each client server (such as the client server 300) of the plurality of client servers 104 will have cardholders that are mutually exclusive to each other. It should also be noted that the common client feature set is identical to the common feature set.
The term ‘unique client feature set’ refers to a set of features that are unique to the client server 300. In other words, the unique client feature set includes features that are used only in the generation or training of the secondary machine learning model 328. In particular, the unique client feature set is a set of features that are generated based on the data from transactions performed by different cardholders associated with the client server 300. For example, a particular client server such as the client server 1 104(1), if it is an issuer, will only have data related to the transactions performed with merchants by the cardholders associated with it. It should be noted that the client server 300 can generate a targeted unique client feature set based on the client transaction dataset 326 and the profiles of different cardholders/account holders.
In various non-limiting examples, the unique client feature set may include features such as, but not limited to, cardholder demographic-related features, features related to financial products used by the cardholder, and the like. It is understood that even if different client servers generate cardholder demographic-related features, these features will be mutually distinct since the cardholder data used to generate these features will be different. In another embodiment, the client data pre-processing engine 320 is communicably coupled to the model configuration engine 322.
The model configuration engine 322 includes suitable logic and/or interfaces for generating the secondary machine learning model 328 based, at least in part, on the model type notification message, the common client feature set, and the unique client feature set. More specifically, at first, the model configuration engine 322 receives the model type notification message including at least the information related to the prediction head layer from the server system 200. Upon determining the model type, the model configuration engine 322 is configured to generate the secondary machine learning model 328. The secondary machine learning model 328 includes at least a client level encoder layer and a client level prediction head layer. It is noted that the client level prediction head layer is identical to the prediction head layer of the primary machine learning model 228. Further, it is noted that the client level encoder layer is configured based on the combined number of features in the common client feature set and the unique client feature set.
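The model configuration described above may be sketched, for illustration only, as follows. The configuration fields (`input_dim`, `output_dim`) and all sizes are assumptions standing in for the information carried by the model type notification message:

```python
import random

# Illustrative sketch of the model shape described above: the client level
# encoder takes the client's own feature count (common + unique) as input and
# projects to the shared dimension expected by the identical prediction head.
# The head configuration fields are hypothetical names for the information in
# the model type notification message.

def build_secondary_model(n_common, n_unique, head_config):
    d_in = n_common + n_unique      # encoder input width is client-specific
    d_c = head_config["input_dim"]  # shared dimension expected by the head
    encoder = [[random.random() for _ in range(d_c)] for _ in range(d_in)]
    head = [[random.random() for _ in range(head_config["output_dim"])]
            for _ in range(d_c)]
    return {"encoder": encoder, "head": head}

# The head configuration would arrive from the server system in the
# model type notification message.
cfg = {"input_dim": 8, "output_dim": 1}
model = build_secondary_model(n_common=50, n_unique=50, head_config=cfg)
```

Because every party builds its head from the same configuration, the head shapes match across all models even though encoder input widths differ.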
Further, in response to receiving the client gradient request message from the server system 200, the model configuration engine 322 is configured to generate, via the secondary machine learning model 328, a client gradient dataset based, at least in part, on the common client feature set and the unique client feature set. In an example, the client gradient dataset includes a plurality of gradients corresponding to each layer of the secondary machine learning model 328. This is similar to the generation of the server gradient dataset explained with reference to FIG. 5 later in the present disclosure, therefore the same is not repeated for the sake of brevity. In another embodiment, the model configuration engine 322 is communicably coupled to the client model optimization engine 324.
The client model optimization engine 324 includes suitable logic and/or interfaces for receiving the back-propagated aggregate gradient dataset from the server system 200. Upon receiving the aggregate gradient dataset, the client model optimization engine 324 is configured to optimize the weights of the secondary machine learning model 328 based, at least in part, on the aggregate gradient dataset. This process is similar to the optimization process already explained with reference to FIG. 2, therefore the explanation is not repeated for the sake of brevity. As explained earlier, since only gradients are shared between the client server 300, the server system 200, and the other client servers of the plurality of client servers 104, the privacy of data between them is maintained.
FIG. 4 is a sequence flow diagram 400 for training and optimizing a primary machine learning model (such as the primary machine learning model 228) of the server system 200 and a plurality of secondary machine learning models (such as the plurality of secondary machine learning models 108), in accordance with an embodiment of the present disclosure. The sequence of operations of the sequence flow diagram 400 may not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped and performed in the form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner. It is to be noted that to explain the sequence flow diagram 400, references may be made to elements described in FIGS. 1-3. It is understood that the various steps of the sequence flow have already been explained with reference to FIGS. 2 and 3, therefore an explanation for the same is not repeated herein for the sake of brevity. It is noted that the server system 406 of FIG. 4 is identical to the server system 200 of FIG. 2. The sequence flow begins at step 408.
At 408, the server system 406 determines a model type for the plurality of client servers 104. Herein, an example of two client servers, i.e., the client server 1 402 and the client server 2 404, has been taken for representing the plurality of client servers 104 for the sake of simplicity. It is noted that before the model training and optimizing process initiates, various operating parameters between the plurality of client servers 104 and the server system 406 have to be aligned to ensure proper training and optimization of models. For instance, before the training process begins, operating parameters such as the number of the plurality of client servers 104, the model type (such as a regression type model, a classification type model, and the like), and information related to the prediction head layer are determined by the server system 406 and agreed upon by the plurality of client servers 104 participating in the training process. In other words, various information is shared and confirmed between the plurality of client servers 104 and the server system 406.
At 410, the server system 406 determines information related to the prediction head layer of the primary machine learning model 228 based, at least in part, on the model type determined in the previous step (i.e., step 408). In various non-limiting examples, the information related to the prediction head layer of the primary machine learning model 228 includes information such as, but not limited to, the number of layers, the number of neurons, and the like. It is noted that the prediction head layer is present in each model involved in the training and optimizing process. In other words, the prediction head layer is identical for all the machine learning models involved in the training and optimization process.
At 412(1) and 412(2), the server system 406 generates and transmits a model type notification message to the plurality of client servers 104, i.e., the client server 1 402 and the client server 2 404. As described earlier, the model type notification message includes at least information related to the prediction head layer of the primary machine learning model 228. In some scenarios, the information related to the prediction head layer of the primary machine learning model 228 may also indicate the model type of the model to be trained or generated, for example, a classification type model, etc.
At 414, the server system 406 generates the primary machine learning model 228. In particular, the server system 406 generates the primary machine learning model 228 based, at least in part, on the common feature set and the unique feature set generated based on the data from the historical transaction dataset 226.
At 416(1) and 416(2), the client server 1 402 and the client server 2 404 generate and configure their secondary machine learning model, respectively. In particular, the client servers generate the secondary machine learning model based, at least in part, on the model type notification message, the common client feature set, and the unique client feature set generated based on the data from the client transaction dataset 326. It is noted that the client level prediction head layer of both the secondary machine learning models is identical to the prediction head layer of the primary machine learning model 228.
At 418, the server system 406 generates a server gradient dataset via the primary machine learning model 228 based, at least in part, on the common feature set and the unique feature set.
At 420(1) and 420(2), the server system 406 transmits client gradient request messages to the client server 1 402 and the client server 2 404, respectively.
At 422(1) and 422(2), the client server 1 402 and the client server 2 404, respectively generate a client gradient dataset.
At 424(1) and 424(2), the client server 1 402 and client server 2 404 transmit the client gradient dataset to the server system 406, respectively.
At 426, the server system 406 generates an aggregate gradient dataset based, at least in part, on aggregating the server gradient dataset and the plurality of client gradient datasets.
At 428, the server system 406 optimizes the primary machine learning model 228 by optimizing the weights of the different layers of the primary machine learning model 228 to generate an optimized primary machine learning model 228.
At 430(1) and 430(2), the server system 406 back-propagates the aggregate gradient dataset to the client server 1 402 and the client server 2 404, respectively.
At 432(1) and 432(2), the client server 1 402 and the client server 2 404 optimize their respective secondary machine learning models by optimizing the weights of the different layers of their respective secondary machine learning models to generate optimized secondary machine learning models. It is noted that steps 418-432(2) can be iterated till the performance of the primary machine learning model 228 and the plurality of secondary machine learning models 108 saturates to obtain the final primary machine learning model and the plurality of final secondary machine learning models.
FIG. 5 illustrates a graphical representation 500 of a diagonal federated learning process for training and optimizing the primary machine learning model 502 and the plurality of secondary machine learning models, in accordance with an embodiment of the present disclosure.
As depicted, the server system 200 is associated with the primary machine learning model 502. It is noted that the primary machine learning model 502 is identical to the primary machine learning model 228 of FIG. 2. Further, the server system 200 is communicably coupled to the secondary machine learning models associated with the client server 1 104(1) and the client server 2 104(2) (not depicted for brevity). The secondary machine learning models are represented by a client 1 model 506 and a client 2 model 508. It is noted that any of the client 1 model 506 and the client 2 model 508 are similar to the secondary machine learning model 328 of FIG. 3.
Further, each machine learning model is generated based on their respective datasets. In particular, the primary machine learning model 502 is trained and tested based on common and unique feature sets generated using the historical transaction dataset 504. It is noted that the historical transaction dataset 504 is identical to the historical transaction dataset 226 of FIG. 2. Similarly, the client 1 model 506 is associated with a client 1 transaction dataset 510 and the client 2 model 508 is associated with a client 2 transaction dataset 512.
As described earlier with reference to FIG. 2, the server system 200 is configured to generate a server gradient dataset via the primary machine learning model 502 based, at least in part, on the common feature set.
More specifically, assuming the server system 200 (CS) is associated with the historical transaction dataset 504, DS = (IS, XS, YS) where IS represents identifier (ID) space for the CS; XS represents the feature space for the CS; and YS represents the label space for the CS. Herein, it is noted that the ID space for a dataset refers to the range of unique identifiers that can be assigned to a plurality of items or records within that dataset. For instance, in the current scenario, ID for a record/row/item in a data will be an identifier of a transaction. In an example, a sequence number and a transaction date of a transaction may be combined to generate a unique identifier for the transaction. Therefore, the ID space for a dataset is the set of all ID values in a dataset.
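The identifier scheme mentioned above (combining a sequence number and a transaction date into one unique transaction ID) may be sketched as follows. The exact format string is an assumption for illustration, not the disclosed scheme:

```python
# Minimal sketch of the ID construction described above: a sequence number
# and a transaction date combined into one unique transaction identifier.
# The "date-sequence" format is a hypothetical choice.

def make_transaction_id(sequence_number, transaction_date):
    return f"{transaction_date}-{sequence_number:08d}"

tid = make_transaction_id(42, "20240131")

# The ID space of a dataset is then the set of all such identifiers.
id_space = {make_transaction_id(i, "20240131") for i in range(3)}
```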
Further, assuming the plurality of client servers participating in training and optimizing the plurality of the secondary machine learning models (represented by two models, i.e., the client 1 model 506 and the client 2 model 508 for the sake of simple illustration) can be represented as a set of C = C1, C2, C3, …, CK. Here, K is the number of the plurality of client servers 104. It is noted that the plurality of participating client servers is associated with a plurality of distinct client transaction datasets (see, the client 1 transaction dataset 510 and the client 2 transaction dataset 512), DK = (IK, XK, YK) where IK represents ID space for the Kth client server; XK represents the feature space for the Kth client server; and YK represents the label space for the Kth client server.
It is noted that herein IS = I1 ∪ I2 ∪ I3 … ∪ IK and Ii ∩ Ij = ∅ for i ≠ j, i.e., the ID spaces are mutually exclusive. In other words, the learning approach of the present disclosure is valid for all scenarios where the intersection of the ID spaces of different client servers from the plurality of client servers 104 is a null set. It is understood that there will exist no overlap between the ID spaces or transaction-related data of different client servers. For instance, in an example, when different issuers are participating in the collective model training and optimizing process, the dataset of the first issuer will include all the transactions that are done by cardholders associated with the first issuer. Similarly, the dataset of a second issuer will include all the transactions that are done by cardholders associated with the second issuer. As may be understood, these two datasets will be mutually exclusive since there will exist no common transaction between them (as the transactions are performed by different cardholders). In other words, the client transaction dataset is unique to each of the plurality of client servers 104. However, since both the first issuer and the second issuer are communicably coupled with the server system 200, which acts as a payment gateway in some scenarios to enable the transactions to take place, the server system 200 will have all the transaction-related data points that are common with the plurality of client servers 104. It is noted that although the server system 200 will have transaction-related data, it lacks client level data points, such as data related to cardholders, cardholder profiles, and the like, that can play a vital role in improving the performance of predictive models when training models.
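The ID-space relationships described above can be checked with a toy example. The issuer names and transaction IDs below are invented for illustration:

```python
# Toy check of the ID-space assumption: client ID spaces are mutually
# exclusive, and the server's ID space is their union (I_S = I_1 ∪ I_2,
# I_1 ∩ I_2 = ∅). All identifiers are hypothetical.

issuer1_ids = {"TXN-001", "TXN-002"}    # transactions of issuer 1 cardholders
issuer2_ids = {"TXN-101", "TXN-102"}    # transactions of issuer 2 cardholders
server_ids = issuer1_ids | issuer2_ids  # the server sees every transaction

disjoint = (issuer1_ids & issuer2_ids == set())
```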
Based on the above-mentioned, it is understood that different clients will have different ID spaces/datasets. However, it is noted that different clients can share the same features that are used to train the model. For instance, the first issuer and the second issuer can both compute cardholder-specific fraud-related features based on their own distinct datasets. Therefore, the learnings generated by each client model will be different even if they learn on the same/similar features due to the presence of distinct datasets. Conclusively, it can be gathered that if the learning from different models can be combined through the approach described in the present disclosure to train and optimize all the models involved in the training and optimizing process, then the overall model performance of each individual model can be increased drastically. This aspect is explained with the help of experiments later in the present disclosure.
As may be understood from the above description, the server system 200 has all data points that are common with the plurality of client servers 104 however, the plurality of client servers 104 do not have common data points.
Assuming MS represents the primary machine learning model 502 associated with the central server and MK represents the secondary machine learning model (see, the client 1 model 506 and the client 2 model 508) associated with its respective client server, let the dimensions of the feature space XK be DK and of XS be DS. Then, there is no constraint that D1 = D2 = D3 = … = DK = DS, i.e., the dimensions can have any values.
As described earlier with reference to FIG. 2, MS includes an encoder layer and a prediction head layer and MK includes a client level encoder layer and a client level prediction head layer such that the client level prediction head layer is identical to the prediction head layer. Therefore, it is noted that the task of an encoder layer or the client level encoder layer is to encode information from dimension DK or DS to a pre-defined common dimension DC (this is due to the presence of the identical prediction head layer).
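The encoding step described above may be sketched, for illustration only, as a linear projection from a party-specific input dimension to the common dimension DC. The weights below are fixed toy values, not learned parameters:

```python
# Illustrative sketch: encoders with different input dimensions (D_K or D_S)
# all project to the same common dimension D_C, so the identical prediction
# head can consume either output. Weights and inputs are toy values.

def encode(features, weights):
    """Linear projection: D_in-dimensional input -> D_C-dimensional output."""
    d_c = len(weights[0])
    return [sum(f * w[j] for f, w in zip(features, weights)) for j in range(d_c)]

D_C = 2
client_encoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]              # D_K = 3
server_encoder = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]  # D_S = 4

client_code = encode([1.0, 2.0, 3.0], client_encoder)        # len D_C
server_code = encode([1.0, 2.0, 1.0, 1.0], server_encoder)   # len D_C
```

Although the two encoders take inputs of different widths, both outputs have length D_C, which is what makes the shared prediction head and gradient aggregation possible.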
In an embodiment, the CS is configured to generate a server gradient dataset. Further, the CS is configured to generate and transmit a client gradient request message to the plurality of client servers 104. In response to receiving the client gradient request message, the client 1 model 506 and the client 2 model 508 are configured to transmit the client gradient datasets to the CS (see, communication links 514 and 516).
Upon receiving the plurality of client gradient datasets, the CS is configured to generate an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets. Further, the CS is configured to back-propagate the aggregate gradient dataset to the client 1 model 506 and the client 2 model 508 (see, communication links 518 and 520).
Furthermore, the server system 200 and the plurality of client servers 104 are configured to optimize the weights of their respective machine learning model based, at least in part, on the aggregate gradient dataset. This optimization process can be repeated for ‘t’ iterations till the performance of the models on a training dataset saturates. It is noted that the embodiments of the present disclosure improve the overall performance of all the models within the learning group while maintaining data privacy.
In one non-limiting example, the pseudo-code for the algorithm utilized by the present disclosure is given below:
Initialize ω_(s,0), ω_(k,0)
For each iteration round t = 1, 2, 3, … :
  For each client k = 1, 2, … :
    Compute D_(s,k) as the subset of D_s which contains I_S ∩ I_K as ID space
    g_(s,t)^e, g_(s,t)^h ← local-update(D_(s,k), ω_(s,t))
    g_(k,t)^e, g_(k,t)^h ← local-update(D_k, ω_(k,t))
    g_t^h ← (g_(s,t)^h + g_(k,t)^h)/2
    ω_(s,t+1)^h ← ω_(s,t)^h − η g_t^h
    ω_(s,t+1)^e ← ω_(s,t)^e − η g_(s,t)^e
    ω_(k,t+1)^h ← ω_(k,t)^h − η g_t^h
    ω_(k,t+1)^e ← ω_(k,t)^e − η g_(k,t)^e
where local-update(D, ω) is:
  g^e = Σ_(i∈D) ∂l(x_i, y_i)/∂ω^e
  g^h = Σ_(i∈D) ∂l(x_i, y_i)/∂ω^h
  return g^e, g^h
Description of the terms used herein:
t : round of iterative process;
ω : Model parameters;
ω^e: Parameters of encoder part of the model;
ω^h : Parameters of prediction head part of the model;
ω_K : Parameters of kth client’s secondary machine learning model;
ω_s : Parameters of the central server’s primary machine learning model;
ω_(k,t) : Parameters of kth client’s secondary machine learning model in round t;
ω_(s,t) : Parameters of the central server’s primary machine learning model in round t;
l(x_i,y_i )= value of loss function evaluated at point (x_i,y_i ); and
∂l(x_i,y_i )/∂ω= value of the gradient of loss function w.r.t. parameter ω.
It is understood from the non-limiting pseudo-code provided above that training of the models is done using a gradient-descent algorithm in a particular implementation. This training process is iterative in nature, wherein for each iteration, gradients for all parameters of the model (i.e., derivatives of the loss function with respect to the parameters of the models) are computed. Further, these gradients are used to compute new values of the parameters. As may be understood, the equations shown in the non-limiting exemplary pseudo-code perform the gradient descent in a particular implementation. Herein, g_(s,t)^e and g_(k,t)^e are the gradients of the encoder parts of the models at iteration ‘t’, calculated on the datasets D_(s,k) and D_k, respectively. Similarly, g_(s,t)^h and g_(k,t)^h are the gradients of the prediction head layers. At the local-update step, gradients are computed for the given dataset and model parameters. In a non-limiting implementation, the local-update step may be performed two times. At first, it is performed on D_(s,k), i.e., the intersection of the central server dataset with the client K dataset. Second, it is performed on D_k, i.e., the client K dataset. These two gradients for the prediction head layer are averaged, and this averaged gradient is used to further update the parameters of the prediction head layers of the Kth client model and the server model. On the other hand, the gradients of the encoder layers are used to update the parameters of the encoder layers. These updates are performed in the last four steps of the exemplary pseudo-code.
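An executable toy version of the pseudo-code above, for one central server (s) and one client (k), is sketched below. The "encoder" and "prediction head" are each reduced to a single scalar weight, and the loss is assumed to be a squared error l(x, y) = 0.5·(w_h·w_e·x − y)²; the datasets, initial weights, and learning rate are invented for illustration and are not the claimed implementation:

```python
# Toy, non-limiting sketch of the pseudo-code: encoder gradients stay local,
# prediction head gradients are averaged between server and client, and all
# weights are updated by gradient descent with rate eta.

ETA = 0.05  # learning rate eta (hypothetical value)

def local_update(dataset, w_e, w_h):
    """Return summed gradients of the loss w.r.t. encoder (g^e) and head (g^h)."""
    g_e, g_h = 0.0, 0.0
    for x, y in dataset:
        err = w_h * (w_e * x) - y   # encoder output w_e*x fed to head w_h
        g_e += err * w_h * x        # d l / d w_e
        g_h += err * w_e * x        # d l / d w_h
    return g_e, g_h

def training_round(d_sk, d_k, w_s, w_k):
    gs_e, gs_h = local_update(d_sk, *w_s)
    gk_e, gk_h = local_update(d_k, *w_k)
    g_h = (gs_h + gk_h) / 2                          # average head gradients
    w_s = (w_s[0] - ETA * gs_e, w_s[1] - ETA * g_h)  # own encoder, shared head
    w_k = (w_k[0] - ETA * gk_e, w_k[1] - ETA * g_h)
    return w_s, w_k

d_sk = [(1.0, 2.0), (2.0, 4.0)]    # D_(s,k): server rows with IDs in I_S ∩ I_K
d_k = [(1.0, 2.0), (2.0, 4.0)]     # D_k: client K's own dataset (here y = 2x)
w_s, w_k = (1.0, 1.0), (1.0, 1.0)  # (encoder weight, head weight)
for t in range(50):
    w_s, w_k = training_round(d_sk, d_k, w_s, w_k)
```

In this toy run both models recover the underlying relation y = 2x (the product of encoder and head weights converges to 2), and the identical head weights stay synchronized because both parties apply the same averaged head gradient.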
To verify the improvement in model performance achieved by the approach provided in the present disclosure, various experiments have been performed on publicly available datasets (such as a Kaggle® dataset) consisting of transaction data. The modeling task or application was determined to be a classification task, where the target variable is to predict whether a cardholder will perform a specific transaction in the future. Further, the prior-art approach and the proposed approach have been tested using the same datasets.
In an example, an anonymized dataset including 200 feature variables and a binary target variable was generated based on the transaction data from the Kaggle® dataset. It is understood that, in the payment network 112, a payment processor processes all the payment transactions for the cardholders associated with the plurality of client servers 104. This payment processor may operate the server system 200 to implement the embodiments described herein. Now, in order to simulate a data-sharing setting similar to the payment ecosystem environment, the anonymized dataset was split into three partitions, wherein the partitions corresponded to two client servers (say, issuer 1 and issuer 2) and one server system 200, respectively.
It is noted that the dataset corresponding to the server system 200 has all the rows of the full dataset since the server system 200 will have access to all payment transactions performed by the plurality of cardholders associated with each of the plurality of client servers 104. On the other hand, the datasets corresponding to each client have 50% of the rows, such that no rows are common between them. This is because different clients will not share the same cardholders between them. For example, a cardholder of the first issuer can perform multiple transactions with different merchants; however, another issuer will not have any information regarding these transactions since the cardholder has no relationship with it. To that end, each party had 50 common features and 50 unique features out of the 200. More specifically, each party had 100 features available to it.
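The experimental partitioning described above can be simulated as follows. The synthetic data, row counts, and column layout (which 50 columns are common and which are party-specific) are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_feats = 1000, 200
# Synthetic stand-in for the anonymized 200-feature dataset.
X = rng.normal(size=(n_rows, n_feats))

# Assumed column layout: 50 common features, then 50 unique per party.
common = list(range(0, 50))
issuer1_unique = list(range(50, 100))
issuer2_unique = list(range(100, 150))
server_unique = list(range(150, 200))

# The server sees all rows; each issuer sees a disjoint 50% of the rows,
# mirroring the fact that issuers do not share cardholders.
rows = rng.permutation(n_rows)
issuer1_rows = rows[: n_rows // 2]
issuer2_rows = rows[n_rows // 2:]

D_issuer1 = X[np.ix_(issuer1_rows, common + issuer1_unique)]
D_issuer2 = X[np.ix_(issuer2_rows, common + issuer2_unique)]
D_server = X[:, common + server_unique]
```

Each party thus ends up with 100 of the 200 features, and the two issuer partitions share no rows.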
To measure the effectiveness or performance of the approach provided in the present disclosure, at first, the models were trained based on the individual datasets (i.e., the conventional approach for training models). In addition, a sample model was generated that is trained on the complete transaction dataset. In other words, the sample model is trained on data from both issuer 1 and issuer 2. It is noted that this scenario is hypothetical and can be known as an ideal scenario where all the features and all the records from the plurality of client servers 104 (herein, e.g., issuer 1 and issuer 2) are present at a single location and used to train the sample model. It is noted that the sample model will outperform all other models (i.e., the issuer 1 model, the issuer 2 model, and the server system model shown in Table 1) since the sample model has exposure to all possible features. As may be understood, upon performing predictions using the sample model, a performance metric value of 0.84 is obtained. This performance metric value is used as an upper bound of performance while testing the trained and optimized federated learning models. Initially, the issuer 1 model, the issuer 2 model, and the server system model are generated based on their respective datasets, and their initial performance metrics are computed. The performance metric for the issuer 1 model is determined to be 0.761, that for the issuer 2 model is determined to be 0.759, and that for the server system model is determined to be 0.76 (as shown in Table 1). Table 1 depicting the performance of the conventional approach is given below:
Model | Features | Rows | Area under ROC curve on test data
Sample model | Complete data | 100% | 0.84
Issuer 1 model | 50 common + 50 issuer 1 features | 50% | 0.761
Issuer 2 model | 50 common + 50 issuer 2 features | 50% | 0.759
Server system model | 50 common + 50 unique server system features | 100% | 0.76
Table 1: Results of the conventional approach
Thereafter, the issuer 1 model, the issuer 2 model, and the server system model were trained and optimized further by applying the diagonal federated learning-based approach of the present disclosure to generate the final optimized models for the issuer 1, the issuer 2, and the server system 200. Further, the performance metric was computed for each of these optimized models. In particular, the tests described earlier were simulated again, and improved performance metrics were determined for each of the optimized models; the results are depicted in Table 2.
Proposed approach | Area under ROC curve on test data
Optimized issuer 1 model | 0.80
Optimized issuer 2 model | 0.801
Optimized server system model | 0.806
Table 2: Results of the proposed approach
As can be seen from Table 2, a clear gain in the performance of each of the optimized models is observed, from almost 0.76 AUC-ROC to approximately 0.80 AUC-ROC. In other words, each party involved in the learning process gains an improvement in performance of about 5%. More specifically, it is understood that the model performance of each of the optimized models (i.e., approximately 0.80 AUC-ROC) is very close to that of the ideal scenario assumed for the sample model, i.e., 0.84 AUC-ROC.
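The "about 5%" figure quoted above can be checked with a quick relative-improvement calculation:

```python
# Relative improvement of the optimized models over the conventional models,
# using the rounded values from Tables 1 and 2.
baseline, optimized = 0.76, 0.80
improvement = (optimized - baseline) / baseline  # ≈ 0.0526, i.e., about 5%
```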
In one exemplary application, it is assumed that the application or task of the models to be trained is to determine whether a payment card will face fraudulent behavior in the future. Then, the data preparation and feature generation have to be done at the payment card number level. In such a scenario, the clients participating in the learning process will be issuers, and the server system 200 will be a server associated with a payment processor such as Mastercard®.
At first, feature generation is done at the card number level. For the server system 200, the historical transaction dataset 226 is used to generate features at the card level. These features may include, for example, a sum of the transaction amounts in the past X days for a card (X being any natural number), a count of transactions in the past X days for a card, a number of unique merchants transacted at in the past X days for a card, and the like. As may be understood, similar features are created using data that is present at the issuer end for the various issuer clients. It is noted that the historical transaction dataset 226 has a full view of the transactions that are performed at a plurality of merchants, whereas any issuer will have a limited view of only those transactions at merchants that are performed with the cards issued by that issuer. Therefore, features of the server system 200, such as a fraud count of transactions in the past X days at a merchant and the like, that are computed from the historical transaction dataset 226 will provide complete insights and a deeper picture of the relationship between different cardholders and different merchants. Further, such features provide extra information that would not be available if federated learning were not implemented. Such features can be considered server system-specific features (i.e., features of the unique feature set) since these features provide additional information that other clients cannot provide and are beneficial for the model training task.
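The card-level feature generation described above can be sketched as follows. The column names, the toy records, and the choice X = 30 days are illustrative assumptions, not fields of the historical transaction dataset 226:

```python
import pandas as pd

# Toy stand-in for a card-level transaction table.
txns = pd.DataFrame({
    "card_number": ["C1", "C1", "C1", "C2"],
    "txn_date": pd.to_datetime(
        ["2024-01-01", "2024-01-10", "2024-03-01", "2024-01-05"]),
    "amount": [100.0, 50.0, 75.0, 20.0],
    "merchant_id": ["M1", "M2", "M1", "M3"],
})

X_DAYS = 30  # the "past X days" window
cutoff = txns["txn_date"].max() - pd.Timedelta(days=X_DAYS)
recent = txns[txns["txn_date"] > cutoff]

# Card-level aggregates: amount sum, transaction count, unique merchants.
features = recent.groupby("card_number").agg(
    sum_amount_past_X_days=("amount", "sum"),
    txn_count_past_X_days=("amount", "count"),
    unique_merchants_past_X_days=("merchant_id", "nunique"),
)
```

An issuer would run the same aggregation over only its own cards, whereas the server system 200 can run it over all transactions in the payment network.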
Further, when the client is an issuer, examples of issuer-specific features (i.e., the client level unique feature set) that are usually present only at the issuer include demographic details of the cardholder, the credit profile of the cardholder, and information from other products of the issuer (such as loans, insurance, or other credit/debit cards). Furthermore, when the client is an acquirer, examples of acquirer-specific features (i.e., the client level unique feature set) that are usually present only at the acquirer include features of merchants derived using transactions that are not processed by the payment processor associated with the server system 200 (such as transactions processed by other payment processors/networks, or transactions done via cheques, deposits, cash, and the like). As may be understood, since the gradients generated by the machine learning models based on these unique features from both the clients and the server system 200 are used to further optimize the model performance, the learnings from all of the models of the server system 200 and the plurality of client servers 104 can be integrated into each of the models being optimized. Therefore, the present approach drastically improves the performance of every model connected with the payment network 112.
FIGS. 6A and 6B collectively depict a flow diagram of a method 600 for training and optimizing the primary machine learning model 228, in accordance with an embodiment of the present disclosure. The method 600 depicted in the flow diagram may be executed by, for example, the server system 200 or the payment server 114. Operations of the method 600, and combinations of operations in the method 600, may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or a different device associated with the execution of software that includes one or more computer program instructions. The method 600 starts at operation 602.
At operation 602, the method 600 includes accessing, by a server system (such as the server system 200), a historical transaction dataset (such as the historical transaction dataset 226) from a database (such as the database 204) associated with the server system 200.
At operation 604, the method 600 includes generating, by the server system 200, a common feature set and a unique feature set based, at least in part, on the historical transaction dataset 226, the common feature set indicating features that are shared in common with a plurality of client servers 104 and the unique feature set indicating features that are unique to the server system 200.
At operation 606, the method 600 includes generating, by the server system 200, a primary machine learning model 228 based, at least in part, on the common feature set and the unique feature set, the primary machine learning model 228 including at least an encoder layer and a prediction head layer.
At operation 608, the method 600 includes generating, by the server system 200 via the primary machine learning model 228, a server gradient dataset based, at least in part, on the common feature set and the unique feature set.
At operation 610, the method 600 includes generating and transmitting, by the server system 200, a client gradient request message to the plurality of client servers 104. In other words, a client gradient request message is transmitted to each of the plurality of client servers 104.
At operation 612, the method 600 includes receiving, by the server system 200, a plurality of client gradient datasets from the plurality of client servers 104, wherein the plurality of client gradient datasets is generated by a plurality of secondary machine learning models 108 associated with the plurality of client servers 104. Each secondary machine learning model of the plurality of secondary machine learning models 108 includes a client level prediction head layer identical to the prediction head layer of the primary machine learning model 228.
At operation 614, the method 600 includes generating, by the server system 200, an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets.
At operation 616, the method 600 includes optimizing, by the server system 200, weights of the primary machine learning model 228 based, at least in part, on the aggregate gradient dataset.
At operation 618, the method 600 includes back-propagating, by the server system 200, the aggregate gradient dataset to the plurality of client servers 104 for optimizing weights of the plurality of secondary machine learning models 108.
The sequence of operations of the method 600 need not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped and performed in the form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner.
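One server-side training round of the method 600 (operations 608 through 618) can be sketched as follows. The callable names, the mean aggregation, and the transport abstraction are assumptions made for illustration; the disclosure does not prescribe these interfaces:

```python
import numpy as np

def train_round(server_model, client_servers, common_feats, unique_feats, lr=0.01):
    # Operation 608: generate the server gradient dataset via the primary model.
    server_grads = server_model.compute_gradients(common_feats, unique_feats)

    # Operations 610-612: request and collect the client gradient datasets.
    client_grads = [c.handle_gradient_request() for c in client_servers]

    # Operation 614: aggregate the server and client gradients
    # (here, an element-wise mean over all parties, as one possible choice).
    aggregate = np.mean([server_grads] + client_grads, axis=0)

    # Operation 616: optimize the primary model's weights.
    server_model.apply_gradients(aggregate, lr)

    # Operation 618: back-propagate the aggregate to the clients.
    for c in client_servers:
        c.apply_aggregate(aggregate, lr)
    return aggregate
```

Operations 602 through 606 (dataset access, feature-set generation, and model generation) would run once before the iterative rounds begin.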
FIG. 7 is a flow diagram depicting a method 700 for training and optimizing the secondary machine learning model 328, in accordance with an embodiment of the present disclosure. The method 700 depicted in the flow diagram may be executed by, for example, the client server 300 such as an issuer server or acquirer server. Operations of the method 700, and combinations of operations in the method 700, may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or a different device associated with the execution of software that includes one or more computer program instructions. The method 700 starts at operation 702.
At operation 702, the method 700 includes receiving, by a client server (such as the client server 300), a model type notification message from a server system (such as the server system 200). The model type notification message includes at least information related to a prediction head layer of a primary machine learning model 228 associated with the server system 200. It is noted that in some scenarios, the information related to a prediction head layer also indicates the model type of the primary machine learning model 228 as well.
At operation 704, the method 700 includes accessing, by the client server 300, a client transaction dataset (such as the client transaction dataset 326) from a client database (such as the client database 304) associated with the client server 300.
At operation 706, the method 700 includes generating, by the client server 300, a common client feature set and a unique client feature set based, at least in part, on the client transaction dataset 326, the common client feature set indicating features that are shared in common with the server system 200 and the unique client feature set indicating features that are unique to the client server 300.
At operation 708, the method 700 includes generating, by the client server 300, a secondary machine learning model 328 based, at least in part, on the model type notification message, the common client feature set, and the unique client feature set. The secondary machine learning model 328 includes at least a client level encoder layer and a client prediction head layer, the client prediction head layer being identical to the prediction head layer of the primary machine learning model 228.
At operation 710, the method 700 includes in response to receiving a client gradient request message from the server system 200, generating, by the client server 300 via the secondary machine learning model 328, a client gradient dataset based, at least in part, on the common client feature set and the unique client feature set.
At operation 712, the method 700 includes in response to receiving an aggregate gradient dataset from the server system 200, optimizing, by the client server 300, weights of the secondary machine learning model 328 based, at least in part, on the aggregate gradient dataset.
The sequence of operations of the method 700 need not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped and performed in the form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner.
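The method 700 can be sketched from the client server's perspective as follows. The message contents, the feature-splitting rule, and the model-builder interface are illustrative assumptions rather than structures prescribed by the flow diagram:

```python
class ClientServerSketch:
    def __init__(self, client_transaction_columns):
        # Operation 704: the client transaction dataset is available locally;
        # here it is represented only by its column names.
        self.columns = client_transaction_columns
        self.model = None

    def on_model_type_notification(self, head_info, build_model):
        # Operation 702: receive the prediction-head description.
        # Operation 706: split local columns into common and unique sets.
        common = [c for c in self.columns if c in head_info["common_features"]]
        unique = [c for c in self.columns if c not in head_info["common_features"]]
        # Operation 708: build the secondary model with a prediction head
        # identical to that of the primary model.
        self.model = build_model(head_info, common, unique)

    def on_gradient_request(self):
        # Operation 710: generate and return the client gradient dataset.
        return self.model.compute_gradients()

    def on_aggregate(self, aggregate, lr=0.01):
        # Operation 712: optimize the secondary model's weights using the
        # aggregate gradient dataset received from the server system.
        self.model.apply_gradients(aggregate, lr)
```

Note that only gradients and model metadata cross the network in this sketch; the client transaction dataset itself never leaves the client server.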
The disclosed method with reference to FIGS. 6A-6B & 7, or one or more operations of the server system 200 or the client server 300, may be implemented using software including computer-executable instructions stored on one or more computer-readable media (e.g., non-transitory computer-readable media, such as one or more optical media discs, volatile memory components (e.g., DRAM or SRAM), or nonvolatile memory or storage components (e.g., hard drives or solid-state nonvolatile memory components, such as Flash memory components)) and executed on a computer (e.g., any suitable computer, such as a laptop computer, netbook, Web book, tablet computing device, smartphone, or other mobile computing devices). Such software may be executed, for example, on a single local computer or in a network environment (e.g., via the Internet, a wide-area network, a local-area network, a remote web-based server, a client-server network (such as a cloud computing network), or other such networks) using one or more network computers. Additionally, any of the intermediate or final data created and used during the implementation of the disclosed methods or systems may also be stored on one or more computer-readable media (e.g., non-transitory computer-readable media) and are considered to be within the scope of the disclosed technology. Furthermore, any of the software-based embodiments may be uploaded, downloaded, or remotely accessed through a suitable communication means. Such suitable communication means include, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber optic cable), magnetic communications, electromagnetic communications (including RF, microwave, and infrared communications), electronic communications, or other such communication means.
Although the invention has been described with reference to specific exemplary embodiments, it is noted that various modifications and changes may be made to these embodiments without departing from the broad spirit and scope of the invention. For example, the various operations, blocks, etc., described herein may be enabled and operated using hardware circuitry (for example, complementary metal oxide semiconductor (CMOS) based logic circuitry), firmware, software, and/or any combination of hardware, firmware, and/or software (for example, embodied in a machine-readable medium). For example, the apparatuses and methods may be embodied using transistors, logic gates, and electrical circuits (for example, application-specific integrated circuit (ASIC) circuitry and/or in Digital Signal Processor (DSP) circuitry).
Particularly, the server system 200 along with the client server 300 and their various components may be enabled using software and/or using transistors, logic gates, and electrical circuits (for example, integrated circuit circuitry such as ASIC circuitry). Various embodiments of the invention may include one or more computer programs stored or otherwise embodied on a computer-readable medium, wherein the computer programs are configured to cause a processor or the computer to perform one or more operations. A computer-readable medium storing, embodying, or encoded with a computer program, or similar language, may be embodied as a tangible data storage device storing one or more software programs that are configured to cause a processor or computer to perform one or more operations. Such operations may be, for example, any of the steps or operations described herein. In some embodiments, the computer programs may be stored and provided to a computer using any type of non-transitory computer-readable media. Non-transitory computer-readable media include any type of tangible storage media. Examples of non-transitory computer-readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g., magneto-optical disks), CD-ROM (compact disc read-only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (BLU-RAY® Disc), and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash memory, RAM (random access memory), etc.). Additionally, a tangible data storage device may be embodied as one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices. In some embodiments, the computer programs may be provided to a computer using any type of transitory computer-readable media. 
Examples of transitory computer-readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer-readable media can provide the program to a computer via a wired communication line (e.g., electric wires, and optical fibers) or a wireless communication line.
Various embodiments of the invention, as discussed above, may be practiced with steps and/or operations in a different order, and/or with hardware elements in configurations, which are different from those which, are disclosed. Therefore, although the invention has been described based on these exemplary embodiments, it is noted that certain modifications, variations, and alternative constructions may be apparent and well within the spirit and scope of the invention.
Although various exemplary embodiments of the invention are described herein in a language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as exemplary forms of implementing the claims.
Claims:
1. A computer-implemented method, comprising:
accessing, by a server system, a historical transaction dataset from a database associated with the server system;
generating, by the server system, a common feature set and a unique feature set based, at least in part, on the historical transaction dataset, the common feature set indicating features that are shared in common with a plurality of client servers and the unique feature set indicating features that are unique to the server system;
generating, by the server system, a primary machine learning model based, at least in part, on the common feature set and the unique feature set, the primary machine learning model comprising at least an encoder layer and a prediction head layer;
generating, by the server system via the primary machine learning model, a server gradient dataset based, at least in part, on the common feature set and the unique feature set;
generating and transmitting, by the server system, a client gradient request message to the plurality of client servers;
receiving, by the server system, a plurality of client gradient datasets from the plurality of client servers in response to the client gradient request message, wherein the plurality of client gradient datasets is generated by a plurality of secondary machine learning models associated with the plurality of client servers, each secondary machine learning model of the plurality of secondary machine learning models comprising a client level prediction head layer identical to the prediction head layer of the primary machine learning model;
generating, by the server system, an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets; and
optimizing, by the server system, weights of the primary machine learning model based, at least in part, on the aggregate gradient dataset.
2. The computer-implemented method as claimed in claim 1, further comprising:
back-propagating, by the server system, the aggregate gradient dataset to the plurality of client servers for optimizing weights of the plurality of secondary machine learning models.
3. The computer-implemented method as claimed in claim 1, wherein generating the aggregate gradient dataset, further comprises:
determining, by the server system, a plurality of common nodes of the plurality of secondary machine learning models based, at least in part, on the common feature set;
extracting, by the server system, a plurality of common gradients from the plurality of client gradient datasets based, at least in part, on determining that the plurality of common gradients is generated by the plurality of common nodes; and
aggregating, by the server system, the plurality of common gradients to generate the aggregate gradient dataset.
4. The computer-implemented method as claimed in claim 1, further comprising:
determining, by the server system, a model type of the primary machine learning model to be generated based, at least in part, on an application of the primary machine learning model;
determining, by the server system, information related to the prediction head layer of the primary machine learning model based, at least in part, on the model type of the primary machine learning model; and
generating and transmitting, by the server system, a model type notification message to the plurality of client servers, the model type notification message comprising at least the information related to the prediction head layer of the primary machine learning model.
5. The computer-implemented method as claimed in claim 1, wherein each of the plurality of client servers generates a client gradient dataset via a secondary machine learning model based, at least in part, on a common client feature set and a unique client feature set, the common client feature set being identical to the common feature set and the unique client feature set comprising features unique to each of the plurality of client servers.
6. The computer-implemented method as claimed in claim 5, wherein the unique client feature set of each of the plurality of client servers is generated based, at least in part, on a client transaction dataset, the client transaction dataset being unique to each of the plurality of client servers.
7. The computer-implemented method as claimed in claim 1, wherein the primary machine learning model and the plurality of secondary machine learning models are diagonal federated learning models.
8. The computer-implemented method as claimed in claim 1, wherein the server system is a payment server associated with a payment network.
9. The computer-implemented method as claimed in claim 1, wherein the plurality of client servers comprises at least one of one or more issuer servers and one or more acquirer servers.
10. A computer-implemented method, comprising:
receiving, by a client server, a model type notification message from a server system, the model type notification message comprising at least information related to a prediction head layer of a primary machine learning model associated with the server system;
accessing, by the client server, a client transaction dataset from a client database associated with the client server;
generating, by the client server, a common client feature set and a unique client feature set based, at least in part, on the client transaction dataset, the common client feature set indicating features that are shared in common with the server system and the unique client feature set indicating features that are unique to the client server;
generating, by the client server, a secondary machine learning model based, at least in part, on the model type notification message, the common client feature set, and the unique client feature set, the secondary machine learning model comprising at least a client level encoder layer and a client prediction head layer, the client prediction head layer being identical to the prediction head layer of the primary machine learning model;
in response to receiving a client gradient request message from the server system, generating, by the client server via the secondary machine learning model, a client gradient dataset based, at least in part, on the common client feature set and the unique client feature set; and
in response to receiving an aggregate gradient dataset from the server system, optimizing, by the client server, weights of the secondary machine learning model based, at least in part, on the aggregate gradient dataset.
11. A server system, comprising:
a memory configured to store instructions;
a communication interface; and
a processor in communication with the memory and the communication interface, the processor configured to execute the instructions stored in the memory and thereby cause the server system to perform, at least in part, to:
access a historical transaction dataset from a database associated with the server system;
generate a common feature set and a unique feature set based, at least in part, on the historical transaction dataset, the common feature set indicating features that are shared in common with a plurality of client servers and the unique feature set indicating features that are unique to the server system;
generate a primary machine learning model based, at least in part, on the common feature set and the unique feature set, the primary machine learning model comprising at least an encoder layer and a prediction head layer;
generate via the primary machine learning model, a server gradient dataset based, at least in part, on the common feature set and the unique feature set;
generate and transmit a client gradient request message to the plurality of client servers;
receive a plurality of client gradient datasets from the plurality of client servers in response to the client gradient request message, wherein the plurality of client gradient datasets is generated by a plurality of secondary machine learning models associated with the plurality of client servers, each secondary machine learning model of the plurality of secondary machine learning models comprising a client level prediction head layer identical to the prediction head layer of the primary machine learning model;
generate an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets; and
optimize weights of the primary machine learning model based, at least in part, on the aggregate gradient dataset.
12. The server system as claimed in claim 11, wherein the server system is further caused, at least in part to:
back-propagate the aggregate gradient dataset to the plurality of client servers for optimizing weights of the plurality of secondary machine learning models.
13. The server system as claimed in claim 11, wherein for generating the aggregate gradient dataset, the server system is further caused, at least in part to:
determine a plurality of common nodes of the plurality of secondary machine learning models based, at least in part, on the common feature set;
extract a plurality of common gradients from the plurality of client gradient datasets based, at least in part, on determining that the plurality of common gradients is generated by the plurality of common nodes; and
aggregate the plurality of common gradients to generate the aggregate gradient dataset.
14. The server system as claimed in claim 11, wherein the server system is further caused, at least in part to:
determine a model type of the primary machine learning model to be generated based, at least in part, on an application of the primary machine learning model;
determine information related to the prediction head layer of the primary machine learning model based, at least in part, on the model type of the primary machine learning model; and
generate and transmit a model type notification message to the plurality of client servers, the model type notification message comprising at least the information related to the prediction head layer of the primary machine learning model.
15. The server system as claimed in claim 11, wherein each of the plurality of client servers generates a client gradient dataset via a secondary machine learning model based, at least in part, on a common client feature set and a unique client feature set, the common client feature set being identical to the common feature set and the unique client feature set comprising features unique to each of the plurality of client servers.
16. The server system as claimed in claim 15, wherein the unique client feature set of each of the plurality of client servers is generated based, at least in part, on a client transaction dataset, the client transaction dataset being unique to each of the plurality of client servers.
17. A non-transitory computer-readable storage medium comprising computer-executable instructions that, when executed by at least a processor of a server system, cause the server system to perform a method comprising:
accessing a historical transaction dataset from a database associated with the server system;
generating a common feature set and a unique feature set based, at least in part, on the historical transaction dataset, the common feature set indicating features that are shared in common with a plurality of client servers and the unique feature set indicating features that are unique to the server system;
generating a primary machine learning model based, at least in part, on the common feature set and the unique feature set, the primary machine learning model comprising at least an encoder layer and a prediction head layer;
generating, via the primary machine learning model, a server gradient dataset based, at least in part, on the common feature set and the unique feature set;
generating and transmitting a client gradient request message to the plurality of client servers;
receiving a plurality of client gradient datasets from the plurality of client servers in response to the client gradient request message, wherein the plurality of client gradient datasets is generated by a plurality of secondary machine learning models associated with the plurality of client servers, each secondary machine learning model of the plurality of secondary machine learning models comprising a client level prediction head layer identical to the prediction head layer of the primary machine learning model;
generating an aggregate gradient dataset based, at least in part, on the server gradient dataset and the plurality of client gradient datasets; and
optimizing weights of the primary machine learning model based, at least in part, on the aggregate gradient dataset.
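The final two steps of the claimed method can be illustrated numerically: the server gradient dataset and the client gradient datasets are combined into an aggregate, and the aggregate drives one update of the primary model's weights. Simple averaging and a plain gradient-descent step are assumptions for this sketch; the claims do not fix a particular aggregation rule or optimizer.

```python
import numpy as np

def aggregate_gradients(server_grad, client_grads):
    """Combine the server gradient dataset with the plurality of client
    gradient datasets into an aggregate gradient dataset (mean assumed)."""
    return np.mean([server_grad, *client_grads], axis=0)

def optimize_weights(weights, aggregate_grad, lr=0.1):
    """One gradient-descent step on the primary model's weights using the
    aggregate gradient dataset."""
    return weights - lr * aggregate_grad

# Toy two-parameter example (values are illustrative):
weights = np.array([0.5, -0.2])
server_grad = np.array([0.1, 0.0])
client_grads = [np.array([0.3, 0.2]), np.array([0.2, 0.4])]

agg = aggregate_gradients(server_grad, client_grads)   # mean of the three
weights = optimize_weights(weights, agg)
```

Note that only gradients cross the network in this scheme: neither the historical transaction dataset nor any client transaction dataset is shared, which is the point of the claimed training arrangement.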
18. The non-transitory computer-readable storage medium as claimed in claim 17, the method further comprising:
back-propagating the aggregate gradient dataset to the plurality of client servers for optimizing weights of the plurality of secondary machine learning models.
19. The non-transitory computer-readable storage medium as claimed in claim 17, wherein generating the aggregate gradient dataset further comprises:
determining a plurality of common nodes of the plurality of secondary machine learning models based, at least in part, on the common feature set;
extracting a plurality of common gradients from the plurality of client gradient datasets based, at least in part, on determining that the plurality of common gradients is generated by the plurality of common nodes; and
aggregating the plurality of common gradients to generate the aggregate gradient dataset.
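The three steps of claim 19 can be sketched as a filter-then-average over per-node gradients: gradients generated by the common nodes (those tied to the common feature set) are extracted from each client gradient dataset, and only those are aggregated; gradients for client-unique nodes are excluded. The dict-of-gradients representation and node names are illustrative assumptions.

```python
import numpy as np

# Hypothetical common nodes, determined from the common feature set.
COMMON_NODES = ["txn_amount", "txn_hour"]

def extract_common_gradients(client_grad_dataset: dict) -> np.ndarray:
    """Keep only the gradients generated by the common nodes, dropping
    gradients tied to features unique to that client."""
    return np.array([client_grad_dataset[n] for n in COMMON_NODES])

def aggregate_common(client_grad_datasets: list) -> np.ndarray:
    """Aggregate (mean assumed) the extracted common gradients across all
    client gradient datasets."""
    grads = [extract_common_gradients(d) for d in client_grad_datasets]
    return np.mean(grads, axis=0)

# Two clients, each with one client-unique node whose gradient is ignored:
clients = [
    {"txn_amount": 0.2, "txn_hour": 0.4, "issuer_only_feature": 0.9},
    {"txn_amount": 0.4, "txn_hour": 0.0, "acquirer_only_feature": -0.3},
]
agg = aggregate_common(clients)   # gradients over the common nodes only
```

Restricting aggregation to common nodes is what lets secondary models with different unique client feature sets still contribute compatible gradients to one shared update.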
20. The non-transitory computer-readable storage medium as claimed in claim 17, the method further comprising:
determining a model type of the primary machine learning model to be generated based, at least in part, on an application of the primary machine learning model;
determining information related to the prediction head layer of the primary machine learning model based, at least in part, on the model type of the primary machine learning model; and
generating and transmitting a model type notification message to the plurality of client servers, the model type notification message comprising at least the information related to the prediction head layer of the primary machine learning model.