Abstract: Embodiments provide methods and systems for predicting payment transaction events. For each event of a plurality of events forming an input event sequence of an evaluation dataset, where each event includes meta-features, time data, and mark data associated with a payment transaction, the method includes generating a meta-feature embedding by feeding the meta-features to a pre-trained meta-features autoencoder. The method includes converting the mark data to a numerical mark embedding using a mark embedding network. The method includes generating an input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding. The method includes forming an input event sequence embedding by combining each input event embedding. The method includes generating a representation vector by feeding the input event sequence embedding to a neural network model. The method includes computing a probability density function of the time data and a probability mass function of the mark data, which are utilized to compute a joint probability density function of the time data and the mark data for next event prediction.
CLAIMS
We claim:
1. A computer-implemented method comprising:
for each event of a plurality of events forming an input event sequence of an evaluation dataset, wherein each event corresponds to a payment transaction of a plurality of time-ordered transactions of a user, and wherein each event comprises a plurality of meta-features, a time data, and a mark data associated with the payment transaction, performing, by a processor:
generating a meta-feature embedding by feeding the plurality of meta-features to a pre-trained meta-features autoencoder;
converting the mark data to a numerical mark embedding using a mark embedding network; and
generating an input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding;
forming an input event sequence embedding by combining each input event embedding;
generating a representation vector by feeding the input event sequence embedding to a neural network model; and
computing a probability density function of a time data for a next event and a probability mass function of a mark data for the next event from the representation vector, wherein the probability density function of the time data and the probability mass function of the mark data are utilized to compute a joint probability density function of the time data and the mark data for the next event prediction.
2. The method as claimed in claim 1, wherein the neural network model is a recurrent neural network model, and wherein the representation vector is an event history vector generated by feeding the input event sequence embedding to the recurrent neural network model.
3. The method as claimed in claim 2, further comprising:
feeding the event history vector to a neural network model to compute the probability density function of the time data for the next event, wherein the neural network model is pre-trained using a log-likelihood loss function.
4. The method as claimed in claim 3, further comprising:
computing the probability density function of the time data using a Gaussian Mixture Model (GMM), wherein one or more parameters of the GMM are computed through the neural network model.
5. The method as claimed in claim 2, further comprising:
feeding the event history vector to a neural network model to compute the probability mass function of the mark data for the next event, wherein the probability mass function of the mark data comprises a categorical mark distribution.
6. The method as claimed in claim 1, further comprising:
for each training event of a plurality of training events forming a training input event sequence of a training dataset, wherein each training event corresponds to a payment transaction of a plurality of time-ordered transactions of a user, and wherein each training event comprises a plurality of meta-features, a time data, and a mark data associated with the payment transaction, performing, by the processor:
generating a meta-feature embedding by feeding the plurality of meta-features to a pre-trained meta-features autoencoder;
converting the mark data to a numerical mark embedding using a mark embedding network;
generating a pseudo label by feeding the meta-feature embedding to an unsupervised clustering algorithm; and
generating a training input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding.
7. The method as claimed in claim 6, further comprising:
forming a training input event sequence embedding by combining each training input event embedding; and
generating an event history vector by feeding the training input event sequence embedding to a recurrent neural network model.
8. The method as claimed in claim 7, further comprising:
feeding the event history vector to a neural network model to compute a probability density function of a pseudo label for a next training event, wherein the probability density function of the pseudo label comprises a categorical pseudo label distribution;
computing a prediction of the pseudo label for the next training event from the categorical pseudo label distribution; and
comparing the predicted pseudo label with a known pseudo label using a categorical cross-entropy loss function.
9. The method as claimed in claim 6, wherein pre-training the meta-features autoencoder further comprises:
feeding the plurality of meta-features to an encoder-decoder architecture; and
updating weights of the encoder-decoder architecture using a total mean absolute error as a loss function.
10. The method as claimed in claim 1, wherein the plurality of meta-features comprises at least one of an industry, a super industry, a transaction amount, a Merchant Category Code (MCC), a payment card type, and an aggregate merchant name.
11. A system comprising:
a communication interface;
a memory comprising executable instructions; and
a processor communicably coupled to the communication interface and configured to execute the instructions to cause the system to at least:
for each event of a plurality of events forming an input event sequence of an evaluation dataset, wherein each event corresponds to a payment transaction of a plurality of time-ordered transactions of a user, and wherein each event comprises a plurality of meta-features, a time data, and a mark data associated with the payment transaction, perform:
generate a meta-feature embedding by feeding the plurality of meta-features to a pre-trained meta-features autoencoder;
convert the mark data to a numerical mark embedding using a mark embedding network; and
generate an input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding;
form an input event sequence embedding by combining each input event embedding;
generate a representation vector by feeding the input event sequence embedding to a neural network model; and
compute a probability density function of a time data for a next event and a probability mass function of a mark data for the next event from the representation vector, wherein the probability density function of the time data and the probability mass function of the mark data are utilized to compute a joint probability density function of the time data and the mark data for the next event prediction.
12. The system as claimed in claim 11, wherein the neural network model is a recurrent neural network model, and wherein the representation vector is an event history vector generated by feeding the input event sequence embedding to the recurrent neural network model.
13. The system as claimed in claim 12, wherein the system is further caused to:
feed the event history vector to a neural network model to compute the probability density function of the time data for the next event, wherein the neural network model is pre-trained using a log-likelihood loss function.
14. The system as claimed in claim 13, wherein the system is further caused to:
compute the probability density function of the time data using a Gaussian Mixture Model (GMM), wherein one or more parameters of the GMM are computed through the neural network model.
15. The system as claimed in claim 12, wherein the system is further caused to:
feed the event history vector to a neural network model to compute the probability mass function of the mark data for the next event, wherein the probability mass function of the mark data comprises a categorical mark distribution.
16. The system as claimed in claim 11, wherein the system is further caused to:
for each training event of a plurality of training events forming a training input event sequence of a training dataset, wherein each training event corresponds to a payment transaction of a plurality of time-ordered transactions of a user, and wherein each training event comprises a plurality of meta-features, a time data, and a mark data associated with the payment transaction, perform:
generate a meta-feature embedding by feeding the plurality of meta-features to a pre-trained meta-features autoencoder;
convert the mark data to a numerical mark embedding using a mark embedding network;
generate a pseudo label by feeding the meta-feature embedding to an unsupervised clustering algorithm; and
generate a training input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding.
17. The system as claimed in claim 16, wherein the system is further caused to:
form a training input event sequence embedding by combining each training input event embedding; and
generate an event history vector by feeding the training input event sequence embedding to a recurrent neural network model.
18. The system as claimed in claim 17, wherein the system is further caused to:
feed the event history vector to a neural network model to compute a probability density function of a pseudo label for a next training event, wherein the probability density function of the pseudo label comprises a categorical pseudo label distribution;
compute a prediction of the pseudo label for the next training event from the categorical pseudo label distribution; and
compare the predicted pseudo label with a known pseudo label using a categorical cross-entropy loss function.
19. The system as claimed in claim 16, wherein for pre-training the meta-features autoencoder, the system is further caused to:
feed the plurality of meta-features to an encoder-decoder architecture; and
update weights of the encoder-decoder architecture using a total mean absolute error as a loss function.
20. A computer-implemented method comprising:
for each event of a plurality of events forming an input event sequence of an evaluation dataset, wherein each event corresponds to a payment transaction of a plurality of time-ordered transactions of a user, and wherein each event comprises a plurality of meta-features, a time data, and a mark data associated with the payment transaction, performing, by a processor:
generating a meta-feature embedding by feeding the plurality of meta-features to a pre-trained meta-features autoencoder;
converting the mark data to a numerical mark embedding using a mark embedding network; and
generating an input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding;
forming an input event sequence embedding by combining each input event embedding;
generating an event history vector by feeding the input event sequence embedding to a recurrent neural network model;
computing a probability density function of a time data for a next event using a Gaussian Mixture Model (GMM), wherein one or more parameters of the GMM are computed by feeding the event history vector to a neural network model;
computing a probability mass function of a mark data for a next event by feeding the event history vector to another neural network model, wherein the probability mass function of the mark data comprises a categorical mark distribution; and
computing a joint probability density function of the time data and the mark data for predicting the next event using the computed probability density function of the time data and the computed probability mass function of the mark data.
FORM 2
THE PATENTS ACT 1970
(39 of 1970)
&
The Patent Rules 2003
COMPLETE SPECIFICATION
(refer section 10 & rule 13)
TITLE OF THE INVENTION:
METHODS AND SYSTEMS FOR PREDICTING PAYMENT TRANSACTION EVENTS
APPLICANT(S):
Name: MASTERCARD INTERNATIONAL INCORPORATED
Nationality: United States of America
Address: 2000 Purchase Street, Purchase, NY 10577, United States of America
PREAMBLE TO THE DESCRIPTION
The following specification particularly describes the invention and the manner in which it is to be performed.
DESCRIPTION
(See next page)
METHODS AND SYSTEMS FOR PREDICTING PAYMENT TRANSACTION EVENTS
TECHNICAL FIELD
The present disclosure relates to artificial intelligence processing systems and, more particularly to, electronic methods and complex processing systems for predicting various payment transaction events by utilizing deep learning techniques such as Marked Temporal Point Process (MTPP).
BACKGROUND
The availability of technical capabilities for aggregating large amounts of data in corporate information systems makes it possible to construct predictive models of customer behavior based on the history of their activities. The precise time interval, or the exact distance between two events, carries a great deal of information about the dynamics of the underlying system. These characteristics make such data fundamentally different from independently and identically distributed data and from time-series data, where time and space are treated as indexes. However, there are many real-life scenarios in which two events do not occur at regular intervals; for example, the time duration between two transactions of a customer is not fixed, and people might visit various places at different moments of a day.
Forecasting events in the payment industry and banking sphere is very important nowadays because a large amount of data is collected by different information systems, such as those of financial institutes and banks, and the availability of this data brings the opportunity to build many interesting applications on top of predictive models. As the events are irregular, predicting what kind of event will happen at what time in the future, based on the observed sequence of events, is of prime value. One such example is personalized offers provided based on forecasting customer needs by predicting the expected time and category of the next purchase while taking into account the transaction history of a customer. Currently, personalized recommendations are predicted for a group of events occurring at regular intervals, which leads to inaccurate recommendations because the spending habits of each customer are different, and the recommendation should therefore be personalized accordingly.
Thus, there is a need for a technical solution for precisely predicting payment transaction events occurring at irregular intervals through the use of artificial intelligence and deep learning techniques, such that the underlying information in the prediction can be utilized for different applications such as hyper-personalization, managing merchant campaigns, geographical recommendation, attrition prediction, and the like.
SUMMARY
Various embodiments of the present disclosure provide systems, methods, electronic devices, and computer program products for predicting payment transaction events.
In an embodiment, a computer-implemented method is disclosed. The computer-implemented method performed by a processor includes performing for each event of a plurality of events forming an input event sequence of an evaluation dataset, wherein each event corresponds to a payment transaction of a plurality of time-ordered transactions of a user, and wherein each event includes a plurality of meta-features, a time data, and a mark data associated with the payment transaction, a) generating a meta-feature embedding by feeding the plurality of meta-features to a pre-trained meta-features autoencoder, b) converting the mark data to a numerical mark embedding using a mark embedding network, and c) generating an input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding. The method includes forming an input event sequence embedding by combining each input event embedding. The method includes generating a representation vector by feeding the input event sequence embedding to a neural network model. Thereafter, the method includes computing a probability density function of a time data for a next event and a probability mass function of a mark data for the next event from the representation vector. The probability density function of the time data and the probability mass function of the mark data are utilized to compute a joint probability density function of the time data and the mark data for the next event prediction.
In another embodiment, a system is disclosed. The system includes a communication interface, a memory including executable instructions, and a processor communicably coupled to the communication interface. The processor is configured to execute the executable instructions to cause the system to at least perform for each event of a plurality of events forming an input event sequence of an evaluation dataset, wherein each event corresponds to a payment transaction of a plurality of time-ordered transactions of a user, and wherein each event includes a plurality of meta-features, a time data, and a mark data associated with the payment transaction, a) generate a meta-feature embedding by feeding the plurality of meta-features to a pre-trained meta-features autoencoder, b) convert the mark data to a numerical mark embedding using a mark embedding network, and c) generate an input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding. The system is further caused to form an input event sequence embedding by combining each input event embedding. The system is further caused to generate a representation vector by feeding the input event sequence embedding to a neural network model. Thereafter, the system is further caused to compute a probability density function of a time data for a next event and a probability mass function of a mark data for the next event from the representation vector. The probability density function of the time data and the probability mass function of the mark data are utilized to compute a joint probability density function of the time data and the mark data for the next event prediction.
In yet another embodiment, a computer-implemented method is disclosed. The computer-implemented method performed by a processor includes performing for each event of a plurality of events forming an input event sequence of an evaluation dataset, wherein each event corresponds to a payment transaction of a plurality of time-ordered transactions of a user, and wherein each event includes a plurality of meta-features, a time data, and a mark data associated with the payment transaction, a) generating a meta-feature embedding by feeding the plurality of meta-features to a pre-trained meta-features autoencoder, b) converting the mark data to a numerical mark embedding using a mark embedding network, and c) generating an input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding. The method includes forming an input event sequence embedding by combining each input event embedding. The method includes generating an event history vector by feeding the input event sequence embedding to a recurrent neural network model. The method includes computing a probability density function of a time data for a next event using a Gaussian Mixture Model (GMM). One or more parameters of the GMM are computed by feeding the event history vector to a neural network model. The method includes computing a probability mass function of a mark data for a next event by feeding the event history vector to another neural network model. The probability mass function of the mark data includes a categorical mark distribution. The method includes computing a joint probability density function of the time data and the mark data for predicting the next event using the computed probability density function of the time data and the computed probability mass function of the mark data.
BRIEF DESCRIPTION OF THE FIGURES
For a more complete understanding of example embodiments of the present technology, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
FIG. 1 is an example representation of an environment, related to at least some example embodiments of the present disclosure;
FIG. 2 is a simplified block diagram of a system, in accordance with one embodiment of the present disclosure;
FIG. 3 is a schematic block diagram representation of training a meta-features autoencoder, in accordance with an example embodiment;
FIG. 4 represents a schematic block diagram representation of generating a pseudo label during a training phase of a neural network model, in accordance with an example embodiment;
FIG. 5 represents a schematic block diagram representation of training the neural network model, in accordance with an embodiment of the present disclosure;
FIG. 6 represents a schematic block diagram representation of evaluating the trained neural network model, in accordance with one embodiment of the present disclosure;
FIG. 7 represents a flow diagram of a computer-implemented method for predicting payment transaction events, in accordance with an example embodiment;
FIG. 8 represents a flow diagram of another computer-implemented method for predicting payment transaction events, in accordance with an example embodiment; and
FIG. 9 is a simplified block diagram of a server system, in accordance with an example embodiment of the present disclosure.
The drawings referred to in this description are not to be understood as being drawn to scale except if specifically noted, and such drawings are only exemplary in nature.
DETAILED DESCRIPTION
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure can be practiced without these specific details.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. The appearance of the phrase “in an embodiment” in various places in the specification is not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
Moreover, although the following description contains many specifics for the purposes of illustration, anyone skilled in the art will appreciate that many variations and/or alterations to said details are within the scope of the present disclosure. Similarly, although many of the features of the present disclosure are described in terms of each other, or in conjunction with each other, one skilled in the art will appreciate that many of these features can be provided independently of other features. Accordingly, this description of the present disclosure is set forth without any loss of generality to, and without imposing limitations upon, the present disclosure.
The term "payment instrument", used throughout the description, refers to a paper-based or electronic payment means or other payment means used to initiate the transfer of funds. Examples of payment instruments include payment accounts, payment cards (such as debit cards, credit cards, etc.), digital payment cards, e-wallets, etc.
The term "payment network", used throughout the description, refers to a network or collection of systems used for the transfer of funds through the use of cash-substitutes. Payment networks may use a variety of different protocols and procedures to process the transfer of money for various types of transactions. Transactions that may be performed via a payment network may include product or service purchases, credit purchases, debit transactions, fund transfers, account withdrawals, etc. Payment networks may be configured to perform transactions via cash-substitutes, which may include payment cards, letters of credit, checks, financial accounts, etc. Examples of networks or systems configured to perform as payment networks include those operated by entities such as Mastercard®.
The term "issuer", used throughout the description, refers to a financial institution normally called an "issuer bank" or "issuing bank" in which an individual or an institution may have an account. The issuer also issues a payment card, such as a credit card or a debit card, etc. Further, the issuer may also facilitate online banking services such as electronic money transfer, bill payment, etc., to the account holders through a server system called "issuer server" throughout the description.
The term “merchant”, used throughout the description, generally refers to a seller, a retailer, a purchase location, an organization, or any other entity that is in the business of selling goods or providing services either in an online or offline manner, and it can refer to either a single business location or a chain of business locations of the same entity.
The term "acquirer", used throughout the description, refers to a financial institution that is part of the financial payment system and is normally called a “merchant bank” or the “acquiring bank” or “acquirer bank” or simply “acquirer” in which the merchant first establishes an account to accept the payment. The acquirer may also facilitate online services such as electronic money transfer to the account holders i.e., merchants through a server system called "acquirer server" throughout the description.
The term “Merchant Category Code (MCC)”, used throughout the description, refers to a four-digit code uniquely assigned to each merchant by the acquirer as part of the account setup process. The MCC is used to classify the business by the type of goods or services the merchant provides, such that the issuer can identify what type of transaction its cardholder is performing, from what website, and in which country.
OVERVIEW
Various example embodiments of the present disclosure provide methods, systems, user devices, and computer program products for predicting payment transaction events using Marked Temporal Point Process (MTPP) based Artificial Intelligence (AI) model.
In various example embodiments, the present disclosure describes a computing system that facilitates the prediction of payment transaction events. The system includes at least a processor and a memory. The processor is configured to generate a training dataset, a validation dataset, and an evaluation dataset by retrieving relevant transaction history from a database / transaction database during applicable stages. A training dataset includes a training input event sequence formed using a plurality of training events. Each training event represents a payment transaction of a plurality of time-ordered transactions of a user. Further, each training event includes a plurality of meta-features, time data, and mark data associated with the payment transaction. The plurality of meta-features and the time data of a time sequence corresponding to the plurality of transactions are retrieved from the database. Examples of the plurality of meta-features include an industry, a super industry, a transaction amount, a Merchant Category Code (MCC), a payment card type, an aggregate merchant name, and the like.
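By way of a purely illustrative, non-limiting sketch, an event of such a sequence may be represented as a record carrying the inter-event time data, the mark data, and the meta-features. The field names and example values below are hypothetical and do not limit the disclosure:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    # Time data: e.g., hours elapsed since the previous transaction
    time_delta: float
    # Mark data: a categorical label, e.g., a super-industry
    mark: str
    # Meta-features such as MCC, transaction amount, payment card type
    meta_features: dict

# A user's time-ordered input event sequence (hypothetical values)
sequence: List[Event] = [
    Event(0.0, "food", {"mcc": 5814, "amount": 12.5, "card_type": "credit"}),
    Event(26.0, "retail", {"mcc": 5311, "amount": 89.0, "card_type": "credit"}),
]
```

In such a representation, each record corresponds to one payment transaction of the plurality of time-ordered transactions of the user.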
In one embodiment, the processor is configured to form a training input event sequence embedding by combining each training input event embedding generated by concatenating the time data, a numerical mark embedding, and a meta-feature embedding. The meta-feature embedding is generated by feeding the plurality of meta-features to a pre-trained meta-features autoencoder. The mark data is converted to the numerical mark embedding using a mark embedding network. Additionally, a pseudo label is generated by feeding the meta-feature embedding to an unsupervised clustering algorithm. A representation vector, such as an event history vector, is generated by feeding the training input event sequence embedding to a neural network model, such as a recurrent neural network model. The event history vector is fed to a pseudo label computation neural network model to compute a probability density function of a pseudo label for a next training event. A predicted pseudo label is compared with a known pseudo label using a categorical cross-entropy loss function.
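The concatenation step described above may be sketched, purely as a non-limiting illustration, as follows. The functions below are hypothetical stand-ins (random projections in place of the learned autoencoder and mark embedding network) used only to show how the time data, the numerical mark embedding, and the meta-feature embedding are concatenated into one input event embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

def meta_autoencoder_encode(meta_vec):
    # Stand-in for the encoder half of the pre-trained meta-features
    # autoencoder; a real system would use learned weights.
    W = rng.standard_normal((meta_vec.shape[0], 4))
    return np.tanh(meta_vec @ W)

def mark_embedding(mark_id, num_marks=30, dim=3):
    # Stand-in for the mark embedding network: a learned lookup table
    # mapping a categorical mark to a numerical vector.
    table = rng.standard_normal((num_marks, dim))
    return table[mark_id]

def event_embedding(time_delta, mark_id, meta_vec):
    # Input event embedding = [time | mark embedding | meta embedding]
    return np.concatenate([[time_delta],
                           mark_embedding(mark_id),
                           meta_autoencoder_encode(meta_vec)])

emb = event_embedding(26.0, mark_id=3, meta_vec=np.ones(6))
# 1 (time) + 3 (mark) + 4 (meta) = 8 dimensions
```

The resulting per-event embeddings are then combined across the sequence to form the training input event sequence embedding fed to the recurrent neural network model.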
In one embodiment, the event history vector is also fed to respective neural network models to compute a probability density function of a time data and a probability mass function of a mark data for a next training event. Negative log-likelihood is used to compare the predicted values of the time data and the mark data (except the pseudo label) with the actual next event, since the training dataset is known. The process is repeated for each training input event sequence of the training dataset, and the total losses for all the training input event sequences are added. The final value is used to update the weights of all the neural network models used in the training architecture. The pseudo label generation is used only during the training phase to improve the predictive performance of the AI model.
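As a non-limiting illustration of how such a combined training objective may be assembled, the sketch below sums the negative log-likelihood of the observed times and marks with the categorical cross-entropy on the pseudo labels. The function name and argument layout are hypothetical; the predicted densities would in practice come from the respective neural network heads:

```python
import math

def sequence_loss(time_pdf_vals, mark_probs, pseudo_probs, pseudo_true):
    # Negative log-likelihood of each observed inter-event time under
    # the predicted time density f(t | history).
    nll = -sum(math.log(f) for f in time_pdf_vals)
    # Negative log-likelihood of each observed mark under the
    # predicted categorical mark distribution.
    nll += -sum(math.log(p) for p in mark_probs)
    # Categorical cross-entropy between predicted pseudo-label
    # distributions and the known (clustering-derived) pseudo labels.
    ce = -sum(math.log(dist[k]) for dist, k in zip(pseudo_probs, pseudo_true))
    return nll + ce

loss = sequence_loss([0.5], [0.5], [[0.5, 0.5]], [1])
# -3 * ln(0.5) ≈ 2.079
```

Per-sequence losses of this form are summed over the training dataset, and the total is used to update the weights of all the neural network models.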
During an evaluation phase, an evaluation dataset similar to the training dataset is generated, and an input event sequence embedding is generated in the same manner as the training input event sequence embedding explained hereinabove for further processing. A probability density function of the time data for the next event and a probability mass function of the mark data for the next event are computed from the event history vector. The probability density function of the time data and the probability mass function of the mark data are utilized to compute a joint probability density function of the time data and the mark data for the next event prediction.
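The joint density computation may be illustrated, in a non-limiting manner, by the sketch below: the time density is a Gaussian mixture whose parameters would, in an embodiment, be produced by a neural network head from the event history vector, and the joint density is the product of the time density and the categorical mark probability. All parameter values shown are hypothetical:

```python
import math

def gmm_pdf(t, weights, means, stds):
    # Mixture-of-Gaussians density for the inter-event time; the
    # mixture parameters would come from a neural head over the
    # event history vector.
    return sum(w * math.exp(-0.5 * ((t - mu) / s) ** 2)
               / (s * math.sqrt(2 * math.pi))
               for w, mu, s in zip(weights, means, stds))

def joint_density(t, mark, time_params, mark_pmf):
    # p(t, m | history) = f(t | history) * P(m | history)
    return gmm_pdf(t, *time_params) * mark_pmf[mark]

density = joint_density(0.0, "food",
                        time_params=([1.0], [0.0], [1.0]),
                        mark_pmf={"food": 0.5, "retail": 0.5})
# 0.5 / sqrt(2*pi) ≈ 0.1995
```

The next event can then be predicted, for example, by maximizing this joint density over candidate times and marks.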
Real-world data exhibits complex behavior. For example, transactions are discrete events happening at irregular intervals. The solution explained in the present disclosure is flexible enough to approximate any distribution, efficient for downstream tasks, and easy to use for sampling and summary statistics. A typical Temporal Point Process (TPP) includes four components: an input marker, an input time, an output marker, and an output time. A marker is a class for a classification problem, and a typical TPP is capable of taking only a limited number of inputs. Other classification approaches are non-TPP or purely designed for classification problems, where they are not capable of predicting the time. Further, some classification approaches are capable of predicting the time, but their classification accuracy is very low.
When the number of classes increases, classification becomes difficult. For example, if a mark such as a super industry is to be predicted, then there may be only a limited number of inputs, e.g., 30. If an MCC is to be predicted, then the number of inputs increases to, e.g., 500, which may be difficult. If the name of an aggregate merchant is to be predicted, then the classification becomes an even more difficult problem (e.g., 5000 classes). When the number of classes is very large, MTPP is very useful to avoid extreme-class classification problems for labeling. For resolving the extreme-class classification problem, the present disclosure discloses one more set of inputs, i.e., meta-features, in combination with unsupervised learning, which gives a boost compared to a typical TPP classification in terms of accuracy and temporal performance.
The number of classes is reduced by feeding the meta-feature encoding to an unsupervised learning algorithm such that, instead of labeling a huge number of inputs altogether, only a group of inputs is processed hierarchically. Thus, instead of making predictions at the bottom level, the disclosed AI model is capable of making predictions at hierarchical levels. For example, if the marker is set as a super industry, the generation of the pseudo label using unsupervised learning during the training phase provides further important information, such as the MCC of the predicted super industry and the aggregate merchant type. This type of structure is useful to reveal user-level characteristics. For instance, the super industry is the food industry, the merchant is McDonald's®, and the aggregate merchant type (AMT) is a particular location. If only McDonald's® of the food industry is passed as an input marker to the model, it may not reveal characteristics such as certain locations being frequently visited. That type of information is revealed by the pseudo labels, which augment the marker prediction and the user characteristics.
The information retrieved from the predicted transaction events is utilized for providing transactional intelligence to the issuers and merchants for enhancing their customer services via various real-time applications, for example, better campaign management for the merchant. Predicting what a customer will buy has been a useful and common technique for calculating the Customer Lifetime Value (CLV) estimate of the customer. Existing techniques are unable to determine when exactly a purchase will be made. Accordingly, a model is built that is capable of predicting both transactional attributes along with time and has a direct positive impact on issuers’ and merchants’ customer-oriented policies such as high returns on investments, lower customer attrition, hyper-personalized recommendations to the customer, and the like.
Various example embodiments of the present disclosure are described hereinafter with reference to FIGS. 1 to 9.
FIG. 1 illustrates an exemplary representation of an environment 100 related to at least some example embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise depending on, for example, predicting multiple payment transaction events of a user. The environment 100 generally includes a transaction event prediction scenario 110, a system 120, an issuer server 135, a payment network 145 including a payment server 140, a merchant server 125, an acquirer server 130, and a transaction database 155 each connected to, and in communication with (and/or with access to) a network 150. The network 150 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber-optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among the entities illustrated in FIG. 1, or any combination thereof.
Various entities in the environment 100 may connect to the network 150 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof. The network 150 may include, without limitation, a local area network (LAN), a wide area network (WAN) (e.g., the Internet), a mobile network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among two or more of the entities illustrated in FIG. 1, or any combination thereof. For example, the network 150 may include multiple different networks, such as a private network made accessible by the payment network 145 to the issuer server 135, the merchant server 125, the acquirer server 130, and the transaction database 155, and separately, a public network (e.g., the Internet) through which the system 120, the payment server 140, the issuer server 135, the merchant server 125, the acquirer server 130 and the transaction database 155 may communicate.
The transaction event prediction scenario 110 is shown to include a plurality of transactions of a user (not shown) in a time-ordered manner on the x-axis as performed with a plurality of respective merchants on the y-axis. The plurality of merchants is collectively represented as an event mark 104. The merchants M1 104a, M2 104b, M3 104c, M4 104d, and M5 104e form the event mark 104. The transaction time of each transaction arranged sequentially is represented as t1 102a, t2 102b, t3 102c, t4 102d, and t5 102e (collectively referred to as an event arrival time 102). Further, an event history 106 is shown to collectively include transaction times t1 102a, t2 102b, t3 102c, and t4 102d along with the merchants M1 104a, M2 104b, M3 104c, and M4 104d.
Various embodiments of the present disclosure provide ways to predict a time and a type of the next transaction via the system 120. An event prediction 108 is shown to collectively include the transaction time t5 102e and the merchant M5 104e as predicted by the system 120 using the event history 106. Even though the transaction event prediction scenario 110 is shown to include the event mark 104 as merchant names, the system 120 is configured to predict various marks such as, but not limited to, a geographical area, an industry, a super industry (e.g., apparel and accessories, accommodations and food services, consumer electronics and computers, etc.), a transaction amount, a Merchant Category Code (MCC), a payment card type, an aggregate merchant name, and the like, along with the transaction time (collectively referred to as a plurality of meta-features hereinafter).
The system 120 is configured to retrieve the meta-features from the transaction database 155. An AI model trained and stored in the system 120 is configured to process the plurality of meta-features along with an input transaction time and a selected input mark to be predicted for the next transaction. Even though only one transaction is shown as predicted (represented as t5 102e being the time of the predicted transaction and M5 104e being the merchant name of the predicted transaction), the system 120 is capable of predicting a plurality of transaction events for a longer period of time (e.g., next 3 to 5 months).
The AI model is trained in such a way that, apart from predicting the event time and event mark of a transaction, it is also capable of predicting other user characteristics. For example, the transaction amount can be predicted. As the transaction amounts of the event history 106 can be retrieved from the transaction database 155, if a future transaction’s amount is predicted using the system 120, the CLV of a customer can be estimated by summing all the transaction amounts. Also, an inactivity period (e.g., no financial activity for more than 90 days) of a customer can be detected by predicting the event time of the next transaction. Therefore, if the difference of (t5 102e (predicted time) – t4 102d (known time)) happens to be greater than 90 days, the customer is considered inactive. Further, when the merchant name or the MCC is predicted along with the time prediction of a transaction, the merchant is notified of such information so that the merchant can come up with an applicable personalized offer for the particular customer.
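In one illustrative, non-limiting example, the CLV summation and the 90-day inactivity check described above may be worked through as follows; all amounts and timestamps are hypothetical:

```python
# Hypothetical history of transaction amounts plus one predicted amount.
history_amounts = [120.0, 45.5, 60.0, 30.0]   # amounts at t1..t4
predicted_amount = 80.0                        # predicted amount for t5

# CLV estimate: summation of known and predicted transaction amounts.
clv_estimate = sum(history_amounts) + predicted_amount

# Inactivity check: predicted gap between t5 and t4 greater than 90 days.
t4 = 210.0            # known time of the last transaction (days)
t5_predicted = 320.0  # predicted time of the next transaction (days)
is_inactive = (t5_predicted - t4) > 90
```

Here the customer would be flagged inactive because the predicted gap of 110 days exceeds the 90-day threshold.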
Additionally, the system 120 is configured to predict the transaction type as Card Not Present (CNP) or Card On File (COF). Also, the system 120 is capable of predicting multiple MCCs for upcoming transactions. This information is useful for knowing, if the event prediction 108 does not occur, what the next possible merchant would be and when the next transaction would occur thereafter. Further, this information can be utilized in campaign management (merchant side) or hyper-personalization (user side).
The system 120 includes a processor and a memory. The system 120 is in communication with the transaction database 155 via a communication interface over the network 150. In one embodiment, the transaction database 155 is integrated within the payment server 140 associated with the payment network 145. The system 120 is a separate part of the environment 100 and may operate apart from (but still in communication with, for example, via the network 150) the issuer server 135, the acquirer server 130, the payment server 140, the merchant server 125, and the transaction database 155. However, in other embodiments, the system 120 may actually be incorporated, in whole or in part, into one or more parts of the environment 100, for example, the payment server 140 or the issuer server 135, or the acquirer server 130. In addition, the system 120 should be understood to be embodied in at least one computing device in communication with the network 150, which may be specifically configured, via executable instructions, to perform as described herein, and/or embodied in at least one non-transitory computer-readable media. In one embodiment, the payment server 140 associated with the payment network 145 is shown. The payment network 145 may be used by the payment card issuing authorities as a payment interchange network. Examples of payment interchange networks include, but are not limited to, Mastercard® payment system interchange network. The Mastercard® payment system interchange network is a proprietary communications standard promulgated by Mastercard International Incorporated® for the exchange of financial transaction data between financial institutions that are members of Mastercard International Incorporated®. (Mastercard is a registered trademark of Mastercard International Incorporated located in Purchase, N.Y.).
The number and arrangement of systems, devices, and/or networks shown in FIG. 1 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks, and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 100 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 100.
Referring now to FIG. 2, a simplified block diagram of a system 200, is shown, in accordance with an embodiment of the present disclosure. The system 200 is similar to the system 120. In some embodiments, the system 200 is embodied as a cloud-based and/or SaaS-based (software as a service) architecture. In some embodiments, the system 200 may be implemented in a server system. In one embodiment, the system 200 is a part of a payment network 145 or is integrated within the payment server 140. In another embodiment, the system 200 may be embodied within the merchant server 125 or the issuer server 135.
The system 200 includes a computer system 202 and a database 204. The computer system 202 includes at least one processor 206 for executing instructions, a memory 208, a communication interface 210, and a user interface 216 that communicate with each other via a bus 212.
In some embodiments, the database 204 is integrated within the computer system 202. For example, the computer system 202 may include one or more hard disk drives as the database 204. A storage interface 214 is any component capable of providing the processor 206 with access to the database 204. The storage interface 214 may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a RAID controller, a SAN adapter, a network adapter, and/or any component providing the processor 206 with access to the database 204.
In one embodiment, the database 204 is configured to store a plurality of meta-features such as a plurality of transactional features, a plurality of card features, and a plurality of merchant features associated with each transaction of the plurality of transactions. In one embodiment, the transaction database 155 of FIG. 1 is embodied within the database 204. In another embodiment, the transaction database 155 is a separate entity communicably coupled with the database 204.
The processor 206 includes suitable logic, circuitry, and/or interfaces to execute operations for predicting payment transaction events. Examples of the processor 206 include, but are not limited to, a Graphics Processing Unit (GPU), an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a field-programmable gate array (FPGA), and the like. The memory 208 includes suitable logic, circuitry, and/or interfaces to store a set of computer-readable instructions for performing operations. Examples of the memory 208 include a random-access memory (RAM), a read-only memory (ROM), a removable storage drive, a hard disk drive (HDD), and the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the memory 208 in the system 200, as described herein. In another embodiment, the memory 208 may be realized in the form of a database server or cloud storage working in conjunction with the system 200, without departing from the scope of the present disclosure.
The processor 206 is operatively coupled to the communication interface 210 such that the processor 206 is capable of communicating with a remote device 218 such as the transaction database 155, the payment server 140, the issuer server 135, the acquirer server 130, the merchant server 125, or any other entity connected to the network 150 (as shown in FIG. 1).
It is noted that the system 200 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the system 200 may include fewer or more components than those depicted in FIG. 2.
In one embodiment, the processor 206 includes a feature extraction engine 220, a mark embedding network engine 222, a meta-features autoencoder 224, an unsupervised learning algorithm 226, a Recurrent Neural Network (RNN) model 228, a time computation neural network model 230, a mark computation neural network model 232, a pseudo label computation neural network model 234, and a trained neural network engine 236. It should be noted that the components, described herein, can be configured in a variety of ways, including electronic circuitries, digital arithmetic and logic blocks, and memory systems in combination with software, firmware, and embedded technologies.
The feature extraction engine 220 includes suitable logic and/or interfaces for accessing a plurality of meta-features associated with each transaction from the transaction database 155 of FIG. 1. The plurality of meta-features of a transaction is generated during various phases of the trained neural network engine 236 and stored in the transaction database 155 for extraction during the training phase, validation phase, and evaluation phase / execution phase. The feature extraction engine 220 is configured to extract a plurality of transactional features, a plurality of card features, and a plurality of merchant features associated with each transaction from the transaction database 155. Some non-exhaustive examples of the plurality of transactional features include a transaction amount, a transaction status, a transaction time, a transaction type, and the like. Some non-exhaustive examples of the plurality of card features include a card type, card information, and the like. Some non-exhaustive examples of the plurality of merchant features include a merchant name, MCC, an aggregate merchant type, industry, super industry, and the like.
A temporal point pattern is a list of times of events. Many real phenomena, such as a plurality of transactions of a user, produce data that can be represented as a temporal point pattern. Usually, complex mechanisms are behind these seemingly random events, and an essential tool for dealing with these mechanisms, for example in predicting future events, is a temporal point process. The term point is used to refer to an event as being instantaneous, and thus an event can be represented as a point on the timeline. Often there is more information available associated with an event. This information is known as marks or classes. The marks may be of separate interest or may simply be included to make a more realistic model of the event times.
The processor 206 is configured to generate a training dataset that includes a training input event sequence. The training input event sequence includes a plurality of training events. Each training event corresponds to a payment transaction of a plurality of time-ordered transactions of a user. Further, each training event includes a plurality of meta-features, a time data, and a mark data associated with the payment transaction.
During the training phase of the trained neural network engine 236, the mark embedding network engine 222 (hereinafter alternatively referred to as “mark embedding layer 222”) is trained to convert categorical marks into numerical values. The mark embedding layer 222 includes suitable logic and/or interfaces for converting mark data to numerical mark embedding.
Next, a meta-features autoencoder 224 is trained to generate a meta-feature embedding. The meta-features autoencoder 224 involves an encoder-decoder architecture. For a given training input event sequence, all the meta-features are compressed using an encoder to an embedding layer and later decoded using a decoder. The pre-trained meta-features autoencoder 224 is used in the main architecture training. Also, the decoder is not required for training the main architecture, as only the meta-feature embedding is used as an input.
Next, a pseudo label is generated by feeding the meta-feature embedding to an unsupervised learning algorithm 226. In one embodiment, the pseudo label is generated using Lloyd’s k-means unsupervised clustering algorithm. The pseudo label is used while training the main architecture.
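For illustration purposes only, pseudo label generation via Lloyd's k-means may be sketched with the following toy implementation; the embeddings and cluster count are hypothetical, and a production embodiment would use a full clustering library rather than this minimal loop:

```python
import numpy as np

def lloyd_kmeans(points, k, iters=50, seed=0):
    """Tiny Lloyd's k-means: returns a pseudo label (cluster index) for
    each meta-feature embedding. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid as its cluster mean.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels

# Two well-separated blobs standing in for meta-feature embeddings.
embeddings = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10.0])
pseudo_labels = lloyd_kmeans(embeddings, k=2)
```

Each cluster index serves as the pseudo label attached to the corresponding event during training.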
A training input event embedding is generated by concatenating the time data, the numerical mark embedding, and the meta-feature embedding. The process is repeated for each payment transaction of a user. A training input event sequence embedding is formed by combining each training input event embedding.
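The concatenation step above may be illustrated, in a non-limiting manner, as follows; the vector sizes and values are hypothetical:

```python
import numpy as np

# Hypothetical per-event pieces: scalar time data, a learned numerical
# mark embedding, and a meta-feature embedding from the autoencoder.
time_data = np.array([0.7])                 # e.g., normalized event time
mark_embedding = np.array([0.1, -0.3, 0.8])
meta_embedding = np.array([0.5, 0.2, -0.1, 0.9])

# Training input event embedding: concatenation of the three parts.
event_embedding = np.concatenate([time_data, mark_embedding, meta_embedding])

# A training input event sequence embedding stacks one such vector per event.
sequence_embedding = np.stack([event_embedding, event_embedding])
```

Repeating this per transaction and stacking the results yields the sequence embedding fed to the neural network model.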
Next, an event history vector (an example of a representation vector) is generated by feeding the training input event sequence embedding to a neural network model such as a recurrent neural network model 228. The RNN model 228 is trained to model the nonlinear dependency over both the markers and the timings from past events. The RNN model 228 is a feedforward neural network structure where additional edges, referred to as the recurrent edges, are added such that the outputs from the hidden units at the current time step are fed back as inputs at the next time step. In consequence, the same feedforward neural network structure is replicated at each time step, and the recurrent edges connect the hidden units of the network replicas at adjacent time steps along time; that is, the hidden units with recurrent edges receive input not only from the current data sample but also from the hidden units at the last time step. This feedback mechanism creates an internal state of the network to memorize the influence of each past data sample. In various embodiments, a modern variant of the RNN model 228, such as Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRU), may be used without deviating from the scope.
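The recurrence described above may be sketched, for illustration only, with a simple Elman-style cell; the weights here are random placeholders and do not represent the trained RNN model 228:

```python
import numpy as np

def rnn_history_vector(sequence_embedding, W_in, W_rec, bias):
    """Roll a simple (Elman-style) recurrent cell over the event sequence
    embedding; the final hidden state serves as the event history vector."""
    hidden = np.zeros(W_rec.shape[0])
    for event_embedding in sequence_embedding:
        # Hidden state mixes the current event with the previous state,
        # realizing the recurrent-edge feedback described above.
        hidden = np.tanh(W_in @ event_embedding + W_rec @ hidden + bias)
    return hidden

rng = np.random.default_rng(0)
events = rng.normal(size=(5, 8))        # five events, 8-dim embeddings
W_in = rng.normal(size=(16, 8)) * 0.1   # input weights (placeholder)
W_rec = rng.normal(size=(16, 16)) * 0.1 # recurrent-edge weights (placeholder)
bias = np.zeros(16)
history_vector = rnn_history_vector(events, W_in, W_rec, bias)
```

In an LSTM or GRU variant, only the cell update inside the loop changes; the rolled-out structure stays the same.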
A time computation neural network model 230 is trained to compute a probability density function of time data for the next training event by being fed the event history vector. The time computation neural network model 230 is trained using a log-likelihood loss function. Further, the time computation neural network model 230 is utilized to compute one or more parameters of a Gaussian Mixture Model (GMM) that is used to compute the probability density function of the time data. In one embodiment, negative log-likelihood is used to compare the predicted time data with the already known time data of the next event since the training dataset is already known.
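For illustration purposes only, the GMM-based time density and its negative log-likelihood may be sketched as follows; the mixture parameters are hypothetical stand-ins for the outputs of the time computation neural network model 230:

```python
import numpy as np

def gmm_pdf(t, weights, means, stds):
    """Gaussian Mixture Model density of the time data; the parameters
    are assumed to come from the time computation network."""
    comps = np.exp(-0.5 * ((t - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    return float(np.sum(weights * comps))

# Hypothetical two-component mixture over the next inter-event time.
weights = np.array([0.6, 0.4])
means = np.array([1.0, 5.0])
stds = np.array([0.5, 1.0])

density = gmm_pdf(2.0, weights, means, stds)
# Negative log-likelihood of an observed next-event time, as in training.
nll = -np.log(gmm_pdf(1.2, weights, means, stds))
```

Minimizing this negative log-likelihood over the known next-event times drives the weight updates during training.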
A mark computation neural network model 232 is trained to compute a probability mass function of a mark data for the next training event by being fed the event history vector. The probability mass function of the mark data includes a categorical mark distribution. In one embodiment, negative log-likelihood is used to compare the predicted mark data with the already known mark data of the next event since the training dataset is already known.
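The categorical mark distribution and its negative log-likelihood may similarly be sketched as follows; the raw scores are hypothetical outputs of the mark computation neural network model 232:

```python
import numpy as np

def mark_pmf(logits):
    """Categorical mark distribution from raw scores (softmax),
    numerically stabilized by subtracting the maximum score."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Hypothetical scores for four candidate marks given the history vector.
logits = np.array([2.0, 0.5, 1.0, -1.0])
pmf = mark_pmf(logits)

# Negative log-likelihood of the known next mark (index 0), as in training.
nll = -np.log(pmf[0])
```

The resulting probability mass function is the categorical mark distribution used both for the training loss and for picking the most probable mark at evaluation time.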
A pseudo label computation neural network model 234 is trained to compute a probability density function of a pseudo label for the next training event by feeding the event history vector. The probability density function of the pseudo label includes a categorical pseudo label distribution. A prediction of the pseudo label for the next training event is computed from the categorical pseudo label distribution. The predicted pseudo label is compared with a known pseudo label using the categorical cross-entropy loss function.
The total of the negative log-likelihood computed for the mark data and the time data and the categorical cross-entropy loss computed for the pseudo label represents the total error for the training input event sequence. The process is repeated for all the training input event sequences of the training dataset. The total losses for all the training input event sequences are added. The final value is used to update the weights of all the models/engines being trained as explained hereinabove. The process is repeated until a pre-determined threshold is reached. At the end of the training phase, the trained mark embedding network engine 222, the trained meta-features autoencoder 224, the trained RNN model 228, the trained time computation neural network model 230, the trained mark computation neural network model 232, and collectively thereby, the trained neural network engine 236 are achieved.
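For illustration only, the per-sequence loss totaling may be sketched as follows; the individual loss values are hypothetical placeholders for the time and mark negative log-likelihoods:

```python
import numpy as np

def categorical_cross_entropy(pred_pmf, true_index):
    """Cross-entropy between a predicted categorical distribution and a
    one-hot known label (the pseudo label here)."""
    return float(-np.log(pred_pmf[true_index]))

# Hypothetical per-sequence terms: time NLL, mark NLL, pseudo-label CE.
pseudo_ce = categorical_cross_entropy(np.array([0.7, 0.2, 0.1]), 0)
sequence_loss = 0.8 + 1.2 + pseudo_ce              # total error for one sequence
dataset_loss = sequence_loss + (0.6 + 0.9 + 0.25)  # summed over all sequences
```

The dataset-level total is the value used to update the weights of all the models being trained.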
After training the neural network engine / model 236, during the validation phase, the processor 206 is configured to generate a validation dataset similar to the training dataset and perform validation of the trained neural network engine 236.
During the evaluation phase, the processor 206 is configured to generate an evaluation dataset that includes an input event sequence. The input event sequence includes a plurality of events. Each event corresponds to a payment transaction of a plurality of time-ordered transactions of a user. Further, each event includes a plurality of meta-features, a time data, and a mark data associated with the payment transaction. The plurality of meta-features is retrieved from the transaction database 155 by the processor 206.
During the evaluation phase of the trained neural network engine 236, the trained mark embedding network engine 222 converts categorical marks into numerical values, i.e., the mark data to a numerical mark embedding. A meta-feature embedding is generated by feeding the plurality of meta-features to the pre-trained meta-features autoencoder 224. The pseudo label generation is used only during the training of the trained neural network engine 236 to improve the predictive performance. An input event embedding is generated by concatenating the time data, the numerical mark embedding, and the meta-feature embedding. The process is repeated for each payment transaction of a user. An input event sequence embedding is formed by combining each input event embedding.
Next, an event history vector is generated by feeding the input event sequence embedding to the trained recurrent neural network model 228. The trained time computation neural network model 230 is used to compute a probability density function of a time data for a next event by feeding the event history vector. Further, the trained time computation neural network model 230 is utilized to compute one or more parameters of a Gaussian Mixture Model (GMM) that is used to compute the probability density function of the time data. The next event time is obtained by taking the mean of GMM.
The trained mark computation neural network model 232 is used to compute a probability mass function of a mark data for the next event by feeding the event history vector. The probability mass function of the mark data includes a categorical mark distribution. The next event mark is predicted as a mark with maximum probability.
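The two prediction rules above (mean of the GMM for the time, maximum-probability mark for the type) may be illustrated, non-limitingly, as follows; the network outputs are hypothetical:

```python
import numpy as np

# Hypothetical outputs of the trained networks for the next event.
gmm_weights = np.array([0.6, 0.4])
gmm_means = np.array([1.0, 5.0])
mark_probabilities = np.array([0.1, 0.55, 0.25, 0.1])

# Next event time: mean of the GMM (weighted average of component means).
predicted_time = float(np.sum(gmm_weights * gmm_means))

# Next event mark: the mark with maximum probability.
predicted_mark = int(np.argmax(mark_probabilities))
```

Here the predicted next-event time is 2.6 time units and the predicted mark is the second candidate.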
Referring now to FIG. 3, a schematic block diagram representation 300 of training the meta-features autoencoder 224, is shown, in accordance with an embodiment of the present disclosure. The meta-features autoencoder 224 includes an encoder 304 and a decoder 306. A plurality of meta-features 302 associated with a payment transaction of a user and being part of the training dataset is retrieved from the transaction database 155. Some non-exhaustive examples of the meta-features include a payment card type, a transaction amount, a transaction time, an MCC, a super industry, an industry, and the like. The plurality of meta-features 302 is fed to the encoder 304 sequentially. For each input passed to the encoder 304, a corresponding output is generated by the decoder 306, and a corresponding mean absolute error between the input and the output is calculated. When all of the plurality of meta-features 302 is passed, the total mean absolute error is used as a loss function to update the weights of the encoder 304 and the decoder 306. The training process is repeated until the calculated error falls below a pre-determined threshold. This results in the generation of a meta-feature embedding 308. The decoder 306 is only used while training the meta-features autoencoder 224 and is not used while training the main models or during the evaluation phase. In one example embodiment, an LSTM autoencoder or other RNN autoencoding process is used for training the meta-features autoencoder 224.
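For illustration only, a single encoder-decoder pass with its mean absolute error may be sketched with linear layers; this is a simplified stand-in for the LSTM autoencoder mentioned above, and all dimensions and weights are hypothetical:

```python
import numpy as np

def autoencoder_forward(x, W_enc, W_dec):
    """One encoder/decoder pass: compress meta-features to an embedding
    and reconstruct them. Linear layers only; illustrative sketch."""
    embedding = np.tanh(W_enc @ x)      # meta-feature embedding
    reconstruction = W_dec @ embedding  # decoder output
    return embedding, reconstruction

rng = np.random.default_rng(1)
meta_features = rng.normal(size=6)     # hypothetical 6 meta-features
W_enc = rng.normal(size=(3, 6)) * 0.3  # compress 6 -> 3 (placeholder weights)
W_dec = rng.normal(size=(6, 3)) * 0.3  # reconstruct 3 -> 6 (placeholder weights)

embedding, recon = autoencoder_forward(meta_features, W_enc, W_dec)
# Mean absolute error between input and output drives the weight update.
mae_loss = float(np.mean(np.abs(meta_features - recon)))
```

After training, only the encoder half is kept, and `embedding` plays the role of the meta-feature embedding 308.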
FIG. 4 represents a schematic block diagram representation 400 of generating a pseudo label during the training phase of the trained neural network engine/model 236, in accordance with an example embodiment. In one embodiment, the meta-feature embedding 308 generated by the meta-features autoencoder 224 is fed to the unsupervised learning algorithm 226. In at least one embodiment, the unsupervised learning algorithm 226 is Lloyd's k-means algorithm with a silhouette score. ‘K’ refers to the total number of clusters to be defined in the entire dataset. A centroid is chosen for a given cluster type, and it is used to calculate the distance of a given data point. The distance essentially represents the similarity of the features of a data point to a cluster type. In other words, the k-means algorithm identifies the ‘K’ number of centroids, and then allocates every data point to the nearest cluster, while keeping the clusters as small as possible. The ‘means’ in k-means refers to averaging of the data, i.e., finding the centroid. The goal is to group the data into ‘K’ clusters. The k-means clustering algorithm uses iterative refinement to produce a final result. The algorithm inputs are the number of clusters ‘K’ and the data set. The data set is a collection of features for each data point (i.e., the meta-feature embedding 308). Lloyd's k-means algorithm has polynomial smoothed running time.
Determining the right number of clusters in a data set is important, not only because some clustering algorithms like k-means require such a parameter, but also because the appropriate number of clusters controls the proper granularity of cluster analysis. There are many possible ways to estimate the number of clusters. In at least one embodiment, the silhouette score method is utilized to validate the number ‘K’. The Silhouette Coefficient, or silhouette score, is a metric used to calculate the goodness of a clustering technique. Its value ranges from -1 to 1. ‘1’ means clusters are well apart from each other and clearly distinguished. ‘0’ means clusters are indifferent, i.e., the distance between clusters is not significant. ‘-1’ means clusters are assigned in the wrong way. The Silhouette Coefficient is also a measure of how similar a data point is to its own cluster (cohesion) compared to other clusters (separation).
The Silhouette Coefficient s for a single sample is calculated as:

s = (b − a) / max(a, b)

where ‘a’ is the mean distance between the sample and all other points in the same cluster (the mean intra-cluster distance), and ‘b’ is the mean distance between the sample and all points in the next nearest cluster (the mean nearest-cluster distance).

The Silhouette Coefficient for a set of samples is computed as the mean of the Silhouette Coefficient of each sample:

Average Silhouette = mean{s(i)}
In various embodiments, a number of other techniques may be utilized for validating ‘K’, such as cross-validation, information criteria, the information-theoretic jump method, the G-means algorithm, and the like.
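For illustration purposes only, the per-sample Silhouette Coefficient s = (b − a) / max(a, b) may be computed as follows; the cluster memberships here are hypothetical:

```python
import numpy as np

def silhouette_coefficient(point, own_cluster, nearest_cluster):
    """s = (b - a) / max(a, b) for one sample. `own_cluster` holds the
    other members of the sample's cluster (for the mean intra-cluster
    distance a); `nearest_cluster` holds the next nearest cluster's
    points (for the mean nearest-cluster distance b)."""
    a = float(np.mean(np.linalg.norm(own_cluster - point, axis=1)))
    b = float(np.mean(np.linalg.norm(nearest_cluster - point, axis=1)))
    return (b - a) / max(a, b)

# A sample in a tight cluster that is far from the next nearest cluster.
own = np.array([[0.1, 0.0], [0.0, 0.1]])
other = np.array([[10.0, 10.0], [10.1, 10.0]])
s = silhouette_coefficient(np.array([0.0, 0.0]), own, other)
```

A value near 1, as here, indicates the clusters are well apart, supporting the chosen ‘K’.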
A pseudo label 404 is generated as the output as shown. The pseudo label 404 generated in such a way is used to reveal more insights into the transaction events. The generation of the pseudo label 404 also helps in downsizing the number of inputs, i.e., providing a solution for extreme class classification problems. For example, if the input marker is set as the aggregate merchant type, then the number of inputs would be 5000 classes. Labeling such a huge number of inputs using a supervised learning algorithm is very difficult. Using the unsupervised learning algorithm 226, the number of inputs is gradually reduced to a smaller number, which further increases the overall accuracy of the trained neural network engine 236. Over-clustering regularization positively affects the decision boundary and feature generalization.
FIG. 5 represents a schematic block diagram representation 500 of training the trained neural network engine 236, in accordance with an example embodiment. As explained with reference to FIG. 2, the processor 206 is configured to generate a training dataset. As shown, the training dataset includes at least one training input event sequence 502. The training input event sequence 502 includes a plurality of training events. Each training event corresponds to a payment transaction of a plurality of time-ordered transactions of a user. Further, each training event includes a plurality of meta-features, a time data, and a mark data associated with the payment transaction. Accordingly, a time-sequence 502a represents a plurality of time data, each time data associated with a corresponding payment transaction of the plurality of transactions of the user. A mark-sequence 502b represents a plurality of mark data, each mark data associated with a corresponding payment transaction of the plurality of transactions of the user. An input mark data represents the selection of a category, such as an MCC associated with a payment transaction, that is to be predicted for the next transaction.
Additionally, the training input event sequence 502 includes a meta-features sequence 502c that includes the plurality of meta-features (such as the plurality of meta-features 302 of FIG. 3) associated with each transaction of the plurality of transactions of the user as accessed using the feature extraction engine 220 of FIG. 2. Some non-exhaustive examples of the plurality of meta-features include a transaction amount, a transaction status, a transaction time, a transaction type, a card type, a card number, a merchant name, an MCC, an aggregate merchant type, an industry, a super industry, and the like. The mark embedding network engine 222 / the mark embedding layer 222 is trained to convert the mark data of the mark-sequence 502b to a numerical mark embedding(s).
As explained with reference to FIG. 3, the pre-trained meta-features autoencoder 224 that includes the encoder 304 is used to generate the meta-feature embeddings 503 by compressing all the meta-features from the meta-features sequence 502c.
A training input event embedding is generated by concatenating the time data of the time-sequence 502a, the numerical mark embedding, and the meta-feature embeddings 503. As the process is repeated for each payment transaction of the user, a training input event sequence embedding 504 is formed by combining each training input event embedding.
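The concatenation that forms a training input event embedding can be sketched as below; the dimensions and numeric values are made up for illustration and are not tied to the engines of FIG. 5.

```python
def event_embedding(time_data, mark_embedding, meta_embedding):
    """Concatenate the time data, numerical mark embedding, and meta-feature embedding."""
    return [time_data] + list(mark_embedding) + list(meta_embedding)

# three transactions of one user: (inter-event time, mark embedding, meta-feature embedding)
events = [
    (0.5, [0.1, 0.9], [0.3, 0.2, 0.7]),
    (1.2, [0.8, 0.2], [0.1, 0.6, 0.4]),
    (0.3, [0.4, 0.4], [0.9, 0.1, 0.2]),
]
# the sequence embedding combines the per-event embeddings in time order
sequence_embedding = [event_embedding(t, m, f) for t, m, f in events]
```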
In at least one embodiment, a marked temporal point process is proposed to jointly model the time and the marker information by learning a general representation of the nonlinear dependency over the history based on recurrent neural networks. Using the RNN model 228, event history is embedded into a compact vector representation which can be used for predicting the next event time and marker type. Through training, the embedding vector of an event encodes its meaning relative to other events. Events that are similar tend to have embedding vectors closer to each other in the embedding space than those that are not. One such event history vector 506 is generated by feeding the training input event sequence embedding 504 to the Recurrent Neural Network (RNN) model 228. The RNN model 228 is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, the RNN model 228 uses the internal state (memory) to process variable-length sequences of inputs.
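A minimal sketch of how a recurrent network folds the event sequence embedding into a single history vector is given below; the weights, sizes, and the plain Elman-style cell are illustrative assumptions and do not reflect the trained RNN model 228.

```python
import math

def rnn_history_vector(sequence, W_x, W_h, b):
    """Fold a sequence of event embeddings into one hidden state (Elman RNN)."""
    h = [0.0] * len(b)  # initial hidden state
    for x in sequence:
        # h_t = tanh(W_x x_t + W_h h_{t-1} + b), computed element-wise
        h = [math.tanh(sum(wx * xi for wx, xi in zip(W_x[j], x))
                       + sum(wh * hi for wh, hi in zip(W_h[j], h))
                       + b[j])
             for j in range(len(b))]
    return h  # the final hidden state summarizes the event history

# two event embeddings of dimension 3, hidden size 2, toy weights
seq = [[0.5, 0.1, 0.9], [1.2, 0.8, 0.2]]
W_x = [[0.1, 0.2, 0.3], [0.0, 0.1, 0.2]]
W_h = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]
history = rnn_history_vector(seq, W_x, W_h, b)
```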
The time computation neural network model 230 is fed the event history vector 506 to compute a probability density function of a time data 508 (hereinafter alternatively referred to as ‘time 508’) for the next training event. As explained with reference to FIG. 2, the time computation neural network model 230 is trained using a log-likelihood loss function.
In an example embodiment, a probability density function (PDF) f_i^*(τ_i) of the time is modeled using an intensity-based function as follows:

$$f_i^*(\tau_i) = \lambda^*(t_{i-1} + \tau_i)\,\exp\left(-\int_{t_{i-1}}^{t_i} \lambda^*(t')\,dt'\right)$$

where λ^* represents the conditional intensity function. The intensity-based function requires Monte Carlo estimation of the integral when computing the log-likelihood.
In at least one embodiment, the probability density function (PDF) f_i^*(τ_i) of the time is modeled using an intensity-free function. The intensity-free approach estimates the conditional time PDF without using the intensity function λ. To estimate the conditional time PDF, a Gaussian Mixture Model (a mixture that is Gaussian in log τ, i.e., a log-normal mixture) is used as the parametric distribution as follows:

$$f_i^*(\tau_i) = f(\tau \mid w, \mu, s) = \sum_{c=1}^{C} w_c \frac{1}{\tau s_c \sqrt{2\pi}} \exp\left(-\frac{(\log\tau - \mu_c)^2}{2 s_c^2}\right)$$

where the GMM parameters w (weight), μ (mean), and s (standard deviation) are computed using the time computation neural network model 230.
In various embodiments, other possible choices for parametric distribution such as exponential distribution, Weibull distribution, etc. may be utilized for computation of the PDF of the time for the next event. In one embodiment, negative log-likelihood is used to compare the predicted time data 508 with the already known time data of the next event since the training dataset is already known.
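The mixture density above can be evaluated directly. The sketch below uses made-up parameters (w, μ, s are stand-ins for the network's outputs) and numerically checks that the density integrates to approximately one:

```python
import math

def time_pdf(tau, w, mu, s):
    """Intensity-free time density: mixture of log-normal components over tau > 0."""
    return sum(
        w_c / (tau * s_c * math.sqrt(2 * math.pi))
        * math.exp(-(math.log(tau) - mu_c) ** 2 / (2 * s_c ** 2))
        for w_c, mu_c, s_c in zip(w, mu, s)
    )

# hypothetical GMM parameters as the time computation network might output them
w, mu, s = [0.6, 0.4], [0.0, 1.0], [0.5, 0.3]

# crude Riemann-sum check that the PDF integrates to ~1 over tau in (0, 50]
dt = 0.001
total_mass = sum(time_pdf(i * dt, w, mu, s) * dt for i in range(1, 50001))
```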
The mark computation neural network model 232 is trained to compute a probability mass function p_i^*(m_i) of a mark data 510 (hereinafter alternatively referred to as ‘mark 510’) for the next training event by being fed the event history vector 506. A probability mass function (PMF) is a function that models the potential outcomes of a discrete random variable. The probability mass function of the mark data is a categorical mark distribution. In one embodiment, negative log-likelihood is used to compare the predicted mark data with the already known mark data of the next event since the training dataset is already known.
The marked TPP is a probabilistic generative model as it models the joint probability distribution of the time and the mark of the next event. For the next event, the joint distribution of the time / time data and the mark / mark data is learned as follows:
Joint Probability (time, mark) = Probability (time) × Probability (mark)
The joint probability density function f_i^*(τ_i, m_i) is represented as follows:

$$f_i^*(\tau_i, m_i) = f_i^*(\tau_i) \cdot p_i^*(m_i)$$

where the symbol * indicates dependence on the event history, f_i^*(τ_i) is the probability density function of the time, p_i^*(m_i) is the probability mass function of the mark, and τ_i = t_i − t_{i−1} represents the inter-event time.
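Because the time and the mark factorize given the history, the joint density is a simple product. A toy evaluation, with hypothetical values standing in for the model outputs:

```python
def joint_density(time_pdf_value, mark_pmf, mark):
    """Joint density of (time, mark): product of the time PDF and the mark PMF."""
    return time_pdf_value * mark_pmf[mark]

# hypothetical outputs: time PDF evaluated at a candidate tau, and a PMF over 3 marks
f_tau = 0.42
p_mark = {"grocery": 0.5, "fuel": 0.3, "travel": 0.2}  # sums to 1
density = joint_density(f_tau, p_mark, "grocery")  # 0.42 * 0.5 = 0.21
```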
As explained with reference to FIG. 4, the pseudo label 404 is generated by feeding the meta-feature embedding 308 to the unsupervised learning algorithm 226. The pseudo label 404 is used while training the main architecture, i.e., the trained neural network engine 236. The probability distribution of the pseudo label is a categorical pseudo label distribution. A prediction of a pseudo label 512 for the next training event is computed from the categorical pseudo label distribution. As shown, the predicted pseudo label 512 is compared with the known pseudo label 404 using a categorical cross-entropy loss function 514. As shown, a training output 516 for the next event prediction includes the time 508, the mark 510, and the pseudo label 512.
The total of the negative log-likelihood computed for the mark data 510 and the time data 508, and the categorical cross-entropy loss computed for the pseudo label, represents the total error for the training input event sequence. The process is repeated for all the sequences of the training dataset. The total losses for all the training input event sequences, including the training input event sequence 502, are added. The final value is used to update the weights of all the models / engines being trained. The process is repeated until a pre-determined threshold is reached. At the end of the training phase, the trained mark embedding network engine 222, the trained meta-features autoencoder 224, the trained RNN model 228, the trained time computation neural network model 230, the trained mark computation neural network model 232, and, collectively thereby, the trained neural network engine 236 are achieved.
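The total training error described above can be sketched as a sum of three terms; the per-event probabilities below are hypothetical stand-ins for the probabilities the models assign to the ground-truth values.

```python
import math

def sequence_loss(time_pdf_vals, mark_probs, pseudo_probs):
    """Total error: time NLL + mark NLL + categorical cross-entropy for pseudo labels."""
    time_nll = -sum(math.log(p) for p in time_pdf_vals)   # negative log-likelihood of times
    mark_nll = -sum(math.log(p) for p in mark_probs)      # negative log-likelihood of marks
    pseudo_ce = -sum(math.log(p) for p in pseudo_probs)   # cross-entropy vs. known pseudo labels
    return time_nll + mark_nll + pseudo_ce

# probabilities assigned to the ground truth for a two-event training sequence
loss = sequence_loss([0.4, 0.6], [0.7, 0.5], [0.9, 0.8])
# the losses of all training sequences would be added before updating the weights
```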
FIG. 6 represents a schematic block diagram representation 600 of evaluating the trained neural network engine 236, in accordance with an example embodiment. As explained with reference to FIG. 2, the processor 206 is configured to generate an evaluation dataset. As shown, the evaluation dataset includes at least one input event sequence 610. The input event sequence 610 includes a plurality of events. Each event corresponds to a payment transaction of a plurality of time-ordered transactions of a user. Further, each event includes a plurality of meta-features, a time data, and a mark data associated with the payment transaction. Accordingly, a time-sequence 602 represents a plurality of time data, each time data associated with a corresponding payment transaction of a plurality of transactions of a user. A mark-sequence 604 represents a plurality of mark data, each mark data associated with a corresponding payment transaction of a plurality of transactions of the user. An input mark data represents the selection of a category, such as a payment instrument type associated with a payment transaction, that is to be predicted for the next transaction.
Additionally, the input event sequence 610 includes a meta-features sequence 608 that includes the plurality of meta-features (such as the plurality of meta-features 302 of FIG. 3) associated with each transaction of the plurality of transactions of the user. A pre-trained mark embedding network engine 612 (i.e., the mark embedding network engine 222 trained during the training phase) converts the mark data of the mark-sequence 604 to a numerical mark embedding. Meta-feature embeddings 611 are generated using the pre-trained meta-features autoencoder 224 that is trained to compress the plurality of meta-features from the meta-features sequence 608.
An input event embedding is generated by concatenating the time data of the time-sequence 602, the numerical mark embedding (not shown in the figure), and the meta-feature embeddings 611. As the process is repeated for each payment transaction of the user, an input event sequence embedding 614 is formed by combining each input event embedding.
An event history vector 618 is generated by feeding the input event sequence embedding 614 to a pre-trained RNN model 616 (i.e., the RNN model 228 trained during the training phase).
A pre-trained time computation neural network model 620 (i.e., the time computation neural network model 230 trained during the training phase) is used to compute a probability density function of a time data 624 (hereinafter alternatively referred to as ‘time 624’) as predicted for a next event by taking the event history vector 618 as input.
As explained with reference to FIG. 5, the probability density function (PDF) f_i^*(τ_i) of the time is computed using the intensity-free function. To estimate the conditional time PDF, a Gaussian Mixture Model is used as the parametric distribution as follows:

$$f_i^*(\tau_i) = f(\tau \mid w, \mu, s) = \sum_{c=1}^{C} w_c \frac{1}{\tau s_c \sqrt{2\pi}} \exp\left(-\frac{(\log\tau - \mu_c)^2}{2 s_c^2}\right) \quad (1)$$

where the GMM parameters w (weight), μ (mean), and s (standard deviation) are computed using the pre-trained time computation neural network model 620.
In one embodiment, the predicted event time is obtained by taking the mean of the distribution in equation (1). In another embodiment, f_i^*(τ_i) is computed using equation (1) and thereafter the value of τ_i that maximizes f_i^* is found.
A pre-trained mark computation neural network model 622 (i.e., the mark computation neural network model 232 trained during the training phase) is used to compute a probability mass function p_i^*(m_i) of a mark data 626 (hereinafter alternatively referred to as ‘mark 626’) as predicted for the next event by taking the event history vector 618 as input. In one embodiment, the next event mark is predicted as the mark with the maximum probability, i.e., the value of m_i that maximizes p_i^* is computed.
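The two prediction rules described above (mean of the time distribution, argmax of the mark distribution) can be sketched as follows; the parameters are hypothetical stand-ins for the outputs of the pre-trained models 620 and 622. For a log-normal mixture as in equation (1), the mean has the closed form E[τ] = Σ_c w_c exp(μ_c + s_c²/2).

```python
import math

def predict_time(w, mu, s):
    """Mean of the log-normal mixture: E[tau] = sum_c w_c * exp(mu_c + s_c^2 / 2)."""
    return sum(w_c * math.exp(mu_c + s_c ** 2 / 2) for w_c, mu_c, s_c in zip(w, mu, s))

def predict_mark(pmf):
    """Next mark: the category with the maximum probability mass."""
    return max(pmf, key=pmf.get)

# hypothetical outputs of the time and mark computation models
tau_hat = predict_time([0.6, 0.4], [0.0, 1.0], [0.5, 0.3])
mark_hat = predict_mark({"grocery": 0.5, "fuel": 0.3, "travel": 0.2})
```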
For the next event, the joint distribution of the time 624 / time data and the mark 626 / mark data is represented through the joint probability density function f_i^*(τ_i, m_i) as shown below:

$$f_i^*(\tau_i, m_i) = f_i^*(\tau_i) \cdot p_i^*(m_i)$$

where the symbol * indicates dependence on the event history, f_i^*(τ_i) is the probability density function of the time, p_i^*(m_i) is the probability mass function of the mark, and τ_i = t_i − t_{i−1} represents the inter-event time.
As shown, an output 628 for the next event prediction includes the time 624 and the mark 626. As explained hereinabove, the proposed framework is used to estimate the time and the mark (e.g., merchant category code) of the next event through an over-clustering approach for the extreme-class classification problem. The disclosure involves a combination of supervised and unsupervised objectives to improve model generalization and to boost predictive performance, i.e., the estimation of the time and the mark of the next event. In doing so, an intensity-free framework is used to learn the probability density function of the time. The intensity-free approach provides closed-form likelihood estimation compared to the approximate likelihood estimation of intensity-based approaches. However, the trained neural network engine 236 is built in such a way that different intensity-based and intensity-free approaches can be easily plugged in and swapped.
Various embodiments provide a flexible way to capture user-level information through the meta-features autoencoder 224, as it provides a compact representation of additional transactional attributes, whether numeric or categorical. When the merchant category code is considered as the mark, the proposed framework captures the user-level transactional behavior over many merchant category codes. This information plays a crucial role in the time-based hyper-personalization of the user. For example, a time-based analysis may be provided to different merchants to run different campaigns (such as breakfast campaigns, new product campaigns, etc.). Further, coupon-based offers may be provided at a personalized level to attract both high-spending and average-spending individuals. For example, if a customer's spending has declined, additional offers may be extended to the customer. If a customer is spending heavily, alternative purchase categories may be suggested. Pitching discretionary products at the right time helps the merchants attract more customers than usual.
FIG. 7 represents a flow diagram of a computer-implemented method 700 for predicting payment transaction events, in accordance with an example embodiment. The method 700 depicted in the flow diagram may be executed by the system 120 or the system 200. Operations of the method 700, and combinations of operations in the method 700, may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or a different device associated with the execution of software that includes one or more computer program instructions. The method 700 starts at operation 702.
At operation 702, the method 700 includes performing, by a processor 206, operations 702a, 702b, and 702c for each event of a plurality of events forming an input event sequence of an evaluation dataset. Each event corresponds to a payment transaction of a plurality of time-ordered transactions of a user. Each event includes a plurality of meta-features, a time data, and a mark data associated with the payment transaction.
At operation 702a, the method 700 includes generating, by the processor 206, a meta-feature embedding by feeding the plurality of meta-features to a pre-trained meta-features autoencoder.
At operation 702b, the method 700 includes converting, by the processor 206, the mark data to a numerical mark embedding using a mark embedding network.
At operation 702c, the method 700 includes generating, by the processor 206, an input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding.
At operation 704, the method 700 includes forming, by the processor 206, an input event sequence embedding by combining each input event embedding.
At operation 706, the method 700 includes generating, by the processor 206, a representation vector by feeding the input event sequence embedding to a neural network model.
At operation 708, the method 700 includes computing, by the processor 206, a probability density function of a time data for a next event and a probability mass function of a mark data for the next event from the representation vector. The probability density function of the time data and the probability mass function of the mark data are utilized to compute a joint probability density function of the time data and the mark data for the next event prediction. The method 700 ends at operation 708.
Various embodiments provide multiple ways to benefit the customer, the merchant, and/or the issuer by developing the model as explained hereinabove. The developed model predicts the next best transaction in terms of predicting not only where the customer will buy but also when the customer will buy. Such information is useful in various scenarios. For example, if a user has been a customer of a merchant historically, the user is eligible for any campaign management held by the merchant. However, in a scenario where the user is not a customer of the merchant and transacts with the merchant for the first time, the user can still be identified as a potential customer for the merchant using the AI model of the present disclosure. As another example, if a merchant is launching a morning breakfast campaign, then even though a user has been historically a customer of the merchant, the user may not be a potential customer for the morning breakfast campaign. Such a prediction is made by the AI model, and the merchant is notified about exactly whom to target. Additionally, the customer may be provided with a coupon for the morning breakfast. Further, if the customer is going to transact around 30 dollars, the AI model is utilized to push the customer to spend over 30 dollars by designing the coupon value accordingly.
FIG. 8 represents a flow diagram of another computer-implemented method 800 for predicting a payment transaction event, in accordance with an example embodiment. The method 800 depicted in the flow diagram may be executed by the system 120 or the system 200. Operations of the method 800, and combinations of operations in the method 800, may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or a different device associated with the execution of software that includes one or more computer program instructions. The method 800 starts at operation 802.
At operation 802, the method 800 includes performing, by a processor 206, operations 802a, 802b, and 802c for each event of a plurality of events forming an input event sequence of an evaluation dataset. Each event corresponds to a payment transaction of a plurality of time-ordered transactions of a user. Each event includes a plurality of meta-features, a time data, and a mark data associated with the payment transaction.
At operation 802a, the method 800 includes generating, by the processor 206, a meta-feature embedding by feeding the plurality of meta-features to a pre-trained meta-features autoencoder.
At operation 802b, the method 800 includes converting, by the processor 206, the mark data to a numerical mark embedding using a mark embedding network.
At operation 802c, the method 800 includes generating, by the processor 206, an input event embedding by concatenating the time data, the numerical mark embedding, and the meta-feature embedding.
At operation 804, the method 800 includes forming, by the processor 206, an input event sequence embedding by combining each input event embedding.
At operation 806, the method 800 includes generating, by the processor 206, an event history vector by feeding the input event sequence embedding to a recurrent neural network model.
At operation 808, the method 800 includes computing, by the processor 206, a probability density function of a time data for a next event using a Gaussian Mixture Model (GMM). One or more parameters of the GMM are computed by feeding the event history vector to a neural network model.
At operation 810, the method 800 includes computing, by the processor 206, a probability mass function of a mark data for a next event by feeding the event history vector to another neural network model. The probability mass function of the mark data includes a categorical mark distribution.
At operation 812, the method 800 includes computing, by the processor 206, a joint probability density function of the time data and the mark data for predicting the next event using the computed probability density function of the time data and the computed probability mass function of the mark data. The method 800 ends at operation 812.
Various embodiments of the present disclosure provide multiple advantages. These include predicting when the next transaction is likely to take place, which helps in determining the propensity of attrition. Providing suitable offers and discounts helps businesses prevent customer attrition. Predicting the likelihood and nature of a next transaction helps identify potential customers suitable for a particular promotion or campaign. Targeted campaigning helps with a better conversion rate and return on investment. Prediction using the next best transaction is done at the customer level, keeping the transactional context in mind. Hyper-personalized recommendation results in increased customer interaction and loyalty. Pitching appropriate higher-value products increases the business’s revenue. Predicting top markers (e.g., MCC, merchant, industry) for the next transactions helps identify the most suitable locations for transactions. An increased number of transactions per unit time generates better net revenue as compared to discrete transactions.
FIG. 9 is a simplified block diagram of a server system 900, in accordance with one embodiment of the present disclosure. In one embodiment, the server system 900 is an example of a server system that includes a payment transaction event prediction system 902a. The payment transaction event prediction system 902a is the same as the system 120 shown and explained with reference to FIG. 1. In one embodiment, the server system 900 is the payment server 140 of FIG. 1. The server system 900 includes a processing system 902 configured to extract programming instructions from a memory 904 to provide various features of the present disclosure. In at least one embodiment, the processing system 902 is a Graphics Processing Unit (GPU). Further, two or more components may be embodied in one single component, and/or one component may be configured using multiple sub-components to achieve the desired functionalities. Some components of the server system 900 may be configured using hardware elements, software elements, firmware elements and/or a combination thereof. In one embodiment, the server system 900 is configured to predict payment transaction events using deep learning techniques.
Via a communication interface 906, the processing system 902 receives information from a remote device 908, such as the issuer server 135, the acquirer server 130, the merchant server 125, the transaction database 155, the payment server 140, and the like. The processing system 902 also includes the payment transaction event prediction system 902a. The server system 900 may perform operations similar to those performed by the system 200, such as generating the plurality of meta-features, generating the meta-feature embeddings, converting the mark data to numerical mark embeddings, forming an input event sequence embedding by combining each input event embedding, generating a representation vector, and computing a probability density function of a time data for a next event and a probability mass function of a mark data for the next event from the representation vector.
In other words, for a given payment instrument, such as a payment card, all transaction data over a predetermined time period is pulled. There may be multiple such events, where a varying number of transactions occur, for example, every year. Additional features are required to reveal characteristics. All these features, called the meta-features, are combined, and an embedding is generated out of the meta-features by feeding them to an unsupervised clustering model. Meta-features can be any attribute at the transaction level. That is the additional input that is fed to the RNN compared to a normal TPP process.
The components of the server system 900 provided herein may not be exhaustive, and the server system 900 may include more or fewer components than those depicted in FIG. 9.
The disclosed method with reference to FIG. 7 and FIG. 8, or one or more operations of the system 200 may be implemented using software including computer-executable instructions stored on one or more computer-readable media (e.g., non-transitory computer-readable media, such as one or more optical media discs, volatile memory components (e.g., DRAM or SRAM)), or nonvolatile memory or storage components (e.g., hard drives or solid-state nonvolatile memory components, such as Flash memory components) and executed on a computer (e.g., any suitable computer, such as a laptop computer, net book, Web book, tablet computing device, smartphone, or other mobile computing device). Such software may be executed, for example, on a single local computer or in a network environment (e.g., via the Internet, a wide-area network, a local-area network, a remote web-based server, a client-server network (such as a cloud computing network), or other such network) using one or more network computers. Additionally, any of the intermediate or final data created and used during implementation of the disclosed methods or systems may also be stored on one or more computer-readable media (e.g., non-transitory computer-readable media) and are considered to be within the scope of the disclosed technology. Furthermore, any of the software-based embodiments may be uploaded, downloaded, or remotely accessed through a suitable communication means. Such suitable communication means include, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber optic cable), magnetic communications, electromagnetic communications (including RF, microwave, and infrared communications), electronic communications, or other such communication means.
Although the invention has been described with reference to specific exemplary embodiments, it is noted that various modifications and changes may be made to these embodiments without departing from the broad spirit and scope of the invention. For example, the various operations, blocks, etc., described herein may be enabled and operated using hardware circuitry (for example, complementary metal-oxide semiconductor (CMOS) based logic circuitry), firmware, software, and/or any combination of hardware, firmware, and/or software (for example, embodied in a machine-readable medium). For example, the apparatuses and methods may be embodied using transistors, logic gates, and electrical circuits (for example, application-specific integrated circuit (ASIC) circuitry and/or in Digital Signal Processor (DSP) circuitry).
Particularly, the system 200 and its various components may be enabled using software and/or using transistors, logic gates, and electrical circuits (for example, integrated circuit circuitry such as ASIC circuitry). Various embodiments of the invention may include one or more computer programs stored or otherwise embodied on a computer-readable medium, wherein the computer programs are configured to cause a processor of the computer to perform one or more operations. A computer-readable medium storing, embodying, or encoded with a computer program, or similar language, may be embodied as a tangible data storage device storing one or more software programs that are configured to cause a processor or computer to perform one or more operations. Such operations may be, for example, any of the steps or operations described herein. In some embodiments, the computer programs may be stored and provided to a computer using any type of non-transitory computer-readable media. Non-transitory computer-readable media include any type of tangible storage media. Examples of non-transitory computer-readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g. magneto-optical disks), CD-ROM (compact disc read-only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (BLU-RAY® Disc), and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash memory, RAM (random access memory), etc.). Additionally, a tangible data storage device may be embodied as one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices. In some embodiments, the computer programs may be provided to a computer using any type of transitory computer-readable media. 
Examples of transitory computer-readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer-readable media can provide the program to a computer via a wired communication line (e.g., electric wires, and optical fibers) or a wireless communication line.
Various embodiments of the invention, as discussed above, may be practiced with steps and/or operations in a different order, and/or with hardware elements in configurations, which are different than those which are disclosed. Therefore, although the invention has been described based upon these exemplary embodiments, it is noted that certain modifications, variations, and alternative constructions may be apparent and well within the spirit and scope of the invention.
Although various exemplary embodiments of the invention are described herein in a language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as exemplary forms of implementing the claims.