Abstract: Embodiments provide methods and systems for scalable unsupervised graph representation learning for node embedding. The method performed by a server system includes accessing a homogeneous graph of historical interaction data associated with a set of entities from a database. The method includes segmenting the homogeneous graph into a plurality of sub-graphs based, at least in part, on one or more graph sampler methods. The method includes identifying positive and negative samples corresponding to each node based, at least in part, on a set of sampling methods. The method includes training a plurality of neural networks based, at least in part, on training data and a cumulative loss function. The method includes storing final node representations of the plurality of nodes after completion of the training process to determine prediction outputs for a plurality of graph context prediction tasks associated with the trained plurality of neural networks.
CLAIMS
We claim:
1. A computer-implemented method, comprising:
accessing, by a server system, a homogeneous graph of historical interaction data associated with a set of entities from a database, the homogeneous graph comprising a plurality of nodes;
segmenting, by the server system, the homogeneous graph into a plurality of sub-graphs based, at least in part, on one or more graph sampler methods;
identifying, by the server system, positive and negative samples corresponding to each node based, at least in part, on a set of sampling methods, the set of sampling methods comprising one of: a node neighborhood similarity method and a neighborhood feature similarity method;
training, by the server system, a plurality of neural networks based, at least in part, on training data and a cumulative loss function, the cumulative loss function being determined based on a graph-level loss function and a summation of a first node-level loss value and a second node-level loss value corresponding to each node; and
storing, by the server system, final node representations of the plurality of nodes after completion of a training process to determine prediction outputs for a plurality of graph context prediction tasks associated with the trained plurality of neural networks.
2. The computer-implemented method of claim 1, wherein the plurality of neural networks comprises graph neural network (GNN) based encoders, each GNN based encoder associated with a sub-graph of the plurality of sub-graphs, and wherein the training data comprises the positive and negative samples corresponding to each node and graph data corresponding to each sub-graph.
3. The computer-implemented method of claim 1, wherein training of the plurality of neural networks is performed by running a plurality of training epochs, and wherein each training epoch comprises executing a plurality of operations for an individual node associated with each sub-graph, the plurality of operations comprising:
generating, by the server system, first node representations of the individual node that is sampled in one or more sub-graphs of the plurality of sub-graphs, the first node representations of the individual node for the one or more sub-graphs generated based, at least in part, on corresponding graph neural network (GNN) based encoders associated with the one or more sub-graphs;
determining, by the server system, an attention context vector associated with the individual node based, at least in part, on an attention mechanism and the first node representations; and
aggregating, by the server system, the first node representations of the individual node generated for the one or more sub-graphs to determine an intermediate node representation of the individual node based, at least in part, on a non-linearity function and the attention context vector.
4. The computer-implemented method of claim 1, wherein the node neighborhood similarity method comprises:
identifying, by the server system, a first set of positive samples and a first set of negative samples corresponding to each node based, at least in part, on proximity and influence of direct neighbor nodes of each node in the homogeneous graph.
5. The computer-implemented method of claim 1, wherein the neighborhood feature similarity method comprises:
identifying, by the server system, a second set of positive samples and a second set of negative samples corresponding to each node based, at least in part, on node feature similarity at a sub-graph level.
6. The computer-implemented method of claim 4 or 5, further comprising:
calculating, by the server system, the first node-level loss value corresponding to each node based, at least in part, on a first loss function (L1), wherein the first loss function (L1) is defined with an objective such that a node representation of each node is in proximity of the first set of positive samples in an embedding space; and
calculating, by the server system, the second node-level loss value corresponding to each node based, at least in part, on a second loss function (L2), wherein the second loss function (L2) is defined with an objective such that the node representation of each node is in proximity of the second set of positive samples in the embedding space.
7. The computer-implemented method of claim 1, further comprising:
optimizing, by the server system, neural network parameters of the plurality of neural networks and attention context vectors of the plurality of nodes to reduce the cumulative loss function to a threshold value for the training data.
8. The computer-implemented method of claim 1, further comprising:
calculating, by the server system, a prior graph similarity distribution between the plurality of sub-graphs based, at least in part, on Weisfeiler-Lehman (WL) graph kernels, the prior graph similarity distribution calculated for identifying similar sub-graphs of the plurality of sub-graphs.
9. The computer-implemented method of claim 1, further comprising:
calculating, by the server system, a prior graph similarity distribution based, at least in part, on Weisfeiler-Lehman graph kernels, the prior graph similarity distribution calculated for identifying similar sub-graphs of the plurality of sub-graphs;
calculating, by the server system, a posterior graph similarity distribution based, at least in part, on cosine similarity between aggregated sub-graph representations of the plurality of sub-graphs; and
comparing, by the server system, the posterior graph similarity distribution with the prior graph similarity distribution for calculating the graph-level loss function based on KL divergence.
10. The computer-implemented method of claim 1, wherein the plurality of graph context prediction tasks comprises: (a) predicting fraudulent transactions, (b) predicting anti-money laundering activities, and (c) predicting transaction patterns in Peer to Peer (P2P) network based transactions.
11. A server system configured to perform the computer-implemented method as claimed in any of the claims 1-10.
FORM 2
THE PATENTS ACT 1970
(39 of 1970)
&
The Patent Rules 2003
COMPLETE SPECIFICATION
(refer section 10 & rule 13)
TITLE OF THE INVENTION:
METHODS AND SYSTEMS FOR SCALABLE UNSUPERVISED GRAPH REPRESENTATION LEARNING FOR NODE EMBEDDING
APPLICANT(S):
Name: MASTERCARD INTERNATIONAL INCORPORATED
Nationality: United States of America
Address: 2000 Purchase Street, Purchase, NY 10577, United States of America
PREAMBLE TO THE DESCRIPTION
The following specification particularly describes the invention and the manner in which it is to be performed.
DESCRIPTION
(See next page)
METHODS AND SYSTEMS FOR SCALABLE UNSUPERVISED GRAPH REPRESENTATION LEARNING FOR NODE EMBEDDING
TECHNICAL FIELD
The present disclosure relates to artificial intelligence systems and, more particularly, to electronic methods and complex processing systems for scalable unsupervised graph representation learning for node embedding.
BACKGROUND
Real-world issues can be abstracted into graph models, i.e., a network graph of nodes and edges. The network graph represents a large network of relationships between various entities. Generally, a network graph G contains a set of nodes V and a set of edges E. Each entity may be represented as a node or vertex, and the relationships or interconnections between these entities are represented as links or edges. An edge (u, v) ∈ E exists between a pair of nodes u and v if there is a relation between these nodes. The edge may further be undirected (i.e., to capture a symmetric relation between nodes) or directed (i.e., to capture an asymmetric relation between nodes). For example, an undirected graph may be used to represent relationships between friends of a person in a social network as the friendship is mutual, and a directed graph may be used to represent relationships between various web pages in the World Wide Web as one website may contain a hyperlink to connect to another website and so on. Furthermore, graphs may be classified as either homogeneous or heterogeneous. In homogeneous graphs, all the nodes represent entities of the same type (e.g., friends in a social network, etc.), and all the edges represent relationships of the same type (e.g., mutual friends). In heterogeneous graphs, the nodes may represent entities of different types (e.g., buyers, sellers, products, etc.) and the edges may represent relationships of different types (e.g., wants to buy, has bought, is selling, etc.).
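For illustration only (this example is not part of the claimed subject matter), the graph abstraction described above may be sketched as an adjacency-list structure; the node names and the set-based storage are illustrative assumptions:

```python
# Illustrative sketch of a graph G = (V, E) stored as an adjacency list.
# Undirected edges capture symmetric relations (e.g., mutual friendship);
# directed edges capture asymmetric relations (e.g., hyperlinks between web pages).
from collections import defaultdict

class Graph:
    def __init__(self, directed=False):
        self.directed = directed
        self.adj = defaultdict(set)   # node -> set of neighbor nodes

    def add_edge(self, u, v):
        self.adj[u].add(v)
        if not self.directed:
            self.adj[v].add(u)        # mirror the edge for a symmetric relation

    def neighbors(self, u):
        return self.adj[u]

# Homogeneous example: all nodes are of one type (friends in a social network).
g = Graph(directed=False)
g.add_edge("alice", "bob")
g.add_edge("bob", "carol")
print(sorted(g.neighbors("bob")))     # ['alice', 'carol']
```

A directed variant (`Graph(directed=True)`) would omit the mirrored edge, capturing asymmetric relations such as one web page linking to another.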
To extract effective information from any graph, learning representations (e.g., embeddings) of the nodes of the graph is an important and ubiquitous task. In general, representation learning refers to the ability to learn complex relations between the different nodes of the graph by mapping a high-dimensional graph structure to low-dimensional dense vectors (i.e., embeddings). The learned representations (i.e., embeddings) may further be used to perform tasks such as link prediction, analysis, and so on. However, it is extremely difficult to extract effective information from graphs in an unsupervised manner. In addition, the scale of such graphs leads to a problem in learning the representations, as a graph may contain a very large number of nodes (e.g., trillions of nodes). Further, the complex nature of unlabeled graphs makes it difficult to learn node representations for such graphs.
There are existing solutions such as the GraphSage model set forth in a paper titled “Inductive Representation Learning on Large Graphs” authored by William L. Hamilton et al., published in NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, December 2017, and the GraphSAINT model set forth in a paper titled “GraphSAINT: Graph Sampling Based Inductive Learning Method” authored by Hanqing Zeng et al., published in International Conference on Learning Representations, 2020, pp. 1–19. These existing solutions utilize different types of sampling strategies to scale the graph learning problem. In the GraphSage model, a local neighborhood sampling strategy is used based on either uniform sampling or importance sampling. However, using the local neighborhood sampling strategy does not fully resolve the problem of neighborhood explosion when the number of hops is increased to analyze the neighborhood nodes.
In view of the above discussion, there exists a technological need for a scalable unsupervised graph representation learning method for node embeddings.
SUMMARY
Various embodiments of the present disclosure provide methods and systems for scalable unsupervised graph representation learning for node embedding.
In an embodiment, a computer-implemented method is disclosed. The method includes accessing, by a server system, a homogeneous graph of historical interaction data associated with a set of entities from a database. The homogeneous graph includes a plurality of nodes. The method includes segmenting, by the server system, the homogeneous graph into a plurality of sub-graphs based, at least in part, on one or more graph sampler methods. The method includes identifying, by the server system, positive and negative samples corresponding to each node based, at least in part, on a set of sampling methods. The set of sampling methods includes one of: a node neighborhood similarity method and a neighborhood feature similarity method. The method includes training, by the server system, a plurality of neural networks based, at least in part, on training data and a cumulative loss function determined based on a graph-level loss function and a summation of a first node-level loss value and a second node-level loss value corresponding to each node. The method further includes storing, by the server system, final node representations of the plurality of nodes after completion of the training process, wherein the final node representations are used to determine prediction outputs for a plurality of graph context prediction tasks associated with the trained plurality of neural networks.
Other aspects and example embodiments are provided in the drawings and the detailed description that follows.
BRIEF DESCRIPTION OF THE FIGURES
For a more complete understanding of example embodiments of the present technology, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
FIG. 1A illustrates an exemplary representation of an environment related to at least some embodiments of the present disclosure;
FIG. 1B illustrates another exemplary representation of an environment related to at least some embodiments of the present disclosure;
FIG. 2 illustrates a simplified block diagram of a server system, in accordance with an embodiment of the present disclosure;
FIG. 3 is an example representation of communication data flow between various modules of the server system, in accordance with an embodiment of the present disclosure;
FIG. 4 illustrates a block diagram representation of training a plurality of neural networks included in an unsupervised graph representation learning model, in accordance with an embodiment of the present disclosure;
FIG. 5 is a flow chart of a training process for an unsupervised graph representation learning model, in accordance with an embodiment of the present disclosure;
FIGS. 6A and 6B, collectively, represent a flow diagram of a method for predicting fraudulent transactions based on the unsupervised graph representation learning model, in accordance with an embodiment of the present disclosure; and
FIG. 7 illustrates a flow diagram depicting a method for unsupervised graph representation learning for homogeneous graphs, in accordance with an embodiment of the present disclosure.
The drawings referred to in this description are not to be understood as being drawn to scale except if specifically noted, and such drawings are only exemplary in nature.
DETAILED DESCRIPTION
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure can be practiced without these specific details.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. The appearances of the phrase “in an embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
Moreover, although the following description contains many specifics for the purposes of illustration, anyone skilled in the art will appreciate that many variations and/or alterations to said details are within the scope of the present disclosure. Similarly, although many of the features of the present disclosure are described in terms of each other, or in conjunction with each other, one skilled in the art will appreciate that many of these features can be provided independently of other features. Accordingly, this description of the present disclosure is set forth without any loss of generality to, and without imposing limitations upon, the present disclosure.
The term "payment network", used herein, refers to a network or collection of systems used for the transfer of funds through the use of cash substitutes. Payment networks may use a variety of different protocols and procedures to process the transfer of money for various types of transactions. Transactions that may be performed via a payment network may include product or service purchases, credit purchases, debit transactions, fund transfers, account withdrawals, etc. Payment networks may be configured to perform transactions via cash-substitutes, which may include payment cards, letters of credit, checks, financial accounts, etc. Examples of networks or systems configured to perform as payment networks include those operated by such as Mastercard®.
The terms "embeddings" and "representations" are used interchangeably throughout the description and refer to a low-dimensional state or space in which high-dimensional vectors can be translated. More specifically, embeddings make it easier to perform machine learning analysis on high-dimensional vector formats.
The term "merchant", used throughout the description generally refers to a seller, a retailer, a purchase location, an organization, or any other entity that is in the business of selling goods or providing services, and it can refer to either a single business location or a chain of business locations of the same entity.
The terms "cardholder", “user”, and “customer” are used interchangeably throughout the description and refer to a person who holds a credit or a debit card that will be used by a merchant to perform a payment transaction.
OVERVIEW
Various embodiments of the present disclosure provide methods, systems, electronic devices, and computer program products for scalable unsupervised representation learning for graphs.
Conventional graph representation learning models have various limitations or drawbacks. For example, conventional graph representation learning models are non-scalable and cannot perform global representation learning of nodes considering common interests or connections of nodes.
There are a plethora of graph representation learning models, such as the GraphSage and GraphSAINT models noted above, that can be used for graph context prediction tasks. The GraphSage model introduces inductive learning on graphs by considering input features along with topology. Both supervised learning for tasks such as classification and unsupervised learning of node representations were defined. However, the GraphSage model performs uniform sampling and restricts the neighborhood size at every layer to a fixed value for low and fixed time complexity. The uniform sampling introduces bias and is not scalable when the entire neighborhood is considered for every node.
Further, the GraphSAINT model is a supervised algorithm that introduced sampling at the graph level by considering both bias and variance with decoupled sampling and training. However, the GraphSAINT model can learn node representations only when labels are available; hence, the GraphSAINT model cannot work in an unsupervised setting. Apart from the above problems, neighborhood explosion is also a predominant problem that limits the capability of any graph algorithm to scale to a higher number of nodes.
To overcome such problems or limitations, the present disclosure describes a server system that is configured to perform scalable unsupervised graph representation learning for graphs (i.e., homogeneous graphs or heterogeneous graphs). More specifically, the server system is configured to execute graph neural network (GNN) based encoders to learn node representations or embeddings for the graph in an unsupervised learning manner. More specifically, embodiments of the present disclosure disclose a method for learning node representations (i.e., embeddings) for each node of a homogeneous graph. In the case of a heterogeneous graph, the heterogeneous graph is first converted into homogeneous graphs. In this manner, embodiments of the present disclosure disclose a method for learning the node representations for the homogeneous graph or the heterogeneous graph.
In addition, the present disclosure segments the homogeneous graph into a plurality of sub-graphs to address the problem of scalability. Further, the node embeddings for each node of each sub-graph are obtained from GNN based encoders. Furthermore, the node embeddings for the nodes that come under various sub-graphs (i.e., if a single node is sampled in various sub-graphs) are aggregated based on an attention mechanism to determine an intermediate node embedding. Moreover, the present disclosure utilizes a cumulative loss function based on a graph-level loss function and node-level loss values.
The server system includes at least a processor and a memory. In one non-limiting example, the server system is a payment server. The server system is configured to access a homogeneous graph including a plurality of nodes from a database. In one embodiment, the homogeneous graph is stored as a graph data structure in the database. In addition, the homogeneous graph is generated based on historical interaction data associated with a set of entities. In one example, the set of entities may include cardholders, merchants, authors, papers, music titles, and the like. In one non-limiting example, the historical interaction data is payment transaction data of payment transactions performed between a plurality of cardholders and a plurality of merchants over a period of time (e.g., 1 year, 2 years, 5 years, etc.).
The server system is configured to segment the homogeneous graph into a plurality of sub-graphs based, at least in part, on one or more graph sampler methods. In one embodiment, the one or more graph sampler methods include a node sampler method for sampling nodes and an edge sampler method for sampling edges of the homogeneous graph. In one embodiment, the node sampler method defines that the probability of sampling a node 'u' is directly proportional to the degree of the node 'u', and the edge sampler method defines that the probability of sampling an edge 'e' existing between two nodes 'u' and 'v' is directly proportional to the sum of the inverses of the degrees of the two nodes 'u' and 'v'. It might be assumed that unsupervised learning brings bias and variance into the learned node representations, but this is not the case in the present disclosure. In the present disclosure, the bias problem is addressed by using the aggregator normalization constant α_(u,v) = p_(u,v)/p_v defined by the GraphSAINT model, and an approximate edge sampler is used to minimize variance across each dimension of the embedding as defined by the GraphSAINT model (e.g., p_e ∝ 1/deg(u) + 1/deg(v)).
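As a non-limiting illustration of the two sampler probabilities described above (node probability proportional to degree, and edge probability proportional to the sum of inverse degrees, following the GraphSAINT-style definitions), the toy adjacency list below is an assumption for illustration only:

```python
# Illustrative computation of graph sampler probabilities:
#   P(node u)      ∝ deg(u)
#   P(edge (u,v))  ∝ 1/deg(u) + 1/deg(v)

def degree(adj, u):
    return len(adj[u])

def node_sampler_probs(adj):
    # Probability of sampling node u is proportional to its degree.
    total = sum(len(nbrs) for nbrs in adj.values())
    return {u: len(nbrs) / total for u, nbrs in adj.items()}

def edge_sampler_probs(adj):
    # Probability of sampling edge (u, v) is proportional to 1/deg(u) + 1/deg(v).
    edges = {(u, v) for u in adj for v in adj[u] if u < v}   # undirected, deduplicated
    weights = {e: 1 / degree(adj, e[0]) + 1 / degree(adj, e[1]) for e in edges}
    z = sum(weights.values())
    return {e: w / z for e, w in weights.items()}

# Toy homogeneous graph: node 'a' connected to 'b' and 'c'.
adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
p_node = node_sampler_probs(adj)
p_edge = edge_sampler_probs(adj)
```

Here `p_node["a"]` is 0.5 (degree 2 out of a total degree of 4), and the two edges receive equal probability since their inverse-degree sums are identical.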
The server system is configured to identify positive and negative samples corresponding to each node based, at least in part, on a set of sampling methods. The set of sampling methods includes one of: node neighborhood similarity method and neighborhood feature similarity method. During the execution of the node neighborhood similarity method, the server system is configured to identify a first set of positive samples and a first set of negative samples corresponding to each node based, at least in part, on proximity and influence of direct neighbor nodes (i.e., first hop neighbors) of each node in the homogeneous graph.
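For illustration only, the node neighborhood similarity method described above can be sketched as follows; treating all first-hop neighbors as positives and drawing negatives from outside that neighborhood, with a fixed sample count k, are assumptions made for this sketch and not definitions taken from the disclosure:

```python
# Hypothetical sketch: positives from first-hop neighbors, negatives sampled
# from nodes outside the node's direct neighborhood.
import random

def neighborhood_samples(adj, node, k=2, seed=0):
    rng = random.Random(seed)
    positives = set(adj[node])                     # direct (first-hop) neighbors
    candidates = [v for v in adj if v != node and v not in positives]
    negatives = set(rng.sample(candidates, min(k, len(candidates))))
    return positives, negatives

adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
pos, neg = neighborhood_samples(adj, "a")
print(pos)  # {'b'}
```

A fuller implementation might also weight negatives by "influence" (e.g., degree) as the disclosure mentions proximity and influence of direct neighbors; that weighting is omitted here.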
In one embodiment, the server system is configured to generate an adjacency matrix and a feature matrix for each sub-graph of the plurality of sub-graphs. During the execution of the neighborhood feature similarity method, the server system is configured to identify a second set of positive samples and a second set of negative samples corresponding to each node based, at least in part, on node feature similarity at sub-graph level. The server system is configured to train a plurality of neural networks based, at least in part, on training data and a cumulative loss function. The cumulative loss function is determined based on a graph-level loss function, and a summation of the first node-level loss value and second node-level loss value corresponding to each node.
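A hedged illustration of the neighborhood feature similarity method: within a sub-graph, other nodes are ranked by cosine similarity of their feature vectors, treating the most similar as positives and the least similar as negatives. The top-k/bottom-k selection rule and the toy feature vectors are assumptions for illustration:

```python
# Hypothetical sketch of feature-similarity-based sampling at the sub-graph level.
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def feature_similarity_samples(features, node, k=1):
    # Rank all other nodes in the sub-graph by feature similarity to `node`.
    others = [v for v in features if v != node]
    ranked = sorted(others, key=lambda v: cosine(features[node], features[v]),
                    reverse=True)
    return set(ranked[:k]), set(ranked[-k:])   # (positives, negatives)

# Toy sub-graph feature matrix: one feature vector per node.
features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
pos, neg = feature_similarity_samples(features, "a")
```

For node "a", node "b" (nearly parallel features) is selected as the positive sample and node "c" (orthogonal features) as the negative sample.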
The training of the plurality of neural networks is performed by running a plurality of training epochs. Each training epoch includes executing a plurality of operations for an individual node associated with each sub-graph. The plurality of neural networks includes graph neural network (GNN) based encoders. Each GNN based encoder is associated with a sub-graph of the plurality of sub-graphs. The training data includes the positive and negative samples corresponding to each node and graph data corresponding to each sub-graph.
Initially, the server system is configured to execute a first operation to generate first node representations of the individual node that is sampled in one or more sub-graphs of the plurality of sub-graphs. The first node representations of the individual node for the one or more sub-graphs are generated based, at least in part, on corresponding graph neural network (GNN) based encoders associated with the one or more sub-graphs. In other words, the GNN based encoder is used as an aggregator to incorporate both topology and input features so that a latent representation for a node is obtained. Thus, when a node is sampled in multiple sub-graphs, an attention mechanism is used after the final GNN based encoder layer in every network to obtain a unique representation for the node from all of the above representations. In addition, the server system is configured to execute a second operation to determine an attention context vector associated with the individual node based, at least in part, on an attention mechanism and the first node representations. The server system is further configured to execute a third operation to aggregate the first node representations of the individual node generated for the one or more sub-graphs to determine an intermediate node representation of the individual node based, at least in part, on a non-linearity function and the attention context vector.
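A simplified, non-limiting sketch of the attention-based aggregation step: when a node is sampled in several sub-graphs, its per-sub-graph representations are combined into one intermediate representation using softmax attention scores against a context vector, followed by a non-linearity. The dot-product scoring and the choice of tanh are assumptions for illustration:

```python
# Hypothetical sketch: aggregate a node's representations from multiple
# sub-graph encoders into one intermediate representation via attention.
import math

def softmax(scores):
    m = max(scores)                                # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def aggregate(representations, context):
    # Attention score: dot product of each representation with the context vector.
    scores = [sum(r_i * c_i for r_i, c_i in zip(r, context)) for r in representations]
    weights = softmax(scores)
    dim = len(representations[0])
    mixed = [sum(w * r[i] for w, r in zip(weights, representations)) for i in range(dim)]
    return [math.tanh(x) for x in mixed]           # non-linearity on the weighted sum

reps = [[1.0, 0.0], [0.0, 1.0]]                    # representations from two sub-graphs
h = aggregate(reps, context=[1.0, 1.0])
```

With the symmetric inputs above, both representations receive attention weight 0.5, so the intermediate representation is tanh applied elementwise to their mean.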
Thereafter, the server system is configured to optimize neural network parameters (e.g., weights and biases) of the plurality of neural networks and attention context vectors of the plurality of nodes to reduce the cumulative loss function to a threshold value for the training data. After completion of the training process, the server system is configured to store final node representations of the plurality of nodes. Further, the final node representations are used to determine prediction outputs for a plurality of graph context prediction tasks associated with the trained plurality of neural networks. In one embodiment, the plurality of graph context prediction tasks includes: (a) predicting fraudulent transactions, (b) predicting anti-money laundering activities, and (c) predicting transaction patterns in P2P (Peer to Peer) network-based transactions.
The server system is further configured to calculate the first node-level loss value corresponding to each node based, at least in part, on a first loss function (L1). The first loss function (L1) is defined with an objective such that a node representation of each node is in the proximity of the first set of positive samples in an embedding space. Furthermore, the server system is configured to calculate the second node-level loss value corresponding to each node based, at least in part, on a second loss function (L2). The second loss function (L2) is defined with an objective such that the node representation of each node is in the proximity of the second set of positive samples in the embedding space.
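For illustration, one common form such a contrastive node-level objective takes (e.g., the unsupervised loss used by GraphSage) pulls a node's embedding toward its positive samples and pushes it away from its negative samples. The exact definitions of L1 and L2 in this disclosure may differ; the sketch below is only indicative:

```python
# Hypothetical sketch of a negative-sampling contrastive loss:
#   L = -sum log σ(z_u · z_pos) - sum log σ(-z_u · z_neg)
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def node_level_loss(z_u, positives, negatives):
    # Attraction term: positives should score high against z_u.
    loss = -sum(math.log(sigmoid(dot(z_u, z_p))) for z_p in positives)
    # Repulsion term: negatives should score low against z_u.
    loss -= sum(math.log(sigmoid(-dot(z_u, z_n))) for z_n in negatives)
    return loss

z_u = [1.0, 0.0]
loss = node_level_loss(z_u, positives=[[0.9, 0.1]], negatives=[[-0.8, 0.2]])
```

The loss decreases as the node's embedding aligns more closely with its positive samples, which is the stated objective of both L1 (neighborhood-based positives) and L2 (feature-similarity-based positives).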
In one embodiment, the server system is configured to calculate a prior graph similarity distribution between the plurality of sub-graphs to identify similar sub-graphs of the plurality of sub-graphs based, at least in part, on Weisfeiler-Lehman graph kernels. In addition, the server system is configured to calculate a posterior graph similarity distribution based, at least in part, on cosine similarity between aggregated sub-graph representations of the plurality of sub-graphs, to identify similar sub-graphs in the embedding space. Thereafter, the server system is configured to compare the posterior graph similarity distribution with the prior graph similarity distribution to calculate the graph-level loss function using KL divergence.
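A simplified sketch of the graph-level comparison: a prior similarity distribution over sub-graph pairs (e.g., derived from Weisfeiler-Lehman kernels) is compared against a posterior computed from embedding cosine similarities via KL divergence. The distributions below are placeholder values, and the WL-kernel computation itself is omitted for brevity:

```python
# Hypothetical sketch of the graph-level loss term: KL(prior || posterior)
# over matching sub-graph similarity distributions.
import math

def kl_divergence(prior, posterior):
    # Standard discrete KL divergence; terms with zero prior mass contribute 0.
    return sum(p * math.log(p / q) for p, q in zip(prior, posterior) if p > 0)

prior = [0.5, 0.3, 0.2]          # e.g., normalized WL-kernel similarities (placeholder)
posterior = [0.4, 0.4, 0.2]      # e.g., normalized cosine similarities of sub-graph embeddings
graph_level_loss = kl_divergence(prior, posterior)
```

Minimizing this term pushes the embedding-space (posterior) similarity structure of the sub-graphs toward the structural (prior) similarity computed on the raw graphs; it is zero exactly when the two distributions coincide.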
Various embodiments of the present disclosure offer multiple advantages and technical effects. For instance, the present disclosure employs multiple strategies to ensure the entire graph structure is captured even when positive and negative samples are sampled at the sub-graph scale. The present disclosure includes a cumulative loss function that incorporates these strategies and optimizes weights through the attention layer so that the overall (global) graph structure and the local graph structure are learned in an effective manner. In other words, the cumulative loss function ensures that the learned representation is optimal even though node labels are not present and the problem is unsupervised. The cumulative loss function incorporates local and global graph learning components that make the proposed algorithm effective even in the absence of node labels. Thus, the proposed solution devises a strategy to process big graphs (millions of nodes) by efficient graph sampling and by back-propagating a normalized unsupervised loss based on novel key metrics such as: positive/negative samples based on neighborhood proximity, positive/negative samples based on node attributes, and keeping the KL divergence of sub-graph similarity low.
Furthermore, the present disclosure provides significantly more robust solutions because of handling simultaneous/concurrent processor execution (such as applying one or more neural network models over the same or different inputs, simultaneously). Even further, the present disclosure improves the operation of processors because, by performing these synergistic operations to determine node representations of homogeneous or heterogeneous graphs that can be used for further downstream applications, the processors will require fewer computation cycles in learning node representations of the homogeneous or heterogeneous graphs.
Various example embodiments of the present disclosure are described hereinafter with reference to FIGS. 1A-1B to 7.
FIG. 1A illustrates an exemplary representation of an environment 100 related to at least some embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise depending on, for example, scalable unsupervised representation learning for graphs, etc. The environment 100 generally includes a server system 102, a set of entities 104a, 104b, and 104c, a graph database 106, and a database 108, each coupled to, and in communication with (and/or with access to) a network 110. The network 110 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber-optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among the entities illustrated in FIG. 1A, or any combination thereof.
Various entities in the environment 100 may connect to the network 110 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof. For example, the network 110 may include multiple different networks, such as a private network made accessible by the network 110 to the server system 102, and a public network (e.g., the Internet, etc.).
In one embodiment, the set of entities 104a-104c include real-world objects. In an example, the set of entities 104a-104c may include cardholders, users, authors, and the like. In another example, the set of entities 104a-104c may include merchants, movies, books, and the like. In addition, an entity of the set of entities 104a-104c may connect with or have a relationship with other entities of the set of entities 104a-104c. More specifically, an entity of the set of entities 104a-104c may be associated (in some way or the other) or interact with other entities of the set of entities 104a-104c.
The server system 102 is configured to perform one or more of the operations described herein. The server system 102 is configured to perform scalable unsupervised representation learning for graphs (i.e., homogeneous graphs or heterogeneous graphs). The server system 102 is a separate part of the environment 100 and may operate apart from (but still in communication with, for example, via the network 110), the set of entities 104a-104c, and any third-party external servers (to access data to perform the various operations described herein). However, in other embodiments, the server system 102 may actually be incorporated, in whole or in part, into one or more parts of the environment 100, for example, the entity 104a. In addition, the server system 102 should be understood to be embodied in at least one computing device in communication with the network 110, which may be specifically configured, via executable instructions, to perform as described herein, and/or embodied in at least one non-transitory computer-readable media.
The graph database 106 may securely store graph data. The graph database 106 may store topological graph data. The server system 102 is configured to access a homogeneous graph of historical interaction data associated with the set of entities 104a-104c from the graph database 106. In one embodiment, the server system 102 may access the historical interaction data of a plurality of interactions performed among the set of entities 104a-104c. In the homogeneous graph, the set of entities 104a-104c are represented as nodes, and interactions between the set of entities 104a-104c are represented as edges. In one example, the set of entities 104a-104c may represent a set of users, a set of products listed on an e-commerce website or platform, a set of authors, a set of books, or the like.
The server system 102 is configured to segregate the homogeneous graph into the plurality of sub-graphs. In one embodiment, each sub-graph may include a subset of the plurality of nodes of the homogeneous graph. In one embodiment, the graph database 106 stores data associated with the homogeneous graph and the plurality of sub-graphs. The server system 102 is further configured to identify positive samples and negative samples corresponding to each node associated with each sub-graph based, at least in part, on a set of sampling methods.
Furthermore, the server system 102 is configured to train a plurality of neural networks based, at least in part, on training data (i.e., the positive samples and the negative samples corresponding to each node) and a cumulative loss function. To train the plurality of neural networks, the server system 102 is configured to execute a plurality of operations for an individual node associated with each sub-graph. The server system 102 is configured to generate intermediate node embeddings of the plurality of nodes included in the plurality of sub-graphs based on graph neural network (GNN) encoders stored in the database 108. In one embodiment, the database 108 provides a storage location for data associated with the GNN based encoders 108a.
The environment 100 may further include a client computer 112 including any suitable device external to the server system 102. In some embodiments, the client computer 112 may receive outputs and/or decisions made by the server system 102. In one embodiment, the client computer 112 may send a prediction request to the server system 102. The prediction request may include information on the graph context prediction task selected by a client. Upon receiving the prediction request, the server system 102 may execute an unsupervised graph representation learning model and predict an output corresponding to the graph context prediction task based on learned node representations.
FIG. 1B illustrates another exemplary representation of an environment 120 related to at least some embodiments of the present disclosure. Although the environment 120 is presented in one arrangement, other embodiments may include the parts of the environment 120 (or other parts) arranged otherwise depending on, for example, scalable unsupervised representation learning for graphs (i.e., homogeneous graphs or heterogeneous graphs), etc. The environment 120 generally includes a server system 122, a plurality of cardholders 124a, 124b, and 124c, a plurality of merchants 126a, 126b, and 126c, a database 128, a graph database 132, an issuer server 134, an acquirer server 136, and a payment network 138 including a payment server 140, each coupled to, and in communication with (and/or with access to) a network 130. The network 130 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber-optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among the entities illustrated in FIG. 1B, or any combination thereof.
Various entities in the environment 120 may connect to the network 130 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof. For example, the network 130 may include multiple different networks, such as a private network made accessible by the network 130 to the server system 122, and a public network (e.g., the Internet, etc.).
The plurality of cardholders 124a-124c includes a list of cardholders that may have performed a payment transaction through a payment instrument (e.g., payment card, payment wallet, payment account, etc.) at the plurality of merchants 126a-126c. In one embodiment, the payment account of the plurality of cardholders 124a-124c may be associated with an issuing bank (e.g., the issuer server 134). In one example, the plurality of cardholders 124a-124c may have utilized the payment instruments to perform the payment transactions at the plurality of merchants 126a-126c (e.g., payment terminals associated with the merchants 126a-126c, a merchant website, etc.).
In one embodiment, the plurality of cardholders 124a-124c may have performed the payment transaction online (i.e., by accessing the merchant's website on a web browser or an application installed in a computer system) or offline (i.e., by performing the payment transaction on a payment terminal (e.g., a point-of-sale (POS) device, an automated teller machine (ATM), etc.) installed in a facility). In a successful payment transaction, a payment amount gets debited from the payment account of the cardholders 124a-124c (e.g., the cardholder 124a) and gets credited to the payment account of the merchants 126a-126c (e.g., the merchant 126a). In one embodiment, the payment account of the merchants 126a-126c may be associated with an acquirer bank (e.g., the acquirer server 136).
In one embodiment, the issuer server 134 is associated with a financial institution normally called an "issuer bank" or "issuing bank" or simply "issuer", in which a cardholder (e.g., the cardholders 124a-124c) may have a payment account, (which also issues a payment card, such as a credit card or a debit card), and provides microfinance banking services (e.g., payment transaction using credit/debit cards) for processing electronic payment transactions, to the cardholder (e.g., the cardholders 124a-124c).
In one embodiment, the acquirer server 136 is associated with a financial institution (e.g., a bank) that processes financial transactions. This can be an institution that facilitates the processing of payment transactions for physical stores, merchants, or an institution that owns platforms that make online purchases or purchases made via software applications possible (e.g., shopping cart platform providers and in-app payment processing providers). The terms "acquirer", "acquiring bank", or "acquirer server" will be used interchangeably herein.
The server system 122 is configured to perform one or more of the operations described herein. The server system 122 is configured to perform scalable unsupervised representation learning for graphs (i.e., homogeneous graphs or heterogeneous graphs). In an embodiment, the server system 122 is identical to the server system 102 of FIG. 1A. In another embodiment, the server system 122 is the payment server 140. The server system 122 is a separate part of the environment 120 and may operate apart from (but still in communication with, for example, via the network 130), the plurality of cardholders 124a-124c, the plurality of merchants 126a-126c, and any third-party external servers (to access data to perform the various operations described herein). However, in other embodiments, the server system 122 may actually be incorporated, in whole or in part, into one or more parts of the environment 120, for example, the cardholder 124a. In addition, the server system 122 should be understood to be embodied in at least one computing device in communication with the network 130, which may be specifically configured, via executable instructions, to perform as described herein, and/or embodied in at least one non-transitory computer-readable media.
The server system 122 is configured to access a homogeneous graph of historical payment transaction data associated with the set of entities 104a-104c of FIG. 1A from the database 128. The database 128 may include historical payment transaction data including a plurality of payment transactions performed between the plurality of cardholders 124a-124c and the plurality of merchants 126a-126c. In one embodiment, the historical payment transactions may be performed between the plurality of cardholders 124a-124c and the plurality of merchants 126a-126c over a time period (e.g., 1 year, 2 years, 7 years, etc.).
In one embodiment, the server system 122 is configured to access a heterogeneous graph including the plurality of cardholders 124a-124c and the plurality of merchants 126a-126c represented as nodes, and historical payment transactions performed between the plurality of cardholders 124a-124c and the plurality of merchants 126a-126c represented as edges. The server system 122 is further configured to convert the heterogeneous graph into the homogeneous graph based on methods already known in the art. Thus, the server system 122 is configured to generate the homogeneous graph including the set of entities 104a-104c (i.e., either the cardholders 124a-124c or the merchants 126a-126c) from the heterogeneous graph.
The server system 122 is further configured to sample the homogeneous graph into the plurality of sub-graphs based, at least in part, on one or more graph sampler methods. The one or more graph sampler methods may include a node sampler method for sampling nodes and an edge sampler method for sampling edges of the homogeneous graph. Furthermore, the server system 122 is configured to identify positive samples and negative samples corresponding to each node associated with each sub-graph. The positive samples and the negative samples are identified based, at least in part, on a set of sampling methods such as, node neighborhood similarity method, neighborhood feature similarity method, and the like.
Moreover, the server system 122 is configured to train a plurality of neural networks to determine an intermediate node embedding for an individual node associated with each sub-graph. The individual node may get sampled in a single sub-graph or in one or more sub-graphs of the plurality of sub-graphs. In one embodiment, a number of the plurality of neural networks is equal to a number of the sub-graphs. For example, if the homogeneous graph is sampled into 57 sub-graphs, the server system 122 is configured to train 57 neural networks in parallel to perform the unsupervised representation learning for the homogeneous graph.
Initially, the server system 122 is configured to generate node embeddings of the individual node that is sampled in the one or more sub-graphs of the plurality of sub-graphs based, at least in part, on the corresponding graph neural network (GNN) based encoders associated with the one or more sub-graphs. In one embodiment, each GNN based encoder is configured to generate the node embeddings for each sub-graph of the plurality of sub-graphs. In addition, each GNN based encoder uses an aggregator function, such as graph convolutional network (GCN) function, mean function, long short-term memory (LSTM) function, pooling function, or any other similar aggregator function.
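A minimal sketch of one such GNN encoder layer with a mean aggregator follows; the concatenation of self and neighborhood features, the single linear transform, and the ReLU non-linearity are illustrative assumptions rather than the specific encoder of the disclosure:

```python
def gnn_mean_layer(features, adj, weights):
    """One GNN encoder layer with a mean aggregator: each node's new
    embedding combines its own features with the mean of its neighbors'
    features, followed by a linear transform and a ReLU non-linearity."""
    new_embeddings = {}
    for node, feat in features.items():
        neigh = adj.get(node, [])
        if neigh:
            mean = [sum(features[n][i] for n in neigh) / len(neigh)
                    for i in range(len(feat))]
        else:
            mean = [0.0] * len(feat)
        combined = feat + mean  # concatenate self and neighborhood features
        # linear transform (one weight row per output dimension) + ReLU
        new_embeddings[node] = [
            max(0.0, sum(w * x for w, x in zip(row, combined)))
            for row in weights
        ]
    return new_embeddings
```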
The server system 122 is further configured to determine an attention vector associated with the individual node based, at least in part, on an attention mechanism and the node embeddings. In one embodiment, the attention mechanism is used to assign a weightage value or importance value to the corresponding node embeddings of the individual node generated for the one or more sub-graphs of the plurality of sub-graphs. Furthermore, the server system 122 is configured to aggregate the corresponding node embeddings of the individual node (i.e., sampled in the one or more sub-graphs) to determine the intermediate node representation of the individual node based, at least in part, on a non-linearity function (e.g., multi-layer perceptron (MLP) function) and the attention vector. In one embodiment, the graph database 132 provides a storage location for data associated with the GNN based encoders. The server system 122 is also configured to update neural network parameters (e.g., weights and biases) of the plurality of neural networks to reduce a cumulative loss function (explained in detail hereinafter with reference to FIG. 4) until it reaches a threshold value for the training data.
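The attention-weighted aggregation described above may be sketched as follows, where scoring each per-sub-graph embedding against a learned query vector, a softmax over the scores, and a tanh standing in for the MLP non-linearity are illustrative assumptions:

```python
import math

def attention_aggregate(embeddings, query):
    """Aggregate the embeddings of one node produced by different sub-graph
    encoders, weighted by softmax attention scores computed against a
    (learned) query vector. Returns the aggregated representation and the
    attention vector."""
    scores = [sum(q * x for q, x in zip(query, e)) for e in embeddings]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    attn = [e / total for e in exps]          # attention vector (sums to 1)
    dim = len(embeddings[0])
    agg = [sum(attn[k] * embeddings[k][i] for k in range(len(embeddings)))
           for i in range(dim)]
    # simple non-linearity standing in for the MLP mentioned above
    return [math.tanh(x) for x in agg], attn
```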
In one embodiment, the payment network 138 may be used by the payment card issuing authorities as a payment interchange network. The payment network 138 may include a plurality of payment servers such as the payment server 140. Examples of payment interchange networks include, but are not limited to, Mastercard® payment system interchange network. The Mastercard® payment system interchange network is a proprietary communications standard promulgated by Mastercard International Incorporated® for the exchange of financial transactions among a plurality of financial activities that are members of Mastercard International Incorporated®. (Mastercard is a registered trademark of Mastercard International Incorporated located in Purchase, N.Y.).
The number and arrangement of systems, devices, and/or networks shown in FIG. 1B is provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1B. Furthermore, two or more systems or devices shown in FIG. 1B may be implemented within a single system or device, or a single system or device shown in FIG. 1B may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 120 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 120.
Referring now to FIG. 2, a simplified block diagram of a server system 200 is shown, in accordance with an embodiment of the present disclosure. The server system 200 is similar to the server system 102 or the server system 122. In some embodiments, the server system 200 is embodied as a cloud-based and/or SaaS-based (software as a service) architecture.
The server system 200 includes a computer system 202 and a database 204. The computer system 202 includes at least one processor 206 for executing instructions, a memory 208, a communication interface 210, and a storage interface 214 that communicate with each other via a bus 212.
In some embodiments, the database 204 is integrated within the computer system 202. For example, the computer system 202 may include one or more hard disk drives as the database 204. The storage interface 214 is any component capable of providing the processor 206 with access to the database 204. The storage interface 214 may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a RAID controller, a SAN adapter, a network adapter, and/or any component providing the processor 206 with access to the database 204. In one embodiment, the database 204 is configured to store graph neural network (GNN) based encoders.
Examples of the processor 206 include, but are not limited to, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a field-programmable gate array (FPGA), and the like. The memory 208 includes suitable logic, circuitry, and/or interfaces to store a set of computer-readable instructions for performing operations. Examples of the memory 208 include a random-access memory (RAM), a read-only memory (ROM), a removable storage drive, a hard disk drive (HDD), and the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the memory 208 in the server system 200, as described herein. In another embodiment, the memory 208 may be realized in the form of a database server or cloud storage working in conjunction with the server system 200, without departing from the scope of the present disclosure.
The processor 206 is operatively coupled to the communication interface 210 such that the processor 206 is capable of communicating with a remote device 216 such as, the payment server 140, or communicating with any entity connected to the network 110 (as shown in FIG. 1A) or the network 130 (as shown in FIG. 1B). In one embodiment, the processor 206 is configured to access the homogeneous graph of historical interaction data (e.g., historical payment transaction data) associated with the set of entities 104a-104c from the database 108.
It is noted that the server system 200 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the server system 200 may include fewer or more components than those depicted in FIG. 2.
In one embodiment, the processor 206 includes a data pre-processing engine 218, a graph creation engine 220, a graph sampling engine 222, a neural network engine 224, an attention module 226, and a prediction module 228. It should be noted that components, described herein, such as the data pre-processing engine 218, the graph creation engine 220, the graph sampling engine 222, and the neural network engine 224 can be configured in a variety of ways, including electronic circuitries, digital arithmetic and logic blocks, and memory systems in combination with software, firmware, and embedded technologies. In one embodiment, the processor 206 is configured to run or execute an algorithm (stored in the GNN based encoders) to perform scalable unsupervised representation learning for the homogenous graphs.
With reference to the FIG. 1A, the data pre-processing engine 218 is configured to access historical interaction data including a plurality of interactions associated with the set of entities 104a-104c from a database (e.g., the database 108) for a period of time (e.g., 1 month, 6 months, 1 year, 2 years, etc.). In an example, the historical interaction data includes interactions or relationships between the set of entities 104a-104c.
With reference to FIG. 1B, the data pre-processing engine 218 is configured to access historical payment transaction data including a plurality of payment transactions performed between the plurality of cardholders 124a-124c from the database 128 for the period of time (e.g., 1 month, 6 months, 1 year, 2 years, etc.). In another example, the data pre-processing engine 218 is configured to access historical payment transaction data including a plurality of payment transactions performed between the plurality of merchants 126a-126c from the database 128 for the period of time (e.g., 1 month, 6 months, 1 year, 2 years, etc.). In one non-limiting example, the historical transaction data includes information such as a merchant name identifier, a unique merchant identifier, a timestamp, geo-location data, information of the payment instrument involved in the payment transaction, and the like. The historical transaction data defines relationships between cardholder accounts and merchants. For example, when a cardholder purchases an item from a merchant, a relationship is defined. Thus, the historical transaction data can be leveraged to expose a variety of different attributes of the accounts, such as account activity, customer preferences, similarity to other accounts, and the like. However, the historical transaction data is sparse, as any given cardholder account (which includes merchant accounts that perform transactions with other merchants) interacts with only a small fraction of merchants. Similarly, any given merchant may interact with only a fraction of the cardholder accounts. Therefore, the historical transaction data implicitly creates a bipartite graph between accounts.
In one embodiment, the data pre-processing engine 218 is configured to perform operations (such as data-cleaning, normalization, feature extraction, and the like) on the historical payment transaction data. The data pre-processing engine 218 may use natural language processing (NLP) algorithms to extract a plurality of graph features based on the historical payment transaction data. The plurality of graph features is used to generate a heterogeneous transaction graph.
In one embodiment, the plurality of graph features may include, but is not limited to, geo-location data associated with the payment transactions, population density, transaction velocity (i.e., frequency of financial transactions among the cardholders), historical fraud data, and transaction history. The plurality of graph features is converted into a plurality of feature vectors.
The graph creation engine 220 includes suitable logic and/or interfaces for generating a homogeneous graph based, at least in part, on the plurality of graph features (i.e., the plurality of feature vectors) identified from the historical interaction data, where each entity belongs to the same type. The homogeneous graph represents a computer-based graph representation of a set of entities (e.g., the set of entities 104a-104c, etc.) as nodes. In addition, interactions/relationships between the nodes are represented as edges (i.e., weighted or unweighted).
In one embodiment, heterogeneous graphs can be accessed according to the requirements of downstream tasks when the entities are of various types. For example, if the downstream tasks are to recommend communities to relevant users on the network platform, the heterogeneous graph may be a graph network constructed based on social behaviors of all users on the network platform, relations among the users, and relations between users and communities. A user's social behaviors may include, for example, published articles, comments on articles published by other users, and participating communities. The heterogeneous graph includes various types of nodes such as users, communities, articles, and comments. In another example, the graph creation engine 220 is configured to generate a heterogeneous transaction graph including the plurality of cardholders 124a-124c and the plurality of merchants 126a-126c based, at least in part, on the plurality of graph features (i.e., the plurality of feature vectors) identified from the historical payment transaction data. In the case of the heterogeneous transaction graph, the graph creation engine 220 is configured to convert the heterogeneous graph into a homogeneous graph based on the common link between either two merchants or two payment cards.
The heterogeneous transaction graph may represent a computer-based graph representation of the plurality of cardholders 124a-124c and the plurality of merchants 126a-126c as nodes. In addition, interactions/relationships between the cardholders 124a-124c and the merchants 126a-126c may be represented as edges (i.e., weighted or unweighted). The heterogeneous transaction graph is further converted into one or more homogeneous graphs, where each homogeneous graph includes only a single set of entities (e.g., only the plurality of cardholders 124a-124c, or only the plurality of merchants 126a-126c) as nodes.
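One common way to perform such a conversion (assumed here for illustration; the disclosure only states that known methods are used) is a bipartite projection that links two merchants whenever at least one cardholder has transacted with both:

```python
from itertools import combinations
from collections import defaultdict

def project_bipartite(transactions):
    """Project a bipartite cardholder-merchant transaction graph onto a
    homogeneous merchant-merchant graph: two merchants are linked when at
    least one cardholder transacted with both, and the edge weight equals
    the number of shared cardholders."""
    merchants_by_card = defaultdict(set)
    for cardholder, merchant in transactions:
        merchants_by_card[cardholder].add(merchant)
    edges = defaultdict(int)
    for merchants in merchants_by_card.values():
        for m1, m2 in combinations(sorted(merchants), 2):
            edges[(m1, m2)] += 1
    return dict(edges)
```

The same projection, applied over common merchants instead, yields a cardholder-cardholder homogeneous graph.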
In one embodiment, the plurality of feature vectors for a particular node may be represented as the plurality of node feature vectors. Additionally, in the case of the homogeneous graph, the plurality of node feature vectors for all the nodes share an identical set of data features because all the nodes belong to a single entity type (e.g., the set of entities 104a-104c) in the homogeneous graph. In an example, the plurality of data features associated with the cardholders 124a-124c may include demographic details such as name, age, gender, occupation details, employment details, average card spend, etc. In another example, the plurality of data features associated with the merchants 126a-126c may include merchant configuration profile, merchant identifier, chargeback data, etc.
The graph creation engine 220 generates the homogeneous graph that associates the set of entities 104a-104c (i.e., nodes) with each other using one or more relationships (i.e., edges). In an example, the homogeneous graph may include the nodes (e.g., nodes relating to the payment instruments associated with the plurality of cardholders 124a-124c) and edges (e.g., edges representing payment transactions performed among the plurality of cardholders 124a-124c). In another example, the homogeneous graph may include the nodes (e.g., nodes relating to the merchant identifiers associated with the plurality of merchants 126a-126c) and edges (e.g., edges representing payment transactions performed among the plurality of merchants 126a-126c). The homogeneous graph is a node-based structure including a plurality of nodes (representing the set of entities 104a-104c) interconnected with each other using edges.
In some embodiments, the homogeneous graph may include metadata associated with the nodes and/or information identifying the one or more relationships (for example, payment transactions, etc.) among the nodes. In addition, the homogeneous graph gets modified or updated with time. In an example, each edge of the homogeneous graph represents a payment transaction performed between a cardholder (e.g., the cardholder 124a) and other cardholders (e.g., the cardholder 124c). In another example, each edge of the homogeneous graph represents a payment transaction performed between a merchant (e.g., the merchant 126a) and other merchants (e.g., the merchant 126c).
The graph sampling engine 222 includes suitable logic and/or interfaces for segmenting the homogenous graph into a plurality of sub-graphs in which each sub-graph includes a subset of a plurality of nodes that are included in the homogenous graph. The graph sampling engine 222 implements one or more graph sampler methods. Because the homogenous graph is segmented into the plurality of sub-graphs, the same individual node (e.g., a single node 'u') may appear in more than one sub-graph of the plurality of sub-graphs. For example, the node 'u' may appear in the second, sixth, and eleventh sub-graphs of the plurality of sub-graphs.
Examples of the one or more graph sampler methods may include a node sampler method, an edge sampler method, or any other similar sampler method. In one embodiment, the one or more graph sampler methods are used to create normalized adjacency matrices and feature matrices for each sub-graph of the plurality of sub-graphs. In one embodiment, the normalized adjacency matrix is created to remove bias introduced due to the sampling of the homogeneous graph.
In one embodiment, a normalized adjacency matrix and a feature matrix are created for each sub-graph of the plurality of sub-graphs. For example, an adjacency matrix A1 and a feature matrix F1 are created for a sub-graph S1, an adjacency matrix A2 and a feature matrix F2 are created for a sub-graph S2, an adjacency matrix A3 and a feature matrix F3 are created for a sub-graph S3, and so on.
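A sketch of constructing the two matrices for one sub-graph is given below; representing the sub-graph as dense Python lists and normalizing each adjacency row by the node's degree are illustrative assumptions:

```python
def subgraph_matrices(nodes, edges, node_features):
    """Build a row-normalized adjacency matrix A and a feature matrix F for
    one sub-graph. Dividing each row by the node's degree is one common
    normalization used to offset sampling bias."""
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    adj = [[0.0] * size for _ in range(size)]
    for u, v in edges:                     # undirected edges
        adj[index[u]][index[v]] = 1.0
        adj[index[v]][index[u]] = 1.0
    for row in adj:                        # row normalization by degree
        degree = sum(row)
        if degree:
            for j in range(size):
                row[j] /= degree
    feat = [node_features[n] for n in nodes]   # stack node feature vectors
    return adj, feat
```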
In one embodiment, the number of the plurality of sub-graphs segmented from the homogeneous graph is in a range of 1 to 'n', where 'n' is a natural number. In one embodiment, the graph sampling engine 222 executes the node sampler method for sampling nodes. According to the node sampler method, the probability of sampling a node 'u' is directly proportional to the degree of the node 'u'. In general, the degree of a node is the number of connections that a node has to other nodes in a network. In one embodiment, the graph sampling engine 222 executes the edge sampler method for sampling edges. According to the edge sampler method, the probability of sampling an edge 'e' existing between two nodes 'u' and 'v' is directly proportional to the sum of the inverse of degrees of the two nodes 'u' and 'v'.
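The two sampling probabilities described above can be sketched as unnormalized weights; normalizing the weights into probabilities is omitted for brevity:

```python
def degree(adj, node):
    """Degree of a node = number of connections it has to other nodes."""
    return len(adj.get(node, []))

def node_sampler_weights(adj):
    """Node sampler: the probability of sampling a node 'u' is directly
    proportional to the degree of 'u'."""
    return {u: degree(adj, u) for u in adj}

def edge_sampler_weights(adj, edges):
    """Edge sampler: the probability of sampling an edge (u, v) is directly
    proportional to the sum of the inverse degrees of 'u' and 'v'."""
    return {(u, v): 1.0 / degree(adj, u) + 1.0 / degree(adj, v)
            for u, v in edges}
```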
Further, the graph sampling engine 222 is configured to identify positive and negative samples corresponding to each node in a particular sub-graph based, at least in part, on a set of sampling methods. The set of sampling methods includes a node neighborhood similarity method and a neighborhood feature similarity method. Details of the set of sampling methods are described with reference to FIG. 3. The positive samples include the first and second sets of positive samples identified by the node neighborhood similarity method and the neighborhood feature similarity method, respectively. The negative samples include the first and second sets of negative samples identified by the node neighborhood similarity method and the neighborhood feature similarity method, respectively.
In particular, the graph sampling engine 222 executes the node neighborhood similarity method to identify the first set of positive samples and the first set of negative samples corresponding to each node based, at least in part, on the proximity and influence of direct neighbor nodes (i.e., one-hop neighbor nodes) of each node in the homogeneous graph. More specifically, according to the node neighborhood similarity method, the first set of positive samples of a node 'u' includes those direct neighbor nodes or one-hop neighbor nodes of the node 'u' with the least degree. Similarly, the first set of negative samples of the node 'u' includes the nodes with the highest degree.
In one embodiment, the graph sampling engine 222 executes the neighborhood feature similarity method to identify the second set of positive samples and the second set of negative samples for each node in each sub-graph according to node features. As mentioned earlier, the graph sampling engine 222 is configured to generate the adjacency matrix and the feature matrix for each sub-graph of the plurality of sub-graphs. In one embodiment, the graph sampling engine 222 is configured to analyze the generated feature matrix to identify the second set of positive and negative samples. More specifically, the neighborhood feature similarity method identifies the second set of positive and negative samples of a node based on the node feature vectors of the node. The steps for executing the neighborhood feature similarity method are explained in detail with reference to FIG. 4 and, therefore, are not reiterated herein for the sake of brevity.
In one embodiment, the graph sampling engine 222 is configured to execute the sub-graph similarity distribution method to calculate a prior graph similarity distribution among the sub-graphs based, at least in part, on a Weisfeiler-Lehman graph kernel based methodology. The prior graph similarity distribution is calculated for identifying similar sub-graphs of the plurality of sub-graphs. More specifically, the prior graph similarity distribution is calculated for identifying the sub-graphs that are similar in structure to each other.
The neural network engine 224 includes suitable logic and/or interfaces for training a plurality of neural networks based, at least in part, on training data and a cumulative loss function. The training data includes positive and negative samples (including the first and second sets of positive and negative samples) of each node generated based on the set of sampling methods. In one example, each node may be represented as a one-hot encoded vector. The plurality of neural networks may include graph neural network (GNN) based encoders. In one embodiment, each GNN based encoder includes an aggregator function. In one non-limiting example, the GNN based encoder includes a graph convolutional network (GCN) aggregator function. In other embodiments, the GNN based encoder includes a mean aggregator function, a long short-term memory (LSTM) aggregator function, a pooling aggregator function, or any other similar aggregator function.
Each GNN based encoder is configured to generate node representations of nodes included in a particular sub-graph. In one embodiment, a number of the plurality of neural networks is equal to a number of the plurality of sub-graphs. In addition, each sub-graph of the plurality of sub-graphs may be passed as input through each neural network of the plurality of neural networks. The plurality of neural networks is trained in such a way that both the local sub-graph structure and the global structure at the node level and the sub-graph level are learned. As a possible implementation, neural network parameters of the plurality of neural networks can be trained according to the node representations of each node and the training data by calculating gradient values of all weight matrices of the plurality of neural networks based on a gradient descent algorithm. Therefore, with the gradient descent algorithm and the unsupervised learning technique, less content is required to be memorized by the model, which simplifies the training process.
It should be noted that the process of training the neural network parameters of the plurality of neural networks is an iterative process. By calculating the cumulative loss function, the neural network parameters of the plurality of neural networks are continuously updated until the neural network parameters converge and the model training is complete.
During the training process, the neural network engine 224 is configured to generate intermediate node representations of each individual node included in each sub-graph in an iterative manner. In addition, the neural network engine 224 is configured to train the plurality of neural networks to determine a sub-graph representation for each sub-graph of the plurality of sub-graphs.
To train the plurality of neural networks, the neural network engine 224 is configured to execute a plurality of operations for an individual node associated with each sub-graph. In one non-limiting example, for an ith individual node, k first node representations of the ith individual node are generated, where i and k are positive integers. The first node representations or node embeddings are generated using GNN based encoders corresponding to k number of sub-graphs in which the ith individual node is present.
To aggregate the k first node representations of the ith individual node, an attention context vector is determined via an attention module 226. The attention module 226 is configured to learn the relative importance of each k first node representations based on an attention mechanism.
Based on the attention context vector, the k first node representations of the ith individual node are aggregated to generate an intermediate node representation of the ith individual node.
In other words, the processor 206 is configured to aggregate the first node representations of the individual node generated for the one or more sub-graphs of the plurality of sub-graphs to determine the intermediate node representation of the individual node based, at least in part, on a non-linearity function (i.e., multi-layer perceptron (MLP) function) and the attention context vector.
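The attention-based aggregation described above can be sketched as follows. This is a minimal illustration, assuming a softmax attention over a learned context vector and a single-layer tanh MLP as the non-linearity function; the parameter names q, W, and b are hypothetical placeholders for learned parameters, not names prescribed by the embodiment.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_representations(reps, q, W, b):
    """Aggregate the k first node representations (k x d) of one node.

    q : (d,) attention context vector (learned); W, b : parameters of a
    single-layer MLP used as the non-linearity function. All three are
    illustrative placeholders for learned parameters.
    """
    alpha = softmax(reps @ q)       # relative importance of each sub-graph view
    pooled = alpha @ reps           # (d,) attention-weighted sum of the k views
    return np.tanh(W @ pooled + b)  # intermediate node representation
```

A node sampled in only one sub-graph would skip this step, as noted below, since there is a single representation to begin with.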
It is to be noted that the intermediate node representations for the nodes that are sampled in only a single sub-graph may be obtained directly through the corresponding GNN based encoder that processes the sub-graph containing the node. Therefore, it may not be necessary to determine the attention vector for such nodes and perform the aggregation step.
According to one aspect, during the training process, the neural network engine 224 is configured to update neural network parameters (e.g., weights and biases) of the plurality of neural networks and attention context vectors of the plurality of nodes to reduce the cumulative loss function until it reaches a threshold value for the training data. The neural network engine 224 is configured to determine the cumulative loss function based, at least in part, on a graph-level loss function and a summation of the first node-level loss value (L1v) and the second node-level loss value (L2v) corresponding to each node.
In one embodiment, the processor 206 is configured to calculate a posterior graph similarity distribution between the plurality of sub-graphs to identify similar sub-graphs of the plurality of sub-graphs in the embedding space. More specifically, the processor 206 is configured to identify the sub-graphs of the plurality of sub-graphs that are similar in structure with each other. The processor 206 is further configured to compare the posterior graph similarity distribution with the prior graph similarity distribution to calculate a graph-level loss function using a KL divergence layer.
The graph-level loss function is calculated to ensure that the generated graph representation (i.e., the posterior graph similarity distribution) through the corresponding GNN based encoder should be closer to the prior graph similarity distribution (i.e., calculated using Weisfeiler-Lehman graph kernel) for each sub-graph using KL divergence.
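A minimal sketch of the graph-level loss follows. It assumes the prior and posterior similarity matrices are row-normalized into probability distributions before the KL divergence is computed; the row-normalization step and the eps smoothing constant are illustrative assumptions, not details prescribed by the embodiment.

```python
import numpy as np

def graph_level_loss(P, Q, eps=1e-12):
    """KL divergence KL(P || Q) between the prior (Weisfeiler-Lehman kernel)
    and posterior graph similarity distributions. Each row is normalized
    into a probability distribution; eps guards against log(0)."""
    P = P / P.sum(axis=1, keepdims=True)
    Q = Q / Q.sum(axis=1, keepdims=True)
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))
```

The loss is zero when the posterior matches the prior and positive otherwise, which is the pressure that keeps the learned graph representations close to the kernel-based prior.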
In one embodiment, the neural network engine 224 is configured to calculate first and second node-level loss values corresponding to the individual node based, at least in part, on first and second loss functions, respectively. The first loss function for the individual node is defined with an objective such that the intermediate node representation is in the proximity of the first set of positive samples in an embedding space. In other words, the first loss function ensures that the intermediate node representation of each node should be similar to the first set of positive samples of the node and should be distinct from the first set of negative samples of the node in the embedding space. The cumulative loss function is calculated as a weighted loss calculated from the node neighborhood similarity method and the neighborhood feature similarity method and the KL divergence of the prior graph similarity distribution and the posterior graph similarity distribution. The calculation of the cumulative loss function ensures learning of both local and global graph structures.
In one embodiment, the neural network engine 224 may compute a similarity between the node representations of the positive samples of a particular node and the node representation of the particular node in the embedding space. The similarity may comprise cosine similarity, inner products, or any similarity function between two or more vectors. Conversely, in some embodiments, a distance function may be used to compare two or more vectors. Doing so may generate values that indicate the distance, in the n-dimensional embedding space, between the node representation of the node and the node representations of its positive samples.
Once the training process of the plurality of neural networks is complete, the neural network engine 224 is configured to determine final node representations of the plurality of nodes.
In one embodiment, the prediction module 228 is configured to predict results for a plurality of graph context prediction tasks based, at least in part, on the final node representations of the plurality of nodes included in the homogenous graph. In one embodiment, the plurality of context prediction tasks may be associated with the trained plurality of neural networks. The plurality of context prediction tasks may include node classification, link prediction, etc.
In some practical implementations, the plurality of graph context prediction tasks may include, but are not limited to, predicting fraudulent transactions, predicting anti-money laundering activities, predicting transaction patterns in P2P (Peer to Peer) network-based transactions, etc.
In one non-limiting example, the prediction module 228 is configured to predict whether a particular payment transaction involving a merchant and a cardholder is fraudulent or not based, at least in part, on the final node representations of the merchant and the cardholder.
FIG. 3 is an example representation 300 of communication data flow between various modules of the server system 200, in accordance with an embodiment of the present disclosure.
As explained above, the data pre-processing engine 218 is configured to access the historical interaction data from the database (e.g., the database 108 or the database 128) (see, 302). In one non-limiting example, the historical interaction data is historical payment transaction data of payments performed between the plurality of cardholders 124a-124c and the plurality of merchants 126a-126c over the period of time (e.g., 6 months, 1 year, 2 years, etc.). The data pre-processing engine 218 is configured to perform pre-processing operations (such as data cleansing, normalization, and the like) on the historical interaction data. In one embodiment, the data pre-processing engine 218 is configured to generate the plurality of graph features based on the historical interaction data. In addition, the plurality of graph features is represented in the form of a plurality of node features for each node.
Thereafter, the data pre-processing engine 218 is configured to provide the plurality of graph features to the graph creation engine 220 to generate the homogeneous graph based on the plurality of graph features (see, 304). For example, the graph creation engine 220 is configured to generate the homogeneous graph of payment transactions (represented as edges) performed between payment instruments (e.g., credit cards) (represented as nodes). In one embodiment, the graph creation engine 220 is configured to create the heterogeneous graph based on the plurality of graph features. For example, the graph creation engine 220 is configured to generate the heterogeneous graph of payment transactions (represented as edges) performed between cardholders (e.g., the cardholders 124a-124c) (represented as nodes) and merchants (e.g., the merchants 126a-126c) (represented as nodes).
The graph creation engine 220 is configured to convert the heterogeneous graph into the one or more homogeneous graphs based on methods already known in the art. For example, the graph creation engine 220 is configured to convert the heterogeneous graph (e.g., including the cardholders 124a-124c and the merchants 126a-126c) into one homogeneous graph including only the cardholders 124a-124c and one homogeneous graph including only the merchants 126a-126c. In one embodiment, the graph creation engine 220 is configured to store the homogeneous graph in the form of graph data structure in the database 108 or the database 128. The graph sampling engine 222 is further configured to access the homogeneous graph (see, 306). In one embodiment, the graph sampling engine 222 is configured to access the homogeneous graph from the database 108 or the database 128.
The graph creation engine 220 is configured to transmit the homogenous graph to the graph sampling engine 222 to segregate the homogenous graph (e.g., G(V,E)) into a plurality of sub-graphs (see, 306).
The graph sampling engine 222 is configured to sample the homogeneous graph into the plurality of sub-graphs (e.g., M sub-graphs). In one embodiment, the one or more graph sampler methods are used to sample the homogeneous graph into the plurality of sub-graphs. In one embodiment, each sub-graph includes a subset of the plurality of nodes. In one embodiment, the one or more graph sampler methods include a node sampler method and an edge sampler method. The one or more graph sampler methods are used to create normalized adjacency matrices (bias normalization constants) and feature matrices to remove bias introduced due to sampling of the homogeneous graph.
In one embodiment, the node sampler method may be defined as: P_v ∝ Degree_v. Here, the probability P_v of sampling a node 'v' is directly proportional to the degree of the node 'v'. In one embodiment, the edge sampler method may be defined as: P_(u,v) ∝ 1/Degree_u + 1/Degree_v. Here, the probability P_(u,v) of sampling an edge (u, v) is directly proportional to the sum of the inverses of the degrees of its endpoint nodes.
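The two sampler definitions above can be sketched as follows; this is a minimal illustration that only computes the sampling probabilities (the degree lists and edge tuples are hypothetical inputs, and the actual engine may draw samples from these distributions in any standard way).

```python
import numpy as np

def node_sampling_probs(degrees):
    """P_v ∝ Degree_v: probability of sampling each node."""
    d = np.asarray(degrees, dtype=float)
    return d / d.sum()

def edge_sampling_probs(edges, degrees):
    """P_(u,v) ∝ 1/Degree_u + 1/Degree_v: probability of sampling each edge."""
    w = np.array([1.0 / degrees[u] + 1.0 / degrees[v] for u, v in edges])
    return w / w.sum()
```

For example, with node degrees [1, 3], the node sampler assigns probabilities 0.25 and 0.75, favoring the higher-degree node.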
In one embodiment, the graph sampling engine 222 is configured to identify positive and negative samples corresponding to each node of each sub-graph based, at least in part, on a set of sampling methods. The set of sampling methods is as follows:
1. Node neighborhood similarity method: The node neighborhood similarity method identifies the first set of positive samples and the first set of negative samples (denoted as (P, N)) using the homogenous graph G(V, E). In one example implementation, for an ith individual node, J adjacent nodes or first-hop neighbor nodes are identified. Thereafter, the processor 206 is configured to sort the J first-hop neighbor nodes by degree value in ascending order and select the top k first-hop neighbor nodes from the sorted first-hop neighbor nodes. In general, the degree of a node is the number of connections that the node has to other nodes in a network graph. Based on the probability function defined in the node neighborhood similarity method, the processor 206 is configured to identify, as the first set of negative samples, nodes that have a high degree, thereby reducing their influence, and identify, as positive samples, nodes that have a lesser degree, thereby avoiding learning from weakly connected neighbors. A first set from the top k first-hop neighbor nodes, having the least degree (i.e., a higher strength of connection with the individual node), is considered as the first set of positive samples. In a similar manner, a second set from the top k first-hop neighbor nodes, having the highest degree (i.e., a weaker strength of connection with the individual node), is considered as the first set of negative samples.
In one embodiment, the graph sampling engine 222 is configured to execute the node neighborhood similarity method to identify the first set of positive samples and the first set of negative samples for each node based on neighborhood proximity and influence. More specifically, the node neighborhood similarity method defines the first set of positive samples as direct neighbor nodes (i.e., one-hop neighbor nodes) of each node with the least degree. Mathematically, the node neighborhood similarity method may be represented as:
LP_a = min_(v1, v2, …) (deg(v)), ∀ v : (a, v) ∈ E_G … Eqn. (1)
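The neighbor-sorting procedure above can be sketched as follows. This is a minimal illustration, assuming the graph is held as an adjacency dictionary and that the negatives are drawn from the same sorted one-hop neighborhood; both choices are illustrative rather than prescribed by the embodiment.

```python
def node_neighborhood_samples(adj, node, k):
    """Sort the one-hop neighbors of `node` by degree in ascending order;
    the lowest-degree neighbors form the first set of positive samples and
    the highest-degree ones the first set of negative samples (up to k each).
    `adj` is a dict mapping each node to the set of its neighbors."""
    nbrs = sorted(adj[node], key=lambda v: len(adj[v]))  # ascending degree
    return nbrs[:k], nbrs[::-1][:k]
```

A low-degree neighbor shares its connections among few nodes, so its link to the individual node is treated as stronger; that is why positives come from the low-degree end of the sorted list.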
2. Neighborhood feature similarity method: This method identifies the second set of positive samples and the second set of negative samples (denoted as (p, n)) for each node at the sub-graph scale based on the node features of all the nodes included in the sub-graph. To identify the second set of positive and negative samples based on similar neighborhood node attributes (i.e., node features), the processor 206 is configured to analyze the feature matrix for each sub-graph. For an ith individual node, the top n similar nodes are identified based on the cosine similarity of node features. Further, since the ith individual node is sampled into p sub-graphs, the top n similar nodes from the p*n candidate nodes are identified and, similarly, the top n dissimilar nodes from the p*n dissimilar candidate nodes are identified. Mathematically, the neighborhood feature similarity method may be represented as:
GP_a = max_(v1, v2, …, vB) (f'_a · f'_v), ∀ v ∈ {{max_(v1, v2, …, vB) (f'_a · f'_v), ∀ v ∈ SG} ∀ SG ∈ set of sub-graphs} … Eqn. (2)
NS_a = min_(v1, v2, …, vC) (f'_a · f'_v), ∀ v ∈ {{min_(v1, v2, …, vC) (f'_a · f'_v), ∀ v ∈ SG} ∀ SG ∈ set of sub-graphs} … Eqn. (3)
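The per-sub-graph step of the neighborhood feature similarity method can be sketched as follows; the helper name and the feature matrix F are hypothetical, and the per-sub-graph results would still need to be merged across the p sub-graphs as described above.

```python
import numpy as np

def feature_similarity_samples(F, i, n):
    """For node i of one sub-graph with feature matrix F (num_nodes x d),
    return the indices of the n most similar and n least similar nodes by
    cosine similarity of node feature vectors (node i itself excluded)."""
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)  # row-normalize
    order = np.argsort(-(Fn @ Fn[i]))                  # descending cosine similarity
    order = order[order != i]
    return order[:n].tolist(), order[-n:].tolist()
```

Running this once per sub-graph and keeping the global top n among the collected candidates yields the second sets of positive and negative samples.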
In one embodiment, the graph sampling engine 222 is configured to execute the sub-graph similarity distribution method to maintain similarity distribution between the plurality of sub-graphs. In one embodiment, the prior graph similarity distribution between the plurality of sub-graphs is calculated using Weisfeiler-Lehman graph kernels. In one embodiment, the prior graph similarity distribution between the plurality of sub-graphs is calculated as:
P = { SG_1 ---- K(SG_1, SG_1), K(SG_1, SG_2), K(SG_1, SG_3)
      SG_2 ---- K(SG_2, SG_1), K(SG_2, SG_2), K(SG_2, SG_3)
      SG_3 ---- K(SG_3, SG_1), K(SG_3, SG_2), K(SG_3, SG_3) } … Eqn. (4)
The similarities between the various sub-graphs are calculated using the WL subtree kernel, which generates an m*m similarity matrix T where the value at (i, j) indicates the similarity of sub-graph G_(s_i) with sub-graph G_(s_j).
It is to be noted that equation (4) denotes the prior graph similarity distribution for only three sub-graphs SG_1, SG_2, and SG_3; however, equation (4) may denote the prior graph similarity distribution for any number of the plurality of sub-graphs in a similar manner. Additionally, it is to be noted that the identification of the positive samples and the negative samples for each node of each sub-graph is a one-time process, and the positive samples (i.e., the first set of positive samples and the second set of positive samples) and the negative samples (i.e., the first set of negative samples and the second set of negative samples) are stored in the memory 208 for further calculations.
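A toy illustration of computing one entry K(SG_i, SG_j) of the prior similarity matrix is given below. It captures the spirit of the Weisfeiler-Lehman subtree kernel by iteratively relabeling each node with its own label plus the sorted multiset of neighbor labels and taking a dot product of the resulting feature histograms; a production implementation would hash the compressed labels, which this sketch omits for clarity.

```python
from collections import Counter

def wl_histogram(adj, labels, iters=2):
    """Histogram of Weisfeiler-Lehman subtree features for one (sub-)graph.
    adj: dict node -> neighbors; labels: dict node -> initial node label.
    Exact label tuples stand in for the hashed compressed labels of a
    full WL implementation."""
    hist = Counter(labels.values())
    for _ in range(iters):
        labels = {v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
                  for v in adj}
        hist.update(labels.values())
    return hist

def wl_similarity(adj_a, lab_a, adj_b, lab_b, iters=2):
    """Prior similarity of two sub-graphs as the dot product of their WL
    feature histograms (one entry of the m*m similarity matrix T)."""
    ha = wl_histogram(adj_a, lab_a, iters)
    hb = wl_histogram(adj_b, lab_b, iters)
    return sum(ha[key] * hb[key] for key in ha.keys() & hb.keys())
```

As expected of a kernel, a sub-graph is at least as similar to itself as to a structurally different sub-graph of the same size.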
Once the identification of the positive samples and the negative samples for each node in each sub-graph is complete, the neural network engine 224 is configured to receive training data from the graph sampling engine 222 (see, 308). The training data may include positive and negative samples of each node, and the graph data (for example, feature and adjacency matrices) corresponding to each sub-graph. The neural network engine 224 is configured to train the plurality of neural networks to determine node representations (i.e., embeddings) for each node of the homogeneous graph. In one embodiment, the neural network engine 224 is configured to execute the plurality of operations for the individual node (i.e., a node that gets sampled in various sub-graphs) associated with each sub-graph. In one embodiment, the plurality of neural networks is trained based on the training data (i.e., the positive samples and the negative samples) and a cumulative loss function. The cumulative loss function is determined based on a graph-level loss function, and a summation of a first node-level loss value and a second node-level loss value corresponding to each node.
The neural network engine 224 is configured to generate the node embeddings (i.e., first node representations) of the individual node that is sampled in the one or more sub-graphs of the plurality of sub-graphs. The node embeddings of the individual node for the one or more sub-graphs are generated based, at least in part, on the corresponding GNN based encoders associated with the one or more sub-graphs. In one embodiment, each GNN based encoder consists of an aggregator function. In one embodiment, the aggregator function includes graph convolutional network (GCN) aggregator, mean aggregator, long short-term memory (LSTM) aggregator, pooling aggregator, or any other aggregator of the like.
In one embodiment, each sub-graph is passed through a GNN based encoder of the GNN based encoders. In one embodiment, the number of the plurality of sub-graphs is a user-defined hyper-parameter and may be set by an administrator (i.e., a person that has the permission to operate or run the server system 200). In addition, the number of the plurality of sub-graphs may depend on the sampling method used to sample the homogeneous graph.
The neural network engine 224 may send an instruction to the attention module 226 to determine the attention context vector associated with the individual node (see, 310).
The attention module 226 is configured to determine an attention context vector associated with the individual node sampled in the one or more sub-graphs based on the attention mechanism and the node embeddings or first node representations. In one embodiment, the attention mechanism defines a set of rules or protocols to enable the attention module 226 to assign the attention context vector to the individual node sampled in the one or more sub-graphs. The attention context vector may include importance values of one or more sub-graphs for the individual node.
In one embodiment, the attention mechanism defines how much importance or weightage must be assigned to each first node representation of the individual node out of the various first node representations (i.e., node embeddings) generated with respect to the one or more sub-graphs. In one embodiment, the attention context vector may change according to the plurality of graph context prediction tasks. In one embodiment, the attention mechanism facilitates aggregation of the corresponding node embeddings (i.e., multiple node embeddings) of the individual node generated for the one or more sub-graphs and assigns them importance based on the graph context prediction task (e.g., prediction of a fraudulent or non-fraudulent payment transaction).
The attention module 226 is configured to send the attention context vector of the individual node to the neural network engine 224 (see, 312).
The neural network engine 224 is configured to determine the intermediate node representation of the individual node based, at least in part, on the non-linearity function and the attention context vector. In one embodiment, the intermediate node representation for the individual node is determined by aggregating the corresponding node embeddings of the individual node generated for the one or more sub-graphs in which the individual node is sampled.
In one example, a node 'a' gets sampled in the sub-graph S1 and the sub-graph S2. Similarly, a node 'b' gets sampled in the sub-graph S1 and the sub-graph S2. In addition, c1 is the attention context vector assigned to the node 'a' sampled in the sub-graph S1, c2 is the attention context vector assigned to the node 'b' sampled in the sub-graph S1, c3 is the attention context vector assigned to the node 'a' sampled in the sub-graph S2, and c4 is the attention context vector assigned to the node 'b' sampled in the sub-graph S2. The final node representation for the node 'a' is calculated as: A_final = a1*c1 + a2*c3, where a1 and a2 are the node embeddings of the node 'a' generated for the sub-graph S1 and the sub-graph S2, respectively. Similarly, the final node representation for the node 'b' is calculated as: B_final = b1*c2 + b2*c4, where b1 and b2 are the node embeddings of the node 'b' generated for the sub-graph S1 and the sub-graph S2, respectively.
The neural network engine 224 is configured to optimize the neural network parameters (e.g., weights and biases) of the plurality of neural networks and attention context vectors of the plurality of nodes to reduce the cumulative loss function until it reaches the threshold value for the training data. In an embodiment, the cumulative loss function includes the graph-level loss function and a node-specific loss function. The node-specific loss function is calculated to ensure that all the positive samples (i.e., the first set of positive samples identified based on the node neighborhood similarity method and the second set of positive samples identified based on the neighborhood feature similarity method) of the individual node should have similar node representations, and the node representations determined for all the negative samples (i.e., the first set of negative samples identified based on the node neighborhood similarity method and the second set of negative samples identified based on the neighborhood feature similarity method) of the individual node should be distinct in the embedding space. More specifically, the node-specific loss function is a weighted aggregation of the loss calculated for all the positive samples and all the negative samples obtained from the node neighborhood similarity method and the neighborhood feature similarity method. Mathematically, the node-specific loss function may be represented as:
L_v = − Σ_(u ∈ PositiveSample_v) log(σ(v^T · u)) − Σ_(u' ∈ NegativeSample_v) log(σ(−v^T · u')) … Eqn. (5)
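Equation (5) can be sketched directly; this minimal version takes the node representation and lists of positive and negative sample representations as plain vectors (the argument names are illustrative).

```python
import numpy as np

def node_loss(v, positives, negatives):
    """Eqn. (5): pull positives toward v via log σ(vᵀu) and push negatives
    away via log σ(−vᵀu') in the embedding space."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    loss = -sum(np.log(sigmoid(v @ u)) for u in positives)
    loss -= sum(np.log(sigmoid(-(v @ u2))) for u2 in negatives)
    return float(loss)
```

The loss shrinks as positive samples become more aligned with v and negative samples less aligned, which is the stated objective of the node-specific term.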
In one embodiment, the processor 206 is configured to calculate the posterior graph similarity distribution using cosine similarity between aggregated sub-graph representations. In one embodiment, the aggregated sub-graph representation is the mean aggregation of the node embeddings in each sub-graph of the plurality of sub-graphs. In one embodiment, the posterior graph similarity distribution between the plurality of sub-graphs is calculated as:
Q = { SG_1 ---- sim(SG_1, SG_1), sim(SG_1, SG_2), sim(SG_1, SG_3)
      SG_2 ---- sim(SG_2, SG_1), sim(SG_2, SG_2), sim(SG_2, SG_3)
      SG_3 ---- sim(SG_3, SG_1), sim(SG_3, SG_2), sim(SG_3, SG_3) } … Eqn. (6)
In one embodiment, the graph-level loss function is calculated as a difference between the posterior graph similarity distribution and the prior graph similarity distribution (i.e., Eqn. (6) – Eqn. (4)). Mathematically, the cumulative loss function may be calculated as:
Loss = Σ_(v ∈ G) L_v − Σ_(i=1)^m Σ_(j=1)^m P_ij log(Q_ij) … Eqn. (7)
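Equation (7) combines the node-specific losses with the cross-entropy between the prior and posterior graph similarity distributions; a minimal sketch, assuming precomputed per-node losses and m*m distribution matrices (the eps smoothing constant is an assumption to guard against log(0)):

```python
import numpy as np

def cumulative_loss(node_losses, P, Q, eps=1e-12):
    """Eqn. (7): summation of node-specific losses plus the cross-entropy
    term -Σ_ij P_ij log(Q_ij) between the prior (P) and posterior (Q)
    graph similarity distributions."""
    graph_term = -float(np.sum(P * np.log(Q + eps)))
    return float(sum(node_losses)) + graph_term
```

When the posterior matches the prior, the graph-level term contributes only the entropy of the prior, so minimizing it drives the learned sub-graph similarities toward the kernel-based prior.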
In one embodiment, the neural network engine 224 is configured to calculate the cumulative loss function to learn the local as well as the global structure of the homogeneous graph. In one embodiment, the cumulative loss function uses a normalization factor to reduce variance in forward and backward propagation.
FIG. 4 illustrates a block diagram representation 400 of training a plurality of neural networks included in an unsupervised graph representation learning model, in accordance with an embodiment of the present disclosure. The block diagram representation 400 includes a homogeneous graph 402, a plurality of sub-graphs 404a, 404b, and 404c, and GNN based encoders 406a, 406b, and 406c.
As stated above, the processor 206 is configured to execute the plurality of neural networks (e.g., the GNN based encoders 406a-406c) to perform scalable unsupervised representation learning for the homogeneous graph 402. In one embodiment, the graph is a homogeneous graph (e.g., the homogenous graph 402) or a heterogeneous graph. In the case of a heterogeneous graph, the processor 206 is configured to convert the heterogeneous graph into the homogeneous graph 402 based on methods already known in the art. For example, a heterogeneous graph including a set of first entities (e.g., the cardholders 124a-124c) and a set of second entities (e.g., the merchants 126a-126c) can be converted into the homogeneous graph 402 of a single set of entities based on common link present between entities of the first set of entities (e.g., the cardholders 124a-124c) or common link present between entities of the second set of entities (e.g., the merchants 126a-126c).
Let us consider that the homogeneous graph 402 (also represented as G (V, E)) is sampled into 'M' number of sub-graphs (i.e., the plurality of sub-graphs 404a-404c). In one embodiment, V represents vertices (i.e., nodes) and E represents edges (i.e., connections). In one embodiment, the sampling is performed based on at least one of the node sampler method, the edge sampler method, or any other similar sampler method. In addition, the sampling is performed based on the downstream application (i.e., the graph context prediction task). In one embodiment, the plurality of sub-graphs 404a-404c are represented as G_(s_i).
In one embodiment, number of the first set of positive samples and the first set of negative samples for every node v ∈ V are restricted to k. The first set of positive samples for v is the top k direct or first-hop (one-hop) neighbor nodes for the node v when the nodes are sorted according to a degree in ascending order. More specifically, the direct or first-hop neighbor nodes with the least degree are more preferred than the direct or first hop neighbor nodes with a high degree to be termed as the first set of positive samples. This is due to the reason that a lesser degree means higher strength of the connection.
In one embodiment, to obtain the first set of negative samples for the node v from V, the highest degree nodes in the graph G are preferred. This is because of the reason that the probability of sampling a node is directly proportional to the degree of the node (i.e., according to the node sampler method).
In one embodiment, finding similar nodes separated by many hops, or finding similar nodes anywhere in the homogeneous graph 402 using node features, is a tedious task because the computation of X * X^T takes O(n^2.3). Here, 'X' denotes the feature matrix of all the nodes, of dimensions n*d, in the homogeneous graph 402, 'n' is the number of nodes, 'd' is the feature dimension, and '*' represents matrix multiplication. However, it is easier to perform these calculations at the sub-graph level G_(s_i) because of the condition n_i ≪ n. In addition, the calculations for X_i * X_i^T can be performed in parallel. Given the restriction of c positive samples for each node v, the top c nodes from every sub-graph G_(s_i) based on node feature similarity are selected. Further, the top c samples from the c*m positive samples collated from all the sub-graphs G_(s_i) are selected (i.e., the second set of positive samples). Similarly, the negative samples are selected but with the least similarity (i.e., the second set of negative samples).
Once the positive and negative samples for every node in all the sub-graphs G_(s_i) are identified, a graph similarity distribution between the sub-graphs 404a-404c is calculated based on the Weisfeiler-Lehman subtree kernel. More specifically, an m * m similarity matrix T is generated, where the value at (i, j) indicates the similarity of sub-graph G_(s_i) with sub-graph G_(s_j). It is to be noted that the above-stated calculations are performed only once and then stored in memory to perform the rest of the calculations.
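A minimal sketch of the Weisfeiler-Lehman subtree-kernel similarity matrix follows. The degree-based initial labels, the label-concatenation refinement, and the row normalisation are standard WL ingredients, but the helper names and the normalisation choice are assumptions, not the claimed implementation.

```python
from collections import Counter

def wl_histogram(adj, iterations=2):
    """Weisfeiler-Lehman label histogram of one sub-graph.
    Labels start as node degrees and are refined by concatenating
    the sorted labels of each node's neighbours."""
    labels = {v: str(len(adj[v])) for v in adj}
    hist = Counter(labels.values())
    for _ in range(iterations):
        labels = {v: labels[v] + "|" + ",".join(sorted(labels[u] for u in adj[v]))
                  for v in adj}
        hist.update(labels.values())
    return hist

def wl_similarity_matrix(subgraphs, iterations=2):
    """m x m matrix T; T[i][j] is the row-normalised WL subtree kernel
    value (dot product of label histograms) between sub-graphs i and j."""
    hists = [wl_histogram(sg, iterations) for sg in subgraphs]
    T = [[sum(h1[key] * h2[key] for key in h1) for h2 in hists] for h1 in hists]
    # row-normalise so each row forms a similarity distribution
    return [[x / sum(row) for x in row] for row in T]

subgraphs = [{0: {1}, 1: {0}}, {0: {1}, 1: {0}}]   # two identical sub-graphs
T = wl_similarity_matrix(subgraphs)
```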
In one embodiment, the processor 206 is further configured to train the plurality of neural networks (e.g., the GNN based encoders 406a-406c) based on the positive samples and the negative samples generated for every node in the plurality of sub-graphs 404a-404c and the cumulative loss function. In one embodiment, the number of the plurality of neural networks (e.g., the GNN based encoders 406a-406c) is equal to the number of the plurality of sub-graphs 404a-404c. Furthermore, the processor 206 is configured to generate an adjacency matrix and a feature matrix for each sub-graph of the plurality of sub-graphs 404a-404c.
Let A_i and X_i denote the adjacency matrix and the feature matrix of the i^th sub-graph, respectively, and let ε_i denote the i^th encoder (i.e., one of the GNN based encoders 406a-406c). In one embodiment, any GNN based encoder may be plugged in based on the downstream task (i.e., the graph context prediction task). Let Y_(ε_i) denote the embeddings obtained for all the nodes processed through the i^th encoder. The processing of the nodes through all these encoders (i.e., the GNN based encoders 406a-406c) is performed in parallel, giving an advantage in terms of processing speed for embedding generation at the sub-graph level.
The processor 206 is configured to generate the embeddings for all the nodes in the plurality of sub-graphs 404a-404c. In one example, let us consider that a node v is sampled in one or more sub-graphs (e.g., the sub-graph 404a and the sub-graph 404c) and therefore multiple embeddings are generated for the node v. Let the set s represent the set of all the sub-graphs (i.e., the one or more sub-graphs) in which the node v is sampled. The processor 206 is then configured to generate an attention context vector for the node v using these various embeddings from the one or more sub-graphs to decide their relative importance based on the attention mechanism 408. Mathematically, the attention context vector for the node v may be calculated as:
context_v = W_1 * ∑_(i∈s) (Y_(ε_i))_v … Eqn. (8)
The processor 206 is configured to calculate the intermediate node representation for the node v as:
Y_v = ∑_(i∈s) f(context_v * (Y_(ε_i))_v) * (Y_(ε_i))_v … Eqn. (9)
where f denotes a multi-layer perceptron (MLP) layer (i.e., the non-linearity function) that generates the relative importance of the representation (Y_(ε_i))_v in the intermediate node representation Y_v. The obtained Y_(ε_i) are further sent through an MLP layer that generates an intermediate representation of the corresponding sub-graph. Let sg_i denote these intermediate sub-graph representations, where i ∈ m.
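Eqns. (8) and (9) can be sketched as follows. This is a simplified, hypothetical sketch: a plain sigmoid stands in for the MLP non-linearity f, and the weight matrix W_1 is initialised as an identity matrix purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aggregate_node(embs, W1, f=sigmoid):
    """Attention aggregation of the per-sub-graph embeddings of one node v:
    context_v = W_1 * sum_i (Y_eps_i)_v                       (Eqn. 8)
    Y_v = sum_i f(context_v . (Y_eps_i)_v) * (Y_eps_i)_v       (Eqn. 9)
    where f stands in for the MLP non-linearity."""
    context = W1 @ sum(embs)
    return sum(f(context @ e) * e for e in embs)

d = 4
rng = np.random.default_rng(0)
embs = [rng.standard_normal(d) for _ in range(2)]   # node v appears in 2 sub-graphs
W1 = np.eye(d)                                      # hypothetical attention weights
Y_v = aggregate_node(embs, W1)
```

The scalar f(context_v · (Y_(ε_i))_v) acts as the relative importance weight of each sub-graph's view of the node.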
The processor 206 is configured to ensure that the first set of positive samples for each node from P is placed closer to v in the embedding space and that the first set of negative samples for each node from N is forced to be placed far away from v, respectively, based on the first node-level loss value calculated using the following formula:
L_(1_v) = -∑_(x∈P) σ(Y_v * Y_x) - ∑_(y∈N) σ(-Y_v * Y_y) … Eqn. (10)
Similarly, the processor 206 is configured to calculate the second node-level loss value to ensure that the node representation of the individual node is similar to the second set of positive samples of the individual node in the feature space and far away from the second set of negative samples of the individual node, respectively. The mathematical formula for the second node-level loss value is:
L_(2_v) = -∑_(x∈p) σ(Y_v * Y_x) - ∑_(y∈n) σ(-Y_v * Y_y) … Eqn. (11)
where σ denotes the sigmoid function. In one embodiment, the feature-based similarity scores give partial global inferences of similarities between nodes even when they are many hops apart.
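The node-level loss terms of Eqns. (10) and (11) share the same form and differ only in how the positive set P (or p) and negative set N (or n) are sampled; a minimal sketch, with the hypothetical helper `node_level_loss`, is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def node_level_loss(Y, v, positives, negatives):
    """L_v = -sum_{x in P} sigma(Y_v . Y_x) - sum_{y in N} sigma(-Y_v . Y_y),
    the common form of Eqns. (10) and (11); the same function serves both
    loss terms with P, N drawn from either sampling method."""
    pos = -sum(sigmoid(Y[v] @ Y[x]) for x in positives)
    neg = -sum(sigmoid(-Y[v] @ Y[y]) for y in negatives)
    return pos + neg

Y = np.eye(3)                 # toy embeddings: one-hot rows
loss = node_level_loss(Y, v=0, positives=[1], negatives=[2])
```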
Using the intermediate sub-graph representations sg_i, the processor 206 is configured to generate a matrix R of size m*m, where the value at (i, j) denotes the cosine similarity between the embeddings sg_i and sg_j.
To make an inference at the global level, the processor 206 is configured to enforce that the prior graph similarity distribution obtained through the Weisfeiler-Lehman subtree kernel and the distribution R (i.e., the posterior graph similarity distribution) are close for every sub-graph, based on the KL divergence as:
L_3 = -∑_(i∈m) ∑_(j∈m) a_ij log b_ij … Eqn. (12)
where a_ij ∈ T and b_ij ∈ R, respectively. The unsupervised loss value (i.e., the cumulative loss value) to be optimized is calculated using the following formula:
L = (∑_(v∈V) (L_(1_v) + L_(2_v))) + L_3 … Eqn. (13)
In one embodiment, the processor 206 is configured to calculate the gradients for all the weight matrices involved to minimize the cumulative loss value L. The processor 206 is configured to update neural network parameters of the sub-graphs 404a-404c and the attention mechanism 408 via back-propagation methods based on the cumulative loss value.
In a possible implementation, a pseudo-code of a method for performing scalable unsupervised graph representation learning is represented as:

Let SG denote the set of sub-graphs, N denote the set of nodes
Let m be the total number of sub-graphs
Dic = {}    // prior similarity distribution
for i = 1, 2, 3, ..., m
    Dic[i] = []
    for j = 1, 2, 3, ..., m
        Dic[i].append(GED(SG[i], SG[j]))
    Normalize Dic[i]
for epochs = 1, 2, ...
    Nodeemb = []
    SGemb = []
    for k = 1, 2, 3, ..., m
        emb = A_k X_k W_k
        sg = [0, 0, 0, ...]
        for noderep ∈ emb
            sg += noderep
        sg /= len(emb)
        Nodeemb.append(emb)
        SGemb.append(sg)
    for emb ∈ Nodeemb
        Apply attention over emb to obtain Finalnodeemb
    loss = 0
    for k = 1, 2, 3, ..., m
        for l = 1, 2, 3, ..., m
            loss += -Dic[k][l] * log2(SGemb[k]^T . SGemb[l])
    for n ∈ N
        for ps ∈ Positivesamples_n
            loss += -log(σ(Finalnodeemb_n^T . Finalnodeemb_ps))
        for ns ∈ Negativesamples_n
            loss += -log(σ(-Finalnodeemb_n^T . Finalnodeemb_ns))
    Backpropagate and update GCN weights (W_k) and attention weights
FIG. 5 is a flow chart 500 of a training process for an unsupervised graph representation learning model, in accordance with an embodiment of the present disclosure. The sequence of operations of the flow chart 500 may not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped and performed in form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner. It is to be noted that to explain the flow chart 500, references may be made to elements described in FIG. 1A and FIG. 2.
At 502, the server system 102 accesses a homogenous graph associated with the set of entities 104a-104c from the graph database 106.
At 504, the server system 102 segments the homogenous graph into M sub-graphs based, at least in part, on one or more graph sampler methods.
At 506, the server system 102 identifies the positive and negative samples corresponding to each node based, at least in part, on the set of sampling methods. The positive and negative samples corresponding to each node are stored in the database 108.
At 508, the server system 102 may run a plurality of training epochs until a predefined number of iterations is reached, checking the number of completed training epochs against the predefined iterations. The training of the plurality of neural networks is performed by executing a plurality of steps 510-518 in an iterative manner.
At 510, the server system 102 generates k first node representations of the ith individual node in the one or more sub-graphs, in which i and k are positive integers. Here, 'i' represents the ith node contained in the homogenous graph and is not greater than the total number of nodes contained in the homogenous graph. The value 'k' equals the number of sub-graphs containing the ith node. A first node representation for a particular sub-graph is generated using the GNN based encoder associated with that sub-graph.
At 512, the server system 102 aggregates the k first node representations of the ith individual node based, at least in part, on an attention context vector and MLP function. Based on the aggregation, the server system 102 determines the intermediate node representation of the ith individual node.
In a similar manner, steps 510 and 512 are performed for each node of the plurality of nodes.
At 514, the server system 102 generates the sub-graph representation sg_i based on all the node representations processed through the ith GNN based encoder, passed through an MLP layer.
At 516, the server system 102 calculates a cumulative loss value based, at least in part, on intermediate node representations of all the nodes and sub-graph representations of M sub-graphs. More specifically, the server system 102 computes the cumulative loss value according to the above-mentioned Eqn. (13).
At 518, the server system 102 updates the neural network parameters of GNN based encoders and the attention context vector for each node based, at least in part, on the cumulative loss value.
Once the number of training epochs reaches the predefined iterations, at 520, the training is completed and the process ends. Thereafter, the final node representations of the plurality of nodes are stored in the graph database 106.
The sequence of steps of the flow chart 500 need not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped together and performed in form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner.
FIGS. 6A and 6B, collectively, represent a flow diagram 600 of a method for predicting fraudulent transactions based on the unsupervised graph representation learning model, in accordance with an embodiment of the present disclosure. The sequence of operations of the flow chart 600 may not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped and performed in form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner. It is to be noted that to explain the flow chart 600, references may be made to elements described in FIG. 1B and FIG. 2.
At 602, the server system 122 accesses a heterogeneous transaction graph of historical payment transaction data (i.e., online or offline payment transactions) associated with the plurality of cardholders 124a-124c and the plurality of merchants 126a-126c from the database 128. The heterogeneous transaction graph includes a first set of nodes and a second set of nodes. In an embodiment, the first set of nodes may represent the plurality of cardholders 124a-124c and the second set of nodes may represent the plurality of merchants 126a-126c. In one embodiment, the heterogeneous graph is stored in the form of a graph data structure in the database 128.
At 604, the server system 122 converts the heterogeneous transaction graph into one or more homogeneous graphs. In one embodiment, the heterogeneous graph including the plurality of cardholders 124a-124c and the plurality of merchants 126a-126c is converted into the homogeneous graph including either only the plurality of cardholders 124a-124c or only the plurality of merchants 126a-126c. In an embodiment, the heterogeneous graph may be converted into the homogeneous graph based on a common link present between either two merchants of the plurality of merchants 126a-126c or a common link present between either two cardholders of the plurality of cardholders 124a-124c.
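The conversion of the bipartite cardholder-merchant graph into a homogeneous cardholder graph can be sketched as a standard bipartite projection; the function name `project_to_homogeneous` and the edge-list input format are assumptions for illustration.

```python
def project_to_homogeneous(transactions):
    """Project a bipartite cardholder-merchant transaction graph onto a
    cardholder-cardholder homogeneous graph: two cardholders are linked
    when they transacted with a common merchant."""
    by_merchant = {}
    for cardholder, merchant in transactions:
        by_merchant.setdefault(merchant, set()).add(cardholder)
    adj = {}
    for group in by_merchant.values():
        for c in group:
            # link c to every other cardholder sharing this merchant
            adj.setdefault(c, set()).update(group - {c})
    return adj

edges = [("c1", "m1"), ("c2", "m1"), ("c3", "m2")]
adj = project_to_homogeneous(edges)
# c1 and c2 share merchant m1, so they are linked; c3 has no shared merchant
```

Swapping the roles of cardholders and merchants yields the merchant-merchant homogeneous graph in the same way.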
At 606, the server system 122 segments a homogenous graph from the one or more homogenous graphs into the plurality of sub-graphs based, at least in part, on the one or more graph sampler methods. The one or more graph sampler methods may include a node sampler method, an edge sampler method, or any other sampler method of the like.
At 608, the server system 122 identifies the positive and negative samples corresponding to each node associated with each sub-graph based, at least in part, on the set of sampling methods.
At 610, the server system 122 trains the plurality of neural networks based, at least in part, on the training data (i.e., the positive samples and the negative samples corresponding to each node associated with each sub-graph) and the cumulative loss function. The training data may also include the feature matrix and adjacency matrix of each sub-graph.
The training of the plurality of neural networks is performed by executing a plurality of operations for an individual node associated with each sub-graph. The plurality of operations is explained in detail below in steps 610a-610d.
At 610a, the server system 122 generates first node representations of the individual node that is sampled in the one or more sub-graphs of the plurality of sub-graphs. The node embeddings of the individual node for the one or more sub-graphs are generated based, at least in part, on the corresponding graph neural network (GNN) based encoders.
At 610b, the server system 122 determines the attention context vector associated with the individual node based, at least in part, on the attention mechanism.
At 610c, the server system 122 aggregates the first node representations of the individual node generated for the one or more sub-graphs to determine the intermediate node representation of the individual node based, at least in part, on the non-linearity function and the attention context vector.
At 610d, the server system 122 calculates first and second node-level loss values corresponding to the individual node based, at least in part, on a first loss function and a second loss function, respectively. The first loss function for the individual node is defined with an objective such that the intermediate node representation is in the proximity of the positive samples in the embedding space.
At 612, the server system 122 optimizes the neural network parameters (e.g., weights and biases) of the plurality of neural networks and the attention context vectors of the plurality of nodes to reduce the cumulative loss function until it reaches the threshold value for the training data.
At 614, the server system 122 determines final node representations of the plurality of nodes after completion of the training process explained above in steps 602-612.
At 616, the server system 122 predicts future fraudulent transactions based on transaction patterns determined based on the final node representations of the plurality of nodes. For example, the server system 122 is configured to determine the final node representations of the plurality of nodes for the homogeneous graph including the plurality of cardholders 124a-124c. Similarly, the server system 122 is configured to determine the final node representations of the plurality of nodes for the homogeneous graph including the plurality of merchants 126a-126c. By obtaining the final node representations of the plurality of cardholders 124a-124c as well as the plurality of merchants 126a-126c, the server system 122 is configured to capture structural information of the transaction graph (i.e., the heterogeneous graph).
For each transaction, the final node representations of the corresponding cardholder and merchant may be concatenated, and the resulting vector may be used to predict whether that transaction is a fraudulent transaction or a non-fraudulent transaction.
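The per-transaction concatenation can be sketched as below; the helper `transaction_features` is a hypothetical name, and the downstream classifier (logistic regression, gradient boosting, etc.) is left unspecified, as in the description above.

```python
import numpy as np

def transaction_features(card_emb, merchant_emb, transactions):
    """Build per-transaction feature vectors by concatenating the
    cardholder and merchant final node representations; the result can
    feed any downstream fraud classifier."""
    return np.stack([np.concatenate([card_emb[c], merchant_emb[m]])
                     for c, m in transactions])

card_emb = {"c1": np.array([0.1, 0.2])}       # toy final node representations
merchant_emb = {"m1": np.array([0.3, 0.4])}
X = transaction_features(card_emb, merchant_emb, [("c1", "m1")])
```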
The sequence of steps of the flow chart 600 need not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped together and performed in form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner.
FIG. 7 illustrates a flow diagram depicting a method 700 for unsupervised graph representation learning for homogeneous graphs, in accordance with an embodiment of the present disclosure. The method 700 depicted in the flow diagram may be executed by, for example, the server system 200. Operations of the method 700, and combinations of operations in the method 700, may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or a different device associated with the execution of software that includes one or more computer program instructions. The operations of the method 700 described herein may be performed by an application interface that is hosted and managed with the help of the server system 200. The method 700 starts at operation 702.
At operation 702, the method 700 includes accessing, by the server system 200, a homogenous graph of historical interaction data (e.g., historical payment transaction data) associated with a set of entities (e.g., the cardholders 124a-124c, the merchants 126a-126c, etc.) from a database (e.g., the database 108 or the database 128). The homogenous graph includes a plurality of nodes.
At operation 704, the method 700 includes segmenting, by the server system 200, the homogenous graph into the plurality of sub-graphs based, at least in part, on the one or more graph sampler methods.
At operation 706, the method 700 includes identifying, by the server system 200, positive and negative samples corresponding to each node based, at least in part, on the set of sampling methods. The set of sampling methods includes one of: node neighborhood similarity method and neighborhood feature similarity method.
At operation 708, the method 700 includes training, by the server system 200, the plurality of neural networks based, at least in part, on the training data and the cumulative loss function. The cumulative loss function is determined based on the graph-level loss function, and a summation of a first node-level loss value and a second node-level loss value corresponding to each node.
At operation 710, the method 700 includes storing, by the server system 200, final node representations of the plurality of nodes after completion of the training process to determine prediction outputs for the plurality of graph context prediction tasks associated with the trained plurality of neural networks.
The sequence of operations of the method 700 need not be necessarily executed in the same order as they are presented. Further, one or more operations may be grouped together and performed in form of a single step, or one operation may have several sub-steps that may be performed in parallel or in a sequential manner.
Experiments
Experiment 1: The quality of the learned node representations can be evaluated on a plurality of graph context prediction tasks. A fraudulent transaction prediction task is chosen for evaluation since it tests the ability of the node representations to identify ambiguous or fraudulent nodes in the graph structure. The performance of the unsupervised graph representation learning model of the present disclosure is compared against prior algorithms for performing the fraudulent transaction prediction task.
Datasets: Experiments can be conducted based on the payment transaction data (i.e., the historical interaction data). The payment transaction data includes ~538k transactions performed between ~136k payment cards (e.g., credit cards or debit cards owned by cardholders) and ~2.1k merchants.
Based on the payment transaction data, a heterogeneous transaction graph of payment transactions is generated and converted into one or more homogeneous graphs based on methods already known in the art. For example, the heterogeneous transaction graph may be converted into a homogeneous graph based on a common link present between either two cardholders or two merchants. The performance of the GNN based encoders is evaluated against an existing fraud risk model solution (for example, the Mastercard DI score, where the DI score represents the Mastercard® Decision Intelligence (registered as a trademark) product powered by Mastercard® that provides decision and fraud detection services). The product uses artificial intelligence technology to help financial institutions increase the accuracy of real-time approvals of genuine transactions and reduce false declines. In addition, the F1-score (micro) is calculated to evaluate the performance of the GNN based encoders against the existing fraud risk model. The results are illustrated below in Table 1:
| Model | Dataset | F1-score (micro) |
|---|---|---|
| Proposed model | Transaction data | 0.58 |
| Fraud risk models | Transaction data | 0.49 |

Table 1: Results of evaluation of the unsupervised graph representation learning model against the fraud risk model
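For reference, the F1-score (micro) used in both experiments pools true positives, false positives, and false negatives over all classes; for single-label predictions it reduces to accuracy. A minimal sketch (the helper `f1_micro` is hypothetical and not part of the evaluation code) is:

```python
def f1_micro(y_true, y_pred):
    """Micro-averaged F1: global TP/FP/FN pooled over all classes.
    For single-label classification each wrong prediction counts as
    one FP (for the predicted class) and one FN (for the true class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    fn = fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

score = f1_micro([0, 1, 1, 0], [0, 1, 0, 0])   # 3 of 4 correct
```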
Experiment 2: In this experiment, the unsupervised graph representation learning model of the present disclosure is compared against baseline models, such as the well-recognized Deep Graph Infomax (DGI) model. A node classification task is chosen for evaluation.
Datasets: The Pubmed dataset consists of 19717 scientific publications from PubMed database pertaining to diabetes classified into one of three classes. The citation network consists of 44338 links. Each publication in the dataset is described by a TF/IDF weighted word vector from a dictionary which consists of 500 unique words.
F1-score (micro) is calculated to evaluate the performance of the GNN based encoders against DeepGraphInfomax (DGI) model. The results are illustrated below in Table 2:
| Model | Dataset | F1-score (micro) |
|---|---|---|
| Proposed model | Pubmed | 81 |
| DGI model | Pubmed | 76.8 |

Table 2
Table 2 indicates that the proposed model provides better results than DGI for the task of node classification on the above dataset.
The disclosed methods with reference to FIGS. 1A to 7, or one or more operations of the methods 500, 600, and 700 may be implemented using software including computer-executable instructions stored on one or more computer-readable media (e.g., non-transitory computer-readable media, such as one or more optical media discs, volatile memory components (e.g., DRAM or SRAM), or nonvolatile memory or storage components (e.g., hard drives or solid-state nonvolatile memory components, such as Flash memory components) and executed on a computer (e.g., any suitable computer, such as a laptop computer, netbook, Webbook, tablet computing device, smartphone, or other mobile computing devices). Such software may be executed, for example, on a single local computer or in a network environment (e.g., via the Internet, a wide-area network, a local-area network, a remote web-based server, a client-server network (such as a cloud computing network), or other such networks) using one or more network computers. Additionally, any of the intermediate or final data created and used during implementation of the disclosed methods or systems may also be stored on one or more computer-readable media (e.g., non-transitory computer-readable media) and are considered to be within the scope of the disclosed technology. Furthermore, any of the software-based embodiments may be uploaded, downloaded, or remotely accessed through a suitable communication means. Such suitable communication means include, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber optic cable), magnetic communications, electromagnetic communications (including RF, microwave, and infrared communications), electronic communications, or other such communication means.
Although the disclosure has been described with reference to specific exemplary embodiments, it is noted that various modifications and changes may be made to these embodiments without departing from the broad scope of the disclosure. For example, the various operations, blocks, etc. described herein may be enabled and operated using hardware circuitry (for example, complementary metal-oxide-semiconductor (CMOS) based logic circuitry), firmware, software, and/or any combination of hardware, firmware, and/or software (for example, embodied in a machine-readable medium). For example, the apparatuses and methods may be embodied using transistors, logic gates, and electrical circuits (for example, application-specific integrated circuit (ASIC) circuitry and/or in Digital Signal Processor (DSP) circuitry).
Particularly, the server system 200 (e.g., the server system 102 or the server system 122) and its various components such as the computer system 202 and the database 204 may be enabled using software and/or using transistors, logic gates, and electrical circuits (for example, integrated circuit circuitry such as ASIC circuitry). Various embodiments of the disclosure may include one or more computer programs stored or otherwise embodied on a computer-readable medium, wherein the computer programs are configured to cause a processor or computer to perform one or more operations. A computer-readable medium storing, embodying, or encoded with a computer program, or similar language may be embodied as a tangible data storage device storing one or more software programs that are configured to cause a processor or computer to perform one or more operations. Such operations may be, for example, any of the steps or operations described herein. In some embodiments, the computer programs may be stored and provided to a computer using any type of non-transitory computer-readable media. Non-transitory computer-readable media include any type of tangible storage media. Examples of non-transitory computer-readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g., magneto-optical disks), CD-ROM (compact disc read-only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (BLU-RAY® Disc), and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash memory, RAM (random access memory), etc.). Additionally, a tangible data storage device may be embodied as one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices. 
In some embodiments, the computer programs may be provided to a computer using any type of transitory computer-readable media. Examples of transitory computer-readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer-readable media can provide the program to a computer via a wired communication line (e.g., electric wires, and optical fibers) or a wireless communication line.
Various embodiments of the invention, as discussed above, may be practiced with steps and/or operations in a different order, and/or with hardware elements in configurations, which are different than those which are disclosed. Therefore, although the invention has been described based upon these exemplary embodiments, it is noted that certain modifications, variations, and alternative constructions may be apparent and well within the scope of the invention.
Although various exemplary embodiments of the invention are described herein in a language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as exemplary forms of implementing the claims.