Abstract: The invention provides for automated recordal of standardized merchant names within merchant location data records. The invention comprises (i) retrieving first and second sets of merchant location data records from first and second databases, (ii) matching, using a first matching method, merchant location data records from the first and second sets of merchant location data records, (iii) generating a sub-set of unmatched merchant location data records from the first set of merchant location data records, (iv) matching based on a second matching method, records within the generated sub-set of merchant location data records against records from the second set of merchant location data records, (v) generating clusters of merchant location data records, (vi) for each generated cluster, identifying matched merchant location data records, and (vii) for each identified matched merchant location data record within a cluster, recording a merchant name retrieved from a co-referent complement merchant location data record.
[001] The present invention relates to the field of automated standardization of merchant information in big data databases, and in particular to automated identification and recordal of standardized merchant name information to merchant data records within big data databases.
Background of the invention
[002] With the increase in big data, a frequently observed occurrence is that individual data records that relate to a single entity are inadvertently (either because of sub-optimal information collection practices or sub-optimal information recordal practices) found to have non-standard and distinct names.
[003] For analytics processing that requires a consolidated view of an entire business, non-standard or incorrect naming of merchant data records creates serious problems, and requires pre-processing and standardization and / or disambiguation of the data to identify and aggregate the various names into a standardized format for processing and analytics.
[004] This is particularly observed in the credit and debit card industries, where Point-of-Sale (POS) terminals are provided to merchants at various locations, and where the paperwork generated during the POS terminal delivery process is insufficient, incomplete or simply incorrect - as a result of which the merchant name information which is available at the card issuer end, or at the payment network end, is correspondingly incorrect. By way of an example, names referring to the same merchant but in respect of different merchant store locations or merchant POS terminal locations may be spelled differently, may contain truncated or shortened words, may be completely different from each other, or may contain irrelevant and non-standard information or characters. For example, for the merchant 'Dominos', different individual merchant location data records may be respectively named / labelled 'Dominos', 'Dominoes', 'Dimonos', 'Dominopizza', 'Domino Pvt. Ltd.' and / or 'Jubilant Foodworks'. Additionally, if the business has multiple different locations or departments within stores that are separate cost centers, each store or department may be designated separately, e.g. 'Dominos #1234' etc.
[005] Existing solutions for pre-processing and standardization and / or disambiguation of merchant information that is extracted from records generated during the POS terminal delivery process are largely manual operations to verify name assignment and to correct for incorrect, inaccurate, obscure or outlier information. Existing processes comprise an n-request process that involves usage of third party merchant data and handwritten rules to match location information corresponding to a POS terminal with merchant location information in one or more third party merchant information databases, and subsequent to derivation of a match, a standardized merchant name is assigned to the merchant location for which the merchant name was not previously available.
[006] For small or conventionally sized data sets, or for data sets that are largely static, the prior art solutions are not completely impractical. However, in the case of POS terminal information, where payment networks can receive information on between 30,000 and 40,000 new merchant locations daily at which POS terminals have been delivered or installed, such pre-processing is extremely impractical and more often than not is entirely impossible. Such "big data" sets often include large complex collections of merchant location data records, and have a volume, velocity and variety that exceed most organizations' traditional storage or computing capacity for accurate and timely decision making.
[007] There is accordingly a need to provide solutions that enable processing and automated identification, disambiguation and recordal of standardized merchant name information to individual merchant location data records, such that merchant location data records corresponding to a single merchant can be accurately identified, aggregated and analyzed.
Summary
[008] The invention provides methods, systems and computer program products that enable processing and automated identification, disambiguation and recordal of standardized merchant name information to individual merchant location data records, such that merchant location data records corresponding to a single merchant can be accurately identified, aggregated and analyzed.
[009] The invention provides a system for automated recordal of standardized merchant names within merchant location data records retrieved from a first merchant data aggregation database, comprising a processor implemented server system coupled with the first merchant data aggregation database, wherein said server system is configured for (i) retrieving a first set of merchant location data records from the first merchant data aggregation database, (ii) retrieving at least a second set of merchant location data records, wherein said second set of merchant location data records is retrieved from at least a second merchant data aggregation database that is different from the first merchant data aggregation database, (iii) matching based on a first matching method, merchant location data records from the first set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a first merchant location data record from the first set of merchant location data records, and a second co-referent complement merchant location data record from at least the second set of merchant location data records, (iv) generating a sub-set of merchant location data records from the first set of merchant location data records that do not result in a match with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records, (v) matching based on a second matching method that is different from the first matching method, merchant location data records from the generated sub-set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a third merchant location data record from the generated sub-set of merchant location data records, and a fourth co-referent complement merchant location data record from at least the second set of merchant location data records, (vi) generating one or more clusters of merchant location data records within the first set of merchant location data records, (vii) for each generated cluster, identifying one or more matched merchant location data records, wherein said matched merchant location data records have been matched based on either of the first matching method or the second matching method with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records, and (viii) for each identified matched merchant location data record within a cluster, recording within a corresponding merchant name data field memory location, a merchant name retrieved from a co-referent complement merchant location data record to which the matched merchant location data record corresponds.
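The staged flow of steps (iii) to (v) can be illustrated with a minimal Python sketch: a cheap first matcher runs over every internal record, the unmatched records form the sub-set, and a costlier second matcher runs only on that sub-set. The functions `exact_key_match` and `fuzzy_match`, the dictionary record layout, the sample data and the 0.8 threshold are all hypothetical stand-ins chosen for illustration; they are not the classifiers of the invention.

```python
from difflib import SequenceMatcher

def exact_key_match(rec, externals):
    """Hypothetical cheap first method: identical phone number and zip code."""
    for ext in externals:
        if rec["phone"] and rec["phone"] == ext["phone"] and rec["zip"] == ext["zip"]:
            return ext
    return None

def fuzzy_match(rec, externals, threshold=0.8):
    """Hypothetical costlier second method: address similarity within one zip code."""
    best, best_score = None, threshold
    for ext in externals:
        if rec["zip"] != ext["zip"]:
            continue
        score = SequenceMatcher(None, rec["address"].lower(), ext["address"].lower()).ratio()
        if score >= best_score:
            best, best_score = ext, score
    return best

def cascade(internal, externals):
    """Return {internal record id: co-referent complement record}."""
    pairs = {}
    for rec in internal:  # first matching method, applied to every record
        ext = exact_key_match(rec, externals)
        if ext is not None:
            pairs[rec["id"]] = ext
    unmatched = [r for r in internal if r["id"] not in pairs]  # generated sub-set
    for rec in unmatched:  # second matching method, applied to the sub-set only
        ext = fuzzy_match(rec, externals)
        if ext is not None:
            pairs[rec["id"]] = ext
    return pairs

# Invented sample data for illustration only.
internal = [
    {"id": 1, "name": "DOMINOS #1234", "address": "12 Main St", "zip": "10001", "phone": "2125550100"},
    {"id": 2, "name": "DIMONOS", "address": "48 Oak Avenue", "zip": "10002", "phone": ""},
    {"id": 3, "name": "UNKNOWN", "address": "9 Elm Rd", "zip": "10003", "phone": ""},
]
externals = [
    {"name": "Domino's Pizza", "address": "12 Main Street", "zip": "10001", "phone": "2125550100"},
    {"name": "Domino's Pizza", "address": "48 Oak Ave", "zip": "10002", "phone": "2125550101"},
]
pairs = cascade(internal, externals)
```

In this sketch record 1 is paired by the first method, record 2 only by the second, and record 3 remains unmatched, which is the case the cluster-based steps are meant to address.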
[0010] The system may be configured such that within each generated cluster, each unmatched merchant location data record that has not resulted in a match based on either of the first matching method or the second matching method, is assigned a merchant name retrieved from a matched merchant location data record within the same cluster that has been matched based on either of the first matching method or the second matching method, wherein the assigned merchant name is recorded within a merchant name data field memory location associated with said unmatched merchant location data record.
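A minimal Python sketch of this within-cluster assignment is given below. The function name `propagate_names` and the input shapes are hypothetical, and the majority vote used when matched records in one cluster carry different names is an assumption of this sketch; the text above only requires that the name come from a matched record in the same cluster.

```python
from collections import Counter

def propagate_names(clusters, matched_names):
    """clusters: {cluster id: [record ids]}; matched_names: {record id: standardized name}.

    Returns {record id: assigned name} for every record in a cluster that
    contains at least one matched record.
    """
    assigned = {}
    for members in clusters.values():
        names = [matched_names[m] for m in members if m in matched_names]
        if not names:
            continue  # no matched record in this cluster, nothing to propagate
        # Assumption: if matched records disagree, use the most common name.
        cluster_name = Counter(names).most_common(1)[0][0]
        for m in members:
            # Matched records keep their own name; unmatched ones inherit.
            assigned[m] = matched_names.get(m, cluster_name)
    return assigned

# Invented sample data for illustration only.
clusters = {"a": [1, 2, 3], "b": [4]}
matched_names = {1: "Domino's Pizza"}
assigned = propagate_names(clusters, matched_names)
```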
[0011] In an embodiment, the system is configured such that the merchant name recorded to the corresponding merchant name data field memory location is written to the first merchant data aggregation database.
[0012] In another embodiment, the system may be configured such that the first matching method matches merchant location data records from the first set of merchant location data records against merchant location data records retrieved from a plurality of merchant data aggregation databases each of which is distinct from the first merchant data aggregation database.
[0013] In a further system embodiment, the first matching method is implemented through a processor implemented regression classifier.
[0014] The system may be configured such that the first matching method implements merchant location data record matching based on one or more of merchant name, merchant address, city, zip code, phone number, state or country.
[0015] In an alternative embodiment, the system may be configured such that the second matching method is implemented through a processor implemented random forest classifier.
[0016] In one embodiment of the system, the second matching method implements merchant location data record matching based on one or more of merchant name, merchant address, city, zip code, phone number, state, country, latitude, longitude or a determined distance between data fields in two data records.
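One possible reading of "a determined distance between data fields in two data records" is a geographic distance computed from the latitude and longitude fields. The Python sketch below computes a haversine distance and combines it with two other pairwise features; the function names, the dictionary keys and the choice of features are assumptions made for illustration, not features specified by the invention.

```python
import math
from difflib import SequenceMatcher

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0088  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pair_features(a, b):
    """Hypothetical feature vector for one candidate pair of records."""
    return [
        SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio(),  # name similarity
        1.0 if a["zip"] == b["zip"] else 0.0,                                  # zip code agreement
        haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]),                  # geographic distance
    ]
```

A vector of this kind could be passed to any pairwise classifier, including a random forest, to score whether two records are co-referent.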
[0017] In another embodiment, the system is configured such that the first matching method is more computationally efficient or more time efficient than the second matching method.
[0018] In a particular embodiment, the system is configured such that grouping of merchant location data records within the one or more clusters is based on one or more of merchant name information, industry information, average transaction value information and / or merchant category code information retrieved from data fields associated with the merchant location data records.
[0019] In another embodiment, the system may be configured such that the one or more clusters of merchant location data records are generated by a processor implemented regression classifier.
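As a rough illustration of grouping records into clusters by merchant name and merchant category code, the Python sketch below links two records when they share an MCC and their names are sufficiently similar, then takes connected components with a union-find structure. This rule-based linking, the function name `cluster_records` and the 0.75 threshold are assumptions of the sketch; it stands in for, and is not, the regression classifier mentioned above.

```python
from difflib import SequenceMatcher
from itertools import combinations

def cluster_records(records, name_threshold=0.75):
    """Return clusters as sorted lists of record ids."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if a["mcc"] != b["mcc"]:
            continue  # only records in the same merchant category are linked
        sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if sim >= name_threshold:
            parent[find(a["id"])] = find(b["id"])  # union the two components

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(clusters.values())

# Invented sample data for illustration only.
records = [
    {"id": 1, "name": "Dominos", "mcc": "5814"},
    {"id": 2, "name": "Dominoes", "mcc": "5814"},
    {"id": 3, "name": "Acme Hardware", "mcc": "5251"},
    {"id": 4, "name": "Dominos", "mcc": "5999"},
]
clusters = cluster_records(records)
```

Record 4 stays in its own cluster despite an identical name because its MCC differs, which shows why more than one field is used for grouping.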
[0020] The invention also provides a method for automated recordal of standardized merchant names within merchant location data records retrieved from a first merchant data aggregation database. The method comprises implementing within a processor implemented server system coupled with the first merchant data aggregation database, the steps of (i) retrieving a first set of merchant location data records from the first merchant data aggregation database, (ii) retrieving at least a second set of merchant location data records, wherein said second set of merchant location data records is retrieved from at least a second merchant data aggregation database that is different from the first merchant data aggregation database, (iii) matching based on a first matching method, merchant location data records from the first set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a first merchant location data record from the first set of merchant location data records, and a second co-referent complement merchant location data record from at least the second set of merchant location data records, (iv) generating a sub-set of merchant location data records from the first set of merchant location data records that do not result in a match with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records, (v) matching based on a second matching method that is different from the first matching method, merchant location data records from the generated sub-set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a third merchant location data record from the generated sub-set of merchant location data records, and a fourth co-referent complement merchant location data record from at least the second set of merchant location data records, (vi) generating one or more clusters of merchant location data records within the first set of merchant location data records, (vii) for each generated cluster, identifying one or more matched merchant location data records, wherein said matched merchant location data records have been matched based on either of the first matching method or the second matching method with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records, and (viii) for each identified matched merchant location data record within a cluster, recording within a corresponding merchant name data field memory location, a merchant name retrieved from a co-referent complement merchant location data record to which the matched merchant location data record corresponds.
[0021] In an embodiment of the method, within each generated cluster, each unmatched merchant location data record that has not resulted in a match based on either of the first matching method or the second matching method, is assigned a merchant name retrieved from a matched merchant location data record within the same cluster that has been matched based on either of the first matching method or the second matching method, wherein the assigned merchant name is recorded within a merchant name data field memory location associated with said unmatched merchant location data record.
[0022] In another embodiment of the method, the merchant name recorded to the corresponding merchant name data field memory location is written to the first merchant data aggregation database.
[0023] In a particular embodiment of the method, the first matching method matches merchant location data records from the first set of merchant location data records against merchant location data records retrieved from a plurality of merchant data aggregation databases each of which is distinct from the first merchant data aggregation database.
[0024] In one method embodiment, the first matching method is implemented through a processor implemented regression classifier.
[0025] In a further method embodiment, the first matching method implements merchant location data record matching based on one or more of merchant name, merchant address, city, zip code, phone number, state or country.
[0026] In a specific embodiment of the method, the second matching method is implemented through a processor implemented random forest classifier.
[0027] The second matching method may implement merchant location data record matching based on one or more of merchant name, merchant address, city, zip code, phone number, state, country, latitude, longitude or a determined distance between data fields in two data records.
[0028] In a specific embodiment of the method, the first matching method is more computationally efficient or more time efficient than the second matching method.
[0029] In one embodiment of the method, grouping of merchant location data records within the one or more clusters is based on one or more of merchant name information, industry information, average transaction value information and / or merchant category code information retrieved from data fields associated with the merchant location data records.
[0030] In a particular method embodiment, the one or more clusters of merchant location data records are generated by a processor implemented regression classifier.
[0031] The invention also provides a computer program product for automated recordal of standardized merchant names within merchant location data records retrieved from a first merchant data aggregation database. The computer program product comprises a non-transitory computer readable medium having a computer readable program code embodied therein, the computer readable program code comprising instructions for (i) retrieving a first set of merchant location data records from the first merchant data aggregation database, (ii) retrieving at least a second set of merchant location data records, wherein said second set of merchant location data records is retrieved from at least a second merchant data aggregation database that is different from the first merchant data aggregation database, (iii) matching based on a first matching method, merchant location data records from the first set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a first merchant location data record from the first set of merchant location data records, and a second co-referent complement merchant location data record from at least the second set of merchant location data records, (iv) generating a sub-set of merchant location data records from the first set of merchant location data records that do not result in a match with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records, (v) matching based on a second matching method that is different from the first matching method, merchant location data records from the generated sub-set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a third merchant location data record from the generated sub-set of merchant location data records, and a fourth co-referent complement merchant location data record from at least the second set of merchant location data records, (vi) generating one or more clusters of merchant location data records within the first set of merchant location data records, (vii) for each generated cluster, identifying one or more matched merchant location data records, wherein said matched merchant location data records have been matched based on either of the first matching method or the second matching method with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records, and (viii) for each identified matched merchant location data record within a cluster, recording within a corresponding merchant name data field memory location, a merchant name retrieved from a co-referent complement merchant location data record to which the matched merchant location data record corresponds.
Brief description of the accompanying drawings
[0032] Figure 1 illustrates a system environment for merchant data aggregation of a type used in connection with credit card and / or debit card based POS terminals.
[0033] Figure 2 is a table illustrating an exemplary set of merchant location data records of a kind that are intended to be identified, disambiguated, and modified to record a standardized merchant name in accordance with the teachings of the present invention.
[0034] Figure 3 illustrates a system environment in accordance with the present invention, configured for enabling identification and disambiguation of merchant location data records and for aggregation and assigning of standardized merchant names to merchant location data records corresponding to individual merchants.
[0035] Figures 4A and 4B illustrate in combination, a method of aggregating and assigning of standardized merchant names to merchant location data records.
[0036] Figure 5 illustrates a method of retrieval and optional pre-processing of merchant location data records.
[0037] Figure 6 illustrates a method of matching and generating co-referent pairs of merchant location data records retrieved respectively from an internal database and from one or more external databases, based on a first matching method.
[0038] Figure 7 illustrates a method of identifying a sub-set of merchant location data records retrieved from an internal database that require to be matched against merchant location data records from one or more external databases, based on a second matching method.
[0039] Figure 8 illustrates a method of matching and generating co-referent pairs of merchant location data records retrieved respectively from the identified sub-set of merchant location data records (from the method of Figure 7) and from the one or more external databases, based on a second matching method.
[0040] Figure 9 illustrates a first method of assigning standardized merchant names to individual merchant location data records within clusters of merchant location data records that have been generated in respect of records retrieved from an internal database.
[0041] Figure 10 illustrates a second method of assigning standardized merchant names to individual merchant location data records within clusters of merchant location data records that have been generated in respect of records retrieved from an internal database.
[0042] Figure 11 illustrates a method of training a random forest classifier to match merchant location data records retrieved from an internal database with merchant location data records retrieved from an external database.
[0043] Figure 12 illustrates an exemplary embodiment of a merchant data aggregation server of a type that may be configured for implementing the teachings of the present invention.
[0044] Figure 13 illustrates an exemplary computer system according to which various embodiments of the present invention may be implemented.
Detailed description
The invention provides methods, systems and computer program products that enable processing and automated identification, disambiguation and recordal of standardized merchant name information to individual merchant location data records, such that merchant location data records corresponding to a single merchant can be accurately identified, aggregated and analyzed.
Figure 1 illustrates a system environment 100 for merchant data aggregation of a type used in connection with credit card and / or debit card based POS terminals. System environment 100 comprises a plurality of merchant locations 102a, 102b up to 102n, at which POS terminals 1 up to n have been installed and are operational. POS terminals 1 to n are respectively in network communication through network 104 with merchant data aggregator 106 - which merchant data aggregator 106 aggregates information corresponding to the merchant locations 102a up to 102n and may be configured to store and automatically identify, disambiguate and record standardized merchant name information in respect of individual merchant location data records corresponding to merchant locations 102a up to 102n.
[0045] POS terminals 1 to n may comprise any processor implemented data processing devices having network communication capabilities, and configured to enable network based credit card and / or debit card transactions through a payment network. In exemplary instances, POS terminals 1 to n may include any of a credit or debit card swipe machine or a near-field communication enabled credit or debit card machine, or a computer, or a smartphone, or any other mobile or non-mobile data processing and/or data communication device.
[0046] Network 104 may comprise any communication network (for example, the internet). In a particular embodiment, network infrastructure 104 may comprise part or whole of a payment card network.
[0047] Merchant data aggregator 106 may include one or more servers 106a and one or more databases 106b. The one or more databases 106b may be configured to store merchant location data records corresponding to the merchant locations 102a to 102n. Each merchant location data record may include a plurality of data fields, including data fields configured to store information corresponding to any of merchant name, street address, city, zip code, phone number, state, country, latitude, longitude, merchant industry, average value of transactions received from one or more POS terminals associated with the merchant location, merchant category code (MCC) assigned to a merchant, merchant interbank card association number (ICA code), merchant tax identifier, and / or merchant uniform resource locator (URL).
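For illustration, a merchant location data record with the data fields listed above could be modelled as follows in Python. The class name, field names and types are assumptions of this sketch; every field is optional because, as the background notes, source records are frequently incomplete.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MerchantLocationRecord:
    """Hypothetical in-memory layout of one merchant location data record."""
    merchant_name: Optional[str] = None
    street_address: Optional[str] = None
    city: Optional[str] = None
    zip_code: Optional[str] = None
    phone_number: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    industry: Optional[str] = None
    average_transaction_value: Optional[float] = None
    mcc: Optional[str] = None             # merchant category code
    ica_code: Optional[str] = None        # interbank card association number
    tax_identifier: Optional[str] = None
    url: Optional[str] = None

# Invented sample record for illustration only.
record = MerchantLocationRecord(merchant_name="Dominos #1234", city="New York", state="NY")
```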
[0048] Server(s) 106a may include one or more processor implemented servers configured to receive merchant location data from various sources or inputs, and to generate corresponding merchant location data records and to retrievably store such merchant location data records in database(s) 106b. Server(s) 106a may additionally be configured to implement one or more of the methods of the present invention for automated identification, disambiguation and recordal of standardized merchant name information to individual merchant location data records within database(s) 106b, such that merchant location data records corresponding to a single merchant can be accurately identified, aggregated and analyzed.
[0049] Figure 2 is a table illustrating an exemplary set of merchant location data records of a kind that are intended to be identified, disambiguated, and modified with a standardized merchant name in accordance with the teachings of the present invention. The table illustrates a plurality of merchant location data records, each data record including data fields for merchant name, city, state, zip code, phone number, latitude and longitude of the merchant location. As will be seen from the similarly shaded data records, a single merchant may have multiple locations, and correspondingly multiple merchant location data records. Correlating and aggregating different merchant locations that correspond to a single merchant in a fully automated manner, and thereafter assigning a standardized merchant name to the aggregated data records is one of the outcomes of the present invention - and relies on both machine learning and artificial intelligence.
[0050] Figure 3 illustrates a system environment 300 in accordance with the present invention, configured for enabling identification and disambiguation of merchant location data records and for aggregation and assigning of standardized merchant names to merchant location data records corresponding to individual merchants.
[0051] System environment 300 comprises a plurality of merchant locations 302a, 302b up to 302n, at which POS terminals 1 up to n have been installed and are operational. POS terminals 1 to n are respectively in network communication through network 304 with merchant data aggregator 306 - which merchant data aggregator 306 aggregates information corresponding to the merchant locations 302a up to 302n and may be configured to store and automatically identify, disambiguate and record standardized merchant name information in respect of individual merchant location data records corresponding to merchant locations 302a up to 302n.
[0052] Network 304 may comprise any communication network (for example, the internet). In a particular embodiment, network infrastructure 304 may comprise part or whole of a payment card network.
[0053] Merchant data aggregator 306 may include one or more servers 306a and one or more databases 306b. The one or more databases 306b may be configured to store merchant location data records corresponding to the merchant locations 302a to 302n. Each merchant location data record may include a plurality of data fields, including data fields configured to store information corresponding to any of merchant name, street address, city, zip code, phone number, state, country, latitude, longitude, merchant industry, average value of transactions received from one or more POS terminals associated with the merchant location, merchant category code (MCC) assigned to a merchant, merchant interbank card association number (ICA code), merchant tax identifier, and / or merchant uniform resource locator (URL).
[0054] Server(s) 306a may include one or more processor implemented servers configured to receive merchant location data from various sources or inputs, and generate corresponding merchant location data records and to retrievably store such merchant location data records in database(s) 306b. Server(s) 306a may additionally be configured to implement one or more of the methods of the present invention for automated identification, disambiguation and recordal of standardized merchant name information to individual merchant location data records within database(s) 306b, such that merchant location data records corresponding to a single merchant can be accurately identified, aggregated and analyzed.
[0055] Merchant data aggregator 306 is additionally communicably coupled with a first external data aggregator database 308a and a second external data aggregator database 308b. Each of said first and second external data aggregator databases 308a and 308b is a database operated and maintained by an external agency, and contains merchant location data records that have been generated and stored by such external agency. In an embodiment of the invention, the external data aggregator databases 308a and 308b are the AggData™ and Pitney Bowes™ databases respectively - each of which comprises an external database of aggregated location based business information including data records of retail locations for businesses and merchants.
[0056] The configuration and operation of merchant data aggregator 306 may be understood in connection with the methods described hereinbelow.
[0057] Figures 4A and 4B illustrate in combination, a method of aggregating and disambiguating merchant location data records and of assigning of standardized merchant names to merchant location data records. The method of Figures 4A and 4B may be implemented within one or more servers 306a within merchant data aggregator 306 within system environment 300.
[0058] Step 402 comprises retrieving a first set of merchant location data records from an internal merchant information database (for example from database 306b) within merchant data aggregator 306, and at least a second set of merchant location data records from one or more external data aggregator database(s) (for example, from one or both of a first external data aggregator database 308a and a second external data aggregator database 308b). As in the above cases, each merchant location data record may include a plurality of data fields, including data fields configured to store information corresponding to any of merchant name, street address, city, zip code, phone number, state, country, latitude, longitude, merchant industry, average value of transactions received from one or more POS terminals associated with the merchant location, merchant category code (MCC) assigned to a merchant, merchant interbank card association number (ICA code), merchant tax identifier, and / or merchant uniform resource locator (URL).
[0059] Step 402 may additionally comprise preprocessing the retrieved first set of merchant location data records and / or at least the second set of merchant location data records that are retrieved from the respective databases. In an embodiment of the invention, pre-processing merchant location data records may include the steps of removing records with null addresses and states, normalizing addresses, formatting phone numbers to standard phone number formats (e.g. a standard U.S. phone number format), removing unnecessary or duplicated characters from records, and / or converting each data field to a standard format for the purposes of comparison and / or analysis.
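By way of non-limiting illustration only, the pre-processing described above may be sketched in Python as follows. The field names ('name', 'address', 'state', 'phone'), the chosen phone number layout and the decision to upper-case and strip punctuation are assumptions made for the purposes of this example and are not prescribed by the described embodiment.

```python
import re

def normalize_phone(raw):
    """Format a 10-digit U.S. number as (NNN) NNN-NNNN; return None otherwise."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading U.S. country code
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

def normalize_text(value):
    """Upper-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", value or "")
    return re.sub(r"\s+", " ", cleaned).strip().upper()

def preprocess(records):
    """Drop records lacking an address or state, then standardize fields."""
    cleaned = []
    for rec in records:
        if not rec.get("address") or not rec.get("state"):
            continue  # records with null addresses or states are removed
        cleaned.append({
            "name": normalize_text(rec.get("name")),
            "address": normalize_text(rec["address"]),
            "state": normalize_text(rec["state"]),
            "phone": normalize_phone(rec.get("phone")),
        })
    return cleaned
```

The same routine could be applied to records from the internal database and from each external database, so that subsequent comparisons operate on identically formatted fields.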
[0060] Step 404 comprises matching the retrieved first set of merchant location data records with at least the second set of merchant location data records to identify matching co-referent pairs of merchant location data records based on a first matching method. Each co-referent pair comprises a first merchant location data record from the first set of merchant location data records, and a corresponding 'co-referent complement' merchant location data record that comprises part of the at least second set of merchant location data records that has been retrieved from an external database. The first matching method may in an embodiment comprise a machine learning driven matching technique that includes adaptive blocking to maximize detection of true co-referent pairs and to minimize the incidence of false co-referent pairs being generated - thereby optimizing the time efficiency and the computational efficiency of the matching process. In an embodiment, the first matching method implements matching of the first set of merchant location data records and at least the second set of merchant location data records, based on a processor implemented regression classifier. The processor implemented regression classifier may comprise an L-2 regularized logistic regression classifier, which may have been trained or configured for matching based on training data comprising features created through actively learned similarity functions. Active learning techniques may be used to iteratively identify unconfident or low confidence co-referent pairs determined by the regression classifier, and these unconfident or low confidence co-referent pairs may be manually labelled and provided as input training data to the regression classifier to improve its recognition capabilities in respect of decision boundaries.
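Purely as an illustrative sketch, and not as the implementation of the described embodiment, an L-2 regularized logistic regression classifier and the selection of low confidence candidate pairs for manual labelling may be expressed in Python as follows. The feature vectors are assumed to be per-field similarity scores between 0 and 1; the hyperparameter values, the use of batch gradient descent, and the choice to leave the bias term unregularized are assumptions of this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_l2_logistic(features, labels, l2=0.1, lr=0.5, epochs=500):
    """Batch gradient descent on mean log-loss plus (l2 / 2) * ||w||^2."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        # The L-2 penalty is applied to the weights only (assumption).
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(model, x):
    """Estimated probability that a candidate pair is co-referent."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def least_confident(model, candidates, k):
    """Return the k candidate pairs whose probability is closest to 0.5."""
    return sorted(candidates, key=lambda x: abs(predict_proba(model, x) - 0.5))[:k]
```

In an active learning loop of the kind described above, the pairs returned by the hypothetical `least_confident` helper would be manually labelled, appended to the training data, and the classifier retrained.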
[0061] In an embodiment, the first matching method implemented at step 404 identifies co-referent pairs of merchant location data records based on one or more data fields within the data records, including any one or more of merchant name, merchant address, city, zip code, phone number, state and / or country.
[0062] Step 406 comprises generating a third set of merchant location data records comprising one or more merchant location data records from the first set of merchant location data, wherein the matching method of step 404 has failed to identify for such one or more merchant location data records, a satisfactory co-referent complement data record in the second set of merchant location data records. Stated differently, the third set of merchant location data records comprises merchant location data records that have not been satisfactorily matched through the matching process of step 404.
[0063] Step 408 comprises matching the third set of merchant location data records with at least the second set of merchant location data records to identify matching co-referent pairs of merchant location data records based on a second matching method that is different from the first matching method. The second matching method may in an embodiment comprise matching the third set of merchant location data records against at least the second set of merchant location data records based on a processor implemented random forest classifier.
[0064] In an embodiment, the second matching method implemented at step 408 identifies co-referent pairs of merchant location data records based on one or more data fields within the data records, including any one or more of merchant name, merchant address, city, zip code, phone number, state, country, latitude, longitude and / or a determined distance between corresponding data fields in two data records (for example, the Levenshtein distance between the respective names or between the respective addresses).
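As a minimal illustrative sketch, the Levenshtein distance referred to above may be computed in Python using the standard dynamic programming recurrence. The `normalized_similarity` helper, which rescales the distance to a value between 0 and 1 so that it can serve as a classifier feature, is an assumption of this example; the random forest classifier itself is not shown.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def normalized_similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```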
[0065] In an embodiment of the invention, the second matching method implemented at step 408 is more computationally intensive or time intensive than the first matching method implemented at step 404.
[0066] The processor implemented random forest classifier may be trained for improving the efficiency and accuracy of the matching outcomes, in accordance with the method discussed subsequently in connection with Figure 11.
[0067] Step 410 comprises clustering (vertical clustering) individual merchant location data records within the first set of merchant location data records into a plurality of clusters based on a first clustering method. The first clustering method may comprise block-wise hierarchical clustering that is performed over the merchant location data records in the first set of merchant location data records. The block-wise hierarchical clustering may in an embodiment be performed using a processor implemented regression classifier. The processor implemented regression classifier may comprise an L-2 regularized logistic regression classifier and may be trained using active learning techniques based on training data comprising features created through actively learned similarity functions. Active learning techniques may be used to iteratively identify unconfident or low confidence pairs or cluster candidates determined by the regression classifier, and these unconfident or low confidence pairs or cluster candidates may be manually labelled and provided as input training data to the regression classifier to improve its recognition capabilities in respect of the decision boundary. In an embodiment, the clustering of merchant location data records at step 410 may be based on one or more of merchant name information, industry information, average transaction value information and / or merchant category code information retrieved from data fields associated with the merchant location data records.
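For illustration only, block-wise clustering of the general kind described above may be sketched in Python as follows. The sketch groups records into blocks using a caller-supplied key and then merges, within each block, any two records whose pairwise similarity meets a threshold; this corresponds to single-linkage hierarchical clustering cut at that threshold, which is a simplification chosen for this example. The `block_key` and `similarity` callables are hypothetical; in the described embodiment the pairwise score may instead be produced by the regression classifier.

```python
from collections import defaultdict
from itertools import combinations

def blockwise_cluster(records, block_key, similarity, threshold):
    """Return clusters as sorted lists of record indices."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Blocking: only records sharing a block key are ever compared.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    # Within each block, link every pair that is similar enough.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

Restricting comparisons to records within the same block avoids comparing every record against every other record, which is the principal reason for blocking on large data sets.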
[0068] Subsequent to completion of the clustering of merchant location data records within the first set of merchant location data records, step 412 comprises identifying within each generated cluster of merchant location data records (generated at step 410), individual merchant location data records for which a satisfactory co-referent complement data record has been identified within at least the second set of merchant location data records (i.e. at one or both of steps 404 and / or step 408). Thereafter step 412 includes assigning to each such identified merchant location data record, a merchant name retrieved from the matched co-referent complement data record from the second set of merchant location data records. So for example, if a merchant location data record within the first set of merchant location data records and having a merchant name 'XYZ', has been matched with a satisfactory co-referent complement data record within the second set of merchant location data records, where the co-referent complement data record has the merchant name 'Dominos', the name 'Dominos' is thereafter assigned to the merchant name data field within the merchant location data record from the first set of merchant location data records.
[0069] In the event that the first set of merchant location data records is being matched against only a single second set of merchant location data records retrieved from a single external aggregator database, the assignment of merchant name to merchant location data records within the first set of merchant location data records is relatively straightforward - since each merchant location data record within the first set of merchant location data records would be satisfactorily matched with a single co-referent complement data record in the second set of merchant location data records, and can be assigned the merchant name retrieved from the matched co-referent complement data record. In other cases, where the first set of merchant location data records is matched against merchant location data records retrieved from two or more external databases (for example, from both of the AggData™ and Pitney Bowes™ databases), assignment of a merchant name to merchant location data records within the first set of merchant location data records may in an embodiment be achieved in one or more of the following ways:
• Each of the two or more databases may be assigned a priority relative to the remaining databases. For example, if the external databases comprise both of the AggData™ and Pitney Bowes™ databases, the AggData™ database may be assigned a higher priority relative to the Pitney Bowes™ database.
• If a merchant location data record from the first set of merchant location data records is matched with a corresponding co-referent complement data record in both of the two or more external databases, a merchant name that is assigned to the merchant location data record from the first set of merchant location data records is extracted from the matching co-referent complement data record found in the external database that has the highest priority out of the external databases in which matches have been found. So for example, if co-referent complement data records are found in both of the AggData™ and Pitney Bowes ™ databases, and the AggData™ database has been assigned the higher priority of the two, the merchant name extracted from the matched co-referent complement data record that has been found in the AggData™ database is assigned to the corresponding merchant location data record from the first set of merchant location data records.
• If the matching of data records is carried out against a plurality of external databases, but a merchant location data record from the first set of merchant location data records is matched with a corresponding co-referent complement data record in only one of the two or more external databases, a merchant name that is assigned to the merchant location data record from the first set of merchant location data records is extracted from the single matching co-referent complement data record that has been found in the concerned external database - regardless of the identity of such external database. So for example, if the matching of the first set of merchant location data records is carried out against merchant location data records from both of the AggData™ and Pitney Bowes™ databases, and (i) if a co-referent complement data record for a specific merchant location data record within the first set of merchant location data records is found in only the AggData™ database, the merchant name extracted from the matched co-referent complement data record that has been found in the AggData™ database is assigned to the corresponding merchant location data record from the first set of merchant location data records, or (ii) if a co-referent complement data record for a specific merchant location data record within the first set of merchant location data records is found in only the Pitney Bowes™ database, the merchant name extracted from the matched co-referent complement data record that has been found in the Pitney Bowes™ database is assigned to the corresponding merchant location data record from the first set of merchant location data records.
• In case no match is found for a particular merchant location data record within the first set of merchant location data records, against data records from any of the plurality of external databases, name assignment of the unmatched merchant locations may be implemented in accordance with step 414 which is discussed in more detail below.
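The priority based selection set out in the above list may, by way of a non-limiting sketch, be expressed in Python as follows. The source keys 'aggdata' and 'pitney_bowes' and the function name `select_name` are hypothetical labels introduced for this example.

```python
def select_name(matches, priority):
    """Pick a merchant name from matched co-referent complement records.

    matches:  dict mapping an external source key to the merchant name of
              the complement record found in that source (absent if no match).
    priority: list of source keys, highest priority first.
    """
    for source in priority:
        if source in matches:
            # The highest-priority source that produced a match wins; with a
            # single match this is simply that match, whatever its source.
            return matches[source]
    return None  # no external match: left to cluster-based assignment (step 414)
```

For example, `select_name({"aggdata": "Dominos", "pitney_bowes": "Domino's Pizza"}, ["aggdata", "pitney_bowes"])` selects the name from the higher priority source.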
[0070] At step 414, for each individual merchant location data record within a cluster (generated within the first set of merchant location data records at step 410), for which a satisfactory co-referent complement data record has not been identified, a merchant name is assigned to said individual merchant location data record based on merchant name(s) that has / have been assigned (at step 412) to one or more other individual merchant location data records in the same cluster for which a satisfactory co-referent complement data record has been identified. To explain by way of example, let us assume a case where a cluster of merchant location data records (generated in accordance with step 410) within the first set of merchant location data records comprises first and second merchant location data records. In the example, (i) the first merchant location data record within the cluster previously had a merchant name 'XYZ', and has been matched with a satisfactory co-referent complement data record within the second set of merchant location data records (in accordance with either or both of steps 404 and / or 408), and since the co-referent complement data record has the merchant name 'Dominos', step 412 has therefore resulted in the name 'Dominos' being assigned to the merchant name data field within the first merchant location data record in said cluster and (ii) the second merchant location data record has a merchant name 'ABC' but no satisfactory co-referent complement data record has been matched to this second merchant location data record even after implementing the matching steps 404 and 408. In this case, the merchant name that has been assigned to the first merchant location data record within the cluster at step 412 is also assigned to the second merchant location data record within said cluster - as a result of which, the name 'Dominos' is thereafter assigned to the merchant name data field within the second merchant location data record as well.
[0071] While the above example discusses a simple case where a cluster comprises two merchant location data records, one of which has been satisfactorily matched with a co-referent complement data record, and the other one of which has not been satisfactorily matched, it is also possible that a cluster may have a plurality of merchant location data records that have been satisfactorily matched (i.e. a plurality of 'matched merchant location data records') and as a result of such matching, different merchant names have been assigned to each such matched merchant location data record. In such cases, each merchant location data record within the cluster that has not been satisfactorily matched (i.e. each 'unmatched merchant location data record') would be assigned a merchant name from one of such matched merchant location data records, and selection of the matched merchant location data record from which the merchant name is to be extracted for assignment to the unmatched merchant location data record may be based on any one of several selection techniques. One such selection technique is to select a matched merchant location data record which has the most frequently occurring merchant name within that cluster and to assign this most frequently occurring merchant name within that cluster to the merchant name data field in one or more unmatched merchant location data records within the cluster. Other selection techniques may be based on common merchant ICA codes, common merchant tax identifiers, and / or common merchant URLs.
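The most frequently occurring name selection technique described above may be sketched, for illustration only, in Python as follows. The 'assigned_name' field, which is assumed to hold the name assigned at step 412 or None for an unmatched record, is a hypothetical field name introduced for this example.

```python
from collections import Counter

def propagate_cluster_names(cluster):
    """Assign the cluster's most frequent matched name to unmatched records.

    cluster: list of dicts, each with an 'assigned_name' key that is either
             a merchant name (matched record) or None (unmatched record).
    Returns a new list; the input records are not modified.
    """
    names = [rec["assigned_name"] for rec in cluster if rec.get("assigned_name")]
    if not names:
        return cluster  # nothing to propagate in this cluster
    most_common_name = Counter(names).most_common(1)[0][0]
    return [dict(rec, assigned_name=rec.get("assigned_name") or most_common_name)
            for rec in cluster]
```

Records that already carry a name from a matched co-referent complement data record are left unchanged; only unmatched records receive the cluster's most frequent name.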
[0072] In implementing the various method steps of the method of Figures 4A and 4B, each assignment of a merchant name to a merchant location data record within the first set of merchant location data records may be accompanied by generation and assignment of a corresponding confidence level or confidence score that represents the likelihood that the name assignment is correct. Generation and assignment of confidence scores may in an embodiment be achieved in one or more of the following ways:
• If a merchant location data record within the first set of merchant location data records is being matched against merchant location data records retrieved from two or more external databases (for example, from both of the AggData™ and Pitney Bowes ™ databases) and if a match is found for the merchant location data record in both external databases, and additionally if both of the matched co-referent complement data records have the same merchant name, then this merchant name may be assigned to the merchant location data record within the first set of merchant location data records along with a high confidence score (or with the confidence score 'HIGH').
• If a merchant location data record within the first set of merchant location data records is being matched against merchant location data records retrieved from two or more external databases (for example, from both of the AggData™ and Pitney Bowes ™ databases) and if a match is found for the merchant location data record in only one of such external databases, then the merchant name retrieved from the matched co-referent complement data record from the external database may be assigned to the merchant location data record within the first set of merchant location data records along with a high confidence score (or with the confidence score 'HIGH').
• If a merchant location data record within the first set of merchant location data records is being matched against merchant location data records retrieved from two or more external databases (for example, from both of the AggData™ and Pitney Bowes™ databases) and if no match is found for the merchant location data record in either of such external databases, but the merchant tax identifier, merchant URL and acquirer identifier (i.e. an identifier corresponding to an acquirer institution through which the POS terminal at the merchant location operates for carrying out electronic payment transactions) match one or more other merchant location data records that have already been assigned a merchant name within the same cluster as the merchant location data record, then the merchant name retrieved from such other merchant location data record(s) within the cluster may be assigned to the merchant location data record within the first set of merchant location data records, along with a medium confidence score (or with the confidence score 'MEDIUM').
• All other merchant location data records within the clusters of data records generated out of the first set of merchant location data records for which a merchant name has not been identified are associated with a low confidence score (or with the confidence score 'LOW').
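The confidence scoring rules listed above may be sketched in Python as follows, by way of illustration only. The field names 'tax_id', 'url', 'acquirer_id' and 'assigned_name' are hypothetical. The description does not state which score applies where matches are found in two external databases but the retrieved names differ; this sketch labels that case 'LOW' purely as an assumption.

```python
PEER_KEYS = ("tax_id", "url", "acquirer_id")

def find_peer_name(record, cluster):
    """Name of a cluster peer sharing tax identifier, URL and acquirer identifier."""
    for peer in cluster:
        if peer is record or not peer.get("assigned_name"):
            continue
        if all(record.get(k) is not None and record.get(k) == peer.get(k)
               for k in PEER_KEYS):
            return peer["assigned_name"]
    return None

def score_assignment(external_names, cluster_peer_name=None):
    """Return (merchant_name, confidence) for one internal record.

    external_names: names from matched complement records, one per external
                    database in which a match was found, in priority order.
    """
    if len(external_names) >= 2:
        if len(set(external_names)) == 1:
            return external_names[0], "HIGH"   # all sources agree
        return external_names[0], "LOW"        # sources disagree (assumption)
    if len(external_names) == 1:
        return external_names[0], "HIGH"       # matched in exactly one source
    if cluster_peer_name is not None:
        return cluster_peer_name, "MEDIUM"     # inferred from a cluster peer
    return None, "LOW"                         # no name identified
```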
[0073] In the event one or more merchant location data records within the first set of merchant location data records remain unmatched subsequent to implementation of method steps 402 to 414 of Figures 4A and 4B, or have been assigned a merchant name but with a low confidence score, such merchant location data records are sent for manual review and name assignment. Thereafter, such manually labelled results can be used as training data for iteratively improving the accuracy and results of one or more of the regression classifier(s) and / or random forest classifier through which the method of Figures 4A and 4B is implemented.
[0074] One of the critical advantages of the method of Figures 4A and 4B is the two-stage matching that uses different first and second matching methods. The first matching method, which is more computationally and time efficient and is based on analysis of a smaller set of data parameters, is used as a first level matching method on the entire set of merchant location data records within an internal database, to determine matches against merchant location data records retrieved from external databases. Thereafter, for a smaller set of merchant location data records within the internal database which have not been satisfactorily matched through the first matching method, the second matching method (which is more computationally and time intensive and which is based on analysis of a larger set of data parameters than the first matching method) is applied to determine matches against merchant location data records retrieved from external databases. Because the more complex second matching method needs to be performed on only this smaller number of merchant location data records, the combination of the first and second matching methods results in improved disambiguation and matching accuracy without significant tradeoffs in computational and time efficiencies.
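As a minimal sketch of the two-stage arrangement described above, and not as a definitive implementation, the control flow may be expressed in Python as follows. The `fast_matcher` and `slow_matcher` callables stand in for the first and second matching methods respectively and are hypothetical; each is assumed to return a (candidate, score) tuple, with candidate set to None where nothing is found. Use of a single threshold for both stages is likewise an assumption of this example.

```python
def two_stage_match(internal, external, fast_matcher, slow_matcher, threshold):
    """Apply the cheap matcher to every record, then the costly matcher
    only to records the cheap matcher could not satisfactorily match.

    internal: dict mapping record id to record.
    Returns (matched, still_unmatched) where matched maps record id to the
    accepted co-referent complement candidate.
    """
    matched, unmatched = {}, []

    # Stage 1: first matching method over the entire internal set.
    for rec_id, rec in internal.items():
        candidate, score = fast_matcher(rec, external)
        if candidate is not None and score >= threshold:
            matched[rec_id] = candidate
        else:
            unmatched.append(rec_id)

    # Stage 2: second matching method over the unmatched sub-set only.
    for rec_id in unmatched:
        candidate, score = slow_matcher(internal[rec_id], external)
        if candidate is not None and score >= threshold:
            matched[rec_id] = candidate

    still_unmatched = [rec_id for rec_id in unmatched if rec_id not in matched]
    return matched, still_unmatched
```

Records remaining in `still_unmatched` would then fall to cluster based name assignment or manual review.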
[0075] Figure 5 illustrates a method of retrieval and optional pre-processing of merchant location data records. The method of Figure 5 may be implemented in connection with implementation of method step 402 of the method of Figures 4A and 4B, as described above.
[0076] Step 502 comprises retrieving and optionally pre-processing a first set of internal merchant location data records from an internal merchant information database - for example from database 306b within merchant data aggregator 306.
[0077] Step 504 comprises retrieving and optionally pre-processing a first set of external merchant location data records from a first external data aggregator database - for example from first external data aggregator database 308a (or for example, from an AggData™ database).
[0078] Step 506 comprises retrieving and optionally pre-processing a second set of external merchant location data records from a second external data aggregator database - for example from second external data aggregator database 308b (or for example, from a Pitney Bowes™ database).
[0079] In embodiments of the method of Figure 5, pre-processing merchant location data records at any of steps 502 to 506 may include the steps of removing records with null addresses and states, normalizing addresses, formatting phone numbers to standard phone number formats (e.g. a standard U.S. phone number format), removing unnecessary or duplicated characters from records, and / or converting each data field to a standard format for the purposes of comparison and / or analysis.
[0080] Figure 6 illustrates a method of matching and generating co-referent pairs of merchant location data records retrieved respectively from an internal database and from one or more external databases, based on a first matching method. The method of Figure 6 may be implemented in connection with implementation of method step 404 of the method of Figures 4A and 4B, as described above.
[0081] Step 602 comprises matching merchant location data records within the first set of internal merchant location data records (for example, that have been retrieved at step 502 of Figure 5) with a first set of external merchant location data records (retrieved for example at step 504 of Figure 5) to identify a first set of matching co-referent pairs of merchant location data records based on a first matching method.
[0082] Step 604 comprises assigning a co-referent pairing confidence score to each co-referent pair within the first set of matching co-referent pairs of merchant location data records identified at step 602 above.
[0083] Step 606 comprises matching merchant location data records within the first set of internal merchant location data records (for example, that have been retrieved at step 502 of Figure 5) with a second set of external merchant location data records (retrieved for example at step 506 of Figure 5) to identify a second set of matching co-referent pairs of merchant location data records based on the first matching method.
[0084] Step 608 comprises assigning a co-referent pairing confidence score to each co-referent pair within the second set of matching co-referent pairs of merchant location data records identified at step 606 above.
[0085] The first matching method implemented in Figure 6 may in an embodiment comprise a machine learning driven matching technique that includes adaptive blocking to maximize detection of true co-referent pairs and to minimize the incidence of false co-referent pairs being generated - thereby optimizing the time efficiency and the computational efficiency of the matching process. In an embodiment, the first matching method may implement matching of the first set of merchant location data records and at least the second set of merchant location data records based on a processor implemented regression classifier. The processor implemented regression classifier may comprise an L-2 regularized logistic regression classifier, which may have been trained or configured for matching based on training data comprising features created through actively learned similarity functions. Active learning techniques may be used to iteratively identify unconfident or low confidence co-referent pairs determined by the regression classifier, and these unconfident or low confidence co-referent pairs may be manually labelled and provided as input training data to the regression classifier to improve its recognition capabilities in respect of the decision boundary.
[0086] In an embodiment, the first matching method implemented within the method of Figure 6 identifies co-referent pairs of merchant location data records based on one or more data fields within the data records, including any one or more of merchant name, merchant address, city, zip code, phone number, state and / or country.
[0087] Figure 7 illustrates a method of identifying a sub-set of merchant location data records retrieved from an internal database that need to be matched against merchant location data records from one or more external databases, based on a second matching method. The method of Figure 7 may be implemented in connection with implementation of method step 406 of the method of Figures 4A and 4B, as described above.
[0088] Step 702 comprises identifying within the first set of merchant location data records, one or more merchant location data records which have, in the process of matching using the first matching method (for example during implementation of step 404 of the method of Figures 4A and 4B, or the method of Figure 6), been assigned a co-referent pairing confidence score that is below a predefined threshold score.
[0089] Step 704 comprises generating a sub-set of merchant location data records comprising the identified merchant location data records, which have in the process of matching using the first matching method, been assigned a co-referent pairing confidence score that is below the predefined threshold score.
[0090] Figure 8 illustrates a method of matching and generating co-referent pairs of merchant location data records retrieved respectively from the identified sub-set of merchant location data records (generated through the method of Figure 7) and from the one or more external databases, based on a second matching method. The method of Figure 8 may be implemented in connection with implementation of method step 408 of the method of Figures 4A and 4B, as described above.
[0091] Step 802 comprises matching the sub-set of merchant location data records generated from the first set of internal merchant location data records (using the method of Figure 7) with a first set of external merchant location data records (for example, merchant location data records extracted from first external data aggregator database 308a, or from an AggData™ database) to identify a second set of matching co-referent pairs of merchant location data records based on a second matching method. It will be understood for the purposes of the invention that a first set of co-referent pairs of merchant location data records has already been generated pursuant to step 404 of the method of Figures 4A and 4B.
[0092] Step 804 optionally comprises assigning a co-referent pairing confidence score to each co-referent pair within the second set of matching co-referent pairs of merchant location data records that has been generated at step 802 - wherein said co-referent pairing confidence score represents a predicted accuracy of the co-referent pairing.
[0093] Step 806 comprises matching the sub-set of merchant location data records generated from the first set of internal merchant location data records (using the method of Figure 7) with the second set of external merchant location data records (for example, merchant location data records extracted from second external data aggregator database 308b, or from a Pitney Bowes™ database) to identify a third set of matching co-referent pairs of merchant location data records based on the second matching method.
[0094] Step 808 comprises assigning a co-referent pairing confidence score to each co-referent pair within the third set of matching co-referent pairs of merchant location data records - wherein said co-referent pairing confidence score represents a predicted accuracy of the co-referent pairing.
[0095] The second matching method of Figure 8 may in an embodiment comprise matching the sub-set of merchant location data records generated from the first set of internal merchant location data records (using the method of Figure 7) against merchant location data records retrieved from either the first or second set of external merchant location data records (for example, merchant location data records extracted from a first or second external data aggregator database 308a, 308b, or from an AggData™ database or a Pitney Bowes™ database) based on a processor implemented random forest classifier.
[0096] In an embodiment, the second matching method described in Figure 8 identifies co-referent pairs of merchant location data records based on one or more data fields within the data records, including any one or more of merchant name, merchant address, city, zip code, phone number, state, country, latitude, longitude and / or a determined distance between data fields in two data records (for example, the Levenshtein distance between name and address etc.). In a particular embodiment, the second matching method implemented within the method of Figure 8 is more computationally intensive or time intensive than the first matching method implemented within the method of Figure 6.
[0097] Figure 9 illustrates a first method of assigning standardized merchant names to individual merchant location data records within clusters of merchant location data records that have been generated in respect of records retrieved from an internal database. The method of Figure 9 may be implemented in connection with implementation of method step 412 of the method of Figures 4A and 4B, as described above.
[0098] Step 902 comprises, for each generated cluster of merchant location data records (generated or clustered in accordance with step 410 of the method of Figures 4A and 4B), identifying individual merchant location data records within said cluster, for which at least one co-referent complement data record has been identified within either or both of the first or second set of external merchant location data records (for example, merchant location data records extracted from a first or second external data aggregator database 308a, 308b, or from an AggData™ database or a Pitney Bowes™ database), through either of the first matching method or the second matching method (i.e. the methods of Figures 6 and 8 respectively).
[0099] Step 904 comprises retrieving for each such individual merchant location data record (identified at step 902), a corresponding co-referent pairing confidence score that has been assigned during the process of identifying co-referent pairs for said individual data record - wherein said co-referent pairing confidence score represents a predicted accuracy of the co-referent pairing.
[00100] Step 906 comprises determining whether the co-referent pairing confidence score corresponding to an individual merchant location data record is below a predefined threshold score. This predefined threshold score may be provided as part of configuring the merchant data aggregator 306 and may represent a threshold of reliability for generated co-referent pairings.
[00101] Step 908 responds to a determination that a co-referent pairing confidence score satisfies the predefined threshold score, by assigning to the corresponding individual merchant location data record within said cluster, a merchant name retrieved from a matched co-referent complement data record within either or both of the first or second set of external merchant location data records.
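Steps 902 to 908 may be sketched as follows. This is an illustrative sketch only: the `MatchedRecord` type, its field names, and the reading of "satisfies the predefined threshold score" as "greater than or equal to the threshold" are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MatchedRecord:
    """Hypothetical view of one internal merchant location data record."""
    record_id: str
    raw_name: str
    complement_name: Optional[str]     # name held by the co-referent complement, if any
    confidence: Optional[float]        # co-referent pairing confidence score, if any
    standardized_name: Optional[str] = None


def assign_from_complement(cluster: List[MatchedRecord],
                           threshold: float) -> List[MatchedRecord]:
    """Assign complement names to records whose pairing score meets the threshold."""
    for record in cluster:
        # Step 902: only records with an identified co-referent complement qualify.
        if record.complement_name is None or record.confidence is None:
            continue
        # Steps 904 and 906: retrieve the score and compare it with the threshold.
        if record.confidence < threshold:
            continue
        # Step 908: record the name retrieved from the complement record.
        record.standardized_name = record.complement_name
    return cluster
```

Records that fall below the threshold are left untouched by this sketch, so that they remain available for the cluster-based assignment of Figure 10.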
[00102] Figure 10 illustrates a second method of assigning standardized merchant names to individual merchant location data records within clusters of merchant location data records that have been generated in respect of records retrieved from an internal database. The method of Figure 10 may be implemented in connection with implementation of method step 414 of the method of Figures 4A and 4B, as described above.
[00103] Step 1002 comprises, for each generated cluster of merchant location data records (generated or clustered in accordance with step 410 of the method of Figures 4A and 4B), parsing individual merchant location data records for which at least one co-referent complement data record has been identified within either or both of the first or second set of external merchant location data records (for example, merchant location data records extracted from a first or second external data aggregator database 308a, 308b, or from an AggData™ database or a Pitney Bowes™ database), using either of the first matching method or the second matching method.
[00104] Step 1004 comprises retrieving for each such individual merchant location data record, a corresponding co-referent pairing confidence score that has been assigned during the process of identifying co-referent pairs for said individual data record - wherein said co-referent pairing confidence score represents a predicted accuracy of the co-referent pairing.
[00105] Step 1006 comprises determining whether the co-referent pairing confidence score corresponding to an individual merchant location data record is below a predefined threshold score. This predefined threshold score may be provided as part of configuring merchant data aggregator 306 and may represent a threshold of reliability for generated co-referent pairings.
[00106] Step 1008 comprises responding to a co-referent pairing confidence score being found to be below the predefined threshold score, by assigning to the corresponding individual merchant location data record within said cluster, a merchant name based on merchant name(s) assigned to one or more other individual merchant location data records in the same cluster for which a co-referent pairing confidence score satisfies the predefined threshold score.
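Steps 1002 to 1008 may be sketched as follows. The specification states only that the assigned name is "based on" the names of other records in the same cluster that satisfy the threshold; selecting the most frequent such name is an assumption made for this example, as are the dictionary keys and the function name `propagate_names`.

```python
from collections import Counter
from typing import Dict, List


def propagate_names(cluster: List[Dict], threshold: float) -> List[Dict]:
    """Give low-confidence records the most common trusted name in their cluster.

    Each record is a dict with hypothetical keys 'complement_name' and
    'confidence'; the result is written to 'standardized_name'.
    """
    # Names of records whose pairing score satisfies the threshold.
    trusted = Counter(
        record["complement_name"]
        for record in cluster
        if record["confidence"] >= threshold
    )
    if not trusted:
        return cluster  # nothing reliable to propagate within this cluster
    fallback_name = trusted.most_common(1)[0][0]
    for record in cluster:
        # Step 1008: only records found to be below the threshold are reassigned.
        if record["confidence"] < threshold:
            record["standardized_name"] = fallback_name
    return cluster
```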
[00107] Figure 11 illustrates a method of training a random forest classifier to match merchant location data records retrieved from an internal database with merchant location data records retrieved from an external database. The random forest classifier trained in accordance with the method of Figure 11 may be used to implement one or both of the second matching methods of step 408 of the method of Figures 4A and 4B, and of Figure 8 respectively.
[00108] Step 1102 comprises identifying co-referent pairs of merchant location data records that have been generated based on implementing the first matching method (of step 404 of the method of Figures 4A and 4B) to match the first set of merchant location data records with at least the second set of merchant location data records, and which have in the process of matching (using the first matching method), been assigned a co-referent pairing confidence score that is below a predefined threshold score.
[00109] Step 1104 comprises performing an amalgamation of K-means and K-mode clustering (i.e. a second method of clustering that may be different from the first method of clustering that has been described previously in connection with step 410 of the method of Figures 4A and 4B) on the identified co-referent pairs of merchant location data records to generate a plurality of clusters of co-referent pairs of merchant location data.
[00110] At step 1106, one or more co-referent pairs of merchant location data are extracted from each of the generated clusters and are provided with merchant name inputs, so as to assign a merchant name to the extracted merchant location data records.
[00111] Step 1108 comprises providing the extracted one or more co-referent pairs of merchant location data as input training data to a random forest classifier that is configured to generate co-referent pairs of merchant location data records by matching merchant location data records within the first set of merchant location data records retrieved from an internal merchant information database (for example, database 306b within merchant data aggregator 306) with at least a second set of merchant location data records from an external data aggregator database (for example, first or second external data aggregator database 308a, 308b, or an AggData™ database or a Pitney Bowes™ database).
[00112] Step 1110 comprises training or modifying the configuration of the random forest classifier based on the input training data.
[00113] The method of Figure 11 may be implemented periodically (for example, monthly, quarterly, semi-annually or annually) to modify the configuration of the random forest classifier that is used for implementing the second matching method of step 408 of the method of Figures 4A and 4B, and/or the method of Figure 8 - so as to periodically improve the matching results provided by the random forest classifier.
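The specification does not state how K-means and K-mode clustering are amalgamated at step 1104. One possible reading, offered here purely as an assumption, is a mixed dissimilarity measure in which numeric fields contribute a squared Euclidean term (as in K-means) and categorical fields contribute a count of mismatches (as in K-modes), weighted by a hypothetical parameter `gamma`. The following sketch shows such a measure and the assignment of a record pair to its nearest cluster prototype; all names are illustrative.

```python
from typing import List, Sequence, Tuple

# A point is (numeric fields, categorical fields); this layout is hypothetical.
Point = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(a: Point, b: Point, gamma: float = 1.0) -> float:
    """Squared Euclidean distance on numeric fields plus weighted mismatch count."""
    numeric_a, categorical_a = a
    numeric_b, categorical_b = b
    squared = sum((x - y) ** 2 for x, y in zip(numeric_a, numeric_b))
    mismatches = sum(1 for x, y in zip(categorical_a, categorical_b) if x != y)
    return squared + gamma * mismatches


def nearest_prototype(point: Point, prototypes: List[Point],
                      gamma: float = 1.0) -> int:
    """Return the index of the prototype with the smallest mixed dissimilarity."""
    distances = [mixed_dissimilarity(point, p, gamma) for p in prototypes]
    return distances.index(min(distances))
```

In a full clustering loop, the numeric part of each prototype would be updated as a mean and the categorical part as a mode of the assigned members; that loop is omitted here for brevity.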
[00114] Figure 12 illustrates an exemplary embodiment of a merchant data aggregation server 1200 of a type that may be configured for implementing the teachings of the present invention.
[00115] Data aggregation server 1200 may comprise any processor implemented instance of a data aggregation server configured to implement one or more of the methods of Figures 4 to 11 described above. Server 1200 comprises an operator interface 1202, processor 1204, communication transceiver 1206 and memory 1208, which memory 1208 may include transitory memory and / or non-transitory memory. In an exemplary embodiment, memory 1208 may have stored therewithin, (i) an operating system 12082 configured for managing device hardware and software resources and that provides common services for software programs implemented within server 1200, (ii) a data pre-processor 12084 configured to implement the data pre-processing step 402 of the method of Figures 4A and 4B, and/or steps 502 to 506 of Figure 5, (iii) an internal database interface 12086 configured to enable server 1200 to retrieve merchant location data records from and to store merchant location data records to an internal database (for example, database 306b within merchant data aggregator 306), (iv) an external database interface 12088 configured to enable server 1200 to retrieve merchant location data from one or more external databases (for example, first or second external data aggregator database 308a, 308b, or from an AggData™ database or a Pitney Bowes™ database), (v) a processor implemented adaptive blocking controller 12090 configured for performing adaptive blocking on merchant location data records received from both the internal database and the one or more external databases for the purposes of maximizing detection of true co-referent pairs while optimizing time efficiency and computational efficiency of the matching process, (vi) a processor implemented regression classifier 12092 configured to implement one or both of the first matching method described in connection with step 404 of the method of Figures 4A and 4B and steps 602 to 608 of Figure 6, and/or for performing clustering of merchant location data records as described in connection with step 410 of the method of Figures 4A and 4B, (vii) a processor implemented random forest classifier 12094 configured to implement the second matching method described in connection with step 408 of the method of Figures 4A and 4B and steps 802 to 808 of Figure 8, (viii) a processor implemented confidence scoring controller 12096 configured to generate and assign confidence levels or confidence scores to either (a) co-referent pairs identified by the first or second matching methods and / or (b) merchant names assigned to merchant location data records in accordance with the teachings of the present invention, (ix) a processor implemented clustering controller 12098 configured to cluster merchant location data records in accordance with step 410 of the method of Figures 4A and 4B, and (x) a processor implemented merchant name assignment controller 12100 configured to implement merchant name assignment to merchant location data records in accordance with the teachings of step 414 of the method of Figures 4A and 4B, and / or the methods of Figures 9 or 10 as described in detail above.
[00116] Figure 13 illustrates an exemplary computer system according to which various embodiments of the present invention may be implemented.
[00117] System 1300 includes computer system 1302 which in turn comprises one or more processors 1304 and at least one memory 1306. Processor 1304 is configured to execute program instructions - and may be a real processor or a virtual processor. It will be understood that computer system 1302 does not suggest any limitation as to scope of use or functionality of described embodiments. The computer system 1302 may include, but is not limited to, one or more of a general-purpose computer, a programmed microprocessor, a micro-controller, an integrated circuit, and other devices or arrangements of devices that are capable of implementing the steps that constitute the method of the present invention. Exemplary embodiments of a computer system 1302 in accordance with the present invention may include one or more servers, desktops, laptops, tablets, smart phones, mobile phones, mobile communication devices, phablets and personal digital assistants. In an embodiment of the present invention, the memory 1306 may store software for implementing various embodiments of the present invention. The computer system 1302 may have additional components. For example, the computer system 1302 may include one or more communication channels 1308, one or more input devices 1310, one or more output devices 1312, and storage 1314. An interconnection mechanism (not shown) such as a bus, controller, or network, interconnects the components of the computer system 1302. In various embodiments of the present invention, operating system software (not shown) provides an operating environment for various software executing in the computer system 1302 using a processor 1304, and manages different functionalities of the components of the computer system 1302.
[00118] The communication channel(s) 1308 allow communication over a communication medium to various other computing entities. The communication medium conveys information such as program instructions or other data. The communication media include, but are not limited to, wired or wireless methodologies implemented with an electrical, optical, RF, infrared, acoustic, microwave, Bluetooth or other transmission media.
[00119] The input device(s) 1310 may include, but are not limited to, a touch screen, a keyboard, mouse, pen, joystick, trackball, a voice device, a scanning device, or any other device that is capable of providing input to the computer system 1302. In an embodiment of the present invention, the input device(s) 1310 may be a sound card or similar device that accepts audio input in analog or digital form. The output device(s) 1312 may include, but not be limited to, a user interface on CRT, LCD, LED display, or any other display associated with any of servers, desktops, laptops, tablets, smart phones, mobile phones, mobile communication devices, phablets and personal digital assistants, printer, speaker, CD/DVD writer, or any other device that provides output from the computer system 1302.
[00120] The storage 1314 may include, but not be limited to, magnetic disks, magnetic tapes, CD-ROMs, CD-RWs, DVDs, any types of computer memory, magnetic stripes, smart cards, printed barcodes or any other transitory or non-transitory medium which can be used to store information and can be accessed by the computer system 1302. In various embodiments of the present invention, the storage 1314 may contain program instructions for implementing any of the described embodiments.
[00121] In an embodiment of the present invention, the computer system 1302 is part of a distributed network or a part of a set of available cloud resources.
[00122] The present invention may be implemented in numerous ways including as a system, a method, or a computer program product such as a computer readable storage medium or a computer network wherein programming instructions are communicated from a remote location.
[00123] The present invention may suitably be embodied as a computer program product for use with the computer system 1302. The method described herein is typically implemented as a computer program product, comprising a set of program instructions that is executed by the computer system 1302 or any other similar device. The set of program instructions may be a series of computer readable codes stored on a tangible medium, such as a computer readable storage medium (storage 1314), for example, diskette, CD-ROM, ROM, flash drives or hard disk, or transmittable to the computer system 1302, via a modem or other interface device, over a tangible medium, including but not limited to optical or analogue communications channel(s) 1308. The implementation of the invention as a computer program product may be in an intangible form using wireless techniques, including but not limited to microwave, infrared, Bluetooth or other transmission techniques. These instructions can be preloaded into a system or recorded on a storage medium such as a CD-ROM, or made available for downloading over a network such as the Internet or a mobile telephone network. The series of computer readable instructions may embody all or part of the functionality previously described herein.
[00124] Based on the above, it would be apparent that the present invention offers significant advantages - in particular, by enabling processing and automated identification, disambiguation and recordal of standardized merchant name information to individual merchant location data records, such that merchant location data records corresponding to a single merchant can be accurately identified, aggregated and analyzed.
[00125] While the exemplary embodiments of the present invention are described and illustrated herein, it will be appreciated that they are merely illustrative. It will be understood by those skilled in the art that various modifications in form and detail may be made therein without departing from or offending the spirit and scope of the invention as defined by the appended claims. Additionally, the invention illustratively disclosed herein suitably may be practiced in the absence of any element which is not specifically disclosed herein - and in a particular embodiment that is specifically contemplated, the invention is intended to be practiced in the absence of any one or more elements which are not specifically disclosed herein.
We claim:
1. A system for automated recordal of standardized merchant names within merchant location data records retrieved from a first merchant data aggregation database, comprising:
a processor implemented server system coupled with the first merchant data aggregation database, wherein said server system is configured for:
retrieving a first set of merchant location data records from the first merchant data aggregation database;
retrieving at least a second set of merchant location data records, wherein said second set of merchant location data records is retrieved from at least a second merchant data aggregation database that is different from the first merchant data aggregation database;
matching based on a first matching method, merchant location data records from the first set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a first merchant location data record from the first set of merchant location data records, and a second co-referent complement merchant location data record from at least the second set of merchant location data records;
generating a sub-set of merchant location data records from the first set of merchant location data records that do not result in a match with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records;
matching based on a second matching method that is different from the first matching method, merchant location data records from the generated sub-set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a third merchant location data record from the generated sub-set of merchant location data records, and a fourth co-referent complement merchant location data record from at least the second set of merchant location data records;
generating one or more clusters of merchant location data records within the first set of merchant location data records;
for each generated cluster, identifying one or more matched merchant location data records, wherein said matched merchant location data records have been matched based on either of the first matching method or the second matching method with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records; and
for each identified matched merchant location data record within a cluster, recording within a corresponding merchant name data field memory location, a merchant name retrieved from a co-referent complement merchant location data record to which the matched merchant location data record corresponds.
2. The system as claimed in claim 1, configured such that within each generated cluster, each unmatched merchant location data record that has not resulted in a match based on either of the first matching method or the second matching method, is assigned a merchant name retrieved from a matched merchant location data record within the same cluster that has been matched based on either of the first matching method or the second matching method, wherein the assigned merchant name is recorded within a merchant name data field memory location associated with said unmatched merchant location data record.
3. The system as claimed in claim 1, configured such that the merchant name recorded to the corresponding merchant name data field memory location is written to the first merchant data aggregation database.
4. The system as claimed in claim 1, configured such that the first matching method matches merchant location data records from the first set of merchant location data records against merchant location data records retrieved from a plurality of merchant data aggregation databases each of which are distinct from the first merchant data aggregation database.
5. The system as claimed in claim 1, configured such that the first matching method is implemented through a processor implemented regression classifier.
6. The system as claimed in claim 1, configured such that the first matching method implements merchant location data record matching based on any one or more of merchant name, merchant address, city, zip code, phone number, state or country.
7. The system as claimed in claim 1, configured such that the second matching method is implemented through a processor implemented random forest classifier.
8. The system as claimed in claim 1, configured such that the second matching method implements merchant location data record matching based on one or more of merchant name, merchant address, city, zip code, phone number, state, country, latitude, longitude or a determined distance between data fields in two data records.
9. The system as claimed in claim 1, configured such that the first matching method is computationally efficient or time efficient in comparison with the second matching method.
10. The system as claimed in claim 1, configured such that grouping of merchant location data records within the one or more clusters is based on one or more of merchant name information, industry information, average transaction value information and / or merchant category code information retrieved from data fields associated with the merchant location data records.
11. The system as claimed in claim 1, configured such that the one or more clusters of merchant location data records are generated by a processor implemented regression classifier.
12. A method for automated recordal of standardized merchant names within merchant location data records retrieved from a first merchant data aggregation database, the method comprising implementing within a processor implemented server system coupled with the first merchant data aggregation database, the steps of:
retrieving a first set of merchant location data records from the first merchant data aggregation database;
retrieving at least a second set of merchant location data records, wherein said second set of merchant location data records is retrieved from at least a second merchant data aggregation database that is different from the first merchant data aggregation database;
matching based on a first matching method, merchant location data records from the first set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a first merchant location data record from the first set of merchant location data records, and a second co-referent complement merchant location data record from at least the second set of merchant location data records;
generating a sub-set of merchant location data records from the first set of merchant location data records that do not result in a match with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records;
matching based on a second matching method that is different from the first matching method, merchant location data records from the generated sub-set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a third merchant location data record from the generated sub-set of merchant location data records, and a fourth co-referent complement merchant location data record from at least the second set of merchant location data records;
generating one or more clusters of merchant location data records within the first set of merchant location data records;
for each generated cluster, identifying one or more matched merchant location data records, wherein said matched merchant location data records have been matched based on either of the first matching method or the second matching method with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records; and
for each identified matched merchant location data record within a cluster, recording within a corresponding merchant name data field memory location, a merchant name retrieved from a co-referent complement merchant location data record to which the matched merchant location data record corresponds.
13. The method as claimed in claim 12, wherein within each generated cluster, each unmatched merchant location data record that has not resulted in a match based on either of the first matching method or the second matching method, is assigned a merchant name retrieved from a matched merchant location data record within the same cluster that has been matched based on either of the first matching method or the second matching method, wherein the assigned merchant name is recorded within a merchant name data field memory location associated with said unmatched merchant location data record.
14. The method as claimed in claim 12, wherein the merchant name recorded to the corresponding merchant name data field memory location is written to the first merchant data aggregation database.
15. The method as claimed in claim 12, wherein the first matching method matches merchant location data records from the first set of merchant location data records against merchant location data records retrieved from a plurality of merchant data aggregation databases each of which are distinct from the first merchant data aggregation database.
16. The method as claimed in claim 12, wherein the first matching method is implemented through a processor implemented regression classifier.
17. The method as claimed in claim 12, wherein the first matching method implements merchant location data record matching based on any one or more of merchant name, merchant address, city, zip code, phone number, state or country.
18. The method as claimed in claim 12, wherein the second matching method is implemented through a processor implemented random forest classifier.
19. The method as claimed in claim 12, wherein the second matching method implements merchant location data record matching based on one or more of merchant name, merchant address, city, zip code, phone number, state, country, latitude, longitude or a determined distance between data fields in two data records.
20. The method as claimed in claim 12, wherein the first matching method is computationally efficient or time efficient in comparison with the second matching method.
21. The method as claimed in claim 12, wherein grouping of merchant location data records within the one or more clusters is based on one or more of merchant name information, industry information, average transaction value information and / or merchant category code information retrieved from data fields associated with the merchant location data records.
22. The method as claimed in claim 12, wherein the one or more clusters of merchant location data records are generated by a processor implemented regression classifier.
23. A computer program product for automated recordal of standardized merchant names within merchant location data records retrieved from a first merchant data aggregation database, comprising a non-transitory computer readable medium having a computer readable program code embodied therein, the computer readable program code comprising instructions for:
retrieving a first set of merchant location data records from the first merchant data aggregation database;
retrieving at least a second set of merchant location data records, wherein said second set of merchant location data records is retrieved from at least a second merchant data aggregation database that is different from the first merchant data aggregation database;
matching based on a first matching method, merchant location data records from the first set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a first merchant location data record from the first set of merchant location data records, and a second co-referent complement merchant location data record from at least the second set of merchant location data records;
generating a sub-set of merchant location data records from the first set of merchant location data records that do not result in a match with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records;
matching based on a second matching method that is different from the first matching method, merchant location data records from the generated sub-set of merchant location data records with merchant location data records from at least the second set of merchant location data records, to identify one or more co-referent pairs of merchant location data records, each co-referent pair of merchant location data records comprising a third merchant location data record from the generated sub-set of merchant location data records, and a fourth co-referent complement merchant location data record from at least the second set of merchant location data records;
generating one or more clusters of merchant location data records within the first set of merchant location data records;
for each generated cluster, identifying one or more matched merchant location data records, wherein said matched merchant location data records have been matched based on either of the first matching method or the second matching method with corresponding co-referent complement merchant location data records from at least the second set of merchant location data records; and
for each identified matched merchant location data record within a cluster, recording within a corresponding merchant name data field memory location, a merchant name retrieved from a co-referent complement merchant location data record to which the matched merchant location data record corresponds.
| # | Name | Date |
|---|---|---|
| 1 | 202111022869-STATEMENT OF UNDERTAKING (FORM 3) [21-05-2021(online)].pdf | 2021-05-21 |
| 2 | 202111022869-PROOF OF RIGHT [21-05-2021(online)].pdf | 2021-05-21 |
| 3 | 202111022869-POWER OF AUTHORITY [21-05-2021(online)].pdf | 2021-05-21 |
| 4 | 202111022869-FORM 1 [21-05-2021(online)].pdf | 2021-05-21 |
| 5 | 202111022869-FIGURE OF ABSTRACT [21-05-2021(online)].pdf | 2021-05-21 |
| 6 | 202111022869-DRAWINGS [21-05-2021(online)].pdf | 2021-05-21 |
| 7 | 202111022869-DECLARATION OF INVENTORSHIP (FORM 5) [21-05-2021(online)].pdf | 2021-05-21 |
| 8 | 202111022869-COMPLETE SPECIFICATION [21-05-2021(online)].pdf | 2021-05-21 |
| 9 | 202111022869-FORM 18 [22-04-2025(online)].pdf | 2025-04-22 |