Method And System For Spend Categorization

Abstract: Traditional systems used for spend categorization fail to categorize narration memos that do not capture merchant and/or location information. The disclosure herein generally relates to data processing, and, more particularly, to a method and system for spend categorization. The system categorizes transaction memos as belonging to one of four categories, based on the amount of information present in each of the transaction memos. This information is further processed to generate a spend category representation for each user. Training data is formed using information on the narration memos, the categorizations, the spend category representations, and various other generated data. The system then trains a data model using the generated training data to generate a trained data model, which is further used to perform spend categorization for real-time data. [To be published with FIG. 4]


Patent Information

Application #:

Filing Date: 01 March 2022

Publication Number: 35/2023

Publication Type: INA

Invention Field: ELECTRONICS

Status:

Parent Application:

Applicants

Tata Consultancy Services Limited

Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. CHANNA, Harminder Singh

Tata Consultancy Services Limited, Think Campus, Unit-VIII & IX, KIADB Industrial Estate, Electronic City, Phase-II, Bangalore 560100, Karnataka, India

2. DAS, Arindam

Tata Consultancy Services Limited, L Center, Unit-VI, No.78, 79 & 83, EPIP Industrial Area, Whitefield, Bangalore 560066, Karnataka, India

3. CHETTRI, Priyanka

Tata Consultancy Services Limited, L Center, Unit-VI, No.78, 79 & 83, EPIP Industrial Area, Whitefield, Bangalore 560066, Karnataka, India

4. SUBHALAKSHMI, Maitri

Tata Consultancy Services Limited, Kalinga Park, SEZ Cargo. Plot No 35, Chandaka Industrial Estate. Near Infocity, Patia, Chandrasekharpur, Bhubaneswar 751024, Odisha, India

Claims

1. A processor implemented method (200) of generating a training data for spend categorization, comprising: fetching (202) a transaction data as input, via one or more hardware processors, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time; pre-processing (204) the transaction data, via the one or more hardware processors, to generate a pre-processed transaction data; identifying (206), via the one or more hardware processors, a web-page corresponding to each of a plurality of narration memos in the pre-processed transaction data; classifying (208), via the one or more hardware processors, the web-page corresponding to each of a plurality of narration memos as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos; identifying (210), via the one or more hardware processors, one or more spend categories the transaction data is associated with, based on association of the narration memos with the plurality of categories; mapping (212), via the one or more hardware processors, each of the identified one or more spend categories with a corresponding broader category; and generating (214), via the one or more hardware processors, a spend category representation of each of the plurality of users, based on one or more of the broader categories to which the one or more spend categories are mapped, wherein information on a) the transaction data, b) web-pages identified for the plurality of narration memos, c) the spend category each of the transaction data is associated with, d) the broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation, form the training data.

2. The method as claimed in claim 1, wherein the plurality of categories comprise a first category, a second category, a third category, and a fourth category, wherein in the first category, merchant and location information corresponding to the narration memo is directly available, in the second category, the merchant and location information corresponding to the narration memo is directly available but is not identifiable, in the third category, the merchant information is not available but the location information corresponding to the narration memo is directly available, and in the fourth category, no information corresponding to the narration memo is available.

3. The method as claimed in claim 1, wherein pre-processing the transaction data comprises a) removing numbers and special characters, b) removing words having number of characters less than a threshold of words, and c) forming one or more words by combining one or more word-segments identified as forming one or more meaningful words when combined.

5. A system (100) for generating a training data for spend categorization, comprising: one or more hardware processors (102); a communication interface (112); and a memory (104) operatively coupled to the one or more hardware processors via the communication interface, wherein a plurality of instructions in the memory when executed cause the one or more hardware processors to: fetch a transaction data as input, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time; pre-process the transaction data to generate a pre-processed transaction data; identify a web-page corresponding to each of a plurality of narration memos in the pre-processed transaction data; classify the web-page corresponding to each of a plurality of narration memos as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos; identify one or more spend categories the transaction data is associated with, based on association of the narration memos with the plurality of categories; map each of the identified one or more spend categories with a corresponding broader category; and generate a spend category representation of each of the plurality of users, based on the one or more broader categories to which the one or more spend categories are mapped, wherein information on a) the transaction data, b) web-pages identified for the plurality of narration memos, c) the spend category each of the transaction data is associated with, d) the broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation, form the training data.

6. The system as claimed in claim 5, wherein the plurality of categories comprises a first category, a second category, a third category, and a fourth category, wherein in the first category merchant and location information corresponding to the narration memo is directly available, in the second category the merchant and location information corresponding to the narration memo is directly available but is not identifiable, in the third category the merchant information is not available but the location information corresponding to the narration memo is directly available, and in the fourth category no information corresponding to the narration memo is available.

7. The system as claimed in claim 5, wherein the one or more hardware processors are configured to pre-process the transaction data by a) removing numbers and special characters, b) removing words having number of characters less than a threshold of words, and c) forming one or more words by combining one or more word-segments identified as forming one or more meaningful words when combined.

8. The system as claimed in claim 5, wherein the one or more hardware processors are configured to train a data model using the training data to generate a trained data model, wherein the trained data model is used to process real-time input comprising one or more transaction data of one or more users to perform spend categorization of each of the one or more users.

Dated this 01st day of March 2022
Tata Consultancy Services Limited
By their Agent & Attorney
(Adheesh Nargolkar) of Khaitan & Co
Reg No IN-PA-1086

Description:

FORM 2
THE PATENTS ACT, 1970 (39 of 1970) & THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)

Title of invention: METHOD AND SYSTEM FOR SPEND CATEGORIZATION

Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address: Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD

[001] The disclosure herein generally relates to data processing, and, more particularly, to a method and system for spend categorization.

BACKGROUND

[002] People make purchases from various service providers, and they purchase products and/or services related to different categories. For example, the purchased item may be any or all of fuel, food, movie tickets, household items, electronics items, vehicle accessories and so on. The purchase history of a user indicates the purchase preferences of the user, and this is a crucial input in various applications such as, but not limited to, targeted advertisements. If the purchase preferences of a user are known, advertisements/purchase suggestions that match the preferences of the user may be sent to the user, and this improves the chances of receiving positive feedback and of the user purchasing the advertised item.
[003] With advancement in technology, many users have started using digital transaction modes as their preferred payment method. Every time a user makes a transaction using a digital payment option, a digital record (alternately referred to as “narration memo”) of the transaction is generated. The digital record may sometimes contain only partial information with respect to the transaction made. For example, consider that the user made a debit card transaction at a restaurant. The corresponding digital record may have information on a unique identification of the restaurant (i.e. the merchant) and its location, and this information helps to categorize the transaction as related to a food category. However, in many instances, such data may not be available from the narration memo, and as a result, it may not be possible to categorize the transactions. For example, when a customer makes a third-party money transfer, the customer is given an option to add details of the transaction as a comment. However, customers often skip the step of adding the comment, or input random meaningless data for the sake of filling in highlighted fields. Leaving the narration field empty or adding meaningless random text results in transaction details not being recorded properly, and this in turn affects the quality of the spend categorization performed by existing state-of-the-art systems.

SUMMARY

[004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method of generating a training data for spend categorization is provided.
In this method, initially a transaction data is fetched as input, via one or more hardware processors, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time. The transaction data is then pre-processed via the one or more hardware processors, to generate a pre-processed transaction data. Further, a web-page corresponding to each of a plurality of narration memos in the pre-processed transaction data is identified via the one or more hardware processors. Further, the web-page corresponding to each of the plurality of narration memos is classified as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos, via the one or more hardware processors. Further, one or more spend categories the transaction data is associated with are identified, via the one or more hardware processors, based on association of the narration memos with the plurality of categories. Further, each of the identified one or more spend categories is mapped with a corresponding broader category via the one or more hardware processors. Further, a spend category representation of each of the plurality of users is generated via the one or more hardware processors, based on one or more of the broader categories to which the one or more spend categories are mapped, wherein information on a) the transaction data, b) the web-pages identified, c) the spend category each of the transaction data is associated with, d) the broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation, form the training data.
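As a rough illustration of how the items a) to e) listed above could be held together, the following Python sketch keeps items a) to d) in one record per transaction and derives item e) per user. The class name, the field names, the sample categories, and the idea of expressing the spend category representation as each broader category's share of a user's transactions are all assumptions made for illustration; the disclosure does not prescribe a storage layout or the form of the representation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrainingRecord:
    """One transaction's contribution to the training data (hypothetical layout)."""
    user_id: str                                         # assumed user identifier
    narration_memo: str                                  # a) transaction data
    web_pages: List[str] = field(default_factory=list)   # b) identified web-pages
    spend_category: str = ""                             # c) spend category
    broader_category: str = ""                           # d) broader category


def spend_representation(records: List[TrainingRecord]) -> Dict[str, Dict[str, float]]:
    """e) Assumed representation: share of each broader category per user."""
    counts: Dict[str, Counter] = {}
    for rec in records:
        counts.setdefault(rec.user_id, Counter())[rec.broader_category] += 1
    return {
        user: {cat: n / sum(c.values()) for cat, n in c.items()}
        for user, c in counts.items()
    }


# Sample values are illustrative only; file names and categories are invented.
records = [
    TrainingRecord("u1", "KAISER KUNTI FUEL RANGPO", ["page_1.html"],
                   "Fuel Station", "Fuel"),
    TrainingRecord("u1", "ONL Paytm Tickets Payment", ["page_2.html"],
                   "Ticket Booking", "Travel"),
]
representation = spend_representation(records)
print(representation)
```

A flat record like this keeps every item of the training data traceable to a single transaction, while the per-user representation is computed from the records rather than stored alongside them.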
[005] In another aspect, the plurality of categories comprises a first category, a second category, a third category, and a fourth category, wherein in the first category merchant and location information corresponding to the narration memo is directly available, in the second category the merchant and location information corresponding to the narration memo is directly available but is not identifiable, in the third category the merchant information is not available but the location information corresponding to the narration memo is directly available, and in the fourth category no information corresponding to the narration memo is available.

[006] In yet another aspect, the method involves training a data model using the training data to generate a trained data model, wherein the trained data model is used to process real-time input comprising one or more transaction data of one or more users to perform spend categorization of each of the one or more users.

[007] In yet another aspect, a system for generating a training data for spend categorization is provided. The system includes one or more hardware processors, a communication interface, and a memory operatively coupled to the one or more hardware processors via the communication interface, wherein a plurality of instructions in the memory when executed cause the one or more hardware processors to initially fetch a transaction data as input, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time. The transaction data is then pre-processed via the one or more hardware processors, to generate a pre-processed transaction data. Further, a web-page corresponding to each of a plurality of narration memos in the pre-processed transaction data is identified via the one or more hardware processors.
Further, the web-page corresponding to each of the plurality of narration memos is classified as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos, via the one or more hardware processors. Further, one or more spend categories the transaction data is associated with are identified, via the one or more hardware processors, based on association of the narration memos with the plurality of categories. Further, each of the identified one or more spend categories is mapped with a corresponding broader category via the one or more hardware processors. Further, a spend category representation of each of the plurality of users is generated via the one or more hardware processors, based on one or more of the broader categories to which the one or more spend categories are mapped, wherein information on a) the transaction data, b) the web-pages identified, c) the spend category each of the transaction data is associated with, d) the broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation, form the training data.

[008] In yet another aspect, the one or more hardware processors are configured to train a data model using the training data to generate a trained data model, wherein the trained data model is used to process real-time input comprising one or more transaction data of one or more users to perform spend categorization of each of the one or more users.

[009] In yet another aspect, a non-transitory computer readable medium for generating a training data for spend categorization is provided. The non-transitory computer readable medium includes a plurality of instructions, which when executed, cause the following steps.
Initially a transaction data is fetched as input, via one or more hardware processors, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time. The transaction data is then pre-processed via the one or more hardware processors, to generate a pre-processed transaction data. Further, a web-page corresponding to each of a plurality of narration memos in the pre-processed transaction data is identified via the one or more hardware processors. Further, the web-page corresponding to each of the plurality of narration memos is classified as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos, via the one or more hardware processors. Further, one or more spend categories the transaction data is associated with are identified, via the one or more hardware processors, based on association of the narration memos with the plurality of categories. Further, each of the identified one or more spend categories is mapped with a corresponding broader category via the one or more hardware processors. Further, a spend category representation of each of the plurality of users is generated via the one or more hardware processors, based on one or more of the broader categories to which the one or more spend categories are mapped, wherein information on a) the transaction data, b) the web-pages identified, c) the spend category each of the transaction data is associated with, d) the broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation, form the training data.
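The pre-processing step referred to above is spelled out in claim 3 as three operations: removing numbers and special characters, removing words shorter than a threshold, and forming meaningful words from word-segments. A minimal Python sketch of one possible reading follows. The function name `clean_memo`, the `KNOWN_SHORT_WORDS` allow-list, the choice to drop whole tokens that contain digits, and the use of a lower-to-upper case boundary to separate run-together words are all assumptions for illustration, not details taken from the disclosure. The default threshold follows the "3 or less characters" example given in paragraph [031].

```python
import re

# Hypothetical allow-list of short words worth keeping; a real system
# might consult an English dictionary instead.
KNOWN_SHORT_WORDS = {"gas", "car", "tax", "pay"}


def clean_memo(memo: str, min_len: int = 4) -> str:
    """Clean one narration memo using illustrative rules only."""
    # Separate run-together words at lower-to-upper case boundaries,
    # e.g. 'PaytmTicketsPayment' becomes 'Paytm Tickets Payment'.
    memo = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", memo)
    kept = []
    for token in memo.split():
        # Assumption: a token containing digits is a reference code
        # and is dropped as a whole.
        if any(ch.isdigit() for ch in token):
            continue
        # Strip special characters, keeping letters only.
        word = re.sub(r"[^A-Za-z]", "", token)
        # Drop short words unless they are on the allow-list.
        if len(word) < min_len and word.lower() not in KNOWN_SHORT_WORDS:
            continue
        kept.append(word)
    return " ".join(kept)


print(clean_memo("othpos036209904 KAISER KUNTI FUEL RANGPO"))  # KAISER KUNTI FUEL RANGPO
print(clean_memo("PaytmTicketsPayment ON9000786HH"))           # Paytm Tickets Payment
```

A case-boundary split cannot separate words joined in a single case (for example all capitals); that would need a dictionary-based segmenter, which this sketch deliberately leaves out.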
[010] In yet another aspect, the non-transitory computer readable medium is configured to cause the one or more hardware processors to train a data model using the training data to generate a trained data model, wherein the trained data model is used to process real-time input comprising one or more transaction data of one or more users to perform spend categorization of each of the one or more users.

[011] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

[012] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:

[013] FIG. 1 illustrates an exemplary system for spend categorization according to some embodiments of the present disclosure.

[014] FIGS. 2A and 2B (collectively referred to as FIG. 2) is a flow diagram depicting steps involved in the process of generating a data model for spend categorization, by the system of FIG. 1, according to some embodiments of the present disclosure.

[015] FIG. 3 illustrates various categories of transaction memo, based on information in the transaction memo, in accordance with some embodiments of the present disclosure.

[016] FIG. 4 is an example illustration of use of the trained data model for spend categorization of real-time data, by the system of FIG. 1, according to some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS

[017] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts.
While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.

[018] With advancement in technology, many users have started using digital transaction modes as their preferred payment method. Every time a user makes a transaction using a digital payment option, a digital record (alternately referred to as “narration memo”) of the transaction is generated. The digital record may or may not contain some amount of information with respect to the transaction made. For example, consider that the user made a debit card transaction at a restaurant. The corresponding digital record may have information on a unique identification of the restaurant (i.e. the merchant) and its location, and this information helps to categorize the transaction as related to a food category. However, in many instances, such data may not be available from the narration memo, and as a result, it may not be possible to categorize the transactions and in turn perform the spend categorization of users.

[019] Embodiments herein provide a method and system for spend categorization. The system categorizes a plurality of narration memos in a fetched transaction data as belonging to different categories, based on a) availability of certain required information, and b) whether the available information is identifiable. The system generates a training data comprising information on a) a transaction data, b) web-pages identified for the plurality of narration memos, c) the spend category each of the transaction data is associated with, d) a broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation. The system then trains a data model using the generated training data. The trained data model is then used for performing spend categorization of different users.
In various embodiments, the system may be configured to use other techniques to extract the information from narration memos in which the required information is available and identifiable.

[020] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 4, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.

[021] FIG. 1 illustrates an exemplary system for spend categorization according to some embodiments of the present disclosure. The system 100 includes or is otherwise in communication with hardware processors 102, at least one memory such as a memory 104, and an I/O interface 112. The hardware processors 102, the memory 104, and the Input/Output (I/O) interface 112 may be coupled by a system bus such as a system bus 108 or a similar mechanism. In an embodiment, the hardware processors 102 can be one or more hardware processors.

[022] The I/O interface 112 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a printer and the like. Further, the I/O interface 112 may enable the system 100 to communicate with other devices, such as web servers and external databases.

[023] The I/O interface 112 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For this purpose, the I/O interface 112 may include one or more ports for connecting several computing systems with one another or to another server computer.
The I/O interface 112 may include one or more ports for connecting several devices to one another or to another server.

[024] The one or more hardware processors 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, node machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 102 are configured to fetch and execute computer-readable instructions stored in the memory 104.

[025] The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 106.

[026] The plurality of modules 106 include programs or coded instructions that supplement applications or functions performed by the system 100 for executing different steps involved in generating the training data and in turn a trained data model for spend categorization. The plurality of modules 106, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The plurality of modules 106 may also be used as signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 106 can be used by hardware, by computer-readable instructions executed by the one or more hardware processors 102, or by a combination thereof. The plurality of modules 106 can include various sub-modules (not shown).
The plurality of modules 106 may include computer-readable instructions that supplement applications or functions performed by the system 100 for the spend categorization.

[027] The data repository (or repository) 110 may include a plurality of abstracted pieces of code for refinement, and data that is processed, received, or generated as a result of the execution of the plurality of modules in the module(s) 106.

[028] Although the data repository 110 is shown internal to the system 100, it will be noted that, in alternate embodiments, the data repository 110 can also be implemented external to the system 100, where the data repository 110 may be stored within a database (repository 110) communicatively coupled to the system 100. The data contained within such an external database may be periodically updated. For example, new data may be added into the database (not shown in FIG. 1) and/or existing data may be modified and/or non-useful data may be deleted from the database. In one example, the data may be stored in an external system, such as a Lightweight Directory Access Protocol (LDAP) directory and a Relational Database Management System (RDBMS). Functions of the components of the system 100 are now explained with reference to steps in flow diagrams in FIG. 2.

[029] FIGS. 2A and 2B (collectively referred to as FIG. 2) is a flow diagram depicting steps of a method (200) involved in the process of generating a data model for spend categorization, by the system of FIG. 1, according to some embodiments of the present disclosure.

[030] At step 202 of the method 200, the system 100 fetches a transaction data as input, via one or more hardware processors, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time. The system 100 may optionally store the fetched transaction data in a master table, for further processing.
The system 100 may fetch information on the transactions from transaction detail tables maintained by banks and other financial institutions. The narration memo may contain certain information with respect to the transaction made. When a customer makes a credit/debit card transaction at a merchant point, the merchant name and location are stored as the narration memo. For example, if the customer used his credit/debit card to make a payment for refueling at a gas station, the narration memo stored for this transaction in a banking database may be: ‘othpos036209904 KAISER KUNTI FUEL RANGPO’, which contains the merchant name and the city location. Secondly, when a customer makes an online transaction on a well-known website, they provide their information to the banking interfaces, which is then stored as the narration memo. For example, if a customer books a train ticket from Paytm®, the narration memo may be stored as ‘ONL Paytm TicketsPayment’. When a customer makes any NEFT or IMPS payment from their online account, most banks provide a pre-defined set of options for the customer to choose from. There are generally 6-8 options, ranging from education, loan, and shopping to gifts to relatives, etc. There is also an option where customers can fill in the details themselves if none of the given options fits their transaction. This approach, however, tends to yield vague and incorrect details, because customers often choose an option at random without giving it a second thought. For example, customers fill in unintelligible words such as ‘aaa’ or ‘aadssf’ when they themselves have to provide the input.

[031] Further, at step 204, the system 100 pre-processes the transaction data, via the one or more hardware processors, to generate a pre-processed transaction data. Pre-processing involves converting the fetched transaction data to a desired format, as required for further processing.
The system 100 may perform the pre-processing based on pre-configured rules/requirements. For example, during the pre-processing, the system 100 may remove numbers and special characters from the transaction data, using an appropriate mechanism that may be known in the art. For example, the system 100 may use a regular expression function in Python to remove the symbols and numbers from the narration memo. Further, as part of text cleaning during the pre-processing, the system 100 may remove all words having 3 or fewer characters (that are not recognized as English words). Further, during the pre-processing, the system 100 checks for words that may have been joined together in the narration in the transaction data, which individually may be meaningful, and if found, splits such words to form meaningful words. For example, if the word identified by the system 100 is ‘PaytmTicketsPayment’, then the system 100 splits the word into ‘Paytm Tickets Payment’ to form the meaningful words. It is to be noted that the aforementioned steps of removing numbers and special characters, removing words, and splitting words are example steps, and other appropriate pre-processing steps may be carried out as required.

[032] Further, at step 206, the system 100 identifies at least one web-page corresponding to each of the plurality of narration memos in the pre-processed transaction data. The system 100 may use any suitable approach, for example, a web scraping technique, to identify the web-page(s) matching each of the narration memos. The web scraping (also referred to as ‘scraping’) allows the system 100 to scrape HTML files for each of the transaction memos. HTML files are coded representations of web-pages. The narration memos from the pre-processed transaction data are used to search for the web-page for each entry.
For example, if the narration memo for one of the transactions was ‘ONL Paytm TicketsPayment ON9000786HH’, after cleaning the text the input to the web scraper would be ‘ONL Paytm Tickets Payment’. The HTML file corresponding to the web-page is extracted and temporarily stored. In this way, the system 100 generates the web-page, in the form of an HTML file, for every narration memo.
[033] Further, at step 208, the system 100 classifies, via the one or more hardware processors, the web-page corresponding to each of the narration memos as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos. The system 100 classifies a narration memo as belonging to a first category if merchant and location information corresponding to the narration memo is directly available. The system 100 classifies a narration memo as belonging to a second category if the merchant and location information corresponding to the narration memo is directly available but is not identifiable. The system 100 classifies a narration memo as belonging to a third category if the merchant information is not available but the location information corresponding to the narration memo is directly available. The system 100 classifies a narration memo as belonging to a fourth category if no information corresponding to the narration memo is available. In case of the fourth category, the narration memo may not always be blank, but may contain meaningless words. This is depicted in FIG. 3.
[034] As the extent of information, and the ease with which the information may be extracted, vary for narration memos belonging to each of the categories, the system 100 may use different approaches to further process and extract data from each of the narration memos. This is explained below:

1. For narration memo in category 1:- [035] For Category 1, a selected HTML tag directly gives the merchant information and address. For example, the extracted text might be ‘Fuel station in Rangpo’ for the narration code ‘KAISER KUNTI FUEL RANGPO’. Hence the system 100 may use any suitable text processing and data extraction technique to extract information (i.e. Fuel Station as a refined category) from the narration memo. The extracted value, i.e. Fuel Station in this example, may then be stored in html format for the given key and class_ values.

2. For narration memo in category 2:- [036] For category 2, the data is directly available but not identifiable, so processing the transaction memo gives the system 100 a plurality of values. For example, consider that the transaction memo was generated when a user made a mobile phone recharge transaction using a mobile application of the service provider. As the narration memo belongs to category 2, only the service provider information is available from the narration memo. However, the service provider may be providing a variety of services such as but not limited to personal banking, television recharge, and utility bill payment. Hence directly processing the narration memo may not help the system 100 understand the service for which the user has made the transaction. For example, an HTML extraction for a narration ‘ONL Aircel Payment’ may provide the following refined category list:
[' bank ', ' mobile postpaid ', ' aircel bill payment', ' aircel billpay ', ' pmt face', ' online recharge', ' aircel prepaid mobile recharge html', ' Personal Banking ', ' utility ', ' aircel prepaid mo ', ' aircel mobile bill payment', ' aircel online prepaid recharge services', ' aircel prepaid mobile online recharge', ' aircel online recharge']
[037] So, in order to identify the service, the system 100 matches the words in the narration memo with the words in the refined category list. In this example, ‘Aircel®’ and ‘payment’ are compared against every element in the refined category list; ‘ONL’ is removed before the comparison as it has 3 or fewer characters and is meaningless. For the remaining terms, ‘aircel bill payment’ is identified as the closest match, and hence is identified as the refined category matching the narration memo.

3. For narration memo in category 3:- [038] For the narration memo belonging to Category 3, the extracted tag has multiple indirect links.
For example, in the case of the narration memo ‘ONL Paytm TicketsPayment’, the type of ticket is not specified. Since Paytm® offers multiple services such as travel and entertainment, the ticket purchased could be for any of them. Hence the system 100 extracts a list of possible categories from all the links and chooses the refined category based on either the mode or a word-to-word match with the narration memo. For example, extraction from the web-page of the aforementioned narration memo belonging to category 3 gives the following refined category list:
[' train tickets', ' bus tickets', ' movies', ' flights', ' offer ', ' free movie tickets', ' shop ', ' get 100 cashback on transaction charges upto ', ' shop ', ' zero charges guaranteed across all train e ticket ', ' about us ', ' our policies', ' Movie Tickets ', ' Khandwa']
[039] For this category, the system 100 cannot use the approach used for category 2, as the narration memo has multiple links, as opposed to only one for a narration memo in category 2. In order to identify the finer category from the narration memo belonging to category 3, the system 100 processes the text in the narration memo to identify the most frequently occurring category, and then determines the refined category based on the identified most frequently occurring category. In the aforementioned example, ‘train tickets’ has occurred most, and the same is identified as the finer category by the system 100.

4. For narration memo in category 4:- [040] For Category 4, the tags do not have information relating to the narration memo. This may happen for unintelligible narration memos entered by the users themselves, such as ‘DD’, ‘aaa’, ‘asdasd’ etc. To identify the finer category matching the narration memo belonging to category 4, the system 100 uses a data model that is generated to perform finer category identification for transaction memos belonging to the category 4. The method of generating the data model is explained below.
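Before turning to the data model, the refined-category selection described above for categories 2 and 3 can be sketched as follows. This is a minimal sketch under stated assumptions: the description does not name the similarity measure used for the "closest match", so Jaccard overlap of word sets is used here purely for illustration, and the function names are hypothetical. The category 2 candidates are a subset of the list quoted above; the category 3 candidates are a hypothetical shortened list, not the list quoted above.

```python
from collections import Counter


def normalise(text):
    """Lower-case a candidate and collapse surrounding/internal whitespace."""
    return " ".join(text.lower().split())


def closest_match(memo_words, candidates):
    """Category 2: pick the candidate whose word set best overlaps the memo.

    Assumes memo_words is non-empty. Jaccard overlap is an assumption.
    """
    memo = {w.lower() for w in memo_words}

    def jaccard(candidate):
        words = set(normalise(candidate).split())
        return len(memo & words) / len(memo | words)

    return normalise(max(candidates, key=jaccard))


def most_frequent(candidates):
    """Category 3: pick the candidate that occurs most often (the mode)."""
    counts = Counter(normalise(c) for c in candidates)
    return counts.most_common(1)[0][0]


# Subset of the refined category list quoted for 'ONL Aircel Payment'.
aircel = [" bank ", " mobile postpaid ", " aircel bill payment",
          " aircel billpay ", " online recharge",
          " aircel mobile bill payment", " aircel online recharge"]
print(closest_match(["Aircel", "Payment"], aircel))

# Hypothetical shortened list in which one category clearly repeats.
links = ["train tickets", "bus tickets", " Train Tickets ", "movies",
         "train tickets", "flights"]
print(most_frequent(links))
```

Jaccard overlap penalises candidates that carry extra words, which is why ‘aircel bill payment’ is preferred here over ‘aircel mobile bill payment’ even though both contain the two memo words.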
[041] In an embodiment, the data model is a multiclass Support Vector Machine (SVM) model to predict a broader category for the null input i.e. category no.4 data. Training data used for generating the data model includes past transaction data of a plurality of customers, covering spend patterns checked on a monthly basis, demographics, customer relationships with one or more banks, deposit balances, time of transaction and so on, and a corresponding “final category” of the transaction memo. The term “final category”, for a narration memo, in this context refers to the finer category/spend category that has been identified as matching the narration memo. The training data may also include information on transaction memos belonging to the first category, the second category, and the third category, and the corresponding final category.
[042] The aforementioned step of determining the finer category (also referred to as “spend category”) corresponding to each of the transaction memos is executed at step 210 by the system 100. Further, at step 212, the system 100 maps, via the one or more hardware processors, each of the identified one or more finer categories with a corresponding broader category. In an embodiment, the system 100 performs the mapping of the finer categories with the corresponding broader category by using the data model, if the data model is trained using training data comprising a mapping of a plurality of finer categories and corresponding broader categories.
[043] Further, at step 214, the system 100 generates, via the one or more hardware processors, a spend category representation of each of the plurality of users, based on the one or more broader categories to which the one or more spend categories are mapped. The spend category representation of a user indicates or represents the purchase interests of the user.
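Steps 212 and 214 can be illustrated with a small sketch. The description does not specify the form of the spend category representation or list any finer-to-broader mapping, so the `BROADER` table, the fallback label "other", and the choice to represent each user by the share of transactions per broader category are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical finer -> broader mapping; the description does not list one.
BROADER = {
    "fuel station": "transport",
    "train tickets": "travel",
    "bus tickets": "travel",
    "aircel bill payment": "utilities",
}


def spend_representation(transactions):
    """Turn (user, finer_category) pairs into per-user broader-category shares."""
    counts = defaultdict(Counter)
    for user, finer in transactions:
        # Step 212: map the finer category to its broader category.
        counts[user][BROADER.get(finer, "other")] += 1
    # Step 214: express each user's counts as fractions of their transactions.
    return {
        user: {cat: n / sum(c.values()) for cat, n in c.items()}
        for user, c in counts.items()
    }


transactions = [
    ("u1", "train tickets"),
    ("u1", "bus tickets"),
    ("u1", "fuel station"),
    ("u1", "aircel bill payment"),
    ("u2", "fuel station"),
]
print(spend_representation(transactions))
```

A share-based representation is only one option; a monetary total per broader category would fit the same description equally well.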
Further, the information on the transaction data, the web-pages identified, the spend category each transaction data is associated with, the broader categories, and the generated spend category representation form a training data, which is then used to train a data model to perform the spend categorization. The data model thus generated may be stored in a perpetual table in the system. When a new transaction data comprising at least one transaction memo is fed as input to the system 100, the system 100 processes the at least one transaction memo using the data model to perform the spend categorization and in turn generate the spend category representation (spend categorization data) of the user, as depicted in FIG. 4.
[044] While training the data model, the system 100 determines, by processing the training data, the frequency of payments made during a considered time interval. For example, consider that on the 5th of every month in the considered time interval, the customer makes a fixed transaction of Rs. 10000/- and enters some unclear narration code every time. The system 100 detects this pattern by comparing transactions made on one day to the transactions made on the same day of the previous 6 months. The comparison is performed on a daily rather than hourly basis. Once the pattern is found, the system 100 matches the corresponding transaction information, such as but not limited to amount, time of transaction (day, hour etc.), party age, gender, and location, to the already determined final categories of categories 1, 2 and 3. A similarity score, which represents the extent of similarity of the determined data with the data in categories 1, 2 and 3, is calculated. If the similarity score exceeds a similarity threshold, the corresponding category is determined as relevant. After identifying all the categories that are relevant, the category with which the relevancy is maximum is determined as the final category.
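The pattern check and similarity scoring described in [044] can be sketched as follows. This is an illustrative sketch only: the description does not define the similarity score, so a simple fraction of matching fields is assumed; the requirement that the payment appears in every one of the previous 6 months, the field names, the threshold value of 0.5, and the example records are likewise hypothetical.

```python
from datetime import date


def previous_months(d, n):
    """Yield (year, month) for the n calendar months before date d."""
    year, month = d.year, d.month
    for _ in range(n):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        yield year, month


def is_recurring(txn_date, amount, history, months=6):
    """True if the same amount was paid on the same day of each prior month."""
    seen = {(d.year, d.month) for d, a in history
            if d.day == txn_date.day and a == amount}
    return all(ym in seen for ym in previous_months(txn_date, months))


FIELDS = ("amount", "day", "hour", "age", "gender", "location")


def similarity(txn, labelled):
    """Fraction of FIELDS on which the two transactions agree (assumed score)."""
    return sum(txn.get(f) == labelled.get(f) for f in FIELDS) / len(FIELDS)


def final_category(txn, labelled_txns, threshold=0.5):
    """Return the category with the highest score above the threshold, else None."""
    best_cat, best_score = None, threshold
    for labelled in labelled_txns:
        score = similarity(txn, labelled)
        if score > best_score:
            best_cat, best_score = labelled["final_category"], score
    return best_cat


# Rs. 10000 paid on the 5th of each month from Aug 2021 to Jan 2022.
history = [(date(2021, m, 5), 10000) for m in range(8, 13)]
history.append((date(2022, 1, 5), 10000))
print(is_recurring(date(2022, 2, 5), 10000, history))

txn = {"amount": 10000, "day": 5, "hour": 10, "age": 35,
       "gender": "F", "location": "Rangpo"}
labelled_txns = [
    {"amount": 10000, "day": 5, "hour": 11, "age": 35, "gender": "F",
     "location": "Rangpo", "final_category": "loan"},
    {"amount": 500, "day": 5, "hour": 10, "age": 20, "gender": "M",
     "location": "Mumbai", "final_category": "fuel station"},
]
print(final_category(txn, labelled_txns))
```

In this example the first labelled transaction agrees on five of six fields and the second on two of six, so only the first exceeds the assumed threshold.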
[045] In case a pattern is not found in the transactions, the model depends on other information such as demographics, time of transaction, age, gender etc., to find the closest match from the previously defined categories. It is to be noted that though the multiclass SVM is referred to in this description, other suitable data models may be used. The generated data model is then used by the system 100 to process transaction memos determined as belonging to category 4, to identify the corresponding finer category.
[046] Information on the classification is then used by the system 100 at step 210 to identify one or more spend categories the transaction data is associated with, based on association of the narration memos with the plurality of categories.
[047] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[048] The embodiments of the present disclosure herein address the unresolved problem of spend categorization when the transaction memo contains little or no data. The embodiments thus provide a method and system for generating a training data for spend categorization. Moreover, the embodiments herein further provide a mechanism of training a data model using the generated training data to generate a trained data model, which in turn is used for spend categorization of real-time data.
[049] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[050] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[051] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[052] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[053] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

Specification

Claims:
We Claim:
1. A processor implemented method (200) of generating a training data for spend categorization, comprising:
fetching (202) a transaction data as input, via one or more hardware processors, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time;
pre-processing (204) the transaction data, via the one or more hardware processors, to generate a pre-processed transaction data;
identifying (206), via the one or more hardware processors, a web-page corresponding to each of a plurality of narration memos in the pre-processed transaction data;
classifying (208), via the one or more hardware processors, the web-page corresponding to each of a plurality of narration memos as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos;
identifying (210), via the one or more hardware processors, one or more spend categories the transaction data is associated with, based on association of the narration memos with the plurality of categories;
mapping (212), via the one or more hardware processors, each of the identified one or more spend categories with a corresponding broader category; and
generating (214), via the one or more hardware processors, a spend category representation of each of the plurality of users, based on one or more of the broader categories to which the one or more spend categories are mapped, wherein
information on a) the transaction data, b) web-pages identified for the plurality of narration memos, c) the spend category each of the transaction data is associated with, d) the broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation, form the training data.

2. The method as claimed in claim 1, wherein the plurality of categories comprise a first category, a second category, a third category, and a fourth category, wherein
in the first category, merchant and location information corresponding to the narration memo is directly available,
in the second category, the merchant and location information corresponding to the narration memo is directly available but is not identifiable,
in the third category, the merchant information is not available but the location information corresponding to the narration memo is directly available, and
in the fourth category, no information corresponding to the narration memo is available.

4. The method as claimed in claim 1, wherein the data model is trained using the training data to generate a trained data model, wherein the trained data model is used to process real-time input comprising one or more transaction data of one or more users to perform spend categorization of each of the one or more users.

5. A system (100) for generating a training data for spend categorization, comprising:
one or more hardware processors (102);
a communication interface (112); and
a memory (104) operatively coupled to the one or more hardware processors via the communication interface, wherein a plurality of instructions in the memory when executed cause the one or more hardware processors to:
fetch a transaction data as input, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time;
pre-process the transaction data to generate a pre-processed transaction data;
identify a web-page corresponding to each of a plurality of narration memos in the pre-processed transaction data;
classify the web-page corresponding to each of a plurality of narration memos as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos;
identify one or more spend categories the transaction data is associated with, based on association of the narration memos with the plurality of categories;
map each of the identified one or more spend categories with a corresponding broader category; and
generate a spend category representation of each of the plurality of users, based on the one or more broader categories to which the one or more spend categories are mapped, wherein
information on a) the transaction data, b) web-pages identified for the plurality of narration memos, c) the spend category each of the transaction data is associated with, d) the broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation, form the training data.

6. The system as claimed in claim 5, wherein the plurality of categories comprises a first category, a second category, a third category, and a fourth category, wherein
in the first category merchant and location information corresponding to the narration memo is directly available,
in the second category the merchant and location information corresponding to the narration memo is directly available but is not identifiable,
in the third category the merchant information is not available but the location information corresponding to the narration memo is directly available, and
in the fourth category no information corresponding to the narration memo is available.

Dated this 01st day of March 2022
Tata Consultancy Services Limited
By their Agent & Attorney

(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086

Description:
FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:

METHOD AND SYSTEM FOR SPEND CATEGORIZATION

Applicant

Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The disclosure herein generally relates to data processing, and, more particularly, to a method and system for spend categorization.

BACKGROUND
[002] People make purchases from various service providers, and they purchase products and/or services related to different categories. For example, the purchased item may be any or all of fuel, food, movie tickets, household items, electronics items, vehicle accessories and so on. The purchase history of every person/user indicates the purchase preferences of the user, and this is a crucial input in various applications such as but not limited to targeted advertisements. If the purchase preferences of a user are known, advertisements/purchase suggestions that match the preferences of the user may be sent to the user, and this improves the chances of receiving positive feedback and of the user purchasing the advertised item.
[003] With advancement in technology, many users have started using digital transaction modes as their preferred payment method. Every time a user makes a transaction using a digital payment option, a digital record (alternately referred to as “narration memo”) of the transaction is generated. The digital record may sometimes contain only partial information with respect to the transaction made. For example, consider that the user made a debit card transaction at a restaurant. The corresponding digital record may have information on a unique identification of the restaurant (i.e. the merchant) and its location, and this information helps to categorize the transaction as related to a food category. However, in many instances, such data may not be available from the narration memo, and as a result, it may not be possible to categorize the transactions. For example, when a customer makes a third party money transfer, the customer is given an option to add details of the transaction as a comment. Most of the time, however, customers skip the step of adding the comment or input random meaningless data for the sake of filling up highlighted fields. Leaving the narration field empty or adding meaningless random text results in transaction details not being recorded properly, and this in turn affects the quality of the spend categorization performed by existing state-of-the-art systems.

SUMMARY
[004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method of generating a training data for spend categorization is provided. In this method, initially a transaction data is fetched as input, via one or more hardware processors, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time. The transaction data is then pre-processed via the one or more hardware processors, to generate a pre-processed transaction data. Further, a web-page corresponding to each of a plurality of narration memos in the pre-processed transaction data is identified via the one or more hardware processors. Further, the web-page corresponding to each of a plurality of narration memos is classified as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos, via the one or more hardware processors. Further, one or more spend categories the transaction data is associated with are identified, via the one or more hardware processors, based on association of the narration memos with the plurality of categories. Further, each of the identified one or more spend categories is mapped with a corresponding broader category via the one or more hardware processors.
Further, a spend category representation of each of the plurality of users is generated via the one or more hardware processors, based on one or more of the one or more broader categories to which the one or more spend categories are mapped, wherein information on a) the transaction data, b) the web-pages identified, c) the spend category each of the transaction data is associated with, d) the broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation, form the training data.
[005] In another aspect, the plurality of categories comprises a first category, a second category, a third category, and a fourth category, wherein in the first category merchant and location information corresponding to the narration memo is directly available, in the second category the merchant and location information corresponding to the narration memo is directly available but is not identifiable, in the third category the merchant information is not available but the location information corresponding to the narration memo is directly available, and in the fourth category no information corresponding to the narration memo is available.
[006] In yet another aspect, the method involves training a data model using the training data to generate a trained data model, wherein the trained data model is used to process real-time input comprising one or more transaction data of one or more users to perform spend categorization of each of the one or more users.
[007] In yet another aspect, a system for generating a training data for spend categorization is provided. The system includes one or more hardware processors, a communication interface, and a memory operatively coupled to the one or more hardware processors via the communication interface, wherein a plurality of instructions in the memory when executed cause the one or more hardware processors to initially fetch a transaction data as input, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time. The transaction data is then pre-processed via the one or more hardware processors, to generate a pre-processed transaction data. Further, a web-page corresponding to each of a plurality of narration memos in the pre-processed transaction data is identified via the one or more hardware processors. Further, the web-page corresponding to each of a plurality of narration memos is classified as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos, via the one or more hardware processors. Further, one or more spend categories the transaction data is associated with are identified, via the one or more hardware processors, based on association of the narration memos with the plurality of categories. Further, each of the identified one or more spend categories is mapped with a corresponding broader category via the one or more hardware processors.
Further, a spend category representation of each of the plurality of users is generated via the one or more hardware processors, based on one or more of the one or more broader categories to which the one or more spend categories are mapped, wherein information on a) the transaction data, b) the web-pages identified, c) the spend category each of the transaction data is associated with, d) the broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation, form the training data.
[008] In yet another aspect, the one or more hardware processors are configured to train a data model using the training data to generate a trained data model, wherein the trained data model is used to process real-time input comprising one or more transaction data of one or more users to perform spend categorization of each of the one or more users.
[009] In yet another aspect, a non-transitory computer readable medium for generating a training data for spend categorization is provided. The non-transitory computer readable medium includes a plurality of instructions, which when executed, cause the following steps. Initially a transaction data is fetched as input, via one or more hardware processors, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time. The transaction data is then pre-processed via the one or more hardware processors, to generate a pre-processed transaction data. Further, a web-page corresponding to each of a plurality of narration memos in the pre-processed transaction data is identified via the one or more hardware processors. Further, the web-page corresponding to each of a plurality of narration memos is classified as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos, via the one or more hardware processors. Further, one or more spend categories the transaction data is associated with are identified, via the one or more hardware processors, based on association of the narration memos with the plurality of categories. Further, each of the identified one or more spend categories is mapped with a corresponding broader category via the one or more hardware processors.
Further, a spend category representation of each of the plurality of users is generated via the one or more hardware processors, based on one or more of the one or more broader categories to which the one or more spend categories are mapped, wherein information on a) the transaction data, b) the web-pages identified, c) the spend category each of the transaction data is associated with, d) the broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation, form the training data.
[010] In yet another aspect the non-transitory computer readable medium is configured to cause the one or more hardware processors to train a data model using the training data to generate a trained data model, wherein the trained data model is used to process real-time input comprising one or more transaction data of one or more users to perform spend categorization of each of the one or more users.
[011] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
[012] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[013] FIG. 1 illustrates an exemplary system for spend categorization according to some embodiments of the present disclosure.
[014] FIGS. 2A and 2B (collectively referred to as FIG. 2) are a flow diagram depicting steps involved in the process of generating a data model for spend categorization, by the system of FIG. 1, according to some embodiments of the present disclosure.
[015] FIG. 3 illustrates various categories of transaction memo, based on information in the transaction memo, in accordance with some embodiments of the present disclosure.
[016] FIG. 4 is an example illustration of use of trained data model for spend categorization of real-time data, by the system of FIG. 1, according to some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
[017] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[018] With advancement in technology, many users have started using digital transaction modes as their preferred payment method. Every time a user makes a transaction using a digital payment option, a digital record (alternately referred to as “narration memo”) of the transaction is generated. The digital record may or may not contain information with respect to the transaction made. For example, consider that the user made a debit card transaction at a restaurant. The corresponding digital record may have information on a unique identification of the restaurant (i.e. the merchant) and its location, and this information helps to categorize the transaction as related to a food category. However, in many instances, such data may not be available from the narration memo, and as a result, it may not be possible to categorize the transactions and in turn perform the spend categorization of users.
[019] Embodiments herein provide a method and system for spend categorization. The system categorizes a plurality of narration memos in a fetched transaction data as belonging to different categories, based on a) availability of certain required information, and b) whether the available information is identifiable. The system generates a training data comprising information on a) a transaction data, b) web-pages identified for the plurality of narration memos, c) the spend category each of the transaction data is associated with, d) a broader category each of the identified one or more spend categories is associated with, and e) the generated spend category representation. The system then trains a data model using the generated training data. The trained data model is then used for performing spend categorization of different users. In various embodiments, the system may be configured to use other techniques to extract the information from narration memos in which the required information is available and identifiable.
[020] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 4, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[021] FIG. 1 illustrates an exemplary system for spend categorization according to some embodiments of the present disclosure. The system 100 includes or is otherwise in communication with hardware processors 102, at least one memory such as a memory 104, and an I/O interface 112. The hardware processors 102, the memory 104, and the Input/Output (I/O) interface 112 may be coupled by a system bus such as a system bus 108 or a similar mechanism. In an embodiment, the hardware processors 102 can be one or more hardware processors.
[022] The I/O interface 112 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a printer and the like. Further, the I/O interface 112 may enable the system 100 to communicate with other devices, such as web servers, and external databases.
[023] The I/O interface 112 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For the purpose, the I/O interface 112 may include one or more ports for connecting several computing systems or devices with one another or to another server.
[024] The one or more hardware processors 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, node machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 102 is configured to fetch and execute computer-readable instructions stored in the memory 104.
[025] The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 106.
[026] The plurality of modules 106 include programs or coded instructions that supplement applications or functions performed by the system 100 for executing different steps involved in generating the training data and in turn a trained data model for spend categorization. The plurality of modules 106, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The plurality of modules 106 may also be used as signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 106 can be used by hardware, by computer-readable instructions executed by the one or more hardware processors 102, or by a combination thereof. The plurality of modules 106 can include various sub-modules (not shown). The plurality of modules 106 may include computer-readable instructions that supplement applications or functions performed by the system 100 for the spend categorization.
[027] The data repository (or repository) 110 may include a plurality of abstracted pieces of code for refinement, and data that is processed, received, or generated as a result of the execution of the plurality of modules 106.
[028] Although the data repository 110 is shown internal to the system 100, it will be noted that, in alternate embodiments, the data repository 110 can also be implemented external to the system 100, where the data repository 110 may be stored within a database (repository 110) communicatively coupled to the system 100. The data contained within such external database may be periodically updated. For example, new data may be added into the database (not shown in FIG. 1) and/or existing data may be modified and/or non-useful data may be deleted from the database. In one example, the data may be stored in an external system, such as a Lightweight Directory Access Protocol (LDAP) directory and a Relational Database Management System (RDBMS). Functions of the components of the system 100 are now explained with reference to steps in flow diagrams in FIG. 2.
[029] FIGS. 2A and 2B (collectively referred to as FIG. 2) are a flow diagram depicting steps of a method (200) involved in the process of generating a data model for spend categorization, by the system of FIG. 1, according to some embodiments of the present disclosure.
[030] At step 202 of the method 200, the system 100 fetches transaction data as input, via the one or more hardware processors, wherein the transaction data corresponds to a plurality of transactions made by a plurality of users over a period of time. The system 100 may optionally store the fetched transaction data in a master table, for further processing. The system 100 may fetch information on the transactions from transaction detail tables maintained by banks and other financial institutions. The narration memo of a transaction may contain certain information with respect to the transaction made. First, when a customer makes a credit/debit card transaction at a merchant point, the merchant name and location are stored as the narration memo. For example, if the customer used a credit/debit card to pay for refueling at a gas station, the narration memo stored for this transaction in a banking database may be: ‘othpos036209904 KAISER KUNTI FUEL RANGPO’, which contains the merchant name and the city location. Secondly, when a customer makes an online transaction on a well-known website, the information provided to the banking interfaces is stored as the narration memo. For example, if a customer books a train ticket through Paytm®, the narration memo may be stored as ‘ONL Paytm TicketsPayment’. Thirdly, when a customer makes NEFT or IMPS payments from an online account, most banks provide a pre-defined set of options for the customer to choose from. There are generally 6-8 options, such as education, loan, shopping, and gift to relatives. There is also an option for customers to fill in the details themselves if none of the given options fits the transaction. This approach, however, yields the vaguest and least accurate details, because customers often choose an option at random without giving it a second thought. For example, customers enter unintelligible words such as ‘aaa’ or ‘aadssf’ when they have to provide the input themselves.
[031] Further, at step 204, the system 100 pre-processes the transaction data, via the one or more hardware processors, to generate pre-processed transaction data. Pre-processing involves converting the fetched transaction data to a desired format, as required for further processing. The system 100 may perform the pre-processing based on pre-configured rules/requirements. For example, during the pre-processing, the system 100 may remove numbers and special characters from the transaction data, using any appropriate mechanism known in the art, such as a regular expression function in Python that removes the symbols and numbers from the narration memo. Further, as part of text cleaning during the pre-processing, the system 100 may remove all words having three or fewer characters that are not recognized as English words. Further, during the pre-processing, the system 100 checks for words in the narration of the transaction data that have been joined together but are individually meaningful, and if found, splits them to form meaningful words. For example, if the word identified by the system 100 is ‘PaytmTicketsPayment’, the system 100 splits it as ‘Paytm Tickets Payment’. It is to be noted that the aforementioned steps of removing numbers and special characters, removing words, and splitting words are example steps, and other appropriate pre-processing steps may be carried out as required.
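The example pre-processing steps of step 204 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name preprocess_memo, the small ENGLISH_WORDS set, the choice to drop any token containing digits as a whole, and the choice to split joined words at lower-case to upper-case boundaries are all assumptions made for the sketch.

```python
import re

# Hypothetical stand-in for an English word list; the source does not name one.
ENGLISH_WORDS = {"pay", "gas", "bus", "the"}


def preprocess_memo(memo, drop_short=True):
    """Clean one narration memo using the example steps described above."""
    # Assumption: a token that contains digits is a reference code and is dropped whole.
    tokens = [t for t in memo.split() if not re.search(r"\d", t)]
    # Remove special characters, keeping letters only.
    tokens = [re.sub(r"[^A-Za-z]", "", t) for t in tokens]
    # Assumption: joined words are split at lower-case to upper-case boundaries.
    text = " ".join(re.sub(r"(?<=[a-z])(?=[A-Z])", " ", t) for t in tokens)
    words = text.split()
    if drop_short:
        # Drop words of three or fewer characters unless they are known English words.
        words = [w for w in words if len(w) > 3 or w.lower() in ENGLISH_WORDS]
    return " ".join(words)


print(preprocess_memo("ONL Paytm TicketsPayment ON9000786HH", drop_short=False))
print(preprocess_memo("ONL Paytm TicketsPayment ON9000786HH"))
```

With drop_short=False the first call yields ‘ONL Paytm Tickets Payment’; with the short-word rule enabled, ‘ONL’ is also removed. Splitting on capitalisation would not separate words joined in a single case, for which a dictionary-based segmentation would be needed.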
[032] Further, at step 206, the system 100 identifies at least one web-page corresponding to each of the plurality of narration memos in the pre-processed transaction data. The system 100 may use any suitable approach, for example, a web scraping technique, to identify the web-page(s) matching each of the narration memos. Web scraping (also referred to as ‘scraping’) allows the system 100 to scrape HTML files, which are coded representations of web-pages, for each of the transaction memos. Each narration memo from the pre-processed transaction data is used to search for the web-page for that entry. For example, if the narration memo for one of the transactions was ‘ONL Paytm TicketsPayment ON9000786HH’, after cleaning the text the input to the web scraper would be ‘ONL Paytm Tickets Payment’. The HTML file corresponding to the web-page is extracted and temporarily stored. In this way, the system 100 obtains the web-page for every narration memo in the form of an HTML file.
[033] Further, at step 208, the system 100 classifies, via the one or more hardware processors, the web-page corresponding to each of the narration memos as belonging to one of a plurality of categories, based on information present in each of the plurality of narration memos. The system 100 classifies a narration memo as belonging to a first category if merchant and location information corresponding to the narration memo is directly available and identifiable. The system 100 classifies a narration memo as belonging to a second category if the merchant and location information corresponding to the narration memo is directly available but is not identifiable. The system 100 classifies a narration memo as belonging to a third category if the merchant information is not available but the location information corresponding to the narration memo is directly available. The system 100 classifies a narration memo as belonging to a fourth category if no information corresponding to the narration memo is available. In case of the fourth category, the narration memo may not always be blank, but may contain meaningless words. This is depicted in FIG. 3.
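The four-way classification of step 208 can be expressed as a small decision rule. The sketch below assumes that three boolean flags (whether merchant information is available, whether location information is available, and whether the available information is identifiable) have already been derived from the scraped web-page; how they are derived is not specified here. The names MemoCategory and classify_memo are hypothetical, and routing a memo with merchant but no location information to the fourth category is an assumption, since that combination is not described.

```python
from enum import IntEnum


class MemoCategory(IntEnum):
    DIRECT = 1           # merchant and location available and identifiable
    UNIDENTIFIABLE = 2   # merchant and location available but not identifiable
    LOCATION_ONLY = 3    # merchant missing, location available
    NO_INFORMATION = 4   # nothing usable (blank or meaningless memo)


def classify_memo(has_merchant, has_location, identifiable):
    """Map three availability flags to one of the four memo categories."""
    if has_merchant and has_location:
        return MemoCategory.DIRECT if identifiable else MemoCategory.UNIDENTIFIABLE
    if has_location:
        return MemoCategory.LOCATION_ONLY
    # Assumption: combinations the description does not list fall through to category 4.
    return MemoCategory.NO_INFORMATION


print(classify_memo(True, True, True).name)
print(classify_memo(False, False, False).name)
```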
[034] As the extent of information and ease with which the information may be extracted vary for narration memo belonging to each of the categories, the system 100 may use different approaches to further process and extract data from each of the narration memos. This is explained below:
1. For narration memo in category 1:-
[035] For category 1, a selected HTML tag directly gives the merchant information and address. For example, the extracted text might be ‘Fuel station in Rangpo’ for the narration memo ‘KAISER KUNTI FUEL RANGPO’. Hence the system 100 may use any suitable text processing and data extraction technique to extract information (i.e. ‘Fuel Station’ as a refined category) from the narration memo. The extracted value, i.e. ‘Fuel Station’ in this example, may then be stored in html format for the given key and class_ values.
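As an illustration of reading the text of a selected HTML tag, the following sketch uses the html.parser module of the Python standard library. It is one possible approach, not the disclosed one; the class TagTextExtractor, the helper extract_tag_text, and the tag and class names in the sample page fragment are all hypothetical.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text inside every <tag class="css_class"> element."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0    # greater than 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth == 0 and tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.texts.append("")
        elif self.depth and tag == self.tag:
            self.depth += 1  # nested element with the same tag name

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def extract_tag_text(html, tag, css_class):
    """Return the stripped text of all elements matching tag and class."""
    parser = TagTextExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.texts if t.strip()]


# Hypothetical page fragment; the tag and class names are illustrative only.
page = ('<div><span class="place-type">Fuel station in Rangpo</span>'
        '<span class="hours">Open 24 hours</span></div>')
print(extract_tag_text(page, "span", "place-type"))
```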
2. For narration memo in category 2:-
[036] For category 2, the data is directly available but not identifiable, so processing the transaction memo yields a plurality of values. For example, consider a transaction memo generated when a user made a mobile phone recharge transaction using a mobile application of the service provider. As the narration memo belongs to category 2, only the service provider information is available from the narration memo. However, the service provider may be providing a variety of services such as but not limited to personal banking, television recharge, and utility bill payment. Hence directly processing the narration memo may not help the system 100 understand the service for which the user has made the transaction. For example, an html extraction for a narration ‘ONL Aircel Payment’ may provide the following refined category list:
[' bank ', ' mobile postpaid ', ' aircel bill payment', ' aircel billpay ', ' pmt face', ' online recharge', ' aircel prepaid mobile recharge html', ' Personal Banking ', ' utility ', ' aircel prepaid mo ', ' aircel mobile bill payment', ' aircel online prepaid recharge services', ' aircel prepaid mobile online recharge', ' aircel online recharge']
[037] So, in order to identify the service, the system 100 matches the words in the narration memo with the words in the refined category list. In this example, ‘ONL’ is removed because it has no more than three characters and is meaningless, and the remaining words ‘Aircel®’ and ‘payment’ are compared against every element in the refined category list. ‘aircel bill payment’ is identified as the closest match, and hence as the refined category matching the narration memo.
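The word matching described for category 2 can be sketched with a simple set-overlap score. The measure used for the "closest match" is not stated, so Jaccard similarity over lower-cased words is an assumption here, as is the function name closest_category. The refined category list is the one given in the example above.

```python
def closest_category(memo_words, candidates):
    """Return the candidate whose words overlap most with the memo words."""
    query = {w.lower() for w in memo_words}

    def jaccard(candidate):
        # Jaccard similarity: shared words divided by all distinct words.
        cand = set(candidate.lower().split())
        union = query | cand
        return len(query & cand) / len(union) if union else 0.0

    return max(candidates, key=jaccard).strip()


refined = [' bank ', ' mobile postpaid ', ' aircel bill payment', ' aircel billpay ',
           ' pmt face', ' online recharge', ' aircel prepaid mobile recharge html',
           ' Personal Banking ', ' utility ', ' aircel prepaid mo ',
           ' aircel mobile bill payment', ' aircel online prepaid recharge services',
           ' aircel prepaid mobile online recharge', ' aircel online recharge']

# 'ONL' has already been dropped, leaving the two meaningful words.
print(closest_category(["Aircel", "payment"], refined))
```

Under this scoring, ‘aircel bill payment’ shares two of three distinct words with the memo and scores highest, which agrees with the match identified in the example.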
3. For narration memo in category 3:-
[038] For a narration memo belonging to category 3, the extracted tag has multiple indirect links. For example, in case of the narration memo ‘ONL Paytm TicketsPayment’, the type of ticket is not specified. Since Paytm® offers multiple services like travel and entertainment, the ticket purchased could be for any of them. Hence the system 100 extracts a list of possible categories from all the links and chooses the refined category based on either the mode or a word-to-word match with the narration memo. For example, extraction from the web-page of the aforementioned narration memo belonging to category 3 gives the following refined category list:
[' train tickets', ' bus tickets', ' movies', ' flights', ' offer ', ' free movie tickets', ' shop ', ' get 100 cashback on transaction charges upto ', ' shop ', ' zero charges guaranteed across all train e ticket ', ' about us ', ' our policies', ' Movie Tickets ', ' Khandwa']
[039] For this category, the system 100 cannot use the approach used for category 2, as the narration memo has multiple links, as opposed to only one for a narration memo in category 2. In order to identify the finer category for a narration memo belonging to category 3, the system 100 processes the extracted list of possible categories to identify the most frequently occurring category, and then determines the refined category based on it. In the aforementioned example, ‘train tickets’ occurs most often, and is identified as the finer category by the system 100.
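Selecting the most frequently occurring category (the mode) can be sketched with collections.Counter. The function name most_frequent_category and the sample list are hypothetical; the sample is constructed only to show the mechanism and is not the list extracted in the example above. Normalising entries by trimming whitespace and lower-casing before counting is also an assumption.

```python
from collections import Counter


def most_frequent_category(candidates):
    """Return the most common candidate after trimming and lower-casing."""
    counts = Counter(c.strip().lower() for c in candidates if c.strip())
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical extraction result, used only to demonstrate the counting.
sample = ["train tickets", " bus tickets", "Train Tickets ", "movies", "train tickets"]
print(most_frequent_category(sample))
```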
4. For narration memo in category 4:-
[040] For category 4, the tags do not have information relating to the narration memo. This may happen for unintelligible narration memos entered by the users themselves, such as ‘DD’, ‘aaa’, ‘asdasd’ etc. To identify the finer category matching a narration memo belonging to category 4, the system 100 uses a data model that is generated to perform finer category identification for transaction memos belonging to category 4. The method of generating the data model is explained below.
[041] In an embodiment, the data model is a multiclass Support Vector Machine (SVM) model that predicts a broader category for the null input, i.e. category 4 data. Training data used for generating the data model includes past transaction data of a plurality of customers, covering spend patterns on a monthly basis, demographics, customer relationships with one or more banks, deposit balances, time of transaction and so on, together with a corresponding “final category” of each transaction memo. The term “final category”, for a narration memo, in this context refers to the finer category/spend category that has been identified as matching the narration memo. The training data may also include information on transaction memos belonging to the first category, the second category, and the third category, and the corresponding final category.
[042] The aforementioned step of determining the finer category (also referred to as “spend categories”) corresponding to each of the transaction memo is executed at step 210 by the system 100. Further, at step 212, the system 100 maps, via the one or more hardware processors, each of the identified one or more finer categories with a corresponding broader category. In an embodiment, the system 100 performs the mapping of the finer categories with the corresponding broad category, by using the data model if the data model is trained using training data comprising mapping of a plurality of finer categories and corresponding broader categories.
[043] Further, at step 214, the system 100 generates, via the one or more hardware processors, a spend category representation of each of the plurality of users, based on the one or more broader categories to which the one or more spend categories are mapped. The spend category representation of a user indicates or represents the purchase interests of the user. Further, the information on the transaction data, the web-pages identified, the spend category each transaction data is associated with, the broader categories, and the generated spend category representation, form a training data, which is then used to train a data model to perform the spend categorization. The data model thus generated may be stored in a perpetual table in the system. When a new transaction data comprising at least one transaction memo is fed as input to the system 100, the system processes the at least one transaction memo using the data model to perform the spend categorization and in turn generate the spend category representation (spend categorization data) of the user as depicted in FIG. 4.
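Steps 212 and 214 can be illustrated with a small aggregation. The exact form of the spend category representation is not defined, so the sketch assumes it is each user's share of total spend per broader category. The BROAD_CATEGORY mapping, the fallback label "other", the function name spend_representation, and the sample transactions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from finer (spend) categories to broader categories.
BROAD_CATEGORY = {
    "fuel station": "transport",
    "train tickets": "travel",
    "aircel bill payment": "utilities",
}


def spend_representation(transactions):
    """transactions: iterable of (user_id, finer_category, amount) tuples."""
    totals = defaultdict(lambda: defaultdict(float))
    for user, finer, amount in transactions:
        # Step 212: map the finer category to its broader category.
        broad = BROAD_CATEGORY.get(finer, "other")
        totals[user][broad] += amount
    # Step 214 (assumed form): share of each user's spend per broader category.
    result = {}
    for user, by_broad in totals.items():
        total = sum(by_broad.values())
        result[user] = {b: (amt / total if total else 0.0) for b, amt in by_broad.items()}
    return result


sample = [
    ("u1", "fuel station", 300.0),
    ("u1", "train tickets", 100.0),
    ("u2", "aircel bill payment", 50.0),
]
print(spend_representation(sample))
```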
[044] While training the data model, the system 100 determines by processing the training data, frequency of payments made during a considered time interval. For example, consider that on the 5th of every month in the considered time interval, the customer makes a fixed transaction of Rs. 10000/- and enters some unclear narration code every time. The system 100 detects this pattern by comparing transactions made on one day to the transactions made on the same day of previous 6 months. The comparison happens on day basis rather than hourly. Once the pattern is found, the system 100 matches corresponding transaction information such as but not limited to amount, time of transaction (day, hour etc.), party age, gender, and location to already determined final categories of the category 1, 2 and 3. A similarity score, which indicates/represents extent of similarity of the determined data with the data in categories 1,2 and 3, is calculated. If the similarity score exceeds a threshold of similarity score, the corresponding category is determined as relevant. After identifying all the categories that are relevant, the category with which the relevancy is maximum is determined as the final category.
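The pattern check and the thresholded similarity score can be sketched as below. Several points are assumptions made for illustration: a pattern is taken to mean the same amount on the same day of the month in each of the previous six months, the similarity score is taken to be the fraction of attributes that match a category profile, and the threshold value of 0.5 is arbitrary. The function names, attribute names, and the sample profiles (including the category labels "rent" and "fuel") are hypothetical.

```python
from datetime import date


def previous_months(d, n):
    """Yield (year, month) for the n calendar months before date d."""
    year, month = d.year, d.month
    for _ in range(n):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        yield year, month


def is_recurring(txn_date, amount, history, months=6):
    """history: list of (date, amount). Compare on a day basis, not hourly."""
    seen = {(d.year, d.month, d.day, amt) for d, amt in history}
    return all((y, m, txn_date.day, amount) in seen
               for y, m in previous_months(txn_date, months))


def similarity(txn, profile):
    """Assumed score: fraction of profile attributes the transaction matches."""
    if not profile:
        return 0.0
    return sum(txn.get(k) == v for k, v in profile.items()) / len(profile)


def final_category(txn, profiles, threshold=0.5):
    """Keep categories scoring above the threshold and return the best one."""
    scores = {cat: similarity(txn, p) for cat, p in profiles.items()}
    relevant = {c: s for c, s in scores.items() if s > threshold}
    return max(relevant, key=relevant.get) if relevant else None


history = [(date(2022, m, 5), 10000) for m in range(1, 7)]
print(is_recurring(date(2022, 7, 5), 10000, history))

profiles = {
    "rent": {"amount_band": "high", "day": 5, "location": "Bangalore"},
    "fuel": {"amount_band": "low", "day": 12, "location": "Bangalore"},
}
txn = {"amount_band": "high", "day": 5, "location": "Bangalore"}
print(final_category(txn, profiles))
```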
[045] In case a pattern is not found in the transactions then the model depends on other information such as demographics, time of transaction, age, gender etc., to find the closest match from the previously defined categories. It is to be noted that though the multiclass SVM is being referred to in this description, other suitable data models may be used. The generated data model is then used by the system 100 to process transaction memos determined as belonging to category 4, to identify the corresponding finer category.
[046] Information on the classification is then used by the system 100 at step 210 to identify one or more spend categories the transaction data is associated with, based on association of the narration memos with the plurality of categories.
[047] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[048] The embodiments of the present disclosure herein address the unresolved problem of spend categorization when the data available in a transaction memo is limited or nil. The embodiments thus provide a method and system for generating training data for spend categorization. Moreover, the embodiments herein further provide a mechanism of training a data model using the generated training data to generate a trained data model, which in turn is used for spend categorization of real-time data.
[049] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[050] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[051] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[052] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[053] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

Documents

Application Documents

#	Name	Date
1	202221011118-STATEMENT OF UNDERTAKING (FORM 3) [01-03-2022(online)].pdf	2022-03-01
2	202221011118-REQUEST FOR EXAMINATION (FORM-18) [01-03-2022(online)].pdf	2022-03-01
3	202221011118-FORM 18 [01-03-2022(online)].pdf	2022-03-01
4	202221011118-FORM 1 [01-03-2022(online)].pdf	2022-03-01
5	202221011118-FIGURE OF ABSTRACT [01-03-2022(online)].jpg	2022-03-01
6	202221011118-DRAWINGS [01-03-2022(online)].pdf	2022-03-01
7	202221011118-DECLARATION OF INVENTORSHIP (FORM 5) [01-03-2022(online)].pdf	2022-03-01
8	202221011118-COMPLETE SPECIFICATION [01-03-2022(online)].pdf	2022-03-01
9	202221011118-FORM-26 [22-06-2022(online)].pdf	2022-06-22
10	Abstract1.jpg	2022-07-04
11	202221011118-Proof of Right [24-08-2022(online)].pdf	2022-08-24