Abstract: Disclosed herein is a method and system 202 for identifying prospective bill defaulters. In particular, the present disclosure provides a system 202 and a method capable of efficiently analysing data associated with a large customer base, identifying defaulting patterns and, based on the identified patterns, accurately predicting the customers that are likely to default on their upcoming bill payments. Further, the present disclosure employs various machine learning (ML) based classification models 108 that are trained to make the prediction accurately. [Figure 6]
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION (See section 10, rule 13)
“METHOD AND SYSTEM FOR IDENTIFYING PROSPECTIVE BILL
DEFAULTERS”
Name and address of the applicant:
a) Name: MASTEK LTD.
b) Nationality: Indian
c) Address: Mastek Millennium Center, A-7, Millennium Business Park, Off Thane Belapur Road, Mahape, Navi Mumbai - 400 710, India
PREAMBLE TO THE DESCRIPTION
The following specification particularly describes the invention and the manner in which it is to be performed:
TECHNICAL FIELD
The disclosure generally relates to a method and a system for predicting potential bill payment defaulters.
BACKGROUND OF THE INVENTION
The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
Organizations/businesses/companies, including but not limited to power distribution companies and telecom companies, typically operate on an invoicing system and hence often face late payment and/or non-payment of the bills/invoices for the services provided to their customers/consumers. These defaults may be detrimental to the profit margin of the organization/business/company. In particular, once a customer has missed the due date for making the payment, the company has limited options to notify the customer and encourage them to pay the bill. These options generally comprise issuing dunning notices to the customer over a certain period after the actual due date.
However, even after several reminders (or dunning notices) and penalties, there exists a set of customers that default on paying their bills. Taking harsh actions, such as discontinuing the services of each and every defaulter, would negatively impact the customer base and, therefore, the well-being of the company. Also, there may exist another set of customers that pay their bills only after the due date has expired. In such a scenario, the P&L (profit and loss) of the company/business/organization becomes imbalanced, as there is no visibility on when such customers would actually pay their bills.
Therefore, it is imperative to be proactive and prevent such late payments/defaulting scenarios by understanding patterns of default and working towards reducing the value of defaults and eventually preventing them.
However, one major challenge that may arise in preventing such defaults is that generally such organizations/businesses/companies have a huge customer/consumer base and therefore, working towards preventing defaults requires analysis of a large amount of data associated with the huge customer/consumer base.
Further, it is also noted that certain techniques previously used to identify defaulters have produced false negatives and false positives, leading to misleading conclusions. Hence, such techniques do not serve the purpose of reducing defaults and their monetary value.
There is, therefore, a need for robust techniques that overcome the above-mentioned challenges and help predict/identify potential defaulters so as to proactively reduce and prevent payment defaults.
SUMMARY OF THE INVENTION
The present disclosure overcomes one or more shortcomings of the prior art and provides additional advantages. Embodiments and aspects of the disclosure described in detail herein are considered a part of the claimed disclosure.
In one non-limiting embodiment of the present disclosure, a method for identifying prospective bill defaulters is disclosed. The method comprises receiving data associated with one or more customers, wherein the data comprises at least customer's business information, billing and payment details and customer's demographic details. The method further comprises pre-processing the received data to extract information based on a plurality of parameters corresponding to bill payment history and business type. The method further comprises segmenting, based on the pre-processed data, each customer of the one or more customers into a cluster of a plurality of predefined clusters. Further, the method comprises processing, using distinct pre-trained machine learning based classification models, the pre-processed data in each cluster to identify at least one prospective bill defaulting customer.
In another non-limiting embodiment of the present disclosure, the distinct machine learning based classification models are trained by receiving data associated with a predetermined set of customers, wherein the data comprises at least customer's business information, billing and payment details and customer's demographic details.
Training the distinct machine learning based classification models further comprises pre-processing the received data to extract information based on a plurality of parameters corresponding to bill payment history and business type. Training the distinct machine learning based classification models further comprises segmenting, based on the pre-processed data, each customer of the predetermined set of customers into a cluster of a plurality of clusters, wherein the plurality of clusters are defined based on at least average bill per month and payment history. Training the distinct machine learning based classification models further comprises determining a plurality of features for each cluster, wherein each feature is associated with a corresponding importance value indicating a level of importance in predicting a prospective bill defaulter in the cluster. Training the distinct machine learning based classification models further comprises predicting, based on the plurality of features, one or more prospective bill defaulting customers from each cluster.
In yet another non-limiting embodiment of the present disclosure, training the distinct machine learning based classification models further comprises determining an accuracy of prediction for each of the trained ML based classification models and updating one or more of the trained ML based classification models on determining that the accuracy of prediction is below a threshold value.
In yet another non-limiting embodiment of the present disclosure, the plurality of clusters (106a-106d) comprises at least a Moderate Usage - Frequent Defaulters cluster, a Moderate Usage - Regular Payers cluster, a Heavy Usage - Frequent Defaulters cluster and a Heavy Usage - Regular Payers cluster.
In yet another non-limiting embodiment of the present disclosure, the plurality of features comprises at least age of customer, Last Year Same Month (LYSM) data, current bill amount, average bill of customer, minimum bill amount, billing month, relationship of customer, maximum bill amount, postal code, type of premise of customer and Industry Type (IT).
In yet another non-limiting embodiment of the present disclosure, a system to identify prospective bill defaulters is disclosed. The system comprises a memory and a processor operatively coupled to the memory. The processor is configured to receive data associated with one or more customers, wherein the data comprises at least customer's business information, billing and payment details and customer's demographic details. The processor is further configured to pre-process the received data to extract information based on a plurality of parameters corresponding to bill payment history and business type. The processor is further configured to segment, based on the pre-processed data, each customer of the one or more customers into a cluster of a plurality of predefined clusters. The processor is further configured to process, using distinct pre-trained machine learning based classification models, the pre-processed data in each cluster to identify at least one prospective bill defaulting customer.
In yet another embodiment of the present disclosure, to train the distinct machine learning based classification models, the processor is configured to receive data associated with a predetermined set of customers, wherein the data comprises at least customer's business information, billing and payment details and customer's demographic details. The processor is further configured to pre-process the received data to extract information based on a plurality of parameters corresponding to bill payment history and business type. The processor is further configured to segment, based on the pre-processed data, each customer of the predetermined set of customers into a cluster of a plurality of clusters, wherein the plurality of clusters are defined based on at least average bill per month and payment history. The processor is further configured to determine a plurality of features for each cluster, wherein each feature is associated with a corresponding importance value indicating a level of importance in predicting a prospective bill defaulter in the cluster. The processor is further configured to predict, based on the plurality of features, one or more prospective bill defaulting customers from each cluster.
In yet another non-limiting embodiment of the present disclosure, the processor is further configured to determine an accuracy of prediction for each of the trained ML based classification models and to update one or more of the trained ML based classification models on determining that the accuracy of prediction is below a threshold value.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
The features, nature, and advantages of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. Some embodiments of systems and/or methods in accordance with embodiments of the present subject matter are now described, by way of example only, and with reference to the accompanying Figs., in which:
Figure 1 depicts an exemplary environment 100 for identifying prospective bill defaulters in accordance with an embodiment of the present disclosure;
Figure 2 depicts a block diagram 200 of a system to identify prospective bill defaulters in accordance with an embodiment of the present disclosure;
Figures 3A-3D illustrate instances 300A-300D of descriptive analysis in accordance with an embodiment of the present disclosure;
Figure 4 depicts a graphical representation 400 of a plurality of clusters in accordance with an embodiment of the present disclosure;
Figure 5 depicts tables 500 comprising a plurality of features with their corresponding importance values in accordance with an embodiment of the present disclosure;
Figure 6 depicts a flowchart 600 of an exemplary method for identifying prospective bill defaulters in accordance with an embodiment of the present disclosure; and
Figure 7 depicts a flowchart 700 of an exemplary method for training machine learning based classification models in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
The foregoing has broadly outlined the features and technical advantages of the present disclosure in order that the detailed description of the disclosure that follows may be better understood. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure.
The novel features which are believed to be characteristic of the disclosure, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present disclosure.
The present disclosure provides a system and a method capable of efficiently analysing data associated with a large customer base, identifying defaulting patterns and, based on the identified patterns, accurately predicting the customers that are likely to default on their upcoming bill payments. In particular, the present disclosure employs various machine learning (ML) models that are trained to make the prediction accurately. A detailed explanation of the proposed solution is provided in the forthcoming paragraphs.
It may be noted that the techniques of the present disclosure can be applied to any sector, including but not limited to credit card bill payments, telephone/mobile bill payments, government tax payments, subscription payments, water bill payments, insurance payments, loan repayments, education/tuition fees, service fees (such as doctors' bills, lawyers' bills, etc.), monthly instalment schemes and many more.
For the sake of simplicity, and without limitation, the techniques of the present disclosure are explained by way of an example of electricity bill payment. It may, however, be appreciated by a skilled person that the example is provided only for the sake of explanation and does not restrict the scope of the invention, and that the techniques of the present disclosure may be applied to various industries, use cases, applications or businesses that involve bill payments.
Further, the terms like subscribers, consumers and customers are interchangeably used in the present disclosure representing an end user of the services/products.
Now, as described above, the present disclosure employs various machine learning models to identify prospective bill defaulters or predict a likelihood of a customer to default on their upcoming bill payment. However, in order to enable the machine learning models to make the prediction, it is first necessary to train the machine learning models to accurately make said prediction. Further, for training the machine learning models, it is imperative to first select a dataset.
In one embodiment, the dataset may comprise a predetermined set of customers 102 chosen from a wider pool of customers of a power distribution company. The predetermined set of customers 102 may be chosen based on city/region and for a predefined time interval. For instance, for a power distribution company that offers its services in various cities/states in India, the predetermined set of customers 102 may be chosen as those whose billing address is in Mumbai and who have been active for the past 2 years. This scenario is depicted in Figure 1, which shows the selection of a predetermined set of customers 102 from a wider pool of customers. In another embodiment, additional criteria may also be applied while choosing the predetermined set of customers 102. For instance, in addition to city/region, the predetermined set of customers 102 may be chosen according to their size of business.
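The selection step above amounts to a filter over the wider pool of customers. The following is a minimal sketch under stated assumptions: the `Customer` record, its field names and the helpers `years_before` and `select_predetermined_set` are hypothetical, and "active for the past 2 years" is interpreted here as an activation date at least two years before a reference date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Customer:
    # Hypothetical minimal record; the field names are illustrative only.
    customer_id: str
    billing_city: str
    active_since: date
    business_size: Optional[str] = None


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def select_predetermined_set(
    customers: Iterable[Customer],
    city: str,
    as_of: date,
    min_active_years: int = 2,
    business_size: Optional[str] = None,
) -> List[Customer]:
    """Keep customers billed in `city` who were already active
    `min_active_years` before `as_of`, optionally of one business size."""
    cutoff = years_before(as_of, min_active_years)
    return [
        c
        for c in customers
        if c.billing_city == city
        and c.active_since <= cutoff
        and (business_size is None or c.business_size == business_size)
    ]
```

The optional `business_size` argument mirrors the additional selection criterion mentioned in the text; further criteria could be added as extra conditions in the same filter.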
Upon selecting the predetermined set of customers 102, data related to the predetermined set of customers needs to be obtained. To explain how and what data pertaining to the predetermined set of customers 102 is obtained, it is imperative to introduce a system that is configured with essential components to not just obtain the data but also perform additional functionalities to achieve the object of the invention.
In view of the above, said system is illustrated in Figure 2. The system 202 may comprise an I/O interface 204, a processor 206, a memory 208 and units 210, but is not limited thereto. The processor 206 may be operatively coupled to the I/O interface 204, the memory 208 and the units 210. In one implementation, the units 210 may comprise a receiving unit 212, a pre-processing unit 214, a segmentation unit 216, a modelling unit 218 and an identification unit 220. The units 212, 214, 216, 218 and 220 are operatively and communicatively connected with each other. According to embodiments of the present disclosure, the units 212, 214, 216, 218 and 220 may comprise hardware components such as a processor, microprocessor, microcontroller or application-specific integrated circuit for performing various operations of the system 202, software modules, or a combination of hardware and software. It will be understood by a person skilled in the art that, in one embodiment, the processor 206 alone may perform all the functions of the units 212, 214, 216, 218 and 220 to achieve the desired objective of the invention. Further, those skilled in the art will also appreciate that, in another embodiment, the processor 206 may work in conjunction with the units 212, 214, 216, 218 and 220 to achieve the desired objective of the invention.
Further, in one implementation, the processor 206 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 206 may be configured to fetch and execute computer-readable instructions stored in the memory 208. The I/O interface 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like.
Now, as described above, once the predetermined set of customers 102 has been selected, data 104 related to each customer of the predetermined set of customers 102 needs to be obtained. To achieve this, in one embodiment, the processor 206 may be used. Alternatively, in another embodiment, the processor 206 may enable the receiving unit 212 to receive said data 104. Further, the data 104 may comprise customer's business information, billing and payment details and customer's demographic details. The customer's business information may comprise industry type, number of locations (own and rental), maximum area of the site, staff strength, listing details, social media RF sentiment and the like. Further, the billing and payment details may comprise due date, date of receiving the bill, preferred payment avenue, prepayment, billing address, bill distribution agency, installation type, premise type and the like. Furthermore, the customer's demographic details may comprise location of residence, age, gender, marital status, highest qualification, social media RF sentiment, religion, family details, CIBIL score and the like. Additionally, the processor 206 and/or the receiving unit 212 may receive said data 104 for the predetermined set of customers 102 for the predefined time interval. For instance, the processor 206 and/or the receiving unit 212 may receive the last two years' data 104 for each customer of the predetermined set of customers 102.
Moving on, since the received data 104 may be very large, discontinuous and erratic, it is important to pre-process the received data 104 in order to remove any discontinuities, fill data gaps and extract relevant information from it. In one embodiment, the relevant information that needs to be extracted from the received data 104 may comprise at least information pertaining to customer's name, customer's business type, monthly bill amount, due date, date of receiving the bill, billing address, bill distribution agency, payment details comprising date of making payment, an average bill of each customer over the predefined time interval, and the like. In another embodiment, the relevant information may comprise payment details correlated with bills and demographic details. The pre-processing of the received data 104 may be implemented either by the processor 206 or by the pre-processing unit 214 enabled by the processor 206. Further, the processor 206 and/or the pre-processing unit 214 may employ data transformation and data imputation along with various statistical methods to pre-process the data. In one embodiment, pre-processing the data 104 may prove beneficial in performing descriptive analysis, which may further aid in identifying various patterns related to bill-payment defaulting for the predetermined set of customers 102 and hence in identifying parameters that would help in identifying prospective defaulters.
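As one illustration of the gap-filling and extraction described above, the sketch below imputes missing monthly bills and derives a few per-customer parameters. It is a minimal example only: the specification does not state which imputation method is used, so forward-filling with a median fallback for leading gaps is an assumption, and the helper names `impute_monthly_bills` and `summarise_customer` are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Optional


def impute_monthly_bills(bills: List[Optional[float]]) -> List[float]:
    """Fill missing months (None) by carrying the last observed bill
    forward; gaps before the first observation use the median bill."""
    observed = [b for b in bills if b is not None]
    if not observed:
        raise ValueError("no observed bills to impute from")
    fallback = median(observed)
    filled: List[float] = []
    last: Optional[float] = None
    for b in bills:
        if b is not None:
            last = b
        filled.append(last if last is not None else fallback)
    return filled


def summarise_customer(bills: List[Optional[float]]) -> Dict[str, float]:
    """Derive a few bill-related parameters from one customer's
    monthly history (oldest month first) after imputation."""
    filled = impute_monthly_bills(bills)
    return {
        "average_bill": mean(filled),
        "minimum_bill": min(filled),
        "maximum_bill": max(filled),
        "current_bill": filled[-1],
    }
```

For example, the history `[None, 2000.0, None, 3000.0]` is filled to `[2500.0, 2000.0, 2000.0, 3000.0]`, giving an average bill of 2375.0.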
The descriptive analysis on the pre-processed data may be performed either by the processor 206 or by one of the units 210 enabled by the processor 206. Various instances of the descriptive analytics performed on the pre-processed data containing information such as bill payment history, billing address (or postal code), payment avenue, etc. are illustrated in Figures 3A-3D. Figure 3A depicts a default trend for the predetermined set of customers 102 over the past 2 years. As seen from Figure 3A, around 33% of the customers have never defaulted in the last two years, while a large number of customers have defaulted more than 3 times in the same period. Further, Figure 3B illustrates a heat map depicting default value variation across various postal codes of the Mumbai region. It can be readily observed from Figure 3B that there are certain areas in Mumbai where the tendency of customers to default and the dunning amount are very large, thereby indicating that "postal code" may be a crucial factor in identifying a prospective bill defaulter. Moving on, Figure 3C illustrates a heat map depicting loss due to payment default by preferred payment avenue, indicating that a payment default may not necessarily be due to the customer's fault but due to certain issues faced while making the bill payment through their preferred payment avenue. Lastly, Figure 3D depicts a month-wise pattern for repeated defaults. In particular, Figure 3D employs upward pointing arrows, downward pointing arrows and horizontal arrows to depict an increase in the number of defaults, a decrease in the number of defaults and a negligible change in the number of defaults, respectively, for various months. Further, the information depicted by this pattern may additionally comprise the list of customers that have defaulted in each month and hence may also help in identifying a seasonal trend in defaulting for various customers. For instance, consider a customer X who actively uses the services of the power distribution company and owns a woolen garment manufacturing business. From the knowledge of the monthly payment pattern and his/her business type, it may be observed that customer X generally defaults on his/her bill payments during the months of June and July, which may be considered a slack period for his/her business. In another example, a customer Y who owns a jewelry shop is generally observed to default on his/her payments during the festive months of October and November, thereby providing an understanding of a seasonal trend of defaulting for the predetermined set of customers 102.
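Two of the descriptive statistics discussed above, the share of customers who never defaulted (Figure 3A) and month-wise default counts used to spot seasonal behaviour (Figure 3D), can be computed as sketched below. This is a hedged illustration only: the input layout (a mapping from a customer id to the calendar months of that customer's defaults) and the helper names `default_summary` and `repeated_default_months` are assumptions, not part of the specification.

```python
from collections import Counter
from typing import Dict, List, Tuple


def default_summary(defaults: Dict[str, List[int]]) -> Tuple[float, Counter]:
    """`defaults` maps a customer id to the calendar months (1-12) in which
    that customer defaulted during the observation window. Returns the
    share of customers that never defaulted and defaults per month."""
    if not defaults:
        raise ValueError("no customers supplied")
    never = sum(1 for months in defaults.values() if not months)
    per_month = Counter(m for months in defaults.values() for m in months)
    return never / len(defaults), per_month


def repeated_default_months(months: List[int], min_count: int = 2) -> List[int]:
    """Calendar months in which one customer defaulted at least `min_count`
    times, e.g. [6, 7] for a business with a June-July slack period."""
    counts = Counter(months)
    return sorted(m for m, n in counts.items() if n >= min_count)
```

With two years of data, a customer like X who misses the June and July bills in both years would have the months `[6, 7, 6, 7]`, and `repeated_default_months` would return `[6, 7]`.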
Upon performing the descriptive analysis, one may obtain a better sense of the factors that may play an important role in identifying the prospective bill defaulters and may therefore, help in coming up with robust machine learning models.
MODELLING METHODOLOGY-
Now, the pre-processed data associated with each customer of the predetermined set of customers 102 may be used for training various machine learning models to predict/identify prospective defaulters. The modelling methodology may be implemented in various steps including but not limited to -
1) Customer segmentation using non-hierarchical clustering
2) Model development and training
3) Forecast propensity of default for each customer in test data
4) Measure accuracy of the model
Each of the steps in the modelling methodology is explained in detail in the upcoming paragraphs -
CUSTOMER SEGMENTATION
The pre-processed data for each customer of the predetermined set of customers 102 may then be used by the processor 206 to identify the most crucial parameters that may help in assigning each customer to a category/cluster. To achieve this, the processor 206 may employ non-hierarchical clustering methods such as K-means clustering to first identify the parameters based on which the predetermined set of customers 102 may be clustered and to create a plurality of clusters 106a-106d. In one embodiment, based on the pre-processed data, the processor 206 may identify average bill per month and business type as the parameters to segment each customer into a cluster. In one implementation, the processor 206 may segment the predetermined set of customers 102 into the plurality of clusters 106a-106d. However, in another implementation, the processor 206 may enable the segmentation unit 216 to segment the predetermined set of customers 102 into the plurality of clusters 106a-106d. Further, as illustrated in Figure 1, the plurality of clusters 106a-106d may comprise -
• a cluster 106a associated with moderate usage - frequent defaulters comprising customers with a moderate average bill value and a high frequency of defaulting on bills.
• a cluster 106b associated with moderate usage - regular payers comprising customers with a moderate average bill value and a low frequency of defaulting on bills.
• a cluster 106c associated with heavy usage - frequent defaulters comprising customers with a large average bill value and a high frequency of defaulting on bills, and
• a cluster 106d associated with heavy usage - regular payers comprising customers with a large average bill value and a low frequency of defaulting on bills.
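The K-means segmentation referred to above can be sketched with the standard library alone. This is a minimal sketch, not the applicant's implementation: it assumes each customer is described by two numeric features (average monthly bill and number of defaults), z-scores them so that bill amounts do not dominate the distance, and uses a deterministic farthest-first initialisation in place of random seeding. The names `standardise`, `sq_dist` and `kmeans` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def standardise(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Z-score each column so that bill amounts (thousands of INR) do not
    swamp default counts in the distance computation."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's K-means with farthest-first initialisation.
    Returns the cluster index assigned to each point."""
    # Initialisation: start from the first point, then repeatedly add the
    # point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(mean(dim) for dim in zip(*members))
    return labels
```

For the four clusters 106a-106d, one would call `kmeans(standardise(rows), k=4)`; the mapping from cluster index to a label such as "Heavy usage - Frequent defaulters" would still have to be made by inspecting each cluster's centroid.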
The segmentation of the predetermined set of customers 102 is illustrated in table 1 (below) and figure 4 by way of an example for the power distribution company, where the predetermined set of customers 102 have been selected for the years 2016-2018.
Cluster | Frequency of defaulting | Average bill value (INR) | No. of customers
Moderate usage - Frequent defaulters (106a) | 8 | 3,700 | 1,23,566
Moderate usage - Regular payers (106b) | 1 | 2,400 | 3,09,886
Heavy usage - Frequent defaulters (106c) | 3 | 71,200 | 3,623
Heavy usage - Regular payers (106d) | 2 | 24,200 | 21,513
However, it must be appreciated by a skilled person that the data depicted in table 1 is merely exemplary and that the number of clusters may be less than or greater than the number of clusters described herein. Further, the plurality of clusters 106a-106d may be created in some other manner relevant to the disclosure. Furthermore, it may be noted that, according to an embodiment of the present disclosure, a customer may be considered a frequent defaulter if they have defaulted on their bill payments at least thrice in the last two years. However, depending upon the application, the number of defaults at which a customer is considered a frequent defaulter may be varied. Additionally, a customer may be considered a moderate usage customer if the average monthly bill falls between INR 2,000 and INR 5,000, and a heavy usage customer if the average monthly bill is greater than INR 5,000. However, a skilled person will note that said values are merely exemplary and may vary depending upon the industry, use case, application and business.
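The example thresholds above (frequent defaulter: at least three defaults in two years; moderate usage: average monthly bill of INR 2,000 to 5,000; heavy usage: above INR 5,000) can be written as a small labelling rule. The sketch below is illustrative only: the function name `assign_cluster` is hypothetical, the band boundaries are treated as inclusive for the moderate band, and customers below INR 2,000 return `None` because the text does not assign them a cluster. Applied to the average figures in table 1, reading the "Frequency of defaulting" column as the number of defaults in the two-year window (an assumption), it reproduces the table's cluster labels.

```python
from typing import Dict, Optional, Tuple

# Cluster reference numerals used in the description.
CLUSTER_IDS: Dict[Tuple[str, bool], str] = {
    ("moderate", True): "106a",   # Moderate usage - Frequent defaulters
    ("moderate", False): "106b",  # Moderate usage - Regular payers
    ("heavy", True): "106c",      # Heavy usage - Frequent defaulters
    ("heavy", False): "106d",     # Heavy usage - Regular payers
}


def assign_cluster(
    avg_monthly_bill_inr: float,
    defaults_last_two_years: int,
    frequent_min_defaults: int = 3,
    moderate_min: float = 2000.0,
    moderate_max: float = 5000.0,
) -> Optional[str]:
    """Label a customer using the exemplary thresholds from the text."""
    if moderate_min <= avg_monthly_bill_inr <= moderate_max:
        usage = "moderate"
    elif avg_monthly_bill_inr > moderate_max:
        usage = "heavy"
    else:
        return None  # below the moderate band: not covered by the example
    frequent = defaults_last_two_years >= frequent_min_defaults
    return CLUSTER_IDS[(usage, frequent)]
```

All thresholds are keyword parameters, in line with the note that they may vary by industry, use case, application and business.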
MODEL DEVELOPMENT & TRAINING -
Upon segmenting the predetermined set of customers 102 into the plurality of clusters 106a-106d, a machine learning (ML) based classification model is to be developed and trained on data present in each cluster. In one embodiment, distinct ML based classification models 108 may be developed for each cluster using various ML building platforms. Further, the distinct ML based classification models 108 may be based on linear models, tree-based models and neural network-based models.
Now, to develop the distinct ML based classification models 108 for each cluster, the processor 206 may determine a plurality of distinct features that may be relevant to the data present in each cluster and important in predicting the propensity of a customer (belonging to a cluster) to default on an upcoming bill payment. In one embodiment, the plurality of distinct features may comprise age of customer, current bill amount, average bill of customer, minimum bill amount, billing month, relationship of customer, maximum bill amount, postal code, type of premise of customer, Last Year Same Month (LYSM) data, default pattern in the last 30 days, 60 days, 90 days, etc., and Industry Type (IT).
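Two of the listed features, the recent-default pattern over 30, 60 and 90 days and the Last Year Same Month (LYSM) bill, can be derived as sketched below. This is a hedged sketch: the specification does not define how these features are encoded, so binary "defaulted within N days" flags and a lookup keyed by (year, month) are assumptions. The key `30_or_less` mirrors the attribute named for Figure 5; the 60- and 90-day key names and both helper names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def recency_flags(
    default_dates: List[date],
    as_of: date,
    windows: Tuple[int, ...] = (30, 60, 90),
) -> Dict[str, int]:
    """1 if the customer defaulted within the last N days of `as_of`, else 0."""
    flags: Dict[str, int] = {}
    for w in windows:
        hit = any(0 <= (as_of - d).days <= w for d in default_dates)
        flags[f"{w}_or_less"] = int(hit)
    return flags


def lysm_bill(
    bills: Dict[Tuple[int, int], float], year: int, month: int
) -> Optional[float]:
    """Last Year Same Month: the bill for the same calendar month one year
    earlier, or None if it is not on record."""
    return bills.get((year - 1, month))
```

For instance, a single default on 15 November 2018 evaluated on 31 December 2018 (46 days earlier) sets the 60- and 90-day flags but not `30_or_less`.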
Upon developing a distinct ML based classification model 108 for each cluster, the developed ML models 108 may be tested for stability and bias. Where defaulting customers form a minority of the training data, the class distribution is imbalanced, which can bias a model towards the majority class. To correct such an imbalance, the processor 206 may implement the Synthetic Minority Oversampling Technique (SMOTE), in which the minority class is oversampled by synthesizing new examples, each lying between an existing minority example and one of its nearest minority-class neighbours. In another embodiment, the minority class may instead be oversampled by simply duplicating examples from the minority class in the training dataset prior to fitting a model. Such duplication balances the class distribution but does not provide any additional information to the model.
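The difference between the two oversampling approaches described above can be seen in a short stdlib sketch. It is illustrative only: the function names `random_oversample` and `smote` are hypothetical, the SMOTE variant shown picks its base sample at random for each synthetic point, and a production system would more likely use an established library implementation.

```python
import random
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_oversample(minority: Sequence[Vector], n_new: int, seed: int = 0) -> List[Vector]:
    """Duplicate existing minority rows: balances the classes but adds no
    new information."""
    rng = random.Random(seed)
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """SMOTE-style synthesis: for a random minority sample, choose one of its
    k nearest minority neighbours and create a point on the segment between."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = max(1, min(k, len(minority) - 1))
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

Because each synthetic point is an interpolation between two real minority examples, the new rows stay inside the region already occupied by the minority class while differing from every existing row.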
Once an ML-based classification model is developed for a cluster, it may then be trained on the data included in said cluster to identify prospective customers that may default on their upcoming bill payment. To achieve this, the processor 206 may calculate an importance value for each of the plurality of features, as illustrated in Figure 5. For instance, with reference to Figure 5, table 502 depicts the importance value of each of the plurality of features for the cluster 106a corresponding to "Moderate usage-Frequent defaulters". From the table 502, it may be observed that the bill payer attribute "30_or_less", indicating whether a customer has defaulted in the last 30 days or less, has the greatest importance value, signifying that this attribute plays the most important role in predicting whether a customer Z belonging to cluster 106a will default on the upcoming bill payment. In another example, for the cluster 106b for moderate usage-regular payers, the bill payer attribute "current bill amount" has the highest importance value, signifying that the likelihood that a customer P belonging to this cluster would default on an upcoming payment depends significantly on the current bill amount. That is, the customer P may default on an upcoming payment if the current bill amount is greater than a certain value. Similar importance value tables 506 and 508 are also depicted in Figure 5, corresponding to clusters 106c and 106d respectively. Hence, based on the importance values of the plurality of features, the processor 206 trains the distinct ML based classification models 108 to identify/predict the prospective bill defaulters for an upcoming bill in each cluster.
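The specification does not say how the importance values in Figure 5 are computed. One common, model-agnostic way to obtain such a per-cluster ranking is permutation importance, sketched below as a swapped-in technique rather than the applicant's stated method: a feature's importance is the average drop in accuracy when that feature's column is shuffled. The names `accuracy`, `permutation_importance` and `rank_features` are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(model: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    return sum(int(model(x) == t) for x, t in zip(X, y)) / len(y)


def permutation_importance(
    model: Callable[[Row], int], X: List[Row], y: List[int],
    n_repeats: int = 5, seed: int = 0,
) -> List[float]:
    """Importance of feature f = mean drop in accuracy when column f is
    randomly shuffled while every other column is left intact."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)
            X_perm = [list(row[:f]) + [v] + list(row[f + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def rank_features(names: List[str], importances: List[float]) -> List[str]:
    """Feature names ordered from most to least important."""
    return [n for n, _ in sorted(zip(names, importances), key=lambda t: t[1], reverse=True)]
```

Running this separately on each cluster's trained model and held-out rows would yield one ranking per cluster, analogous in form to tables 502 to 508.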
It may further be noted by a skilled person that the development and training of the distinct ML based classification models 108 may be performed by the modelling unit 218 enabled by the processor 206.
FORECAST PROPENSITY OF DEFAULT FOR EACH CUSTOMER IN A TEST DATA – DETERMINE ACCURACY
Moving on, the developed and trained distinct ML based classification models 108 may be applied to test data by the processor 206 and/or the identification unit 220 in order to determine their accuracy in forecasting/predicting the propensity of default for a customer. It may be appreciated by a skilled person that the test data may be entirely different from the pre-processed data associated with the predetermined set of customers 102 that is used as training data. In one embodiment, the accuracy of a trained ML-based classification model may be determined in terms of an F1 score and an accuracy percentage.
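Both metrics follow from the confusion counts on the test data. Treating "defaults on the upcoming bill" as the positive class, precision is P = TP/(TP+FP), recall is R = TP/(TP+FN), F1 = 2PR/(P+R), and the accuracy percentage is 100 x (correct predictions)/(all predictions). The sketch below computes them; the function name `f1_and_accuracy` is hypothetical. Accuracy counts both classes equally, whereas F1 focuses on the defaulter class, which is why the two are usually reported together when classes are imbalanced.

```python
from typing import Sequence, Tuple


def f1_and_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (F1 score, accuracy percentage); label 1 = customer defaults."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, 100.0 * correct / len(pairs)
```

For ten test customers with 3 true positives, 1 false positive, 1 false negative and 5 true negatives, precision and recall are both 0.75, so the function returns an F1 score of 0.75 and an accuracy of 80.0%.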
The F1 scores and accuracy percentages of the trained ML based classification models 108 for each cluster are tabulated in table 2:

| Cluster | F1 score | Accuracy (%) |
|---|---|---|
| 106a | 0.86 | 85 |
| 106b | 0.84 | 72 |
| 106c | 0.68 | 79 |
| 106d | 0.86 | 86 |
From table 2, it is evident that the trained ML based classification models 108 achieve F1 scores between 0.68 and 0.86 and accuracy percentages between 72% and 86% across the clusters, and can therefore be used to predict, in real time, the propensity of a customer to default on an upcoming bill payment.
REAL-TIME IMPLEMENTATION OF THE TRAINED DISTINCT ML BASED CLASSIFICATION MODELS 108 –
Moving ahead, the trained distinct ML based classification models 108 may be implemented in real-time to identify customers that are likely to default on an upcoming bill payment.
To achieve this, in one embodiment, the processor 206 may implement the following steps –
1) receive data comprising the customer’s business information, billing and payment details and demographic details for each of the one or more customers;
2) pre-process the received data to remove any discontinuities, fill data gaps and extract relevant information from it;
3) segment each of the one or more customers into a cluster of the plurality of clusters 106a-106d; and
4) apply, for each cluster in which the one or more customers are segmented, the corresponding trained ML-based classification model 108 to identify at least one customer of the one or more customers who is likely to default on an upcoming bill payment.
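The four real-time steps above may be sketched, under stated assumptions, as the following Python pipeline. The gap-filling rule, the cut-offs `HEAVY_USAGE_AVG_BILL` and `FREQUENT_DEFAULTS`, the per-cluster stand-in models and all customer data are hypothetical. The cluster labels mirror the four clusters named in the claims, with "moderate_frequent" and "moderate_regular" corresponding to clusters 106a and 106b respectively.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

# Hypothetical cut-offs (assumptions, not taken from the specification).
HEAVY_USAGE_AVG_BILL = 5000.0  # average monthly bill at or above which usage is "heavy"
FREQUENT_DEFAULTS = 3          # past defaults at or above which a customer is a "frequent defaulter"

Model = Callable[[Dict], float]  # returns a default probability for one customer


def preprocess(record: Dict) -> Dict:
    """Step 2: fill missing monthly bills with the mean of the known bills."""
    bills: List[Optional[float]] = record["monthly_bills"]
    known = [b for b in bills if b is not None]
    fill = mean(known) if known else 0.0
    filled = [fill if b is None else b for b in bills]
    return {**record, "monthly_bills": filled,
            "avg_bill": mean(filled) if filled else 0.0}


def segment(record: Dict) -> str:
    """Step 3: assign one of four clusters from average bill and payment history."""
    usage = "heavy" if record["avg_bill"] >= HEAVY_USAGE_AVG_BILL else "moderate"
    payer = "frequent" if record["default_count"] >= FREQUENT_DEFAULTS else "regular"
    return f"{usage}_{payer}"


def identify_defaulters(records: List[Dict], models: Dict[str, Model],
                        threshold: float = 0.5) -> List[str]:
    """Steps 1-4: pre-process, segment and score each customer with its cluster's model."""
    flagged = []
    for raw in records:
        record = preprocess(raw)
        if models[segment(record)](record) >= threshold:
            flagged.append(record["customer_id"])
    return flagged


# Hypothetical stand-in models, one per cluster; in practice these would be
# the trained ML based classification models 108.
models: Dict[str, Model] = {
    "moderate_frequent": lambda r: 0.9 if r["defaulted_last_30_days"] else 0.4,
    "moderate_regular": lambda r: 0.7 if r["monthly_bills"][-1] > 1.5 * r["avg_bill"] else 0.1,
    "heavy_frequent": lambda r: 0.8,
    "heavy_regular": lambda r: 0.2,
}

customers = [
    {"customer_id": "A", "monthly_bills": [1000.0, None, 1200.0],
     "default_count": 4, "defaulted_last_30_days": True},
    {"customer_id": "B", "monthly_bills": [800.0, 900.0, 3000.0],
     "default_count": 0, "defaulted_last_30_days": False},
    {"customer_id": "C", "monthly_bills": [6000.0, 6500.0, 7000.0],
     "default_count": 1, "defaulted_last_30_days": False},
]

print(identify_defaulters(customers, models))  # ['A', 'B']
```

Keeping one model per cluster means that each customer is scored only by the model trained on customers with a similar usage and payment profile.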
In another embodiment, the processor 206 may instead enable different units, such as the receiving unit 212, the pre-processing unit 214, the segmentation unit 216 and the identification unit 220, to perform the above steps.
Further, the trained ML based classification models 108 may be updated continuously to improve their accuracy as and when they are used for real-time prediction.
Figure 6 illustrates a flowchart 600 of an exemplary method for identifying prospective bill defaulters in accordance with an embodiment of the present disclosure. The method 600 may also be described in the general context of computer executable instructions. Generally, computer executable instructions may include routines, programs, objects, components, data structures, procedures, modules, and functions, which perform specific functions or implement specific abstract data types.
The order in which the method 600 is described is not intended to be construed as a limitation, and any number of the described method blocks may be combined in any order to implement the method. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described.
At step 602, the method 600 may include receiving data associated with one or more customers. In one embodiment, the data comprises at least customer’s business information, billing and payment details and customer’s demographic details.
At step 604, the method 600 may include pre-processing the received data to extract information based on a plurality of parameters corresponding to bill payment history and business type.
At step 606, the method 600 may include segmenting, based on the pre-processed data, each customer of the one or more customers into a cluster of a plurality of predefined clusters 106a-106d.
At step 608, the method 600 may include processing, using distinct pre-trained machine learning based classification models 108, the pre-processed data in each cluster to identify at least one prospective bill defaulting customer.
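The extraction of bill-payment-history parameters at step 604 may be illustrated by the following sketch. The record layout, the rule for treating a bill as defaulted and the reading of the "30_or_less" attribute as "a default falling due within the last 30 days" are assumptions; the feature names follow those listed in the claims, and the billing history is hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional


def extract_features(bills: List[Dict], as_of: date) -> Dict[str, object]:
    """Derive bill-payment-history parameters from raw billing records.

    `bills` is assumed to be non-empty and in chronological order, each record
    shaped like {"due": date, "paid": date or None, "amount": float}.
    """
    amounts = [b["amount"] for b in bills]
    # A bill is treated as defaulted if it was paid after its due date, or if
    # it is still unpaid and its due date has already passed on `as_of`.
    default_due_dates = [
        b["due"] for b in bills
        if (b["paid"] is None and b["due"] < as_of)
        or (b["paid"] is not None and b["paid"] > b["due"])
    ]
    latest_default: Optional[date] = max(default_due_dates) if default_due_dates else None
    return {
        "current bill amount": amounts[-1],
        "average bill of customer": sum(amounts) / len(amounts),
        "minimum bill amount": min(amounts),
        "maximum bill amount": max(amounts),
        # Assumed reading of "30_or_less": a default falling due within the last 30 days.
        "30_or_less": latest_default is not None and (as_of - latest_default).days <= 30,
    }


# Hypothetical billing history for a single customer.
history = [
    {"due": date(2023, 1, 10), "paid": date(2023, 1, 8), "amount": 1200.0},
    {"due": date(2023, 2, 10), "paid": date(2023, 2, 25), "amount": 1500.0},
    {"due": date(2023, 3, 10), "paid": None, "amount": 1800.0},
]
print(extract_features(history, as_of=date(2023, 3, 5)))
```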
Figure 7 illustrates a flowchart 700 of an exemplary method for training distinct machine learning based classification models in accordance with an embodiment of the present disclosure. The method 700 may also be described in the general context of computer executable instructions. Generally, computer executable instructions may include routines, programs, objects, components, data structures, procedures, modules, and functions, which perform specific functions or implement specific abstract data types.
The order in which the method 700 is described is not intended to be construed as a limitation, and any number of the described method blocks may be combined in any order to implement the method. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described.
At step 702, the method 700 may include receiving data 104 associated with a predetermined set of customers 102. In one embodiment, the data 104 comprises at least customer’s business information, billing and payment details and customer’s demographic details.
At step 704, the method 700 may include pre-processing the received data 104 to extract information based on a plurality of parameters corresponding to bill payment history and business type.
At step 706, the method 700 may include segmenting, based on the pre-processed data, each customer of the predetermined set of customers 102 into a cluster of a plurality of clusters 106a-106d. In one embodiment, the plurality of clusters 106a-106d are defined based on at least average bill per month and payment history.
At step 708, the method 700 may include determining a plurality of features for each cluster. In one embodiment, each feature is associated with a corresponding importance value indicating a level of importance in predicting a prospective bill defaulter in the cluster.
At step 710, the method 700 may include predicting, based on the plurality of features, one or more prospective bill defaulting customers from each cluster.
At step 712, the method 700 may include determining an accuracy of prediction for each of the trained ML based classification models 108.
At step 714, the method 700 may include updating one or more trained ML based classification models 108 on determining that the accuracy of prediction is below a threshold value.
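Steps 712 and 714 may be sketched as a threshold check over the per-cluster accuracies. The accuracy percentages below are those of table 2; the threshold of 80% and the `retrain` callback are hypothetical, since the specification does not fix either.

```python
from typing import Callable, Dict, List


def update_underperforming(accuracies: Dict[str, float], threshold: float,
                           retrain: Callable[[str], float]) -> List[str]:
    """Retrain every cluster model whose accuracy is below `threshold`.

    `retrain` is a hypothetical callback that retrains the model for a cluster
    and returns its new accuracy; `accuracies` is updated in place.
    Returns the clusters whose models were updated.
    """
    updated = []
    for cluster, accuracy in sorted(accuracies.items()):
        if accuracy < threshold:
            accuracies[cluster] = retrain(cluster)
            updated.append(cluster)
    return updated


# Accuracy percentages from table 2; the threshold and the stub retrain are assumptions.
table_2_accuracy = {"106a": 85.0, "106b": 72.0, "106c": 79.0, "106d": 86.0}
print(update_underperforming(table_2_accuracy, threshold=80.0, retrain=lambda cluster: 81.0))
# ['106b', '106c']
```

Iterating over a sorted copy of the items allows the accuracy values to be overwritten safely while the loop runs.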
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.
Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., to be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, non-volatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
Suitable processors include, by way of example, a general-purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a graphic processing unit (GPU), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC), and/or a state machine.
REFERENCE NUMERALS
[0077] Exemplary environment 100
[0078] Predetermined set of customers 102
[0079] data 104
[0080] Plurality of clusters 106a-106d
[0081] Learning models 108
[0082] Block diagram 200
[0083] System 202
[0084] I/O Interface 204
[0085] Processor 206
[0086] Memory 208
[0087] Units 210
[0088] Receiving unit 212
[0089] Pre-processing unit 214
[0090] Segmentation unit 216
[0091] Modelling unit 218
[0092] Identification unit 220
[0093] Instances of descriptive analysis 300A-300D
[0094] Graphical Representation of clusters 400
[0095] Features with corresponding importance value tables 502-508
[0096] Method 600
[0097] Method steps 602-608
[0098] Method 700
[0099] Method steps 702-714
WE CLAIM:
1. A method for identifying prospective bill defaulters, the method comprising:
receiving (602) data associated with one or more customers, wherein the data comprises at least customer’s business information, billing and payment details and customer’s demographic details;
pre-processing (604) the received data to extract information based on a plurality of parameters corresponding to bill payment history and business type;
segmenting (606), based on the pre-processed data, each customer of the one or more customers into a cluster of a plurality of predefined clusters; and
processing (608), using distinct pre-trained machine learning based classification models (108), the pre-processed data in each cluster to identify at least one prospective bill defaulting customer.
2. The method as claimed in claim 1, wherein the distinct machine learning based classification models (108) are trained by:
receiving (702) data (104) associated with a predetermined set of customers (102), wherein the data (104) comprises at least customer’s business information, billing and payment details and customer’s demographic details;
pre-processing (704) the received data (104) to extract information based on a plurality of parameters corresponding to bill payment history and business type;
segmenting (706), based on the pre-processed data, each customer of the predetermined set of customers (102) into a cluster of a plurality of clusters (106a-106d), wherein the plurality of clusters (106a-106d) are defined based on at least average bill per month and payment history;
determining (708) a plurality of features for each cluster, wherein each feature is associated with a corresponding importance value indicating a level of importance in predicting a prospective bill defaulter in the cluster; and
predicting (710) based on the plurality of features, one or more prospective bill defaulting customers from each cluster.
3. The method as claimed in claim 2, further comprising:
determining (712) an accuracy of prediction for each of the trained ML based classification models (108); and
updating (714) one or more trained ML based classification models (108) on determining that the accuracy of prediction is below a threshold value.
4. The method as claimed in claim 1, wherein the plurality of clusters (106a-106d) at least comprises a Moderate Usage - Frequent Defaulters cluster, a Moderate Usage - Regular Payers cluster, a Heavy Usage - Frequent Defaulters cluster and a Heavy Usage - Regular Payers cluster.
5. The method as claimed in claim 1, wherein the plurality of features at least comprises age of customer, Last Year Same Month (LYSM) data, current bill amount, average bill of customer, minimum bill amount, billing month, relationship of customer, maximum bill amount, postal code, type of premise of customer and Industry type (IT).
6. A system to identify prospective bill defaulters, the system comprising:
a memory (208); and
a processor (206) operatively coupled to the memory (208), wherein the processor (206) is configured to:
receive data associated with one or more customers, wherein the data comprises at least customer’s business information, billing and payment details and customer’s demographic details;
pre-process the received data to extract information based on a plurality of parameters corresponding to bill payment history and business type;
segment, based on the pre-processed data, each customer of the one or more customers into a cluster of a plurality of predefined clusters; and
process, using distinct pre-trained machine learning based classification models (108), the pre-processed data in each cluster to identify at least one prospective bill defaulting customer.
7. The system as claimed in claim 6, wherein to train the distinct machine learning based classification models (108), the processor (206) is configured to:
receive data (104) associated with a predetermined set of customers (102), wherein the data (104) comprises at least customer’s business information, billing and payment details and customer’s demographic details;
pre-process the received data (104) to extract information based on a plurality of parameters corresponding to bill payment history and business type;
segment, based on the pre-processed data, each customer of the predetermined set of customers (102) into a cluster of a plurality of clusters (106a-106d), wherein the plurality of clusters (106a-106d) are defined based on at least average bill per month and payment history;
determine a plurality of features for each cluster, wherein each feature is associated with a corresponding importance value indicating a level of importance in predicting a prospective bill defaulter in the cluster; and
predict based on the plurality of features, one or more prospective bill defaulting customers from each cluster.
8. The system as claimed in claim 7, wherein the processor (206) is further configured to:
determine an accuracy of prediction for each of the trained machine learning based classification models; and
update one or more trained ML based classification models on determining that the accuracy of prediction is below a threshold value.
9. The system as claimed in claim 6, wherein the plurality of clusters (106a-106d) at least comprises a Moderate Usage - Frequent Defaulters cluster, a Moderate Usage - Regular Payers cluster, a Heavy Usage - Frequent Defaulters cluster and a Heavy Usage - Regular Payers cluster.
10. The system as claimed in claim 6, wherein the plurality of features at least comprises age of customer, Last Year Same Month (LYSM) data, current bill amount, average bill of customer, minimum bill amount, billing month, relationship of customer, maximum bill amount, postal code, type of premise of customer and Industry type (IT).