A System Of Deep Learning Customer Churn Prediction Model In E Commerce Using Data Mining

Abstract: E-commerce is the buying and selling of goods and services electronically over the internet. It also involves the transfer of data and money to execute these transactions and services. In today’s business world, companies need an effective customer churn prediction model to identify customers at risk of churn. They use the prediction results to extend offers to those customers and take remedial action to retain them. Adding new customers to any business is very costly and time consuming. Although technological innovation has improved service, e-commerce companies still face the risk of customer churn. As the customer churn rate is influenced by many features, predicting customer churn manually is very difficult and sometimes almost impossible because of the complexity of the features and the large size of the database. In this invention, an Adam deep learning model is proposed on the benchmarked Brazilian e-commerce dataset. The proposed Adam deep learning model has been validated by comparison with churn prediction models based on machine learning approaches such as support vector machine and neural network. The forecasting capabilities of the models are tested with evaluation parameters such as recall, specificity, prediction accuracy, negative predictive value and positive predictive value. It has been concluded that the Adam deep learning model shows higher prediction accuracy, recall, specificity, positive predictive value and negative predictive value than state-of-the-art machine learning based churn prediction models. This work presents a stable Adam deep learning model that e-commerce companies can use to predict churning customers and plan timely remedial action to retain them.


Patent Information

Application #

Filing Date

08 December 2021

Publication Number

51/2021

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

ashish.iprindia@hotmail.com

Parent Application

Patent Number

Legal Status

Grant Date

2024-09-18

Renewal Date

Applicants

SEEMA

ASSISTANT PROFESSOR, YADAVINDRA DEPARTMENT OF ENGINEERING, PUNJABI UNIVERSITY GURU KASHI CAMPUS, (DAMDAMA SAHIB), TALWANDI SABO.

GAURAV GUPTA

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, PUNJABI UNIVERSITY, PATIALA.

GURJIT SINGH BHATHAL

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, PUNJABI UNIVERSITY, PATIALA.

BRAHMALEEN K. SIDHU

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, PUNJABI UNIVERSITY, PATIALA.

SUNIL KUMAR

ASSISTANT PROFESSOR, YADAVINDRA DEPARTMENT OF ENGINEERING, PUNJABI UNIVERSITY GURU KASHI CAMPUS (DAMDAMA SAHIB), TALWANDI SABO.

PRIYANKA GUPTA

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, PUNJABI UNIVERSITY, PATIALA.

BACHANDEEP SINGH BHATHAL

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER APPLICATION, CHANDIGARH UNIVERSITY, GARUAH.

KEWAL KRISHAN

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, LOVELY PROFESSIONAL UNIVERSITY, JALLANDHAR.

ABHILASHA JAIN

ASSOCIATE PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, GZSCCET, MRSPTU, BATHINDA.

SWATI BANSAL

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, GZSCCET, MRSPTU, BATHINDA.

Inventors

1. SEEMA

ASSISTANT PROFESSOR, YADAVINDRA DEPARTMENT OF ENGINEERING, PUNJABI UNIVERSITY GURU KASHI CAMPUS, (DAMDAMA SAHIB), TALWANDI SABO.

2. GAURAV GUPTA

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, PUNJABI UNIVERSITY, PATIALA.

3. GURJIT SINGH BHATHAL

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, PUNJABI UNIVERSITY, PATIALA.

4. BRAHMALEEN K. SIDHU

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, PUNJABI UNIVERSITY, PATIALA.

5. SUNIL KUMAR

ASSISTANT PROFESSOR, YADAVINDRA DEPARTMENT OF ENGINEERING, PUNJABI UNIVERSITY GURU KASHI CAMPUS (DAMDAMA SAHIB), TALWANDI SABO.

6. PRIYANKA GUPTA

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, PUNJABI UNIVERSITY, PATIALA.

7. BACHANDEEP SINGH BHATHAL

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER APPLICATION, CHANDIGARH UNIVERSITY, GARUAH.

8. KEWAL KRISHAN

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, LOVELY PROFESSIONAL UNIVERSITY, JALLANDHAR.

9. ABHILASHA JAIN

ASSOCIATE PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, GZSCCET, MRSPTU, BATHINDA.

10. SWATI BANSAL

ASSISTANT PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, GZSCCET, MRSPTU, BATHINDA.

Specification

The knowledge discovery in databases (KDD) process refers to the general procedure for discovering knowledge in data. The main objective of the KDD process is to extract information from data and draw meaningful inferences from it [1]. The major phases of a typical KDD process are data collection, data selection, data pre-processing, data transformation, data mining, and data interpretation and evaluation [1]. Figure 1 shows the schematic of a typical KDD process and shows that data mining is an important phase of it [1].
Data mining is the process of storing, extracting, editing and deleting data in databases [1, 2]. Its basic goal is to extract information from a dataset and transform it into an understandable pattern, that is, to convert raw data into useful information. The extracted information is then analyzed to understand informative trends and patterns in the data [1, 2]. Data mining helps to break down patterns and connections in data based on requirements. Companies using social media use data mining to understand the behaviour and purchase patterns of their users. Data mining is an iterative process that applies algorithms such as classification, regression and clustering to data in order to extract useful and valid patterns [2].
The data mining process can be split into five phases. Firstly, the data is collected and loaded into data warehouses. Secondly, the data is stored and managed either on cloud or in-house servers. Thirdly, the data is accessed by business analysts, IT professionals and management teams to organize it [2]. Fourthly, the data is sorted using suitable application software based on the user's results, and finally, the end-user presents the data in graphical or tabular form [1, 2].
With increasing competition in the e-commerce business, keeping customers on one e-commerce platform is extremely important. In e-commerce, reducing or stopping the use of an e-commerce platform is called customer churn [3]. Customer churn is one of the most important features in analyzing the growth of any e-business. To calculate the customer churn rate, the number of lost customers is counted over a time interval [3, 4]. The customer churn rate should be reduced to improve the growth of every e-business. Hence, companies extend attractive offers, discounts and deals to potential churning customers [3, 4]. However, companies first need to predict the potential churning customers by analyzing their purchase patterns [3].
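As an illustration of the churn-rate calculation described above, the following minimal Python sketch divides the number of customers lost during an interval by the number of customers at the start of that interval. The function name and the example figures are hypothetical, and the choice of denominator (customers at the start of the period) is an assumption, since the text does not fix a formula.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of customers lost over one time interval.

    Assumption: the rate is measured against the customer count at the
    start of the interval; other conventions exist.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must lie between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical figures: 2000 customers at the start of a quarter, 150 lost.
rate = churn_rate(2000, 150)
print(f"Quarterly churn rate: {rate:.1%}")
```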
Researchers have shown that customer churn can be predicted using behavioral data [3, 4, 5, 6, 7, 8]. These behavioral data depend upon the history of customers and their sellers over the servers [3, 4]. These data can be used to train models that predict, and then target, customers who are at high risk of churning in the near future. Over several years, researchers have continually explored new ways to develop and train models with high accuracy in churn prediction.
The present work is structured as follows: Section 2 reviews the literature. Section 3 explains the methodology used. Section 4 presents the results and discussion, followed by conclusions and future work in Section 5. Finally, references are listed in the last section.
2. Literature Review
Researchers have made a number of attempts to develop customer churn prediction models based on various data mining methodologies using machine learning and deep learning algorithms. The following paragraphs discuss relevant papers on customer churn prediction using these algorithms.
Azeem et al. [3] proposed a fuzzy nearest-neighbor classifier based churn prediction model for telecom customers. The real-time prepaid customer dataset was taken from a South Asian company. The authors used AUC and TP rate to validate the proposed model and compared it with state-of-the-art models. They concluded that the proposed model has higher prediction accuracy than SVM, LR, RF and ANN based models, and suggested that larger datasets can be used to improve prediction accuracy. They also suggested that the proposed methodology can be used to develop customer churn prediction in other applications.
Vijaya and Sivasankar [4] developed a churn prediction model for telecom applications. The simulated annealing based model was developed for a telecom UCI dataset. The authors developed a churn prediction model based on PSO and compared its prediction results with decision tree, naïve Bayes, neural network and SVM techniques. Validation was done using accuracy, precision, TP rate, TN rate and FP rate. The authors concluded that the PSO based model was more accurate in every respect than the traditional models, and suggested that the developed model can be enhanced and improved for customer churn prediction in other applications.
Bahari and Eloyidom [5] developed a CRM based framework to predict customer behavior in banking using data mining and neural networks. A Portuguese bank UCI dataset was taken for customer behavior prediction. It was concluded that the neural network based model has better accuracy and specificity than the naïve Bayes based model, and that it classified more instances correctly. The authors suggested that the time for model building needs to be reduced. Additionally, enhanced algorithms can be used for model building in banking and e-commerce applications.
Baumann et al. [6] proposed an algorithm to test the predictive power of a model built on website clickstream data. The authors performed correlation analysis of graphical data scraped from a website. They showed that regression analysis and graph metrics improved the prediction accuracy of the developed model, and suggested that improved and weighted graphs can improve the prediction accuracy of customer surfing patterns on websites.
Lee et al. [7] developed a data mining based customer churn prediction model for mobile users in Korea. Churn prediction was done using neural network, decision tree and logistic regression. The authors noted that they had considered the macro perspective of churn prediction features, and added that the micro perspective of these features needs to be considered in further development of the models.
Li et al. [8] developed a data mining churn prediction model to predict the purchase behavior and personal preferences related to e-commerce purchases. The authors concluded that the cost of retaining existing customers is much lower than the cost of acquiring new customers. The success rate of retaining existing customers is also higher than that of acquiring new ones. Companies must therefore predict potential churners and make every possible effort to retain them. The authors noted that they did not consider the types of items purchased, which can be considered in future.
Jie et al. [10] used an OWA and k-means clustering model to analyze customer purchase data and identify churning customers. The authors concluded that the developed model performs better than state-of-the-art methods, and suggested that customer retention can be improved by improving the service to customers.
Fridrich [11] proposed a neural network based churn prediction model in e-commerce to identify churning customers. The proposed model was found to exhibit improved churn prediction accuracy and ability. The author suggested that improved techniques such as improved decision tree, logistic regression and fuzzy logic can be used for model development.
Machado and Ruiz [12] proposed a churn prediction model for mobile application usage of a Portuguese taxi service. The authors concluded that their model has better performance and accuracy than previously developed models, and advised that improved algorithms and mobile and web applications can be developed for more accurate prediction of customer churn.
Gordini and Veglio [13] developed an SVM based churn prediction model for Italian e-commerce customers. The authors compared the performance of the developed models and validated them based on parameters such as recency, frequency, product category length, age, profession, gender, request status and monetary value. They concluded that the proposed method has better predictive power and accuracy than neural network and logistic regression based models, and that it works very well for noisy, imbalanced and nonlinear data. The authors suggested that the staying power of the model can be predicted in future, which will help companies prevent customers from churning.
Berger and Kompan [14] proposed a web interaction based prediction model for real data of an online company. The authors concluded that prediction using the proposed model outperforms churn prediction based on existing state-of-the-art models, and suggested that the developed approach can be used for churn prediction in other applications as well.
Ahmed et al. [15] developed a churn prediction model for telecom customers using gradient boosted machine tree, extreme gradient boosting, decision tree and random forest techniques. Random sampling was used to select the sample, and the mean method was used to impute missing data. The authors suggested that the developed model can be used for churn prediction in e-commerce and other applications.
Pondel et al. [16] proposed an ANN deep learning model for churn prediction in e-commerce retailing. The authors concluded that the customer retention rate can be increased and the churn rate reduced with accurate prediction of churning customers. The developed model exhibits good accuracy, precision and recall in comparison to previously developed models. They noted that e-commerce is an extremely challenging area with high potential for churn prediction because of unbalanced data.
Anjana and Urolagin [17] developed customer churn prediction models based on four machine learning algorithms, viz. XGBoost, RF, LR and SVM. The performance of the developed models was compared using AUC. The models were applied to normalized data, and the features were extracted using PCA and select K-best techniques. The authors reduced the features from 14 to 4. They concluded that the XGBoost algorithm achieves better performance on select K-best outputs than on PCA outputs. They also found that the XGBoost algorithm was able to handle missing values and avoid overfitting quite effectively.
Kingma and Ba [19] introduced the Adam algorithm for optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is very efficient, requires little memory, and is well suited for large data problems. They concluded that Adam is well suited for problems and objectives with very noisy data, and that it works well and can be effectively applied to stochastic optimization problems.
Duchi et al. [20] analyzed convergence rates of stochastic optimization procedures by combining smoothing techniques with accelerated gradient methods. The authors concluded that convergence rates of stochastic optimization procedures are strongly dependent on the variance of the gradient estimates. They demonstrated the effectiveness of the proposed algorithm for statistical estimation problems, and suggested using it in combination with other algorithms for distributed stochastic optimization problems.
Chouiekh and Haj [21] proposed a deep convolutional neural network for telecom customer churn prediction. The learning technique was developed based on call detail records describing customers' activity for a real telecommunication provider. The authors found that the deep CNN outperformed traditional machine learning algorithms. They concluded that this approach can reduce the cost related to customer loss and better fits the churn prediction business use case. The authors suggested that the work can be extended by developing models and selecting features using fully connected layers, and that it can also be used to predict customer churn in other business applications.
Kavya et al. [22] developed an SVM based model to classify breast mammograms as abnormal or normal. The authors used neighborhood component analysis to select features for this classification, based on Tamura and statistical features from the breast mammograms. They showed that an SVM classifier with a quadratic kernel has good accuracy, and suggested that the proposed models can be used for other applications too.
Martinez et al. [23] developed a framework based on machine learning methods for predicting whether a customer is going to make a purchase at the company within a certain time frame in the near future. The authors proposed a new set of customer relevant features derived from the times and values of previous purchases. They concluded that the gradient tree boosting method is the best performing method. Using a dataset containing more than 10000 customers and a total of 200000 purchases, an accuracy score of 89% on the test dataset was found.

2.1 Research Gaps
After studying the relevant published literature on the prediction of customer churn in application areas such as banking, telecom and e-commerce using various methods, the following research gaps have been identified.
Highly accurate prediction models need to be developed.
Enhanced algorithms are required to predict customer churn in e-commerce.
Improved techniques are needed for the development of accurate churn prediction models.
The time for model building needs to be reduced.
A large number of features needs to be used in future models for better prediction.
A comprehensive benchmarked dataset is required for accurate prediction.
A suitable database of customers' purchase information and feedback is required for developing prediction models that address customer satisfaction.
A large number of performance metrics should be used to validate the developed models.

In this work, a Brazilian e-commerce benchmarked dataset [9] has been used. Principal component analysis has been used to select 12 strong features. The selected features are used to train an Adam deep learning model as well as SVM and NN machine learning models. The performance of the developed models has been compared using various performance metrics.

SUMMARY OF THE INVENTION
This summary is provided to introduce a selection of concepts, in a simplified format, that are further described in the detailed description of the invention.
This summary is intended neither to identify key or essential inventive concepts of the invention nor to determine the scope of the invention.
To further clarify advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
Disclosed herein: e-commerce is the buying and selling of goods and services electronically over the internet. It also involves the transfer of data and money to execute these transactions and services. In today’s business world, companies need an effective customer churn prediction model to identify customers at risk of churn. They use the prediction results to extend offers to those customers and take remedial action to retain them. Adding new customers to any business is very costly and time consuming. Although technological innovation has improved service, e-commerce companies still face the risk of customer churn. As the customer churn rate is influenced by many features, predicting customer churn manually is very difficult and sometimes almost impossible because of the complexity of the features and the large size of the database.
In the present invention, an Adam deep learning model is proposed on the benchmarked Brazilian e-commerce dataset. The proposed Adam deep learning model has been validated by comparison with churn prediction models based on machine learning approaches such as support vector machine and neural network. The forecasting capabilities of the models are tested with evaluation parameters such as recall, specificity, prediction accuracy, negative predictive value and positive predictive value. It has been concluded that the Adam deep learning model shows higher prediction accuracy, recall, specificity, positive predictive value and negative predictive value than state-of-the-art machine learning based churn prediction models. This work presents a stable Adam deep learning model that e-commerce companies can use to predict churning customers and plan timely remedial action to retain them.
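The evaluation parameters named above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name and the example counts are hypothetical and are not results from this work.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from confusion-matrix counts (churn = positive class).

    tp: churners predicted as churners     fp: non-churners predicted as churners
    tn: non-churners predicted as such     fn: churners predicted as non-churners
    """
    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "recall": ratio(tp, tp + fn),        # true positive rate (sensitivity)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```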
BRIEF DESCRIPTION OF THE DRAWINGS
The illustrated embodiments of the subject matter will be understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and methods that are consistent with the subject matter as claimed herein, wherein:
Figure 1: The KDD process [1]
Figure 2: Confusion Matrix
Figure 3: Comparison of performance metrics for training dataset
Figure 4: Comparison of performance metrics for test dataset
DETAILED DESCRIPTION OF THE INVENTION
The detailed description of various exemplary embodiments of the disclosure is described herein with reference to the accompanying drawings. It should be noted that the embodiments are described herein in such details as to clearly communicate the disclosure. However, the amount of details provided herein is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
It is also to be understood that various arrangements may be devised that, although not explicitly described or shown herein, embody the principles of the present disclosure. Moreover, all statements herein reciting principles, aspects, and embodiments of the present disclosure, as well as specific examples, are intended to encompass equivalents thereof.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
In addition, the descriptions of "first", "second", “third”, and the like in the present invention are used for the purpose of description only, and are not to be construed as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. Thus, features defining "first" and "second" may include at least one of the features, either explicitly or implicitly.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
These and other advantages of the present subject matter would be described in greater detail with reference to the following figures. It should be noted that the description merely illustrates the principles of the present subject matter. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described herein, embody the principles of the present subject matter and are included within its scope.
3. Methodology
3.1 Dataset [9]
The quality of the training dataset is very important for effective training of a classifier and for attaining high prediction accuracy. A complete training dataset allows a trained model to work effectively. This study uses a Brazilian e-commerce dataset [9] that contains important information related to customers and sellers on various e-commerce platforms. The dataset contains sales and purchase data of 99441 customers from 2016 to 2018 [9]. It has been found suitable for customer churn prediction as it has detailed information on sellers, customers and important features and attributes [9]. The dataset contains (i) customer feedback data such as reviews and review scores, (ii) customer data such as state, latitude and longitude, (iii) payment data such as type and status, (iv) transaction data such as type and number of transactions, and (v) delivery data such as time of order approval, estimated delivery and delays [9]. For model development in this work, 50% of the balanced dataset has been taken for training and 50% for testing. Principal component analysis (PCA) has been used to reduce the dimensionality of the data and, further, for feature selection [17]. In this work, two principal components have been generated for development of the churn prediction models.
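The 50/50 split and the reduction to two principal components can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not the pipeline used in this work: the array shapes, the random seed, the standardization step and the variable names are all assumptions, and the synthetic matrix merely stands in for the Brazilian e-commerce features [9].

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 customers, 12 numeric features, binary churn label.
X = rng.normal(size=(200, 12))
y = rng.integers(0, 2, size=200)

# 50% of the rows for training, 50% for testing, after shuffling.
order = rng.permutation(len(X))
half = len(X) // 2
train_idx, test_idx = order[:half], order[half:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Standardize with training statistics only, so the test set stays unseen.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std[std == 0] = 1.0
Z_train = (X_train - mean) / std
Z_test = (X_test - mean) / std

# PCA via SVD of the centered training matrix; rows of vt are the components.
_, singular_values, vt = np.linalg.svd(Z_train, full_matrices=False)
components = vt[:2]                      # keep two principal components
P_train = Z_train @ components.T         # projected training data, shape (100, 2)
P_test = Z_test @ components.T           # projected test data, shape (100, 2)

explained = (singular_values[:2] ** 2).sum() / (singular_values ** 2).sum()
print(f"Variance explained by two components: {explained:.2%}")
```

Fitting the projection on the training half only and then applying it to the test half avoids leaking test information into the features.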
3.2 Machine Learning [11, 15, 17]
Machine learning is a subset of artificial intelligence that allows algorithms to predict outcomes without being explicitly programmed [11, 15]. It is the study of algorithms that improve automatically through experience or the use of data. These algorithms take data as input to predict output patterns in the data [15]. They are used to develop a model based on input training data in order to make predictions or decisions. The amount and quality of the training data determine the performance of the developed models and their prediction accuracy [15]. Machine learning brings statistics and computer science closer together for developing prediction models. Machine learning algorithms are used in a number of applications such as speech recognition, medicine, email filtering, computer vision and data analysis [11, 15, 17]. These algorithms work in areas where it is difficult or almost infeasible to develop conventional algorithms. Machine learning is very important because it helps enterprises predict patterns and trends in customer behavior and supports the development of new products [15, 17]. Some efficient machine learning algorithms are explained in the next paragraphs.
3.2.1 Machine Learning Approaches
3.2.1.1 Neural Networks [11, 15, 17]
Neural networks (NN) are an efficient machine learning technique and the basis for deep learning algorithms. Their structure and name are inspired by the human brain: they work in a way that mimics how biological neurons send signals to each other. These algorithms consist of layers of nodes: an input layer, one or more hidden layers, and an output layer [11]. Each node connects to other nodes and has an associated weight and threshold value. These algorithms depend heavily on the training data in order to learn and improve their prediction accuracy. They become very powerful and accurate once they are fine-tuned using sufficient training data [11].
In operation, an input layer is determined and weights are assigned to the nodes. The weight assigned to a node determines its importance, with larger weights contributing more significantly to the output. All inputs are multiplied by their respective weights and then summed [11, 15]. Thereafter, the sum is passed through an activation function that determines the output. If the output of an individual node is more than the threshold value, the node is activated and sends data to the next layer of the network [15, 17]. Otherwise, the node is not activated and does not send any data to the next layer. In this way the output of one node becomes the input of the next node, and so on through the neural network [11].
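The node computation described above (weighted sum, activation function, threshold check) can be sketched in a few lines. This is a minimal illustration only: the sigmoid activation, the threshold of 0.5, the convention of emitting 0.0 for an inactive node, and all weights are assumptions chosen for the example.

```python
import numpy as np


def sigmoid(z: float) -> float:
    """Logistic activation; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def node_output(inputs, weights, bias=0.0, threshold=0.5) -> float:
    """Weighted sum -> activation -> threshold check for a single node."""
    z = float(np.dot(inputs, weights) + bias)
    activation = sigmoid(z)
    # The node passes its activation on only if it exceeds the threshold;
    # otherwise it sends nothing (represented here as 0.0).
    return activation if activation > threshold else 0.0


def layer_output(inputs, weight_rows, biases, threshold=0.5):
    """Outputs of one layer; these become the inputs of the next layer."""
    return np.array(
        [node_output(inputs, w, b, threshold) for w, b in zip(weight_rows, biases)]
    )


# Hypothetical two-input example with a two-node hidden layer.
x = np.array([1.0, 0.5])
hidden_weights = np.array([[0.8, -0.4], [-0.8, 0.4]])
hidden = layer_output(x, hidden_weights, biases=[0.0, 0.0])
print(hidden)  # first node fires, second node stays inactive
```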
3.2.1.2 Support Vector Machine [13, 15, 22]
A support vector machine (SVM) is a machine learning classification approach for solving two-group classification problems. After being trained on a labeled and pre-processed training dataset, an SVM is capable of categorizing new data [13, 15]. These methods offer good performance and high speed with a limited number of samples. These advantages make them very suitable for text classification problems [22].
An SVM training algorithm builds a model that assigns new examples to one of two categories, making it a non-probabilistic binary linear classifier [15]. The SVM maps the input training data to points in space so as to maximize the width of the gap between the two categories [15, 22]. New data are then mapped into the same space and predicted to belong to a category based on which side of the gap they fall [13, 22]. In addition to linear classification, SVMs can efficiently perform non-linear classification by mapping the given inputs to high-dimensional feature spaces [13, 15, 22].
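To make the idea of maximizing the gap concrete, the sketch below trains a linear SVM by sub-gradient descent on the regularized hinge loss. It is one simple way to fit a linear SVM, not the solver used in this work; the toy data, the learning rate, the regularization strength and the function names are all assumptions.

```python
import numpy as np


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Full-batch sub-gradient descent on the L2-regularized hinge loss.

    Labels in y must be +1 or -1.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violators = margins < 1.0  # inside the gap or on the wrong side
        grad_w = lam * w - (y[violators, None] * X[violators]).sum(axis=0) / n
        grad_b = -y[violators].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Assign each point to a class by the side of the hyperplane it lies on."""
    return np.where(X @ w + b >= 0.0, 1, -1)


# Toy, linearly separable data (hypothetical).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = train_linear_svm(X, y)
predictions = predict(X, w, b)
margin_width = 2.0 / np.linalg.norm(w)  # width of the gap between the classes
print(predictions, margin_width)
```

The regularization term `lam * w` keeps the weight vector small, which is what widens the gap, while the hinge term pushes points that violate the margin back to the correct side.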
3.3 Deep Learning [16, 18, 19, 21]
Deep learning is a type of machine learning that constructs artificial neural networks to mimic the structure and function of the human brain [16, 18, 19, 21]. Deep learning uses multiple hidden layers to extract features from data and transform the data into different levels of abstraction. Each level learns to transform its input data into a slightly more abstract and composite representation [16, 18, 21]. Consider a face image recognition example with a matrix of pixels as raw input: the first layer may encode edges, the second layer arrangements of edges, the third layer a nose and eyes, and the fourth layer the face [18, 19].
Deep learning performs feature extraction and abstraction from the input data with little to no human input. It is essentially a neural network with three or more layers, designed to simulate the behavior of the human brain and learn from large amounts of data [16, 18, 21]. Machine learning models need expert intervention to proceed after each step [18]. Deep learning does not need a feature extractor to be developed for each problem and learns from the data on its own [18]. A machine learning model breaks the problem into sub-parts and, after solving each part, produces the final result [18]. A deep learning model takes the input and produces the final output following an end-to-end approach [18].
3.3.1 Adam Deep Learning Model [19, 20]
Adam [19] is a first-order gradient-based optimization method for stochastic objective functions; the deep learning classifier trained with it is referred to here as the Adam deep learning classifier. Adam is an extension of stochastic gradient descent and can be used in place of classical stochastic gradient descent to update network weights more efficiently [18, 19]. Its updates are based on adaptive estimates of lower-order moments of the gradients. The Adam deep learning classifier is a hybrid of the AdaGrad [24] and RMSProp [25] techniques. The Adam deep learning model is trained on minibatches, small subsets of the dataset on which each computation is made, and shows good accuracy [18, 19]. The prediction accuracy is then determined for the network trained with the optimal weights. The major advantage of the Adam deep learning classifier is that the magnitudes of its parameter updates are invariant to rescaling of the gradient [18, 19]. It performs a form of step-size annealing, and its step sizes are approximately bounded by the step-size hyperparameter. This classifier does not require a stationary objective and works with sparse gradients [19].
Adam is straightforward and easy to implement, is computationally efficient, requires little memory, is invariant to diagonal rescaling of the gradients, and is well suited to large data problems [19]. The method is also appropriate for non-stationary objectives and for problems with noisy or sparse gradients [18, 19].
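The update rule of Adam [19] can be sketched for a single scalar parameter as follows. This is a minimal illustration, not the training code of the developed model: the function name `adam_minimize`, the quadratic test objective, and the hyperparameter values are assumptions chosen for the example.

```python
import math


def adam_minimize(grad, x0, steps=2000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Assumed toy objective f(x) = (x - 3)^2 with gradient 2 (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)

# Rescaling the gradient by 1000 leaves the size of the first update nearly unchanged.
first_step = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0, steps=1)
first_step_scaled = adam_minimize(lambda x: 2000.0 * (x - 3.0), x0=0.0, steps=1)
```

The last two lines illustrate the rescaling invariance mentioned above: both first updates have a magnitude close to the step-size hyperparameter `lr`.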
In the developed Adam deep learning model, the deep learning network is trained with a gradient threshold equal to 1. The maximum number of epochs is 900 and the minibatch size is 100. A five-layered deep learning architecture is used in this study. The first layer is the input sequence layer, and the second layer is the bidirectional long short-term memory layer [19]. The third layer is the fully connected layer that converges the input weights to the churn and non-churn classes [19]. The last two layers are the softmax and classification layers [19]. In this work, it has been shown that the Adam classifier can be used effectively for large-scale, high-dimensional machine learning problems.
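The training settings above can be illustrated with a short Python sketch. The text does not state how the gradient threshold is applied, so clipping by L2 norm is an assumption here, as are the helper names `clip_by_l2_norm` and `minibatches` and the sample sizes used.

```python
import math

# Settings reported in the text; the dictionary is only an illustrative container.
TRAINING_SETTINGS = {"gradient_threshold": 1.0, "max_epochs": 900, "minibatch_size": 100}


def clip_by_l2_norm(gradient, threshold):
    """Rescale the gradient if its L2 norm exceeds the threshold (assumed method)."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= threshold:
        return list(gradient)
    scale = threshold / norm
    return [g * scale for g in gradient]


def minibatches(n_samples, batch_size):
    """Yield (start, stop) index pairs covering n_samples in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


# Hypothetical gradient with norm 5 is scaled down to norm 1.
clipped = clip_by_l2_norm([3.0, 4.0], TRAINING_SETTINGS["gradient_threshold"])
# Hypothetical dataset of 250 samples split into minibatches of 100.
batches = list(minibatches(250, TRAINING_SETTINGS["minibatch_size"]))
```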
4 Results and Discussions
Pre-processing techniques are applied to the Brazilian e-commerce dataset, which is checked for inconsistencies. Absurd and NaN values are rejected and not considered for the study. Missing values in the dataset are imputed using the statistical mean method. The dataset is then balanced and the features are normalized. In this work, 50% of the balanced dataset is taken for training and 50% for testing. PCA is used for dimensionality reduction and calculation of the principal components, and feature selection is performed using PCA. A total of 12 high-significance features were selected and used to train the deep learning and machine learning classification models under study. All classification models are trained with the training dataset, and the unseen test dataset is used to validate and test the performance of the developed models. The present work uses two-step binary classification for all models, i.e. each customer is classified as either churned or non-churned.
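The imputation, normalization and 50/50 split steps can be sketched on a single feature column as follows. This is an illustrative sketch under stated assumptions: missing values are represented as NaN purely for the example, min-max scaling is an assumed choice of normalization, and the helper names and sample values are hypothetical.

```python
import math


def impute_with_mean(column):
    """Replace missing (NaN) entries with the mean of the observed values."""
    observed = [v for v in column if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in column]


def min_max_normalize(column):
    """Rescale to [0, 1]; min-max scaling is an assumed choice of normalization."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def split_half(rows):
    """First half for training, second half for testing (50/50 split)."""
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]


raw = [2.0, float("nan"), 4.0, 6.0]      # hypothetical feature column
filled = impute_with_mean(raw)           # NaN replaced by the mean of 2, 4 and 6
scaled = min_max_normalize(filled)
train, test = split_half(scaled)
```

In practice the rows would normally be shuffled before splitting; the ordered split is used here only to keep the example deterministic.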
4.1 Performance Metrics [3-8, 14-17, 19-22]
Figure 2 shows the structure of the confusion matrix [3]. The (i) true positive (a) value is the number of actual positives that test positive, (ii) true negative (d) value is the number of actual negatives that test negative, (iii) false positive (b) value is the number of actual negatives that test positive, and (iv) false negative (c) value is the number of actual positives that test negative.
The combination of the confusion matrix [3] parameters above will determine the important parameters, as shown in equations 1–5:
Prediction accuracy = (a + d)/(a + b + c + d) (1)
Sensitivity = a/(a + c) (2)
Specificity = d/(b + d) (3)
Positive predicted value = a/(a + b) (4)
Negative predicted value = d/(c + d) (5)
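Equations 1 to 5 translate directly into code. The sketch below uses the same symbols a, b, c and d; the function name `churn_metrics` and the confusion-matrix counts are hypothetical and are not results from this work.

```python
def churn_metrics(a, b, c, d):
    """Metrics from confusion-matrix counts.

    a: true positives, b: false positives, c: false negatives, d: true negatives.
    """
    return {
        "accuracy": (a + d) / (a + b + c + d),   # equation (1)
        "sensitivity": a / (a + c),              # equation (2), also called recall
        "specificity": d / (b + d),              # equation (3)
        "ppv": a / (a + b),                      # equation (4)
        "npv": d / (c + d),                      # equation (5)
    }


# Hypothetical counts for 200 customers.
m = churn_metrics(a=90, b=5, c=10, d=95)
```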
The values of the performance metrics, viz. recall, specificity, prediction accuracy, negative predicted value and positive predicted value, are given in tables 1 and 2 for the training and test datasets respectively. The Adam deep learning model achieves higher prediction accuracy than the SVM and NN classifiers on both the training and test datasets. It also achieves higher recall, specificity, negative predicted value and positive predicted value than the other machine learning models.
The results in table 1 show that the prediction accuracies on the training dataset for the Adam, SVM and NN models are 98.26%, 97.58% and 97.43% respectively. The prediction accuracy on the training dataset is therefore higher for the Adam deep learning model than for the SVM and NN machine learning models.
Table 1: Performance metrics of classifiers with training dataset
Classification Method	Recall	Specificity	Prediction Accuracy	Negative predicted value	Positive predicted value
NN	0.9613	0.9832	0.9743	0.9621	0.9843
SVM	0.9646	0.9800	0.9758	0.9653	0.9811
Adam	0.9788	0.9864	0.9826	0.9789	0.9863

The results in table 2 show that the prediction accuracies on the test dataset for the Adam, SVM and NN models are 97.61%, 96.19% and 96.79% respectively. The prediction accuracy on the test dataset is therefore higher for the Adam deep learning model than for the SVM and NN machine learning models.
Table 2: Performance metrics of classifiers with test dataset
Classification Method	Recall	Specificity	Prediction Accuracy	Negative predicted value	Positive predicted value
NN	0.9476	0.9893	0.9679	0.9564	0.9833
SVM	0.9402	0.9827	0.9619	0.9485	0.9805
Adam	0.9626	0.9894	0.9761	0.9641	0.9889

Figures 3 and 4 show graphical representations of the validation parameters, viz. recall, specificity, prediction accuracy, negative predicted value and positive predicted value. They show that the Adam deep learning classifier has higher prediction accuracy and performance metrics than the SVM and NN classifiers considered in this study.
5. Conclusions and Future Work
Customer churn prediction is extremely important for e-business companies, especially in e-commerce applications where users have several buying choices just a single click away. Churn prediction benefits e-commerce companies because it helps to identify the issues that cause customers to churn. In this study, a deep learning churn prediction model for e-commerce has been proposed and evaluated on a Brazilian e-commerce dataset. The dataset has been pre-processed, balanced and normalized for churn prediction. 50% of the balanced dataset has been taken for training and 50% for testing. PCA has been used for dimensionality reduction and calculation of the principal components. All classification models are trained with the training dataset, and the unseen test dataset is used to validate and test the performance of the developed models. The results of the proposed deep learning model have been compared with those of the neural network and SVM machine learning classification models.
The results show that the Adam deep learning classifier provides better results in terms of prediction accuracy, recall, specificity, positive predicted value and negative predicted value than the SVM and NN machine learning models. The prediction accuracies on the training dataset for the Adam, SVM and NN models are 98.26%, 97.58% and 97.43% respectively. The prediction accuracies on the test dataset for the Adam, SVM and NN models are 97.61%, 96.19% and 96.79% respectively. The prediction accuracy on both the training and test datasets is thus higher for the Adam deep learning model than for the SVM and NN machine learning models.
It can be concluded that the Adam deep learning model predicts customer churn well and can be used by e-commerce companies to identify churning customers and suitable measures to retain them. The authors are working further to improve the prediction accuracy of the developed models. The authors are also developing customer churn prediction models for e-commerce using other deep learning and machine learning approaches, and will compare the performance of all models to further improve accuracy and performance.
References
[1]. https://www.google.com/search?q=kdd+process&rlz=1C1OKWM_enIN909IN910&source=lnms&tbm=isch&sa=X&ved=2ahUKEwjP3vXQuIrzAhXvzDgGHSYJB2IQ_AUoAXoECAEQAw&biw=1366&bih=625#imgrc=ff456qBQGOL9pM&imgdii=yk6p6bPcejmrQM (accessed Sept 4, 2021 at 10am).
[2]. P.N. Tan, M. Steinbach and V. Kumar. Data Mining: Concepts and Techniques (2nd ed.). The Morgan Kaufmann Publisher, Elsevier Inc., 2016.
[3]. M. Azeem, M. Usman and A. C. M. Fong. A churn prediction model for prepaid customers in telecom using fuzzy classifiers. Telecommunication Systems, 66 (4), pp. 603-614, 2017.
[4]. J. Vijaya and E. Sivasankar. An efficient system for customer churn prediction through particle swarm optimization based feature selection model with simulated annealing. Cluster Computing, pp. 1-12, 2017.
[5]. T.F. Bahari and M.S. Elayidom. An efficient CRM-Data mining framework for the prediction of customer behavior. Procedia Computer Science, 46, pp. 725 – 731, 2015.
[6]. A. Baumann, J. Haupt, F. Gebert and S. Lessmann. Changing perspectives: Using graph metrics to predict purchase probabilities. Expert Systems with Applications, 94, pp. 137-148, 2018.
[7]. E.B. Lee, J. Kim and S.G. Lee. Predicting customer churn in the mobile industry using data mining technology. Industrial Management and Data Systems, 117 (1), pp. 90-109, 2017.
[8]. H. Li, Z. Guan and Y. Cui. Customer churn prediction based on BG / NBD model. Wuhan International Conference on E-Business-Emerging Issues in E-Business, pp. 386-393, 2017.
[9]. Olist, Dabague and F. Magioli. Brazilian E-Commerce Public Dataset. Available at https://www.kaggle.com/olistbr/brazilian-ecommerce, 2018 (Accessed Sept 4, 2020).
[10]. C. Jie, Y. Xiaobing and Z. Zhifei. Integrating OWA and data mining for analyzing customers churn in e-commerce. Journal of Systems Science and Complexity, 28, pp. 381–392, 2015.
[11]. M. Fridrich. Hyperparameter optimization of artificial neural network in customer churn prediction using Genetic Algorithm. Trends in Economics and Management, 28(1), pp. 9–21, 2017.
[12]. N. L. R. Machado and D. D. A. Ruiz. Customer: A novel customer churn prediction method based on mobile application usage. IEEE Wireless Communications and Mobile Computing Conference, pp. 2146-2151, 2017.
[13]. N. Gordini and V. Veglio. Customer churn prediction and marketing retention strategies. An application of support vector machines based on the AUC parameter selection technique in B2B e-commerce industry. Industrial Marketing Management, 8, pp. 1-8, 2016.
[14]. P. Berger and M. Kompan. User modelling for churn prediction in E-commerce. IEEE Conference on Intelligent Systems, pp. 1-6, 2019.
[15]. A. K. Ahmad, A. Jafar and K. Aljoumaa. Customer churn prediction in telecom using machine learning in big data platform. Journal of Big Data, 6 (28), pp. 1-24, 2019.
[16]. M. Pondel, M. Wuczynski, W. Gryncewicz, L. Lysik, M. Hernes, A. Rot and A. Kozina. Deep learning for customer churn prediction in ecommerce decision support. International Conference on Business Information Systems, pp.3-12, 2021.
[17]. K.V. Anjana and S. Urolagin. Churn prediction in telecom industry using machine learning algorithms with K-Best and principal component analysis. In: X.Z. Gao, R. Kumar, S. Srivastava and B.P. Soni (eds) Applications of Artificial Intelligence in Engineering. Algorithms for Intelligent Systems, pp.499-507, 2021, Singapore.
[18]. Y. Bengio, I. Goodfellow and A. Courville. Deep learning (2nd ed.), The MIT Press Essential Knowledge series, 2015.
[19]. D. P. Kingma and J. L. Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, pp. 1–15, 2015.
[20]. J. C. Duchi, P. L. Bartlett, and M. J. Wainwright. Randomized smoothing for (parallel) stochastic optimization. IEEE Conference on Decision Control, 12, pp. 5442–5444, 2012.
[21]. A. Chouiekh and E.I.H.E.I. Haj. Deep convolutional neural networks for customer churn prediction analysis. Elsevier International Journal of Cognitive Informatics and Natural Intelligence, 14 (1), pp. 1-16, 2020.
[22]. N. Kavya, N. Sriraam, N. Usha, D. Sharath, B. Hiremath, M. Menaka and B. Venkatraman. Feature selection using neighborhood component analysis with support vector machine for classification of breast mammograms. In: V. Bindhu, J. Chen, J. Tavares (eds) International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, 637, pp. 253-260, 2020.
[23]. A. Martinez, C. Schmuck, S. Pereverzyev Jr., C. Pirker, and M. Haltmeier. A machine learning framework for customer purchase prediction in the non-contractual setting. Elsevier European Journal of Operational Research, 281 (3), pp.588-596, 2020.
[24]. “RMSprop-Optimization algorithms Coursera.” https://www.coursera.org/lecture/deep-neural-network/rmsprop-BhJlm (Accessed Oct. 23, 2020).
[25]. G. Bonaccorso. Machine Learning Algorithms. (1st ed.). The Packt Publishers, Birmingham UK, 2017.

We Claim:
1. A system of deep learning customer churn prediction model in e-commerce using data mining, comprising databases for data cleaning, a data warehouse for data selection and transformation, task-relevant data for data mining, and patterns for pattern evaluation with data integration and knowledge.
2. The system as claimed in claim 1, wherein pre-processing techniques are applied to the Brazilian e-commerce dataset; the dataset is checked for inconsistencies using pre-processing; absurd and NaN values are rejected and not considered for the study; and missing values in the dataset are imputed using the statistical mean method.
3. The system as claimed in claim 1, wherein the dataset is further balanced and the features are normalized; 50% of the balanced dataset is taken for training and 50% for testing of the developed models respectively.
4. The system as claimed in claim 1, wherein principal component analysis (PCA) is used for dimensionality reduction and calculation of the principal components; feature selection is performed using PCA; and a total of 12 high-significance features are selected and used to train the deep learning and machine learning classification models under study.
5. The system as claimed in claim 1, wherein all classification models are trained with the training dataset, and the unseen test dataset is used to validate and test the performance of the developed models.
6. The system as claimed in claim 1, wherein the present work uses two-step binary classification for all models, i.e. each customer is classified as either churned or non-churned.
8. The system as claimed in claim 1, wherein the Adam deep learning classifier provides better results in terms of prediction accuracy, recall, specificity, positive predicted value and negative predicted value as compared to the SVM and NN machine learning models.
9. The system as claimed in claim 1, wherein the prediction accuracy for the training as well as the test dataset using the Adam deep learning model is higher in comparison to the SVM and NN machine learning models.

Documents

Application Documents

#	Name	Date
1	202111057174-STATEMENT OF UNDERTAKING (FORM 3) [08-12-2021(online)].pdf	2021-12-08
2	202111057174-REQUEST FOR EARLY PUBLICATION(FORM-9) [08-12-2021(online)].pdf	2021-12-08
3	202111057174-FORM-9 [08-12-2021(online)].pdf	2021-12-08
4	202111057174-FORM 1 [08-12-2021(online)].pdf	2021-12-08
5	202111057174-DRAWINGS [08-12-2021(online)].pdf	2021-12-08
6	202111057174-DECLARATION OF INVENTORSHIP (FORM 5) [08-12-2021(online)].pdf	2021-12-08
7	202111057174-COMPLETE SPECIFICATION [08-12-2021(online)].pdf	2021-12-08
8	202111057174-FORM 18 [09-04-2022(online)].pdf	2022-04-09
9	202111057174-FER.pdf	2022-08-10
10	202111057174-FER_SER_REPLY [07-02-2023(online)].pdf	2023-02-07
11	202111057174-CORRESPONDENCE [07-02-2023(online)].pdf	2023-02-07
12	202111057174-CLAIMS [07-02-2023(online)].pdf	2023-02-07
13	202111057174-PA [02-03-2024(online)].pdf	2024-03-02
14	202111057174-FORM28 [02-03-2024(online)].pdf	2024-03-02
15	202111057174-ASSIGNMENT DOCUMENTS [02-03-2024(online)].pdf	2024-03-02
16	202111057174-8(i)-Substitution-Change Of Applicant - Form 6 [02-03-2024(online)].pdf	2024-03-02
17	202111057174-Retyped Pages under Rule 14(1) [07-03-2024(online)].pdf	2024-03-07
18	202111057174-2. Marked Copy under Rule 14(2) [07-03-2024(online)].pdf	2024-03-07
19	202111057174-Retyped Pages under Rule 14(1) [24-04-2024(online)].pdf	2024-04-24
20	202111057174-2. Marked Copy under Rule 14(2) [24-04-2024(online)].pdf	2024-04-24
21	202111057174-Retyped Pages under Rule 14(1) [19-07-2024(online)].pdf	2024-07-19
22	202111057174-2. Marked Copy under Rule 14(2) [19-07-2024(online)].pdf	2024-07-19
23	202111057174-US(14)-HearingNotice-(HearingDate-21-08-2024).pdf	2024-07-31
24	202111057174-Correspondence to notify the Controller [08-08-2024(online)].pdf	2024-08-08
25	202111057174-Written submissions and relevant documents [30-08-2024(online)].pdf	2024-08-30
26	202111057174-RELEVANT DOCUMENTS [30-08-2024(online)].pdf	2024-08-30
27	202111057174-PETITION UNDER RULE 137 [30-08-2024(online)].pdf	2024-08-30
28	202111057174-Annexure [30-08-2024(online)].pdf	2024-08-30
29	202111057174-FORM-8 [12-09-2024(online)].pdf	2024-09-12
30	202111057174-PatentCertificate18-09-2024.pdf	2024-09-18
31	202111057174-IntimationOfGrant18-09-2024.pdf	2024-09-18

Search Strategy

1	search_202111057174E_10-08-2022.pdf