
Predicting One Or More Characteristics Of The User Based On Machine Learning Techniques

Abstract: PREDICTING ONE OR MORE CHARACTERISTICS OF A USER BASED ON MACHINE LEARNING TECHNIQUES
A method and an apparatus for classifying a user at an initial login stage based on one or more characteristics of the user are disclosed. The method comprises obtaining a first set of parameters related to the user, wherein the first set of parameters includes historical data, including previous instances of fraud, demographic variables of customers, and behavioral patterns that indicate suspicious activity; inputting the first set of parameters in a first machine learning model; and predicting a first characteristic of the user based on an output from the first machine learning model to generate a first prediction outcome. The method further comprises obtaining a second set of parameters related to the user, wherein the second set of parameters includes the user's historical claim data and patterns in similar cases; inputting the second set of parameters in a second machine learning model; and predicting a second characteristic of the user based on an output from the second machine learning model to generate a second prediction outcome. The method further comprises classifying the user into different categories based on the first prediction outcome and the second prediction outcome. REFER TO FIGURE 1


Patent Information

Application #
202321059977
Filing Date
06 September 2023
Publication Number
11/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

ICICI Prudential Life Insurance Company Limited
ICICI Prulife Towers, 1089, Appasaheb Marathe Marg, Prabhadevi, Mumbai 400025, India

Inventors

1. Karthik Kanagaraj
B 502, Ruparel Orion, Eastern express highway, Swastik park, Chembur, Mumbai 400071, India
2. Akash Omer
C-201, Vasant Athena, Laxmi Nagar, Opp to Viviana mall, Thane West, Maharashtra- 400601, India

Specification

FORM 2
THE PATENTS ACT, 1970
(39 OF 1970)
AND
THE PATENTS RULES, 2003
(As Amended)
COMPLETE SPECIFICATION
(See section 10; rule 13)
"PREDICTING ONE OR MORE CHARACTERISTICS OF THE USER BASED ON MACHINE LEARNING TECHNIQUES"
ICICI Prudential Life Insurance Company Limited, a corporation organized and existing under the laws of India of ICICI Prulife Towers, 1089, Appasaheb Marathe Marg, Prabhadevi, Mumbai 400025, India
The following specification particularly describes the invention and the manner in which it is to be performed:

PREDICTING ONE OR MORE CHARACTERISTICS OF THE USER BASED ON MACHINE LEARNING TECHNIQUES
TECHNICAL FIELD
The present invention relates to the field of machine learning, and in particular to predicting one or more characteristics of a user at an initial stage using machine learning techniques.
BACKGROUND
[0001] Machine learning techniques find applications in various fields these days. One or more machine learning models can be trained on sample historical data so that they learn from that data, and the trained models can then be used to predict an outcome.
[0002] One such application of machine learning is predicting fraudulent users and users prone to early claiming of a policy. A domain where predicting fraudulent users is helpful is the insurance sector. Insurance companies take on risk from users in exchange for a regular payment called a premium, which the company believes is sufficient to cover the risk. Many people seek the same kind of insurance cover; this group of people is called the insurance pool. It is highly improbable that all clients will need to claim at the same time, so when a claim event does occur for a few individuals, risk pooling allows the insurance company to settle their claims.
[0003] Fraudulent users may obtain an insurance policy dishonestly, and persons already suffering from a chronic disease may obtain a policy by intentionally concealing material details about themselves. Thus, there is a need in the art for ways of predicting fraudulent users and users prone to early claiming of a policy at the initial stage itself.
SUMMARY
[0004] The following presents a simplified summary of the subject matter in order to provide a basic understanding of some aspects of subject matter embodiments. This summary is not an extensive overview of the subject matter. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the subject matter.
[0005] Its sole purpose is to present some concepts of the subject matter in a simplified form as a prelude to the more detailed description that is presented later.
[0006] The primary objective of the present invention is to provide machine learning based techniques to predict fraudulent users at an application login stage.
[0007] Another objective of the present invention is to predict users prone to early claiming of policy.
[0008] Another objective of the present invention is to provide machine learning techniques to categorize the users based on their probability to early claim and fraud risk.
[0009] In one embodiment, a method for classifying a user at an initial login stage based on one or more characteristics of the user is disclosed. The method comprises obtaining a first set of parameters related to the user, wherein the first set of parameters includes historical data, including previous instances of fraud, demographic variables of customers, and behavioral patterns that indicate suspicious activity, inputting the first set of parameters in a first machine learning model and predicting a first characteristic of the user based on an output from the first machine learning model to generate a first prediction outcome. The method further comprises obtaining a second set of parameters related to the user, wherein the second set of parameters includes the user's historical claim data and patterns in similar cases, inputting the second set of parameters in a second machine learning model and predicting a second characteristic of the user based on an output from the second machine learning model to generate a second prediction outcome. The method further comprises classifying the user into different categories based on the first prediction outcome and the second prediction outcome.
[0010] In another embodiment, an apparatus for classifying a user at an initial login stage based on one or more characteristics of the user is disclosed. The apparatus comprises a memory and a processor coupled with the memory and configured to perform the operations of obtaining a first set of parameters related to the user, wherein the first set of parameters includes historical data, including previous instances of fraud, demographic variables of customers, and behavioral patterns that indicate suspicious activity, inputting the first set of parameters in a first machine learning model and predicting a first characteristic of the user based on an output from the first machine learning model to generate a first prediction outcome. The processor further performs the operations of obtaining a second set of parameters related to the user, wherein the second set of parameters includes the user's historical claim data and patterns in similar cases, inputting the second set of parameters in a second machine learning model and predicting a second characteristic of the user based on an output from the second machine learning model to generate a second prediction outcome. The processor further performs the operation of classifying the user into different categories based on the first prediction outcome and the second prediction outcome.
[0011] These and other objects, embodiments and advantages of the present invention will become readily apparent to those skilled in the art from the following detailed description of the embodiments having reference to the attached figures, the invention not being limited to any particular embodiments disclosed.
BRIEF DESCRIPTION OF FIGURES
[0012] The foregoing and further objects, features and advantages of the present subject matter will become apparent from the following description of exemplary embodiments with reference to the accompanying drawings, wherein like numerals are used to represent like elements.

[0013] It is to be noted, however, that the appended drawings along with the reference numerals illustrate only typical embodiments of the present subject matter, and are therefore, not to be considered for limiting of its scope, for the subject matter may admit to other equally effective embodiments.
[0014] FIGURE 1 illustrates a block diagram of an apparatus for classifying a user based on one or more characteristics of the user, according to an embodiment of the present invention.

[0015] FIGURE 2 illustrates a flowchart of a method for predicting fraudulent users, considering the insurance sector as an example, according to an embodiment of the present invention.

[0016] FIGURE 3 illustrates a flowchart of a method for classifying a user based on one or more characteristics of the user, according to an embodiment of the present invention.

[0017] FIGURE 4 is a block diagram illustrating an exemplary computing device in which one or more embodiments of the present invention may operate, according to an embodiment.

DETAILED DESCRIPTION
[0018] Exemplary embodiments now will be described with reference to the accompanying drawings. The disclosure may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey its scope to those skilled in the art. The terminology used in the detailed description of the particular exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting. In the drawings, like numbers refer to like elements.

[0019] It is to be noted, however, that the reference numerals used herein illustrate only typical embodiments of the present subject matter, and are therefore, not to be considered for limiting of its scope, for the subject matter may admit to other equally effective embodiments.

[0020] The specification may refer to "an", "one" or "some" embodiment(s) in several locations. This does not necessarily imply that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.
[0021] As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "includes", "comprises", "including" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

[0022] Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
[0023] The figures depict a simplified structure only showing some elements and functional entities, all being logical units whose implementation may differ from what is shown. The connections shown are logical connections; the actual physical connections may be different. It is apparent to a person skilled in the art that the structure may also comprise other functions and structures.
[0024] Referring to FIG. 1, a block diagram of an apparatus 100 for classifying a user based on one or more characteristics of the user is disclosed. The apparatus 100 comprises a first module 102, a second module 104, a classification module 106, a memory 108 and a processor 110. In one embodiment, the operations of one or more modules mentioned here may be performed by the processor 110. The processor 110 is coupled to the memory 108 and performs the operations of the apparatus 100.

[0025] The first module 102 may be used to predict application fraud. The second module 104 may be used to predict an early claim (for example, of an insurance policy) made by the user. Both the first module 102 and the second module 104 use machine learning techniques for processing. The machine learning techniques are stored in the memory 108. The outputs from the first module 102 and the second module 104 may be combined and provided to the classification module 106, which classifies the users into different categories. The classification module 106 may classify the users into different categories, starting from the users having the highest probability of early claim and fraud risk.
[0026] The first module 102 and the second module 104 use variables such as historical data, including previous instances of fraud, demographic variables of customers, and behavioral patterns that might indicate suspicious activity. The variables may include, but are not limited to, the following (an illustrative sketch of assembling these inputs follows the list):

• User demographics (such as age, gender, salary, location, education, etc.)
• Gross salary of the user
• User location
• Distribution channel
• Type of product
• Base sum assured
• Annualized Premium Equivalent (APE)
• User occupation
• Total sum assured
• Credit score
• Proposer's data (salary, occupation, relationship)
• NRI flag
• Unit manager details

Oversampling of the data using SMOTE is performed to adjust the data imbalance.
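By way of illustration only, the following Python sketch shows how the listed variables might be assembled into a model-ready feature table; the column names, the helper function and the one-hot encoding choice are assumptions made for this example and do not form part of the claimed method.

```python
# Illustrative sketch only: the column names, the input DataFrame and the encoding
# choice are assumptions for this example, not defined by the specification.
import pandas as pd

RAW_COLUMNS = [
    "age", "gender", "salary", "location", "education", "distribution_channel",
    "product_type", "base_sum_assured", "ape", "occupation", "total_sum_assured",
    "credit_score", "proposer_salary", "proposer_occupation", "proposer_relationship",
    "nri_flag", "unit_manager_id",
]

CATEGORICAL = [
    "gender", "location", "education", "distribution_channel", "product_type",
    "occupation", "proposer_occupation", "proposer_relationship", "nri_flag",
    "unit_manager_id",
]

def build_feature_table(applications: pd.DataFrame) -> pd.DataFrame:
    """Select the raw variables and one-hot encode the categorical ones."""
    features = applications[RAW_COLUMNS].copy()
    return pd.get_dummies(features, columns=CATEGORICAL, dummy_na=True)
```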
[0027] The first module 102 uses a first machine learning model stored in the memory 108 to predict the first characteristic of the user based on an output from the first machine learning model to generate a first prediction outcome. The first characteristic of the user includes the risk of fraud in customer applications and helps predict whether the user is a fraudulent user. The input to the first machine learning model includes a wide array of historical data, including previous instances of fraud, demographic variables of customers, and behavioral patterns that might indicate suspicious activity. By analyzing these factors, the first machine learning model is capable of recognizing patterns that are associated with fraudulent behavior of the user. The first machine learning model employs advanced statistical techniques, such as logistic regression and gradient boosting algorithms, to calculate the probabilities of fraud. These algorithms work by assessing the input data and generating a probability score for each application. The scores are then categorized into various risk levels, ranging from high propensity to commit fraud to low propensity. This categorization enables the apparatus 100 to flag potentially fraudulent applications early on, allowing for additional scrutiny or intervention at the login stage itself. The use of logistic regression helps in understanding the relationship between different variables and their impact on the likelihood of fraud, while the gradient boosting algorithm improves the accuracy of the predictions. The operations of the first module 102 may also be performed by the processor 110.
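The following is a minimal, non-limiting Python sketch of how a first model combining logistic regression and gradient boosting could produce a fraud-probability score and bucket it into propensity levels as described in paragraph [0027]; the scikit-learn classes, the threshold values and the variable names are illustrative assumptions rather than the applicant's implementation.

```python
# Hedged sketch, not the applicant's implementation: X_train, y_train, x_new and the
# two thresholds are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

def train_first_model(X_train, y_train):
    # Logistic regression gives interpretable variable-to-fraud relationships;
    # gradient boosting is used here for the probability estimate itself.
    logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    gbm = GradientBoostingClassifier().fit(X_train, y_train)
    return logit, gbm

def fraud_risk_level(gbm, x_new, thresholds=(0.7, 0.4)) -> str:
    """Map the predicted fraud probability to a propensity bucket."""
    p_fraud = gbm.predict_proba(np.atleast_2d(x_new))[0, 1]
    if p_fraud >= thresholds[0]:
        return "high propensity"
    if p_fraud >= thresholds[1]:
        return "medium propensity"
    return "low propensity"
```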
[0028] The second module 104 uses a second machine learning model stored in the memory 108 to predict a second characteristic of the user based on an output from the second machine learning model to generate a second prediction outcome. The second characteristic of the user includes a likelihood of an early claim being registered by the user. An early claim might indicate various underlying issues, such as a higher risk profile or possible premeditated fraud. This second machine learning model operates in real time, assessing the data as soon as an application is logged into the system. Once the first machine learning model has run and provided its prediction regarding the potential for fraud, the second machine learning model steps in to evaluate the propensity of an early claim. This second machine learning model considers different factors, such as the user's historical claim data, patterns in similar cases, and other relevant indicators that might suggest a higher probability of an early claim. The operations of the second module 104 may also be performed by the processor 110.

[0029] The processor 110 is configured to combine the outputs from the first machine learning model and the second machine learning model and assign a final tag to each application. This final tag reflects a comprehensive assessment of the application's risk, considering both the potential for fraud and the likelihood of an early claim.
[0030] The machine learning techniques employ the following machine learning models/algorithms:

• Logistic Regression Algorithm
• Gradient Boost Algorithm
• AdaBoost Algorithm

A boosting algorithm is selected from among these to build the model.
[0031] The Gradient Boosting algorithm employed utilizes LightGBM, a highly efficient, histogram-based implementation optimized for large datasets and categorical features. It operates by constructing an ensemble of weak learners, typically decision trees, in a sequential manner. Each tree is built to correct the errors made by the previous trees, effectively minimizing the residual errors of the combined ensemble through gradient descent. The algorithm starts with an initial model that predicts a baseline output, then iteratively adds new trees where each new tree focuses on the errors or residuals of the existing ensemble. The model is trained using a custom loss function designed to handle specific classification errors pertinent to the risk detection task, enhancing its performance and accuracy. To address class imbalance, particularly in detecting risk patterns which are often rare, SMOTE (Synthetic Minority Over-sampling Technique) is employed to generate synthetic samples of the minority class, thus improving the model's ability to learn from the minority class and enhancing overall prediction performance. Bayesian optimization is then used to fine-tune hyperparameters such as the depth of the trees and the number of leaves, balancing the trade-off between model complexity and overfitting, leading to a more robust and generalizable model.
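A minimal sketch of the training pipeline described in paragraph [0031], assuming LightGBM, imbalanced-learn (SMOTE) and Optuna as concrete libraries; the hyperparameter ranges, the F1 objective and the helper function are illustrative assumptions, and the custom loss function mentioned above is not reproduced here.

```python
# A minimal sketch under stated assumptions. For brevity, SMOTE is applied once up
# front; an imblearn Pipeline would avoid leaking synthetic samples across folds.
import lightgbm as lgb
import optuna
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import cross_val_score

def tune_lightgbm(X, y, n_trials: int = 25):
    # Oversample the rare (fraud / early-claim) class.
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)

    def objective(trial):
        params = {
            "num_leaves": trial.suggest_int("num_leaves", 16, 128),
            "max_depth": trial.suggest_int("max_depth", 3, 10),
            "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
            "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        }
        # F1 rather than accuracy, because the classes are imbalanced.
        return cross_val_score(lgb.LGBMClassifier(**params), X_bal, y_bal,
                               cv=5, scoring="f1").mean()

    study = optuna.create_study(direction="maximize")  # Bayesian (TPE) search
    study.optimize(objective, n_trials=n_trials)
    return lgb.LGBMClassifier(**study.best_params).fit(X_bal, y_bal)
```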
[0032] The gradient boosting algorithm is used in both the first module 102 and the second module 104, where the first module 102 is used for predicting application fraud and the second module 104 is used for predicting early claims. Further, the output from both modules 102 and 104 (after applying the gradient boosting algorithm) is provided to the classification module 106. The machine learning techniques can be used to process the input variables received by the first module 102 and the second module 104. In one embodiment, the first machine learning model and the second machine learning model are trained using these machine learning techniques. The models can be trained based on the input variables of a plurality of users at the initial stage.
[0033] The present invention can replace all other triggers and gives the opportunity to deploy a single predictive intervention at an initial stage. The present invention has the capacity to consume all other scores, if available. In one aspect of the present invention, any other external environment data in the form of a score or ranking can be used. In one embodiment, the initial stage refers to a stage where the user logs in to an application for registering a policy. The prediction for a user can be made based on the previous history of life insurance data of users.
[0034] The current system operates by utilizing various scores derived from both internal and external sources. These scores act as indicators of a customer's risk profile. The sources include historical data, internal assessments, credit scores, financial assessments, other third-party evaluations, and so on. The present invention aims to revolutionize this process by integrating all these disparate sources into a single, cohesive system. The invention has the unique capability to replace the multitude of triggers that are currently in use. In simpler terms, where the existing model might rely on several individual triggers, each corresponding to a specific score or marker, the invention consolidates them into a single predictive intervention along with other variables. This single predictive intervention is significant because it simplifies the process and enhances accuracy. Instead of handling multiple indicators and triggers separately, the invention consumes all available scores and markers and then processes them to generate a unified prediction or recommendation. This integration allows the system to act more decisively and at an earlier stage, potentially mitigating risks more effectively.
[0035] The model is designed to absorb all the previous scores and indicators, regardless of their source, and harmonize them into a comprehensive analysis. This process not only ensures that no relevant information is overlooked but also allows for more robust decision-making. By combining all the data, the model can assess the risk profile with greater precision and provide a recommendation that is more reliable and reflective of the complete picture. In practical terms, for a customer, this means that the system can now evaluate their profile with a higher degree of confidence and accuracy. It effectively eliminates the need for multiple, fragmented triggers and replaces them with a singular, comprehensive approach that enhances the efficiency and effectiveness of the entire evaluation process. It not only streamlines the process but also improves the quality of the output, ensuring that decisions are made based on a full spectrum of available data.
[0036] The classification module 106 is used to determine the probability-based flagging (Red, Maroon, Amber, Yellow, Green, i.e., RMAYG classification), and both flaggings are then overlaid in matrix form to provide a final tagging against each user for further underwriter actions. The classification module 106 classifies users into the different flaggings (Red, Maroon, Amber, Yellow, Green, i.e., RMAYG classification) based on their probability of an early claim and fraud risk (as received from the first module 102 and the second module 104).
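The overlay of the two flaggings can be pictured with the following illustrative Python sketch; the matrix entries (here, simply the more severe of the two tags) are placeholders, since the specification does not publish the actual matrix values.

```python
# Placeholder overlay: this example takes the more severe of the two tags as the
# final tag; the real matrix entries are not given in the specification.
RMAYG = ["Red", "Maroon", "Amber", "Yellow", "Green"]  # most to least severe

FINAL_TAG = {
    (fraud, claim): RMAYG[min(RMAYG.index(fraud), RMAYG.index(claim))]
    for fraud in RMAYG
    for claim in RMAYG
}

def classify(first_outcome: str, second_outcome: str) -> str:
    """Return the final tag for an application given both module outcomes."""
    return FINAL_TAG[(first_outcome, second_outcome)]

# Example: a high fraud risk dominates a low early-claim propensity.
assert classify("Red", "Green") == "Red"
```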

[0037] Users with a Red flagging have the highest propensity to file an early claim and/or commit application fraud, so strict, complete underwriting is recommended for such users. This is followed by Maroon-tagged customers, for whom a video call with a doctor is recommended; those with other taggings are monitored, but no strict action is performed on such cases.
[0038] The RMAYG classification by the classification module 106 is provided to the underwriting team. This ensures that the underwriting team takes appropriate actions on login applications depending on the RMAYG classification they have received.
[0039] The classification process is crucial for the system to provide accurate and actionable insights. It is based on a matrix that considers the outcomes of both modules. For instance, an application with a high fraud risk but low early-claim probability might be flagged for further investigation, while an application with a low fraud risk but high early-claim probability could be treated differently, allowing the system to make more nuanced and informed decisions. This dual-module approach enhances the system's robustness and accuracy, making it a powerful tool in risk management and decision-making. It allows for early intervention in cases where risks are identified, thereby protecting the organization from potential losses and ensuring that only genuine applications are processed efficiently.
[0040] The present invention plays a pivotal role in classifying insurance policies into the RMAYG (Red, Maroon, Amber, Yellow, Green) categories at the login stage. This classification is based on a comprehensive assessment of various risk factors to determine the propensity of a claim and the associated risk levels. The present invention integrates outputs from two preceding models: the first machine learning model predicting the first characteristic of the user and the second machine learning model predicting the second characteristic of the user. The classification into RMAYG categories is based on a set of criteria including:

• A/E Experience (Mortality Experience): Historical data on mortality rates relevant to the policyholder's profile.
• Sourcing Channel: The channel through which the policy was acquired, which may impact risk levels.
• Product Type: The type of insurance product, which influences the risk associated with claims.
• Early Claims Experience: Historical claims data, excluding accidental and COVID-related deaths, to assess the likelihood of early claims.
[0041] At the login stage, policies are tagged into RMAYG cohorts according to their assessed propensity for fraud and claims. The RMAYG classification guides subsequent underwriting actions (an illustrative mapping is sketched after this list):

• Full Underwriting (FULL UW): For high-risk categories, requiring comprehensive verification including income and medical evaluations.
• Video Medical with Doctor (VMER): For intermediate-risk categories, where video-based medical assessments are sufficient.
• Straight Through Processing (STP): For low-risk categories, allowing for streamlined issuance without additional underwriting.
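As an illustration of the tag-to-action guidance above, a simple lookup could be used; the exact Amber and Yellow assignments shown below are assumptions, the specification fixing only Red to full underwriting, Maroon to VMER, and low-risk tags to STP.

```python
# Illustrative lookup from final tag to underwriting action; Amber and Yellow
# assignments are assumptions, not fixed by the specification.
UNDERWRITING_ACTION = {
    "Red":    "FULL UW",  # full underwriting: income and medical verification
    "Maroon": "VMER",     # video medical evaluation with a doctor
    "Amber":  "VMER",     # assumed intermediate-risk handling
    "Yellow": "STP",      # assumed low-risk handling
    "Green":  "STP",      # straight-through processing
}

def recommended_action(final_tag: str) -> str:
    return UNDERWRITING_ACTION[final_tag]
```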
[0042] For instance, policies categorized as "Red" based on high mortality experience and early claims history would require full underwriting, whereas "Green" policies with minimal risk might proceed directly to STP. An example of the colouring based on mortality experience is given below:

[Table: Final tag matrix, with rows for the first prediction outcome (RED, AMBER, YELLOW, GREEN) and columns for the second prediction outcome (RED, MAROON, AMBER, YELLOW, GREEN); each cell gives the final tag for that combination.]


[0043] The present invention's design and implementation offer several notable technical advantages that significantly enhance its functionality and efficiency. By leveraging LightGBM, the system benefits from improved computational speed and scalability, which is crucial for handling large volumes of data. The histogram-based approach of LightGBM reduces the computational complexity of training, allowing the model to process data faster and with less memory usage compared to traditional gradient boosting methods.
[0044] The present invention is a network of models of various algorithms which computes various types of risk at individual source-channel levels and then provides a single risk category with an associated action recommendation to the underwriter. This is unique and complex because models are developed for every segment of the applicant so that biases do not skew the predictions.
[0045] The present invention uses artificial intelligence and is unique in that it is a single predictive intervention which consumes many parallel models, developed both internally and externally, and finally refines the overall prediction percentage (i.e., a measure of how accurate the model's predictions are) at an overall level. Generally, AI models are used in their pure form with a very high intervention population, but the fact that the first and the second machine learning models consume external information as well as mortality calculations, bringing the overall intervention population down into the range of 5-6%, is something very unique to this model.
[0046] Referring to FIG. 2 now, a flowchart of a method 200 for predicting fraudulent users, considering an example of an insurance policy, is illustrated. At step 202, the method comprises creating a base table. This includes mapping all the demographic and product-related variables that can identify risky profiles at an initial stage. The initial stage may include a login stage, for example, when the user logs in for the first time to an application/website provided by the insurance provider.
[0047] At step 204, the method comprises making the data model-ready for processing. This includes data sanity checks, in which variations in the values entered for particular variables as well as missing values are treated. Further, data pre-processing is performed, in which categorical variables are bucketed into respective categories. Oversampling of the data is done to adjust the data imbalance.
[0048] At step 206, the method comprises providing the methodology for risk and early claim. This includes using regression and classification algorithms, such as gradient boosting algorithms, for building the machine learning models for risk and early claim. Each model provides probabilities as output, after which decile analysis is performed and a threshold limit is set to provide the final classification as Red, Amber, Yellow, Green tagging.
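A hedged Python sketch of the decile analysis and thresholding mentioned in step 206 follows; the decile-to-tag cut-offs are illustrative assumptions, not the applicant's calibrated thresholds.

```python
# Hedged sketch: model probabilities are ranked into deciles and the decile bands are
# mapped to tags; the band-to-tag cut-offs below are illustrative assumptions.
import numpy as np
import pandas as pd

def decile_tags(probabilities: np.ndarray) -> pd.Series:
    """Assign Red/Amber/Yellow/Green tags from probability deciles (decile 1 = riskiest)."""
    scores = pd.Series(probabilities)
    codes = pd.Series(pd.qcut(scores, 10, labels=False, duplicates="drop"),
                      index=scores.index)
    deciles = 10 - codes  # 1 = highest predicted probability, 10 = lowest
    return deciles.map(lambda d: "Red" if d <= 2 else
                       "Amber" if d <= 4 else
                       "Yellow" if d <= 7 else "Green")
```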
[0049] At step 208, the method comprises combining the outputs from the first module 102 and the second module 104. The output is classified by the classification module 106, which classifies customers into different taggings based on their probability of an early claim and fraud risk.
[0050] At step 210, the method comprises providing the classification to the underwriting team. This involves providing the classification to the underwriting team via a business rule engine, ensuring that the underwriting team takes appropriate actions on login applications.
[0051] Referring to Figure 3 now, a flowchart of a method 300 for classifying a user at an initial login stage based on one or more characteristics of the user is illustrated. At step 302, the method comprises obtaining a first set of parameters related to the user, wherein the first set of parameters includes historical data, including previous instances of fraud, demographic variables of customers, and behavioral patterns that indicate suspicious activity. The first characteristic of the user includes predicting whether the user is a fraudulent user. At step 304, the method comprises inputting the first set of parameters in a first machine learning model. At step 306, the method comprises predicting a first characteristic of the user based on an output from the first machine learning model to generate a first prediction outcome. The second characteristic of the user includes predicting the chances of early claiming of an insurance policy by the user.
[0052] At step 308, the method comprises obtaining a second set of parameters related to the user, wherein the second set of parameters includes the user's historical claim data and patterns in similar cases. At step 310, the method comprises inputting the second set of parameters in a second machine learning model. At step 312, the method comprises predicting a second characteristic of the user based on an output from the second machine learning model to generate a second prediction outcome. At step 314, the method comprises classifying the user into different categories based on the first prediction outcome and the second prediction outcome. Classifying the user includes classifying the user into the RMAYG (Red, Maroon, Amber, Yellow, Green) categories.
[0053] In one embodiment, the above-mentioned functions of the modules may be implemented using one or more hardware components present in a computing device. For example, the computing device may include the components as mentioned in Figure 4.

[0054] Figure 4 is an exemplary computing device 400 in which one or more embodiments of the present invention may operate, according to an embodiment. In the system schematic of Figure 4, bus 410 is in physical communication with Input/Output device 402, interface 404, memory 406, and processor 408. Bus 410 includes a path that permits components within computing device 400 to communicate with each other. Examples of Input/Output device 402 include peripherals and/or other mechanisms that may enable a user to input information to computing device 400, including a keyboard, computer mice, buttons, touch screens, voice recognition, biometric mechanisms, and the like. Input/Output device 402 also includes a mechanism that outputs information to the user of computing device 400, such as a display, a light emitting diode (LED), a printer, a speaker, and the like.
[0055] Examples of interface 404 include mechanisms that enable computing device 400 to communicate with other computing devices and/or systems through network connections. Examples of memory 406 include random access memory (RAM), read-only memory (ROM), flash memory, and the like. The memory 406 stores information and instructions for execution by processor 408. The processor 408 includes, but is not limited to, a microprocessor, an application specific integrated circuit (ASIC), or a field programmable object array (FPOA) and the like. The processor 408 interprets and executes instructions retrieved from memory 406.
[0056] In one embodiment, the computing device 400 may be responsible for implementing the above-mentioned steps. For example, input parameters of the user may be received using the Input/Output device 402. The machine learning models may be stored in the memory 406 and may be implemented by the processor 408.
[0057] In the development of our machine learning model for identifying fraud and early claims at the login stage, several technical challenges were encountered, particularly during the pre-processing, training, and validation phases. These challenges were crucial in refining our approach and ensuring the model's effectiveness and robustness. One of the primary technical problems faced in the existing art was class imbalance. In the dataset of the present invention, instances of fraudulent claims and early claims were significantly fewer compared to legitimate claims. This imbalance posed a challenge for the model, as traditional learning algorithms tend to be biased towards the majority class, resulting in poor performance on the minority class. To address this, the present invention employs the Synthetic Minority Over-sampling Technique (SMOTE), which is a well-established method for generating synthetic samples in the feature space of the minority class.
[0058] Another significant challenge was managing overfitting and underfitting. Overfitting occurred when the model learned too much from the training data, including the noise introduced by SMOTE, thereby reducing its generalization capability on unseen data. To counteract this, the present invention employs regularization techniques and carefully tuned hyperparameters to find a balance between model complexity and performance. Conversely, underfitting was observed in scenarios where the model failed to capture the underlying patterns of fraudulent behaviour and early claims due to excessive simplification. This was mitigated by selecting more complex models and iteratively refining feature selection to ensure that the model had sufficient capacity to learn from the data.
[0059] The presence of noise in the dataset, particularly in synthetic samples generated by SMOTE, posed challenges in feature engineering and model validation. Noise can obscure the true patterns indicative of fraud and early claims, leading to less reliable predictions. Rigorous feature selection and engineering processes were implemented to minimize the impact of noise, ensuring that only the most relevant and robust features were used for training. Additionally, validating the model with an imbalanced dataset required careful design of evaluation metrics. Standard accuracy metrics were insufficient, so we used precision, recall, F1 score, and confusion matrices to better assess the model's performance.
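An imbalance-aware evaluation of this kind can be sketched as follows, assuming scikit-learn; the helper function is illustrative only.

```python
# A minimal sketch: precision, recall, F1 and a confusion matrix instead of plain accuracy.
from sklearn.metrics import (classification_report, confusion_matrix, f1_score,
                             precision_score, recall_score)

def evaluate(y_true, y_pred) -> None:
    print("precision:", precision_score(y_true, y_pred))
    print("recall:   ", recall_score(y_true, y_pred))
    print("F1 score: ", f1_score(y_true, y_pred))
    print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
    print(classification_report(y_true, y_pred, digits=3))
```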
[0060] Lastly, the scalability of the model was a concern due to the increased computational cost associated with generating synthetic samples and training on a larger, more balanced dataset. We optimized our implementation to handle large-scale data efficiently, ensuring that the model could be deployed in a real-world setting without significant performance bottlenecks. In conclusion, while developing the propensity to longevity and risk model, the present invention encountered and addressed several technical challenges related to class imbalance, overfitting, underfitting, noise, and computational efficiency. These challenges were integral to refining our approach and achieving a robust and effective solution.
[0061] To address class imbalance, SMOTE was employed to generate synthetic samples for the minority class. In the present invention, SMOTE parameters are carefully tuned to ensure a balanced representation without introducing excessive noise. To manage overfitting, the present invention applied techniques to limit model complexity, such as setting appropriate hyperparameters for the Gradient Boosting algorithm. To address underfitting, the present invention provides improved feature engineering by adding relevant features that captured more nuances of the data, ensuring the model had enough complexity to learn effectively.
[0062] To counteract the noise introduced by SMOTE, the present invention implemented advanced feature selection techniques to focus on the most informative features. This included analyzing feature importance scores from the Gradient Boosting model to retain features that provided the most predictive value and reduce the impact of noise. To handle increased data volume and ensure real-time processing, the present invention optimized the model's performance through gradient boosting parameter tuning to balance computational efficiency and predictive accuracy. The present invention also employed efficient data handling practices and streamlined model inference processes to integrate seamlessly into production systems.
[0063] In the present invention, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processor. The implementations described herein are not limited to any specific combinations of hardware circuitry and software.
[0064] In the drawings and specification, there have been disclosed exemplary embodiments of the invention. Although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation of the scope of the invention.

WE CLAIM
1. A method for classifying a user at initial login stage based on one or more characteristics of the user, the method comprising:
obtaining a first set of parameters related to the user, wherein the first set of parameters includes historical data, including previous instances of fraud, demographic variables of customers, and behavioral patterns that indicate suspicious activity;
inputting the first set of parameters in a first machine learning model;
predicting a first characteristic of the user based on an output from the first machine learning model to generate a first prediction outcome;
obtaining a second set of parameters related to the user, wherein the second set of parameters includes user’s historical claim data and patterns in similar cases;
inputting the second set of parameters in a second machine learning model;
predicting a second characteristic of the user based on an output from the second machine learning model to generate a second prediction outcome; and
classifying the user into different categories based on the first prediction outcome and the second prediction outcome.
2. The method as claimed in claim 1, wherein the first machine learning model and the second machine learning model use a logistic regression algorithm and a gradient boost algorithm.

3. The method as claimed in claim 1, wherein predicting the first characteristic of the user includes predicting whether the user is a fraudulent user.
4. The method as claimed in claim 1, wherein predicting the second characteristic of the user includes predicting chances of early claiming of an insurance policy by the user.
5. The method as claimed in claim 1, wherein classifying the user includes classifying the user in RMAYG (Red, Maroon, Amber, Yellow, Green) categories.
6. An apparatus for classifying a user at initial login stage based on one or more characteristics of the user, the apparatus comprising:
a memory; and
a processor coupled with the memory and configured to perform the operations of:
obtaining a first set of parameters related to the user, wherein the first set of parameters includes historical data, including previous instances of fraud, demographic variables of customers, and behavioral patterns that indicate suspicious activity;
inputting the first set of parameters in a first machine learning model;
predicting a first characteristic of the user based on an output from the first machine learning model to generate a first prediction outcome;

obtaining a second set of parameters related to the user, wherein the second set of parameters includes user’s historical claim data and patterns in similar cases;
inputting the second set of parameters in a second machine learning model;
predicting a second characteristic of the user based on an output from the second machine learning model to generate a second prediction outcome; and
classifying the user into different categories based on the first prediction outcome and the second prediction outcome.
7. The apparatus as claimed in claim 6, wherein the first machine learning model and the second machine learning model use a logistic regression algorithm and a gradient boost algorithm.
8. The apparatus as claimed in claim 6, wherein predicting the first characteristic of the user includes predicting whether the user is a fraudulent user.
9. The apparatus as claimed in claim 6, wherein predicting the second characteristic of the user includes predicting chances of early claiming of an insurance policy by the user.
10. The apparatus as claimed in claim 6, wherein classifying the user includes classifying the user in RMAYG (Red, Maroon, Amber, Yellow, Green) categories.

Documents

Application Documents

# Name Date
1 202321059977-STATEMENT OF UNDERTAKING (FORM 3) [06-09-2023(online)].pdf 2023-09-06
2 202321059977-PROVISIONAL SPECIFICATION [06-09-2023(online)].pdf 2023-09-06
3 202321059977-POWER OF AUTHORITY [06-09-2023(online)].pdf 2023-09-06
4 202321059977-FORM 1 [06-09-2023(online)].pdf 2023-09-06
5 202321059977-DRAWINGS [06-09-2023(online)].pdf 2023-09-06
6 202321059977-DRAWING [03-09-2024(online)].pdf 2024-09-03
7 202321059977-CORRESPONDENCE-OTHERS [03-09-2024(online)].pdf 2024-09-03
8 202321059977-COMPLETE SPECIFICATION [03-09-2024(online)].pdf 2024-09-03
9 Abstract 1.jpg 2024-09-26
10 202321059977-Proof of Right [21-02-2025(online)].pdf 2025-02-21