Abstract: The present invention generally relates to a system for explainable credit risk analysis and real-time loan approval using deep learning models. The system integrates structured and unstructured credit-related data from various sources, including transactional, demographic, socio-economic data, collateral images, and voice recordings. A data pre-processing module cleans and transforms the data, while a consolidation module merges and scales features. Deep learning models extract additional attributes from images and speech—classifying collateral characteristics and analyzing customer sentiment. These enriched attributes are fed into classification and regression models to assess creditworthiness and predict eligible loan amounts. A loan value calculation module determines the final loan amount based on predicted percentage and collateral value. To enhance trust, an explainability module provides clear justifications for model decisions by identifying key contributing factors. The complete system is integrated into a loan origination platform to enable real-time, transparent credit risk assessment and loan approval.
Description:
FIELD OF THE INVENTION
The present disclosure relates generally to the field of financial technology and artificial intelligence. More specifically, the invention relates to systems and methods for analyzing credit risk with enhanced explainability and facilitating real-time or near real-time loan approval decisions.
BACKGROUND OF THE INVENTION
The assessment of credit risk is a fundamental process in the financial industry, crucial for decisions regarding lending, credit lines, and other financial products. Traditionally, this process has relied on manual analysis of financial statements, credit reports, and other historical data, often supplemented by statistical models like logistic regression and decision trees. While these methods have provided a foundation for credit assessment, they are often time-consuming, can be subject to human bias, and may not fully capture the complex interplay of factors influencing a borrower's ability and willingness to repay.
With the advent of more powerful computing and sophisticated algorithms, particularly in the field of Artificial Intelligence (AI) and Machine Learning (ML), credit risk analysis has evolved. AI/ML models, such as neural networks and gradient boosting algorithms, have demonstrated the potential for improved accuracy in predicting default risk by analyzing vast amounts of data and identifying complex patterns that may not be apparent through traditional methods.
Concurrently, there has been a growing demand for faster, and ideally real-time, loan approval processes. In today's fast-paced digital economy, consumers and businesses expect quick decisions on credit applications. This has led to the development of automated underwriting systems that leverage technology to streamline data collection and apply predefined rules or statistical models for rapid evaluation.
However, the increased complexity and "black box" nature of some high-performing AI/ML models have introduced challenges, particularly in regulated industries like finance. Regulatory requirements, such as those emphasizing fair lending practices and the right to explanation for adverse credit decisions, necessitate transparency and interpretability in the decision-making process. Applicants who are denied credit have a right to understand the reasons behind the decision. This is where the need for Explainable Artificial Intelligence (XAI) arises – to provide clear, understandable insights into how an AI model arrived at a particular credit risk assessment.
Prior art solutions in credit risk analysis and loan approval often face limitations in one or more of these areas. Traditional manual and statistical methods, while offering some degree of explainability, are typically too slow for real-time approval and may lack the predictive power of more advanced AI models. While some automated systems provide speed, they may rely on less complex models or lack the necessary explainability to satisfy regulatory requirements and build customer trust. Existing applications of AI/ML in credit scoring, while potentially accurate, have often struggled with providing clear and actionable explanations for their predictions, making it difficult to comply with regulations and address concerns about bias. Furthermore, integrating robust explainability techniques into real-time loan approval systems presents significant technical challenges, requiring efficient algorithms and computational infrastructure to provide explanations without introducing unacceptable delays in the decision-making process.
Therefore, a need exists for a system and method that can provide accurate credit risk analysis and real-time loan approval while simultaneously offering clear and understandable explanations for the decisions made, thereby addressing the limitations of prior art solutions and meeting the growing demands for speed, accuracy, transparency, and compliance in the financial lending landscape.
SUMMARY OF THE INVENTION
The present disclosure seeks to provide a system and method for explainable credit risk analysis and real-time loan approval. The invention provides a significant improvement in credit risk analysis by offering accurate, interpretable, and explainable risk assessments. Utilizing a diverse set of data sources, including internal financial transactions, external socio-economic factors, customer demographics, and real-time data, the system employs various Machine Learning and Deep Learning models, such as Gradient Boosting algorithms and deep learning models for tasks like image classification, speech analysis, and Natural Language Processing (NLP).
The system processes this data through predictive and regression models, including Deep Learning models with a specific architecture comprising a Dense input layer with 64 units and ReLU activation, two Dense hidden layers with 64 and 32 units respectively, both with ReLU activation, and a Dense output layer with 1 unit and a sigmoid activation function for binary classification. The results from these advanced models are compared with existing traditional models to validate performance.
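The layer structure described above can be illustrated with a minimal forward-pass sketch in plain Python. It is only an illustration of the stated layer widths and activations: the input width (N_FEATURES), the random weight initialisation, and all function names are assumptions, and the weights are untrained.

```python
import math
import random

# Layer widths follow the text: 64 -> 64 -> 32 -> 1.
# N_FEATURES is an assumption; the description does not state the input width.
N_FEATURES = 10
LAYER_UNITS = [64, 64, 32, 1]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layers(n_in, units, seed=0):
    """Create random (untrained) weights and zero biases for each Dense layer."""
    rng = random.Random(seed)
    layers = []
    for n_out in units:
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
        n_in = n_out
    return layers


def dense(weights, biases, x):
    """One fully connected layer: weighted sum plus bias for every unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(layers, x):
    """ReLU after every layer except the last, which applies a sigmoid."""
    for weights, biases in layers[:-1]:
        x = relu(dense(weights, biases, x))
    weights, biases = layers[-1]
    return sigmoid(dense(weights, biases, x)[0])


layers = init_layers(N_FEATURES, LAYER_UNITS)
applicant = [0.5] * N_FEATURES  # placeholder for an already-scaled feature vector
probability = forward(layers, applicant)
```

The single sigmoid output lies between 0 and 1 and can be read as a class probability for the binary decision; in practice such a network would be trained with a deep learning framework rather than hand-written loops.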
Based on the comprehensive analysis, the system generates a detailed and explainable Credit Risk Report and determines an applicable New Credit Limit. This approach provides financial institutions with enhanced capabilities in Credit Risk Monitoring, enabling them to build a healthier loan portfolio and facilitate real-time or near real-time loan approval decisions with increased transparency and trust.
In an embodiment, a system for explainable credit risk analysis and real-time loan approval using deep learning models is disclosed. The system includes a data source module configured to collect credit-related data from a plurality of sources, including structured data comprising transactional data, customer demographic data, and socio-economic data, and unstructured data comprising collateral images and customer interaction voice recordings.
The system further includes a data pre-processing module coupled to the data source module, configured to clean, transform, and structure the collected data, including handling missing values, encoding categorical variables, scaling numerical variables, and performing feature selection and extraction.
The system further includes a data consolidation module coupled to the data pre-processing module, configured to consolidate the pre-processed data by merging structured and unstructured data features based on a common identifier and apply feature scaling.
The system further includes one or more deep learning feature extraction modules coupled to the data source module and the data consolidation module, comprising: a deep learning model configured to receive and process collateral images to classify image characteristics and generate image-based attribute values; and a deep learning speech model configured to receive and process customer interaction voice recordings to convert voice to text and perform sentiment analysis to generate sentiment-based attribute values; wherein the generated image-based and sentiment-based attribute values are fed into the consolidated data.
The system further includes a model preparation module coupled to the data consolidation module, configured to train and test credit risk evaluation models using the consolidated data, the model preparation module comprising: a classification model configured to determine a likelihood of a customer to repay a loan based on the consolidated data and the generated attribute values; and a regression model configured to predict a loan amount percentage based on the consolidated data and the generated attribute values if the classification model indicates the loan can be given.
The system further includes a loan value calculation module configured to calculate a final loan value based on the predicted loan amount percentage and a current value of collateral accessed via an external interface.
The system further includes an explainability module coupled to the model preparation module, configured to generate explanations for the decisions made by the classification model and the regression model, identifying factors contributing and not contributing to the decision.
The system further includes an integration module configured to integrate the trained models and the explainability module into a loan origination system for real-time credit risk assessment, prediction, and explanation generation.
In another embodiment, a method for evaluating credit risk and approving loans using deep learning-based explainability is disclosed. The method includes collecting credit-related data from a plurality of data sources, including structured data comprising transactional data, customer demographic data, and socio-economic data, and unstructured data comprising collateral images and customer interaction voice recordings.
The method further includes pre-processing the collected data using a processor, including cleaning, transforming, and structuring the data by performing steps comprising handling missing values, encoding categorical variables, scaling numerical variables, and performing feature selection and extraction.
The method further includes consolidating the pre-processed data and applying feature scaling using the processor.
The method further includes extracting features from the unstructured data using one or more deep learning models, comprising: processing collateral images using a deep learning model to classify image characteristics and generate image-based attribute values; and processing customer interaction voice recordings using a deep learning speech model to convert voice to text and perform sentiment analysis to generate sentiment-based attribute values; wherein the generated image-based and sentiment-based attribute values are added to the consolidated data.
The method further includes preparing and training one or more credit risk evaluation models using the consolidated data and the generated attribute values, the preparing and training comprising: training a classification model to determine a likelihood of a customer to repay a loan; and training a regression model to predict a loan amount percentage if the classification model indicates the loan can be given.
The method further includes calculating a final loan value based on the predicted loan amount percentage and a current value of collateral accessed via an external interface.
The method further includes generating explanations for the decisions made by the classification model and the regression model using an explainability model, the explanations identifying factors contributing and not contributing to the decision.
The method further includes providing real-time credit risk assessment, prediction, and explanation generation by integrating the trained models and the explainability model into a loan origination system.
An object of the present disclosure is to construct a robust and intelligent credit risk analysis model using a deep learning framework that integrates diverse and heterogeneous data sources, including structured financial data, socio-economic indicators, customer demographic information, historical transactions, and unstructured data such as collateral images and voice recordings.
Another object of the present disclosure is to develop an explainable and transparent credit limit prediction model that provides clear and interpretable insights into the decision-making process, ensuring adherence to regulatory standards and enabling financial institutions to build trust with customers and regulators.
Another object of the present disclosure is to enable real-time credit decision-making by integrating the developed credit risk assessment and credit limit estimation models with existing Loan Origination Systems (LOS), ensuring seamless operation, improved processing efficiency, and proactive risk mitigation.
Another object of the present disclosure is to enhance the accuracy of creditworthiness evaluation by incorporating global and local financial parameters along with customer behavioral patterns and sentiment analysis derived from unstructured data.
Yet another object of the present invention is to deliver an expeditious and cost-effective method for evaluating credit risk and approving loans using deep learning-based explainability.
To further clarify the advantages and features of the present disclosure, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail in the accompanying drawings.
BRIEF DESCRIPTION OF FIGURES
These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Figure 1 illustrates a block diagram of a system for explainable credit risk analysis and real-time loan approval using deep learning models in accordance with an embodiment of the present disclosure;
Figure 2 illustrates a flow chart of a method for evaluating credit risk and approving loans using deep learning-based explainability in accordance with an embodiment of the present disclosure; and
Figure 3 illustrates an architecture of a system for explainable credit risk analysis in accordance with an embodiment of the present disclosure.
Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not necessarily have been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help improve understanding of aspects of the present disclosure. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
DETAILED DESCRIPTION:
To promote an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.
Reference throughout this specification to “an aspect”, “another aspect” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrase “in an embodiment”, “in another embodiment” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises...a" does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
Referring to Figure 1, a block diagram of a system for explainable credit risk analysis and real-time loan approval using deep learning models is illustrated in accordance with an embodiment of the present disclosure. The system (100) includes a data source module (102) configured to collect credit-related data from a plurality of sources, including structured data comprising transactional data, customer demographic data, and socio-economic data, and unstructured data comprising collateral images and customer interaction voice recordings.
In an embodiment, a data pre-processing module (104) is coupled to the data source module (102), configured to clean, transform, and structure the collected data, including handling missing values, encoding categorical variables, scaling numerical variables, and performing feature selection and extraction.
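Two of the pre-processing steps named above, handling missing values and encoding categorical variables, can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields, the choice of median imputation, and the choice of one-hot encoding are hypothetical and are not specified by the disclosure.

```python
from statistics import median

# Hypothetical raw records; field names are illustrative only.
records = [
    {"customer_id": "C1", "income": 52000.0, "employment": "salaried"},
    {"customer_id": "C2", "income": None, "employment": "self_employed"},
    {"customer_id": "C3", "income": 31000.0, "employment": "salaried"},
]


def impute_median(rows, field):
    """Replace missing values of a numeric field with the median of observed values."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def one_hot(rows, field):
    """Replace a categorical field with one 0/1 column per observed category."""
    categories = sorted({r[field] for r in rows})
    encoded = []
    for r in rows:
        row = {k: v for k, v in r.items() if k != field}
        for category in categories:
            row[f"{field}_{category}"] = 1 if r[field] == category else 0
        encoded.append(row)
    return encoded


clean = one_hot(impute_median(records, "income"), "employment")
```

After these steps every record carries a numeric income and one indicator column per employment category, which is the form that downstream scaling and modelling steps typically expect.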
In an embodiment, a data consolidation module (106) is coupled to the data pre-processing module (104), configured to consolidate the pre-processed data by merging structured and unstructured data features based on a common identifier and apply feature scaling.
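The consolidation step, merging on a common identifier followed by feature scaling, might look like the sketch below. The customer identifiers, feature names, the inner-join behaviour, and the use of min-max scaling are all assumptions made for illustration; the disclosure does not fix a particular join type or scaling method.

```python
# Hypothetical feature tables keyed by a common identifier (customer id).
structured = {
    "C1": {"income": 52000.0, "age": 41.0},
    "C2": {"income": 38000.0, "age": 29.0},
    "C3": {"income": 31000.0, "age": 35.0},
}
unstructured = {
    "C1": {"collateral_score": 0.9, "sentiment_score": 0.4},
    "C2": {"collateral_score": 0.6, "sentiment_score": -0.2},
    # C3 has no unstructured features, so the inner join below drops it.
}


def merge_on_id(left, right):
    """Inner join: keep identifiers present in both tables and combine their features."""
    return {cid: {**left[cid], **right[cid]} for cid in left if cid in right}


def min_max_scale(table):
    """Scale every feature to [0, 1]; constant columns are mapped to 0.0."""
    columns = list(next(iter(table.values())).keys())
    scaled = {cid: {} for cid in table}
    for col in columns:
        values = [row[col] for row in table.values()]
        low, high = min(values), max(values)
        span = high - low
        for cid, row in table.items():
            scaled[cid][col] = (row[col] - low) / span if span else 0.0
    return scaled


merged = merge_on_id(structured, unstructured)
consolidated = min_max_scale(merged)
```

An inner join is used here so that every consolidated record has a full set of features; an implementation could instead keep unmatched customers and impute the missing attributes.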
In an embodiment, one or more deep learning feature extraction modules (108) are coupled to the data source module (102) and the data consolidation module (106), comprising a deep learning image classification model (108A) configured to receive and process collateral images to classify image characteristics and generate image-based attribute values and a deep learning speech model (108B) configured to receive and process customer interaction voice recordings to convert voice to text and perform sentiment analysis to generate sentiment-based attribute values, wherein the generated image-based and sentiment-based attribute values are fed into the consolidated data.
In an embodiment, a model preparation module (110) is coupled to the data consolidation module (106), configured to train and test credit risk evaluation models using the consolidated data, the model preparation module (110) comprising a classification model (110A) configured to determine a likelihood of a customer to repay a loan based on the consolidated data and the generated attribute values and a regression model (110B) configured to predict a loan amount percentage based on the consolidated data and the generated attribute values if the classification model indicates the loan can be given.
In an embodiment, a loan value calculation module (112) is configured to calculate a final loan value based on the predicted loan amount percentage and a current value of collateral accessed via an external interface.
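As a worked illustration of the loan value calculation, the sketch below assumes the simplest reading of the text: the final loan value is the current collateral value multiplied by the predicted loan amount percentage, with the percentage expressed on a 0 to 100 scale. The disclosure states only that the value is based on these two quantities, so the formula, the function name, and the example figures are assumptions.

```python
def final_loan_value(predicted_percentage, collateral_value):
    """Loan value as a percentage of the collateral value (assumed formula)."""
    if not 0.0 <= predicted_percentage <= 100.0:
        raise ValueError("predicted_percentage must be between 0 and 100")
    if collateral_value < 0.0:
        raise ValueError("collateral_value must be non-negative")
    return round(collateral_value * predicted_percentage / 100.0, 2)


# Hypothetical inputs: the regression model predicts 72.5 percent and the
# external interface reports a current collateral value of 200,000.
loan = final_loan_value(72.5, 200000.0)  # 145000.0
```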
In an embodiment, an explainability module (114) is coupled to the model preparation module (110), configured to generate explanations for the decisions made by the classification model and the regression model, identifying factors contributing and not contributing to the decision.
In an embodiment, an integration module (116) is configured to integrate the trained models and the explainability module (114) into a loan origination system for real-time credit risk assessment, prediction, and explanation generation.
In another embodiment, the deep learning image classification model (108A) is a pretrained model configured for image classification, segmentation, and object detection, wherein the deep learning image classification model (108A) is configured to process collateral images by performing convolutional operations and pooling to extract hierarchical features, followed by classification layers trained to output a probability distribution over predefined image characteristics categories, and further configured to generate image-based attribute values by mapping these classifications or specific feature activations to numerical or categorical representations.
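The final mapping step, from a probability distribution over predefined categories to image-based attribute values, can be sketched as follows. The category names, the raw scores, and the attribute names are hypothetical; the convolutional network that would produce the scores is not shown.

```python
import math

# Hypothetical predefined image characteristic categories.
CATEGORIES = ["good_condition", "minor_damage", "major_damage"]


def softmax(logits):
    """Convert raw classification-layer scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def image_attributes(logits):
    """Map classifier output to a categorical label plus numerical attribute values."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "collateral_category": CATEGORIES[best],
        "collateral_confidence": probs[best],
        "category_probabilities": dict(zip(CATEGORIES, probs)),
    }


# Placeholder scores standing in for the output of the classification layers.
attrs = image_attributes([2.0, 0.5, -1.0])
```

The categorical label and the confidence value are the kind of image-based attribute values that could then be added to the consolidated data.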
In a further embodiment, the deep learning speech model (108B) is configured to convert voice to text and perform sentiment analysis to determine if customer interaction is positive, negative or neutral, wherein the deep learning speech model (108B) is configured to process customer interaction voice recordings by first applying a speech-to-text engine utilizing an acoustic model to generate a transcript, and subsequently feeding the transcript into a natural language processing (NLP) model configured for sentiment analysis to classify the sentiment as positive, negative, or neutral, and wherein the sentiment-based attribute values are generated as numerical scores or categorical labels representing the detected sentiment intensity or type, wherein the speech-to-text engine utilizes a recurrent neural network (RNN) or Transformer-based architecture for sequence-to-sequence transduction from acoustic features to text, wherein the NLP model for sentiment analysis employs a deep learning architecture selected from a recurrent neural network (RNN), a convolutional neural network (CNN), or a Transformer network, trained on a corpus of text data labelled with sentiment.
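The generation of sentiment-based attribute values from the sentiment classifier's output can be illustrated with the sketch below. The speech-to-text engine and the NLP model are not shown; the input probabilities are placeholders, and the signed score (probability of positive minus probability of negative) is one possible convention assumed here, not a formula given in the disclosure.

```python
SENTIMENT_CLASSES = {"positive", "negative", "neutral"}


def sentiment_attributes(probabilities):
    """Derive a categorical label and a signed numerical score from class probabilities."""
    if set(probabilities) != SENTIMENT_CLASSES:
        raise ValueError("expected probabilities for positive, negative and neutral")
    label = max(probabilities, key=probabilities.get)
    # Assumed convention: score in [-1, 1], positive values mean positive sentiment.
    score = probabilities["positive"] - probabilities["negative"]
    return {"sentiment_label": label, "sentiment_score": score}


# Placeholder output of a sentiment model for one transcribed customer call.
attrs = sentiment_attributes({"positive": 0.7, "neutral": 0.2, "negative": 0.1})
```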
In one embodiment, the classification model is a Gradient Boosting classifier or a deep learning model comprising at least one input layer, at least one output layer, and at least two hidden layers, wherein the classification model is trained using a structured approach involving hyperparameter tuning through techniques selected from grid search, random search, or Bayesian optimization to identify the optimal configuration for the given consolidated data.
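Grid search, one of the tuning techniques listed above, can be sketched with a deliberately small stand-in model. The example below uses a one-feature logistic regression trained by gradient descent in place of the Gradient Boosting or deep learning classifier; the toy data, the hyperparameter grid, and the use of validation accuracy as the selection criterion are all assumptions.

```python
import itertools
import math

# Hypothetical toy data: (scaled feature, label), where label 1 means "repaid".
train = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (1.5, 1), (2.0, 1)]
valid = [(-1.2, 0), (-0.3, 0), (0.4, 1), (1.3, 1)]


def fit(data, learning_rate, epochs):
    """Train a one-feature logistic regression with batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= learning_rate * grad_w / len(data)
        b -= learning_rate * grad_b / len(data)
    return w, b


def accuracy(params, data):
    w, b = params
    return sum((w * x + b > 0) == bool(y) for x, y in data) / len(data)


# Exhaustively evaluate every combination in the grid on the validation set.
grid = {"learning_rate": [0.01, 0.1, 1.0], "epochs": [10, 100]}
best_score, best_config = -1.0, None
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    score = accuracy(fit(train, **config), valid)
    if score > best_score:
        best_score, best_config = score, config
```

Random search would sample configurations from the same space instead of enumerating them, and Bayesian optimization would choose each new configuration based on the scores observed so far.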
In another embodiment, the regression model is a Gradient Boosting regression model, wherein the explainability module (114) is configured to generate explanations for the regression model's predictions by calculating SHAP values for each feature input to the model for a specific customer instance, wherein the SHAP values represent the contribution of each feature to the difference between the predicted loan amount percentage and the base value of the model, wherein the explainability module (114) is further configured to present the generated SHAP values for the regression model in a human-readable format, such as a local explanation plot or a list of key contributing factors with their impact on the predicted loan amount percentage, to facilitate understanding of how different factors influenced the calculated loan value.
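The presentation of SHAP values as a list of key contributing factors can be sketched as follows. The SHAP values, base value, and feature names are hypothetical placeholders rather than output of a trained model, and the wording of the report lines is an assumption.

```python
def top_factors(shap_values, k=3):
    """Rank features by absolute contribution and describe each in plain language."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    lines = []
    for name, value in ranked[:k]:
        if value > 0:
            direction = "raises"
        elif value < 0:
            direction = "lowers"
        else:
            direction = "does not change"
        lines.append(f"{name}: {direction} the predicted percentage by {abs(value):.1f} points")
    return lines


# Hypothetical values for one customer, in percentage points.
base_value = 60.0
shap_values = {"collateral_purity": 8.0, "sentiment_score": -1.5, "income": 4.0, "age": 0.5}

# SHAP values are additive: base value plus all contributions gives the prediction.
prediction = base_value + sum(shap_values.values())  # 71.0
report = top_factors(shap_values)
```

The additivity shown in the last lines is what lets each listed factor be read as a share of the gap between the model's base value and the prediction for this customer.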
In another embodiment, the explainability module (114) utilizes SHAP (SHapley Additive exPlanations) values to generate explanations, wherein the explainability module (114) is configured to generate explanations for the classification model's decisions by calculating SHAP values for each feature input to the model for a specific customer instance, wherein the SHAP values represent the contribution of each feature to the difference between the prediction and the base value of the model, wherein the explainability module (114) is further configured to present the generated SHAP values in a human-readable format, such as a local explanation plot or a list of key contributing factors with their impact direction (positive, negative, or neutral influence on the prediction), to facilitate understanding of why a particular credit risk classification is made.
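The quantity that SHAP values represent can be illustrated by computing exact Shapley values for a very small model. The sketch below is not the SHAP library and makes simplifying assumptions: the scoring function is a hypothetical stand-in for a trained classifier, and a feature that is absent from a coalition is replaced by a single baseline value rather than averaged over background data. Exact enumeration is feasible only for a handful of features.

```python
import itertools
import math


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return model(x)

    contributions = {}
    for feature in names:
        others = [g for g in names if g != feature]
        total = 0.0
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                with_feature = value(set(subset) | {feature})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        contributions[feature] = total
    return contributions


def model(x):
    """Hypothetical scoring function standing in for a trained classifier."""
    return 0.2 + 0.5 * x["repayment_history"] + 0.3 * x["collateral_score"] * x["sentiment"]


instance = {"repayment_history": 1.0, "collateral_score": 1.0, "sentiment": 1.0}
baseline = {"repayment_history": 0.0, "collateral_score": 0.0, "sentiment": 0.0}

phi = shapley_values(model, instance, baseline)
base_value = model(baseline)
prediction = model(instance)
```

In this example the contributions sum to the difference between the prediction and the base value, and the two interacting features share their joint effect equally, which is the property the explanation relies on.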
In an embodiment, the system (100) further comprises one or more image capturing devices configured to capture data for the data source module (102), the image capturing devices including at least one of: a camera for ornament images, a purity machine for ornament purity, a scanner for OCR, or a biometric device for authentication.
In another embodiment, the integration module (116) is configured to receive customer loan requirements and collateral details from the loan origination system and provide real-time predictions of loan eligibility and amount, along with detailed explanations. The integration module (116) is configured to integrate the trained models and the explainability module (114) into the loan origination system by providing a set of APIs that allow the loan origination system to submit customer data and receive real-time credit risk assessments, predicted loan amounts, and corresponding detailed explanations for the decisions. The integration module (116) is further configured to handle real-time requests from the loan origination system by efficiently routing the incoming data through the pre-processing, data consolidation, deep learning feature extraction, and model preparation modules, and then invoking the explainability module (114) before returning the results to the loan origination system with minimal latency. In addition to generating local explanations for individual customer cases, the explainability module (114) is further configured to provide global explanations summarizing the overall impact and importance of different features across the entire dataset or a segment thereof, to provide insights into the model's general behavior.
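One common way to derive a global explanation from local ones is to average the absolute SHAP value of each feature across customers. The sketch below assumes that approach; the disclosure does not specify how global explanations are computed, and the local SHAP values shown are hypothetical.

```python
def global_importance(local_explanations):
    """Mean absolute SHAP value per feature, sorted from most to least important."""
    features = list(local_explanations[0].keys())
    count = len(local_explanations)
    importance = {
        f: sum(abs(explanation[f]) for explanation in local_explanations) / count
        for f in features
    }
    return dict(sorted(importance.items(), key=lambda item: item[1], reverse=True))


# Hypothetical local SHAP values for two customers in a segment.
local_explanations = [
    {"income": 0.25, "collateral_score": -0.5, "sentiment_score": 0.0},
    {"income": -0.75, "collateral_score": 0.5, "sentiment_score": 0.25},
]

importance = global_importance(local_explanations)
```

Taking absolute values prevents positive and negative contributions from cancelling, so a feature that strongly influences decisions in both directions still ranks as important.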
Yet, in a further embodiment, the regression model for predicting a loan amount percentage for loan applications with explainability is implemented by a system comprising: a data storage unit configured to store consolidated loan application data and generated attribute values; a processor coupled to the data storage unit; and a non-transitory computer-readable medium storing instructions executable by the processor, the instructions causing the processor to perform steps including: loading the consolidated loan application data and the generated attribute values; merging the consolidated loan application data and the generated attribute values based on a common identifier; applying a trained classification model to the merged data to predict an eligibility status for each loan application; filtering the merged data based on the predicted eligibility status to create a subset of data containing eligible loan applications; selecting a target variable representing a loan amount percentage and a plurality of features from the subset of data; processing the selected features and target variable, including handling missing values and encoding categorical variables; training a Gradient Boosting regression model using the processed features and target variable from a training portion of the subset of data; and, for a new loan application: receiving new loan application data including generated attribute values; applying the trained classification model to the new loan application data to predict a new eligibility status; and conditionally applying the trained Gradient Boosting regression model to the new loan application data, only if the new eligibility status indicates the loan is eligible, to predict a loan amount percentage for the new loan application.
The system of this embodiment further comprises an explainability module coupled to the processor, the explainability module configured to: generate explanations for the classification model's decisions by calculating SHAP (SHapley Additive exPlanations) values for each feature input to the classification model for a specific customer instance, wherein the SHAP values represent the contribution of each feature to the difference between the classification prediction and a base value of the classification model; present the generated SHAP values for the classification model in a human-readable format, such as a local explanation plot or a list of key contributing factors with their impact direction (positive or negative influence on the prediction), to facilitate understanding of why a particular credit risk classification is made; generate explanations for the Gradient Boosting regression model's predictions by calculating SHAP values for each feature input to the Gradient Boosting regression model for a specific customer instance, wherein the SHAP values represent the contribution of each feature to the difference between the predicted loan amount percentage and a base value of the Gradient Boosting regression model; and present the generated SHAP values for the Gradient Boosting regression model in a human-readable format, such as a local explanation plot or a list of key contributing factors with their impact on the predicted loan amount percentage, to facilitate understanding of how different factors influenced the calculated loan value.
Figure 2 illustrates a flow chart of a method for evaluating credit risk and approving loans using deep learning-based explainability in accordance with an embodiment of the present disclosure. At step (202), method (200) includes collecting credit-related data from a plurality of data sources, including structured data comprising transactional data, customer demographic data, and socio-economic data, and unstructured data comprising collateral images and customer interaction voice recordings.
At step (204), method (200) includes pre-processing the collected data using a processor, including cleaning, transforming, and structuring the data by performing steps comprising handling missing values, encoding categorical variables, scaling numerical variables, and performing feature selection and extraction.
At step (206), method (200) includes consolidating the pre-processed data and applying feature scaling using the processor.
At step (208), method (200) includes extracting features from the unstructured data using one or more deep learning models, comprising processing collateral images using a deep learning image classification model to classify image characteristics and generate image-based attribute values, and processing customer interaction voice recordings using a deep learning speech model to convert voice to text and perform sentiment analysis to generate sentiment-based attribute values, wherein the generated image-based and sentiment-based attribute values are added to the consolidated data.
At step (210), method (200) includes preparing and training one or more credit risk evaluation models using the consolidated data and the generated attribute values, the preparing and training comprising training a classification model to determine a likelihood of a customer to repay a loan, and training a regression model to predict a loan amount percentage if the classification model indicates the loan can be given.
At step (212), method (200) includes calculating a final loan value based on the predicted loan amount percentage and a current value of collateral accessed via an external interface.
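The calculation at step (212) can be sketched in a few lines. This is a minimal illustration only: it assumes the collateral is valued by weight at a per-gram rate, and the function name, its parameters, and the weight-based valuation are assumptions made for illustration rather than details given in this disclosure. In practice the rate would be obtained through the external interface.

```python
def final_loan_value(predicted_pct: float, weight_grams: float, rate_per_gram: float) -> float:
    """Return the loan value as a percentage of the current collateral value.

    predicted_pct  -- loan amount percentage from the regression model (0 to 100)
    weight_grams   -- collateral weight (hypothetical valuation input)
    rate_per_gram  -- current rate fetched from the external interface (assumed per gram)
    """
    if not 0.0 <= predicted_pct <= 100.0:
        raise ValueError("predicted_pct must be between 0 and 100")
    if weight_grams < 0 or rate_per_gram < 0:
        raise ValueError("weight and rate must be non-negative")
    collateral_value = weight_grams * rate_per_gram
    return round(collateral_value * predicted_pct / 100.0, 2)


# Example: 20 g of collateral at a rate of 6000 per gram, 75% predicted
loan = final_loan_value(75.0, 20.0, 6000.0)
```

With these illustrative inputs the collateral is valued at 120000 and the loan value is 75% of that amount.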
At step (214), method (200) includes generating explanations for the decisions made by the classification model and the regression model using an explainability model, the explanations identifying factors contributing and not contributing to the decision.
At step (216), method (200) includes providing real-time credit risk assessment, prediction, and explanation generation by integrating the trained models and the explainability model into a loan origination system.
In another embodiment, a method of predicting a loan amount percentage for loan applications with explainability comprises loading, by a processor, consolidated loan application data and generated attribute values from a data storage unit. Then, merging, by the processor, the consolidated loan application data and the generated attribute values based on a common identifier. Then, applying, by the processor, a trained classification model to the merged data to predict an eligibility status for each loan application. Then, filtering, by the processor, the merged data based on the predicted eligibility status to create a subset of data containing eligible loan applications. Then, selecting, by the processor, a target variable representing a loan amount percentage and a plurality of features from the subset of data. Then, processing, by the processor, the selected features and target variable, including handling missing values and encoding categorical variables. Then, training, by the processor, a Gradient Boosting regression model using the processed features and target variable from a training portion of the subset of data. Then, for a new loan application: receiving, by the processor, new loan application data including generated attribute values; applying, by the processor, the trained classification model to the new loan application data to predict a new eligibility status; and conditionally applying, by the processor, the trained Gradient Boosting regression model to the new loan application data, only if the new eligibility status indicates the loan is eligible, to predict a loan amount percentage for the new loan application.
Then, generating, by the processor, explanations for the classification model's decisions by calculating SHAP (Shapley Additive explanations) values for each feature input to the classification model for a specific customer instance, wherein the SHAP values represent the contribution of each feature to the difference between the classification prediction and a base value of the classification model. Then, presenting, by the processor, the generated SHAP values for the classification model in a human-readable format, such as a local explanation plot or a list of key contributing factors with their impact direction (positive or negative influence on the prediction), to facilitate understanding of why a particular credit risk classification is made. Then, generating, by the processor, explanations for the Gradient Boosting regression model's predictions by calculating SHAP values for each feature input to the Gradient Boosting regression model for a specific customer instance, wherein the SHAP values represent the contribution of each feature to the difference between the predicted loan amount percentage and a base value of the Gradient Boosting regression model. Then, presenting, by the processor, the generated SHAP values for the Gradient Boosting regression model in a human-readable format, such as a local explanation plot or a list of key contributing factors with their impact on the predicted loan amount percentage, to facilitate understanding of how different factors influenced the calculated loan value.
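The notion that per-feature contributions add up to the difference between a prediction and a base value can be illustrated with an exact Shapley computation. The sketch below is not the SHAP library and is not presented as the implementation used in the embodiment: it enumerates all feature subsets by brute force (feasible only for a handful of features), uses a single reference instance as the base value rather than an expectation over background data, and applies a hypothetical toy model `toy_loan_pct` with hypothetical feature names in place of the trained Gradient Boosting regression model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley contribution of each feature to predict(instance) - predict(baseline)."""
    features = list(instance)
    n = len(features)

    def value(subset: set) -> float:
        # Features in the subset take the customer's value; the rest take the baseline value.
        row = {f: (instance[f] if f in subset else baseline[f]) for f in features}
        return predict(row)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {f}) - value(set(s)))
        phi[f] = total
    return phi


def key_factors(phi: Dict[str, float]) -> List[str]:
    """List factors by absolute impact, with the direction of influence."""
    ranked = sorted(phi.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [f"{name}: {'positive' if v > 0 else 'negative' if v < 0 else 'none'} ({v:+.2f})"
            for name, v in ranked]


def toy_loan_pct(x: Dict[str, float]) -> float:
    # Hypothetical stand-in for a trained regression model (illustration only).
    return 40.0 + 10.0 * x["good_history"] + 5.0 * x["positive_sentiment"] \
        + 4.0 * x["good_history"] * x["new_ornament"]


customer = {"good_history": 1.0, "positive_sentiment": 1.0, "new_ornament": 1.0}
reference = {"good_history": 0.0, "positive_sentiment": 0.0, "new_ornament": 0.0}
phi = shapley_values(toy_loan_pct, customer, reference)
factors = key_factors(phi)
```

The contributions in `phi` sum to the gap between the customer's predicted percentage and the base value, which is the additive property the presented explanations rely on. For tree ensembles with many features, a dedicated tree explainer would typically replace this brute-force enumeration.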
The method (200) further comprises capturing data from an image capturing device including at least one of: a camera for ornament images, a purity machine for ornament purity, a scanner for OCR, or a biometric device for authentication, and providing the captured data to the data source module, wherein processing collateral images using a deep learning vision model includes using a pretrained model configured for image classification, segmentation, and object detection, wherein processing customer interaction voice recordings using a deep learning speech model includes converting voice to text and performing sentiment analysis to determine if customer interaction is positive, negative, or neutral, wherein training a classification model includes training a Gradient Boosting classifier or a deep learning model comprising at least one input layer, at least one output layer, and at least two hidden layers, wherein training a regression model includes training a Gradient Boosting regression model, wherein generating explanations using an explainability model includes utilizing SHAP values, and wherein providing real-time credit risk assessment, prediction, and explanation generation includes receiving customer loan requirements and collateral details from the loan origination system and outputting real-time predictions of loan eligibility and amount, along with detailed explanations.
Figure 3 illustrates an architecture of a system for explainable credit risk analysis in accordance with an embodiment of the present disclosure. Credit risk analysis is the method of evaluating how likely a customer is to repay a loan on time and, where appropriate, of offering the customer a viable new credit. It determines the overall stability of credit, thereby reducing non-performing assets (NPA) and avoiding further losses. Credit risk analysis is crucial because it determines the financial stability and profitability of the financial institution. Financial institutions now monitor credit risk very closely because they are heavily regulated, and loans that move to NPA in turn affect the institution's credibility for lending. Credit scores are defined from an individual's payment history, amount availed, length of credit history, and credit inquiries, and they form a basis for credit risk analysis. Examining the value of collateral for retail lending and considering socio-economic factors also play a major role. Socio-economic factors such as unemployment, economic growth, and interest rates can affect a customer's ability to repay loans. Regulatory conditions and technological disruption can also contribute to credit risk. Credit risk analysis informs risk mitigation strategies, such as whether to grant a loan, whether to limit exposure, whether to apply varied interest rates, and whether to collect additional collateral. Continuous monitoring of credit risk exposures, customer performance, and overall portfolio health by senior management, together with regulatory reporting, is also part of effective credit risk management. Credit risk analysis is an important and complex need of any financial system. It involves assessing the likelihood of a customer defaulting, estimating potential losses, and managing these risks through various tools and strategies.
As the industry evolves, particularly with advancements in technology and varied market and socio-economic conditions, credit risk management techniques must also be adapted to ensure financial stability, acceptance, and growth.
Credit risk analysis earlier relied on statistical methods, such as credit scoring models and regression analysis, to evaluate the probability of default. With the advent of deep learning and machine learning models, there has been a notable transformation in the way credit risk is evaluated, enabling more precise predictions and improved risk management. Deep learning, a branch of machine learning that utilizes neural networks with multiple layers, has demonstrated significant potential in identifying intricate patterns and relationships within large datasets. Deep learning models can capture non-linear relationships, thereby improving prediction accuracy, and can deal with both structured and unstructured data. Such models take less time than manual decision-making and can be more accurate. Deep learning architectures such as artificial neural networks (ANN), convolutional neural networks (CNN), recurrent neural networks (RNN), autoencoders, and generative adversarial networks (GAN) can be used. The advantages of using deep learning for credit risk are that:
• Deep learning can work on complex patterns in data, leading to accurate predictions.
• Since the majority of financial relationships are non-linear, deep learning works better for such relationships.
• Deep learning models can handle vast amounts of data efficiently.
• Deep learning models can discover relevant features during training itself.
• Deep learning models can also integrate unstructured data.
Data pre-processing is another important consideration. Feature selection and dimensionality reduction techniques can also be employed to reduce the complexity of the data. Once the data is prepared, deep learning models can be trained and their performance evaluated using accuracy, precision, recall, F1-score, and ROC curves with the area under the curve (AUC). After training, the model can be deployed for real-time credit risk assessment, credit limit setting, risk management, and prediction. As the technology evolves, deep learning will likely become more integrated into mainstream credit risk management systems, driving improvements in predictive accuracy and enabling better risk mitigation strategies for financial institutions.
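The evaluation measures named above can be computed directly from predictions, as in the following sketch. It is a plain illustration under stated assumptions: labels are encoded as 1 (loan can be given) and 0 (loan cannot be given), the function names are hypothetical, the sample labels and scores are made up for the example, and AUC is computed by the pairwise-ranking definition, which suits small samples only.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for eight applications
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```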
The first part of the process is data collection and data pre-processing.
For a credit risk model, it is important to have the right data, of sufficient size and with the right attributes, for training, testing, and validation. Model accuracy depends heavily on the volume of data available for processing. The data consists of transactional data, customer demographics, socio-economic data, and other forms of structured and unstructured data. The collateral submitted for loans, along with its purity, is also an important part of data collection. Organizations with call centers can analyze data on how they interact with the customer and vice versa.
Data pre-processing is an important aspect of model building. The raw data must be analyzed and converted into a usable, clean, and structured format so that it can be fed as input to the model. Pre-processing plays an important role in model accuracy: poor data quality results in misleading predictions and, in effect, lower model accuracy. The pre-processing steps performed are handling missing values, encoding categorical variables, scaling numerical variables, and feature selection and extraction.
Once data pre-processing is complete, the organization has curated, error-free data that can be fed as input to any machine learning model.
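Three of the pre-processing steps named above can be sketched as follows. The disclosure does not state which techniques are used for each step, so the choices here (median imputation, one-hot encoding, and min-max scaling) are assumptions made for illustration, as are the function names and the sample values.

```python
from statistics import median
from typing import List, Optional, Tuple


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Handle missing values by replacing None with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Scale a numerical variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values: List[str]) -> Tuple[List[List[int]], List[str]]:
    """Encode a categorical variable as one indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


# Made-up sample columns
incomes = [30000.0, None, 50000.0, 70000.0]
employment = ["salaried", "self_employed", "salaried", "salaried"]

scaled_income = min_max_scale(impute_median(incomes))
encoded_employment, employment_categories = one_hot(employment)
```

In a deployed pipeline, the imputation value, scaling bounds, and category list would be learned from the training portion of the data and reused unchanged at prediction time.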
In the first part of the model, images of collateral are passed to a deep learning model capable of image classification, segmentation, and object detection. Once an image is passed to the pretrained deep learning image classification model, it classifies the image into one of four categories: whether the ornament is of a new or old type, along with its name, and, if new, whether it is a sentimental or non-sentimental one. This output value is normalized and added to the curated data as another attribute value for the machine learning model.
Next, voice recordings from the call center customer interaction data are passed to a deep learning speech model used for sentiment analysis. The model first converts the voice into text, and a language model then determines the sentiment of the text. The model is designed to return whether the interaction with the call center is positive, negative, or neutral. The results are also fed into the curated database for classifying the customer as a high, medium, or low risk customer, and this is provided as feedback to the end user for decision-making purposes.
After that, the data is passed to a machine learning model for prediction. A Gradient Boosting classifier model is used to classify whether the loan can be given or not. A deep learning model with one input layer, one output layer, and two hidden layers is also created for prediction. It is used for model comparison and to bring in deep learning alongside the traditional machine learning models. If the loan can be given, the data attributes are passed to another machine learning model, a Gradient Boosting regression model, which predicts the percentage of the loan that can be given.
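The step of adding the image-based and sentiment-based outputs to the curated data can be sketched as a merge on a customer identifier. Everything specific here is an assumption for illustration: the category labels, the numeric encoding of each label, the column names `collateral_image_attr` and `sentiment_attr`, and the choice to store `None` when a customer has no image or recording.

```python
from typing import Dict, List, Optional

# Hypothetical category labels; the real labels come from the trained models.
IMAGE_CATEGORIES: List[str] = ["old", "new_sentimental", "new_non_sentimental"]
SENTIMENT_SCORE: Dict[str, float] = {"negative": 0.0, "neutral": 0.5, "positive": 1.0}


def normalize_category(label: str, categories: List[str]) -> float:
    """Map a category label to a value in [0, 1] by its position in the list."""
    return categories.index(label) / (len(categories) - 1)


def enrich(
    curated: Dict[str, Dict[str, float]],
    image_labels: Dict[str, str],
    sentiment_labels: Dict[str, str],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Add the generated attribute values to each customer's curated row."""
    enriched: Dict[str, Dict[str, Optional[float]]] = {}
    for customer_id, row in curated.items():
        new_row: Dict[str, Optional[float]] = dict(row)
        image = image_labels.get(customer_id)
        mood = sentiment_labels.get(customer_id)
        new_row["collateral_image_attr"] = (
            normalize_category(image, IMAGE_CATEGORIES) if image is not None else None
        )
        new_row["sentiment_attr"] = SENTIMENT_SCORE[mood] if mood is not None else None
        enriched[customer_id] = new_row
    return enriched


curated = {"C001": {"income": 0.5}, "C002": {"income": 0.2}}
enriched = enrich(
    curated,
    image_labels={"C001": "new_sentimental", "C002": "old"},
    sentiment_labels={"C001": "positive"},
)
```

Rows with a missing attribute, such as `C002` above, would then go through the same missing-value handling as the rest of the curated data.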
From the predicted percentage, the loan value is calculated using the current collateral rate, accessed through an API from the global gold rate platform.
Once the loan amount has been determined, an explainable AI model using SHAP identifies which factors contribute most to the result and which do not. It also explains to the user, in text form, which factors were and were not taken into account in the decision.
APIs have been created for all the models, and the trained models are integrated into the loan origination system, where the user enters the customer's loan requirement. Once the collaterals are submitted, each collateral item is scanned using a web camera, and the image is sent to the deep learning image classification model through an API call, which returns the associated parameters. Once all collaterals are scanned, the classification and regression model APIs are called to display to the user the amount that can be given and that the customer is likely to repay, together with a detailed explanation from the explainable model API.
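The call sequence described above, in which the regression model is invoked only when the classification model indicates that the loan can be given, can be sketched with the model APIs represented as plain callables. The function name, the argument names, and the shape of the returned dictionary are hypothetical, and the lambdas in the usage example are stand-ins for the real model endpoints.

```python
from typing import Callable, Dict


def assess_application(
    features: Dict[str, float],
    image_attrs: Dict[str, float],
    predict_eligible: Callable[[Dict[str, float]], bool],
    predict_loan_pct: Callable[[Dict[str, float]], float],
    explain: Callable[[Dict[str, float]], str],
    collateral_value: float,
) -> Dict[str, object]:
    """Run one application through classification, optional regression, and explanation."""
    # Combine the curated customer features with the attributes returned
    # by the image classification call for the scanned collateral.
    row = {**features, **image_attrs}

    eligible = predict_eligible(row)
    result: Dict[str, object] = {
        "eligible": eligible,
        "loan_pct": None,
        "loan_value": None,
        "explanation": explain(row),
    }

    # The regression model is consulted only for eligible applications.
    if eligible:
        pct = predict_loan_pct(row)
        result["loan_pct"] = pct
        result["loan_value"] = collateral_value * pct / 100.0
    return result


# Usage with stand-in callables in place of the model APIs
decision = assess_application(
    features={"score": 0.8},
    image_attrs={"collateral_image_attr": 0.5},
    predict_eligible=lambda row: row["score"] > 0.5,
    predict_loan_pct=lambda row: 70.0,
    explain=lambda row: "score: positive influence",
    collateral_value=100000.0,
)
```

Passing the models in as callables keeps the flow independent of how each API is hosted, which is one possible design choice and not one the disclosure prescribes.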
Many existing risk analysis models are reactive, built as management information system (MIS) reports for higher management, and use a small number of attributes. Here the attributes are more numerous and include demographics, collateral, socio-economic factors, sentiment factors, and transactions with internal ratings and scores. The main novelty and advantage is that the model is integrated with a loan origination system for real-time prediction, bringing in the transparency that regulators insist on. In addition, few AI and ML based models use deep learning for credit risk analysis.
The developed credit risk model represents a significant improvement in credit risk analysis, providing an accurate, interpretable, and explainable risk report and the applicable new credit limit by considering internal and external data sources, which consist of financial transactions, socio-economic factors, customer demographics, and real-time data. This model gives financial institutions an upper hand in credit risk monitoring and in building a healthier portfolio. The goal of this project is to develop a deep learning model that goes beyond the mostly traditional existing models, in which only some customer demographics and transactions are considered. Focus was given to ensuring that the model predicts well on complex data structures and estimates how much new credit can be given. Adding the explainability aspect improves model transparency and aligns the model with the regulatory framework. This model will be integrated in real time with the loan origination system, whereby real-time assessment of the customer can be done. This project will open doors for the research community as well.
The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.
Claims:
1. A system for explainable credit risk analysis and real-time loan approval using deep learning models, comprising:
a data source module configured to collect credit-related data from a plurality of sources, including structured data comprising transactional data, customer demographic data, and socio-economic data, and unstructured data comprising collateral images and customer interaction voice recordings;
a data pre-processing module coupled to the data source module, configured to clean, transform, and structure the collected data, including handling missing values, encoding categorical variables, scaling numerical variables, and performing feature selection and extraction;
a data consolidation module coupled to the data pre-processing module, configured to consolidate the pre-processed data by merging structured and unstructured data features based on a common identifier and apply feature scaling;
one or more deep learning feature extraction modules coupled to the data source module and the data consolidation module, comprising:
a deep learning image classification model configured to receive and process collateral images to classify image characteristics and generate image-based attribute values; and
a deep learning speech model configured to receive and process customer interaction voice recordings to convert voice to text and perform sentiment analysis to generate sentiment-based attribute values;
wherein the generated image-based and sentiment-based attribute values are fed into the consolidated data;
a model preparation module coupled to the data consolidation module, configured to train and test credit risk evaluation models using the consolidated data, the model preparation module comprising:
a classification model configured to determine a likelihood of a customer to repay a loan based on the consolidated data and the generated attribute values; and
a regression model configured to predict a loan amount percentage based on the consolidated data and the generated attribute values if the classification model indicates the loan can be given;
a loan value calculation module configured to calculate a final loan value based on the predicted loan amount percentage and a current value of collateral accessed via an external interface;
an explainability module coupled to the model preparation module, configured to generate explanations for the decisions made by the classification model and the regression model, identifying factors contributing and not contributing to the decision; and
an integration module configured to integrate the trained models and the explainability module into a loan origination system for real-time credit risk assessment, prediction, and explanation generation.
2. The system as claimed in claim 1, wherein the deep learning image classification model is a pretrained model configured for image classification, segmentation, and object detection, wherein the deep learning image classification model is configured to process collateral images by performing convolutional operations and pooling to extract hierarchical features, followed by classification layers trained to output a probability distribution over predefined image characteristics categories, and further configured to generate image-based attribute values by mapping these classifications or specific feature activations to numerical or categorical representations.
3. The system as claimed in claim 1, wherein the deep learning speech model is configured to convert voice to text and perform sentiment analysis to determine if customer interaction is positive, negative or neutral, wherein the deep learning speech model is configured to process customer interaction voice recordings by first applying a speech-to-text engine utilizing an acoustic model to generate a transcript, and subsequently feeding the transcript into a natural language processing (NLP) model configured for sentiment analysis to classify the sentiment as positive, negative, or neutral, and wherein the sentiment-based attribute values are generated as numerical scores or categorical labels representing the detected sentiment intensity or type, wherein the speech-to-text engine utilizes a recurrent neural network (RNN) or Transformer-based architecture for sequence-to-sequence transduction from acoustic features to text, wherein the NLP model for sentiment analysis employs a deep learning architecture selected from a recurrent neural network (RNN), a convolutional neural network (CNN), or a Transformer network, trained on a corpus of text data labelled with sentiment.
4. The system as claimed in claim 1, wherein the classification model is a Gradient Boosting classifier model or a deep learning model comprising at least one input layer, at least one output layer, and at least two hidden layers, wherein the classification model is trained using a structured approach involving hyperparameter tuning through techniques selected from grid search, random search, or Bayesian optimization to identify the optimal configuration for the given consolidated data.
5. The system as claimed in claim 1, wherein the regression model for predicting a loan amount percentage for loan applications with explainability, comprising:
a data storage unit configured to store consolidated loan application data and generated attribute values;
a processor coupled to the data storage unit; and
a non-transitory computer-readable medium storing instructions executable by the processor, the instructions causing the processor to perform steps including:
loading the consolidated loan application data and the generated attribute values;
merging the consolidated loan application data and the generated attribute values based on a common identifier;
applying a trained classification model to the merged data to predict an eligibility status for each loan application;
filtering the merged data based on the predicted eligibility status to create a subset of data containing eligible loan applications;
selecting a target variable representing a loan amount percentage and a plurality of features from the subset of data;
processing the selected features and target variable, including handling missing values and encoding categorical variables;
training a Gradient Boosting regression model using the processed features and target variable from a training portion of the subset of data;
for a new loan application:
receiving new loan application data including generated attribute values;
applying the trained classification model to the new loan application data to predict a new eligibility status;
conditionally applying the trained Gradient Boosting regression model to the new loan application data, only if the new eligibility status indicates the loan is eligible, to predict a loan amount percentage for the new loan application; and
an explainability module coupled to the processor, the explainability module configured to:
generate explanations for the classification model's decisions by calculating SHAP (Shapley Additive explanations) values for each feature input to the classification model for a specific customer instance, wherein the SHAP values represent the contribution of each feature to the difference between the classification prediction and a base value of the classification model;
present the generated SHAP values for the classification model in a human-readable format, such as a local explanation plot or a list of key contributing factors with their impact direction (positive or negative influence on the prediction), to facilitate understanding of why a particular credit risk classification is made;
generate explanations for the Gradient Boosting regression model's predictions by calculating SHAP values for each feature input to the Gradient Boosting regression model for a specific customer instance, wherein the SHAP values represent the contribution of each feature to the difference between the predicted loan amount percentage and a base value of the Gradient Boosting regression model; and
present the generated SHAP values for the Gradient Boosting regression model in a human-readable format, such as a local explanation plot or a list of key contributing factors with their impact on the predicted loan amount percentage, to facilitate understanding of how different factors influenced the calculated loan value.
6. The system as claimed in claim 1, further comprising one or more image capturing devices configured to capture data for the data source module, the image capturing devices including at least one of: a camera for ornament images, a purity machine for ornament purity, a scanner for OCR, or a biometric device for authentication.
7. The system as claimed in claim 1, wherein the integration module is configured to receive customer loan requirements and collateral details from the loan origination system and provide real-time predictions of loan eligibility and amount, along with detailed explanations, wherein the integration module is configured to integrate the trained models and the explainability module into a loan origination system by providing a set of APIs that allow the loan origination system to submit customer data and receive real-time credit risk assessments, predicted loan amounts, and corresponding detailed explanations for the decisions, wherein the integration module is configured to handle real-time requests from the loan origination system by efficiently routing the incoming data through the pre-processing, consolidated data, deep learning feature extraction, and model preparation modules, and then invoking the explainability module before returning the results to the loan origination system with minimal latency, wherein the explainability module is further configured to provide global explanations summarizing the overall impact and importance of different features across the entire dataset or a segment thereof, in addition to generating local explanations for individual customer cases, to provide insights into the model's general behaviour.
8. A method for evaluating credit risk and approving loans using deep learning-based explainability, comprising:
collecting credit-related data from a plurality of data sources, including structured data comprising transactional data, customer demographic data, and socio-economic data, and unstructured data comprising collateral images and customer interaction voice recordings;
pre-processing the collected data using a processor, including cleaning, transforming, and structuring the data by performing steps comprising handling missing values, encoding categorical variables, scaling numerical variables, and performing feature selection and extraction;
consolidating the pre-processed data and applying feature scaling using the processor;
extracting features from the unstructured data using one or more deep learning models, comprising:
processing collateral images using a deep learning image classification model to classify image characteristics and generate image-based attribute values; and
processing customer interaction voice recordings using a deep learning speech model to convert voice to text and perform sentiment analysis to generate sentiment-based attribute values;
wherein the generated image-based and sentiment-based attribute values are added to the consolidated data;
preparing and training one or more credit risk evaluation models using the consolidated data and the generated attribute values, the preparing and training comprising:
training a classification model to determine a likelihood of a customer to repay a loan; and
training a regression model to predict a loan amount percentage if the classification model indicates the loan can be given;
calculating a final loan value based on the predicted loan amount percentage and a current value of collateral accessed via an external interface;
generating explanations for the decisions made by the classification model and the regression model using an explainability model, the explanations identifying factors contributing and not contributing to the decision; and
providing real-time credit risk assessment, prediction, and explanation generation by integrating the trained models and the explainability model into a loan origination system.
9. The method as claimed in claim 8, wherein a loan amount percentage prediction for loan applications with explainability, comprising:
loading, by a processor, consolidated loan application data and generated attribute values from a data storage unit;
merging, by the processor, the consolidated loan application data and the generated attribute values based on a common identifier;
applying, by the processor, a trained classification model to the merged data to predict an eligibility status for each loan application;
filtering, by the processor, the merged data based on the predicted eligibility status to create a subset of data containing only eligible loan applications;
selecting, by the processor, a target variable representing a loan amount percentage and a plurality of features from the subset of data;
processing, by the processor, the selected features and target variable, including handling missing values and encoding categorical variables;
training, by the processor, a Gradient Boosting regression model using the processed features and target variable from a training portion of the subset of data;
for a new loan application:
receiving, by the processor, new loan application data including generated attribute values;
applying, by the processor, the trained classification model to the new loan application data to predict a new eligibility status; and
conditionally applying, by the processor, the trained Gradient Boosting regression model to the new loan application data, only if the new eligibility status indicates the loan is eligible, to predict a loan amount percentage for the new loan application;
generating, by the processor, explanations for the classification model's decisions by calculating SHAP (Shapley Additive explanations) values for each feature input to the classification model for a specific customer instance, wherein the SHAP values represent the contribution of each feature to the difference between the classification prediction and a base value of the classification model;
presenting, by the processor, the generated SHAP values for the classification model in a human-readable format, such as a local explanation plot or a list of key contributing factors with their impact direction (positive or negative influence on the prediction), to facilitate understanding of why a particular credit risk classification is made;
generating, by the processor, explanations for the Gradient Boosting regression model's predictions by calculating SHAP values for each feature input to the Gradient Boosting regression model for a specific customer instance, wherein the SHAP values represent the contribution of each feature to the difference between the predicted loan amount percentage and a base value of the Gradient Boosting regression model; and
presenting, by the processor, the generated SHAP values for the Gradient Boosting regression model in a human-readable format, such as a local explanation plot or a list of key contributing factors with their impact on the predicted loan amount percentage, to facilitate understanding of how different factors influenced the calculated loan value.
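The additive property recited above, namely that the per-feature contributions account for the difference between a prediction and a base value, may be illustrated by the following minimal sketch. It computes exact Shapley values by enumerating feature coalitions rather than calling any SHAP library. The function `toy_loan_percentage`, its three features, its coefficients, and the single baseline row standing in for the base value are all hypothetical and are not taken from the specification.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance`, relative to `baseline`.

    Assumption: features absent from a coalition are set to their value in a
    single baseline row, which plays the role of the model's base value.
    """
    n = len(instance)

    def value(coalition):
        # Features in the coalition take the instance's value; the rest stay at baseline.
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                contributions[i] += weight * (value(coalition | {i}) - value(coalition))
    return contributions


def toy_loan_percentage(features):
    """Hypothetical stand-in for a trained regression model (not the claimed model)."""
    purity, sentiment, income = features
    return 40.0 + 30.0 * purity + 5.0 * sentiment + 10.0 * purity * income


feature_names = ["ornament_purity", "sentiment_score", "scaled_income"]  # hypothetical
customer = [0.9, 1.0, 0.5]
baseline = [0.5, 0.0, 0.2]

phi = shapley_values(toy_loan_percentage, customer, baseline)
base_value = toy_loan_percentage(baseline)
prediction = toy_loan_percentage(customer)

# List the key contributing factors with their impact direction, largest first.
for name, contribution in sorted(zip(feature_names, phi), key=lambda p: -abs(p[1])):
    direction = "positive" if contribution >= 0 else "negative"
    print(f"{name}: {contribution:+.3f} ({direction})")
print(f"base value {base_value:.3f} + contributions = {base_value + sum(phi):.3f}")
```

Exact enumeration grows exponentially with the number of features, so it is practical only for small illustrations of this kind; a production system would be expected to rely on an approximation suited to tree ensembles.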
10. The method as claimed in claim 8, further comprising capturing data from an image capturing device including at least one of: a camera for ornament images, a purity machine for ornament purity, a scanner for OCR, or a biometric device for authentication, and providing the captured data to the data source module, wherein processing collateral images using a deep learning image classification model includes using a pretrained model configured for image classification, segmentation, and object detection, wherein processing customer interaction voice recordings using a deep learning speech model includes converting voice to text and performing sentiment analysis to determine whether the customer interaction is positive, negative, or neutral, wherein training a classification model includes training a Gradient Boosting classifier or a deep learning model comprising at least one input layer, at least one output layer, and at least two hidden layers, wherein training a regression model includes training a Gradient Boosting regression model, wherein generating explanations using an explainability model includes utilizing SHAP values, and wherein providing real-time credit risk assessment, prediction, and explanation generation includes receiving customer loan requirements and collateral details from the loan origination system and outputting real-time predictions of loan eligibility and amount, along with detailed explanations.
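The real-time flow recited in the claims, in which the regression model is applied only when the classification model indicates eligibility and the loan amount follows from the predicted percentage and the collateral value, may be sketched as below. The names `LoanDecision` and `assess_application`, the stand-in model callables, the feature keys, and the formula of collateral value multiplied by percentage divided by 100 are assumptions made for illustration only; the specification does not fix them.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class LoanDecision:
    eligible: bool
    loan_percentage: Optional[float]  # None when the application is not eligible
    loan_amount: Optional[float]      # None when the application is not eligible


def assess_application(
    features: Mapping[str, float],
    collateral_value: float,
    classify: Callable[[Mapping[str, float]], bool],
    regress: Callable[[Mapping[str, float]], float],
) -> LoanDecision:
    """Apply the classifier first, and the regressor only for eligible applications."""
    if not classify(features):
        # The regression model is never invoked for an ineligible application.
        return LoanDecision(eligible=False, loan_percentage=None, loan_amount=None)
    percentage = regress(features)
    # Assumed loan value calculation: a percentage of the collateral value.
    amount = collateral_value * percentage / 100.0
    return LoanDecision(eligible=True, loan_percentage=percentage, loan_amount=amount)


# Hypothetical stand-ins for the trained models; real models would be loaded instead.
def stub_classifier(features: Mapping[str, float]) -> bool:
    return features["credit_score"] >= 650


def stub_regressor(features: Mapping[str, float]) -> float:
    return 70.0 + 5.0 * features["sentiment_score"]


decision = assess_application(
    {"credit_score": 700, "sentiment_score": 1.0},
    collateral_value=200000.0,
    classify=stub_classifier,
    regress=stub_regressor,
)
print(decision)
```

Passing the two models as callables keeps the gating logic independent of any particular model library, so the same function could wrap a Gradient Boosting classifier and regressor or a deep learning model.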
| # | Name | Date |
|---|---|---|
| 1 | 202541052732-STATEMENT OF UNDERTAKING (FORM 3) [30-05-2025(online)].pdf | 2025-05-30 |
| 2 | 202541052732-FORM 1 [30-05-2025(online)].pdf | 2025-05-30 |
| 3 | 202541052732-FIGURE OF ABSTRACT [30-05-2025(online)].pdf | 2025-05-30 |
| 4 | 202541052732-DRAWINGS [30-05-2025(online)].pdf | 2025-05-30 |
| 5 | 202541052732-DECLARATION OF INVENTORSHIP (FORM 5) [30-05-2025(online)].pdf | 2025-05-30 |
| 6 | 202541052732-COMPLETE SPECIFICATION [30-05-2025(online)].pdf | 2025-05-30 |
| 7 | 202541052732-Proof of Right [30-06-2025(online)].pdf | 2025-06-30 |
| 8 | 202541052732-FORM-26 [30-06-2025(online)].pdf | 2025-06-30 |
| 9 | 202541052732-FORM-9 [05-07-2025(online)].pdf | 2025-07-05 |
| 10 | 202541052732-FORM-26 [05-07-2025(online)].pdf | 2025-07-05 |
| 11 | 202541052732-FORM-8 [19-07-2025(online)].pdf | 2025-07-19 |
| 12 | 202541052732-FORM 18 [21-07-2025(online)].pdf | 2025-07-21 |