Deploying Deep Learning Techniques to Estimate Probability of Default: Way to Data Driven Credit Risk Modeling
Summary
Accurate estimation of the probability of default (PD) for credit loans is an essential
task for BridgeFund, an online loan broker operating in the Dutch
Small and Medium-Sized Enterprises (SME) market. Advanced machine
learning techniques are increasingly being explored to enhance prediction
accuracy. Traditional models like logistic regression offer clear interpretability
but often lack predictive power compared to more complex algorithms.
Ensemble methods and deep learning techniques show potential
for significant performance improvements in PD quantification. This
study compares XGBoost, Random Forest, Feedforward Neural Networks
(FNN) and Tabular Networks (TabNet) against logistic regression to determine
their efficacy. The results show that XGBoost outperforms logistic
regression and all other models on all evaluation metrics for PD scoring.
However, the "black box" nature of XGBoost raises concerns about model
transparency and stakeholder trust, necessitating careful implementation.
Applying techniques that demystify XGBoost’s decision-making process,
such as the calculation of SHAP values, will enhance the model’s interpretability
and, therefore, its applicability.
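
To make the idea behind SHAP values concrete, the following minimal sketch computes exact Shapley attributions for a toy scoring function in plain Python. It illustrates the underlying principle only: it does not use the SHAP library and is not the model from this study. The function `toy_pd_score`, its three feature names, its coefficients, and the all-zero baseline are hypothetical assumptions chosen for illustration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions of predict(x) relative to predict(baseline).

    Features that are "absent" are replaced by their baseline value.
    The cost grows factorially with the number of features, so this is
    only practical for very small examples.
    """
    n = len(x)
    contrib = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start with every feature absent
        prev = predict(current)
        for i in order:
            current[i] = x[i]         # reveal feature i
            new = predict(current)
            contrib[i] += new - prev  # marginal contribution of feature i
            prev = new
    # Average the marginal contributions over all feature orderings.
    return [c / len(orderings) for c in contrib]


# Hypothetical toy score with an interaction term (an assumption for
# illustration, not the PD model evaluated in the study).
def toy_pd_score(features):
    leverage, months_active, overdraft = features
    return 0.05 + 0.10 * leverage - 0.02 * months_active + 0.08 * leverage * overdraft


applicant = [1.0, 2.0, 1.0]   # hypothetical applicant
baseline = [0.0, 0.0, 0.0]    # hypothetical reference point
phi = shapley_values(toy_pd_score, applicant, baseline)

for name, value in zip(["leverage", "months_active", "overdraft"], phi):
    print(f"{name}: {value:+.4f}")
```

The attributions sum to the difference between the applicant's score and the baseline score, and the interaction term is split evenly between the two features involved. This additivity is what allows a single prediction to be explained feature by feature. Tree-specific algorithms are typically used in practice to obtain such values for models like XGBoost without enumerating all orderings.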