Fintech Sector

Predicting Bank Loan Defaults for Profit Maximization

Predictive Modeling

Developing an AI model for loan default prediction and using it to increase bank profits?
Challenge Accepted!

Process & Story

A company in the fintech industry wanted to develop an AI model that could predict bank loan defaults and use those predictions to maximize profit. The company had access to a large dataset of its customers' credit records and related activity data that could be used to automate credit scoring and streamline credit approval decisions.

Problem

In existing credit scoring systems, agents had to manually review all of the documents and other information to decide on each loan approval, which was very time-consuming. In addition, there was no comprehensive way to determine which customers would default on their loans without going through every case file by hand. At the same time, banks earn additional interest from clients who are late with their installments, so the credit decision also had to be optimized for the bank's profit.

Solution

To predict loan defaults, we applied logistic regression, a parametric machine learning algorithm that models the probability of a case falling into one of two classes (the client will default on the loan or will have a clear payment status).
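As a rough illustration of how logistic regression turns client features into a default probability, the sketch below applies the logistic (sigmoid) function to a weighted sum of inputs. The feature names, weights, and intercept are made up for the example and are not taken from the project's model.

```python
import math

def default_probability(features, weights, intercept):
    """Return P(default) as sigmoid(intercept + sum(w_i * x_i))."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and client record, for illustration only.
example_weights = {"debt_to_income": 2.0, "credit_history_years": -0.1}
example_client = {"debt_to_income": 0.5, "credit_history_years": 10.0}

p_default = default_probability(example_client, example_weights, intercept=0.0)
print(f"Estimated default probability: {p_default:.2f}")
```

A positive weight pushes the probability of default up as the feature grows, and a negative weight pushes it down. In practice the weights would be learned from historical data rather than set by hand.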

The model was trained on historical loan data, including loans that defaulted, to learn the patterns associated with defaults.

The results of the logistic regression showed that, besides some obvious features such as credit score, debt-to-income ratio, or length of credit history, some less obvious ones were also significant for predicting loan default (e.g., demographic data, homeownership).

Because the existing dataset is not balanced, meaning that there are many more customers with a clear loan status than customers who default, we used sampling methods to address this issue. Here, sampling means resampling the training data so that defaulted and non-defaulted loans are more evenly represented when the model is fit.
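The case study does not say which sampling methods were used. As one common option, the sketch below shows random oversampling: minority-class rows are duplicated at random until both classes have the same size. The function name, the `defaulted` label key, and the toy records are assumptions made for the example.

```python
import random

def oversample_minority(rows, label_key="defaulted", seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        raise ValueError("cannot oversample: one class has no examples")
    # Draw with replacement from the minority class to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced

# Toy data: 8 repaid loans and 2 defaults (hypothetical records).
loans = [{"id": i, "defaulted": 0} for i in range(8)]
loans += [{"id": 100 + i, "defaulted": 1} for i in range(2)]

balanced_loans = oversample_minority(loans)
print(sum(r["defaulted"] for r in balanced_loans), "defaults out of", len(balanced_loans))
```

Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the real class proportions.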

Another important part of the solution was to make it interpretable for business users, so they can easily identify which features have the highest impact on predicting defaults. If the model is too complex to comprehend, bank agents will find it hard to use as a decision-making tool.

Figure: Case study schema for Predicting Bank Loan Defaults for Profit Maximization

Tools

Python

TensorFlow

Google Cloud Platform

Challenges

Model explainability
Like many other AI-based fintech solutions, this model was required by law and by the regulatory authorities to be explainable. Our goal was to develop a model that could explain its result for every processed case step by step, showing how much each input parameter affected the likelihood of the case falling into one of the two categories (loans settled on time or defaulted loans). We chose logistic regression for binary classification because it is highly interpretable by nature.
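One way such a per-case explanation can be produced for a logistic regression model is to split the score into per-feature contributions to the log-odds of default. The sketch below is a minimal illustration, assuming standardized inputs; the feature names and coefficients are hypothetical and not the project's actual values.

```python
import math

def explain_prediction(features, weights, intercept):
    """Break a logistic regression score into per-feature log-odds contributions."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    log_odds = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    # Sort so the features that moved the score most come first.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked

# Hypothetical coefficients and one standardized applicant record.
case_weights = {"credit_score": -1.5, "debt_to_income": 1.0, "owns_home": -0.5}
case_features = {"credit_score": 1.0, "debt_to_income": 2.0, "owns_home": 1.0}

case_probability, case_ranked = explain_prediction(case_features, case_weights, intercept=0.0)
for name, contribution in case_ranked:
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{name}: {direction} the log-odds of default by {abs(contribution):.2f}")
```

Because the contributions add up to the total log-odds, an agent can see which inputs drove a particular decision and in which direction.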

Model interpretability and parametrization
Because the model was developed for profit maximization, its results had to be interpretable. This means that each parameter of the regression model should give insight into whether borrowers pay on time. The model was also designed to give the financial institution the freedom to alter the settings and feature importance to match broader strategic decisions (e.g., how changing the interest rate in a current sales campaign affects the default risk assessment, loan performance, and overall annual income). We provided a fully transparent solution with attribute-level weight estimates for the default probability.
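The case study does not describe how default probabilities were turned into profit-oriented decisions. As a simplified sketch under stated assumptions, the code below uses a two-outcome model: a repaid loan earns the interest, and a defaulted loan loses a fixed share of the principal. The function names and all figures are hypothetical, and late-payment interest is ignored.

```python
def expected_profit(p_default, loan_amount, interest_rate, loss_given_default):
    """Expected profit of approving one loan under a simple two-outcome model."""
    gain_if_repaid = loan_amount * interest_rate
    loss_if_default = loan_amount * loss_given_default
    return (1.0 - p_default) * gain_if_repaid - p_default * loss_if_default

def break_even_probability(interest_rate, loss_given_default):
    """Default probability at which the expected profit is exactly zero."""
    return interest_rate / (interest_rate + loss_given_default)

# Hypothetical figures: 10% interest, 40% of the principal lost on default.
cutoff = break_even_probability(interest_rate=0.10, loss_given_default=0.40)
print(f"Approve while P(default) is below {cutoff:.2f}")

# A low-risk applicant under the same assumptions has a positive expected profit.
print(expected_profit(0.05, 10_000, 0.10, 0.40) > 0)
```

Under these assumptions, raising the interest rate raises the break-even probability, which is one way a pricing change can feed back into approval decisions.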

Training datasets imbalance
The number of people defaulting on their loans is much smaller than the number of people paying on time. This affects the training of the logistic regression model: default cases are far less represented than non-default ones, which makes it harder to train a model that predicts defaults accurately while limiting the number of false-positive decisions. For this reason, we used various sampling methods to even out these numbers and reach the expected accuracy level for default prediction.
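To see why imbalance makes plain accuracy misleading, consider the toy example below, which treats default as the positive class. The labels are invented for illustration: a "model" that never predicts a default still scores 80% accuracy while catching no defaults at all.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes, treating 1 (default) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall(y_true, y_pred):
    """Precision and recall for the default class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels: 2 defaults among 10 loans; the predictions contain no defaults.
actual = [0] * 8 + [1] * 2
always_repaid = [0] * 10

accuracy = sum(t == p for t, p in zip(actual, always_repaid)) / len(actual)
print(accuracy, precision_recall(actual, always_repaid))
```

Tracking precision and recall for the default class alongside accuracy makes this failure mode visible.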

Fast framework and infrastructure
The model had to work with a large dataset, and predictions had to be fast enough for banks to assess credit risk, provide accurate credit scores, and make AI-backed loan decisions within minutes of receiving each new loan application.
