Fintech Sector

Loan Application Fraud Detection

Fraud Detection Predictive Modeling

Developing a model that can identify fraudulent loan applications for a financial institution?
Challenge Accepted!

Process & Story

A company from the banking sector approached us with a significant problem: they wanted an AI model that could identify fraudulent loan applications. Banks and lenders widely struggle with this issue, and it causes them substantial losses.

Problem

Loan and mortgage fraud are common in the financial world. Application fraud is a type of criminal activity in which a dishonest individual uses a stolen or fabricated identity to apply for a loan or line of credit with no intention of repaying it. The con artist establishes credible-looking credit and account activity over time to gain access to larger loans and higher credit limits. Fraudsters often use a bust-out technique, a broader scheme that involves committing numerous instances of application fraud. Over time, the fraudster builds up a number of credit lines, then maxes out all of them in rapid succession before disappearing.
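The bust-out pattern can be turned into a simple engineered signal. The sketch below is a hypothetical heuristic, not the feature set used in this project: it assumes each credit line is represented by its limit and a short history of period-end balances, and the utilization thresholds (30% and 90%) and the three-line minimum are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CreditLine:
    # Hypothetical representation: credit limit plus period-end balances, oldest first
    limit: float
    balances: List[float]


def count_sudden_maxouts(lines: List[CreditLine],
                         low_util: float = 0.3,
                         high_util: float = 0.9) -> int:
    """Count lines kept at low utilization that are maxed out in the latest period."""
    count = 0
    for line in lines:
        if line.limit <= 0 or len(line.balances) < 2:
            continue  # not enough information to judge this line
        utilization = [b / line.limit for b in line.balances]
        earlier, latest = utilization[:-1], utilization[-1]
        if max(earlier) <= low_util and latest >= high_util:
            count += 1
    return count


def looks_like_bust_out(lines: List[CreditLine], min_lines: int = 3) -> bool:
    """Flag borrowers who max out several previously quiet credit lines at once."""
    return count_sudden_maxouts(lines) >= min_lines


if __name__ == "__main__":
    suspicious = [
        CreditLine(1000, [100, 200, 950]),
        CreditLine(5000, [0, 1000, 5000]),
        CreditLine(2000, [300, 500, 1900]),
    ]
    print(looks_like_bust_out(suspicious))  # True
```

In practice, a flag like this would more plausibly be fed to a model as one input feature than used as a standalone rule.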

Identifying these cases manually is challenging because of the large volume of applications that must be reviewed each day. Fraudulent applications take a long time to detect and even longer to deny, which increases revenue losses. Our client, like other financial institutions, was facing exactly this issue.

Solution

The solution was to create a model that could screen all loan applications and separate the probable fraudulent ones from the genuine ones within minutes.

Our team started by building a thorough understanding of the business process. We learned how the client and its customers interact in this transaction, including deciding whether to offer a loan, writing a contract, checking credit scores, processing loans, and other related activities.

We started by researching what kind of data was available and performing exploratory data analysis. Because loan fraud at the application stage is relatively rare, we had to deal with a significant class imbalance. We decided to build a logistic regression model using the available features, such as loan transaction details, bank statements, credit score, banking activity and credit history, bank account details, the borrower's identity, age, income and asset information, and other public information.

The logistic regression model classified each application into one of two categories: regular or fraudulent. This approach enabled fast case processing and immediate fraud alerts. With the model in place, bank operators can spot suspicious applications and act immediately to protect their customers and the bank itself.
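To make the binary classification step concrete, here is a minimal pure-Python sketch of logistic regression trained with batch gradient descent. The project lists TensorFlow on Google Cloud Platform as its tooling, so this only illustrates the technique and is not the client implementation. It assumes numeric, pre-scaled feature vectors; the toy data, learning rate, epoch count, and the 0.5 decision threshold are all assumptions (with imbalanced fraud data, the threshold is usually tuned on validation data).

```python
import math
from typing import List, Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function: maps log-odds to a probability
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Estimated probability that an application is fraudulent."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X: List[List[float]], y: List[int], lr: float = 0.5,
                 epochs: int = 2000,
                 sample_weight: Optional[List[float]] = None
                 ) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on (weighted) log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    weights = sample_weight or [1.0] * len(X)
    total = sum(weights)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi, si in zip(X, y, weights):
            # Derivative of the log-loss with respect to the linear score
            err = (predict_proba(xi, w, b) - yi) * si
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / total for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def classify(x: Sequence[float], w: Sequence[float], b: float,
             threshold: float = 0.5) -> str:
    """Map the fraud probability to one of the two categories."""
    return "fraudulent" if predict_proba(x, w, b) >= threshold else "regular"


if __name__ == "__main__":
    # Toy, pre-scaled data: label 1 = fraudulent, 0 = regular
    X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.9, 0.8], [0.8, 0.95]]
    y = [0, 0, 0, 1, 1]
    w, b = train_logreg(X, y)
    print(classify([1.0, 1.0], w, b))  # fraudulent
    print(classify([0.0, 0.0], w, b))  # regular
```

Scoring a new application is a single dot product and a sigmoid, which is part of why logistic regression fits a "results within minutes" requirement.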

[Figure: Case study schema, Loan Application Fraud Detection]

Tools

Google Cloud Platform

TensorFlow

Python

Challenges

Defining what to consider a fraud
The line between fraud and non-fraud can be very thin, and the ML model has to incorporate business knowledge, company policy, and statistical features. Sometimes even arriving at a precise definition of fraud is a challenge. Building a good model requires several iterations and very close collaboration with the client.

Datasets imbalance
Since there are very few loan and mortgage fraud cases in general, we had few fraudulent samples for building a predictive model. This scarcity was problematic because it made our training set highly imbalanced, with many examples of legitimate applications and very few fraudulent ones. For this reason, we used various sampling methods to even out the class counts and reach the expected level of accuracy for fraud prediction.
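The specific sampling methods used in the project are not named. As an illustration only, the sketch below shows two of the simplest options, random oversampling of the fraud class and random undersampling of the legitimate class, in plain Python; the label convention (1 = fraudulent), the binary-label assumption, and the function names are assumptions. Resampling is normally applied to the training split only, so that evaluation still reflects the real, imbalanced class ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def _split_indices(y: Sequence[int], minority_label: int) -> Tuple[List[int], List[int]]:
    # Assumes binary labels: everything that is not the minority label is majority
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    return minority, majority


def random_oversample(X: Sequence[Row], y: Sequence[int], minority_label: int = 1,
                      seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Duplicate randomly chosen minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y, minority_label)
    if not minority:
        raise ValueError("no minority-class rows to oversample")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X: Sequence[Row], y: Sequence[int], minority_label: int = 1,
                       seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Keep every minority row and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y, minority_label)
    kept = rng.sample(majority, k=min(len(minority), len(majority)))
    idx = minority + kept
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


if __name__ == "__main__":
    X = [[float(i)] for i in range(100)]
    y = [1] * 5 + [0] * 95  # 5% fraud
    _, y_over = random_oversample(X, y)
    print(len(y_over), sum(y_over))  # 190 95
    _, y_under = random_undersample(X, y)
    print(len(y_under), sum(y_under))  # 10 5
```

The trade-off: oversampling keeps all legitimate data but can overfit to duplicated fraud rows, while undersampling discards information. Synthetic oversampling (e.g. SMOTE) and class weights in the loss are other common options.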

Model explainability and interpretability
Many AI-based applications used in the fintech industry are required by law to be explainable, i.e., banks and loan originators must be able to explain their decisions to regulators or auditors. This is why we chose logistic regression: it is an interpretable model whose coefficients show how each feature raises or lowers the odds of fraud, so its results are easy for both humans and machines to understand.
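Logistic regression suits this requirement because a prediction decomposes additively: the log-odds of fraud equal the bias plus the sum of each weight times its feature value, so each term can be reported as that feature's contribution, and exp(weight) gives the multiplicative change in fraud odds per one-unit increase of a feature. The sketch below assumes an already-trained model; the feature names, weights, and bias are hypothetical placeholders, not values from the project.

```python
import math
from typing import Dict, List, Sequence, Tuple


def explain_prediction(x: Sequence[float], weights: Sequence[float], bias: float,
                       feature_names: Sequence[str]
                       ) -> Tuple[float, float, List[Tuple[str, float]]]:
    """Return (log-odds, fraud probability, per-feature contributions by impact)."""
    contributions = [(name, w * v) for name, w, v in zip(feature_names, weights, x)]
    log_odds = bias + sum(c for _, c in contributions)
    # Numerically stable sigmoid of the log-odds
    if log_odds >= 0:
        probability = 1.0 / (1.0 + math.exp(-log_odds))
    else:
        e = math.exp(log_odds)
        probability = e / (1.0 + e)
    ranked = sorted(contributions, key=lambda item: abs(item[1]), reverse=True)
    return log_odds, probability, ranked


def odds_ratios(weights: Sequence[float], feature_names: Sequence[str]) -> Dict[str, float]:
    # exp(w): factor by which the fraud odds change per one-unit increase of the feature
    return {name: math.exp(w) for name, w in zip(feature_names, weights)}


if __name__ == "__main__":
    # Hypothetical, pre-scaled features and illustrative weights
    names = ["income_to_loan_ratio", "account_age_years", "recent_credit_inquiries"]
    weights = [-0.8, -0.5, 1.2]
    log_odds, p, ranked = explain_prediction([0.5, 2.0, 3.0], weights, -2.0, names)
    print(f"log-odds={log_odds:.2f}, p(fraud)={p:.2f}")
    for name, contribution in ranked:
        print(f"  {name}: {contribution:+.2f}")
```

A ranked list like this gives an operator or auditor a per-application reason ("recent credit inquiries pushed the score up most") without any separate explanation model.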

Architecture design
The model had to handle a large dataset in order to automate the loan underwriting process, and its predictions had to be fast enough for bank representatives to process loan applications in minutes while accurately detecting mortgage fraud. The architecture we designed met these requirements and produced accurate results.
