Process & Story
A company from the banking sector approached us with a significant problem: it wanted to develop an AI model that could identify fraudulent loan applications. Banks and lenders currently struggle to catch this kind of fraud, and it causes them substantial losses.
Problem
Loan and mortgage fraud are common in the financial world. Application fraud is a type of criminal activity in which a dishonest individual uses a stolen or fabricated ID to apply for a loan or line of credit with no intention of repaying it. The con artist establishes credible-looking credit and account activity over time to gain access to larger loans and higher lines of credit. Fraudsters often use a break-out technique (commonly known as bust-out fraud), a broader scheme that involves committing numerous instances of application fraud. Over time, the fraudster builds up a number of credit lines, then maxes them all out in rapid succession before disappearing.
Identifying these cases manually is challenging because of the large volume of applications that must be reviewed each day. Fraudulent applications take a long time to detect and even longer to deny, and the delay increases revenue loss. Our client, like other financial institutions, was experiencing this issue.
Solution
The solution was to create a model that could sift through all loan applications and separate the likely fraudulent ones from the genuine ones within minutes.
Our team started by building a thorough understanding of the business process. We learned how the client and its customers interact throughout a loan transaction, including deciding whether to offer a loan, writing a contract, requesting a credit score, processing loans, and other related activities.
We then researched what data was available and performed an initial exploratory data analysis. Because loan fraud at the application stage is relatively rare, we had to contend with a significant class imbalance. We decided to build a logistic regression model using the available features, such as loan transaction details, bank statements, credit score, banking activity and credit history, bank account details, the borrower’s identity, age, income and asset information, and other public information.
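To illustrate how class imbalance can be handled in logistic regression, here is a minimal sketch in standard-library Python. It is not the model built for the client. The data is synthetic, the two features are made up, and balanced class weighting (weighting each class inversely to its frequency) is one common remedy assumed here for illustration. The function names, learning rate, epoch count, and 0.5 cut-off are likewise hypothetical choices.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (2 * class_count)."""
    n = len(labels)
    n_fraud = sum(labels)
    return {0: n / (2.0 * (n - n_fraud)), 1: n / (2.0 * n_fraud)}


def predict_proba(weights, bias, row):
    """Estimated probability that an application is fraudulent."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train_logistic_regression(rows, labels, class_weights, lr=0.2, epochs=300):
    """Full-batch gradient descent on the class-weighted log loss."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    total_weight = sum(class_weights[y] for y in labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, y in zip(rows, labels):
            # Errors on the rare class count more because of its larger weight.
            error = class_weights[y] * (predict_proba(weights, bias, row) - y)
            grad_b += error
            for j, x in enumerate(row):
                grad_w[j] += error * x
        bias -= lr * grad_b / total_weight
        weights = [w - lr * g / total_weight for w, g in zip(weights, grad_w)]
    return weights, bias


# Synthetic, illustrative data: 970 regular and 30 fraudulent applications,
# each described by two made-up standardized features.
rng = random.Random(0)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(970)]
rows += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(30)]
labels = [0] * 970 + [1] * 30

class_weights = balanced_class_weights(labels)
weights, bias = train_logistic_regression(rows, labels, class_weights)

# Share of the fraudulent training examples that the model flags.
flagged = [predict_proba(weights, bias, r) >= 0.5 for r in rows]
fraud_recall = sum(f for f, y in zip(flagged, labels) if y == 1) / sum(labels)
print(f"fraud recall on training data: {fraud_recall:.2f}")
```

Without the class weights, the 970 regular examples dominate the loss and the model can score well on accuracy while missing most fraud. With the weights, each class contributes equally to the gradient. In practice, a library implementation and a held-out test set would replace this hand-rolled loop.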
The logistic regression model classified each application as either regular or fraudulent. This approach enabled fast case processing and immediate fraud alerts. Thanks to the model, bank operators can spot suspicious applications and immediately take action to protect their customers and the bank itself.
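The step from model scores to operator alerts can be sketched as follows. This is an assumption-laden illustration, not the client's system: the `ScoredApplication` type, the `triage` function, the application IDs, the probabilities, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredApplication:
    application_id: str       # hypothetical identifier
    fraud_probability: float  # model output in [0, 1]


def triage(applications, threshold=0.5):
    """Split scored applications into (alerts, regular).

    Alerts are sorted so the most suspicious cases come first.
    """
    alerts = [a for a in applications if a.fraud_probability >= threshold]
    regular = [a for a in applications if a.fraud_probability < threshold]
    alerts.sort(key=lambda a: a.fraud_probability, reverse=True)
    return alerts, regular


# Made-up scores for four applications.
scored = [
    ScoredApplication("APP-001", 0.03),
    ScoredApplication("APP-002", 0.91),
    ScoredApplication("APP-003", 0.48),
    ScoredApplication("APP-004", 0.67),
]
alerts, regular = triage(scored)
print([a.application_id for a in alerts])  # ['APP-002', 'APP-004']
```

The threshold is a business decision as much as a technical one. Lowering it catches more fraud but sends more genuine applications to manual review.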