Using machine learning in fraud detection - How does it work? What are the benefits?
When combined with other anti-fraud measures, machine learning cuts the number of false positives by a factor of five. The more data computers analyze, the better they become at spotting fraudulent behavior, making it harder for criminals to succeed, writes Jérôme Bovay.
Fraud prevention has long used data analysis to help stop criminals. By looking at how fraudsters operate, we can learn how to stop and even catch them.
Traditionally, data analysis was a long, slow slog, with the upshot that fraud was mostly spotted after the event, if at all. Today, data crunching is faster, broader and deeper because we can teach computers to recognize unusual bank transactions – a process known as machine learning.
Machine learning relies on data – the more data, the better the algorithm becomes at detecting, and even preventing, fraud.
The use of machine learning, in tandem with other anti-fraud measures, has been shown to reduce the number of false positives (alerts raised for genuine transactions) by a factor of five. This frees up time for investigation departments to deal with genuine problem transactions and avoids the need to inconvenience customers unnecessarily.
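To make the "factor of five" concrete, here is a minimal sketch of how false positives might be counted and compared before and after a change. The function names and the toy alert logs are hypothetical and purely illustrative; they are not real bank data or any vendor's API.

```python
def false_positive_count(alerts):
    """Count alerts raised on transactions that turned out to be genuine.

    `alerts` is a list of (transaction_id, is_fraud) pairs, one per alert.
    """
    return sum(1 for _tx_id, is_fraud in alerts if not is_fraud)


def reduction_factor(before, after):
    """Return how many times fewer false positives there are after a change."""
    if after == 0:
        raise ValueError("no false positives left; the factor is unbounded")
    return before / after


# Hypothetical alert logs: in each, only transaction 0 is real fraud.
alerts_before = [(i, i == 0) for i in range(11)]  # 10 false positives
alerts_after = [(i, i == 0) for i in range(3)]    # 2 false positives

fp_before = false_positive_count(alerts_before)
fp_after = false_positive_count(alerts_after)
factor = reduction_factor(fp_before, fp_after)
percent_cut = 100 * (fp_before - fp_after) / fp_before
```

In this toy case the false positives fall from 10 to 2, which is a factor of five, or equivalently a cut of 80 percent.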
For optimum results, machine learning needs to come in two forms – supervised and unsupervised.
Unsupervised learning is based on unlabeled data: data that has not been examined by the bank and comes without any description. Supervised learning is based on labeled data – in a fraud detection context, each transaction is described as either fraudulent or genuine.
Labeled data allows computer programs to build up a picture of what a normal transaction looks like; unlabeled data lets them look for transactions that deviate from the norm. You need both for anti-fraud algorithms to work well.
When unlabeled data looks suspicious, the algorithm flags it and an alert is sent to the bank, which will examine it and return a verdict of genuine or fraudulent. In this way, data becomes labeled, and over time, the algorithm will refine its ability to detect suspicious transactions.
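The feedback loop described above can be sketched in a few lines. This is an illustration only, assuming a simple z-score test on transaction amounts as the unsupervised step; the names `flag_unusual`, `review_loop` and `get_verdict`, the cut-off of 3.0 and the sample amounts are all hypothetical, and a production system would use far richer features.

```python
from statistics import mean, stdev


def flag_unusual(history, amount, z_cutoff=3.0):
    """Flag an amount that lies far from the customer's usual amounts."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > z_cutoff


def review_loop(history, incoming, get_verdict):
    """Flag unusual transactions and record the bank's verdict as a label."""
    labeled = []
    for amount in incoming:
        if flag_unusual(history, amount):
            verdict = get_verdict(amount)  # the bank returns "fraud" or "genuine"
            labeled.append((amount, verdict))
            if verdict == "genuine":
                history.append(amount)  # confirmed genuine: part of the norm
        else:
            history.append(amount)  # unflagged: treated as normal behavior
    return labeled


# Hypothetical example: a stand-in for the human reviewer at the bank.
past_amounts = [50, 60, 55, 45, 52, 58]
new_amounts = [54, 5000, 61]
labels = review_loop(past_amounts, new_amounts,
                     lambda amt: "fraud" if amt > 1000 else "genuine")
```

Here only the 5000 transfer is flagged and sent for review; once the verdict comes back, it joins the pool of labeled data that a supervised model could later learn from.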
The next step is for algorithms to look for suspicious patterns in data – for example, a succession of transfers just below an alert threshold to the same recipient. The number of transactions required to build a good anti-fraud model varies according to the size and complexity of the client. A company, for example, has a much bigger financial footprint than an individual, and will require more data crunching as a result.
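As an illustration of the pattern mentioned above, the sketch below looks for several transfers to the same recipient that each fall just below an alert threshold within a short period. The function name, the 10 percent margin, the minimum of three transfers and the seven-day window are assumptions chosen for the example, not values taken from any real system.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_structuring(transfers, threshold, margin=0.1, min_count=3,
                     window=timedelta(days=7)):
    """Return recipients who received at least `min_count` transfers just
    below `threshold` within `window`.

    `transfers` is an iterable of (timestamp, recipient, amount) tuples.
    """
    lower = threshold * (1 - margin)
    near_threshold = defaultdict(list)
    for timestamp, recipient, amount in transfers:
        if lower <= amount < threshold:
            near_threshold[recipient].append(timestamp)

    suspicious = set()
    for recipient, stamps in near_threshold.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window until it spans no more than `window`.
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 >= min_count:
                suspicious.add(recipient)
                break
    return suspicious


# Hypothetical transfers against an alert threshold of 10,000.
sample = [
    (datetime(2024, 1, 1), "A", 9800),
    (datetime(2024, 1, 2), "A", 9900),
    (datetime(2024, 1, 3), "A", 9750),
    (datetime(2024, 1, 2), "B", 9900),
    (datetime(2024, 1, 4), "B", 200),
]
flagged = find_structuring(sample, threshold=10_000)
```

Recipient A is flagged because three near-threshold transfers arrive within the window; recipient B has only one such transfer and is left alone.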
A single algorithm will never stop fraud by itself – that ultimately depends on using a combination of techniques and algorithms and multiple data feeds. Rules are also imperative as they allow data to be categorized and labeled. Once the data is labeled, machines can look for patterns in the way fraudsters avoid or break the rules. Through variations in patterns, machines might be able to identify new types of fraud.
Our machine learning approach at NetGuardians is based on a statistical model that combines supervised and unsupervised learning. We use unlabeled data to build a profile of a user. We use labeled data to improve risk models and reduce false positives. Using both approaches, NetGuardians’ anti-fraud solution can cut the number of false positives by 80 percent and the time spent on dealing with hits by 93 percent, thereby improving operational efficiency and the customer experience.
We aim to continue refining our formula to improve our performance. For example, we are continuously examining where to set thresholds for an alert. There are always going to be some frauds between the far outliers in a group and the norm. By continuing to learn about these outliers we will be able to refine our thresholds to ensure we spot more fraud without a concurrent increase in the number of false positives.
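The trade-off behind threshold setting can be illustrated with a small sketch. It assumes each reviewed transaction carries a risk score between 0 and 1 and a verdict, and it picks the threshold that catches the most fraud while staying within a false-positive budget. The function names, scores and budget are hypothetical and do not describe NetGuardians' actual method.

```python
def evaluate_threshold(scored, threshold):
    """Count frauds caught and false positives at a given alert threshold.

    `scored` is a list of (risk_score, is_fraud) pairs.
    """
    caught = sum(1 for score, is_fraud in scored
                 if score >= threshold and is_fraud)
    false_pos = sum(1 for score, is_fraud in scored
                    if score >= threshold and not is_fraud)
    return caught, false_pos


def best_threshold(scored, candidates, max_false_positives):
    """Pick the candidate that catches the most fraud within the budget.

    Returns (threshold, caught, false_positives), or None if no candidate
    stays within the budget. Ties go to the lowest threshold.
    """
    best = None
    for threshold in sorted(candidates):
        caught, false_pos = evaluate_threshold(scored, threshold)
        if false_pos <= max_false_positives and (best is None or caught > best[1]):
            best = (threshold, caught, false_pos)
    return best


# Hypothetical scored transactions with known verdicts.
history = [(0.95, True), (0.9, False), (0.8, True), (0.7, False),
           (0.6, True), (0.4, False), (0.2, False)]
choice = best_threshold(history, candidates=[0.5, 0.75, 0.9],
                        max_false_positives=1)
```

With these numbers, lowering the threshold to 0.5 would catch all three frauds but raise two false alerts, so the sketch settles on 0.75, which catches two frauds for one false positive.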
Fraudsters will always develop new ways to hide what they are doing, but machine learning is rapidly catching up with them. Our computer models have already created a step change in the efficiency of anti-fraud processes, with a huge drop in false positives, reduced operational costs, and the ability to detect new kinds of fraud. By further feeding and refining our models, we will leave fewer places for fraudsters to hide.