Every financial institution has to deal with fraudsters. These encounters happen at many levels: the internet, card services, phone calls, and anywhere else a fraudster can claim to be the actual account holder. Although banks (and many other institutions) fight this problem, having too many false positives can force them to pay a price of their own. The situation becomes more and more serious because the patterns fraudsters follow are exactly the ones account holders have.

The way machine learning is “expected” to work is to take all the unstructured data and flag the exact fraudulent activity, whether a purchase, a call, or a web interaction, and nothing else. In fact, I hear this often: “give the data – all of it – to the black box and run it; see what it sings.” Let’s see why this problem is difficult. We’ll discuss in another post what a magic ML box can actually do (but hey, there is no miracle; it’s all calculus. If there are no distinguishing attributes and no shift in position, there is nothing to detect.)

The challenges in anomaly detection fall into several categories; here we only go over a few:

A. Datasets with anomalies are NOT balanced.

B. Anomaly detection goes beyond the data.

C. Fraud patterns change.

Anomaly cases are rare, fraud in particular. On one occasion when I had to detect fraudulent cases, anomalies were 1 in 20,000 observations. This makes the training set heavily unbalanced. For every single anomaly, there are ten legitimate observations with exactly the same behavior. The result? False positives! Anyone who has worked with trend curves, or even a simple scatter plot, will quickly see that such an analysis treats rare points as outliers with or without justification and can hardly predict similar values in the future; you need attributes that differentiate the good ones from the bad ones. Even deep learning neural networks may never find any correlation. In short, the quantity of unusual patterns is so low that there is not enough of them for the machine to learn from.
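To see why this imbalance bites, here is a minimal sketch with synthetic data and scikit-learn (not the dataset from that project): with roughly 1 fraud in 20,000 observations, drawn from the same distribution as the legitimate ones, a plain classifier scores near-perfect accuracy while catching no fraud at all.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: 200,000 observations, 10 of them fraudulent, and the
# fraudulent rows drawn from the SAME distribution as the legitimate ones.
n = 200_000
X = rng.normal(size=(n, 5))
y = np.zeros(n, dtype=int)
y[rng.choice(n, size=10, replace=False)] = 1

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Accuracy looks perfect because "never fraud" is almost always right;
# recall on the fraud class tells the real story.
print("accuracy:", accuracy_score(y_te, pred))    # ~0.99995
print("fraud recall:", recall_score(y_te, pred))  # typically 0.0
```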

Therefore, the quantity of usual patterns that look exactly like the unusual ones is so much higher that the machine simply never learns from the unusual activity. At this point you have to look at the problem from a different angle, or engineer other attributes. Sometimes a combination of unsupervised learning and transformation, followed by supervised learning, can work, as in the sketch below.
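One way such a combination might look (a minimal sketch using scikit-learn, not the exact pipeline from my projects): an unsupervised IsolationForest score is added as an engineered attribute, and a supervised model that reweights the rare class is trained on the augmented features.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression

def fit_two_stage(X_train, y_train):
    """Unsupervised outlier score -> extra attribute -> supervised model."""
    # Stage 1: unsupervised outlier score, learned without labels.
    iso = IsolationForest(n_estimators=200, random_state=0).fit(X_train)
    score = iso.score_samples(X_train).reshape(-1, 1)

    # Stage 2: supervised model on the augmented feature matrix;
    # class_weight="balanced" compensates for the heavy imbalance.
    X_aug = np.hstack([X_train, score])
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(X_aug, y_train)
    return iso, clf

def predict_two_stage(iso, clf, X):
    """Score new observations with the same two-stage transformation."""
    score = iso.score_samples(X).reshape(-1, 1)
    return clf.predict_proba(np.hstack([X, score]))[:, 1]
```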

Consider a sample dataset provided for anomaly detection. It is often the case that the characteristics of an observation made by, say, a fraudster look exactly like those of the account holder. This makes finding anomalies close to impossible. If you were a fraudster, what would you do: follow the account holder’s patterns, or behave like a complete outlier? A fraudster usually knows all the information about the actual account holder. Besides, spotting an outlier whose characteristics differ from the account holder’s does not need the heavy power of machine learning. Why would you need machine learning to solve that? Any simple filter can catch it, and that is exactly why fraudsters follow the exact patterns of the actual users.
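To make that point concrete, a hypothetical rule filter is enough for the obvious case; the field names and thresholds below are made up for illustration only.

```python
# No machine learning needed: a transaction far outside the account
# holder's own history is caught by a plain rule.
def obvious_outlier(txn, profile):
    return (
        txn["amount"] > 10 * profile["max_amount"]            # hypothetical threshold
        or txn["country"] not in profile["usual_countries"]   # never-seen location
    )
```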

Moreover, any problem solver craves to find information in the associated data, or in some combination of integrated data sources, that explains the anomalies with few false positives. That is the dream of any such problem solver: correlation.

It is very common for fraudsters to use a phone call, a card, or the web in whatever way lets them succeed, and the next time they will not use the same method; in fact, they will move on to another user. Would it make sense for them to follow the same patterns? That only means getting caught. Any machine learning model for fraud should therefore keep changing with the patterns it sees, learning from the errors of the past.
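A minimal sketch of what “constantly changing” could mean in practice, assuming scikit-learn’s incremental SGDClassifier rather than any specific production setup: the model is refit batch by batch as newly confirmed cases, including yesterday’s mistakes, come back with labels.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# An incrementally trainable model: each batch of newly labeled cases
# (including past misclassifications, once confirmed) updates the weights.
model = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # 0 = legitimate, 1 = fraud

def update(model, X_new, y_new):
    """Refit on the latest confirmed observations without starting from scratch."""
    model.partial_fit(X_new, y_new, classes=classes)
    return model

# Usage: feed each day's confirmed labels back into the model.
# model = update(model, X_yesterday, y_yesterday)
```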

A book that I found very useful is listed below; it helped me in several cases.

Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection

Farzan Jafeh, PhD – Aug 2019