Applying Machine Learning to Fraud Detection
Supervisor: Cristina Cerqueira
When I first joined Farfetch I was told they were looking for someone to implement Machine Learning algorithms to help predict possible fraud attempts. These fraud attempts can consist of something as simple as using someone else’s credit card to make purchases to more convoluted methods which I might not be aware of. I was unaware (that is, until my first day there) that I was going to be the only one (at the time) working on it. Hence I dived into learning more about SQL and the Pandas module in Python (both of which I was working with for the first time) and into Machine Learning algorithms. I used Microsoft’s Azure Machine Learning Studio for a quick testing of different Machine Learning algorithms, and I repeated the same in R to double-check the accuracy of different algorithms. Both agreed that Random Forests was the most efficient method.
Having decided on Random Forests for the initial implementation in Python. During the next three weeks came the manual labor: building the queries to retrieve all the needed that in an efficient manner (where information was not only across different tables but also different databases, which slowed down the process) to transforming the raw information into more relevant data using tips from field experts at Farfetch, to checking that datatypes match,… and finally running the Machine Learning algorithm on the transformed data. After all this the python/sql scripts worked as expected. I quickly tried the same scripts with different algorithms to check (again) for efficiency, but similarly to the initial tests Random Forests performed best.
All of the above was done on static data. I still had to make it so that the algorithm could run live. The speed of the script wasn’t an issue, but I had yet to understand the way Farfetch pipelines so that I could integrate my work. Unfortunately my internship was coming to an end (and I had to go back to my studies in October). Hence I spent my 2nd last week organizing my code (It was a mess when I started, but by the end it was incredibly neat), and writing plans/suggestions/documentation for what exists/could be improved/needs to get done. In the last week a new intern, Nuno Monteiro, came along to continue this project (which by then got the informal nickname of project Sherlock). During that week my work was to put him up to date on the work that was done.
Nuno was doing his Master thesis at Farfetch, specifically about implementing Machine Learning in industry (titled A data mining based system for credit-card fraud detection in e-tail) , so he wrote his thesis alongside this internship (and I got to co-author the paper given my initial role in Project Sherlock).
Looking back I reckon I would’ve been more productive if I practiced more SQL/Python and less time on understanding the theory behind the Machine Learning algorithms used, but they were so interesting (As much as I like coding, there will always be a mathematician in me curious about how things work, and not just in making them work).
I’d also like to thank João Sousa (Pandas expert) and Carlos Gomes (SQL expert) for all the technical support they’ve given me during the internship, and everyone else who was there during breaks and lunch and gym sessions and dinners out. Couldn’t have had a better internship.