Each of our machine learning models serves a specific use case with its own fraud modus operandi, or risk typologies, and a different set of data to master.
For TSYS, as an example, Featurespace’s Data Scientists brought a world-leading risk score, called the ‘Foresight Score’, to market. This is something TSYS can provide to its own clients to bolster their transaction fraud detection capabilities while reducing false positives.
That same group also:
It’s uncommon for a Data Science team to deliver machine learning models across so many practice areas, yet that’s a critical aspect of the Featurespace mission. It’s even more uncommon for a team to create industry-leading scores in all of them.
So, what’s our secret? The answer is going to sound mundane to some people. There are no unique industry insights or modeling techniques. We simply have a strong process.
Each model we deploy is packed with insights. The feature concepts that detect fraudulent activity, the robustness of the models, the consistent performance of those models — these are all the results of incremental gains that many people have made over many years.
Enda Ridge’s book “Guerrilla Analytics: A Practical Approach to Working with Data” describes 7 principles for designing a way of working for your data science team. Applying those principles to the way that we do Data Science at Featurespace has been transformational.
Stripped back, the Guerrilla Analytics principles show Data Science teams how to approach model development and the publishing of analytic results with all the rigor found in modern software development. The principles encourage Data Science teams to bring this consistency not only to the management of code but also to the management of data.
As a group, we maintain unambiguously defined expectations for how work should be structured: from how source data should be received and how data transformations should be tracked, to how modelling code should be written and how analytical results are stored and tracked. We strive to maintain ways of working that are prescriptive yet flexible enough to accommodate different customer environments, new technologies, or atypical projects that do not fit perfectly within the prescribed boundaries. Flexibility is extended if work is done in a way that meets the following intentions:
Working in a common way across the team results in quantifiable increases in the quality and quantity of the Data Science that we produce in four notable ways:
Our way of working defines how we do what we do as a team, but it must give us freedom to adapt and evolve so that it continues to meet our needs. By running small experiments, we refine our common ways of working, forging a leaner and more effective way of achieving the principles that underpin their design. This approach is actively encouraged across the team. During regular retrospectives, changes to the ways of working can be advocated for, and voted on, by the whole team for inclusion in the current working version. We’ve found that encouraging ownership of incremental improvement keeps engagement and uptake high, allowing the whole team to continue to benefit.
In the same way that the aviation industry relies on pre-flight and in-flight checklists, we as a team have come to lean heavily on checklists as we build models. Each project phase is accompanied by a checklist containing all of its key considerations, from the technical tasks to be conducted to the customer communications and agreements required in that phase.
How is something so simple quite so effective? Deploying machine learning models well could easily be described as a dark art. The list of small issues that can catch you out, as you try to develop a model from a historical dataset that generalizes well to a test set and then performs just as reliably on a run-time data stream, seems to grow steadily over time. Each time you think you have mastered it, real-time machine learning has a habit of reminding you that you still have much to learn. Before we had checklists as a team, we relied on a few experienced (read: battle-hardened) Data Scientists trying to recall all their applicable nuggets of knowledge from memory and impart them to a project team. What we sorely lacked was a process that systematically collected that knowledge and allowed it to be distributed in a scalable way.
The checklists now represent the most succinct version of our team’s knowledge about successfully deploying machine learning models. Take two simple examples from our checklists for data exploration: ‘check for duplicates in the data’ and ‘ensure that the time-zone applied to datetimes across different data sources/fields is consistent’. Neither is a world-beating data exploration insight, and both are reasonably obvious if you’ve explored transactional data before! Nonetheless, if both are handled well during modelling, you’d notice a small positive impact on the quality of the score produced.
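To make those two checklist items concrete, here is a minimal sketch of how each might be automated in pandas. The column names (`transaction_id`, `auth_time`, `settle_time`) and the toy data are purely illustrative, not Featurespace’s actual schema or tooling:

```python
import pandas as pd

def check_duplicates(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Checklist item 1: return every row whose key value appears more than once."""
    return df[df.duplicated(subset=key, keep=False)]

def check_timezone_consistency(df: pd.DataFrame, cols: list) -> dict:
    """Checklist item 2: report the timezone attached to each datetime column.

    A value of 'None' means the column is timezone-naive, which is a red flag
    when other datetime columns carry an explicit timezone.
    """
    return {c: str(df[c].dt.tz) for c in cols}

# Toy transactional data with a duplicated id and inconsistent timezone handling.
df = pd.DataFrame({
    "transaction_id": [1, 2, 2, 3],
    "auth_time": pd.to_datetime(["2024-01-01 10:00"] * 4).tz_localize("UTC"),
    "settle_time": pd.to_datetime(["2024-01-01 12:00"] * 4),  # naive: no timezone
})

print(check_duplicates(df, "transaction_id"))        # the two rows with id 2
print(check_timezone_consistency(df, ["auth_time", "settle_time"]))
```

Neither check is sophisticated, which is exactly the point: once a check like this sits on a checklist, it is run on every project rather than only when someone remembers.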
We recognize that excellent score performance comes from the accumulation of many small performance gains. What checklists have given us is a system that ensures all these incremental benefits are packaged in every model that is deployed. No Featurespace model is deployed without its project team demonstrating that it passes all the checks. The cynical might frame this as over-prescriptive, but in our experience it’s liberating. The checklists provide prompts but are never exact about the way a task is accomplished, allowing the Data Scientist freedom to experiment or explore in their own way. For less experienced team members, they provide guidance along a narrow path to a successful outcome whilst prompting thoughtfulness about how each point applies to the project in question. For those with more experience, the checklists prevent complacency and create the freedom of mind to think about areas not considered by those before them.
As with the ways of working, the most important thing about our checklists is that they are not static and that they are owned by the whole team. Improving the lists as part of every completed project is an unwritten expectation; contribution is encouraged and celebrated, since each item added represents new knowledge that can benefit every future project.
Our third and final process pillar is model review sessions. We conduct these at the end of each important phase of a project, creating an intentional moment to step away from the details and inspect the quality and health of the overall machine learning system that we are building for a customer. Experienced Data Scientists who have not worked closely with the project team ask questions about the decisions the team has made and ensure that best practices have been followed.
So why is this valuable when there are rigorous processes around data processing, code quality and extensive lists containing the accumulated knowledge of the team? The answer lies in the realization that experience is the ability to apply existing knowledge to new problems well. Some pieces of knowledge can be distilled into checklists and easily applied by all whilst other pieces of knowledge require intuition and experience to apply to full effect.
The difference between success and failure when deploying machine learning systems is rarely the micro-decisions about feature engineering or hyper-parameter choice. Misinterpreting the scope, misspecifying or misunderstanding the data, and unsuitable model design are much more existential problems. These are the problems where intuition and experience pay dividends, so we’ve arranged our model reviews around these existential problems for any machine learning deployment. In short, we bring in extra experience in the three places where it’s most valuable.
Model scope review: Here we aim to ensure proper and accurate translation of business requirements into a machine learning problem. Are the functional requirements for the model clear? Do we have score performance metrics that accurately reflect what the customer cares most about? Will we have access to data sources suitable to build a machine learning system from? Experience and intuition are extremely valuable when assessing the viability of a machine learning system. For example, optimizing a model to detect gross fraud value when the customer is looking to reduce net fraud losses is an easy mistranslation to make. A more subtle example is overlooking a feedback loop that means that, once deployed, your supervised machine learning model will see a steady degradation in the quality of its training data.
Data understanding review: No machine learning system is better than the data it’s built from, but any misunderstanding of the data puts an artificially low ceiling on its quality! During this review we ask our teams to talk through their understanding of their modelling data in the context of the scope of the model they are trying to build. An experienced Data Scientist as a reviewer cannot change the quality or quantity of the data exploration conducted by the project team, but can assess the quality of understanding and interpretation. It’s the act of needing to clearly articulate understanding that brings the most value to this session.
Model design review: In this session, we examine the overall health of the model produced and its fitness for deployment. Goals for the reviewer are to gain confidence that the model will generalize well from the development data set to scoring runtime data and that the model has been designed in the most robust and resilient way possible.
The key to running successful model review sessions is much deeper than the act of scheduling a review. Positive, project-improving outcomes from review sessions are amplified by striving to create a kind and supportive culture where mistakes can be admitted publicly and where challenges and questions are encouraged irrespective of the experience or job title of the challenger or the challenged. Our culture has created the psychological security for members of the project team to seek guidance from those with experience and to ask questions about their areas of concern or uncertainty.
Of course, one valuable element of these sessions is to provide assurance that the Data Science team’s ways of working have been followed and the checklists for the appropriate phases have been covered by the project’s Data Scientists. This quality assurance stage of the review can be conducted in advance, since the ways of working and checklists create clear pass/fail criteria; this further fosters a sense of psychological security for less experienced team members. Work is ‘examined’ by those with more experience, but the ‘exam questions’ are published well in advance, so there are rarely surprises. Moving assurance stages away from the review session itself further liberates the session to become a collaborative forum for discussion, with all participants on an equal footing.
Every team member, including the Data Scientist conducting the review, sees project reviews as a valuable opportunity to learn, improve, and generate fresh ideas. We believe that in an environment like this the quality of the machine learning systems produced gets pushed higher and higher.
We’ve shared the three key elements of our data science process: a common way of working, checklists, and model review sessions.
These three pillars have been our secret to success so far because they’ve created a system that brings together the learnings and ideas of all the brilliant people on our global Data Science team, past and present.
While it’s taken only a few hundred words to describe the key pillars of our process, it’s taken us many years and a lot of work to create what we have today. Our success has always come from, and will continue to come from, the belief that offering the right guardrails to our teams is a job that is never finished. We want to nurture innovation and creativity so that our teams flourish, because there is no end to making the world a safer place to transact.