Model Fairness in Fraud Prevention – Human Intervention and Practical Solutions

5 min read May 5, 2022


Iker Perez

Principal Research Scientist

Iker is a Principal Research Scientist at Featurespace, where he specialises in uncertainty quantification and probabilistic reasoning, as well as interpretability and algorithmic fairness in automated fraud prevention. He joined Featurespace in 2020, following a lectureship in the School of Mathematical Sciences at the University of Nottingham. Iker holds a PhD in probability theory, and he is an accredited graduate statistician by the Royal Statistical Society. Loves dogs.

Tackling a problem where you know that no perfect solution can ever exist is challenging. Each of the existing, widely accepted and mutually exclusive criteria for model fairness has its promoters and detractors, who must weigh competing interests in specific use cases. To oversimplify: well-calibrated fraud detection models that efficiently prevent malicious behavior, as it is experienced today, are prone to exploiting the many biases present in financial transaction data. On the other hand, fraud prevention systems tailored to ensure parity and equal opportunity for the public and businesses achieve this by sacrificing efficiency, i.e., by allowing preventable crime to happen.

At Featurespace, we strive to interpret the presence of biases in our fraud prediction systems to ensure we make the world not only a safer place to transact, but a fairer one. Our modeling guidelines and best practices impose strict controls over feature schemas, search for spurious associations and ensure that sensitive information is only used for reporting and evaluations of bias, unless otherwise justified. We commonly follow lengthy iterative processes whereby score distributions, as well as true and false positive rates, are scrutinized over subsets of the data defined across potentially sensitive partitions. Our current strategy is directed towards constructing interpretability and explainability tools that help our internal data science teams and subject matter experts make informed modeling decisions, better tailored to each of our customers’ preferences.

To facilitate these fairness analyses, we draw motivation from pragmatic approaches to addressing the three primary fairness criteria.

Addressing independence

This amounts to ensuring that (conditional) statistical demographic parity and group fairness are satisfied. In simple terms, our fraud models should yield the same score for any two financial transactions associated with different entities, if the only differentiating factor in the transaction details and entity profile is the value of a protected attribute. To this end, simple statistical techniques are commonly used to enforce the independence property at different steps in the model construction process. These include, but are not limited to:

  1. Data curation: Numerical feature vectors are subjected to matrix factorizations and decompositions that decorrelate each field from any existing sensitive information. Propensity scores from protected populations are recorded for downstream causal reasoning exercises. 
  2. Model training: A model’s hypothesis space is restricted with strong regularization terms that enforce parity, by matching distributional moments in scores across protected demographic partitions.
  3. Score calibration: Model outputs are scaled to ensure conditional parity.
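Whichever step a correction is applied at, the resulting parity can be checked directly by comparing score distributions across a protected partition. Below is a minimal sketch of such a check, using purely synthetic scores, illustrative group names and a hand-rolled two-sample Kolmogorov–Smirnov statistic; it is a diagnostic illustration, not Featurespace’s production tooling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fraud scores for two demographic groups (purely synthetic;
# here both groups are drawn from the same distribution, so parity holds).
scores_a = rng.beta(2, 8, size=1000)  # group A
scores_b = rng.beta(2, 8, size=1000)  # group B

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical cumulative distribution functions."""
    grid = np.sort(np.concatenate([x, y]))
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

gap_means = abs(scores_a.mean() - scores_b.mean())
gap_ks = ks_statistic(scores_a, scores_b)
print(f"mean-score gap: {gap_means:.3f}, KS statistic: {gap_ks:.3f}")
```

A large mean-score gap or KS statistic would flag a demographic-parity violation worth investigating; matching distributional moments, as in the regularization approach above, drives these gaps towards zero.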

Addressing separation

In binary classification settings associated with fraud prevention, achieving separation requires that receiver operating characteristic (ROC) profiles in fraud models be invariant to changes in sensitive information. Here, true and false positive rates for fraud prevalence, as well as their negative counterparts, must be consistent across sub-populations that are segregated by regional characteristics or their financial activity profile. In practice, this is commonly achieved post-processing, i.e., after model design.
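As a sketch of what such a post-design check might look like, the per-group true and false positive rates at a fixed alert threshold can be computed directly. Everything below (data, group names, threshold) is synthetic and illustrative; group “B” is deliberately given noisier scores so the rates diverge.

```python
import numpy as np

def group_rates(scores, labels, groups, threshold=0.5):
    """True/false positive rates per sub-population at a fixed alert threshold."""
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        alert = scores[m] >= threshold
        fraud = labels[m] == 1
        rates[g] = (alert[fraud].mean(), alert[~fraud].mean())  # (TPR, FPR)
    return rates

# Illustrative synthetic data: group "B" gets systematically noisier scores.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=4000)
groups = np.where(rng.random(4000) < 0.5, "A", "B")
noise = np.where(groups == "A", 0.1, 0.3)
scores = np.clip(labels + rng.normal(0, noise), 0, 1)

rates = group_rates(scores, labels, groups)
for g, (tpr, fpr) in rates.items():
    print(f"group {g}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Separation requires these per-group rates to coincide; repeating the computation across a grid of thresholds would compare the full ROC profiles.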

Commonly, the performance of a fraud prevention model will differ across geographical regions, merchant types or customer profiles. For instance, prevention models within merchant consortia are often better calibrated to the large institutions within them. Thus, separation is usually achieved through interventionism: deliberately worsening the ROC curve profiles associated with well-performing groups, so that performance falls in line with the lowest common denominator. In lay terms, this offers positive discrimination through the benefit of the doubt, applied to randomly chosen entities that would commonly score highly for potential fraud. In our consortia example, this would mean having your prevention model deliberately classify a small number of transactions associated with fraudulent customer behavior as genuine financial activity. Such interventionism in your fraud system’s behavior is controversial and comes at a significant cost. However, it is a cost that encourages model builders to improve performance for the lowest common denominator. Interventions here are generally targeted towards collecting more diverse observations that can contribute to fairer and more efficient model designs in the future.
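The benefit-of-the-doubt intervention described above can be sketched as randomized post-processing: each group’s true positive rate is lowered to match the worst-performing group by flipping a random subset of its correct fraud alerts to “genuine”. The implementation below is an illustration on synthetic alerts, not Featurespace’s production method.

```python
import numpy as np

def equalize_tpr(alerts, labels, groups, rng):
    """Lower each group's TPR to the worst-performing group's TPR by
    flipping a random subset of that group's correct fraud alerts to
    'genuine' -- the benefit-of-the-doubt intervention."""
    alerts = alerts.copy()
    tprs = {g: alerts[(groups == g) & (labels == 1)].mean()
            for g in np.unique(groups)}
    target = min(tprs.values())
    for g, tpr in tprs.items():
        if tpr <= target:
            continue
        hits = np.where((groups == g) & (labels == 1) & alerts)[0]
        n_flip = int(round((tpr - target) / tpr * len(hits)))
        alerts[rng.choice(hits, size=n_flip, replace=False)] = False
    return alerts

# Illustrative synthetic alerts on fraud-only data: group "A" is detected
# at a higher rate than group "B" before the intervention.
rng = np.random.default_rng(2)
groups = np.array(["A"] * 1000 + ["B"] * 1000)
labels = np.ones(2000, dtype=int)
alerts = rng.random(2000) < np.where(groups == "A", 0.9, 0.7)

fair = equalize_tpr(alerts, labels, groups, rng)
for g in ("A", "B"):
    m = groups == g
    print(f"group {g}: TPR {alerts[m].mean():.2f} -> {fair[m].mean():.2f}")
```

Note that the flipped alerts are chosen at random, which is exactly the controversial cost discussed above: real fraud is knowingly waved through to equalize the rates.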

Addressing sufficiency

Sufficiency is a property generally satisfied by well-trained, off-the-shelf machine learning systems, without the need for any external intervention. Here, scores from your fraud prevention system should be truly representative of the likelihood of malicious activity associated with transactions, all protected attributes considered. It is paramount to avoid over-fitting, data leakage and similar mistakes during model training, since sufficiency follows from good generalization of your model’s predictive performance to newly exposed data. The property can be visually inspected through traditional ‘goodness of fit’ tests, such as Hosmer–Lemeshow tests, segregated across the protected populations in your data sample. We generally expect a fraud model to satisfy sufficiency even when protected attributes are absent during training, provided these are predictable from correlated proxy variables. However, we note that imposing corrections on your model to ensure parity or equality of opportunity (by satisfying independence or separation) will be an impediment to constructing sufficient models.
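A binned, Hosmer–Lemeshow-style inspection of this kind can be sketched as follows: within equal-count score bins, the mean predicted score is compared against the observed fraud rate, separately for each protected group. The data below is synthetic and constructed to be well calibrated by design, so the per-bin gaps are small; the group labels are illustrative.

```python
import numpy as np

def binned_calibration(scores, labels, n_bins=10):
    """Hosmer-Lemeshow-style check: mean predicted score vs observed fraud
    rate within equal-count score bins; a sufficient model matches closely."""
    order = np.argsort(scores)
    return [(scores[b].mean(), labels[b].mean())
            for b in np.array_split(order, n_bins)]

# Synthetic data where scores ARE the true fraud probabilities, so the
# model is well calibrated within each (hypothetical) protected group.
rng = np.random.default_rng(3)
scores = rng.random(20000)
labels = (rng.random(20000) < scores).astype(int)
groups = rng.integers(0, 2, size=20000)

worst_gap = {}
for g in (0, 1):
    m = groups == g
    worst_gap[g] = max(abs(pred - obs)
                       for pred, obs in binned_calibration(scores[m], labels[m]))
    print(f"group {g}: worst bin gap = {worst_gap[g]:.3f}")
```

A large gap between predicted and observed rates in any group’s bins would indicate that the scores mean different things for different populations, i.e., a sufficiency violation.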

Featurespace’s approach to ensuring model fairness is designed both to fit our own mission and to align with global efforts towards financial inclusion and fairness. We actively address all three fairness criteria within our model development and aim to make our models and their outputs as transparent as possible. We know that transparency in machine learning models is crucial to meeting the explainability and interpretability requirements around model governance that financial regulators are beginning to explore. We participate in recommendation and commentary processes to facilitate the development of regulatory frameworks. By emphasizing model fairness in our research and development, we believe we can accelerate the adoption of machine learning for fraud prevention and anti-money laundering, ultimately making the world a safer place to transact.

Read more about Model Governance for Anti Money Laundering from Featurespace: https://www.featurespace.com/newsroom/model-governance-anti-money-laundering/ 

