At Featurespace, our tools and algorithms are widely used to monitor financial activity and identify fraudulent behavior. Our mission is to make the world a safer place to transact, so we routinely evaluate and respond to millions of data points in our efforts to combat card and payment fraud, application fraud and money laundering. This sometimes involves processing sensitive information associated with a financial entity (e.g., a person), which may be directly observable or indirectly present through spurious relationships. Examples include the age, fiscal activity type, gender or origin of individuals and their businesses. Such information is important for understanding what genuine behavior looks like for each person, which is how Featurespace can deliver market-leading fraud prevention rates for our customers. But this sensitive information can also be misinterpreted to make sweeping judgements about the inherent riskiness of people and businesses in particular demographics. We therefore question whether any biases exist in our transaction scoring tools and strive to understand whether they are justifiable. This article examines some of the core concepts needed to understand how model bias arises, and how to prevent it.
Most algorithmic biases in the fraud space originate when we use prevention models for interventional decision-making. Commonly, this begins with an unanticipated impact of legacy systems. The data-generating process for purchases and payments has historically been influenced (intervened on) by manually updated rules, limits, and controls imposed over decades, some of which may have been drawn from demographic profiles of card holders and merchants. This has biased how different entities have been allowed to transact historically, which in turn has shaped the fraud methodologies pursued by malicious actors. It is possible to identify fraud patterns in historic data that differentially affect populations segmented by merchant category, or even by the card holder's gender, ethnicity, nationality, or age. A 'vanilla' fraud modelling strategy typically builds on such observational data and can incorrectly associate a protected attribute with fraud prevalence. A newly built decisioning system may then intervene by imposing divergent transacting restrictions based on an entity's protected attributes, creating feedback loops that further perpetuate unfair evaluations.
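The labelling mechanism behind such feedback can be sketched with a toy simulation. In this hypothetical setup (all group names, rates and review probabilities are illustrative assumptions, not drawn from any real system), two groups share the same true fraud rate, but legacy rules route one group to manual review twice as often, so more of its fraud is detected and labelled:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical toy numbers: both groups share the same true fraud rate,
# but legacy rules send group B to review twice as often as group A.
true_rate = 0.01
review_prob = {"A": 0.2, "B": 0.4}
apparent = {}

for group, p in review_prob.items():
    is_fraud = rng.random(n) < true_rate
    reviewed = rng.random(n) < p
    # Fraud only enters the labelled data when a transaction is reviewed,
    # so the detected rate scales with review coverage, not true risk.
    apparent[group] = (is_fraud & reviewed).mean()
    print(f"group {group}: apparent fraud rate = {apparent[group]:.4f}")
```

A model trained on these labels sees group B as roughly twice as risky despite identical underlying rates; if its scores then drive future review decisions, the disparity feeds back on itself.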
Additionally, fraud prevention systems may suffer from poor design choices or improper correction mechanisms; done properly, such mechanisms should ensure the training data are balanced and representative. Unlike common practice in political science or biostatistics, machine learning models used for fraud prevention rarely account for the data collection mechanism, missing data, or dissimilarities between populations. This is often the case when models are calibrated on large datasets pooled from multiple merchants grouped in a consortium. Such prevention systems are commonly dominated by fraud typologies observable only in the large institutions within the consortium, and from the perspective of small merchants their application can be detrimental and subject to multiple biases.
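A minimal sketch of this consortium effect, assuming an invented setup in which fraud at a large merchant involves high transaction amounts while fraud at a small merchant involves low amounts (card-testing, say); all volumes, rates and amounts here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical consortium: one large merchant (95% of transactions) where
# fraud shows up as high amounts, and one small merchant where fraud
# shows up as *low* amounts. Illustrative numbers only.
def sample(n, fraud_rate, fraud_high):
    y = rng.random(n) < fraud_rate
    amount = np.where(
        y,
        rng.normal(300 if fraud_high else 5, 10, n),  # fraud amounts
        rng.normal(50, 20, n),                        # genuine amounts
    )
    return amount, y

amt_big, y_big = sample(95_000, 0.01, fraud_high=True)
amt_small, y_small = sample(5_000, 0.01, fraud_high=False)

# "Model": flag transactions above a threshold tuned on the pooled data
# to catch ~90% of pooled fraud.
amt_all = np.concatenate([amt_big, amt_small])
y_all = np.concatenate([y_big, y_small])
threshold = np.percentile(amt_all[y_all], 10)

recall_big = ((amt_big > threshold) & y_big).sum() / y_big.sum()
recall_small = ((amt_small > threshold) & y_small).sum() / y_small.sum()
print(f"recall, large merchant: {recall_big:.2f}")
print(f"recall, small merchant: {recall_small:.2f}")
```

The threshold tuned on the pooled data is dominated by the large merchant's fraud pattern, so it misses the small merchant's fraud almost entirely.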
In the absence of recognized standards that regulate machine learning systems and their applications, at Featurespace we strive to establish and disseminate good practice. As such, we are concerned with three primary fairness criteria popularly assessed under an observational lens, i.e., judged on properties of the joint statistical distribution of transaction scores, protected attributes, and fraud prevalence:

- Independence (group parity): transaction scores are statistically independent of the protected attribute, so all groups are flagged at similar rates.
- Separation (equalized opportunities): scores are independent of the protected attribute conditional on the true fraud outcome, so genuine and fraudulent entities are treated alike across groups.
- Sufficiency (test fairness): the fraud outcome is independent of the protected attribute conditional on the score, so a given score reflects the same fraud risk in every group.
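On synthetic data, the three criteria — independence (group parity), separation (equalized opportunities) and sufficiency (test fairness) — can each be estimated as conditional rates. The following is an illustrative sketch only; the attribute, score model and threshold are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Synthetic data (illustrative only): a binary protected attribute, a
# fraud label correlated with it, and a score that partly tracks both.
a = rng.integers(0, 2, n)                       # protected attribute
fraud = rng.random(n) < np.where(a == 1, 0.02, 0.01)
score = 0.5 * fraud + 0.1 * a + rng.normal(0, 0.2, n)
flag = score > 0.4                              # model decision

def rate(mask):
    return flag[mask].mean()

# Independence (group parity): P(flag | A) should not depend on A.
print("P(flag | A=0) =", round(rate(a == 0), 3))
print("P(flag | A=1) =", round(rate(a == 1), 3))

# Separation (equalized opportunities): P(flag | fraud, A) equal across A.
print("P(flag | fraud, A=0) =", round(rate((a == 0) & fraud), 3))
print("P(flag | fraud, A=1) =", round(rate((a == 1) & fraud), 3))

# Sufficiency (test fairness): P(fraud | flag, A) equal across A.
print("P(fraud | flag, A=0) =", round(fraud[flag & (a == 0)].mean(), 3))
print("P(fraud | flag, A=1) =", round(fraud[flag & (a == 1)].mean(), 3))
```

In this invented example the score leans on the attribute directly, so the independence and separation estimates differ visibly between groups; comparing each pair of printed rates is exactly the observational check described above.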
At this point, I would imagine that you feel confused by such tightly packed definitions of model fairness. I know I felt that way when I first encountered them. The above definitions are well documented within publicly accessible sources, and examples abound. Additionally, concepts such as group parity, equalized opportunities and test fairness are of special relevance in money laundering and application fraud.
The most important thing to note is that the above criteria are mutually incompatible: outside of degenerate cases, no model can satisfy all of them simultaneously. For example, our research suggests that a well calibrated, sufficient fraud model will always draw on protected attributes or correlated proxy variables, if available. Sensitive information about merchants and card holders is undeniably powerful in forecasting the potential for an entity to be a victim of fraudulent behavior, even if the true causal relationship is complex, rooted in history and potentially unfair. Such models will therefore fail to satisfy the independence criterion, and any conscious attempt to alter the reasoning of an algorithmic decision system may be subject to criticism. Ultimately, self-governance in relation to model fairness requires subjective reasoning and expert understanding of the cause-effect relations that bring about fraud.
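To illustrate why simply removing the protected attribute is not enough, consider a hedged sketch in which the model never sees attribute A but does see a correlated proxy (a merchant category, say); the correlation strength and fraud rates are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Illustrative setup: protected attribute A is never an input to the
# model, but an observable proxy is 80% correlated with it.
a = rng.integers(0, 2, n)
proxy = np.where(rng.random(n) < 0.8, a, 1 - a)
fraud = rng.random(n) < np.where(a == 1, 0.02, 0.01)

# "Blind" model: a calibrated score using only the proxy, i.e. the
# empirical fraud rate within each proxy value.
score = np.array([fraud[proxy == v].mean() for v in (0, 1)])[proxy]

print("mean score, A=0:", round(score[a == 0].mean(), 4))
print("mean score, A=1:", round(score[a == 1].mean(), 4))
```

The per-proxy score is calibrated to observed fraud rates, yet the mean score still differs across the protected groups: independence fails even though A was never an input, which is the sense in which a calibrated, sufficient model "draws on" protected information through proxies.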
Please click here for the next part on Model Fairness in Fraud Prevention, where I discuss practical approaches and commonly used techniques that help fraud models satisfy the above definitions of fairness.