At Featurespace, our tools and algorithms are widely used to monitor financial activity and identify fraudulent behavior. Our mission is to make the world a safer place to transact, so we routinely evaluate and respond to millions of data points in our efforts to combat card and payment fraud, application fraud and money laundering. This sometimes involves processing sensitive information associated with a financial entity (e.g., a person), which may be directly observable or indirectly present through spurious relationships. Examples include the age, fiscal activity type, gender or origin of individuals and their businesses. Such information is important for understanding what genuine behavior looks like for each person, which is how Featurespace can deliver market-leading fraud prevention rates for our customers. But this sensitive information can also be misinterpreted to make sweeping judgements about the inherent riskiness of people and businesses in particular demographics. We therefore question whether any biases exist in our transaction scoring tools and strive to understand whether they are justifiable. This article examines some of the core concepts needed to understand how model bias arises, and how to prevent it.
Most algorithmic biases in the fraud space originate when we use prevention models for interventional decision-making. Commonly, this begins with an unanticipated impact of legacy systems. The data-generating process for purchases and payments has historically been influenced (intervened on) by manually updated rules, limits, and controls imposed over decades, some of which may have been drawn from demographic profiles of card holders and merchants. This has biased how different entities have been allowed to transact historically, which in turn has shaped the fraud methodologies pursued by malicious actors. It is possible to identify fraud patterns in historic data that differentially affect populations segmented by merchant category, or even by the card holder's gender, ethnicity, nationality, or age. A 'vanilla' fraud modelling strategy typically builds on such observational data and can incorrectly associate a protected attribute with fraud prevalence. A newly built decisioning system may then intervene by imposing divergent transacting restrictions based on an entity's protected attributes, creating feedback loops that further perpetuate unfair evaluations.
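The labelling mechanism behind such feedback can be sketched with a toy simulation. In this hypothetical setup (all group names, rates and review probabilities are illustrative assumptions, not drawn from any real system), two groups share the same true fraud rate, but legacy rules route one group to manual review twice as often, so more of its fraud is detected and labelled:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical toy numbers: both groups share the same true fraud rate,
# but legacy rules send group B to review twice as often as group A.
true_rate = 0.01
review_prob = {"A": 0.2, "B": 0.4}
apparent = {}

for group, p in review_prob.items():
    is_fraud = rng.random(n) < true_rate
    reviewed = rng.random(n) < p
    # Fraud only enters the labelled data when a transaction is reviewed,
    # so the detected rate scales with review coverage, not true risk.
    apparent[group] = (is_fraud & reviewed).mean()
    print(f"group {group}: apparent fraud rate = {apparent[group]:.4f}")
```

A model trained on these labels sees group B as roughly twice as risky despite identical underlying rates; if its scores then drive future review decisions, the disparity feeds back on itself.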
Additionally, fraud prevention systems may suffer from poor design choices or improper correction mechanisms; done properly, such mechanisms should ensure the training data are balanced and representative. Unlike common practice in political science or biostatistics, machine learning models used for fraud prevention rarely account for the data collection mechanism, missing data, or dissimilarities between populations. This is often the case when models are calibrated on large datasets pooled from multiple merchants grouped in a consortium. Such prevention systems are commonly dominated by fraud typologies observable only in the large institutions within the consortium, and from the perspective of small merchants their application can be detrimental and subject to multiple biases.
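A minimal sketch of this consortium effect, assuming an invented setup in which fraud at a large merchant involves high transaction amounts while fraud at a small merchant involves low amounts (card-testing, say); all volumes, rates and amounts here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical consortium: one large merchant (95% of transactions) where
# fraud shows up as high amounts, and one small merchant where fraud
# shows up as *low* amounts. Illustrative numbers only.
def sample(n, fraud_rate, fraud_high):
    y = rng.random(n) < fraud_rate
    amount = np.where(
        y,
        rng.normal(300 if fraud_high else 5, 10, n),  # fraud amounts
        rng.normal(50, 20, n),                        # genuine amounts
    )
    return amount, y

amt_big, y_big = sample(95_000, 0.01, fraud_high=True)
amt_small, y_small = sample(5_000, 0.01, fraud_high=False)

# "Model": flag transactions above a threshold tuned on the pooled data
# to catch ~90% of pooled fraud.
amt_all = np.concatenate([amt_big, amt_small])
y_all = np.concatenate([y_big, y_small])
threshold = np.percentile(amt_all[y_all], 10)

recall_big = ((amt_big > threshold) & y_big).sum() / y_big.sum()
recall_small = ((amt_small > threshold) & y_small).sum() / y_small.sum()
print(f"recall, large merchant: {recall_big:.2f}")
print(f"recall, small merchant: {recall_small:.2f}")
```

The threshold tuned on the pooled data is dominated by the large merchant's fraud pattern, so it misses the small merchant's fraud almost entirely.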
In the absence of recognized standards that regulate machine learning systems and their applications, at Featurespace we strive to establish and disseminate good practice. As such, we are concerned with three primary fairness criteria popularly assessed under an observational lens, i.e., judged on properties of the joint statistical distribution of transaction scores, protected attributes, and fraud prevalence:

- Independence (group parity): transaction scores are statistically independent of the protected attribute, so all groups are flagged at similar rates.
- Separation (equalized opportunities): scores are independent of the protected attribute conditional on the true fraud outcome, so genuine and fraudulent entities are treated alike across groups.
- Sufficiency (test fairness): the fraud outcome is independent of the protected attribute conditional on the score, so a given score reflects the same fraud risk in every group.
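On synthetic data, the three criteria — independence (group parity), separation (equalized opportunities) and sufficiency (test fairness) — can each be estimated as conditional rates. The following is an illustrative sketch only; the attribute, score model and threshold are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Synthetic data (illustrative only): a binary protected attribute, a
# fraud label correlated with it, and a score that partly tracks both.
a = rng.integers(0, 2, n)                       # protected attribute
fraud = rng.random(n) < np.where(a == 1, 0.02, 0.01)
score = 0.5 * fraud + 0.1 * a + rng.normal(0, 0.2, n)
flag = score > 0.4                              # model decision

def rate(mask):
    return flag[mask].mean()

# Independence (group parity): P(flag | A) should not depend on A.
print("P(flag | A=0) =", round(rate(a == 0), 3))
print("P(flag | A=1) =", round(rate(a == 1), 3))

# Separation (equalized opportunities): P(flag | fraud, A) equal across A.
print("P(flag | fraud, A=0) =", round(rate((a == 0) & fraud), 3))
print("P(flag | fraud, A=1) =", round(rate((a == 1) & fraud), 3))

# Sufficiency (test fairness): P(fraud | flag, A) equal across A.
print("P(fraud | flag, A=0) =", round(fraud[flag & (a == 0)].mean(), 3))
print("P(fraud | flag, A=1) =", round(fraud[flag & (a == 1)].mean(), 3))
```

In this invented example the score leans on the attribute directly, so the independence and separation estimates differ visibly between groups; comparing each pair of printed rates is exactly the observational check described above.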
At this point, I would imagine that you feel confused by such tightly packed definitions of model fairness. I know I felt that way when I first encountered them. The above definitions are well documented within publicly accessible sources, and examples abound. Additionally, concepts such as group parity, equalized opportunities and test fairness are of special relevance in money laundering and application fraud.
The most important thing to note is that the above criteria are mutually incompatible: outside of degenerate cases, no model can satisfy all of them simultaneously. For example, our research suggests that a well calibrated, sufficient fraud model will always draw on protected attributes or correlated proxy variables, if available. Sensitive information about merchants and card holders is undeniably powerful in forecasting the potential for an entity to be a victim of fraudulent behavior, even if the true causal relationship is complex, rooted in history and potentially unfair. Such models will therefore fail to satisfy the independence criterion, and any conscious attempt to alter the reasoning of an algorithmic decision system may be subject to criticism. Ultimately, self-governance in relation to model fairness requires subjective reasoning and expert understanding of the cause-effect relations that bring about fraud.
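To illustrate why simply removing the protected attribute is not enough, consider a hedged sketch in which the model never sees attribute A but does see a correlated proxy (a merchant category, say); the correlation strength and fraud rates are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Illustrative setup: protected attribute A is never an input to the
# model, but an observable proxy is 80% correlated with it.
a = rng.integers(0, 2, n)
proxy = np.where(rng.random(n) < 0.8, a, 1 - a)
fraud = rng.random(n) < np.where(a == 1, 0.02, 0.01)

# "Blind" model: a calibrated score using only the proxy, i.e. the
# empirical fraud rate within each proxy value.
score = np.array([fraud[proxy == v].mean() for v in (0, 1)])[proxy]

print("mean score, A=0:", round(score[a == 0].mean(), 4))
print("mean score, A=1:", round(score[a == 1].mean(), 4))
```

The per-proxy score is calibrated to observed fraud rates, yet the mean score still differs across the protected groups: independence fails even though A was never an input, which is the sense in which a calibrated, sufficient model "draws on" protected information through proxies.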
Please click here for the next part on Model Fairness in Fraud Prevention, where I discuss practical approaches and commonly used techniques that help fraud models satisfy the above definitions of fairness.