By Maeve Madigan, Research Scientist at Featurespace, A Visa Solution

Imagine you have a group of AI agents working together to help banks and financial institutions decide who should get a loan, or who might be trying to commit fraud. These agents can talk to each other, share ideas, and make decisions as a team, mirroring the traditional human teams at these organizations. But here’s the big question: when they work together, can they sometimes become more biased, especially against certain groups of customers?

The latest research from the Featurespace Innovation Lab set out to discover whether these teams of AI agents can unintentionally become more unfair, or more biased, than their individual constituent agents. This matters because more and more organizations are beginning to automate processes through agents, and unfair decisions can hurt customers, damage reputations and even result in regulatory fines.

We’re thrilled that this research has been accepted to the prestigious ‘Workshop on Agentic AI in Financial Services’ at the AAAI 2026 conference in Singapore, where author Maeve Madigan will present the findings in person in January 2026.

 

The Research Methodology 

Our team set up a series of experiments using two real-world datasets: 

  • One that looked at Consumer Income 
  • One that looked at Individual Consumer Credit Risk 

The team ran large-scale simulations across 10 different LLMs (in their most current versions at the time of the study), arranged in a range of multi-agent configurations and tasked with solving problems ‘in teams’. The agents would debate and refine their answers, much like students discussing homework, before agreeing on a final solution.
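
To make the setup concrete, here is a minimal sketch of what such a debate loop might look like. This is illustrative only: the `query_llm` helper is a hypothetical stand-in for any chat-completion API, and the majority-vote consensus rule is an assumption; the actual prompts, models and aggregation used in the study may differ.

```python
# Illustrative multi-agent debate loop (not the study's actual implementation).
from collections import Counter

def query_llm(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API call.
    Expected to return a decision string such as 'approve' or 'deny'."""
    raise NotImplementedError("plug in your LLM provider here")

def debate(models: list[str], case: str, rounds: int = 3) -> str:
    # Round 0: each agent answers independently.
    answers = {m: query_llm(m, f"Decide approve/deny for this case:\n{case}")
               for m in models}
    # Debate rounds: each agent sees its peers' answers and may revise its own.
    for _ in range(rounds):
        for m in models:
            peers = "\n".join(f"- {a}" for other, a in answers.items() if other != m)
            answers[m] = query_llm(
                m,
                f"Case:\n{case}\nOther agents answered:\n{peers}\n"
                "Reconsider and give your final answer (approve/deny).",
            )
    # One possible consensus rule: majority vote across the agents.
    return Counter(answers.values()).most_common(1)[0][0]
```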

To check for fairness, our Innovation Lab team looked at whether the multi-agent teams treated people differently based on attributes such as gender. They measured how accurate the decisions were for each group and compared the results.
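
The paper details the exact fairness metrics used; as one common example of this kind of check, the sketch below computes per-group accuracy and the largest gap between groups, an accuracy-parity style measure. The function name and toy data here are purely illustrative.

```python
import numpy as np

def accuracy_gap(y_true, y_pred, group):
    """Per-group accuracy and the largest gap between any two groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    accs = {
        str(g): float((y_pred[group == g] == y_true[group == g]).mean())
        for g in np.unique(group)
    }
    return accs, max(accs.values()) - min(accs.values())

# Toy example: loan decisions scored separately for two gender groups.
accs, gap = accuracy_gap(
    y_true=[1, 0, 1, 1], y_pred=[1, 0, 0, 1], group=["F", "F", "M", "M"]
)
print(accs)  # {'F': 1.0, 'M': 0.5}
print(gap)   # 0.5
```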

 

Key Findings 

  • Unpredictable Bias: Working together sometimes made the agents more biased and sometimes less biased than when they worked individually. 
  • Long-Tail Risk: Most of the time, the changes in bias were small. In rare cases, however, the multi-agent teams became far more unfair, occasionally by a factor of ten (see the sketch after this list). 
  • System-Level Testing Needed: The findings advocate for evaluating multi-agent systems as unified teams, rather than on an agent-by-agent basis. 
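
To illustrate what that tenfold figure means in practice, the sketch below computes a simple amplification factor: the team’s fairness gap divided by the average gap of its individual members. The numbers are hypothetical, chosen only to show the scale of the long-tail cases.

```python
def bias_amplification(individual_gaps: list[float], team_gap: float) -> float:
    """Ratio of the team's fairness gap to the mean gap of its member agents."""
    baseline = sum(individual_gaps) / len(individual_gaps)
    return team_gap / baseline

# Hypothetical numbers: individual agents show ~2% accuracy gaps on their own,
# while one team configuration shows a 20% gap -- roughly a 10x amplification.
print(bias_amplification([0.02, 0.02, 0.02], 0.20))  # ~10.0
```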

 

[Infographic: Emerging Bias methodology and findings]

 

The Featurespace Takeaway 

At Featurespace, we believe in making the world a safer place to transact, and that depends on maintaining fairness wherever possible. This research shows that bringing advanced LLMs together can deliver powerful results, so long as we, as an industry, work together and stay vigilant. 

Read the research report here