Bias and fairness work in underwriting is the systematic evaluation and mitigation of discriminatory patterns that emerge when machine learning systems make credit, insurance, or merchant approval decisions. These models can inadvertently disadvantage applicants on the basis of race, gender, age, or geography, even when those attributes are not explicitly used as input features.
How Bias Enters Underwriting Models
Understanding the sources of bias is essential before mitigation strategies can work.
Historical Data Reflects Past Discrimination
Machine learning models learn from historical decisions, which often encode decades of discriminatory lending practices. If a bank historically denied mortgages to applicants from certain zip codes, a model trained on that data will learn to associate those zip codes with higher risk. This creates a feedback loop in which past discrimination perpetuates future discrimination. Redlining patterns that date back to the 1930s can still appear in modern credit models when geographic features serve as proxies for race.
Proxy Variables and Feature Correlation
Even when protected attributes like race or gender are excluded, models can discover correlated features that serve as proxies. Zip codes correlate with race and income. First names correlate with gender and ethnicity. Shopping patterns, social media connections, and even the time of day an application is submitted can encode demographic signals. A 2022 study by Stanford University found that 78 percent of alternative data features used in fintech underwriting correlated significantly with at least one protected class.
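One way to screen a candidate feature for proxy behavior is to measure how strongly it is associated with a protected attribute. The sketch below computes Cramér's V for two categorical columns using only the standard library. The function name, the toy data, and the idea of comparing features by this score are illustrative assumptions, not a method described in the studies cited above.

```python
from collections import Counter
from math import sqrt


def cramers_v(feature_values, protected_values):
    """Cramér's V between two categorical sequences (0 = no association, 1 = perfect)."""
    n = len(feature_values)
    if n == 0 or n != len(protected_values):
        raise ValueError("inputs must be non-empty and the same length")
    joint = Counter(zip(feature_values, protected_values))
    f_counts = Counter(feature_values)
    p_counts = Counter(protected_values)
    k = min(len(f_counts), len(p_counts))
    if k < 2:
        return 0.0  # a constant column carries no association
    chi2 = 0.0
    for f, f_n in f_counts.items():
        for p, p_n in p_counts.items():
            expected = f_n * p_n / n
            observed = joint.get((f, p), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical toy data: zip code tracks group membership, submission time does not.
zip_codes = ["A", "A", "A", "B", "B", "B", "A", "B"]
groups = ["x", "x", "x", "y", "y", "y", "y", "x"]
hours = ["am", "pm", "am", "pm", "am", "pm", "am", "pm"]

print(cramers_v(zip_codes, groups))  # 0.5
print(cramers_v(hours, groups))      # 0.0
```

A high score only flags a feature for review. Whether the feature should be dropped, transformed, or kept with a documented business justification is a separate decision.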
Unequal Representation in Training Data
When training datasets underrepresent certain populations, models perform worse for those groups. If only 5 percent of historical loan applicants were small business owners in rural areas, the model has less signal to accurately assess their risk. This leads to higher error rates, more false declines, and worse outcomes for underrepresented groups. Stripe and Square have documented this challenge when expanding merchant underwriting to new verticals and geographies.
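The effect of underrepresentation can be made visible by breaking error rates out by group. The sketch below computes a false decline rate (creditworthy applicants who were declined) per group. The record layout and group names are hypothetical, and it assumes the true outcome is known for every applicant, which in practice requires reject inference or a labeled holdout sample.

```python
from collections import defaultdict


def false_decline_rates(records):
    """records: iterable of (group, approved: bool, creditworthy: bool).

    Returns, per group, the share of creditworthy applicants who were declined.
    """
    good = defaultdict(int)
    declined_good = defaultdict(int)
    for group, approved, creditworthy in records:
        if creditworthy:
            good[group] += 1
            if not approved:
                declined_good[group] += 1
    return {g: declined_good[g] / good[g] for g in good}


# Hypothetical toy data: the rural segment is small and misjudged more often.
records = (
    [("urban", True, True)] * 9
    + [("urban", False, True)] * 1
    + [("urban", False, False)] * 1
    + [("rural", True, True)] * 2
    + [("rural", False, True)] * 2
)

rates = false_decline_rates(records)
print(rates["urban"], rates["rural"])  # 0.1 0.5
```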
Measuring and Testing for Fairness
Financial institutions must implement ongoing testing regimes to detect and quantify bias.
Disparate Impact Analysis
Disparate impact occurs when a facially neutral policy disproportionately affects a protected group. A common screening threshold is the 80 percent (four-fifths) rule, which originates in the EEOC's employee selection guidelines and is widely borrowed in fair lending analysis: if the approval rate for a protected group falls below 80 percent of the rate for the reference group (typically the group with the highest approval rate), the model warrants closer disparate impact review. The rule is a screening heuristic, not a legal safe harbor. Institutions typically run these analyses quarterly or whenever models are retrained. JPMorgan Chase publicly disclosed implementing automated disparate impact testing across all consumer lending models in 2023.
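The 80 percent threshold described above reduces to a ratio of approval rates. The following is a minimal sketch of that arithmetic. The function names, group labels, and counts are hypothetical, and a real analysis would also account for sample size and statistical significance.

```python
def adverse_impact_ratios(approvals, reference_group):
    """approvals: dict of group -> (approved_count, applicant_count).

    Returns each group's approval rate divided by the reference group's rate.
    """
    ref_approved, ref_total = approvals[reference_group]
    ref_rate = ref_approved / ref_total
    return {g: (a / t) / ref_rate for g, (a, t) in approvals.items()}


def flag_groups(ratios, threshold=0.8):
    """Groups whose ratio falls below the threshold (0.8 = the 80 percent rule)."""
    return sorted(g for g, r in ratios.items() if r < threshold)


# Hypothetical quarterly approval counts.
approvals = {
    "reference": (600, 1000),  # 60% approved
    "group_b": (210, 500),     # 42% approved -> ratio 0.70
    "group_c": (150, 300),     # 50% approved -> ratio about 0.83
}

ratios = adverse_impact_ratios(approvals, "reference")
print(flag_groups(ratios))  # ['group_b']
```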
Fairness Metrics and Trade-Offs
Multiple fairness metrics exist, and they often conflict. Demographic parity requires equal approval rates across groups. Equalized odds requires equal true positive and false positive rates across groups. Calibration requires that a score mean the same thing in every group: applicants scored at a 70 percent repayment probability should repay about 70 percent of the time regardless of group membership. Except in special cases, such as when groups have identical repayment base rates or the model predicts perfectly, a model cannot satisfy all of these criteria simultaneously, forcing institutions to make value judgments about which fairness criteria matter most for their context. According to a 2024 report by Brookings Institution, 62 percent of financial institutions now formally document their fairness metric choices and rationale.
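The conflict between metrics is easiest to see on a small example. The sketch below computes per-group approval rate, true positive rate, and false positive rate, then reports the gap between groups for each. The record layout and toy data are hypothetical, and the data assumes the repayment outcome is known even for declined applicants. In this data the two groups have equal approval rates (demographic parity holds) but different repayment base rates, so equalized odds fails.

```python
def group_rates(records):
    """records: iterable of (group, approved: bool, repaid: bool)."""
    stats = {}
    for group, approved, repaid in records:
        s = stats.setdefault(
            group, {"n": 0, "approved": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0}
        )
        s["n"] += 1
        s["approved"] += int(approved)
        if repaid:
            s["pos"] += 1
            s["tp"] += int(approved)  # approved and repaid
        else:
            s["neg"] += 1
            s["fp"] += int(approved)  # approved but did not repay
    return {
        g: {
            "approval_rate": s["approved"] / s["n"],
            "tpr": s["tp"] / s["pos"] if s["pos"] else float("nan"),
            "fpr": s["fp"] / s["neg"] if s["neg"] else float("nan"),
        }
        for g, s in stats.items()
    }


def gap(rates, key):
    """Largest difference between any two groups on one metric."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


# Hypothetical data: group "a" has 8 of 10 repayers, group "b" has 4 of 10.
records = (
    [("a", True, True)] * 5 + [("a", False, True)] * 3 + [("a", False, False)] * 2
    + [("b", True, True)] * 4 + [("b", True, False)] * 1 + [("b", False, False)] * 5
)

rates = group_rates(records)
print(gap(rates, "approval_rate"))  # 0.0   (demographic parity satisfied)
print(gap(rates, "tpr"))            # 0.375 (equalized odds violated)
```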
Model Cards and Documentation
Model cards document a model's intended use, training data composition, performance across demographic groups, and known limitations. This transparency enables regulators, auditors, and internal stakeholders to assess fairness. The interagency Model Risk Management guidance, issued in 2011 as SR 11-7 by the Federal Reserve and as Bulletin 2011-12 by the Office of the Comptroller of the Currency (OCC), requires institutions to maintain documentation of model development and validation, and fairness testing results are commonly kept alongside it. Google researchers introduced model cards in 2019 for public AI systems, and financial regulators increasingly expect similar documentation for underwriting models.
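A model card can be kept as a structured record so that it is versioned and reviewed alongside the model. The sketch below mirrors the four elements listed above. The class, its field names, and the sample values are hypothetical and do not follow any regulator's or vendor's schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelCard:
    model_name: str
    intended_use: str
    training_data_composition: dict  # group -> share of training rows
    performance_by_group: dict       # group -> {metric name: value}
    known_limitations: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the card so it can be stored with the model artifact."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical example values for illustration only.
card = ModelCard(
    model_name="merchant-underwriting-v3",
    intended_use="Initial approval decisions for small merchant accounts",
    training_data_composition={"urban": 0.95, "rural": 0.05},
    performance_by_group={
        "urban": {"false_decline_rate": 0.10},
        "rural": {"false_decline_rate": 0.50},
    },
    known_limitations=["Sparse training data for rural merchants"],
)

print(card.to_json())
```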
Summary
Addressing bias and fairness in underwriting models demands attention to historical data patterns, proxy variables, and representation gaps that can produce discriminatory outcomes. Financial institutions must implement disparate impact testing, select appropriate fairness metrics, and maintain transparent documentation to meet regulatory expectations and protect customers from algorithmic harm.