Reducing bias in models
1) Why it matters in iGaming
Models affect responsible gambling (RG) limits, anti-fraud, payout limits, KYC/AML verification, complaint prioritization, personalization, and offers. Biased decisions → regulatory risk, complaints, and reputational damage. The goal is fair, explainable, sustainable models that retain business value.
2) Where bias comes from (sources)
1. Representation bias: underrepresented countries, brands, devices, or new players.
2. Measurement bias: proxy signals (time of day, device) correlate with prohibited attributes.
3. Label bias: past rules, moderation, and manual decisions were themselves biased.
4. Construct bias: the "success" metric is defined in a way that harms vulnerable groups (for example, an aggressive "deposit within 24 hours" KPI).
5. Data/rule drift: models go stale as new markets and rules appear and behavior changes.
6. Experiments: unstratified A/B tests, traffic skew, survivorship bias in sessions.
3) Fairness terms and metrics
Demographic Parity (DP): The proportion of positive decisions is similar between groups.
Equalized Odds (EO): Same TPR and FPR between groups.
Equal Opportunity (EOp): the same TPR (sensitivity) for the "positive" class.
Calibration: predicted probabilities are equally well calibrated across groups.
Treatment/outcome disparity: differences in assigned interventions/outcomes across groups.
Uplift fairness: differences in the effect of interventions across groups.
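The group gaps behind these definitions are easy to compute directly. Below is a minimal sketch (the helper name `group_fairness_report` is ours, not from any library) that reports each group's positive-decision rate (for DP) and TPR/FPR (for EO/EOp) from binary predictions:

```python
import numpy as np

def group_fairness_report(y_true, y_pred, groups):
    """Per-group positive rate (for DP) and TPR/FPR (for EO/EOp).

    y_true, y_pred: binary arrays; groups: array of group labels.
    """
    report = {}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        # TPR = share of actual positives predicted positive; FPR likewise for negatives
        tpr = yp[yt == 1].mean() if (yt == 1).any() else float("nan")
        fpr = yp[yt == 0].mean() if (yt == 0).any() else float("nan")
        report[g] = {"positive_rate": yp.mean(), "tpr": tpr, "fpr": fpr}
    return report
```

The DP gap is then the max difference in `positive_rate` across groups; the equalized-odds gap is the max difference in `tpr` and in `fpr`.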
4) Strategies to reduce bias by stage
4.1 Pre-processing
Reweighing/Resampling: class and group balancing (upsample underrepresented groups).
Data statements: document group coverage, sources, and constraints.
Feature hygiene: remove "dirty" proxies (fine-grained geo, "night/day" as a status proxy); apply binning/masking.
Synthetic data (with caution): for rare cases (chargebacks, self-exclusion), with a check that the synthetic data does not amplify bias.
Label repair: re-label under the changed rules; audit historical cases.
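Of these tactics, reweighing is the simplest to show concretely. The sketch below implements the classic Kamiran-Calders scheme, where each (group, label) cell gets weight P(group)·P(label) / P(group, label), so that group membership and label become statistically independent in the weighted sample; the function name is illustrative:

```python
import numpy as np

def reweighing_weights(y, groups):
    """Kamiran-Calders reweighing: w(g, y) = P(g) * P(y) / P(g, y).

    Underrepresented (group, label) combinations get weights > 1,
    overrepresented ones get weights < 1.
    """
    n = len(y)
    w = np.empty(n, dtype=float)
    for g in np.unique(groups):
        for label in np.unique(y):
            mask = (groups == g) & (y == label)
            p_joint = mask.sum() / n
            if p_joint > 0:
                w[mask] = ((groups == g).mean() * (y == label).mean()) / p_joint
    return w
```

The resulting weights plug straight into any estimator that accepts `sample_weight`.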
4.2 In-processing (in training)
Fairness constraints/regularizers: penalties for TPR/FPR/DP differences between groups.
Adversarial debiasing: a separate "critic" tries to predict the sensitive attribute from the embeddings; the training objective is to make this impossible.
Monotonic/causal constraints: monotonicity on key features (for example, rising losses must not lower the risk score); blocking causally impossible dependencies.
Interpretable baselines: GAM/EBM or gradient boosting with monotonicity constraints as a reference layer.
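As an illustration of a fairness regularizer, the sketch below trains a plain logistic regression by gradient descent with an added demographic-parity penalty: the squared gap in mean predicted score between two groups. The function name and the specific penalty are our own choices for illustration; in practice a library such as Fairlearn would be used instead of hand-rolled gradients:

```python
import numpy as np

def train_fair_logreg(X, y, group, lam=1.0, lr=0.1, epochs=500):
    """Logistic regression with a demographic-parity regularizer:
    loss = BCE + lam * (mean score in group 0 - mean score in group 1)^2.
    group is a binary array; illustrative sketch, not a production trainer."""
    w = np.zeros(X.shape[1])
    g0, g1 = group == 0, group == 1
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad_bce = X.T @ (p - y) / len(y)
        gap = p[g0].mean() - p[g1].mean()
        s = p * (1 - p)  # derivative of the sigmoid
        grad_gap = (X[g0] * s[g0, None]).mean(axis=0) - (X[g1] * s[g1, None]).mean(axis=0)
        w -= lr * (grad_bce + 2 * lam * gap * grad_gap)
    return w
```

Raising `lam` trades accuracy for a smaller score gap between the groups; `lam=0` recovers ordinary logistic regression.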
4.3 Post-processing
Threshold optimization per group: aligning TPR/FPR/PPV within acceptable bounds.
Score calibration: per-subgroup calibration (Platt scaling/isotonic regression).
Policy overrides: RG/compliance business rules on top of the model (for example, "self-exclusion always dominates the offer").
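Per-group threshold selection can be sketched in a few lines: for each group, place the cutoff just above the score of the k-th highest-scoring negative, where k is the number of false positives the target FPR allows. The helper name and the FPR-based criterion are illustrative assumptions:

```python
import numpy as np

def per_group_thresholds(scores, y_true, groups, target_fpr=0.05):
    """Post-processing sketch: one decision threshold per group, chosen so
    that each group's false-positive rate stays at or below target_fpr."""
    thresholds = {}
    for g in np.unique(groups):
        m = groups == g
        neg = np.sort(scores[m][y_true[m] == 0])[::-1]  # negative scores, descending
        if len(neg) == 0:
            thresholds[g] = 0.5  # no negatives observed; fall back to a default
            continue
        k = int(target_fpr * len(neg))  # max false positives allowed in this group
        # place the threshold just above the (k+1)-th highest negative score
        thresholds[g] = neg[k] + 1e-9
    return thresholds
```

Decisions then become `score >= thresholds[group]`, and the same idea works for aligning TPR instead of FPR.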
5) Causal approaches and counterfactual fairness
Causal DAG: an explicit causal hypothesis (gambling losses → RG trigger; country of license → payout rules, but not "player quality").
Counterfactual tests: for a candidate x, change the sensitive attribute/proxy while holding other factors fixed; the decision must remain stable.
Do-interventions: "what if" simulation of changes to controllable factors (e.g., a deposit limit) without touching prohibited attributes.
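A counterfactual stability probe can be automated along these lines: re-score each candidate with the proxy feature forced to each alternative value, holding everything else fixed, and flag decisions that move by more than a tolerance. Function and parameter names are our own sketch:

```python
import numpy as np

def counterfactual_stability(model_fn, X, proxy_col, values, tol=0.02):
    """Counterfactual probe (sketch): overwrite one proxy column with each
    candidate value and flag rows whose score shifts by more than tol.

    model_fn maps a feature matrix to scores in [0, 1]."""
    base = model_fn(X)
    unstable = np.zeros(len(X), dtype=bool)
    for v in values:
        X_cf = X.copy()
        X_cf[:, proxy_col] = v  # flip the proxy, keep all other factors fixed
        unstable |= np.abs(model_fn(X_cf) - base) > tol
    return unstable
```

Any row flagged here is a candidate for a fairness incident: the model's decision depends on the proxy rather than on legitimate factors.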
6) Practice for iGaming: Typical Cases
RG scoring: goal is Equal Opportunity (do not miss at-risk players, regardless of group) plus calibration. Hard overrides for self-exclusion rules.
Anti-fraud/AML: Equalized Odds (FPR control) plus separate thresholds by market/payment method.
KYC at onboarding: minimize false rejections for "thin-file" players; active learning for underrepresented documents/devices.
Marketing personalization: exclude high-risk players from aggressive offers; limit proxy features (time of day, device); use uplift fairness.
7) Monitoring fairness in production
What we monitor:
- EO/EOp deltas (TPR/FPR) across main groups (country, device, channel); calibration; base-rate drift; feature drift.
- Business effect: differences in approval of payouts/limits/offers.
- RG complaints/outcomes: response rate and quality of interventions.
How we monitor:
- Per-group dashboards, control charts, CI/CD alerts on fairness-threshold violations.
- Stratified experiments: A/B tests with mandatory fairness-metric reporting; early-stopping rules.
- Shadow/Champion-Challenger: parallel run of a new policy with fairness reports.
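The alerting step above can be sketched as a small check that compares per-group TPR/FPR gaps against agreed thresholds; in practice its output would feed a dashboard or a CI/CD alert. Names and default thresholds are illustrative:

```python
import numpy as np

def fairness_alerts(y_true, y_pred, groups, max_tpr_gap=0.05, max_fpr_gap=0.05):
    """Production monitor (sketch): per-group TPR/FPR with an alert flag
    when the max-min gap across groups exceeds the agreed thresholds."""
    tprs, fprs = [], []
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        tprs.append(yp[yt == 1].mean())  # assumes each group has positives
        fprs.append(yp[yt == 0].mean())  # ... and negatives in the window
    tpr_gap = max(tprs) - min(tprs)
    fpr_gap = max(fprs) - min(fprs)
    return {
        "tpr_gap": float(tpr_gap),
        "fpr_gap": float(fpr_gap),
        "alert": bool(tpr_gap > max_tpr_gap or fpr_gap > max_fpr_gap),
    }
```

Run over a sliding window per model and group dimension (country, device, channel), this is enough to drive threshold-violation alerts.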
8) Relationship with Governance/Privacy
Acceptable feature policies: list of allowed/prohibited/conditional features, proxy audit.
Model cards + Fairness Appendix: goal, data, metrics, groups, limitations, review cadence.
DSAR/transparency: explainable reasons for denials/limits; decision logs.
Process RACI: who approves fairness thresholds, who closes out incidents.
9) Templates and checklists
9.1 Fairness check before release
- Group coverage in training and validation is documented
- Target fairness metrics (EO/EOp/DP/calibration) and thresholds are chosen
- Counterfactual tests and a proxy audit have been run
- A post-processing plan is prepared (per-group thresholds/calibration)
- RG/compliance overrides are agreed
- Monitoring and alerts are configured; an incident owner is assigned
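The metric-threshold items in this checklist can be enforced mechanically: a small CI gate that compares a measured fairness report against the agreed thresholds and fails the release on any violation. The dict shapes and names below are assumptions for illustration:

```python
def fairness_gate(report, thresholds):
    """CI release gate (sketch): fail when any measured fairness metric
    exceeds its agreed limit; a missing metric also counts as a failure.

    report / thresholds: dicts such as {"tpr_gap": 0.03, "fpr_gap": 0.02}."""
    failures = [
        f"{name}={report.get(name, float('inf')):.3f} exceeds limit {limit:.3f}"
        for name, limit in thresholds.items()
        if report.get(name, float("inf")) > limit
    ]
    return {"passed": not failures, "failures": failures}
```

Wired into the pipeline, `passed=False` blocks the release and the `failures` list goes into the release report.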
9.2 Fairness Appendix template (for the model card)
Purpose and impact: which decisions the model affects
Groups and coverage: distribution across training/validation sets
Metrics and results: EO/EOp/calibration with confidence intervals
Debiasing interventions: what is applied (reweighing, constraints, thresholds)
Limitations: known risks and where the model must not be used
Review cadence: date, owner, criteria for revision
9.3 Feature Policy (snippet)
Prohibited: direct/indirect sensitive attributes (religion, health, fine-grained geo proxies).
Conditional: device/channel/time of day, only after a proxy test and a benefit justification.
Mandatory: PII masking, pseudonymization, monotonic constraints on risk features.
10) Implementation tools and patterns
Pipeline hooks: automatic tests for proxy correlations, TPR/FPR gaps, and per-group calibration.
Explainability for support: local attributions (SHAP/IG) plus an approved "dictionary of explanations."
Active learning: data collection for rare groups; multilevel confidence thresholds.
CI gates: the pipeline fails when fairness thresholds are violated or disallowed features appear.
Champion-Challenger: safe rollout; a fairness comparison log.
11) Implementation roadmap
0-30 days (MVP)
1. Identify high-impact models (RG, AML, payouts, KYC).
2. Fix target fairness metrics and thresholds.
3. Add pre-processing balancing and basic calibration.
4. Launch an EO/EOp/calibration dashboard for key groups.
5. Update model cards with a Fairness Appendix.
30-90 days
1. Implement in-processing (constraints/adversarial debiasing).
2. Configure per-group threshold policies (post-processing) and shadow runs.
3. Add counterfactual tests to CI and stratified A/B rules.
4. Review incidents and complaints regularly; adjust thresholds.
3-6 months
1. Build causal graphs for key tasks; add monotonic/causal constraints.
2. Use active learning and collect reference data for rare cases.
3. Automate fairness reporting and wire its signals into the release process.
4. Audit all feature policies and proxy lists.
12) Anti-patterns
"AUC first, fairness later": late and expensive.
Ignoring per-group calibration.
One shared threshold for radically different base rates.
Repeatedly pruning features instead of finding the causal roots.
Explainability as a checkbox, without a valid dictionary for support.
No stratification in A/B tests.
13) Success metrics (section KPIs)
EO/EOp deltas below the set threshold
Stable per-group calibration (Brier/ACE)
Share of releases passing the fairness gate in CI
Fewer complaints/escalations related to unfair decisions
Improved RG outcomes without increased disparity
Fairness Appendix coverage of model cards ≥ 90%
Reducing bias is an engineering discipline, not a one-time "filter." Clearly chosen fairness metrics, debiasing tactics at every stage, causal thinking, and rigorous production monitoring yield models that act fairly, withstand audit, and improve long-term business metrics and player trust.