GambleHub

Experiment flags and A/B tests

1) Why you need it

Experimentation is a controlled way to improve conversion and reliability without the risk of breaking production. In iGaming this touches registration, deposits/withdrawals, bet placement/settlement, KYC/AML funnels, lobby/UX, bonuses, and anti-fraud. Feature flags enable rapid, reversible changes; A/B tests provide evidence of effect before scaling.

2) Platform principles

1. Safety-by-design: flags with TTL, rollbacks, and reach limits; no enabling while SLOs are red.
2. Compliance-aware: SoD/4-eyes for sensitive flags (payments, RG, PII); geo-residency of data.
3. Single Source of Truth: all flags/experiments live as data (Git/policy repository).
4. Deterministic assignment: stable bucketing (hash of user/device/account).
5. Observability: exposures/conversions are logged; SRM/guardrails are checked automatically.
6. Cost-aware: limits on telemetry cardinality and the cost of experiments.

3) Taxonomy of flags

Release flags: control version rollout (canary/rollout/kill-switch).
Experiment flags: A/B/n, multi-armed bandits, interleaving for ranking.
Ops flags: temporary feature degradation, provider switching (PSP/KYC).
Config flags: parameters without a release (limits, texts, coefficients).
Safety flags: emergency switches (PII export off, bonus caps).

Each flag has: `owner`, `risk_class`, `scope` (tenant/region), `rollout_strategy`, `ttl`, `slo_gates`, `audit`.

4) Platform architecture

Flag Service (CDN cache): serves decisions in ≤10–20 ms; subscribes to the GitOps reconciler.
Assignment Engine: stable hash + stratification (GEO/brand/device) → buckets.
Experiment Service: test catalog, MDE/power calculation, SRM/guardrails, statistics.
Exposure Logger: idempotent log of flag/variant exposure + event key.
Metrics API: SLI/KPI/KRI and experiment aggregates (CUPED adjustments).
Policy Engine: SoD/4-eyes, freeze windows, geo-constraints, SLO gates.
Dashboards & Bot: reports, guardrail alerts, short commands in a chatbot.

5) Data model (simplified)

Flag: `id`, `type`, `variants`, `allocation{A:0.5,B:0.5}`, `strata{geo,tenant,device}`, `constraints`, `ttl`, `kill_switch`, `slo_gates`, `risk_class`, `audit`.
Experiment: `id`, `hypothesis`, `metrics{primary,secondary,guardrails}`, `audience`, `power`, `mde`, `duration_rule`, `sequential?`, `cuped?`, `privacy_scope`.
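A minimal sketch of this model as Python dataclasses, for illustration only; field names follow the document, while the types and defaults are assumptions rather than a platform schema:

```python
# Illustrative only: types/defaults are assumptions, not a platform schema.
from dataclasses import dataclass, field

@dataclass
class Flag:
    id: str
    type: str                      # release | experiment | ops | config | safety
    variants: list[str]            # e.g. ["A", "B"]
    allocation: dict[str, float]   # e.g. {"A": 0.5, "B": 0.5}
    strata: list[str]              # e.g. ["geo", "tenant", "device"]
    ttl: str = "30d"
    kill_switch: bool = True
    slo_gates: list[str] = field(default_factory=list)
    risk_class: str = "medium"

@dataclass
class Experiment:
    id: str
    hypothesis: str
    metrics: dict[str, list[str]]  # keys: primary, secondary, guardrails
    audience: dict
    power: float = 0.8
    mde: str = "1pp"
    sequential: bool = True
    cuped: bool = True
    privacy_scope: str = "aggregates-only"
```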

6) Idea-to-inference process

1. Hypothesis: target metric, risk/compliance assessment, MDE (minimum detectable effect).
2. Design: choice of audience and stratification (GEO/tenant/device), power and duration calculation (see the sketch after this list).
3. Randomization and launch: enabled via the Policy Engine (SLO green, SoD passed).
4. Monitoring: SRM checks (randomization distortion), guardrails (errors/latency/revenue).
5. Analysis: frequentist (t-test, U-test) or Bayesian; CUPED for variance reduction.
6. Decision: promote/rollback/iterate; record in the knowledge catalog.
7. Archiving: flag disabled at TTL, removed from configuration/code, telemetry cleaned up.
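The design step starts from a sample-size estimate. A back-of-the-envelope sketch using the normal approximation for a two-sided two-proportion z-test; the baseline (5% deposit conversion) and MDE (+1 pp) are illustrative and match the stats defaults used later in this document:

```python
# Sample size per variant via the normal approximation; numbers illustrative.
from scipy.stats import norm

def sample_size_per_variant(p_base: float, mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """n per variant for a two-sided two-proportion z-test."""
    p2 = p_base + mde
    p_bar = (p_base + p2) / 2
    z_a = norm.ppf(1 - alpha / 2)          # critical value for alpha
    z_b = norm.ppf(power)                  # quantile for the desired power
    n = ((z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_b * (p_base * (1 - p_base) + p2 * (1 - p2)) ** 0.5) ** 2
         / mde ** 2)
    return int(n) + 1

n = sample_size_per_variant(0.05, 0.01)    # baseline 5%, MDE +1 pp
print(n)  # ~8,200 per variant; duration ≈ n / daily eligible traffic
```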

7) Purpose and bucketing

Deterministic: `bucket = hash(secret_salt + user_id) mod N` (sketch below).
Stratification: separately by `geo, tenant, device, new_vs_returning` → uniformity within layers.
Single salt per period: salt changes are controlled to avoid collisions/leaks.
Exposures: logged before the first target-metric event (to avoid selective logging).
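A minimal sketch of deterministic bucketing with a per-period salt, assuming SHA-256; the salt value, bucket count, and function names are illustrative:

```python
# Deterministic assignment: same user + same salt => same variant, always.
import hashlib

N_BUCKETS = 1000

def bucket(user_id: str, salt: str) -> int:
    """Stable bucket in [0, N_BUCKETS)."""
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % N_BUCKETS

def assign(user_id: str, salt: str, allocation: dict[str, float]) -> str:
    """Map the bucket onto cumulative allocation shares (e.g. A 0.5 / B 0.5)."""
    point = bucket(user_id, salt) / N_BUCKETS
    cumulative = 0.0
    for name, share in allocation.items():
        cumulative += share
        if point < cumulative:
            return name
    return name  # guard against floating-point rounding at the boundary

print(assign("user-42", "salt-2025Q3", {"A": 0.5, "B": 0.5}))
```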

8) Metrics and guardrails

Primary: registration/deposit conversion, ARPPU, D1/D7 retention, KYC speed, lobby CTR.
Secondary: LCP/JS errors, p95 bet→settle latency, PSP auth-success rate.
Guardrails: error_rate, p99 latency, SLO burn rate, complaints/tickets, RG thresholds (responsible gaming); a check sketch follows this list.
Long-term: churn, LTV proxies, chargebacks, RG flags.
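Since guardrails are written as expressions like `api_error_rate<1.5%` (see the config example in section 12.2), here is a minimal sketch of evaluating them; the expression grammar is an assumption for illustration:

```python
# Evaluate guardrail expressions such as "latency_p99<2s" against live metrics.
# The grammar (metric, < or >, number, optional unit suffix) is an assumption.
import re

def guardrail_breached(expr: str, current: dict[str, float]) -> bool:
    """The expression states the healthy condition; a breach is its negation."""
    name, op, threshold = re.fullmatch(r"(\w+)([<>])([\d.]+).*", expr).groups()
    value, threshold = current[name], float(threshold)
    healthy = value < threshold if op == "<" else value > threshold
    return not healthy

# p99 latency of 2.4s against the "latency_p99<2s" guardrail => breach.
print(guardrail_breached("latency_p99<2s", {"latency_p99": 2.4}))  # True
```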

9) Statistics and decision-making

MDE & capacity: predefined (e.g. MDE = + 1. 0 pp, power = 80%, α = 5%).
SRM (Sample Ratio Mismatch): χ ² - test every N minutes; with SRM - pause the test and investigate.
CUPED: covariate - pre-test behavior/basic conversion (reduces variance).
Multiplicity corrections: Bonferroni/Holm or control FDR.
Sequential: group sequential/always-valid p-values (SPRT, mSPRT) - safe early stops.
Bayesian: posterior probability of improvement and expected loss; good for making decisions with price asymmetry errors.
Interference/peeking: prohibition of "look and decide" outside of sequential procedures; logs of all views.
Non-parametric: Mann-Whitney for heavy tails; bootstrap for stability.
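A compact sketch of three checks from this list: the SRM χ²-test, CUPED adjustment, and a Bayesian posterior probability of improvement. Thresholds and counts are illustrative, and the Beta(1, 1) priors are an assumption:

```python
import numpy as np
from scipy import stats

def srm_detected(observed: dict[str, int], planned: dict[str, float],
                 alpha: float = 0.001) -> bool:
    """SRM: chi-square test of exposure counts vs the planned allocation.
    A very low alpha is used because the check runs repeatedly."""
    total = sum(observed.values())
    f_obs = [observed[v] for v in planned]
    f_exp = [planned[v] * total for v in planned]
    return stats.chisquare(f_obs, f_exp).pvalue < alpha

def cuped_adjust(y: np.ndarray, x_pre: np.ndarray) -> np.ndarray:
    """CUPED: remove the part of the metric explained by pre-experiment
    behaviour; theta = cov(x, y) / var(x) minimises residual variance."""
    theta = np.cov(x_pre, y)[0, 1] / np.var(x_pre, ddof=1)
    return y - theta * (x_pre - x_pre.mean())

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Posterior P(B > A) for conversion rates under Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, draws)
    post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, draws)
    return (post_b > post_a).mean()

print(srm_detected({"A": 50_400, "B": 49_600}, {"A": 0.5, "B": 0.5}))  # False
print(prob_b_beats_a(2_520, 50_400, 2_610, 49_600))                    # ~0.97
```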

10) Privacy and compliance

No PII in labels or exposures: tokenization, geo-scoped storage (sketch below).
SoD/4-eyes: for experiments affecting payouts/limits/PII/responsible gaming.
RG/Compliance holdout: part of the traffic always stays in control (to see regulatory/ethical effects).
Data minimization: store only the necessary aggregates and keys.
WORM audit: who launched/changed/stopped what, with parameters and versions.
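A minimal sketch of the tokenization point above, assuming keyed HMAC-SHA256 pseudonymization; key management (rotation, geo-scoped storage) is out of scope here:

```python
# Pseudonymize user identifiers before they reach exposure logs: no raw PII,
# but the token stays stable for joins. The key shown is illustrative only.
import hashlib
import hmac

def tokenize(user_id: str, key: bytes) -> str:
    """Stable pseudonym: same user + same key => same token."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

print(tokenize("user-42", b"per-region-secret"))  # 64-char hex token
```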

11) Integrations (operational)

CI/CD & GitOps: flags as data; PR review, validation of schemes.
Alerting: flag guardrail→avto, IC/owner notification.
Incident bot: commands '/flag on/off ', '/exp pause/resume', '/exp report '.
Release-gates: prohibit releases if active experiments in sensitive areas without owner-online.
Metrics API: reports, SLO-gates, exemplars (trace_id for degradation).
Status page: does not publish details of experiments; only if affects availability.

12) Configurations (examples)

12.1 Canary rollout flag

```yaml
apiVersion: flag.platform/v1
kind: FeatureFlag
metadata:
  id: "lobby.newLayout"
  owner: "Games UX"
  risk_class: "medium"
spec:
  type: release
  scope: { tenants: ["brandA"], regions: ["EU"] }
  allocation:
    steps:
      - { coverage: "5%", duration: "30m" }
      - { coverage: "25%", duration: "1h" }
      - { coverage: "100%" }
  slo_gates: ["slo-green:auth_success", "slo-green:bet_settle_p99"]
  ttl: "30d"
  kill_switch: true
```

12.2 A/B experiment with guardrails and CUPED

```yaml
apiVersion: exp.platform/v1
kind: Experiment
metadata:
  id: "payments.depositCTA.v3"
  hypothesis: "The new button lifts deposit conversion by +1 pp"
  owner: "Payments Growth"
spec:
  audience:
    strata: ["geo", "tenant", "device"]
    filters: { geo: ["TR", "EU"] }
    split: { A: 0.5, B: 0.5 }
  metrics:
    primary: ["deposit_conversion"]
    secondary: ["signup_to_kyc", "auth_success_rate"]
    guardrails: ["api_error_rate<1.5%", "latency_p99<2s", "slo_burnrate<1x"]
  stats:
    alpha: 0.05
    power: 0.8
    mde: "1pp"
    cuped: true
    sequential: true
  operations:
    srm_check: "5m"
    pause_on_guardrail_breach: true
  ttl: "21d"
```

13) Dashboards and reporting

Exec: lift by key metrics, percentage of successful experiments, economic effect.
Ops/SRE: guardrail-alerts, SRM, SLO degradation, impact on lags/queues.
Domain: funnels (registration→deposit→bet), segments by GEO/PSP/device.
Catalog: knowledge base of completed experiments (what was tried, what worked and what didn't, effects on RG/compliance).

14) KPI/KRI

Time-to-Test: idea→launch (days).
Test Velocity: experiments per month per team/domain.
Success Rate: share of tests with a positive, statistically significant effect.
Guardrail Breach Rate: frequency of SLO/error-rate guardrail breaches.
SRM Incidence: share of tests with broken randomization.
Documentation Lag: time from completion to the catalog write-up.
Cost per Test: telemetry/compute/maintenance spend ($).
Long-term Impact: change in LTV/churn/chargebacks for cohorts on winning variants.

15) Implementation Roadmap (6-10 weeks)

Weeks 1–2:
  • Repository of flags/experiments, schemas (JSON Schema), basic Flag Service with cache.
  • Policy Engine (SoD/4-eyes, SLO gates), integration with GitOps.
Weeks 3–4:
  • Assignment Engine (hash + strata), Exposure Logger, SRM check, guardrail alerts.
  • First set of flags: release + ops (kill switch), 1–2 safe A/B tests.
Weeks 5–6:
  • Statistics module: CUPED, frequentist and Bayesian reports, sequential control.
  • Dashboards (Exec/Ops/Domain), incident-bot commands `/flag`, `/exp`.
Weeks 7–8:
  • Auto-pause on guardrail breach, integration with release gates, knowledge catalog.
  • Process documentation, team training (Growth/Payments/Games).
Weeks 9–10:
  • Multi-region and geo-residency, FinOps cardinality limits, chaos drills (induced SRM).
  • Certification of experiment owners, WORM audit.

16) Antipatterns

Enabling flags at 100% without canaries and SLO gates.
Mixing release and experiment flags in one entity without explicit goals.
Client-side randomization without salt/determinism → SRM and manipulation.
Peeking without sequential control; picking the winning metric after the fact.
No guardrails and no on-duty owner → more incidents.
Storing PII in exposures/labels; ignoring geo-residency.
Flags that never expire (no TTL) → "frozen" branches and behavior.

17) Best Practices (Brief)

Small, clear hypotheses; one primary metric per test.
Start with 5–10% of traffic and strict guardrails.
CUPED almost always; Bayesian when decision speed matters and error costs are asymmetric.
Always check SRM and invariant metrics.
Write a post-analysis and add it to the knowledge catalog.
Respect Responsible Gaming (RG): don't incentivize harmful behavior with short-term revenue metrics.

Summary

Flags and A/B tests are the production loop for change: flags as data, safe randomization and rigorous statistics, SLO/compliance guardrails, observability, and auditing. This approach lets you learn quickly from production, improving conversion and quality without increasing risk, with a proven effect for the business and for regulators.
