GH GambleHub

A/B tests of payment scenarios

1) Why test payment scenarios

Increase the approval rate (AR) and reduce the decline rate (DR).
Reduce cost: take-rate (interchange/scheme/markup/fixed) and cost-per-approval.
Reduce risk: fewer chargebacks/fraud at the same approval level.
Sustainability: choose a provider/3DS strategy/routing for specific GEOs/BINs/methods.

💡 Important: Payment tests affect money and risk in real time. Guardrails and ethics are mandatory.

2) Experiment design

2.1. Randomization unit

User-level (recommended): all attempts by one user fall into the same arm → no "mixing" of 3DS/tokens.
BIN-level: when the test concerns routing by issuer; risk of cross-user confounding.
Order/attempt-level: acceptable for small UI experiments (e.g., error copy), undesirable for routing/3DS.

2.2. Stratification (before randomization)

Stratify by: player GEO, issuer country/BIN6, payment method, channel (web/app), amount segment, risk score. This reduces variance and the risk of SRM.

2.3. What we test

Routing/cascade: PSP_A vs PSP_B, sticky BIN, limit-aware.
3DS policy: frictionless→challenge, enforced 3DS for BIN/GEO.
UX flow: sequence of steps, error/repetition texts.
Parameters: retry windows and soft-decline codes.
Pricing: a provider on IC++ vs blended pricing and the impact on all-in cost.

3) Metrics: targeted, secondary, guardrails

3.1. Primary metrics

AR (Approval Rate) = approved/attempted.
Cost-per-Approval = (auth+decline fees)/approved.
Take-rate% (all-in) = fees/volume (in reporting currency).
3DS pass-rate; liability shift %.
Latency p95/p99 of the payment flow.
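A minimal sketch of how the primary metrics compose from raw attempt records. The function and field names (`status`, `fees`, `amount`) are illustrative, not part of any fixed schema:

```python
def payment_metrics(attempts):
    """Compute AR, cost-per-approval, and take-rate from attempt records.
    Fees and amounts are assumed to be in the reporting currency."""
    approved = [a for a in attempts if a["status"] == "APPROVED"]
    n, k = len(attempts), len(approved)
    fees = sum(a["fees"] for a in attempts)       # auth + decline fees
    volume = sum(a["amount"] for a in approved)   # approved volume
    return {
        "ar": k / n if n else 0.0,                            # Approval Rate
        "cost_per_approval": fees / k if k else float("inf"),
        "take_rate_pct": 100.0 * fees / volume if volume else 0.0,
    }

attempts = [
    {"status": "APPROVED", "fees": 0.35, "amount": 100.0},
    {"status": "DECLINED", "fees": 0.05, "amount": 0.0},
    {"status": "APPROVED", "fees": 0.40, "amount": 50.0},
]
m = payment_metrics(attempts)
```

Note that declined attempts contribute fees but no volume, which is why the two cost metrics can move in opposite directions.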

3.2. Risk metrics

Chargeback ratio (CBR), refund rate, fraud alerts/1000 trx.
FX slippage (bps) = effective vs reference FX.
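FX slippage in basis points follows directly from the definition above; a one-line sketch (function name is illustrative):

```python
def fx_slippage_bps(effective_rate, reference_rate):
    """Slippage of the effective FX rate vs a reference (e.g. mid-market)
    rate, expressed in basis points (1 bps = 0.01%)."""
    return 10_000.0 * (effective_rate - reference_rate) / reference_rate

# e.g. settling at 1.0850 when the reference is 1.0800 → ~46 bps of slippage
slip = fx_slippage_bps(1.0850, 1.0800)
```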

3.3. Guardrails (stop conditions)

An AR drop > Y bps, or a rise in CBR/refunds above threshold.
SRM (Sample Ratio Mismatch): traffic imbalance versus the expected split.
Spikes: latency, soft-decline surge, 3DS anomalies.
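The standard SRM test is a chi-square goodness-of-fit on arm counts. A stdlib-only sketch for a two-arm split (df = 1, so the survival function reduces to `erfc`); the function name and strict default alpha are my own choices:

```python
import math

def srm_check(n_a, n_b, expected_share_a=0.5, alpha=0.001):
    """Chi-square SRM test for a two-arm split (1 degree of freedom).
    Returns (chi2, p_value, srm_detected). A strict alpha is conventional:
    an SRM alert should fire only on clear assignment imbalance."""
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2/2))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p, p < alpha

# ~2% imbalance on 102k users is a clear SRM; ~0.1% on 100k is noise.
chi2, p, detected = srm_check(50_000, 52_000)
```

For more than two arms the same statistic generalizes, but the p-value needs a chi-square tail with k−1 degrees of freedom (e.g. `scipy.stats.chi2.sf`).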

4) Stats and power

4.1. Sample size (approximation for proportions)


n_per_group ≈ 2 (Z_{1-α/2} + Z_{1-β})^2 p(1-p) / δ^2

where p is the baseline AR, δ is the expected absolute uplift in AR, α is the significance level, and β is the type II error rate.
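The formula above can be evaluated with the standard library's inverse normal CDF; a sketch (function name is my own):

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p, delta, alpha=0.05, power=0.8):
    """n_per_group ≈ 2 (Z_{1-α/2} + Z_{1-β})^2 p(1-p) / δ^2
    for a two-sided test of an absolute uplift `delta` on baseline rate `p`."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{1-α/2}
    z_b = NormalDist().inv_cdf(power)           # Z_{1-β}
    return math.ceil(2 * (z_a + z_b) ** 2 * p * (1 - p) / delta ** 2)

# Baseline AR 85%, detect +1 pp at α=0.05, power 80% → ~20k users per arm
n = sample_size_per_arm(0.85, 0.01)
```

This is the textbook approximation; it illustrates why sub-percentage-point AR uplifts require tens of thousands of users per arm.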

4.2. Sequential analysis (early stopping)

Alpha-spending (O'Brien-Fleming/Pocock): fix the inspection schedule in advance and spend α in stages.
SPRT/Bayesian methods: useful for operational decisions, but pre-register the protocol.
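O'Brien-Fleming boundaries have a characteristic shape: very strict early looks that relax toward the final analysis. A sketch of the classical approximation c_k = c·√(K/k); exact boundaries should come from group-sequential software, and the constant below is the standard tabulated value for 5 equally spaced looks at overall two-sided α ≈ 0.05:

```python
import math

def obf_boundaries(num_looks, z_final=2.04):
    """Approximate O'Brien-Fleming z-boundaries for `num_looks` equally
    spaced interim analyses: c_k = z_final * sqrt(K / k).
    z_final ≈ 2.04 corresponds to K=5 looks at overall α ≈ 0.05."""
    K = num_looks
    return [z_final * math.sqrt(K / k) for k in range(1, K + 1)]

# Five looks: early stops require ~4.6 sigma, the final look only ~2.04.
bounds = obf_boundaries(5)
```

The point for payment tests: peeking at day 2 with the naive 1.96 threshold is exactly the "false victories" trap; the early boundary is more than twice as strict.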

4.3. Variance reduction

CUPED: 'Y_adj = Y − θ (X − μ_X)', where X is a pre-experiment covariate (AR/DR/risk score) and θ is the regression coefficient.
Stratified estimates, cluster-robust standard errors (user/BIN clusters).
Bootstrap for take-rate/cost metrics (heavy tails).
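The CUPED adjustment above fits in a few lines; a self-contained sketch with θ estimated as cov(X,Y)/var(X) (function name is illustrative):

```python
def cuped_adjust(y, x):
    """CUPED: Y_adj = Y − θ (X − mean(X)), with θ = cov(X, Y) / var(X).
    `y` is the in-experiment metric per user, `x` the pre-experiment
    covariate (e.g. pre-period AR). Preserves the mean, reduces variance."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    theta = cov / var
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]

# Toy data: pre-period covariate x correlates with the in-experiment metric y.
adj = cuped_adjust(y=[2, 4, 5, 4, 6], x=[1, 2, 3, 4, 5])
```

By construction the adjusted sample variance equals var(Y) − cov(X,Y)²/var(X), so the stronger the pre/post correlation, the larger the variance reduction.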

4.4. Multivariate tests and bandits

MAB (UCB/Thompson): when it's important to learn on the fly while preserving revenue.
For compliance-critical metrics (CBR, liability shift), prefer classic A/B with guardrails.
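A minimal Thompson-sampling sketch for routing by approval rate: each arm keeps Beta posterior counts of approvals/declines, and traffic follows a posterior draw. Arm names and the data shape are illustrative:

```python
import random

def thompson_pick(arms, rng=random):
    """Thompson sampling over Beta posteriors of approval rate.
    `arms`: {name: (approvals, declines)}. Returns the arm for the next
    attempt: better-performing arms win more draws, but every arm keeps
    a nonzero chance of being explored."""
    best, best_draw = None, -1.0
    for name, (approvals, declines) in arms.items():
        draw = rng.betavariate(approvals + 1, declines + 1)  # Beta(1,1) prior
        if draw > best_draw:
            best, best_draw = name, draw
    return best

# After 1000 attempts each: psp_a at 85% AR, psp_b at 82% AR.
arms = {"psp_a": (850, 150), "psp_b": (820, 180)}
next_arm = thompson_pick(arms)
```

This is why bandits suit price/AR optimization but not compliance metrics: allocation shifts continuously, so per-arm samples are neither fixed nor unbiased for a pre-registered test.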

5) Experimental platform architecture

1. Assignment service: deterministic hash of '(user_id, experiment_id, salt)' → bucket.
2. Feature flags / rules engine: activates the route/3DS/retry policy per arm.
3. Events: attempts/outcomes (authorize/capture/refund/cb) → bus (Kafka/PubSub).
4. Idempotency: a shared 'idempotency_key' across the cascade.
5. DWH/data marts: normalized statuses, fees, FX, risk flags.
6. Monitoring: online SLIs (AR/3DS/latency), alerts, SRM check.
7. Protocols: pre-registered hypothesis, final criteria, data freeze.
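Item 1 above can be sketched in a few lines: hash the `(user_id, experiment_id, salt)` triple, map it to a uniform bucket in [0, 1), and walk the cumulative traffic shares. Function name and arm configuration are illustrative:

```python
import hashlib

def assign_arm(user_id, experiment_id, salt, arms):
    """Deterministic bucketing: the same (user, experiment, salt) always
    maps to the same arm, with no assignment storage needed at decision
    time. `arms` is a list of (arm_id, traffic_share) summing to 1.0."""
    key = f"{user_id}:{experiment_id}:{salt}".encode()
    # First 8 bytes of SHA-256 → uniform float in [0, 1)
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64
    cumulative = 0.0
    for arm_id, share in arms:
        cumulative += share
        if bucket < cumulative:
            return arm_id
    return arms[-1][0]  # guard against float rounding at the top edge

arms_cfg = [("control", 0.5), ("psp_b", 0.5)]
arm = assign_arm("u123", "exp_routing_01", "s1", arms_cfg)
```

The per-experiment salt is what makes experiments independent: reusing one salt across experiments would correlate their assignments.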

6) Data model (minimum)

```sql
ref.experiments (
  exp_id PK, name, hypothesis, owner, start_at, end_at,
  unit,  -- USER | BIN | ORDER
  target_metric, guardrails JSONB, design JSONB, alpha NUMERIC, power NUMERIC, meta JSONB
);

ref.experiment_arms (
  exp_id FK, arm_id, name, traffic_share NUMERIC, params JSONB, enabled BOOLEAN
);

assignments.buckets (
  exp_id, user_id, assigned_arm, assigned_at, salt, hash_key, PRIMARY KEY (exp_id, user_id)
);

events.payments (
  attempt_id PK, user_id, exp_id, arm_id,
  provider, method, bin, iso2, risk_score,
  status, decline_code, three_ds_used BOOLEAN, liability_shift BOOLEAN,
  amount_minor BIGINT, currency, latency_ms INT,
  authorized_at, captured_at, settled_at, meta JSONB
);

finance.fees (
  attempt_id FK, interchange_amt NUMERIC, scheme_amt NUMERIC, markup_amt NUMERIC,
  auth_amt NUMERIC, refund_amt NUMERIC, cb_amt NUMERIC, gateway_amt NUMERIC,
  fx_slippage_amt NUMERIC, reporting_currency TEXT
);

risk.outcomes (
  attempt_id FK, is_refund BOOLEAN, is_chargeback BOOLEAN, fraud_alert BOOLEAN
);
```

7) SQL templates

7.1. SRM check (traffic share per arm)

```sql
SELECT arm_id,
       COUNT(*) AS n,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS share_pct
FROM assignments.buckets
WHERE exp_id = :exp
GROUP BY 1;
```

7.2. Key metrics per arm

```sql
WITH base AS (
  SELECT e.arm_id,
         COUNT(*)                                    AS attempts,
         COUNT(*) FILTER (WHERE status = 'APPROVED') AS approvals,
         AVG(latency_ms)                             AS latency_avg_ms,
         AVG((three_ds_used)::int)                   AS three_ds_share
  FROM events.payments e
  WHERE e.exp_id = :exp AND e.authorized_at BETWEEN :from AND :to
  GROUP BY 1
),
cost AS (
  SELECT e.arm_id,
         SUM(f.interchange_amt + f.scheme_amt + f.markup_amt +
             f.auth_amt + f.refund_amt + f.cb_amt + f.gateway_amt + f.fx_slippage_amt) AS fees_rep,
         SUM(e.amount_minor) / 100.0 AS volume_rep
  FROM events.payments e
  JOIN finance.fees f USING (attempt_id)
  WHERE e.exp_id = :exp AND e.settled_at BETWEEN :from AND :to
  GROUP BY 1
)
SELECT b.arm_id,
       approvals::numeric / NULLIF(attempts, 0) AS ar,
       fees_rep / NULLIF(volume_rep, 0)         AS take_rate,
       (SELECT COUNT(*) FROM risk.outcomes r
        JOIN events.payments e2 USING (attempt_id)
        WHERE e2.exp_id = :exp AND e2.arm_id = b.arm_id AND r.is_chargeback) = 0
         AS cb_zero_flag,
       latency_avg_ms, three_ds_share
FROM base b LEFT JOIN cost c ON c.arm_id = b.arm_id;
```

7.3. CUPED for AR (example)

```sql
WITH pre AS (
  SELECT user_id, AVG((status = 'APPROVED')::int) AS ar_pre
  FROM events.payments
  WHERE authorized_at < :pre_from_end
  GROUP BY 1
),
cur AS (
  SELECT e.user_id, e.arm_id, (e.status = 'APPROVED')::int AS ar_flag
  FROM events.payments e
  WHERE e.exp_id = :exp AND e.authorized_at BETWEEN :from AND :to
)
SELECT arm_id,
       -- users without a pre-period contribute no adjustment (COALESCE to the mean)
       AVG(ar_flag - theta * (COALESCE(ar_pre, mu_pre) - mu_pre)) AS ar_cuped
FROM cur
LEFT JOIN pre USING (user_id)
CROSS JOIN (SELECT AVG(ar_pre) AS mu_pre FROM pre) mu
CROSS JOIN (SELECT COVAR_SAMP(ar_flag, ar_pre) / VAR_SAMP(ar_pre) AS theta
            FROM cur LEFT JOIN pre USING (user_id)) t
GROUP BY arm_id;
```

7.4. Guardrail check (example)

```sql
SELECT arm_id,
       100.0 * SUM(is_chargeback::int)::numeric / NULLIF(COUNT(*), 0) AS cbr_pct,
       100.0 * SUM(is_refund::int)::numeric     / NULLIF(COUNT(*), 0) AS refund_pct
FROM risk.outcomes r
JOIN events.payments e USING (attempt_id)
WHERE e.exp_id = :exp AND e.settled_at BETWEEN :from AND :to
GROUP BY 1
HAVING 100.0 * SUM(is_chargeback::int)::numeric / NULLIF(COUNT(*), 0) > :cbr_threshold
    OR 100.0 * SUM(is_refund::int)::numeric     / NULLIF(COUNT(*), 0) > :refund_threshold;
```

8) Test process (end-to-end)

1. Pre-registration: hypothesis, metrics, design, sample sizes, stop rules.
2. SRM/A-A test on a "null" effect (a couple of days).
3. Launch: assignment freeze, logic in the rules engine / feature flags.
4. Online monitoring: AR/3DS/latency/health + guardrails.
5. Interim alpha-spending checks (if planned).
6. Finish and data freeze: only after accounting for funding/reserves/late CBs/refunds.
7. Analysis: CUPED/stratification, sensitivity, GEO/BIN/method/channel heterogeneity.
8. Decision: roll-out, roll-back, or follow-up test; update rules/routing.
9. Documentation and retrospective: lessons learned, updated thresholds/weights.

9) Anti-patterns and traps

Peeking / re-analysis without a protocol → false wins.
Order-level randomization in routing tests → leakage between arms.
Metric multiplicity (many metrics/slices) without α correction.
Incomplete cost accounting (forgotten FX/reserve/refund fees) → wrong take-rate.
Missing SRM check → undetected assignment skew.
Non-idempotent retries → duplicate authorizations and a distorted AR.

10) Safety, compliance and ethics

Same-method / return-to-source payout rules must not be broken by the test.
Sanctions/licenses/GEO policies are out of scope for experimentation.
RG / responsible gaming: do not degrade protection mechanisms for the sake of AR.
PCI/GDPR: tokens instead of PAN, data minimization, DPA/SOC 2.

11) Experiment dashboard KPI

AR/DR, uplift, and confidence intervals by arm and key strata (GEO/BIN/method).
Cost-per-Approval, take-rate %, FX slippage (bps).
3DS pass/liability shift, soft-decline share.
Latency p95/p99, errors/timeouts.
CB/Refunds (lag-aware), SRM, traffic coverage, duration.

12) Best practices (short)

1. Randomize at the user level and stratify.
2. Use guardrails and an SRM check; pre-register the protocol.
3. Account for full cost (fees + FX + reserve) and cost-per-approval.
4. Use CUPED, cluster-robust errors, and the bootstrap for cost metrics.
5. For critical risks, classic A/B; bandits mainly for price/AR tasks.
6. Account for funding/reserves/late CBs before the final readout.
7. Document and version the rules; run post-mortems.

13) Start-up checklist

  • Hypothesis, metrics, expected effect, design, sample size, duration.
  • Randomization unit and strata, assignment service, feature flags.
  • Guardrails/thresholds, SRM/A-A pre-check, alerts.
  • Logs/events, idempotency, status normalization.
  • Fee/FX/reserve data marts; reporting currency.
  • Alpha-spending plan and data freeze.
  • Roll-out/roll-back playbooks; documentation of results.

Summary

A/B testing of payment scenarios is an engineering and statistical discipline: correct randomization and stratification, full cost and risk metrics, guardrails and SRM checks, careful analysis (CUPED, cluster-robust errors, sequential analysis), and production-ready infrastructure (idempotency, telemetry, reconciliation). Following this approach, you increase AR, reduce the all-in take-rate, and avoid paying for "false wins" with higher chargebacks and regulatory risk.

Contact

Get in Touch

Reach out with any questions or support needs. We are always ready to help!

Telegram
@Gamble_GC