Performance benchmarking
1) Why the iGaming platform needs benchmarks
Capacity planning: confirm whether the infrastructure will survive prime time, a tournament, or onboarding a new provider.
Technology selection: data stores, SQL/OLAP engines, streaming, Feature Store/ML serving, caches, API gateways.
Regression control: after releases, schema/feature migrations, model updates.
Budget and TCO: comparing "performance per $" and "latency per $."
The result: a buy/optimize/save decision based on numbers, not gut feeling.
2) Methodology: How not to fool yourself
1. Fix everything: data/code versions, cluster configs, test stands, data catalogs.
2. Warm-up → stable plateau → degradation: measure only on the plateau.
3. Replication: ≥3 runs; report a 95% confidence interval.
4. Realistic profiles: peaks, "breathing" (fluctuating) loads, think-time, hot-key skew.
5. Identical semantics: the same SQL/feature joins/KPIs, identical windows and filters.
6. Cache hygiene: test "warmed cache" and "cold start" separately.
7. Independence: the bench is isolated from production and concurrent experiments.
8. Stop criteria: end the test once an SLO is violated or saturation is reached.
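Steps 2 and 8 above can be sketched in a few lines; this is a minimal illustration, where the sample format, SLO threshold, and saturation limit are assumptions, not values from the methodology itself:

```python
def plateau_samples(samples, warmup_s, ramp_down_s, duration_s):
    """Keep only measurements taken on the plateau: drop the warm-up
    and ramp-down windows. Samples are (t_seconds, value) pairs."""
    return [(t, v) for t, v in samples
            if warmup_s <= t <= duration_s - ramp_down_s]


def should_stop(p95_ms, saturation, p95_slo_ms=300.0, max_saturation=0.85):
    """Stop criterion: abort the run once the SLO is violated or the
    resource is saturated (both thresholds are illustrative)."""
    return p95_ms > p95_slo_ms or saturation > max_saturation
```

In practice the same trimming and stop logic would be wired into the load generator's result pipeline rather than applied after the fact.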
3) Workload mix
3.1 Ingestion/ETL (Bronze → Silver → Gold)
Metrics: events/s, end-to-end freshness, success/retry rates, cost per 1,000 messages.
Tests: PSP/provider burst streams, dirty data, schema drift.
3.2 SQL/OLAP (DWH/cubes)
Metrics: latency p50/p95/p99, throughput (QPS), rows/bytes scanned, CPU core-seconds, cost per query.
Queries: GGR/NGR by day/week, retention cohorts, deposit funnels, heavy joins.
3.3 Streaming (game rounds, payment signals)
Metrics: end-to-end window latency, watermark delays, exactly-once guarantees, consumer lag.
Scenarios: a ×3 provider traffic spike, loss of a partition, consumer rebalancing.
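Two of these metrics reduce to simple arithmetic over broker offsets and watermarks; a minimal sketch, where the partition/offset maps and epoch-second timestamps are hypothetical inputs:

```python
def consumer_lag(end_offsets, committed_offsets):
    """Total consumer lag across partitions: sum of
    (latest broker offset - last committed offset)."""
    return sum(end_offsets[p] - committed_offsets.get(p, 0)
               for p in end_offsets)


def watermark_delay_s(now_epoch_s, watermark_epoch_s):
    """How far event-time processing trails wall-clock time."""
    return max(0.0, now_epoch_s - watermark_epoch_s)
```

During a bench both values are sampled continuously: lag that grows monotonically on the plateau means the consumer cannot keep up with the offered rate.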
3.4 Feature Store and offline preparation
Metrics: point-in-time join latency, features/s throughput, feature-group materialization time, freshness.
Scenarios: mass recomputation, replaying history (backfill).
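The point-in-time join being benchmarked has simple semantics: for each event, take the latest feature value known at that moment, never a future one (which would leak labels). A minimal sketch using the standard library's bisect:

```python
from bisect import bisect_right


def point_in_time_value(timestamps, values, as_of_ts):
    """timestamps: sorted ascending, parallel to values.
    Return the value whose timestamp is the latest one <= as_of_ts,
    or None if no feature value existed yet at that moment."""
    i = bisect_right(timestamps, as_of_ts)
    return values[i - 1] if i else None
```

A real Feature Store performs this per entity over millions of rows, which is exactly why join latency and backfill time are worth measuring.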
3.5 ML serving (online/batch/stream)
Metrics: p95/p99, error rate, feature freshness, cache hit rate, cost per 1k scorings, cold start.
Scenarios: a payments spike (CCP/anti-fraud), RG scoring during promotions.
3.6 Analytics and metrics APIs
Metrics: p95 ≤ target, success rate, cache hit rate, cost per request, FX/TZ constraints.
Scenarios: partner dashboards, bulk reports, long-tail filters.
4) Metrics and SLI/SLO
Additionally for ML: ACE/calibration under load, PSI/input drift at peak.
5) Experiment design
5.1 Load profiles
Ramp-up 10-15 min → plateau 30-60 min → ramp-down.
Peaks: "tournament" profile (10 min ×3), "weekend promotion" (2 h ×1.8), "flash deal" (5 min ×5).
Think-time and key skew (80/20) for the API/Feature Store.
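An 80/20 key skew and a think-time distribution matching target quantiles can be generated like this; a sketch only, assuming a log-normal think-time shape and the p50/p95 targets used elsewhere in this document:

```python
import math
import random


def pick_key(hot_keys, cold_keys, hot_share=0.8, rng=random):
    """80/20 key skew: roughly hot_share of requests hit the small
    hot set, the rest spread over the long tail of cold keys."""
    pool = hot_keys if rng.random() < hot_share else cold_keys
    return rng.choice(pool)


def think_time_ms(p50=80.0, p95=250.0, rng=random):
    """Log-normal think-time calibrated so the median is ~p50 and the
    95th percentile is ~p95 (z(0.95) ~= 1.645 for a standard normal)."""
    mu = math.log(p50)
    sigma = (math.log(p95) - mu) / 1.645
    return rng.lognormvariate(mu, sigma)
```

Sleeping for `think_time_ms()` between requests of one virtual user reproduces human pacing instead of hammering the API back-to-back.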
5.2 Control of variables
Fix batch/replication sizes, connection limits, pool sizes.
Disable smart autotuners, or pre-warm them for a fair comparison.
Separate runs with and without cache.
5.3 Statistics and reporting
Median, IQR, confidence intervals.
Latency graphs, time series, saturation curves.
A dedicated "uncertainties and threats to validity" section.
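Assuming one aggregate number per run (e.g., the plateau p95), the report statistics can be computed with the standard library alone; the small-sample confidence interval uses standard Student-t critical values:

```python
import statistics

# Two-sided 95% Student-t critical values for small samples (df = n - 1).
T95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 9: 2.262}


def run_summary(run_results):
    """run_results: one aggregate per run (n >= 3).
    Returns median, IQR, and a 95% CI for the mean."""
    n = len(run_results)
    mean = statistics.fmean(run_results)
    sd = statistics.stdev(run_results)
    t = T95.get(n - 1, 1.96)  # fall back to the normal z for large n
    half = t * sd / n ** 0.5
    q = statistics.quantiles(run_results, n=4)
    return {
        "median": statistics.median(run_results),
        "iqr": q[2] - q[0],
        "ci95": (mean - half, mean + half),
    }
```

With only three runs the interval is wide by construction, which is exactly the honest signal the "threats to validity" section should carry.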
6) Set of artifacts
6.1 Benchmark passport (template)
Objective: (e.g., confirm API p95 ≤ 300 ms at ×3 load)
Loads: (TPC-like SQL, API mix, ML scoring at 200 QPS...)
Data: volume, hot-key skew, snapshot version
Configurations: clusters, versions, limits, flags
Metrics/SLOs: list, thresholds, alerts
Test stand: isolation, regions, encryption keys
Risks: cold starts, network queues, cache policy
6.2 YAML load profile (sketch)

```yaml
name: analytics_api_peak_oct
ramp_up: PT10M
plateau: PT40M
ramp_down: PT5M
mix:
  - endpoint: /v2/metrics/revenue
    qps: 180
    group_by: [date, brand, country]
    cache_ratio: 0.6
  - endpoint: /v2/metrics/retention
    qps: 60
    window: ROLLING_28D
    cache_ratio: 0.3
limits:
  concurrency: 800
  per_ip_qps: 50
  think_time_ms: {p50: 80, p95: 250}
```
6.3 Launch checklist
- Data/snapshots committed; cache cleared (for cold runs).
- Configs/versions recorded in the passport; seed set.
- SLO alerts enabled; tracing and profilers active.
- Rollback/stop plan for SLO breaches in place.
- #bench-status channel created; on-call owner assigned.
7) iGaming domain specifics
7.1 Provider events and tournaments
Simulate per-game/per-provider traffic slices and the "showcase effect" (one or two games generate 40-60% of traffic).
Enable feature flags as a response to degradation.
7.2 Payments/PSP
Two-phase transactions, retries, queues, idempotency.
Test primary and backup PSPs in parallel.
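Idempotency under retries is the property a PSP bench should verify: replaying the same request must not apply the effect twice. A minimal sketch of key-based deduplication (all names here are illustrative, not a real PSP API):

```python
def process_once(seen, idempotency_key, handler, payload):
    """Apply handler(payload) at most once per idempotency key;
    retries with the same key return the cached result instead of
    re-executing the side effect."""
    if idempotency_key in seen:
        return seen[idempotency_key]
    result = handler(payload)
    seen[idempotency_key] = result
    return result
```

In a bench this is exercised by deliberately replaying a fraction of requests with duplicate keys and asserting that the downstream effect count does not change.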
7.3 RG/Anti-fraud/KYC
Test tail latency and fallback heuristics (for when the model is unavailable).
Separate profiles for VIP and thin-file players.
8) Tools and practices
Load generation: k6/JMeter/Locust (API), native event replayers (streams).
Profiling: request tracing, flame graphs, GC/allocations, GPU utilization.
Observability: build/commit labels in metrics and logs, clear ownership.
Cost metrics: $/1k requests, $/hour of plateau, "cost of the SLO."
9) Analysis and interpretation
Compare at the SLO level first ("met/not met"), and only then by "how much faster."
Separate cache wins from engine/architecture wins.
For OLAP, look at bytes scanned, shuffle, and skew.
For ML, the effect of quantization/distillation and the scoring cache hit rate.
10) Capacity planning
Translate the results into scaling formulas: QPS per core, events/s per instance, $/unit.
Build in headroom (e.g., 30%) and set autoscaling limits.
Keep a "red button" for degradation: remove heavy features/widgets, switch to simplified KPIs.
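The scaling formulas above reduce to simple arithmetic; a sketch where the instance shape, headroom, and prices are illustrative assumptions:

```python
import math


def required_instances(peak_qps, qps_per_core, cores_per_instance,
                       headroom=0.30):
    """Translate a bench result (QPS per core on the plateau) into an
    instance count, with a safety headroom on top of the expected peak."""
    target_qps = peak_qps * (1.0 + headroom)
    per_instance_qps = qps_per_core * cores_per_instance
    return math.ceil(target_qps / per_instance_qps)


def cost_per_1k_requests(hourly_cost, sustained_qps):
    """$/1k requests at a sustained plateau rate."""
    requests_per_hour = sustained_qps * 3600
    return hourly_cost / (requests_per_hour / 1000.0)
```

For example, 10,000 QPS at a measured 50 QPS per core on 8-core instances with 30% headroom yields 33 instances; the same numbers feed directly into the autoscaler's minimum and maximum bounds.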
11) Roles and RACI
Data Platform (R): test stands, orchestration, observability, tooling.
Domain Owners (R): scenarios and SQL/KPIs, validation.
ML Lead (R): scoring profiles, cache/quantization.
SRE (R): limits, autoscaling, incidents.
Security/DPO (C): test-data privacy, tokenization.
Product/Finance (A/C): SLOs, cost targets, and business interpretation.
12) Implementation roadmap
0-30 days (MVP)
1. Catalog of bench scripts for ingestion, OLAP, API, and ML.
2. Passport and YAML profile for "prime time" API and payments.
3. SLO/saturation/cost dashboard; alerts on SLO breaches.
4. A "bench before release" procedure for critical changes.
30-90 days
1. Streaming bench (late data, rebalancing, ×3 burst).
2. ML serving: shadow + cold start, quantization and cache.
3. Auto-generated reports (PDF/Confluence) from metrics and passports.
4. Bottleneck inventory; optimization backlog with ROI.
3-6 months
1. Regular seasonal benches (summer/autumn/holidays).
2. Capacity plan for the year: headroom, budget, expansion points.
3. Auto-replays of incidents (repro benches), champion-challenger configs.
4. External partner tests (providers/PSPs) with signed webhooks.
13) Anti-patterns
Mixing cache and engine effects without separate tests.
No warm-up; short "sprints" instead of a plateau.
Benches on toy data without hot keys and skew.
Ignoring p99 and GC/IO; "average speed" instead of tails.
Comparing apples with oranges: different SQL/filters/windows.
No repeatability protocol: the result cannot be reproduced.
14) Related Sections
DataOps practices, API analytics and metrics, MLOps: exploitation of models, Alerts from data streams, Audit and versioning, Data retention policies, Security and encryption, Access control.
Summary
Benchmarking is an engineering discipline, not a one-off run. Strict methodology, realistic iGaming profiles, transparent SLOs, and cost accounting turn numbers into confident decisions: where to scale, what to optimize, which risks to accept, and what safety margin to keep for the next peak.