Performance benchmarking
1) Why the iGaming platform needs benchmarks
Capacity planning: confirm whether the infrastructure will survive prime time, a tournament, or onboarding a new provider.
Technology selection: data stores, SQL/OLAP engines, streaming, Feature Store/ML serving, caches, API gateways.
Regression control: after releases, schema/feature migrations, model updates.
Budget and TCO: comparing "performance per $" and "latency per $."
The result: a buy/optimize/save decision based on numbers, not gut feeling.
2) Methodology: How not to fool yourself
1. Fix everything: data/code versions, cluster configs, test stands, data catalogs.
2. Warm-up → stable plateau → degradation: measure only on the plateau.
3. Replication: ≥3 runs; report a 95% confidence interval.
4. Realistic profiles: peaks, "breathing" (fluctuating) loads, think-time, hot-key skew.
5. Identical semantics: the same SQL/feature joins/KPIs, identical windows and filters.
6. Cache hygiene: test "warmed cache" and "cold start" separately.
7. Independence: the bench is isolated from production and concurrent experiments.
8. Stop criteria: end the test once an SLO is violated or saturation is reached.
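Steps 2 and 8 above can be sketched in a few lines; this is a minimal illustration, where the sample format, SLO threshold, and saturation limit are assumptions, not values from the methodology itself:

```python
def plateau_samples(samples, warmup_s, ramp_down_s, duration_s):
    """Keep only measurements taken on the plateau: drop the warm-up
    and ramp-down windows. Samples are (t_seconds, value) pairs."""
    return [(t, v) for t, v in samples
            if warmup_s <= t <= duration_s - ramp_down_s]


def should_stop(p95_ms, saturation, p95_slo_ms=300.0, max_saturation=0.85):
    """Stop criterion: abort the run once the SLO is violated or the
    resource is saturated (both thresholds are illustrative)."""
    return p95_ms > p95_slo_ms or saturation > max_saturation
```

In practice the same trimming and stop logic would be wired into the load generator's result pipeline rather than applied after the fact.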
3) Workload mix
3.1 Ingestion/ETL (Bronze → Silver → Gold)
Metrics: events/s, end-to-end freshness, success/retry rates, cost per 1,000 messages.
Tests: PSP/provider burst streams, dirty data, schema drift.
3.2 SQL/OLAP (DWH/cubes)
Metrics: latency p50/p95/p99, throughput (QPS), rows/bytes scanned, CPU core-seconds, cost per query.
Queries: GGR/NGR by day/week, retention cohorts, deposit funnels, heavy joins.
3.3 Streaming (game rounds, payment signals)
Metrics: end-to-end window latency, watermark delays, exactly-once guarantees, consumer lag.
Scenarios: a ×3 provider traffic spike, loss of a partition, consumer rebalancing.
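Two of these metrics reduce to simple arithmetic over broker offsets and watermarks; a minimal sketch, where the partition/offset maps and epoch-second timestamps are hypothetical inputs:

```python
def consumer_lag(end_offsets, committed_offsets):
    """Total consumer lag across partitions: sum of
    (latest broker offset - last committed offset)."""
    return sum(end_offsets[p] - committed_offsets.get(p, 0)
               for p in end_offsets)


def watermark_delay_s(now_epoch_s, watermark_epoch_s):
    """How far event-time processing trails wall-clock time."""
    return max(0.0, now_epoch_s - watermark_epoch_s)
```

During a bench both values are sampled continuously: lag that grows monotonically on the plateau means the consumer cannot keep up with the offered rate.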
3.4 Feature Store and offline preparation
Metrics: point-in-time join latency, features/s throughput, feature-group materialization time, freshness.
Scenarios: mass recomputation, replaying history (backfill).
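The point-in-time join being benchmarked has simple semantics: for each event, take the latest feature value known at that moment, never a future one (which would leak labels). A minimal sketch using the standard library's bisect:

```python
from bisect import bisect_right


def point_in_time_value(timestamps, values, as_of_ts):
    """timestamps: sorted ascending, parallel to values.
    Return the value whose timestamp is the latest one <= as_of_ts,
    or None if no feature value existed yet at that moment."""
    i = bisect_right(timestamps, as_of_ts)
    return values[i - 1] if i else None
```

A real Feature Store performs this per entity over millions of rows, which is exactly why join latency and backfill time are worth measuring.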
3.5 ML serving (online/batch/stream)
Metrics: p95/p99, error rate, feature freshness, cache hit rate, cost per 1k scorings, cold start.
Scenarios: a payments spike (CCP/anti-fraud), RG scoring during promotions.
3.6 Analytics and metrics APIs
Metrics: p95 ≤ target, success rate, cache hit rate, cost per request, FX/TZ constraints.
Scenarios: partner dashboards, bulk reports, long-tail filters.
4) Metrics and SLI/SLO
Additionally for ML: ACE/calibration under load, PSI/input drift at peak.
5) Experiment design
5.1 Load profiles
Ramp-up 10-15 min → plateau 30-60 min → ramp-down.
Peaks: "tournament" profile (10 min ×3), "weekend promotion" (2 h ×1.8), "flash deal" (5 min ×5).
Think-time and key skew (80/20) for the API/Feature Store.
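An 80/20 key skew and a think-time distribution matching target quantiles can be generated like this; a sketch only, assuming a log-normal think-time shape and the p50/p95 targets used elsewhere in this document:

```python
import math
import random


def pick_key(hot_keys, cold_keys, hot_share=0.8, rng=random):
    """80/20 key skew: roughly hot_share of requests hit the small
    hot set, the rest spread over the long tail of cold keys."""
    pool = hot_keys if rng.random() < hot_share else cold_keys
    return rng.choice(pool)


def think_time_ms(p50=80.0, p95=250.0, rng=random):
    """Log-normal think-time calibrated so the median is ~p50 and the
    95th percentile is ~p95 (z(0.95) ~= 1.645 for a standard normal)."""
    mu = math.log(p50)
    sigma = (math.log(p95) - mu) / 1.645
    return rng.lognormvariate(mu, sigma)
```

Sleeping for `think_time_ms()` between requests of one virtual user reproduces human pacing instead of hammering the API back-to-back.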
5.2 Control of variables
Fix batch/replication sizes, connection limits, pool sizes.
Disable smart autotuners, or pre-warm them for a fair comparison.
Separate runs with and without cache.
5.3 Statistics and reporting
Median, IQR, confidence intervals.
Latency graphs, time series, saturation curves.
A dedicated "uncertainties and threats to validity" section.
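Assuming one aggregate number per run (e.g., the plateau p95), the report statistics can be computed with the standard library alone; the small-sample confidence interval uses standard Student-t critical values:

```python
import statistics

# Two-sided 95% Student-t critical values for small samples (df = n - 1).
T95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 9: 2.262}


def run_summary(run_results):
    """run_results: one aggregate per run (n >= 3).
    Returns median, IQR, and a 95% CI for the mean."""
    n = len(run_results)
    mean = statistics.fmean(run_results)
    sd = statistics.stdev(run_results)
    t = T95.get(n - 1, 1.96)  # fall back to the normal z for large n
    half = t * sd / n ** 0.5
    q = statistics.quantiles(run_results, n=4)
    return {
        "median": statistics.median(run_results),
        "iqr": q[2] - q[0],
        "ci95": (mean - half, mean + half),
    }
```

With only three runs the interval is wide by construction, which is exactly the honest signal the "threats to validity" section should carry.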
6) Set of artifacts
6.1 Benchmark passport (template)
Objective: (e.g., confirm API p95 ≤ 300 ms at ×3 load)
Loads: (TPC-like SQL, API mix, ML scoring at 200 QPS...)
Data: volume, hot-key skew, snapshot version
Configurations: clusters, versions, limits, flags
Metrics/SLOs: list, thresholds, alerts
Test stand: isolation, regions, encryption keys
Risks: cold starts, network queues, cache policy
6.2 YAML load profile (sketch)

```yaml
name: analytics_api_peak_oct
ramp_up: PT10M
plateau: PT40M
ramp_down: PT5M
mix:
  - endpoint: /v2/metrics/revenue
    qps: 180
    group_by: [date, brand, country]
    cache_ratio: 0.6
  - endpoint: /v2/metrics/retention
    qps: 60
    window: ROLLING_28D
    cache_ratio: 0.3
limits:
  concurrency: 800
  per_ip_qps: 50
  think_time_ms: {p50: 80, p95: 250}
```
6.3 Launch checklist
- Data/snapshots committed; cache cleared (for cold runs).
- Configs/versions recorded in the passport; seed set.
- SLO alerts enabled; tracing and profilers active.
- Rollback/stop plan for SLO breaches in place.
- #bench-status channel created; on-call owner assigned.
7) iGaming domain specifics
7.1 Provider events and tournaments
Simulate per-game/per-provider traffic slices and the "showcase effect" (one or two games generate 40-60% of traffic).
Enable feature flags as a response to degradation.
7.2 Payments/PSP
Two-phase transactions, retries, queues, idempotency.
Test primary and backup PSPs in parallel.
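Idempotency under retries is the property a PSP bench should verify: replaying the same request must not apply the effect twice. A minimal sketch of key-based deduplication (all names here are illustrative, not a real PSP API):

```python
def process_once(seen, idempotency_key, handler, payload):
    """Apply handler(payload) at most once per idempotency key;
    retries with the same key return the cached result instead of
    re-executing the side effect."""
    if idempotency_key in seen:
        return seen[idempotency_key]
    result = handler(payload)
    seen[idempotency_key] = result
    return result
```

In a bench this is exercised by deliberately replaying a fraction of requests with duplicate keys and asserting that the downstream effect count does not change.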
7.3 RG/Anti-fraud/KYC
Test tail latency and fallback heuristics (for when the model is unavailable).
Separate profiles for VIP and thin-file players.
8) Tools and practices
Load generation: k6/JMeter/Locust (API), native event replayers (streams).
Profiling: request tracing, flame graphs, GC/allocations, GPU utilization.
Observability: build/commit labels in metrics and logs, clear ownership.
Cost metrics: $/1k requests, $/hour of plateau, "cost of the SLO."
9) Analysis and interpretation
Compare at the SLO level first ("met/not met"), and only then by "how much faster."
Separate cache wins from engine/architecture wins.
For OLAP, look at bytes scanned, shuffle, and skew.
For ML, the effect of quantization/distillation and the scoring cache hit rate.
10) Capacity planning
Translate the results into scaling formulas: QPS per core, events/s per instance, $/unit.
Build in headroom (e.g., 30%) and set autoscaling limits.
Keep a "red button" for degradation: remove heavy features/widgets, switch to simplified KPIs.
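The scaling formulas above reduce to simple arithmetic; a sketch where the instance shape, headroom, and prices are illustrative assumptions:

```python
import math


def required_instances(peak_qps, qps_per_core, cores_per_instance,
                       headroom=0.30):
    """Translate a bench result (QPS per core on the plateau) into an
    instance count, with a safety headroom on top of the expected peak."""
    target_qps = peak_qps * (1.0 + headroom)
    per_instance_qps = qps_per_core * cores_per_instance
    return math.ceil(target_qps / per_instance_qps)


def cost_per_1k_requests(hourly_cost, sustained_qps):
    """$/1k requests at a sustained plateau rate."""
    requests_per_hour = sustained_qps * 3600
    return hourly_cost / (requests_per_hour / 1000.0)
```

For example, 10,000 QPS at a measured 50 QPS per core on 8-core instances with 30% headroom yields 33 instances; the same numbers feed directly into the autoscaler's minimum and maximum bounds.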
11) Roles and RACI
Data Platform (R): test stands, orchestration, observability, tooling.
Domain Owners (R): scenarios and SQL/KPIs, validation.
ML Lead (R): scoring profiles, cache/quantization.
SRE (R): limits, autoscaling, incidents.
Security/DPO (C): test-data privacy, tokenization.
Product/Finance (A/C): SLOs, cost targets, and business interpretation.
12) Implementation roadmap
0-30 days (MVP)
1. Catalog of bench scripts for ingestion, OLAP, API, and ML.
2. Passport and YAML profile for "prime time" API and payments.
3. SLO/saturation/cost dashboard; alerts on SLO breaches.
4. A "bench before release" procedure for critical changes.
30-90 days
1. Streaming bench (late data, rebalancing, ×3 burst).
2. ML serving: shadow + cold start, quantization and cache.
3. Auto-generated reports (PDF/Confluence) from metrics and passports.
4. Bottleneck inventory; optimization backlog with ROI.
3-6 months
1. Regular seasonal benches (summer/autumn/holidays).
2. Capacity plan for the year: headroom, budget, expansion points.
3. Auto-replays of incidents (repro benches), champion-challenger configs.
4. External partner tests (providers/PSPs) with signed webhooks.
13) Anti-patterns
Mixing cache and engine effects without separate tests.
No warm-up; short "sprints" instead of a plateau.
Benches on toy data without hot keys and skew.
Ignoring p99 and GC/IO; "average speed" instead of tails.
Comparing apples with oranges: different SQL/filters/windows.
No repeatability protocol: the result cannot be reproduced.
14) Related Sections
DataOps practices, API analytics and metrics, MLOps: exploitation of models, Alerts from data streams, Audit and versioning, Data retention policies, Security and encryption, Access control.
Summary
Benchmarking is an engineering discipline, not a one-off run. Strict methodology, realistic iGaming profiles, transparent SLOs, and cost accounting turn numbers into confident decisions: where to scale, what to optimize, which risks to accept, and what safety margin to keep for the next peak.