Load testing and stress profiles

Brief Summary

Load testing is a system-level test of performance and resilience under realistic and extreme scenarios. Success rests on the right traffic model (open vs closed), agreed SLOs, clean metrics (latency/throughput/errors/saturation), representative data, automation and repeatability. The outcome is not an "RPS figure" but a decision: where the bottlenecks are, what performance costs, where the failure threshold lies and how to move it.

SLO/SLI and target metrics

SLO (example): API p95 ≤ 250 ms, p99 ≤ 600 ms; error rate ≤ 0.3% over 30 days.
SLI: latency (p50/p95/p99), throughput (RPS/CPS/QPS), saturation (CPU/heap/GC/FD/conn), errors (5xx, timeouts), queues (depth/lag), DB (locks, slow queries), cache (hit ratio).
Error and saturation triggers (for example, CPU > 75% or queue depth > X → degradation).
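A minimal sketch of checking a run against the example SLO, assuming raw per-request latencies (ms) and an error count are available. `percentile`, `evaluateSlo` and `errorBudget` are hypothetical helpers; nearest-rank is only one percentile definition (many monitoring tools interpolate, so their numbers can differ slightly).

```javascript
// Example SLO from the text: p95 <= 250 ms, p99 <= 600 ms, errors <= 0.3%.
const SLO = { p95Ms: 250, p99Ms: 600, maxErrorRate: 0.003 };

// Nearest-rank percentile: smallest sample such that at least p% of samples are <= it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Compares one run (latencies in ms + number of failed requests) with the SLO.
function evaluateSlo(latenciesMs, errorCount, slo = SLO) {
  const p95 = percentile(latenciesMs, 95);
  const p99 = percentile(latenciesMs, 99);
  const errorRate = errorCount / latenciesMs.length;
  const pass = p95 <= slo.p95Ms && p99 <= slo.p99Ms && errorRate <= slo.maxErrorRate;
  return { p95, p99, errorRate, pass };
}

// Error budget: how many requests may fail within the SLO window (e.g. 30 days).
function errorBudget(totalRequests, maxErrorRate = SLO.maxErrorRate) {
  return Math.floor(totalRequests * maxErrorRate);
}
```

For example, 100 million requests in 30 days at 0.3% leave a budget of 300,000 failed requests; a load test that burns a large share of that in one hour is a warning sign even if it stays "green."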

Types of tests

1. Baseline/Benchmark - single service/endpoint, "ideal" conditions.
2. Load - realistic "working day" + ramp-up/ramp-down.
3. Stress - increase the load until the system degrades and record the breaking point.
4. Spike - a sharp jump (x2-x10 within seconds or minutes).
5. Soak/Endurance - a long run (8-72 h) to expose memory leaks and latency drift.
6. Capacity - step load to build the performance curve for capacity planning.
7. Degradation/Chaos-mix - load plus partial failures (slow database, cache flush, degraded uplink).

Traffic models: Open vs Closed

Open model (more realistic for Internet traffic): users arrive at rate λ (a Poisson-like stream). If the system slows down, new requests keep arriving and pile up instead of "freezing."

Closed model - a fixed number of virtual users (VUs) with think time. Each VU waits for its response before sending the next request, so when latency grows, RPS falls artificially; draw conclusions with care.
Recommendation: for front-end APIs use the open model (k6 'constant-arrival-rate'/'ramping-arrival-rate' executors); for internal synchronous scenarios, combine it with the closed model.
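The difference between the two models can be sketched with the interactive response-time law (a form of Little's law): a closed model with N users, response time R and think time Z delivers X = N / (R + Z), so throughput falls as R grows; an open model generates arrival times independently of R. The function names below are hypothetical; the open side uses exponential inter-arrival gaps, which is what a Poisson stream implies.

```javascript
// Closed model: throughput X = N / (R + Z). Slower responses => fewer requests/s.
function closedModelThroughput(users, responseTimeSec, thinkTimeSec) {
  return users / (responseTimeSec + thinkTimeSec);
}

// Open model: Poisson arrivals with rate lambda have exponential gaps.
// `u` is a uniform draw in [0, 1).
function exponentialGap(lambda, u) {
  return -Math.log(1 - u) / lambda;
}

// Arrival timestamps (seconds) for a run of `durationSec`; they do not depend
// on how fast the system answers, so a slow system accumulates a queue.
function poissonArrivals(lambda, durationSec, rng = Math.random) {
  const arrivals = [];
  let t = exponentialGap(lambda, rng());
  while (t < durationSec) {
    arrivals.push(t);
    t += exponentialGap(lambda, rng());
  }
  return arrivals;
}
```

With 100 VUs, 0.1 s responses and 0.9 s think time, a closed test produces 100 RPS; if responses slow to 0.6 s, the same test quietly drops to about 67 RPS, while real users in an open system would keep arriving at 100 RPS and see growing queues.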

Load profiles (templates)

"Normal day": baseline background + daily fluctuations.
"Peak event": warm-up 10-30 minutes before the start, a sharp spike at the start, a plateau, then a tail.
"Tournament/stream": step-ladder increases and repeated peaks at intervals.
"Infrastructure degradation": half the cache is empty, one region is down, PSP latency increases.
"Failover": traffic shifts to the standby within 1-5 minutes; check auto-scaling (HPA) and retry storms.

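One way to turn these templates into a script is to generate the `stages` array of a k6 'ramping-arrival-rate' scenario, where each stage ramps the arrival rate linearly to `target` over `duration`. `peakEventStages` and `tournamentStages` are hypothetical helpers, and the 30-second spike ramp is an assumption.

```javascript
// "Peak event": warm-up, sharp spike at the start, plateau, tail.
function peakEventStages({ baseRps, peakRps, warmupMin, plateauMin, tailMin, spikeSec = 30 }) {
  return [
    { target: baseRps, duration: `${warmupMin}m` }, // warm-up before the start
    { target: peakRps, duration: `${spikeSec}s` }, // sharp spike at the start
    { target: peakRps, duration: `${plateauMin}m` }, // plateau
    { target: baseRps, duration: `${tailMin}m` }, // tail back to the background level
  ];
}

// "Tournament/stream": repeated peaks separated by quieter intervals.
function tournamentStages({ baseRps, peakRps, peaks, peakMin, gapMin, rampSec = 30 }) {
  const stages = [];
  for (let i = 0; i < peaks; i++) {
    stages.push({ target: peakRps, duration: `${rampSec}s` }); // ramp up to the peak
    stages.push({ target: peakRps, duration: `${peakMin}m` }); // hold the peak
    stages.push({ target: baseRps, duration: `${rampSec}s` }); // ramp down
    stages.push({ target: baseRps, duration: `${gapMin}m` }); // quiet interval
  }
  return stages;
}
```

In a k6 script the result plugs into a scenario such as `{ executor: 'ramping-arrival-rate', startRate: 100, timeUnit: '1s', preAllocatedVUs: 500, stages: peakEventStages({ baseRps: 100, peakRps: 1000, warmupMin: 15, plateauMin: 20, tailMin: 10 }) }`; all numbers there are illustrative.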
Data and environment preparation

Test data: realistic cardinality (providers, currencies, countries), dirty fields, query distributions (Pareto/Zipf).
Secrets/PII: anonymize; use sandbox keys and PSP sandboxes.
Environment: a dedicated perf environment, isolation from external integrations (mocks/stubs), pinned versions.
Observability: metrics (Prometheus), logs (Loki/ELK), traces (OTel). Record build-id in responses.
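Skewed query distributions are easy to miss with `Math.random()`, which spreads keys uniformly. A minimal Zipf sampler picks rank k with probability proportional to 1/k^s, so a few providers or catalog keys receive most of the traffic. `makeZipfSampler` and the provider IDs are hypothetical, and the exponent `s` should be fitted to production logs rather than guessed. Because it is plain JavaScript, it could also replace the uniform `c=` parameter of the k6 mini-example.

```javascript
// Zipf(s) over ranks 1..n: P(rank k) is proportional to 1 / k^s.
function makeZipfSampler(n, s, rng = Math.random) {
  const cdf = new Array(n);
  let sum = 0;
  for (let k = 1; k <= n; k++) {
    sum += 1 / Math.pow(k, s);
    cdf[k - 1] = sum;
  }
  for (let i = 0; i < n; i++) cdf[i] /= sum; // normalize: cdf[n - 1] === 1
  return function sample() {
    const u = rng(); // uniform in [0, 1)
    // Binary search for the first rank whose cumulative probability exceeds u.
    let lo = 0;
    let hi = n - 1;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (cdf[mid] > u) hi = mid;
      else lo = mid + 1;
    }
    return lo + 1; // rank in 1..n
  };
}

// Usage (hypothetical IDs): most requests go to the first few providers.
const providers = ['prov-a', 'prov-b', 'prov-c', 'prov-d'];
const pickRank = makeZipfSampler(providers.length, 1.1);
const provider = providers[pickRank() - 1];
```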

Anti-storm retries and idempotency

Retry only idempotent operations; set a retry budget (e.g. ≤ 10% of traffic).
Exponential backoff + jitter; collapse identical concurrent GETs into a single backend call.
For payments: idempotency keys and explicit statuses.
Thundering-herd protection: cache locks, stale-while-revalidate (SWR), local semaphores.
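A minimal sketch of the first two rules for a generic async operation; `backoffDelayMs`, `RetryBudget` and `callWithRetries` are hypothetical names. The delay uses "full jitter" (a uniform random wait up to an exponentially growing cap), and the budget grants a retry only while retries stay within 10% of the requests seen so far, so a mass failure cannot multiply the load.

```javascript
// Full-jitter exponential backoff: wait uniformly in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt, baseMs = 100, capMs = 10000, rng = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rng() * ceiling);
}

// Retry budget: retries may not exceed `ratio` of the requests seen so far.
class RetryBudget {
  constructor(ratio = 0.1) {
    this.ratio = ratio;
    this.requests = 0;
    this.retries = 0;
  }
  recordRequest() {
    this.requests += 1;
  }
  tryAcquireRetry() {
    if (this.retries + 1 > this.requests * this.ratio) return false; // budget exhausted
    this.retries += 1;
    return true;
  }
}

const sleepMs = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries only idempotent operations, only within the budget and attempt limit.
async function callWithRetries(op, { idempotent, budget, maxAttempts = 3, sleep = sleepMs }) {
  budget.recordRequest();
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const retry = idempotent && attempt + 1 < maxAttempts && budget.tryAcquireRetry();
      if (!retry) throw err; // non-idempotent, out of attempts, or out of budget
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

The budget is counted per client process here, which is the simplest option; note that a fresh budget with no recorded traffic grants no retries at all.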

Tools and Patterns

k6 (scripting, open-model, good reporting), Locust (Python scripts), Gatling (Scala), JMeter (a wide range of protocols).

Protocols: HTTP/1.1, HTTP/2, HTTP/3, gRPC, WebSocket, TCP/UDP; do not test push servers as if they were plain GET endpoints.

Traffic generation: scale load generators horizontally and make sure their network is not the bottleneck.
Profiling: capture pprof/async-profiler/eBPF profiles under load, plus OTel traces.
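A back-of-the-envelope check for the generator side, as a hypothetical helper: the number of generator instances must cover both the request rate one instance can sustain and the bandwidth of its NIC (capped here at 70% utilization, an assumption). The per-instance figures must be measured on your own generators.

```javascript
// All per-generator capacities are assumptions to be measured on real generators.
function generatorsNeeded({ targetRps, rpsPerGenerator, avgBytesPerRequest, nicBitsPerSec, nicUtilization = 0.7 }) {
  const byCpu = targetRps / rpsPerGenerator; // request-rate limit per instance
  const trafficBitsPerSec = targetRps * avgBytesPerRequest * 8; // request + response bytes
  const byNetwork = trafficBitsPerSec / (nicBitsPerSec * nicUtilization);
  return Math.ceil(Math.max(byCpu, byNetwork));
}
```

For example, 20,000 RPS with 200 KB per request/response pair is about 32 Gbit/s, so five 10 Gbit/s generators are needed even if four would be enough for CPU.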

Mini-example in k6 (open model + spike):

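```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    // Open model: iterations start at the configured arrival rate,
    // independent of how fast the system responds.
    warmup: {
      executor: 'ramping-arrival-rate',
      startRate: 50,
      timeUnit: '1s',
      preAllocatedVUs: 200,
      stages: [{ target: 200, duration: '5m' }], // 50 -> 200 iterations/s over 5 min
    },
    spike: {
      executor: 'constant-arrival-rate',
      rate: 1200, // 1200 iterations/s, one request each
      timeUnit: '1s',
      preAllocatedVUs: 2000,
      startTime: '6m', // the warm-up stages end at 5m
      duration: '3m',
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.003'], // k6 expects a fraction: 0.3% = 0.003
    http_req_duration: ['p(95)<250', 'p(99)<600'], // milliseconds
  },
};

export default function () {
  const c = Math.floor(Math.random() * 1000); // random catalog key 0..999
  const res = http.get(`${__ENV.BASE_URL}/api/v1/catalog?c=${c}`);
  check(res, { 'status is 200': (r) => r.status === 200 });
  // Think time (for closed parts of the scenario). Under arrival-rate executors
  // it does not lower the request rate; it only keeps each VU busy longer.
  sleep(Math.random() * 0.9);
}
```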
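With arrival-rate executors k6 needs enough VUs to keep up with the requested rate; when it runs out, it cannot start iterations on schedule and reports them in the `dropped_iterations` metric. By Little's law, busy VUs ≈ arrival rate × time per iteration. A sketch with a hypothetical `estimateVUs` helper; the headroom factors and the iteration time are assumptions.

```javascript
// Little's law: busy VUs ≈ arrival rate (iterations/s) × time per iteration (s).
function estimateVUs(ratePerSec, avgIterationSec, headroom = 1.5) {
  return Math.ceil(ratePerSec * avgIterationSec * headroom);
}

// Assumed iteration time: ~0.25 s response + 0.45 s mean think time.
const spikeRate = 1200;
const iterationSec = 0.25 + 0.45;
const spikeScenario = {
  executor: 'constant-arrival-rate',
  rate: spikeRate,
  timeUnit: '1s',
  duration: '3m',
  preAllocatedVUs: estimateVUs(spikeRate, iterationSec),
  maxVUs: estimateVUs(spikeRate, iterationSec, 2.5), // room to grow if responses slow down
};
```

For the spike above, one iteration takes about 0.7 s, so 1200 RPS keeps roughly 840 VUs busy; with 1.5x headroom that is about 1260, within the 2000 pre-allocated.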

Procedure

1. Hypothesis → which bottlenecks are likely (CPU, DB, cache, network, TLS, GC).
2. Profile → scenarios/routes, traffic shares, models (open/closed), data.
3. Warm-up → cache/connections/TLS/interpreters.
4. Ramp-up → step the load up to the target intensity.
5. Plateau → collect steady-state metrics and traces.
6. Stress/ramp-down → find the breaking point, observe auto-scaling.
7. Analysis → correlate metrics and flamegraphs; write the report and a change plan.
8. Re-run → repeat through the CI pipeline (performance regression).

Analysis of results

Load vs latency/error curve: look for the knee (elbow) that marks capacity.
Latency breakdown: network (DNS/TLS/connect), proxy, application, database, external calls.
Saturation: CPU > 75-85%, GC pauses longer than p95, I/O wait, task queue growth.
Elasticity: autoscale reaction time (HPA/KEDA), cold start, cache warm-up.
Cost: $/1000 RPS at target SLO, peak budget forecast.
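Finding the elbow can be automated for step tests with a simple heuristic (one of many; `findKnee`, `slopeFactor` and `costPer1000Rps` are hypothetical): walk the load steps and stop at the first step where the SLO is violated or where p95 grows disproportionately faster than the offered load. The factor of 2 is an assumption to tune per service.

```javascript
// steps: [{ rps, p95, errorRate }] in increasing load order (p95 in ms).
function findKnee(steps, { p95Ms, maxErrorRate }, slopeFactor = 2) {
  for (let i = 1; i < steps.length; i++) {
    const prev = steps[i - 1];
    const cur = steps[i];
    const loadGrowth = cur.rps / prev.rps;
    const latencyGrowth = cur.p95 / prev.p95;
    const sloBroken = cur.p95 > p95Ms || cur.errorRate > maxErrorRate;
    // "Bent": latency grows much faster than the load that caused it.
    const bent = latencyGrowth > 1 + slopeFactor * (loadGrowth - 1);
    if (sloBroken || bent) return { knee: cur, lastGood: prev };
  }
  return { knee: null, lastGood: steps[steps.length - 1] };
}

// Cost per 1000 RPS (per hour) at a load that still meets the SLO.
function costPer1000Rps(hourlyCostUsd, sustainedRps) {
  return hourlyCostUsd / (sustainedRps / 1000);
}
```

The `lastGood` step is a conservative capacity estimate and a natural load at which to compute cost.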

Engineering Practices

Degradation indicators: growing p99 tails, queue growth, a falling hit ratio, more retry attempts.
Exclude confounders: file descriptor limits, sysctl, conn-pool, 'reuseport', TLS chains, OCSP.
DB: indexes/plans/query cache, connection pool, batch operations, backpressure on producers.
Caches: size/eviction policy, hot keys, replication.
Network/edge: HTTP/2/3, TLS session resumption ≥ 70%, Brotli, CDN cache key, tiered cache.

Observability under load

Metrics: system (CPU/mem/IO), runtime (GC/heap), network (RTT/loss/ECN), L7 (p95/99, 5xx/429), queues, database clusters/cache.
Traces: use tail-based sampling to keep slow and failed requests; tag traces with build-id/canary labels.
Logs: aggregate errors with volume limits so the load test does not DoS the log pipeline.
Experiments: record feature flags and configs in the report.
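The tail-based sampling rule as a minimal sketch: the decision is made once the whole trace is complete, so slow or failed traces are always kept and only a small share of normal traffic is retained. `keepTrace`, the 600 ms threshold (borrowed from the example p99 SLO) and the 1% baseline are assumptions; in practice this is usually configured in the tracing pipeline, for example the OpenTelemetry Collector's tail-sampling processor, rather than hand-written.

```javascript
// trace: { durationMs, error } summarizing a finished trace.
function keepTrace(trace, { latencyThresholdMs = 600, baselineRate = 0.01 } = {}, rng = Math.random) {
  if (trace.error) return true; // always keep failures
  if (trace.durationMs > latencyThresholdMs) return true; // always keep the slow tail
  return rng() < baselineRate; // small random sample of everything else
}
```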

Automation and CI/CD

Perf-jobs in CI (smoke 3-5 min, nightly 30-60 min, weekly soak).
Tolerance limits on latency/errors/resources → fail ("break") the build on regression.
Artifacts: graphs, flamegraphs, profiles, JSON reports (k6/jtl).
Versioning of data and scripts, PR-review of perf scripts.
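A regression gate can be as simple as comparing the run's summary with a stored baseline. A sketch with a hypothetical `findRegressions` helper; the flat metric names and the tolerances are assumptions and would be extracted from the k6/JMeter reports.

```javascript
// baseline/current: flat metric summaries, e.g. { p95Ms: 210, errorRate: 0.001 }.
// tolerances: maximum allowed relative increase per metric (0.1 = +10%).
function findRegressions(baseline, current, tolerances) {
  const failures = [];
  for (const [metric, maxIncrease] of Object.entries(tolerances)) {
    const base = baseline[metric];
    const cur = current[metric];
    if (base === undefined || cur === undefined) {
      failures.push(`${metric}: missing from report`);
      continue;
    }
    const limit = base * (1 + maxIncrease);
    if (cur > limit) failures.push(`${metric}: ${cur} > ${limit} (baseline ${base})`);
  }
  return failures;
}
```

In a CI job, a non-empty list would set a non-zero exit code (for example `process.exitCode = 1` in Node) so the pipeline fails. A baseline of zero for a metric means any non-zero value fails, which is usually what you want for error counts.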

iGaming/fintech specifics

Tournaments/matches: spike + plateau; TLS/DNS/CDN warm-up, raised pool limits, separate ("gray") routes for bots.
Payments/PSP: sandbox limits, idempotency, strict timeouts; verify degraded mode (reference-data cache, queues).
Jackpots/events: atomicity and consistency, no duplicates, load on RNGs/leaderboards.
Anti-fraud/AML: load on rules/ML scoring, backpressure, event deduplication.
Regulatory: logging metrics and versions at peaks, reports for audit.
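For payment load tests it helps to verify that a repeated request with the same idempotency key does not charge twice. A minimal in-memory sketch of the server-side rule; `IdempotencyStore` is hypothetical, and a real implementation needs an atomic insert (for example a unique constraint in the database) and a TTL for keys.

```javascript
class IdempotencyStore {
  constructor() {
    this.results = new Map(); // idempotency key -> { payloadHash, result }
  }
  // Runs `operation` at most once per key; repeats get the stored result.
  execute(key, payloadHash, operation) {
    const prior = this.results.get(key);
    if (prior) {
      if (prior.payloadHash !== payloadHash) {
        throw new Error('idempotency key reused with a different payload');
      }
      return { ...prior.result, replayed: true };
    }
    const result = operation();
    this.results.set(key, { payloadHash, result });
    return { ...result, replayed: false };
  }
}
```

Under load, the check is that the number of executed charges equals the number of distinct keys, even when the script deliberately resends some requests.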

Launch checklist

Fixed SLO/SLI and red lines (error/latency/saturation).
Load scenarios and profiles are approved (open/closed, spike/soak/stress).
Data is realistic, PII masked, integrations on sandbox/mocks.
Observability is ready: metrics/traces/logs, release tags.
System configs (ulimit/sysctl/pools) are documented.
Auto-scale/cache warm-up plan and rollback criteria.
Threshold alerts and on-call plan.
Reporting template (charts, conclusions, actions) is prepared.

Common errors

A closed-model test shows "green" while production goes down (the open model cannot be skipped).
Unrepresentative data (one currency/one provider) → false conclusions.
No warm-up: cold caches/TLS/connections → excessive latency at the start.
Unlimited retries → retry storms and cascading failures.
One profile for all services → real hot spots are missed.
No soak runs → memory leaks and drift go unnoticed.
Opaque results: no traces/flamegraphs → the bottleneck cannot be located.

Mini playbooks

Finding the breaking point

1) Increase the load in steps of 10-20% every 5-10 minutes. 2) Record the point where p95 rises sharply and errors exceed the SLO. 3) Capture CPU/DB/cache profiles. 4) Draw up an optimization plan and repeat.
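Step 1 can be generated rather than hand-written. A sketch of a hypothetical `stepStages` helper for a k6 'ramping-arrival-rate' scenario; it assumes each step is a fixed percentage above the previous one (compounding) and uses a 30-second ramp between plateaus.

```javascript
// Each step is `stepPct` percent above the previous one (compounding).
function stepStages({ startRps, stepPct, steps, holdMin, rampSec = 30 }) {
  const stages = [];
  let rps = startRps;
  for (let i = 0; i < steps; i++) {
    const target = Math.round(rps);
    stages.push({ target, duration: `${rampSec}s` }); // short ramp to the next step
    stages.push({ target, duration: `${holdMin}m` }); // hold long enough for a stable p95
    rps *= 1 + stepPct / 100;
  }
  return stages;
}
```

The per-step p95 and error rate from such a run are what step 2 inspects.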

Reining in retry storms

1) Limit the retry budget and enable backoff + jitter. 2) Introduce request collapsing/SWR. 3) Allow a "degraded mode" (limited functionality). 4) Double-check idempotency.
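Request collapsing from step 2 can be sketched as a "single-flight" wrapper: concurrent requests for the same key share one in-flight promise, so a cache miss under load triggers one backend call instead of hundreds. `RequestCollapser` is a hypothetical name; this version only deduplicates concurrent calls and does not cache results (SWR would add a stale cache on top).

```javascript
class RequestCollapser {
  constructor() {
    this.inFlight = new Map(); // key -> pending promise
  }
  get(key, fetchFn) {
    const pending = this.inFlight.get(key);
    if (pending) return pending; // join the request that is already running
    const promise = Promise.resolve()
      .then(fetchFn)
      .finally(() => this.inFlight.delete(key)); // the next caller triggers a fresh fetch
    this.inFlight.set(key, promise);
    return promise;
  }
}
```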

Peak event (tournament) - pre-plan

1) Warm up CDN/DNS/TLS/pools. 2) Raise HPA target replicas and prepare reserve capacity. 3) Set separate rate limits for bots. 4) Peak-mode dashboards and an on-call communication bridge.
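Separate rate limits for bots are commonly implemented as one token bucket per client class. A minimal sketch with an injectable clock; the class, the client classes and the example limits are hypothetical.

```javascript
class TokenBucket {
  // ratePerSec: sustained rate; burst: bucket size; now(): clock in seconds.
  constructor(ratePerSec, burst, now = () => Date.now() / 1000) {
    this.rate = ratePerSec;
    this.burst = burst;
    this.now = now;
    this.tokens = burst; // start full
    this.last = now();
  }
  tryTake() {
    const t = this.now();
    this.tokens = Math.min(this.burst, this.tokens + (t - this.last) * this.rate); // refill
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // caller should answer e.g. HTTP 429
  }
}

// Example: stricter limits for identified bots than for regular clients.
const limiters = { human: new TokenBucket(50, 100), bot: new TokenBucket(5, 10) };
function admit(clientClass) {
  return limiters[clientClass].tryTake();
}
```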

Soak-night

1) 8-12 hours of steady load. 2) Monitor heap/FD/connections/GC pauses. 3) Check the p95 delta and the hit ratio. 4) Fix leaks and drift.
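The soak checks in steps 2-3 can be reduced to two numbers per run: the relative change in p95 between the first and last window, and the trend of heap usage over time (a least-squares slope, so single GC spikes do not dominate). A sketch; the window format, the helper names and both thresholds are assumptions, and at least two distinct hours are required.

```javascript
// Least-squares slope of ys over xs (e.g. heap MB per hour).
function slope(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    den += (xs[i] - mx) ** 2;
  }
  return num / den;
}

// windows: per-window summaries collected during the soak, in time order.
function soakDrift(windows, { maxP95DeltaPct = 10, maxHeapSlopeMbPerHour = 5 } = {}) {
  const first = windows[0];
  const last = windows[windows.length - 1];
  const p95DeltaPct = ((last.p95Ms - first.p95Ms) / first.p95Ms) * 100;
  const heapSlope = slope(windows.map((w) => w.hour), windows.map((w) => w.heapMb));
  return {
    p95DeltaPct,
    heapSlope,
    latencyDrift: p95DeltaPct > maxP95DeltaPct,
    leakSuspected: heapSlope > maxHeapSlopeMbPerHour,
  };
}
```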

Result

Load testing is an engineering decision-making process, not a "race for RPS." Model real profiles (especially with the open model), fix SLOs, collect metrics and traces, find the performance knee, and calculate the cost of performance. Automate runs, keep retries storm-safe and plan peak events in advance: this keeps the platform predictable and stable at the most stressful moments.