
Load testing and stress testing


1) Why you need it

Objectives:
  • Confirm capacity (how many RPS/concurrent sessions the system can sustain within the SLO).
  • Find bottlenecks (CPU/IO/DB/network/locks/pools).
  • Set up performance budgets and gates in CI/CD.
  • Reduce release risk (p95/p99 regressions, error spikes at peak).
  • Plan capacity and cost (scale-out and headroom).

2) Types of perf tests

Load: realistic traffic close to peak levels; SLO validation.
Stress: ramp to and above the limit → how the system degrades and where it breaks.
Spike: a sudden load jump → elasticity/autoscaling.
Soak/Endurance: hours/days → leaks, fragmentation, latency drift.
Capacity/Scalability: how throughput/latency changes with scale-out; Amdahl's/Gustafson's laws.
Smoke perf: a short "smoke" run on each release (a performance sanity check).


3) Traffic generation models

Fixed VUs/concurrency (closed model): 'N' users, each sending the next request only after the previous one completes → queuing builds up on the client. Risk of hiding overload.
Arrival rate (open model): a stream of requests arriving at intensity λ (req/s), as in real traffic. More correct for public APIs.

Little's Law: 'L = λ × W'.
For a pool/service, the minimum parallelism ≈ 'λ × W' (add 20-50% headroom).
Where 'λ' is the arrival rate (throughput) and 'W' is the average time a request spends in the system.
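
A minimal sizing sketch based on Little's Law, in Python; the arrival rate, service time and headroom below are illustrative assumptions, not measured values:

import math

lambda_rps = 500      # target arrival rate (assumed), requests per second
w_seconds = 0.120     # average time a request spends in the system (assumed)
headroom = 0.30       # 20-50% safety margin on top of the theoretical minimum

concurrency = lambda_rps * w_seconds                 # in-flight requests at steady state (L)
pool_size = math.ceil(concurrency * (1 + headroom))  # recommended pool size / parallelism

print(f"in-flight ≈ {concurrency:.0f}, recommended pool size ≈ {pool_size}")
# -> in-flight ≈ 60, recommended pool size ≈ 78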


4) Load profiles and scenarios

User journey mix: proportions of scenarios (login, browse, deposit, checkout...).
Think-time: user pauses (exponential/lognormal distributions; a sampling sketch follows this list).
Data profile: response sizes, payloads, parameter variability.
Correlation: link steps (cookies/tokens/IDs) as in a real flow.
Cold/warm/hot cache: separate runs for each.
Read vs Write: balance of reads and writes, idempotency for retries.
Multi-region: RTT, distribution by POP/ASN.
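
A minimal sketch of the scenario mix and think-time sampling in Python; the weights and lognormal parameters are illustrative assumptions, not values derived from real traffic:

import random

SCENARIO_MIX = {"browse": 0.6, "login": 0.2, "deposit": 0.15, "checkout": 0.05}

def pick_scenario() -> str:
    # Weighted choice that reproduces the user-journey mix.
    return random.choices(list(SCENARIO_MIX), weights=list(SCENARIO_MIX.values()), k=1)[0]

def think_time() -> float:
    # Lognormal pauses: most are short, with a long tail of slow readers; capped at 10 s.
    return min(random.lognormvariate(0.0, 0.8), 10.0)

for _ in range(5):
    print(pick_scenario(), f"{think_time():.2f}s")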


5) Test environment

Isolation: the test environment should be close to production in topology/settings (but do not hammer production itself).
Data: PII masking, volumes and indexes as in production.
Load generators: must not be CPU/network bound themselves; distributed runners, clock synchronization.
Observability: metrics/traces/logs, synthetics at the perimeter, export of CPU/heap profiles.


6) Metrics and SLI

Throughput: RPS / transactions per second.
Latency: p50/p95/p99, TTFB, server time vs network time.
Errors: share of 5xx/4xx/domain errors.
Saturation: CPU, load average, GC, disk IOPS/latency, network, pool wait.
Business SLIs: deposit success ≤ 5 s, order confirmation ≤ 2 s.

Take the thresholds from the SLO (for example, "99.95% of requests ≤ 300 ms") and monitor the burn rate during the run.
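
A minimal burn-rate sketch in Python, assuming an availability SLO of 99.95%; the observed error rate is an illustrative number, in practice it comes from the run's metrics:

slo_target = 0.9995                # 99.95% of requests must succeed
error_budget = 1 - slo_target      # 0.05% of requests may fail
observed_error_rate = 0.002        # 0.2% errors measured during the run (assumed)

burn_rate = observed_error_rate / error_budget
print(f"burn rate = {burn_rate:.1f}x")  # 4.0x: the budget burns four times faster than allowed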


7) Finding bottlenecks (technique)

1. Warm up the system at 60-80% of the target load.
2. Increase the load in steps (ramp) → record where p95/p99 and the error rate start to grow (a step-analysis sketch follows this list).
3. Match p99 spikes to:
  • queues in pools (DB/HTTP),
  • growth of waits/locks (DB),
  • GC pauses/heap,
  • network retransmits/packet loss,
  • disk latency/cache misses.
4. Localize: binary search along the request path, profilers (CPU/alloc/lock profiles).
5. Fix the bottleneck → tune → repeat the run.
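
A minimal sketch of the step analysis from point 2 in Python; each ramp step is assumed to be pre-aggregated into RPS, p95 and error-rate values, and the numbers are illustrative:

steps = [
    {"rps": 200, "p95_ms": 120, "err": 0.001},
    {"rps": 400, "p95_ms": 140, "err": 0.002},
    {"rps": 600, "p95_ms": 210, "err": 0.004},
    {"rps": 800, "p95_ms": 620, "err": 0.031},  # degradation shows up here
]

P95_LIMIT_MS, ERR_LIMIT = 300, 0.005

for step in steps:
    if step["p95_ms"] > P95_LIMIT_MS or step["err"] > ERR_LIMIT:
        print(f"degradation starts at ~{step['rps']} rps "
              f"(p95={step['p95_ms']} ms, err={step['err']:.1%})")
        break
else:
    print("no step exceeded the thresholds")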

8) Behavior under stress

Graceful degradation: limits, circuit breakers, backpressure queues, "accepted for processing" responses.
Retries: at most 1, idempotent operations only; with jitter; retry budget ≤ 10% of RPS (a sketch follows this list).
Fail-open/Fail-closed: for non-critical dependencies, allow fail-open (cache/stubs).
Cascading failure: isolation of pools/quotas (bulkhead), short timeouts, graceful disabling of features (feature flags).
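
A minimal sketch of a single retry with jitter under a retry budget, in Python; call() is a hypothetical stand-in for the real downstream request, and the budget numbers are illustrative:

import random
import time

class RetryBudget:
    def __init__(self, ratio: float = 0.10):
        self.ratio = ratio      # retries allowed as a share of total requests
        self.requests = 0
        self.retries = 0

    def allow_retry(self) -> bool:
        return self.retries < self.ratio * max(self.requests, 1)

budget = RetryBudget()

def call_with_retry(call) -> bool:
    budget.requests += 1
    if call():
        return True
    if not budget.allow_retry():
        return False                         # budget exhausted: fail fast, no self-DDoS
    budget.retries += 1
    time.sleep(random.uniform(0.05, 0.25))   # jitter instead of a synchronized retry storm
    return call()                            # at most one retry, idempotent calls only

flaky = lambda: random.random() < 0.7        # hypothetical call succeeding ~70% of the time
print(call_with_retry(flaky))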


9) Tools (selection for the task)

k6 (JavaScript, open source, supports the open (arrival-rate) model, fast, convenient in CI).
JMeter (rich in ecosystem, GUI/CLI, plugins, but heavier).
Gatling (Scala DSL, high performance).
Locust (Python, scripting flexibility).
Vegeta/hey/wrk (micro-benches and quick check).

Rule: one "main" tool + a light CLI for smoke runs in PRs.


10) Examples (snippets)

10.1 k6 (open model with arrival rate)

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    open_model: {
      executor: 'ramping-arrival-rate',
      startRate: 200, timeUnit: '1s',
      preAllocatedVUs: 200, maxVUs: 2000,
      stages: [
        { target: 500, duration: '5m' },  // up to 500 rps
        { target: 800, duration: '5m' },  // stress
        { target: 0,   duration: '1m' }   // ramp down
      ]
    }
  },
  thresholds: {
    http_req_duration: ['p(95)<300', 'p(99)<800'],
    http_req_failed: ['rate<0.005'],
  },
};

export default function () {
  const res = http.get(`${__ENV.BASE_URL}/api/catalog?limit=20`);
  sleep(Math.random() * 2); // think-time
}

10.2 JMeter (profile idea)

Thread Group + Stepping Thread Group or Concurrency Thread Group (open-like model).
HTTP Request Defaults, Cookie Manager, CSV Data Set.
Backend Listener → InfluxDB/Grafana; assertions on response time/code.

10.3 Locust (Python)

from locust import HttpUser, task, between

class WebUser(HttpUser):
    wait_time = between(0.2, 2.0)

    @task(5)
    def browse(self):
        self.client.get("/api/catalog?limit=20")

    @task(1)
    def buy(self):
        self.client.post("/api/checkout", json={"sku": "A1", "qty": 1})

11) Data, correlation, preparation

Seed data: reference data/catalogs, users, balances, tokens - as in production.
PII masking/anonymization; generate synthetic data on top of real distributions.
Correlation: extract IDs/tokens from responses (RegExp/JSONPath) and use them in subsequent steps (a sketch follows this list).
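
A minimal correlation sketch in Python using the requests library; the /api/login and /api/deposit endpoints and the "token" field are hypothetical examples, not the platform's real API:

import os
import requests

BASE_URL = os.environ.get("BASE_URL", "http://localhost:8080")

with requests.Session() as session:
    # Step 1: log in and extract the token from the JSON body.
    login = session.post(f"{BASE_URL}/api/login",
                         json={"user": "load_user_001", "password": "secret"})
    login.raise_for_status()
    token = login.json()["token"]  # correlated value reused in the next step

    # Step 2: reuse the token exactly like a real client would.
    deposit = session.post(f"{BASE_URL}/api/deposit",
                           headers={"Authorization": f"Bearer {token}"},
                           json={"amount": 10})
    print(deposit.status_code)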


12) Observability during runs

RED dashboards (Rate, Errors, Duration) per route.
Exemplars: jump from metrics to traces (trace_id).
Error logs: sampling + aggregation, deduplication/idempotency.
System: CPU/GC/heap, disks/network, pool wait.
DB: top queries, locks, index scans, bloat.


13) Automation and performance gates

CI: short runs on merge (e.g. a 2-3 minute k6 run) with thresholds.
Nightly/Weekly: long soak/stress runs in a separate environment; reports and trends.
Canary releases: SLO analysis (error rate, p95) as the promotion gate.
Regressions: baseline vs current build; alert when degradation exceeds X% (a gate sketch follows this list).
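
A minimal sketch of such a gate in Python; baseline.json and current.json are hypothetical run summaries with p95_ms and error_rate fields, not a real k6/JMeter output format:

import json
import sys

MAX_P95_REGRESSION = 0.10   # fail if p95 grows by more than 10%
MAX_ERROR_RATE = 0.005      # absolute ceiling on the error rate

baseline = json.load(open("baseline.json"))
current = json.load(open("current.json"))

p95_growth = (current["p95_ms"] - baseline["p95_ms"]) / baseline["p95_ms"]

if p95_growth > MAX_P95_REGRESSION or current["error_rate"] > MAX_ERROR_RATE:
    print(f"FAIL: p95 regression {p95_growth:.0%}, error rate {current['error_rate']:.2%}")
    sys.exit(1)

print(f"OK: p95 regression {p95_growth:.0%}, error rate {current['error_rate']:.2%}")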


14) Capacity planning and cost

Throughput→latency curves: find the knee point - beyond it, p99 grows sharply.
Scale-out: measure scaling efficiency (ΔRPS / Δnodes), as sketched below.
Cost: "RPS per $/hour," headroom for peak events + a DR reserve.
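
A minimal sketch of the scaling-efficiency and cost arithmetic in Python; the node counts, RPS figures and hourly node price are illustrative assumptions:

small = {"nodes": 4, "rps": 2000}
large = {"nodes": 8, "rps": 3400}
node_cost_per_hour = 1.20  # $ per node per hour (assumed)

delta_rps = large["rps"] - small["rps"]          # 1400 extra rps
delta_nodes = large["nodes"] - small["nodes"]    # for 4 extra nodes
per_added_node = delta_rps / delta_nodes         # 350 rps per added node
efficiency = per_added_node / (small["rps"] / small["nodes"])  # vs 500 rps/node baseline

rps_per_dollar = large["rps"] / (large["nodes"] * node_cost_per_hour)

print(f"scaling efficiency ≈ {efficiency:.0%}, RPS per $/hour ≈ {rps_per_dollar:.0f}")
# -> scaling efficiency ≈ 70%, RPS per $/hour ≈ 354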


15) Anti-patterns

Hammering production without controls, or testing in an "empty" environment that does not resemble production.
A closed model with fixed VUs that hides overload.
No think-time or data variability → unrealistically high cache hits, or the opposite - a request storm hitting the origin.
A single "/ping" script instead of real user flows.
No observability: "we only see RPS and average latency."
Uncontrolled retries → self-DDoS.
Mixing testing and optimization without recording hypotheses/changes.


16) Checklist (0-30 days)

0-7 days

Define SLIs/SLOs and target traffic profiles (mix, think-time, data).
Select the tool (k6/JMeter/Locust), set up the RED dashboards.
Prepare the environment and seed data, disable third-party limits/captchas.

8-20 days

Build scenarios: open model (arrival rate), cold/warm/hot cache.
Run load → stress → spike; record the knee point and bottlenecks.
Add performance gates to CI (micro-runs).

21-30 days

Soak test for 4-24 h: leaks, GC drift, stabilization.
Document limits, the capacity plan, and RPS→p95/errors charts.

Prepare runbooks: "how to raise limits/scale out" and "how to degrade gracefully."


17) Maturity metrics

Realistic profiles (mix, think-time, data) cover ≥ 80% of traffic.
RED dashboards + tracing are wired up for all tests.
Performance gates block releases on p95/error regressions.
Capacity and knee points are documented for key services.
Monthly soak/stress runs and progress reports.
Spike resilience is confirmed by autoscaling and the absence of cascading failures.


18) Conclusion

Load testing is a regular engineering practice, not a one-time "measurement." Model real users (open model), measure what reflects the client's experience (SLI/SLO), keep observability and gates in CI/CD, run stress/spike/soak tests, and record the knee point. Then peak events and black swans turn into manageable scenarios, and performance becomes a predictable, measurable parameter of your platform.
