Load and stress testing
1) Terms and objectives
Load test - testing in the working range (target RPS/concurrency) against SLOs (for example, p95 < 200 ms, error rate < 0.5%).
Stress test - pushing beyond the working range (up to and past saturation of CPU/DB/network), observing degradation and recovery mechanics.
Spike test - sharp bursts of load (×N for a few minutes).
Soak/Endurance - a long run (hours/days) to find leaks, GC drift, fragmentation, queue growth.
Capacity test - finding the throughput plateau (saturation point) and the remaining headroom.
Objectives: confirm SLOs, establish the plateau, understand bottlenecks, calibrate auto-scaling and limits.
2) Traffic model: open vs closed
Closed model (concurrency-driven): a fixed number of virtual users (VUs); each waits for a response, then pauses for think time.
Open model (arrival-rate): a fixed request rate (RPS), issued regardless of responses.
Little’s Law: `L = λ W`
'L' is the average number of requests being serviced concurrently,
'λ' is the arrival rate (RPS),
'W' is the average response time.
Hence an estimate of the required generator concurrency: `concurrency ≈ target_RPS × p95_latency` (p95 as a conservative stand-in for W).
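As a worked example of this estimate, the required concurrency follows directly from Little's Law (the numbers below are illustrative, not from the article):

```javascript
// Little's Law: L = λ · W. Estimate the generator concurrency
// needed to sustain a target arrival rate.
function requiredConcurrency(targetRps, latencySec) {
  return Math.ceil(targetRps * latencySec);
}

// 800 RPS with ~0.2 s responses keeps ~160 requests in flight,
// so the generator must support at least ~160 VUs/connections.
console.log(requiredConcurrency(800, 0.2)); // 160
```

If the generator cannot hold this many connections, it silently degrades into a closed model and under-drives the target rate.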
3) Metrics: what we measure
Latency SLIs: p50/p90/p95/p99 and the p99.9 tail; tracked separately for "hot" and "cold" paths.
Errors: '5xx', '4xx' (valid/invalid), timeouts, aborted requests.
Throughput: sustained RPS, streams/bytes throughput.
Resources: CPU, RAM/heap, GC pauses, disk IOPS/latency, network bandwidth, number of connections/FDs.
Queues and backpressure: depth, wait time, number of shed/rate-limited requests.
Cache efficiency: hit/miss ratio, warm-up storms.
DB/caches/queues: p95 query latency, locks, contention, pool utilization.
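A minimal sketch of how latency percentiles are derived from raw samples; real tools typically use HDR histograms, but the nearest-rank idea is the same (the sample data below is made up):

```javascript
// Nearest-rank percentile over raw latency samples (ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // nearest rank, 1-based
  return sorted[Math.max(0, rank - 1)];
}

const latenciesMs = [120, 95, 180, 210, 130, 105, 400, 150, 110, 990];
console.log(percentile(latenciesMs, 50)); // 130
console.log(percentile(latenciesMs, 95)); // 990 — one outlier dominates the tail
```

This is also why averages are useless as SLIs: the mean of this sample hides the 990 ms tail that p95/p99 expose.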
4) Test benches and data
Configuration equivalence: software versions, limits (ulimit, conntrack), JVM/GC config, pools.
Topology: LBs, CDN, WAF, TLS, the same network hops.
Data: realistic distributions (object sizes, "hot"/"cold" keys, regionality).
Cold/warm/hot start: separate runs; be sure to test with cold caches.
Background isolation: disable irrelevant jobs/crons or account for their effect.
5) Scenarios (load profiles)
1. Baseline: step up to the target RPS, hold 10-30 min.
2. Ramp & Hold: smooth growth to X% above target, then hold → tail analysis.
3. Spike: instant ×2-×5 burst for 1-5 minutes, then return.
4. Stress to Failure: step up until things break; record the first SLO-violation point and the breaking point.
5. Soak: 6-24 hours with traffic variability (day/night); watch for leaks/drift.
6. Mixed: a mix of endpoints following the real distribution (Zipf/Pareto), with different weights.
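A ramp-and-hold profile like the ones above can be sketched as a step-schedule builder; this is not any tool's API — each entry would be translated into the generator's executor config (e.g. k6 stages), and the step values are illustrative:

```javascript
// Build a "ramp & hold" step schedule: linear steps up to targetRps,
// then a long hold at the target.
function rampAndHold(startRps, targetRps, steps, holdMinutes) {
  const schedule = [];
  const delta = (targetRps - startRps) / steps;
  for (let i = 1; i <= steps; i++) {
    // short 5-minute holds on the way up (illustrative choice)
    schedule.push({ rps: Math.round(startRps + delta * i), holdMin: 5 });
  }
  schedule.push({ rps: targetRps, holdMin: holdMinutes }); // final hold
  return schedule;
}

console.log(rampAndHold(100, 500, 4, 20));
// steps at 200/300/400/500 RPS, then a 20-minute hold at 500
```

Generating the schedule in code keeps the profile versionable alongside the test scripts.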
6) Step-by-step process
Define SLOs and target traffic profiles.
Select the load model (open/closed); set the arrival rate or the number of VUs.
Prepare data and "hot"/"cold" modes.
Set up telemetry (traces/metrics/logs), correlated with the test run.
Warm up and run; collect artifacts (CPU/heap profiles, flame graphs, DB explain plans/slow logs).
Analyze bottlenecks; form action items.
Re-run after fixes; update the baseline and the capacity playbook.
7) Bottlenecks and typical fixes
CPU-bound service: profiling → eliminate hot functions, allocations, branches; vectorization, cache-friendly data structures.
Network/TLS: keep-alive, HTTP/2/3, connection pooling, correct timeouts, reduced chattiness.
DB: indexes, batching, prepared statements, connection pools, R/W separation, result caching, query deduplication.
Caches: sizing, TTLs, request coalescing, stampede protection, warming, regional shards.
Queues/brokers: admission limits/parallelism, batch sizes, idempotent consumers, DLQ limits.
Garbage collection/pauses: GC tuning, buffer reuse, object pooling within reason.
I/O/disk: asynchronous I/O, response compression at a reasonable compression level.
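For the cache stampede protection mentioned above, request coalescing ("single flight") is a small but effective technique. A sketch, assuming `fetcher` stands in for the expensive backend call:

```javascript
// Request coalescing: concurrent callers for the same key share one
// in-flight fetch instead of issuing N identical backend queries.
const inflight = new Map();

async function coalesced(key, fetcher) {
  if (inflight.has(key)) return inflight.get(key); // join the existing call
  const p = fetcher(key).finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}

// Hypothetical cache-miss storm: 100 concurrent misses → 1 backend call.
let backendCalls = 0;
const fakeFetch = async (k) => { backendCalls++; return `value:${k}`; };
Promise.all(Array.from({ length: 100 }, () => coalesced('hot-key', fakeFetch)))
  .then(() => console.log(backendCalls)); // 1
```

Under load testing this shows up directly: with coalescing, a cold-cache spike produces one DB query per hot key instead of one per request.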
8) Limits and protection
Timeout budgets: propagated top-down to avoid cascades.
Rate limiting/token buckets: predictable degradation instead of a "slow death."
Circuit breakers and shedding of low-priority traffic under saturation.
Backpressure: signals that constrain concurrency deep down the call chain.
Bulkheads: isolated pools for critical endpoints.
Idempotency: keys for safe retries.
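A minimal token-bucket sketch for the rate-limiting point above (capacity and refill rate are illustrative):

```javascript
// Token bucket: requests above the refill rate are rejected immediately
// ("predictable degradation") instead of queueing until they time out.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;         // burst size
    this.tokens = capacity;
    this.refillPerSec = refillPerSec; // sustained rate
    this.last = Date.now();
  }
  tryAcquire() {
    const now = Date.now();
    this.tokens = Math.min(this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false; // shed: the caller returns 429 instead of waiting
  }
}

const bucket = new TokenBucket(5, 10); // burst of 5, 10 req/s sustained
const accepted = Array.from({ length: 8 }, () => bucket.tryAcquire())
  .filter(Boolean).length;
console.log(accepted); // 5 accepted instantly, 3 shed (burst exhausted)
```

Stress tests should verify exactly this behavior: above the limit, latency for accepted requests stays flat while the excess is rejected fast.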
9) Tools and when to choose them
k6 - concise JS, excellent arrival-rate support, integrations and dashboards.
Gatling - Scala DSL, high-performance generator.
JMeter - flexible, rich ecosystem; convenient for protocols/plugins.
Locust - Python scripts, convenient for complex user-flow logic.
Vegeta/hey/wrk - microbenchmarks and targeted HTTP runs.
tc/netem, toxiproxy - network degradation injection.
Flamegraphs/profilers - finding CPU/heap hot spots.
10) Examples (sketches)
k6 (open model, mix endpoints)
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    open_model: {
      executor: 'constant-arrival-rate',
      rate: 800, timeUnit: '1s', duration: '20m',
      preAllocatedVUs: 500, maxVUs: 2000
    }
  },
  thresholds: {
    'http_req_duration{kind:hot}': ['p(95)<200'],
    'http_req_failed': ['rate<0.005']
  }
};

export default function () {
  const r = Math.random();
  let res;
  if (r < 0.6) {
    res = http.get('https://svc/api/hot', { tags: { kind: 'hot' } });
  } else if (r < 0.9) {
    res = http.get('https://svc/api/warm', { tags: { kind: 'warm' } });
  } else {
    res = http.post('https://svc/api/heavy', JSON.stringify({ n: 1000 }),
      { headers: { 'Content-Type': 'application/json' } });
  }
  check(res, { 'status is 2xx': (r) => r.status >= 200 && r.status < 300 });
  sleep(0.2);
}
```
Gatling (steps and spike)
```scala
setUp(
  scn.inject(
    rampUsersPerSec(50) to 500 during (10.minutes),
    constantUsersPerSec(500) during (20.minutes),
    stressPeakUsers(2000) during (30.seconds)
  )
).protocols(http.baseUrl("https://svc"))
```
Load plan (YAML skeleton)
```yaml
profile: "mix-traffic"
targets:
  - { endpoint: "GET /api/hot",    weight: 0.6 }
  - { endpoint: "GET /api/warm",   weight: 0.3 }
  - { endpoint: "POST /api/heavy", weight: 0.1 }
schedule:
  - step: { rps: 300, hold: 10m }
  - step: { rps: 600, hold: 10m }
  - step: { rps: 900, hold: 10m }
guardrails:
  slo:
    p95_ms: 200
    error_rate: 0.5%
  abort_if:
    - metric: error_rate
      op: ">"
      value: 2%
      window: 2m
```
11) Automation and lifecycle
Perf-smoke in each PR (short run of key endpoints).
Nightly "capacity" runs on the stage with reports and profile artifacts.
Gates in CI/CD: fail the build when p95/p99 regresses by more than X% from the baseline or the error rate grows.
Versioning of baselines and storage of profiles/flamegraphs as artifacts.
Coverage tags: which service/endpoint is covered, which traffic profile is used.
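The CI gate described above can be sketched as a simple comparison against the versioned baseline; the metric names and the 10% threshold are assumptions, not a specific CI product's API:

```javascript
// Fail the build when p95 regresses beyond a threshold vs the baseline,
// or when the error rate grows.
function perfGate(baseline, current, maxRegressPct = 10) {
  const failures = [];
  const p95Growth = ((current.p95Ms - baseline.p95Ms) / baseline.p95Ms) * 100;
  if (p95Growth > maxRegressPct) {
    failures.push(`p95 regressed ${p95Growth.toFixed(1)}%`);
  }
  if (current.errorRate > baseline.errorRate) {
    failures.push(`error rate grew to ${current.errorRate}`);
  }
  return { pass: failures.length === 0, failures };
}

// Hypothetical run: p95 went from 180 ms to 210 ms (+16.7%) → gate fails.
console.log(perfGate({ p95Ms: 180, errorRate: 0.002 },
                     { p95Ms: 210, errorRate: 0.002 }));
```

The gate's verdict and the raw numbers should both land in the build artifacts, so regressions are traceable to a specific baseline version.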
12) Anti-patterns
Generator and the service under test on the same machine → distorted results.
Only the closed model (VUs) for API backends → undershooting the target rate and misjudging capacity.
Runs on an empty database/cache without a cold start.
No realistic distributions (all requests identical).
No telemetry (RPS/latency only on the generator side).
Comparisons without stable baselines and environment control.
"Optimization" by raising a timeout instead of fixing the root cause.
13) Architect checklist
1. SLOs and typical/peak load defined?
2. Is the correct model (open/closed) selected and the traffic profile described?
3. Is the test bench equivalent in configuration and topology; are cold/hot modes covered?
4. Telemetry and profiles enabled; is the test run tagged?
5. Runs: baseline/ramp/spike/stress/soak - covered?
6. Are saturation points recorded and a safety margin planned?
7. Limits, breakers, backpressure, idempotency, shedding policies configured?
8. Are there CI gates for p95/p99 regression and error rate, and are baselines versioned?
9. After fixes - re-run and capacity playbook update?
10. Auto-scaling and an emergency plan documented?
Conclusion
Load and stress testing are not one-off "runs" but a continuous engineering practice. A realistic traffic model, equivalent test benches, telemetry, and automation in CI/CD turn performance from "secret magic" into a metric-driven capability: you know where your ceiling is, how much headroom you have, and which changes really improve the user experience.