Load and stress testing
1) Terms and objectives
Load test - testing in the working range (target RPS/concurrency) against SLOs (for example, p95 < 200 ms, error rate < 0.5%).
Stress test - pushing beyond the working range (up to and past saturation of CPU/DB/network), observing degradation and recovery mechanics.
Spike test - sharp bursts of load (×N for a few minutes).
Soak/Endurance - a long run (hours/days) to find leaks, GC drift, fragmentation, queue growth.
Capacity test - finding the throughput plateau (saturation point) and the remaining headroom.
Objectives: confirm SLOs, establish the plateau, understand bottlenecks, calibrate auto-scaling and limits.
2) Traffic model: open vs closed
Closed model (concurrency-driven): a fixed number of virtual users (VUs); each waits for a response, then pauses for think time.
Open model (arrival-rate): a fixed request rate (RPS), issued regardless of responses.
Little’s Law: `L = λ W`
'L' is the average number of requests being serviced concurrently,
'λ' is the arrival rate (RPS),
'W' is the average response time.
Hence an estimate of the required generator concurrency: `concurrency ≈ target_RPS × p95_latency` (p95 as a conservative stand-in for W).
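As a worked example of this estimate, the required concurrency follows directly from Little's Law (the numbers below are illustrative, not from the article):

```javascript
// Little's Law: L = λ · W. Estimate the generator concurrency
// needed to sustain a target arrival rate.
function requiredConcurrency(targetRps, latencySec) {
  return Math.ceil(targetRps * latencySec);
}

// 800 RPS with ~0.2 s responses keeps ~160 requests in flight,
// so the generator must support at least ~160 VUs/connections.
console.log(requiredConcurrency(800, 0.2)); // 160
```

If the generator cannot hold this many connections, it silently degrades into a closed model and under-drives the target rate.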
3) Metrics: what we measure
Latency SLIs: p50/p90/p95/p99 and the p99.9 tail; tracked separately for "hot" and "cold" paths.
Errors: '5xx', '4xx' (valid/invalid), timeouts, aborted requests.
Throughput: sustained RPS, streams/bytes throughput.
Resources: CPU, RAM/heap, GC pauses, disk IOPS/latency, network bandwidth, number of connections/FDs.
Queues and backpressure: depth, wait time, number of shed/rate-limited requests.
Cache efficiency: hit/miss ratio, warm-up storms.
DB/caches/queues: p95 query latency, locks, contention, pool utilization.
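A minimal sketch of how latency percentiles are derived from raw samples; real tools typically use HDR histograms, but the nearest-rank idea is the same (the sample data below is made up):

```javascript
// Nearest-rank percentile over raw latency samples (ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // nearest rank, 1-based
  return sorted[Math.max(0, rank - 1)];
}

const latenciesMs = [120, 95, 180, 210, 130, 105, 400, 150, 110, 990];
console.log(percentile(latenciesMs, 50)); // 130
console.log(percentile(latenciesMs, 95)); // 990 — one outlier dominates the tail
```

This is also why averages are useless as SLIs: the mean of this sample hides the 990 ms tail that p95/p99 expose.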
4) Test benches and data
Configuration equivalence: software versions, limits (ulimit, conntrack), JVM/GC config, pools.
Topology: LBs, CDN, WAF, TLS, the same network hops.
Data: realistic distributions (object sizes, "hot"/"cold" keys, regionality).
Cold/warm/hot start: separate runs; be sure to test with cold caches.
Background isolation: disable irrelevant jobs/crons or account for their effect.
5) Scenarios (load profiles)
1. Baseline: step up to the target RPS, hold 10-30 min.
2. Ramp & Hold: smooth growth to X% above target, then hold → tail analysis.
3. Spike: instant ×2-×5 burst for 1-5 minutes, then return.
4. Stress to Failure: step up until things break; record the first SLO-violation point and the breaking point.
5. Soak: 6-24 hours with traffic variability (day/night); watch for leaks/drift.
6. Mixed: a mix of endpoints following the real distribution (Zipf/Pareto), with different weights.
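A ramp-and-hold profile like the ones above can be sketched as a step-schedule builder; this is not any tool's API — each entry would be translated into the generator's executor config (e.g. k6 stages), and the step values are illustrative:

```javascript
// Build a "ramp & hold" step schedule: linear steps up to targetRps,
// then a long hold at the target.
function rampAndHold(startRps, targetRps, steps, holdMinutes) {
  const schedule = [];
  const delta = (targetRps - startRps) / steps;
  for (let i = 1; i <= steps; i++) {
    // short 5-minute holds on the way up (illustrative choice)
    schedule.push({ rps: Math.round(startRps + delta * i), holdMin: 5 });
  }
  schedule.push({ rps: targetRps, holdMin: holdMinutes }); // final hold
  return schedule;
}

console.log(rampAndHold(100, 500, 4, 20));
// steps at 200/300/400/500 RPS, then a 20-minute hold at 500
```

Generating the schedule in code keeps the profile versionable alongside the test scripts.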
6) Step-by-step process
Define SLOs and target traffic profiles.
Select the load model (open/closed); set the arrival rate or the number of VUs.
Prepare data and "hot"/"cold" modes.
Set up telemetry (traces/metrics/logs), correlated with the test run.
Warm up and run; collect artifacts (CPU/heap profiles, flame graphs, DB explain plans/slow logs).
Analyze bottlenecks; form action items.
Re-run after fixes; update the baseline and the capacity playbook.
7) Bottlenecks and typical fixes
CPU-bound service: profiling → eliminate hot functions, allocations, branches; vectorization, cache-friendly data structures.
Network/TLS: keep-alive, HTTP/2/3, connection pooling, correct timeouts, reduced chattiness.
DB: indexes, batching, prepared statements, connection pools, R/W separation, result caching, query deduplication.
Caches: sizing, TTLs, request coalescing, stampede protection, warming, regional shards.
Queues/brokers: admission limits/parallelism, batch sizes, idempotent consumers, DLQ limits.
Garbage collection/pauses: GC tuning, buffer reuse, object pooling within reason.
I/O/disk: asynchronous I/O, response compression at a reasonable compression level.
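For the cache stampede protection mentioned above, request coalescing ("single flight") is a small but effective technique. A sketch, assuming `fetcher` stands in for the expensive backend call:

```javascript
// Request coalescing: concurrent callers for the same key share one
// in-flight fetch instead of issuing N identical backend queries.
const inflight = new Map();

async function coalesced(key, fetcher) {
  if (inflight.has(key)) return inflight.get(key); // join the existing call
  const p = fetcher(key).finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}

// Hypothetical cache-miss storm: 100 concurrent misses → 1 backend call.
let backendCalls = 0;
const fakeFetch = async (k) => { backendCalls++; return `value:${k}`; };
Promise.all(Array.from({ length: 100 }, () => coalesced('hot-key', fakeFetch)))
  .then(() => console.log(backendCalls)); // 1
```

Under load testing this shows up directly: with coalescing, a cold-cache spike produces one DB query per hot key instead of one per request.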
8) Limits and protection
Timeout budgets: propagated top-down to avoid cascades.
Rate limiting/token buckets: predictable degradation instead of a "slow death."
Circuit breakers and shedding of low-priority traffic under saturation.
Backpressure: signals that constrain concurrency deep down the call chain.
Bulkheads: isolated pools for critical endpoints.
Idempotency: keys for safe retries.
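A minimal token-bucket sketch for the rate-limiting point above (capacity and refill rate are illustrative):

```javascript
// Token bucket: requests above the refill rate are rejected immediately
// ("predictable degradation") instead of queueing until they time out.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;         // burst size
    this.tokens = capacity;
    this.refillPerSec = refillPerSec; // sustained rate
    this.last = Date.now();
  }
  tryAcquire() {
    const now = Date.now();
    this.tokens = Math.min(this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false; // shed: the caller returns 429 instead of waiting
  }
}

const bucket = new TokenBucket(5, 10); // burst of 5, 10 req/s sustained
const accepted = Array.from({ length: 8 }, () => bucket.tryAcquire())
  .filter(Boolean).length;
console.log(accepted); // 5 accepted instantly, 3 shed (burst exhausted)
```

Stress tests should verify exactly this behavior: above the limit, latency for accepted requests stays flat while the excess is rejected fast.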
9) Tools and when to choose them
k6 - concise JS, excellent arrival-rate support, integrations and dashboards.
Gatling - Scala DSL, high-performance generator.
JMeter - flexible, rich ecosystem; convenient for protocols/plugins.
Locust - Python scripts, convenient for complex user-flow logic.
Vegeta/hey/wrk - microbenchmarks and targeted HTTP runs.
tc/netem, toxiproxy - network degradation injection.
Flamegraphs/profilers - finding CPU/heap hot spots.
10) Examples (sketches)
k6 (open model, mix endpoints)
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    open_model: {
      executor: 'constant-arrival-rate',
      rate: 800, timeUnit: '1s', duration: '20m',
      preAllocatedVUs: 500, maxVUs: 2000
    }
  },
  thresholds: {
    'http_req_duration{kind:hot}': ['p(95)<200'],
    'http_req_failed': ['rate<0.005']
  }
};

export default function () {
  const r = Math.random();
  let res;
  if (r < 0.6) {
    res = http.get('https://svc/api/hot', { tags: { kind: 'hot' } });
  } else if (r < 0.9) {
    res = http.get('https://svc/api/warm', { tags: { kind: 'warm' } });
  } else {
    res = http.post('https://svc/api/heavy', JSON.stringify({ n: 1000 }),
      { headers: { 'Content-Type': 'application/json' } });
  }
  check(res, { 'status is 2xx': (r) => r.status >= 200 && r.status < 300 });
  sleep(0.2);
}
```
Gatling (steps and spike)
```scala
setUp(
  scn.inject(
    rampUsersPerSec(50) to 500 during (10.minutes),
    constantUsersPerSec(500) during (20.minutes),
    stressPeakUsers(2000) during (30.seconds)
  )
).protocols(http.baseUrl("https://svc"))
```
Load plan (YAML skeleton)
```yaml
profile: "mix-traffic"
targets:
  - { endpoint: "GET /api/hot",    weight: 0.6 }
  - { endpoint: "GET /api/warm",   weight: 0.3 }
  - { endpoint: "POST /api/heavy", weight: 0.1 }
schedule:
  - step: { rps: 300, hold: 10m }
  - step: { rps: 600, hold: 10m }
  - step: { rps: 900, hold: 10m }
guardrails:
  slo:
    p95_ms: 200
    error_rate: 0.5%
  abort_if:
    - metric: error_rate
      op: ">"
      value: 2%
      window: 2m
```
11) Automation and lifecycle
Perf-smoke in each PR (short run of key endpoints).
Nightly "capacity" runs on the stage with reports and profile artifacts.
Gates in CI/CD: fail the build when p95/p99 regresses by more than X% from the baseline or the error rate grows.
Versioning of baselines and storage of profiles/flamegraphs as artifacts.
Coverage tags: which service/endpoint is covered, which traffic profile is used.
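The CI gate described above can be sketched as a simple comparison against the versioned baseline; the metric names and the 10% threshold are assumptions, not a specific CI product's API:

```javascript
// Fail the build when p95 regresses beyond a threshold vs the baseline,
// or when the error rate grows.
function perfGate(baseline, current, maxRegressPct = 10) {
  const failures = [];
  const p95Growth = ((current.p95Ms - baseline.p95Ms) / baseline.p95Ms) * 100;
  if (p95Growth > maxRegressPct) {
    failures.push(`p95 regressed ${p95Growth.toFixed(1)}%`);
  }
  if (current.errorRate > baseline.errorRate) {
    failures.push(`error rate grew to ${current.errorRate}`);
  }
  return { pass: failures.length === 0, failures };
}

// Hypothetical run: p95 went from 180 ms to 210 ms (+16.7%) → gate fails.
console.log(perfGate({ p95Ms: 180, errorRate: 0.002 },
                     { p95Ms: 210, errorRate: 0.002 }));
```

The gate's verdict and the raw numbers should both land in the build artifacts, so regressions are traceable to a specific baseline version.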
12) Anti-patterns
Generator and the service under test on the same machine → distorted results.
Only the closed model (VUs) for API backends → undershooting the target rate and misjudging capacity.
Runs on an empty database/cache without a cold start.
No realistic distributions (all requests identical).
No telemetry (RPS/latency only on the generator side).
Comparisons without stable baselines and environment control.
"Optimization" by raising a timeout instead of fixing the root cause.
13) Architect checklist
1. SLOs and typical/peak load defined?
2. Is the correct model (open/closed) selected and the traffic profile described?
3. Is the test bench equivalent in configuration and topology; are cold/hot modes covered?
4. Telemetry and profiles enabled; is the test run tagged?
5. Runs: baseline/ramp/spike/stress/soak - covered?
6. Are saturation points recorded and a safety margin planned?
7. Limits, breakers, backpressure, idempotency, shedding policies configured?
8. Are there CI gates for p95/p99 regression and error rate, and are baselines versioned?
9. After fixes - re-run and capacity playbook update?
10. Auto-scaling and an emergency plan documented?
Conclusion
Load and stress testing are not one-off "runs" but a continuous engineering practice. A realistic traffic model, equivalent test benches, telemetry, and automation in CI/CD turn performance from "secret magic" into a metric-driven capability: you know where your ceiling is, how much headroom you have, and which changes really improve the user experience.