Load and stress testing
1) Why you need it
Objectives:
- Confirm capacity (how many RPS/concurrent sessions the system can sustain within SLO).
- Find bottlenecks (CPU/IO/DB/network/locks/pools).
- Set up performance budgets and gates in CI/CD.
- Reduce release risk (p95/p99 regressions, error-rate spikes at peak).
- Plan capacity/cost (scale-out and headroom).
2) Types of perf tests
Load: realistic traffic close to peaks; SLO validation.
Stress: push to/above the limit → how the system degrades and where it breaks.
Spike: fast load jump → elasticity/autoscale.
Soak/Endurance: hours/days → leaks, fragmentation, latency drift.
Capacity/Scalability: how throughput/latency changes with scale-out; Amdahl's/Gustafson's laws.
Smoke perf: a short "smoke" run on each release (performance sanity check).
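A hedged k6 sketch of how these types can map onto scenario executors; the scenario names, rates, and durations are illustrative assumptions, and in practice each type usually lives in its own script (or is staggered with startTime):

```js
import http from 'k6/http';

export const options = {
  scenarios: {
    // Short sanity run on every release.
    smoke: { executor: 'constant-vus', vus: 5, duration: '2m' },
    // Steady realistic traffic near the expected peak.
    load: {
      executor: 'constant-arrival-rate',
      rate: 300, timeUnit: '1s', duration: '30m',
      preAllocatedVUs: 300, maxVUs: 1000,
    },
    // Sudden jump to test elasticity/autoscaling.
    spike: {
      executor: 'ramping-arrival-rate',
      startRate: 50, timeUnit: '1s',
      preAllocatedVUs: 100, maxVUs: 2000,
      stages: [
        { target: 1500, duration: '30s' }, // jump
        { target: 1500, duration: '5m' },  // hold at the peak
        { target: 50, duration: '1m' },    // back to baseline
      ],
    },
    // Long run to surface leaks and latency drift.
    soak: {
      executor: 'constant-arrival-rate',
      rate: 200, timeUnit: '1s', duration: '8h',
      preAllocatedVUs: 200, maxVUs: 500,
    },
  },
};

export default function () {
  http.get(`${__ENV.BASE_URL}/api/catalog?limit=20`);
}
```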
3) Traffic generation models
Fixed VUs/concurrency (closed model): 'N' users, each issuing requests in a loop → requests queue up on the client. Risk of hiding overload.
Arrival rate (open model): a stream of requests arriving at rate λ (req/s), as in real life. More correct for public APIs.
Little's Law: 'L = λ × W', where 'λ' is the throughput (arrival rate), 'W' is the average time a request spends in the system, and 'L' is the number of requests in flight.
For a pool/service, minimum parallelism ≈ 'λ × W' (add 20-50% headroom).
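A worked example of this sizing rule with hypothetical numbers, runnable in Node:

```js
// Hypothetical target: arrival rate λ = 500 req/s, average service time W = 80 ms.
const lambda = 500;   // req/s
const W = 0.08;       // seconds a request spends in the service

const inFlight = lambda * W;                 // Little's Law: L = λ × W = 40 requests in flight
const poolSize = Math.ceil(inFlight * 1.3);  // + ~30% headroom → 52 connections/workers

console.log({ inFlight, poolSize });          // { inFlight: 40, poolSize: 52 }
```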
4) Load profiles and scenarios
User journey mix: relative shares of the scenarios (login, browse, deposit, checkout...); see the sketch after this list.
Think-time: user pauses (distributions: exponential/lognormal).
Data profile: size of responses, payload, variability of parameters.
Correlation: link steps (cookies/tokens/ID) as in a real flow.
Cold/warm/hot cache: individual runs.
Read vs Write: balance of reads/writes, idempotency for retries.
Multi-region: RTT, distribution by POP/ASN.
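A minimal k6 sketch of such a mix with exponentially distributed think-time; the endpoints, weights, and the 1.5 s mean pause are assumptions for illustration:

```js
import http from 'k6/http';
import { sleep } from 'k6';

// Exponentially distributed think-time with a given mean (seconds).
function thinkTime(meanSec) {
  return -Math.log(1 - Math.random()) * meanSec;
}

// Journey mix: 60% browse catalog, 30% view product, 10% checkout.
const journeys = [
  { weight: 0.6, run: () => http.get(`${__ENV.BASE_URL}/api/catalog?limit=20`) },
  { weight: 0.3, run: () => http.get(`${__ENV.BASE_URL}/api/product/42`) },
  { weight: 0.1, run: () => http.post(`${__ENV.BASE_URL}/api/checkout`,
      JSON.stringify({ sku: 'A1', qty: 1 }),
      { headers: { 'Content-Type': 'application/json' } }) },
];

export default function () {
  // Pick a journey according to its share of traffic.
  let r = Math.random();
  for (const j of journeys) {
    if ((r -= j.weight) <= 0) { j.run(); break; }
  }
  sleep(thinkTime(1.5)); // ~1.5 s average pause between actions
}
```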
5) Test environment
Isolation: the test environment mirrors production in topology/settings (but do not hammer production itself).
Data: PII masking; volumes and indexes as in production.
Load generators: must not themselves be limited by CPU/network; use distributed runners and time synchronization.
Observability: metrics/traces/logs, synthetics at the perimeter, export of CPU/heap profiles.
6) Metrics and SLI
Throughput: RPS / transactions per second.
Latency: p50/p95/p99, TTFB, server time vs network.
Errors: share of 5xx/4xx/domain errors.
Saturation: CPU, load avg, GC, disk IOps/latency, network, pool wait.
Business SLIs: e.g. a deposit succeeds in ≤ 5 s, an order confirmation in ≤ 2 s.
Take thresholds from the SLO (for example, "99.95% of requests ≤ 300 ms") and monitor the burn rate during the run; a sketch of wiring a business SLI into thresholds follows below.
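One hedged way to express a business SLI of this kind in k6 is a custom Trend metric with its own threshold; the endpoint, payload, and limits below are assumptions:

```js
import http from 'k6/http';
import { Trend } from 'k6/metrics';

// Business SLI: time for a deposit to be confirmed, tracked separately from raw HTTP timings.
const depositDuration = new Trend('deposit_duration', true); // true = values are durations

export const options = {
  thresholds: {
    deposit_duration: ['p(95)<5000'],  // 95% of deposits confirmed within 5 s
    http_req_failed: ['rate<0.005'],   // overall error budget for the run
  },
};

export default function () {
  const started = Date.now();
  const res = http.post(`${__ENV.BASE_URL}/api/deposit`,
    JSON.stringify({ amount: 10 }),
    { headers: { 'Content-Type': 'application/json' } });
  if (res.status === 200) {
    depositDuration.add(Date.now() - started); // record only successful deposits
  }
}
```

The custom metric is evaluated by the same pass/fail mechanism as the built-in HTTP metrics, so the business SLI can gate the run directly.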
7) Finding bottlenecks (technique)
1. Warm up the system at 60-80% of the target load.
2. Increase the load in steps (ramp) → record where p95/p99 and the error rate start to grow.
3. Correlate the degradation with saturation signals:
- queues in pools (DB/HTTP),
- growth of waits/locks (DB),
- GC pauses/heap pressure,
- network retransmits/packet loss,
- disk latency/cache misses.
4. Localize: binary search along the request path, profilers (CPU/alloc/lock profiles); see the tagging sketch after this list.
5. Fix the bottleneck → tune → repeat the run.
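For the localization step, a sketch (assumed endpoints and limits) of tagging each step of the path in k6 so per-step p95 can be compared and the degrading segment isolated:

```js
import http from 'k6/http';

export const options = {
  thresholds: {
    // Sub-metrics by tag show which step of the path degrades first.
    'http_req_duration{step:login}':    ['p(95)<200'],
    'http_req_duration{step:catalog}':  ['p(95)<300'],
    'http_req_duration{step:checkout}': ['p(95)<800'],
  },
};

export default function () {
  http.post(`${__ENV.BASE_URL}/api/login`, JSON.stringify({ user: 'u', pass: 'p' }),
    { headers: { 'Content-Type': 'application/json' }, tags: { step: 'login' } });
  http.get(`${__ENV.BASE_URL}/api/catalog?limit=20`, { tags: { step: 'catalog' } });
  http.post(`${__ENV.BASE_URL}/api/checkout`, JSON.stringify({ sku: 'A1', qty: 1 }),
    { headers: { 'Content-Type': 'application/json' }, tags: { step: 'checkout' } });
}
```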
8) Behavior under stress
Graceful degradation: rate limits, circuit breakers, backpressure queues, "accepted for processing" (async) responses.
Retries: at most 1, only for idempotent operations, with jitter; retry budget ≤ 10% of RPS (see the sketch after this list).
Fail-open/Fail-closed: for non-critical dependencies, allow fail-open (cache/stubs).
Cascading failure: pool/quota isolation (bulkheads), fast timeouts, graceful disabling of features (feature flags).
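A minimal sketch of the retry rule above in plain JavaScript, not tied to any particular library; the function names, the 200 ms jitter cap, and the process-lifetime budget are simplifying assumptions:

```js
// At most one retry, full jitter, and a coarse global budget so retries stay ≤ 10% of traffic.
let attempts = 0;
let retries = 0;

async function callWithRetry(doRequest) {
  attempts += 1;
  try {
    return await doRequest(); // doRequest must be idempotent for a retry to be safe
  } catch (err) {
    const withinBudget = retries / Math.max(attempts, 1) < 0.10;
    if (!withinBudget) throw err;             // budget exhausted: fail fast, no self-DDoS
    retries += 1;
    const jitterMs = Math.random() * 200;     // full jitter before the single retry
    await new Promise((resolve) => setTimeout(resolve, jitterMs));
    return doRequest();
  }
}
```

In a real client the budget would be tracked over a sliding time window rather than the whole process lifetime.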
9) Tools (selection for the task)
k6 (JavaScript, open source, supports the open (arrival-rate) model, fast, convenient in CI).
JMeter (rich in ecosystem, GUI/CLI, plugins, but heavier).
Gatling (Scala DSL, high performance).
Locust (Python, scripting flexibility).
Vegeta/hey/wrk (micro-benches and quick check).
Rule of thumb: one "main" tool + a light CLI tool for smoke runs in PRs.
10) Examples (snippets)
10.1 k6 (open model with arrival rate)
```js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    open_model: {
      executor: 'ramping-arrival-rate',
      startRate: 200, timeUnit: '1s',
      preAllocatedVUs: 200, maxVUs: 2000,
      stages: [
        { target: 500, duration: '5m' }, // ramp to 500 rps
        { target: 800, duration: '5m' }, // stress
        { target: 0, duration: '1m' }
      ]
    }
  },
  thresholds: {
    http_req_duration: ['p(95)<300', 'p(99)<800'],
    http_req_failed: ['rate<0.005'],
  },
};

export default function () {
  const res = http.get(`${__ENV.BASE_URL}/api/catalog?limit=20`);
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(Math.random() * 2); // think-time
}
```
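For example, the script can be run against a staging host with `k6 run -e BASE_URL=https://staging.example.com script.js`; the `-e` flag passes environment variables such as BASE_URL into the test (host and file name are placeholders).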
10.2 JMeter (profile idea)
Thread Group + Stepping Thread Group or Concurrency Thread Group (open-like model).
HTTP Request Defaults, Cookie Manager, CSV Data Set.
Backend Listener → InfluxDB/Grafana; Assertions by time/code.
10.3 Locust (Python)
```python
from locust import HttpUser, task, between

class WebUser(HttpUser):
    wait_time = between(0.2, 2.0)

    @task(5)
    def browse(self):
        self.client.get("/api/catalog?limit=20")

    @task(1)
    def buy(self):
        self.client.post("/api/checkout", json={"sku": "A1", "qty": 1})
```
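A typical headless run looks like `locust -f locustfile.py --headless -u 100 -r 10 --host https://staging.example.com`; the user count, spawn rate, and host here are placeholders.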
11) Data, correlation, preparation
Seed data: catalogs/reference data, users, balances, tokens, as in production.
PII masking/anonymization; generate synthetic data on top of real distributions.
Correlation: Extract IDs/tokens from responses (RegExp/JSONPath) and use in subsequent steps.
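A small k6 sketch of this pattern: extract a token from the login response body and reuse it in the next step (endpoints and field names are assumptions):

```js
import http from 'k6/http';

export default function () {
  const login = http.post(`${__ENV.BASE_URL}/api/login`,
    JSON.stringify({ user: 'demo', pass: 'demo' }),
    { headers: { 'Content-Type': 'application/json' } });

  const token = login.json('token'); // pull the token field out of the JSON response

  http.get(`${__ENV.BASE_URL}/api/account/balance`, {
    headers: { Authorization: `Bearer ${token}` }, // reuse it in the next step
  });
}
```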
12) Observability during runs
RED dashboards (Rate, Errors, Duration) per route.
Exemplars: jump from a metric straight to the corresponding trace (trace_id).
Error logs: sampling + aggregation, duplicates/idempotence.
System: CPU/GC/heap, disks/network, pool wait.
DB: top queries, locks, index scans, bloat.
13) Automation and performance gates
CI: short runs on merge (e.g. a 2-3 minute k6 run) with thresholds; see the gate sketch after this list.
Nightly/Weekly: long soak/stress runs in a separate environment; reports and trends.
Canary releases: SLO analysis (error rate, p95) as the promotion gate.
Regressions: baseline vs current build; alert when degradation exceeds X%.
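In k6 such a gate can be expressed directly as thresholds; adding abortOnFail stops a short CI run as soon as the budget is blown (the limits below are illustrative):

```js
import http from 'k6/http';

export const options = {
  vus: 20,
  duration: '2m', // short merge-time run
  thresholds: {
    http_req_duration: [{ threshold: 'p(95)<300', abortOnFail: true }],
    http_req_failed:   [{ threshold: 'rate<0.005', abortOnFail: true }],
  },
};

export default function () {
  http.get(`${__ENV.BASE_URL}/api/catalog?limit=20`);
}
```

A failed threshold also makes k6 exit with a non-zero status, which is what actually fails the CI job.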
14) Capacity planning and cost
Throughput→latency curves: identify the knee point, after which p99 grows sharply (see the sketch after this list).
Scale-out: measure scaling efficiency (ΔRPS / Δnodes).
Cost: "RPS per $/hour"; headroom for peak events + a DR reserve.
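A toy sketch of flagging the knee from ramp measurements: take per-plateau (RPS, p99) samples and report the last point before latency growth far outpaces throughput growth; the data and the heuristic constants are made up for illustration:

```js
// Each sample: sustained RPS at a plateau and the p99 latency (ms) observed there.
const samples = [
  { rps: 200, p99: 180 },
  { rps: 400, p99: 210 },
  { rps: 600, p99: 260 },
  { rps: 800, p99: 600 },   // latency starts to blow up here
  { rps: 900, p99: 1400 },
];

function findKnee(points) {
  for (let i = 1; i < points.length; i++) {
    const latencyGrowth = points[i].p99 / points[i - 1].p99;    // e.g. 2.3x
    const throughputGrowth = points[i].rps / points[i - 1].rps; // e.g. 1.33x
    // Crude heuristic: latency grows much faster than throughput → the curve has bent.
    if (latencyGrowth > 2 && throughputGrowth < 1.5) {
      return points[i - 1]; // last "healthy" point before the knee
    }
  }
  return null;
}

console.log(findKnee(samples)); // → { rps: 600, p99: 260 }
```

In practice the knee is usually read off the plotted curve; a script like this only automates flagging it in trend reports.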
15) Anti-patterns
Hammering production without controls, or testing an "empty" environment that does not resemble production.
A closed model with fixed VUs that hides overload.
Missing think-time/data variety → unrealistically high cache hit rates, or the opposite: a thundering herd against the origin.
A single "/ping" script instead of real user flows.
Lack of observability: "we only see RPS and average latency."
Uncontrolled retries → self-inflicted DDoS.
Mixing testing and optimization without recording hypotheses/changes.
16) Checklist (0-30 days)
0-7 days
Define SLI/SLO and target traffic profiles (mix, think-time, data).
Select the tool (k6/JMeter/Locust), raise the RED dashboards.
Prepare the environment and seed data; disable third-party rate limits/captchas.
8-20 days
Build scenarios: open-model (arrival rate), cold/warm/hot cache.
Run load → stress → spike; record the knee point and bottlenecks.
Implement performance gates in CI (micro-run).
21-30 days
Soak test for 4-24 h: memory leaks/GC drift, stabilization.
Document limits, the capacity plan, and "RPS→p95/errors" charts.
Prepare runbook "how to increase limits/scale" and "how to degrade."
17) Maturity metrics
Realistic profiles (mix, think-time, data) exist and cover ≥ 80% of traffic.
RED dashboards + tracing are connected for all tests.
Performance gates block releases on p95/error regressions.
Capacity and knee points are documented for key services.
Monthly soak/stress runs and progress reports.
Spike resilience is confirmed by autoscaling and the absence of cascading failures.
18) Conclusion
Load testing is a regular engineering practice, not a one-time "measurement." Model real users (open model), measure what reflects the client's experience (SLI/SLO), keep observability and gates in CI/CD, run stress/spike/soak tests, and record knee points. Then peak events and black swans become manageable scenarios, and performance becomes a predictable, measurable property of your platform.