Load balancing in operations
1) Why the operations team needs to manage balancing
Load balancing is not just query distribution. It is a layer of risk and performance management: it limits the blast radius of failures, keeps latency predictable, delivers economies of scale, isolates "noisy neighbors," and directly affects SLO attainment and the cost of incidents.
2) Balancing Layers: Network to Business Operations
L3/L4 (IP/port): simple and fast (DSR, ECMP, IPVS, LVS). Ideal for TCP/UDP services, brokers, and gateways.
L7 (HTTP/gRPC/WebSocket): path/header/metadata routing; canary, A/B, geo and client-aware policy.
GSLB/GeoDNS/Anycast: global distribution by region/DC, accounting for latency, proximity, and regional health.
Intra-service balancing: clients with service discovery (xDS, Consul, Eureka), client balancers (gRPC pick_first/round_robin), service mesh.
3) Distribution algorithms and when to apply them
Round-Robin (RR): Simple base case for homogeneous nodes and short queries.
Least Connections (LC): better for different query durations.
Least Request/Peak EWMA: adaptively reduces latency under "long" requests and noisy nodes.
Weighted RR/LC: accounts for node capacity or cost guardrails.
Consistent Hashing (Rendezvous/Maglev): for sticky keys (user, table/room, cart); minimizes re-routing when scaling.
Power of Two Choices: a good LC approximation under high load with less telemetry.
Hedged/Budgeted-Retry Requests: parallel catch-up requests with a retry budget to trim p99.
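As an illustration of the consistent-hashing idea above, here is a minimal rendezvous (highest-random-weight) sketch in Python; node names and key formats are invented for the example:

```python
import hashlib

def rendezvous_pick(key: str, nodes: list[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: every node scores the key
    and the highest score wins. Removing a node remaps only its own keys."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

# The same key always lands on the same node while membership is unchanged.
owner = rendezvous_pick("table:42", ["app-1", "app-2", "app-3"])
```

When a node is drained, only the keys it owned move; every other key keeps its sticky backend, which is the property that makes this family of algorithms suitable for stateful routing.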
4) Sessions, state, and stickiness
Sticky sessions (cookie/IP/identifier): when a cache is warmed locally or there is stateful context (for example, a live table in iGaming).
Cons: hotspot effect; nodes become harder to evacuate.
Mitigations: short-TTL stickiness, moving state to external stores (Redis, a session store), shared-nothing and event sourcing where possible.
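A hedged sketch of short-TTL stickiness (class and method names, and the pick() fallback, are illustrative): pins expire on their own, so draining a node only has to wait out the TTL instead of fighting permanent affinity.

```python
import time

class StickyTable:
    """Session -> backend pinning with a short TTL so nodes stay evacuable."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._pins = {}  # session_id -> (backend, expires_at)

    def backend_for(self, session_id: str, pick) -> str:
        now = time.monotonic()
        pin = self._pins.get(session_id)
        if pin and pin[1] > now:
            return pin[0]                 # still pinned to the same backend
        backend = pick()                  # fall back to the normal balancing policy
        self._pins[session_id] = (backend, now + self.ttl)
        return backend

    def evict_backend(self, backend: str) -> None:
        """On drain: drop that node's pins so its sessions rebalance elsewhere."""
        self._pins = {s: p for s, p in self._pins.items() if p[0] != backend}
```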
5) Health-checks and protection against flapping
L7 content checks (assert on body/headers) instead of treating any 200 as success.
Combined probes: TCP + HTTP + an internal '/ready' endpoint, with different timeouts.
Debouncing: n consecutive failures → ejection from the pool; m consecutive successes → readmission.
Outlier detection: automatic ejection of nodes with high error rate or latency.
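The debouncing rule above (n failures out, m successes back) can be sketched as a tiny state machine; the thresholds are illustrative:

```python
class HealthGate:
    """Flap protection: eject after n consecutive failures,
    readmit only after m consecutive successes."""
    def __init__(self, fail_threshold: int, ok_threshold: int):
        self.n, self.m = fail_threshold, ok_threshold
        self.fails = self.oks = 0
        self.healthy = True

    def observe(self, ok: bool) -> bool:
        """Feed one probe result; returns the debounced health state."""
        if ok:
            self.oks += 1
            self.fails = 0
            if not self.healthy and self.oks >= self.m:
                self.healthy = True   # readmit to the pool
        else:
            self.fails += 1
            self.oks = 0
            if self.healthy and self.fails >= self.n:
                self.healthy = False  # eject from the pool
        return self.healthy
```

A single failed probe does not eject a node, and a single lucky success does not readmit it, which is what prevents flapping.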
6) Timeout, Retry, and Backpressure Policies
Budgeted retries: cap the total user-facing time (for example, an 800 ms SLA → two retriable attempts of 200 ms each, plus margin).
Circuit Breakers: limit simultaneous requests/connections/errors.
Quotas/Rate Limits: default "per-tenant/per-IP/per-key" limits at the very edge.
Server-side queueing: keep queues short, or fail fast with explicit degradation, so the latency tail does not balloon.
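A minimal sketch of a budgeted retry, assuming the caller passes an attempt function that honors a timeout parameter: retries stop as soon as another full attempt would no longer fit inside the overall SLA budget.

```python
import time

def call_with_budget(attempt, total_budget_s: float, per_try_s: float,
                     max_tries: int = 3):
    """Retry only while the overall time budget allows another full attempt."""
    deadline = time.monotonic() + total_budget_s
    last_exc = None
    for _ in range(max_tries):
        if time.monotonic() + per_try_s > deadline:
            break  # no room for another attempt: fail fast, keep the SLA
        try:
            return attempt(timeout=per_try_s)
        except TimeoutError as exc:
            last_exc = exc
    raise last_exc or TimeoutError("budget exhausted before first attempt")
```

Pairing this with idempotency keys is what keeps retries from turning into the "retry storm" anti-pattern listed later.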
7) Global balancing and fault tolerance
Geo-routing: by latency, client region, and health.
Anycast + health probes: fast route convergence when a PoP goes down.
Failover hierarchy: DC → region → cloud; cold/warm/hot DR.
Traffic partitioning: product/regulatory isolation (countries, payment providers, VIP segments).
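The failover hierarchy can be expressed as an ordered walk over tiers; the tier names are illustrative, and a None result would signal full DR activation:

```python
def failover_pick(tiers, healthy):
    """Walk the failover hierarchy (e.g. DC -> region -> cloud) and return
    the first healthy target; None means total outage, escalate to DR."""
    for tier in tiers:
        for target in tier:
            if target in healthy:
                return target
    return None
```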
8) Balancing for threads and real time
WebSocket/SSE/gRPC streams: long-lived connections → track connections per node; redistribute on scale-out.
Stickiness by user or by room/table via consistent hashing.
Drain/PreStop hooks: gracefully evict connections during releases and autoscaling.
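A hedged sketch of a PreStop-style drain for long-lived connections: ask every client to reconnect elsewhere (e.g. an HTTP/2 GOAWAY or a WebSocket close frame with a reconnect hint), then wait out a grace period; the notify callback and the connection container are assumptions of this sketch.

```python
import time

def drain_connections(connections, notify, grace_s: float, poll_s: float = 0.05):
    """Ask clients to reconnect, then wait until connections close or the
    grace period expires. Returns the count left to force-close."""
    for conn in list(connections):
        notify(conn)  # e.g. send GOAWAY / close frame with a reconnect hint
    deadline = time.monotonic() + grace_s
    while connections and time.monotonic() < deadline:
        time.sleep(poll_s)
    return len(connections)
```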
9) Security on the perimeter
TLS termination, HSTS, ALPN; mTLS for east-west.
WAF/bot management at the application balancer.
DDoS protection: rate limits, challenge/proof-of-work, upstream scrubbing.
Policies as code (OPA/Kyverno/Envoy RBAC).
10) Observability and SLO for balancing
SLI: successful requests, errors/sec, p50/p95/p99 latency, saturation (CPU/connections/epoll).
Per-backend metrics: request rate, error rate, EWMA latency → inputs to the balancing algorithms.
L7 logs: correlate with releases (annotations), feature flags, canaries.
Alerts: on the error-budget burn rate and on client-visible symptoms (external synthetics).
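Burn-rate alerting over the SLIs above can be computed directly; the 14.4 factor is a commonly used multi-window paging threshold for a 99.9% SLO, taken here as an illustrative default:

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than allowed the error budget is burning.
    slo_target=0.999 permits a 0.1% error rate; burn_rate 1.0 = on budget."""
    budget = 1.0 - slo_target
    return error_rate / budget

def should_page(err_1h: float, err_5m: float, slo: float,
                factor: float = 14.4) -> bool:
    """Multi-window rule: page only if both the long and the short window
    burn fast, so brief blips don't wake anyone up."""
    return burn_rate(err_1h, slo) >= factor and burn_rate(err_5m, slo) >= factor
```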
11) Auto-scaling and cost-efficiency
HPA/VPA/KEDA: scaling by RPS, queues, user metrics.
Weighted-routing by cost: Cheaper regions/clouds get more weight under normal load.
Warm pools: pre-warmed instances so traffic never hits a cold start.
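Cost-aware weighted routing reduces to weighted random choice; the weights here are illustrative "shares," not a real pricing model:

```python
import random

def weighted_pick(backends: dict, rng=random) -> str:
    """Weighted random routing: cheaper regions/clouds carry more weight."""
    total = sum(backends.values())
    r = rng.uniform(0, total)
    acc = 0.0
    for name, weight in backends.items():
        acc += weight
        if r <= acc:
            return name
    return name  # guard against floating-point edge at the top of the range
```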
12) Change Management: canary, shadow, blue-green
Canary routing: 1% → 5% → 25% with auto-stop on SLO degradation.
Shadow traffic: duplicate requests to the new version without returning its responses to the client (for validation).
Blue-green: instant switch of the VIP/routing table; fast rollback.
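The canary ladder with auto-stop fits in a few lines; the step sequence and the "roll back to 0 on degradation" policy are assumptions of this sketch:

```python
def next_canary_step(current_pct: int, slo_ok: bool,
                     steps=(1, 5, 25, 100)) -> int:
    """Advance the canary traffic share only while the SLO holds;
    on degradation drop to 0 (full rollback to the stable version)."""
    if not slo_ok:
        return 0
    for step in steps:
        if step > current_pct:
            return step
    return current_pct  # already at full rollout
```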
13) Configuration and GitOps
A single source of truth: routes, weights, timeout and limit policies live in the repository.
Configuration promotion across environments (dev→stage→prod) through the same pipeline.
Validation and configuration tests: linters, dry runs, traffic-map simulation.
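A dry-run validator of the kind a GitOps pipeline might run before promotion; all field names (host, weights, timeout_ms) are invented for this sketch:

```python
def validate_routes(routes: list) -> list:
    """Pre-merge lint for a routing config; returns a list of errors
    (empty list means the config may be promoted)."""
    errors = []
    for i, route in enumerate(routes):
        if not route.get("host"):
            errors.append(f"route[{i}]: missing host")
        weights = route.get("weights", {})
        if weights and abs(sum(weights.values()) - 100) > 1e-9:
            errors.append(f"route[{i}]: weights must sum to 100")
        if route.get("timeout_ms", 1) <= 0:
            errors.append(f"route[{i}]: timeout must be positive")
    return errors
```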
14) Special cases (regulated domains)
Payment/CCS providers: parallel channels, switching on quality/response time; per-provider SLOs.
Multi-jurisdiction: geo-routing, content and limit policies per country.
VIP segments: dedicated weights/channels, elevated SLOs, explicit UX-degradation knobs.
15) Anti-patterns
A single balancer as a single point of failure.
Stickiness by IP behind NAT: "glued-together" clients and traffic skew.
Plain RR for heavy/long requests: the p99 tail grows.
Retries without a budget and without idempotency: a retry storm.
TCP-only health checks: "green" while the application is broken.
"Eternal" sticky sessions without TTL: nodes cannot be evacuated.
Configs edited by hand, without review or promotion: drift and incidents.
16) Implementation checklist
- Selected level: L4/L7/GSLB, defined goals and responsibilities.
- The distribution algorithm matches the load profile (EWMA/LC/Hash).
- Consistent hashing where stateful context is needed.
- Combined health checks, outlier ejection, debouncing.
- Timeouts/retries/limits as code, with time budgets.
- Per-backend observability and client synthetics; burn-rate alerts.
- Canary/blue-green + shadow traffic; quick rollback.
- GitOps for configs; dry-run and route tests.
- DR plan and failover hierarchy (DC → region → cloud).
- Isolation of VIP/regulatory cohorts and providers.
17) Example of architectural flow
1. GSLB (latency-based) directs the client to the nearest healthy region.
2. Edge/L7 balancer applies WAF, TLS, rate-limits, 5% canary.
3. The service mesh distributes to pods with LC + EWMA, excluding outliers.
4. For real-time tables: consistent hashing by 'table_id', sticky TTL of 10 minutes.
5. HPA scales frontends by RPS and queue depth; a warm pool avoids cold starts.
6. Observability: dashboard p50/p95/p99, error-rate, saturations, burn-rate.
7. On degradation: auto-eject nodes, shrink the canary, switch to a backup provider, roll back the version.
18) The bottom line
Load balancing is an operational discipline that connects the network, applications, data, and business SLOs. A properly chosen level (L4/L7/GSLB), adequate algorithms, strict health checks, timeout and retry policies, observability, and GitOps-managed configuration turn balancing from a "box with settings" into a mechanism for reliable and economical service delivery.