Load balancing in operations
1) Why the operations team needs to manage balancing
Load balancing is not just query distribution. It is a layer of risk and performance management: it limits the blast radius of failures, keeps latency predictable, delivers economies of scale, isolates "noisy neighbors," and directly affects SLO attainment and the cost of incidents.
2) Balancing Layers: Network to Business Operations
L3/L4 (IP/port): simple and fast (DSR, ECMP, IPVS, LVS). Ideal for TCP/UDP services, brokers, and gateways.
L7 (HTTP/gRPC/WebSocket): path/header/metadata routing; canary, A/B, geo and client-aware policy.
GSLB/GeoDNS/Anycast: global distribution by region/DC, accounting for latency, proximity, and regional health.
Intra-service balancing: clients with service discovery (xDS, Consul, Eureka), client balancers (gRPC pick_first/round_robin), service mesh.
3) Distribution algorithms and when to apply them
Round-Robin (RR): Simple base case for homogeneous nodes and short queries.
Least Connections (LC): better for different query durations.
Least Request/Peak EWMA: adaptively reduces latency under "long" requests and noisy nodes.
Weighted RR/LC: accounts for node capacity or cost guardrails.
Consistent Hashing (Rendezvous/Maglev): for sticky keys (user, table/room, cart); minimizes re-routing when scaling.
Power of Two Choices: a good LC approximation under high load with less telemetry.
Hedged/Budgeted-Retry Requests: parallel catch-up requests with a retry budget to trim p99.
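As an illustration of the consistent-hashing idea above, here is a minimal rendezvous (highest-random-weight) sketch in Python; node names and key formats are invented for the example:

```python
import hashlib

def rendezvous_pick(key: str, nodes: list[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: every node scores the key
    and the highest score wins. Removing a node remaps only its own keys."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

# The same key always lands on the same node while membership is unchanged.
owner = rendezvous_pick("table:42", ["app-1", "app-2", "app-3"])
```

When a node is drained, only the keys it owned move; every other key keeps its sticky backend, which is the property that makes this family of algorithms suitable for stateful routing.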
4) Sessions, state, and stickiness
Sticky sessions (cookie/IP/identifier): when a cache is warmed locally or there is stateful context (for example, a live table in iGaming).
Cons: hotspot effect; nodes become harder to evacuate.
Mitigations: short-TTL stickiness, moving state to external stores (Redis, a session store), shared-nothing and event sourcing where possible.
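A hedged sketch of short-TTL stickiness (class and method names, and the pick() fallback, are illustrative): pins expire on their own, so draining a node only has to wait out the TTL instead of fighting permanent affinity.

```python
import time

class StickyTable:
    """Session -> backend pinning with a short TTL so nodes stay evacuable."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._pins = {}  # session_id -> (backend, expires_at)

    def backend_for(self, session_id: str, pick) -> str:
        now = time.monotonic()
        pin = self._pins.get(session_id)
        if pin and pin[1] > now:
            return pin[0]                 # still pinned to the same backend
        backend = pick()                  # fall back to the normal balancing policy
        self._pins[session_id] = (backend, now + self.ttl)
        return backend

    def evict_backend(self, backend: str) -> None:
        """On drain: drop that node's pins so its sessions rebalance elsewhere."""
        self._pins = {s: p for s, p in self._pins.items() if p[0] != backend}
```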
5) Health-checks and protection against flapping
L7 content checks (assert on body/headers) instead of treating any 200 as success.
Combined probes: TCP + HTTP + an internal '/ready' endpoint, with different timeouts.
Debouncing: n consecutive failures → ejection from the pool; m consecutive successes → readmission.
Outlier detection: automatic ejection of nodes with high error rate or latency.
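The debouncing rule above (n failures out, m successes back) can be sketched as a tiny state machine; the thresholds are illustrative:

```python
class HealthGate:
    """Flap protection: eject after n consecutive failures,
    readmit only after m consecutive successes."""
    def __init__(self, fail_threshold: int, ok_threshold: int):
        self.n, self.m = fail_threshold, ok_threshold
        self.fails = self.oks = 0
        self.healthy = True

    def observe(self, ok: bool) -> bool:
        """Feed one probe result; returns the debounced health state."""
        if ok:
            self.oks += 1
            self.fails = 0
            if not self.healthy and self.oks >= self.m:
                self.healthy = True   # readmit to the pool
        else:
            self.fails += 1
            self.oks = 0
            if self.healthy and self.fails >= self.n:
                self.healthy = False  # eject from the pool
        return self.healthy
```

A single failed probe does not eject a node, and a single lucky success does not readmit it, which is what prevents flapping.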
6) Timeout, Retry, and Backpressure Policies
Budgeted retries: cap the total user-facing time (for example, an 800 ms SLA → two retriable attempts of 200 ms each, plus margin).
Circuit Breakers: limit simultaneous requests/connections/errors.
Quotas/Rate Limits: default "per-tenant/per-IP/per-key" limits at the very edge.
Server-side queueing: keep queues short, or fail fast with explicit degradation, so the latency tail does not balloon.
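A minimal sketch of a budgeted retry, assuming the caller passes an attempt function that honors a timeout parameter: retries stop as soon as another full attempt would no longer fit inside the overall SLA budget.

```python
import time

def call_with_budget(attempt, total_budget_s: float, per_try_s: float,
                     max_tries: int = 3):
    """Retry only while the overall time budget allows another full attempt."""
    deadline = time.monotonic() + total_budget_s
    last_exc = None
    for _ in range(max_tries):
        if time.monotonic() + per_try_s > deadline:
            break  # no room for another attempt: fail fast, keep the SLA
        try:
            return attempt(timeout=per_try_s)
        except TimeoutError as exc:
            last_exc = exc
    raise last_exc or TimeoutError("budget exhausted before first attempt")
```

Pairing this with idempotency keys is what keeps retries from turning into the "retry storm" anti-pattern listed later.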
7) Global balancing and fault tolerance
Geo-routing: by latency, client region, and health.
Anycast + health probes: fast route convergence when a PoP goes down.
Failover hierarchy: DC → region → cloud; cold/warm/hot DR.
Traffic partitioning: product/regulatory isolation (countries, payment providers, VIP segments).
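The failover hierarchy can be expressed as an ordered walk over tiers; the tier names are illustrative, and a None result would signal full DR activation:

```python
def failover_pick(tiers, healthy):
    """Walk the failover hierarchy (e.g. DC -> region -> cloud) and return
    the first healthy target; None means total outage, escalate to DR."""
    for tier in tiers:
        for target in tier:
            if target in healthy:
                return target
    return None
```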
8) Balancing for threads and real time
WebSocket/SSE/gRPC streams: long-lived connections → track connections per node; redistribute on scale-out.
Stickiness by user or by room/table via consistent hashing.
Drain/PreStop hooks: gracefully evict connections during releases and autoscaling.
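A hedged sketch of a PreStop-style drain for long-lived connections: ask every client to reconnect elsewhere (e.g. an HTTP/2 GOAWAY or a WebSocket close frame with a reconnect hint), then wait out a grace period; the notify callback and the connection container are assumptions of this sketch.

```python
import time

def drain_connections(connections, notify, grace_s: float, poll_s: float = 0.05):
    """Ask clients to reconnect, then wait until connections close or the
    grace period expires. Returns the count left to force-close."""
    for conn in list(connections):
        notify(conn)  # e.g. send GOAWAY / close frame with a reconnect hint
    deadline = time.monotonic() + grace_s
    while connections and time.monotonic() < deadline:
        time.sleep(poll_s)
    return len(connections)
```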
9) Security on the perimeter
TLS termination, HSTS, ALPN; mTLS for east-west.
WAF/bot management at the application balancer.
DDoS protection: rate limits, challenge/proof-of-work, upstream scrubbing.
Policies as code (OPA/Kyverno/Envoy RBAC).
10) Observability and SLO for balancing
SLI: successful requests, errors/sec, p50/p95/p99 latency, saturation (CPU/connections/epoll).
Per-backend metrics: request rate, error rate, EWMA latency → inputs to the balancing algorithms.
L7 logs: correlate with releases (annotations), feature flags, canaries.
Alerts: on the error-budget burn rate and on client-visible symptoms (external synthetics).
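Burn-rate alerting over the SLIs above can be computed directly; the 14.4 factor is a commonly used multi-window paging threshold for a 99.9% SLO, taken here as an illustrative default:

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than allowed the error budget is burning.
    slo_target=0.999 permits a 0.1% error rate; burn_rate 1.0 = on budget."""
    budget = 1.0 - slo_target
    return error_rate / budget

def should_page(err_1h: float, err_5m: float, slo: float,
                factor: float = 14.4) -> bool:
    """Multi-window rule: page only if both the long and the short window
    burn fast, so brief blips don't wake anyone up."""
    return burn_rate(err_1h, slo) >= factor and burn_rate(err_5m, slo) >= factor
```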
11) Auto-scaling and cost-efficiency
HPA/VPA/KEDA: scaling by RPS, queues, user metrics.
Weighted-routing by cost: Cheaper regions/clouds get more weight under normal load.
Warm pools: pre-warmed instances so traffic never hits a cold start.
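Cost-aware weighted routing reduces to weighted random choice; the weights here are illustrative "shares," not a real pricing model:

```python
import random

def weighted_pick(backends: dict, rng=random) -> str:
    """Weighted random routing: cheaper regions/clouds carry more weight."""
    total = sum(backends.values())
    r = rng.uniform(0, total)
    acc = 0.0
    for name, weight in backends.items():
        acc += weight
        if r <= acc:
            return name
    return name  # guard against floating-point edge at the top of the range
```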
12) Change Management: canary, shadow, blue-green
Canary routing: 1% → 5% → 25% with auto-stop on SLO degradation.
Shadow traffic: duplicate requests to the new version without returning its responses to the client (for validation).
Blue-green: instant switch of the VIP/routing table; fast rollback.
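The canary ladder with auto-stop fits in a few lines; the step sequence and the "roll back to 0 on degradation" policy are assumptions of this sketch:

```python
def next_canary_step(current_pct: int, slo_ok: bool,
                     steps=(1, 5, 25, 100)) -> int:
    """Advance the canary traffic share only while the SLO holds;
    on degradation drop to 0 (full rollback to the stable version)."""
    if not slo_ok:
        return 0
    for step in steps:
        if step > current_pct:
            return step
    return current_pct  # already at full rollout
```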
13) Configuration and GitOps
A single source of truth: routes, weights, timeout and limit policies live in the repository.
Configuration promotion across environments (dev→stage→prod) through the same pipeline.
Validation and configuration tests: linters, dry runs, traffic-map simulation.
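A dry-run validator of the kind a GitOps pipeline might run before promotion; all field names (host, weights, timeout_ms) are invented for this sketch:

```python
def validate_routes(routes: list) -> list:
    """Pre-merge lint for a routing config; returns a list of errors
    (empty list means the config may be promoted)."""
    errors = []
    for i, route in enumerate(routes):
        if not route.get("host"):
            errors.append(f"route[{i}]: missing host")
        weights = route.get("weights", {})
        if weights and abs(sum(weights.values()) - 100) > 1e-9:
            errors.append(f"route[{i}]: weights must sum to 100")
        if route.get("timeout_ms", 1) <= 0:
            errors.append(f"route[{i}]: timeout must be positive")
    return errors
```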
14) Special cases (regulated domains)
Payment/CCS providers: parallel channels, switching on quality/response time; per-provider SLOs.
Multi-jurisdiction: geo-routing, content and limit policies per country.
VIP segments: dedicated weights/channels, elevated SLOs, explicit UX-degradation knobs.
15) Anti-patterns
A single balancer as a single point of failure.
Stickiness by IP behind NAT: "glued-together" clients and traffic skew.
Plain RR for heavy/long requests: the p99 tail grows.
Retries without a budget and without idempotency: a retry storm.
TCP-only health checks: "green" while the application is broken.
"Eternal" sticky sessions without TTL: nodes cannot be evacuated.
Configs edited by hand, without review or promotion: drift and incidents.
16) Implementation checklist
- Selected level: L4/L7/GSLB, defined goals and responsibilities.
- The distribution algorithm matches the load profile (EWMA/LC/Hash).
- Consistent hashing where stateful context is needed.
- Combined health checks, outlier ejection, debouncing.
- Timeouts/retries/limits as code, with time budgets.
- Per-backend observability and client synthetics; burn-rate alerts.
- Canary/blue-green + shadow traffic; quick rollback.
- GitOps for configs; dry-run and route tests.
- DR plan and failover hierarchy (DC → region → cloud).
- Isolation of VIP/regulatory cohorts and providers.
17) Example of architectural flow
1. GSLB (latency-based) directs the client to the nearest healthy region.
2. Edge/L7 balancer applies WAF, TLS, rate-limits, 5% canary.
3. The service mesh distributes to pods with LC + EWMA, excluding outliers.
4. For real-time tables: consistent hashing by 'table_id', sticky TTL of 10 minutes.
5. HPA scales frontends by RPS and queue depth; a warm pool avoids cold starts.
6. Observability: dashboard p50/p95/p99, error-rate, saturations, burn-rate.
7. On degradation: auto-eject nodes, shrink the canary, switch to a backup provider, roll back the version.
18) The bottom line
Load balancing is an operational discipline that connects the network, applications, data, and business SLOs. A properly chosen level (L4/L7/GSLB), adequate algorithms, strict health checks, timeout and retry policies, observability, and GitOps-managed configuration turn balancing from a "box with settings" into a mechanism for reliable and economical service delivery.