Load balancing

1) Why it exists and where it sits in the architecture

The balancer is a "turnstile" between the client and the backend fleet. Its objectives are:
  • availability (no single point of failure);
  • latency (lower p95);
  • horizontal scale;
  • security (TLS termination, WAF);
  • release manageability (canary/blue-green).
Layers where it applies:
  • Edge/Global: Anycast, GSLB/GeoDNS, CDN/edge LB, DDoS protection.
  • L4 (TCP/UDP): NLB, Maglev, proxying without TLS termination.
  • L7 (HTTP/2, gRPC, WebSocket, QUIC): routing by path and headers, caching/compression/retries.
  • Data tier: DB proxies (PgBouncer/ProxySQL), Redis Cluster/consistent hashing, Kafka partitioning.

2) Balancing models and algorithms

Round-Robin (RR): simple, uniform distribution.
Least Connections (LC): good for long-lived connections (WS, gRPC).
Least Request/Power-of-Two (P2C): compares two random backends; a good speed/quality trade-off.
Weighted RR/LC: weights for canary or "hot" nodes.
Consistent Hashing (CH): session stickiness without a lookup table (cart, Redis).
Maglev/flow-hash: fast L3/L4 distribution, resistant to flapping.
Latency-aware: selection based on sliding-window p50/p95.
EWMA: accounts for latency history via an exponentially weighted moving average.

Recommendation: default to P2C (least-request) at L7; for stateful services/caches, consistent hashing; for WS/gRPC, least-connections. A minimal sketch of P2C follows.
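A minimal sketch of the P2C idea in Python: pick two backends at random and dispatch to the one with fewer in-flight requests. The names (`Backend`, `in_flight`) are illustrative and not tied to any particular proxy.

```python
import random
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    in_flight: int = 0   # current outstanding requests

def pick_p2c(backends: list[Backend]) -> Backend:
    """Power-of-two-choices: compare two random backends, take the less loaded one."""
    a, b = random.sample(backends, 2)
    return a if a.in_flight <= b.in_flight else b

pool = [Backend("api-1"), Backend("api-2"), Backend("api-3")]
chosen = pick_p2c(pool)
chosen.in_flight += 1   # increment on dispatch, decrement when the response completes
print(chosen.name)
```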

3) Upstream health: checks and "evictions"

Health checks: TCP, HTTP 200 or body match, gRPC health status; intervals, timeouts, failure thresholds.
Outlier ejection: automatic removal of "noisy" instances (consecutive 5xx, success-rate ejection).
Slow-start & warmup: gentle ramp-up of new instances (gradually increasing weight).
Connection draining: on shutdown or restart, let active connections finish without being cut.
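A toy illustration of outlier ejection plus slow-start weighting, assuming a simple consecutive-5xx counter and a linear warm-up window; the constants are illustrative, and real proxies (Envoy, HAProxy) implement this internally.

```python
import time

EJECT_AFTER_5XX = 5        # consecutive 5xx responses before ejection
EJECTION_SECONDS = 30.0    # how long an ejected instance stays out of rotation
SLOW_START_SECONDS = 10.0  # warm-up window for freshly added instances

class Upstream:
    def __init__(self, name: str):
        self.name = name
        self.consecutive_5xx = 0
        self.ejected_until = 0.0
        self.started_at = time.monotonic()

    def record_response(self, status: int) -> None:
        if status >= 500:
            self.consecutive_5xx += 1
            if self.consecutive_5xx >= EJECT_AFTER_5XX:
                self.ejected_until = time.monotonic() + EJECTION_SECONDS
        else:
            self.consecutive_5xx = 0

    def available(self) -> bool:
        return time.monotonic() >= self.ejected_until

    def effective_weight(self, base_weight: float = 1.0) -> float:
        """Linear slow-start: weight grows from ~0 to base_weight over the warm-up window."""
        age = time.monotonic() - self.started_at
        return base_weight * min(1.0, age / SLOW_START_SECONDS)
```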

4) Sessions and stickiness

Cookie-stickiness (L7): `Set-Cookie: lb=<id>; SameSite; Secure`.
CH by key: `hash(userId / sessionId / cartId)`.
IP hash: only in closed networks (NAT breaks it).
Stickiness TTL + fallback when a node is ejected.
Important: minimize the need for stickiness → store the state outside the instance (Redis/DB/JWT).
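To make "CH by key" concrete, here is a small consistent-hash ring with virtual nodes (a hypothetical helper, not part of any specific proxy): keys such as userId or cartId keep landing on the same node until the ring membership changes.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        # each node appears vnodes times on the ring to smooth the distribution
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._points = [p for p, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
print(ring.node_for("user:42"), ring.node_for("cart:42"))
```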

5) Global Balancing (GTM/GSLB)

Anycast + health probes: one IP, traffic goes to the nearest PoP; automatic failover.
GeoDNS/latency-based DNS: responses chosen by geography or measured latency.
Regional clusters: "resident" data stays in its region (GDPR); cross-region failover with replication.
Policies: geo-blocking, pinning an account/token to its region.

6) Protocols and features

HTTP/2: multiplexing and priorities; requires a well-tuned upstream connection pool.
gRPC: long-lived streams → least-connections, aggressive health checks.
WebSocket/SSE: stickiness per connection, long idle timeouts, TCP keep-alive.
QUIC/HTTP/3: fast start, loss resilience; watch MTU/path MTU discovery.
TLS termination/mTLS: terminate at the edge/L7 LB; use mTLS and workload identity (SPIFFE) internally.

7) Overload control

Rate limit: per IP, per key, per route; burst + sustained rate.
Adaptive concurrency (Envoy): dynamic limit on concurrent requests.
Queue/surge buffer: bounded queue size with fair rejection (503).
Hedging/parallel racing: duplicate slow requests (idempotent only).
Timeout budget: separate connect/read/write timeouts.
Backpressure: `503 + Retry-After`; clients retry with exponential backoff and jitter.
Slow-loris protection: read/write timeouts, minimum transfer rate.
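A minimal token-bucket sketch of the burst + sustained rate limit described above: `capacity` is the burst, `rate` the sustained requests per second; on rejection the server would answer 429/503 with Retry-After and let clients back off with jitter. Parameter names and values are illustrative.

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # sustained tokens per second
        self.capacity = capacity    # burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # refill based on elapsed time, capped at the burst capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

limiter = TokenBucket(rate=50, capacity=100)   # 50 rps sustained, bursts up to 100
if not limiter.allow():
    pass  # respond 429/503 with a Retry-After header; the client retries with jittered backoff
```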

8) Releases and Traffic Management

Canary (weighted): 1–5–10–25–50–100% with guardrails (p95, 5xx, timeouts).
Blue-green: instant switch; rollback via DNS/LB.
Shadow/mirror: copy of requests without affecting the response; mask PII.
Header/claim routing: `X-Canary: 1` or JWT claims (region/role).
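A hedged sketch of combining the two approaches: an `X-Canary: 1` header forces the canary, otherwise a stable hash of the user id sends a fixed percentage of traffic to it (so a given user sticks to one version). Helper names and percentages are hypothetical.

```python
import hashlib

CANARY_PERCENT = 10  # current canary weight

def route(headers: dict[str, str], user_id: str) -> str:
    if headers.get("X-Canary") == "1":
        return "api-v2-canary"
    # stable hash keeps the same user on the same version between requests
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "api-v2-canary" if bucket < CANARY_PERCENT else "api-v1"

print(route({"X-Canary": "1"}, "user-1"))  # forced canary
print(route({}, "user-7"))                 # weighted split
```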

9) Autoscaling and drainage

HPA/ASG on CPU + RPS + p95 + queue depth.
PreStop hook: wait for in-flight connections to complete.
Warm pool/instance reuse: shortens cold starts.
Capacity planning: a target utilization of 60–70% at p95 is normal.
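A back-of-the-envelope capacity check for the 60–70% utilization target: given a peak RPS and per-instance throughput at acceptable p95, the instance count is a ceiling division. All numbers below are illustrative.

```python
import math

peak_rps = 12_000          # expected peak, e.g. during a tournament
rps_per_instance = 400     # sustainable per instance at acceptable p95
target_utilization = 0.65  # keep headroom for spikes and failures

instances = math.ceil(peak_rps / (rps_per_instance * target_utilization))
print(instances)  # 47 instances at ~65% utilization
```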

10) Observability and SLO

LB metrics: RPS, p50/p95/p99, 4xx/5xx, open connections, queue length, ejections, retries, cache hit ratio.
Tracing: `traceparent`/`x-request-id` propagated through LB → services → databases.
Logs: structured, PII/PAN masking, correlated with upstream.
Per-route SLOs: for example, `latency p95 ≤ 300 ms`, `availability ≥ 99.9%`, `5xx ≤ 0.5%`.
Alerts on deviations (SLO burn rate, ejection surges, growth in 5xx/timeouts).
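A small sketch of the burn-rate idea behind the alerts: burn rate is the observed error rate divided by the error budget implied by the SLO. The multiwindow thresholds in the comment are commonly used values, shown here for illustration only.

```python
def burn_rate(error_rate: float, slo: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    budget = 1.0 - slo            # e.g. SLO 99.9% -> budget 0.1%
    return error_rate / budget

# Example: 0.5% errors over the last hour against a 99.9% availability SLO
print(burn_rate(0.005, 0.999))    # 5.0 -> budget is burning 5x too fast

# Typical (illustrative) multiwindow alerting: page if the 1h burn rate exceeds ~14.4,
# open a ticket if the 6h burn rate exceeds ~6.
```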

11) Balancing data and caches

PostgreSQL/MySQL:
  • Read/write split (ProxySQL/pgpool) + read replicas; sticky transactions.
  • Failover: a synchronous replica for RPO = 0 (more expensive).
Redis:
  • Redis Cluster + hash slots; for sessions, consistent hashing; timeouts and retryable errors.
Kafka/Redpanda:
  • Balance via partitioning and consumer groups; not to be confused with HTTP LB.
Object storage (S3/MinIO):
  • Multi-region failover via GSLB/replication.
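A hedged illustration of the read/write split with sticky transactions mentioned above: writes and anything inside an open transaction go to the primary, plain reads round-robin across replicas. Connection strings and class names are placeholders, not a real proxy API.

```python
import itertools

class RWRouter:
    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self.replicas = itertools.cycle(replicas)
        self.in_transaction = False   # sticky-txn: pin to the primary while a txn is open

    def target(self, sql: str) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        if self.in_transaction or not is_read:
            return self.primary
        return next(self.replicas)

router = RWRouter("pg-primary:5432", ["pg-replica-1:5432", "pg-replica-2:5432"])
print(router.target("SELECT * FROM games"))            # goes to a replica
print(router.target("UPDATE wallets SET balance = 0")) # goes to the primary
```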

12) K8s and cloud LBs

Service (ClusterIP/NodePort/LoadBalancer): basic L4.
Ingress/Gateway API: L7 routing, canary weights, TLS.
AWS: NLB (L4, high throughput), ALB (L7, WAF, sticky sessions, header routing).
GCP: Global Load Balancer (L7/HTTP(S) with Anycast), TCP/UDP proxy LB.
Azure: Front Door (global), Application Gateway (L7), Load Balancer (L4).

13) Configuration examples

13.1 NGINX (L7, least_conn, sticky, canary)

```nginx
upstream api_pool {
    least_conn;
    server api-1:8080 max_fails=3 fail_timeout=10s;
    server api-2:8080 max_fails=3 fail_timeout=10s;
    sticky cookie lb_id expires=30m path=/ secure httponly;
}

upstream canary_pool {
    least_conn;
    server api-canary:8080 weight=1;
}

map $http_x_canary $dst {
    default api_pool;
    1       canary_pool;
}

server {
    listen 443 ssl http2;
    location /api/ {
        proxy_read_timeout    5s;
        proxy_connect_timeout 1s;
        proxy_set_header X-Request-Id $request_id;
        proxy_pass http://$dst;
    }
}
```

13.2 HAProxy (leastconn, health checks, slow-start, stick-table)

```haproxy
backend api
    balance leastconn
    option httpchk GET /health
    default-server inter 3s fall 3 rise 2 slowstart 10s
    server s1 10.0.0.11:8080 check
    server s2 10.0.0.12:8080 check
    # rate limit per IP
    stick-table type ip size 100k expire 30m store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 50 }
```

13.3 Envoy (P2C, outlier detection, retries, adaptive concurrency)

```yaml
# Cluster: P2C (least-request) + outlier ejection (abridged)
lb_policy: LEAST_REQUEST
least_request_lb_config: { choice_count: 2 }
outlier_detection:
  consecutive_5xx: 5
  interval: 5s
  base_ejection_time: 30s
load_assignment: { ... }

# Route-level retry policy
retry_policy:
  retry_on: "5xx,reset,connect-failure"
  num_retries: 2
  per_try_timeout: 1s

# HTTP filter: adaptive concurrency (abridged; configured under http_filters)
name: envoy.filters.http.adaptive_concurrency
typed_config:
  "@type": type.googleapis.com/envoy.extensions.filters.http.adaptive_concurrency.v3.AdaptiveConcurrency
  gradient_controller_config:
    sample_aggregate_percentile: { value: 50 }
    # min_rtt_calc_params / concurrency_limit_params omitted for brevity
```

13.4 Kubernetes (Gateway API, weighted canary)

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
spec:
  rules:
  - matches: [{ path: { type: PathPrefix, value: /api } }]
    backendRefs:
    - name: api-v1
      weight: 90
      port: 8080
    - name: api-v2-canary
      weight: 10
      port: 8080
```

14) Checklists

Before LB/route release

  • Algorithm selected (P2C/LC/CH) for traffic type.
  • Health-checks and ejection thresholds are configured.
  • Slow-start, warmup, connection-drain enabled.
  • TLS/mTLS, HSTS, secure ciphers; HTTP/2/3 if necessary.
  • Sticky/CH only if required; TTL and fallback.
  • Rate-limit/burst, timeouts, retry-budget, adaptive concurrency.
  • Logs/traces: `trace-id` is propagated; PII masking.
  • SLO/alerts on p95/5xx/ejections/queue length.
  • Canary weights + rollback plan; shadow for large changes.

For payment/compliance routes

  • Idempotency-Key.
  • Failover between PSPs; same-method checks.
  • Error codes are normalized; ETA and reasons communicated to the customer.

For DB/Caches

  • RW split/replicas; timeouts, network retries.
  • CH/slot-hash for Redis; protection against "hot keys."
  • Monitor latency and replication lag.

15) Quality metrics (minimum)

Latency p50/p95/p99 by route/method.
Error rate 4xx/5xx, timeout/overflow.
Open/active connections, queue depth, retry count.
Outlier ejections and causes.
Sticky hit-ratio / cache hit-ratio.
GSLB: regional traffic distribution, failovers, PoP availability.

16) Anti-patterns

A single, non-redundant monolithic LB.
Sticky sessions "for everything" instead of externalizing state.
Global unbounded queues (they hide the problem and inflate p99).
Retries without jitter or a retry budget: a request "storm."
Trusting `X-Forwarded-For` without a list of trusted proxies.
No connection draining during scale-in → WS/gRPC connections break.
Ignoring long-lived connections when autoscaling.

17) iGaming specificity

Peaks and tournaments: micro-caching of catalogs/listings (1–5 s), autoscaling driven by queue depth.
Live games/streams: least-connections for long-lived connections, prefer the nearest PoP.
Payments: routing by geo/currency/amount/provider; strict timeouts and idempotency (see the sketch below).
Responsible gambling and compliance: requests for limits/self-exclusion are prioritized even under degradation (fail-open/fail-close per policy).
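To make the payment-routing bullet concrete, a hypothetical sketch: pick a provider by geo/currency, fail over to the next one on timeout, and reuse the same Idempotency-Key across attempts so retries cannot double-charge. All provider names, the routing table, and the `pay` callable are invented for illustration; amount-based rules are omitted.

```python
import uuid

# Hypothetical routing table: (geo, currency) -> ordered provider preferences
ROUTES = {
    ("EU", "EUR"): ["psp-a", "psp-b"],
    ("BR", "BRL"): ["psp-c", "psp-a"],
}

def charge(geo: str, currency: str, amount: float, pay) -> str:
    providers = ROUTES.get((geo, currency), ["psp-a"])
    idempotency_key = str(uuid.uuid4())       # same key reused for every attempt
    for psp in providers:
        try:
            return pay(psp, amount, idempotency_key)   # strict per-attempt timeout inside
        except TimeoutError:
            continue                                   # fail over to the next PSP
    raise RuntimeError("all payment providers failed")
```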

18) Implementation process (4 sprints)

1. Traffic map: protocols, p95/p99 loads, critical routes.
2. LB configuration: algorithms, health/outlier, TLS, limits/timeouts, observability.
3. GSLB/Edge: Anycast/GeoDNS, PoP support, regional data policies.
4. Release strategy: canary/shadow, SLO alerts, autoscale + drain, post-incident analysis.

Final cheat sheet

Choose the algorithm based on traffic type (P2C/LC/CH) and connection duration.
Keep upstreams "healthy": health checks + outlier ejection + slow-start + drain.
Manage peak load: rate limits, adaptive concurrency, bounded queues with rejection.
Use GSLB/Anycast for global availability and per-region compliance.
Observability and SLOs are mandatory; releases go via canary/shadow with a rollback plan.
Where possible, move session state off the instances and remove stickiness from the LB.
