Shaping and traffic routing
1) Why this matters
Shaping and routing are the foundation of managed availability and predictable latency:
- Stability: keep "noisy neighbors" from saturating shared channels.
- Fairness: priorities and quotas across tenants/classes.
- Efficiency: send each request where it is processed faster/cheaper.
- Change control: canary/weighted releases without risk.
- Savings: lower egress costs and a higher CDN cache hit rate.
2) Basic concepts
2.1 Traffic shaping vs policing
Shaping aligns traffic by buffering and sending packets at a target rate (smoothing bursts).
Policing penalizes excess (drop/mark) without buffering. Harsher, but cheaper.
2.2 Classes, queues and disciplines
Priority queues (PRIO), WFQ/DRR (fair allocation), HTB (hierarchical quotas), CoDel/RED (bufferbloat control), ECN (congestion signaling without drops).
At L7, "queues" take the form of RPS/connection/byte limits and priority pools.
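The fairness idea behind DRR/WFQ is easy to see in a toy model. An illustrative Python sketch (not kernel code): each queue earns a byte quantum per round and may send only what its accumulated deficit covers, so bandwidth shares track the quanta even when packet sizes differ.

```python
from collections import deque

def drr_schedule(queues, quanta, rounds):
    """Deficit Round Robin over lists of packet sizes (bytes).
    queues: list of deques of packet sizes; quanta: per-queue byte quantum.
    Returns the send order as (queue_index, packet_size) pairs."""
    sent = []
    deficit = [0] * len(queues)
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficit[i] = 0          # empty queues accumulate no credit
                continue
            deficit[i] += quanta[i]     # earn this round's quantum
            while q and q[0] <= deficit[i]:
                pkt = q.popleft()       # send head packet, pay from deficit
                deficit[i] -= pkt
                sent.append((i, pkt))
    return sent
```

With equal quanta, a queue of three 100-byte packets and a queue of one 300-byte packet each get 300 bytes of service over three rounds, despite the size mismatch.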
2.3 Limiting algorithms
Token Bucket: tokens accrue at rate r up to a cap; a request "spends" k tokens.
Leaky Bucket: fixed outflow rate; good for smoothing.
Local vs global limits: local is fast, global is fair (Redis/etcd, per tenant).
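A token bucket is small enough to sketch directly. A minimal Python version (illustrative; a production limiter adds locking and, for global limits, a shared store like Redis):

```python
import time

class TokenBucket:
    """Token-bucket limiter: tokens accrue at `rate`/sec up to `capacity`;
    a request spends `cost` tokens or is rejected (policing). A shaper
    would instead queue the request until enough tokens accrue."""
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity = rate, capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Capacity sets the burst size; rate sets the sustained throughput, which is why the two are tuned separately in gateway configs.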
3) QoS at L3/L4
3.1 DSCP/ToS and classes of service
Label packets by traffic type (interactive, backend RPC, background jobs).
In data centers, negotiate the DSCP policy with the network fabric/cloud.
3.2 Linux tc: HTB + fq_codel (sketch)
```bash
# Clear any existing root qdisc
tc qdisc del dev eth0 root 2>/dev/null || true

# Root HTB with 1Gbit
tc qdisc add dev eth0 root handle 1: htb default 30
tc class add dev eth0 parent 1: classid 1:1 htb rate 1gbit

# latency-critical class, 200Mbit guaranteed
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 200mbit ceil 1gbit prio 0
tc qdisc add dev eth0 parent 1:10 handle 10: fq_codel

# background class, 100Mbit guaranteed
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 100mbit ceil 1gbit prio 2
tc qdisc add dev eth0 parent 1:30 handle 30: fq_codel
```
3.3 ECN/RED/BBR
ECN reduces drops at peaks; RED/CoDel keep queues short (bufferbloat control).
BBR (instead of Cubic) often reduces p99 latency, especially over WANs and deep queues.
4) L7 routing (HTTP/gRPC/WS)
4.1 Routing criteria
Paths/methods ('/api/v1/', 'POST'), headers (client version, feature flags, canary header), cookies (A/B, sticky), JWT claims (tenant/role), geo/ASN, time windows, load (outlier detection).
Protocol: HTTP/2 (multiplexing), HTTP/3/QUIC (resistance to packet loss), gRPC (bi-di streams), WebSocket (long-lived connections).
4.2 Weighted split / canary releases
Route v1: 95%, v2: 5%; automatically raise the canary share while metrics stay "green".
Rollback triggers: errors, latency, business invariants.
Envoy (sketch)

```yaml
route:
  weighted_clusters:
    clusters:
      - name: svc-v1
        weight: 95
      - name: svc-v2
        weight: 5
```
Istio

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  hosts: ["svc"]
  http:
    - route:
        - destination: { host: svc, subset: v1 }
          weight: 95
        - destination: { host: svc, subset: v2 }
          weight: 5
```
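The weighted split can also be made sticky per user rather than random per request. A sketch of a hypothetical helper (not part of Envoy or Istio) that buckets users deterministically, so raising the canary percentage only adds users and never reshuffles existing ones:

```python
import hashlib

def canary_bucket(user_id: str, percent: int, salt: str = "v2-rollout") -> bool:
    """Deterministic canary assignment: hash user+salt into 100 buckets;
    buckets below `percent` go to the new version. The same user always
    lands in the same bucket; the salt lets a new rollout reshuffle."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100 < percent
```

A canary header or cookie can then carry the result so every hop routes the user consistently.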
4.3 Sticky sessions and consistent hashing
Session affinity by cookie/IP/JWT identifier.
Consistent hashing for cache clusters, sharded services, rate-limit gateways.
Nginx

```nginx
upstream api {
    hash $cookie_user_id consistent;
    server 10.0.0.1;
    server 10.0.0.2;
}
```
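The ring behind `consistent` hashing is simple to sketch. An illustrative Python version with virtual nodes (real implementations such as ketama differ in hashing details):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes. Unlike `hash % N`,
    removing a server remaps only the keys that lived on it."""
    def __init__(self, servers, vnodes=100):
        # each server contributes `vnodes` points on the ring
        self.ring = sorted(
            (self._h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )

    @staticmethod
    def _h(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def get(self, key):
        # first ring point clockwise from the key's hash (wrap at the end)
        i = bisect.bisect(self.ring, (self._h(key), "")) % len(self.ring)
        return self.ring[i][1]
```

Virtual nodes smooth the distribution; with too few, one server can own a disproportionate arc of the ring.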
4.4 Geo- and latency-aware routing
GeoIP/ASN at the edge (CDN/edge) → the nearest POP/region.
Latency-aware: periodic health samples + RTT measurements → traffic to the "fastest" cluster.
4.5 Outlier detection / circuit breaking
Ejecting "bad" instances: max-ejection-percent, based on error and latency signals.
Circuit breaker: caps on connections/RPS/queue depth.
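A circuit breaker's state machine fits in a few lines. An illustrative sketch (Envoy's real logic is richer, with per-priority thresholds and rolling windows):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after `threshold` consecutive
    failures, reject calls for `cooldown` seconds, then admit a trial
    request (half-open); a failed trial restarts the cooldown."""
    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at = None          # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # after the cooldown we are half-open: let a trial through
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold or self.opened_at is not None:
                self.opened_at = self.clock()   # (re)open, restart cooldown
```

Callers wrap each request as `if cb.allow(): ... cb.record(ok)` and fail fast otherwise, which is what keeps a dying backend from dragging its clients down with it.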
5) Traffic shaping at the gateway/mesh level
5.1 Rate limiting
Local (per-pod): cheap, but not fair across replicas.
Global (Redis/etcd): per-tenant/API-key fairness.
Policies: per-route, per-method, per-tenant, burst allowances.
Envoy RLS (sketch)

```yaml
http_filters:
  - name: envoy.filters.http.ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: "api"
      rate_limit_service:
        grpc_service: { envoy_grpc: { cluster_name: rate_limit_cluster } }
```
5.2 Fairness and priorities
Priority pools: interactive > system > background.
L7 equivalents of DRR/WFQ: per-client/per-tenant quotas and weights.
5.3 Overload protection
Load shedding: reject or degrade when budgets are exceeded.
Adaptive concurrency: adjust limits dynamically from p50/p95/queue length.
Server-side backpressure: 429/503 + Retry-After.
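Adaptive concurrency can be sketched as AIMD over a latency target, the L7 cousin of TCP congestion control (a simplification of what real adaptive-concurrency filters do):

```python
class AdaptiveConcurrency:
    """AIMD concurrency limit: grow the limit by one while latency
    meets the target, halve it on a breach. Requests beyond `limit`
    would be queued or shed by the caller."""
    def __init__(self, limit=10, floor=1, ceiling=1000):
        self.limit, self.floor, self.ceiling = limit, floor, ceiling

    def on_sample(self, latency_ms, target_ms):
        if latency_ms > target_ms:
            self.limit = max(self.floor, self.limit // 2)   # multiplicative decrease
        else:
            self.limit = min(self.ceiling, self.limit + 1)  # additive increase
        return self.limit
```

The asymmetry is the point: the limit backs off fast under stress and probes upward slowly, so the system oscillates gently around its true capacity.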
6) eBPF and CNI level
6.1 Cilium/eBPF
Filtering/routing in the kernel: fewer context switches, fine-grained L3-L7 policies.
Maglev hashing for stable distribution.
eBPF programs for per-pod QoS (TC/XDP hooks).
6.2 Calico/NetworkPolicies
L3/L4 access policies, basic priority classes, integration with Kubernetes QoS (Guaranteed/Burstable/BestEffort).
7) Edge/CDN and API gateways
CDN: cache keys (query/header normalization), stale-while-revalidate, origin protection (rate limits/bot filters).
API gateways: authentication, quotas/rate plans (per consumer), SLA limits, geo-routing, API versioning.
WAF: filter at the edge so the origin does not burn CPU on junk traffic.
8) Asynchronous buses/streaming
Kafka/NATS/Pulsar: producer/consumer quotas, batch size limit, backpressure via lag.
Event routing: by tenant/idempotency key; choose partition keys that spread load evenly.
Exactly-once ≈ "effectively once": transactional producers + idempotent consumers.
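The idempotent consumer is the half of "effectively once" you control. A minimal sketch, with an in-memory set standing in for a durable dedup store (a DB table or Redis set in real deployments):

```python
def consume(messages, handler, processed=None):
    """Skip messages whose idempotency key was already handled, so
    broker redelivery causes no duplicate side effects."""
    processed = set() if processed is None else processed
    for msg in messages:
        key = msg["idempotency_key"]
        if key in processed:
            continue                 # duplicate delivery: drop silently
        handler(msg)
        processed.add(key)           # mark done after the side effect
```

Note the ordering caveat: marking after the side effect means a crash between the two can cause a reprocess, which is exactly why the handler itself must also be idempotent.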
9) Timeouts, retries, backoff
End-to-end deadlines: client > proxy > service; each hop's timeout is shorter than its caller's, not the other way around.
Retries: a bounded number of attempts with jittered exponential backoff, so no retry storms.
Idempotency is mandatory for retried operations; otherwise use SAGA/compensation.
Hedged/parallel requests (use with caution): they improve p99 but increase total traffic.
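The "full jitter" variant of backoff mentioned above can be sketched as follows (parameters are illustrative):

```python
import random

def backoff_delays(attempts, base=0.1, cap=10.0, rng=random.random):
    """Full-jitter exponential backoff: attempt n waits a uniform
    random time in [0, min(cap, base * 2**n)]. The randomness
    de-synchronizes clients and prevents retry storms."""
    return [rng() * min(cap, base * 2 ** n) for n in range(attempts)]
```

A retry loop would sleep each delay in turn and give up after the last attempt; the cap keeps the worst-case wait bounded even for many attempts.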
10) Observability and SLO
10.1 Metrics
rate_limit_hits, requests_queued, shed_requests_total, latency_ms{p50,p95,p99}, error_ratio, retry_attempts, outlier_ejections, queue_time_ms.
10.2 Tracing
Propagate a Correlation-ID; tag spans with the cause: retry, shed, throttle, queue.
Use span links for retries/hedges to see their impact on downstream subsystems.
10.3 Logs/reports
Summary of drops/shedding/limits, heat maps by route.
Separate panels for fairness index.
10.4 SLO examples
"p99 ≤ 300 ms at 95th-percentile load; shed ≤ 0.1%; error_ratio ≤ 0.5%."
"At least 95% of its quota is guaranteed to the interactive class under overload."
11) Configuration examples
11.1 Nginx: rate limit + burst + canary split

```nginx
map $http_x_canary $canary { default 0; 1 1; }
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

upstream api_v1 { server 10.0.0.1; }
upstream api_v2 { server 10.0.0.2; }

server {
    location /api/ {
        limit_req zone=perip burst=20 nodelay;
        if ($canary) { proxy_pass http://api_v2; break; }
        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_pass http://api_v1;
    }
}
```
11.2 Envoy: circuit breaker + outlier detection

```yaml
circuit_breakers:
  thresholds:
    - priority: DEFAULT
      max_connections: 1000
      max_pending_requests: 500
      max_requests: 2000
outlier_detection:
  consecutive_5xx: 5
  interval: 10s
  max_ejection_percent: 50
  base_ejection_time: 30s
```
11.3 Istio: per-tenant quota (reservation via label)

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
spec:
  selector: { matchLabels: { app: api } }
  rules:
    - when:
        - key: request.headers[x-tenant]
          values: ["gold"]
```

Then pair this with a RateLimitPolicy in your rate-limit provider that grants the "gold" tier a larger quota pool.
11.4 Kubernetes QoS hints
Guaranteed class for critical pods (requests = limits).
PodPriority & preemption: critical pods preempt background ones.
Topology Spread Constraints: spread across zones for resilience.
12) Anti-patterns
A single global limit set by guesswork → spurious 429s/timeouts for important clients.
Retries without jitter/idempotency → retry storms.
Misordered timeouts (a callee's timeout longer than its caller's) → hangs and duplicated work.
Shared caches/queues for prod and experiments → data contamination.
"Always sticky" without good reason → uneven load and hot nodes.
Disabled outlier detection → one bad instance degrades metrics for weeks.
13) Implementation checklist
- Segment traffic: classes/tenants/routes.
- Set target budgets for RPS/connections/bytes and for p95/p99.
- Enable rate limit (local + global), circuit breaker, outlier detection.
- Configure canary split + auto rollback on metrics.
- Pin down timeouts and retries with exponential backoff + jitter.
- Enable ECN/BBR (where applicable) and fq_codel/HTB for egress.
- Separate pools/caches/queues for shadow traffic and experiments.
- Dashboards: metrics of limits, queues, latency, fairness.
- SLO and runbook: shedding/rollback/enable criteria.
14) FAQ
Q: What to choose: shaping or policing?
A: For user-facing paths, shaping (smoothing without drops). For "background"/"bulk" service classes, policing, to protect critical flows.
Q: How do you avoid retry storms?
A: Jittered backoff, a bounded number of attempts, idempotency, server-side Retry-After hints, global quotas.
Q: Sticky or hashing?
A: Sticky - when a session is needed/the cache is local to the user; hashing - when you need uniformity and stability of sharding.
Q: What gives HTTP/3/QUIC?
A: No TCP head-of-line blocking, better loss tolerance, faster connection recovery; this noticeably trims p99/p999 tails.
15) Summary
Effective shaping and L7 routing amount to a coherent set of policies: priorities and quotas, fair distribution, safe limits and smart routing, backed by observability and SLOs. Apply the practices above (HTB/fq_codel/ECN at the lower layers, Envoy/Istio/Nginx/eBPF at the upper) and you get predictable latency tails, resilience to overload, and controlled, safe releases.