Shaping and traffic routing
1) Why this matters
Shaping and routing are the foundation of managed availability and predictable latency:
- Stability: keep "noisy neighbors" from saturating shared channels.
- Fairness: priorities and quotas across tenants/classes.
- Efficiency: send each request where it is processed faster/cheaper.
- Change control: canary/weighted releases without risk.
- Savings: lower egress costs and a higher CDN cache hit rate.
2) Basic concepts
2.1 Traffic shaping vs policing
Shaping aligns traffic by buffering and sending packets at a target rate (smoothing bursts).
Policing penalizes excess (drop/mark) without buffering. Harsher, but cheaper.
2.2 Classes, queues and disciplines
Priority queues (PRIO), WFQ/DRR (fair allocation), HTB (hierarchical quotas), CoDel/RED (bufferbloat control), ECN (congestion signaling without drops).
At L7, "queues" take the form of RPS/connection/byte limits and priority pools.
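The fairness idea behind DRR/WFQ is easy to see in a toy model. An illustrative Python sketch (not kernel code): each queue earns a byte quantum per round and may send only what its accumulated deficit covers, so bandwidth shares track the quanta even when packet sizes differ.

```python
from collections import deque

def drr_schedule(queues, quanta, rounds):
    """Deficit Round Robin over lists of packet sizes (bytes).
    queues: list of deques of packet sizes; quanta: per-queue byte quantum.
    Returns the send order as (queue_index, packet_size) pairs."""
    sent = []
    deficit = [0] * len(queues)
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficit[i] = 0          # empty queues accumulate no credit
                continue
            deficit[i] += quanta[i]     # earn this round's quantum
            while q and q[0] <= deficit[i]:
                pkt = q.popleft()       # send head packet, pay from deficit
                deficit[i] -= pkt
                sent.append((i, pkt))
    return sent
```

With equal quanta, a queue of three 100-byte packets and a queue of one 300-byte packet each get 300 bytes of service over three rounds, despite the size mismatch.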
2.3 Limiting algorithms
Token Bucket: tokens accrue at rate r up to a cap; a request "spends" k tokens.
Leaky Bucket: fixed outflow rate; good for smoothing.
Local vs global limits: local is fast, global is fair (Redis/etcd, per tenant).
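A token bucket is small enough to sketch directly. A minimal Python version (illustrative; a production limiter adds locking and, for global limits, a shared store like Redis):

```python
import time

class TokenBucket:
    """Token-bucket limiter: tokens accrue at `rate`/sec up to `capacity`;
    a request spends `cost` tokens or is rejected (policing). A shaper
    would instead queue the request until enough tokens accrue."""
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity = rate, capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Capacity sets the burst size; rate sets the sustained throughput, which is why the two are tuned separately in gateway configs.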
3) QoS at L3/L4
3.1 DSCP/ToS and classes of service
Label packets by traffic type (interactive, backend RPC, background jobs).
In data centers, negotiate the DSCP policy with the network fabric/cloud.
3.2 Linux tc: HTB + fq_codel (sketch)
```bash
# Clear any existing root qdisc
tc qdisc del dev eth0 root 2>/dev/null || true

# Root HTB with 1Gbit
tc qdisc add dev eth0 root handle 1: htb default 30
tc class add dev eth0 parent 1: classid 1:1 htb rate 1gbit

# latency-critical class, 200Mbit guaranteed
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 200mbit ceil 1gbit prio 0
tc qdisc add dev eth0 parent 1:10 handle 10: fq_codel

# background class, 100Mbit guaranteed
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 100mbit ceil 1gbit prio 2
tc qdisc add dev eth0 parent 1:30 handle 30: fq_codel
```
3.3 ECN/RED/BBR
ECN reduces drops at peaks; RED/CoDel keep queues short (bufferbloat control).
BBR (instead of Cubic) often reduces p99 latency, especially over WANs and deep queues.
4) L7 routing (HTTP/gRPC/WS)
4.1 Routing criteria
Paths/methods ('/api/v1/', 'POST'), headers (client version, feature flags, canary header), cookies (A/B, sticky), JWT claims (tenant/role), geo/ASN, time windows, load (outlier detection).
Protocol: HTTP/2 (multiplexing), HTTP/3/QUIC (resistance to packet loss), gRPC (bi-di streams), WebSocket (long-lived connections).
4.2 Weighted split / canary releases
Route v1: 95%, v2: 5%; automatically raise the canary share while metrics stay "green".
Rollback triggers: errors, latency, business invariants.
Envoy (sketch)

```yaml
route:
  weighted_clusters:
    clusters:
      - name: svc-v1
        weight: 95
      - name: svc-v2
        weight: 5
```
Istio

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  hosts: ["svc"]
  http:
    - route:
        - destination: { host: svc, subset: v1 }
          weight: 95
        - destination: { host: svc, subset: v2 }
          weight: 5
```
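The weighted split can also be made sticky per user rather than random per request. A sketch of a hypothetical helper (not part of Envoy or Istio) that buckets users deterministically, so raising the canary percentage only adds users and never reshuffles existing ones:

```python
import hashlib

def canary_bucket(user_id: str, percent: int, salt: str = "v2-rollout") -> bool:
    """Deterministic canary assignment: hash user+salt into 100 buckets;
    buckets below `percent` go to the new version. The same user always
    lands in the same bucket; the salt lets a new rollout reshuffle."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100 < percent
```

A canary header or cookie can then carry the result so every hop routes the user consistently.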
4.3 Sticky sessions and consistent hashing
Session affinity by cookie/IP/JWT identifier.
Consistent hashing for cache clusters, sharded services, rate-limit gateways.
Nginx

```nginx
upstream api {
    hash $cookie_user_id consistent;
    server 10.0.0.1;
    server 10.0.0.2;
}
```
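The ring behind `consistent` hashing is simple to sketch. An illustrative Python version with virtual nodes (real implementations such as ketama differ in hashing details):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes. Unlike `hash % N`,
    removing a server remaps only the keys that lived on it."""
    def __init__(self, servers, vnodes=100):
        # each server contributes `vnodes` points on the ring
        self.ring = sorted(
            (self._h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )

    @staticmethod
    def _h(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def get(self, key):
        # first ring point clockwise from the key's hash (wrap at the end)
        i = bisect.bisect(self.ring, (self._h(key), "")) % len(self.ring)
        return self.ring[i][1]
```

Virtual nodes smooth the distribution; with too few, one server can own a disproportionate arc of the ring.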
4.4 Geo- and latency-aware routing
GeoIP/ASN at the edge (CDN/edge) → the nearest POP/region.
Latency-aware: periodic health samples + RTT measurements → traffic to the "fastest" cluster.
4.5 Outlier detection / circuit breaking
Ejecting "bad" instances: max-ejection-percent, based on error and latency signals.
Circuit breaker: caps on connections/RPS/queue depth.
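A circuit breaker's state machine fits in a few lines. An illustrative sketch (Envoy's real logic is richer, with per-priority thresholds and rolling windows):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after `threshold` consecutive
    failures, reject calls for `cooldown` seconds, then admit a trial
    request (half-open); a failed trial restarts the cooldown."""
    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at = None          # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # after the cooldown we are half-open: let a trial through
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold or self.opened_at is not None:
                self.opened_at = self.clock()   # (re)open, restart cooldown
```

Callers wrap each request as `if cb.allow(): ... cb.record(ok)` and fail fast otherwise, which is what keeps a dying backend from dragging its clients down with it.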
5) Traffic shaping at the gateway/mesh level
5.1 Rate limiting
Local (per-pod): cheap, but not fair across replicas.
Global (Redis/etcd): per-tenant/API-key fairness.
Policies: per-route, per-method, per-tenant, burst allowances.
Envoy RLS (sketch)

```yaml
http_filters:
  - name: envoy.filters.http.ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: "api"
      rate_limit_service:
        grpc_service: { envoy_grpc: { cluster_name: rate_limit_cluster } }
```
5.2 Fairness and priorities
Priority pools: interactive > system > background.
L7 equivalents of DRR/WFQ: per-client/per-tenant quotas and weights.
5.3 Overload protection
Load shedding: reject or degrade when budgets are exceeded.
Adaptive concurrency: adjust limits dynamically from p50/p95/queue length.
Server-side backpressure: 429/503 + Retry-After.
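Adaptive concurrency can be sketched as AIMD over a latency target, the L7 cousin of TCP congestion control (a simplification of what real adaptive-concurrency filters do):

```python
class AdaptiveConcurrency:
    """AIMD concurrency limit: grow the limit by one while latency
    meets the target, halve it on a breach. Requests beyond `limit`
    would be queued or shed by the caller."""
    def __init__(self, limit=10, floor=1, ceiling=1000):
        self.limit, self.floor, self.ceiling = limit, floor, ceiling

    def on_sample(self, latency_ms, target_ms):
        if latency_ms > target_ms:
            self.limit = max(self.floor, self.limit // 2)   # multiplicative decrease
        else:
            self.limit = min(self.ceiling, self.limit + 1)  # additive increase
        return self.limit
```

The asymmetry is the point: the limit backs off fast under stress and probes upward slowly, so the system oscillates gently around its true capacity.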
6) eBPF and CNI level
6.1 Cilium/eBPF
Filtering/routing in the kernel: fewer context switches, fine-grained L3-L7 policies.
Maglev hashing for stable distribution.
eBPF programs for per-pod QoS (TC/XDP hooks).
6.2 Calico/NetworkPolicies
L3/L4 access policies, basic priority classes, integration with Kubernetes QoS (Guaranteed/Burstable/BestEffort).
7) Edge/CDN and API gateways
CDN: cache keys (query/header normalization), stale-while-revalidate, origin protection (rate limits/bot filters).
API gateways: authentication, quotas/rate plans (per consumer), SLA limits, geo-routing, API versioning.
WAF: filter at the edge so the origin does not burn CPU on junk traffic.
8) Asynchronous buses/streaming
Kafka/NATS/Pulsar: producer/consumer quotas, batch size limit, backpressure via lag.
Event routing: by tenant/idempotency key; choose partition keys that spread load evenly.
Exactly-once ≈ "effectively once": transactional producers + idempotent consumers.
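The idempotent consumer is the half of "effectively once" you control. A minimal sketch, with an in-memory set standing in for a durable dedup store (a DB table or Redis set in real deployments):

```python
def consume(messages, handler, processed=None):
    """Skip messages whose idempotency key was already handled, so
    broker redelivery causes no duplicate side effects."""
    processed = set() if processed is None else processed
    for msg in messages:
        key = msg["idempotency_key"]
        if key in processed:
            continue                 # duplicate delivery: drop silently
        handler(msg)
        processed.add(key)           # mark done after the side effect
```

Note the ordering caveat: marking after the side effect means a crash between the two can cause a reprocess, which is exactly why the handler itself must also be idempotent.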
9) Timeouts, retries, backoff
End-to-end deadlines: client > proxy > service; each hop's timeout is shorter than its caller's, not the other way around.
Retries: a bounded number of attempts with jittered exponential backoff, so no retry storms.
Idempotency is mandatory for retried operations; otherwise use SAGA/compensation.
Hedged/parallel requests (use with caution): they improve p99 but increase total traffic.
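The "full jitter" variant of backoff mentioned above can be sketched as follows (parameters are illustrative):

```python
import random

def backoff_delays(attempts, base=0.1, cap=10.0, rng=random.random):
    """Full-jitter exponential backoff: attempt n waits a uniform
    random time in [0, min(cap, base * 2**n)]. The randomness
    de-synchronizes clients and prevents retry storms."""
    return [rng() * min(cap, base * 2 ** n) for n in range(attempts)]
```

A retry loop would sleep each delay in turn and give up after the last attempt; the cap keeps the worst-case wait bounded even for many attempts.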
10) Observability and SLO
10.1 Metrics
rate_limit_hits, requests_queued, shed_requests_total, latency_ms{p50,p95,p99}, error_ratio, retry_attempts, outlier_ejections, queue_time_ms.
10.2 Tracing
Propagate a Correlation-ID; tag spans with the cause: retry, shed, throttle, queue.
Use span links for retries/hedges to see their impact on downstream subsystems.
10.3 Logs/reports
Summary of drops/shedding/limits, heat maps by route.
Separate panels for fairness index.
10.4 SLO examples
"p99 ≤ 300 ms at 95th-percentile load; shed ≤ 0.1%; error_ratio ≤ 0.5%."
"At least 95% of its quota is guaranteed to the interactive class under overload."
11) Configuration examples
11.1 Nginx: rate limit + burst + canary split

```nginx
map $http_x_canary $canary { default 0; 1 1; }
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

upstream api_v1 { server 10.0.0.1; }
upstream api_v2 { server 10.0.0.2; }

server {
    location /api/ {
        limit_req zone=perip burst=20 nodelay;
        if ($canary) { proxy_pass http://api_v2; break; }
        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_pass http://api_v1;
    }
}
```
11.2 Envoy: circuit breaker + outlier detection

```yaml
circuit_breakers:
  thresholds:
    - priority: DEFAULT
      max_connections: 1000
      max_pending_requests: 500
      max_requests: 2000
outlier_detection:
  consecutive_5xx: 5
  interval: 10s
  max_ejection_percent: 50
  base_ejection_time: 30s
```
11.3 Istio: per-tenant quota (reservation via label)

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
spec:
  selector: { matchLabels: { app: api } }
  rules:
    - when:
        - key: request.headers[x-tenant]
          values: ["gold"]
```

Then pair this with a RateLimitPolicy in your rate-limit provider that grants the "gold" tier a larger quota pool.
11.4 Kubernetes QoS hints
Guaranteed class for critical pods (requests = limits).
PodPriority & preemption: critical pods preempt background ones.
Topology Spread Constraints: spread across zones for resilience.
12) Anti-patterns
A single global limit set by guesswork → spurious 429s/timeouts for important clients.
Retries without jitter/idempotency → retry storms.
Misordered timeouts (a callee's timeout longer than its caller's) → hangs and duplicated work.
Shared caches/queues for prod and experiments → data contamination.
"Always sticky" without good reason → uneven load and hot nodes.
Disabled outlier detection → one bad instance degrades metrics for weeks.
13) Implementation checklist
- Segment traffic: classes/tenants/routes.
- Set target budgets for RPS/connections/bytes and for p95/p99.
- Enable rate limit (local + global), circuit breaker, outlier detection.
- Configure canary split + auto rollback on metrics.
- Pin down timeouts and retries with exponential backoff + jitter.
- Enable ECN/BBR (where applicable) and fq_codel/HTB for egress.
- Separate pools/caches/queues for shadow traffic and experiments.
- Dashboards: metrics of limits, queues, latency, fairness.
- SLO and runbook: shedding/rollback/enable criteria.
14) FAQ
Q: What to choose: shaping or policing?
A: For user-facing paths, shaping (smoothing without drops). For "background"/"bulk" service classes, policing, to protect critical flows.
Q: How do you avoid retry storms?
A: Jittered backoff, a bounded number of attempts, idempotency, server-side Retry-After hints, global quotas.
Q: Sticky or hashing?
A: Sticky - when a session is needed/the cache is local to the user; hashing - when you need uniformity and stability of sharding.
Q: What gives HTTP/3/QUIC?
A: No TCP head-of-line blocking, better loss tolerance, faster connection recovery; this noticeably trims p99/p999 tails.
15) Summary
Effective shaping and L7 routing amount to a coherent set of policies: priorities and quotas, fair distribution, safe limits and smart routing, backed by observability and SLOs. Apply the practices above (HTB/fq_codel/ECN at the lower layers, Envoy/Istio/Nginx/eBPF at the upper) and you get predictable latency tails, resilience to overload, and controlled, safe releases.