Designing rate limiters
1) Why rate limiting
Rate limiting protects the availability and economics of APIs: it stops floods, retry storms and credential stuffing, protects expensive operations (payment transactions, report generation), and smooths the load on dependent systems (databases, external providers). A good design provides fairness, predictable latency and clear SLOs.
Key objectives
RPS stability and backend overload protection.
Controlled "elasticity" (burst allowance).
Customer differentiation (per-user/per-organization/per-key/per-IP/per-region).
Cost model: different "prices" for different operations.
2) Limit types
RPS limits: requests per second/minute.
Quotas: total budget per period (day/month).
Concurrency: simultaneous operations (checkout, heavy job).
Bandwidth: bytes/sec (upload/download).
Weighted limits: the "cost" of a request depends on its complexity (for example, GraphQL complexity, batch size).
Adaptive: limits are tightened when anomalies appear (suspicious activity, spikes of 401/403/5xx errors).
3) Algorithms and when to apply them
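3.1 Fixed window counter
Simple: one counter per interval (e.g. 100 requests/min).
Pros: minimal cost. Cons: bursts at window boundaries (a client can send up to twice the limit in a short span that straddles two windows).
When: admin panels and other cases where low accuracy is acceptable and cost must be minimal.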
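To make the boundary effect concrete, here is a minimal in-memory sketch. The class name `FixedWindowLimiter` and the explicit `now` argument are illustrative assumptions; a production version would keep the counters in a shared store with TTLs.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed window of `window_s` seconds."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        # (key, window index) -> request count. Old windows are never cleaned
        # up here; a real store would expire them with a TTL.
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_s)
        if self.counts[(key, window)] >= self.limit:
            return False
        self.counts[(key, window)] += 1
        return True


limiter = FixedWindowLimiter(limit=2, window_s=60)
# Two requests at the end of one window and two at the start of the next are
# all accepted: 4 requests within 1.5 seconds despite a limit of 2 per minute.
edge_burst = [limiter.allow("k", t) for t in (59.0, 59.5, 60.0, 60.5)]
```

The last line shows why this algorithm is only suitable where such boundary bursts are tolerable.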
3.2 Sliding window (log / counter)
Log: stores the timestamps of recent requests; accurate, but expensive in memory.
Counter: a weighted average of two adjacent windows (rolling); a compromise between accuracy and cost.
When: public APIs with medium traffic, where smoothness is needed without complex math.
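The counter variant can be sketched as follows. This assumes the common weighting in which the previous window's count is scaled by the fraction of that window still covered by the sliding window; the class and method names are hypothetical.

```python
class SlidingWindowCounter:
    """Approximate sliding window built from two adjacent fixed-window counters."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        # key -> (current window index, current count, previous count)
        self.state: dict[str, tuple[int, int, int]] = {}

    def allow(self, key: str, now: float) -> bool:
        idx = int(now // self.window_s)
        cur_idx, cur, prev = self.state.get(key, (idx, 0, 0))
        if idx == cur_idx + 1:      # moved into the next window
            prev, cur = cur, 0
        elif idx > cur_idx + 1:     # idle for more than one full window
            prev, cur = 0, 0
        # Fraction of the current window that has already elapsed.
        elapsed = (now % self.window_s) / self.window_s
        estimate = prev * (1.0 - elapsed) + cur
        allowed = estimate + 1 <= self.limit
        if allowed:
            cur += 1
        self.state[key] = (idx, cur, prev)
        return allowed
```

For example, with `limit=10` and `window_s=60`, a client that used all 10 requests in the first window still has 7.5 of them counted at second 75, so only 2 more requests fit at that moment.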
3.3 Token bucket
Parameters: rate `r` (tokens/sec) and capacity `b` (burst). Each request consumes a token.
Pros: natural burst allowance, simple implementation. Cons: the output is not strictly even.
When: almost always for RPS limits, if bursts of up to `b` requests are acceptable.
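A minimal single-key sketch of the refill-on-demand form of the token bucket. The `now` and `cost` arguments are assumptions added for testability and for weighted requests.

```python
class TokenBucket:
    """Token bucket with refill rate `rate` (tokens/sec) and capacity `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill lazily for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=5, capacity=10)
burst = sum(bucket.allow(now=0.0) for _ in range(12))        # 10 accepted, 2 rejected
after_1s = sum(bucket.allow(now=1.0) for _ in range(12))     # 5 tokens refilled
```

Refilling lazily on each call avoids a background timer and keeps the per-key state to two numbers.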
3.4 Leaky bucket
A queue that drains ("leaks") at a fixed rate.
Pros: even output flow. Cons: added latency, since requests wait in the queue.
When: smoothing traffic towards external "fragile" providers.
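One way to sketch the queue form is to compute, for each request, how long it has to wait for its slot instead of materializing the queue. The names `LeakyBucketShaper`, `schedule` and `max_queue` are hypothetical.

```python
from typing import Optional


class LeakyBucketShaper:
    """Spaces requests `1/rate` seconds apart; rejects when the backlog is too long."""

    def __init__(self, rate: float, max_queue: int) -> None:
        self.interval = 1.0 / rate
        self.max_delay = max_queue * self.interval  # longest wait we accept
        self.next_free = 0.0                        # time at which the next slot opens

    def schedule(self, now: float) -> Optional[float]:
        """Return the delay (seconds) before forwarding, or None if the queue is full."""
        start = max(now, self.next_free)
        delay = start - now
        if delay > self.max_delay:
            return None
        self.next_free = start + self.interval
        return delay


shaper = LeakyBucketShaper(rate=2, max_queue=2)
delays = [shaper.schedule(now=0.0) for _ in range(4)]
```

Four simultaneous requests get delays of 0, 0.5 and 1.0 seconds, and the fourth is rejected: the output is evenly spaced at the price of waiting.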
3.5 GCRA (Generic Cell Rate Algorithm)
Theoretical arrival time (TAT) model: `TAT_next = max(TAT_current, now) + 1/r`; the request is accepted if `now >= TAT_current - burst/r`.
Pros: strict, accurate, little memory (one TAT per key). Cons: harder to understand.
When: strict control and smoothness are needed; distributed limits.
3.6 Concurrency semaphores
A counter of active operations: a request enters only if a permit is free and releases the permit on exit.
When: long-running operations, streams, WebSocket connections, downloads.
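An in-process sketch of the acquire/release discipline. `ConcurrencyLimiter` and its methods are hypothetical names; a distributed version would keep the counter in a shared store.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class ConcurrencyLimiter:
    """Caps the number of simultaneously active operations per key."""

    def __init__(self, max_concurrency: int) -> None:
        self.max_concurrency = max_concurrency
        self._active: dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, key: str) -> bool:
        with self._lock:
            if self._active[key] >= self.max_concurrency:
                return False
            self._active[key] += 1
            return True

    def release(self, key: str) -> None:
        with self._lock:
            self._active[key] -= 1

    @contextmanager
    def slot(self, key: str):
        """Yield True if a permit was taken; release it on exit only in that case."""
        acquired = self.try_acquire(key)
        try:
            yield acquired
        finally:
            if acquired:
                self.release(key)
```

Using the context manager guarantees that a permit is released even if the operation raises, and that a rejected request never decrements the counter.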
4) Limit key model
Key = a combination of attributes:
- `client_id`/`api_key`/`user_id`/`org_id`
- `IP`/`ASN`/`geo` (coarse protection)
- `endpoint`/`method` (hot routes)
- `scope`/`plan`/`tier` (monetization)
- `idempotency_key` (write operations)
Use a hierarchy: strict per-key limits first, then per-organization, then global.
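The hierarchy can be sketched as a check across all levels before anything is consumed, so that a rejection at the organization or global level does not eat into the per-key budget. Everything here (class name, level names, plain counters without window reset) is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HierarchicalLimiter:
    limits: dict[str, int]  # level name ("key", "org", "global") -> max requests per window
    counts: dict[str, int] = field(default_factory=dict)

    def allow(self, api_key: str, org_id: str) -> Optional[str]:
        """Return None if accepted, otherwise the name of the level that rejected."""
        keys = [
            ("key", f"key:{api_key}"),
            ("org", f"org:{org_id}"),
            ("global", "global"),
        ]
        # Phase 1: check every level, strictest first, without consuming anything.
        for level, key in keys:
            if self.counts.get(key, 0) >= self.limits[level]:
                return level
        # Phase 2: the request passed all levels, so charge all of them.
        for _, key in keys:
            self.counts[key] = self.counts.get(key, 0) + 1
        return None


hl = HierarchicalLimiter(limits={"key": 2, "org": 3, "global": 100})
first_key = [hl.allow("a1", "org1") for _ in range(3)]
second_key = [hl.allow("a2", "org1") for _ in range(2)]
```

Returning the rejecting level makes it straightforward to label metrics and decision logs by the limit that fired.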
5) Cost model
Define a per-request cost `cost(q)`:
- GraphQL: field complexity × depth.
- REST: response/request size, operation type (read = 1, write = 3, report = 10).
- Batch: `cost = min(n, cap)`.
The limiter then charges tokens rather than counting requests: `budget -= cost(q)`.
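A small worked example of charging costs against a budget, using the weights listed above. The function and class names are illustrative assumptions.

```python
# Weights from the REST example above.
OPERATION_COST = {"read": 1, "write": 3, "report": 10}


def rest_cost(op_type: str) -> int:
    return OPERATION_COST[op_type]


def batch_cost(n: int, cap: int) -> int:
    return min(n, cap)


def graphql_cost(field_complexity: int, depth: int) -> int:
    return field_complexity * depth


class Budget:
    """A token budget that is charged by cost instead of by request count."""

    def __init__(self, tokens: int) -> None:
        self.tokens = tokens

    def charge(self, cost: int) -> bool:
        if cost > self.tokens:
            return False
        self.tokens -= cost
        return True


budget = Budget(tokens=20)
accepted = [budget.charge(rest_cost(op)) for op in ("read", "write", "report", "report")]
```

Starting from 20 tokens, a read, a write and one report leave 6 tokens, so the second report (cost 10) is rejected even though only three requests were served.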
6) Distributed implementation
6.1 Storage
In-process: ultra-fast, but the limit is not shared between instances (suitable for local "soft" limits).
Redis: de facto standard. INCR/EXPIRE, Lua scripts (atomicity), ZSET for sliding window, keys with TTL.
Envoy/NGINX/Kong/Traefik: built-in filters; convenient for perimeter.
Service Mesh: local limits on sidecar + global synchronization.
6.2 Atomicity and races
Lua in Redis: check and increment in a single step.
GCRA: store a single TAT per key, updated via CAS or a script.
Clock consistency: NTP, monotonic clocks.
Sharding: consistent hashing by key; avoid "hot" shards.
6.3 Geo-distribution
Local limits in regional clusters plus a coarse global limit on top.
CRDT/replication: use with care (replication lag, double consumption). Regional limits with a margin are preferable.
7) Policies and prioritization
Plans: Free/Pro/Enterprise with different `r`, `b` and quotas.
Priorities: "expensive" routes get a lower limit or a higher cost.
Lists: allow-list for integrations, deny-list by ASN/proxy/TOR.
Escalation: on repeated violations, lower the limit and introduce proof-of-work/captcha/challenges.
8) Examples of configs
8.1 Envoy (HTTP rate limit filter, pseudo)
```yaml
rate_limit:
  domain: public-api
  descriptors:
    - key: api_key
      rate_limit:
        unit: second
        requests_per_unit: 50
        burst: 100
    - key: api_key
      value: payments.write
      rate_limit:
        unit: second
        requests_per_unit: 5
        burst: 10
```
8.2 NGINX (lua + Redis, pseudo)
```nginx
lua_shared_dict limits 10m;

location /api/ {
    access_by_lua_block {
        -- arg_apikey is nil when the parameter is missing; fall back to a shared key
        local api_key = ngx.var.arg_apikey or "anonymous"
        local key = api_key .. ":" .. ngx.var.request_method .. ":" .. ngx.var.uri
        -- token bucket in Redis (evalsha); ratelimit_allow is a helper defined elsewhere
        local allowed, retry_after = ratelimit_allow(key, 50, 100) -- r=50/s, b=100
        if not allowed then
            ngx.header["Retry-After"] = retry_after
            return ngx.exit(429)
        end
    }
    proxy_pass http://backend;
}
```
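8.3 Concurrency limits (pseudocode)
```pseudo
on_request_start(key):
    if redis.incr_with_ttl("sem:" + key, ttl=60) > MAX_CONCURRENCY:
        redis.decr("sem:" + key)   # undo our own increment
        reject(429)                # rejected requests must not reach on_request_finish

on_request_finish(key):            # call only for requests that were admitted
    redis.decr("sem:" + key)
```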
8.4 GCRA (pseudocode)
```pseudo
# params: r tokens/sec, burst b
# the whole sequence must run atomically per key (Lua script or CAS)
tat = redis.get(key) or now
allowed_time = tat - (b / r)
if now < allowed_time:
    reject(429, retry_after = allowed_time - now)
tat_next = max(tat, now) + 1/r
redis.set(key, tat_next, ttl = ceil(b/r) + safety)
```
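For checking boundary behavior, the same steps can be run in-process. This is a sketch in which a plain dict stands in for Redis and `InMemoryGcra` is a hypothetical name; it is not safe for concurrent use.

```python
class InMemoryGcra:
    """GCRA per key, following the TAT update and acceptance test shown above."""

    def __init__(self, rate: float, burst: float) -> None:
        self.interval = 1.0 / rate     # 1/r: spacing between conforming requests
        self.tolerance = burst / rate  # b/r: how far TAT may run ahead of now
        self.tat: dict[str, float] = {}

    def allow(self, key: str, now: float) -> tuple[bool, float]:
        """Return (accepted, retry_after_seconds)."""
        tat = self.tat.get(key, now)
        allowed_time = tat - self.tolerance
        if now < allowed_time:
            return False, allowed_time - now
        self.tat[key] = max(tat, now) + self.interval
        return True, 0.0


gcra = InMemoryGcra(rate=2, burst=2)
decisions = [gcra.allow("k", now=0.0) for _ in range(4)]
```

With `r = 2` and `b = 2`, three back-to-back requests are accepted and the fourth is told to retry after 0.5 s. In other words, with a tolerance of `b/r` a cold key admits `b + 1` requests at once; if exactly `b` is wanted, a tolerance of `(b - 1)/r` gives that.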
9) Integration with retries, timeouts and circuit breaker
Retry budget: limit the share of retries to X% of the main traffic.
Jitter: always add jitter to backoff; it reduces synchronized bursts.
Circuit breaker: if the error rate is high (`5xx`, timeouts), lower the limits or switch some routes to "read-only."
Hedging: use sparingly; account for the cost of hedged requests so that they do not double budget consumption.
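A sketch of the first two points: exponential backoff with "full jitter" (one of several jitter strategies) and a retry budget expressed as an integer percentage of primary requests. The names and default values are assumptions.

```python
import random


def backoff_with_jitter(attempt: int, base: float = 0.1, cap: float = 10.0,
                        rng: random.Random = random.Random()) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))


class RetryBudget:
    """Allows retries only while they stay within `percent`% of primary requests."""

    def __init__(self, percent: int) -> None:
        self.percent = percent
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def can_retry(self) -> bool:
        # Integer arithmetic avoids floating-point edge cases at the boundary.
        if (self.retries + 1) * 100 > self.percent * self.requests:
            return False
        self.retries += 1
        return True


retry_budget = RetryBudget(percent=10)
for _ in range(20):
    retry_budget.record_request()
retry_decisions = [retry_budget.can_retry() for _ in range(3)]
```

With 20 primary requests and a 10% budget, two retries are allowed and the third is refused. In a real service the counters would be windowed or decayed rather than cumulative.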
10) Observability and management
Metrics: `rps_allowed`, `rps_blocked`, `429_rate`, `retry_after_avg`, `burst_used`, `quota_remaining`, `active_concurrency`.
Labels: by limit key, region, endpoint, plan.
Decision logs (sampled): cause of failure, current counters, key TTL.
Dashboards: heatmaps by key/endpoint, "hot" clients.
Alerts: 429 rate above 2-5% on critical routes, frequent quota exhaustion, shard imbalance.
11) Testing and validation
Contract tests of policies (if-then tables).
Load tests: bursts (10x `r`), long plateaus, "dirty" patterns (slow POST, long-lived connections).
Chaos traffic: uneven streams, clock drift, Redis/mesh outages.
Gradual rollout: canary rollout of limits, shadow mode (log decisions, but do not block) before enforcement.
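A minimal sketch of how shadow mode and a canary split might be wired around an existing limiter decision. The helper names, the logger name and the hash-based bucketing are assumptions, not a prescribed mechanism.

```python
import hashlib
import logging

logger = logging.getLogger("ratelimit.decisions")


def key_digest(key: str) -> str:
    """Short stable digest, so raw limit keys do not end up in logs."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def in_canary(key: str, percent: int) -> bool:
    """Deterministically place roughly `percent`% of keys in the enforcing group."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big") % 100 < percent


def should_proceed(limiter_allows: bool, enforce: bool, key: str) -> bool:
    """Return True if the request should be served."""
    if limiter_allows:
        return True
    logger.info("limit exceeded key=%s enforce=%s", key_digest(key), enforce)
    # Shadow mode (enforce=False): the decision is logged, the request still passes.
    return not enforce
```

A request handler could then call `should_proceed(allowed, enforce=in_canary(key, 5), key=key)` to enforce the limit for about 5% of keys while only logging for the rest.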
12) Edge cases and subtleties
Clock skew: use `now()` from a single source (the server), not from client headers.
Idempotency-Key: for write operations; reduces amplification on retries.
Batch operations: limit the size of the batch and the total cost.
Long-poll/WebSocket: limit the number of channels/subscriptions and duration.
Cold start: warm up or preload counters on start; otherwise expect bursts of false 429s.
Computationally expensive requests: apply the limit before executing business logic.
TTL: key TTLs must cover the window plus a safety margin.
13) Antibot escalations
Stages: warning → 429 + `Retry-After` → challenge (captcha/puzzle) → temporary block.
Signals: device-fingerprint, cursor/timing behavior, TOR/proxy/hosting.
Policies must be deterministic and reproducible for forensics.
14) Security and compliance
Deny-by-default on critical routes (write/finance).
Audit: keep decisions on limits for regulatory cases and incident reviews.
PII: limit keys must not disclose personal data in logs.
15) Prod Readiness Checklist
- Limit keys and cost model are defined.
- Algorithm (token bucket/GCRA) and storage (Redis/gateway) are selected.
- Per-tier policies plus global safety limits ("fuses").
- Concurrency limits for long-running operations.
- Retry-budget, backoff with jitter, integration with circuit breaker.
- Dashboards/alerts, sampled decision logs.
- Canary rollout and shadow mode.
- Tests of bursts, long plateaus, Redis failures, clock skew.
- Customer documentation: 429 status code, `Retry-After` header, exponential backoff examples.
16) TL;DR
Use token bucket or GCRA with Redis or a gateway, design limit keys and request costs, add concurrency semaphores for long operations, integrate with a retry budget and circuit breaker, monitor 429s and burst capacity usage, roll out limits via canary/shadow mode, and be sure to test bursts and storage failures.