Design rate limiters

1) Why rate limiting

Rate limiting protects the availability and economics of an API: it stops floods, retry storms, and credential stuffing; protects expensive operations (money transfers, report generation); and smooths the load on dependent systems (databases, external providers). A good design provides fairness, predictable latency, and clear SLOs.

Key objectives

Stable RPS and protection of backends from overload.
Controlled "elasticity" (burst allowance).
Client differentiation (per-user/per-organization/per-key/per-IP/per-region).
Cost model: different "prices" for different operations.

2) Limit types

Rate limits: requests per second/minute.
Quotas: total budget per period (day/month).
Concurrency: simultaneous operations (checkout, heavy jobs).
Bandwidth: bytes/sec (upload/download).
Weighted limits: the "cost" of a request depends on its complexity (for example, GraphQL complexity, batch size).
Adaptive: tightened when anomalies appear (suspicious activity, spikes of 401/403/5xx errors).

3) Algorithms and when to apply them

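3.1 Fixed window counter

Simple: one counter per interval (e.g. 100 requests/min).
Pros: minimal cost. Cons: "edge bursts" at window boundaries; a client can spend a full limit at the end of one window and another at the start of the next, briefly doubling the allowed rate.

When: admin panels and other cases where low accuracy is acceptable and cost must stay low.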
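
A minimal in-process sketch of the fixed window counter, including the edge burst it allows. The names `FixedWindowLimiter` and `allow` and the injectable `clock` are illustrative assumptions, not part of any library.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each `window`-second interval."""

    def __init__(self, limit, window, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock          # injectable so tests can control time
        self.state = {}             # key -> (window index, count)

    def allow(self, key):
        index = int(self.clock() // self.window)
        stored_index, count = self.state.get(key, (index, 0))
        if stored_index != index:   # a new window has started: reset the counter
            count = 0
        if count >= self.limit:
            return False
        self.state[key] = (index, count + 1)
        return True
```

With `limit=100` and `window=60`, 100 requests at second 59 and another 100 at second 60 are all accepted, which is the boundary effect described above.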

3.2 Sliding window (log / counter)

Log: stores timestamps of recent requests; accurate, but expensive in memory.
Counter: a weighted average of two adjacent windows (rolling); a compromise between accuracy and cost.

When: public APIs with medium traffic, where you need smoothness without complex math.
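
The counter variant can be sketched as follows, assuming the common approximation in which the previous window's count is weighted by how much of it still overlaps the sliding window. `SlidingWindowCounter` and its fields are hypothetical names chosen for this example.

```python
import time


class SlidingWindowCounter:
    """Approximate a sliding window using the current and previous fixed windows."""

    def __init__(self, limit, window, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.state = {}  # key -> (window index, current count, previous count)

    def allow(self, key):
        now = self.clock()
        index = int(now // self.window)
        stored, current, previous = self.state.get(key, (index, 0, 0))
        if index == stored + 1:      # moved to the next window
            previous, current = current, 0
        elif index != stored:        # idle for more than one window
            previous, current = 0, 0
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now % self.window) / self.window
        estimated = previous * overlap + current
        allowed = estimated + 1 <= self.limit
        if allowed:
            current += 1
        self.state[key] = (index, current, previous)
        return allowed
```

Unlike the fixed window, a client that used its full limit at second 59 is still rejected at second 60 and regains capacity gradually as the previous window slides out.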

3.3 Token bucket

Parameters: rate `r` (tokens/sec) and capacity `b` (burst). Each request "burns" a token.
Pros: natural burst allowance, simple implementation. Cons: no strictly even output.

When: almost always for RPS limits, if bursts of up to `b` are acceptable.
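
A small single-process sketch of the token bucket with lazy refill. The class name, the `cost` argument, and the injectable `clock` are assumptions made for illustration; a shared limiter would keep the same two values (tokens, last update) in a store such as Redis.

```python
import time


class TokenBucket:
    """Refill at `rate` tokens/sec up to `capacity`; each request spends `cost` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start full so an idle client may burst
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Lazy refill: add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True
```

With `rate=50` and `capacity=100`, an idle client can send 100 requests at once and then sustains 50 per second.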

3.4 Leaky bucket

A queue that "leaks" requests at a fixed rate.
Pros: even output flow. Cons: adds latency.

When: smoothing traffic to "fragile" external providers.
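
One way to sketch the leaky bucket is as a virtual queue: instead of holding requests, it computes how long each one must wait so that departures are spaced `1/rate` apart, and rejects when the wait would exceed the queue depth. `LeakyBucket`, `schedule`, and `queue_size` are illustrative names, not a library API.

```python
import time


class LeakyBucket:
    """Space accepted requests 1/rate seconds apart; reject when the queue is full."""

    def __init__(self, rate, queue_size, clock=time.monotonic):
        self.interval = 1.0 / rate
        self.max_wait = queue_size * self.interval  # longest tolerated queueing delay
        self.clock = clock
        self.next_free = clock()                    # earliest departure slot

    def schedule(self):
        """Return the delay in seconds before sending, or None if the queue is full."""
        now = self.clock()
        start = max(self.next_free, now)
        wait = start - now
        if wait > self.max_wait:
            return None
        self.next_free = start + self.interval
        return wait
```

The caller sleeps for the returned delay before forwarding the request, which gives the provider an even flow at the price of added latency.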

3.5 GCRA (Generic Cell Rate Algorithm)

Theoretical arrival time (TAT) model:

`TAT_next = max(TAT_current, now) + 1/r`; the request is accepted if `now >= TAT_current - b/r`.
Pros: strict, accurate, little memory (one TAT per key). Cons: harder to understand.

When: you need strict control and smoothness, or distributed limits.
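
The TAT update can be sketched in a few lines. This is an in-process illustration under the assumption that the burst tolerance is `b/r`; the class and method names are hypothetical, and a distributed version would keep the per-key TAT in a shared store and update it atomically.

```python
import time


class GCRA:
    """Keep one theoretical arrival time (TAT) per key."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.interval = 1.0 / rate      # emission interval, 1/r
        self.tolerance = burst / rate   # burst tolerance, b/r
        self.clock = clock
        self.tat = {}                   # key -> TAT

    def allow(self, key):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        tat = self.tat.get(key, now)
        allowed_at = tat - self.tolerance
        if now < allowed_at:
            return False, allowed_at - now
        self.tat[key] = max(tat, now) + self.interval
        return True, 0.0
```

With a tolerance of exactly `b/r`, an idle key admits `b + 1` back-to-back requests before the first rejection; use `(b - 1)/r` if the burst must be capped at `b`.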

3.6 Concurrency semaphores

A counter of active operations: a request enters only if a "ticket" is available and releases it on exit.
When: long-running operations, streams, WebSocket, downloads.
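
A per-key semaphore can be sketched in-process as below. `ConcurrencyLimiter`, `try_acquire`, and `release` are names invented for this example; the important detail is that release must run on every exit path, including errors.

```python
import threading


class ConcurrencyLimiter:
    """Cap the number of simultaneously active operations per key."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = {}                 # key -> number of operations in flight
        self.lock = threading.Lock()

    def try_acquire(self, key):
        with self.lock:
            if self.active.get(key, 0) >= self.max_active:
                return False             # no free "ticket"
            self.active[key] = self.active.get(key, 0) + 1
            return True

    def release(self, key):
        with self.lock:
            remaining = self.active.get(key, 0) - 1
            if remaining > 0:
                self.active[key] = remaining
            else:
                self.active.pop(key, None)   # drop idle keys to bound memory
```

A typical call site wraps the operation in `try`/`finally` so the ticket is returned even when the operation fails.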

4) Limit Key Model

A key is a combination of attributes:

`client_id`/`api_key`/`user_id`/`org_id`
`IP/ASN/geo` (coarse protection)
`endpoint/method` (hot routes)
`scope/plan/tier` (monetization)
`idempotency_key` (write operations)

Use a hierarchy: first strict per-key limits, then per-organization, then global.
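
The hierarchy can be sketched as an ordered list of (key, limit) levels that must all have room before any of them is charged. The function name, the key strings, and the plain counters are assumptions for illustration; in practice each level would be a token bucket or GCRA state.

```python
def check_hierarchy(usage, levels):
    """Charge one unit at every level, or none.

    `levels` is ordered from the strictest key to the global one.
    Returns None when allowed, otherwise the key of the level that rejected.
    """
    for key, limit in levels:
        if usage.get(key, 0) >= limit:
            return key
    # Every level has room: consume from all of them together.
    for key, _ in levels:
        usage[key] = usage.get(key, 0) + 1
    return None


def levels_for(api_key, org_id):
    # Hypothetical limits: 2 per key, 3 per organization, 100 globally.
    return [(f"key:{api_key}", 2), (f"org:{org_id}", 3), ("global", 100)]
```

Checking all levels before charging any of them avoids spending organization or global budget on a request that the per-key limit rejects anyway.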

5) Cost model

Define a cost function `cost(q)`:

GraphQL: field complexity × depth.
REST: response/request size, operation type (read = 1, write = 3, report = 10).
Batch: `cost = min(n, cap)`.

Limit tokens rather than "requests": `budget -= cost(q)`.
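
The cost rules above translate into small functions. The weights come from the text; the function names and the `spend` helper are illustrative assumptions.

```python
OPERATION_COST = {"read": 1, "write": 3, "report": 10}  # weights from the text


def rest_cost(operation):
    return OPERATION_COST[operation]


def graphql_cost(field_complexity, depth):
    return field_complexity * depth


def batch_cost(n, cap):
    return min(n, cap)


def spend(budget, cost):
    """Return (remaining budget, allowed); the budget is untouched on rejection."""
    if cost > budget:
        return budget, False
    return budget - cost, True
```

For example, a budget of 10 covers one write (3), then rejects a report (10), and still accepts a batch whose cost is capped at 5.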

6) Distributed implementation

6.1 Storage

In-process: ultra-fast, but not a shared limit (suitable for local "soft" limits).
Redis: the de facto standard. INCR/EXPIRE, Lua scripts (atomicity), ZSET for sliding windows, keys with TTL.
Envoy/NGINX/Kong/Traefik: built-in filters; convenient at the perimeter.
Service mesh: local limits on the sidecar plus global synchronization.

6.2 Atomicity and races

Lua in Redis: check and increment in one step.
GCRA: store a single TAT, updated with CAS or a script.
Clock consistency: NTP, monotonic timers.
Sharding: consistent hashing by key; avoid "hot" shards.
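
Sharding limit keys by consistent hash can be sketched with a ring of virtual nodes. The `HashRing` class, the node names, and the choice of SHA-256 with 100 virtual nodes are assumptions for this example rather than a recommendation from the text.

```python
import bisect
import hashlib


class HashRing:
    """Map limit keys to storage nodes using consistent hashing."""

    def __init__(self, nodes, vnodes=100):
        # Each node owns `vnodes` points on the ring to even out the distribution.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, key):
        # The first point clockwise from the key's hash owns the key.
        i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]
```

When a node is removed, only the keys it owned move to other nodes; the rest keep their shard, so their counters stay valid. A single very active key still lands on one shard, which is why hot keys need separate handling.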

6.3 Geo-distribution

Local limits on regional clusters plus a coarse global limit on top.
CRDT/replication: use with care (replication lag, double spending). Regional limits with a safety margin are preferable.

7) Policies and prioritization

Plans: Free/Pro/Enterprise with different `r`, `b`, and quotas.
Priorities: "expensive" routes get a lower limit or a higher cost.
Lists: allow-list for integrations, deny-list by ASN/proxy/Tor.
Escalation: on repeated violations, lower the limit and introduce proof-of-work/captcha/challenges.

8) Examples of configs

8.1 Envoy (HTTP rate limit filter, pseudo)

    rate_limit:
      domain: public-api
      descriptors:
        - key: api_key
          rate_limit:
            unit: second
            requests_per_unit: 50
            burst: 100
        - key: api_key
          value: payments.write
          rate_limit:
            unit: second
            requests_per_unit: 5
            burst: 10

8.2 NGINX (lua + Redis, pseudo)

    lua_shared_dict limits 10m;

    location /api/ {
        access_by_lua_block {
            -- a missing apikey argument is nil and would break concatenation
            local api_key = ngx.var.arg_apikey or "anonymous"
            local key = api_key .. ":" .. ngx.var.request_method .. ":" .. ngx.var.uri
            -- token bucket in Redis (evalsha)
            local allowed, retry_after = ratelimit_allow(key, 50, 100) -- r=50/s, b=100
            if not allowed then
                ngx.header["Retry-After"] = retry_after
                return ngx.exit(429)
            end
        }
        proxy_pass http://backend;
    }

8.3 Concurrency limits (pseudocode)

    on_request_start(key):
        if redis.incr_with_ttl("sem:" + key, ttl=60) > MAX_CONCURRENCY:
            redis.decr("sem:" + key)
            reject(429)

    on_request_finish(key):
        redis.decr("sem:" + key)

8.4 GCRA (pseudocode)

    params: r tokens/sec, burst b

    tat = redis.get(key) or now
    allowed_time = tat - (b / r)
    if now < allowed_time:
        reject(429, retry_after = allowed_time - now)
    tat_next = max(tat, now) + 1/r
    redis.set(key, tat_next, ttl = ceil(b/r) + safety)

9) Integration with retries, timeouts and circuit breaker

Retry budget: cap retries at X% of primary traffic.
Jitter: always add jitter to backoff; it reduces synchronized bursts.
Circuit breaker: when the error rate is high (`5xx`, timeouts), lower the limits or switch some routes to "read-only".
Hedging: use sparingly; account for the cost of hedged requests so that they do not double budget consumption.
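
A client-side sketch of a retry budget combined with jittered exponential backoff. It assumes the "full jitter" variant (a uniform delay between zero and the exponential ceiling); `RetryBudget`, `backoff_with_jitter`, and the default `base`/`cap` values are illustrative choices.

```python
import random


def backoff_with_jitter(attempt, base=0.1, cap=10.0):
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


class RetryBudget:
    """Allow retries only while they stay within `ratio` of primary requests."""

    def __init__(self, ratio):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        if self.retries + 1 > self.ratio * self.requests:
            return False            # budget exhausted: fail fast instead of retrying
        self.retries += 1
        return True
```

A production version would count over a rolling time window rather than for the lifetime of the process, so that the budget tracks current traffic.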

10) Observability and management

Metrics: `rps_allowed`, `rps_blocked`, `429_rate`, `retry_after_avg`, `burst_used`, `quota_remaining`, `active_concurrency`.
Labels: by limit key, region, endpoint, plan.
Decision logs (sampled): rejection reason, current counters, key TTL.
Dashboards: heat maps by key/endpoint, "hot" clients.
Alerts: 429 rate above 2-5% on critical routes, frequent quota exhaustion, shard imbalance.

11) Testing and validation

Contract tests of policies (if-then tables).
Load tests: bursts (10× `r`), long plateaus, "dirty" patterns (slow POST, long-lived connections).
Chaos traffic: uneven streams, clock drift, Redis/mesh outages.
Gradual enablement: canary rollout of limits and shadow decisions (log, but do not block) before enforcement.
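
Shadow mode can be sketched as a thin wrapper around any limiter decision: it records what would have been blocked but lets the request through until enforcement is switched on. `ShadowLimiter` and the `decide` callable are hypothetical names for this example.

```python
class ShadowLimiter:
    """Wrap a limiter decision; in shadow mode only record would-be rejections."""

    def __init__(self, decide, enforce=False):
        self.decide = decide        # callable: key -> bool (True means allowed)
        self.enforce = enforce
        self.would_block = []       # stand-in for a sampled decision log

    def allow(self, key):
        allowed = self.decide(key)
        if not allowed:
            self.would_block.append(key)
            if not self.enforce:
                return True         # shadow mode: log, but do not block
        return allowed
```

Comparing the recorded rejections with real traffic before setting `enforce=True` shows whether a new limit would hit legitimate clients.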

12) Edge cases and subtleties

Clock skew: take `now()` from a single source (the server), not from client headers.
Idempotency-Key: for writes; reduces amplification on retries.
Batch operations: limit both the batch size and the total cost.
Long-poll/WebSocket: limit the number of channels/subscriptions and their duration.
Cold start: pre-warm or preload counters; otherwise expect bursts of false 429s.
Computationally expensive requests: apply the limit before executing business logic.
TTL: key TTLs must cover the window plus a safety margin.

13) Antibot escalations

Stages: warning → 429 + `Retry-After` → challenge (captcha/puzzle) → temporary block.
Signals: device fingerprint, cursor/timing behavior, Tor/proxy/hosting origin.
Policies must be deterministic and reproducible for forensics.

14) Security and compliance

Deny-by-default on critical routes (write/finance).
Audit: retain limit decisions for regulatory cases and incident reviews.
PII: limit keys must not disclose personal data in logs.

15) Production readiness checklist

Limit keys and cost model are defined.
Algorithm (token bucket/GCRA) and storage (Redis/gateway) are selected.
Per-tier client policies plus global safety limits.
Concurrency limits for long-running operations.
Retry budget, backoff with jitter, integration with circuit breaker.
Dashboards/alerts, sampled decision logs.
Canary rollout and shadow mode.
Tests for bursts, long plateaus, Redis failures, clock skew.
Customer documentation: 429 codes, `Retry-After`, exponential backoff examples.

16) TL;DR

Use token bucket or GCRA with Redis or a gateway, design limit keys and request costs, add concurrency semaphores for long operations, integrate with retry budgets and circuit breakers, monitor 429s and burst capacity, roll out limits via canary/shadow mode, and be sure to test bursts and storage failures.
