Design rate limiters

1) Why rate limiting

Rate limiting protects the availability and economics of APIs: it stops floods, retry storms, and credential stuffing, protects expensive operations (money transactions, report generation), and smooths the load on dependent systems (databases, providers). A good design provides fairness, predictable latency, and clear SLOs.

Key objectives

Stable RPS and backend overload protection.
Controlled "elasticity" (burst allowance).
Customer differentiation (per-user/per-organization/per-key/per-IP/per-region).
Cost model: different "prices" for different operations.

2) Limit types

RPS limits: requests per second/minute.
Quotas: total budget per period (day/month).
Concurrency: simultaneous operations (checkout, heavy jobs).
Bandwidth: bytes per second (upload/download).
Weighted limits: the "cost" of a request depends on its complexity (for example, GraphQL complexity, batch size).
Adaptive: limits tighten on anomalies (suspicious activity, spikes in 401/403/5xx errors).

3) Algorithms and when to apply them

3.1 Fixed window counter

Simple: one counter per interval (e.g. 100 requests/min).
Pros: minimal cost. Cons: "edge bursts" at window boundaries, where up to twice the limit can pass within a short span.

When: admin panels, cases where low accuracy is acceptable and cost must stay low.
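
The fixed window and its boundary effect can be sketched in a few lines. This is a minimal in-memory illustration, not a production limiter: the class name `FixedWindowLimiter` and the injectable `clock` parameter are assumptions made for the example, and old windows are never evicted here.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed window."""

    def __init__(self, limit, window_s, clock=time.time):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the behavior can be tested
        self.counts = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key):
        window = int(self.clock() // self.window_s)
        slot = (key, window)
        if self.counts[slot] >= self.limit:
            return False
        self.counts[slot] += 1
        return True
```

With `limit=2` and a 60-second window, two requests at second 59 and two more at second 60 are all accepted, which is the edge burst described above.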

3.2 Sliding window (log / counter)

Log: stores timestamps of recent requests; accurate, but expensive in memory.
Counter: a weighted average of two adjacent windows (rolling); a compromise between accuracy and cost.

When: medium-traffic public APIs that need smoothness without complex math.
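
The counter variant can be sketched as follows, assuming the previous window's count is weighted by the fraction of it that still overlaps the sliding window. The class name `SlidingWindowCounter` and the `clock` parameter are illustrative assumptions.

```python
import time


class SlidingWindowCounter:
    """Approximate a sliding window from two adjacent fixed-window counters."""

    def __init__(self, limit, window_s, clock=time.time):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.state = {}  # key -> (window index, current count, previous count)

    def allow(self, key):
        now = self.clock()
        window = int(now // self.window_s)
        idx, cur, prev = self.state.get(key, (window, 0, 0))
        if window == idx + 1:      # moved into the next window
            prev, cur = cur, 0
        elif window > idx + 1:     # idle for more than one full window
            prev, cur = 0, 0
        elapsed = (now % self.window_s) / self.window_s
        # The previous window counts only for the part still inside the slide.
        estimate = prev * (1.0 - elapsed) + cur
        allowed = estimate < self.limit
        if allowed:
            cur += 1
        self.state[key] = (window, cur, prev)
        return allowed
```

A client that used its full limit in one window is still blocked at the start of the next one and regains capacity gradually, which removes the boundary burst of the fixed window.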

3.3 Token bucket

Parameters: rate `r` (tokens/sec) and capacity `b` (burst). Each request consumes a token.
Pros: natural burst allowance, simple implementation. Cons: no strict evenness of the output.

When: almost always for RPS limits, if bursts up to `b` are acceptable.
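
A minimal token bucket with lazy refill might look like this. It is a single-process sketch; the class name `TokenBucket`, the `cost` argument, and the `retry_after` helper are assumptions for illustration.

```python
import time


class TokenBucket:
    """Token bucket with rate r (tokens/sec) and capacity b (burst)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill lazily for the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def retry_after(self, cost=1.0):
        """Seconds until `cost` tokens are available, as of the last update."""
        return max(0.0, (cost - self.tokens) / self.rate)
```

Passing a `cost` greater than 1 turns the same bucket into a weighted limiter, and `retry_after` gives a value suitable for a `Retry-After` header.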

3.4 Leaky bucket

A queue that drains ("leaks") at a fixed rate.
Pros: even output flow. Cons: added latency while requests wait in the queue.

When: smoothing traffic to "fragile" external providers.
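
One way to sketch the leaky bucket without holding requests in memory is to track when the next send slot is free and return the delay each request must wait. The name `LeakyBucketShaper` and the `max_queue` parameter are assumptions; a real shaper would also need to perform the wait.

```python
import time


class LeakyBucketShaper:
    """Space requests 1/rate apart; reject when the queue would be too long."""

    def __init__(self, rate, max_queue, clock=time.monotonic):
        self.interval = 1.0 / rate                   # gap between sends
        self.max_delay = max_queue * self.interval   # longest tolerated wait
        self.clock = clock
        self.next_free = clock()                     # next free send slot

    def schedule(self):
        """Return the delay in seconds before sending, or None if full."""
        now = self.clock()
        start = max(now, self.next_free)
        delay = start - now
        if delay > self.max_delay:
            return None  # queue full: caller should answer 429
        self.next_free = start + self.interval
        return delay
```

The output rate never exceeds `rate`, at the price of the delay returned to each caller.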

3.5 GCRA (Generic Cell Rate Algorithm)

Theoretical arrival time (TAT) model:
  • `TAT_next = max(TAT_current, now) + 1/r`; the request is accepted if `now >= TAT_current - burst/r`.
  • Pros: strict, accurate, little memory (one TAT per key). Cons: harder to understand.

When: strict control and smoothness are needed; distributed limits.

3.6 Concurrency semaphores

A counter of active operations: a request enters only if a "ticket" is free and releases it on exit.
When: long-running operations, streams, WebSocket, downloads.

4) Limit Key Model

Key = a combination of attributes:
  • `client_id`/`api_key`/`user_id`/`org_id`
  • `IP`/`ASN`/`geo` (coarse protection)
  • `endpoint`/`method` (hot routes)
  • `scope`/`plan`/`tier` (monetization)
  • `idempotency_key` (write operations)
  • Use a hierarchy: strict per-key limits first, then per-organization, then global.
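
The hierarchy can be sketched as a list of keys checked from most specific to most general. The limits in `LIMITS`, the key formats, and the function names are hypothetical; the point is that every level is checked before any counter is incremented, so a rejection consumes nothing.

```python
from collections import Counter

# Hypothetical per-window limits for each level of the hierarchy.
LIMITS = {"key": 2, "org": 3, "global": 100}


def limit_keys(api_key, org_id):
    """Return (level, counter key) pairs, most specific first."""
    return [
        ("key", f"key:{api_key}"),
        ("org", f"org:{org_id}"),
        ("global", "global"),
    ]


def allow(counters, api_key, org_id):
    keys = limit_keys(api_key, org_id)
    # Check every level first so a rejected request consumes nothing.
    if any(counters[k] >= LIMITS[level] for level, k in keys):
        return False
    for _, k in keys:
        counters[k] += 1
    return True
```

Here a second API key of the same organization can still be rejected by the organization limit even though its own per-key counter has room.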

5) Cost model

Define a cost `cost(q)` for each request:
  • GraphQL: field complexity × depth.
  • REST: response/request size, operation type (read = 1, write = 3, report = 10).
  • Batch: `cost = min(n, cap)`.
  • Limit tokens, not "requests": `budget -= cost(q)`.
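
As a small worked example of the cost model, the sketch below uses the operation weights from the list above. The batch cap value `BATCH_CAP` and the function names are assumptions chosen for illustration.

```python
# Operation weights from the list above; the cap value is hypothetical.
OP_COST = {"read": 1, "write": 3, "report": 10}
BATCH_CAP = 50


def request_cost(kind="read", batch_size=None):
    """Cost of one request: capped batch size, or the operation weight."""
    if batch_size is not None:
        return min(batch_size, BATCH_CAP)
    return OP_COST[kind]


def charge(budget, cost):
    """Return (allowed, remaining budget) after trying to spend `cost`."""
    if cost > budget:
        return False, budget
    return True, budget - cost
```

With a budget of 10 tokens, one write leaves 7, while a report request arriving with only 5 tokens left is rejected and the budget stays unchanged.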

6) Distributed implementation

6.1 Storage

In-process: ultra-fast, but not a shared limit (suitable for local "soft" limits).
Redis: the de facto standard. INCR/EXPIRE, Lua scripts (atomicity), ZSET for sliding windows, keys with TTL.
Envoy/NGINX/Kong/Traefik: built-in filters; convenient at the perimeter.
Service mesh: local limits on the sidecar + global synchronization.

6.2 Atomicity and races

Lua in Redis: check and increment in a single step.
GCRA: store a single TAT per key, updated via CAS or a script.
Clock consistency: NTP, monotonic timers.
Sharding: consistent hashing by key; avoid "hot" shards.
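
The sharding point can be illustrated with a small consistent-hash ring that maps a limit key to a storage node. This is a sketch under assumptions: the node names, the `vnodes` count, and the choice of SHA-256 are illustrative, not a recommendation from the article.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes for mapping keys to shards."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets `vnodes` points on the ring to even out the load.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        # First ring point clockwise from the key's hash (wrapping around).
        i = bisect.bisect(self.points, self._hash(key)) % len(self.points)
        return self.ring[i][1]
```

When a node is removed, only the keys that lived on it move to other nodes; the counters of all other keys stay where they were.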

6.3 Geo-distribution

Local limits on regional clusters + a coarse global limit on top.
CRDT/replication: use with care (replication lag, double spending). Regional limits with a margin are preferable.

7) Policies and prioritization

Plans: Free/Pro/Enterprise with different `r`, `b`, and quotas.
Priorities: "expensive" routes get a lower limit or a higher cost.
Lists: allow-list for integrations, deny-list by ASN/proxy/TOR.
Escalation: on repeated violations, lower the limit and introduce proof-of-work/captcha/challenges.

8) Examples of configs

8.1 Envoy (HTTP rate limit filter, pseudo)

    rate_limit:
      domain: public-api
      descriptors:
        - key: api_key
          rate_limit:
            unit: second
            requests_per_unit: 50
            burst: 100
        - key: api_key
          value: payments.write
          rate_limit:
            unit: second
            requests_per_unit: 5
            burst: 10

8.2 NGINX (lua + Redis, pseudo)

    lua_shared_dict limits 10m;

    location /api/ {
        access_by_lua_block {
            -- limit key: API key + method + path (fall back if no apikey argument)
            local key = (ngx.var.arg_apikey or "anonymous") .. ":" .. ngx.var.request_method .. ":" .. ngx.var.uri
            -- token bucket in Redis (evalsha); ratelimit_allow is a helper defined elsewhere
            local allowed, retry_after = ratelimit_allow(key, 50, 100) -- r=50/s, b=100
            if not allowed then
                ngx.header["Retry-After"] = retry_after
                return ngx.exit(429)
            end
        }
        proxy_pass http://backend;
    }

8.3 Concurrency limits (pseudocode)

    on_request_start(key):
        if redis.incr_with_ttl("sem:" + key, ttl=60) > MAX_CONCURRENCY:
            redis.decr("sem:" + key)
            reject(429)

    on_request_finish(key):
        redis.decr("sem:" + key)
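
The same acquire/release pattern can be sketched in-process in Python. This is a single-process illustration only; the names `ConcurrencyLimiter`, `TooManyConcurrent`, and `slot` are assumptions, and a shared limit would keep the counter in Redis as in the pseudocode above.

```python
import threading
from contextlib import contextmanager


class TooManyConcurrent(Exception):
    """Raised when a key already holds the maximum number of slots."""


class ConcurrencyLimiter:
    def __init__(self, max_concurrency):
        self.max = max_concurrency
        self.active = {}  # key -> number of operations in flight
        self.lock = threading.Lock()

    def try_acquire(self, key):
        with self.lock:
            if self.active.get(key, 0) >= self.max:
                return False
            self.active[key] = self.active.get(key, 0) + 1
            return True

    def release(self, key):
        with self.lock:
            n = self.active.get(key, 0)
            if n <= 1:
                self.active.pop(key, None)
            else:
                self.active[key] = n - 1

    @contextmanager
    def slot(self, key):
        """Hold one slot for the duration of a `with` block."""
        if not self.try_acquire(key):
            raise TooManyConcurrent(key)
        try:
            yield
        finally:
            self.release(key)  # released even if the operation fails
```

Wrapping the operation in `with limiter.slot(key):` guarantees the release on errors, which plays the role of the TTL safety net in the Redis variant.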

8.4 GCRA (pseudocode)

    params: r tokens/sec, burst b
    tat = redis.get(key) or now
    allowed_time = tat - (b / r)
    if now < allowed_time:
        reject(429, retry_after = allowed_time - now)
    tat_next = max(tat, now) + 1/r
    redis.set(key, tat_next, ttl = ceil(b/r) + safety)
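
A runnable in-memory version of the same check is sketched below. The class name `GCRA` and the `clock` parameter are assumptions for the example; a shared deployment would keep the TAT in Redis and update it atomically, as section 6.2 notes.

```python
import time


class GCRA:
    """GCRA limiter: one theoretical arrival time (TAT) per key."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.interval = 1.0 / rate     # spacing between requests, 1/r
        self.tolerance = burst / rate  # how far ahead TAT may run, b/r
        self.clock = clock
        self.tat = {}                  # key -> theoretical arrival time

    def allow(self, key):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        tat = self.tat.get(key, now)
        allowed_at = tat - self.tolerance
        if now < allowed_at:
            return False, allowed_at - now
        self.tat[key] = max(tat, now) + self.interval
        return True, 0.0
```

With this formulation an idle key accepts `b + 1` requests back to back before the first rejection, after which requests are admitted one per `1/r` seconds.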

9) Integration with retries, timeouts, and circuit breakers

Retry budget: limit the share of retries to X% of primary traffic.
Jitter: always add jitter to backoff; it reduces synchronized bursts.

Circuit breaker: when the error rate is high (`5xx`, timeouts), lower the limits or switch some routes to "read-only".

Hedging: use sparingly; count hedged requests toward the cost so the budget is not spent twice.
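
The retry budget and jitter can be sketched together on the client side. The "full jitter" variant of exponential backoff is one common choice and is assumed here; `backoff_delay`, `RetryBudget`, and their default values are hypothetical names and numbers.

```python
import random


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random.random):
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * 2 ** attempt)


class RetryBudget:
    """Allow retries only up to `percent` % of the primary requests seen."""

    def __init__(self, percent=10):
        self.percent = percent
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        # Integer arithmetic avoids float rounding at the boundary.
        if (self.retries + 1) * 100 > self.percent * self.requests:
            return False
        self.retries += 1
        return True
```

A client would call `record_request()` for each primary request, ask `can_retry()` before retrying, and sleep for `backoff_delay(attempt)` between attempts. The counters here grow without bound; a real budget would be tracked over a rolling period.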

10) Observability and management

Metrics: `rps_allowed`, `rps_blocked`, `429_rate`, `retry_after_avg`, `burst_used`, `quota_remaining`, `active_concurrency`.
Labels: by limit key, region, endpoint, plan.
Decision logs (sampled): rejection reason, current counters, key TTL.
Dashboards: heat maps by key/endpoint, "hot" clients.
Alerts: 429 rate above 2-5% on critical routes, frequent quota exhaustion, shard imbalance.

11) Testing and validation

Contract tests for policies (if-then tables).
Load tests: bursts (10× `r`), long plateaus, "dirty" patterns (slow POST, long-lived connections).
Chaos traffic: uneven streams, clock drift, Redis/mesh outages.
Staged enablement: canary rollout of limits, shadow decisions (log but do not block) before enforcement.

12) Edge cases and subtleties

Clock skew: take `now()` from a single source (the server), not from client headers.
Idempotency-Key: for writes; reduces amplification on retries.
Batch operations: limit both the batch size and the total cost.
Long-poll/WebSocket: limit the number of channels/subscriptions and their duration.
Cold start: warm up or preload counters; otherwise expect bursts of false 429s.
Computationally expensive requests: apply the limit before business logic executes.
TTL: key TTLs must cover the window plus a safety margin.

13) Antibot escalations

Stages: warning → 429 + `Retry-After` → challenge (captcha/puzzle) → temporary block.
Signals: device fingerprint, cursor/timing behavior, TOR/proxy/hosting.
Policies must be deterministic and reproducible for forensics.
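
One way to keep the escalation deterministic is to make the stage a pure function of the violation count. The thresholds below are hypothetical, and `throttle` stands for the "429 + `Retry-After`" stage.

```python
# Stage names follow the ladder above; thresholds are hypothetical.
STAGES = ("warning", "throttle", "challenge", "block")
THRESHOLDS = (1, 3, 6, 10)  # violations needed to reach each stage


def stage_for(violations):
    """Return the escalation stage for a violation count, or None if clean."""
    stage = None
    for name, threshold in zip(STAGES, THRESHOLDS):
        if violations >= threshold:
            stage = name
    return stage
```

Because the result depends only on the recorded count, the same decision can be replayed later from the decision logs.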

14) Security and compliance

Deny-by-default on critical routes (write/finance).
Audit: keep decisions on limits for regulatory cases and incident reviews.
PII: limit keys must not disclose personal data in logs.

15) Prod Readiness Checklist

  • Limit keys and cost model are defined.
  • Algorithm (token bucket/GCRA) and storage (Redis/gateway) are selected.
  • Per-tier client policies + global safety limits ("fuses").
  • Concurrency limits for long-running operations.
  • Retry budget, backoff with jitter, integration with the circuit breaker.
  • Dashboards/alerts, sampled decision logs.
  • Canary rollout and shadow mode.
  • Tests for bursts, long plateaus, Redis failures, clock skew.
  • Customer documentation: 429 codes, `Retry-After` headers, exponential backoff examples.

16) TL;DR

Use token bucket or GCRA with Redis or a gateway, design limit keys and request costs, add concurrency semaphores for long operations, integrate with a retry budget and circuit breaker, monitor 429s and burst capacity, roll out limits via canary/shadow mode, and be sure to test bursts and storage failures.
