Rate Limits and Load Control
TL;DR
A reliable scheme combines limits and quotas at several levels (edge→BFF→service), fair resource allocation (per-tenant/key/route), SLO-adaptive throttling, and backpressure instead of silent timeouts. Use token/leaky bucket for rate, sliding window for quota accounting, concurrency limits for heavy operations, dynamic throttling under degradation, and circuit breakers toward fragile upstreams. All of it observable and backed by playbooks.
1) Why limits in iGaming/fintech
SLO and resilience: protection against retry avalanches, tournament/event peaks, payment spikes.
Fairness: no single tenant or partner drains the entire budget.
Anti-abuse/bots: login/registration, spam, directory scraping.
Cost: containment of expensive calls (KYC, reports, aggregations).
Compliance/fair use: formal "fair use" quotas in contracts.
2) Limit taxonomy
3) Algorithms and where to apply them
3.1 Token Bucket (default)
Parameters: `rate` (tokens/sec), `burst` (maximum reserve).
A great default for API reads, payment/status checks, BFF.
On an empty bucket → 429 + `Retry-After` (sketch below).
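A minimal in-process sketch, assuming a single node (distributed storage is covered in §13); class and parameter names are illustrative:

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, stores at most `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at `burst`.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller maps this to 429 + Retry-After
```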
3.2 Leaky Bucket (averaging)
Guarantees a smoothed RPS; useful for webhooks so workers are not overwhelmed (sketch below).
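A sketch of the "leaky bucket as a meter" variant under the same single-node assumption; for the queue-smoothing variant, put a bounded queue in front of the workers instead:

```python
import time

class LeakyBucket:
    """Admits ~`rate` requests/sec on average; excess overflows (reject)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # leak rate, requests/sec
        self.capacity = capacity  # volume tolerated before overflow
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # The bucket drains continuously at `rate`.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False  # overflow → 429, sender retries later
```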
3.3 Fixed Window vs Sliding Window
Fixed is simple but suffers from window-boundary bursts; Sliding gives fair accounting across the window (minute/hour/day).
Use Sliding for contractual quotas (sketch below).
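A sketch of the common low-memory approximation: weight the previous fixed window's count by its overlap with the sliding window (exact sliding-log variants cost more memory):

```python
import time

class SlidingWindowCounter:
    """Approximate sliding window: prev-window count weighted by overlap."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.curr_start = 0.0
        self.curr = 0
        self.prev = 0

    def allow(self) -> bool:
        now = time.time()
        start = now - (now % self.window)
        if start != self.curr_start:
            # Roll windows; if more than one window passed, prev is stale.
            self.prev = self.curr if start - self.curr_start == self.window else 0
            self.curr, self.curr_start = 0, start
        overlap = 1.0 - (now - start) / self.window
        if self.prev * overlap + self.curr < self.limit:
            self.curr += 1
            return True
        return False
```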
3.4 Concurrency Limits
A cap on simultaneously active tasks. Ideal for exports/reports, KYC batches, reprocessing.
On exhaustion → 429/503 + queue/polling (pseudo in §7).
3.5 Cost/Complexity Limiter
GraphQL/search: compute a request "cost" from depth/cardinality/extensions.
Clip or degrade expensive requests and respond with a hint (see the cost-guard fragment in §7).
4) Limit key dimensions
per-tenant (multi-tenancy, fairness),
per-api_key/client_id (partners),
per-route (stricter for critical mutations),
per-user/device/IP/ASN/geo,
per-BIN/country (payment methods, protection of issuers and providers),
per-method (GET softer, POST/PUT stricter).
Composition: base key + a "risk multiplier" (new account, Tor/proxy, high chargeback risk), as in the sketch below.
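A sketch of key composition with a risk multiplier; the context fields (`account_age_days`, `is_proxy_or_tor`, `chargeback_risk`) are illustrative assumptions, not a fixed schema:

```python
def limit_for(base_rate: float, ctx: dict) -> tuple[str, float]:
    """Compose the limiter key and scale the base rate by risk signals."""
    key = f"{ctx['tenant']}:{ctx['route']}:{ctx.get('api_key', 'anon')}"
    multiplier = 1.0
    if ctx.get("account_age_days", 0) < 7:
        multiplier *= 0.5    # new accounts get half the budget
    if ctx.get("is_proxy_or_tor"):
        multiplier *= 0.25
    if ctx.get("chargeback_risk") == "high":
        multiplier *= 0.25
    return key, base_rate * multiplier
```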
5) SLO-adaptive throttling
Enable dynamic throttling when the SLO is at risk:
- Triggers: `p95 latency↑`, `5xx↑`, `queue len↑`, `CPU/IO saturation`.
- Actions: lower rate/burst, enable outlier ejection, cut expensive routes, degrade temporarily (drop heavy fields/aggregations).
- Recovery: stepwise (25→50→100%) once signals stay normal for N consecutive intervals (sketch below).
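One step of the control loop described above; the SLO thresholds and the 25→50→100% ladder are illustrative assumptions:

```python
def adjust_pct(current_pct: float, p95_ms: float,
               err_5xx: float, healthy_streak: int) -> float:
    """Return the limiter percentage for the next interval."""
    P95_SLO_MS, ERR_SLO, N_HEALTHY = 300.0, 0.01, 3   # assumed SLO thresholds
    if p95_ms > P95_SLO_MS or err_5xx > ERR_SLO:
        return max(25.0, current_pct / 2)              # degrade quickly
    if healthy_streak >= N_HEALTHY and current_pct < 100.0:
        return min(s for s in (25.0, 50.0, 100.0) if s > current_pct)  # step up
    return current_pct

# Effective rate handed to the limiter each interval:
# rate = base_rate * adjust_pct(pct, p95_ms, err_5xx, streak) / 100
```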
6) Architecture integration
API Gateway (edge): primary rate limits/quotas, geo/ASN rules, HMAC/JWT validation, 429/`Retry-After`.
BFF/Service Mesh: fine-grained per-route/per-tenant limits, concurrency limits, circuit breakers toward upstreams.
Inside the service: semaphores for heavy operations, backpressure in queues, worker pools of bounded size (sketch below).
Webhooks: a separate ingress endpoint with a leaky bucket and a retry buffer.
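A sketch of a bounded pool with backpressure (queue and pool sizes are illustrative): when the queue is full, the caller is refused immediately, a fast 429/503 instead of waiting into a timeout:

```python
import queue
import threading

jobs: queue.Queue = queue.Queue(maxsize=100)   # bounded: this is the backpressure

def submit(job) -> bool:
    try:
        jobs.put_nowait(job)     # refuse instead of buffering forever
        return True
    except queue.Full:
        return False             # caller maps this to 429/503 + Retry-After

def worker():
    while True:
        jobs.get()()             # run the job
        jobs.task_done()

for _ in range(8):               # bounded pool size
    threading.Thread(target=worker, daemon=True).start()
```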
7) Configurations (fragments)
Kong / NGINX-style (rate + burst):

```yaml
plugins:
  - name: rate-limiting
    config:
      policy: local
      minute: 600            # 10 rps
      limit_by: consumer
      fault_tolerant: true
  - name: response-ratelimiting
    config:
      limits:
        heavy: { minute: 60 }
```
Envoy (circuit breaker + outlier detection + rate limit):

```yaml
circuit_breakers:
  thresholds: { max_connections: 1000, max_requests: 800 }
outlier_detection:
  consecutive_5xx: 5
  interval: 5s
  base_ejection_time: 30s
http_filters:
  - name: envoy.filters.http.local_ratelimit
    typed_config:
      token_bucket: { max_tokens: 100, tokens_per_fill: 100, fill_interval: 1s }
      filter_enabled: { default_value: 100% }
      filter_enforced: { default_value: 100% }
```
Concurrency limits (pseudo):

```pseudo
sema = Semaphore(MAX_ACTIVE_EXPORTS_PER_TENANT)
if not sema.tryAcquire(timeout=100ms):
    return 429 with retry_after = rand(1..5)s   # jittered backoff hint
try:
    process()
finally:
    sema.release()                              # release even if process() fails
```
GraphQL cost guard (idea):

```pseudo
cost = sum(weight(field) * cardinality(arg))
if cost > tenant.budget:
    reject(429, "query too expensive")
```
8) Policies for different channels
REST
GET softer, POST/PATCH/DELETE stricter; idempotent status/lookup endpoints may be retried.
For payments: limits on `auth/capture/refund` per-user/tenant/BIN/country.
GraphQL
Depth/complexity caps, persisted/whitelisted queries, limits on aliases.
WebSocket/SSE
Rate-limit `subscribe/unsubscribe`, cap the number of topics, control event size and the send queue; on overflow → `policy_disconnect`.
Webhooks
Leaky bucket at intake, per-sender quotas, a dead-letter queue, deterministic 2xx/429 (sketch below).
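A sketch of the webhook ingress, reusing the LeakyBucket from §3.2; `process()` is a hypothetical stand-in for the real handler, and the per-sender rates are illustrative:

```python
import queue
from collections import defaultdict

buckets = defaultdict(lambda: LeakyBucket(rate=50, capacity=100))  # per sender
dead_letter: queue.Queue = queue.Queue()

def process(payload: bytes) -> None:
    ...  # hypothetical business handler

def handle_webhook(sender_id: str, payload: bytes) -> tuple[int, dict]:
    if not buckets[sender_id].allow():
        return 429, {"Retry-After": "5"}       # deterministic: sender backs off
    try:
        process(payload)
    except Exception:
        dead_letter.put((sender_id, payload))  # never drop the event silently
    return 200, {}                             # deterministic 2xx once accepted
```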
9) Customer feedback
Always return an explicit 429 with headers (builder sketch below):
- `Retry-After`
- `X-RateLimit-Limit` / `X-RateLimit-Remaining` / `X-RateLimit-Reset`
- For quotas: 403 with code `quota_exceeded` and a link to the plan upgrade.
- Documentation: limits in OpenAPI/SDL plus "Fair Use" pages.
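An illustrative builder for the 429 answer; the header names follow the widely used `X-RateLimit-*` convention:

```python
import time

def too_many_requests(limit: int, reset_epoch: int) -> tuple[int, dict]:
    """429 with the headers clients and SDKs expect."""
    return 429, {
        "Retry-After": str(max(1, reset_epoch - int(time.time()))),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(reset_epoch),
    }
```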
10) Monitoring and dashboards
Metrics:
- Limit hits: `rate.limit.hit` by key/route/tenant.
- 429/503 share, latency p50/p95/p99, error rate, queue length, open circuits.
- Fair share: top tenants by consumption, a "noisy neighbor" detector.
- Webhooks: intake/retries, drop rate, average lag.
Targets:
- 429 at most 1-3% of total RPS (excluding bots).
- Limiter overhead at p95 ≤ 5-10 ms per edge hop.
- Recovery time from degradation ≤ 10 min.
```sql
-- Daily limit-hit rate per tenant and route
SELECT ts::date AS d, tenant, route,
       SUM(hits)  AS limit_hits,
       SUM(total) AS total_calls,
       SUM(hits)::decimal / NULLIF(SUM(total), 0) AS hit_rate
FROM ratelimit_stats
GROUP BY 1, 2, 3
ORDER BY d DESC, hit_rate DESC;
```
11) Incident playbooks
Retry storm (upstream failure): enable global throttling, raise backoff, open the circuit breaker, return fast errors instead of timeouts.
Bot attack/scraping: hard caps by IP/ASN/geo, enable WAF/JS challenge, restrict directories/search.
Tournament/event peak: pre-emptively raise read limits, drop expensive fields, enable cache/denormalization.
Webhook surge from a PSP: temporarily tighten the leaky bucket, prioritize critical event types, expand the dead-letter queue and retry buffer.
12) Testing and UAT
Load: RPS ladder, bursts ×10 of normal.
Fairness: emulate one "greedy" tenant; it must take no more than X% of the global budget.
Degradation: SLO adaptation lowers limits and keeps p95 inside the corridor.
Boundary cases: window rollover (minute→hour), clock skew, Redis scaling/key sharding.
Contract: 429 and Retry-After headers present; SDKs back off correctly.
13) Storage for limits
In-memory for local limits (small clusters).
Redis/Memcached for distributed limits (Lua scripts for atomicity; sketch below).
Shard keys by hash; TTL tied to the window; a fallback strategy for cache loss.
Idempotency: the limiter must not break idempotent retries (count by request key).
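A sketch of a distributed token bucket: the Lua script makes the read-refill-decrement sequence atomic on the Redis side (redis-py shown; key layout and TTL policy are assumptions):

```python
import time
import redis

LUA = """
local key  = KEYS[1]
local rate, burst, now = tonumber(ARGV[1]), tonumber(ARGV[2]), tonumber(ARGV[3])
local data = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1]) or burst
local ts     = tonumber(data[2]) or now
tokens = math.min(burst, tokens + (now - ts) * rate)    -- refill
local allowed = 0
if tokens >= 1 then tokens = tokens - 1; allowed = 1 end
redis.call('HMSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, math.ceil(burst / rate) + 60) -- TTL per the note above
return allowed
"""

r = redis.Redis()
bucket = r.register_script(LUA)

def allow(key: str, rate: float, burst: float) -> bool:
    return bucket(keys=[key], args=[rate, burst, time.time()]) == 1
```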
14) Governance
Limits catalog: owner, keys, thresholds, rationale.
Feature-flags for fast switches (crisis mode).
Versioning policies and RFC process for changes to contractual quotas.
A/B experiments on the selection of optimal thresholds.
15) Anti-patterns
A single global limit "for all APIs."
Fixed windows only → boundary spikes.
Limits without feedback (no `Retry-After`/headers).
Silent timeouts instead of fast 429/503.
No per-tenant fair share: one client starves the rest.
No GraphQL/search complexity protection.
No concurrency guard → heavy jobs "vacuum" the DB/PSP dry.
16) Mini cheat sheet of choice
The default is token bucket (rate + burst) per-tenant + route.
Money/report quotas: sliding window (day/month).
Heavy operations: concurrency limits + queue.
GraphQL/search: complexity budgets + persisted queries.
WS/webhooks: leaky bucket + backpressure.
Crisis: dynamic throttling + circuit breaker + degrade.
Summary
Load control is a multi-level discipline: the right algorithms (buckets/windows/concurrency), fair limit keys, SLO adaptation, and transparent feedback. By baking limits into the gateway/mesh/services, giving GraphQL/WS/webhooks channel-specific policies, and wiring observability to playbooks, you turn peak events and third-party failures into controlled situations, without outages, broken payments, or conversion drawdowns.