Rate Limits and Load Control
TL;DR
A reliable scheme combines limits and quotas at several levels (edge→BFF→service), fair resource allocation (per-tenant/key/route), SLO-adaptive throttling, and backpressure instead of silent timeouts. Use token/leaky bucket for rate, sliding window for quota accounting, concurrency limits for heavy operations, dynamic throttling under degradation, and circuit breakers toward fragile upstreams. All of it observable and backed by playbooks.
1) Why limits in iGaming/fintech
SLO and resilience: protection against retry avalanches, tournament/event peaks, payment spikes.
Fairness: no single tenant or partner drains the entire budget.
Anti-abuse/bots: login/registration, spam, directory scraping.
Cost: containment of expensive calls (KYC, reports, aggregations).
Compliance/fair use: formal "fair use" quotas in contracts.
2) Limit taxonomy
3) Algorithms and where to apply them
3.1 Token Bucket (default)
Parameters: `rate` (tokens/sec), `burst` (maximum reserve).
A great default for API reads, payment/status checks, BFF.
On an empty bucket → 429 + `Retry-After` (sketch below).
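A minimal in-process sketch, assuming a single node (distributed storage is covered in §13); class and parameter names are illustrative:

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, stores at most `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at `burst`.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller maps this to 429 + Retry-After
```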
3.2 Leaky Bucket (averaging)
Guarantees a smoothed RPS; useful for webhooks so workers are not overwhelmed (sketch below).
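A sketch of the "leaky bucket as a meter" variant under the same single-node assumption; for the queue-smoothing variant, put a bounded queue in front of the workers instead:

```python
import time

class LeakyBucket:
    """Admits ~`rate` requests/sec on average; excess overflows (reject)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # leak rate, requests/sec
        self.capacity = capacity  # volume tolerated before overflow
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # The bucket drains continuously at `rate`.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False  # overflow → 429, sender retries later
```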
3.3 Fixed Window vs Sliding Window
Fixed is simple but suffers from window-boundary bursts; Sliding gives fair accounting across the window (minute/hour/day).
Use Sliding for contractual quotas (sketch below).
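A sketch of the common low-memory approximation: weight the previous fixed window's count by its overlap with the sliding window (exact sliding-log variants cost more memory):

```python
import time

class SlidingWindowCounter:
    """Approximate sliding window: prev-window count weighted by overlap."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.curr_start = 0.0
        self.curr = 0
        self.prev = 0

    def allow(self) -> bool:
        now = time.time()
        start = now - (now % self.window)
        if start != self.curr_start:
            # Roll windows; if more than one window passed, prev is stale.
            self.prev = self.curr if start - self.curr_start == self.window else 0
            self.curr, self.curr_start = 0, start
        overlap = 1.0 - (now - start) / self.window
        if self.prev * overlap + self.curr < self.limit:
            self.curr += 1
            return True
        return False
```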
3.4 Concurrency Limits
A cap on simultaneously active tasks. Ideal for exports/reports, KYC batches, reprocessing.
On exhaustion → 429/503 + queue/polling (pseudo in §7).
3.5 Cost/Complexity Limiter
GraphQL/search: compute a request "cost" from depth/cardinality/extensions.
Clip or degrade expensive requests and respond with a hint (see the cost-guard fragment in §7).
4) Limit key dimensions
per-tenant (multi-tenancy, fairness),
per-api_key/client_id (partners),
per-route (stricter for critical mutations),
per-user/device/IP/ASN/geo,
per-BIN/country (payment methods, protection of issuers and providers),
per-method (GET softer, POST/PUT stricter).
Composition: base key + a "risk multiplier" (new account, Tor/proxy, high chargeback risk), as in the sketch below.
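A sketch of key composition with a risk multiplier; the context fields (`account_age_days`, `is_proxy_or_tor`, `chargeback_risk`) are illustrative assumptions, not a fixed schema:

```python
def limit_for(base_rate: float, ctx: dict) -> tuple[str, float]:
    """Compose the limiter key and scale the base rate by risk signals."""
    key = f"{ctx['tenant']}:{ctx['route']}:{ctx.get('api_key', 'anon')}"
    multiplier = 1.0
    if ctx.get("account_age_days", 0) < 7:
        multiplier *= 0.5    # new accounts get half the budget
    if ctx.get("is_proxy_or_tor"):
        multiplier *= 0.25
    if ctx.get("chargeback_risk") == "high":
        multiplier *= 0.25
    return key, base_rate * multiplier
```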
5) SLO-adaptive throttling
Enable dynamic throttling when the SLO is at risk:
- Triggers: `p95 latency↑`, `5xx↑`, `queue len↑`, `CPU/IO saturation`.
- Actions: lower rate/burst, enable outlier ejection, cut expensive routes, degrade temporarily (drop heavy fields/aggregations).
- Recovery: stepwise (25→50→100%) once signals stay normal for N consecutive intervals (sketch below).
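One step of the control loop described above; the SLO thresholds and the 25→50→100% ladder are illustrative assumptions:

```python
def adjust_pct(current_pct: float, p95_ms: float,
               err_5xx: float, healthy_streak: int) -> float:
    """Return the limiter percentage for the next interval."""
    P95_SLO_MS, ERR_SLO, N_HEALTHY = 300.0, 0.01, 3   # assumed SLO thresholds
    if p95_ms > P95_SLO_MS or err_5xx > ERR_SLO:
        return max(25.0, current_pct / 2)              # degrade quickly
    if healthy_streak >= N_HEALTHY and current_pct < 100.0:
        return min(s for s in (25.0, 50.0, 100.0) if s > current_pct)  # step up
    return current_pct

# Effective rate handed to the limiter each interval:
# rate = base_rate * adjust_pct(pct, p95_ms, err_5xx, streak) / 100
```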
6) Architecture integration
API Gateway (edge): primary rate limits/quotas, geo/ASN rules, HMAC/JWT validation, 429/`Retry-After`.
BFF/Service Mesh: fine-grained per-route/per-tenant limits, concurrency limits, circuit breakers toward upstreams.
Inside the service: semaphores for heavy operations, backpressure in queues, worker pools of bounded size (sketch below).
Webhooks: a separate ingress endpoint with a leaky bucket and a retry buffer.
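A sketch of a bounded pool with backpressure (queue and pool sizes are illustrative): when the queue is full, the caller is refused immediately, a fast 429/503 instead of waiting into a timeout:

```python
import queue
import threading

jobs: queue.Queue = queue.Queue(maxsize=100)   # bounded: this is the backpressure

def submit(job) -> bool:
    try:
        jobs.put_nowait(job)     # refuse instead of buffering forever
        return True
    except queue.Full:
        return False             # caller maps this to 429/503 + Retry-After

def worker():
    while True:
        jobs.get()()             # run the job
        jobs.task_done()

for _ in range(8):               # bounded pool size
    threading.Thread(target=worker, daemon=True).start()
```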
7) Configurations (fragments)
Kong / NGINX-style (rate + burst):

```yaml
plugins:
  - name: rate-limiting
    config:
      policy: local
      minute: 600            # 10 rps
      limit_by: consumer
      fault_tolerant: true
  - name: response-ratelimiting
    config:
      limits:
        heavy: { minute: 60 }
```
Envoy (circuit breaker + outlier detection + rate limit):

```yaml
circuit_breakers:
  thresholds: { max_connections: 1000, max_requests: 800 }
outlier_detection:
  consecutive_5xx: 5
  interval: 5s
  base_ejection_time: 30s
http_filters:
  - name: envoy.filters.http.local_ratelimit
    typed_config:
      token_bucket: { max_tokens: 100, tokens_per_fill: 100, fill_interval: 1s }
      filter_enabled: { default_value: 100% }
      filter_enforced: { default_value: 100% }
```
Concurrency limits (pseudo):

```pseudo
sema = Semaphore(MAX_ACTIVE_EXPORTS_PER_TENANT)
if not sema.tryAcquire(timeout=100ms):
    return 429 with retry_after = rand(1..5)s   # jittered backoff hint
try:
    process()
finally:
    sema.release()                              # release even if process() fails
```
GraphQL cost guard (idea):

```pseudo
cost = sum(weight(field) * cardinality(arg))
if cost > tenant.budget:
    reject(429, "query too expensive")
```
8) Policies for different channels
REST
GET softer, POST/PATCH/DELETE stricter; idempotent status/lookup endpoints may be retried.
For payments: limits on `auth/capture/refund` per-user/tenant/BIN/country.
GraphQL
Depth/complexity caps, persisted/whitelisted queries, limits on aliases.
WebSocket/SSE
Rate-limit `subscribe/unsubscribe`, cap the number of topics, control event size and the send queue; on overflow → `policy_disconnect`.
Webhooks
Leaky bucket at intake, per-sender quotas, a dead-letter queue, deterministic 2xx/429 (sketch below).
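A sketch of the webhook ingress, reusing the LeakyBucket from §3.2; `process()` is a hypothetical stand-in for the real handler, and the per-sender rates are illustrative:

```python
import queue
from collections import defaultdict

buckets = defaultdict(lambda: LeakyBucket(rate=50, capacity=100))  # per sender
dead_letter: queue.Queue = queue.Queue()

def process(payload: bytes) -> None:
    ...  # hypothetical business handler

def handle_webhook(sender_id: str, payload: bytes) -> tuple[int, dict]:
    if not buckets[sender_id].allow():
        return 429, {"Retry-After": "5"}       # deterministic: sender backs off
    try:
        process(payload)
    except Exception:
        dead_letter.put((sender_id, payload))  # never drop the event silently
    return 200, {}                             # deterministic 2xx once accepted
```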
9) Customer feedback
Always return an explicit 429 with headers (builder sketch below):
- `Retry-After`
- `X-RateLimit-Limit` / `X-RateLimit-Remaining` / `X-RateLimit-Reset`
- For quotas: 403 with code `quota_exceeded` and a link to the plan upgrade.
- Documentation: limits in OpenAPI/SDL plus "Fair Use" pages.
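An illustrative builder for the 429 answer; the header names follow the widely used `X-RateLimit-*` convention:

```python
import time

def too_many_requests(limit: int, reset_epoch: int) -> tuple[int, dict]:
    """429 with the headers clients and SDKs expect."""
    return 429, {
        "Retry-After": str(max(1, reset_epoch - int(time.time()))),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(reset_epoch),
    }
```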
10) Monitoring and dashboards
Metrics:
- Limit hits: `rate.limit.hit` by key/route/tenant.
- 429/503 share, latency p50/p95/p99, error rate, queue length, open circuits.
- Fair share: top tenants by consumption, a "noisy neighbor" detector.
- Webhooks: intake/retries, drop rate, average lag.
Targets:
- 429 at most 1-3% of total RPS (excluding bots).
- Limiter overhead at p95 ≤ 5-10 ms per edge hop.
- Recovery time from degradation ≤ 10 min.
```sql
-- Daily limit-hit rate per tenant and route
SELECT ts::date AS d, tenant, route,
       SUM(hits)  AS limit_hits,
       SUM(total) AS total_calls,
       SUM(hits)::decimal / NULLIF(SUM(total), 0) AS hit_rate
FROM ratelimit_stats
GROUP BY 1, 2, 3
ORDER BY d DESC, hit_rate DESC;
```
11) Incident playbooks
Retry storm (upstream failure): enable global throttling, raise backoff, open the circuit breaker, return fast errors instead of timeouts.
Bot attack/scraping: hard caps by IP/ASN/geo, enable WAF/JS challenge, restrict directories/search.
Tournament/event peak: pre-emptively raise read limits, drop expensive fields, enable cache/denormalization.
Webhook surge from a PSP: temporarily tighten the leaky bucket, prioritize critical event types, expand the dead-letter queue and retry buffer.
12) Testing and UAT
Load: RPS ladder, bursts ×10 of normal.
Fairness: emulate one "greedy" tenant; it must take no more than X% of the global budget.
Degradation: SLO adaptation lowers limits and keeps p95 inside the corridor.
Boundary cases: window rollover (minute→hour), clock skew, Redis scaling/key sharding.
Contract: 429 and Retry-After headers present; SDKs back off correctly.
13) Storage for limits
In-memory for local limits (small clusters).
Redis/Memcached for distributed limits (Lua scripts for atomicity; sketch below).
Shard keys by hash; TTL tied to the window; a fallback strategy for cache loss.
Idempotency: the limiter must not break idempotent retries (count by request key).
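A sketch of a distributed token bucket: the Lua script makes the read-refill-decrement sequence atomic on the Redis side (redis-py shown; key layout and TTL policy are assumptions):

```python
import time
import redis

LUA = """
local key  = KEYS[1]
local rate, burst, now = tonumber(ARGV[1]), tonumber(ARGV[2]), tonumber(ARGV[3])
local data = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1]) or burst
local ts     = tonumber(data[2]) or now
tokens = math.min(burst, tokens + (now - ts) * rate)    -- refill
local allowed = 0
if tokens >= 1 then tokens = tokens - 1; allowed = 1 end
redis.call('HMSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, math.ceil(burst / rate) + 60) -- TTL per the note above
return allowed
"""

r = redis.Redis()
bucket = r.register_script(LUA)

def allow(key: str, rate: float, burst: float) -> bool:
    return bucket(keys=[key], args=[rate, burst, time.time()]) == 1
```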
14) Governance
Limits catalog: owner, keys, thresholds, rationale.
Feature-flags for fast switches (crisis mode).
Versioning policies and RFC process for changes to contractual quotas.
A/B experiments on the selection of optimal thresholds.
15) Anti-patterns
A single global limit "for all APIs."
Fixed windows only → boundary spikes.
Limits without feedback (no `Retry-After`/headers).
Silent timeouts instead of fast 429/503.
No per-tenant fair share: one client starves the rest.
No GraphQL/search complexity protection.
No concurrency guard → heavy jobs "vacuum" the DB/PSP dry.
16) Mini cheat sheet of choice
The default is token bucket (rate + burst) per-tenant + route.
Money/report quotas: sliding window (day/month).
Heavy operations: concurrency limits + queue.
GraphQL/search: complexity budgets + persisted queries.
WS/webhooks: leaky bucket + backpressure.
Crisis: dynamic throttling + circuit breaker + degrade.
Summary
Load control is a multi-level discipline: the right algorithms (buckets/windows/concurrency), fair limit keys, SLO adaptation, and transparent feedback. By baking limits into the gateway/mesh/services, giving GraphQL/WS/webhooks channel-specific policies, and wiring observability to playbooks, you turn peak events and third-party failures into controlled situations, without outages, broken payments, or conversion drawdowns.