
Rate Limits and Load Control

TL;DR

A reliable setup combines limits and quotas at several levels (edge→BFF→service), fair resource allocation (per-tenant/key/route), SLO-adaptive throttling and backpressure instead of silent timeouts. Use token/leaky bucket for rate, sliding window for quota accounting, concurrency limits for heavy operations, dynamic throttling on degradation and a circuit breaker toward fragile upstreams. Everything is instrumented and backed by playbooks.

1) Why limits in iGaming/fintech

SLO and sustainability: protection against retry avalanches, tournament/event peaks, payment spikes.
Fairness: one tenant or partner must not consume the entire budget.
Anti-abuse/bots: login/registration, spam, directory scraping.
Cost: containment of expensive calls (KYC, reports, aggregations).
Compliance/fair use: formal "fair use" quotas in contracts.

2) Limit taxonomy

| Category | Purpose | Example keys |
| --- | --- | --- |
| Rate (speed) | Stable RPS, burst protection | `api_key`, `tenant`, `route`, `country`, `BIN` |
| Quota (accounting) | Day/month caps for expensive resources | `tenant-day`, `partner-month`, `report-type` |
| Concurrent | Cap on simultaneous heavy operations | `payout:create`, `export:csv`, `recalc` |
| Cost-based | Complex/expensive queries (GraphQL/search) | "complexity", response size |
| Adaptive | Reaction to SLO/latency/errors | global/per-route |
| Ingress/egress | Inbound webhooks / outgoing calls | `webhook-receiver`, `psp-outbound` |

3) Algorithms and where to apply

3.1 Token Bucket (default)

Parameters: `rate` (tokens/sec), `burst` (maximum reserve).
Good default for API reads, payment status checks, BFF.
Empty bucket → 429 + `Retry-After`.
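
As a minimal illustration of the idea (an in-process sketch, not any particular gateway's implementation): the bucket refills at `rate` tokens per second up to `burst`, and an empty bucket translates into 429 plus a `Retry-After` hint.

```python
import time

class TokenBucket:
    """In-memory token bucket: ~`rate` req/s on average with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate               # tokens added per second
        self.burst = burst             # maximum bucket size
        self.tokens = burst            # start full
        self.updated = time.monotonic()

    def try_acquire(self, cost: float = 1.0):
        """Returns (allowed, retry_after_seconds)."""
        now = time.monotonic()
        # Refill based on elapsed time, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        # Not enough tokens: hint when enough will have refilled.
        return False, (cost - self.tokens) / self.rate

# Usage: one bucket per (tenant, route); on False respond 429 + Retry-After.
bucket = TokenBucket(rate=10, burst=20)
allowed, retry_after = bucket.try_acquire()
```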

3.2 Leaky Bucket (averaging)

Guarantees a smoothed RPS; useful for webhooks so that workers are not flooded.

3.3 Fixed Window vs Sliding Window

Fixed is simple but suffers from spikes at window boundaries; Sliding gives fair accounting within the window (minute/hour/day).
Use Sliding for contractual quotas.
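
For illustration, a sketch of the common weighted-counter approximation of a sliding window (an in-memory stand-in; a production limiter would keep these counters in Redis with TTLs). The names and thresholds are assumptions.

```python
import time
from collections import defaultdict

WINDOW = 60.0          # window length in seconds (use 86400 for a daily quota)
LIMIT = 600            # allowed requests per window

counters = defaultdict(int)   # (key, window_index) -> count

def allow(key: str) -> bool:
    """Sliding-window approximation: weight the previous window by its remaining overlap."""
    now = time.time()
    current_window = int(now // WINDOW)
    elapsed_fraction = (now % WINDOW) / WINDOW
    prev_count = counters[(key, current_window - 1)]
    curr_count = counters[(key, current_window)]
    # Estimate of requests seen in the trailing WINDOW seconds.
    estimated = prev_count * (1.0 - elapsed_fraction) + curr_count
    if estimated >= LIMIT:
        return False
    counters[(key, current_window)] += 1
    return True
```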

3.4 Concurrent Limits

Cap on simultaneously active tasks. Ideal for exports/reports, KYC batches, reprocessing.
When capacity is exhausted: 429/503 + queue/polling.

3.5 Cost/Complexity Limiter

GraphQL/search: compute a "cost" from depth/cardinality/extensions.
Trim or degrade "expensive" requests; respond with a hint.

4) Key dimensions

per-tenant (multi-tenancy, fairness),

per-api_key/client_id (partners),

per-route (stricter for critical mutations),

per-user/device/IP/ASN/geo,

per-BIN/country (payment methods, protection of issuers and providers),

per-method (GET softer, POST/PUT stricter).

Composition: main key + "risk multiplier" (new account, TOR/proxy, high chargeback risk).
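
A possible shape of the key composition and risk multiplier described above; the factor values here are illustrative assumptions, not recommended settings.

```python
def limit_key(tenant: str, route: str, api_key: str) -> str:
    """Compose the primary limiter key from the main dimensions."""
    return f"{tenant}:{api_key}:{route}"

def effective_limit(base_limit: int, *, new_account: bool = False,
                    tor_or_proxy: bool = False, high_chargeback_risk: bool = False) -> int:
    """Tighten the base limit with a multiplicative risk factor (illustrative weights)."""
    multiplier = 1.0
    if new_account:
        multiplier *= 0.5
    if tor_or_proxy:
        multiplier *= 0.3
    if high_chargeback_risk:
        multiplier *= 0.5
    return max(1, int(base_limit * multiplier))

# Example: a 600 req/min base becomes 90 for a new account behind a proxy.
print(effective_limit(600, new_account=True, tor_or_proxy=True))
```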

5) SLO-adaptive throttling

Enable dynamic throttling when the SLO is at risk:
  • Triggers: p95 latency↑, 5xx↑, queue length↑, CPU/IO saturation.
  • Actions: lower rate/burst, enable outlier ejection, cut "expensive" routes, temporarily degrade responses (drop heavy fields/aggregations).
  • Return: step-wise (25→50→100%) once signals stay normal for N consecutive intervals (see the sketch after this list).
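
A simplified controller sketch of this step-down/step-up behavior; the trigger thresholds and step sizes are illustrative assumptions.

```python
STEPS = [1.00, 0.50, 0.25]        # fraction of the normal rate limit to apply

class AdaptiveThrottle:
    """Lower the effective limit while SLO signals are bad; restore it step-wise (25→50→100%)."""

    def __init__(self, healthy_intervals_to_recover: int = 3):
        self.level = 0                         # index into STEPS (0 = normal)
        self.healthy_streak = 0
        self.required_streak = healthy_intervals_to_recover

    def on_interval(self, p95_ms: float, error_rate: float, queue_len: int) -> float:
        """Call once per evaluation interval; returns the multiplier for rate/burst."""
        slo_at_risk = p95_ms > 300 or error_rate > 0.02 or queue_len > 1000
        if slo_at_risk:
            self.level = min(self.level + 1, len(STEPS) - 1)   # throttle harder
            self.healthy_streak = 0
        else:
            self.healthy_streak += 1
            if self.healthy_streak >= self.required_streak and self.level > 0:
                self.level -= 1                                # relax one step
                self.healthy_streak = 0
        return STEPS[self.level]
```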

6) Architecture integration

API Gateway (edge): primary rate limits/quotas, geo/ASN, HMAC/JWT validation, 429 / `Retry-After`.
BFF/Service Mesh: fine-grained per-route/per-tenant limits, concurrency limits, circuit breakers toward upstreams.
Inside the service: semaphores for heavy operations, backpressure in queues, bounded worker pools.
Webhooks: a separate ingress endpoint with a leaky bucket and a retry buffer.

7) Configurations (fragments)

Kong / NGINX-style (rate + burst):
```yaml
plugins:
  - name: rate-limiting
    config:
      policy: local
      minute: 600          # 10 rps
      limit_by: consumer
      fault_tolerant: true
  - name: response-ratelimiting
    config:
      limits:
        heavy: { minute: 60 }
```
Envoy (circuit + outlier + rate):
```yaml
circuit_breakers:
  thresholds: { max_connections: 1000, max_requests: 800 }
outlier_detection:
  consecutive_5xx: 5
  interval: 5s
  base_ejection_time: 30s
http_filters:
  - name: envoy.filters.http.local_ratelimit
    typed_config:
      token_bucket: { max_tokens: 100, tokens_per_fill: 100, fill_interval: 1s }
      filter_enabled: { default_value: 100% }
      filter_enforced: { default_value: 100% }
```
Concurrent-limits (pseudo):
```pseudo
sema = Semaphore(MAX_ACTIVE_EXPORTS_PER_TENANT)
if !sema.tryAcquire(timeout=100ms) then
    return 429 with retry_after = rand(1..5)s
process()
sema.release()
```
GraphQL cost guard (idea):
```pseudo
cost = sum(weight(field) * cardinality(arg))
if cost > tenant.budget then reject(429, "query too expensive")
```

8) Policies for different channels

REST

GET softer, POST/PATCH/DELETE stricter; idempotent status/check calls may be retried.
For payments: limits on `auth/capture/refund` per user/tenant/BIN/country.

GraphQL

Depth/complexity caps, persisted/whitelisted queries, limits on aliases.

WebSocket/SSE

Rate-limit `subscribe/unsubscribe`, cap the number of topics, control event size and send-queue depth → disconnect on overflow (`policy_disconnect`).
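
A sketch of the send-queue control described above: a bounded per-connection queue and a disconnect when the client cannot keep up. The queue size and the websocket API are assumptions.

```python
import asyncio

MAX_QUEUE = 100   # events buffered per connection before we give up on the client

class Connection:
    def __init__(self):
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=MAX_QUEUE)
        self.open = True

    def publish(self, event: dict) -> None:
        """Called by the broadcaster; drops the slow consumer instead of buffering forever."""
        try:
            self.queue.put_nowait(event)
        except asyncio.QueueFull:
            self.open = False   # policy: disconnect on overflow; real code also closes the socket

    async def sender(self, ws) -> None:
        """Drains the queue into the socket while the connection is healthy."""
        while self.open:
            event = await self.queue.get()
            await ws.send_json(event)   # assumes an aiohttp/starlette-style websocket object
```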

Webhooks

Leaky bucket at reception, per-sender quotas, dead-letter queue, deterministic 2xx/429.
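
A sketch of the ingress side: a per-sender leaky bucket that admits webhooks at a smoothed rate and answers a deterministic 429 when full; the parameters are illustrative.

```python
import time

class LeakyBucket:
    """Accept webhooks at a smoothed rate: leaks `rate` items/s, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.level = 0.0
        self.updated = time.monotonic()

    def try_accept(self) -> bool:
        now = time.monotonic()
        # The bucket drains continuously at the configured rate.
        self.level = max(0.0, self.level - (now - self.updated) * self.rate)
        self.updated = now
        if self.level + 1 > self.capacity:
            return False          # respond 429 so the sender retries later
        self.level += 1
        return True

# One bucket per sender (PSP/partner); accepted events go to a worker queue, rejected get 429.
psp_bucket = LeakyBucket(rate=50, capacity=500)
```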

9) Customer feedback

Always return a clear 429 with headers (a response sketch follows the list):
  • `Retry-After: <seconds>`
  • `X-RateLimit-Limit/Remaining/Reset`
  • For quotas: 403 with the code `quota_exceeded` and a link to the plan upgrade.
  • Documentation: limits in OpenAPI/SDL + "Fair Use" pages.
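
A framework-agnostic sketch of assembling such a 429 response; the header values are illustrative.

```python
import time

def rate_limited_response(limit: int, remaining: int, reset_epoch: int, retry_after: int):
    """Build a 429 status, headers and body that SDKs can act on."""
    headers = {
        "Retry-After": str(retry_after),                  # seconds until the client may retry
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),            # unix time when the window resets
    }
    body = {"error": "rate_limited", "retry_after": retry_after}
    return 429, headers, body

status, headers, body = rate_limited_response(
    limit=600, remaining=0, reset_epoch=int(time.time()) + 30, retry_after=30)
```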

10) Monitoring and dashboards

Metrics:
  • Limit hits: `rate.limit.hit` by key/route/tenant.
  • Share of 429/503, latency p50/p95/p99, error rate, queue length, open circuits.
  • Fair share: top tenants by consumption, "heavy hitter" detection.
  • Webhooks: acceptance/retries, drop rate, average lag.
SLO benchmarks:
  • 429s no more than 1-3% of total RPS (excluding bots).
  • Limiter overhead at p95 ≤ 5-10 ms at the edge.
  • Degradation recovery time ≤ 10 min.
SQL example (key slice):
```sql
SELECT ts::date AS d, tenant, route,
       SUM(hits)  AS limit_hits,
       SUM(total) AS total_calls,
       SUM(hits)::decimal / NULLIF(SUM(total), 0) AS hit_rate
FROM ratelimit_stats
GROUP BY 1, 2, 3
ORDER BY d DESC, hit_rate DESC;
```

11) Incident playbooks

Retry storm (upstream outage): enable global throttling, increase backoff, open the circuit breaker, return fast errors instead of timeouts.
Bot attack/scraping: hard cap by IP/ASN/geo, enable a WAF/JS challenge, restrict catalog/search endpoints.
Tournament/event peak: preemptively raise read limits, drop "expensive" fields, enable caching/denormalization.
Webhook flood from a PSP: temporary leaky bucket, prioritize critical types, expand dead-letter and retry capacity.

12) Testing and UAT

Load: RPS ladder, bursts at 10× the normal rate.
Fairness: emulate one "greedy" tenant; it must not take more than X% of the global budget.
Degradation: SLO adaptation lowers limits and keeps p95 within the corridor.
Boundary cases: window change (minute→hour), clock skew, Redis scaling/key sharding.
Contract: 429 and Retry-After headers are present, SDKs back off correctly (see the sketch after this list).
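
A sketch of that contract check, written against a hypothetical `BASE_URL` with the `requests` library; how the limit is exhausted is an assumption of the test setup.

```python
import requests

BASE_URL = "https://api.example.test"   # hypothetical endpoint under test

def test_429_contract():
    """Exhaust the limit, then verify the limiter answers with usable feedback."""
    last = None
    for _ in range(1000):                       # enough calls to cross the configured limit
        last = requests.get(f"{BASE_URL}/v1/status", timeout=5)
        if last.status_code == 429:
            break
    assert last is not None and last.status_code == 429
    assert "Retry-After" in last.headers
    assert int(last.headers["Retry-After"]) > 0
    assert "X-RateLimit-Limit" in last.headers
```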

13) Storage for limits

In-memory for local limits (small clusters).
Redis/Memcached for distributed limits (Lua scripts for atomicity; see the sketch after this list).
Shard keys by hash; TTLs aligned to windows; a fallback behavior for cache loss.
Idempotency: the limiter must not break idempotent retries (account them by request key).
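
A sketch of an atomic Redis-backed token bucket via a Lua script (using redis-py's `register_script`); the key layout and TTL policy are illustrative assumptions.

```python
import time
import redis   # pip install redis

r = redis.Redis()

# Refill-and-take in one atomic step so concurrent app instances cannot double-spend tokens.
TOKEN_BUCKET_LUA = """
local key    = KEYS[1]
local rate   = tonumber(ARGV[1])
local burst  = tonumber(ARGV[2])
local now    = tonumber(ARGV[3])
local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or burst
local ts     = tonumber(state[2]) or now
tokens = math.min(burst, tokens + (now - ts) * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', key, 'tokens', tostring(tokens), 'ts', tostring(now))
redis.call('EXPIRE', key, math.ceil(burst / rate) * 2)   -- idle keys expire once fully refilled
return allowed
"""

token_bucket = r.register_script(TOKEN_BUCKET_LUA)

def allow(tenant: str, route: str, rate: float = 10, burst: float = 20) -> bool:
    key = f"rl:{tenant}:{route}"
    return bool(token_bucket(keys=[key], args=[rate, burst, time.time()]))
```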

14) Governance

Limits catalog: owners, keys, thresholds, rationale.
Feature flags for quick switching (crisis mode).
Versioning policies and RFC process for changes to contractual quotas.
A/B experiments on the selection of optimal thresholds.

15) Anti-patterns

A single global limit "for all APIs."
Only fixed windows → spikes at window boundaries.
Limits without feedback (no `Retry-After`/rate-limit headers).
Silent timeouts instead of fast 429/503.
No per-tenant fair share - one client starves the rest.
No complexity protection for GraphQL/search.
No concurrency guard → the DB/PSP gets drained.

16) Mini cheat sheet of choice

The default is token bucket (rate + burst) per-tenant + route.
Quotas by money/reports: sliding window day/month.
Heavy operations: concurrent-limits + queue.
GraphQL/search: complexity budgets + persisted queries.
WS/webhooks: leaky bucket + backpressure.
Crisis: dynamic throttling + circuit breaker + degradation.

Summary

Load control is a multi-level discipline: the right algorithms (buckets/windows/concurrency), fair limit keys, SLO adaptation and transparent feedback. By building limits into the gateway/mesh/services, giving GraphQL/WS/webhooks channel-specific policies, and wiring observability to playbooks, you turn peak events and other people's failures into controlled situations - without outages, broken payments or conversion drawdowns.
