WebSocket streams and events

TL;DR

A working stream = a trusted channel (WSS) + resumable offsets + idempotent events + strict limits and backpressure. Do: JWT authentication, per-topic authorization, heartbeats, seq/offset + resume token, at-least-once delivery + dedup. For scale: user/tenant sharding, sticky routing, and a queue (Kafka/NATS/Redis Streams) as the source of truth.

1) iGaming business cases (what we really stream)

Balance/limits: instantaneous changes in balance, RG limits, locks.
Bets/rounds/results: confirmation, status, calculation of winnings.
Tournaments/leaderboards: positions, timers, prize events.
Payments: payout/refund statuses, KYC/AML flags - as notifications only (critical operations stay in REST + webhooks).
Service events: chat messages, push banners, session statuses, maintenance.

2) Protocol and connection

WSS only (TLS 1.2+/1.3). By default, at most one active connection per device/session.
Ping/Pong: the client sends `ping` every 20-30 seconds; the response timeout is 10 seconds. The server drops the connection after 3 consecutive timeouts.
Compression: `permessage-deflate`; frame size limit (for example, ≤ 64 KB).
Payload format: JSON for external, Protobuf/MsgPack for internal/mobile.
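
The heartbeat rule above can be sketched as a small counter of consecutive missed pongs. This is a minimal sketch, not a reference implementation: the class name `HeartbeatMonitor` and its methods are hypothetical, time is passed in explicitly so the logic stays testable, and it assumes that whichever side sends pings also counts the timeouts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Counts consecutive missed pongs for one connection (hypothetical helper)."""

    pong_timeout_s: float = 10.0  # response timeout from the text
    max_missed: int = 3           # consecutive timeouts before the drop
    missed: int = 0
    _pending_since: Optional[float] = None

    def on_ping_sent(self, now: float) -> None:
        # Remember when the outstanding ping was sent.
        self._pending_since = now

    def on_pong(self) -> None:
        # Any pong clears the outstanding ping and resets the counter.
        self._pending_since = None
        self.missed = 0

    def check(self, now: float) -> bool:
        """Return True when the connection should be dropped."""
        pending = self._pending_since
        if pending is not None and now - pending >= self.pong_timeout_s:
            self.missed += 1
            self._pending_since = None
        return self.missed >= self.max_missed
```

A caller would invoke `check()` from its timer loop and close the socket when it returns `True`.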

3) Authentication and authorization

JWT passed in the handshake via query or header (`Sec-WebSocket-Protocol` / `Authorization`); short token TTL (≤ 15 min); refresh out-of-band (REST).
Tenant-scoped claims: `sub`, `tenant`, `scopes`, `risk_flags`.
Topic/channel ACLs: clients may subscribe only to allowed topics (for example, `user:{id}`, `tournament:{id}`, `game:{table}`).
Reconnect when the token expires, with a 60 s "soft window".
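
A topic ACL check against the token claims might look like the sketch below. The mapping from topic kinds to scopes (`tournaments:read`, `games:read`) and both function names are assumptions made for illustration; the source only states that claims carry `sub`, `tenant`, `scopes` and `risk_flags`, and that subscriptions are limited to allowed topics.

```python
# Hypothetical mapping: which scope unlocks which topic kind.
SCOPE_BY_KIND = {
    "tournament": "tournaments:read",
    "game": "games:read",
}


def is_topic_allowed(claims: dict, topic: str) -> bool:
    """Return True if the already-verified JWT claims permit this topic."""
    kind, _, ident = topic.partition(":")
    if not ident:
        return False  # malformed topic, e.g. "user" or "user:"
    if kind == "user":
        # Assumption: a user may only subscribe to their own user topic.
        return ident == str(claims.get("sub"))
    required = SCOPE_BY_KIND.get(kind)
    return required is not None and required in claims.get("scopes", [])


def split_subscription(claims: dict, topics: list) -> tuple:
    """Split requested topics into (allowed, denied) for the ack/nack reply."""
    allowed = [t for t in topics if is_topic_allowed(claims, t)]
    denied = [t for t in topics if not is_topic_allowed(claims, t)]
    return allowed, denied
```

Unknown topic kinds are denied by default, so adding a new kind requires an explicit ACL entry.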

4) Subscription model

The client sends commands after connect:
json
{ "op":"subscribe", "topics":["user:123", "tournament:456"], "resume_from":"1748852201:987654" }
{ "op":"unsubscribe", "topics":["tournament:456"] }

`resume_from` is the offset (see § 5) to continue from when the client reconnects.
The server responds with ack/nack; topics that fail the ACL check come back in a `nack` with a `reason`.

5) Delivery guarantees and resume

Goal: at-least-once delivery per channel + idempotency on the client.

Each event carries a monotonic `seq` within its partition (usually user/room) and a global `event_id` for deduplication.
On reconnect, the client sends `resume_from` = the last confirmed `seq` (or the broker `offset`). The server replays missed events from the "source of truth" (Kafka/NATS/Redis Streams).
If the lag exceeds retention (for example, 24 hours), the server sends a `snapshot` of the state and a new `seq`.

Client semantics:
  • Store `last_seq`/`event_id` in durable storage (IndexedDB/Keychain).
  • Dedup by `event_id`, skip events with `seq ≤ last_seq`, detect gaps → automatic `resync` (snapshot request).
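
The client semantics above can be sketched as a per-topic cursor that classifies each incoming event. The names `TopicCursor`, `Action` and `parse_seq` are hypothetical. The sketch also assumes that offsets within one topic are consecutive, so a jump of more than one means a gap; with a broker whose offsets are shared across topics, gap detection would need a different signal.

```python
from collections import OrderedDict
from enum import Enum


class Action(Enum):
    APPLY = "apply"    # new event, apply it to local state
    SKIP = "skip"      # duplicate or already-covered event
    RESYNC = "resync"  # gap detected, request a snapshot


def parse_seq(seq: str) -> tuple:
    """Split "partition:offset" into two integers."""
    partition, offset = seq.split(":")
    return int(partition), int(offset)


class TopicCursor:
    def __init__(self, last_seq=None, max_seen_ids=10_000):
        self.last_seq = last_seq      # persisted "partition:offset" or None
        self._seen = OrderedDict()    # bounded memory of recent event_ids
        self._max_seen = max_seen_ids

    def classify(self, event: dict) -> Action:
        if event["event_id"] in self._seen:
            return Action.SKIP        # redelivery of a known event
        partition, offset = parse_seq(event["seq"])
        if self.last_seq is not None:
            last_partition, last_offset = parse_seq(self.last_seq)
            if partition != last_partition:
                return Action.RESYNC  # assumption: partition change needs a snapshot
            if offset <= last_offset:
                return Action.SKIP    # seq <= last_seq
            if offset > last_offset + 1:
                return Action.RESYNC  # hole in the sequence
        self._seen[event["event_id"]] = True
        if len(self._seen) > self._max_seen:
            self._seen.popitem(last=False)  # forget the oldest id
        self.last_seq = event["seq"]
        return Action.APPLY
```

After a `RESYNC`, the caller would request a snapshot and create a fresh cursor from the snapshot's `seq`; persisting `last_seq` to durable storage is left out of the sketch.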

6) Message schema (envelope)

json
{
  "ts": "2025-11-03T12:34:56.789Z",
  "topic": "user:123",
  "seq": "1748852201:987654",       // partition:offset
  "event_id": "01HF..",             // UUID/KSUID
  "type": "balance.updated",
  "data": { "currency": "EUR", "delta": -5.00, "balance": 125.37 },
  "trace_id": "4e3f..",             // for correlation
  "signature": "base64(hmac(...))"  // optional, for partners
}

`type` follows the domain taxonomy (see the event dictionary).
PII/PCI data is excluded or masked at the gateway level.

7) Backpressure, quotas and protection against "expensive" clients

Server → Client: a per-connection send queue with a sliding window. When it is full, drop subscriptions to "noisy" topics or disconnect with code `1013` / `policy_violation`.
Client → Server: limits on `subscribe`/`unsubscribe` (for example, ≤ 10/sec), a topic list limit (≤ 50), and a minimum resubscription interval.
Rate limits by IP/tenant/key. Anomalies → temporary blocking.
Priority: vital events (balance, RG limits) go through a priority queue.
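
One way to combine the bounded send queue with priority for vital events is sketched below. The class `SendQueue`, the return strings, and the `VITAL_PREFIXES` type prefixes are hypothetical. The eviction policy (dropping the oldest non-vital event to make room for a vital one) is one possible choice rather than something the text prescribes.

```python
from collections import deque

# Hypothetical type prefixes treated as vital (balance, RG limits).
VITAL_PREFIXES = ("balance.", "rg.")


class SendQueue:
    """Bounded per-connection queue; vital events are sent first."""

    def __init__(self, max_size: int = 1000):
        self.max_size = max_size
        self._vital = deque()
        self._normal = deque()

    def __len__(self) -> int:
        return len(self._vital) + len(self._normal)

    def offer(self, event: dict) -> str:
        """Return "queued", "evicted_normal" or "overflow"."""
        is_vital = event["type"].startswith(VITAL_PREFIXES)
        if len(self) >= self.max_size:
            if is_vital and self._normal:
                # Make room by dropping the oldest non-vital event.
                # The client sees a gap for that topic and resyncs.
                self._normal.popleft()
                self._vital.append(event)
                return "evicted_normal"
            # Caller decides: drop noisy subscriptions or close the connection.
            return "overflow"
        (self._vital if is_vital else self._normal).append(event)
        return "queued"

    def pop(self):
        """Next event to write to the socket, or None if the queue is empty."""
        if self._vital:
            return self._vital.popleft()
        if self._normal:
            return self._normal.popleft()
        return None
```

An `"overflow"` result is the signal for the policy in the text: unsubscribe the connection from noisy topics, or close it.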

8) Protection and security

WAF/bot profile on the handshake endpoint; Origin allowlist.
mTLS between the edge gateway and stream nodes.
DoS protection: SYN cookies at L4, limits on the number of open WS connections and on the keep-alive interval.
Anti-replay: a `timestamp` in the optional payload signature (for partners) with a validity window of 5 min.
Tenant isolation: physical/logical sharding, keys/tokens per-tenant.
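
The optional partner signature with a 5-minute anti-replay window could be checked roughly as follows. The canonical form (sorted-key compact JSON of the envelope without `signature`), the choice of HMAC-SHA256, and the function names are assumptions; the source only says the signature is a base64-encoded HMAC and that the timestamp window is 5 minutes.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

REPLAY_WINDOW = timedelta(minutes=5)


def canonical_bytes(envelope: dict) -> bytes:
    # Assumed canonical form: every field except "signature", keys sorted.
    body = {k: v for k, v in envelope.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(envelope: dict, key: bytes) -> str:
    digest = hmac.new(key, canonical_bytes(envelope), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(envelope: dict, key: bytes, now: datetime) -> bool:
    """Check the signature first, then the freshness of "ts".

    `now` must be timezone-aware (UTC).
    """
    signature = envelope.get("signature")
    if not signature:
        return False
    if not hmac.compare_digest(signature, sign(envelope, key)):
        return False
    ts = datetime.fromisoformat(envelope["ts"].replace("Z", "+00:00"))
    return abs(now - ts) <= REPLAY_WINDOW
```

Because `ts` is covered by the signature, an attacker cannot refresh the timestamp of a captured message without invalidating it.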

9) Transport architecture

Gateway (edge): TLS termination, authN/Z, quotas, routing by partition.
Stream nodes: stateless workers with sticky routing by `hash(user_id) % N`.
Event broker: Kafka/NATS/Redis Streams - the source of truth and replay buffer.
State service: stores snapshots (balance, tournament positions).
Multi-region: active-active; GSLB to the nearest region; the home region is fixed at login; on failover, a "cold" resume from another region.
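
Sticky routing by a hash of the user ID can be sketched in a few lines. The function name `node_for_user` is hypothetical, and SHA-256 is used only as a convenient stable hash: Python's built-in `hash()` for strings is randomized per process, so it would not give the same node on every gateway.

```python
import hashlib


def node_for_user(user_id: str, node_count: int) -> int:
    """Map a user to a stream node index in [0, node_count)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # The first 8 bytes are enough for an even spread across nodes.
    return int.from_bytes(digest[:8], "big") % node_count
```

With plain modulo, changing `N` remaps most users to a different node; consistent hashing is a common alternative when nodes are added or removed often.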

10) Order, consistency, idempotency

Ordering is guaranteed within a partition (user/room), not globally.
Consistency: an event may arrive before the REST response; the UX must tolerate an intermediate state (optimistic UI + reconciliation).
Idempotency: reprocessing the same `event_id` does not change the client state.

11) Errors, reconnect and storms

Close codes: `1000` (normal), `1008` (policy), `1011` (internal), `1013` (server overload).
Client: exponential backoff + jitter: 1s, 2s, 4s... max 30s.
During mass reconnects ("thundering herd"), the server returns `retry_after` and "gray" (degraded) responses that suggest an SSE fallback for read-only use.
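
The reconnect schedule can be sketched as below. The function name and parameters are hypothetical. The jitter variant here is "equal jitter" (half of the delay fixed, half random), chosen as an assumption because it preserves the 1s, 2s, 4s... shape; the text does not say which jitter scheme to use.

```python
import random
from typing import Optional


def reconnect_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
    cap: float = 30.0,
    rng=random,
) -> float:
    """Delay in seconds before reconnect attempt number `attempt` (0-based)."""
    # Clamp the exponent so very long outages cannot overflow the float.
    ceiling = min(cap, base * (2 ** min(attempt, 16)))
    # Equal jitter: somewhere between half of the ceiling and the ceiling.
    delay = rng.uniform(ceiling / 2, ceiling)
    if retry_after is not None:
        # A server-provided retry_after is treated as a lower bound.
        delay = max(delay, retry_after)
    return delay
```

The jitter spreads clients out in time, so that a mass disconnect does not turn into synchronized reconnect waves.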

12) Cache and snapshots

Each subscription can start with a snapshot of the current state, followed by a stream of diff events.
Schema versioning via `data_version`, with backward compatibility (adding fields does not break clients).
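
The snapshot-then-diff flow can be sketched as a pure function over a local view. The function name and the shape of the view (`{"seq": ..., "state": ...}`) are assumptions; the message shapes follow the `snapshot` and `event` examples in the mini specification. Fields the client does not know are ignored, which is what lets added fields stay backward compatible.

```python
import copy


def apply_message(view: dict, msg: dict) -> dict:
    """Return a new local view after a snapshot or a diff event."""
    if msg["op"] == "snapshot":
        # A snapshot replaces the state entirely and resets the cursor.
        return {"seq": msg["seq"], "state": copy.deepcopy(msg["state"])}
    if msg["op"] == "event":
        state = dict(view.get("state", {}))
        if msg["type"] == "balance.updated":
            data = msg["data"]
            # Read only the fields we know; unknown fields are ignored.
            state["balance"] = data["balance"]
            state["currency"] = data.get("currency", state.get("currency"))
        # Unknown event types leave the state untouched but advance seq.
        return {"seq": msg["seq"], "state": state}
    return view  # ack, pong, error: no state change
```

Using the server-provided `balance` rather than summing `delta` values keeps the view correct even if an event is applied after a snapshot that already includes it.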

13) Observability and SLO

Metrics:
  • Connections: active, established/sec, distribution by tenant/region.
  • Delivery: p50/p95 latency from broker to client, drop rate, resend rate.
  • Reliability: share of successful resumes without a snapshot, gap detector.
  • Errors: 4xx/5xx on handshake, closing codes, limit hits.
  • Load: RPS of 'subscribe' commands, queue size, CPU/NET.
SLO benchmarks:
  • WS connection establishment p95 ≤ 500 ms (within the region).
  • End-to-end event latency p95 ≤ 300 ms (user partition).
  • Resume success ≥ 99%, message loss = 0 (under at-least-once).
  • Stream endpoint uptime ≥ 99.95%.

14) Schema and version management

Event dictionary with owners, examples and semantics.
"Soft" evolution: only add optional fields; remove fields only after an `@deprecated` period.
Contract tests against client SDKs, linters on JSON Schema/Protobuf.

15) Incident playbooks (embed in your shared playbook)

Latency growth: move partitions to backup nodes, increase the broker batch size, enable prioritization of vital events.
Reconnect storm: activate `retry_after`, temporarily raise handshake limits, enable the SSE fallback.
Token leak: rotate JWKS keys, revoke affected tokens, force a reconnect with re-auth.
Loss of a broker partition: switch to snapshot mode, replay after recovery.

16) API Mini Specification (Simplified)

Handshake (HTTP GET → WS):

GET /ws?tenant=acme&client=web
Headers:
Authorization: Bearer <JWT>
X-Trace-Id: <uuid>
Client commands:
json
{ "op":"subscribe",  "topics":["user:123"], "resume_from":"1748852201:42" }
{ "op":"unsubscribe", "topics":["user:123"] }
{ "op":"ping", "ts":"2025-11-03T12:34:56Z" }
Server Responses:
json
{ "op":"ack", "id":"subscribe:user:123" }
{ "op":"event", "topic":"user:123", "seq":"1748852201:43", "type":"balance.updated", "data":{...} }
{ "op":"snapshot", "topic":"user:123", "seq":"1748852201:42", "state":{...} }
{ "op":"error", "code":"acl_denied", "reason":"no access to topic tournament:456" }
{ "op":"pong", "ts":"..." }

17) UAT checklist

  • Resume from the offset after 1/10/60 minutes of client downtime.
  • Dedup: redelivery of the same `event_id` does not change state.
  • Gap detector → automatic `snapshot` and realignment.
  • Quotas and backpressure: an overloaded client receives a policy disconnect.
  • Multi-region: region failover while preserving the offset.
  • Security: token expiry and rotation (expired JWT), attempts to subscribe outside the ACL.
  • RG/balance event arrives before/after the REST response - the UI "stitches" them correctly.

18) Common mistakes

No `seq`/offset and no resume - you lose events and trust.
Mixing critical payment commands into WS as mutations - use REST.
No backpressure/quotas - "hung" connections and runaway memory growth.
Global ordering is expensive and unnecessary; ordering within a partition is enough.
Logging PII in events - privacy and PCI/GDPR violations.
No event dictionary and no versioning - clients break.

Summary

WebSocket streams deliver a reactive UX and operational signals when they are built as a resumable, protected and rate-limited channel: WSS + mTLS/JWT, topic ACLs, seq/offset + resume, at-least-once with deduplication, backpressure/quotas, a broker as the source of truth, observability and SLOs. That keeps streams fast for the user and manageable for the platform, without compromising on security or money.
