Message queues: Kafka and RabbitMQ
(Section: Technology and Infrastructure)
Brief Summary
Message queues are the foundation of event-driven architecture (EDA) in iGaming. They connect the microservices for betting, payments, anti-fraud, CRM, notifications, and analytics. In practice, two classes of solutions dominate:
- Apache Kafka is a distributed event log focused on streaming, replication, and horizontal scaling through partitions.
- RabbitMQ is an AMQP queue broker with flexible routing (exchanges/bindings), priorities, TTL, confirmations and classic queue tasks.
Both tools are mature but solve different problems: Kafka - for scalable streams and analytics, RabbitMQ - for operational task orchestration, RPC, and flexible routing.
Where is it appropriate in iGaming
Kafka - choose when:
- You need high-TPS event streams (bets, game events, telemetry) and horizontal scaling through partitions.
- Cold/hot re-consumption (replaying historical data), retention, and compaction for aggregates (balance, player state) matter.
- You need stream processing (Kafka Streams/ksqlDB/Flink) for real-time aggregates: tournament leaderboards, responsible-gaming limits, anti-fraud signals.
RabbitMQ - choose when:
- You need classic task queues: KYC checks, deferred/repeated payments, e-mail/SMS/push sending, webhooks to PSPs.
- You need flexible routing (topic/direct/fanout), priorities, TTL, delayed delivery, dead-lettering, and RPC patterns.
- You need strict per-consumer limits (prefetch/QoS), simple load management, and fast retries.
A frequent outcome: Kafka for events and analytics + RabbitMQ for orchestration and integrations.
Data model and routing
Kafka
Topics → divided into partitions; each partition is an ordered log.
The message key determines the partition → ordering within a key.
Consumers read by offset; consumer groups scale processing.
Retention by time/volume; log compaction keeps the latest version per key.
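What log compaction effectively yields can be sketched in a few lines. This is an illustration of the invariant (latest record per key survives, so replaying rebuilds current state), not real Kafka code; all names are illustrative.

```python
# Sketch: what Kafka log compaction effectively retains.
# A compacted topic keeps (at least) the latest record per key,
# so replaying it rebuilds the current state of each entity.

def compact(log):
    """Return the latest record per key, preserving log order of survivors."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records supersede earlier ones
    # Survivors sorted by their original offset, as in a compacted segment.
    return [(key, value) for key, (offset, value) in
            sorted(latest.items(), key=lambda kv: kv[1][0])]

balance_log = [
    ("player:1", 100),
    ("player:2", 50),
    ("player:1", 80),   # newer balance for player:1 supersedes the first record
]
print(compact(balance_log))  # [('player:2', 50), ('player:1', 80)]
```

This is why compaction suits "state" topics (balances, player status) while plain time-based retention suits event streams.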
RabbitMQ
Exchanges (direct/fanout/topic/headers) + bindings → messages get into queues.
Acknowledgements (ack/nack/reject), publisher confirms, priorities, TTL, dead-lettering (DLX/DLQ).
Quorum queues (Raft) for high availability; lazy queues to save RAM.
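The topic-exchange routing mentioned above matches dot-separated routing keys against binding patterns, where `*` matches exactly one word and `#` matches zero or more. A simplified re-implementation of that matching rule (for illustration only, not broker code):

```python
# Sketch: how a RabbitMQ topic exchange matches routing keys to bindings.
# '*' matches exactly one dot-separated word; '#' matches zero or more.

def topic_matches(pattern: str, routing_key: str) -> bool:
    return _match(pattern.split("."), routing_key.split("."))

def _match(pat, key):
    if not pat:
        return not key                    # both exhausted -> match
    if pat[0] == "#":
        # '#' can absorb zero or more words.
        return any(_match(pat[1:], key[i:]) for i in range(len(key) + 1))
    if not key:
        return False
    return (pat[0] == "*" or pat[0] == key[0]) and _match(pat[1:], key[1:])

assert topic_matches("payments.*", "payments.deposit") is True
assert topic_matches("payments.*", "payments.deposit.created") is False
assert topic_matches("payments.#", "payments.deposit.created") is True
```

A binding like `payments.#` therefore fans a whole domain into one queue, while `payments.*` only catches two-level keys.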
Delivery guarantees and idempotency
At-most-once: no retries; risk of loss, minimal latency.
At-least-once: the default standard → duplicates are possible → handlers must be idempotent (request/transaction key, upsert, dedup table, outbox).
Exactly-once: in Kafka achievable by combining an idempotent producer + transactional topics + coordinated consumption, but it is usually more expensive and more complex; in RabbitMQ - limited and with caveats. In real payment/betting flows, at-least-once + strict idempotency is the norm.
- Unique idempotency-keys (UUID/ULID) per event/command.
- Outbox pattern in the service's database + Change Data Capture (Debezium) → prevents dual writes.
- Dedup by (key, created_at) in a separate table with TTL.
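A minimal sketch of an idempotent at-least-once handler using a dedup table, with sqlite standing in for the service database. Table and column names are illustrative; the key point is that the dedup marker and the business effect commit in one transaction.

```python
# Sketch: idempotent consumer for at-least-once delivery via a dedup table.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY, created_at TEXT)")
db.execute("CREATE TABLE balances (player_id TEXT PRIMARY KEY, amount INTEGER)")
db.execute("INSERT INTO balances VALUES ('player:1', 0)")

def handle_deposit(message: dict) -> str:
    """Apply a deposit exactly once per idempotency key."""
    try:
        # Effect and dedup marker committed atomically in one transaction.
        with db:
            db.execute("INSERT INTO processed VALUES (?, datetime('now'))",
                       (message["idempotency_key"],))
            db.execute("UPDATE balances SET amount = amount + ? WHERE player_id = ?",
                       (message["amount"], message["player_id"]))
        return "applied"
    except sqlite3.IntegrityError:
        return "duplicate"   # already processed; safe to ack and move on

msg = {"idempotency_key": "dep-42", "player_id": "player:1", "amount": 100}
print(handle_deposit(msg))  # applied
print(handle_deposit(msg))  # duplicate (redelivery is a no-op)
print(db.execute("SELECT amount FROM balances").fetchone()[0])  # 100
```

The primary-key violation on redelivery rolls back the whole transaction, so the balance is credited once no matter how many times the broker redelivers.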
Message ordering
Kafka guarantees order within a partition. Choose the key so that the whole "life" of an entity (for example, `player_id` for balance) lands in one partition.
In RabbitMQ, order is not strictly guaranteed under redeliveries/multiple consumers; pipelines where order is critical are better served by Kafka, or by a single active consumer and stream serialization.
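The key-to-partition mapping can be illustrated with a stand-in hash (Kafka's default partitioner actually uses murmur2 of the key bytes modulo the partition count; `crc32` below is only a stable substitute to show the invariant):

```python
# Sketch: why the message key pins ordering in Kafka.
# Same key -> same partition -> one ordered log for that entity.
import zlib  # crc32 as a stable stand-in hash, NOT Kafka's actual murmur2

def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode()) % num_partitions

p1 = partition_for("player:1", 24)
# Every event keyed by the same player lands in the same partition...
assert all(partition_for("player:1", 24) == p1 for _ in range(1000))
# ...so that player's bets, wins, and balance updates stay ordered.
```

Note the corollary: changing the partition count remaps keys, which is one more reason to size partitions with headroom up front rather than growing them later.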
Topic and queue design
Kafka:
- Granularity: `domain.event` (for example, `payments.deposit.created`).
- Keys: `player_id`, `account_id`, `bet_id` for ordering.
- Partitions = N based on target TPS (rule of thumb: 1 partition ≈ X messages/sec/consumer); leave headroom for growth.
- Retention: events - hours/days; compaction - for "states."
RabbitMQ:
- Exchanges by domain: `payments.direct`, `risk.topic`.
- Queues per consumer: `kyc.checker.q`, `psp.webhooks.retry.q`.
- A DLQ per work queue, delay queues for backoff.
- Prefetch caps concurrency; quorum queues for HA.
Errors, retries and DLQs
Classify errors: transient (network/PSP 5xx) → retries; fatal (validation, schema) → straight to DLQ.
Exponential backoff + jitter, a retry limit, poison-pill detection.
Separate retry queues by stage (5s, 1m, 5m, 1h).
DLQ handling: alert, trace, manual triage, re-injection after a fix.
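The backoff-with-jitter rule can be sketched as a small generator. The base/cap values are illustrative; "full jitter" (a uniform draw up to an exponentially growing ceiling) is one common variant:

```python
# Sketch: exponential backoff with full jitter and a retry budget.
import random

def backoff_delays(base=5.0, cap=3600.0, max_retries=6, rng=random.random):
    """Yield randomized delays: full jitter over an exponentially growing cap."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling  # full jitter: uniform in [0, ceiling)

# With rng pinned to 1.0 we see the deterministic upper bounds per attempt.
delays = list(backoff_delays(rng=lambda: 1.0))
print(delays)  # [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
```

Jitter spreads retries from many consumers over time, avoiding synchronized retry storms against a recovering PSP.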
Data contracts and schemas
Use Avro/Protobuf + Schema Registry (for Kafka - de facto standard).
Versioning: backward-compatible changes (adding optional fields), prohibition of breaking migrations.
PII fields - encryption/tokenization; comply with GDPR and local regulations.
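The "adding optional fields only" rule from above can be expressed as a tiny compatibility check. This is a deliberately conservative sketch of the document's own policy (stricter than, say, Avro's formal BACKWARD mode, which also allows field deletion); the field-dict shape is illustrative, not a real Schema Registry API:

```python
# Sketch: conservative schema-evolution rule - existing fields must keep
# their name and type, and any added field must carry a default.

def backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    # Every old field must survive with the same type.
    for name, spec in old_fields.items():
        if new_fields.get(name, {}).get("type") != spec["type"]:
            return False
    # Every added field needs a default so existing consumers keep working.
    added = set(new_fields) - set(old_fields)
    return all("default" in new_fields[name] for name in added)

v1 = {"bet_id": {"type": "string"}, "stake": {"type": "long"}}
v2_ok = {**v1, "bonus_code": {"type": "string", "default": ""}}
v2_bad = {"bet_id": {"type": "string"}}  # dropped 'stake' -> breaking here

assert backward_compatible(v1, v2_ok) is True
assert backward_compatible(v1, v2_bad) is False
```

In practice this check runs in CI against the Schema Registry before a producer deploy, so breaking changes never reach the topic.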
Monitoring, observability and SLO
Producer/consumer metrics: lag, throughput, errors, retries, processing time.
Logs + tracing (correlation IDs: `trace_id`, `message_id`).
SLOs: p99 publish/delivery latency, acceptable consumer lag, recovery time after failures.
Alerts on DLQ growth, lag beyond threshold, under-replicated partitions or quorum loss.
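Consumer lag itself is simple arithmetic: log-end offset minus committed offset, per partition. A sketch of the computation and an alert condition (offsets here are made up; in practice they come from the broker and the consumer group):

```python
# Sketch: consumer lag per partition and a simple alert threshold.

def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag = how far the group's committed position trails the log end."""
    return {p: log_end_offsets[p] - committed_offsets.get(p, 0)
            for p in log_end_offsets}

log_end = {0: 10_500, 1: 9_800, 2: 10_200}
committed = {0: 10_450, 1: 9_800, 2: 7_100}

lag = consumer_lag(log_end, committed)
print(lag)                       # {0: 50, 1: 0, 2: 3100}
print(max(lag.values()) > 1000)  # True -> fire the lag alert
```

Watching the per-partition maximum (not the average) catches a single stuck consumer, which is the usual failure mode.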
Safety and compliance
TLS in transit, secrets encrypted (SOPS/Vault), least-privilege ACLs/RBAC.
Separate topics/queues for sensitive domains (payments, KYC).
Audit log of publications/subscriptions, storage of keys outside the code.
Regional requirements (EU/Turkey/LatAm): retention, storage localization, masking.
High availability, fault tolerance and DR
Kafka:
- Cluster of at least 3-5 brokers; `replication.factor` ≥ 3.
- `min.insync.replicas` and `acks=all` for durable writes.
- Cross-region replication (MirrorMaker 2) for DR.
RabbitMQ:
- Quorum queues for HA; an odd number of nodes so a quorum can form.
- Federation/Shovel for inter-datacenter replication, DR runbooks.
- Cold/warm standby, failover switchover tests.
Performance and tuning
Kafka (producer):
- `linger.ms` and `batch.size` for batching; `compression.type` (lz4/zstd).
- `acks=all`, but watch latency; tune `max.in.flight.requests.per.connection` together with idempotence.
Kafka (cluster and consumer):
- Enough partitions; NVMe disks, a 10/25G network; JVM GC settings.
- Correct group management, `max.poll.interval.ms`, pausing partitions during backoff.
RabbitMQ:
- Publisher confirms in batches; reuse channels.
- `prefetch` (e.g. 50-300) sized to processing time; lazy queues for large backlogs.
- Spread hot queues across nodes; tune TCP and file descriptors.
Typical patterns for iGaming
Outbox + Kafka for reliable publication of domain events (bet placed, deposit credited).
RabbitMQ RPC for synchronous requests to integrations (KYC document check, rebate calculation).
Saga pattern: orchestration through events (Kafka) and commands (RabbitMQ) with compensating steps.
Fan-out notifications: from one event → CRM, anti-fraud, analytics.
Smart-retry PSP webhooks with progressive delays and DLQ.
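The first pattern (Outbox + Kafka) can be sketched end to end with sqlite standing in for the service database and a callback standing in for the Kafka producer. Table and topic names follow the examples in this document; the relay would normally be a background process or Debezium:

```python
# Sketch of the outbox pattern: the domain write and the outbox row
# commit in ONE transaction; a relay publishes outbox rows afterwards.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bets (bet_id TEXT PRIMARY KEY, stake INTEGER)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
           " topic TEXT, payload TEXT, published INTEGER DEFAULT 0)")

def place_bet(bet_id: str, stake: int):
    with db:  # one atomic transaction: no event without state, and vice versa
        db.execute("INSERT INTO bets VALUES (?, ?)", (bet_id, stake))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("bets.placed.v1",
                    json.dumps({"bet_id": bet_id, "stake": stake})))

def relay_once(publish):
    """Publish pending outbox rows, marking them only after success."""
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # at-least-once: may repeat if we crash here
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

sent = []
place_bet("bet-7", 250)
relay_once(lambda topic, payload: sent.append((topic, payload)))
print(sent)  # [('bets.placed.v1', '{"bet_id": "bet-7", "stake": 250}')]
```

Because the relay marks rows only after a successful publish, a crash between publish and update yields a duplicate, never a loss - which is exactly why downstream handlers carry the idempotency keys described earlier.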
Migration and hybrid architectures
Start with RabbitMQ for operational tasks; add Kafka for events and analytics.
Dual publication: service → outbox → connectors into both brokers (Kafka + RabbitMQ) until the migration fully stabilizes.
Gradually migrate analytics/stream aggregation subscribers to Kafka Streams/ksqlDB.
Mini Selection Checklist
1. Load/TPS > tens of thousands/sec? → Kafka.
2. Need retention and log-style replay? → Kafka.
3. Flexible routing, priorities, delayed delivery, RPC? → RabbitMQ.
4. Strict per-key ordering and horizontal scale → Kafka (keys/partitions).
5. Simple tasks/worker queues with concurrency control → RabbitMQ.
6. Ideally, a combination: Kafka (events) + RabbitMQ (orchestration).
Examples of minimal configurations
Example: delayed retries and DLQ in RabbitMQ (via policies)
Work queue: `psp.webhooks.q`
Retry queue: `psp.webhooks.retry.1m.q` (TTL = 60s, DLX points back to the work exchange)
DLQ: `psp.webhooks.dlq`
Policies (conceptually):
- `psp.webhooks.q` → `x-dead-letter-exchange=psp.retry.exchange`
- `psp.webhooks.retry.1m.q` → `x-message-ttl=60000`, `x-dead-letter-exchange=psp.work.exchange`
- `psp.webhooks.dlq` → monitoring and manual triage.
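The same policies expressed as queue arguments, roughly as they would be passed to an AMQP client's `queue_declare` (e.g. pika's `arguments=` parameter). Names follow the example; the structure is illustrative, not a complete declaration script:

```python
# Sketch: queue arguments for the delayed-retry topology above.
# Rejected message path: work -> (DLX) -> retry.1m -> (TTL + DLX) -> work.

work_q = {
    "name": "psp.webhooks.q",
    "arguments": {"x-dead-letter-exchange": "psp.retry.exchange"},
}
retry_1m_q = {
    "name": "psp.webhooks.retry.1m.q",
    "arguments": {
        "x-message-ttl": 60_000,                       # park for 60 seconds...
        "x-dead-letter-exchange": "psp.work.exchange"  # ...then back to work
    },
}
dlq = {"name": "psp.webhooks.dlq", "arguments": {}}   # terminal; humans look here

assert retry_1m_q["arguments"]["x-message-ttl"] == 60_000
```

A retry counter (e.g. the `x-death` header RabbitMQ stamps on dead-lettered messages) decides when a message stops cycling and is routed to the DLQ instead.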
Example: Kafka topic for bets
Topic: `bets.placed.v1`, partitions: 24, RF = 3, retention 7 days.
Message key: `player_id` or `bet_id` (choose whichever matters more for ordering).
Schema: Protobuf/Avro with `bet_id`, `player_id`, `stake`, `odds`, `ts`, `idempotency_key`.
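For concreteness, one possible Avro rendering of this schema. Field names follow the example above; the record name, namespace, and type choices (`long` stake in minor units, `timestamp-millis` for `ts`) are illustrative assumptions:

```json
{
  "type": "record",
  "name": "BetPlaced",
  "namespace": "igaming.bets.v1",
  "fields": [
    {"name": "bet_id", "type": "string"},
    {"name": "player_id", "type": "string"},
    {"name": "stake", "type": "long"},
    {"name": "odds", "type": "double"},
    {"name": "ts", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    {"name": "idempotency_key", "type": "string"}
  ]
}
```

Registered in the Schema Registry, this record becomes the contract that the compatibility checks in the "Data contracts and schemas" section enforce.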
Testing and quality
Contract tests producer/consumer + Schema Registry.
Chaos tests: node drops, network delays, split-brain.
Load runs at target TPS; verify p99 latency, lag growth, and recovery.
Summary
Kafka - the event highway and streaming: per-key ordering, retention/compaction, high TPS, real-time analytics.
RabbitMQ - operational task queues: flexible routing, acknowledgements, priorities, retries/DLQ, RPC.
In iGaming, best practice is complementary use: events and analytics in Kafka, integration/orchestration tasks in RabbitMQ, with uniform schema standards, idempotency, monitoring and strict SLOs.