Message Broker and Event Routing
(Section: Technology and Infrastructure)
Brief Summary
A message broker is the foundational integration layer and event bus of an iGaming platform. It delivers, buffers and routes messages between the betting, payments, anti-fraud, KYC, CRM and analytics microservices. Well-designed exchanges, queues, routing keys and redelivery rules provide low latency, resilience to traffic bursts and predictable SLOs.
Broker role in iGaming platform
Decoupling services: publishing events instead of hard synchronous calls.
Flexible routing: one event → many consumers (CRM, risk, analytics).
Load management: queues, prefetch/QoS, backpressure.
Reliability and recovery: acknowledgements, retries, DLQ, replication.
Audit and compliance: event tracing, PII masking, retention policy.
Messaging Models
Point-to-Point (task queue): one consumer processes a task (KYC, e-mail, PSP webhook).
Pub/Sub (domain events): publishing to an exchange that fans out to several queues.
RPC via broker: request/response with correlation (rarely on hot paths, but useful for integrations).
Routing Concepts (Classic AMQP)
Exchanges and bindings determine which queue a message lands in:
1. direct - exact match on `routing_key`.
2. topic - patterns over dot-separated words (`a.b.c`) with `*` (exactly one word) and `#` (zero or more words). The universal choice.
3. fanout - broadcast to all bound queues.
4. headers - routing by header key/value pairs, useful for complex policies.
Examples of keys and topologies:
- `payments.psp.stripe.succeeded`, `payments.psp.*.failed`, `bets.live.#`, `rg.limit.breach`.
- Exchanges by domain: `payments.topic`, `bets.topic`, `risk.topic`; a separate `platform.audit` exchange for system events.
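To make the topic wildcard rules concrete, here is a minimal sketch of topic matching in plain Python. The function name `topic_matches` is hypothetical and the code is an illustration of the `*` and `#` semantics, not the broker's actual implementation.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP-style topic pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i: int, j: int) -> bool:
        if i == len(p):
            # Pattern exhausted: match only if the key is exhausted too.
            return j == len(k)
        if p[i] == "#":
            # '#' absorbs zero or more words: try every possible split point.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j < len(k) and (p[i] == "*" or p[i] == k[j]):
            # '*' consumes exactly one word; a literal must be equal.
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

With this sketch, `payments.psp.*.failed` matches `payments.psp.stripe.failed` but not `payments.psp.failed`, while `bets.live.#` matches both `bets.live` and `bets.live.football.goal`.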
Queues and Policies
Work queue: consumed by business handlers.
Retry queues: with a TTL (the delay) and a DLX, used to implement exponential backoff (for example, `5s → 1m → 5m → 1h`).
DLQ (Dead-Letter Queue): the final destination once retries are exhausted.
Priorities: for urgent tasks (withdrawals > emails).
Lazy/Quorum: lazy - saves RAM with large backlogs; quorum - consensus-based HA.
Example topology:
- `work.q` → `x-dead-letter-exchange=retry.ex`
- `retry.1m.q` → `x-message-ttl=60000`, `x-dead-letter-exchange=work.ex`
- `dlq.q` → monitoring and manual remediation
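The work/retry/DLQ layout above can be sketched as a function that builds queue arguments. This is a minimal sketch: the function name, the `retry.<delay>ms.q` naming and the default delays are assumptions for illustration, while `x-message-ttl` and `x-dead-letter-exchange` are the queue arguments already named in the text.

```python
def retry_topology(delays_ms=(5_000, 60_000, 300_000, 3_600_000),
                   work_exchange="work.ex", retry_exchange="retry.ex"):
    """Build {queue_name: arguments} for a work queue, retry stages and a DLQ."""
    queues = {
        # Rejected work messages are dead-lettered into the retry exchange.
        "work.q": {"x-dead-letter-exchange": retry_exchange},
        # The DLQ has no TTL and no DLX: messages wait for manual remediation.
        "dlq.q": {},
    }
    for delay in delays_ms:
        # Each retry stage holds a message for `delay` ms, then the expired
        # message is dead-lettered back to the work exchange.
        queues[f"retry.{delay}ms.q"] = {
            "x-message-ttl": delay,
            "x-dead-letter-exchange": work_exchange,
        }
    return queues
```

The resulting dictionary could be fed to whatever client or IaC tool declares the queues. Choosing which retry stage a failed message enters, and when it goes to `dlq.q` instead, is left to bindings and consumer logic and is not shown here.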
Delivery guarantees and ordering
At-least-once - the default: duplicates are possible, so idempotency is mandatory.
At-most-once - minimal latency, but with a risk of loss (for non-critical signals).
Exactly-once - rarely practical in brokers; harder and more expensive to achieve. For money: at-least-once + strict idempotency.
Ordering:
- Within one queue with a single consumer, order is preserved; with parallel consumers and retries, order may be broken.
- For entities that require ordering, serialize the stream (single-active consumer per key) or move it to a log-based (streaming) bus.
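One way to serialize a stream per key is to hash the entity key onto a fixed set of queues, each read by a single consumer. The sketch below assumes that approach; the function name and the `bets.ordered` prefix are hypothetical.

```python
import zlib


def shard_queue(entity_key: str, shards: int, prefix: str = "bets.ordered") -> str:
    """Map an entity key to one of `shards` queues so its events share one stream."""
    if shards < 1:
        raise ValueError("shards must be >= 1")
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    index = zlib.crc32(entity_key.encode("utf-8")) % shards
    return f"{prefix}.{index}.q"
```

All events for the same player or bet then land in the same queue, so a single consumer per queue sees them in publish order. Changing the number of shards remaps keys, so it needs a drain or migration step.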
Idempotency and transactional publishing
Idempotency-Key in a message (ULID/UUID), dedup storage with TTL or upsert by key.
Outbox pattern: write the event to an `outbox` table within the business transaction; a relay then publishes it to the broker. This rules out dual-write inconsistencies and lost events.
Correlation metadata: `message_id`, `trace_id`, `causation_id`, `tenant_id`.
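The outbox and deduplication ideas above can be sketched with SQLite from the standard library. This is a minimal illustration under stated assumptions: the table layout, the function names and the `publish` callback are hypothetical, and a real relay would publish with broker confirms.

```python
import json
import sqlite3
import uuid


def init(db):
    db.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, amount_minor INTEGER)")
    db.execute("CREATE TABLE outbox (message_id TEXT PRIMARY KEY, routing_key TEXT,"
               " body TEXT, published INTEGER DEFAULT 0)")
    db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")


def record_payment(db, payment_id, amount_minor):
    """Write the business row and its event in one transaction."""
    message_id = str(uuid.uuid4())
    body = json.dumps({"message_id": message_id, "payment_id": payment_id})
    with db:  # commits both inserts together, or rolls both back
        db.execute("INSERT INTO payments VALUES (?, ?)", (payment_id, amount_minor))
        db.execute("INSERT INTO outbox (message_id, routing_key, body) VALUES (?, ?, ?)",
                   (message_id, "payments.psp.stripe.succeeded", body))
    return message_id


def relay(db, publish):
    """Publish pending outbox rows; mark a row only after publish() returns."""
    rows = db.execute("SELECT message_id, routing_key, body FROM outbox"
                      " WHERE published = 0").fetchall()
    for message_id, routing_key, body in rows:
        publish(routing_key, body)  # may raise; the row then stays pending
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE message_id = ?",
                       (message_id,))
    return len(rows)


def handle_once(db, message_id, handler):
    """Run handler unless this message_id was already processed."""
    try:
        with db:  # the dedup row is rolled back if handler() raises
            db.execute("INSERT INTO processed VALUES (?)", (message_id,))
            handler()
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: skip
    return True
```

If the relay crashes between `publish()` and the `UPDATE`, the row is published again on the next run. That is at-least-once delivery, which is why the consumer side needs `handle_once` or an equivalent upsert by key.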
RPC via broker (when needed)
The request is published with `reply_to` and `correlation_id`; the response arrives in the specified queue.
Use sparingly (external providers, synchronous checks) and keep timeouts and chattiness under control, otherwise the system degrades into a distributed monolith.
For hot user paths, asynchronous events + state projections are preferred.
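The request/response correlation described above can be sketched without a real broker. In this minimal sketch the class name, the `publish` callback signature and the `rpc.replies.q` queue name are assumptions; only `reply_to` and `correlation_id` come from the text.

```python
import queue
import threading
import uuid


class RpcClient:
    """Matches replies to requests by correlation_id, with a timeout per call."""

    def __init__(self, publish, reply_queue="rpc.replies.q"):
        self._publish = publish          # hypothetical transport callback
        self._reply_queue = reply_queue  # hypothetical reply queue name
        self._pending = {}
        self._lock = threading.Lock()

    def call(self, routing_key, body, timeout=2.0):
        correlation_id = str(uuid.uuid4())
        slot = queue.Queue(maxsize=1)
        with self._lock:
            self._pending[correlation_id] = slot
        try:
            self._publish(routing_key, body,
                          reply_to=self._reply_queue,
                          correlation_id=correlation_id)
            # Raises queue.Empty if no reply arrives within the timeout.
            return slot.get(timeout=timeout)
        finally:
            with self._lock:
                self._pending.pop(correlation_id, None)

    def on_reply(self, correlation_id, body):
        """Called by the consumer of the reply queue for every reply message."""
        with self._lock:
            slot = self._pending.get(correlation_id)
        if slot is not None:
            slot.put(body)
        # Late or unknown replies are dropped.
```

The timeout is the important part: a caller that waits forever on a lost reply turns a broker hiccup into a stuck request thread.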
Data Contracts and Schemas
Formats: Avro/Protobuf/JSON-Schema. For JSON, fix the versioning and required fields.
Evolution policy: backward-compatible changes only; breaking changes without a migration are prohibited.
PII - tokenization/encryption of fields; defined purpose and retention period.
Error Handling, Retries, DLQ
Classification: transient (network/5xx) → retry; fatal (validation/schema) → DLQ.
Exponential backoff + jitter, retry limit, poison-pill marking.
Delayed delivery: via TTL/delayed exchange.
A tool to re-inject messages from the DLQ into the work queue once the cause is fixed.
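The classification and backoff rules above might be combined as follows. This is a sketch under assumptions: the error labels, the function name and the default limits are hypothetical, and the jitter variant shown is "full jitter" (a uniform delay between zero and the exponential ceiling).

```python
import random

# Hypothetical labels for errors that are worth retrying.
TRANSIENT = {"timeout", "connection_reset", "http_5xx"}


def next_action(error_kind, attempt, max_attempts=5,
                base_s=5.0, cap_s=3600.0, rng=random.random):
    """Decide what to do after a failure.

    `attempt` is the number of retries already made (0 for the first failure).
    Returns ("retry", delay_seconds) or ("dlq", None).
    """
    if error_kind not in TRANSIENT or attempt >= max_attempts:
        # Fatal errors and exhausted retries go straight to the DLQ.
        return ("dlq", None)
    ceiling = min(cap_s, base_s * 2 ** attempt)
    # Full jitter spreads retries out so consumers do not retry in lockstep.
    return ("retry", rng() * ceiling)
```

Passing `rng` explicitly keeps the function deterministic in tests. With queue-based delays the computed value would be rounded to the nearest available retry stage.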
Observability and SLO
Producer metrics: publishing speed, errors/confirmations.
Queue metrics: length, consumption rate, percentage of retries, p99 time in queue.
Consumers: lag, throughput, processing time, NACK share.
SLO: p99 E2E latency of event delivery ≤ X seconds; availability ≥ 99.9%; DLQ rate ≤ Y%.
Tracing: end-to-end `trace_id`/`span_id`, logs keyed by `message_id`.
Alerts: DLQ or lag growth, loss of quorum, NACK surges, messages stuck in retry stages.
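As an illustration of how the latency and DLQ-rate targets could be checked, here is a small sketch using a nearest-rank percentile. The function names are hypothetical, and the thresholds are parameters because the text leaves X and Y unspecified.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list (q between 0 and 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def slo_report(latencies_s, dead_lettered, delivered, max_p99_s, max_dlq_rate):
    """Compare observed p99 latency and DLQ rate against the given targets."""
    p99 = percentile(latencies_s, 99)
    dlq_rate = dead_lettered / delivered if delivered else 0.0
    return {
        "p99_s": p99,
        "dlq_rate": dlq_rate,
        "p99_ok": p99 <= max_p99_s,
        "dlq_ok": dlq_rate <= max_dlq_rate,
    }
```

In production these figures would normally come from the metrics backend rather than raw samples; the sketch only shows what the SLO expressions mean.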
Security and access
TLS/mTLS in transit; encryption at rest for persistent queues.
RBAC/ACL: publish/consume rights by vhost/namespace/topic.
Segmentation: sensitive domains (payments/CCM) - separate exchanges/clusters.
Secrets in Vault/SOPS; audit log of publications/subscriptions.
Data localization: storage and retention by region (EU, Turkey, LatAm).
High Availability and DR
Quorum queues/replication, odd number of nodes, AZ anti-affinity.
Cross-regional replication (federation/shovel) for critical domains.
Failover procedure (runbook), periodic DR exercises (game days).
Topologies versioned as code (IaC) - repeatable deployments and fast resync.
Performance and tuning
Producer: publisher confirms, channel reuse, asynchronous publications.
Queues: prefetch tuned to the average task duration; lazy queues for deep backlogs; hot queues spread across nodes.
Network/OS: 10/25G, file descriptors, TCP tuning. JVM/GC - tuned to the load profile.
Tests for burst loads (matches, tournaments, peak payments).
Typical routing patterns for iGaming
1. Payment events (topic):
Exchange: `payments.topic`
Keys:
- `payments.psp.stripe.succeeded`
- `payments.psp.*.failed`
- `withdrawal.requested.#`
Queues:
- `ledger.writer.q` (bind: `payments.#`)
- `crm.triggers.q` (bind: `payments.*.*.succeeded`)
- `risk.reviews.q` (bind: `withdrawal.#`)
2. Antifraud scoring (direct + retry):
`risk.work.q` ← `risk.direct` (`routing_key=risk.check`)
`risk.retry.1m.q` (TTL 60s → DLX back to `risk.direct`)
`risk.dlq.q` for fatal errors.
3. Notifications (fanout + priority):
`notify.fanout` → `email.q (prio)`, `sms.q`, `push.q`
Priorities: withdrawals/limits above marketing mailings.
4. Audit and trace (headers):
Header bindings `{"tenant": "X", "critical": "true"}` → a separate audit queue.
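For the headers pattern, the matching rule can be sketched as below. The function name is hypothetical; `x-match` with the values `all` (the default) and `any` is the standard binding argument of AMQP headers exchanges. The sketch simplifies real broker behavior, for example around other `x-` prefixed keys.

```python
_MISSING = object()  # sentinel so an absent header never equals a binding value


def headers_match(binding_args: dict, message_headers: dict) -> bool:
    """Return True if message headers satisfy a headers-exchange binding."""
    args = dict(binding_args)
    mode = args.pop("x-match", "all")
    hits = [message_headers.get(key, _MISSING) == value
            for key, value in args.items()]
    # "any": one matching pair is enough; "all": every pair must match.
    return any(hits) if mode == "any" else all(hits)
```

With the audit binding from the list above, a message carrying `tenant=X` and `critical=true` is routed to the audit queue, while a message with only `tenant=X` is routed there only if the binding uses `x-match=any`.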
Example of a Minimal Message Schema (JSON)
```json
{
  "message_id": "01HX8H8Y6D6W0T1S2A3B4C5D6E",
  "trace_id": "f4d2a1...e9",
  "occurred_at": "2025-11-05T11:20:45.321Z",
  "tenant_id": "eu-1",
  "schema_version": 3,
  "event": "payments.psp.stripe.succeeded",
  "payload": {
    "payment_id": "pay_123",
    "player_id": "p_987",
    "amount": { "currency": "EUR", "value": 50.00 },
    "psp_tx": "tx_456",
    "idempotency_key": "ulid_..."
  }
}
```
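A consumer could check the envelope fields of such a message before processing it. The sketch below is an assumption about how that might look, not a replacement for a real Avro/Protobuf/JSON-Schema contract; the function name and error strings are hypothetical.

```python
from datetime import datetime

# Envelope fields from the example message and their expected Python types.
REQUIRED_FIELDS = {
    "message_id": str,
    "trace_id": str,
    "occurred_at": str,
    "tenant_id": str,
    "schema_version": int,
    "event": str,
    "payload": dict,
}


def validate_envelope(message: dict) -> list:
    """Return a list of problems; an empty list means the envelope is valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in message:
            errors.append(f"missing field: {name}")
            continue
        value = message[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {name}: expected {expected.__name__}")
    occurred_at = message.get("occurred_at")
    if isinstance(occurred_at, str):
        try:
            # Older Python versions do not accept a trailing "Z" directly.
            datetime.fromisoformat(occurred_at.replace("Z", "+00:00"))
        except ValueError:
            errors.append("occurred_at is not an ISO 8601 timestamp")
    return errors
```

A message that fails this check is a fatal (non-retryable) error in the classification used earlier, so it belongs in the DLQ rather than in a retry queue.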
Integration with other subsystems
Streaming/analytics: important topics can be duplicated into a log bus (Kafka/Redpanda) for retention and reprocessing.
Feature store: events → online features (Redis) and offline batches (Parquet/OLAP).
Saga orchestration: commands via direct/topic, events - pub/sub; compensating steps - as separate messages.
Implementation checklist
1. Define domain exchanges and a routing key standard.
2. Design work/retry/DLQ queues for each critical flow.
3. Enable publisher confirms, `prefetch`, priorities and delays where needed.
4. Introduce idempotency keys, outbox, and correlation IDs.
5. Approve data schemas and evolution rules.
6. Configure TLS/RBAC, segmentation by domain/tenant.
7. Set SLO and alerts (lag, DLQ-rate, p99).
8. Prepare a DR plan and automate topologies as IaC.
9. Perform load and chaos tests.
10. Document the incident runbook and the DLQ re-injection procedure.
Anti-patterns
One "giant" exchange with no key discipline; ad hoc bindings.
No retry/DLQ, and transient and fatal errors mixed together.
Synchronous RPC over the broker on user hot paths.
Lack of idempotency and outbox → duplicates or lost money.
PII stored in cleartext; blanket publish/consume rights for everyone.
Summary
A well-designed message broker is a robust event artery where routing is predictable and fault tolerance is built in at the topology level. Use topic exchanges, a single key standard, work/retry/DLQ for each critical stream, idempotency and outbox, strict SLOs and observability. Together with a streaming bus and state projections, this gives the iGaming platform sustained speed, transparency and control over complexity as the load grows.