Microservice architecture
1) Why microservices in iGaming
Speed of change: independent releases of features by each team (payments, content, risk, tournaments).
Reliability: the failure of one service does not bring down the entire product (limited blast radius).
Scale: horizontal scaling of "hot" domains (wallet, lobby, streams).
Compliance: segregation of data by region/jurisdiction.
When it is not worth it: a small team or low volume, no DevOps practices, weak test automation; in that case a modular monolith is better.
2) Domains, boundaries and teams (DDD + Team Topologies)
Domain contexts: Account/Profile, KYC/Compliance, Payments/Wallet, Game Content/Aggregation, Bonuses/Missions, Tournaments, Marketing/CRM, Reporting/BI.
Bounded context = its own data model plus a language (ubiquitous language) contract.
Teams ↔ domains: one stream-aligned team = one domain context + its SLOs.
BFF (Backend for Frontend): separate facades for Web/Mobile/Partner, so that orchestration does not accumulate on the client.
3) Communications: synchronous vs asynchronous
Synchronous (REST/gRPC): when an immediate response is needed (e.g. checking deposit limits).
Asynchronous (Kafka/NATS/SQS): events and background processes (cashback accrual, mailings, rating updates).
- Critical path = minimum network hops.
- Cross-domain integration - through events and contractual APIs.
- Do not build online chains of 5+ synchronous calls; use EDA/sagas instead.
4) Contracts and versioning
Contract-first: OpenAPI/AsyncAPI + Schema Registry (Avro/JSON Schema).
SemVer + compatibility: adding optional fields must not break existing clients.
Consumer-driven contracts (CDC): automatic checks in CI against regressions.
Deprecation policy: a support window (12-18 months) plus telemetry on the use of older versions.
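A minimal sketch of the "adding fields does not break clients" rule as a CI check. It assumes schemas are reduced to a hypothetical dict of field name → `{"type", "required"}`; `breaking_changes` is an illustrative name, and real schema registries apply their own configurable compatibility modes.

```python
from typing import Dict, List

# Hypothetical simplified schema: field name -> {"type": str, "required": bool}
Schema = Dict[str, Dict[str, object]]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List changes in `new` that could break existing clients of `old`."""
    problems: List[str] = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(f"type changed: {name}")
    for name, spec in new.items():
        # A new field is only safe if clients may omit it (i.e. it is optional).
        if name not in old and spec.get("required", False):
            problems.append(f"new required field: {name}")
    return problems


v1 = {"amount": {"type": "number", "required": True}}
v2 = {**v1, "note": {"type": "string", "required": False}}
```

An empty result for `breaking_changes(v1, v2)` means the change is additive; a non-empty list would fail the pipeline.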
5) Events, sagas and consistency
Outbox/transaction log tailing: the data change and its event are written atomically.
Saga patterns:
- Orchestration (central coordinator) for payments/withdrawals.
- Choreography (reaction to events) for bonuses/missions.
- Idempotency: keys built from `entityId + action + nonce`, with a dedup registry in storage.
- Consistency: across services it is eventual, via events; within a service, via local transactions.
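A minimal sketch of the outbox pattern combined with a dedup registry, using the standard-library `sqlite3` module as a stand-in for the service's own database. The table names, the `credit`/`relay` helpers and the `publish` callback are hypothetical; a production relay would more likely tail the transaction log (e.g. with Debezium) or use a real broker client.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wallet (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE outbox (
        id TEXT PRIMARY KEY,
        topic TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY);
    """
)
conn.execute("INSERT INTO wallet VALUES ('u1', 0)")
conn.commit()


def credit(user_id: str, amount: int, currency: str, idempotency_key: str) -> bool:
    """Apply a credit once: dedup record, balance change and event commit together."""
    try:
        with conn:  # one transaction: commit on success, rollback on any error
            conn.execute("INSERT INTO processed VALUES (?)", (idempotency_key,))
            conn.execute(
                "UPDATE wallet SET balance = balance + ? WHERE user_id = ?",
                (amount, user_id),
            )
            event = {"userId": user_id, "amount": amount, "currency": currency,
                     "sourceEventId": idempotency_key}
            conn.execute(
                "INSERT INTO outbox (id, topic, payload) VALUES (?, ?, ?)",
                (str(uuid.uuid4()), "wallet.credit.applied", json.dumps(event)),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate key: already applied, nothing changed


def relay(publish) -> int:
    """Hand unpublished events to the broker; delivery is at-least-once."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because a crash between `publish` and the `UPDATE` re-sends the event, consumers still need their own idempotency.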
6) Data and storage
The "own store" principle: each service owns its database (schema isolation).
Storage selection by access pattern:
- Transactions/balances: relational (PostgreSQL) with strict invariants.
- Events/log: append-only (Kafka/Redpanda).
- Cache/sessions: Redis/KeyDB; leaderboards: Redis Sorted Sets.
- Search: OpenSearch/Elastic.
- Materialized read projections (CQRS): fast lists/reports.
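A minimal sketch of a CQRS read projection: a per-player payments summary built only by folding events, never written to directly. It assumes `PaymentSettled/Failed` from section 12.1 means two events, `PaymentSettled` and `PaymentFailed`; the `Event` shape and class names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict


@dataclass(frozen=True)
class Event:
    type: str      # e.g. "PaymentSettled"
    user_id: str
    amount: int    # minor units


@dataclass
class PaymentsSummary:
    settled_total: int = 0
    settled_count: int = 0
    failed_count: int = 0


class PaymentsSummaryProjection:
    """Read model for fast per-player lists/reports, rebuilt from events."""

    HANDLED = {"PaymentSettled", "PaymentFailed"}

    def __init__(self) -> None:
        self.rows: DefaultDict[str, PaymentsSummary] = defaultdict(PaymentsSummary)

    def apply(self, event: Event) -> None:
        if event.type not in self.HANDLED:
            return  # ignore events this view does not need (creates no row)
        row = self.rows[event.user_id]
        if event.type == "PaymentSettled":
            row.settled_total += event.amount
            row.settled_count += 1
        else:
            row.failed_count += 1
```

Rebuilding the view after a schema change is just replaying the event log into a fresh projection.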
7) Reliability and stability
Timeouts; retries with jitter and a retry budget, for idempotent operations only.
Circuit breakers/outlier ejection between services.
Bulkheads: separate pools for "noisy" upstreams.
Rate limits per-client/route, backpressure (503 + Retry-After).
Dead-letter + poison-message handling in queues.
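A minimal sketch of retries with exponential backoff, "full jitter" and a retry budget, refusing to retry non-idempotent calls. `RetryBudget` and `call_with_retry` are hypothetical names, and the budget keeps cumulative counters for brevity where a real one would use a sliding time window.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryBudget:
    """Caps retries at `ratio` of first attempts, after a small free allowance."""

    def __init__(self, ratio: float = 0.1, min_retries: int = 10) -> None:
        self.ratio = ratio
        self.min_retries = min_retries
        self.requests = 0  # cumulative here; production code would window these
        self.retries = 0

    def can_retry(self) -> bool:
        return self.retries < max(self.min_retries, self.ratio * self.requests)


def call_with_retry(
    op: Callable[[], T],
    budget: RetryBudget,
    idempotent: bool,
    attempts: int = 3,
    base: float = 0.05,
    cap: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    budget.requests += 1
    for attempt in range(attempts):
        try:
            return op()  # `op` is expected to enforce its own timeout
        except (TimeoutError, ConnectionError):
            if not idempotent or attempt == attempts - 1 or not budget.can_retry():
                raise
            budget.retries += 1
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)].
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")
```

The budget is what stops a fleet of clients from multiplying load on an upstream that is already failing.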
8) Observability
Traces: OpenTelemetry (`trace_id` propagated gateway→services→DB).
Metrics: RPS, p50/p95/p99, error rate 4xx/5xx, saturation (CPU/mem/queue), business metrics (TTP, TtW).
Logs: structured JSON, PII/PAN/IBAN masking, correlation by `trace_id`.
SLOs/alerts per route/function (for example, `Deposit p95 ≤ 300 ms`, `success ≥ 98.5%`).
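A minimal sketch of structured JSON logging with PAN/IBAN masking and a `trace_id` field, using only the standard `logging`, `json` and `re` modules. The regexes are simplified assumptions: the PAN pattern also hits other 13-19 digit runs (production masking usually adds a Luhn check), and the IBAN pattern only checks the general shape.

```python
import json
import logging
import re

# Simplified assumptions: PAN = 13-19 digits (keep first 6 and last 4),
# IBAN = 2 letters + 2 digits + 11-30 alphanumerics (keep prefix and last 4).
PAN_RE = re.compile(r"\b(\d{6})\d{3,9}(\d{4})\b")
IBAN_RE = re.compile(r"\b([A-Z]{2}\d{2})[A-Z0-9]{7,26}([A-Z0-9]{4})\b")


def _mask_pan(m: re.Match) -> str:
    hidden = len(m.group(0)) - len(m.group(1)) - len(m.group(2))
    return m.group(1) + "*" * hidden + m.group(2)


def mask(text: str) -> str:
    text = PAN_RE.sub(_mask_pan, text)
    return IBAN_RE.sub(lambda m: m.group(1) + "****" + m.group(2), text)


class JsonFormatter(logging.Formatter):
    """One JSON object per line; `trace_id` comes from `extra={...}` if given."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "msg": mask(record.getMessage()),
            "trace_id": getattr(record, "trace_id", None),
        })
```

Usage: attach `JsonFormatter` to a handler and log with `logger.info("payout to %s", iban, extra={"trace_id": tid})`.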
9) Security and compliance
Zero Trust: service↔service mTLS (SPIFFE/SPIRE), short-lived certificates.
AuthN/Z: OAuth2/JWT (aud/scope/exp), attribute-based access control (ABAC).
Secrets: KMS/Secrets Manager/Sealed Secrets, key rotation.
GDPR/data localization: regional clusters, geo-fencing on the API gateway.
Audit: immutable logs (WORM), an audit trail of admin actions.
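A minimal sketch of the `aud`/`exp`/`scope` checks on JWT claims. It assumes the signature has already been verified by a JWT library and the claims decoded into a dict; `check_claims`, `Forbidden` and the scope names are hypothetical. Per RFC 7519, `aud` may be a string or a list and `exp` is seconds since the epoch; `scope` is treated as a space-separated string, as in OAuth 2.0.

```python
import time
from typing import Iterable, Mapping, Optional


class Forbidden(Exception):
    pass


def check_claims(
    claims: Mapping[str, object],
    audience: str,
    required_scopes: Iterable[str],
    now: Optional[float] = None,
    leeway: float = 30.0,  # tolerated clock skew, seconds
) -> None:
    """Raise Forbidden unless the (already verified) token fits this service."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        raise Forbidden("wrong audience")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp + leeway < now:
        raise Forbidden("token expired")
    granted = set(str(claims.get("scope", "")).split())
    missing = set(required_scopes) - granted
    if missing:
        raise Forbidden(f"missing scopes: {sorted(missing)}")
```

Attribute-based rules (jurisdiction, account status) would be evaluated after these generic checks pass.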
10) Deployment and Releases
Containers/K8s: one service = one deploy; resources/limits; PodDisruptionBudget.
CI/CD: linters, unit/contract/integ tests, security scan, SBOM.
Releases: canary/blue-green/shadow; schema migrations via expand-and-contract.
Autoscaling: HPA on CPU + RPS + p95 + queue depth; connection draining on scale-in.
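A minimal sketch of how a multi-metric HPA picks a replica count. It follows the documented Kubernetes formula `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`, applies a tolerance band, takes the maximum across metrics and clamps to min/max; it ignores stabilization windows and pod readiness handling. Metric names and targets are hypothetical (RPS, p95 and queue depth would come through a custom/external metrics adapter).

```python
import math
from typing import Dict


def desired_replicas(
    current: int,
    metrics: Dict[str, float],  # metric name -> observed value per pod
    targets: Dict[str, float],  # metric name -> target value per pod
    min_replicas: int = 2,
    max_replicas: int = 50,
    tolerance: float = 0.1,
) -> int:
    proposals = []
    for name, target in targets.items():
        ratio = metrics[name] / target
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)  # close enough to target: no change
        else:
            proposals.append(math.ceil(current * ratio))
    # The most demanding metric wins, then bounds are applied.
    return max(min_replicas, min(max_replicas, max(proposals)))
```

For example, with 4 pods at 90% CPU against a 65% target, 300 RPS per pod against 200 and a half-empty queue, the result is 6 replicas.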
11) Performance and cost
Profiling: p95/p99 per service and method, flame graphs.
Caching: read-through/write-through; TTL plus event-driven invalidation.
Data locality: keep hot data close to computation.
FinOps: a utilization target of 60-70%, "warm pools", auto-pause of idle workers.
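A minimal sketch of a read-through cache with a TTL plus event-driven invalidation (write-through is not shown). `ReadThroughCache`, `loader` and `on_event` are hypothetical names; the injectable clock exists only to make expiry testable.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    def __init__(self, loader: Callable[[K], V], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.loader = loader  # reads from the source of truth on a miss
        self.ttl = ttl
        self.clock = clock
        self.entries: Dict[K, Tuple[float, V]] = {}  # key -> (expires_at, value)

    def get(self, key: K) -> V:
        hit = self.entries.get(key)
        if hit is not None and hit[0] > self.clock():
            return hit[1]
        value = self.loader(key)  # miss or expired: read through to the source
        self.entries[key] = (self.clock() + self.ttl, value)
        return value

    def on_event(self, key: K) -> None:
        """Drop the entry as soon as a domain event says the source changed."""
        self.entries.pop(key, None)
```

The TTL bounds staleness if an invalidation event is lost; the event keeps staleness short in the normal case.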
12) Domain templates (iGaming)
12.1 Payments/Wallet
Services: `payments-gw` (facade), `wallet`, `psp-adapters-`, `fraud-check`.
Flow: `init → reserve → capture/rollback` (saga).
Events: `PaymentInitiated`, `PaymentAuthorized`, `PaymentSettled/Failed`.
Idempotency: `Idempotency-Key`, dedup in `wallet`.
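A minimal sketch of an orchestrated withdrawal saga following `init → reserve → capture/rollback`, with a dedup registry keyed by the `Idempotency-Key`. The in-memory `Wallet`, the `psp_payout` callable and `PspDeclined` are hypothetical stand-ins for the `wallet` service and a PSP adapter; a real orchestrator would persist saga state so it can resume after a crash.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class PspDeclined(Exception):
    """Raised by the (hypothetical) PSP adapter when a payout is refused."""


@dataclass
class Wallet:
    """In-memory stand-in for the `wallet` service; amounts in minor units."""
    balances: Dict[str, int]
    holds: Dict[str, int] = field(default_factory=dict)

    def reserve(self, key: str, user: str, amount: int) -> None:
        if self.balances[user] < amount:
            raise ValueError("insufficient funds")
        self.balances[user] -= amount
        self.holds[key] = amount

    def rollback(self, key: str, user: str) -> None:
        self.balances[user] += self.holds.pop(key, 0)  # compensating action

    def capture(self, key: str) -> None:
        self.holds.pop(key, None)  # the hold becomes a final debit


class WithdrawalSaga:
    """Central coordinator for init -> reserve -> capture/rollback."""

    def __init__(self, wallet: Wallet, psp_payout: Callable[[str, int], None]) -> None:
        self.wallet = wallet
        self.psp_payout = psp_payout
        self.results: Dict[str, str] = {}  # dedup: Idempotency-Key -> outcome
        self.events: List[str] = []

    def run(self, key: str, user: str, amount: int) -> str:
        if key in self.results:
            return self.results[key]  # replayed request: same answer, no side effects
        self.events.append("PaymentInitiated")
        outcome = self._execute(key, user, amount)
        self.events.append("PaymentSettled" if outcome == "settled" else "PaymentFailed")
        self.results[key] = outcome
        return outcome

    def _execute(self, key: str, user: str, amount: int) -> str:
        try:
            self.wallet.reserve(key, user, amount)
        except ValueError:
            return "failed"
        try:
            self.psp_payout(key, amount)  # the same key is passed downstream
        except PspDeclined:
            self.wallet.rollback(key, user)
            return "failed"
        self.events.append("PaymentAuthorized")
        self.wallet.capture(key)
        return "settled"
```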
12.2 KYC/Compliance
Services: `kyc-flow`, `doc-storage`, `sanctions-check`, `pep-screening`.
Events: `KycSubmitted/Approved/Rejected`, `RiskScoreUpdated`.
Audit and ETA: task queue, case timeline, follow-up actions.
12.3 Bonuses/Missions
Services: `bonus-engine`, `wallet-bonus`, `eligibility`.
Choreography: `BetPlaced → BonusEngine → BonusGranted → WalletCredited`.
Abuse protection: idempotent grants, limits, rule simulator.
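A minimal sketch of the choreography chain with idempotent grants and a per-grant limit. A tiny synchronous in-memory `Bus` stands in for Kafka/NATS; the payload fields, `MAX_GRANT` and the 10% rule are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set

Handler = Callable[[dict], None]


class Bus:
    """Tiny synchronous in-memory pub/sub standing in for Kafka/NATS topics."""

    def __init__(self) -> None:
        self.subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subs[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self.subs[topic]:
            handler(event)


bus = Bus()
MAX_GRANT = 500                      # hypothetical per-grant cap, minor units
granted_bets: Set[str] = set()       # bonus-engine dedup registry
credited_grants: Set[str] = set()    # wallet-bonus dedup registry
bonus_balance: Dict[str, int] = defaultdict(int)


def bonus_engine(event: dict) -> None:
    if event["betId"] in granted_bets:
        return                       # redelivered BetPlaced: no second grant
    granted_bets.add(event["betId"])
    amount = min(event["stake"] // 10, MAX_GRANT)  # hypothetical 10% rule
    bus.publish("BonusGranted", {"grantId": "g-" + event["betId"],
                                 "userId": event["userId"], "amount": amount})


def wallet_bonus(event: dict) -> None:
    if event["grantId"] in credited_grants:
        return                       # redelivered BonusGranted: credit once
    credited_grants.add(event["grantId"])
    bonus_balance[event["userId"]] += event["amount"]
    bus.publish("WalletCredited", event)


# Each service only reacts to events; neither calls the other directly.
bus.subscribe("BetPlaced", bonus_engine)
bus.subscribe("BonusGranted", wallet_bonus)
```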
12.4 Tournaments/Leaderboards
Services: `tournament-svc`, `scoring`, `leaderboard`.
Storage: Redis ZSET + periodic "flush" to OLAP.
Events: `ScoreUpdated`, `TournamentClosed`, `RewardIssued`.
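A minimal in-memory sketch of the leaderboard semantics a Redis sorted set provides: `incr` plays the role of `ZINCRBY` on each `ScoreUpdated`, `top` the role of a reverse range query, and `flush` produces the snapshot sent to OLAP. The class is hypothetical; it sorts on every read and breaks ties by player id, whereas Redis keeps a skip list and has its own tie ordering.

```python
from typing import Dict, List, Tuple


class Leaderboard:
    """In-memory stand-in for a Redis sorted set used as a tournament board."""

    def __init__(self) -> None:
        self.scores: Dict[str, float] = {}

    def incr(self, player: str, delta: float) -> float:
        # Like ZINCRBY: add to the player's score, creating it if absent.
        self.scores[player] = self.scores.get(player, 0.0) + delta
        return self.scores[player]

    def top(self, n: int) -> List[Tuple[str, float]]:
        # Highest score first; ties broken by player id in this sketch.
        return sorted(self.scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

    def flush(self) -> List[Tuple[str, float]]:
        """Full ranked snapshot for the periodic OLAP flush (e.g. on TournamentClosed)."""
        return self.top(len(self.scores))
```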
13) Contract + Event Example (Simplified)
OpenAPI (fragment) - Wallet
```yaml
paths:
  /v1/wallet/{userId}/credit:
    post:
      parameters:
        - name: userId
          in: path
          required: true
          schema: { type: string }
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreditRequest'
      responses:
        '202': { description: 'Enqueued' }
components:
  schemas:
    CreditRequest:
      type: object
      required: [amount, currency, reason, idempotencyKey]
      properties:
        amount: { type: number }
        currency: { type: string, example: UAH }
        reason: { type: string, enum: [Deposit, Bonus, Adjustment] }
        idempotencyKey: { type: string }
```
AsyncAPI (fragment) - event
```yaml
channels:
  wallet.credit.applied:
    publish:
      message:
        name: WalletCreditApplied
        payload:
          type: object
          required: [userId, amount, currency, sourceEventId]
```
14) Testing
Unit and property-based tests for domain rules.
CDC (Pact/Assertible): provider/consumer contract tests.
Integration tests against local brokers (Testcontainers).
E2E tests of critical flows (registration→deposit→game start→withdrawal).
Chaos/failover tests: PSP outage, broker drop, zone loss.
15) Metrics and SLO (minimum)
Service availability: ≥ 99.9% for payments/wallet.
Latency p95: ≤ 300-500 ms for critical-path APIs.
Error budget: 0.1–0.5% per quarter, with burn-rate alerts.
Queues: event lead time (produce→consume), DLQ share ≤ 0.1%.
Business: TTP, TtW, FTD-success, KYC-TtV.
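A minimal sketch of the error-budget arithmetic behind burn-rate alerts: burn rate is the observed error ratio divided by the budget `1 − SLO`, so 1.0 spends the budget exactly over the window. The two-window paging rule and the 14.4 default follow the commonly cited Google SRE Workbook example, which is computed for a 30-day window; for a quarterly budget the thresholds would need recalculating. Function names are hypothetical.

```python
def error_budget(slo: float) -> float:
    """Allowed failure ratio, e.g. an SLO of 0.999 leaves 0.001."""
    return 1.0 - slo


def allowed_downtime_minutes(slo: float, days: int) -> float:
    """The budget expressed as minutes of full outage over the SLO window."""
    return error_budget(slo) * days * 24 * 60


def burn_rate(errors: int, total: int, slo: float) -> float:
    """1.0 = on track to spend exactly the budget; above 1.0 = faster."""
    if total == 0:
        return 0.0
    return (errors / total) / error_budget(slo)


def should_page(long_window_rate: float, short_window_rate: float,
                threshold: float = 14.4) -> bool:
    # Both windows must burn fast, so a spike that has already ended
    # (high long window, low short window) does not page anyone.
    return long_window_rate >= threshold and short_window_rate >= threshold
```

For example, a 99.9% SLO over a 90-day quarter allows about 129.6 minutes of full outage.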
16) Checklists
Service design
- Clear domain boundary and data owner.
- OpenAPI/AsyncAPI contracts + schemas in Registry.
- SLOs/alerts defined; metrics/traces/logs are built in.
- Timeout/retry/idempotency policies.
- Schema migrations: expand-and-contract.
Before Release
- Unit/CDC/integration tests green.
- Canary route and rollback plan.
- Rate limits and route weights are configured.
- Secrets/keys/certificates are provisioned.
- Feature flags and fallbacks are prepared.
17) Anti-patterns
Network as data bus: deep synchronous chains instead of events.
A shared "god" database for all services.
Lack of idempotency → double write-offs/accruals.
Dark releases without telemetry and kill-switch.
Hidden session state (sticky sessions everywhere instead of externalized state).
Contracts "in code" without version and CDC.
Logic in API gateway instead of services (gateway = thin).
18) Monolith Migration (Strangler Fig)
1. Put a facade gateway in front and pick the first domain context to extract (for example, payments).
2. Stream changes out of the monolith as events (outbox or binary-log CDC).
3. Gradually move endpoints to the new service (routing/canary weights).
4. Shrink the monolith down to a "core", then switch it off.
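A minimal sketch of the weighted routing in step 3: hashing the user id gives each player a stable bucket, so a given player stays on one backend while the percentage is raised. Gateways such as Envoy or NGINX express weights in their own configuration; this function, its name and the `payments` route label are hypothetical and only illustrate the bucketing idea.

```python
import hashlib


def route(user_id: str, new_service_percent: int, route_name: str = "payments") -> str:
    """Send a stable share of users to the new service; the rest stay on the monolith."""
    digest = hashlib.sha256(f"{route_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # 0..99, fixed per user
    return "new-service" if bucket < new_service_percent else "monolith"
```

Because buckets are fixed, raising the weight from 10% to 30% only adds users to the new service; nobody who was already migrated falls back to the monolith.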
19) Stack and infrastructure (example)
Communications: REST/gRPC, Kafka/NATS; Schema Registry.
Storage: PostgreSQL, Redis, OpenSearch, S3/MinIO; OLAP: ClickHouse/BigQuery.
Containers/orchestration: Docker, Kubernetes (Ingress/Gateway), Service Mesh (Istio/Linkerd) if necessary.
Gateway: Envoy/Kong/Traefik/NGINX.
CI/CD: GitHub Actions/GitLab CI + ArgoCD/Flux; Pact, OWASP ZAP.
Observability: OpenTelemetry, Prometheus, Tempo/Jaeger, Loki.
20) Final cheat sheet
Design boundaries by domain and data responsibility.
Synchronous calls only where an answer is needed now; everything else goes through events.
Contracts/schemas/CDC: insurance against regressions.
Sagas + outbox + idempotency - the foundation of reliability.
Observability and SLO are not an option, but the "ready" criterion.
Releases via canary/blue-green, migrations - expand-and-contract.
Security/compliance: mTLS, JWT, KMS, regional data.
Start with a modular monolith, then evolve if the scale and the team are ready.