Availability Zones and Cross-Region Deployments

1) Terms and objectives

Availability zone (AZ) - an isolated data center within a region, with its own power and network.
Region - a group of AZs that share a geography and have low latency between them.

Recovery objectives:
  • RTO (Recovery Time Objective) - how long the service may be unavailable.
  • RPO (Recovery Point Objective) - how much data (measured in time) may be lost.

Typical targets: within a region, RTO ≤ 5-15 minutes and RPO ≈ 0-1 minute; across regions, RTO ≤ 1 hour and RPO ≤ 5 minutes (depending on the product and budget).

2) Architectural models

2.1 Within a region (multi-AZ)

Stateless layer: spread across AZs; L4/L7 load balancing with health checks.
Stateful layer: clusters with synchronous (or quorum) replication between AZs.
Caches/queues: clustered, sharded across AZs, with automatic failover.

2.2 Cross-region (multi-region)

Active-Active: both regions receive traffic.
+ minimal user latency, fast recovery; − consistency and conflict-resolution complexity.

Active-Passive (hot/warm): the primary region serves traffic; the second waits on hot or warm standby.
+ simpler data model, cheaper; − higher RTO.

Pilot-Light: a minimal footprint (data is kept in sync, compute is brought up only when a disaster happens).
DR-backup: backups and a recovery procedure only (cheapest and slowest).

3) Data and consistency

3.1 Databases

Synchronous quorum (RPO ≈ 0, ↑ latency): PostgreSQL with synchronous standbys within the region; distributed databases (CockroachDB/Cassandra) with local quorums (Local Quorum) and balancing across AZs.
Asynchronous cross-region (RPO > 0, ↓ latency): logical replication in Postgres/MySQL; "global tables" in KV/NoSQL stores; CDC → stream to another region.
Conflicting writes: for active-active, use CRDTs/versioning or a leader region per key/tenant.
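
As a minimal illustration of the CRDT option, the sketch below implements a grow-only counter (G-Counter), one of the simplest CRDTs. The class name `GCounter` and the region identifiers are illustrative assumptions, not part of any specific database; real systems usually rely on CRDT types provided by the data store.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per region."""
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, region: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[region] = self.counts.get(region, 0) + amount

    def merge(self, other: "GCounter") -> "GCounter":
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of delivery order or duplicates.
        regions = self.counts.keys() | other.counts.keys()
        return GCounter(
            {r: max(self.counts.get(r, 0), other.counts.get(r, 0)) for r in regions}
        )

    @property
    def value(self) -> int:
        return sum(self.counts.values())


eu, us = GCounter(), GCounter()
eu.increment("eu", 3)   # writes accepted locally in the eu region
us.increment("us", 2)   # concurrent writes accepted locally in the us region
merged = eu.merge(us)   # either replica can merge the other's state
```

Because each region only ever writes to its own slot, concurrent updates never conflict; the price is that the structure supports only a restricted set of operations.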

3.2 Event sourcing and queues

Queues/streams (Kafka/Pulsar/SQS-like): mirror topics or cross-region replicators; the key points are consumer idempotency and deduplication by key.
Webhooks and external partners: sign payloads, support replay, store offsets/checkpoints in both zones.
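
Consumer idempotency can be sketched as follows. This is an in-memory toy under stated assumptions: the event fields (`event_id`, `account`, `amount`) and the class name are hypothetical, and a real consumer would keep the set of seen IDs in a durable store and update it in the same transaction as the state change.

```python
class IdempotentConsumer:
    """Applies each event at most once, keyed by a producer-assigned event ID."""

    def __init__(self) -> None:
        self.seen: set[str] = set()        # assumption: durable store in production
        self.balance: dict[str, int] = {}

    def handle(self, event: dict) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self.seen:
            return False
        account = event["account"]
        self.balance[account] = self.balance.get(account, 0) + event["amount"]
        self.seen.add(event_id)
        return True


consumer = IdempotentConsumer()
deposit = {"event_id": "evt-1", "account": "alice", "amount": 100}
first = consumer.handle(deposit)
second = consumer.handle(deposit)  # same event redelivered after a failover
```

With this guard in place, a mirrored topic that replays a window of events after a region switch does not double-apply them.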

3.3 Cache

Per-region local caches (write-through/refresh-ahead); a global shared cache only on top of a durable KV store (otherwise split-brain). Invalidation by event (pub/sub); conservative TTLs.
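
The combination of event-driven invalidation with a TTL as a safety net might look like the sketch below. `RegionalCache` and `on_invalidation_event` are hypothetical names; the pub/sub transport that would call the handler is out of scope, and the clock is injectable only to keep the example deterministic.

```python
import time
from typing import Any, Callable, Optional


class RegionalCache:
    """Per-region cache: entries expire by TTL or are dropped by an event."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: dict[str, tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # TTL is the fallback when an invalidation event was lost.
            del self._items[key]
            return None
        return value

    def on_invalidation_event(self, key: str) -> None:
        # Assumed to be called by a pub/sub subscriber when the key changes.
        self._items.pop(key, None)


now = [0.0]                                   # fake clock for the example
cache = RegionalCache(ttl_seconds=30, clock=lambda: now[0])
cache.put("limits:alice", {"daily": 500})
hit = cache.get("limits:alice")
cache.on_invalidation_event("limits:alice")
after_event = cache.get("limits:alice")
cache.put("limits:alice", {"daily": 300})
now[0] = 31.0                                 # TTL elapsed, no event received
after_ttl = cache.get("limits:alice")
```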

4) Global traffic and network perimeter

GSLB/DNS: geo-/latency-based routing, health checks, traffic weights for canaries and incidents.
Anycast/Edge: terminate the connection close to the user, then route to the nearest healthy region.
Failover policies: regional upstream pools, no 0-RTT on critical paths, short timeouts for cross-region dependencies.
Retry policies: exponential backoff + jitter, a total deadline, idempotent PUT/POST with `Idempotency-Key`.
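
The retry policy can be sketched as a small helper. Everything here is an assumption made for illustration: the function name `call_with_retries`, its default values, and the choice of `ConnectionError` as the retryable error; `op` stands for any call that sends the key in an `Idempotency-Key` header.

```python
import random
import time
import uuid
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    op: Callable[[str], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    total_deadline: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `op` with exponential backoff and full jitter under a total deadline.

    The same idempotency key is passed on every attempt, so the server can
    deduplicate a request that succeeded but whose response was lost.
    """
    idempotency_key = str(uuid.uuid4())
    started = time.monotonic()
    for attempt in range(max_attempts):
        try:
            return op(idempotency_key)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: uniform in [0, min(max_delay, base_delay * 2^attempt)].
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            if time.monotonic() - started + delay > total_deadline:
                raise  # the next attempt would exceed the overall budget
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")


# Usage: a simulated POST that fails twice before succeeding.
keys_seen: list[str] = []


def flaky_post(key: str) -> str:
    keys_seen.append(key)
    if len(keys_seen) < 3:
        raise ConnectionError("simulated timeout")
    return "created"


result = call_with_retries(flaky_post, sleep=lambda _: None)
```

Generating the key once, outside the loop, is the important design choice: a fresh key per attempt would defeat server-side deduplication.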

5) Kubernetes and service mesh

5.1 Multi-AZ in one cluster

Topology spread constraints on `topology.kubernetes.io/zone`.
PodDisruptionBudget and priority classes.
NodeAffinity/Anti-Affinity: avoid co-locating replicas.
Storage: PVs with cross-AZ replication or distributed volume systems.

5.2 Multi-region (multi-cluster)

Separate clusters per-region + GitOps (Argo CD/Flux) for declarative synchronization.
Service Mesh (Istio/Linkerd): locality-aware load-balancing and failover between regions; mTLS, shared identity.
Traffic shifting: move traffic to a new region gradually (1% → 10% → 50%); keep a "set to 0%" switch that takes effect instantly.
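
The stepping logic can be sketched as a pure function. The name `next_weight` is hypothetical, and the final 100% step is an assumption added to complete the ramp; how the resulting weight is applied (mesh route, GSLB record) depends on the platform.

```python
def next_weight(current: int, healthy: bool,
                steps: tuple[int, ...] = (1, 10, 50, 100)) -> int:
    """Return the next traffic percentage for the new region.

    Any failed health signal drops the region to 0% at once;
    otherwise the weight advances to the next configured step.
    """
    if not healthy:
        return 0
    for step in steps:
        if step > current:
            return step
    return current  # already at the last step


ramp = [next_weight(w, healthy=True) for w in (0, 1, 10, 50, 100)]
rollback = next_weight(50, healthy=False)
```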

6) Choosing RTO/RPO and mapping them to patterns

| Pattern | Typical RTO | Typical RPO | Where applicable |
| --- | --- | --- | --- |
| Active-Active | minutes | ~0 to minutes (CRDT/CDC) | low-latency global APIs |
| Hot Standby | 5-15 min | seconds to minutes | critical B2C services |
| Warm Standby | 15-60 min | minutes to hours | B2B/operational subsystems |
| Pilot-Light | hours | hours | low criticality/cost-sensitive |
| Backup-only | days | day | archives/non-real-time analytics |

7) Fault tolerance testing (DR)

GameDays: a quarterly full-scale "region/AZ down" scenario.
Chaos injections: network delays, packet loss, a broker/database outage in one AZ.
Actual RTO/RPO: measure switchover time and data loss, publish a report.
Runbooks: step-by-step instructions and "red buttons" for switching (DNS weights, feature flags, disabling heavy features).
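
Measuring actual RTO/RPO after a drill reduces to simple timestamp arithmetic. The sketch below assumes three timestamps are captured during the exercise; the function name, field names, and the sample values are illustrative only.

```python
from datetime import datetime, timedelta


def measure_drill(
    outage_start: datetime,
    service_restored: datetime,
    last_replicated_write: datetime,
    rto_target: timedelta,
    rpo_target: timedelta,
) -> dict:
    """Compute actual RTO/RPO for one drill and compare them with the targets."""
    actual_rto = service_restored - outage_start
    # Writes accepted after the last replicated one and before the outage are lost.
    actual_rpo = max(outage_start - last_replicated_write, timedelta(0))
    return {
        "actual_rto": actual_rto,
        "actual_rpo": actual_rpo,
        "rto_met": actual_rto <= rto_target,
        "rpo_met": actual_rpo <= rpo_target,
    }


# Illustrative timestamps, not real measurements.
report = measure_drill(
    outage_start=datetime(2024, 1, 10, 12, 0, 0),
    service_restored=datetime(2024, 1, 10, 12, 9, 30),
    last_replicated_write=datetime(2024, 1, 10, 11, 59, 20),
    rto_target=timedelta(minutes=15),
    rpo_target=timedelta(minutes=1),
)
```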

8) Observability and management

Metric slices by region/AZ/tenant; p95/p99 route latency.
SLO and Error Budgets per region and per global pool.
Alerts: degradation of a single region should not flood the pager while the other region carries traffic normally.
Traces: baggage `region`, `az`, `failover=true/false`; reports on how many requests went through failover.
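
One way to express the alerting rule is to page only on the global signal and downgrade single-region degradation to a ticket. This is a sketch: the function name, the severity labels, and the 1% error-rate threshold are assumptions, not recommended values.

```python
def alert_severity(
    region_healthy: dict[str, bool],
    global_error_rate: float,
    global_slo_error_rate: float = 0.01,   # assumed threshold for illustration
) -> str:
    """Classify an alert so that paging reflects real user impact."""
    if global_error_rate > global_slo_error_rate:
        return "page"     # the global pool is breaching its SLO
    if not all(region_healthy.values()):
        return "ticket"   # one region is degraded, failover is absorbing it
    return "ok"


one_region_down = alert_severity({"eu": False, "us": True}, global_error_rate=0.002)
global_outage = alert_severity({"eu": False, "us": False}, global_error_rate=0.5)
all_good = alert_severity({"eu": True, "us": True}, global_error_rate=0.001)
```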

9) Security and compliance

Data residency: pin PII/payment data to specific regions (jurisdictions).
Secrets: KMS/smart HSM with cross-region keys and rotation; separate key material per region.
mTLS and mutual trust between regions; restrict cross-region egress with ACLs.

10) Cost and savings

Edge cache + SWR (stale-while-revalidate) to reduce cross-region egress.
Different storage classes (hot/warm/cold) and downsampling of metrics/logs.
Autoscaling profiles per region (night-time minimum).
Identical images + per-region configuration via environment variables/Helm values.

11) Anti-patterns

A single stateful master for the whole system; split-brain without a quorum.
Cross-region synchronous writes to a single RDBMS (unbearable latency).
A global cache with strong consistency and no CRDTs → congestion and phantoms.
Retries without idempotency → duplicate transactions/payments.
A single "global" SLO hides the failure of one region.
No regular DR exercises: the plans turn out not to work when they are needed.

12) Specifics of iGaming/Finance

Payment providers/PSPs are selected per region; use smart routing across PSPs based on health signals, with failover to a backup provider.
Jurisdiction: keep PII and transaction logs in-country/in-region; only aggregates/anonymized data may cross regions.
Limits/responsible gaming: rules and clocks are local; do not replicate them naively between regions, use event-based consistency.
Bonuses/balance: idempotency keys and a "source of truth" per tenant/region; reconciliation jobs after DR.

13) Mini recipes (pseudo-configs)

13.1 Envoy locality-aware + failover

```yaml
load_assignment:
  endpoints:
  - locality: { region: eu, zone: eu-a }
    lb_endpoints: [{ endpoint: { address: ... } }]
  - locality: { region: eu, zone: eu-b }
    lb_endpoints: [{ endpoint: { address: ... } }]
  - locality: { region: us, zone: us-a }   # failover target
    priority: 1                            # used only when priority 0 (eu) is unhealthy
    lb_endpoints: [{ endpoint: { address: ... } }]
common_lb_config:
  # zone_aware_lb_config and locality_weighted_lb_config are mutually exclusive;
  # use the latter instead if localities carry explicit load_balancing_weight values.
  zone_aware_lb_config: {}
outlier_detection:
  consecutive_5xx: 5
  base_ejection_time: 30s
```

13.2 Kubernetes topology spread

```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector: { matchLabels: { app: api } }
```

13.3 DNS weighted failover (idea)

`weight(eu) = 90`, `weight(us) = 10` → when `eu` degrades, traffic shifts automatically to `us`. Use health checks and lowered TTLs (but not too aggressive: 30-120 s).
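
The weight recalculation behind this idea can be sketched as follows. The function name and the "keep configured weights when nothing is healthy" fallback are assumptions; a managed GSLB normally performs this step itself based on health checks.

```python
def failover_weights(weights: dict[str, int],
                     healthy: dict[str, bool]) -> dict[str, int]:
    """Move the weight of unhealthy regions onto healthy ones, proportionally."""
    healthy_total = sum(w for r, w in weights.items() if healthy.get(r, False))
    if healthy_total == 0:
        # Nothing is healthy: keep the configured weights instead of
        # black-holing all traffic (assumed policy).
        return dict(weights)
    total = sum(weights.values())
    return {
        region: round(total * w / healthy_total) if healthy.get(region, False) else 0
        for region, w in weights.items()
    }


configured = {"eu": 90, "us": 10}
normal = failover_weights(configured, {"eu": True, "us": True})
eu_down = failover_weights(configured, {"eu": False, "us": True})
```

With three or more regions the rounded values may not sum exactly to the original total, which is harmless for relative DNS weights.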

14) Production readiness checklist

  • RTO/RPO per service defined and agreed with business.
  • Stateless services are spread across AZs; stateful services have quorum/replication and a clear consistency model.
  • Cross-region replication (async/CDC) is in place, with conflict/deduplication tests.
  • GSLB/Anycast are configured; health checks and weights are automated.
  • Kubernetes: topology spread, PDB, anti-affinity; multi-cluster GitOps.
  • Retries with jitter, idempotency on writes; short cross-region timeouts.
  • DR exercises are run and actual RTO/RPO measured; the runbook is up to date.
  • Observability per region/AZ, SLOs and burn rate per slice; alerts do not flood the pager during normal operation.
  • Data residency/secrets/keys comply with regulatory requirements.
  • Economics: egress, storage, autoscale profiles under control.

15) TL;DR

Build multi-AZ as the base layer and multi-region as business insurance. Choose a pattern (active-active/standby) to match RTO/RPO and cost, replicate data deliberately (quorums/CDC/CRDT), and manage global traffic through GSLB/Anycast and locality-aware balancing. Mandatory: idempotency, short timeouts, regular DR exercises, SLOs/metrics sliced by region/AZ. For iGaming/Finance, add regional PSP/KYC, data residency requirements, and separate SLOs per jurisdiction.
