Multi-cloud strategy and synchronization
1) Why Multi-cloud
Multi-cloud means using two or more public clouds (or combining them with on-prem) for:
- Resilience and DR: reducing cloud-specific risks (regional or platform failures).
- Geography and compliance: storing and processing data in the required jurisdictions (data residency).
- Performance and cost: routing to the nearest POP, price and quota arbitrage between providers.
- Vendor independence: freedom of technology choice and bargaining power.
The price is the complexity of synchronizing data, networks, identities and change processes.
2) Basic deployment models
2.1 Active-passive (multi-cloud DR)
Prod lives in Cloud-A; Cloud-B holds a warm/hot standby.
RTO/RPO depend on the depth of replication, from minutes (journaling) to hours (backup/restore).
Pros: simpler and cheaper. Cons: higher RTO, risk of config drift.
2.2 Active-active (two production planes)
Traffic is distributed between Cloud-A and Cloud-B (GeoDNS/Anycast, GSLB, routing by country/ASN).
Requires carefully designed data consistency and blast-radius isolation.
Pros: low RTO/RPO, closer to the user. Cons: complexity of consistency and testing.
2.3 Split by domain (functional segmentation)
The payment core lives in the cloud with the best private connectivity to the PSP; content/catalog lives in the other.
Minimize cross-cloud synchronization on the hot path.
3) Data synchronization: strategies and patterns
3.1 Consistency types
Strong: transactional synchronous replication (typically within a single cloud/region).
Eventual: asynchronous replication; suitable for catalogs, profiles, analytics.
Bounded staleness: an acceptable lag (seconds/minutes) for reads outside the hot path.
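As an illustration of the bounded-staleness idea, here is a minimal sketch of read routing. The `Replica` type, the `choose_read_target` function and the thresholds are hypothetical names chosen for this example; it assumes the replication lag of the local replica is already measured elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    lag_seconds: float  # measured replication lag behind the primary


def choose_read_target(local: Replica, max_staleness_s: float,
                       strong: bool = False) -> str:
    """Return where a read should go: the local replica or the primary."""
    if strong:
        return "primary"      # strong reads always go to the source of truth
    if local.lag_seconds <= max_staleness_s:
        return local.name     # lag is within the allowed bound
    return "primary"          # replica is too stale, fall back


catalog_replica = Replica("cloud-b-replica", lag_seconds=4.0)
print(choose_read_target(catalog_replica, max_staleness_s=30.0))
print(choose_read_target(catalog_replica, max_staleness_s=1.0))
```

The bound is a per-domain choice: a catalog can tolerate tens of seconds, while a balance read would normally pass `strong=True`.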
3.2 Replication techniques
CDC (Change Data Capture): journal → events → apply in the other cloud; good for DWH/reporting/caches.
Event Sourcing: the source of truth is a stream of domain events, from which projections are built in each cloud.
CRDT/conflict-free structures: for concurrently editable records/counters (e.g. ratings/leaderboards).
Dual-write with idempotency: write the record and publish the event; the receiver deduplicates (outbox/inbox).
Object stores: versioning + cross-region/cross-cloud replication (with egress overhead).
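To make the CRDT item concrete, the sketch below shows a grow-only counter (G-Counter), one of the simplest conflict-free structures. The class and node names are illustrative; a real leaderboard would more likely use a library or a PN-Counter that also supports decrements.

```python
class GCounter:
    """Grow-only counter: one slot per cloud, merge takes the per-slot maximum."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}  # node_id -> count contributed by that node

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Commutative, associative and idempotent, so replicas converge
        # regardless of delivery order or duplicates.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("cloud-a"), GCounter("cloud-b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
a.merge(b)  # repeating a merge changes nothing
print(a.value(), b.value())  # 5 5
```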
3.3 Conflict resolution (example)
Domain rules: "last write wins" only for idempotent commands of the same type.
Order follows the source of truth: payment status finalizes the wallet, not vice versa.
Vector clocks/logical timestamps: for rare collisions when both clouds accept writes.
Compensations (sagas): on discrepancy, apply a domain compensation (releasing a balance hold, reversing the transaction).
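The vector-clock comparison mentioned above can be sketched as follows. This is a generic illustration, not a specific library API: clocks are plain dictionaries mapping a node name to a counter, and `compare` is a hypothetical helper.

```python
def compare(vc_a: dict, vc_b: dict) -> str:
    """Return 'before', 'after', 'equal' or 'concurrent' for vc_a relative to vc_b."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # vc_a happened before vc_b, so vc_b wins
    if b_le_a:
        return "after"       # vc_a is newer
    return "concurrent"      # a true conflict: hand over to domain rules


print(compare({"cloud-a": 2, "cloud-b": 1}, {"cloud-a": 3, "cloud-b": 1}))  # before
print(compare({"cloud-a": 2}, {"cloud-b": 1}))                              # concurrent
```

Only the `concurrent` case needs a domain rule or a compensation; the other outcomes order the writes unambiguously.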
3.4 Practical layout (wallet and payments)
Commands (debit/credit) go to the local log in Cloud-A/Cloud-B.
`wallet.changed` events are published to both clouds via an inter-cloud bus.
Status finalization happens only on PSP confirmation; deduplication is by `operation_id`.
Final reports are collected via CDC → DWH in each cloud; vendor-dependent fields are normalized.
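A minimal sketch of the wallet rules in this layout, assuming amounts in minor units and a PSP callback that carries the same `operation_id` as the original command. The `Wallet` class and its method names are hypothetical.

```python
class Wallet:
    def __init__(self, balance: int):
        self.balance = balance   # settled funds, in minor units
        self.holds = {}          # operation_id -> reserved amount
        self.finalized = set()   # operation_ids that are already final

    def debit(self, operation_id: str, amount: int) -> bool:
        """Local command: reserve funds. A repeated delivery is a no-op."""
        if operation_id in self.holds or operation_id in self.finalized:
            return False
        if amount > self.balance - sum(self.holds.values()):
            raise ValueError("insufficient funds")
        self.holds[operation_id] = amount
        return True

    def on_psp_event(self, operation_id: str, confirmed: bool) -> bool:
        """Only the PSP outcome finalizes an operation, and only once."""
        if operation_id in self.finalized or operation_id not in self.holds:
            return False
        amount = self.holds.pop(operation_id)
        if confirmed:
            self.balance -= amount   # a decline simply releases the hold
        self.finalized.add(operation_id)
        return True


w = Wallet(100)
w.debit("op-1", 30)
w.debit("op-1", 30)              # duplicate command, ignored
w.on_psp_event("op-1", True)
w.on_psp_event("op-1", True)     # duplicate PSP event, ignored
print(w.balance)                 # 70
```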
4) Network layer and global traffic
GSLB (Global Server Load Balancing): GeoDNS/Anycast, per-cloud health checks, per-session stickiness.
Mesh over the internet or private links: IPsec, cloud-to-cloud interconnect, private peering.
Egress control: fixed NAT IPs for PSP/KYC allow-lists; QoS and limits.
Segmentation: separate subnets for prod/stage; controlled inter-cloud east-west traffic.
[Users] → [GSLB/Anycast] → (Cloud-A: Edge/API) ↔ (Cloud-B: Edge/API)
                           [Services / Data A] ↔↔↔ [Services / Data B]
                                    ^ Inter-cloud Mesh ^
                           [DWH/CDC A]             [DWH/CDC B]
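The routing decision a GSLB makes can be approximated by the sketch below: session stickiness first, then a geographic preference order, always restricted to clouds that pass health checks. The `pick_cloud` function and the `GEO` table are illustrative assumptions, not the configuration format of any particular GSLB product.

```python
from typing import Dict, List, Optional, Set


def pick_cloud(country: str, healthy: Set[str],
               geo_preference: Dict[str, List[str]],
               sticky_cloud: Optional[str] = None) -> str:
    """Choose a cloud: stickiness first, then geo order, healthy clouds only."""
    if sticky_cloud is not None and sticky_cloud in healthy:
        return sticky_cloud
    for cloud in geo_preference.get(country, geo_preference["*"]):
        if cloud in healthy:
            return cloud
    raise RuntimeError("no healthy cloud available")


# Hypothetical preference table; "*" is the default order.
GEO = {
    "DE": ["cloud-a", "cloud-b"],
    "BR": ["cloud-b", "cloud-a"],
    "*": ["cloud-a", "cloud-b"],
}

print(pick_cloud("BR", {"cloud-a", "cloud-b"}, GEO))  # cloud-b
print(pick_cloud("BR", {"cloud-a"}, GEO))             # cloud-a (failover)
```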
5) Identity, secrets and compliance
IAM federation: a single IdP (OIDC/SAML), with the role model projected into both clouds; avoid "snowflake" configurations.
Secrets and KMS: keys live in each cloud (BYOK/HYOK if necessary), with coordinated rotation; do not replicate master keys directly.
mTLS/signing: mutual TLS between services across clouds; events and webhooks are signed with HMAC using per-cloud keys.
Data residency: tags/data classes, routing/storage policies (PII/PCI remain in the country).
Audit: WORM logs, cross-cloud tracing, unified change log.
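For the HMAC signing of events and webhooks mentioned in this section, a minimal sketch with Python's standard `hmac` module could look like this. The key values and the canonical JSON serialization are assumptions for the example; in practice the keys would come from each cloud's KMS or secret store.

```python
import hashlib
import hmac
import json


def sign_event(event: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON form of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_event(event: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(sign_event(event, key), signature)


# Placeholder keys, one per sending cloud.
KEYS = {"cloud-a": b"example-key-a", "cloud-b": b"example-key-b"}

event = {"type": "wallet.changed", "operation_id": "op-1", "cloud": "cloud-a"}
signature = sign_event(event, KEYS["cloud-a"])
print(verify_event(event, signature, KEYS["cloud-a"]))  # True
print(verify_event(event, signature, KEYS["cloud-b"]))  # False
```

The receiver selects the verification key by the claimed sending cloud, so a key leaked in one cloud cannot be used to forge events from the other.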
6) Platform and abstractions
Kubernetes multi-cluster: clusters in each cloud; unification via GitOps (Argo/Flux), cluster profiles and policy-as-code (OPA/Gatekeeper).
Service Mesh (multi-cluster): mTLS, retries/circuit breakers, locality-aware routing; explicitly restrict cross-cloud calls.
Storage (CSI) and cache: avoid StatefulSets that require synchronous inter-cloud writes; cache and read locally, warm up asynchronously.
IaC: Terraform/Crossplane for cloud resources; shared modules with vendor-specific overrides.
DevPortal/Service Catalog: per-cloud location and dependency metadata.
7) CI/CD and Change Management
A single mono-repo with shared specs and per-cloud parameterization (features, quotas, load balancer types).
Canary/Blue-Green per cloud: release separately in Cloud-A and Cloud-B, then compare metrics.
Test matrix: cloud-to-cloud integration tests, incident replays, geo-distributed synthetic checks.
Contract versioning: a shared Schema Registry, backward-compatible rules for MINOR versions.
Change freeze during EOL migrations: no other changes while traffic is being switched between clouds.
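A compatibility gate for contract versions might be sketched as below. The rule set is an assumption made for this example (a MINOR change may only add optional fields), and the schema representation is a plain dictionary rather than the format of any real Schema Registry.

```python
def minor_violations(old: dict, new: dict) -> list:
    """Return the reasons why `new` is not an acceptable MINOR change of `old`.

    Schemas map a field name to {"type": str, "required": bool}.
    An empty list means the change passes.
    """
    problems = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(f"type changed: {name}")
    for name, spec in new.items():
        if name not in old and spec.get("required", False):
            problems.append(f"new required field: {name}")
    return problems


v1 = {"operation_id": {"type": "string", "required": True},
      "amount": {"type": "int", "required": True}}
v1_1 = dict(v1, currency={"type": "string", "required": False})
v2 = {"operation_id": {"type": "string", "required": True},
      "amount": {"type": "string", "required": True}}

print(minor_violations(v1, v1_1))  # []
print(minor_violations(v1, v2))    # ['type changed: amount']
```

Running such a check in CI for both clouds before a release keeps consumers in the other cloud from breaking on an event they cannot parse.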
8) Observability and SLO management
End-to-end trace_id: propagated through gateway → service → broker → consumer in the other cloud; labels `cloud`, `region`, `api_version`, `partner`.
SLO per cloud/per region: availability/latency/error dashboards plus inter-cloud lag (replication latency).
Inter-cloud synchronization anomalies: alerts on DLQ growth, rising conflict rate, CDC lag.
Status page: public statuses by cloud and region.
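The synchronization alerts listed above can be expressed as a small evaluation function. The metric names and default thresholds here are placeholders; real values would come from the per-domain SLOs.

```python
from dataclasses import dataclass


@dataclass
class SyncMetrics:
    cdc_lag_s: float      # current CDC/replication lag in seconds
    dlq_depth_now: int    # dead-letter queue depth at this sample
    dlq_depth_prev: int   # depth at the previous sample
    conflicts: int        # conflicts detected in the window
    events: int           # events applied in the window


def evaluate(m: SyncMetrics, max_lag_s: float = 60.0,
             max_dlq_growth: int = 100,
             max_conflict_rate: float = 0.01) -> list:
    """Return the names of the alerts that should fire for this sample."""
    alerts = []
    if m.cdc_lag_s > max_lag_s:
        alerts.append("cdc_lag")
    if m.dlq_depth_now - m.dlq_depth_prev > max_dlq_growth:
        alerts.append("dlq_growth")
    rate = m.conflicts / m.events if m.events else 0.0
    if rate > max_conflict_rate:
        alerts.append("conflict_rate")
    return alerts


sample = SyncMetrics(cdc_lag_s=95.0, dlq_depth_now=40, dlq_depth_prev=35,
                     conflicts=30, events=1000)
print(evaluate(sample))  # ['cdc_lag', 'conflict_rate']
```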
9) FinOps: Multi-cloud cost
Egress and inter-cloud links: the main cost item; minimize chattiness, aggregate events, use local projections.
Duplicated resources: warm pools, reserved instances/commitments in each cloud → keep them balanced.
Load profiles: shift non-critical background jobs to the cloud with the best price/quota.
"Consistency cost" metrics: $/sec of lag, $/GB of replication, $/conflict, giving the business transparency.
10) Cases for iGaming/fintech
Payments/wallet (strict consistency): active-passive with fast failover; status finalization events are the only source of truth; log replication.
Game catalog/promo/ratings: active-active with eventual consistency, CRDT counters for statistics; TTL cache on reads.
Reporting to regulators: local DWH data marts, asynchronous cross-cloud aggregation; freshness guarantees (freshness SLO).
Marketing/notifications: geo/cloud orchestration, limits on cross-cloud calls; deduplication of sends.
KYC/AML: parallel providers in different clouds, normalization of responses and a single decision-making policy.
11) Sample solutions (fragments)
11.1 Outbox→CDC (idempotency)
BEGIN TX
  apply(domain_command)
  insert into outbox(event_id, aggregate_id, type, payload, hash)
COMMIT
// Replicator reads outbox, publishes to inter-cloud bus;
// receiver executes inbox-dedupe on event_id/hash.
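The same outbox/inbox flow can be tried end to end with two in-memory SQLite databases standing in for the two clouds. This is a sketch under simplifying assumptions: the table layout, the `wallet.changed` event type and the helper names are illustrative, and the "replicator" is just a query over the outbox instead of a CDC pipeline and a bus.

```python
import hashlib
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE wallet(aggregate_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE outbox(event_id TEXT PRIMARY KEY, aggregate_id TEXT, type TEXT,
                    payload TEXT, hash TEXT);
CREATE TABLE inbox(event_id TEXT PRIMARY KEY);
"""


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def _add_balance(db, aggregate_id, amount):
    db.execute("INSERT OR IGNORE INTO wallet VALUES (?, 0)", (aggregate_id,))
    db.execute("UPDATE wallet SET balance = balance + ? WHERE aggregate_id = ?",
               (amount, aggregate_id))


def credit(db, aggregate_id, amount) -> str:
    """Apply the command and record the event in one local transaction."""
    payload = json.dumps({"amount": amount}, sort_keys=True)
    event_id = str(uuid.uuid4())
    with db:  # commits on success, rolls back on an exception
        _add_balance(db, aggregate_id, amount)
        db.execute("INSERT INTO outbox VALUES (?, ?, ?, ?, ?)",
                   (event_id, aggregate_id, "wallet.changed", payload,
                    hashlib.sha256(payload.encode()).hexdigest()))
    return event_id


def receive(db, event) -> bool:
    """Inbox dedupe: apply an event at most once, keyed by event_id."""
    event_id, aggregate_id, _type, payload, digest = event
    if hashlib.sha256(payload.encode()).hexdigest() != digest:
        raise ValueError("payload hash mismatch")
    with db:
        cur = db.execute("INSERT OR IGNORE INTO inbox VALUES (?)", (event_id,))
        if cur.rowcount == 0:
            return False  # already processed, skip the side effect
        _add_balance(db, aggregate_id, json.loads(payload)["amount"])
    return True


cloud_a, cloud_b = open_db(), open_db()
credit(cloud_a, "wallet-1", 100)
events = cloud_a.execute("SELECT * FROM outbox").fetchall()
results = [receive(cloud_b, e) for e in events + events]  # replay = redelivery
balance_b = cloud_b.execute("SELECT balance FROM wallet").fetchone()[0]
print(results, balance_b)  # [True, False] 100
```

The key property is that the inbox insert and the balance update share one transaction, so a redelivered event can never credit the wallet twice.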
11.2 Conflict policy (pseudo)
if operation.type in {CAPTURE, REFUND}:
    source = PSP_EVENT
elif operation.type in {LIMIT_SET, LIMIT_REMOVE}:
    source = RG_SERVICE
apply_if_newer(source, aggregate_version)
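One possible reading of this policy as runnable code is shown below: each operation type has a single authoritative source, and an update is applied only if it comes from that source and carries a newer version. The dictionary shapes and the `version` field are assumptions added for the example.

```python
PSP_EVENT, RG_SERVICE = "PSP_EVENT", "RG_SERVICE"

# Which source is authoritative for which operation type.
AUTHORITATIVE_SOURCE = {
    "CAPTURE": PSP_EVENT,
    "REFUND": PSP_EVENT,
    "LIMIT_SET": RG_SERVICE,
    "LIMIT_REMOVE": RG_SERVICE,
}


def apply_if_newer(aggregate: dict, operation: dict) -> bool:
    """Apply the operation only if its source is authoritative and it is newer."""
    expected = AUTHORITATIVE_SOURCE.get(operation["type"])
    if expected is None or operation["source"] != expected:
        return False                      # wrong or unknown source
    if operation["version"] <= aggregate["version"]:
        return False                      # stale or duplicate update
    aggregate["version"] = operation["version"]
    aggregate["last_operation"] = operation["type"]
    return True


agg = {"version": 3, "last_operation": None}
print(apply_if_newer(agg, {"type": "CAPTURE", "source": PSP_EVENT, "version": 4}))   # True
print(apply_if_newer(agg, {"type": "REFUND", "source": RG_SERVICE, "version": 5}))   # False
print(apply_if_newer(agg, {"type": "CAPTURE", "source": PSP_EVENT, "version": 4}))   # False
```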
11.3 Network Policy
Inter-cloud calls are allowed only for `events`, `idp`, `catalog-sync`; direct `wallet.write` calls are not allowed (local only).
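Expressed as a check, the rule above is an allow-list applied only when the caller and the target are in different clouds. This sketch is a plain function for illustration; in a real platform the same rule would normally live in mesh or network policy configuration.

```python
ALLOWED_CROSS_CLOUD = {"events", "idp", "catalog-sync"}


def is_call_allowed(service: str, src_cloud: str, dst_cloud: str) -> bool:
    """Local calls are always allowed; cross-cloud calls need an allow-list entry."""
    if src_cloud == dst_cloud:
        return True
    return service in ALLOWED_CROSS_CLOUD


print(is_call_allowed("wallet.write", "cloud-a", "cloud-a"))  # True
print(is_call_allowed("wallet.write", "cloud-a", "cloud-b"))  # False
print(is_call_allowed("catalog-sync", "cloud-a", "cloud-b"))  # True
```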
12) Safety and risk
Blast-radius: limits on inter-cloud bandwidth and queues so that the error/loop does not "flood" both clouds.
Automation guardrails: AI-Ops/runbooks cannot change the configs of both clouds at the same time without multi-party approval.
Link-failure tests: split-brain behavior, queue growth, timeouts and automatic degradation.
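One common way to cap inter-cloud traffic so that a loop cannot flood both clouds is a token bucket in front of the publisher. The sketch below takes the current time as an explicit argument to keep it deterministic; the class name, rate and capacity are illustrative.

```python
class TokenBucket:
    """Limits the rate of inter-cloud messages; bursts up to `capacity`."""

    def __init__(self, rate_per_s: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False   # caller should queue locally or drop, not retry in a loop


bucket = TokenBucket(rate_per_s=1.0, capacity=2.0)
print([bucket.allow(now=0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(now=1.0))                      # True
```

A rejected message should go to a bounded local queue, so that backpressure stays inside the cloud where the problem started.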
13) Implementation checklist
1. Strict/eventual consistency domains and target RPO/RTO per domain are defined.
2. Deployment model selected (active-passive/active-active/domain segmentation).
3. Inter-cloud network: GSLB, mesh/private links, fixed egress IPs, WAF/bot protection.
4. Data schemas in the Registry, compatibility rules; outbox/inbox used everywhere.
5. Idempotency and deduplication (keys, TTL storage, hash).
6. CI/CD: per-cloud parameterization, separate canaries, common release center.
7. Observability: `trace_id`, replication log, conflict rate, DLQ monitoring.
8. IAM federation, KMS/cloud secrets, access audit.
9. FinOps: egress budgets, alerts for inter-cloud costs.
10. Regular DR drills: cloud failover, split-brain simulations.
14) Anti-patterns
Synchronous cross-cloud hot-path transactions (wallet/write) → fragility and P99 tails.
A single "master cluster" of databases for two clouds → SPOF over the network.
Replicating everything at once without data categories → exploding costs and conflicts.
Lack of outbox/inbox and idempotency → duplicate payments/credits.
Secrets moved through S3 buckets/pipelines in plaintext.
Unaccounted egress and hidden chatty inter-cloud services → unpredictable bills.
15) The bottom line
Multi-cloud is not "two checkboxes in the console," but a discipline of designing data, networks and change processes. Clearly separate domains by consistency requirements, limit cross-cloud hot-path traffic, use CDC/event sourcing and idempotency, measure lags and conflicts, and keep costs under control. Multi-cloud then becomes a tool for resilience and speed, rather than a source of late-night incidents and egress bills.