Multi-cloud topology
1) When multi-cloud is justified
Drivers:
- Reliability/availability: independent failure zones at provider level.
- Sovereignty/compliance: storage/processing by jurisdictions (data residency).
- Risk management: reduced vendor lock-in, purchasing/price leverage.
- Geography/performance: closer to the user and data sources.
- Special services: access to the best "niche" capabilities of different clouds.
Costs:
- Significantly more complex SDLC, observability and operations.
- Higher egress costs and added latency between providers.
- Different IAM/network models, quotas and limits → more operational risk.
2) Topological patterns
3) Network layer and routing
3.1 Global ingress
GSLB/DNS routing: latency- and health-based; short TTLs during migration windows.
Anycast + L7 proxy: single IP, regional health routing.
Policies by jurisdiction: geo-blocking/geo-pinning traffic.
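python
def pick_cluster(client, intent):
    # input: ip, geo, tenant, feature
    # filter_by_compliance, sdo, slo and latency_estimate are helpers defined elsewhere
    allowed = filter_by_compliance(client.geo)  # sovereignty
    healthy = [c for c in allowed if sdo(c).ok and slo(c).ok]
    # lowest estimated latency among compliant, healthy clusters
    return min(healthy, key=lambda c: latency_estimate(client, c))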
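The selection order above (compliance first, then health, then latency) can be tried out with a minimal self-contained sketch. The `Cluster` and `Client` types, the `ALLOWED_REGIONS` table and the static latency figures are hypothetical stand-ins for real compliance, health and latency sources, not part of any particular GSLB product.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy table: client geography -> regions allowed to serve it.
ALLOWED_REGIONS = {
    "eu": {"eu-central", "eu-west"},
    "us": {"us-east", "eu-central", "eu-west"},
}

@dataclass
class Cluster:
    name: str
    region: str
    healthy: bool      # result of the health probe
    slo_ok: bool       # SLO currently not in breach
    latency_ms: dict   # hypothetical: client geo -> estimated latency

@dataclass
class Client:
    ip: str
    geo: str

def pick_cluster(client: Client, clusters: list) -> Optional[Cluster]:
    """Compliance filter first, then health, then lowest estimated latency."""
    allowed_regions = ALLOWED_REGIONS.get(client.geo, set())
    allowed = [c for c in clusters if c.region in allowed_regions]
    healthy = [c for c in allowed if c.healthy and c.slo_ok]
    if not healthy:
        return None  # caller decides: fail closed or serve a degraded page
    return min(healthy, key=lambda c: c.latency_ms.get(client.geo, float("inf")))

clusters = [
    Cluster("a-eu", "eu-central", True, True, {"eu": 18, "us": 95}),
    Cluster("b-eu", "eu-west", True, False, {"eu": 12, "us": 90}),
    Cluster("b-us", "us-east", True, True, {"eu": 85, "us": 10}),
]
print(pick_cluster(Client("203.0.113.7", "eu"), clusters).name)
```

Returning `None` when nothing qualifies keeps the sovereignty filter strict: a client is never routed to a non-compliant cluster just because it happens to be the only healthy one.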
3.2 Inter-cloud connectivity
Private links/peering where possible; otherwise TLS with mTLS over the public Internet.
Egress control: aggregation/compression, local caches/aggregators.
Networks as code: Terraform/Blueprints, CIDR policies, routes and egress gateways.
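One CIDR policy that is easy to check in CI is that address ranges in different clouds never overlap, since overlapping ranges make private routing between providers ambiguous. The sketch below uses only the standard `ipaddress` module; the `PLAN` table and its network names are hypothetical, and a real pipeline would more likely read the ranges from IaC state.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan: one CIDR per cloud/region network.
PLAN = {
    "cloudA/eu-central": "10.10.0.0/16",
    "cloudB/eu-central": "10.20.0.0/16",
    "cloudB/eu-west": "10.20.128.0/17",
}

def find_overlaps(plan):
    """Return sorted pairs of network names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    )

print(find_overlaps(PLAN))
```

A non-empty result would fail the pipeline before any route or peering is created.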
4) Data and consistency
4.1 Models
Global strong consistency across clouds is rarely realistic (latency, network, cost).
Pragmatic eventual consistency: bidirectional CDC (change data capture) with conflict resolution.
CRDT/idempotency: use commutative structures for counters, sets and logs.
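As an illustration of a commutative structure, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. Each replica increments only its own slot and merging takes the per-replica maximum, so merges can arrive in any order and be repeated safely. The replica names are illustrative.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merge = element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Taking the maximum per replica makes merge commutative and idempotent.
        for rid, value in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), value)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("cloudA"), GCounter("cloudB")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing
print(a.value(), b.value())
```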
4.2 Patterns
Dual-write with outbox: transactional event recording → broker → replication to a neighboring cloud.
Read-local/Write-home: writes go to the "home" region/cloud, reads are served locally (with versioning and staleness policies).
Split-brain protection: divergence detection, "compensation" (saga), manual arbitration for monetary invariants.
DB → Debezium/stream → Events(topic@vN) → Cross-cloud relay → Apply w/ resolver
resolver: prefer_higher_version | prefer_home | business_rule()
4.3 Object storage
Asynchronous replication of buckets, hashes/manifests, dedup.
ILM (hot/warm/cold) policies are cloud-independent.
Sovereignty rules such as "PII does not leave UA/EEA" are validated as code.
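A sovereignty rule "validated as code" can be as small as a check over the bucket replication plan. The sketch below assumes, purely for illustration, that region names carry a jurisdiction prefix (`eu-`, `ua-`) and that each bucket is tagged with a data class. Both conventions and the `BUCKETS` records are hypothetical.

```python
# Assumption: region names encode the jurisdiction as a prefix.
ALLOWED_PREFIXES = {"pii": ("eu-", "ua-")}

def replication_violations(buckets):
    """Return (bucket, region) pairs where data lands outside its allowed regions."""
    found = []
    for bucket in buckets:
        prefixes = ALLOWED_PREFIXES.get(bucket["data_class"])
        if prefixes is None:
            continue  # no residency rule for this data class
        for region in [bucket["region"], *bucket["replicas"]]:
            if not region.startswith(prefixes):
                found.append((bucket["name"], region))
    return found

BUCKETS = [
    {"name": "users", "data_class": "pii", "region": "eu-central",
     "replicas": ["eu-west", "us-east"]},
    {"name": "assets", "data_class": "public", "region": "us-east",
     "replicas": ["eu-west"]},
]
print(replication_violations(BUCKETS))
```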
5) Identity, secrets and keys
Identity federation: single IdP, short-lived tokens, OIDC trust for pipelines.
Secrets: each cloud's KMS/HSM plus a Vault-class abstraction; dual-key periods for rotations/switchovers.
PoLP/ABAC: rights based on attributes (cloud, region, env, data_class).
Crypto domains: different root keys for jurisdictions → crypto-erasure by scope.
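The attribute-based rights mentioned above (cloud, region, env, data_class) can be sketched as a deny-by-default decision function. The attribute names on `subject` and `resource` are illustrative assumptions, not the schema of any specific IAM system.

```python
def is_allowed(subject, resource, action):
    """Allow only when every attribute rule matches (deny by default)."""
    if action not in subject.get("actions", ()):
        return False
    if resource["env"] not in subject.get("envs", ()):
        return False
    if resource["cloud"] not in subject.get("clouds", ()):
        return False
    # Sensitive data additionally requires an explicit region grant.
    if resource["data_class"] == "pii":
        return resource["region"] in subject.get("pii_regions", ())
    return True

analyst = {
    "actions": {"read"},
    "envs": {"prod"},
    "clouds": {"cloudA"},
    "pii_regions": {"eu-central"},
}
eu_table = {"env": "prod", "cloud": "cloudA", "region": "eu-central", "data_class": "pii"}
us_table = {"env": "prod", "cloud": "cloudA", "region": "us-east", "data_class": "pii"}
print(is_allowed(analyst, eu_table, "read"), is_allowed(analyst, us_table, "read"))
```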
6) Runtime environment: clusters and meshes
Multicluster (K8s): one cluster per cloud/region; fleet control via GitOps (ArgoCD/Fleet).
Service mesh: mTLS, retries, circuit breakers, cross-cluster failover policies.
- Static services → in place.
- Interactive APIs → in each cloud (Active/Active).
- Batch/ETL → "green" windows/cheap region (carbon/cost aware).
rego
package placement

# regions where PII may be placed
allowed_regions := {"eu-central", "eu-north", "eu-west"}

allow[cloud] {
    input.service.pii == false
    cloud := input.clouds[_]
    cloud.features[_] == "cheap_gpu"
}

deny["PII outside allowed region"] {
    input.service.pii == true
    not allowed_regions[input.target_region]
}
7) Observability and SLO in multi-cloud
Multi-tenant labels: 'cloud', 'region', 'tenant', 'data_domain'.
SLI/SLO per-cloud and globally: "globally available if ≥1 cloud is available."
Telemetry collection: local collection plus aggregation under egress control.
Traces: global trace-id, context propagation, tail-based sampling.
Comparison dashboards: A vs B per endpoint/p99/error-budget burn.
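The error-budget burn rate used in per-cloud and global SLO views is the observed error ratio divided by the ratio the SLO allows. A minimal sketch follows; the request counts and the 99.9% target are made-up figures, and computing the "global" rate over combined traffic is one possible convention (an assumption here), not the only one.

```python
def burn_rate(errors, total, slo_target):
    """Observed error ratio divided by the error ratio the SLO allows."""
    if total == 0:
        return 0.0
    budget = 1.0 - slo_target
    return (errors / total) / budget

# Hypothetical one-hour window: cloud -> (failed requests, all requests).
WINDOW = {"cloudA": (30, 10_000), "cloudB": (5, 10_000)}
SLO_TARGET = 0.999

per_cloud = {c: burn_rate(e, t, SLO_TARGET) for c, (e, t) in WINDOW.items()}
global_rate = burn_rate(
    sum(e for e, _ in WINDOW.values()),
    sum(t for _, t in WINDOW.values()),
    SLO_TARGET,
)
for cloud, rate in sorted(per_cloud.items()):
    print(f"{cloud}: {rate:.2f}")
print(f"global: {global_rate:.2f}")
```

A rate above 1 means the budget is being spent faster than the SLO window allows.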
8) SDLC/IaC and "policies as code"
Single IaC monorepo: provider modules/stacks, invariants (tags, networks, encryption).
GitOps: declarative manifests, drift detection, environment promotion.
Conformance tests: API/event contracts, canaries in both clouds.
Release gates: block the release when an SLO in any one cloud is at risk (burn-rate forecast) or when sovereignty rules are not satisfied.
yaml
gate: multi-cloud-slo-and-compliance
checks:
  - slo_burn_rate(global) < 1.0
  - slo_burn_rate(cloud:A) < 2.0
  - compliance_rule("pii_in_region") == pass
  - egress_forecast < budget
on_fail: block_release
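The gate above is declarative. One way to evaluate it is sketched below with the same four checks and thresholds. The `GateInput` fields are hypothetical names for values that a pipeline would obtain from monitoring, the compliance scanner and the cost forecast.

```python
from dataclasses import dataclass

@dataclass
class GateInput:
    burn_global: float
    burn_by_cloud: dict       # cloud name -> burn rate
    pii_in_region: bool       # result of the compliance rule
    egress_forecast_gb: float
    egress_budget_gb: float

def evaluate_gate(g, global_limit=1.0, cloud_limit=2.0):
    """Return the failed checks; an empty list means the release may proceed."""
    failed = []
    if not g.burn_global < global_limit:
        failed.append("slo_burn_rate(global)")
    for cloud, rate in sorted(g.burn_by_cloud.items()):
        if not rate < cloud_limit:
            failed.append(f"slo_burn_rate(cloud:{cloud})")
    if not g.pii_in_region:
        failed.append("compliance_rule(pii_in_region)")
    if not g.egress_forecast_gb < g.egress_budget_gb:
        failed.append("egress_forecast")
    return failed

candidate = GateInput(1.3, {"A": 2.5, "B": 0.3}, False, 120.0, 100.0)
failures = evaluate_gate(candidate)
print("block_release" if failures else "proceed", failures)
```

Returning the list of failed checks, rather than a bare boolean, lets the pipeline report why a release was blocked.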
9) Cost and carbon (FinOps/GreenOps)
Unit metrics: '$/req', '$/GB-egress', 'gCO₂e/req'.
Cost/carbon routing for non-critical batch: cheap/green hours and regions.
Egress-cap: budget for inter-cloud traffic; cache/aggregation/compression/TTL.
RI/SP/Committed Use in each cloud + "elastic layer" on spot/preemptible.
10) Failure testing and drills
Game-days: "take down cloud A," "slow down the database," "hit the egress limits."
Checkpoints: RTO/RPO, DNS convergence time, feature-flag rollout, cache behavior.
Chaos smoke tests in releases: dependency degradation must not trigger a retry cascade.
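One common guard against retry cascades is a retry budget: retries are allowed only while they stay below a fixed share of observed requests, so a degraded dependency sees a bounded amount of extra load. The sketch below is a simplified, single-threaded illustration with hypothetical parameter names. It has no time window or locking, both of which a production limiter would need.

```python
class RetryBudget:
    """Allow retries only while they stay below `percent` of observed requests."""

    def __init__(self, percent=10, min_retries=1):
        self.percent = percent
        self.min_retries = min_retries
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        # Integer arithmetic avoids floating-point rounding in the limit.
        allowed = max(self.min_retries, self.requests * self.percent // 100)
        if self.retries < allowed:
            self.retries += 1
            return True
        return False

# Simulate 100 failing calls that would all like to retry.
budget = RetryBudget(percent=10)
granted = 0
for _ in range(100):
    budget.record_request()
    if budget.can_retry():
        granted += 1
print(granted)
```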
11) Security, privacy, compliance
Zero-Trust: mTLS between services/clouds, artifact signing, SBOM.
DPA/sovereignty: dataset catalogs, localization rules, Legal Hold on top of ILM.
Secrets and keys: rotation log, compromise/kill-switch playbooks.
Webhooks and external integrations: signature, anti-replay, regional endpoints.
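Webhook signing with replay protection can be sketched with the standard `hmac` module: the sender signs the timestamp together with the body, and the receiver rejects stale timestamps, previously seen event ids and mismatched signatures. The message layout, the five-minute skew and the in-memory `seen_ids` set are assumptions made for this example; real providers each define their own header formats.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 300  # assumption: accept timestamps within five minutes

def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<body>", hex-encoded."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(secret, timestamp, body, signature, now, seen_ids, event_id):
    """Reject stale timestamps, replayed event ids and bad signatures."""
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    if event_id in seen_ids:
        return False
    expected = sign(secret, timestamp, body)
    if not hmac.compare_digest(expected, signature):  # constant-time compare
        return False
    seen_ids.add(event_id)
    return True

secret = b"shared-secret"
ts = 1_700_000_000
body = b'{"amount": 10}'
sig = sign(secret, ts, body)
seen = set()
print(verify(secret, ts, body, sig, ts + 10, seen, "evt-1"))  # first delivery
print(verify(secret, ts, body, sig, ts + 20, seen, "evt-1"))  # replay
```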
12) Data/event integration templates
12.1 Bidirectional Kafka bridge (idea):
cloudA.topicX ⇄ relayA→B ⇄ cloudB.topicX
cleanup.policy=compact,delete
key-based routing
idempotent producer
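A bidirectional bridge needs two safeguards that the diagram only implies: mirrored events must not be forwarded back to where they came from, and redelivered events must not be applied twice. The sketch below shows both in plain Python, without a Kafka client. The `origin` and `id` fields are hypothetical event attributes (in Kafka they would typically travel as record headers).

```python
def relay(events, local_cloud):
    """Forward only events that originated locally, so mirrored events do not loop."""
    return [e for e in events if e["origin"] == local_cloud]

def apply_once(store, applied_ids, event):
    """Idempotent apply: a redelivered event id is ignored."""
    if event["id"] in applied_ids:
        return False
    applied_ids.add(event["id"])
    store[event["key"]] = event["value"]
    return True

topic_a = [
    {"id": "a-1", "origin": "cloudA", "key": "user:1", "value": "v1"},
    {"id": "b-7", "origin": "cloudB", "key": "user:2", "value": "v9"},  # mirrored earlier
]
outgoing = relay(topic_a, "cloudA")

store_b, applied_b = {}, set()
for event in outgoing + outgoing:  # simulate at-least-once redelivery
    apply_once(store_b, applied_b, event)
print(store_b)
```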
12.2 Outbox table and relay:
sql
-- outbox(id uuid pk, aggregate_id, type, payload jsonb, version int, created_at timestamptz)
-- inserted in the same transaction as the domain table change
Next, the connector reads the outbox and publishes the event to the local broker + relay.
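The transactional half of the outbox pattern can be demonstrated end to end with the standard `sqlite3` module. This is a sketch under stated assumptions: the `orders` table, the `published` flag and the polling `drain_outbox` function are illustrative and stand in for a log-based connector. Polling is a simpler substitute technique, not how a CDC connector reads changes.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE outbox (
    id TEXT PRIMARY KEY, aggregate_id TEXT, type TEXT,
    payload TEXT, version INTEGER, published INTEGER DEFAULT 0
);
""")

def place_order(order_id):
    # One transaction: the domain row and its event commit or roll back together.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (id, aggregate_id, type, payload, version) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, "OrderPlaced",
             json.dumps({"id": order_id}), 1),
        )

def drain_outbox(publish):
    # Relay step: publish pending events in insertion order, then mark them sent.
    rows = conn.execute(
        "SELECT id, type, payload FROM outbox WHERE published = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)

place_order("o-1")
sent = []
drain_outbox(lambda event_type, payload: sent.append((event_type, payload)))
print(sent)
```

Because an event is marked as sent only after `publish` returns, a crash in between leads to redelivery rather than loss, which is why consumers on the other side should apply events idempotently.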
12.3 Conflict strategy (pseudo):
python
def resolve(local, remote):
    # the higher version wins
    if local.version > remote.version:
        return local
    if remote.version > local.version:
        return remote
    # equal versions: domain rules decide
    return business_tiebreak(local, remote)
13) Anti-patterns
"Lift and shift everything as-is into two clouds": double the complexity with no gain.
Synchronous inter-cloud transactions on the hot path.
Single global encryption key for all clouds/regions.
Logs/traces with PII, without masking and without localization rules.
No external measurements (real availability is visible only on the provider's status page).
No playbooks/drills: DR fails at the moment it is needed.
Retry cascade when one cloud degrades (no rate limiters/load shedding/circuit breakers).
Unaccounted egress leads to unexpected bills.
14) Architect checklist
1. Multi-cloud drivers formulated (SLO/DR/sovereignty/cost)?
2. Pattern selected (AA/AP/DR-Only/Poly-Service) and RTO/RPO committed?
3. Network plan: GSLB/Anycast, health checks, egress-cap, private channels?
4. Data: CDC/CRDT/dual-write, conflict resolution rules, outbox?
5. Sovereignty: data/region map, policies as code and their gates?
6. IAM/secrets: federation, short-lived tokens, KMS by domain?
7. Clusters/mesh: failover strategy, limits/circuit breakers/timeouts?
8. Observability: 'cloud'/'region' labels, SLO per-cloud and globally, external synthetics?
9. SDLC/IaC/GitOps: single repository, conformance tests, release gates?
10. FinOps/GreenOps: unit metrics, egress budget, "green" batch windows?
11. Drills: regular game-days, written reports and retests?
12. Exit-plan: data export/formats/deadlines, second-source for key services?
15) Mini sample configurations
15.1 Jurisdiction Routing Policy (pseudo-YAML):
yaml
route:
  pii:
    allowed_regions: ["eu-central","eu-north","eu-west"]
    deny_cross_cloud: false
  analytics:
    allowed_regions: ["eu-","us-"]
    prefer_low_carbon: true
  weights:
    eu-central@cloudA: 60
    eu-central@cloudB: 40
15.2 Health check for GSLB:
http
GET /healthz
200 OK
x-region: eu-central
x-slo: ok|at-risk|breach
15.3 Failover feature flag (pseudocode):
python
if slo_at_risk("cloudA", "payments"):
    route.weight["cloudA"] -= 20
    route.weight["cloudB"] += 20
    enable_stale_rates(ttl=1560)
Conclusion
Multi-cloud is an engineering discipline, not a label. It requires clear motives, a deliberate choice of topology, careful handling of data, strong automation and strict policies. If you measure risk and cost, build networks and data flows by the book, rehearse failovers and steer towards simplicity, a multi-cloud platform gives you stability, flexibility and freedom, without surprises in the bill and without compromising the user experience.