Multi-cloud topology

1) When multi-cloud is justified

Drivers:

Reliability/availability: independent failure zones at provider level.
Sovereignty/compliance: storage/processing by jurisdictions (data residency).
Risk management: less vendor lock-in, more purchasing/price leverage.
Geography/performance: closer to the user and data sources.
Special services: access to the best "niche" capabilities of different clouds.

Anti-arguments:

Significant complexity of SDLC/observability/operations.
Higher egress costs and added latency between providers.
Different IAM/network models/quotas/limits → more operational risks.

2) Topological patterns

Pattern	Description	Pros	Cons	Typical case
Active/Active	Two or more clouds serve traffic at the same time	Minimal RTO/RPO, closer to the user	Complex data/routing	Critical fintech/identity
Active/Passive (Hot/Warm)	One active, the second a warm standby	Simpler data, predictable cutover	↑RTO, needs regular drills	Most B2C/SaaS
DR-Only (Cold)	Cold standby + backups/images	Cheap	High RTO/RPO	Low-criticality systems
Poly-Service	Services are distributed across clouds	Uses the "best" services	Cross-cloud dependencies	Analytics/ML separate from OLTP
Edge-Anchored	Edge/CDN + best cloud per region	Low latency, caches	Complex invalidation/rules	Global products/media

3) Network layer and routing

3.1 Global entry point

GSLB/DNS routing: latency-/health-based; short TTLs during migration windows.
Anycast + L7 proxy: single IP, regional health routing.
Policies by jurisdiction: geo-blocking/geo-pinning traffic.

Cluster selection pseudocode:

python
def pick_cluster(client, intent):
    # input: ip, geo, tenant, feature
    allowed = filter_by_compliance(client.geo)  # sovereignty
    healthy = [c for c in allowed if sdo(c).ok and slo(c).ok]
    return min(healthy, key=lambda c: latency_estimate(client, c))
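The pseudocode above leaves its helpers undefined. A minimal runnable sketch of the same three steps (compliance filter, health filter, lowest latency) might look like this. The `Cluster` type, the `allowed_geos` field, and the static latency table are illustrative assumptions, not the API of any GSLB product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    name: str
    allowed_geos: set   # hypothetical: client geos this cluster may serve
    healthy: bool       # result of a health probe
    slo_ok: bool        # SLO currently not at risk
    latency_ms: dict    # hypothetical: estimated latency per client geo


def pick_cluster(client_geo: str, clusters: list) -> Optional[Cluster]:
    # 1. Sovereignty filter first: compliance is never traded for latency.
    allowed = [c for c in clusters if client_geo in c.allowed_geos]
    # 2. Keep only clusters that are healthy and within SLO.
    healthy = [c for c in allowed if c.healthy and c.slo_ok]
    if not healthy:
        return None  # the caller decides how to degrade
    # 3. Lowest estimated latency wins; unknown latency sorts last.
    return min(healthy, key=lambda c: c.latency_ms.get(client_geo, float("inf")))


clusters = [
    Cluster("cloudA-eu", {"DE", "FR"}, True, True, {"DE": 12, "FR": 20}),
    Cluster("cloudB-eu", {"DE", "FR"}, True, False, {"DE": 8, "FR": 15}),
    Cluster("cloudB-us", {"US"}, True, True, {"US": 10}),
]
```

Here `cloudB-eu` has the better latency for `DE` but is skipped because its SLO flag is off, so the filter order matters more than the final `min`.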

3.2 Inter-cloud connectivity

Private channels/peering where possible; otherwise, TLS with mTLS over the public Internet.
Egress control: aggregation/compression, local caches/aggregators.
Networks as code: Terraform/Blueprints, CIDR policies, routes and egress gateways.

4) Data and consistency

4.1 Models

Global strong consistency across clouds is rarely realistic (latency/network/cost).
Pragmatic eventual consistency: bidirectional CDC (change data capture) with conflict resolution.
CRDT/idempotency: for counters/sets/logs - commutative structures.
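To make "commutative structures" concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The replica names `cloudA` and `cloudB` are illustrative; a production system would more likely use an existing CRDT library.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merge = element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n: int = 1) -> None:
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # max() is commutative, associative and idempotent,
        # so merges may arrive in any order and any number of times.
        for rid, value in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), value)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("cloudA"), GCounter("cloudB")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing
```

After the merges both replicas hold the same state without any coordination, which is why such structures suit cross-cloud counters and sets.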

4.2 Patterns

Dual-write with outbox: transactional event recording → broker → replication to a neighboring cloud.
Read-local/Write-home: writes to the "home" region/cloud, reads - locally (with versions and stale policies).
Split-brain protection: divergence detection, "compensation" (saga), manual arbitration for monetary invariants.

CDC pseudo-pipeline:

DB → Debezium/stream → Events(topic@vN) → Cross-cloud relay → Apply w/ resolver
resolver: prefer_higher_version | prefer_home | business_rule()
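The "Apply w/ resolver" step could be sketched as follows. The `Change` record and the `home` parameter are assumptions made for illustration; only the first two resolver stages (higher version, then home cloud) are shown, and the business rule is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    key: str
    version: int
    origin: str   # cloud that produced the change
    value: str


def apply_change(store: dict, change: Change, home: str) -> bool:
    """Apply a replicated change; return True if the store was updated."""
    current = store.get(change.key)
    if current is None or change.version > current.version:
        store[change.key] = change          # prefer_higher_version
        return True
    if change.version == current.version and change != current:
        if change.origin == home and current.origin != home:
            store[change.key] = change      # prefer_home on a version tie
            return True
    return False  # stale or duplicate: safe to drop
```

Because stale and duplicate changes are dropped, the relay can redeliver events after a failure without corrupting the target store.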

4.3 Object storage

Asynchronous replication of buckets, hashes/manifests, dedup.
ILM (hot/warm/cold) policies are cloud independent.
Sovereignty rules: "PII does not leave UA/EEA" - are validated as code.
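One way to validate a sovereignty rule as code is a check that runs in CI against the bucket replication plan. The sketch below assumes a hypothetical plan format (`name`, `data_class`, `region`, `replicate_to`) and reuses the EU region names from the routing examples later in this document.

```python
# Assumption: region names borrowed from the routing examples in this document.
ALLOWED_PII_REGIONS = {"eu-central", "eu-north", "eu-west"}


def violations(buckets: list) -> list:
    """Return readable violations for buckets that hold or replicate PII outside allowed regions."""
    found = []
    for b in buckets:
        if b["data_class"] != "pii":
            continue
        # Check the primary region and every replication target.
        for region in [b["region"], *b["replicate_to"]]:
            if region not in ALLOWED_PII_REGIONS:
                found.append(f"{b['name']}: PII in {region}")
    return found


buckets = [
    {"name": "users", "data_class": "pii", "region": "eu-central",
     "replicate_to": ["eu-west", "us-east"]},
    {"name": "logs", "data_class": "public", "region": "us-east", "replicate_to": []},
]
```

A non-empty result would fail the pipeline before the replication rule is ever applied.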

5) Identity, secrets and keys

Identity federation: single IdP, short-lived tokens, OIDC trust for pipelines.
Secrets: KMS/HSM of each cloud + a Vault-class abstraction; dual-key for rotations/switchovers.
PoLP/ABAC: rights based on attributes (cloud, region, env, data_class).
Crypto domains: different root keys for jurisdictions → crypto-erasure by scope.

6) Runtime environment: clusters and meshes

Multicluster (K8s): one cluster per cloud/region; fleet control via GitOps (ArgoCD/Fleet).
Service mesh: mTLS, retries, circuit breakers, cross-cluster failover policies.

Distribution:

Static services → in place.
Interactive APIs → in each cloud (Active/Active).
Batch/ETL → "green" windows/cheap region (carbon/cost aware).

Placement policy (Rego sketch):

rego
package placement

# Classic (pre-1.0) Rego syntax; OPA 1.0 requires the `contains`/`if` keywords.
allowed_pii_regions := {"eu-central", "eu-north", "eu-west"}

allow[cloud] {
    input.service.pii == false
    cloud := input.clouds[_]
    cloud.features[_] == "cheap_gpu"
}

deny["PII outside allowed region"] {
    input.service.pii == true
    not allowed_pii_regions[input.target_region]
}

7) Observability and SLO in multi-cloud

Multi-tenant labels: 'cloud', 'region', 'tenant', 'data_domain'.

SLI/SLO per-cloud and globally: "globally available if ≥1 cloud is available."
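The "≥1 cloud is available" definition can be turned into a quick estimate. The sketch below assumes that cloud failures are independent, which is a strong assumption: shared DNS, a common deployment pipeline or a common bug will make the real figure lower.

```python
from math import prod


def global_availability(per_cloud: list) -> float:
    """Probability that at least one cloud is up, assuming independent failures."""
    # The platform is down only if every cloud is down at the same time.
    return 1.0 - prod(1.0 - a for a in per_cloud)


two_clouds = global_availability([0.999, 0.999])
```

With two clouds at 99.9% each, the idealized result is about 99.9999%; treat it as an upper bound and measure the real value with external probes.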

Telemetry collection: locally + aggregation with egress control.
Traces: global trace-id, context propagation, tail-based sampling.
Comparison dashboards: A vs B per endpoint/p99/error-budget burn.

8) SDLC/IaC and "policies as code"

Single IaC monorepo: provider modules/stacks, invariants (tags, networks, encryption).
GitOps: declarative manifests, drift detection, environment promotion.
Conformance tests: API/event contracts, canaries for both clouds.
Release gates: block the release when an SLO is at risk in one cloud (burn-rate forecast) or when sovereignty rules are not met.

Gate (pseudo):

yaml
gate: multi-cloud-slo-and-compliance
checks:
  - slo_burn_rate(global) < 1.0
  - slo_burn_rate(cloud:A) < 2.0
  - compliance_rule("pii_in_region") == pass
  - egress_forecast < budget
on_fail: block_release
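The gate above could be evaluated roughly as follows. Burn rate is taken in its usual sense (observed error ratio divided by the error ratio the SLO allows). Applying the 2.0 threshold to every cloud, and the function and argument names, are assumptions of this sketch.

```python
def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """Error-budget burn rate: observed error ratio / allowed error ratio."""
    if requests == 0:
        return 0.0
    return (errors / requests) / (1.0 - slo_target)


def gate(global_burn, per_cloud_burn, pii_in_region, egress_forecast, egress_budget):
    """Return the list of failed checks; an empty list means the release may proceed."""
    failed = []
    if global_burn >= 1.0:
        failed.append("slo_burn_rate(global)")
    # Assumption: the same per-cloud threshold (2.0) applies to every cloud.
    failed += [
        f"slo_burn_rate(cloud:{cloud})"
        for cloud, rate in sorted(per_cloud_burn.items())
        if rate >= 2.0
    ]
    if not pii_in_region:
        failed.append("compliance_rule(pii_in_region)")
    if egress_forecast >= egress_budget:
        failed.append("egress_forecast")
    return failed
```

For example, 5 errors in 10,000 requests against a 99.9% SLO gives a burn rate of about 0.5, meaning the budget is being spent at half the sustainable pace.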

9) Cost and carbon (FinOps/GreenOps)

Unit metrics: '$/req', '$/GB-egress', 'gCO₂e/req'.
Cost/carbon routing for non-critical batch: cheap/green hours/regions.
Egress-cap: budget for inter-cloud traffic; cache/aggregation/compression/TTL.
RI/SP/Committed Use in each cloud + "elastic layer" on spot/preemptible.

10) Failure testing and drills

Game-days: "shut down cloud A," "slow down the database," "exceed egress limits."

Checkpoints: RTO/RPO, DNS convergence time, feature-flag rollout, cache behavior.
Chaos smoke tests in releases: degradation of dependencies should not lead to a cascade of retries.
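A circuit breaker is one common way to keep a degraded dependency from triggering a retry cascade. The sketch below is a deliberately small version with an injectable clock so that it can be tested. The thresholds are arbitrary, and a service mesh would normally provide this behavior instead of application code.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows calls again after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, calls are allowed again as probes
        # until the next result is recorded.
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open and restart the cool-down
```

Callers check `allow()` before each request and fail fast when it returns `False`, which removes load from the struggling cloud instead of adding retries to it.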

11) Security, privacy, compliance

Zero-Trust: mTLS between services/clouds, artifact signing, SBOM.
DPA/sovereignty: dataset catalogs, localization rules, Legal Hold on top of ILM.
Secrets and keys: rotation log, compromise/kill-switch playbooks.
Webhooks and external integrations: signature, anti-replay, regional endpoints.
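Webhook signing with replay protection can be sketched with the standard library alone. The message layout (`timestamp.event_id.body`), the 300-second tolerance and the in-memory `seen_ids` set are assumptions for illustration; a real receiver would keep seen IDs in a shared store with expiry.

```python
import hashlib
import hmac


def sign(secret: bytes, timestamp: int, event_id: str, body: bytes) -> str:
    # The timestamp and event ID are signed too, so neither can be swapped.
    message = f"{timestamp}.{event_id}.".encode() + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, timestamp, event_id, body, signature, now, seen_ids, tolerance=300) -> bool:
    if abs(now - timestamp) > tolerance:
        return False                       # too old, or too far in the future
    expected = sign(secret, timestamp, event_id, body)
    if not hmac.compare_digest(expected, signature):
        return False                       # tampered payload or wrong secret
    if event_id in seen_ids:
        return False                       # replay of an already accepted event
    seen_ids.add(event_id)
    return True
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters matched.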

12) Data/event integration patterns

12.1 Bidirectional Kafka bridge (idea):

cloudA.topicX ⇄ relayA→B ⇄ cloudB.topicX
cleanup.policy=compact,delete | key-based routing | idempotent producer
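A bidirectional bridge needs loop prevention, otherwise an event relayed from A to B is relayed straight back to A. The sketch below shows the filtering logic only, without any Kafka client calls; the `origin` and `id` fields are assumed to travel with each event (for example as headers).

```python
def relay(events: list, source: str, seen: set) -> list:
    """Return the events that should be forwarded from `source` to the peer cloud."""
    forwarded = []
    for event in events:
        if event["origin"] != source:
            continue  # arrived from the peer: forwarding it back would loop
        if event["id"] in seen:
            continue  # already relayed: redelivery stays idempotent
        seen.add(event["id"])
        forwarded.append(event)
    return forwarded


topic_a = [
    {"id": "e1", "origin": "cloudA", "key": "user-1"},
    {"id": "e2", "origin": "cloudB", "key": "user-2"},  # was relayed in from cloudB
]
```

Only `e1` leaves cloud A; `e2` originated in cloud B and stays put, and relaying the same batch again forwards nothing.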

12.2 Outbox table and relay:

sql
-- outbox: id uuid pk, aggregate_id, type, payload jsonb, version int, created_at timestamptz
-- inserted in the same transaction as the domain table change

Next, the connector reads the outbox and publishes the event to the local broker + relay.
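The outbox flow can be tried end to end with SQLite from the standard library. This is a sketch under assumptions: the `orders` table, the `published` flag and the `publish` callback are hypothetical stand-ins for the domain table, the connector's bookkeeping and the broker client.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE outbox (
    id TEXT PRIMARY KEY, aggregate_id TEXT, type TEXT,
    payload TEXT, version INTEGER, published INTEGER DEFAULT 0
);
""")


def place_order(order_id: str) -> None:
    # One transaction: either both rows are written or neither is.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (id, aggregate_id, type, payload, version) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, "OrderPlaced", json.dumps({"id": order_id}), 1),
        )


def drain_outbox(publish) -> int:
    """Publish unpublished events in insertion order, then mark each as sent."""
    rows = conn.execute(
        "SELECT id, type, payload FROM outbox WHERE published = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # at-least-once: may repeat after a crash
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)
```

Because an event is marked only after `publish` returns, a crash in between leads to a duplicate rather than a lost event, which is why consumers on the other side need to be idempotent.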

12.3 Conflict strategy (pseudo):

python
def resolve(local, remote):
    if local.version > remote.version:
        return local
    if remote.version > local.version:
        return remote
    # equal versions: domain rules decide
    return business_tiebreak(local, remote)

13) Anti-patterns

"Drag everything as it is into two clouds." This doubles the complexity with no gain.
Synchronous inter-cloud transactions on the hot path.
Single global encryption key for all clouds/regions.
Logs/traces with PII without masking and without localization rules.
No external measurements (real availability is visible only on the provider's status page).
No playbooks/drills, so DR does not work at the moment it is needed.
Cascade of retries during degradation of one cloud (no rate limiters/load shedding/circuit breakers).
Unaccounted egress leads to unexpected bills.

14) Architect checklist

1. Multi-cloud drivers formulated (SLO/DR/sovereignty/cost)?
2. Pattern selected (AA/AP/DR-Only/Poly-Service) and RTO/RPO committed?
3. Network plan: GSLB/Anycast, health checks, egress-cap, private channels?
4. Data: CDC/CRDT/dual-write, conflict resolution rules, outbox?
5. Sovereignty: data/regions map, policies as code and their gates?
6. IAM/secrets: federation, short-lived tokens, KMS by domain?
7. Clusters/mesh: failover strategy, limits/breakers/timeouts?
8. Observability: 'cloud'/'region' labels, SLO per-cloud and globally, external synthetics?
9. SDLC/IaC/GitOps: single repository, conformance tests, release gates?
10. FinOps/GreenOps: unit metrics, egress budget, "green" batch windows?
11. Drills: regular game-days, reports and retests?
12. Exit-plan: data export/formats/deadlines, second-source for key services?

15) Mini sample configurations

15.1 Jurisdiction routing policy (pseudo-YAML):

yaml
route:
  pii:
    allowed_regions: ["eu-central","eu-north","eu-west"]
    deny_cross_cloud: false
  analytics:
    allowed_regions: ["eu-","us-"]
    prefer_low_carbon: true
  weights:
    eu-central@cloudA: 60
    eu-central@cloudB: 40
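A router that honors such a policy might filter targets by data class first and only then apply the weights. In this sketch the `region@cloud` key format follows the example above, while `pick_target` and the extra `us-east@cloudA` target used in the usage note are hypothetical.

```python
import random

WEIGHTS = {"eu-central@cloudA": 60, "eu-central@cloudB": 40}
PII_REGIONS = ["eu-central", "eu-north", "eu-west"]


def pick_target(data_class: str, weights: dict, rng: random.Random) -> str:
    """Weighted random choice among targets that are compliant for the data class."""
    candidates = {
        target: weight
        for target, weight in weights.items()
        # Keys look like "region@cloud"; PII may only go to allowed regions.
        if weight > 0 and (data_class != "pii" or target.split("@")[0] in PII_REGIONS)
    }
    if not candidates:
        raise LookupError("no compliant target")
    targets = list(candidates)
    return rng.choices(targets, weights=[candidates[t] for t in targets], k=1)[0]
```

With `{**WEIGHTS, "us-east@cloudA": 100}` as input, PII traffic still lands only on the two `eu-central` targets, because compliance filtering happens before any weight is considered.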

15.2 Health check for GSLB:

http
GET /healthz
200 OK
x-region: eu-central
x-slo: ok|at-risk|breach

15.3 Failover feature flag (pseudocode):

python
if slo_at_risk("cloudA", "payments"):
    route.weight["cloudA"] -= 20
    route.weight["cloudB"] += 20
    enable_stale_rates(ttl=1560)

Conclusion

Multi-cloud is an engineering discipline, not a label. It requires clear motives, a deliberate choice of topology, careful work with data, strong automation and strict policies. If you measure risk and cost, build networks and data by the book, rehearse failovers and steer towards simplicity, a multi-cloud platform will give you stability, flexibility and freedom - without surprises in bills and without compromising user experience.
