Hybrid cloud: on-prem + cloud
1) Why a hybrid and when is it justified
Drivers: regulatory requirements (data residency/PII), existing on-prem investments, low latency to proprietary/legacy systems, cost control, access to managed cloud services.
Trade-offs: network and security complexity, duplicated competencies, synchronization of data and configs, operational risk.
Motto: portable where it is critical; cloud-native where it pays off.
2) Hybrid models
On-prem extension: the cloud as a data-center extension (new microservices, analytics, frontends).
Cloud-first with local anchors: the core in the cloud; on-prem keeps accounting systems, payment gateways, PII storage.
Cloud-bursting: elastic load peaks go to the cloud (batch jobs, promo peaks); the base volume stays local.
DR to cloud: hot/warm cloud standby for on-prem (managed RTO/RPO).
Edge + core: PoP/edge nodes closer to the user; root data/ML in the cloud.
3) Network and connectivity
3.1 Channels
Site-to-Site VPN (IPsec/SSL) - quick to start; higher latency and jitter.
Dedicated links (Direct Connect / ExpressRoute / Interconnect, MPLS) - predictable SLAs, lower latency, more expensive.
Dual links + BGP - fault tolerance and routing control.
3.2 Addressing and routes
A single RFC1918 addressing scheme with no overlaps; a CIDR plan with headroom for years ahead.
NAT only at the borders; east-west traffic without NAT.
Segments/VRFs to isolate environments (dev/stage/prod), tenants, and providers.
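Overlap checks belong in CI, not in an incident postmortem. A minimal sketch of validating a hybrid CIDR plan with the Python standard library; site names and ranges are illustrative:

```python
# Sketch: verify that a hybrid CIDR plan has no overlapping ranges
# before any channels are built (site names and ranges are illustrative).
import ipaddress
from itertools import combinations

def overlapping_pairs(plan: dict) -> list:
    """Return pairs of sites whose CIDR blocks overlap."""
    nets = {site: ipaddress.ip_network(cidr) for site, cidr in plan.items()}
    return [(a, b) for (a, na), (b, nb) in combinations(nets.items(), 2)
            if na.overlaps(nb)]

plan = {
    "onprem-prod": "10.0.0.0/16",
    "cloud-prod":  "10.1.0.0/16",
    "cloud-stage": "10.1.128.0/17",   # overlaps cloud-prod -> must be fixed
}
print(overlapping_pairs(plan))  # [('cloud-prod', 'cloud-stage')]
```

Running this as a pre-merge check on the address plan repository catches "NAT chaos" (see the antipatterns section) before it reaches the routers.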
3.3 Time and DNS policies
A single NTP hierarchy (clock skew is fatal for cryptography/signatures).
Split-horizon DNS: internal zones (svc.cluster.local, corp.local) resolved internally; external zones are public.
Health-based GSLB for inbound traffic.
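The GSLB decision itself is simple: order sites by preference and route to the first one that passes health checks. A minimal sketch, with illustrative site names and a plain boolean health flag standing in for real probe results:

```python
# Sketch: health-based GSLB routing for inbound traffic — pick the most
# preferred healthy site; fail over when health checks fail (illustrative).
def pick_site(sites: list) -> str:
    """Sites are ordered by preference; return the first healthy one."""
    for s in sites:
        if s["healthy"]:
            return s["name"]
    raise RuntimeError("no healthy site")

sites = [
    {"name": "onprem-eu", "healthy": False},  # failed its health check
    {"name": "cloud-eu",  "healthy": True},
]
print(pick_site(sites))  # cloud-eu
```

Real GSLB products add weights, TTL tuning, and hysteresis so traffic does not flap between sites, but the ordering-by-preference core is the same.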
4) Identity and access
SSO Federation: OIDC/SAML, on-prem IdP ↔ cloud IdP; SCIM provisioning.
Roles follow the principle of least privilege; break-glass accounts with MFA.
Machine identity: SPIFFE/SPIRE or mesh-PKI for mTLS.
RBAC "end-to-end": Git/CI/CD → cluster/mesh → brokers/DB → logs.
5) Platform: Kubernetes + GitOps
5.1 Single execution layer
Clusters on-prem and in the cloud run the same versions/CRDs.
GitOps (Argo CD/Flux): shared charts/overlays, drift detection, promotion flows.
5.2 Service mesh
Istio/Linkerd: mTLS by default, locality-aware balancing, inter-cluster failover.
L7 policies (JWT, headers, rate limits, retry/circuit-breaker/timeout) as code in manifests.
5.3 Example (K8s topology & mesh)
Anti-affinity and zone spreading for the on-prem cluster:

```yaml
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: api
```

Istio DestinationRule: prefer the local cluster, then fail over to the cloud:

```yaml
trafficPolicy:
  outlierDetection:
    consecutive5xxErrors: 5
    interval: 5s
    baseEjectionTime: 30s
```
6) Data and storage
6.1 Databases
On-prem master, cloud read replica (analytics/directories).
Cloud master + on-prem cache (low latency for local integrations).
Distributed SQL/NoSQL (CockroachDB/Cassandra) with local quorums.
CDC/log-based replication (Debezium) between the sites; idempotent handlers.
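The "idempotent handlers" requirement is what makes CDC replication safe: replays and duplicate deliveries must not double-apply a change. A minimal sketch, with an in-memory dedup store and illustrative event fields (in production the seen-ID store would be durable, with TTL or compaction):

```python
# Sketch: idempotent applier for CDC events replicated between sites
# (e.g. via Debezium). Event IDs already applied are remembered, so a
# replayed or duplicated event is a no-op. All names are illustrative.
class IdempotentApplier:
    def __init__(self):
        self.seen = set()        # in prod: durable store with TTL/compaction
        self.balances = {}

    def apply(self, event: dict) -> bool:
        """Apply a CDC event exactly once; return False on a duplicate."""
        if event["id"] in self.seen:
            return False
        self.seen.add(event["id"])
        acct = event["account"]
        self.balances[acct] = self.balances.get(acct, 0) + event["delta"]
        return True

h = IdempotentApplier()
e = {"id": "evt-1", "account": "acc-7", "delta": 100}
h.apply(e)
h.apply(e)                    # duplicate delivery — silently ignored
print(h.balances["acc-7"])    # 100, not 200
```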
6.2 Object/file/block
S3-compatible object stores (on-prem MinIO + cloud S3/GCS) with replication/versioning; WORM for audit.
Backups: 3-2-1 (3 copies, 2 media types, 1 offsite), with regular restore verification.
6.3 Cache and queues
Redis/KeyDB cluster per site; a global cache only via events/TTL.
Kafka/Pulsar: MirrorMaker 2/replicator; the key is deduplication/idempotent consumers.
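"Global cache only via events/TTL" means each site serves its own cache, and cross-site consistency comes from two bounded mechanisms: entries expire on their own, and replicated invalidation events evict them early. A minimal sketch under those assumptions (key names and TTLs are illustrative):

```python
# Sketch: per-site cache with no cross-site locks — consistency is bounded
# by TTL plus replicated invalidation events, as the section suggests.
import time

class SiteCache:
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.data = {}                 # key -> (stored_at, value)

    def put(self, key: str, value: str):
        self.data[key] = (time.monotonic(), value)

    def get(self, key: str):
        entry = self.data.get(key)
        if entry is None or time.monotonic() - entry[0] > self.ttl:
            return None                # expired -> refetch from source of truth
        return entry[1]

    def on_invalidation_event(self, key: str):
        self.data.pop(key, None)       # event replicated from the other site

cache = SiteCache(ttl=30.0)
cache.put("odds:match-1", "1.95")
cache.on_invalidation_event("odds:match-1")
print(cache.get("odds:match-1"))       # None — evicted by the event
```

The trade-off is deliberate: readers may see data up to one TTL stale if an invalidation event is lost, which is exactly the failure mode a "strongly consistent global cache" antipattern tries (and fails) to avoid with locks.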
7) Security and compliance (Zero Trust)
mTLS everywhere (mesh), TLS 1.2+ on the perimeter; unencrypted channels disabled.
Secrets: HashiCorp Vault/ESO; short-lived tokens; auto-rotation.
KMS/HSM: keys segmented per jurisdiction/tenant; scheduled crypto rotations.
Segmentation: NetworkPolicies, micro-segmentation (NSX/Calico), ZTNA for admin access.
Logs: immutable (Object Lock), end-to-end `trace_id`, PII/PAN masking.
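PAN masking should happen before logs leave the process or the collector. A minimal sketch of the idea; the regex below is simplified (real PAN detection should add a Luhn check and card-scheme prefixes):

```python
# Sketch: mask PAN-like digit runs in log lines before shipping them,
# keeping only BIN (first 6) and last 4. Simplified — production detection
# should also apply a Luhn check to avoid masking arbitrary numbers.
import re

PAN_RE = re.compile(r"\b(\d{6})\d{6,9}(\d{4})\b")   # 16–19 digit PANs

def mask_pan(line: str) -> str:
    return PAN_RE.sub(r"\1******\2", line)

print(mask_pan("payment ok pan=4111111111111111 trace_id=abc"))
# payment ok pan=411111******1111 trace_id=abc
```

Applying this in an OpenTelemetry Collector processor (or an equivalent log pipeline stage) keeps raw PANs out of both the on-prem and cloud log stores.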
8) Observability, SLO and incident management
OpenTelemetry SDK everywhere; Collector on-prem and in the cloud.
Tail-sampling: 100% of errors and p99, labels `site=onprem|cloud`, `region`, `tenant`.
SLOs and error budgets by slices (route/tenant/provider/site); alerts by burn rate.
End-to-end dashboards: RED/USE, dependency maps, canary comparisons (before/after migrations).
9) CI/CD and configs
A single artifact registry (with an on-prem pull-through cache).
Promotion flow: dev → stage (on-prem) → canary (cloud) → prod; or the reverse, depending on the goal.
Checks: contract tests (OpenAPI/gRPC/CDC), static analysis, IaC linting, image scanning, SLO gates.
10) DR/BCP (continuity plan)
RTO/RPO per service. Examples:
- catalogs/landing pages: RTO 5-15 min, RPO ≤ 5 min;
- payments/wallets: RTO ≤ 5 min, RPO ≈ 0-1 min (quorum/synchronous within the site).
Runbooks: switching GSLB weights, promoting standby in a cluster, "lightweight mode" feature flags.
GameDays: quarterly site/link disconnection drills, verifying actual RTO/RPO.
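A GameDay only proves something if the measured numbers are compared against the plan. A minimal sketch of that comparison, with illustrative targets and timestamps; in practice `last_replicated_at` would come from replication-lag metrics:

```python
# Sketch: GameDay verification — compare the measured RTO/RPO of a drill
# against per-service targets from the DR plan (all values illustrative).
from datetime import datetime, timedelta

TARGETS = {
    "payments": {"rto": timedelta(minutes=5), "rpo": timedelta(minutes=1)},
}

def drill_result(service, failed_at, recovered_at, last_replicated_at):
    rto = recovered_at - failed_at          # how long the outage lasted
    rpo = failed_at - last_replicated_at    # how much data was at risk
    t = TARGETS[service]
    return {"rto_ok": rto <= t["rto"], "rpo_ok": rpo <= t["rpo"]}

t0 = datetime(2024, 1, 1, 12, 0, 0)
print(drill_result("payments",
                   failed_at=t0,
                   recovered_at=t0 + timedelta(minutes=4),
                   last_replicated_at=t0 - timedelta(seconds=30)))
# {'rto_ok': True, 'rpo_ok': True}
```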
11) Cost and FinOps
Egress between on-prem and the cloud is the main "hidden" expense; cache aggressively and keep cross-boundary round trips to a minimum (SWR, edge).
Tagging: `service`, `env`, `site`, `tenant`, `cost_center`.
The 80/20 rule: keep the ~20% "critical core" portable; run the rest wherever it is cheaper.
Downsample metrics, tier logs hot/cold, use budget-aware trace sampling.
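The tagging scheme above only pays off when reports roll spend up by those tags. A minimal sketch of such a roll-up over billing line items; the items and amounts are illustrative:

```python
# Sketch: FinOps roll-up — aggregate spend by the mandated tags
# (`service`, `site`, `cost_center`). Line items are illustrative.
from collections import defaultdict

items = [
    {"service": "api",    "site": "cloud",  "cost_center": "cc-1", "usd": 120.0},
    {"service": "api",    "site": "onprem", "cost_center": "cc-1", "usd": 40.0},
    {"service": "egress", "site": "cloud",  "cost_center": "cc-1", "usd": 310.0},
]

def rollup(items, key):
    total = defaultdict(float)
    for it in items:
        total[it[key]] += it["usd"]
    return dict(total)

print(rollup(items, "site"))      # {'cloud': 430.0, 'onprem': 40.0}
print(rollup(items, "service"))   # egress dominating is the classic hybrid smell
```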
12) Workload placement patterns
13) Examples of configs
13.1 IPsec S2S (sketch)
onprem ↔ cloud: IKEv2, AES-GCM, PFS group 14, rekey ≤ 1 h, DPD 15 s; monitor jitter/packet loss against the SLA.
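The "monitor jitter/packet loss" part can be as simple as probing the tunnel and checking the results against thresholds. A minimal sketch, with illustrative probe data and SLA limits (jitter approximated as the standard deviation of RTTs):

```python
# Sketch: evaluate a tunnel's probe results against SLA thresholds.
# A probe RTT of None means the packet was lost. Values are illustrative;
# jitter here is approximated as the population stddev of received RTTs.
import statistics

def link_health(rtts_ms, max_jitter_ms=10.0, max_loss=0.01):
    received = [r for r in rtts_ms if r is not None]
    loss = 1 - len(received) / len(rtts_ms)
    jitter = statistics.pstdev(received) if len(received) > 1 else 0.0
    return {"loss": loss, "jitter_ms": jitter,
            "ok": loss <= max_loss and jitter <= max_jitter_ms}

probes = [21.0, 22.0, 21.5, None, 23.0]   # one lost probe out of five
print(link_health(probes)["ok"])          # False — 20% loss breaches the SLA
```

Feeding this verdict into BGP weights (or the GSLB) is what turns the dual-link setup from section 3.1 into actual automatic failover.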
13.2 Terraform (tag/label snippet)

```hcl
resource "kubernetes_namespace" "payments" {
  metadata {
    name = "payments"
    labels = {
      "site"        = var.site    # onprem | cloud
      "tenant"      = var.tenant
      "cost_center" = var.cc
    }
  }
}
```
13.3 Vault + ESO (secret from on-prem to a cloud cluster)

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-store
  target:
    name: psp-hmac
    creationPolicy: Owner
  data:
    - secretKey: hmac
      remoteRef:
        key: kv/data/payments
        property: HMAC_SECRET
```
14) Antipatterns
Overlapping CIDRs → NAT chaos; the address plan comes first, then the channels.
A single "shared" global cache with strong consistency → latency and split-brain.
Retries without idempotency → double charges/orders.
A "bare" VPN without mTLS/Zero Trust inside → lateral movement after a compromise.
No DR exercises → plans that do not work in reality.
Version drift across K8s/CRDs/operators → uniform charts become impossible.
Free-form logs without `trace_id` and masking → investigations and audits become impossible.
15) Specifics of iGaming/Finance
Data residency: PII/payment events stay in the on-prem/regional circuit; only aggregates/anonymized data go to the cloud.
PSP/KYC: multiple providers; smart routing from the cloud to local gateways with fallback to a backup; webhooks through a broker with deduplication.
"Money paths": their own SLOs, stricter than the aggregate; HMAC/mTLS, `Retry-After`, `Idempotency-Key` are mandatory.
Audit: WORM storage (Object Lock), immutable transaction logs, dual recording (on-prem + cloud) for critical events.
Jurisdictions: KMS/Vault key segmentation per country/brand; geo-blocking on the perimeter.
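The webhook requirements above (HMAC verification plus `Idempotency-Key` deduplication) combine into one small handler. A minimal sketch; the secret, header names, and in-memory dedup store are simplifications (in production the secret comes from Vault/ESO and the seen-key store is shared and TTL-bounded):

```python
# Sketch: verify a PSP webhook's HMAC signature and deduplicate by
# Idempotency-Key, as required for "money paths". Names simplified.
import hashlib
import hmac

SECRET = b"psp-hmac-secret"        # in prod: pulled from Vault/ESO, rotated
seen_keys = set()                  # in prod: shared store with TTL

def handle_webhook(body: bytes, signature: str, idempotency_key: str) -> str:
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return "rejected"          # invalid HMAC — do not process
    if idempotency_key in seen_keys:
        return "duplicate"         # already applied — never apply twice
    seen_keys.add(idempotency_key)
    return "processed"

body = b'{"payout": 100}'
sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
print(handle_webhook(body, sig, "key-1"))  # processed
print(handle_webhook(body, sig, "key-1"))  # duplicate
```

Note `hmac.compare_digest` for constant-time comparison: comparing signatures with `==` leaks timing information.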
16) Prod Readiness Checklist
- Address plan, DNS, NTP - unified; S2S links plus redundant paths with BGP failover.
- Single identity (SSO/OIDC/SAML), MFA, least privilege; SPIFFE/SPIRE for services.
- K8s at all sites, GitOps, same operators/CRDs; service mesh with mTLS and locality-aware LB.
- Data: CDC, consistency tests, RPO/RTO policies, 3-2-1 backups and regular restore drills.
- Security: Vault/ESO, rotation, NetworkPolicies, ZTNA; immutable logs.
- Observability: OTel, tail-sampling, SLO/budgets by site/region/tenant; canary dashboards.
- CI/CD: contract tests, IaC linting, image scan; release-gates by SLO.
- DR-runbooks, GameDays, measured actual RTO/RPO; cutover/roll-back buttons.
- FinOps: egress limits, tags and reports, metrics/logs/trails retention policy.
- iGaming specifics: data residency, multi-PSP, WORM audit, individual SLOs for payments.
17) TL;DR
Hybrid = a common execution platform (K8s + GitOps + mesh + OTel + Vault) across two worlds: on-prem and cloud. Plan the network and identity first, make data portable via CDC and idempotency, enforce security through Zero Trust, measure reliability with SLOs/error budgets, and rehearse DR. For iGaming, keep data and payments within the jurisdiction, use multi-PSP smart routing, and immutable audit logs.