Blue-Green and Canary deploy

1) Purpose and key ideas

Blue-Green and Canary are zero-downtime release strategies that reduce the risk of a rollout:
  • Blue-Green: keep two parallel versions (Blue - active, Green - new) and switch traffic atomically. Rollback is fast: point traffic back to Blue.
  • Canary: expose the new version in stages (1% → 5% → 25% → 50% → 100%), monitor SLO metrics, and stop or roll back on degradation.

The general principle is to separate "artifact delivery" from "traffic exposure" and to automate observability and rollbacks.
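The staged exposure described above can be sketched as a small loop. This is a minimal illustration, not a production controller: the `is_healthy` callback stands in for whatever SLO check is in place, and the function name and return shape are assumptions made for the example.

```python
from typing import Callable, List, Sequence

# Percent of traffic sent to the new version at each stage (from the text above).
STAGES = (1, 5, 25, 50, 100)


def run_canary(is_healthy: Callable[[int], bool],
               stages: Sequence[int] = STAGES) -> List[int]:
    """Walk through the stages and return the history of applied weights.

    `is_healthy(weight)` is a hypothetical SLO check for the given weight.
    On the first unhealthy stage the weight is set back to 0 (rollback).
    """
    history: List[int] = []
    for weight in stages:
        history.append(weight)        # expose the new version to `weight` % of traffic
        if not is_healthy(weight):    # degradation detected at this stage
            history.append(0)         # roll back: all traffic returns to the old version
            break
    return history
```

For instance, `run_canary(lambda w: w < 25)` stops at the 25% stage and ends with a weight of 0, while a check that always passes walks all the way to 100.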

2) When to choose

Blue-Green is suitable when:
  • you need instant switching (a hard RTO) and the services are simple and stateless;
  • there are strict release/freeze windows and clear smoke checks;
  • holding double capacity for long is too expensive, but acceptable for a short time.
Canary is suitable when:
  • changes are complex and need step-by-step validation on real traffic;
  • telemetry is mature (SLOs, business metrics) and auto-stop is available;
  • the blast radius must be strictly limited (fintech/iGaming flows).

Combo pattern: roll out Green and shift traffic to it through canary stages (Blue-Green as the frame, Canary as the method of moving traffic).

3) Traffic routing architecture

Options for switching/adding traffic:

1. L4/L7 balancer (ALB/NLB, Cloud Load Balancer) - weighted target groups.

2. API gateway/WAF - route/weight by versions, headers, cookies, regions.

3. Service Mesh (Istio/Linkerd/Consul) - percentage distribution, fault injection, timeout/retry/limit controls.

4. Ingress/NGINX/Envoy - upstream weights and attribute routing.

5. Argo Rollouts/Flagger - operator-controller, automatic progression, integration with Prometheus/New Relic/Datadog.

4) Kubernetes: practical templates

4.1 Blue-Green (via Service selector)

Two Deployments: `app-blue` and `app-green`.
One Service `app-svc` with a selector on the desired `version`.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: app-green, labels: { app: app, version: green } }
spec:
  replicas: 4
  selector: { matchLabels: { app: app, version: green } }
  template:
    metadata: { labels: { app: app, version: green } }
    spec:
      containers:
        - name: app
          image: ghcr.io/org/app:1.8.0
---
apiVersion: v1
kind: Service
metadata: { name: app-svc }
spec:
  selector: { app: app, version: blue }   # ← to switch to green, change this value
  ports: [{ port: 80, targetPort: 8080 }]
```

Switching is an atomic change of the selector (or labels), with a controlled drain of existing connections.
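As a rough sketch of the switch itself, the helper below only builds the patch body for the Service selector; it does not talk to the cluster. The function name and the restriction to the two values `blue` and `green` are assumptions made for this example.

```python
import json


def selector_patch(target_version: str, app: str = "app") -> str:
    """Return a JSON merge-style patch that points the Service at one version.

    Only "blue" and "green" are accepted here, matching the two Deployments
    in the example; this restriction is an illustrative choice.
    """
    if target_version not in ("blue", "green"):
        raise ValueError(f"unknown version: {target_version!r}")
    patch = {"spec": {"selector": {"app": app, "version": target_version}}}
    return json.dumps(patch, sort_keys=True)
```

The resulting string could then be passed to something like `kubectl patch service app-svc -p '<patch>'`; rolling back is the same call with `blue`.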

4.2 Canary (Istio VirtualService)

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata: { name: app }
spec:
  hosts: ["app.example.com"]
  http:
    - route:
        - destination: { host: app.blue.svc.cluster.local, subset: v1 }
          weight: 90
        - destination: { host: app.green.svc.cluster.local, subset: v2 }
          weight: 10
```

Change `weight` step by step; configure retries and timeouts on the VirtualService route and outlier detection in the DestinationRule.
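When weights are changed step by step, the two routes should stay complementary. The sketch below generates the `route` list for a given canary percentage; the host and subset names are copied from the example above, and the helper name is an assumption.

```python
from typing import Dict, List


def route_weights(canary_percent: int) -> List[Dict]:
    """Build the two weighted destinations so that the weights always sum to 100."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return [
        {   # stable (Blue) destination gets the remainder
            "destination": {"host": "app.blue.svc.cluster.local", "subset": "v1"},
            "weight": 100 - canary_percent,
        },
        {   # canary (Green) destination gets the requested share
            "destination": {"host": "app.green.svc.cluster.local", "subset": "v2"},
            "weight": canary_percent,
        },
    ]
```

Deriving the stable weight from the canary weight, rather than setting both by hand, avoids intermediate states where the totals do not add up.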

4.3 Argo Rollouts (automated canary rollout)

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata: { name: app }
spec:
  replicas: 6
  strategy:
    canary:
      canaryService: app-canary
      stableService: app-stable
      steps:
        - setWeight: 5
        - pause: { duration: 300 }   # 5 min observation
        - analysis:
            templates:
              - templateName: slo-guard
        - setWeight: 25
        - pause: { duration: 600 }
        - analysis:
            templates: [{ templateName: slo-guard }]
        - setWeight: 50
        - pause: {}                  # wait for manual promotion
      trafficRouting:
        istio:
          virtualService:
            name: app
            routes: ["http-route"]
```

The analysis template is tied to metrics (see section 5).

5) SLO gates and auto rollback

Protected metrics (examples):
  • Technical: `p95_latency`, `5xx_rate`, `error_budget_burn`, CPU/memory throttling.
  • Product: `CR (deposit)`, payment success rate, fraud scoring, `ARPPU` (on cold windows).
Stop policy (example):
  • If the `5xx_rate` of the new version is > 0.5% for 10 min - pause and roll back.
  • If `p95_latency` rises > 20% above the baseline - roll back.
  • If canary promotion is under way but the SLO error budget burns at > 2%/hour - hold.
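The example stop policy can be expressed as a small decision function. This is only a sketch: the `CanaryMetrics` fields and the `decide` name are invented for illustration, and the inputs are assumed to be already aggregated over the relevant window (for example, the 5xx rate over the last 10 minutes).

```python
from dataclasses import dataclass


@dataclass
class CanaryMetrics:
    error_rate_5xx: float            # fraction of 5xx responses, e.g. 0.004 = 0.4%
    p95_latency_ms: float            # p95 latency of the new version
    baseline_p95_ms: float           # p95 latency of the stable version
    budget_burn_pct_per_hour: float  # SLO error budget burned per hour, in percent


def decide(m: CanaryMetrics) -> str:
    """Return "rollback", "hold", or "promote" using the example thresholds."""
    if m.error_rate_5xx > 0.005:                      # 5xx rate above 0.5%
        return "rollback"
    if m.p95_latency_ms > m.baseline_p95_ms * 1.20:   # p95 more than 20% above baseline
        return "rollback"
    if m.budget_burn_pct_per_hour > 2.0:              # budget burning faster than 2%/hour
        return "hold"
    return "promote"
```

Rollback conditions are checked before the hold condition so that a clearly failing canary is never merely paused.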
Argo AnalysisTemplate (simplified):
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata: { name: slo-guard }
spec:
  metrics:
    - name: http_5xx_rate
      interval: 1m
      # The Prometheus provider returns a vector, so the first element is compared.
      successCondition: result[0] < 0.005
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{app="app",status=~"5.."}[5m])) /
            sum(rate(http_requests_total{app="app"}[5m]))
```

6) Data and compatibility (the most common cause of pain)

Use the expand → migrate → contract strategy:
  • Expand: add new nullable columns/indexes; support both schemas.
  • Migrate: dual write/read, backfill.
  • Contract: delete old fields/code once 100% of traffic is on the new version.
  • Events/queues: version the payload (v1/v2); keep handlers idempotent.
  • Cache/sessions: version the keys; ensure format compatibility.
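To make the "version the payload, keep handlers idempotent" point concrete, here is a minimal in-memory sketch. The event shapes are hypothetical: v1 is assumed to carry `amount` in whole currency units and v2 `amount_minor` in minor units. A real consumer would keep the seen ids in durable storage.

```python
from typing import Dict, Set


class DepositConsumer:
    """Applies deposit events at most once and accepts both payload versions."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()      # processed event ids (idempotency keys)
        self.balances: Dict[str, int] = {}   # user -> balance in minor units

    @staticmethod
    def normalize(event: dict) -> dict:
        """Convert a v1 or v2 payload into one internal shape."""
        version = event.get("version", 1)    # events without a version are treated as v1
        if version == 1:
            amount_minor = event["amount"] * 100   # hypothetical v1: whole units
        elif version == 2:
            amount_minor = event["amount_minor"]   # hypothetical v2: minor units
        else:
            raise ValueError(f"unsupported payload version: {version}")
        return {"id": event["id"], "user": event["user"], "amount_minor": amount_minor}

    def handle(self, event: dict) -> bool:
        """Apply the event; return False if it was already processed."""
        item = self.normalize(event)
        if item["id"] in self.seen_ids:
            return False                     # duplicate delivery: no side effects
        self.seen_ids.add(item["id"])
        user = item["user"]
        self.balances[user] = self.balances.get(user, 0) + item["amount_minor"]
        return True
```

With this shape, Blue and Green can publish different payload versions during the canary, and a redelivered event does not double-credit the user.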

7) Integration with CI/CD and GitOps

CI: build once, sign the image, generate an SBOM, run tests.
CD: promote the artifact through environments; Blue-Green/Canary are driven by manifests.
GitOps: MR → the controller (Argo CD/Flux) applies weights/selectors.
Environments/Approvals: for production steps - a manual gate plus an audit trail of decisions.

8) NGINX/Envoy and Cloud LBs: Quick Examples

8.1 NGINX (upstream weights)

```nginx
upstream app_upstream {
    server app-blue:8080 weight=90;
    server app-green:8080 weight=10;
}
server {
    location / { proxy_pass http://app_upstream; }
}
```

8.2 AWS ALB (Weighted Target Groups)

TG-Blue: 90, TG-Green: 10 → change weights via IaC/CLI.
Link CloudWatch alarms to automated rollback scripts (reset the weights to Blue 100 / Green 0).
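A rollback script mostly needs to produce a forward action with new weights. The sketch below builds such a structure without calling AWS. The dictionary shape follows the ELBv2 listener forward action as commonly documented (`Type`, `ForwardConfig`, `TargetGroups`) and should be checked against the current API reference; the function names and ARN arguments are placeholders.

```python
from typing import Dict


def forward_action(blue_arn: str, green_arn: str, green_weight: int) -> Dict:
    """Build a weighted forward action; Blue receives the remainder of 100."""
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_arn, "Weight": 100 - green_weight},
                {"TargetGroupArn": green_arn, "Weight": green_weight},
            ]
        },
    }


def rollback_action(blue_arn: str, green_arn: str) -> Dict:
    """Send all traffic back to Blue (Blue 100 / Green 0)."""
    return forward_action(blue_arn, green_arn, 0)
```

An alarm handler could pass the result of `rollback_action` as the listener's default action through whichever SDK or IaC tool manages the load balancer.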

9) Safety and compliance

Zero trust between versions: keep encryption secrets separate and rotate keys.
Policy-as-Code: forbid deploying unsigned images; enforce `no latest`.
Secrets and configs are versioned artifacts; a rollback includes rolling back configs.
Audit: who raised the weight or switched the selector, when, and under which ticket.
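The `no latest` rule is simple to check before a manifest ever reaches the cluster. The function below is an illustrative pre-commit or CI check, not an admission controller. Its name is an assumption, and it only inspects the reference string; it cannot tell whether a version tag has been re-pushed.

```python
def passes_no_latest(image: str) -> bool:
    """True if the image is pinned by digest or by an explicit tag other than "latest"."""
    if "@sha256:" in image:
        return True                       # digest-pinned references are always accepted
    name = image.rsplit("/", 1)[-1]       # drop registry host (and its port) and the path
    if ":" not in name:
        return False                      # no tag means an implicit "latest"
    tag = name.rsplit(":", 1)[1]
    return tag not in ("", "latest")
```

Splitting on the last `/` first keeps a registry port such as `localhost:5000` from being mistaken for a tag.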

10) Cost and capacity

Blue-Green requires double capacity for the release period → plan a window.
Canary can last longer → costs of telemetry/monitoring and of running two versions in parallel.
Optimization: autoscaling with HPA/VPA, short Blue-Green windows, night releases for "heavy" services.

11) Runbooks

1. Pause the promotion.
2. Reduce the Green weight to 0% (canary) / return the selector to Blue (blue-green).
3. Check that errors/latency are back to baseline; drain connections.
4. Open an incident and collect artifacts (logs, traces, metric comparisons).
5. Fix and reproduce on staging, run smoke tests, restart the progression.

12) Anti-patterns

Rebuilding an artifact between stage and prod (violates "build once").
A "blind" canary without SLOs/metrics is a formality, not a defense.
No feature flags: the release is forced to enable new behavior for 100% at once.
Broken health checks/liveness probes → "stuck" pods and false stability.
Database compatibility left to chance: the contract breaks on switchover.
Mutable image tags / `latest` in prod.

13) Implementation checklist (0-45 days)

0-10 days

Choose a strategy for services: B/G, Canary or combined.
Enable image signing, health checks, readiness probes, `no latest`.
Prepare SLO dashboards (latency/error rate/business metrics).

11-25 days

Automate weights (Istio/Argo Rollouts/ALB-weights).
Configure analysis templates, alerts and auto-rollback.
Template manifests (Helm/Kustomize), integrate with GitOps.

26-45 days

Implement the expand-migrate-contract strategy for the database.
Cover critical features with kill-switch flags.
Run a "game day": simulate a rollback and an incident.

14) Maturity metrics

% of releases through Blue-Green/Canary (target > 90%).
Average switchover/rollback time (target < 3 min).
Share of releases with SLO auto-stop (and without incidents).
Service coverage by telemetry (traces/logs/metrics) > 95%.
Share of DB migrations following the expand-migrate-contract scheme > 90%.
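Several of these metrics can be computed from a plain list of release records. The sketch below assumes a hypothetical `Release` record with three fields; where such data would come from (CD tool, audit log) is outside its scope.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Release:
    strategy: str            # "blue-green", "canary", or "direct"
    switch_seconds: float    # time taken by the switchover or rollback
    auto_stop_enabled: bool  # whether an SLO auto-stop guarded the release


def maturity(releases: List[Release]) -> Dict[str, float]:
    """Compute the share of progressive releases, average switch time, and auto-stop share."""
    if not releases:
        raise ValueError("no releases to evaluate")
    total = len(releases)
    progressive = sum(r.strategy in ("blue-green", "canary") for r in releases)
    guarded = sum(r.auto_stop_enabled for r in releases)
    return {
        "progressive_share_pct": 100 * progressive / total,
        "avg_switch_seconds": mean(r.switch_seconds for r in releases),
        "auto_stop_share_pct": 100 * guarded / total,
    }
```

The results can be compared with the targets above, for example `progressive_share_pct > 90` and `avg_switch_seconds < 180`.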

15) Appendix: Policy and Pipeline Templates

OPA (allow images only from the trusted registry)

```rego
package admission.image

deny[msg] {
    input.request.kind.kind == "Deployment"
    some c
    img := input.request.object.spec.template.spec.containers[c].image
    not startswith(img, "ghcr.io/org/")
    msg := sprintf("Image not from trusted registry: %v", [img])
}
```

Helm-values for canary (simplified)

```yaml
canary:
  enabled: true
  steps:
    - weight: 5
      pause: 300
    - weight: 25
      pause: 600
    - weight: 50
      pause: 900
  sloGuards:
    max5xxPct: 0.5
    maxP95IncreasePct: 20
```

GitHub Actions - weight promotion (pseudo)

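```yaml
- name: Promote canary to 25%
  # Both routes are patched so the weights still sum to 100 (stable 75 / canary 25).
  run: |
    kubectl patch virtualservice app \
      --type=json \
      -p='[{"op":"replace","path":"/spec/http/0/route/0/weight","value":75},{"op":"replace","path":"/spec/http/0/route/1/weight","value":25}]'
```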

16) Conclusion

Blue-Green and Canary are not mutually exclusive but complementary strategies. Build them on top of signed artifacts, SLO observability, automatic gates, and GitOps control. Separate delivery from exposure, keep rollback fast and migrations disciplined, and releases become predictable, secure, and fast.
