Blue-Green and Canary Deployments

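1) Challenges and key ideas

Blue-Green and Canary are zero-downtime release strategies that reduce rollout risk:

Blue-Green: keep two parallel versions (Blue is active, Green is new) and switch traffic atomically. Rollback is fast: switch straight back to Blue.
Canary: enable the new version in stages (1% → 5% → 25% → 50% → 100%), monitor SLO metrics, and stop or roll back on degradation.

General principle: separate artifact delivery from traffic activation, and automate observability and rollback.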
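The staged canary progression described above can be sketched as a small control loop. This is a minimal illustration, not a real controller: `run_canary`, `set_weight`, and `slo_healthy` are hypothetical names, and the two callbacks stand in for whatever actually shifts traffic and reads SLO metrics in your environment.

```python
from typing import Callable, Sequence


def run_canary(
    stages: Sequence[int],
    set_weight: Callable[[int], None],
    slo_healthy: Callable[[int], bool],
) -> bool:
    """Walk through canary stages; roll back to 0% on the first SLO breach.

    Returns True if the rollout reached the final stage, False if it rolled back.
    """
    for weight in stages:
        set_weight(weight)            # activate traffic for this stage
        if not slo_healthy(weight):   # observe SLO metrics at this weight
            set_weight(0)             # roll back: all traffic returns to the old version
            return False
    return True


# In-memory stand-ins: the (hypothetical) SLO check starts failing at 25%.
applied = []
ok = run_canary(
    stages=[1, 5, 25, 50, 100],
    set_weight=applied.append,
    slo_healthy=lambda w: w < 25,
)
print(ok, applied)
```

Delivery (the new version already running) is kept separate from activation (the `set_weight` calls), which is why the rollback is a single weight change.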

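2) When to choose which

Blue-Green fits when:

you need an instant switch (strict RTO) and the service is simple and stateless;
there are strict release/freeze windows and clear smoke checks;
keeping double capacity for a long time is too expensive, but keeping it briefly is acceptable.

Canary fits when:

the change is complex and needs staged validation on real traffic;
telemetry is mature (SLOs, business metrics) and a rollout can be stopped automatically;
limiting the blast radius is critical (fintech/iGaming flows).

Combined pattern: roll out Green and switch to it through canary stages (Blue-Green as the framework, canary as the traffic-shifting method).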

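3) Traffic routing architecture

Options for switching/splitting traffic:

1. L4/L7 load balancers (ALB/NLB, Cloud Load Balancer): weighted target groups.

2. API gateway/WAF: routes and weights by version, header, cookie, or region.

3. Service mesh (Istio/Linkerd/Consul): percentage-based distribution, fault injection, timeouts/retries/limits.

4. Ingress/NGINX/Envoy: upstream weights and attribute-based routing.

5. Argo Rollouts/Flagger: operator-controllers with automated progression and integration with Prometheus/New Relic/Datadog.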

4) Kubernetes: practical templates

4.1 Blue-Green (via Service selector)

Two Deployments: `app-blue` and `app-green`.

One Service, `app-svc`, whose selector points at the desired `version`.

yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: app-green, labels: { app: app, version: green } }
spec:
  replicas: 4
  selector: { matchLabels: { app: app, version: green } }
  template:
    metadata: { labels: { app: app, version: green } }
    spec:
      containers:
        - name: app
          image: ghcr.io/org/app:1.8.0
---
apiVersion: v1
kind: Service
metadata: { name: app-svc }
spec:
  selector: { app: app, version: blue }   # ← switch to green by changing this value
  ports: [{ port: 80, targetPort: 8080 }]

Switching is an atomic swap of the selector (or label), combined with controlled connection draining.
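As a rough illustration of the selector swap, the sketch below builds the merge patch that repoints `app-svc` at one color and the matching `kubectl patch` argument list. The helper names `selector_patch` and `kubectl_args` are hypothetical; the sketch only constructs the command and does not run it.

```python
import json
from typing import List


def selector_patch(color: str, app: str = "app") -> str:
    """Build a merge patch that repoints the Service selector at one color."""
    if color not in ("blue", "green"):
        raise ValueError(f"unknown color: {color!r}")
    patch = {"spec": {"selector": {"app": app, "version": color}}}
    return json.dumps(patch, sort_keys=True)


def kubectl_args(color: str, service: str = "app-svc") -> List[str]:
    """Argument list for `kubectl patch`; pass it to subprocess.run to apply it."""
    return ["kubectl", "patch", "service", service,
            "--type=merge", "-p", selector_patch(color)]


print(" ".join(kubectl_args("green")))   # cut over to Green
print(" ".join(kubectl_args("blue")))    # roll back to Blue
```

Because cutover and rollback are the same operation with a different value, the rollback path is exercised every time a release is switched.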

4.2 Canary (Istio VirtualService)

yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata: { name: app }
spec:
  hosts: ["app.example.com"]
  http:
    - route:
        - destination: { host: app.blue.svc.cluster.local, subset: v1 }
          weight: 90
        - destination: { host: app.green.svc.cluster.local, subset: v2 }
          weight: 10

Change the `weight` values step by step; configure retries and timeouts on the VirtualService route, and outlier detection in the DestinationRule.
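When the weights are changed step by step, both destinations have to be updated together, because Istio expects the weights within one route to add up to 100. The following sketch generates the JSON Patch operations for each step; `weight_patch` is a hypothetical helper, and the route order (stable first, canary second) is assumed to match the VirtualService above.

```python
import json
from typing import Dict, List


def weight_patch(canary_percent: int, route_index: int = 0) -> List[Dict]:
    """JSON Patch operations that set the stable and canary weights together."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    base = f"/spec/http/{route_index}/route"
    return [
        {"op": "replace", "path": f"{base}/0/weight", "value": 100 - canary_percent},
        {"op": "replace", "path": f"{base}/1/weight", "value": canary_percent},
    ]


# One patch document per step; each could be passed to `kubectl patch --type=json -p`.
for percent in (10, 25, 50, 100):
    print(json.dumps(weight_patch(percent)))
```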

4.3 Argo Rollouts (automated canary run)

yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata: { name: app }
spec:
  replicas: 6
  # selector and pod template omitted for brevity
  strategy:
    canary:
      canaryService: app-canary
      stableService: app-stable
      steps:
        - setWeight: 5
        - pause: { duration: 300 }   # 5 min observation
        - analysis:
            templates:
              - templateName: slo-guard
        - setWeight: 25
        - pause: { duration: 600 }
        - analysis:
            templates: [{ templateName: slo-guard }]
        - setWeight: 50
        - pause: {}                  # wait for manual promotion
      trafficRouting:
        istio:
          virtualService:
            name: app
            routes: ["http-route"]

The analysis template is tied to metrics (see below).

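5) SLO gates and automatic rollback

Guarded metrics (examples):

Technical: `p95_latency`, `5xx_rate`, `error_budget_burn`, `CPU/Memory throttling`.
Product: `CR (deposit)`, `payment success`, `scoring`, `ARPPU` (over cold windows).

Stop policy (example):

If `5xx_rate` for the new version is > 0.5% over 10 minutes: pause and roll back.
If `p95_latency` rises more than 20% above baseline: roll back.
If canary promotion is in progress but the SLO error budget is burning at > 2%/hour: hold.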
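The stop policy above can be expressed as a single decision function. This is a sketch under stated assumptions: the `CanaryMetrics` fields and the `decide` function are hypothetical names, and the metrics are assumed to have been aggregated already over the windows the policy mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryMetrics:
    error_rate_5xx: float            # fraction of requests over the last 10 minutes
    p95_latency_ms: float            # p95 latency of the new version
    baseline_p95_ms: float           # p95 latency of the stable version
    budget_burn_pct_per_hour: float  # share of the SLO error budget burned per hour


def decide(m: CanaryMetrics) -> str:
    """Return 'rollback', 'hold', or 'promote' following the example policy."""
    if m.error_rate_5xx > 0.005:                      # more than 0.5% 5xx
        return "rollback"
    if m.p95_latency_ms > 1.20 * m.baseline_p95_ms:   # more than 20% above baseline
        return "rollback"
    if m.budget_burn_pct_per_hour > 2.0:              # budget burning too fast
        return "hold"
    return "promote"


print(decide(CanaryMetrics(0.001, 210.0, 200.0, 0.5)))
print(decide(CanaryMetrics(0.010, 210.0, 200.0, 0.5)))
```

Rollback conditions are checked before the hold condition so that a hard failure is never downgraded to a pause.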

Argo AnalysisTemplate (simplified):

yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata: { name: slo-guard }
spec:
  metrics:
    - name: http_5xx_rate
      interval: 1m
      # The Prometheus provider returns a vector, so compare its first sample.
      successCondition: result[0] < 0.005
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{app="app",status=~"5.."}[5m])) /
            sum(rate(http_requests_total{app="app"}[5m]))
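To make the arithmetic behind the query concrete, the sketch below computes the same 5xx ratio from per-status request counts and compares it with the 0.005 threshold. The function names and sample numbers are hypothetical. Treating "no traffic" as a failed check is an assumption of this sketch, not something the template itself specifies.

```python
from typing import Mapping, Optional


def five_xx_ratio(requests_by_status: Mapping[str, float]) -> Optional[float]:
    """Share of 5xx responses among all requests, or None when there was no traffic."""
    total = sum(requests_by_status.values())
    if total == 0:
        return None  # the ratio is undefined without traffic
    errors = sum(
        count for status, count in requests_by_status.items()
        if len(status) == 3 and status.startswith("5")   # same idea as status=~"5.."
    )
    return errors / total


def passes_guard(ratio: Optional[float], threshold: float = 0.005) -> bool:
    """Pass only on a measured ratio strictly below the threshold."""
    return ratio is not None and ratio < threshold


sample = {"200": 990, "404": 6, "500": 3, "503": 1}   # made-up counts
print(five_xx_ratio(sample), passes_guard(five_xx_ratio(sample)))
```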

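6) Data and compatibility (the most common source of pain)

Use the expand → migrate → contract strategy:

Expand: add new nullable columns/indexes; support both schemas.
Migrate: dual write/read, backfill.
Contract: remove old fields/code once 100% of traffic has moved to the new version.
Events/queues: version the payload (v1/v2); support idempotency.
Caches/sessions: version the keys; ensure format compatibility.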
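The expand and migrate phases can be walked through with an in-memory SQLite database. The table and column names (`users`, `full_name`, `display_name`) are invented for illustration, and the contract step is left as a comment because it should only run after all traffic has moved to the new version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
conn.execute("INSERT INTO users (full_name) VALUES ('Ada Lovelace')")

# Expand: add a nullable column so the old code (Blue) keeps working unchanged.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def create_user(name: str) -> None:
    """Migrate, part 1: the new code (Green) writes both columns."""
    conn.execute(
        "INSERT INTO users (full_name, display_name) VALUES (?, ?)", (name, name)
    )


create_user("Grace Hopper")

# Migrate, part 2: backfill rows written before the expand step.
conn.execute("UPDATE users SET display_name = full_name WHERE display_name IS NULL")

# Read path during migration: prefer the new column, fall back to the old one.
rows = conn.execute(
    "SELECT COALESCE(display_name, full_name) FROM users ORDER BY id"
).fetchall()
missing = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
).fetchone()[0]
print(rows, missing)

# Contract (not run here): drop full_name only after 100% of traffic is on Green.
```

At every point in this sequence both versions can read and write the table, which is what keeps the rollback to Blue safe.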

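7) Integration with CI/CD and GitOps

CI: build a single artifact (build once), sign the image, generate an SBOM, run tests.
CD: promote the artifact through environments; blue-green/canary is controlled by manifests.
GitOps: merging an MR → the controller (Argo CD/Flux) applies the weights/selectors.
Environments/Approvals: for prod steps, a manual gate plus an audit trail of decisions.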

8) NGINX/Envoy and cloud LBs: quick examples

8.1 NGINX (upstream weights)

nginx
upstream app_upstream {
    server app-blue:8080 weight=90;
    server app-green:8080 weight=10;
}
server {
    location / { proxy_pass http://app_upstream; }
}

8.2 AWS ALB (Weighted Target Groups)

TG-Blue: 90, TG-Green: 10 → change the weights via IaC/CLI.
Wire CloudWatch alarms into an automatic rollback script (it sets the weights to 0/100).
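A rollback script of this kind mostly has to produce the right forward action. The sketch below builds the weighted `forward` action in the shape the ELBv2 listener API expects, which could then be passed to `aws elbv2 modify-listener --default-actions`. The ARNs are placeholders and `forward_action` is a hypothetical helper; the sketch does not call AWS.

```python
import json
from typing import Dict, List


def forward_action(blue_arn: str, green_arn: str, green_weight: int) -> List[Dict]:
    """Weighted forward action splitting traffic between two target groups."""
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")
    return [{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_arn, "Weight": 100 - green_weight},
                {"TargetGroupArn": green_arn, "Weight": green_weight},
            ]
        },
    }]


BLUE = "arn:placeholder:tg-blue"    # placeholder, not a real ARN
GREEN = "arn:placeholder:tg-green"  # placeholder, not a real ARN

print(json.dumps(forward_action(BLUE, GREEN, 10)))   # canary step: 90/10
print(json.dumps(forward_action(BLUE, GREEN, 0)))    # rollback: all traffic to Blue
```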

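9) Security and compliance

Zero trust between versions: separate secrets and rotate encryption keys.
Policy-as-Code: block deployment of unsigned images; enforce `no latest`.
Secrets and configs are versioned artifacts; a rollback includes rolling back configs.
Audit: who changed weights or switched the selector, when, and under which ticket.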
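The `no latest` rule can be approximated with a small check on the image reference. This is a simplified sketch, not a full parser for image references: `is_pinned` is a hypothetical helper that accepts a digest or an explicit tag other than `latest`, and treats a missing tag as `latest` because that is what container runtimes default to.

```python
def is_pinned(image: str) -> bool:
    """True if the reference uses a digest or an explicit tag other than 'latest'."""
    if "@sha256:" in image:
        return True                            # digest references are immutable
    last_segment = image.rsplit("/", 1)[-1]    # skip a registry port like localhost:5000
    if ":" not in last_segment:
        return False                           # no tag: the runtime would pull 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


for ref in ("ghcr.io/org/app:1.8.0", "ghcr.io/org/app:latest", "localhost:5000/app"):
    print(ref, is_pinned(ref))
```

An explicit tag can still be overwritten in the registry, so a digest is the stricter form of pinning.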

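10) Cost and capacity

Blue-Green requires double capacity during the release → plan a window.
Canary can stretch over a longer period → cost of telemetry/monitoring and of running two versions in parallel.
Optimization: autoscaling via HPA/VPA, short Blue-Green windows, night releases for "heavy" services.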

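11) Rollback (runbook)

1. Freeze promotion (pause).
2. Reduce Green's weight to 0% (canary) or return the selector to Blue (blue-green).
3. Check that errors and latency have returned to baseline; drain connections.
4. Open an incident and collect artifacts (logs, traces, metric comparisons).
5. Fix and reproduce on staging, run smoke tests, then restart the rollout.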
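Steps 2 and 3 of the runbook can be sketched as two small helpers: one picks the rollback action for the strategy in use, the other checks that metrics are back near baseline. The names, the metric keys, and the 10% tolerance are assumptions made for this illustration.

```python
from typing import Dict


def rollback_action(strategy: str) -> Dict[str, object]:
    """Step 2: pick the traffic change that returns users to Blue."""
    if strategy == "canary":
        return {"green_weight": 0}
    if strategy == "blue-green":
        return {"selector_version": "blue"}
    raise ValueError(f"unknown strategy: {strategy!r}")


def back_to_baseline(
    current: Dict[str, float], baseline: Dict[str, float], tolerance: float = 0.10
) -> bool:
    """Step 3: every metric is within `tolerance` above its baseline value."""
    return all(current[name] <= baseline[name] * (1 + tolerance) for name in baseline)


baseline = {"5xx_rate": 0.002, "p95_latency_ms": 200.0}   # made-up numbers
print(rollback_action("canary"))
print(back_to_baseline({"5xx_rate": 0.002, "p95_latency_ms": 210.0}, baseline))
```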

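12) Anti-patterns

Rebuilding the artifact between stage and prod (violates "build once").
A "deaf" canary without SLOs/metrics is a formality, not a safeguard.
No feature flags: the release is forced to enable new behavior for 100% of users at once.
Poor health checks/liveness probes → "stuck" pods and false stability.
Database compatibility left to chance: the contract breaks at switchover.
Mutable image tags/`latest`.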

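13) Implementation checklist (0-45 days)

Days 0-10

Choose a strategy per service: B/G, Canary, or combined.
Enable image signing, health checks, readiness probes, and `no latest`.
Prepare SLO dashboards (latency/error rate/business metrics).

Days 11-25

Automate weight changes (Istio/Argo Rollouts/ALB weights).
Configure analysis templates, alerts, and automatic rollback.
Template the manifests (Helm/Kustomize) and integrate with GitOps.

Days 26-45

Implement the expand-migrate-contract strategy for databases.
Cover critical flows with kill-switch flags.
Hold a "game day": simulate rollbacks and incidents.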

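14) Maturity metrics

Share of releases done via Blue-Green/Canary (target > 90%).
Mean switch/rollback time (target < 3 minutes).
Share of releases stopped automatically on SLO (and without incidents).
Telemetry coverage of services (traces/logs/metrics) > 95%.
Share of DB migrations following the expand-migrate-contract pattern > 90%.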
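Two of these metrics are simple to compute from a release log. The sketch below assumes a hypothetical `Release` record with a strategy label and an optional rollback duration; the sample history is invented.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass(frozen=True)
class Release:
    strategy: str                       # "blue-green", "canary", or anything else
    rollback_seconds: Optional[float]   # None when the release was not rolled back


def progressive_share_pct(releases: List[Release]) -> float:
    """Percentage of releases that went through Blue-Green or Canary."""
    if not releases:
        raise ValueError("no releases to measure")
    progressive = [r for r in releases if r.strategy in ("blue-green", "canary")]
    return 100.0 * len(progressive) / len(releases)


def mean_rollback_seconds(releases: List[Release]) -> Optional[float]:
    """Mean rollback time, or None when nothing was rolled back."""
    times = [r.rollback_seconds for r in releases if r.rollback_seconds is not None]
    return mean(times) if times else None


history = [
    Release("canary", None),
    Release("blue-green", 120.0),
    Release("other", None),
    Release("canary", 200.0),
]
print(progressive_share_pct(history))    # compare against the > 90% target
print(mean_rollback_seconds(history))    # compare against the < 180 s target
```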

15) Appendix: policy templates and pipelines

OPA (block untrusted images; this simplified rule checks the registry, not the signature)

rego
package admission.image

deny[msg] {
  input.request.kind.kind == "Deployment"
  some c
  img := input.request.object.spec.template.spec.containers[c].image
  not startswith(img, "ghcr.io/org/")
  msg := sprintf("Image not from trusted registry: %v", [img])
}

Helm values for canary (simplified)

yaml
canary:
  enabled: true
  steps:
    - weight: 5
      pause: 300
    - weight: 25
      pause: 600
    - weight: 50
      pause: 900
  sloGuards:
    max5xxPct: 0.5
    maxP95IncreasePct: 20

GitHub Actions: weight promotion (pseudo)

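yaml
- name: Promote canary to 25%
  # Both weights are patched together so they still add up to 100.
  run: |
    kubectl patch virtualservice app \
      --type=json \
      -p='[{"op":"replace","path":"/spec/http/0/route/0/weight","value":75},{"op":"replace","path":"/spec/http/0/route/1/weight","value":25}]'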

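16) Conclusion

Blue-green and canary are not mutually exclusive; they are complementary strategies. Build them on signed artifacts, SLO observability, automated gates, and GitOps control. Separate delivery from activation, keep rollback fast and migrations disciplined, and releases become predictable, safe, and fast.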
