Canary Upgrades in K8s in Practice: A Zero-Downtime Deployment Guide
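In iterative development and continuous delivery, deploying to production is a skill every engineer needs. The mainstream deployment strategies today fall into three categories: canary deployment (also called gray release), blue-green deployment, and rolling upgrade. This article takes a close look at how canary deployment is implemented in Kubernetes.
The core idea of canary deployment: during an upgrade, only a small share of user traffic is routed to the new version, while most requests are still handled by the old version. If the new version runs stably without errors, its traffic share is increased step by step until all traffic has switched over. This strategy preserves overall system availability while surfacing latent defects early, keeping the blast radius of any failure as small as possible. The figure below illustrates this progressive switch-over.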
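In a replica-based canary like the demo that follows, the Service spreads requests over every matching Pod. The canary's expected share of traffic is therefore roughly its share of the Pods. Below is a minimal sketch of that arithmetic. It assumes each Ready endpoint is equally likely to be picked, and canary_share_pct is a hypothetical helper name:

```shell
# Expected percentage of requests reaching the canary, assuming the Service
# picks each Ready Pod with equal probability.
# Usage: canary_share_pct <stable_replicas> <canary_replicas>
canary_share_pct() {
  awk -v s="$1" -v c="$2" 'BEGIN { printf "%.2f\n", 100 * c / (s + c) }'
}

canary_share_pct 10 1   # 9.09  (10 x v1, 1 x v2)
canary_share_pct 5 5    # 50.00 (5 x v1, 5 x v2)
```

The same formula shows why the first step of a replica-based canary is coarse. With 10 stable Pods, a single canary Pod already receives about 9% of traffic.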
Concepts only stick with hands-on practice. The complete Kubernetes example below walks through a canary upgrade step by step.
Hands-on demo: implementing a canary upgrade in Kubernetes
(1) Create the file canary-demo-v1.yaml with the following content:
apiVersion: v1
kind: Service
metadata:
  name: canary-demo
  labels:
    app: canary-demo
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app: canary-demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: canary-demo-v1
  labels:
    app: canary-demo
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-demo
      version: v1.0.0
  template:
    metadata:
      labels:
        app: canary-demo
        version: v1.0.0
    spec:
      containers:
      - name: canary-demo
        image: collenzhao/k8s-deployment-strategies
        ports:
        - name: http
          containerPort: 8080
        env:
        - name: VERSION
          value: v1.0.0
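# The Service selects Pods only by the app label, so it load-balances requests across every Pod labeled app: canary-demo, regardless of version.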
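The Service selector lists only app: canary-demo, while each Deployment's selector also pins version. The sketch below models equality-based label matching to show why Pods of any version that carry the app label end up behind the same Service. The helper selector_matches and its comma-separated input format are illustrative assumptions, not a kubectl feature:

```shell
# Equality-based label selection, sketched: a Pod matches when every
# key=value pair in the selector also appears among the Pod's labels.
# Both arguments are comma-separated key=value lists (an illustrative format).
selector_matches() {
  local selector="$1" labels=",$2," pair old_ifs="$IFS"
  IFS=','
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                    # required pair is present
      *) IFS="$old_ifs"; return 1 ;;     # required pair is missing
    esac
  done
  IFS="$old_ifs"
  return 0
}

# The Service selector omits version, so it also matches the future v2 Pod:
selector_matches "app=canary-demo" "app=canary-demo,version=v2.0.0" \
  && echo "v2 Pod is behind the Service"
# The v1 Deployment's selector pins version, so it does not own the v2 Pod:
selector_matches "app=canary-demo,version=v1.0.0" "app=canary-demo,version=v2.0.0" \
  || echo "v2 Pod is not managed by canary-demo-v1"
```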
(2) Apply the file:
kubectl apply -f canary-demo-v1.yaml
(3) View the Service details:
kubectl get service canary-demo
# Output:
NAME          TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
canary-demo   NodePort   10.1.119.250   <none>        80:30952/TCP   4s
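To reach the Service from outside the cluster, use a node's IP together with the NodePort from the PORT(S) column. Below is a small sketch that pulls the NodePort out of that column. It assumes the single-port "port:nodePort/protocol" form shown above; node_port_from_column and NODE_IP are hypothetical names:

```shell
# Extract the node port from the PORT(S) column of "kubectl get service",
# e.g. "80:30952/TCP" -> "30952". Assumes the single-port form shown above.
node_port_from_column() {
  local col="${1%%/*}"   # drop the protocol suffix -> "80:30952"
  echo "${col#*:}"       # keep what follows the colon -> "30952"
}

node_port_from_column "80:30952/TCP"   # prints 30952

# With cluster access, read the value directly instead of parsing text:
#   kubectl get service canary-demo -o jsonpath='{.spec.ports[0].nodePort}'
# then, from outside the cluster (NODE_IP is a placeholder for any node's IP):
#   curl "$NODE_IP:30952"
```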
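(4) Access the Deployment through the Service to check the version. The ClusterIP is normally reachable only from inside the cluster, for example from a node: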
curl 10.1.119.250:80
# Output:
Host: canary-demo-v1-78b6cd78db-skjng, Version: v1.0.0
# The current version is v1.0.0.
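The version check can also be scripted instead of read by eye. The sketch below extracts the Version field from a response and compares it with an expected value. It assumes the "Host: ..., Version: ..." response format of this demo image; response_version and expect_version are hypothetical helpers:

```shell
# Pull the version out of one response line of the demo app, e.g.
# "Host: canary-demo-v1-78b6cd78db-skjng, Version: v1.0.0" -> "v1.0.0".
response_version() {
  printf '%s\n' "$1" | sed -n 's/.*Version: \([^ ,]*\).*/\1/p'
}

# Return non-zero (and explain on stderr) unless the response reports
# the expected version.
expect_version() {
  local got
  got=$(response_version "$1")
  if [ "$got" != "$2" ]; then
    echo "expected $2, got ${got:-nothing}" >&2
    return 1
  fi
}

expect_version "Host: canary-demo-v1-78b6cd78db-skjng, Version: v1.0.0" v1.0.0 \
  && echo "version OK"
# Live usage (needs cluster access):
#   expect_version "$(curl -s 10.1.119.250:80)" v1.0.0
```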
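(5) Prepare the canary release by creating canary-demo-v2.yaml. It differs from the v1 Deployment only in its name, its version label and VERSION value, and its replica count of 1: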
apiVersion: apps/v1
kind: Deployment
metadata:
  name: canary-demo-v2
  labels:
    app: canary-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: canary-demo
      version: v2.0.0
  template:
    metadata:
      labels:
        app: canary-demo
        version: v2.0.0
    spec:
      containers:
      - name: canary-demo
        image: collenzhao/k8s-deployment-strategies
        ports:
        - name: http
          containerPort: 8080
        env:
        - name: VERSION
          value: v2.0.0
(6) Open two terminal windows to watch the Deployments and the Pods, respectively:
kubectl get --watch deployment
kubectl get --watch pod
(7) Perform the upgrade:
kubectl apply -f canary-demo-v2.yaml
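(8) Observe how the Deployments and Pods change, as shown in the figure below:
Note that v1.0.0 runs 10 Pod replicas while v2.0.0 runs only 1. This is the "small slice" at the heart of a canary. Because the Service selects Pods by the app label alone, on average about 1 in 11 requests (roughly 9%) reaches v2.0.0.
(9) Write a loop that sends requests, and observe how traffic is distributed: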
for a in {1..11}
do
  sleep 1
  curl "10.1.119.250:80"
done
# Output:
Host: canary-demo-v1-78b6cd78db-nbbjx, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-nbbjx, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-67cg5, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-gd9kf, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-7zjwf, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-gd9kf, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-dskpc, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-gd9kf, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-67cg5, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-fdrwp, Version: v1.0.0
Host: canary-demo-v2-7c4c5f5444-g69jr, Version: v2.0.0
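# Across 11 requests, 10 hit v1.0.0 and 1 hit v2.0.0, matching the 10:1 replica ratio. A match this exact is not guaranteed on every run. In its default iptables mode, kube-proxy picks an endpoint at random for each new connection, so the ratio holds only on average.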
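Counting 11 lines by eye works for a demo. Because endpoint selection is random, though, a larger sample gives a more reliable picture. The sketch below tallies responses per version from the demo app's output. tally_versions is a hypothetical helper, and the commented live usage assumes the same ClusterIP as above:

```shell
# Count responses per version, reading the demo app's
# "Host: ..., Version: vX.Y.Z" lines from stdin; blank lines are ignored.
tally_versions() {
  awk -F'Version: ' 'NF > 1 { n[$2]++ } END { for (v in n) print v, n[v] }' | sort
}

# Live usage (needs cluster access; echo guarantees one response per line):
#   for a in {1..100}; do curl -s "10.1.119.250:80"; echo; done | tally_versions
printf '%s\n' \
  "Host: canary-demo-v1-78b6cd78db-nbbjx, Version: v1.0.0" \
  "Host: canary-demo-v2-7c4c5f5444-g69jr, Version: v2.0.0" \
  "Host: canary-demo-v1-78b6cd78db-67cg5, Version: v1.0.0" | tally_versions
# v1.0.0 2
# v2.0.0 1
```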
(10) Once the new version is confirmed to be working, shift the ratio step by step. Scale v2.0.0 up to 5 replicas and v1.0.0 down to 5:
kubectl scale --replicas=5 deploy canary-demo-v2
kubectl scale --replicas=5 deploy canary-demo-v1
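Step (10) picks a 5/5 split by hand. For other target percentages, the sketch below splits a fixed pool of replicas between the two Deployments. split_replicas is a hypothetical helper. The rounding rule (nearest whole Pod, at least one canary) is an assumption:

```shell
# Split a fixed pool of replicas between canary and stable for a target
# canary percentage; prints "<canary> <stable>".
# Rounds to the nearest whole Pod and keeps at least one canary when pct > 0.
split_replicas() {
  local total="$1" pct="$2" canary
  canary=$(( (total * pct + 50) / 100 ))
  if [ "$pct" -gt 0 ] && [ "$canary" -eq 0 ]; then
    canary=1
  fi
  echo "$canary $(( total - canary ))"
}

split_replicas 10 50   # 5 5  (the split used in step 10)
split_replicas 10 25   # 3 7  (2.5 Pods rounds up to 3)

# Applying a split (needs cluster access); scale the canary up first:
#   set -- $(split_replicas 10 50)
#   kubectl scale --replicas="$1" deploy canary-demo-v2
#   kubectl scale --replicas="$2" deploy canary-demo-v1
```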
(11) Watch the Deployments change:
kubectl get --watch deployment

(12) Run the script again to check whether traffic is now split 1:1:
for a in {1..10}
do
  sleep 1
  curl "10.1.119.250:80"
done
# Output:
Host: canary-demo-v1-78b6cd78db-67cg5, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-9dphd, Version: v1.0.0
Host: canary-demo-v2-7c4c5f5444-lcbhw, Version: v2.0.0
Host: canary-demo-v1-78b6cd78db-hr9x8, Version: v1.0.0
Host: canary-demo-v1-78b6cd78db-7zjwf, Version: v1.0.0
Host: canary-demo-v2-7c4c5f5444-lcbhw, Version: v2.0.0
Host: canary-demo-v1-78b6cd78db-fdrwp, Version: v1.0.0
Host: canary-demo-v2-7c4c5f5444-9hbwr, Version: v2.0.0
Host: canary-demo-v2-7c4c5f5444-9hbwr, Version: v2.0.0
Host: canary-demo-v1-78b6cd78db-hr9x8, Version: v1.0.0
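# Of these 10 requests, 6 hit v1 and 4 hit v2. That is close to 1:1 but not exact, because endpoint selection is random rather than strict round-robin. Over more requests the split converges toward 1:1.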
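(13) Once the new version is confirmed stable, scale v2.0.0 up to 10 replicas and then delete the old Deployment. Scaling up first keeps serving capacity from dropping to 5 Pods in between:
kubectl scale --replicas=10 deploy canary-demo-v2
kubectl delete deployment.apps/canary-demo-v1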
# The application is now fully upgraded to v2.0.0. Running the for loop again, every request returns:
Host: canary-demo-v2-7c4c5f5444-g69jr, Version: v2.0.0
Host: canary-demo-v2-7c4c5f5444-lcbhw, Version: v2.0.0
Host: canary-demo-v2-7c4c5f5444-hs4k2, Version: v2.0.0
... all v2.0.0
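Steps (10) to (13) can be chained into a single script. The sketch below makes several assumptions: a fixed pool of 10 Pods, the Deployment names from this demo, and kubectl rollout status as the way to wait for the canary Pods before the stable side shrinks. The health check between stages is left as a comment, and promote and the KUBECTL override are hypothetical names:

```shell
# Staged promotion sketch: walk the canary through the given percentages,
# then to 100%, and finally delete the old Deployment.
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL=${KUBECTL:-kubectl}
TOTAL=10   # fixed Pod pool, as in this demo

promote() {
  local pct canary
  for pct in "$@" 100; do
    canary=$(( (TOTAL * pct + 50) / 100 ))
    # Grow the canary first and wait until its Pods are available...
    $KUBECTL scale --replicas="$canary" deploy canary-demo-v2 || return 1
    $KUBECTL rollout status deploy/canary-demo-v2 --timeout=120s || return 1
    # ...then shrink the stable side so the total stays at TOTAL.
    $KUBECTL scale --replicas=$(( TOTAL - canary )) deploy canary-demo-v1 || return 1
    # A real pipeline would check error rates and latency here before continuing.
  done
  $KUBECTL delete deployment canary-demo-v1
}

# Dry run: KUBECTL=echo promote 10 50
```

The final stage is always 100%, so the old Deployment is deleted only after it has already been scaled to zero.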
(14) Finally, clean up the test resources:
kubectl delete all -l app=canary-demo
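Looking back over the whole process, the core idea is simple. By adjusting the replica ratio between the old and new versions, traffic moves progressively from the old version to the new one. Every step can be observed immediately and rolled back at any time, which is what makes this a safe upgrade approach that holds up in production.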
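Rolling back uses the same mechanism in reverse, provided the v1.0.0 Deployment has not yet been deleted (that is, before step 13). The sketch below restores the stable side and then removes the canary. rollback_canary and the KUBECTL override are hypothetical names:

```shell
# Rollback sketch for the mid-canary state, while both Deployments exist:
# restore the stable version first, then take the canary out of rotation.
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL=${KUBECTL:-kubectl}

rollback_canary() {
  local stable_replicas="${1:-10}"
  $KUBECTL scale --replicas="$stable_replicas" deploy canary-demo-v1 || return 1
  $KUBECTL rollout status deploy/canary-demo-v1 --timeout=120s || return 1
  # Scaling to zero removes the canary Pods from the Service's endpoints.
  $KUBECTL scale --replicas=0 deploy canary-demo-v2
}

# Dry run: KUBECTL=echo rollback_canary 10
```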