Argo Rollouts: Canary deployments

Kubernetes Argo Rollouts Canary

7 min read | by Jordi Prats

A canary deployment is a technique to reduce the risk of introducing a new version of a software application in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure. If any issue is detected on the "canary", the deployment can be stopped, and the rest of the users won't be affected. With Argo Rollouts, we can easily implement this strategy.

Installing Argo Rollouts

First we'll have to make sure we have Argo Rollouts and its CLI (a kubectl plugin, installed here with Homebrew) in place:

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
brew install argoproj/tap/kubectl-argo-rollouts

To start using the canary deployment strategy with Argo Rollouts, we need to convert the Deployment manifest into a Rollout resource and set the strategy to canary:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-rollout
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-rollout
  template:
    metadata:
      labels:
        app: canary-rollout
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
  strategy:
    canary:
      maxSurge: '25%'
      maxUnavailable: 0

In the previous example we are using the canary strategy without defining any steps, so it will behave like a regular rolling update.
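
For comparison, this is a sketch of the strategy block of a plain Kubernetes Deployment that rolls out in roughly the same way. It is only meant to illustrate the equivalence and is not part of the Rollout manifest:

```yaml
# Fragment of a regular Deployment (apps/v1), shown only for comparison.
# With no steps defined, the Rollout above uses maxSurge and maxUnavailable
# in much the same way as this rolling update does.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # extra Pods allowed above the desired replica count
      maxUnavailable: 0    # never drop below the desired replica count
```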

Simple canary deployment

To make it a canary deployment, we need to define the steps the rollout will follow. In a simple canary deployment we can use setWeight and pause to control how it progresses:

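  • setWeight: The percentage of traffic that should go to the new version. Without a traffic router, it is approximated by the share of replicas running the new version.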
  • pause: The time to wait between steps. We can set a specific duration or wait for a manual resume.
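
As a quick reference, the pause step can be written in a few ways. The fragment below is a sketch of the accepted duration formats, not a sequence we would use as-is in a real rollout:

```yaml
# Illustrative list of pause variants; in practice each pause
# normally follows a setWeight step.
steps:
- pause: {duration: 10}     # a plain number is interpreted as seconds
- pause: {duration: 30s}    # 30 seconds
- pause: {duration: 10m}    # 10 minutes
- pause: {duration: 1h}     # 1 hour
- pause: {}                 # no duration: wait until the rollout is resumed manually
```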

During the rollout, how the previous (stable) version is scaled depends on whether traffic routing is configured. With traffic routing, the controller by default keeps the previous version running at its full replica count while the new version is scaled up, and only scales it down to zero once the rollout is successful. This makes sure we can switch back to the previous version in case of any issue without having to wait for its replicas to be scaled up again. If we don't want to use this approach, we can set dynamicStableScale to true so that the previous version is scaled down as traffic shifts to the new one. Without traffic routing, as in the examples in this post, the controller scales the previous version down as it scales the new one up, so that the ratio of Pods approximates the requested weight.
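
A minimal sketch of where dynamicStableScale goes. Since the setting applies when traffic routing is in use, the fragment assumes an NGINX Ingress based setup. The Service names are placeholders chosen for this sketch, and the Ingress name is hypothetical:

```yaml
# Sketch only: none of the names below are required values.
spec:
  strategy:
    canary:
      dynamicStableScale: true          # scale the stable ReplicaSet down as traffic moves to the canary
      canaryService: rollout-canary     # assumed Service name
      stableService: rollout-stable     # assumed Service name
      trafficRouting:
        nginx:
          stableIngress: canary-rollout-ingress   # hypothetical Ingress name
      steps:
      - setWeight: 20
      - pause: {}
```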

Let's see an example:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-rollout
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-rollout
  template:
    metadata:
      labels:
        app: canary-rollout
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        env:
        - name: APP_VERSION
          value: "v2"
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {}
      - setWeight: 50
      - pause:
          duration: 10m
      - setWeight: 90
      - pause: {}

The rollout will go through the following steps:

  • First, it scales the new version up to 20% of the total replicas and waits for a manual resume.
  • Then it scales up to 50% of the total replicas and waits for 10 minutes.
  • Finally, it scales up to 90% of the total replicas and waits for a manual resume again. After that, the new version is fully rolled out.
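
To make this more concrete, here is the same steps block annotated with the approximate Pod split for the 10 replicas in this example, assuming no traffic router is configured. The counts for the 20% and 50% steps match the command output shown further down; the 90% row is simple arithmetic rather than an observed result:

```yaml
# Fragment of the Rollout above (replicas: 10), with comments added.
strategy:
  canary:
    steps:
    - setWeight: 20     # about 2 Pods on the new version, 8 on the previous one
    - pause: {}         # wait for a manual promote
    - setWeight: 50     # about 5 Pods on each version
    - pause:
        duration: 10m   # continue automatically after 10 minutes
    - setWeight: 90     # about 9 Pods on the new version, 1 on the previous one
    - pause: {}         # wait for a manual promote before completing the rollout
```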

We can use the kubectl argo rollouts get rollout command to check the status of the rollout:

$ kubectl argo rollouts get rollout canary-rollout
Name:            canary-rollout
Namespace:       demo-rollout
Status:           Paused
Message:         CanaryPauseStep
Strategy:        Canary
  Step:          1/6
  SetWeight:     20
  ActualWeight:  20
Images:          nginx:latest (canary, stable)
Replicas:
  Desired:       10
  Current:       10
  Updated:       2
  Ready:         10
  Available:     10

NAME                                        KIND        STATUS     AGE  INFO
 canary-rollout                            Rollout      Paused   10m
├──# revision:2
│  └──⧉ canary-rollout-7f9c9956ff           ReplicaSet   Healthy  20s  canary
│     ├──□ canary-rollout-7f9c9956ff-6bvt9  Pod          Running  10s  ready:1/1
│     └──□ canary-rollout-7f9c9956ff-dsg2l  Pod          Running  10s  ready:1/1
└──# revision:1
   └──⧉ canary-rollout-8fc79696d            ReplicaSet   Healthy  10m  stable
      ├──□ canary-rollout-8fc79696d-57gz5   Pod          Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-6ghm9   Pod          Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-8fmjd   Pod          Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-9dq6r   Pod          Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-jqzsp   Pod          Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-pz46t   Pod          Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-t8k7j   Pod          Running  10m  ready:1/1
      └──□ canary-rollout-8fc79696d-vsmmx   Pod          Running  10m  ready:1/1

Since we don't have any specific duration for this step, we'll need to resume the rollout manually using the promote command:

$ kubectl argo rollouts promote canary-rollout
rollout 'canary-rollout' promoted
$ kubectl argo rollouts get rollout canary-rollout
Name:            canary-rollout
Namespace:       demo-rollout
Status:           Paused
Message:         CanaryPauseStep
Strategy:        Canary
  Step:          3/6
  SetWeight:     50
  ActualWeight:  50
Images:          nginx:latest (canary, stable)
Replicas:
  Desired:       10
  Current:       10
  Updated:       5
  Ready:         10
  Available:     10

NAME                                        KIND        STATUS     AGE  INFO
 canary-rollout                            Rollout      Paused   24m
├──# revision:2
│  └──⧉ canary-rollout-7f9c9956ff           ReplicaSet   Healthy  14m  canary
│     ├──□ canary-rollout-7f9c9956ff-6bvt9  Pod          Running  13m  ready:1/1
│     ├──□ canary-rollout-7f9c9956ff-dsg2l  Pod          Running  13m  ready:1/1
│     ├──□ canary-rollout-7f9c9956ff-4jqrf  Pod          Running  4s   ready:1/1
│     ├──□ canary-rollout-7f9c9956ff-6d5m4  Pod          Running  4s   ready:1/1
│     └──□ canary-rollout-7f9c9956ff-hjrnj  Pod          Running  4s   ready:1/1
└──# revision:1
   └──⧉ canary-rollout-8fc79696d            ReplicaSet   Healthy  24m  stable
      ├──□ canary-rollout-8fc79696d-57gz5   Pod          Running  24m  ready:1/1
      ├──□ canary-rollout-8fc79696d-6ghm9   Pod          Running  24m  ready:1/1
      ├──□ canary-rollout-8fc79696d-8fmjd   Pod          Running  24m  ready:1/1
      ├──□ canary-rollout-8fc79696d-t8k7j   Pod          Running  24m  ready:1/1
      └──□ canary-rollout-8fc79696d-vsmmx   Pod          Running  24m  ready:1/1

If a duration is set, the rollout resumes automatically once that time has passed; we can also make it continue earlier by using the promote command.

Canary and stable services

If we define the canaryService and stableService, the controller will update the selector of each Service so that it only matches the Pods of the canary or the stable version, respectively.

First we'll need to create these services with a generic selector for the Rollout to use:

apiVersion: v1
kind: Service
metadata:
  name: rollout-canary
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: canary-rollout
---
apiVersion: v1
kind: Service
metadata:
  name: rollout-stable
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: canary-rollout

Having the services created, we can now create a new Rollout using these services:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-rollout
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-rollout
  template:
    metadata:
      labels:
        app: canary-rollout
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - name: http  # named so that targetPort: http in the Services resolves
          containerPort: 80
        env:
        - name: APP_VERSION
          value: "v3"
  strategy:
    canary:
      canaryService: rollout-canary
      stableService: rollout-stable
      steps:
      - setWeight: 20
      - pause: {}

Once the rollout has progressed, we can see the services being updated with the pod template hash so that they point to the right set of Pods:

$ kubectl get svc rollout-canary -o yaml
apiVersion: v1
kind: Service
metadata:
(...)
  name: rollout-canary
spec:
  clusterIP: 10.96.51.65
  clusterIPs:
  - 10.96.51.65
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: canary-rollout
    rollouts-pod-template-hash: 7df7c59c9b
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Other features

While this post covered canary deployments, there are several additional features worth exploring:

  • Advanced Traffic Routing: Leveraging ingress controllers and service meshes to dynamically shift traffic based on custom rules and real-time metrics.
  • Progressive Experimentation: Using analysis templates and metrics to automatically validate new versions before promoting them.
  • Experiments: Running A/B tests and other experiments to compare different versions of an application.
  • Automated Analysis: Integrating metrics-based analysis (Prometheus, CloudWatch, ...) to make rollout decisions based on real-time performance data.

You can also check out blue-green deployments with Argo Rollouts.


Posted on 18/03/2025