Using AWS Karpenter with spot instances

3 min read | by Jordi Prats

One of the advantages of using AWS Karpenter is that it makes using spot instances straightforward. But how do we handle the termination notices coming from AWS?

AWS Karpenter is not supposed to handle the termination notices: if we want to drain the node so it can gracefully relocate its resources before the instance is terminated, we will have to install the AWS Node Termination Handler.
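Under the hood, the two-minute interruption notice is exposed through the EC2 instance metadata service, which is what the termination handler polls when running in IMDS mode. We can query it manually from within the instance; while no interruption is scheduled it returns a 404, and once one is scheduled we'll get something like this (the timestamp below is just an example):

$ curl -s http://169.254.169.254/latest/meta-data/spot/instance-action
{"action": "terminate", "time": "2022-01-21T10:00:00Z"}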

Assuming we have configured Karpenter to be able to use spot instances by setting the karpenter.sh/capacity-type key as follows:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: pet2cattle-workers
spec:
  ttlSecondsUntilExpired: 2592000

  ttlSecondsAfterEmpty: 30

  labels:
    nodelabel: example

  requirements:
    - key: "node.kubernetes.io/instance-type" 
      operator: In
      values: ["m5a.large", "m5a.xlarge", "m5a.2xlarge"]
    - key: "topology.kubernetes.io/zone" 
      operator: In
      values: ["es-west-1a", "eu-west-1b", "eu-west-1c"]
    - key: "kubernetes.io/arch" 
      operator: In
      values: ["arm64", "amd64"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]

  provider:
    instanceProfile: 'eks_pet2cattle_worker-instance-profile'
    securityGroupSelector:
      Name: 'eks_pet2cattle-worker'
    tags:
      exampleTag: TagValue

  limits:
    resources:
      cpu: 1000
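With this Provisioner in place Karpenter can launch both spot and on-demand capacity. A workload that should only run on spot nodes can say so explicitly with a nodeSelector on the same label; here is a minimal sketch (the Deployment name and image are just illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-workers
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-workers
  template:
    metadata:
      labels:
        app: batch-workers
    spec:
      # restrict this workload to spot capacity managed by Karpenter
      nodeSelector:
        karpenter.sh/capacity-type: spot
      containers:
        - name: worker
          image: busybox
          command: ["sleep", "3600"]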

We can take advantage of the fact that AWS Karpenter, by default, adds the karpenter.sh/capacity-type label to the nodes, specifying whether it is a spot instance or an on-demand instance:

$ kubectl describe node ip-10-12-16-11.eu-west-1.compute.internal
Name:               ip-10-12-16-11.eu-west-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5a.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eu-west-1
                    failure-domain.beta.kubernetes.io/zone=eu-west-1a
                    karpenter.sh/capacity-type=spot
                    karpenter.sh/provisioner-name=pet2cattle-workers
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-12-16-11.eu-west-1.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m5a.xlarge
                    topology.ebs.csi.aws.com/zone=eu-west-1a
                    topology.kubernetes.io/region=eu-west-1
                    topology.kubernetes.io/zone=eu-west-1a
                    vpc.amazonaws.com/has-trunk-attached=true
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0caac9adebadda005"}
(...)
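We can also use this label to quickly list all the spot nodes in the cluster (output will vary; the node below is just the one from the previous example):

$ kubectl get nodes -l karpenter.sh/capacity-type=spot
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-12-16-11.eu-west-1.compute.internal   Ready    <none>   23m   v1.21.5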

We can use this label to select the nodes where we want to schedule the termination handler DaemonSet. To do so, we can install the termination handler with helm using the following settings:

helm repo add eks https://aws.github.io/eks-charts
helm upgrade --install aws-node-termination-handler --namespace termination-handler \
  --create-namespace \
  --set enableSpotInterruptionDraining=true \
  --set nodeSelector."karpenter\.sh/capacity-type"=spot \
  eks/aws-node-termination-handler
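Once deployed, we can check that the DaemonSet Pods only land on spot nodes by looking at which nodes they have been scheduled to:

$ kubectl get pods -n termination-handler -o wide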

If we already have the termination handler installed, we'll have to modify its values.yaml to set the following options:

enableSpotInterruptionDraining: true

nodeSelector:
  karpenter.sh/capacity-type: "spot"
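Then we can roll out the change with a regular helm upgrade, pointing it to the modified values file (the values.yaml path is just an example):

helm upgrade --install aws-node-termination-handler --namespace termination-handler \
  -f values.yaml \
  eks/aws-node-termination-handler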

With both Karpenter and the termination handler in place, we make sure the spot instance lifecycle is properly handled: once we receive the notification from AWS that the instance is going to be terminated, the node is cordoned and drained so its Pods can be rescheduled, as gracefully as possible, on another node (or on a new one).


Posted on 21/01/2022