by Jordi Prats
If you are using a mixed instances policy on your EKS workers' ASG, you will want to install the AWS Node Termination Handler to drain a node once AWS notifies you that a particular spot instance is about to be reclaimed.
Installing the termination handler is this straightforward:
helm repo add eks https://aws.github.io/eks-charts
helm upgrade --install aws-node-termination-handler --namespace termination-handler --create-namespace eks/aws-node-termination-handler
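Once it's installed we can do a quick sanity check to confirm that the handler's Pods are running on the workers (the namespace is the one we just used; the Pod names come from the chart defaults):

$ kubectl get pods -n termination-handler -o wide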
Obviously, we can customize some settings using a values file; checking the default values from the helm chart we will find a description for each of them. A good starting point can be the following, in which we enable draining of the spot instances when we get the termination notice:
# don't watch for rebalance recommendation events
enableRebalanceMonitoring: false
# don't drain on a rebalance recommendation
enableRebalanceDraining: false
# keep the chart's default for scheduled maintenance events
enableScheduledEventDraining: ""
# drain the node when a spot interruption notice is received
enableSpotInterruptionDraining: "true"
# act on every node, not just those on ASGs with the opt-in tag
checkASGTagBeforeDraining: false
# emit Kubernetes events for each action taken
emitKubernetesEvents: true
To be able to test the termination handler we can install a fake metadata endpoint using amazon-ec2-metadata-mock. We just need to download one of its releases and install it like so:
$ helm install amazon-ec2-metadata-mock amazon-ec2-metadata-mock-1.9.1.tgz -n termination-handler
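To quickly check that the mock is answering, we can port-forward to its Service (named amazon-ec2-metadata-mock-service, as we will see below) and query the standard IMDS spot interruption path; the Service name and port here assume the chart defaults:

$ kubectl port-forward -n termination-handler svc/amazon-ec2-metadata-mock-service 1338:1338
$ curl http://localhost:1338/latest/meta-data/spot/instance-action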
To point the termination handler to this metadata endpoint we will have to add instanceMetadataURL to the values file, pointing to the Service named amazon-ec2-metadata-mock-service in the namespace where we have installed it. So, using the helm install above, the URL will look like this:
instanceMetadataURL: "http://amazon-ec2-metadata-mock-service.termination-handler.svc.cluster.local:1338"
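With this in place we can redeploy the termination handler using the updated values file (assuming we saved it as values.yaml):

$ helm upgrade --install aws-node-termination-handler --namespace termination-handler -f values.yaml eks/aws-node-termination-handler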
Once we redeploy the termination handler with the new setting it will receive the fake termination notice (which, like a real spot interruption, gives a two-minute warning), so we will be able to see how it drains the node in preparation for being terminated.
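We can watch the drain as it happens; the cordoned node will show up with SchedulingDisabled in its status:

$ kubectl get nodes -w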
The termination handler uses a DaemonSet to spawn one Pod per worker, so if we don't want to run it on the on-demand instances as well we can add a small script to the ASG's user_data which will detect whether the instance is spot or on-demand and set a label accordingly:
data "template_file" "user_data_workers" {
template = <<EOF
#!/bin/bash
set -o xtrace
LIFECYCLE=$(aws ec2 describe-spot-instance-requests --filters Name=instance-id,Values="$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)" --region "eu-west-1" | jq -r '.SpotInstanceRequests | if length > 0 then "spot" else "ondemand" end')
/etc/eks/bootstrap.sh \
--apiserver-endpoint '${var.cluster_endpoint}' \
--b64-cluster-ca '${var.cluster_certificate_authority}' \
--kubelet-extra-args "--read-only-port=10255 --node-labels=node/lifecycle=$LIFECYCLE" \
'${var.cluster_id}'
EOF
}
Bear in mind that describe-spot-instance-requests requires you to specify the region you are in, so you'll have to adjust it accordingly; the workers' instance profile will also need the ec2:DescribeSpotInstanceRequests permission. Finally, we can use this label to set a nodeSelector for the termination handler so that it will only schedule the DaemonSet's Pods on the instances that are actually spot instances:
nodeSelector:
node/lifecycle: "spot"
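We can verify both the labels and where the DaemonSet's Pods ended up with:

$ kubectl get nodes -L node/lifecycle
$ kubectl get pods -n termination-handler -o wide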
Posted on 29/09/2021