2 min read | by Jordi Prats
For some applications we might want to prevent two or more Pods belonging to the same Deployment from being scheduled on the same node, even though we don't need them to be a DaemonSet. Let's use the cluster autoscaler as an example: we would like to have two replicas, but not on the same node. If we drain that node and there is not enough capacity on the other nodes, both Pods would be offline and a manual intervention would be required to spawn a new node. In the following output, both replicas are running on the same node:
$ kubectl get pods -n autoscaler -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
autoscaler-aws-cluster-autoscaler-585cc546dd-jc46d 1/1 Running 0 16h 10.103.195.47 ip-10-12-16-10.eu-west-1.compute.internal <none> <none>
autoscaler-aws-cluster-autoscaler-585cc546dd-s4j2r 1/1 Running 0 16h 10.103.195.147 ip-10-12-16-10.eu-west-1.compute.internal <none> <none>
To do so, we will have to configure affinity.
The affinity for a Pod is set in spec.affinity, so on a Deployment it goes in the pod template, that is, spec.template.spec.affinity.
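As a rough illustration of where the field sits, here is a minimal sketch of a Deployment manifest. The Deployment name, the label value example-app and the image are hypothetical placeholders, not taken from the cluster autoscaler chart; only the position of the affinity block under spec.template.spec is the point here.

```yaml
# Hypothetical Deployment: name, labels and image are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: example-app
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-app
    spec:
      # This is spec.template.spec.affinity
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                # Must match the labels set on the pod template above
                app.kubernetes.io/name: example-app
      containers:
      - name: example-app
        image: registry.example.com/example-app:1.0.0
```

The labelSelector has to match the labels of the Pods created by this same template; otherwise the rule would not see the other replica and both Pods could still land on the same node.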
If we are using a Helm chart we will have to check if it's possible to set it. For example, we can set it for the cluster autoscaler by setting the following values:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: aws-cluster-autoscaler
This means that the podAntiAffinity rule is mandatory (requiredDuringSchedulingIgnoredDuringExecution), uses the node label kubernetes.io/hostname as the topology key, and groups Pods by the label app.kubernetes.io/name with the value aws-cluster-autoscaler.
So, when the scheduler is trying to place a new Pod with app.kubernetes.io/name=aws-cluster-autoscaler, it will only select a node whose kubernetes.io/hostname value is not already in use by another Pod of the same group.
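Because the rule is required, a Pod that cannot find an eligible node will stay Pending. If that is too strict for a given workload, Kubernetes also offers a soft variant of the same rule. The following values are only a sketch of that alternative, assuming the chart passes the affinity block through unchanged; the weight of 100 is an arbitrary choice for illustration.

```yaml
# Soft variant (sketch): the scheduler tries to spread the Pods,
# but may still co-locate them if no other node is available
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100   # 1-100; arbitrary value chosen for this example
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: aws-cluster-autoscaler
```

For the cluster autoscaler case described here the required form is the one that guarantees the two replicas never share a node, so the soft variant trades that guarantee for easier scheduling.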
By applying these settings, we can see that the autoscaler Pods are no longer scheduled on the same node:
$ kubectl get pods -n autoscaler -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
autoscaler-aws-cluster-autoscaler-77f6d6cf75-8srd7 1/1 Running 0 15m 10.103.195.19 ip-10-12-16-10.eu-west-1.compute.internal <none> <none>
autoscaler-aws-cluster-autoscaler-77f6d6cf75-v6jg5 1/1 Running 0 4m23s 10.103.199.47 ip-10-12-16-144.eu-west-1.compute.internal <none> <none>
Posted on 11/08/2021