4 min read | by Jordi Prats
If we want only a subset of Pods to be schedulable on a given node, we can achieve it using taints and tolerations.
A taint tells the cluster not to schedule Pods on a node, while a toleration on a Pod allows that Pod to be scheduled there despite the taint.
First we are going to create a taint on a node:
$ kubectl taint nodes minikube-m02 application=example:NoSchedule
node/minikube-m02 tainted
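The argument passed to kubectl taint has the shape key=value:effect, where the value part is optional. As a rough illustration only, the Python sketch below splits such a string into its three parts. The Taint type and the parse_taint helper are hypothetical names made up for this example and are not part of kubectl or any Kubernetes client library.

```python
from typing import NamedTuple


class Taint(NamedTuple):
    """Hypothetical container for the three parts of a taint."""
    key: str
    value: str
    effect: str


def parse_taint(spec: str) -> Taint:
    """Split a 'key=value:effect' or 'key:effect' string into its parts."""
    # The effect is whatever follows the last colon.
    key_value, sep, effect = spec.rpartition(":")
    if not sep or not key_value or not effect:
        raise ValueError(f"expected key[=value]:effect, got {spec!r}")
    # The value is optional; without '=' it is left empty.
    key, _, value = key_value.partition("=")
    return Taint(key, value, effect)


taint = parse_taint("application=example:NoSchedule")
print(taint.key, taint.value, taint.effect)
```

To remove the taint later, the same argument can be passed with a trailing dash: kubectl taint nodes minikube-m02 application=example:NoSchedule-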
Using kubectl describe node we can see that it has been applied:
$ kubectl describe node minikube-m02
Name:               minikube-m02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=minikube-m02
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 19 Aug 2021 18:14:37 +0200
Taints:             node.kubernetes.io/not-ready:NoExecute
                    application=example:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
(...)
We can use a nodeSelector to try to schedule a Pod on this node:
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    kubernetes.io/hostname: minikube-m02
But the Pod will remain in Pending state:
$ kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
example   0/1     Pending   0          3s
We can check the reason using kubectl describe. The only node that matches the nodeSelector has a taint that the Pod does not tolerate, so the Pod cannot be scheduled there:
$ kubectl describe pod example
Name:         example
Namespace:    default
Priority:     0
Node:         <none>
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  nginx:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f7bff (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-f7bff:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              application=example
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  11s (x2 over 13s)  default-scheduler  0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {application: example}, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
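The reasoning behind this Pending state can be sketched with a much simplified model of the scheduler's filtering step: a node is discarded if it does not carry every label in the nodeSelector, or if it has a NoSchedule taint the Pod does not tolerate. In the Python sketch below, filter_reason is a hypothetical helper, the two-node cluster is made up (only minikube-m02 and its taint come from this post), and the real scheduler applies many more checks than these two.

```python
def filter_reason(node, node_selector, tolerations):
    """Return None if the Pod fits the node, else why it was filtered out."""
    labels = node["labels"]
    # Every nodeSelector entry must be present on the node with the same value.
    if any(labels.get(k) != v for k, v in node_selector.items()):
        return "didn't match the Pod's nodeSelector"
    for taint in node["taints"]:
        key, value, effect = taint
        # Simplification: only an identical (key, value, effect) tuple tolerates.
        if effect == "NoSchedule" and taint not in tolerations:
            return f"had taint {key}={value}:{effect} that the Pod didn't tolerate"
    return None


# Hypothetical cluster: only minikube-m02 and its taint come from the article.
nodes = {
    "minikube": {
        "labels": {"kubernetes.io/hostname": "minikube"},
        "taints": [],
    },
    "minikube-m02": {
        "labels": {"kubernetes.io/hostname": "minikube-m02"},
        "taints": [("application", "example", "NoSchedule")],
    },
}

node_selector = {"kubernetes.io/hostname": "minikube-m02"}
reasons = {name: filter_reason(node, node_selector, tolerations=[])
           for name, node in nodes.items()}

for name, reason in reasons.items():
    print(f"{name}: {reason or 'feasible'}")

# With every node filtered out, the Pod stays Pending.
pending = all(reason is not None for reason in reasons.values())
print("Pending" if pending else "Schedulable")
```

In this model one node is rejected by the nodeSelector and the other by the taint, so no node is left for the Pod.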
We can add a toleration on the Pod for the taint that we have created:
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    kubernetes.io/hostname: minikube-m02
  tolerations:
  - key: "application"
    operator: "Equal"
    value: "example"
    effect: "NoSchedule"
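The rule that decides whether a toleration covers a taint is short: the effects must agree (an empty effect on the toleration matches any effect), and then either the operator is Exists and the keys agree, or the operator is Equal (the default) and both key and value agree. The Python sketch below is one way to express that rule. The tolerates function and the dictionaries are hypothetical stand-ins for illustration, not Kubernetes code.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Check one toleration against one taint using the matching rule above."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    if operator == "Exists":
        # Exists ignores the value; an empty key matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


# The taint created on minikube-m02 earlier in this post.
taint = {"key": "application", "value": "example", "effect": "NoSchedule"}

# The toleration from the Pod manifest above.
equal = {"key": "application", "operator": "Equal",
         "value": "example", "effect": "NoSchedule"}
# Variant that tolerates any value for the same key.
exists = {"key": "application", "operator": "Exists", "effect": "NoSchedule"}
# Variant with a different value, which does not match.
other = {"key": "application", "operator": "Equal",
         "value": "other", "effect": "NoSchedule"}

for name, toleration in [("equal", equal), ("exists", exists), ("other", other)]:
    print(name, tolerates(toleration, taint))
```

The Exists form is handy when several taints share a key but differ in value, since a single toleration then covers all of them.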
If we create this Pod we will be able to see how it is scheduled to run on this node, ignoring (tolerating) its taint:
$ kubectl describe pod example
Name:         example
Namespace:    default
Priority:     0
Node:         minikube-m02/192.168.49.3
Start Time:   Thu, 19 Aug 2021 19:01:54 +0200
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  nginx:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7z5w8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-7z5w8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/hostname=minikube-m02
Tolerations:                 application=example:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  15s   default-scheduler  Successfully assigned default/example to minikube-m02
  Normal  Pulling    11s   kubelet            Pulling image "nginx"
Posted on 20/08/2021