4 min read | by Jordi Prats
While trying to deploy Pods we might notice in the Events section that a Pod cannot be scheduled due to a volume node affinity conflict:
$ kubectl describe pod website-365-flask-ampa2-ha-member-1 -n website-365
Name: website-365-flask-ampa2-ha-member-1
Namespace: website-365
Priority: 0
Node: <none>
Labels: (...)
Annotations: (...)
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/website-365-flask-ampa2-ha-member
Init Containers:
(...)
Containers:
(...)
Conditions:
Type Status
PodScheduled False
Volumes:
volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: volume-website-365-flask-ampa2-ha-member-1
ReadOnly: false
(...)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NotTriggerScaleUp 31m (x20835 over 7d19h) cluster-autoscaler pod didn't trigger scale-up: 2 node(s) had taint {pti/role: system}, that the pod didn't tolerate, 1 node(s) had volume node affinity conflict
Normal NotTriggerScaleUp 95s (x46144 over 7d19h) cluster-autoscaler pod didn't trigger scale-up: 1 node(s) had volume node affinity conflict, 2 node(s) had taint {pti/role: system}, that the pod didn't tolerate
Warning FailedScheduling 64s (x2401 over 43h) default-scheduler 0/4 nodes are available: 2 node(s) had taint {pti/role: system}, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict.
This message is stating that the node sits in a different availability zone than the volume the Pod is trying to use; hence the Pod cannot be scheduled on that node, since it wouldn't be able to mount the requested volume.
We can check which volume that is by looking at the Volumes section:
$ kubectl describe pod website-365-flask-ampa2-ha-member-1 -n website-365
Name: website-365-flask-ampa2-ha-member-1
Namespace: website-365
Priority: 0
Node: <none>
(...)
Volumes:
volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: volume-website-365-flask-ampa2-ha-member-1
ReadOnly: false
(...)
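We can also pull the claim name straight out of the Pod spec with jsonpath, a quick one-liner using the same Pod and namespace as in this example:

$ kubectl get pod website-365-flask-ampa2-ha-member-1 -n website-365 -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'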
We'll need to check the PVC first to retrieve the actual volume it is using:
$ kubectl get pvc -n website-365
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-website-365-postgresql-0 Bound pvc-dc818c5c-2677-4bc0-aa32-e141e0ac1516 200Gi RWO ebs-gp2 41d
volume-website-365-flask-ampa2-ha-member-0 Bound pvc-710b454f-c06b-4367-b8da-1ec5a3d78a00 200Gi RWO ebs-gp2 41d
volume-website-365-flask-ampa2-ha-member-1 Bound pvc-a0cb18a4-b471-4169-b408-699aedaed33d 200Gi RWO ebs-gp2 41d
volume-website-365-flask-ampa2-ha-primary-0 Bound pvc-7d4ea83f-da45-44bd-88eb-801950abb8de 200Gi RWO ebs-gp2 41d
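If we just want the volume name for this particular PVC, we can use jsonpath here as well:

$ kubectl get pvc volume-website-365-flask-ampa2-ha-member-1 -n website-365 -o jsonpath='{.spec.volumeName}'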
If we describe it, we'll be able to see which availability zone it sits in:
$ kubectl describe pv pvc-a0cb18a4-b471-4169-b408-699aedaed33d
Name: pvc-a0cb18a4-b471-4169-b408-699aedaed33d
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
Finalizers: [kubernetes.io/pv-protection external-attacher/ebs-csi-aws-com]
StorageClass: ebs-gp2
Status: Bound
Claim: website-365/volume-website-365-flask-ampa2-ha-member-1
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 200Gi
Node Affinity:
Required Terms:
Term 0: topology.ebs.csi.aws.com/zone in [eu-west-1b]
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: ebs.csi.aws.com
FSType: ext4
VolumeHandle: vol-09923383c7c9af32f
ReadOnly: false
VolumeAttributes: storage.kubernetes.io/csiProvisionerIdentity=1633054440112-8081-ebs.csi.aws.com
Events: <none>
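The Node Affinity section is the part we are after: this volume can only be attached to nodes in eu-west-1b. We can also extract it directly instead of scanning the whole describe output:

$ kubectl get pv pvc-a0cb18a4-b471-4169-b408-699aedaed33d -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}'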
Now it's just a matter of checking the availability zone of each of the nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-120-194-190.eu-west-1.compute.internal Ready <none> 7d22h v1.21.4-eks-033ce7e
ip-10-120-194-235.eu-west-1.compute.internal Ready <none> 37d v1.21.4-eks-033ce7e
ip-10-120-195-8.eu-west-1.compute.internal Ready <none> 8m28s v1.21.4-eks-033ce7e
ip-10-120-197-126.eu-west-1.compute.internal Ready <none> 14h v1.21.4-eks-033ce7e
$ kubectl describe node ip-10-120-195-8.eu-west-1.compute.internal
Name: ip-10-120-195-8.eu-west-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=m5a.xlarge
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eu-west-1
failure-domain.beta.kubernetes.io/zone=eu-west-1a
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-10-120-195-8.eu-west-1.compute.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=m5a.xlarge
pti/eks-workers-group-name=default
pti/lifecycle=spot
topology.ebs.csi.aws.com/zone=eu-west-1a
topology.kubernetes.io/region=eu-west-1
topology.kubernetes.io/zone=eu-west-1a
vpc.amazonaws.com/has-trunk-attached=true
Annotations: csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0e34bcb1ab40300fb"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
(...)
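This node is in eu-west-1a, so it cannot mount a volume pinned to eu-west-1b. Instead of describing nodes one by one, we can list every node together with its zone label, using -L to add the label as a column:

$ kubectl get nodes -L topology.kubernetes.io/zone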
Depending on how we have our cluster configured, this can be handled in different ways. Usually the ClusterAutoscaler or Karpenter will take care of scheduling new nodes in the appropriate availability zone. If, after some time, they don't, we'll have to check why: having reached the maximum number of nodes is the most likely reason.
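When using the ClusterAutoscaler, we can check its status ConfigMap to see whether a node group has already reached its maximum size (assuming it runs in kube-system with the default status ConfigMap enabled):

$ kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml

For volumes that are yet to be provisioned, we can avoid this situation altogether by using a StorageClass with volumeBindingMode set to WaitForFirstConsumer, so the volume is only created once the Pod has been scheduled, in the same availability zone as the chosen node. A minimal sketch reusing the ebs-gp2 name from this example (the gp2 type parameter is an assumption):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp2
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer

Bear in mind that volumeBindingMode cannot be changed on an existing StorageClass, and that this only helps new PVCs: a volume that already sits in eu-west-1b will still only be mountable from nodes in that zone.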
Posted on 27/04/2022