Kubernetes: volume node affinity conflict

4 min read | by Jordi Prats

While trying to deploy Pods we might notice in the Events section that a Pod cannot be scheduled due to a volume node affinity conflict:

$ kubectl describe pod website-365-flask-ampa2-ha-member-1 -n website-365 
Name:           website-365-flask-ampa2-ha-member-1
Namespace:      website-365
Priority:       0
Node:           <none>
Labels:         (...)
Annotations:    (...)
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  StatefulSet/website-365-flask-ampa2-ha-member
Init Containers:
(...)
Containers:
(...)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  volume-website-365-flask-ampa2-ha-member-1
    ReadOnly:   false
(...)
Events:
  Type     Reason             Age                      From                Message
  ----     ------             ----                     ----                -------
  Normal   NotTriggerScaleUp  31m (x20835 over 7d19h)  cluster-autoscaler  pod didn't trigger scale-up: 2 node(s) had taint {pti/role: system}, that the pod didn't tolerate, 1 node(s) had volume node affinity conflict
  Normal   NotTriggerScaleUp  95s (x46144 over 7d19h)  cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) had volume node affinity conflict, 2 node(s) had taint {pti/role: system}, that the pod didn't tolerate
  Warning  FailedScheduling   64s (x2401 over 43h)     default-scheduler   0/4 nodes are available: 2 node(s) had taint {pti/role: system}, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict.

This message means that the node sits in a different availability zone than the volume the Pod is trying to use, so the Pod cannot be scheduled on that node: it wouldn't be able to mount the requested volume.
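
If we just want to see these scheduling events without the whole describe output, a field selector on the Pod name should also surface them (a minimal shortcut, using the Pod name from above):

$ kubectl get events -n website-365 --field-selector involvedObject.name=website-365-flask-ampa2-ha-member-1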

We can verify this by looking at the Volumes section:

$ kubectl describe pod website-365-flask-ampa2-ha-member-1 -n website-365 
Name:           website-365-flask-ampa2-ha-member-1
Namespace:      website-365
Priority:       0
Node:           <none>
(...)
Volumes:
  volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  volume-website-365-flask-ampa2-ha-member-1
    ReadOnly:   false
(...)

We'll need to check the PVC first to retrieve the actual volume it is using:

$ kubectl get pvc -n website-365
NAME                                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-website-365-postgresql-0                 Bound    pvc-dc818c5c-2677-4bc0-aa32-e141e0ac1516   200Gi      RWO            ebs-gp2        41d
volume-website-365-flask-ampa2-ha-member-0    Bound    pvc-710b454f-c06b-4367-b8da-1ec5a3d78a00   200Gi      RWO            ebs-gp2        41d
volume-website-365-flask-ampa2-ha-member-1    Bound    pvc-a0cb18a4-b471-4169-b408-699aedaed33d   200Gi      RWO            ebs-gp2        41d
volume-website-365-flask-ampa2-ha-primary-0   Bound    pvc-7d4ea83f-da45-44bd-88eb-801950abb8de   200Gi      RWO            ebs-gp2        41d
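
If we only need the name of the PersistentVolume a given claim is bound to, a jsonpath query like the following should return it directly (using the claim from the listing above):

$ kubectl get pvc volume-website-365-flask-ampa2-ha-member-1 -n website-365 -o jsonpath='{.spec.volumeName}'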

If we describe this PersistentVolume we'll be able to see which availability zone it is in:

$ kubectl describe pv pvc-a0cb18a4-b471-4169-b408-699aedaed33d
Name:              pvc-a0cb18a4-b471-4169-b408-699aedaed33d
Labels:            <none>
Annotations:       pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
Finalizers:        [kubernetes.io/pv-protection external-attacher/ebs-csi-aws-com]
StorageClass:      ebs-gp2
Status:            Bound
Claim:             website-365/volume-website-365-flask-ampa2-ha-member-1
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          200Gi
Node Affinity:     
  Required Terms:  
    Term 0:        topology.ebs.csi.aws.com/zone in [eu-west-1b]
Message:           
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            ebs.csi.aws.com
    FSType:            ext4
    VolumeHandle:      vol-09923383c7c9af32f
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1633054440112-8081-ebs.csi.aws.com
Events:                <none>
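
If we prefer to pull out just the node affinity instead of reading the whole describe output, a jsonpath query along these lines should work:

$ kubectl get pv pvc-a0cb18a4-b471-4169-b408-699aedaed33d -o jsonpath='{.spec.nodeAffinity.required}'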

Now it's just a matter of checking the availability zone of each of the nodes:

$ kubectl get nodes
NAME                                           STATUS   ROLES    AGE     VERSION
ip-10-120-194-190.eu-west-1.compute.internal   Ready    <none>   7d22h   v1.21.4-eks-033ce7e
ip-10-120-194-235.eu-west-1.compute.internal   Ready    <none>   37d     v1.21.4-eks-033ce7e
ip-10-120-195-8.eu-west-1.compute.internal     Ready    <none>   8m28s   v1.21.4-eks-033ce7e
ip-10-120-197-126.eu-west-1.compute.internal   Ready    <none>   14h     v1.21.4-eks-033ce7e
$ kubectl describe node ip-10-120-195-8.eu-west-1.compute.internal
Name:               ip-10-120-195-8.eu-west-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5a.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eu-west-1
                    failure-domain.beta.kubernetes.io/zone=eu-west-1a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-120-195-8.eu-west-1.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m5a.xlarge
                    pti/eks-workers-group-name=default
                    pti/lifecycle=spot
                    topology.ebs.csi.aws.com/zone=eu-west-1a
                    topology.kubernetes.io/region=eu-west-1
                    topology.kubernetes.io/zone=eu-west-1a
                    vpc.amazonaws.com/has-trunk-attached=true
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0e34bcb1ab40300fb"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
(...)
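
To see every node and its availability zone at a glance, we can also ask kubectl to print the zone label as an extra column:

$ kubectl get nodes -L topology.kubernetes.io/zone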

Depending on how we have our cluster configured, this can be handled in different ways. Usually the ClusterAutoscaler or Karpenter will schedule new nodes in the appropriate availability zone. If, after some time, they don't, we'll have to check why: having reached the maximum number of nodes is the most likely reason.
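
For example, if it is the ClusterAutoscaler that should be adding capacity, its status ConfigMap usually tells us whether a node group has already hit its maximum size (assuming it runs in kube-system with the default ConfigMap name):

$ kubectl describe configmap cluster-autoscaler-status -n kube-system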


Posted on 27/04/2022