4 min read | by Jordi Prats
If you are trying to run an EKS cluster on a small subnet you might run out of IPs sooner than you might think. Every time the VPC CNI plugin (aws-node) has to attach or detach an IP it needs to make an API call, so to avoid getting those API calls throttled it pre-reserves a pool of IPs.
By default it keeps an extra ENI warm, so how many IPs are reserved out of the box depends on the instance type. Taking the m5 family as an example: an m5.large supports 3 ENIs with 10 IPv4 addresses each, so a warm spare ENI means 10 extra IPs reserved on every node, and the bigger instance types reserve proportionally more.
We can check the AWS documentation on ENIs for the per-instance-type limits of every family that supports trunking.
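We can also query these limits directly; a minimal sketch using the AWS CLI (assuming credentials and a default region are already configured):

aws ec2 describe-instance-types \
  --filters "Name=instance-type,Values=m5.*" \
  --query "InstanceTypes[].{Type:InstanceType,MaxENI:NetworkInfo.MaximumNetworkInterfaces,IPv4perENI:NetworkInfo.Ipv4AddressesPerInterface}" \
  --output table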
In some scenarios this amount of reserved IPs might make sense: as we scale up, each instance can hold many more Pods. But that's not the case for workloads where we run just a handful of Pods per node, regardless of how big the instance is. In that situation these reserved IPs are never going to be needed. Furthermore, if we have the EKS cluster on a small subnet we might run out of IPs quite easily:
subnet-731034d9d807709b7 us-west-2a 10.12.162.0/24 available 28
subnet-f5166f7fe2be65038 us-west-2b 10.12.164.0/24 available 8
subnet-0a38f8730d8f60fe3 us-west-2c 10.12.166.0/24 available 6
subnet-efc070d9d36d07f17 us-west-2d 10.12.163.0/24 available 9
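The output above comes from listing the cluster's subnets; a query along these lines (with <vpc-id> as a placeholder for the actual VPC) produces it:

aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=<vpc-id>" \
  --query "Subnets[].[SubnetId,AvailabilityZone,CidrBlock,State,AvailableIpAddressCount]" \
  --output text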
There are three environment variables on the aws-node DaemonSet that control the amount of IPs it reserves: WARM_ENI_TARGET, WARM_IP_TARGET and MINIMUM_IP_TARGET. These values control how each node reserves its own share of IPs; they do not take other nodes' IPs into account.
If we have a small cluster that's going to run just a few Pods per node we can tune down these values to reduce the amount of reserved IPs, but we need to keep in mind that if we then schedule many Pods too quickly the extra API calls can get throttled, leaving Pods in Pending for longer than expected.
For example, we can set:

WARM_ENI_TARGET=0: don't keep a full spare ENI warm
WARM_IP_TARGET=8: keep 8 unused IPs attached as a warm pool
MINIMUM_IP_TARGET=25: always allocate at least 25 IPs per node

With these values an idle node holds 25 IPs instead of a whole extra ENI, while a node already running 30 Pods holds around 38 (30 in use plus the 8 warm ones).
We can apply these values using kubectl set env:
kubectl set env daemonset aws-node -n kube-system WARM_ENI_TARGET=0
kubectl set env daemonset aws-node -n kube-system WARM_IP_TARGET=8
kubectl set env daemonset aws-node -n kube-system MINIMUM_IP_TARGET=25
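Once the change is in, a quick way to check the rollout and the resulting environment (kubectl set env --list just prints the variables):

kubectl -n kube-system rollout status daemonset aws-node
kubectl -n kube-system set env daemonset aws-node --list | grep -E 'WARM|MINIMUM'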
As soon as all the DaemonSet's Pods get replaced, the effect on the number of reserved IPs can be quite dramatic:
subnet-731034d9d807709b7 us-west-2a 10.12.162.0/24 available 224
subnet-f5166f7fe2be65038 us-west-2b 10.12.164.0/24 available 221
subnet-0a38f8730d8f60fe3 us-west-2c 10.12.166.0/24 available 214
subnet-efc070d9d36d07f17 us-west-2d 10.12.163.0/24 available 222
The tradeoff here is that the cluster will not be able to schedule Pods as fast as it otherwise would: with a smaller warm pool, a burst of new Pods has to wait for fresh EC2 API calls to attach IPs.
Posted on 16/05/2022