AWS EKS: reduce the number of reserved IPs

EKS AWS ENI available IPs subnet

4 min read | by Jordi Prats

If you are trying to run an EKS cluster on a small subnet you might run out of IPs sooner than you might think. Every time the CNI plugin has to attach or detach an IP it makes an API call, so to avoid getting those calls throttled it pre-reserves a pool of IPs.

By default it reserves an extra ENI, so the number of IPs reserved out of the box depends on the instance type. Using the m5 family as an example:

  • An m5.large can hold up to 3 ENIs, with up to 10 IP addresses each, so it will reserve the first ENI it needs plus an extra one. That's a total of 20 reserved IPs.
  • Going up to an m5.2xlarge, it can hold up to 4 ENIs, each one holding up to 15 IP addresses. This makes it reserve 30 IPs.
  • For m5.4xlarge, m5.8xlarge and m5.12xlarge, it can hold up to 4 ENIs as well, but with up to 30 IP addresses per ENI, so it would reserve 60 IPs per instance.
  • Worst case for the m5 family would be running m5.16xlarge or m5.24xlarge, which can hold up to 50 IPs per ENI, so just one instance would get 100 IPs. This means you couldn't have more than two instances on a /24 network.
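With the default of one warm ENI, the arithmetic behind those figures is just twice the per-ENI IP limit (the first attached ENI plus one warm ENI). A quick sketch:

```shell
# Warm IPs reserved per node with the default WARM_ENI_TARGET=1:
# the first attached ENI plus one fully warm ENI.
reserved_ips() {
  local ips_per_eni=$1
  echo $(( 2 * ips_per_eni ))
}

reserved_ips 10   # m5.large: 20
reserved_ips 15   # m5.2xlarge: 30
reserved_ips 30   # m5.4xlarge: 60
reserved_ips 50   # m5.16xlarge: 100
```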

We can check the AWS documentation on ENIs for the limits of every instance type across all the families that support trunking.
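Instead of the documentation table, these limits can also be pulled straight from the EC2 API; a sketch using the AWS CLI (the instance types listed are just the ones discussed above):

```shell
# Maximum number of ENIs and IPv4 addresses per ENI for some m5 sizes
aws ec2 describe-instance-types \
  --instance-types m5.large m5.2xlarge m5.4xlarge m5.16xlarge \
  --query 'InstanceTypes[].[InstanceType,NetworkInfo.MaximumNetworkInterfaces,NetworkInfo.Ipv4AddressesPerInterface]' \
  --output table
```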

In some scenarios this amount of reserved IPs makes sense: as we scale up, each instance can hold many more Pods. But this might not be the case for some workloads where we run just a handful of Pods per node, regardless of the resources of the instance. If that's the case, these reserved IPs are never going to be needed. Furthermore, if we have the EKS cluster on a small subnet we might run out of IPs quite easily:

subnet-731034d9d807709b7   us-west-2a    available    28
subnet-f5166f7fe2be65038   us-west-2b    available    8
subnet-0a38f8730d8f60fe3   us-west-2c    available    6
subnet-efc070d9d36d07f17   us-west-2d    available    9
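A listing like the one above can be produced with the AWS CLI; a sketch, assuming the cluster subnets carry the usual kubernetes.io/cluster/<name> tag (my-cluster is a placeholder):

```shell
# Subnet ID, AZ, state and free IP count for the cluster's subnets
# (my-cluster is a placeholder for the actual cluster name)
aws ec2 describe-subnets \
  --filters "Name=tag-key,Values=kubernetes.io/cluster/my-cluster" \
  --query 'Subnets[].[SubnetId,AvailabilityZone,State,AvailableIpAddressCount]' \
  --output text
```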

There are three environment variables on the aws-node DaemonSet that control the amount of IPs it reserves. These values control how each node reserves its own share of IPs; they do not take other nodes' IPs into account.

  • WARM_ENI_TARGET : Defines how many ENIs' worth of IPs to keep warm; depending on the instance type this translates to a different number of IPs
  • WARM_IP_TARGET : Specifies the number of free IPs to keep available for Pod assignment on the node
  • MINIMUM_IP_TARGET : Sets the minimum number of IPs to keep around on each node

If we have a small cluster that's going to run just a few Pods per node we can tune down these values to reduce the amount of reserved IPs, but we need to keep in mind that if we end up scheduling Pods too fast, API calls can get throttled, leaving Pods as Pending for longer than expected.

For example, we can set:

  • WARM_ENI_TARGET=0: Do not keep an additional ENI ready
  • WARM_IP_TARGET=8: Keep a minimum of 8 reserved IPs
  • MINIMUM_IP_TARGET=25: Set 25 as a floor value (slightly more than the number of Pods expected to run per node)

We can apply these values using kubectl set env:

kubectl set env daemonset aws-node -n kube-system WARM_ENI_TARGET=0
kubectl set env daemonset aws-node -n kube-system WARM_IP_TARGET=8
kubectl set env daemonset aws-node -n kube-system MINIMUM_IP_TARGET=25
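To double-check what the DaemonSet currently has configured, kubectl set env can also list the variables:

```shell
# Show the environment variables currently set on aws-node
kubectl set env daemonset aws-node -n kube-system --list
```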

As soon as all the DaemonSet's Pods get replaced, the effect on the number of reserved IPs can be quite dramatic:

subnet-731034d9d807709b7   us-west-2a    available    224
subnet-f5166f7fe2be65038   us-west-2b    available    221
subnet-0a38f8730d8f60fe3   us-west-2c    available    214
subnet-efc070d9d36d07f17   us-west-2d    available    222
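The rollout can be watched to know when every aws-node Pod has picked up the new values:

```shell
# Block until the updated aws-node Pods are running on every node
kubectl rollout status daemonset aws-node -n kube-system
```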

The tradeoff here is that the cluster will not be able to schedule Pods as fast as it otherwise would.

Posted on 16/05/2022