7 min read | by Jordi Prats
As of November 29th 2021, AWS considers Karpenter ready for production: it is a Cluster Autoscaler alternative intended to improve the efficiency and cost of running workloads on Kubernetes clusters.
Karpenter's key inner workings are these two control loops:

- A fast-acting loop: to define which worker nodes can be spawned, we configure Provisioners with a set of requirements that constrain the nodes that can be provisioned. Based on the Pods that are in Pending state, Karpenter uses this loop to choose the best-fitting node to spin up.
- A slow-acting loop: as workloads get in and out of the cluster, this loop makes sure workloads are not left fragmented across multiple nodes, trying to consolidate them onto as few nodes as possible.
To install and use Karpenter, first we will need to create an IRSA-enabled IAM role with the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ec2:RunInstances",
        "ec2:CreateTags",
        "iam:PassRole",
        "ec2:TerminateInstances",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeAvailabilityZones",
        "ssm:GetParameter"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
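As a sketch of how to wire this up, assuming the policy document above is saved as karpenter-policy.json and the IRSA role (named karpenter-controller here; both names are just examples) already exists with the cluster's OIDC trust relationship, we can create and attach the policy with the aws CLI:

# hypothetical names: adapt the policy and role names to your setup
$ aws iam create-policy \
    --policy-name KarpenterControllerPolicy \
    --policy-document file://karpenter-policy.json
$ aws iam attach-role-policy \
    --role-name karpenter-controller \
    --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/KarpenterControllerPolicy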
Once this policy is attached to the IRSA IAM role that we are going to use with Karpenter's ServiceAccount, we can set the annotation on the values.yaml like this:
serviceAccount:
  create: true
  name: karpenter
  annotations:
    "eks.amazonaws.com/role-arn": arn:...
In this file we also need to configure the controller: we will need to set the AWS_REGION, the clusterName and the clusterEndpoint.
To get the clusterEndpoint we can use the aws CLI to retrieve the URL from the cluster name like this:
$ aws eks describe-cluster --name pet2cattle --query "cluster.endpoint" --output json
"https://B2BC91B51F0003EA14AADED1D2FFBB1C.gr7.eu-west-1.eks.amazonaws.com"
Once we have all the data we can add the configuration to the values.yaml file:
controller:
  env:
    - name: AWS_REGION
      value: eu-west-1
  clusterName: "pet2cattle"
  clusterEndpoint: "..."
Once we have this config in place we can deploy it using Karpenter's helm chart (at the time of this writing its version is 0.5.0):
$ helm repo add karpenter https://charts.karpenter.sh
$ helm repo update
$ helm upgrade --install karpenter karpenter/karpenter -n karpenter --create-namespace -f values.yaml
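Since the chart is still on a 0.x version, breaking changes between releases are to be expected, so it can be worth pinning the chart version explicitly with helm's --version flag:

$ helm upgrade --install karpenter karpenter/karpenter -n karpenter \
    --create-namespace --version 0.5.0 -f values.yaml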
Once deployed we'll have to wait until it's available:
$ kubectl get pods -n karpenter
NAME                                    READY   STATUS    RESTARTS   AGE
karpenter-controller-6fdec9addf-qwert   1/1     Running   0          11m
karpenter-webhook-dffdeb86ad-pl111      1/1     Running   0          10m
At this point we are ready to configure Karpenter to deploy new nodes to the cluster, so we will need to make sure Cluster Autoscaler is disabled so that they don't compete adding nodes to the cluster.
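Assuming Cluster Autoscaler runs the usual way, as a Deployment named cluster-autoscaler in the kube-system namespace (adapt the names to your setup), we can disable it by scaling it down to zero replicas:

# assumes Cluster Autoscaler is a Deployment named cluster-autoscaler in kube-system
$ kubectl scale deployment cluster-autoscaler -n kube-system --replicas=0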
To configure Karpenter to create new nodes we will need to create a Provisioner that will be used to determine which kind of node needs to be added. Among other things, we can configure the following:

- Requirements that constrain the nodes it can provision: instance types, availability zones, architectures and capacity type (spot or on-demand)
- Node TTLs: when to expire nodes and how long to keep empty ones around
- Labels to apply to the provisioned nodes
- AWS-specific settings: instance profile, security groups and tags
- Overall resource limits for the Provisioner

A Provisioner object will look as follows:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: pet2cattle-workers
spec:
  # Expire nodes after 30 days
  ttlSecondsUntilExpired: 2592000
  # Remove nodes 30 seconds after they become empty
  ttlSecondsAfterEmpty: 30
  labels:
    nodelabel: example
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m5a.large", "m5a.xlarge", "m5a.2xlarge"]
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["arm64", "amd64"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
  provider:
    instanceProfile: 'eks_pet2cattle_worker-instance-profile'
    securityGroupSelector:
      Name: 'eks_pet2cattle-worker'
    tags:
      exampleTag: TagValue
  limits:
    resources:
      cpu: 1000
At this point we just need to push it into Kubernetes:
$ kubectl apply -f karpenter-provisioner.yaml
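To see it in action we can use a test deployment of pause containers that request CPU, starting at zero replicas (the inflate name and the deployment itself are just an example, along the lines of Karpenter's getting started guide):

# example workload to trigger provisioning, not part of the Karpenter install
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1

Scaling it up beyond the cluster's spare capacity will leave some Pods in Pending state, which triggers the provisioner:

$ kubectl scale deployment inflate --replicas 5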
If we look at Karpenter's logs we will be able to see how it spins up new nodes:
$ stern karpenter -n karpenter
(...)
karpenter-controller-6fdec9addf-qwert manager 2021-11-29T22:58:44.238Z INFO controller.provisioning Starting provisioner {"commit": "84b683b", "provisioner": "pet2cattle-workers"}
karpenter-controller-6fdec9addf-qwert manager 2021-11-29T22:58:44.239Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "84b683b", "provisioner": "pet2cattle-workers"}
karpenter-controller-6fdec9addf-qwert manager 2021-11-29T22:58:48.516Z INFO controller.provisioning Batched 1 pods in 1.000595223s {"commit": "84b683b", "provisioner": "pet2cattle-workers"}
karpenter-controller-6fdec9addf-qwert manager 2021-11-29T22:58:48.523Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [m5a.large m5a.xlarge m5a.2xlarge] {"commit": "84b683b", "provisioner": "pet2cattle-workers"}
karpenter-controller-6fdec9addf-qwert manager 2021-11-29T22:58:50.748Z INFO controller.provisioning Launched instance: i-d8d1ea0b8ede8e690, hostname: ip-10-12-15-228.eu-west-1.compute.internal, type: m5a.large, zone: eu-west-1a, capacityType: spot {"commit": "84b683b", "provisioner": "pet2cattle-workers"}
karpenter-webhook-dffdeb86ad-pl111 webhook 2021-11-29T22:58:50.764Z INFO webhook Webhook ServeHTTP request=&http.Request{Method:"POST", URL:(*url.URL)(0xc000680900), Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Accept":[]string{"application/json, */*"}, "Accept-Encoding":[]string{"gzip"}, "Content-Length":[]string{"6243"}, "Content-Type":[]string{"application/json"}, "User-Agent":[]string{"kube-apiserver-admission"}}, Body:(*http.body)(0xc00084e680), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:6243, TransferEncoding:[]string(nil), Close:false, Host:"karpenter-webhook.karpenter.svc:443", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:"10.12.15.143:49608", RequestURI:"/default-resource?timeout=10s", TLS:(*tls.ConnectionState)(0xc0004b1d90), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc00084e6c0)} {"commit": "84b683b"}
karpenter-webhook-dffdeb86ad-pl111 webhook 2021-11-29T22:58:50.765Z INFO webhook Kind: "karpenter.sh/v1alpha5, Kind=Provisioner" PatchBytes: null {"commit": "84b683b", "knative.dev/kind": "karpenter.sh/v1alpha5, Kind=Provisioner", "knative.dev/namespace": "", "knative.dev/name": "pet2cattle-workers", "knative.dev/operation": "UPDATE", "knative.dev/resource": "karpenter.sh/v1alpha5, Resource=provisioners", "knative.dev/subresource": "status", "knative.dev/userinfo": "{system:serviceaccount:karpenter:karpenter 4cc8c7b5-cc9b-48a1-8862-c41b97416ab2 [system:serviceaccounts system:serviceaccounts:karpenter system:authenticated] map[authentication.kubernetes.io/pod-name:[karpenter-controller-6fdec9addf-qwert] authentication.kubernetes.io/pod-uid:[acfe89ab-dead-beef-beef-caaad8320d0f]]}"}
karpenter-webhook-dffdeb86ad-pl111 webhook 2021-11-29T22:58:50.765Z INFO webhook remote admission controller audit annotations=map[string]string(nil) {"commit": "84b683b", "knative.dev/kind": "karpenter.sh/v1alpha5, Kind=Provisioner", "knative.dev/namespace": "", "knative.dev/name": "pet2cattle-workers", "knative.dev/operation": "UPDATE", "knative.dev/resource": "karpenter.sh/v1alpha5, Resource=provisioners", "knative.dev/subresource": "status", "knative.dev/userinfo": "{system:serviceaccount:karpenter:karpenter 4cc8c7b5-cc9b-48a1-8862-c41b97416ab2 [system:serviceaccounts system:serviceaccounts:karpenter system:authenticated] map[authentication.kubernetes.io/pod-name:[karpenter-controller-6fdec9addf-qwert] authentication.kubernetes.io/pod-uid:[acfe89ab-dead-beef-beef-caaad8320d0f]]}", "admissionreview/uid": "891234ff-beef-dead-adde-f6d760b8babc", "admissionreview/allowed": true, "admissionreview/result": "nil"}
karpenter-controller-6fdec9addf-qwert manager 2021-11-29T22:58:50.769Z INFO controller.provisioning Bound 1 pod(s) to node ip-10-12-15-228.eu-west-1.compute.internal {"commit": "84b683b", "provisioner": "pet2cattle-workers"}
karpenter-webhook-dffdeb86ad-pl111 webhook 2021-11-29T22:58:50.772Z INFO webhook Webhook ServeHTTP request=&http.Request{Method:"POST", URL:(*url.URL)(0xc000681c20), Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Accept":[]string{"application/json, */*"}, "Accept-Encoding":[]string{"gzip"}, "Content-Length":[]string{"6243"}, "Content-Type":[]string{"application/json"}, "User-Agent":[]string{"kube-apiserver-admission"}}, Body:(*http.body)(0xc00084fd40), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:6243, TransferEncoding:[]string(nil), Close:false, Host:"karpenter-webhook.karpenter.svc:443", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:"10.12.15.143:49610", RequestURI:"/validate-resource?timeout=10s", TLS:(*tls.ConnectionState)(0xc0000b0bb0), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc00084fd80)} {"commit": "84b683b"}
karpenter-webhook-dffdeb86ad-pl111 webhook 2021-11-29T22:58:50.774Z INFO webhook remote admission controller audit annotations=map[string]string(nil) {"commit": "84b683b", "knative.dev/kind": "karpenter.sh/v1alpha5, Kind=Provisioner", "knative.dev/namespace": "", "knative.dev/name": "pet2cattle-workers", "knative.dev/operation": "UPDATE", "knative.dev/resource": "karpenter.sh/v1alpha5, Resource=provisioners", "knative.dev/subresource": "status", "knative.dev/userinfo": "{system:serviceaccount:karpenter:karpenter 4cc8c7b5-cc9b-48a1-8862-c41b97416ab2 [system:serviceaccounts system:serviceaccounts:karpenter system:authenticated] map[authentication.kubernetes.io/pod-name:[karpenter-controller-6fdec9addf-qwert] authentication.kubernetes.io/pod-uid:[acfe89ab-dead-beef-beef-caaad8320d0f]]}", "admissionreview/uid": "cea2eecd-7c0b-48c2-9a92-c79e4e1476f7", "admissionreview/allowed": true, "admissionreview/result": "nil"}
karpenter-controller-6fdec9addf-qwert manager 2021-11-29T22:58:50.778Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "84b683b", "provisioner": "pet2cattle-workers"}
We'll need to wait for a while for the node to become ready:
$ kubectl get nodes
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-12-14-223.eu-west-1.compute.internal   Ready      <none>   8d    v1.21.4-eks-033ce7e
ip-10-12-14-131.eu-west-1.compute.internal   Ready      <none>   38d   v1.21.4-eks-033ce7e
ip-10-12-15-123.eu-west-1.compute.internal   Ready      <none>   49m   v1.21.4-eks-033ce7e
ip-10-12-15-36.eu-west-1.compute.internal    NotReady   <none>   24s
ip-10-12-17-87.eu-west-1.compute.internal    Ready      <none>   9h    v1.21.4-eks-033ce7e
$ kubectl get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-12-14-223.eu-west-1.compute.internal   Ready    <none>   8d      v1.21.4-eks-033ce7e
ip-10-12-14-131.eu-west-1.compute.internal   Ready    <none>   38d     v1.21.4-eks-033ce7e
ip-10-12-15-123.eu-west-1.compute.internal   Ready    <none>   51m     v1.21.4-eks-033ce7e
ip-10-12-15-36.eu-west-1.compute.internal    Ready    <none>   2m54s   v1.21.5-eks-bc4871b
ip-10-12-17-87.eu-west-1.compute.internal    Ready    <none>   9h      v1.21.4-eks-033ce7e
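Since Karpenter labels the nodes it creates with karpenter.sh/provisioner-name, we can also list just the nodes it manages using a label selector:

$ kubectl get nodes -l karpenter.sh/provisioner-name=pet2cattle-workers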
If we kubectl describe the node we will be able to see the label we have defined on the Provisioner object:
$ kubectl describe node ip-10-12-15-36.eu-west-1.compute.internal
Name:               ip-10-12-15-36.eu-west-1.compute.internal
Roles:              <none>
Labels:             nodelabel=example
                    karpenter.sh/capacity-type=spot
                    karpenter.sh/provisioner-name=pet2cattle-workers
                    node.kubernetes.io/instance-type=m5a.large
                    topology.kubernetes.io/zone=eu-west-1a
Annotations:        node.alpha.kubernetes.io/ttl: 0
CreationTimestamp:  Thu, 01 Dec 2021 01:34:36 +0100
Taints:             node.kubernetes.io/unreachable:NoExecute
                    karpenter.sh/not-ready:NoSchedule
                    node.kubernetes.io/unreachable:NoSchedule
(...)
For spot instances: to be able to properly manage their lifecycle, we will have to make sure the node termination handler works together with Karpenter, draining the node when a termination notice is received.
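A minimal sketch of setting this up, assuming we use AWS' aws-node-termination-handler chart from the eks-charts repository (check the chart's values for the draining options that fit your setup):

$ helm repo add eks https://aws.github.io/eks-charts
# enableSpotInterruptionDraining makes it cordon and drain the node on a spot interruption notice
$ helm upgrade --install aws-node-termination-handler \
    eks/aws-node-termination-handler -n kube-system \
    --set enableSpotInterruptionDraining=true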
Posted on 03/12/2021