Using an HPA object to autoscale a deployment based on its Pods' CPU metrics

3 min read | by Jordi Prats

On Kubernetes, scaling an application is just a matter of defining how many replicas we want:

$ kubectl scale deployment/demo --replicas=5
deployment.apps/demo scaled

Having to manually adjust the number of replicas is not really practical. This is where the HorizontalPodAutoscaler (HPA) comes into play.

An HPA can be configured to use resource metrics (metrics.k8s.io), custom metrics (custom.metrics.k8s.io) or external metrics (external.metrics.k8s.io). The most basic usage relies on the resource metrics provided by the metrics-server, which needs to be installed. We can check its availability using kubectl get apiservice:

$ kubectl get apiservice | grep metrics
v1beta1.metrics.k8s.io                 default/metrics-server   True        15d

Once we have checked that it is available, we will have to make sure the Pods have a resource request configured (or, at least, that the namespace has a LimitRange in place). We can verify it by taking a look at the Pod definition:

$ kubectl get pod ampa-voting-5bd8449967-sstrw -o yaml
apiVersion: v1
kind: Pod
metadata:
  name: ampa-voting-5bd8449967-sstrw
spec:
  affinity: {}
  containers:
  - image: jordiprats/pet2cattle
    name: pet2cattle
    ports:
    - containerPort: 8008
      protocol: TCP
    resources:
      limits:
        cpu: "2"
        memory: 8000Mi
      requests:
        cpu: 200m
        memory: 1000Mi
(...)
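If the Pods don't have resource requests yet, they can be added directly to the deployment with kubectl set resources (a sketch; the deployment name and values below are just illustrative, tune them to the actual workload):

```shell
# Add CPU and memory requests to every container in the deployment;
# the HPA needs a CPU request to compute the usage percentage against
kubectl set resources deployment ampa-voting --requests=cpu=200m,memory=1000Mi
```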

Once we have resource requests in place, we can create a new HPA imperatively using kubectl autoscale, specifying which deployment we want it to control. Its options are:

  • Minimum number of replicas: Using the --min option
  • Maximum number of replicas: Using the --max option
  • Target CPU usage: Using the --cpu-percent option we can set the average CPU usage, as a percentage of the requested CPU, that the HPA will try to maintain across all the Pods. For example, if we set it to 80 percent and each Pod has requested 200m, the HPA will create new Pods whenever the average usage goes above 160m (200m * 0.8)
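The HPA derives the new replica count from the ratio between current and target usage: desiredReplicas = ceil(currentReplicas * currentUsage / targetUsage). A quick sketch of the arithmetic (the numbers are just an example):

```shell
# 4 Pods averaging 90% CPU of their request, with a target of 80%:
current=4; usage=90; target=80
# integer ceiling of (current * usage / target)
desired=$(( (current * usage + target - 1) / target ))
echo "$desired"   # prints 5: the HPA would scale up to 5 replicas
```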

In the following example we are going to create an HPA that will keep the number of replicas between 2 and 10, scaling the application when the actual CPU usage goes beyond 80% of the requested resources:

$ kubectl autoscale deployment ampa-voting --min=2 --max=10 --cpu-percent=80
horizontalpodautoscaler.autoscaling/ampa-voting autoscaled
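The same HPA can also be defined declaratively. A minimal manifest equivalent to the command above (using the autoscaling/v1 API, which only supports CPU utilization) would look like this:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ampa-voting
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ampa-voting
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
```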

Once we have it in place, it's going to take a while for it to collect the metrics:

$ kubectl get hpa
NAME          REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
ampa-voting   Deployment/ampa-voting   <unknown>/80%   2         10        0          7s

After that it will start scaling the deployment based on the CPU usage of the existing Pods:

$ kubectl get hpa
NAME          REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
ampa-voting   Deployment/ampa-voting   29%/80%   2         10        4          10m

Posted on 01/07/2021