Troubleshooting a Pod by changing its command

3 min read | by Jordi Prats

When a container in a Pod keeps crashing, its logs are sometimes not enough to fully understand what's going on. One way to approach this situation is to change the command it runs to something that keeps running, so that Kubernetes doesn't restart the container: for example, a sleep command.

$ kubectl get pods
NAME                           READY   STATUS              RESTARTS   AGE
deploy-test-84b4fdcbbd-kjkkm   0/1     CrashLoopBackOff    4          4m47s

Depending on the sleep implementation shipped in the image (GNU coreutils accepts it, while some older BusyBox builds do not), we can use sleep infinity to make it sleep indefinitely. Otherwise we can always set it to something long enough for us to debug without interruptions. I usually set it to sleep 24h, but that's really up to you.
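To find out which of the two values to use, we can try sleep infinity for a moment in a shell started from the same image. This is only a sketch: pick_sleep_arg is a made-up helper name, not part of any tool, and it assumes a POSIX shell where kill and wait are available.

```shell
# Print "infinity" if this system's sleep accepts it, otherwise "24h".
# pick_sleep_arg is a hypothetical helper name used for illustration.
pick_sleep_arg() {
  # Start the candidate command in the background and remember its PID.
  sleep infinity >/dev/null 2>&1 &
  pid=$!

  # An unsupported argument makes sleep exit right away; give it a moment.
  sleep 1

  if kill -0 "$pid" 2>/dev/null; then
    # Still running, so "infinity" was accepted. Clean up the test process.
    kill "$pid"
    wait "$pid" 2>/dev/null || true
    echo infinity
  else
    echo 24h
  fi
}
```

It prints infinity when the background sleep is still alive after one second and 24h when it has already exited.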

To set the command we'll have to first identify if the Pod is controlled by some other entity (for example a Deployment, a StatefulSet, a Job...). To do so we can use kubectl describe on the Pod and look for the Controlled By property. For example:

$ kubectl describe pod deploy-test-84b4fdcbbd-kjkkm
Name:           deploy-test-84b4fdcbbd-kjkkm
Namespace:      test
Priority:       0
Node:           minikube/
Start Time:     Mon, 07 Mar 2022 22:08:28 +0100
Labels:         component=deploy-test
Annotations:    <none>
Status:         Pending
IPs:            <none>
Controlled By:  ReplicaSet/deploy-test-84b4fdcbbd

If it's controlled by a ReplicaSet, we will have to describe it to see which object controls it:

$ kubectl describe ReplicaSet/deploy-test-84b4fdcbbd
Name:           deploy-test-84b4fdcbbd
Namespace:      test
Selector:       component=deploy-test,pod-template-hash=84b4fdcbbd
Labels:         component=deploy-test
Annotations: 1
Controlled By:  Deployment/deploy-test
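The same lookup can be scripted by reading metadata.ownerReferences instead of scanning the describe output. The sketch below follows the first owner reference until it reaches an object with no owner. The function names owner_of and top_owner are made up for this example, and it assumes every owner lives in the same namespace as the Pod and has a single controlling owner.

```shell
# Print the first owner of an object as "Kind/name", or nothing if it has none.
# $1 = object as kind/name (for example pod/deploy-test-84b4fdcbbd-kjkkm)
# $2 = namespace
owner_of() {
  kubectl get "$1" -n "$2" \
    -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}' |
    sed 's|^/$||'   # a bare "/" means there was no owner reference
}

# Follow the chain of owners and print the last object that has no owner.
top_owner() {
  current=$1
  while :; do
    parent=$(owner_of "$current" "$2")
    [ -n "$parent" ] || break
    current=$parent
  done
  echo "$current"
}
```

With the Pod from this post, top_owner pod/deploy-test-84b4fdcbbd-kjkkm test would be expected to end at Deployment/deploy-test, the same result we got from the two describe commands.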

Once we have the object that holds the Pod template, we can edit it using kubectl edit and add (or replace) the command:

$ kubectl edit Deployment/deploy-test

To do so we just need to look for the container and set (or update) the command property to sleep infinity or sleep 24h (if infinity is not supported):

apiVersion: apps/v1
kind: Deployment
(...)
      - command:
        - sleep
        - infinity
        image: alpine:latest
        imagePullPolicy: Always
        name: crasher
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
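If we prefer not to open an editor, the same change can be applied with kubectl patch and a JSON patch. This is a sketch under a few assumptions: set_sleep_command is a made-up helper name, the container is addressed by its position in the containers list (0 for the first one), and the controller is a Deployment.

```shell
# Set the container command to "sleep <duration>" on a Deployment.
# $1 = Deployment name, $2 = namespace, $3 = container index (0-based),
# $4 = sleep argument (optional, defaults to "infinity")
set_sleep_command() {
  # The JSON patch "add" operation creates the field or replaces an existing one.
  kubectl patch deployment "$1" -n "$2" --type=json -p \
    "[{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/$3/command\",\"value\":[\"sleep\",\"${4:-infinity}\"]}]"
}
```

For the Deployment in this post that would be set_sleep_command deploy-test test 0, or set_sleep_command deploy-test test 0 24h when infinity is not supported. Keep in mind that command only overrides the image entrypoint: if the container also defines args, they are still appended to sleep, so they would have to be removed as well.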

We might also need to remove its liveness, readiness and startup probes so that Kubernetes doesn't restart the container or keep it marked as not ready. Once we have the Pod running:

$ kubectl get pods
NAME                           READY   STATUS              RESTARTS        AGE
deploy-test-75bbd7d4c-hnxf7    0/1     CrashLoopBackOff    5 (2m11s ago)   6m
deploy-test-7c99946dcd-87ljf   0/1     ContainerCreating   0               5s
deploy-test-84b4fdcbbd-kjkkm   0/1     Terminating         7               10m
$ kubectl get pods
NAME                           READY   STATUS        RESTARTS   AGE
deploy-test-75bbd7d4c-hnxf7    0/1     Terminating   5          6m4s
deploy-test-7c99946dcd-87ljf   1/1     Running       0          9s
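Instead of re-running kubectl get pods by hand, we can let kubectl wait for the rollout. If it times out because a probe keeps failing against the sleeping container, that probe can be removed with a JSON patch. Both helpers below are sketches with made-up names (wait_for_rollout, drop_probe), and they assume the controller is a Deployment.

```shell
# Wait until the Deployment rollout finishes or the timeout expires.
# $1 = Deployment name, $2 = namespace, $3 = timeout (optional, default 120s)
wait_for_rollout() {
  kubectl rollout status "deployment/$1" -n "$2" --timeout="${3:-120s}"
}

# Remove one probe from a container in the Pod template.
# $1 = Deployment name, $2 = namespace, $3 = container index (0-based),
# $4 = livenessProbe, readinessProbe or startupProbe
drop_probe() {
  # The "remove" operation fails if the container does not define that probe.
  kubectl patch deployment "$1" -n "$2" --type=json -p \
    "[{\"op\":\"remove\",\"path\":\"/spec/template/spec/containers/$3/$4\"}]"
}
```

For example, drop_probe deploy-test test 0 livenessProbe followed by wait_for_rollout deploy-test test.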

We can now open a shell on it to investigate what's going on:

$ kubectl exec -it deploy-test-7c99946dcd-87ljf -- sh
/ # 

From here we can now try to manually run the process or check the presence and permissions of files...
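As an illustration of the file checks, a small function like the following can be pasted into that shell. It is only a sketch: inspect_path is a made-up name, and the paths to check depend entirely on the application.

```shell
# Report whether a path exists, show its permissions and owner, and say
# whether the current user can read it.
inspect_path() {
  if [ -e "$1" ]; then
    ls -ld "$1"
    [ -r "$1" ] || echo "not readable by uid $(id -u): $1"
  else
    echo "missing: $1"
  fi
}
```

Calling it with a hypothetical path such as inspect_path /app/config.yaml prints either the ls -ld line for the file or missing: /app/config.yaml.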

Posted on 08/03/2022