How risky is it really to run a Pod with privileged: true?


When running containers, by default there is isolation between the host and the running container: the container cannot access the host’s resources. But when you run a Pod with the privileged flag, you effectively disable this isolation, making the containerized process equivalent to one running as root on the host server.

Running a privileged Pod means that the Pod can access the host’s resources and kernel capabilities, which is essentially equivalent to being root on the host. This might be needed in some scenarios, such as running GPU-enabled containers in a Kubernetes cluster, so that the GPU can be accessed directly from the container.

We can test it out using the following Deployment:

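apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-privileged
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        securityContext:
          privileged: true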

Now if we list the files in /dev from inside the container, we can see all the devices that the host can see:

$ kubectl --context arm64 exec -it test-privileged-8468dbb5c7-6m7mk -- ls /dev
autofs     loop6     rpivid-hevcmem   tty20  tty49      vcs3
bsg    loop7     rpivid-intcmem   tty21  tty5     vcs4
btrfs-control  mapper    rpivid-vp9mem    tty22  tty50      vcs5
bus    mem     sda      tty23  tty51      vcs6
cachefiles   mqueue    sda1     tty24  tty52      vcs7
cec0     net     sda2     tty25  tty53      vcsa
cec1     null    sda3     tty26  tty54      vcsa1
cpu_dma_latency  port    sdb      tty27  tty55      vcsa2
cuse     ppp     sg0      tty28  tty56      vcsa3
dma_heap   ptmx    sg1      tty29  tty57      vcsa4
dri    pts     sg2      tty3   tty58      vcsa5
fd     ram0    shm      tty30  tty59      vcsa6
full     ram1    snd      tty31  tty6     vcsa7
fuse     ram10     stderr     tty32  tty60      vcsu
gpiochip0  ram11     stdin      tty33  tty61      vcsu1
gpiochip1  ram12     stdout     tty34  tty62      vcsu2
gpiomem    ram13     termination-log  tty35  tty63      vcsu3
hwrng    ram14     tty      tty36  tty7     vcsu4
i2c-11     ram15     tty0     tty37  tty8     vcsu5
i2c-12     ram2    tty1     tty38  tty9     vcsu6
input    ram3    tty10      tty39  ttyAMA0    vcsu7
kmsg     ram4    tty11      tty4   ttyprintk  vga_arbiter
kvm    ram5    tty12      tty40  uhid     vhci
longhorn   ram6    tty13      tty41  uinput     vhost-net
loop-control   ram7    tty14      tty42  urandom    watchdog
loop0    ram8    tty15      tty43  vc-mem     watchdog0
loop1    ram9    tty16      tty44  vchiq      zero
loop2    random    tty17      tty45  vcio
loop3    raw     tty18      tty46  vcs
loop4    rfkill    tty19      tty47  vcs1
loop5    rpivid-h264mem  tty2     tty48  vcs2

In a non-privileged Pod we only see a minimal set of devices:

$ kubectl exec -it pet2cattle-7f9775bbd8-klkfd -c pet2cattle -- ls /dev
fd               ptmx             stderr           tty
full             pts              stdin            urandom
mqueue           random           stdout           zero
null             shm              termination-log

So, as soon as we have access to all the devices (including the disks), we can do whatever we want to the Kubernetes node by mounting the relevant filesystems and writing whatever changes we want.

We must bear in mind that privileged: true is shorthand for granting ALL privileges. If we only need some extra capabilities, we can configure permissions in a more granular way. You can check the SecurityContext reference for a comprehensive list.
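
As a rough sketch of the granular approach, the manifest below adds individual capabilities on top of the container runtime's default set instead of setting privileged: true. The Pod name test-capabilities is hypothetical, and NET_ADMIN and SYS_TIME are only illustrative choices; which capabilities a given workload actually needs is an assumption that has to be checked case by case.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-capabilities   # hypothetical name, not from the original post
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80
    securityContext:
      # No "privileged: true" here: the container keeps the runtime's
      # default capability set plus only the ones listed below.
      capabilities:
        add:
        - NET_ADMIN   # illustrative: allows network configuration changes
        - SYS_TIME    # illustrative: allows setting the system clock
```

Capability names are written without the CAP_ prefix used by the Linux kernel. Because the Pod is not privileged, listing /dev inside this container should still show only the reduced set of devices rather than the host's disks.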

Posted on 22/12/2021