Using Kata Containers on Azure Kubernetes Service for sandboxing containers

Last year, I wrote a blog post about running gVisor on Azure Kubernetes Service for sandboxing containers.

-> https://www.danielstechblog.io/running-gvisor-on-azure-kubernetes-service-for-sandboxing-containers/

Back then, the only managed Kubernetes service that supported sandboxing containers in dedicated node pools was Google Kubernetes Engine via gVisor.

A few weeks back, Microsoft announced the public preview of Kata Containers for Azure Kubernetes Service.

-> https://techcommunity.microsoft.com/t5/apps-on-azure-blog/preview-support-for-kata-vm-isolated-containers-on-aks-for-pod/ba-p/3751557?WT.mc_id=AZ-MVP-5000119

Enable Kata Containers

Before we can use Kata Containers in our Azure Kubernetes Service cluster, we need to install and enable a couple of prerequisites following the Azure documentation.

-> https://learn.microsoft.com/en-us/azure/aks/use-pod-sandboxing?WT.mc_id=AZ-MVP-5000119#prerequisites

Afterward, we create a new node pool. It receives a label and a taint, as we want it to be exclusively available for untrusted workloads using Kata Containers.

az aks nodepool add --cluster-name $AKS_CLUSTER_NAME --resource-group $AKS_CLUSTER_RG \
  --name kata --os-sku mariner --workload-runtime KataMshvVmIsolation --node-vm-size Standard_D4s_v3 \
  --node-taints kata=enabled:NoSchedule --labels kata=enabled
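
To confirm that the label and the taint ended up on the new nodes, a small helper like the following can be used once the node pool is up. This is only a sketch: the function name list_pool_nodes is made up for illustration, and it assumes kubectl already points at the AKS cluster.

```shell
# Hypothetical helper (not part of the Azure tooling): lists the nodes
# matching a label selector together with their taint keys and effects.
list_pool_nodes() {
  selector="$1"
  kubectl get nodes -l "$selector" \
    -o custom-columns='NAME:.metadata.name,TAINT_KEYS:.spec.taints[*].key,TAINT_EFFECTS:.spec.taints[*].effect'
}
```

Calling list_pool_nodes kata=enabled should then list only the nodes of the kata node pool, each with the kata taint key and the NoSchedule effect.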

Once the node pool is ready, we run the following command to check the available runtimes.

> kubectl get runtimeclasses.node.k8s.io
NAME                     HANDLER   AGE
kata-mshv-vm-isolation   kata      54m
runc                     runc      54m

We see two runtimes: runc, the default one for trusted workloads, and the new kata-mshv-vm-isolation, which uses Kata Containers for untrusted workloads.

Verify Kata Containers usage

We use the following pod template to deploy an NGINX proxy onto the Kata Containers node pool.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-kata
spec:
  containers:
  - name: nginx
    image: nginx
  runtimeClassName: kata-mshv-vm-isolation
  tolerations:
    - key: kata
      operator: Equal
      value: "enabled"
      effect: NoSchedule
  nodeSelector:
    kata: enabled

The important part is the runtime class definition, as Kubernetes otherwise uses runc, the default runtime. Furthermore, for completeness, we specify the toleration and the node selector that match the taint and the label of the node pool.

> kubectl get pods nginx-kata -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
nginx-kata   1/1     Running   0          15s   10.240.3.40   aks-kata-24023092-vmss000000   <none>           <none>
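
As a quick sanity check before looking inside the pod, we can read back the runtime class that Kubernetes recorded in the pod spec. The helper below is a minimal sketch; the function name pod_runtime_class is an assumption chosen for this example.

```shell
# Hypothetical helper: prints the runtimeClassName stored in a pod's spec.
# Prints nothing when the pod does not set a runtime class.
pod_runtime_class() {
  pod="$1"
  kubectl get pod "$pod" -o jsonpath='{.spec.runtimeClassName}'
}
```

For the pod above, pod_runtime_class nginx-kata should print kata-mshv-vm-isolation. Note that this only shows what was requested in the pod spec and is not proof of the isolation itself.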

After deploying our NGINX pod, we verify whether the pod is sandboxed via Kata Containers. To do so, we exec into the NGINX proxy pod and install ping.

> kubectl exec -it nginx-kata -- /bin/sh
> apt update && apt install iputils-ping -y
...
Setting up iputils-ping (3:20210202-1) ...
Failed to set capabilities on file `/bin/ping' (Operation not supported)
The value of the capability argument is not permitted for a file. Or the file is not a regular (non-symlink) file
Setcap failed on /bin/ping, falling back to setuid
...

The installation succeeds, but setting the required file capabilities fails because the pod runs in a sandbox provided by Kata Containers. With the default runc runtime, we do not see this error message, as the NGINX proxy pod does not run in a sandbox.
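
Another way to look at the sandbox, which does not require installing anything in the pod, is to compare the kernel version reported inside the pod with the kernel version of the node hosting it. A pod running under Kata Containers lives in its own lightweight VM with a guest kernel, so the two versions typically differ, whereas a runc pod shares the node's kernel. The following is only a sketch under that assumption; the function name compare_kernels is made up for this example, and the pod image must provide uname.

```shell
# Hypothetical helper: compares the kernel seen inside a pod with the
# kernel of the node the pod is scheduled on.
compare_kernels() {
  pod="$1"
  # Node the pod was scheduled on
  node="$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}')"
  # Kernel version the kubelet reports for that node
  node_kernel="$(kubectl get node "$node" -o jsonpath='{.status.nodeInfo.kernelVersion}')"
  # Kernel version seen from inside the container
  pod_kernel="$(kubectl exec "$pod" -- uname -r)"
  echo "node: $node_kernel"
  echo "pod:  $pod_kernel"
  if [ "$node_kernel" != "$pod_kernel" ]; then
    echo "kernels differ: the pod appears to run in its own VM"
  else
    echo "kernels match: the pod appears to share the node kernel"
  fi
}
```

Running compare_kernels nginx-kata prints both versions and a short verdict. Treat the verdict as a hint rather than a guarantee, as it relies solely on comparing the two version strings.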

Summary

Finally, we can run untrusted workloads on Azure Kubernetes Service without installing another secure container runtime manually.

During the public preview, we can reduce the impact on a production cluster by using a dedicated node pool for Kata Containers.