Updating the base image of a VMSS aks-engine cluster

In mid-February 2019, a CVE for runc was published and immediately patched on the major cloud provider platforms.

-> https://seclists.org/oss-sec/2019/q1/119
-> https://kubernetes.io/blog/2019/02/11/runc-and-cve-2019-5736/
-> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-5736

For example, if you were running an Azure Kubernetes Service cluster with Kubernetes version 1.12.4, you only needed to upgrade the cluster to version 1.12.5 to have all agent nodes patched and running the AKS-specific Moby runtime version 3.0.4.

-> https://azure.microsoft.com/en-us/updates/cve-2019-5736-and-runc-vulnerability/
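
As a rough illustration of that AKS upgrade, the sketch below wraps the Azure CLI call in a small function. The function name and the resource group and cluster names in the usage note are placeholders, not values from this post.

```shell
#!/bin/bash
# Illustrative wrapper; upgrade_aks_cluster is a hypothetical helper name.
# Arguments: resource group, AKS cluster name, target Kubernetes version.
upgrade_aks_cluster() {
    az aks upgrade --resource-group "$1" --name "$2" --kubernetes-version "$3"
}
```

With a cluster named my-aks in the resource group my-rg (both assumed names), the call would be `upgrade_aks_cluster my-rg my-aks 1.12.5`.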

If you are running an aks-engine cluster, you can likewise upgrade via aks-engine or redeploy the cluster to get the updated base image. If neither option is feasible for you, now or in the future, and your cluster uses VMSS (virtual machine scale sets), you have a third option.

That option is simply to update the VMSS base image to the patched one.

On one of your aks-engine master nodes, run the following script. It updates the VMSS base image and then updates each VMSS instance to the latest version.

-> https://github.com/neumanndaniel/kubernetes/blob/master/aks-engine/aksEngineBaseImageUpdate.sh

#! /bin/bash
AZCLI=$(which az)
if [ -z "$AZCLI" ]
then
    echo "[$(date +"%Y-%m-%d %H:%M:%S")] No Azure CLI installed. Will install Azure CLI..."
    sudo apt-get install apt-transport-https lsb-release software-properties-common dirmngr -y
    AZ_REPO=$(lsb_release -cs)
    echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | \
        sudo tee /etc/apt/sources.list.d/azure-cli.list
    sudo apt-key --keyring /etc/apt/trusted.gpg.d/Microsoft.gpg adv \
        --keyserver packages.microsoft.com \
        --recv-keys BC528686B50D79E339D3721CEB3E94ADBE1229CF
    sudo apt-get update
    sudo apt-get install azure-cli
    echo "[$(date +"%Y-%m-%d %H:%M:%S")] Azure CLI installed. Script continues with updating the base image..."
fi
echo "[$(date +"%Y-%m-%d %H:%M:%S")] Logging in to Azure via Managed Service Identity..."
NULL=$(az login --identity)
echo "[$(date +"%Y-%m-%d %H:%M:%S")] Gathering information about the Kubernetes cluster and the latest base image..."
VMSS=$(kubectl get nodes|grep vmss --max-count=1|cut -d ' ' -f1|rev|cut -c 7-|rev)
RESOURCEGROUP=$(kubectl get node $(kubectl get nodes|grep vmss --max-count=1| cut -d ' ' -f 1) -o json | jq .metadata.labels|grep kubernetes.azure.com/cluster|cut -d '"' -f4)
VMSSPROPERTIES=$(az vmss show --resource-group $RESOURCEGROUP --name $VMSS)
OFFER=$(echo $VMSSPROPERTIES| jq .virtualMachineProfile.storageProfile.imageReference.offer|cut -d '"' -f 2)
PUBLISHER=$(echo $VMSSPROPERTIES| jq .virtualMachineProfile.storageProfile.imageReference.publisher|cut -d '"' -f 2)
SKUTEMP=$(echo $VMSSPROPERTIES| jq .virtualMachineProfile.storageProfile.imageReference.sku|cut -d '"' -f 2|rev|cut -c 7-|rev)
SKU=$SKUTEMP$(date +"%Y%m")
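BASEIMAGES=$(az vm image list --offer $OFFER --publisher $PUBLISHER --sku $SKU --all)
# Take the last list entry as the latest image (assumes the newest version is listed last).
LATESTBASEIMAGE=$(echo $BASEIMAGES|jq '.[-1]')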
echo "[$(date +"%Y-%m-%d %H:%M:%S")] Updating base image..."
az vmss update --resource-group $RESOURCEGROUP --name $VMSS \
    --set virtualMachineProfile.storageProfile.imageReference.sku=$(echo $LATESTBASEIMAGE|jq .sku|cut -d '"' -f 2) \
        virtualMachineProfile.storageProfile.imageReference.version=$(echo $LATESTBASEIMAGE|jq .version|cut -d '"' -f 2)| jq .virtualMachineProfile.storageProfile.imageReference
echo "[$(date +"%Y-%m-%d %H:%M:%S")] Updating VMSS instances..."
VMSSINSTANCES=$(kubectl get nodes|grep vmss |cut -d ' ' -f1)
for ITEM in $VMSSINSTANCES
do
    TEMPINSTANCEID=$(kubectl get nodes $ITEM -o yaml|grep providerID)
    INSTANCEID=$(echo $TEMPINSTANCEID|cut -d '/' -f13)
    echo "[$(date +"%Y-%m-%d %H:%M:%S")] Draining node $ITEM..."
    kubectl drain $ITEM --ignore-daemonsets --delete-local-data --force
    echo "[$(date +"%Y-%m-%d %H:%M:%S")] Updating VMSS instance $ITEM..."
    az vmss update-instances --instance-ids $INSTANCEID --name $VMSS --resource-group $RESOURCEGROUP
    echo "[$(date +"%Y-%m-%d %H:%M:%S")] Uncordon node $ITEM..."
    kubectl uncordon $ITEM
done
echo "[$(date +"%Y-%m-%d %H:%M:%S")] Base image update finished..."
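
The script derives three values with rev/cut pipelines: the VMSS name from a node name, the image SKU for the current month, and the instance ID from the node's providerID. The sketch below expresses the same string handling with shell parameter expansion so that each step can be tried in isolation. The function names and the sample values in the usage note are illustrative, and it assumes the providerID ends in the instance ID, as the script's use of field 13 suggests.

```shell
#!/bin/bash
# Illustrative helpers; none of these function names appear in the original script.

# Drop the six-character instance suffix from a node name to get the VMSS name.
vmss_name_from_node() {
    echo "${1%??????}"
}

# Replace the trailing six-character YYYYMM stamp of an image SKU with another month.
# Arguments: current SKU, target month as YYYYMM.
sku_for_month() {
    echo "${1%??????}${2}"
}

# Return the last path segment of a providerID, assumed to be the VMSS instance ID.
instance_id_from_provider_id() {
    echo "${1##*/}"
}
```

For a node called k8s-agentpool1-12345678-vmss000000 (a made-up name), `vmss_name_from_node` prints k8s-agentpool1-12345678-vmss, and `sku_for_month aks-ubuntu-1604-201901 201902` prints aks-ubuntu-1604-201902.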

The script has the following requirements:

  1. The aks-engine cluster uses MSI (Managed Service Identity)
  2. The aks-engine cluster uses VMSS

It performs the following steps:

  1. Check for the Azure CLI and install it if it is missing
  2. Log in to Azure via MSI
  3. Set the new base image for the VMSS
  4. Get all VMSS instances
  5. For each VMSS instance
    – Run kubectl drain on the VMSS instance
    – Update the VMSS instance to the latest version
    – Run kubectl uncordon on the VMSS instance
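
Before touching a production cluster, it can help to preview the per-instance commands from step 5. The sketch below is one possible way to do that, not part of the original script: DRY_RUN, run and update_node are hypothetical names, and unlike the script it stops working on a node as soon as one of its commands fails.

```shell
#!/bin/bash
# Dry-run sketch of the per-instance step. DRY_RUN, run and update_node are
# illustrative additions; with DRY_RUN=1 (the default here) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY-RUN: $*"
    else
        "$@"
    fi
}

# Drain, update and uncordon a single node, returning at the first failure.
# Arguments: node name, instance ID, VMSS name, resource group.
update_node() {
    run kubectl drain "$1" --ignore-daemonsets --delete-local-data --force || return 1
    run az vmss update-instances --instance-ids "$2" --name "$3" --resource-group "$4" || return 1
    run kubectl uncordon "$1"
}
```

Calling `update_node <node> <instance-id> <vmss> <resource-group>` with the default setting prints the three commands in order; setting DRY_RUN=0 executes them.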

After the script finishes successfully, all agent nodes run the patched version, and new VMSS instances created in a scaling event also use the patched version.
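
One way to confirm that result is to ask Azure which instances have not yet received the latest scale set model. The following is a minimal sketch that relies on the latestModelApplied property returned by az vmss list-instances; the function names are hypothetical, and the check is not part of the original script.

```shell
#!/bin/bash
# Illustrative check; outdated_instances and all_instances_updated are hypothetical names.

# Print the IDs of instances that are not yet on the latest scale set model.
# Arguments: resource group, VMSS name.
outdated_instances() {
    az vmss list-instances --resource-group "$1" --name "$2" \
        --query '[?latestModelApplied==`false`].instanceId' --output tsv
}

# Succeed only when no instance is reported as outdated.
all_instances_updated() {
    [ -z "$(outdated_instances "$1" "$2")" ]
}
```

For example, `all_instances_updated "$RESOURCEGROUP" "$VMSS" && echo "All instances are up to date"` could be run at the end of the script, where both variables are already set.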

So, besides a cluster upgrade or redeployment, an aks-engine cluster using VMSS gives you a third option for rolling out security fixes: updating the base image.