Keeping your ACS Engine Kubernetes cluster on Azure up-to-date

Keeping a Kubernetes cluster created by ACS Engine on Azure up to date is quite simple, because the master and agent nodes are configured by default to apply security patches automatically on a nightly schedule.

However, you still need a solution like Kured to reboot the agent nodes in the cluster automatically, so that the security patches take effect. Rebooting the master nodes remains a manual task.
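
Kured decides whether a node needs a reboot by polling a sentinel file, which the log output further down reports as /var/run/reboot-required. As a rough sketch, the same check can be done by hand on a node. The function name needs_reboot and its optional path argument are purely illustrative and not part of Kured.

```shell
# Sketch: report whether a reboot is pending on this node.
# needs_reboot is a hypothetical helper name; the default path is the
# sentinel file that Kured reports in its log output.
needs_reboot() {
    sentinel="${1:-/var/run/reboot-required}"
    if [ -f "$sentinel" ]; then
        echo "Reboot required"
    else
        echo "Reboot not required"
    fi
}

# Check the default sentinel on the machine this runs on.
needs_reboot
```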


The installation of Kured is described in the GitHub repository mentioned above: you only need to run kubectl apply against kured-ds.yaml. The default kured-ds.yaml currently has two drawbacks, though. First, if you apply it without changing the image version, it uses the Kured container image 1.0.0, which ships kubectl version 1.7.x. That causes some odd behavior on a Kubernetes cluster running, for example, version 1.9.x or 1.10.x, and the result is that the agent nodes in the cluster never reboot. Second, the default kured-ds.yaml file does not support Kubernetes clusters with RBAC created by ACS Engine. If you are running a Kubernetes cluster without RBAC, have a look at the first scenario and its solution to get the automated reboot working.

ACS Engine Kubernetes cluster without RBAC:

Running kubectl logs kured-lrkkp --namespace kube-system returns the following log output.

time="2018-04-24T10:04:44Z" level=info msg="Kubernetes Reboot Daemon: master-b86c60f"
time="2018-04-24T10:04:44Z" level=info msg="Node ID: k8s-agentpool-35404701-1"
time="2018-04-24T10:04:44Z" level=info msg="Lock Annotation: kube-system/"
time="2018-04-24T10:04:44Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T10:04:44Z" level=info msg="Holding lock"
time="2018-04-24T10:04:44Z" level=info msg="Uncordoning node k8s-agentpool-35404701-1"
time="2018-04-24T10:04:44Z" level=info msg="Releasing lock"

Issue #14 in the GitHub repository shows that this is a known problem caused by version drift between the kubectl client in the Kured image and the Kubernetes server.
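
To make the version drift concrete, here is a minimal sketch that computes the gap in minor versions between a client and a server. The helper names minor_of and minor_skew are hypothetical, and the version strings are example values standing in for what kubectl version reports for the image and for the cluster.

```shell
# Sketch: compute the minor-version gap between two Kubernetes versions.
# minor_of and minor_skew are hypothetical helper names.

# Print the minor component of a version such as "v1.9.6" or "1.10.2".
minor_of() {
    v="${1#v}"        # drop an optional leading "v"
    v="${v#*.}"       # drop the major component, e.g. "9.6"
    echo "${v%%.*}"   # keep everything before the next dot, e.g. "9"
}

# Print the absolute difference between client and server minor versions.
minor_skew() {
    c="$(minor_of "$1")"
    s="$(minor_of "$2")"
    if [ "$c" -gt "$s" ]; then
        echo $((c - s))
    else
        echo $((s - c))
    fi
}

# Example values: a 1.7.x client against a 1.9.x server.
minor_skew v1.7.5 v1.9.6   # prints 2
```

kubectl is generally supported within one minor version of the API server, so a gap of two or more, as in this example, is a hint that the image should be updated.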


That led me to the container registry of Kured to look for the latest image version and test again. The image master-b27aaa1 contains kubectl version 1.9.6, and with it the reboot issue was solved.
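
Switching the image version comes down to changing the tag on the image line of kured-ds.yaml. The following is only a sketch of one way to do that without editing the file by hand. The helper name set_kured_tag is hypothetical, and the repository path quay.io/weaveworks/kured in the demo is an assumption, so check the image line in your copy of the file.

```shell
# Sketch: print a manifest with the Kured image tag replaced.
# set_kured_tag is a hypothetical helper. It assumes the manifest has an
# "image:" line whose repository name ends in "kured", and that the tag
# contains no "#" or "&" characters.
set_kured_tag() {
    manifest="$1"
    tag="$2"
    sed -E "s#(image:[[:space:]]*[^[:space:]]*kured):[^[:space:]]+#\1:${tag}#" "$manifest"
}

# Demo on a throwaway file; the repository path is an assumption.
demo="$(mktemp)"
printf '        - name: kured\n          image: quay.io/weaveworks/kured:1.0.0\n' > "$demo"
set_kured_tag "$demo" master-b27aaa1
rm -f "$demo"
```

Because the function writes to standard output and leaves the file untouched, the result can be piped straight into the cluster with set_kured_tag kured-ds.yaml master-b27aaa1 | kubectl apply -f -.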


Now, with kubectl version 1.9.6 in the image, Kured works as expected.

time="2018-04-24T13:56:29Z" level=info msg="Kubernetes Reboot Daemon: master-b27aaa1"
time="2018-04-24T13:56:29Z" level=info msg="Node ID: k8s-agentpool-35404701-1"
time="2018-04-24T13:56:29Z" level=info msg="Lock Annotation: default/"
time="2018-04-24T13:56:29Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T13:56:31Z" level=info msg="Holding lock"
time="2018-04-24T13:56:31Z" level=info msg="Uncordoning node k8s-agentpool-35404701-1"
time="2018-04-24T13:56:36Z" level=info msg="node \"k8s-agentpool-35404701-1\" uncordoned" cmd=/usr/bin/kubectl std=out
time="2018-04-24T13:56:36Z" level=info msg="Releasing lock"

Kured ensures that only one agent node is rebooted at any given time, to maintain the performance and availability of the Kubernetes cluster and its running applications.

time="2018-04-24T12:29:42Z" level=info msg="Kubernetes Reboot Daemon: master-b27aaa1"
time="2018-04-24T12:29:42Z" level=info msg="Node ID: k8s-agentpool-35404701-2"
time="2018-04-24T12:29:42Z" level=info msg="Lock Annotation: default/"
time="2018-04-24T12:29:42Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T13:55:33Z" level=info msg="Reboot required"
time="2018-04-24T13:55:33Z" level=warning msg="Lock already held: k8s-agentpool-35404701-1"

For the sake of completeness, this is the log output you will see when no reboot is required.

time="2018-04-24T12:29:41Z" level=info msg="Kubernetes Reboot Daemon: master-b27aaa1"
time="2018-04-24T12:29:41Z" level=info msg="Node ID: k8s-agentpool-35404701-0"
time="2018-04-24T12:29:41Z" level=info msg="Lock Annotation: default/"
time="2018-04-24T12:29:41Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T13:57:22Z" level=info msg="Reboot not required"
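
The log excerpts above differ mainly in their last message, so a quick way to see what each Kured pod is doing is to classify that last line. This is only a sketch: kured_state is a hypothetical helper, and it matches nothing more than the message strings that appear in the excerpts in this post.

```shell
# Sketch: summarize a Kured pod's state from its log text on stdin.
# kured_state is a hypothetical helper; it only knows the message
# strings shown in the log excerpts in this post.
kured_state() {
    last="$(tail -n 1)"   # only the most recent log line is inspected
    case "$last" in
        *'msg="Reboot not required"'*) echo "no reboot needed" ;;
        *'msg="Lock already held'*)    echo "reboot pending, waiting for lock" ;;
        *'msg="Reboot required"'*)     echo "reboot pending" ;;
        *'msg="Releasing lock"'*)      echo "lock released" ;;
        *)                             echo "unknown" ;;
    esac
}

# Demo with the last two lines of the second agent node's log.
printf '%s\n' \
    'time="2018-04-24T13:55:33Z" level=info msg="Reboot required"' \
    'time="2018-04-24T13:55:33Z" level=warning msg="Lock already held: k8s-agentpool-35404701-1"' \
    | kured_state
```

Against a real cluster the input would come from kubectl logs, for example kubectl logs kured-lrkkp --namespace kube-system | kured_state.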

If you want to use the latest image version of Kured and do not want to modify the default kured-ds.yaml yourself, have a look at the modified kured-ds.yaml file in my GitHub repository.


The only change I made was the image version.

ACS Engine Kubernetes cluster with RBAC:

Kubernetes clusters with RBAC require some more modifications to the YAML file. You must add a ServiceAccount and a ClusterRoleBinding section, as well as the serviceAccountName specification in the DaemonSet section. For the ClusterRoleBinding I am using the default role cluster-admin to get the automated reboot via Kured to work. I have not yet had the time to create a custom role with only the least privileges required to do the work. You can find the kured-ds-rbac.yaml file in my GitHub repository.
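
As a rough illustration of the two extra objects, the sketch below prints a ServiceAccount and a ClusterRoleBinding that binds it to cluster-admin. It is not a copy of kured-ds-rbac.yaml: the helper name kured_rbac_manifest and the object name kured are assumptions, and the namespace kube-system is taken from the kubectl logs command used earlier.

```shell
# Sketch: print the additional RBAC objects for Kured.
# kured_rbac_manifest and the object name "kured" are assumptions;
# the file in the repository may use different names.
kured_rbac_manifest() {
    cat <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kured
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin   # very broad; a narrower custom role would be preferable
subjects:
- kind: ServiceAccount
  name: kured
  namespace: kube-system
EOF
}

# Print the manifest for review.
kured_rbac_manifest
```

The output can be applied with kured_rbac_manifest | kubectl apply -f -. The DaemonSet additionally needs serviceAccountName: kured in its pod spec so that the Kured pods actually run under this account.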