Keeping your AKS – Managed Kubernetes cluster on Azure up-to-date

Keeping an AKS cluster (managed Kubernetes on Azure) up to date is fairly simple, because Microsoft automatically applies security patches to the nodes on a nightly schedule, as described in the AKS FAQ in the Azure documentation.

-> https://docs.microsoft.com/en-us/azure/aks/faq#are-security-updates-applied-to-aks-agent-nodes

However, the nodes are not rebooted automatically, so you need a solution like Kured to reboot them and let the security patches take effect.

-> https://github.com/weaveworks/kured

The installation of Kured is described in the GitHub repository mentioned above: you only need to run kubectl apply against the kured-ds.yaml file. There is currently a downside to the default kured-ds.yaml, though. If you apply it without changing the image version, it uses the Kured container image 1.0.0, which ships kubectl version 1.7.x. This causes odd behavior on an AKS cluster running, for example, Kubernetes version 1.9.6: the nodes in the cluster never reboot.
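To make the version drift concrete, here is a minimal Python sketch that measures the gap between a kubectl client and the Kubernetes server. It assumes the upstream Kubernetes guidance that kubectl should stay within one minor version of the API server. The function names are hypothetical helpers for illustration and are not part of Kured or kubectl.

```python
def minor_version(version):
    """Return (major, minor) from strings like '1.9.6', 'v1.9.6' or '1.7.x'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def minor_skew(client, server):
    """Number of minor releases between client and server versions."""
    c_major, c_minor = minor_version(client)
    s_major, s_minor = minor_version(server)
    if c_major != s_major:
        raise ValueError("major versions differ")
    return abs(c_minor - s_minor)


def within_supported_skew(client, server, max_skew=1):
    """True if the client is at most max_skew minor releases away (assumed limit: 1)."""
    return minor_skew(client, server) <= max_skew


# kubectl 1.7.x in the Kured 1.0.0 image against an AKS cluster on 1.9.6
print(minor_skew("1.7.x", "1.9.6"))
print(within_supported_skew("1.7.x", "1.9.6"))
```

With the versions from this post, the client is two minor releases behind the server, which is outside the assumed limit.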

Running kubectl logs kured-2l4k9 --namespace kube-system returns the following log output, which indicates the issue.

time="2018-04-24T09:37:20Z" level=info msg="Kubernetes Reboot Daemon: master-b86c60f"
time="2018-04-24T09:37:20Z" level=info msg="Node ID: aks-agentpool-14987876-2"
time="2018-04-24T09:37:20Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2018-04-24T09:37:20Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T09:37:20Z" level=info msg="Holding lock"
time="2018-04-24T09:37:20Z" level=info msg="Uncordoning node aks-agentpool-14987876-2"
time="2018-04-24T09:37:20Z" level=info msg="Releasing lock"

Issue #14 in the GitHub repository indicates that this is a known problem caused by version drift between the kubectl client and the Kubernetes server.

-> https://github.com/weaveworks/kured/issues/14

That led me to Kured's container registry to look for the latest image version and test again.

-> https://quay.io/repository/weaveworks/kured?tab=tags

Image master-b27aaa1 contains kubectl version 1.9.6, and with that the reboot issue was solved: Kured now works as expected.

time="2018-04-24T13:56:29Z" level=info msg="Kubernetes Reboot Daemon: master-b27aaa1"
time="2018-04-24T13:56:29Z" level=info msg="Node ID: aks-agentpool-14987876-1"
time="2018-04-24T13:56:29Z" level=info msg="Lock Annotation: default/kured:weave.works/kured-node-lock"
time="2018-04-24T13:56:29Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T13:56:31Z" level=info msg="Holding lock"
time="2018-04-24T13:56:31Z" level=info msg="Uncordoning node aks-agentpool-14987876-1"
time="2018-04-24T13:56:36Z" level=info msg="node \"aks-agentpool-14987876-1\" uncordoned" cmd=/usr/bin/kubectl std=out
time="2018-04-24T13:56:36Z" level=info msg="Releasing lock"
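One visible difference between the two log excerpts is the confirmation line that kubectl emits after the uncordon step (the line carrying cmd=/usr/bin/kubectl). The following Python sketch parses the key=value log format and checks for that line. The names parse_line and uncordon_confirmed are hypothetical helpers, not part of Kured, and treating the confirmation line as a health signal is an assumption based only on the logs shown here.

```python
import re

# A value is either a double-quoted string (with backslash escapes)
# or a bare token without whitespace.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, raw in PAIR.findall(line):
        if raw.startswith('"'):
            # Drop the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def uncordon_confirmed(lines):
    """True if any line is a kubectl message confirming the uncordon."""
    for line in lines:
        fields = parse_line(line)
        from_kubectl = fields.get("cmd", "").endswith("kubectl")
        if from_kubectl and fields.get("msg", "").endswith("uncordoned"):
            return True
    return False
```

Feeding it the lines from kubectl logs should return True for the master-b27aaa1 output and False for the earlier output, where the confirmation line is missing.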

Kured ensures that only one node is rebooted at a time, which maintains the performance, reliability and availability of the AKS cluster and its running applications.

time="2018-04-24T12:29:42Z" level=info msg="Kubernetes Reboot Daemon: master-b27aaa1"
time="2018-04-24T12:29:42Z" level=info msg="Node ID: aks-agentpool-14987876-2"
time="2018-04-24T12:29:42Z" level=info msg="Lock Annotation: default/kured:weave.works/kured-node-lock"
time="2018-04-24T12:29:42Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T13:55:33Z" level=info msg="Reboot required"
time="2018-04-24T13:55:33Z" level=warning msg="Lock already held: aks-agentpool-14987876-1"
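The "Lock already held" warning reflects this one-node-at-a-time behavior. As a rough illustration, the Python sketch below models a single-holder lock in memory. The class NodeLock is hypothetical; the log lines suggest that Kured keeps its lock in an annotation inside the cluster, and a real implementation would need an atomic update, which this toy model ignores.

```python
class NodeLock:
    """Toy single-holder lock: at most one node may hold it at a time."""

    def __init__(self):
        self.holder = None

    def acquire(self, node_id):
        """Take the lock if it is free; succeed again for the current holder."""
        if self.holder is None:
            self.holder = node_id
            return True
        return self.holder == node_id

    def release(self, node_id):
        """Free the lock, but only if node_id is the current holder."""
        if self.holder == node_id:
            self.holder = None
            return True
        return False


lock = NodeLock()
lock.acquire("aks-agentpool-14987876-1")         # first node may reboot
if not lock.acquire("aks-agentpool-14987876-2"):
    print("Lock already held:", lock.holder)     # second node has to wait
```

Once the first node releases the lock after its reboot, the second node can acquire it on a later check.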

For the sake of completeness, this is the log output when no reboot is required.

time="2018-04-24T12:29:41Z" level=info msg="Kubernetes Reboot Daemon: master-b27aaa1"
time="2018-04-24T12:29:41Z" level=info msg="Node ID: aks-agentpool-14987876-0"
time="2018-04-24T12:29:41Z" level=info msg="Lock Annotation: default/kured:weave.works/kured-node-lock"
time="2018-04-24T12:29:41Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T13:57:22Z" level=info msg="Reboot not required"
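The "Reboot Sentinel" line in the logs names the file /var/run/reboot-required and an interval of one hour. Assuming the decision comes down to whether that file exists, the check can be sketched in Python as follows. The function names are hypothetical, and the sketch does not reproduce Kured's actual implementation.

```python
import os

# Path taken from the "Reboot Sentinel" log line.
SENTINEL = "/var/run/reboot-required"


def reboot_required(sentinel_path=SENTINEL):
    """True if the sentinel file exists on the node."""
    return os.path.exists(sentinel_path)


def sentinel_message(sentinel_path=SENTINEL):
    """Return the message that matches the log output for the current state."""
    if reboot_required(sentinel_path):
        return "Reboot required"
    return "Reboot not required"
```

Calling sentinel_message() once per hour on a node would mirror the interval from the log line.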

If you want to use the latest image version of Kured and do not want to modify the default kured-ds.yaml yourself, have a look at the following modified kured-ds.yaml file in my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/blob/master/kured/kured-ds.yaml

The only change I made is the image version.
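If you prefer to patch the default file yourself, the change can be scripted. The Python sketch below rewrites the tag on the image line of a manifest. The helper set_image_tag and the sample manifest fragment are hypothetical, and the repository path quay.io/weaveworks/kured is inferred from the registry link above, so verify both against the real kured-ds.yaml before relying on it.

```python
import re


def set_image_tag(manifest, repository, tag):
    """Return the manifest text with the tag of `repository` set to `tag`."""
    pattern = re.compile(
        r"^([ \t]*(?:-[ \t]*)?image:[ \t]*)"   # indentation and the image key
        + re.escape(repository)
        + r"(?::[\w.\-]+)?[ \t]*$",            # optional existing tag
        re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + repository + ":" + tag, manifest)


# Hypothetical fragment that stands in for the container section of the file.
sample = (
    "containers:\n"
    "  - name: kured\n"
    "    image: quay.io/weaveworks/kured:1.0.0\n"
)
print(set_image_tag(sample, "quay.io/weaveworks/kured", "master-b27aaa1"))
```

The patched text can then be written back to a file and passed to kubectl apply.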
