Keeping your ACS Engine Kubernetes cluster on Azure up-to-date

It is quite simple to keep a Kubernetes cluster created by ACS Engine on Azure up-to-date, because the master and agent nodes are configured by default to automatically apply security patches on a nightly schedule.

However, the patches only take effect after a reboot, so you need a solution like Kured to reboot the agent nodes in the cluster automatically. Rebooting the master nodes remains a manual task.

-> https://github.com/weaveworks/kured

The installation of Kured is described in the GitHub repository mentioned above: you only need to run kubectl apply against the kured-ds.yaml file. However, the default kured-ds.yaml currently has two drawbacks.

First, if you apply it without changing the image version, it uses the Kured container image 1.0.0, which ships kubectl version 1.7.x. That causes odd behavior on a Kubernetes cluster running version 1.9.x or 1.10.x, for example, and the result is that the agent nodes in the cluster never reboot.

Second, the default kured-ds.yaml file does not support Kubernetes clusters with RBAC created by ACS Engine.

The two scenarios below cover both cases. If you are running a Kubernetes cluster without RBAC, the first scenario and its solution are enough to get the automated reboot working.

ACS Engine Kubernetes cluster without RBAC:

Running kubectl logs kured-lrkkp --namespace kube-system returns the following log output.

time="2018-04-24T10:04:44Z" level=info msg="Kubernetes Reboot Daemon: master-b86c60f"
time="2018-04-24T10:04:44Z" level=info msg="Node ID: k8s-agentpool-35404701-1"
time="2018-04-24T10:04:44Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2018-04-24T10:04:44Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T10:04:44Z" level=info msg="Holding lock"
time="2018-04-24T10:04:44Z" level=info msg="Uncordoning node k8s-agentpool-35404701-1"
time="2018-04-24T10:04:44Z" level=info msg="Releasing lock"

Issue #14 in the GitHub repository shows that this is a known problem caused by the version skew between the kubectl client in the image and the Kubernetes server.

-> https://github.com/weaveworks/kured/issues/14

That led me to the Kured container registry to look for the latest image version and test again. The image master-b27aaa1 contains kubectl version 1.9.6, and with it the reboot issue was solved.

-> https://quay.io/repository/weaveworks/kured?tab=tags
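The version skew behind this issue can be made concrete with a small check. The following is only a minimal sketch: it assumes that kubectl is supported within one minor version of the Kubernetes API server, and the function names minor_version and within_supported_skew are made up for this illustration. It compares minor versions only and assumes the same major version on both sides.

```python
def minor_version(version: str) -> int:
    """Return the minor component of a version string such as '1.9.6' or '1.10.x'."""
    parts = version.lstrip("v").split(".")
    return int(parts[1])


def within_supported_skew(client: str, server: str, max_skew: int = 1) -> bool:
    """True if the kubectl client is at most max_skew minor versions away from the server.

    max_skew=1 is an assumption based on the Kubernetes version skew policy.
    """
    return abs(minor_version(client) - minor_version(server)) <= max_skew


if __name__ == "__main__":
    # Client versions are the kubectl versions in the two Kured images,
    # server versions are the cluster versions mentioned in this post.
    pairs = [
        ("1.7.x", "1.9.x"),
        ("1.7.x", "1.10.x"),
        ("1.9.6", "1.9.x"),
        ("1.9.6", "1.10.x"),
    ]
    for client, server in pairs:
        ok = within_supported_skew(client, server)
        print(f"kubectl {client} against server {server}: {'ok' if ok else 'too far apart'}")
```

With these inputs, kubectl 1.7.x is two or three minor versions behind the cluster, while kubectl 1.9.6 stays within one minor version of both 1.9.x and 1.10.x.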

With kubectl version 1.9.6 in the image, Kured works as expected, and the log output now confirms that the node was uncordoned.

time="2018-04-24T13:56:29Z" level=info msg="Kubernetes Reboot Daemon: master-b27aaa1"
time="2018-04-24T13:56:29Z" level=info msg="Node ID: k8s-agentpool-35404701-1"
time="2018-04-24T13:56:29Z" level=info msg="Lock Annotation: default/kured:weave.works/kured-node-lock"
time="2018-04-24T13:56:29Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T13:56:31Z" level=info msg="Holding lock"
time="2018-04-24T13:56:31Z" level=info msg="Uncordoning node k8s-agentpool-35404701-1"
time="2018-04-24T13:56:36Z" level=info msg="node \"k8s-agentpool-35404701-1\" uncordoned" cmd=/usr/bin/kubectl std=out
time="2018-04-24T13:56:36Z" level=info msg="Releasing lock"

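Kured ensures that only one agent node is rebooted at a time, which maintains the performance and availability of the Kubernetes cluster and its running applications. In the following log output, a second agent node requires a reboot but has to wait, because the first agent node still holds the lock.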

time="2018-04-24T12:29:42Z" level=info msg="Kubernetes Reboot Daemon: master-b27aaa1"
time="2018-04-24T12:29:42Z" level=info msg="Node ID: k8s-agentpool-35404701-2"
time="2018-04-24T12:29:42Z" level=info msg="Lock Annotation: default/kured:weave.works/kured-node-lock"
time="2018-04-24T12:29:42Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T13:55:33Z" level=info msg="Reboot required"
time="2018-04-24T13:55:33Z" level=warning msg="Lock already held: k8s-agentpool-35404701-1"

For the sake of completeness, this is the log output you will see when no reboot is required.

time="2018-04-24T12:29:41Z" level=info msg="Kubernetes Reboot Daemon: master-b27aaa1"
time="2018-04-24T12:29:41Z" level=info msg="Node ID: k8s-agentpool-35404701-0"
time="2018-04-24T12:29:41Z" level=info msg="Lock Annotation: default/kured:weave.works/kured-node-lock"
time="2018-04-24T12:29:41Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2018-04-24T13:57:22Z" level=info msg="Reboot not required"
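If you want to check the state of several agent nodes at once, the log lines shown above are easy to parse, because every line consists of key=value pairs. The following is a minimal sketch and not part of Kured: the function names parse_logfmt and reboot_state as well as the state labels are made up for this illustration, and the matched messages are only the ones that appear in the log output in this post.

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Split one key=value log line into a dict, honoring quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def reboot_state(lines) -> str:
    """Return a label for the last state-relevant message in a Kured log."""
    state = "unknown"
    for line in lines:
        msg = parse_logfmt(line).get("msg", "")
        if msg == "Reboot not required":
            state = "up-to-date"
        elif msg == "Reboot required":
            state = "reboot pending"
        elif msg.startswith("Lock already held"):
            state = "waiting for lock"
        elif msg == "Releasing lock":
            state = "rebooted"
    return state


if __name__ == "__main__":
    # Two lines taken from the log output of the second agent node above.
    node_2_log = [
        'time="2018-04-24T13:55:33Z" level=info msg="Reboot required"',
        'time="2018-04-24T13:55:33Z" level=warning msg="Lock already held: k8s-agentpool-35404701-1"',
    ]
    print(reboot_state(node_2_log))  # prints "waiting for lock"
```

The input could come from kubectl logs for each Kured pod, with the output split into lines before it is passed to reboot_state.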

If you want to use the latest image version of Kured and do not want to modify the default kured-ds.yaml yourself, have a look at the modified kured-ds.yaml file in my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/blob/master/kured/kured-ds.yaml

The only change I made was the image version.
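If you prefer to patch the default file yourself, the change comes down to replacing the tag on the image line. The following is a minimal sketch under some assumptions: the image name quay.io/weaveworks/kured is derived from the registry link above, the manifest fragment is a hypothetical stand-in for the real kured-ds.yaml, and the function set_image_tag is made up for this illustration.

```python
import re


def set_image_tag(manifest: str, image: str, tag: str) -> str:
    """Replace (or add) the tag on every 'image: <image>' line of a manifest."""
    pattern = re.compile(r"(image:\s*" + re.escape(image) + r")(:\S+)?")
    return pattern.sub(lambda match: f"{match.group(1)}:{tag}", manifest)


if __name__ == "__main__":
    # Hypothetical fragment, not the full DaemonSet definition.
    fragment = (
        "      containers:\n"
        "        - name: kured\n"
        "          image: quay.io/weaveworks/kured:1.0.0\n"
    )
    print(set_image_tag(fragment, "quay.io/weaveworks/kured", "master-b27aaa1"))
```

The patched text can then be written back to a file and applied with kubectl apply as usual.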

ACS Engine Kubernetes cluster with RBAC:

Kubernetes clusters with RBAC require some more modifications to the YAML file. You must add a ServiceAccount and a ClusterRoleBinding section, as well as the serviceAccountName specification in the DaemonSet section. For the ClusterRoleBinding I am using the default role cluster-admin to get the automated reboot via Kured to work. I have not yet had the time to create a custom role that grants only the least privileges required to do the work. You can find the kured-ds-rbac.yaml file in my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/blob/master/kured/kured-ds-rbac.yaml
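To illustrate the two additional objects described above, here is a minimal sketch that prints them as a JSON list, which kubectl accepts as well as YAML. It is not a copy of the kured-ds-rbac.yaml file from the repository: the object name kured, the namespace kube-system and the function rbac_manifest are assumptions for this illustration, and only the role cluster-admin is taken from the description above.

```python
import json

NAME = "kured"             # assumed object name, adjust to your manifest
NAMESPACE = "kube-system"  # assumed namespace of the Kured DaemonSet


def rbac_manifest(name: str = NAME, namespace: str = NAMESPACE) -> dict:
    """Build a ServiceAccount and a ClusterRoleBinding to cluster-admin."""
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "cluster-admin",
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": name, "namespace": namespace}
        ],
    }
    # A v1 List lets kubectl apply both objects from a single document.
    return {"apiVersion": "v1", "kind": "List", "items": [service_account, binding]}


if __name__ == "__main__":
    print(json.dumps(rbac_manifest(), indent=2))
```

The output can be piped into kubectl apply -f - to create both objects. The DaemonSet then has to reference the same account through serviceAccountName in its pod specification, otherwise the binding has no effect on the Kured pods.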
