How to change the node size of the default node pool in AKS without downtime?

At the time of writing, Azure Kubernetes Service (AKS) does not support changing the node size of a node pool in place. For the default node pool this means recreating the whole AKS cluster; for an additional node pool it means recreating that node pool.

Having all the configuration in infrastructure as code, whether Bicep or Terraform, seems to be a dead end for this simple operation. If we change the node size of the default node pool in our IaC definition, Terraform deletes the AKS cluster and creates it again, while Bicep simply fails the deployment. Neither is an option for a production AKS cluster.

Another way is to create an entirely new AKS cluster with the correct node size and migrate all workloads from the old cluster to the new one. Depending on which additional Azure networking services are in use, this can be done without any downtime for your customers. Still, it is a time-consuming task.

So, what else can we do?

There is another option. It requires some manual interaction, but it can be automated with one or more shell scripts.

Change the node size of the default node pool without downtime

The procedure requires several steps to be executed one after another.

First, we add a new node pool of type System with the new node size to our AKS cluster by running az aks nodepool add with the necessary parameters. After that, we disable the cluster autoscaler on the default node pool by running az aks nodepool update --disable-cluster-autoscaler. This ensures that no new nodes are added to the default node pool while we drain it.
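These two commands could be wrapped in a small script like the following. It is a minimal sketch, not a definitive implementation: the resource group, cluster name, pool names, VM size, and node count are placeholder assumptions, and the run helper is a hypothetical convenience that only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder values (assumptions); replace them with your own.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-my-aks}"
OLD_POOL="${OLD_POOL:-nodepool1}"
NEW_POOL="${NEW_POOL:-newdefault}"
NEW_VM_SIZE="${NEW_VM_SIZE:-Standard_D4s_v5}"
NODE_COUNT="${NODE_COUNT:-3}"

# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: add a new System node pool with the new node size.
add_system_pool() {
  run az aks nodepool add \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name "$NEW_POOL" \
    --mode System \
    --node-vm-size "$NEW_VM_SIZE" \
    --node-count "$NODE_COUNT"
}

# Step 2: stop the cluster autoscaler from adding nodes to the old pool.
disable_old_autoscaler() {
  run az aks nodepool update \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name "$OLD_POOL" \
    --disable-cluster-autoscaler
}

add_system_pool
disable_old_autoscaler
```

Run as is, the script only prints the two az commands so they can be reviewed first; running it with DRY_RUN=0 executes them against the cluster.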

Figure: AKS default node pool
Figure: AKS default node pool and newly added node pool

Now we can drain all nodes in the default node pool by iterating over every node and running kubectl drain ${NODE_NAME} --delete-emptydir-data --ignore-daemonsets. Each node is marked as unschedulable, and every pod on it is evicted, respecting pod disruption budgets, and rescheduled onto the newly added node pool. One node after another, the default node pool is prepared for its upcoming deletion.
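The iteration over the nodes could look like the sketch below. It assumes that the nodes of a pool can be selected through the agentpool node label and that the old pool is named nodepool1; the helper names list_pool_nodes, drain_nodes, and run are hypothetical. As a safety net, commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder pool name (assumption); replace it with your own.
OLD_POOL="${OLD_POOL:-nodepool1}"

# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0. Stdin is detached so the command cannot
# consume the node list that the calling loop is reading.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@" </dev/null
  fi
}

# Print the node names of one pool, one per line, selected by the
# "agentpool" node label (assumed to be set on the nodes).
list_pool_nodes() {
  kubectl get nodes -l "agentpool=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Read node names from stdin and drain them one after another.
# Stops at the first node that fails to drain.
drain_nodes() {
  while IFS= read -r NODE_NAME; do
    [ -n "$NODE_NAME" ] || continue
    run kubectl drain "$NODE_NAME" --delete-emptydir-data --ignore-daemonsets || return 1
  done
}

# Usage:
#   list_pool_nodes "$OLD_POOL" | drain_nodes
```

Draining one node at a time keeps the output easy to follow; passing several node names to a single kubectl drain call, as in the transcript below, works as well.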

> kubectl drain aks-nodepool1-11750814-vmss000003 aks-nodepool1-11750814-vmss000004 aks-nodepool1-11750814-vmss000005 --delete-emptydir-data --ignore-daemonsets
...
evicting pod kube-system/metrics-server-948cff58d-l42zd
error when evicting pods/"metrics-server-948cff58d-l42zd" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
...

> kubectl get nodes
NAME                                 STATUS                     ROLES   AGE     VERSION
aks-newdefault-41723878-vmss000000   Ready                      agent   6m42s   v1.23.5
aks-newdefault-41723878-vmss000001   Ready                      agent   6m42s   v1.23.5
aks-newdefault-41723878-vmss000002   Ready                      agent   7m19s   v1.23.5
aks-nodepool1-11750814-vmss000003    Ready,SchedulingDisabled   agent   15m     v1.23.5
aks-nodepool1-11750814-vmss000004    Ready,SchedulingDisabled   agent   14m     v1.23.5
aks-nodepool1-11750814-vmss000005    Ready,SchedulingDisabled   agent   15m     v1.23.5
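Before moving on, a script may want to confirm that every node of the old pool shows SchedulingDisabled, as in the listing above. The following is a rough sketch under the assumption that node names follow the aks-<pool>-... pattern seen in that output; pool_is_cordoned is a hypothetical helper name. It only checks the cordon status and says nothing about whether all pods have already been evicted.

```shell
#!/bin/sh
# Reads `kubectl get nodes --no-headers` style output from stdin.
# Succeeds only if at least one node of the given pool is listed and every
# such node has SchedulingDisabled in its STATUS column.
# Assumes node names start with "aks-<pool>-".
pool_is_cordoned() {
  awk -v prefix="aks-$1-" '
    index($1, prefix) == 1 {
      seen++
      if ($2 !~ /SchedulingDisabled/) bad++
    }
    END { exit !(seen > 0 && bad == 0) }
  '
}

# Usage:
#   kubectl get nodes --no-headers | pool_is_cordoned nodepool1 \
#     && echo "all nodes of nodepool1 are cordoned"
```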

Last but not least, we delete the default node pool by running az aks nodepool delete. Once the delete operation has completed successfully, our newly added node pool is the new default node pool.
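The deletion step could be guarded so that it only runs while another System node pool still exists. The sketch below illustrates that idea; the resource names are placeholder assumptions, the helper names are hypothetical, and the az command is only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder values (assumptions); replace them with your own.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-my-aks}"
OLD_POOL="${OLD_POOL:-nodepool1}"

# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the names of all node pools in System mode, one per line.
list_system_pools() {
  az aks nodepool list \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --query "[?mode=='System'].name" \
    --output tsv
}

# Reads pool names from stdin; succeeds only if at least one of them
# differs from the pool that is about to be deleted.
other_system_pool_exists() {
  awk -v old="$1" 'NF && $1 != old { found = 1 } END { exit !found }'
}

# Delete the old default node pool.
delete_old_pool() {
  run az aks nodepool delete \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name "$OLD_POOL"
}

# Usage:
#   list_system_pools | other_system_pool_exists "$OLD_POOL" && delete_old_pool
```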

Figure: AKS new default node pool

One final step is to adjust our IaC definition with the new name of the default node pool and the new node size. Otherwise, on the next run of the IaC definition, Terraform would delete and recreate the AKS cluster, or Bicep would fail the deployment, because the name and the node size no longer match the IaC definition.
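For Terraform, one way to check that the adjusted definition matches the cluster again is terraform plan -detailed-exitcode, which exits with 0 when nothing would change, 1 on error, and 2 when changes are pending. The helper below is a hypothetical sketch that turns that exit code into a readable message; it assumes an initialised Terraform working directory and does not cover Bicep.

```shell
#!/bin/sh
# Map the exit code of `terraform plan -detailed-exitcode` to a message.
# 0 = no changes, 1 = error, 2 = changes pending.
describe_plan_exit() {
  case "$1" in
    0) echo "in sync: no changes planned" ;;
    2) echo "drift: Terraform still wants to change something" ;;
    *) echo "error: terraform plan failed" ;;
  esac
}

# Usage (inside an initialised Terraform working directory):
#   terraform plan -detailed-exitcode >/dev/null
#   describe_plan_exit "$?"
```

If the message reports drift, the plan output shows whether the default node pool is still among the resources to be replaced.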

Summary

Even though it requires some manual interaction and an adjustment to the IaC definition, this approach is the only one that does not require setting up a new AKS cluster to replace the existing one just to change the node size of the default node pool. Furthermore, the procedure can be executed during normal business hours without impact on your customers.

Hopefully, Microsoft will soon support changing the node size of the default node pool and additional node pools without recreating the AKS cluster or the node pool. In the end, the process of updating a node pool with a new node size is the same as for a node image upgrade of a node pool. So, we can only speculate about why the AKS API does not yet leverage the underlying VMSS API capabilities.

At least the GitHub issue for that case has been marked as a feature request since 2021.

-> https://github.com/Azure/AKS/issues/2339