The Azure Kubernetes Fleet Manager comes with two different configuration options: with and without a hub cluster.
In today’s blog post, we focus on the Azure Kubernetes Fleet Manager without a hub cluster. This configuration option only provides Azure Kubernetes Service update management, which is exactly what we are looking at today.
Before we dive into the topic, let us step back and answer the question of why we need the Azure Kubernetes Fleet Manager in times of infrastructure as code.
Why do we need the Azure Kubernetes Fleet Manager?
Imagine you use Terraform for your infrastructure as code and use either GitHub Actions or Terraform Cloud to apply your definitions. You have a large number of Azure Kubernetes Service clusters with hundreds of nodes each. Depending on how you have configured the max surge setting, a Kubernetes version upgrade can take a long time.
This is a problem in two ways. First, the Terraform provider for Azure has a default 90-minute timeout configured for the azurerm_kubernetes_cluster resource. You might need to override this default value to avoid running into a timeout. Second, costs: GitHub Actions, for instance, is billed per minute and has default timeouts as well.
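To get an idea of both knobs, here is a minimal sketch assuming an azurerm 3.x configuration; the resource group reference, node sizing, and timeout values are placeholders you would adapt to your environment:

```hcl
resource "azurerm_kubernetes_cluster" "example" {
  name                = "aks-azst-1"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "aks-azst-1"
  kubernetes_version  = "1.27.7"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D4s_v5"

    # A higher max surge speeds up upgrades at the cost of extra temporary nodes.
    upgrade_settings {
      max_surge = "33%"
    }
  }

  identity {
    type = "SystemAssigned"
  }

  # Extend the default 90-minute timeouts for long-running version upgrades.
  timeouts {
    create = "180m"
    update = "180m"
  }
}
```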
Another issue arises with infrastructure as code when using the Azure Kubernetes Service automated Kubernetes version upgrades. You then must use the lifecycle ignore_changes instruction to prevent changes to the Kubernetes version when you apply your infrastructure as code configuration. This makes a Kubernetes version upgrade a bit more work, as you need to remove the instruction first and re-add it later.
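A minimal sketch of this workaround, assuming the azurerm 3.x argument names, could look like the following; the upgrade channel value is just an example:

```hcl
resource "azurerm_kubernetes_cluster" "example" {
  # ... cluster configuration as above ...

  # Example: let AKS apply patch version upgrades automatically.
  automatic_channel_upgrade = "patch"

  lifecycle {
    # Ignore the Kubernetes version so Terraform does not try to
    # revert upgrades that were applied outside of the code.
    ignore_changes = [kubernetes_version]
  }
}
```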
This is where the Azure Kubernetes Fleet Manager comes into play, as it allows better scheduling, control, and execution of Kubernetes version upgrades.
Fleet Manager Deployment
In the Azure portal, we create a new Azure Kubernetes Fleet Manager instance and select the hub cluster mode "Without hub cluster".
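In Terraform, using the azurerm_kubernetes_fleet_manager resource linked at the end of this post, the hubless variant could be sketched roughly like this; the names are placeholders:

```hcl
resource "azurerm_kubernetes_fleet_manager" "example" {
  name                = "fleet-azst"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location

  # No hub_profile block: the Fleet Manager is created without a hub cluster.
}
```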
Afterward, we onboard the existing Azure Kubernetes Service clusters, in this example aks-azst-1 and aks-azst-2, both running Kubernetes version 1.27.7.
During the Azure Kubernetes Service cluster onboarding to the Fleet Manager instance, you can assign each cluster to an update group, which becomes important at a later stage. In our case, aks-azst-1 is assigned to the canary update group and aks-azst-2 to the production one.
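With the azurerm_kubernetes_fleet_member resource linked at the end of this post, the onboarding including the update groups could be sketched roughly as follows; the cluster resource references are placeholders:

```hcl
resource "azurerm_kubernetes_fleet_member" "canary" {
  name                  = "aks-azst-1"
  kubernetes_fleet_id   = azurerm_kubernetes_fleet_manager.example.id
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks_azst_1.id
  group                 = "canary"
}

resource "azurerm_kubernetes_fleet_member" "production" {
  name                  = "aks-azst-2"
  kubernetes_fleet_id   = azurerm_kubernetes_fleet_manager.example.id
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks_azst_2.id
  group                 = "production"
}
```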
The clusters are now members of the Fleet Manager instance and show up on the overview page reporting their Kubernetes and node OS image versions.
Define an update strategy
Before we start a Kubernetes version upgrade of our Azure Kubernetes Service clusters, we define an update strategy to control the overall update process.
Within a strategy, we can define multiple stages to control the update process. The first stage is called canary and targets the canary update group. Before we proceed with the next stage in the strategy, we will pause the update of other Azure Kubernetes Service clusters for an hour.
Our second stage is called production and targets our production update group.
The update strategy is now ready to be used within an update run.
Update stages are executed sequentially during an update run, and all update groups within a stage in parallel.
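With the azurerm_kubernetes_fleet_update_strategy resource linked at the end of this post, such a strategy could be sketched roughly as follows; the strategy name is a placeholder and the schema follows my reading of the provider documentation:

```hcl
resource "azurerm_kubernetes_fleet_update_strategy" "example" {
  name                        = "canary-then-production"
  kubernetes_fleet_manager_id = azurerm_kubernetes_fleet_manager.example.id

  # Stages run sequentially; update groups within a stage run in parallel.
  stage {
    name = "canary"

    group {
      name = "canary"
    }

    # Pause for one hour before the production stage starts.
    after_stage_wait_in_seconds = 3600
  }

  stage {
    name = "production"

    group {
      name = "production"
    }
  }
}
```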
Execute an update run
Defining an update run does not execute the Kubernetes upgrade immediately. It has to be triggered manually by starting the defined update run.
For our update run, we set the update sequence to stages and copy the stages from our previously created update strategy. We set the upgrade scope to Kubernetes version and select 1.28.3 as the target version. Furthermore, we want to use the latest available node OS image in the respective Azure regions.
As mentioned before, an update run does not start automatically. Hence, we select the update run and hit Start, which kicks off the run.
By clicking on the update run and the different stages, we get important information about the execution states of our update in progress.
Once the update run has succeeded, we can check the clusters’ Kubernetes and node OS image versions on the Fleet Manager’s overview page.
Both Azure Kubernetes Service clusters are now running Kubernetes version 1.28.3.
Summary
The Azure Kubernetes Fleet Manager is an invaluable addition to Azure Kubernetes Service cluster management and makes Kubernetes version upgrades a breeze, also in combination with existing infrastructure as code configurations.
Besides the Kubernetes version upgrade capabilities, the Fleet Manager can also replicate and maintain Kubernetes resource objects when running the Fleet Manager with the hub cluster configuration.
You can use Terraform to define the update configuration described above entirely in infrastructure as code.
-> Create Azure Kubernetes Fleet Manager: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_fleet_manager
-> Add member clusters: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_fleet_member
-> Define an update strategy: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_fleet_update_strategy
The only thing that currently needs to be defined and executed manually is the actual update run.
For further information about the Azure Kubernetes Fleet Manager, have a look at the Azure documentation.
-> https://learn.microsoft.com/en-us/azure/kubernetes-fleet/?WT.mc_id=AZ-MVP-5000119