Daniel's Tech Blog

Cloud Computing, Cloud Native & Kubernetes

Apply configuration changes to the default node pool in AKS via Bicep

In today’s blog post, we look at Bicep and how to apply configuration changes to the default node pool in Azure Kubernetes Service within the same Bicep template.

What sounds easy at first glance gets nasty when everything has to be done within the same template: creating an Azure Kubernetes Service cluster, adding additional node pools, and changing the configuration of existing ones. This is especially noticeable when you are used to Terraform, which just works because the Azure provider abstracts away these challenges.

Before we dig deeper into the Bicep template itself, let us talk about the challenges.

Challenges

An Azure Kubernetes Service cluster requires a default node pool configuration during its creation. Some of the configuration options can only be changed later on via the agent pool API. Therefore, besides the AKS cluster resource, the template must also contain an AKS agent pool resource that carries the default node pool configuration. This is not a challenge in itself, but it needs to be implemented. For instance, an Azure Kubernetes Service deployment via the Azure portal also uses both resources, as seen in the screenshot below.

Azure Template Export

Now to the part that might be a showstopper if you do not know how to work around it.

When you change a default node pool setting that can only be changed via the agent pool API, you need to accomplish two things. First, the AKS cluster resource must keep the current value of that setting rather than the new one, and this current value should not have to be maintained manually. Second, the new value must be applied via the AKS agent pool resource.

Trying to achieve this, you will run into a circular dependency that blocks the template deployment.

Still curious how to solve this? Read on as we dive into the Bicep template.

AKS cluster and agent pool resources in the same template

As mentioned before, Azure Kubernetes Service requires a default node pool configuration during the cluster creation.

param cluster_configuration object = {
  ...
  default_node_pool: {
    availability_zones: [ '1', '2', '3' ]
    ...
  }
  ...
}
...
var default_node_pool = cluster_configuration.default_node_pool
...
resource azure_kubernetes_service 'Microsoft.ContainerService/managedClusters@2022-07-01' = {
  name: cluster_name
  ...
  properties: {
    ...
    agentPoolProfiles: [
      {
        availabilityZones: default_node_pool.availability_zones
        ...
      }
    ]
    ...
  }
}

Furthermore, the template should allow changes to the default node pool and make it possible to add additional node pools.

param cluster_configuration object = {
  ...
  default_node_pool: {
    availability_zones: [ '1', '2', '3' ]
    ...
  }
  additional_node_pools: [
    {
      availability_zones: [ '1', '2', '3' ]
      ...
    }
  ]
}
...
var default_node_pool = cluster_configuration.default_node_pool
var all_node_pools = union(array(cluster_configuration.default_node_pool), cluster_configuration.additional_node_pools)
...
resource azure_kubernetes_service_node_pools 'Microsoft.ContainerService/managedClusters/agentPools@2022-06-02-preview' = [for node_pool in all_node_pools: {
  name: length(node_pool.name) <= 12 ? node_pool.name : substring(node_pool.name, 0, 12)
  parent: azure_kubernetes_service
  properties: {
    availabilityZones: node_pool.availability_zones
    ...
  }
}]

Nothing special so far, except that we combine both parameters into a single array variable.

var all_node_pools = union(array(cluster_configuration.default_node_pool), cluster_configuration.additional_node_pools)
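To make the combination step easier to follow, here is a minimal sketch that can be deployed on its own. The pool names nodepool1 and userpool1 and the reduced pool objects are assumptions for illustration and are not taken from the actual template.

```bicep
// Hypothetical pool definitions, reduced to the name for brevity.
var default_node_pool = {
  name: 'nodepool1'
}
var additional_node_pools = [
  {
    name: 'userpool1'
  }
]

// array() wraps the single object in an array so that union() can
// merge it with the list of additional node pools.
var all_node_pools = union(array(default_node_pool), additional_node_pools)

// Returns the names of all node pools the agent pool loop would iterate over.
output node_pool_names array = [for node_pool in all_node_pools: node_pool.name]
```

Keep in mind that union() includes duplicate values only once. If identical pool definitions should be kept, concat() would be the alternative.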

Change the default node pool configuration

The default agent pool has a lot of settings that can be changed via the AKS cluster resource, for instance, the cluster autoscaler configuration. Doing the same for the maxSurge value in the upgradeSettings block, however, results in an error that points us to the AKS agent pool resource.

{
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",
  "details": [
    {
      "code": "BadRequest",
      "message": {
        "code": "BadRequest",
        "message": "Updating property MaxSurge of a virtual-machine-scale-set agent pool is not allowed through the managed cluster API. Use the agent pool API (https://aka.ms/agent-pool-rest-api) to update property in agent pool nodepool1",
        "subcode": ""
      }
    }
  ]
}

Now comes the tricky part: providing the AKS cluster resource with the current value of the upgradeSettings parameter maxSurge while providing the new one via the AKS agent pool resource.
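As a rough sketch of the second half, the new value could be set on the agent pool resource as shown here. The parameter names and default values are assumptions for illustration, and the cluster is assumed to exist already. A real template would carry the complete node pool configuration instead of the reduced property set.

```bicep
// Hypothetical parameters for illustration only.
param cluster_name string
param node_pool_name string = 'nodepool1'
param max_surge string = '3'

// Assumption: the cluster already exists and is only referenced here.
resource azure_kubernetes_service 'Microsoft.ContainerService/managedClusters@2022-07-01' existing = {
  name: cluster_name
}

// The agent pool API accepts the change that the managed cluster API rejects.
// All other node pool settings are left out of this sketch.
resource azure_kubernetes_service_default_node_pool 'Microsoft.ContainerService/managedClusters/agentPools@2022-07-01' = {
  parent: azure_kubernetes_service
  name: node_pool_name
  properties: {
    mode: 'System'
    upgradeSettings: {
      maxSurge: max_surge
    }
  }
}
```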

Working around the circular dependency

The first thing that comes to mind is to reference the existing AKS agent pool resource, guarded by an if condition that applies when we set a specific flag in our default node pool parameter.

param cluster_configuration object = {
  ...
  default_node_pool: {
    ...
    max_surge: '3'
    update_needed_via_agent_pool_api: true
  }
  ...
}
...
resource data_azure_kubernetes_service_default_node_pool 'Microsoft.ContainerService/managedClusters/agentPools@2022-07-01' existing = if (default_node_pool.update_needed_via_agent_pool_api) {
  name: '${cluster_name}/${default_node_pool.name}'
}
...
resource azure_kubernetes_service 'Microsoft.ContainerService/managedClusters@2022-07-01' = {
  name: cluster_name
  ...
  properties: {
    ...
    agentPoolProfiles: [
      {
        ...
        upgradeSettings: {
          maxSurge: default_node_pool.update_needed_via_agent_pool_api ? data_azure_kubernetes_service_default_node_pool.properties.upgradeSettings.maxSurge : default_node_pool.max_surge
        }
        ...
      }
    ]
    ...
  }
}

Once we try to roll out the Bicep template, Azure Resource Manager detects a circular dependency in this configuration and stops the rollout with an error message.

{
  "code": "BadRequest",
  "message": {
    "error": {
      "code": "InvalidTemplate",
      "message": "Deployment template validation failed: 'Circular dependency detected on resource: '/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/rg-sec-alert/providers/Microsoft.ContainerService/managedClusters/aks-sec-alert/agentPools/nodepool1'. Please see https://aka.ms/arm-template/#resources for usage details.'.",
      "additionalInfo": [
        {
          "type": "TemplateViolation",
          "info": {
            "lineNumber": 0,
            "linePosition": 0,
            "path": ""
          }
        }
      ]
    }
  }
}

Fortunately, we can work around the circular dependency by moving the existing AKS agent pool resource into a Bicep module.

// Parameters
@description('The name of the Azure Kubernetes Service cluster.')
param cluster_name string = ''

@description('The name of the Azure Kubernetes Service cluster node pool.')
param node_pool_name string = ''

// Resources
resource azure_kubernetes_service_node_pool 'Microsoft.ContainerService/managedClusters/agentPools@2022-07-01' existing = {
  name: '${cluster_name}/${node_pool_name}'
}

// Outputs
output max_surge string = azure_kubernetes_service_node_pool.properties.upgradeSettings.maxSurge

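When we reference the new module in our Bicep template, Azure Resource Manager no longer detects a circular dependency. The lookup of the existing agent pool now runs inside the module's own nested deployment instead of the main template.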

module data_azure_kubernetes_service_default_node_pool '../data_azure_kubernetes_service_node_pool/main.bicep' = if (default_node_pool.update_needed_via_agent_pool_api) {
  scope: resourceGroup()
  name: 'data-${cluster_name}-${default_node_pool.name}'
  params: {
    cluster_name: cluster_name
    node_pool_name: default_node_pool.name
  }
}
...
resource azure_kubernetes_service 'Microsoft.ContainerService/managedClusters@2022-07-01' = {
  name: cluster_name
  ...
  properties: {
    ...
    agentPoolProfiles: [
      {
        ...
        upgradeSettings: {
          maxSurge: default_node_pool.update_needed_via_agent_pool_api ? data_azure_kubernetes_service_default_node_pool.outputs.max_surge : default_node_pool.max_surge
        }
        ...
      }
    ]
    ...
  }
}

Now the rollout of the Bicep template does exactly what we want. It reads the current value of the maxSurge parameter in the upgradeSettings block via the existing AKS agent pool resource wrapped in the Bicep module and sets the new value via the AKS agent pool resource.

Screenshots: AKS default node pool; AKS default node pool max surge set to 2; AKS default node pool max surge set to 3

Summary

It takes some tinkering to finally get there, especially when you are used to working with Terraform. But working with Bicep templates, or even with ARM templates, lets you understand in detail how the underlying resource providers in Azure work. This understanding can be an advantage.

Circular dependencies might look like a dead end at first sight, but they can be worked around by moving the affected configuration part into a Bicep module.

All the Bicep templates/modules are available for you via my GitHub repository.

-> https://github.com/neumanndaniel/bicep/tree/main/modules
