Daniel's Tech Blog

Cloud Computing, Cloud Native & Kubernetes

Azure Kubernetes Service cluster autoscaler configurations

Azure Kubernetes Service currently offers two options for running the cluster autoscaler (CA).

The first is the integrated solution that runs on the managed control plane. It is in preview and only available with the AKS virtual machine scale sets (VMSS) preview.

You can enable the cluster autoscaler when you create the AKS cluster or afterwards, provided the cluster is VMSS-based.

Use the following Azure CLI commands to create a new VMSS-based AKS cluster with the CA enabled. If you have not registered the VMSSPreview feature yet, register it first.

az feature register --name VMSSPreview --namespace Microsoft.ContainerService
az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/VMSSPreview')].{Name:name,State:properties.state}"
az provider register --namespace Microsoft.ContainerService
az extension add --name aks-preview
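
Registering a preview feature is asynchronous, and the state reported by az feature list can stay at Registering for a while. The following is a minimal sketch for waiting until the state is Registered. The helper name wait_for_feature, the number of attempts, and the polling interval are assumptions made for illustration, not part of the Azure CLI.

```shell
# Sketch: poll the feature state until it reports "Registered".
# wait_for_feature is a hypothetical helper; the defaults (60 attempts, 10 s) are arbitrary.
wait_for_feature() {
  local namespace="$1" name="$2" attempts="${3:-60}" interval="${4:-10}"
  local state i=1
  while [ "$i" -le "$attempts" ]; do
    state=$(az feature show --namespace "$namespace" --name "$name" \
      --query properties.state -o tsv)
    if [ "$state" = "Registered" ]; then
      echo "Feature $name is registered."
      return 0
    fi
    echo "Feature $name is in state '$state' (attempt $i/$attempts)..."
    sleep "$interval"
    i=$((i + 1))
  done
  echo "Feature $name did not reach state 'Registered' in time." >&2
  return 1
}
```

Call it as wait_for_feature Microsoft.ContainerService VMSSPreview after az feature register, and run az provider register once it returns successfully.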

az aks create \
  --resource-group aksVmss \
  --name aksVmssCluster \
  --kubernetes-version 1.12.6 \
  --node-count 3 \
  --enable-vmss \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 6

If you already run an AKS cluster with VMSS, enable the cluster autoscaler with the following Azure CLI command.

az aks update \
  --resource-group aksVmss \
  --name aksVmssCluster \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 6
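
To check that the settings were applied, you can query the agent pool profile of the cluster. This is a sketch that assumes the enableAutoScaling, minCount, and maxCount properties are present in the az aks show output for your CLI and API version; show_autoscaler_settings is a hypothetical helper name.

```shell
# Sketch: list the autoscaler settings per agent pool.
# Assumes agentPoolProfiles exposes enableAutoScaling, minCount and maxCount.
show_autoscaler_settings() {
  local resource_group="$1" cluster_name="$2"
  az aks show \
    --resource-group "$resource_group" \
    --name "$cluster_name" \
    --query 'agentPoolProfiles[].{name:name, autoscaling:enableAutoScaling, min:minCount, max:maxCount}' \
    -o table
}
```

For the cluster from this post, the call would be show_autoscaler_settings aksVmss aksVmssCluster.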

Because the CA runs on the managed control plane, you need some additional configuration to get the CA log output. Follow the Azure docs guide for viewing the master logs and select cluster-autoscaler in the list of logs.

-> https://docs.microsoft.com/en-us/azure/aks/view-master-logs


To report its runtime state, the CA creates a Kubernetes ConfigMap object that contains the current status of the CA and the AKS cluster. You can query the status with the following kubectl command.

kubectl -n kube-system describe configmap cluster-autoscaler-status
Name:         cluster-autoscaler-status
Namespace:    kube-system
Labels:       <none>
Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2019-03-13 13:55:56.953402677 +0000 UTC

Data
====
status:
----
Cluster-autoscaler status at 2019-03-13 13:55:56.953402677 +0000 UTC:
Cluster-wide:
  Health:      Healthy (ready=4 unready=0 notStarted=0 longNotStarted=0 registered=4 longUnregistered=0)
               LastProbeTime:      2019-03-13 13:55:56.860479724 +0000 UTC m=+23365.154091042
               LastTransitionTime: 2019-03-13 07:26:49.003127759 +0000 UTC m=+17.296738977
  ScaleUp:     NoActivity (ready=4 registered=4)
               LastProbeTime:      2019-03-13 13:55:56.860479724 +0000 UTC m=+23365.154091042
               LastTransitionTime: 2019-03-13 07:26:49.003127759 +0000 UTC m=+17.296738977
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2019-03-13 13:55:56.860479724 +0000 UTC m=+23365.154091042
               LastTransitionTime: 2019-03-13 08:43:14.035942018 +0000 UTC m=+4602.329553236

NodeGroups:
  Name:        aks-nodepool1-33037761-vmss
  Health:      Healthy (ready=2 unready=0 notStarted=0 longNotStarted=0 registered=2 longUnregistered=0 cloudProviderTarget=2 (minSize=2, maxSize=4))
               LastProbeTime:      2019-03-13 13:55:56.860479724 +0000 UTC m=+23365.154091042
               LastTransitionTime: 2019-03-13 07:26:49.003127759 +0000 UTC m=+17.296738977
  ScaleUp:     NoActivity (ready=2 cloudProviderTarget=2)
               LastProbeTime:      2019-03-13 13:55:56.860479724 +0000 UTC m=+23365.154091042
               LastTransitionTime: 2019-03-13 07:26:49.003127759 +0000 UTC m=+17.296738977
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2019-03-13 13:55:56.860479724 +0000 UTC m=+23365.154091042
               LastTransitionTime: 2019-03-13 08:43:14.035942018 +0000 UTC m=+4602.329553236


Events:  <none>
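
For scripts or quick checks, the full output is often more than you need. The following is a minimal sketch that reduces the status text to the three cluster-wide states. It assumes the status keeps the layout shown above; ca_cluster_summary is a hypothetical helper name.

```shell
# Sketch: print only the cluster-wide Health, ScaleUp and ScaleDown states.
# Reads the status text on stdin and ignores the per-node-group section.
ca_cluster_summary() {
  awk '
    /^Cluster-wide:/ { in_cluster = 1; next }   # start of the cluster-wide section
    /^NodeGroups:/   { in_cluster = 0 }         # per-node-group section begins
    in_cluster && $1 ~ /^(Health|ScaleUp|ScaleDown):$/ { print $1, $2 }
  '
}
```

One way to feed it is kubectl -n kube-system get configmap cluster-autoscaler-status -o jsonpath='{.data.status}' | ca_cluster_summary, which reads the status key of the ConfigMap directly.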

That is pretty much all you need to know about the integrated CA solution.

The second option is for AKS clusters using the standard configuration with availability sets. Here you are responsible for deploying, configuring, and operating the cluster autoscaler yourself. The CA then runs on one of the agent nodes in the kube-system namespace.

To get started, we need to provide the necessary CA secret before we can deploy the cluster autoscaler onto our AKS cluster.

-> https://github.com/neumanndaniel/kubernetes/blob/master/cluster-autoscaler/ca-secret.yaml

apiVersion: v1
kind: Secret
metadata:
  name: cluster-autoscaler-azure
  namespace: kube-system
data:
  ClientID: <REDACTED>
  ClientSecret: <REDACTED>
  ResourceGroup: <REDACTED>
  SubscriptionID: <REDACTED>
  TenantID: <REDACTED>
  VMType: YWtz
  ClusterName: <REDACTED>
  NodeResourceGroup: <REDACTED>
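
All values in the data section of a Kubernetes Secret are base64-encoded, which is why VMType appears as YWtz instead of aks. The following short sketch shows the encoding and decoding of that value; b64 is a hypothetical helper name.

```shell
# Sketch: encode a value for the Secret's data section.
# tr removes the line wrapping that some base64 implementations add to long values.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

b64 "aks"; echo                               # prints YWtz
printf '%s' "YWtz" | base64 --decode; echo    # prints aks
```

Using printf '%s' instead of a plain echo avoids encoding a trailing newline into the value.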

The following script generates the content of the ca-secret.yaml file with all the necessary configuration values filled in. It prints the manifest to standard output, so redirect the output into ca-secret.yaml.

-> https://github.com/neumanndaniel/kubernetes/blob/master/cluster-autoscaler/ca-generate-secret.sh

#!/bin/bash
# Requires the Azure CLI, jq, and GNU base64 (--wrap=0 disables line wrapping).
# Every value in a Secret's data section must be base64-encoded.

# Subscription and tenant of the currently selected Azure CLI account
SUBSCRIPTION=$(az account show --query id -o tsv)
TENANT=$(az account show --query tenantId -o tsv)
SUBSCRIPTION_ID=$(echo -n "$SUBSCRIPTION" | base64 --wrap=0)
TENANT_ID=$(echo -n "$TENANT" | base64 --wrap=0)

read -r -p "What is your AKS cluster name? " AKS_CLUSTER_NAME
read -r -p "What is the AKS cluster resource group name? " AKS_RESOURCE_GROUP

CLUSTER_NAME=$(echo -n "$AKS_CLUSTER_NAME" | base64 --wrap=0)
RESOURCE_GROUP=$(echo -n "$AKS_RESOURCE_GROUP" | base64 --wrap=0)

# Create a service principal with the Contributor role on the subscription
PERMISSIONS=$(az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/$SUBSCRIPTION")
# jq -r prints the raw strings without the surrounding quotes
CLIENT_ID=$(echo "$PERMISSIONS" | jq -r .appId | tr -d '\n' | base64 --wrap=0)
CLIENT_SECRET=$(echo "$PERMISSIONS" | jq -r .password | tr -d '\n' | base64 --wrap=0)

NODE_RESOURCE_GROUP=$(az aks show --name "$AKS_CLUSTER_NAME" --resource-group "$AKS_RESOURCE_GROUP" -o tsv --query 'nodeResourceGroup' | tr -d '\n' | base64 --wrap=0)

echo "---
apiVersion: v1
kind: Secret
metadata:
    name: cluster-autoscaler-azure
    namespace: kube-system
data:
    ClientID: $CLIENT_ID
    ClientSecret: $CLIENT_SECRET
    ResourceGroup: $RESOURCE_GROUP
    SubscriptionID: $SUBSCRIPTION_ID
    TenantID: $TENANT_ID
    VMType: YWtz
    ClusterName: $CLUSTER_NAME
    NodeResourceGroup: $NODE_RESOURCE_GROUP
---"

Run kubectl apply -f ca-secret.yaml next.

Before we continue with the CA deployment, we need to make some adjustments to the ca-deployment.yaml file.

-> https://github.com/neumanndaniel/kubernetes/blob/master/cluster-autoscaler/ca-deployment.yaml

Have a look at lines 129 and 145. In line 129, we must adjust the cluster autoscaler container image tag to match the CA version required for the Kubernetes version our AKS cluster runs.

-> https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#releases

...
      - image: k8s.gcr.io/cluster-autoscaler:v1.12.3
...
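
A quick sanity check can catch a mismatched tag before the deployment. This sketch assumes the compatibility table on the linked releases page pairs each Kubernetes minor version with the same CA minor version, as it does for 1.12; minor_of and ca_tag_matches_k8s are hypothetical helper names.

```shell
# Sketch: compare the major.minor part of the CA image tag and the Kubernetes version.
minor_of() {
  # Strip a leading "v" and keep only major.minor, e.g. v1.12.3 -> 1.12
  echo "${1#v}" | cut -d. -f1,2
}

ca_tag_matches_k8s() {
  local ca_tag="$1" k8s_version="$2"
  [ "$(minor_of "$ca_tag")" = "$(minor_of "$k8s_version")" ]
}
```

For example, ca_tag_matches_k8s v1.12.3 1.12.6 succeeds, whereas ca_tag_matches_k8s v1.13.1 1.12.6 returns a non-zero exit code.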

In line 145, we may have to adjust the agent pool name to match that of our AKS cluster, as well as the CA range for the minimum and maximum number of nodes.

...
        - --nodes=3:6:agentpool
...

You can run the following commands to get the agent pool name.

read -r -p "What is your AKS cluster name? " AKS_CLUSTER_NAME
read -r -p "What is the AKS cluster resource group name? " AKS_RESOURCE_GROUP
az aks show --name "$AKS_CLUSTER_NAME" --resource-group "$AKS_RESOURCE_GROUP" -o tsv --query 'agentPoolProfiles[].name'

Afterwards run kubectl apply -f ca-deployment.yaml to deploy the cluster autoscaler.

As mentioned before, the CA creates a Kubernetes ConfigMap object to report the current state of the CA and the AKS cluster.

kubectl -n kube-system describe configmap cluster-autoscaler-status
Name:         cluster-autoscaler-status
Namespace:    kube-system
Labels:       <none>
Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2019-03-13 14:00:32.590378775 +0000 UTC

Data
====
status:
----
Cluster-autoscaler status at 2019-03-13 14:00:32.590378775 +0000 UTC:
Cluster-wide:
  Health:      Healthy (ready=4 unready=0 notStarted=0 longNotStarted=0 registered=4 longUnregistered=0)
               LastProbeTime:      2019-03-13 14:00:32.314378889 +0000 UTC m=+817.700367649
               LastTransitionTime: 2019-03-13 13:47:27.684689934 +0000 UTC m=+33.070678694
  ScaleUp:     NoActivity (ready=4 registered=4)
               LastProbeTime:      2019-03-13 14:00:32.314378889 +0000 UTC m=+817.700367649
               LastTransitionTime: 2019-03-13 13:47:27.684689934 +0000 UTC m=+33.070678694
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2019-03-13 14:00:32.314378889 +0000 UTC m=+817.700367649
               LastTransitionTime: 2019-03-13 13:47:27.684689934 +0000 UTC m=+33.070678694

NodeGroups:
  Name:        agentpool
  Health:      Healthy (ready=3 unready=0 notStarted=0 longNotStarted=0 registered=3 longUnregistered=0 cloudProviderTarget=3 (minSize=3, maxSize=6))
               LastProbeTime:      2019-03-13 14:00:32.314378889 +0000 UTC m=+817.700367649
               LastTransitionTime: 2019-03-13 13:47:27.684689934 +0000 UTC m=+33.070678694
  ScaleUp:     NoActivity (ready=3 cloudProviderTarget=3)
               LastProbeTime:      2019-03-13 14:00:32.314378889 +0000 UTC m=+817.700367649
               LastTransitionTime: 2019-03-13 13:47:27.684689934 +0000 UTC m=+33.070678694
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2019-03-13 14:00:32.314378889 +0000 UTC m=+817.700367649
               LastTransitionTime: 2019-03-13 13:47:27.684689934 +0000 UTC m=+33.070678694


Events:  <none>

 
