Working with Windows Server node pools in Azure Kubernetes Service

Back in May, Microsoft released the public preview of Windows Server support for Azure Kubernetes Service (AKS).

-> https://azure.microsoft.com/en-us/blog/announcing-the-preview-of-windows-server-containers-support-in-azure-kubernetes-service/

When you start working with Windows Server node pools in AKS, you should be aware of the following prerequisites and limitations.

  • Windows Server node pools require Azure CNI, also known as AKS advanced networking.
  • The first node pool is a Linux-based one hosting the Kubernetes system services and thus cannot be deleted. This node pool should have at least two nodes.
  • Kubernetes network policies, whether Azure NPM or Calico, are not supported.
  • Windows Server node pool names are limited to a maximum of 6 characters.
  • Azure Monitor for containers does not fully support Windows Server node pools.
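
The name restriction from the list above is easy to trip over in automation. The following is a minimal sketch of a pre-flight check, not an official validation: only the 6-character limit comes from the list above, while the function name and the rule that names are lowercase alphanumeric and start with a letter are assumptions made for this illustration.

```python
import re

# Limit taken from the list above: Windows Server node pool names
# allow 6 characters at most.
WINDOWS_POOL_NAME_MAX = 6

# Assumption for this sketch: names are lowercase alphanumeric and
# start with a letter.
_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*$")


def is_valid_windows_pool_name(name: str) -> bool:
    """Return True if the name fits the assumed Windows node pool naming rules."""
    return len(name) <= WINDOWS_POOL_NAME_MAX and bool(_NAME_PATTERN.match(name))


if __name__ == "__main__":
    for candidate in ("pool2", "winpool", "Pool2"):
        print(candidate, is_valid_windows_pool_name(candidate))
```

Running such a check before calling Terraform or an ARM deployment can surface a naming problem before the deployment starts.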

We will come back to some of these points later in this article.

First, I would like to highlight an important setting you should configure before deploying an AKS cluster with a Windows Server node pool: the option to specify one or several taints per node pool.

As we do not want pods to be scheduled on Windows Server nodes by accident, we must set a taint, for instance the following one, shown first in Terraform and then in an Azure Resource Manager template.

...
      taints = [
        "kubernetes.io/os=windows:NoSchedule"
      ]
...

...
          "taints": [
            "kubernetes.io/os=windows:NoSchedule"
          ]
...

This ensures that only pods tolerating the taint are scheduled on Windows Server nodes. If we had not specified the taint, Linux-based pods could be scheduled on the Windows Server nodes and would fail to start. I have seen this, for instance, when using Helm 2 with Tiller through the Helm Terraform provider. Thankfully, Helm 3 is available now and we do not need Tiller anymore.

Long story short, always set a taint for Windows Server node pools.
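
To make the effect of the taint more tangible, here is a simplified model of how a NoSchedule taint and a toleration are matched. It is a sketch of the documented matching rules and not the actual scheduler code; the class and function names are made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None
    operator: str = "Equal"
    value: Optional[str] = None
    effect: Optional[str] = None  # None tolerates every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration matches the given taint."""
    if toleration.effect is not None and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # With Exists the value is ignored; an empty key matches every taint key.
        return toleration.key is None or toleration.key == taint.key
    return toleration.key == taint.key and toleration.value == taint.value


def can_schedule(taints: List[Taint], tolerations: List[Toleration]) -> bool:
    """A pod fits if every NoSchedule taint is matched by at least one toleration."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect == "NoSchedule"
    )


if __name__ == "__main__":
    windows_taint = Taint("kubernetes.io/os", "windows", "NoSchedule")
    windows_toleration = Toleration("kubernetes.io/os", "Equal", "windows", "NoSchedule")
    print(can_schedule([windows_taint], []))                    # pod without toleration
    print(can_schedule([windows_taint], [windows_toleration]))  # pod with toleration
```

A Linux-based pod without any toleration is rejected by the tainted Windows Server node in this model, which is exactly the protection we want.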

> kubectl get nodes -o wide
NAME                            STATUS   ROLES   AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION      CONTAINER-RUNTIME
aks-pool1-33037761-vmss000000   Ready    agent   25m   v1.14.8   10.240.0.4     <none>        Ubuntu 16.04.6 LTS               4.15.0-1060-azure   docker://3.0.7
aks-pool1-33037761-vmss000001   Ready    agent   26m   v1.14.8   10.240.0.255   <none>        Ubuntu 16.04.6 LTS               4.15.0-1060-azure   docker://3.0.7
akspool2000000                  Ready    agent   22m   v1.14.8   10.240.1.250   <none>        Windows Server 2019 Datacenter   10.0.17763.737      docker://19.3.2
> kubectl describe nodes | grep -e "Name:" -e "Taints"
Name:               aks-pool1-33037761-vmss000000
Taints:             <none>
Name:               aks-pool1-33037761-vmss000001
Taints:             <none>
Name:               akspool2000000
Taints:             kubernetes.io/os=windows:NoSchedule
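
The same check can be automated. The following is a minimal sketch, not an official tool: it assumes the parsed output of `kubectl get nodes -o json` as input and that the nodes carry the `kubernetes.io/os` label, the same label the nodeSelector relies on later in this article. The function name and the sample data are made up for illustration.

```python
EXPECTED_TAINT = {
    "key": "kubernetes.io/os",
    "value": "windows",
    "effect": "NoSchedule",
}


def untainted_windows_nodes(node_list: dict) -> list:
    """Return the names of Windows nodes that lack the expected taint."""
    missing = []
    for node in node_list.get("items", []):
        labels = node["metadata"].get("labels", {})
        if labels.get("kubernetes.io/os") != "windows":
            continue  # Linux nodes are not expected to carry the taint
        taints = node.get("spec", {}).get("taints") or []
        has_taint = any(
            all(taint.get(k) == v for k, v in EXPECTED_TAINT.items())
            for taint in taints
        )
        if not has_taint:
            missing.append(node["metadata"]["name"])
    return missing


if __name__ == "__main__":
    # Hand-written sample shaped like `kubectl get nodes -o json` output.
    sample = {
        "items": [
            {
                "metadata": {
                    "name": "aks-pool1-33037761-vmss000000",
                    "labels": {"kubernetes.io/os": "linux"},
                },
                "spec": {},
            },
            {
                "metadata": {
                    "name": "akspool2000000",
                    "labels": {"kubernetes.io/os": "windows"},
                },
                "spec": {"taints": [dict(EXPECTED_TAINT)]},
            },
        ]
    }
    print(untainted_windows_nodes(sample))
```

For a real cluster, the input would come from loading the kubectl JSON output with `json.load` instead of the hand-written sample.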

Setting the taint does not break the AKS cluster, because all Kubernetes system services run on the first, Linux-based node pool.

> kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                                  READY   STATUS    RESTARTS   AGE     IP             NODE                            NOMINATED NODE   READINESS GATES
kube-system   calico-node-jsnwk                                     1/1     Running   0          7m30s   10.240.0.4     aks-pool1-33037761-vmss000000   <none>           <none>
kube-system   calico-node-kdbz9                                     1/1     Running   0          8m35s   10.240.0.255   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   calico-typha-85664b5b66-4fj9d                         1/1     Running   0          6m26s   10.240.0.255   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   calico-typha-horizontal-autoscaler-77df4784d7-rfpd6   1/1     Running   0          6m29s   10.240.1.126   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   coredns-7fc597cc45-g8wkf                              1/1     Running   0          6m29s   10.240.0.156   aks-pool1-33037761-vmss000000   <none>           <none>
kube-system   coredns-7fc597cc45-gpq86                              1/1     Running   0          6m30s   10.240.1.120   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   coredns-autoscaler-7ccc76bfbd-6qt9m                   1/1     Running   0          6m25s   10.240.1.123   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   kube-proxy-9dngq                                      1/1     Running   0          3m5s    10.240.0.255   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   kube-proxy-tqncv                                      1/1     Running   0          2m30s   10.240.0.4     aks-pool1-33037761-vmss000000   <none>           <none>
kube-system   metrics-server-58b6fcfd54-hvghz                       1/1     Running   0          6m29s   10.240.1.241   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   omsagent-ktjc8                                        1/1     Running   1          8m35s   10.240.1.226   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   omsagent-nd66v                                        1/1     Running   0          7m30s   10.240.0.10    aks-pool1-33037761-vmss000000   <none>           <none>
kube-system   omsagent-rs-649477b4c8-4f4b9                          1/1     Running   1          6m28s   10.240.1.70    aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   tunnelfront-dbd5b5b9b-bxhvq                           1/1     Running   0          6m26s   10.240.1.106   aks-pool1-33037761-vmss000001   <none>           <none>

Here is an example of the node pool configuration for an AKS cluster with a Windows Server node pool, first as a Terraform module call and then as an Azure Resource Manager template parameter.

module "aks" {
  source = "../modules/aks-windows"
  ...
  agent_pool_configuration = [
    {
      agent_count = 2
      vm_size     = "Standard_D2_v3"
      zones       = ["1", "2"]
      agent_os    = "Linux"
      taints      = null
    },
    {
      agent_count = 1
      vm_size     = "Standard_D4_v3"
      zones       = ["1", "2"]
      agent_os    = "Windows"
      taints = [
        "kubernetes.io/os=windows:NoSchedule"
      ]
    }
  ]
}

...
    "agentPoolProfiles": {
      "value": [
        {
          "nodeCount": 2,
          "nodeVmSize": "Standard_D2_v3",
          "nodeOsType": "Linux",
          "availabilityZones": [
            "1",
            "2"
          ],
          "enableAutoScaling": false,
          "taints": null
        },
        {
          "nodeCount": 1,
          "nodeVmSize": "Standard_D4_v3",
          "nodeOsType": "Windows",
          "availabilityZones": [
            "1",
            "2"
          ],
          "enableAutoScaling": false,
          "taints": [
            "kubernetes.io/os=windows:NoSchedule"
          ]
        }
      ]
    },
...

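Our next step is to adjust the Kubernetes templates so that our Windows containers get scheduled on the correct nodes. All we need to do is define the tolerations and the nodeSelector options. Both settings are mandatory in our case: the toleration allows the pods onto the tainted Windows Server nodes, and the nodeSelector keeps them off the Linux nodes.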

...
    spec:
      tolerations:
        - key: kubernetes.io/os
          operator: Equal
          value: windows
          effect: NoSchedule
      nodeSelector:
        "kubernetes.io/os": windows
      containers:
...

> kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                                  READY   STATUS    RESTARTS   AGE   IP             NODE                            NOMINATED NODE   READINESS GATES
ambassador    ambassador-f69b8b555-7v8tb                            1/1     Running   0          17m   10.240.1.144   aks-pool1-33037761-vmss000001   <none>           <none>
ambassador    ambassador-f69b8b555-gmf2v                            1/1     Running   0          19m   10.240.0.125   aks-pool1-33037761-vmss000000   <none>           <none>
default       helloworld-655645ffc4-b9g4j                           1/1     Running   0          11m   10.240.2.119   akspool2000000                  <none>           <none>
kube-system   calico-node-jsnwk                                     1/1     Running   0          36m   10.240.0.4     aks-pool1-33037761-vmss000000   <none>           <none>
kube-system   calico-node-kdbz9                                     1/1     Running   0          37m   10.240.0.255   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   calico-typha-85664b5b66-4fj9d                         1/1     Running   0          35m   10.240.0.255   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   calico-typha-horizontal-autoscaler-77df4784d7-rfpd6   1/1     Running   0          35m   10.240.1.126   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   coredns-7fc597cc45-g8wkf                              1/1     Running   0          35m   10.240.0.156   aks-pool1-33037761-vmss000000   <none>           <none>
kube-system   coredns-7fc597cc45-gpq86                              1/1     Running   0          35m   10.240.1.120   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   coredns-autoscaler-7ccc76bfbd-6qt9m                   1/1     Running   0          35m   10.240.1.123   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   kube-proxy-9dngq                                      1/1     Running   0          32m   10.240.0.255   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   kube-proxy-tqncv                                      1/1     Running   0          31m   10.240.0.4     aks-pool1-33037761-vmss000000   <none>           <none>
kube-system   metrics-server-58b6fcfd54-hvghz                       1/1     Running   0          35m   10.240.1.241   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   omsagent-ktjc8                                        1/1     Running   1          37m   10.240.1.226   aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   omsagent-nd66v                                        1/1     Running   0          36m   10.240.0.10    aks-pool1-33037761-vmss000000   <none>           <none>
kube-system   omsagent-rs-649477b4c8-4f4b9                          1/1     Running   1          35m   10.240.1.70    aks-pool1-33037761-vmss000001   <none>           <none>
kube-system   tunnelfront-dbd5b5b9b-bxhvq                           1/1     Running   0          35m   10.240.1.106   aks-pool1-33037761-vmss000001   <none>           <none>
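
If many templates need the same adjustment, the tolerations and nodeSelector settings can also be added programmatically. The following is a minimal sketch, assuming the Deployment manifest has already been loaded into a Python dictionary (for example with a YAML parser); the function name, the sample manifest, and the image name are hypothetical.

```python
import copy

WINDOWS_TOLERATION = {
    "key": "kubernetes.io/os",
    "operator": "Equal",
    "value": "windows",
    "effect": "NoSchedule",
}
WINDOWS_NODE_SELECTOR = {"kubernetes.io/os": "windows"}


def target_windows_nodes(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest whose pods target Windows nodes."""
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    pod_spec = patched["spec"]["template"]["spec"]
    tolerations = pod_spec.setdefault("tolerations", [])
    if WINDOWS_TOLERATION not in tolerations:
        tolerations.append(dict(WINDOWS_TOLERATION))
    pod_spec.setdefault("nodeSelector", {}).update(WINDOWS_NODE_SELECTOR)
    return patched


if __name__ == "__main__":
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "helloworld"},
        "spec": {
            "template": {
                "spec": {
                    # Hypothetical image name, for illustration only.
                    "containers": [{"name": "helloworld", "image": "example/helloworld:1.0"}]
                }
            }
        },
    }
    patched = target_windows_nodes(deployment)
    print(patched["spec"]["template"]["spec"]["nodeSelector"])
    print(patched["spec"]["template"]["spec"]["tolerations"])
```

The function works on a copy and skips the toleration if it is already present, so applying it twice yields the same manifest.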

Last but not least, there is the Azure Monitor for containers support. Windows Server node pools are not fully supported, as Windows Server nodes do not have an omsagent pod running. As a result, no logs are currently collected for Windows containers. The container live logs functionality does work, though, so we are not left entirely without logging right now.
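
The missing agent can be confirmed from the pod list. The following is a minimal sketch that assumes the parsed output of `kubectl get pods -n kube-system -o json` as input and matches the agent pods by the `omsagent` name prefix seen in the listings above; the function name and the sample data are made up for illustration.

```python
def nodes_without_pod(node_names: list, pod_list: dict, name_prefix: str) -> list:
    """Return the nodes that run no pod whose name starts with name_prefix."""
    covered = {
        pod["spec"]["nodeName"]
        for pod in pod_list.get("items", [])
        # Pods that are not scheduled yet have no nodeName and are skipped.
        if pod["metadata"]["name"].startswith(name_prefix) and pod["spec"].get("nodeName")
    }
    return [name for name in node_names if name not in covered]


if __name__ == "__main__":
    nodes = [
        "aks-pool1-33037761-vmss000000",
        "aks-pool1-33037761-vmss000001",
        "akspool2000000",
    ]
    # Hand-written sample shaped like `kubectl get pods -o json` output.
    pods = {
        "items": [
            {"metadata": {"name": "omsagent-ktjc8"},
             "spec": {"nodeName": "aks-pool1-33037761-vmss000001"}},
            {"metadata": {"name": "omsagent-nd66v"},
             "spec": {"nodeName": "aks-pool1-33037761-vmss000000"}},
            {"metadata": {"name": "kube-proxy-9dngq"},
             "spec": {"nodeName": "aks-pool1-33037761-vmss000001"}},
        ]
    }
    print(nodes_without_pod(nodes, pods, "omsagent"))
```

With the sample data taken from the listings above, only the Windows Server node is reported as running without an agent pod.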

I hope you got some useful information about working with Windows Server node pools in Azure Kubernetes Service.

To summarize, these are the important steps you need to take care of.

  • Specify taints for Windows Server node pools.
  • Specify the nodeSelector option and tolerations in your Kubernetes templates.

You can find the code samples for Terraform and Azure Resource Manager templates, as well as the example Kubernetes template, in my GitHub repositories.

-> https://github.com/neumanndaniel/terraform/tree/master/modules/aks-windows
-> https://github.com/neumanndaniel/armtemplates/tree/master/container
-> https://github.com/neumanndaniel/kubernetes/blob/master/windows/hello-world.yaml
