On an Azure Kubernetes Service cluster with Bring Your Own Container Network Interface (BYOCNI) using Cilium, you could not use Cilium’s agent-not-ready taint functionality.
-> https://docs.cilium.io/en/stable/installation/taints/
The reason is that the Azure control plane blocks add and remove operations on taints via the Kubernetes API; taints must be managed via the Azure Kubernetes Service API. Unfortunately, that prevented the use of Cilium’s agent-not-ready taint functionality until now. Microsoft recently introduced node initialization taints for Azure Kubernetes Service.
Node initialization taints are normal node taints set via the Azure Kubernetes Service API, with the difference that they can be removed via the Kubernetes API.
-> https://learn.microsoft.com/en-us/azure/aks/use-node-taints?WT.mc_id=AZ-MVP-5000119#use-node-initialization-taints-preview
-> https://learn.microsoft.com/en-us/azure/aks/use-node-taints?WT.mc_id=AZ-MVP-5000119#node-taint-options
This finally allows us to use Cilium’s agent-not-ready taint functionality. But why would you want to use it?
Cilium’s agent-not-ready taint functionality
Whether you have to use Cilium’s agent-not-ready taint functionality is highly dependent on your environment. When Cilium is the only CNI plugin that gets installed on your Kubernetes nodes, you strictly do not need it, as Cilium then runs as the exclusive CNI plugin. Still, the taint guarantees that no pod gets scheduled onto a node before the Cilium agent is up and running.
Ensuring that Cilium runs as the exclusive CNI guarantees that all pods in a Kubernetes cluster are managed by and known to Cilium.
Hence, it is recommended to use Cilium’s agent-not-ready taint functionality. More details are available in the Cilium documentation linked above.
Azure Kubernetes Service and Cilium taints
In the following demonstration, we configure our Azure Kubernetes Service cluster with the node initialization taint “ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready:NoExecute”. NoExecute is the recommended effect for Cilium, but depending on your environment, NoSchedule might be a better fit. Using the “ignore-taint.cluster-autoscaler.kubernetes.io” prefix ensures that the Kubernetes Cluster Autoscaler continues to work as intended.
After the initial feature flag configuration, we add the node initialization taint via the Azure CLI.
❯ az aks update -g rg-azst-1 -n aks-azst-1 --node-init-taints "ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready:NoExecute"
Argument '--nodepool-initialization-taints' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
The behavior of this command has been altered by the following extension: aks-preview
Taint ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready:NoExecute with hard effect will be skipped from system pool
❯ az aks show -g rg-azst-1 -n aks-azst-1 | grep nodeInitializationTaints -A2
WARNING: The behavior of this command has been altered by the following extension: aks-preview
    "nodeInitializationTaints": null,
    "nodeLabels": {
      "environment": "demo"
--
    "nodeInitializationTaints": [
      "ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready"
    ],
The Azure Kubernetes Service cluster is configured with two node pools: one system node pool and one user node pool.
Additionally, the system node pool is tainted with “CriticalAddonsOnly=true:NoSchedule” to ensure that only system workloads run on it. As you might have seen in the output above, system node pools in Azure Kubernetes Service do not allow taints with the hard effects NoExecute or NoSchedule, except for “CriticalAddonsOnly=true:NoSchedule”. Taints with the PreferNoSchedule effect, so-called soft taints, are allowed.
-> https://github.com/Azure/AKS/issues/2578#issuecomment-932594106
When you only have one node pool in your Azure Kubernetes Service cluster, you must use “ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready:PreferNoSchedule” for the node initialization taint.
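For such single node pool clusters, only the taint effect changes in the cluster-level update shown earlier. A sketch, reusing the resource group and cluster names from this demonstration:

```shell
# Sketch for a cluster with a single (system) node pool: use the soft
# PreferNoSchedule effect, as the system pool rejects hard effects like
# NoExecute or NoSchedule. Names are taken from the demonstration above.
az aks update -g rg-azst-1 -n aks-azst-1 \
  --node-init-taints "ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready:PreferNoSchedule"
```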
Another observation I would like to share about this preview feature: at the time of writing, node initialization taints can only be set via a cluster-level operation. The node pool-level assignment via infrastructure as code only works for the system node pool defined within the Azure Kubernetes Service cluster object itself.
"details": [
  {
    "code": "BadRequest",
    "target": "/subscriptions/<REDACTED>/resourceGroups/rg-azst-1/providers/Microsoft.ContainerService/managedClusters/aks-azst-1/agentPools/defaultuser",
    "message": {
      "code": "NodeInitializationTaintsFeatureNotSupported",
      "details": null,
      "message": "Node initialization taints not supported: only managed cluster level operations can contain node initialization taints field. Please retry with a managed cluster level operation.",
      "subcode": ""
    }
  }
]
I assume this is a preview limitation, as the official infrastructure-as-code documentation outlines the setting at the node pool level.
Cilium has been installed via the Helm chart with a modified value for the setting agentNotReadyTaintKey, which defaults to “node.cilium.io/agent-not-ready”.
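As a minimal sketch, the Helm values override looks like this, assuming the taint key chosen for this demonstration:

```yaml
# values.yaml sketch: override Cilium's agentNotReadyTaintKey, which defaults
# to "node.cilium.io/agent-not-ready", so it matches the node initialization
# taint configured via the Azure Kubernetes Service API.
agentNotReadyTaintKey: "ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready"
```

The same value can alternatively be passed with --set agentNotReadyTaintKey=... on helm install or helm upgrade.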
❯ kubectl get configmaps cilium-config -o yaml | grep 'agent-not-ready-taint-key'
  agent-not-ready-taint-key: ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready
Once the node initialization taint has been added, new nodes will be created with the taint. Existing nodes can be updated by running an Azure Kubernetes Service node OS image upgrade or a Kubernetes version upgrade that replaces the existing nodes.
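The node OS image upgrade mentioned above can be sketched with the Azure CLI; the node pool name “defaultuser” is an assumption derived from the node names shown below:

```shell
# Sketch: recreate the existing nodes of the user node pool via a node OS
# image upgrade, so the replacement nodes come up with the node
# initialization taint. Pool name "defaultuser" is an assumption.
az aks nodepool upgrade -g rg-azst-1 --cluster-name aks-azst-1 \
  -n defaultuser --node-image-only
```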
The user node pool uses the Cluster Autoscaler, and once that kicks in and provisions a new node, we can observe the result by running “kubectl get nodes aks-defaultuser-30089184-vmss000003 -o yaml” several times in succession to see the progress, like below.
apiVersion: v1
kind: Node
metadata:
  ...
  creationTimestamp: "2025-11-04T20:59:38Z"
  labels:
    agentpool: defaultuser
    ...
  name: aks-defaultuser-30089184-vmss000003
  resourceVersion: "1487322"
  uid: 07560115-3263-4ae8-87d5-4b69bed7fdf1
spec:
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
  - effect: NoSchedule
    key: node.kubernetes.io/not-ready
  - effect: NoSchedule
    key: ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    timeAdded: "2025-11-04T20:59:39Z"
  ...
---
apiVersion: v1
kind: Node
metadata:
  ...
  creationTimestamp: "2025-11-04T20:59:38Z"
  labels:
    agentpool: defaultuser
    ...
  name: aks-defaultuser-30089184-vmss000003
  resourceVersion: "1487511"
  uid: 07560115-3263-4ae8-87d5-4b69bed7fdf1
spec:
  ...
  taints:
  - effect: NoSchedule
    key: ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    timeAdded: "2025-11-04T20:59:44Z"
  ...
---
apiVersion: v1
kind: Node
metadata:
  ...
  creationTimestamp: "2025-11-04T20:59:38Z"
  labels:
    agentpool: defaultuser
    ...
  name: aks-defaultuser-30089184-vmss000003
  resourceVersion: "1487890"
  uid: 07560115-3263-4ae8-87d5-4b69bed7fdf1
spec:
  ...
As soon as the Cilium agent is up and running, the taint “ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready” is removed from the node. Observing the output above closely, the taint was applied with the NoSchedule effect instead of NoExecute, which might be a bug in the current preview.
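To follow this more conveniently than with full YAML dumps, the node’s taints can be printed directly; a small sketch using the node name from above:

```shell
# Print only the node's taints; run repeatedly while the Cilium agent starts
# to watch the agent-not-ready taint disappear.
kubectl get node aks-defaultuser-30089184-vmss000003 \
  -o jsonpath='{range .spec.taints[*]}{.key}:{.effect}{"\n"}{end}'
```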
Summary
The Azure Kubernetes Service node initialization taint feature enables us to use Cilium’s agent-not-ready taint functionality. With it, Cilium can ensure that no pods start on a Kubernetes node before the Cilium agent is ready.
