This is the first part of a three-part series about “An experiment – Enable Cilium native routing on Azure Kubernetes Service BYOCNI”.
Cilium supports two routing modes: encapsulation and native routing. Because it does not depend on the underlying network, the encapsulation mode, also called tunneling, is the default for most Cilium installations. The native routing mode, in contrast, can lead to lower latency and higher throughput, as Cilium delegates packets to the Linux kernel’s routing subsystem whenever they are not addressed to another local endpoint on the same Kubernetes node.
-> https://docs.cilium.io/en/latest/network/concepts/routing/
Today, we look at Cilium running on Azure Kubernetes Service with the bring your own CNI mode and how to enable native routing for Cilium.
We need to fulfill two requirements before we can enable the native routing mode for Cilium. First, the underlying network must be capable of forwarding IP traffic using the pods’ IP addresses. Second, each Kubernetes node must know how to forward packets destined for pods. For the second requirement, Cilium offers two sub-options, of which one needs to be fulfilled.
- A router on the network exists that can forward the packets and knows how to reach all pods. According to the Cilium documentation, this model is used for cloud provider network integration.
- Each Kubernetes node is made aware of all pod IP addresses within the Kubernetes cluster. This can happen on a single L2 network with Cilium’s option “auto-direct-node-routes” set to “true” or via BGP.
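The second sub-option can be sketched as a Helm install. The release name, repository alias, and CIDR below are assumptions, and “auto-direct-node-routes” only works when all nodes share the same L2 segment:

```shell
# Sketch: enable native routing with direct node routes via Helm.
# Assumes all cluster nodes sit on the same L2 network (e.g. a single subnet)
# and that 10.244.0.0/16 is the cluster's pod CIDR -- adjust to your environment.
helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --set routingMode=native \
  --set autoDirectNodeRoutes=true \
  --set ipv4NativeRoutingCIDR=10.244.0.0/16 \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=10.244.0.0/16
```

With these values, each Cilium agent installs node routes for the pod CIDRs of its peers instead of encapsulating traffic.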
Before we dive deeper into the details, let us have a quick look at AKS Azure CNI powered by Cilium and AKS Automatic. Those two other options use the native routing mode in conjunction with the delegated Azure IPAM.
```
# AKS Azure CNI powered by Cilium
❯ kubectl exec -it cilium-456hr -- cilium-dbg status
...
KubeProxyReplacement:    True   [eth0   10.224.0.5 fe80::7eed:8dff:fe42:4d80 (Direct Routing)]
...
CNI Chaining:            none
CNI Config file:         CNI configuration management disabled
Cilium:                  Ok   1.18.2 (v1.18.2-263cb49397)
...
Cilium health daemon:    Ok
IPAM:                    IPv4: delegated to plugin,
...
Routing:                 Network: Native   Host: Legacy
Attach Mode:             Legacy TC
Device Mode:             veth
Masquerading:            Disabled

# AKS Automatic
❯ kubectl exec -it cilium-5wxtq -- cilium-dbg status
...
KubeProxyReplacement:    True   [eth0   10.224.0.6 fe80::7eed:8dff:fe2c:8ce5 (Direct Routing)]
...
CNI Chaining:            none
CNI Config file:         CNI configuration management disabled
Cilium:                  Ok   1.17.9 (v1.17.9-a90719a94a)
...
Cilium health daemon:    Ok
IPAM:                    IPv4: delegated to plugin,
...
Routing:                 Network: Native   Host: Legacy
Attach Mode:             Legacy TC
Device Mode:             veth
Masquerading:            Disabled
```
In general, the Azure Virtual Network is capable of forwarding IP traffic, but it has its own specifics with its default system routes implementation. If a CIDR range is not part of the address space of a Virtual Network, Azure implements default system routes with the next hop “None”. That means packets to this CIDR range will be dropped rather than routed over the network.
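You can observe this behavior by inspecting the effective routes of a node’s network interface. The resource group and NIC name below are placeholders:

```shell
# Show the effective route table of an AKS node's network interface.
# "<rg>" and "<nic-name>" are placeholders for the node resource group and
# the node's NIC name (discoverable via `az network nic list`).
az network nic show-effective-route-table \
  --resource-group <rg> \
  --name <nic-name> \
  --output table
# Routes whose "Next Hop Type" is "None" mark traffic that Azure will drop.
```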
However, adding the pod CIDR range to the Virtual Network’s address space changes the next hop from “None” to “Virtual Network”. Yet, even this does not help us in the case of Azure Kubernetes Service BYOCNI, as the Kubernetes pods neither get their IP addresses assigned from the Azure SDN stack nor is the Azure SDN stack made aware of the Kubernetes pods’ IP addresses. Hence, the Azure SDN stack is completely unaware of how to forward the packets. For AKS Azure CNI powered by Cilium and AKS Automatic, the Azure CNI plugin with the delegated Azure IPAM takes care of this.
So, how can we enable Cilium native routing on Azure Kubernetes Service BYOCNI?
There are three options available:
- Using WireGuard Transparent Encryption
- Using user-defined routes with an Azure Route Table to tell the Azure SDN stack where to forward packets from Kubernetes pods.
- Using an Azure Route Server/Azure Virtual WAN and configure Cilium to advertise the routes via BGP. The Azure Route Server/Azure Virtual WAN tells the Azure SDN stack where to forward packets from the Kubernetes pods.
Looking at the first option, you might wonder: why WireGuard Transparent Encryption? WireGuard establishes a tunnel between each pair of Kubernetes nodes in the cluster and thus creates an encrypted overlay on top of the Azure Virtual Network. As with the encapsulation mode, this encrypted overlay does not depend on the underlying network to function.
-> https://docs.cilium.io/en/stable/security/network/encryption-wireguard/
Thus, Cilium native routing just works out of the box. Nevertheless, it is still not 100% native routing, as we still have encapsulation through the WireGuard tunnel.
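Enabling this option boils down to turning on WireGuard encryption alongside native routing. A minimal Helm sketch, where the release name and repository alias are assumptions:

```shell
# Sketch: enable Cilium's WireGuard transparent encryption via Helm.
# encryption.enabled and encryption.type are standard Cilium Helm values;
# the release name "cilium" is an assumption.
helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --set routingMode=native \
  --set encryption.enabled=true \
  --set encryption.type=wireguard
```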
“When running in tunnel routing mode, pod-to-pod traffic is encapsulated twice. It is first sent to the VXLAN / Geneve tunnel interface, and then subsequently also encapsulated by the WireGuard tunnel.”
In a follow-up blog post, we will dive deeper into Cilium native routing on Azure Kubernetes Service BYOCNI with WireGuard Transparent Encryption.
The second option truly supports Cilium native routing: it overrides the default system route for the pod CIDR range and tells the Azure SDN stack that traffic destined for a node’s assigned pod CIDR range has to be sent to that Kubernetes node.
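As an illustration, programming such a user-defined route for a single node might look like the following. All names, the pod CIDR, and the node IP are hypothetical placeholders:

```shell
# Sketch: route one node's pod CIDR to that node via a user-defined route.
# Assumes a route table "rt-aks" already exists and is associated with the
# AKS node subnet; 10.244.1.0/24 and 10.224.0.5 are example values.
az network route-table route create \
  --resource-group <rg> \
  --route-table-name rt-aks \
  --name node-1-pod-cidr \
  --address-prefix 10.244.1.0/24 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address 10.224.0.5
# One such route is needed per node, which is why automation is required.
```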
Unfortunately, there is no automation available for this. For my testing, I spun up a small Azure Kubernetes Service cluster and adjusted the Azure Route Table manually. For production, anyone who wants to go down this route needs to implement automation that keeps the Azure Route Table up to date. Furthermore, an Azure Route Table is limited in the number of routes you can add. Hence, this approach does not scale for large clusters with thousands of nodes.
Fortunately, there is a third option with Azure Route Server or Azure Virtual WAN leveraging Cilium’s BGP capabilities. The Azure Route Server has a limitation of a maximum of eight BGP peers that can be configured. In a follow-up blog post, we will dive deeper into Cilium native routing on Azure Kubernetes Service BYOCNI with Azure Route Server and how to mitigate the BGP peer limitation. This option also requires automation for adding the BGP peers.
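To sketch the BGP side, Cilium’s BGP control plane can peer with Azure Route Server, which always speaks BGP with the fixed ASN 65515. The local ASN, peer addresses, and node label below are example assumptions:

```shell
# Sketch: peer Cilium's BGP control plane with Azure Route Server.
# Requires Cilium installed with --set bgpControlPlane.enabled=true.
# 65515 is Azure Route Server's fixed ASN; the local ASN, the two Route
# Server peer IPs, and the node selector are example values.
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: route-server-peering
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/os: linux
  virtualRouters:
    - localASN: 65010
      exportPodCIDR: true
      neighbors:
        - peerAddress: 10.1.0.4/32
          peerASN: 65515
        - peerAddress: 10.1.0.5/32
          peerASN: 65515
EOF
```

With “exportPodCIDR” set, each node advertises its pod CIDR, and the Route Server injects the learned routes into the Azure SDN stack.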
Summary
In the end, there are three options to enable Cilium native routing on Azure Kubernetes Service BYOCNI: one that still relies on encapsulation but enables secure network communication by encrypting the network traffic; a second one, Azure Route Tables, that programs the Azure SDN stack but does not scale to large cluster deployments; and a third one leveraging BGP and Azure Route Server to program the Azure SDN stack.
Outlook
In the second part of the series, we will have a look at Cilium native routing with WireGuard Transparent Encryption, and in the third part, at Cilium native routing with Azure Route Server and BGP.
