
Performance comparison Cilium native routing on Azure Kubernetes Service BYOCNI

As a follow-up to the series about Cilium native routing on Azure Kubernetes Service BYOCNI, today’s blog post focuses on the performance comparison between the different configurations.

-> https://www.danielstechblog.io/an-experiment-enable-cilium-native-routing-on-azure-kubernetes-service-byocni-part-1/
-> https://www.danielstechblog.io/an-experiment-enable-cilium-native-routing-on-azure-kubernetes-service-byocni-part-2/
-> https://www.danielstechblog.io/an-experiment-enable-cilium-native-routing-on-azure-kubernetes-service-byocni-part-3/

  • BYOCNI – Cilium VXLAN encapsulation
  • BYOCNI – Cilium VXLAN encapsulation + WireGuard encryption
  • BYOCNI – Cilium native routing + WireGuard encryption
  • BYOCNI – Cilium native routing

This will be the first part, and in the second part, we will have a look at the performance comparison between the following Azure Kubernetes Service network configurations.

  • Azure CNI powered by Cilium – CNI Overlay
  • Azure CNI powered by Cilium – CNI Subnet
  • BYOCNI Cilium native routing

Test setup

For the test setup, I used several Azure Kubernetes Service clusters running Kubernetes 1.35.1. The clusters have one system node pool with Standard_D4das_v5 as VM SKU and a user node pool with Standard_D4ds_v5 as VM SKU.
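For reference, a comparable BYOCNI cluster can be created with the Azure CLI along these lines; resource group, cluster name, region, and node counts are placeholders and not the exact values used for the tests:

```shell
# Create the cluster without a CNI plugin (BYOCNI) and with the system node pool.
az aks create \
  --resource-group rg-cilium-perf \
  --name aks-cilium-perf \
  --location westeurope \
  --network-plugin none \
  --node-count 3 \
  --node-vm-size Standard_D4das_v5

# Add the user node pool that hosts the test workloads.
az aks nodepool add \
  --resource-group rg-cilium-perf \
  --cluster-name aks-cilium-perf \
  --name userpool \
  --mode User \
  --node-count 3 \
  --node-vm-size Standard_D4ds_v5
```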

In the first part, the most recent Cilium version, 1.19.3, was used, whereas the second part used 1.18.6, the latest available version for Azure CNI powered by Cilium.
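Since a BYOCNI cluster ships without a CNI plugin, Cilium is installed manually, for example via Helm. Below is a minimal sketch for the default BYOCNI installation, which uses VXLAN encapsulation; the values match Cilium’s AKS BYOCNI installation docs, while the native routing and WireGuard variants need additional values, as covered in the series linked above:

```shell
# Install Cilium 1.19.3 in the default AKS BYOCNI configuration (VXLAN).
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --version 1.19.3 \
  --namespace kube-system \
  --set aksbyocni.enabled=true \
  --set nodeinit.enabled=true
```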

The testing suite consists of three iperf3 and three netperf tests focusing on throughput and latency. On the Azure Kubernetes Service cluster, we run an iperf3 server, a netperf server, and a client. The servers and the client run on different nodes in the cluster.

-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/network-performance-testing.yaml

The YAML template for the test resources was created with GitHub CoPilot using the Claude Sonnet 4.6 model.
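For orientation, the test runs follow this general shape; the pod name and server IPs below are hypothetical placeholders, and the exact commands and parameters are documented in the per-test Markdown files linked in the next section:

```shell
# Throughput tests against the iperf3 server.
kubectl exec -it client -- iperf3 -c "$IPERF_SERVER_IP" -t 30            # TCP throughput
kubectl exec -it client -- iperf3 -c "$IPERF_SERVER_IP" -t 30 -u -b 0    # UDP throughput and lost datagrams
kubectl exec -it client -- iperf3 -c "$IPERF_SERVER_IP" -t 30 --bidir    # bidirectional throughput

# Latency and stream tests against the netperf server.
kubectl exec -it client -- netperf -H "$NETPERF_SERVER_IP" -t TCP_RR     # TCP request/response latency
kubectl exec -it client -- netperf -H "$NETPERF_SERVER_IP" -t UDP_RR     # UDP request/response latency
kubectl exec -it client -- netperf -H "$NETPERF_SERVER_IP" -t TCP_STREAM # TCP stream throughput
```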

Azure Kubernetes Service BYOCNI – Cilium performance tests

In the table below, you find the test summary of the first test run, which is also available as a Markdown file.

-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/00_Summary.md

All tests are documented in detail as Markdown files, outlining the test setup configuration and the test commands that have been executed.

-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/01_Cilium_VXLAN.md

-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/02_Cilium_VXLAN_WireGuard.md

-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/03_Cilium_Native_Routing_WireGuard.md

-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/04_Cilium_Native_Routing.md

| Test | VXLAN | VXLAN + WireGuard | Native Routing + WireGuard | Native Routing |
|---|---|---|---|---|
| TCP Throughput (Gbit/s) | 7.6 | 2.1 | 2.4 | 12.0 |
| UDP Throughput (Gbit/s) | 1.5 | 0.8 | 1.0 | 2.0 |
| UDP Throughput Lost Datagrams (%) | 3.4 | 0.6 | 0.3 | 1.2 |
| Bidirectional Throughput (TX-C/RX-C) (Gbit/s) | 3.8/2.4 | 1.7/0.5 | 1.7/0.5 | 12.0/11.0 |
| TCP Latency (Mean µs/Transaction Rate) | 165/6057 | 237/4215 | 224/4449 | 124/8072 |
| UDP Latency (Mean µs/Transaction Rate) | 164/6077 | 237/4212 | 232/4300 | 122/8158 |
| TCP Stream (Throughput 10^6 bits/sec) | 6927 | 2164 | 2225 | 11927 |

Looking at the test results, VXLAN encapsulation delivers decent performance that should be sufficient for most environments. As expected, the native routing mode delivers the highest raw throughput and the lowest latency.

An interesting observation is the WireGuard encryption test results: whether WireGuard encryption is combined with VXLAN encapsulation or with native routing does not change the results significantly. It seems that WireGuard encryption is the limiting factor when using the default network MTU of 1500.

I have not run any tests with a larger MTU setting. Hence, I cannot say if it improves the numbers, but I would guess so.
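For context, this is how WireGuard encryption is toggled via the standard Cilium Helm values, and how the resulting MTU of Cilium’s WireGuard interface can be inspected on an agent pod; everything else about the setup stays as described above:

```shell
# Enable WireGuard transparent encryption (standard Cilium Helm values).
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard

# Inspect the MTU of Cilium's WireGuard interface on one of the agent pods.
kubectl exec -n kube-system ds/cilium -- ip link show cilium_wg0
```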

Azure Kubernetes Service, Azure CNI powered by Cilium performance tests

Before we have a look at the test summary of the second part, I would like to highlight something first. Using Cilium 1.18.6 on Azure Kubernetes Service produced fluctuating test results. Sometimes better, sometimes worse, as seen in the table below.

-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/00_Summary.md
-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/05_AKS_CNI_Overlay_Cilium.md
-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/06_AKS_CNI_Node_Subnet_Cilium.md
-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/07_Cilium_Native_Routing_AKS_Version_Match.md

| Test | Azure CNI Overlay Native | Azure CNI Subnet Native | BYOCNI Native Routing | BYOCNI Native Routing 1.19.3 |
|---|---|---|---|---|
| TCP Throughput (Gbit/s) | 6.4 | 7.1 | 8.8 | 12.0 |
| UDP Throughput (Gbit/s) | 1.3 | 1.3 | 2.0 | 2.0 |
| UDP Throughput Lost Datagrams (%) | 0.2 | 0.1 | 11.0 | 1.2 |
| Bidirectional Throughput (TX-C/RX-C) (Gbit/s) | 6.0/1.7 | 6.5/1.4 | 4.1/3.1 | 12.0/11.0 |
| TCP Latency (Mean µs/Transaction Rate) | 333/3000 | 123/8100 | 128/7826 | 124/8072 |
| UDP Latency (Mean µs/Transaction Rate) | 340/2944 | 125/3540 | 125/8006 | 122/8158 |
| TCP Stream (Throughput 10^6 bits/sec) | 9809 | 8391 | 11497 | 11927 |

Hence, I would conclude that Azure CNI powered by Cilium and BYOCNI Cilium native routing provide roughly the same performance on Azure Kubernetes Service when using Cilium in version 1.18.6.

What is significantly better, and expected, is the percentage of lost UDP datagrams with Azure CNI powered by Cilium. The managed Cilium offering has the advantage of being optimized for Azure, using the Azure CNI plugin for the platform integration while Cilium acts as a pure data plane.

Cilium 1.19.3 for BYOCNI is currently the best-performing option when using native routing. The downside is that two additional components, a BGP route reflector and the Azure Route Server, are required to get it working with Azure’s SDN stack.
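As a rough sketch of the Cilium side of that peering: with Cilium’s BGP control plane, the cluster peers against the two Azure Route Server instances, which always use ASN 65515. The local ASN and peer addresses below are hypothetical placeholders, the referenced peer configuration is omitted, and the exact API version depends on the Cilium release; part 3 of the series walks through the full setup.

```shell
kubectl apply -f - <<EOF
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPClusterConfig
metadata:
  name: azure-route-server
spec:
  bgpInstances:
    - name: instance-65010
      localASN: 65010                  # hypothetical private ASN for the cluster
      peers:
        - name: route-server-instance-0
          peerASN: 65515               # Azure Route Server always uses ASN 65515
          peerAddress: 10.0.1.4        # placeholder Route Server instance IP
          peerConfigRef:
            name: route-server-peer-config
        - name: route-server-instance-1
          peerASN: 65515
          peerAddress: 10.0.1.5        # placeholder Route Server instance IP
          peerConfigRef:
            name: route-server-peer-config
EOF
```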

At the time of writing, Cilium in version 1.19 was not available for the managed Cilium offering on Azure Kubernetes Service. Once it becomes available, I will run the tests again.

Summary

When running Cilium on Azure Kubernetes Service via the BYOCNI option, the standard VXLAN encapsulation option is performant enough to cover most scenarios. Only when pure performance, highest throughput, and lowest latency are required does the native routing option with a BGP route reflector and Azure Route Server make sense.

Looking into network traffic encryption with WireGuard, it has a significant performance impact. Fortunately, the new ztunnel transparent encryption option is available in preview.

-> https://docs.cilium.io/en/stable/security/network/encryption-ztunnel/

In a follow-up blog post, we will have a look at it.

All the test results, configurations, and summaries are available on my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/network-performance-testing/
