Last year I wrote a blog post about detecting SNAT port exhaustion on Azure Kubernetes Service.
Today we dive into the topic of how to prevent SNAT port exhaustion on Azure Kubernetes Service with Virtual Network NAT.
Since this year, the managed NAT gateway option for Azure Kubernetes Service has been generally available and can be set during cluster creation.
Unfortunately, as of writing this blog post, you cannot update existing Azure Kubernetes Service clusters with the outbound type loadBalancer to the outbound type managedNATGateway or userAssignedNATGateway.
Before we dive deeper into preventing SNAT port exhaustion on Azure Kubernetes Service, let us step back and talk about what SNAT port exhaustion is.
What is SNAT port exhaustion?
SNAT, Source Network Address Translation, is used in AKS whenever an outbound call to an external address is made. Assuming you use AKS in its standard configuration, it enables IP masquerading for the backend VMSS instances of the load balancer.
A SNAT port is allocated for every outbound connection, and connections to the same destination IP and destination port each need their own SNAT port. The default configuration of an Azure Kubernetes Service cluster provides 64,000 SNAT ports with a 30-minute idle timeout before idle connections are released.
Once the available SNAT ports are exhausted, new outbound connections fail.
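To make the failure mode concrete, here is a minimal sketch of the allocation behavior described above. It is a toy model, not Azure's implementation: the class `SnatPool`, its port range, and the example IP addresses are all hypothetical. It only illustrates that concurrent connections to the same destination IP and port each consume a separate SNAT port, while a different destination can reuse a port.

```python
from collections import defaultdict


class SnatPool:
    """Toy model of a node's SNAT port allocation (not Azure's implementation)."""

    def __init__(self, port_count):
        # Hypothetical port range; real allocations are assigned by Azure.
        self.ports = range(1024, 1024 + port_count)
        # Ports currently in use, tracked per (destination IP, destination port).
        self.in_use = defaultdict(set)

    def connect(self, dest_ip, dest_port):
        """Reserve a SNAT port for one outbound connection."""
        destination = (dest_ip, dest_port)
        for port in self.ports:
            if port not in self.in_use[destination]:
                self.in_use[destination].add(port)
                return port
        # No free port left for this destination: new connections fail.
        raise RuntimeError(f"SNAT port exhaustion for {dest_ip}:{dest_port}")

    def release(self, dest_ip, dest_port, port):
        """Free a port again, for example after the idle timeout."""
        self.in_use[(dest_ip, dest_port)].discard(port)


pool = SnatPool(port_count=2)
first = pool.connect("203.0.113.10", 443)
second = pool.connect("203.0.113.10", 443)
other = pool.connect("198.51.100.7", 443)  # different destination, may reuse a port

try:
    pool.connect("203.0.113.10", 443)  # third connection to the same destination
    exhausted = False
except RuntimeError:
    exhausted = True
```

With only two ports in the pool, the third concurrent connection to the same destination fails, which is the situation a long idle timeout makes more likely because ports are held for longer.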
What is Virtual Network NAT?
Virtual Network NAT is a fully managed network address translation service that simplifies outbound internet connectivity for a virtual network. Once activated on a subnet, all outbound connectivity is handled by Virtual Network NAT, as it takes precedence over other configured outbound scenarios.
Furthermore, Virtual Network NAT can use up to 16 public IP addresses, which results in 1,032,192 available SNAT ports that are dynamically allocated on demand to every resource in the subnet.
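The SNAT port figure above follows from simple arithmetic. The sketch below assumes 64,512 usable SNAT ports per public IP address, which is the value implied by 1,032,192 ports across 16 addresses; the function name is made up for illustration.

```python
# Assumption: 64,512 SNAT ports per public IP (1,032,192 / 16).
PORTS_PER_PUBLIC_IP = 64_512
MAX_PUBLIC_IPS = 16


def nat_gateway_snat_ports(public_ip_count):
    """Total SNAT ports available to a subnet for a given number of public IPs."""
    if not 1 <= public_ip_count <= MAX_PUBLIC_IPS:
        raise ValueError(f"public_ip_count must be between 1 and {MAX_PUBLIC_IPS}")
    return public_ip_count * PORTS_PER_PUBLIC_IP


print(nat_gateway_snat_ports(1))   # 64512
print(nat_gateway_snat_ports(16))  # 1032192
```

Because these ports are shared on demand across the subnet, a single busy node is not limited to a fixed per-node slice.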
SNAT port exhaustion prevention options
Currently, you have two options to prevent workloads on an AKS cluster from running into SNAT port exhaustion.
Number one is to assign enough public IPs to the load balancer, set a custom value for the allocated SNAT ports per node, and set the TCP idle reset to 4 minutes.
The automatic default for the allocated SNAT ports per node depends on the cluster size and starts with 1024 SNAT ports and ends at 32 SNAT ports per node. Also, the default TCP idle reset is 30 minutes.
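As a rough illustration of how the automatic default shrinks with cluster size, the lookup below encodes the tiers. Only the first (1024) and last (32) values come from this post; the intermediate tiers and the node-count boundaries are taken from my reading of the Azure Load Balancer documentation and should be treated as an assumption to verify. The function name is hypothetical.

```python
# (maximum node count, default SNAT ports per node)
# Assumption: tier boundaries as documented for Azure Load Balancer defaults.
DEFAULT_SNAT_TIERS = [
    (50, 1024),
    (100, 512),
    (200, 256),
    (400, 128),
    (800, 64),
    (1000, 32),
]


def default_snat_ports_per_node(node_count):
    """Return the automatically allocated SNAT ports per node for a pool size."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    for max_nodes, ports in DEFAULT_SNAT_TIERS:
        if node_count <= max_nodes:
            return ports
    raise ValueError("node_count exceeds the largest tier in this sketch")


for nodes in (10, 51, 1000):
    print(nodes, default_snat_ports_per_node(nodes))
```

The takeaway is that scaling a cluster out under the default settings reduces the SNAT ports each node can use.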
Even with these settings, you are still at risk of running into SNAT port exhaustion, because every node only gets a fixed number of SNAT ports.
Number two is to use Virtual Network NAT, but without setting the outbound type to managedNATGateway or userAssignedNATGateway in the Azure Kubernetes Service configuration.
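For option one, a quick back-of-the-envelope calculation helps to size the number of public IPs. This sketch assumes 64,000 SNAT ports per load balancer public IP (the figure quoted earlier for the default single-IP configuration) and a fixed custom allocation per node; the function name and the example numbers are made up for illustration.

```python
import math

# Assumption: each outbound public IP on the load balancer provides 64,000 SNAT ports.
PORTS_PER_LB_PUBLIC_IP = 64_000


def required_outbound_ips(node_count, ports_per_node):
    """Minimum number of public IPs so every node gets its fixed port allocation."""
    if node_count < 1 or ports_per_node < 1:
        raise ValueError("node_count and ports_per_node must be positive")
    total_ports = node_count * ports_per_node
    return math.ceil(total_ports / PORTS_PER_LB_PUBLIC_IP)


# Hypothetical cluster: 100 nodes with 4,000 SNAT ports each.
print(required_outbound_ips(100, 4000))  # 7
```

Remember to size for the maximum node count, including any temporary extra nodes, since the allocation per node is fixed up front.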
Using Virtual Network NAT
So, why should you still stick to the outbound type loadBalancer in the Azure Kubernetes Service configuration? Remember what I wrote at the beginning of the blog post?
Once activated on a subnet, all outbound connectivity is handled by Virtual Network NAT, as it takes precedence over other configured outbound scenarios. When you use managedNATGateway or userAssignedNATGateway, you cannot recover from a Virtual Network NAT outage yourself without redeploying the Azure Kubernetes Service cluster. The same applies to enabling those outbound types on existing Azure Kubernetes Service clusters: you must redeploy the cluster.
Using the outbound type loadBalancer lets you disassociate the Virtual Network NAT from the subnet in case of a Virtual Network NAT outage, and AKS then falls back to the outbound rules of the load balancer for outbound connectivity. This configuration also lets you switch to Virtual Network NAT on an existing Azure Kubernetes Service cluster.
Let us see this configuration option in action.
I deployed an Azure Kubernetes Service cluster via the Azure portal with the Azure CNI plugin enabled. The load balancer of the cluster is therefore configured with the default values, such as the TCP idle reset of 30 minutes. Furthermore, I deployed a Virtual Network NAT gateway with a TCP idle reset of 4 minutes and did not yet associate the NAT gateway with the AKS subnet.
As seen in the screenshot above, all outbound connectivity is handled by the load balancer, as the AKS nodes have a fixed number of SNAT ports assigned to them.
Now we associate the NAT gateway with the AKS subnet. It takes a while until all outbound connectivity is handled by the NAT gateway, because existing connections stay on the load balancer until its default TCP idle reset of 30 minutes releases them.
An important note at this point from the Azure documentation:
When NAT gateway is configured to a virtual network where standard Load balancer with outbound rules already exists, NAT gateway will take over all outbound traffic moving forward. There will be no drops in traffic flow for existing connections on Load balancer. All new connections will use NAT gateway.
The transfer from the load balancer to the NAT gateway is seamless for your workloads running on AKS.
In case of a Virtual Network NAT outage, you simply disassociate the NAT gateway from the AKS subnet, and outbound connectivity is handled again by the load balancer as seen above in the screenshot.
The most effective way to prevent SNAT port exhaustion on an Azure Kubernetes Service cluster is to use Virtual Network NAT.
Depending on your needs, you can use the configuration described above to enable Virtual Network NAT for existing Azure Kubernetes Service clusters and to have a DR strategy in place for a Virtual Network NAT outage. It allows you to reestablish outbound connectivity for your workloads until the Virtual Network NAT outage has been resolved.
Alternatively, you deploy a new Azure Kubernetes Service cluster with the outbound type managedNATGateway or userAssignedNATGateway enabled.
But as of writing this blog post, you cannot update existing Azure Kubernetes Service clusters from the outbound type loadBalancer to the outbound type managedNATGateway or userAssignedNATGateway, nor can you switch back to the outbound type loadBalancer without redeploying the cluster if it has been provisioned with the managedNATGateway or userAssignedNATGateway option.
That said, if your workloads depend on outbound connectivity during a Virtual Network NAT outage, the official configuration for using Virtual Network NAT on AKS with the outbound types managedNATGateway or userAssignedNATGateway might not be the one you want to use.