Improve K8SAAS outbound flows
Context
Currently, AKS uses a load balancer and a public Azure IP to manage outbound traffic to the internet. After two years of operating AKS, it has become clear that this outbound configuration does not suit every type of workload. A load balancer-based outbound setup allocates a fixed number of SNAT ports per node, 1024 by default.
Today, the available port range per IP (64,000) combined with the number of nodes per cluster (8) caps the allocation at 8,000 ports per node. While 8,000 ports may seem high, a single pod occasionally reaches this limit, which impacts the other pods on the same node.
How to resolve the problem
Azure NAT Gateway resolves this issue by allocating ports dynamically, based on what each node requires. Additionally, we use a pair of contiguous public IP addresses, which provides a total of 128,000 available ports.
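The port figures quoted above can be checked with a few lines of arithmetic. The sketch below is a simplified model, not the actual Azure allocation algorithm: it assumes ports are split evenly across nodes for the load balancer case, and the function names are purely illustrative.

```python
# Simplified model of the SNAT port budget described in this page.
# Assumption: 64,000 usable ports per public IP, as stated above.
PORTS_PER_IP = 64_000


def lb_ports_per_node(node_count: int, public_ips: int = 1) -> int:
    """Fixed per-node share when the port range is split evenly across nodes."""
    return (PORTS_PER_IP * public_ips) // node_count


def nat_gateway_pool(public_ips: int = 2) -> int:
    """Total ports in the shared pool that nodes draw from on demand."""
    return PORTS_PER_IP * public_ips


if __name__ == "__main__":
    print(lb_ports_per_node(8))  # 8000 ports per node with 8 nodes and 1 IP
    print(nat_gateway_pool())    # 128000 ports shared across all nodes
```

The key difference is that the first number is a hard per-node ceiling, while the second is a pool shared by all nodes, so a single busy node is no longer restricted to a fixed slice.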
Do you need it?
We recommend the use of Azure NAT Gateway in the following cases:
- When you host workloads that make significant outbound calls to the internet, and you don't have control over those workloads.
- When using CI/CD jobs, especially if they are used for load or performance testing.
If you do have control over the application components, it is more effective to manage your connection pools or keep-alive settings. Excessive outbound connections are often related to issues such as memory leaks or excessive CPU consumption.
SNAT problems can be monitored in the Network:SNAT Activity panel of the Cluster Overview Grafana dashboard.
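As an illustration of what managing a connection pool can look like, here is a minimal sketch using only the Python standard library. It is one possible approach, not a recommended implementation: the `ConnectionPool` class, the `fetch_status` helper, and the host `api.example.com` are hypothetical. The idea is that a bounded pool of reused keep-alive connections caps how many outbound connections, and therefore SNAT ports, the application can hold open towards a destination.

```python
import http.client
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Bounded pool: at most `size` connection objects are ever created."""

    def __init__(self, factory, size=4):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (up to `timeout` seconds) when every connection is in use,
        # instead of opening a new outbound connection.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


def make_connection():
    # Hypothetical destination; HTTPSConnection connects lazily on first request.
    return http.client.HTTPSConnection("api.example.com", timeout=10)


def fetch_status(pool, path="/"):
    """Issue a GET over a pooled connection and return the HTTP status code."""
    with pool.connection(timeout=5) as conn:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body so the connection can be reused
        return response.status
```

A caller would create the pool once, for example `pool = ConnectionPool(make_connection, size=4)`, and share it across requests. Most HTTP client libraries ship their own pooling, so in practice tuning the existing pool size and keep-alive settings is usually enough.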
Cost
Azure NAT Gateway is billed on an hourly basis, and you are also charged based on the data transfer (ingress and egress) that goes through it. This cost structure reflects both the usage duration and the volume of data being handled by the NAT Gateway.
How to request
For now, the migration process is still in preview. We recommend requesting a new cluster with the natgateway option on PostIt.