Monitor VPC NAT gateways with CloudWatch metrics and alarms
Many VPC designs make use of public and private subnets. Resources in a private subnet need a NAT gateway to communicate with the Internet.
A VPC NAT gateway is a finite resource that can be exhausted. That’s why you need monitoring that alerts you when the NAT gateway becomes a bottleneck.
Each NAT gateway sends metrics to CloudWatch that we can monitor with CloudWatch alarms. We recommend creating alarms for the following metrics:
ErrorPortAllocation: The number of times the NAT gateway could not allocate a source port.
PacketsDropCount: The number of packets dropped by the NAT gateway.
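As a minimal sketch, the two alarms could be defined with boto3 along the following lines. The alarm naming scheme, the threshold of 0, the one-minute period, and the SNS topic ARN used for notifications are assumptions for illustration, not values taken from this article; `build_alarm` and `create_alarms` are hypothetical helper names.

```python
from typing import Any

NAMESPACE = "AWS/NATGateway"
ALARM_METRICS = ("ErrorPortAllocation", "PacketsDropCount")


def build_alarm(nat_gateway_id: str, metric_name: str, topic_arn: str) -> dict[str, Any]:
    """Build keyword arguments for the CloudWatch PutMetricAlarm API."""
    return {
        # Assumption: a simple naming scheme of <gateway id>-<metric name>.
        "AlarmName": f"{nat_gateway_id}-{metric_name}",
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        # Assumption: alert as soon as a single error or dropped packet occurs.
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


def create_alarms(nat_gateway_id: str, topic_arn: str) -> None:
    """Create one alarm per recommended metric (requires AWS credentials)."""
    import boto3  # imported lazily so build_alarm works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    for metric_name in ALARM_METRICS:
        cloudwatch.put_metric_alarm(**build_alarm(nat_gateway_id, metric_name, topic_arn))
```

Calling `create_alarms` with a NAT gateway ID and the ARN of an SNS topic would create both alarms. A threshold of 0 may be too noisy for busy gateways, so adjust it to your traffic.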
Unfortunately, NAT gateways do not report a single metric for the utilization of their bandwidth or packet throughput. The limits are 100 Gbit/s of bandwidth and 10,000,000 packets/s. Luckily, we can calculate the utilization ourselves by using CloudWatch metric math.
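To calculate the bandwidth utilization, we use the following metrics (Sum over one minute):
in1: BytesInFromDestination
in2: BytesInFromSource
out1: BytesOutToDestination
out2: BytesOutToSource
And the following expressions: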
| ID | Expression | Description |
| --- | --- | --- |
| bandwidth | (in1+in2+out1+out2)/60*8/1000/1000/1000 | Bytes/min to Gbit/s |
| utilization | bandwidth/100*100 | Gbit/s to %; 100 Gbit/s is the hard limit |
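To make the expressions concrete, here is a small sketch that first reproduces the arithmetic in plain Python and then builds a metric math alarm definition for boto3's `put_metric_alarm`. It assumes that in1, in2, out1, and out2 stand for the four byte-count metrics of the AWS/NATGateway namespace, each as a Sum over 60 seconds. The 80 % threshold, the alarm name, and the helper names are hypothetical.

```python
from typing import Any

LIMIT_GBIT_PER_SECOND = 100  # hard limit stated in the article

# Assumption: the expression IDs map to these four byte-count metrics.
BYTE_METRICS = {
    "in1": "BytesInFromDestination",
    "in2": "BytesInFromSource",
    "out1": "BytesOutToDestination",
    "out2": "BytesOutToSource",
}


def bandwidth_gbit_per_second(in1: float, in2: float, out1: float, out2: float) -> float:
    """Convert four per-minute byte sums into Gbit/s."""
    return (in1 + in2 + out1 + out2) / 60 * 8 / 1000 / 1000 / 1000


def utilization_percent(bandwidth: float) -> float:
    """Express a bandwidth in Gbit/s as a percentage of the hard limit."""
    return bandwidth / LIMIT_GBIT_PER_SECOND * 100


def build_utilization_alarm(
    nat_gateway_id: str, topic_arn: str, threshold_percent: float = 80.0
) -> dict[str, Any]:
    """Build PutMetricAlarm keyword arguments using metric math."""
    queries: list[dict[str, Any]] = [
        {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/NATGateway",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
                },
                "Period": 60,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }
        for query_id, metric_name in BYTE_METRICS.items()
    ]
    queries.append({
        "Id": "bandwidth",
        "Expression": "(in1+in2+out1+out2)/60*8/1000/1000/1000",
        "Label": "Bandwidth in Gbit/s",
        "ReturnData": False,
    })
    queries.append({
        "Id": "utilization",
        "Expression": "bandwidth/100*100",
        "Label": "Bandwidth utilization in %",
        "ReturnData": True,  # the alarm evaluates this expression only
    })
    return {
        "AlarmName": f"{nat_gateway_id}-BandwidthUtilization",  # hypothetical name
        "Metrics": queries,
        "EvaluationPeriods": 1,
        "Threshold": threshold_percent,  # assumption: alert above 80 %
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }
```

For example, if each of the four metrics reports 75,000,000,000 bytes in one minute, `bandwidth_gbit_per_second` yields roughly 40 Gbit/s, which `utilization_percent` turns into roughly 40 %. The dictionary returned by `build_utilization_alarm` can be passed to `boto3.client("cloudwatch").put_metric_alarm(**kwargs)`.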
CloudWatch metric math sounds complicated? We have you covered! Monitor NAT gateways and receive alerts in Slack or Microsoft Teams with our ChatOps bot marbot!
PS: If marbot is not your thing, you can still find inspiration in marbot’s CloudFormation template.