Monitor VPC NAT gateways with CloudWatch metrics and alarms
Many VPC designs use public and private subnets. Resources in a private subnet need a NAT gateway to communicate with the Internet.
A VPC NAT gateway is a finite resource that can be exhausted. That’s why you need monitoring that alerts you when the NAT gateway becomes a bottleneck.
CloudWatch metrics
Each NAT gateway sends metrics to CloudWatch that we can monitor with CloudWatch alarms. We recommend creating alarms for the following metrics:
ErrorPortAllocation
: The number of times the NAT gateway could not allocate a source port.

PacketsDropCount
: The number of packets dropped by the NAT gateway.
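As a rough illustration, the two alarms could be described in Python as follows. This is a minimal sketch, not marbot's actual configuration: the function name `build_nat_gateway_alarms`, the alarm naming scheme, the threshold of 0, the 60-second period, and the SNS topic ARN are all assumptions chosen for the example. The code only builds the parameters and makes no AWS calls.

```python
def build_nat_gateway_alarms(nat_gateway_id, sns_topic_arn):
    """Build CloudWatch alarm parameters for the two recommended metrics.

    Returns a list of dicts shaped like the keyword arguments of the
    CloudWatch PutMetricAlarm API. Nothing is sent to AWS here.
    """
    alarms = []
    for metric_name in ("ErrorPortAllocation", "PacketsDropCount"):
        alarms.append({
            "AlarmName": f"{nat_gateway_id}-{metric_name}",  # hypothetical naming scheme
            "Namespace": "AWS/NATGateway",
            "MetricName": metric_name,
            "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
            "Statistic": "Sum",
            "Period": 60,             # assumption: evaluate one-minute sums
            "EvaluationPeriods": 1,   # assumption: alert on the first breach
            "Threshold": 0,           # assumption: any error or drop is worth an alert
            "ComparisonOperator": "GreaterThanThreshold",
            "TreatMissingData": "notBreaching",
            "AlarmActions": [sns_topic_arn],
        })
    return alarms


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    for alarm in build_nat_gateway_alarms(
        "nat-0123456789abcdef0",
        "arn:aws:sns:us-east-1:123456789012:alerts",
    ):
        print(alarm["AlarmName"], alarm["MetricName"])
```

With boto3 installed and credentials configured, each dict could be passed on as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`. Tune the threshold and evaluation periods to your own tolerance for noise.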
Monitoring throughput utilization
Unfortunately, NAT gateways do not report a single metric for throughput utilization, neither for bandwidth nor for packets. A NAT gateway supports at most 100 Gbit/s of bandwidth and 10,000,000 packets per second. Luckily, we can calculate the utilization with CloudWatch metric math.
To calculate the bandwidth utilization, we use the following metrics:
ID | metric | statistic | period (seconds) |
---|---|---|---|
in1 | BytesInFromDestination | Sum | 60 |
in2 | BytesInFromSource | Sum | 60 |
out1 | BytesOutToDestination | Sum | 60 |
out2 | BytesOutToSource | Sum | 60 |
And the following expressions:
ID | expression | comment |
---|---|---|
bandwidth | (in1+in2+out1+out2)/60*8/1000/1000/1000 | Bytes/min to Gbit/s |
utilization | bandwidth/100*100 | to %; 100 Gbit/s is the hard limit |
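To make the arithmetic concrete, here is a small Python sketch of the same calculation, together with the metric math queries written in the shape CloudWatch expects for `MetricDataQueries`. The function names and the example NAT gateway ID are assumptions made for illustration; the metric names, statistics, periods, and expressions are taken from the tables above.

```python
LIMIT_GBIT_PER_SECOND = 100  # hard limit stated in this article


def bandwidth_gbit_per_second(in1, in2, out1, out2):
    """Convert four one-minute byte sums into Gbit/s."""
    return (in1 + in2 + out1 + out2) / 60 * 8 / 1000 / 1000 / 1000


def utilization_percent(in1, in2, out1, out2):
    """Bandwidth as a percentage of the 100 Gbit/s limit."""
    bandwidth = bandwidth_gbit_per_second(in1, in2, out1, out2)
    return bandwidth / LIMIT_GBIT_PER_SECOND * 100


def build_metric_queries(nat_gateway_id):
    """Build the metric math queries; only `utilization` returns data."""
    metrics = {
        "in1": "BytesInFromDestination",
        "in2": "BytesInFromSource",
        "out1": "BytesOutToDestination",
        "out2": "BytesOutToSource",
    }
    queries = [
        {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/NATGateway",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
                },
                "Period": 60,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }
        for query_id, metric_name in metrics.items()
    ]
    queries.append({
        "Id": "bandwidth",
        "Expression": "(in1+in2+out1+out2)/60*8/1000/1000/1000",
        "ReturnData": False,
    })
    queries.append({
        "Id": "utilization",
        "Expression": "bandwidth/100*100",
        "ReturnData": True,
    })
    return queries


if __name__ == "__main__":
    # 1.5 GB per minute in each of the four directions.
    print(utilization_percent(1.5e9, 1.5e9, 1.5e9, 1.5e9))
```

For example, 6 GB transferred in one minute across the four metrics works out to 0.8 Gbit/s, which is 0.8% of the limit. The list returned by `build_metric_queries` could be passed as the `Metrics` parameter of a metric math alarm or as `MetricDataQueries` to GetMetricData.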
marbot’s Monitoring Setup Assistant
Does CloudWatch metric math sound complicated? We have you covered! Monitor NAT gateways and receive alerts in Slack or Microsoft Teams with our ChatOps bot marbot!
PS: If marbot is not your thing, you can still find inspiration in marbot’s CloudFormation template.