Monitoring EC2 Network Utilization
This post was originally published on the marbot blog.
Are you monitoring the network utilization of your EC2 instances? Why not? The network is one of the few resources that can limit your workload’s maximum throughput:
- CPU
- Memory
- Network
- Disk
- GPU
I’ve debugged performance problems in a lot of infrastructures during the last 12 months. In most cases, the network capabilities of EC2 or RDS instances were the bottleneck. That is why I want to share with you how to monitor the network utilization of EC2 instances.
To monitor the network utilization of an EC2 instance, we need to solve two challenges.
Challenge #1: What’s the network performance of my EC2 instance?
To monitor the network utilization of your EC2 instance, you need to answer the following question: what are the baseline and maximum network throughput of your EC2 instance? Unfortunately, AWS does not provide accurate information about the network performance for most instance types. For example, AWS promises "Moderate" network performance for a t2.xlarge instance or "Up to 10 Gbps" for an m5.large instance.
This information is too vague to base monitoring on. That is why I ran a network performance benchmark and published the results at EC2 Network Performance Cheat Sheet. The results are astonishing.
An m5.large instance provides 10.04 Gbit/s for a few minutes only. Afterward, the baseline network performance for an m5.large instance is around 0.74 Gbit/s. The results for other instance types look similar.
The EC2 Network Performance Cheat Sheet gives you an estimate of the baseline and maximum network throughput of your EC2 instance, which allows you to define a threshold for monitoring.
Fine, we have solved challenge #1.
Challenge #2: How to combine multiple CloudWatch metrics?
Each EC2 instance reports various metrics to CloudWatch. The metric NetworkIn collects the number of bytes received on all network interfaces by the instance, and NetworkOut collects the number of bytes sent. To calculate the total network utilization of your EC2 instance, you need to add up both metrics.
Pick one of the following options to create a CloudWatch alarm monitoring the total network utilization of your EC2 instance:
- Use the AWS Management Console to create the CloudWatch alarm manually.
- Use CloudFormation to create the CloudWatch alarm with Infrastructure as Code.
- Use marbot’s Jump Start to create the CloudWatch alarm.
AWS Management Console
Log into the AWS Management Console and go to CloudWatch. Select Alarms from the sub-navigation and click the Create Alarm button. The wizard shown in the following screenshot appears. Click the Select metric button.
Search for the NetworkIn and NetworkOut metrics of your EC2 instance and select them both. After doing so, select the Graphed metrics tab.
- Click Add a math expression.
- Type in the id out for the NetworkOut metric and the id in for the NetworkIn metric.
- Type in the expression (in+out)/300/1000/1000/1000*8.
Let me quickly explain the math expression (in+out)/300/1000/1000/1000*8:
- Add up in and out.
- Divide by 300 to convert from 5 minutes to 1 second.
- Divide by 1000 three times and multiply by 8 to convert bytes to Gbit.
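The same conversion can be sketched in Python. The function name gbit_per_second and the sample byte counts are illustrative assumptions, not real CloudWatch output; the 300-second period matches the 5-minute datapoints assumed by the expression.

```python
def gbit_per_second(network_in_bytes: float, network_out_bytes: float,
                    period_seconds: int = 300) -> float:
    """Convert summed NetworkIn/NetworkOut bytes of one period to Gbit/s."""
    total_bytes = network_in_bytes + network_out_bytes    # in + out
    bytes_per_second = total_bytes / period_seconds       # 5 minutes -> 1 second
    gigabytes_per_second = bytes_per_second / 1000 / 1000 / 1000
    return gigabytes_per_second * 8                       # bytes -> bits


# Hypothetical datapoint: 15 GB received and 12.75 GB sent within 5 minutes.
rate = gbit_per_second(15_000_000_000, 12_750_000_000)
print(f"{rate:.2f} Gbit/s")  # prints "0.74 Gbit/s"
```

In this made-up example the instance moves 27.75 GB in 5 minutes, which works out to roughly the 0.74 Gbit/s baseline measured for an m5.large.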
Make sure you have only selected the math expression before you click the Select metric button.
Finally, set up the alarm.
- Type in a name and description.
- Define the threshold. For example, 80% of the baseline network performance listed in the EC2 Network Performance Cheat Sheet.
- To avoid alarms from short network utilization spikes, configure 8 out of 12 datapoints. With 5-minute datapoints, that translates to 40 minutes within an hour.
Click the Create Alarm button.
Fine, you have set up a CloudWatch alarm monitoring the network utilization of your EC2 instance.
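To get a feeling for how the threshold and the 8 out of 12 datapoints setting interact, here is a minimal Python sketch. The helper names alarm_threshold and is_in_alarm are hypothetical, the sample datapoints are invented, and the check is a simplification: CloudWatch's real evaluation also deals with missing data, which is ignored here.

```python
def alarm_threshold(baseline_gbit: float, fraction: float = 0.8) -> float:
    """Threshold as a fraction of the baseline network performance."""
    return baseline_gbit * fraction


def is_in_alarm(datapoints, threshold,
                datapoints_to_alarm=8, evaluation_periods=12):
    """Simplified M-out-of-N check over the most recent datapoints."""
    recent = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in recent if value > threshold)
    return breaching >= datapoints_to_alarm


# 80% of the 0.74 Gbit/s baseline measured for an m5.large.
threshold = alarm_threshold(0.74)

# Invented series of 5-minute datapoints in Gbit/s.
short_spike = [0.3] * 9 + [5.0] * 3   # 3 breaching datapoints
sustained = [0.3] * 4 + [0.7] * 8     # 8 breaching datapoints

print(is_in_alarm(short_spike, threshold))  # False
print(is_in_alarm(sustained, threshold))    # True
```

A short burst far above the baseline does not trigger the alarm, while sustained traffic just below the baseline does, which is the behavior the datapoint setting is meant to achieve.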
Instead of going through this process manually, you could create CloudWatch alarms in an automated way with the help of CloudFormation as well.
CloudFormation
The following snippet shows a CloudFormation template setting up a CloudWatch alarm monitoring the network utilization of an EC2 instance.
You need to modify the Threshold. I suggest 80% of the network baseline performance as listed in the EC2 Network Performance Cheat Sheet.
AWSTemplateFormatVersion: '2010-09-09'
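For readers who prefer scripting over templates, the alarm can also be described as parameters for CloudWatch's PutMetricAlarm API. The following Python sketch is not the CloudFormation template; it only builds the parameter dictionary and does not call AWS. The function name network_alarm_params, the alarm name, the instance ID, and the TreatMissingData choice are assumptions made for illustration; the metric ids and the expression are the ones used in the console walkthrough.

```python
def network_alarm_params(instance_id: str, threshold_gbit: float) -> dict:
    """Build PutMetricAlarm parameters for total network utilization in Gbit/s."""

    def metric(metric_id: str, metric_name: str) -> dict:
        # Sum of bytes per 5-minute period for one EC2 metric.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,  # only the math expression drives the alarm
        }

    return {
        "AlarmName": f"{instance_id}-network-utilization",  # hypothetical name
        "AlarmDescription": "Total network utilization in Gbit/s is too high.",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_gbit,
        "EvaluationPeriods": 12,
        "DatapointsToAlarm": 8,
        "TreatMissingData": "notBreaching",  # assumption, adjust as needed
        "Metrics": [
            metric("in", "NetworkIn"),
            metric("out", "NetworkOut"),
            {
                "Id": "total",
                "Expression": "(in+out)/300/1000/1000/1000*8",
                "Label": "NetworkUtilization",
                "ReturnData": True,
            },
        ],
    }


# Hypothetical instance ID; threshold is 80% of a 0.74 Gbit/s baseline.
params = network_alarm_params("i-0123456789abcdef0", 0.592)
print([m["Id"] for m in params["Metrics"]])  # ['in', 'out', 'total']
```

With boto3 installed and credentials configured, such a dictionary could be passed as keyword arguments to the put_metric_alarm method of a CloudWatch client. Treat that as a starting point and check the parameters against your own setup.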
Are you looking for an even simpler way to monitor the network utilization of your EC2 instance?
marbot Jump Start
Our chatbot marbot escalates alarms among the members of your DevOps team. Luckily, marbot provides built-in Jump Starts which simplify creating CloudWatch alarms for your cloud resources. The Jump Start for EC2 instances sets up monitoring for network utilization as well.
- Add marbot to your Slack workspace.
- Invite marbot to a channel.
- Follow the installation instructions.
- Ask marbot for help monitoring your EC2 instance:
@marbot Help me to monitor my EC2 instance.
- Select EC2 instance or EC2 instances as monitoring goal and follow the Jump Start wizard as shown in the following screenshot.
It couldn’t be easier!
Summary
Monitoring the network utilization of your EC2 instance is essential, as the network is a limited resource. The instance type affects maximum and baseline performance. Your EC2 instance might not be able to provide the maximum network performance for more than 5 to 30 minutes. Therefore, use the baseline performance to define the alarm threshold. Use the EC2 Network Performance Cheat Sheet to get an estimate of the network performance of your EC2 instance.