Burst credits of t2 EC2 instances need monitoring

EC2 is one of the fundamental services on AWS. Unless you are 100% serverless, the health of your application depends on the health of your EC2 instances. When I do AWS architecture reviews for our clients, I check whether CPU burst capacity is monitored for EC2 instances of the t2 family. A widespread mistake is to leave the CPU credits of these burstable instances unmonitored. If a burstable EC2 instance runs out of credits, its performance drops by 70-95%.


Why are t2 instances special?

Let’s look at the t2.large in more detail. Its baseline performance is 0.6 vCPU, while it can burst up to 2 vCPUs as long as it has credits. Falling back to the baseline therefore means a 70% performance drop. But the baseline can be much worse. The t2.nano, the cheapest instance type, has a baseline of 0.05 vCPU and can burst up to 1 vCPU, which amounts to a 95% performance drop. The following table shows the numbers for all t2 instance types.

instance type   performance drop   baseline vCPUs   maximum vCPUs
t2.nano         95%                0.05             1
t2.micro        90%                0.10             1
t2.small        80%                0.20             1
t2.medium       80%                0.40             2
t2.large        70%                0.60             2
t2.xlarge       77.5%              0.90             4
t2.2xlarge      83.1%              1.35             8
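
The performance drop column follows directly from the other two columns. The short Python sketch below recomputes it as the share of the maximum vCPUs that is lost when the instance falls back to its baseline. The dictionary and function names are made up for illustration; the numbers are the ones from the table.

```python
# Baseline and maximum vCPUs per t2 instance type, copied from the table above.
T2_INSTANCES = {
    "t2.nano": (0.05, 1),
    "t2.micro": (0.10, 1),
    "t2.small": (0.20, 1),
    "t2.medium": (0.40, 2),
    "t2.large": (0.60, 2),
    "t2.xlarge": (0.90, 4),
    "t2.2xlarge": (1.35, 8),
}


def performance_drop_percent(baseline_vcpus, max_vcpus):
    """Percentage of CPU capacity lost when falling from full burst to baseline."""
    return 100 * (max_vcpus - baseline_vcpus) / max_vcpus


if __name__ == "__main__":
    for name, (baseline, maximum) in T2_INSTANCES.items():
        drop = performance_drop_percent(baseline, maximum)
        print(f"{name:<12}{drop:.1f}%")
```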

Why does this matter?

The credit balance of your t2 instance only shrinks while your application demands more than the baseline performance. This means that you can only run out of credits while your instance is bursting, which is exactly when your application needs the performance.

Now imagine what happens when the t2 instance can no longer burst and falls back to the baseline from one second to the next. This is precisely what happens when you run out of burst credits: performance drops by up to 95% within a second. This has a significant impact on your application. Most likely, it will stop responding.

How can you avoid the performance drop?

First of all, a t2 instance earns credits at a fixed rate, so its balance grows while it is not bursting. For example, a t2.large earns 36 credits per hour. One CPU credit is equal to one vCPU running at 100% utilization for one minute. As long as you don’t run out of credits, performance will not drop to the baseline. But how do you know how many credits are left? First, each t2 EC2 instance publishes the CPUCreditBalance metric to CloudWatch. The metric reports the remaining CPU credits. Second, you can define a CloudWatch Alarm that continuously monitors the CPUCreditBalance metric. As soon as the balance falls below the threshold you set, the alarm triggers and you can react, for example by increasing capacity or rescheduling work.
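
To get a feeling for the credit arithmetic, here is a rough Python sketch that estimates how long an instance can keep bursting. It is a simplification under stated assumptions: credits are earned at a constant hourly rate regardless of load, and the maximum balance that AWS enforces is ignored. The function name and the starting balance of 168 credits are hypothetical.

```python
def hours_until_credits_exhausted(balance, earn_per_hour, busy_vcpus):
    """Estimate the hours until the credit balance reaches zero.

    balance       -- current CPU credit balance
    earn_per_hour -- credits earned per hour (36 for a t2.large)
    busy_vcpus    -- average number of fully utilized vCPUs (assumed load)
    """
    # One credit equals one vCPU at 100% for one minute,
    # so one busy vCPU spends 60 credits per hour.
    spend_per_hour = busy_vcpus * 60
    net_drain_per_hour = spend_per_hour - earn_per_hour
    if net_drain_per_hour <= 0:
        # At or below the baseline the balance does not shrink.
        return float("inf")
    return balance / net_drain_per_hour


if __name__ == "__main__":
    # t2.large at full burst (2 vCPUs) with a hypothetical balance of 168 credits:
    # spends 120 credits/hour, earns 36, so the balance shrinks by 84 per hour.
    print(hours_until_credits_exhausted(168, 36, 2.0))  # 2.0
```

Note that the earn rate of 36 credits per hour matches the t2.large baseline of 0.6 vCPU (0.6 × 60 minutes), which is why the balance stays stable at baseline load.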

CloudWatch Alarms and marbot

We found that emails are not a good way to handle alerts. In a team, multiple people are responsible. If you send an email to a group email address:

  1. Your team has no idea if someone already started to work on solving the issue.
  2. You disturb the whole team for each alert.
  3. It’s easy to ignore an email.
  4. You have no statistics about how many alerts are generated. Too many alerts are an indication that your team is no longer able to handle them.
  5. The email offers no help for investigating the issue, such as links to the AWS Management Console.

To solve the problem, we built marbot: a Slack bot that helps your DevOps team detect and solve incidents on AWS.


marbot sends alerts to a single user from the Slack channel via a direct message. If the user doesn’t acknowledge the alert within 5 minutes, marbot will escalate to the next level. Escalations minimize distraction while keeping response time low. Try marbot for free now.

CloudFormation template

We developed a CloudFormation template to monitor an EC2 instance in any region (includes CPUCreditBalance metric monitoring). The template integrates with marbot, but you can modify it to send out emails. The template is available on GitHub for free. We also offer a version that works with a fleet of EC2 instances managed by an Auto Scaling Group.
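
If you want to see what such an alarm boils down to outside of CloudFormation, the following is a minimal sketch using the `put_metric_alarm` call of boto3. It is not the template from GitHub. The alarm name, the threshold of 20 credits, the choice of the Minimum statistic, and the instance ID and SNS topic ARN you pass in are all assumptions for illustration.

```python
def build_credit_alarm(instance_id, sns_topic_arn, threshold=20.0):
    """Return the keyword arguments for CloudWatch's put_metric_alarm."""
    return {
        "AlarmName": f"cpu-credit-balance-low-{instance_id}",  # hypothetical name
        "AlarmDescription": "CPU credit balance of a t2 instance is running low",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 300,  # evaluate in five-minute periods
        "EvaluationPeriods": 1,
        "Threshold": threshold,  # assumed value, tune it for your workload
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],  # e.g. an SNS topic that sends emails
    }


def create_credit_alarm(instance_id, sns_topic_arn, threshold=20.0):
    """Create or update the alarm in the region of your default AWS configuration."""
    # Imported here so that build_credit_alarm can be used without boto3 installed.
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        **build_credit_alarm(instance_id, sns_topic_arn, threshold)
    )
```

Calling `create_credit_alarm` with your own instance ID and the ARN of an existing SNS topic requires AWS credentials that are allowed to call `cloudwatch:PutMetricAlarm`.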

If you have already installed marbot, you can also ask marbot to monitor your EC2 instance or read more detailed setup instructions. Otherwise: Try marbot for free now.

Summary

t2 EC2 instances are cheap, but they increase complexity. You have to monitor burst credits to ensure that you do not suffer from a performance drop of up to 95%. We usually advise against using the t2 family in production systems that serve user traffic. But we like t2 instances for test environments and internal applications like Jenkins. If you want to stay with the low-cost t2 family, you can enable T2 Unlimited, which lets an instance keep bursting without credits for an additional charge.

Michael Wittig


I’m the author of Amazon Web Services in Action. I work as a software engineer and independent consultant focused on AWS and DevOps.

Is anything missing in my article? I'm looking forward to your feedback! @hellomichibye or michael@widdix.de.
