Combine CloudWatch metrics for Auto Scaling or to reduce costs

Andreas Wittig – 17 Dec 2019

Every part of your AWS infrastructure emits utilization metrics. Amazon CloudWatch collects these metrics and allows you to visualize them as well as to define alarms. AWS announced an exciting new feature allowing you to combine multiple metrics recently: IF/AND/OR statements for metric math.

CloudWatch metric math

Combining CloudWatch metrics has several advantages:

  1. Simplify your monitoring configuration by reducing the number of CloudWatch alarms.
  2. Reduce costs by reducing the number of CloudWatch alarms (each alarm costs around USD 0.10 per month).
  3. Increase or decrease the desired capacity of an Auto Scaling Group according to multiple metrics (e.g., the typical bottlenecks CPU, memory, and network).

All you need to do is to define a Metric Math Expression that combines multiple metrics. Doing so results in a calculated metric. Next, you can define a CloudWatch alarm or a visualization based on the calculated metric.

Use metric math to combine multiple metrics

Let’s imagine the following scenario: you are using an Auto Scaling Group to launch EC2 instances. Typical bottlenecks of your virtual machines are:

How do you get notified or scale-out automatically when one of these resources gets scarce? And how do you get notified or scale in automatically when the resources are no longer being used?

The following screenshot shows four basic metrics:

  • CPU Utilization
  • Memory Utilization
  • Network In
  • Network Out

Please note that AWS does not provide a memory utilization metric by default. Therefore, I’m using the CloudWatch Agent to collect the data for a memory utilization metric.

Also, as explained in Monitoring EC2 Network Utilization, you need to combine the Network In and Network Out metric to calculate the total network throughput. The Network Utilization metric calculates the percentage utilization of the network.

However, I want to put your attention on the Summary Utilization metric:

IF(cpu > 70, 1, 0) OR IF(memory > 75, 1, 0) OR IF(network > 80, 1, 0)
  • If the CPU utilization is above 70%, the metric math expression will return 1.
  • If the memory utilization is above 75%, the metric math expression will return 1.
  • If the network utilization is above 80%, the metric math expression will return 1.
  • Otherwise, the metric math expression will return 0.

Metric Math Expression

Next, define a CloudWatch alarm based on the Summary Utilization metric. Use 1 for the threshold.

Metric Math Expression

The alarm will transition into the ALARM state when the CPU utilization is above 70%, or the memory utilization is above 75%, or the network utilization is above 80%. Configure the CloudWatch alarm to send a notification or increase the desired capacity of the Auto Scaling Group.

Do you prefer Infrastructure as Code? The following code snippet shows how to create the CloudWatch alarm with the help of CloudFormation.

Note: The example assumes that you are running a m5.large instance with a maximal network throughout of about 0.75 Gbit/s.

AWSTemplateFormatVersion: '2010-09-09'
Parameters:
AutoScalingGroupName:
Type: String
Resources:
EC2HighUtilization:
Type: 'AWS::CloudWatch::Alarm'
Properties:
AlarmDescription: 'EC2 High Utilization: CPU, memory, or network'
Metrics:
- Id: summary
Label: EC2 Utilization
Expression: IF(cpu > 70, 1, 0) OR IF(memory > 75, 1, 0) OR IF(network > 80, 1, 0)
ReturnData: true
- Id: cpu
MetricStat:
Metric:
Namespace: AWS/EC2
MetricName: CPUUtilization
Dimensions:
- Name: AutoScalingGroupName
Value: !Ref AutoScalingGroupName
Stat: Maximum
Period: 300
ReturnData: false
- Id: memory
MetricStat:
Metric:
Namespace: CWAgent
MetricName: mem_used_percent
Dimensions:
- Name: AutoScalingGroupName
Value: !Ref AutoScalingGroupName
Stat: Maximum
Period: 300
ReturnData: false
- Id: network
Label: Network Utilization
Expression: "((network_in+network_out)/300/1000/1000/1000*8)/0.75*100"
ReturnData: false
- Id: network_in
MetricStat:
Metric:
Namespace: AWS/EC2
MetricName: NetworkIn
Dimensions:
- Name: AutoScalingGroupName
Value: !Ref AutoScalingGroupName
Stat: Sum
Period: 300
ReturnData: false
- Id: network_out
MetricStat:
Metric:
Namespace: AWS/EC2
MetricName: NetworkOut
Dimensions:
- Name: AutoScalingGroupName
Value: !Ref AutoScalingGroupName
Stat: Sum
Period: 300
ReturnData: false
ComparisonOperator: GreaterThanOrEqualToThreshold
EvaluationPeriods: 1
DatapointsToAlarm: 1
Threshold: '1'

That’s all. Happy monitoring!

Summary

As CloudWatch metric math supports IF/AND/OR statements, it is possible to aggregate multiple metrics into a single metric. Doing so allows you to scale an Auto Scaling Group based on multiple metrics as well as reduce the number of CloudWatch alarms, which reduces costs.

This post was originally published on the marbot blog.

Andreas Wittig

Andreas Wittig

I’ve been building on AWS since 2012 together with my brother Michael. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV,HyperEnv for GitHub Actions, and marbot.

Here are the contact options for feedback and questions.