AWS Cost Optimization 101

Andreas Wittig – 09 Jan 2020

The beginning of the year is the perfect time to clean up and optimize. This also applies to your AWS bill. I’ve composed practical tips on how to cut costs with little effort.

The good thing about AWS: you typically pay per use. The bad thing about AWS: understanding the pricing models of all the AWS services is hard. A little self-promotion: our consulting firm widdix offers to analyze and optimize your AWS bill.

The following mind map provides guidance for reducing costs based on my experience from analyzing and reducing AWS bills for various clients. Download the mind map as a PDF file for better readability.

Mind Map: AWS Cost Optimization

Let me start with the process of analyzing your AWS bill.

  1. Use the Cost Explorer to aggregate costs by service.
  2. Which services cause the highest costs?
  3. Do the costs per service match your assumptions? For example, it is quite unlikely that you want to spend twice as much on CloudWatch as on EC2.
  4. Visualize your spending over the last 12 months.
  5. Are costs increasing by a similar amount each month for a specific service? If so, you might be piling up unused resources (e.g., EBS snapshots).
  6. Are there any large cost increases not caused by changes to your cloud infrastructure?
  7. Does the monthly cost increase match your revenue numbers?
  8. Open your AWS bill for the last three months and drill down into the details.
  9. Which resources of a service cause high costs? Justify the costs.
  10. Do the costs per service match your estimates?
  11. Are there any hints for expenses caused by unused resources?

Primarily, you should watch out for the following aspects.

EC2

Purchase Savings Plans for baseline capacity. The deal is simple: you commit to a consistent amount of compute usage (measured in $/hour) for one or three years, and AWS grants a discount on the on-demand price. Read Reduce your AWS bill with Savings Plans to learn more.
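
Why baseline capacity? Here is a deliberately simplified model, an assumption rather than AWS's exact billing logic: the hourly commitment is billed whether you use it or not, it absorbs usage worth `commitment / (1 - discount)` at on-demand prices, and everything above that is billed on-demand. The usage profile and the 25% discount are hypothetical.

```python
def hourly_cost(on_demand_spend, commitment, discount):
    """Simplified Savings Plan model (an assumption, not AWS's billing engine).

    on_demand_spend: what the hour's usage would cost at on-demand prices
    commitment: committed spend per hour, billed whether it is used or not
    discount: Savings Plan discount compared to on-demand, e.g. 0.25
    """
    covered = commitment / (1 - discount)  # on-demand value the commitment absorbs
    return commitment + max(0.0, on_demand_spend - covered)


def daily_cost(profile, commitment, discount):
    return sum(hourly_cost(spend, commitment, discount) for spend in profile)


# Hypothetical day: $10/h baseline for 16 hours, $20/h during 8 peak hours
profile = [10.0] * 16 + [20.0] * 8
print(sum(profile))                     # 320.0 without a Savings Plan
print(daily_cost(profile, 7.5, 0.25))   # 260.0 commitment sized for the baseline
print(daily_cost(profile, 15.0, 0.25))  # 360.0 commitment sized for the peak
```

In this example, sizing the commitment for the peak is more expensive than paying on-demand, because the commitment goes unused during the 16 quiet hours.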

Identify and terminate unused instances. Boring but very efficient.

Verify that the instance type still reflects the current workload. Check the CloudWatch metrics for CPU, storage I/O, and networking to come up with a first guess. After that, experiment to test your assumption.
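
A first guess can be derived from CPUUtilization data points, for example fetched with CloudWatch's GetMetricStatistics for the AWS/EC2 namespace. The sketch below looks only at CPU and uses the 95th percentile; the 40% and 80% thresholds are hypothetical rules of thumb, and the result is merely a hint for the experiment that follows.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def rightsizing_hint(cpu_samples, low=40.0, high=80.0):
    """Rough hint from CPUUtilization samples (percent); thresholds are hypothetical."""
    p95 = percentile(cpu_samples, 95)
    if p95 < low:
        return "downsize candidate"
    if p95 > high:
        return "upsize candidate"
    return "keep"


# Hypothetical hourly samples covering roughly four days
print(rightsizing_hint([10.0] * 95 + [60.0] * 5))  # downsize candidate
print(rightsizing_hint([55.0] * 100))              # keep
```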

Verify that the maximum I/O performance of the instance matches the performance of your EBS volumes. Remember that there is a network between your EC2 instance and your EBS volume. The instance type limits the maximum throughput to all attached EBS volumes. Make sure this matches the configuration of your EBS volumes, where the volume type and provisioned IOPS define the maximum throughput. See EBS Volume Types and EBS-Optimized Instances for more details.
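
The comparison boils down to taking the minimum of two limits. This sketch assumes you have looked up the instance's maximum EBS bandwidth and each volume's maximum throughput in the documentation mentioned above; the numbers in the example are hypothetical.

```python
def effective_ebs_throughput(instance_max_mbps, volume_max_mbps):
    """Aggregate EBS throughput in MB/s and the side that caps it.

    instance_max_mbps: maximum EBS bandwidth of the instance type
    volume_max_mbps: maximum throughput of each attached volume
    """
    volumes_total = sum(volume_max_mbps)
    if volumes_total > instance_max_mbps:
        # The volumes could deliver more than the instance can consume
        return instance_max_mbps, "instance"
    return volumes_total, "volumes"


# Hypothetical limits: an instance capped at 600 MB/s
print(effective_ebs_throughput(600, [250, 250, 250]))  # (600, 'instance')
print(effective_ebs_throughput(600, [250, 250]))       # (500, 'volumes')
```

In the first case, you pay for volume performance the instance cannot use: either pick a larger instance type or provision less on the volumes.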

Use Spot Instances for stateless and non-production workloads. Before using them in production, keep in mind that AWS might interrupt your Spot Instances at any time, with only a two-minute warning. Read 3 simple ways of saving up to 90% of EC2 costs to learn more.

Switching to the latest instance types often cuts costs. For example, migrating from m4.large to m5.large reduces the costs by 4%. On top of that, you get a small performance improvement as well.
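
The 4% follows from the on-demand prices, assumed here to be $0.10 per hour for m4.large and $0.096 per hour for m5.large (Linux, US East (N. Virginia) at the time of writing; check the current pricing page). The fleet of 20 instances is hypothetical, and 730 hours per month is the convention used by the AWS pricing pages.

```python
HOURS_PER_MONTH = 730  # convention used by the AWS pricing pages


def migration_savings(old_hourly, new_hourly, instances=1, hours=HOURS_PER_MONTH):
    """Monthly savings in USD and relative savings in percent."""
    saved = (old_hourly - new_hourly) * hours * instances
    percent = (old_hourly - new_hourly) / old_hourly * 100
    return round(saved, 2), round(percent, 2)


# Assumed prices: m4.large $0.10/h, m5.large $0.096/h; hypothetical fleet of 20
print(migration_savings(0.10, 0.096, instances=20))  # (58.4, 4.0)
```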

Using AMD- or ARM-based instance types instead of Intel-based instance types is also worth a look.

  • Savings potential for AMD-based instance types (e.g., t3a, m5a, and r5a): 10%
  • Savings potential for ARM-based instance types: 40%

On the one hand, migrating to an Open Source operating system takes a little more work. Our operating system of choice on AWS is Amazon Linux, a free-of-charge Linux distribution maintained by Amazon. On the other hand, the cost savings are enormous.

EBS

Commonly, EBS snapshots pile up. Therefore, delete snapshots of data that you no longer need. Also, check whether your backup solution deletes old snapshots. Have you written a backup solution with Lambda? Replace it with AWS Backup. Read Review: AWS Backup - A centralized place for managing backups? to learn more.
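
Finding candidates by age is straightforward. This minimal sketch filters the `Snapshots` list returned by EC2 DescribeSnapshots (boto3: `ec2.describe_snapshots(OwnerIds=["self"])`, ideally through `ec2.get_paginator("describe_snapshots")`). The 90-day retention period and the sample data are hypothetical; review the candidates before calling `ec2.delete_snapshot()`.

```python
from datetime import datetime, timedelta, timezone


def snapshots_older_than(snapshots, days, now=None):
    """IDs of snapshots whose StartTime lies more than `days` in the past.

    `snapshots` is the "Snapshots" list of an EC2 DescribeSnapshots response;
    boto3 returns StartTime as a timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


# Made-up sample data
snapshots = [
    {"SnapshotId": "snap-0aaa", "StartTime": datetime(2019, 6, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-0bbb", "StartTime": datetime(2020, 1, 1, tzinfo=timezone.utc)},
]
print(snapshots_older_than(snapshots, 90, now=datetime(2020, 1, 9, tzinfo=timezone.utc)))
# ['snap-0aaa']
```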

Delete snapshots belonging to unused AMIs. A typical waste management problem when your deployment pipeline builds AMIs for every commit.
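
A snapshot backing an AMI cannot be deleted while the AMI is registered, so the AMI has to be deregistered first. The sketch below takes the `Images` list from `ec2.describe_images(Owners=["self"])` plus a set of AMI IDs still in use, which you have to assemble yourself (running instances, launch templates, launch configurations, and so on). It returns a plan mapping each unused AMI to its snapshots, to be applied with `ec2.deregister_image()` followed by `ec2.delete_snapshot()`. All IDs are made up.

```python
def unused_ami_cleanup(images, ami_ids_in_use):
    """Map every AMI that is not in use to the EBS snapshot IDs backing it.

    images: the "Images" list of an EC2 DescribeImages response
    ami_ids_in_use: AMI IDs still referenced somewhere (you assemble this set)
    """
    plan = {}
    for image in images:
        if image["ImageId"] in ami_ids_in_use:
            continue
        plan[image["ImageId"]] = [
            mapping["Ebs"]["SnapshotId"]
            for mapping in image.get("BlockDeviceMappings", [])
            # Instance store mappings have no "Ebs" key
            if "SnapshotId" in mapping.get("Ebs", {})
        ]
    return plan


# Made-up sample data: two AMIs built by a pipeline, one still in use
images = [
    {"ImageId": "ami-0001", "BlockDeviceMappings": [
        {"DeviceName": "/dev/xvda", "Ebs": {"SnapshotId": "snap-0001"}},
    ]},
    {"ImageId": "ami-0002", "BlockDeviceMappings": [
        {"DeviceName": "/dev/xvda", "Ebs": {"SnapshotId": "snap-0002"}},
        {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
    ]},
]
print(unused_ami_cleanup(images, {"ami-0001"}))  # {'ami-0002': ['snap-0002']}
```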

Search for unused volumes and delete them. Check whether someone (script, Kubernetes, …) creates volumes automatically and does not clean them up.
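
Volumes in state `available` are not attached to any instance. This sketch filters the `Volumes` list from `ec2.describe_volumes()` (the `status` filter can do the same on the server side) and estimates the monthly waste, assuming a gp2 price of $0.10 per GB-month; the volumes are made up. Tags on the volumes are often the best hint about which script or cluster created them.

```python
def unattached_volumes(volumes, price_per_gb_month=0.10):
    """IDs of volumes not attached to any instance and their estimated monthly cost.

    volumes: the "Volumes" list of an EC2 DescribeVolumes response
    price_per_gb_month: assumed gp2 price; Size is reported in GiB
    """
    idle = [v for v in volumes if v["State"] == "available"]
    size = sum(v["Size"] for v in idle)
    return [v["VolumeId"] for v in idle], round(size * price_per_gb_month, 2)


# Made-up sample data
volumes = [
    {"VolumeId": "vol-0aaa", "State": "available", "Size": 100},
    {"VolumeId": "vol-0bbb", "State": "in-use", "Size": 50},
    {"VolumeId": "vol-0ccc", "State": "available", "Size": 20},
]
print(unattached_volumes(volumes))  # (['vol-0aaa', 'vol-0ccc'], 12.0)
```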

S3

It’s obvious but still valid: delete unnecessary objects and buckets.

Consider using S3 Intelligent-Tiering. Or, if you need to archive data, check out Glacier Deep Archive. Read 6 new ways to reduce your AWS bill with little effort to learn more.

Configure lifecycle policies to define a retention period for objects. Read Object Lifecycle Management to learn more.
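
A lifecycle configuration is a small document of rules. This sketch builds one in the shape expected by boto3's `put_bucket_lifecycle_configuration()`; the `logs/` prefix, the 90 and 30 days, and the bucket name are hypothetical.

```python
def lifecycle_configuration(prefix, expire_after_days, noncurrent_days=None):
    """Build an S3 lifecycle configuration that expires objects under `prefix`."""
    rule = {
        "ID": "expire-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": expire_after_days},
    }
    if noncurrent_days is not None:
        # For versioned buckets: also delete previous object versions
        rule["NoncurrentVersionExpiration"] = {"NoncurrentDays": noncurrent_days}
    return {"Rules": [rule]}


config = lifecycle_configuration("logs/", 90, noncurrent_days=30)
print(config["Rules"][0]["ID"])  # expire-logs

# Applying it (requires boto3 and credentials; the bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```

Keep in mind that `put_bucket_lifecycle_configuration` replaces the bucket's existing lifecycle configuration as a whole, so merge existing rules first.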

VPC

Check costs for NAT gateways. I’ve seen scenarios where placing EC2 instances into a public subnet was the only option to avoid horrendous traffic costs.

Also, create VPC endpoints for S3 and DynamoDB. Doing so reduces the traffic processed by your NAT gateways.
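
Gateway endpoints for S3 and DynamoDB come at no additional charge, so every gigabyte routed through them instead of through a NAT gateway saves the NAT gateway's data processing fee. The sketch assumes $0.045 per GB, the price in US East (N. Virginia) at the time of writing; the traffic volume is hypothetical.

```python
NAT_PROCESSING_PER_GB = 0.045  # assumed NAT gateway data processing price, US East


def gateway_endpoint_savings(gb_per_month, price_per_gb=NAT_PROCESSING_PER_GB):
    """Monthly NAT gateway processing charges avoided by sending S3 and DynamoDB
    traffic through gateway endpoints instead of the NAT gateway."""
    return round(gb_per_month * price_per_gb, 2)


# Hypothetical: 10 TB per month between private subnets and S3
print(gateway_endpoint_savings(10_000))  # 450.0
```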

Traffic within the VPC is free? No, it is not. Traffic between Availability Zones is charged at $0.01/GB in each direction, effectively $0.02/GB. Check the traffic costs and think about changing your architecture when necessary.
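
A back-of-the-envelope calculation with the $0.02/GB from above; the database replicating 500 GB per day into another AZ is a hypothetical example.

```python
def cross_az_monthly_cost(gb_per_day, price_per_gb=0.02, days=30):
    """Monthly inter-AZ traffic cost; $0.02/GB = $0.01/GB charged on each side."""
    return round(gb_per_day * days * price_per_gb, 2)


# Hypothetical: a database replicating 500 GB per day into another AZ
print(cross_az_monthly_cost(500))  # 300.0
```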

An interface VPC endpoint costs $7.20 per AZ and month in US East (N. Virginia). Adding interface endpoints for ten services in 3 AZs costs $216 per month, and that does not even include the processed data. In general, avoid interface VPC endpoints when possible. Read 6 new ways to reduce your AWS bill with little effort to learn more.
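
The $216 is a simple multiplication of services, AZs, and the $7.20 per AZ and month; the charge for processed data comes on top and is not modeled here. The five-service case is a hypothetical comparison.

```python
def interface_endpoint_monthly_cost(services, azs, per_az_month=7.20):
    """Fixed monthly cost of interface VPC endpoints; processed data is not included."""
    return round(services * azs * per_az_month, 2)


print(interface_endpoint_monthly_cost(10, 3))  # 216.0
print(interface_endpoint_monthly_cost(5, 3))   # 108.0
```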

CloudWatch

Configure a retention period for all log groups. For example, delete log messages after 30 days.
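
Log groups without a retention period keep their events forever; in the CloudWatch Logs DescribeLogGroups response, such groups simply lack the `retentionInDays` field. This sketch finds them; the sample names are made up, and the commented loop shows how the boto3 calls `describe_log_groups` (paginated) and `put_retention_policy` could apply a 30-day retention.

```python
def log_groups_without_retention(log_groups):
    """Names of log groups that never expire (no retentionInDays set).

    log_groups: the "logGroups" list of a CloudWatch Logs DescribeLogGroups response
    """
    return [group["logGroupName"] for group in log_groups if "retentionInDays" not in group]


# Made-up sample data
groups = [
    {"logGroupName": "/aws/lambda/example-function"},
    {"logGroupName": "/app/example", "retentionInDays": 30},
]
print(log_groups_without_retention(groups))  # ['/aws/lambda/example-function']

# Applying a 30-day retention (requires boto3 and credentials):
# import boto3
# logs = boto3.client("logs")
# for page in logs.get_paginator("describe_log_groups").paginate():
#     for name in log_groups_without_retention(page["logGroups"]):
#         logs.put_retention_policy(logGroupName=name, retentionInDays=30)
```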

Check costs for metrics API calls caused by 3rd party tools (e.g., Prometheus, Datadog, …). Make sure you are only polling metrics that you need and set the polling interval to 5 minutes.
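
The polling interval drives the number of requests linearly. This sketch counts the metrics a tool requests per month when it fetches each metric once per interval, priced at an assumed $0.01 per 1,000 metrics requested (the GetMetricData price in US East at the time of writing). Real tools batch their requests differently, so treat the result as an order of magnitude; the 500 metrics are hypothetical.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200


def metrics_requested_per_month(metrics, interval_minutes):
    """Metrics requested when every metric is polled once per interval."""
    return metrics * MINUTES_PER_MONTH // interval_minutes


def monthly_polling_cost(metrics, interval_minutes, price_per_1000=0.01):
    requested = metrics_requested_per_month(metrics, interval_minutes)
    return round(requested / 1000 * price_per_1000, 2)


# Hypothetical: a monitoring tool polling 500 metrics
print(monthly_polling_cost(500, 1))  # 216.0
print(monthly_polling_cost(500, 5))  # 43.2
```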

Each CloudWatch alarm costs $0.10 per month; a CloudWatch dashboard costs $3.00 per month. Therefore, delete unneeded alarms and dashboards.

Identify unnecessary custom metrics. For example, configure the CloudWatch Agent to send only the metrics that you need for monitoring.

Check costs for log ingestion. It might be necessary to reduce or filter the log events that you send to CloudWatch.

Serverless

Optimize memory configuration for Lambda functions. Check out AWS Lambda Power Tuning.
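
Lambda bills compute in GB-seconds, and more memory also means more CPU, so a larger memory setting can be cheaper when it shortens the duration enough. This sketch assumes $0.0000166667 per GB-second, ignores rounding of the billed duration, and leaves out the per-request fee, which is the same for every memory setting; the measured durations are hypothetical. AWS Lambda Power Tuning automates collecting such measurements.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed Lambda compute price (x86), US East


def cost_per_million_invocations(memory_mb, duration_ms, price=PRICE_PER_GB_SECOND):
    """Compute cost of one million invocations; the request fee is left out."""
    gb_seconds = memory_mb / 1024 * duration_ms / 1000
    return gb_seconds * price * 1_000_000


def cheapest_memory(durations):
    """durations: {memory_mb: average duration in ms} from a tuning run."""
    return min(durations, key=lambda mb: cost_per_million_invocations(mb, durations[mb]))


# Hypothetical measurements of one function at four memory settings
measured = {128: 2400, 256: 1100, 512: 500, 1024: 400}
for mb, ms in measured.items():
    print(mb, round(cost_per_million_invocations(mb, ms), 2))
print(cheapest_memory(measured))  # 512
```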

Use Provisioned Concurrency to reduce costs for high-traffic Lambda functions, and evaluate HTTP APIs as a cheaper alternative to REST APIs in API Gateway. Read All you need to know about AWS re:Invent in 2019 to learn more.
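
The request price is where HTTP APIs shine. This sketch assumes the first-tier prices in US East at the time of writing ($3.50 per million requests for REST APIs, $1.00 per million for HTTP APIs) and ignores volume tiers and data transfer; the traffic is hypothetical. HTTP APIs do not offer every REST API feature, so check the feature comparison before switching.

```python
REST_API_PER_MILLION = 3.50  # assumed price for REST APIs, first pricing tier, US East
HTTP_API_PER_MILLION = 1.00  # assumed price for HTTP APIs, first pricing tier, US East


def monthly_request_cost(requests, price_per_million):
    return round(requests / 1_000_000 * price_per_million, 2)


requests = 100_000_000  # hypothetical monthly traffic
print(monthly_request_cost(requests, REST_API_PER_MILLION))  # 350.0
print(monthly_request_cost(requests, HTTP_API_PER_MILLION))  # 100.0
```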

ECS

Using Fargate allows you to get rid of an overprovisioned fleet of EC2 instances. If using Fargate is not an option, check out the ECS Capacity Provider to scale the fleet of EC2 instances easily. Read ECS vs. Fargate: What’s the difference? to learn more.

Purchase Savings Plans for Fargate. Read Reduce your AWS bill with Savings Plans to learn more.

Use Fargate Spot for non-production workloads. Read AWS Fargate Spot Now Generally Available to learn more.

RDS

Enable RDS Storage Auto Scaling instead of over-provisioning storage capacity.

Consider switching to Aurora Serverless for unsteady workloads. Check out our review to learn about the pros and cons.

And don’t forget to verify that the instance type of your database still reflects the current workload. Also, check that the maximum I/O performance of the compute layer matches the storage layer.

License costs for traditional database systems are tremendous. Migrating to an Open Source database should be on your short- or long-term TODO list.

DynamoDB

Switch to On-demand capacity mode for unsteady workloads. Read Cost savings with DynamoDB On-Demand: Lessons learned to learn more.
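
Why on-demand pays off for unsteady workloads: provisioned capacity must cover the peak and is billed every hour, while on-demand bills each request. This sketch compares writes only, assumes items of up to 1 KB (one write capacity unit per write per second), ignores the free tier and auto scaling, and uses assumed US East prices from the time of writing ($1.25 per million on-demand writes, $0.00065 per WCU-hour). Both workloads are hypothetical.

```python
HOURS_PER_MONTH = 730
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600
ON_DEMAND_PER_MILLION_WRITES = 1.25  # assumed on-demand price, US East, at the time of writing
PER_WCU_HOUR = 0.00065               # assumed provisioned price, US East, at the time of writing


def on_demand_cost(writes_per_month):
    return round(writes_per_month / 1_000_000 * ON_DEMAND_PER_MILLION_WRITES, 2)


def provisioned_cost(peak_writes_per_second):
    """Capacity sized for the peak (1 WCU = one write/s of up to 1 KB), billed every hour."""
    return round(peak_writes_per_second * HOURS_PER_MONTH * PER_WCU_HOUR, 2)


# Unsteady: 10 million writes per month, but bursts of up to 500 writes/s
print(on_demand_cost(10_000_000), provisioned_cost(500))  # 12.5 237.25
# Steady: a constant 100 writes/s around the clock
print(on_demand_cost(100 * SECONDS_PER_MONTH), provisioned_cost(100))  # 328.5 47.45
```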

If on-demand capacity is not for you, use auto-scaling to adjust the provisioned capacity to the workload.

Elasticsearch

Make use of Reserved Instances where planning one year ahead is feasible.

Evaluate UltraWarm tier (Preview) to retain large amounts of data at lower costs. Read UltraWarm for Amazon Elasticsearch Service to learn more.

Route 53

Increase TTL for records to reduce queries.

Are you using Route 53 resolver endpoints? You typically pay $270 per month for endpoints in 3 AZs. Therefore, you might want to consolidate your resolver endpoints from multiple AWS accounts.
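
A quick calculation with the $270 per month from above; the five accounts are a hypothetical example. One way to consolidate outbound endpoints is to share Resolver rules from a central account with other accounts through AWS Resource Access Manager.

```python
PER_ENDPOINT_MONTH = 270.0  # resolver endpoint in 3 AZs, as quoted above


def resolver_endpoint_cost(accounts, endpoints_per_account=1, per_endpoint=PER_ENDPOINT_MONTH):
    return accounts * endpoints_per_account * per_endpoint


print(resolver_endpoint_cost(5))  # 1350.0 one endpoint in each of 5 accounts
print(resolver_endpoint_cost(1))  # 270.0 one shared endpoint
```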

CloudFront

Check the hit/miss ratio of the cache and adjust your configuration and TTL accordingly.
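
The hit ratio directly determines how many requests reach your origin. This sketch computes it from the `x-edge-result-type` values in CloudFront standard logs, counting only `Hit` as a cache hit (a simplifying assumption), and then estimates origin requests; the request numbers are hypothetical.

```python
from collections import Counter


def cache_hit_ratio(edge_result_types):
    """Share of requests answered from the cache.

    edge_result_types: values of the x-edge-result-type field from CloudFront
    standard logs; this sketch counts only "Hit" as a cache hit.
    """
    counts = Counter(edge_result_types)
    total = sum(counts.values())
    return counts["Hit"] / total if total else 0.0


def origin_requests(total_requests, hit_ratio):
    """Requests that reach the origin (for example S3) for a given hit ratio."""
    return round(total_requests * (1 - hit_ratio))


print(cache_hit_ratio(["Hit"] * 9 + ["Miss"]))  # 0.9
print(origin_requests(10_000_000, 0.9))         # 1000000
print(origin_requests(10_000_000, 0.98))        # 200000
```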

Bypassing the CloudFront cache and loading assets directly from S3 is more expensive. Therefore, restrict access to S3 by using an Origin Access Identity.

CloudTrail

Simple: delete unnecessary trails. Keep in mind that only the first copy of management events is free; every additional trail recording them results in additional costs.

Check costs for data events (S3 and Lambda). Read AWS CloudTrail: your audit log is incomplete to learn more.
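
Data events add up quickly on busy buckets or functions. This sketch assumes $0.10 per 100,000 data events, the price at the time of writing; the request volume is hypothetical.

```python
def data_event_cost(events_per_month, price_per_100k=0.10):
    """CloudTrail data events are billed per 100,000 events delivered."""
    return round(events_per_month / 100_000 * price_per_100k, 2)


# Hypothetical: 50 million S3 object-level requests per month on a logged bucket
print(data_event_cost(50_000_000))  # 50.0
```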

Now it is up to you. Go and reduce your AWS bill!

One more thing: make sure you have created a budget alarm to get notified about unexpected costs in advance. Consult the AWS documentation or learn how to receive AWS budget alarms via Slack.

Special thanks to Thorsten Höger from Taimos for reviewing this blog post.

Andreas Wittig

I’ve been building on AWS since 2012 together with my brother Michael. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV, attachmentAV, HyperEnv, and marbot.
