Cheap, Durable, Fast. How to choose an EBS volume type?

Andreas Wittig – 19 Jan 2021

Amazon Elastic Block Store (EBS) provides solid state drives (SSD) and hard disk drives (HDD) for EC2 instances. The virtual machine accesses the persistent storage over the network. In December 2020, AWS announced another volume type: General Purpose SSD (gp3). So there are now three volume types based on SSDs. In this blog post, I compare gp2, gp3, and io2 volumes and offer guidance on choosing the volume type that best fits a specific scenario.


gp2 (General Purpose SSD)

At first sight, the gp2 volume type is easy to use. The volume size determines the price as well as the baseline throughput (IOPS and bandwidth).

Baseline Throughput (IOPS) = MIN( MAX( Volume Size (GiB) * 3, 100 ), 16,000 )

Volumes smaller than 1,000 GiB can burst up to 3,000 IOPS, but only as long as I/O credits are left. I’ve seen many infrastructures fail because the performance was fine during testing but degraded significantly an hour after the go-live in production.
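The gp2 sizing rule can be sketched as a tiny helper. Note that, per the EBS documentation, gp2 also has a floor of 100 IOPS and a ceiling of 16,000 IOPS; the function name is mine:

```python
# gp2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000.
def gp2_baseline_iops(size_gib: int) -> int:
    return min(max(size_gib * 3, 100), 16_000)

# Volumes below 1,000 GiB sit under the 3,000 IOPS burst ceiling,
# so burst performance masks the lower baseline during short tests.
print(gp2_baseline_iops(100))    # 300
print(gp2_baseline_iops(1000))   # 3000
print(gp2_baseline_iops(6000))   # 16000
```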

gp3 (General Purpose SSD)

The latest generation of General Purpose SSD volumes is different. Every volume comes with a baseline performance of 3,000 IOPS and 125 MiB/s regardless of its size. So the baseline performance of a gp3 volume equals the burst capacity of a gp2 volume.

Be careful: gp2 volumes of 334 GiB and larger provide a maximum bandwidth of 250 MiB/s, while a gp3 volume comes with only 125 MiB/s by default.

Surprisingly, it is possible to increase the maximum throughput of a gp3 volume by provisioning additional IOPS and bandwidth. Provisioned performance used to be the unique selling point of io2 volumes, which I present next.

A gp3 volume supports up to 16,000 IOPS and 1,000 MiB/s. I want to highlight the maximum bandwidth of 1,000 MiB/s, four times as much as gp2 offers.

AWS charges by size, provisioned IOPS, and provisioned bandwidth. The first 3,000 IOPS and 125 MiB/s are included at no extra cost for every volume.

While AWS advertises gp3 at a price point up to 20% lower than gp2, in reality the advantage is somewhere between 7% and 20%, depending on how many IOPS and how much throughput you provision on top of the baseline.

io2 (Provisioned IOPS SSD)

The io2 volume type works similarly to gp3. When creating an io2 volume, you specify the size as well as the provisioned IOPS. Again, AWS charges by size and provisioned IOPS.

By the way, AWS announced io2 in August 2020. The Relational Database Service (RDS) does not support io2 volumes yet. The service still runs on io1.

However, an io2 volume is much more expensive than a gp3 volume, as shown in the following example.

Volume Size (GB)   IOPS    Bandwidth (MiB/s)   gp3 (USD/month)   io2 (USD/month)
1000               3000    125                 80                320
2000               6000    250                 180               640
2000               12000   500                 220               1030

As a rule of thumb, an io2 volume costs 3 to 4 times as much as a gp3 volume.
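The rule of thumb follows directly from the pricing model. Here is a sketch that reproduces the table; the per-unit rates are the ones the table implies (roughly $0.08 per GB-month for gp3 plus $0.005 per provisioned IOPS above 3,000 and $0.04 per MiB/s above 125, and $0.125 per GB-month plus $0.065 per provisioned IOPS for io2). Actual prices vary by region, so treat the numbers as illustrative:

```python
# Monthly cost sketch. Rates are approximations matching the table
# above; actual EBS pricing varies by region.
def gp3_cost(size_gb: int, iops: int, mibps: int) -> float:
    return (size_gb * 0.08                    # storage
            + max(iops - 3000, 0) * 0.005     # IOPS above the included 3,000
            + max(mibps - 125, 0) * 0.04)     # bandwidth above 125 MiB/s

def io2_cost(size_gb: int, iops: int) -> float:
    return size_gb * 0.125 + iops * 0.065     # storage + every provisioned IOPS

print(round(gp3_cost(1000, 3000, 125)))   # 80
print(round(io2_cost(1000, 3000)))        # 320
print(round(gp3_cost(2000, 12000, 500)))  # 220
print(round(io2_cost(2000, 12000)))       # 1030
```

Because io2 charges for every provisioned IOPS while gp3 includes the first 3,000, the gap widens as the workload grows.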

So what is the big difference between io2 and gp3?

  1. Durability: the annual failure rate of an io2 volume is 0.001%. That’s a huge difference compared to gp3 with an annual failure rate of 0.2%. In other words, 1 in 500 gp3 volumes fails every year, but only 1 in 100,000 io2 volumes.
  2. SLA on throughput: an io2 volume promises to deliver the provisioned performance 99.9 percent of the time. There is no such guarantee for gp3 volumes.
  3. Maximum throughput: an io2 volume supports up to 64,000 IOPS and 1,000 MiB/s. That’s four times the maximum IOPS of a gp3 volume. However, both volume types do not provide more than 1,000 MiB/s bandwidth.
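To make the durability difference tangible, here is a quick back-of-the-envelope sketch. The fleet size is a made-up example; the failure rates are the ones quoted above:

```python
# Expected volume failures per year for a fleet, given the
# published annual failure rates (AFR).
def expected_failures_per_year(volumes: int, afr: float) -> float:
    return volumes * afr

fleet = 1000  # hypothetical fleet size
print(expected_failures_per_year(fleet, 0.002))    # gp3 at 0.2% AFR
print(expected_failures_per_year(fleet, 0.00001))  # io2 at 0.001% AFR
```

With 1,000 volumes, a 0.2% AFR means roughly two failed volumes per year, while the same fleet on io2 would statistically lose one volume every hundred years.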

I decided not to discuss the previous generation io1 in this blog post. In summary, the previous generation is more expensive, less durable, and comes with lower maximum bandwidth. Check out Amazon EBS volume types to learn more.

I/O Benchmark

Some AWS customers are complaining about degraded performance when switching from gp2 to gp3. For example, Silas has written down his experiences with gp3 and an Elasticsearch cluster. That’s why I decided to benchmark the three different volume types.

I did my test on Jan 11, 2021, with the following setup.

  • Region: eu-west-1
  • EC2 Instance: m5.8xlarge
  • gp2 Volume: 1000 GiB
  • gp3 Volume: 1000 GiB, 3000 IOPS, 125 MiB/s
  • io2 Volume: 1000 GiB, 3000 IOPS
  • File System: xfs
  • Duration: 120 min

I’ve used fio to measure the I/O performance with the following commands.

fio --directory=/mnt/gp3 --name gp3 --direct=1 --rw=randrw --bs=16k --size=1G --numjobs=16 --time_based --runtime=7200 --group_reporting --norandommap
fio --directory=/mnt/gp2 --name gp2 --direct=1 --rw=randrw --bs=16k --size=1G --numjobs=16 --time_based --runtime=7200 --group_reporting --norandommap
fio --directory=/mnt/io2 --name io2 --direct=1 --rw=randrw --bs=16k --size=1G --numjobs=16 --time_based --runtime=7200 --group_reporting --norandommap

The following table shows the results.

Read                    io2     gp3     gp2
bw (KB/s)             23980   23991   23991
iops                   1498    1499    1499
clat avg (usec)        5293    5302    5473
clat stdev (usec)       261     524    2702
clat 90.00p (usec)     5536    5664    5984
clat 95.00p (usec)     5664    5792    7264
clat 99.00p (usec)     5856    5984   16768
clat 99.90p (usec)     6688    6880   40192
clat 99.99p (usec)     8768    9664   69120

Write                   io2     gp3     gp2
bw (KB/s)             23989   24000   24000
iops                   1499    1499    1499
clat avg (usec)        5379    5365    5194
clat stdev (usec)       229     554    1041
clat 90.00p (usec)     5600    5728    5600
clat 95.00p (usec)     5728    5856    5728
clat 99.00p (usec)     5920    6496    8384
clat 99.90p (usec)     7520   11456   16512
clat 99.99p (usec)     9280   17536   23936

As expected, all three volume types delivered 3,000 IOPS (1,500 read IOPS and 1,500 write IOPS) over 2 hours.

Please note that I used a block size of 16 KB for my I/O benchmark. That’s why the volumes are not reaching their maximum bandwidth.
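The arithmetic behind this is simple: throughput equals IOPS multiplied by block size, so 1,500 IOPS per direction at 16 KiB blocks works out to roughly the 24 MB/s shown in the tables:

```python
# Throughput is IOPS multiplied by block size. With 3,000 IOPS split
# evenly between reads and writes and a 16 KiB block size:
iops_per_direction = 3000 // 2   # 1,500 read + 1,500 write
block_size_kib = 16

bw_kib_per_s = iops_per_direction * block_size_kib
print(bw_kib_per_s)  # 24000, i.e. ~24 MB/s, matching the fio results
# That is far below even the 125 MiB/s gp3 baseline: this benchmark
# is IOPS-bound, not bandwidth-bound.
```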

However, there are some differences in the completion latency (clat), the time from submission to completion of an I/O request. The latency is much more stable (see stdev and the percentiles) for io2 volumes. From a latency predictability point of view, a gp3 volume sits somewhere in the middle between an io2 and a gp2 volume.

In summary, I could not find any evidence that switching from gp2 to gp3 slows down your workloads.

Summary

  • The volume type gp2 is outdated. A gp3 volume is more cost-effective and more predictable, as it does not rely on burstable performance.
  • The volume type io1 is outdated. Choose io2 whenever available in your region.
  • The volume type io2 is expensive but much more durable. On top of that, an io2 volume provides an SLA on the provisioned throughput. Therefore, I recommend io2 for production-critical database workloads.

Andreas Wittig


I’ve been building on AWS since 2012 together with my brother Michael. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV, HyperEnv for GitHub Actions, and marbot.

Here are the contact options for feedback and questions.