Cheap, Durable, Fast. How to choose an EBS volume type?
Elastic Block Store (EBS) provides solid state drives (SSD) and hard disk drives (HDD) for EC2 instances. The virtual machine accesses the persistent storage via the network. In December 2020, AWS announced another volume type called General Purpose SSD (gp3). So now there are three volume types based on SSDs. In this blog post, I compare gp2, gp3, and io2 volumes and guide you in choosing the volume type that best fits a specific scenario.
At first sight, the gp2 volume type is easy to use. The volume size determines the price as well as the baseline throughput (IOPS and bandwidth).

Baseline Throughput (IOPS) = MIN( MAX( Volume Size (GiB) * 3, 100 ), 16,000 )

Volumes smaller than 1,000 GiB can burst up to 3,000 IOPS for a short period per day. I’ve seen many infrastructures fail because the performance was fine during testing but degraded significantly an hour after the go-live in production.
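The baseline rule can be sketched in a few lines of Python (a minimal sketch based on the documented gp2 behavior: 3 IOPS per GiB, with a floor of 100 and a cap of 16,000 IOPS):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, at least 100, at most 16,000 IOPS."""
    return min(max(size_gib * 3, 100), 16_000)

# Volumes below 1,000 GiB have a baseline under 3,000 IOPS and depend on
# burst credits to reach 3,000 IOPS temporarily.
for size_gib in (100, 334, 1_000, 5_334):
    print(size_gib, gp2_baseline_iops(size_gib))
```

Note how a volume only reaches its maximum baseline of 16,000 IOPS at 5,334 GiB, which is why sizing gp2 volumes for performance instead of capacity got expensive.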
The latest generation of General Purpose SSD volumes is different. Every volume comes with a baseline performance of 3,000 IOPS and 125 MB/s regardless of its size. So the baseline performance of a gp3 volume matches the burst capacity of a gp2 volume. But be careful: a gp2 volume larger than 333 GiB provides a maximum bandwidth of 250 MiB/s, while a gp3 volume comes with only 125 MB/s by default.
Surprisingly, it is possible to increase the maximum throughput of a gp3 volume by provisioning additional IOPS and bandwidth. Provisioned throughput used to be the unique selling point of io2 volumes, which I present next. A gp3 volume supports up to 16,000 IOPS and 1,000 MiB/s. I want to highlight the maximum bandwidth of 1,000 MiB/s, which is four times as much as gp2 offers.
AWS charges by size, IOPS, and bandwidth; 3,000 IOPS and 125 MB/s are included for free with every volume. While AWS says gp3 delivers up to a 20% lower price point than gp2, in reality the price advantage is somewhere between 7% and 20%, depending on the IOPS and throughput required in addition to the baseline.
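To see where the 7% to 20% range comes from, here is a rough cost sketch. The prices are assumptions based on the us-east-1 list prices at the time of writing; check the current pricing page before relying on them.

```python
def gp2_monthly_cost(size_gib):
    # assumed gp2 list price: 0.10 USD per GiB-month
    return size_gib * 0.10

def gp3_monthly_cost(size_gib, iops=3_000, mbps=125):
    # assumed gp3 list prices: 0.08 USD per GiB-month, plus 0.005 USD per
    # IOPS-month above the free 3,000, plus 0.04 USD per MB/s-month above
    # the free 125
    cost = size_gib * 0.08
    cost += max(iops - 3_000, 0) * 0.005
    cost += max(mbps - 125, 0) * 0.04
    return cost

# A 1,000 GiB gp2 volume delivers 3,000 IOPS and 250 MiB/s. Matching that
# bandwidth with gp3 requires provisioning 125 additional MB/s.
print(gp2_monthly_cost(1_000))              # 100.0
print(gp3_monthly_cost(1_000, 3_000, 250))  # 85.0, i.e. 15% cheaper
```

A workload that is happy with the 125 MB/s default gets the full 20% discount; provisioning extra IOPS and bandwidth eats into it.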
The io2 volume type works similarly to gp3. When creating an io2 volume, you specify the size as well as the provisioned IOPS. Again, AWS charges by size and provisioned IOPS.

By the way, AWS announced io2 in August 2020. The Relational Database Service (RDS) does not support io2 volumes yet; the service still runs on io1 volumes. An io2 volume is much more expensive than a gp3 volume, as shown in the following example.
| Volume Size | IOPS | Bandwidth (MB/s) | gp3 (USD/month) | io2 (USD/month) |
|---|---|---|---|---|
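Such a comparison can be approximated with a small sketch. The prices are assumptions based on the us-east-1 list prices at the time of writing (gp3: 0.08 USD per GiB-month plus 0.005 USD per IOPS-month above 3,000 and 0.04 USD per MB/s-month above 125; io2: 0.125 USD per GiB-month plus 0.065 USD per provisioned IOPS-month in the first tier); double-check the current pricing page.

```python
def gp3_monthly_cost(size_gib, iops=3_000, mbps=125):
    return (size_gib * 0.08
            + max(iops - 3_000, 0) * 0.005
            + max(mbps - 125, 0) * 0.04)

def io2_monthly_cost(size_gib, iops):
    # io2 charges for every provisioned IOPS, not only those above 3,000
    return size_gib * 0.125 + iops * 0.065

gp3 = gp3_monthly_cost(1_000, 3_000)  # 80.0 USD/month
io2 = io2_monthly_cost(1_000, 3_000)  # 320.0 USD/month
print(io2 / gp3)                      # 4.0
```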
As a rule of thumb, an io2 volume costs 3 to 4 times as much as a gp3 volume. So what is the big difference between gp3 and io2?
- Durability: the annual failure rate of an io2 volume is 0.001%. That’s a huge difference compared to gp3 with an annual failure rate of 0.2%. In other words, 1 out of 500 gp3 volumes fails every year, but only 1 out of 100,000 io2 volumes fails every year.
- SLA on throughput: an io2 volume promises to deliver the provisioned performance 99.9 percent of the time. There is no such guarantee for gp3 volumes.
- Maximum throughput: an io2 volume supports up to 64,000 IOPS and 1,000 MiB/s. That’s four times the maximum IOPS of a gp3 volume. However, neither volume type provides more than 1,000 MiB/s of bandwidth.
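To make those failure rates tangible: for a fleet of volumes, the expected number of failures per year is simply the fleet size times the annual failure rate. A trivial sketch using the rates from the list above:

```python
def expected_annual_failures(volumes, afr_percent):
    """Expected number of volume failures per year for a given fleet."""
    return volumes * afr_percent / 100

fleet_size = 1_000
print(expected_annual_failures(fleet_size, 0.2))    # gp3: about 2 failures/year
print(expected_annual_failures(fleet_size, 0.001))  # io2: about 0.01 failures/year
```

Either way, the failure rate is not zero, so regular EBS snapshots remain mandatory.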
I decided not to discuss the previous generation io1 in this blog post. In summary, the previous generation is more expensive, less durable, and comes with lower maximum bandwidth. Check out Amazon EBS volume types to learn more.

Some AWS customers have complained about degraded performance when switching from gp2 to gp3. For example, Silas has written down his experiences with gp3 and an Elasticsearch cluster. That’s why I decided to benchmark the three different volume types.
I did my test on Jan 11, 2021, with the following setup.

- EC2 Instance:
- gp2 Volume: 1,000 GiB
- gp3 Volume: 1,000 GiB, 3,000 IOPS, 125 MB/s
- io2 Volume: 1,000 GiB, 3,000 IOPS
- File System:
- Duration: 120 min
I’ve used fio to measure the I/O performance with the following command, repeated for each volume’s mount point.

```
fio --directory=/mnt/gp3 --name gp3 --direct=1 --rw=randrw --bs=16k --size=1G --numjobs=16 --time_based --runtime=7200 --group_reporting --norandommap
```
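If you want to extract the latency numbers programmatically, fio can also emit JSON with --output-format=json. Here is a minimal sketch for pulling clat percentiles out of such a report; the excerpt below is hypothetical and only mimics the structure of recent fio releases, which report completion latency in nanoseconds under clat_ns (older releases used clat in microseconds).

```python
import json

# Hypothetical excerpt mimicking `fio --output-format=json` output.
report = json.loads("""
{"jobs": [{"jobname": "gp3",
           "read":  {"clat_ns": {"mean": 5302000.0,
                                 "percentile": {"99.000000": 5984000}}},
           "write": {"clat_ns": {"mean": 5365000.0,
                                 "percentile": {"99.000000": 6496000}}}}]}
""")

def clat_usec(job, direction, pct="99.000000"):
    """Return a completion latency percentile in microseconds."""
    return job[direction]["clat_ns"]["percentile"][pct] / 1_000

job = report["jobs"][0]
print(job["jobname"], clat_usec(job, "read"), clat_usec(job, "write"))
```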
The following table shows the results.
Read:

| clat (usec) | io2 | gp3 | gp2 |
|---|---|---|---|
| avg | 5293 | 5302 | 5473 |
| stdev | 261 | 524 | 2702 |
| 90.00p | 5536 | 5664 | 5984 |
| 95.00p | 5664 | 5792 | 7264 |
| 99.00p | 5856 | 5984 | 16768 |
| 99.90p | 6688 | 6880 | 40192 |
| 99.99p | 8768 | 9664 | 69120 |

Write:

| clat (usec) | io2 | gp3 | gp2 |
|---|---|---|---|
| avg | 5379 | 5365 | 5194 |
| stdev | 229 | 554 | 1041 |
| 90.00p | 5600 | 5728 | 5600 |
| 95.00p | 5728 | 5856 | 5728 |
| 99.00p | 5920 | 6496 | 8384 |
| 99.90p | 7520 | 11456 | 16512 |
| 99.99p | 9280 | 17536 | 23936 |
As expected, all three volume types delivered 3,000 IOPS (1,500 read IOPS and 1,500 write IOPS) over 2 hours.

Please note, I’ve been using a block size of 16 KB for my I/O benchmark. That’s why the volumes are not reaching their maximum bandwidth.
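The arithmetic behind that note: at a fixed IOPS limit, the block size caps the achievable bandwidth.

```python
block_size_kb = 16
iops = 3_000

# 3,000 IOPS * 16 KB per I/O = roughly 48 MB/s, well below the bandwidth
# limits of 125 MB/s (gp3 default) and 250 MiB/s (gp2 at 1,000 GiB).
bandwidth_mb_per_s = iops * block_size_kb / 1_000
print(bandwidth_mb_per_s)  # 48.0
```

To saturate the bandwidth limit instead of the IOPS limit, rerun the benchmark with a larger block size (e.g. 256 KB).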
However, there are some differences in the completion latency (clat), the time from submission to completion of the I/O pieces. The latency is much more stable (see stdev and the percentiles) for io2 volumes. A gp3 volume sits somewhere in the middle between an io2 volume and a gp2 volume from a latency predictability point of view.
In summary, I could not find any hints on why switching from gp2 to gp3 should slow down your workloads.
- The volume type gp2 is outdated. A gp3 volume is more cost-effective and more predictable, as it does not rely on burstable performance.
- The volume type io1 is outdated. Choose io2 whenever it is available in your region.
- The volume type io2 is expensive but much more durable. On top of that, an io2 volume provides an SLA on the provisioned throughput. Therefore, I recommend io2 for production-critical database workloads.