Cheap, Durable, Fast. How to choose an EBS volume type?
Elastic Block Store (EBS) provides solid state drives (SSD) and hard disk drives (HDD) for EC2 instances. The virtual machine accesses the persistent storage via the network. In December 2020, AWS announced another volume type called General Purpose SSD (gp3). So now there are three volume types based on SSDs. In this blog post, I compare `gp2`, `gp3`, and `io2` volumes and offer guidance on choosing the volume type that best fits a specific scenario.
Do you prefer listening to a podcast episode over reading a blog post? Here you go!
gp2 (General Purpose SSD)
At first sight, the `gp2` volume type is easy to use. The volume size determines the price as well as the baseline throughput (IOPS and bandwidth).
Baseline Throughput (IOPS) = MIN( MAX( Volume Size (GiB) * 3, 100 ), 16,000 )
Volumes smaller than 1,000 GiB can burst up to 3,000 IOPS for a short period per day. I’ve seen many infrastructures fail because the performance was fine during testing but degraded significantly an hour after the go-live in production.
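The burst behavior is easier to reason about with a small sketch. The following Python snippet is my own illustration, not AWS code; it assumes the documented I/O credit bucket of 5.4 million credits per volume and the 3,000 IOPS burst rate.

```python
# Rough model of gp2 baseline IOPS and burst duration (illustrative sketch).

BURST_IOPS = 3_000          # gp2 volumes < 1,000 GiB can burst up to 3,000 IOPS
BUCKET_CREDITS = 5_400_000  # initial/maximum I/O credit balance per volume

def gp2_baseline_iops(size_gib: int) -> int:
    """3 IOPS per GiB, with a floor of 100 and a cap of 16,000 IOPS."""
    return min(max(size_gib * 3, 100), 16_000)

def gp2_burst_minutes(size_gib: int) -> float:
    """How long a full credit bucket sustains a constant 3,000 IOPS load.

    Credits drain at (burst - baseline) IOPS while bursting. Volumes whose
    baseline is already 3,000 IOPS or more never run out of credits.
    """
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= BURST_IOPS:
        return float("inf")
    return BUCKET_CREDITS / (BURST_IOPS - baseline) / 60

# A 100 GiB volume bursts for about 33 minutes under constant load,
# then drops to its 300 IOPS baseline.
print(gp2_baseline_iops(100), round(gp2_burst_minutes(100)))  # 300 33
```

This is exactly the go-live failure mode described above: the credit bucket carries the volume through testing, then empties under sustained production load.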
gp3 (General Purpose SSD)
The latest generation of General Purpose SSD volumes is different. Every volume comes with a baseline performance of 3,000 IOPS and 125 MB/s regardless of its size. So the baseline performance of a `gp3` volume is the same as the burst capacity of a `gp2` volume.
Be careful: a `gp2` volume larger than 333 GiB provides a maximum bandwidth of 250 MiB/s, while a `gp3` volume comes with only 125 MB/s by default.
Surprisingly, it is possible to increase the maximum throughput of a `gp3` volume by provisioning additional IOPS and bandwidth. Provisioned throughput used to be the unique selling point of `io2` volumes, which I present next.
A `gp3` volume supports up to 16,000 IOPS and 1,000 MiB/s. I want to highlight the maximum bandwidth of 1,000 MiB/s: four times as much as `gp2`.
AWS charges by size, IOPS, and bandwidth. 3,000 IOPS and 125 MB/s are included for free for every volume.
While AWS says gp3 delivers up to a 20% lower price point than gp2, in reality the price advantage is somewhere between 7% and 20%, depending on the IOPS and throughput required in addition to the baseline.
io2 (Provisioned IOPS SSD)
The `io2` volume type works similarly to `gp3`. When creating an `io2` volume, you specify the size as well as the provisioned IOPS. Again, AWS charges by size and provisioned IOPS.
By the way, AWS announced `io2` in August 2020. The Relational Database Service (RDS) does not support `io2` volumes yet; the service still runs on `io1`.
However, an `io2` volume is much more expensive than a `gp3` volume, as shown in the following example.
Volume Size (GiB) | IOPS | Bandwidth (MB/s) | gp3 (USD/month) | io2 (USD/month) |
---|---|---|---|---|
1000 | 3000 | 125 | 80 | 320 |
2000 | 6000 | 250 | 180 | 640 |
2000 | 12000 | 500 | 220 | 1030 |
As a rule of thumb, an `io2` volume costs 3 to 4 times as much as a `gp3` volume.
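The numbers in the table can be reproduced with a short script. The rates below are my reading of the us-east-1 list prices at the time of writing (gp3: 0.08 USD per GiB-month, plus 0.005 USD per IOPS-month above the free 3,000 and 0.04 USD per MB/s-month above the free 125; io2: 0.125 USD per GiB-month plus 0.065 USD per provisioned IOPS-month in the first tier). Prices differ by region and change over time, so treat this as a sketch and check the AWS pricing page before relying on it.

```python
# Monthly cost sketch for gp3 vs. io2 (assumed us-east-1 list prices, early 2021).

GP3_GIB = 0.08    # USD per GiB-month
GP3_IOPS = 0.005  # USD per provisioned IOPS-month above the free 3,000
GP3_MBPS = 0.04   # USD per provisioned MB/s-month above the free 125
IO2_GIB = 0.125   # USD per GiB-month
IO2_IOPS = 0.065  # USD per provisioned IOPS-month (first IOPS tier)

def gp3_monthly_usd(size_gib, iops=3000, mbps=125):
    return (size_gib * GP3_GIB
            + max(iops - 3000, 0) * GP3_IOPS
            + max(mbps - 125, 0) * GP3_MBPS)

def io2_monthly_usd(size_gib, iops):
    return size_gib * IO2_GIB + iops * IO2_IOPS

# Reproduces the rows of the table above:
print(gp3_monthly_usd(1000, 3000, 125), io2_monthly_usd(1000, 3000))    # 80.0 320.0
print(gp3_monthly_usd(2000, 6000, 250), io2_monthly_usd(2000, 6000))    # 180.0 640.0
print(gp3_monthly_usd(2000, 12000, 500), io2_monthly_usd(2000, 12000))  # 220.0 1030.0
```

Note how the io2 bill is dominated by the IOPS charge: at 12,000 provisioned IOPS, the IOPS line item alone (780 USD) exceeds the entire gp3 bill several times over.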
So what is the big difference between `io2` and `gp3`?
- Durability: the annual failure rate of an `io2` volume is 0.001%. That's a huge difference compared to `gp3` with an annual failure rate of 0.2%. In other words, 1 of 500 `gp3` volumes fails every year, but only 1 of 100,000 `io2` volumes.
- SLA on throughput: an `io2` volume promises to deliver the provisioned performance 99.9 percent of the time. There is no such guarantee for `gp3` volumes.
- Maximum throughput: an `io2` volume supports up to 64,000 IOPS and 1,000 MiB/s. That's four times the maximum IOPS of a `gp3` volume. However, neither volume type provides more than 1,000 MiB/s of bandwidth.
I decided not to discuss the previous generation `io1` in this blog post. In summary, the previous generation is more expensive, less durable, and comes with lower maximum bandwidth. Check out Amazon EBS volume types to learn more.
I/O Benchmark
Some AWS customers are complaining about degraded performance when switching from `gp2` to `gp3`. For example, Silas has written down his experiences with gp3 and an Elasticsearch cluster. That's why I decided to benchmark the three different volume types.
I did my test on Jan 11, 2021, with the following setup.
- Region: `eu-west-1`
- EC2 Instance: `m5.8xlarge`
- `gp2` Volume: 1000 GiB
- `gp3` Volume: 1000 GiB, 3000 IOPS, 125 MB/s
- `io2` Volume: 1000 GiB, 3000 IOPS
- File System: `xfs`
- Duration: 120 min
I’ve used fio to measure the I/O performance with the following command, adjusting `--directory` and `--name` for each volume.

```
fio --directory=/mnt/gp3 --name gp3 --direct=1 --rw=randrw --bs=16k --size=1G --numjobs=16 --time_based --runtime=7200 --group_reporting --norandommap
```
The following table shows the results.
Read | io2 | gp3 | gp2 |
---|---|---|---|
bw (KB/s) | 23980 | 23991 | 23991 |
iops | 1498 | 1499 | 1499 |
clat avg (usec) | 5293 | 5302 | 5473 |
clat stdev (usec) | 261 | 524 | 2702 |
clat 90.00p (usec) | 5536 | 5664 | 5984 |
clat 95.00p (usec) | 5664 | 5792 | 7264 |
clat 99.00p (usec) | 5856 | 5984 | 16768 |
clat 99.90p (usec) | 6688 | 6880 | 40192 |
clat 99.99p (usec) | 8768 | 9664 | 69120 |
Write | io2 | gp3 | gp2 |
---|---|---|---|
bw (KB/s) | 23989 | 24000 | 24000 |
iops | 1499 | 1499 | 1499 |
clat avg (usec) | 5379 | 5365 | 5194 |
clat stdev (usec) | 229 | 554 | 1041 |
clat 90.00p (usec) | 5600 | 5728 | 5600 |
clat 95.00p (usec) | 5728 | 5856 | 5728 |
clat 99.00p (usec) | 5920 | 6496 | 8384 |
clat 99.90p (usec) | 7520 | 11456 | 16512 |
clat 99.99p (usec) | 9280 | 17536 | 23936 |
As expected, all three volume types delivered 3,000 IOPS (1,500 read IOPS and 1,500 write IOPS) over 2 hours.
Please note, I’ve been using a block size of 16 KB for my I/O benchmark. That’s why the volumes are not reaching their maximum bandwidth.
However, there are some differences in the completion latency (`clat`) - the time from submission to completion of the I/O pieces. The latency is much more stable (see `stdev` and percentiles) for `io2` volumes. A `gp3` volume is somewhere in the middle between an `io2` volume and a `gp2` volume from a latency predictability point of view.
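The tail of the distribution makes this visible. Taking the p99.9 read latencies from the table above (a quick calculation of my own), gp2 is roughly six times slower than gp3 and io2 at that percentile, while gp3 and io2 are nearly identical:

```python
# p99.9 read completion latencies (usec), taken from the benchmark table above.
P999_READ = {"io2": 6688, "gp3": 6880, "gp2": 40192}

# How much slower is each volume type than io2 at the 99.9th percentile?
for vol, usec in P999_READ.items():
    print(f"{vol}: {usec / P999_READ['io2']:.2f}x")  # io2: 1.00x, gp3: 1.03x, gp2: 6.01x
```

In other words, the burst-backed gp2 volume pays for its average-case performance with a long latency tail, which matters for latency-sensitive workloads like search clusters.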
In summary, I could not find any hints on why switching from `gp2` to `gp3` should slow down your workloads.
Guidance
- The volume type `gp2` is outdated. A `gp3` volume is more cost-effective and more predictable, as it does not rely on burstable performance.
- The volume type `io1` is outdated. Choose `io2` whenever it is available in your region.
- The volume type `io2` is expensive but much more durable. On top of that, an `io2` volume provides an SLA on the provisioned throughput. Therefore, I recommend `io2` for production-critical database workloads.