Cheap, Durable, Fast. How to choose an EBS volume type?
Elastic Block Store (EBS) provides solid state drives (SSD) and hard disk drives (HDD) for EC2 instances. The virtual machine accesses the persistent storage via the network. In December 2020, AWS announced another volume type called General Purpose SSD (gp3). So now there are three volume types based on SSDs. In this blog post, I compare gp2, gp3, and io2 volumes and guide you in choosing the volume type that best fits a specific scenario.
At first sight, the gp2 volume type is easy to use. The volume size determines the price as well as the baseline throughput (IOPS and bandwidth).

Baseline Throughput (IOPS) = MIN( MAX( Volume Size (GiB) * 3, 100 ), 16,000 )

Volumes smaller than 1,000 GiB can burst up to 3,000 IOPS for a short period per day. I’ve seen many infrastructures fail because the performance was fine during testing but degraded significantly an hour after the go-live in production.
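The baseline rule can be sketched in a few lines of Python (a minimal sketch based on the documented gp2 behavior: 3 IOPS per GiB, with a floor of 100 and a cap of 16,000 IOPS):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, at least 100, at most 16,000 IOPS."""
    return min(max(size_gib * 3, 100), 16_000)

# Volumes below 1,000 GiB have a baseline under 3,000 IOPS and depend on
# burst credits to reach 3,000 IOPS temporarily.
for size_gib in (100, 334, 1_000, 5_334):
    print(size_gib, gp2_baseline_iops(size_gib))
```

Note how a volume only reaches its maximum baseline of 16,000 IOPS at 5,334 GiB, which is why sizing gp2 volumes for performance instead of capacity got expensive.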
The latest generation of General Purpose SSD volumes is different. Every volume comes with a baseline performance of 3,000 IOPS and 125 MB/s regardless of its size. So the baseline performance of a gp3 volume matches the burst capacity of a gp2 volume. But be careful: a gp2 volume larger than 333 GiB provides a maximum bandwidth of 250 MiB/s, while a gp3 volume comes with only 125 MB/s by default.
Surprisingly, it is possible to increase the maximum throughput of a gp3 volume by provisioning additional IOPS and bandwidth. Provisioned throughput used to be the unique selling point of io2 volumes, which I present next. A gp3 volume supports up to 16,000 IOPS and 1,000 MiB/s. I want to highlight the maximum bandwidth of 1,000 MiB/s, which is four times as much as gp2 offers.
AWS charges by size, IOPS, and bandwidth; 3,000 IOPS and 125 MB/s are included for free with every volume. While AWS says gp3 delivers up to a 20% lower price point than gp2, in reality the price advantage is somewhere between 7% and 20%, depending on the IOPS and throughput required in addition to the baseline.
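To see where the 7% to 20% range comes from, here is a rough cost sketch. The prices are assumptions based on the us-east-1 list prices at the time of writing; check the current pricing page before relying on them.

```python
def gp2_monthly_cost(size_gib):
    # assumed gp2 list price: 0.10 USD per GiB-month
    return size_gib * 0.10

def gp3_monthly_cost(size_gib, iops=3_000, mbps=125):
    # assumed gp3 list prices: 0.08 USD per GiB-month, plus 0.005 USD per
    # IOPS-month above the free 3,000, plus 0.04 USD per MB/s-month above
    # the free 125
    cost = size_gib * 0.08
    cost += max(iops - 3_000, 0) * 0.005
    cost += max(mbps - 125, 0) * 0.04
    return cost

# A 1,000 GiB gp2 volume delivers 3,000 IOPS and 250 MiB/s. Matching that
# bandwidth with gp3 requires provisioning 125 additional MB/s.
print(gp2_monthly_cost(1_000))              # 100.0
print(gp3_monthly_cost(1_000, 3_000, 250))  # 85.0, i.e. 15% cheaper
```

A workload that is happy with the 125 MB/s default gets the full 20% discount; provisioning extra IOPS and bandwidth eats into it.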
The io2 volume type works similarly to gp3. When creating an io2 volume, you specify the size as well as the provisioned IOPS. Again, AWS charges by size and provisioned IOPS.

By the way, AWS announced io2 in August 2020. The Relational Database Service (RDS) does not support io2 volumes yet; the service still runs on io1 volumes. An io2 volume is much more expensive than a gp3 volume, as shown in the following example.
| Volume Size | IOPS | Bandwidth (MB/s) | gp3 (USD/month) | io2 (USD/month) |
|---|---|---|---|---|
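Such a comparison can be approximated with a small sketch. The prices are assumptions based on the us-east-1 list prices at the time of writing (gp3: 0.08 USD per GiB-month plus 0.005 USD per IOPS-month above 3,000 and 0.04 USD per MB/s-month above 125; io2: 0.125 USD per GiB-month plus 0.065 USD per provisioned IOPS-month in the first tier); double-check the current pricing page.

```python
def gp3_monthly_cost(size_gib, iops=3_000, mbps=125):
    return (size_gib * 0.08
            + max(iops - 3_000, 0) * 0.005
            + max(mbps - 125, 0) * 0.04)

def io2_monthly_cost(size_gib, iops):
    # io2 charges for every provisioned IOPS, not only those above 3,000
    return size_gib * 0.125 + iops * 0.065

gp3 = gp3_monthly_cost(1_000, 3_000)  # 80.0 USD/month
io2 = io2_monthly_cost(1_000, 3_000)  # 320.0 USD/month
print(io2 / gp3)                      # 4.0
```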
As a rule of thumb, an io2 volume costs 3 to 4 times as much as a gp3 volume. So what is the big difference between gp3 and io2?
- Durability: the annual failure rate of an io2 volume is 0.001%. That’s a huge difference compared to gp3 with an annual failure rate of 0.2%. In other words, 1 out of 500 gp3 volumes fails every year, but only 1 out of 100,000 io2 volumes fails every year.
- SLA on throughput: an io2 volume promises to deliver the provisioned performance 99.9 percent of the time. There is no such guarantee for gp3 volumes.
- Maximum throughput: an io2 volume supports up to 64,000 IOPS and 1,000 MiB/s. That’s four times the maximum IOPS of a gp3 volume. However, neither volume type provides more than 1,000 MiB/s of bandwidth.
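To make those failure rates tangible: for a fleet of volumes, the expected number of failures per year is simply the fleet size times the annual failure rate. A trivial sketch using the rates from the list above:

```python
def expected_annual_failures(volumes, afr_percent):
    """Expected number of volume failures per year for a given fleet."""
    return volumes * afr_percent / 100

fleet_size = 1_000
print(expected_annual_failures(fleet_size, 0.2))    # gp3: about 2 failures/year
print(expected_annual_failures(fleet_size, 0.001))  # io2: about 0.01 failures/year
```

Either way, the failure rate is not zero, so regular EBS snapshots remain mandatory.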
I decided not to discuss the previous generation io1 in this blog post. In summary, the previous generation is more expensive, less durable, and comes with lower maximum bandwidth. Check out Amazon EBS volume types to learn more.

Some AWS customers have complained about degraded performance when switching from gp2 to gp3. For example, Silas has written down his experiences with gp3 and an Elasticsearch cluster. That’s why I decided to benchmark the three different volume types.
I did my test on Jan 11, 2021, with the following setup.

- EC2 Instance:
- gp2 Volume: 1,000 GiB
- gp3 Volume: 1,000 GiB, 3,000 IOPS, 125 MB/s
- io2 Volume: 1,000 GiB, 3,000 IOPS
- File System:
- Duration: 120 min
I’ve used fio to measure the I/O performance with the following command, repeated for each volume’s mount point.

```
fio --directory=/mnt/gp3 --name gp3 --direct=1 --rw=randrw --bs=16k --size=1G --numjobs=16 --time_based --runtime=7200 --group_reporting --norandommap
```
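If you want to extract the latency numbers programmatically, fio can also emit JSON with --output-format=json. Here is a minimal sketch for pulling clat percentiles out of such a report; the excerpt below is hypothetical and only mimics the structure of recent fio releases, which report completion latency in nanoseconds under clat_ns (older releases used clat in microseconds).

```python
import json

# Hypothetical excerpt mimicking `fio --output-format=json` output.
report = json.loads("""
{"jobs": [{"jobname": "gp3",
           "read":  {"clat_ns": {"mean": 5302000.0,
                                 "percentile": {"99.000000": 5984000}}},
           "write": {"clat_ns": {"mean": 5365000.0,
                                 "percentile": {"99.000000": 6496000}}}}]}
""")

def clat_usec(job, direction, pct="99.000000"):
    """Return a completion latency percentile in microseconds."""
    return job[direction]["clat_ns"]["percentile"][pct] / 1_000

job = report["jobs"][0]
print(job["jobname"], clat_usec(job, "read"), clat_usec(job, "write"))
```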
The following table shows the results.
Read:

| clat (usec) | io2 | gp3 | gp2 |
|---|---|---|---|
| avg | 5293 | 5302 | 5473 |
| stdev | 261 | 524 | 2702 |
| 90.00p | 5536 | 5664 | 5984 |
| 95.00p | 5664 | 5792 | 7264 |
| 99.00p | 5856 | 5984 | 16768 |
| 99.90p | 6688 | 6880 | 40192 |
| 99.99p | 8768 | 9664 | 69120 |

Write:

| clat (usec) | io2 | gp3 | gp2 |
|---|---|---|---|
| avg | 5379 | 5365 | 5194 |
| stdev | 229 | 554 | 1041 |
| 90.00p | 5600 | 5728 | 5600 |
| 95.00p | 5728 | 5856 | 5728 |
| 99.00p | 5920 | 6496 | 8384 |
| 99.90p | 7520 | 11456 | 16512 |
| 99.99p | 9280 | 17536 | 23936 |
As expected, all three volume types delivered 3,000 IOPS (1,500 read IOPS and 1,500 write IOPS) over 2 hours.

Please note, I’ve been using a block size of 16 KB for my I/O benchmark. That’s why the volumes are not reaching their maximum bandwidth.
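The arithmetic behind that note: at a fixed IOPS limit, the block size caps the achievable bandwidth.

```python
block_size_kb = 16
iops = 3_000

# 3,000 IOPS * 16 KB per I/O = roughly 48 MB/s, well below the bandwidth
# limits of 125 MB/s (gp3 default) and 250 MiB/s (gp2 at 1,000 GiB).
bandwidth_mb_per_s = iops * block_size_kb / 1_000
print(bandwidth_mb_per_s)  # 48.0
```

To saturate the bandwidth limit instead of the IOPS limit, rerun the benchmark with a larger block size (e.g. 256 KB).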
However, there are some differences in the completion latency (clat), the time from submission to completion of the I/O pieces. The latency is much more stable (see stdev and the percentiles) for io2 volumes. A gp3 volume sits somewhere in the middle between an io2 volume and a gp2 volume from a latency predictability point of view.
In summary, I could not find any hints on why switching from gp2 to gp3 should slow down your workloads.
- The volume type gp2 is outdated. A gp3 volume is more cost-effective and more predictable, as it does not rely on burstable performance.
- The volume type io1 is outdated. Choose io2 whenever it is available in your region.
- The volume type io2 is expensive but much more durable. On top of that, an io2 volume provides an SLA on the provisioned throughput. Therefore, I recommend io2 for production-critical database workloads.