Databases on AWS

Michael Wittig – 30 Apr 2020

Andy Jassy, CEO of AWS, proclaimed #DBFreedom, aka use whatever database you like. AWS offers them all. At least, that’s what AWS marketing wants us to understand.

In the real world, AWS offers a wide variety of databases for different use cases. Your job is to pick the right solution for your problem. Knowing all the options improves the quality of your architectural decisions. In this blog post, I introduce all the database options that AWS offers.

This is a cross-post from the Cloudcraft blog.


Amazon Relational Database Service (RDS)

Amazon RDS provides traditional relational databases operated by AWS. You can create a new database with the click of a button, wait 5-15 minutes, and you are ready to go. AWS takes care of patching, monitoring, backups, high-availability (HA) deployments, and read replicas.

The following engines are supported:

MySQL
MariaDB
PostgreSQL
Oracle Database
Microsoft SQL Server

Use RDS if no other database is a better fit or if you are doing a lift-and-shift migration.

Amazon Aurora (Serverless)

Amazon Aurora is also part of RDS, but it deserves a closer look. Aurora is a proprietary database engine developed by Amazon. The core of the technology is a unique storage layer that makes it possible to scale relational databases horizontally without hassle. The following figure demonstrates the replicated storage layer and the horizontally scalable database instance.

Amazon Aurora

Aurora provides a MySQL or PostgreSQL compatible database that is easy to scale.

If you go with the Serverless offering, you get a database that scales in and out depending on load. Does your workload run on MySQL or PostgreSQL? Give Aurora Serverless a try!

Amazon DynamoDB

Amazon DynamoDB is a NoSQL database with virtually unlimited scaling, both in terms of storage and queries per second. The downside is that DynamoDB is not a relational database and does not support SQL. You can think of it as a document or key-value store if you are familiar with those concepts. When you create a data model, you have to work backward from the queries you need to run. The following figure shows a Serverless application that uses DynamoDB as a data store.

Amazon DynamoDB

You interact with DynamoDB via the AWS API. Usually, you use one of the AWS SDKs to call the API from your programming language of choice.
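To make "work backward from the queries" a bit more concrete, here is a minimal sketch in Python. It assumes a hypothetical table named shop with a partition key pk and a sort key sk, designed so that the question "all orders of one customer in a given year" becomes a single Query call. The table name, key names, and key prefixes are my assumptions for illustration only. The function only builds the request parameters; the commented lines show how they could be passed to the boto3 client.

```python
from typing import Any, Dict


def build_orders_query(customer_id: str, year: str) -> Dict[str, Any]:
    """Build keyword arguments for the DynamoDB Query operation.

    Assumed (hypothetical) key design:
      pk = "CUSTOMER#<id>"           groups all items of one customer
      sk = "ORDER#<yyyy>-<mm>-<dd>"  sorts the orders by date
    """
    return {
        "TableName": "shop",  # hypothetical table name
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"CUSTOMER#{customer_id}"},
            ":prefix": {"S": f"ORDER#{year}"},
        },
    }


# Usage with boto3 (not executed here; needs AWS credentials and the table):
#   import boto3
#   response = boto3.client("dynamodb").query(**build_orders_query("42", "2020"))
#   orders = response["Items"]
```

The point of the design is that the access pattern is encoded in the keys. A different question, such as "all orders containing product X", would likely need a different key design or a secondary index.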

Amazon DocumentDB

Amazon DocumentDB provides a MongoDB-compatible database hosted by AWS. DocumentDB is powered by Aurora storage technology. MongoDB is a document database that can be used as a primary database.

Not all features of MongoDB 3.6 are supported. If you are looking for a real MongoDB, check out MongoDB Atlas.

Typical use cases include more complex data models, as found in business applications.

Amazon ElastiCache

Amazon ElastiCache provides in-memory caches operated by AWS. Choose between Redis and memcached, two popular open-source in-memory databases. You can expect features similar to those of RDS: high availability, snapshots, and many more. Caches are usually not used as the primary data source. Instead, a cache is used to offload reads from the database.

Typical use cases are caching, low latency and read-intensive lookups, volatile data (e.g., a session store).

Amazon ElastiCache

Cached data can become outdated, so a cache is not always consistent with the primary data source. In most cases, the performance benefits are worth it.
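The read-offloading pattern described above is often called cache-aside. The following is a minimal sketch in plain Python, where a dictionary stands in for Redis or memcached and a callback stands in for the database query. The class name, the TTL approach, and the injected clock are my assumptions for illustration; this is not the ElastiCache API.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Read-through helper: check the cache first, fall back to the database."""

    def __init__(
        self,
        load_from_db: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expires_at, value); a dict stands in for the real cache
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, possibly stale
        value = self._load(key)  # cache miss or expired: read the database
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key, e.g. right after writing to the database."""
        self._entries.pop(key, None)
```

With a TTL of 60 seconds, a value that changes in the database can be served stale for up to a minute unless the writer calls invalidate. That is the consistency trade-off in a nutshell.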

Amazon Elasticsearch Service

Amazon Elasticsearch Service provides Elasticsearch hosted by AWS. Elasticsearch is a document store with a search engine. Usually, Elasticsearch is not used as the primary database. Instead, data is replicated into Elasticsearch to provide search functionality to end-users.

Amazon Elasticsearch Service

Due to some licensing conflicts with Elastic, Amazon provides the Open Distro for Elasticsearch, which is a flavor of Elasticsearch. Don’t be confused by Open Distro. It’s the same as Elasticsearch, but open-source implementations replace the commercial plugins.

Typical use case: full-text search for documents as well as faceted search for an online shop.

Amazon Redshift

Amazon Redshift provides a data warehouse managed by AWS. Redshift can deal with up to 8 PB of data! It’s a relational database that supports SQL for queries. Redshift is good at inserting large data chunks in one shot. It doesn’t like to receive frequent but small updates.

The typical use case is a data warehouse.

Amazon Neptune

Amazon Neptune provides a graph database operated by AWS. Neptune supports Gremlin and SPARQL to interact with the data. Graph databases shine when your data is highly interconnected. Graphs can be used to answer questions such as “people like you bought this” or “friends of friends like that”.

The typical use case is highly interconnected data.
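To illustrate the kind of question a graph database answers, here is a small "friends of friends" traversal in plain Python. The in-memory dictionary and the function name are my assumptions; in Neptune you would express the same traversal in Gremlin or SPARQL instead.

```python
from typing import Dict, Set


def friends_of_friends(friends: Dict[str, Set[str]], person: str) -> Set[str]:
    """Return people exactly two hops away who are not already direct friends."""
    direct = friends.get(person, set())
    two_hops: Set[str] = set()
    for friend in direct:
        two_hops |= friends.get(friend, set())
    # Exclude the person and the people they already know
    return two_hops - direct - {person}


# Tiny example graph (undirected friendships, made-up names)
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
suggestions = friends_of_friends(graph, "ann")
```

In a relational database, every additional hop typically means another self-join. A graph database is built to follow such edges directly.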

Amazon Quantum Ledger Database (QLDB)

Amazon QLDB provides a ledger database. This is a new category of databases. You cannot update or delete data in QLDB. You can only append new data. QLDB goes one step further: you can cryptographically verify that the data has not changed. A perfect fit for a system that deals with financial transactions that must never change. QLDB comes with zero operational effort for you. AWS takes care of everything.
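The idea behind "append only, cryptographically verifiable" can be sketched with a simple hash chain. This is a simplified illustration under my own assumptions (class name, JSON encoding, SHA-256 over the previous hash plus the payload), not how QLDB is implemented internally.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(previous_hash: str, data: Dict[str, Any]) -> str:
    """Hash the previous entry's hash together with the new payload."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("ascii") + payload).hexdigest()


class Ledger:
    """Append-only log in which each entry commits to everything before it."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, data: Dict[str, Any]) -> str:
        previous = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = _digest(previous, data)
        self.entries.append({"data": data, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any changed entry breaks it from there on."""
        previous = GENESIS
        for entry in self.entries:
            if _digest(previous, entry["data"]) != entry["hash"]:
                return False
            previous = entry["hash"]
        return True
```

Because every hash depends on the one before it, changing an old entry changes every hash that follows. Anyone who kept a copy of the latest hash can detect the tampering.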

Amazon Keyspaces (for Apache Cassandra)

Amazon Keyspaces provides a Cassandra-compatible database. Keyspaces comes with a Serverless offering with built-in auto-scaling. All you have to do is query the database. Cassandra is known as a wide-column database with a proven track record. Be warned, Cassandra is not easy to use!

Previously known as Amazon Managed Apache Cassandra Service (MCS).

Amazon Timestream

Amazon Timestream provides a time-series database. Time series data is all about time. Good fits are sensor data, stock market data, FX rates, and so on. Whenever tuples of time and value are stored, and your queries deal with time spans, this database might be of value. Unfortunately, there is not too much information about Timestream published at this moment. Timestream is in private preview and, therefore, not ready for most of us and certainly not for production workloads!
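Since the Timestream interface is not documented yet, the following sketch only illustrates the general access pattern of a time-series store: tuples of time and value, kept sorted by time, and queried by time span. It is plain Python with names I made up, and it says nothing about how Timestream works.

```python
from bisect import bisect_left, bisect_right, insort
from typing import List, Tuple


class Series:
    """Keeps (timestamp, value) tuples sorted so time-span queries are cheap."""

    def __init__(self) -> None:
        self._points: List[Tuple[int, float]] = []

    def add(self, timestamp: int, value: float) -> None:
        insort(self._points, (timestamp, value))  # insert in sorted position

    def between(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Return all points with start <= timestamp <= end."""
        lo = bisect_left(self._points, (start,))
        hi = bisect_right(self._points, (end, float("inf")))
        return self._points[lo:hi]
```

The binary search finds both ends of the time span without scanning the whole series, which is the property that makes time-ordered storage attractive for sensor data or FX rates.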

Comparison

The following table provides a comparison of the different database options on AWS.

	RDS / Aurora / Serverless	DynamoDB	DocumentDB	ElastiCache	Elasticsearch Service	Redshift	Neptune	QLDB	Amazon Keyspaces (for Apache Cassandra)	Timestream
Max. Data Volume	64 TiB	Unlimited	64 TB	155 TiB	3 PB	8 PB	64 TB	Unlimited	Unlimited	Not documented yet.
Interface	SQL	AWS API	Subset of MongoDB API	Redis/memcached API	Elasticsearch API	SQL	Subset of Gremlin & SPARQL	Subset of PartiQL	Subset of CQL	Not documented yet.
Replication	None, Multi-AZ, Multi-Region	Multi-AZ, Multi-Region	Multi-AZ	Multi-AZ	Multi-AZ	Multi-AZ	Multi-AZ	Multi-AZ	Multi-AZ	Not documented yet.
DB Model	Relational	Key-value, Document	Document	Key-value	Document, Search engine	Relational	Graph	Ledger	Wide column	Time series

Michael Wittig

I’ve been building on AWS since 2012 together with my brother Andreas. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV, attachmentAV, HyperEnv, and marbot.
