Databases on AWS
Andy Jassy, CEO of AWS, proclaimed #DBFreedom, aka use whatever database you like. AWS offers them all. At least, that’s what AWS marketing wants us to understand.
In the real world, AWS offers a wide variety of databases for different use cases. Your job is to pick the right solution for your problem. Knowing all the options improves the quality of your architectural decisions. In this blog post, I introduce all the database options that AWS offers.
This is a cross-post from the Cloudcraft blog.
Amazon Relational Database Service (RDS)
Amazon RDS provides traditional relational databases operated by AWS. You can create a new database with the click of a button, wait 5-15 minutes, and you are ready to go. AWS takes care of patching, monitoring, backups, HA deployments, and read-replicas.
The following engines are supported:
- Oracle Database
- Microsoft SQL Server
Use RDS if no other database is a better fit or if you are doing a lift-and-shift migration.
Amazon Aurora (Serverless)
Amazon Aurora is also part of RDS, but it requires a more detailed view. Aurora is a proprietary database engine developed by Amazon. The core of the technology is a unique storage layer that makes it possible to scale relational databases horizontally without hassle. The following figure demonstrates the replicated storage layer and the horizontally scalable database instance.
Aurora provides a MySQL or PostgreSQL compatible database that is easy to scale.
If you go with the Serverless offering, you get a database that scales in and out depending on load. Does your workload run on MySQL or PostgreSQL? Give Aurora Serverless a try!
Amazon DynamoDB
Amazon DynamoDB is a NoSQL database with virtually unlimited scaling, both in terms of storage and queries per second. The downside is that DynamoDB is not a relational database and does not support SQL. You can think of it as a document or key-value store if you are familiar with those concepts. When you create a data model, you have to work backward from the queries. The following figure shows a Serverless application that uses DynamoDB as a data store.
You interact with DynamoDB via the AWS API. Usually, you use one of the AWS SDKs to call the API from your programming language of choice.
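To make "working backward from the queries" more concrete, here is a minimal sketch. It does not call the AWS API. `ToyTable`, the key names, and the sample items are all hypothetical; the class is only an in-memory stand-in that mimics a table with a partition key and a sort key. The point is that the access pattern ("list all orders of a customer, oldest first") is decided first, and the keys are designed to serve it.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a partition key and a sort key.

    Illustration only: this is not the DynamoDB API.
    """

    def __init__(self):
        # partition key -> {sort key -> item attributes}
        self._partitions = defaultdict(dict)

    def put_item(self, pk, sk, attributes):
        self._partitions[pk][sk] = dict(attributes)

    def query(self, pk, sk_prefix=""):
        # Read a single partition, return items ordered by sort key,
        # optionally restricted to sort keys starting with a prefix.
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


# Access pattern decided up front: "list all orders of a customer, oldest first".
table = ToyTable()
table.put_item("CUSTOMER#42", "PROFILE", {"name": "Alice"})
table.put_item("CUSTOMER#42", "ORDER#2020-03-01", {"total": 30})
table.put_item("CUSTOMER#42", "ORDER#2020-01-15", {"total": 12})
table.put_item("CUSTOMER#7", "ORDER#2020-02-02", {"total": 99})

orders = table.query("CUSTOMER#42", sk_prefix="ORDER#")
print(orders)  # [{'total': 12}, {'total': 30}]
```

A query that was not planned for, such as "all orders above a certain total across all customers", has no matching key in this sketch and would require scanning every partition.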
Amazon DocumentDB
Amazon DocumentDB provides a MongoDB compatible database hosted by AWS. DocumentDB is powered by Aurora storage technology. MongoDB is a document database that can be used as a primary database.
Not all features of MongoDB 3.6 are supported. If you are looking for a real MongoDB, check out MongoDB Atlas.
Typical use cases include more complex data models, as found in business applications.
Amazon ElastiCache
Amazon ElastiCache provides in-memory caches operated by AWS. Choose between Redis and Memcached, two popular open-source in-memory databases. You can expect features similar to those of RDS: high availability, snapshots, and many more. Caches are usually not used as the primary data source. Instead, a cache is used to offload reads from the database.
Typical use cases are caching, low-latency and read-intensive lookups, and volatile data (e.g., a session store).
Cached data can become outdated, so a cache is not always consistent with the primary data source. But the performance benefits are worth it.
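The read-offloading pattern and the staleness trade-off can be sketched in a few lines. This is a simplified illustration, not ElastiCache client code: `TtlCache` is a hypothetical in-process stand-in for Redis or Memcached, and the `database` dictionary plays the role of the primary data source.

```python
import time


class TtlCache:
    """Tiny in-process stand-in for Redis or Memcached (illustration only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self._clock():
            return None  # missing or expired
        return entry[1]

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def load_user(cache, database, stats, key):
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is None:
        stats["db_reads"] += 1
        value = database[key]
        cache.set(key, value)
    return value


database = {"user:1": "Alice"}  # hypothetical primary data source
stats = {"db_reads": 0}
cache = TtlCache(ttl_seconds=60)

first = load_user(cache, database, stats, "user:1")   # cache miss, reads the database
database["user:1"] = "Alice Smith"                    # primary data changes
second = load_user(cache, database, stats, "user:1")  # cache hit, returns stale data

print(first, second, stats["db_reads"])  # Alice Alice 1
```

The second read never touches the database, which is the benefit, and it returns the old name until the entry expires, which is the price.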
Amazon Elasticsearch Service
Amazon Elasticsearch Service provides Elasticsearch hosted by AWS. Elasticsearch is a document store with a search engine. Usually, Elasticsearch is not used as the primary database. Instead, data is replicated into Elasticsearch to provide search functionality to end-users.
Due to some licensing conflicts with Elastic, Amazon provides the Open Distro for Elasticsearch, which is a flavor of Elasticsearch. Don’t be confused by Open Distro. It’s the same as Elasticsearch, but open-source implementations replace the commercial plugins.
Typical use case: full-text search for documents as well as faceted search for an online shop.
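To give a rough idea of what a search engine does with replicated documents, here is a toy inverted index. It is a heavily simplified sketch and not the Elasticsearch API: there is no ranking, stemming, or faceting, and the product data and function names are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    # Inverted index: term -> set of ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Return the ids of documents that contain every term of the query.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


products = {
    1: "Red running shoes",
    2: "Blue running jacket",
    3: "Red rain jacket",
}
index = build_index(products)

print(sorted(search(index, "red jacket")))  # [3]
print(sorted(search(index, "running")))     # [1, 2]
```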
Amazon Redshift
Amazon Redshift provides a data warehouse managed by AWS. Redshift can deal with up to 8 PB of data! It’s a relational database that supports SQL for queries. Redshift is good at inserting large data chunks in one shot. It doesn’t like to receive frequent but small updates.
The typical use case is a data warehouse.
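One way to respect the "large chunks, not small updates" rule is to buffer rows on the client and write them in batches. The sketch below only shows that idea. It uses an in-memory SQLite database from the Python standard library as a stand-in, and the `BatchingWriter` class and `page_views` table are hypothetical. A real Redshift load would more likely stage files and bulk load them, for example with the `COPY` command.

```python
import sqlite3


class BatchingWriter:
    """Buffers rows and writes them in large batches instead of one by one."""

    def __init__(self, connection, batch_size):
        self._connection = connection
        self._batch_size = batch_size
        self._buffer = []
        self.flushes = 0

    def add(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        # One transaction and one statement for the whole batch.
        with self._connection:
            self._connection.executemany(
                "INSERT INTO page_views (day, views) VALUES (?, ?)", self._buffer
            )
        self._buffer.clear()
        self.flushes += 1


# SQLite is only a local stand-in for the warehouse in this sketch.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE page_views (day TEXT, views INTEGER)")

writer = BatchingWriter(connection, batch_size=1000)
for i in range(2500):
    writer.add((f"day-{i}", i))
writer.flush()  # write the remaining 500 rows

row_count = connection.execute("SELECT COUNT(*) FROM page_views").fetchone()[0]
print(row_count, writer.flushes)  # 2500 3
```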
Amazon Neptune
Amazon Neptune provides a graph database operated by AWS. Neptune supports Gremlin and SPARQL to interact with the data. Graph databases shine when your data is highly interconnected. Graphs can be used to answer questions such as “people like you bought this” or “friends of friends like that”.
The typical use case is highly interconnected data.
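The "friends of friends" question is easy to picture with a small in-memory graph. The following sketch uses plain Python sets instead of Gremlin or SPARQL, and the people and their connections are invented for illustration. A graph database answers the same kind of question with a traversal, without loading the whole graph into your application.

```python
# Hypothetical social graph: person -> set of direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, person):
    """People exactly two hops away: friends of friends who are not friends yet."""
    direct = graph.get(person, set())
    candidates = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    # Exclude the person and the people they already know.
    return candidates - direct - {person}


print(sorted(friends_of_friends(friends, "alice")))  # ['dave', 'erin']
```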
Amazon Quantum Ledger Database (QLDB)
Amazon QLDB provides a ledger database. This is a new category of databases. You cannot update or delete data in QLDB. You can only append new data. QLDB goes one step further: you can cryptographically verify that the data has not changed. A perfect fit for a system that deals with financial transactions that must never change. QLDB comes with zero operational effort for you. AWS takes care of everything.
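The general idea behind "append-only plus cryptographic verification" can be sketched with a simple hash chain. This is not how QLDB is implemented and not its API. `ToyLedger` is a hypothetical class that only shows why changing an old entry becomes detectable once every entry's digest depends on the previous one.

```python
import hashlib
import json


class ToyLedger:
    """Append-only list of entries, each chained to the previous digest."""

    def __init__(self):
        self._entries = []  # list of (payload_json, digest)

    @staticmethod
    def _digest(previous_digest, payload):
        return hashlib.sha256((previous_digest + payload).encode("utf-8")).hexdigest()

    def append(self, document):
        payload = json.dumps(document, sort_keys=True)
        previous = self._entries[-1][1] if self._entries else ""
        digest = self._digest(previous, payload)
        self._entries.append((payload, digest))
        return digest

    def verify(self):
        # Recompute the chain from the start and compare every digest.
        previous = ""
        for payload, digest in self._entries:
            if self._digest(previous, payload) != digest:
                return False
            previous = digest
        return True


ledger = ToyLedger()
ledger.append({"from": "alice", "to": "bob", "amount": 10})
ledger.append({"from": "bob", "to": "carol", "amount": 4})
valid_before = ledger.verify()

# Simulate tampering with the first entry behind the ledger's back.
payload, digest = ledger._entries[0]
ledger._entries[0] = (payload.replace("10", "1000"), digest)
valid_after = ledger.verify()

print(valid_before, valid_after)  # True False
```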
Amazon Keyspaces (for Apache Cassandra)
Amazon Keyspaces provides a Cassandra compatible database. Keyspaces comes with a Serverless offering with built-in auto-scaling. All you have to do is query the database. Cassandra is known as a wide column database with a proven track record. Be warned, Cassandra is not easy to use!
Amazon Keyspaces was previously known as Amazon Managed Apache Cassandra Service (MCS).
Amazon Timestream
Amazon Timestream provides a time-series database. Time series data is all about time. Good fits are sensor data, stock market data, FX rates, and so on. Whenever tuples of time and value are stored, and your queries deal with time spans, this database might be of value. Unfortunately, there is not too much information about Timestream published at this moment. Timestream is in private preview and, therefore, not ready for most of us and certainly not for production workloads!
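Since the Timestream interface is not documented yet, the following is not Timestream code. It is only a generic sketch of the workload described above: tuples of time and value, queried by time span. `ToySeries` and the sample measurements are hypothetical.

```python
from bisect import bisect_left, bisect_right


class ToySeries:
    """Stores (timestamp, value) tuples in time order and answers span queries."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def append(self, timestamp, value):
        # This sketch assumes measurements arrive in time order.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("measurements must arrive in time order")
        self._timestamps.append(timestamp)
        self._values.append(value)

    def between(self, start, end):
        # Binary search for the span boundaries (both inclusive).
        lo = bisect_left(self._timestamps, start)
        hi = bisect_right(self._timestamps, end)
        return self._values[lo:hi]

    def average(self, start, end):
        values = self.between(start, end)
        return sum(values) / len(values) if values else None


# Hypothetical sensor readings: (seconds since start, temperature).
temperature = ToySeries()
for ts, value in [(0, 20.0), (60, 21.0), (120, 23.0), (180, 22.0)]:
    temperature.append(ts, value)

print(temperature.between(60, 120))  # [21.0, 23.0]
print(temperature.average(60, 180))  # 22.0
```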
The following table provides a comparison of the different database options on AWS.
| | RDS / Aurora / Serverless | DynamoDB | DocumentDB | ElastiCache | Elasticsearch Service | Redshift | Neptune | QLDB | Amazon Keyspaces (for Apache Cassandra) | Timestream |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Max. Data Volume | 64 TiB | Unlimited | 64 TB | 155 TiB | 3 PB | 8 PB | 64 TB | Unlimited | Unlimited | Not documented yet. |
| Interface | SQL | AWS API | Subset of MongoDB API | Redis/Memcached API | Elasticsearch API | SQL | Subset of Gremlin & SPARQL | Subset of PartiQL | Subset of CQL | Not documented yet. |
| Replication | None, Multi-AZ, Multi-Region | Multi-AZ, Multi-Region | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ | Multi-AZ | Not documented yet. |
| DB Model | Relational | Key-value, Document | Document | Key-value | Document, Search engine | Relational | Graph | Ledger | Wide column | Time series |