How to Replicate Your Data with DynamoDB Global Tables
In my last post about Multi-Region AWS Architectures, I discussed how you could reduce end-user latency and increase availability by running your application in multiple regions. I compared AWS services that help you to run your application in various regions at the same time. In this post, we’ll focus on one data store that shines in multi-region architectures: Amazon DynamoDB.
This is a cross-post from the Cloudcraft blog.
Amazon DynamoDB
Amazon DynamoDB is a NoSQL database service that supports key-value and document data structures. It is fully managed and available only on AWS. You can run it in a serverless fashion (on-demand capacity), where you pay only for what you use while DynamoDB adapts to the load automatically.
Using DynamoDB
DynamoDB is a key-value store that organizes your data in tables. Each table contains items (values) that are identified by keys. A table can also maintain secondary indexes for data lookup besides the primary key. You will now have a look at these basic building blocks of DynamoDB.
Table, item, attribute
A DynamoDB table has a name and organizes a collection of items. An item is a collection of attributes. An attribute is a name-value pair. The attribute value can be a scalar (number, string, binary, Boolean, or null), a set (number set, string set, binary set), or a document type (map or list, comparable to a JSON object or array). Items in a table are not required to have the same attributes; apart from the primary key, there is no enforced schema.
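To make the attribute types more concrete, here is a minimal Python sketch that converts plain Python values into the typed attribute-value format used by DynamoDB's low-level API (`S`, `N`, `BOOL`, `SS`, `M`, `L`, and so on). The helper `to_attribute_value` and the sample item are made up for illustration; for real code, boto3 ships its own `TypeSerializer`.

```python
def to_attribute_value(value):
    """Convert a Python value into DynamoDB's typed attribute-value format.

    Illustrative helper, not part of any SDK.
    """
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": value}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("sets must not be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
        raise TypeError("a set must hold only strings, only numbers, or only binaries")
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical item: attribute names and values are assumptions.
item = {
    "email": "alice@example.com",                    # scalar: string
    "age": 42,                                       # scalar: number
    "active": True,                                  # scalar: Boolean
    "tags": {"admin", "beta"},                       # set: string set
    "address": {"city": "Berlin", "zip": "10115"},   # document: map
}
typed_item = {name: to_attribute_value(value) for name, value in item.items()}
```

Another item in the same table could carry a completely different set of attributes, as long as it includes the table's primary key.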
You can create a table with the Management Console, CloudFormation, SDKs, or the CLI. The following example shows how you create a table with the CLI:
aws dynamodb create-table --table-name entity \
In line 1, you define the name of the table. Line 2 defines the attributes that are used in the key schema in line 3. Line 4 defines that you want to use DynamoDB in the serverless flavor.
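The same four settings (table name, attribute definitions, key schema, and on-demand billing) can also be expressed with the AWS SDK. The following Python sketch builds the parameters for boto3's `create_table` call. The key attribute name `id` and its string type are assumptions made for illustration; substitute the key attribute your table actually uses.

```python
def build_create_table_params(table_name, key_attribute, key_type="S"):
    """Build the request parameters for DynamoDB's CreateTable API."""
    return {
        # Name of the table
        "TableName": table_name,
        # Only attributes used in the key schema are declared up front
        "AttributeDefinitions": [
            {"AttributeName": key_attribute, "AttributeType": key_type},
        ],
        # A single partition (HASH) key forms the primary key
        "KeySchema": [
            {"AttributeName": key_attribute, "KeyType": "HASH"},
        ],
        # On-demand capacity: pay per request, no capacity planning
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_table(params, region="us-east-1"):
    """Send the request. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    client = boto3.client("dynamodb", region_name=region)
    return client.create_table(**params)


# "id" is an assumed attribute name, not taken from the original command.
params = build_create_table_params("entity", "id")
```

Calling `create_table(params)` would then create the table in the given region.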
Primary keys
A primary key is unique within a table and identifies an item. You need the primary key to look up an item. The primary key consists of either a partition key alone or a partition key combined with a sort key.
Partition keys
A partition key (formerly known as hash key) uses a single attribute of an item to create a hash index. If you want to look up an item based on its partition key, you need to know the exact value of the partition key. A user table could use the user’s email as its partition key. You can then retrieve a user if you know the partition key value (the email, in this case).
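As a rough sketch, looking up a user by email maps to a `get_item` call in boto3. The table name `user` and the attribute name `email` are assumptions derived from the example above.

```python
def build_get_user_params(email, table_name="user"):
    """Build GetItem parameters: the exact partition key value is required."""
    return {
        "TableName": table_name,
        "Key": {"email": {"S": email}},
    }


def get_user(email, region="us-east-1"):
    """Fetch the user item, or None if it does not exist. Needs boto3."""
    import boto3  # imported lazily so the builder works without the SDK

    client = boto3.client("dynamodb", region_name=region)
    response = client.get_item(**build_get_user_params(email))
    return response.get("Item")


get_params = build_get_user_params("alice@example.com")
```

There is no way to ask for "emails starting with a" through the key alone; a partition key lookup always needs the full value.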
Partition and sort keys
A partition and sort key (formerly known as hash and range key) uses two attributes of an item to create a more powerful primary key. The first attribute is the partition part of the key, and the second attribute is the sort part. To look up an item, you need to know the exact partition part of the key, but you don’t need to know the sort part. Items with the same partition key are stored sorted by their sort key. This allows you to query a range of sort keys, for example everything after a particular starting point. A message table can use a partition and sort key as its primary key; the partition part is the email of the user, and the sort part is a timestamp. You can now look up all messages of a user that are newer than a specific timestamp.
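The message example could translate into a `query` call like the one sketched below. The table name `message`, the attribute names `email` and `timestamp`, and the choice to store the timestamp as a number (epoch seconds) are all assumptions for illustration.

```python
def build_newer_messages_params(email, after_timestamp, table_name="message"):
    """Build Query parameters for all messages of a user newer than a timestamp."""
    return {
        "TableName": table_name,
        # Exact match on the partition part, range condition on the sort part
        "KeyConditionExpression": "email = :email AND #ts > :after",
        # "timestamp" is aliased because TIMESTAMP is a DynamoDB reserved word
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        "ExpressionAttributeValues": {
            ":email": {"S": email},
            ":after": {"N": str(after_timestamp)},
        },
        # Return items in ascending sort key order (oldest first)
        "ScanIndexForward": True,
    }


def query_newer_messages(email, after_timestamp, region="us-east-1"):
    """Run the query and return the first page of items. Needs boto3."""
    import boto3  # imported lazily so the builder works without the SDK

    client = boto3.client("dynamodb", region_name=region)
    response = client.query(**build_newer_messages_params(email, after_timestamp))
    return response["Items"]


query_params = build_newer_messages_params("alice@example.com", 1700000000)
```

Note that a single `query` response may be paginated; a complete implementation would follow `LastEvaluatedKey` until all pages are read.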
Multi-Region
DynamoDB is one of the few data stores on AWS that can run in multiple regions and allows writes and reads to all the regions in parallel. For example, you can replicate a table created in us-east-1 to eu-west-1. If you write to us-east-1, the data item will show up in eu-west-1 as well. You can also write to eu-west-1, and the data item appears in us-east-1. Data is usually replicated across regions in under a second, which is way faster than any human can travel from one continent to another.
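As a hedged sketch, adding a replica region to an existing table can be done through the `ReplicaUpdates` parameter of the `update_table` call. The table name `entity` is only an example, and prerequisites (for instance, required table settings such as streams) are not covered here, so check the Global Tables documentation for the version you use.

```python
def build_add_replica_params(table_name, replica_region):
    """Build UpdateTable parameters that add a replica in another region."""
    return {
        "TableName": table_name,
        "ReplicaUpdates": [
            {"Create": {"RegionName": replica_region}},
        ],
    }


def add_replica(table_name, source_region, replica_region):
    """Send the request to the region that already hosts the table. Needs boto3."""
    import boto3  # imported lazily so the builder works without the SDK

    client = boto3.client("dynamodb", region_name=source_region)
    return client.update_table(**build_add_replica_params(table_name, replica_region))


# Replicate a table that lives in us-east-1 to eu-west-1.
replica_params = build_add_replica_params("entity", "eu-west-1")
```

Once the replica is active, a client created with `region_name="eu-west-1"` can read and write the same table name locally.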
Keep one thing in mind: if the same data item is modified in two or more regions at about the same time, the last write wins, and the earlier writes are discarded.
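The effect of this rule can be illustrated with a small, purely local simulation. This is a simplified model, not how DynamoDB is implemented internally: the `Write` class, the timestamps, and the values are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    region: str       # region that accepted the write
    timestamp: float  # time of the write, in seconds (simplified)
    value: str        # new content of the item


def resolve_last_writer_wins(writes):
    """Return the write with the latest timestamp; all others are discarded."""
    return max(writes, key=lambda w: w.timestamp)


# Two users modify the same item in different regions within 250 ms.
conflicting = [
    Write("us-east-1", 1000.000, "title=Draft"),
    Write("eu-west-1", 1000.250, "title=Final"),
]
winner = resolve_last_writer_wins(conflicting)
print(f"{winner.region} wins with {winner.value}")
```

The write from us-east-1 is silently lost in this model: no error is raised, and no merge happens. That is why concurrent edits to the same item from different regions deserve attention in your design.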
Unfortunately, DynamoDB transactions do not span multiple regions. A transaction can modify multiple items, but it runs in a single region only, and its changes reach the other regions through regular replication. Therefore, your life gets much easier if you can ensure that writes to a data item happen in only one region at a time.
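To show what a single-region transaction looks like, the sketch below builds the parameters for boto3's `transact_write_items` call. The scenario (storing a message and incrementing a counter on the user item) as well as the table and attribute names are hypothetical.

```python
def build_send_message_transaction(email, timestamp, body):
    """Build a transaction that stores a message and counts it on the user item."""
    return {
        "TransactItems": [
            {
                "Put": {
                    "TableName": "message",
                    "Item": {
                        "email": {"S": email},
                        "timestamp": {"N": str(timestamp)},
                        "body": {"S": body},
                    },
                }
            },
            {
                "Update": {
                    "TableName": "user",
                    "Key": {"email": {"S": email}},
                    # ADD creates the counter if missing, else increments it
                    "UpdateExpression": "ADD message_count :one",
                    "ExpressionAttributeValues": {":one": {"N": "1"}},
                }
            },
        ]
    }


def send_message(email, timestamp, body, region):
    """Execute the transaction in exactly one region. Needs boto3."""
    import boto3  # imported lazily so the builder works without the SDK

    client = boto3.client("dynamodb", region_name=region)
    params = build_send_message_transaction(email, timestamp, body)
    return client.transact_write_items(**params)


tx_params = build_send_message_transaction("alice@example.com", 1700000000, "Hello")
```

Both changes succeed or fail together in the region that receives the request. Readers in other regions may briefly observe one change without the other while replication catches up.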
Imagine a user editing data. This user is always routed to the same region. Therefore, the user will edit data in a single region only. Other users from other continents still see the data. And if the user travels from one continent to another, she can continue to edit data without issues.
Summary
DynamoDB is a NoSQL database that supports key-value and document data structures. Data items are grouped in tables. Each item consists of attributes. One or two attributes form the primary key of the item.
DynamoDB allows you to replicate tables across multiple regions. You can read from and write to any replica across the globe at any time. If the same item is modified in two regions at about the same time, the last write wins.
I recommend using DynamoDB in multi-region architectures whenever possible. You might need to learn a new technology, but your learning efforts will pay off quickly. With DynamoDB, you use the only multi-region-write data store on AWS that is both fully managed and billed per use.
Further reading
- Article Multi-Region AWS Architectures
- Article Your Lambda function might execute twice. Be prepared!
- Article Databases on AWS