New CloudFormation Template - Backing up DynamoDB the easy way
DynamoDB is an easy-to-use NoSQL database available only on AWS. It comes with many useful features such as:
- Time-to-live for items
- Secondary indexes
- Scalable read and write throughput
- Streams that contain all changes of a table
Unfortunately, there is no backup feature. You might say:
But DynamoDB replicates my data. The durability guarantees are super high. I will never lose my data. I can even replicate my data across regions.
But what if your application has a bug and updates many items in a way that corrupts your data, or what if your cleanup script deletes the wrong items? You cannot easily revert changes in DynamoDB. You need a backup in that case!
One of the solutions is to run a job on an EMR cluster that does the backup. But this is a multi-step process:
- Create an EMR cluster
- Wait until the cluster is ready
- Run the backup job
- Wait until the job is completed, retry if necessary
- Terminate the EMR cluster
- Wait until the cluster is terminated
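To make the amount of orchestration concrete, the six steps above can be sketched in Python. This is only an illustration under stated assumptions, not the way the backup is actually run: `emr` is expected to behave like a boto3 EMR client, while `run_backup_on_emr`, `cluster_config`, and `backup_step` are hypothetical names whose contents you would have to supply yourself. The sketch also assumes that the cluster is configured to stay alive without steps and that the step uses `ActionOnFailure: CONTINUE`, otherwise a retry on the same cluster would not be possible.

```python
from typing import Any, Dict


def run_backup_on_emr(emr: Any, cluster_config: Dict[str, Any],
                      backup_step: Dict[str, Any], max_attempts: int = 3) -> str:
    """Run one backup job on a short-lived EMR cluster and return the cluster ID.

    `emr` should behave like boto3.client("emr"). `cluster_config` and
    `backup_step` are hypothetical, caller-supplied dictionaries.
    """
    # 1. Create an EMR cluster
    cluster_id = emr.run_job_flow(**cluster_config)["JobFlowId"]
    try:
        # 2. Wait until the cluster is ready
        emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)

        for attempt in range(1, max_attempts + 1):
            # 3. Run the backup job
            response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[backup_step])
            step_id = response["StepIds"][0]
            try:
                # 4. Wait until the job is completed ...
                emr.get_waiter("step_complete").wait(ClusterId=cluster_id, StepId=step_id)
                break
            except Exception:
                # ... retry if necessary, give up after the last attempt
                if attempt == max_attempts:
                    raise
    finally:
        # 5. Terminate the EMR cluster, even if the backup failed
        emr.terminate_job_flows(JobFlowIds=[cluster_id])
        # 6. Wait until the cluster is terminated
        emr.get_waiter("cluster_terminated").wait(ClusterId=cluster_id)
    return cluster_id
```

Even this short sketch has to deal with waiting, retries, and cleanup on failure, and something still has to run it on a schedule.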
AWS Data Pipeline can orchestrate these steps for you. The only unsolved problem is: how do you set up the Data Pipeline? My answer is CloudFormation. I authored a template to back up a DynamoDB table. The template contains the data pipeline, S3 buckets for the logs and the backups, and the glue to alert you if the backup fails. The following figure shows the pipeline definition:
If you want to set up a CloudFormation stack to back up a DynamoDB table, just follow the steps in the Free Template for AWS CloudFormation documentation.
One downside of this solution is that you have to spin up an EMR cluster for each table that you back up. A cluster consists of one master and one core node. For the instance type used here, this means 2 * $0.10 for the instances in us-east-1 (assuming the backup completes within one hour). Additionally, you pay 2 * $0.03 for EMR (also assuming the backup finishes in under one hour). In total, you spend $0.26 per table per backup.
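The arithmetic can be written down as a small Python sketch, which makes it easier to estimate the bill for more than one table. The prices are the ones quoted above. The function name `backup_cost`, its parameters, and the example of 5 tables backed up 30 times are assumptions for illustration, and the sketch assumes every backup is billed for exactly one cluster hour.

```python
from decimal import Decimal

NODES = 2  # one master node + one core node
INSTANCE_PRICE_PER_NODE_HOUR = Decimal("0.10")  # us-east-1 price quoted above
EMR_PRICE_PER_NODE_HOUR = Decimal("0.03")       # EMR surcharge quoted above


def backup_cost(tables: int = 1, backups: int = 1, billed_hours: int = 1) -> Decimal:
    """Estimated cost in USD; each table needs its own cluster for each backup."""
    per_cluster_hour = NODES * (INSTANCE_PRICE_PER_NODE_HOUR + EMR_PRICE_PER_NODE_HOUR)
    return per_cluster_hour * billed_hours * tables * backups


print(backup_cost())                      # 0.26
print(backup_cost(tables=5, backups=30))  # 39.00
```

Because the cost scales linearly with the number of tables and backups, many small tables are comparatively expensive to back up this way.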