New CloudFormation Template - Backing up DynamoDB the easy way

Michael Wittig – 08 Aug 2017

DynamoDB is an easy-to-use NoSQL database available only on AWS. It comes with many useful features, such as:

  • Time-to-live for items
  • Secondary indexes
  • Scalable read and write throughput
  • Streams that contain all changes of a table

Unfortunately, there is no backup feature. You might say:

But DynamoDB replicates my data. The durability guarantees are super high. I will never lose my data. I can even replicate my data across regions.

But what if your application has a bug and updates many items in a way that corrupts your data, or what if your cleanup script deletes the wrong items? You cannot revert changes easily in DynamoDB. You need a backup in that case!

One of the solutions is to run a job on an EMR cluster that does the backup. But this is a multi-step process:


  1. Create an EMR cluster
  2. Wait until the cluster is ready
  3. Run the backup job
  4. Wait until the job is completed, retry if necessary
  5. Terminate the EMR cluster
  6. Wait until the cluster is terminated
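To make the effort behind these six steps concrete, here is a minimal sketch of doing them by hand in Python. It assumes `emr` behaves like `boto3.client("emr")` and uses its `run_job_flow`, `add_job_flow_steps`, `terminate_job_flows`, and waiter calls. The function name, `cluster_config`, and `backup_step` are hypothetical placeholders. The sketch also assumes the cluster is configured to stay alive without steps and that a failed step does not terminate it.

```python
def run_backup_on_emr(emr, cluster_config, backup_step, max_attempts=3):
    """Walk through the six manual steps once.

    `emr` is assumed to behave like boto3.client("emr"); `cluster_config`
    and `backup_step` are hypothetical placeholders for your own settings.
    Returns the number of attempts the backup step needed.
    """
    # 1. Create an EMR cluster
    cluster_id = emr.run_job_flow(**cluster_config)["JobFlowId"]
    try:
        # 2. Wait until the cluster is ready
        emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
        for attempt in range(1, max_attempts + 1):
            # 3. Run the backup job
            response = emr.add_job_flow_steps(
                JobFlowId=cluster_id, Steps=[backup_step]
            )
            step_id = response["StepIds"][0]
            try:
                # 4. Wait until the job is completed ...
                emr.get_waiter("step_complete").wait(
                    ClusterId=cluster_id, StepId=step_id
                )
                return attempt
            except Exception:
                # ... and retry if necessary (with boto3, a failed wait
                # surfaces as botocore.exceptions.WaiterError)
                if attempt == max_attempts:
                    raise
    finally:
        # 5. Terminate the EMR cluster, even if the backup failed
        emr.terminate_job_flows(JobFlowIds=[cluster_id])
        # 6. Wait until the cluster is terminated
        emr.get_waiter("cluster_terminated").wait(ClusterId=cluster_id)
```

Even this short version needs retry logic and cleanup in a `finally` block, and something still has to run it on a schedule and alert you when it fails.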

Luckily, Data Pipeline is a service that can do the orchestration work for you. The Data Pipeline documentation has examples for backing up and restoring a DynamoDB table.

The only unsolved problem is: how do you set up the Data Pipeline? My answer is CloudFormation. I authored a template to back up a DynamoDB table. The template contains the data pipeline, S3 buckets for the logs and the backups, and the glue to alert you if the backup fails. The following figure shows the pipeline definition:

Data Pipeline to back up a DynamoDB table
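As a rough illustration of what such a pipeline definition is made of, the sketch below builds the objects in the shape that the Data Pipeline API accepts as `pipelineObjects`. It is not the template's actual definition. The object ids, the helper names, and the alarm texts are assumptions made for this example. The EMR step string is left as a parameter because it comes from the Data Pipeline documentation's export example.

```python
class Ref(str):
    """Marks a value as a reference to another pipeline object's id."""


def pipeline_object(object_id, **properties):
    """Build one pipeline object with a key/value field list."""
    fields = []
    for key, value in properties.items():
        value_key = "refValue" if isinstance(value, Ref) else "stringValue"
        fields.append({"key": key, value_key: str(value)})
    return {"id": object_id, "name": object_id, "fields": fields}


def build_backup_pipeline(table_name, backup_bucket, log_bucket,
                          alarm_topic_arn, emr_step):
    """Return pipeline objects for a one-table backup (hypothetical ids)."""
    return [
        # Shared settings: run on demand, log to the log bucket
        pipeline_object(
            "Default",
            scheduleType="ondemand",
            failureAndRerunMode="CASCADE",
            role="DataPipelineDefaultRole",
            resourceRole="DataPipelineDefaultResourceRole",
            pipelineLogUri=f"s3://{log_bucket}/",
        ),
        # Source: the DynamoDB table to back up
        pipeline_object("SourceTable", type="DynamoDBDataNode",
                        tableName=table_name),
        # Target: a folder in the backup bucket
        pipeline_object("BackupLocation", type="S3DataNode",
                        directoryPath=f"s3://{backup_bucket}/{table_name}/"),
        # One master and one core node, as described in the article
        pipeline_object("BackupCluster", type="EmrCluster",
                        masterInstanceType="m4.large",
                        coreInstanceType="m4.large",
                        coreInstanceCount="1"),
        # The glue that alerts you if the backup fails
        pipeline_object("FailureAlarm", type="SnsAlarm",
                        topicArn=alarm_topic_arn,
                        subject="DynamoDB backup failed",
                        message=f"Backup of table {table_name} failed."),
        # The activity that ties source, target, cluster and alarm together
        pipeline_object("BackupActivity", type="EmrActivity",
                        input=Ref("SourceTable"),
                        output=Ref("BackupLocation"),
                        runsOn=Ref("BackupCluster"),
                        step=emr_step,
                        onFail=Ref("FailureAlarm")),
    ]
```

With boto3, a list like this could be passed as `pipelineObjects` to `put_pipeline_definition` on the `datapipeline` client. The CloudFormation template spares you that call because the pipeline becomes part of the stack.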

If you want to set up a CloudFormation stack to back up a DynamoDB table, just follow the steps in the Free Template for AWS CloudFormation documentation.

One downside of this solution is that you have to spin up an EMR cluster for each table that you back up. A cluster consists of one master and one core node, e.g. of type m4.large. In us-east-1, this means 2 * $0.10 for EC2 (assuming the backup completes within one hour). Additionally, you pay 2 * $0.03 for EMR (also assuming the backup finishes in under one hour). In total, you spend $0.26 per table per backup.
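The arithmetic can be written as a small helper to estimate costs for more than one table. This is a sketch using the prices quoted in this article. The function name and its parameters are made up for the example, and the default of one hour mirrors the assumption that a backup finishes within an hour.

```python
from decimal import Decimal


def backup_cost(tables=1, nodes=2, ec2_hourly="0.10", emr_hourly="0.03",
                hours=1):
    """Cost in USD of backing up `tables` tables once.

    Defaults follow the article: one master plus one core node (m4.large
    in us-east-1), each billed for EC2 and EMR, for one hour per backup.
    """
    per_node_hour = Decimal(ec2_hourly) + Decimal(emr_hourly)
    return tables * nodes * hours * per_node_hour


print(backup_cost())               # 0.26 (one table, one backup)
print(backup_cost(tables=5) * 30)  # five tables, backed up daily for 30 days
```

Prices are passed as strings so that `Decimal` keeps them exact instead of inheriting floating-point rounding.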
