High availability is a no-brainer: EC2 auto-recovery

Andreas Wittig – 09 Nov 2015

Werner Vogels (CTO of Amazon) is often quoted as saying “Everything fails all the time.” This does not mean AWS is an unreliable cloud provider. Quite the contrary: AWS plans for failure. Some services are highly available or fault tolerant by default; others offer tools to achieve this goal.

Problem

An EC2 instance (virtual machine) is not highly available by default. The underlying virtualization layer, the operating system of the host system, and the hardware of the host system are all possible points of failure. If one of these parts breaks, the EC2 instance becomes unavailable.

Solution

AWS offers tools to handle the failure of an EC2 instance. The following figure shows the easiest way to recover from a failure:

  1. The EC2 instance fails for one of the previously described reasons.
  2. A health check of the EC2 instance is performed automatically in the background and reported to CloudWatch, the monitoring service from AWS.
  3. A CloudWatch alarm triggers the recovery of the EC2 instance if the health check detects a failure.
  4. A new EC2 instance will be started automatically to replace the failed one.
  5. The new EC2 instance is a clone of the failed EC2 instance. The instance ID and the private and public IP addresses stay the same. As long as data is stored on EBS volumes, no data is lost.

EC2 auto-recovery process


The following components are needed to set up auto-recovery for EC2 instances:

  • EC2 instance from C3, C4, M3, M4, R3, or T2 family
  • CloudWatch alarm based on health check
  • Elastic IP if you want to keep the same public IP address after an auto-recovery
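
To make the CloudWatch alarm from the list above more concrete, here is a minimal sketch in Python using boto3. It is not taken from the template mentioned in this article. The function names, the alarm naming scheme, and the period and evaluation settings are assumptions chosen for illustration. The sketch relies on the `StatusCheckFailed_System` metric in the `AWS/EC2` namespace and on the built-in `arn:aws:automate:<region>:ec2:recover` alarm action.

```python
def build_recovery_alarm(instance_id, region, minutes=5):
    """Return keyword arguments for CloudWatch's put_metric_alarm call.

    The alarm fires when the system status check has failed for
    `minutes` consecutive one-minute periods (an assumed setting).
    """
    return {
        # Hypothetical naming scheme, pick whatever fits your account.
        "AlarmName": f"auto-recovery-{instance_id}",
        "AlarmDescription": "Recover the EC2 instance when the system status check fails",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": minutes,
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 0.0,
        # Built-in action that triggers the recovery of the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_id, region):
    """Create the alarm in your AWS account (requires boto3 and credentials)."""
    import boto3  # third-party dependency, imported lazily

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_recovery_alarm(instance_id, region))
```

Calling `create_recovery_alarm("i-0123456789abcdef0", "eu-west-1")` with a real instance ID would create the alarm. The instance ID shown here is a placeholder.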

Use CloudFormation template

I have written a template that you can use to launch an EC2 instance with auto-recovery. It uses Infrastructure as Code to create the needed components and links. You can use AWS CloudFormation to create your EC2 instance with auto-recovery in minutes. The GitHub repository widdix/aws-cf-templates contains the CloudFormation template for EC2 with auto-recovery and some more useful templates.
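
To give a rough idea of what such a template contains, the following Python sketch builds a stripped-down CloudFormation template as a dictionary and prints it as JSON. It is an illustration only and not the template from the widdix/aws-cf-templates repository. The resource names `Instance`, `ElasticIP`, and `RecoveryAlarm`, the AMI ID, the default instance type, and the alarm timing are assumptions.

```python
import json


def build_template(image_id, instance_type="t2.micro"):
    """Return a minimal CloudFormation template with the three components."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "EC2 instance with auto-recovery (illustrative sketch)",
        "Resources": {
            # The EC2 instance that should be recovered on failure.
            "Instance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {"ImageId": image_id, "InstanceType": instance_type},
            },
            # Elastic IP so the public IP address survives a recovery.
            "ElasticIP": {
                "Type": "AWS::EC2::EIP",
                "Properties": {"InstanceId": {"Ref": "Instance"}, "Domain": "vpc"},
            },
            # Alarm on the system status check that triggers the recovery.
            "RecoveryAlarm": {
                "Type": "AWS::CloudWatch::Alarm",
                "Properties": {
                    "AlarmDescription": "Recover the instance when the system status check fails",
                    "Namespace": "AWS/EC2",
                    "MetricName": "StatusCheckFailed_System",
                    "Statistic": "Minimum",
                    "Period": 60,
                    "EvaluationPeriods": 5,
                    "ComparisonOperator": "GreaterThanThreshold",
                    "Threshold": 0,
                    "AlarmActions": [
                        {"Fn::Sub": "arn:aws:automate:${AWS::Region}:ec2:recover"}
                    ],
                    "Dimensions": [{"Name": "InstanceId", "Value": {"Ref": "Instance"}}],
                },
            },
        },
    }


if __name__ == "__main__":
    # "ami-12345678" is a placeholder, replace it with an AMI from your region.
    print(json.dumps(build_template("ami-12345678"), indent=2))
```

The alarm references the instance via `Ref`, so CloudFormation creates the instance first and wires the alarm to its ID. This is the kind of link between components that the template sets up for you.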

Next steps

This solution can recover a failed EC2 instance. However, it can only recover the EC2 instance within the same availability zone (one or more isolated data centers). If the whole availability zone is affected by an outage, your EC2 instance will be unavailable as well. It is possible to plan for the outage of an availability zone, too. If you are interested, I recommend our book Amazon Web Services in Action or the AWS documentation about Auto Scaling and ELB.
