Running a production-ready application on EC2 gives you maximum freedom but also maximum responsibility. By production-ready, I mean:
Highly available: no single point of failure
Scalable: increase or decrease the number of instances based on load
Frictionless deployment: deliver new versions of your application automatically without downtime
Secure: patch operating systems and libraries frequently, and follow the least privilege principle in all areas
Operations: provide tools like logging, monitoring and alerting to recognize and debug problems
The overall architecture consists of a load balancer forwarding requests to multiple EC2 instances distributed among different Availability Zones (data centers).
Most of our clients use AWS to reduce time-to-market following an agile approach. But AWS is only one part of the solution. In this article series, I show you how we help our clients to improve velocity: the time from idea to production.
Let’s start simple and tackle all the challenges along the way.
A single EC2 instance is a single point of failure
A single EC2 instance is a single point of failure. When you want to run a production-ready app on EC2, you need more than one EC2 instance. Luckily, AWS provides a way to manage multiple EC2 instances: the Auto Scaling Group. But if you run multiple EC2 instances to serve your application, you also need a load balancer to distribute the requests to one of the EC2 instances.
In the Local development environment part of this series, you created an infrastructure folder, which is still empty. It's time to change this. You will now create a CloudFormation template that describes the infrastructure needed to run the app on EC2 instances.
---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'EC2'
Parameters:
  # You can reuse a VPC for multiple applications. In this case, we use one of our Free Templates for AWS CloudFormation (https://github.com/widdix/aws-cf-templates/tree/master/vpc).
  ParentVPCStack:
    Description: 'Stack name of parent VPC stack based on vpc/vpc-*azs.yaml template.'
    Type: String
Conditions:
Resources:
  # The load balancer accepts HTTP traffic. Therefore, the firewall must allow incoming traffic on port 80.
  LoadBalancerSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: 'load-balancer-sg'
      VpcId:
        'Fn::ImportValue': !Sub '${ParentVPCStack}-VPC'
      SecurityGroupIngress:
      - CidrIp: '0.0.0.0/0'
        FromPort: 80
        ToPort: 80
        IpProtocol: tcp
  # The load balancer needs to run in public subnets because our users should be able to access the app from the Internet.
  LoadBalancer:
    Type: 'AWS::ElasticLoadBalancingV2::LoadBalancer'
    Properties:
      Scheme: 'internet-facing'
      SecurityGroups:
      - !Ref LoadBalancerSecurityGroup
      Subnets:
      - 'Fn::ImportValue': !Sub '${ParentVPCStack}-SubnetAPublic'
      - 'Fn::ImportValue': !Sub '${ParentVPCStack}-SubnetBPublic'
      Tags:
      - Key: Name
        Value: 'load-balancer'
  # A target group groups a bunch of backend instances that receive traffic from the load balancer. The health check ensures that only working backends are used.
  TargetGroup:
    Type: 'AWS::ElasticLoadBalancingV2::TargetGroup'
    Properties:
      HealthCheckIntervalSeconds: 15
      HealthCheckPath: '/5'
      HealthCheckPort: 3000
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 10
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 8
      Matcher:
        HttpCode: 200
      Port: 3000
      Protocol: HTTP
      Tags:
      - Key: Name
        Value: 'target-group'
      VpcId:
        'Fn::ImportValue': !Sub '${ParentVPCStack}-VPC'
  # The load balancer should listen on port 80 for HTTP traffic.
  Listener:
    Type: 'AWS::ElasticLoadBalancingV2::Listener'
    Properties:
      DefaultActions:
      - TargetGroupArn: !Ref TargetGroup
        Type: forward
      LoadBalancerArn: !Ref LoadBalancer
      Port: 80
      Protocol: HTTP
# A CloudFormation stack can return information that is needed by other stacks or scripts.
Outputs:
  DNSName:
    Description: 'The DNS name for the load balancer.'
    Value: !GetAtt 'LoadBalancer.DNSName'
    Export:
      Name: !Sub '${AWS::StackName}-DNSName'
  # The URL is needed to run the acceptance test against the correct endpoint.
  URL:
    Description: 'URL to the load balancer.'
    Value: !Sub 'http://${LoadBalancer.DNSName}'
    Export:
      Name: !Sub '${AWS::StackName}-URL'
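If you want to test the template at this stage, you could create the stack with the AWS CLI. The file name (infrastructure/ec2.yaml), the stack name (app-ec2), and the VPC stack name (vpc) are example values, not given by the template; adjust them to your setup:

# Create or update the stack from the template file
aws cloudformation deploy \
  --stack-name app-ec2 \
  --template-file infrastructure/ec2.yaml \
  --parameter-overrides ParentVPCStack=vpc

# Fetch the URL output to send a first request to the load balancer
aws cloudformation describe-stacks --stack-name app-ec2 \
  --query "Stacks[0].Outputs[?OutputKey=='URL'].OutputValue" --output text

Keep in mind that the template grows throughout this article: once the IAM role is added, the deploy command also needs --capabilities CAPABILITY_IAM, and the additional parameters (for example, AdminEmail and ImageId) have to be passed as well.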
But how do you get notified if something goes wrong? Let’s add a parameter to the Parameters section to make the receiver configurable:
  AdminEmail:
    Description: 'The email address of the admin who receives alerts.'
    Type: String
Alerts are triggered by a CloudWatch Alarm, which can send an alert to an SNS topic. You can subscribe to this topic via an email address to receive the alerts. Let's create an SNS topic and two alarms in the Resources section:
  # An SNS topic is used to send alerts via email to the value of the AdminEmail parameter.
  Alerts:
    Type: 'AWS::SNS::Topic'
    Properties:
      Subscription:
      - Endpoint: !Ref AdminEmail
        Protocol: email
  # This alarm is triggered if the load balancer responds with 5XX status codes.
  LoadBalancer5XXAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      EvaluationPeriods: 1
      Statistic: Sum
      Threshold: 0
      AlarmDescription: 'Load balancer responds with 5XX status codes.'
      Period: 60
      AlarmActions:
      - !Ref Alerts
      Namespace: 'AWS/ApplicationELB'
      Dimensions:
      - Name: LoadBalancer
        Value: !GetAtt 'LoadBalancer.LoadBalancerFullName'
      ComparisonOperator: GreaterThanThreshold
      MetricName: HTTPCode_ELB_5XX_Count
  # This alarm is triggered if the backend responds with 5XX status codes.
  LoadBalancerTargetGroup5XXAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      EvaluationPeriods: 1
      Statistic: Sum
      Threshold: 0
      AlarmDescription: 'Load balancer target responds with 5XX status codes.'
      Period: 60
      AlarmActions:
      - !Ref Alerts
      Namespace: 'AWS/ApplicationELB'
      Dimensions:
      - Name: LoadBalancer
        Value: !GetAtt 'LoadBalancer.LoadBalancerFullName'
      ComparisonOperator: GreaterThanThreshold
      MetricName: HTTPCode_Target_5XX_Count
Let's recap what you implemented: a load balancer with a firewall rule that allows traffic on port 80. In the case of 5XX status codes, you will receive an email. But the load balancer alone is not enough. Now it's time to add the EC2 instances.
EC2 instances
So far, there are no EC2 instances. Let’s change that by adding a few more parameters in the Parameters section to make EC2 instances configurable:
  # A bastion host increases the security of your system. In this case, we use one of our Free Templates for AWS CloudFormation (https://github.com/widdix/aws-cf-templates/tree/master/vpc).
  ParentSSHBastionStack:
    Description: 'Optional Stack name of parent SSH bastion host/instance stack based on vpc/vpc-ssh-bastion.yaml template.'
    Type: String
    Default: ''
  # This is the simple way of getting SSH access to your EC2 instance, but not the most secure way. If you want to have personalized users, follow https://cloudonaut.io/manage-aws-ec2-ssh-access-with-iam/
  KeyName:
    Description: 'Optional key pair of the ec2-user to establish a SSH connection.'
    Type: String
    Default: ''
  InstanceType:
    Description: 'The instance type of web servers (e.g. t2.micro).'
    Type: String
    Default: 't2.micro'
  # Where does this AMI come from? It will be created in the CI/CD pipeline!
  ImageId:
    Description: 'Unique ID of the Amazon Machine Image (AMI) to boot from.'
    Type: String
  # How long do you want to keep logs?
  LogsRetentionInDays:
    Description: 'Specifies the number of days you want to retain log events in the specified log group.'
    Type: Number
    Default: 14
    AllowedValues: [1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653]
To make the template react differently to different parameter inputs, you need to add a few Conditions that will be used later in the template.
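The conditions check whether the optional parameters are set. Based on the condition names referenced in the resources below (HasKeyName, HasSSHBastionSecurityGroup, and HasNotSSHBastionSecurityGroup), a minimal sketch of the Conditions section could look like this:

Conditions:
  # true if the optional KeyName parameter is set
  HasKeyName: !Not [!Equals [!Ref KeyName, '']]
  # true if a bastion host stack is configured
  HasSSHBastionSecurityGroup: !Not [!Equals [!Ref ParentSSHBastionStack, '']]
  # true if no bastion host stack is configured
  HasNotSSHBastionSecurityGroup: !Equals [!Ref ParentSSHBastionStack, '']

With the conditions in place, add the security groups and the Auto Scaling Group to the Resources section: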
  # The app listens on port 3000, but only the load balancer is allowed to send traffic to that port!
  SecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: 'ec2-sg'
      VpcId:
        'Fn::ImportValue': !Sub '${ParentVPCStack}-VPC'
      SecurityGroupIngress:
      - SourceSecurityGroupId: !Ref LoadBalancerSecurityGroup
        FromPort: 3000
        ToPort: 3000
        IpProtocol: tcp
  # If the bastion host approach is enabled, traffic on port 22 is only allowed from the bastion host.
  SecurityGroupInSSHBastion:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Condition: HasSSHBastionSecurityGroup
    Properties:
      GroupId: !Ref SecurityGroup
      IpProtocol: tcp
      FromPort: 22
      ToPort: 22
      SourceSecurityGroupId:
        'Fn::ImportValue': !Sub '${ParentSSHBastionStack}-SecurityGroup'
  # Otherwise, SSH is allowed from anywhere.
  SecurityGroupInSSHWorld:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Condition: HasNotSSHBastionSecurityGroup
    Properties:
      GroupId: !Ref SecurityGroup
      IpProtocol: tcp
      FromPort: 22
      ToPort: 22
      CidrIp: '0.0.0.0/0'
  AutoScalingGroup:
    Type: 'AWS::AutoScaling::AutoScalingGroup'
    Properties:
      LaunchConfigurationName: !Ref LaunchConfiguration # be patient, you will create a launch configuration soon
      MinSize: 2 # at least two instances should always be running
      MaxSize: 4 # at most 4 instances are allowed to run
      DesiredCapacity: 2 # you want to start with 2 instances
      HealthCheckGracePeriod: 300
      HealthCheckType: ELB # make use of the health check of the load balancer which checks the application health instead of only checking the instance health
      VPCZoneIdentifier:
      - 'Fn::ImportValue': !Sub '${ParentVPCStack}-SubnetAPublic'
      - 'Fn::ImportValue': !Sub '${ParentVPCStack}-SubnetBPublic'
      TargetGroupARNs:
      - !Ref TargetGroup # automatically (de)register instances with the target group of the load balancer
      Tags:
      - Key: Name
        Value: 'ec2'
        PropagateAtLaunch: true
    CreationPolicy: # wait up to 15 minutes to receive a success signal during instance startup
      ResourceSignal:
        Timeout: PT15M
    UpdatePolicy: # this allows rolling updates if a change requires new EC2 instances
      AutoScalingRollingUpdate:
        PauseTime: PT15M
        WaitOnResourceSignals: true
Let's recap what you implemented: a firewall rule that allows traffic on port 3000 (the application's port). Depending on whether you use the bastion host approach, an appropriate firewall rule is created to allow SSH access. You also added an Auto Scaling Group that can scale between 2 and 4 instances. So far, you have not defined what kind of EC2 instances you want to start. Let's do this in the Resources section:
  # Log files that reside on EC2 instances must be avoided because instances come and go depending on load. CloudWatch Logs provides a centralized way to store and search logs.
  Logs:
    Type: 'AWS::Logs::LogGroup'
    Properties:
      RetentionInDays: !Ref LogsRetentionInDays
  InstanceProfile:
    Type: 'AWS::IAM::InstanceProfile'
    Properties:
      Path: '/'
      Roles:
      - !Ref Role
  # The EC2 instance needs permissions to make requests to the CloudWatch Logs service to deliver logs.
  Role:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: 'ec2.amazonaws.com'
          Action: 'sts:AssumeRole'
      Path: '/'
      Policies:
      - PolicyName: logs
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - 'logs:CreateLogGroup'
            - 'logs:CreateLogStream'
            - 'logs:PutLogEvents'
            - 'logs:DescribeLogStreams'
            Resource: 'arn:aws:logs:*:*:*'
  # The Launch Configuration determines what kind of EC2 instances are launched by the Auto Scaling Group.
  LaunchConfiguration:
    Type: 'AWS::AutoScaling::LaunchConfiguration'
    Metadata:
      'AWS::CloudFormation::Init': # Configuration for the cfn-init helper script that runs on startup. This is only needed for the dynamic configuration. The rest is baked into the AMI in the CI/CD pipeline.
        config:
          files:
            '/etc/awslogs/awscli.conf': # configuration file for the CloudWatch Logs agent that ships logs to the service
              content: !Sub |
                [default]
                region = ${AWS::Region}
                [plugins]
                cwlogs = cwlogs
              mode: '000644'
              owner: root
              group: root
            '/etc/awslogs/awslogs.conf': # configuration file for the CloudWatch Logs agent that ships logs to the service
              content: !Sub |
                [general]
                state_file = /var/lib/awslogs/agent-state
                [/var/log/messages]
                datetime_format = %b %d %H:%M:%S
                file = /var/log/messages
                log_stream_name = {instance_id}/var/log/messages
                log_group_name = ${Logs}
                [/var/log/secure]
                datetime_format = %b %d %H:%M:%S
                file = /var/log/secure
                log_stream_name = {instance_id}/var/log/secure
                log_group_name = ${Logs}
                [/var/log/cron]
                datetime_format = %b %d %H:%M:%S
                file = /var/log/cron
                log_stream_name = {instance_id}/var/log/cron
                log_group_name = ${Logs}
                [/var/log/cloud-init.log]
                datetime_format = %b %d %H:%M:%S
                file = /var/log/cloud-init.log
                log_stream_name = {instance_id}/var/log/cloud-init.log
                log_group_name = ${Logs}
                [/var/log/cfn-init.log]
                datetime_format = %Y-%m-%d %H:%M:%S
                file = /var/log/cfn-init.log
                log_stream_name = {instance_id}/var/log/cfn-init.log
                log_group_name = ${Logs}
                [/var/log/cfn-hup.log]
                datetime_format = %Y-%m-%d %H:%M:%S
                file = /var/log/cfn-hup.log
                log_stream_name = {instance_id}/var/log/cfn-hup.log
                log_group_name = ${Logs}
                [/var/log/cfn-init-cmd.log]
                datetime_format = %Y-%m-%d %H:%M:%S
                file = /var/log/cfn-init-cmd.log
                log_stream_name = {instance_id}/var/log/cfn-init-cmd.log
                log_group_name = ${Logs}
                [/var/log/cloud-init-output.log]
                file = /var/log/cloud-init-output.log
                log_stream_name = {instance_id}/var/log/cloud-init-output.log
                log_group_name = ${Logs}
                [/var/log/dmesg]
                file = /var/log/dmesg
                log_stream_name = {instance_id}/var/log/dmesg
                log_group_name = ${Logs}
                [/var/log/forever.log]
                file = /var/log/forever.log
                log_stream_name = {instance_id}/var/log/forever.log
                log_group_name = ${Logs}
                [/var/log/app.out]
                file = /var/log/app.out
                log_stream_name = {instance_id}/var/log/app.out
                log_group_name = ${Logs}
                [/var/log/app.err]
                file = /var/log/app.err
                log_stream_name = {instance_id}/var/log/app.err
                log_group_name = ${Logs}
              mode: '000644'
              owner: root
              group: root
          commands:
            'forever':
              command: 'forever start -l /var/log/forever.log -o /var/log/app.out -e /var/log/app.err index.js' # forever keeps the app (a Node.js script) up and running in the background
              cwd: '/opt/app'
          services:
            sysvinit:
              awslogs: # start the CloudWatch Logs agent
                enabled: true
                ensureRunning: true
                files:
                - '/etc/awslogs/awslogs.conf'
                - '/etc/awslogs/awscli.conf'
    Properties:
      ImageId: !Ref ImageId # the image that is created during the build in the CI/CD pipeline, passed in as a parameter
      IamInstanceProfile: !Ref InstanceProfile
      InstanceType: !Ref InstanceType
      SecurityGroups:
      - !Ref SecurityGroup
      KeyName: !If [HasKeyName, !Ref KeyName, !Ref 'AWS::NoValue']
      UserData: # execute cfn-init helper script and signal success or failure back to CloudFormation
        'Fn::Base64': !Sub |
          #!/bin/bash -x
          /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource LaunchConfiguration --region ${AWS::Region}
          /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource AutoScalingGroup --region ${AWS::Region}
Let's recap what you implemented: the Launch Configuration defines what kind of EC2 instances the Auto Scaling Group creates. The cfn-init script reads Metadata from CloudFormation to configure a running EC2 instance dynamically. The cfn-signal script reports to CloudFormation whether the EC2 instance was started successfully. CloudWatch Logs stores the log files that are delivered by an agent running on the EC2 instance.
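To debug a problem, you can search the centralized logs with the AWS CLI instead of logging in to individual instances via SSH. The log group name below is a placeholder; CloudFormation generates the actual name from the stack name and the Logs resource:

# Follow all log streams of the log group in near real-time (requires AWS CLI v2)
aws logs tail app-ec2-Logs-EXAMPLE --follow

# Or search all streams of the log group for errors
aws logs filter-log-events --log-group-name app-ec2-Logs-EXAMPLE --filter-pattern Error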
Auto Scaling
So far, the number of EC2 instances is static. To scale based on load, you need to add:
Scaling Policies that define what happens when the system scales up or down
CloudWatch Alarms that trigger a Scaling Policy based on a metric such as CPU utilization
  # Increase the number of instances by 25%, but at least by one, not more often than every 10 minutes.
  ScalingUpPolicy:
    Type: 'AWS::AutoScaling::ScalingPolicy'
    Properties:
      AdjustmentType: PercentChangeInCapacity
      MinAdjustmentStep: 1
      AutoScalingGroupName: !Ref AutoScalingGroup
      Cooldown: 600
      ScalingAdjustment: 25
  # Decrease the number of instances by 25%, but at least by one, not more often than every 15 minutes.
  ScalingDownPolicy:
    Type: 'AWS::AutoScaling::ScalingPolicy'
    Properties:
      AdjustmentType: PercentChangeInCapacity
      MinAdjustmentStep: 1
      AutoScalingGroupName: !Ref AutoScalingGroup
      Cooldown: 900
      ScalingAdjustment: -25
  # Trigger the ScalingUpPolicy if the average CPU load of the past 5 minutes is higher than 70%.
  CPUHighAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      EvaluationPeriods: 1
      Statistic: Average
      Threshold: 70
      AlarmDescription: 'CPU load is high.'
      Period: 300
      AlarmActions:
      - !Ref ScalingUpPolicy
      Namespace: 'AWS/EC2'
      Dimensions:
      - Name: AutoScalingGroupName
        Value: !Ref AutoScalingGroup
      ComparisonOperator: GreaterThanThreshold
      MetricName: CPUUtilization
  # Trigger the ScalingDownPolicy if the average CPU load of the past 5 minutes is lower than 30% for 3 consecutive periods.
  CPULowAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      EvaluationPeriods: 3
      Statistic: Average
      Threshold: 30
      AlarmDescription: 'CPU load is low.'
      Period: 300
      AlarmActions:
      - !Ref ScalingDownPolicy
      Namespace: 'AWS/EC2'
      Dimensions:
      - Name: AutoScalingGroupName
        Value: !Ref AutoScalingGroup
      ComparisonOperator: LessThanThreshold
      MetricName: CPUUtilization
Let's recap what you implemented: the Scaling Policy defines what happens when you want to scale, while a CloudWatch Alarm triggers the Scaling Policy based on live metrics like CPUUtilization. The Auto Scaling Group now keeps a dynamic number of EC2 instances but always ensures that no fewer than two and no more than four instances are running.
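To verify that scaling works as expected, you can inspect the scaling activities of the Auto Scaling Group. The group name below is a placeholder; CloudFormation generates the actual name from the stack name and the AutoScalingGroup resource:

# List the most recent scaling activities of the Auto Scaling Group
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name app-ec2-AutoScalingGroup-EXAMPLE \
  --max-items 5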
One thing is missing: monitoring of your EC2 instances. Add:
A CloudWatch Alarm to monitor the CPU utilization
A Log Filter that searches for the word Error in the logs and puts the result count into a CloudWatch Metric
A CloudWatch Alarm that monitors the Log Filter output
  # Sends an alert if the average CPU load of the past 5 minutes is higher than 85%.
  CPUTooHighAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      EvaluationPeriods: 1
      Statistic: Average
      Threshold: 85
      AlarmDescription: 'CPU load is too high.'
      Period: 300
      AlarmActions:
      - !Ref Alerts
      Namespace: 'AWS/EC2'
      Dimensions:
      - Name: AutoScalingGroupName
        Value: !Ref AutoScalingGroup
      ComparisonOperator: GreaterThanThreshold
      MetricName: CPUUtilization
  # Filters all logs for the word Error.
  AppErrorsLogsFilter:
    Type: 'AWS::Logs::MetricFilter'
    Properties:
      FilterPattern: Error
      LogGroupName: !Ref Logs
      MetricTransformations:
      - MetricName: AppErrors
        MetricNamespace: !Ref 'AWS::StackName'
        MetricValue: 1
  # Sends an alert if the word Error was found in the logs.
  AppErrorsAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'application errors in logs'
      Namespace: !Ref 'AWS::StackName'
      MetricName: AppErrors
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 0
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
      - !Ref Alerts