ECS Deployment Options - From rolling updates to blue-green and canary
“How often do you deploy to production?” This is an important question, because the best application is useless if you can’t deploy it. Being able to deploy regularly and in an automated way matters a lot.
The Elastic Container Service (ECS) is used by many AWS customers and offers different ways to deploy your containers. In this article, I’ll explain the different deployment options of ECS, show how they work, and when to use them.
The different options can be set in the DeploymentController property of an ECS service. Let’s have a look:
The first option is ECS itself. The ECS service scheduler is responsible for performing a rolling update: running tasks of the current version are replaced step by step with tasks of the new version. The optional DeploymentConfiguration parameter defines how many tasks should run during the deployment. MinimumHealthyPercent indicates the percentage of tasks that must remain in RUNNING status, and MaximumPercent represents an upper limit on the number of your service’s tasks that are allowed in the RUNNING or PENDING state during a deployment. Both values are percentages of the current desiredCount.
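To make the two percentages more tangible, here is a small sketch that turns them into task counts. It assumes that the lower bound is rounded up and the upper bound is rounded down, which is how I read the ECS documentation, and it uses 100 and 200 as default values. The function name is my own and not part of any AWS SDK.

```python
from typing import Tuple


def rolling_update_bounds(
    desired_count: int,
    minimum_healthy_percent: int = 100,  # assumed default
    maximum_percent: int = 200,          # assumed default
) -> Tuple[int, int]:
    """Return (min_running, max_total) task counts during a rolling update."""
    # Lower bound: tasks that must stay RUNNING (assumption: rounded up).
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    # Upper bound: tasks allowed in RUNNING or PENDING (assumption: rounded down).
    max_total = desired_count * maximum_percent // 100
    return min_running, max_total


if __name__ == "__main__":
    # With 4 desired tasks, 50% / 200% lets ECS stop 2 old tasks
    # and run up to 8 tasks in total while deploying.
    print(rolling_update_bounds(4, 50, 200))  # (2, 8)
```

With 100% and 200%, ECS has to start new tasks before it may stop old ones, which needs spare capacity. With 50% and 100%, it stops half of the old tasks first and never exceeds the desired count.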
The configuration is easy and straightforward, but it also limits the possibilities. It’s not possible to cancel or roll back a deployment. And due to the limited configuration, it does not allow us to do a blue/green or canary deployment.
Another issue comes in combination with CloudFormation (see Deep dive on load balanced ECS Service deployments with CloudFormation for more details):
- A CloudFormation stack creation will always complete successfully even with failing health checks and independent of any deploymentConfiguration
- A CloudFormation stack update will fail only if minimumHealthyPercent is 100%, and the container health check is unhealthy (no other combination). And it takes CloudFormation 3 hours before it triggers the rollback.
- A CloudFormation rollback triggers a new ECS deployment with the former task definition (which could lead to trouble as well)
The next option is CODE_DEPLOY, where the deployment of an ECS service is orchestrated by CodeDeploy. During a deployment, it creates tasks of the new version in parallel to the existing tasks and then shifts the traffic over. The traffic can be shifted all at once, linearly, or as a canary (a small percentage at the beginning and then the rest).
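The three ways of shifting traffic differ only in their schedule. The following sketch generates such a schedule as (minute, percent of traffic on the new version) pairs. It is an illustration of the idea, not CodeDeploy’s implementation; the function and its strategy names are made up for this post.

```python
from typing import List, Tuple


def shift_schedule(
    strategy: str, percent: int = 10, interval_minutes: int = 1
) -> List[Tuple[int, int]]:
    """Return (minute, percent_on_new_version) pairs for a traffic shift."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    if strategy == "all_at_once":
        return [(0, 100)]
    if strategy == "canary":
        # A small share first, the rest after one interval.
        return [(0, percent), (interval_minutes, 100)]
    if strategy == "linear":
        # Equal increments, one per interval, until 100% is reached.
        steps: List[Tuple[int, int]] = []
        minute, shifted = 0, 0
        while shifted < 100:
            shifted = min(100, shifted + percent)
            steps.append((minute, shifted))
            minute += interval_minutes
        return steps
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    print(shift_schedule("canary", percent=10, interval_minutes=5))
    print(shift_schedule("linear", percent=10, interval_minutes=1))
```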
Hooks (basically Lambda functions) that are executed before and after the traffic is shifted can ensure that the application works as expected. Failures in the hooks lead to automated rollbacks. In addition, you can configure CloudWatch alarms that are watched during the deployment; if one of them goes into the ALARM state, a rollback is triggered. The deployment is also “baked” for some time to allow a fast rollback in case something goes wrong after the new version has been rolled out completely.
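A hook function has to report its result back to CodeDeploy, otherwise the deployment waits until the hook times out. Here is a minimal sketch of such a handler. The CodeDeploy client is passed in so the logic can be tried without AWS access; in a real Lambda function you would pass boto3.client("codedeploy"). The check callable stands for your own smoke test and is purely hypothetical.

```python
from typing import Any, Callable, Dict


def make_hook_handler(codedeploy: Any, check: Callable[[], bool]):
    """Build a Lambda handler that reports the outcome of `check` to CodeDeploy.

    `codedeploy` is expected to behave like boto3.client("codedeploy").
    `check` is a hypothetical smoke test returning True on success.
    """

    def handler(event: Dict[str, str], context: Any = None) -> str:
        try:
            status = "Succeeded" if check() else "Failed"
        except Exception:
            # A crashing check is treated as a failed validation.
            status = "Failed"
        # Tell CodeDeploy whether the deployment may continue.
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=event["DeploymentId"],
            lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
            status=status,
        )
        return status

    return handler
```

Reporting "Failed" is what triggers the automated rollback described above.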
See also Clare Liguori’s Twitter thread, where she explains it in detail.
To set up ECS deployments with CodeDeploy a certain order must be followed:
- Create your infrastructure (like IAM roles, security groups, and load balancer)
- Set up an ECS Task Definition and Service (and optionally a Cluster)
- Create a CodeDeploy Application and DeploymentGroup (referencing ECS service and cluster and two target groups)
- Set up a CodePipeline that builds the image, creates a taskdefinition.json and an appspec.yaml, and calls CodeDeploy
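To give an idea of what the pipeline has to produce in the last step, here is a sketch that builds the content of an AppSpec file for ECS. I use JSON to stay within the standard library; the same structure can be written as YAML. The container name, the port, and the hook function name are placeholders, and the structure reflects my understanding of the AppSpec format for ECS, so double-check it against your own setup.

```python
import json
from typing import Any, Dict, Optional


def build_appspec(
    task_definition: str,
    container_name: str,
    container_port: int,
    before_allow_traffic_hook: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an AppSpec document for an ECS deployment as a plain dict."""
    appspec: Dict[str, Any] = {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        "TaskDefinition": task_definition,
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
    }
    if before_allow_traffic_hook:
        # Name of a Lambda function that validates the new version (placeholder).
        appspec["Hooks"] = [{"BeforeAllowTraffic": before_allow_traffic_hook}]
    return appspec


if __name__ == "__main__":
    # "<TASK_DEFINITION>" is a placeholder that the pipeline's CodeDeploy
    # action replaces with the ARN of the newly registered task definition.
    print(json.dumps(build_appspec("<TASK_DEFINITION>", "web", 80), indent=2))
```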
Unfortunately, not everything can be configured in CloudFormation, as DeploymentGroup does not support ECS blue/green deployments.
Another issue is that the CodeDeploy action needs not only an appspec.yaml but also a taskdefinition.json that contains our task definition. (Although AWS states in their docs that TaskDefinition is optional for an ECS Service, I couldn’t deploy a service without a referenced task definition.) With those dependencies, it’s not possible to create a pipeline first and deploy your application afterward. It must be the other way around: first create your infrastructure and then your pipeline. Further updates to your task definition are deployed with CodeDeploy and not CloudFormation.
One advantage of CDK is that it can fill the gaps of CloudFormation and abstract complicated things away. Although blue/green or canary deployments for ECS in CodeDeploy are not part of the official CDK library, there is a community project (cloudcomponents/cdk-constructs) that does the heavy lifting for you.
The last option is EXTERNAL. External deployments allow anyone to perform deployments by using TaskSets. A TaskSet is a set of ECS tasks that are part of a deployment. They are also used internally by CodeDeploy.
There is an example for Jenkins that performs the following steps:
- Create a green task set with the new task definition
- Shift test traffic to the green task set
- Shift an increment of prod traffic to the green task set
- Shift all prod traffic to the green task set
- Remove blue task set with the old task definition
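The steps above can be sketched with the ECS and ELBv2 APIs. This is my own simplified illustration and not the Jenkins example itself: the clients are passed in (for instance boto3.client("ecs") and boto3.client("elbv2")), all ARNs and names are placeholders, and I left out the network configuration, the waiting for a steady state, and any validation between the steps.

```python
from typing import Any, Dict, List


def forward_action(blue_tg: str, green_tg: str, green_weight: int) -> List[Dict[str, Any]]:
    """Weighted forward action that sends green_weight percent to green."""
    return [{
        "Type": "forward",
        "ForwardConfig": {"TargetGroups": [
            {"TargetGroupArn": blue_tg, "Weight": 100 - green_weight},
            {"TargetGroupArn": green_tg, "Weight": green_weight},
        ]},
    }]


def blue_green_deploy(ecs: Any, elbv2: Any, *, cluster: str, service: str,
                      blue_task_set: str, new_task_definition: str,
                      container_name: str, container_port: int,
                      blue_tg: str, green_tg: str,
                      test_listener: str, prod_listener: str,
                      increment: int = 10) -> str:
    # 1. Create a green task set with the new task definition.
    green = ecs.create_task_set(
        cluster=cluster, service=service, taskDefinition=new_task_definition,
        loadBalancers=[{"targetGroupArn": green_tg,
                        "containerName": container_name,
                        "containerPort": container_port}],
        scale={"value": 100.0, "unit": "PERCENT"},
    )["taskSet"]["id"]
    # 2. Shift test traffic to the green task set.
    elbv2.modify_listener(ListenerArn=test_listener,
                          DefaultActions=forward_action(blue_tg, green_tg, 100))
    # 3. Shift an increment of prod traffic to the green task set.
    elbv2.modify_listener(ListenerArn=prod_listener,
                          DefaultActions=forward_action(blue_tg, green_tg, increment))
    # 4. Shift all prod traffic and make green the primary task set.
    elbv2.modify_listener(ListenerArn=prod_listener,
                          DefaultActions=forward_action(blue_tg, green_tg, 100))
    ecs.update_service_primary_task_set(cluster=cluster, service=service,
                                        primaryTaskSet=green)
    # 5. Remove the blue task set with the old task definition.
    ecs.delete_task_set(cluster=cluster, service=service, taskSet=blue_task_set)
    return green
```

A real deployment would check the health of the green tasks before each shift and stop or revert if something looks wrong. That is exactly the part you have to build yourself with the EXTERNAL option.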
CloudFormation also supports blue/green deployments for Amazon ECS, which is good because you can define your whole infrastructure as code. But how is it different from CodeDeploy, and how does it work?
CloudFormation also uses the external deployment option. To perform the steps mentioned above, CloudFormation introduced a new transformation (“AWS::CodeDeployBlueGreen”) and a new “hooks” section (and an “AWS::CodeDeploy::BlueGreen” hook). Some resources need to be created for blue and green in CloudFormation (e.g., load balancer target group), and for some resources, only the “blue” part needs to be created (e.g., task definition or task set). During a stack creation or update, CloudFormation does multiple transformations of the template to perform the steps above.
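To show where the transformation and the hook sit in a template, here is a skeleton built as a Python dict and printed as JSON. The property names inside the hook follow my reading of the published example, so please treat them as an assumption and verify them before use. All logical IDs are placeholders, and the Resources section is left empty on purpose.

```python
import json
from typing import Any, Dict, List


def blue_green_skeleton(
    service_id: str,
    prod_listener_id: str,
    task_definitions: List[str],
    task_sets: List[str],
    target_groups: List[str],
) -> Dict[str, Any]:
    """Skeleton of a template that uses the blue/green transform and hook."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Transform": ["AWS::CodeDeployBlueGreen"],
        "Hooks": {
            "CodeDeployBlueGreenHook": {  # hook name is a placeholder
                "Type": "AWS::CodeDeploy::BlueGreen",
                "Properties": {
                    "TrafficRoutingConfig": {"Type": "AllAtOnce"},
                    "Applications": [{
                        "Target": {"Type": "AWS::ECS::Service",
                                   "LogicalID": service_id},
                        "ECSAttributes": {
                            # Blue first, then the name CloudFormation
                            # should use for the generated green resource.
                            "TaskDefinitions": task_definitions,
                            "TaskSets": task_sets,
                            "TrafficRouting": {
                                "ProdTrafficRoute": {
                                    "Type": "AWS::ElasticLoadBalancingV2::Listener",
                                    "LogicalID": prod_listener_id,
                                },
                                "TargetGroups": target_groups,
                            },
                        },
                    }],
                },
            }
        },
        "Resources": {},  # service, task definition, task set, etc. go here
    }


if __name__ == "__main__":
    print(json.dumps(blue_green_skeleton(
        "Service", "ProdListener",
        ["BlueTaskDefinition", "GreenTaskDefinition"],
        ["BlueTaskSet", "GreenTaskSet"],
        ["BlueTargetGroup", "GreenTargetGroup"],
    ), indent=2))
```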
I can’t link to any docs as they don’t exist yet. Just have a look at the example. Clare Liguori’s Twitter thread also explains it in more detail.
But the CloudFormation solution comes with a lot of limitations. Just a few:
- SSM parameters can’t be resolved
- Importing existing resources is not supported
- Doesn’t work in combination with nested stacks
- No visibility or management of in-progress deployments like in CodeDeploy.
We are spoilt for choice. Every option has its advantages but also its drawbacks. Unfortunately, there’s no simple answer to which option you should use. I hope I could shed some light on it and help you with your decision.
Despite the variety, I don’t believe that this is what customers want. Too many cooks spoil the broth: ECS, CodeDeploy, and CloudFormation each offer a different way to deploy an ECS service. What I’d like to have is the simplicity of ECS rolling updates, the powerful configuration of CodeDeploy, and the infrastructure as code support of the CloudFormation solution.
PS: And a nice integration in CDK :)