Show Me Your Architecture Vol. 2: Platform Engineering on AWS
You can gain theoretical knowledge from the AWS documentation, books like AWS in Action, or AWS training. Beyond that, it is very valuable to learn directly from practice. In this series, we inspect real-life AWS architectures. In the second volume of the series, Matt provides insights into platform engineering on AWS.
I’m Matt Gowie, Founder at Masterpoint. I started my career as a software engineer and later transitioned into the AWS and DevOps world. I originally established Masterpoint as a solo consultancy, but in recent years we’ve grown into a larger team that is entirely focused on AWS platform engineering using Terraform, Kubernetes, and GitOps. Our project successes have included short-term engagements and larger projects for diverse clients, ranging from seed-funded startups to Fortune 20 enterprises, with many that fall in between.
We’re building cloud platforms that allow our clients to easily deploy their AWS applications. We empower developers to deploy their microservices by providing continuous delivery mechanisms and a production-ready platform. Our goal is to provide a ready-to-use application layer customized to the needs of our clients.
As shown in the following figure, we use these building blocks as the generic platform for our clients:
- Amazon Elastic Kubernetes Service (EKS) orchestrates containers.
- AWS Fargate and EKS Managed Node Groups act as our compute layer for EKS.
- Amazon RDS, Amazon ElastiCache, Amazon OpenSearch, Amazon S3, and more provide managed services consumed by our clients’ applications.
- Argo CD allows application engineers to deploy their microservices using declarative GitOps CD for Kubernetes.
- Spacelift is used to manage the infrastructure automation that we define in Terraform modules and configuration files.
- The SOPS Operator is our standard means to manage Kubernetes Secret resources, providing sensitive configuration parameters to microservices in a GitOps way.
Using Infrastructure as Code with Terraform allows us to bootstrap all the underlying infrastructure like VPC, EKS, and so on. Spacelift executes our Terraform code to spin up the platform until Argo CD is up and running. From that point, Argo CD takes over, mainly to deploy application microservices and any 3rd party tooling needed in the cluster (like a CSI driver, observability tooling, log processor, or similar).
The obvious question is, why EKS instead of ECS? The main reason why we bet on EKS is that it supports GitOps. The idea behind GitOps is that changes to the infrastructure or application code are pulled to the cluster instead of pushed, which enforces a single source of truth (Git) and removes infrastructure drift from the equation. There is no GitOps operator for ECS. Therefore, ECS requires the traditional approach of a CI/CD pipeline.
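To make the pull model concrete, here is a minimal sketch of a reconciliation step in Python. It is a toy model, not Argo CD's implementation: the state dictionaries, service names, and image tags are hypothetical, and deletion of undeclared resources is assumed to be switched on (real GitOps tools typically make pruning configurable).

```python
"""Toy model of pull-based GitOps reconciliation (not Argo CD's actual code)."""


def diff(desired: dict, live: dict) -> dict:
    """Return the changes needed to make `live` match `desired`."""
    shared = desired.keys() & live.keys()
    return {
        "create": sorted(desired.keys() - live.keys()),
        "update": sorted(k for k in shared if desired[k] != live[k]),
        "delete": sorted(live.keys() - desired.keys()),
    }


def reconcile(desired: dict, live: dict) -> dict:
    """Pull step: mutate `live` so it converges on what Git declares."""
    changes = diff(desired, live)
    for name in changes["create"] + changes["update"]:
        live[name] = desired[name]
    for name in changes["delete"]:
        # Assumption: pruning is enabled, so drifted extras are removed.
        del live[name]
    return changes


# Hypothetical state: Git declares two services; the cluster has drifted.
git_state = {"api": {"image": "api:1.4.0"}, "worker": {"image": "worker:2.0.1"}}
cluster_state = {"api": {"image": "api:1.3.9"}, "debug-pod": {"image": "busybox"}}

changes = reconcile(git_state, cluster_state)
print(changes)
```

Because the agent inside the cluster computes the diff against Git on every pass, a manual change such as the stray `debug-pod` is corrected on the next reconciliation, and a second call to `reconcile` on the converged state reports nothing to do.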
So why is GitOps so important? GitOps is the perfect choice when it comes to involving application engineers in their platform. For developers, GitOps feels natural and is simple to use because it revolves around tools that we all already know well: Git and our Git provider (GitHub, GitLab, etc.).
Besides picking the right service for orchestrating containers, we also considered using Flux instead of Argo CD. Both Flux and Argo CD are continuous delivery GitOps tools for Kubernetes. We picked Argo CD because it comes with a graphical user interface that is friendly to application engineers and is feature-complete. However, we are keeping an eye on the progress Flux is making because it has some interesting capabilities and is gaining ground.
The architecture of the platform has one design flaw. Terraform spins up the infrastructure and installs Argo CD. Then, Argo CD takes over and provisions the application services and supporting tooling. The problem is that nothing enforces the boundary between the two tools: we have to ensure ourselves that neither Terraform nor Argo CD touches resources managed by the other. For example, Terraform should not interact with the K8s resources managed by Argo CD, and Argo CD should not modify AWS resources managed by Terraform.
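One way to picture this boundary is as two sets of resource identifiers that must never overlap. The following Python sketch checks for that; the function name, the `aws:`/`k8s:` identifier scheme, and the example resources are all hypothetical and only serve to illustrate the idea, not how Terraform or Argo CD track ownership.

```python
from typing import Iterable, List


def find_boundary_violations(
    terraform_owned: Iterable[str], argocd_owned: Iterable[str]
) -> List[str]:
    """Return resources claimed by both tools, i.e. where the boundary is crossed."""
    return sorted(set(terraform_owned) & set(argocd_owned))


# Hypothetical inventories. Terraform owns the AWS layer plus the Argo CD install;
# Argo CD owns everything it deploys into the cluster afterwards.
terraform_owned = [
    "aws:vpc/main",
    "aws:eks/platform",
    "k8s:helm-release/argocd",
    "k8s:namespace/monitoring",  # also claimed by Argo CD below: a violation
]
argocd_owned = [
    "k8s:deployment/orders",
    "k8s:daemonset/log-processor",
    "k8s:namespace/monitoring",
]

violations = find_boundary_violations(terraform_owned, argocd_owned)
print(violations)
```

A check like this could run in CI against inventories exported from both tools, so an overlap is caught before the two start fighting over the same resource.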
We wanted to solve this limitation by potentially moving all of our Terraform towards Crossplane, but our research showed us that it is not ready for our level of infrastructure automation. Veronika from my team just published a blog post summarizing our experiences with Crossplane, titled "Crossplane: Why it Didn’t Work for Us".
Our architecture constantly evolves, as platform engineering, K8s, and GitOps form a very vibrant space.
Here is one example: at the beginning, we were big fans of AWS Systems Manager Parameter Store to make secrets available to microservices. However, managing those secrets required multiple steps to be driven by a GitOps workflow, which was unreliable and a lot of work.
Therefore, we started using SOPS to manage secrets once we figured out that it was a more powerful, Git-driven pattern for secrets management. The SOPS operator that we use allows us to manage a SOPS file, and it updates Kubernetes Secrets whenever changes are made in Git. It keeps things secure because the secret values that we store in Git are encrypted via AWS KMS.
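The flow from an encrypted file in Git to a Kubernetes Secret can be sketched roughly as follows. This is an illustration under loud assumptions, not the SOPS operator's code: `fake_decrypt` is a placeholder that merely reverses a string and stands in for KMS-backed decryption, the `ENC[...]` marker is a simplified stand-in for a real encrypted value, and the secret name and key are hypothetical. What the sketch does reflect is that keys stay readable while values are encrypted, and that a Kubernetes Secret stores its `data` values base64-encoded.

```python
import base64
from typing import Callable, Dict


def fake_decrypt(ciphertext: str) -> str:
    """Placeholder for KMS-backed decryption. NOT real cryptography."""
    prefix, suffix = "ENC[", "]"
    if not (ciphertext.startswith(prefix) and ciphertext.endswith(suffix)):
        raise ValueError("value is not in the expected encrypted form")
    return ciphertext[len(prefix):-len(suffix)][::-1]


def to_k8s_secret(
    name: str,
    encrypted: Dict[str, str],
    decrypt: Callable[[str], str] = fake_decrypt,
) -> dict:
    """Decrypt each value and build a Kubernetes Secret manifest as a dict."""
    data = {
        key: base64.b64encode(decrypt(value).encode("utf-8")).decode("ascii")
        for key, value in encrypted.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": data,  # Secret data values are base64-encoded
    }


# Hypothetical content as it would sit in Git: plaintext key, encrypted value.
encrypted_in_git = {"DB_PASSWORD": "ENC[terces-toh]"}
secret = to_k8s_secret("orders-db", encrypted_in_git)
print(secret["kind"], secret["metadata"]["name"], sorted(secret["data"]))
```

Since only the values are encrypted, a pull request that changes the file still shows which keys were touched, which is what makes the pattern reviewable in Git.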
The combination of EKS and GitOps enables Matt and his team to build platforms on which application engineers can deploy their microservices securely, reliably, and with ease. GitOps is a modern and intuitive way to deploy microservices and enhances the collaboration between application and platform engineers. By using services like RDS, ElastiCache, OpenSearch, or S3, Matt offloads the complexity of managing databases and storage and focuses on the important part: the business application.