Programming your CDN: CloudFront and Lambda@Edge

Andreas Wittig – 09 Apr 2021

Minimizing the load time of your websites and applications is essential for two reasons. First, search engines take page load times into account when ranking websites. Second, users are impatient and might cancel loading your application to jump to a competitor instead. That’s why content delivery networks (CDNs) have become increasingly popular since they came into existence in the late 1990s.

A CDN typically consists of hundreds of proxy servers distributed among data centers all around the world. The idea is to minimize the distance between the users and the server, which reduces latency significantly. On top of that, a CDN caches the responses from origins such as your web servers, which is another way to minimize latency and, therefore, page load times.

But a CDN is much more than that. Read on to learn how to benefit from programming your CDN.

This is a cross-post from the Cloudcraft blog.

Distributing media assets with CloudFront

There are many CDN providers: Akamai, Fastly, and Cloudflare, to name a few. AWS offers a content delivery network as well: CloudFront. Amazon’s CDN consists of more than 200 edge locations and spans 88 cities across 45 countries. For example, I live in Ulm, a small town in the south of Germany, and the nearest edge location is in Munich, which is about 120 km (about 75 miles) away.

To distribute your media assets with CloudFront, you need to configure a so-called distribution that forwards incoming requests to an origin if the response is not already cached. As shown in the following diagram, CloudFront supports multiple origins.

Amazon S3, an object store, is the most obvious choice for media assets
Any infrastructure running on AWS, typically behind an Elastic Load Balancing (ELB) load balancer
Any HTTP(S) endpoint accessible from the Internet

Distributing media assets with CloudFront

Of course, CloudFront supports HTTPS. Bring your own certificate or use AWS Certificate Manager (ACM) to generate a free certificate for your domain.

CloudFront charges per request and per GB of data transferred: $0.0100 per 10,000 HTTPS requests and $0.085 per GB. For example, I paid $40 in November to deliver a blog’s assets with about 100,000 page impressions.
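
To get a feeling for the pricing model, the following minimal sketch estimates a monthly bill from the two prices quoted above. The function name and the workload numbers are made up for illustration, and real prices differ by region and change over time.

```javascript
// Prices quoted in the article; actual prices vary by region and over time.
const PRICE_PER_10K_HTTPS_REQUESTS = 0.01; // USD
const PRICE_PER_GB_TRANSFER = 0.085; // USD

// Estimates the monthly CloudFront bill in USD (hypothetical helper).
function estimateMonthlyCost(httpsRequests, transferredGb) {
  const requestCost = (httpsRequests / 10000) * PRICE_PER_10K_HTTPS_REQUESTS;
  const transferCost = transferredGb * PRICE_PER_GB_TRANSFER;
  return requestCost + transferCost;
}

// Hypothetical workload: 2 million HTTPS requests and 400 GB of traffic.
console.log(estimateMonthlyCost(2000000, 400).toFixed(2)); // prints "36.00"
```

In this made-up example, data transfer ($34) dominates the request charges ($2), which is typical when delivering images or other larger assets.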

I’m using CloudFront for many projects. For example, static websites generated by a static site generator like Hexo, Jekyll, or Hugo are a perfect fit for S3 and CloudFront. Also, I’m using CloudFront to distribute the assets - HTML, CSS, and more - of a single page application (SPA).

One more thing: CloudFront not only works fine for assets like HTML, CSS, JPG, and PNG but is also an excellent choice for delivering video on demand and downloads. That being said, the file size limit is 20 GB.

Programming your CDN with Lambda@Edge

Interestingly, CloudFront is more than a CDN for static assets. Programming your CDN is possible by making use of Lambda@Edge.

But let’s start at the beginning. AWS Lambda is a platform to execute your source code without spinning up any servers. All you do is write a function and upload it. AWS Lambda provides the runtime environment - for example, Python, Java, or Node.js - and provisions the required infrastructure underneath. It is possible to build serverless web applications with this approach.

Lambda@Edge enables you to execute your source code at the CDN layer. Hook your code into one of the following steps.

When CloudFront receives a request from a user.
Before CloudFront forwards a request to the origin (e.g., S3).
When CloudFront receives a response from the origin.
Before CloudFront returns the response to the user.

Being able to execute your code at the CDN layer opens up many possibilities. Next, I will share some use cases from practice.

Use Cases

The following use cases should serve as inspiration for utilizing Lambda@Edge as part of your cloud architecture.

HTTP redirects and index document

When using S3 and CloudFront to host static websites, there are two common challenges.

Redirect requests from a subdomain to the second-level domain.
Define root documents for subfolders. CloudFront allows configuring a root document for the root folder only.

Adding those missing features to CloudFront is possible with Lambda@Edge. The following diagram illustrates the process for an HTTP redirect.

HTTP redirects and index document

CloudFront invokes the Lambda function for every incoming request - a so-called viewer request. The Lambda function checks the domain name and returns an HTTP response with status code 301 if a redirect is needed. The following JavaScript snippet from aws-cf-templates shows an example.

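// Domain to redirect to and domain to redirect from.
const domainName = 'example.com'.toLowerCase();
const redirectDomainName = 'www.example.com'.toLowerCase();

exports.handler = async function (event) {
  const cf = event.Records[0].cf;
  // Compare the Host header of the viewer request with the domain to redirect.
  if (cf.request.headers.host[0].value.toLowerCase() === redirectDomainName) {
    // Generate the redirect response at the CDN layer without contacting the origin.
    return {
      status: '301',
      statusDescription: 'Moved Permanently',
      headers: {
        location: [{
          key: 'Location',
          value: `https://${domainName}${cf.request.uri}`,
        }],
      }
    };
  } else {
    // No redirect needed: pass the request on unchanged.
    return cf.request;
  }
};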

In this use case, using Lambda@Edge is simple, but powerful.

Resize or manipulate images on-the-fly

Another use case for which I have used Lambda@Edge is to resize or manipulate images on-the-fly. Users access your website with their smartphones, tablets, laptops, and large screens. A responsive website renders images in different dimensions. To reduce network traffic and latency, you need to provide an image in multiple sizes. In theory, you can do so by storing each image in all the required dimensions. However, when the layout changes, you will need to re-create those images with the changed sizes. Doing so can be challenging.

Another approach is to resize images on-the-fly. Nowadays, so-called source sets allow you to define different variants of an image, and the browser decides which image to load based on the needed image dimensions, for example.

<img srcset="example-480w.jpg 480w,
             example-800w.jpg 800w"
     sizes="(max-width: 600px) 480px,
            800px"
     src="example-800w.jpg"
     alt="Example">

Therefore, the browser will request example-480w.jpg or example-800w.jpg based on the actual width of the responsive image. Check out Responsive images from Mozilla to learn more.

Lambda@Edge makes resizing images on-the-fly possible. When CloudFront has not cached an image with specific dimensions yet, it will invoke a Lambda function that will fetch the original image from S3 and resize it as needed.
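
The first step of such a function is to figure out which original image and which width a request asks for. The following minimal sketch covers only that step. It assumes the naming scheme from the srcset example (example-480w.jpg) and assumes that the original is stored in S3 as example.jpg. The function name parseResizeRequest is made up, and the actual resizing, which requires an image processing library, is not shown.

```javascript
// Matches URIs such as /images/example-480w.jpg (naming scheme from the srcset example).
const RESIZE_PATTERN = /^(.*)-(\d+)w\.(jpg|jpeg|png)$/i;

// Returns the assumed S3 key of the original image and the requested width,
// or null when the URI does not ask for a resized variant.
function parseResizeRequest(uri) {
  const match = RESIZE_PATTERN.exec(uri);
  if (match === null) {
    return null;
  }
  const [, basePath, width, extension] = match;
  return {
    originalKey: `${basePath}.${extension}`.replace(/^\//, ''),
    width: parseInt(width, 10),
  };
}

console.log(parseResizeRequest('/images/example-480w.jpg'));
// prints { originalKey: 'images/example.jpg', width: 480 }
```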

The benefit is that modifying the layout does not require you to re-generate all the images in different sizes. Instead, the requested image sizes are generated on-demand. That’s also a benefit when some assets are accessed rarely or not at all.

Besides resizing images, the same approach works for manipulating images in general. For example, to add a watermark or to optimize for smaller file sizes.

User and Origin Authentication

Executing your code with Lambda@Edge enables you to implement custom authentication as well. There are two major scenarios for doing so:

Authenticate the user that tries to access media assets. For example, to restrict access to paid content.
Add credentials when sending requests to the origin - for example, an Application Load Balancer (ALB) - to ensure that your web application only answers requests coming from CloudFront. That’s important because CloudFront origins have to be accessible from the Internet. Without authenticating your CloudFront distribution, it is possible to bypass the CDN, which is something a DDoS attacker might try to do.

The functionality is similar to the previous use cases. Configuring a Lambda@Edge function to process viewer requests allows you to authenticate a user, for example, by using basic authentication or JWT. On top of that, hooking a Lambda@Edge function into the origin request allows you to add credentials to authenticate at the origin.

In theory, it is possible to access a database from Lambda@Edge as well. DynamoDB is a good fit here because of its low latency and public REST API.

Hopefully, one of the described use cases for Lambda@Edge has inspired you already. Programming your CDN is a powerful approach. However, you need to know about the limits of the architecture pattern as well.

Limitations

When looking at an AWS service, it is most interesting to look into its limitations, because this is where you decide whether or not your architecture works out as planned.

Lambda@Edge supports Internet-facing origins only. Private resources within a VPC are accessible neither from CloudFront nor from Lambda@Edge.
Lambda@Edge supports the following runtime environments: Python 3.8, Python 3.7, Node.js 12, and Node.js 10.
Lambda@Edge does not support environment variables.
Lambda@Edge does not support layers.
Lambda@Edge does not support dead letter queues.

A Lambda@Edge function for processing viewer requests comes with additional restrictions.

Maximum Memory: 128 MB
Function Timeout: 5 seconds
Response Size: 40 KB
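
The response size limit matters as soon as a viewer request function generates responses itself, as in the redirect example. The following rough sketch checks a generated response against the limit before returning it. It assumes that 40 KB means 40 * 1024 bytes and that headers and body both count. How AWS measures the size exactly is not covered here, so treat the result as an approximation. Both function names are made up for illustration.

```javascript
// Assumption: 40 KB is interpreted as 40 * 1024 bytes.
const MAX_VIEWER_RESPONSE_BYTES = 40 * 1024;

// Roughly approximates the size of a generated response (headers plus body).
function approximateResponseSize(response) {
  let bytes = Buffer.byteLength(response.body || '');
  for (const values of Object.values(response.headers || {})) {
    for (const header of values) {
      bytes += Buffer.byteLength(`${header.key || ''}: ${header.value}\r\n`);
    }
  }
  return bytes;
}

// Returns true when the response stays within the assumed limit.
function fitsViewerResponseLimit(response) {
  return approximateResponseSize(response) <= MAX_VIEWER_RESPONSE_BYTES;
}
```

A redirect response easily fits into that budget, whereas a generated HTML page with inlined assets might not.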

Check out the official documentation to learn more about the limitations of Lambda@Edge.

Summary

When designing an architecture, you should consider adding a content delivery network (CDN) to decrease network latency and cache popular requests. AWS offers CloudFront, which integrates very well with the object store S3.

On top of that, extending and customizing CloudFront’s functionality with Lambda@Edge is powerful. Manipulate incoming requests and hook into forwarding requests to an origin if a request cannot be answered from the cache. For example, I have been using Lambda@Edge to implement HTTP redirects, generate resized and optimized images on-the-fly, and implement authentication between a user and CloudFront as well as between CloudFront and a custom origin.

A fun fact at the end: Lambda@Edge is not running at the 200+ edge locations AWS operates. Instead, AWS deploys the Lambda function to their 20 regions. The name is somewhat misleading.

Andreas Wittig

I’ve been building on AWS since 2012 together with my brother Michael. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV, attachmentAV, HyperEnv, and marbot.
