Anonymize CloudFront Access Logs

Michael Wittig – 22 Apr 2020

Amazon CloudFront can upload access log files to an S3 bucket. By default, CloudFront logs the IP address of the client. Optionally, cookies can be logged as well. If EU citizens access your CloudFront distribution, you have to process personally identifiable information (PII) in a General Data Protection Regulation (GDPR) compliant way. IP addresses are considered PII, and cookie data could contain PII as well. If you want to process and store PII, you need a reason in the spirit of the GDPR.

Disclaimer: I’m not a lawyer! This is not legal advice.

Access logs are needed by operations to debug issues. For that purpose, it is okay to keep the access logs for seven days [1]. But you might also need access logs for capacity planning. How can you keep the access logs for more than seven days without violating the GDPR?

Anonymize Data

The question is: do you really need the IP addresses in your access logs? The answer is likely no. Unfortunately, CloudFront does not allow us to disable IP address logging. We have to implement a workaround that anonymizes the access logs as soon as they are available on S3. The workaround works like this:

[Diagram: Anonymize CloudFront Access Logs]

The diagram was created with Cloudcraft - Visualize your cloud architecture like a pro.

We can use a mechanism similar to the one implemented by Google Analytics. An IPv4 address like 192.0.2.42 is turned into 192.0.2.0 (the last 8 bits are removed, 24 bits are kept). IPv6 addresses need a different logic: Google removes the last 80 bits. I will go one step further and remove the last 96 bits, keeping 32 bits [2].

The following steps are needed to anonymize an access log file:

  1. Download the object from S3
  2. Decompress the gzip data
  3. Parse the data (tab-separated values, log file format)
  4. Replace the IP addresses with anonymized values
  5. Compress the data with gzip
  6. Upload the anonymized data to S3
  7. Remove the original data from S3

There is no documented maximum size of an access log file. We should prepare for files that are larger than the available memory. Luckily, Lambda functions support Node.js, which has superb support for streaming data. If we stream the data, we never load all of it into memory at once.


First, we load some core Node.js dependencies and the AWS SDK:

const fs = require('fs');
const zlib = require('zlib');
const stream = require('stream');
const AWS = require('aws-sdk');
const s3 = new AWS.S3({apiVersion: '2006-03-01'});

It’s time to implement the anonymization:

function anonymizeIPv4Address(str) {
  const s = str.split('.');
  s[3] = '0';
  return s.join('.');
}

function anonymizeIPv6Address(str) {
  const s = str.split(':').slice(0, 2);
  return s.join(':');
}

function anonymizeIpAddress(str) {
  if (str === '-' || str === 'unknown') {
    return str;
  }
  if (str.includes('.')) {
    return anonymizeIPv4Address(str);
  } else if (str.includes(':')) {
    return anonymizeIPv6Address(str);
  } else {
    throw new Error('Neither IPv4 nor IPv6: ' + str);
  }
}
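
To sanity-check the helpers, here is a standalone snippet (the functions are repeated so it runs on its own; the addresses come from the reserved documentation ranges):

```javascript
// Definitions repeated from above so this snippet is self-contained.
function anonymizeIPv4Address(str) {
  const s = str.split('.');
  s[3] = '0'; // zero the last 8 bits
  return s.join('.');
}

function anonymizeIPv6Address(str) {
  return str.split(':').slice(0, 2).join(':'); // keep the first 32 bits
}

function anonymizeIpAddress(str) {
  if (str === '-' || str === 'unknown') {
    return str; // CloudFront placeholder values pass through unchanged
  }
  if (str.includes('.')) {
    return anonymizeIPv4Address(str);
  } else if (str.includes(':')) {
    return anonymizeIPv6Address(str);
  }
  throw new Error('Neither IPv4 nor IPv6: ' + str);
}

console.log(anonymizeIpAddress('192.0.2.42'));                   // 192.0.2.0
console.log(anonymizeIpAddress('2001:db8:85a3::8a2e:370:7334')); // 2001:db8
console.log(anonymizeIpAddress('-'));                            // -
```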

We also have to deal with the TSV (tab-separated values) format of the access log files:

function transformLine(line) {
  if (line.startsWith('#') || line.trim() === '') {
    return line; // keep header and empty lines as they are
  }
  const values = line.split('\t');
  values[4] = anonymizeIpAddress(values[4]);   // c-ip
  values[19] = anonymizeIpAddress(values[19]); // x-forwarded-for
  return values.join('\t');
}

So far, we process only small amounts of data: a single access log file line. It’s time to deal with the whole file.

Each chunk of data is represented as a buffer in Node.js. A buffer represents binary data as a sequence of bytes. In the buffer, we search for the line-end byte \n. We slice all bytes from the beginning to \n and convert them into a string to extract a line. We continue with this approach until the end of the file is reached. There is one edge case: a chunk of data can stop in the middle of a line. We have to add the old chunk to the beginning of the new chunk.
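
The line-splitting logic can be sketched in isolation like this (a simplified standalone version, outside of the stream):

```javascript
// Collect complete lines from a chunk; keep the incomplete remainder
// so it can be prepended to the next chunk.
function splitChunk(leftover, currentChunk) {
  let chunk = Buffer.concat([leftover, currentChunk]);
  const lines = [];
  while (chunk.length > 0) {
    const i = chunk.indexOf('\n');
    if (i === -1) {
      break; // no complete line left; carry the remainder over
    }
    lines.push(chunk.slice(0, i).toString('utf8'));
    chunk = chunk.slice(i + 1);
  }
  return {lines, leftover: chunk};
}

// A chunk boundary in the middle of a line:
const first = splitChunk(Buffer.alloc(0), Buffer.from('a\tb\nc\td'));
console.log(first.lines);  // [ 'a\tb' ]
const second = splitChunk(first.leftover, Buffer.from('\ne\tf\n'));
console.log(second.lines); // [ 'c\td', 'e\tf' ]
```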

async function process(record) {
  let chunk = Buffer.alloc(0);
  const transform = (currentChunk, encoding, callback) => {
    chunk = Buffer.concat([chunk, currentChunk]);
    const lines = [];
    while (chunk.length > 0) {
      const i = chunk.indexOf('\n', 'utf8');
      if (i === -1) {
        break; // no complete line in the buffer, wait for the next chunk
      } else {
        lines.push(chunk.slice(0, i).toString('utf8'));
        chunk = chunk.slice(i+1);
      }
    }
    const transformed = lines.map((line) => transformLine(line) + '\n').join('');
    callback(null, Buffer.from(transformed, 'utf8'));
  };
  const params = {
    Bucket: record.s3.bucket.name,
    Key: record.s3.object.key
  };
  if ('versionId' in record.s3.object) {
    params.VersionId = record.s3.object.versionId;
  }
  const body = s3.getObject(params).createReadStream() // step 1: download
    .pipe(zlib.createGunzip())                         // step 2: decompress
    .pipe(new stream.Transform({transform}))           // steps 3 + 4: parse and anonymize
    .pipe(zlib.createGzip());                          // step 5: compress
  await s3.upload({                                    // step 6: upload
    Bucket: record.s3.bucket.name,
    Key: record.s3.object.key.slice(0, -2) + 'anonymized.gz',
    Body: body
  }).promise();
  if (chunk.length > 0) {
    throw new Error('file was not read completely');
  }
  return s3.deleteObject(params).promise();            // step 7: remove original
}

Finally, Lambda requires a thin handler interface that we have to implement. I also ensure that anonymized data is not processed again to avoid an expensive infinite loop.

exports.handler = async (event) => {
  for (let record of event.Records) {
    if (record.s3.object.key.endsWith('.anonymized.gz')) {
      continue; // already anonymized, skip to avoid an infinite loop
    } else if (record.s3.object.key.endsWith('.gz')) {
      await process(record);
    }
  }
};

I integrated the workaround into our collection of aws-cf-templates. Check out the documentation or the code on GitHub. A similar approach can be used to anonymize access logs from ELB load balancers (ALB, CLB, NLB).

PS: You should also enable S3 lifecycle rules to delete access logs after 38 months.

Thanks to Thorsten Höger for reviewing this article.

  1. Germany source: https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Veranstaltungen/ITSiKongress/14ter/Vortraege-19-05-2015/Heidrich_Wegener.pdf?__blob=publicationFile
  2. One official recommendation I found suggests dropping at least the last 88 bits of an IPv6 address (German source: https://www.datenschutz-bayern.de/dsbk-ent/DSK_84-IPv6.html)
