Encrypting sensitive data stored on S3

Michael Wittig – 14 Aug 2018


S3 comes with a bunch of features to encrypt your data at rest. Data at rest means inactive data stored physically on disk. Before we dive into encrypting data at rest, I want to highlight that there is also data in use and data in transit. If the data is in memory, it is in use. If the data is on the network, it is in transit. If you transfer data to S3, it is TLS encrypted by default. This blog post will guide you through all the ways to encrypt your S3 data at rest.

Comparing options

S3 offers a bunch of options to encrypt your data at rest. The fundamental questions to compare the options are:

  • Who en/decrypts the data? Data encryption can happen either on your side (client-side encryption) or on AWS (server-side encryption or SSE). When you encrypt data on your side, the data transferred to S3 is already encrypted. S3 never sees the raw data. Server-side encryption is different because you send the raw data to S3 where it is encrypted.
  • Who stores the secret? Imagine you encrypted all your pictures and uploaded them to S3. You store the secret used for encryption on your USB stick. A few months later, you want to look at your pictures. Unfortunately, the USB stick where you stored the secret broke. The loss of the USB stick is a catastrophe. You are no longer able to decrypt your pictures. They are gone forever.
  • Who manages the secret? Data encryption makes no sense if everyone can access your secret. Managing access to the secret is a great responsibility.

The following table summarizes the available options on S3 to encrypt your data at rest.

Option | Who en/decrypts the data | Who stores the secret | Who manages the secret
SSE-AES | AWS | AWS | AWS
SSE-KMS (AWS managed CMK) | AWS | AWS | AWS
SSE-KMS (customer managed CMK) | AWS | AWS | you
SSE-C | AWS | you | you
AWS SDK + KMS (AWS managed CMK) | you | AWS | AWS
AWS SDK + KMS (customer managed CMK) | you | AWS | you
AWS SDK + self-managed secret | you | you | you

Let’s dive into the details of each option.

Server-side encryption

Server-side encryption means that you send unencrypted raw data to AWS. On the AWS infrastructure, the raw data is encrypted and finally stored on disk. When you retrieve data, AWS reads the encrypted data from the disk, decrypts the data, and sends raw data back to you. The en/decryption is transparent to the AWS user.

SSE-AES

SSE-AES (named SSE-S3 in the AWS documentation) is a straightforward approach. AWS handles encryption and decryption for you on the server side using the AES-256 algorithm. AWS also controls the secret key that is used for encryption/decryption.

To upload a file and store it encrypted, run:

aws s3 cp path/to/local.file s3://bucket-name/sse-aes --sse AES256

To download the decrypted file, run:

aws s3 cp s3://bucket-name/sse-aes path/to/local.file

SSE-KMS (AWS managed CMK)

SSE-KMS is very similar to SSE-AES. The only difference is that the secret key (aka AWS managed Customer Master Key (CMK)) is provided by the KMS service and not by S3.

To upload a file and store it encrypted, run:

aws s3 cp path/to/local.file s3://bucket-name/sse-kms --sse aws:kms

To download the decrypted file, run:

aws s3 cp s3://bucket-name/sse-kms path/to/local.file

The AWS managed CMK comes with the following default key policy that you cannot modify. The default key policy allows:

  1. All principals in the same AWS account to encrypt/decrypt using the CMK, as long as the request is made through the S3 service
  2. IAM in the same AWS account to authorize read-only API actions on the key

This is the policy in its full length:

{
  "Version": "2012-10-17",
  "Id": "auto-s3-2",
  "Statement": [
    {
      "Sid": "Allow access through S3 for all principals in the account that are authorized to use S3",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "kms:ViaService": "s3.us-east-1.amazonaws.com",
          "kms:CallerAccount": "ACCOUNT_ID"
        }
      }
    },
    {
      "Sid": "Allow direct access to key metadata to the account",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_ID:root"
      },
      "Action": [
        "kms:Describe*",
        "kms:Get*",
        "kms:List*"
      ],
      "Resource": "*"
    }
  ]
}

You cannot delete or restrict the AWS managed CMK used by S3!

SSE-KMS (customer managed CMK)

Alternatively, you can manage the secret key (aka Customer managed Customer Master Key) using the KMS service. You create a Customer Master Key (CMK) and reference that key for encryption/decryption. At any time, you can delete the CMK to make all data useless. You also have full control over the CMK by customizing the key policy.

To create a basic CMK, run:

aws kms create-key

The key policy will allow access from all IAM entities in your AWS account (as long as the IAM policy allows it).

Your output will look similar to this:

{
  "KeyMetadata": {
    "Origin": "AWS_KMS",
    "KeyId": "858d8d36-c87b-4b48-9a41-b69b7ad9d4e2",
    "Description": "",
    "KeyManager": "CUSTOMER",
    "Enabled": true,
    "KeyUsage": "ENCRYPT_DECRYPT",
    "KeyState": "Enabled",
    "CreationDate": 1534164269.969,
    "Arn": "arn:aws:kms:us-east-1:ACCOUNT_ID:key/858d8d36-c87b-4b48-9a41-b69b7ad9d4e2",
    "AWSAccountId": "ACCOUNT_ID"
  }
}

Note down the KeyId value (e.g., 858d8d36-c87b-4b48-9a41-b69b7ad9d4e2). You will need it in the following steps.

To upload a file and store it encrypted using your newly created CMK, run and replace KMS_KEY_ID with the KeyId value:

aws s3 cp path/to/local.file s3://bucket-name/sse-kms-cmk --sse aws:kms --sse-kms-key-id KMS_KEY_ID

To download the decrypted file, run:

aws s3 cp s3://bucket-name/sse-kms-cmk path/to/local.file

Now, disable the CMK:

aws kms disable-key --key-id KMS_KEY_ID

Then try to download the file again, and you will run into an error (KMS.DisabledException). That’s the difference compared with the AWS managed CMK, which you cannot control.

Finally, mark the CMK for deletion to avoid future costs:

aws kms schedule-key-deletion --key-id KMS_KEY_ID

You will never be able to retrieve the file from S3 once you delete the CMK!
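If you use the AWS SDK for Node.js instead of the CLI, the server-side options shown so far map to parameters of the S3 putObject call. The following is a minimal sketch that only builds the parameter objects. The helper name buildSsePutParams and the mode strings are made up for illustration; bucket-name and KMS_KEY_ID are placeholders as in the CLI examples above.

```javascript
// Hypothetical helper (not part of the AWS SDK): builds putObject parameters
// for the server-side encryption options described above.
const buildSsePutParams = (location, body, mode, kmsKeyId) => {
  const params = Object.assign({}, location, {Body: body});
  if (mode === 'SSE-AES') {
    params.ServerSideEncryption = 'AES256';
  } else if (mode === 'SSE-KMS') {
    params.ServerSideEncryption = 'aws:kms';
    if (kmsKeyId !== undefined) {
      // Customer managed CMK; omit the key ID to use the AWS managed CMK
      params.SSEKMSKeyId = kmsKeyId;
    }
  } else {
    throw new Error(`unsupported mode: ${mode}`);
  }
  return params;
};

const sseAes = buildSsePutParams({Bucket: 'bucket-name', Key: 'sse-aes'}, 'hello', 'SSE-AES');
const sseKms = buildSsePutParams({Bucket: 'bucket-name', Key: 'sse-kms'}, 'hello', 'SSE-KMS');
const sseKmsCmk = buildSsePutParams({Bucket: 'bucket-name', Key: 'sse-kms-cmk'}, 'hello', 'SSE-KMS', 'KMS_KEY_ID');
console.log(sseAes, sseKms, sseKmsCmk);
```

You would pass such an object to s3.putObject(params).promise(). As with the CLI, downloading needs no extra encryption parameters for these options.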

SSE-C

With SSE-C, you are in charge of the secret key while AWS still takes care of encryption/decryption. Every time you call the S3 API, you also have to attach the secret key.

To generate a random 32-byte (256-bit) secret key, run:

openssl rand -out sse-c.key 32

To upload a file and store it encrypted, run:

aws s3 cp path/to/local.file s3://bucket-name/sse-c --sse-c AES256 --sse-c-key fileb://sse-c.key

The big difference comes when you want to download the file again. Now you also have to provide the secret key.

aws s3 cp s3://bucket-name/sse-c path/to/local.file --sse-c AES256 --sse-c-key fileb://sse-c.key

If you lose the key, you cannot retrieve the data from S3 anymore!
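On the API level, S3 expects the SSE-C key in three request headers: the algorithm, the Base64-encoded key, and the Base64-encoded MD5 digest of the key. The CLI and the SDKs usually fill in these values for you. The following sketch, using only Node.js crypto, generates a key like the openssl command above and derives the header values. The function name buildSseCHeaders is hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derives the three SSE-C request header values
// from a raw 32-byte key.
const buildSseCHeaders = (keyBuffer) => {
  if (keyBuffer.length !== 32) {
    throw new Error('SSE-C requires a 256-bit (32-byte) key');
  }
  return {
    'x-amz-server-side-encryption-customer-algorithm': 'AES256',
    'x-amz-server-side-encryption-customer-key': keyBuffer.toString('base64'),
    'x-amz-server-side-encryption-customer-key-MD5': crypto.createHash('md5').update(keyBuffer).digest('base64')
  };
};

// Comparable to: openssl rand -out sse-c.key 32
const sseCKey = crypto.randomBytes(32);
const sseCHeaders = buildSseCHeaders(sseCKey);
console.log(Object.keys(sseCHeaders));
```

The same three values have to accompany every later read of the object, which is why losing the key means losing the data.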

Client-side encryption

Client-side encryption means that you encrypt the data before you send it to AWS. It also means that you decrypt the data that you retrieve from AWS. Usually, client-side encryption needs to be deeply embedded into your application.

AWS SDK + KMS

You can use the AWS SDK to upload/download files from S3. The KMS service can generate data keys that you can use for encryption/decryption. The data key itself is encrypted using the KMS Customer Master Key. If you want to use the encrypted data key, you have to send the encrypted data key to the KMS service and ask for decryption. The decrypted data key is only returned if the CMK is still available and you have permissions to use it.

I implemented client-side encryption using the AWS SDK for Node.js. The encrypted key is uploaded together with the object to reduce the risk of losing the data key for a particular object. Keep in mind that the implementation is not very efficient since you will make a call to the KMS service for every encrypt and decrypt operation. It could make sense to keep the data keys in memory and reuse them for multiple objects.
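Keeping data keys in memory could look like the following sketch. The cache wraps any async function that decrypts a data key, such as a call to KMS. The names createDataKeyCache, decryptFn, and maxAgeMs are assumptions for illustration, and the KMS call is replaced by a stand-in so that the snippet runs without AWS access.

```javascript
// Hypothetical in-memory cache for decrypted data keys.
// decryptFn: async (encryptedKeyBuffer) => decrypted key Buffer (e.g., a KMS decrypt call)
const createDataKeyCache = (decryptFn, maxAgeMs) => {
  const entries = new Map(); // base64(encrypted key) -> {key, expiresAt}
  return async (encryptedKeyBuffer) => {
    const id = encryptedKeyBuffer.toString('base64');
    const hit = entries.get(id);
    if (hit !== undefined && hit.expiresAt > Date.now()) {
      return hit.key; // served from memory, decryptFn is not called
    }
    const key = await decryptFn(encryptedKeyBuffer);
    entries.set(id, {key, expiresAt: Date.now() + maxAgeMs});
    return key;
  };
};

// Stand-in for the KMS call so the example runs offline; it only counts invocations
// and returns a dummy value, it does not decrypt anything.
let kmsCalls = 0;
const fakeDecrypt = async (encryptedKeyBuffer) => {
  kmsCalls += 1;
  return Buffer.from(encryptedKeyBuffer).reverse();
};

const cachedDecrypt = createDataKeyCache(fakeDecrypt, 5 * 60 * 1000);
const cacheDemo = (async () => {
  const encryptedKey = Buffer.from('encrypted-data-key');
  await cachedDecrypt(encryptedKey);
  await cachedDecrypt(encryptedKey); // second lookup hits the cache
  return kmsCalls;
})();
cacheDemo.then((calls) => console.log(`calls to decryptFn: ${calls}`));
```

The trade-off is that decrypted keys stay in memory for up to maxAgeMs, so a shorter lifetime costs more KMS calls but narrows the window in which a decrypted key is exposed.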

You can find the full source code on GitHub.

Let’s dive into the code.

First, we need a way to create an encrypted data key. The encrypted data key is stored on disk as a performance optimization. Multiple encrypt calls reuse the data key as long as you do not remove or overwrite the file data.key in your current working directory.

// create: request a new data key from KMS and store its encrypted form on disk
const util = require('util');
const fs = require('fs');

const AWS = require('aws-sdk');
const kms = new AWS.KMS({apiVersion: '2014-11-01'});

const TEMP_DATA_KEY_FILE_NAME = 'data.key';

const writeFile = util.promisify(fs.writeFile);

exports.create = async (kmsKeyId) => {
  const data = await kms.generateDataKey({
    KeyId: kmsKeyId,
    KeySpec: 'AES_256'
  }).promise();
  // Only the encrypted copy (CiphertextBlob) is written to disk, never data.Plaintext
  await writeFile(TEMP_DATA_KEY_FILE_NAME, data.CiphertextBlob);
  return TEMP_DATA_KEY_FILE_NAME;
};

You created a data key, but it is encrypted. Before you can use the data key, you have to decrypt it first.

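// getDecryptedDataKeyBuffer: ask KMS to decrypt the encrypted data key
const getDecryptedDataKeyBuffer = async (encryptedKeyBuffer) => {
  // KMS reads the CMK to use from the ciphertext itself, so no KeyId is passed
  const data = await kms.decrypt({CiphertextBlob: encryptedKeyBuffer}).promise();
  return data.Plaintext;
};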

The AES algorithm that you will use for encryption relies on an initialization vector (IV). The IV is generated randomly and ensures that similar data results in very different ciphertext. The IV is also needed when decrypting the ciphertext.

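// generateIVBuffer: derive a fresh IV from the data key and a random salt
const crypto = require('crypto');

const IV_LENGTH = 8; // bytes; hex-encoded later into the 16 characters passed as IV

const generateIVBuffer = (keyBuffer) => {
  const salt = crypto.randomBytes(16); // a new random salt yields a different IV on every call
  const iv = crypto.pbkdf2Sync(keyBuffer, salt, 100000, IV_LENGTH, 'sha512');
  return iv;
};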
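To see why the IV matters, the following sketch encrypts the same plaintext with the same key several times. It is a simplification compared to generateIVBuffer above: it takes a random 16-byte IV straight from crypto.randomBytes and uses a throwaway key instead of a KMS data key. All names are chosen for illustration only.

```javascript
const crypto = require('crypto');

// Encrypts plainBuffer with AES-256 in CBC mode and returns the ciphertext.
const encryptWithIv = (keyBuffer, ivBuffer, plainBuffer) => {
  const cipher = crypto.createCipheriv('aes-256-cbc', keyBuffer, ivBuffer);
  return Buffer.concat([cipher.update(plainBuffer), cipher.final()]);
};

const demoKey = crypto.randomBytes(32); // throwaway key, stands in for a data key
const demoPlain = Buffer.from('same plaintext, encrypted more than once');

const ivA = crypto.randomBytes(16); // the AES block size is 16 bytes
const ivB = crypto.randomBytes(16);

const cipherA = encryptWithIv(demoKey, ivA, demoPlain);
const cipherB = encryptWithIv(demoKey, ivB, demoPlain);
const cipherA2 = encryptWithIv(demoKey, ivA, demoPlain);

console.log('different IVs, same ciphertext?', cipherA.equals(cipherB));
console.log('same IV, same ciphertext?', cipherA.equals(cipherA2));
```

Reusing an IV with the same key reproduces the ciphertext exactly, which would reveal that two objects have identical content. That is the reason a fresh IV is generated for every object and stored next to the ciphertext.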

Last but not least, we need a way to parse S3 URIs (e.g., s3://bucket-name/key) that are used to specify the location on S3.

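// parseS3Uri: split s3://bucket-name/key into the Bucket and Key parameters
const url = require('url');

const parseS3Uri = (uri) => {
  const u = new url.URL(uri);
  if (u.protocol !== 's3:') {
    throw new Error('invalid S3 URI');
  }
  return {
    Bucket: u.hostname,
    // pathname starts with a slash that is not part of the object key
    Key: u.pathname.slice(1)
  };
};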

You might be impatient to see the implementation of the encryption. The encrypted data key is stored together with the IV and the file’s content on S3. You also add a small (8-byte) header at the beginning of the file to add some metadata that you need for decryption. The idea of storing the encrypted data key together with the encrypted data is called envelope encryption.

// encrypt: encrypt a local file and upload header, data key, IV, and ciphertext as one object
const s3 = new AWS.S3({apiVersion: '2006-03-01'});

const HEADER_LENGTH = 8;

const readFile = util.promisify(fs.readFile);

exports.encrypt = async (inputFile, s3Uri) => {
  const encryptedKeyBuffer = await readFile(TEMP_DATA_KEY_FILE_NAME);
  const decryptedKeyBuffer = await getDecryptedDataKeyBuffer(encryptedKeyBuffer);
  const plainBuffer = await readFile(inputFile);
  const ivBuffer = generateIVBuffer(decryptedKeyBuffer);
  // The 8-byte IV buffer is passed as a 16-character hex string
  const cipher = crypto.createCipheriv('aes256', decryptedKeyBuffer, ivBuffer.toString('hex'));
  const headerBuffer = Buffer.alloc(HEADER_LENGTH);
  headerBuffer.writeUInt8(1, 0); // header version
  headerBuffer.writeUInt8(0, 1); // reserved for future use
  headerBuffer.writeUInt8(0, 2); // reserved for future use
  headerBuffer.writeUInt8(0, 3); // reserved for future use
  headerBuffer.writeUInt32LE(encryptedKeyBuffer.length, 4); // length of encrypted data key
  // Object layout: header | encrypted data key | IV | ciphertext
  const bodyBuffer = Buffer.concat([headerBuffer, encryptedKeyBuffer, ivBuffer, cipher.update(plainBuffer), cipher.final()]);
  const params = Object.assign({}, parseS3Uri(s3Uri), {Body: bodyBuffer});
  await s3.putObject(params).promise();
  return s3Uri;
};

An encrypted object can now be uploaded to S3. Let’s look at the reverse operation:

// decrypt: download the object, unpack header, data key, and IV, then decrypt the content
exports.decrypt = async (s3Uri, outputFile) => {
  const params = parseS3Uri(s3Uri);
  const object = await s3.getObject(params).promise();
  const bodyBuffer = object.Body;
  const headerBuffer = bodyBuffer.slice(0, HEADER_LENGTH);
  const headerVersion = headerBuffer.readUInt8(0);
  if (headerVersion !== 1) {
    throw new Error('Unsupported header version');
  }
  const encryptedKeyLength = headerBuffer.readUInt32LE(4);
  // Object layout: header | encrypted data key | IV | ciphertext
  const keyEnd = HEADER_LENGTH + encryptedKeyLength;
  const ivEnd = keyEnd + IV_LENGTH;
  const encryptedKeyBuffer = bodyBuffer.slice(HEADER_LENGTH, keyEnd);
  const decryptedKeyBuffer = await getDecryptedDataKeyBuffer(encryptedKeyBuffer);
  const ivBuffer = bodyBuffer.slice(keyEnd, ivEnd);
  const decipher = crypto.createDecipheriv('aes256', decryptedKeyBuffer, ivBuffer.toString('hex'));
  const decryptedBuffer = Buffer.concat([decipher.update(bodyBuffer.slice(ivEnd)), decipher.final()]);
  await writeFile(outputFile, decryptedBuffer);
  return outputFile;
};
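The object layout used by encrypt and decrypt (8-byte header, encrypted data key, IV, ciphertext) can be tried out locally without S3 or KMS. The following sketch packs and unpacks that layout in memory. packEnvelope and unpackEnvelope are hypothetical names, and the inputs are dummy buffers rather than real KMS or cipher output.

```javascript
const HEADER_LENGTH = 8;

// Packs the parts into one buffer: header | encrypted data key | IV | ciphertext
const packEnvelope = (encryptedKeyBuffer, ivBuffer, cipherBuffer) => {
  const headerBuffer = Buffer.alloc(HEADER_LENGTH); // zero-filled, bytes 1-3 stay reserved
  headerBuffer.writeUInt8(1, 0); // header version
  headerBuffer.writeUInt32LE(encryptedKeyBuffer.length, 4); // length of encrypted data key
  return Buffer.concat([headerBuffer, encryptedKeyBuffer, ivBuffer, cipherBuffer]);
};

// Splits a packed buffer back into its parts; the IV length is not stored in the header,
// so the caller has to know it.
const unpackEnvelope = (bodyBuffer, ivLength) => {
  if (bodyBuffer.readUInt8(0) !== 1) {
    throw new Error('Unsupported header version');
  }
  const keyEnd = HEADER_LENGTH + bodyBuffer.readUInt32LE(4);
  const ivEnd = keyEnd + ivLength;
  return {
    encryptedKeyBuffer: bodyBuffer.subarray(HEADER_LENGTH, keyEnd),
    ivBuffer: bodyBuffer.subarray(keyEnd, ivEnd),
    cipherBuffer: bodyBuffer.subarray(ivEnd)
  };
};

// Dummy values for illustration only
const envelope = packEnvelope(Buffer.from('encrypted-key'), Buffer.from('12345678'), Buffer.from('ciphertext'));
const parts = unpackEnvelope(envelope, 8);
console.log(envelope.length, parts.encryptedKeyBuffer.toString(), parts.ivBuffer.toString(), parts.cipherBuffer.toString());
```

Because the header records the length of the encrypted data key, the reader does not need to know in advance how long the KMS ciphertext is. The version byte leaves room to change the layout later.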


AWS SDK + self-managed secret

To implement the idea described above for AWS KMS with a self-managed secret, you likely need an HSM device implementing PKCS #11; this could be AWS CloudHSM.

Summary

S3 offers a bunch of options to encrypt your data at rest. Usually, the criticality of your data determines the options you can choose from. Is it okay if AWS technically sees your raw data? If yes, server-side encryption is the right option for you. If not, go with client-side encryption. Keep in mind that client-side encryption requires know-how and takes more effort to implement than server-side encryption. The AWS Encryption SDKs (Java and Python) might help to implement client-side encryption.

Michael Wittig


I’ve been building on AWS since 2012 together with my brother Andreas. We are sharing our insights into all things AWS on cloudonaut and have written the book AWS in Action. Besides that, we’re currently working on bucketAV, HyperEnv for GitHub Actions, and marbot.
