Encrypting sensitive data stored on S3
S3 comes with a bunch of features to encrypt your data at rest. Data at rest means inactive data stored physically on disk. Before we dive into encrypting data at rest, I want to highlight that there is also data in use and data in transit. If the data is in memory, it is in use. If the data is on the network, it is in transit. If you transfer data to S3, it is TLS encrypted by default. This blog post will guide you through all ways to encrypt your S3 data at rest.
Comparing options
S3 offers a bunch of options to encrypt your data at rest. The fundamental questions to compare the options are:
- Who en/decrypts the data? Data encryption can happen either on your side (client-side encryption) or on AWS (server-side encryption or SSE). When you encrypt data on your side, the data transferred to S3 is already encrypted. S3 never sees the raw data. Server-side encryption is different because you send the raw data to S3 where it is encrypted.
- Who stores the secret? Imagine you encrypted all your pictures and uploaded them to S3. You store the secret used for encryption on your USB stick. A few months later, you want to look at your pictures. Unfortunately, the USB stick where you stored the secret broke. The loss of the USB stick is a catastrophe. You are no longer able to decrypt your pictures. They are gone forever.
- Who manages the secret? Data encryption makes no sense if everyone can access your secret. Managing access to the secret is a great responsibility.
The following table summarizes the available options on S3 to encrypt your data at rest.
Option | Who en/decrypts the data | Who stores the secret | Who manages the secret |
---|---|---|---|
SSE-AES | AWS | AWS | AWS |
SSE-KMS (AWS managed CMK) | AWS | AWS | AWS |
SSE-KMS (customer managed CMK) | AWS | AWS | you |
SSE-C | AWS | you | you |
AWS SDK + KMS (AWS managed CMK) | you | AWS | AWS |
AWS SDK + KMS (customer managed CMK) | you | AWS | you |
AWS SDK + self-managed secret | you | you | you |
Let’s dive into the details of each option.
Server-side encryption
Server-side encryption means that you send unencrypted raw data to AWS. On the AWS infrastructure, the raw data is encrypted and finally stored on disk. When you retrieve data, AWS reads the encrypted data from the disk, decrypts the data, and sends raw data back to you. The en/decryption is transparent to the AWS user.
SSE-AES
SSE-AES is a straightforward approach. AWS handles encryption and decryption for you on the server side using the AES-256 algorithm. AWS also controls the secret key that is used for encryption/decryption.
To upload a file and store it encrypted, run:
aws s3 cp path/to/local.file s3://bucket-name/sse-aes --sse AES256
To download the decrypted file, run:
aws s3 cp s3://bucket-name/sse-aes path/to/local.file
SSE-KMS (AWS managed CMK)
SSE-KMS is very similar to SSE-AES. The only difference is that the secret key (aka AWS managed Customer Master Key (CMK)) is provided by the KMS service and not by S3.
To upload a file and store it encrypted, run:
aws s3 cp path/to/local.file s3://bucket-name/sse-kms --sse aws:kms
To download the decrypted file, run:
aws s3 cp s3://bucket-name/sse-kms path/to/local.file
The AWS managed CMK comes with the following default key policy that you cannot modify. The default key policy allows:
- The S3 service, when called from the same AWS account, to encrypt/decrypt using the CMK
- IAM in the same AWS account to authorize read-only API actions
This is the policy in its full length:
{ |
You cannot delete or restrict the AWS managed CMK used by S3!
SSE-KMS (customer managed CMK)
Alternatively, you can manage the secret key (aka customer managed Customer Master Key) using the KMS service. You create a Customer Master Key (CMK) and reference that key for encryption/decryption. At any time, you can disable the CMK or schedule it for deletion to make all data unreadable. You also have full control over the CMK by customizing the key policy.
To create a basic CMK, run:
aws kms create-key
The key policy will allow access from all IAM entities in your AWS account (as long as the IAM policy allows it).
Your output will look similar to this:
{ |
Note down the KeyId value (e.g., 858d8d36-c87b-4b48-9a41-b69b7ad9d4e2).
To upload a file and store it encrypted using your newly created CMK, run the following command and replace KMS_KEY_ID with the KeyId value:
aws s3 cp path/to/local.file s3://bucket-name/sse-kms-cmk --sse aws:kms --sse-kms-key-id KMS_KEY_ID
To download the decrypted file, run:
aws s3 cp s3://bucket-name/sse-kms-cmk path/to/local.file
Now, disable the CMK:
aws kms disable-key --key-id KMS_KEY_ID
If you try to download the file again, you will run into an error (KMS.DisabledException). That’s the difference compared with the AWS managed CMK that you cannot control.
Finally, mark the CMK for deletion to avoid future costs:
aws kms schedule-key-deletion --key-id KMS_KEY_ID
You will never be able to retrieve the file from S3 once you delete the CMK!
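If you call S3 through the AWS SDK for JavaScript instead of the CLI, the server-side options map to parameters of the PutObject call. The following is a minimal sketch, not code from the original project: buildPutParams is a hypothetical helper that only builds the parameter object and never talks to AWS. The parameter names ServerSideEncryption and SSEKMSKeyId are the ones the S3 PutObject API uses.

```javascript
// Hypothetical helper: builds the parameter object for an S3 PutObject call.
// It does not contact AWS; pass the result to the SDK's putObject call yourself.
const buildPutParams = (bucket, key, body, option, kmsKeyId) => {
  const params = {Bucket: bucket, Key: key, Body: body};
  if (option === 'SSE-AES') {
    // Same effect as `--sse AES256` on the CLI.
    params.ServerSideEncryption = 'AES256';
  } else if (option === 'SSE-KMS') {
    // Same effect as `--sse aws:kms` on the CLI.
    params.ServerSideEncryption = 'aws:kms';
    // Without a key ID, S3 uses the AWS managed CMK.
    if (kmsKeyId !== undefined) {
      params.SSEKMSKeyId = kmsKeyId;
    }
  } else {
    throw new Error(`unsupported option: ${option}`);
  }
  return params;
};

console.log(buildPutParams('bucket-name', 'sse-kms-cmk', 'hello', 'SSE-KMS', 'KMS_KEY_ID'));
```

Replace KMS_KEY_ID with the KeyId of your own CMK, exactly as in the CLI example.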
SSE-C
With SSE-C, you are in charge of the secret key while AWS still takes care of encryption/decryption. Every time you call the S3 API, you also have to attach the secret key.
To generate a random 32-byte (256-bit) secret key, run:
openssl rand -out sse-c.key 32
To upload a file and store it encrypted, run:
aws s3 cp path/to/local.file s3://bucket-name/sse-c --sse-c AES256 --sse-c-key fileb://sse-c.key
The big difference comes when you want to download the file again. Now you also have to provide the secret key.
aws s3 cp s3://bucket-name/sse-c path/to/local.file --sse-c AES256 --sse-c-key fileb://sse-c.key
If you lose the key, you cannot retrieve the data from S3 anymore!
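To make "attach the secret key" more concrete: on the HTTP level, an SSE-C request carries the algorithm, the base64-encoded key, and the base64-encoded MD5 digest of the key as request headers. The CLI and the SDKs normally take care of this for you. The sketch below only derives the three header values from a key using Node.js; sseCustomerHeaders is a hypothetical helper name, and the key is generated in memory rather than read from sse-c.key.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derives the three SSE-C request headers from a raw key.
const sseCustomerHeaders = (keyBuffer) => {
  if (keyBuffer.length !== 32) {
    throw new Error('SSE-C with AES256 expects a 32-byte key');
  }
  return {
    'x-amz-server-side-encryption-customer-algorithm': 'AES256',
    // The key itself travels base64-encoded with every request.
    'x-amz-server-side-encryption-customer-key': keyBuffer.toString('base64'),
    // S3 uses the MD5 digest to check that the key arrived intact.
    'x-amz-server-side-encryption-customer-key-MD5':
      crypto.createHash('md5').update(keyBuffer).digest('base64')
  };
};

// In-memory equivalent of `openssl rand -out sse-c.key 32`.
const key = crypto.randomBytes(32);
const headers = sseCustomerHeaders(key);
console.log(Object.keys(headers));
```

Because the key is part of every request, SSE-C requests only make sense over an encrypted connection.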
Client-side encryption
Client-side encryption means that you encrypt the data before you send it to AWS. It also means that you decrypt the data that you retrieve from AWS. Usually, client-side encryption needs to be deeply embedded into your application.
AWS SDK + KMS
You can use the AWS SDK to upload/download files from S3. The KMS service can generate data keys that you can use for encryption/decryption. The data key itself is encrypted using the KMS Customer Master Key. If you want to use the encrypted data key, you have to send it to the KMS service and ask for decryption. The decrypted data key is only returned if the CMK is still available and you have permissions to use it.
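The data key flow can be tried out without an AWS account. The following sketch is a local simulation under loud assumptions: fakeMasterKey stands in for the CMK (a real CMK never leaves KMS), and generateDataKey and decryptDataKey are hypothetical stand-ins for the KMS GenerateDataKey and Decrypt operations. Wrapping the data key with AES-256-GCM is my choice for the simulation and says nothing about how KMS works internally.

```javascript
const crypto = require('crypto');

// ASSUMPTION: a local key plays the role of the CMK. In reality the CMK stays inside KMS.
const fakeMasterKey = crypto.randomBytes(32);

// Stand-in for KMS GenerateDataKey: returns the data key in plaintext and encrypted form.
const generateDataKey = () => {
  const plaintext = crypto.randomBytes(32);
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', fakeMasterKey, iv);
  const wrapped = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  // Layout of the encrypted data key: 12-byte IV, 16-byte auth tag, wrapped key.
  const encrypted = Buffer.concat([iv, cipher.getAuthTag(), wrapped]);
  return {plaintext, encrypted};
};

// Stand-in for KMS Decrypt: only whoever can use the master key gets the data key back.
const decryptDataKey = (encrypted) => {
  const iv = encrypted.subarray(0, 12);
  const tag = encrypted.subarray(12, 28);
  const wrapped = encrypted.subarray(28);
  const decipher = crypto.createDecipheriv('aes-256-gcm', fakeMasterKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(wrapped), decipher.final()]);
};

const dataKey = generateDataKey();
const recovered = decryptDataKey(dataKey.encrypted);
console.log(recovered.equals(dataKey.plaintext)); // true
```

The point of the simulation: you can store the encrypted data key anywhere, because it is useless without access to the master key.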
I implemented client-side encryption using the AWS SDK for Node.js. The encrypted key is uploaded together with the object to reduce the risk of losing the data key for a particular object. Keep in mind that the implementation is not very efficient since you will make a call to the KMS service for every encrypt and decrypt operation. It could make sense to keep the data keys in memory and reuse them for multiple objects.
You can find the full source code on GitHub.
Let’s dive into the code.
First, we need a way to create an encrypted data key. The encrypted data key is stored on disk as a performance optimization. Multiple encrypt calls reuse the data key as long as you do not remove or overwrite the file data.key in your current working directory.
const util = require('util'); |
You created a data key, but it is encrypted. Before you can use the data key, you have to decrypt it first.
const getDecryptedDataKeyBuffer = async (encryptedKeyBuffer) => { |
The AES algorithm that you will use for encryption relies on an initialization vector (IV). The IV is generated randomly and ensures that similar data results in very different ciphertext. The IV is also needed when decrypting the ciphertext.
const crypto = require('crypto'); |
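To see the effect of the IV in isolation, here is a small self-contained sketch using the Node.js crypto module. It assumes AES-256 in CBC mode with a 16-byte IV, which may differ from the mode used in the original source code; encryptWithIv and decryptWithIv are hypothetical helper names.

```javascript
const crypto = require('crypto');

// ASSUMPTION: AES-256 in CBC mode. The post does not name the mode it uses.
const encryptWithIv = (key, iv, plaintext) => {
  const cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
  return Buffer.concat([cipher.update(plaintext), cipher.final()]);
};

const decryptWithIv = (key, iv, ciphertext) => {
  const decipher = crypto.createDecipheriv('aes-256-cbc', key, iv);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
};

const key = crypto.randomBytes(32);
const message = Buffer.from('same plaintext');

// Two random IVs: the same key and plaintext produce different ciphertexts.
const ivA = crypto.randomBytes(16);
const ivB = crypto.randomBytes(16);
const a = encryptWithIv(key, ivA, message);
const b = encryptWithIv(key, ivB, message);

console.log(a.equals(b)); // false, unless both random IVs happen to be identical
console.log(decryptWithIv(key, ivA, a).toString()); // same plaintext
```

The IV is not a secret. It can be stored next to the ciphertext, but decryption fails or yields garbage without it.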
Last but not least, we need a way to parse S3 URIs (e.g., s3://bucket-name/key) that are used to specify the location on S3.
const url = require('url'); |
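If you only need the bucket and the key, a regular expression is enough. The sketch below is an alternative to the url module and not the parser from the original source code; parseS3Uri is a hypothetical helper name, and the returned property names Bucket and Key are chosen to match the S3 API parameters.

```javascript
// Hypothetical helper: splits s3://bucket-name/key into bucket and key.
const parseS3Uri = (s3Uri) => {
  // Bucket: everything up to the first slash. Key: everything after it.
  const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(s3Uri);
  if (match === null) {
    throw new Error(`not a valid S3 URI: ${s3Uri}`);
  }
  return {Bucket: match[1], Key: match[2]};
};

console.log(parseS3Uri('s3://bucket-name/path/to/key'));
```

The helper rejects URIs without a key, such as s3://bucket-name, because an object location always needs both parts.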
You might be impatient to see the implementation of the encryption. The encrypted data key is stored together with the IV and the file’s content on S3. You also add a small (8 bytes) header at the beginning of the file to add some metadata that you need for decryption. The idea to store the encrypted data key together with the encrypted data is called envelope encryption.
const s3 = new AWS.S3({apiVersion: '2006-03-01'}); |
An encrypted object can now be uploaded to S3. Let’s look at the reverse operation:
exports.decrypt = async (s3Uri, outputFile) => { |
You can find the full source code on GitHub.
AWS SDK + self-managed secret
To implement the same idea as described above for AWS KMS with a self-managed secret, you likely need an HSM device that implements PKCS #11. AWS CloudHSM is one option.
Summary
S3 offers a bunch of options to encrypt your data at rest. Usually, the criticality of your data determines the options you can choose from. Is it okay if AWS technically sees your raw data? If yes, server-side encryption is the right option for you. If not, go with client-side encryption. Keep in mind that client-side encryption requires know-how and is more effort to implement compared to server-side encryption. The AWS Encryption SDKs (Java and Python) might help to implement client-side encryption.