Encrypting data on Amazon S3

March 20, 2017 Sepior

For cloud storage services such as Amazon S3, the need for encryption is clear. Encrypting data-at-rest in almost any solution has long become best practice, and most IaaS providers offering storage will also offer encryption. What is less clear is what type of key management is the best choice for your application.

In this blog post we consider the different options available for encrypting data-at-rest for applications using Amazon S3.

Encrypting data may seem simple, but consider the fact that Amazon S3 supports no less than five different ways to do this:

Server-Side Encryption with Amazon S3-Managed Keys
Server-Side Encryption with AWS KMS-Managed Keys
Server-Side Encryption with Customer-Provided Keys
Client-Side Encryption with an AWS KMS–Managed Customer Master Key
Client-Side Encryption with a Client-Side Master Key

Given that you can either (a) have AWS KMS create your keys or (b) bring your own keys (BYOK) to AWS KMS, this really gives you seven options. And that is not even counting the fact that you can combine available server-side methods with client-side methods.

To understand this complexity, let us start by confirming that encrypting data is in fact simple, and then move on to where the complexity enters the picture. You start out with the data you want to encrypt, the cleartext. You then use some well-known encryption algorithm, E (such as AES), to encrypt your data and obtain the encrypted data which is called the ciphertext:

ciphertext = E (key, cleartext)

When you want to decrypt your ciphertext, you use the matching decryption algorithm, D:

cleartext = D (key, ciphertext)

As you can see, we mention a key as input to both E and D. This is the cryptographic key used, and it is the most important piece of information (besides the actual data you want to protect). For symmetric algorithms such as AES, we use the same key for encryption and decryption. The core idea behind encrypting data-at-rest is that anybody who obtains a copy of the stored data, the ciphertext, cannot find out what the cleartext is…

Unless a copy of the cryptographic key is at hand!

Therefore, the cryptographic key used for encryption/decryption becomes the most critical piece of information when trying to ensure the protection of the cleartext.

For an attacker who desires to learn the cleartext, several different attack vectors exist. Here we list three properties that relate to some of the more critical ones (but the list is by no means exhaustive):

Quality of the key
Storage of the key
Usage of the key

Scenario 1: If the key is not sufficiently random, the attacker might have an easy (well, easier) time brute-forcing it. Cryptographers say that the key must be derived from a source of randomness with high entropy. This is not trivial; it can be hard to validate that a key generated by a third party is in fact random.

Scenario 2: If the attacker can gain access to the place where the key is stored, she can make a copy of the key, and then it is trivial to learn the cleartext. This may seem circular, in that the problem we are trying to solve is already that of protecting stored data. The reason this makes sense is that the cryptographic key will be much smaller than the cleartext. 256 bits is considered a very secure key size for an AES key, and such a key can protect gigabytes of data. Consequently we have reduced the problem of protecting the large cleartext when stored, with protecting the small key when stored. This makes it easier, and we will see different strategies for key storage in the analysis of Amazon S3 below.

Scenario 3: If the attacker can somehow eavesdrop on the computation performing the encryption/decryption, he could learn the key as it is used by this computation. In many scenarios however, if you can observe the key during the actual encryption process, you can most likely also observe the cleartext. Consequently it’s often accepted that an application with access to the cleartext data will also be granted access to the cryptographic keys. Not always though, but we will get back to this in another blog post.

Now, let us consider the 5 (or 7 combo) options that Amazon presents and analyse those using the three properties above as the main criteria:

In the server-side cases, Amazon S3 has direct access to the cryptographic keys in all three scenarios. The subtle differences lie in where the keys are stored and generated. In the AWS KMS case, any key is still under full Amazon control, though the customer may choose to generate the key herself. Also, Amazon S3 sees all cleartext data.

In the client-side cases, Amazon S3 *never* sees cleartext nor the keys, only the ciphertext. The main difference between the two client-side scenarios is whether Amazon (through AWS KMS) has access to the keys. In principle, Amazon never sees any cleartext data (unless of course the application is running on an Amazon service like EC2).

In the end, this boils down to two things:

Who can you trust?
What does it cost you?

Assuming it is more expensive for a customer to create and store keys on her own than to use AWS KMS, the apparent trade-off between cost and trust becomes evident. If you trust Amazon S3 completely, the cost is next to zero. If you do not trust them, then things might get very expensive and complex.

Sepior KMaaS mitigates this trade-off by providing a cost efficient cloud-friendly key management service, lowering both development and operations costs, while requiring basically zero trust in any third party service provider. If this is what your business requires, I suggest that you try out our Amazon S3 Java demo to get an idea of how KMaaS can protect your data in the cloud.