How to retroactively encrypt existing objects in Amazon S3 using S3 Inventory, Amazon Athena, and S3 Batch Operations

Amazon Simple Storage Service (S3) is an object storage service that offers industry-leading scalability, performance, security, and data availability. With Amazon S3, you can choose from three different server-side encryption configurations when uploading objects:

  • SSE-S3 – uses Amazon S3-managed encryption keys
  • SSE-KMS – uses customer master keys (CMKs) stored in AWS Key Management Service (KMS)
  • SSE-C – uses master keys provided by the customer in each PUT or GET request

These options allow you to choose the right encryption method for the job. But as your organization evolves and new requirements arise, you might find that you need to change the encryption configuration for all objects. For example, you might be required to use SSE-KMS instead of SSE-S3 because you need more control over the lifecycle and permissions of the encryption keys in order to meet compliance goals.

You could change the settings on your buckets to use SSE-KMS instead of SSE-S3, but the switch only impacts newly uploaded objects, not objects that existed in the buckets before the change in encryption settings. Manually re-encrypting older objects under master keys in KMS may be time-prohibitive depending on how many objects there are. Automating this effort is possible using the right combination of features in AWS services.
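To make the first point concrete, here is a minimal Python sketch of switching a bucket's default encryption to SSE-KMS. The helper names and the key ARN are illustrative assumptions, not part of the solution; `s3_client` is assumed to be a boto3 S3 client created by the caller.

```python
def default_kms_encryption_config(kms_key_arn: str) -> dict:
    """Build the ServerSideEncryptionConfiguration for PutBucketEncryption."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                }
            }
        ]
    }


def set_default_encryption(s3_client, bucket: str, kms_key_arn: str) -> None:
    """Apply the default to one bucket.

    s3_client is assumed to be boto3.client("s3"). The setting only affects
    objects uploaded after this call; existing objects keep their encryption.
    """
    s3_client.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=default_kms_encryption_config(kms_key_arn),
    )
```

Because existing objects are untouched by this call, a separate re-encryption pass such as the one described in this post is still needed.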

In this post, I’ll show you how to use Amazon S3 Inventory, Amazon Athena, and Amazon S3 Batch Operations to provide insights on the encryption status of objects in S3 and to remediate incorrectly encrypted objects in a massively scalable, resilient, and cost-effective way. The solution uses a similar approach to the one described in this blog post, but it has been designed with automation and multi-bucket scalability in mind. Tags are used to target individual noncompliant buckets in an account, and any encrypted (or unencrypted) object can be re-encrypted using SSE-S3 or SSE-KMS. Versioned buckets are also supported, and the solution operates on a regional level.

Note: You can’t re-encrypt to or from objects encrypted under SSE-C. This is because the master key material must be provided during the PUT or GET request, and cannot be provided as a parameter for S3 Batch Operations.

Moreover, the entire solution can be deployed in under 5 minutes using AWS CloudFormation. Simply tag your buckets targeted for encryption, upload the solution artifacts into S3, and deploy the artifact template through the CloudFormation console. In the following sections, you will see that the architecture has been built to be easy to use and operate, while at the same time containing a large number of customizable features for more advanced users.

Solution overview

At a high level, the core features of the architecture consist of three services interacting with each other: S3 Inventory reports (1) are delivered for targeted buckets, the report delivery events trigger an AWS Lambda function (2), and the Lambda function then executes S3 Batch Operations (3) jobs using the reports as input to encrypt targeted buckets. Figure 1 below and the remainder of this section provide a more detailed look at what is happening underneath the surface. If this isn’t of high interest to you, feel free to skip ahead to the Prerequisites and Solution deployment sections.

Figure 1: Solution architecture overview

Here’s a detailed overview of how the solution works, as shown in Figure 1 above:

  1. When the CloudFormation template is first launched, a number of resources are created, including:
    • An S3 bucket to store the S3 Inventory reports
    • An S3 bucket to store S3 Batch Operations job completion reports
    • A CloudWatch event that is triggered by changes to tags on S3 buckets
    • An AWS Glue database and AWS Glue tables that can be used by Athena to query S3 Inventory and S3 Batch Operations report findings
    • A Lambda function that is used as a custom resource during template launch, and afterwards as a target for S3 event notifications and CloudWatch events
  2. During deployment of the CloudFormation template, the Lambda-backed custom resource lists all S3 buckets within the specified AWS Region and checks whether any has a configurable tag present (configured through an AWS CloudFormation parameter). When a bucket with the specified tag is discovered, the Lambda function configures an S3 Inventory report for the discovered bucket to be delivered to the newly created central report destination bucket.
  3. When a new S3 Inventory report arrives in the central report destination bucket (which can take 1 to 2 days) from any of the tagged buckets, an S3 event notification triggers the Lambda function to process it.
  4. The Lambda function first adds the path of the report CSV file as a partition to the AWS Glue table. This means that as each bucket delivers its report, it instantly becomes queryable by Athena, and any queries executed return the most recent information available on the status of the S3 buckets in the account.
  5. The Lambda function then checks the value of the EncryptBuckets parameter in the CloudFormation launch template to assess whether any re-encryption action should be taken. If it is set to yes, the Lambda function creates an S3 Batch Operations job and executes it. The job takes each object listed in the manifest report and copies it over itself in the same location. When the copy occurs, SSE-KMS or SSE-S3 encryption is specified in the job parameters, effectively re-encrypting all identified objects.
  6. Once the batch job finishes for the S3 Inventory report, a completion report is sent to the central batch job report bucket. The CloudFormation template provides a parameter that controls whether to include either all successfully processed objects or only objects that were unsuccessfully processed. These reports can also be queried with Athena, since the reports are also added as partitions to the AWS Glue batch reports table as they arrive.
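The copy-in-place job from step 5 can be sketched as a request for the S3 Control CreateJob API. This is only an illustration under stated assumptions, not the code in encrypt.py: the function name, the report prefix, the priority, and every ARN passed in are hypothetical, and the tag mirrors the __ObjectEncrypted: true tag used in this walkthrough.

```python
import uuid


def build_encrypt_job_request(account_id, role_arn, target_bucket,
                              manifest_arn, manifest_etag, report_bucket,
                              kms_key_arn=None, failed_tasks_only=True):
    """Build keyword arguments for an S3 Batch Operations copy job (sketch)."""
    copy_operation = {
        # The target is the same bucket the inventory describes, so each
        # object is copied over itself.
        "TargetResource": f"arn:aws:s3:::{target_bucket}",
        "NewObjectTagging": [{"Key": "__ObjectEncrypted", "Value": "true"}],
    }
    if kms_key_arn:
        copy_operation["SSEAwsKmsKeyId"] = kms_key_arn  # SSE-KMS
    else:
        copy_operation["NewObjectMetadata"] = {"SSEAlgorithm": "AES256"}  # SSE-S3

    return {
        "AccountId": account_id,
        "ConfirmationRequired": False,  # run without manual confirmation
        "Operation": {"S3PutObjectCopy": copy_operation},
        "Manifest": {
            "Spec": {"Format": "S3InventoryReport_CSV_20161130"},
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        "Report": {
            "Bucket": f"arn:aws:s3:::{report_bucket}",
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "Prefix": "batch-reports",  # hypothetical prefix
            "ReportScope": "FailedTasksOnly" if failed_tasks_only else "AllTasks",
        },
        "Priority": 10,  # arbitrary value for illustration
        "RoleArn": role_arn,
        "ClientRequestToken": str(uuid.uuid4()),
    }
```

With boto3 installed and credentials configured, the request could be submitted with `boto3.client("s3control").create_job(**request)`. Building the request as a plain dictionary keeps the job definition easy to inspect before anything is launched.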


Prerequisites

To follow along with the sample deployment, your AWS Identity and Access Management (IAM) principal (user or role) needs administrator access or equivalent.

Solution deployment

For this walkthrough, the solution will be configured to encrypt objects using SSE-KMS, rather than SSE-S3, when an inventory report is delivered for a bucket. Please note that the key policy of the KMS key will be automatically updated by the custom resource during launch to allow S3 to use it to encrypt inventory reports. No key policies are changed if SSE-S3 encryption is selected instead. The configuration in this walkthrough also adds a tag to all newly encrypted objects. You’ll learn how to use this tag to restrict access to unencrypted objects in versioned buckets. I’ll make callouts throughout the deployment guide for when you can choose a different configuration from what is deployed in this post.

To deploy the solution architecture and validate its functionality, you’ll perform five steps:

  1. Tag target buckets for encryption
  2. Deploy the CloudFormation template
  3. Validate delivery of S3 Inventory reports
  4. Confirm that reports are queryable with Athena
  5. Validate that objects are correctly encrypted

If you are only interested in deploying the solution and encrypting your existing environment, Steps 1 and 2 are all that are required. Steps 3 through 5 are optional, and outline procedures that you would perform to validate the solution’s functionality. They’re primarily for users who want to dive deep and take advantage of all the features available.

With that being said, let’s get started with deploying the architecture!

Step 1: Tag target buckets

Navigate to the Amazon S3 console and identify which buckets should be targeted for inventorying and encryption. For each identified bucket, tag it with a designated key-value pair by selecting Properties > Tags > Add tag. This demo uses the tag __Inventory: true and tags only one bucket called adams-lambda-functions, as shown in Figure 2.

Figure 2: Tagging the bucket targeted for encryption in Amazon S3
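If you prefer to tag buckets from a script rather than the console, the following Python sketch shows one way to do it. The helper names are illustrative assumptions; the defaults mirror the __Inventory: true tag used in this demo. Because the PutBucketTagging API replaces a bucket's entire tag set, the sketch merges the new tag into the existing tags instead of overwriting them.

```python
def has_target_tag(tag_set, key="__Inventory", value="true"):
    """Return True if the TagSet (list of Key/Value dicts) has the target tag."""
    return any(t.get("Key") == key and t.get("Value") == value for t in tag_set)


def with_target_tag(tag_set, key="__Inventory", value="true"):
    """Return a new TagSet with the target tag set and all other tags kept."""
    kept = [t for t in tag_set if t.get("Key") != key]
    return kept + [{"Key": key, "Value": value}]


def tag_bucket(s3_client, bucket, existing_tag_set):
    """Write the merged tag set back to the bucket.

    s3_client is assumed to be boto3.client("s3"). existing_tag_set would
    normally come from get_bucket_tagging(Bucket=bucket)["TagSet"]; that call
    raises an error for buckets without tags, in which case pass [].
    """
    s3_client.put_bucket_tagging(
        Bucket=bucket,
        Tagging={"TagSet": with_target_tag(existing_tag_set)},
    )
```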

Step 2: Deploy the CloudFormation template

  1. Download the S3 encryption solution. There will be two files that make up the backbone of the solution:
    • encrypt.py, which contains the Lambda microservices logic;
    • deploy.yml, which is the CloudFormation template that deploys the solution.
  2. Zip the file encrypt.py, rename it to encrypt.zip, and upload it into any S3 bucket that is in the same Region as the one in which the CloudFormation template will be deployed. Your bucket should look like Figure 3:
    Figure 3: encrypt.zip uploaded into an S3 bucket
  3. Navigate to the CloudFormation console and then create the CloudFormation stack using the deploy.yml template. For more information, see Getting Started with AWS CloudFormation in the CloudFormation User Guide. Figure 4 shows the parameters used to achieve the configuration specified for this walkthrough, with the fields outlined in red requiring input. You can choose your own configuration by altering the appropriate parameters if the ones specified do not fit your use case.
    Figure 4: Set the parameters in the CloudFormation stack

Step 3: Validate delivery of S3 Inventory reports

After you’ve successfully deployed the CloudFormation template, select any of your tagged S3 buckets and check that it now has an S3 Inventory report configuration. To do this, navigate to the S3 console, select a tagged bucket, select the Management tab, and then select Inventory, as shown in Figure 5. You should see that an inventory configuration exists. An inventory report will be delivered automatically to the destination bucket within 1 to 2 days, depending on the number of objects in the bucket. Make a note of the name of the bucket where the inventory report will be delivered. The bucket is given a semi-random name during creation through the CloudFormation template, so making a note of this will help you find the bucket more easily when you check for report delivery later.

Figure 5: Check that the tagged S3 bucket has an S3 Inventory report configuration
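The same check can be scripted. The Python sketch below shows roughly what an inventory configuration that includes encryption status looks like, plus a helper that confirms a configuration exists on a bucket. The configuration ID, the prefix, and the function names are assumptions made for illustration; the configuration that the solution actually creates may differ.

```python
def build_inventory_request(source_bucket, destination_bucket,
                            config_id="encryption-inventory"):
    """Build keyword arguments for PutBucketInventoryConfiguration (sketch)."""
    return {
        "Bucket": source_bucket,
        "Id": config_id,
        "InventoryConfiguration": {
            "Id": config_id,
            "IsEnabled": True,
            "IncludedObjectVersions": "All",  # cover versioned buckets too
            "OptionalFields": ["EncryptionStatus"],
            "Schedule": {"Frequency": "Daily"},
            "Destination": {
                "S3BucketDestination": {
                    "Bucket": f"arn:aws:s3:::{destination_bucket}",
                    "Format": "CSV",
                    "Prefix": source_bucket,  # hypothetical layout
                }
            },
        },
    }


def has_inventory_config(s3_client, bucket):
    """Return True if the bucket has at least one inventory configuration.

    s3_client is assumed to be boto3.client("s3"). Only the first page of
    results is inspected, which is enough for a yes/no check.
    """
    response = s3_client.list_bucket_inventory_configurations(Bucket=bucket)
    return len(response.get("InventoryConfigurationList", [])) > 0
```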

Step 4: Confirm that reports are queryable with Athena

  1. After 1 to 2 days, navigate to the inventory reports destination bucket and confirm that reports have been delivered for buckets with the __Inventory: true tag. As shown in Figure 6, a report has been delivered for the adams-lambda-functions bucket.

    Figure 6: Confirm delivery of reports to the S3 reports destination bucket

  2. Next, navigate to the Athena console and select the AWS Glue database that contains the table holding the schema and partition locations for all of your reports. If you used the default values for the parameters when you launched the CloudFormation stack, the AWS Glue database will be named s3_inventory_database, and the table will be named s3_inventory_table. Run the following query in Athena:
    SELECT encryption_status, count(*) FROM s3_inventory_table GROUP BY encryption_status;

    The output of the query will be a snapshot aggregate count of objects in the categories of SSE-S3, SSE-C, SSE-KMS, or NOT-SSE across your tagged bucket environment, before encryption took place, as shown in Figure 7.

    Figure 7: Query results in Athena

    From the query results, you can see that the adams-lambda-functions bucket had only two objects in it, both of which were unencrypted. At this point, you can choose to perform any analytics with Athena on the delivered inventory reports.
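The same aggregate can be run programmatically. The Python sketch below builds the arguments for Athena's StartQueryExecution API and converts a GetQueryResults response into a dictionary of counts. The function names and the output location are assumptions; the database and table names are the defaults mentioned above.

```python
def build_status_query(output_location,
                       database="s3_inventory_database",
                       table="s3_inventory_table"):
    """Build keyword arguments for Athena StartQueryExecution (sketch)."""
    return {
        "QueryString": (
            "SELECT encryption_status, count(*) AS objects "
            f"FROM {table} GROUP BY encryption_status"
        ),
        "QueryExecutionContext": {"Database": database},
        # Athena writes result files to this S3 location (caller-supplied).
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def rows_to_counts(query_results):
    """Turn a GetQueryResults response into {encryption_status: count}."""
    rows = query_results["ResultSet"]["Rows"][1:]  # first row holds column names
    return {
        row["Data"][0].get("VarCharValue"): int(row["Data"][1]["VarCharValue"])
        for row in rows
    }
```

With a boto3 Athena client, you would pass the first dictionary to `start_query_execution(**request)`, wait for the query to finish, and feed the `get_query_results` response to `rows_to_counts`.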

Step 5: Validate that objects are correctly encrypted

  1. Navigate to any of your target buckets in Amazon S3 and check the encryption status of a few sample objects by selecting the Properties tab of each object. The objects should now be encrypted using the specified KMS CMK. Because you set the AddTagToEncryptedObjects parameter to yes during the CloudFormation stack launch, these objects should also have the __ObjectEncrypted: true tag present. As an example, Figure 8 shows the rules_existing_rule.zip object from the adams-lambda-functions bucket. This object has been properly encrypted using the correct KMS key, which has an alias of blog in this example, and it has been tagged with the specified key-value pair.
    Figure 8: Checking the encryption status of an object in S3
  2. For further validation, navigate back to the Athena console and select the s3_batch_table from the s3_inventory_database, assuming that you left the default names unchanged. Then, run the following query:
    SELECT * FROM s3_batch_table;

    If encryption was successful, this query should return zero items, because by default the solution only delivers S3 Batch Operations job completion reports on objects that failed to copy. After validating by inspecting both the objects themselves and the batch completion reports, you can now safely say that the contents of the targeted S3 buckets are correctly encrypted.
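The spot check of sample objects can also be scripted. The Python sketch below inspects HeadObject responses and reports any keys that are not encrypted with the expected KMS key. The function names are illustrative, and the sketch assumes you compare against the full key ARN, since an alias such as blog will not match the key identifier that S3 returns.

```python
def is_encrypted_with_kms_key(head_response, expected_key_arn):
    """Check one HeadObject response for SSE-KMS with the expected key."""
    return (
        head_response.get("ServerSideEncryption") == "aws:kms"
        and head_response.get("SSEKMSKeyId") == expected_key_arn
    )


def find_noncompliant_keys(s3_client, bucket, keys, expected_key_arn):
    """Return the keys whose current version is not encrypted as expected.

    s3_client is assumed to be boto3.client("s3"); keys is a small sample of
    object keys chosen by the caller.
    """
    noncompliant = []
    for key in keys:
        head = s3_client.head_object(Bucket=bucket, Key=key)
        if not is_encrypted_with_kms_key(head, expected_key_arn):
            noncompliant.append(key)
    return noncompliant
```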

Next steps

Congratulations! You’ve successfully deployed and operated a solution for rectifying S3 buckets with incorrectly encrypted and unencrypted objects. The architecture is massively scalable because it uses S3 Batch Operations and Lambda, it’s fully serverless, and it’s cost effective to run.

Please note that if you selected no for the EncryptBuckets parameter during the initial launch of the CloudFormation template, you can retroactively perform encryption on targeted buckets by simply doing a stack update. During the stack update, switch the EncryptBuckets parameter to yes, and proceed with deployment as normal. The update will reconfigure S3 Inventory reports for all target S3 buckets to get the most up-to-date inventory. After the reports are delivered, encryption will proceed as desired.
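A stack update like this can also be driven from code. The Python sketch below builds the arguments for CloudFormation's UpdateStack API, changing only the parameters you name and reusing the previous value for every other one. The function name and stack name are hypothetical, and the CAPABILITY_IAM entry is an assumption about what the template requires; a template that creates named IAM resources would need CAPABILITY_NAMED_IAM instead.

```python
def build_stack_update(stack_name, existing_parameter_keys, overrides):
    """Build keyword arguments for CloudFormation UpdateStack (sketch).

    existing_parameter_keys would normally be read from the Parameters list
    that describe_stacks returns for the stack.
    """
    unknown = set(overrides) - set(existing_parameter_keys)
    if unknown:
        raise KeyError(f"Parameters not defined on the stack: {sorted(unknown)}")

    parameters = []
    for key in existing_parameter_keys:
        if key in overrides:
            parameters.append({"ParameterKey": key, "ParameterValue": overrides[key]})
        else:
            # Keep whatever value the stack was launched with.
            parameters.append({"ParameterKey": key, "UsePreviousValue": True})

    return {
        "StackName": stack_name,
        "UsePreviousTemplate": True,  # same template, new parameter values
        "Parameters": parameters,
        "Capabilities": ["CAPABILITY_IAM"],  # assumption, see lead-in
    }
```

For example, `build_stack_update("s3-encryption", keys, {"EncryptBuckets": "yes"})` produces a request that could be passed to a boto3 CloudFormation client as `update_stack(**request)`.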

Moreover, with the solution deployed, you can target new buckets for encryption just by adding the __Inventory: true tag. CloudWatch Events will register the tagging action and automatically configure an S3 Inventory report to be delivered for the newly tagged bucket.

Finally, now that your S3 buckets are properly encrypted, you should take a few more manual steps to help maintain your newfound account hygiene:

  • Perform remediation on unencrypted objects that may have failed to copy during the S3 Batch Operations job. The most common reason that objects fail to copy is that the object size exceeds 5 GiB. S3 Batch Operations uses the standard CopyObject API call underneath the surface, but this API call can only handle objects no larger than 5 GiB. To successfully copy these objects, you can modify the solution you learned in this post to launch an S3 Batch Operations job that invokes Lambda functions. In the Lambda function logic, you can make CreateMultipartUpload API calls on objects that failed with a standard copy. The original batch job completion reports provide detail on exactly which objects failed to encrypt due to size.
  • Prohibit the retrieval of unencrypted object versions for buckets that had versioning enabled. When the object is copied over itself during the encryption process, the old unencrypted version of the object still exists. This is where the option in the solution to specify a tag on all newly encrypted objects becomes useful: you can now use that tag to draft a bucket policy that prohibits the retrieval of old unencrypted objects in your versioned buckets. For the solution that you deployed in this post, such a policy would look like this:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Deny",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::adams-lambda-functions/*",
            "Principal": "*",
            "Condition": { "StringNotEquals": { "s3:ExistingObjectTag/__ObjectEncrypted": "true" } }
          }
        ]
      }
  • Update bucket policies to prevent the upload of unencrypted or incorrectly encrypted objects. By updating bucket policies, you help ensure that in the future, newly uploaded objects will be correctly encrypted, which will help maintain account hygiene. The S3 encryption solution presented here is meant to be a one-time-use remediation tool, while you should view updating bucket policies as a preventative action. Proper use of bucket policies will help ensure that the S3 encryption solution is not needed again, unless another encryption requirement change occurs in the future. To learn more, see How to Prevent Uploads of Unencrypted Objects to Amazon S3.
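For the first item in the list above, objects larger than 5 GiB need a multipart copy, which copies an object over itself in byte ranges. The Python sketch below computes the byte ranges that each UploadPartCopy call would use; it is a simplified illustration rather than the solution's code, and the function name and default part size are assumptions.

```python
GIB = 1024 ** 3


def copy_part_ranges(object_size, part_size=GIB):
    """Split an object into CopySourceRange strings for UploadPartCopy.

    Each range is inclusive ("bytes=first-last"). S3 limits parts to 5 GiB
    and an upload to 10,000 parts, so part_size should be chosen accordingly.
    """
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")

    ranges = []
    start = 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges
```

A Lambda function invoked by S3 Batch Operations could call CreateMultipartUpload with the desired encryption settings, issue one UploadPartCopy per range with the object itself as the copy source, and then call CompleteMultipartUpload with the returned part ETags.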

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon S3 forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Adam Kozdrowicz

Adam is a Data and Machine Learning Engineer for AWS Professional Services. He works closely with enterprise customers building big data applications on AWS, and he enjoys working with frameworks such as AWS Amplify, SAM, and CDK. During his free time, Adam likes to surf, travel, practice photography, and build machine learning models.
