AWS S3 Lifecycle Management

From UCSC Genomics Institute Computing Infrastructure Information

Revision as of 20:59, 10 March 2022 by Anovak (talk | contribs) (→‎Restoring Objects)

AWS S3 Lifecycle Policy Overview

AWS S3 buckets can be configured with lifecycle policies. These policies automatically change the storage class of objects based on the last time they were modified or accessed. AWS S3 objects are stored in the Standard storage class by default, which provides easy access but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier, are more suitable for objects that are rarely accessed. They cost much less per GB/month than the Standard storage class, but they also incur charges for access and retrieval.

Choose a storage class that matches how often you expect to access your data:

  • If you have data that you do not expect to access more than once a month, AWS Infrequent Access is a reasonable storage class to use.
  • If you have data that you do not expect to access more than once a year, AWS Glacier is a reasonable storage class to use.
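The rule of thumb above can be sketched as a small shell helper. This is only an illustration: the function name pick_storage_class, the file name, and the bucket and key are hypothetical, and the thresholds simply restate the two bullets. STANDARD_IA and GLACIER are the values the AWS CLI accepts for the Infrequent Access and Glacier storage classes.

```shell
# Map an expected number of accesses per year to a storage class,
# following the rule of thumb above. The function name is hypothetical.
pick_storage_class() {
  accesses_per_year="$1"
  if [ "$accesses_per_year" -le 1 ]; then
    echo "GLACIER"        # at most once a year
  elif [ "$accesses_per_year" -le 12 ]; then
    echo "STANDARD_IA"    # at most once a month
  else
    echo "STANDARD"       # accessed more often than monthly
  fi
}

# Example upload (placeholder file, bucket, and key):
# aws s3 cp ./results.tar.gz "s3://bucket-name/path/to/results.tar.gz" \
#   --storage-class "$(pick_storage_class 4)"
```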

UCSC GI Automated Policy

In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to AWS Intelligent-Tiering, which monitors S3 object access patterns and transitions objects to more efficient storage classes accordingly.

  • Objects uploaded to S3 will remain in the Standard storage class for 1 day, at which point they will be transitioned to Intelligent-Tiering.
  • Old and new S3 buckets will have this lifecycle policy automatically attached.
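For illustration, a lifecycle rule with the behavior described above might look roughly like the following. The actual UCSC GI policy is not reproduced on this page, so the rule ID and the empty prefix (meaning "every object") are assumptions. Since the policy is attached automatically, you should not need to apply it yourself; the commented command at the end shows how to inspect what is attached to a bucket.

```shell
# Print a lifecycle configuration resembling the policy described above.
# The rule ID and the empty prefix (match every object) are assumptions.
lifecycle_json() {
  cat <<'EOF'
{
  "Rules": [
    {
      "ID": "transition-to-intelligent-tiering",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 1, "StorageClass": "INTELLIGENT_TIERING" }
      ]
    }
  ]
}
EOF
}

# Show the policy actually attached to a bucket (placeholder bucket name):
# aws s3api get-bucket-lifecycle-configuration --bucket "bucket-name"
```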

AWS Intelligent-Tiering functionality:

  • Intelligent-Tiering does not change how objects are accessed. You do not need to execute special API commands to access objects, unless an object has been moved to an archive tier (see Restoring Objects).
  • Intelligent-Tiering does not incur charges for object retrieval from different tiers.

For more details on AWS Intelligent-Tiering, see the AWS Docs.

Restoring Objects

If an object has not been accessed for a while, you may encounter an error like this when trying to access it:

An error occurred (InvalidObjectState) when calling the GetObject operation: The operation is not valid for the object's access tier

This means that the object is in Glacier, either because somebody put it there, or because Intelligent-Tiering moved it there after it was not accessed for a while. If you want to access it, you will need to restore it (and our AWS account will be billed for doing so).
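Before restoring, it can help to confirm where the object currently lives. A minimal sketch, assuming the AWS CLI is installed and configured: the helper name object_tier is hypothetical, while head-object and its StorageClass, ArchiveStatus, and Restore fields are part of the s3api interface. StorageClass is typically empty for objects in the Standard class, and ArchiveStatus is only set for Intelligent-Tiering objects that have reached an archive tier.

```shell
# Report the storage class, archive status, and restore state of an object.
# Arguments: bucket name, then key within the bucket.
object_tier() {
  aws s3api head-object --bucket "$1" --key "$2" \
    --query '{StorageClass: StorageClass, ArchiveStatus: ArchiveStatus, Restore: Restore}' \
    --output json
}

# Usage (placeholder bucket and key):
# object_tier "bucket-name" "path/to/object.dat"
```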

To restore an object, you can use the S3 section of the AWS web console.

You can also restore an object from the command line with the AWS CLI tool. To restore the object s3://bucket-name/path/to/object.dat, you would issue the command:

aws s3api restore-object --restore-request "{}" --bucket "bucket-name" --key "path/to/object.dat"

If the object was manually put in Glacier, you would instead need --restore-request "Days=7", where the number is how many days the restored copy should remain available.

Note that you need to specify the bucket name and key within the bucket separately, instead of using an S3 URI.
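If you usually work with S3 URIs, a small wrapper can do the splitting for you. This is a sketch, not an official tool: the function name restore_s3_uri is hypothetical, and it assumes a URI of the form s3://bucket/key. The optional second argument sets Days for objects that were put in Glacier directly.

```shell
# Restore an object given an S3 URI. Optional second argument: number of days
# to keep the restored copy, for objects stored directly in Glacier.
restore_s3_uri() {
  uri="$1"
  days="${2:-}"
  path="${uri#s3://}"      # strip the scheme
  bucket="${path%%/*}"     # text before the first slash
  key="${path#*/}"         # text after the first slash
  if [ -n "$days" ]; then
    request="Days=$days"
  else
    request="{}"
  fi
  aws s3api restore-object --restore-request "$request" \
    --bucket "$bucket" --key "$key"
}

# Usage:
# restore_s3_uri "s3://bucket-name/path/to/object.dat"      # Intelligent-Tiering
# restore_s3_uri "s3://bucket-name/path/to/object.dat" 7    # Glacier, keep 7 days
```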

Restores from Glacier are not immediate, or even particularly fast. Jyn Erso has to go down to the Scarif data vault and find the right data-tape, and it takes a few hours, even if your file is small.
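While you wait, you can poll the object to see whether the restore is still running. The sketch below assumes the Restore field returned by head-object contains ongoing-request="true" while the restore is in progress. The function name restore_in_progress and the 10-minute interval are arbitrary choices. Run it only after issuing a restore request, because an object with no restore requested also reports "not in progress".

```shell
# Succeeds (exit status 0) while a restore is still running for the object.
# Arguments: bucket name, then key within the bucket.
restore_in_progress() {
  status=$(aws s3api head-object --bucket "$1" --key "$2" \
    --query 'Restore' --output text)
  case "$status" in
    *'ongoing-request="true"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll every 10 minutes until the restore finishes (placeholder names):
# while restore_in_progress "bucket-name" "path/to/object.dat"; do sleep 600; done
```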