Excluding Subdirectories from AWS S3 Lifecycle Rules Effectively

0

I have a scenario where I need to exclude certain subdirectories (subprefixes) inside a directory (prefix) from AWS S3 lifecycle rules. Specifically, I have a structure like this in my S3 bucket:

$ aws s3 ls
2024-06-12 22:24:09 userdata.live

$ aws s3 ls s3://userdata.live
                           PRE 114/
                           PRE 123/
                           PRE 145/
$ aws s3 ls s3://userdata.live/123/
                           PRE profile/
2024-06-12 22:25:18       4070 123330091122212.jpeg                         
2024-06-29 21:12:33      26600 1718475355_8712480182.png
2024-06-29 21:12:33       4070 1719692995_66812ec3e8a21.jpeg
2024-06-29 21:12:34       4070 1719693662_6680915ecff8c.jpeg
2024-06-29 21:12:34       4070 1719693773_6681221d580df.jpeg

My transition lifecycle rule applies to the entire bucket, but I want to exclude all objects under any sub-prefix from this rule. For example, the lifecycle rule should not apply to the profile/ subdirectory inside each user's directory. I have more than 30K directories (user IDs).

My lifecycle rule is:

{
    "Rules": [
        {
            "ID": "Change from Standard to Standard IA",
            "Filter": {},
            "Status": "Enabled",
            "Transitions": [
                {
                    "Days": 30,
                    "StorageClass": "STANDARD_IA"
                }
            ]
        }      
    ]
}

Please share how I can achieve this.

Artem
asked 6 days ago · 522 views
3 Answers
2
Accepted Answer

There's no way to accomplish that with lifecycle rules alone. The only criteria available for choosing the objects a rule targets are the key prefix, the tags attached to an object, and the size of the object: https://docs.aws.amazon.com/AmazonS3/latest/API/API_control_LifecycleRuleFilter.html. No combination of those can express a "key does not contain a '/'" type of filter condition.

S3 lifecycle rules can use tags to choose the objects to target, but you'd need to add the tags through some other mechanism first. One way to do that without a great deal of custom coding is to use S3 Inventory (https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html) to produce a list of all the objects in the bucket at regular intervals; filter that inventory report, for example in a Lambda function, down to the objects whose keys don't contain a "/" and aren't already tagged; and feed the filtered list as a manifest to an S3 Batch Operations job (https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-create-job.html#specify-existing-manifest-file) that adds the tags (https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-put-object-tagging.html). Note that S3 Batch Operations replaces all existing tags on an object, so this is straightforward only if you aren't using object tagging for other purposes, and it sounds like you probably aren't.
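As a rough sketch of that filtering step (not a definitive implementation): assuming the inventory is configured with no optional fields, so each line of the report is just "bucket","key", and assuming the goal from your question of tagging everything except objects under a profile/ sub-prefix, a few lines of shell over one of the gzipped inventory data files could produce the CSV manifest for the Batch Operations job. All bucket names and paths below are placeholders, and note that keys in inventory reports are URL-encoded, so a "/" may also appear as "%2F".

# Hypothetical sketch: build a Batch Operations CSV manifest (bucket,key) from one
# S3 Inventory data file, dropping every key under a ".../profile/" sub-prefix.
aws s3 cp s3://my-inventory-destination/userdata.live/all-objects/data/part-0.csv.gz - \
  | gunzip \
  | awk -F'","' '$2 !~ /\/profile\// && $2 !~ /%2Fprofile%2F/' \
  > manifest.csv

# Upload the filtered manifest where the Batch Operations job can read it.
aws s3 cp manifest.csv s3://my-inventory-destination/manifests/manifest.csv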

Once the tags are added, the S3 lifecycle rule could simply target objects with that custom tag value.

If what you're practically after is optimising storage costs for end user "folders", then rather than building custom logic to transition objects to a different storage class, you might want to consider transitioning all objects larger than 128 kiB from Standard to the Intelligent Tiering class: https://docs.aws.amazon.com/AmazonS3/latest/userguide/intelligent-tiering-overview.html. Aside from the one-time cost to transition the objects to Intelligent Tiering and a very low, continuous per-object fee for activity monitoring, there are no additional costs from Intelligent Tiering automatically tiering your objects based on their access patterns between tiers equivalent to Standard, Standard-IA, and Glacier Instant Retrieval classes.
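As a rough sketch of such a rule (the rule ID is arbitrary, and ObjectSizeGreaterThan is expressed in bytes, so 131072 corresponds to 128 kiB), a single lifecycle configuration along these lines would transition every object above that size to Intelligent Tiering as soon as possible:

{
    "Rules": [
        {
            "ID": "Objects over 128 KiB to Intelligent-Tiering",
            "Filter": {
                "ObjectSizeGreaterThan": 131072
            },
            "Status": "Enabled",
            "Transitions": [
                {
                    "Days": 0,
                    "StorageClass": "INTELLIGENT_TIERING"
                }
            ]
        }
    ]
}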

Notably, and differently from all other storage classes, Intelligent Tiering not only transitions objects down to less frequently accessed tiers, but also brings them back up to the frequent access tier when they start being accessed again.

EXPERT
Leo K
answered 6 days ago
  • What I understand is that tags are crucial for my needs.

    If I want to transition any objects, **those objects must be tagged.** Objects that I do not want the lifecycle rule to affect will remain untagged. Is that correct? For example, there's no need to add tags to objects under profile/?

    e.g.: This rule will only apply to objects that have the tag? Please correct me if I'm wrong.

     {
         "ID": "Move current to Standard IA",
         "Filter": {
             "Tag": {
                 "Key": "allowTransition",
                 "Value": "true"
             }
         },
         "Status": "Enabled",
         "Transitions": [
             {
                 "Days": 30,
                 "StorageClass": "STANDARD_IA"
             }
         ]
     }
    
  • Yes. If you want to control lifecycle transitions based on the "folder structure" as you described, tags are the only way you can do that. However, if you're just looking for cost control with the help of different storage classes, I'd warmly recommend not doing that and letting AWS's automation take care of it for you with S3 Intelligent Tiering. All you'd need is a single lifecycle rule to transition all objects (including those in folders) >128 KB to Intelligent Tiering, and AWS would take care of the rest.

  • Yes, cost savings is a priority. I have 4-5 subdirectories (logo, pics, pdf) inside each directory that need to be always accessible, while objects outside them are sometimes less frequently accessed. New uploads go into Standard, so is Intelligent Tiering the right choice here? If yes, do I have to transition to Intelligent Tiering after 30 days, or will new uploads go straight to Intelligent Tiering? Please suggest. Thank you!

1

If the number of objects isn't astronomically large, if a good fraction of the objects are larger than 128 kiB (the threshold both Standard-IA and Intelligent Tiering need to deliver any cost savings), and given that you've already said many objects aren't regularly accessed, then I'd say Intelligent Tiering is likely a very good fit.

You can transition objects from Standard to Intelligent Tiering without having to wait for 30 days. You could also upload the objects directly to Intelligent Tiering, without going through Standard at all. Once the objects are in the Intelligent Tiering class, the transitions both from frequent to infrequent tiers, for example, and back from infrequent to frequent happen internally in Intelligent Tiering.
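For example, with the CLI (the bucket and object names are taken from your listing, purely for illustration), a new upload can go straight into Intelligent Tiering:

# Upload a new object directly into the Intelligent-Tiering storage class
# instead of Standard.
aws s3 cp ./1719693773_6681221d580df.jpeg \
    s3://userdata.live/123/1719693773_6681221d580df.jpeg \
    --storage-class INTELLIGENT_TIERING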

The only major consideration for a moderate number of objects when deciding between Intelligent Tiering and Standard-IA or Glacier Instant Retrieval is that when a rarely accessed object is automatically tiered to one of the less expensive storage tiers by Intelligent Tiering, it only takes a single read operation to trigger it to be promoted back to the frequent access tier.

For a huge number of use cases, this makes for an excellent, cost-efficient balance, but there are edge cases where it's not the cheapest route. For example, if you had large amounts of data in objects that do keep getting accessed, just not very often, then depending on exactly how frequently they are accessed, it might be somewhat more cost-efficient to keep them in Standard-IA or Glacier Instant Retrieval (from which they would never get promoted to Standard class) and pay the higher request fees for the infrequent read operations, as opposed to Intelligent Tiering promoting each object back to the most expensive tier every time the object is accessed.

EXPERT
Leo K
answered 6 days ago
  • Thank you for the detailed explanation.

0

I agree with all the recommendations and the detailed explanation shared by Leo K. Adding to this: since you have 30K+ directories, with 4-5 subdirectories inside each that will always be accessed, you should analyze this access pattern using a Storage Class Analysis (SCA) report: https://docs.aws.amazon.com/AmazonS3/latest/userguide/analytics-storage-class.html. That will help you decide when to transition the right data to the right storage class. You can configure SCA to analyze all the objects in a bucket, or you can configure filters to group objects together for analysis by common prefix (that is, objects whose names begin with a common string).
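As a rough sketch (the configuration ID, destination bucket, and prefix below are placeholders), an SCA configuration that analyzes the whole bucket and exports the analysis as CSV to a separate bucket could look like this with the CLI:

# Hypothetical sketch: enable Storage Class Analysis on the bucket and export
# the results as CSV to a separate destination bucket.
aws s3api put-bucket-analytics-configuration \
    --bucket userdata.live \
    --id whole-bucket-analysis \
    --analytics-configuration '{
        "Id": "whole-bucket-analysis",
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::my-analysis-results-bucket",
                        "Prefix": "sca/userdata.live/"
                    }
                }
            }
        }
    }'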

With this analysis, you will understand how much of your total storage is accessed within a given number of days (0-14, 15-29, etc.). From there you can calculate the price: 1/ for the total set of objects when moved to Intelligent-Tiering, S3 Standard-IA, or Glacier Instant Retrieval; 2/ assuming, say, 10% of the objects are accessed and promoted back to the frequent access tier (in the case of Intelligent-Tiering), the cost for that 10% of objects; 3/ assuming 10% of the objects are accessed from Standard-IA, the storage cost + API request cost + per-GB data retrieval cost; and 4/ assuming 10% of the objects are accessed from Glacier Instant Retrieval, the storage cost + API request cost + per-GB data retrieval cost.

Compare these costs with your current storage and API costs; that will help you choose the right storage class to transition to.

AWS
answered 6 days ago