Bytes transferred for cloudfront distribution in CUR reports

0

I have created a legacy CUR export in Billing and Cost Management > Data Exports. After parsing the output file, I am unable to locate the bytes transferred or the bytes for which I was billed. Can you please guide me on how to get the total bytes per distribution?

I also looked at creating a CUR 2.0 report (standard data export) and selected the following columns:

`{"QueryStatement":"SELECT bill_billing_period_end_date, bill_billing_period_start_date, line_item_product_code, line_item_resource_id, line_item_usage_type, resource_tags FROM COST_AND_USAGE_REPORT"}`

`{"TableConfigurations":{"COST_AND_USAGE_REPORT":{"INCLUDE_RESOURCES":"TRUE","INCLUDE_SPLIT_COST_ALLOCATION_DATA":"TRUE","TIME_GRANULARITY":"DAILY"}}}`

This didn't give me the information I am looking for.

I also looked at enabling standard logs to an S3 bucket for the distribution, but the online documentation says:

Important: We recommend that you use the logs to understand the nature of the requests for your content, not as a complete accounting of all requests. CloudFront delivers access logs on a best-effort basis. The log entry for a particular request might be delivered long after the request was actually processed and, in rare cases, a log entry might not be delivered at all. When a log entry is omitted from access logs, the number of entries in the access logs won't match the usage that appears in the AWS usage and billing reports.

My goal is to save the bytes (not the final cost) for which I was billed against each CF distribution and save that separately for different accounting purposes.

Kamari
asked 12 days ago · 168 views
2 Answers
2
Accepted Answer

Hello,

Please try this solution.

Step 1: Ensure the CUR Includes the Necessary Columns

Make sure your CUR includes the following columns:

line_item_usage_account_id
line_item_resource_id
line_item_usage_type
line_item_usage_amount
line_item_product_code

Step 2: Download the CUR File

Download the CUR file from your S3 bucket to your local machine or directly to your processing environment.

Step 3: Install Pandas

If you don't have Pandas installed, you can install it using pip:

pip install pandas

Step 4: Run a Simple Python Script

Here's a Python script to parse the CUR file and calculate the total bytes transferred for each CloudFront distribution:

import pandas as pd

# Load the CUR file into a DataFrame
cur_file_path = 'path_to_your_cur_file.csv'  # Update this path
df = pd.read_csv(cur_file_path)

# Filter rows relevant to CloudFront data transfer
cloudfront_df = df[
    (df['line_item_product_code'] == 'AmazonCloudFront') &
    (df['line_item_usage_type'].str.contains('DataTransfer', na=False))
]

# Aggregate the bytes transferred per distribution
bytes_transferred = cloudfront_df.groupby('line_item_resource_id')['line_item_usage_amount'].sum().reset_index()

# Rename columns for better readability
bytes_transferred.columns = ['Distribution ID', 'Total Bytes Transferred']

# Save the result to a CSV file
output_file_path = 'cloudfront_bytes_transferred.csv'
bytes_transferred.to_csv(output_file_path, index=False)

print(f"Total bytes transferred per distribution have been saved to {output_file_path}")
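One unit caveat worth checking before relying on the numbers: for CloudFront DataTransfer usage types, line_item_usage_amount is generally reported in GB rather than raw bytes (verify this against a known sample in your own CUR). If you need actual byte counts, a conversion step along these lines could be appended — a sketch, with the unit assumptions noted in the comments:

```python
# Hypothetical conversion step to append after the aggregation above.
# Assumptions to confirm against your own billing data:
#  - line_item_usage_amount for DataTransfer usage types is in GB
#  - AWS meters data transfer with binary gigabytes (1 GB = 1024**3 bytes)
GIB = 1024 ** 3

def gb_to_bytes(gb: float) -> int:
    """Convert a usage amount in GB to a whole number of bytes."""
    return round(gb * GIB)

print(gb_to_bytes(2.0))
```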

Steps to Run the Script:

Download your CUR file from your S3 bucket and note the path where it's saved.

Update the script:

  1. Replace 'path_to_your_cur_file.csv' with the actual path to your CUR file.
  2. (Optional) Change the output_file_path if you want the results saved to a different location or with a different name.

Run the script:

Save the script to a file, for example, cloudfront_usage.py.

Execute the script using Python:

python cloudfront_usage.py
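Before running it on a full CUR export, the filtering and aggregation logic can be sanity-checked on a tiny in-memory sample (the rows below are made up for illustration):

```python
import pandas as pd

# Made-up sample rows mimicking the CUR columns the script relies on
df = pd.DataFrame({
    'line_item_product_code': ['AmazonCloudFront', 'AmazonCloudFront', 'AmazonS3'],
    'line_item_usage_type': ['US-DataTransfer-Out-Bytes',
                             'EU-DataTransfer-Out-Bytes',
                             'Requests-Tier1'],
    'line_item_resource_id': ['arn:aws:cloudfront::123456789012:distribution/E1EXAMPLE',
                              'arn:aws:cloudfront::123456789012:distribution/E1EXAMPLE',
                              'my-bucket'],
    'line_item_usage_amount': [1.5, 2.5, 100.0],
})

# Same filter and groupby as the script above
cloudfront_df = df[
    (df['line_item_product_code'] == 'AmazonCloudFront') &
    (df['line_item_usage_type'].str.contains('DataTransfer', na=False))
]
totals = cloudfront_df.groupby('line_item_resource_id')['line_item_usage_amount'].sum()
print(totals.to_dict())
```

The two CloudFront DataTransfer rows collapse into a single row per distribution, and the S3 row is filtered out entirely.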

https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/reports-billing.html

https://docs.aws.amazon.com/cur/latest/userguide/use-cases.html

https://docs.aws.amazon.com/cur/latest/userguide/cur-data-transfers-charges.html

EXPERT
answered 11 days ago
EXPERT
reviewed 8 days ago
  • Thank you, Partha. It worked - and thanks for pointing to the right column that has this information. The online documentation isn't very clear about this though.

0

You will need an export containing the following columns from the CUR data:

line_item_usage_start_date
line_item_resource_id
line_item_usage_amount
line_item_unblended_cost
line_item_net_unblended_cost **<- This will only be in your data if you are getting a discount on On-Demand pricing**
line_item_usage_type
line_item_line_item_type
product_region
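
For a CUR 2.0 (standard data export) like the one in the question, a QueryStatement covering these columns might look like the following — a sketch only; column availability should be verified against the Data Exports schema for your account:

`{"QueryStatement":"SELECT line_item_usage_start_date, line_item_resource_id, line_item_usage_amount, line_item_unblended_cost, line_item_usage_type, line_item_line_item_type, product_region FROM COST_AND_USAGE_REPORT"}`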

You then can create the following query:

SELECT	
    month as usage_month,	
    line_item_resource_id,
    data_transfer_out_gb,	
    ondemand_cost,	
    discounted_cost, **<- Remove if line_item_net_unblended_cost does not exist**	
    (1 - discounted_cost/ondemand_cost) as discount_percent, **<- Remove if line_item_net_unblended_cost does not exist**		
    operation,	
    product_region	
FROM	
(SELECT 	
	date(date_trunc('month',line_item_usage_start_date)) AS month,
       (CASE 	
           WHEN line_item_usage_type LIKE '%DataTransfer-Out-Bytes' THEN 'DataTransfer Out'
           WHEN line_item_usage_type LIKE '%-Requests-%' AND line_item_usage_type NOT LIKE '%-OriginShield' THEN 'http/https (proxy) requests'	
           ELSE 'Other CloudFront operations'
        END) as operation,	
	product_region,
        line_item_resource_id,
        sum(line_item_usage_amount) as data_transfer_out_gb,
	sum(line_item_unblended_cost) as ondemand_cost,
	sum(line_item_net_unblended_cost) as discounted_cost **<- Remove if this column does not exist**
FROM **<your table>**	
WHERE line_item_product_code = 'AmazonCloudFront'	
	AND line_item_line_item_type = 'Usage'
	AND line_item_usage_start_date >= timestamp '2024-01-01 00:00:00.000' **<- Modify the date accordingly to get the right start date**
        AND line_item_usage_start_date < timestamp '2024-04-01 00:00:00.000' **<- Modify to set end date. Example dates get data for Jan/Feb/Mar 2024**	
GROUP BY 1,2,3,4)

This will give you a table showing, for each month and CloudFront region, how much DataTransfer Out you had (in GB) and what it cost, as well as how many http/https (proxy) requests you had.

For each combination of month/region/distribution you will get up to 3 lines of data.

If you also want to break out the additional features (Origin Shield and edge functions), the CASE clause needs to be expanded to break them out of the 'Other CloudFront operations' category.
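If you want to prototype that categorization before editing the SQL, the CASE expression translates roughly to the following Python sketch (same patterns as the query above; `classify_usage_type` is just an illustrative name):

```python
# Rough Python equivalent of the CASE expression in the query above:
#   LIKE '%DataTransfer-Out-Bytes'  -> endswith
#   LIKE '%-Requests-%'             -> substring test
#   NOT LIKE '%-OriginShield'       -> not endswith
def classify_usage_type(usage_type: str) -> str:
    if usage_type.endswith('DataTransfer-Out-Bytes'):
        return 'DataTransfer Out'
    if '-Requests-' in usage_type and not usage_type.endswith('-OriginShield'):
        return 'http/https (proxy) requests'
    return 'Other CloudFront operations'

print(classify_usage_type('US-DataTransfer-Out-Bytes'))
print(classify_usage_type('US-Requests-Tier2-HTTPS'))
```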

The easiest way to use this query is to set up a legacy CUR report (https://docs.aws.amazon.com/cur/latest/userguide/dataexports-create-legacy.html) with Include resource IDs enabled (but NOT the other two options) and Athena selected as a target (so it will use Parquet format). Then set up Glue/Athena to query it from the console: https://docs.aws.amazon.com/cur/latest/userguide/cur-query-athena.html

AWS
answered 8 days ago
  • WHEN line_item_usage_type LIKE '%-Requests-%' AND line_item_usage_type NOT LIKE '%-OriginShield' THEN 'http/https (proxy) requests' ELSE 'Other CloudFront operations'

    Hi, I guess these operations don't count/contribute to total bytes transferred, right? Only the ones having "DataTransfer-Out-Bytes"?

  • They don't contribute, but they can be a cost driver if you have a very busy site, and it may be beneficial to break them out for visibility.