Bucket name sometimes missing in the S3 URL - causing failures for the same bucket and code that was previously successful for operations like get object and put object. #4187

Open
mjoeydba opened this issue Jul 2, 2024 · 2 comments
Labels: bug (This issue is a confirmed bug.), p2 (This is a standard priority issue), s3

Comments


mjoeydba commented Jul 2, 2024

Describe the bug

S3 access intermittently fails for the same bucket and code that previously succeeded. The debug trace shows that the URL used during a failure does not include the bucket name, either as the host or in the path.

Success

2024-06-27 21:21:12,116 botocore.regions [DEBUG] Calling endpoint provider with parameters: {'Bucket': 'xxxx', 'Region': 'us-east-1', 'UseFIPS': False, 'UseDualStack': False, 'ForcePathStyle': False, 'Accelerate': False, 'UseGlobalEndpoint': True, 'Key': 'xxxx/xxxx.xlsx', 'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}
2024-06-27 21:21:12,116 botocore.regions [DEBUG] Endpoint provider result: https://xxxx.s3.amazonaws.com

Failure

2024-07-02 18:22:11,094 botocore.regions [DEBUG] Calling endpoint provider with parameters: {'Region': 'us-east-1', 'UseFIPS': False, 'UseDualStack': False, 'ForcePathStyle': False, 'Accelerate': False, 'UseGlobalEndpoint': True, 'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}
2024-07-02 18:22:11,095 botocore.regions [DEBUG] Endpoint provider result: https://s3.amazonaws.com

The URL in the GetObject call shows the same behavior, which seems to cause the Access Denied error.
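To make the difference between the two traces explicit, the endpoint-provider parameters can be parsed out of each debug line and compared. This is a minimal sketch that assumes the log format shown above; `endpoint_params` and `missing_keys` are hypothetical helper names, and the two sample lines are copied from the redacted traces in this report.

```python
import ast
import re

# Matches the dict that botocore.regions prints at DEBUG level.
_PARAMS_RE = re.compile(r"Calling endpoint provider with parameters: (\{.*\})\s*$")

SUCCESS = (
    "2024-06-27 21:21:12,116 botocore.regions [DEBUG] Calling endpoint provider "
    "with parameters: {'Bucket': 'xxxx', 'Region': 'us-east-1', 'UseFIPS': False, "
    "'UseDualStack': False, 'ForcePathStyle': False, 'Accelerate': False, "
    "'UseGlobalEndpoint': True, 'Key': 'xxxx/xxxx.xlsx', "
    "'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}"
)
FAILURE = (
    "2024-07-02 18:22:11,094 botocore.regions [DEBUG] Calling endpoint provider "
    "with parameters: {'Region': 'us-east-1', 'UseFIPS': False, "
    "'UseDualStack': False, 'ForcePathStyle': False, 'Accelerate': False, "
    "'UseGlobalEndpoint': True, 'DisableMultiRegionAccessPoints': False, "
    "'UseArnRegion': True}"
)


def endpoint_params(log_line):
    """Return the parameter dict from one debug line, or None if absent."""
    match = _PARAMS_RE.search(log_line)
    if match is None:
        return None
    # The dict is printed as a Python literal, so literal_eval can read it.
    return ast.literal_eval(match.group(1))


def missing_keys(success_line, failure_line):
    """List parameters present in the successful call but not the failed one."""
    ok = endpoint_params(success_line)
    bad = endpoint_params(failure_line)
    if ok is None or bad is None:
        raise ValueError("both lines must contain endpoint provider parameters")
    return sorted(set(ok) - set(bad))


print(missing_keys(SUCCESS, FAILURE))  # ['Bucket', 'Key']
```

In these two traces both `Bucket` and `Key` are absent from the failing call, not only the bucket name.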

Expected Behavior

Successfully download object.

Current Behavior

The call fails with Access Denied, even though the same code previously succeeded.

Reproduction Steps

# Note: the issue occurs unpredictably.
import logging
from io import BytesIO

import boto3
import pandas as pd
from botocore.config import Config
from pyspark.sql.functions import upper

# Emit debug logs from every logger, including botocore.
boto3.set_stream_logger('', logging.DEBUG)

# Initialize S3 client
s3 = boto3.client('s3')
INBOUND_S3_BUCKET = "xxxx"
INBOUND_FILE_PATH = 'xxx/xxxx.xlsx'
obj = s3.get_object(Bucket=INBOUND_S3_BUCKET, Key=INBOUND_FILE_PATH)
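One way to narrow down where the bucket name is lost might be to log the parameters botocore receives before it builds each request. The sketch below assumes botocore's `before-parameter-build` event, which passes the call's `params` dict to registered handlers. `has_bucket`, `log_missing_bucket`, and `attach_bucket_logger` are hypothetical helper names, not part of boto3, and this is a diagnostic aid, not a fix.

```python
import logging

logger = logging.getLogger("s3_bucket_check")  # hypothetical logger name


def has_bucket(params):
    """Return True if the API parameters carry a non-empty Bucket value."""
    return isinstance(params, dict) and bool(params.get("Bucket"))


def log_missing_bucket(params, event_name=None, **kwargs):
    """Event handler: warn when an S3 call is built without a Bucket."""
    if not has_bucket(params):
        logger.warning("%s called without Bucket: %r", event_name, params)


def attach_bucket_logger(client):
    """Register the handler for every S3 operation on this client."""
    client.meta.events.register("before-parameter-build.s3", log_missing_bucket)


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   s3 = boto3.client("s3")
#   attach_bucket_logger(s3)
#   s3.get_object(Bucket="xxxx", Key="xxx/xxxx.xlsx")
```

Operations that legitimately take no bucket, such as ListBuckets, will also trigger the warning, which may itself be informative when reading the trace.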

Possible Solution

Unknown

Additional Information/Context

No response

SDK version used

1.34.137

Environment details (OS name and version, etc.)

Linux, databricks

@mjoeydba mjoeydba added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Jul 2, 2024
@mjoeydba mjoeydba changed the title S3 access failing for the same bucket and code that was previously successful Jul 2, 2024
@mjoeydba mjoeydba changed the title Bucket name sometimes missing in S3 operations - access failing for the same bucket and code that was previously successful Jul 3, 2024
@mjoeydba mjoeydba changed the title Bucket name sometimes missing in the S3 URL - access failing for the same bucket and code that was previously successful Jul 3, 2024
@tim-finnigan tim-finnigan self-assigned this Jul 3, 2024
@tim-finnigan
Contributor

Hi @mjoeydba, thanks for reaching out. Here is a guide on troubleshooting Access Denied errors in S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/troubleshoot-403-errors.html

That error is likely occurring due to your settings, policies, permissions, or profile configuration. But if you'd like us to investigate this further on the SDK side, please share a complete code snippet to reproduce the issue, as well as debug logs (with any sensitive info redacted) by adding boto3.set_stream_logger('') to your script.
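For the redaction step, a logging filter can mask known sensitive strings before the debug output is written. This is a minimal sketch using only the standard `logging` module; `RedactFilter` and the `xxxx` placeholder are assumptions for illustration, and the filter only masks the exact strings it is given, so the output should still be reviewed by hand before sharing.

```python
import logging


class RedactFilter(logging.Filter):
    """Replace each known sensitive string in a log record with a placeholder."""

    def __init__(self, secrets, placeholder="xxxx"):
        super().__init__()
        # Longest first, so a secret containing another is masked as a whole.
        self._secrets = sorted((s for s in secrets if s), key=len, reverse=True)
        self._placeholder = placeholder

    def filter(self, record):
        # Render the final message first so values passed as args are covered.
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, self._placeholder)
        record.msg = message
        record.args = None
        return True  # never drop the record, only rewrite it


# Usage (assumes boto3.set_stream_logger('') has already been called):
#   redact = RedactFilter(["my-bucket-name", "folder/file.xlsx"])
#   for handler in logging.getLogger("").handlers:
#       handler.addFilter(redact)
```

The filter is attached to the handlers and not to the root logger, because logger-level filters do not apply to records that propagate up from child loggers such as `botocore.regions`.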

@tim-finnigan tim-finnigan added response-requested Waiting on additional information or feedback. s3 p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Jul 3, 2024
Author

mjoeydba commented Jul 7, 2024

Hi @tim-finnigan. Thanks for the response.
Please find the logs and program attached. The issue is non-deterministic: the code runs against the same bucket with the same EC2 instance profile.

The only difference I see is that the bucket name is missing from the S3 URL when there is a failure. AWS support also confirmed that, when the error occurs, the bucket name being sent is the first-level folder under the bucket.
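What AWS support describes is consistent with how S3 addressing works: in a virtual-hosted-style URL the bucket is the first label of the host name, while in a path-style URL the first path segment is read as the bucket. The sketch below illustrates that reading for the global endpoint only. `interpret_s3_url` is a hypothetical helper, not S3's actual request parser, and the full URLs are an assumption built from the endpoint hosts in the traces and the key in the reproduction code.

```python
from urllib.parse import urlsplit

GLOBAL_SUFFIX = ".s3.amazonaws.com"  # global endpoint only, for illustration


def interpret_s3_url(url):
    """Return (bucket, key) as S3 addressing rules would read the URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path.lstrip("/")
    if host.endswith(GLOBAL_SUFFIX):
        # Virtual-hosted style: the bucket is the leading host label(s).
        return host[: -len(GLOBAL_SUFFIX)], path
    # Path style: the first path segment is taken as the bucket.
    bucket, _, key = path.partition("/")
    return bucket, key


# Host includes the bucket: the whole path is the key.
print(interpret_s3_url("https://xxxx.s3.amazonaws.com/xxx/xxxx.xlsx"))
# ('xxxx', 'xxx/xxxx.xlsx')

# Bucket missing from the host: the first folder is read as the bucket.
print(interpret_s3_url("https://s3.amazonaws.com/xxx/xxxx.xlsx"))
# ('xxx', 'xxxx.xlsx')
```

If the request really is sent in the second form, S3 would evaluate permissions against a bucket named after the folder, which would plausibly produce Access Denied.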

program.txt
failure_log.txt
success_log.txt

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Jul 8, 2024
2 participants