Connectivity Issues with AWS EBS CSI While Setting Up Kubernetes Cluster and Deploying StatefulSet on AWS


I've set up a self-managed Kubernetes cluster on AWS and I'm using the AWS EBS CSI driver to manage volumes. After configuring AWS credentials (Access Key and Secret Key) and storing them in a Kubernetes Secret, I ran into connectivity issues when the driver tries to reach the AWS API. I've verified that the credentials and configuration are correct, but the problem persists, and I'd appreciate some assistance.
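For reference, the Secret was created roughly like this (names follow the aws-ebs-csi-driver docs; actual key values redacted):

kubectl create secret generic aws-secret \
  --namespace kube-system \
  --from-literal "key_id=<AWS_ACCESS_KEY_ID>" \
  --from-literal "access_key=<AWS_SECRET_ACCESS_KEY>"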

The issue surfaced when I was deploying a StatefulSet. The deployment fails with the error "AttachVolume.Attach failed for volume "postgres-pv" : volume attachment is being deleted". To investigate further, I checked the logs of the ebs-csi-controller and found the following error: "Could not detach volume "vol-aaaaaaaaaaa" from node "i-bbbbbbbbbb": error listing AWS instances: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , canceled, context deadline exceeded". I then tested the connection to the AWS API using curl -v https://ec2.ap-northeast-1.amazonaws.com and received a "301 Moved Permanently" response. Could you please advise on how to resolve this problem?

1 Answer

Hello,

1. Check IAM Permissions: Ensure your IAM role/user has the necessary permissions for EC2 and EBS operations.
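For example, you can sanity-check the same credentials outside the cluster with the AWS CLI (assuming the same Access Key/Secret Key pair is exported in your environment):

aws sts get-caller-identity
aws ec2 describe-instances --region ap-northeast-1 --max-items 1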

2. Verify Kubernetes Secret: Confirm the AWS credentials are correctly stored in a Kubernetes Secret and accessible to the CSI driver.
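For example (the Secret name aws-secret is an assumption; adjust to your setup):

kubectl get secret aws-secret -n kube-system
kubectl describe deployment ebs-csi-controller -n kube-system | grep -i -A 2 secret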

3. Network Configuration: Ensure your Kubernetes nodes have outbound internet access and can reach the AWS EC2 API.
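Note that node-level connectivity does not guarantee pod-level connectivity, so it is worth testing from inside a pod as well (curlimages/curl is just a convenient image choice):

kubectl run -it --rm nettest --image=curlimages/curl --restart=Never --command -- \
  curl -v --max-time 10 https://ec2.ap-northeast-1.amazonaws.com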

4. Examine CSI Driver Logs: Check the logs of the EBS CSI driver for detailed error messages using:

kubectl logs -n kube-system <ebs-csi-driver-pod> -c ebs-plugin

5. Correct AWS API Endpoint: Make sure the driver uses the correct regional endpoint (here, https://ec2.ap-northeast-1.amazonaws.com).
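If the controller cannot determine the region automatically (for example, when instance metadata is unreachable), you can set it explicitly; the AWS SDK reads the AWS_REGION environment variable (deployment and container names assume a default install):

kubectl set env deployment/ebs-csi-controller -n kube-system -c ebs-plugin AWS_REGION=ap-northeast-1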

6. DNS Resolution: Verify DNS resolution inside your cluster with:

kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh
nslookup ec2.ap-northeast-1.amazonaws.com

7. Timeout Issues: Investigate any network connectivity issues or timeouts indicated by the "context deadline exceeded" errors; see the extra check below.
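As an extra check, confirm which node the ebs-csi-controller pod is scheduled on, since egress may work from some nodes but not others (the label assumes the standard manifests):

kubectl get pods -n kube-system -l app=ebs-csi-controller -o wide

Hopefully these steps help.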

  • Thank you for your suggestions. Here's what I have checked and encountered:

    1. Check IAM Permissions: My IAM user/role has the necessary permissions, based on the example policy at https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/example-iam-policy.json

    2. Verify Kubernetes Secret: The credentials are correctly stored in a Kubernetes Secret and referenced in the ebs-csi-controller Deployment.

    3. Network Configuration: I tested the AWS EC2 endpoint from the node using curl -v https://ec2.ap-northeast-1.amazonaws.com, which returned a 301, so basic connectivity from the node looks fine.

    4. Examine CSI Driver Logs: The logs show the following error:

    "error executing batch" err="error listing AWS instances: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , canceled, context deadline exceeded"
    
    5. DNS Resolution: I tested DNS resolution using nslookup ec2.ap-northeast-1.amazonaws.com, which returned the error "connection timed out; no servers could be reached".
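    The CoreDNS checks I ran were along these lines (names assume a standard CoreDNS/kube-dns setup):

    kubectl get svc,endpoints -n kube-system -l k8s-app=kube-dns
    kubectl get pods -n kube-system -l k8s-app=kube-dns
    kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50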

    The CoreDNS service seems to be running fine, and there are no obvious issues in its logs. The root cause of the problem is still unclear. Thank you for your help so far; do you have any further suggestions or other directions to investigate?