Followed example on sagemaker-examples.readthedocs.io - Pretrained BERT Model


Hi

I just started a new AWS account to test out SageMaker. I followed this example to the letter in SageMaker Studio's JupyterLab: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-script-mode/pytorch_bert/deploy_bert_outputs.html

When trying to get a response, I received this error: Received server error (500) from primary and could not load the entire response body. See https://ap-southeast-2.console.aws.amazon.com/cloudwatch/home?region=ap-southeast-2#logEventViewer:group=/aws/sagemaker/Endpoints/bert-base-2024-06-24-09-15-07 in account 381492025627 for more information.
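For anyone hitting the same error, the message itself carries the region and the CloudWatch log group, so the logs can be pulled from a notebook rather than the console. The following is only a rough sketch: `parse_endpoint_error` and `fetch_log_messages` are hypothetical helper names, the parsing assumes the message has the usual form with a space before "in account", and the fetch step assumes `boto3` is installed and the role is allowed to read CloudWatch Logs.

```python
import re


def parse_endpoint_error(message):
    """Pull the region and log group out of the console URL in the error text.

    Returns None if the message does not contain a logEventViewer URL.
    """
    match = re.search(r"region=([a-z0-9-]+)#logEventViewer:group=(\S+)", message)
    if match is None:
        return None
    return {"region": match.group(1), "log_group": match.group(2)}


def fetch_log_messages(region, log_group, limit=50):
    """Return up to `limit` raw log messages from the endpoint's log group."""
    # Imported lazily so the parsing helper above works without AWS access.
    import boto3

    logs = boto3.client("logs", region_name=region)
    response = logs.filter_log_events(logGroupName=log_group, limit=limit)
    return [event["message"] for event in response["events"]]
```

Typical use would be to pass `str(error)` from the failed `predict` call to `parse_endpoint_error`, then hand the resulting region and log group to `fetch_log_messages`.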

Here are some key logs:

2024-06-24T09:18:00.042Z 2024-06-24 09:17:59,965 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000

2024-06-24T09:18:00.042Z 2024-06-24 09:17:59,965 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000

2024-06-24T09:18:00.042Z 2024-06-24 09:17:59,967 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000

2024-06-24T09:18:00.043Z 2024-06-24 09:17:59,967 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000

2024-06-24T09:18:00.043Z 2024-06-24 09:18:00,025 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080

2024-06-24T09:18:00.043Z 2024-06-24 09:18:00,027 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.

2024-06-24T09:18:00.043Z 2024-06-24 09:18:00,029 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.

2024-06-24T09:18:00.043Z 2024-06-24 09:18:00,030 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.

2024-06-24T09:18:00.043Z Model server started.

2024-06-24T09:18:00.043Z 2024-06-24 09:18:00,032 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,040 [WARN ] pool-2-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,648 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=6e35bcfffe387de4-00000014-00000003-39759a947189b525-c1df4593

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,654 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=6e35bcfffe387de4-00000014-00000004-0250da947189b525-c1cdb6b7

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,656 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=6e35bcfffe387de4-00000014-00000000-5f229a947189b525-e2878da0

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,656 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 545

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,657 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 538

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,657 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 545

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,658 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-1

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,658 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-2

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,658 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-3

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,694 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=6e35bcfffe387de4-00000014-00000001-289a9a947189b525-7337c1c0

2024-06-24T09:18:00.807Z 2024-06-24 09:18:00,695 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 585

2024-06-24T09:18:01.562Z 2024-06-24 09:18:00,696 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-4

2024-06-24T09:22:55.972Z 2024-06-24 09:22:55,866 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 0

2024-06-24T09:22:56.473Z 2024-06-24 09:22:55,866 [INFO ] W-9000-model ACCESS_LOG - /169.254.178.2:40630 "POST /invocations HTTP/1.1" 500 3

2024-06-24T09:23:00.547Z 2024-06-24 09:22:56,238 [INFO ] pool-1-thread-6 ACCESS_LOG - /169.254.178.2:49882 "GET /ping HTTP/1.1" 200 0
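In the logs above, the only sign of the failure is the ACCESS_LOG entry for `POST /invocations` with status 500, while `/ping` still returns 200. In a longer log dump those entries are easy to miss, so here is a small sketch for picking them out. The helper name `failed_requests` is hypothetical, and the pattern is inferred only from the two ACCESS_LOG lines shown here, not from any documented log format.

```python
import re

# Matches entries such as:
#   ACCESS_LOG - /169.254.178.2:40630 "POST /invocations HTTP/1.1" 500 3
# The trailing number is left uninterpreted.
ACCESS_LOG = re.compile(
    r'ACCESS_LOG - \S+ "(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def failed_requests(log_text):
    """Return (method, path, status) for every access-log entry with status >= 400."""
    failures = []
    # finditer also copes with several entries run together on one line.
    for match in ACCESS_LOG.finditer(log_text):
        status = int(match.group("status"))
        if status >= 400:
            failures.append((match.group("method"), match.group("path"), status))
    return failures
```

Because the access log records only the status code, the underlying exception has to be found elsewhere, for example in worker output logged around the same timestamp, if the container emitted any.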

asked 16 days ago, 198 views
1 Answer

Hi,

In this GitHub issue, somebody had exactly the same problem as you and was able to fix it: https://github.com/aws/sagemaker-python-sdk/issues/4395

Please follow the same steps to fix your case. You are right: it seems that the example code is incorrect and needs some changes.

Best,

Didier

AWS EXPERT
answered 15 days ago
AWS EXPERT iBehr
reviewed 15 days ago