SageMaker Inference Recommender: Image size is greater than supported size 10GB


Hello,

I am trying to run a suite of inference recommendation jobs on a set of GPU instances (ml.g5.12xlarge, ml.g5.8xlarge, ml.g5.16xlarge) as well as AWS Inferentia machines (ml.inf2.2xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge).

The following parameters customize each job:

  • SAGEMAKER_MODEL_SERVER_WORKERS = 1

  • OMP_NUM_THREADS = 3

  • JobType = Default (not Advanced)
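For reference, a minimal sketch of how a Default job with these parameters can be assembled (the job name, model name, and role ARN are placeholders, and the environment variables shown are the ones I set on the model's serving container; the request shape follows the CreateInferenceRecommendationsJob API reference):

```python
import json

# Env vars set on the model's serving container (the recommender job itself
# only references the model and lists the instance types to benchmark).
model_environment = {
    "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
    "OMP_NUM_THREADS": "3",
}

# Sketch of the job request; all names/ARNs below are placeholders.
request = {
    "JobName": "llm-recommendation-default",
    "JobType": "Default",  # not "Advanced"
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
    "InputConfig": {
        "ModelName": "my-llm-model",  # model created with model_environment above
        "ContainerConfig": {
            "SupportedInstanceTypes": [
                "ml.g5.8xlarge", "ml.g5.12xlarge", "ml.g5.16xlarge",
                "ml.inf2.2xlarge", "ml.inf2.8xlarge", "ml.inf2.24xlarge",
            ],
        },
    },
}
print(json.dumps(request, indent=2))

# With boto3 installed and AWS credentials configured, the job is created with:
# boto3.client("sagemaker").create_inference_recommendations_job(**request)
```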

Multiple jobs are spawned for each instance type (as shown on the Inference Recommender page in SageMaker):

  • ml.g5.8xlarge, ml.g5.16xlarge, ml.inf2.2xlarge - 1 job

    • All fail with error: Image size 12399514599 is greater than supported size 10737418240
  • ml.inf2.24xlarge - 2 jobs

    • 1 job fails with error: Image size 12399514599 is greater than supported size 10737418240
    • 1 job fails with "Benchmark failed to finish within job duration"
  • ml.inf2.8xlarge - 3 jobs

    • 2 jobs fail with error: Image size 12399514599 is greater than supported size 10737418240
    • 1 job fails with "Benchmark failed to finish within job duration"
  • ml.g5.12xlarge - 4 jobs

    • 3 jobs fail with error: Image size 12399514599 is greater than supported size 10737418240
    • 1 job completes successfully!

Since the models I am experimenting with are LLMs, their size combined with the associated image exceeds the 10GB threshold discussed in this community question.

My questions are:

  • How can one use the Inference Recommender service for LLMs, considering they routinely exceed the 10GB AWS Lambda threshold?
  • Why does 1 job complete successfully on ml.g5.12xlarge when the remaining jobs (for this and the other instance types) fail with the image size error?
  • How does one avoid the "Benchmark failed to finish within job duration" error?
1 Answer

I suggest using model-as-a-service features or adapting LLMs for recommendation tasks to handle large models. https://blog.tensorflow.org/2023/06/augmenting-recommendation-systems-with.html

The successful job might be due to the specific configuration of the ml.g5.12xlarge instance. It’s recommended to use the ml.g5.12xlarge instance type for deploying a 13B parameter model. This instance type might have more resources available to handle the large image size. https://stackoverflow.com/questions/76968515/i-want-to-deploy-llm-model-on-sagemaker-and-it-is-giving-me-this-error-ive-tri

To avoid the benchmark error, you could use the PauseTiming function or tailor the SageMaker JumpStart deployment process to your requirements. https://github.com/google/benchmark/issues/920 https://benchmarkdotnet.org/articles/guides/troubleshooting.html
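One knob that may help with the duration error (my reading of the CreateInferenceRecommendationsJob API reference, not something confirmed in this thread): InputConfig accepts a JobDurationInSeconds field, so raising it could give slow-loading LLM benchmarks more time to finish. A sketch, with a placeholder model name and an arbitrary duration:

```python
import json

# Sketch: extend the overall job duration so slow LLM benchmarks can finish.
# JobDurationInSeconds is a member of InputConfig per the
# CreateInferenceRecommendationsJob API reference; 7200 is an arbitrary example.
input_config = {
    "ModelName": "my-llm-model",   # placeholder
    "JobDurationInSeconds": 7200,  # 2 hours
}
print(json.dumps(input_config, indent=2))
```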

EXPERT
answered 6 days ago
  • Hello Giovanni,

    Thank you for the input. To my understanding, the image size error stems from the fact that SageMaker Serverless is backed by AWS Lambda (which comes with the 10GB limitation). Unfortunately, adapting the LLM for recommendation tasks won't solve this issue. As the image + model size exceeds this threshold, the question is how can SageMaker Inference Recommender be used for most LLMs?

    This is the API for creating recommendation jobs: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateInferenceRecommendationsJob.html, which doesn't allow time pausing either.
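One workaround that may be worth testing (an assumption on my part based on the API reference, not verified in this thread): the job's ContainerConfig accepts a SupportedEndpointType field, and pinning it to "RealTime" should keep the recommender off the Lambda-backed serverless path that imposes the 10GB image cap. A sketch:

```python
import json

# Sketch: hint the recommender to benchmark real-time endpoints only,
# avoiding the serverless (Lambda-backed) path and its 10GB image limit.
# Field names follow the CreateInferenceRecommendationsJob API reference.
container_config = {
    "SupportedEndpointType": "RealTime",          # "RealTime" | "Serverless"
    "SupportedInstanceTypes": ["ml.g5.12xlarge"],
}
print(json.dumps(container_config, indent=2))
```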