SageMaker Inference Recommender: Image size is greater than supported size 10GB


Hello,

I am trying to run a suite of inference recommendation jobs on a set of GPU instances (ml.g5.12xlarge, ml.g5.8xlarge, ml.g5.16xlarge) as well as AWS Inferentia machines (ml.inf2.2xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge).

The following parameters customize each job:

  • SAGEMAKER_MODEL_SERVER_WORKERS = 1

  • OMP_NUM_THREADS = 3

  • JobType = Default (not Advanced)
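For reference, a minimal sketch of how a Default job with these parameters can be assembled (the job name, model name, and role ARN are placeholders, and the environment variables shown are the ones I set on the model's serving container; the request shape follows the CreateInferenceRecommendationsJob API reference):

```python
import json

# Env vars set on the model's serving container (the recommender job itself
# only references the model and lists the instance types to benchmark).
model_environment = {
    "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
    "OMP_NUM_THREADS": "3",
}

# Sketch of the job request; all names/ARNs below are placeholders.
request = {
    "JobName": "llm-recommendation-default",
    "JobType": "Default",  # not "Advanced"
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
    "InputConfig": {
        "ModelName": "my-llm-model",  # model created with model_environment above
        "ContainerConfig": {
            "SupportedInstanceTypes": [
                "ml.g5.8xlarge", "ml.g5.12xlarge", "ml.g5.16xlarge",
                "ml.inf2.2xlarge", "ml.inf2.8xlarge", "ml.inf2.24xlarge",
            ],
        },
    },
}
print(json.dumps(request, indent=2))

# With boto3 installed and AWS credentials configured, the job is created with:
# boto3.client("sagemaker").create_inference_recommendations_job(**request)
```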

Multiple jobs are spawned for each instance type (as shown on the Inference Recommender page in SageMaker):

  • ml.g5.8xlarge, ml.g5.16xlarge, ml.inf2.2xlarge - 1 job

    • All fail with error: Image size 12399514599 is greater than supported size 10737418240
  • ml.inf2.24xlarge - 2 jobs

    • 1 job fails with error: Image size 12399514599 is greater than supported size 10737418240
    • 1 job fails with "Benchmark failed to finish within job duration"
  • ml.inf2.8xlarge - 3 jobs

    • 2 jobs fail with error: Image size 12399514599 is greater than supported size 10737418240
    • 1 job fails with "Benchmark failed to finish within job duration"
  • ml.g5.12xlarge - 4 jobs

    • 3 jobs fail with error: Image size 12399514599 is greater than supported size 10737418240
    • 1 job completes successfully!

Since the models I am experimenting with are LLMs, their size combined with the associated image exceeds the 10GB threshold discussed in this community question.

My questions are:

  • How can one use the Inference Recommender service for LLMs, considering they routinely exceed the 10GB AWS Lambda threshold?
  • Why does 1 job complete successfully on ml.g5.12xlarge when the remaining jobs (for this and the other instance types) fail with the image size error?
  • How does one avoid the "Benchmark failed to finish within job duration" error?
1 Answer

I suggest using model-as-a-service features or adapting LLMs for recommendation tasks to handle large models. https://blog.tensorflow.org/2023/06/augmenting-recommendation-systems-with.html

The successful job might be due to the specific configuration of the ml.g5.12xlarge instance. It’s recommended to use the ml.g5.12xlarge instance type for deploying a 13B parameter model. This instance type might have more resources available to handle the large image size. https://stackoverflow.com/questions/76968515/i-want-to-deploy-llm-model-on-sagemaker-and-it-is-giving-me-this-error-ive-tri

To avoid the benchmark error, you could use the PauseTiming function or tailor the SageMaker JumpStart deployment process to your requirements. https://github.com/google/benchmark/issues/920 https://benchmarkdotnet.org/articles/guides/troubleshooting.html
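One knob that may help with the duration error (my reading of the CreateInferenceRecommendationsJob API reference, not something confirmed in this thread): InputConfig accepts a JobDurationInSeconds field, so raising it could give slow-loading LLM benchmarks more time to finish. A sketch, with a placeholder model name and an arbitrary duration:

```python
import json

# Sketch: extend the overall job duration so slow LLM benchmarks can finish.
# JobDurationInSeconds is a member of InputConfig per the
# CreateInferenceRecommendationsJob API reference; 7200 is an arbitrary example.
input_config = {
    "ModelName": "my-llm-model",   # placeholder
    "JobDurationInSeconds": 7200,  # 2 hours
}
print(json.dumps(input_config, indent=2))
```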

EXPERT
answered 6 days ago
  • Hello Giovanni,

    Thank you for the input. To my understanding, the image size error stems from the fact that SageMaker Serverless is backed by AWS Lambda (which comes with the 10GB limitation). Unfortunately, adapting the LLM for recommendation tasks won't solve this issue. As the image + model size exceeds this threshold, the question is how can SageMaker Inference Recommender be used for most LLMs?

    This is the API for creating recommendation jobs: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateInferenceRecommendationsJob.html, which doesn't allow time pausing either.
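One workaround that may be worth testing (an assumption on my part based on the API reference, not verified in this thread): the job's ContainerConfig accepts a SupportedEndpointType field, and pinning it to "RealTime" should keep the recommender off the Lambda-backed serverless path that imposes the 10GB image cap. A sketch:

```python
import json

# Sketch: hint the recommender to benchmark real-time endpoints only,
# avoiding the serverless (Lambda-backed) path and its 10GB image limit.
# Field names follow the CreateInferenceRecommendationsJob API reference.
container_config = {
    "SupportedEndpointType": "RealTime",          # "RealTime" | "Serverless"
    "SupportedInstanceTypes": ["ml.g5.12xlarge"],
}
print(json.dumps(container_config, indent=2))
```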