To handle large models, I suggest looking into model-as-a-service offerings, or adapting the LLM to the recommendation task itself: https://blog.tensorflow.org/2023/06/augmenting-recommendation-systems-with.html
The successful job is likely explained by the resources of the ml.g5.12xlarge instance: it is the recommended instance type for deploying a 13B-parameter model, and it has enough GPU memory and storage to accommodate the large container image (see the deployment sketch below). https://stackoverflow.com/questions/76968515/i-want-to-deploy-llm-model-on-sagemaker-and-it-is-giving-me-this-error-ive-tri
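As an illustration, here is a minimal sketch of deploying a 13B model to that instance type with the SageMaker Python SDK's JumpStart classes. The model ID is a hypothetical example, and it assumes your execution role and region are already configured; it is not taken from this thread.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Hypothetical JumpStart model ID for a 13B-parameter LLM; substitute the
# model you actually want to benchmark.
model = JumpStartModel(model_id="meta-textgeneration-llama-2-13b")

# Deploy to the GPU instance type recommended for 13B models. This creates a
# real-time, instance-backed endpoint, so the 10 GB serverless image limit
# discussed later in this thread does not apply.
predictor = model.deploy(
    instance_type="ml.g5.12xlarge",
    initial_instance_count=1,
    accept_eula=True,  # some JumpStart models (e.g. Llama 2) are gated by an EULA
)

# Smoke test; the exact payload schema depends on the serving container.
print(predictor.predict({"inputs": "Hello, world"}))
```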
To avoid the benchmark error, you could pause the timing around expensive setup work (see the PauseTiming discussion in the first link) or tailor the SageMaker JumpStart deployment process to your requirements. https://github.com/google/benchmark/issues/920 https://benchmarkdotnet.org/articles/guides/troubleshooting.html
Hello Giovanni,
Thank you for the input. To my understanding, the image-size error stems from the fact that SageMaker Serverless Inference is backed by AWS Lambda, which imposes a 10 GB container image limit. Unfortunately, adapting the LLM for recommendation tasks won't solve this issue: since the combined image and model size exceeds that threshold, the question remains how SageMaker Inference Recommender can be used for most LLMs.
This is the API for creating recommendation jobs: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateInferenceRecommendationsJob.html. It doesn't allow time pausing either (see the request sketch below).
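For reference, a minimal sketch of invoking that API through boto3 follows. The job name, role ARN, and model package ARN are placeholder assumptions, not values from this thread.

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder names and ARNs -- substitute your own resources.
response = sm.create_inference_recommendations_job(
    JobName="llm-recommender-demo",
    # "Default" lets Inference Recommender pick candidate instance types;
    # "Advanced" additionally accepts custom traffic patterns and endpoint
    # configurations, but neither mode pauses timing mid-benchmark.
    JobType="Default",
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    InputConfig={
        # A versioned model package registered in the SageMaker Model Registry;
        # for an LLM, this is where the large container image comes in.
        "ModelPackageVersionArn": (
            "arn:aws:sagemaker:us-east-1:123456789012:"
            "model-package/my-llm-package/1"
        ),
    },
)
print(response["JobArn"])
```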