Inferentia and Trainium Service Quotas

Content level: Foundational

Understand what service quotas are and how they apply to Inferentia and Trainium instances and endpoints, and see an example of quotas that are appropriate for a proof of concept (POC).

Quotas

Your AWS account has default quotas, formerly referred to as limits, for each AWS service. These quotas are Region-specific, so make sure the console is set to the Region you intend to use before you request an increase.

By default, all quotas are 0 for Inferentia and Trainium. There is no charge for increased quotas.

If only part of your quota request is approved, contact your account team and send them the account number and the ticket number for each request. (If you are just doing a POC, you may not need the full amount you requested.)

EC2

For EC2, a quota is the maximum total number of vCPUs across all running instances in an instance family, not a number of instances. There are separate quotas for Inferentia and Trainium, and separate quotas for Spot and On-Demand.

Be sure to explore Spot Instances for your workload!

As an example, a quota of 192 vCPUs lets you run one inf2.48xlarge, two inf2.24xlarge, six inf2.8xlarge, or forty-eight inf2.xlarge instances. The same quota also covers inf1 instance types.

Similarly for Trainium, a quota of 128 vCPUs lets you run one trn1n.32xlarge or trn1.32xlarge, or sixteen trn1.2xlarge instances.

(Even if you are only interested in inference, a Trainium instance may make sense for models with 70B or more parameters.)
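The arithmetic behind these examples can be sketched in a few lines of Python. This is only an illustration: the vCPU table is limited to the sizes mentioned in this article (the inf2.24xlarge value follows from two of them filling a 192 vCPU quota), and the helper names `max_instances` and `fits` are made up for this sketch.

```python
# vCPUs per instance size, limited to the sizes discussed in this article.
VCPUS = {
    "inf2.xlarge": 4,
    "inf2.8xlarge": 32,
    "inf2.24xlarge": 96,
    "inf2.48xlarge": 192,
    "trn1.2xlarge": 8,
    "trn1.32xlarge": 128,
    "trn1n.32xlarge": 128,
}


def max_instances(quota_vcpus, instance_type):
    """Return how many instances of a single type fit within a vCPU quota."""
    return quota_vcpus // VCPUS[instance_type]


def fits(quota_vcpus, fleet):
    """Check whether a mixed fleet {instance_type: count} stays within the quota."""
    used = sum(VCPUS[name] * count for name, count in fleet.items())
    return used <= quota_vcpus


if __name__ == "__main__":
    for name in ("inf2.48xlarge", "inf2.24xlarge", "inf2.8xlarge", "inf2.xlarge"):
        print(name, max_instances(192, name))
```

Because the quota is counted in vCPUs, a mixed fleet works too: `fits(192, {"inf2.24xlarge": 1, "inf2.8xlarge": 3})` is true, since 96 + 3 × 32 = 192.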

To find the relevant quotas, open the Amazon EC2 section of the Service Quotas console and search for "inf" or "trn".
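If you prefer a script to the console search, the same lookup might be sketched with boto3 and the Service Quotas API. This is a minimal sketch, not an official procedure: it assumes boto3 is installed and credentials are configured, and the function names `matching_quotas` and `fetch_quotas` are invented for this example.

```python
def matching_quotas(quotas, keywords=("inf", "trn")):
    """Keep quotas whose name contains any keyword (case-insensitive)."""
    hits = []
    for quota in quotas:
        name = quota["QuotaName"].lower()
        if any(keyword in name for keyword in keywords):
            hits.append(quota)
    return hits


def fetch_quotas(service_code="ec2", region="us-east-1"):
    """List every quota for one service in one Region via the Service Quotas API."""
    import boto3  # imported here so the filter above works without boto3 installed

    client = boto3.client("service-quotas", region_name=region)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode=service_code):
        quotas.extend(page["Quotas"])
    return quotas
```

With credentials in place, `matching_quotas(fetch_quotas("ec2", "us-west-2"))` would return the quota entries to review; each entry carries `QuotaName`, `QuotaCode`, and the current `Value`. Because quotas are Region-specific, run the lookup once per Region.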

[Screenshots: Inf2 vCPU list; trn1 vCPU list]

For more information on these instances, see:

https://aws.amazon.com/ec2/instance-types/inf2/

https://aws.amazon.com/ec2/instance-types/trn1/

SageMaker

For SageMaker, quotas control the maximum number of instances of each instance type that you can use for endpoints. Unlike EC2, these are instance counts, not vCPU counts.

If you don't know what type you will need, start with one of each size. You may find that a larger or smaller instance is more efficient for you depending on your use case.

To find the relevant quotas, open the Amazon SageMaker section of the Service Quotas console and search for "inf2" or "trn".

[Screenshot: Inf2 SageMaker quotas]

POC example

These are settings recommended for a basic inference POC.

Complete the following steps in both us-east-1 and us-west-2:

(Outside of the US, Inferentia2 is available in São Paulo, Dublin, Frankfurt, Stockholm, London, Paris, Tokyo, Singapore, Mumbai, and Sydney.)

Open the Service Quotas console.
Choose Amazon EC2.
Search for each of the service quotas listed below.
Choose the quota, and then choose Request quota increase.

EC2 POC Example

For Running On-Demand Inf instances, request 1536 vCPUs (each inf2.48xlarge has 192 vCPUs, so this allows eight inf2.48xlarge instances).
For All Inf Spot Instance Requests, request 1536 vCPUs.
For Running On-Demand Trn instances, request 1280 vCPUs (each trn1.32xlarge has 128 vCPUs, so this allows ten trn1.32xlarge instances).
For All Trn Spot Instance Requests, request 1280 vCPUs.
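The requested values are just instance counts multiplied by vCPUs per instance. The sketch below shows that calculation and one possible way to submit the same request with boto3's `request_service_quota_increase`. Treat it as an assumption-laden example: the helper names are invented, and the quota code must be looked up in your own account (the `L-EXAMPLE` value in the usage note is a placeholder, not a real code).

```python
# vCPUs for the largest size in each family, as stated in the list above.
VCPUS_PER_INSTANCE = {"inf2.48xlarge": 192, "trn1.32xlarge": 128}


def vcpus_needed(instance_type, count):
    """Convert a desired instance count into the vCPU value to request."""
    return VCPUS_PER_INSTANCE[instance_type] * count


def build_increase_request(quota_code, instance_type, count, service_code="ec2"):
    """Build the keyword arguments for a Service Quotas increase request."""
    return {
        "ServiceCode": service_code,
        "QuotaCode": quota_code,  # look this up in your account; not hard-coded here
        "DesiredValue": float(vcpus_needed(instance_type, count)),
    }


def submit(request, region):
    """Send one increase request in one Region (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above run without boto3

    client = boto3.client("service-quotas", region_name=region)
    return client.request_service_quota_increase(**request)
```

For example, `build_increase_request("L-EXAMPLE", "inf2.48xlarge", 8)` yields a `DesiredValue` of 1536.0, and `vcpus_needed("trn1.32xlarge", 10)` gives 1280. Since quotas are Region-specific, `submit` would be called once for us-east-1 and once for us-west-2.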

[Screenshot: Inf on-demand example]

SageMaker POC Example

Navigate to AWS Services and select Amazon SageMaker.
Search for inf2 and request 4 of each instance type. (Note: these are instance counts for endpoint usage, not vCPU counts.)
Search for trn and request 4 of ml.trn1.32xlarge for endpoint usage.
Talk to your Solutions Architect about what quotas you might need for SageMaker training.
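As a rough sketch, the SageMaker selection above could be expressed as a filter over quota entries such as those returned by the Service Quotas `list_service_quotas` call for the `sagemaker` service. The naming pattern "<instance type> for endpoint usage" is inferred from the wording of the steps above, and `endpoint_requests` is a hypothetical helper, so verify the names in your own account before relying on it.

```python
ENDPOINT_SUFFIX = "for endpoint usage"


def endpoint_requests(quotas, desired=4):
    """Pick the POC endpoint quotas and pair each with the desired instance count.

    Selects every ml.inf2 size plus ml.trn1.32xlarge, endpoint usage only.
    """
    requests = []
    for quota in quotas:
        name = quota["QuotaName"]
        if not name.endswith(ENDPOINT_SUFFIX):
            continue  # skip training, processing, and other usage types
        instance_type = name.split()[0]
        if instance_type.startswith("ml.inf2.") or instance_type == "ml.trn1.32xlarge":
            requests.append(
                {
                    "ServiceCode": "sagemaker",
                    "QuotaCode": quota["QuotaCode"],
                    "DesiredValue": float(desired),  # instance count, not vCPUs
                }
            )
    return requests
```

Each resulting dictionary matches the keyword arguments of boto3's `request_service_quota_increase`, so the list can be reviewed first and submitted afterwards.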

[Screenshot: Trainium SageMaker quotas]
