Inferentia and Trainium Service Quotas

Content level: Foundational

Learn what service quotas are, how they apply to Inferentia and Trainium instances and SageMaker endpoints, and see an example of quotas appropriate for a proof of concept (POC).

Quotas

Your AWS account has default quotas, formerly referred to as limits, for each AWS service. Quotas are Region-specific, so make sure you are in the intended Region when you request an increase.

By default, all Inferentia and Trainium quotas are 0. Quota increases are free of charge; you pay only for the instances you run.

If only part of your quota request is granted, reach out to your account team with your account number and the ticket number for each request. (If you are only running a POC, you may not need the full amount.)

EC2

For EC2, quotas set the maximum TOTAL number of vCPUs across all running instances in an instance family, not the number of instances. There are separate quotas for Inferentia and Trainium, and separate quotas for Spot and On-Demand Instances.

Be sure to explore Spot Instances for your workload!

As an example, a quota of 192 lets you run a single inf2.48xlarge, two inf2.24xlarge, six inf2.8xlarge, or forty-eight inf2.xlarge instances. It also lets you run inf1 instance types, which draw from the same quota.

Similarly, for Trainium, a quota of 128 lets you run a single trn1.32xlarge or trn1n.32xlarge, or sixteen trn1.2xlarge instances.
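The vCPU arithmetic above can be checked with a short Python sketch. The per-instance vCPU counts are the ones listed on the Inf2 and Trn1 product pages; the helper names `max_instances` and `fits_quota` are hypothetical, chosen for illustration. Because the quota is one shared pool per family, `fits_quota` sums vCPUs across a mixed fleet of sizes.

```python
# vCPUs per instance, as listed on the Inf2 and Trn1 product pages.
INF2_VCPUS = {"inf2.xlarge": 4, "inf2.8xlarge": 32, "inf2.24xlarge": 96, "inf2.48xlarge": 192}
TRN1_VCPUS = {"trn1.2xlarge": 8, "trn1.32xlarge": 128, "trn1n.32xlarge": 128}


def max_instances(quota_vcpus: int, vcpus_per_instance: int) -> int:
    """How many identical instances fit under a vCPU quota."""
    return quota_vcpus // vcpus_per_instance


def fits_quota(quota_vcpus: int, fleet: dict[str, int], vcpu_table: dict[str, int]) -> bool:
    """True if the combined vCPUs of a mixed fleet stay within the quota.

    The quota is a shared pool for the whole family, so different sizes
    draw from the same total.
    """
    used = sum(vcpu_table[itype] * count for itype, count in fleet.items())
    return used <= quota_vcpus


# Print how many of each Inf2 size a 192-vCPU quota allows.
for itype, vcpus in INF2_VCPUS.items():
    print(f"{itype}: {max_instances(192, vcpus)} with a 192-vCPU quota")
```

Mixing sizes works the same way: one inf2.24xlarge (96 vCPUs) plus three inf2.8xlarge (3 x 32 = 96 vCPUs) totals 192, so that fleet also fits a quota of 192.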

(Even if you are only interested in inference, a Trainium instance may make sense for models with 70B or more parameters.)

To find the relevant quotas, open Amazon Elastic Compute Cloud (Amazon EC2) in the Service Quotas console and search for "inf" or "trn".
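The same search can be scripted. This is a sketch assuming boto3 is installed and AWS credentials are configured; it pages through the Service Quotas API's `list_service_quotas` call (service code `ec2`) and filters quota names on the same "inf"/"trn" terms. Note that `list_service_quotas` returns applied values and may omit a quota that has no applied value; `list_aws_default_service_quotas` lists the defaults. The function names are hypothetical.

```python
from typing import Iterable


def matching_quotas(quotas: Iterable[dict], terms: Iterable[str]) -> list[tuple[str, str, float]]:
    """Keep quotas whose name contains any search term (case-insensitive).

    Each item mirrors an entry in the API's "Quotas" list, which carries
    QuotaName, QuotaCode, and Value keys.
    """
    lowered = [t.lower() for t in terms]
    return [
        (q["QuotaName"], q["QuotaCode"], q["Value"])
        for q in quotas
        if any(t in q["QuotaName"].lower() for t in lowered)
    ]


def fetch_quotas(service_code: str, region: str) -> list[dict]:
    """Page through the applied quotas for one service in one Region."""
    import boto3  # imported here so matching_quotas works without boto3 installed

    client = boto3.client("service-quotas", region_name=region)
    quotas: list[dict] = []
    for page in client.get_paginator("list_service_quotas").paginate(ServiceCode=service_code):
        quotas.extend(page["Quotas"])
    return quotas
```

For example, `matching_quotas(fetch_quotas("ec2", "us-east-1"), ["inf", "trn"])` returns the name, code, and current value of each matching quota in us-east-1.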

[Screenshots: Inf2 vCPU quota list and Trn1 vCPU quota list]

For more information on these instances, see:

https://aws.amazon.com/ec2/instance-types/inf2/

https://aws.amazon.com/ec2/instance-types/trn1/

SageMaker

For SageMaker, quotas set the maximum number of instances of each instance type that your endpoints can use. These are instance counts, not vCPU counts.

If you don't know which instance type you will need, start with one of each size. Depending on your use case, you may find that a larger or smaller instance is more efficient.
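Because SageMaker endpoint quotas count instances of each type rather than vCPUs, a planned set of endpoints can be checked by totaling instance counts per type. This is a minimal sketch assuming the quota applies across all endpoints in the account and Region; the endpoint names, plan, and quota values are hypothetical.

```python
from collections import Counter


def instances_needed(endpoints: dict[str, tuple[str, int]]) -> Counter:
    """Total instance count per instance type across planned endpoints.

    `endpoints` maps an endpoint name to (instance type, instance count).
    """
    totals: Counter = Counter()
    for instance_type, count in endpoints.values():
        totals[instance_type] += count
    return totals


def over_quota(endpoints: dict[str, tuple[str, int]], quotas: dict[str, int]) -> dict[str, int]:
    """Return instance types whose planned count exceeds the quota, with the shortfall."""
    return {
        itype: need - quotas.get(itype, 0)
        for itype, need in instances_needed(endpoints).items()
        if need > quotas.get(itype, 0)
    }


# Hypothetical plan: two small endpoints and one large one.
plan = {
    "chat-small": ("ml.inf2.xlarge", 1),
    "chat-small-canary": ("ml.inf2.xlarge", 1),
    "chat-large": ("ml.inf2.48xlarge", 1),
}
# One of each size, as suggested above.
quotas = {t: 1 for t in ("ml.inf2.xlarge", "ml.inf2.8xlarge", "ml.inf2.24xlarge", "ml.inf2.48xlarge")}
print(over_quota(plan, quotas))  # {'ml.inf2.xlarge': 1}
```

Here the two small endpoints together need two ml.inf2.xlarge instances, so a quota of one for that type is one short even though each endpoint uses only a single instance.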

To find the relevant quotas, open Amazon SageMaker in the Service Quotas console and search for "inf2" or "trn".

[Screenshot: Inf2 SageMaker quotas]

POC example

The following settings are recommended for a basic inference POC.

Complete the following steps in both us-east-1 and us-west-2:

(Outside the US, Inferentia2 is also available in Sao Paulo, Dublin, Frankfurt, Stockholm, London, Paris, Tokyo, Singapore, Mumbai, and Sydney.)

  1. Open the Service Quotas console.
  2. Choose Amazon EC2.
  3. Search for and choose each of the service quotas listed below.
  4. Choose Request quota increase and enter the new value.
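The console steps above can also be scripted with the Service Quotas API. This is a sketch assuming boto3 and configured credentials: it looks up each quota code by its display name (so no codes are hard-coded) and then calls `request_service_quota_increase` in each Region. The helper names are hypothetical, and the quota names you pass in should be checked against what the console shows.

```python
def resolve_codes(quotas: list[dict], plan: dict[str, float]) -> dict[str, str]:
    """Map each quota name in `plan` to its QuotaCode, failing loudly on unknown names."""
    by_name = {q["QuotaName"]: q["QuotaCode"] for q in quotas}
    missing = [name for name in plan if name not in by_name]
    if missing:
        raise KeyError(f"quota names not found: {missing}")
    return {name: by_name[name] for name in plan}


def request_increases(service_code: str, plan: dict[str, float],
                      regions: tuple[str, ...] = ("us-east-1", "us-west-2")) -> list[str]:
    """Submit one increase request per quota per Region and return the request IDs."""
    import boto3  # imported here so resolve_codes works without boto3 installed

    request_ids: list[str] = []
    for region in regions:
        client = boto3.client("service-quotas", region_name=region)
        # The defaults listing should include every quota for the service,
        # so it is used only to translate display names into quota codes.
        defaults: list[dict] = []
        paginator = client.get_paginator("list_aws_default_service_quotas")
        for page in paginator.paginate(ServiceCode=service_code):
            defaults.extend(page["Quotas"])
        for name, code in resolve_codes(defaults, plan).items():
            resp = client.request_service_quota_increase(
                ServiceCode=service_code, QuotaCode=code, DesiredValue=float(plan[name])
            )
            request_ids.append(resp["RequestedQuota"]["Id"])
    return request_ids
```

For example, assuming the console names the quota "Running On-Demand Inf instances", `request_increases("ec2", {"Running On-Demand Inf instances": 1536})` would file that request in both Regions. Resolving every name before submitting means a typo stops the run before any request is filed in that Region.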

EC2 POC Example

  1. For Running On-Demand Inf instances, request 1536 vCPUs. (Each inf2.48xlarge has 192 vCPUs, so this allows eight inf2.48xlarge instances.)
  2. For All Inf Spot Instance Requests, request 1536 vCPUs.
  3. For Running On-Demand Trn instances, request 1280 vCPUs. (Each trn1.32xlarge has 128 vCPUs, so this allows ten trn1.32xlarge instances.)
  4. For All Trn Spot Instance Requests, request 1280 vCPUs.
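The EC2 POC values are instance counts multiplied by vCPUs per instance. This sketch expresses the same sizing as data so the counts can be adjusted and the vCPU requests recomputed; the quota names are assumed to match the Service Quotas console and should be confirmed before submitting, and the function name is hypothetical.

```python
# vCPUs per instance (inf2.48xlarge = 192, trn1.32xlarge = 128).
VCPUS = {"inf2.48xlarge": 192, "trn1.32xlarge": 128}

# POC sizing: (instance type, instance count) per quota. Quota names are
# assumed to match the Service Quotas console; confirm them before submitting.
POC_INSTANCES = {
    "Running On-Demand Inf instances": ("inf2.48xlarge", 8),
    "All Inf Spot Instance Requests": ("inf2.48xlarge", 8),
    "Running On-Demand Trn instances": ("trn1.32xlarge", 10),
    "All Trn Spot Instance Requests": ("trn1.32xlarge", 10),
}


def vcpu_requests(sizing: dict[str, tuple[str, int]]) -> dict[str, int]:
    """Convert instances-per-quota into the vCPU value each quota request needs."""
    return {quota: VCPUS[itype] * count for quota, (itype, count) in sizing.items()}


for quota, vcpus in vcpu_requests(POC_INSTANCES).items():
    print(f"{quota}: {vcpus} vCPUs")
```

Changing a count, for example to four trn1.32xlarge instances, recomputes the Trn requests as 4 x 128 = 512 vCPUs.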

[Screenshot: Inf On-Demand quota example]

SageMaker POC Example

  1. In the Service Quotas console, choose AWS services, then choose Amazon SageMaker.
  2. Search for inf2 and request 4 of each instance type for endpoint usage. (Note: these are instance counts, not vCPU counts.)
  3. Search for trn and request 4 of ml.trn1.32xlarge for endpoint usage.
  4. Talk to your Solutions Architect about what quotas you might need for SageMaker training.
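The SageMaker requests are instance counts keyed by quota name. This sketch builds the same plan; it assumes SageMaker names these quotas "<instance type> for endpoint usage", matching the ml.trn1.32xlarge wording above, and takes the inf2 sizes from the EC2 Inf2 lineup. The function names are hypothetical.

```python
INF2_SIZES = ("xlarge", "8xlarge", "24xlarge", "48xlarge")


def endpoint_quota_name(instance_type: str) -> str:
    """Assumed naming pattern for SageMaker endpoint instance quotas."""
    return f"{instance_type} for endpoint usage"


def sagemaker_poc_plan(per_type: int = 4) -> dict[str, int]:
    """Instance-count requests: every inf2 size plus ml.trn1.32xlarge."""
    types = [f"ml.inf2.{size}" for size in INF2_SIZES] + ["ml.trn1.32xlarge"]
    return {endpoint_quota_name(t): per_type for t in types}


for name, count in sagemaker_poc_plan().items():
    print(f"{name}: {count} instances")
```

Because these values are instance counts, each one could be used directly as the `DesiredValue` of a `request_service_quota_increase` call with service code `sagemaker`, repeated in each Region.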

[Screenshot: Trainium SageMaker quotas]

AWS Expert, published 2 months ago