Kinesis Data Stream via API Gateway: Observed 100k+ Records/Second. Investigating Unexpected Throughput.


I'm observing unexpectedly high throughput for my Kinesis Data Stream integration with API Gateway, and I'm seeking clarification on the metrics and possible explanations.

Setup:

  • Using Kinesis Data Stream with API Gateway REST API integration for PutRecords
  • Kinesis Data Stream: 8 shards, provisioned mode
  • API Gateway: Burst limit and rate limit both set to 5000

Expected behavior: Based on my understanding, each Kinesis shard can process 1000 records per second. With 8 shards, I expected a maximum throughput of 8000 records per second.
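The expected ceiling comes from the per-shard write limits of a provisioned Kinesis Data Stream: 1,000 records per second or 1 MiB per second per shard, whichever is reached first. Below is a minimal Python sketch of that arithmetic. The helper name and the average record sizes are illustrative assumptions, not part of the setup above.

```python
# Per-shard write limits for a provisioned Kinesis Data Stream.
RECORDS_PER_SHARD_PER_SEC = 1_000
BYTES_PER_SHARD_PER_SEC = 1024 * 1024  # 1 MiB


def max_records_per_second(shards: int, avg_record_bytes: int) -> int:
    """Hypothetical helper: write ceiling in records/s for `shards` shards.

    The stream is capped by whichever per-shard limit binds first:
    the record count or the byte throughput.
    """
    by_count = shards * RECORDS_PER_SHARD_PER_SEC
    by_bytes = shards * BYTES_PER_SHARD_PER_SEC // avg_record_bytes
    return min(by_count, by_bytes)


# 8 shards with small (assumed 200-byte) records: the record-count limit binds.
print(max_records_per_second(8, 200))  # 8000
```

With larger records (for example an assumed 2 KiB each), the byte limit binds first and the ceiling drops below 8,000 records per second.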

Observed behavior:

  • CloudWatch metrics for API Gateway's Count (1-second sum) exceed 100,000
  • Kinesis PutRecords' Records sum (1-second sum) also exceeds 100,000
  • No rate limiting or throttling events recorded for either Kinesis or API Gateway during this period

Questions:

  1. Am I misinterpreting the meaning of these metrics?
  2. How is it possible to achieve such high throughput given the setup described above?
  3. Are there any factors I'm overlooking that could explain this behavior?

I've attached screenshots of the relevant CloudWatch metrics for reference.

Any insights or explanations would be greatly appreciated. Thank you in advance for your help!

Kinesis PutRecords.Records: [screenshot]

API GW Count: [screenshot]

Kinesis Throttling History: [screenshot]

1 Answer
Accepted Answer

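API Gateway publishes its metrics to CloudWatch once per minute (see the API Gateway CloudWatch metrics documentation), and so does Kinesis Data Streams (see the Kinesis Data Streams CloudWatch metrics documentation).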

So selecting a period shorter than 1 minute is meaningless. It returns the same value you would get with a 1-minute period.

When you select the Sum statistic, CloudWatch adds up all values reported within the selected period. If that period is 1 minute or less, it covers only a single one-minute datapoint, so the Sum is the total count for that entire minute. For a count metric such as API Gateway's Count, Sum therefore equals SampleCount, and both give the total for the minute, not for one second.
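You can confirm the one-minute granularity by requesting both Sum and SampleCount at the native 60-second period. The sketch below assumes a boto3 CloudWatch client (`boto3.client("cloudwatch")`). The helper name and the stream name `my-stream` are hypothetical. The client is passed in as a parameter, so the function itself does not import boto3.

```python
from datetime import datetime, timedelta, timezone


def fetch_minute_datapoints(cloudwatch, stream_name, minutes=15):
    """Return (timestamp, Sum, SampleCount) for PutRecords.Records per minute.

    `cloudwatch` is assumed to be a boto3 CloudWatch client. Period=60
    matches the one-minute interval at which Kinesis publishes
    stream-level metrics, so each datapoint is a per-minute total.
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Kinesis",
        MetricName="PutRecords.Records",
        Dimensions=[{"Name": "StreamName", "Value": stream_name}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Sum", "SampleCount"],
    )
    # Datapoints are not guaranteed to be ordered; sort them by time.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Sum"], p["SampleCount"]) for p in points]
```

Usage, assuming credentials are configured: `fetch_minute_datapoints(boto3.client("cloudwatch"), "my-stream")`. Each returned Sum is a total for one minute, not a per-second rate.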

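To get the per-second rate, divide the value by 60. In your example, the average rate for both Kinesis PutRecords.Records and API Gateway Count is about 100,000 / 60 ≈ 1,667 per second. That is well below both the 8,000 records per second that 8 shards support and your API Gateway rate limit of 5,000 requests per second.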
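The conversion and the capacity check fit in a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and it considers only the record-count limit, not the per-shard byte limit. The result is an average over the minute, so short bursts within that minute can still exceed it.

```python
PUBLISH_INTERVAL_SECONDS = 60  # one datapoint per minute for these metrics


def per_second_rate(minute_sum: float) -> float:
    """Average per-second rate implied by a one-minute Sum datapoint."""
    return minute_sum / PUBLISH_INTERVAL_SECONDS


def within_capacity(minute_sum: float, shards: int, records_per_shard: int = 1_000) -> bool:
    """True if the average rate fits the shards' record-count write limit."""
    return per_second_rate(minute_sum) <= shards * records_per_shard


print(round(per_second_rate(100_000)))      # 1667
print(within_capacity(100_000, shards=8))  # True
```

By the same arithmetic, the 8-shard stream would reach its 8,000 records/s average only at a one-minute Sum of 480,000.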

Answered 6 days ago by an AWS Expert. Reviewed 5 days ago by an Expert.