I have been working with AWS Lambda (serverless) and I am now running into third-party rate limits.
Context
A Lambda invocation triggered by an event:
calls a third-party API 4 times,
runs some logic,
writes to my database.
As volume increased, we started getting 429s from the third party.
How I handle it today
Lambda is triggered by SQS.
On failures I use partial batch item failures so SQS will retry with exponential backoff.
Inside the Lambda I only retry when I see a 429 from the third party.
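For reference, the handling above looks roughly like this. The handler shape follows Lambda's partial batch response contract for SQS (`batchItemFailures` / `itemIdentifier`); `callVendor` is a hypothetical stand-in for the third-party call:

```typescript
// Sketch of an SQS-triggered handler using partial batch item failures:
// only failed messages are reported back, so SQS redelivers just those
// according to the queue's redrive/backoff settings.
type SQSRecord = { messageId: string; body: string };
type SQSEvent = { Records: SQSRecord[] };
type SQSBatchResponse = { batchItemFailures: { itemIdentifier: string }[] };

// Hypothetical vendor call; throws on 429s and transport errors.
async function callVendor(body: string): Promise<void> {
  if (body === "boom") throw new Error("vendor error");
}

async function handler(event: SQSEvent): Promise<SQSBatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      await callVendor(record.body);
    } catch {
      // Report only this message as failed; SQS retries it, the rest succeed.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```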
Problems I am having now
This is reactive: it depends on the vendor returning a clean 429. Under load we also see transport errors (e.g., socket hang-ups) that don't come back as 429s, so those messages slip through to my dead-letter queue.
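That gap can be narrowed by classifying transport-level failures as retryable too, not just clean 429s. A minimal sketch of such a predicate; the error shapes (`status` on HTTP errors, `code` on Node transport errors) are assumptions to adapt to your HTTP client:

```typescript
// Common Node.js transport error codes worth sending back to SQS for
// backoff instead of letting them fall through to the dead-letter queue.
const RETRYABLE_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "EPIPE", "ECONNABORTED"]);

// Assumed error shape: HTTP errors carry `status`, transport errors carry
// `code`; some clients only surface "socket hang up" in the message.
function isRetryable(err: { status?: number; code?: string; message?: string }): boolean {
  if (err.status === 429) return true;                             // explicit rate limit
  if (err.status !== undefined && err.status >= 500) return true;  // vendor-side failure
  if (err.code !== undefined && RETRYABLE_CODES.has(err.code)) return true;
  if (err.message !== undefined && /socket hang up/i.test(err.message)) return true;
  return false;                                                    // e.g. 4xx client errors: don't retry
}
```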
I tried using an internal in-process queue (e.g., p-queue) to serialize calls, but that only works inside one invocation. With multiple Lambdas running concurrently (driven by SQS), each process has its own queue, so collectively they still exceed the vendor's global limit.
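The per-process limitation is easy to see with a minimal stand-in for an in-process limiter (this models what p-queue's `concurrency` option enforces, without the dependency): two invocations each honouring a local cap of 3 still reach 6 calls in flight globally.

```typescript
// Minimal in-process concurrency limiter: tryAcquire() succeeds while
// fewer than `limit` calls are in flight *in this one process*.
class LocalLimiter {
  private active = 0;
  constructor(private readonly limit: number) {}
  tryAcquire(): boolean {
    if (this.active >= this.limit) return false;
    this.active += 1;
    return true;
  }
  release(): void { this.active -= 1; }
  get inFlight(): number { return this.active; }
}

// Two concurrent Lambda invocations, each with its own independent limiter.
const invocationA = new LocalLimiter(3);
const invocationB = new LocalLimiter(3);

// Each invocation tries to start 5 calls; each local cap holds at 3...
for (let i = 0; i < 5; i++) { invocationA.tryAcquire(); invocationB.tryAcquire(); }

// ...but globally they are at 6 in flight, double the vendor's cap of 3.
const globalInFlight = invocationA.inFlight + invocationB.inFlight; // 6
```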
A cache like Redis won't solve it either, because each invocation is isolated and I'm updating different entities per call, so there is nothing to reuse between calls.
What I want is a cross-invocation limiter that guarantees we never exceed the third-party API's limits across all concurrent Lambdas, preferably something serverless-friendly. The limits I need to take into account for the API are:
1. can't do more than 60 calls per minute,
2. can't do more than 5000 calls per day,
3. can't have more than 3 calls resolving concurrently.
What are the best options for this? I thought of using a centralised DynamoDB table with a counter for each limit, but enforcing that no more than 3 calls resolve at the same time is where that approach is giving me trouble.
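For the concurrency-of-3 part specifically, the usual trick with DynamoDB is a conditional increment: acquire a slot with an `UpdateItem` guarded by a `ConditionExpression` like `in_flight < :max`, and decrement on completion (plus a timestamp/TTL to reap slots leaked by crashed invocations, which this sketch omits). The same conditional write can maintain the per-minute and per-day counters keyed by time window. Below is a sketch of that logic against an in-memory stand-in for the table; the item shape and attribute names are assumptions, not a finished design:

```typescript
// In-memory stand-in for a single DynamoDB item holding all three limits.
// In DynamoDB, each check-and-increment below becomes one UpdateItem whose
// ConditionExpression makes the check and the write atomic, even across
// concurrent invocations.
interface LimiterItem {
  inFlight: number;      // concurrent calls (cap 3)
  minuteWindow: string;  // e.g. "2024-05-01T10:07" — keys the minute counter
  minuteCount: number;   // calls this minute (cap 60)
  dayWindow: string;     // e.g. "2024-05-01" — keys the daily counter
  dayCount: number;      // calls today (cap 5000)
}

const LIMITS = { concurrent: 3, perMinute: 60, perDay: 5000 };

function tryAcquire(item: LimiterItem, now: Date): boolean {
  const minute = now.toISOString().slice(0, 16);
  const day = now.toISOString().slice(0, 10);
  // Roll the fixed windows over when the minute/day changes.
  if (item.minuteWindow !== minute) { item.minuteWindow = minute; item.minuteCount = 0; }
  if (item.dayWindow !== day) { item.dayWindow = day; item.dayCount = 0; }
  // All three conditions must hold; in DynamoDB a failed condition raises
  // ConditionalCheckFailedException, and the caller returns the message to
  // SQS (via a batch item failure) so it retries with backoff.
  if (item.inFlight >= LIMITS.concurrent) return false;
  if (item.minuteCount >= LIMITS.perMinute) return false;
  if (item.dayCount >= LIMITS.perDay) return false;
  item.inFlight += 1;
  item.minuteCount += 1;
  item.dayCount += 1;
  return true;
}

function release(item: LimiterItem): void {
  item.inFlight = Math.max(0, item.inFlight - 1); // in DynamoDB: ADD in_flight -1
}
```

The key property is that rejection is cheap and safe: when the conditional write fails, the Lambda doesn't call the vendor at all and simply hands the message back to SQS, so the backoff you already have does the waiting.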