
UPDATE

The original test code below is largely correct, but in Node.js the various AWS service clients should be set up a bit differently, per the SDK link provided by @Michael-sqlbot:

```javascript
// manager
const AWS = require("aws-sdk")
const https = require('https');
const agent = new https.Agent({
    maxSockets: 498 // workers hit this level; expect plus 1 for the manager instance
});
const lambda = new AWS.Lambda({
    apiVersion: '2015-03-31',
    region: 'us-east-2', // Initial concurrency burst limit = 500
    httpOptions: {    // <--- replace the default of 50 (https) by
        agent: agent  // <--- plugging the modified Agent into the service
    }
})
// NOW begin the manager handler code
```

In planning for a new service, I am doing some preliminary stress testing. After reading about the 1,000 concurrent execution limit per account and the initial burst rate (which in us-east-2 is 500), I was expecting to achieve at least the 500 burst concurrent executions right away. The screenshot below of CloudWatch's Lambda metric shows otherwise. I cannot get past 51 concurrent executions no matter what mix of parameters I try. Here's the test code:

```javascript
// worker
exports.handler = async (event) => {
    // declare sleep promise
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
    // return after one second
    let nStart = new Date().getTime()
    await sleep(1000)
    return new Date().getTime() - nStart; // report the exact ms the sleep actually took
};
```

```javascript
// manager
exports.handler = async (event) => {
    const invokeWorker = async () => {
        try {
            let lambda = new AWS.Lambda() // NO! DO NOT DO THIS, SEE UPDATE ABOVE
            var params = {
                FunctionName: "worker-function",
                InvocationType: "RequestResponse",
                LogType: "None"
            };
            return await lambda.invoke(params).promise()
        } catch (error) {
            console.log(error)
        }
    };

    try {
        let nStart = new Date().getTime()
        let aPromises = []

        // invoke workers
        for (var i = 1; i <= 3000; i++) {
            aPromises.push(invokeWorker())
        }

        // record time to complete spawning
        let nSpawnMs = new Date().getTime() - nStart

        // wait for the workers to ALL return
        let aResponses = await Promise.all(aPromises)

        // sum all the actual sleep times
        const reducer = (accumulator, response) => {
            return accumulator + parseInt(response.Payload)
        };
        let nTotalWorkMs = aResponses.reduce(reducer, 0)

        // show me
        let nTotalET = new Date().getTime() - nStart
        return {
            jobsCount: aResponses.length,
            spawnCompletionMs: nSpawnMs,
            spawnCompletionPct: `${Math.floor(nSpawnMs / nTotalET * 10000) / 100}%`,
            totalElapsedMs: nTotalET,
            totalWorkMs: nTotalWorkMs,
            parallelRatio: Math.floor(nTotalET / nTotalWorkMs * 1000) / 1000
        }
    } catch (error) {
        console.log(error)
    }
};
```

Response:

```json
{
  "jobsCount": 3000,
  "spawnCompletionMs": 1879,
  "spawnCompletionPct": "2.91%",
  "totalElapsedMs": 64546,
  "totalWorkMs": 3004205,
  "parallelRatio": 0.021
}
```

Request ID: "43f31584-238e-4af9-9c5d-95ccab22ae84"

Am I hitting a different limit that I have not mentioned? Is there a flaw in my test code? I was attempting to hit the limit here with 3,000 workers, but there was NO throttling encountered, which I guess is due to the Asynchronous invocation retry behaviour.
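A back-of-envelope check on the response above points straight at the real ceiling (the arithmetic here is editorial, not from the original post): with a true concurrency of C, 3000 one-second jobs give parallelRatio = totalElapsedMs / totalWorkMs ≈ 1 / C, so the observed ratio of 0.021 implies C ≈ 47, right around the SDK's default cap of 50 rather than the 500 burst limit:

```javascript
// Infer the effective concurrency from the manager's timing output.
// totalWorkMs is the sum of all workers' sleep times; totalElapsedMs is
// wall-clock time, so their ratio approximates how many ran in parallel.
const impliedConcurrency = (totalElapsedMs, totalWorkMs) =>
    Math.round(totalWorkMs / totalElapsedMs);

console.log(impliedConcurrency(64546, 3004205)); // 47
```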

Edit: There is no VPC involved on either Lambda; the setting in the select input is "No VPC".

Edit: Showing CloudWatch before and after the fix

[Screenshot: Cannot exceed 51 Concurrent Executions]
[Screenshot: With a fixed Agent, success is achieved]

  • What's the configuration of your AWS Lambda function? Is it in a VPC? Commented Feb 11, 2019 at 15:06
  • "which I guess is due to the Asynchronous invocation retry behaviour." You are using InvocationType: "RequestResponse" -- that means synchronous, not asynchronous, even if your handler is an async function. The service isn't retrying. But, if you are running the invoker as a lambda function, too, then unless that invoker function's container has a lot of CPU cycles available (which you can get by bumping up the memory) it likely does not have the resources to generate, sign, and submit enough simultaneous requests to properly perform the test. Maybe run that in EC2. Commented Feb 11, 2019 at 23:02
  • @Michael-sqlbot ROFL! It's gonna take a bit to recover from that one. So, actually, that is quite handy then, isn't it! Knowing that us-east-2 will only give you 500 on the initial burst, one could set maxSockets to 495 and NEVER WORRY about hitting AWS's throttle! Node queues requests beyond maxSockets, which may cause memory concerns with large payloads, so there's that little gotcha, but it's likely minor. As my tests show, there is negligible performance gain above 1024 MB. OK, rewriting this test now... Commented Feb 12, 2019 at 4:08
  • @Michael-sqlbot - well, your Homer Simpson link has me doing Tim Allen power-tool noises right now! Check out the updated screenshot. The modified code has smashed the parallelRatio from 0.022 to 0.008! That's fun!! If you do a write-up in answer form I can get ya checked. I visit my kid at OSU often but don't quite make it down to "Who Dey" county. The next time I do, I owe you some beers!! Thank you. Commented Feb 12, 2019 at 6:05

2 Answers


There were a number of potential suspects, particularly due to the fact that you were invoking Lambda from Lambda, but your focus on consistently seeing a concurrency of 50 — a seemingly arbitrary limit (and a suspiciously round number) — reminded me that there's an anti-footgun lurking in the JavaScript SDK:

In Node.js, you can set the maximum number of connections per origin. If maxSockets is set, the low-level HTTP client queues requests and assigns them to sockets as they become available.

Here of course, "origin" means any unique combination of scheme + hostname, which in this case is the service endpoint for Lambda in us-east-2 that the SDK is connecting to in order to call the Invoke method, https://lambda.us-east-2.amazonaws.com.

This lets you set an upper bound on the number of concurrent requests to a given origin at a time. Lowering this value can reduce the number of throttling or timeout errors received. However, it can also increase memory usage because requests are queued until a socket becomes available.

...

When using the default of https, the SDK takes the maxSockets value from the globalAgent. If the maxSockets value is not defined or is Infinity, the SDK assumes a maxSockets value of 50.

https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-configuring-maxsockets.html


1 Comment

Yes, the round number '50' just felt like a limit to me. Thanks for the tip.

Lambda concurrency is not the only factor that decides how scalable your functions are. If your Lambda function is running within a VPC, it requires an ENI (Elastic Network Interface), which allows Ethernet traffic to and from the container (Lambda function).

It's possible your throttling occurred because too many ENIs were being requested (50 at a time). You can check this by viewing the logs of the manager Lambda function and looking for an error message when it tries to invoke one of the child containers. If the error looks something like the following, you'll know ENIs are your issue.

Lambda was not able to create an ENI in the VPC of the Lambda function because the limit for Network Interfaces has been reached.

2 Comments

This could be it, but note that Lambda does not require 1 ENI per concurrent invocation/per container -- containers on the same host share the ENI, so at 128MB it's only ~1 ENI per 24 containers. You can estimate the approximate ENI capacity with the following formula: Concurrent executions × (Memory in GB / 3 GB)
Tom, thanks for the input. I have just edited the question to reflect the fact that both worker and manager are not in a VPC.
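For a sense of scale, the estimate quoted in the first comment can be put into a quick sketch (the helper name is mine, not from any AWS API):

```javascript
// Approximate ENI capacity needed, per the comment's formula:
// concurrent executions × (memory in GB / 3 GB), rounded up.
const estimateEnis = (concurrentExecutions, memoryGb) =>
    Math.ceil(concurrentExecutions * (memoryGb / 3));

// At 128 MB, 50 concurrent executions need only about 3 ENIs --
// nowhere near a typical ENI limit, so a cap of exactly 50 points
// elsewhere (here, the SDK's maxSockets default).
console.log(estimateEnis(50, 0.128)); // 3
```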
