Intermittent Internal Server Error - StatusCode 500 on API Gateway calling Lambda

Question

I have a REST API in AWS API Gateway that invokes a Python Lambda function and returns some result. Most of the time this workflow works fine: the Lambda function runs and passes its result back to the API, which in turn returns a 200 OK response.

However, occasionally I get a 500 status code from the API, and the Lambda function does not even seem to be executed. The response.reason says "Internal Server Error" and no additional information is given.

There is no difference between the failing requests and the successful ones to the API in terms of the method or parameters format.

One more detail: the API has caching enabled. I've seen similar posts; some of the answers mention the format of the JSON object returned by the Lambda function and others point to IAM permission issues, but neither seems to be the cause here. As this post's title says, the behavior is intermittent: most of the time it works fine, but occasionally I get this error.

Any hint would be highly appreciated.

You can enable logging on API Gateway and check the logs; it should give you some idea about the issue.
– nirvana124, Commented Apr 10, 2021 at 4:26
@PankajYadav In fact I did so: I enabled both CloudWatch Logs and Access Logging, but neither provided additional information. Surprisingly, the log entries that correspond to the API request that caused the error don't even look like an error.
– Nicolás García, Commented Apr 10, 2021 at 4:57
You are using exception handling inside your Lambda function, right?
– KnowledgeGainer, Commented May 6, 2021 at 9:09
Exactly, that's how I found out about the error codes. In fact, since my first post I've received some additional errors like ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')), <Response [502]> and <Response [503]>, among others.
– Nicolás García, Commented May 7, 2021 at 13:08

Yves M. · Accepted Answer · 2022-09-05 11:06:44Z

11

I had the same problem. In my case I had to enable "Log full requests/responses data" together with INFO-level logging on the API Gateway stage to see the following log entry:

(xxx) Endpoint response body before transformations: { "Type": "Service", "message": "INFO: Lambda is initializing your function. It will be ready to invoke shortly." }
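If you prefer to switch those two stage settings on from a script rather than the console, something like the following should work for a REST API. This is only a sketch: `enable_full_logging` and `logging_patch_operations` are names made up for illustration, and it assumes the account already has a CloudWatch Logs role configured for API Gateway. Keep in mind that full request/response logging can capture sensitive payloads.

```python
def logging_patch_operations(log_level="INFO", data_trace=True):
    """Build the patch operations that enable execution logging
    for every resource and method ("/*/*") of a REST API stage."""
    return [
        {"op": "replace", "path": "/*/*/logging/loglevel", "value": log_level},
        {
            "op": "replace",
            "path": "/*/*/logging/dataTrace",
            "value": "true" if data_trace else "false",
        },
    ]


def enable_full_logging(rest_api_id, stage_name):
    """Apply the logging settings to one stage (hypothetical helper)."""
    # Imported lazily so the pure helper above works without AWS access.
    import boto3

    client = boto3.client("apigateway")
    return client.update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=logging_patch_operations(),
    )
```

Calling `enable_full_logging("<your-api-id>", "<your-stage>")` with your own identifiers would apply both settings in one request.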

In my case the issue was that the Lambda function was in the Inactive state, which happens if a function remains idle for several weeks.
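To confirm whether this is what you are hitting, you can read the function's state through the Lambda API. A minimal sketch, assuming boto3 and credentials that allow `lambda:GetFunctionConfiguration`; the helper names are mine, not part of any library:

```python
def is_invocable(config):
    """True when the configuration reports the Active state."""
    return config.get("State") == "Active"


def describe_state(config):
    """Render the state, plus the reason when Lambda supplies one."""
    state = config.get("State", "Unknown")
    reason = config.get("StateReason")
    return f"{state}: {reason}" if reason else state


def check_function(function_name):
    """Fetch the live configuration and summarize its state (hypothetical helper)."""
    # Imported lazily so the pure helpers above work without AWS access.
    import boto3

    client = boto3.client("lambda")
    config = client.get_function_configuration(FunctionName=function_name)
    return is_invocable(config), describe_state(config)
```

If `check_function` reports Inactive or Pending around the time of the 500 responses, the log message above is the likely explanation.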

edited Sep 5, 2022 at 11:06

Yves M.


answered Apr 7, 2022 at 10:37

fcracker79


2 Comments

jspinella Over a year ago

What's the recommended way to handle this in code that invokes a Lambda? It's a bummer that AWS has this as an "error" that client libraries like boto3 will throw exceptions for even though the Lambda will run eventually. I almost want to say 202 makes more sense than 500. 503 would be better even, but since the Lambda does eventually invoke, 5xx doesn't make sense to me. Anyway, I guess I would surround the Lambda invocation code in a try-catch.

fcracker79 Over a year ago

"As stated in the announcement post, Lambda precreates the ENIs required for your function to connect to your VPCs, which can take 60 to 90 seconds to complete. We will be changing this process slightly, by creating the required ENI resources while the function is placed in a Pending state and transitioning to Active after that process is completed." In the case of a VPC there is not much you can do, apparently... @jspinella
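On the client side, one common way to cope with transient 5xx responses and dropped connections is to retry with exponential backoff. The sketch below is one possible approach, not an AWS recommendation: `call_with_retry` is a hypothetical helper, `send` is any zero-argument callable returning an object with a `status_code` attribute (for example a `requests.Response`), and it assumes the request is safe to repeat.

```python
import time

# Status codes treated as transient; adjust to taste.
RETRYABLE_STATUS = {500, 502, 503, 504}


def call_with_retry(send, attempts=4, base_delay=1.0,
                    sleep=time.sleep, retry_on=(OSError,)):
    """Call send() until it returns a non-retryable status or attempts run out.

    retry_on defaults to OSError, which covers the built-in connection
    errors; requests.exceptions.RequestException also derives from it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        is_last = attempt == attempts - 1
        try:
            response = send()
        except retry_on:
            if is_last:
                raise  # out of attempts: surface the original error
        else:
            if response.status_code not in RETRYABLE_STATUS or is_last:
                return response
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** attempt)
```

With requests this could be used as `call_with_retry(lambda: requests.get(url, timeout=30))`, where `url` is your API endpoint. When the function is Inactive or Pending, waits of a minute or more may be needed, so the defaults here are only a starting point.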

xav19 · Accepted Answer · 2021-05-06 06:59:34Z

I have the same problem, and I suspect a timeout, possibly caused by the Lambda function reaching its memory limit.

I raised the memory limit to a higher setting (128 MB -> 512 MB) and increased the timeout to 10 s (the default is 3 s), and now I can see the timeout in action. I still have the problem for the moment, but now I'll be able to investigate.
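For reference, the same change can be made with boto3 instead of the console. This is a rough sketch: `sizing_update` and `apply_sizing` are illustrative names, and the defaults simply mirror the values mentioned above.

```python
def sizing_update(function_name, memory_mb=512, timeout_s=10):
    """Build the arguments for update_function_configuration,
    rejecting values outside Lambda's documented limits."""
    if not 128 <= memory_mb <= 10240:
        raise ValueError("memory must be between 128 and 10240 MB")
    if not 1 <= timeout_s <= 900:
        raise ValueError("timeout must be between 1 and 900 seconds")
    return {
        "FunctionName": function_name,
        "MemorySize": memory_mb,
        "Timeout": timeout_s,
    }


def apply_sizing(function_name, memory_mb=512, timeout_s=10):
    """Send the new memory and timeout settings to Lambda (hypothetical helper)."""
    # Imported lazily so the validation above works without AWS access.
    import boto3

    client = boto3.client("lambda")
    return client.update_function_configuration(
        **sizing_update(function_name, memory_mb, timeout_s)
    )
```

After the change, the REPORT lines in the function's CloudWatch log group show the configured memory next to the maximum memory actually used, which helps tell a memory problem from a plain timeout.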

I hope that this helps you.

Thorsten Behrens · Accepted Answer · 2021-05-13 08:16:30Z

I see this with an HTTP API integration. It's intermittent, and it appears to improve when adding provisioned concurrency to the Lambda. For example, on a Lambda that has between 4 and 10 concurrent instances, but usually hovers in the 4 to 8 range, purchasing between 5 and 6 provisioned concurrent instances helped reduce, possibly eliminate, these 500 errors.

I am still monitoring to see whether they are gone for good. The frequency of these errors has gone down drastically with the provisioned instances.
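If you want to try the same thing from code, provisioned concurrency can be set with boto3. A sketch under some assumptions: the helper names are invented for illustration, and the function must have a published version or an alias, since provisioned concurrency cannot be attached to $LATEST.

```python
def provisioned_concurrency_request(function_name, qualifier, instances):
    """Build the arguments for put_provisioned_concurrency_config."""
    if qualifier == "$LATEST":
        raise ValueError("provisioned concurrency needs a version or alias")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,
        "ProvisionedConcurrentExecutions": instances,
    }


def set_provisioned_concurrency(function_name, qualifier, instances):
    """Reserve pre-initialized instances for a version or alias (hypothetical helper)."""
    # Imported lazily so the validation above works without AWS access.
    import boto3

    client = boto3.client("lambda")
    return client.put_provisioned_concurrency_config(
        **provisioned_concurrency_request(function_name, qualifier, instances)
    )
```

Note that the API Gateway integration then has to point at that alias or version rather than the unqualified function, otherwise requests will not use the provisioned instances. Provisioned concurrency is also billed while it is configured.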
