I have a question about AWS Step Functions.

I have a function that watches and updates data in databases. Because we can have anywhere from 1,000 to 1,000,000 items to update, I would like to process them in batches of 10,000 or 100,000 with a Lambda.

Ideally, the batches would run in parallel, so that all the data is updated at the same time and the batches finish together.

To do that, I would like to create a Lambda function that uses aws-sdk to create a parallel step function with X tasks, where each task handles 10,000 or 100,000 items of the database.

But when I read the aws-sdk documentation, it looks like there is no way to create a parallel step function, even from a template.

So my question is: is it possible to create a parallel step function from a Lambda function with aws-sdk? Or do you have a better solution to my problem?

Thanks in advance

Update: To give you more information, my problem is that I'll have to update or insert an unknown amount of data in my DB on the first day of each month, and I need to call an API that takes 15 seconds to return the data (it's not our API, so I cannot try to improve its response time).

If I just use a Lambda function, it will time out after 15 minutes.

So I thought of using Step Functions to execute the Lambda function for each item, but if we have a lot of data, it may take more than 24 hours. I would like to find a solution where I can execute my Lambda function in parallel to reduce the total time, so I thought about the parallel task of Step Functions.

But because the amount of data will change every month, I don't know how to dynamically increase or decrease the number of branches in my step function, and that's why I thought of generating my step function from another Lambda.

2 Comments
  • That API takes 15 seconds to return some data; what's the format of the response? JSON? Line-delimited JSON? Commented Jun 3, 2021 at 14:32
  • It's plain text, but we are working on a script to return it as JSON. Commented Jun 3, 2021 at 14:43

1 Answer

I have a function to watch and update data in databases.

I suppose what you need to watch is some kind of user/data event? What do you watch, and what do you update?

Can you provide more info before I can give you some architectural suggestions?

By the way, it is Step Functions that orchestrates/invokes Lambda functions, not the other way around.


Updated answer:

So you seem to be hitting the 15-minute hard limit on Lambda's maximum execution time. There are 3 approaches I can see:

  1. Instead of using a Lambda function, use an ECS container or EC2 instance to handle the large volume of data processing and database writing. However, this requires a substantial code rewrite and infrastructure/architectural change.

  2. Figure out a way to break down the input data so you can fan out the handling to multiple Lambda function instances, i.e.: input data -> Lambda to break down the task -> SQS messages -> Lambda to handle each task. My concern is that the task of breaking down the input data may itself need substantial time.

  3. Before the Lambda execution times out, mark the current processed position and invoke the same Lambda function with the original event + position offset. The next Lambda instance picks up the data processing from where the previous execution stopped. https://medium.com/swlh/processing-large-s3-files-with-aws-lambda-2c5840ae5c91
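
Approaches 2 and 3 can be roughly sketched in Python as below. This is only an illustration under assumptions, not code from the linked article: `split_into_batches`, `process_until_deadline`, `SAFETY_MARGIN_MS`, and the `items`/`offset` event fields are hypothetical names, and the actual SQS send and Lambda re-invocation are left as comments so the block runs without AWS credentials.

```python
from typing import Callable, Iterator, List, Optional, Sequence

# Hypothetical safety margin: stop when less than 30 s of execution time
# remains, so one more 15 s API call cannot push the function past its limit.
SAFETY_MARGIN_MS = 30_000


def split_into_batches(item_ids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Approach 2: yield fixed-size batches; each batch would become one SQS message."""
    for start in range(0, len(item_ids), batch_size):
        yield list(item_ids[start:start + batch_size])


def process_until_deadline(
    items: Sequence[str],
    offset: int,
    remaining_ms: Callable[[], int],
    handle_item: Callable[[str], None],
    margin_ms: int = SAFETY_MARGIN_MS,
) -> Optional[int]:
    """Approach 3: process items starting at `offset` until time is nearly up.

    Returns the offset to resume from, or None when every item was handled.
    """
    position = offset
    while position < len(items):
        if remaining_ms() < margin_ms:
            return position  # checkpoint: the next invocation resumes here
        handle_item(items[position])
        position += 1
    return None


def handler(event, context):
    """Lambda entry point (assumed event shape: {"items": [...], "offset": int})."""

    def update_item(item: str) -> None:
        # Placeholder for the slow API call plus the database write.
        pass

    next_offset = process_until_deadline(
        event["items"],
        event.get("offset", 0),
        context.get_remaining_time_in_millis,  # provided by the Lambda runtime
        update_item,
    )
    if next_offset is None:
        return {"done": True}
    # Here the function would invoke itself asynchronously with the payload
    # below, e.g. through the Lambda client's invoke call with
    # InvocationType="Event".
    return {"done": False, "next_event": {"items": event["items"], "offset": next_offset}}
```

Passing the clock in as `remaining_ms` keeps the checkpoint logic testable locally; in a real deployment the item list would more likely live in S3 or a database than in the event payload, since Lambda event sizes are limited.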

2 Comments

I updated my answer. Recently I used approach #3 to handle a 200 MB line-delimited JSON file containing 1.08M records; the total processing time was over 50 minutes, and it took 4 consecutive Lambda executions.
Thanks for your help, I think I've found the solution.