
I use Amazon Simple Email Service (SES) for email. I've configured SES to save incoming email to an S3 bucket, which triggers an AWS Lambda function. This function reads the new object and forwards its contents to an alternate email address.

I'd like to log some basic info from my Lambda function during invocation -- who the email is from, to whom it was sent, whether it contained any links, etc.

Ideally I'd save this info to a database, but since AWS Lambda time is costly (relative to other AWS operations), I'd like to do this as efficiently as possible.

I was thinking I could issue an HTTPS GET request to a private endpoint with a query string containing the info I want logged. Since I could fire the request asynchronously at the outset and continue processing, I thought this might be a cheap and efficient approach.
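For clarity, here's a minimal sketch of what I'm imagining; the endpoint URL is a placeholder, and I realise the request could be lost if the handler returns before it completes, hence the short join:

    import threading
    import urllib.parse
    import urllib.request

    def log_via_get(fields):
        # Fire a GET at a private logging endpoint (placeholder URL).
        url = "https://log.example.com/ingest?" + urllib.parse.urlencode(fields)
        try:
            urllib.request.urlopen(url, timeout=2)
        except Exception:
            pass  # logging must never break the mail forwarding

    def handler(event, context):
        t = threading.Thread(target=log_via_get, args=(
            {"from": "a@example.com", "to": "b@example.com", "links": 1},))
        t.start()
        # ... read the S3 object and forward the email while the GET is in flight ...
        t.join(timeout=2)  # give the request a chance to land before returning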

Is this a good method? Are there any alternatives?

My Lambda function fires irregularly, so despite execution environments being kept warm for 10 minutes or so after an invocation, it seems a database connection is likely slow and costly, since AWS charges per 100ms of usage.

Since I could conceivably get thousands of emails per month, keeping my Lambda function efficient is paramount for cost. I maintain hundreds of domain names, so my numbers aren't exaggerated. Thanks in advance.

4 Comments
  • This is very broad without some more info. Properly coded, a database connection to DynamoDB, an RDS server, or a DB on an EC2 instance isn't slow and doesn't cost anything when you're not using it. But it depends on how many Lambda invocations you expect. If you're just concerned about cost, have you thought about a very small EC2 server instead? Whether it's worth it depends on how much you use Lambda. Commented Jul 17, 2020 at 18:42
  • Well, my use case requires a Lambda function, so I'd have to make a connection within it. That's costly if I have to establish a connection and save data on each invocation. Maybe I'm wrong that it's slow? Lambda charges per 100ms. Commented Jul 17, 2020 at 21:47
  • I'm confused about what you are trying to achieve. It seems like you want to reduce the run duration of the Lambda functions? However, I'm not sure what "value" you are talking about. Could you possibly edit your question to provide a lot more information? Also, what are your current Lambda costs? Commented Jul 17, 2020 at 22:59
  • John, I've added some content that I think clarifies what I'm trying to do. And you're right: I'm trying to keep my Lambda function duration as low as possible to make invocations cheaper. Thx. Commented Jul 18, 2020 at 1:11

3 Answers


I do not think that thousands of emails per month should be a problem; these cloud services have been developed with scalability in mind and can go way beyond the numbers you are suggesting.

In terms of persistence, I cannot really see (absent logs or metrics) why your DB connection would be slow. Once you're inside AWS, traffic stays on its internal infrastructure, so speeds will be high and not something you should be worrying about.

I am not an expert on billing, but from what you are describing, it seems like Lambda + S3 + DynamoDB is highly optimised for your use case.

From the type of data you are describing (email data), it doesn't seem that you would have either a memory issue (Lambdas have memory constraints, which can be a pain) or an I/O bottleneck. If you can share more details on the memory used during invocation and the time taken, that would be great, along with how much data you store on each invocation.

I think you could store JSON-serialized strings of your email data in DynamoDB easily; it should be pretty seamless and not that costly.
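A minimal sketch of that write, assuming a pre-created table named email-log with a string partition key message_id (all names are placeholders):

    import json
    import boto3

    # Created once at module scope, outside the handler.
    table = boto3.resource("dynamodb").Table("email-log")

    def log_email(message_id, sender, recipient, links):
        table.put_item(Item={
            "message_id": message_id,
            "sender": sender,
            "recipient": recipient,
            "links": json.dumps(links),  # store the link list as a JSON string
        })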

I have not used SES, but you could put a trigger on DynamoDB (via DynamoDB Streams) to fire whenever you store a record, in case you want to follow up with another Lambda. You could also combine S3 and DynamoDB: when you store a record, upload a file containing the record to a new S3 key and update the row in DynamoDB with a pointer to the new S3 object.
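A hedged sketch of that S3 + DynamoDB combination; the bucket name, table name, and key layout are assumptions rather than a prescribed design:

    import json
    import boto3

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("email-log")

    def archive_record(message_id, record):
        key = f"email-records/{message_id}.json"
        # Upload the full record to S3 ...
        s3.put_object(Bucket="my-email-archive", Key=key,
                      Body=json.dumps(record).encode("utf-8"))
        # ... then point the DynamoDB row at the new object.
        table.update_item(
            Key={"message_id": message_id},
            UpdateExpression="SET s3_pointer = :p",
            ExpressionAttributeValues={":p": f"s3://my-email-archive/{key}"},
        )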



4 Comments

  • Is it faster to use an HTTP GET the way I described it?
  • If speed of execution is your concern (I am not sure why you need more than 100ms; each Lambda write should take less than that), I cannot think of a faster way of doing this. If, however, you want to do a REST GET plus persist, then you could have something like a Celery task queue, each task being an HTTP GET. You could have each task use asyncio. This way you can issue many GETs while awaiting a single GET response containing your email data, and then write to your DB. In theory you could avoid Lambda entirely, but this is your choice and you will need to experiment.
  • How could I avoid using a Lambda function? The Lambda function is triggered when an object is saved to an S3 bucket, and the Lambda invocation is what's doing the work -- the emailing. Your solution is quite good; I'm just trying to fully resolve any shortcomings on my part.
  • You could remove the trigger and simply have a Celery task that reads the S3 bucket every few minutes for new objects, based on creation date, processes them, and sends the processed result wherever (rough sketch below). The difference is that you would now be polling rather than relying on an event, but you would at least avoid Lambda if that is what you want. Although I think Lambda is a neater approach.
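A rough sketch of that polling alternative using plain boto3; the bucket name and the "since" cutoff are placeholders, and in practice poll_new_emails would run on a schedule (e.g. as a Celery beat task):

    from datetime import datetime, timedelta, timezone
    import boto3

    s3 = boto3.client("s3")

    def process_email(raw_bytes):
        ...  # your existing parsing/forwarding logic

    def poll_new_emails(bucket, since):
        # List the bucket and pick out objects created after the cutoff.
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket):
            for obj in page.get("Contents", []):
                if obj["LastModified"] > since:
                    body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
                    process_email(body)

    # e.g. invoked every five minutes by the scheduler:
    poll_new_emails("incoming-mail-bucket",
                    datetime.now(timezone.utc) - timedelta(minutes=5))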

You can now persist data from Lambda using Amazon EFS (Lambda functions can mount an EFS file system).
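A minimal sketch of what that can look like, assuming the function has an EFS access point mounted at /mnt/data (the mount path is whatever you configure on the Lambda):

    import json

    def handler(event, context):
        entry = {"from": "a@example.com", "to": "b@example.com", "links": 1}
        # Appends persist across invocations because EFS is durable, shared storage.
        with open("/mnt/data/email-log.jsonl", "a") as f:
            f.write(json.dumps(entry) + "\n")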



Since you are collecting small amounts of data, an option would be to write your package of data into SQS.

A periodic Lambda or EC2 job could then empty the SQS queue into a DB (or whatever you require) at an interval that amortises the cost of running, say every 30 minutes depending on your traffic; make the timing adjustable by a parameter!
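A hedged sketch of both halves, assuming a queue named email-log-queue and a placeholder save_to_db function; the drain side would run on whatever schedule you choose:

    import json
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = sqs.get_queue_url(QueueName="email-log-queue")["QueueUrl"]

    def enqueue_log(fields):
        # Called from the forwarding Lambda: one cheap API call per email.
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(fields))

    def save_to_db(record):
        ...  # whatever persistence you settle on

    def drain_queue():
        # Periodic consumer: batch-read, persist, then delete.
        while True:
            resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                       MaxNumberOfMessages=10,
                                       WaitTimeSeconds=2)
            messages = resp.get("Messages", [])
            if not messages:
                break
            for m in messages:
                save_to_db(json.loads(m["Body"]))
                sqs.delete_message(QueueUrl=QUEUE_URL,
                                   ReceiptHandle=m["ReceiptHandle"])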

If you are concerned about runtime, set up boto3 handles etc. in the Lambda init code (outside the handler) so they get reused rather than recreated on each invocation.
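For illustration, reuse looks roughly like this: anything created at module scope is built once per container and shared across warm invocations:

    import boto3

    # Created during the cold start only; warm invocations reuse them.
    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")

    def handler(event, context):
        # Uses the module-level clients instead of paying the
        # connection/setup cost on every invocation.
        ...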

