
I have some data in S3 and I want to create a Lambda function that predicts the output with my deployed AWS SageMaker endpoint, then puts the outputs back in S3. Is it necessary in this case to create an API Gateway as described in this link? And what do I have to put in the Lambda function? I expect to specify where to find the data, how to invoke the endpoint, and where to put the output.

```python
import csv
import io
import boto3

client = boto3.client('s3')      # low-level functional API
resource = boto3.resource('s3')  # high-level object-oriented API
my_bucket = resource.Bucket('demo-scikit-byo-iris')  # substitute your S3 bucket name

obj = client.get_object(Bucket='demo-scikit-byo-iris', Key='foo.csv')
lines = obj['Body'].read().decode('utf-8').splitlines()
reader = csv.reader(lines)

# io.StringIO takes a single string, not a list, so rejoin the lines
file = io.StringIO('\n'.join(lines))

runtime = boto3.client('runtime.sagemaker')
response = runtime.invoke_endpoint(
    EndpointName='nilm2',
    Body=file.getvalue(),
    ContentType='*/*',
    Accept='Accept')
output = response['Body'].read().decode('utf-8')
```

My data is a CSV file with 2 columns of floats and no headers. The problem is that lines returns a list of strings (each row is an element of the list: ['11.55,65.23', '55.68,69.56', ...]). The invoke works well, but the response is also a string: output = '65.23\n,65.23\n,22.56\n,...'

So how can I save this output to S3 as a CSV file?

Thanks

  • As suggested below, use SageMaker Batch Transform. It is much simpler and lower cost. Commented Feb 15, 2019 at 17:23

1 Answer


If your Lambda function is scheduled, you won't need an API Gateway. But if the predict action will be triggered by a user or an application, for example, you will need one.

When you call invoke_endpoint, you are actually calling a SageMaker endpoint, which is not the same as an API Gateway endpoint.

A common architecture with SageMaker is:

  1. An API Gateway which receives a request, calls an authorizer, then invokes your Lambda;
  2. A Lambda which does some parsing of your input data, calls your SageMaker prediction endpoint, then handles the result and returns it to your application.

From the situation you describe, I can't tell whether your task is academic or a production one.

So, how can you save the data as a CSV file from your Lambda?

I believe you can just parse the output, then upload the file to S3. You can parse it manually or with a library, and with boto3 you can upload the file. The output format of your model depends on your implementation in the SageMaker image, so if you need the response data in another format, you may need a custom image. I normally use a custom image, in which I can define how I want to handle my data in requests/responses.
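For the exact output string shown in the question ('65.23\n,65.23\n,22.56\n,...'), a sketch of the parse-then-upload step could look like this; the bucket and key are placeholders you would substitute:

```python
def output_to_csv(output):
    # The endpoint returns one string like '65.23\n,65.23\n,22.56\n',
    # i.e. newline-terminated values joined by commas: strip and resplit
    values = [v.strip() for v in output.split(',') if v.strip()]
    return '\n'.join(values) + '\n'

def save_predictions_to_s3(output, bucket, key):
    import boto3  # imported here so the parser above stays usable offline
    body = output_to_csv(output).encode('utf-8')
    # put_object writes the CSV text straight to S3, no temp file needed
    boto3.client('s3').put_object(Bucket=bucket, Key=key, Body=body)
```

Called from your Lambda, for example, as save_predictions_to_s3(output, 'demo-scikit-byo-iris', 'predictions/foo-output.csv').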

For a production task, I certainly recommend you check Batch Transform jobs in SageMaker. You provide an input file (an S3 path) and a destination (another S3 path), and SageMaker runs the batch predictions and persists a file with the results. Also, you won't need to deploy your model to an endpoint: when the job runs, it creates an instance for your model, downloads the data to predict, does the predictions, uploads the output, and shuts down the instance. You only need a trained model.

Here is some info about Batch Transform jobs:

https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html

https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html
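As a rough sketch of what launching such a job looks like with boto3's create_transform_job, assuming a trained model named 'nilm2' and with the job name, instance type, and S3 paths as placeholders:

```python
def transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri):
    # Build the request for sagemaker.create_transform_job; values here
    # mirror the CSV-in/CSV-out scenario from the question
    return {
        'TransformJobName': job_name,
        'ModelName': model_name,          # a trained model, no endpoint needed
        'TransformInput': {
            'DataSource': {'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': input_s3_uri}},
            'ContentType': 'text/csv',
            'SplitType': 'Line'},         # one prediction per CSV line
        'TransformOutput': {'S3OutputPath': output_s3_uri},
        'TransformResources': {'InstanceType': 'ml.m5.large',
                               'InstanceCount': 1}}

# Launching the job (requires AWS credentials):
# import boto3
# sagemaker = boto3.client('sagemaker')
# sagemaker.create_transform_job(**transform_job_request(
#     'nilm2-batch-001', 'nilm2',
#     's3://demo-scikit-byo-iris/foo.csv',
#     's3://demo-scikit-byo-iris/predictions/'))
```

SageMaker writes the results next to the output path (one .out file per input), so there is no Lambda parsing step at all in this setup.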

I hope this helps; let me know if you need more info.

Regards.


3 Comments

Hi, I'm currently working on deploying a model on SageMaker in production, and I'm comparing the 'batch job' method and the 'endpoint' method. From your answer, I've got a couple of questions: 1) Does it mean there's no way to specify the output path when using an endpoint, and we can only use a Lambda? What if we use Step Functions with the endpoint? 2) My understanding is that a 'batch job' is like a temporary endpoint: when the job is done, the temp endpoint is shut down, and it costs less than a persistent endpoint. But what about the security side? Thanks,
Hi Cecilia, I have answered your questions, but my comment ended up too long, so I wrote a gist file. Sorry. Full comment
Hi Bruno, much appreciated, I'll give it a read, thank you very much :)
