Data Preprocessing on AWS SageMaker

Question

I have an endpoint running a trained SageMaker model on AWS, which expects the data in a specific format.

Initially, the data was processed on the client side of the application. That is, the API Gateway (which receives the POST API calls on AWS) used to receive pre-processed data. That has now changed: the API Gateway will receive raw data from the client, and pre-processing this data before sending it to our SageMaker model is now the job of our workflow.

What is the best way to add a pre-processing step to this workflow without needing to re-train the model? My pre-processing is just a bunch of dataframe transformations; no standardization or calculation with the training set is required (it would not need to save any model file).

Thanks!

pedroprates · Accepted Answer · 2020-10-16 18:05:02Z

After some research, this is the solution I've followed:

First, I created a SKLearn SageMaker model to do all the preprocessing setup (I built a custom Scikit-Learn class to handle all the preprocessing steps, following this AWS code).
Trained this preprocessing model on my training data. My model, specifically, didn't need to be trained (it does not have any standardization or anything else that would need to store training data parameters), but SageMaker requires the model to be trained.
Loaded the trained legacy model that we had using the Model parameter.
Created a PipelineModel with the preprocessing model and the legacy model in cascade:

from sagemaker.pipeline import PipelineModel
pipeline_model = PipelineModel(name=model_name, role=role, models=[preprocess_model, trained_model])

Created a new endpoint from the PipelineModel and then changed the Lambda function to call this new endpoint. With this I could send the raw data directly to the same API Gateway, and it would call only one endpoint, without needing to pay for two endpoints running 24/7 to perform the entire process.

I've found this to be a good and "economical" way to perform the preprocessing outside the trained model, without having to do heavy processing jobs in a Lambda function.
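For anyone wondering what the preprocessing script from the first step might look like, here is a rough sketch of a stateless entry point for the SageMaker Scikit-Learn container. The hook names model_fn, input_fn, predict_fn and output_fn are the ones that container looks for. Everything else is an assumption for illustration: the `age` and `city` fields, the JSON-in/CSV-out choice, and the `transform_record` helper are all made up, and it uses plain Python rather than a custom Scikit-Learn class and dataframes so that it stays self-contained. The exact return type expected from output_fn can depend on the container version, so check it against the version you deploy.

```python
import csv
import io
import json


def transform_record(record):
    """Hypothetical stateless transformation of one raw record into features."""
    age = float(record["age"])
    city = str(record["city"]).strip().lower()
    # Illustrative features only: a numeric value and a binary flag.
    return [age, 1 if city == "london" else 0]


def model_fn(model_dir):
    """Nothing is learned during 'training', so there is no artifact to load."""
    return transform_record


def input_fn(request_body, request_content_type):
    """Parse the raw request; this sketch assumes a JSON list of objects."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    return json.loads(request_body)


def predict_fn(input_data, model):
    """Apply the transformation to every record in the request."""
    return [model(record) for record in input_data]


def output_fn(prediction, accept):
    """Serialize rows as CSV, assuming the downstream model accepts text/csv."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(prediction)
    return buffer.getvalue()
```

In a PipelineModel, whatever output_fn returns here is what the second (legacy) model receives as its request body, so the output format has to match what that model already expects.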

Robert Kossendey · Accepted Answer · 2020-10-13 07:58:20Z

I would create a Lambda function that is invoked by the API Gateway, processes the data, and sends it to your SageMaker endpoint.
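A minimal sketch of such a Lambda handler, under a few assumptions: the API Gateway uses proxy integration (so the POST payload arrives as a JSON string in event["body"]), the endpoint accepts text/csv, and the endpoint name comes from a hypothetical ENDPOINT_NAME environment variable. The `preprocess` function and its `age` and `city` fields are placeholders for your real transformations. The optional `runtime` argument is only there so the handler can be exercised without AWS.

```python
import json
import os


def preprocess(raw):
    """Hypothetical stand-in for the real dataframe transformations."""
    return [float(raw["age"]), 1 if raw["city"].strip().lower() == "london" else 0]


def lambda_handler(event, context, runtime=None):
    """Preprocess the raw POST body and forward it to the SageMaker endpoint."""
    if runtime is None:
        import boto3  # imported lazily so the handler can be tested offline

        runtime = boto3.client("sagemaker-runtime")

    # With proxy integration, the client's payload is a JSON string here.
    raw = json.loads(event["body"])
    features = preprocess(raw)

    response = runtime.invoke_endpoint(
        EndpointName=os.environ.get("ENDPOINT_NAME", "my-endpoint"),  # assumed name
        ContentType="text/csv",
        Body=",".join(str(value) for value in features),
    )
    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

This keeps a single SageMaker endpoint, but the preprocessing runs inside Lambda, so it fits best when the transformations are light and their dependencies fit in the Lambda package.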
