1

I have an endpoint running a trained SageMaker model on AWS, which expects the data on a specific format.

Initially, the data has been processed on the client side of the application, it means, the API Gateway (which receives the POST API calls on AWS) used to receive pre-processed data, but now there's a change, the API Gateway will receive raw data from the client, and the job of pre-processing this data before sending to our SageMaker model is up to our workflow.

What is the best way to create a pre-processing job on this workflow, without needing to re-train the model? My pre-process is just a bunch of dataframe transformations, no standardization or calculation with the training set required (it would not need to save any model file).

Thanks!

2 Answers 2

1

After some research, this is the solution I've followed:

  • First I have created a SKLearn sagemaker model to do all the preprocess setup (I've built a Scikit-Learn custom class to handle all the preprocess steps, following this AWS code)
  • Trained this preprocess model on my training data. My model, in specific, didn't need to be trained (it does not have any standardization or anything that would need to store training data parameters), but sagemaker requires the model to be trained.
  • Loaded the trained legacy model that we had using the Model parameter.
  • Created a PipelineModel with the preprocessing model and legacy model in cascade:
pipeline_model = PipelineModel(name=model_name, role=role, models=[ preprocess_model, trained_model ]) 
  • Create a new endpoint, calling the PipelineModel and then changed the Lambda function to call this new endpoint. With this I could send the raw data directly for the same API Gateway and it would call only one endpoint, without needing to pay two endpoints 24/7 to perform the entire process.

I've found this to be a good and "economic" way to perform the preprocess outside the trained model, without having to do hard processing jobs on a Lambda function.

Sign up to request clarification or add additional context in comments.

Comments

0

I would create a Lambda, which is getting invoked by the API-Gateway, processing the data and sending it to your SageMaker endpoint.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.