sagemaker-pipeline

This repo takes the data processing and model training code from https://github.com/aws-samples/amazon-sagemaker-immersion-day/blob/master/processing_xgboost.ipynb and converts it into a DVC pipeline. The code is minimally modified from the original notebook: it is split into individual scripts, and the S3 paths and training hyperparameters are parametrized. To run it, set the bucket and prefix paths in params.yaml, then use dvc repro or dvc exp run to execute the pipeline in SageMaker.
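As a rough illustration of the two ways to launch the pipeline, the sketch below builds the command line for either dvc repro or dvc exp run. The helper name dvc_command and the parameter key train.max_depth are hypothetical and not taken from this repo; the actual keys are whatever params.yaml defines.

```python
import subprocess


def dvc_command(experiment=False, overrides=None):
    """Build the argument list for running the pipeline with DVC.

    experiment=False -> plain `dvc repro`
    experiment=True  -> `dvc exp run`, optionally overriding params
    `overrides` maps parameter names to values; the names used in the
    example below are hypothetical, not read from this repo's params.yaml.
    """
    overrides = overrides or {}
    if not experiment:
        if overrides:
            # `dvc repro` reads params.yaml as-is; overrides need `dvc exp run`.
            raise ValueError("parameter overrides require experiment=True")
        return ["dvc", "repro"]
    cmd = ["dvc", "exp", "run"]
    for key, value in overrides.items():
        cmd += ["--set-param", f"{key}={value}"]
    return cmd


def run_pipeline(experiment=False, overrides=None):
    """Execute the command; raises CalledProcessError if DVC fails."""
    subprocess.run(dvc_command(experiment, overrides), check=True)
```

For example, dvc_command(True, {"train.max_depth": 5}) yields the arguments for an experiment run with one hyperparameter changed, without editing params.yaml by hand.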

The pipeline has three stages:

1. Prepare data from S3
2. Run a preprocessing job using the Scikit Learn Processor
3. Run a model training job using XGBoost
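The order of the three stages can be sketched as a plain sequence of script invocations. The mapping of stages to sm_prepare.py, sm_preprocessing.py and sm_training.py is an assumption inferred from the file names in this repo; dvc.yaml is the authoritative definition, and unlike DVC this sketch does no caching or dependency tracking.

```python
import subprocess
import sys

# Assumed stage-to-script mapping, inferred from file names only;
# the stage names here are illustrative and may differ from dvc.yaml.
STAGES = [
    ("prepare", "sm_prepare.py"),
    ("preprocessing", "sm_preprocessing.py"),
    ("training", "sm_training.py"),
]


def stage_commands(python=sys.executable):
    """Return (stage name, argv) pairs in execution order."""
    return [(name, [python, script]) for name, script in STAGES]


def run_all(dry_run=True):
    """Print each stage command; run it only when dry_run is False."""
    for name, cmd in stage_commands():
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # Stop at the first failing stage, since later ones depend on it.
            subprocess.run(cmd, check=True)
```

Calling run_all() with the default dry_run=True only prints the commands, which is a safe way to check the order before anything is submitted to SageMaker.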

Folders and files:

.dvc
.dvcignore
.gitignore
README.md
dvc.lock
dvc.yaml
params.yaml
preprocessing.py
requirements.txt
sm_prepare.py
sm_preprocessing.py
sm_training.py

treeverse/sagemaker-pipeline