Project objective: build an accurate machine learning model in Dataiku DSS (Dataiku Data Science Studio) to predict heart failure.
The model was trained and tested using Python and the Heart Failure Prediction Dataset.
Dataiku DSS is a big data and predictive analytics platform developed by the French software company Dataiku. It offers built-in capabilities to evaluate, deploy, and monitor machine learning models.
Using Python notebooks and Dataiku's machine learning experiment tracking, I went through the following steps:
- Checking the distribution of the target variable
- Transforming categorical variables into dummy variables
- Scaling the continuous variables
- Splitting the dataset into train and test sets
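The preparation steps above can be sketched with pandas and scikit-learn as follows. This is a minimal sketch, not the original notebook: a small random DataFrame stands in for the Kaggle CSV, and the column names (`Age`, `Sex`, `ChestPainType`, `Cholesterol`, `MaxHR`, `HeartDisease`) are assumptions based on the public dataset. The sketch splits before fitting the scaler so that test-set statistics do not leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the Kaggle CSV: column names are assumed and
# values are random, so they carry no medical meaning.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(30, 80, n),
    "Sex": rng.choice(["M", "F"], n),
    "ChestPainType": rng.choice(["ATA", "NAP", "ASY", "TA"], n),
    "Cholesterol": rng.integers(120, 400, n),
    "MaxHR": rng.integers(60, 200, n),
    "HeartDisease": rng.integers(0, 2, n),
})

TARGET = "HeartDisease"
CATEGORICAL = ["Sex", "ChestPainType"]
CONTINUOUS = ["Age", "Cholesterol", "MaxHR"]

# Check the class balance of the target variable.
print(df[TARGET].value_counts(normalize=True))

# One-hot encode categorical variables into dummy columns
# (drop_first=True drops one redundant level per variable).
X = pd.get_dummies(df.drop(columns=TARGET), columns=CATEGORICAL, drop_first=True)
y = df[TARGET]

# Split into train/test sets, stratified so both keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale continuous variables: fit on the train set only, then reuse
# the same mean/std on the test set.
scaler = StandardScaler()
X_train = X_train.copy()
X_test = X_test.copy()
X_train[CONTINUOUS] = scaler.fit_transform(X_train[CONTINUOUS])
X_test[CONTINUOUS] = scaler.transform(X_test[CONTINUOUS])
```

Fitting `StandardScaler` only on `X_train` and reusing it on `X_test` mirrors prediction time, when new patients are scaled with statistics learned from the training data.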
Machine learning experimentation: testing different machine learning approaches to predict heart failure with scikit-learn.
Scikit-learn models tested:
- Logistic regression
- SVM
- Decision Tree
- Random Forest
a) For each model, a grid search was performed to find the best hyperparameters.
b) The model was then trained on the train set with these best parameters, using cross-validation.
c) Everything (parameters, performance metrics, and models) was logged in Dataiku's experiment tracking (based on the MLflow framework) to keep track of the results of the different experiments and compare them afterward.
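Steps a) to c) might look like the following sketch. It assumes synthetic data from `make_classification` in place of the prepared heart-failure features, hyperparameter grids chosen purely for illustration, ROC AUC as the tuning metric, and the standard open-source MLflow API (`mlflow.start_run`, `mlflow.log_params`, `mlflow.log_metrics`, `mlflow.sklearn.log_model`). How Dataiku connects MLflow to its own experiment tracking is not shown. Logging sits behind a hypothetical `LOG_TO_MLFLOW` switch so the sketch also runs where MLflow is not installed or configured.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

try:
    import mlflow
    import mlflow.sklearn
except ImportError:  # the sketch still runs without MLflow
    mlflow = None

LOG_TO_MLFLOW = False  # hypothetical switch: enable where tracking is configured

# Synthetic stand-in for the preprocessed features (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative hyperparameter grids (assumed, not the original ones).
CANDIDATES = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "svm": (SVC(), {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [5, None]},
    ),
}

results = {}
for name, (estimator, grid) in CANDIDATES.items():
    # a) Grid search with 5-fold cross-validation on the train set.
    search = GridSearchCV(estimator, grid, cv=5, scoring="roc_auc")
    search.fit(X_train, y_train)
    best_model = search.best_estimator_  # refit on the full train set

    # b) Cross-validated score of the tuned model, plus held-out test accuracy.
    cv_auc = cross_val_score(best_model, X_train, y_train, cv=5, scoring="roc_auc").mean()
    test_acc = best_model.score(X_test, y_test)
    results[name] = {"params": search.best_params_, "cv_auc": cv_auc, "test_accuracy": test_acc}

    # c) Log parameters, metrics, and the model so runs can be compared later.
    if LOG_TO_MLFLOW and mlflow is not None:
        with mlflow.start_run(run_name=name):
            mlflow.log_params(search.best_params_)
            mlflow.log_metrics({"cv_auc": cv_auc, "test_accuracy": test_acc})
            mlflow.sklearn.log_model(best_model, "model")

for name, r in results.items():
    print(f"{name}: cv_auc={r['cv_auc']:.3f} test_acc={r['test_accuracy']:.3f} params={r['params']}")
```

`GridSearchCV` refits the best configuration on the whole training set by default (`refit=True`), so `best_estimator_` can be evaluated and logged directly. Because the cross-validated score is computed on the same training data used for tuning, it tends to be optimistic; the held-out test accuracy is the fairer comparison between models.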
Dataset source: Heart Failure Prediction Dataset, retrieved from Kaggle.
