This is a series of notebooks that mark my progress in reading and practicing the concepts presented by Müller and Guido in the book *Introduction to Machine Learning with Python: A Guide for Data Scientists*.
Chapter 1: Basic ML concepts and the first example with Iris dataset and KNN Classifier.
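
A minimal sketch of that first example, assuming scikit-learn is installed. The split seed (`random_state=0`) and `n_neighbors=1` are illustrative choices here, not necessarily the exact settings used in the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris measurements (150 samples, 4 features, 3 species).
iris = load_iris()

# Hold out 25% of the samples (the scikit-learn default) for testing.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Fit a k-nearest-neighbors classifier; k=1 is an assumption for illustration.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Accuracy on data the model has not seen during fitting.
test_accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Trying a few other values of `n_neighbors` is an easy way to see how the model's complexity changes the result.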
Chapter 2: Overview of a bunch of ML algorithms.

Nearest Neighbors
- Easy to explain
- Good as a baseline
- Not good for large or high-dimensional datasets
- Non-linear time complexity

Linear Models
- Good for large, high-dimensional sparse datasets
- Usually fast
- Easy to explain
- Some can perform feature selection
- Sensitive to scaling
- Sensitive to parameter tuning
- Models are limited to hyperplanes

Naive Bayes
- Very very fast
- Only for classification
- Good for large, high-dimensional datasets
- Often less accurate than Linear Models

Decision Trees
- Very fast
- Robust to scaling
- Very very easy to explain

Random Forests
- Better than a single Decision Tree
- Very robust and powerful
- Robust to scaling
- Not very good for high-dimensional sparse data

Gradient Boosted Decision Trees
- Often better than Random Forests
- Slower to train than Random Forests, but faster to predict and smaller in memory
- Often needs parameter tuning

Support Vector Machines
- Powerful for medium-size datasets
- Requires scaling
- Very sensitive to parameter tuning

Neural Networks
- Can build very complex models
- Sensitive to scaling of the data
- Sensitive to parameter tuning
- Take a long time to train

Chapter 3: Unsupervised Learning and Preprocessing
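
A rough sketch of how this chapter's pieces can be chained together: scale the features, reduce them with PCA, cluster with k-means, and score the clustering against known labels with ARI. The Iris dataset, the two PCA components and the three clusters are assumptions made for illustration, not necessarily what the notebook does.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Iris is used only as a small, labeled example dataset.
X, y = load_iris(return_X_y=True)

# Scaling: give every feature zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Dimensionality reduction: keep the two directions of largest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Clustering: k-means with k=3 (chosen because Iris has three species).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_pca)

# Evaluation: ARI compares the clusters with the true labels.
# It is 1.0 for a perfect match and close to 0 for random assignments.
ari = adjusted_rand_score(y, cluster_labels)
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")
print(f"Adjusted Rand Index: {ari:.2f}")
```

ARI needs ground-truth labels, which is rarely the case in real clustering problems; the Silhouette Coefficient (`sklearn.metrics.silhouette_score`) is an option when no labels are available.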

Scaling
- StandardScaler
- RobustScaler
- MinMaxScaler
- Normalizer

Dimensionality Reduction, Feature Extraction and Manifold Learning
- Principal Component Analysis (PCA)
- Non-Negative Matrix Factorization (NMF)
- Manifold Learning with t-SNE

Clustering
- *k*-Means Clustering
- Agglomerative Clustering
- DBSCAN

Clustering Evaluation
- Adjusted Rand Index (ARI)
- Normalized Mutual Information (NMI)
- Silhouette Coefficient
- Robustness-based clustering metrics
- Qualitative Method

Chapter 4: Representing Data and Engineering Features

Chapter 5: Model Evaluation and Improvement
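
For Chapter 5, a minimal sketch of two common tools for model evaluation and improvement: cross-validation and grid search. The dataset, the KNN model and the parameter grid are illustrative assumptions rather than what the notebook necessarily uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Keep a test set aside so the final score is not biased by the tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Evaluation: 5-fold cross-validation of a single fixed model.
cv_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=3), X_train, y_train, cv=5
)
print(f"Mean cross-validation accuracy: {cv_scores.mean():.2f}")

# Improvement: grid search over n_neighbors, each candidate scored by 5-fold CV.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X_train, y_train)

best_k = grid.best_params_["n_neighbors"]
test_score = grid.score(X_test, y_test)
print(f"Best n_neighbors: {best_k}")
print(f"Test accuracy of the best model: {test_score:.2f}")
```

The test set is touched only once, after the grid search has picked its parameters, so the reported accuracy stays an honest estimate.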