ML_Intro_Notebooks

This is a series of notebooks that track my progress in reading and practicing the concepts presented by Andreas C. Müller and Sarah Guido in the book *Introduction to Machine Learning with Python: A Guide for Data Scientists*.

Chapter 1: Basic ML concepts and a first example using the Iris dataset and a k-NN classifier.
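
As a rough illustration of the Chapter 1 workflow, here is a minimal sketch using scikit-learn. The choice of `n_neighbors=1`, the `random_state`, and the measurements in `new_flower` are assumptions made for the example, not values taken from the notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris data: 150 flowers, 4 measurements each, 3 species.
iris = load_iris()

# Hold out part of the data so the model is scored on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# n_neighbors=1 is an illustrative choice, not a tuned value.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

test_accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")

# Classify one made-up flower (sepal length, sepal width,
# petal length, petal width, all in cm).
new_flower = [[5.0, 2.9, 1.0, 0.2]]
predicted_species = iris.target_names[knn.predict(new_flower)[0]]
print(f"Predicted species: {predicted_species}")
```

The same fit / predict / score pattern applies to the other scikit-learn estimators covered in the later chapters.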

Chapter 2: Overview of several supervised ML algorithms.

Nearest Neighbors
- Easy to explain
- Good as a baseline
- Not good for large and high-dimensional datasets
- Non-linear time complexity

Linear Models
- Good for large and high-dimensional sparse datasets
- Usually fast
- Easy to explain
- Some can perform feature selection
- Sensitive to scaling
- Sensitive to parameter tuning
- Models are limited to hyperplanes

Naive Bayes
- Extremely fast
- Only for classification
- Good for large and high-dimensional datasets
- Often less accurate than Linear Models

Decision Trees
- Very fast
- Robust to scaling
- Extremely easy to explain

Random Forests
- Better than a single Decision Tree
- Very robust and powerful
- Robust to scaling
- Not well suited to high-dimensional sparse data

Gradient Boosted Decision Trees
- Often better than Random Forests
- Slower to train than Random Forests, but faster to predict and smaller in memory
- Often need parameter tuning

Support Vector Machines
- Powerful for medium-sized datasets
- Require scaling
- Very sensitive to parameter tuning

Neural Networks
- Can build very complex models
- Sensitive to scaling of the data
- Sensitive to parameter tuning
- Take a long time to train
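
The scaling notes above can be checked with a small experiment. The sketch below is an assumption-laden illustration, not code from the notebooks: it fits a decision tree and an SVM on the breast cancer dataset bundled with scikit-learn, once on raw features and once after `StandardScaler`. The dataset, `max_depth=4`, and the default `SVC` settings are all choices made for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0
)

# Each model is tried with and without standardization.
# The scaler is fitted on the training split only, inside the pipeline.
models = {
    "tree (raw features)": DecisionTreeClassifier(max_depth=4, random_state=0),
    "tree (scaled features)": make_pipeline(
        StandardScaler(), DecisionTreeClassifier(max_depth=4, random_state=0)
    ),
    "SVM (raw features)": SVC(),
    "SVM (scaled features)": make_pipeline(StandardScaler(), SVC()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

Comparing the printed pairs shows how much each model family depends on feature scaling for this particular dataset; results on other data may differ.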

Chapter 3: Unsupervised Learning and Preprocessing

Scaling
- StandardScaler
- RobustScaler
- MinMaxScaler
- Normalizer

Dimensionality Reduction, Feature Extraction and Manifold Learning
- Principal Component Analysis (PCA)
- Non-Negative Matrix Factorization (NMF)
- Manifold Learning with t-SNE

Clustering
- *k*-Means Clustering
- Agglomerative Clustering
- DBSCAN

Clustering Evaluation
- Adjusted Rand Index (ARI)
- Normalized Mutual Information (NMI)
- Silhouette Coefficient
- Robustness-based clustering metrics
- Qualitative Method
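
Several of the Chapter 3 pieces can be chained in a few lines. The following sketch assumes synthetic data from `make_blobs` with made-up cluster centers; it scales the data, reduces it with PCA, clusters it with *k*-means, and scores the result with ARI (which needs the true labels) and the silhouette coefficient (which does not).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data: three well-separated groups in four dimensions.
# The centers are made-up values chosen only to keep the groups apart.
X, true_labels = make_blobs(
    n_samples=300,
    centers=[[0, 0, 0, 0], [6, 6, 0, 6], [0, 6, 6, 0]],
    cluster_std=1.0,
    random_state=0,
)

# 1. Scaling: zero mean and unit variance per feature.
X_scaled = StandardScaler().fit_transform(X)

# 2. Dimensionality reduction: keep the two leading principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

# 3. Clustering: k-means with the (here known) number of clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_pca)

# 4. Evaluation: ARI compares against ground truth, silhouette does not.
ari = adjusted_rand_score(true_labels, cluster_labels)
silhouette = silhouette_score(X_pca, cluster_labels)

print(f"Variance kept by PCA: {explained:.2f}")
print(f"ARI: {ari:.2f}, silhouette: {silhouette:.2f}")
```

On real data the true labels are usually unavailable, so ARI and NMI are mostly useful for comparing algorithms on benchmark datasets.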

Chapter 4: Representing Data and Engineering Features

Chapter 5: Model Evaluation and Improvement
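
For the model evaluation topics of Chapter 5, a minimal sketch of cross-validation and grid search with scikit-learn might look like this. The Iris data, the `SVC` model, and the values in `param_grid` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

iris = load_iris()

# Keep a final test set that the parameter search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, stratify=iris.target, random_state=0
)

# Cross-validation: five train/validation splits of the training data.
cv_scores = cross_val_score(SVC(), X_train, y_train, cv=5)
print(f"Mean CV accuracy with default SVC: {cv_scores.mean():.2f}")

# Grid search: try every combination, each scored by 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

# The best model is refitted on the whole training set, then scored once
# on the held-out test set.
test_score = grid.score(X_test, y_test)
print(f"Best parameters: {grid.best_params_}")
print(f"Best CV accuracy: {grid.best_score_:.2f}, test accuracy: {test_score:.2f}")
```

Scoring on the separate test set only after the search avoids an optimistic estimate caused by tuning parameters on the same data used for evaluation.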

Repository files:

- .ipynb_checkpoints
- .DS_Store
- README.md
- cap1 - First example with KNN and Iris Dataset.ipynb
- cap2.1 - k-NN - Classification Problems.ipynb
- cap2.2 - k-NN - Regression Problems.ipynb
- cap2.3 Linear Models - Regression Problems.ipynb
- cap2.4 Linear Models - Classification Problems.ipynb
- cap2.5 - Naive Bayes.ipynb
- cap2.6 - Decision Tree.ipynb
- cap2.7 - Ensembles of Decision Trees.ipynb
- cap2.8 - Kernelized Support Vector Machines.ipynb
- cap2.9 - Neural Networks (Deep Learning).ipynb
- cap3.1 - Preprocessing and Scaling.ipynb
- cap3.2 - Principal Component Analysis.ipynb
- cap3.3 - Non-Negative Matrix Factorization (NMF).ipynb
- cap3.4 - Manifold Learning with t-SNE.ipynb
- cap3.5 - Clustering - k-means.ipynb
- cap3.6 - Clustering - Agglomerative Clustering.ipynb
- cap3.7 - Clustering - DBSCAN.ipynb
- cap3.8 - Clustering - Comparing and Evaluating Clustering Algorithms.ipynb
- cap4.1 - Representing Data and Engineering Features.ipynb
- cap4.2 - Automatic Feature Selection.ipynb
- cap5.1 - Model Evaluation and Improvement (Cross-validation).ipynb
- cap5.2 - Model Evaluation and Improvement (Grid Search).ipynb
- cap5.3 - Model Evaluation and Improvement (Evaluation Metrics and Scoring).ipynb
- cap6.1 - Algorithm Chains and Pipelines.ipynb
- cap7 - Working with Text Data.ipynb
- tmp
- tmp.png
- tree.dot

provezano/ML_Intro_Notebooks

Folders and files

Latest commit

History

Repository files navigation

ML_Intro_Notebooks

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages