What’s in it for you? What is Machine Learning? Applications of Random Forest. What is Classification? Why Random Forest? What is Random Forest? Random Forest and Decision Tree. How does Random Forest work? Use Case.
Application of Random Forest:
Remote Sensing - used in ETM devices to acquire images of the earth’s surface; accuracy is higher and training time is shorter.
Object Detection - multiclass object detection is done using Random Forest algorithms; it provides better detection in complicated environments.
Kinect - Random Forest is used in the Kinect game console to track body movements and recreate them in the game.
In Kinect, the user performs a step, Kinect registers the movement, and the user is marked based on accuracy: a training set is used to identify body parts, the Random Forest classifier learns from it, identifies the body parts while the user is dancing, and scores the game avatar based on accuracy.
What’s in it for you? What is Machine Learning? Applications of Random Forest. What is Classification? Why Random Forest? What is Random Forest? Random Forest and Decision Tree. Comparing Random Forest and Regression. Use Case – Iris Flower Analysis.
Types of Machine Learning
Machine Learning has three types: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
Types of Supervised Learning: under Supervised Learning there are two kinds of problems, Classification and Regression.
What is Classification? Classification is a kind of supervised learning problem wherein the outputs are categorical in nature, like ‘Yes’ or ‘No’, ‘True’ or ‘False’, ‘0’ or ‘1’.
Solutions under Classification: KNN, Naïve Bayes, Decision Tree, and Random Forest.
Why Random Forest?
No overfitting - the use of multiple trees reduces the risk of overfitting, and training time is less. Estimates missing data - Random Forest can maintain accuracy when a large proportion of the data is missing. High accuracy - for large data it produces highly accurate predictions and runs efficiently on large databases.
What is Random Forest?
A Random Forest (or Random Decision Forest) is a method that operates by constructing multiple Decision Trees during the training phase. The decision of the majority of the trees is chosen by the Random Forest as the final decision: Decision Tree 1 gives Output 1, Decision Tree 2 gives Output 2, Decision Tree 3 gives Output 3, and the Random Forest takes the majority of these outputs as its final decision.
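To make this concrete, here is a small sketch of the idea in Python: several decision trees are trained on bootstrap samples of a dataset and their predictions are combined by a majority vote. This is only an illustration of the concept, not the code used later in the use case; the Iris data and the choice of five trees are arbitrary assumptions.

# A minimal sketch of the random-forest idea: many decision trees, each trained
# on a bootstrap sample, and a majority vote over their outputs.
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

n_trees = 5          # arbitrary choice for illustration
trees = []
for _ in range(n_trees):
    # Draw a bootstrap sample (rows sampled with replacement) for each tree
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

def forest_predict(sample):
    # Each tree gives an output; the majority output is the final decision
    votes = [int(t.predict([sample])[0]) for t in trees]
    return Counter(votes).most_common(1)[0][0]

print(forest_predict(X[0]))   # majority-vote class for the first sample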
Random Forest and Decision Tree
Decision Tree: a Decision Tree is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction (for example, “Is diameter >= 3?” with True/False branches, followed by “Is color orange?”).
Decision Tree - Important Terms: a Decision Tree is a graph that uses a branching method to illustrate every possible outcome of a decision. The important terms are Entropy, Information Gain, Leaf Node, Decision Node, and Root Node.
Entropy: entropy is the measure of randomness or unpredictability in the dataset. The initial dataset has high entropy (E1); after a decision split into Set 1 and Set 2, the entropy is lower (E2).
Information Gain: information gain is the measure of the decrease in entropy after the dataset is split. Information gain = E1 - E2, where E1 is the entropy before the split and E2 is the entropy after the split.
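As a quick illustration, here is a minimal sketch of how entropy and information gain could be computed for a set of class labels. It assumes Shannon entropy (log base 2) and computes E2 as the size-weighted average of the entropies of the two subsets after the split; the example labels are made up.

# A sketch of entropy and information gain over class labels
# (Shannon entropy, log base 2); the labels below are made-up examples.
from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(parent, left, right):
    # E1 - E2: entropy before the split minus the (size-weighted)
    # entropy of the two subsets after the split
    n = len(parent)
    e2 = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - e2

parent = ["apple", "apple", "lemon", "lemon", "grape", "grape"]
left   = ["grape", "grape"]                      # e.g. diameter < 3
right  = ["apple", "apple", "lemon", "lemon"]    # e.g. diameter >= 3
print(entropy(parent))                       # E1, about 1.58
print(information_gain(parent, left, right)) # about 0.92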
Leaf Node: a leaf node carries the classification or the decision.
Decision Node: a decision node has two or more branches.
Root Node: the topmost decision node is known as the root node.
How does a Decision Tree work?
Let’s try to understand how a Decision Tree works with a simple example.
Problem statement: to classify the different types of fruits in the bowl based on different features. The dataset (the bowl) looks quite messy, and the entropy is high in this case.
Training dataset:
Color  | Diameter | Label
Red    | 3        | Apple
Yellow | 3        | Lemon
Purple | 1        | Grapes
Red    | 3        | Apple
Yellow | 3        | Lemon
Purple | 1        | Grapes
How to split the data: we have to frame the conditions that split the data in such a way that the information gain is the highest. Note: gain is the measure of the decrease in entropy after splitting.
Now we will try to choose a condition that gives us the highest gain. We will do that by splitting the data using each condition and checking the gain that we get out of them. The condition that gives us the highest gain will be used to make the first split.
Conditions: on the training dataset above we can frame candidate conditions such as Color == Purple?, Color == Yellow?, Color == Red?, Diameter == 3?, and Diameter == 1?. Let’s say the condition “Is diameter >= 3?” gives us the maximum gain; a sketch of checking these conditions follows.
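Here is a rough sketch of how the gain of each candidate condition could be checked on the toy fruit dataset reconstructed above. The condition names follow the slide; everything else (the predicate functions, the weighted-entropy formula) is an illustrative assumption, not the deck’s own code.

# A sketch of checking each candidate condition's information gain on the
# toy fruit dataset above.
from collections import Counter
from math import log2

rows = [
    ("Red", 3, "Apple"), ("Yellow", 3, "Lemon"), ("Purple", 1, "Grapes"),
    ("Red", 3, "Apple"), ("Yellow", 3, "Lemon"), ("Purple", 1, "Grapes"),
]

def entropy(rows):
    counts = Counter(label for _, _, label in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain(rows, condition):
    true_rows  = [r for r in rows if condition(r)]
    false_rows = [r for r in rows if not condition(r)]
    if not true_rows or not false_rows:
        return 0.0
    n = len(rows)
    e2 = (len(true_rows) / n) * entropy(true_rows) \
       + (len(false_rows) / n) * entropy(false_rows)
    return entropy(rows) - e2

conditions = {
    "Color == Purple?":  lambda r: r[0] == "Purple",
    "Color == Yellow?":  lambda r: r[0] == "Yellow",
    "Color == Red?":     lambda r: r[0] == "Red",
    "Is diameter >= 3?": lambda r: r[1] >= 3,
    "Diameter == 1?":    lambda r: r[1] == 1,
}

# On this tiny dataset several conditions tie for the best gain;
# the deck proceeds with "Is diameter >= 3?" as the first split.
for name, cond in conditions.items():
    print(name, round(gain(rows, cond), 3))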
We split the data on “Is diameter >= 3?”. The entropy after splitting has decreased considerably. The False branch (diameter < 3) has already attained an entropy value of zero: as you can see, there is only one kind of label left on that branch, so no further splitting is required for that node. However, the True branch (diameter >= 3) still requires a split to decrease the entropy further.
So, we split the right node further based on the color (“Is color yellow?”). The entropy in this case is now zero, and each leaf predicts with 100% accuracy: diameter < 3 predicts grapes, diameter >= 3 and yellow predicts lemon, and diameter >= 3 and not yellow predicts apple. The finished tree can be written as plain if/else rules, as sketched below.
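Read directly off the splits above, the finished tree is just a pair of nested if/else rules; a minimal sketch:

# The finished decision tree written as plain if/else rules (a sketch)
def classify_fruit(color, diameter):
    if diameter >= 3:
        if color == "Yellow":
            return "Lemon"
        return "Apple"
    return "Grapes"

print(classify_fruit("Red", 3))      # Apple
print(classify_fruit("Yellow", 3))   # Lemon
print(classify_fruit("Purple", 1))   # Grapes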
How does Random Forest work?
Suppose the forest contains three trees.
Let this be Tree 1: Is diameter >= 3? and then Is color orange?
Let this be Tree 2: Is color red? and then Is shape a circle?
Let this be Tree 3: Is diameter 1? and then Does it grow in summer?
Together, Tree 1, Tree 2, and Tree 3 form the Random Forest.
Now let’s try to classify this fruit: diameter = 3, colour = orange, grows in summer = yes, shape = circle. Tree 1 classifies it as an orange, Tree 2 classifies it as a cherry, and Tree 3 classifies it as an orange.
The three trees vote: Tree 1 votes orange, Tree 2 votes cherry, and Tree 3 votes orange. Orange wins the majority vote, two votes against one for cherry; a sketch of this vote follows.
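Here is a small sketch of that vote in Python. The split questions and the three votes follow the slides; the labels returned on branches the slides do not show are illustrative guesses.

# A sketch of the three toy trees voting on the mystery fruit
from collections import Counter

fruit = {"diameter": 3, "color": "orange", "grows_in_summer": True, "shape": "circle"}

def tree_1(f):
    # Is diameter >= 3? then Is color orange?
    if f["diameter"] >= 3 and f["color"] == "orange":
        return "orange"
    return "cherry"

def tree_2(f):
    # Is color red? then Is shape a circle? (this tree votes cherry here)
    if f["color"] == "red":
        return "cherry"
    return "cherry" if f["shape"] == "circle" else "orange"

def tree_3(f):
    # Is diameter 1? then Does it grow in summer?
    if f["diameter"] == 1:
        return "cherry"
    return "orange" if f["grows_in_summer"] else "cherry"

votes = [tree_1(fruit), tree_2(fruit), tree_3(fruit)]
print(votes)                                # ['orange', 'cherry', 'orange']
print(Counter(votes).most_common(1)[0][0])  # orange wins the majority vote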
So the fruit is classified as an orange.
Use Case – Iris Flower Analysis
Wonder what species of Iris these flowers belong to? Use Case - Problem Statement
Let’s try to predict the species of the flowers using machine learning in Python Use Case - Problem Statement
Let’s see how it can be done.
Use Case - Implementation:

# Loading the library with the iris dataset
from sklearn.datasets import load_iris
# Loading scikit's random forest classifier library
from sklearn.ensemble import RandomForestClassifier
# Loading pandas
import pandas as pd
# Loading numpy
import numpy as np
# Setting random seed
np.random.seed(0)

# Creating an object called iris with the iris data
iris = load_iris()
# Creating a dataframe with the four feature variables
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Viewing the top 5 rows
df.head()

# Adding a new column for the species name
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
# Viewing the top 5 rows
df.head()

# Creating test and train data
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
# Viewing the top 5 rows
df.head()

# Creating dataframes with test rows and training rows
train, test = df[df['is_train']==True], df[df['is_train']==False]
# Show the number of observations for the test and training dataframes
print('Number of observations in the training data:', len(train))
print('Number of observations in the test data:', len(test))

# Create a list of the feature columns' names
features = df.columns[:4]
# View features
features

# Converting each species name into digits
y = pd.factorize(train['species'])[0]
# Viewing target
y

# Creating a random forest classifier
clf = RandomForestClassifier(n_jobs=2, random_state=0)
# Training the classifier
clf.fit(train[features], y)

# Applying the trained classifier to the test data
clf.predict(test[features])

# Viewing the predicted probabilities of the first 10 observations
clf.predict_proba(test[features])[0:10]

# Mapping names for the plants for each predicted plant class
preds = iris.target_names[clf.predict(test[features])]
# Viewing the PREDICTED species for the first five observations
preds[0:5]

# Viewing the ACTUAL species for the first five observations
test['species'].head()

# Creating the confusion matrix
pd.crosstab(test['species'], preds, rownames=['Actual Species'], colnames=['Predicted Species'])
Use Case - Implementation: total number of predictions = 32, number of accurate predictions = 30, number of inaccurate predictions = 2. Model accuracy = 30/32 x 100 = 93.75, so the model accuracy is about 94%. A quick way to verify this from the predictions is sketched below.
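As a quick sanity check, the same accuracy figure can be computed directly from the predictions made earlier (this continues the code above; whether it comes out to exactly 30 out of 32 depends on the random train/test split).

# Verifying the accuracy directly from the predictions made above
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(test['species'].astype(str), preds)
print('Model accuracy:', accuracy)   # roughly 0.94 for 30 correct out of 32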
Key takeaways