Having worked through the movie preference example by hand, we will now, as promised, implement Naïve Bayes from scratch. After that, we will implement it again using the scikit-learn package.
Implementing Naïve Bayes from scratch
Before we develop the model, let’s define the toy dataset we just worked with:
>>> import numpy as np
>>> X_train = np.array([
...     [0, 1, 1],
...     [0, 0, 1],
...     [0, 0, 0],
...     [1, 1, 0]])
>>> Y_train = ['Y', 'N', 'Y', 'Y']
>>> X_test = np.array([[1, 1, 0]])
For the model, we start with the prior. We first group the samples by label and record the indices of the samples that belong to each class:
>>> def get_label_indices(labels):
...     """
...     Group samples based on their labels and return indices
...     @param labels: list of labels
...     @return: dict, {class1: [indices], class2: [indices]}
...     ...
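As a standalone illustration of the grouping step, here is a minimal sketch that maps each label to the indices of its samples and then derives the class priors from those groups. The helper names `group_indices_by_label` and `prior_from_indices` are hypothetical and chosen for this example only. Using `defaultdict` and computing the prior as a plain class frequency (no smoothing) are assumptions, not necessarily how the listing above proceeds.

```python
from collections import defaultdict


def group_indices_by_label(labels):
    """Map each class label to the list of sample indices carrying it.

    Hypothetical helper for illustration only.
    """
    label_indices = defaultdict(list)
    for index, label in enumerate(labels):
        label_indices[label].append(index)
    # Return a plain dict so the result prints and compares cleanly.
    return dict(label_indices)


def prior_from_indices(label_indices):
    """Estimate each class prior as its share of the training samples.

    Assumption: plain relative frequency, without any smoothing.
    """
    total = sum(len(indices) for indices in label_indices.values())
    return {label: len(indices) / total
            for label, indices in label_indices.items()}


# Labels of the toy dataset defined earlier.
Y_train = ['Y', 'N', 'Y', 'Y']

label_indices = group_indices_by_label(Y_train)
print(label_indices)   # {'Y': [0, 2, 3], 'N': [1]}

prior = prior_from_indices(label_indices)
print(prior)           # {'Y': 0.75, 'N': 0.25}
```

With three of the four training samples labeled `Y`, the frequencies come out as 3/4 for `Y` and 1/4 for `N`. Keeping the indices rather than just the counts is convenient because the same groups can later be used to select the rows of `X_train` for each class.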