K Nearest Neighbors with Python | ML

The K-Nearest Neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used for both classification and regression. It finds a predefined number (k) of training samples closest in distance to a new sample and predicts from them: a majority vote over their labels for classification, or an average of their values for regression.
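To make the idea concrete before turning to scikit-learn, here is a minimal from-scratch sketch of KNN classification. The function name `knn_predict` and the toy data are made up for illustration, and the sketch assumes Euclidean distance with a plain majority vote; it is not how scikit-learn implements the algorithm internally.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Euclidean distance from x_new to every training sample
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k training samples with the smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the labels of those neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two small clusters in 2D
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, [1.1, 1.0], k=3))  # query near the first cluster
print(knn_predict(X_toy, y_toy, [5.1, 5.0], k=3))  # query near the second cluster
```

Note that there is no training step beyond storing the data: all the work happens at prediction time, which is why KNN is often called a lazy learner.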

Here's a step-by-step guide to implement KNN for classification using Python and the scikit-learn library:

1. Install Necessary Libraries

First, you'll need to install numpy and scikit-learn, plus matplotlib for the plot in the last section:

    pip install numpy scikit-learn matplotlib

2. Load Dataset

For this example, let's use the Iris dataset, which is built into scikit-learn.

    from sklearn import datasets

    iris = datasets.load_iris()
    X, y = iris.data, iris.target

3. Split Dataset into Training and Test Set

    from sklearn.model_selection import train_test_split

    # Hold out 20% of the samples for testing; random_state makes the split reproducible
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

4. Standardize Features

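KNN is sensitive to feature scaling because it relies on distances between data points: a feature with a larger numeric range would otherwise dominate the distance. So it's usually a good idea to scale the features:

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    # Fit the scaler on the training data only, then reuse its statistics on the test data
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)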
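As a rough check of what the scaling step does, the sketch below standardizes the Iris training features and compares the result with a manual computation. The variable names ending in `_raw` and `_scaled` are chosen here for illustration so they don't overwrite the tutorial's `X_train` and `X_test`.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train_raw, X_test_raw = train_test_split(iris.data, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_raw)
X_test_scaled = scaler.transform(X_test_raw)

# After scaling, each training feature has mean ~0 and standard deviation ~1
print("Train means:", X_train_scaled.mean(axis=0).round(3))
print("Train stds: ", X_train_scaled.std(axis=0).round(3))

# The test set reuses the training statistics, so its means are only roughly 0
print("Test means: ", X_test_scaled.mean(axis=0).round(3))

# Standardization is (x - mean) / std, computed per feature on the training data
manual = (X_train_raw - X_train_raw.mean(axis=0)) / X_train_raw.std(axis=0)
print("Matches manual computation:", np.allclose(manual, X_train_scaled))
```

Fitting the scaler on the training data alone keeps information about the test set from leaking into the model.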

5. Build and Train KNN Classifier

    from sklearn.neighbors import KNeighborsClassifier

    k = 3  # Number of neighbors to consider
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)

6. Make Predictions

y_pred = knn.predict(X_test) 
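The same call works for a single new observation, as long as it is scaled with the scaler fitted on the training data. The sketch below repeats the earlier steps so it runs on its own; the measurements in `new_flower` are illustrative values, not part of the original walkthrough.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Illustrative measurements in cm: sepal length, sepal width, petal length, petal width
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])
# Apply the training-set scaling before predicting
new_flower_scaled = scaler.transform(new_flower)

predicted_class = knn.predict(new_flower_scaled)[0]
# With the default uniform weights, predict_proba returns the share of
# the k neighbors that belong to each class
class_probabilities = knn.predict_proba(new_flower_scaled)[0]

print("Predicted species:", iris.target_names[predicted_class])
print("Neighbor vote shares:", class_probabilities)
```

Forgetting the `scaler.transform` call is a common mistake: the model would then compare unscaled inputs against scaled training points.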

7. Evaluate the Model

    from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

    print("Confusion Matrix:")
    print(confusion_matrix(y_test, y_pred))

    print("\nClassification Report:")
    print(classification_report(y_test, y_pred))

    print("\nAccuracy Score:")
    print(accuracy_score(y_test, y_pred))

Finding the Best Value of K

One common approach is to run KNN with several values of k and choose the one with the lowest error on held-out data. For simplicity, the loop below measures the error on the test set; in a real project, use a separate validation set or cross-validation so that the test set stays untouched until the final evaluation.

    import numpy as np
    import matplotlib.pyplot as plt

    error_rate = []
    for i in range(1, 40):
        knn = KNeighborsClassifier(n_neighbors=i)
        knn.fit(X_train, y_train)
        pred_i = knn.predict(X_test)
        # Fraction of test samples that were misclassified for this k
        error_rate.append(np.mean(pred_i != y_test))

    plt.figure(figsize=(10, 6))
    plt.plot(range(1, 40), error_rate, color='blue', linestyle='dashed',
             marker='o', markerfacecolor='red', markersize=10)
    plt.title('Error Rate vs. K Value')
    plt.xlabel('K')
    plt.ylabel('Error Rate')
    plt.show()

You can choose the value of k that gives the minimum error rate.
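A variant that keeps the test set out of the selection is to pick k by cross-validation on the training data. The sketch below is one possible way to do this, assuming 5-fold cross-validation and a scikit-learn pipeline so that scaling is refit inside each fold; names such as `cv_error` and `best_k` are chosen here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

k_values = list(range(1, 40))
cv_error = []
for k in k_values:
    # The pipeline refits the scaler on each fold's training portion
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_train_raw, y_train, cv=5)
    cv_error.append(1 - scores.mean())

# argmin returns the first minimum, so ties resolve to the smallest k
best_k = k_values[int(np.argmin(cv_error))]
print("Best k by 5-fold cross-validation:", best_k)

# Refit on the full training set and evaluate once on the held-out test set
final_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final_model.fit(X_train_raw, y_train)
print("Held-out test accuracy:", final_model.score(X_test_raw, y_test))
```

On a dataset as small as Iris, several values of k often give nearly the same error, so the exact choice matters less than avoiding the extremes.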

This is a basic introduction to KNN with Python. There are various nuances and best practices that can be explored as you dive deeper into real-world applications.

