K Nearest Neighbors with Python | ML

The K-Nearest Neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used for both classification and regression. It finds a predefined number (k) of training samples closest in distance to a new sample and predicts from them: by majority vote among the neighbors' labels for classification, or by averaging the neighbors' values for regression.
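To make the idea concrete before turning to scikit-learn, here is a minimal from-scratch sketch of KNN classification. The function name `knn_predict` and the toy data are made up for illustration, and Euclidean distance with a simple majority vote is assumed; it is not how scikit-learn implements the algorithm internally.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from x_new to every training sample
    distances = np.linalg.norm(X_train - np.asarray(x_new, dtype=float), axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters (hypothetical values)
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_toy = [0, 0, 0, 1, 1, 1]

print(knn_predict(X_toy, y_toy, [1.1, 1.0], k=3))  # point near the first cluster
print(knn_predict(X_toy, y_toy, [5.1, 5.0], k=3))  # point near the second cluster
```

Note that there is no training step beyond storing the data: all the work happens at prediction time, which is why KNN is often called a lazy learner.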

Here's a step-by-step guide to implementing KNN for classification using Python and the scikit-learn library:

1. Install Necessary Libraries

First, you'll need to install numpy and scikit-learn, plus matplotlib for the plot in the final section:

pip install numpy scikit-learn matplotlib

2. Load Dataset

For this example, let's use the Iris dataset, which is built into scikit-learn.

from sklearn import datasets
iris = datasets.load_iris()
X, y = iris.data, iris.target

3. Split Dataset into Training and Test Set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

4. Standardize Features

KNN is sensitive to feature scaling because it relies on distances between data points. So, it's usually a good idea to scale the features:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
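The effect of scaling on distances can be seen in a small sketch. The two-feature data below (height in meters, income in dollars) is hypothetical and chosen so that the unscaled income column dominates the distance; `nearest_index` is a helper defined here for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 is height in meters, column 1 is income in dollars
X_demo = np.array([[1.50, 50000.0],
                   [1.90, 50500.0]])
query = np.array([[1.52, 50400.0]])


def nearest_index(points, q):
    """Return the row index of the point closest to q (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(points - q, axis=1)))


# On raw values, income differences (hundreds) swamp height differences (fractions)
raw_nearest = nearest_index(X_demo, query)

# After standardizing, both features contribute on a comparable scale
scaler_demo = StandardScaler().fit(X_demo)
scaled_nearest = nearest_index(scaler_demo.transform(X_demo),
                               scaler_demo.transform(query))

print("Nearest neighbor on raw features:   ", raw_nearest)
print("Nearest neighbor on scaled features:", scaled_nearest)
```

In this toy case the nearest neighbor changes once the features are standardized, because the query is very close in height to the first point but slightly closer in income to the second.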

5. Build and Train KNN Classifier

from sklearn.neighbors import KNeighborsClassifier
k = 3  # Number of neighbors to consider
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)

6. Make Predictions

y_pred = knn.predict(X_test)
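Besides hard labels, it can help to look at how the neighbors voted. The sketch below repeats the earlier steps so it runs on its own, then uses `predict_proba` and `kneighbors`, both methods of scikit-learn's `KNeighborsClassifier`; with the default uniform weights, the probabilities are simply the fraction of the k neighbors belonging to each class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Same setup as the steps above, repeated so this block is self-contained
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Fraction of the 3 neighbors that voted for each class, one row per test sample
proba = knn.predict_proba(X_test)

# Distances to, and training-set indices of, the 3 neighbors of the first test sample
distances, indices = knn.kneighbors(X_test[:1])

print(proba[:3])
print(distances, indices)
print(y_train[indices[0]])  # labels of those 3 neighbors
```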

7. Evaluate the Model

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("\nAccuracy Score:")
print(accuracy_score(y_test, y_pred))

Finding the Best Value of K

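One common approach is to run KNN multiple times with different values of k and choose the one that performs best on a validation set. For simplicity, the loop below scores each k on the test set; in practice, use a separate validation set or cross-validation so that the test set still gives an unbiased estimate of the final model's performance.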

import numpy as np
import matplotlib.pyplot as plt
# Record the misclassification rate for each k from 1 to 39
error_rate = []
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
# Plot error rate against k
plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, color='blue', linestyle='dashed', marker='o', markerfacecolor='red', markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')
plt.show()

You can choose the value of k that gives the minimum error rate.
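As an alternative that keeps the test set out of the selection process, k can be chosen by cross-validation on the training data. The following is a sketch under a few assumptions not in the steps above: 5-fold cross-validation, accuracy as the score, and a `make_pipeline` wrapper so the scaler is refit inside each fold. Names such as `cv_scores`, `best_k`, and `final_model` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

k_values = list(range(1, 40))
cv_scores = []
for k in k_values:
    # Scaling lives inside the pipeline, so each fold is scaled using only its own training part
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_train, y_train, cv=5)
    cv_scores.append(scores.mean())

# Pick the k with the highest mean cross-validated accuracy (first one on ties)
best_k = k_values[int(np.argmax(cv_scores))]

# Refit on the full training set and evaluate once on the held-out test set
final_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final_model.fit(X_train, y_train)

print("Best k:", best_k)
print("Test accuracy:", final_model.score(X_test, y_test))
```

Because the test set is touched only once, after k has been fixed, the reported accuracy is not inflated by the search over k.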

This is a basic introduction to KNN with Python. There are various nuances and best practices that can be explored as you dive deeper into real-world applications.
