Suppose I have a dataset and build an ML model on it. The dataset is updated weekly, and each time it is updated I want my model to predict the class of the new rows and append them to the original dataset. How can I do this?
This is what I tried:

    import pandas as pd
    import numpy as np
    import sklearn
    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
    names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
    df = pd.read_csv(url, names=names)
    df

    array = df.values
    X = array[:,0:4]
    y = array[:,4]
    X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1)

I skip some steps where I check the score for different models.

    model = SVC(gamma='auto')
    model.fit(X_train, Y_train)
    predictions = model.predict(X_validation)

Here I add new data to make my test:

    new_data = [[5.9, 3.0, 5.7, 1.5], [4.8, 2.9, 3.0, 1.2]]
    df2 = pd.DataFrame(new_data, columns=["sepal-length", "sepal-width", "petal-length", "petal-width"])
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df3 = pd.concat([df, df2], ignore_index=True)
    df3

    array2 = df3.values
    X2 = array2[:,0:4]
    predict = model.predict(X2)
    predict

    df3['pred'] = predict

    def final_class(row):
        # keep the known class; use the prediction only where the class is missing
        if pd.isnull(row['class']):
            return row['pred']
        else:
            return row['class']

    df3['final_class'] = df3.apply(final_class, axis=1)
    df3

It works, but I don't think this is the best way to do it. Can someone help me?
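
To make the goal more concrete, here is a minimal sketch of the flow I have in mind: predict only for the rows that arrived in the update, then append them to the existing data. The helper name `append_with_predictions` and the `predicted` flag column are hypothetical, made up for illustration. To keep the sketch runnable offline I also assume scikit-learn's bundled copy of iris instead of the CSV URL, so the class labels are `setosa`, `versicolor` and `virginica`, and I fit on all rows without the train/validation split.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.svm import SVC

FEATURES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]


def append_with_predictions(df, new_rows, model, features=FEATURES, target="class"):
    """Hypothetical helper: label new_rows with the model and append them to df."""
    new_rows = new_rows.copy()
    # Predict only the unlabeled rows instead of re-scoring the whole dataset.
    new_rows[target] = model.predict(new_rows[features].to_numpy())
    # Mark these rows so predicted labels can be told apart from observed ones.
    new_rows["predicted"] = True

    labeled = df.copy()
    if "predicted" not in labeled.columns:
        labeled["predicted"] = False
    return pd.concat([labeled, new_rows], ignore_index=True)


# Assumption: bundled iris data stands in for the weekly-updated dataset.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df["class"] = iris.target_names[iris.target]

model = SVC(gamma="auto")
model.fit(df[FEATURES].to_numpy(), df["class"])

# The same two test rows as in my attempt.
new_rows = pd.DataFrame(
    [[5.9, 3.0, 5.7, 1.5], [4.8, 2.9, 3.0, 1.2]],
    columns=FEATURES,
)
updated = append_with_predictions(df, new_rows, model)
print(updated.tail(3))
```

Each weekly update would then be one call to the helper with that week's new rows, and the `predicted` column keeps track of which labels came from the model.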