I have a simple CSV with two columns:
- ErrorWeek (a number for the week number in the year)
- ErrorCount (for the number of errors in a given week)
I read the CSV data into a pandas dataframe, like this:

    import pandas as pd

    df = pd.read_csv("Errors.csv", sep=",")

df.head() shows:

       ErrorWeek  ErrorCount
    0          1          80
    1          2         118
    2          3         249
    3          4         397
    4          5         159

So far so good.
Then, I create a test/train split, like this:

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        df['ErrorWeek'], df['ErrorCount'], random_state=0)

No errors so far.
But, I then create a linear regression object and try to fit the data.

    from sklearn import linear_model

    # Create linear regression object
    regr = linear_model.LinearRegression()

    # Train the model using the training sets
    regr.fit(X_train, y_train)

Here I do get an error: "Reshape your data either using array.reshape(-1, 1)"
--
Looking at the shape of X_train and y_train, I get what looks like two one-dimensional "arrays":

    X_train shape: (36,)
    y_train shape: (36,)

--
I have spent many hours trying to figure this out, but I'm new to pandas, Python, and scikit-learn.
I'm reading in two-dimensional data, but pandas isn't seeing it that way.
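
The shape mismatch described above can be reproduced with a small sketch. It is only an illustration under stated assumptions: the dataframe is built in memory instead of being read from Errors.csv, the last three rows are invented, and the names X_1d and X_2d are hypothetical. It shows that selecting a column with single brackets gives a 1-D Series, while double brackets give a 2-D dataframe with one column, which is the layout the reshape(-1, 1) hint in the error message points toward.

```python
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Stand-in for Errors.csv: the five rows from df.head() plus three invented rows.
df = pd.DataFrame({
    "ErrorWeek": [1, 2, 3, 4, 5, 6, 7, 8],
    "ErrorCount": [80, 118, 249, 397, 159, 210, 305, 188],
})

# Single brackets select one column as a 1-D Series.
X_1d = df["ErrorWeek"]
print(X_1d.shape)  # (8,)

# Double brackets select a list of columns and keep the result 2-D.
X_2d = df[["ErrorWeek"]]
print(X_2d.shape)  # (8, 1)

# Split using the 2-D feature table; the target can stay 1-D.
X_train, X_test, y_train, y_test = train_test_split(
    X_2d, df["ErrorCount"], random_state=0)

# Fit and predict with features shaped (n_samples, n_features).
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
predictions = regr.predict(X_test)
print(X_train.shape, predictions.shape)
```

An apparently equivalent route is to keep the single-bracket selection and reshape the underlying array, for example df["ErrorWeek"].to_numpy().reshape(-1, 1), which is what the error message seems to be suggesting.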
What do I need to do, specifically?
Thanks,