
I'm trying to do a linear regression, but I keep running into the same error: "ValueError: x and y must be the same size". I'm very confused and have searched everywhere to try to fix it. If anyone knows what's wrong, that would be a massive help; I don't understand what to do.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn import datasets, linear_model
    from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
    import matplotlib.pyplot as plt

    #load datatset
    df = pd.read_csv('Real_estate.csv')
    X = df[['transaction date', 'house age', 'distance to the nearest MRT station','number of convenience stores', 'latitude','longitude']]
    y = df['house price of unit area']

    x= df.iloc[:,0:-7].values
    y= df.iloc[:,1:].values

    x, y = np.array(x), np.array(y)

    model = LinearRegression()
    model.fit(x, y)

    model = LinearRegression().fit(x, y)

    x_train, x_test, y_train, y_test = train_test_split(
            x, y, test_size = 0.4)

    sc = StandardScaler()
    sc.fit(x_train)
    x_train_std = sc.transform(x_train)
    x_test_std = sc.transform(x_test)

    regr = linear_model.LinearRegression()
    regr.fit(x_train_std, y_train)
    y_pred = regr.predict(x_test)

    r_sq = model.score(x, y)
    print("Intercept: ", regr.intercept_)
    print("Coefficients: \n", regr.coef_)
    # The mean squared error
    print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
    ##Model evaluation
    print("Mean absolute error: %.2f" % mean_absolute_error(y_test,y_pred))
    print("Coefficient of determination: %.2f" % r2_score(y_test, y_pred))

    y_pred = model.predict(x)
    print('predicted response:', y_pred, sep='\n')

    plt.scatter(x_test,y_test, color="black")
    plt.plot(x_test, y_pred, color="blue", linewidth=3)
    plt.xticks(())
    plt.yticks(())
    plt.show()

This is my code, but I don't understand what's going wrong. I'm trying to use 7 columns, including the y value. I'm a beginner to Python, so I apologize if this is a very silly question. Thank you.
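A minimal sketch of selecting the six feature columns and the one target column by name rather than by position. It assumes the CSV really contains the column names used in the question; a small synthetic DataFrame stands in for `Real_estate.csv` so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Column names copied from the question; assumed to match the real CSV headers.
feature_cols = [
    "transaction date",
    "house age",
    "distance to the nearest MRT station",
    "number of convenience stores",
    "latitude",
    "longitude",
]
target_col = "house price of unit area"

# Synthetic stand-in for pd.read_csv('Real_estate.csv'): 20 rows, 7 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 7)), columns=feature_cols + [target_col])

# Select by name: 6 feature columns for X and 1 target column for y.
X = df[feature_cols].to_numpy()
y = df[target_col].to_numpy()

print(X.shape, y.shape)  # (20, 6) (20,)
```

Selecting by name avoids positional slices such as `iloc[:,0:-7]` and `iloc[:,1:]`, whose result depends on how many columns the file happens to have.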

  • This is hard to read. Just copy and paste the code into the question, select the part you want to show as code, and press Ctrl+K; it will be formatted correctly. Commented Mar 29, 2022 at 23:05
  • Please post the full traceback of the error (again as formatted text). Commented Mar 29, 2022 at 23:09

1 Answer

plt.plot(x_test, y_pred, color="blue", linewidth=3) 

Both arguments need to have the same length (the same first dimension), but y_pred is the prediction over the entire x instead of over x_test.
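A minimal sketch, using made-up arrays, that reproduces the mismatch: 10 test rows plotted against 25 predictions (one per row of a hypothetical full `x`). The names `x_test` and `y_pred_all` are illustrative only. In current matplotlib releases, `scatter` and `plot` reject such inputs with slightly different messages, so it is worth checking the shapes passed to both calls; the wording in the question matches the `scatter` message.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x_test = np.arange(10).reshape(-1, 1)    # 10 test rows, 1 feature
y_pred_all = np.arange(25, dtype=float)  # predictions for all 25 rows of the full x

fig, ax = plt.subplots()
messages = []
# Try both plotting calls from the question and record the error each one raises.
for draw in (ax.scatter, ax.plot):
    try:
        draw(x_test, y_pred_all)
    except ValueError as err:
        messages.append(str(err))

for msg in messages:
    print(msg)
plt.close(fig)
```

Printing `x_test.shape` and `y_pred.shape` right before plotting is usually the quickest way to spot this kind of mismatch.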

Change

y_pred = model.predict(x) 

to

y_pred = model.predict(x_test) 
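A minimal sketch of the train/test part with consistent shapes, on synthetic data (100 rows, 6 features) standing in for the CSV. It also standardizes the test features before predicting: the question's code passes the unscaled `x_test` to `regr`, which was trained on `x_train_std`. That doesn't raise an error, but the predictions are then computed on a different scale.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for the real-estate data: 6 features, 1 target column.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 6))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0, -1.0]) + rng.normal(scale=0.1, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

# Fit the scaler on the training split only, then apply it to both splits.
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

regr = LinearRegression().fit(X_train_std, y_train)

# Predict on the scaled test features: one prediction per test row,
# on the same scale the model was trained on.
y_pred = regr.predict(X_test_std)
assert y_pred.shape == y_test.shape

print("Mean squared error: %.3f" % mean_squared_error(y_test, y_pred))
print("Coefficient of determination: %.3f" % r2_score(y_test, y_pred))
```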

1 Comment

Alternatively, change plt.plot(x_test...) to plt.plot(x...).
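A minimal sketch of that alternative, assuming a model with a single feature and synthetic data. With several feature columns, `plot` draws one line per column of `x`, so a fitted line only makes sense against one feature at a time. The rows are sorted by `x` so the line is drawn left to right instead of zigzagging.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic single-feature data (hypothetical; stands in for one column of x).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * x[:, 0] + 1.0 + rng.normal(size=50)

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)  # one prediction per row of the full x

order = np.argsort(x[:, 0])  # sort rows so the line is drawn left to right
fig, ax = plt.subplots()
ax.scatter(x[:, 0], y, color="black")
ax.plot(x[order, 0], y_pred[order], color="blue", linewidth=3)
```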
