ValueError when trying to fit regression model

Question

I have a data set of 1000 observations where x is the independent variable and y is the dependent variable. When I try to fit a simple regression model, I get the following error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

There are NaN values or missing data in the data set, and I also tried an imputer with the mean strategy for missing data (if any).

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('dataset1.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(x[:, 0:1])
x[:, 0:1] = imputer.transform(x[:, 0:1])
imputer = imputer.fit(y)
y = imputer.transform(y)
from sklearn.cross_validation import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)

Error message:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Dataset: https://drive.google.com/file/d/1Ryl5my2RG2LpyByhQ_qqgVb7ztZeGtie/view?usp=sharing

Please use correct upper case letters first. This would improve the readability of your text. Don't link to datasets. Offer us a sample data set directly in the text.
– buhtz, Commented Jun 13, 2019 at 7:56

Scriddie · Accepted Answer · 2019-06-13 08:19:07Z

There is a missing label in your dataset (row 215 of the y column in the file you linked). To get rid of it you can simply add the following line right after you load the dataset:

dataset.dropna(subset=["y"], inplace=True)  # drop any rows with a missing label, in place
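To see where the NaN comes from before dropping anything, something along these lines may help. This is only a sketch: the small DataFrame is a hypothetical stand-in for dataset1.csv (the linked file is not reproduced here), and it assumes the columns are named "x" and "y".

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dataset1.csv: columns "x" and "y",
# with one missing label at index 2.
dataset = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.1, 3.9, np.nan, 8.2],
})

# Count missing values per column to find the offending column.
missing_per_column = dataset.isna().sum()

# Index labels of the rows whose label is missing.
missing_rows = dataset.index[dataset["y"].isna()].tolist()

# Drop those rows; the result has no NaN left in "y".
cleaned = dataset.dropna(subset=["y"])
```

With the real file, the same two checks on the DataFrame returned by pd.read_csv should point at the row with the missing label.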

In your code you are currently trying to impute the missing label. This does not make much sense since there is nothing to be learnt from a missing label and the corresponding x value looks suspicious, too. But just in case you are wondering, the imputation did not work because you need to reshape your array first:

imputer = imputer.fit(y.reshape(-1, 1))
y = imputer.transform(y.reshape(-1, 1))
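For reference, here is a minimal sketch of the same reshape step on recent scikit-learn versions, where Imputer has been replaced by SimpleImputer from sklearn.impute. The label values are made up for illustration and are not taken from the linked data set.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical labels with one missing value (not from the linked data set).
y = np.array([2.0, 4.0, np.nan, 6.0])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# The imputer expects a 2-D array of shape (n_samples, n_features),
# so the 1-D label vector is turned into a single column first.
y_2d = y.reshape(-1, 1)
y_imputed = imputer.fit_transform(y_2d)

# Flatten back to 1-D for use as a regression target.
y_filled = y_imputed.ravel()
```

As said above, imputing a label is rarely a good idea. The sketch only shows why the reshape is needed.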
