I have a data set of 1000 observations where x is the independent variable and y is the dependent variable. When I try to fit a simple regression model, I get the following error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

The data set may contain NaN values or missing data, so I also tried an imputer with the mean strategy to fill in missing data (if any).

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('dataset1.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(x[:, 0:1])
x[:, 0:1] = imputer.transform(x[:, 0:1])
imputer = imputer.fit(y)
y = imputer.transform(y)
from sklearn.cross_validation import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)

Error message:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Dataset: https://drive.google.com/file/d/1Ryl5my2RG2LpyByhQ_qqgVb7ztZeGtie/view?usp=sharing

  • Please use correct capitalization first. This would improve the readability of your text. Don't link to datasets; include a sample of the data directly in the text. Commented Jun 13, 2019 at 7:56

1 Answer

There is a missing label in your dataset (row 215 of the y column in the file you linked). To get rid of it you can simply add the following line right after you load the dataset:

dataset.dropna(subset=["y"], inplace=True)  # drop any rows with a missing label, in place
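To confirm where the NaN comes from before dropping anything, a quick check along these lines may help. This is only a sketch: the small DataFrame is a hypothetical stand-in for dataset1.csv, and the column name "x" is an assumption (the name "y" is taken from the `dropna` call above).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dataset1.csv: one feature column "x", one label column "y".
dataset = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.1, np.nan, 6.2, 8.1],
})

# Count missing values per column to see which column triggers the error.
missing_per_column = dataset.isna().sum()
print(missing_per_column)

# Show the offending rows before removing them.
rows_with_missing_label = dataset[dataset["y"].isna()]
print(rows_with_missing_label)

# Drop only the rows whose label is missing (returns a new DataFrame).
cleaned = dataset.dropna(subset=["y"])

x = cleaned[["x"]].values  # 2-D array, shape (n_samples, 1)
y = cleaned["y"].values    # 1-D array, shape (n_samples,)
```

Selecting the feature with double brackets (`cleaned[["x"]]`) keeps x two-dimensional, which is the shape `LinearRegression.fit` expects for its first argument.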

In your code you are currently trying to impute the missing label. This does not make much sense since there is nothing to be learnt from a missing label and the corresponding x value looks suspicious, too. But just in case you are wondering, the imputation did not work because you need to reshape your array first:

imputer = imputer.fit(y.reshape(-1, 1))
y = imputer.transform(y.reshape(-1, 1))
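For illustration, here is a minimal sketch of the reshape requirement on a hypothetical label vector. It assumes a recent scikit-learn release, where `SimpleImputer` from `sklearn.impute` replaces the `Imputer` class used in the question; the values in `y` are made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical label vector with one missing entry.
y = np.array([2.0, np.nan, 6.0, 8.0])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# A 1-D array is rejected: the imputer expects shape (n_samples, n_features).
try:
    imputer.fit(y)
    fit_1d_failed = False
except ValueError:
    fit_1d_failed = True

# Reshape to a single column, impute, then flatten back to 1-D.
y_imputed = imputer.fit_transform(y.reshape(-1, 1)).ravel()
print(y_imputed)
```

The missing entry is replaced by the mean of the observed labels, which is why imputing a label adds no real information for the regression.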