I have a data set of 1000 observations where x is the independent variable and y is the dependent variable. When I try to fit a simple linear regression model, I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
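This error means that at least one value reaching the estimator is NaN or ±infinity. A quick way to find the offending observations is to test every row with `np.isfinite`. The helper below is a minimal sketch, not scikit-learn's own validation code. `report_non_finite`, `x_demo` and `y_demo` are hypothetical names, and the toy arrays stand in for `dataset1.csv`:

```python
import numpy as np

def report_non_finite(arr, name):
    """Print the rows of `arr` that contain NaN or +/-inf and return their indices."""
    arr = np.asarray(arr, dtype=np.float64)
    # Treat a 1-D array as a single column so each row is one observation.
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)
    # A row is "bad" if any of its entries is not finite.
    bad_rows = np.where(~np.isfinite(arr).all(axis=1))[0]
    print(f"{name}: {bad_rows.size} non-finite row(s) at indices {bad_rows.tolist()}")
    return bad_rows

# Hypothetical toy data standing in for the real CSV columns.
x_demo = np.array([[1.0], [2.0], [np.nan], [4.0]])
y_demo = np.array([2.0, np.inf, 6.0, 8.0])

bad_x = report_non_finite(x_demo, "x")  # prints: x: 1 non-finite row(s) at indices [2]
bad_y = report_non_finite(y_demo, "y")  # prints: y: 1 non-finite row(s) at indices [1]
```

Running this check on both `x` and `y` immediately before `regressor.fit` shows which array, and which rows, still trigger the error.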
There are NaN values or missing data in the data set, so I also tried an Imputer with the mean strategy to fill in the missing data (if any).
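Mean imputation replaces each missing entry with the mean of the non-missing values in the same column. The NumPy sketch below illustrates the idea. It assumes missing values are encoded as NaN, and `mean_impute_columns` is a hypothetical helper rather than a scikit-learn API. Two points are worth noting. First, scikit-learn expects 2-D input of shape (n_samples, n_features), so a 1-D target such as `y` is normally reshaped with `y.reshape(-1, 1)` so that it is treated as one column rather than one sample. Second, mean imputation only fills NaN and leaves ±inf untouched.

```python
import numpy as np

def mean_impute_columns(arr):
    """Replace NaN in each column with that column's mean (NaN ignored)."""
    arr = np.asarray(arr, dtype=np.float64)
    was_1d = arr.ndim == 1
    if was_1d:
        # Treat a 1-D target as one column of n samples,
        # not as one sample with n features.
        arr = arr.reshape(-1, 1)
    col_means = np.nanmean(arr, axis=0)                 # one mean per column
    filled = np.where(np.isnan(arr), col_means, arr)    # broadcast means over rows
    return filled.ravel() if was_1d else filled

# Hypothetical toy data.
x_demo = np.array([[1.0], [np.nan], [3.0]])
y_demo = np.array([2.0, np.nan, 6.0, 8.0])

x_filled = mean_impute_columns(x_demo)  # [[1.], [2.], [3.]]
y_filled = mean_impute_columns(y_demo)  # NaN replaced by (2 + 6 + 8) / 3
```

In recent scikit-learn releases, the equivalent class is `sklearn.impute.SimpleImputer(missing_values=np.nan, strategy='mean')`. The older `sklearn.preprocessing.Imputer` used in the code below has since been removed.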
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('dataset1.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(x[:, 0:1])
x[:, 0:1] = imputer.transform(x[:, 0:1])
imputer = imputer.fit(y)
y = imputer.transform(y)

from sklearn.cross_validation import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
```

Error message:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
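For comparison, here is the whole workflow (drop non-finite rows, split 70/30, fit a simple linear regression) sketched in plain NumPy on synthetic data. This swaps scikit-learn's `LinearRegression` for a hand-written closed-form ordinary least squares fit, purely for illustration. The data, `fit_simple_ols`, and the seed are all hypothetical and not taken from `dataset1.csv`:

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x (one predictor)."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope

rng = np.random.default_rng(0)
# Hypothetical data: 1000 observations, y = 3 + 2x + noise,
# with a few NaN / inf values injected to mimic a dirty CSV.
x = rng.uniform(0, 10, size=1000)
y = 3.0 + 2.0 * x + rng.normal(0, 0.5, size=1000)
x[[5, 50]] = np.nan
y[[7]] = np.inf

# Keep only the rows where both x and y are finite.
mask = np.isfinite(x) & np.isfinite(y)
x_clean, y_clean = x[mask], y[mask]

# Random 70/30 split, analogous to test_size=0.3.
idx = rng.permutation(x_clean.size)
n_train = int(0.7 * x_clean.size)
train, test = idx[:n_train], idx[n_train:]

intercept, slope = fit_simple_ols(x_clean[train], y_clean[train])
pred = intercept + slope * x_clean[test]
rmse = np.sqrt(np.mean((pred - y_clean[test]) ** 2))
print(f"intercept={intercept:.2f} slope={slope:.2f} test RMSE={rmse:.2f}")
```

Dropping rows is an alternative to imputing them. It discards a few observations but never fabricates target values. In current scikit-learn, the split itself lives in `sklearn.model_selection.train_test_split`, because the `sklearn.cross_validation` module has been removed.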
Dataset: https://drive.google.com/file/d/1Ryl5my2RG2LpyByhQ_qqgVb7ztZeGtie/view?usp=sharing