I'm using Python's scikit-learn to fit a simple linear regression on data loaded from a CSV file.

    import numpy as np
    import pandas
    from sklearn import linear_model
    from sklearn.model_selection import train_test_split

    reader = pandas.io.parsers.read_csv("data/all-stocks-cleaned.csv")
    stock = np.array(reader)

    openingPrice = stock[:, 1]
    closingPrice = stock[:, 5]

    print(np.min(openingPrice))
    print(np.min(closingPrice))
    print(np.max(openingPrice))
    print(np.max(closingPrice))

    openingPriceTrain, openingPriceTest, closingPriceTrain, closingPriceTest = \
        train_test_split(openingPrice, closingPrice, test_size=0.25, random_state=42)

    # Reshape the training data to column vectors and cast to float64
    openingPriceTrain = np.reshape(openingPriceTrain, (openingPriceTrain.size, 1))
    openingPriceTrain = openingPriceTrain.astype(np.float64, copy=False)
    # openingPriceTrain = np.arange(openingPriceTrain, dtype=np.float64)

    closingPriceTrain = np.reshape(closingPriceTrain, (closingPriceTrain.size, 1))
    closingPriceTrain = closingPriceTrain.astype(np.float64, copy=False)

    # The test data is reshaped but not cast
    openingPriceTest = np.reshape(openingPriceTest, (openingPriceTest.size, 1))
    closingPriceTest = np.reshape(closingPriceTest, (closingPriceTest.size, 1))

    regression = linear_model.LinearRegression()
    regression.fit(openingPriceTrain, closingPriceTrain)
    predicted = regression.predict(openingPriceTest)

The min and max values are shown as:

    0.0
    0.6
    41998.0
    2593.9

Yet I'm getting this error:

    ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

How can I get rid of this error? Judging by the output above, the data doesn't seem to contain any infinities or NaN values.
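
Min and max alone may not be a reliable test here: `np.array(reader)` on a DataFrame with mixed column types gives an object-dtype array, and comparisons against NaN can behave unexpectedly there. A more direct check is to count the non-finite entries explicitly. The sketch below is only an illustration: `count_non_finite` and `sample` are hypothetical names that are not part of the script above, and the small array stands in for one column of `stock`.

```python
import numpy as np
import pandas as pd


def count_non_finite(values):
    """Count entries that are NaN or +/-inf after coercion to float64."""
    # errors="coerce" turns anything that cannot be parsed as a number
    # (for example a stray string) into NaN instead of raising.
    as_float = pd.to_numeric(pd.Series(values), errors="coerce")
    as_float = as_float.to_numpy(dtype=np.float64)
    return int((~np.isfinite(as_float)).sum())


# Hypothetical stand-in for a column such as stock[:, 5] (object dtype).
sample = np.array([0.6, 12.5, np.nan, 2593.9, "n/a"], dtype=object)
print(count_non_finite(sample))
```

Running the same helper on `openingPrice` and `closingPrice` before the train/test split would show whether either column contains values that the min/max printout hides.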
Edit: all-stocks-cleaned.csv is available at http://www.sharecsv.com/s/cb31790afc9b9e33c5919cdc562630f3/all-stocks-cleaned.csv