
I created the below table in Google Sheets and downloaded it as a CSV file.

[Image: screenshot of the table]

My code is posted below. I'm really not sure where it's failing. I tried to highlight and run the code line by line and it keeps throwing that error.

# Data Preprocessing

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values

# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 1:5 ])
X[:, 1:6] = imputer.transform(X[:, 1:5])

The error I'm getting is:

Could not convert string to float: 'Illinois' 

I also have this line above my error message

array = np.array(array, dtype=dtype, order=order, copy=copy) 

It seems like my code is not able to read my GPA column which contains floats. Maybe I didn't create that column right and have to specify that they're floats?

*** I'm updating with the full error message:

[15]: runfile('/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/data_preprocessing_template2.py', wdir='/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing')
Traceback (most recent call last):

  File "<ipython-input-15-5f895cf9ba62>", line 1, in <module>
    runfile('/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/data_preprocessing_template2.py', wdir='/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing')

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/data_preprocessing_template2.py", line 16, in <module>
    imputer = imputer.fit(X[:, 1:5 ])

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/imputation.py", line 155, in fit
    force_all_finite=False)

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)

ValueError: could not convert string to float: 'Illinois'
  • Use X[:, 2:], since the float values start from the 3rd column onwards. Commented Dec 21, 2017 at 3:01
  • Why not put the line that generates the error in your question? Commented Dec 21, 2017 at 3:04
  • "I'm really not sure where it's failing. [...] The error I'm getting is [...]" Please include the complete traceback (i.e. the complete error message) in the question. It will tell you where the code is failing. Commented Dec 21, 2017 at 3:15
  • Hi @WarrenWeckesser I've updated my post with the full error. Thank you. Commented Dec 21, 2017 at 3:57
  • @newcoder you still haven't pasted the error message fully. I recreated your case and ran it to see the full error message. Please see my answer. Commented Dec 21, 2017 at 4:00

2 Answers

3

Actually, the full error you are getting is this (it would have helped tremendously if you had pasted it in full):

Traceback (most recent call last):

  File "<ipython-input-7-6a92ceaf227a>", line 8, in <module>
    imputer = imputer.fit(X[:, 1:5 ])

  File "C:\Users\Fatih\Anaconda2\lib\site-packages\sklearn\preprocessing\imputation.py", line 155, in fit
    force_all_finite=False)

  File "C:\Users\Fatih\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)

ValueError: could not convert string to float: Illinois

which, if you look carefully, points out where it is failing:

imputer = imputer.fit(X[:, 1:5 ]) 

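This happens because you are trying to take the mean of a categorical variable, which doesn't make sense. The slice X[:, 1:5] includes the column that holds state names such as 'Illinois', and those strings cannot be converted to floats.

This has already been asked and answered in this StackOverflow thread.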
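As a rough illustration of the point above, here is a minimal sketch that reproduces the conversion error and then imputes only the numeric columns. The inline CSV and its column names (Name, State, GPA, Credits, Age, Admitted) are made up for the example, since the real Data2.csv is not shown. It also uses pandas' fillna with column means instead of the question's Imputer class, which newer scikit-learn releases replaced with sklearn.impute.SimpleImputer.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for Data2.csv; the real column names are not shown in the question.
csv_text = """Name,State,GPA,Credits,Age,Admitted
Ann,Illinois,3.5,30,20,Yes
Bob,Ohio,,45,,No
Cy,Texas,3.1,,22,Yes
"""
dataset = pd.read_csv(io.StringIO(csv_text))

# Reproduce the cause: an array that mixes text and numbers cannot be cast to float.
try:
    np.array(dataset[["State", "GPA"]].values, dtype=float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Impute only the numeric columns with their column means,
# leaving the categorical (text) columns untouched.
numeric_cols = dataset.select_dtypes(include="number").columns
dataset[numeric_cols] = dataset[numeric_cols].fillna(dataset[numeric_cols].mean())
```

Selecting columns by dtype avoids hard-coding positions such as 1:5, so a text column cannot slip into the slice by accident. Checking dataset.dtypes right after read_csv is a quick way to see which columns pandas parsed as text.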


2 Comments

Ok thank you, will make sure to post the whole error next time.
@newcoder Humbly, I highly recommend that you run your script one line at a time, rather than the entire script at once, while prototyping, developing, or learning. That way you stay fully in charge of what each line of code is actually doing, and it also makes debugging so much easier. I am glad I was able to help!
-2

Replace the line:

dataset = pd.read_csv('Data2.csv')

with:

dataset = pd.read_csv('Data2.csv', delimiter=";")

1 Comment

Adding a delimiter will not change anything.
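To see why the delimiter is unlikely to be the issue here, consider this small sketch. It assumes the file is comma-separated, which is what a Google Sheets CSV download produces; the sample rows are invented for illustration. Parsing comma-separated text with delimiter=";" does not fix anything; it collapses every row into a single text column.

```python
import io

import pandas as pd

# Invented sample in the comma-separated layout of a Google Sheets CSV export.
csv_text = "Name,State,GPA\nAnn,Illinois,3.5\nBob,Ohio,3.1\n"

# Default comma delimiter: three separate columns, GPA parsed as float.
with_commas = pd.read_csv(io.StringIO(csv_text))

# Semicolon delimiter: no semicolons are found, so each row becomes one string.
with_semicolons = pd.read_csv(io.StringIO(csv_text), delimiter=";")
```

Comparing with_commas.shape and with_semicolons.shape shows the difference: the first frame has three columns, the second only one.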
