Numpy Error "Could not convert string to float: 'Illinois'"

Question

I created the below table in Google Sheets and downloaded it as a CSV file.

[Image: the table created in Google Sheets]

My code is posted below. I'm really not sure where it's failing. I tried to highlight and run the code line by line and it keeps throwing that error.

# Data Preprocessing

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values

# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 1:5 ])
X[:, 1:6] = imputer.transform(X[:, 1:5])

The error I'm getting is:

Could not convert string to float: 'Illinois'

I also have this line above my error message

array = np.array(array, dtype=dtype, order=order, copy=copy)

It seems like my code is not able to read my GPA column which contains floats. Maybe I didn't create that column right and have to specify that they're floats?

*** I'm updating with the full error message:

[15]: runfile('/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/data_preprocessing_template2.py', wdir='/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing')
Traceback (most recent call last):
  File "<ipython-input-15-5f895cf9ba62>", line 1, in <module>
    runfile('/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/data_preprocessing_template2.py', wdir='/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing')
  File "/Users/jim/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)
  File "/Users/jim/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/Users/jim/Desktop/Machine Learning Class/Part 1/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/data_preprocessing_template2.py", line 16, in <module>
    imputer = imputer.fit(X[:, 1:5 ])
  File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/imputation.py", line 155, in fit
    force_all_finite=False)
  File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'Illinois'

Why not put the line that generates the error in your question?
– JMA, Commented Dec 21, 2017 at 3:04
"I'm really not sure where it's failing. [...] The error I'm getting is [...]" Please include the complete traceback (i.e. the complete error message) in the question. It will tell you where the code is failing.
– Warren Weckesser, Commented Dec 21, 2017 at 3:15
Hi @WarrenWeckesser I've updated my post with the full error. Thank you.
– wolfbagel, Commented Dec 21, 2017 at 3:57
@newcoder you still haven't pasted the error message fully. I recreated your case and ran it to see the full error message. Please see my answer.
– FatihAkici, Commented Dec 21, 2017 at 4:00

FatihAkici · Accepted Answer · 2017-12-21 03:57:15Z

Actually, the full error you are getting is this (it would have helped tremendously if you had pasted it in full):

Traceback (most recent call last):
  File "<ipython-input-7-6a92ceaf227a>", line 8, in <module>
    imputer = imputer.fit(X[:, 1:5 ])
  File "C:\Users\Fatih\Anaconda2\lib\site-packages\sklearn\preprocessing\imputation.py", line 155, in fit
    force_all_finite=False)
  File "C:\Users\Fatih\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: Illinois

which, if you look carefully, points out where it is failing:

imputer = imputer.fit(X[:, 1:5 ])

This happens because you are trying to take the mean of a categorical variable (a column holding strings such as 'Illinois'), which doesn't make sense. The same problem has already been asked and answered in this StackOverflow thread.
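As a minimal sketch of one way around this, the mean can be computed for the numeric columns only, leaving the text columns untouched. The column names and rows below are hypothetical stand-ins, since the real table is only shown as an image, and plain pandas is used here in place of the `Imputer` class from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Data2.csv; the real column names and values
# are not shown in the question.
dataset = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Dee"],
    "State": ["Illinois", "Ohio", "Illinois", "Texas"],
    "Age": [20.0, np.nan, 22.0, 24.0],
    "GPA": [3.5, 3.0, np.nan, 4.0],
    "Admitted": ["Yes", "No", "Yes", "No"],
})

features = dataset.iloc[:, :-1].copy()  # every column except the last
y = dataset.iloc[:, -1].values          # the last column is the target

# Only numeric columns can be averaged; string columns such as State are skipped.
numeric_cols = features.select_dtypes(include="number").columns

# Fill each numeric column's missing values with that column's own mean.
features[numeric_cols] = features[numeric_cols].fillna(features[numeric_cols].mean())

X = features.values
print(features)
```

The same idea should carry over to scikit-learn: pass only the numeric slice of `X` to the imputer. Note that in newer scikit-learn releases `Imputer` has been replaced by `sklearn.impute.SimpleImputer`.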

Ok thank you, will make sure to post the whole error next time.
@newcoder Humbly, I highly recommend that you run your script one line at a time, rather than the entire script at once, while prototyping, developing, or learning. That way you stay in full control of what each line of code is actually doing, and it also makes debugging so much easier. I am glad I was able to help!
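A rough sketch of what that line-by-line inspection could look like is given here. The data is hypothetical (the real table is not available as text), and the explicit `np.array(..., dtype=float)` call is only meant to mimic the conversion that appears on the last frame of the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the real table appears only as an image in the question.
dataset = pd.DataFrame({
    "State": ["Illinois", "Ohio"],
    "GPA": [3.5, np.nan],
    "Admitted": ["Yes", "No"],
})

# Step 1: check what type each column was read as.
print(dataset.dtypes)

# Step 2: check the array handed to the imputer. Mixing text and numbers
# gives a generic object array rather than a float array.
X = dataset.iloc[:, :-1].values
print(X.dtype)

# Step 3: try the float conversion directly to see which value breaks it.
try:
    np.array(X, dtype=float)
    message = ""
except ValueError as err:
    message = str(err)
    print(message)
```

Checking `dataset.dtypes` right after `read_csv` is usually enough to spot which columns are text before any numeric step is applied to them.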

Nadir BOUCHERIT · Answer · answered 2018-05-24 09:18 (edited by Daria Pydorenko, 2018-05-24 10:09:27Z)

Score: -2

Replace the line:

dataset = pd.read_csv('Data2.csv')

with:

dataset = pd.read_csv('Data2.csv', delimiter=";")

Adding a delimiter will not change anything.
– Sumukh Bhandarkar, Commented over a year ago
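To see when the `delimiter` argument matters at all, here is a small sketch with made-up file contents (the actual Data2.csv is not shown as text). It only illustrates how `pd.read_csv` behaves with comma- and semicolon-separated input, and assumes nothing about the real file.

```python
import io
import pandas as pd

# Hypothetical file contents used purely for illustration.
comma_csv = "State,GPA\nIllinois,3.5\nOhio,3.0\n"
semicolon_csv = "State;GPA\nIllinois;3.5\nOhio;3.0\n"

ok = pd.read_csv(io.StringIO(comma_csv))                        # comma is the default delimiter
wrong = pd.read_csv(io.StringIO(semicolon_csv))                 # whole row lands in one column
fixed = pd.read_csv(io.StringIO(semicolon_csv), delimiter=";")  # split on semicolons

print(ok.shape, wrong.shape, fixed.shape)
```

If the file is already comma-separated, passing `delimiter=";"` would not help, and in either case the State column still contains text, so averaging it would fail the same way.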
