I'm trying to build a Linear Regression model for a dataset. After splitting the data into train and test sets, I get the error below:

    ValueError: could not convert string to float: '?'

Does that mean there is a null value or a float value in the dataset?

As I'm new to Python, I don't understand how to fix this. Could anyone help me with this?

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn import linear_model

    df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data',
                     names = ['ID Number', 'Clump Thickness', 'Uniformity of Cell Size',
                              'Uniformity of Cell Shape', 'Marginal Adhesion',
                              'Single Epithelial Cell Size', 'Bare Nuclei',
                              'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class'])

    X = df.iloc[:, 0:9].values
    y = df.iloc[:, 10].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 4)

    print(X_train.shape)
    print(y_train.shape)
    print(X_test.shape)
    print(y_test.shape)

    lr = linear_model.LinearRegression()
    lr.fit(X_train, y_train)

2 Comments

  • Looks like one of the columns is of type object. Run df.dtypes to check the datatype of each column in your data. Commented Jul 18, 2019 at 5:25
  • Yes, one column is of datatype 'Object'. I got the output after deleting that column. Thanks Commented Jul 18, 2019 at 6:58
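The check suggested in the comments can be sketched roughly as follows. This is only an illustration: the three inline rows and the reduced column list are made up so that the snippet runs without downloading the file, and `non_numeric` and `bad_values` are hypothetical helper names.

```python
import io

import pandas as pd

# Made-up rows for illustration only; the middle column mimics the '?' entries.
sample = io.StringIO("5,1,3\n8,?,7\n3,2,3\n")
df = pd.read_csv(sample, names=["Clump Thickness", "Bare Nuclei", "Bland Chromatin"])

# One dtype per column; a column containing '?' is not parsed as a number.
print(df.dtypes)

# Columns that pandas could not read as numeric.
non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

# The raw entries in that column that cannot be converted to a number.
bad_mask = pd.to_numeric(df["Bare Nuclei"], errors="coerce").isna()
bad_values = df.loc[bad_mask, "Bare Nuclei"].unique().tolist()

print(non_numeric)
print(bad_values)
```

Listing the offending values first makes it easier to decide whether to drop the rows, drop the column, or fill in the gaps.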

1 Answer

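The breast-cancer-wisconsin.data dataset that you are using has some rows with '?' as the value in the 7th column ('Bare Nuclei'). The error means that this string cannot be parsed as a number: '?' is a placeholder for a missing value, not a float. So when you create X and y, leave out the rows that have '?' as a value.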
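One possible way to do this, as a minimal sketch: let pandas treat '?' as missing via the `na_values` parameter of `read_csv`, then drop the affected rows with `dropna`. The three inline rows are made up for illustration so the snippet runs without downloading the file; with the real data you would pass the URL from the question instead of `sample`. The slicing for X and y is copied unchanged from the question.

```python
import io

import pandas as pd

columns = ['ID Number', 'Clump Thickness', 'Uniformity of Cell Size',
           'Uniformity of Cell Shape', 'Marginal Adhesion',
           'Single Epithelial Cell Size', 'Bare Nuclei',
           'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']

# Made-up rows in the same comma-separated layout (assumption for the demo);
# the second row has '?' in the 7th column.
sample = io.StringIO(
    "1000,5,1,1,1,2,1,3,1,1,2\n"
    "1001,8,4,5,1,2,?,7,3,1,4\n"
    "1002,3,1,1,1,2,2,3,1,1,2\n"
)

# na_values='?' makes pandas read '?' as NaN, so the column stays numeric.
df = pd.read_csv(sample, names=columns, na_values='?')

# Count the missing entries, then drop the rows that contain them.
n_missing = int(df['Bare Nuclei'].isna().sum())
df_clean = df.dropna(subset=['Bare Nuclei'])

# Same slicing as in the question, now on purely numeric data.
X = df_clean.iloc[:, 0:9].values
y = df_clean.iloc[:, 10].values

print(n_missing, X.shape, y.shape)
```

Dropping rows keeps the 'Bare Nuclei' feature available to the model, whereas deleting the whole column discards it; which trade-off is better depends on how many rows are affected.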

I hope this helps.

1 Comment

Yes. I removed the column and then did the analysis again, got the output. Thanks
