Exclude rows which have NA value for a column [duplicate]

Question

This is a sample of my data

I have written this code, which removes all categorical columns (e.g. MSZoning). However, some non-categorical columns have NA values. How can I exclude the affected rows from my data set?

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def main():
    print('Starting program execution')
    iowa_train_prices_file_path = 'C:\\...\\programs\\python\\kaggle_competition_iowa_house_prices_train.csv'
    iowa_file_data = pd.read_csv(iowa_train_prices_file_path)
    print('Read file')
    model_random_forest = RandomForestRegressor(random_state=1)
    features = ['MSSubClass','MSZoning',...]
    y = iowa_file_data.SalePrice
    # every column except SalePrice
    X = iowa_file_data.drop('SalePrice', axis=1)
    # The object dtype indicates a column has text (hint that the column is categorical)
    X_dropped = X.select_dtypes(exclude=['object'])
    print("fitting model")
    model_random_forest.fit(X_dropped, y)
    print("MAE of dropped categorical approach")

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
main()

When I run the program, I get the error ValueError: Input contains NaN, infinity or a value too large for dtype('float32'), which I believe is due to the NA value in the row with Id=8.

Question 1 - How do I remove such rows entirely?

Question 2 - What is the type of such columns, which are mostly numbers but have text in between? I thought I would do print("X types", type(X.columns)), but that doesn't give the result.

Alex Metsai · Accepted Answer · 2021-03-01 07:03:52Z

To get rid of NaNs without losing any rows, you can replace them with another value. Zero is a common choice.

iowa_file_data = iowa_file_data.fillna(0)

If you would rather drop every column that contains a NaN, use

iowa_file_data = iowa_file_data.dropna(axis='columns')

And if you want to drop every row that contains a NaN, use

iowa_file_data = iowa_file_data.dropna()
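
As a rough illustration of how the three calls above differ, here is a minimal sketch on a tiny frame. The column names mirror the Kaggle data, but the values are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; only LotFrontage has a missing value.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotFrontage": [65.0, np.nan, 68.0],
    "SalePrice": [208500, 181500, 223500],
})

filled = df.fillna(0)                     # same shape, NaN replaced by 0
without_cols = df.dropna(axis="columns")  # drops the LotFrontage column
without_rows = df.dropna()                # drops the row with Id == 2

print(filled.shape, without_cols.shape, without_rows.shape)
# → (3, 3) (3, 2) (2, 3)
```

Note that none of these calls modify df in place; each returns a new frame, which is why the answer assigns the result back to iowa_file_data.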

For your second question, from what I understand, you might want to see some info about the pandas object dtype: link.
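
To inspect column types, type(X.columns) only reports the class of the column index; the per-column dtypes are available through the dtypes attribute. The sketch below uses a small made-up CSV (the column names numeric, with_gap, and mixed are hypothetical) to show what pandas typically infers.

```python
import io
import pandas as pd

# Toy CSV, invented for illustration: "mixed" is mostly numbers with one text value.
csv_text = "numeric,with_gap,mixed\n1,10,20\n2,,RL\n3,30,40\n"
df = pd.read_csv(io.StringIO(csv_text))

print(type(df.columns))  # the Index class, not the column dtypes
print(df.dtypes)         # one dtype per column

# A numeric column with a gap is read as float, because NaN is a float value.
# A column mixing numbers and text is read as text (object dtype, or a
# dedicated string dtype depending on the pandas version).

# If the text values should count as missing, coerce the column to numbers:
mixed_as_number = pd.to_numeric(df["mixed"], errors="coerce")
print(mixed_as_number.isna().sum())  # number of values that were not numeric
```

After coercion, the non-numeric entries are NaN, so they can be handled with fillna or dropna like any other missing value.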

Can I drop the entire row which has NA for a column, or is dropping the entire column the only option?
To remove rows, skip the axis=... argument. I edited my post to include this.
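
If only one column matters, dropna also accepts a subset argument, so rows are dropped only when that column is missing. A minimal sketch with made-up values follows; choosing LotFrontage as the column to check is an assumption for illustration, not something the question confirms.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({
    "Id": [7, 8, 9],
    "LotFrontage": [75.0, np.nan, 51.0],
    "SalePrice": [307000, 200000, 129900],
})

# Drop only the rows where LotFrontage is missing; other columns are not checked.
clean = df.dropna(subset=["LotFrontage"])

# Split into features and target after cleaning, so both keep the same rows.
y = clean["SalePrice"]
X = clean.drop("SalePrice", axis=1)

print(clean["Id"].tolist())
# → [7, 9]
```

Dropping rows before separating X and y keeps the two aligned. Restricting the check with subset can also preserve far more data than a bare dropna() when many other columns contain gaps.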
