This is a sample of my data
I have written this code, which removes all categorical columns (e.g. MSZoning). However, some non-categorical columns contain NA values. How can I exclude them from my data set?
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    def main():
        print('Starting program execution')
        iowa_train_prices_file_path = 'C:\\...\\programs\\python\\kaggle_competition_iowa_house_prices_train.csv'
        iowa_file_data = pd.read_csv(iowa_train_prices_file_path)
        print('Read file')

        model_random_forest = RandomForestRegressor(random_state=1)
        features = ['MSSubClass', 'MSZoning', ...]

        # Target: the sale price
        y = iowa_file_data.SalePrice
        # Every column except SalePrice
        X = iowa_file_data.drop('SalePrice', axis=1)
        # The object dtype indicates a column has text (a hint that the column is categorical)
        X_dropped = X.select_dtypes(exclude=['object'])

        print("fitting model")
        model_random_forest.fit(X_dropped, y)
        print("MAE of dropped categorical approach")

    pd.set_option('display.max_rows', 500)
    pd.set_option('display.max_columns', 500)
    pd.set_option('display.width', 1000)
    main()

When I run the program, I get this error:

    ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

I believe this is caused by the NA value in the row with Id=8.
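To check whether the NA values really are the cause, one option is to count the missing values in the numeric columns before fitting. The sketch below is only an illustration: the small DataFrame stands in for the real training file, its values are made up, and `select_dtypes(include="number")` is used in place of `exclude=['object']` as an assumption about what "non-categorical" means here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the training data; the values are made up.
sample = pd.DataFrame({
    "Id": [7, 8, 9],
    "LotFrontage": [75.0, np.nan, 51.0],
    "MSZoning": ["RL", "RL", "RM"],
    "SalePrice": [307000, 200000, 129900],
})

# Keep only numeric feature columns (assumption: numeric == non-categorical).
numeric = sample.drop("SalePrice", axis=1).select_dtypes(include="number")

# Count missing values per column and keep only the columns that have any.
nan_counts = numeric.isna().sum()
columns_with_nan = nan_counts[nan_counts > 0]
print(columns_with_nan)

# Show the rows that contain at least one missing value.
rows_with_nan = numeric[numeric.isna().any(axis=1)]
print(rows_with_nan)
```

If `columns_with_nan` is empty on the real data, the error would have to come from something else, such as infinite values.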
Question 1 - How do I remove such rows entirely?

Question 2 - What is the type of columns that are mostly numbers but have text in between? I thought print("X types", type(X.columns)) would show it, but that doesn't give the result.
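A minimal sketch of both questions, assuming the missing entries appear as the literal text NA in the CSV. The inline CSV is a made-up stand-in for the real file. `pd.read_csv` treats NA as a missing value by default, so a mostly numeric column with a few NA entries is read as float64 rather than as text. `DataFrame.dtypes` reports the type of each column, whereas `type(X.columns)` only reports the class of the column index object.

```python
import io

import pandas as pd

# Hypothetical three-row CSV; "NA" marks a missing LotFrontage for Id 8.
csv_text = (
    "Id,LotFrontage,MSZoning,SalePrice\n"
    "7,75,RL,307000\n"
    "8,NA,RL,200000\n"
    "9,51,RM,129900\n"
)
sample = pd.read_csv(io.StringIO(csv_text))

# Question 2: per-column types. LotFrontage is float64 because NA became NaN.
print(sample.dtypes)

# Question 1: drop every row that has NaN in any numeric column.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
cleaned = sample.dropna(subset=numeric_cols)

# Build y and X from the same cleaned frame so their rows stay aligned.
y = cleaned["SalePrice"]
X = cleaned.drop("SalePrice", axis=1).select_dtypes(include="number")
print(X)
```

Dropping rows before splitting into `X` and `y` matters: calling `dropna` on `X` alone would leave `y` with more rows than `X`. Dropping rows also discards data, so imputing the missing values may be worth considering instead.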
