How to delete a certain value from all columns in a dataframe?

Question

I need to delete a certain value from all the columns in my data frame.

Data frame description:

Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 1   age_group_5_years  34842 non-null  int64
 2   race_eth           34842 non-null  int64
 3   first_degree_hx    34842 non-null  int64
 4   age_menarche       34842 non-null  int64
 5   age_first_birth    34842 non-null  int64

The number in each column indicates a category.

Example:

Age (years) in 5 year groups
1 = Age 18-29
2 = Age 30-34
3 = Age 35-39

The columns contain int or float values. I need to drop every 9 in the data frame; 9 indicates an unknown value.

I'm not sure which column you mean here, but this should work: df = df[df["column"].ne(9)], where .ne(9) means "not equal to 9".
– Alex, Commented Aug 14, 2021 at 13:01
What should happen if a cell contains 9? Do you want to drop the entire row, or replace the value with NaN?
– Pierre D, Commented Aug 14, 2021 at 13:19
First of all, thanks for the answer. I would like to replace it with NaN.
– Mohanad Anas Taifour, Commented Aug 14, 2021 at 14:06
What is the best method for detecting unknown values (NaN) inside a data frame, to improve a machine learning model's accuracy?
– Mohanad Anas Taifour, Commented Aug 14, 2021 at 14:09

Pierre D · Accepted Answer · 2021-08-14 13:48:03Z

I'm reading between the lines here: assuming the OP wants to drop all rows where at least one column contains 9 (int) or 9.0 (float) or 9 + 0j (complex):

df_new = df.replace(9, np.nan).dropna()

Alternatively, you can make a mask and select with it:

mask = (df != 9).all(axis=1)
df_new = df.loc[mask]
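As a quick sanity check, the two approaches can be compared side by side. This is only a sketch on a hypothetical toy frame (the column names "a" and "b" and their values are invented for illustration). It suggests both select the same rows, with one difference worth knowing: the replace/dropna route upcasts integer columns to float, and it would also drop rows that already contained NaN, whereas the mask leaves dtypes alone.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the question); 9 marks an unknown category.
df = pd.DataFrame({
    "a": [1, 9, 3, 2],          # integer column
    "b": [2.0, 1.0, 9.0, 3.0],  # float column
})

# Approach 1: turn every 9 into NaN, then drop rows containing any NaN.
dropped = df.replace(9, np.nan).dropna()

# Approach 2: keep only rows where every column differs from 9.
mask = (df != 9).all(axis=1)
masked = df.loc[mask]

print(list(dropped.index))  # [0, 3]
print(list(masked.index))   # [0, 3]
print(dropped["a"].dtype)   # float64 (upcast when NaN was introduced)
print(masked["a"].dtype)    # int64 (dtype preserved)
```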

Reproducible example:

import numpy as np
import pandas as pd

np.random.seed(0)
n = 12
df = pd.DataFrame({
    'x': np.random.randint(8, 11, n),                # int column
    'y': np.random.randint(8, 11, n) * 1.0,          # float column
    'z': np.random.randint(8, 11, n) * (1.0 + 0j),   # complex column
})

Gives:

>>> df
     x     y          z
0    8   9.0   9.0+0.0j
1    9  10.0  10.0+0.0j
2    8  10.0   8.0+0.0j
3    9   8.0  10.0+0.0j
4    9   9.0   8.0+0.0j
5   10   9.0   9.0+0.0j
6    8   9.0   9.0+0.0j
7   10   9.0  10.0+0.0j
8    8   8.0   8.0+0.0j
9    8   9.0   9.0+0.0j
10   8   8.0   9.0+0.0j
11  10   8.0   9.0+0.0j

And:

df_new = df.replace(9, np.nan).dropna()
>>> df_new
     x     y         z
2  8.0  10.0  8.0+0.0j
8  8.0   8.0  8.0+0.0j

Note that, with the seed and parameters above, each column has at least one row where it is the only one to have a 9:

>>> {k: set(s.index[s])
...  for k in df.columns
...  for s in [(df[k] == 9) & ~(df.drop(columns=k) == 9).any(axis=1)]
... }
{'x': {1, 3}, 'y': {7}, 'z': {10, 11}}

and these are all among the rows that are dropped from df. That verifies that the expression is correct and drops all rows where any value is 9 (as int, float, or complex).
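In the comments, the asker says they would rather replace 9 with NaN than drop rows. A minimal sketch of that variant follows, assuming 9 is never a legitimate category code in any column. The column names are borrowed from the question, but the values are invented for illustration. Counting NaN per column afterwards gives a rough view of how many unknowns each column holds.

```python
import numpy as np
import pandas as pd

# Hypothetical values; only the column names come from the question.
df = pd.DataFrame({
    "age_group_5_years": [1, 2, 9, 3],
    "race_eth": [9, 1, 2, 9],
})

# Replace the sentinel 9 with NaN everywhere, keeping all rows.
df_nan = df.replace(9, np.nan)

# Count the unknown (NaN) cells in each column.
missing_per_column = {col: int(n) for col, n in df_nan.isna().sum().items()}
print(missing_per_column)  # {'age_group_5_years': 1, 'race_eth': 2}
```

Because NaN is a float, the affected integer columns become float64 after the replacement.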
