
I need to delete a certain value from all the columns in my data frame.

Data frame description:

    Data columns (total 13 columns):
     #   Column             Non-Null Count  Dtype
    ---  ------             --------------  -----
     1   age_group_5_years  34842 non-null  int64
     2   race_eth           34842 non-null  int64
     3   first_degree_hx    34842 non-null  int64
     4   age_menarche       34842 non-null  int64
     5   age_first_birth    34842 non-null  int64

The numbers in each column are category codes.

Example:

    Age (years) in 5 year groups
    1 = Age 18-29
    2 = Age 30-34
    3 = Age 35-39

The columns contain int or float values. I need to drop every 9 in the data frame; 9 indicates an unknown value.

  • I'm not sure which column you mean here, but this should work: df = df[df["column"].ne(9)], where .ne(9) means "not equal to 9". Commented Aug 14, 2021 at 13:01
  • What should happen if a cell contains 9? Do you want to drop the entire row, or replace the value with NaN? Commented Aug 14, 2021 at 13:19
  • First of all, thanks for the answer. I would like to replace it with NaN. Commented Aug 14, 2021 at 14:06
  • What is the best method for detecting unknown values (NaN) inside a data frame, to improve a machine learning model's accuracy? Commented Aug 14, 2021 at 14:09
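Since the follow-up comments ask for replacing 9 with NaN rather than dropping rows, here is a minimal sketch of that variant. The small frame is hypothetical: it borrows two column names from the question, but the values are made up for illustration. It uses `DataFrame.replace` to swap the code 9 for NaN and `isna().sum()` to count the unknowns per column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample (made-up values); 9 stands for "unknown".
df = pd.DataFrame({
    "age_group_5_years": [1, 9, 3, 2],
    "age_menarche": [0, 1, 9, 9],
})

# Replace every 9 with NaN. All rows are kept; columns that
# receive a NaN are upcast from int64 to float64.
df_nan = df.replace(9, np.nan)

# Count the unknown (NaN) values per column.
nan_counts = df_nan.isna().sum()

print(df_nan)
print(nan_counts)
```

Whether to keep, impute, or drop those NaN values afterwards depends on the model and is a separate decision from marking them.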

1 Answer


I'm reading between the lines here: assuming the OP wants to drop all rows where at least one column contains 9 (int) or 9.0 (float) or 9 + 0j (complex):

    df_new = df.replace(9, np.nan).dropna()

Alternatively, you can make a mask and select with it:

    mask = (df != 9).all(axis=1)
    df_new = df.loc[mask]
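The two approaches are not fully interchangeable when the frame already contains NaN. The sketch below, on a small hypothetical frame that is not from the question, shows the difference: `replace(...).dropna()` also removes rows that had a NaN to begin with and upcasts integer columns to float, while the mask keeps those rows and leaves dtypes alone.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 contains a 9, row 2 already has a NaN.
df = pd.DataFrame({
    "a": [1, 9, 3, 4],
    "b": [1.0, 2.0, np.nan, 4.0],
})

# Approach 1: replace, then dropna. This also drops row 2,
# because dropna cannot tell a pre-existing NaN from a replaced 9.
via_replace = df.replace(9, np.nan).dropna()

# Approach 2: boolean mask. NaN != 9 evaluates to True, so row 2
# is kept, and column "a" keeps its int64 dtype.
mask = (df != 9).all(axis=1)
via_mask = df.loc[mask]

print(list(via_replace.index))  # [0, 3]
print(list(via_mask.index))     # [0, 2, 3]
```

If the data has no NaN before the replacement, as in the question's `info()` output where every column is fully non-null, both approaches select the same rows.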

Reproducible example:

    import numpy as np
    import pandas as pd

    np.random.seed(0)
    n = 12
    df = pd.DataFrame({
        'x': np.random.randint(8, 11, n),
        'y': np.random.randint(8, 11, n) * 1.0,
        'z': np.random.randint(8, 11, n) * (1.0 + 0j),
    })

Gives:

    >>> df
         x     y          z
    0    8   9.0   9.0+0.0j
    1    9  10.0  10.0+0.0j
    2    8  10.0   8.0+0.0j
    3    9   8.0  10.0+0.0j
    4    9   9.0   8.0+0.0j
    5   10   9.0   9.0+0.0j
    6    8   9.0   9.0+0.0j
    7   10   9.0  10.0+0.0j
    8    8   8.0   8.0+0.0j
    9    8   9.0   9.0+0.0j
    10   8   8.0   9.0+0.0j
    11  10   8.0   9.0+0.0j

And:

    df_new = df.replace(9, np.nan).dropna()

    >>> df_new
         x     y         z
    2  8.0  10.0  8.0+0.0j
    8  8.0   8.0  8.0+0.0j

Note that, with the seed and parameters above, each column has at least one row where it is the only one to have a 9:

    >>> {k: set(s.index[s])
    ...  for k in df.columns
    ...  for s in [(df[k] == 9) & ~(df.drop(columns=k) == 9).any(axis=1)]
    ... }
    {'x': {1, 3}, 'y': {7}, 'z': {10, 11}}

These are all among the rows dropped from df, which confirms, for this example, that the expression drops every row where any value is 9 (as int, float, or complex).
