
On my own I found a way to drop NaN rows from a pandas DataFrame. Given a DataFrame dat with a column x that contains NaN values, is there a more elegant way to drop each row of dat that has a NaN value in the x column?

    dat = dat[np.logical_not(np.isnan(dat.x))]
    dat = dat.reset_index(drop=True)
  • you mean pd.dropna()? Commented Apr 2, 2016 at 8:09
  • that looks like it will work Commented Apr 2, 2016 at 8:12

7 Answers

143

Use dropna:

dat.dropna() 

You can pass the param how to drop a row only if all of its values are NaN, or if any of them is NaN:

    dat.dropna(how='any')  # drop the row if any value in it is NaN
    dat.dropna(how='all')  # drop the row only if all values in it are NaN
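To see the difference concretely, here is a small sketch on made-up data (column names and values are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, np.nan, 3.0],
                   'y': [np.nan, np.nan, 6.0]})

any_dropped = df.dropna(how='any')  # keeps only rows with no NaN at all
all_dropped = df.dropna(how='all')  # removes only rows that are entirely NaN

print(len(any_dropped))  # 1 -- only the last row is NaN-free
print(len(all_dropped))  # 2 -- only the middle row is all-NaN
```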

Hope that answers your question!

Edit 1: In case you want to drop rows containing NaN values only in particular column(s), as suggested by J. Doe in his answer below, you can use the following:

    dat.dropna(subset=col_list)  # col_list is a list of column names to consider for NaN values


27

To expand on Hitesh's answer: if you want to drop rows where 'x' specifically is NaN, you can use the subset parameter. His answer will also drop rows where other columns have NaNs:

dat.dropna(subset=['x']) 
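For instance, on a toy DataFrame (values made up for illustration), a NaN in another column survives:

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'x': [1.0, np.nan, 3.0],
                    'y': [np.nan, 5.0, 6.0]})

# Only the row whose 'x' is NaN is dropped; the NaN in 'y' is kept.
result = dat.dropna(subset=['x'])
print(result)
```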


16

Just in case the commands in the previous answers don't work, try this:

    dat.dropna(subset=['x'], inplace=True)

1 Comment

yeah, pandas defaults to inplace=False; you need to remember that
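A minimal sketch of that default, on made-up data:

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'x': [1.0, np.nan]})

dat.dropna(subset=['x'])   # returns a new DataFrame; dat itself is unchanged
n_before = len(dat)        # still 2

dat.dropna(subset=['x'], inplace=True)  # mutates dat and returns None
n_after = len(dat)         # now 1
```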
2

This answer introduces the thresh parameter, which is very useful in some use cases.
Note: I added this answer because some questions have been marked as duplicates directing to this page, and none of the approaches here addresses such use cases, e.g. the df format below.
Example:
This approach addresses:

  1. Dropping rows/columns with all NaN
  2. Keeping rows/columns with desired number of non-NaN values (having valid data)
    # Approaching rows
    # ----------------
    # Sample df
    df = pd.DataFrame({'Names': ['Name1', 'Name2', 'Name3', 'Name4'],
                       'Sunday': [2, None, 3, 3],
                       'Tuesday': [0, None, 3, None],
                       'Wednesday': [None, None, 4, None],
                       'Friday': [1, None, 7, None]})
    print(df)
       Names  Sunday  Tuesday  Wednesday  Friday
    0  Name1     2.0      0.0        NaN     1.0
    1  Name2     NaN      NaN        NaN     NaN
    2  Name3     3.0      3.0        4.0     7.0
    3  Name4     3.0      NaN        NaN     NaN

    # Keep only the rows with at least 2 non-NA values.
    df = df.dropna(thresh=2)
    print(df)
       Names  Sunday  Tuesday  Wednesday  Friday
    0  Name1     2.0      0.0        NaN     1.0
    2  Name3     3.0      3.0        4.0     7.0
    3  Name4     3.0      NaN        NaN     NaN

    # Keep only the rows with at least 3 non-NA values.
    df = df.dropna(thresh=3)
    print(df)
       Names  Sunday  Tuesday  Wednesday  Friday
    0  Name1     2.0      0.0        NaN     1.0
    2  Name3     3.0      3.0        4.0     7.0

    # Approaching columns: we need axis here to direct drop to columns
    # ---------------------------------------------------------------
    # If axis=0 or not passed, drop is applied only to rows, as in the examples above.
    # original df
    print(df)
       Names  Sunday  Tuesday  Wednesday  Friday
    0  Name1     2.0      0.0        NaN     1.0
    1  Name2     NaN      NaN        NaN     NaN
    2  Name3     3.0      3.0        4.0     7.0
    3  Name4     3.0      NaN        NaN     NaN

    # Keep only the columns with at least 2 non-NA values.
    df = df.dropna(axis=1, thresh=2)
    print(df)
       Names  Sunday  Tuesday  Friday
    0  Name1     2.0      0.0     1.0
    1  Name2     NaN      NaN     NaN
    2  Name3     3.0      3.0     7.0
    3  Name4     3.0      NaN     NaN

    # Keep only the columns with at least 3 non-NA values.
    df = df.dropna(axis=1, thresh=3)
    print(df)
       Names  Sunday
    0  Name1     2.0
    1  Name2     NaN
    2  Name3     3.0
    3  Name4     3.0

Conclusion:

  1. The thresh parameter from the pd.dropna() doc gives you the flexibility to decide the minimum number of non-NA values you want to keep in a row/column.
  2. The thresh parameter addresses a DataFrame of the structure given above, which df.dropna(how='all') does not.


0

To remove rows based on NaN values in a particular column:

    d = pd.DataFrame([[2, 3], [4, None]])  # creating the data frame
    d
    Output:
       0    1
    0  2  3.0
    1  4  NaN

    d = d[np.isfinite(d[1])]  # select rows where the value in column 1 is not NaN
    d
    Output:
       0    1
    0  2  3.0
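One caveat worth noting: np.isfinite only works on numeric columns; on an object or string column it raises a TypeError. pandas' own notna is a dtype-agnostic alternative that gives the same result here:

```python
import pandas as pd

d = pd.DataFrame([[2, 3], [4, None]])

# Series.notna works regardless of dtype (numbers, strings, objects)
d_clean = d[d[1].notna()]
print(d_clean)
```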


0

dropna() is probably all you need for this, but creating a custom filter may also help, or be easier to understand:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(
        [[4, 7, np.nan, np.nan],
         [5, np.nan, 11, 2],
         [6, 9, 12, np.nan]],
        index=[1, 2, 3],
        columns=['a', 'b', 'c', 'd'])
    print(f'starting matrix:\n{df}')

    # create the matrix of true/false NaNs:
    null_matrix = df.isnull()
    # create the sum of the number of NaNs per row
    sum_null_matrix = null_matrix.T.sum().T
    # create the query of the matrix
    query_null = sum_null_matrix < 2
    # apply them to your matrix
    applied_df = df[query_null]
    print(f'query matrix:\n{query_null}')
    print(f'applied matrix:\n{applied_df}')

and you get the result:

    starting matrix:
       a    b     c    d
    1  4  7.0   NaN  NaN
    2  5  NaN  11.0  2.0
    3  6  9.0  12.0  NaN
    query matrix:
    1    False
    2     True
    3     True
    dtype: bool
    applied matrix:
       a    b     c    d
    2  5  NaN  11.0  2.0
    3  6  9.0  12.0  NaN

More info may be available in the NaN-checking answer: How to check if any value is NaN in a Pandas DataFrame

edit: dropna() has a thresh parameter, but it doesn't have a min parameter. This answer was for when someone needs to create a 'min NaNs' filter or some other custom function.
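For example, a "keep rows with at least N NaNs" filter, which thresh cannot express directly, can be sketched like this (using the same sample df as above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[4, 7, np.nan, np.nan],
     [5, np.nan, 11, 2],
     [6, 9, 12, np.nan]],
    index=[1, 2, 3],
    columns=['a', 'b', 'c', 'd'])

# keep rows that contain at least 2 NaNs -- a "min NaNs" filter
min_nan_df = df[df.isnull().sum(axis=1) >= 2]
print(min_nan_df)  # only row 1 (two NaNs) qualifies
```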


0

If you want to improve the readability of the code, you can get both the NaN and non-NaN rows by using a boolean Series:

    bool_series = pd.notnull(dat["x"])
    dat_notnull = dat[bool_series]
    dat_null = dat[~bool_series]
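Every row lands in exactly one of the two frames, which is easy to sanity-check on toy data (values made up for illustration):

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'x': [1.0, np.nan, 3.0]})

bool_series = pd.notnull(dat["x"])
dat_notnull = dat[bool_series]
dat_null = dat[~bool_series]

# the two partitions together cover the original frame
print(len(dat_notnull) + len(dat_null) == len(dat))  # True
```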

2 Comments

Code is always good, but it also helps to add some comments/context about how this code answers the original question.
Please edit your answer to add an explanation of how your code works and how it solves the OP's problem. Many StackOverflow users are newbies and will not understand the code you have posted, so will not learn from your answer.
