This answer introduces the thresh parameter, which is useful in some use cases.
Note: I added this answer because some questions have been marked as duplicates pointing to this page, yet none of the approaches here address such use cases, e.g. the df format below.
Example:
This approach addresses:
- Dropping rows/columns with all NaN
- Keeping rows/columns with a desired number of non-NaN values (having valid data)
    # Approaching rows
    # ------------------
    import pandas as pd

    # Sample df
    df = pd.DataFrame({'Names': ['Name1', 'Name2', 'Name3', 'Name4'],
                       'Sunday': [2, None, 3, 3],
                       'Tuesday': [0, None, 3, None],
                       'Wednesday': [None, None, 4, None],
                       'Friday': [1, None, 7, None]})
    print(df)

       Names  Sunday  Tuesday  Wednesday  Friday
    0  Name1     2.0      0.0        NaN     1.0
    1  Name2     NaN      NaN        NaN     NaN
    2  Name3     3.0      3.0        4.0     7.0
    3  Name4     3.0      NaN        NaN     NaN

    # Keep only the rows with at least 2 non-NA values.
    # dropna returns a new DataFrame, so df itself stays unchanged.
    print(df.dropna(thresh=2))

       Names  Sunday  Tuesday  Wednesday  Friday
    0  Name1     2.0      0.0        NaN     1.0
    2  Name3     3.0      3.0        4.0     7.0
    3  Name4     3.0      NaN        NaN     NaN

    # Keep only the rows with at least 3 non-NA values.
    print(df.dropna(thresh=3))

       Names  Sunday  Tuesday  Wednesday  Friday
    0  Name1     2.0      0.0        NaN     1.0
    2  Name3     3.0      3.0        4.0     7.0
    # Approaching columns: we need axis here to direct the drop to columns
    # ------------------------------------------------------------------
    # If axis=0 (the default) or axis is not passed, the drop is applied
    # to rows only, as in the examples above.

    # Original (unfiltered) sample df
    print(df)

       Names  Sunday  Tuesday  Wednesday  Friday
    0  Name1     2.0      0.0        NaN     1.0
    1  Name2     NaN      NaN        NaN     NaN
    2  Name3     3.0      3.0        4.0     7.0
    3  Name4     3.0      NaN        NaN     NaN

    # Keep only the columns with at least 2 non-NA values.
    print(df.dropna(axis=1, thresh=2))

       Names  Sunday  Tuesday  Friday
    0  Name1     2.0      0.0     1.0
    1  Name2     NaN      NaN     NaN
    2  Name3     3.0      3.0     7.0
    3  Name4     3.0      NaN     NaN

    # Keep only the columns with at least 3 non-NA values.
    print(df.dropna(axis=1, thresh=3))

       Names  Sunday
    0  Name1     2.0
    1  Name2     NaN
    2  Name3     3.0
    3  Name4     3.0
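One way to see what thresh does is to compare it with an explicit count of non-NA cells. The sketch below is only an illustration using the same sample data. The variable names (rows_by_thresh, rows_by_count, and so on) are made up for this example, and the boolean-mask version is an equivalent way of writing the filter, not how pandas implements it internally.

```python
import pandas as pd

# Same sample data as in the examples above.
df = pd.DataFrame({'Names': ['Name1', 'Name2', 'Name3', 'Name4'],
                   'Sunday': [2, None, 3, 3],
                   'Tuesday': [0, None, 3, None],
                   'Wednesday': [None, None, 4, None],
                   'Friday': [1, None, 7, None]})

# Rows: keep those with at least 2 non-NA cells.
rows_by_thresh = df.dropna(thresh=2)
rows_by_count = df[df.notna().sum(axis=1) >= 2]

# Columns: keep those with at least 2 non-NA cells.
cols_by_thresh = df.dropna(axis=1, thresh=2)
cols_by_count = df.loc[:, df.notna().sum(axis=0) >= 2]

# Both comparisons should report that the two approaches agree.
print(rows_by_thresh.equals(rows_by_count))
print(cols_by_thresh.equals(cols_by_count))
```

Note that the Names column counts toward the threshold as well, which is why Name4 (Names plus Sunday) survives thresh=2 but not thresh=3.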
Conclusion:
- The thresh parameter of DataFrame.dropna() (see the pandas docs) lets you set the minimum number of non-NaN values a row/column must have in order to be kept.
- The thresh parameter handles a dataframe with the structure shown above, which df.dropna(how='all') does not: the Names column is never NaN, so no row is entirely NaN and how='all' drops nothing.
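The difference from how='all' can be checked directly on the sample data. This is a small sketch for illustration. The names kept_by_all and kept_by_thresh are hypothetical and only used here.

```python
import pandas as pd

# Same sample data as in the examples above.
df = pd.DataFrame({'Names': ['Name1', 'Name2', 'Name3', 'Name4'],
                   'Sunday': [2, None, 3, 3],
                   'Tuesday': [0, None, 3, None],
                   'Wednesday': [None, None, 4, None],
                   'Friday': [1, None, 7, None]})

# how='all' only drops rows where every cell is NaN.
# Name2 still has a value in 'Names', so the row is kept.
kept_by_all = df.dropna(how='all')

# thresh=2 requires at least 2 non-NA cells per row.
# Name2 has only 1 (its name), so the row is dropped.
kept_by_thresh = df.dropna(thresh=2)

print(kept_by_all['Names'].tolist())     # ['Name1', 'Name2', 'Name3', 'Name4']
print(kept_by_thresh['Names'].tolist())  # ['Name1', 'Name3', 'Name4']
```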