
I have a dataframe like this:

  column_name
0 OnePlus phones never fail to meet my expectatiion.
1 received earlier than expected for local set.
2 \n
3 good
4 must buy!
5 \t
6
7 awesome product!
8 \n

I want to remove all rows that contain only \n, \t, spaces, or nothing at all (empty strings).

Output should be like this:

  column_name
0 OnePlus phones never fail to meet my expectatiion.
1 received earlier than expected for local set.
2 good
3 must buy!
4 awesome product!

I tried the following method:

df = df[df.column_name != '\n'].reset_index(drop=True)
df = df[df.column_name != ''].reset_index(drop=True)
df = df[df.column_name != ' '].reset_index(drop=True)
df = df[df.column_name != ' \n '].reset_index(drop=True)

But is there a more elegant or Pythonic way to do this instead of repeating the code?

3 Answers


You can use Series.str.strip and compare the result against an empty string:

df1 = df[df.column_name.str.strip() != ''].reset_index(drop=True) 

Or convert the stripped values to booleans (empty strings evaluate to False):

df1 = df[df.column_name.str.strip().astype(bool)].reset_index(drop=True) 
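
For illustration, here is a minimal, self-contained sketch of how the stripped-and-cast mask behaves (the sample data below is assumed, mirroring the question):

import pandas as pd

# assumed sample data mirroring the question
df = pd.DataFrame({'column_name': ['good', '\n', '\t', '  ', 'must buy!']})

# whitespace-only entries strip down to '', which casts to False
mask = df.column_name.str.strip().astype(bool)
print(mask.tolist())   # [True, False, False, False, True]

df1 = df[mask].reset_index(drop=True)
print(df1)
#   column_name
# 0        good
# 1   must buy!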

Or filter for rows that contain at least one word character; here the strip was necessary (with real data the strip may not be needed):

df1 = df[df.column_name.str.strip().str.contains(r'\w', na=False)].reset_index(drop=True)
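
As a quick sketch (the NaN row is an assumed addition), na=False makes missing values evaluate to False, so they are filtered out as well:

import numpy as np
import pandas as pd

# assumed sample including a NaN to show the effect of na=False
df = pd.DataFrame({'column_name': ['good', '\n', np.nan, 'must buy!']})

df1 = df[df.column_name.str.strip().str.contains(r'\w', na=False)].reset_index(drop=True)
print(df1)
#   column_name
# 0        good
# 1   must buy!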

If you also need to remove rows that are already missing, replace the whitespace-only values with NaN and then use DataFrame.dropna:

df.column_name = df.column_name.replace(r'^\s*$', np.nan, regex=True)
df1 = df.dropna(subset=['column_name']).reset_index(drop=True)
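
A short end-to-end sketch of this variant (the sample data, including the pre-existing NaN, is assumed):

import numpy as np
import pandas as pd

# assumed sample: whitespace-only strings plus a value that is already NaN
df = pd.DataFrame({'column_name': ['good', ' \n ', np.nan, '\t', 'awesome product!']})

# whitespace-only entries become NaN, then dropna removes them together with the original NaN
df.column_name = df.column_name.replace(r'^\s*$', np.nan, regex=True)
df1 = df.dropna(subset=['column_name']).reset_index(drop=True)
print(df1)
#         column_name
# 0              good
# 1  awesome product!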

1 Comment

Thank you! The last one is more general, that helps!

Use Series.str.contains() to keep only the rows that contain at least one lowercase letter:

df[df['column_name'].str.contains('[a-z]+', case=True, na=False, regex=True)]

With your data:

df = pd.DataFrame({'A': ['OnePlus phones never fail to meet my expectatiion',
                         'received earlier than expected for local set.',
                         '\n', 'good', '\t', np.nan, 'must buy!', '',
                         'awesome product!', '\n']})
print(df)
                                                    A
0   OnePlus phones never fail to meet my expectatiion
1       received earlier than expected for local set.
2                                                  \n
3                                                good
4                                                  \t
5                                                 NaN
6                                           must buy!
7
8                                    awesome product!
9                                                  \n

Solution

print(df[df.A.str.contains('[a-z]+', case=True, na=False, regex=True)])
                                                    A
0   OnePlus phones never fail to meet my expectatiion
1       received earlier than expected for local set.
3                                                good
6                                           must buy!
8                                    awesome product!



Another approach: remove the rows whose entries exactly match the flagged values (including the empty string) and drop missing values:

df = df[~df['column_name'].isin(['\n', '\t', ''])].dropna()

If some entries carry extra surrounding spaces (e.g. ' \n '), strip the column first; stripping collapses every whitespace-only entry to an empty string, which the filter above then removes:

df['column_name'] = df['column_name'].str.strip() 
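
A minimal end-to-end sketch of this approach, with assumed sample data similar to the question's:

import pandas as pd

# assumed sample data mirroring the question
df = pd.DataFrame({'column_name': ['good', '\n', ' \n ', '\t', '', 'must buy!']})

# strip first so whitespace-only entries collapse to '', then drop the flagged values
df['column_name'] = df['column_name'].str.strip()
df = df[~df['column_name'].isin(['\n', '\t', ''])].dropna().reset_index(drop=True)
print(df)
#   column_name
# 0        good
# 1   must buy!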

