
I have a dataframe like this:

  column_name
0 OnePlus phones never fail to meet my expectatiion.
1 received earlier than expected for local set.
2 \n
3 good
4 must buy!
5 \t
6
7 awesome product!
8 \n

I want to remove all rows that contain only \n, \t, spaces, or nothing at all (empty strings).

Output should be like this:

  column_name
0 OnePlus phones never fail to meet my expectatiion.
1 received earlier than expected for local set.
2 good
3 must buy!
4 awesome product!

I tried the following method:

df = df[df.column_name != '\n'].reset_index(drop=True)
df = df[df.column_name != ''].reset_index(drop=True)
df = df[df.column_name != ' '].reset_index(drop=True)
df = df[df.column_name != ' \n '].reset_index(drop=True)

But is there a more elegant or Pythonic way to do this instead of repeating the code?

3 Answers


You can use Series.str.strip and compare the result against an empty string:

df1 = df[df.column_name.str.strip() != ''].reset_index(drop=True) 

Or convert the stripped values to booleans (empty strings evaluate to False):

df1 = df[df.column_name.str.strip().astype(bool)].reset_index(drop=True) 
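
For illustration, here is a minimal, self-contained sketch of how the stripped-and-cast mask behaves (the sample data below is assumed, mirroring the question):

import pandas as pd

# assumed sample data mirroring the question
df = pd.DataFrame({'column_name': ['good', '\n', '\t', '  ', 'must buy!']})

# whitespace-only entries strip down to '', which casts to False
mask = df.column_name.str.strip().astype(bool)
print(mask.tolist())   # [True, False, False, False, True]

df1 = df[mask].reset_index(drop=True)
print(df1)
#   column_name
# 0        good
# 1   must buy!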

Or filter for rows that contain at least one word character; here the strip was necessary (with real data the strip may not be needed):

df1 = df[df.column_name.str.strip().str.contains(r'\w', na=False)].reset_index(drop=True)
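
As a quick sketch (the NaN row is an assumed addition), na=False makes missing values evaluate to False, so they are filtered out as well:

import numpy as np
import pandas as pd

# assumed sample including a NaN to show the effect of na=False
df = pd.DataFrame({'column_name': ['good', '\n', np.nan, 'must buy!']})

df1 = df[df.column_name.str.strip().str.contains(r'\w', na=False)].reset_index(drop=True)
print(df1)
#   column_name
# 0        good
# 1   must buy!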

If you also need to remove rows that are already missing, replace the whitespace-only values with NaN and then use DataFrame.dropna:

df.column_name = df.column_name.replace(r'^\s*$', np.nan, regex=True)
df1 = df.dropna(subset=['column_name']).reset_index(drop=True)
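
A short end-to-end sketch of this variant (the sample data, including the pre-existing NaN, is assumed):

import numpy as np
import pandas as pd

# assumed sample: whitespace-only strings plus a value that is already NaN
df = pd.DataFrame({'column_name': ['good', ' \n ', np.nan, '\t', 'awesome product!']})

# whitespace-only entries become NaN, then dropna removes them together with the original NaN
df.column_name = df.column_name.replace(r'^\s*$', np.nan, regex=True)
df1 = df.dropna(subset=['column_name']).reset_index(drop=True)
print(df1)
#         column_name
# 0              good
# 1  awesome product!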

1 Comment

Thank you! The last one is more general, that helps!

Use Series.str.contains() to keep only the rows that contain at least one lowercase letter:

df[df['column_name'].str.contains('[a-z]+', case=True, na=False, regex=True)]

With your data:

df = pd.DataFrame({'A': ['OnePlus phones never fail to meet my expectatiion',
                         'received earlier than expected for local set.',
                         '\n', 'good', '\t', np.nan, 'must buy!', '',
                         'awesome product!', '\n']})
print(df)
                                                    A
0   OnePlus phones never fail to meet my expectatiion
1       received earlier than expected for local set.
2                                                  \n
3                                                good
4                                                  \t
5                                                 NaN
6                                           must buy!
7
8                                    awesome product!
9                                                  \n

Solution

print(df[df.A.str.contains('[a-z]+', case=True, na=False, regex=True)])
                                                    A
0   OnePlus phones never fail to meet my expectatiion
1       received earlier than expected for local set.
3                                                good
6                                           must buy!
8                                    awesome product!



Another approach: remove the rows whose entries exactly match the flagged values (including the empty string) and drop missing values:

df = df[~df['column_name'].isin(['\n', '\t', ''])].dropna()

If some entries carry extra surrounding spaces (e.g. ' \n '), strip the column first; stripping collapses every whitespace-only entry to an empty string, which the filter above then removes:

df['column_name'] = df['column_name'].str.strip() 
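
A minimal end-to-end sketch of this approach, with assumed sample data similar to the question's:

import pandas as pd

# assumed sample data mirroring the question
df = pd.DataFrame({'column_name': ['good', '\n', ' \n ', '\t', '', 'must buy!']})

# strip first so whitespace-only entries collapse to '', then drop the flagged values
df['column_name'] = df['column_name'].str.strip()
df = df[~df['column_name'].isin(['\n', '\t', ''])].dropna().reset_index(drop=True)
print(df)
#   column_name
# 0        good
# 1   must buy!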

