Given the following two pandas DataFrames, df1 and df2, I am trying to delete all rows from df1 whose string in column "a" does not match any of the strings in df2["z"].

import pandas as pd

df1 = pd.DataFrame({'a': ['use social tag', 'dubmod intern workshop data', 'deep collabor filter', 'pathrank a novel node rank'],
                    'b': ['test', 'test2', 'test3', 'test4']})
df1
                             a      b
0               use social tag   test
1  dubmod intern workshop data  test2
2         deep collabor filter  test3
3   pathrank a novel node rank  test4

df2 = pd.DataFrame({'z': ['experiment', 'dubmod intern workshop data', 'deep collabor filter', 'experiment3']})
df2
                             z
0                   experiment
1  dubmod intern workshop data
2         deep collabor filter
3                  experiment3

The result should look like this:

                             a      b
0  dubmod intern workshop data  test2
1         deep collabor filter  test3
  • Don't think regex is the tool for the job Commented Jul 24, 2019 at 19:04
  • df1.merge(df2.rename(columns={'z': 'a'})). It's an exact match, right? Commented Jul 24, 2019 at 19:04
  • Possible duplicate of delete rows based on a condition in pandas Commented Jul 24, 2019 at 23:34
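
The merge suggested in the comments can be sketched roughly as follows. This is only an illustration under the assumption that an exact string match is wanted; the `drop_duplicates()` call and the variable name `merged` are additions not present in the original comment, included so that repeated values in df2["z"] would not multiply rows of df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    'a': ['use social tag', 'dubmod intern workshop data',
          'deep collabor filter', 'pathrank a novel node rank'],
    'b': ['test', 'test2', 'test3', 'test4'],
})
df2 = pd.DataFrame({
    'z': ['experiment', 'dubmod intern workshop data',
          'deep collabor filter', 'experiment3'],
})

# Rename 'z' to 'a' so both frames share a key column, then inner-join on it.
# drop_duplicates() is an added safeguard (assumption): without it, a value
# repeated in df2 would duplicate the matching row of df1.
keys = df2.rename(columns={'z': 'a'}).drop_duplicates()
merged = df1.merge(keys)  # how='inner' is the default; joins on common column 'a'

print(merged)
```

Unlike the boolean-mask approach, the merge returns a frame with a fresh 0-based index, so no separate index reset is needed.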

2 Answers

To fully meet your requirements, including regenerating the index, run:

df1[df1.a.isin(df2.z)].reset_index(drop=True) 

If you are only looking for exact matches, it's really simple:

df1[df1['a'].isin(df2['z'])].reset_index(drop=True) 

Instead of deleting rows, you are filtering df1 to keep only the rows whose "a" value appears in df2["z"].
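
For reference, here is a self-contained sketch of that filter using the data from the question. The names `mask`, `result`, and `dropped` are illustrative only, and the `drop` variant is shown merely as an assumed equivalent for readers who prefer to think of the operation as a deletion.

```python
import pandas as pd

df1 = pd.DataFrame({
    'a': ['use social tag', 'dubmod intern workshop data',
          'deep collabor filter', 'pathrank a novel node rank'],
    'b': ['test', 'test2', 'test3', 'test4'],
})
df2 = pd.DataFrame({
    'z': ['experiment', 'dubmod intern workshop data',
          'deep collabor filter', 'experiment3'],
})

# Boolean mask: True where df1['a'] exactly equals some entry of df2['z'].
mask = df1['a'].isin(df2['z'])

# Keep only the matching rows and renumber the index from 0.
result = df1[mask].reset_index(drop=True)

# The same outcome phrased as a deletion: drop the rows where the mask is False.
dropped = df1.drop(df1.index[~mask]).reset_index(drop=True)

print(result)
```

Neither variant modifies df1 itself; assign the result back to df1 if the original frame should be replaced.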

2 Comments

This works fine if you don't want to refresh the index.
@Peet all you have to do is to add .reset_index(drop=True)