I have two dataframes, let's call them A and B. They have exactly the same 7 columns (let's call them col1, col2, col3, col4, col5, col6 and col7). Some of the columns include client_id, client_first_name, client_last_name, telephone number etc. (I can't reveal the exact names for confidentiality purposes).

DataFrame A is much bigger than DataFrame B and some of the entries from DataFrame B are included in DataFrame A (i.e. DataFrame B is a subset of DataFrame A).

The problem is, I want to keep only the records in DataFrame A that are NOT in DataFrame B, i.e. 'subtract' DataFrame B from DataFrame A. How do I do it?

So far, I've been adding an extra column entitled 'group' to both DataFrames, merging them using pd.merge(A, B, how='left', on='col') and then pulling out the rows that ended up with two different values for 'group_x' and 'group_y' (the merge created these two columns).

Is there an easier way to do it? I tried a bunch of things but none of them worked.

1 Answer

Yes, your way is OK. You could also do something like dfA.loc[~dfA.col.isin(dfB.col)] if you don't need the merged dataframe.
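As a minimal sketch of the isin approach, assuming a single key column named col (the sample data and the name column are made up for illustration): the boolean mask is negated with ~ and passed to .loc. The second half shows a related variant, not mentioned above, that compares whole rows by doing a left merge with indicator=True and keeping the rows marked "left_only".

```python
import pandas as pd

# Hypothetical data: "col" stands in for the key column (e.g. a client id).
dfA = pd.DataFrame({"col": [1, 2, 3, 4], "name": ["Ann", "Bob", "Cy", "Dee"]})
dfB = pd.DataFrame({"col": [2, 4], "name": ["Bob", "Dee"]})

# Keep the rows of dfA whose key does not appear in dfB.
only_in_A = dfA.loc[~dfA["col"].isin(dfB["col"])]

# Variant comparing whole rows: with no "on" argument the merge joins on
# all shared columns, and indicator=True adds a "_merge" column that marks
# rows found only in dfA as "left_only".
merged = dfA.merge(dfB.drop_duplicates(), how="left", indicator=True)
only_in_A_all_cols = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

print(only_in_A)
print(only_in_A_all_cols)
```

The isin version only looks at one column, so it fits when that column uniquely identifies a record. The merge version treats a record as present in dfB only if every shared column matches.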
