
I have two data frames and I am trying to output the data that is in one but not the other.

I can get the data in the first dataframe but not the second using

    only_new = old.merge(
        new, 'outer', on=['Employee ID', 'Benefit Plan Type'],
        suffixes=['', '_'], indicator=True
    ).query('_merge == "left_only"').reindex_axis(old.columns, axis=1)

Here is what I'm using to get the data that's only in my second dataframe

    only_new = new.merge(
        old, 'outer', on=['Employee ID', 'Benefit Plan Type'],
        suffixes=['', '_'], indicator=True
    ).query('_merge == "left only"').reindex_axis(new.columns, axis=1)

It doesn't return any data, although in Excel I can see that there should be a couple of rows.

It seems like this should work

only_new = old.merge(new, on='Employee ID', indicator=True, how='outer', only_new[only_new['_merge'] == 'right_only']) 

But I get

SyntaxError: non-keyword arg after keyword arg 

2 Answers


It seems you need to change '_merge == "left_only"' to '_merge == "right_only"'.
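As for the SyntaxError in the last attempt: the boolean filter sits inside the merge(...) call, so Python parses it as a positional argument after keyword arguments. A minimal sketch of splitting it into two statements follows. The sample data and the Plan column are hypothetical; only the Employee ID column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 'Employee ID' column name is from the question.
old = pd.DataFrame({'Employee ID': [1, 2, 3], 'Plan': ['A', 'A', 'B']})
new = pd.DataFrame({'Employee ID': [2, 3, 4], 'Plan': ['A', 'B', 'C']})

# Step 1: outer merge, with the indicator column recording each row's origin.
merged = old.merge(new, on='Employee ID', how='outer', indicator=True,
                   suffixes=('_old', '_new'))

# Step 2: filter in a separate statement, outside the merge(...) call.
only_new = merged[merged['_merge'] == 'right_only']
print(only_new)
```

Keeping the merge result in its own variable also makes it easy to inspect the _merge column before filtering.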


3 Comments

That works. When I output it I only get the two columns I'm joining on, whereas with the first one I get all columns. Is that coming from somewhere else in the code?
Hmm, it seems the main problem is in reindex_axis. If you remove it, do you get the same output? reindex_axis filters out the columns that are not in old.columns or new.columns.
Looking at the output more closely, it is returning the data in the first data frame, but only the ID and plan type columns, which is what the top half does correctly. I'm playing around with it to get the other way around.

Consider the dataframes old and new

    old = pd.DataFrame(dict(
        ID=[1, 2, 3, 4, 5],
        Type=list('AAABB'),
        Total=[9 for _ in range(5)],
        ArbitraryColumn=['blah' for _ in range(5)]
    ))
    new = pd.DataFrame(dict(
        ID=[3, 4, 5, 6, 7],
        Type=list('ABBCC'),
        Total=[9 for _ in range(5)],
        ArbitraryColumn=['blah' for _ in range(5)]
    ))

Then take the symmetric version of the solution:

    old.merge(
        new, 'outer', on=['ID', 'Type'],
        suffixes=['_', ''], indicator=True  # changed order of suffixes
    ).query('_merge == "right_only"').reindex_axis(new.columns, axis=1)
    #                 \......../                         \./
    #     changed from `left` to `right`         reindex with `new`

      ArbitraryColumn  ID  Total Type
    5            blah   6    9.0    C
    6            blah   7    9.0    C
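Note that reindex_axis was deprecated and later removed (it is gone as of pandas 1.0), so the snippet above may fail on a current install. Here is a sketch of the same idea with reindex(columns=...), assuming a recent pandas version. The frames are the ones defined above, repeated so the block runs on its own.

```python
import pandas as pd

old = pd.DataFrame(dict(
    ID=[1, 2, 3, 4, 5],
    Type=list('AAABB'),
    Total=[9] * 5,
    ArbitraryColumn=['blah'] * 5,
))
new = pd.DataFrame(dict(
    ID=[3, 4, 5, 6, 7],
    Type=list('ABBCC'),
    Total=[9] * 5,
    ArbitraryColumn=['blah'] * 5,
))

only_new = (
    old.merge(new, how='outer', on=['ID', 'Type'],
              suffixes=('_', ''), indicator=True)
       # keep rows whose (ID, Type) key appears only in `new`
       .query('_merge == "right_only"')
       # keep only the columns of `new`, in their original order
       .reindex(columns=new.columns)
)
print(only_new)
```

With suffixes=('_', ''), the unsuffixed Total and ArbitraryColumn columns are the ones that come from new, so reindexing on new.columns picks up populated values instead of the NaN-filled columns from old.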
