2

I currently have two pandas dataframes:

sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140}, {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}] sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140}, {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}, {'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95 }] test_1 = pd.DataFrame(sales) test_2 = pd.DataFrame(sales2) 

What I want to achieve is to show only the differences that are in 'test_2' and not in 'test_1'.

The code I currently have concatenates the two dataframes and shows me the total difference across both dataframes however all I want to see if the differences in 'test_2' to 'test_1' and not the reverse:

def compare_dataframes(df1, df2): print 'Comparing dataframes...' df = pd.concat([df1, df2]) df = df.reset_index(drop=True) df_gpby = df.groupby(list(df.columns)) idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1] compared_data = df.reindex(idx) if len(compared_data) > 1: print 'No new sales on site!' else: print 'New sales on site!' print(compared_data) 

How could I adapt my current function to work like this?

2 Answers 2

5

Use merge with outer join and indicator parameter:

df = test_1.merge(test_2, how='outer', indicator=True) print (df) Feb Jan Mar account _merge 0 200 150 140 Jones LLC both 1 210 200 215 Alpha Co both 2 90 50 95 Blue Inc right_only 

And then filter only right_only rows by boolean indexing:

only2 = df[df['_merge'] == 'right_only'] print (only2) Feb Jan Mar account _merge 2 90 50 95 Blue Inc right_only 

Thanks @Jon Clements for one line solution with callback:

only2 = test_1.merge(test_2, how='outer', indicator=True)[lambda r: r._merge == 'right_only'] print (only2) Feb Jan Mar account _merge 2 90 50 95 Blue Inc right_only 

Or use query:

only2 = test_1.merge(test_2, how='outer', indicator=True).query("_merge == 'right_only'") 
Sign up to request clarification or add additional context in comments.

1 Comment

Was about to post: test_1.merge(test_2, how='outer', indicator=True)[lambda r: r._merge == 'right_only'].drop('_merge', axis=1) :p
0
import pandas as pd import numpy as np sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140}, {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}] sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140}, {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}, {'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95 }] test_1 = pd.DataFrame(sales) test_2 = pd.DataFrame(sales2) test_3 = test_1.append(test_2).drop_duplicates(keep=False) print (test_3) 

it prints the different rows

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.