How To Compare Two Pandas DataFrames and Show Differences In DataFrame 2

Question

I currently have two pandas dataframes:

sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140}, {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}] sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140}, {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}, {'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95 }] test_1 = pd.DataFrame(sales) test_2 = pd.DataFrame(sales2)

What I want to achieve is to show only the differences that are in 'test_2' and not in 'test_1'.

The code I currently have concatenates the two dataframes and shows me the total difference across both dataframes however all I want to see if the differences in 'test_2' to 'test_1' and not the reverse:

def compare_dataframes(df1, df2): print 'Comparing dataframes...' df = pd.concat([df1, df2]) df = df.reset_index(drop=True) df_gpby = df.groupby(list(df.columns)) idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1] compared_data = df.reindex(idx) if len(compared_data) > 1: print 'No new sales on site!' else: print 'New sales on site!' print(compared_data)

How could I adapt my current function to work like this?

jezrael · Accepted Answer · 2018-08-30 08:17:14Z

Use merge with outer join and indicator parameter:

df = test_1.merge(test_2, how='outer', indicator=True) print (df) Feb Jan Mar account _merge 0 200 150 140 Jones LLC both 1 210 200 215 Alpha Co both 2 90 50 95 Blue Inc right_only

And then filter only right_only rows by boolean indexing:

only2 = df[df['_merge'] == 'right_only'] print (only2) Feb Jan Mar account _merge 2 90 50 95 Blue Inc right_only

Thanks @Jon Clements for one line solution with callback:

only2 = test_1.merge(test_2, how='outer', indicator=True)[lambda r: r._merge == 'right_only'] print (only2) Feb Jan Mar account _merge 2 90 50 95 Blue Inc right_only

Or use query:

only2 = test_1.merge(test_2, how='outer', indicator=True).query("_merge == 'right_only'")

Was about to post: test_1.merge(test_2, how='outer', indicator=True)[lambda r: r._merge == 'right_only'].drop('_merge', axis=1) :p

Oncess · Accepted Answer · 2018-08-30 08:22:43Z

import pandas as pd import numpy as np sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140}, {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}] sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140}, {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}, {'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95 }] test_1 = pd.DataFrame(sales) test_2 = pd.DataFrame(sales2) test_3 = test_1.append(test_2).drop_duplicates(keep=False) print (test_3)

it prints the different rows

Collectives™ on Stack Overflow

How To Compare Two Pandas DataFrames and Show Differences In DataFrame 2

2 Answers 2

1 Comment

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Related