I currently have two pandas dataframes:

    import pandas as pd

    sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
             {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}]
    sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
              {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215},
              {'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95}]

    test_1 = pd.DataFrame(sales)
    test_2 = pd.DataFrame(sales2)

What I want to achieve is to show only the rows that are in 'test_2' but not in 'test_1'.
The code I currently have concatenates the two dataframes and shows me the total difference across both of them. However, all I want to see is the differences from 'test_1' that appear in 'test_2', and not the reverse:

    def compare_dataframes(df1, df2):
        print('Comparing dataframes...')
        # Stack both frames, then group identical rows together.
        df = pd.concat([df1, df2])
        df = df.reset_index(drop=True)
        df_gpby = df.groupby(list(df.columns))
        # Keep the index of every row that occurs only once across both frames.
        idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
        compared_data = df.reindex(idx)
        # An empty result means no row is unique to either frame.
        if compared_data.empty:
            print('No new sales on site!')
        else:
            print('New sales on site!')
            print(compared_data)

How could I adapt my current function to work like this?
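One possible direction, as a minimal sketch rather than a definitive answer: a left merge of the second frame against the first with `indicator=True` marks each row with where it was found, so only the rows unique to 'test_2' can be kept. The helper name `rows_only_in_second` is hypothetical, and the sketch assumes both frames share the same columns and that rows should match on every column.

```python
import pandas as pd


def rows_only_in_second(df1, df2):
    """Return the rows of df2 that have no exact match in df1."""
    # Left-merge df2 against the distinct rows of df1 on all shared columns.
    # indicator=True adds a '_merge' column saying where each row was found.
    merged = df2.merge(df1.drop_duplicates(), how='left', indicator=True)
    # 'left_only' marks rows present in df2 but absent from df1.
    only_in_df2 = merged[merged['_merge'] == 'left_only']
    return only_in_df2.drop(columns='_merge').reset_index(drop=True)


sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215}]
sales2 = sales + [{'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95}]

test_1 = pd.DataFrame(sales)
test_2 = pd.DataFrame(sales2)

new_rows = rows_only_in_second(test_1, test_2)
if new_rows.empty:
    print('No new sales on site!')
else:
    print('New sales on site!')
    print(new_rows)
```

Because the merge is one-directional, swapping the arguments gives the reverse comparison, which is empty for this sample data.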