I am trying to figure out how to display the differences between 2 Pandas DataFrames. I am nearly there but can't seem to figure out how to display additional data for rows that contain a difference.
Here is what I have so far:
Compare DataFrame A to DataFrame B:
DataFrame A:
Date ID_1 ID_2 Value 1-Jan 1 1 5 2-Jan 1 2 6 3-Jan 1 3 4 4-Jan 1 4 2 5-Jan 1 5 8 DataFrame B:
Date ID_1 ID_2 Value 1-Jan 1 1 5 2-Jan 1 2 6 3-Jan 1 3 4 4-Jan 1 4 2 5-Jan 1 5 55 Current Output:
Date Column From To 5-Jan Value 8 55 Desired Output:
Date ID_1 ID_2 From To 5-Jan 1 5 8 55 Current Code:
#stack column(s) where dataframes are not equal ne_stacked = (df1 != df2).stack() #create new dataframe from ne_stacked changed = ne_stacked[ne_stacked] #change column names changed.index.names = ['Date', 'Column'] #create array where dataframes are not equal diff_loc = np.where(df1 != df2) #create 'from' column changed_from = df1.values[diff_loc] #create 'to' column changed_to = df2.values[diff_loc] #create a summary dataframe final = pd.DataFrame({'From': changed_from, 'To': changed_to}, index=changed.index) print final