0

I am trying to figure out how to display the differences between 2 Pandas DataFrames. I am nearly there but can't seem to figure out how to display additional data for rows that contain a difference.

Here is what I have so far:

Compare DataFrame A to DataFrame B:

DataFrame A:

Date ID_1 ID_2 Value 1-Jan 1 1 5 2-Jan 1 2 6 3-Jan 1 3 4 4-Jan 1 4 2 5-Jan 1 5 8 

DataFrame B:

Date ID_1 ID_2 Value 1-Jan 1 1 5 2-Jan 1 2 6 3-Jan 1 3 4 4-Jan 1 4 2 5-Jan 1 5 55 

Current Output:

Date Column From To 5-Jan Value 8 55 

Desired Output:

Date ID_1 ID_2 From To 5-Jan 1 5 8 55 

Current Code:

#stack column(s) where dataframes are not equal ne_stacked = (df1 != df2).stack() #create new dataframe from ne_stacked changed = ne_stacked[ne_stacked] #change column names changed.index.names = ['Date', 'Column'] #create array where dataframes are not equal diff_loc = np.where(df1 != df2) #create 'from' column changed_from = df1.values[diff_loc] #create 'to' column changed_to = df2.values[diff_loc] #create a summary dataframe final = pd.DataFrame({'From': changed_from, 'To': changed_to}, index=changed.index) print final 

2 Answers 2

2

use merge:

In [29]: print df_a Date ID_1 ID_2 Value 0 1-Jan 1 1 5 1 2-Jan 1 2 6 2 3-Jan 1 3 4 3 4-Jan 1 4 2 4 5-Jan 1 5 8 In [30]: print df_b Date ID_1 ID_2 Value 0 1-Jan 1 1 5 1 2-Jan 1 2 6 2 3-Jan 1 3 4 3 4-Jan 1 4 2 4 5-Jan 1 5 55 In [31]: df_c = pd.merge(df_a, df_b, how='outer', on=['Date', 'ID_1', 'ID_2']) df_c.columns = ['Date', 'ID_1', 'ID_2', 'From', 'To'] df_c = df_c[df_c.From!=df_c.To] print df_c Date ID_1 ID_2 From To 4 5-Jan 1 5 8 55 
Sign up to request clarification or add additional context in comments.

Comments

1

Try this:

dfm = df1.merge(df2, on=['Date', 'ID_1', 'ID_2']).rename(columns={'Value_x':'From', 'Value_y':'To'}) print dfm[dfm.From != dfm.To] Date ID_1 ID_2 From To 4 5-Jan 1 5 8 55 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.