Generating a dataframe based off the diff between two dataframes

Question

I have 2 data frames that look like this:

Df1
City  Code  ColA  Col..Z
LA    LAA
LA    LAB
LA    LAC

Df2
Code  ColA  Col..Z
LA    LAA
NY    NYA
CH    CH1

What I'm trying to get is this result:

df3
Code  ColA  Col..Z
NY    NYA
CH    CH1

Normally I would loop through each row in df2 and say:

Df3 = If df2.row['Code'] in df1 then drop it.

But I want to find a pythonic pandas way to do it instead of looping through the dataframe. I was looking at examples using joins or merging, but I can't seem to work it out.

Quang Hoang · Accepted Answer · 2021-10-27 01:50:03Z

Your pseudocode, Df3 = If df2.row['Code'] in df1 then drop it, translates to:

df3 = df2[~df2['Code'].isin(df1['City'])]
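A minimal runnable sketch of this filter, assuming the question's tables are read literally (the first two values of each row fill the first two columns). The sample frames and the `mask` name are illustrative assumptions, not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data reconstructed from the question's tables.
df1 = pd.DataFrame({"City": ["LA", "LA", "LA"], "Code": ["LAA", "LAB", "LAC"]})
df2 = pd.DataFrame({"Code": ["LA", "NY", "CH"], "ColA": ["LAA", "NYA", "CH1"]})

# True for every df2 row whose 'Code' also appears in df1['City'].
mask = df2["Code"].isin(df1["City"])

# Negate the mask to keep only the rows that have no match in df1.
df3 = df2[~mask].reset_index(drop=True)
print(df3)
```

`isin` is vectorized, so no explicit row loop is needed. If the key lives in a different column of df1, only the column passed to `isin` changes.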

1 Comment

chowpay Over a year ago

Ended up using this. It worked well. I did a minor syntax fix: df3 = df2[~df2['Code'].isin(df1['Code'])]
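Since the question mentions joins and merging, the same filter can also be written as an anti-join. This is only a sketch, assuming both frames share a 'Code' key column as in the comment above; the sample values and the `keys` and `merged` names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; a shared 'Code' key column is an assumption.
df1 = pd.DataFrame({"City": ["LA", "LA", "LA"], "Code": ["LAA", "LAB", "LAC"]})
df2 = pd.DataFrame({"Code": ["LAA", "NYA", "CH1"], "ColA": [1, 2, 3]})

# Deduplicate the keys so the left merge cannot multiply df2's rows.
keys = df1[["Code"]].drop_duplicates()

# indicator=True adds a '_merge' column recording where each row matched.
merged = df2.merge(keys, on="Code", how="left", indicator=True)

# 'left_only' rows exist in df2 but not in df1.
df3 = (
    merged.loc[merged["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)
print(df3)
```

For a single key column, `isin` is shorter. The merge form may be more convenient when rows have to match on several columns at once.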

Rohan Gautam · Accepted Answer · 2021-10-27 01:48:51Z

To keep only the different items in df2 based on the Code column, you can do something like this, using drop_duplicates:

df2[df2['Code'].isin(
    # values that occur exactly once across both 'Code' columns
    pd.concat([df1['Code'], df2['Code']]).drop_duplicates(keep=False)
)]
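A self-contained sketch of the concat and drop_duplicates approach, assuming both frames have a 'Code' column. The sample values and the `once`, `df2_dup`, `once_dup`, and `df3_dup` names are hypothetical. It also illustrates one caveat: `keep=False` discards every value that occurs more than once, including codes repeated inside df2 itself.

```python
import pandas as pd

# Hypothetical sample data; a shared 'Code' key column is an assumption.
df1 = pd.DataFrame({"Code": ["LAA", "LAB", "LAC"]})
df2 = pd.DataFrame({"Code": ["LAA", "NYA", "CH1"], "ColA": [1, 2, 3]})

# Codes that occur exactly once across both frames combined.
once = pd.concat([df1["Code"], df2["Code"]]).drop_duplicates(keep=False)
df3 = df2[df2["Code"].isin(once)]
print(df3)

# Caveat: a code repeated inside df2 is dropped too, even if df1 lacks it.
df2_dup = pd.DataFrame({"Code": ["NYA", "NYA"], "ColA": [1, 2]})
once_dup = pd.concat([df1["Code"], df2_dup["Code"]]).drop_duplicates(keep=False)
df3_dup = df2_dup[df2_dup["Code"].isin(once_dup)]  # no rows survive
```

If df2 can contain repeated codes that should be kept, a negated `isin` against df1's codes avoids this behavior.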

John Collins · Accepted Answer · 2021-10-27 01:53:37Z

There is a pandas DataFrame.compare method which might be relevant:

df1 = pd.read_clipboard()
df1

df2 = pd.read_clipboard()
df2

df1.compare(df2).drop('self', axis=1, level=1).droplevel(1, axis=1)

(I'm assuming there is a typo in your dataframes, with the City column missing from df2.)
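A small sketch of what the compare chain produces, using hypothetical frames built in code instead of read from the clipboard. `DataFrame.compare` (available since pandas 1.1) requires both frames to have identical row and column labels, and it reports cell-by-cell differences at the same positions, not a set difference of keys.

```python
import pandas as pd

# Hypothetical, identically labelled frames (same index, same columns).
df1 = pd.DataFrame({"Code": ["LAA", "LAB", "LAC"], "ColA": [1, 2, 3]})
df2 = pd.DataFrame({"Code": ["LAA", "NYA", "CH1"], "ColA": [1, 2, 3]})

# compare() keeps only the cells that differ, under 'self'/'other' sub-columns.
diff = df1.compare(df2)

# Drop df1's side ('self') and flatten the two-level column index.
result = diff.drop("self", axis=1, level=1).droplevel(1, axis=1)
print(result)
```

Because the comparison is aligned on labels, this only fits the question when the two frames line up row for row. For frames with different lengths or row orders, the `isin` filter is the safer choice.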
