Pandas - Merging Different Sized DataFrames

Question

I am having an issue merging two frames with a different amount of rows. The first dataframe has 5K rows, and the second dataframe has 20K rows. There is a column "id" in both frames, and all 5K "id" values will occur in the frame with 20K rows.

first frame "df"

 A B id A_1 B_1 0 1 1 1 0.5 0.5 1 3 2 2 0.2 0.4 2 3 4 3 0.8 0.9

second frame "df_2"

 A B id 0 1 1 1 1 3 2 2 2 3 4 3 3 1 2 4 4 3 1 5

Hopeful output frame "df_out"

 A B id A_1 B_1 0 1 1 1 0.5 0.5 1 3 2 2 0.2 0.4 2 3 4 3 0.8 0.9 3 1 2 4 na na 4 3 1 5 na na

My attempts to merge on 'id' have left me with only the 5k rows. The operation I am seeking is to preserve all the rows of the large dataframe, and stick Nan values for the data that does not exist in the large frame.

Thanks

Use the how='outer' option of pd.merge.

DYZ
– DYZ

2017-02-17 01:55:37 +00:00
Commented Feb 17, 2017 at 1:55 — DYZ
– DYZ, Commented Feb 17, 2017 at 1:55

miradulo · Accepted Answer · 2017-02-17 02:10:33Z

3

Just specify how=outer to df.merge so that you use the union of both DataFrames.

>>> df.merge(df_2, how='outer') A A_1 B B_1 id 0 1.0 0.5 1.0 0.5 1.0 1 3.0 0.2 2.0 0.4 2.0 2 3.0 0.8 4.0 0.9 3.0 3 1.0 NaN 2.0 NaN 4.0 4 3.0 NaN 1.0 NaN 5.0

edited Feb 17, 2017 at 2:10

answered Feb 17, 2017 at 1:59

miradulo

29.8k7 gold badges86 silver badges97 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

miradulo Over a year ago

@JeffSaltfist Glad I could help!

Jeff Saltfist Over a year ago

quick question, does the 'outer' argument signify the last column in the frame?

miradulo Over a year ago

@JeffSaltfist It specifies that the union of keys from both DataFrames should be used to merge, as opposed to the intersection or the entirety of one or the other (see the docs I linked).

Jeff Saltfist Over a year ago

Thank you. I was able to solve the issue with your code and by also including left_index=False and right_index=False in the code.

Collectives™ on Stack Overflow

Pandas - Merging Different Sized DataFrames

1 Answer 1

4 Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Related