I am having an issue merging two dataframes with a different number of rows. The first dataframe has 5K rows, and the second dataframe has 20K rows. There is a column "id" in both frames, and all 5K "id" values occur in the frame with 20K rows.
First frame "df":

       A  B  id  A_1  B_1
    0  1  1   1  0.5  0.5
    1  3  2   2  0.2  0.4
    2  3  4   3  0.8  0.9

Second frame "df_2":

       A  B  id
    0  1  1   1
    1  3  2   2
    2  3  4   3
    3  1  2   4
    4  3  1   5

Hoped-for output frame "df_out":

       A  B  id  A_1  B_1
    0  1  1   1  0.5  0.5
    1  3  2   2  0.2  0.4
    2  3  4   3  0.8  0.9
    3  1  2   4   na   na
    4  3  1   5   na   na

My attempts to merge on 'id' have left me with only the 5K rows. I want to keep all the rows of the large dataframe and fill in NaN values for the rows that have no match in the small frame.
Thanks
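Use the how='outer' option of pd.merge. By default, pd.merge uses how='inner', which keeps only the "id" values present in both frames; that is why only the 5K rows remain.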
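A minimal sketch of that merge, using the small sample frames from the question. Selecting only "id", "A_1" and "B_1" from the small frame is my own assumption here, made so that the shared columns "A" and "B" are not duplicated with _x/_y suffixes; adapt the column list to your real data.

```python
import pandas as pd

# Small frame: has the extra columns A_1 and B_1.
df = pd.DataFrame({
    "A": [1, 3, 3],
    "B": [1, 2, 4],
    "id": [1, 2, 3],
    "A_1": [0.5, 0.2, 0.8],
    "B_1": [0.5, 0.4, 0.9],
})

# Large frame: contains every id from df plus additional ones.
df_2 = pd.DataFrame({
    "A": [1, 3, 3, 1, 3],
    "B": [1, 2, 4, 2, 1],
    "id": [1, 2, 3, 4, 5],
})

# Bring over only the key and the extra columns (an assumption, see above),
# and keep every id found in either frame with how="outer".
df_out = pd.merge(df_2, df[["id", "A_1", "B_1"]], on="id", how="outer")

print(df_out)
```

Because every "id" in df also occurs in df_2, how='left' with df_2 as the left frame should keep the same set of rows in this case; how='outer' additionally keeps ids that appear only in the small frame, if there are any.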