Pandas: How to reshape multiple dataframes into a common form?

Question

(Reproducible examples and feeble attempts at the end)

I've got two dataframes, df1 and df2:

df1:

 Col_A Col_B Col_D 1 NaN 21 NaN 2 10 NaN 33 4 12 23 38

df2:

 Col_C Col_E 2 22 44 3 NaN 45 5 4 48

And I'd like to find a common form like this:

df_common:

 Col_A Col_B Col_C Col_D Col_E 1 NaN NaN NaN NaN NaN 2 NaN NaN NaN NaN NaN 3 NaN NaN NaN NaN NaN 4 NaN NaN NaN NaN NaN 5 NaN NaN NaN NaN NaN

...where I've got the union of all column names and row indexes, and NaN values only:

Then I'd like to fill in the values of df1 and df2 (still in two separate tables) so that I'll end up with:

df1_desired

 Col_A Col_B Col_C Col_D Col_E 1 NaN 21 NaN NaN NaN 2 10 NaN NaN 33 NaN 3 NaN NaN NaN NaN NaN 4 12 23 NaN NaN NaN 5 NaN NaN NaN 38 NaN

df2_resired:

 Col_A Col_B Col_C Col_D Col_E 1 NaN NaN NaN NaN NaN 2 NaN NaN 22 NaN 44 3 NaN NaN NaN NaN 35 4 NaN NaN NaN NaN NaN 5 NaN NaN 4 NaN 48

I've tried various attempts with pd.merge() and df.update() with no success

But I've come to terms with the fact that I don't even know what to properly call this particular challenge. Thank you for any suggestions!

Reproducible examples:

import pandas as pd import numpy as np df1 = pd.DataFrame({'Col_A': {1: np.nan, 2: '10', 4: '12'}, 'Col_B': {1: '21', 2: np.nan, 4: '23'}, 'Col_D': {1: np.nan, 2: '33', 4: '38'}}) df2 = pd.DataFrame({'Col_C': {2: '22', 3: np.nan, 5: '4'}, 'Col_E': {2: 44, 3: 45, 5: 48}}) df1_desired = pd.DataFrame({'Col_A': {1: np.nan, 2: '10', 3: np.nan, 4: '12', 5: np.nan}, 'Col_B': {1: '23', 2: np.nan, 3: np.nan, 4: '23', 5: np.nan}, 'Col_C': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}, 'Col_D': {1: np.nan, 2: '22', 3: np.nan, 4: np.nan, 5: '4'}, 'Col_E': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}}) df2_desired = pd.DataFrame({'Col_A': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}, 'Col_B': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}, 'Col_C': {1: np.nan, 2: '22', 3: np.nan, 4: np.nan, 5: '4'}, 'Col_D': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}, 'Col_E': {1: np.nan, 2: '44', 3: '35', 4: np.nan, 5: '48'}}) # find the commons common_cols = sorted(list(set().union(list(df1),list(df2)))) common_rows = sorted(list(set().union(list(df1.index),list(df2.index)))) df_common = pd.DataFrame(np.nan, index=common_rows, columns=common_cols) # attempt at reshaping df1 with pd.merge # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html df1_reshaped = pd.merge(df_common, df1, how='left', left_index=True, right_index=True) # attempt at dropping duplicates for df1 #df1_reshaped = df1_reshaped[df1_reshaped.columns.drop(list(df1_reshaped.filter(regex='_x')))] #df1_reshaped.columns = df_common.columns # attempt with df.update() # https://stackoverflow.com/questions/9787853/join-or-merge-with-overwrite-in-pandas df1_updated=df_common.update(df1)

halfer · Accepted Answer · 2020-05-19 14:05:37Z

2

You can use:

s=df1.combine_first(df2) df1=df1.reindex_like(s) Col_A Col_B Col_C Col_D Col_E 1 NaN 21.0 NaN NaN NaN 2 10.0 NaN NaN 33.0 NaN 3 NaN NaN NaN NaN NaN 4 12.0 23.0 NaN 38.0 NaN 5 NaN NaN NaN NaN NaN

edited May 19, 2020 at 14:05

halfer

20.2k20 gold badges110 silver badges207 bronze badges

answered May 3, 2020 at 1:34

BENY

324k22 gold badges176 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

vestland Over a year ago

Thank you so much! The approach is perfect, although it per the example should have been s=df_common.combine_first(df1); df1=df1.reindex_like(s). (and btw not my downvote!).

BENY Over a year ago

@vestland yw:-) happy coding

vestland Over a year ago

Someone probably tested your approach and didn't get that the approach was spot on but the reference not 100% correct as per the question. But I feel that that responsibility lies on my shoulders for not having explained the challenge properly. I hope the downvoter sees this and makes things right.

Collectives™ on Stack Overflow

Pandas: How to reshape multiple dataframes into a common form?

1 Answer 1

3 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Linked

Related