2

(Reproducible examples and feeble attempts at the end)

I've got two dataframes, df1 and df2:

df1:

 Col_A Col_B Col_D 1 NaN 21 NaN 2 10 NaN 33 4 12 23 38 

df2:

 Col_C Col_E 2 22 44 3 NaN 45 5 4 48 

And I'd like to find a common form like this:

df_common:

 Col_A Col_B Col_C Col_D Col_E 1 NaN NaN NaN NaN NaN 2 NaN NaN NaN NaN NaN 3 NaN NaN NaN NaN NaN 4 NaN NaN NaN NaN NaN 5 NaN NaN NaN NaN NaN 

...where I've got the union of all column names and row indexes, and NaN values only:

Then I'd like to fill in the values of df1 and df2 (still in two separate tables) so that I'll end up with:

df1_desired

 Col_A Col_B Col_C Col_D Col_E 1 NaN 21 NaN NaN NaN 2 10 NaN NaN 33 NaN 3 NaN NaN NaN NaN NaN 4 12 23 NaN NaN NaN 5 NaN NaN NaN 38 NaN 

df2_resired:

 Col_A Col_B Col_C Col_D Col_E 1 NaN NaN NaN NaN NaN 2 NaN NaN 22 NaN 44 3 NaN NaN NaN NaN 35 4 NaN NaN NaN NaN NaN 5 NaN NaN 4 NaN 48 

I've tried various attempts with pd.merge() and df.update() with no success

But I've come to terms with the fact that I don't even know what to properly call this particular challenge. Thank you for any suggestions!

Reproducible examples:

import pandas as pd import numpy as np df1 = pd.DataFrame({'Col_A': {1: np.nan, 2: '10', 4: '12'}, 'Col_B': {1: '21', 2: np.nan, 4: '23'}, 'Col_D': {1: np.nan, 2: '33', 4: '38'}}) df2 = pd.DataFrame({'Col_C': {2: '22', 3: np.nan, 5: '4'}, 'Col_E': {2: 44, 3: 45, 5: 48}}) df1_desired = pd.DataFrame({'Col_A': {1: np.nan, 2: '10', 3: np.nan, 4: '12', 5: np.nan}, 'Col_B': {1: '23', 2: np.nan, 3: np.nan, 4: '23', 5: np.nan}, 'Col_C': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}, 'Col_D': {1: np.nan, 2: '22', 3: np.nan, 4: np.nan, 5: '4'}, 'Col_E': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}}) df2_desired = pd.DataFrame({'Col_A': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}, 'Col_B': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}, 'Col_C': {1: np.nan, 2: '22', 3: np.nan, 4: np.nan, 5: '4'}, 'Col_D': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}, 'Col_E': {1: np.nan, 2: '44', 3: '35', 4: np.nan, 5: '48'}}) # find the commons common_cols = sorted(list(set().union(list(df1),list(df2)))) common_rows = sorted(list(set().union(list(df1.index),list(df2.index)))) df_common = pd.DataFrame(np.nan, index=common_rows, columns=common_cols) # attempt at reshaping df1 with pd.merge # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html df1_reshaped = pd.merge(df_common, df1, how='left', left_index=True, right_index=True) # attempt at dropping duplicates for df1 #df1_reshaped = df1_reshaped[df1_reshaped.columns.drop(list(df1_reshaped.filter(regex='_x')))] #df1_reshaped.columns = df_common.columns # attempt with df.update() # https://stackoverflow.com/questions/9787853/join-or-merge-with-overwrite-in-pandas df1_updated=df_common.update(df1) 

1 Answer 1

2

You can use:

s=df1.combine_first(df2) df1=df1.reindex_like(s) Col_A Col_B Col_C Col_D Col_E 1 NaN 21.0 NaN NaN NaN 2 10.0 NaN NaN 33.0 NaN 3 NaN NaN NaN NaN NaN 4 12.0 23.0 NaN 38.0 NaN 5 NaN NaN NaN NaN NaN 
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you so much! The approach is perfect, although it per the example should have been s=df_common.combine_first(df1); df1=df1.reindex_like(s). (and btw not my downvote!).
@vestland yw:-) happy coding
Someone probably tested your approach and didn't get that the approach was spot on but the reference not 100% correct as per the question. But I feel that that responsibility lies on my shoulders for not having explained the challenge properly. I hope the downvoter sees this and makes things right.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.