
How can I concatenate a dictionary of dataframes that have different sets of columns? The index of each dataframe starts at 0, so there are duplicate indices. Some of the dataframes are missing columns, while others contain all of them.

Here is an example:

df1

    idx  col1  col2  col3
    0    1     1     1
    1    2     2     2

df2

    idx  col1  col3
    0    1     1
    1    2     2

df3

    idx  col1  col2  col3
    0    1     1     1
    1    2     2     2

desired_result

    idx  col1  col2  col3
    0    1     1     1
    1    2     2     2
    2    1     nan   1
    3    2     nan   2
    4    1     1     1
    5    2     2     2

So I would like to fill the missing columns with NaN and combine everything into a single dataframe that has all the columns.

Thanks in advance !

1 Answer

Assuming the dictionary of dataframes looks like the following:

    import pandas as pd

    df_dict = {
        'df1': pd.DataFrame({'idx': [1, 2], 'col1': [1, 2], 'col2': [1, 2], 'col3': [1, 2]}),
        'df2': pd.DataFrame({'idx': [1, 2], 'col1': [1, 2], 'col3': [1, 2]}),
        'df3': pd.DataFrame({'idx': [1, 2], 'col1': [1, 2], 'col2': [1, 2], 'col3': [1, 2]}),
    }

One can simply use pandas.concat as follows:

    df = pd.concat(df_dict.values(), ignore_index=True)

    [Out]:
       idx  col1  col2  col3
    0    1     1   1.0     1
    1    2     2   2.0     2
    2    1     1   NaN     1
    3    2     2   NaN     2
    4    1     1   1.0     1
    5    2     2   2.0     2

If one wants the idx column to be the index, one can chain pandas.DataFrame.set_index as follows:

    df = pd.concat(df_dict.values(), ignore_index=True).set_index('idx')

    [Out]:
         col1  col2  col3
    idx
    1       1   1.0     1
    2       2   2.0     2
    1       1   NaN     1
    2       2   NaN     2
    1       1   1.0     1
    2       2   2.0     2
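In the question, idx appears to be the row index of each dataframe rather than a column. A minimal sketch of the same pandas.concat call under that assumption is shown here; the dictionary keys and values are illustrative, and naming the resulting index "idx" is only a cosmetic choice to match the desired result.

```python
import pandas as pd

# Frames shaped like the question's example: default 0-based index,
# and df2 lacks col2. Names and values are illustrative.
df_dict = {
    "df1": pd.DataFrame({"col1": [1, 2], "col2": [1, 2], "col3": [1, 2]}),
    "df2": pd.DataFrame({"col1": [1, 2], "col3": [1, 2]}),
    "df3": pd.DataFrame({"col1": [1, 2], "col2": [1, 2], "col3": [1, 2]}),
}

# ignore_index=True drops the duplicate 0/1 labels and renumbers the rows;
# columns missing from a frame are filled with NaN (default join="outer").
result = pd.concat(list(df_dict.values()), ignore_index=True)
result.index.name = "idx"  # cosmetic: label the new index like the question does
print(result)
```

With this layout no set_index step is needed, since the renumbered index already plays the role of idx.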

Notes:

  • If one doesn't want col2 to be float64 (as raised in the comments), one approach is to use pandas.Int64Dtype as follows:

    df['col2'] = df['col2'].astype(pd.Int64Dtype())

    [Out]:
       idx  col1  col2  col3
    0    1     1     1     1
    1    2     2     2     2
    2    1     1  <NA>     1
    3    2     2  <NA>     2
    4    1     1     1     1
    5    2     2     2     2
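To see why the conversion is needed, the following sketch checks the dtype of a column before and after the cast. The series is a hypothetical stand-in for col2 after concatenation, and the cast should only be expected to succeed when every non-missing value is a whole number.

```python
import numpy as np
import pandas as pd

# Illustrative column mixing integers with missing values,
# as col2 would look after the concat above.
col2 = pd.Series([1, 2, np.nan, np.nan, 1, 2], name="col2")
print(col2.dtype)  # float64, because NaN is a float

# Nullable integer dtype: values stay integers, gaps become pd.NA.
# "Int64" (capital I) is the string alias for pd.Int64Dtype().
col2_int = col2.astype("Int64")
print(col2_int.dtype)  # Int64
print(col2_int.tolist())
```

The plain NumPy int64 dtype has no way to represent a missing value, which is why pandas falls back to float64 unless the nullable dtype is requested explicitly.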

4 Comments

I already tried main = pd.concat(dict_main.values(), ignore_index=True), but I got this error.
@SherwinR then your dictionary of df might not look like the one you mention in the question, and, consequently, the one I share in the answer.
@GonçaloPeres, just out of interest, why is col2 merging as a float? And could this be an int?
@Bossman NaN is a float. An alternative could be the pandas.Int64Dtype as it allows nullable ints.
