Missing columns when merging multiple dataframes based on column value

Question

I have five dataframes each has unique columns and one common column id no.. Say each dataframe has the following columns:

df1: id no, time, date, age, name.
df2: id no, gender, address, employment, birth_date.
df3: id no, .....etc.
df4: id no, ......etc.
df5: id no, .......etc.

I have used merge as the following:

first1 = pd.merge(df1, df2, how= 'left', on = 'id_no') first2 = pd.merge(first1, df3, how= 'left', on = 'id_no') first3 = pd.merge(first2, df4, how= 'left', on = 'id_no') combineall = pd.merge(first3, df5, how= 'left', on = 'id_no')

The problem: The columns of df3 are missing from combineall dataframe. Though when I print df3 alone I see all the contents available. How my df3 gets missing during merging? Is there I make this process easier and less problematic.

Goal: I want to have comabinall dataframe with the all the columns from df1,df2,df3,df4,df5 merged based on id_no.

Aditi · Accepted Answer · 2018-02-21 06:43:23Z

0

Try out this:

from functools import reduce finaldf = reduce(lambda left,right: pd.merge(left, right, on='id_no', how='left'), [df1,df2,df3,df4,df5])

answered Feb 21, 2018 at 6:43

Aditi

82612 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Firis Over a year ago

I still face the same issue. My df3 columns are empty in finaldf.

Aditi Over a year ago

Try removing how='left'

Soudipta Dutta · Accepted Answer · 2022-12-28 06:41:46Z

import pandas as pd import numpy as np df1 = pd.DataFrame(np.array([ ['a', 1, 2], ['b', 3, 4], ['c', 5, 6]]), columns=['id', 'A', 'B']) df2 = pd.DataFrame(np.array([ ['a', 10, 12], ['b', 13, 14], ['c', 15, 16]]), columns=['id', 'C', 'D']) df3 = pd.DataFrame(np.array([ ['a', 100, 120], ['b', 130, 140], ['c', 150, 160]]), columns=['id', 'E', 'F']) dfs = [df1, df2, df3] dfs = [df.set_index('id') for df in dfs] aa = dfs[0].join(dfs[1:]) aar= aa.reset_index() print(aar) """ Explanation 1: 1. Set the index of each dataframe to the id column. 2. Join the dataframes together. 3. Reset the index to get the id column back. """ """ OUTPUT : id A B C D E F 0 a 1 2 10 12 100 120 1 b 3 4 13 14 130 140 2 c 5 6 15 16 150 160 """ bb = pd.concat( objs = ( common_ID.set_index('id') for common_ID in (df1, df2, df3) ), axis=1, join='inner' ).reset_index() print(bb) """ OUTPUT: id A B C D E F 0 a 1 2 10 12 100 120 1 b 3 4 13 14 130 140 2 c 5 6 15 16 150 160 """ """ Explanation 2: 1. set_index('id') for common_ID in (df1, df2, df3) - this is a generator expression that creates a generator object - the generator object is a sequence of dataframes with the common_ID as the index 2. pd.concat(objs = generator_object, axis=1, join='inner') - this concatenates the dataframes in the generator object along the columns - the join='inner' ensures that only the common_ID is used 3. .reset_index() - this resets the index to the default index """ cc = df1.merge(df2,on='id').merge(df3,on='id') print(cc) import functools as ft dd = ft.reduce(lambda x, y: pd.merge(x, y, on='id'), dfs) print(dd)

Collectives™ on Stack Overflow

Missing columns when merging multiple dataframes based on column value

2 Answers 2

2 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Linked

Related