2

I have five dataframes each has unique columns and one common column id no.. Say each dataframe has the following columns:

  1. df1: id no, time, date, age, name.
  2. df2: id no, gender, address, employment, birth_date.
  3. df3: id no, .....etc.
  4. df4: id no, ......etc.
  5. df5: id no, .......etc.

I have used merge as the following:

first1 = pd.merge(df1, df2, how= 'left', on = 'id_no') first2 = pd.merge(first1, df3, how= 'left', on = 'id_no') first3 = pd.merge(first2, df4, how= 'left', on = 'id_no') combineall = pd.merge(first3, df5, how= 'left', on = 'id_no') 

The problem: The columns of df3 are missing from combineall dataframe. Though when I print df3 alone I see all the contents available. How my df3 gets missing during merging? Is there I make this process easier and less problematic.

Goal: I want to have comabinall dataframe with the all the columns from df1,df2,df3,df4,df5 merged based on id_no.

0

2 Answers 2

0

Try out this:

from functools import reduce finaldf = reduce(lambda left,right: pd.merge(left, right, on='id_no', how='left'), [df1,df2,df3,df4,df5]) 
Sign up to request clarification or add additional context in comments.

2 Comments

I still face the same issue. My df3 columns are empty in finaldf.
Try removing how='left'
0
import pandas as pd import numpy as np df1 = pd.DataFrame(np.array([ ['a', 1, 2], ['b', 3, 4], ['c', 5, 6]]), columns=['id', 'A', 'B']) df2 = pd.DataFrame(np.array([ ['a', 10, 12], ['b', 13, 14], ['c', 15, 16]]), columns=['id', 'C', 'D']) df3 = pd.DataFrame(np.array([ ['a', 100, 120], ['b', 130, 140], ['c', 150, 160]]), columns=['id', 'E', 'F']) dfs = [df1, df2, df3] dfs = [df.set_index('id') for df in dfs] aa = dfs[0].join(dfs[1:]) aar= aa.reset_index() print(aar) """ Explanation 1: 1. Set the index of each dataframe to the id column. 2. Join the dataframes together. 3. Reset the index to get the id column back. """ """ OUTPUT : id A B C D E F 0 a 1 2 10 12 100 120 1 b 3 4 13 14 130 140 2 c 5 6 15 16 150 160 """ bb = pd.concat( objs = ( common_ID.set_index('id') for common_ID in (df1, df2, df3) ), axis=1, join='inner' ).reset_index() print(bb) """ OUTPUT: id A B C D E F 0 a 1 2 10 12 100 120 1 b 3 4 13 14 130 140 2 c 5 6 15 16 150 160 """ """ Explanation 2: 1. set_index('id') for common_ID in (df1, df2, df3) - this is a generator expression that creates a generator object - the generator object is a sequence of dataframes with the common_ID as the index 2. pd.concat(objs = generator_object, axis=1, join='inner') - this concatenates the dataframes in the generator object along the columns - the join='inner' ensures that only the common_ID is used 3. .reset_index() - this resets the index to the default index """ cc = df1.merge(df2,on='id').merge(df3,on='id') print(cc) import functools as ft dd = ft.reduce(lambda x, y: pd.merge(x, y, on='id'), dfs) print(dd) 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.