Join two same columns from two dataframes, pandas

Question

I am looking for the fastest way to join columns with the same names using a separator. My dataframes:

df1:

    A,B,C,D
    my,he,she,it

df2:

    A,B,C,D
    dog,cat,elephant,fish

Expected output:

df:

    A,B,C,D
    my:dog,he:cat,she:elephant,it:fish

As you can see, I want to merge columns with the same names, two cells into one. I can use this code for the A column:

    # merge on the index so the shared column names get the _x / _y suffixes
    df = df1.merge(df2, left_index=True, right_index=True)
    df['A'] = df[['A_x', 'A_y']].apply(lambda x: ':'.join(x), axis=1)

In my real dataset I have more than 30 columns, and I don't want to write the same lines for each of them. Is there a faster way to get my expected output?

What are you merging on? The index?

– Umar.H, Commented Nov 12, 2019 at 14:59
In your case, are there columns whose names don't match?

– igorkf, Commented Nov 12, 2019 at 15:23

Umar.H · Accepted Answer · 2019-11-12 15:23:58Z

How about concat and groupby?

    df3 = pd.concat([df1, df2], axis=0)
    df3 = df3.groupby(df3.index).transform(lambda x: ':'.join(x)).drop_duplicates()
    print(df3)

            A       B             C        D
    0  my:dog  he:cat  she:elephant  it:fish

edited Nov 12, 2019 at 15:23

answered Nov 12, 2019 at 15:00

user8560167 Over a year ago

Are you sure that you want to use concat without axis=1? I will check it.

Umar.H Over a year ago

No, using axis=0 gives us the flexibility of grouping along the index and concatenating the rows into your joined values, but it's hard to say without all your business requirements.

user8560167 Over a year ago

This is a good answer for the dataset in my question, but I have a question about my real dataset: using your code, I get the column headers of my dataset as the result, like: A:B B:C :C:D:A. Any idea why? I think it is because of the index, so this solution only works for an ideal dataset.

Umar.H Over a year ago

Did you use apply or transform?

Umar.H Over a year ago

Try it with .transform(lambda x: ':'.join(x)).drop_duplicates() as above.
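The difference between apply and transform asked about here can be sketched as follows. This is only an illustration on a two-column toy frame (the names `stacked`, `joined_headers` and `joined_values` are made up for the example), and it assumes the headers in the result came from joining a whole DataFrame rather than one column at a time:

```python
import pandas as pd

# Toy frames, shortened from the question's data
df1 = pd.DataFrame({"A": ["my"], "B": ["he"]})
df2 = pd.DataFrame({"A": ["dog"], "B": ["cat"]})
stacked = pd.concat([df1, df2], axis=0)

# Iterating over a DataFrame yields its column labels, so joining a whole
# frame (which is what groupby.apply hands to the function) joins the headers.
joined_headers = ":".join(stacked)

# transform calls the function on one column (a Series) at a time,
# so the cell values are joined instead.
joined_values = (
    stacked.groupby(stacked.index)
    .transform(lambda col: ":".join(col))
    .drop_duplicates()
)

print(joined_headers)
print(joined_values)
```

If the real data shows headers in the output, checking which of the two methods was called is probably the first thing to rule out.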

SSharma · Accepted Answer · 2019-11-12 15:10:16Z

You can do this by simply adding the two dataframes with a separator.

    import pandas as pd

    df1 = pd.DataFrame(columns=["A", "B", "C", "D"], index=[0])
    df2 = pd.DataFrame(columns=["A", "B", "C", "D"], index=[0])
    df1["A"] = "my"
    df1["B"] = "he"
    df1["C"] = "she"
    df1["D"] = "it"
    df2["A"] = "dog"
    df2["B"] = "cat"
    df2["C"] = "elephant"
    df2["D"] = "fish"
    print(df1)
    print(df2)
    df3 = df1 + ':' + df2
    print(df3)

This will give you a result like:

        A   B    C   D
    0  my  he  she  it
         A    B         C     D
    0  dog  cat  elephant  fish
            A       B             C        D
    0  my:dog  he:cat  she:elephant  it:fish

Is this what you are trying to achieve? Note that this only works if both dataframes have the same columns; any extra columns will be NaN. What do you want to do with the columns that are not shared between df1 and df2? Please comment below to help me understand your problem better.
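Two more things can get in the way on a real dataset: the addition aligns rows by index label, and `+ ':'` raises a TypeError on numeric columns. A minimal sketch of one way around both, assuming the rows should be paired by position and that casting everything to strings is acceptable (the column `N` and the index values are made up for the example):

```python
import pandas as pd

# Hypothetical frames with different index labels and one numeric column
df1 = pd.DataFrame({"A": ["my", "your"], "N": [1, 2]}, index=[10, 11])
df2 = pd.DataFrame({"A": ["dog", "cat"], "N": [3, 4]}, index=[0, 1])

# Cast to str so numeric columns can be concatenated, and reset the index
# so rows are paired by position rather than by label.
left = df1.astype(str).reset_index(drop=True)
right = df2.astype(str).reset_index(drop=True)

df3 = left + ":" + right
print(df3)
```

Whether pairing by position is right depends on the data; if the index labels are meaningful, keep them and make sure both frames use the same ones.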

igorkf · Accepted Answer · 2019-11-12 15:21:51Z

How about this?

    df3 = df1 + ':' + df2
    print(df3)

            A       B             C        D
    0  my:dog  he:cat  she:elephant  it:fish

This is good because if there are columns that don't match, you get NaN, so you can filter them out later if you want:

    df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it'], 'E': ['another'], 'F': ['and another']})
    df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})
    df1 + ':' + df2

            A       B             C        D    E    F
    0  my:dog  he:cat  she:elephant  it:fish  NaN  NaN
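Filtering the unmatched columns afterwards could look like the sketch below. It shows two options on a shortened version of the frames above: dropping the all-NaN columns after the addition, or restricting both frames to their common columns first. The variable names are only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["my"], "B": ["he"], "E": ["another"]})
df2 = pd.DataFrame({"A": ["dog"], "B": ["cat"]})

combined = df1 + ":" + df2

# Option 1: drop columns that are entirely NaN (present in only one frame)
only_shared = combined.dropna(axis=1, how="all")

# Option 2: restrict both frames to their common columns before adding
shared = df1.columns.intersection(df2.columns)
same_result = df1[shared] + ":" + df2[shared]

print(only_shared)
print(same_result)
```

Option 1 would also drop a shared column that happens to be NaN in every row, so option 2 is probably the safer choice when the data itself can contain missing values.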

VSharma · Accepted Answer · 2019-11-13 09:20:01Z

You can simply do:

    df = df1 + ':' + df2
    print(df)

Which is simple and effective.

This should be your answer
