3

I am looking for the fastest way to join columns with the same names using a separator. My dataframes:

df1:

    A,B,C,D
    my,he,she,it

df2:

    A,B,C,D
    dog,cat,elephant,fish

expected output:

df:

    A,B,C,D
    my:dog,he:cat,she:elephant,it:fish

As you can see, I want to merge columns with the same names, combining two cells into one. I can use this code for column A:

    df = df1.merge(df2)
    df['A'] = df[['A_x', 'A_y']].apply(lambda x: ':'.join(x), axis=1)

In my real dataset I have more than 30 columns, and I don't want to write the same lines for each of them. Is there a faster way to get my expected output?

2
  • What are you merging on? The index? Commented Nov 12, 2019 at 14:59
  • In your case, are there columns whose names don't match? Commented Nov 12, 2019 at 15:23

4 Answers

2

How about concat and groupby?

    df3 = pd.concat([df1, df2], axis=0)
    df3 = df3.groupby(df3.index).transform(lambda x: ':'.join(x)).drop_duplicates()
    print(df3)

            A       B             C        D
    0  my:dog  he:cat  she:elephant  it:fish

7 Comments

Are you sure that you want to use concat without axis=1? I will check it.
No, using axis=0 gives us the flexibility of grouping along the index and concatenating the rows into your joined values, but it's hard to say without knowing all your business requirements.
This is a good answer for the dataset in my question, but I have a question about my real dataset: using your code, the result I get is the headers of my dataset, like: A:B B:C :C:D:A. Any idea? I think it is because of the index, so this solution only works for an ideal dataset.
Did you use apply or transform?
Try it with .transform(lambda x : ':'.join(x)).drop_duplicates() as above.
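
One plausible reason for the header output described in the comments above is that the join ended up running over a whole DataFrame rather than over a single column. The sketch below is only an illustration on the sample data from the question. The names `headers_joined`, `stacked`, `column_joined` and `by_index` are made up for this example, and the `groupby(level=0).agg(...)` variant is a swapped-in alternative, not the answerer's code.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["my"], "B": ["he"], "C": ["she"], "D": ["it"]})
df2 = pd.DataFrame({"A": ["dog"], "B": ["cat"], "C": ["elephant"], "D": ["fish"]})

# Iterating over a DataFrame yields its column labels, so joining a whole
# frame produces the headers rather than the cell values.
headers_joined = ":".join(df1)

# Joining a single column (a Series) uses the cell values instead.
stacked = pd.concat([df1, df2], axis=0)
column_joined = ":".join(stacked["A"])

# Alternative (assumption, not from the thread): aggregate each column
# separately for every index label.
by_index = stacked.groupby(level=0).agg(":".join)

print(headers_joined)
print(column_joined)
print(by_index)
```

If the real dataset has a non-unique or unexpected index, it may be worth calling `reset_index(drop=True)` on both frames first, since the grouping relies on rows from both frames sharing the same index labels.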
2

You can do this by simply adding the two dataframes together with a separator.

    import pandas as pd

    df1 = pd.DataFrame(columns=["A", "B", "C", "D"], index=[0])
    df2 = pd.DataFrame(columns=["A", "B", "C", "D"], index=[0])

    df1["A"] = "my"
    df1["B"] = "he"
    df1["C"] = "she"
    df1["D"] = "it"

    df2["A"] = "dog"
    df2["B"] = "cat"
    df2["C"] = "elephant"
    df2["D"] = "fish"

    print(df1)
    print(df2)

    # Element-wise string concatenation, aligned on index and column labels
    df3 = df1 + ':' + df2
    print(df3)

This will give you a result like:

        A   B    C   D
    0  my  he  she  it
         A    B         C     D
    0  dog  cat  elephant  fish
            A       B             C        D
    0  my:dog  he:cat  she:elephant  it:fish

Is this what you are trying to achieve? Note that this only works if both dataframes have the same columns; any extra columns will contain NaNs. What do you want to do with the columns that are not shared by df1 and df2? Please comment below to help me understand your problem better.
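
If the goal is to combine only the columns that exist in both dataframes, one option is to select the shared columns before adding. This is a minimal sketch, assuming both frames share the same index; the extra column `E` and the names `shared` and `joined` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: df1 has an extra column "E" that df2 lacks.
df1 = pd.DataFrame({"A": ["my"], "B": ["he"], "E": ["another"]})
df2 = pd.DataFrame({"A": ["dog"], "B": ["cat"]})

# Column labels present in both frames.
shared = df1.columns.intersection(df2.columns)

# Concatenate only the shared columns, so no NaN columns appear.
joined = df1[shared] + ":" + df2[shared]
print(joined)
```

Columns that exist in only one frame are simply left out of `joined` here; whether that is the desired behavior depends on the real dataset.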


2

How about this?

    df3 = df1 + ':' + df2
    print(df3)

            A       B             C        D
    0  my:dog  he:cat  she:elephant  it:fish

This is good because if there are columns that don't match, you get NaN, so you can filter them out later if you want:

    df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it'], 'E': ['another'], 'F': ['and another']})
    df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})
    df1 + ':' + df2

            A       B             C        D    E    F
    0  my:dog  he:cat  she:elephant  it:fish  NaN  NaN
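
The filtering step mentioned above could look roughly like this. It is a sketch under the assumption that a column which is entirely NaN after the addition is one that had no counterpart in the other frame; the names `combined` and `filtered` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it'],
                    'E': ['another'], 'F': ['and another']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})

# Columns missing from df2 (E and F) become NaN after the addition.
combined = df1 + ':' + df2

# Drop every column that contains only NaN values.
filtered = combined.dropna(axis=1, how='all')
print(filtered)
```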

2 Comments

I would avoid loops in dataframes
I found out another solution :)
0

You can simply do:

    df = df1 + ':' + df2
    print(df)

This is simple and effective.

This should be your answer
