0

I have a DataFrame with multiple columns, some of which are equivalent because they end in the same key (e.g. column1 = 'a/first', column2 = 'b/first'). I want to merge such columns into one. How can I do this?

My DataFrame looks like this:

    name  g1/column1  g1/column2  g1/g2/column1  g2/column2
    AAAA  10          20          nan            nan
    AAAA  nan         nan         30             40

The result I expect is:

    name  g1/column1  g1/column2
    AAAA  10          20
    AAAA  30          40

Thanks in advance

2
  • What if both columns have a value in the same row? Commented Dec 6, 2018 at 5:02
  • That is not possible. Only one of them has a value; the others are NaN. Commented Dec 6, 2018 at 5:05
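
The assumption stated in these comments (at most one non-NaN value per row among columns that share a trailing key) can be checked before merging. This is only a minimal sketch: the helper name `find_conflicts` is hypothetical, and the sample frame is made up, with row 1 deliberately holding two values for `column1` so that a conflict is reported.

```python
import numpy as np
import pandas as pd


def find_conflicts(df, sep="/"):
    """Map each trailing key to the row labels holding more than one non-NaN value."""
    # Trailing key = text after the last separator (the whole name if there is none).
    keys = pd.Index([str(c).rsplit(sep, 1)[-1] for c in df.columns])
    conflicts = {}
    for key in keys.unique():
        # Count the non-NaN cells per row among the columns sharing this key.
        filled = df.loc[:, keys == key].notna().sum(axis=1)
        conflicts[key] = filled[filled > 1].index.tolist()
    return conflicts


# Made-up data: row 1 has a value in both 'g1/column1' and 'g1/g2/column1'.
df = pd.DataFrame(
    {
        "g1/column1": [10, 5],
        "g1/column2": [20, np.nan],
        "g1/g2/column1": [np.nan, 30],
        "g2/column2": [np.nan, 40],
    }
)
conflicts = find_conflicts(df)
print(conflicts)
```

An empty list for every key means the columns can be merged without losing data.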

4 Answers

2

Use:

    #create index by all columns with no merge
    df = df.set_index('name')
    #MultiIndex by split last /
    df.columns = df.columns.str.rsplit('/', n=1, expand=True)
    #aggregate first no NaN values per second level of MultiIndex
    df = df.groupby(level=1, axis=1).first()
    print (df)
          column1  column2
    name
    AAAA     10.0     20.0
    AAAA     30.0     40.0
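
To spell out the grouping step: after the split, the second level of the column MultiIndex holds the trailing key, and `first()` returns the first non-null value within each group. Recent pandas releases deprecate `axis=1` in `groupby`, so the sketch below groups the transposed frame instead. It also keeps `name` as an ordinary column rather than the index; that is a choice made here for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame(
    {
        "name": ["AAAA", "AAAA"],
        "g1/column1": [10, np.nan],
        "g1/column2": [20, np.nan],
        "g1/g2/column1": [np.nan, 30],
        "g2/column2": [np.nan, 40],
    }
)

values = df.drop(columns="name")

# Split every label on its last '/' into a (prefix, key) MultiIndex.
values.columns = values.columns.str.rsplit("/", n=1, expand=True)

# Transpose so the column levels become row levels, group on the key level,
# take the first non-null value per group, then transpose back.
merged = values.T.groupby(level=1).first().T

result = pd.concat([df[["name"]], merged], axis=1)
print(result)
```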

5 Comments

Nice, compact solution. It would be even better with more explanation, for example of df = df.groupby(level=1, axis=1).first().
Columns that have no separator ('/') are ignored by this. How can I avoid that?
@MOHAMEDAZARUDEEN - Is there some rule for the grouping? What does print (df.columns) show?
[u'Additional_type_of_g_business_enterprise', u'version', u'_attachments', u'formhub/uuid', u'group_bf8zc97/Female', u'group_bf8zc97/Female', u'group_bf8zc97/To', u'group_bf8zc97/CIG_expenses', u'group_gu8hn21/CIG_expenses', u'group_gu8hn22/group_gu8hn20/CIG_expenses']
@MOHAMEDAZARUDEEN - Thank you. So which columns need to be merged together? I cannot see your real data, so I need a mapping like u'village', u'county', ... => u'village_code', u'sub_county' ... (maybe I am wrong with this mapping; please correct it if necessary and also add all column names for both sides).
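
One way to keep columns that have no '/' is to derive the grouping key with a plain string split, which returns the whole name when the separator is absent, and to merge each group separately. The following is a sketch under assumptions: the helper name `merge_by_trailing_key` is hypothetical, the column names are borrowed from the list in the comment above, and the values are invented for illustration.

```python
import numpy as np
import pandas as pd


def merge_by_trailing_key(df, sep="/"):
    """Merge columns sharing the text after the last `sep`; first non-NaN per row wins."""
    # Names without a separator are kept whole, so such columns are not dropped.
    keys = pd.Index([str(c).rsplit(sep, 1)[-1] for c in df.columns])
    merged = {}
    for key in keys.unique():
        block = df.loc[:, keys == key]
        if block.shape[1] == 1:
            # Nothing to merge: pass the single column through unchanged.
            merged[key] = block.iloc[:, 0]
        else:
            # Back-fill across columns so the leftmost column ends up holding
            # the first non-NaN value of each row.
            merged[key] = block.bfill(axis=1).iloc[:, 0]
    return pd.DataFrame(merged, index=df.index)


# Invented values; only the column names come from the comment.
df = pd.DataFrame(
    {
        "version": ["v1", "v1", "v1"],
        "group_bf8zc97/CIG_expenses": [100, np.nan, np.nan],
        "group_gu8hn21/CIG_expenses": [np.nan, 200, np.nan],
        "group_gu8hn22/group_gu8hn20/CIG_expenses": [np.nan, np.nan, 300],
    }
)
result = merge_by_trailing_key(df)
print(result)
```

Because the key is only the last path segment, the nesting depth of the prefix does not matter.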
1

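You need df.combine_first:

    import pandas as pd

    col1 = ['g1/column1', 'g1/column2']
    col2 = ['g1/g2/column1', 'g2/column2']
    # relabel the col2 values as col1 (keeping the row index) so that
    # combine_first aligns them column by column
    df[col1] = df[col1].combine_first(pd.DataFrame(df[col2].values, columns=col1, index=df.index))
    df = df.drop(col2, axis=1)
    print(df)
    #   name  g1/column1  g1/column2
    #0  AAAA        10.0        20.0
    #1  AAAA        30.0        40.0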

1 Comment

If I have another column such as g1/g2/g3/column1, it won't be merged into g1/column1.
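
The combine_first idea can be extended to any number of columns by chaining it over every column that ends in the same key. A minimal sketch, assuming a made-up three-row frame that includes the g1/g2/g3/column1 column mentioned in the comment; the variable names are illustrative only.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Made-up data: each row has a value in exactly one of the three columns.
df = pd.DataFrame(
    {
        "g1/column1": [10, np.nan, np.nan],
        "g1/g2/column1": [np.nan, 30, np.nan],
        "g1/g2/g3/column1": [np.nan, np.nan, 50],
    }
)

# Collect every column whose name ends in the trailing key, at any depth.
suffix = "/column1"
cols = [c for c in df.columns if c.endswith(suffix)]

# Chain combine_first: each step fills the remaining NaN values from the next column.
merged = reduce(lambda left, right: left.combine_first(right), (df[c] for c in cols))

# Keep the name of the first matching column for the merged result.
result = merged.rename(cols[0]).to_frame()
print(result)
```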
0

One possible solution:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[10, 20, np.nan, np.nan], [np.nan, np.nan, 30, 40]],
                      columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'])
    df
       g1/column1  g1/column2  g1/g2/column1  g2/column2
    0        10.0        20.0            NaN         NaN
    1         NaN         NaN           30.0        40.0

    df = df.fillna(0)  # <- replacing all NaN with 0
    ndf = pd.DataFrame()
    unique_cols = ['column1', 'column2']
    for col in unique_cols:
        val = df.columns[df.columns.str.contains(col)]
        # sum the matching columns row by row (axis=1)
        ndf[val[0]] = df.loc[:, val].sum(axis=1)
    ndf  # <- You can add index if you need (AAAA, AAAA)
       g1/column1  g1/column2
    0        10.0        20.0
    1        30.0        40.0
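
One caveat with filling NaN with 0 before summing: a row in which every matching column is NaN ends up as 0, which cannot be told apart from a real zero. A hedged alternative is to sum row-wise with `min_count=1`, which leaves such rows as NaN. The sketch below uses a made-up frame with an all-NaN third row to show the difference.

```python
import numpy as np
import pandas as pd

# Made-up data: the last row has no value in either column.
pair = pd.DataFrame(
    {
        "g1/column1": [10, np.nan, np.nan],
        "g1/g2/column1": [np.nan, 30, np.nan],
    }
)

# fillna(0) + sum: the all-NaN row silently becomes 0.
with_fillna = pair.fillna(0).sum(axis=1)

# min_count=1 requires at least one non-NaN value per row, otherwise the result is NaN.
with_min_count = pair.sum(axis=1, min_count=1)

print(pd.DataFrame({"fillna_sum": with_fillna, "min_count_sum": with_min_count}))
```

Summing is only equivalent to merging while each row holds at most one value per key; if two columns are filled in the same row, their values are added together.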

Comments

0
    import pandas as pd
    import numpy as np

    g1 = [20, np.nan, 30, np.nan]
    g1_2 = [10, np.nan, 20, np.nan]
    g2 = [np.nan, 30, np.nan, 40]
    g2_2 = [np.nan, 10, np.nan, 30]

    dataList = list(zip(g1, g1_2, g2, g2_2))
    df = pd.DataFrame(data=dataList, columns=['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'])

    # replace NaN with 0 so that each sum returns the single non-NaN value
    df.fillna(0, inplace=True)
    df['g1Combined'] = df['g1/column1'] + df['g1/g2/column1']
    df['g2Combined'] = df['g1/column2'] + df['g2/column2']

    # drop the original columns
    df.drop(['g1/column1', 'g1/column2', 'g1/g2/column1', 'g2/column2'], axis=1, inplace=True)
    df

Comments
