Stacking two columns in a dataframe

Question

This is the dataframe I am working with,

data = pd.DataFrame({'id':[1,2,3], 'start date': ['2001-11-13', '2002-6-26', '2004-2-22'], 'end date': ['2002-6-25', '2004-2-21', '2006-8-21'], 'values':[1000,2000,3000]})

I am trying to stack only the date columns, start date and end date, while leaving the id and values columns unstacked. The final dataframe would look like this,

data = pd.DataFrame({'id':[1,1,2,2,3,3], 'date': ['2001-11-13', '2002-6-26','2004-2-22','2002-6-25','2004-2-21', '2006-8-21'], 'values':[1000,1000,2000,2000,3000,3000]})

BENY · Accepted Answer · 2022-07-23 16:05:16Z

Just do melt

out = data.melt(['id','values']).sort_values('id') Out[436]: id values variable value 0 1 1000 start date 2001-11-13 3 1 1000 end date 2002-6-25 1 2 2000 start date 2002-6-26 4 2 2000 end date 2004-2-21 2 3 3000 start date 2004-2-22 5 3 3000 end date 2006-8-21

Or set_index + stack

out = data.set_index(['id','values']).stack().reset_index() Out[437]: id values level_2 0 0 1 1000 start date 2001-11-13 1 1 1000 end date 2002-6-25 2 2 2000 start date 2002-6-26 3 2 2000 end date 2004-2-21 4 3 3000 start date 2004-2-22 5 3 3000 end date 2006-8-21

constantstranger · Accepted Answer · 2022-07-23 16:21:42Z

Here's a way to do what you've asked that gives the output with the columns labeled as in your question:

cols = ['start date', 'end date'] data = pd.concat([data.drop(columns=set(cols)-{col}).rename(columns={col:'date'}) for col in cols]).sort_values('id').reset_index(drop=True)

Output:

 id date values 0 1 2001-11-13 1000 1 1 2002-6-25 1000 2 2 2002-6-26 2000 3 2 2004-2-21 2000 4 3 2004-2-22 3000 5 3 2006-8-21 3000

Collectives™ on Stack Overflow

Stacking two columns in a dataframe

2 Answers 2

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Related