1

What's a good way of unpivoting a dataframe without having to hard-code the column names, I've 50+ columns that need to switch to rows. I've tried the following, given this example:

Using Python 2.7

df:

+----------+-------+-------+-------+-------+-------+ |Category |Col_A |Col_B |Col_C |Col_D |.......| +----------+-------+-------+-------+-------+-------+ |Category1 |1 |0 |3 |0 |.......| +----------+-------+-------+-------+-------+-------+ |Category2 |5 |0 |2 |0 |.......| +----------+-------+-------+-------+-------+-------+ |Category3 |0 |6 |0 |4 |.......| +----------+-------+-------+-------+-------+-------+ 

Code:

Determine the column (Col_X) names:

cols = df.columns[1:] 

Gets the column count and creates the format 'Col_X', Col_X for each:

x = "" cnt = 0 for c in cols: cnt += 1 x += "'"+str(c)+"', " + str(c) + ", " 

Removes last two characters:

x = x[:-2] 

Creates the full expression:

xpr = "\"stack(" + str(cnt) + ", " + str(x) + ") as (Type,Value)\"" 

Output:

df_output = df.select("Category",expr(xpr)) 

But it returns something like this:

df_output.show() +----------+-------+-------+-------+-------+-------+ |Category |stack(55, 'Col_A', Col_A, 'Col_B', Col_B.... +----------+-------+-------+-------+-------+-------+ |Category1 | +----------+ |Category2 | +----------+ |Category3 | +----------+ 

1 Answer 1

1

I managed to make it work by applying this on the expression buildup:

xpr = """stack({}, {}) as (Type,Value)""".format(cnt,x) 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.