What's a good way of unpivoting a dataframe without having to hard-code the column names, I've 50+ columns that need to switch to rows. I've tried the following, given this example:
Using Python 2.7
df:
+----------+-------+-------+-------+-------+-------+ |Category |Col_A |Col_B |Col_C |Col_D |.......| +----------+-------+-------+-------+-------+-------+ |Category1 |1 |0 |3 |0 |.......| +----------+-------+-------+-------+-------+-------+ |Category2 |5 |0 |2 |0 |.......| +----------+-------+-------+-------+-------+-------+ |Category3 |0 |6 |0 |4 |.......| +----------+-------+-------+-------+-------+-------+ Code:
Determine the column (Col_X) names:
cols = df.columns[1:] Gets the column count and creates the format 'Col_X', Col_X for each:
x = "" cnt = 0 for c in cols: cnt += 1 x += "'"+str(c)+"', " + str(c) + ", " Removes last two characters:
x = x[:-2] Creates the full expression:
xpr = "\"stack(" + str(cnt) + ", " + str(x) + ") as (Type,Value)\"" Output:
df_output = df.select("Category",expr(xpr)) But it returns something like this:
df_output.show() +----------+-------+-------+-------+-------+-------+ |Category |stack(55, 'Col_A', Col_A, 'Col_B', Col_B.... +----------+-------+-------+-------+-------+-------+ |Category1 | +----------+ |Category2 | +----------+ |Category3 | +----------+