I have the following dataframe:

+-----------------+---+---+---+---+---+---+
|               ak|  1|  2|  3|  4|  5|  6|
+-----------------+---+---+---+---+---+---+
|8dce120638dbdf438|  2|  1|  0|  0|  0|  0|
|3fd28484316249e95|  1|  0|  3|  1|  4|  5|
|3636b43f64db33889|  9|  3|  3|  4| 18| 11|
+-----------------+---+---+---+---+---+---+

and I want to transpose it to the following:

ak                depth user_count
8dce120638dbdf438 1     2
8dce120638dbdf438 2     1
8dce120638dbdf438 3     0
8dce120638dbdf438 4     0
8dce120638dbdf438 5     0
8dce120638dbdf438 6     0
3fd28484316249e95 1     1
3fd28484316249e95 2     0
3fd28484316249e95 3     3
3fd28484316249e95 4     1
3fd28484316249e95 5     4
3fd28484316249e95 6     5
3636b43f64db33889 1     9
3636b43f64db33889 2     3
3636b43f64db33889 3     3
3636b43f64db33889 4     4
3636b43f64db33889 5     18
3636b43f64db33889 6     11

How to do it in Scala?

2 Answers

A similar approach to @Ramesh Maharjan's, but without the use of UDFs - instead, using Spark's built-in array and struct functions to construct a similar array that can be exploded:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._

// per column name, create a struct (similar to a tuple) of the column name and value:
def arrayItem(name: String) =
  struct(lit(name) cast IntegerType as "depth", $"$name" as "user_count")

// create an array of these per column, explode it and select the relevant columns:
df.withColumn("tmp", explode(array(df.columns.tail.map(arrayItem): _*)))
  .select($"ak", $"tmp.depth", $"tmp.user_count")
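Conceptually, the struct-and-explode pipeline above performs a wide-to-long "melt". A minimal plain-Scala sketch of the same transformation on the sample data (no Spark required; the names `wide` and `long` are illustrative):

```scala
// Wide rows from the question: key -> counts for depths 1..6
val wide: Seq[(String, Seq[Int])] = Seq(
  "8dce120638dbdf438" -> Seq(2, 1, 0, 0, 0, 0),
  "3fd28484316249e95" -> Seq(1, 0, 3, 1, 4, 5),
  "3636b43f64db33889" -> Seq(9, 3, 3, 4, 18, 11)
)

// For each row, pair every value with its column name (the depth) and
// flatten: this mirrors what array(...) + explode does in Spark.
val long: Seq[(String, Int, Int)] = wide.flatMap { case (ak, counts) =>
  counts.zipWithIndex.map { case (userCount, i) => (ak, i + 1, userCount) }
}

long.take(3).foreach(println)
// (8dce120638dbdf438,1,2)
// (8dce120638dbdf438,2,1)
// (8dce120638dbdf438,3,0)
```

Each input row fans out into six output rows, so three wide rows become the eighteen long rows shown in the question.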

The solution is straightforward: collect the values together with their column names into an array, use the explode function to turn each array element into a separate row, and finally split the key and value into separate columns.

Summarizing the above explanation into code, with inline comments:

import org.apache.spark.sql.functions._

val columns = df.columns.tail // selecting columns to be changed to rows

// udf that zips the column names with the column values,
// returning an array of (name, value) pairs
def zipUdf = udf((cols: collection.mutable.WrappedArray[String],
                  vals: collection.mutable.WrappedArray[String]) => cols.zip(vals))

df.select(col("ak"), zipUdf(lit(columns), array(columns.map(col): _*)).as("depth")) // calling the udf defined above
  .withColumn("depth", explode(col("depth"))) // exploding the array column into separate rows
  .select(col("ak"), col("depth._1").as("depth"), col("depth._2").as("user_count")) // selecting columns as required in the output
  .show(false)

You should get the following output:

+-----------------+-----+----------+
|ak               |depth|user_count|
+-----------------+-----+----------+
|8dce120638dbdf438|1    |2         |
|8dce120638dbdf438|2    |1         |
|8dce120638dbdf438|3    |0         |
|8dce120638dbdf438|4    |0         |
|8dce120638dbdf438|5    |0         |
|8dce120638dbdf438|6    |0         |
|3fd28484316249e95|1    |1         |
|3fd28484316249e95|2    |0         |
|3fd28484316249e95|3    |3         |
|3fd28484316249e95|4    |1         |
|3fd28484316249e95|5    |4         |
|3fd28484316249e95|6    |5         |
|3636b43f64db33889|1    |9         |
|3636b43f64db33889|2    |3         |
|3636b43f64db33889|3    |3         |
|3636b43f64db33889|4    |4         |
|3636b43f64db33889|5    |18        |
|3636b43f64db33889|6    |11        |
+-----------------+-----+----------+
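The core of the udf is just Scala's `zip` on the two arrays; a standalone sketch of that pairing step (plain Scala, using the values of one sample row):

```scala
// What zipUdf computes for a single row: pair each column name
// with that row's value for it (values shown for 3636b43f64db33889)
val cols = Seq("1", "2", "3", "4", "5", "6")
val vals = Seq("9", "3", "3", "4", "18", "11")

val zipped: Seq[(String, String)] = cols.zip(vals)
// explode then turns each pair into its own row;
// depth._1 / depth._2 select the pair's two fields
println(zipped)
// List((1,9), (2,3), (3,3), (4,4), (5,18), (6,11))
```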
