Here is a different approach that does not use arrays_zip (which is only available from Spark 2.4 onwards).
It works on Spark 2.0 and later, and is simpler: it needs only the flatMap, map and explode functions.
The map function (used inside withColumn) creates a new map column. Its input columns must be passed as alternating key-value pairs: key1, value1, key2, value2, and so on.
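To make the "alternating key-value pairs" requirement concrete, here is a minimal plain-Scala sketch of what the flatMap step does, with no Spark dependency. The object name `InterleaveSketch` and the helper `interleave` are made up for illustration; in the Spark code the elements are Column objects (lit(c) and col(c)) rather than plain strings.

```scala
// Plain-Scala analogue of the flatMap step; no Spark needed.
// InterleaveSketch and interleave are hypothetical names used only here.
object InterleaveSketch {
  // Turn each (name, value) pair into two adjacent elements: name, value.
  def interleave(pairs: Seq[(String, String)]): Seq[String] =
    pairs.flatMap { case (name, value) => Seq(name, value) }

  def main(args: Array[String]): Unit = {
    val row = Seq("col1" -> "val1", "col2" -> "val2", "col3" -> "val3")
    val flat = interleave(row)
    println(flat.mkString(", "))
    // prints: col1, val1, col2, val2, col3, val3

    // Regrouping in twos recovers the key-value pairs that map(...) builds.
    val regrouped = flat.grouped(2).map { case Seq(k, v) => k -> v }.toMap
    println(regrouped("col2")) // prints: val2
  }
}
```

The flat, interleaved sequence is the shape that map(columnsAndValues: _*) expects as its varargs.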
Case : all columns are of String type :
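    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions._
    import spark.implicits._ // needed for toDF and the $"..." syntax

    val df: DataFrame = Seq(("val1", "val2", "val3", "val4", "val5"))
      .toDF("col1", "col2", "col3", "col4", "col5")

    // Alternate key and value: lit("col1"), col("col1"), lit("col2"), col("col2"), ...
    val columnsAndValues = df.columns.flatMap { c => Array(lit(c), col(c)) }
    df.printSchema()

    df.withColumn("myMap", map(columnsAndValues: _*))
      .select(explode($"myMap"))
      .toDF("Columns", "Values")
      .show(false)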
Result :
    root
     |-- col1: string (nullable = true)
     |-- col2: string (nullable = true)
     |-- col3: string (nullable = true)
     |-- col4: string (nullable = true)
     |-- col5: string (nullable = true)

    +-------+------+
    |Columns|Values|
    +-------+------+
    |col1   |val1  |
    |col2   |val2  |
    |col3   |val3  |
    |col4   |val4  |
    |col5   |val5  |
    +-------+------+
Case : mix of data types in the data :
If the columns have different types, cast them all to String first, because all values in a map column must share one type. The remaining steps do not change.
    import org.apache.spark.sql.types.StringType
    val df1 = df.select(df.columns.map(c => col(c).cast(StringType)): _*)
Full Example :
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types.StringType
    import spark.implicits._
    import org.apache.spark.sql.Column

    val df = Seq((2, 3, true, 2.4, "val")).toDF("col1", "col2", "col3", "col4", "col5")
    df.printSchema()

    /**
     * Convert all columns to String type, since the map values must share one type.
     */
    val df1 = df.select(df.columns.map(c => col(c).cast(StringType)): _*)
    df1.printSchema()

    // Alternate key and value: lit("col1"), col("col1"), lit("col2"), col("col2"), ...
    val columnsAndValues: Array[Column] = df1.columns.flatMap { c => Array(lit(c), col(c)) }

    df1.withColumn("myMap", map(columnsAndValues: _*))
      .select(explode($"myMap"))
      .toDF("Columns", "Values")
      .show(false)
Result :
    root
     |-- col1: integer (nullable = false)
     |-- col2: integer (nullable = false)
     |-- col3: boolean (nullable = false)
     |-- col4: double (nullable = false)
     |-- col5: string (nullable = true)

    root
     |-- col1: string (nullable = false)
     |-- col2: string (nullable = false)
     |-- col3: string (nullable = false)
     |-- col4: string (nullable = false)
     |-- col5: string (nullable = true)

    +-------+------+
    |Columns|Values|
    +-------+------+
    |col1   |2     |
    |col2   |3     |
    |col3   |true  |
    |col4   |2.4   |
    |col5   |val   |
    +-------+------+
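
If you need this in more than one place, the steps above can be wrapped in a small helper. This is only a sketch under assumptions: the object name `ColumnsToRows`, the function name `columnsToRows` and the local SparkSession setup are hypothetical and not part of the approach itself. It also assumes column names without dots or other characters that col(...) would interpret specially.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, lit, map}
import org.apache.spark.sql.types.StringType

// ColumnsToRows / columnsToRows are hypothetical names used for illustration.
object ColumnsToRows {
  // Emits one (Columns, Values) row per column of each input row.
  def columnsToRows(df: DataFrame): DataFrame = {
    // Cast everything to String so that all map values share one type.
    val asStrings = df.select(df.columns.map(c => col(c).cast(StringType).as(c)): _*)
    // Alternate key and value: lit(name), col(name), lit(name), col(name), ...
    val keyValues = asStrings.columns.flatMap(c => Array(lit(c), col(c)))
    asStrings
      .select(explode(map(keyValues: _*)))
      .toDF("Columns", "Values")
  }

  def main(args: Array[String]): Unit = {
    // Local session for a standalone run (assumption); in spark-shell use the existing `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("columns-to-rows").getOrCreate()
    import spark.implicits._

    val df = Seq((2, 3, true, 2.4, "val")).toDF("col1", "col2", "col3", "col4", "col5")
    columnsToRows(df).show(false)
    spark.stop()
  }
}
```

Casting inside the helper means callers do not have to care whether the input columns are already strings; for an all-String DataFrame the cast changes nothing.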