Thought differently, without using arrays_zip (which is only available from Spark 2.4), and got the below.

It will work from Spark 2.0 onwards in a simpler way, using only the flatMap, map and explode functions.
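For comparison, here is a rough sketch of how the arrays_zip route could look on Spark 2.4+. This is an illustrative assumption, not part of the original answer; the "name"/"value" column setup is hypothetical:

import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(("val1", "val2", "val3", "val4", "val5"))
  .toDF("col1", "col2", "col3", "col4", "col5")

// zip an array of column-name literals with an array of the column values,
// then explode the resulting array of structs into one row per column
df.select(
    array(df.columns.map(lit): _*).as("name"),
    array(df.columns.map(col): _*).as("value")
  )
  .select(explode(arrays_zip($"name", $"value")).as("kv"))
  .select($"kv.name".as("Columns"), $"kv.value".as("Values"))
  .show(false)

The approach below avoids that dependency and works on Spark 2.0 as well.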

Case : String data type in Data :

import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrame
import spark.implicits._

val df: DataFrame = Seq(("val1", "val2", "val3", "val4", "val5"))
  .toDF("col1", "col2", "col3", "col4", "col5")

// pair each column name (as a literal) with its value column
val columnsAndValues = df.columns.flatMap { c => Array(lit(c), col(c)) }

df.printSchema()

df.withColumn("myMap", map(columnsAndValues: _*))
  .select(explode($"myMap"))
  .toDF("Columns", "Values")
  .show(false)
root
 |-- col1: string (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: string (nullable = true)
 |-- col4: string (nullable = true)
 |-- col5: string (nullable = true)

+-------+------+
|Columns|Values|
+-------+------+
|col1   |val1  |
|col2   |val2  |
|col3   |val3  |
|col4   |val4  |
|col5   |val5  |
+-------+------+
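Why this works: Spark's map function takes its arguments as alternating key/value pairs, so the flatMap above effectively builds the following varargs call (written out here purely for illustration):

map(
  lit("col1"), col("col1"),
  lit("col2"), col("col2"),
  lit("col3"), col("col3"),
  lit("col4"), col("col4"),
  lit("col5"), col("col5")
)

Exploding that single map column then yields one (key, value) row per original column.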
Case : Mix of data types in Data :

If you have different types, convert them all to String first; the remaining steps won't change:

val df1 = df.select(df.columns.map(c => col(c).cast(StringType)): _*)

Full Example :

import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column
import org.apache.spark.sql.types.StringType
import spark.implicits._

val df = Seq((2, 3, true, 2.4, "val")).toDF("col1", "col2", "col3", "col4", "col5")
df.printSchema()

// convert all columns to string type, since the map values must share a single type
val df1 = df.select(df.columns.map(c => col(c).cast(StringType)): _*)
df1.printSchema()

val columnsAndValues: Array[Column] = df.columns.flatMap { c => Array(lit(c), col(c)) }

df1.withColumn("myMap", map(columnsAndValues: _*))
  .select(explode($"myMap"))
  .toDF("Columns", "Values")
  .show(false)

Result :

root
 |-- col1: integer (nullable = false)
 |-- col2: integer (nullable = false)
 |-- col3: boolean (nullable = false)
 |-- col4: double (nullable = false)
 |-- col5: string (nullable = true)

root
 |-- col1: string (nullable = false)
 |-- col2: string (nullable = false)
 |-- col3: string (nullable = false)
 |-- col4: string (nullable = false)
 |-- col5: string (nullable = true)

+-------+------+
|Columns|Values|
+-------+------+
|col1   |2     |
|col2   |3     |
|col3   |true  |
|col4   |2.4   |
|col5   |val   |
+-------+------+
