Thought differently, without using arrays_zip (which is only available from Spark 2.4), and got the below.

It will work from Spark 2.0 onwards in a simpler way, using only the flatMap, map and explode functions.
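For comparison, here is a rough sketch of how the arrays_zip route could look on Spark 2.4+. This is an illustrative assumption, not part of the original answer; the "name"/"value" column setup is hypothetical:

import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(("val1", "val2", "val3", "val4", "val5"))
  .toDF("col1", "col2", "col3", "col4", "col5")

// zip an array of column-name literals with an array of the column values,
// then explode the resulting array of structs into one row per column
df.select(
    array(df.columns.map(lit): _*).as("name"),
    array(df.columns.map(col): _*).as("value")
  )
  .select(explode(arrays_zip($"name", $"value")).as("kv"))
  .select($"kv.name".as("Columns"), $"kv.value".as("Values"))
  .show(false)

The approach below avoids that dependency and works on Spark 2.0 as well.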

Case : String data type in Data :

import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrame
import spark.implicits._

val df: DataFrame = Seq(("val1", "val2", "val3", "val4", "val5"))
  .toDF("col1", "col2", "col3", "col4", "col5")

// pair each column name (as a literal) with its value column
val columnsAndValues = df.columns.flatMap { c => Array(lit(c), col(c)) }

df.printSchema()

df.withColumn("myMap", map(columnsAndValues: _*))
  .select(explode($"myMap"))
  .toDF("Columns", "Values")
  .show(false)
root
 |-- col1: string (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: string (nullable = true)
 |-- col4: string (nullable = true)
 |-- col5: string (nullable = true)

+-------+------+
|Columns|Values|
+-------+------+
|col1   |val1  |
|col2   |val2  |
|col3   |val3  |
|col4   |val4  |
|col5   |val5  |
+-------+------+
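Why this works: Spark's map function takes its arguments as alternating key/value pairs, so the flatMap above effectively builds the following varargs call (written out here purely for illustration):

map(
  lit("col1"), col("col1"),
  lit("col2"), col("col2"),
  lit("col3"), col("col3"),
  lit("col4"), col("col4"),
  lit("col5"), col("col5")
)

Exploding that single map column then yields one (key, value) row per original column.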
Case : Mix of data types in Data :

If you have different types, convert them all to String first; the remaining steps won't change:

val df1 = df.select(df.columns.map(c => col(c).cast(StringType)): _*)

Full Example :

import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column
import org.apache.spark.sql.types.StringType
import spark.implicits._

val df = Seq((2, 3, true, 2.4, "val")).toDF("col1", "col2", "col3", "col4", "col5")
df.printSchema()

// convert all columns to string type, since the map values must share a single type
val df1 = df.select(df.columns.map(c => col(c).cast(StringType)): _*)
df1.printSchema()

val columnsAndValues: Array[Column] = df.columns.flatMap { c => Array(lit(c), col(c)) }

df1.withColumn("myMap", map(columnsAndValues: _*))
  .select(explode($"myMap"))
  .toDF("Columns", "Values")
  .show(false)

Result :

root
 |-- col1: integer (nullable = false)
 |-- col2: integer (nullable = false)
 |-- col3: boolean (nullable = false)
 |-- col4: double (nullable = false)
 |-- col5: string (nullable = true)

root
 |-- col1: string (nullable = false)
 |-- col2: string (nullable = false)
 |-- col3: string (nullable = false)
 |-- col4: string (nullable = false)
 |-- col5: string (nullable = true)

+-------+------+
|Columns|Values|
+-------+------+
|col1   |2     |
|col2   |3     |
|col3   |true  |
|col4   |2.4   |
|col5   |val   |
+-------+------+
