
In my application I need to create a single-row DataFrame from a Map.

So that a Map like

("col1" -> 5, "col2" -> 10, "col3" -> 6) 

would be transformed into a DataFrame with a single row and the map keys would become names of columns.

col1 | col2 | col3
   5 |   10 |    6

In case you are wondering why I want this: I just need to save a single document with some statistics to MongoDB using the MongoSpark connector, which allows saving DataFrames and RDDs.

4 Comments
  • What happens when you try to parallelize it in Spark? Commented Mar 20, 2018 at 14:03
  • Are the keys ordered, or do you want to sort them alphabetically? Commented Mar 20, 2018 at 14:03
  • @AndreyTyukin No, order doesn't matter Commented Mar 20, 2018 at 14:05
  • @cricket_007, I think parallelize doesn't work for Maps Commented Mar 20, 2018 at 14:10

3 Answers


I thought that sorting the column names doesn't hurt anyway.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val map = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)

// Sort the entries by key, then split into column names and values
val (keys, values) = map.toList.sortBy(_._1).unzip

// One Row holding all the values, wrapped in a single-element RDD
val rows = spark.sparkContext.parallelize(Seq(Row(values: _*)))

// One StructField per key
val schema = StructType(keys.map(k => StructField(k, IntegerType, nullable = false)))

val df = spark.createDataFrame(rows, schema)
df.show()

Gives:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   5|   6|  10|
+----+----+----+

The idea is straightforward: convert the map to a list of tuples, unzip it, turn the keys into a schema and the values into a single-entry row RDD, then build the DataFrame from the two pieces (the createDataFrame interface is a bit strange there: it accepts java.util.Lists and kitchen sinks, but doesn't accept the usual Scala List for some reason).
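Incidentally, since createDataFrame does accept a java.util.List[Row], you can skip the RDD entirely for a single row. A minimal sketch, assuming the same values and schema as above, with JavaConverters supplying the java.util.List:

import scala.collection.JavaConverters._
import org.apache.spark.sql.Row

// Build the single-row DataFrame from a java.util.List[Row] instead of an RDD;
// asJava converts the Scala Seq into the java.util.List that createDataFrame expects.
val df2 = spark.createDataFrame(Seq(Row(values: _*)).asJava, schema)
df2.show()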


1 Comment

I'm using Scala 2.11 and (I think) as such, the above map.toList.sortBy(_._1).unzip does not compile: toList is not a member of map, ._1 is not a number... any idea how to fix this?
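For what it's worth, map.toList.sortBy(_._1).unzip does compile on Scala 2.11 for a Map[String, Int]; one unverified guess is that the placeholder syntax is what trips things up here, in which case spelling out the tuple pattern may help:

// Unverified guess at the commenter's issue: avoid the _._1 placeholder
// and destructure the (name, value) tuple explicitly.
val (keys, values) = map.toList
  .sortBy { case (name, _) => name }
  .unzip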

Here you go:

val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10) val df = map.tail .foldLeft(Seq(map.head._2).toDF(map.head._1))((acc,curr) => acc.withColumn(curr._1,lit(curr._2))) df.show() +----+----+----+ |col1|col2|col3| +----+----+----+ | 5| 6| 10| +----+----+----+ 



A slight variation on Rapheal's answer. You can create a dummy single-column DataFrame (1x1), add the map elements using foldLeft, and finally drop the dummy column. That way, your foldLeft is straightforward and easy to remember.

val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10) val f = Seq("1").toDF("dummy") map.keys.toList.sorted.foldLeft(f) { (acc,x) => acc.withColumn(x,lit(map(x)) ) }.drop("dummy").show(false) +----+----+----+ |col1|col2|col3| +----+----+----+ |5 |6 |10 | +----+----+----+ 

