
In my application I need to create a single-row DataFrame from a Map.

So that a Map like

("col1" -> 5, "col2" -> 10, "col3" -> 6) 

would be transformed into a DataFrame with a single row and the map keys would become names of columns.

col1 | col2 | col3
   5 |   10 |    6

In case you are wondering why I want this: I just need to save a single document with some statistics to MongoDB using the MongoSpark connector, which allows saving DataFrames and RDDs.

4 Comments
  • What happens when you try to parallelize it in Spark? Commented Mar 20, 2018 at 14:03
  • Are the keys ordered, or do you want to sort them alphabetically? Commented Mar 20, 2018 at 14:03
  • @AndreyTyukin No, order doesn't matter Commented Mar 20, 2018 at 14:05
  • @cricket_007, I think parallelize doesn't work for Maps Commented Mar 20, 2018 at 14:10

3 Answers


I thought that sorting the column names doesn't hurt anyway.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val map = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)

// Sort the entries by key, then split into column names and values
val (keys, values) = map.toList.sortBy(_._1).unzip

// One Row holding all the values, wrapped in a single-element RDD
val rows = spark.sparkContext.parallelize(Seq(Row(values: _*)))

// One StructField per key
val schema = StructType(keys.map(k => StructField(k, IntegerType, nullable = false)))

val df = spark.createDataFrame(rows, schema)
df.show()

Gives:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   5|   6|  10|
+----+----+----+

The idea is straightforward: convert the map to a list of tuples, unzip it, turn the keys into a schema and the values into a single-entry row RDD, then build the DataFrame from the two pieces (the createDataFrame interface is a bit strange there: it accepts java.util.Lists and kitchen sinks, but doesn't accept the usual Scala List for some reason).
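Incidentally, since createDataFrame does accept a java.util.List[Row], you can skip the RDD entirely for a single row. A minimal sketch, assuming the same values and schema as above, with JavaConverters supplying the java.util.List:

import scala.collection.JavaConverters._
import org.apache.spark.sql.Row

// Build the single-row DataFrame from a java.util.List[Row] instead of an RDD;
// asJava converts the Scala Seq into the java.util.List that createDataFrame expects.
val df2 = spark.createDataFrame(Seq(Row(values: _*)).asJava, schema)
df2.show()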


1 Comment

I'm using Scala 2.11 and (I think) as such, the above map.toList.sortBy(_._1).unzip does not compile: toList is not a member of map, ._1 is not a number... any idea how to fix this?
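For what it's worth, map.toList.sortBy(_._1).unzip does compile on Scala 2.11 for a Map[String, Int]; one unverified guess is that the placeholder syntax is what trips things up here, in which case spelling out the tuple pattern may help:

// Unverified guess at the commenter's issue: avoid the _._1 placeholder
// and destructure the (name, value) tuple explicitly.
val (keys, values) = map.toList
  .sortBy { case (name, _) => name }
  .unzip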

Here you go:

val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10) val df = map.tail .foldLeft(Seq(map.head._2).toDF(map.head._1))((acc,curr) => acc.withColumn(curr._1,lit(curr._2))) df.show() +----+----+----+ |col1|col2|col3| +----+----+----+ | 5| 6| 10| +----+----+----+ 



A slight variation on Rapheal's answer. You can create a dummy single-column DataFrame (1x1), add the map elements using foldLeft, and finally drop the dummy column. That way, your foldLeft is straightforward and easy to remember.

val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10) val f = Seq("1").toDF("dummy") map.keys.toList.sorted.foldLeft(f) { (acc,x) => acc.withColumn(x,lit(map(x)) ) }.drop("dummy").show(false) +----+----+----+ |col1|col2|col3| +----+----+----+ |5 |6 |10 | +----+----+----+ 

