
I am new to Scala. I have a DataFrame with the following fields:

ID:string, Time:timestamp, Items:array(struct(name:string,ranking:long)) 

I want to convert each row's Items field to a map, with name as the key. I am not sure how to do this.
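For reference, a DataFrame with this exact schema can be constructed using a case class for the struct elements. A minimal sketch; the IDs, timestamp, and item values below are invented for illustration:

import spark.implicits._
import java.sql.Timestamp

// Struct element matching the question's array(struct(name:string, ranking:long)):
case class Item(name: String, ranking: Long)

// Sample row; all values are made up for illustration:
val df = Seq(
  ("id1", Timestamp.valueOf("2024-01-01 00:00:00"), Seq(Item("n1", 1L), Item("n2", 2L)))
).toDF("ID", "Time", "Items")

df.printSchema()
// root
//  |-- ID: string (nullable = true)
//  |-- Time: timestamp (nullable = true)
//  |-- Items: array (nullable = true)
//  |    |-- element: struct (containsNull = true)
//  |    |    |-- name: string (nullable = true)
//  |    |    |-- ranking: long (nullable = false)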

2 Answers


This can be done using a UDF:

import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row

// Sample data:
val df = Seq(
  ("id1", "t1", Array(("n1", 4L), ("n2", 5L))),
  ("id2", "t2", Array(("n3", 6L), ("n4", 7L)))
).toDF("ID", "Time", "Items")

// Create a UDF converting an array of (String, Long) structs to Map[String, Long]:
val arrayToMap = udf[Map[String, Long], Seq[Row]] { array =>
  array.map { case Row(key: String, value: Long) => (key, value) }.toMap
}

// Apply the UDF:
val result = df.withColumn("Items", arrayToMap($"Items"))

result.show(false)
// +---+----+---------------------+
// |ID |Time|Items                |
// +---+----+---------------------+
// |id1|t1  |Map(n1 -> 4, n2 -> 5)|
// |id2|t2  |Map(n3 -> 6, n4 -> 7)|
// +---+----+---------------------+

I can't see a way to do this using only Spark's built-in functions, without a UDF.
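A variant of the same UDF (not in the original answer) reads the struct fields by name via Row.getAs rather than relying on positional pattern matching. A minimal sketch, assuming the Items elements actually carry the question's name/ranking field names; note the toy data above uses tuple field names _1/_2, so it would need renamed fields:

// Variant UDF: extract struct fields by name instead of by position.
// Assumes the struct fields are literally named "name" and "ranking",
// as in the question's schema.
val arrayToMapByName = udf[Map[String, Long], Seq[Row]] { array =>
  array.map(row => row.getAs[String]("name") -> row.getAs[Long]("ranking")).toMap
}

// Hypothetical usage on a DataFrame whose Items structs have name/ranking fields:
// val result = df.withColumn("Items", arrayToMapByName($"Items"))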


Since Spark 2.4.0, one can use map_from_entries:

import spark.implicits._
import org.apache.spark.sql.functions._

val df = Seq(
  Array(("n1", 4L), ("n2", 5L)),
  Array(("n3", 6L), ("n4", 7L))
).toDF("Items")

df.select(map_from_entries($"Items")).show
/*
+-----------------------+
|map_from_entries(Items)|
+-----------------------+
|     [n1 -> 4, n2 -> 5]|
|     [n3 -> 6, n4 -> 7]|
+-----------------------+
*/
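Applied to the question's full (ID, Time, Items) layout, this would presumably look like the following sketch (column names taken from the question). map_from_entries uses the first struct field as the map key and the second as the value, so with the question's schema the map is keyed by name:

// Replace the struct array with a map column, keeping ID and Time intact:
val result = df.withColumn("Items", map_from_entries($"Items"))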
