I have an RDD that contains several data structures, one of which is a Map[String, Int].

To make this concrete, after a map transformation I have the following:

    val data = ... // This is an RDD[Map[String, Int]]

In one of the elements of this RDD, the Map contains the following:

    key       value
    map_id -> 7753
    Oscar  -> 39
    Jaden  -> 13
    Thomas -> 1
    Chris  -> 52

The other elements of the RDD hold other names and numbers, and each map has its own map_id. If I simply call data.saveAsTextFile(path), I get the following output in my file:

    Map(map_id -> 7753, Oscar -> 39, Jaden -> 13, Thomas -> 1, Chris -> 52)
    Map(...)
    Map(...)

However, I would like to format it as follows:

    ---------------------------
    map_id: 7753
    ---------------------------
    Oscar: 39
    Jaden: 13
    Thomas: 1
    Chris: 52

    ---------------------------
    map_id: <some other id>
    ---------------------------
    Name: nbr
    Name2: nbr2

Basically, the map_id acts as a header, followed by the contents, then one blank line, then the next element.

My question: as far as I can see, the RDD only offers two ways to save, as a text file or as an object file, and neither lets me customize the formatting. How can I do this?

1 Answer

You can just map to String and write the result. For example:

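    def format(map: Map[String, Int]): String = {
      // Header value: the map_id if present, otherwise a placeholder
      val id = map.get("map_id").map(_.toString).getOrElse("unknown")
      // One "name: value" line for every entry except map_id
      val content = map.collect {
        case (k, v) if k != "map_id" => s"$k: $v"
      }.mkString("\n")

      // The final empty margin line leaves a blank line after each record
      s"""|---------------------------
          |map_id: $id
          |---------------------------
          |$content
          |""".stripMargin
    }

    data.map(format(_)).saveAsTextFile(path)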
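If you want to check the formatting locally before running it on the RDD, here is a minimal plain-Scala sketch that needs no Spark. The names `FormatCheck`, `separator`, and `formatRecord` are hypothetical and only for illustration. It also sorts the entries by name, which is an assumption on my part and not something the question asks for: a Scala `Map` with more than four entries does not guarantee iteration order, so the lines could otherwise come out in a different order than they were inserted.

```scala
// Hypothetical helper for illustration; not part of the original answer.
object FormatCheck {
  // Separator line copied from the desired output in the question
  val separator: String = "---------------------------"

  // Same idea as format above, but entries are sorted by name (an assumption)
  // so that the output is deterministic.
  def formatRecord(record: Map[String, Int]): String = {
    val id = record.get("map_id").map(_.toString).getOrElse("unknown")
    val body = (record - "map_id").toSeq
      .sortBy(_._1)
      .map { case (k, v) => s"$k: $v" }
    // Trailing newline: saveAsTextFile adds one more, giving a blank line between records
    (Seq(separator, s"map_id: $id", separator) ++ body).mkString("\n") + "\n"
  }

  def main(args: Array[String]): Unit = {
    val sample = Map("map_id" -> 7753, "Oscar" -> 39, "Jaden" -> 13, "Thomas" -> 1, "Chris" -> 52)
    print(formatRecord(sample))
  }
}
```

With the RDD from the question, this would be used the same way: `data.map(FormatCheck.formatRecord).saveAsTextFile(path)`.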
1 Comment

Woah, this is really smart, that makes it so simple and easy! Thanks a lot :)
