
I have a dataset of 3 items. I call a function on each item using map() but the function is never called.

import org.apache.spark.sql.SparkSession

object MyProgram {

  val events = Seq("A", "B", "C")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("MyApp")
      .config("spark.master", "local")
      .getOrCreate()

    import spark.implicits._

    val eventsDS = events.toDS()

    System.out.println("Before")
    val tempDS = eventsDS.rdd.map(x => doSomething(x))
    System.out.println("After")
  }

  def doSomething(event: String): Unit = {
    System.out.println("Do Something!")
  }
}

Output:

Before

After

  • Your code never calls an action. Try this: eventsDS.rdd.collect.map(x => doSomething(x)) Commented Jul 23, 2019 at 12:20
  • @Yogesh Oh of course, it makes sense! Yes, now it's working. Please write your comment as an answer and I will accept it. Commented Jul 23, 2019 at 12:22

1 Answer

map is lazily evaluated, so nothing runs until you call an action. Use an action such as foreach to trigger the computation:

eventsDS.foreach(doSomething _) 
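
To make the laziness visible, here is a minimal sketch that counts how often the function runs before and after an action. It assumes a local Spark session; the object name LazyMapSketch, the run() helper, and the "calls" accumulator are illustrative and not part of the original question.

```scala
import org.apache.spark.sql.SparkSession

object LazyMapSketch {
  // Stand-in for the question's doSomething.
  def doSomething(event: String): Unit = println(s"Do Something! ($event)")

  // Returns the call counts observed before and after the action.
  def run(): (Long, Long) = {
    val spark = SparkSession.builder
      .appName("LazyMapSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    try {
      val eventsDS = Seq("A", "B", "C").toDS()
      // Hypothetical counter, used only to observe when the function runs.
      val calls = spark.sparkContext.longAccumulator("calls")

      // Transformation only: Spark records the map but runs nothing yet.
      val mapped = eventsDS.rdd.map { x => calls.add(1L); doSomething(x) }
      val before = calls.value.longValue

      // Action: count() forces the map above to execute.
      mapped.count()
      val after = calls.value.longValue

      (before, after)
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = run()
    println(s"calls before action: $before, after action: $after")
  }
}
```

Any action would do in place of count() here; foreach is the natural choice when the function is called only for its side effects.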

2 Comments

Agreed! My point was mainly to highlight that map doesn't get evaluated. I've updated the answer accordingly.
Additionally, since doSomething() will be run on the executors, you shouldn't expect to see output on the driver node.
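
If the results are needed on the driver, one common pattern is to have the function return a value and bring the values back with collect(). The sketch below assumes the result is small enough to fit in driver memory; DriverSideSketch, run() and describe are hypothetical names introduced for illustration. With the "local" master from the question, executors share the driver's JVM, so executor-side prints would still show up in the same console.

```scala
import org.apache.spark.sql.SparkSession

object DriverSideSketch {
  // Hypothetical helper: returns a value instead of printing on the executor.
  def describe(event: String): String = s"Did something with $event"

  def run(): Seq[String] = {
    val spark = SparkSession.builder
      .appName("DriverSideSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    try {
      val eventsDS = Seq("A", "B", "C").toDS()
      // map runs on the executors; collect() is the action that
      // ships the mapped values back to the driver as an Array.
      eventsDS.map(e => describe(e)).collect().toSeq
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit =
    run().foreach(println) // this println runs on the driver
}
```

Note the difference from eventsDS.rdd.collect.map(...): there, collect runs first and the map is an ordinary Scala Array.map executed on the driver, not a Spark transformation.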
