
I have an RDD of the following type:

t: org.apache.spark.rdd.RDD[Iterator[scala.xml.Node]] = MapPartitionsRDD[23] 

When I use map as below to access an individual node, I get an error:

scala> t.map(l => l(0))
<console>:41: error: Iterator[scala.xml.Node] does not take parameters
       t.map(l => l(0))

Is there a way to get individual nodes?

1 Answer


You can't access an Iterator by numeric index. You can use slice with next to access the nth element of an iterator, as in i.slice(n, n+1).next:

val rdd = spark.range(3).rdd.map(_ => Iterator(2,3,4))
// rdd: org.apache.spark.rdd.RDD[Iterator[Int]] = MapPartitionsRDD[19] at map at <console>:23

// to access the first element in each iterator
rdd.map(l => l.slice(0,1).next).collect
// res24: Array[Int] = Array(2, 2, 2)
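One caveat with slice followed by next: if an iterator has fewer than n + 1 elements, next throws a NoSuchElementException. A minimal sketch of a safer variant follows, written in plain Scala so it runs without Spark. The object name NthElement and the helper nth are hypothetical names introduced here for illustration, not part of Spark or the Scala library.

```scala
object NthElement {
  // Hypothetical helper: returns the element at zero-based index n (n >= 0),
  // or None if the iterator has fewer than n + 1 elements.
  // Like the slice/next approach, it advances the iterator, so the
  // original iterator should not be reused afterwards.
  def nth[A](it: Iterator[A], n: Int): Option[A] = {
    val rest = it.slice(n, n + 1)   // iterator over at most one element
    if (rest.hasNext) Some(rest.next()) else None
  }
}
```

With the rdd defined above, this could be applied as rdd.map(l => NthElement.nth(l, 0)), which would give an Option per iterator instead of failing on a short or empty one.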

1 Comment

Or just convert it to a Seq first with l.toSeq, then access it by index. Whether this is worth it depends on how many different indices you will access, and in which order, because slice consumes (part of) the iterator, so you can't get a value the same way again.
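The materialize-first approach from this comment can be sketched as follows, again in plain Scala without Spark. This sketch uses toVector rather than toSeq, an assumption made here so that the result is a strict, indexable collection regardless of Scala version. The names IndexedAccess and pick are hypothetical.

```scala
object IndexedAccess {
  // Hypothetical helper: read the iterator once into a Vector, then look up
  // any number of indices in any order, including repeats.
  def pick[A](it: Iterator[A], indices: Seq[Int]): Seq[Option[A]] = {
    val elems = it.toVector     // reads every element; `it` is exhausted afterwards
    indices.map(elems.lift)     // lift returns None for an out-of-range index
  }
}
```

The trade-off is memory: every element of the iterator is held at once, which is fine for small iterators but may not be for large ones.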
