I'm learning Scala and am curious how to optimize this code. I have an RDD loaded in Spark from a tab-delimited dataset. I want to combine the first column with the second column, separated by a "-", and append the result as a new column at the end of each row.
For example: column1\tcolumn2\tcolumn3
becomes
column1\tcolumn2\tcolumn3\tcolumn1-column2
val f = sc.textFile("path/to/dataset")
f.map(line =>
    if (line.split("\t").length > 1)
      line.split("\t") :+ line.split("\t")(0) + "-" + line.split("\t")(1)
    else
      Array[String]())
  .map(a => a.mkString("\t"))
  .saveAsTextFile("output/path")
Ideally I'd like to split each line only once.
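One possible way to split only once is to pull the per-line logic into a small function and pattern match on the result of a single split. This is a minimal sketch, not a definitive answer: `appendKey` is a hypothetical name introduced here, and the sketch assumes that lines with fewer than two columns should still become empty lines, as they do in the code above.

```scala
// Hypothetical helper (not from the original post): splits each line once
// and appends "first-second" as a new trailing column.
def appendKey(line: String): String = {
  // A limit of 3 stops splitting after the second tab,
  // since only the first two fields are needed.
  line.split("\t", 3) match {
    case Array(first, second, _*) => s"$line\t$first-$second"
    case _                        => "" // fewer than two columns: empty line, as in the original code
  }
}

// Local check without Spark
val sample = Seq("column1\tcolumn2\tcolumn3", "onlyOneColumn")
sample.map(appendKey).foreach(println)

// With the RDD from the question (assumes `sc` is an existing SparkContext):
// sc.textFile("path/to/dataset").map(appendKey).saveAsTextFile("output/path")
```

Appending to the original line instead of rejoining the split array also avoids the second `map` with `mkString`, and it keeps trailing empty columns that a plain `split("\t")` would drop.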