
In the Spark shell, I'm reading an input file and trimming the field values, then saving the final RDD using the saveAsTextFile() method. The field separator in the input file is '|', but in the output file I'm getting ',' as the field separator.

Input format: abc | def | xyz
Default output format: abc,def,xyz

Required output format: abc|def|xyz

Is there any way to change the default output delimiter to '|'? If yes, please suggest how.


1 Answer


For an RDD, you just need to build a pipe-separated string from each tuple's product iterator:

scala> val rdd = sc.parallelize(Seq(("a", 1, 3), ("b", 2, 10)))
rdd: org.apache.spark.rdd.RDD[(String, Int, Int)] = ParallelCollectionRDD[11] at parallelize at <console>:27

scala> rdd.map { x => x.productIterator.toSeq.mkString("|") }.collect
res9: Array[String] = Array(a|1|3, b|2|10)

scala> rdd.map { x => x.productIterator.toSeq.mkString("|") }.saveAsTextFile("test")
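The same trick works for any Product, not just tuples, so records modeled as case classes can be flattened identically. A minimal sketch (the Record class and its fields are illustrative, not from the question):

// Hypothetical case class standing in for the records.
case class Record(a: String, b: Int, c: Int)

val recs = sc.parallelize(Seq(Record("a", 1, 3), Record("b", 2, 10)))

// Case classes implement Product, so productIterator yields their
// fields in declaration order, exactly as it does for tuples.
recs.map(_.productIterator.mkString("|")).saveAsTextFile("recs_out")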

Now let's check the contents of the output files:

$ cat test/part-0000*
a|1|3
b|2|10
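
To tie this back to the original question (read a '|'-delimited file, trim each field, and write it back out with '|'), the same mkString step closes the loop. A minimal sketch, assuming the hypothetical paths input.txt and output and one record per line:

// Read lines like "abc | def | xyz", trim each field,
// and re-join with '|' before saving.
sc.textFile("input.txt")
  .map(_.split('|').map(_.trim).mkString("|"))
  .saveAsTextFile("output")   // part files contain "abc|def|xyz"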