I am working on an ETL process in Scala. My raw log files have many columns (around 70). I build Row() objects from the columns I need and try to save them to a file:

val base_RDD = rawData.map{r => if(r(13) == null || r(13).trim.isEmpty) Row( r(2), r(3), r(4), "", r(6), r(7), r(8), r(9), r(10), r(11), r(12), r(13), r(14), r(15), r(16), r(18), r(21), r(27), r(29), r(30), r(32), r(33), r(34), r(35), r(36), r(37), r(38), r(39), r(40), r(41), r(42), r(43), r(44), r(45), r(46), r(47), r(48), r(49), r(50), r(51), r(52), r(53), r(54), r(55), r(56), r(57), r(58), r(59), r(60), r(61), r(62), r(63), r(64), r(65), r(66), r(67), r(68), r(69), r(70), r(71), r(72), r(73), r(74), r(75), "", "", "", "", "", "", "", r(76), r(77), r(78), r(1)) else Row(r(2), r(3), r(4), "", r(6), r(7), r(8), r(9), r(10), r(11), r(12), r(13), r(14), r(15), r(16), r(18), r(21), r(27), r(29), r(30), r(32), r(33), r(34), r(35), r(36), r(37), r(38), r(39), r(40), r(41), r(42), r(43), r(44), r(45), r(46), r(47), r(48), r(49), r(50), r(51), r(52), r(53), r(54), r(55), r(56), r(57), r(58), r(59), r(60), r(61), r(62), r(63), r(64), r(65), r(66), r(67), r(68), r(69), r(70), r(71), r(72), r(73), r(74), r(75), r(13).split("_")(0), r(13).split("_")(1), r(13).split("_")(2), r(13).split("_")(3), r(5), r(13).split("_")(5), r(13).split("_")(6),r(76), r(77), r(78), r(1))} 

Now the exception is gone. However, "[" and "]" appear around every record after saving the data to disk with base_RDD.saveAsTextFile("hdfs://nameservice1:8020/tmp/manish/tmpData"). Is my approach correct? If not, please suggest what is going wrong.

SAMPLE OUTPUT:

[6035233,500212680,50013723,,,ddd.com,,,,,,,1,0,0,0,,0,,,,,,,,,,0,0,0,0,0,0,-1x-1,,,0,0,0,0,0,0,0,0,,0,0,,0,0,0,0,,,,,0,0,,0,0,0,0,0,,,,,,,,,0,0,]
[6035233,500212680,50013723,,,d.com,,,,,,,1,0,0,0,,0,,,,,,,,,,0,0,0,0,0,0,-1x-1,,,0,0,0,0,0,0,0,0,,0,0,,0,0,0,0,,,,,0,0,,0,0,0,0,0,,,,,,,,,0,0,]

I don't want the "[" and "]" in the output.

1 Answer

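The brackets come from Row's toString, which saveAsTextFile calls on every element. Just use a plain Seq and build the string yourself with mkString before you call saveAsTextFile: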

rawData.map{r => if(r(13) == null || r(13).trim.isEmpty) Seq(r(2), r(3), ...).mkString(",") else Seq(r(2), r(3), ...).mkString(",") } 
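
To show what mkString produces without needing a cluster, here is a minimal plain-Scala sketch. The helper name toCsvLine and the object CsvLineSketch are hypothetical, and replacing null with an empty string is an assumption about the desired output, not something stated in the question (a bare mkString would print "null" for such fields).

```scala
object CsvLineSketch {
  // Hypothetical helper: joins the fields with commas, no surrounding brackets.
  // Assumption: a null field should be written as an empty string.
  def toCsvLine(fields: Seq[Any]): String =
    fields.map(f => if (f == null) "" else f.toString).mkString(",")

  def main(args: Array[String]): Unit = {
    val fields = Seq("6035233", "500212680", null, "ddd.com", 1)
    println(toCsvLine(fields)) // prints 6035233,500212680,,ddd.com,1
  }
}
```

A helper like this could be applied inside the map, so that the RDD already holds plain strings when saveAsTextFile writes them.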