I am working on ETL process in Scala. My raw log files has many columns (around 70). I try to save it to file using Row() objects:
val base_RDD = rawData.map{r => if(r(13) == null || r(13).trim.isEmpty) Row( r(2), r(3), r(4), "", r(6), r(7), r(8), r(9), r(10), r(11), r(12), r(13), r(14), r(15), r(16), r(18), r(21), r(27), r(29), r(30), r(32), r(33), r(34), r(35), r(36), r(37), r(38), r(39), r(40), r(41), r(42), r(43), r(44), r(45), r(46), r(47), r(48), r(49), r(50), r(51), r(52), r(53), r(54), r(55), r(56), r(57), r(58), r(59), r(60), r(61), r(62), r(63), r(64), r(65), r(66), r(67), r(68), r(69), r(70), r(71), r(72), r(73), r(74), r(75), "", "", "", "", "", "", "", r(76), r(77), r(78), r(1)) else Row(r(2), r(3), r(4), "", r(6), r(7), r(8), r(9), r(10), r(11), r(12), r(13), r(14), r(15), r(16), r(18), r(21), r(27), r(29), r(30), r(32), r(33), r(34), r(35), r(36), r(37), r(38), r(39), r(40), r(41), r(42), r(43), r(44), r(45), r(46), r(47), r(48), r(49), r(50), r(51), r(52), r(53), r(54), r(55), r(56), r(57), r(58), r(59), r(60), r(61), r(62), r(63), r(64), r(65), r(66), r(67), r(68), r(69), r(70), r(71), r(72), r(73), r(74), r(75), r(13).split("_")(0), r(13).split("_")(1), r(13).split("_")(2), r(13).split("_")(3), r(5), r(13).split("_")(5), r(13).split("_")(6),r(76), r(77), r(78), r(1))} Now exception is gone. however "[" and "]" are observed after saving data on disk base_RDD.saveAsTextFile("hdfs://nameservice1:8020/tmp/manish/tmpData") Is my approach in correct way? please suggest what goes wrong? If any.
SAMPLE OUTPUT:
[6035233,500212680,50013723,,,ddd.com,,,,,,,1,0,0,0,,0,,,,,,,,,,0,0,0,0,0,0,-1x-1,,,0,0,0,0,0,0,0,0,,0,0,,0,0,0,0,,,,,0,0,,0,0,0,0,0,,,,,,,,,0,0,] [6035233,500212680,50013723,,,d.com,,,,,,,1,0,0,0,,0,,,,,,,,,,0,0,0,0,0,0,-1x-1,,,0,0,0,0,0,0,0,0,,0,0,,0,0,0,0,,,,,0,0,,0,0,0,0,0,,,,,,,,,0,0,] I don't want "[" and "]"