I am trying to store stream data in HDFS using Spark Streaming, but it keeps creating new files instead of appending to a single file or a small number of files.
If it keeps creating an unbounded number of files, I feel it won't be very efficient.
Code:

    lines.foreachRDD(f => {
      if (!f.isEmpty()) {
        val df = f.toDF().coalesce(1)
        df.write.mode(SaveMode.Append).json("hdfs://localhost:9000/MT9")
      }
    })

In my pom I am using the following dependencies:
- spark-core_2.11
- spark-sql_2.11
- spark-streaming_2.11
- spark-streaming-kafka-0-10_2.11
