I am new to Spark and Scala. I want to read a directory containing JSON files. Each file has an attribute called "EVENT_NAME", which can take 20 different values. I need to separate the events by that attribute's value, i.e. group all EVENT_NAME=event_A events together, and write them out in a Hive external table structure like: /apps/hive/warehouse/db/event_A/dt=date/hour=hr
Here I have 20 different tables, one for each event type, and the data for each event should go to its respective table. I have managed to write some code, but I need help writing my data out correctly.
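To make it concrete, this is roughly the per-event split I have in mind (just a sketch, assuming the spark-shell `sqlContext`; the event names are placeholders, and the dt/hour values would really have to come from the data):

```scala
// Sketch only: event names, dt, and hour below are made-up placeholders.
val df = sqlContext.read.json("/source/data/path")

val eventNames = Seq("event_A", "event_B") // stand-in for the 20 event types

eventNames.foreach { event =>
  // Keep only the rows for this event and write them under the event's
  // own directory. The dt/hour values here are hard-coded placeholders.
  df.filter(df("EVENT_NAME") === event)
    .write
    .save(s"/apps/hive/warehouse/db/$event/dt=2017-01-01/hour=00")
}
```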
```scala
import org.apache.spark.sql._
import sqlContext.implicits._

// Read the JSON files from the source directory into a DataFrame.
val path = "/source/data/path"
val trafficRepDf = sqlContext.read.json(path)
trafficRepDf.registerTempTable("trafficRepDf")

// Write the DataFrame out partitioned by the event name.
trafficRepDf.write.partitionBy("EVENT_NAME").save("/apps/hive/warehouse/db/sample")
```

The last line creates a partitioned output, but it is not exactly the layout I need. Please suggest how I can get it right, or any other piece of code that would do it.
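For comparison, the closest I have gotten to the layout I want looks like this, assuming there is a timestamp field in the JSON to derive dt and hour from (I've called it `event_ts` here, which is made up):

```scala
import org.apache.spark.sql.functions.{col, hour, to_date}

// Derive the dt and hour partition columns from a hypothetical event_ts field.
val withParts = trafficRepDf
  .withColumn("dt", to_date(col("event_ts")))
  .withColumn("hour", hour(col("event_ts")))

withParts.write
  .partitionBy("EVENT_NAME", "dt", "hour")
  .save("/apps/hive/warehouse/db/sample")
```

But this gives me directories like .../EVENT_NAME=event_A/dt=.../hour=..., whereas I need the event name as a plain directory, .../event_A/dt=.../hour=...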