I have JSON containing an array, in the format below:

{ "marks": [ { "subject": "Maths", "mark": "80" }, { "subject": "Physics", "mark": "70" }, { "subject": "Chemistry", "mark": "60" } ] } 

I need to write each object in the array to a separate JSON file. Is there a way to do this in the Spark shell?

1 Answer

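You can explode the marks array of structs into one row per element, add a unique ID column, and write JSON files partitioned by that column. Each element then lands in its own id=<value> subdirectory under the output path.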

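import org.apache.spark.sql.functions.{col, monotonically_increasing_id}

// inline(marks) yields one row per array element
df.selectExpr("inline(marks)")
  .withColumn("id", monotonically_increasing_id())
  .repartition(col("id"))
  .write
  .partitionBy("id")
  .json("output")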
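
The answer does not say where df comes from. A minimal sketch of one way to build it, assuming the JSON from the question is available as a string (the app name and the marks.json path in the comment are hypothetical, not from the original):

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session named `spark` already exists; getOrCreate() reuses it.
val spark = SparkSession.builder()
  .appName("split-marks") // hypothetical name, for illustration only
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// The JSON from the question, held as a single string.
val json = """{ "marks": [ { "subject": "Maths", "mark": "80" }, { "subject": "Physics", "mark": "70" }, { "subject": "Chemistry", "mark": "60" } ] }"""

// Parse the string into a DataFrame with a single column, marks (array of structs).
val df = spark.read.json(Seq(json).toDS())

// To read from a file instead (hypothetical path); multiLine also handles
// JSON that is pretty-printed across several lines:
// val df = spark.read.option("multiLine", true).json("marks.json")

// One row per array element, with columns mark and subject.
val marks = df.selectExpr("inline(marks)")
marks.show()
```

Reading from a Dataset of strings keeps the sketch independent of any file layout. For a real file, the commented-out reader is probably the more natural choice.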

1 Comment

Where is the df coming from? What if I just have the JSON?
