
How to split a JSON array into multiple JSON files using Scala Spark

Asked 4 years, 5 months ago

Modified 4 years, 5 months ago

Viewed 311 times

0

I have a JSON document containing an array, in the format below:

{
  "marks": [
    { "subject": "Maths", "mark": "80" },
    { "subject": "Physics", "mark": "70" },
    { "subject": "Chemistry", "mark": "60" }
  ]
}

I need to write each object in the array to a separate JSON file. Is there a way to do this in the Spark shell?

edited Jun 18, 2021 at 11:52


asked Jun 18, 2021 at 11:45

Aldrin Rodrigues


1 Answer

1

You can explode the marks array of structs, add an ID column, and write JSON files partitioned by the unique ID column.

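import org.apache.spark.sql.functions.{col, monotonically_increasing_id}

df.selectExpr("inline(marks)")  // one row per array element
  .withColumn("id", monotonically_increasing_id())
  .repartition(col("id"))
  .write
  .partitionBy("id")
  .json("output")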
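For completeness, here is a minimal end-to-end sketch of the same approach, including one way to build df when you only have the JSON text. The variable names (json, marks), the app name, and the "output" directory are illustrative assumptions, not part of the question. In spark-shell, getOrCreate() should simply return the session the shell already provides.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, monotonically_increasing_id}

// In spark-shell this reuses the existing session; standalone it starts a local one.
val spark = SparkSession.builder()
  .appName("split-marks") // hypothetical app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// The sample document from the question, kept on one line as an in-memory string.
val json =
  """{ "marks": [ { "subject": "Maths", "mark": "80" }, { "subject": "Physics", "mark": "70" }, { "subject": "Chemistry", "mark": "60" } ] }"""

// Parse the string into a DataFrame. For a pretty-printed file on disk,
// spark.read.option("multiLine", true).json("<your path>") is an alternative.
val df = spark.read.json(Seq(json).toDS())

// inline() turns the array of structs into one row per struct,
// with one column per struct field (here: mark and subject).
val marks = df
  .selectExpr("inline(marks)")
  .withColumn("id", monotonically_increasing_id())

// Partitioning the write by the unique id puts each row in its own
// id=<value> subdirectory under "output".
marks
  .repartition(col("id"))
  .write
  .mode("overwrite")
  .partitionBy("id")
  .json("output")
```

Note that the id value ends up in the directory name (id=...), not inside the JSON record itself, and the generated ids are unique but not guaranteed to be consecutive.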

answered Jun 18, 2021 at 11:52


1 Comment

ss301 Over a year ago

Where is the df coming from? What if I just have the JSON?
