
I have a dataframe with the following schema:

[Schema of the dataframe: image in the original post]

I am trying to fetch all the data from this dataframe. I use the df.collect() method to iterate through the entire dataframe and then pull the values out of the columns one by one. But it seems it is not iterating through the entire tree; it only pulls the initial parent rows.

    def parseCol(landing_df, data):
        for i in landing_df.collect():
            parent_id = i["parent_id"]
            shared = "null"
            if len(i["children"]) > 1:
                data.append([i["project_id"], i["id"], i["name"], i["order"], i["pid"],
                             i["created_date"], i["last_modified_date"], str(parent_id),
                             i["description"], i["recursive"], i["links"][0][0], str(shared)])
                for j in i["children"]:
                    shared = i["shared"] if "shared" in i else "null"
                    project_id = j["project_id"] if "project_id" in j else "null"
                    data.append([project_id, j["id"], j["name"], j["order"], j["pid"],
                                 j["created_date"], j["last_modified_date"], str(j["parent_id"]),
                                 j["description"], j["recursive"], j["links"][0][0], str(shared)])
            elif len(i["children"]) == 0:
                data.append([i["project_id"], i["id"], i["name"], i["order"], i["pid"],
                             i["created_date"], i["last_modified_date"], "null",
                             i["description"], i["recursive"], i["links"][0][0], str(shared)])
        return data
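Two things stand out in this loop: the len(...) > 1 and len(...) == 0 branches never handle a row with exactly one child, and the inner loop only descends one level, so children of children are never visited, which would explain why only the top of the tree comes through. Below is a minimal sketch of a recursive walk over the same fields; in PySpark, nested structs in a collected Row come back as Rows themselves, so the same field access works at every depth. The names row_to_record and walk are hypothetical helpers, not part of the original code:

    def row_to_record(node, parent_id, shared):
        # Hypothetical helper: pulls the same twelve fields the loop above collects.
        project_id = node["project_id"] if "project_id" in node else "null"
        return [project_id, node["id"], node["name"], node["order"], node["pid"],
                node["created_date"], node["last_modified_date"], str(parent_id),
                node["description"], node["recursive"], node["links"][0][0], str(shared)]

    def walk(node, data, shared="null"):
        if "shared" in node:
            shared = node["shared"]
        data.append(row_to_record(node, node["parent_id"], shared))
        # Descend into every level of the tree, not just the first one.
        children = node["children"] if "children" in node else None
        for child in children or []:
            walk(child, data, shared)

    def parseColRecursive(landing_df, data):
        for row in landing_df.collect():
            walk(row, data)
        return data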

Can someone suggest a better way to do this?
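One Spark-native alternative, if children is an array of structs as the loop above suggests, is to explode the array and union the flattened children with the parents, instead of collecting everything to the driver. The following is a minimal sketch only, assuming the column names used in the question's code and that shared exists as a top-level column (the loop falls back to 'null' when it is absent):

    from pyspark.sql import functions as F

    parent_cols = ["project_id", "id", "name", "order", "pid", "created_date",
                   "last_modified_date", "parent_id", "description", "recursive"]

    # Every top-level row becomes one output row, mirroring the parent append above.
    parents = landing_df.select(
        *parent_cols,
        F.col("links")[0][0].alias("link"),          # mirrors i["links"][0][0]
        F.lit(None).cast("string").alias("shared"),  # parents were emitted with shared = "null"
    )

    # One output row per element of the children array; empty arrays simply yield nothing.
    children = (
        landing_df
        .select(F.col("shared"), F.explode("children").alias("child"))
        .select(
            *[F.col("child")[c].alias(c) for c in parent_cols],
            F.col("child")["links"][0][0].alias("link"),
            F.col("shared"),
        )
    )

    flat_df = parents.unionByName(children)

Since every parent is already emitted by the first select, the len(children) == 0 case needs no special handling, and a parent with exactly one child is no longer skipped. If the tree nests deeper than one level, the explode step would have to be repeated per level, because a Spark schema has a fixed depth.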

  • Does this answer your question? reading a nested JSON file in pyspark Commented May 23, 2023 at 5:56
  • @PravashPanigrahi I believe not, as that one is basically a struct datatype while this one is nested array-type data. Commented May 23, 2023 at 8:33
  • What is your goal here? Commented May 23, 2023 at 8:46
  • If possible, give a sample input and output. Commented May 23, 2023 at 8:46
