2

While reading parquet file this is the following file data

|id |name |activegroup| |1 |abc |[{"groupID":"5d","role":"admin","status":"A"},{"groupID":"58","role":"admin","status":"A"}]| 

data types of each field

root

|--id : int |--name : String |--activegroup : String 

activegroup column is string explode function is not working. Following is the required output

|id |name |groupID|role|status| |1 |abc |5d |admin|A | |1 |def |58 |admin|A | 

Do help me with parsing the above in spark scala latest version

1
  • How to solve it for spark 2.3 version? Commented Jul 26, 2021 at 11:03

1 Answer 1

4

First you need to extract the json schema:

 val schema = schema_of_json(lit(df.select($"activeGroup").as[String].first)) 

Once you got it, you can convert your activegroup column, which is a String to json (from_json), and then explode it.

Once the column is a json, you can extract it's values with $"columnName.field"

 val dfresult = df.withColumn("jsonColumn", explode( from_json($"activegroup", schema))) .select($"id", $"name", $"jsonColumn.groupId" as "groupId", $"jsonColumn.role" as "role", $"jsonColumn.status" as "status") 

If you want to extract the whole json and the element names are ok to you you can use the * to do it:

val dfresult = df.withColumn("jsonColumn", explode( from_json($"activegroup", schema))) .select($"id", $"name", $"jsonColumn.*") 

RESULT

+---+----+-------+-----+------+ | id|name|groupId| role|status| +---+----+-------+-----+------+ | 1| abc| 5d|admin| A| | 1| abc| 58|admin| A| +---+----+-------+-----+------+ 
Sign up to request clarification or add additional context in comments.

2 Comments

This is not working with spark.sql 2.3 version. Can someone help me to solve from this version @SCouto
it should work, I answer this with spark 2.4 i think, which is your error?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.