
Converting StructType to MapType in Spark

Schema:

    event: struct (nullable = true)
    | | event_category: string (nullable = true)
    | | event_name: string (nullable = true)
    | | properties: struct (nullable = true)
    | | | prop1: string (nullable = true)
    | | | prop2: string (nullable = true)

Sample data:

    {
      "event": {
        "event_category": "abc",
        "event_name": "click",
        "properties": {
          "prop1": "prop1Value",
          "prop2": "prop2Value",
          ....
        }
      }
    }

I need the output as:

    event_category | event_name | properties_key | properties_value
    abc            | click      | prop1          | prop1Value
    abc            | click      | prop2          | prop2Value

1 Answer


You need some way to turn the properties struct into key-value pairs. I used a udf that zips the property names with their values and returns an array of (key, value) pairs.

    import org.apache.spark.sql.functions._

    // Pairs each property name with the value at the same position
    def collectUdf = udf((cols: collection.mutable.WrappedArray[String], values: collection.mutable.WrappedArray[String]) =>
      cols.zip(values))
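To make the zip step concrete, here is a minimal sketch in plain Scala, with no Spark involved, of what the udf body does for a single row. The names `ZipSketch` and `zipProperties` are hypothetical and exist only for illustration; the sketch assumes the names and values arrive in the same order, which is what selecting `event.properties.*` for both is meant to guarantee.

```scala
object ZipSketch {
  // Pairs the i-th property name with the i-th property value.
  // If the two sequences differ in length, zip stops at the shorter one.
  def zipProperties(names: Seq[String], values: Seq[String]): Seq[(String, String)] =
    names.zip(values)

  def main(args: Array[String]): Unit = {
    // Names as they would come from df.select($"event.properties.*").columns
    val names = Seq("prop1", "prop2")
    // Values of the properties struct for one row, in the same order
    val values = Seq("prop1Value", "prop2Value")

    zipProperties(names, values).foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

Each resulting tuple becomes one element of the array column, which `explode` later turns into one row per property.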

Spark does not support multiple generators in one select clause, so save the exploded result to a temporary DataFrame first.

    val columnsMap = df_json.select($"event.properties.*").columns

    val temp = df_json.withColumn("event_properties",
      explode(collectUdf(lit(columnsMap), array($"event.properties.*"))))

The last step is to split the event_properties column into its key and value:

    temp.select(
      $"event.event_category",
      $"event.event_name",
      $"event_properties._1".as("properties_key"),
      $"event_properties._2".as("properties_value")
    ).show(false)

This should give the desired output:

    +--------------+----------+--------------+----------------+
    |event_category|event_name|properties_key|properties_value|
    +--------------+----------+--------------+----------------+
    |abc           |click     |prop1         |prop1Value      |
    |abc           |click     |prop2         |prop2Value      |
    +--------------+----------+--------------+----------------+
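As an alternative to the udf, the struct can be turned into an actual MapType column with Spark's built-in `map` function and then exploded. This is a sketch under assumptions rather than the approach used above: it assumes Spark 2.2 or later (for `spark.read.json` on a `Dataset[String]`), that every property is a string, and that property names contain no dots. The names `StructToMapSketch` and `explodeProperties` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, lit, map}

object StructToMapSketch {
  // Builds a MapType column from the fields of event.properties,
  // then explodes it into one (key, value) row per property.
  def explodeProperties(df: DataFrame): DataFrame = {
    val propNames = df.select(col("event.properties.*")).columns
    // map(k1, v1, k2, v2, ...) expects alternating key and value columns
    val keyValueCols = propNames.flatMap(n => Seq(lit(n), col(s"event.properties.$n")))
    df.select(
      col("event.event_category"),
      col("event.event_name"),
      // explode on a map yields two columns, so two aliases are given
      explode(map(keyValueCols: _*)).as(Seq("properties_key", "properties_value"))
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("struct-to-map").getOrCreate()
    import spark.implicits._

    // One record shaped like the sample data in the question
    val json = """{"event":{"event_category":"abc","event_name":"click","properties":{"prop1":"prop1Value","prop2":"prop2Value"}}}"""
    val df = spark.read.json(Seq(json).toDS())

    explodeProperties(df).show(false)
    spark.stop()
  }
}
```

Because only one `explode` appears in the select, the one-generator-per-select restriction is not hit and no temporary DataFrame is needed. The trade-off is that `map` requires all values to share one type, so mixed-type properties would need a cast to string first.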