I am currently using Apache Spark 2.1.1 to process an XML file into a CSV. My goal is to flatten the XML, but the problem I am facing is unbounded occurrences of elements: Spark automatically infers these unbounded occurrences as arrays. What I want to do now is explode an array column.
Sample schema:

     |-- Instrument_XREF_Identifier: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- @bsid: string (nullable = true)
     |    |    |-- @exch_code: string (nullable = true)
     |    |    |-- @id_bb_sec_num: string (nullable = true)
     |    |    |-- @market_sector: string (nullable = true)

I know I can explode the array like this:

    result = result.withColumn(p.name, explode(col(p.name)))

This produces multiple rows, one per array element, each row carrying the struct. But the output I want is the array exploded into multiple columns instead of rows.
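For context, here is a runnable sketch of that row-wise explode on a toy DataFrame mimicking the schema above (the SparkSession setup and the sample values are just for illustration, and I drop the leading "@" from the field names because Python keyword arguments cannot start with it):

    from pyspark.sql import Row, SparkSession
    from pyspark.sql.functions import col, explode

    spark = SparkSession.builder.getOrCreate()

    # One row whose array holds two structs, like my real data
    df = spark.createDataFrame([
        Row(Instrument_XREF_Identifier=[
            Row(bsid="123", exch_code="3", id_bb_sec_num="1", market_sector="13"),
            Row(bsid="234", exch_code="12", id_bb_sec_num="212", market_sector="221"),
        ])
    ])

    # explode() yields one row per array element, struct kept intact
    df.withColumn(
        "Instrument_XREF_Identifier",
        explode(col("Instrument_XREF_Identifier")),
    ).show(truncate=False)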
Here is my expected output based on the schema above. Let's say there are two struct values in the array:
    bsid1  exch_code1  id_bb_sec_num1  market_sector1  bsid2  exch_code2  id_bb_sec_num2  market_sector2
    123    3           1               13              234    12          212             221
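For reference, here is a minimal sketch of one way I imagine getting there, indexing into the array by position, assuming the maximum array length (2 here) is known up front (`df` and the field names are carried over from the toy example above):

    from pyspark.sql.functions import col

    n = 2  # assumed upper bound on the array length
    fields = ["bsid", "exch_code", "id_bb_sec_num", "market_sector"]

    # Index into the array (missing elements come back as null) and
    # flatten each struct field into its own suffixed column
    flat = df.select([
        col("Instrument_XREF_Identifier")[i][f].alias("{}{}".format(f, i + 1))
        for i in range(n)
        for f in fields
    ])
    flat.show(truncate=False)

This matches the layout above for a fixed n, but since the occurrences are unbounded I do not know the array length ahead of time, which is the part I am trying to solve.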