I have a dataframe with the following schema:

root
 |-- _1: struct (nullable = true)
 |    |-- key: string (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- value: long (nullable = true)

I want to transform the dataframe to the following schema:

root
 |-- _1: struct (nullable = true)
 |    |-- key: string (nullable = true)
 |    |-- value: long (nullable = true)

2 Answers

Use struct:

pyspark.sql.functions.struct(*cols)

Creates a new struct column.

from pyspark.sql.functions import struct, col
from pyspark.sql import Row

df = spark.createDataFrame([Row(_1=Row(key="a"), _2=Row(value=1))])
result = df.select(struct(col("_1.key"), col("_2.value")).alias("_1"))

which gives:

result.printSchema()
# root
#  |-- _1: struct (nullable = false)
#  |    |-- key: string (nullable = true)
#  |    |-- value: long (nullable = true)

and

result.show()
# +-----+
# |   _1|
# +-----+
# |[a,1]|
# +-----+

1 Comment

Aha! This is exactly what I was looking for. Thanks.

If your dataframe has the following schema:

root
 |-- _1: struct (nullable = true)
 |    |-- key: string (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- value: long (nullable = true)

You can then use * to expand all fields of the struct columns into separate columns, and the built-in struct function to combine them back into a single struct field:

from pyspark.sql import functions as F

df.select(F.struct("_1.*", "_2.*").alias("_1"))

You should get your desired output dataframe:

root
 |-- _1: struct (nullable = false)
 |    |-- key: string (nullable = true)
 |    |-- value: long (nullable = true)

Updated

A more general form of the code above, for the case where every column in the original dataframe is a struct:

df.select(F.struct(["{}.*".format(x) for x in df.columns]).alias("_1")) 
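
If some of the columns are not structs, the one-liner above would try to expand them with .* as well, which is probably not what you want. A possible workaround, as a rough sketch only: build the list of "name.*" selectors from df.dtypes and keep only the entries whose type string starts with "struct<". The helper name star_selectors and the extra id column are made up for illustration, and the sketch assumes plain column names without dots or spaces. It is written against a hand-typed stand-in for df.dtypes so it runs without a Spark session.

```python
def star_selectors(dtypes):
    """Return "name.*" for every column whose type string marks it as a struct.

    `dtypes` is a list of (column_name, type_string) pairs, the shape that
    df.dtypes returns, e.g. [("_1", "struct<key:string>")].
    """
    return [
        "{}.*".format(name)
        for name, dtype in dtypes
        if dtype.startswith("struct<")
    ]


# Hand-written stand-in for df.dtypes, so the snippet runs without Spark.
example_dtypes = [
    ("_1", "struct<key:string>"),
    ("_2", "struct<value:bigint>"),
    ("id", "bigint"),  # hypothetical non-struct column, not in the question
]

selectors = star_selectors(example_dtypes)
print(selectors)  # ['_1.*', '_2.*']

# With a real dataframe this would be used roughly as:
#   df.select(F.struct(*selectors).alias("_1"))
```

Non-struct columns are simply skipped here; if you need to keep them, you would have to select them alongside the merged struct.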
