
I'm trying to create a schema for my new DataFrame and have tried various combinations of brackets and keywords but have been unable to figure out how to make this work. My current attempt:

from pyspark.sql.types import *

schema = StructType([
    StructField("User", IntegerType()),
    ArrayType(StructType([
        StructField("user", StringType()),
        StructField("product", StringType()),
        StructField("rating", DoubleType())
    ]))
])

Comes back with the error:

elementType should be DataType
Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/types.py", line 290, in __init__
    assert isinstance(elementType, DataType), "elementType should be DataType"
AssertionError: elementType should be DataType

I have googled, but so far I have found no good examples of an array of objects.

1 Answer


You need to wrap the ArrayType in an additional StructField. This should work:

from pyspark.sql.types import *

schema = StructType([
    StructField("User", IntegerType()),
    StructField("My_array", ArrayType(
        StructType([
            StructField("user", StringType()),
            StructField("product", StringType()),
            StructField("rating", DoubleType())
        ])
    ))
])

For more information check this link: http://nadbordrozd.github.io/blog/2016/05/22/one-weird-trick-that-will-fix-your-pyspark-schemas/
