I'm trying to create a schema for my new DataFrame and have tried various combinations of brackets and keywords but have been unable to figure out how to make this work. My current attempt:
```python
from pyspark.sql.types import *

schema = StructType([
    StructField("User", IntegerType()),
    ArrayType(StructType([
        StructField("user", StringType()),
        StructField("product", StringType()),
        StructField("rating", DoubleType())]))
])
```

Comes back with the error:
```
elementType should be DataType
Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/types.py", line 290, in __init__
    assert isinstance(elementType, DataType), "elementType should be DataType"
AssertionError: elementType should be DataType
```

I have googled, but so far haven't found any good examples of an array of objects.