I am new to Apache Spark, so forgive me if this is a noob question. I am trying to define a particular schema before reading in the dataset in order to speed up processing. There are a few data types that I am not sure how to define (ArrayType and StructType).
Here is a screenshot of the schema I am working with:

Here is what I have so far:
Here is what I have so far:

    from pyspark.sql.types import *

    jsonSchema = StructType([
        StructField("attribution", ArrayType(), True),
        StructField("averagingPeriod", StructType(), True),
        StructField("city", StringType(), True),
        StructField("coordinates", StructType(), True),
        StructField("country", StringType(), True),
        StructField("date", StructType(), True),
        StructField("location", StringType(), True),
        StructField("mobile", BooleanType(), True),
        StructField("parameter", StringType(), True),
        StructField("sourceName", StringType(), True),
        StructField("sourceType", StringType(), True),
        StructField("unit", StringType(), True),
        StructField("value", DoubleType(), True),
    ])

My question is: how do I account for the name and url fields nested under the attribution column, the unit and value fields nested under the averagingPeriod column, and so on?
For reference, here is the dataset I am using: https://registry.opendata.aws/openaq/.