I am trying to create a PySpark DataFrame using the following code:
```python
#!/usr/bin/env python
# coding: utf-8

import pyspark
from pyspark.sql.session import SparkSession
import pyspark.sql.functions as f
from pyspark.sql.functions import coalesce

spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
# spark.sql("use bocconi")
tableName = "dynamic_pricing.final"
inputDF = spark.sql("""SELECT * FROM dynamic_pricing.final WHERE year = '2019' AND mercati_id = '6'""")
```

I get the following error:
```
Py4JJavaError: An error occurred while calling o48.sql.
: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized
results of 9730 tasks (1024.1 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
```

I have already gone through these links: link1 and link2, but the problem is still not resolved. Any ideas about how to solve this? I also tried this:
```python
from pyspark import SparkConf, SparkContext

# Create new config
conf = SparkConf().set("spark.driver.maxResultSize", 0)

# Create new context
sc = SparkContext(conf=conf)
```