
I am writing data from Azure Databricks to Azure SQL using PySpark. The code works fine when the dataframe contains no nulls, but when it does contain nulls I get the following error:

databricks/spark/python/pyspark/sql/pandas/conversion.py:300: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below:
  Unable to convert the field Product. If this column is not necessary, you may consider dropping it or converting to primitive type before the conversion.
Context: Unsupported type in conversion from Arrow: null
Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
  warnings.warn(msg)
ValueError: Some of types cannot be determined after inferring

The dataframe must be written to sql, including the nulls. How do I solve this?
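(For context: when every value in a pandas column is null, pandas stores it with the generic object dtype, which gives Spark's schema inference nothing concrete to work with. A minimal pandas-only illustration, using made-up column names:)

```python
import pandas as pd

# A column containing only nulls gets the generic "object" dtype,
# so createDataFrame has no concrete type to infer for it.
df = pd.DataFrame({"Product": [None, None], "Qty": [1, 2]})
print(df.dtypes["Product"])  # object
print(df.dtypes["Qty"])      # int64
```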

sqlContext = SQLContext(sc)

def to_sql(df, table):
    finaldf = sqlContext.createDataFrame(df)
    finaldf.write.jdbc(url=url, table=table, mode="overwrite", properties=properties)

to_sql(data, f"TF_{table.upper()}")

EDIT:

Solved it by creating a function that maps pandas dtypes to SQL dtypes and outputs the columns and their dtypes as a single string.

def convert_dtype(df):
    df_mssql = {'int64': 'bigint', 'object': 'varchar(200)', 'float64': 'float'}
    mydict = {}
    for col in df.columns:
        if str(df.dtypes[col]) in df_mssql:
            mydict[col] = df_mssql.get(str(df.dtypes[col]))
    l = " ".join([str(k[0] + " " + k[1] + ",") for k in list(mydict.items())])
    return l[:-1]
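For illustration, here is what that function produces for a small pandas frame (the frame and its column names are made up, and the dtypes are pinned explicitly so the mapping is deterministic; convert_dtype is reproduced from above):

```python
import pandas as pd

def convert_dtype(df):
    df_mssql = {'int64': 'bigint', 'object': 'varchar(200)', 'float64': 'float'}
    mydict = {}
    for col in df.columns:
        if str(df.dtypes[col]) in df_mssql:
            mydict[col] = df_mssql.get(str(df.dtypes[col]))
    l = " ".join([str(k[0] + " " + k[1] + ",") for k in list(mydict.items())])
    return l[:-1]

# Hypothetical example frame with one column per mapped dtype
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "price": [1.5, 2.5]}) \
       .astype({"id": "int64", "name": "object", "price": "float64"})
print(convert_dtype(df))  # id bigint, name varchar(200), price float
```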

Passing this string to the createTableColumnTypes option solved this scenario:

jdbcDF.write \
    .option("createTableColumnTypes", convert_dtype(df)) \
    .jdbc("jdbc:postgresql:dbserver", "schema.tablename",
          properties={"user": "username", "password": "password"})

1 Answer

For this you'll need to specify the target column types in your write statement. Here's an example from the documentation, linked below:

jdbcDF.write \
    .option("createTableColumnTypes", "name CHAR(64), comments VARCHAR(1024)") \
    .jdbc("jdbc:postgresql:dbserver", "schema.tablename",
          properties={"user": "username", "password": "password"})

https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html


1 Comment

Hi, thanks for answering. I wrote a small function to map the pandas dtypes to a single string of columns and SQL dtypes. Will edit this into my post.
