
I have a small log DataFrame holding metadata about the ETL performed within a given notebook; the notebook is part of a larger ETL pipeline managed in Azure Data Factory.

Unfortunately, it seems that Databricks cannot invoke stored procedures, so I'm manually appending a row with the correct data to my log table.

However, I cannot figure out the correct syntax to update a table given a set of conditions.

The statement I use to append a single row is as follows:

spark_log.write.jdbc(sql_url, 'internal.Job', mode='append')

This works swimmingly; however, as my Data Factory is invoking a stored procedure, I need to work in a query like:

query = f"""UPDATE [internal].[Job] SET [MaxIngestionDate] = '{date}', [DataLakeMetadataRaw] = NULL, [DataLakeMetadataCurated] = NULL WHERE [IsRunning] = 1 AND [FinishDateTime] IS NULL"""
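(For concreteness, here's how I'd assemble that statement in Python; the date value below is just a placeholder, and the question is really about how to execute such a string over JDBC.)

```python
date = "2021-04-13"  # placeholder: in the real notebook this comes from the ETL run

# The UPDATE statement I want to run against the log table
query = (
    "UPDATE [internal].[Job] "
    f"SET [MaxIngestionDate] = '{date}', "
    "[DataLakeMetadataRaw] = NULL, "
    "[DataLakeMetadataCurated] = NULL "
    "WHERE [IsRunning] = 1 AND [FinishDateTime] IS NULL"
)
```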

Is this possible? If so, can someone show me how?

Looking at the documentation, it only seems to mention using SELECT statements with the query parameter:

The target database is an Azure SQL Database.

https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html

Just to add: this is a tiny operation, so performance is a non-issue.

  • To any lost souls wandering here: my workaround was to pass a JSON blob on completion of the notebook in my Data Factory pipeline, which I then parsed out and passed as parameters to my stored procedure, which in turn updated my log tables. Commented Apr 13, 2021 at 12:40
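That workaround, sketched (field names and values are illustrative; `dbutils.notebook.exit` is the Databricks call that hands a string back to the Data Factory notebook activity):

```python
import json

# Illustrative log payload the notebook emits on completion
log_payload = {
    "MaxIngestionDate": "2021-04-13",
    "DataLakeMetadataRaw": None,
    "DataLakeMetadataCurated": None,
}
blob = json.dumps(log_payload)

# In the notebook: dbutils.notebook.exit(blob)
# Data Factory reads this string from the notebook activity's output, and the
# parsed fields become parameters to the Stored Procedure activity:
parsed = json.loads(blob)
```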

1 Answer


You can't do single-record updates over JDBC in Spark with DataFrames; you can only append to, or overwrite, the entire table.

You can do updates using pyodbc, which requires installing the MSSQL ODBC driver (How to install PYODBC in Databricks), or you can use JDBC via JayDeBeApi (https://pypi.org/project/JayDeBeApi/).
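A minimal sketch of the pyodbc route (the connection string, driver name, and helper function are illustrative, not from the question; the `?` placeholder lets the driver bind the date value instead of interpolating it into the SQL):

```python
# Sketch only: assumes the MSSQL ODBC driver is installed on the cluster and
# that server/database/credentials are supplied elsewhere (e.g. from secrets).
UPDATE_SQL = (
    "UPDATE [internal].[Job] "
    "SET [MaxIngestionDate] = ?, "
    "[DataLakeMetadataRaw] = NULL, "
    "[DataLakeMetadataCurated] = NULL "
    "WHERE [IsRunning] = 1 AND [FinishDateTime] IS NULL"
)

def run_update(conn, max_ingestion_date):
    # conn is a pyodbc.Connection, e.g.
    # pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
    #                "SERVER=...;DATABASE=...;UID=...;PWD=...")
    cursor = conn.cursor()
    cursor.execute(UPDATE_SQL, max_ingestion_date)  # ? bound to the date value
    conn.commit()
```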



