I am facing a problem while running a Spark job using Python (PySpark). Please see the code snippet below:
```python
from os.path import abspath

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import max, min, sum, col

spark = (
    SparkSession.builder
    .appName("test")
    .config("spark.driver.extraClassPath", "/usr/dt/mssql-jdbc-6.4.0.jre8.jar")
    .getOrCreate()
)
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.conf.set("spark.sql.session.timeZone", "Etc/UTC")

warehouse_loc = abspath('spark-warehouse')

# loading data from MS SQL Server 2017
df = spark.read.format("jdbc").options(
    url="jdbc:sqlserver://10.90.3.22;DATABASE=TransTrak_V_1.0;user=sa;password=m2m@ipcl1234",
    properties={"driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"},
    dbtable="Current_Voltage",
).load()
```

When I run this code, I get the following error:
```
py4j.protocol.Py4JJavaError: An error occurred while calling o38.load.
: java.sql.SQLException: No suitable driver
```

The same code used to run fine earlier. However, for unrelated reasons I had to reinstall CentOS 7 and then Python 3.6. I have set Python 3.6 as the default Python for Spark, i.e. when I start pyspark the default Python is 3.6.
Just to mention, the system default Python is Python 2.7, and I am on CentOS 7.
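For comparison, here is a minimal sketch of how I understand the driver class is usually handed to the JDBC reader: as a flat `driver` option rather than nested inside a `properties` dict, since `DataFrameReader.options()` takes each keyword as its own option and does not unpack a nested dict. This is an assumption on my part, not something I have verified fixes the error; the connection values are just the ones from my snippet above, and the actual `.load()` call is commented out because it needs a reachable SQL Server and the mssql-jdbc jar on the classpath.

```python
# Assumed fix (untested): pass the JDBC driver class as a plain option.
# A nested properties={...} dict would not be recognized as the "driver" option.
jdbc_options = {
    "url": "jdbc:sqlserver://10.90.3.22;DATABASE=TransTrak_V_1.0",
    "user": "sa",
    "password": "********",  # placeholder, not my real password
    "dbtable": "Current_Voltage",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Requires a live SQL Server and the mssql-jdbc jar on the driver classpath:
# df = spark.read.format("jdbc").options(**jdbc_options).load()
```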
What is going wrong here? Can anybody please help on this?