I am facing a problem while running a Spark job using Python (PySpark). Please see the code snippet below:
```python
from os.path import abspath

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import max, min, sum, col

spark = (
    SparkSession.builder
    .appName("test")
    .config("spark.driver.extraClassPath", "/usr/dt/mssql-jdbc-6.4.0.jre8.jar")
    .getOrCreate()
)
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.conf.set("spark.sql.session.timeZone", "Etc/UTC")

warehouse_loc = abspath('spark-warehouse')

# loading data from MS SQL Server 2017
df = spark.read.format("jdbc").options(
    url="jdbc:sqlserver://10.90.3.22;DATABASE=TransTrak_V_1.0;user=sa;password=m2m@ipcl1234",
    properties={"driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"},
    dbtable="Current_Voltage",
).load()
```

When I run this code, I get the following error:
```
py4j.protocol.Py4JJavaError: An error occurred while calling o38.load.
: java.sql.SQLException: No suitable driver
```

The same code used to run fine earlier. However, for unrelated reasons I had to reinstall CentOS 7 and then Python 3.6. I have set Python 3.6 as the default Python for Spark, i.e. when I start pyspark the default Python is 3.6.
Just to mention, the system default Python is Python 2.7, and I am on CentOS 7.
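For comparison, here is a minimal sketch of how I understand the driver class is usually handed to the JDBC reader: as a flat `driver` option rather than nested inside a `properties` dict, since `DataFrameReader.options()` takes each keyword as its own option and does not unpack a nested dict. This is an assumption on my part, not something I have verified fixes the error; the connection values are just the ones from my snippet above, and the actual `.load()` call is commented out because it needs a reachable SQL Server and the mssql-jdbc jar on the classpath.

```python
# Assumed fix (untested): pass the JDBC driver class as a plain option.
# A nested properties={...} dict would not be recognized as the "driver" option.
jdbc_options = {
    "url": "jdbc:sqlserver://10.90.3.22;DATABASE=TransTrak_V_1.0",
    "user": "sa",
    "password": "********",  # placeholder, not my real password
    "dbtable": "Current_Voltage",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Requires a live SQL Server and the mssql-jdbc jar on the driver classpath:
# df = spark.read.format("jdbc").options(**jdbc_options).load()
```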
What is going wrong here? Can anybody please help on this?