
I am doing a POC where I want to write a simple data set to Redshift.

I have the following sbt file:

name := "Spark_POC" version := "1.0" scalaVersion := "2.10.6" libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.1" libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "2.0.1" resolvers += "jitpack" at "https://jitpack.io" libraryDependencies += "com.databricks" %% "spark-redshift" % "3.0.0-preview1" 

and following code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object Main extends App {
  val conf = new SparkConf().setAppName("Hello World").setMaster("local[2]")

  System.setProperty("hadoop.home.dir", "C:\\Users\\Srdjan Nikitovic\\Desktop\\scala\\hadoop")

  val spark = SparkSession
    .builder()
    .appName("Spark 1")
    .config(conf)
    .getOrCreate()

  val tempS3Dir = "s3n://access_key:secret_access_key@bucket_location"
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "access_key")
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "secret_access_key")

  val data = spark
    .read
    .csv("hello.csv")

  data.write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://redshift_server:5439/database?user=user_name&password=password")
    .option("dbtable", "public.testSpark")
    .option("forward_spark_s3_credentials", true)
    .option("tempdir", tempS3Dir)
    .mode("error")
    .save()
}

I am running the code from a local Windows machine, through IntelliJ.

I get the following error:

Exception in thread "main" java.lang.ClassNotFoundException: Could not load an Amazon Redshift JDBC driver; see the README for instructions on downloading and configuring the official Amazon driver.

I have tried almost all versions of the spark-redshift library (1.0.0, 2.0.0, 2.0.1, and now 3.0.0-preview1), and I can't get this code to work.

Any help?

1 Answer


You first need to download the Redshift JDBC driver from Amazon.

Then you must tell Spark about it in the environment where this code is running. E.g. for a spark-shell running on EMR:

spark-shell … --jars /usr/share/aws/redshift/jdbc/RedshiftJDBC41.jar 
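If you are running locally from IntelliJ via sbt instead of on EMR, the idea is the same: the downloaded driver jar just has to end up on the application classpath. A minimal build.sbt sketch, assuming you downloaded the Amazon driver manually (the path below is only an example):

// build.sbt (sketch): sbt automatically picks up any jar placed in the
// project's lib/ directory as an unmanaged dependency, so simply copying
// RedshiftJDBC41.jar there is usually enough. To reference a jar kept
// somewhere else, add it explicitly:
unmanagedJars in Compile += Attributed.blank(file("C:/drivers/RedshiftJDBC41.jar"))

After re-importing the sbt project, IntelliJ's Run button uses the same classpath, so the driver class should be resolvable.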

3 Comments

I am not running the code on EMR; I am running it through IntelliJ on my laptop by clicking the run button. Any idea about that?
Yeah, the EMR thing is just an example. I don't know about IntelliJ specifically, but you basically just need to tell the JVM your code runs in about the location of this jar.
I used this jar s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.20.1043/… but I am still getting an exception: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.redshift. Probably the suggested jar is not enough?
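Regarding the exception in the last comment: "Failed to find data source: com.databricks.spark.redshift" is a different problem from the missing JDBC driver; it means the spark-redshift library itself is not on the runtime classpath, so the Amazon jar alone is indeed not enough. A rough build.sbt sketch that keeps both on the classpath (versions are the ones from the question and are illustrative; spark-avro is listed explicitly in case it is not pulled in transitively, and the jar path is only an example):

resolvers += "jitpack" at "https://jitpack.io"

// The Redshift data source and the Avro format it uses for the S3 temp files.
libraryDependencies += "com.databricks" %% "spark-redshift" % "3.0.0-preview1"
libraryDependencies += "com.databricks" %% "spark-avro" % "3.0.0"

// The Amazon JDBC driver downloaded manually.
unmanagedJars in Compile += Attributed.blank(file("C:/drivers/RedshiftJDBC41.jar"))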
