
I am doing a POC where I want to write a simple data set to Redshift.

I have the following sbt file:

name := "Spark_POC" version := "1.0" scalaVersion := "2.10.6" libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.1" libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "2.0.1" resolvers += "jitpack" at "https://jitpack.io" libraryDependencies += "com.databricks" %% "spark-redshift" % "3.0.0-preview1" 

and following code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object Main extends App {
  val conf = new SparkConf().setAppName("Hello World").setMaster("local[2]")

  System.setProperty("hadoop.home.dir", "C:\\Users\\Srdjan Nikitovic\\Desktop\\scala\\hadoop")

  val spark = SparkSession
    .builder()
    .appName("Spark 1")
    .config(conf)
    .getOrCreate()

  val tempS3Dir = "s3n://access_key:secret_access_key@bucket_location"
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "access_key")
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "secret_access_key")

  val data = spark
    .read
    .csv("hello.csv")

  data.write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://redshift_server:5439/database?user=user_name&password=password")
    .option("dbtable", "public.testSpark")
    .option("forward_spark_s3_credentials", true)
    .option("tempdir", tempS3Dir)
    .mode("error")
    .save()
}

I am running the code from a local Windows machine, through IntelliJ.

I get the following error:

Exception in thread "main" java.lang.ClassNotFoundException: Could not load an Amazon Redshift JDBC driver; see the README for instructions on downloading and configuring the official Amazon driver.

I have tried almost all versions of the spark-redshift library (1.0.0, 2.0.0, 2.0.1, and now 3.0.0-preview1), and I can't get this code to work.

Any help?

1 Answer


You first need to download the Redshift JDBC driver from Amazon.

Then you must tell Spark about it in the environment where this code is running. E.g. for a spark-shell running on EMR:

spark-shell … --jars /usr/share/aws/redshift/jdbc/RedshiftJDBC41.jar 
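If you are running locally from IntelliJ via sbt instead of on EMR, the idea is the same: the downloaded driver jar just has to end up on the application classpath. A minimal build.sbt sketch, assuming you downloaded the Amazon driver manually (the path below is only an example):

// build.sbt (sketch): sbt automatically picks up any jar placed in the
// project's lib/ directory as an unmanaged dependency, so simply copying
// RedshiftJDBC41.jar there is usually enough. To reference a jar kept
// somewhere else, add it explicitly:
unmanagedJars in Compile += Attributed.blank(file("C:/drivers/RedshiftJDBC41.jar"))

After re-importing the sbt project, IntelliJ's Run button uses the same classpath, so the driver class should be resolvable.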

3 Comments

I am not running the code on EMR; I am running it through IntelliJ on my laptop by clicking the run button. Any idea about that?
Yeah, the EMR thing is just an example. I don't know about IntelliJ specifically, but you basically just need to tell the JVM your code runs in about the location of this jar.
I used this jar s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.20.1043/… but I am still getting an exception: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.redshift. Probably the suggested jar is not enough?
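Regarding the exception in the last comment: "Failed to find data source: com.databricks.spark.redshift" is a different problem from the missing JDBC driver; it means the spark-redshift library itself is not on the runtime classpath, so the Amazon jar alone is indeed not enough. A rough build.sbt sketch that keeps both on the classpath (versions are the ones from the question and are illustrative; spark-avro is listed explicitly in case it is not pulled in transitively, and the jar path is only an example):

resolvers += "jitpack" at "https://jitpack.io"

// The Redshift data source and the Avro format it uses for the S3 temp files.
libraryDependencies += "com.databricks" %% "spark-redshift" % "3.0.0-preview1"
libraryDependencies += "com.databricks" %% "spark-avro" % "3.0.0"

// The Amazon JDBC driver downloaded manually.
unmanagedJars in Compile += Attributed.blank(file("C:/drivers/RedshiftJDBC41.jar"))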
