
I have values in a DataFrame, and I have created a table structure in Teradata. My requirement is to load the DataFrame into Teradata, but I am getting an error.

I have tried the following code:

    df.write.format("jdbc")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("url", "organization.td.intranet")
      .option("dbtable", s"select * from td_s_zm_brainsdb.emp")
      .option("user", "userid")
      .option("password", "password")
      .mode("append")
      .save()

I got an error :

    java.lang.NullPointerException
      at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93)
      at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
      at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
      ... 48 elided

I changed the url option to make it look like a JDBC URL and ran the following command:

    df.write.format("jdbc")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("url", "jdbc:teradata//organization.td.intranet,CHARSET=UTF8,TMODE=ANSI,user=G01159039")
      .option("dbtable", s"select * from td_s_zm_brainsdb.emp")
      .option("user", "userid")
      .option("password", "password")
      .mode("append")
      .save()

I am still getting the same error:

    java.lang.NullPointerException
      at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93)
      at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
      at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
      ... 48 elided

I have included the following jars with the --jars option:

    tdgssconfig-16.10.00.03.jar
    terajdbc4-16.10.00.03.jar
    teradata-connector-1.2.1.jar

Teradata version: 15. Spark version: 2.

3 Answers


Change the JDBC URL and dbtable options to the following:

    .option("url", "jdbc:teradata://organization.td.intranet/Database=td_s_zm_brainsdb")
    .option("dbtable", "emp")

Also note that in Teradata there are no row locks, so the above will take a table lock; in other words, it will not be efficient, and parallel writes from Spark JDBC are not possible.

Teradata's native tools (FastLoad/BTEQ combinations) will work. Another option, which requires a complicated setup, is Teradata Query Grid; it is very fast and uses Presto behind the scenes.
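As a related note, the Teradata JDBC driver itself exposes a FastLoad path: adding TYPE=FASTLOAD to the connection URL asks the driver to use JDBC FastLoad for batched inserts (it generally requires an empty target table). A minimal sketch of such a URL, using the hostname and database from the question as placeholders:

```scala
// Sketch of a Teradata JDBC URL with JDBC FastLoad enabled.
// TYPE=FASTLOAD asks the driver to use FastLoad for batch inserts;
// host and database below are placeholders taken from the question.
val host = "organization.td.intranet"
val db   = "td_s_zm_brainsdb"
val fastloadUrl = s"jdbc:teradata://$host/Database=$db,TYPE=FASTLOAD,CHARSET=UTF8,TMODE=ANSI"
println(fastloadUrl)
```

Passing this string as the url option in the write call would be one way to try FastLoad from Spark; whether it helps depends on the driver version and the state of the target table.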


5 Comments

I am using the DataFrame writer; my requirement is to write to the table, not read from it. How will 'select * from table' make a difference?
I updated it. spark.apache.org/docs/latest/sql-data-sources-jdbc.html describes the syntax of JDBC writes as well.
Thanks for the help. But with this configuration too, I am getting an error: java.lang.NullPointerException at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93) at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215) ... 48 elided
Is your read working? I want to be sure you are able to connect to Teradata.
No, reading is also not working. That means either I am missing some option in the JDBC parameters, or the jar versions are not in sync with the Teradata and Hadoop versions. I am using Teradata 14 and HDP 2.x.

Below is code for reading data from a Teradata table:

    df = (spark.read.format("jdbc")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("url", "jdbc:teradata://organization.td.intranet/Database=td_s_zm_brainsdb")
      .option("dbtable", "(select * from td_s_zm_brainsdb.emp) AS t")
      .option("user", "userid")
      .option("password", "password")
      .load())

This will create a DataFrame in Spark.

For writing data back to the database, below is the statement:

Saving data to a JDBC source

    jdbcDF.write \
      .format("jdbc") \
      .option("url", "jdbc:teradata://organization.td.intranet/Database=td_s_zm_brainsdb") \
      .option("dbtable", "schema.tablename") \
      .option("user", "username") \
      .option("password", "password") \
      .save()
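One detail the two snippets above illustrate, and worth calling out since the original question passed a bare SELECT as dbtable: for reads, dbtable may be a parenthesized subquery with an alias, because Spark substitutes it into a query it generates itself; for writes, dbtable must be a plain table name. A small sketch of the distinction, using names from the question:

```scala
// For reads, Spark builds its own query around the dbtable value, roughly
// SELECT <cols> FROM <dbtable> [WHERE ...], so a subquery must be
// parenthesized and aliased. For writes, dbtable must be a bare table name.
val readTable  = "(select * from td_s_zm_brainsdb.emp) AS t"  // valid for read
val writeTable = "td_s_zm_brainsdb.emp"                       // valid for write

// Roughly the probe Spark generates when inferring the schema on read:
val schemaProbe = s"SELECT * FROM $readTable WHERE 1=0"
println(schemaProbe)
```

This is why the question's original `.option("dbtable", "select * from td_s_zm_brainsdb.emp")` cannot work for a write: there is no table of that name to insert into.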

1 Comment

I have to write data back to Teradata, not read from it.

The JDBC URL should be of the following form:

val jdbcUrl = s"jdbc:teradata://${jdbcHostname}/database=${jdbcDatabase},user=${jdbcUsername},password=${jdbcPassword}" 

It was causing the exception because I hadn't supplied the username and password.
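Building on that, here is a small helper (hypothetical, not from the original answer) that assembles the URL and fails fast when credentials are missing, since in this case the absent user/password only surfaced later as an opaque NullPointerException:

```scala
// Hypothetical helper: assemble the Teradata JDBC URL, failing fast if
// required pieces are empty rather than letting the driver fail later
// with an unhelpful NullPointerException.
def teradataUrl(host: String, db: String, user: String, password: String): String = {
  require(host.nonEmpty && db.nonEmpty, "host and database are required")
  require(user.nonEmpty && password.nonEmpty, "user and password are required")
  s"jdbc:teradata://$host/database=$db,user=$user,password=$password"
}

// Placeholder values taken from the question:
println(teradataUrl("organization.td.intranet", "td_s_zm_brainsdb", "userid", "password"))
```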

