
I am doing an R&D task where I want to store my RDD in a Hive table. I have written the code in Java: I create the RDD, convert it to a DataFrame, and then store it in a Hive table. But here I am facing two different kinds of errors.

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("SparkMain");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext hiveContext = new HiveContext(ctx.sc());
        hiveContext.setConf("hive.metastore.uris", "thrift://address:port");
        DataFrame df = hiveContext.read().text("/filepath");
        df.write().saveAsTable("catAcctData");
        df.registerTempTable("catAcctData");
        DataFrame sql = hiveContext.sql("select * from catAcctData");
        sql.show();
        ctx.close();
    }

If I execute this program, it works perfectly fine. I can see the table data in the console.

But if I try the code below, it fails with org.apache.spark.sql.AnalysisException: Table not found: java

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("SparkMain");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext hiveContext = new HiveContext(ctx.sc());
        hiveContext.setConf("hive.metastore.uris", "thrift://address:port");
        DataFrame sql = hiveContext.sql("select * from catAcctData");
        sql.show();
        ctx.close();
    }

And if I try to save the table data using SQLContext, it fails with java.lang.RuntimeException: Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("SparkMain");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        SQLContext hiveContext = new SQLContext(ctx.sc());
        hiveContext.setConf("hive.metastore.uris", "thrift://address:port");
        DataFrame df = hiveContext.read().text("/filepath");
        df.write().saveAsTable("catAcctData");
        df.registerTempTable("catAcctData");
        DataFrame sql = hiveContext.sql("select * from catAcctData");
        sql.show();
        ctx.close();
    }

I am a bit confused here. Please help me resolve this.

Regards, Pratik

1 Answer

Your problem is that you create the table with one HiveContext and try to read it with a different one. In other words, the HiveContext in the second program doesn't see the "catAcctData" table because you created that table with another HiveContext. Use one HiveContext for both creating and reading tables.

Also, I don't understand why you call df.write().saveAsTable("catAcctData"); before creating the temporary table. If you want a temporary table, you just need df.registerTempTable("catAcctData"); without df.write().saveAsTable("catAcctData");.
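To illustrate, here is a minimal sketch for Spark 1.6 of the simplified flow the answer describes, using the same /filepath input and table name as the question: one HiveContext reads the file, registers it only as a temporary table, and queries it in the same program (no saveAsTable needed when you only query within one application):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SparkMain {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("SparkMain");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext hiveContext = new HiveContext(ctx.sc());

        // Read the file and expose it as a temporary table only.
        // registerTempTable is enough for querying within this program;
        // saveAsTable is only needed for a table that should outlive it.
        DataFrame df = hiveContext.read().text("/filepath");
        df.registerTempTable("catAcctData");

        DataFrame sql = hiveContext.sql("select * from catAcctData");
        sql.show();
        ctx.close();
    }
}
```

Note that a temporary table lives only as long as the HiveContext that registered it; to read the data from a second, separate program you would need a persistent table (saveAsTable) backed by a metastore that both programs share.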


4 Comments

So, Yehor Krivokon, how can I read the previously created table? Can you guide me, please?
1) HiveContext is deprecated. Add Hive support using SparkSession.builder().enableHiveSupport(); 2) Create tables using the previously created SQLContext, via SQLContext.getOrCreate(spark.sparkContext()); 3) Get the SparkSession like this: SparkSession spark = SparkSession.builder().enableHiveSupport().getOrCreate();
But I have one constraint: I cannot use Spark 2.0. I only have Spark 1.6 installed on Hadoop, so I cannot use SparkSession.
OK, then you can get the context using the getOrCreate method, not from SparkSession.
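Putting the last two comments together, here is a hedged sketch for Spark 1.6 (where SparkSession does not exist). It assumes that SQLContext.getOrCreate hands back the already-created HiveContext when one was instantiated first for the same SparkContext, so the whole application shares a single context instead of constructing a second one:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.hive.HiveContext;

public class SparkMain {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("SparkMain");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);

        // Create the HiveContext once, up front. A plain SQLContext
        // cannot create persistent (non-TEMPORARY) tables, which is
        // what caused the RuntimeException in the third program.
        HiveContext hiveContext = new HiveContext(ctx.sc());

        // Elsewhere in the same application, reuse the existing context
        // via getOrCreate instead of constructing a new one.
        SQLContext shared = SQLContext.getOrCreate(ctx.sc());

        DataFrame df = shared.read().text("/filepath");
        df.write().saveAsTable("catAcctData"); // persistent table in the metastore

        DataFrame sql = shared.sql("select * from catAcctData");
        sql.show();
        ctx.close();
    }
}
```

For the table to be visible from a second program, both programs must also point at the same Hive metastore; with a default (embedded Derby) metastore, each application gets its own local metastore_db and will not see the other's tables.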
