
Is there a tool to convert Excel files into CSV using Spark 1.x? I got this issue when executing this tutorial: https://github.com/ZuInnoTe/hadoopoffice/wiki/Read-Excel-document-using-Spark-1.x

Exception in thread "main" java.lang.NoClassDefFoundError: org/zuinnote/hadoop/office/format/mapreduce/ExcelFileInputFormat
    at org.zuinnote.spark.office.example.excel.SparkScalaExcelIn$.convertToCSV(SparkScalaExcelIn.scala:63)
    at org.zuinnote.spark.office.example.excel.SparkScalaExcelIn$.main(SparkScalaExcelIn.scala:56)
    at org.zuinnote.spark.office.example.excel.SparkScalaExcelIn.main(SparkScalaExcelIn.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.zuinnote.hadoop.office.format.mapreduce.ExcelFileInputFormat
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  • Have you added the dependency https://mvnrepository.com/artifact/com.github.zuinnote/hadoopoffice-fileformat/1.0.0 to the job's classpath? Commented Dec 13, 2017 at 15:57

2 Answers


Spark is unable to find the org.zuinnote.hadoop.office.format.mapreduce.ExcelFileInputFormat class on the classpath.

Supply the dependency below to spark-submit using the --jars parameter:

<!-- https://mvnrepository.com/artifact/com.github.zuinnote/hadoopoffice-fileformat -->
<dependency>
    <groupId>com.github.zuinnote</groupId>
    <artifactId>hadoopoffice-fileformat</artifactId>
    <version>1.0.4</version>
</dependency>
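For an sbt build, the equivalent coordinates would look like the following sketch (derived from the Maven coordinates above; the version should match whatever you pass to --jars):

```scala
// build.sbt -- same artifact as the Maven <dependency> above
libraryDependencies += "com.github.zuinnote" % "hadoopoffice-fileformat" % "1.0.4"
```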

Command:

spark-submit --jars hadoopoffice-fileformat-1.0.4.jar \
  # rest of the command arguments

3 Comments

I am using sbt and I added the dependency, but I still have the same problem.
Is the jar present on the driver and executor classpath?
Spark UI -> Environment tab -> check the entries in the spark.driver.extraClassPath and spark.executor.extraClassPath properties.
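If the jar does not show up in those properties, it can be set explicitly at submit time. A hedged sketch (the jar paths and the main class are placeholders; adjust to your own layout):

```shell
# --jars ships the jar to the executors; extraClassPath puts it on the
# driver and executor classpaths. Paths here are illustrative only.
spark-submit \
  --jars /path/to/hadoopoffice-fileformat-1.0.4.jar \
  --conf spark.driver.extraClassPath=/path/to/hadoopoffice-fileformat-1.0.4.jar \
  --conf spark.executor.extraClassPath=hadoopoffice-fileformat-1.0.4.jar \
  --class your.main.Class \
  your-app.jar
```

Note that spark.driver.extraClassPath must be a path valid on the machine where the driver runs, while the executor-side entry refers to the jar after it has been distributed.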

You have to build a fat jar that contains all the necessary dependencies. The example project on the HadoopOffice page shows how to build one. Once you have built the fat/uber jar, you simply use it with spark-submit.
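With sbt, a fat jar is typically produced with the sbt-assembly plugin. A minimal sketch, assuming an sbt project (the plugin version and Spark version are examples; the HadoopOffice example project shows the exact setup it uses):

```scala
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt -- Spark is marked "provided" so it is not bundled into the
// fat jar (the cluster already supplies it); hadoopoffice-fileformat is
// bundled, which resolves the NoClassDefFoundError.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.3" % "provided",
  "com.github.zuinnote" % "hadoopoffice-fileformat" % "1.0.4"
)
```

Running `sbt assembly` then produces the uber jar under target/, and that jar is the one to pass to spark-submit.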

