
Is there a tool to convert Excel files into CSV using Spark 1.x? I got this issue when executing this tutorial: https://github.com/ZuInnoTe/hadoopoffice/wiki/Read-Excel-document-using-Spark-1.x

Exception in thread "main" java.lang.NoClassDefFoundError: org/zuinnote/hadoop/office/format/mapreduce/ExcelFileInputFormat
    at org.zuinnote.spark.office.example.excel.SparkScalaExcelIn$.convertToCSV(SparkScalaExcelIn.scala:63)
    at org.zuinnote.spark.office.example.excel.SparkScalaExcelIn$.main(SparkScalaExcelIn.scala:56)
    at org.zuinnote.spark.office.example.excel.SparkScalaExcelIn.main(SparkScalaExcelIn.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.zuinnote.hadoop.office.format.mapreduce.ExcelFileInputFormat
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  • Have you added the dependency https://mvnrepository.com/artifact/com.github.zuinnote/hadoopoffice-fileformat/1.0.0 to the job's classpath? Commented Dec 13, 2017 at 15:57

2 Answers


Spark is unable to find the org.zuinnote.hadoop.office.format.mapreduce.ExcelFileInputFormat class on the classpath.

Supply the dependency below to spark-submit using the --jars parameter:

<!-- https://mvnrepository.com/artifact/com.github.zuinnote/hadoopoffice-fileformat -->
<dependency>
    <groupId>com.github.zuinnote</groupId>
    <artifactId>hadoopoffice-fileformat</artifactId>
    <version>1.0.4</version>
</dependency>
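For an sbt build, the equivalent coordinates would look like the following sketch (derived from the Maven coordinates above; the version should match whatever you pass to --jars):

```scala
// build.sbt -- same artifact as the Maven <dependency> above
libraryDependencies += "com.github.zuinnote" % "hadoopoffice-fileformat" % "1.0.4"
```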

Command:

spark-submit --jars hadoopoffice-fileformat-1.0.4.jar \
  # rest of the command arguments

3 Comments

I am using sbt and I added the dependency, but I still have the same problem.
Is the jar present on the driver and executor classpath?
Spark UI -> Environment tab -> check the entries in the spark.driver.extraClassPath and spark.executor.extraClassPath properties.
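If the jar does not show up in those properties, it can be set explicitly at submit time. A hedged sketch (the jar paths and the main class are placeholders; adjust to your own layout):

```shell
# --jars ships the jar to the executors; extraClassPath puts it on the
# driver and executor classpaths. Paths here are illustrative only.
spark-submit \
  --jars /path/to/hadoopoffice-fileformat-1.0.4.jar \
  --conf spark.driver.extraClassPath=/path/to/hadoopoffice-fileformat-1.0.4.jar \
  --conf spark.executor.extraClassPath=hadoopoffice-fileformat-1.0.4.jar \
  --class your.main.Class \
  your-app.jar
```

Note that spark.driver.extraClassPath must be a path valid on the machine where the driver runs, while the executor-side entry refers to the jar after it has been distributed.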

You have to build a fat jar that contains all the necessary dependencies. The example project on the HadoopOffice page shows how to build one. Once you have built the fat/uber jar, you simply use it with spark-submit.
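With sbt, a fat jar is typically produced with the sbt-assembly plugin. A minimal sketch, assuming an sbt project (the plugin version and Spark version are examples; the HadoopOffice example project shows the exact setup it uses):

```scala
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt -- Spark is marked "provided" so it is not bundled into the
// fat jar (the cluster already supplies it); hadoopoffice-fileformat is
// bundled, which resolves the NoClassDefFoundError.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.3" % "provided",
  "com.github.zuinnote" % "hadoopoffice-fileformat" % "1.0.4"
)
```

Running `sbt assembly` then produces the uber jar under target/, and that jar is the one to pass to spark-submit.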

