I have a query that joins 4 tables, and I used query pushdown to read the result into a DataFrame:
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://ip/dbname")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("user", "username")
      .option("password", "password")
      .option("dbtable", s"($query) as temptable")
      .load()

The four tables contain 430, 350, 64, and 2354 records respectively. The load takes 12.784 sec, plus 2.119 sec for creating the SparkSession.
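For context, $query is a plain join over the 4 tables; schematically it looks something like this (the table names, columns, and join conditions here are placeholders, not the real ones):

    // Hypothetical shape only; the real tables and join keys differ.
    val query =
      """SELECT a.*
        |FROM table1 a
        |JOIN table2 b ON a.id = b.a_id
        |JOIN table3 c ON a.id = c.a_id
        |JOIN table4 d ON a.id = d.a_id""".stripMargin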
Then I count the result data:
    val count = df.count()
    println(s"count $count")

With this, the total execution time is 25.806 sec, and the result contains only 430 records.
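In case it matters, timings like the ones above can be captured with a simple wrapper along these lines (just a sketch, not necessarily how the numbers above were measured):

    // One way to time a block of Spark code.
    def time[T](label: String)(body: => T): T = {
      val start = System.nanoTime()
      val result = body
      println(f"$label took ${(System.nanoTime() - start) / 1e9}%.3f sec")
      result
    }

    val count = time("count")(df.count())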
When I run the same query in SQL Workbench, it takes only a few seconds to execute completely. I also tried calling cache() after load() (roughly as in the sketch below), but it takes the same time. So how can I make this run much faster than it does now?
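For clarity, the cache attempt looks roughly like this:

    // cache() is lazy: nothing is persisted until an action runs,
    // so this first count() still performs the full JDBC read.
    val cached = df.cache()
    val count = cached.count()
    println(s"count $count")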