I need to understand if there is any difference between the below two approaches of caching while using spark sql and is there any performance benefit of one over the another (considering building the dataframes are costly and I want to reuse it many times/hit many actions) ?
1> Cache the original data frame before registering it as temporary table
df.cache()
df.createOrReplaceTempView("dummy_table")
2> Register the dataframe as temporary table and cache the table
df.createOrReplaceTempView("dummy_table")
sqlContext.cacheTable("dummy_table")
Thanks in advance.