I have a Spark application with a very large DataFrame. I am currently registering the DataFrame as a tempTable so I can run several queries against it.
When I am using RDDs I use persist(StorageLevel.MEMORY_AND_DISK()). What is the equivalent for a tempTable?
Below are two possibilities. I don't think option 2 will work, because cacheTable tries to cache the table in memory and my table is too big to fit in memory.
    DataFrame standardLocationRecords = inputReader.readAsDataFrame(sc, sqlc);

    // Option 1
    standardLocationRecords.persist(StorageLevel.MEMORY_AND_DISK());
    standardLocationRecords.registerTempTable("standardlocationrecords");

    // Option 2
    standardLocationRecords.registerTempTable("standardlocationrecords");
    sqlc.cacheTable("standardlocationrecords");

How can I best cache my tempTable so I can run several queries against it without having to keep reloading the data?
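For reference, here is a minimal sketch of Option 1 as I understand it, assuming sc, sqlc, and inputReader are set up as in my code above. My understanding is that registering a temp table only attaches a name to the DataFrame's logical plan, so persisting the DataFrame should make SQL queries against the temp table reuse the cached data:

    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.storage.StorageLevel;

    // Assumes sc, sqlc, and inputReader exist as in the question.
    DataFrame df = inputReader.readAsDataFrame(sc, sqlc);

    // Mark the DataFrame for MEMORY_AND_DISK caching: partitions that
    // don't fit in memory spill to disk instead of being recomputed.
    df.persist(StorageLevel.MEMORY_AND_DISK());

    // The temp table is just a name for this DataFrame's logical plan,
    // so queries against it should hit the persisted data.
    df.registerTempTable("standardlocationrecords");

    // The first query materializes the cache; later queries read from it.
    DataFrame counts = sqlc.sql(
        "SELECT COUNT(*) FROM standardlocationrecords");

Is this correct, or does the SQL layer bypass the DataFrame-level persistence?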
Thanks, Nathan