**A1**: You are right about the evaluation of `createOrReplaceTempView`. It is evaluated *lazily* for the current Spark session; the call only registers the DataFrame's logical plan under the given name. In other words, if you terminate the Spark session without ever accessing the view, no data will ever be transferred into *temp1*.
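
A minimal sketch of that behaviour, assuming an active `spark` session and your DataFrame `df`:

    // Registering the view triggers no Spark job; it only stores the logical plan
    df.createOrReplaceTempView("temp1")

    // Nothing is computed until an action runs against the view
    spark.sql("SELECT * FROM temp1 WHERE Year = 2019").show() // <- the work happens here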

**A2**: Let's examine the case through an example using your code. First let's save your data with:

    df.write.mode("overwrite").option("header", "true")
      .partitionBy("Year", "Month")
      .format("csv")
      .save("/tmp/partition_test1/")
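
Because of `partitionBy("Year", "Month")`, the data ends up in Hive-style partition directories, roughly like the following (the exact values and file names depend on your data):

    /tmp/partition_test1/Year=2019/Month=5/part-....csv
    /tmp/partition_test1/Year=2011/Month=11/part-....csv
    ...

It is this directory layout that makes partition pruning possible: Spark can skip whole directories without opening a single file.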

And then load it with:

    val df1 = spark.read.option("header", "true")
      .csv("/tmp/partition_test1/")
      .where($"Year" === 2019 && $"Month" === 5)

Executing `df1.explain` will return:

    == Physical Plan ==
    *(1) FileScan csv [Day#328,SalesAmount#329,StoreNumber#330,Year#331,Month#332] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/tmp/partition_test1], PartitionCount: 0, PartitionFilters: [isnotnull(Year#331), isnotnull(Month#332), (Year#331 = 2019), (Month#332 = 5)], PushedFilters: [], ReadSchema: struct<Day:string,SalesAmount:string,StoreNumber:string>

As you can see, the `PushedFilters: []` array is empty although `PartitionFilters` is not, indicating that Spark was able to filter on the partition columns and therefore prune the partitions that do not satisfy the `where` clause.

If we slightly change the Spark query to:

    df1.where($"StoreNumber" === 1 && $"Year" === 2011 && $"Month" === 11).explain

    == Physical Plan ==
    *(1) Project [Day#462, SalesAmount#463, StoreNumber#464, Year#465, Month#466]
    +- *(1) Filter (isnotnull(StoreNumber#464) && (cast(StoreNumber#464 as int) = 1))
       +- *(1) FileScan csv [Day#462,SalesAmount#463,StoreNumber#464,Year#465,Month#466] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/tmp/partition_test1], PartitionCount: 1, PartitionFilters: [isnotnull(Month#466), isnotnull(Year#465), (Year#465 = 2011), (Month#466 = 11)], PushedFilters: [IsNotNull(StoreNumber)], ReadSchema: struct<Day:string,SalesAmount:string,StoreNumber:string>

Now both `PartitionFilters` and `PushedFilters` are populated, minimizing Spark's workload. As you can see, Spark leverages both: it first recognizes the relevant partitions through `PartitionFilters` and then applies predicate pushdown for the remaining condition.

Exactly the same applies to `parquet` files, with the big difference that Parquet can exploit the pushed-down filters even further by combining them with its internal columnar format (as you already mentioned), which keeps metrics and statistics over the data. So the difference from CSV files is that, for CSV, the predicate pushdown takes place while Spark is reading/scanning the CSV files, excluding records that do not satisfy the pushed-down condition, whereas for Parquet the pushed-down filter is propagated to the Parquet internals, resulting in even larger pruning of data.
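
If you want to see this yourself, a sketch of the same experiment with Parquet could look like the following (the output path is illustrative, and the exact `PushedFilters` depend on the column types of your original `df`):

    // Write the same data partitioned by Year/Month, but in Parquet format
    df.write.mode("overwrite")
      .partitionBy("Year", "Month")
      .parquet("/tmp/partition_test1_parquet/")

    // Read it back and apply the same predicates
    spark.read.parquet("/tmp/partition_test1_parquet/")
      .where($"StoreNumber" === 1 && $"Year" === 2011 && $"Month" === 11)
      .explain

    // Expect the same PartitionFilters as before and, if StoreNumber is a numeric column,
    // PushedFilters such as [IsNotNull(StoreNumber), EqualTo(StoreNumber,1)] that the
    // Parquet reader can use to skip whole row groups based on its statistics.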

In your case, loading the data through the view created with `createOrReplaceTempView` will not make any difference and the execution plan will remain the same.
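
For instance, a quick way to confirm this (a sketch reusing the CSV data written above):

    // Register the partitioned CSV data as a temp view
    spark.read.option("header", "true")
      .csv("/tmp/partition_test1/")
      .createOrReplaceTempView("temp1")

    // The SQL query should produce the same PartitionFilters as the DataFrame version above
    spark.sql("SELECT * FROM temp1 WHERE Year = 2019 AND Month = 5").explain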

Some useful links:

https://spark.apache.org/docs/latest/sql-data-sources-parquet.html

https://www.waitingforcode.com/apache-spark-sql/predicate-pushdown-spark-sql/read