Hi All: I am trying to understand how the partitioned tables work. I have a sales table of size 12.9MB. I have a date column that is partitioned by day. My assumption is that when I filter the data table using this date column, the amount of data processed by BigQuery will be optimized. However, it doesn’t seem to work that way, and I would like to understand the reason. In the below query, I am filtering sales.date using a subquery. When I try to execute the query as such, it is processing the entire table of 12.9 MB.
However, if I replace the below subquery with the actual date (the same result that we have from the subquery), then the amount of data processed is 4.9 MB.
The subquery alone processes 630 KB of data. If my understanding is right, shouldn’t the below given query process 4.9 MB + 630 KB = ~ 5.6 MB? But, it still processes 12.9 MB. Can someone explain what’s happening here?
SELECT sales.*, FROM `my-project.transaction_data.sales_table` sales WHERE DATE(sales.date) >= DATE_SUB(DATE((select max(temp.date) FROM ` my-project.transaction_data.sales_table ` temp)), INTERVAL 2 YEAR) ORDER BY sales.customer, sales.date