I am designing a BigQuery table that never expires. Each row is stored against a Product ID. There are daily inserts, and the same Product ID can be inserted again, so the table keeps historical versions of each product.

A view on this table will return the latest version of each Product ID, based on the most recent INSERT_TIMESTAMP:

SELECT ARRAY_AGG(PRODUCTS ORDER BY INSERT_TIMESTAMP DESC LIMIT 1)[OFFSET(0)] FROM dataset1.PRODUCTS GROUP BY PRODUCTID
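As a sketch, the view could wrap that query and expand the aggregated row back into ordinary columns with latest.*, so consumers see the table's own columns instead of a single STRUCT column. The view name dataset1.PRODUCTS_LATEST is hypothetical; the table and column names come from the question.

```sql
-- Hypothetical view name; dataset1.PRODUCTS, PRODUCTID and INSERT_TIMESTAMP come from the question.
CREATE OR REPLACE VIEW dataset1.PRODUCTS_LATEST AS
SELECT latest.*
FROM (
  -- For each PRODUCTID, keep only the row with the newest INSERT_TIMESTAMP.
  SELECT ARRAY_AGG(p ORDER BY p.INSERT_TIMESTAMP DESC LIMIT 1)[OFFSET(0)] AS latest
  FROM dataset1.PRODUCTS AS p
  GROUP BY p.PRODUCTID
);
```

Because the view has no filter on INSERT_TIMESTAMP, every query through it reads all of the table's partitions. If two rows for the same product share the same INSERT_TIMESTAMP, which one is returned is not deterministic.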

Would partitioning this table on INSERT_TIMESTAMP help at all? I don't think it would, but please confirm.

2 Answers

The query you provided won't benefit from partitioning: it has no filter on INSERT_TIMESTAMP, so BigQuery still has to scan every partition. To reduce query cost and runtime, add a filter (if your use case allows it) that restricts INSERT_TIMESTAMP to a specific period, such as the last seven days, so that only the matching partitions are read.
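A sketch of that filtered query, assuming the table is partitioned on INSERT_TIMESTAMP (for example with PARTITION BY DATE(INSERT_TIMESTAMP)); the seven-day window is just the example from this answer:

```sql
-- Assumes dataset1.PRODUCTS is partitioned on INSERT_TIMESTAMP.
SELECT latest.*
FROM (
  SELECT ARRAY_AGG(p ORDER BY p.INSERT_TIMESTAMP DESC LIMIT 1)[OFFSET(0)] AS latest
  FROM dataset1.PRODUCTS AS p
  -- Constant filter on the partitioning column, so partitions older than 7 days can be pruned.
  WHERE p.INSERT_TIMESTAMP >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  GROUP BY p.PRODUCTID
);
```

The trade-off is that any product whose latest row is older than the window is missing from the result, so the window must reach back far enough to cover the latest row of every product you need.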

It depends on how you intend to use the table. If the data won't grow rapidly, you can keep the structure you are currently using. If you expect the accumulated data to become very large in the future, partitioning the table and querying within a specified time range is a good plan. You could also create a materialized view (daily, weekly, or monthly, up to you) that tracks the latest INSERT_TIMESTAMP of every product ID, and then combine that materialized view with your ARRAY_AGG query restricted to a definite INSERT_TIMESTAMP range covering all product IDs:

SELECT ARRAY_AGG(PRODUCTS ORDER BY INSERT_TIMESTAMP DESC LIMIT 1)[OFFSET(0)] FROM dataset1.PRODUCTS WHERE INSERT_TIMESTAMP >= `Last X Months Timestamp` GROUP BY PRODUCTID
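A sketch of the two pieces this answer describes: a table partitioned by the date of INSERT_TIMESTAMP, and a materialized view holding the latest INSERT_TIMESTAMP per product. PRODUCT_NAME and the view name dataset1.PRODUCTS_LAST_INSERT are hypothetical; only PRODUCTID and INSERT_TIMESTAMP come from the question, and their types are assumed.

```sql
-- Partitioned version of the table; column types and PRODUCT_NAME are assumptions.
CREATE TABLE dataset1.PRODUCTS (
  PRODUCTID STRING,
  INSERT_TIMESTAMP TIMESTAMP,
  PRODUCT_NAME STRING  -- hypothetical payload column
)
-- One partition per calendar day of INSERT_TIMESTAMP.
PARTITION BY DATE(INSERT_TIMESTAMP);

-- Hypothetical materialized view: latest insert time for every product.
CREATE MATERIALIZED VIEW dataset1.PRODUCTS_LAST_INSERT AS
SELECT PRODUCTID, MAX(INSERT_TIMESTAMP) AS LAST_INSERT_TIMESTAMP
FROM dataset1.PRODUCTS
GROUP BY PRODUCTID;
```

Querying MIN(LAST_INSERT_TIMESTAMP) from the materialized view tells you how far back the range filter on INSERT_TIMESTAMP must reach so that every product's latest row is still included.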

Comment:

Thanks. I ended up partitioning the table on the timestamp. I do a full load once a month, and every other day is an incremental update. Once the full load is done, the older partitions are deleted (to stop the ever-growing volume). My view now gets the latest version (OFFSET(0)) across the entire table without a timestamp condition, since the older partitions have already been dropped.
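The monthly cleanup could look roughly like this, assuming the table is partitioned with PARTITION BY DATE(INSERT_TIMESTAMP) and that every row of the full load carries an INSERT_TIMESTAMP on the full-load date; the date literal is a hypothetical placeholder. Because the filter is on the partitioning column and aligns with day boundaries, the statement removes whole partitions older than the full load:

```sql
-- Run only after the monthly full load has completed successfully.
-- DATE '2024-06-01' is a placeholder for the date of that full load.
DELETE FROM dataset1.PRODUCTS
WHERE DATE(INSERT_TIMESTAMP) < DATE '2024-06-01';
```

If a rolling retention window is acceptable instead of cleaning up right after each full load, the partition_expiration_days table option (set with ALTER TABLE ... SET OPTIONS) lets BigQuery drop old partitions automatically.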