1

We are importing daily data (~600K rows) from Google GA4 into our data warehouse.

We are trying to decide how we are going to store the data. We're considering either a partitioned table or a partitioned view that queries daily tables, with the main goal being to improve query speed.

So far, we've implemented a partitioned view, with a primary key on the id and event_date columns. We were happy to see that the estimated execution plan shows that a query that filters by the date range only runs a clustered index scan on the daily tables that are in the date range specified.

enter image description here

However, we were surprised to see how long it took for a query to return the total number of rows for three days (23 seconds) vs running a query against each day individually (less than 1 second).

We were planning on running a test by creating a partitioned table, but wanted to check to see if there were any recommendations or best practices we should know about.

enter image description here

11
  • The execution plan that you have looks like an estimated plan (neither times nor "estimated vs actual" row counts on the operators). In the case where you actually ran the query, can you collect an actual execution plan to tell which operators are accumulating the most time? Commented May 2, 2023 at 1:18
  • 1
    just create a clustered index on date. partitioning will most likely decrease speed a bit if anything else. Commented May 2, 2023 at 11:39
  • 1
    Hmm, then it might make sense to partition by date and create correlated PK by date too, not for speed only but for maintenance + speed combo. Backuping up the whole enchilada or moving dates in an out will be much faster if you partition. Are you gonna clear up old data? Or what kind of operations are you planning on supporting? Commented May 2, 2023 at 20:42
  • 1
    @MatthewWalk - in re: running multiple times, is the view slow twice in a row? Like if you execute two of your count queries that go against the view in the same batch, do they both run in over a minute? Or does the first run in a minute and the second one run fast? Similarly, are you always selecting from the view and then from the individual tables in your testing? Or do you sometimes change the order? If the latter, is the view always slower? Commented May 2, 2023 at 21:36
  • 1
    180 million is a fair bit, but if it's well indexed it won't matter. What will matter is when you try and manage it: updates/deletes and index rebuilds are going to be problematic. Yes, partitioned tables might be the answer, but we need to see some realistic queries first, showing how you intend to query the table. Please show that (as text not images) along with the full table and index definitions. Commented May 2, 2023 at 22:38

0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.