I have a data warehouse made up of four clustered columnstore index (CCI) tables and nine rowstore tables. These tables are used only for analytics, and the CCI data is inserted from staging tables every 15 minutes. I am looking to optimize query performance by adding partitions and sorting.
All queries of this data are predicated on an integer ID field with about 350 distinct values. The leftmost CCI has 100M records and 125 columns. There are three child CCIs that have that same integer ID field. CCI 2 has 15M records and 150 columns; CCI 3 and CCI 4 each have about 30M records and 25 columns.
Of these 350 distinct IDs, the distribution of record counts in the leftmost table is as follows:
- 5% Greater than 1M
- 46% Greater than 100K
- 83% Greater than 10K
Additionally, there are nine rowstore tables that also join to the CCIs. These have trickle inserts, are children of the CCIs, and all contain the same integer ID field. These rowstore tables have similar or smaller record volumes and fewer than 10 columns each; two contain LOBs, and two frequently undergo mass updates (these updates are also predicated on the ID field).
How many partitions should I make?
Should I partition the rowstore tables also?
Are there important considerations I am overlooking?
Note regarding the "sorting" I mentioned earlier:
A date field in the leftmost CCI is often a secondary predicate in these queries, so I am looking into re-sorting that CCI by date every four weeks or so as maintenance. I would achieve this sort by dropping the CCI, adding a clustered rowstore index on the date, dropping that index, and then re-adding the CCI with MAXDOP=1. I am also looking at sorting the child CCIs by the join key to their parent.
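To make the re-sort steps concrete, here is a minimal T-SQL sketch of that sequence. The table, index, and column names (`dbo.FactParent`, `CCI_FactParent`, `CIX_FactParent_EventDate`, `EventDate`) are hypothetical placeholders, not my real schema, and the sketch assumes the table is not partitioned and can be taken offline for the maintenance window.

```sql
-- Hypothetical names throughout: dbo.FactParent is the leftmost CCI table,
-- EventDate is the date column used as the secondary predicate.

-- Step 1: drop the existing clustered columnstore index (the table becomes a heap).
DROP INDEX CCI_FactParent ON dbo.FactParent;

-- Step 2: build a clustered rowstore index on the date so the rows are
-- physically ordered by EventDate.
CREATE CLUSTERED INDEX CIX_FactParent_EventDate
    ON dbo.FactParent (EventDate);

-- Step 3: drop the rowstore index again, leaving a heap.
DROP INDEX CIX_FactParent_EventDate ON dbo.FactParent;

-- Step 4: re-create the clustered columnstore index on a single thread,
-- so rows are compressed into rowgroups in the order they are read.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactParent
    ON dbo.FactParent
    WITH (MAXDOP = 1);
```

A commonly documented variant skips step 3 and converts the rowstore index directly by running `CREATE CLUSTERED COLUMNSTORE INDEX ... WITH (DROP_EXISTING = ON, MAXDOP = 1)`, which requires reusing the rowstore index's name. I have not confirmed which of the two preserves the date order more reliably.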