Revisions to How many partitions should I make for my clustered columnstore index tables? Should I partition the rowstore tables also?

Tweeted twitter.com/StackDBAs/status/1052982909186048006

occurred Oct 18, 2018 at 18:01

consistent description in the low cardinality integer field

edited Oct 17, 2018 at 15:01


I have a data warehouse comprised of four clustered columnstore index tables (CCI) and nine rowstore tables. These tables are used only for analytics and the CCI data is inserted from staging tables every 15 minutes. I am looking to optimize query performance by adding partitions and sorting.

All queries of this data are predicated on an integer field with about 350 distinct values. The leftmost CCI has 100M records and 125 columns. There are three child CCIs that have that same integer field. CCI 2 has 15M records and 150 columns, CCI 3 and 4 both have about 30M records and 25 columns each.

Of these 350 distinct integers the distribution of record count in the leftmost table is as follows:

5% Greater than 1M
46% Greater than 100K
83% Greater than 10K
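
For a rough sense of scale, the figures above can be set against the maximum size of a columnstore rowgroup (1,048,576 rows). The sketch below is only back-of-the-envelope arithmetic: it assumes rows are spread evenly across partitions, which the skewed distribution above says is not the case, and the function names are hypothetical.

```python
# Back-of-the-envelope partition sizing for a clustered columnstore table.
# Assumption: rows are spread evenly across partitions (the real data is skewed).

ROWGROUP_MAX_ROWS = 1_048_576  # maximum number of rows in one columnstore rowgroup


def avg_rows_per_partition(total_rows: int, partitions: int) -> float:
    """Average row count per partition under an even spread."""
    return total_rows / partitions


def max_partitions_for_full_rowgroups(total_rows: int) -> int:
    """Largest partition count whose average partition still fills one rowgroup."""
    return total_rows // ROWGROUP_MAX_ROWS


total_rows = 100_000_000  # leftmost CCI, from the question
distinct_ids = 350        # distinct values of the integer field, from the question

per_id = avg_rows_per_partition(total_rows, distinct_ids)
fill = per_id / ROWGROUP_MAX_ROWS
ceiling = max_partitions_for_full_rowgroups(total_rows)

print(f"Average rows per partition with one partition per ID: {per_id:,.0f}")
print(f"Fraction of a full rowgroup that represents: {fill:.0%}")
print(f"Partition count at which the average partition is one full rowgroup: {ceiling}")
```

Under that even-spread assumption, one partition per distinct value would leave the average partition well short of a full rowgroup, which is consistent with only 5% of the values having more than 1M records.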

Additionally, there are nine other rowstore tables that also join to the CCIs. These have trickle inserts, are children of the CCIs, and they all contain the same integer field. These rowstores have similar or smaller record volumes, < 10 columns each, two contain LOBS, and two undergo mass-updates frequently (these updates are also predicated on the ID field).

How many partitions should I make?

Should I partition the rowstore tables also?

Are there important considerations I am overlooking?

Note regarding the "sorting" I mentioned earlier:

A date field in the leftmost CCI is often a secondary predicate in these queries, therefore I am looking into re-sorting that CCI by date every four weeks or so as maintenance. I will achieve this sort by dropping the CCI, adding a clustered rowstore index on the date, dropping that index, and then re-adding the CCI with MAXDOP=1. I am also looking at sorting the child CCIs by the join key to their parent.

I have a data warehouse comprised of four clustered columnstore index tables (CCI) and nine rowstore tables. These tables are used only for analytics and the CCI data is inserted from staging tables every 15 minutes. I am looking to optimize query performance by adding partitions and sorting.

All queries of this data are predicated on an integer field with about 350 distinct values. The leftmost CCI has 100M records and 125 columns. There are three child CCIs that have the same distinct ID field. CCI 2 has 15M records and 150 columns, CCI 3 and 4 both have about 30M records and 25 columns each.

Of these 350 distinct IDs the distribution of record count in the leftmost table is as follows:

5% Greater than 1M
46% Greater than 100K
83% Greater than 10K

Additionally, there are nine other rowstore tables that also join to the CCIs. These have trickle inserts, are children of the CCIs, and they all contain the same ID field. These rowstores have similar or smaller record volumes, < 10 columns each, two contain LOBS, and two undergo mass-updates frequently (these updates are also predicated on the ID field).

How many partitions should I make?

Should I partition the rowstore tables also?

Are there important considerations I am overlooking?

Note regarding the "sorting" I mentioned earlier:

A date field in the leftmost CCI is often a secondary predicate in these queries, therefore I am looking into re-sorting that CCI by date every four weeks or so as maintenance. I will achieve this sort by dropping the CCI, adding a clustered rowstore index on the date, dropping that index, and then re-adding the CCI with MAXDOP=1. I am also looking at sorting the child CCIs by the join key to their parent.

I have a data warehouse comprised of four clustered columnstore index tables (CCI) and nine rowstore tables. These tables are used only for analytics and the CCI data is inserted from staging tables every 15 minutes. I am looking to optimize query performance by adding partitions and sorting.

All queries of this data are predicated on an integer field with about 350 distinct values. The leftmost CCI has 100M records and 125 columns. There are three child CCIs that have that same integer field. CCI 2 has 15M records and 150 columns, CCI 3 and 4 both have about 30M records and 25 columns each.

Of these 350 distinct integers the distribution of record count in the leftmost table is as follows:

5% Greater than 1M
46% Greater than 100K
83% Greater than 10K

Additionally, there are nine other rowstore tables that also join to the CCIs. These have trickle inserts, are children of the CCIs, and they all contain the same integer field. These rowstores have similar or smaller record volumes, < 10 columns each, two contain LOBS, and two undergo mass-updates frequently (these updates are also predicated on the ID field).

How many partitions should I make?

Should I partition the rowstore tables also?

Are there important considerations I am overlooking?

Note regarding the "sorting" I mentioned earlier:

A date field in the leftmost CCI is often a secondary predicate in these queries, therefore I am looking into re-sorting that CCI by date every four weeks or so as maintenance. I will achieve this sort by dropping the CCI, adding a clustered rowstore index on the date, dropping that index, and then re-adding the CCI with MAXDOP=1. I am also looking at sorting the child CCIs by the join key to their parent.

edited tags


edited Oct 17, 2018 at 14:28

Hannah Vernon ♦


consistency in table descriptions


edited Oct 17, 2018 at 14:28

Cyndi Baker


I have a data warehouse comprised of four clustered columnstore index tables (CCI) and nine rowstore tables. These tables are used only for analytics and the CCI data is inserted from staging tables every 15 minutes. I am looking to optimize query performance by adding partitions and sorting.

All queries of this data are predicated on an integer field with about 350 distinct values. The leftmost CCI has 100M records and 125 columns. There are three child CCIs that have the same distinct ID field. CCI 2 has 15M records and 150 columns, CCI 3 and 4 both have about 30M records and 25 columns each.

Of these 350 distinct IDs the distribution of record count in the leftmost table is as follows:

5% Greater than 1M
46% Greater than 100K
83% Greater than 10K

Additionally, there are nine other rowstore tables that also join to the CCIs. These have trickle inserts, are children of the CCIs, and they all contain the same ID field. These rowstores have similar or smaller record volumes, < 10 columns each, two contain LOBS, and two undergo mass-updates frequently (these updates are also predicated on the ID field).

How many partitions should I make?

Should I partition the rowstore tables also?

Are there important considerations I am overlooking?

Note regarding the "sorting" I mentioned earlier:

A date field in the leftmost CCI is often a secondary predicate in these queries, therefore I am looking into re-sorting that CCI by date every four weeks or so as maintenance. I will achieve this sort by dropping the CCI, adding a clustered rowstore index on the date, dropping that index, and then re-adding the CCI with MAXDOP=1. I am also looking at sorting the child CCIs by the join key to their parent.

I have a data warehouse comprised of four clustered columnstore index tables (CCI) and nine rowstore tables. These tables are used only for analytics and the CCI data is inserted from staging tables every 15 minutes. I am looking to optimize query performance by adding partitions and sorting.

All queries of this data are predicated on an integer field with about 350 distinct values. The leftmost CCI has about 125 columns and about 100 million records. There are three child CCIs that have the same distinct ID field. CCI 2 has 15M records and 150 columns, CCI 3 and 4 both have about 30M records and 25 columns each.

Of these 350 distinct IDs the distribution of record count in the leftmost table is as follows:

5% Greater than 1M
46% Greater than 100K
83% Greater than 10K

Additionally, there are nine other rowstore tables that also join to the CCIs. These have trickle inserts, are children of the CCIs, and they all contain the same ID field. These rowstores have similar or smaller record volumes, < 10 columns each, two contain LOBS, and two undergo mass-updates frequently (these updates are also predicated on the ID field).

How many partitions should I make?

Should I partition the rowstore tables also?

Are there important considerations I am overlooking?

Note regarding the "sorting" I mentioned earlier:

A date field in the leftmost CCI is often a secondary predicate in these queries, therefore I am looking into re-sorting that CCI by date every four weeks or so as maintenance. I will achieve this sort by dropping the CCI, adding a clustered rowstore index on the date, dropping that index, and then re-adding the CCI with MAXDOP=1. I am also looking at sorting the child CCIs by the join key to their parent.

I have a data warehouse comprised of four clustered columnstore index tables (CCI) and nine rowstore tables. These tables are used only for analytics and the CCI data is inserted from staging tables every 15 minutes. I am looking to optimize query performance by adding partitions and sorting.

All queries of this data are predicated on an integer field with about 350 distinct values. The leftmost CCI has 100M records and 125 columns. There are three child CCIs that have the same distinct ID field. CCI 2 has 15M records and 150 columns, CCI 3 and 4 both have about 30M records and 25 columns each.

Of these 350 distinct IDs the distribution of record count in the leftmost table is as follows:

5% Greater than 1M
46% Greater than 100K
83% Greater than 10K

Additionally, there are nine other rowstore tables that also join to the CCIs. These have trickle inserts, are children of the CCIs, and they all contain the same ID field. These rowstores have similar or smaller record volumes, < 10 columns each, two contain LOBS, and two undergo mass-updates frequently (these updates are also predicated on the ID field).

How many partitions should I make?

Should I partition the rowstore tables also?

Are there important considerations I am overlooking?

Note regarding the "sorting" I mentioned earlier:

A date field in the leftmost CCI is often a secondary predicate in these queries, therefore I am looking into re-sorting that CCI by date every four weeks or so as maintenance. I will achieve this sort by dropping the CCI, adding a clustered rowstore index on the date, dropping that index, and then re-adding the CCI with MAXDOP=1. I am also looking at sorting the child CCIs by the join key to their parent.


asked Oct 17, 2018 at 14:14

Cyndi Baker

