Skip to main content
added details from comments
Source Link
Hannah Vernon
  • 71.1k
  • 22
  • 178
  • 325

After re-running the query that does partitioning with the addition of MAXDOP=1, I am getting sorted segments just like in the non-partitioned test. So my error in leaving out that critical setting for the partitioned test. Now I am contemplating what is the correct logic for inserting new data. I often see this suggested: 1. create Staging Table 2. load Staging Table 3. add clustered index to Staging Table 4. INSERT INTO <Final Table> SELECT <list columns> FROM <Staging Table>

After re-running the query that does partitioning with the addition of MAXDOP=1, I am getting sorted segments just like in the non-partitioned test. So my error in leaving out that critical setting for the partitioned test. Now I am contemplating what is the correct logic for inserting new data. I often see this suggested: 1. create Staging Table 2. load Staging Table 3. add clustered index to Staging Table 4. INSERT INTO <Final Table> SELECT <list columns> FROM <Staging Table>

added 2522 characters in body
Source Link

Below is the SQL pattern that DID NOT partition but DID result in sorted segments

CREATE TABLE dbo.RiskExposure ( ... ); GO INSERT INTO dbo.RiskExposure WITH (TABLOCK) ( ... ) SELECT ... FROM dbo.RiskExposureStage WHERE Close_of_Bus_ > '2020-08-31' GO CREATE CLUSTERED INDEX CCI_RiskExposure ON dbo.RiskExposure ([Close_of_Bus_]); GO CREATE CLUSTERED COLUMNSTORE INDEX CCI_RiskExposure ON dbo.Risk WITH (MAXDOP = 1, DROP_EXISTING = ON); 

Below is a more complete SQL for the test which I ran that DID partition but DID NOT result in sorted segments.

CREATE TABLE dbo.RiskExposure ( ... ); GO INSERT INTO dbo.RiskExposure WITH (TABLOCK) ( ... ) SELECT ... FROM dbo.RiskExposureStage WHERE Close_of_Bus_ > '2020-08-31' -- test with a subset GO CREATE CLUSTERED INDEX CCI_RiskExposure ON dbo.RiskExposure ( [Close_of_Bus_] ) WITH (DATA_COMPRESSION = PAGE) ON ps_QuarterlyPartitionScheme3m( [Close_of_Bus_] ); CREATE CLUSTERED COLUMNSTORE INDEX CCI_RiskExposure ON dbo.RiskExposure WITH (DROP_EXISTING = ON) ON ps_QuarterlyPartitionScheme3m( [Close_of_Bus_] ); GO ALTER INDEX CCI_RiskExposure ON dbo.RiskExposure REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON); -- get stats SELECT partitions.partition_number, column_store_segments.segment_id, column_store_segments.min_data_id, column_store_segments.max_data_id, column_store_segments.row_count FROM sys.column_store_segments INNER JOIN sys.partitions ON column_store_segments.hobt_id = partitions.hobt_id INNER JOIN sys.indexes ON indexes.index_id = partitions.index_id AND indexes.object_id = partitions.object_id INNER JOIN sys.tables ON tables.object_id = indexes.object_id INNER JOIN sys.columns ON tables.object_id = columns.object_id AND column_store_segments.column_id = columns.column_id WHERE tables.name = 'RiskExposure' AND columns.name = 'Close_of_bus_' ORDER BY tables.name, columns.name, partitions.partition_number, column_store_segments.segment_id; 

Here's a subset of the stats for the test where the table was not partitioned. Note that the rowgroups are packed (except for #35) and ordered.

partition_number segment_id min_data_id max_data_id row_count 1 0 737668 737673 1048576 1 1 737673 737677 1048576 1 2 737677 737680 1048576 1 3 737680 737684 1048576 1 4 737684 737686 1048576 1 5 737686 737689 1048576 1 6 737689 737691 1048576 1 7 737691 737693 1048576 1 8 737693 737695 1048576 .... 1 37 737786 737789 1048576 1 38 737789 737794 1048576 1 39 737794 737799 742086 1 40 737799 737799 34421 

Here are the stats for the test where partitions were employed. Note that the rowgroups are not ordered. Note I'm showing just the middle partition for brevity.

partition_number segment_id min_data_id max_data_id row_count 17 0 737698 737721 1048576 17 1 737698 737708 1048576 17 2 737698 737716 1048576 17 3 737698 737711 1048576 17 4 737712 737728 1048576 17 5 737716 737717 324458 17 6 737717 737744 1048576 17 7 737721 737723 1048576 17 8 737708 737768 1048576 17 9 737711 737788 1048576 17 10 737728 737734 1048576 17 11 737744 737747 1048576 17 12 737788 737789 643250 17 13 737723 737726 1048576 17 14 737768 737772 1048576 17 15 737734 737739 1048576 17 16 737747 737750 1048576 17 17 737726 737728 1048576 17 18 737772 737776 1048576 17 19 737728 737728 193276 17 20 737739 737742 1048576 17 21 737750 737752 1048576 17 22 737742 737743 314208 17 23 737776 737779 1048576 17 24 737752 737755 1048576 17 25 737779 737781 1048576 17 26 737755 737757 1048576 17 27 737781 737784 1048576 17 28 737757 737760 1048576 17 29 737784 737786 920474 17 30 737760 737764 686860 17 31 737716 737789 967708 17 32 737728 737743 507484 

Below is a more complete SQL for the test which I ran that DID NOT result in sorted segments

CREATE TABLE dbo.RiskExposure ( ... ); GO INSERT INTO dbo.RiskExposure WITH (TABLOCK) ( ... ) SELECT ... FROM dbo.RiskExposureStage WHERE Close_of_Bus_ > '2020-08-31' -- test with a subset GO CREATE CLUSTERED INDEX CCI_RiskExposure ON dbo.RiskExposure ( [Close_of_Bus_] ) WITH (DATA_COMPRESSION = PAGE) ON ps_QuarterlyPartitionScheme3m( [Close_of_Bus_] ); CREATE CLUSTERED COLUMNSTORE INDEX CCI_RiskExposure ON dbo.RiskExposure WITH (DROP_EXISTING = ON) ON ps_QuarterlyPartitionScheme3m( [Close_of_Bus_] ); GO ALTER INDEX CCI_RiskExposure ON dbo.RiskExposure REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON); -- get stats SELECT partitions.partition_number, column_store_segments.segment_id, column_store_segments.min_data_id, column_store_segments.max_data_id, column_store_segments.row_count FROM sys.column_store_segments INNER JOIN sys.partitions ON column_store_segments.hobt_id = partitions.hobt_id INNER JOIN sys.indexes ON indexes.index_id = partitions.index_id AND indexes.object_id = partitions.object_id INNER JOIN sys.tables ON tables.object_id = indexes.object_id INNER JOIN sys.columns ON tables.object_id = columns.object_id AND column_store_segments.column_id = columns.column_id WHERE tables.name = 'RiskExposure' AND columns.name = 'Close_of_bus_' ORDER BY tables.name, columns.name, partitions.partition_number, column_store_segments.segment_id; 

Below is the SQL pattern that DID NOT partition but DID result in sorted segments

CREATE TABLE dbo.RiskExposure ( ... ); GO INSERT INTO dbo.RiskExposure WITH (TABLOCK) ( ... ) SELECT ... FROM dbo.RiskExposureStage WHERE Close_of_Bus_ > '2020-08-31' GO CREATE CLUSTERED INDEX CCI_RiskExposure ON dbo.RiskExposure ([Close_of_Bus_]); GO CREATE CLUSTERED COLUMNSTORE INDEX CCI_RiskExposure ON dbo.Risk WITH (MAXDOP = 1, DROP_EXISTING = ON); 

Below is a more complete SQL for the test which I ran that DID partition but DID NOT result in sorted segments.

CREATE TABLE dbo.RiskExposure ( ... ); GO INSERT INTO dbo.RiskExposure WITH (TABLOCK) ( ... ) SELECT ... FROM dbo.RiskExposureStage WHERE Close_of_Bus_ > '2020-08-31' -- test with a subset GO CREATE CLUSTERED INDEX CCI_RiskExposure ON dbo.RiskExposure ( [Close_of_Bus_] ) WITH (DATA_COMPRESSION = PAGE) ON ps_QuarterlyPartitionScheme3m( [Close_of_Bus_] ); CREATE CLUSTERED COLUMNSTORE INDEX CCI_RiskExposure ON dbo.RiskExposure WITH (DROP_EXISTING = ON) ON ps_QuarterlyPartitionScheme3m( [Close_of_Bus_] ); GO ALTER INDEX CCI_RiskExposure ON dbo.RiskExposure REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON); -- get stats SELECT partitions.partition_number, column_store_segments.segment_id, column_store_segments.min_data_id, column_store_segments.max_data_id, column_store_segments.row_count FROM sys.column_store_segments INNER JOIN sys.partitions ON column_store_segments.hobt_id = partitions.hobt_id INNER JOIN sys.indexes ON indexes.index_id = partitions.index_id AND indexes.object_id = partitions.object_id INNER JOIN sys.tables ON tables.object_id = indexes.object_id INNER JOIN sys.columns ON tables.object_id = columns.object_id AND column_store_segments.column_id = columns.column_id WHERE tables.name = 'RiskExposure' AND columns.name = 'Close_of_bus_' ORDER BY tables.name, columns.name, partitions.partition_number, column_store_segments.segment_id; 

Here's a subset of the stats for the test where the table was not partitioned. Note that the rowgroups are packed (except for #35) and ordered.

partition_number segment_id min_data_id max_data_id row_count 1 0 737668 737673 1048576 1 1 737673 737677 1048576 1 2 737677 737680 1048576 1 3 737680 737684 1048576 1 4 737684 737686 1048576 1 5 737686 737689 1048576 1 6 737689 737691 1048576 1 7 737691 737693 1048576 1 8 737693 737695 1048576 .... 1 37 737786 737789 1048576 1 38 737789 737794 1048576 1 39 737794 737799 742086 1 40 737799 737799 34421 

Here are the stats for the test where partitions were employed. Note that the rowgroups are not ordered. Note I'm showing just the middle partition for brevity.

partition_number segment_id min_data_id max_data_id row_count 17 0 737698 737721 1048576 17 1 737698 737708 1048576 17 2 737698 737716 1048576 17 3 737698 737711 1048576 17 4 737712 737728 1048576 17 5 737716 737717 324458 17 6 737717 737744 1048576 17 7 737721 737723 1048576 17 8 737708 737768 1048576 17 9 737711 737788 1048576 17 10 737728 737734 1048576 17 11 737744 737747 1048576 17 12 737788 737789 643250 17 13 737723 737726 1048576 17 14 737768 737772 1048576 17 15 737734 737739 1048576 17 16 737747 737750 1048576 17 17 737726 737728 1048576 17 18 737772 737776 1048576 17 19 737728 737728 193276 17 20 737739 737742 1048576 17 21 737750 737752 1048576 17 22 737742 737743 314208 17 23 737776 737779 1048576 17 24 737752 737755 1048576 17 25 737779 737781 1048576 17 26 737755 737757 1048576 17 27 737781 737784 1048576 17 28 737757 737760 1048576 17 29 737784 737786 920474 17 30 737760 737764 686860 17 31 737716 737789 967708 17 32 737728 737743 507484 
added 1815 characters in body
Source Link

Below is a more complete SQL for the test which I ran that DID NOT result in sorted segments

CREATE TABLE dbo.RiskExposure ( ... ); GO INSERT INTO dbo.RiskExposure WITH (TABLOCK) ( ... ) SELECT ... FROM dbo.RiskExposureStage WHERE Close_of_Bus_ > '2020-08-31' -- test with a subset GO CREATE CLUSTERED INDEX CCI_RiskExposure ON dbo.RiskExposure ( [Close_of_Bus_] ) WITH (DATA_COMPRESSION = PAGE) ON ps_QuarterlyPartitionScheme3m( [Close_of_Bus_] ); CREATE CLUSTERED COLUMNSTORE INDEX CCI_RiskExposure ON dbo.RiskExposure WITH (DROP_EXISTING = ON) ON ps_QuarterlyPartitionScheme3m( [Close_of_Bus_] ); GO ALTER INDEX CCI_RiskExposure ON dbo.RiskExposure REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON); -- get stats SELECT partitions.partition_number, column_store_segments.segment_id, column_store_segments.min_data_id, column_store_segments.max_data_id, column_store_segments.row_count FROM sys.column_store_segments INNER JOIN sys.partitions ON column_store_segments.hobt_id = partitions.hobt_id INNER JOIN sys.indexes ON indexes.index_id = partitions.index_id AND indexes.object_id = partitions.object_id INNER JOIN sys.tables ON tables.object_id = indexes.object_id INNER JOIN sys.columns ON tables.object_id = columns.object_id AND column_store_segments.column_id = columns.column_id WHERE tables.name = 'RiskExposure' AND columns.name = 'Close_of_bus_' ORDER BY tables.name, columns.name, partitions.partition_number, column_store_segments.segment_id; 

Below is a more complete SQL for the test which I ran that DID NOT result in sorted segments

CREATE TABLE dbo.RiskExposure ( ... ); GO INSERT INTO dbo.RiskExposure WITH (TABLOCK) ( ... ) SELECT ... FROM dbo.RiskExposureStage WHERE Close_of_Bus_ > '2020-08-31' -- test with a subset GO CREATE CLUSTERED INDEX CCI_RiskExposure ON dbo.RiskExposure ( [Close_of_Bus_] ) WITH (DATA_COMPRESSION = PAGE) ON ps_QuarterlyPartitionScheme3m( [Close_of_Bus_] ); CREATE CLUSTERED COLUMNSTORE INDEX CCI_RiskExposure ON dbo.RiskExposure WITH (DROP_EXISTING = ON) ON ps_QuarterlyPartitionScheme3m( [Close_of_Bus_] ); GO ALTER INDEX CCI_RiskExposure ON dbo.RiskExposure REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON); -- get stats SELECT partitions.partition_number, column_store_segments.segment_id, column_store_segments.min_data_id, column_store_segments.max_data_id, column_store_segments.row_count FROM sys.column_store_segments INNER JOIN sys.partitions ON column_store_segments.hobt_id = partitions.hobt_id INNER JOIN sys.indexes ON indexes.index_id = partitions.index_id AND indexes.object_id = partitions.object_id INNER JOIN sys.tables ON tables.object_id = indexes.object_id INNER JOIN sys.columns ON tables.object_id = columns.object_id AND column_store_segments.column_id = columns.column_id WHERE tables.name = 'RiskExposure' AND columns.name = 'Close_of_bus_' ORDER BY tables.name, columns.name, partitions.partition_number, column_store_segments.segment_id; 
edited title
Link
Loading
Source Link
Loading