Update, having taken partitioning all the way to production:
Choosing the right partitioning for a clustered columnstore index (CCI) is a very bespoke process. If the wrong partitions are chosen, performance and compression will be worse than with no partitioning at all.
Because I was partitioning four CCIs, I took the CCI with the fewest records and divided its record count by 1,048,576 (the ideal CCI rowgroup size). I used that quotient as my proposed number of partitions. Then I ran row-count queries based on that scheme to return the actual row counts per partition, to make sure the records were distributed reasonably evenly. They were. Lucky me.
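Once a candidate partition function exists (it can be created before any table is bound to it), $PARTITION makes that distribution check easy. This is a minimal sketch assuming the POC function defined in the original answer below; dbo.MyFact is a placeholder name for one of the CCIs:

    -- Sketch: row counts per proposed partition, using a partition function
    -- that exists but is not yet bound to the table. dbo.MyFact is a
    -- placeholder name.
    SELECT $PARTITION.MyIntPF(MyInt) AS PartitionNumber,
           COUNT_BIG(*)              AS RowsInPartition
    FROM dbo.MyFact
    GROUP BY $PARTITION.MyIntPF(MyInt)
    ORDER BY PartitionNumber;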
An obstacle appeared: this analysis arrived at the correct number of partitions, but only for production. My lower environments are much smaller, and the chosen level of partitioning sliced their data so fine that I had no full rowgroups at all. With the rowgroups underfilled, compression suffered: the lower databases got bigger while query times stayed the same. IO did go down dramatically, and I had to point to that repeatedly as the gains of this initiative were questioned. Until I got to production, it was hard to prove that partitioning was really going to help.
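For anyone checking the same thing in a small environment, rowgroup fullness per partition is visible in a DMV. A sketch, with dbo.MyFact again a placeholder:

    -- Sketch: rows per rowgroup per partition for one CCI. In my lower
    -- environments, total_rows sat far below the 1,048,576 ideal.
    SELECT partition_number,
           row_group_id,
           state_desc,
           total_rows
    FROM sys.dm_db_column_store_row_group_physical_stats
    WHERE object_id = OBJECT_ID('dbo.MyFact')
    ORDER BY partition_number, row_group_id;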
The Outcome: Partitioning has been a big success in production. IO is way down and my query times have been reduced by 70% or more. I also have more options for managing these tables in small chunks.
Some notes: pick the correct field to partition on; queries that have to navigate a lot of partitions may perform worse than before. Also, I have left room for growth, adding partitions and ranges to my partition function for data that is not there now but will be one day.
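When that future data shows up, the usual pattern is to split a range while it is still empty, which keeps the split a cheap metadata operation. A sketch against the POC objects from the original answer below; the new filegroup name is illustrative:

    -- Sketch: add a partition for MyInt values 501-550 before any rows
    -- with those values exist. [MyInt550fg] is an illustrative filegroup.
    ALTER PARTITION SCHEME [MyIntPS] NEXT USED [MyInt550fg];
    ALTER PARTITION FUNCTION [MyIntPF]() SPLIT RANGE (550);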
Original answer, from local testing only:
Since asking this question I have been doing more research and a POC locally. It was suggested I share this POC in an answer.
In my POC I chose to use a partition function of:
    CREATE PARTITION FUNCTION [MyIntPF] (int)
    AS RANGE LEFT FOR VALUES
        (50, 100, 150, 200, 250, 300, 350, 400, 450, 500);

    CREATE PARTITION SCHEME [MyIntPS]
    AS PARTITION [MyIntPF] TO
        ([MyInt050fg], [MyInt100fg], [MyInt150fg], [MyInt200fg], [MyInt250fg],
         [MyInt300fg], [MyInt350fg], [MyInt400fg], [MyInt450fg], [MyInt500fg],
         [MyInt000fg]);
This function assigns up to 50 MyInt values to each partition, with the trailing [MyInt000fg] partition catching anything above 500 as room for a little growth.
Remember that I have roughly 350 distinct MyInts across the 170M records in the PROD CCIs. David Browne suggested a minimum of 1M records per partition, which makes sense in order to optimize a CCI compressed segment. I am erring larger for two reasons. The first is to avoid creating a 100-partition POC monster. The second is that I presume the 1M applies to each table on the partition scheme. I am partitioning four columnstores, the smallest of which has 25M records; broken into 100 pieces, each partition would hold only about 250K rows and would never achieve a full segment.
In my local development DB I have 2.2M records in the leftmost CCI, and even fewer in the child CCIs. This presents a problem for creating a realistic replication of PROD. I really should prioritize a little extra time to make a big local DB for this, but in the meantime, here are the before/after IO results of the local partitioning. I queried for an aggregate from my leftmost CCI, predicated on a single MyInt value.
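The test query was shaped roughly like this; the table and measure column are placeholders, and only the MyInt equality predicate matters:

    -- Sketch of the before/after test. dbo.MyFact and SomeMeasure are
    -- placeholders; 42 stands in for a single MyInt value.
    SET STATISTICS IO ON;

    SELECT SUM(SomeMeasure)
    FROM dbo.MyFact
    WHERE MyInt = 42;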
Not Partitioned
Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 1548, lob physical reads 0, lob read-ahead reads 44. Segment reads 4, segment skipped 0.
Partitioned
Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 268, lob physical reads 0, lob read-ahead reads 0. Segment reads 1, segment skipped 0.
As expected, SQL Server was able to skip all but one of my partitions in a query with a MyInt equality predicate. (Partition elimination happens before segments are read, which is why the partitioned run shows one segment read rather than several segments skipped.)
I am continuing to work on this and should have time to update here as things progress.