TL;DR: The suggested initial setting of 50 you read about remains a fine place to start. Setting MAXDOP to the number of physical cores in one NUMA node is a good choice for a server like ours, which serves both OLTP and OLAP workloads.
Corollary: SQL Server is really, really good at what it does.
My principal worry with this setting was whether I would inhibit parallel execution on a clustered columnstore index for what should be pretty short queries. Would a setting of 50 cause what should be a sub-1-second query to take a lot more time? Since columnstore indexes scale so well with CPUs, would the 'cost threshold for parallelism' setting just be ignored?
- Q: Will SQL Server even honor 'cost threshold for parallelism' for columnstore indexes?
- A: Yes. When configured with a ridiculous setting of 30,000, parallelism for columnstore indexes was effectively disabled for my workloads. Trying some other, still obscene values (1,500) inhibited parallelism for workloads that nominally took about a second to run, but queries which nominally run in about 10 or more seconds exhibited parallel execution plans.
- Q: Is a default setting of 50, as specified in some checklists out there, a safe value that won't inhibit parallelism for my columnstore-based queries?
- A: Yes, and by a long shot. Even jacking the value up to 500 still allowed parallelism for simple, short (sub-second) columnstore-based queries.
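For reference, both settings discussed here are server-level options changed through `sp_configure`. A minimal sketch, assuming sysadmin rights; the values shown are the ones from my setup, not universal recommendations:

```sql
-- 'show advanced options' must be enabled to change these settings
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- cost threshold for parallelism: the estimated-cost floor below which
-- the optimizer will not consider a parallel plan
EXEC sp_configure 'cost threshold for parallelism', 50;

-- max degree of parallelism: physical cores in one NUMA node (6 on this box)
EXEC sp_configure 'max degree of parallelism', 6;
RECONFIGURE;
```

Both take effect immediately after `RECONFIGURE`; no restart is needed.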
About my server, workload, and results:
- 2x Xeon E2650v2 (2 NUMA nodes, 12 physical cores, 24 HT threads), 384 GB RAM
- MAXDOP configured at 6 (6 physical cores per NUMA node)
- SQL Server 2014 Enterprise CU4
- Testing on a 111,000,000-row clustered columnstore index, in 6 partitions (by year)
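The two test queries were shaped roughly like the following. (The table and column names here are hypothetical stand-ins for illustration, not my real schema; the point is one aggregate grouped on a high cardinality column and one on a low cardinality column.)

```sql
-- High cardinality: group on a column with many distinct values
SELECT CustomerId, COUNT(*) AS Rows, SUM(Amount) AS Total
FROM dbo.FactSales          -- clustered columnstore, partitioned by year
GROUP BY CustomerId;

-- Low cardinality: group on a column with few distinct values
SELECT RegionCode, COUNT(*) AS Rows, SUM(Amount) AS Total
FROM dbo.FactSales
GROUP BY RegionCode;
```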
Two workloads tested: a query against a high cardinality column, and a query against a low cardinality column.
- The high cardinality query took 84 seconds (elapsed) at thresholds over 1,500, and about 14 seconds (elapsed) at thresholds under that number.
- The low cardinality query took about 250 ms (elapsed) at thresholds of 500 and under, and about 18 seconds (elapsed) at thresholds above 1,500. (I didn't try to gauge the exact point at which it switched plans.)
Interestingly enough, when parallelism is inhibited, the total CPU time for the low cardinality query shoots up dramatically; perhaps the server stops using batch mode for this query.
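To see the CPU-versus-elapsed split for yourself, `SET STATISTICS TIME ON` reports the two numbers separately; whether an operator actually ran in batch mode shows up in the actual execution plan (the scan/aggregate operator's "Actual Execution Mode" property). A sketch, using the same hypothetical low cardinality query as above:

```sql
SET STATISTICS TIME ON;   -- reports CPU time and elapsed time separately

-- hypothetical low cardinality query (stand-in names, not my real schema)
SELECT RegionCode, COUNT(*) AS Rows
FROM dbo.FactSales
GROUP BY RegionCode;

SET STATISTICS TIME OFF;
```

When elapsed time is far below CPU time the plan ran in parallel; CPU time ballooning past elapsed time on a serial plan is the row-mode fallback symptom described above.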
Heh, ultimately running tests leads to more questions, but that's all blog-fodder, and goes beyond the scope of this question.