Has the guidance on the setting for 'cost threshold for parallelism' changed with the advent of columnstore indexes?

Question

First off, what I'm not asking. I'm not asking what my setting should be.

Many are recommending upping the value past the default, and I certainly understand why that's the case for B-Tree based queries. But I've been reading about the (almost) linear scalability of in-memory clustered columnstore indexes, and I'm wondering if setting the cost threshold too high could cause SQL Server to starve columnstore-based queries for CPU cores.

So the question is this: Does SQL Server treat columnstore indexes differently when the 'cost threshold for parallelism' is concerned, and should that cause me to change my decision about what my initial setting should be?

Brent Ozar · Accepted Answer · 2014-11-05 14:41:51Z

Beyond the Cost Threshold setting, SQL Server appears to treat parallelism differently for columnstore indexes depending on your SQL Server version (2012 vs 2014) and even the datatypes in your table.

I'd start with Joe Chang's post benchmarking decimal vs float datatypes, and read the comments on that post as well. If you want to get exactly the right MAXDOP and Cost Threshold for Parallelism settings for your system, you'll need to perform the level of detailed testing that Joe does in his post, and that takes a lot of work. Because of that, I would focus on your system's primary bottleneck first - use wait stats to make sure parallelism or CPU pressure are problems for you, and then start by tuning the most CPU-intensive queries rather than making system settings changes.

Joe's got a good starting point there. I'm not really making changes; I'm performing the setup of a brand new AlwaysOn cluster (and finding out all of the creative ways you can conjugate 4 letter verbs starting with the letter F!) At the moment, I'm running most of my setup points from a checklist I found from some dude's SQL Server consultancy. (Thanks, by the way -- it's a great checklist!) — Dave Markle
– Dave Markle, Commented Nov 5, 2014 at 14:58

Dave Markle · Accepted Answer · 2014-11-07 16:13:58Z

TL;DR: The suggested initial setting of 50 you read about remains a fine place to start. MAXDOP of 1 physical core per NUMA node is a good setting for a server like ours which serves both OLTP and OLAP workloads.

Corrolary: SQL Server is really, really good at what it does.

My principal worry with this setting was whether or not I would inhibit parallel execution on a clustered columnstore-based index for what should be pretty short queries. Would a setting of 50 cause what should be a sub-1 second query on to take a lot more time? Since columnstore indexes scale so well with CPUs, would the 'cost threshold for parallelism' setting just be ignored?

Q: Will SQL Server even honor 'cost threshold for parallelism' for columnstore indexes?
A: Yes. When configured with a ridiculous setting of 30,000, parallelism for columnstore indexes was effectively disabled for my workloads. Trying some other, still obscene values (1,500) inhibited parallelism for workloads that nominally took about a second to run, but queries which nominally run in about 10 or more seconds exhibited parallel execution plans.
Q: Is a default setting of 50, as specified in some checklists out there, a safe value that won't inhibit parallelism for my columnstore based queries?
A: Yes, and by a long shot. Even jacking the value up to 500 still allowed parallelism for simple, short (sub-second) columnstore based queries.

About my server, workload, and results:

2x Xeon E2650v2, (2 NUMA nodes, 12 physical cores, 24 HT threads), 384 GB RAM
MAXDOP configured at 6 (6 physical cores per NUMA node)
SQL Server 2014 Enterprise CU4
Testing on 111,000,000 row clustered columnstore index, in 6 partitions (by year)

Two workloads tested:

SELECT COUNT(DISTINCT <low cardinality column>) FROM table;
SELECT COUNT(DISTINCT <high cardinality column>) FROM table;

The query of the high cardinality column took 84 seconds (elapsed) at thresholds over 1500, and about 14 seconds (elapsed) at thresholds under that number. The query of the low cardinality column took about 250ms (elapsed) at thresholds 500 and under, and 18 (elapsed) seconds at thresholds above 1500. (I didn't try to gauge the exact point at which it switched plans.) Interestingly enough, when parallelism is inhibited, the total CPU time for the low cardinality query shoots up dramatically; perhaps the server stops using batch mode for this query.

Heh, ultimately running tests leads to more questions, but that's all blog-fodder, and goes beyond the scope of this question.

I'm un-accepting this. I've done some more testing and have found this to be incorrect. I'll post more when I get some more information. Looks like the threshold of 50 was really deleterious to the performance of columnstore based queries -- it was actually forcing queries to run in Batch mode, causing them to take two orders of magnitude more than they otherwise would. — Dave Markle
– Dave Markle, Commented Nov 19, 2014 at 14:39

Community · Accepted Answer · 2018-11-13 23:33:11Z

I would add to the Joe Chang article that you should check out this one by Paul White, where he discusses a Trace Flag that essentially sets CTFP to 0 for the query you're running. I know it's not exactly what you're looking for, but that along with MAXDOP testing can give you a good idea about if your query going parallel will even be beneficial on column store indexesices. I've been trying it out a bit lately (in dev, I swear) in place of artificially complicating the plan to force parallelism.

Joe Obbish · Accepted Answer · 2020-09-10 19:57:44Z

I'm writing this answer from the point of view of SQL Server 2016+. 50 as a value for CTFP isn't a good fit for all columnstore workloads and I recommend starting around 30 instead. A few reasons for that:

Query costs will be lower than what you are used to due to higher compression rates and batch mode processing.
Queries on columnstore tables often use batch mode processing which has a different model for query parallelism than row mode execution. There is less overhead to parallel queries so parallelism can be a better choice for less expensive queries that would traditionally run best at DOP 1 without batch mode.
Queries on columnstore tables will have fewer parallel zones compared to rowstore queries. SQL Server will reserve fewer parallel worker threads so running out of threads on the server isn't as likely to happen if many parallel queries are running on the server at the same time.
There are significant optimizer cost model limitations with columnstore tables that aren't likely to be resolved soon. Rowgroup elimination and aggregate pushdown aren't factored into optimizer costs. If a query is able to eliminate nearly all rowgroups using elimination then it might be okay for it to run at DOP 1. However, if it can't then you might really want the query to run in parallel. The optimizer will assign the same cost either way.

In production workloads, I've seen queries on columnstore tables in the 30-49 optimizer unit range at MAXDOP 1 that really could have benefited from query parallelism. I can't say how common this is across all workloads, but it's something to be aware of.

Stack Exchange Network

Has the guidance on the setting for 'cost threshold for parallelism' changed with the advent of columnstore indexes?

4 Answers 4

Hot Network Questions

Has the guidance on the setting for 'cost threshold for parallelism' changed with the advent of columnstore indexes?

4 Answers 4

Related

Hot Network Questions