Why is a scan faster than seek for this predicate?

Question

I was able to reproduce a query performance issue that I would describe as unexpected. I'm looking for an answer that's focused on internals.

On my machine, the following query does a clustered index scan and takes about 6.8 seconds of CPU time:

SELECT ID1, ID2 FROM two_col_key_test WITH (FORCESCAN) WHERE ID1 NOT IN ( N'1', N'2',N'3', N'4', N'5', N'6', N'7', N'8', N'9', N'10', N'11', N'12',N'13', N'14', N'15', N'16', N'17', N'18', N'19', N'20' ) AND (ID1 = N'FILLER TEXT' AND ID2 >= N'' OR (ID1 > N'FILLER TEXT')) ORDER BY ID1, ID2 OFFSET 12000000 ROWS FETCH FIRST 1 ROW ONLY OPTION (MAXDOP 1);

The following query does a clustered index seek (only difference is removing the FORCESCAN hint) but takes about 18.2 seconds of CPU time:

SELECT ID1, ID2 FROM two_col_key_test WHERE ID1 NOT IN ( N'1', N'2',N'3', N'4', N'5', N'6', N'7', N'8', N'9', N'10', N'11', N'12',N'13', N'14', N'15', N'16', N'17', N'18', N'19', N'20' ) AND (ID1 = N'FILLER TEXT' AND ID2 >= N'' OR (ID1 > N'FILLER TEXT')) ORDER BY ID1, ID2 OFFSET 12000000 ROWS FETCH FIRST 1 ROW ONLY OPTION (MAXDOP 1);

The query plans are pretty similar. For both queries there are 120000001 rows read from the clustered index:

I am on SQL Server 2017 CU 10. Here is code to create and populate the two_col_key_test table:

drop table if exists dbo.two_col_key_test; CREATE TABLE dbo.two_col_key_test ( ID1 NVARCHAR(50) NOT NULL, ID2 NVARCHAR(50) NOT NULL, FILLER NVARCHAR(50), PRIMARY KEY (ID1, ID2) ); DROP TABLE IF EXISTS #t; SELECT TOP (4000) 0 ID INTO #t FROM master..spt_values t1 CROSS JOIN master..spt_values t2 OPTION (MAXDOP 1); INSERT INTO dbo.two_col_key_test WITH (TABLOCK) SELECT N'FILLER TEXT' + CASE WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) > 8000000 THEN N' 2' ELSE N'' END , ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) , NULL FROM #t t1 CROSS JOIN #t t2;

I am hoping for an answer that does more than call stack reporting. For example, I can see that sqlmin!TCValSSInRowExprFilter<231,0,0>::GetDataX takes significantly more CPU cycles in the slow query compared to the fast one:

Instead of stopping there, I'd like to understand what that is and why there's such a large difference between the two queries.

Why is there a large difference in CPU time for these two queries?

Paul White · Accepted Answer · 2018-10-03 12:10:18Z

Why is there a large difference in CPU time for these two queries?

The scan plan evaluates the following pushed non-sargable (residual) predicate for each row:

[two_col_key_test].[ID1]<>N'1' AND [two_col_key_test].[ID1]<>N'10' AND [two_col_key_test].[ID1]<>N'11' AND [two_col_key_test].[ID1]<>N'12' AND [two_col_key_test].[ID1]<>N'13' AND [two_col_key_test].[ID1]<>N'14' AND [two_col_key_test].[ID1]<>N'15' AND [two_col_key_test].[ID1]<>N'16' AND [two_col_key_test].[ID1]<>N'17' AND [two_col_key_test].[ID1]<>N'18' AND [two_col_key_test].[ID1]<>N'19' AND [two_col_key_test].[ID1]<>N'2' AND [two_col_key_test].[ID1]<>N'20' AND [two_col_key_test].[ID1]<>N'3' AND [two_col_key_test].[ID1]<>N'4' AND [two_col_key_test].[ID1]<>N'5' AND [two_col_key_test].[ID1]<>N'6' AND [two_col_key_test].[ID1]<>N'7' AND [two_col_key_test].[ID1]<>N'8' AND [two_col_key_test].[ID1]<>N'9' AND ( [two_col_key_test].[ID1]=N'FILLER TEXT' AND [two_col_key_test].[ID2]>=N'' OR [two_col_key_test].[ID1]>N'FILLER TEXT' )

The seek plan does two seeking operations:

Seek Keys[1]: Prefix: [two_col_key_test].ID1 = Scalar Operator(N'FILLER TEXT'), Start: [two_col_key_test].ID2 >= Scalar Operator(N'') Seek Keys[1]: Start: [two_col_key_test].ID1 > Scalar Operator(N'FILLER TEXT')

...to match this part of the predicate:

(ID1 = N'FILLER TEXT' AND ID2 >= N'' OR (ID1 > N'FILLER TEXT'))

A residual predicate is applied to rows that pass the seek conditions above (all rows in your example).

However, each inequality is replaced by two separate tests for less than OR greater than:

([two_col_key_test].[ID1]<N'1' OR [two_col_key_test].[ID1]>N'1') AND ([two_col_key_test].[ID1]<N'10' OR [two_col_key_test].[ID1]>N'10') AND ([two_col_key_test].[ID1]<N'11' OR [two_col_key_test].[ID1]>N'11') AND ([two_col_key_test].[ID1]<N'12' OR [two_col_key_test].[ID1]>N'12') AND ([two_col_key_test].[ID1]<N'13' OR [two_col_key_test].[ID1]>N'13') AND ([two_col_key_test].[ID1]<N'14' OR [two_col_key_test].[ID1]>N'14') AND ([two_col_key_test].[ID1]<N'15' OR [two_col_key_test].[ID1]>N'15') AND ([two_col_key_test].[ID1]<N'16' OR [two_col_key_test].[ID1]>N'16') AND ([two_col_key_test].[ID1]<N'17' OR [two_col_key_test].[ID1]>N'17') AND ([two_col_key_test].[ID1]<N'18' OR [two_col_key_test].[ID1]>N'18') AND ([two_col_key_test].[ID1]<N'19' OR [two_col_key_test].[ID1]>N'19') AND ([two_col_key_test].[ID1]<N'2' OR [two_col_key_test].[ID1]>N'2') AND ([two_col_key_test].[ID1]<N'20' OR [two_col_key_test].[ID1]>N'20') AND ([two_col_key_test].[ID1]<N'3' OR [two_col_key_test].[ID1]>N'3') AND ([two_col_key_test].[ID1]<N'4' OR [two_col_key_test].[ID1]>N'4') AND ([two_col_key_test].[ID1]<N'5' OR [two_col_key_test].[ID1]>N'5') AND ([two_col_key_test].[ID1]<N'6' OR [two_col_key_test].[ID1]>N'6') AND ([two_col_key_test].[ID1]<N'7' OR [two_col_key_test].[ID1]>N'7') AND ([two_col_key_test].[ID1]<N'8' OR [two_col_key_test].[ID1]>N'8') AND ([two_col_key_test].[ID1]<N'9' OR [two_col_key_test].[ID1]>N'9')

Rewriting each inequality e.g.:

[ID1] <> N'1' -> [ID1]<N'1' OR [ID1]>N'1'

...is counterproductive here. Collation-aware string comparisons are expensive. Doubling the number of comparisons explains most of the difference in CPU time you see.

You can see this more clearly by disabling the pushing of non-sargable predicates with undocumented trace flag 9130. That will show the residual as a separate Filter, with performance information you can inspect separately:

This will also highlight the slight cardinality misestimate on the seek, which explains why the optimizer chose the seek over the scan in the first place (it expected the seeking portion to eliminate some rows).

While the inequality rewrite may make (possibly filtered) index matching possible (to make the best use of the seeking ability of b-tree indexes), it would be better to subsequently revert this expansion if both halves end up in the residual. You could suggest this as an improvement on the SQL Server feedback site.

Note also that the original ("legacy") cardinality estimation model happens to select a scan by default for this query.

Stack Exchange Network

Why is a scan faster than seek for this predicate?

1 Answer 1

Hot Network Questions

Why is a scan faster than seek for this predicate?

1 Answer 1

Related

Hot Network Questions