
Consider the following script on the 10 GB StackOverflow2010 database. Be warned that it clears the plan cache.

IF NOT EXISTS (SELECT * FROM sys.types WHERE is_table_type = 1 AND name = 'ExampleTvp')
BEGIN
    CREATE TYPE ExampleTvp AS TABLE
    (
        AcceptedAnswerId INT NOT NULL,
        AnswerCount INT NOT NULL,
        Noise INT,
        PRIMARY KEY (AcceptedAnswerId, AnswerCount)
    )
END
GO

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE name = 'IX_AcceptedAnswerId_AnswerCount_LastActivityDate')
BEGIN
    CREATE NONCLUSTERED INDEX IX_AcceptedAnswerId_AnswerCount_LastActivityDate
    ON dbo.Posts (AcceptedAnswerId, AnswerCount, LastActivityDate);
END
GO

CREATE OR ALTER PROC SimpleJoinToTvp (@Param ExampleTvp READONLY)
AS
BEGIN
    SELECT Noise AS [MyVeryUniqueString]
    FROM @Param AS TableParam
    JOIN dbo.Posts
        ON Posts.AcceptedAnswerId = TableParam.AcceptedAnswerId
        AND Posts.AnswerCount = TableParam.AnswerCount
    WHERE Posts.LastActivityDate = '20080101'
    OPTION (MAXDOP 1);
END
GO

DBCC FREEPROCCACHE;
GO

DECLARE @Foo ExampleTvp;

INSERT @Foo
SELECT DISTINCT TOP (20)
    AcceptedAnswerId,
    AnswerCount,
    Noise = 1234
FROM dbo.Posts
WHERE Id % 2 = 0
    AND AnswerCount IS NOT NULL
    AND AcceptedAnswerId IS NOT NULL
ORDER BY AcceptedAnswerId;

EXEC SimpleJoinToTvp @Foo;

SELECT query_plan
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
WHERE CONVERT(NVARCHAR(MAX), query_plan) LIKE '%MyVeryUniqueString%'
    AND CONVERT(NVARCHAR(MAX), query_plan) NOT LIKE '%dm_exec_cached_plans%';

When I run this on SQL Server 2022 with compatibility level 100, the final query returns this plan:

Plan

Observe the estimated row count for each node. The loop thinks that only one row is going to come out, but:

  • The clustered index scan puts 20 rows into the loop
  • The nonclustered index seek puts one row in 20 times, which is 20 rows
  • The join as shown in the plan has no predicate, so it should not be able to reduce the row count.

Annotated plan

How does this make sense? By what algorithm has the estimated execution plan determined that one row will come out of this loop, despite the loop not being a filter and having 20 rows go in from the top and the bottom?

I have seen this problem outside of toy examples like this. For example, I have a production server on compatibility level 100 that has a nested loop that scans the primary key of a table-valued parameter and nested loop joins it to a non-clustered index. Both the inner and outer part of the loop estimate the same number of rows coming out (about half a million), but the loop estimates only one row. I had thought this terrible estimate was the result of an ascending key problem. My hope is that answering the question I have asked here will lead me to a better understanding of how such estimates are made.


1 Answer


Your example includes the predicate:

Posts.LastActivityDate = '20080101' 

There are no rows in the Posts table that match.
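This is easy to confirm directly (a quick check, assuming the same StackOverflow2010 database used in the question):

```sql
-- Count the rows matching the procedure's date predicate.
-- Per the explanation above, this returns 0 on this dataset.
SELECT COUNT(*) AS MatchingRows
FROM dbo.Posts
WHERE LastActivityDate = '20080101';
```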

The estimates on the seek and join result would be closer to zero, but SQL Server uses a floor of 1 row for estimation and plan selection purposes.

Perhaps you were misled by the join being an apply, where the predicates appear on the inner side instead of at the join itself?

It doesn't matter what physical type of join is used; the cardinality of the join result is the same, of course.

Seek on Posts
Posts Index Seek


The other way to think about it is to look at a hash or merge join plan for the same statement:

Hash plan
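To reproduce a plan like this yourself, one option is to run the statement from the procedure with a join hint added (a sketch; `@Foo` is assumed to be declared and populated as in the question's script, and forcing the join type may change other plan choices too):

```sql
-- Same statement as in SimpleJoinToTvp, forced to use a hash join
SELECT Noise AS [MyVeryUniqueString]
FROM @Foo AS TableParam
JOIN dbo.Posts
    ON Posts.AcceptedAnswerId = TableParam.AcceptedAnswerId
    AND Posts.AnswerCount = TableParam.AnswerCount
WHERE Posts.LastActivityDate = '20080101'
OPTION (HASH JOIN, MAXDOP 1);
```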

The estimate on the Posts scan with the date predicate produces an estimate of 1.00245 rows. When converted to an apply (ignoring the other predicates) with 20 rows driving the join, the calculated estimate on the inner side would be 1.00245 / 20 = 0.0501225. This is obviously below the floor of 1 row, so it gets rounded up.
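That arithmetic can be sketched as follows (a minimal illustration using the figures from the plans above; `inner_side_estimate` is a hypothetical helper for exposition, not a real SQL Server function):

```python
# SQL Server never estimates fewer than one row for an operator.
ROW_FLOOR = 1.0

def inner_side_estimate(standalone_estimate, outer_rows):
    """Per-execution estimate on the inner side of an apply:
    the standalone estimate spread across the driving rows,
    clamped to the one-row floor."""
    raw = standalone_estimate / outer_rows
    return max(raw, ROW_FLOOR)

# 1.00245 rows estimated for the date predicate, 20 driving rows:
raw = 1.00245 / 20                        # 0.0501225, below the floor
est = inner_side_estimate(1.00245, 20)    # clamped up to 1.0
print(raw, est)
```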


More generally, you should not expect cardinality consistency across the operators of an execution plan. They do start off consistent when the initial round of estimation is performed, but after that smaller areas of the plan can be explored and replaced with equivalent operations.

These equivalent operations usually come with a new local cardinality estimate. SQL Server makes no effort to ensure the estimates remain consistent after these replacements. After all, there is no general way to decide which statistical estimation is 'better' and so should be preferred over earlier ones.

Remember, estimates help the optimizer make (local) plan element choices; any usefulness to people reading the whole plan is just a happy side effect.
