Consider the following script on the 10 GB StackOverflow2010 database. Be warned that it clears the plan cache.
IF NOT EXISTS (SELECT * FROM sys.types WHERE is_table_type = 1 AND name = 'ExampleTvp') BEGIN CREATE TYPE ExampleTvp AS TABLE ( AcceptedAnswerId INT NOT NULL, AnswerCount INT NOT NULL, Noise INT PRIMARY KEY (AcceptedAnswerId, AnswerCount) ) END GO IF NOT EXISTS (SELECT * FROM sys.Indexes WHERE name = 'IX_AcceptedAnswerId_AnswerCount_LastActivityDate') BEGIN CREATE NONCLUSTERED INDEX IX_AcceptedAnswerId_AnswerCount_LastActivityDate ON dbo.Posts (AcceptedAnswerId, AnswerCount, LastActivityDate); END GO CREATE OR ALTER PROC SimpleJoinToTvp (@Param ExampleTvp READONLY) AS BEGIN SELECT Noise AS [MyVeryUniqueString] FROM @Param AS TableParam JOIN dbo.Posts ON Posts.AcceptedAnswerId = TableParam.AcceptedAnswerId AND Posts.AnswerCount = TableParam.AnswerCount WHERE Posts.LastActivityDate = '20080101' OPTION(MAXDOP 1); END GO DBCC FREEPROCCACHE; GO DECLARE @Foo ExampleTvp; INSERT @Foo SELECT DISTINCT TOP (20) AcceptedAnswerId, AnswerCount, Noise = 1234 FROM dbo.Posts WHERE Id % 2 = 0 AND AnswerCount IS NOT NULL AND AcceptedAnswerId IS NOT NULL ORDER BY AcceptedAnswerId; EXEC SimpleJoinToTvp @Foo; SELECT query_plan FROM sys.dm_exec_cached_plans CROSS APPLY sys.dm_exec_query_plan(plan_handle) WHERE CONVERT(NVARCHAR(MAX), query_plan) LIKE '%MyVeryUniqueString%' AND CONVERT(NVARCHAR(MAX), query_plan) NOT LIKE '%dm_exec_cached_plans%' When I run this on SQL Server 2022 with compatibility level 100, the final query returns this plan. 
Observe the estimated row count for each node. The loop thinks that only one row is going to come out, but:
- The clustered index scan puts 20 loops in
- The nonclustered index seek puts one row in 20 times, which is 20 rows
- The join as shown in the plan has no predicate, so it should not be able to reduce the row count.
How does this make sense? By what algorithm has the estimated execution plan determined that one row will come out of this loop, despite the loop not being a filter and having 20 rows go in from the top and the bottom?
I have seen this problem outside of toy examples like this. For example, I have a production server on compatibility level 100 that has a nested loop that scans the primary key of a table-valued parameter and nested loop joins it to a non-clustered index. Both the inner and outer part of the loop estimate the same number of rows coming out (about half a million), but the loop estimates only one row. I had thought this terrible estimate was the result of an ascending key problem. My hope is that answering the question I have asked here will lead me to a better understanding of how such estimates are made.


