Temp table with less number of rows results in Clustered Index Seek Whereas more rows result in Clustered Index Scan - SQL Server

Question

I have the following table and data:

CREATE TABLE myTable (
    ID INT IDENTITY(1,1) PRIMARY KEY,
    Column1 VARCHAR(50),
    Column2 VARCHAR(50),
    Column3 VARCHAR(50),
    Column4 VARCHAR(50),
    Column5 VARCHAR(50),
    Column6 VARCHAR(50),
    Column7 VARCHAR(50),
    Column8 VARCHAR(50),
    Column9 VARCHAR(50),
    Column10 VARCHAR(50)
)

DECLARE @i INT = 1
DECLARE @j INT = 1
DECLARE @distinct_value_count INT = 20
DECLARE @distinct_value_count_with_more_rows INT = 3
DECLARE @rows_per_distinct_value INT = (20000 - (@distinct_value_count_with_more_rows * 2000)) / (@distinct_value_count - @distinct_value_count_with_more_rows)

WHILE @i <= @distinct_value_count
BEGIN
    DECLARE @current_rows_per_value INT = @rows_per_distinct_value
    IF @i <= @distinct_value_count_with_more_rows
    BEGIN
        SET @current_rows_per_value = @rows_per_distinct_value + 2000
    END
    SET @j = 1
    WHILE @j <= @current_rows_per_value
    BEGIN
        INSERT INTO myTable (Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10)
        VALUES ('Value' + CAST(@i AS VARCHAR(2)),
                'Value' + CAST(@j AS VARCHAR(5)),
                'Value' + CAST(@j + 1 AS VARCHAR(5)),
                'Value' + CAST(@j + 2 AS VARCHAR(5)),
                'Value' + CAST(@j + 3 AS VARCHAR(5)),
                'Value' + CAST(@j + 4 AS VARCHAR(5)),
                'Value' + CAST(@j + 5 AS VARCHAR(5)),
                'Value' + CAST(@j + 6 AS VARCHAR(5)),
                'Value' + CAST(@j + 7 AS VARCHAR(5)),
                'Value' + CAST(@j + 8 AS VARCHAR(5)))
        SET @j = @j + 1
    END
    SET @i = @i + 1
END

Alter Table dbo.myTable Add Column11 varchar(50), Column12 varchar(50)
Alter Table dbo.myTable Add dateModified datetime

Update dbo.myTable set Column11 = Column1, Column12 = Column1

Update Top (10) dbo.myTable Set Column11 = 'Value7' Where Column1 = 'Value1'
Update Top (10) dbo.myTable Set Column12 = 'Value7' Where Column1 = 'Value1'
Update Top (10) dbo.myTable Set Column11 = 'Value6' Where Column1 = 'Value1'
Update Top (10) dbo.myTable Set Column12 = 'Value6' Where Column1 = 'Value1'
Update Top (10) dbo.myTable Set Column11 = 'Value5' Where Column1 = 'Value1'
Update Top (10) dbo.myTable Set Column12 = 'Value5' Where Column1 = 'Value1'

Update dbo.myTable set dateModified = getdate() + ID

CREATE NONCLUSTERED INDEX [Idx_col] ON [dbo].[myTable]
(
    [Column1] ASC,
    [Column11] ASC,
    [Column12] ASC,
    [dateModified] ASC
)
INCLUDE([Column5],[Column6])

I have to filter based on a few columns and return all the columns from the table. In order to do that, I have an index that covers the columns that need to be filtered. I'm breaking down the query into two parts:

1. Get all the primary key rows that satisfy the filter and store them in a temp table. This query makes use of the non-clustered index.
2. Join this temp table back to main table on the primary key column so that the clustered index is used to get all the columns.

However, I'm facing an issue when I try to do this. In the first scenario, I get all the filtered rows into a temp table, and when I join it back to the main table, it does a clustered index scan. In the second scenario, I get only the top 50 rows into the temp table, and when I join this to the main table, it does a clustered index seek. I'm confused as to why this is happening. In both cases, there is no index on the temp table. I would appreciate it if anyone can help me understand what's going on. Thank you!

Scenario 1:

SELECT id
INTO #tmpId
FROM myTable
WHERE Column1 = 'Value1'
  AND Column11 In ('Value1', 'Value5', 'Value6', 'Value7')
  And Column12 In ('Value1', 'Value6')
  And dateModified > dateAdd(day, -5, getdate())

SELECT *
FROM myTable m
JOIN #tmpId t ON m.id = t.id

drop table if exists #tmpId

Execution Plan Scenario 1: https://www.brentozar.com/pastetheplan/?id=rkDAD-aLh

Scenario 2:

SELECT id
INTO #tmpId
FROM myTable
WHERE Column1 = 'Value1'
  AND Column11 In ('Value1', 'Value5', 'Value6', 'Value7')
  And Column12 In ('Value1', 'Value6')
  And dateModified > dateAdd(day, -5, getdate())
Order by dateModified desc
offset 0 rows fetch next 50 rows only

SELECT *
FROM myTable m
JOIN #tmpId t ON m.id = t.id

drop table if exists #tmpId

Scenario 2 Execution Plan: https://www.brentozar.com/pastetheplan/?id=rJVbuWaLh

What's the actual problem or reason for asking the question? As the current answer suggests, this sounds like the database is working as designed and doing a good job of selecting the appropriate plan based on the expected amount of data that will be processed.
– user20042973, Commented Jun 6, 2023 at 22:14

Andy DB Analyst · Accepted Answer · 2023-06-06 20:44:15Z

An index seek is beneficial when retrieving a relatively small amount of data, but it can drastically slow down a query that retrieves a large number of rows. Sometimes inappropriate use of an index seek can slow a query down for hours. That's why the estimated number of rows to be selected is crucial for query optimization.

When you select 50 rows, an index seek works perfectly. But it looks like your data for 5 days is too large for an index seek to be faster than an index scan. That's why the optimizer uses an index scan this time.

You can also try the FORCESEEK and FORCESCAN table hints to compare the two access methods on your data.
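A minimal sketch of that comparison, assuming `myTable` and `#tmpId` from the question still exist (run it before the `drop table if exists #tmpId` statement). `FORCESEEK` and `FORCESCAN` are table hints; the `SET STATISTICS` options print logical reads and elapsed time so the two runs can be compared. The results depend on your data and server, so treat this as an experiment rather than a recommendation to keep either hint.

```sql
-- Assumes myTable and #tmpId already exist, as in the question.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Force seeks against myTable for the join.
SELECT m.*
FROM myTable AS m WITH (FORCESEEK)
JOIN #tmpId AS t ON m.ID = t.id;

-- Force a scan of myTable for the same join.
SELECT m.*
FROM myTable AS m WITH (FORCESCAN)
JOIN #tmpId AS t ON m.ID = t.id;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Compare the logical reads and CPU/elapsed times reported in the Messages tab for the two statements.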

Would it help if I add a clustered index to the temp table? The result set should include all the rows from the temp table, so I'm not sure the time needed to create the index is worth it. Any suggestions?
– lifeisajourney, Commented Jun 6, 2023 at 20:55
You could improve your answer by explaining what the items do, and perhaps link to a reference.
– Rohit Gupta, Commented Jun 6, 2023 at 22:33
A clustered index will only improve a merge join. With a clustered index there is no need to sort the data from #tmpId, but that sort is only 9% of your plan. The index will also increase the insert time into this table.
– Andy DB Analyst, Commented Jun 7, 2023 at 6:32

J.D. · Accepted Answer · 2023-06-07 12:39:57Z

As Andy mentioned, there's nothing wrong with the execution plans you've provided. The SQL Engine is doing its job as expected. Different operations are more efficient depending on the size of the data being processed. Index Seeks are efficient for looking up smaller amounts of data, whereas Index Scans are generally more efficient for larger amounts of data.

Think of an index like a phone book. The names are the data, and it's sorted by LastName, then by FirstName. If you needed to look up a single person's phone number, John Smith for example, the fastest way would be to jump directly to the page with the S... LastNames and then directly to where John is on that page. That is the equivalent of an Index Seek operation.

Now, for a different scenario, let's say you needed to look up every single person's phone number. You could arbitrarily jump to the 'S...' page and then jump to John, then jump to the B page and jump to Mary, and then jump to the G page and jump to Tom and then jump back to the S page and jump to Ralph. All this jumping, in large amounts, has overhead. Since you know you need to read every phone number in the phone book anyway, it would be faster (less overhead) to just start at the very first name on the first page, and read sequentially through the phone book until you get to the very last name on the last page. This would be the equivalent of an Index Scan operation.

That is essentially what your two queries and their execution plans are doing.

The SQL Engine has something called the tipping point, which it uses as a threshold to decide when an Index Scan would be more performant than an Index Seek, based on the cardinality of the data being looked up. But it's a complex algorithm that can't definitively be calculated as a static value, so don't worry about trying to figure out what it is. The SQL Engine generally knows best anyway.
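If you are curious where the plan flips on your own data, a rough experiment is to vary the number of rows in the temp table and watch the actual execution plan. This is only a sketch: `#probe` and `@n` are hypothetical names introduced here, and the row count at which the plan changes will differ by server, data, and statistics.

```sql
-- Hypothetical probe table; not part of the original question.
DROP TABLE IF EXISTS #probe;
CREATE TABLE #probe (id INT NOT NULL);

-- Change @n between runs (for example 50, 500, 5000) and compare plans.
DECLARE @n INT = 50;

INSERT INTO #probe (id)
SELECT TOP (@n) ID
FROM myTable
ORDER BY ID;

SET STATISTICS IO ON;

-- RECOMPILE asks for a fresh plan based on the current row count.
SELECT m.*
FROM myTable AS m
JOIN #probe AS p ON m.ID = p.id
OPTION (RECOMPILE);

SET STATISTICS IO OFF;
DROP TABLE IF EXISTS #probe;
```

Run it with "Include Actual Execution Plan" enabled and note which operator is used against `myTable` for each value of `@n`.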

Would it help if I add a clustered index to the temp table?

There's nothing to help here; as previously mentioned, the SQL Engine is working correctly and giving you the most efficient plans it can find. But if your question is really whether it would change anything, the answer is no, not likely.

The choices it's making with the Tipping Point algorithm are based on the number of rows that need to be looked up. That number won't change just because you add a clustered index to your temp table.
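If you want to verify this yourself, here is a sketch of Scenario 1 with a clustered primary key on the temp table. It assumes the `id` values are unique, which holds here because they come from the primary key of `myTable`. Compare the resulting plan against the original one; as noted in the comments above, the index may remove a sort ahead of a merge join, at the cost of a slower insert.

```sql
DROP TABLE IF EXISTS #tmpId;

-- Temp table created up front so it can carry a clustered primary key.
CREATE TABLE #tmpId (id INT NOT NULL PRIMARY KEY CLUSTERED);

-- Same filter as Scenario 1 in the question.
INSERT INTO #tmpId (id)
SELECT ID
FROM myTable
WHERE Column1 = 'Value1'
  AND Column11 IN ('Value1', 'Value5', 'Value6', 'Value7')
  AND Column12 IN ('Value1', 'Value6')
  AND dateModified > DATEADD(day, -5, GETDATE());

SELECT m.*
FROM myTable AS m
JOIN #tmpId AS t ON m.ID = t.id;

DROP TABLE IF EXISTS #tmpId;
```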

Thank you, J.D.! Your explanations are always very helpful and enlightening! Really appreciate your time and effort.
– lifeisajourney, Commented Jun 7, 2023 at 13:02
