Variable vs string literal speed disparity

Question

I have a simple query that runs very quickly (about 1 second) when I use a string literal in my WHERE clause, such as:

select * from table where theDate >= '6/5/2016'

However, this query (which is not materially different) runs in over 2 minutes:

declare @thisDate as date
set @thisDate = '6/5/2016'
select * from table where theDate >= @thisDate

I tried using OPTION(RECOMPILE) as suggested here, but it does nothing to improve the performance. The column theDate is a datetime, and this database was recently migrated from SQL Server 2005.

There is no index on my theDate column, and the table has just over 1 billion rows. It's a shared resource and I don't want to start indexing without some assurance that it will help.

I find that using logic instead of a variable provides the same performance as a string literal:

select * from table where theDate >= dateadd(dd, -23, getdate())

But if I replace the integer literal in that expression with an integer variable, the performance suffers again.

How can I include a variable and maintain performance?

EDIT

Actual query included by request:

DECLARE @days INT
Set @days = 7
select c.DEBT_KEY, c.new_value, c.CHANGE_DATE
from changes c with (nolock)
where c.C_CODE = 3
  and c.old_value = 4
  and c.CHANGE_DATE >= dateadd(dd, -@days, getdate())

No joins.

Query Plans

With Variable (xml explain plan):

With string literal (xml explain plan):

So I can see the difference is that the variable invokes a Clustered Index Scan (clustered) while the string literal invokes a Key Lookup (clustered)... I will need to refer to Google, because I don't really know anything about the performance pros and cons of these.

EDIT EDIT

This worked (xml explain plan):

DECLARE @days INT
Set @days = 7
select c.DEBT_KEY, c.new_value, c.CHANGE_DATE
from changes c with (nolock)
where c.CHANGE_CODE = 3
  and c.old_value = 4
  and c.CHANGE_DATE >= dateadd(dd, -@days, getdate())
OPTION(OPTIMIZE FOR (@days = 7))

... I don't know why. Also I dislike the solution as it negates my purpose of using a variable, which is to put all variables at the top of the proc in order to mitigate the need to poke around in the code during the inevitable maintenance.

Over 1 billion rows and no index - wow! Searching this lot for a literal should take quite long as well... I'd assume there was some kind of cached result left... Did you compare the execution plans? "I don't want to start indexing without some assurance that it will help": well, I can assure you that an index will help :-)
– Gottfried Lesigang, Commented Jun 29, 2016 at 22:09
Well, there's an index, but not on theDate. I guess I could have been clearer on that :P
– n8., Commented Jun 29, 2016 at 22:12
With so many rows you should use indexes on all columns you want to use in sort, filter or join operations... An index is, simply put, a sorted list of the values of this column. Finding a given value in this list is an extremely fast process (divide in half, look if bigger or smaller, ah, it's bigger! Divide the upper half, and so on...). Once found, all matching values sit together as one block. Now imagine an unsorted heap: your query would have to scan the whole table row by row.
– Gottfried Lesigang, Commented Jun 29, 2016 at 22:19
Your first stop should be the actual query plan: mssqltips.com/sqlservertutorial/2252/…. Any other analysis will have caching clouding the issue. So get actual/estimated query plans for the fast and slow queries and compare. BTW, the "I don't want to start indexing without some assurance that it will help" attitude is an excellent approach: don't just throw indexes at it unless you understand the problem and have captured some before and after measurements.
– Nick.Mc, Commented Jun 30, 2016 at 0:21

Joel Coehoorn · Accepted Answer · 2016-06-30 16:06:55Z

The fast version does a clustered key lookup (can go right to the part of the table where that value is found).

The slow version does a non-clustered seek and then merges that with a clustered index scan (it's having to scan through the whole table).

I see the @thisDate variable is defined as Date type. Is it possible the column is defined as a DateTime? If that were true, any value in @thisDate wouldn't exactly line up with your clustered index, meaning the database would have to check through the whole table, as we see here. The literal, on the other hand, would be interpreted as a DateTime value, which does match your table column type and will work with the index.

If this is right, you can fix things with a very simple change:

declare @thisDate as datetime
set @thisDate = '6/5/2016'

Only 4 characters difference.
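
To confirm the column type before changing anything, it can be read from the catalog. This is only a minimal sketch: it assumes the table and column are changes and CHANGE_DATE, as in the question's actual query, and @cutoff is a hypothetical variable name used for illustration.

```sql
-- Look up the declared type of the filtered column.
-- Assumption: table "changes" and column "CHANGE_DATE", taken from the question's actual query.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'changes'
  AND COLUMN_NAME = 'CHANGE_DATE';

-- If DATA_TYPE comes back as datetime, declare the variable with the same type
-- so the comparison needs no implicit conversion.
-- @cutoff is a hypothetical name, not something from the question.
DECLARE @cutoff datetime;
SET @cutoff = DATEADD(dd, -7, GETDATE());

SELECT c.DEBT_KEY, c.new_value, c.CHANGE_DATE
FROM changes AS c
WHERE c.CHANGE_DATE >= @cutoff;
```

Comparing the actual execution plan of this version against the string-literal version should show whether the type was really the cause.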

You can also try this in conjunction with OPTION(RECOMPILE), and you may want to also take a look at OPTIMIZE FOR UNKNOWN.
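
As a rough sketch of where those two hints go, assuming the table and column names from the question's actual query (the code-column filter is left out here because the question spells that column two different ways):

```sql
DECLARE @days int;
SET @days = 7;

-- Variant 1: recompile this statement on every execution, which gives the
-- optimizer a chance to use the current value of @days when estimating rows.
SELECT c.DEBT_KEY, c.new_value, c.CHANGE_DATE
FROM changes AS c
WHERE c.old_value = 4
  AND c.CHANGE_DATE >= DATEADD(dd, -@days, GETDATE())
OPTION (RECOMPILE);

-- Variant 2: ask the optimizer to estimate from average statistics rather
-- than from any specific value. Requires SQL Server 2008 or later.
SELECT c.DEBT_KEY, c.new_value, c.CHANGE_DATE
FROM changes AS c
WHERE c.old_value = 4
  AND c.CHANGE_DATE >= DATEADD(dd, -@days, GETDATE())
OPTION (OPTIMIZE FOR UNKNOWN);
```

Neither hint is guaranteed to change the plan. For a local variable the optimizer typically already estimates without a specific value, so the second variant in particular may make no difference. Compare the actual execution plans before relying on either.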

This was suggested in the comments; it does not speed up the query. ~8000 rows returned in 2 minutes.
