Simple Inner join suggesting an Include index

Question

I have this simple inner join query and its execution plan. The master table has around 34K records and the detail table has around 51K records, but this simple query suggests adding an index with INCLUDE (containing all the master-table columns I included in the SELECT). I wasn't expecting this. What could be the reason, and what is the remedy?

    DECLARE @StartDrInvDate Date = '2017-06-01', @EndDrInvDate Date = '2017-08-31';

    SELECT Mastertbl.DrInvoiceID, Mastertbl.DrInvoiceNo, Mastertbl.DistributorInvNo, PreparedBy,
           detailtbl.BatchNo, detailtbl.Discount, detailtbl.TradePrice, detailtbl.IssuedUnits, detailtbl.FreeUnits
    FROM scmDrInvoices Mastertbl
    INNER JOIN scmDrInvoiceDetails detailtbl
        ON Mastertbl.DrInvoiceID = detailtbl.DrInvoiceID
    WHERE (Mastertbl.DrInvDate BETWEEN @StartDrInvDate AND @EndDrInvDate)

What I'm really curious about is why it suggests this index; I don't normally see this behavior with larger tables.

With that number of rows you can safely ignore the suggestion, provided you are happy with the current performance. — Serg, Aug 15, 2017 at 9:49
Just because SQL Server asked for that doesn't mean you should create it. It bases its assumptions on the execution plan it used for that particular query. Try adding a nonclustered index on Mastertbl.DrInvoiceID that includes Mastertbl.DrInvDate. Look at the query and experiment with indexes on it :) — dbajtr, Aug 15, 2017 at 9:51
@Serg This query is actually part of a larger query that was taking some time, and that query was suggesting this index. When I remove all the other joins so that only this part is left, it still suggests the same index. — Zia, Aug 15, 2017 at 9:58
If performance is an issue, then I'd definitely try following the suggestion. Why the optimizer doesn't suggest a covering index for other queries depends entirely on those queries. — Serg, Aug 15, 2017 at 11:02

Gordon Linoff · Accepted Answer · 2017-08-15 10:40:16Z

For this query:

    SELECT m.DrInvoiceID, m.DrInvoiceNo, m.DistributorInvNo, PreparedBy,
           d.BatchNo, d.Discount, d.TradePrice, d.IssuedUnits, d.FreeUnits
    FROM scmDrInvoices m
    INNER JOIN scmDrInvoiceDetails d
        ON m.DrInvoiceID = d.DrInvoiceID
    WHERE m.DrInvDate BETWEEN @StartDrInvDate AND @EndDrInvDate;

I would expect the basic indexes to be scmDrInvoices(DrInvDate, DrInvoiceID) and scmDrInvoiceDetails(DrInvoiceID). These indexes would allow the query engine to quickly identify the rows in the master table that match the WHERE clause and then look up the corresponding rows in scmDrInvoiceDetails.
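A minimal sketch of those two indexes in T-SQL, assuming both tables live in the dbo schema; the index names are made up for illustration:

```sql
-- Hypothetical index names; assumes dbo.scmDrInvoices and dbo.scmDrInvoiceDetails
-- exist with the columns used in the question's query.

-- Seek on the date range first; DrInvoiceID is carried in the key for the join.
CREATE NONCLUSTERED INDEX IX_scmDrInvoices_DrInvDate_DrInvoiceID
    ON dbo.scmDrInvoices (DrInvDate, DrInvoiceID);

-- Lets the engine find each invoice's detail rows with a seek on the join key.
CREATE NONCLUSTERED INDEX IX_scmDrInvoiceDetails_DrInvoiceID
    ON dbo.scmDrInvoiceDetails (DrInvoiceID);
```

SQL Server does not create an index for a foreign key automatically, so the detail-side index is worth checking for even if a foreign key on DrInvoiceID already exists.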

The rest of the columns could then be added as included columns to the appropriate index, so that the indexes cover the query. "Cover" means that every column the query needs is in the index, so the query plan does not need to refer to the original data pages.
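A sketch of the covering variant. It assumes PreparedBy belongs to scmDrInvoices (the query doesn't qualify it) and uses made-up index names; these would replace, not sit alongside, key-only indexes on the same columns. The key columns stay narrow, and INCLUDE stores the remaining SELECT-list columns only at the leaf level of the index:

```sql
-- Hypothetical names; assumes PreparedBy is a column of dbo.scmDrInvoices.
-- Key: what the query searches and joins on. INCLUDE: what it only returns.
CREATE NONCLUSTERED INDEX IX_scmDrInvoices_DrInvDate_Covering
    ON dbo.scmDrInvoices (DrInvDate, DrInvoiceID)
    INCLUDE (DrInvoiceNo, DistributorInvNo, PreparedBy);

-- Detail side: seek on the join key and carry the output columns,
-- so no lookup into the base table is needed for these rows.
CREATE NONCLUSTERED INDEX IX_scmDrInvoiceDetails_DrInvoiceID_Covering
    ON dbo.scmDrInvoiceDetails (DrInvoiceID)
    INCLUDE (BatchNo, Discount, TradePrice, IssuedUnits, FreeUnits);
```

If DrInvoiceID is the clustered key of scmDrInvoices, it is already carried in every nonclustered index on that table; listing it in the key only makes the intent explicit.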

The above strategy is what SQL Server is suggesting.

Caius Jard · Accepted Answer · 2017-08-15 10:47:22Z

You can perhaps see the logic behind suggesting an index on the invoice date: the optimizer has compared the number of rows you want with the number of rows it thinks the table currently holds, and concluded that an index on that column is selective enough to be worth having. If you want 3 rows out of 55,000, and you want them every 5 minutes forever, it makes sense to index, especially if the table's growth rate means that next year it'll be 3 rows out of 5.5 million.
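To see roughly the ratio being reasoned about for the question's date range, a check like the following counts the rows the filter keeps against the table total. It is only a sketch assuming dbo.scmDrInvoices; the optimizer itself works from statistics, not from an exact count like this:

```sql
DECLARE @StartDrInvDate date = '2017-06-01',
        @EndDrInvDate   date = '2017-08-31',
        @RowsInRange    int,
        @TotalRows      int;

-- COUNT over a CASE with no ELSE counts only the rows inside the date range.
SELECT @RowsInRange = COUNT(CASE WHEN DrInvDate BETWEEN @StartDrInvDate AND @EndDrInvDate THEN 1 END),
       @TotalRows   = COUNT(*)
FROM dbo.scmDrInvoices;

-- The smaller the percentage, the more a seek on DrInvDate pays off
-- compared with scanning the whole table.
SELECT @RowsInRange AS RowsInRange,
       @TotalRows   AS TotalRows,
       CAST(100.0 * @RowsInRange / NULLIF(@TotalRows, 0) AS decimal(5, 2)) AS PercentInRange;
```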

The INCLUDE recommendation is perhaps more naive: it associates enough additional data with the indexed values that the entire dataset demanded from the master table can be answered from the index, without touching the table. An index is essentially a set of pointers to rows in a table; once the query engine has used the index to locate all the rows it needs, it still has to go to the table to fetch the data you asked for (in SQL Server plans this shows up as a Key Lookup or RID Lookup). Including data in an index removes the need to go to the table. That is sensible sometimes, but not always: creating many indexes that essentially replicate most or all of a table's data for seldom-run queries is a waste of disk space.

Consider, too, that how often you are running this query right now, in a debugging tool, affects SQL Server's opinion of how often the query is used. I routinely see my SQL Azure portal making index recommendations because the devs run a query over and over while debugging it, when I actually know that in production the query will run once a month. So I discard the recommendation to create an index that includes most of the table; a straight index on only the searched columns will do fine, no INCLUDE necessary.
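On SQL Server, the counters behind these suggestions can be inspected through the missing-index DMVs. A sketch for the current database; if user_seeks for a suggestion keeps climbing while you debug, much of that may be your own test runs rather than real workload:

```sql
-- Missing-index suggestions for the current database, with the usage
-- counters the optimizer has accumulated for each one. The counters are
-- not persisted, so they start over after an instance restart.
SELECT
    OBJECT_NAME(d.object_id, d.database_id) AS TableName,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,        -- seeks by user queries that could have used the index
    s.user_scans,
    s.last_user_seek,
    s.avg_user_impact    -- estimated % cost reduction if the index existed
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY s.user_seeks + s.user_scans DESC;
```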

These recommendations therefore shouldn't be heeded blindly, because SQL Server cannot know what you intend to use this query, or similar queries, for in real-world applications. Index creation and maintenance should be done carefully and thoughtfully. For example, this query may ask for one index and another query may want an index on a different column, but it might make sense to create a single index keyed on both columns (in a particular order) and then, in whichever query searches on the second key column, add a predicate on the first key column whether or not that query logically needs it.

For example, suppose your invoices table has a column indicating whether an invoice is paid, and somewhere else in your app another query counts the unpaid invoices. You can either have two indexes, one on invoice date (for this query) and one on status (for that query), or a single index on both columns (status, date) and, in this query, predicates of WHERE status = 'unpaid' AND date BETWEEN ..., even though the status predicate is redundant. Why might it be redundant? Suppose you know you'll only ever select invoices from last week that have not been sent out yet, so they can only ever be unpaid. This is what I mean by "be thoughtful about indexing": you know lots about your app that SQL Server can never figure out. By adding the redundant status predicate to the "get invoices from last week" query, you allow the query engine to use an index that is ordered first by status, then by date. This means you only have to maintain one index, and it can be used by two queries.
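A sketch of that idea against the question's tables. The InvoiceStatus column, its 'unpaid' value, and the index name are all hypothetical; the point is that both queries can seek on one (InvoiceStatus, DrInvDate) index:

```sql
-- Hypothetical: assumes dbo.scmDrInvoices has an InvoiceStatus column.
CREATE NONCLUSTERED INDEX IX_scmDrInvoices_InvoiceStatus_DrInvDate
    ON dbo.scmDrInvoices (InvoiceStatus, DrInvDate);

-- Query 1: count unpaid invoices; seeks on the leading key column.
SELECT COUNT(*) AS UnpaidInvoices
FROM dbo.scmDrInvoices
WHERE InvoiceStatus = 'unpaid';

-- Query 2: the date-range query. InvoiceStatus = 'unpaid' is assumed to be
-- logically redundant here (business rule: every invoice in range is unpaid),
-- but it lets the optimizer seek on (InvoiceStatus, DrInvDate) instead of
-- scanning, since DrInvDate is only the second key column.
DECLARE @StartDrInvDate date = '2017-06-01',
        @EndDrInvDate   date = '2017-08-31';

SELECT m.DrInvoiceID, m.DrInvoiceNo, m.DistributorInvNo, m.PreparedBy,
       d.BatchNo, d.Discount, d.TradePrice, d.IssuedUnits, d.FreeUnits
FROM dbo.scmDrInvoices AS m
INNER JOIN dbo.scmDrInvoiceDetails AS d
    ON m.DrInvoiceID = d.DrInvoiceID
WHERE m.InvoiceStatus = 'unpaid'
  AND m.DrInvDate BETWEEN @StartDrInvDate AND @EndDrInvDate;
```

The redundant predicate is only safe while the business rule behind it holds; if invoices in the range could ever be paid, it would silently drop rows.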

Index maintenance and the logic of index creation can be a full-time job. ;)
