
I have a SQL Server table with over 11 million records. These records are organized by "Category" and "Platform". I am stumped by the following scenario ...

SELECT COUNT(*) FROM TableName WHERE Category = 'session' AND Platform = 'windows'; -- Returns 1261500
SELECT COUNT(*) FROM TableName WHERE Category = 'session' AND Platform = 'linux';   -- Returns 1890599

So there are over 600K more records associated with 'linux' than 'windows'.
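Both counts can be gathered in a single pass with GROUP BY. This is a sketch using the question's placeholder names (TableName, Category, Platform); the dbo schema is assumed from the index definition further down.

```sql
-- Row counts for every Platform within the 'session' Category, in one scan.
-- dbo.TableName, Category and Platform are the placeholder names from the question.
SELECT Platform,
       COUNT(*) AS RecordCount
FROM dbo.TableName
WHERE Category = 'session'
GROUP BY Platform
ORDER BY RecordCount DESC;
```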

However, this query returns in 6-9 seconds ...

SELECT MAX(id) FROM TableName WHERE Category = 'session' AND Platform = 'linux'; 

Yet this one I have to kill after waiting over 13 minutes for a result ...

SELECT MAX(id) FROM TableName WHERE Category = 'session' AND Platform = 'windows'; 

Oh ... I also have the following index on the table ...

CREATE NONCLUSTERED INDEX [IX_TableName_CategoryPlatform] ON [dbo].[TableName]
(
    [Platform] ASC,
    [Category] ASC,
    [CreateDate] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Whiskey, Tango, Foxtrot?

Why does the search term make a difference, particularly since there is an index in place?
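One way to see where the time goes is to have SQL Server report logical reads and CPU/elapsed time for each statement (SSMS shows them on the Messages tab). A minimal sketch using the question's placeholder names (dbo.TableName, id); keep in mind the 'windows' query ran for over 13 minutes, and the figures are only printed once a statement completes.

```sql
-- Report per-statement logical/physical reads and CPU/elapsed time.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The fast and the slow variant from the question, run back to back.
SELECT MAX(id) FROM dbo.TableName WHERE Category = 'session' AND Platform = 'linux';
SELECT MAX(id) FROM dbo.TableName WHERE Category = 'session' AND Platform = 'windows';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```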

UPDATE

I have just made the following observation ...

SELECT MAX(id) FROM TableName WHERE Platform = 'windows'; 

By dropping the Category from the query, the response is returned very quickly ...

UPDATE 2

I have created a couple of execution plans as requested. The thing I noticed, however, is that the percentages in the plans generated by the "Paste The Plan" utility differ from what I am getting in SSMS, so below each link I am including the percentages I see in Management Studio.

For the following Query (which works) ...

SELECT MAX([MessageID]) [MaxID] FROM [BoothComm].[UniversalMessageQueue] WHERE [MessagePlatform]='windows'; 

https://www.brentozar.com/pastetheplan/?id=Sk9q59CqZ

  • 0% : Select
  • 0% : Stream Aggregate
  • 0% : Top
  • 100% : Index Scan

For the next query (which doesn't work), I can only provide an ESTIMATED execution plan.

SELECT MAX(MessageID) AS [MaxID] FROM BoothComm.UniversalMessageQueue WHERE MessageCategory = 'session' AND MessagePlatform = 'windows' 

https://www.brentozar.com/pastetheplan/?id=r1zqnq09-

  • 0% : Select
  • 0% : Stream Aggregate
  • 0% : Top
  • 0% : Nested Loops (Inner Join) -- Why is this there??
  • 21% : Index Scan
  • 79% : Key Lookup -- Also new and seems to want to take up more time than anything else
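A Key Lookup means that, for each row the Index Scan returns, SQL Server goes back to the clustered index to fetch a column the scanned index does not contain, which is why it dominates the cost here. Following the comment that suggests adding ID as an INCLUDE column, a covering index could look roughly like this. The index name is hypothetical, and whether MessageID is already the clustering key (in which case every nonclustered index carries it anyway) isn't confirmed in the thread.

```sql
-- Hypothetical covering index: both filter columns are keys, and MessageID is
-- stored at the leaf level via INCLUDE, so the query needs no Key Lookup.
CREATE NONCLUSTERED INDEX IX_UMQ_Platform_Category_InclMessageID -- hypothetical name
ON BoothComm.UniversalMessageQueue (MessagePlatform, MessageCategory)
INCLUDE (MessageID);
```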

(thanks for all the help!)

UPDATE 3

So after all of the below conversation and changes made I am still left with the question ...

Why does this query return in under 1 second (thanks to adding the ID to the index) ...

SELECT MAX(MessageID) AS [MaxID] FROM BoothComm.UniversalMessageQueue WHERE MessagePlatform = 'linux' AND MessageCategory = 'accounting' 

And this one takes 13-22 seconds to run ...

SELECT MAX(MessageID) AS [MaxID] FROM BoothComm.UniversalMessageQueue WHERE MessagePlatform = 'windows' AND MessageCategory = 'accounting' 

Same table, same indexes, and the execution plans are exactly the same. Everything is identical except for the MessagePlatform value, and the value responsible for the latency appears on fewer records than the other.
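One variation worth testing (an assumption, not a fix confirmed in the thread): if MessageID is only an INCLUDE column and not the clustering key, the rows for one (MessagePlatform, MessageCategory) pair are not ordered by MessageID, so finding the MAX means reading every matching row. Making MessageID the trailing key column keeps each pair's rows sorted by MessageID, which lets SQL Server seek to the end of that range and read a single row. The index name is hypothetical.

```sql
-- Hypothetical index: MessageID as the trailing KEY column (not just INCLUDE),
-- so rows for each (MessagePlatform, MessageCategory) pair are sorted by MessageID.
CREATE NONCLUSTERED INDEX IX_UMQ_Platform_Category_MessageID -- hypothetical name
ON BoothComm.UniversalMessageQueue (MessagePlatform, MessageCategory, MessageID);

-- With equality predicates on both leading columns, MAX(MessageID) can be
-- answered by a backward seek that reads the last row of the range.
SELECT MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE MessagePlatform = 'windows' AND MessageCategory = 'accounting';
```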

  • Is id your clustering key? Commented Sep 19, 2017 at 13:07
  • What does the execution plan show, and does it use your index? Commented Sep 19, 2017 at 13:08
  • How about if you add ID as an INCLUDE to the index? Commented Sep 19, 2017 at 13:11
  • Share your execution plans using Paste The Plan at brentozar.com; here are the instructions: How to Use Paste the Plan. Commented Sep 19, 2017 at 13:18
  • Also, share your table schema. Commented Sep 19, 2017 at 13:23
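To answer the first comment (whether id is the clustering key) and share the index layout, a query against the catalog views can list each index on the table with its key and included columns. The table name is taken from Update 2; adjust it as needed.

```sql
-- List every index on the table, its type, and its key / included columns.
SELECT i.name               AS index_name,
       i.type_desc          AS index_type,   -- HEAP, CLUSTERED or NONCLUSTERED
       ic.key_ordinal,                       -- 0 for INCLUDE columns
       ic.is_included_column,
       c.name               AS column_name
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id  = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'BoothComm.UniversalMessageQueue')
ORDER BY i.index_id, ic.is_included_column, ic.key_ordinal;
```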

1 Answer


Your queries are slow because the table is not normalized. You should not be storing Category and Platform as strings on every record. Instead, they should be in lookup tables with an integer primary key. These keys would then be stored in your main table, with appropriate nonclustered indexes on each one. Then you should add a clustered index to your main table on a column that makes sense to have sorted in ascending order (preferably a unique integer).
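A minimal sketch of the layout this answer describes, assuming SQL Server 2008 or later (for datetime2). All table, column and constraint names are hypothetical.

```sql
-- Hypothetical lookup tables with integer surrogate keys.
CREATE TABLE dbo.Platform (
    PlatformID   int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Platform PRIMARY KEY,
    PlatformName varchar(50)       NOT NULL CONSTRAINT UQ_Platform_Name UNIQUE
);

CREATE TABLE dbo.Category (
    CategoryID   int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Category PRIMARY KEY,
    CategoryName varchar(50)       NOT NULL CONSTRAINT UQ_Category_Name UNIQUE
);

-- Main table stores the integer keys; it is clustered on a unique, ascending integer.
CREATE TABLE dbo.MessageQueue (
    MessageID  bigint IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_MessageQueue PRIMARY KEY CLUSTERED,
    PlatformID int NOT NULL
        CONSTRAINT FK_MessageQueue_Platform REFERENCES dbo.Platform (PlatformID),
    CategoryID int NOT NULL
        CONSTRAINT FK_MessageQueue_Category REFERENCES dbo.Category (CategoryID),
    CreateDate datetime2 NOT NULL
);

-- One nonclustered index per foreign key column, as the answer suggests.
CREATE NONCLUSTERED INDEX IX_MessageQueue_PlatformID ON dbo.MessageQueue (PlatformID);
CREATE NONCLUSTERED INDEX IX_MessageQueue_CategoryID ON dbo.MessageQueue (CategoryID);
```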

As to the actual problem you are encountering: if you have no clustered index defined, the data is stored in a heap (i.e. an unsorted pile of data). The index you have will help, but performance is hampered by the fact that you are using strings as keys, and from the looks of it these strings are not highly selective (many repeats). SQL Server may simply be deciding to do a full scan to answer your query, because it estimates that will be faster than any other method.
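Since the plan choice rests on the optimizer's row estimates, one thing to check is the statistics behind the question's index. A sketch using the question's placeholder names; whether stale or sampled statistics are actually the cause here isn't established in the thread, and a FULLSCAN on an 11-million-row table will take a while.

```sql
-- Show the histogram SQL Server uses to estimate row counts for the index's
-- leading column (Platform in the question's index definition).
DBCC SHOW_STATISTICS ('dbo.TableName', IX_TableName_CategoryPlatform) WITH HISTOGRAM;

-- Rebuild those statistics from every row rather than a sample,
-- then compare the execution plans again.
UPDATE STATISTICS dbo.TableName IX_TableName_CategoryPlatform WITH FULLSCAN;
```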


3 Comments

While I acknowledge and agree with the normalization practices you are describing, this table contains "snapshots" in time. The problem is that referenced values can change, but these records cannot. So while 5 may be more efficient than 'windows', 5 may not always refer to 'windows', and this record must always reflect 'windows'.
Also, I DO have a clustered index defined (the primary key). It is just not involved in this query. Given that only one clustered index is allowed on a table, nonclustered indexes are the only remaining option.
You could handle the history of the changes in the lookup table. Create new lookups to add new values to the system, and add datestamps as well to record when the changes took place. As for the clustered index, it is involved behind the scenes even when a query doesn't reference it: each nonclustered index stores the clustering key in order to point back to the original record. It would help if you showed the table definition.
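A minimal sketch of the append-only lookup this comment describes: values are never updated in place, and each row carries a datestamp. All names, including the 'windows-server' value, are hypothetical.

```sql
-- Hypothetical append-only lookup: once written, a row's meaning never changes,
-- so snapshot records that store PlatformID keep resolving to the original value.
CREATE TABLE dbo.PlatformLookup (
    PlatformID   int IDENTITY(1,1) NOT NULL CONSTRAINT PK_PlatformLookup PRIMARY KEY,
    PlatformName varchar(50)       NOT NULL,
    CreatedAt    datetime2(0)      NOT NULL
        CONSTRAINT DF_PlatformLookup_CreatedAt DEFAULT SYSUTCDATETIME()
);

INSERT INTO dbo.PlatformLookup (PlatformName) VALUES ('windows');

-- A "rename" is a new row with its own ID and datestamp, not an UPDATE;
-- records already pointing at the old ID still resolve to 'windows'.
INSERT INTO dbo.PlatformLookup (PlatformName) VALUES ('windows-server');
```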
