3

Suppose that I have a table with many columns that I don't care about, but two that I do: Primary and Secondary. There is a clustered primary key on Primary.

CREATE TABLE [dbo].[OnlyTwoGoodColumnsButManyBad] ( [Primary] INT PRIMARY KEY CLUSTERED, [Secondary] NVARCHAR(500) NOT NULL, [Evil1] NVARCHAR(MAX), [Evil2] NVARCHAR(MAX), [Evil3] NVARCHAR(MAX), [Evil4] NVARCHAR(MAX), [Evil5] NVARCHAR(MAX) ); 

Because I don't care about any of the other columns, I decide that I want this index:

CREATE NONCLUSTERED INDEX [IX_Primary_Secondary] ON [dbo].[OnlyTwoGoodColumnsButManyBad] ( [Primary], [Secondary] ); 

Given that this index's key list contains the primary key, there is no doubt that the key list is unique. Thus, I am tempted to make it UNIQUE.

CREATE UNIQUE NONCLUSTERED INDEX [UIX_Primary_Secondary] ON [dbo].[OnlyTwoGoodColumnsButManyBad] ( [Primary], [Secondary] ); 

But does this have any benefits? And what about the exact same case, but with the Secondary as the leading index key followed by Primary?

0

2 Answers 2

10

Uniqueness and Performance

Arguments FOR Declaring Uniqueness

It's typically better to provide more information to the optimizer rather than less. If you know that an index is unique, it's best to declare it that way. This can help the optimizer, as well as providing clarity about your data to actual people looking at the schema definitions.

This can have a direct impact on performance. See Paul White's article Enforcing Uniqueness for Performance for one such example.

Arguments AGAINST Declaring Uniqueness

As far as the specifics of this question, I tried to adapt your nonclustered index scenario to the demo in Paul's article, and it appears that singleton seeks are used whether the nonclustered index is declared unique or not. I've included the modified demo code at the end of this answer.

Based on that, it appears the optimizer is able to infer uniqueness by the inclusion of the UNIQUE key from the clustered index. This concept is more thoroughly confirmed and fleshed out in this Q&A: Should I mark a composite index as unique if it contains the primary key?

There are also certain scenarios where a UNIQUE index can cause performance overhead, so there's not a magic bullet answer here unfortunately. See these articles for related reading:

Index Key Ordering

Regarding the column ordering, that really depends on the types of queries that are going to be executed against this table. And this is more-or-less the "general" guidance about indexes that comes into play here. You generally want the "most selective" column first.

If queries are typically going to include a value for "Primary" as a filter (or join condition), then that's an ideal candidate - you can seek directly to the right row. If queries typically only provide a value for "Secondary", then Primary is not a good leading index key - you'd have to scan through every row in the table. In that case, Secondary is a better choice to lead the index.


Demo Code

See below for the code I used, again this is adapted (possibly incorrectly, as I am just a lowly web developer) from Paul's article:

SET STATISTICS TIME, IO ON; CREATE TABLE [dbo].[OnlyTwoGoodColumnsButManyBad] ( [Primary] INT PRIMARY KEY CLUSTERED, [Secondary] NVARCHAR(500) NOT NULL, [Evil1] NVARCHAR(MAX), [Evil2] NVARCHAR(MAX), [Evil3] NVARCHAR(MAX), [Evil4] NVARCHAR(MAX), [Evil5] NVARCHAR(MAX) ); INSERT dbo.OnlyTwoGoodColumnsButManyBad WITH (TABLOCKX) ([Primary], Secondary) SELECT TOP (5000000) [Primary] = ROW_NUMBER() OVER ( ORDER BY @@SPID), Secondary = 'Garbage' FROM master.sys.columns AS C CROSS JOIN master.sys.columns AS c2 CROSS JOIN master.sys.columns AS c3 ORDER BY [Primary]; CREATE NONCLUSTERED INDEX [IX_Primary_Secondary] ON [dbo].[OnlyTwoGoodColumnsButManyBad] ( [Primary], [Secondary] ); CREATE UNIQUE NONCLUSTERED INDEX [UIX_Primary_Secondary] ON [dbo].[OnlyTwoGoodColumnsButManyBad] ( [Primary], [Secondary] ); SELECT COUNT_BIG(*) FROM dbo.[OnlyTwoGoodColumnsButManyBad] AS ST WITH (TABLOCK, INDEX ([IX_Primary_Secondary])) JOIN dbo.[OnlyTwoGoodColumnsButManyBad] AS ST2 WITH (TABLOCK, INDEX ([IX_Primary_Secondary])) ON ST2.[Primary] = ST.[Primary] AND ST2.[Secondary] = ST.[Secondary] OPTION (MAXDOP 1, LOOP JOIN, FORCE ORDER); /* Table 'OnlyTwoGoodColumnsButManyBad'. Scan count 1, logical reads 15329888 CPU time = 5000 ms, elapsed time = 5312 ms. */ SELECT COUNT_BIG(*) FROM dbo.[OnlyTwoGoodColumnsButManyBad] AS ST WITH (TABLOCK, INDEX ([UIX_Primary_Secondary])) JOIN dbo.[OnlyTwoGoodColumnsButManyBad] AS ST2 WITH (TABLOCK, INDEX ([UIX_Primary_Secondary])) ON ST2.[Primary] = ST.[Primary] AND ST2.[Secondary] = ST.[Secondary] OPTION (MAXDOP 1, LOOP JOIN, FORCE ORDER); /* Table 'OnlyTwoGoodColumnsButManyBad'. Scan count 1, logical reads 15329888 CPU time = 4984 ms, elapsed time = 5283 ms. */ 
1
  • In your example value of Secondary is only "Garbage",tough length is [Secondary] NVARCHAR(500). so my question is --If table contain million of records then creating whatever index on Secondary really help.Wont it be too expensive for optmizer to choose ? Commented Sep 19, 2024 at 6:38
3

There is another benefit in declaring the index as unique.

Every non unique index is always bigger than the similar unique index. This is because in case of non unique nonclustered index, the rowID(RID o clustered index key) is also stored at non-leaf index levels.

The non unique nonclustered index (your case) always has the clustered index key present on every index level, whereas similar unique index has no need of it on its non-leaf levels, so the bigger is your clustered index key, the bigger is the size of your non unique nonclustered index.

I was not able to find the official documentation but here is an explanation of this: Why do root and intermediate levels of NON-unique NON-clustered indexes additionally store row ID?

Ok, here is my repro:

select n, replicate('0', 200) + cast(n as varchar(10)) as n1 into dbo.nums from fleetxs.dbo.nums alter table nums add constraint PK_nums primary key clustered(n) create index ix_n1 on dbo.nums(n1) DBCC IND (test, nums, 2) --187341 --187342 DBCC TRACEON(3604) DBCC PAGE(test, 1, 187341, 3) 

I just create a table from my work table dbo.Nums that contain integer numbers 1..1000000, I put the unique column n as it is in the column n of my new table and I create another column replicate('0', 200) + cast(n as varchar(10)) as n1, here replicate is used only for the purpose of large enough column to get not only root + leaf but also 1 intermediate level.

The index on this column is not unique but the content is unique by definition.

Then I get my index pages with DBCC IND:

enter image description here

I take 2 pages that represent intermediate and root level and examine them:

  1. intermediate level enter image description here

  2. root level enter image description here

Both of them have the clustered index column n a as part of the key. The values in the column n1 are unique though.

1
  • 1
    While true, non-leaf row size isn't usually an important factor in the overall size of an index. There are normally very many more entries at the leaf, which must include the full row locator. So, you're typically not saving very much. On the positive side, having the extra key(s) in the non-leaf levels allows additional seeking possibilities beyond just the indexed key. It's a trade off. Commented Sep 11, 2024 at 4:53

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.