
What is the best way to make a simple many-to-many cross reference table which contains nothing but two columns which are themselves primary keys in other tables?

Does anyone have concrete evidence for or against creating a table with a single unique index, but no primary key? (Alternatives are detailed below).

Put another way: How does SQL Server internally identify rows uniquely (a) in tables that have a primary key and (b) in tables that do not?


In detail:

Given the input tables:

CREATE TABLE Foo (
    FooID bigint identity(1,1) not null primary key,
    other stuff...
)

CREATE TABLE Bar (
    BarID bigint identity(1,1) not null primary key,
    other stuff...
)

The three basic options are (in all cases assume a foreign key is created on the FooID and BarID columns):

-- Option 1: Compound primary key
CREATE TABLE FooBarXRef (
    FooID bigint not null
  , BarID bigint not null
  , PRIMARY KEY ( FooID, BarID )
  , CONSTRAINT FK... etc
)

-- Option 2: Independent primary key + unique index
CREATE TABLE FooBarXRef (
    FooBarXRefID bigint identity(1,1) not null primary key
  , FooID bigint not null
  , BarID bigint not null
  , CONSTRAINT FK... etc
);
CREATE UNIQUE INDEX I_FooBarXRef_FooBar ON FooBarXRef ( FooID, BarID );

-- Option 3: Unique index, no explicit primary key:
CREATE TABLE FooBarXRef (
    FooID bigint not null
  , BarID bigint not null
  , CONSTRAINT FK... etc
);
CREATE UNIQUE INDEX I_FooBarXRef_FooBar ON FooBarXRef ( FooID, BarID );

Is having a separate identity PK on the xref table redundant, and might it needlessly introduce another layer of constraint checking for the database engine?

On the other hand, are multi-column primary keys problematic? One proposed solution is to have the xref table contain only the two foreign keys and define a unique index on those columns, but not define a primary key at all.


I suspect that doing so will cause SQL Server to create an internal primary key for the purposes of uniquely identifying each row, thus yielding the same redundant constraints as if a primary key were defined explicitly, but I have no proof or documentation to support this. Other questions and answers suggest that there is no internal primary key by default (i.e. no equivalent to the Oracle ROWID), since %%physloc%% only indicates where a row is currently stored and is therefore subject to change. My intuition is that the engine must create something to uniquely identify a row in order to implement cursors, transactions, and concurrency.
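One way to gather evidence for Option 3 would be a small experiment like the sketch below. It assumes a scratch database and uses a hypothetical table name (HeapXRefDemo); note that %%physloc%% and sys.fn_PhysLocFormatter are undocumented, so treat the output as informational only. The catalog query should list the table itself as a heap alongside the unique nonclustered index, and the second query shows the current file:page:slot location of each row.

```sql
-- Hypothetical scratch table modelled on Option 3: unique index, no primary key.
CREATE TABLE dbo.HeapXRefDemo (
    FooID bigint NOT NULL
  , BarID bigint NOT NULL
);
-- CREATE UNIQUE INDEX without a CLUSTERED keyword builds a nonclustered index.
CREATE UNIQUE INDEX UX_HeapXRefDemo ON dbo.HeapXRefDemo ( FooID, BarID );

INSERT INTO dbo.HeapXRefDemo ( FooID, BarID )
VALUES (1, 1), (1, 2), (2, 1);

-- Which storage structures exist for the table?
SELECT name, index_id, type_desc, is_unique, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.HeapXRefDemo');

-- Undocumented: current physical location (file:page:slot) of each row.
SELECT FooID, BarID, sys.fn_PhysLocFormatter(%%physloc%%) AS file_page_slot
FROM dbo.HeapXRefDemo;

DROP TABLE dbo.HeapXRefDemo;
```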

2
  • I would certainly go for option 1. It's a relationship table, so the composite PK makes perfect sense in this case. The generic assertions like "composite keys are evil" are... evil ;) Commented Feb 13, 2014 at 18:07
  • @RaphaëlAlthaus I agree on both counts...but I'm looking for evidence that alternatives 2 and 3 are demonstrably less optimal :) Commented Feb 13, 2014 at 18:09

2 Answers

6

The concept of a primary key is really about relational theory: maintaining referential integrity by building relationships across multiple tables. By default, the SQL Server engine creates a unique clustered index when a primary key is built (assuming the table doesn't already have a clustered index).

It's this clustered index that defines a unique row at the leaf level. For tables that have a non-unique clustered index, SQL Server adds a 4-byte "uniquifier" to the end of your key for rows with duplicate key values.
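The uniquifier can be observed indirectly through record sizes. The following is a minimal sketch, assuming a scratch database and a hypothetical table name (UniquifierDemo); it is not one of my original scripts. With a non-unique clustered index and duplicate key values, the leaf level would be expected to report a larger maximum record size than minimum, because only the duplicate rows carry the extra bytes.

```sql
-- Hypothetical demo table with a NON-unique clustered index.
CREATE TABLE dbo.UniquifierDemo ( Val int NOT NULL );
CREATE CLUSTERED INDEX CIX_UniquifierDemo ON dbo.UniquifierDemo ( Val );

-- Three rows share the key value 1, so two of them need a uniquifier.
INSERT INTO dbo.UniquifierDemo ( Val )
VALUES (1), (1), (1), (2);

-- Compare the smallest and largest record at the leaf level (index_level = 0).
SELECT index_type_desc
     , record_count
     , min_record_size_in_bytes
     , max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.UniquifierDemo'), NULL, NULL, 'DETAILED')
WHERE index_level = 0;

DROP TABLE dbo.UniquifierDemo;
```

Declaring the same index as UNIQUE (and removing the duplicate rows) should make the two sizes equal, since no uniquifier is needed.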

  • TestTable1 Primary Key
  • TestTable2 Primary Key & Unique Non-Clustered
  • TestTable3 Unique Clustered
  • TestTable4 Primary Clustered (same as Table1 & Table3; since a primary key CAN also be defined on a non-clustered index, I prefer to always state explicitly which structure I want).

TestTable2 is redundant: it creates a unique clustered index to store all the records at its leaf level, and then creates a unique non-clustered index to enforce uniqueness once again. Any change to the table will hit the clustered index and then the non-clustered index.

TestTable1, TestTable3, and TestTable4 are a tie in my book: a unique clustered index structure is created on all of them. There is no physical difference in the way records are stored on a page.
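Since the linked scripts may not be reachable, here is a rough reconstruction of what the four tables could look like. The column names, constraint names, and index names are assumptions for illustration, not the original scripts. The final query lists the index structures SQL Server actually builds for each table, so the comparison above can be checked directly.

```sql
-- TestTable1: primary key, clustering left to the default.
CREATE TABLE dbo.TestTable1 (
    FooID bigint NOT NULL
  , BarID bigint NOT NULL
  , CONSTRAINT PK_TestTable1 PRIMARY KEY ( FooID, BarID )
);

-- TestTable2: identity primary key plus a unique non-clustered index.
CREATE TABLE dbo.TestTable2 (
    XRefID bigint IDENTITY(1,1) NOT NULL CONSTRAINT PK_TestTable2 PRIMARY KEY
  , FooID bigint NOT NULL
  , BarID bigint NOT NULL
);
CREATE UNIQUE NONCLUSTERED INDEX UX_TestTable2 ON dbo.TestTable2 ( FooID, BarID );

-- TestTable3: unique clustered index, no primary key.
CREATE TABLE dbo.TestTable3 (
    FooID bigint NOT NULL
  , BarID bigint NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX UX_TestTable3 ON dbo.TestTable3 ( FooID, BarID );

-- TestTable4: primary key with the clustered structure stated explicitly.
CREATE TABLE dbo.TestTable4 (
    FooID bigint NOT NULL
  , BarID bigint NOT NULL
  , CONSTRAINT PK_TestTable4 PRIMARY KEY CLUSTERED ( FooID, BarID )
);

-- One row per index (or heap) for each test table.
SELECT OBJECT_NAME(object_id) AS table_name
     , name AS index_name
     , type_desc
     , is_unique
     , is_primary_key
FROM sys.indexes
WHERE object_id IN ( OBJECT_ID(N'dbo.TestTable1'), OBJECT_ID(N'dbo.TestTable2')
                   , OBJECT_ID(N'dbo.TestTable3'), OBJECT_ID(N'dbo.TestTable4') )
ORDER BY table_name, index_id;
```

TestTable2 should be the only table that reports two index rows; the other three should each report a single unique clustered index.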

However, SQL Server replication (transactional replication in particular) requires a primary key on replicated tables. If you'll be using replication in the future, you may want to make sure all your unique clustered indexes are primary keys as well.
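If a table was built with only a unique clustered index, it can be switched to a primary key later. The sketch below assumes a hypothetical table and index name (XRefPromoteDemo, UX_XRefPromoteDemo) and that no foreign keys or other objects depend on the index being dropped.

```sql
-- Hypothetical starting point: unique clustered index, no primary key.
CREATE TABLE dbo.XRefPromoteDemo (
    FooID bigint NOT NULL
  , BarID bigint NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX UX_XRefPromoteDemo
    ON dbo.XRefPromoteDemo ( FooID, BarID );

-- Swap the index for a primary key constraint on the same columns.
BEGIN TRANSACTION;

DROP INDEX UX_XRefPromoteDemo ON dbo.XRefPromoteDemo;

ALTER TABLE dbo.XRefPromoteDemo
    ADD CONSTRAINT PK_XRefPromoteDemo PRIMARY KEY CLUSTERED ( FooID, BarID );

COMMIT TRANSACTION;
```

On a large table this is not free: dropping the clustered index turns the table into a heap and adding the constraint rebuilds it, so it is probably cheaper to declare the primary key up front.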

I seem to be unable to paste in my verifying scripts, so here they are on hastebin.

http://hastebin.com/qucajimixi.vbs


1 Comment

Link is broken, how about a SqlFiddle session example?
2

Well, it all depends on the requirement. As far as I know:

PRIMARY KEY = UNIQUE + NOT NULL

What this tells you is that a table can have multiple unique indexes on NOT NULL columns (non-clustered), but it can have only one primary key (clustered by default).
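As a small illustration, assuming a hypothetical table (KeyDemo) with made-up column and constraint names, one table can carry a single primary key alongside several unique constraints on NOT NULL columns:

```sql
-- Hypothetical table: one primary key, two additional unique constraints.
CREATE TABLE dbo.KeyDemo (
    KeyDemoID bigint NOT NULL
  , Code      varchar(20) NOT NULL
  , Email     varchar(100) NOT NULL
  , CONSTRAINT PK_KeyDemo PRIMARY KEY CLUSTERED ( KeyDemoID )
  , CONSTRAINT UQ_KeyDemo_Code UNIQUE NONCLUSTERED ( Code )
  , CONSTRAINT UQ_KeyDemo_Email UNIQUE NONCLUSTERED ( Email )
);

-- Each constraint is backed by its own unique index.
SELECT name, type_desc, is_unique, is_primary_key, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.KeyDemo')
ORDER BY index_id;

DROP TABLE dbo.KeyDemo;
```

Trying to add a second PRIMARY KEY constraint to the same table would be rejected by SQL Server, whereas further unique constraints can be added freely.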

I am a huge believer in the relational database model and in working with primary key/foreign key relationships. DB replication requires a primary key on a table; therefore, it is good practice to create a primary key instead of relying only on unique keys for your table.
