Skip to main content
4 of 11
added "acceptable" reasons for natural keys
Solomon Rutzky
  • 70.4k
  • 8
  • 162
  • 309

under what circumstances, if ever, is preferable to use a primary key with some other real meaning? (emphasis added)

Given that the focus of this question is "preferable" and not "acceptable", and accepting that this is still a highly subjective topic, I will say that I cannot think of a situation where it is best for the system to have a truly natural key for a variety of reasons (most of which has been said before in other answers that @Paul linked to in a comment on the question):

  1. What is thought to be unique is not always unique (e.g. Social Security Numbers / SSNs in the U.S.)
  2. Sometimes things change, either in value or uniqueness (we don't control the external world)
  3. Even when something should be "stable" in value and uniqueness (e.g. SKU, perhaps), can the incoming value be guaranteed to be correct? Humans often mistype stuff doing data entry. There are also bugs in export processes that might cause values in a file imported by your system to be incorrect. There are also bugs in other systems that feed data into yours that can allow for the data itself to not be entirely correct, even if their export process worked correctly.

I emphasize "truly" because there is a situation where I prefer to not have a new surrogate key: bridge tables (or whatever you like to call tables used only, or mainly, to support many-to-many relationships).

Thing ThingXTag Tag ------ --------- --- ThingID INT AutoMagic PK ---> ThingID INT PK OtherStuff SomeType TagID INT PK <--- TagID INT AutoMagic PK TagName VARCHAR 

When modeling a bridge table (a table which doesn't exist in the logical model but is needed in the physical model), the PK there should be the existing Primary Key columns of the tables being related via this table. This allows for enforcing the proper uniqueness and non-NULL-ness of the values without needing a separate unique index / constraint. In the rare case of needing to Foreign Key to this relationship, it will:

  • be meaningful in that each of the key columns will actually point back to the original source tables without needing yet another join, and
  • prevent someone or something from updating the key columns that form the relationship between the two primary tables without updating the value being linked to via a foreign key.

I have worked on a system where these bridge tables had their own auto-increment surrogate key PKs, and the single column surrogate key of the bridge table was referenced in other tables via FK and it was a horrible, confusing mess that we wasted way too much time on for debugging, etc.


To be explicit, there are some scenarios where it is "acceptable" to have natural keys, though I don't think I would go so far as to say "preferable".

  1. State / Region and Country codes: If you use codes maintained by the International Organization for Standardization (ISO) (e.g. "US" = United States, "FL" = Florida, etc) then these are probably reliable enough to use, and I have used them as they are short (i.e. not bad for performance) and human-readable. These codes are also used often-enough in a variety of other ways, even outside of computer systems, that people have a general familiarity with them, even ones that don't initially make as much sense to some folks (e.g. "DE" = Germany might not be intuitively obvious to those who are unaware that "Germany" == "Deutschland" in German).
  2. Internal lookup value codes: You can't control external sources, but (hopefully) you are in control of your own system. If your system has department codes, status codes, etc that are used internally, then it should be fine to come up with short codes for them (2 - 4 bytes). At 4 bytes it would be using the same amount of space as an INT and if using a binary Collation (one ending in _BIN2, or even _BIN, but _BIN2 is preferred) then it should compare just as fast. Having relatively meaningful values for such codes can make support / debugging easier. However, you still get into the situation where over time, department names, etc might change and the codes might not be meaningful anymore.
Solomon Rutzky
  • 70.4k
  • 8
  • 162
  • 309