added clarification
Source Link
Solomon Rutzky
  • 70.4k
  • 8
  • 162
  • 309

(or whatever you like to call tables used only, or mainly, to represent many-to-many relationships)

Yes, the problems noted at the beginning of this answer are still potential issues when using natural keys as alternate keys. However, the difference is that the problem is isolated to (usually) just one table. If someone made a mistake and created a unique index on "flight locator", it might be a while before they get a violation. But once they do, it is easy enough to drop that unique index and recreate it to include the flight dates as well. Or, if you change your email address (often used as the Login) on a system and get an error because it was in use by someone else years ago (legitimately), that can most likely be handled by support without any impact / risk to existing related records. In both cases the rest of the data model is untouched for the necessary changes. 
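As a rough illustration of the "flight locator" case, here is a minimal sketch. It assumes hypothetical `Reservation` and `Passenger` tables (the names and columns are made up for this example), and it uses SQLite through Python's standard `sqlite3` module only so that it is self-contained and runnable. The natural key is an alternate key on a single table, so fixing the mistaken unique index does not touch the child table that references the surrogate key.

```python
import sqlite3

# In-memory database; all table, column, and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Reservation (
        ReservationID INTEGER PRIMARY KEY,   -- surrogate key
        FlightLocator TEXT NOT NULL,         -- natural (alternate) key, part 1
        FlightDate    TEXT NOT NULL          -- natural (alternate) key, part 2
    );
    CREATE TABLE Passenger (
        PassengerID   INTEGER PRIMARY KEY,
        ReservationID INTEGER NOT NULL REFERENCES Reservation (ReservationID),
        FullName      TEXT NOT NULL
    );
    -- The mistake: assuming the locator alone is unique forever.
    CREATE UNIQUE INDEX UX_Reservation_Locator
        ON Reservation (FlightLocator);
""")

first_id = conn.execute(
    "INSERT INTO Reservation (FlightLocator, FlightDate) VALUES (?, ?)",
    ("ABC123", "2015-03-01"),
).lastrowid
conn.execute(
    "INSERT INTO Passenger (ReservationID, FullName) VALUES (?, ?)",
    (first_id, "Pat Example"),
)

# Some time later the same locator is legitimately reused on another date.
try:
    conn.execute(
        "INSERT INTO Reservation (FlightLocator, FlightDate) VALUES (?, ?)",
        ("ABC123", "2016-07-15"),
    )
    violation_before_fix = False
except sqlite3.IntegrityError:
    violation_before_fix = True

# The fix is local to one table: drop the index and recreate it with the date.
conn.executescript("""
    DROP INDEX UX_Reservation_Locator;
    CREATE UNIQUE INDEX UX_Reservation_Locator_Date
        ON Reservation (FlightLocator, FlightDate);
""")

# The reused locator is now accepted; Passenger was never modified.
conn.execute(
    "INSERT INTO Reservation (FlightLocator, FlightDate) VALUES (?, ?)",
    ("ABC123", "2016-07-15"),
)
```

Had `FlightLocator` been the primary key instead, the same correction would also have required changing the `Passenger` table (and every other table referencing it).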

Again, this is a pragmatic approach to:

  • minimizing potential data loss that can happen when migrating a primary key that has been referenced by one or more foreign keys (minimizing the scope of the project also reduces the maintenance window :-). While not all PKs (or Unique Constraints/Indexes) have FKs, using a surrogate should certainly reduce the number of columns that have such dependencies.
  • making the system as durable, resilient, and efficient as possible. The physical model is the "in practice" to the "in theory" of the conceptual model, and we already make quite a few adjustments and considerations given that the conceptual model doesn't care about implementation: we choose to utilize vendor-specific features, to favor performance (e.g. denormalization, lookup tables, "sibling" tables as noted above, etc.), to create the "bridge" tables (as noted above), and so on.

I don't know how many systems used SSNs (Social Security Numbers in the U.S.) as PKs, but of those that did, some (perhaps many) may have avoided issues with SSNs not being as unique as they should have been. But none of those systems were able to avoid the changes over the years in how securely SSNs must be handled. Systems treating SSNs as an alternate key require very little development time to switch over to encrypting those values, and little (or no) downtime to make the changes at the data layer. Given that we all have backlogs of projects that we will likely never get to, businesses tend to prefer that these annoying yet inevitable changes cost them 5 hours instead of 20 to 40 (don't forget that changes need to be tested, and hence the scope of the changes also directly impacts how much QA time is needed for the project).

added clarification
Solomon Rutzky

Yes, the problems noted at the beginning of this answer are still potential issues when using natural keys as alternate keys. However, the difference is that the problem is isolated to (usually) just one table. If someone made a mistake and created a unique index on "flight locator", it might be a while before they get a violation. But once they do, it is easy enough to drop that unique index and recreate it to include the flight dates as well. Or, if you change your email address (often used as the Login) on a system and get an error because it was in use by someone else years ago (legitimately), that can most likely be handled by support without any impact / risk to existing related records. In both cases the rest of the data model is untouched for the necessary changes. Again, this is a pragmatic approach to minimizing potential data loss that can happen when migrating a primary key that has been referenced by one or more foreign keys (and minimizing the scope of the project also reduces the maintenance window :-).


added clarification about physical vs conceptual model
Solomon Rutzky

To be clear, I am speaking in terms of the physical model, not the conceptual model. I assumed that the focus of this question was the physical model since it is framed in the context of issues that do not exist conceptually: surrogate keys, issues dealing with usage of the primary key value, etc. With that in mind, I was not implying that natural keys should not be stored and used for identification. On the contrary, natural keys are great "alternate keys" and should have unique constraints / indexes placed on them. However, the idealism of the conceptual model does not always translate directly to the physical model. Data integrity (i.e. the stability and reliability of the data model) is a top, if not the top, priority of the physical model. And so practical considerations, such as using surrogate keys, must be made to ensure that this goal is met and not compromised. Meaning, if you have SSNs or SKUs, etc, then absolutely store them in a column that has a unique constraint on it, and have the system do a lookup on that value since auto-generated numbers shouldn't be used externally anyway. Users don't need to know the auto-generated ID number of a record: they should pass in a value that they do know (e.g. email address as a lookup for UserID / CustomerID, flight confirmation codes in combination with flight dates, etc) and the system should translate that into the auto-generated value that it uses from that point forward.
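The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Customer` and `CustomerOrder` tables and the `resolve_customer_id` helper are hypothetical names, and SQLite via Python's standard `sqlite3` module is used purely to keep the example runnable. The caller passes in a value they know (an email address), and the system translates it into the auto-generated ID that it uses from that point forward.

```python
import sqlite3

# In-memory database; all table, column, and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,      -- surrogate key, internal only
        Email      TEXT NOT NULL UNIQUE      -- natural key as an alternate key
    );
    CREATE TABLE CustomerOrder (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
        Item       TEXT NOT NULL
    );
""")


def resolve_customer_id(connection, email):
    """Translate the externally known value into the internal surrogate ID."""
    row = connection.execute(
        "SELECT CustomerID FROM Customer WHERE Email = ?", (email,)
    ).fetchone()
    return row[0] if row else None


conn.execute("INSERT INTO Customer (Email) VALUES (?)", ("old@example.com",))
customer_id = resolve_customer_id(conn, "old@example.com")
conn.execute(
    "INSERT INTO CustomerOrder (CustomerID, Item) VALUES (?, ?)",
    (customer_id, "Widget"),
)

# The natural key changes; only the Customer row is updated.
conn.execute(
    "UPDATE Customer SET Email = ? WHERE CustomerID = ?",
    ("new@example.com", customer_id),
)

# The new email resolves to the same surrogate ID, so existing orders
# are still attached to the customer without having been modified.
same_id = resolve_customer_id(conn, "new@example.com")
order_count = conn.execute(
    "SELECT COUNT(*) FROM CustomerOrder WHERE CustomerID = ?", (same_id,)
).fetchone()[0]
```

Because related rows only ever store `CustomerID`, the email change is a single-row update on one table.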


added reason for "sibling" tables
Solomon Rutzky
added pseudo-table
Solomon Rutzky
added pseudo-table
Solomon Rutzky
The @! It does nothing!
Paul White
added "acceptable" reasons for natural keys
Solomon Rutzky
added reasons for preferring surrogate keys
Solomon Rutzky
added pseudo-table
Solomon Rutzky
Solomon Rutzky