- Surrogate Key - This is non essential data element that simply represent a record in the table but is not part of the business data. For example, an auto-generated identity key or a GUID.
- Natural Key - One or more data elements that are part of the data and is defined as a primary key. For example - ItemNumber or ItemDescription in an InventoryItems table.
- Composite Key - When two or more keys are joined to form the primary key. It generally is a combination of two or more natural keys.
This is not cast in stone, but most developers would prefer a surrogate key such as an identity column as a primary key which makes it easier to join multiple tables for querying purpose etc. It also makes it easier if you are using code generators such as nHibernate to generate your data access layer. But, if you ask a DBA, you may get a different answer. They much rather see natural key as a primary key, even if you have to use composite key to define it.
Personally, I like having an auto generated identity column as a primary key in base tables such as Employees with an EmployeeID as a primary key and Stores table with StoreID.
For linking tables, I much rather use a composite key instead of an identity column. For example, if I have EmployeeStores table which links Employees and Stores, instead of having an identity column EmployeeStoreID, I rather have EmployeeID and StoreID as a composite primary key. You may be wondering why not use an identity key instead?
Let's assume your application has a constraint i.e. an Employee can only be assigned to one store. If you have EmployeeStoreID as your primary key, the following records are perfectly legal...
EmployeeStoreID | EmployeeID | StoreID |
---|---|---|
1 | 100 | 20 |
2 | 100 | 30 |
This doesn't satisfy business constraint, does it? Sure, your application should enforce this but what if someone enters data directly? You can also enforce this by creating a unique constraint on EmployeeID and StoreID, but to me that is another layer of complexity. If I made EmployeeID and StoreID as a composite primary key, I wouldn't have this problem.
Linking tables like these are generally not involved in complex join operations so having a composite key shouldn't force you to write more complex queries. Code generators that I am aware of or have worked with also support composite primary keys.
There is still one issue though that I haven't addressed yet. What if you want to update a record, how do you determine which record to update?
For example, assume an employee can be assigned to multiple stores. consider the following table...
EmployeeID | StoreID |
---|---|
100 | 20 |
100 | 30 |
110 | 30 |
120 | 40 |
Employee 100 is assigned to stores 20 and 30. Now imagine this employee no longer works in store 20 but instead now works in store 40. The manager now updates this employees record via UI and changes his/her store from 20 to 40.
How would you know which record to update? Do you update the 1st record (correct one) or do you update second record (storeID=30) since your stored procedure likely received EmployeeID=100 and StoreID=40?
There are couple of ways to get around this problem.
- You can have your application send the stored procedure both old store ID and new store ID and you would know which record to update.
- You can have the EmployeeStoreID as an auto increment column and your application can pass this ID to your stored procedure to update the record.
Personally, I prefer option #1. If you are using typed data sets or even data access layer using code generators, you will have access to row state which will give you the original and updated values that you can pass to your stored procedure. Alternatively, you can code in a way to keep track of old value and pass it along with the new value.
I generally don't post as a programmer on this blog, I have another blog for that, but here I felt it necessary to discuss a bit about programming. I hope you've enjoyed reading this as much as I've enjoyed writing it. Your comments are appreciated.
Thank you