Can someone please explain to me the difference between a primary key, foreign key, clustered index, natural key, and surrogate key?

840 views

Learning SQL right now and the definitions for these terms are a little confusing. Can someone please explain them with an example? Thank you.

In: 15

18 Answers

Anonymous 0 Comments

Here’s the shortest explanation (of a complicated subject) that actually works…

A primary key (PK) just means “this is how you uniquely identify one of these rows in the table”. It can be one field, or the combination of a couple of fields.

A foreign key (FK) is just a reference to a *primary* key on another table. This helps to establish relationships, and I’ll talk more about that later. A PK value from another table can appear in multiple rows on the table that refers to it in the FK field, but only once on the PK table. Think single Customer ID on multiple Orders.

A natural key just means that the key value is a natural property of the thing in the table. If everyone in the world had guaranteed unique names, then we could use their names as a natural key. It’s not reflective of reality though, as anyone named John Smith can testify.

A surrogate key replaces a *natural* key with a guaranteed-unique value. Most often, we use auto-numbered (auto-incremented) primary keys, where the database management system (DBMS) manages the addition of the next value for you when the record is added. After Employee 123 is added to the Employee table, the next Employee will be 124.

There are also candidate keys, which aren’t designated as keys in the DBMS, per se, but are alternative ways to identify a single row in the table. Think CityStateZip Code. Zip Code is the primary key, but you can identify the same place by naming the city and the state. (Bonus reality note: Zip Code is 100% artificial, and in no way describes the location.)

Here’s the critical part about keys: A primary key only needs to be *unique*, and does NOT have to reflect any quality of the object in that row. It supplants the full set of fields (columns) in that row for use in relationships in order to help ensure their accuracy. Put the customer information in one place, just once, and refer to it using it’s ID.

The *reason* we need all of these tables to have uniquely identifiable rows is so that we can associate pieces of information easily (through Relationships), which sets context for events that occur. Ex: I can set up a Customers table with its own auto-numbered primary key. We can now refer to that Customer ID on the Orders table (instead of their name address, shipping address, phone number, etc.) and know for certain which exact customer made that order. It reduces the chances to introduce mistakes like misspelling the customer’s name badly enough that you can’t find their order, and it lets you tie them to multiple orders over time.

OK, not so simple of an explanation, nor a short one. I hope this helps…

You are viewing 1 out of 18 answers, click here to view all answers.