Can someone please explain to me the difference between a primary key, foreign key, clustered index, natural key, and surrogate key?

585 views

Learning SQL right now and the definitions for these terms are a little confusing. Can someone please explain them with an example? Thank you.

In: 15

18 Answers

Anonymous 0 Comments

What helped me understand all this was what used to be called third normal form*: you have a table with columns. All the columns MUST have data in them. Nulls are not allowed. Duplicate rows are not allowed. How do you store an address with these restrictions?

You pick out what makes each part of an address unique. Start with the name. Now, this is a problem with third normal form: you can go really over the top and atomize the data too much. Which is to say, you can split a name up into three or more tables if you’re manic about it. For the sake of argument, let’s say all names have a first name and a last name, and that’s it. No exceptions.

Okay, so that’s a table. First and last name. But that’s not a very good unique identifier. Lots of people have the same first and last name. So you create a primary key. This is a number that is incremented each time a record is entered. Always. To the point that, if you delete a record, that ID is never used again.

Okay. So we have a person’s name as a record with a primary key that will always identify that person even if they have the same name as another person, because you can’t enter two people with the exact same name at the exact same time (does this sound like an edge case? It sure does! But it’s beyond this discussion.) A person can have many addresses.

When someone is using an Excel spreadsheet to keep track of it all, they just add columns. “Address2,City2,Address3,City3..YaddaYadda2..3.” But we have a restriction. We can’t have null values. ALL columns must have data.

And this is where the relationships start to matter. One person can have many addresses. This means we put a PERSONID column in our ADDRESS table. This is a foreign key relationship. The name primary key becomes an address foreign key because, in the name table, THERE CAN BE ONLY ONE! But in my address table, there can be many. Now, all our tables have columns full of data.

Once you think about it, the rest emerges, at least in the US (but this applies to all addresses, the columns and tables will change but not the concepts). There are only a set number of states, so they get unique IDs. Those go into the ADDRESS table. Likewise zip codes. Presto: you now have a database with, what, six tables? No rows are repeated. All columns have data. There are no nulls. After a few mental adjustments, this makes structured queries… well, that’s why it’s called SQL.

There are several drawbacks to this model. But they pale against the advantages. It’s like knowing how to write music, or a novel: it’s fine to break the rules. Sometimes you need to. But first, you have to know them.

*This is from memory, of me reading a textbook on a bus, in 1996. YMMV.

You are viewing 1 out of 18 answers, click here to view all answers.