Can someone please explain to me the difference between a primary key, foreign key, clustered index, natural key, and surrogate key?

846 views

Learning SQL right now and the definitions for these terms are a little confusing. Can someone please explain them with an example? Thank you.

In: 15

18 Answers

Anonymous 0 Comments

Let’s pretend we’ve got a set of tables for sorting books for like a library or something

Primary key: what makes your row unique? If you have data about <book title> by <author>, the row can have all sorts of extra info like review scores or lending status, but it’s still the same book regardless of those extra details. If someone takes out “The hunger games” by suzanne collins and we don’t have a good primary key, the system could potentially add a totally new hunger games row with the lending status column set to “unavailable” and leave the original row alone (saying the book is still available), which is obviously not what we want. It’s the same book, there’s just a different status on it now, so we should update the existing row because the primary key for the new data matches the primary key of an existing row.

Foreign key: A list of values from another table to limit valid entries in the original table. Let’s say you’ve got a table that lists all sorts of book genres; it’s just literally one column filled with rows like “fiction” “science fiction” etc. Your main table may put a foreign key on its own genre column, so that it doesn’t end up doing something confusing like calling Star wars a “scifi”-genre story since that isn’t how all the other science fiction stories are categorized; it would make searching for it a nightmare. It’s like a web form providing a dropdown menu of valid responses instead of letting the user type whatever they want

Clustered index: what’s the default sort order for your table? This affects how they are stored in disk. If you tell the table to be stored on the hard drive by alphabetizing based on title, then adding a new entry to the table won’t tack the new entry to the end of the table and will instead try to insert it into the right spot.

non-clustered index (you didn’t ask, but figured I would talk about it): It’s basically a piece of paper on the side that lists our table in a different order. If someone wants to come in and search by author, they will have a hell of a time trying to do that if the table is sorted by book title. If someone wants to find all books by Brandon Sanderson, the table itself would be putting “Elantris” and “Warbreaker” nowhere close to each other in the table, so the search will have to go through the entire table to make sure it doesn’t miss anything. A nonclustered index is like looking at a note that says “Brandon Sanderson is the author for rows 5, 27, 83, (etc) in the original table” and the search will be able to jump straight to them for the rest of the data

Surrogate vs natural key are different types of primary key. My primary key example from earlier that was based on title/author would be considered a natural key because it uses the data itself to form a *natural* way to sort the information. A surrogate key would be like assigning a number to each row and just sorting on that. Surrogate keys have essentially zero chance of accidental overlap if done correctly, but they don’t give you much benefit in actually trying to find anything unless that number means something else to you (like if you want to find something inserted in the last month, so you take the most recent 1000 rows or something)

You are viewing 1 out of 18 answers, click here to view all answers.