DNA is a very complex molecule built from 4 smaller molecules. We’ll call them A, T, G, and C. These four molecules arrange themselves in various (long and complex) patterns: ATGC, GCTA, CTGA, and so on. Those different patterns are our genetic code. When siblings share 50% of their DNA, each sibling’s code is half Mom’s and half Dad’s, but rearranged in different ways that form each individual’s genes and DNA. When humans and bananas share 60% of their DNA, it means 60% of the banana code is also found in humans, and 40% of it is found only in bananas.
Evolution is lazy and most biological functions are roughly similar to each other. Sure, we share 99.6% of our genome with bonobos, but that 0.4% makes a world of difference. Thing is, there isn’t *that* much difference between bonobos and humans. The heart, digestive system, kidneys, brain, eyes, ears, etc. all do basically the same thing. Evolution isn’t going to come up with novel solutions for every animal. We see a lot of differences between a Chevy Tahoe and a Prius because we are tuned to notice them. Realistically they are dramatically similar. The wheels do the same thing, the steering wheel does the same thing, etc. Change a couple of parts and it is totally different, but it isn’t a brand new invention every time.
They mean different things.
Think of your genome (DNA) as your house. To function, you need some basic things. All houses have (for example) a dishwasher, a fridge, a bed, and an oven. In that sense, you and your neighbor share 100% of the house contents.
But let’s say your fridge is made by GE, and has 2 doors. Your neighbor’s fridge is made by Maytag and has 1 door. In that sense, the appliances are different.
In that same way, we are 99% “related” to chimps in that we have mostly the same *genes*, just different *versions* of those genes. Pretty much all living things have an enzyme called hexokinase (just a random example). In that way, every living thing is related! But it turns out that while you have the same hexokinase as any other human, it’s not the same as the hexokinase found in a banana or in a bacterium. Same genes, different versions.
*Within* humans, (i.e. it’s already understood that you share all the genes but have different versions), you can also have variability. That’s why you don’t look the same as your neighbor or your neighbor’s neighbor.
For example: let’s say hair color is a single gene (not really but bear with me), and it has 2 varieties (called alleles): brown and blonde.
Pete and Bob both have the hair color gene. Now Pete has brown hair and Bob has blonde hair, so clearly they have different *alleles* but they both do have some form of that gene. Bill, Bob’s identical brother, also has the hair color gene but he has the exact same copy as Bob. So in that sense, all 3 share the same gene but at the same time, in a different way, only 2 of them share it.
Itsy Bitsy the spider, on the other hand, does not have *any* version of that hair color gene because she doesn’t have hair.
So “percentages of DNA” is unfortunately too vague a term to actually be meaningful on its own. You’d have to specify what you actually mean: is it just having the gene vs. not having it (as in the case of Pete and Bob vs. the spider), or is it having the same variation of it (as in Bob and Bill vs. Pete)?
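To make the two meanings concrete, here is a minimal Python sketch of the hair color example. The `genomes` dictionary and the helper names `shares_gene` and `shares_allele` are made up for illustration, and real genomes are obviously not a single gene.

```python
# Toy "genomes": gene name -> allele. The spider has no hair color gene at all.
genomes = {
    "Pete": {"hair_color": "brown"},
    "Bob": {"hair_color": "blonde"},
    "Bill": {"hair_color": "blonde"},
    "Itsy Bitsy": {},
}


def shares_gene(a, b, gene):
    """True if both individuals have some version of the gene."""
    return gene in genomes[a] and gene in genomes[b]


def shares_allele(a, b, gene):
    """True if both individuals have the same version (allele) of the gene."""
    return shares_gene(a, b, gene) and genomes[a][gene] == genomes[b][gene]


for a, b in [("Pete", "Bob"), ("Bob", "Bill"), ("Pete", "Itsy Bitsy")]:
    print(a, b, shares_gene(a, b, "hair_color"), shares_allele(a, b, "hair_color"))
```

Pete and Bob "share" the gene under the first question but not under the second, which is exactly why a bare percentage is ambiguous.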
There is no single meaning. People are using the term in different contexts with different meanings. That’s why it seems confusing or contradictory – because they are talking about different things.
For example, there’s a difference between genetic similarity between *people* and *species*. When talking about people, you’re looking at a specific “snapshot” of DNA; when talking about species, you’re looking at a group of “snapshots” of DNA.
As an analogy – it’s like comparing the English words “horse” and “house” (people) vs. comparing the English alphabet with the French alphabet (species).
Further, there are different ways to compare between species. There is a difference between looking at “absolute” similarity – how many genes are shared – and “relative” similarity, which can be measured in various ways. For example, you could measure something like “how many genes *first appeared* in species X and *then* appeared in species Y” as a way to track how the species interbred; this measure would ignore all the things that are already similar in the species’ common lineage before they split into X and Y. There are other relative measures you could take as well.
It’s not scientists that say those things, it’s journalists lazily reporting it and forgetting all the “boring” details and explanations.
You can think of DNA as the “Blueprint” for an organism to grow, or the alphabet used to write those instructions.
If we keep the alphabet analogy:
* HORSE and SHORE share 100% of the same letters
* They are 80% similar.
They’re still absolutely not the same, nor even remotely related.
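Both numbers can be reproduced with a few lines of Python. This is only a sketch: the answer doesn't say how the 80% was measured, so using `difflib.SequenceMatcher` for the order-aware score is my assumption. It just happens to be one standard measure that gives the same figure.

```python
from difflib import SequenceMatcher

a, b = "HORSE", "SHORE"

# Measure 1: which letters are used at all, ignoring order.
shared_letters = len(set(a) & set(b)) / len(set(a) | set(b))

# Measure 2: how much of the sequence lines up in order
# (difflib's ratio: 2 * matched letters / total letters in both words).
order_similarity = SequenceMatcher(None, a, b).ratio()

print(f"shared letters:   {shared_letters:.0%}")
print(f"order similarity: {order_similarity:.0%}")
```

Same two words, two different percentages, and neither one tells you a horse is related to a shore.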
Geneticist here. I totally hate those numbers because without context they mean nothing.
You can measure DNA similarity in different ways, and use whichever you need in a context. It will be a bit tldr but I’ll try to keep it streamlined.
So first, DNA is basically a message carved into a material. Let’s imagine a sentence in English:
>*The rabbit ate a carrot.*
You can have another sentence by adding or removing letters:
>*The* rabbi *ate a carrot.*
Or by changing a letter to another letter:
>*The rabbit ate a* parrot.
Or imagine the following sentence:
>*The rabbit ate a* cabbage.
Now if I asked you to rank these sentences based on how similar they are to the original, you would be in trouble. What is better similarity? All letters are the same but one letter is missing (rabbit vs rabbi)? The same amount of letters but one is exchanged (carrot vs parrot)? Is cabbage the best fit even though the letters are totally different, but at least it’s a veggie?
And that’s exactly the problem with DNA comparison. DNA is a message made up of 4 letters (A, T, G and C), and it can differ in length and/or content (letter exchange), as well as in how the chapters are organized. And there’s not really a universal rule for making comparisons, because there’s no percentage score assigned to the different kinds of possible alterations.
Now you can actually define what you are scoring. For example, a chimpanzee and a human are very similar. Chimps have all the sentences that we have, organized a bit differently and full of typos. It’s like:
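Here is a small Python sketch of that ranking problem, using the sentences from above. The two scoring rules are my own picks for illustration (a strict position-by-position check, and `difflib`'s ratio, which tolerates inserted or deleted letters); neither is "the" way geneticists score things.

```python
from difflib import SequenceMatcher

ORIGINAL = "The rabbit ate a carrot."
VARIANTS = {
    "rabbi": "The rabbi ate a carrot.",
    "parrot": "The rabbit ate a parrot.",
    "cabbage": "The rabbit ate a cabbage.",
}


def positional_identity(a, b):
    """Fraction of positions holding the same letter, with no shifting allowed."""
    same = sum(x == y for x, y in zip(a, b))
    return same / max(len(a), len(b))


def alignment_similarity(a, b):
    """difflib's ratio, which lets letters shift around missing or extra ones."""
    return SequenceMatcher(None, a, b).ratio()


def rank(score):
    """Variant names ordered from most to least similar to ORIGINAL."""
    return sorted(VARIANTS, key=lambda name: score(ORIGINAL, VARIANTS[name]), reverse=True)


print("position by position:", rank(positional_identity))
print("with alignment:      ", rank(alignment_similarity))
```

With the strict rule, the single missing letter in "rabbi" shifts everything after it and that sentence ends up last. With the alignment rule it comes out first. Same sentences, opposite verdicts, depending on the rule you picked.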
>*Tha rabbti ate an carot.*
So for a chimp vs human comparison you can just take every single human sentence, and for each DNA letter check whether the chimp has that letter correctly or not. You subtract one point for a missing letter and one point for an exchanged letter. You would find that about 95% of the letters are still correct.
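That "one point off per missing or exchanged letter" scoring is basically an edit distance. Below is a rough Python sketch using the classic Levenshtein distance as a stand-in (my choice, real genome alignment tools are far more elaborate). The function names are made up, and the toy sentence is much typo-heavier than real chimp DNA, so don't expect it to print 95%.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-letter insertions, deletions or exchanges."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # letter missing in b
                            curr[j - 1] + 1,      # extra letter in b
                            prev[j - 1] + cost))  # same letter or an exchange
        prev = curr
    return prev[-1]


def percent_correct(reference, other):
    """Share of the reference letters that survive, one point off per edit."""
    return 100 * (1 - edit_distance(reference, other) / len(reference))


human = "The rabbit ate a carrot."
chimp = "Tha rabbti ate an carot."

print(edit_distance(human, chimp), "typos")
print(f"{percent_correct(human, chimp):.1f}% of letters correct")
```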
This was easy, let’s dig deeper for the banana. To understand banana, let’s introduce a little more text to compare. Humans would be like:
>*The rabbit ate a carrot. There is a house on the top of the hill.*
Bananas would be like:
>*The fish live on the ocean. The carrot was eaten by the hare.*
Now as you see, humans have a sentence that is completely missing from the banana, and bananas have a sentence that is completely missing from humans. It’s absolutely pointless to compare sentences that have nothing to do with each other, so you should actually focus on the other sentence that is somewhat similar (the one with the carrots). And that’s exactly the case with the human vs banana DNA comparison: it focuses on some very basic genes that are crucial in all life forms, and disregards everything else. And those super crucial genes that both banana and human have are roughly 60% similar. If you instead compared the whole things letter by letter like we did with the chimp, the result would be mere random noise. So the banana’s 60% is a totally different kind of comparison than the chimp’s 95%.
To understand human vs human comparison, we need to dig a little deeper. DNA does have meaningful sentences, but also meaningless filler gibberish between the sentences. Let’s see what it looks like with yet another text.
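If you want to see the "only compare what is comparable" idea in code, here is a very rough Python sketch. Everything in it is an assumption for illustration: the shared-word score stands in for real gene matching, and the `CUTOFF` of 0.2 is an arbitrary number I picked so the toy example works.

```python
import string

human = ["The rabbit ate a carrot.", "There is a house on the top of the hill."]
banana = ["The fish live on the ocean.", "The carrot was eaten by the hare."]

CUTOFF = 0.2  # hypothetical threshold for "these two sentences are related"


def words(sentence):
    """Lowercased set of words with punctuation stripped."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())


def word_overlap(a, b):
    """Shared words divided by all distinct words in either sentence."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb)


def comparable_pairs(genome_a, genome_b, cutoff=CUTOFF):
    """Pair each sentence with its closest counterpart, dropping pairs below the cutoff."""
    pairs = []
    for sentence in genome_a:
        best = max(genome_b, key=lambda other: word_overlap(sentence, other))
        score = word_overlap(sentence, best)
        if score >= cutoff:
            pairs.append((sentence, best, score))
    return pairs


for h, b, score in comparable_pairs(human, banana):
    print(f"{h!r} <-> {b!r}: {score:.2f}")
```

Only the two carrot sentences get paired up. The house sentence has no counterpart worth scoring, so it simply doesn't enter the percentage at all.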
>*The rabbit ate a carrot. Bla bla bla bla bla bla bla. There is a house on the top of the hill.*
Now the “bla bla” part has no meaning whatsoever, it’s just there. You can have more bla there. You can have blu instead of bla. It still conveys the same message. If you compare only the meaningful sentences, then human DNA is extremely similar. Differences that look big (like eye color, skin color, blood type) are caused by minuscule typos in the DNA. Major messages like how to build a human in general are all the same. If you include the bla bla part in your comparison, then you will find more dissimilarity because there we indeed have some differences. And that’s the 99 or so %.
But then what is this thing with 50% with siblings? Well, it’s a completely different thing. To understand that, let’s focus on the 1% dissimilarities and let’s disregard the 99% similarity. In that 1% we have really a lot of little tiny typos and differences. And they are all over the DNA.
When the parents create a new life, they each give away half of their DNA. A child is half dad, half mom. But it’s not always the same half. Let’s say dad has a list of typos in his DNA that I will just call A B C D E F G H, and mom has I J K L M N O P. When they create a baby, it can perhaps inherit A C E H from dad and J M N P from mom. A next baby can inherit A E F G from dad and I J M O from mom. In this case these two siblings share A E J M, half of the 8 spots. It’s very imprecise and ELI5, but you get the point: siblings, on the statistical average (not always exactly), share 50% of the genetic content in terms of typos and differences coming from the parents. This is all within the 1% difference. The other 99%, which is common to all humans, is the same in siblings too.
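Here is the same example as a Python sketch, plus a quick simulation of the "on average" part. The simulation assumes the toy model from above (each child gets a random 4 of dad's 8 typos and a random 4 of mom's), which is not how real inheritance works in detail. The names `make_child` and `average_sharing` are made up.

```python
import random

dad = list("ABCDEFGH")
mom = list("IJKLMNOP")

# The two siblings from the example above.
child1 = set("ACEH") | set("JMNP")
child2 = set("AEFG") | set("IJMO")

shared = child1 & child2
fraction = len(shared) / len(child1)
print(sorted(shared), f"{fraction:.0%}")


def make_child(rng):
    """Toy model: a random half of dad's typos plus a random half of mom's."""
    return set(rng.sample(dad, 4)) | set(rng.sample(mom, 4))


def average_sharing(trials=10_000, seed=0):
    """Average fraction of the 8 inherited typos that two random siblings share."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += len(make_child(rng) & make_child(rng)) / 8
    return total / trials


print(f"average over many sibling pairs: {average_sharing():.1%}")
```

Any single pair of siblings can land above or below half, but the average over many pairs settles close to 50%.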
And that’s it. I hope it’s clearer now.