Eli5: What does percent in DNA mean?


Siblings share 50% of their DNA. Then scientists say we have 1-4% Neanderthal DNA, but say we are 99% related to chimps. Then there is the fact that we share 60% of our DNA with bananas.


Geneticist here. I totally hate those numbers because without context they mean nothing.

You can measure DNA similarity in different ways, and use whichever you need in a context. It will be a bit tldr but I’ll try to keep it streamlined.

So first, DNA is basically a message carved into a material. Let’s imagine a sentence in English:

>*The rabbit ate a carrot.*

You can have another sentence by adding or removing letters:

>*The* rabbi *ate a carrot.*

Or by changing a letter to another letter:

>*The rabbit ate a* parrot.

Or imagine the following sentence:

>*The rabbit ate a* cabbage.

Now if I asked you to rank these sentences based on how similar they are to the original, you would be in trouble. Which counts as more similar? All letters the same but one letter missing (rabbit vs rabbi)? The same number of letters but one exchanged (carrot vs parrot)? Or is cabbage the best fit, even though the letters are totally different, because at least it’s a veggie?

And that’s exactly the problem with DNA comparison. DNA is a message made up of 4 letters (A, T, G and C), and it can differ in length and/or content (letter exchange), as well as in how the chapters are organized. There’s no universal rule for making comparisons, because there’s no agreed percentage score assigned to each kind of possible alteration.

Now you can actually define what you are scoring. For example, a chimpanzee and a human are very similar. Chimps have all the sentences that we have, organized a bit differently and full of typos. It’s like:

>*Tha rabbti ate an carot.*

So for a chimp vs human comparison you can just take every single human sentence, and for each DNA letter check whether the chimp has that letter right or not. A missing letter costs one point, and an exchanged letter also costs one point. You would find that about 95% of the letters are still correct.
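To make the scoring rule above concrete, here is a minimal sketch in Python. It assumes that "one minus per missing or exchanged letter" can be modelled as the classic edit (Levenshtein) distance. That is my stand-in for illustration, not how real genome aligners score things, and the function names `edit_distance` and `percent_identity` are made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Fewest single-letter insertions, deletions, or exchanges turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # letter of a is missing in b
                curr[j - 1] + 1,     # b has an extra letter
                prev[j - 1] + cost,  # same letter, or one exchange
            ))
        prev = curr
    return prev[-1]


def percent_identity(reference: str, other: str) -> float:
    """Percent of reference letters left 'correct', one penalty per edit."""
    penalty = edit_distance(reference, other)
    return max(0.0, 100.0 * (1 - penalty / len(reference)))


human = "The rabbit ate a carrot."
chimp = "Tha rabbti ate an carot."
print(f"{percent_identity(human, chimp):.1f}% of the letters are correct")
```

Note that this rule scores rabbit vs rabbi and carrot vs parrot identically (one penalty each), which is exactly the kind of arbitrary choice the ranking question was about.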

This was easy, let’s dig deeper for the banana. To understand banana, let’s introduce a little more text to compare. Humans would be like:

>*The rabbit ate a carrot. There is a house on the top of the hill.*

Bananas would be like:

>*The fish live on the ocean. The carrot was eaten by the hare.*

As you can see, humans have a sentence that is completely missing from the banana, and bananas have a sentence that is completely missing from humans. It’s pointless to compare sentences that have nothing to do with each other, so you should focus on the sentences that are somewhat similar (the ones with the carrots). That’s exactly how the human vs banana comparison works: it focuses on some very basic genes that are crucial in all life forms and disregards everything else. Those crucial genes that both bananas and humans have are roughly 60% similar. If you instead compared the whole thing letter by letter, like we did with the chimp, the result would be nothing but random noise. So the banana’s 60% is a totally different kind of comparison than the chimp’s 95%.
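A rough sketch of that "only compare what both have" idea, in Python. Everything here is a toy: the gene names and sequences are invented (and deliberately picked so the answer lands on 60%), the genes are assumed to be already lined up and equal in length, and `shared_gene_similarity` is a hypothetical helper, not a real bioinformatics function.

```python
# Made-up toy "genomes": gene name -> DNA letters. Not real sequences.
human = {
    "energy":   "ATGGCCAAGT",
    "dna_copy": "ATGTTTGGCA",
    "brain":    "ATGCCCGGGA",  # bananas have nothing like this
}
banana = {
    "energy":   "ATGACTCAGA",
    "dna_copy": "ATGCTAGCCT",
    "peel":     "ATGAAATTTC",  # humans have nothing like this
}


def shared_gene_similarity(a: dict, b: dict):
    """Percent of matching letters, counted only over genes present in both."""
    shared = sorted(a.keys() & b.keys())
    if not shared:
        raise ValueError("no genes in common, nothing meaningful to compare")
    matches = 0
    total = 0
    for name in shared:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"{name}: this sketch assumes equal-length genes")
        matches += sum(x == y for x, y in zip(a[name], b[name]))
        total += len(a[name])
    return shared, 100.0 * matches / total


genes, percent = shared_gene_similarity(human, banana)
print(genes, percent)
```

The "brain" and "peel" genes never enter the score at all, which is the whole trick: the percentage describes the overlap, not the two genomes as wholes.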

To understand human vs human comparison, we need to dig a little deeper. DNA has meaningful sentences, but also meaningless filler gibberish between the sentences. Let’s see what that looks like with yet another text.

>*The rabbit ate a carrot. Bla bla bla bla bla bla bla. There is a house on the top of the hill.*

Now the “bla bla” part has no meaning whatsoever, it’s just there. You can have more bla there. You can have blu instead of bla. It still conveys the same message. If you compare only the meaningful sentences, then human DNA is extremely similar from person to person. Differences that look big (like eye color, skin color, blood type) are caused by minuscule typos in the DNA. Major messages, like how to build a human in general, are all the same. If you include the bla bla part in your comparison, you will find more dissimilarity, because that’s where we do have some differences. And that’s where the 99% or so comes from.

But then what is this thing with 50% for siblings? Well, it’s a completely different thing. To understand it, let’s focus on the 1% of dissimilarities and disregard the 99% similarity. In that 1% we have a whole lot of tiny typos and differences, and they are spread all over the DNA.

When parents create a new life, they each give away half of their DNA. A child is half dad, half mom. But it’s not always the same half. Let’s say dad has a list of typos in his DNA that I will just call A B C D E F G H, and mom has I J K L M N O P. When they create a baby, it can perhaps inherit A C E H from dad and J M N P from mom. The next baby can inherit A E F G from dad and I J M O from mom. In this case the two siblings share A E J M, half of their 8 spots. It’s very imprecise and Eli5, but you get the point: siblings share, on statistical average (not always exactly), 50% of the genetic content in terms of typos and differences coming from the parents. This is all within the 1% difference. The other 99%, which is common to all humans, is the same in siblings too.
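If you want to see the "on average, not always exactly" part, the toy model above is easy to simulate. This Python sketch assumes exactly what the example assumes (each child gets a random 4 of dad's 8 typos and a random 4 of mom's 8), which is a big simplification of real inheritance. The names `make_child` and `average_sibling_sharing` are invented for the illustration.

```python
import random

DAD = list("ABCDEFGH")  # dad's typos, as in the example above
MOM = list("IJKLMNOP")  # mom's typos


def make_child(rng: random.Random) -> set:
    """A child inherits a random half of each parent's typos (toy model)."""
    return set(rng.sample(DAD, 4)) | set(rng.sample(MOM, 4))


def average_sibling_sharing(pairs: int = 20000, seed: int = 0) -> float:
    """Average percent of a child's 8 typos that a sibling also carries."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        first, second = make_child(rng), make_child(rng)
        total += len(first & second) / len(first)
    return 100.0 * total / pairs


# The two babies from the worked example share A, E, J, M:
baby_1 = set("ACEH") | set("JMNP")
baby_2 = set("AEFG") | set("IJMO")
print(sorted(baby_1 & baby_2))

# Many sibling pairs: individual pairs vary, the average sits near 50%.
print(f"{average_sibling_sharing():.1f}%")
```

Any single pair of siblings in this model can share anywhere from none to all 8 typos. Only the average over many pairs settles around one half.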

And that’s it. I hope it’s clearer now.
