It’s a bit complicated. DNA is a string of building blocks called nucleotides. To make it easier to work with, we can replace these building blocks with letters and then compare the strings of letters between two species.
1) First we need to align the chains of letters so they fit together. For example, if you have the chain BCCDAABCD and the chain DAABCDCDDA, you can see that the last 6 letters of the first chain and the first 6 letters of the second chain are the same (DAABCD), so we can align them there. Of course, real DNA is one very long chain, so it’s a bit more complex to do for real. There is usually a certain amount of letters that can’t be aligned with each other.
2) Then you need to spot the mutations that happened between the two DNA sequences you are comparing. The first type of mutation is an indel, short for insertion/deletion. They form a single category because we can’t always know whether a piece of DNA was deleted from one chain or inserted into the other. So for example, if you have DDACDABBADC and DDBDACDABBADC, you can see that there is a BD that was either deleted from the first chain or inserted into the second chain.
3) The next type of mutation is a substitution. That one is easy: if you have DDACDABBADC and DBACDABBADC, you can see that the second letter changed from D to B.
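If it helps to see the three steps on the toy chains above, here is a rough Python sketch. The function names are made up for this illustration, and the approach is deliberately naive: it only handles a simple end-to-start overlap, a single contiguous indel, or same-length chains. Real genome comparison uses dedicated alignment tools, not this.

```python
def longest_overlap(first: str, second: str) -> int:
    """Length of the longest end of `first` that equals the start of `second`."""
    for size in range(min(len(first), len(second)), 0, -1):
        if first.endswith(second[:size]):
            return size
    return 0


def find_single_indel(a: str, b: str):
    """Assumes at most ONE contiguous indel separates the two chains.

    Returns (number of matching leading letters, extra piece in the longer
    chain), or None if the difference is not a single indel.
    """
    shorter, longer = sorted((a, b), key=len)
    # Count matching letters from the left.
    prefix = 0
    while prefix < len(shorter) and shorter[prefix] == longer[prefix]:
        prefix += 1
    # Count matching letters from the right, without reusing prefix letters.
    suffix = 0
    while (suffix < len(shorter) - prefix
           and shorter[-1 - suffix] == longer[-1 - suffix]):
        suffix += 1
    if prefix + suffix != len(shorter):
        return None
    return prefix, longer[prefix:len(longer) - suffix]


def substitutions(a: str, b: str):
    """List (1-based position, old letter, new letter) for same-length chains."""
    if len(a) != len(b):
        raise ValueError("chains must have the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(longest_overlap("BCCDAABCD", "DAABCDCDDA"))         # 6
print(find_single_indel("DDACDABBADC", "DDBDACDABBADC"))  # (2, 'BD')
print(substitutions("DDACDABBADC", "DBACDABBADC"))        # [(2, 'D', 'B')]
```

Note that the indel function cannot tell you whether BD was inserted or deleted, only that one chain has it and the other doesn’t, which is exactly why the two are lumped into one category.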
So now you end up with 4 categories when you compare two DNA sequences: identical, substitution, indel and unaligned. You can then get a simple % of identical DNA, but the exact % will depend on how you present the data. For example, when you hear that our DNA is 98% similar to a chimpanzee’s, that’s only when you look at substitutions and ignore indels and unaligned DNA. If you count substitutions and indels, but not unaligned DNA, you get a similarity of about 95%, and if you look at the entire DNA you get about 81% similarity.
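To make the "it depends on how you count" point concrete, here is a small Python sketch. The counts are invented toy numbers (per 1,000 letters), picked only so the three results land near the figures quoted above; they are not real measurements, and the function name and the exact way of counting are my own assumptions.

```python
def similarity_percentages(identical, substitution, indel, unaligned):
    """Percent identical under three different ways of counting."""
    subs_only = identical / (identical + substitution)
    subs_and_indels = identical / (identical + substitution + indel)
    everything = identical / (identical + substitution + indel + unaligned)
    return {
        "substitutions only": round(100 * subs_only, 1),
        "substitutions + indels": round(100 * subs_and_indels, 1),
        "entire DNA": round(100 * everything, 1),
    }


# Toy counts per 1,000 letters (made up for illustration, not real data).
result = similarity_percentages(
    identical=810, substitution=17, indel=26, unaligned=147
)
for label, percent in result.items():
    print(f"{label}: {percent}%")
```

Same two chains, same differences, three different headline numbers, just by changing what goes in the denominator.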
Now when it comes to knowing what those differences actually do, it’s another story. We don’t currently know what each building block of the DNA does. We identify what each bit of DNA does one by one over time, but it’s a very complex thing. Some parts of DNA don’t seem to do anything, but are they really useless, or are we just not yet able to identify their role? There are billions of base pairs in human DNA, and understanding each of them will take a long, long time.