How does DNA matching work?


Recent story about Colin Pitchfork (first person convicted of rape using DNA analysis) got me wondering how this works. How is DNA coded in a way which allows you to match it up with another sample? When you look at it under a microscope or something there aren’t exactly letters and numbers for each part of it.


The most common technique is to use various enzymes that cut the DNA at specific points. The resulting fragments are then forced through a gel by an electrical field. Shorter fragments go through the gel faster than longer ones. If you switch the power off when the fragments are partway through the gel and then use a fluorescent dye and UV light to make them visible, they form bands at positions that depend on their lengths. The theory here is that DNA from different people is different enough that it gets cut in different locations and therefore yields fragments of different lengths. Using different enzymes gives you a different pattern again. Each set of DNA gives a distinctive fingerprint that you can compare.
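To make the cutting step concrete, here is a toy sketch in Python. It treats DNA as a plain string and cuts it wherever a recognition site appears, then lists the fragment lengths the way a gel would separate them. The site `GAATTC` (cut after the `G`) is the one used by the enzyme EcoRI; the function name `digest` and both sample sequences are made up for illustration, and real samples are vastly longer.

```python
# Toy model of a restriction digest: cut a DNA string at every occurrence of a
# recognition site and report the fragment lengths, shortest first.

def digest(sequence, site="GAATTC", cut_offset=1):
    """Return sorted fragment lengths after cutting `sequence` at each `site`.

    `cut_offset` is how far into the site the cut falls
    (EcoRI cuts G^AATTC, i.e. one base into the site).
    """
    lengths = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)   # fragment from the previous cut to this one
        start = cut
        pos = sequence.find(site, pos + 1)
    lengths.append(len(sequence) - start)  # whatever is left after the last cut
    return sorted(lengths)


# Two made-up samples (assumption): sample_b differs by a single letter,
# which destroys the second recognition site.
sample_a = "ATGAATTCGGCTAGAATTCTTAA"
sample_b = "ATGAATTCGGCTAGAGTTCTTAA"

print(digest(sample_a))  # [3, 9, 11]
print(digest(sample_b))  # [3, 20]
```

One changed letter removes a cut site, so the two samples give different band patterns even though they are almost identical letter for letter.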

It should be noted that this system is not perfect by any means. Even though no two people other than identical twins have the same DNA, this test can make two different sets of DNA look the same by pure chance, just as two fingerprints may look very similar by chance. These tests also use DNA amplification techniques first, so it is possible that a small stray sample, like a single dead skin cell, happens to be the one that gets amplified and tested. So a DNA match might only mean that the suspect was on the same subway as either the victim or the criminal within the last few days.

In addition to the older analog method of measuring the length of particular bits of the genome through gel electrophoresis, more modern methods that sequence the sample DNA *do* actually yield a (massive) string of letters. These letters, A, T, C and G, represent the four nucleobases (adenine, thymine, cytosine and guanine) that encode information in DNA.

The chemistry behind how a bunch of small molecules can encode huge amounts of information is something for a different post, which I’m sure has been asked and answered on this sub before. All you need to know is that similarly to how a computer can store information as a long string of zeroes and ones, DNA does something very similar but with 4 possibilities per place along the string instead. And just like you could have a computer compare two strings of binary information and calculate a % match, you can do the same with sequenced DNA.
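As a rough illustration of the "% match" idea, here is a minimal Python sketch. It assumes the two sequences are already lined up and have the same length, which real sequencing data usually is not (real tools align the sequences first). The function name `percent_match` and the example strings are invented for this post.

```python
def percent_match(seq_a, seq_b):
    """Percentage of positions where two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare letter by letter and count the positions that are identical.
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Made-up 10-letter sequences that differ at exactly one position.
print(percent_match("ATCGATCGAT", "ATCGTTCGAT"))  # 90.0
```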

Some good answers in here.

Modern DNA profiling can be done with restriction endonuclease analysis and electrophoresis, or with sequencing, but in human identification we typically use Short Tandem Repeat (STR) analysis.

To ELI5: we use a special magnifying glass (complicated chemistry and lab equipment) to look at differences in DNA that all people have (except twins). What we see is basically an address to one’s body. If you go some place and touch something, you leave cells behind. We can look at the address of those cells and match them to your body, and confirm those cells came from you.

I’m a forensic scientist specializing in DNA in the US. Feel free to ask me anything.

The code of your DNA is unique to you. Police compare the code of a sample they have to a suspect’s code and see if they match.

It may be different for humans, although I don't really see a reason why it would be, but this is how we do it for animals (I'm in the veterinary field):

While some parts of your DNA have a decently even mix of A, T, G and C (I see other comments have already explained what these are so I'm not going into detail), other regions are very repetitive. Some of them are called "variable number tandem repeats" or VNTRs, which is really just a complicated way to say there's a short run of letters repeating itself, e.g. the pair AT in ATATATATATATATATATATATAT.

Now the interesting part is that these VNTRs can vary in the number of repeats. I might have 10 pairs of AT while you might have 15 pairs. This means that by measuring the length of these VNTRs, we can determine whether two DNA samples match up.

We can do that by cutting out the fragments of the chromosome that contain these VNTRs and measuring them using advanced machinery.
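If you had the letters written out, counting the repeats could be sketched like this in Python. This is only a toy: labs measure fragment length rather than reading a string, and the function name `longest_tandem_run`, the flanking letters and both samples are made up to mirror the 10-versus-15 example.

```python
def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    n = len(unit)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Keep stepping one unit at a time for as long as the unit repeats.
        while sequence[pos:pos + n] == unit:
            count += 1
            pos += n
        best = max(best, count)
    return best


# Made-up samples: the same flanking letters around different repeat counts.
sample_mine = "GGC" + "AT" * 10 + "CCG"
sample_yours = "GGC" + "AT" * 15 + "CCG"

print(longest_tandem_run(sample_mine, "AT"))   # 10
print(longest_tandem_run(sample_yours, "AT"))  # 15
```

Different counts at the same spot mean the two samples cannot come from the same individual; equal counts at one spot prove little on their own, which is why many spots are checked.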

Of course it’s a little more complicated than that, and there is a bunch of statistics involved to make sure it’s not just a coincidence that two DNA samples would happen to have the same length VNTRs, but that’s the gist of it. In the end, if you consider enough VNTRs and do the math properly, you end up with a conclusion along the lines of “there is only a 0.01% chance that these two samples that have the same VNTR lengths for every VNTR we analysed, do NOT belong to the same individual”.
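The "bunch of statistics" can be sketched very roughly as follows: if you know how common each observed repeat length is in the population, and you assume the VNTRs are inherited independently of each other, the chance that a random unrelated individual matches at every one of them is the product of those frequencies. The frequencies below are invented purely for illustration, and the name `random_match_probability` is mine; real casework uses measured population data and more careful models.

```python
from math import prod

# Hypothetical numbers (assumption): for each of five VNTRs, the fraction of
# the population that shares the repeat length seen in the sample.
locus_frequencies = [0.10, 0.05, 0.20, 0.08, 0.12]


def random_match_probability(frequencies):
    """Chance that an unrelated individual matches at every locus,
    assuming the loci are independent of each other."""
    for f in frequencies:
        if not 0 < f <= 1:
            raise ValueError("each frequency must be in (0, 1]")
    return prod(frequencies)


p = random_match_probability(locus_frequencies)
print(f"random match probability: {p:.2e}")
print(f"roughly 1 in {round(1 / p):,}")
```

Each extra VNTR multiplies the probability by another small number, which is why looking at more of them makes a coincidental match so much less likely.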

It’s like a genetic fingerprint!