How do scientists read DNA sequences?


I’ve been watching some videos about DNA and came across a DNA sequence with a bunch of letters for a virus. How do scientists read them or make sense of them? Can they tell what a virus or a piece of DNA will do, or how it will work, just from the sequence?


DNA sequences consist of four nucleotides (these are just simple molecules), and they are abbreviated by four letters: a, t, g and c.

Those letters can encode proteins, which do a lot of the chemical work that makes up what we think of as life. The letters encode each subunit of a protein (an amino acid residue) by their arrangement in triplets such as acc, tca and ccc. Each of these triplets is called a codon.

So, if you want to read a coding sequence, you can manually start looking at the sequence until you see a “start” codon (atg) and then just roll down the line a codon at a time until you get to the end. The end is encoded by a “stop” codon. You then look at all of the amino acid residues in sequence to figure out what the encoded protein is by comparing it to other known protein sequences. If it does not match anything, then you have to do a ton of work to figure out if the protein is actually made by the organism and then what it does. You can usually get at least some idea of basic function just by looking at the sequence.
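To make the "roll down the line a codon at a time" idea concrete, here is a rough Python sketch. The function name `translate_first_orf` is made up for illustration, and it assumes the standard genetic code and only looks at the first atg on one strand. Real tools check all reading frames and both strands.

```python
from itertools import product

# Standard genetic code, with codons listed in t, c, a, g order.
# "*" marks a stop codon.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(dna):
    """Translate from the first 'atg' until a stop codon or the end.

    Hypothetical helper for illustration only.
    """
    dna = dna.lower()
    start = dna.find("atg")
    if start == -1:
        return ""  # no start codon found

    protein = []
    # Step through the sequence three letters (one codon) at a time.
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break  # stop codon: the protein ends here
        protein.append(amino_acid)
    return "".join(protein)


print(translate_first_orf("ccatggcctaaag"))  # atg gcc taa -> "MA"
```

The one-letter string you get back (M for methionine, A for alanine, and so on) is what you would then compare against known protein sequences.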

Now, that is just what is called coding sequence. There is a whole lot of DNA around and within coding sequences and the analysis of those sequences is much, much more complex. This “non coding” sequence may regulate gene expression, contribute to the physical structure of the genome, or maybe not do much of anything. The best way to analyze this information these days is with special software that looks for certain sequences or patterns.
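As a toy version of what pattern-finding software does, here is a small Python sketch that scans a sequence with a regular expression. The function name `find_motif` is hypothetical, and the example pattern is a simplified TATA-box-like motif chosen purely for illustration. Real tools use statistical models rather than a single fixed pattern.

```python
import re


def find_motif(dna, pattern):
    """Return (position, matched text) for each non-overlapping match.

    Hypothetical helper for illustration only.
    """
    return [(m.start(), m.group()) for m in re.finditer(pattern, dna.lower())]


# Simplified TATA-box-like pattern: "tata", then a or t, then a, then a or t.
print(find_motif("ggtataaaagg", "tata[at]a[at]"))  # [(2, 'tataaaa')]
```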

You mentioned viral DNA. A viral genome is almost always arranged with high efficiency and has complex gene regulation. An expert in viral genomes can quickly identify the majority of coding sequences in a viral genome, but figuring out the nuances of gene regulation and protein function can take years.

I hope this helps.

Most of the time, we use DNA in comparison to other sequences. It’s less about translating it like it’s a different language and more about seeing if the DNA is similar to a sequence you already know. So you have to have some prior knowledge to compare the DNA to, which is why the human genome project was so important. Now we can compare and contrast DNA to see what’s different in someone with a disease and that will help point us towards the protein it was supposed to make etc etc.
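Here is a minimal Python sketch of the "compare and contrast" idea. It assumes the two sequences are already lined up and the same length, which is a big simplification: real comparison tools first align the sequences to handle insertions and deletions. The function name `find_differences` is made up for this example.

```python
def find_differences(reference, sample):
    """List (position, reference letter, sample letter) wherever they differ.

    Hypothetical helper; assumes both sequences are already aligned
    and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# One letter differs at position 3 (counting from 0).
print(find_differences("atgacc", "atgtcc"))  # [(3, 'a', 't')]
```

A single changed letter like this can change a codon, and so change an amino acid in the protein, which is the kind of clue that points researchers toward the affected protein.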

However, because we have a LOT of data about DNA, we can make really good predictions by comparing and contrasting that DNA. Sometimes we can look at a virus’s genetic information and know that it will be resistant to a certain drug. Sometimes we just don’t have enough context clues to make an assumption like that.

There is a lot of DNA that is “structural”, so it serves as (important!) space between genes. A gene is a segment of DNA that holds the instructions for making a protein. Structural DNA never gets translated into proteins. Because structural DNA often isn’t easy to tell apart from DNA that is part of genes, we can’t always “predict” how a stretch of DNA will work from just the letters. (This is similar to exons/introns if you’ve heard those words. Not quite the same thing but kind of the same idea.)

Also, when a protein is made from a gene, there are “chaperone proteins” around it that help it form its correct shape. These are complicated and unique interactions. Each one has to be studied on each individual protein because it’s different from protein to protein. Sometimes we can do that, but a lot of the time it’s hard to convince someone to fund your research just because you’re curious.