How do we know DNA sequences make life similar to other life?

So we can sequence DNA, and we can compare the bases (or the amino acids they code for) against other strands, but how do we know a match tells us anything about how similar the things they actually build are?

I’m a programmer, so the best analogy I can give is binary, I guess. Just because a bunch of 0s and 1s in one part of the machine match those in another part doesn’t mean they act the same within the “thing” they are part of.

How do we know what our DNA does, with enough certainty to say “this organism is similar to that one”?

6 Answers

Anonymous

DNA isn’t just 1s and 0s. It’s an instruction set architecture for biology (sticking with the OP’s programming analogy). When we find ADD instructions in a plant that do the same thing as ADD instructions in humans (like producing a valuable molecule such as adenosine triphosphate, or ATP), that tells us that the plant and the human are running the same instruction set.

Anonymous

It is a lot of observation.

We can look at the DNA of animals that look alike (dogs, for example) and see that they have very similar DNA, compared to, say, a dog and a cat. If we look at animals and their offspring, we can see that the offspring tend to resemble their parents (a cow that produces a lot of milk is likely to have offspring that produce a lot of milk too) and that their DNA is very similar, but with minor alterations. It doesn’t take much brain power to extrapolate that minor changes over many generations result in a gradual change in DNA, and a matching change in the organisms.
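To make “minor changes over many generations” concrete, here is a minimal Python sketch under a deliberately simple assumption: every base independently has a small fixed chance (`rate`, a made-up parameter) of being swapped for a different base each generation. `mutate` and `hamming` are illustrative helpers, not from any library, and real mutation processes are far less uniform than this.

```python
import random

BASES = "ACGT"

def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Copy seq; each base is replaced by a *different* base with probability `rate` (toy model)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)  # fixed seed so runs are repeatable
ancestor = "".join(rng.choice(BASES) for _ in range(1000))
descendant = ancestor
for generation in range(1, 301):
    descendant = mutate(descendant, rate=0.0005, rng=rng)
    if generation % 100 == 0:
        # Differences from the ancestor tend to pile up a little at a time.
        print(f"generation {generation}: {hamming(ancestor, descendant)} of 1000 bases differ")
```

Each single generation changes almost nothing, which is why parent and offspring look nearly identical, while the count printed every 100 generations keeps drifting upward.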

Anonymous

Let’s stick with programming concepts.

DNA isn’t just a sequence of A/T/C/G with no organization, just like binary isn’t just a sequence of 0/1. First, nucleotides in DNA are interpreted in groups of 3 to form a codon, comparable to how bits are interpreted in groups of 8 to form a byte. And the amino acid each codon translates to is (nearly always) the same across different organisms, like different computers decoding bytes into text through the same standard (e.g. UTF-8). On top of that, protein-coding genes are somewhat similar to computer files or file formats: they have sequences that mark their start, end, intermediate breaks and even, in a way, metadata, and these are also consistent within major branches of life (and handled in different but conceptually similar ways between branches).
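The simplest of those start/end markers to demonstrate are the start codon `ATG` and the stop codons `TAA`, `TAG` and `TGA` of the standard genetic code. Below is a minimal Python sketch that finds open reading frames (a start codon followed by the next in-frame stop codon), assuming we scan only one strand and ignore introns (the “intermediate breaks”) and every other signal. `find_orfs` is a hypothetical helper, not a library function.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str) -> list[tuple[int, int]]:
    """Return (start, end) slices running from an ATG to the next in-frame stop codon (one strand only)."""
    orfs = []
    for frame in range(3):  # a codon can start at offset 0, 1 or 2
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # like a file header: translation would begin here
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))  # like end-of-file: include the stop codon
                start = None
    return sorted(orfs)

print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14)]
```

Reading in the wrong frame is like parsing a file at the wrong byte offset: the same bases turn into completely different codons, which is why the sketch checks all three offsets.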

Lastly, when you start researching individual genes, you’ll generally find similar genes in other organisms. The proteins they encode are the tools of the cell and ultimately represent how *things get done*. The more similar these are between organisms, the more similar those organisms tend to be. If it quacks like a duck and all that.

Anonymous

So the binary bits of 1 and 0 would be analogous to a single DNA nucleotide: A, C, G, or T.

Individually, they don’t do much. So in computers, you group 8 of them together to make a byte (let’s use ASCII as an example). In biology, we group 3 nucleotides together to make a codon.

We’ve standardized binary codes to encode different ASCII characters. Life has standardized codons to encode individual amino acids, which are chained together to form proteins. Proteins do just about everything in the human body. Biologists have determined that the codon assignments are nearly universal across all living things. (Google “protein translation” for more info.)

This means we can look at the DNA code and know the amino acid sequence of the protein it encodes. We can use that information to infer the protein’s structure and function.
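The byte/ASCII versus codon/amino acid parallel can be shown side by side. This is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) written in its common compact form, with amino acids as one-letter codes and `*` for stop. `translate` is a hypothetical helper, and it assumes reading starts at position 0, whereas a real ribosome first has to find the start codon.

```python
# Computers: agreed-upon byte values -> characters (ASCII).
print(bytes([72, 105]).decode("ascii"))  # Hi

# Biology: agreed-upon codons -> amino acids (standard genetic code, bases ordered T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Read dna three bases at a time from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCTGGTAA"))  # MAW (Met-Ala-Trp, then stop)
```

Both halves are plain lookup tables; the point is that every reader agrees on the same table, which is what lets us predict the protein from the DNA.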

Anonymous

ELI5: There are ways in which the same DNA sequence can do different things in different animals, but they all depend on other bits of DNA.

Knowing perfectly what DNA does just by its sequence is basically not possible at this time. The only way right now to take a genome and say “this is what kind of organism this genetic sequence would grow up into”, would be to compare it to the genomes of other living organisms, and make an educated guess based on the similarity.
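As a rough illustration of “compare it to the genomes of other living organisms”, here is a minimal Python sketch that scores similarity by how many short substrings (k-mers) two sequences share (the Jaccard index) and picks the closest reference. The species names and sequences are made up, and `closest_reference` is a hypothetical helper. Real comparisons run over whole genomes with alignment tools (e.g. BLAST) or k-mer sketches, but the “educated guess by similarity” idea is the same.

```python
def kmers(seq: str, k: int = 4) -> set[str]:
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all k-mers seen in either set (0 = nothing shared, 1 = identical sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def closest_reference(query: str, references: dict[str, str], k: int = 4) -> str:
    """Name of the reference whose k-mer set overlaps most with the query's."""
    query_kmers = kmers(query, k)
    return max(references, key=lambda name: jaccard(query_kmers, kmers(references[name], k)))

# Hypothetical toy "genomes".
references = {
    "species_A": "ATGGCCATTGTAATGGGCCGC",
    "species_B": "TTTAAACCCGGGTTTAAACCC",
}
unknown = "ATGGCCATTGTAATGGGCCGA"  # one base different from species_A
print(closest_reference(unknown, references))  # species_A
```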

That said, we do know a lot about how different biological systems work, so we have a lot of structured ways of figuring out what we’re looking at with genomic data. We can do things like take an entire chromosome’s sequence and pick out which regions of it are actually genes, and we can usually also spot genes that resemble genes in other species known to be involved in major, important control functions.

Everything that follows ranges from the ELI15 to the ELI50:

Every three bases in a coding sequence code for a single amino acid. How does that happen? When the ribosome reads the mRNA transcript, it can only keep going if a transfer RNA (tRNA) “matches” the next codon and adds its specific amino acid onto the growing protein chain. A tRNA works because it folds into a shape that can interact with the ribosome and also do two other things: pair with a specific codon on the mRNA (through a three-base region called the anticodon) while carrying a specific amino acid.
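The codon-by-codon “matching” can be sketched in Python as a toy model. It assumes a tiny, made-up pool of three tRNAs and exact Watson-Crick pairing only; real cells have many more tRNAs, allow “wobble” pairing at the third base, and use release factors to end translation at stop codons. Because the anticodon pairs with the codon in the opposite direction, the anticodon (read 5' to 3') is the reverse complement of the codon. `anticodon_for`, `TRNA_POOL` and `ribosome` are hypothetical names.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon: str) -> str:
    """Anticodon (5'->3') that base-pairs with an mRNA codon: the reverse complement."""
    return "".join(PAIR[base] for base in reversed(codon))

# Hypothetical tRNA pool: anticodon -> amino acid it carries (a tiny subset).
TRNA_POOL = {"CAU": "Met", "GGC": "Ala", "CCA": "Trp"}

def ribosome(mrna: str) -> list[str]:
    """Read mRNA codon by codon; keep going only while some tRNA pairs with the next codon."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = TRNA_POOL.get(anticodon_for(codon))
        if amino_acid is None:
            break  # no matching tRNA in this toy pool (e.g. the stop codon UAA): the chain is released
        chain.append(amino_acid)
    return chain

print(ribosome("AUGGCCUGGUAA"))  # ['Met', 'Ala', 'Trp']
```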

So the DNA that codes for the tRNA library (and for the enzymes that load each tRNA with its amino acid) is what determines how an organism reads the coding regions of its own DNA. There’s one nearly universal standard code, but there [are alternatives](https://en.wikipedia.org/wiki/List_of_genetic_codes).
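To see why the tRNA “library” matters, here is a minimal Python sketch that reads the same DNA with two codon tables: the standard code and the vertebrate mitochondrial code, one of the alternatives, which differs from the standard code at four codons (`TGA` becomes tryptophan, `ATA` becomes methionine, and `AGA`/`AGG` become stop). `translate` is a hypothetical helper, and only those four differences are modeled.

```python
BASES = "TCAG"
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: STANDARD[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Vertebrate mitochondrial code: the standard code with four codons reassigned.
VERT_MITO_CODE = {**STANDARD_CODE, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}

def translate(dna: str, code: dict[str, str]) -> str:
    """Translate from position 0 using the given codon table, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGTGAATAAGA"
print(translate(dna, STANDARD_CODE))   # M (TGA is read as stop)
print(translate(dna, VERT_MITO_CODE))  # MWM (TGA = Trp, ATA = Met, AGA = stop)
```

Same bytes, different decoder, different output: this is the biological version of opening a file with the wrong character encoding.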

Okay, so once you have a coding structure, you have to specify which regions of the DNA are actually meant for coding proteins. To do that, there are certain upstream (and sometimes downstream) structures and patterns that recruit proteins like RNA polymerase, which transcribes DNA into an mRNA transcript. There are various DNA patterns that do this, many of which are called [promoters](https://en.wikipedia.org/wiki/Promoter_(genetics)) because they promote the activity of a gene, usually a nearby one.
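Promoter recognition is much messier than a string search, but the core idea of scanning for a short consensus pattern can be sketched. This assumes the commonly cited TATA-box consensus `TATAWAW` (W means A or T), which appears in some, but not all, eukaryotic promoters. The sequence is made up and `find_tata_boxes` is a hypothetical helper; real tools score candidate sites against position weight matrices instead of demanding an exact pattern.

```python
import re

# TATA-box consensus TATAWAW, where W means "A or T".
# The lookahead makes the match zero-width, so overlapping hits are all reported.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")

def find_tata_boxes(dna: str) -> list[tuple[int, str]]:
    """Return (position, matched text) for every TATA-box-like motif on this strand."""
    return [(m.start(), m.group(1)) for m in TATA_BOX.finditer(dna)]

upstream = "GCGCTATAAAAGGCGCATCGATG"  # hypothetical sequence upstream of a gene
print(find_tata_boxes(upstream))  # [(4, 'TATAAAA')]
```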

What makes a promoter DNA sequence able to serve as a promoter? It’s able to serve as a promoter because of the existence of proteins called transcription factors (TFs) that are able to bind to the promoter sequence. TFs also bind to other TFs, or to RNA polymerase. By being able to bind together, they form active little clumps of protein that encourage RNA polymerase to attach to the gene and start transcribing.

When a TF gene is expressed, the TF protein attaches to the promoters of other genes to ultimately recruit RNA polymerase to transcribe them. Those other genes might be TFs themselves, which means that you can get complex cascades of TFs that set each other off, one by one, as well as setting off any genes that share the promoters involved. These TF cascades can activate whole biological programs involving many genes; they can be useful precisely *because* they serve as centralized control points that evolution can act on.
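A TF cascade behaves a lot like walking a dependency graph. Here is a minimal Python sketch, assuming a made-up network in which each TF switches on the genes whose promoters it binds (the names `tfA`, `geneX` and so on are hypothetical), and ignoring timing, repression and dosage. A breadth-first walk lists genes in the order they would switch on.

```python
from collections import deque

# Hypothetical regulatory network: TF -> genes whose promoters it binds and activates.
REGULATES = {
    "tfA": ["tfB", "geneX"],
    "tfB": ["tfC", "geneY"],
    "tfC": ["geneZ"],
}

def cascade(start: str) -> list[str]:
    """Genes in the order they switch on, starting from one active TF (breadth-first)."""
    active = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        tf = queue.popleft()
        for target in REGULATES.get(tf, []):  # non-TF genes activate nothing further
            if target not in seen:
                seen.add(target)
                active.append(target)
                queue.append(target)
    return active

print(cascade("tfA"))  # ['tfA', 'tfB', 'geneX', 'tfC', 'geneY', 'geneZ']
```

Flipping on a single “master” TF at the top switches on the whole program, which is what makes such nodes convenient control points.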

So the DNA that codes for the protein itself doesn’t (usually) determine when it gets expressed. For two species that share the exact same DNA sequence for a protein, that protein might get expressed in wildly different contexts. But it’s still DNA controlling all that: it’s the promoters, and the transcription factor cascades encoded in the links between promoter and TF sequences. These *other control sequences* are what control gene expression. Still DNA, just a different type.

Lastly, there’s a bunch of [post-transcription control](https://en.wikipedia.org/wiki/Post-transcriptional_regulation) that takes place at the RNA level. RNA transcripts can interact with and bind to one another in ways that can silence each other’s expression, modify the sequence of a transcript, and a lot more.

Anonymous

> How do we know DNA sequences make life similar to other life?

Because when two things have the same DNA, they look and act similarly. The key point is that mutations in DNA cause deviations from the norm, and when others have those same DNA mutations, they have the same deviations.

>So we can sequence DNA, but how do we know it means anything?

Because we can identify traits that come from DNA. If you sequence 10,000 plants and the 5 that have giant fruit all share the same bit of DNA (one that differs from the other 9,995 plants), that strongly suggests that stretch of DNA means “big fruit”.
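The big-fruit reasoning can be sketched as a scan for positions where every plant with the trait shares a base that no plant without it has. This is a toy version with made-up, pre-aligned sequences and a tiny sample; `trait_linked_positions` is a hypothetical helper. Real association studies use statistical tests over many plants and variants, because real traits rarely line up this cleanly.

```python
def trait_linked_positions(with_trait: list[str], without_trait: list[str]) -> list[int]:
    """Positions where all individuals with the trait share one base that no individual without it has."""
    hits = []
    for i in range(len(with_trait[0])):
        trait_bases = {seq[i] for seq in with_trait}
        other_bases = {seq[i] for seq in without_trait}
        if len(trait_bases) == 1 and not (trait_bases & other_bases):
            hits.append(i)
    return hits

# Hypothetical, already-aligned sequences (a real study would use far more plants and markers).
big_fruit = ["ACGTTAGC", "ACGTTAGA", "ACCTTAGC"]
normal_fruit = ["ACGTCAGC", "ACGTCAGA", "ACCTCAGT", "ACGTCAGC"]
print(trait_linked_positions(big_fruit, normal_fruit))  # [4]
```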

>I’m a programmer so the best I can give an analogy to is binary, I guess. Just because you have a bunch of 0s and 1s that match from one part of the machine to another doesn’t mean it acts the same among the “thing” it is a part of when compared to the other.

…Yeah, it does. If a file starts with “PK” in ASCII, it’s very likely a ZIP archive (or a ZIP-based format like .jar or .docx), because “PK” are the initials of Phillip Walter Katz, who created the format and got FUCKED on copyright.
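The “PK” check is easy to try. This minimal Python sketch uses the standard library’s `zipfile` to build a small archive in memory, then checks for the 4-byte local file header signature `PK\x03\x04`. An empty archive starts with a different `PK` record, so this check is deliberately narrow. `looks_like_zip` is a hypothetical helper; the DNA analogue would be spotting a known, conserved motif.

```python
import io
import zipfile

def looks_like_zip(data: bytes) -> bool:
    """True if data starts with a ZIP local file header signature: b'PK\\x03\\x04'."""
    return data[:4] == b"PK\x03\x04"

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")  # one small file, so the archive begins with a local file header

print(looks_like_zip(buf.getvalue()))        # True
print(looks_like_zip(b"\x89PNG\r\n\x1a\n"))  # False (this is the PNG signature)
```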

If you find a section of opcodes with a lot of memory writes, you know it has something to do with data. If you find a section that mostly shuffles stuff off to the northbridge, that’s most probably video or number crunching. You can spot patterns like “yep, obviously that’s a CRC check”, which means that section is DRM, went over a lossy connection, or is vital. DNA has that too.

We’ve identified long jumps, checksums, code scrubbers, and I/O calls (genes). There’s a whole world of software equivalents in genetics. I recommend “Herding Hemingway’s Cats” for a little exploration of the crazier bits of genetics we’ve discovered.

>How do we know what our DNA does, with enough certainty to say “this organism is similar to that one”?

Literally just a diff. Any small change, and I mean a single base (a bit, in our analogy) off, can kill the thing, but if the codebase is a near-clone, the creature that grows out of it will be very similar.
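The “diff” and the “one base can kill it” points can be shown together. This is a minimal Python sketch, assuming the standard genetic code and made-up sequences. The diff counts differing positions, but what a difference does depends on where it lands: one change is silent, the other turns a codon into a premature stop. `point_diff` and `translate` are hypothetical helpers.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate from position 0 with the standard code, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def point_diff(a: str, b: str) -> list[tuple[int, str, str]]:
    """(position, base in a, base in b) for every position where equal-length sequences differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

reference = "ATGAAACGCTGGTTTTAA"
near_clone = "ATGAAGCGCTGGTTTTAA"  # AAA -> AAG: both code for lysine, protein unchanged
broken = "ATGAAACGCTGATTTTAA"  # TGG -> TGA: tryptophan becomes a premature stop codon

for name, seq in [("near_clone", near_clone), ("broken", broken)]:
    print(name, point_diff(reference, seq), translate(reference), "->", translate(seq))
```

Both variants are a one-base diff from the reference, yet one yields the identical protein (`MKRWF`) and the other a truncated one (`MKR`).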