How exactly can DNA be explained as having about 700 terabytes of information?

Please excuse my terminology; for all intents and purposes, I am 5 years old.

[This post from AskReddit](https://www.reddit.com/r/AskReddit/comments/u2ll98/whats_a_cool_fun_fact_that_you_know/i4jwtbe/) is confusing me. How exactly can a physical thing like DNA have a digital size applied to it?

How do you correlate what I know of as a tiny tiny tiny little piece of my body, with a digital number given to the size of a program, or a hard drive, or a load of files?

3 Answers

Anonymous 0 Comments

The way I’d think about it is that in computer storage systems, stuff is made of bits. I’m no expert on computers, but a bit is a single binary digit, a 0 or a 1. That’s comparable to DNA and its bases, which ultimately code for amino acids and thus proteins. So if your entire genome is one long sequence built from 4 kinds of bases, you could work out how many bits each base is worth, and then add them all up to convert base pairs into computer storage.

Anonymous 0 Comments

That statement is sort of misleading.

The entire human genome is actually about 750 megabytes. That comment got terabytes from assuming a certain molecular mass of DNA, but we don’t really measure molecular mass that way. It also conflates data with information, and they are not the same thing.

As for how a physical thing like DNA can have a digital size, if you remember from biology class, DNA stores information using just 4 molecules that we call A, C, G, and T. With four possible letters, each one takes 2 bits to write down (say 00, 01, 10, and 11). These molecules always come in pairs, which we call base pairs, but because each letter's partner is fixed, a single base pair is still only 2 bits. There are about 3 billion base pairs in the human genome, and 6 billion bits is 750 megabytes.
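The arithmetic above can be checked with a short Python sketch. The particular 2-bit code given to each letter and the function names are arbitrary choices for illustration, and 3 billion is the round figure used above, not an exact count.

```python
# Any fixed 2-bit code per base works; this assignment is an arbitrary choice.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_bases(sequence: str) -> bytes:
    """Pack a DNA string into bytes, 4 bases per byte (2 bits each)."""
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so every byte has the same layout.
        byte <<= 2 * (4 - len(chunk))
        packed.append(byte)
    return bytes(packed)


def genome_size_megabytes(base_pairs: int, bits_per_base: int = 2) -> float:
    """Size in megabytes (10**6 bytes) of one strand at a fixed bit width."""
    return base_pairs * bits_per_base / 8 / 1_000_000


print(len(pack_bases("ATGCATGC")))           # 8 bases fit in 2 bytes
print(genome_size_megabytes(3_000_000_000))  # 750.0
```

Only one strand is packed here, since the partner of every base is fixed by the pairing rule.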

You can actually represent a human genome with much less though, because most of those base pairs are not used to encode proteins, and of the ones that are used, many are identical in all humans. Also, A and T always go together and G and C always go together, so you only ever have to store one strand; the other can be rebuilt from it. If you have a sequence that goes ATGC on one strand, you already know the sequence on the complementary strand has to be TACG.
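The pairing rule is simple enough to write as a lookup table. This is a minimal sketch with a hypothetical function name; it pairs bases position by position, as in the ATGC example, and ignores the convention that the two strands are read in opposite directions.

```python
# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the partner base for each position of the given strand."""
    return "".join(COMPLEMENT[base] for base in strand)


print(complementary_strand("ATGC"))  # TACG
```

Applying the function twice returns the original strand, which is why the second strand adds no new information.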

All said and done, if you only store how one person's genome differs from a standard reference genome, you can store a full human genome losslessly in about 4 megabytes.

Anonymous 0 Comments

Imagine an alphabet with only 4 letters. That’s what DNA is. It’s a long chain of “nucleotides” or “bases” and there are 4 kinds. The sequence of those bases is how you encode information, just like spelling words. But since there are only 4 bases, you have to use groups of bases to spell anything useful. In our cells, it takes 3 bases to form a useful “letter”, and each of those stands for one amino acid.
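To see why groups of 3 are needed, it helps to count how many different groups each size allows. The sketch below is only an illustration with a made-up function name; the one outside fact it leans on is that cells build proteins from 20 different amino acids.

```python
from itertools import product

BASES = "ACGT"


def count_groups(group_size: int) -> int:
    """Count the distinct base groups of the given length."""
    return len(list(product(BASES, repeat=group_size)))


for size in (1, 2, 3):
    print(size, count_groups(size))
# 1 4
# 2 16
# 3 64
```

Groups of 2 give only 16 combinations, which is too few for 20 amino acids, while groups of 3 give 64, which is more than enough.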

Our cells use this to store instructions for how to make proteins. But you could, theoretically, store any kind of information in a DNA sequence.

So a group of researchers at Harvard managed to store information in DNA at a density of 700 terabytes per gram of DNA.

But a gram of DNA is actually **A LOT** of DNA. A single human cell contains only 6 picograms of DNA, and that’s enough information to build a human.

So a gram of DNA is equivalent to the DNA from roughly 170,000,000,000 cells.
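The comparison can be checked by dividing one gram by the DNA in a single cell. This sketch takes the 6 picogram figure at face value; the constant and function names are just for illustration. With that figure, a gram works out to about 1.7 × 10^11 cells' worth of DNA.

```python
PICOGRAMS_PER_GRAM = 10**12  # 1 gram = 10^12 picograms
DNA_PER_CELL_PG = 6          # approximate figure quoted above


def cells_per_gram(dna_per_cell_pg: float = DNA_PER_CELL_PG) -> float:
    """Number of cells whose combined DNA weighs one gram."""
    return PICOGRAMS_PER_GRAM / dna_per_cell_pg


print(f"{cells_per_gram():.2e}")  # 1.67e+11
```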