How exactly can DNA be explained as having about 700 terabytes of information?

Please excuse my terminology; for all intents and purposes, I am 5 years old.

[This post from AskReddit](https://www.reddit.com/r/AskReddit/comments/u2ll98/whats_a_cool_fun_fact_that_you_know/i4jwtbe/) is confusing me. How exactly can a physical thing like DNA have a digital size applied to it?

How do you correlate what I know of as a tiny tiny tiny little piece of my body, with a digital number given to the size of a program, or a hard drive, or a load of files?

3 Answers

Anonymous

That statement is sort of misleading.

The entire human genome is actually about 750 megabytes. That comment got terabytes by assuming a certain molecular mass of DNA, but we don’t really measure molecular mass that way. It also conflates data with information, which are not the same thing.

As for how a physical thing like DNA can have a digital size: if you remember from biology class, DNA stores information using just 4 molecules that we call A, C, G, and T. Telling 4 options apart takes 2 bits, so we can assign each of these letters a 2-bit code. These molecules always come in pairs, which we call base pairs, and knowing one letter in a pair tells you the other, so a single base pair is still just 2 bits. There are about 3 billion base pairs in the human genome, which makes 6 billion bits, and 6 billion bits is 750 megabytes.
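To make the arithmetic above concrete, here is a minimal Python sketch. The particular 2-bit codes and the function names `pack_bases` and `genome_size_megabytes` are made up for illustration (any assignment of four distinct codes would work), and a megabyte is taken to be 1,000,000 bytes.

```python
# Two bits are enough to tell four letters apart.
# This particular assignment is arbitrary, chosen only for illustration.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_bases(sequence: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four letters per byte."""
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so every byte has four 2-bit slots.
        byte <<= 2 * (4 - len(chunk))
        packed.append(byte)
    return bytes(packed)


def genome_size_megabytes(base_pairs: int, bits_per_base_pair: int = 2) -> float:
    """Raw size in megabytes, assuming 1 MB = 1,000,000 bytes."""
    return base_pairs * bits_per_base_pair / 8 / 1_000_000


print(pack_bases("ATGC").hex())              # 39 (binary 00 11 10 01)
print(genome_size_megabytes(3_000_000_000))  # 750.0
```

Four letters fit in one byte, so 3 billion letters need 750 million bytes, which is where the 750 megabyte figure comes from.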

You can actually represent a human genome with much less though, because most of those base pairs are not used to code for proteins, and of the ones that are used, many are identical in all humans. Also, A and T always go together and G and C always go together, so you only ever have to store one strand, which is half of what storing both strands would take. If you have a sequence that goes ATGC on one strand, you already know the sequence on the complementary strand has to be TACG.
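The pairing rule is simple enough to write down as a lookup table. This is only an illustrative sketch (the name `complement_strand` is made up), and it matches letters position by position, the same way the ATGC example does.

```python
# Each letter has exactly one partner: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Rebuild the partner strand, letter by letter, from the stored one."""
    return "".join(COMPLEMENT[base] for base in strand)


print(complement_strand("ATGC"))  # TACG
```

Since the second strand can always be rebuilt like this, storing it would add no new information.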

All said and done, you can store a lossless, compressed full human genome in about 4 megabytes.