How does DNA being data work?

I saw a meme on TTT about how one sperm cell has 37.5MB of DNA in it, and a full ejaculation is something like 1500 terabytes. How do they figure that out?

In order to determine this, we first need to determine how we’re defining data.

DNA is built from four bases (adenine, thymine, guanine, and cytosine), and they always pair up in a particular way: adenine pairs with thymine, and cytosine pairs with guanine.

This means there are four cases to consider: A-T, T-A, C-G, and G-C. No other combinations are possible.

Telling four possibilities apart takes two bits, so each base pair needs (at minimum) two bits: `00` could represent A-T, `01` could represent T-A, and so on.

There are eight bits to a byte, so we can encode four DNA base pairs into a single byte.
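Here's a rough sketch of that packing in Python, if it helps to see it concretely. The `00` and `01` codes follow the example above; the codes for C-G and G-C, and the helper names `pack` and `unpack`, are my own arbitrary choices for illustration. It reads one strand, since each base tells you its partner.

```python
# Two-bit code per base pair, read off one strand.
# A -> A-T (00) and T -> T-A (01) follow the example above;
# C -> C-G (10) and G -> G-C (11) are an arbitrary assumption.
CODES = {"A": 0b00, "T": 0b01, "C": 0b10, "G": 0b11}
BASES = "ATCG"  # index in this string == the 2-bit code


def pack(seq: str) -> bytes:
    """Pack a strand into bytes, four bases per byte."""
    if len(seq) % 4:
        raise ValueError("this sketch only handles lengths that are a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # Shift the bits collected so far left by two and append the new code.
            byte = (byte << 2) | CODES[base]
        out.append(byte)
    return bytes(out)


def unpack(data: bytes) -> str:
    """Reverse of pack: read four 2-bit codes out of each byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


packed = pack("ATCGGCTA")
print(len(packed))     # 8 bases fit in 2 bytes
print(unpack(packed))  # ATCGGCTA
```

Real genome file formats carry more than this (unknown bases, metadata, and so on), so treat it as the bare minimum the estimate assumes.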

The diploid human genome (both copies of every chromosome) contains about 6×10^9 base pairs. Given that, we can calculate the total size of the human genome to be

>(6×10^9 base pairs / diploid genome) × (1 byte / 4 base pairs) = 1.5×10^9 bytes,

or roughly 1.5 GB. Since sex cells contain half the genome each, a sperm cell holds 750 MB. Multiplying that by the number of sperm cells gives the total for an average ejaculation: something on the order of 135,000 TB, which at 750 MB per cell works out to about 180 million sperm.
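If you want to check the arithmetic yourself, here's a quick sketch in Python. The sperm count is an assumption on my part, back-calculated from the 135,000 TB figure (real counts vary a lot), and I'm using decimal units (1 GB = 10^9 bytes).

```python
BASE_PAIRS_DIPLOID = 6 * 10**9   # base pairs in the full (diploid) genome
BASE_PAIRS_PER_BYTE = 4          # 2 bits per base pair, 8 bits per byte

# Assumption: about 180 million sperm, the count implied by 135,000 TB.
SPERM_PER_EJACULATION = 180 * 10**6

genome_bytes = BASE_PAIRS_DIPLOID // BASE_PAIRS_PER_BYTE
sperm_bytes = genome_bytes // 2  # a sex cell carries half the genome
total_bytes = sperm_bytes * SPERM_PER_EJACULATION
total_tb = total_bytes // 10**12

print(f"{genome_bytes / 10**9} GB per diploid genome")  # 1.5 GB per diploid genome
print(f"{sperm_bytes // 10**6} MB per sperm cell")      # 750 MB per sperm cell
print(f"{total_tb} TB per ejaculation")                 # 135000 TB per ejaculation
```

Change `SPERM_PER_EJACULATION` and the total scales with it, which is why different versions of this meme quote different numbers.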