Eli5: How do we know which genes do what, and what about combinations and power of expression?


Tl;dr at the end

There are a number of ways.

They boil down to two approaches: working backwards from the effects of a disease to find the gene responsible for it and then working out that gene's normal function, or working forwards by taking a gene and working out its function from experiments in which you remove the gene in cell cultures or animals.

One of the original ways was looking at people with the same genetic disease and analysing their DNA for the same genetic defect. One example is Duchenne muscular dystrophy. We could work out that it's genetic since it can run in families, and that the gene must be on the X chromosome since, when it skips a generation, it always skips through a mum, and only boys are affected. This let us know to look only at the X chromosome. Most boys with Duchenne's have a really big piece missing from the gene, so it was easier to find than many other defects. From the clinical features we knew it was involved in muscles somehow. Work on muscle biopsies helped identify the missing protein (dystrophin) and therefore what the gene made.
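To make the "same defect in everyone with the disease" idea concrete, here is a minimal sketch in Python. The patient names and variant labels are made up for illustration, and real analysis deals with far messier data, but the core logic is keeping what every affected person shares and discarding what unaffected relatives also carry.

```python
# Hypothetical toy data: variant labels found in each person's DNA.
patients = {
    "patient_a": {"var_1", "var_2", "var_7"},
    "patient_b": {"var_2", "var_5"},
    "patient_c": {"var_2", "var_7", "var_9"},
}
unaffected = {
    "relative_a": {"var_1", "var_5"},
    "relative_b": {"var_7", "var_9"},
}


def shared_candidates(affected, controls):
    """Return variants seen in every affected person and in no unaffected one."""
    in_all_affected = set.intersection(*affected.values())
    in_any_control = set().union(*controls.values())
    return in_all_affected - in_any_control


candidates = shared_candidates(patients, unaffected)
print(candidates)
```

Whatever is left over is only a candidate. You would still need the lab work described above to show that it actually causes the disease.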

A similar disease-based way of finding the gene for a condition is when someone has the condition because two chromosomes broke and the broken-off parts switched places. This suggests that the problem is with a gene at one of the two break points, and allows that gene to be explored further.

Another way of working out the function of a gene is to create animals lacking that gene, although I don't know how this is done. These animals are often mice, as they share many similar or identical genes with us, grow to maturity fast and are small enough that it is practical to keep many of them. These mice are referred to as "knock-out mice" since the gene has been knocked out. You then see the effect that the lack of that gene has on the mouse and work out why it occurred (e.g. looking for the tissues affected, trying to find the protein or signal that has been damaged, etc.). It also allows experimentation with possible treatments for the disease.

Another way of finding out where the product of a gene is expressed is to tag the gene with a fluorescent tag in animals (zebrafish are useful for this). The tissues where that gene is active then glow. Other, similar tags can be used that make the product detectable in other ways. This article has interesting info:


Penetrance of a gene (whether having a mutated gene always leads to disease or not) can be worked out by studying certain population groups. An example is people who have a variant in the BRCA1 gene. They are at increased risk of certain cancers, such as breast, ovarian and prostate. But not all people with a BRCA1 variant go on to get these cancers.
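As a rough sketch of the arithmetic, penetrance is just the fraction of carriers who end up affected. The records below are invented for illustration and are not real BRCA1 figures; the function name and field names are my own.

```python
def penetrance(carriers):
    """Fraction of variant carriers who developed the condition."""
    if not carriers:
        raise ValueError("need at least one carrier")
    affected = sum(1 for person in carriers if person["affected"])
    return affected / len(carriers)


# Hypothetical carriers of some variant (made-up data).
carriers = [
    {"id": 1, "affected": True},
    {"id": 2, "affected": False},
    {"id": 3, "affected": True},
    {"id": 4, "affected": True},
]

print(penetrance(carriers))  # 0.75, i.e. 3 of 4 carriers affected
```

Real studies also have to account for things like age (someone unaffected at 30 may be affected at 60), which this sketch ignores.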

For expression, it can be the same as the above. You take a population of people who have the variant and also have the disease, and look at what they have. So in the BRCA1 example, it would be which cancer and at what age. You can also look at response to treatment to see if the cancers act differently to other cancers of the same tissue.

At a lab level, for penetrance and expression you can do similar experiments with knock-out mice or other animals.

For combinations of genes, again you can look at populations with the gene variant and do large exome analysis (sequence only the genes in the person's DNA) or genome analysis (sequence all the DNA, so all the genes plus the non-coding DNA between the genes). Then if you find a common second variant in another gene, you can compare how those people's features differ or are similar. It's also how you can look at environmental factors (e.g. in the BRCA1 example, look to see if those who smoke have a further increased incidence of the cancers, or whether the number of children the women have makes a difference, or anything else).
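Here is a small sketch of that comparison, assuming you already know who carries the second variant. All ages and field names are hypothetical. It splits the carriers of the first variant by whether they also have the second one and compares the average age at diagnosis in each group.

```python
from statistics import mean

# Made-up carriers of the first variant; some also carry a second variant.
carriers = [
    {"second_variant": True, "age_at_diagnosis": 38},
    {"second_variant": True, "age_at_diagnosis": 42},
    {"second_variant": False, "age_at_diagnosis": 51},
    {"second_variant": False, "age_at_diagnosis": 55},
]


def mean_age_by_group(people, key):
    """Group people by the value stored under `key` and average their ages."""
    groups = {}
    for person in people:
        groups.setdefault(person[key], []).append(person["age_at_diagnosis"])
    return {flag: mean(ages) for flag, ages in groups.items()}


result = mean_age_by_group(carriers, "second_variant")
print(result)
```

The same grouping works for an environmental factor such as smoking: swap the `second_variant` field for a `smoker` field. With only a handful of people a difference like this means nothing, so a real study needs many more people and a proper statistical test.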

The population level stuff needs a relatively large population to look at, so can only be done with relatively common diseases or features.

Increasingly, the DNA can be analysed by computer programs that have been developed to work out the shape of the protein it makes, and therefore its potential role.

And there will be a whole host of other ways too. All of them combined lead to an understanding of what that gene does in its typical form.

TL:DR: a butt load of different types of experiment. But some more tl:dr info
– see disease -> find similar gene defect in all -> analyse that specific gene product
– create animals (often mice) without that gene and see its impact by comparing them with mice that have the gene
– create animals with the normal gene with some kind of tag so you can see where that gene is expressed and therefore know what tissues it is important to
– penetrance and expression of genes – study populations of people/animals with the gene variant you are interested in and see who has what feature and who doesn't
– combinations of genes – it gets complicated but basically a combination of all of the above but looking at more than one gene at the same time
– computer predictive software that is getting better and better at predicting the function of the product of the gene

There is still a lot to develop in how we understand how individual genes work, and how genes interact with each other and the environment, but our knowledge is growing and our equipment is getting faster and faster.

That was very long winded, even the tl:dr but I hope it made some sense.

We have been gathering knowledge on this since the 1950s.
There were several methods. One idea is that you randomly destroy genes with X-rays and then select phenotypes. Such work was done even before we could sequence genes, and it gave us knowledge of what goes wrong if a gene is missing.
Gene interactions were discovered by crossing those mutants.

Another approach was to look at things at the protein level. You could extract protein from the organism and compare the amounts (too little, too much). Protein interactions could be discovered by using one protein as a hook to fish out everything that binds to it.

There were methods to stain proteins in situ (within the cell), so you can tell exactly where a protein is.

There were methods to crystallize a protein and determine its 3D structure. Then compare 3D structures, organize them into families, and based on knowledge of one family member, guess the function of another. (And of course test the guess.)

Once you could sequence and synthesize DNA, a whole bunch of methods came up. You could now inject a gene and see what it does. You could now tell exactly what protein comes from a gene. It sometimes turned out that a gene had two names because people thought they were looking at two different things.

Then RNA chips came in, and you got mass data on expression. Then whole genome sequencing. Then single cell sequencing. Nowadays everything is about interactions across the whole proteome, genome, etc., and it is mostly done with computers.

So in the old days it took a painstaking amount of work to isolate one single protein, and then they asked all possible questions about it. What's the size? What's the charge? What are the interaction partners? What if I inject it into a fertilized Drosophila egg?

Nowadays they work a lot with huge databases of all that knowledge, and on how to squeeze some meaningful information out of them.

Folks have already given very detailed answers. I’ll just add a short and simple one: Lots of trial and error.

You cause an error in gene X and see what changes. Is a protein missing? Or is something overexpressed? Then you try to establish the exact mechanism by which the gene causes the effect.

Fun fact: some genes are named after what they cause when they are deactivated. For example, the Wuschel gene in plants leads to a change in appearance. "Wuschel" is German and means something like "fluffy" or "hairy". Plants with a mutation in this gene only grew very short stems, leading to a "fluffy" appearance.