“Lies, damned lies, and statistics” is a phrase describing the persuasive power of statistics to bolster weak arguments, “one of the best, and best-known” critiques of applied statistics.[2] It is also sometimes colloquially used to doubt statistics used to prove an opponent’s point.
from wikipedia, which leads me to my question:
what is this critique?
I don’t know the referenced text that critiqued statistics. But I’ll try to answer your question generally.
Statistics is simply a way to interpret data probabilistically. It uses all sorts of models, which carry lots of assumptions, to spit out a statistic that describes the data and that you can then interpret. While statistics as a field is in itself a science, it is used to interpret data in other fields of science. Whether the interpretation is actually valid depends on three conditions. One is that you applied the statistical tests correctly, meaning you satisfied all their inherent assumptions. Two is that the model the test uses properly captures the nature of whatever the data set is derived from. And three is that the data was obtained from a scientifically sound experiment.
Let’s delve deeper into those three conditions.
Condition 1: for example, let’s take an ANOVA test. Let’s say the data set has four groups (two independent variables, one dependent, so a two-way ANOVA). The test compares the variance within groups to the variance between groups. Let’s say the four groups have the following means and standard deviations.
0.1 +/- 0.01 kg.
10 +/- 1 kg.
6000 +/- 500 kg.
9000 +/- 600 kg.
It’s clear that the variance within the first group and within the third group is much smaller than the variance of the two combined. This indicates they’re probably samples from different populations. The test would tell you, yep, the first group is significantly different from the third. But then you compare the first and second groups. They’re also clearly different, no? The test would say they’re not, despite a 100-fold difference in the mean and ranges that don’t come close to overlapping. Why does it say they’re the same then? Because you violated an assumption of the test. That assumption is homoscedasticity, which means uniform variance. It basically says the test can only compare groups in a data set if the variance within each group is similar. The test pools the within-group variance across all groups into a single error estimate. But simply because the mean in the last two groups is so high, their error is also high, so their variance is far larger than in the first two groups and dominates that pooled estimate. The test will still capture the difference between the first two and the last two groups, but the difference between the first two will just be masked. So here, you got a wrong interpretation because you used the test wrong. You’d be amazed how often people do that, yes in publications, yes in big journals.
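To put rough numbers on this, here’s a minimal Python sketch using the four means and standard deviations above. It assumes 5 measurements per group (my assumption, the example doesn’t specify n), and the helper names `pooled_t` and `welch_t` are made up for illustration. It is not a full two-way ANOVA, just a pairwise comparison with a pooled error term of the kind classic ANOVA follow-up tests use, next to a Welch-style comparison that only uses the two groups’ own variances.

```python
import math

# Group summaries from the example above (kg).
MEANS = [0.1, 10.0, 6000.0, 9000.0]
SDS = [0.01, 1.0, 500.0, 600.0]
N_PER_GROUP = 5  # assumption: the example does not give a sample size


def pooled_t(i, j):
    """t statistic for groups i and j using one error term pooled over ALL groups.

    With equal group sizes, the pooled within-group variance is simply
    the average of the four group variances.
    """
    pooled_var = sum(sd ** 2 for sd in SDS) / len(SDS)
    se = math.sqrt(pooled_var * 2 / N_PER_GROUP)
    return abs(MEANS[i] - MEANS[j]) / se


def welch_t(i, j):
    """t statistic using only the variances of the two groups being compared."""
    se = math.sqrt(SDS[i] ** 2 / N_PER_GROUP + SDS[j] ** 2 / N_PER_GROUP)
    return abs(MEANS[i] - MEANS[j]) / se


print("group 1 vs 2, pooled error:", round(pooled_t(0, 1), 3))
print("group 1 vs 2, own variances:", round(welch_t(0, 1), 3))
print("group 1 vs 3, pooled error:", round(pooled_t(0, 2), 3))
```

For scale, the usual 5% cutoff for a t statistic at these sample sizes is somewhere between 2 and 3. The pooled statistic for groups 1 and 2 lands far below that, the one for groups 1 and 3 lands far above it, and the Welch-style statistic for groups 1 and 2 is also far above it once the huge variances of groups 3 and 4 are left out.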
Condition 2 is a bit similar because it also lies in the assumptions of the test, it just may not be obvious to the researcher. A lot of the tests we use are parametric, meaning they assume the data follow a particular distribution, usually a normal distribution, so a Gaussian or bell curve. But let’s say you got your data from a sample that has both males and females. You think the parameter you measured is sex-independent. But perhaps it isn’t. Let’s say that parameter is height. You get 1000 random people, with a rough 50/50 split between males and females, and plot on a graph the height ranges on the x axis and the number of people in each range on the y axis. Like 10 people at 1.9 to 2 meters, 200 people at 1.8 to 1.9 meters, etc. If you look at the graph after, it doesn’t look like a bell curve, it’ll probably look like a twin-peaked mountain. That’s because there are two populations here, each with its own average height and a lot of other heights hovering around that average. This means the population has a bimodal and not a Gaussian distribution. So the model your test uses is not applicable, and the results it spits out cannot be interpreted meaningfully.
This is in condition 2 and not 1 because it’s not always so simple. We don’t usually have n=1000 in experiments, unless you’re running a gigantic clinical trial. And so even if you plot the distribution you may not realize it’s bimodal, because you happened to randomly pick subjects that form an approximately normal distribution even though that’s not true of the real, big population you sampled from.
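If you want to see the twin peaks for yourself, here’s a small Python simulation. The numbers are invented and deliberately exaggerated so the effect is easy to see: I assume one group averages 1.65 m and the other 1.80 m, both with a standard deviation of 0.04 m. Real height data overlaps more than this. The function names are just for illustration.

```python
import random

# Hypothetical, exaggerated parameters (metres), chosen only for illustration.
FEMALE_MEAN = 1.65
MALE_MEAN = 1.80
SD = 0.04


def simulate_heights(n, seed=0):
    """Draw n heights from a roughly 50/50 mix of two normal distributions."""
    rng = random.Random(seed)
    heights = []
    for _ in range(n):
        mean = MALE_MEAN if rng.random() < 0.5 else FEMALE_MEAN
        heights.append(rng.gauss(mean, SD))
    return heights


def histogram(values, lo=1.475, width=0.05, n_bins=10):
    """Count how many values fall in each bin; values outside the range are dropped."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) // width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def print_histogram(counts, lo=1.475, width=0.05):
    """Print a crude text histogram, one '#' per 5 people."""
    for i, count in enumerate(counts):
        left = lo + i * width
        print(f"{left:.3f}-{left + width:.3f} m | {'#' * (count // 5)} {count}")


print_histogram(histogram(simulate_heights(1000)))
```

With 1000 simulated people, the bins around 1.65 m and 1.80 m stand out and the bins between them dip. Call `simulate_heights(20, seed=...)` with a few different seeds instead and the two peaks tend to be much harder to make out, which is the small-sample problem described above.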
Now let’s talk about condition 3.
There are two main types of research approaches. One is experimental, and one is observational. If you want to see whether smoking is bad for you, for example, it would be unethical to get 1000 people and make half of them smoke while making the other half not smoke. Unethical because in order to come up with that hypothesis, you must have had a rationale suggesting smoking is bad, which means you can’t actively make people do it. But what you can do is observe those who already smoke and compare them to those who don’t. The issue here is that you’re not in control: you can’t make sure that the two groups are as similar as possible in all other aspects, such that any difference between them is strictly due to smoking. Perhaps, for example, people who don’t smoke also don’t drink and don’t do drugs because, let’s say, those variables tend to go together in society (not true but whatever, it’s an arbitrary example). This means a difference you observe between the two groups might be caused by alcohol, drugs or just about any other variable that differs between them without you being aware of it. Of course, statistical tests have ways to incorporate the different variables you think might mess with the results and account for them, but it is impossible to know them all. This means the only conclusion you can make is that smoking is associated or correlated with whatever you end up observing in the dependent variable.
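Here’s a toy Python simulation of that confounding problem. Everything in it is made up: in this fake world, smoking has no effect at all on the illness, drinking raises the illness risk from 10% to 30%, and drinkers are much more likely to smoke (80% vs 20%). It says nothing about real smoking, it only shows how a hidden variable can create a difference between the groups. The function names are hypothetical too.

```python
import random


def simulate(n=40000, seed=0):
    """Return a list of (smokes, drinks, ill) tuples from an invented population."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        drinks = rng.random() < 0.5
        # Assumption: drinkers are far more likely to smoke.
        smokes = rng.random() < (0.8 if drinks else 0.2)
        # Assumption: illness depends ONLY on drinking, never on smoking.
        ill = rng.random() < (0.30 if drinks else 0.10)
        people.append((smokes, drinks, ill))
    return people


def illness_rate(people, smokes, drinks=None):
    """Fraction ill among people with the given smoking (and optionally drinking) status."""
    group = [ill for s, d, ill in people if s == smokes and (drinks is None or d == drinks)]
    return sum(group) / len(group)


def smoking_gap(people, drinks=None):
    """Illness rate of smokers minus illness rate of non-smokers."""
    return illness_rate(people, True, drinks) - illness_rate(people, False, drinks)


people = simulate()
print("gap, ignoring drinking:   ", round(smoking_gap(people), 3))
print("gap, drinkers only:       ", round(smoking_gap(people, drinks=True), 3))
print("gap, non-drinkers only:   ", round(smoking_gap(people, drinks=False), 3))
```

Ignoring drinking, smokers look clearly sicker than non-smokers. Once you compare smokers and non-smokers within the same drinking status, the gap shrinks to roughly zero, because drinking was doing all the work. In real data you only get to do this for the variables you thought to measure.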
While scientists are usually equipped with the knowledge to understand that a correlation does not necessarily imply causation, the general public often isn’t, and in combination with clickbait titles, it’s easy to mislead people into reading the correlation as causation.
But it’s not just correlational studies, even experimental studies have such issues. You’re never 100% in control. Say you’re working with mice and testing the effect of a given diet. Mice are usually fed ad libitum, meaning they have free access to food. So if you want different mice to have different diets, either you put your mice in two cages, each with a different diet, or you separate the mice into individual cages. Let’s say you take the first approach. Now even though you used, say, 10 mice, 5 on each diet, you don’t actually have that much statistical power, because you functionally have n=2: cage allocation is another variable that differs between the groups and is shared within each group. Perhaps one cage was dropped by the animal caretaker, stressing the mice and making them show different results. Perhaps that cage happened to have a particularly aggressive alpha, causing a lot of fighting and stress. Then the results you get could be caused not just by the one variable you care about, the diet, but also by the cage, and you don’t know which was the culprit. If you took the other approach, putting each mouse alone, well, mice get very stressed if alone for long. Perhaps stress causes a physiological reaction that masks the difference between the diets. For example, maybe now they eat less or more because they’re depressed, and then you get either highly variable results because each mouse ate a different amount of its diet, or highly similar results with no difference between the groups because they all ate so little.
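The two-cage problem is easy to simulate. In this Python sketch the diet does nothing at all, each cage gets its own random shift (a dropped cage, an aggressive alpha, whatever), and then the 10 mice are analyzed as if they were 10 independent animals with an ordinary pooled two-sample t-test. All the numbers (cage and mouse standard deviations, number of simulated experiments) are assumptions I picked for illustration, and the function names are made up.

```python
import random
import statistics

MICE_PER_CAGE = 5
# Two-sided 5% critical value of Student's t with 8 degrees of freedom
# (5 + 5 - 2). Only valid for MICE_PER_CAGE = 5.
T_CRIT_DF8 = 2.306


def pooled_t(a, b):
    """Ordinary two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def false_positive_rate(cage_sd, n_sims=2000, mouse_sd=1.0, seed=1):
    """Share of simulated experiments that look 'significant' with NO diet effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        cages = []
        for _cage in range(2):
            # Everything a cage shares (stress, fighting) shifts all its mice together.
            cage_shift = rng.gauss(0.0, cage_sd)
            cages.append([cage_shift + rng.gauss(0.0, mouse_sd)
                          for _ in range(MICE_PER_CAGE)])
        if abs(pooled_t(cages[0], cages[1])) > T_CRIT_DF8:
            hits += 1
    return hits / n_sims


print("no cage effect:  ", false_positive_rate(cage_sd=0.0))
print("with cage effect:", false_positive_rate(cage_sd=1.0))
```

Without a cage effect, the test cries wolf about 5% of the time, as designed. With a cage effect as large as the mouse-to-mouse variation, it flags a "diet effect" far more often even though there is none, because the test has no way to tell the diet apart from the cage.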
And it’s not always so simple. Most of the time we cause differences we’re not even aware of. Sometimes you have to do your experiment at different times of the day, and biology has a huge circadian aspect, so the variable you’re interested in is probably changing over the course of the day. Heck, sometimes you do the experiment in two batches, and for one batch you’re on holiday so you ask your colleague to do it, and chances are you get a bimodal distribution. Believe me, I’ve had it happen to me. Mice even react differently depending on whether the researcher is male or female.
So you see… Even if you do everything right, you could still end up with a wrong interpretation of the data. That’s why we never say we “proved” anything in science, we just say the data supports it. Statistics is a beautiful science that allows us to make sense of the chaos, but everything has its limitations. That’s why if a paper is published in a prestigious journal showing a very clear result that is very thoroughly thought out, you don’t fully trust it until an independent group somewhere else in the world reproduces it, and then another and another.