ELI5: Why does a survey need a minimum of 30 respondents to be statistically significant?

I’m not the best when it comes to stats. But I see some surveys that publish findings based on fewer than 30 responses. And I know that these are not valid, and that it has something to do with the normal distribution.

18 Answers

Anonymous 0 Comments

30 is not representative at all.

Generally, the minimum is 1000 people taken at random for the results to be at least a bit representative.

Anonymous 0 Comments

I always heard 1500 participants are required to make it valid.

But look how easily even that many can be skewed. What if a survey says 100% of people think God is real, but it only featured 1500 people from rural Texas? It would be heavily biased.

Source: Took 4 statistics classes in a row when getting my MBA.

Anonymous 0 Comments

There are 7.8 billion people in the world. Obviously, many surveys are only intended for a certain audience, and even in very general surveys there is a problem of getting that many people to take part. But the more people you ask, the more likely you are to get a general result that really reflects the average or majority answer.

Say you are friends with a lot of model train enthusiasts. You ask the first 30 people you know for their main hobby, and the answer will appear to be overwhelmingly model trains. But expand that to the first 300 people you know, and some more common hobbies will start to take over, though model trains might still stick out as quite a popular answer. Expand that to 3000 people, and you’d start to see that model trains are actually a pretty niche hobby.

There can still be biased or unusual results in a survey with more people, because you have to consider how the participants were found and how diverse they are in other categories (for example, a poll on Reddit with thousands of answers will still only reflect the opinions of people who spend time on Reddit). And a smaller survey could actually have very representative answers, but this is hard to know.
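
To make that point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the 5% and 60% shares are made-up numbers, and `survey` is a hypothetical helper that simulates asking `n` people drawn from a pool where a given share would name the hobby.

```python
import random

def survey(n, share, rng):
    """Fraction of n simulated respondents who name the hobby,
    when each respondent names it with probability `share`."""
    return sum(rng.random() < share for _ in range(n)) / n

# Hypothetical shares, chosen only for illustration.
POPULATION_SHARE = 0.05  # share of the whole population with the hobby
FRIENDS_SHARE = 0.60     # share within the surveyor's own circle

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (30, 300, 3000):
    random_estimate = survey(n, POPULATION_SHARE, rng)
    biased_estimate = survey(n, FRIENDS_SHARE, rng)
    print(f"n={n}: random sample {random_estimate:.1%}, "
          f"friends-only sample {biased_estimate:.1%}")
```

In this toy setup, asking more people makes each estimate less noisy, but the friends-only estimate settles around the friends' share rather than the population's. A bigger sample does not repair a skewed source.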

There’s a lot to consider in statistics. But sample size is generally an important factor, because asking more people gives you a wider distribution of answers.

Anonymous 0 Comments

Generally speaking, the more observations you make (e.g. survey responses), the easier it is to detect an effect. Probably what you heard is that, for the kind of effect sizes one usually sees in whatever context was being discussed, it takes ~30 responses to be reasonably sure (probably a 5% or less chance of being wrong) that the difference observed is caused by a true difference in the population and not mere chance (i.e. you just happened to get a sample where your hypothesis was true, even though it isn’t true for the population).
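
As a rough illustration of how effect size and sample size interact, the sketch below assumes a yes/no question, a null hypothesis of an even 50/50 split, and an exact two-sided binomial test at the 5% level. The function names (`binom_pmf`, `rejection_region`, `power`) are made up for this example, and the true shares of 55% and 75% are arbitrary.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k 'yes' answers out of n when the true share is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def rejection_region(n, alpha=0.05):
    """Counts of 'yes' whose exact two-sided p-value under a 50/50 split is <= alpha."""
    null_pmf = [binom_pmf(k, n, 0.5) for k in range(n + 1)]
    region = []
    for k in range(n + 1):
        # p-value: total probability of outcomes at least as unlikely as k
        p_value = sum(q for q in null_pmf if q <= null_pmf[k] * (1 + 1e-9))
        if p_value <= alpha:
            region.append(k)
    return region

def power(n, true_p, alpha=0.05):
    """Chance that a sample of size n leads us to reject the 50/50 hypothesis."""
    return sum(binom_pmf(k, n, true_p) for k in rejection_region(n, alpha))

for n in (30, 300):
    for true_p in (0.55, 0.75):
        print(f"n={n}, true share {true_p:.0%}: "
              f"chance of detecting a difference = {power(n, true_p):.2f}")
```

Under these assumptions, 30 responses are enough to detect a large difference (75/25) most of the time, but a small one (55/45) is usually missed. That is why a single number like 30 only makes sense for a particular effect size.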

The classic example is pulling coloured balls from a bag. How many balls do you have to pull to get a good idea of what percentage of the balls in the bag are what colour? It depends, of course, on how many balls there are and how the colours are distributed. You have to at least estimate those numbers before you decide what kind of test to do. If there are only ten balls, you could probably just do a census – i.e. look at every ball. If there are 500k balls, you’ll only be able to observe a sample. But how big a sample do you need? If you expect the distribution to be ~evenly divided between two colors, you may be able to get away with only 30. If, however, you expect ~25 colours, or that some colours will show up only ~1% of the time, say, you’ll need a lot more observations before you can be reasonably confident your sample resembles the population (every ball in the bag).
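
One way to put rough numbers on this is the usual normal-approximation margin of error for a proportion. This is a sketch, not part of the original answer: it assumes a simple random sample from a large bag, and `margin_of_error` is a hypothetical helper name. The approximation is itself unreliable for rare colours in small samples, which is part of the point.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    p: true share of one colour, n: number of balls drawn,
    z: normal quantile (1.96 corresponds to roughly 95% confidence).
    """
    return z * math.sqrt(p * (1 - p) / n)

for p in (0.5, 0.01):
    for n in (30, 300, 3000):
        print(f"true share {p:.0%}, n={n}: "
              f"about +/- {margin_of_error(p, n):.1%}")
```

With an even split and 30 draws, the margin is roughly 18 percentage points, which may be good enough to tell "about half" from "almost none". For a colour that shows up 1% of the time, the margin at 30 draws is larger than the share itself, so the sample says almost nothing about that colour.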

Bear in mind that most statistical tests assume the sample was drawn randomly. In practice, it is very hard, if not impossible, to randomly sample humans for a survey. So you will generally want to get more responses to make your statistical tests more powerful (more likely to detect a true effect) while keeping your significance level (the chance of declaring an effect when the observed difference is only due to chance) reasonably low.

If you could get a truly random sample, you’d need fewer observations to have a good chance that your sample is representative. If it’s only mostly random, there’s a higher chance that any effect you observe is because of a bias in the sampling. Thus, you will probably want to be more strict in declaring that an observed effect is genuinely present in the population.

But by choosing to reject more findings that could have happened by chance, you make it harder to accept findings that are because of a genuine effect in the population. A real but small effect in the population is not easily distinguishable from a small effect in the sample caused by nonrandom sampling.