> How is it acceptable that 0.0002% of the population is accepted as representative?
The math explanation is in other comments, but I’ll give you one better: because we have seen, time and time again, that polls are usually right. Just check surveys from past elections and you’ll see they usually land close to the actual result. So we’ve come to accept them.
Suppose there is an issue on which people are split 70:30. If you ask a randomly chosen person, with probability 70% they will answer “yes” and with probability 30% they will answer “no”.
If you poll 1000 people, chosen at random and mutually independently, the most likely outcome of the poll is that you get 700x yes and 300x no. So far, this should be pretty obvious. It should also be obvious that usually you’ll get something approximate and not the exact split, such as 684 times yes and 316 times no.
The main question now is: how likely is it that we get something *substantially* different as the outcome of our poll? Can we realistically get, for example, 300x yes and 700x no?
And the answer is that this is *very very* unlikely.
We can do the math and calculate that in our example scenario:
* Getting even a 500-500 split, or anything worse, is almost impossible. Roughly on par with having a day on which you take part in four separate lotteries, win the jackpot in all of them, and then get hit by lightning. (And yes, I did the math here.)
* Almost all polls will give you a result somewhere between 650-350 and 750-250. Maybe once per 1000 such polls will you see something that falls slightly outside these bounds, but almost always you’ll get a very good estimate.
That’s plenty accurate if all we need is a rough estimate.
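The argument above is easy to check by simulation. A minimal Python sketch, assuming the 70:30 split from the example (the 650–750 band and the trial count are just the numbers used above):

```python
import random

random.seed(42)

def run_poll(n=1000, p_yes=0.7):
    """Poll n independent random respondents; return the number of 'yes' answers."""
    return sum(1 for _ in range(n) if random.random() < p_yes)

# Run the 1000-person poll many times and count how often it lands
# outside the 650-750 "yes" band described above.
trials = 10_000
results = [run_poll() for _ in range(trials)]
outside = sum(1 for yes in results if not 650 <= yes <= 750)

print(f"min yes: {min(results)}, max yes: {max(results)}")
print(f"polls outside 650-750: {outside} of {trials}")
```

Run it and you will see the poll essentially never strays far from 700, and only a handful of the 10,000 simulated polls fall outside the 650–750 band.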
1. You can get a good, though not great, statistical analysis from a sample of 1,000 out of 200,000,000 if you have done excellent sampling.
2. You may get a terrible statistical analysis from 100,000 out of 200,000,000 if you have done horrible sampling. For instance, if you go to Nigeria and ask 100,000 children who will win the next US presidential election, you will not get a healthy result. Of course, this example is exaggerated for ease of explanation.
That being said, if you do excellent sampling and increase your sample size, you will get healthier results. Let’s take an example from probability. If you throw two six-sided dice, you have a 1/36 chance of getting 6-6. In practice, however, you may throw 6-6 three times in a row even if you only throw three times. Based on that observation, and only that observation, you might conclude that throwing 6-6 is a 100% certainty. However, if you throw the dice 10,000 times, you will see the factor of luck minimized, and in 100,000 throws it will be minimized further (eliminated in the limit). Since it is not practical (in terms of money, time, and human resources) to ask all 200,000,000 eligible voters, it is a good idea to choose a sample size that is neither too small nor too costly.
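The dice example is easy to verify in code. A small sketch: with only a few throws the observed 6-6 rate is wild, but with many throws it settles near the true 1/36 ≈ 0.0278:

```python
import random

random.seed(1)

def double_six_rate(throws):
    """Throw two dice `throws` times; return the observed rate of 6-6."""
    hits = sum(1 for _ in range(throws)
               if random.randint(1, 6) == 6 and random.randint(1, 6) == 6)
    return hits / throws

# 1/36 ≈ 0.0278. Few throws -> noisy estimate; many throws -> close to truth.
for n in (3, 100, 10_000, 1_000_000):
    print(f"{n:>9} throws: observed 6-6 rate = {double_six_rate(n):.4f}")
```

With 3 throws the estimate can only be 0, 1/3, 2/3, or 1; by a million throws luck has been averaged away.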
A *random* sample asks 1,000 people picked at random the same questions.
A *representative* sample asks 1,000 people picked according to a data filter the same questions. So if, e.g., 51% of Americans are women, you will want to include 510 women in your survey. If 20% of women voted Republican in the last election, you’ll want 102 women in your survey who vote Republican.
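The quota arithmetic above is just proportional allocation. A tiny sketch (the 51% and 20% figures are the comment’s illustrative numbers, not real census data):

```python
def quota(sample_size, *proportions):
    """Respondents needed for a subgroup defined by nested proportions."""
    n = sample_size
    for p in proportions:
        n *= p
    return round(n)

print(quota(1000, 0.51))        # women in the sample -> 510
print(quota(1000, 0.51, 0.20))  # women who voted Republican -> 102
```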
Preparing and optimising sets of people for surveys is part of statistical analysis. There are lots of methods to identify groups and subgroups within a large population, and of course ways that survey results can be skewed to give a result before the question is even asked (99% of people surveyed (at a gun convention) say they’re in favour of looser gun controls). Numbers don’t lie, but statistics can be fudged and misrepresented very easily, which is why random surveys (and even more so structured ones with a low sample size) shouldn’t be taken at face value as indicative of a majority opinion.
One thing to keep in mind is that the questions are very broad. We’re not getting detailed opinions. It’s closer to a coin flip dichotomy than an essay question.
Flip a coin 1000 times and the overall statistics are going to be extremely close to 50-50. And when it comes to a population of people, if you take pains to spread the sample out over all demographics, then you should be in the ballpark of being correct.
One has the *option* of flipping a coin 400,000,000 times instead of 1000 but the percentage isn’t going to noticeably change.
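A sketch of the coin-flip point: 1,000 flips already pins the heads rate down, and adding orders of magnitude more flips barely moves the percentage:

```python
import random

random.seed(7)

def heads_fraction(flips):
    """Flip a fair coin `flips` times; return the fraction of heads."""
    return sum(random.random() < 0.5 for _ in range(flips)) / flips

for n in (1_000, 100_000, 1_000_000):
    f = heads_fraction(n)
    print(f"{n:>9} flips: {f:.4f} heads ({abs(f - 0.5):.4f} from 50-50)")
```

The deviation from 50-50 shrinks like 1/√n, which is why going from a thousand flips to a million buys surprisingly little.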
Because statistics is counterintuitive.
A RANDOM SAMPLE of around 1000-ish people (actually you’ll find it’s usually more like 1200; that’s important) will give you answers that are only about 3% off from reality.
The important part is the random part. If you grab, say, 1000 people from downtown Manhattan, you won’t get a picture of the country. You absolutely must get as close to a perfectly random sample as you can. Which is actually fairly difficult.
A really great example of this is from the first real scientific polling done on a Presidential campaign. In 1936 the Presidential election was up between FDR and a guy you’ve never heard of before named Alf Landon.
A magazine called Literary Digest had been doing polls of its readers, and it had a LOT of readers, and it had been right for several Presidential elections in a row.
In 1936 Literary Digest sent out 10 million polls and got back 2.27 million answers. And based on that they said Alf Landon was going to kick FDR’s ass, because their polling showed a massive win for Landon.
This other guy, George Gallup, did a scientific poll of around 1000 people, and he said that FDR would win in a landslide. This prompted a great deal of mockery: how could he possibly say something like that with his measly 1000 people?
The answer was: randomness.
Turns out that Literary Digest readers were mostly richer, and mostly in certain geographic areas. They’d gotten lucky in the past but in 1936 the election was decided by poor people, often first time voters, who had been totally ignored by the Literary Digest poll.
It’s counterintuitive — you wouldn’t think that 1000 would be enough, but it really is, as long as it’s random enough.
To get an accurate poll of the entire 8 billion people on Earth you’d only need to sample around 2400 people, as long as you got a completely random sample.
And it’s randomness that’s vastly more important than size. If it’s not random, it doesn’t matter how big your sample is, it’s going to be wrong. A smaller random sample just gives slightly bigger error bars. You could sample 1,000 truly random people on Earth and your margin of error would only be around 3%. That 2400 I mentioned earlier was for a 2% margin of error.
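The numbers in these answers come from the standard margin-of-error formula for a proportion at 95% confidence, MOE = z·√(p(1−p)/n), taken at the worst case p = 0.5. A sketch:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from n random respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Note: population size does not appear in the formula at all.
for n in (1_000, 1_200, 2_400, 100_000):
    print(f"n = {n:>6}: ±{margin_of_error(n) * 100:.1f}%")
```

This reproduces the figures quoted above: about ±3% for 1,000 respondents and ±2% for 2,400, with 100,000 respondents buying only ±0.3%.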
Now, there are all sorts of other factors involved. For example, people will tend to be agreeable. If you say “Do you agree Bob Dole should be declared to be a twit” you’ll get a lot of people saying yes even if they don’t necessarily agree, just because people tend to say what they think you want to hear.
Examining the questions asked by a poll is as essential as the sample size. A better question, for example, would ask something more like “Some people say Bob Dole is a twit, others say he isn’t. What do you think?” and would flip that 50% of the time, so the question was phrased “Some people think Bob Dole is not a twit, others say he is. What do you think?”
Bias kicks in depending on whether the person is answering questions on a computer or on paper versus in person. For example, if a Black pollster asks questions about race, surprise, a lot of white respondents lie and give much more racially progressive answers than they would if a white pollster was doing the questioning.
In general if you drill into the poll questions and find that they’re all pretty biased (“Donald Trump thinks America is the greatest country to ever exist, do you also love America or are you a commie?”) then it’s an indication that the poll isn’t actually designed to get accurate answers.
There’s also a practice called push polling, in which people are told that they’re being asked questions on a poll but in fact the purpose is to propagandize them and their answers are irrelevant.
“Scientists say that Diet Coke gives you cancer and Diet Pepsi will make you live forever, do you prefer to drink Diet Pepsi?” is a great way to advertise for Diet Pepsi, but a lousy way to find out how many people drink which soda.
Whether a sample is representative does not depend on the size of the overall population, at least once the population gets large enough. A random sample of 1000 people out of one million or one billion or one trillion still yields a margin of error of approximately 3 percent – which means that 95% of the time, the true sentiment if you measured everybody would be within three percentage points of your random sample.
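You can see the population-size claim directly by simulation. A sketch, with an assumed 60% support rate: build populations of very different sizes, poll 1,000 people at random from each many times, and compare how much the estimates spread:

```python
import random
import statistics

random.seed(3)

def poll_spread(pop_size, support=0.6, sample=1000, repeats=500):
    """Build a population with the given support rate, poll `sample` people
    at random `repeats` times, and return the std dev of the estimates."""
    population = [i < support * pop_size for i in range(pop_size)]
    estimates = [sum(random.sample(population, sample)) / sample
                 for _ in range(repeats)]
    return statistics.stdev(estimates)

# Same sample size, wildly different population sizes:
# the spread of the poll results barely changes.
for pop in (10_000, 100_000, 1_000_000):
    print(f"population {pop:>9}: poll std dev ≈ {poll_spread(pop):.4f}")
```

All three spreads come out around 0.015 (roughly the ±3% margin at 95% confidence): a sample of 1,000 is about as accurate for a million people as for ten thousand.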
The hard part is getting a random sample. If you poll a church in East Texas, you’ll get a very different response than on the wharf in San Francisco. If you poll online, you might miss those that don’t have internet access, and that will swing things, both by geography and demographics.
So 1000 of the right people is representative, but getting that 1000 is very hard to do.
The answer is statistical probability. What matters most is how representative the sample is of the population, i.e. whether the average age, gender, geographical distribution, etc. match the country. What percentage of the population it is actually isn’t relevant when you’re talking about a population that big. Polling 100,000 people isn’t likely to be any more accurate if the smaller sample is equally representative.