How is it that in the U.S., surveys of 1,000 people are accepted as representative of the entire country?


I’ve noticed most U.S. polls query around 1,000 people and sometimes even fewer. Somehow that qualifies for headlines like “Americans say…” or “Most Americans…” How is it acceptable that roughly 0.0003% of the population is accepted as representative?



Anonymous

Imagine I have a Santa-sized sack of marbles in all sorts of colors.
Let’s say we have somewhere around one million marbles.

I give you the task of telling me how many of them are red.

How many are you going to want to dump out and count before you’d feel confident giving me an answer?

All of them? Half of them? How correct do you want to be? Do you NEED to be exact?

Depending on what you’re trying to achieve, statistics has a lot of power for telling us what we need to know.

While you’re busy spending the [next week or two](https://nowiknow.com/how-long-would-it-take-to-count-to-a-million/) dumping them all out over the floor of a gymnasium and counting every single one and putting them back in the bag, I’m going to stick my arm in, stir them up for a minute, and grab a few handfuls. Maybe 20 marbles. An amount that I can count out in a minute or less. I find 2 red marbles in 20. I toss them back in and repeat. This time I find 3 in 20. Then I repeat again and find 2 in 20 again.

I’ll do that 10 times, and come to the conclusion that there are on average 2.3 red marbles in 20, or 115,000 total red marbles in the sack.
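If you want to see the handful trick in action, here is a minimal Python sketch. It assumes a sack of exactly one million marbles of which 115,000 are red (a made-up ground truth, so the estimate has something to be compared against). The names `draw_handful` and `estimate_red` are just illustrative.

```python
import random

TOTAL_MARBLES = 1_000_000
TRUE_RED = 115_000  # hypothetical ground truth, unknown to the person sampling


def draw_handful(rng, size=20):
    """Draw `size` marbles without replacement and count the red ones."""
    picks = rng.sample(range(TOTAL_MARBLES), size)
    # By convention in this sketch, marbles numbered below TRUE_RED are red.
    return sum(1 for p in picks if p < TRUE_RED)


def estimate_red(rng, handfuls=10, size=20):
    """Repeat the handful draw and scale the average up to the whole sack."""
    counts = [draw_handful(rng, size) for _ in range(handfuls)]
    mean_per_handful = sum(counts) / handfuls
    return counts, mean_per_handful / size * TOTAL_MARBLES


rng = random.Random(42)  # fixed seed so the run is repeatable
counts, estimate = estimate_red(rng)
print("red per handful:", counts)
print("estimated red marbles in the sack:", round(estimate))
```

Ten handfuls will bounce around quite a bit from run to run; raising `handfuls` pulls the estimate closer to the true count.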

Then I’m going to spend 10 minutes on some statistical calculations, using the standard deviation of the sample results and the standard formulas to work out 95% and 99% confidence intervals.

E.g. with only 200 marbles counted, this might be roughly “I am 95% confident that there are 115,000 +/- 44,000 red marbles” and “I am 99% confident that there are 115,000 +/- 58,000 red marbles”

The samples and those results can mathematically tell me that there’s only about a 1% chance that the true count falls outside a range of about 57,000 to 173,000. If I want a tighter range, I grab more handfuls.
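For the curious, the interval calculation can be sketched in a few lines. This is one common way to do it, not necessarily the exact formula a given pollster uses: it pools the 10 handfuls into a single sample of 200 marbles with 23 red, and applies the normal approximation for a proportion. The function name `proportion_interval` is my own.

```python
import math

Z_95 = 1.96   # z-score for a 95% confidence interval
Z_99 = 2.576  # z-score for a 99% confidence interval


def proportion_interval(successes, n, z):
    """Normal-approximation confidence interval for a sample proportion."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


total_marbles = 1_000_000
for label, z in (("95%", Z_95), ("99%", Z_99)):
    low, high = proportion_interval(23, 200, z)
    print(label, "interval:", round(low * total_marbles), "to", round(high * total_marbles))
```

With 23 red out of 200, this works out to roughly +/- 44,000 marbles at 95% and +/- 58,000 at 99%. Narrowing the interval to a few thousand marbles would take a sample in the tens of thousands, which is still a tiny fraction of the sack.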

My test was done in under an hour with a printed report, whereas counting any meaningful fraction of marbles will take much longer.

What my test relies on is that my sampling was sufficiently random, i.e. the marbles were well mixed before and between sampling.

So when surveying people, ideally we want to randomly sample from the target population. That’s actually very hard to do, and it’s a legitimate reason many of these studies end up flawed.
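This is also why the size of the population barely matters. As a rough sketch, assuming a truly random sample and the textbook normal-approximation formula for a proportion (worst case p = 0.5, with an optional finite population correction), the margin of error for 1,000 respondents can be computed like this. The function name and the population figures are my own illustrative choices; 330 million is only an approximation of the U.S. population.

```python
import math


def margin_of_error(n, population=None, p=0.5, z=1.96):
    """95% margin of error for a proportion from a simple random sample of size n."""
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: only matters when n is a large share of the population.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


for population in (100_000, 1_000_000, 330_000_000):
    moe = margin_of_error(1000, population)
    print(f"population {population:>11,}: +/- {100 * moe:.2f} percentage points")
```

Each of the three populations comes out at about 3 percentage points. The margin is driven by the number of people sampled, not by the fraction of the population they represent.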

E.g. if you wanted to sample random people, you could stand on a street corner and interview passersby. But your sample will be skewed towards people who walk to work. If you’re sampling at 2pm, you’re skewing away from people who work 9-5 office jobs. Almost any method of sampling in person has a location bias.

One of the best ways to sample is to get a list of phone numbers of county residents, use a random number generator to pick 1,000 of them at random, and then start calling. The best data list is probably a list of registered voters, if it includes phone numbers. Of course, you’re then skewed based on time of day, and towards people who actually have the patience to answer your annoying questions.
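The "random number generator" step is the easy part. A minimal sketch, assuming you already have the phone list in memory (the `directory` below is fabricated placeholder data, and `pick_call_list` is a hypothetical helper name):

```python
import random


def pick_call_list(phone_numbers, k=1000, seed=None):
    """Pick k distinct phone numbers uniformly at random."""
    rng = random.Random(seed)
    # Deduplicate first so a number listed twice is not twice as likely to be drawn.
    unique_numbers = sorted(set(phone_numbers))
    return rng.sample(unique_numbers, k)


# Placeholder directory of 50,000 made-up numbers.
directory = [f"555-{i:07d}" for i in range(50_000)]
call_list = pick_call_list(directory, k=1000, seed=1)
print(len(call_list), "numbers to call, e.g.", call_list[:3])
```

Every number on the list has the same chance of being picked. What this cannot fix is who is on the list in the first place, or who actually picks up.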

Because there are so many ways to accidentally bias your sampling, a well-designed study will also ask demographic questions like ethnicity, address, gender, age, etc. These may be useful for making headline conclusions like “People over 60…”, but they’re also useful for checking for bias. E.g. you can use census data for a county to find out that 40% of residents are over the age of 60. If you run your survey and it turns out that 70% of the respondents are over 60, that’s an indication that you may not have had a sufficiently random sample, and you need to overhaul your sampling technique (e.g. maybe your phone list includes cell phones and landlines, and older citizens are more likely to have both a cell phone AND a landline, making them twice as likely to have one of their numbers selected).
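The landline-plus-cell effect is easy to simulate. The sketch below makes some deliberately crude assumptions that are mine, not census facts: 40% of residents are over 60, every over-60 resident has exactly two numbers on the list, everyone else has exactly one, and everybody answers.

```python
import random


def simulate_phone_bias(residents=100_000, share_over_60=0.40, calls=1000, seed=0):
    """Return the share of sampled phone numbers that belong to over-60 residents."""
    rng = random.Random(seed)
    n_over_60 = round(residents * share_over_60)
    # One list entry per phone number; the value records whether its owner is over 60.
    number_owner_over_60 = []
    for i in range(residents):
        over_60 = i < n_over_60
        lines = 2 if over_60 else 1  # assumed: landline + cell vs. cell only
        number_owner_over_60.extend([over_60] * lines)
    sampled = rng.sample(number_owner_over_60, calls)
    return sum(sampled) / calls


share = simulate_phone_bias()
print(f"over-60 share among sampled numbers: {share:.1%} (census says 40.0%)")
```

Under these assumptions the expected share is 0.8 / 1.4, or about 57%, even though the numbers themselves were picked perfectly at random. Comparing that figure to the census 40% is exactly the kind of check that reveals the problem.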
