How do we know that truly random samples accurately represent general populations?

5 Answers

Anonymous

Look at the way [these 3000 ball bearings fall down in this toy.](https://i.imgur.com/2VRIovO.gif)

The bell curve shows the expected outcome.

The 3000 ball bearings are like random samples and the curve is like the general population.
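If you want to play with the toy yourself, here is a minimal simulation sketch of a board like that. It assumes each peg sends a ball left or right with a 50/50 chance, and it uses 12 rows of pegs, a number picked only for illustration. The function name `galton_board` is hypothetical. The ball counts pile up into the same bell shape, centred on the middle bin.

```python
import random
from collections import Counter

def galton_board(n_balls: int, n_rows: int, seed: int = 0) -> Counter:
    """Drop n_balls through n_rows of pegs. Each peg sends a ball right with
    probability 1/2 (an assumption of this sketch). Returns how many balls
    end up in each bin 0..n_rows."""
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(n_balls):
        # The bin a ball lands in is the number of times it bounced right.
        position = sum(rng.random() < 0.5 for _ in range(n_rows))
        bins[position] += 1
    return bins

if __name__ == "__main__":
    counts = galton_board(n_balls=3000, n_rows=12)
    for b in range(13):
        # One '#' per 20 balls gives a sideways bell curve.
        print(f"{b:2d} {'#' * (counts[b] // 20)}")
```

Each individual ball is unpredictable, but with 3000 of them the overall shape barely changes from run to run.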

Anonymous

With something truly random, there’s no way to know for sure, at least not without taking a census and asking everyone.
But the chances of getting a sample that doesn’t represent the general population are very low.
[In fact, there’s a whole subfield devoted to determining how big the sample should be to push those odds below a chosen level.](https://en.wikipedia.org/wiki/Sample_size_determination)
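As a rough illustration, here is a sketch of one common textbook formula from that field, for estimating a proportion (like “what fraction of people say yes”): n = z² · p(1 − p) / E², where E is the margin you can tolerate and z comes from the confidence level. This assumes a simple random sample from a large population and uses the normal approximation. The function name is made up for this example, and using p = 0.5 is the usual worst-case choice when you don’t know the true proportion yet.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n such that a proportion estimated from a simple random sample
    lands within +/- margin of the true value with the given confidence,
    using the normal approximation n = z^2 * p * (1 - p) / margin^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size_for_proportion(0.05))  # 385
print(sample_size_for_proportion(0.03))  # 1068
```

Note how slowly the required sample grows: about a thousand people already gets you within roughly 3 percentage points, no matter how big the population is.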

Anonymous

We can’t be totally sure, but there are mathematical ways to figure out exactly how sure we can be.

Let’s assume, for example, that it’s important to represent race accurately, and that 20% of our population is black. We randomly sample 1000 people, so, ideally, there should be 200 black people in the sample. But it probably won’t be exactly 200, since they’re selected randomly. So, how close to 200 can we expect to be?

Take the sample size, multiply it by the percentage in question, then multiply by the opposite percentage (100% minus that percentage), then take the square root of the result. In this example, the sample size is 1000 and the percentage is 0.2, so the result is sqrt(1000 * 0.2 * 0.8) = sqrt(160), which is about 13. That number is called the “standard deviation” of the count.
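The same arithmetic as a tiny Python sketch (the function name is just for illustration). It assumes each person is drawn independently, which is what makes the count follow a binomial distribution with standard deviation sqrt(n · p · (1 − p)).

```python
import math

def count_std_dev(n: int, p: float) -> float:
    """Standard deviation of how many sampled people belong to a group that makes
    up a fraction p of the population, assuming n independent random draws:
    sqrt(n * p * (1 - p))."""
    return math.sqrt(n * p * (1 - p))

sd = count_std_dev(1000, 0.2)
print(round(sd, 2))  # 12.65, i.e. "about 13"
```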

A rule of thumb is that there is about a 95% probability that the actual count is within two standard deviations of the ideal count, and about a 99.7% probability that it is within three standard deviations.

So, in this case, we can be fairly (about 95%) certain that we get between 174 and 226 black people in our sample (200 plus/minus 26), and quite (about 99.7%) certain that we get between 161 and 239 (200 plus/minus 39).
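You can check the rule of thumb by brute force. The sketch below draws 2000 hypothetical samples of 1000 people, where each person independently has a 20% chance of being in the group, and counts how often the result lands inside those ranges. The trial count and function names are arbitrary choices for this example. Because 13 is rounded up from about 12.65, the ranges are slightly wide, so the two-standard-deviation share tends to come out a little above 95%.

```python
import math
import random

N, P = 1000, 0.2
EXPECTED = N * P                        # 200 people in the ideal case
SD = round(math.sqrt(N * P * (1 - P)))  # about 12.65, rounded to 13 as above

def simulate_counts(trials: int, seed: int = 1) -> list[int]:
    """Draw `trials` independent random samples of N people, each person in the
    group with probability P, and return the group count for each sample."""
    rng = random.Random(seed)
    return [sum(rng.random() < P for _ in range(N)) for _ in range(trials)]

def coverage(counts: list[int], k: int) -> float:
    """Fraction of samples whose count lies within k standard deviations of EXPECTED."""
    lo, hi = EXPECTED - k * SD, EXPECTED + k * SD
    return sum(lo <= c <= hi for c in counts) / len(counts)

counts = simulate_counts(2000)
print(f"within 2 SD (174-226): {coverage(counts, 2):.3f}")
print(f"within 3 SD (161-239): {coverage(counts, 3):.3f}")
```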

That all assumes “truly random samples”, of course. In reality, when running a survey poll or something like that, it is difficult to get one. For example, if you ask people on the street about their health, your poll will show a much healthier public than it actually is, because you will not get input from people who are so ill that they cannot leave their homes. That’s often the bigger hurdle to overcome when trying to get statistical information about the general public.

Anonymous

We know because of math, especially the law of large numbers.

As a random sample gets larger, the difference between statistics computed on the sample (like its average) and the same statistics for the whole population becomes small.

Anonymous

The Law of Large Numbers. It’s not just an expression. It’s an actual mathematical result. It tells us that the average of a random sample tends towards the average of the population as the sample gets larger. This is supported both by probability theory and by any number of experiments where the population average was already known and the sample did a good job of estimating it.
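Here is a small version of that kind of experiment, as a sketch. The population is made up for the example: 10,000 “ages” spread evenly from 0 to 99, so the true average is known to be 49.5. We draw people at random (with replacement) and watch the running sample average.

```python
import random

def running_means(population: list[float], n_max: int, seed: int = 2) -> list[float]:
    """Draw n_max values uniformly at random (with replacement) from the population
    and return the sample average after each draw."""
    rng = random.Random(seed)
    total, means = 0.0, []
    for n in range(1, n_max + 1):
        total += rng.choice(population)
        means.append(total / n)
    return means

# Hypothetical population: 10,000 ages, 100 people at each age 0..99.
population = [float(i % 100) for i in range(10_000)]
true_mean = sum(population) / len(population)  # 49.5
means = running_means(population, 100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    err = abs(means[n - 1] - true_mean)
    print(f"n={n:>7}: sample average={means[n - 1]:6.2f}  off by {err:.2f}")
```

Small samples can be off by a lot, but the error typically shrinks as n grows, which is the law of large numbers at work.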

Even more remarkably, the math behind the LLN (together with its close relative, the central limit theorem) tells us, for a given sample size, how far off we could plausibly be. This is how statisticians calculate things like the margin of error.
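For a poll that estimates a proportion, a common way to get that margin of error is z · sqrt(p̂(1 − p̂)/n), which comes from the central limit theorem’s normal approximation. The sketch below assumes a simple random sample and uses p̂ = 0.5, the worst case. The function name is illustrative only.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion:
    z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: +/- {100 * margin_of_error(0.5, n):.1f} percentage points")
# n=   100: +/- 9.8 percentage points
# n=  1000: +/- 3.1 percentage points
# n= 10000: +/- 1.0 percentage points
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.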