We can't be totally sure, but there are mathematical ways to figure out exactly how sure we can be.
Let's assume, for example, that it's important to represent race accurately, and that 20% of our population is Black. We randomly sample 1000 people, so, ideally, there should be 200 Black people in there. But it probably won't be exactly 200, since we selected randomly. So, how close to 200 can we expect to be?
Take the sample size, multiply it by the percentage in question, then multiply by the remaining percentage (one minus the first), then take the square root of the result. In this example, the sample size is 1000 and the percentage is 0.2, so the result is sqrt(1000 * 0.2 * 0.8), which is about 13. That number is called a “standard deviation”.
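If you want to check that arithmetic yourself, here is a minimal Python sketch of the same calculation (the function and variable names are just made up for illustration):

```python
import math

def sampling_standard_deviation(sample_size, proportion):
    """Standard deviation of the count of 'hits' in a random sample:
    the binomial formula sqrt(n * p * (1 - p))."""
    return math.sqrt(sample_size * proportion * (1 - proportion))

# The example from above: 1000 people sampled, 20% of the population.
sd = sampling_standard_deviation(1000, 0.2)
print(round(sd, 1))  # 12.6, i.e. roughly 13
```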
The rule of thumb is that there is a 95% probability that the actual result is within two standard deviations of the ideal result, and a 99.7% probability that it is within three standard deviations.
So, in this case, we can be fairly certain (95%) that we get between 174 and 226 Black people in our sample (200 plus/minus 26), and quite certain (99.7%) that we get between 161 and 239 (200 plus/minus 39).
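You can also verify the rule of thumb empirically by simulating many random samples. This sketch (names and the trial count are invented for illustration, standard library only) counts how often the result lands within two and three standard deviations of 200:

```python
import random

TRIALS = 10_000
SAMPLE_SIZE = 1000
PROPORTION = 0.2
EXPECTED = SAMPLE_SIZE * PROPORTION                         # 200
SD = (SAMPLE_SIZE * PROPORTION * (1 - PROPORTION)) ** 0.5   # about 12.6

within_2sd = within_3sd = 0
for _ in range(TRIALS):
    # Draw one random sample: each person is Black with probability 0.2.
    count = sum(random.random() < PROPORTION for _ in range(SAMPLE_SIZE))
    if abs(count - EXPECTED) <= 2 * SD:
        within_2sd += 1
    if abs(count - EXPECTED) <= 3 * SD:
        within_3sd += 1

print(within_2sd / TRIALS)  # close to 0.95
print(within_3sd / TRIALS)  # close to 0.997
```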
That all assumes “truly random samples”, of course. In reality, when running a survey or poll, that is difficult to achieve. For example, if you ask people on the street about their health, your poll will show a much healthier public than actually exists, because you will not get input from people who are so ill that they cannot leave their homes. That kind of sampling bias is often the bigger hurdle to overcome when trying to get statistical information about the general public.
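To make that last point concrete, here is a small simulation sketch. All the numbers are invented purely for illustration: suppose 10% of the population is ill, and ill people are only 20% as likely as healthy people to be out on the street where the survey happens.

```python
import random

random.seed(1)

POPULATION = 100_000
SAMPLE_SIZE = 1000

# Invented assumptions: 10% of people are ill (True = ill), and ill
# people are only 20% as likely to be out on the street.
people = [random.random() < 0.10 for _ in range(POPULATION)]
out_on_street = [ill for ill in people
                 if random.random() < (0.2 if ill else 1.0)]

true_rate = sum(people) / len(people)
sample = random.sample(out_on_street, SAMPLE_SIZE)
sampled_rate = sum(sample) / SAMPLE_SIZE

print(f"true illness rate:      {true_rate:.3f}")    # about 0.100
print(f"street-survey estimate: {sampled_rate:.3f}")  # about 0.022
```

Even though the street sample itself is drawn randomly, it underestimates illness by a factor of more than four, because the sampling happens in a place the ill rarely reach. No amount of extra sample size fixes that.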