Adding to what others have mentioned, I think a visual representation is a valuable tool for understanding this. If you search for “sample size and margin of error” in your browser, you’ll come across numerous graphs illustrating a consistent trend: as the sample size increases, the margin of error decreases. However, you’ll also observe that the reduction in error becomes less significant with larger samples. In other words, there’s a substantial difference in error between a sample of, say, 10 people and 100 people, but not between a sample of 1000 and 2000 people. Like this one: [https://ihopejournalofophthalmology.com/content/132/2022/1/1/img/IHOPEJO-1-009-g001.png](https://ihopejournalofophthalmology.com/content/132/2022/1/1/img/IHOPEJO-1-009-g001.png)
Why does this happen? The margin of error is calculated using the formula: Margin of Error = (standard deviation of the sample / square root of the sample size) × Z-score.
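Here’s a minimal Python sketch of that formula (the function name and the example spread of 10 are my own choices for illustration, and z = 1.96 assumes a 95% confidence level):

```python
import math

def margin_of_error(sample_std, n, z=1.96):
    """Margin of error for a sample mean: z * (s / sqrt(n))."""
    return z * sample_std / math.sqrt(n)

# Same spread, growing sample: the error shrinks, but more and more slowly.
for n in (10, 100, 1000, 2000):
    print(n, round(margin_of_error(10, n), 3))
```

Notice how going from 10 to 100 people changes the number far more than going from 1000 to 2000.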
I don’t know how good your mathematical intuition is, but notice **the diminishing returns in terms of sample size.** As we add more individuals, each new addition has a diminishing impact on reducing the error. This phenomenon arises from the nature of division and can be shown with a numerical example: assume we have a sample size of 1, 2 and 4 people respectively, and for simplicity I’ve ignored the square roots. Then 1/1 equals 1, 1/2 equals 0.5, and 1/4 equals 0.25. While 0.25 is undoubtedly smaller than 0.5, the difference between 1.00 and 0.5 is greater than that between 0.5 and 0.25. This mathematical principle results in diminishing returns from increasing sample sizes. We achieve smaller errors, but the rate of decrease in the errors itself diminishes. Consequently, at a sufficiently large sample size, the further difference becomes so negligible that it may as well be considered zero.
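To make the diminishing returns concrete, here’s a small Python sketch using the real 1/√n term (the spread and Z-score are fixed at 1 so only the sample-size effect shows, an assumption just for illustration):

```python
import math

# How much the error drops each time we double the sample size.
pairs = [(1, 2), (2, 4), (1000, 2000)]
for a, b in pairs:
    drop = 1 / math.sqrt(a) - 1 / math.sqrt(b)
    print(f"{a} -> {b}: error shrinks by {drop:.4f}")
```

Doubling from 1 to 2 shaves off about 0.29, but doubling from 1000 to 2000 shaves off less than 0.01, over thirty times less improvement for a thousand times more people.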
**This insight explains why we accept “small” sample sizes, such as 0.0002% of the population. Eventually,** **the effort required to increase the sample size becomes disproportionate to the marginal reduction in error.**
Finally, there’s a crucial aspect that isn’t evident from the mathematics alone but has been highlighted by other Redditors in this thread. **It’s not just about the sample size; the quality of the sample matters significantly.** The well-known example of the Literary Digest presidential poll illustrates this point. According to Wikipedia, “The magnitude of the magazine’s error – 19.54% for the popular vote for Roosevelt vs. Landon, and even more in some states – destroyed the magazine’s credibility, and it folded within 18 months of the election. In hindsight, the polling techniques employed by the magazine were faulty. They failed to capture a representative sample of the electorate and **disproportionately polled higher-income voters, who were more likely to support Landon.** Although it had polled ten million individuals (of whom **2.27 million responded, an astronomical total for any opinion poll**), it had surveyed its own readers first […]”
This example demonstrates that while sample size is crucial, it’s not the sole determining factor. Having a correct sampling technique is equally, if not more, important. Other Redditors have delved into this aspect in more detail in this thread. I hope this clarifies things for you!