What is the use of bootstrapping (statistics)? How does it work?

When doing inference in statistics, a common benchmark for whether the pattern you found is real or “just noise” is to ask: “If there were no pattern here, what is the probability I would see a pattern like this by chance?” If that probability is low, you have good evidence that the pattern is real.

Some approaches to answering this rely on functional or distributional assumptions. Based on how you think the data were generated, you can directly calculate the probability that your pattern arose by chance. However, sometimes the data-generating process (or your statistical technique) is too complicated for these tools to apply properly. This is where we get to kind of a harebrained scheme that miraculously works.

Here’s the idea: If we want to know whether we would have found the same pattern in a different sample from the same population, let’s just generate a bunch of new datasets! But how do we generate new data? It was probably hard/expensive to get the data we have now. Well, let’s just draw randomly from the data we do have! Each new dataset is made by drawing observations (with replacement) from the original dataset until it is the same size as the original, and then we re-run our analysis on it. Do this many times, and the spread of the results tells us how much our estimate would bounce around from sample to sample. We don’t have to get into the math, but this actually works: under fairly mild conditions, it produces reasonably accurate confidence intervals for parameters estimated from the original dataset. And just like that, we’ve gotten inference out of nothing but the data we already had, pulling ourselves up by our bootstraps, hence the name.
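
To make the resampling idea concrete, here is a minimal Python sketch of one common variant, the percentile bootstrap. The function name `bootstrap_ci`, the choice of the median as the statistic, the number of resamples, and the sample values are all illustrative assumptions, not something from the question itself.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n values from the original data, with replacement.
        resample = rng.choices(data, k=n)
        # Re-run the "analysis" (here, just one statistic) on the resample.
        estimates.append(stat(resample))
    estimates.sort()
    # Cut off roughly alpha/2 of the resampled estimates in each tail.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]  # made-up numbers
low, high = bootstrap_ci(sample)
print(f"median = {statistics.median(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Swapping in a different `stat` (a mean, a trimmed mean, or any function of a list) is the whole appeal: the resampling loop stays the same even when no neat formula for the uncertainty exists. With a sample this small the interval will be rough, so treat it as a demonstration rather than a recommendation.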