I don't know much about Monte Carlo simulations. What would be the difference in the reliability/accuracy of the final expected value in a Monte Carlo simulation under the following two methods:
1) Run a 100k-iteration Monte Carlo simulation five times, take the average of each run, and then take the average of those five averages.
2) Run a 500k-iteration Monte Carlo once.
Presumably the second would be more accurate and reliable, but I am not sure why.
The simulations have a burn-in period – it takes a number of iterations for the distribution of the list of values it outputs to “forget” the simulation’s initial state and converge to the equilibrium distribution.
If you’re not discarding some of the initial iterations in each run, both 1) and 2) will be inaccurate. When you do discard them, 1) wastes more iterations than 2) in finding the true average, because the burn-in has to be thrown away five times rather than once.
Monte Carlo is used for a huge variety of tasks in statistics and probability, and there are many different ways to do it. In the simplest case you want to draw samples from a known distribution and then evaluate a function on the sample points to estimate some statistical property of the result or compute an integral. This is known as crude MC, and in this case the two approaches you describe give statistically equivalent results. The samples are all drawn independently from the same distribution, so 5 sets of 100k are no different from a single set of 500k, and the average of five equal-sized averages is just the average of all 500k values.
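A minimal Python sketch of the crude MC case, under assumptions that are not from the question: the quantity estimated is E[U^2] for U uniform on (0, 1), whose true value is 1/3, and the same 500k draws are simply split into five batches of 100k. It only illustrates that averaging five equal-sized averages reproduces the single pooled average.

```python
import random
import statistics

def f(u):
    # Hypothetical integrand chosen for illustration: E[U^2] = 1/3.
    return u * u

rng = random.Random(0)  # fixed seed so the run is repeatable
n_total = 500_000
batch_size = 100_000

# Independent draws from a known distribution (crude MC).
draws = [f(rng.random()) for _ in range(n_total)]

# Method 2: one run of 500k iterations.
pooled_mean = statistics.fmean(draws)

# Method 1: five runs of 100k, then the average of the five averages.
batch_means = [
    statistics.fmean(draws[start:start + batch_size])
    for start in range(0, n_total, batch_size)
]
mean_of_means = statistics.fmean(batch_means)

print(pooled_mean, mean_of_means)
```

The two printed numbers agree up to floating-point rounding, because the batches are all the same size.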
Burn-in is relevant in Markov Chain Monte Carlo (MCMC), which is a common tool for inferring distributions, i.e. learning the distributions of variables you cannot observe directly using some related quantity that you can measure. In this case you can’t sample directly from the distribution you are interested in. Instead you propose samples and accept or reject them (this is the simplest method, known as the Metropolis-Hastings algorithm) in such a way that eventually the “chain” of successive samples converges to the correct distribution. Here you are not immediately sampling from the correct distribution as in the crude MC case, and therefore some of the initial samples must be discarded. A single run of 500k means you discard a smaller proportion of the total sample, since the burn-in is paid once rather than five times, but you may prefer 5 runs of 100k to reduce the total runtime by running the simulations in parallel.
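To make the burn-in idea concrete, here is a rough sketch of random-walk Metropolis, the symmetric-proposal special case of Metropolis-Hastings. Everything specific in it is an assumption for illustration: the target is a standard normal, the chain deliberately starts far away at 50, and the 1,000-step burn-in cutoff is picked by hand.

```python
import math
import random

def log_target(x):
    # Unnormalised log-density of a standard normal (assumed target).
    return -0.5 * x * x

def metropolis(n_steps, x0, step, rng):
    """Random-walk Metropolis with a symmetric Uniform(-step, step) proposal."""
    chain = []
    x = x0
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        log_ratio = log_target(proposal) - log_target(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        chain.append(x)  # a rejected proposal repeats the current state
    return chain

rng = random.Random(1)
chain = metropolis(n_steps=50_000, x0=50.0, step=1.0, rng=rng)

burn_in = 1_000  # assumed cutoff, chosen by hand for this example
mean_all = sum(chain) / len(chain)
mean_kept = sum(chain[burn_in:]) / (len(chain) - burn_in)

print(mean_all, mean_kept)
```

The early part of the chain is still drifting down from the starting point, so the mean over all samples is pulled towards it, while the mean after discarding the burn-in sits closer to the target mean of 0.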
There are many different types of Monte Carlo methods used for many different purposes. As FriedFred said, in some cases it may take a long time to move away from the initial state, but in others a run might get stuck in certain states, so a large number of shorter runs might work better. In other cases it will make no difference.
To add to what others have written: you should analyze convergence. That is to say, pick a metric that is important to you (say, the mean simulated value of some output), then look at how this value develops as you increase the number of iterations. Of course this may look very different depending on what exactly you are tracking (volatility, mean, percentiles, ...), but it should give you some idea as to whether you need more iterations.
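One simple way to do this kind of check is to record the running mean at a few sample sizes and see whether it has settled. The sketch below assumes the tracked metric is the mean, and uses exponential draws with true mean 1 as a stand-in for a real simulation output; the helper name `running_means` is made up for this example.

```python
import random

def running_means(samples, checkpoints):
    """Mean of the first k samples for each k in checkpoints (sorted ascending)."""
    means = []
    total = 0.0
    next_idx = 0
    for i, x in enumerate(samples, start=1):
        total += x
        if next_idx < len(checkpoints) and i == checkpoints[next_idx]:
            means.append(total / i)
            next_idx += 1
    return means

rng = random.Random(2)
# Stand-in simulation output: exponential draws with true mean 1.
samples = [rng.expovariate(1.0) for _ in range(100_000)]

checkpoints = [100, 1_000, 10_000, 100_000]
means = running_means(samples, checkpoints)

for k, m in zip(checkpoints, means):
    print(k, m)
```

If the estimate is still moving noticeably between the last few checkpoints, that suggests more iterations are needed. The same loop could track a percentile or a standard deviation instead of the mean.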