A Markov process is a random process where the probabilities of changing to a new state depend only on the current state.
Imagine a frog in a pond sitting on one of a bunch of lily pads. The frog will jump from one lily pad to the next – it prefers nearer ones, and some are too far to reach, but it chooses randomly. The only thing that matters is which pad it is on, and where the pads are (which doesn’t change). Importantly, the frog is also dumb as a rock and doesn’t remember where it has been and doesn’t care where it is going.
So, if the frog is on pad number 3, it might jump as follows: pad 1 10% of the time, pad 2 20%, pad 4 30%, pad 5 30%, pad 6 10%. When the frog is on pad 1, there is a different set of probabilities, and so on.
Markov processes come with a variety of analytical techniques, which can be used to solve certain problems. For example, if you want to find the probability of the frog being on each pad after 100 jumps, there is an equation for that. There are also analytical techniques for things like determining steady states. This means that if you can model something as a Markov process, you have mathematical tools to analyse it.
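The 100-jump question, for instance, comes down to raising the transition matrix to the 100th power. A minimal sketch, where only the pad-3 row comes from the example above and the rest of the pond is made up:

```python
import numpy as np

# Hypothetical 6-pad pond: row i holds the jump probabilities from pad i+1.
# Only the third row (pad 3) matches the example; the other rows are invented.
P = np.array([
    [0.0, 0.4, 0.3, 0.2, 0.1, 0.0],  # from pad 1
    [0.3, 0.0, 0.3, 0.2, 0.1, 0.1],  # from pad 2
    [0.1, 0.2, 0.0, 0.3, 0.3, 0.1],  # from pad 3 (10%, 20%, 30%, 30%, 10%)
    [0.1, 0.2, 0.3, 0.0, 0.3, 0.1],  # from pad 4
    [0.1, 0.1, 0.3, 0.3, 0.0, 0.2],  # from pad 5
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.0],  # from pad 6
])

# Frog starts on pad 3; applying P one hundred times gives the probability of
# being on each pad after 100 jumps -- no simulation needed.
start = np.zeros(6)
start[2] = 1.0
after_100 = start @ np.linalg.matrix_power(P, 100)
print(after_100)
```

After that many jumps the distribution has essentially settled into the steady state, so the answer barely depends on which pad the frog started from.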
Monte Carlo techniques are a method of analysing random processes by simulating the process over and over, each time making a different set of random decisions. This lets you collect statistics and determine what the likely results of your process are.
For example, a builder wants to build a house. He has an estimated cost, but wants to know how likely he is to make a profit, or a loss. There are lots of things that could happen – the price of wood might fluctuate, there might be a shortage of electricians, the weather might be bad, the cost of pipes may change, interest rates on his loan may change, there may be delays which add extra costs. Some of these processes are more likely than others to happen, and some are more expensive than others.
In a Monte Carlo process, the builder simulates building 100 houses. He takes 100 estimates for the price of wood, 100 estimates for the price of electricians, 100 forecasts of what the weather might be like, and so on. He then looks at the 100 calculated total prices, and he can tell how many of those simulated houses make a profit, and how many make a loss. This informs him whether his project is a good risk or a bad risk.
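A rough sketch of those 100 simulated builds in Python — every price, spread, and the sale price here is invented purely for illustration:

```python
import random

random.seed(1)
SALE_PRICE = 300_000  # hypothetical fixed sale price of the house

def simulate_one_build():
    """One simulated build: draw each uncertain cost at random."""
    wood = random.gauss(80_000, 10_000)                # lumber prices fluctuate
    labour = random.gauss(120_000, 15_000)             # electricians, plumbers, ...
    weather_delay = random.choice([0, 0, 0, 20_000])   # 1-in-4 chance of bad weather
    return wood + labour + weather_delay

# Simulate building 100 houses, each with its own random draws.
costs = [simulate_one_build() for _ in range(100)]
profitable = sum(1 for c in costs if c < SALE_PRICE)
print(f"{profitable} of 100 simulated houses made a profit")
```

The fraction of profitable simulated builds is the builder's estimate of how good or bad the risk is.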
The great thing about the Monte Carlo process, is that you can use it to simulate all sorts of random processes. You can simulate Markov chains too – maybe you want an analysis which doesn’t have an easy technique, or maybe your Markov chain is so big and complex that it’s not practical to solve the equations directly.
Monte Carlo is a type of stochastic algorithm. It basically goes like this: you choose a solution at random and check whether it's the right solution. If the solution is “good enough”, or you’ve reached some other ending condition (a maximum number of iterations, for example), you stop. The important thing here is that you’re not guaranteed the perfect solution (an algorithm that always finds the correct answer, but may take an unpredictable amount of time, is called a Las Vegas algorithm); in exchange, the program will always terminate in at most a known number of steps. Some examples of Monte Carlo algorithms are the random walk, the Metropolis algorithm, all kinds of genetic/evolutionary algorithms…
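That loop fits in a few lines. A toy sketch: a Monte Carlo search for a number whose square is (nearly) 2, with a hard cap on the number of guesses:

```python
import random

def close_enough(x):
    # The "good enough" test: is x a near-solution of x**2 == 2?
    return abs(x * x - 2) < 1e-3

random.seed(7)
solution = None
# Guess at random; stop when good enough or when out of tries.
for _ in range(1_000_000):  # known maximum number of steps
    guess = random.uniform(0, 2)
    if close_enough(guess):
        solution = guess
        break
print(solution)  # may still be None -- a Monte Carlo algorithm is allowed to fail
```

Note the trade-off in action: the loop is guaranteed to stop within a million steps, but nothing guarantees the answer it stops with.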
A Markov chain is just like a finite state machine, but instead of changing states based on inputs, it changes states with some probability. So you can have a state A, from which you go to state B with probability 0.2 or to state C with probability 0.8. It’s difficult to explain in text. The important thing is that a Markov chain doesn’t have a memory: the next state depends only on the current state. This is called the “Markov property”.
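A sketch of that chain in Python — the A→B and A→C probabilities are the ones above, while the transitions out of B and C are made up so the chain keeps running:

```python
import random

# Transition table: from A, go to B with probability 0.2 or C with 0.8.
# The rows for B and C are invented for illustration.
transitions = {
    "A": [("B", 0.2), ("C", 0.8)],
    "B": [("A", 1.0)],
    "C": [("A", 0.5), ("B", 0.5)],
}

def step(state):
    # The Markov property in action: the next state depends only on `state`.
    r = random.random()
    cumulative = 0.0
    for nxt, p in transitions[state]:
        cumulative += p
        if r < cumulative:
            return nxt
    return transitions[state][-1][0]

random.seed(0)
state = "A"
path = [state]
for _ in range(10):
    state = step(state)
    path.append(state)
print("".join(path))
```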
Markov chains can be used to model various things, in physics and astronomy as well, and when you evaluate those models using some Monte Carlo algorithm, you have MCMC.
Markov chains are a kind of process where the inputs for one step are solely determined by the outputs of the step directly before it. Anything that exhibits that behavior is a Markov chain.
Monte Carlo sampling is a way of describing a parameter space by repeatedly trying random points in that space and calculating their value. Think of it as trying to find the altitude of a hill by going to a bunch of random points on it and just measuring the altitude there. The more points you check, the more confident you can be in your answer.
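A minimal sketch of that idea, using a made-up "hill" function and estimating its average altitude from random points:

```python
import random
import math

def altitude(x, y):
    # A made-up hill: a smooth bump centred at the origin.
    return math.exp(-(x ** 2 + y ** 2))

random.seed(42)
# Plain Monte Carlo: measure the altitude at random points in [-2, 2] x [-2, 2].
samples = [altitude(random.uniform(-2, 2), random.uniform(-2, 2))
           for _ in range(100_000)]
average = sum(samples) / len(samples)
print(f"estimated average altitude: {average:.3f}")
```

With more points the estimate tightens up, exactly as the answer says: the more points you check, the more confident you can be.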
A Markov chain Monte Carlo, then, is a kind of Markov chain that is focused on describing a parameter space. And there are a number of different algorithms made for that purpose, to the point that I don’t really understand them all.
Why would you want to do this rather than just doing the regular Monte Carlo sample? Well sometimes you have a lot of different parameters, so randomly trying points would be silly, as you could never get enough points to be confident that your answer is correct.
So you get a bit clever, and find ways of localizing where your answer might be.
In the hill example, if your current altitude is lower than your previous one, why would you try looking close to that lower guess? It would probably be more efficient to guess close to places that you know to be higher.
But because there could be two peaks of the hill, you don’t just go close to prior high values, you randomize the process and sometimes jump to unrelated parts of the hill, and see if they’re taller.
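That "go uphill, but sometimes jump anyway" rule is the heart of the Metropolis algorithm mentioned earlier. A sketch on a made-up two-peaked hill (the peak positions, heights, and step size are all arbitrary):

```python
import random
import math

def altitude(x):
    # Hypothetical 1-D hill with two peaks, so a pure uphill walk can get stuck.
    return math.exp(-(x - 1) ** 2) + 0.8 * math.exp(-(x + 2) ** 2)

random.seed(0)
x = 0.0
visited = []
for _ in range(20_000):
    proposal = x + random.gauss(0, 0.5)  # try a point close to where we are
    # Metropolis rule: always accept uphill moves; accept downhill moves with a
    # probability that shrinks the further downhill they go. The occasional
    # downhill move is what lets the walker escape one peak and find the other.
    if random.random() < altitude(proposal) / altitude(x):
        x = proposal
    visited.append(x)

# The chain spends more time where the hill is higher.
near_right_peak = sum(1 for v in visited if abs(v - 1) < 0.5)
near_left_peak = sum(1 for v in visited if abs(v + 2) < 0.5)
print(near_right_peak, near_left_peak)
```

The histogram of visited points ends up shaped like the hill itself, which is why the chain "describes" the space rather than just finding its single highest point.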
As for where they’re used in astrophysics and cosmology, well there are a bunch of parameters we can measure that interact with each other. So it becomes more efficient to describe that parameter space with the Markov chain Monte Carlo simulation than by just completely guessing at what each parameter should be.
Everyone gave you great answers already, but I’d like to add an example for astronomy.
Last year I did an internship and the goal was to determine if a star had a stellar companion, in other words another body orbiting around it.
To do this we had to analyse the movement of a spiral arm in the debris disk around the star (this arm is a zone of higher density of debris). The issue was that the arm is not clear-cut, so we needed to model the rest of the debris disk, minus the arm, and then subtract that model from the data. The resulting image would have a clearer spiral arm we could analyse.
And to model the disk we had a bunch of parameters: the angle from the plane of view, the angle from the “north” of the system, the critical radius (the radius at which you have the highest density), etc…
So I was tasked with exploring the parameter space to find the best model to fit the data. To do that, the MCMC method starts at one point of the parameter space, then moves to a close neighbor and sees how it fits. If the fit is better, it continues to move in that general direction. Otherwise it chooses another direction.
If you want to try it for yourself (for example with a simple polynomial function), you can play with the emcee Python package, which is designed for MCMC exploration.
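For a feel of what that walk-and-accept loop looks like without the library, here is a hand-rolled single-walker sketch that fits a made-up straight line y = 2x + 1 to noisy fake data (standing in for the disk model and its parameters); emcee does the same job with many walkers at once:

```python
import random
import math

random.seed(3)

# Fake observations of a hypothetical line y = 2x + 1, with noise added.
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 + random.gauss(0, 0.1) for x in xs]

def log_likelihood(a, b):
    # How well the model y = a*x + b fits the data (higher is better).
    return -sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / (2 * 0.1 ** 2)

# One walker: start somewhere, propose a close neighbour, keep it if the fit
# improves -- and occasionally even if it doesn't, to avoid getting stuck.
a, b = 0.0, 0.0
best = (a, b)
for _ in range(20_000):
    a_new = a + random.gauss(0, 0.05)
    b_new = b + random.gauss(0, 0.05)
    delta = log_likelihood(a_new, b_new) - log_likelihood(a, b)
    if delta > 0 or random.random() < math.exp(delta):
        a, b = a_new, b_new
        if log_likelihood(a, b) > log_likelihood(*best):
            best = (a, b)
print(f"best fit: a = {best[0]:.2f}, b = {best[1]:.2f}")
```

The walker wanders the (a, b) parameter space and homes in on values close to the ones the fake data was generated from.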