# Markov Chain


I know nothing about stochastics, and this has been a bit hard to wrap my head around. What are their processes and purposes?


Let’s use weather as an illustrative example.

Our weather has three states: sunny, cloudy, and rainy. We want to predict what the weather will look like in the next hour. A Markov Chain would indicate the probabilities of going from one state to the next.

If it’s sunny out, we could say there’s a 33% chance of it staying sunny, a 33% chance of it becoming cloudy, and a 33% chance of it becoming rainy.

Now, these probabilities can change depending on what state you’re currently in. If it’s cloudy, maybe the probabilities are like 10% chance of getting sunny, 20% chance of staying cloudy, and 70% chance of becoming rainy.

You can visualize these probabilities using arrows like [this.](https://www.vatsalp.com/post/markov-chain/fig2.png)
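Those arrows and percentages translate almost directly into code. Here’s a small Python sketch of the weather chain; the sunny row uses the 33/33/33 split from above, while the rainy row is made up, since the text doesn’t give one:

```python
import random

# Transition probabilities: each state maps to the chance of moving to
# each possible next state. The "rainy" row is invented for illustration.
transitions = {
    "sunny":  {"sunny": 1/3,  "cloudy": 1/3,  "rainy": 1/3},
    "cloudy": {"sunny": 0.10, "cloudy": 0.20, "rainy": 0.70},
    "rainy":  {"sunny": 0.05, "cloudy": 0.45, "rainy": 0.50},
}

def next_state(current, rng=random.random):
    """Pick the next weather state using the current state's probabilities."""
    roll = rng()
    cumulative = 0.0
    for state, p in transitions[current].items():
        cumulative += p
        if roll < cumulative:
            return state
    return state  # guard against floating-point rounding

# Simulate 24 hours of weather, one hour per step, starting from "sunny".
weather = ["sunny"]
for _ in range(23):
    weather.append(next_state(weather[-1]))
print(weather)
```

The key Markov property is visible in `next_state`: the only input is the *current* state; nothing about earlier hours matters.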

> What are their processes and purposes?

Their purpose is to model sequences based on the probability of each step happening. This is useful in fields like data compression, or anywhere you want to predict what is likely to follow.

These systems can be used anywhere there are patterns based on past events. They’re used in sports predictions all the time: *“A team that wins the first two games of the series wins the whole thing x% of the time”*, or *“When Bob starts a streak of 2 goals like this, 40% of the time he goes on to score five more!”* They’re used in weather forecasting to identify what usually happens with local weather, in economic forecasting to model patterns in complex changes, in speech recognition to predict which words are likely to be said next, and much more.

The process is to identify the probability of different chains happening.

Example of the process *with made up numbers*.

Let’s say you’re building something to predict what words come next. Maybe it is a text autocomplete for a phone keyboard, or voice recognition.

You’d start by compiling an enormous body of text: data from a search engine, a compilation of books, millions of scanned text messages, or whatever source of data you’ve got. You’ll need enough examples to reflect real-world patterns, and because there are a lot of words out there, you’ll want billions of lines of text.

Then you might decide to base your chain on two words. You’d scan the data for two-word phrases, counting every time each pair of words appears together. Any pair that shows up often will be used to build a statistical chain.

The next step is to go through every two-word pair and find out what follows it. Perhaps your two words might be *”No pain…”* and the scan reveals it is usually followed by *”…no gain”,* but could also be followed by *”…after I took aspirin”, “…after it healed”, “…when I am drunk”, “…when I’m sleeping”,* and a bunch of other stuff.

You’d then put together your chain for *”No pain”*, that *”no gain”* is used 43% of the time, *”after”* is used 17%, *”when”* is used 12%, *”if”* is used 6%, and other words are used less frequently. Now you’ve got the information for the chain, useful for your autocomplete. Any time someone searches for *”No pain”,* you can suggest *”no gain”,* *”after”,* *”when”,* and *”if”*.

You’ll need to repeat that for every two-word chain that shows up frequently. Some people have already done a lot of that indexing for you, like [this list of the top 100 two word phrases on Twitter.](http://www.needtagger.com/what-we-say-on-twitter-the-top-100-2-word-phrases/) An autocomplete system might have many hundreds of thousands of two-word phrases and what normally follows them.

End Example.
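The counting steps in that example can be sketched in a few lines of Python. The corpus here is a toy stand-in (a real system would scan billions of lines), so the counts that come out reflect this made-up text, not real usage data:

```python
from collections import Counter, defaultdict

def build_chain(text):
    """Count which word follows each two-word pair in the corpus."""
    words = text.lower().split()
    chain = defaultdict(Counter)
    for a, b, c in zip(words, words[1:], words[2:]):
        chain[(a, b)][c] += 1
    return chain

def suggest(chain, a, b, n=3):
    """Return the n most common words seen after the pair (a, b)."""
    return [word for word, _ in chain[(a.lower(), b.lower())].most_common(n)]

# Toy corpus, made up to echo the example above.
corpus = ("no pain no gain . no pain no gain . no pain after it healed . "
          "no pain when i am drunk . no pain no gain")

chain = build_chain(corpus)
print(suggest(chain, "no", "pain"))  # the most frequent continuations of "no pain"
```

With real data, `chain` would hold hundreds of thousands of pairs, and the relative counts would give you the percentages from the example (43% *“no gain”*, 17% *“after”*, and so on).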

In the board game Monopoly, if you know the current location of a piece, you can calculate the different spaces that piece could land after its next move, and the probability of landing on each of those spaces. Then, if you do the same calculation for each of those spaces, you can calculate probabilities for where the piece will be after two turns… then three turns… then 50 turns… etc. This system is a Markov chain: by breaking the process down into independent steps and using some math tricks, you can learn things about the entire system. For example, you can learn the most-visited spaces overall, or the probability of landing on a given space sometime in the next 5 turns.
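As a sketch of that calculation, here’s a stripped-down version in Python: a 40-space circular board where the piece moves by the roll of two dice, ignoring jail, cards, and every other Monopoly rule:

```python
# Probability of each two-dice total (2 through 12).
dice = {t: sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == t) / 36
        for t in range(2, 13)}

def step(dist):
    """One turn of the chain: spread each space's probability
    across the possible dice rolls."""
    new = [0.0] * 40
    for space, p in enumerate(dist):
        for total, q in dice.items():
            new[(space + total) % 40] += p * q
    return new

dist = [0.0] * 40
dist[0] = 1.0              # the piece starts on GO with certainty
for _ in range(50):
    dist = step(dist)      # distribution after 1, 2, ... 50 turns
```

On this simplified board every space ends up nearly equally likely after 50 turns; it’s the jail and card rules of real Monopoly that make the true long-run distribution uneven.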

It turns out that a lot of real-world problems can be broken down into individual random steps. For example, I use Markov chains in my research about how electrons move through special plastics. Based on where an electron is now, we can estimate where it will be after a nanosecond, or what paths it might take through the material.