Back in November 2017, I had an interview with a prop trading firm and I was asked the following question: “What is the expected number of coin flips before you observe three heads in a row?”. I was asked the same question two months later, this time on a homework problem for MATH 180B - Introduction to Stochastic Processes at UC San Diego. If only those events had occurred in the opposite order… Anyway, this blog post gives a brief introduction to discrete time Markov chains through their application to analyzing and modelling sequences of coin flips. Assumed knowledge: conditional probability and expectation. For reference, read: Conditional Expectation and Markov chains.

Markov chains

By definition, a Markov process is “a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.” A Markov process can be defined using a state space (the values that the random variable can take on in the process), an initial distribution and a stochastic matrix. Here’s a simple example of a Markov process: consider an ant that starts at the origin and moves uniformly at random either one step to its left (to -1) or one step to its right (to +1), and once it gets to -1 or +1, it is forced to return to the origin. Then the state space is $S = \{-1, 0, 1\}$, the initial state is $X_0 = 0$ and the probability matrix is:

$$ \left[ \begin{array}{c|lcr} & \text{-1} & \text{0} & \text{1} \\ \hline -1 & 0 & 1 & 0 \\ 0 & 0.5 & 0 & 0.5 \\ 1 & 0 & 1 & 0 \end{array} \right] $$

where the states appear on the left and on top of the matrix, and an entry in the matrix represents the probability of going from state $i$ (labeled on the row) to state $j$ (labeled on the column) after one time period (one step on the line, in our case). As mentioned in the process, after reaching state 1 or state -1, the ant is forced to go to state 0, and as seen in the probability matrix, the probability of going from -1 to 0 and from 1 to 0 after one step is indeed 1. That is, $P(X_{n+1} = 0 \mid X_n = -1) = P(X_{n+1} = 0 \mid X_n = 1) = 1$. It is important to note that the values of $X_0, \dots, X_{n-1}$ do not affect the probability distribution of $X_{n+1}$ once $X_n$ is known. More formally, $P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$ - known as the Markov property.
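To make this concrete, here is a minimal Python sketch (my own illustration, not part of the original problem) that simulates single steps of the ant's walk and empirically recovers the rows of the matrix above; the function name `step` and the trial count are arbitrary choices:

```python
import random

def step(state):
    """One step of the ant chain: from 0 move to -1 or +1 uniformly at random;
    from -1 or +1, return to the origin."""
    if state == 0:
        return random.choice([-1, 1])
    return 0

# Estimate P(X_{n+1} = j | X_n = i) by simulating many single steps from each state.
states = [-1, 0, 1]
trials = 100_000
for i in states:
    counts = {j: 0 for j in states}
    for _ in range(trials):
        counts[step(i)] += 1
    print(i, [round(counts[j] / trials, 2) for j in states])

# Prints (up to sampling noise):
# -1 [0.0, 1.0, 0.0]
#  0 [0.5, 0.0, 0.5]
#  1 [0.0, 1.0, 0.0]
```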

Expectation and absorbing states

Let’s alter the example from the previous paragraph, keeping everything the same, except that if the ant reaches the state $1$ it can never leave it. Now our probability matrix looks like:

$$ \left[ \begin{array}{c|lcr} & \text{-1} & \text{0} & \text{1} \\ \hline -1 & 0 & 1 & 0 \\ 0 & 0.5 & 0 & 0.5 \\ 1 & 0 & 0 & 1 \end{array} \right] $$

It is easy to see that if the ant reaches state $1$ it can never exit it. This is called an absorbing state. Very often, it is useful to find the expected number of steps in the process until absorption using recurrence relations. For the following calculation(s), we will be using the lemma:

Lemma: Let $(X_n)_{n \geq 0}$ be a time-homogeneous Markov chain with countable state space $S$, let $A \subseteq S$, and let $\tau = \min\{n \geq 0 : X_n \in A\}$ be the first time that the Markov chain reaches a state in $A$. Then for each $x \in S \setminus A$, we have $E[\tau \mid X_0 = x] = 1 + \sum_{y \in S} p_{xy}\, E[\tau \mid X_0 = y]$. Now, let’s calculate the expected number of steps that the ant will take until it gets absorbed. We define $\phi(x) = E[\tau \mid X_0 = x]$ (so $\phi(x) = 0$ for $x \in A$) and then we have

$$ \begin{cases} \phi(-1) = 1 + \phi(0) \\[2ex] \phi(0) = 1 + \frac{\phi(-1)}{2} + \frac{\phi(1)}{2} \\[2ex] \phi(1) = 0 \end{cases} $$

and solving this system of equations we get

$$ \begin{cases} \phi(-1) = 4 \\[2ex] \phi(0) = 3 \\[2ex] \phi(1) = 0 \end{cases} $$

That is, if we start at state $0$ it will, on average, take $3$ steps to get absorbed.
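As a sanity check, the same first-step analysis can be done numerically. The sketch below (my own, assuming NumPy is available; not something from the course) rewrites the system as $(I - Q)\phi = \mathbf{1}$, where $Q$ is the transition matrix restricted to the non-absorbed states $\{-1, 0\}$:

```python
import numpy as np

# First-step analysis as a linear system: phi = 1 + Q @ phi over the
# non-absorbed states, i.e. (I - Q) @ phi = 1.
Q = np.array([[0.0, 1.0],    # from -1: to -1, to 0
              [0.5, 0.0]])   # from  0: to -1, to 0 (the other 0.5 is absorbed at 1)
phi = np.linalg.solve(np.eye(2) - Q, np.ones(2))
print(phi)  # [4. 3.] -> phi(-1) = 4, phi(0) = 3, matching the hand calculation
```

Solving the linear system is equivalent to the substitution we did by hand, but it scales to chains with many more states. Let's see how we can apply this to sequences of coin flips.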

Three heads in a row

The trickiest part of finding the expected number of flips until $HHH$ is observed is defining the state space and stochastic matrix. Here’s the best approach: let’s define $S = \{0, H, HH, HHH\}$, where $H$ represents only the last flip being a head, $HH$ represents the last two flips being heads, $HHH$ represents the last three flips being heads (which is also an absorbing state), and $0$ represents anything else. It is important to understand that these 4 states are disjoint. That is, if the last two flips were heads, the random variable will take on the value $HH$, and $HH$ only. Now our stochastic matrix looks like -

$$ \left[ \begin{array}{c|cccc} & \text{0} & \text{H} & \text{HH} & \text{HHH} \\ \hline 0 & 0.5 & 0.5 & 0 & 0\\ H & 0.5 & 0 & 0.5 & 0 \\ HH & 0.5 & 0 & 0 & 0.5\\ HHH & 0 & 0 & 0 & 1 \end{array} \right] $$

and our system of equations for expectation (using calculations similar to those in the previous paragraph) is

$$ \begin{cases} \phi(0) = 1 + \frac{\phi(0)}{2} + \frac{\phi(H)}{2} \\[2ex] \phi(H) = 1 + \frac{\phi(0)}{2} + \frac{\phi(HH)}{2} \\[2ex] \phi(HH) = 1 + \frac{\phi(0)}{2} + \frac{\phi(HHH)}{2} \\[2ex] \phi(HHH) = 0 \end{cases} $$

and solving this system of equations gives us our answer, $\phi(0) = 14$ (since we start at state 0): on average, it takes 14 flips of a fair coin to observe $HHH$.
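The same $(I - Q)\phi = \mathbf{1}$ trick from the previous section verifies this numerically; here $Q$ is the matrix above restricted to the transient states $\{0, H, HH\}$ (again my own sketch, not part of the homework):

```python
import numpy as np

# Transition probabilities among the transient states {0, H, HH};
# the missing mass in each row flows into the absorbing state HHH.
Q = np.array([[0.5, 0.5, 0.0],   # from 0
              [0.5, 0.0, 0.5],   # from H
              [0.5, 0.0, 0.0]])  # from HH
phi = np.linalg.solve(np.eye(3) - Q, np.ones(3))
print(phi)  # [14. 12.  8.] -> phi(0) = 14 flips on average until HHH
```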

A counter-intuitive comparison

We just looked at the expected number of flips until $HHH$ is observed. What about $HHT$? The transition probability matrix for $HHT$ looks like -

$$ \left[ \begin{array}{c|cccc} & \text{0} & \text{H} & \text{HH} & \text{HHT} \\ \hline 0 & 0.5 & 0.5 & 0 & 0\\ H & 0.5 & 0 & 0.5 & 0 \\ HH & 0 & 0 & 0.5 & 0.5\\ HHT & 0 & 0 & 0 & 1 \end{array} \right] $$

Solving this system gives us $\phi(0) = 8$, which means that, on average, it will take fewer (6 fewer, to be precise) coin flips to observe $HHT$ than $HHH$. The reason for this difference is that in the former case, once the process reaches the state $HH$ it can never go back to $0$ or $H$, but can only either stay at $HH$ or move into $HHT$; in the $HHH$ chain, by contrast, flipping a tail at $HH$ sends the process all the way back to $0$.
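A quick Monte Carlo simulation confirms both expectations (a sketch of mine; the helper `flips_until` is a made-up name, not from the course material):

```python
import random

def flips_until(pattern):
    """Flip a fair coin until `pattern` (e.g. 'HHH') appears; return the flip count."""
    history, n = "", 0
    while not history.endswith(pattern):
        history += random.choice("HT")
        n += 1
    return n

trials = 100_000
for pattern in ("HHH", "HHT"):
    avg = sum(flips_until(pattern) for _ in range(trials)) / trials
    print(pattern, round(avg, 2))

# Prints roughly: HHH 14.0 and HHT 8.0; six fewer flips for HHT, as computed above.
```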

Summary
Markov chains are simple and beautiful.

References
[1] Professor David Meyer at the UC San Diego mathematics department.