Markov Chains

An introduction to Markov Chains.

Markov Chains: Definitions and Representations

A stochastic process $X = \{ x(t): t\in T\}$ is a collection of random variables. There are two elements: the index set $T$ (usually interpreted as time) and the state space $\Omega$ in which the random variables take their values.

A Markov chain is a discrete-time process for which the future behaviour, given the past and the present, depends only on the present and not on the past.

A Markov process is the continuous-time version of a Markov chain.

Definition 1. [Markov chain] A discrete time stochastic process $X_0, X_1, X_2, \dots$ is a Markov chain if the following Markov property holds:

\[\small P(X_{t} = a_t \vert X_{t-1} = a_{t-1}, X_{t-2} = a_{t-2}, ..., X_0 = a_0) = P(X_{t} = a_t \vert X_{t-1} = a_{t-1}) = P_{a_{t-1}, a_{t}}\]

Remark 1: This is a time-homogeneous Markov chain: for all $t$ and all $a_{t-1}, a_{t} \in \Omega$, the transition probability $P_{a_{t-1}, a_{t}}$ is the same.

Remark 2: The chain used in DDPM is not time-homogeneous, since the transition probability at step $t$ is given by a network conditioned on $t$.

The state $X_{t}$ depends on the previous state $X_{t-1}$ but is independent of the particular history $X_{t-2}, X_{t-3},…$. This is called the Markov property or memoryless property.

The Markov property does not imply that $X_{t}$ is independent of the random variables $X_{0}$, $X_{1}$,…, $X_{t-2}$; it just implies that any dependency of $X_{t}$ on the past is captured in the value of $X_{t-1}$.

The Markov chain is uniquely defined by the one-step transition probability matrix $P$:

\[P = \begin{pmatrix} P_{0,0} & P_{0, 1} & \cdots & P_{0, j} & \cdots\\ \vdots & \vdots & \ddots & \vdots& \vdots\\ P_{i,0} & P_{i, 1} & \cdots & P_{i, j} & \cdots\\ \vdots & \vdots & \ddots & \vdots& \vdots\\ \end{pmatrix},\]

where $P_{i,j} = P(X_{t} = j \vert X_{t-1} = i)$, $i,j \in \Omega$, is the probability of a transition from state $i$ to state $j$. Each row sums to one: $\forall i, \sum_{j \geq 0} P_{i,j} = 1$.
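As a concrete illustration, the transition matrix can be stored as a row-stochastic array and a trajectory simulated by repeatedly sampling the next state from the row of the current state. This is only a minimal sketch; the 3-state matrix `P` and the helper `simulate` below are made-up examples, not part of the original text.

```python
import numpy as np

# A made-up 3-state transition matrix; each row must sum to 1.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])
assert np.allclose(P.sum(axis=1), 1.0)

def simulate(P, x0, n_steps, rng):
    """Sample X_0, ..., X_n: the next state is drawn from the row of the current state."""
    states = [x0]
    for _ in range(n_steps):
        states.append(int(rng.choice(len(P), p=P[states[-1]])))
    return states

rng = np.random.default_rng(0)
print(simulate(P, x0=0, n_steps=10, rng=rng))
```

Note that each step looks only at the current state, which is exactly the Markov property.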

Classification of States

For simplicity, we assume that the state space $\Omega$ is finite. Below we introduce some concepts related to the classification of states.

Communicating class

Definition 2. [Communicating class] A state $j$ is reachable from state $i$ if there exists a positive integer $n$ such that $P_{i,j}^{(n)} > 0$. We write $i \rightarrow j$. If $j$ is reachable from $i$, and $i$ is reachable from $j$, then the states $i$ and $j$ are said to communicate, denoted by $i \leftrightarrow j$. A communicating class $C$ is a maximal set of states that communicate with each other. No state in $C$ communicates with any state not in $C$.

Irreducible Markov chain

Definition 3. [Irreducibility] A Markov chain is irreducible if all states belong to one communicating class.

This means that any state can be reached from any other state, i.e., $\forall i, j \in \Omega$, there exists $n$ such that $P_{i,j}^{(n)} > 0$.

Lemma 1. A finite Markov chain is irreducible if and only if its graph representation is a strongly connected graph.
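Lemma 1 suggests a simple computational check: build the directed graph with an edge $i \to j$ whenever $P_{i,j} > 0$ and test whether every state can reach every other state. The minimal sketch below (the helper names `reachable_from` and `is_irreducible` are illustrative) does this by breadth-first search on a NumPy transition matrix.

```python
import numpy as np
from collections import deque

def reachable_from(P, start):
    """States reachable from `start` in the graph with an edge i -> j whenever P[i, j] > 0."""
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in np.flatnonzero(P[i] > 0):
            if int(j) not in seen:
                seen.add(int(j))
                queue.append(int(j))
    return seen

def is_irreducible(P):
    """A finite chain is irreducible iff every state can reach every other state."""
    n = len(P)
    return all(len(reachable_from(P, i)) == n for i in range(n))

print(is_irreducible(np.array([[0.5, 0.5], [1.0, 0.0]])))  # True: the two states communicate
```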

Transient vs recurrent states

Let $r_{i,j}^{t}$ denote the probability that, starting from state $i$, the first transition to state $j$ occurs at time $t$. That is, $ r_{i,j}^{t} = P(X_{t} = j, X_{s} \neq j, \forall 1 \leq s \leq t-1 \vert X_{0} = i). $

Definition 4. [Transient and Recurrent States] A state $i$ is recurrent if $\sum_{t \geq 1} r_{i,i}^{t} = 1$ and transient if $\sum_{t \geq 1} r_{i,i}^{t} < 1$. A Markov chain is recurrent if every state in the chain is recurrent.

Definition 5. An irreducible Markov chain is called recurrent if at least one (equivalently, every) state in this chain is recurrent. An irreducible Markov chain is called transient if at least one (equivalently, every) state in this chain is transient.

Let $\mu_{i} = \sum_{t \geq 1} t \cdot r_{i,i}^{t}$ denote the expected time to return to state $i$ when starting at state $i$.

Definition 6. A state $i$ is positive recurrent if $\mu_{i} < \infty$ and null recurrent if $\mu_{i} = \infty$.

Here we give an example of a Markov chain that has null recurrent states. Consider the Markov chain whose states are the positive integers and whose transitions are as follows: from state $j$, the chain moves to state $j+1$ with probability $\frac{j}{j+1}$ and returns to state $1$ with probability $\frac{1}{j+1}$.

Starting at state 1, the probability of not having returned to state 1 within the first $t$ steps is \(\prod_{j=1}^{t} \frac{j}{j+1} = \frac{1}{t+1}.\) This tends to 0 as $t \to \infty$, so the probability of never returning to state 1 is 0 and state 1 is recurrent. The probability that the first return to state $1$ occurs at time $t$ (no return during the first $t-1$ steps, then a return at step $t$) is \(r_{1,1}^{t} = \frac{1}{t} \cdot \frac{1}{t+1} = \frac{1}{t(t+1)}.\) The expected number of steps until the first return to state 1 when starting at state 1 is \(\mu_{1} = \sum_{t = 1}^{\infty} t \cdot r_{1,1}^{t} = \sum_{t = 1}^{\infty} \frac{1}{t+1} = \infty.\) Thus state 1 is recurrent but not positive recurrent, i.e., it is null recurrent.
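Null recurrence can also be seen empirically. The sketch below (illustrative, assuming the transition rule described above) simulates excursions from state 1 and averages the observed return times; the sample mean tends to keep drifting upward as more excursions are collected, reflecting $\mu_1 = \infty$.

```python
import numpy as np

rng = np.random.default_rng(0)

def return_time(rng):
    """One excursion: from state j, move to j+1 with probability j/(j+1),
    otherwise return to state 1.  Returns the number of steps until back at 1."""
    j, steps = 1, 0
    while True:
        steps += 1
        if rng.random() < 1.0 / (j + 1):   # jump back to state 1
            return steps
        j += 1                             # otherwise move on to j + 1

for n in (100, 1_000, 10_000):
    samples = [return_time(rng) for _ in range(n)]
    print(n, np.mean(samples))  # the sample mean tends to grow with n; it never stabilizes
```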

Lemma 2. In a finite Markov chain:

  1. at least one state is recurrent; and
  2. all recurrent states are positive recurrent.

Thus, all states of a finite, irreducible Markov chain are positive recurrent.

Periodic vs aperiodic states

Definition 7. A state $j$ in a discrete time Markov chain is periodic if there exists an integer $k>1$ such that $P(X_{t+s}= j \vert X_t = j) = 0$ unless $s$ is divisible by $k$. A discrete time Markov chain is periodic if any state in the chain is periodic. A state or chain that is not periodic is aperiodic.

In other words, if a state $j$ is periodic with period $k > 1$, then a return to $j$ is possible only when the number of steps $s$ is a multiple of $k$; for all other $s$, $P(X_{t+s}= j \vert X_t = j) = 0$ (N.B.: $k > 1$).
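One way to inspect periodicity numerically is to collect the powers $n$ with $(P^n)_{j,j} > 0$ and take their greatest common divisor; a value larger than 1 indicates a periodic state. A minimal sketch (the helper `period` and the two-state "flip" chain are illustrative):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, j, max_power=50):
    """gcd of all n <= max_power with (P^n)[j, j] > 0 (0 if no return is ever possible)."""
    return_times = []
    Pn = np.eye(len(P))
    for n in range(1, max_power + 1):
        Pn = Pn @ P
        if Pn[j, j] > 0:
            return_times.append(n)
    return reduce(gcd, return_times) if return_times else 0

flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])   # deterministic alternation between two states
print(period(flip, 0))          # 2: returns to state 0 are possible only at even times
```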

Ergodicity

Definition 8. An aperiodic, positive recurrent state is an ergodic state. A Markov chain is ergodic if all its states are ergodic.

Corollary 1. Any finite, irreducible, and aperiodic Markov chain is an ergodic chain.

Stationary distribution

Consider the two-state “broken printer” Markov chain:

Transition diagram for the two-state broken printer chain.

There are two states (0 and 1): from state 0 the chain moves to state 1 with probability $\alpha$ and stays at 0 with probability $1-\alpha$; from state 1 it moves to state 0 with probability $\beta$ and stays at 1 with probability $1-\beta$. Assume that the initial distribution is:

\[P(X_0 = 0) = \frac{\beta}{\alpha+\beta}, P(X_0 = 1) = \frac{\alpha}{\alpha+\beta}.\]

Then, according to the transition probability matrix $P$, after one step, the distribution is

\[\small \begin{align*} P(X_1 = 0) &= P(X_0 = 0)P(X_1 = 0 \vert X_0 = 0) + P(X_0 = 1)P(X_1 = 0 \vert X_0 = 1) \\ &= \frac{\beta}{\alpha+\beta} \cdot (1-\alpha) + \frac{\alpha}{\alpha+\beta} \cdot \beta = \frac{\beta}{\alpha+\beta}, \\ P(X_1 = 1) &= P(X_0 = 0)P(X_1 = 1 \vert X_0 = 0) + P(X_0 = 1)P(X_1 = 1 \vert X_0 = 1) \\ &= \frac{\beta}{\alpha+\beta} \cdot \alpha + \frac{\alpha}{\alpha+\beta} \cdot (1-\beta) = \frac{\alpha}{\alpha+\beta}. \end{align*}\]

The distribution of $X_1$ is therefore the same as the initial distribution. Similarly, we can show that the distribution of $X_t$ is the same as the initial distribution for any $t$. Here, $\pi = (\frac{\beta}{\alpha+\beta}, \frac{\alpha}{\alpha+\beta})$ is called a stationary distribution.

Definition 9. A probability distribution $\pi = (\pi_i)$ (a row vector with $\sum_{i \in \Omega} \pi_i = 1$) on the state space $\Omega$ is called a stationary distribution (or an equilibrium distribution) for the Markov chain with transition probability matrix $P$ if $\pi = \pi P$; equivalently, $\pi_j = \sum_{i \in \Omega}\pi_i P_{i,j}$ for all $j \in \Omega$.
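As a quick numerical sanity check (a sketch with arbitrary example values of $\alpha$ and $\beta$), one can verify that $\pi = (\frac{\beta}{\alpha+\beta}, \frac{\alpha}{\alpha+\beta})$ satisfies $\pi = \pi P$ for the broken-printer chain:

```python
import numpy as np

alpha, beta = 0.3, 0.6              # arbitrary example values in (0, 1)
P = np.array([[1 - alpha, alpha],   # transitions out of state 0
              [beta, 1 - beta]])    # transitions out of state 1

pi = np.array([beta, alpha]) / (alpha + beta)
print(np.allclose(pi @ P, pi))      # True: pi is stationary
```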

Finding a stationary distribution

Consider the following no-claims discount Markov chain with state space $\Omega = \{1,2,3\}$ and transition matrix:

\[P = \begin{pmatrix} \frac{1}{4} & \frac{3}{4} & 0\\ \frac{1}{4} & 0 & \frac{3}{4}\\ 0 & \frac{1}{4} & \frac{3}{4} \end{pmatrix}.\]
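To find $\pi$, solve $\pi P = \pi$ together with $\sum_i \pi_i = 1$. The sketch below does this numerically by replacing one (redundant) balance equation with the normalization constraint; for this matrix the solution works out to $\pi = (\tfrac{1}{13}, \tfrac{3}{13}, \tfrac{9}{13})$.

```python
import numpy as np

P = np.array([[1/4, 3/4, 0],
              [1/4, 0, 3/4],
              [0, 1/4, 3/4]])

n = len(P)
# pi P = pi  <=>  (P^T - I) pi^T = 0; replace the last equation with sum(pi) = 1.
A = P.T - np.eye(n)
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi)  # approximately [0.0769, 0.2308, 0.6923], i.e. (1/13, 3/13, 9/13)
```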

Existence and uniqueness

Given a Markov chain, how can we know whether it has a stationary distribution? If it does, is it unique? In this part, we answer these questions.

Some notation:

  1. Let $h_{i,j}$ denote the probability that the chain, starting at state $i$, ever visits state $j$.
  2. Let $\eta_{i,j}$ denote the expected hitting time, i.e., the expected number of steps for the chain starting at state $i$ to first reach state $j$, with $\eta_{j,j} = 0$.
  3. Let $M_k$ denote the first return time to state $k$, so that $\mu_k = \mathbb{E}(M_k \vert X_0 = k)$.

Note that $h_{i,j}$ is different from $r_{i,j}^{t}$, which denotes the probability that, starting from state $i$, the first transition to state $j$ occurs at time $t$.

For the hitting probabilities we have

\[h_{i,j} = \begin{cases} \sum_{k \in \Omega}P_{i,k}h_{k,j}, & \text{if} \quad j \ne i, \\ 1, & \text{if} \quad j = i. \end{cases}\]
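The hitting-probability relation above, and the analogous recursion $\eta_{i,j} = 1 + \sum_{k \in \Omega} P_{i,k}\eta_{k,j}$ for $i \ne j$ (used again in the proof of Theorem 1 below), are linear systems that can be solved directly. A minimal sketch for the expected hitting times of a fixed target state $j$, reusing the no-claims matrix; the helper name is illustrative.

```python
import numpy as np

def expected_hitting_times(P, j):
    """Solve eta[i] = 1 + sum_k P[i, k] * eta[k] for i != j, with eta[j] = 0."""
    n = len(P)
    others = [i for i in range(n) if i != j]
    A = np.eye(n - 1) - P[np.ix_(others, others)]   # the eta[j] = 0 term drops out
    eta = np.zeros(n)
    eta[others] = np.linalg.solve(A, np.ones(n - 1))
    return eta

P = np.array([[1/4, 3/4, 0],
              [1/4, 0, 3/4],
              [0, 1/4, 3/4]])
print(expected_hitting_times(P, j=2))  # expected steps to reach state 3 (index 2) from each state
```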

Theorems on Irreducible Markov Chains

Existence and uniqueness of stationary distribution

Theorem 1. Consider an irreducible Markov chain (finite or infinite). (1) If it is positive recurrent, there exists a unique stationary distribution $\pi$, given by $\pi_i = \frac{1}{\mu_{i}}$. (2) If it is null recurrent or transient, no stationary distribution exists.

Remark: If the chain is finite and irreducible, it must be positive recurrent, so it has a unique stationary distribution.

Remark: If the Markov chain is not irreducible, we can decompose the state space into several communicating classes. Then, we can consider each communicating class separately.
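The identity $\pi_i = 1/\mu_i$ from Theorem 1 can be checked empirically. The sketch below (illustrative; it reuses the no-claims chain, whose stationary distribution is $(\tfrac{1}{13}, \tfrac{3}{13}, \tfrac{9}{13})$) estimates each mean return time by simulation and inverts it.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[1/4, 3/4, 0],
              [1/4, 0, 3/4],
              [0, 1/4, 3/4]])

def mean_return_time(P, i, n_returns=5_000):
    """Average number of steps between successive visits to state i, starting from i."""
    steps, state, returns = 0, i, 0
    while returns < n_returns:
        state = int(rng.choice(len(P), p=P[state]))
        steps += 1
        if state == i:
            returns += 1
    return steps / n_returns

mu = np.array([mean_return_time(P, i) for i in range(3)])
print(1 / mu)  # close to (1/13, 3/13, 9/13) ~ (0.077, 0.231, 0.692)
```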

Now, we give the proof of Theorem 1. We first prove that if a Markov chain is irreducible and positive recurrent, then a stationary distribution exists. Next, we prove that this stationary distribution is unique. The second part of the theorem, concerning null recurrent or transient Markov chains, is less important and more complicated, so we omit it; if you are interested, you can refer to the book Markov Chains by James Norris.

Proof. (1) Suppose that $(X_0, X_1, \dots)$ is a recurrent Markov chain (positive recurrent or null recurrent) and fix a state $k$. We construct a stationary distribution explicitly; if we can construct one, then it exists.

Let $\nu_i$ be the expected number of visits to $i$ before the chain, started at $k$, returns to $k$:

\[\begin{align*} \nu_i &= \mathbb{E}\big(\#\,\text{visits to $i$ before returning to } k \mid X_0 = k\big) \\ &= \mathbb{E}\Big(\sum_{t=1}^{M_k} \mathbf{1}\{X_t = i\} \;\Big\vert\; X_0 = k\Big) \\ &= \mathbb{E}\Big(\sum_{t = 0}^{M_k - 1} \mathbf{1}\{X_t = i\} \;\Big\vert\; X_0 = k\Big). \end{align*}\]

The last equality holds because $X_0 = X_{M_k} = k$, so the summand at $t = 0$ equals the summand at $t = M_k$ (both equal $\mathbf{1}\{i = k\}$); shifting the range of summation therefore does not change the sum.

Any stationary distribution must satisfy $\pi P = \pi$ and $\sum_{i \in \Omega}\pi_i = 1$, so we verify these two properties for a suitably normalized version of $\nu$.

(a) We first prove that $\nu P = \nu$.

\[\begin{align*} \sum_{i \in \Omega} \nu_i P_{i,j} &= \sum_{i \in \Omega} \mathbb{E}\Big(\sum_{t = 0}^{M_k - 1} \mathbf{1}\{X_t = i,\, X_{t+1} = j\} \;\Big\vert\; X_0 = k\Big) \\ &= \mathbb{E}\Big(\sum_{t = 0}^{M_k - 1} \sum_{i \in \Omega} \mathbf{1}\{X_t = i,\, X_{t+1} = j\} \;\Big\vert\; X_0 = k\Big) \\ &= \mathbb{E}\Big(\sum_{t = 0}^{M_k - 1} \mathbf{1}\{X_{t+1} = j\} \;\Big\vert\; X_0 = k\Big) \\ &= \mathbb{E}\Big(\sum_{t = 1}^{M_k} \mathbf{1}\{X_{t} = j\} \;\Big\vert\; X_0 = k\Big) \\ &= \nu_j. \end{align*}\]

The first equality uses the Markov property: the event $\{X_t = i,\, t < M_k\}$ is determined by $X_0, \dots, X_t$, so conditional on it the next state equals $j$ with probability $P_{i,j}$; the last equality is the definition of $\nu_j$ (again using $X_0 = X_{M_k} = k$ to shift the summation range).

(b) Next, we normalize $\nu$ to obtain a stationary distribution. We have

\[\begin{align*} \sum_{i \in \Omega} \nu_i &= \sum_{i \in \Omega} \mathbb{E}\Big(\sum_{t = 0}^{M_k - 1} \mathbf{1}\{X_t = i\} \;\Big\vert\; X_0 = k\Big) \\ &= \mathbb{E}\Big(\sum_{t = 0}^{M_k - 1} \sum_{i \in \Omega} \mathbf{1}\{X_t = i\} \;\Big\vert\; X_0 = k\Big) \\ &= \mathbb{E}(M_k \mid X_0 = k) \\ &= \mu_k. \end{align*}\]

Thus, we can define $\pi_i = \nu_i/\mu_k$ (note that $\mu_k < \infty$ for a positive recurrent chain), and $\pi = \{\pi_i, i \in \Omega\}$ is a stationary distribution.

(2) Next, we prove that if a Markov chain is irreducible and positive recurrent, then the stationary distribution is unique and is given by $\pi_j = \frac{1}{\mu_j}$.

Given any stationary distribution $\pi$, if we can show that $\pi_j = \frac{1}{\mu_j}$ for all $j$, then it follows that the stationary distribution is unique.

Recall the recursion for the expected hitting time:

\begin{equation} \eta_{i,j} = 1 + \sum_{k \in \Omega}P_{i,k}\eta_{k,j}, j \ne i \label{eq:1} \end{equation}

We multiply both sides of Eq. \eqref{eq:1} by $\pi_i$ and sum over $i (i \ne j)$ to get

\begin{equation} \sum_{i \ne j} \pi_i \eta_{i,j} = \sum_{i \ne j} \pi_i + \sum_{i \ne j} \sum_{k \in \Omega} \pi_i P_{i,k}\eta_{k,j} \label{eq:2} \end{equation}

Since $\eta_{j,j} = 0$, we can rewrite the above equation as

\begin{equation} \sum_{i \in \Omega} \pi_i \eta_{i,j} = \sum_{i \ne j} \pi_i + \sum_{i \ne j} \sum_{k \in \Omega} \pi_i P_{i,k}\eta_{k,j}. \label{eq:3} \end{equation}

(The sums on the right-hand side still omit the $i = j$ terms; we bring them in next using the expected return time, since our goal is to show $\pi_j = 1/\mu_j$.)

Recall the relation for the expected return time:

\[\mu_{j} = 1 + \sum_{i \in \Omega} P_{j,i}\eta_{i,j}.\]

We multiply both sides of this relation by $\pi_j$ to get

\begin{equation} \pi_j \mu_{j} =\pi_j + \sum_{k \in \Omega} \pi_j P_{j,k}\eta_{k,j} \label{eq:4} \end{equation}

Adding Eq. \eqref{eq:3} and Eq. \eqref{eq:4}, we get

\[\begin{align*} \sum_{i \in \Omega} \pi_i \eta_{i,j} + \pi_j \mu_{j} &= \sum_{i \in \Omega} \pi_i + \sum_{i \in \Omega} \sum_{k \in \Omega} \pi_i P_{i,k}\eta_{k,j} \\ &= 1 + \sum_{k \in \Omega} \sum_{i \in \Omega} \pi_i P_{i,k}\eta_{k,j} \\ &= 1 + \sum_{k \in \Omega} \pi_k \eta_{k,j} \qquad (\text{since} \sum_{i \in \Omega} \pi_i P_{i,k} = \pi_k) \\ \end{align*}\]

Since the Markov chain is irreducible and positive recurrent, all states communicate and the expected return time of every state is finite; one can also show that $\sum_{k \in \Omega} \pi_k \eta_{k,j}$ is finite, which justifies the cancellation below.

Subtracting $\sum_{k \in \Omega} \pi_k \eta_{k,j}$, which equals $\sum_{i \in \Omega} \pi_i \eta_{i,j}$, from both sides of the above equation gives $\pi_j \mu_{j} = 1$, i.e., $\pi_j = 1/\mu_j$. The same argument applies to every state, so $\pi_i = 1/\mu_i$ for all $i \in \Omega$.

Limit theorem

Theorem 2. [Limit theorem] Consider an irreducible, aperiodic Markov chain (possibly infinite). Then $\lim\limits_{t \to \infty} P_{i,j}^{t} = \frac{1}{\mu_{j}}$ for all $i, j$. Specifically, (1) if the Markov chain is positive recurrent, then $\lim\limits_{t \to \infty} P_{i,j}^{t} = \pi_j = \frac{1}{\mu_{j}}$; (2) if the Markov chain is null recurrent or transient, then $\lim\limits_{t \to \infty} P_{i,j}^{t} = 0$, so there is no limiting distribution.

Three conditions are needed for convergence to an equilibrium probability distribution: irreducibility, aperiodicity, and positive recurrence. In matrix form, the limit is

\[\lim_{t \to \infty} P^{t} = \begin{pmatrix} \pi_1 & \pi_2 & \cdots & \pi_N\\ \pi_1 & \pi_2 & \cdots & \pi_N\\ \vdots & \vdots & \ddots & \vdots\\ \pi_1 & \pi_2 & \cdots & \pi_N\\ \end{pmatrix},\]

where each row is identical.
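This convergence is easy to observe numerically: raising the no-claims transition matrix (which is finite, irreducible, and aperiodic, since e.g. $P_{1,1} > 0$) to a moderately large power already gives essentially identical rows. A minimal sketch:

```python
import numpy as np

P = np.array([[1/4, 3/4, 0],
              [1/4, 0, 3/4],
              [0, 1/4, 3/4]])

print(np.linalg.matrix_power(P, 50))  # every row is approximately (1/13, 3/13, 9/13)
```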

Ergodic theorem

Define $V_{i,j}^{t} = \vert\{ n < t : X_n = j\}\vert$ given $X_0 = i$. $V_{i,j}^{t}$ is the number of visits to state $j$ before time $t$ starting from state $i$. Then we can interpret $V_{i,j}^{t}/t$ as the proportion of time up to time $t$ spent in state $j$.

Theorem 3. [Ergodic theorem] Consider an irreducible Markov chain. Then $\lim\limits_{t \to \infty} V_{i,j}^{t}/t = \frac{1}{\mu_{j}}$ almost surely. Specifically, (1) if the Markov chain is positive recurrent, then $\lim\limits_{t \to \infty} V_{i,j}^{t}/t = \pi_j = \frac{1}{\mu_{j}}$ almost surely; (2) if the Markov chain is null recurrent or transient, then $V_{i,j}^{t}/t \to 0$ almost surely for all $j$.

The term almost surely means that the event in question (here, the convergence) occurs with probability 1.
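The ergodic theorem can also be illustrated by simulation (a sketch, again reusing the no-claims chain): the fraction of time a single long trajectory spends in each state approaches the stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[1/4, 3/4, 0],
              [1/4, 0, 3/4],
              [0, 1/4, 3/4]])

T = 100_000
state, counts = 0, np.zeros(3)
for _ in range(T):
    counts[state] += 1                        # visits before time T, as in V_{i,j}^{t}
    state = int(rng.choice(3, p=P[state]))

print(counts / T)  # approaches (1/13, 3/13, 9/13) ~ (0.077, 0.231, 0.692)
```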

Detailed balance condition

Theorem 4. [Detailed balance condition] Consider a finite, irreducible, and ergodic Markov chain with transition matrix $P$. If there are nonnegative numbers $\bar{\pi} = (\pi_0, \pi_1, \dots, \pi_n)$ such that $\sum_{i=0}^{n} \pi_i = 1$ and, for any pair of states $i, j$, $ \pi_i P_{i,j} = \pi_{j} P_{j,i}, $ then $\bar{\pi}$ is the stationary distribution corresponding to $P$.

Proof. We have $ \sum_{i} \pi_i P_{i,j} = \sum_{i}\pi_{j} P_{j,i} = \pi_{j}. $ Thus, $\bar{\pi} = \bar{\pi}P$. Since this is a finite, irreducible, and ergodic Markov chain, $\bar{\pi}$ must be its unique stationary distribution.

Remark: Theorem 4 is a sufficient but not necessary condition.
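Detailed balance is straightforward to verify programmatically: check $\pi_i P_{i,j} = \pi_j P_{j,i}$ for every pair of states. The sketch below (helper name illustrative) applies this to the broken-printer chain, for which $\pi_0 P_{0,1} = \frac{\alpha\beta}{\alpha+\beta} = \pi_1 P_{1,0}$, so the check passes.

```python
import numpy as np

def satisfies_detailed_balance(pi, P, tol=1e-12):
    """Check pi[i] * P[i, j] == pi[j] * P[j, i] for every pair of states."""
    flows = pi[:, None] * P          # flows[i, j] = pi_i * P_{i, j}
    return np.allclose(flows, flows.T, atol=tol)

alpha, beta = 0.3, 0.6
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
pi = np.array([beta, alpha]) / (alpha + beta)
print(satisfies_detailed_balance(pi, P))  # True
```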

Further Reading

We have introduced the basic concepts of Markov chains, including the definition, the classification of states, and theorems on irreducible Markov chains. If you are interested in more details, a standard reference is the book Markov Chains by James Norris.