NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Transition or Stochastic Matrices and Discrete Time Markov Chains:


The objective of Markov matrices is to incorporate probability vectors so as to calculate the steady state distribution of proportions based on an initial state.

For example, if we know that a certain percentage of people leave the city for the suburbs every year, and a different proportion of city dwellers move to the suburbs, we can find out the percentage or number of people living in the city and suburbs one year from now:



\[\begin{bmatrix} & \text{City} & \text{Suburbs} \\ \text{City} & 0.88 & \color{red}{0.12}\\ \text{Suburbs} & \color{magenta}{0.05} & 0.95\end{bmatrix} \underbrace{\begin{bmatrix}\text{C}\\\text{S}\end{bmatrix}}_{\text{initial state}}=\underbrace{\begin{bmatrix}\text{C}\\\text{S}\end{bmatrix}}_{\text{state 1 yr later}}\]


ABSORBING MARKOV CHAINS:


From this series of lectures.

An absorbing state in a transition MC is such that when it is entered, it becomes impossible to leave. They are identified as \(1\)’s in the diagonal matrix.

An absorbing MC, like a regular MC, tends to a limiting matrix by calculating powers of the transition matrix, \(P.\) The presence of absorbing states does not guarantee a limiting matrix: for its to happen, the absorbing states have to be able to transition to absorbing states.

For instance, in the matrix

\[\tiny P=\begin{bmatrix} & A & B & C \\ A & 0 & 0 & 1 \\ B & 0 & 1 & 0 \\ C & 1 & 0 & 0 \end{bmatrix}\]

has an absorbing state in \(B\); however, it is impossible to reach from either \(A\) or \(C.\) Note that an absorbing state can be reach indirectly through other states.

A MC is an absorbing chain if 1. There is at least one absorbing state; and 2. It is possible to go from each non-absorbing states to at least one absorbing state in a finite number of steps.

Absorbing MC are organized in standard form with absorbing states in the upper-left block submatrix as an identity matrix.

Consider the following transition diagram:

The standard form will be:

\[\small\begin{align} P=\begin{bmatrix} & \text{A} & \text{non-A}\\ \text{A} & I & 0 \\ \text{non-A}& R & Q \end{bmatrix} = \begin{bmatrix} & \text{B} & \text{A} & \text{C} & \text{D}\\ \text{B} &1 & 0 & 0 & 0 \\ \text{A} &.4 & .6 & 0 & 0 \\ \text{C} &.7 & 0 & 0 & .3 \\ \text{D} &0 & 0 & 1 & 0 \end{bmatrix} \end{align}\]

Another example clarifies the block matrices. If we have a MC transition matrix given as

\[\small P=\begin{align} \begin{bmatrix} & \text{A} & \text{B} & \text{C} & \text{D}\\ \text{A} &0 & .3 & .3 & .4 \\ \text{B} &0 & 1 & 0 & 0 \\ \text{C} &0 & 0 & 1 & 0 \\ \text{D} &.8 & .1 & .1 & 0 \end{bmatrix} \end{align} \]

can be written in standard form as

\[\small\begin{align} P=\begin{bmatrix} & \text{B} & \text{C} & \text{A} & \text{D}\\ \text{B} &1 & 0 & 0 & 0 \\ \text{C} &0 & 1 & 0 & 0 \\ \text{A} &.3 & .3 & 0 & .4 \\ \text{D} &.1 & .1 & .8 & 0 \end{bmatrix} \end{align} \]

The matrix \(R=\begin{bmatrix}.3&.3\\.1&.1\end{bmatrix}\) and the matrix \(Q= \begin{bmatrix}0&.4\\.8&0\end{bmatrix}.\)

The limiting matrix, \(\bar P\) is approached as powers of the standard form of the absorbing MC \(P^k\) when \(k\) increases as

\[\bar P =\begin{bmatrix} I& 0\\ FR & 0 \end{bmatrix}\]

where the block matrix \(FR\) is calculated from the fundamental matrix \(F=(I-Q)^{-1}.\)

The \(R\) matrix is not necessarily square: if there are \(n_a\) absorbing states, and \(n_t\) transient states, the matrix \(R\) will be \(n_t \times n_a,\) and contain the transition probabilities between transient and absorbent states. On the other hand, \(Q\) is square, \(n_t \times n_t,\) containing the transition probabilities between transient states. The submatrix \(I\) will be square \(n_a \times n_a,\) and the matrix of zeroes will be square and \(n_t \times n_t.\)

Properties:
  1. The entry in row \(i\), column \(j\) of \(\bar P\) is the long-run probability of going from state \(i\) to state \(j.\)

  2. The sum of the entries in each row of the fundamental matrix \(F\) is the average number of trials it takes to go from each non-absorbing state to some absorbing state.


Proof:

This is a good resource. The submatrix \(P\) contains the transition probabilities from transient to absorbing states, while \(Q\) contains the transition probabilities from transient to transient states.

Powers of the transition matrix \(P\) approach a limiting matrix with a pattern:

\[\begin{align} P^2 &=\begin{bmatrix}I & 0 \\ R & Q \end{bmatrix}^2= \begin{bmatrix}I & 0 \\ R+QR & Q^2 \end{bmatrix}\\[3ex] P^3 &=\begin{bmatrix}I & 0 \\ R+QR & Q^2 \end{bmatrix}\begin{bmatrix}I & 0 \\ R & Q \end{bmatrix}=\begin{bmatrix}I & 0 \\ R+QR+Q^2R & Q^3 \end{bmatrix}\\[3ex] P^k &=\begin{bmatrix}I & 0 \\ \left(I+Q+Q^2+\cdots+Q^{k-1}\right)R & Q^k \end{bmatrix}\tag 1 \end{align}\]

The key now is that \(Q^k\to 0\) as \(k\to \infty.\)

The fundamental matrix is a geometric series:

\[F= I+Q+Q^2+\cdots=(I-Q)^{-1}\]

and replacing the expression \(I+Q+Q^2+\cdots+Q^{k-1}\) in (1) with \(F:\)

\[\begin{align}P^{\infty}&=\begin{bmatrix}I & 0 \\ FR & 0 \end{bmatrix}\end{align}\]


Example problem:

Suppose a credit union classifies car loans into four categories: paid in full (F); in good standing (G); in arrears (A); or bad debt (B). Each month, \(10\%\) of G accounts pay the loan in full; \(80\%\) remain in G; and \(10\%\) go to A. For accounts in A, \(10\%\) are paid in full, \(40\%\) become G; \(40\%\) stay in A; and \(10\%\) go to B:

We want to know what percentage of accounts in A will default. And how many months the accounts in A will remain so, until they default or paid in full (the two absorbing states).

(P = matrix(c(1,0,0,0,0,1,0,0,.1,0,.8,.1,.1,.1,.4,.4), nrow=4, byrow=T))
##      [,1] [,2] [,3] [,4]
## [1,]  1.0  0.0  0.0  0.0
## [2,]  0.0  1.0  0.0  0.0
## [3,]  0.1  0.0  0.8  0.1
## [4,]  0.1  0.1  0.4  0.4
(R = P[3:4, 1:2])
##      [,1] [,2]
## [1,]  0.1  0.0
## [2,]  0.1  0.1
(Q = P[3:4, 3:4])
##      [,1] [,2]
## [1,]  0.8  0.1
## [2,]  0.4  0.4
(Fund = solve(diag(2) - Q)) 
##      [,1] [,2]
## [1,]  7.5 1.25
## [2,]  5.0 2.50
(FR = Fund %*% R)
##       [,1]  [,2]
## [1,] 0.875 0.125
## [2,] 0.750 0.250

Just reading from the last matrix, \(75\%\) of accounts in arrears (last row) will be paid in full (F), and \(25\%\) will ultimately default. Likewise, of the good (G) accounts \(87\%\) will ultimately be paid in full, and \(13\%\) will default (B).

The average of months the accounts in A will remain in A before reaching an absorbing state (either F or B) will be

rowSums(Fund)[2]
## [1] 7.5

That is, an average of \(7.5\) months until they are paid in full, or go into default.

The interpretation of the entries of the fundamental matrix, Fund, or \(F\) in general is as follows: \(F(i,j)\) is the number of periods (in this case, months) spent in the \(j\) non-absorbing state when the chain started in the \(i\) non-absorbing state. And \(\sum_j F(i,j)\) yields the expected number of period spent in any non-absorbing state, given that we started at state \(i,\) which is the calculation of \(7.5\) months in the example above.

Proof:

Since

\[F = (I- Q)^{-1}= I + Q + Q^2 +\cdots\]

\[F(i,j) = Q^0(i,j) + Q^1(i,j) + Q^2(i,j)+\cdots\]

where \(Q^t(i,j)\) is that the probability that the process that started in the \(i\) non-absorbing state will be in the \(j\) non-absorbing state at time \(t.\) Or intepreted differently, the fraction of period \(t\) spent in state \(j.\) Therefore, the sum over all periods is the expected periods spent in \(j.\)


Finally, going back to the transition diagram above, all initial states will be unavoidably absorbed into B:

(P = matrix(c(1,0,0,0,.4,.6,0,0,.7,0,0,.3,0,0,1,0), nrow=4, byrow=T))
##      [,1] [,2] [,3] [,4]
## [1,]  1.0  0.0    0  0.0
## [2,]  0.4  0.6    0  0.0
## [3,]  0.7  0.0    0  0.3
## [4,]  0.0  0.0    1  0.0
R = P[2:4, 1, drop=F]
Q = P[2:4, 2:4]
Fund = solve(diag(3) - Q)
(FR = Fund %*% R)
##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1

We can see the absorbsion process in play if we focus back on the \(P\) matrix of the insurance company example:

(P = matrix(c(1,0,0,0,0,1,0,0,.1,0,.8,.1,.1,.1,.4,.4), nrow=4, byrow=T))
##      [,1] [,2] [,3] [,4]
## [1,]  1.0  0.0  0.0  0.0
## [2,]  0.0  1.0  0.0  0.0
## [3,]  0.1  0.0  0.8  0.1
## [4,]  0.1  0.1  0.4  0.4

After the second month…

P%*%P
##      [,1] [,2] [,3] [,4]
## [1,] 1.00 0.00 0.00 0.00
## [2,] 0.00 1.00 0.00 0.00
## [3,] 0.19 0.01 0.68 0.12
## [4,] 0.18 0.14 0.48 0.20

there’ll be a lot of accounts in the transient states. But after six years there is no account in the transition states, having all been absorbed (notice the empty lower-right corner block matrix:

require(expm)
round(P %^% 75, 3)
##       [,1]  [,2] [,3] [,4]
## [1,] 1.000 0.000    0    0
## [2,] 0.000 1.000    0    0
## [3,] 0.875 0.125    0    0
## [4,] 0.750 0.250    0    0

Markov Chains and Ergodicity:

From this post on Twitter by @normonics.

There were Markov chains without absorbing states that are non-ergodic, such as a simple random walk.


Conditions for ergodicity:

From this presentation.

Irreducibility:

From any state it is possible to get to any state.

In a Markov transition matrix \(P\) with \(N\) states all the entries in \[\sum_{n=0}^{N-1} P^n\]

must be \(>0\) for the matrix to be irreducible. Why? Because for any given \(P^n\) the enty \(ij\) is the probability of reaching state \(j\) from state \(i\) in \(n\) steps. The summation of these powers will tell us that if the entries are greater that zero at least in one number of the steps \(n\) there is a a path (i.e. probability of greater than zero of that transition happening in \(n\) steps). We want this to be true for all states.

If there is a possibility of going from one state to another in a system with \(N\) states, that will occur in a maximum of \(N-1\) steps.

Aperiodicity:

The paths from one state to itself \(i \to i=ii\) are not only multiples of a period \(T \geq 2.\) It can be checked by finding a loop in the transition diagram of \(k\) steps, and another loop of \(k+1\) steps. Once is enough if irreducible. With transition matrices: Find \(k\) and \(i\) such that \[P^k_{ii}P^{k+1}_{ii}>0.\]

This means that there is a probability greater than zero of going from \(i \to i\) in \(k\) steps, and also positive in \(k+1\) steps.

We check the diagonal matrix entries in two consecutive powers.

The idea is that we want to reach a steady-state vector of strictly positive probabilities independent of where we start.

Positive recurrence:

If I am at a given state the expected time before returning to that state is finite. If the matrix is irreducible and the number of states is finite, it is automatically fulfilled.


Example:

Non-ergodic periodic chain:

library(expm)
# Example 1 -> 2 -> 3 -> 1
mat <- matrix(c(0,1,0,0,0,1,1,0,0),ncol=3,byrow=T)
mat
##      [,1] [,2] [,3]
## [1,]    0    1    0
## [2,]    0    0    1
## [3,]    1    0    0

A state \(i\) is absorbing if once the system reaches state \(i,\) it stays in that state; that is, \(\Pr_{ii}=1\). There are no \(1\)’s in the diagonal in this matrix.

Checking irreducibility:

# Irreducible:
M <- matrix(rep(0,9),ncol=3)
for (i in 0:2) M <- M + mat %^% i
M
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
## [3,]    1    1    1

It is irreducible because all values are positive.

But it is not aperiodic because there are no two consecutive powers with any one diagonal entry positive in those consecutive powers:

# Aperiodic:
# Example: 1 -> 2 -> 3 -> 1
# If we want to move from 1 to itself we can do it in 3 steps or 6 or 9 or 12... So it's periodic with T = 3.
mat %^% 1
##      [,1] [,2] [,3]
## [1,]    0    1    0
## [2,]    0    0    1
## [3,]    1    0    0
mat %^% 2
##      [,1] [,2] [,3]
## [1,]    0    0    1
## [2,]    1    0    0
## [3,]    0    1    0
mat %^% 3
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
mat %^% 4
##      [,1] [,2] [,3]
## [1,]    0    1    0
## [2,]    0    0    1
## [3,]    1    0    0
# P to the power of 3 is the identity. Same for any P to the power of 1 + 3k or 2 + 3k or 3 + 3k.

This matrix is non-ergotic.


Non-ergodic chain with no absorbing state:

trans_mat <- matrix(c(0,1/2,0,1/2,0,
                      0,1/2,1/2,0,0,
                      0,1/2,1/2,0,0,
                      0,0,0,1/2,1/2,
                      0,0,0,1/2,1/2),ncol=5,byrow=T)
N = 100
trans_mat %^% N
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0 0.25 0.25 0.25 0.25
## [2,]    0 0.50 0.50 0.00 0.00
## [3,]    0 0.50 0.50 0.00 0.00
## [4,]    0 0.00 0.00 0.50 0.50
## [5,]    0 0.00 0.00 0.50 0.50

We try to find an absorbing state (i.e. a \(1\) in any entry) with a very high number of steps \(N = 100,\) and it doesn’t happen: no absorbing states.

# Check lack of irreducibility... number of states is 5:

M <- matrix(rep(0,25),ncol=5)
for (i in 0:4) M <- M + trans_mat %^% i
M
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1 1.25 0.75 1.25 0.75
## [2,]    0 3.00 2.00 0.00 0.00
## [3,]    0 2.00 3.00 0.00 0.00
## [4,]    0 0.00 0.00 3.00 2.00
## [5,]    0 0.00 0.00 2.00 3.00

Not all states are greater than \(0\) and therefore, the chain is not irreducible (hence, non-ergotic).


Non-ergodic chain with no absorbing state and each state with at least one incoming state:

trans_mat <- matrix(c(0,1/2,0,1/2,0,
                      0,1/2,1/2,0,0,
                      0,1/2,1/2,0,0,
                      0,0,0,1/2,1/2,
                      1/10,0,0,4/10,1/2),ncol=5,byrow=T)
N = 100
trans_mat %^% N
##             [,1]      [,2]      [,3]       [,4]       [,5]
## [1,] 0.002145419 0.4790818 0.4779816 0.01987296 0.02091818
## [2,] 0.000000000 0.5000000 0.5000000 0.00000000 0.00000000
## [3,] 0.000000000 0.5000000 0.5000000 0.00000000 0.00000000
## [4,] 0.004183637 0.4592089 0.4570634 0.03875293 0.04079114
## [5,] 0.003974592 0.4612471 0.4592089 0.03681655 0.03875293

We try to find an absorbing state (i.e. a \(1\) in any entry) with a very high number of steps \(N = 100,\) and it doesn’t happen: no absorbing states.

Checking irreducibility:

M <- matrix(rep(0,25),ncol=5)
for (i in 0:4) M <- M + trans_mat %^% i
M
##        [,1]   [,2]   [,3]   [,4]   [,5]
## [1,] 1.0500 1.2625 0.7500 1.2000 0.7375
## [2,] 0.0000 3.0000 2.0000 0.0000 0.0000
## [3,] 0.0000 2.0000 3.0000 0.0000 0.0000
## [4,] 0.1475 0.0625 0.0125 2.8400 1.9375
## [5,] 0.2400 0.1600 0.0625 1.6975 2.8400

The matrix is not irreducible.


Ergodic chain:

trans_mat <- matrix(c(0,1/2,0,1/2,0,
                      0,1/2,1/2,0,0,
                      1/10,4/10,5/10,0,0,
                      0,0,0,1/2,1/2,
                      1/10,0,0,4/10,1/2),ncol=5,byrow=T)
N = 100
trans_mat %^% N
##            [,1]      [,2]      [,3]      [,4]      [,5]
## [1,] 0.04761905 0.2380952 0.2380952 0.2380952 0.2380952
## [2,] 0.04761905 0.2391986 0.2393289 0.2369918 0.2368616
## [3,] 0.04761905 0.2390821 0.2391986 0.2371083 0.2369918
## [4,] 0.04761905 0.2369918 0.2368616 0.2391986 0.2393289
## [5,] 0.04761905 0.2371083 0.2369918 0.2390821 0.2391986

Checking irreducibility:

M <- matrix(rep(0,25),ncol=5)
for (i in 0:4) M <- M + trans_mat %^% i
M
##        [,1]   [,2]   [,3]   [,4]   [,5]
## [1,] 1.1000 1.2125 0.7375 1.2125 0.7375
## [2,] 0.1475 2.8400 1.9375 0.0625 0.0125
## [3,] 0.2425 1.6975 2.8400 0.1575 0.0625
## [4,] 0.1475 0.0625 0.0125 2.8400 1.9375
## [5,] 0.2425 0.1575 0.0625 1.6975 2.8400

Checking aperiodicity:

trans_mat %^% 3
##       [,1]  [,2]  [,3]  [,4]  [,5]
## [1,] 0.050 0.225 0.250 0.225 0.250
## [2,] 0.050 0.450 0.475 0.025 0.000
## [3,] 0.045 0.430 0.450 0.050 0.025
## [4,] 0.050 0.025 0.000 0.450 0.475
## [5,] 0.045 0.050 0.025 0.430 0.450
trans_mat %^% 4
##        [,1]   [,2]   [,3]   [,4]   [,5]
## [1,] 0.0500 0.2375 0.2375 0.2375 0.2375
## [2,] 0.0475 0.4400 0.4625 0.0375 0.0125
## [3,] 0.0475 0.4175 0.4400 0.0575 0.0375
## [4,] 0.0475 0.0375 0.0125 0.4400 0.4625
## [5,] 0.0475 0.0575 0.0375 0.4175 0.4400

There are multiple diagonal terms positive in both consecutive powers.


Home Page

NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.