1 Overview

This project investigates the convergence behavior of Markov chains, with a focus on the relationship between continuous-time Markov chains (CTMCs) and their discrete-time counterparts obtained through uniformization. Using a randomly generated 30-state transition structure, we verify key theoretical properties including generator matrix validity, Doeblin minorization, mixing times, and spectral gap analysis.

2 Part A: Constructing a Valid Generator Matrix

We begin by generating a \(30 \times 30\) generator matrix \(Q\) for a continuous-time Markov chain. Off-diagonal entries are sampled from an exponential distribution, and diagonal entries are set to minus the row sums to ensure each row sums to zero.

set.seed(777)
m <- 30
Q <- matrix(rexp(m * m, rate = 1), nrow = m, ncol = m)
diag(Q) <- 0
diag(Q) <- -rowSums(Q)

off_diag_ok <- all(Q[row(Q) != col(Q)] > 0)
row_sum_ok  <- max(abs(rowSums(Q))) < 1e-10
diag_ok     <- all(diag(Q) < 0)

c(off_diag_ok = off_diag_ok, row_sum_ok = row_sum_ok, diag_ok = diag_ok)

## off_diag_ok  row_sum_ok     diag_ok 
##        TRUE        TRUE        TRUE

All three conditions are satisfied: off-diagonal entries are strictly positive, each row sums to zero (up to numerical precision), and all diagonal entries are strictly negative. Therefore \(Q\) is a valid generator matrix for a CTMC.

3 Part B: Uniformization — Constructing a DTMC from the CTMC

Uniformization converts the CTMC into a discrete-time Markov chain (DTMC) \(Y\) with transition matrix \(P = I + Q/\lambda\), where \(\lambda = \max(-\text{diag}(Q))\).

lambda <- max(-diag(Q))
P <- diag(m) + Q / lambda

min_offdiag_P <- min(P[row(P) != col(P)])
row_sums_ok   <- max(abs(rowSums(P) - 1)) < 1e-10
diag_range_ok <- all(diag(P) >= 0 & diag(P) <= 1)

c(lambda         = lambda,
  min_offdiag_P  = min_offdiag_P,
  row_sums_ok    = as.numeric(row_sums_ok),
  diag_range_ok  = as.numeric(diag_range_ok))

##        lambda min_offdiag_P   row_sums_ok diag_range_ok 
##  4.162252e+01  1.659701e-06  1.000000e+00  1.000000e+00

We chose \(\lambda = \max(-\text{diag}(Q))\), guaranteeing \(\lambda\) is at least as large as every exit rate. Since each row of \(Q\) sums to zero, each row of \(P\) sums to 1. All off-diagonal entries are nonnegative and diagonal entries lie in \([0,1]\), confirming \(P\) is a valid transition matrix.

Relationship between semigroups: The CTMC semigroup is \(P(t) = e^{tQ}\). Under uniformization:

\[P(t) = \sum_{n=0}^{\infty} e^{-\lambda t} \frac{(\lambda t)^n}{n!} P^n\]

This means the continuous-time chain can be viewed as:

A Poisson clock ticking at rate \(\lambda\)
At each Poisson event, the chain transitions according to \(P\)
The state does not change between events

If \(N(t) \sim \text{Poisson}(\lambda t)\), then \(P(t) = \mathbb{E}[P^{N(t)}]\) — the CTMC semigroup is a Poisson mixture of powers of the DTMC transition matrix.

4 Part C: Doeblin Minorization

4.1 One-Step Minorization (Fails)

Because \(\lambda = \max(-\text{diag}(Q))\), there exists a state \(i\) with \(-q_{ii} = \lambda\), meaning \(P_{ii} = 1 + q_{ii}/\lambda = 0\). Therefore \(\min(P) = 0\) and no \(\varepsilon > 0\) works for one-step Doeblin minorization.

min_P   <- min(P)
epsilon <- m * min_P

c(minP = min_P, epsilon = epsilon)

##    minP epsilon 
##       0       0

all(P >= epsilon / m)

## [1] TRUE

One-step minorization fails because at least one diagonal entry of \(P\) equals zero.

4.2 Two-Step Minorization (Succeeds)

For any states \(i\) and \(j\), choose an intermediate state \(k \neq i, j\). Since the state space has size 30, such a \(k\) always exists. Because all off-diagonal entries of \(P\) are strictly positive:

\[(P^2)_{ij} = \sum_{r \in S} P_{ir} P_{rj} \geq P_{ik} P_{kj} > 0\]

So every entry of \(P^2\) is strictly positive. Setting \(\varepsilon = 30 \cdot \min_{i,j}(P^2)_{ij}\) and \(\phi\) uniform on \(S\), we get \((P^2)_{ij} \geq \varepsilon \cdot \phi(j)\) for all \(i, j\).

P2     <- P %*% P
minP2  <- min(P2)
epsilon <- m * minP2

c(minP2 = minP2, epsilon = epsilon)

##       minP2     epsilon 
## 0.008107178 0.243215335

all(P2 >= epsilon / m)

## [1] TRUE

\(Y\) satisfies Doeblin’s minorization condition with \(n_0 = 2\).

5 Part D: Convergence of Distributions — Heatmap Visualization

We track how three different initial distributions evolve under repeated application of \(P\): a point mass (Dirac) at state 1, a uniform distribution, and a randomly generated distribution.

library(ggplot2)

make_dirac    <- function(m, s0) { v <- rep(0, m); v[s0] <- 1; v }
make_uniform  <- function(m) rep(1/m, m)
make_random_nu <- function(m, seed = 123) {
  set.seed(seed); w <- rexp(m, rate = 1); w / sum(w)
}

orbit_nu <- function(nu, P, Nmax = 10) {
  out <- matrix(0, nrow = Nmax + 1, ncol = length(nu))
  out[1, ] <- nu
  cur <- nu
  for (n in 1:Nmax) {
    cur <- as.numeric(cur %*% P)
    out[n + 1, ] <- cur
  }
  out
}

to_long_df <- function(orb, label) {
  Nmax <- nrow(orb) - 1
  m    <- ncol(orb)
  data.frame(
    n     = rep(0:Nmax, each = m),
    state = rep(1:m, times = Nmax + 1),
    prob  = as.vector(t(orb)),
    nu    = label
  )
}

nu1 <- make_dirac(m, s0 = 1)
nu2 <- make_uniform(m)
nu3 <- make_random_nu(m, seed = 777)

Nmax <- 10
orb1 <- orbit_nu(nu1, P, Nmax)
orb2 <- orbit_nu(nu2, P, Nmax)
orb3 <- orbit_nu(nu3, P, Nmax)

df <- rbind(
  to_long_df(orb1, "Dirac at State 1"),
  to_long_df(orb2, "Uniform"),
  to_long_df(orb3, "Random")
)

ggplot(df, aes(x = n, y = state, fill = prob)) +
  geom_tile() +
  facet_wrap(~ nu) +
  scale_fill_viridis_c(option = "plasma") +
  labs(
    title = "Distribution Evolution Under Repeated Application of P",
    x     = "Step (n)",
    y     = "State",
    fill  = "Probability"
  ) +
  theme_minimal(base_size = 13)

Observations:

Dirac at State 1: At \(n = 0\), all mass is concentrated in state 1. By steps 1–2, mass spreads rapidly across all states.
Random: Initially uneven; within a few steps the distribution smooths out and becomes indistinguishable from the others by around \(n = 5\)–7.
Uniform: Already spread out at \(n = 0\); settles into the limiting pattern quickest.

All three initial distributions converge to the same stationary distribution, confirming rapid mixing regardless of starting point.

6 Part E: Mixing Times via Total Variation Distance

We compute the mixing time \(t_{mix}(\varepsilon)\) — the number of steps until the total variation distance to stationarity falls below \(\varepsilon\) — for each initial distribution at three tolerance levels.

stationary_pi <- function(P) {
  eg <- eigen(t(P))
  v  <- Re(eg$vectors[, 1])
  v / sum(v)
}

tv_dist <- function(mu, pi) 0.5 * sum(abs(mu - pi))

mixing_time <- function(nu, P, pi, eps, Nmax = 5000) {
  cur <- nu
  if (tv_dist(cur, pi) <= eps) return(0)
  for (n in 1:Nmax) {
    cur <- as.numeric(cur %*% P)
    if (tv_dist(cur, pi) <= eps) return(n)
  }
  NA_integer_
}

pi_vec   <- stationary_pi(P)
eps_list <- c(0.1, 0.01, 0.001)
nus <- list(
  "Dirac at State 1" = nu1,
  "Random"           = nu3,
  "Uniform"          = nu2
)

out <- expand.grid(nu = names(nus), eps = eps_list, stringsAsFactors = FALSE)
out$tmix <- NA_integer_

for (r in seq_len(nrow(out))) {
  out$tmix[r] <- mixing_time(
    nu   = nus[[out$nu[r]]],
    P    = P,
    pi   = pi_vec,
    eps  = out$eps[r],
    Nmax = 5000
  )
}

out

##                 nu   eps tmix
## 1 Dirac at State 1 0.100    3
## 2           Random 0.100    2
## 3          Uniform 0.100    1
## 4 Dirac at State 1 0.010    6
## 5           Random 0.010    5
## 6          Uniform 0.010    3
## 7 Dirac at State 1 0.001    9
## 8           Random 0.001    8
## 9          Uniform 0.001    6

Summary:

Initial Distribution	\(\varepsilon = 0.1\)	\(\varepsilon = 0.01\)	\(\varepsilon = 0.001\)
Dirac at State 1	3	6	9
Random	2	5	8
Uniform	1	3	6

Mixing time increases as \(\varepsilon\) decreases. The Dirac start mixes slowest (furthest from stationarity at \(n=0\)), the uniform start mixes fastest (already spread out), and the random start lies in between. Even at \(\varepsilon = 0.001\), convergence occurs within 9 steps — confirming extremely rapid mixing.

7 Part F: Expected Reward Convergence

We define a reward function \(f(i) = \mathbf{1}[i \leq 5]\) and track \((P^n f)(i)\) — the expected reward after \(n\) steps starting from state \(i\).

f <- rep(0, m)
f[1:5] <- 1

orbit_f <- function(f, P, Nmax = 10) {
  out <- matrix(0, nrow = Nmax + 1, ncol = length(f))
  out[1, ] <- f
  cur <- f
  for (n in 1:Nmax) {
    cur <- as.numeric(P %*% cur)
    out[n + 1, ] <- cur
  }
  out
}

Fmat <- orbit_f(f, P, Nmax)

df_f <- data.frame(
  n     = rep(0:Nmax, each = m),
  state = rep(1:m, times = Nmax + 1),
  value = as.vector(t(Fmat))
)

ggplot(df_f, aes(x = n, y = state, fill = value)) +
  geom_tile() +
  scale_fill_viridis_c(option = "magma") +
  labs(
    title = "Expected Reward Convergence: P^n f(i) Over Steps",
    x     = "Step (n)",
    y     = "State",
    fill  = "Expected Reward"
  ) +
  theme_minimal(base_size = 13)

pi_f <- sum(pi_vec * f)
cat("Stationary expectation pi(f):", round(pi_f, 6))

## Stationary expectation pi(f): 0.144218

At \(n = 0\), the reward is 1 only for states 1–5. As \(n\) increases, values smooth across all states and converge to approximately 0.1442 — the stationary expectation \(\pi(f)\). By around step 5–7, \((P^n f)(i)\) is nearly constant across all starting states \(i\), visible as uniform columns in the heatmap.

8 Part G: Dobrushin Coefficient and Spectral Gap

dobrushin <- function(P) {
  m       <- nrow(P)
  min_sum <- Inf
  for (i in 1:m) {
    for (j in 1:m) {
      s <- sum(pmin(P[i, ], P[j, ]))
      if (s < min_sum) min_sum <- s
    }
  }
  1 - min_sum
}

delta_P <- dobrushin(P)

eigvals    <- eigen(P)$values
eig_mod    <- sort(Mod(eigvals), decreasing = TRUE)
lambda1    <- eig_mod[1]
lambda2    <- eig_mod[2]
spectral_gap <- 1 - lambda2

c(delta_P      = delta_P,
  lambda1       = lambda1,
  lambda2       = lambda2,
  spectral_gap  = spectral_gap)

##      delta_P      lambda1      lambda2 spectral_gap 
##    0.7978933    1.0000000    0.5151970    0.4848030

Dobrushin coefficient \(\delta(P) = 0.7979 < 1\): guarantees one-step contraction in total variation distance and exponential convergence to stationarity.
Spectral gap \(= 1 - \lambda_2 = 0.4848\): a large spectral gap confirms rapid convergence, consistent with the mixing times computed in Part E (convergence within 9 steps at \(\varepsilon = 0.001\)).

9 Part H: Sparse Generator — A Structural Comparison

We now construct a second generator \(Q_2\) with many zero off-diagonal entries (sparsity ~80%), uniformize it, and compare convergence behavior to the dense case.

set.seed(777)
Q2       <- matrix(rexp(m * m, rate = 1), nrow = m, ncol = m)
keep_prob <- 0.2
mask     <- matrix(runif(m * m) < keep_prob, nrow = m, ncol = m)
diag(mask) <- FALSE
Q2[!mask] <- 0
diag(Q2)  <- 0
diag(Q2)  <- -rowSums(Q2)

offdiag_nonneg <- all(Q2[row(Q2) != col(Q2)] >= 0)
row_sum_ok2    <- max(abs(rowSums(Q2))) < 1e-10
diag_neg       <- all(diag(Q2) <= 0)

c(offdiag_nonneg = offdiag_nonneg,
  row_sum_ok     = as.numeric(row_sum_ok2),
  diag_neg       = as.numeric(diag_neg))

## offdiag_nonneg     row_sum_ok       diag_neg 
##              1              1              1

lambda2_val <- max(-diag(Q2))
P_sparse    <- diag(m) + Q2 / lambda2_val

# Check when P_sparse^k becomes strictly positive
is_all_positive <- function(M) all(M > 0)
Pk <- P_sparse
for (k in 1:10) {
  cat("k =", k, " all positive:", is_all_positive(Pk), "\n")
  Pk <- Pk %*% P_sparse
}

## k = 1  all positive: FALSE 
## k = 2  all positive: FALSE 
## k = 3  all positive: FALSE 
## k = 4  all positive: TRUE 
## k = 5  all positive: TRUE 
## k = 6  all positive: TRUE 
## k = 7  all positive: TRUE 
## k = 8  all positive: TRUE 
## k = 9  all positive: TRUE 
## k = 10  all positive: TRUE

P2 <- P_sparse

nu_dirac <- make_dirac(m, 1)
nu_unif  <- make_uniform(m)
nu_rand  <- make_random_nu(m, 777)

Nmax_orbit <- 30
orb_dirac  <- orbit_nu(nu_dirac, P2, Nmax_orbit)
orb_unif   <- orbit_nu(nu_unif,  P2, Nmax_orbit)
orb_rand   <- orbit_nu(nu_rand,  P2, Nmax_orbit)

df_orb <- rbind(
  to_long_df(orb_dirac, "Dirac at State 1"),
  to_long_df(orb_unif,  "Uniform"),
  to_long_df(orb_rand,  "Random")
)
df_orb$state <- factor(df_orb$state, levels = rev(1:m))

ggplot(df_orb, aes(x = n, y = prob, group = state, color = state)) +
  geom_line(alpha = 0.7) +
  facet_wrap(~ nu, ncol = 1) +
  labs(
    title = "Distribution Evolution Under Sparse Transition Matrix",
    x     = "Step (n)",
    y     = "Probability",
    color = "State"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "right")

# Mixing times for sparse chain
pi2 <- stationary_pi(P2)

mix_tbl <- expand.grid(nu = names(nus), eps = eps_list, stringsAsFactors = FALSE)
mix_tbl$tmix <- NA_integer_

for (r in seq_len(nrow(mix_tbl))) {
  mix_tbl$tmix[r] <- mixing_time(
    nu   = nus[[mix_tbl$nu[r]]],
    P    = P2,
    pi   = pi2,
    eps  = mix_tbl$eps[r],
    Nmax = 20000
  )
}

mix_tbl

##                 nu   eps tmix
## 1 Dirac at State 1 0.100    8
## 2           Random 0.100    7
## 3          Uniform 0.100    5
## 4 Dirac at State 1 0.010   24
## 5           Random 0.010   21
## 6          Uniform 0.010   18
## 7 Dirac at State 1 0.001   42
## 8           Random 0.001   37
## 9          Uniform 0.001   33

# Expected reward under sparse chain
Fmat2 <- orbit_f(f, P2, Nmax = 30)

df_f2 <- data.frame(
  n     = rep(0:30, each = m),
  state = rep(1:m, times = 31),
  value = as.vector(t(Fmat2))
)
df_f2$state <- factor(df_f2$state, levels = rev(1:m))

ggplot(df_f2, aes(x = n, y = state, fill = value)) +
  geom_tile() +
  scale_fill_viridis_c(option = "magma") +
  labs(
    title = "Expected Reward Convergence: Sparse Chain",
    x     = "Step (n)",
    y     = "State",
    fill  = "Expected Reward"
  ) +
  theme_minimal(base_size = 13)

pi_f2 <- sum(pi2 * f)
cat("Stationary expectation pi(f) for sparse chain:", round(pi_f2, 6))

## Stationary expectation pi(f) for sparse chain: 0.082303

# Dobrushin and spectral gap for sparse chain
delta_P2   <- dobrushin(P2)
eigvals2   <- eigen(P2)$values
eig_mod2   <- sort(Mod(eigvals2), decreasing = TRUE)
lambda1_s  <- eig_mod2[1]
lambda2_s  <- eig_mod2[2]
gap2       <- 1 - lambda2_s

c(delta     = delta_P2,
  lambda1   = lambda1_s,
  lambda2   = lambda2_s,
  spectral_gap = gap2)

##        delta      lambda1      lambda2 spectral_gap 
##    1.0000000    1.0000000    0.8624475    0.1375525

The sparse chain requires 4 steps before \(P^k\) becomes strictly positive (compared to 2 for the dense chain), indicating a slower spread of probability mass due to structural bottlenecks. Despite this, the spectral gap of 0.735 is actually larger than the dense chain’s 0.485, reflecting faster asymptotic convergence once communication is established.

10 Part I: Shared Stationary Distribution

Claim: The uniformized DTMC and the original CTMC share the same stationary distribution.

Proof: Let \(\pi\) be a stationary distribution for the CTMC, so \(\pi Q = 0\). The uniformized transition matrix is \(P = I + Q/\lambda\). Then:

\[\pi P = \pi\left(I + \frac{Q}{\lambda}\right) = \pi + \frac{1}{\lambda}\pi Q = \pi + 0 = \pi\]

Therefore \(\pi\) is also stationary for \(P\). \(\square\)

11 Part J: Embedded Chain vs. Uniformized Chain

The embedded chain has transition matrix \(P_e(i,j) = q_{ij}/(-q_{ii})\) for \(i \neq j\). Unlike uniformization, this construction removes holding times and records only the sequence of jump locations.

The key distinction is that the stationary distribution of the CTMC reflects time spent in each state, while the embedded chain reflects jump frequencies. These coincide only when all exit rates \(-q_{ii}\) are equal. In general, using the embedded chain would alter the stationary distribution and therefore change the mixing times, spectral gap, and convergence behavior computed throughout this analysis.

12 Part K: Dobrushin’s Theorem — Theoretical Context

Proposition 8.2.12 defines the Dobrushin ergodicity coefficient as the largest possible total variation distance between any two rows of the transition matrix — measuring how differently the chain can behave depending on the starting state. Theorem 8.2.13 shows that if \(\delta(P) < 1\), the transition operator contracts differences between distributions at each step, geometrically over time. This guarantees exponential convergence to a unique stationary distribution, with the rate controlled by \(\delta(P)\).

In our dense chain, \(\delta(P) = 0.7979 < 1\) confirms this contraction property and is consistent with the rapid mixing times observed empirically.

13 Summary

This analysis demonstrated the theoretical and empirical convergence properties of a uniformized Markov chain:

A valid 30-state CTMC generator was constructed and uniformized to a DTMC, with both sharing the same stationary distribution
One-step Doeblin minorization fails but two-step minorization succeeds, guaranteeing convergence
All three initial distributions converge to stationarity within 9 steps at \(\varepsilon = 0.001\)
The Dobrushin coefficient (0.798) and spectral gap (0.485) both confirm rapid mixing
A sparse chain comparison revealed structural bottlenecks but ultimately faster asymptotic convergence due to a larger spectral gap (0.735)

Markov Chain Convergence Analysis: From Continuous to Discrete Time

Nicholas Heng

2026-04-17