The Problem

Suppose our HMM has identified 3 states, and we know the system has 3 true regimes. But the HMM labels are arbitrary - “State 1” from the HMM might actually correspond to “Regime 3” in reality.

Goal: Find the best one-to-one matching between HMM states and true regimes.


Step 1: Simulate Some Data

Let’s create a simple scenario with 3 regimes and 1000 time points.

set.seed(42)

#Number of time points and regimes
T_total <- 1000
K <- 3

#Simulate ground-truth regimes (switching every ~300 time points)
S_true <- c(rep(1, 350), rep(2, 300), rep(3, 350))

#Simulate HMM decoded states
#The HMM is mostly correct, but its labels are scrambled:
#  - When true regime is 1, HMM usually says "3"
#  - When true regime is 2, HMM usually says "1"
#  - When true regime is 3, HMM usually says "2"
#Plus some noise/errors

S_hat <- sapply(S_true, function(s) {
  #Scrambled mapping with 85% accuracy
  if (s == 1) {
    sample(c(3, 1, 2), 1, prob = c(0.85, 0.10, 0.05))
  } else if (s == 2) {
    sample(c(1, 2, 3), 1, prob = c(0.85, 0.10, 0.05))
  } else {
    sample(c(2, 1, 3), 1, prob = c(0.85, 0.05, 0.10))
  }
})

#Show first 20 values
data.frame(
  Time = 1:20,
  True_Regime = S_true[1:20],
  HMM_State = S_hat[1:20]
)
##    Time True_Regime HMM_State
## 1     1           1         1
## 2     2           1         1
## 3     3           1         3
## 4     4           1         3
## 5     5           1         3
## 6     6           1         3
## 7     7           1         3
## 8     8           1         3
## 9     9           1         3
## 10   10           1         3
## 11   11           1         3
## 12   12           1         3
## 13   13           1         1
## 14   14           1         3
## 15   15           1         3
## 16   16           1         1
## 17   17           1         2
## 18   18           1         3
## 19   19           1         3
## 20   20           1         3

Step 2: Build the Benefit Matrix B

The benefit matrix counts how often each HMM state occurred during each true regime.

\[B_{ij} = \sum_{t=1}^{T} \mathbf{1}\{\hat{S}_t = i \text{ and } S_t^{\text{true}} = j\}\]

#Initialize benefit matrix
B <- matrix(0, nrow = K, ncol = K)
rownames(B) <- paste("HMM State", 1:K)
colnames(B) <- paste("True Regime", 1:K)

#Count overlaps
for (t in 1:T_total) {
  i <- S_hat[t]      #HMM state
  j <- S_true[t]     #True regime
  B[i, j] <- B[i, j] + 1
}

print(B)
##             True Regime 1 True Regime 2 True Regime 3
## HMM State 1            39           260            13
## HMM State 2            15            25           295
## HMM State 3           296            15            42

Interpretation: Each cell shows how many time points HMM state \(i\) coincided with true regime \(j\). Higher numbers suggest a better match.


Step 3: Visualize the Benefit Matrix

#Heatmap of the benefit matrix
heatmap(B,
        Rowv = NA, Colv = NA,  #Don't reorder
        scale = "none",
        col = heat.colors(10, rev = TRUE),
        main = "Benefit Matrix B\n(Overlap Counts)")


Step 4: Convert to Cost Matrix

The Hungarian algorithm minimizes cost. To maximize benefit, we convert:

\[C_{ij} = \max(B) - B_{ij}\]

#Convert to cost matrix
C <- max(B) - B

print(C)
##             True Regime 1 True Regime 2 True Regime 3
## HMM State 1           257            36           283
## HMM State 2           281           271             1
## HMM State 3             0           281           254

Now smaller values = better matches.


Step 5: Hungarian Algorithm - Manual Walkthrough

Let’s walk through the algorithm step by step.

Step 5a: Row Reduction

Subtract the minimum of each row from all elements in that row.

C_reduced <- C

#Row reduction
for (i in 1:K) {
  C_reduced[i, ] <- C_reduced[i, ] - min(C_reduced[i, ])
}

print(C_reduced)
##             True Regime 1 True Regime 2 True Regime 3
## HMM State 1           221             0           247
## HMM State 2           280           270             0
## HMM State 3             0           281           254

Step 5b: Column Reduction

Subtract the minimum of each column from all elements in that column.

#Column reduction
for (j in 1:K) {
  C_reduced[, j] <- C_reduced[, j] - min(C_reduced[, j])
}

print(C_reduced)
##             True Regime 1 True Regime 2 True Regime 3
## HMM State 1           221             0           247
## HMM State 2           280           270             0
## HMM State 3             0           281           254

Step 5c: Find Optimal Assignment

Now we look for zeros. We need to find K zeros such that no two are in the same row or column.

#Show where the zeros are
zeros_matrix <- (C_reduced == 0)
print(zeros_matrix)
##             True Regime 1 True Regime 2 True Regime 3
## HMM State 1         FALSE          TRUE         FALSE
## HMM State 2         FALSE         FALSE          TRUE
## HMM State 3          TRUE         FALSE         FALSE

For small matrices, we can often find the assignment by inspection. For larger matrices, the algorithm has additional steps to create more zeros if needed.


Step 6: Use R’s Implementation

In practice, we use an optimized implementation. The clue package provides solve_LSAP (Linear Sum Assignment Problem).

#Install if needed: install.packages("clue")
library(clue)

#solve_LSAP minimizes cost, so we use the cost matrix
#It returns the column index assigned to each row
assignment <- solve_LSAP(C, maximum = FALSE)

#Or equivalently, maximize the benefit matrix directly
assignment_max <- solve_LSAP(B, maximum = TRUE)

print(assignment)
## Optimal assignment:
## 1 => 2, 2 => 3, 3 => 1

Interpretation:


Step 7: Build the Permutation Matrix P

#Create permutation matrix
P <- matrix(0, nrow = K, ncol = K)
rownames(P) <- paste("HMM State", 1:K)
colnames(P) <- paste("True Regime", 1:K)

for (i in 1:K) {
  P[i, assignment[i]] <- 1
}

print(P)
##             True Regime 1 True Regime 2 True Regime 3
## HMM State 1             0             1             0
## HMM State 2             0             0             1
## HMM State 3             1             0             0

Step 8: Verify the Solution

#Total benefit achieved
total_benefit <- sum(P * B)
cat("Total overlap with optimal matching:", total_benefit, "out of", T_total, "time points\n")
## Total overlap with optimal matching: 851 out of 1000 time points
cat("Accuracy:", round(100 * total_benefit / T_total, 1), "%\n")
## Accuracy: 85.1 %
#Compare with a wrong matching (identity: State i -> Regime i)
wrong_benefit <- sum(diag(B))
cat("\nIf we naively matched State i to Regime i:", wrong_benefit, "overlaps\n")
## 
## If we naively matched State i to Regime i: 106 overlaps
cat("Accuracy:", round(100 * wrong_benefit / T_total, 1), "%\n")
## Accuracy: 10.6 %

Step 9: Apply the Mapping to Relabel HMM States

#Create the mapping from HMM state to true regime
mapping <- as.integer(assignment)
names(mapping) <- 1:K

cat("Mapping:\n")
## Mapping:
for (i in 1:K) {
  cat("  HMM State", i, "->", "True Regime", mapping[i], "\n")
}
##   HMM State 1 -> True Regime 2 
##   HMM State 2 -> True Regime 3 
##   HMM State 3 -> True Regime 1
#Relabel the HMM states
S_hat_relabeled <- mapping[S_hat]

#Check accuracy after relabeling
accuracy <- mean(S_hat_relabeled == S_true)
cat("\nClassification accuracy after relabeling:", round(100 * accuracy, 1), "%\n")
## 
## Classification accuracy after relabeling: 85.1 %

Step 10: Visualize the Time Series

par(mfrow = c(3, 1), mar = c(4, 4, 2, 1))

#True regimes
plot(1:T_total, S_true, type = "l", col = "black", lwd = 2,
     xlab = "Time", ylab = "Regime", main = "Ground Truth Regimes",
     ylim = c(0.5, 3.5))

#HMM states (original labels)
plot(1:T_total, S_hat, type = "l", col = "red", lwd = 2,
     xlab = "Time", ylab = "State", main = "HMM States (Original Labels)",
     ylim = c(0.5, 3.5))

#HMM states (after Hungarian relabeling)
plot(1:T_total, S_hat_relabeled, type = "l", col = "blue", lwd = 2,
     xlab = "Time", ylab = "Regime", main = "HMM States (After Hungarian Relabeling)",
     ylim = c(0.5, 3.5))


Summary

  1. Build Benefit Matrix B: Count overlaps between HMM states and true regimes
  2. Convert to Cost Matrix C: \(C = \max(B) - B\)
  3. Apply Hungarian Algorithm: Find optimal one-to-one matching
  4. Relabel HMM States: Use the mapping to align with true regime labels

The Hungarian algorithm guarantees the globally optimal matching in \(O(K^3)\) time.