Suppose our HMM has identified 3 states, and we know the system has 3 true regimes. But the HMM labels are arbitrary - “State 1” from the HMM might actually correspond to “Regime 3” in reality.
Goal: Find the best one-to-one matching between HMM states and true regimes.
Let’s create a simple scenario with 3 regimes and 1000 time points.
set.seed(42)
#Number of time points and regimes
T_total <- 1000
K <- 3
#Simulate ground-truth regimes (switching every ~300 time points)
S_true <- c(rep(1, 350), rep(2, 300), rep(3, 350))
#Simulate HMM decoded states
#The HMM is mostly correct, but its labels are scrambled:
# - When true regime is 1, HMM usually says "3"
# - When true regime is 2, HMM usually says "1"
# - When true regime is 3, HMM usually says "2"
#Plus some noise/errors
S_hat <- sapply(S_true, function(s) {
#Scrambled mapping with 85% accuracy
if (s == 1) {
sample(c(3, 1, 2), 1, prob = c(0.85, 0.10, 0.05))
} else if (s == 2) {
sample(c(1, 2, 3), 1, prob = c(0.85, 0.10, 0.05))
} else {
sample(c(2, 1, 3), 1, prob = c(0.85, 0.05, 0.10))
}
})
#Show first 20 values
data.frame(
Time = 1:20,
True_Regime = S_true[1:20],
HMM_State = S_hat[1:20]
)
## Time True_Regime HMM_State
## 1 1 1 1
## 2 2 1 1
## 3 3 1 3
## 4 4 1 3
## 5 5 1 3
## 6 6 1 3
## 7 7 1 3
## 8 8 1 3
## 9 9 1 3
## 10 10 1 3
## 11 11 1 3
## 12 12 1 3
## 13 13 1 1
## 14 14 1 3
## 15 15 1 3
## 16 16 1 1
## 17 17 1 2
## 18 18 1 3
## 19 19 1 3
## 20 20 1 3
The benefit matrix counts how often each HMM state occurred during each true regime.
\[B_{ij} = \sum_{t=1}^{T} \mathbf{1}\{\hat{S}_t = i \text{ and } S_t^{\text{true}} = j\}\]
#Initialize benefit matrix
B <- matrix(0, nrow = K, ncol = K)
rownames(B) <- paste("HMM State", 1:K)
colnames(B) <- paste("True Regime", 1:K)
#Count overlaps
for (t in 1:T_total) {
i <- S_hat[t] #HMM state
j <- S_true[t] #True regime
B[i, j] <- B[i, j] + 1
}
print(B)
## True Regime 1 True Regime 2 True Regime 3
## HMM State 1 39 260 13
## HMM State 2 15 25 295
## HMM State 3 296 15 42
Interpretation: Each cell shows how many time points HMM state \(i\) coincided with true regime \(j\). Higher numbers suggest a better match.
#Heatmap of the benefit matrix
heatmap(B,
Rowv = NA, Colv = NA, #Don't reorder
scale = "none",
col = heat.colors(10, rev = TRUE),
main = "Benefit Matrix B\n(Overlap Counts)")
The Hungarian algorithm minimizes cost. To maximize benefit, we convert:
\[C_{ij} = \max(B) - B_{ij}\]
#Convert to cost matrix
C <- max(B) - B
print(C)
## True Regime 1 True Regime 2 True Regime 3
## HMM State 1 257 36 283
## HMM State 2 281 271 1
## HMM State 3 0 281 254
Now smaller values = better matches.
Let’s walk through the algorithm step by step.
Subtract the minimum of each row from all elements in that row.
C_reduced <- C
#Row reduction
for (i in 1:K) {
C_reduced[i, ] <- C_reduced[i, ] - min(C_reduced[i, ])
}
print(C_reduced)
## True Regime 1 True Regime 2 True Regime 3
## HMM State 1 221 0 247
## HMM State 2 280 270 0
## HMM State 3 0 281 254
Subtract the minimum of each column from all elements in that column.
#Column reduction
for (j in 1:K) {
C_reduced[, j] <- C_reduced[, j] - min(C_reduced[, j])
}
print(C_reduced)
## True Regime 1 True Regime 2 True Regime 3
## HMM State 1 221 0 247
## HMM State 2 280 270 0
## HMM State 3 0 281 254
Now we look for zeros. We need to find K zeros such that no two are in the same row or column.
#Show where the zeros are
zeros_matrix <- (C_reduced == 0)
print(zeros_matrix)
## True Regime 1 True Regime 2 True Regime 3
## HMM State 1 FALSE TRUE FALSE
## HMM State 2 FALSE FALSE TRUE
## HMM State 3 TRUE FALSE FALSE
For small matrices, we can often find the assignment by inspection. For larger matrices, the algorithm has additional steps to create more zeros if needed.
In practice, we use an optimized implementation. The
clue package provides solve_LSAP (Linear Sum
Assignment Problem).
#Install if needed: install.packages("clue")
library(clue)
#solve_LSAP minimizes cost, so we use the cost matrix
#It returns the column index assigned to each row
assignment <- solve_LSAP(C, maximum = FALSE)
#Or equivalently, maximize the benefit matrix directly
assignment_max <- solve_LSAP(B, maximum = TRUE)
print(assignment)
## Optimal assignment:
## 1 => 2, 2 => 3, 3 => 1
Interpretation:
#Create permutation matrix
P <- matrix(0, nrow = K, ncol = K)
rownames(P) <- paste("HMM State", 1:K)
colnames(P) <- paste("True Regime", 1:K)
for (i in 1:K) {
P[i, assignment[i]] <- 1
}
print(P)
## True Regime 1 True Regime 2 True Regime 3
## HMM State 1 0 1 0
## HMM State 2 0 0 1
## HMM State 3 1 0 0
#Total benefit achieved
total_benefit <- sum(P * B)
cat("Total overlap with optimal matching:", total_benefit, "out of", T_total, "time points\n")
## Total overlap with optimal matching: 851 out of 1000 time points
cat("Accuracy:", round(100 * total_benefit / T_total, 1), "%\n")
## Accuracy: 85.1 %
#Compare with a wrong matching (identity: State i -> Regime i)
wrong_benefit <- sum(diag(B))
cat("\nIf we naively matched State i to Regime i:", wrong_benefit, "overlaps\n")
##
## If we naively matched State i to Regime i: 106 overlaps
cat("Accuracy:", round(100 * wrong_benefit / T_total, 1), "%\n")
## Accuracy: 10.6 %
#Create the mapping from HMM state to true regime
mapping <- as.integer(assignment)
names(mapping) <- 1:K
cat("Mapping:\n")
## Mapping:
for (i in 1:K) {
cat(" HMM State", i, "->", "True Regime", mapping[i], "\n")
}
## HMM State 1 -> True Regime 2
## HMM State 2 -> True Regime 3
## HMM State 3 -> True Regime 1
#Relabel the HMM states
S_hat_relabeled <- mapping[S_hat]
#Check accuracy after relabeling
accuracy <- mean(S_hat_relabeled == S_true)
cat("\nClassification accuracy after relabeling:", round(100 * accuracy, 1), "%\n")
##
## Classification accuracy after relabeling: 85.1 %
par(mfrow = c(3, 1), mar = c(4, 4, 2, 1))
#True regimes
plot(1:T_total, S_true, type = "l", col = "black", lwd = 2,
xlab = "Time", ylab = "Regime", main = "Ground Truth Regimes",
ylim = c(0.5, 3.5))
#HMM states (original labels)
plot(1:T_total, S_hat, type = "l", col = "red", lwd = 2,
xlab = "Time", ylab = "State", main = "HMM States (Original Labels)",
ylim = c(0.5, 3.5))
#HMM states (after Hungarian relabeling)
plot(1:T_total, S_hat_relabeled, type = "l", col = "blue", lwd = 2,
xlab = "Time", ylab = "Regime", main = "HMM States (After Hungarian Relabeling)",
ylim = c(0.5, 3.5))
The Hungarian algorithm guarantees the globally optimal matching in \(O(K^3)\) time.