Andrey Markov
1856 - 1922
Andrey Markov was a Russian mathematician, that through his exploration of the weak law of large numbers, fathered what are now known as Markov Models.
A Markov Model is a stochastic model used to examine randomly changing systems. In these systems, it is assumed that future states depend only on the current state, not on the events that occurred before it, thus exhibiting the Markov Property.
There are four common Markov Models:
The simplest way to illustrate the Markov Property is with the first common Markov Model: Markov Chains
Formally a Markov Chain is Specified by:
Markov Chains Satisfy the Following Properties:
What is the forecast for the day after tomorrow?
\(A * \pi = \left[\begin{array}{rrr}0.8 & 0.2 & 0.3 \\0.1 & 0.6 & 0.2 \\0.1 & 0.2 & 0.5\end{array}\right]*\left[\begin{array} 00.75 \\ 0.2 \\0.05\end{array}\right] = \left[\begin{array}{rrr}0.655 \\ 0.205 \\ 0.140 \end{array}\right]\)
\(A = \begin{bmatrix}a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}\)
\(B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\b_{31} & b_{32} \end{bmatrix}\)
Data Used in Following Code Examples
#---set up: Transition Prob, Emission Prob, & Init Dist matrices---
a = matrix(c(0.54, 0.49, 0.46, 0.51),nrow = 2,ncol = 2)
b = matrix(c(0.16, 0.25, 0.26, 0.28, 0.58, 0.47),nrow = 2,ncol = 3)
initial_distribution = c(1/2, 1/2)
a
## [,1] [,2]
## [1,] 0.54 0.46
## [2,] 0.49 0.51
b
## [,1] [,2] [,3]
## [1,] 0.16 0.26 0.58
## [2,] 0.25 0.28 0.47
initial_distribution
## [1] 0.5 0.5
Forward Algorithm R Code:
data = read.csv("hmm_ex_data_r.csv")
#---set up: Transition Prob, Emission Prob, & Init Dist matrices---
a = matrix(c(0.54, 0.49, 0.46, 0.51),nrow = 2,ncol = 2)
b = matrix(c(0.16, 0.25, 0.26, 0.28, 0.58, 0.47),nrow = 2,ncol = 3)
initial_distribution = c(1/2, 1/2)
#---function to implement forward algorithm-------
forward = function(v, a, b, initial_distribution){
T = length(v) # length of time series
m = nrow(a) # no. hidden states
alpha = matrix(0, T, m) # set up alpha matrix
alpha[1, ] = initial_distribution*b[, v[1]] # initial value
for(t in 2:T){ # calculate remaining values
tmp = alpha[t-1, ] %*% a
alpha[t, ] = tmp * b[, v[t]]
}
return(alpha)
}
#---call forward algorithm-----------------------
alpha = forward(data$Visible,a,b,initial_distribution)
print("alpha:")
## [1] "alpha:"
head(alpha)
## [,1] [,2]
## [1,] 0.080000000 0.125000000
## [2,] 0.027157000 0.028154000
## [3,] 0.016506939 0.012619857
## [4,] 0.008756537 0.006593780
## [5,] 0.004616500 0.003473692
## [6,] 0.002433111 0.001830731
tail(alpha)
## [,1] [,2]
## [495,] 5.123287e-220 5.131635e-220
## [496,] 1.373080e-220 1.392677e-220
## [497,] 8.258473e-221 6.306845e-221
## [498,] 4.378959e-221 3.297233e-221
## [499,] 1.034873e-221 1.034855e-221
## [500,] 6.182280e-222 4.717943e-222
Backwards Algorithm R Code:
data = read.csv("hmm_ex_data_r.csv")
#---set up: Transition Prob & Emission Prob matrices------
a = matrix(c(0.54, 0.49, 0.46, 0.51),nrow = 2,ncol = 2)
b = matrix(c(0.16, 0.25, 0.26, 0.28, 0.58, 0.47),nrow = 2,ncol = 3)
#---BACKWARD Algorithm Function---------------------------
backward = function(V, A, B){
T = length(V)
m = nrow(A)
beta = matrix(1, T, m)
for(t in (T-1):1){
tmp = as.matrix(beta[t+1, ] * B[, V[t+1]])
beta[t, ] = t(A %*% tmp)
}
return(beta)
}
#---call Backward Algorithm Function----------------------
beta = backward(data$Visible,a,b)
head(beta)
## [,1] [,2]
## [1,] 5.306946e-221 5.323733e-221
## [2,] 1.981733e-220 1.960087e-220
## [3,] 3.760130e-220 3.719059e-220
## [4,] 7.134450e-220 7.056523e-220
## [5,] 1.353687e-219 1.338901e-219
## [6,] 2.568478e-219 2.540423e-219
tail(beta)
## [,1] [,2]
## [495,] 0.01061245 0.01064604
## [496,] 0.03962865 0.03919713
## [497,] 0.07516995 0.07440065
## [498,] 0.14180608 0.14225848
## [499,] 0.52940000 0.52390000
## [500,] 1.00000000 1.00000000
Baum Welch Algorithm R Code:
forward = function(v, a, b, initial_distribution){
T = length(v)
M = nrow(a)
alpha = matrix(0, T, M)
alpha[1, ] = initial_distribution*b[, v[1]]
for(t in 2:T){
tmp = alpha[t-1, ] %*% a
alpha[t, ] = tmp * b[, v[t]]}
return(alpha)}
backward = function(v, a, b){
T = length(v)
M = nrow(a)
beta = matrix(1, T, M)
for(t in (T-1):1){
tmp = as.matrix(beta[t+1, ] * b[, v[t+1]])
beta[t, ] = t(a %*% tmp)}
return(beta)}
BaumWelch = function(v, a, b, initial_distribution, n.iter = 100){
for(i in 1:n.iter){
T = length(v)
M = nrow(a)
K=ncol(b)
alpha = forward(v, a, b, initial_distribution)
beta = backward(v, a, b)
xi = array(0, dim=c(M, M, T-1))
for(t in 1:T-1){
denominator = ((alpha[t,] %*% a) * b[,v[t+1]]) %*% matrix(beta[t+1,])
for(s in 1:M){
numerator = alpha[t,s] * a[s,] * b[,v[t+1]] * beta[t+1,]
xi[s,,t]=numerator/as.vector(denominator)
}
}
xi.all.t = rowSums(xi, dims = 2)
a = xi.all.t/rowSums(xi.all.t)
gamma = apply(xi, c(1, 3), sum)
gamma = cbind(gamma, colSums(xi[, , T-1]))
for(l in 1:K){
b[, l] = rowSums(gamma[, which(v==l)])
}
b = b/rowSums(b)
}
return(list(a = a, b = b, initial_distribution = initial_distribution))
}
data = read.csv("hmm_ex_data_r.csv")
M=2; K=3
A = matrix(1, M, M)
A = A/rowSums(A)
B = matrix(1:6, M, K)
B = B/rowSums(B)
initial_distribution = c(1/2, 1/2)
(myout = BaumWelch(data$Visible, A, B, initial_distribution, n.iter = 100))
## $a
## [,1] [,2]
## [1,] 0.5381634 0.4618366
## [2,] 0.4866444 0.5133556
##
## $b
## [,1] [,2] [,3]
## [1,] 0.1627751 0.2625807 0.5746441
## [2,] 0.2514996 0.2778097 0.4706907
##
## $initial_distribution
## [1] 0.5 0.5
Checking output against hmm() R package results:
library(HMM)
hmm =initHMM(c("A", "B"), c(1, 2, 3),
startProbs = initial_distribution,
transProbs = A, emissionProbs = B)
true.out = baumWelch(hmm, data$Visible, maxIterations=100, pseudoCount=0)
true.out$hmm
## $States
## [1] "A" "B"
##
## $Symbols
## [1] 1 2 3
##
## $startProbs
## A B
## 0.5 0.5
##
## $transProbs
## to
## from A B
## A 0.5381634 0.4618366
## B 0.4866444 0.5133556
##
## $emissionProbs
## symbols
## states 1 2 3
## A 0.1627751 0.2625807 0.5746441
## B 0.2514996 0.2778097 0.4706907
The Baum Welch is used to estimate Transition and Emission Matrices. The next step involving Viterbi algoithn is used to make state predictions from testing data.
Viterbi with simulated data example
library(HMM)
# Initialise HMM
hmm = initHMM(c("A","B"), c("L","R"), transProbs=matrix(c(.6,.4,.4,.6),2),
emissionProbs=matrix(c(.6,.4,.4,.6),2))
hmm
## $States
## [1] "A" "B"
##
## $Symbols
## [1] "L" "R"
##
## $startProbs
## A B
## 0.5 0.5
##
## $transProbs
## to
## from A B
## A 0.6 0.4
## B 0.4 0.6
##
## $emissionProbs
## symbols
## states L R
## A 0.6 0.4
## B 0.4 0.6
# Sequence of observations
observations = c("L","L","R","R")
# Calculate Viterbi path
vp = viterbi(hmm,observations)
vp
## [1] "A" "A" "B" "B"
Explore a classic example of HMM of an imaginary dishonest casino and demo the aphid Hidden Markov Package
2 dice, one fair and one weighted with probability of rolling 6 at .5 and .1 for rest
# set states
states <- c("Begin", "Fair", "Loaded")
# dice results 1 through 6
residues <- paste(1:6)
### Define transition probability matrix A
A <- matrix(c(0, 0, 0, 0.99, 0.95, 0.1, 0.01, 0.05, 0.9), nrow = 3)
dimnames(A) <- list(from = states, to = states)
### Define emission probability matrix E
E <- matrix(c(rep(1/6, 6), rep(1/10, 5), 1/2), nrow = 2, byrow = TRUE)
dimnames(E) <- list(states = states[-1], residues = residues)
The plot.HMM method depicts the transition probabilities as weighted lines, and emission probabilities as horizontal bars.
### Create the HMM object
x <- structure(list(A = A, E = E), class = "HMM")
### Plot the model
plot(x, textexp = 1.5)
### Add the transition probabilities as text
text(x = 0.02, y = 0.5, labels = "0.95")
text(x = 0.51, y = 0.5, labels = "0.90")
text(x = 0.5, y = 0.9, labels = "0.05")
text(x = 0.5, y = 0.1, labels = "0.10")
6’s appearing at a higher rate suggesting the use of the loaded dice
# Load the casino dataset
data(casino)
rolls <- as.numeric(casino)
# roll results
qplot(rolls, ylab = 'count',main = 'Dice Roll Results')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The viterbi algorithm allows us to discover the most likely sequence of hidden states (when the dice switched). Comparison below.
# The actual path is stored in the names attribute of the sequence
actual <- c("F", "L")[match(names(casino), c("Fair", "Loaded"))]
### Find the predicted path
vit1 <- Viterbi(x, casino)
predicted <- c("F", "L")[vit1$path + 1]
actual[1:50]
## [1] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F"
## [18] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F"
## [35] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "L" "L" "L" "L" "L"
predicted[1:50]
## [1] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F"
## [18] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F"
## [35] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "L" "L"
We can also calculate the full and posterior probabilities of the sequence given the model using the forward and backward algorithms:
casino.post <- posterior(x, casino)
plot(1:300, seq(0, 1, length.out = 300), type = "n", xlab = "Roll number",
ylab = "Posterior probability of dice being fair", main = 'Fairness Probabilities')
starts <- which(c("L", actual) == "F" & c(actual, "F") == "L")
ends <- which(c("F", actual) == "L" & c(actual, "L") == "F") - 1
for(i in 1:6) rect(starts[i], 0, ends[i], 1, col = "grey", border = NA)
lines(1:300, casino.post[1, ])
Data
The data used has been created from two datasets:
Google mobility data: https://www.google.com/covid19/mobility/
NOAA climate data from Chicago O’Hare: https://www.ncdc.noaa.gov/
The dataset consists of 90 days of observed weather and park mobility from March 3, 2020 to June 1, 2020. These date were chosen based on the availability of the Google’s mobility data.
weather_mob <- read.csv("weather_mobility.csv")
#pull out spring dates only
spring <- weather_mob[1:90,]
States
Define 3 states:
Clear (below 53 degrees and no precipitation)
Rainy (below 53 degrees with precipitation)
Warm (53 degrees or above)
Add the states to the dataset.
#find rainy days
spring$rain <- ifelse(spring$prcp > 0, 1, 0)
#classify each observation as a specific state: rain, clear, or warm
spring$markov_states <- ifelse(spring$rain == 1 & spring$tmax < 53, "rain",
ifelse(spring$rain == 0 & spring$tmax < 53, "clear", "warm"))
Observations
Park mobility observations are taken from the Google Mobility data. The data compares mobility for the report date to the baseline day. Calculated for the report date (unless there are gaps) and reported as a positive or negative percentage. The baseline day is the median value from the 5‑week period Jan 3 – Feb 6, 2020.
Bin the mobility data to create 3 types of observations:
Low Mobility (less than -10)
Moderate Mobility (-10 to 40)
High Mobility (> 40)
plot(spring$parks, main = "Mobility Observation Bins", ylab = "Mobility Score")
abline(h = c(-10, 40), col = "red")
#bin mobilty data as low, moderate, high mobility
spring$park_obs <- ifelse(spring$parks < -10, "L",
ifelse(spring$parks >= -10 & spring$parks < 40, "M", "H"))
Overview
In part 1, the hidden states are known for the estimation of the transition and emissions matrix as we generated the example. Part 2 will detail use of Baum-Welch to estimate these matrices when the hidden states are not known.
There are 5 components we need to define to run our HMM model:
Hidden states
Set of observations
Transition matrix
Emissions Matrix
Initial Probability Distribution
Hidden States
In a Hidden Markov Model, the state at time \(t\) is unknown. The state will emit an observation, \(v\). In this example, \(M\) is the number of hidden states. The set of Hidden States is \(S\).
Thus, \(S^M =\) {clear, rainy, warm}
#define state:
S <- c('clear','rain', 'warm')
Transition Probability Matrix
The transition matrix is the probability of changing from one state to another state. It is represented as a \(M\times M\) matrix where the rows sum to 1. \(a_{ij}\) is simply the probability of state \(i\) changing to state \(j\) at \(t+1\).
The transition probability matrix here:
\((\hat A) = \begin{bmatrix}p(clear|clear)&p(clear|rainy)&p(clear|warm) \\p(rainy|clear)&p(rainy|rainy)&p(rainy|warm)\\p(warm|clear)&p(warm|rainy)&p(warm|warm)\\\end{bmatrix}\)
A simple example of a transition matrix with two states time step to 10:
spring$markov_states[1:10]
## [1] "clear" "warm" "clear" "warm" "warm" "warm" "clear" "clear"
## [9] "warm" "clear"
The following transitions occur:
clear -> clear 1x
clear -> warm 3x
warm -> clear 3x
warm -> warm 2x
Our expected transition matrix for this simple example is:
\(A = \begin{bmatrix}.25&.75 \\.6&.4\\\end{bmatrix}\)
Compare to output using markovchainFit function.
simple_A <- markovchainFit(spring$markov_states[1:10])
simple_A$estimate
## MLE Fit
## A 2 - dimensional discrete Markov Chain defined by the following states:
## clear, warm
## The transition matrix (by rows) is defined as follows:
## clear warm
## clear 0.25 0.75
## warm 0.60 0.40
Repeat the same process on all 90 states for the estimated transition matrix.
#find transition matrix using markovchain library
transition <- markovchainFit(data=spring$markov_states)
transition$estimate
## MLE Fit
## A 3 - dimensional discrete Markov Chain defined by the following states:
## clear, rain, warm
## The transition matrix (by rows) is defined as follows:
## clear rain warm
## clear 0.1428571 0.42857143 0.4285714
## rain 0.4545455 0.09090909 0.4545455
## warm 0.2105263 0.01754386 0.7719298
A <- rbind(c(transition$estimate[1]), (transition$estimate[2]), (transition$estimate[3]))
#check rows sum to 1
apply(A, 1, sum)
## [1] 1 1 1
Sequence of Observations
At each time step \(t\) the system will emit an observation: low, moderate, or high mobility at the parks. This creates a sequence of observations \(V^T\).
Key Point: each observation is dependent on the state that generated it, not on the neighboring observations.
#H: High - L: Low - M: Moderate (alphabetical order)
V_labels <- c('H', 'L','M')
V <- spring$park_obs
Emission Probability Matrix
The emissions probability matrix contains the probability of emitting symbol \(k\) given state \(j\). It is represented by a \(M\times C\) matrix that rows also sum to 1.
Our observation symbols are L, M, H. One observation will emit from a state at each time step. The estimated emissions matrix will be as follows:
\((\hat B) = \begin{bmatrix}p(high|clear)&p(low|clear)&p(moderate|clear) \\p(high|rainy)&p(low|rainy)&p(moderate|rainy)\\p(high|warm)&p(low|warm)&p(moderate|warm)\\\end{bmatrix}\)
To find the emission’s matrix from the observed data, find the number of times each state is observed for a specific observation type and divide by total number of observations for the state.
#Emissions probabilities
#contingency table
e_table <- table(spring$markov_states, spring$park_obs)
e_table
##
## H L M
## clear 1 8 12
## rain 0 11 0
## warm 15 10 33
#probabilities
clear <- e_table[1,]/sum(e_table[1,])
rain <- e_table[2,]/sum(e_table[2,])
warm <- e_table[3,]/sum(e_table[3,])
#make sure order is same as S vector above
B <- rbind(clear, rain, warm)
B
## H L M
## clear 0.04761905 0.3809524 0.5714286
## rain 0.00000000 1.0000000 0.0000000
## warm 0.25862069 0.1724138 0.5689655
Initial Probability Distribution
The initial state is the state \(s\) at \(t = 1\). Estimate the probability of starting at a particular state from the probability of being in a state in the observed data.
table(spring$markov_states)/length(spring$markov_states)
##
## clear rain warm
## 0.2333333 0.1222222 0.6444444
pi <- rbind(c(.26, .14, .6))
\(\pi = \begin{bmatrix}.26&.14&.60\\\end{bmatrix}\)
# Initialise HMM
hmm = initHMM(S, V_labels, startProbs = pi, transProbs = A, emissionProbs = B)
print(hmm)
## $States
## [1] "clear" "rain" "warm"
##
## $Symbols
## [1] "H" "L" "M"
##
## $startProbs
## clear rain warm
## 0.26 0.14 0.60
##
## $transProbs
## to
## from clear rain warm
## clear 0.1428571 0.42857143 0.4285714
## rain 0.4545455 0.09090909 0.4545455
## warm 0.2105263 0.01754386 0.7719298
##
## $emissionProbs
## symbols
## states H L M
## clear 0.04761905 0.3809524 0.5714286
## rain 0.00000000 1.0000000 0.0000000
## warm 0.25862069 0.1724138 0.5689655
Evaluation Problem
Our model is defined as \(\theta\) and our sequence of observations \(\ V^T\)
Given \(\theta\) and \(V^T\), estimate \(p(V^T|\theta)\)
In other words, let us calculate the probability our model generated a particular sequence \(V^T\).
Forward Algorithm
The forward algorithm is one efficient way to solve the evaluation problem as it derives the probability of the next step on the computed probability of the current step. Goal of forward algorithm compute the joint distribution on \(S_k\)given \(x_{1:k}\)
Simple example: HMM model is \(\theta\). What is the probability the set of observations \((M,M,M)\) came from model \(\theta\) \(p(V^T|\theta)\).
#take first 3 observations
V_sample <- c("M", "M", "M")
forward = function(v, a, b, initial_distribution){
T = length(v)
m = nrow(a)
alpha = matrix(0, T, m)
alpha[1, ] = initial_distribution*b[, v[1]]
for(t in 2:T){
tmp = alpha[t-1, ] %*% a
alpha[t, ] = tmp * b[, v[t]]
}
return(alpha)
}
answer <- forward(V_sample,A,B,pi)
answer
## [,1] [,2] [,3]
## [1,] 0.14857143 0 0.3413793
## [2,] 0.05319647 0 0.1861623
## [3,] 0.02673803 0 0.0947343
sum(answer[3,])
## [1] 0.1214723
Decoding Problem
The decoding problem finds the most probable hidden state at each time step. In other words, what is the most probable path given our set of observations?
Viterbi
Find most likely states based on A and B estimates from observed data. use the viterbi algorithm.
#HMM most likely states
HMM_states <- viterbi(hmm, V)
#compare model to actual
cbind(HMM_states, spring$markov_states)[1:15,]
## HMM_states
## [1,] "warm" "clear"
## [2,] "warm" "warm"
## [3,] "warm" "clear"
## [4,] "warm" "warm"
## [5,] "warm" "warm"
## [6,] "warm" "warm"
## [7,] "warm" "clear"
## [8,] "warm" "clear"
## [9,] "warm" "warm"
## [10,] "clear" "clear"
## [11,] "rain" "rain"
## [12,] "clear" "clear"
## [13,] "rain" "rain"
## [14,] "clear" "clear"
## [15,] "rain" "rain"
#how'd we do? % of states correct
sum(HMM_states == spring$markov_states)/length(spring$markov_states)
## [1] 0.7222222
plot(HMM_states == spring$markov_states, ylab = "Correct States", main = "Performance - Hidden States Known")
Learning Problem
Hidden Markov Models are generally an unsupervised learning process where the number of hidden states are unknown and only the observed symbols are visible.
Baum-Welch
The Baum-Welch algorithm (forward/backward algorithm) is a special case of Expectation Maximization that will estimate the Transition and Emission probabilities.
#update H,L,M to 1,2,3 for function.
spring$park_obs <- ifelse(spring$parks < -10, 1,
ifelse(spring$parks >= -10 & spring$parks < 40, 2, 3))
#backward
backward = function(V, A, B){
T = length(V)
m = nrow(A)
beta = matrix(1, T, m)
for(t in (T-1):1){
tmp = as.matrix(beta[t+1, ] * B[, V[t+1]])
beta[t, ] = t(A %*% tmp)
}
return(beta)
}
#BaumWelch
BaumWelch = function(v, a, b, initial_distribution, n.iter = 100){
for(i in 1:n.iter){
T = length(v)
M = nrow(a)
K=ncol(b)
alpha = forward(v, a, b, initial_distribution)
beta = backward(v, a, b)
xi = array(0, dim=c(M, M, T-1))
for(t in 1:T-1){
denominator = ((alpha[t,] %*% a) * b[,v[t+1]]) %*% matrix(beta[t+1,])
for(s in 1:M){
numerator = alpha[t,s] * a[s,] * b[,v[t+1]] * beta[t+1,]
xi[s,,t]=numerator/as.vector(denominator)
}
}
xi.all.t = rowSums(xi, dims = 2)
a = xi.all.t/rowSums(xi.all.t)
gamma = apply(xi, c(1, 3), sum)
gamma = cbind(gamma, colSums(xi[, , T-1]))
for(l in 1:K){
b[, l] = rowSums(gamma[, which(v==l)])
}
b = b/rowSums(b)
}
return(list(a = a, b = b, initial_distribution = initial_distribution))
}
M=3; K=3
A = matrix(1, M, M)
A = A/rowSums(A)
B = matrix(1:6, M, K)
B = B/rowSums(B)
initial_distribution = c(1/3,1/3, 1/3)
(output = BaumWelch(spring$park_obs, A, B, initial_distribution, n.iter = 100))
## $a
## [,1] [,2] [,3]
## [1,] 0.3953952 0.53133439 0.07327042
## [2,] 0.7546056 0.20945732 0.03593709
## [3,] 0.1042512 0.06582564 0.82992313
##
## $b
## [,1] [,2] [,3]
## [1,] 0.02238586 0.7390644 0.23854972
## [2,] 0.30672318 0.4771669 0.21610996
## [3,] 0.86278023 0.1130348 0.02418494
##
## $initial_distribution
## [1] 0.3333333 0.3333333 0.3333333
Find most likely states based on A and B matrix estimated from Baum-Welch.
pi <- c(1/3,1/3,1/3)
S <- c("rain", "clear","warm")
hmm2 = initHMM(S, V_labels, startProbs = pi, transProbs = output$a, emissionProbs = output$b)
print(hmm2)
## $States
## [1] "rain" "clear" "warm"
##
## $Symbols
## [1] "H" "L" "M"
##
## $startProbs
## rain clear warm
## 0.3333333 0.3333333 0.3333333
##
## $transProbs
## to
## from rain clear warm
## rain 0.3953952 0.53133439 0.07327042
## clear 0.7546056 0.20945732 0.03593709
## warm 0.1042512 0.06582564 0.82992313
##
## $emissionProbs
## symbols
## states H L M
## rain 0.02238586 0.7390644 0.23854972
## clear 0.30672318 0.4771669 0.21610996
## warm 0.86278023 0.1130348 0.02418494
HMM_states2 <- viterbi(hmm2, V)
#compare model to actual
cbind(HMM_states2, spring$markov_states)[1:15,]
## HMM_states2
## [1,] "rain" "clear"
## [2,] "clear" "warm"
## [3,] "rain" "clear"
## [4,] "clear" "warm"
## [5,] "clear" "warm"
## [6,] "rain" "warm"
## [7,] "clear" "clear"
## [8,] "rain" "clear"
## [9,] "clear" "warm"
## [10,] "rain" "clear"
## [11,] "rain" "rain"
## [12,] "clear" "clear"
## [13,] "rain" "rain"
## [14,] "clear" "clear"
## [15,] "rain" "rain"
#how'd we do? % of states correct
sum(HMM_states2 == spring$markov_states)/length(spring$markov_states)
## [1] 0.2333333
plot(HMM_states2 == spring$markov_states, ylab = "Correct States", main = "Performance - Baum-Welch Estimates")
Future Work
Model Improvement
Also a common refrain of modelers: “If we had more time we would have tried to improve the model.” This exercise is no different. Andrew L. did some impressive work getting a real world Hidden Markov Model up and running with only self study to guide him. These models have unlimited applications and an expertise in creating and refining them would be valuable.
Available Libraries
This is far from an exhaustive list of Markov and Hidden Markov packages in R. Packages range from simple to complex. Many of the latter packages were designed to address specific features that other packages are missing.