Overview

In this vignette, I provide a brief introduction to a simple regime switching switching model, which constitutes a special case of hidden Markov models (HMMs). These models allow for greater flexibility to accommodate for non-stationarity in the time series data. From application perspective, these models can be extremely useful in assessing the state of the economy/market. The discussion here revolves mainly behind the science of these models using simulated data. These notes can be very helpful for those interested in exploring and deepening understanding of regime switching models.

The Basic Case

To demonstrate the main idea behind regime switching models, let’s consider the following example. Suppose we have a \(x_t\) stochastic process that has the following conditional distribution for \(t = 1,...,t\) \[\begin{equation} x_t \mid s_t = s \sim N(\mu_s,\sigma_s^2). \end{equation}\] Without a loss of generality, this indicates if we know the current state \(s_t\), then we know that \(x_t\) follows a normal distribution with mean \(\mu_s\) and volatility \(\sigma_s\). Nonetheless, in practice, we do not know the state nor we observe its realization. We can, however, infer it from the data. This requires making assumption about the process \(s_t\), which usually assumed to follow a first order Markov process. Specifically, the probability to transition to state \(j\) at time \(t\), given that we were in state \(i\) at \(t-1\) is given by \[\begin{equation} \mathbb{P}(s_{t}=j\mid s_{t-1}=i)=p_{ij} \end{equation}\] A couple of comments worth noting. First, note that the probability to transition in one step depends only on the previous state, hence, the Markovian property. Second, it is common to represent these transition probability in a matrix, known as transition matrix, denoted by \(\mathbf{P}\). In particular, for \(S\) states, \(p_{ij}\) denotes the \(i\)th and \(j\)th column of \(\mathbf{P}\), for all \(i,j=1,..,S\). For instance, for \(S=2\), the transition matrix is given by \[\begin{equation} \mathbf{P}=\left[\begin{array}{cc} p_{11} & p_{12}\\ p_{21} & p_{22} \end{array}\right] \end{equation}\]

where \(p_{12} = 1- p_{11}\) and \(p_{21} = 1-p_{22}\). In this case, the transition matrix is identified using two parameters only, \(p_{11}\) and \(p_{22}\) which denote the persistence of state 1 and 2, respectively.

The main challenge behind HMM is the hidden part. How can we identify something ``unobservable’’? While \(s_t\) is unobservable, \(x_t\) is observable. As a result, the idea behind HMM is to infer (filter) something latent from something observable.

Toy Data

To demonstrate the filtering part, let’s cook some data and try to reverse engineer it. By construction, I impose a number assumptions to create our data. There are two states that the process transitions between with a fixed-constant transition matrix over time. Each state is associated with a distinct mean and volatility.

library(knitr)
library(kableExtra)
library(dplyr)
theta_v <- data.frame(t(c(2.00,-2.00,1.00,2.00,0.95,0.85)))
names(theta_v) <- c("$\\mu_1$","$\\mu_2$","$\\sigma_1$","$\\sigma_2$","$p_{11}$","$p_{22}$")
kable(theta_v, "html", booktabs = F,escape = F) %>% 
        kable_styling(position = "center")

\(\mu_1\)	\(\mu_2\)	\(\sigma_1\)	\(\sigma_2\)	\(p_{11}\)	\(p_{22}\)
2	-2	1	2	0.95	0.85

As the table above demonstrates, state \(s=2\) resembles the ``bad’’ state in which the process \(x_t\) exhibits negative outcomes with higher volatility. Moreover, we note that the second state is less persistent, such that the probability to stay in state 2 is less likely than the probability of staying is state 1.

Markovian Process

The above given parameters serve as the building block of the simulation process. To simulate process \(x_t\), we start with simulating the Markovian process, \(s_t\). In order to simulate the process over \(T\) periods, first, we need to build the transition matrix given \(p_{11}\) and \(p_{22}\). Second, we need to start with a given state \(s_1 = 1\). Starting with \(s_1 = 1\), we know that there is a probability of 95% to stay in state 1 and 5% to move to state 2. In other words, these probabilities correspond to the first row of the transition matrix.

p11 <- theta_v[1,5] 
p22 <- theta_v[1,6]
P <- matrix(c(p11,1-p22,1-p11,p22),2,2)
P[1,]

## [1] 0.95 0.05

Simulating the \(s_t\) is recursive since it depends on the previous state. Hence, we need to construct a loop:

set.seed(13)
T_end <- 10^2

s0 <- 1 
st <- function(i) sample(1:2,1,prob = P[i,])

s <- st(s0)
for(t in 2:T_end) {
  s <- c(s,st(s[t-1]))
}
plot(s, pch = 20,cex = 0.5)

The above plot demonstrates the persistence of the process \(s_t\). For the majority of the time there are more ``realizations’’ of state 1 than state 2. In fact, this can be confirmed by the stationary probabilities, which are given by

P_stat <- Reduce(function(M1,M2) M1%*%M2 ,lapply(1:100,function(x) P ))
P_stat[1,]

## [1] 0.75 0.25

Hence, there is a 75% chance to be in state in 1 over time, and 25% chance to be in state 2. This should be reflected in the simulated process s, such that

mean(s==1)

## [1] 0.69

Since we are using a small sample of 100 periods, we observe that the stationary probability is 69%, which is close to but not exactly equal to 75%.

The Outcome Process

Given the simulated Markovian process, the simulation of the outcome process is straightforward. A simple trick, though not most computationally efficient, is to simulate \(T\) periods for \(x_t \mid s_t = 1\) and \(T\) periods for \(x_t \mid s_t = 2\). Then, given the simulation of the \(s_t\), we create the outcome variable \(x_t\) with respect to each state. In other words,

set.seed(11)
x1 <- rnorm(T_end,theta_v[1,1],theta_v[1,3])
set.seed(17)
x2 <- rnorm(T_end,theta_v[1,2],theta_v[1,4])

t_index <- 1:T_end
x <- rep(0,T_end)
x[s==1] <- x1[s==1]
x[s==2] <- x2[s==2]
plot(x~t_index, pch = 20)
points(x[s == 2]~t_index[s==2],col = 2)

While the time series, overall, looks stationary, we observe that there are periods (highlighted in red) that exhibit higher variance and lower values. This seems straightforward. However, imagine the case when observe the data in practice without any knowledge about the data generating process behind it. One could suggest that there is a structural break in the data or a change in the regime where the process \(x_t\) becomes more volatile with more negative values. Nonetheless, it is always easier to explain after the fact. The main challenge is identify such case and filter the probability of a regime change using real data. After all, recall that the process \(s_t\) is latent! We will devote the next section to this filtering problem.

Estimation

In this section, I will conduct the statistical inference both manually (from scratch) and non0manually using the MSwM package. In the former, I will demonstrate how to construct the likelihood function and, then, estimate the parameters using a constrained optimization problem In the latter, I will illustrate how to replicate so without going through the trouble of the analytical derivations.

The Likelihood Function - Analytical Part

For those uninterested in the mathematics behind the likelihood function may skip this part. Nonetheless, those seeking specific problems with HMM may find this appealing. Nonetheless, the best way to understand the logic behind the model is to delve a little into the science behind it.

Let \(\Theta\) denote the vector of all parameters and \(L_T\) denote the likelihood function given a sample of size \(T\) consisting of observations \(x_{1},...,x_{T}\), such that \[\begin{equation} L_{T}=f(x_{1},...,x_{T}) \end{equation}\] From the law of total probability we know that \[\begin{equation} f(x_{1},...,x_{T})=f(x_{1},...,x_{T}\mid s_{T}=1)\mathbb{P}(s_{T}=1)+f(x_{1},...,x_{T}\mid s_{T}=2)\mathbb{P}(s_{T}=2) \end{equation}\] with \[\begin{equation} f(x_{1},...,x_{T}\mid s_{T}=s)\mathbb{P}(s_{T}=s)=\frac{f(x_{1},...,x_{T},s_{T}=s)}{\mathbb{P}(s_{T}=s)}\mathbb{P}(s_{t}=s)=f(x_{1},...,x_{T},s_{T}=s) \end{equation}\] such that \[\begin{align} l_{T}(s) & = f(x_{1},...,x_{T},s_{T}=s) \\ & =f(x_{T}\mid s_{T}=s,x_{1},...,x_{T-1})f(x_{1},...,x_{T-1},s_{T}=s) \\ & =f(x_{T}\mid s_{T}=s,x_{1},...,x_{T-1})f(s_{T}=s\mid x_{1},...,x_{T-1})f(x_{1},...,x_{T-1}) \\ & =f(x_{T}\mid s_{T}=s)\cdot\xi_{T\mid T-1,s}\cdot L_{T-1} \end{align}\]

with \(\xi_{T\mid T-1,s}=f(s_{T}=s\mid x_{1},...,x_{T-1})\) denoting the probability of being at state \(s\) at time \(T\) given all prior information until time \(T-1\).

Therefore, the likelihood function can be written in the following recursive manner: \[\begin{equation} L_{T}=\left[f(x_{T}\mid s_{T}=1)\cdot\xi_{T\mid T-1,1}+f(x_{T}\mid s_{T}=2)\cdot\xi_{T\mid T-1,2}\right]\cdot L_{T-1} \end{equation}\] Recall that since \(x_t \mid s_t = s\) follows a normal distribution with mean \(\mu_s\) and volatility \(\sigma_s\), we know that \[\begin{equation} f(x_{t}\mid s_{t}=s) = \frac{1}{\sigma_s} \phi \left( \frac{x_t - \mu_s}{\sigma_s} \right) \end{equation}\]

To simplify the likelihood function, let’s consider the following matrix representation

\[\begin{equation} \boldsymbol{f}_{T}=\left[\begin{array}{c} f(x_{T}\mid s_{T}=1)\\ f(x_{T}\mid s_{T}=2) \end{array}\right]\text{ and }\boldsymbol{\xi}_{T\mid T-1}=\left[\begin{array}{c} \xi_{T\mid T-1,1}\\ \xi_{T\mid T-1,2} \end{array}\right] \end{equation}\] The likelihood function \(L_{T}\), hence, is given by \[\begin{equation} L_{T}=f(x_{1},...,x_{T})=\boldsymbol{f}_{T}^{\prime}\boldsymbol{\xi}_{T\mid T-1}\cdot L_{T-1}=\Pi_{t=1}^{T}\boldsymbol{f}_{t}^{\prime}\boldsymbol{\xi}_{t\mid t-1}. \end{equation}\]

Note that the likelihood function is recursive, even though the conditional distribution of the \(x_t\) is iid. This is mainly due to the fact that \(x_t\) depends on the Markovian hidden process \(s_t\).

Intuitively, \(\xi_{t\mid t-1,s}\) is determined by the transition matrix. To see this, suppose we are able to assess the state of being state \(s\) at time \(t-1\) given the updated information, then \(\xi_{t-1\mid t-1,s}\) denotes the filtered probability, i.e. the probability of being in state \(s\) after the fact. Thus, it can be shown that \[\begin{equation} \xi_{t\mid t-1,1}=\xi_{t-1\mid t-1,1}p_{11}+\xi_{t-1\mid t-1,2}p_{21} \end{equation}\]

such that the probability of being in state 1 in period \(t\) depends on which state we are at time \(t-1\). If we were in state 1 at \(t-1\), then the probability to stay at 1 in the following period is given by \(p_{11}\). However, since the state is hidden, and, thus, we do not know whether we are in state 1 at \(t-1\), we can relate to the filtered probability of being at state 1 at \(t-1\) instead. The same logic applies to state 2. As indicated above, \(\xi_{t\mid t-1,1}\) is decomposed into two parts, conditioned on the filtered state at \(t-1\). A similar reasoning holds true for

\[\begin{equation} \xi_{t\mid t-1,2}=\xi_{t-1\mid t-1,1}p_{12}+\xi_{t-1\mid t-1,2}p_{22} \end{equation}\] In matrix representation, we have \[\begin{equation} \boldsymbol{\xi}_{t\mid t-1}=\mathbf{P^{\prime}}\boldsymbol{\xi}_{t-1\mid t-1} \end{equation}\] The main issue that remains is what constitutes \(\boldsymbol{\xi}_{t-1\mid t-1}\). In other words, we need to represent it using the sample data as well as the parameters in order to identify the likelihood function. By construction, we know that \[\begin{align} \xi_{t-1\mid t-1,1} & = f(s_{t-1} = 1\mid x_{1},...,x_{t-1})=\frac{f(s_{t-1}=1\mid x_{1},...,x_{t-1})}{f(x_{1},...,x_{t-1})} \\ & =\frac{f(x_{1},...,x_{t-1},s_{t-1}=1)}{f(x_{1},...,x_{t-1})} \\ & =\frac{f(x_{t-1}\mid x_{1},...,x_{t-2},s_{t-1}=1)f(x_{1},...,x_{t-2},s_{t-1}=1)}{f(x_{1},...,x_{t-1})} \\ & =\frac{f(x_{t-1}\mid s_{t-1}=1)f(s_{t-1}=1\mid x_{1},...,x_{t-2})f(x_{1},...,x_{t-2})}{\boldsymbol{f}_{t-1}^{\prime}\boldsymbol{\xi}_{t-1\mid t-2}\cdot L_{T-2}} \\ & =\frac{f(x_{t-1}\mid s_{t-1}=1)f(s_{t-1}=1\mid x_{1},...,x_{t-2})L_{t-2}}{\boldsymbol{f}_{t-1}^{\prime}\boldsymbol{\xi}_{t-1\mid t-2}\cdot L_{t-2}} \\ & =\frac{f(x_{t-1}\mid s_{t-1}=1)\xi_{t-1\mid t-2,1}}{\boldsymbol{f}_{t-1}^{\prime}\boldsymbol{\xi}_{t-1\mid t-2}} \end{align}\] Repeating the same logic, it can be established that \[\begin{equation} \xi_{t-1\mid t-1,2}=\frac{f(x_{t-1}\mid s_{t-1}=2)\xi_{t-1\mid t-2,2}}{\boldsymbol{f}_{t-1}^{\prime}\boldsymbol{\xi}_{t-1\mid t-2}} \end{equation}\] Again, in matrix notation, the filter can be written as \[\begin{equation} \boldsymbol{\xi}_{t-1\mid t-1}=\frac{1}{\boldsymbol{f}_{t-1}^{\prime}\boldsymbol{\xi}_{t-1\mid t-2}}\boldsymbol{f}_{t-1}\odot \boldsymbol{\xi}_{t-1\mid t-2} \end{equation}\]

with \(\odot\) is the element by element product.

Consequently, starting with a prior filter such that \(\xi_{0\mid0,1}=\xi_{0\mid0,2}= 1/2\), we can identify the likelihood function for a given data set \(x_1,...,x_T\) and a vector of parameters \(\Theta\). Finally, the the maximum likelihood estimator (MLE) is the vector \(\hat{\Theta}\) that maximizes \(\text{log}(L_T)\).

The Likelihood Function - Numerical Part

After establishing the likelihood function the optimization should be straightforward. First, we need to create a function that takes \(\Theta\) vector as the main input. Second, we need to set an optimization problem that returns the MLE. We define it as follows:

HMM_Lik <- function(theta) {
  mu <- theta[1:2] # means
  sig <- theta[3:4] # volatilities
  p <- t(matrix(c(theta[5],1-theta[6],1-theta[5],theta[6]),2,2)) # transition matrix
  xi <- c(0.5,0.5) # prior about the filter
  
  Xi <- numeric()
  L <- numeric()
  for(t in 1:T_end) {
    phi1 <- dnorm((x[t]-mu[1])/sig[1])/sig[1]
    phi2 <- dnorm((x[t]-mu[2])/sig[2])/sig[2]   
    phi <- c(phi1,phi2)
    xi_next <- t(p)%*%xi
    l_t <- t(phi)%*%xi_next
    L <- c(L,l_t)
    # update the filter for next period
    xi <- c((1/l_t))*(phi*xi_next) # inference prob
    Xi <- rbind(Xi,t(xi))
  }
  LL <- sum(log(L))
  list(Xi = Xi,LL = LL)
}

Before we optimize the likelihood function. Let’s take a look at the filter to understand how it works. Suppose we know the vector of parameters \(\Theta\) and that we are interested assessing the hidden state over time using the data on \(x_t\).

theta_known <- unlist(theta_v[1,])
Filter <- HMM_Lik(theta_known)$Xi
Filter <- cbind(t_index,Filter)
colnames(Filter) <- c("$t$","$\\xi_{t \\mid t, 1}$","$\\xi_{t \\mid t, 2}$")
rownames(Filter) <- NULL
kable(round(head(Filter),3), "html", booktabs = F,escape = F) %>% 
        kable_styling(position = "center")

\(t\)	\(\xi_{t \mid t, 1}\)	\(\xi_{t \mid t, 2}\)
1	0.878	0.122
2	0.982	0.018
3	0.887	0.113
4	0.875	0.125
5	0.318	0.682
6	0.000	1.000

Obviously, the sum of the filter at each time for the two states should be 1. Clearly, we can either be at state 1 or state 2. To confirm,

all(round(apply(Filter[,-1],1,sum),9) == 1)

## [1] TRUE

Luckily, since we engineered this data we know the periods in which state 2 prevails. To see how the filter works, let’s plot it against the true realizations of state 2.

plot(Filter[,3]~t_index, type = "l", ylab = expression(xi[2]))
points(Filter[s==2,3]~t_index[s==2],pch = 20, col = 2)

The beauty behind the filter is the identification of the latent state using information on \(x_t\) only. As we observe, the filter for state for 2 increases mainly when state 2 takes place. This can be demonstrated in the increased probability sounding the red dots, which denote the periods during which state 2 occurs. Nonetheless, there are major issues with the above. First, it assumes that we know the parameters \(\Theta\), whereas in practice we need to estimate this and, upon which, we make our inference. Second, this is all constructed in-sample. From practical point of view, decision makers are interested in the forecasted probabilities and their implications to future investment, for instance.

Manual Estimation

To optimize the above defined HMM_Lik function, I will need to perform two additional steps. The first is to set up an initial estimate that serves as the starting point for search algorithm. Second, we need to set constraints in order to validate that the estimated parameters are coherent, i.e. non-negative volatility and probability values between 0 and 1.

In the first step, I use the sample to create the initial parameter vector \(\Theta_0\)

m1 <- m2 <- mean(x,na.rm = T)
s1 <- s2 <- sd(x,na.rm = T)
p1 <- p2 <- 0.5
theta0 <- c(m1,m2,s1,s2,p1,p2)

In the second step, I formulate the following budget constraints for coherent estimates

# set the constraints
ui <- matrix(0,4,6)
ui[,5] <- c(1,0,-1,0)
ui[,6] <- c(0,1,0,-1)

# stack in matrix and vector
A <- t(matrix(ui,ncol= 4))
A <- ui
B <- c(0.01,0.01,-0.99,-0.99)

Note that the initial vector of parameters should satisfy the constraints

all(A%*%theta0 >= B)

## [1] TRUE

Finally, recall that by construction most optimization algorithms search for the minimum point. Hence, we need to change the output of the likelihood function to a negative value.

L.lik <- function(theta) -HMM_Lik(theta)$LL
opt <- constrOptim(theta0,L.lik, NULL,ui = A, ci = B )
opt

## $par
## [1]  1.7119528 -1.9981224  0.8345350  2.2183230  0.9365507  0.8487511
## 
## $value
## [1] 174.7445
## 
## $counts
## function gradient 
##     1002       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
## 
## $outer.iterations
## [1] 3
## 
## $barrier.value
## [1] 6.538295e-05

To check whether the MLE values are consistent with the true parameters, we plot the estimates versus the true values:

plot(opt$par ~ theta_known,pch = 20,cex=2,ylab="MLE",xlab = "True")
abline(a=0,b=1,lty=2)

Overall, we observe that the estimates are pretty consistent, which is not surprising due to the consistency property of MLE.

Estimation using the `MSwM` Library

So far the discussion in this vignette has served as motivation to learn how we can apply these models in practice. While one can refer to the MSwM package directly, I feel that is crucial to go through the above to gain a deep understanding of the dynamics behind the package. To close the circle, I will demonstrate below how to replicate the results of the manual estimation using the MSwM package.

The main function in the MSwM package is msmFit. This is a generic function that takes an estimated model and deploys on it a regime switching model. To demonstrate this, recall that the conditional distribution of \(x_t\) is iid. One can describe this as the following regression model \[\begin{equation} [x_t \mid s_t = s] = \mu_s + \epsilon_{t,s} \end{equation}\]

with \(\epsilon_{t,s} \sim N(0,\sigma_s^2)\).

If we were to ignore any regime switching in the process, we can simply estimate the parameters \(\mu\) and \(\sigma\) as

mod <- lm(x~1)
mod_est <- data.frame(mu = mod$coefficients, sig = summary(mod)$sig)
names(mod_est) <- c("$\\hat{\\mu}$","$\\hat{\\sigma}$")
rownames(mod_est) <- NULL
kable(mod_est, "html", booktabs = F,escape = F) %>% 
        kable_styling(position = "center")

\(\hat{\mu}\)	\(\hat{\sigma}\)
0.6244574	2.198929

Note that the estimated mean is different from each state alone. Nonetheless, on average, we should expect the process mean to be around 1, i..e \(0.75\times 2 + 0.25\times (-1) = 1\). This results from the law of total expectation property in which we know that \[\begin{equation} \mathbb{E}[x]=\sum_{\forall s}\mathbb{E}\left[x\mid s\right]\mathbb{P}(s)=\sum_{\forall s}\mu_{s}\pi_{s} \end{equation}\]

such that

EX <- 0.75*2 + 0.25*-2
EX

## [1] 1

For the volatility, the same logic applies. Note that \[\begin{equation} \mathbb{E}[x^{2}]=\sum_{\forall s}\mathbb{E}\left[x^{2}\mid s\right]\mathbb{P}(s) \end{equation}\] with \[\begin{equation} \mathbb{E}\left[x^{2}\mid s\right]=\sigma_{s}^{2}+\mu_{s}^{2} \end{equation}\] such \[\begin{equation} \mathbb{E}[x^{2}]=\sum_{\forall s}(\sigma_{s}^{2}+\mu_{s}^{2})\pi_{s} \end{equation}\] As a result, we have \[\begin{equation} \mathbb{V}[x]=\sum_{\forall s}(\sigma_{s}^{2}+\mu_{s}^{2})\pi_{s}-\left[\sum_{\forall s}\mu_{s}\pi_{s}\right]^{2} \end{equation}\]

EX2 <- (2^2 + 1^2)*0.75 + ((-2)^2 + 2^2)*0.25
VX <- EX2 - EX^2
sqrt(VX)

## [1] 2.179449

We note that the estimates from the regression are more consistent with the volatility than mean. This is mainly due to the more cumbersome task of estimating first moments versus second moments.

The point from the above is that the estimates do not cover the true nature of the data. If we assumed that the data is stationary, then we are incorrectly estimating the mean of the process to be 62%. However, at the same time, we know by construction that the process exhibits two mean outcomes - one positive and one negative. The same applies to volatility.

In order to uncover these patterns, we demonstrate below how to deploy a regime switching model using the above linear model:

library(MSwM)
mod.mswm <- msmFit(mod,k=2,p=0,sw=c(TRUE,TRUE),control=list(parallel=TRUE))

The msmFit takes a number of arguments. The major input is the fitted model, mod, which we would like to generalize to accommodate for switching regimes. The second, k, is the number of regimes. Since we know that we are dealing with two states, we set it to 2. However, in practice one would need to refer to an information criteria to identify the optimal number of states. The third argument, p, is relevant if we are deploying auto-regressive models. Since \(x_t\) is iid given a state \(s\), this is irrelevant in our toy data. The other major specification is to define which parameters are switching. By definition, we have two parameters, the mean \(\mu_s\) and the volatility \(\sigma_s\). For this reason, we add a vector of true/false to indicate which parameters are switching. In the above command, we allow both parameters to switch. Finally, we can specify whether the estimation procedure is conducting using parallel computing.

To understand the output from the model, let’s take a look at the output of the

mod.mswm

## Markov Switching Model
## 
## Call: msmFit(object = mod, k = 2, sw = c(TRUE, TRUE), p = 0, control = list(parallel = TRUE))
## 
##        AIC     BIC    logLik
##   352.2843 366.705 -174.1422
## 
## Coefficients:
##         (Intercept)(S)    Std(S)
## Model 1       1.711693 0.8346013
## Model 2      -2.004137 2.2155742
## 
## Transition probabilities:
##            Regime 1  Regime 2
## Regime 1 0.93767719 0.1510052
## Regime 2 0.06232281 0.8489948

The above output mainly reports the six estimated parameters that we tried to estimate manually. First, the coefficients table reports the mean and volatility for each regime/state. Model 1 reports a mean of 1.71 and volatility close to 1. Model 2, reports a negative mean of -2 and a volatility of around 2. Clearly, the model identifies two different states with different means and volatility, with respect to the toy data we constructed. Second, at the bottom of the output, the fitted model reports the transition probabilities. In particular, the transition matrix, with the diagonal denoting \(p_{11}\) and \(p_{22}\). To compare the results from MSwM with the manual MLE and the true parameters, let’s stack altogether:

theta_mswm <- c(unlist(mod.mswm@Coef),mod.mswm@std,diag(mod.mswm@transMat))
plot(opt$par ~ theta_known,pch = 20,cex=2,ylab="MLE",xlab = "True")
points(theta_mswm~theta_known,pch = 1,col = 2, cex = 2,lwd = 2)
abline(a=0,b=1,lty=2)
legend("topleft",c("Manual","MSwM"), pch = c(20,1), col = 1:2)

Interestingly, we find that the manual MLEs overlap with those extracted from the package. In terms of the filter for each state, we compare between the one retrieved from the package and the one extracted manually By definition, one can use the plot function on mod.mswm object to get an insight on the smoothed probabilities as well as the identified regimes.

par(mar = 2*c(1,1,1,1),mfrow = c(2,1))
plotProb(mod.mswm,2)

The plot on the top denotes the process \(x_t\) over time where the gray shaded areas identify the periods in which \(\hat{\xi}_{T\mid t,1} > 0.5\). In other words, the gray areas denote the time periods in which state 1 prevails. Since the library does not know where the data comes from, it makes its judgement using a cutting point of 50%. To see where this comes from, let’s replicate this using the manual estimates that we have.

Filter <- HMM_Lik(opt$par)$Xi
Filter <- cbind(t_index,Filter)
Filter <- data.frame(Filter)
Filter$Regime_1 <- (Filter[,2]>=0.5)*1
xx <- t_index[Filter$Regime_1 == 1] # plot regimes
plot(x~t_index,type ="l",col = 0,xlim=c(1,100))
rect(xx-1,-10,xx,10,col = "lightgray",lty = 0)
lines(x~t_index)
points(x[s==2] ~ t_index[s==2],col = 1,pch = 20)

The above plots emulates the one extracted from the package. One small difference is the regime for day 60. It happens to be there is a single day in which the filter detects the second state in a single period. This happens since the plotProb returns the smoothed probabilities in this case, i.e. the probabilities of being in each state after realizing the whole sample \(T\), i.e. \(\hat{\xi}_{t\mid T,1}\). On the other hand, the one from the manual estimation corresponds to the inference probabilities, \(\hat{\xi}_{t\mid t,1}\).

In any case, since we know the true realizations of the states, we can identify whether we are in the true state or not. We highlight state 2 using black dots in the above plot. Overall, we observe that the filter does pretty well in detecting the states in the data. The only exception is day 60, in which the inference probability is larger than 50%. To see how often the inference probabilities get it right, we run the following command

mean(Filter$Regime_1 == (s==1)*1)

## [1] 0.96

Concluding Remarks

This filtering procedure seems to work pretty well. Nevertheless, there are a number of challenges when it comes to real data implementation. First, when we do not possess the knowledge about the data generating process. Second, the states are not necessarily realized. Therefore, it is unclear to the researcher what the filters do imply. These two concerns can undermine the reliability of the regime switching mode. In terms of application, it is common to deploy such models t0 assess the state of the economy or the market. From decision making, this could also provide an interesting application to tactical allocation. I leave this for a future investigation.

A Brief Introduction to Regime Switching Models

Majeed Simaan

Overview

The Basic Case

Toy Data

Markovian Process

The Outcome Process

Estimation

The Likelihood Function - Analytical Part

The Likelihood Function - Numerical Part

Manual Estimation

Estimation using the `MSwM` Library

Concluding Remarks

A Brief Introduction to Regime Switching Models

Majeed Simaan

Overview

The Basic Case

Toy Data

Markovian Process

The Outcome Process

Estimation

The Likelihood Function - Analytical Part

The Likelihood Function - Numerical Part

Manual Estimation

Estimation using the MSwM Library

Concluding Remarks

Estimation using the `MSwM` Library