\usepackage{sn} \usepackage{tseries} \usepackage{tidyverse}

Risk for models and forecasting: Hidden Markov model with skew-normal distribution.

Authors

Saidjon Rustamov

Dinara Khaldarova

1 SKEW-NORMAL DISTRIBUTION

1.1 🧠 What is the Skew-Normal Distribution?

The skew-normal distribution is a generalization of the normal (Gaussian) distribution that allows for asymmetry (skewness). While the standard normal distribution is symmetric around the mean, the skew-normal can lean to the left or right, depending on the shape parameter.

1.2 📈 Probability Density Function (PDF)

The probability density function (PDF) of the skew-normal distribution is:

\[ f(x; \xi, \omega, \alpha) = \frac{2}{\omega} \, \phi\left( \frac{x - \xi}{\omega} \right) \, \Phi\left( \alpha \cdot \frac{x - \xi}{\omega} \right) \]

Where:

  • \(\phi(\cdot)\) : Standard normal PDF
  • \(\Phi(\cdot)\) : Standard normal CDF
  • \(\xi\) : Location parameter (like the mean)
  • \(\omega\) > 0 : Scale parameter (like the standard deviation)
  • \(\alpha\) : Shape parameter (controls skewness)

1.3 🔁 Special Cases

  • If ( \(\alpha\) = 0 ): It becomes a standard normal distribution.
  • If ( \(\alpha\) > 0 ): The distribution is right-skewed (tail leans right).
  • If ( \(\alpha\) < 0 ): The distribution is left-skewed (tail leans left).

1.4 🔍 Difference Between Normal and Skew-Normal Distributions

The normal distribution is a special case of the skew-normal distribution. Here’s how they differ:

Feature Normal Distribution Skew-Normal Distribution
Shape Symmetrical (bell curve) Asymmetrical (can lean left or right)
Skewness Exactly 0 Controlled by a shape parameter ( \(\alpha\) )
PDF Based on the standard normal function Involves both the standard normal PDF and CDF
Parameters Mean ( \(\mu\) ), SD ( \(\sigma\) ) Location ( \(\xi\) ), Scale ( \(\omega\) ), Shape ( \(\alpha\) )
Flexibility Limited to symmetry More flexible; can model skewed data
Special Case Normal dist. when ( \(\alpha\) = 0 )

In summary, the skew-normal adds an extra parameter to capture asymmetry, making it more adaptable for real-world data that often deviates from perfect symmetry.

Code
# Set seed for reproducibility
set.seed(123)

# Generate 1000 random values from a normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)

# Plot histogram with density instead of frequency
hist(normal_data,
     breaks = 30,
     col = "skyblue",
     border = "white",
     probability = TRUE,
     main = "Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add a normal density curve
curve(dnorm(x, mean = mean(normal_data), sd = sd(normal_data)),
      col = "darkblue",
      lwd = 2,
      add = TRUE)

Code
library(sn)
library(tseries)
# Set seed for reproducibility
set.seed(123)
# Generate 1000 random values from a skew-normal distribution
# Parameters: location (xi), scale (omega), shape (alpha)
skew_data <- rsn(1000, xi = 0, omega = 1, alpha = 5)

# Plot histogram with density
hist(skew_data,
     breaks = 30,
     col = "lightgreen",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = 5),
      col = "darkgreen",
      lwd = 2,
      add = TRUE)

Code
# Load required package
library(sn)

# Set seed for reproducibility
set.seed(123)

# Generate data from skew-normal distribution with alpha = -5
skew_data_neg <- rsn(n = 1000, xi = 0, omega = 1, alpha = -5)

# Plot histogram with density
hist(skew_data_neg,
     breaks = 30,
     col = "salmon",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution (α = -5)",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = -5),
      col = "darkred",
      lwd = 2,
      add = TRUE)

2 HMM (Hidden Markov Models)

2.1 📦 What is a Hidden Markov Model?

A Hidden Markov Model (HMM) is a statistical model where the system being modeled is assumed to follow a Markov process with unobserved (hidden) states.

Note

In plain terms: - There’s a process that moves between hidden states over time. - Each hidden state emits an observable output. - You see the outputs, but you don’t see the states.

Warning

Finance : Market regime switching: bull - bear market.

2.2 🧩 HMM Components

A Hidden Markov Model (HMM) is defined by the following key elements:

2.2.1 🔒 Hidden States

Denoted as:

\[ C_1, C_2, \dots, C_T \]

  • A finite set of unobserved states (e.g., calm/volatile, sunny/rainy).
  • The state at time ( t ), denoted ( C_t ), follows a Markov chain:

\[ P(C_t \mid C_{t-1}) = \text{transition probability} \]

This means the current state only depends on the previous state, not the full history.

2.2.2 📈 Observations

Denoted as:

\[ X_1, X_2, \dots, X_T \]

  • Each state emits data based on a specific probability distribution.
  • The observation at time ( t ), ( X_t ), depends only on the hidden state at that time:

\[ P(X_t \mid C_t) \]

These are sometimes called the emission probabilities.

2.2.3 🎲 Initial Probabilities

The model starts with an initial distribution over the hidden states:

\[ \delta = P(C_1) \]

This represents the probability of starting in each possible hidden state.

2.3 🔁 Markov Property

The hidden states follow the first-order Markov assumption:

\[ P(C_t \mid C_{1:t-1}) = P(C_t \mid C_{t-1}) \]

Also, the current observation depends only on the current state:

\[ P(X_t \mid C_{1:T}, X_{1:t-1}) = P(X_t \mid C_t) \]

2.4 ⚙️ Parameters of an HMM

Transition matrix ( ):

\[ \Gamma = \begin{bmatrix} P(C_{t+1} = 1 \mid C_t = 1) & \cdots & \cdots & P(C_{t+1} = m \mid C_t = m) \end{bmatrix} \]


Emission probabilities (depends on the application):

  • Normal: \(\mathcal{N}(\mu_i, \sigma_i^2)\)
  • Poisson: \(\ \text{Pois}(\lambda_i)\)
  • Skew-normal, etc.

Initial distribution:
\(\ \delta = P(C_1 = i)\)

2.5 📈 What You Can Do With HMMs

Task What It Means
Likelihood evaluation Compute \(\\P(X_1, \dots, X_T)\)
State estimation Infer the most likely hidden states
Parameter estimation Learn transition & emission parameters
Forecasting Predict future observations or states

2.6 📐 Key Algorithms


2.6.1 🔄 Forward Algorithm

Efficiently computes the likelihood of observed data using recursion:

\[ \alpha_t(j) = \sum_i \alpha_{t-1}(i) \cdot \gamma_{ij} \cdot P(X_t \mid C_t = j) \]

Where:

  • \(\alpha_t(j)\): probability of partial sequence ending in state \(\\j\)
  • \(\gamma_{ij}\): transition probability from state \(\\i\) to \(\\j\)
  • \(\\P(X_t \mid C_t = j)\): emission probability at time \(\\t\)

2.6.2 🔁 Backward Algorithm

Used for:

  • Smoothing (estimating past states)
  • EM Algorithm for learning parameters

2.7 📊 Visualization

  • 📈 Plot observations with colored hidden states
  • 📊 Show state probabilities over time
  • 📉 Compare model fit using criteria like AIC and BIC across different numbers of states

2.8 📌 Advantages of HMMs

✅ Handles hidden structure in time series
✅ Efficient thanks to Forward-Backward recursion
✅ Works well for sequential or temporal data
✅ Easily extended (e.g., with skew-normal emissions, covariates, etc.)

3 Main Part —> Project (1)

3.1 SX600 —> Analyzing data

Code
sx600 = get.hist.quote(instrument = "^stoxx", start = "2015-01-01",
                       quote = "AdjClose")
time series starts 2015-01-05
time series ends   2025-04-29
Code
sx600.num = as.numeric(sx600)
plot(sx600)

Code
nts = length(sx600)
sx600.sr = 100*diff(sx600.num)/sx600.num[-nts]
sx600.cr = diff(log(sx600))
plot(sx600.sr, type = "l", ylim = c(-10,10))

Code
hist(sx600.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve2",
     xlab = "Returns")


f=function(x) 0.7*dsn(x,x=0.5,omega=0.8, alpha = 5)+ 0.2*dsn(x,x=0.6,omega=0.8, alpha = -4)+
+ 0.1*dsn(x,x=-1.85,omega=3, alpha = 2)
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

Code
set.seed(42)  # For reproducibility

data <- as.numeric(sx600.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  splits[[i]] <- sample(data, size_i, replace = FALSE)  # sample without replacement, but from full data each time
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")
  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params,xi,omega, alpha)
  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i]*100, params["xi"], params["omega"], params["alpha"]))
}
Subset 1 (90% of data): xi = 0.875, omega = 1.358, alpha = -1.378
Subset 2 (70% of data): xi = 0.858, omega = 1.345, alpha = -1.278
Subset 3 (40% of data): xi = 0.923, omega = 1.447, alpha = -1.557
Subset 4 (20% of data): xi = -0.597, omega = 1.255, alpha = 0.809
Code
forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 2:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
    return(results)
}


out = forward_skew_loop(data=as.numeric(sx600.sr), 
                        location.vec = xi, 
                        scale.vec = omega, 
                        shape = alpha)

for (m in 2:4) {
  cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n")
}
HMM with 2 components has likelihood value: -3755.94 
HMM with 3 components has likelihood value: -3753.523 
HMM with 4 components has likelihood value: -3729.03 
Code
# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}
[1] 0.8430069 0.1569931
          [,1]      [,2]
[1,] 0.8430069 0.1569931
[1] 0.3455869 0.2265738 0.4278392
          [,1]      [,2]      [,3]
[1,] 0.3455869 0.2265738 0.4278392
[1] 0.1680199 0.2624128 0.2600075 0.3095598
          [,1]      [,2]      [,3]      [,4]
[1,] 0.1680199 0.2624128 0.2600075 0.3095598
Code
llik = function(par, m, data) {
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
  return(llik)
}

llik_vals = numeric(4)
for (i in 2:4) {
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)  
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sx600.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
  }
Log-likelihood for 2 components: -3755.939 
Log-likelihood for 3 components: -3753.525 
Log-likelihood for 4 components: -3729.031 
Code
llik_optim_vals = numeric(4)
llik_optim_pars = list()
for (i in 2:4){
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               lower=c(rep(0, i^2), rep(-Inf, i), rep(0.01, m), rep(-10, i)),
               upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)),
               control=list(fnscale=-1),data=as.numeric(sx600.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  llik_optim_pars[[i]]=outhmm$par
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}
MAX llik for 2 components: -3358.633 
MAX llik for 3 components: -3284.078 
MAX llik for 4 components: -3274.408 
Code
aic_last_vals = numeric()
for (i in 2:4){
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}
AIC for 2 components: 6733.266 
AIC for 3 components: 6598.156 
AIC for 4 components: 6596.815 
Code
best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")

Best model is with 4 components (lowest AIC =  6596.815 )

3.2 SP500 —> Analyzing data

Code
sp500 = get.hist.quote(instrument = "^gspc", start = "2015-01-01"
                       , quote = "AdjClose")
time series starts 2015-01-02
time series ends   2025-04-29
Code
sp500.num = as.numeric(sp500)
plot(sp500)

Code
nts = length(sp500)
sp500.sr = 100*diff(sp500.num)/sp500.num[-nts]
sp500.cr = diff(log(sp500))
plot(sp500.sr, type = "l", ylim = c(-10,10))

Code
hist(sp500.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve2",
     xlab = "Returns")


f=function(x) 0.5*dsn(x,x=0.3,omega=0.6, alpha = 5)+
  +0.2*dsn(x,x=0.4,omega=0.8, alpha = -4)+
  +0.1*dsn(x,x=-1,omega=1, alpha = 2)+
  +0.2*dsn(x,x=1,omega=1, alpha = 2)
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

Code
set.seed(45)  # For reproducibility

data <- as.numeric(sp500.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  splits[[i]] <- sample(data, size_i, replace = FALSE)  # sample without replacement, but from full data each time
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")
  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params,xi,omega, alpha)
  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i]*100, params["xi"], params["omega"], params["alpha"]))
}
Subset 1 (90% of data): xi = 0.795, omega = 1.370, alpha = -0.955
Subset 2 (70% of data): xi = 0.859, omega = 1.419, alpha = -1.052
Subset 3 (40% of data): xi = -0.649, omega = 1.369, alpha = 0.892
Subset 4 (20% of data): xi = 0.908, omega = 1.411, alpha = -1.186
Code
forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 2:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
  
  
  
  return(results)
}

out = forward_skew_loop(data=as.numeric(sp500.sr), 
                        location.vec = xi, 
                        scale.vec = omega, 
                        shape = alpha)


for (m in 2:4) {
  cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n")
}
HMM with 2 components has likelihood value: -4023.631 
HMM with 3 components has likelihood value: -3987.416 
HMM with 4 components has likelihood value: -3971.691 
Code
# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}
[1] 0.5477482 0.4522518
          [,1]      [,2]
[1,] 0.5477482 0.4522518
[1] 0.2035905 0.1775676 0.6188419
          [,1]      [,2]      [,3]
[1,] 0.2035905 0.1775676 0.6188419
[1] 0.2317339 0.3612939 0.2448064 0.1621657
          [,1]      [,2]      [,3]      [,4]
[1,] 0.2317339 0.3612939 0.2448064 0.1621657
Code
llik = function(par, m, data) {
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
  return(llik)
}







llik_vals = numeric(4)
for (i in 2:4) {
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)  
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sp500.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
}
Log-likelihood for 2 components: -4023.632 
Log-likelihood for 3 components: -3987.419 
Log-likelihood for 4 components: -3971.684 
Code
llik_optim_vals = numeric(4)
llik_optim_pars = list()
for (i in 2:4){
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               lower=c(rep(0, i^2), rep(-Inf,i), rep(0.01, i), rep(-10, i)),
               upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)),
               control=list(fnscale=-1),data=as.numeric(sp500.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  llik_optim_pars[[i]]=outhmm$par
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}
MAX llik for 2 components: -3456.351 
MAX llik for 3 components: -3330.375 
MAX llik for 4 components: -3318.017 
Code
aic_last_vals = numeric()
for (i in 2:4){
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}
AIC for 2 components: 6928.701 
AIC for 3 components: 6690.75 
AIC for 4 components: 6684.033 
Code
best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")

Best model is with 4 components (lowest AIC =  6684.033 )