Risk models and forecasting: Hidden Markov model with skew-normal distribution.

Authors

Saidjon Rustamov

Dinara Khaldarova

1 SKEW-NORMAL DISTRIBUTION

1.1 🧠 What is the Skew-Normal Distribution?

The skew-normal distribution is a generalization of the normal (Gaussian) distribution that allows for asymmetry (skewness). While the standard normal distribution is symmetric around the mean, the skew-normal can lean to the left or right, depending on the shape parameter.

1.2 📈 Probability Density Function (PDF)

The probability density function (PDF) of the skew-normal distribution is:

\[ f(x; \xi, \omega, \alpha) = \frac{2}{\omega} \, \phi\left( \frac{x - \xi}{\omega} \right) \, \Phi\left( \alpha \cdot \frac{x - \xi}{\omega} \right) \]

Where:

  • \(\phi(\cdot)\) : Standard normal PDF
  • \(\Phi(\cdot)\) : Standard normal CDF
  • \(\xi\) : Location parameter (like the mean)
  • \(\omega\) > 0 : Scale parameter (like the standard deviation)
  • \(\alpha\) : Shape parameter (controls skewness)
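
As a quick check of this formula, here is a minimal sketch (arbitrary parameter values) that computes the density directly from \(\phi\) and \(\Phi\) and compares it with sn::dsn:

Code
# Minimal sketch: verify the skew-normal PDF formula against sn::dsn
library(sn)
xi <- 1; omega <- 2; alpha <- 3                      # arbitrary location, scale, shape
x <- seq(-5, 8, length.out = 5)
z <- (x - xi) / omega
manual <- (2 / omega) * dnorm(z) * pnorm(alpha * z)  # 2/omega * phi(z) * Phi(alpha*z)
all.equal(manual, dsn(x, xi = xi, omega = omega, alpha = alpha))  # TRUE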

1.3 🔁 Special Cases

  • If \(\alpha = 0\): the distribution reduces to a normal distribution (standard normal when \(\xi = 0\) and \(\omega = 1\)).
  • If \(\alpha > 0\): the distribution is right-skewed (longer right tail).
  • If \(\alpha < 0\): the distribution is left-skewed (longer left tail).
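
The first special case is easy to confirm numerically (continuing the sketch above, with the sn package loaded):

Code
# With alpha = 0 the skew-normal density collapses to the normal density
x <- seq(-4, 4, length.out = 5)
all.equal(dsn(x, xi = 0, omega = 1, alpha = 0), dnorm(x))  # TRUE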

1.4 🔍 Difference Between Normal and Skew-Normal Distributions

The normal distribution is a special case of the skew-normal distribution. Here’s how they differ:

| Feature | Normal Distribution | Skew-Normal Distribution |
|---|---|---|
| Shape | Symmetrical (bell curve) | Asymmetrical (can lean left or right) |
| Skewness | Exactly 0 | Controlled by the shape parameter \(\alpha\) |
| PDF | Based on the standard normal PDF | Involves both the standard normal PDF and CDF |
| Parameters | Mean \(\mu\), SD \(\sigma\) | Location \(\xi\), scale \(\omega\), shape \(\alpha\) |
| Flexibility | Limited to symmetry | More flexible; can model skewed data |
| Special case | Is the \(\alpha = 0\) case of the skew-normal | Reduces to the normal when \(\alpha = 0\) |

In summary, the skew-normal adds an extra parameter to capture asymmetry, making it more adaptable for real-world data that often deviates from perfect symmetry.
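
For reference, the added asymmetry also shifts the mean away from the location parameter: for the skew-normal,

\[ \mathbb{E}[X] = \xi + \omega \delta \sqrt{2/\pi}, \qquad \delta = \frac{\alpha}{\sqrt{1 + \alpha^2}}, \]

so the mean coincides with \(\xi\) only when \(\alpha = 0\).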

Code
# Set seed for reproducibility
set.seed(123)

# Generate 1000 random values from a normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)

# Plot histogram with density instead of frequency
hist(normal_data,
     breaks = 30,
     col = "skyblue",
     border = "white",
     probability = TRUE,
     main = "Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add a normal density curve
curve(dnorm(x, mean = mean(normal_data), sd = sd(normal_data)),
      col = "darkblue",
      lwd = 2,
      add = TRUE)

Code
library(sn)
library(tseries)
library(xts)
library(ggplot2)
library(ggthemes)
library(e1071)
library(PerformanceAnalytics)
library(patchwork)
library(scales)
library(quantmod)
library(dplyr)
library(lubridate)
# Set seed for reproducibility
set.seed(123)
# Generate 1000 random values from a skew-normal distribution
# Parameters: location (xi), scale (omega), shape (alpha)
skew_data <- rsn(1000, xi = 0, omega = 1, alpha = 5)

# Plot histogram with density
hist(skew_data,
     breaks = 30,
     col = "lightgreen",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = 5),
      col = "darkgreen",
      lwd = 2,
      add = TRUE)

Code
# Set seed for reproducibility
set.seed(123)

# Generate data from skew-normal distribution with alpha = -5
skew_data_neg <- rsn(n = 1000, xi = 0, omega = 1, alpha = -5)

# Plot histogram with density
hist(skew_data_neg,
     breaks = 30,
     col = "salmon",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution (α = -5)",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = -5),
      col = "darkred",
      lwd = 2,
      add = TRUE)

2 HMM (Hidden Markov Models)

2.1 📦 What is a Hidden Markov Model?

A Hidden Markov Model (HMM) is a statistical model where the system being modeled is assumed to follow a Markov process with unobserved (hidden) states.

Note

In plain terms:

  • There’s a process that moves between hidden states over time.
  • Each hidden state emits an observable output.
  • You see the outputs, but you don’t see the states.

Warning

Finance example: market regime switching, e.g., bull vs. bear markets.

2.2 🧩 HMM Components

A Hidden Markov Model (HMM) is defined by the following key elements:

2.2.1 🔒 Hidden States

Denoted as:

\[ C_1, C_2, \dots, C_T \]

  • A finite set of unobserved states (e.g., calm/volatile, sunny/rainy).
  • The state at time \(t\), denoted \(C_t\), follows a Markov chain:

\[ P(C_t \mid C_{t-1}) = \text{transition probability} \]

This means the current state only depends on the previous state, not the full history.
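
To make the mechanics concrete, here is a minimal sketch (toy transition probabilities, not estimates from data) that simulates such a chain:

Code
# Minimal sketch: simulate a 2-state Markov chain with toy probabilities
set.seed(1)
gamma <- matrix(c(0.95, 0.05,
                  0.10, 0.90), nrow = 2, byrow = TRUE)  # rows sum to 1
T_len  <- 250
states <- integer(T_len)
states[1] <- sample(1:2, 1)                             # random initial state
for (t in 2:T_len) {
  states[t] <- sample(1:2, 1, prob = gamma[states[t - 1], ])
}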

2.2.2 📈 Observations

Denoted as:

\[ X_1, X_2, \dots, X_T \]

  • Each state emits data based on a specific probability distribution.
  • The observation at time \(t\), \(X_t\), depends only on the hidden state at that time:

\[ P(X_t \mid C_t) \]

These are sometimes called the emission probabilities.
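
Continuing the sketch above (toy emission parameters), each simulated state can emit a skew-normal observation:

Code
# Minimal sketch: state-dependent skew-normal emissions (toy parameters)
library(sn)
xi_s    <- c(0.5, -0.5)   # location per state
omega_s <- c(0.8, 2.0)    # scale per state
alpha_s <- c(3, -3)       # shape per state
x <- sapply(states, function(s)
  rsn(1, xi = xi_s[s], omega = omega_s[s], alpha = alpha_s[s]))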

2.2.3 🎲 Initial Probabilities

The model starts with an initial distribution over the hidden states:

\[ \delta = P(C_1) \]

This represents the probability of starting in each possible hidden state.

2.3 🔁 Markov Property

The hidden states follow the first-order Markov assumption:

\[ P(C_t \mid C_{1:t-1}) = P(C_t \mid C_{t-1}) \]

Also, the current observation depends only on the current state:

\[ P(X_t \mid C_{1:T}, X_{1:t-1}) = P(X_t \mid C_t) \]

2.4 ⚙️ Parameters of an HMM

Transition matrix \(\Gamma\):

\[ \Gamma = \begin{bmatrix} P(C_{t+1} = 1 \mid C_t = 1) & \cdots & P(C_{t+1} = m \mid C_t = 1) \\ \vdots & \ddots & \vdots \\ P(C_{t+1} = 1 \mid C_t = m) & \cdots & P(C_{t+1} = m \mid C_t = m) \end{bmatrix} \]


Emission probabilities (depend on the application):

  • Normal: \(\mathcal{N}(\mu_i, \sigma_i^2)\)
  • Poisson: \(\text{Pois}(\lambda_i)\)
  • Skew-normal, etc.

Initial distribution:
\(\delta_i = P(C_1 = i)\)

2.5 📈 What You Can Do With HMMs

| Task | What It Means |
|---|---|
| Likelihood evaluation | Compute \(P(X_1, \dots, X_T)\) |
| State estimation | Infer the most likely hidden states |
| Parameter estimation | Learn transition & emission parameters |
| Forecasting | Predict future observations or states |
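
As a small illustration of the forecasting task (a minimal sketch with assumed toy values), one-step-ahead state prediction propagates the filtered probabilities through the transition matrix:

Code
# P(C_{T+1} = j | X_{1:T}) = sum_i P(C_T = i | X_{1:T}) * gamma[i, j]
filt_T <- c(0.7, 0.3)                       # assumed filtered probabilities at time T
gamma  <- matrix(c(0.9, 0.1,
                   0.2, 0.8), nrow = 2, byrow = TRUE)
pred_next <- c(t(filt_T) %*% gamma)         # predicted state distribution at T+1
pred_next                                   # 0.69 0.31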

2.6 📐 Key Algorithms


2.6.1 🔄 Forward Algorithm

Efficiently computes the likelihood of observed data using recursion:

\[ \alpha_t(j) = \sum_i \alpha_{t-1}(i) \cdot \gamma_{ij} \cdot P(X_t \mid C_t = j) \]

Where:

  • \(\alpha_t(j)\): probability of the partial sequence up to time \(t\) ending in state \(j\)
  • \(\gamma_{ij}\): transition probability from state \(i\) to state \(j\)
  • \(P(X_t \mid C_t = j)\): emission probability at time \(t\)
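
A minimal sketch of this recursion with normal emissions follows (the project code below uses skew-normal emissions and the same per-step rescaling to avoid numerical underflow):

Code
# Minimal sketch: scaled forward recursion returning the log-likelihood
forward_llik <- function(x, gamma, delta, mu, sigma) {
  a <- delta * dnorm(x[1], mu, sigma)   # alpha_1(j) = delta_j * P(x_1 | C_1 = j)
  llik <- log(sum(a))
  a <- a / sum(a)                       # rescale to prevent underflow
  for (t in 2:length(x)) {
    a <- c(t(a) %*% gamma) * dnorm(x[t], mu, sigma)
    llik <- llik + log(sum(a))
    a <- a / sum(a)
  }
  llik
}
# e.g. forward_llik(rnorm(100), gamma, delta = c(0.5, 0.5), mu = c(-1, 1), sigma = c(1, 2))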

2.6.2 🔁 Backward Algorithm

Used for:

  • Smoothing (estimating past states)
  • EM Algorithm for learning parameters
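
For reference, the backward recursion (standard form; not used directly in the project code below) is

\[ \beta_t(i) = \sum_j \gamma_{ij} \, P(X_{t+1} \mid C_{t+1} = j) \, \beta_{t+1}(j), \qquad \beta_T(i) = 1. \]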

2.7 📊 Visualization

  • 📈 Plot observations with colored hidden states
  • 📊 Show state probabilities over time
  • 📉 Compare model fit using criteria like AIC and BIC across different numbers of states

2.8 📌 Advantages of HMMs

✅ Handles hidden structure in time series
✅ Efficient thanks to Forward-Backward recursion
✅ Works well for sequential or temporal data
✅ Easily extended (e.g., with skew-normal emissions, covariates, etc.)

3 Main Part: Project code

3.1 STOXX600: Analyzing data

Code
sx600 = get.hist.quote(instrument = "^stoxx", start = "2015-01-01", end = "2025-04-30",
                       quote = "AdjClose")
time series starts 2015-01-05
time series ends   2025-04-29
Code
sx600.num = as.numeric(sx600)
plot(sx600)

Code
nts = length(sx600)
sx600.sr = 100*diff(sx600.num)/sx600.num[-nts]
sx600.cr = diff(log(sx600))
plot(sx600.sr, type = "l", ylim = c(-10,10))

Code
hist(sx600.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve2",
     xlab = "Returns")


f=function(x) 0.7*dsn(x, xi=0.5, omega=0.8, alpha=5) +
  0.2*dsn(x, xi=0.6, omega=0.8, alpha=-4) +
  0.1*dsn(x, xi=-1.85, omega=3, alpha=2)
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

Code
set.seed(42)  # For reproducibility

data <- as.numeric(sx600.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  splits[[i]] <- sample(data, size_i, replace = FALSE)  # sample without replacement, but from full data each time
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")
  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params,xi,omega, alpha)
  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i]*100, params["xi"], params["omega"], params["alpha"]))
}
Subset 1 (90% of data): xi = 0.875, omega = 1.358, alpha = -1.378
Subset 2 (70% of data): xi = 0.858, omega = 1.345, alpha = -1.278
Subset 3 (40% of data): xi = 0.923, omega = 1.447, alpha = -1.557
Subset 4 (20% of data): xi = -0.597, omega = 1.255, alpha = 0.809
Code
forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 2:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
  return(results)
}


out = forward_skew_loop(data=as.numeric(sx600.sr), 
                        location.vec = xi, 
                        scale.vec = omega, 
                        shape = alpha)

for (m in 2:4) {
  cat("HMM with", m, "components has log-likelihood:", out[[m]][["llik"]], "\n")
}
HMM with 2 components has log-likelihood: -3755.94 
HMM with 3 components has log-likelihood: -3753.523 
HMM with 4 components has log-likelihood: -3729.03 
Code
# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution: delta solves delta %*% gamma = delta with
  # sum(delta) = 1, equivalently t(diag(m) - gamma + ones) %*% delta = rep(1, m)
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}
[1] 0.8430069 0.1569931
          [,1]      [,2]
[1,] 0.8430069 0.1569931
[1] 0.3455869 0.2265738 0.4278392
          [,1]      [,2]      [,3]
[1,] 0.3455869 0.2265738 0.4278392
[1] 0.1680199 0.2624128 0.2600075 0.3095598
          [,1]      [,2]      [,3]      [,4]
[1,] 0.1680199 0.2624128 0.2600075 0.3095598
Code
llik = function(par, m, data) {
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
  return(llik)
}

llik_vals = numeric(4)
for (i in 2:4) {
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)  
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sx600.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
  }
Log-likelihood for 2 components: -3755.939 
Log-likelihood for 3 components: -3753.525 
Log-likelihood for 4 components: -3729.031 
Code
llik_optim_vals = numeric(4)
llik_optim_pars = list()
for (i in 2:4){
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               # box constraints: transition entries in [0,1]; xi free;
               # omega >= 0.01; alpha in [-10, 10]
               lower=c(rep(0, i^2), rep(-Inf, i), rep(0.01, i), rep(-10, i)),
               upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)),
               control=list(fnscale=-1),  # fnscale = -1 makes optim maximize llik
               data=as.numeric(sx600.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  llik_optim_pars[[i]]=outhmm$par
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}
MAX llik for 2 components: -3358.633 
MAX llik for 3 components: -3284.078 
MAX llik for 4 components: -3274.408 
Code
aic_last_vals = numeric()
for (i in 2:4){
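  # free parameters: i*(i-1) transition probabilities + 3*i emission parameters (xi, omega, alpha)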
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}
AIC for 2 components: 6733.266 
AIC for 3 components: 6598.156 
AIC for 4 components: 6596.815 
Code
best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")

Best model is with 4 components (lowest AIC =  6596.815 )
Code
m=best_model
gamma.mle = matrix(llik_optim_pars[[m]][1:m^2], ncol=m, byrow=TRUE)
gamma.mlev = gamma.mle/apply(gamma.mle, FUN=sum, MAR=1)
gamma.mlev
           [,1]       [,2]        [,3]      [,4]
[1,] 0.93760405 0.05261219 0.009783761 0.0000000
[2,] 0.04427223 0.95572777 0.000000000 0.0000000
[3,] 0.00000000 0.00000000 0.505253711 0.4947463
[4,] 0.08535604 0.00000000 0.442838669 0.4718053
Code
xi = llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)]
cat("\nBest model's location parameter is:", llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)])

Best model's location parameter is: 0.7496635 0.5134708 0.5325302 -0.009874364
Code
omega = llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)]
cat("\nBest model's scale parameter is:", llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)])

Best model's scale parameter is: 1.301072 0.6749959 2.951049 2.074013
Code
alpha = llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)]
cat("\nBest model's shape parameter is:", llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)])    

Best model's shape parameter is: -1.133608 -1.252673 -1.891089 1.360593
Code
llik1 = function(par, m, data) {
  results = list()
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  results=list(llik = llik,
               filtering = filtering,
               pred = pred,
               delta = delta)
  return(results)
}

pars = llik_optim_pars[[m]]
forward_results <- llik1(par = llik_optim_pars[[m]], m = m, data = as.numeric(sx600.sr))
filtering <- forward_results$filtering
most_likely_states <- apply(filtering, 2, which.max)

# Plot histograms by state using only training data
par(mfrow = c(2, 2))  # 2x2 grid for 4 states

for (state in 1:4) {
  # Filter: only training period & only current state
  data_state <- as.numeric(sx600.sr)[most_likely_states == state]
  
  if (length(data_state) == 0) {
    plot.new()
    title(main = paste("No data for State", state))
    next
  }
  
  hist_info <- hist(data_state, breaks = 50, plot = FALSE)  # $density is computed even without plotting
  
  data_min <- min(data_state)
  data_max <- max(data_state)
  range_expansion <- (data_max - data_min) * 0.15
  
  x_min <- data_min - range_expansion
  x_max <- data_max + range_expansion
  x_vals <- seq(x_min, x_max, length.out = 500)
  y_vals <- dsn(x_vals, xi = xi[state], omega = omega[state], alpha = alpha[state])
  
  ylim_max <- max(max(hist_info$density), max(y_vals)) * 1.1
  
  hist(data_state, probability = TRUE, breaks = 50,
       main = paste("Histogram of Returns - State", state),
       xlab = "Log-Return", ylab = "Density",
       col = "lightgray", border = "white",
       xlim = c(x_min, x_max), ylim = c(0, ylim_max))
  
  curve(dsn(x, xi = xi[state], omega = omega[state], alpha = alpha[state]),
        from = x_min, to = x_max, add = TRUE, col = "red", lwd = 2)
}

Code
dates = index(sx600[-1])
sx600.ret = xts(sx600.cr, order.by = dates)
dates_full <- index(sx600.ret)
returns_full <- na.omit(as.numeric(sx600.ret[,1]))



plot_data_train <- data.frame(
  Date = dates_full,
  Return = returns_full,
  State = factor(most_likely_states, levels = 1:4)  # Only 4 states
)

# Plot
ggplot(plot_data_train, aes(x = Date, y = Return, color = State)) +
  geom_point(size = 0.8, alpha = 0.8) +
  scale_color_brewer(palette = "Dark2") +
  labs(title = "STOXX600 Daily Returns Colored by Hidden State (2015–2025)",
       y = "Log-Return",
       x = "Date") +
  theme_minimal()

3.2 SP500: Analyzing data

Code
sp500 = get.hist.quote(instrument = "^gspc", start = "2015-01-01", end = "2025-04-30"
                       , quote = "AdjClose")
time series starts 2015-01-02
time series ends   2025-04-29
Code
sp500.num = as.numeric(sp500)
plot(sp500)

Code
nts = length(sp500)
sp500.sr = 100*diff(sp500.num)/sp500.num[-nts]
sp500.cr = diff(log(sp500))
plot(sp500.sr, type = "l", ylim = c(-10,10))

Code
hist(sp500.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve2",
     xlab = "Returns")


f=function(x) 0.5*dsn(x, xi=0.3, omega=0.6, alpha=5) +
  0.2*dsn(x, xi=0.4, omega=0.8, alpha=-4) +
  0.1*dsn(x, xi=-1, omega=1, alpha=2) +
  0.2*dsn(x, xi=1, omega=1, alpha=2)
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

Code
set.seed(45)  # For reproducibility

data <- as.numeric(sp500.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  splits[[i]] <- sample(data, size_i, replace = FALSE)  # sample without replacement, but from full data each time
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")
  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params,xi,omega, alpha)
  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i]*100, params["xi"], params["omega"], params["alpha"]))
}
Subset 1 (90% of data): xi = 0.795, omega = 1.370, alpha = -0.955
Subset 2 (70% of data): xi = 0.859, omega = 1.419, alpha = -1.052
Subset 3 (40% of data): xi = -0.649, omega = 1.369, alpha = 0.892
Subset 4 (20% of data): xi = 0.908, omega = 1.411, alpha = -1.186
Code
forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 2:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
  return(results)
}

out = forward_skew_loop(data=as.numeric(sp500.sr), 
                        location.vec = xi, 
                        scale.vec = omega, 
                        shape = alpha)


for (m in 2:4) {
  cat("HMM with", m, "components has log-likelihood:", out[[m]][["llik"]], "\n")
}
HMM with 2 components has log-likelihood: -4023.631 
HMM with 3 components has log-likelihood: -3987.416 
HMM with 4 components has log-likelihood: -3971.691 
Code
# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}
[1] 0.5477482 0.4522518
          [,1]      [,2]
[1,] 0.5477482 0.4522518
[1] 0.2035905 0.1775676 0.6188419
          [,1]      [,2]      [,3]
[1,] 0.2035905 0.1775676 0.6188419
[1] 0.2317339 0.3612939 0.2448064 0.1621657
          [,1]      [,2]      [,3]      [,4]
[1,] 0.2317339 0.3612939 0.2448064 0.1621657
Code
llik = function(par, m, data) {
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
  return(llik)
}







llik_vals = numeric(4)
for (i in 2:4) {
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)  
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sp500.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
}
Log-likelihood for 2 components: -4023.632 
Log-likelihood for 3 components: -3987.419 
Log-likelihood for 4 components: -3971.684 
Code
llik_optim_vals = numeric(4)
llik_optim_pars = list()
for (i in 2:4){
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               lower=c(rep(0, i^2), rep(-Inf,i), rep(0.01, i), rep(-10, i)),
               upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)),
               control=list(fnscale=-1),data=as.numeric(sp500.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  llik_optim_pars[[i]]=outhmm$par
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}
MAX llik for 2 components: -3456.351 
MAX llik for 3 components: -3330.375 
MAX llik for 4 components: -3318.017 
Code
aic_last_vals = numeric()
for (i in 2:4){
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}
AIC for 2 components: 6928.701 
AIC for 3 components: 6690.75 
AIC for 4 components: 6684.033 
Code
best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")

Best model is with 4 components (lowest AIC =  6684.033 )
Code
m=best_model
gamma.mle = matrix(llik_optim_pars[[m]][1:m^2], ncol=m, byrow=TRUE)
gamma.mlev = gamma.mle/apply(gamma.mle, FUN=sum, MAR=1)
gamma.mlev
           [,1]       [,2]       [,3]      [,4]
[1,] 0.97373389 0.02626611 0.00000000 0.0000000
[2,] 0.00000000 0.47620583 0.03685692 0.4869373
[3,] 0.00000000 0.00000000 0.90362390 0.0963761
[4,] 0.04014038 0.08207112 0.00000000 0.8777885
Code
xi = llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)]
cat("\nBest model's location parameter is:", llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)])

Best model's location parameter is: 0.3829119 0.2964956 -0.7069928 1.106778
Code
omega = llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)]
cat("\nBest model's scale parameter is:", llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)])

Best model's scale parameter is: 0.5968282 1.692717 3.722713 1.502212
Code
alpha = llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)]
cat("\nBest model's shape parameter is:", llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)])    

Best model's shape parameter is: -0.7264074 -1.438691 0.2471854 -1.268492
Code
llik1 = function(par, m, data) {
  results = list()
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  results=list(llik = llik,
               filtering = filtering,
               pred = pred,
               delta = delta)
  return(results)
}

pars = llik_optim_pars[[m]]
forward_results <- llik1(par = llik_optim_pars[[m]], m = m, data = as.numeric(sp500.sr))
filtering <- forward_results$filtering
most_likely_states <- apply(filtering, 2, which.max)

# Plot histograms by state using only training data
par(mfrow = c(2, 2))  # 2x2 grid for 4 states

for (state in 1:4) {
  # Filter: only training period & only current state
  data_state <- as.numeric(sp500.sr)[most_likely_states == state]
  
  if (length(data_state) == 0) {
    plot.new()
    title(main = paste("No data for State", state))
    next
  }
  
  hist_info <- hist(data_state, breaks = 50, plot = FALSE)  # $density is computed even without plotting
  
  data_min <- min(data_state)
  data_max <- max(data_state)
  range_expansion <- (data_max - data_min) * 0.15
  
  x_min <- data_min - range_expansion
  x_max <- data_max + range_expansion
  x_vals <- seq(x_min, x_max, length.out = 500)
  y_vals <- dsn(x_vals, xi = xi[state], omega = omega[state], alpha = alpha[state])
  
  ylim_max <- max(max(hist_info$density), max(y_vals)) * 1.1
  
  hist(data_state, probability = TRUE, breaks = 50,
       main = paste("Histogram of Returns - State", state),
       xlab = "Log-Return", ylab = "Density",
       col = "lightgray", border = "white",
       xlim = c(x_min, x_max), ylim = c(0, ylim_max))
  
  curve(dsn(x, xi = xi[state], omega = omega[state], alpha = alpha[state]),
        from = x_min, to = x_max, add = TRUE, col = "red", lwd = 2)
}

Code
dates = index(sp500[-1])
sp500.ret = xts(sp500.cr, order.by = dates)
dates_full <- index(sp500.ret)
returns_full <- na.omit(as.numeric(sp500.ret[,1]))


plot_data_train <- data.frame(
  Date = dates_full,
  Return = returns_full,
  State = factor(most_likely_states, levels = 1:4)  # Only 4 states
)

# Plot
ggplot(plot_data_train, aes(x = Date, y = Return, color = State)) +
  geom_point(size = 0.8, alpha = 0.8) +
  scale_color_brewer(palette = "Dark2") +
  labs(title = "SP500 Daily Returns Colored by Hidden State (2015–2025)",
       y = "Log-Return",
       x = "Date") +
  theme_minimal()

Code
par(mfrow = c(1, 1))
stat_distr = forward_results$delta
f=function(x) stat_distr[1]*dsn(x, xi=xi[1], omega=omega[1], alpha=alpha[1]) +
  stat_distr[2]*dsn(x, xi=xi[2], omega=omega[2], alpha=alpha[2]) +
  stat_distr[3]*dsn(x, xi=xi[3], omega=omega[3], alpha=alpha[3]) +
  stat_distr[4]*dsn(x, xi=xi[4], omega=omega[4], alpha=alpha[4])
cat("\nBest model's stationary distribution is:", stat_distr)

Best model's stationary distribution is: 0.5360731 0.08184454 0.03129964 0.3507828
Code
hist(sp500.sr,
     breaks = 100,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve, stationary distribution, optimized parameters",
     xlab = "Returns")
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

3.3 NIKKEI225: Analyzing data

Code
nikkei = get.hist.quote(instrument = "^N225", start = "2015-01-01", end = "2025-05-30"
                       , quote = "AdjClose")
Warning: ^N225 contains missing values. Some functions will not work if objects
contain missing values in the middle of the series. Consider using na.omit(),
na.approx(), na.fill(), etc to remove or replace them.
time series starts 2015-01-05
Code
nikkei = na.omit(nikkei)
nikkei.num = as.numeric(nikkei)
plot(nikkei)

Code
nts = length(nikkei)
nikkei.sr = 100*diff(nikkei.num)/nikkei.num[-nts]
nikkei.cr = diff(log(nikkei))
plot(nikkei.sr, type = "l", ylim = c(-10,10))

Code
hist(nikkei.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram of NIKKEI index",
     xlab = "Returns")

Code
set.seed(36)  # For reproducibility

data <- as.numeric(nikkei.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  splits[[i]] <- sample(data, size_i, replace = FALSE)  # sample without replacement, but from full data each time
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")
  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params,xi,omega, alpha)
  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i]*100, params["xi"], params["omega"], params["alpha"]))
}
Subset 1 (90% of data): xi = 0.851, omega = 1.567, alpha = -0.876
Subset 2 (70% of data): xi = 0.856, omega = 1.510, alpha = -0.945
Subset 3 (40% of data): xi = 1.032, omega = 1.639, alpha = -1.162
Subset 4 (20% of data): xi = -0.887, omega = 1.628, alpha = 0.915
Code
forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 2:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
  return(results)
}

out = forward_skew_loop(data=as.numeric(nikkei.sr), 
                        location.vec = xi, 
                        scale.vec = omega, 
                        shape = alpha)


for (m in 2:4) {
  cat("HMM with", m, "components has log-likelihood:", out[[m]][["llik"]], "\n")
}
HMM with 2 components has log-likelihood: -4304.308 
HMM with 3 components has log-likelihood: -4300.9 
HMM with 4 components has log-likelihood: -4269.035 
Code
# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}
[1] 0.6661874 0.3338126
          [,1]      [,2]
[1,] 0.6661874 0.3338126
[1] 0.3609897 0.3303818 0.3086285
          [,1]      [,2]      [,3]
[1,] 0.3609897 0.3303818 0.3086285
[1] 0.2717139 0.3431659 0.2318416 0.1532786
          [,1]      [,2]      [,3]      [,4]
[1,] 0.2717139 0.3431659 0.2318416 0.1532786
Code
llik = function(par, m, data) {
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
  return(llik)
}




llik_vals = numeric(4)
for (i in 2:4) {
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)  
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(nikkei.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
}
Log-likelihood for 2 components: -4304.277 
Log-likelihood for 3 components: -4300.898 
Log-likelihood for 4 components: -4269.052 
Code
llik_optim_vals = numeric(4)
llik_optim_pars = list()
for (i in 2:4){
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               lower=c(rep(0.0000000001, i^2), rep(-Inf,i), rep(0.01, i), rep(-10, i)),
               upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)),
               control=list(fnscale=-1),data=as.numeric(nikkei.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  llik_optim_pars[[i]]=outhmm$par
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}
MAX llik for 2 components: -3983.133 
MAX llik for 3 components: -3945.825 
MAX llik for 4 components: -3916.983 
Code
aic_last_vals = numeric()
for (i in 2:4){
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}
AIC for 2 components: 7982.266 
AIC for 3 components: 7921.651 
AIC for 4 components: 7881.967 
Code
best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")

Best model is with 4 components (lowest AIC =  7881.967 )
Code
m=best_model
gamma.mle = matrix(llik_optim_pars[[m]][1:m^2], ncol=m, byrow=TRUE)
gamma.mlev = gamma.mle/apply(gamma.mle, FUN=sum, MAR=1)
gamma.mlev
             [,1]         [,2]         [,3]         [,4]
[1,] 9.274162e-01 1.046761e-10 5.896843e-02 1.361532e-02
[2,] 1.300097e-01 6.558046e-01 2.141857e-01 8.237048e-11
[3,] 8.481752e-11 5.934721e-01 4.065279e-01 8.481752e-11
[4,] 1.024942e-01 8.978791e-11 8.978791e-11 8.975058e-01
Code
xi = llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)]
omega = llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)]
alpha = llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)]
cat("\nBest model's location parameter is:", llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)],
    "\nBest model's shape parameter is:", llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)],
    "\nBest model's scale parameter is:", llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)])    

Best model's location parameter is: 0.873272 0.3671296 1.264923 -0.8585419 
Best model's shape parameter is: -1.091009 -1.500902 -1.264446 0.2061429 
Best model's scale parameter is: 1.527224 0.6268233 1.078457 3.169442
Code
llik1 = function(par, m, data) {
  results = list()
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  results=list(llik = llik,
               filtering = filtering,
               pred = pred,
               delta = delta)
  return(results)
}

pars = llik_optim_pars[[m]]
forward_results <- llik1(par = llik_optim_pars[[m]], m = m, data = as.numeric(nikkei.sr))
filtering <- forward_results$filtering
most_likely_states <- apply(filtering, 2, which.max)

# Plot histograms by state using only training data
par(mfrow = c(2, 2))  # 2x2 grid for 4 states

for (state in 1:4) {
  # Filter: only training period & only current state
  data_state <- as.numeric(nikkei.sr)[most_likely_states == state]
  
  if (length(data_state) == 0) {
    plot.new()
    title(main = paste("No data for State", state))
    next
  }
  
  hist_info <- hist(data_state, breaks = 50, plot = FALSE)  # $density is computed even without plotting
  
  data_min <- min(data_state)
  data_max <- max(data_state)
  range_expansion <- (data_max - data_min) * 0.15
  
  x_min <- data_min - range_expansion
  x_max <- data_max + range_expansion
  x_vals <- seq(x_min, x_max, length.out = 500)
  y_vals <- dsn(x_vals, xi = xi[state], omega = omega[state], alpha = alpha[state])
  
  ylim_max <- max(max(hist_info$density), max(y_vals)) * 1.1
  
  hist(data_state, probability = TRUE,  breaks = 50,
       main = paste("Histogram of Returns - State", state),
       xlab = "Log-Return", ylab = "Density",
       col = "lightgray", border = "white",
       xlim = c(x_min, x_max), ylim = c(0, ylim_max))
  
  curve(dsn(x, xi = xi[state], omega = omega[state], alpha = alpha[state]),
        from = x_min, to = x_max, add = TRUE, col = "red", lwd = 2)
}

Code
dates = index(nikkei[-1])
nikkei.ret = xts(nikkei.cr, order.by = dates)
dates_full <- index(nikkei.ret)
returns_full <- na.omit(as.numeric(nikkei.ret[,1]))


plot_data_train <- data.frame(
  Date = dates_full,
  Return = returns_full,
  State = factor(most_likely_states, levels = 1:4)  # Only 4 states
)

# Plot
ggplot(plot_data_train, aes(x = Date, y = Return, color = State)) +
  geom_point(size = 0.8, alpha = 0.8) +
  scale_color_brewer(palette = "Dark2") +
  labs(title = "NIKKEI Daily Returns Colored by Hidden State (2015–2025)",
       y = "Log-Return",
       x = "Date") +
  theme_minimal()

4 Fitting the GEV distribution to the monthly maxima and minima

4.1 Generalized Extreme Value (GEV) Distribution — Theory

The GEV distribution is designed to model the extremes (either maxima or minima) of a dataset. Instead of modeling the entire distribution, it focuses only on tail events — e.g., extreme market crashes or spikes.

4.1.1 GEV CDF (Simplified)

\[F(x) = \exp\left\{ - \left[ 1 + \xi \left( \frac{x - \mu}{\sigma} \right) \right]^{-1/\xi} \right\} \quad \text{for } 1 + \xi \left( \frac{x - \mu}{\sigma} \right) > 0 \]

Where:

  • \(\mu\): Location parameter (shifts the distribution)
  • \(\sigma > 0\): Scale parameter (controls spread)
  • \(\xi\): Shape parameter (determines the tail behavior)
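
A minimal sketch (arbitrary parameter values) implementing this CDF directly in R:

Code
# GEV CDF from the formula above; points outside the support get 0 or 1
# (as xi -> 0 the Gumbel limit exp(-exp(-(x - mu)/sigma)) applies)
pgev_manual <- function(x, mu, sigma, xi) {
  z <- 1 + xi * (x - mu) / sigma
  ifelse(z > 0, exp(-z^(-1/xi)), ifelse(xi > 0, 0, 1))
}
pgev_manual(c(0, 1, 2), mu = 0, sigma = 1, xi = 0.1)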

4.1.2 GEV Types Based on Shape Parameter

| Type | ξ (Shape) | Tail Behavior |
|---|---|---|
| Fréchet | ξ > 0 | Heavy-tailed |
| Gumbel | ξ = 0 | Light-tailed |
| Weibull | ξ < 0 | Bounded upper tail |

4.2 Why Use GEV in Financial Time Series?

Financial markets occasionally exhibit extreme behavior (e.g., crashes, spikes). While the HMM captures regime-switching behavior over time, it does not directly model the most extreme values within each state.

By fitting a GEV distribution to monthly maxima and minima, we gain:

  • Insight into tail risk,
  • Better understanding of risk under each regime,
  • Tools to estimate return levels (e.g., “What’s the largest monthly drop we expect every 10 years?”), as sketched below.
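
A minimal sketch of that last point, using the standard GEV return-level formula; `fit` is assumed to be an object returned by ismev::gev.fit, as in the fits below:

Code
# m-observation return level z_m: exceeded on average once every m observations
# (for monthly maxima, m = 120 corresponds roughly to a 10-year level)
return_level <- function(fit, m) {
  p <- fit$mle                           # c(location mu, scale sigma, shape xi)
  mu <- p[1]; sigma <- p[2]; xi <- p[3]
  y <- -log(1 - 1/m)
  if (abs(xi) > 1e-8) mu + (sigma / xi) * (y^(-xi) - 1)
  else mu - sigma * log(y)
}
# e.g. return_level(gev_max_fit, m = 120)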

4.3 Monthly analysis of STOXX600

Code
sx600 = get.hist.quote(instrument = "^stoxx", start = "2015-01-01", end = "2025-04-30",
                       quote = "AdjClose")
time series starts 2015-01-05
time series ends   2025-04-29
Code
# Ensure column name is standardized
colnames(sx600) <- "AdjClose"

# Convert to a data frame with an explicit date column
sx600_df <- data.frame(
  date = index(sx600),
  price = as.numeric(sx600$AdjClose)  # extract as numeric
)

# Add a "month" column
sx600_df <- sx600_df %>%
  mutate(month = floor_date(date, "month"))

# Compute monthly maxima and minima
monthly_extremes <- sx600_df %>%
  group_by(month) %>%
  summarise(
    monthly_max = max(price, na.rm = TRUE),
    monthly_min = min(price, na.rm = TRUE),
    .groups = "drop"
  )

# View result
head(monthly_extremes)
# A tibble: 6 × 3
  month      monthly_max monthly_min
  <date>           <dbl>       <dbl>
1 2015-01-01        372.        332.
2 2015-02-01        392.        367.
3 2015-03-01        404.        388.
4 2015-04-01        414.        396.
5 2015-05-01        409.        389.
6 2015-06-01        401.        381.
Code
library(ggplot2)
Code
ggplot(monthly_extremes, aes(x = month)) +
  geom_ribbon(aes(ymin = monthly_min, ymax = monthly_max), fill = "skyblue", alpha = 0.3) +
  geom_line(aes(y = monthly_max), color = "blue", linewidth = 1) +
  geom_line(aes(y = monthly_min), color = "red", linewidth = 1) +
  labs(title = "Monthly Price Range of STOXX600",
       x = "Month",
       y = "Adjusted Close Price") +
  theme_minimal()

Code
library(ggplot2)
library(tidyr)

# Reshape data for easy plotting
extreme_long <- monthly_extremes %>%
  pivot_longer(cols = c(monthly_max, monthly_min),
               names_to = "type", values_to = "value")

# Plot histograms
ggplot(extreme_long, aes(x = value, fill = type)) +
  geom_histogram(alpha = 0.6, bins = 15, position = "identity", color = "white") +
  facet_wrap(~ type, scales = "free") +
  scale_fill_manual(values = c("monthly_max" = "blue", "monthly_min" = "red"),
                    labels = c("Monthly Maxima", "Monthly Minima")) +
  labs(title = "Histogram of Monthly Maxima and Minima",
       x = "Adjusted Close Price",
       y = "Frequency",
       fill = "Type") +
  theme_minimal()

Code
#install.packages("ismev")
# Load the required package
library(ismev)
Loading required package: mgcv
Loading required package: nlme

Attaching package: 'nlme'
The following object is masked from 'package:dplyr':

    collapse
This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
Code
# Fit GEV to monthly maxima
gev_max_fit <- gev.fit(monthly_extremes$monthly_max)
$conv
[1] 0

$nllh
[1] 665.5927

$mle
[1] 394.70275481  42.54372301   0.05627224

$se
[1] 4.59869221 3.55441103 0.09788299
Code
# Fit GEV to negative of monthly minima (to treat minima as maxima)
gev_min_fit <- gev.fit(-monthly_extremes$monthly_min)  # Flip sign for minima
$conv
[1] 0

$nllh
[1] 671.8119

$mle
[1] -415.9909219   59.4110871   -0.4117731

$se
[1] 5.71413360 4.10154189 0.04101183
Code
gev.profxi(gev_max_fit, xlow=-0.15, xup=0.35, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.prof(gev_max_fit, m=60,xlow=530, xup=750, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.diag(gev_max_fit)

Code
gev.profxi(gev_min_fit, xlow=-0.5, xup=-0.3, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.prof(gev_min_fit, m=60,xlow=-310, xup=-280, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.diag(gev_min_fit)

4.4 Monthly analysis of SP500

Code
sp500 = get.hist.quote(instrument = "^gspc", start = "2015-01-01", end = "2025-04-30"
                       , quote = "AdjClose")
time series starts 2015-01-02
time series ends   2025-04-29
Code
# Ensure column name is standardized
colnames(sp500) <- "AdjClose"

# Convert to a data frame with an explicit date column
sp500_df <- data.frame(
  date = index(sp500),
  price = as.numeric(sp500$AdjClose)  # extract as numeric
)

# Add a "month" column
sp500_df <- sp500_df %>%
  mutate(month = floor_date(date, "month"))

# Compute monthly maxima and minima
monthly_extremes <- sp500_df %>%
  group_by(month) %>%
  summarise(
    monthly_max = max(price, na.rm = TRUE),
    monthly_min = min(price, na.rm = TRUE),
    .groups = "drop"
  )

# View result
head(monthly_extremes)
# A tibble: 6 × 3
  month      monthly_max monthly_min
  <date>           <dbl>       <dbl>
1 2015-01-01       2063.       1993.
2 2015-02-01       2115.       2021.
3 2015-03-01       2117.       2040.
4 2015-04-01       2118.       2060.
5 2015-05-01       2131.       2080.
6 2015-06-01       2124.       2058.
Code
ggplot(monthly_extremes, aes(x = month)) +
  geom_ribbon(aes(ymin = monthly_min, ymax = monthly_max), fill = "skyblue", alpha = 0.3) +
  geom_line(aes(y = monthly_max), color = "blue", linewidth = 1) +
  geom_line(aes(y = monthly_min), color = "red", linewidth = 1) +
  labs(title = "Monthly Price Range of SP500",
       x = "Month",
       y = "Adjusted Close Price") +
  theme_minimal()

Code
# Reshape data for easy plotting
extreme_long <- monthly_extremes %>%
  pivot_longer(cols = c(monthly_max, monthly_min),
               names_to = "type", values_to = "value")

# Plot histograms
ggplot(extreme_long, aes(x = value, fill = type)) +
  geom_histogram(alpha = 0.6, bins = 15, position = "identity", color = "white") +
  facet_wrap(~ type, scales = "free") +
  scale_fill_manual(values = c("monthly_max" = "blue", "monthly_min" = "red"),
                    labels = c("Monthly Maxima", "Monthly Minima")) +
  labs(title = "Histogram of Monthly Maxima and Minima",
       x = "Adjusted Close Price",
       y = "Frequency",
       fill = "Type") +
  theme_minimal()

Code
# Fit GEV to monthly maxima
gev_max_fit <- gev.fit(monthly_extremes$monthly_max)
$conv
[1] 0

$nllh
[1] 1045.977

$mle
[1] 2.929553e+03 9.041336e+02 7.576205e-02

$se
[1] 108.3372738  88.0623152   0.1358157
Code
# Fit GEV to negative of monthly minima (to treat minima as maxima)
gev_min_fit <- gev.fit(-monthly_extremes$monthly_min)  # Flip sign for minima
$conv
[1] 0

$nllh
[1] 1026.645

$mle
[1] -3426.8819235  1279.5105503    -0.7963211

$se
[1] 119.37781148 106.58968288   0.05076008
Code
gev.profxi(gev_max_fit, xlow=-0.25, xup=0.5, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.prof(gev_max_fit, m=60,xlow=5900, xup=11100, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.diag(gev_max_fit)

Code
gev.profxi(gev_min_fit, xlow=-1.2, xup=-0.6, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.prof(gev_min_fit, m=60,xlow=-1920, xup=-1800, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.diag(gev_min_fit)

4.5 Monthly analysis of NIKKEI225

Code
nikkei = get.hist.quote(instrument = "^N225", start = "2015-01-01", end = "2025-05-30"
                       , quote = "AdjClose")
Warning: ^N225 contains missing values. Some functions will not work if objects
contain missing values in the middle of the series. Consider using na.omit(),
na.approx(), na.fill(), etc to remove or replace them.
time series starts 2015-01-05
Code
nikkei = na.omit(nikkei)
# Ensure column name is standardized
colnames(nikkei) <- "AdjClose"

# Convert to a data frame with an explicit date column
nikkei_df <- data.frame(
  date = index(nikkei),
  price = as.numeric(nikkei$AdjClose)  # extract as numeric
)

# Add a "month" column
nikkei_df <- nikkei_df %>%
  mutate(month = floor_date(date, "month"))

# Compute monthly maxima and minima
monthly_extremes <- nikkei_df %>%
  group_by(month) %>%
  summarise(
    monthly_max = max(price, na.rm = TRUE),
    monthly_min = min(price, na.rm = TRUE),
    .groups = "drop"
  )

# View result
head(monthly_extremes)
# A tibble: 6 × 3
  month      monthly_max monthly_min
  <date>           <dbl>       <dbl>
1 2015-01-01      17796.      16796.
2 2015-02-01      18798.      17336.
3 2015-03-01      19754.      18665.
4 2015-04-01      20188.      19035.
5 2015-05-01      20563.      19292.
6 2015-06-01      20868.      19991.
Code
ggplot(monthly_extremes, aes(x = month)) +
  geom_ribbon(aes(ymin = monthly_min, ymax = monthly_max), fill = "skyblue", alpha = 0.3) +
  geom_line(aes(y = monthly_max), color = "blue", linewidth = 1) +
  geom_line(aes(y = monthly_min), color = "red", linewidth = 1) +
  labs(title = "Monthly Price Range of NIKKEI225",
       x = "Month",
       y = "Adjusted Close Price") +
  theme_minimal()

Code
# Reshape data for easy plotting
extreme_long <- monthly_extremes %>%
  pivot_longer(cols = c(monthly_max, monthly_min),
               names_to = "type", values_to = "value")

# Plot histograms
ggplot(extreme_long, aes(x = value, fill = type)) +
  geom_histogram(alpha = 0.6, bins = 15, position = "identity", color = "white") +
  facet_wrap(~ type, scales = "free") +
  scale_fill_manual(values = c("monthly_max" = "blue", "monthly_min" = "red"),
                    labels = c("Monthly Maxima", "Monthly Minima")) +
  labs(title = "Histogram of Monthly Maxima and Minima",
       x = "Adjusted Close Price",
       y = "Frequency",
       fill = "Type") +
  theme_minimal()

Code
# Fit GEV to monthly maxima
gev_max_fit <- gev.fit(monthly_extremes$monthly_max)
$conv
[1] 0

$nllh
[1] 1267.901

$mle
[1] 2.256051e+04 4.801743e+03 1.486045e-01

$se
[1] 513.0572819 407.7429910   0.1003568
Code
# Fit GEV to negative of monthly minima (to treat minima as maxima)
gev_min_fit <- gev.fit(-monthly_extremes$monthly_min)  # Flip sign for minima
$conv
[1] 0

$nllh
[1] 1260.507

$mle
[1] -2.632558e+04  6.658697e+03 -5.531426e-01

$se
[1] 682.20722370 474.01405360   0.05680979
Code
gev.profxi(gev_max_fit, xlow=-0.1, xup=0.4, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.prof(gev_max_fit, m=60,xlow=40000, xup=70000, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.diag(gev_max_fit)

Code
gev.profxi(gev_min_fit, xlow=-1, xup=-0.4, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.prof(gev_min_fit, m=60,xlow=-16000, xup=-14000, conf = 0.95)
If routine fails, try changing plotting interval

Code
gev.diag(gev_min_fit)