1 SKEW-NORMAL DISTRIBUTION

1.1 🧠 What is the Skew-Normal Distribution?

The skew-normal distribution is a generalization of the normal (Gaussian) distribution that allows for asymmetry (skewness). While the standard normal distribution is symmetric around the mean, the skew-normal can lean to the left or right, depending on the shape parameter.

1.2 📈 Probability Density Function (PDF)

The probability density function (PDF) of the skew-normal distribution is:

\[ f(x; \xi, \omega, \alpha) = \frac{2}{\omega} \, \phi\left( \frac{x - \xi}{\omega} \right) \, \Phi\left( \alpha \cdot \frac{x - \xi}{\omega} \right) \]

Where:

\(\phi(\cdot)\) : Standard normal PDF
\(\Phi(\cdot)\) : Standard normal CDF
\(\xi\) : Location parameter (like the mean)
\(\omega\) > 0 : Scale parameter (like the standard deviation)
\(\alpha\) : Shape parameter (controls skewness)

1.3 🔁 Special Cases

If ( \(\alpha\) = 0 ): It becomes a standard normal distribution.
If ( \(\alpha\) > 0 ): The distribution is right-skewed (tail leans right).
If ( \(\alpha\) < 0 ): The distribution is left-skewed (tail leans left).

1.4 🔍 Difference Between Normal and Skew-Normal Distributions

The normal distribution is a special case of the skew-normal distribution. Here’s how they differ:

Feature	Normal Distribution	Skew-Normal Distribution
Shape	Symmetrical (bell curve)	Asymmetrical (can lean left or right)
Skewness	Exactly 0	Controlled by a shape parameter ( \(\alpha\) )
PDF	Based on the standard normal function	Involves both the standard normal PDF and CDF
Parameters	Mean ( \(\mu\) ), SD ( \(\sigma\) )	Location ( \(\xi\) ), Scale ( \(\omega\) ), Shape ( \(\alpha\) )
Flexibility	Limited to symmetry	More flexible; can model skewed data
Special Case	—	Normal dist. when ( \(\alpha\) = 0 )

In summary, the skew-normal adds an extra parameter to capture asymmetry, making it more adaptable for real-world data that often deviates from perfect symmetry.

Code

# Set seed for reproducibility
set.seed(123)

# Generate 1000 random values from a normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)

# Plot histogram with density instead of frequency
hist(normal_data,
     breaks = 30,
     col = "skyblue",
     border = "white",
     probability = TRUE,
     main = "Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add a normal density curve
curve(dnorm(x, mean = mean(normal_data), sd = sd(normal_data)),
      col = "darkblue",
      lwd = 2,
      add = TRUE)

Code

library(sn)
library(tseries)
library(xts)
library(ggplot2)
library(ggthemes)
library(e1071)
library(PerformanceAnalytics)
library(patchwork)
library(scales)
# Set seed for reproducibility
set.seed(123)
# Generate 1000 random values from a skew-normal distribution
# Parameters: location (xi), scale (omega), shape (alpha)
skew_data <- rsn(1000, xi = 0, omega = 1, alpha = 5)

# Plot histogram with density
hist(skew_data,
     breaks = 30,
     col = "lightgreen",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = 5),
      col = "darkgreen",
      lwd = 2,
      add = TRUE)

Code

# Set seed for reproducibility
set.seed(123)

# Generate data from skew-normal distribution with alpha = -5
skew_data_neg <- rsn(n = 1000, xi = 0, omega = 1, alpha = -5)

# Plot histogram with density
hist(skew_data_neg,
     breaks = 30,
     col = "salmon",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution (α = -5)",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = -5),
      col = "darkred",
      lwd = 2,
      add = TRUE)

2 HMM (Hidden Markov Models)

2.1 📦 What is a Hidden Markov Model?

A Hidden Markov Model (HMM) is a statistical model where the system being modeled is assumed to follow a Markov process with unobserved (hidden) states.

Note

In plain terms: - There’s a process that moves between hidden states over time. - Each hidden state emits an observable output. - You see the outputs, but you don’t see the states.

Warning

Finance : Market regime switching: bull - bear market.

2.2 🧩 HMM Components

A Hidden Markov Model (HMM) is defined by the following key elements:

2.2.1 🔒 Hidden States

Denoted as:

\[ C_1, C_2, \dots, C_T \]

A finite set of unobserved states (e.g., calm/volatile, sunny/rainy).
The state at time ( t ), denoted ( C_t ), follows a Markov chain:

\[ P(C_t \mid C_{t-1}) = \text{transition probability} \]

This means the current state only depends on the previous state, not the full history.

2.2.2 📈 Observations

Denoted as:

\[ X_1, X_2, \dots, X_T \]

Each state emits data based on a specific probability distribution.
The observation at time ( t ), ( X_t ), depends only on the hidden state at that time:

\[ P(X_t \mid C_t) \]

These are sometimes called the emission probabilities.

2.2.3 🎲 Initial Probabilities

The model starts with an initial distribution over the hidden states:

\[ \delta = P(C_1) \]

This represents the probability of starting in each possible hidden state.

2.3 🔁 Markov Property

The hidden states follow the first-order Markov assumption:

\[ P(C_t \mid C_{1:t-1}) = P(C_t \mid C_{t-1}) \]

Also, the current observation depends only on the current state:

\[ P(X_t \mid C_{1:T}, X_{1:t-1}) = P(X_t \mid C_t) \]

2.4 ⚙️ Parameters of an HMM

Transition matrix ( ):

\[ \Gamma = \begin{bmatrix} P(C_{t+1} = 1 \mid C_t = 1) & \cdots & \cdots & P(C_{t+1} = m \mid C_t = m) \end{bmatrix} \]

Emission probabilities (depends on the application):

Normal: \(\mathcal{N}(\mu_i, \sigma_i^2)\)
Poisson: \(\ \text{Pois}(\lambda_i)\)
Skew-normal, etc.

Initial distribution:
\(\ \delta = P(C_1 = i)\)

2.5 📈 What You Can Do With HMMs

Task	What It Means
Likelihood evaluation	Compute \(\\P(X_1, \dots, X_T)\)
State estimation	Infer the most likely hidden states
Parameter estimation	Learn transition & emission parameters
Forecasting	Predict future observations or states

2.6 📐 Key Algorithms

2.6.1 🔄 Forward Algorithm

Efficiently computes the likelihood of observed data using recursion:

\[ \alpha_t(j) = \sum_i \alpha_{t-1}(i) \cdot \gamma_{ij} \cdot P(X_t \mid C_t = j) \]

Where:

\(\alpha_t(j)\): probability of partial sequence ending in state \(\\j\)
\(\gamma_{ij}\): transition probability from state \(\\i\) to \(\\j\)
\(\\P(X_t \mid C_t = j)\): emission probability at time \(\\t\)

2.6.2 🔁 Backward Algorithm

Used for:

Smoothing (estimating past states)
EM Algorithm for learning parameters

2.7 📊 Visualization

📈 Plot observations with colored hidden states
📊 Show state probabilities over time
📉 Compare model fit using criteria like AIC and BIC across different numbers of states

2.8 📌 Advantages of HMMs

✅ Handles hidden structure in time series
✅ Efficient thanks to Forward-Backward recursion
✅ Works well for sequential or temporal data
✅ Easily extended (e.g., with skew-normal emissions, covariates, etc.)

3 Main Part —> Project (1)

3.1 SX600 —> Analyzing data

Code

sx600 = get.hist.quote(instrument = "^stoxx", start = "2015-01-01", end = "2025-04-30",
                       quote = "AdjClose")

time series starts 2015-01-05
time series ends   2025-04-29

Code

sx600.num = as.numeric(sx600)
plot(sx600)

Code

nts = length(sx600)
sx600.sr = 100*diff(sx600.num)/sx600.num[-nts]
sx600.cr = diff(log(sx600))
plot(sx600.sr, type = "l", ylim = c(-10,10))

Code

hist(sx600.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve2",
     xlab = "Returns")


f=function(x) 0.7*dsn(x,x=0.5,omega=0.8, alpha = 5)+ 0.2*dsn(x,x=0.6,omega=0.8, alpha = -4)+
+ 0.1*dsn(x,x=-1.85,omega=3, alpha = 2)
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

Code

set.seed(42)  # For reproducibility

data <- as.numeric(sx600.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  splits[[i]] <- sample(data, size_i, replace = FALSE)  # sample without replacement, but from full data each time
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")
  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params,xi,omega, alpha)
  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i]*100, params["xi"], params["omega"], params["alpha"]))
}

Subset 1 (90% of data): xi = 0.875, omega = 1.358, alpha = -1.378
Subset 2 (70% of data): xi = 0.858, omega = 1.345, alpha = -1.278
Subset 3 (40% of data): xi = 0.923, omega = 1.447, alpha = -1.557
Subset 4 (20% of data): xi = -0.597, omega = 1.255, alpha = 0.809

Code

forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 2:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
    return(results)
}


out = forward_skew_loop(data=as.numeric(sx600.sr), 
                        location.vec = xi, 
                        scale.vec = omega, 
                        shape = alpha)

for (m in 2:4) {
  cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n")
}

HMM with 2 components has likelihood value: -3755.94 
HMM with 3 components has likelihood value: -3753.523 
HMM with 4 components has likelihood value: -3729.03

Code

# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}

[1] 0.8430069 0.1569931
          [,1]      [,2]
[1,] 0.8430069 0.1569931
[1] 0.3455869 0.2265738 0.4278392
          [,1]      [,2]      [,3]
[1,] 0.3455869 0.2265738 0.4278392
[1] 0.1680199 0.2624128 0.2600075 0.3095598
          [,1]      [,2]      [,3]      [,4]
[1,] 0.1680199 0.2624128 0.2600075 0.3095598

Code

llik = function(par, m, data) {
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
  return(llik)
}

llik_vals = numeric(4)
for (i in 2:4) {
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)  
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sx600.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
  }

Log-likelihood for 2 components: -3755.939 
Log-likelihood for 3 components: -3753.525 
Log-likelihood for 4 components: -3729.031

Code

llik_optim_vals = numeric(4)
llik_optim_pars = list()
for (i in 2:4){
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               lower=c(rep(0, i^2), rep(-Inf, i), rep(0.01, m), rep(-10, i)),
               upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)),
               control=list(fnscale=-1),data=as.numeric(sx600.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  llik_optim_pars[[i]]=outhmm$par
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}

MAX llik for 2 components: -3358.633 
MAX llik for 3 components: -3284.078 
MAX llik for 4 components: -3274.408

Code

aic_last_vals = numeric()
for (i in 2:4){
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}

AIC for 2 components: 6733.266 
AIC for 3 components: 6598.156 
AIC for 4 components: 6596.815

Code

best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")


Best model is with 4 components (lowest AIC =  6596.815 )

Code

m=best_model
gamma.mle = matrix(llik_optim_pars[[m]][1:m^2], ncol=m, byrow=TRUE)
gamma.mlev = gamma.mle/apply(gamma.mle, FUN=sum, MAR=1)
gamma.mlev

           [,1]       [,2]        [,3]      [,4]
[1,] 0.93760405 0.05261219 0.009783761 0.0000000
[2,] 0.04427223 0.95572777 0.000000000 0.0000000
[3,] 0.00000000 0.00000000 0.505253711 0.4947463
[4,] 0.08535604 0.00000000 0.442838669 0.4718053

Code

xi = llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)]
cat("\nBest model's location parameter is:", llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)])


Best model's location parameter is: 0.7496635 0.5134708 0.5325302 -0.009874364

Code

omega = llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)]
cat("\nBest model's scale parameter is:", llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)])


Best model's scale parameter is: 1.301072 0.6749959 2.951049 2.074013

Code

alpha = llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)]
cat("\nBest model's shape parameter is:", llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)])


Best model's shape parameter is: -1.133608 -1.252673 -1.891089 1.360593

Code

llik1 = function(par, m, data) {
  results = list()
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  results=list(llik = llik,
               filtering = filtering,
               pred = pred,
               delta = delta)
  return(results)
}

pars = llik_optim_pars[[m]]
forward_results <- llik1(par = llik_optim_pars[[m]], m = m, data = as.numeric(sx600.sr))
filtering <- forward_results$filtering
most_likely_states <- apply(filtering, 2, which.max)

# Plot histograms by state using only training data
par(mfrow = c(2, 2))  # 2x2 grid for 4 states

for (state in 1:4) {
  # Filter: only training period & only current state
  data_state <- as.numeric(sx600.sr)[most_likely_states == state]
  
  if (length(data_state) == 0) {
    plot.new()
    title(main = paste("No data for State", state))
    next
  }
  
  hist_info <- hist(data_state,  probability = TRUE, breaks = 50, plot = FALSE)
  
  data_min <- min(data_state)
  data_max <- max(data_state)
  range_expansion <- (data_max - data_min) * 0.15
  
  x_min <- data_min - range_expansion
  x_max <- data_max + range_expansion
  x_vals <- seq(x_min, x_max, length.out = 500)
  y_vals <- dsn(x_vals, xi = xi[state], omega = omega[state], alpha = alpha[state])
  
  ylim_max <- max(max(hist_info$density), max(y_vals)) * 1.1
  
  hist(data_state, probability = TRUE, breaks = 50,
       main = paste("Histogram of Returns - State", state),
       xlab = "Log-Return", ylab = "Density",
       col = "lightgray", border = "white",
       xlim = c(x_min, x_max), ylim = c(0, ylim_max))
  
  curve(dsn(x, xi = xi[state], omega = omega[state], alpha = alpha[state]),
        from = x_min, to = x_max, add = TRUE, col = "red", lwd = 2)
}

Warning in hist.default(data_state, probability = TRUE, breaks = 50, plot =
FALSE): аргумент 'probability' не для использования
Warning in hist.default(data_state, probability = TRUE, breaks = 50, plot =
FALSE): аргумент 'probability' не для использования
Warning in hist.default(data_state, probability = TRUE, breaks = 50, plot =
FALSE): аргумент 'probability' не для использования
Warning in hist.default(data_state, probability = TRUE, breaks = 50, plot =
FALSE): аргумент 'probability' не для использования

Code

dates = index(sx600[-1])
sx600.ret = xts(sx600.cr, order.by = dates)
dates_full <- index(sx600.ret)
returns_full <- na.omit(as.numeric(sx600.ret[,1]))



plot_data_train <- data.frame(
  Date = dates_full,
  Return = returns_full,
  State = factor(most_likely_states, levels = 1:4)  # Only 4 states
)

# Step 3: Plot
ggplot(plot_data_train, aes(x = Date, y = Return, color = State)) +
  geom_point(size = 0.8, alpha = 0.8) +
  scale_color_brewer(palette = "Dark2") +
  labs(title = "STOXX600 Daily Returns Colored by Hidden State (2015–2025)",
       y = "Log-Return",
       x = "Date") +
  theme_minimal()

Code

par(mfrow = c(1, 1))
stat_distr = forward_results$delta
f=function(x) stat_distr[1]*dsn(x,x=xi[1],omega=omega[1], alpha = alpha[1])+ stat_distr[2]*dsn(x,x=xi[2],omega=omega[2], alpha =alpha[2])+
+ stat_distr[3]*dsn(x,x=xi[3],omega=omega[3], alpha = alpha[3])+
  stat_distr[4]*dsn(x,x=xi[4],omega=omega[4], alpha = alpha[4])
cat("\nBest model's stationary distribution is:", stat_distr)


Best model's stationary distribution is: 0.4123075 0.4899776 0.05045501 0.0472599

Code

hist(sx600.sr,
     breaks = 100,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve, stationary distribution, optimized parameters",
     xlab = "Returns")
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

3.2 SP500 —> Analyzing data

Code

sp500 = get.hist.quote(instrument = "^gspc", start = "2015-01-01", end = "2025-04-30"
                       , quote = "AdjClose")

time series starts 2015-01-02
time series ends   2025-04-29

Code

sp500.num = as.numeric(sp500)
plot(sp500)

Code

nts = length(sp500)
sp500.sr = 100*diff(sp500.num)/sp500.num[-nts]
sp500.cr = diff(log(sp500))
plot(sp500.sr, type = "l", ylim = c(-10,10))

Code

hist(sp500.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve2",
     xlab = "Returns")


f=function(x) 0.5*dsn(x,x=0.3,omega=0.6, alpha = 5)+
  +0.2*dsn(x,x=0.4,omega=0.8, alpha = -4)+
  +0.1*dsn(x,x=-1,omega=1, alpha = 2)+
  +0.2*dsn(x,x=1,omega=1, alpha = 2)
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

Code

set.seed(45)  # For reproducibility

data <- as.numeric(sp500.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  splits[[i]] <- sample(data, size_i, replace = FALSE)  # sample without replacement, but from full data each time
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")
  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params,xi,omega, alpha)
  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i]*100, params["xi"], params["omega"], params["alpha"]))
}

Subset 1 (90% of data): xi = 0.795, omega = 1.370, alpha = -0.955
Subset 2 (70% of data): xi = 0.859, omega = 1.419, alpha = -1.052
Subset 3 (40% of data): xi = -0.649, omega = 1.369, alpha = 0.892
Subset 4 (20% of data): xi = 0.908, omega = 1.411, alpha = -1.186

Code

forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 2:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
  
  
  
  return(results)
}

out = forward_skew_loop(data=as.numeric(sp500.sr), 
                        location.vec = xi, 
                        scale.vec = omega, 
                        shape = alpha)


for (m in 2:4) {
  cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n")
}

HMM with 2 components has likelihood value: -4023.631 
HMM with 3 components has likelihood value: -3987.416 
HMM with 4 components has likelihood value: -3971.691

Code

# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}

[1] 0.5477482 0.4522518
          [,1]      [,2]
[1,] 0.5477482 0.4522518
[1] 0.2035905 0.1775676 0.6188419
          [,1]      [,2]      [,3]
[1,] 0.2035905 0.1775676 0.6188419
[1] 0.2317339 0.3612939 0.2448064 0.1621657
          [,1]      [,2]      [,3]      [,4]
[1,] 0.2317339 0.3612939 0.2448064 0.1621657

Code

llik = function(par, m, data) {
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
  return(llik)
}







llik_vals = numeric(4)
for (i in 2:4) {
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)  
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sp500.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
}

Log-likelihood for 2 components: -4023.632 
Log-likelihood for 3 components: -3987.419 
Log-likelihood for 4 components: -3971.684

Code

llik_optim_vals = numeric(4)
llik_optim_pars = list()
for (i in 2:4){
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               lower=c(rep(0, i^2), rep(-Inf,i), rep(0.01, i), rep(-10, i)),
               upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)),
               control=list(fnscale=-1),data=as.numeric(sp500.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  llik_optim_pars[[i]]=outhmm$par
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}

MAX llik for 2 components: -3456.351 
MAX llik for 3 components: -3330.375 
MAX llik for 4 components: -3318.017

Code

aic_last_vals = numeric()
for (i in 2:4){
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}

AIC for 2 components: 6928.701 
AIC for 3 components: 6690.75 
AIC for 4 components: 6684.033

Code

best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")


Best model is with 4 components (lowest AIC =  6684.033 )

Code

m=best_model
gamma.mle = matrix(llik_optim_pars[[m]][1:m^2], ncol=m, byrow=TRUE)
gamma.mlev = gamma.mle/apply(gamma.mle, FUN=sum, MAR=1)
gamma.mlev

           [,1]       [,2]       [,3]      [,4]
[1,] 0.97373389 0.02626611 0.00000000 0.0000000
[2,] 0.00000000 0.47620583 0.03685692 0.4869373
[3,] 0.00000000 0.00000000 0.90362390 0.0963761
[4,] 0.04014038 0.08207112 0.00000000 0.8777885

Code

xi = llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)]
cat("\nBest model's location parameter is:", llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)])


Best model's location parameter is: 0.3829119 0.2964956 -0.7069928 1.106778

Code

omega = llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)]
cat("\nBest model's scale parameter is:", llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)])


Best model's scale parameter is: 0.5968282 1.692717 3.722713 1.502212

Code

alpha = llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)]
cat("\nBest model's shape parameter is:", llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)])


Best model's shape parameter is: -0.7264074 -1.438691 0.2471854 -1.268492

Code

llik1 = function(par, m, data) {
  results = list()
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  results=list(llik = llik,
               filtering = filtering,
               pred = pred,
               delta = delta)
  return(results)
}

pars = llik_optim_pars[[m]]
forward_results <- llik1(par = llik_optim_pars[[m]], m = m, data = as.numeric(sp500.sr))
filtering <- forward_results$filtering
most_likely_states <- apply(filtering, 2, which.max)

# Plot histograms by state using only training data
par(mfrow = c(2, 2))  # 2x2 grid for 4 states

for (state in 1:4) {
  # Filter: only training period & only current state
  data_state <- as.numeric(sp500.sr)[most_likely_states == state]
  
  if (length(data_state) == 0) {
    plot.new()
    title(main = paste("No data for State", state))
    next
  }
  
  hist_info <- hist(data_state,  probability = TRUE, breaks = 50, plot = FALSE)
  
  data_min <- min(data_state)
  data_max <- max(data_state)
  range_expansion <- (data_max - data_min) * 0.15
  
  x_min <- data_min - range_expansion
  x_max <- data_max + range_expansion
  x_vals <- seq(x_min, x_max, length.out = 500)
  y_vals <- dsn(x_vals, xi = xi[state], omega = omega[state], alpha = alpha[state])
  
  ylim_max <- max(max(hist_info$density), max(y_vals)) * 1.1
  
  hist(data_state, probability = TRUE, breaks = 50,
       main = paste("Histogram of Returns - State", state),
       xlab = "Log-Return", ylab = "Density",
       col = "lightgray", border = "white",
       xlim = c(x_min, x_max), ylim = c(0, ylim_max))
  
  curve(dsn(x, xi = xi[state], omega = omega[state], alpha = alpha[state]),
        from = x_min, to = x_max, add = TRUE, col = "red", lwd = 2)
}

Warning in hist.default(data_state, probability = TRUE, breaks = 50, plot =
FALSE): аргумент 'probability' не для использования
Warning in hist.default(data_state, probability = TRUE, breaks = 50, plot =
FALSE): аргумент 'probability' не для использования
Warning in hist.default(data_state, probability = TRUE, breaks = 50, plot =
FALSE): аргумент 'probability' не для использования
Warning in hist.default(data_state, probability = TRUE, breaks = 50, plot =
FALSE): аргумент 'probability' не для использования

Code

dates = index(sp500[-1])
sp500.ret = xts(sp500.cr, order.by = dates)
dates_full <- index(sp500.ret)
returns_full <- na.omit(as.numeric(sp500.ret[,1]))


plot_data_train <- data.frame(
  Date = dates_full,
  Return = returns_full,
  State = factor(most_likely_states, levels = 1:4)  # Only 4 states
)

# Step 3: Plot
ggplot(plot_data_train, aes(x = Date, y = Return, color = State)) +
  geom_point(size = 0.8, alpha = 0.8) +
  scale_color_brewer(palette = "Dark2") +
  labs(title = "SP500 Daily Returns Colored by Hidden State (2015–2025)",
       y = "Log-Return",
       x = "Date") +
  theme_minimal()

Code

par(mfrow = c(1, 1))
stat_distr = forward_results$delta
f=function(x) stat_distr[1]*dsn(x,x=xi[1],omega=omega[1], alpha = alpha[1])+ stat_distr[2]*dsn(x,x=xi[2],omega=omega[2], alpha =alpha[2])+
+ stat_distr[3]*dsn(x,x=xi[3],omega=omega[3], alpha = alpha[3])+
  stat_distr[4]*dsn(x,x=xi[4],omega=omega[4], alpha = alpha[4])
cat("\nBest model's stationary distribution is:", stat_distr)


Best model's stationary distribution is: 0.5360731 0.08184454 0.03129964 0.3507828

Code

hist(sp500.sr,
     breaks = 100,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve, stationary distribution, optimized parameters",
     xlab = "Returns")
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)