1 SKEW-NORMAL DISTRIBUTION

1.1 🧠 What is the Skew-Normal Distribution?

The skew-normal distribution is a generalization of the normal (Gaussian) distribution that allows for asymmetry (skewness). While the standard normal distribution is symmetric around the mean, the skew-normal can lean to the left or right, depending on the shape parameter.

1.2 📈 Probability Density Function (PDF)

The probability density function (PDF) of the skew-normal distribution is:

\[ f(x; \xi, \omega, \alpha) = \frac{2}{\omega} \, \phi\left( \frac{x - \xi}{\omega} \right) \, \Phi\left( \alpha \cdot \frac{x - \xi}{\omega} \right) \]

Where:

$\phi(\cdot)$ : Standard normal PDF
$\Phi(\cdot)$ : Standard normal CDF
$\xi$ : Location parameter (like the mean)
$\omega$ > 0 : Scale parameter (like the standard deviation)
$\alpha$ : Shape parameter (controls skewness)

1.3 🔁 Special Cases

If ( $\alpha$ = 0 ): It becomes a standard normal distribution.
If ( $\alpha$ > 0 ): The distribution is right-skewed (tail leans right).
If ( $\alpha$ < 0 ): The distribution is left-skewed (tail leans left).

1.4 🔍 Difference Between Normal and Skew-Normal Distributions

The normal distribution is a special case of the skew-normal distribution. Here’s how they differ:

Feature	Normal Distribution	Skew-Normal Distribution
Shape	Symmetrical (bell curve)	Asymmetrical (can lean left or right)
Skewness	Exactly 0	Controlled by a shape parameter ( $\alpha$ )
PDF	Based on the standard normal function	Involves both the standard normal PDF and CDF
Parameters	Mean ( $\mu$ ), SD ( $\sigma$ )	Location ( $\xi$ ), Scale ( $\omega$ ), Shape ( $\alpha$ )
Flexibility	Limited to symmetry	More flexible; can model skewed data
Special Case	—	Normal dist. when ( $\alpha$ = 0 )

In summary, the skew-normal adds an extra parameter to capture asymmetry, making it more adaptable for real-world data that often deviates from perfect symmetry.

Code

# Set seed for reproducibility
set.seed(123)

# Generate 1000 random values from a normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)

# Plot histogram with density instead of frequency
hist(normal_data,
     breaks = 30,
     col = "skyblue",
     border = "white",
     probability = TRUE,
     main = "Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add a normal density curve
curve(dnorm(x, mean = mean(normal_data), sd = sd(normal_data)),
      col = "darkblue",
      lwd = 2,
      add = TRUE)

Code

library(sn)
library(tseries)
# Set seed for reproducibility
set.seed(123)
# Generate 1000 random values from a skew-normal distribution
# Parameters: location (xi), scale (omega), shape (alpha)
skew_data <- rsn(1000, xi = 0, omega = 1, alpha = 5)

# Plot histogram with density
hist(skew_data,
     breaks = 30,
     col = "lightgreen",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = 5),
      col = "darkgreen",
      lwd = 2,
      add = TRUE)

Code

# Load required package
library(sn)

# Set seed for reproducibility
set.seed(123)

# Generate data from skew-normal distribution with alpha = -5
skew_data_neg <- rsn(n = 1000, xi = 0, omega = 1, alpha = -5)

# Plot histogram with density
hist(skew_data_neg,
     breaks = 30,
     col = "salmon",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution (α = -5)",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = -5),
      col = "darkred",
      lwd = 2,
      add = TRUE)

2 HMM (Hidden Markov Models)

2.1 📦 What is a Hidden Markov Model?

A Hidden Markov Model (HMM) is a statistical model where the system being modeled is assumed to follow a Markov process with unobserved (hidden) states.

Note

In plain terms: - There’s a process that moves between hidden states over time. - Each hidden state emits an observable output. - You see the outputs, but you don’t see the states.

Warning

Finance : Market regime switching: bull - bear market.

2.2 🧩 HMM Components

A Hidden Markov Model (HMM) is defined by the following key elements:

2.2.1 🔒 Hidden States

Denoted as:

\[ C_1, C_2, \dots, C_T \]

A finite set of unobserved states (e.g., calm/volatile, sunny/rainy).
The state at time ( t ), denoted ( C_t ), follows a Markov chain:

\[ P(C_t \mid C_{t-1}) = \text{transition probability} \]

This means the current state only depends on the previous state, not the full history.

2.2.2 📈 Observations

Denoted as:

\[ X_1, X_2, \dots, X_T \]

Each state emits data based on a specific probability distribution.
The observation at time ( t ), ( X_t ), depends only on the hidden state at that time:

\[ P(X_t \mid C_t) \]

These are sometimes called the emission probabilities.

2.2.3 🎲 Initial Probabilities

The model starts with an initial distribution over the hidden states:

\[ \delta = P(C_1) \]

This represents the probability of starting in each possible hidden state.

2.3 🔁 Markov Property

The hidden states follow the first-order Markov assumption:

\[ P(C_t \mid C_{1:t-1}) = P(C_t \mid C_{t-1}) \]

Also, the current observation depends only on the current state:

\[ P(X_t \mid C_{1:T}, X_{1:t-1}) = P(X_t \mid C_t) \]

2.4 ⚙️ Parameters of an HMM

Transition matrix ( ):

\[ \Gamma = \begin{bmatrix} P(C_{t+1} = 1 \mid C_t = 1) & \cdots & \cdots & P(C_{t+1} = m \mid C_t = m) \end{bmatrix} \]

Emission probabilities (depends on the application):

Normal: $\mathcal{N}(\mu_i, \sigma_i^2)$
Poisson: $\ \text{Pois}(\lambda_i)$
Skew-normal, etc.

Initial distribution:
$\ \delta = P(C_1 = i)$

2.5 📈 What You Can Do With HMMs

Task	What It Means
Likelihood evaluation	Compute $\\P(X_1, \dots, X_T)$
State estimation	Infer the most likely hidden states
Parameter estimation	Learn transition & emission parameters
Forecasting	Predict future observations or states

2.6 📐 Key Algorithms

2.6.1 🔄 Forward Algorithm

Efficiently computes the likelihood of observed data using recursion:

\[ \alpha_t(j) = \sum_i \alpha_{t-1}(i) \cdot \gamma_{ij} \cdot P(X_t \mid C_t = j) \]

Where:

$\alpha_t(j)$: probability of partial sequence ending in state $\\j$
$\gamma_{ij}$: transition probability from state $\\i$ to $\\j$
$\\P(X_t \mid C_t = j)$: emission probability at time $\\t$

2.6.2 🔁 Backward Algorithm

Used for:

Smoothing (estimating past states)
EM Algorithm for learning parameters

2.7 📊 Visualization

📈 Plot observations with colored hidden states
📊 Show state probabilities over time
📉 Compare model fit using criteria like AIC and BIC across different numbers of states

2.8 📌 Advantages of HMMs

✅ Handles hidden structure in time series
✅ Efficient thanks to Forward-Backward recursion
✅ Works well for sequential or temporal data
✅ Easily extended (e.g., with skew-normal emissions, covariates, etc.)

3 Main Part —> Project (1)

3.1 SX600 —> Analyzing data

Code

sx600 = get.hist.quote(instrument = "^stoxx", start = "2015-01-01",
                       quote = "AdjClose")

time series starts 2015-01-05
time series ends   2025-04-29

Code

sx600.num = as.numeric(sx600)
plot(sx600)

Code

nts = length(sx600)
sx600.sr = 100*diff(sx600.num)/sx600.num[-nts]
sx600.cr = diff(log(sx600))
plot(sx600.sr, type = "l", ylim = c(-10,10))

Code

hist(sx600.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve2",
     xlab = "Returns")


f=function(x) 0.7*dsn(x,x=0.5,omega=0.8, alpha = 5)+ 0.2*dsn(x,x=0.6,omega=0.8, alpha = -4)+
+ 0.1*dsn(x,x=-1.85,omega=3, alpha = 2)
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

Code

set.seed(42)  # For reproducibility

data <- as.numeric(sx600.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  splits[[i]] <- sample(data, size_i, replace = FALSE)  # sample without replacement, but from full data each time
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")
  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params,xi,omega, alpha)
  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i]*100, params["xi"], params["omega"], params["alpha"]))
}

Subset 1 (90% of data): xi = 0.875, omega = 1.358, alpha = -1.378
Subset 2 (70% of data): xi = 0.858, omega = 1.345, alpha = -1.278
Subset 3 (40% of data): xi = 0.923, omega = 1.447, alpha = -1.557
Subset 4 (20% of data): xi = -0.597, omega = 1.255, alpha = 0.809

Code

forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 2:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
    return(results)
}


out = forward_skew_loop(data=as.numeric(sx600.sr), 
                        location.vec = xi, 
                        scale.vec = omega, 
                        shape = alpha)

for (m in 2:4) {
  cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n")
}

HMM with 2 components has likelihood value: -3755.94 
HMM with 3 components has likelihood value: -3753.523 
HMM with 4 components has likelihood value: -3729.03

Code

# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}

[1] 0.8430069 0.1569931
          [,1]      [,2]
[1,] 0.8430069 0.1569931
[1] 0.3455869 0.2265738 0.4278392
          [,1]      [,2]      [,3]
[1,] 0.3455869 0.2265738 0.4278392
[1] 0.1680199 0.2624128 0.2600075 0.3095598
          [,1]      [,2]      [,3]      [,4]
[1,] 0.1680199 0.2624128 0.2600075 0.3095598

Code

llik = function(par, m, data) {
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
  return(llik)
}

llik_vals = numeric(4)
for (i in 2:4) {
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)  
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sx600.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
  }

Log-likelihood for 2 components: -3755.939 
Log-likelihood for 3 components: -3753.525 
Log-likelihood for 4 components: -3729.031

Code

llik_optim_vals = numeric(4)
llik_optim_pars = list()
for (i in 2:4){
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               lower=c(rep(0, i^2), rep(-Inf, i), rep(0.01, m), rep(-10, i)),
               upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)),
               control=list(fnscale=-1),data=as.numeric(sx600.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  llik_optim_pars[[i]]=outhmm$par
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}

MAX llik for 2 components: -3358.633 
MAX llik for 3 components: -3284.078 
MAX llik for 4 components: -3274.408

Code

aic_last_vals = numeric()
for (i in 2:4){
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}

AIC for 2 components: 6733.266 
AIC for 3 components: 6598.156 
AIC for 4 components: 6596.815

Code

best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")


Best model is with 4 components (lowest AIC =  6596.815 )

3.2 SP500 —> Analyzing data

Code

sp500 = get.hist.quote(instrument = "^gspc", start = "2015-01-01"
                       , quote = "AdjClose")

time series starts 2015-01-02
time series ends   2025-04-29

Code

sp500.num = as.numeric(sp500)
plot(sp500)

Code

nts = length(sp500)
sp500.sr = 100*diff(sp500.num)/sp500.num[-nts]
sp500.cr = diff(log(sp500))
plot(sp500.sr, type = "l", ylim = c(-10,10))

Code

hist(sp500.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve2",
     xlab = "Returns")


f=function(x) 0.5*dsn(x,x=0.3,omega=0.6, alpha = 5)+
  +0.2*dsn(x,x=0.4,omega=0.8, alpha = -4)+
  +0.1*dsn(x,x=-1,omega=1, alpha = 2)+
  +0.2*dsn(x,x=1,omega=1, alpha = 2)
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

Code

set.seed(45)  # For reproducibility

data <- as.numeric(sp500.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  splits[[i]] <- sample(data, size_i, replace = FALSE)  # sample without replacement, but from full data each time
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")
  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params,xi,omega, alpha)
  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i]*100, params["xi"], params["omega"], params["alpha"]))
}

Subset 1 (90% of data): xi = 0.795, omega = 1.370, alpha = -0.955
Subset 2 (70% of data): xi = 0.859, omega = 1.419, alpha = -1.052
Subset 3 (40% of data): xi = -0.649, omega = 1.369, alpha = 0.892
Subset 4 (20% of data): xi = 0.908, omega = 1.411, alpha = -1.186

Code

forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 2:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
  
  
  
  return(results)
}

out = forward_skew_loop(data=as.numeric(sp500.sr), 
                        location.vec = xi, 
                        scale.vec = omega, 
                        shape = alpha)


for (m in 2:4) {
  cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n")
}

HMM with 2 components has likelihood value: -4023.631 
HMM with 3 components has likelihood value: -3987.416 
HMM with 4 components has likelihood value: -3971.691

Code

# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}

[1] 0.5477482 0.4522518
          [,1]      [,2]
[1,] 0.5477482 0.4522518
[1] 0.2035905 0.1775676 0.6188419
          [,1]      [,2]      [,3]
[1,] 0.2035905 0.1775676 0.6188419
[1] 0.2317339 0.3612939 0.2448064 0.1621657
          [,1]      [,2]      [,3]      [,4]
[1,] 0.2317339 0.3612939 0.2448064 0.1621657

Code

llik = function(par, m, data) {
  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)
  
  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]
  
  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))
  
  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
  return(llik)
}







llik_vals = numeric(4)
for (i in 2:4) {
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)  
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sp500.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
}

Log-likelihood for 2 components: -4023.632 
Log-likelihood for 3 components: -3987.419 
Log-likelihood for 4 components: -3971.684

Code

llik_optim_vals = numeric(4)
llik_optim_pars = list()
for (i in 2:4){
  gamma_i = out[[i]][["gamma"]]
  location_init = xi[1:i]
  scale_init = omega[1:i]
  shape_init = alpha[1:i]
  par_i = c(t(gamma_i), location_init, scale_init, shape_init)
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               lower=c(rep(0, i^2), rep(-Inf,i), rep(0.01, i), rep(-10, i)),
               upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)),
               control=list(fnscale=-1),data=as.numeric(sp500.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  llik_optim_pars[[i]]=outhmm$par
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}

MAX llik for 2 components: -3456.351 
MAX llik for 3 components: -3330.375 
MAX llik for 4 components: -3318.017

Code

aic_last_vals = numeric()
for (i in 2:4){
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}

AIC for 2 components: 6928.701 
AIC for 3 components: 6690.75 
AIC for 4 components: 6684.033

Code

best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")


Best model is with 4 components (lowest AIC =  6684.033 )

--- title: "Risk for models and forecasting: Hidden Markov model with skew-normal distribution." author: - "Saidjon Rustamov" - "Dinara Khaldarova" format: html: theme: default css: custom_style.css toc: true toc-depth: 3 toc-location: left toc-title: "Table of contents" code-fold: true code-tools: true code-copy: true code-line-numbers: true number-sections: true number-depth: 3 fig-width: 8 fig-height: 6 fig-format: svg fig-dpi: 300 embed-resources: true page-layout: article fontsize: 1.1em linestretch: 1.5 title-block-banner: true smooth-scroll: true anchor-sections: true search: true header-includes: - \usepackage{sn} - \usepackage{tseries} - \usepackage{tidyverse} --- # SKEW-NORMAL DISTRIBUTION ## 🧠 What is the Skew-Normal Distribution? The skew-normal distribution is a generalization of the normal (Gaussian) distribution that allows for **asymmetry (skewness)**. While the standard normal distribution is **symmetric around the mean**, the skew-normal can **lean to the left or right**, depending on the **shape parameter**. ## 📈 Probability Density Function (PDF) The probability density function (PDF) of the **skew-normal distribution** is: $$ f(x; \xi, \omega, \alpha) = \frac{2}{\omega} \, \phi\left( \frac{x - \xi}{\omega} \right) \, \Phi\left( \alpha \cdot \frac{x - \xi}{\omega} \right) $$ Where: - $\phi(\cdot)$ : Standard normal PDF\ - $\Phi(\cdot)$ : Standard normal CDF\ - $\xi$ : Location parameter (like the mean)\ - $\omega$ \> 0 : Scale parameter (like the standard deviation)\ - $\alpha$ : Shape parameter (controls skewness) ## 🔁 Special Cases - If ( $\alpha$ = 0 ): It becomes a **standard normal distribution**. - If ( $\alpha$ \> 0 ): The distribution is **right-skewed** (tail leans right). - If ( $\alpha$ \< 0 ): The distribution is **left-skewed** (tail leans left). ## 🔍 Difference Between Normal and Skew-Normal Distributions ::: panel-tabset ### Main Differences The **normal distribution** is a special case of the skew-normal distribution. Here's how they differ: | Feature | Normal Distribution | Skew-Normal Distribution | |------------------------|------------------------|------------------------| | Shape | Symmetrical (bell curve) | Asymmetrical (can lean left or right) | | Skewness | Exactly 0 | Controlled by a shape parameter ( $\alpha$ ) | | PDF | Based on the standard normal function | Involves both the standard normal PDF and CDF | | Parameters | Mean ( $\mu$ ), SD ( $\sigma$ ) | Location ( $\xi$ ), Scale ( $\omega$ ), Shape ( $\alpha$ ) | | Flexibility | Limited to symmetry | More flexible; can model skewed data | | Special Case | — | Normal dist. when ( $\alpha$ = 0 ) | In summary, the **skew-normal** adds an extra parameter to capture **asymmetry**, making it more adaptable for real-world data that often deviates from perfect symmetry. ### Normal Distribution with Curve ```{r} # Set seed for reproducibility set.seed(123) # Generate 1000 random values from a normal distribution normal_data <- rnorm(1000, mean = 0, sd = 1) # Plot histogram with density instead of frequency hist(normal_data, breaks = 30, col = "skyblue", border = "white", probability = TRUE, main = "Normal Distribution with Density Curve", xlab = "Value", ylab = "Density") # Add a normal density curve curve(dnorm(x, mean = mean(normal_data), sd = sd(normal_data)), col = "darkblue", lwd = 2, add = TRUE) ``` ### 📈 Skew-Normal Distribution with Curve (alpha = 5) ```{r message=FALSE} library(sn) library(tseries) # Set seed for reproducibility set.seed(123) # Generate 1000 random values from a skew-normal distribution # Parameters: location (xi), scale (omega), shape (alpha) skew_data <- rsn(1000, xi = 0, omega = 1, alpha = 5) # Plot histogram with density hist(skew_data, breaks = 30, col = "lightgreen", border = "white", probability = TRUE, main = "Skew-Normal Distribution with Density Curve", xlab = "Value", ylab = "Density") # Add skew-normal density curve curve(dsn(x, xi = 0, omega = 1, alpha = 5), col = "darkgreen", lwd = 2, add = TRUE) ``` ### 📉 Skew-Normal Distribution (alpha = -5) ```{r} # Load required package library(sn) # Set seed for reproducibility set.seed(123) # Generate data from skew-normal distribution with alpha = -5 skew_data_neg <- rsn(n = 1000, xi = 0, omega = 1, alpha = -5) # Plot histogram with density hist(skew_data_neg, breaks = 30, col = "salmon", border = "white", probability = TRUE, main = "Skew-Normal Distribution (α = -5)", xlab = "Value", ylab = "Density") # Add skew-normal density curve curve(dsn(x, xi = 0, omega = 1, alpha = -5), col = "darkred", lwd = 2, add = TRUE) ``` ::: # HMM (Hidden Markov Models) ## 📦 What is a Hidden Markov Model? A **Hidden Markov Model (HMM)** is a statistical model where the system being modeled is assumed to follow a **Markov process** with **unobserved (hidden) states**. ::: callout-note In plain terms: - There's a process that **moves between hidden states** over time. - Each hidden state **emits an observable output**. - You **see the outputs**, but you **don’t see the states**. ::: ::: callout-warning **Finance** : Market regime switching: ***bull*** - ***bear*** market. ::: ## 🧩 HMM Components A **Hidden Markov Model (HMM)** is defined by the following key elements: ### 🔒 Hidden States Denoted as: $$ C_1, C_2, \dots, C_T $$ - A finite set of unobserved states (e.g., *calm/volatile*, *sunny/rainy*). - The state at time ( t ), denoted ( C_t ), follows a **Markov chain**: $$ P(C_t \mid C_{t-1}) = \text{transition probability} $$ This means the current state only depends on the **previous state**, not the full history. ### 📈 Observations Denoted as: $$ X_1, X_2, \dots, X_T $$ - Each state emits data based on a specific **probability distribution**. - The observation at time ( t ), ( X_t ), depends **only on the hidden state** at that time: $$ P(X_t \mid C_t) $$ These are sometimes called the **emission probabilities**. ### 🎲 Initial Probabilities The model starts with an **initial distribution** over the hidden states: $$ \delta = P(C_1) $$ This represents the probability of starting in each possible hidden state. ## 🔁 Markov Property The hidden states follow the **first-order Markov assumption**: $$ P(C_t \mid C_{1:t-1}) = P(C_t \mid C_{t-1}) $$ Also, the current observation depends only on the current state: $$ P(X_t \mid C_{1:T}, X_{1:t-1}) = P(X_t \mid C_t) $$ ## ⚙️ Parameters of an HMM Transition matrix ( \Gamma ): $$ \Gamma = \begin{bmatrix} P(C_{t+1} = 1 \mid C_t = 1) & \cdots & \cdots & P(C_{t+1} = m \mid C_t = m) \end{bmatrix} $$ ------------------------------------------------------------------------ Emission probabilities (depends on the application): - Normal: $\mathcal{N}(\mu_i, \sigma_i^2)$ - Poisson: $\ \text{Pois}(\lambda_i)$ - Skew-normal, etc. ------------------------------------------------------------------------ Initial distribution:\ $\ \delta = P(C_1 = i)$ ## 📈 What You Can Do With HMMs | **Task** | **What It Means** | |-----------------------|----------------------------------------| | Likelihood evaluation | Compute $\\P(X_1, \dots, X_T)$ | | State estimation | Infer the most likely hidden states | | Parameter estimation | Learn transition & emission parameters | | Forecasting | Predict future observations or states | ## 📐 Key Algorithms ------------------------------------------------------------------------ ### 🔄 Forward Algorithm Efficiently computes the **likelihood** of observed data using recursion: $$ \alpha_t(j) = \sum_i \alpha_{t-1}(i) \cdot \gamma_{ij} \cdot P(X_t \mid C_t = j) $$ Where: - $\alpha_t(j)$: probability of partial sequence ending in state $\\j$ - $\gamma_{ij}$: transition probability from state $\\i$ to $\\j$ - $\\P(X_t \mid C_t = j)$: emission probability at time $\\t$ ------------------------------------------------------------------------ ### 🔁 Backward Algorithm Used for: - **Smoothing** (estimating past states) - **EM Algorithm** for learning parameters ------------------------------------------------------------------------ ## 📊 Visualization - 📈 **Plot observations** with colored hidden states\ - 📊 Show **state probabilities** over time\ - 📉 Compare **model fit** using criteria like **AIC** and **BIC** across different numbers of states ------------------------------------------------------------------------ ## 📌 Advantages of HMMs ✅ Handles **hidden structure** in time series\ ✅ Efficient thanks to **Forward-Backward recursion**\ ✅ Works well for **sequential or temporal data**\ ✅ Easily extended (e.g., with **skew-normal emissions**, **covariates**, etc.) # Main Part ---\> Project (1) ## SX600 ---> Analyzing data ::: panel-tabset ### Uploading data and Plotting ```{r} sx600 = get.hist.quote(instrument = "^stoxx", start = "2015-01-01", quote = "AdjClose") sx600.num = as.numeric(sx600) plot(sx600) nts = length(sx600) sx600.sr = 100*diff(sx600.num)/sx600.num[-nts] sx600.cr = diff(log(sx600)) plot(sx600.sr, type = "l", ylim = c(-10,10)) hist(sx600.sr, breaks = 500, xlim = c(-4, 4), probability = TRUE, col = "red", border = "white", main = "Histogram with Fitted Skew-Normal Curve2", xlab = "Returns") f=function(x) 0.7*dsn(x,x=0.5,omega=0.8, alpha = 5)+ 0.2*dsn(x,x=0.6,omega=0.8, alpha = -4)+ + 0.1*dsn(x,x=-1.85,omega=3, alpha = 2) curve(f, col = "blue", lwd = 2, add = TRUE) ``` ### Estimating parameters using "sn" library ```{r} set.seed(42) # For reproducibility data <- as.numeric(sx600.sr) n <- length(data) # Define your target proportions weights <- c(0.9, 0.7, 0.4, 0.2) # Initialize vectors to store estimated parameters xi <- numeric(4) omega <- numeric(4) alpha <- numeric(4) # Create overlapping subsets splits <- list() for (i in 1:4) { size_i <- floor(weights[i] * n) splits[[i]] <- sample(data, size_i, replace = FALSE) # sample without replacement, but from full data each time } # Estimate skew-normal parameters for each subset results <- list() for (i in 1:4) { fit <- selm(splits[[i]] ~ 1, family = "SN") params <- coef(fit, param.type = "DP") # Store estimates xi[i] <- params["xi"] omega[i] <- params["omega"] alpha[i] <- params["alpha"] results[[i]] <- c(params,xi,omega, alpha) cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n", i, weights[i]*100, params["xi"], params["omega"], params["alpha"])) } ``` ### Forward function ```{r} forward_skew_loop = function(data, location.vec, scale.vec, shape) { results = list() for (m in 2:4) { n = length(data) # Generate gamma: m x m transition matrix with row sums = 1 gamma = matrix(runif(m * m), nrow = m) gamma = gamma / rowSums(gamma) # Initial state probabilities (delta): uniform delta = rep(1/m, m) # Initialize matrices pred = matrix(NA, nrow = m, ncol = n) filtering = matrix(NA, nrow = m, ncol = n) # Time t = 1 pred[, 1] = delta filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = log(sum(filtering[, 1])) filtering[, 1] = filtering[, 1] / sum(filtering[, 1]) # Loop over time for (t in 2:n) { pred[, t] = c(t(filtering[, t - 1]) %*% gamma) filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = llik + log(sum(filtering[, t])) filtering[, t] = filtering[, t] / sum(filtering[, t]) } results[[m]] = list( llik = llik, filtering = filtering, pred = pred, m = m, gamma = gamma ) } return(results) } out = forward_skew_loop(data=as.numeric(sx600.sr), location.vec = xi, scale.vec = omega, shape = alpha) for (m in 2:4) { cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n") } ``` ### Likelihood function and invariant distribution ```{r} # Initialize empty lists to store results transition_m = list() solved = list() for (m in 2:4) { gamma_m = out[[m]][["gamma"]] # Compute stationary distribution transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m)) # Multiply with gamma to check if it's stationary solved[[m]] = transition_m[[m]] %*% gamma_m print(transition_m[[m]]) print(solved[[m]]) } llik = function(par, m, data) { # Extract and normalize transition matrix gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE) gamma = gamma / apply(gamma, 1, sum) # Extract parameters location.vec = par[(m^2 + 1):(m^2 + m)] scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)] shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)] # Compute stationary distribution delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m))) n = length(data) pred = matrix(NA, nrow = m, ncol = n) filtering = matrix(NA, nrow = m, ncol = n) # Time t = 1 pred[, 1] = delta filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape.vec[1:m]) llik = log(sum(filtering[, 1])) filtering[, 1] = filtering[, 1] / sum(filtering[, 1]) # Loop over time for (t in 2:n) { pred[, t] = c(t(filtering[, t - 1]) %*% gamma) filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape.vec[1:m]) llik = llik + log(sum(filtering[, t])) filtering[, t] = filtering[, t] / sum(filtering[, t]) } return(llik) } llik_vals = numeric(4) for (i in 2:4) { gamma_i = out[[i]][["gamma"]] location_init = xi[1:i] scale_init = omega[1:i] shape_init = alpha[1:i] par_i = c(t(gamma_i), location_init, scale_init, shape_init) llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sx600.sr)) cat("Log-likelihood for", i, "components:", llik_vals[i], "\n") } ``` ### Likelihood Optimization and AIC model selection ```{r} llik_optim_vals = numeric(4) llik_optim_pars = list() for (i in 2:4){ gamma_i = out[[i]][["gamma"]] location_init = xi[1:i] scale_init = omega[1:i] shape_init = alpha[1:i] par_i = c(t(gamma_i), location_init, scale_init, shape_init) outhmm=optim(par_i, fn=llik, method="L-BFGS-B", lower=c(rep(0, i^2), rep(-Inf, i), rep(0.01, m), rep(-10, i)), upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)), control=list(fnscale=-1),data=as.numeric(sx600.sr),m=i) llik_optim_vals[i]=outhmm$value llik_optim_pars[[i]]=outhmm$par cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n") } aic_last_vals = numeric() for (i in 2:4){ aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i) cat("AIC for", i, "components:", aic_last_vals[i], "\n") } best_model = which.min(aic_last_vals) cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n") ``` ::: ## SP500 ---> Analyzing data ::: panel-tabset ### Uploading data and Plotting ```{r} sp500 = get.hist.quote(instrument = "^gspc", start = "2015-01-01" , quote = "AdjClose") sp500.num = as.numeric(sp500) plot(sp500) nts = length(sp500) sp500.sr = 100*diff(sp500.num)/sp500.num[-nts] sp500.cr = diff(log(sp500)) plot(sp500.sr, type = "l", ylim = c(-10,10)) hist(sp500.sr, breaks = 500, xlim = c(-4, 4), probability = TRUE, col = "red", border = "white", main = "Histogram with Fitted Skew-Normal Curve2", xlab = "Returns") f=function(x) 0.5*dsn(x,x=0.3,omega=0.6, alpha = 5)+ +0.2*dsn(x,x=0.4,omega=0.8, alpha = -4)+ +0.1*dsn(x,x=-1,omega=1, alpha = 2)+ +0.2*dsn(x,x=1,omega=1, alpha = 2) curve(f, col = "blue", lwd = 2, add = TRUE) ``` ### Estimating parameters using "sn" library ```{r} set.seed(45) # For reproducibility data <- as.numeric(sp500.sr) n <- length(data) # Define your target proportions weights <- c(0.9, 0.7, 0.4, 0.2) # Initialize vectors to store estimated parameters xi <- numeric(4) omega <- numeric(4) alpha <- numeric(4) # Create overlapping subsets splits <- list() for (i in 1:4) { size_i <- floor(weights[i] * n) splits[[i]] <- sample(data, size_i, replace = FALSE) # sample without replacement, but from full data each time } # Estimate skew-normal parameters for each subset results <- list() for (i in 1:4) { fit <- selm(splits[[i]] ~ 1, family = "SN") params <- coef(fit, param.type = "DP") # Store estimates xi[i] <- params["xi"] omega[i] <- params["omega"] alpha[i] <- params["alpha"] results[[i]] <- c(params,xi,omega, alpha) cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n", i, weights[i]*100, params["xi"], params["omega"], params["alpha"])) } ``` ### Forward function ```{r} forward_skew_loop = function(data, location.vec, scale.vec, shape) { results = list() for (m in 2:4) { n = length(data) # Generate gamma: m x m transition matrix with row sums = 1 gamma = matrix(runif(m * m), nrow = m) gamma = gamma / rowSums(gamma) # Initial state probabilities (delta): uniform delta = rep(1/m, m) # Initialize matrices pred = matrix(NA, nrow = m, ncol = n) filtering = matrix(NA, nrow = m, ncol = n) # Time t = 1 pred[, 1] = delta filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = log(sum(filtering[, 1])) filtering[, 1] = filtering[, 1] / sum(filtering[, 1]) # Loop over time for (t in 2:n) { pred[, t] = c(t(filtering[, t - 1]) %*% gamma) filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = llik + log(sum(filtering[, t])) filtering[, t] = filtering[, t] / sum(filtering[, t]) } results[[m]] = list( llik = llik, filtering = filtering, pred = pred, m = m, gamma = gamma ) } return(results) } out = forward_skew_loop(data=as.numeric(sp500.sr), location.vec = xi, scale.vec = omega, shape = alpha) for (m in 2:4) { cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n") } ``` ### Likelihood function and invariant distribution ```{r} # Initialize empty lists to store results transition_m = list() solved = list() for (m in 2:4) { gamma_m = out[[m]][["gamma"]] # Compute stationary distribution transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m)) # Multiply with gamma to check if it's stationary solved[[m]] = transition_m[[m]] %*% gamma_m print(transition_m[[m]]) print(solved[[m]]) } llik = function(par, m, data) { # Extract and normalize transition matrix gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE) gamma = gamma / apply(gamma, 1, sum) # Extract parameters location.vec = par[(m^2 + 1):(m^2 + m)] scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)] shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)] # Compute stationary distribution delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m))) n = length(data) pred = matrix(NA, nrow = m, ncol = n) filtering = matrix(NA, nrow = m, ncol = n) # Time t = 1 pred[, 1] = delta filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape.vec[1:m]) llik = log(sum(filtering[, 1])) filtering[, 1] = filtering[, 1] / sum(filtering[, 1]) # Loop over time for (t in 2:n) { pred[, t] = c(t(filtering[, t - 1]) %*% gamma) filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape.vec[1:m]) llik = llik + log(sum(filtering[, t])) filtering[, t] = filtering[, t] / sum(filtering[, t]) } return(llik) } llik_vals = numeric(4) for (i in 2:4) { gamma_i = out[[i]][["gamma"]] location_init = xi[1:i] scale_init = omega[1:i] shape_init = alpha[1:i] par_i = c(t(gamma_i), location_init, scale_init, shape_init) llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sp500.sr)) cat("Log-likelihood for", i, "components:", llik_vals[i], "\n") } ``` ### Likelihood Optimization and AIC model selection ```{r} llik_optim_vals = numeric(4) llik_optim_pars = list() for (i in 2:4){ gamma_i = out[[i]][["gamma"]] location_init = xi[1:i] scale_init = omega[1:i] shape_init = alpha[1:i] par_i = c(t(gamma_i), location_init, scale_init, shape_init) outhmm=optim(par_i, fn=llik, method="L-BFGS-B", lower=c(rep(0, i^2), rep(-Inf,i), rep(0.01, i), rep(-10, i)), upper=c(rep(1, i^2), rep(Inf, i), rep(Inf, i), rep(10, i)), control=list(fnscale=-1),data=as.numeric(sp500.sr),m=i) llik_optim_vals[i]=outhmm$value llik_optim_pars[[i]]=outhmm$par cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n") } aic_last_vals = numeric() for (i in 2:4){ aic_last_vals[i] = -2*llik_optim_vals[i]+2*(i*(i-1)+3*i) cat("AIC for", i, "components:", aic_last_vals[i], "\n") } best_model = which.min(aic_last_vals) cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n") ``` :::