1 SKEW-NORMAL DISTRIBUTION

1.1 🧠 What is the Skew-Normal Distribution?

The skew-normal distribution is a generalization of the normal (Gaussian) distribution that allows for asymmetry (skewness). While the standard normal distribution is symmetric around the mean, the skew-normal can lean to the left or right, depending on the shape parameter.

1.2 📈 Probability Density Function (PDF)

The probability density function (PDF) of the skew-normal distribution is:

\[ f(x; \xi, \omega, \alpha) = \frac{2}{\omega} \, \phi\left( \frac{x - \xi}{\omega} \right) \, \Phi\left( \alpha \cdot \frac{x - \xi}{\omega} \right) \]

Where:

$\phi(\cdot)$ : Standard normal PDF
$\Phi(\cdot)$ : Standard normal CDF
$\xi$ : Location parameter (like the mean)
$\omega$ > 0 : Scale parameter (like the standard deviation)
$\alpha$ : Shape parameter (controls skewness)

1.3 🔁 Special Cases

If ( $\alpha$ = 0 ): It becomes a standard normal distribution.
If ( $\alpha$ > 0 ): The distribution is right-skewed (tail leans right).
If ( $\alpha$ < 0 ): The distribution is left-skewed (tail leans left).

1.4 🔍 Difference Between Normal and Skew-Normal Distributions

The normal distribution is a special case of the skew-normal distribution. Here’s how they differ:

Feature	Normal Distribution	Skew-Normal Distribution
Shape	Symmetrical (bell curve)	Asymmetrical (can lean left or right)
Skewness	Exactly 0	Controlled by a shape parameter ( $\alpha$ )
PDF	Based on the standard normal function	Involves both the standard normal PDF and CDF
Parameters	Mean ( $\mu$ ), SD ( $\sigma$ )	Location ( $\xi$ ), Scale ( $\omega$ ), Shape ( $\alpha$ )
Flexibility	Limited to symmetry	More flexible; can model skewed data
Special Case	—	Normal dist. when ( $\alpha$ = 0 )

In summary, the skew-normal adds an extra parameter to capture asymmetry, making it more adaptable for real-world data that often deviates from perfect symmetry.

Code

# Set seed for reproducibility
set.seed(123)

# Generate 1000 random values from a normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)

# Plot histogram with density instead of frequency
hist(normal_data,
     breaks = 30,
     col = "skyblue",
     border = "white",
     probability = TRUE,
     main = "Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add a normal density curve
curve(dnorm(x, mean = mean(normal_data), sd = sd(normal_data)),
      col = "darkblue",
      lwd = 2,
      add = TRUE)

Code

#/ warning=FALSE
#/ message=FALSE
#/ error=FALSE
#/ output=FALSE
# Install and load the sn package (only if not already installed)
# install.packages("sn")
library(sn)

# Set seed for reproducibility
set.seed(123)

# Generate 1000 random values from a skew-normal distribution
# Parameters: location (xi), scale (omega), shape (alpha)
skew_data <- rsn(1000, xi = 0, omega = 1, alpha = 5)

# Plot histogram with density
hist(skew_data,
     breaks = 30,
     col = "lightgreen",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = 5),
      col = "darkgreen",
      lwd = 2,
      add = TRUE)

Code

# Load required package
library(sn)

# Set seed for reproducibility
set.seed(123)

# Generate data from skew-normal distribution with alpha = -5
skew_data_neg <- rsn(n = 1000, xi = 0, omega = 1, alpha = -5)

# Plot histogram with density
hist(skew_data_neg,
     breaks = 30,
     col = "salmon",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution (α = -5)",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = -5),
      col = "darkred",
      lwd = 2,
      add = TRUE)

2 HMM (Hidden Markov Models)

2.1 📦 What is a Hidden Markov Model?

A Hidden Markov Model (HMM) is a statistical model where the system being modeled is assumed to follow a Markov process with unobserved (hidden) states.

Note

In plain terms: - There’s a process that moves between hidden states over time. - Each hidden state emits an observable output. - You see the outputs, but you don’t see the states.

Warning

Finance : Market regime switching: bull - bear market.

2.2 🧩 HMM Components

A Hidden Markov Model (HMM) is defined by the following key elements:

2.2.1 🔒 Hidden States

Denoted as:

\[ C_1, C_2, \dots, C_T \]

A finite set of unobserved states (e.g., calm/volatile, sunny/rainy).
The state at time ( t ), denoted ( C_t ), follows a Markov chain:

\[ P(C_t \mid C_{t-1}) = \text{transition probability} \]

This means the current state only depends on the previous state, not the full history.

2.2.2 📈 Observations

Denoted as:

\[ X_1, X_2, \dots, X_T \]

Each state emits data based on a specific probability distribution.
The observation at time ( t ), ( X_t ), depends only on the hidden state at that time:

\[ P(X_t \mid C_t) \]

These are sometimes called the emission probabilities.

2.2.3 🎲 Initial Probabilities

The model starts with an initial distribution over the hidden states:

\[ \delta = P(C_1) \]

This represents the probability of starting in each possible hidden state.

2.3 🔁 Markov Property

The hidden states follow the first-order Markov assumption:

\[ P(C_t \mid C_{1:t-1}) = P(C_t \mid C_{t-1}) \]

Also, the current observation depends only on the current state:

\[ P(X_t \mid C_{1:T}, X_{1:t-1}) = P(X_t \mid C_t) \]

2.4 ⚙️ Parameters of an HMM

Transition matrix ( ):

\[ \Gamma = \begin{bmatrix} P(C_{t+1} = 1 \mid C_t = 1) & \cdots & \cdots & P(C_{t+1} = m \mid C_t = m) \end{bmatrix} \]

Emission probabilities (depends on the application):

Normal: $\mathcal{N}(\mu_i, \sigma_i^2)$
Poisson: $\ \text{Pois}(\lambda_i)$
Skew-normal, etc.

Initial distribution:
$\ \delta = P(C_1 = i)$

2.5 📈 What You Can Do With HMMs

Task	What It Means
Likelihood evaluation	Compute $\\P(X_1, \dots, X_T)$
State estimation	Infer the most likely hidden states
Parameter estimation	Learn transition & emission parameters
Forecasting	Predict future observations or states

2.6 📐 Key Algorithms

2.6.1 🔄 Forward Algorithm

Efficiently computes the likelihood of observed data using recursion:

\[ \alpha_t(j) = \sum_i \alpha_{t-1}(i) \cdot \gamma_{ij} \cdot P(X_t \mid C_t = j) \]

Where:

$\alpha_t(j)$: probability of partial sequence ending in state $\\j$
$\gamma_{ij}$: transition probability from state $\\i$ to $\\j$
$\\P(X_t \mid C_t = j)$: emission probability at time $\\t$

2.6.2 🔁 Backward Algorithm

Used for:

Smoothing (estimating past states)
EM Algorithm for learning parameters

2.7 📊 Visualization

📈 Plot observations with colored hidden states
📊 Show state probabilities over time
📉 Compare model fit using criteria like AIC and BIC across different numbers of states

2.8 📌 Advantages of HMMs

✅ Handles hidden structure in time series
✅ Efficient thanks to Forward-Backward recursion
✅ Works well for sequential or temporal data
✅ Easily extended (e.g., with skew-normal emissions, covariates, etc.)

3 Main Part —> Project (1)

3.1 SX600 —> Analyzing data

Code

library(tseries)

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo

Code

library(sn)
sx600 = get.hist.quote(instrument = "^stoxx", start = "2015-01-01",
                       quote = "AdjClose")

time series starts 2015-01-05
time series ends   2025-04-15

Code

sx600.num = as.numeric(sx600)
plot(sx600)

Code

nts = length(sx600)
sx600.sr = 100*diff(sx600.num)/sx600.num[-nts]
sx600.cr = diff(log(sx600))
plot(sx600.sr, type = "l", ylim = c(-10,10))

Code

hist(sx600.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve2",
     xlab = "Returns")


f=function(x) 0.7*dsn(x,x=0.5,omega=0.8, alpha = 5)+ 0.2*dsn(x,x=0.6,omega=0.8, alpha = -4)+
+ 0.1*dsn(x,x=-1.85,omega=3, alpha = 2)
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

Code

forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 1:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
  
  
  
  return(results)
}

out = forward_skew_loop(data=as.numeric(sx600.sr), 
                        location.vec = c(0.5,0.6,-1.85,0), 
                        shape = c(0.8,0.8,3,1), 
                        scale.vec = c(5,4,2,3))


for (m in 1:4) {
  cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n")
}

HMM with 1 components has likelihood value: -6796.344 
HMM with 2 components has likelihood value: -6605.417 
HMM with 3 components has likelihood value: -5321.959 
HMM with 4 components has likelihood value: -5297.602

Code

# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 1:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}

[1] 1
     [,1]
[1,]    1
[1] 0.5760423 0.4239577
          [,1]      [,2]
[1,] 0.5760423 0.4239577
[1] 0.2437833 0.4661603 0.2900564
          [,1]      [,2]      [,3]
[1,] 0.2437833 0.4661603 0.2900564
[1] 0.2903220 0.1609937 0.2904152 0.2582691
         [,1]      [,2]      [,3]      [,4]
[1,] 0.290322 0.1609937 0.2904152 0.2582691

Code

llik=function(par,m,data){
  gamma=matrix(par[1:m^2],nrow=m,byrow=TRUE)
  gamma=gamma/apply(gamma,FUN=sum,MAR=1)
  delta=c(solve(t(diag(m)-gamma+matrix(1,m,m)),rep(1,m)))
  delta=rep(1/m,m)
  location.vec=rep(0.0,m)
  scale.vec=par[(m^2+1):(m^2+m)]
  shape=seq(1,m)
  n = length(data)
  # Initialize matrices
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
 
  return(llik)
}


llik_vals = numeric(4)
for (i in 1:4) {
  gamma_i = out[[i]][["gamma"]]
  
  # Build parameter vector: gamma + sd.vec (we're assuming mean=0, skew=0 here)
  par_i = c(t(gamma_i), rep(1, i))  # Replace rep(1, i) with actual sd/shape if needed
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sx600.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
  }

Log-likelihood for 1 components: -4650.545 
Log-likelihood for 2 components: -4890.41 
Log-likelihood for 3 components: -5371.747 
Log-likelihood for 4 components: -5401.952

Code

llik_optim_vals = numeric(4)
for (i in 1:4){
  gamma_i = out[[i]][["gamma"]]
  par_i = c(t(gamma_i), rep(1, i))
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               lower=c(rep(0,i^2),rep(0.05,i)),
               upper=c(rep(Inf,i^2),rep(Inf,i)),
               control=list(fnscale=-1),data=as.numeric(sx600.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}

MAX llik for 1 components: -4396.746 
MAX llik for 2 components: -4103.116 
MAX llik for 3 components: -4103.522 
MAX llik for 4 components: -4100.227

Code

aic_last_vals = numeric()
for (i in 1:4){
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*i*(i+2)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}

AIC for 1 components: 8799.492 
AIC for 2 components: 8222.232 
AIC for 3 components: 8237.043 
AIC for 4 components: 8248.454

Code

best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")


Best model is with 2 components (lowest AIC =  8222.232 )

3.2 SP500 —> Analyzing data

Code

sp500 = get.hist.quote(instrument = "^gspc", start = "2015-01-01"
                       , quote = "AdjClose")

time series starts 2015-01-02
time series ends   2025-04-15

Code

sp500.num = as.numeric(sp500)
plot(sp500)

Code

nts = length(sp500)
sp500.sr = 100*diff(sp500.num)/sp500.num[-nts]
sp500.cr = diff(log(sp500))
plot(sp500.sr, type = "l", ylim = c(-10,10))

Code

hist(sp500.sr,
     breaks = 500,               
     xlim = c(-4, 4),
     probability = TRUE,         
     col = "red",
     border = "white",
     main = "Histogram with Fitted Skew-Normal Curve2",
     xlab = "Returns")


f=function(x) 0.5*dsn(x,x=0.3,omega=0.6, alpha = 5)+
  +0.2*dsn(x,x=0.4,omega=0.8, alpha = -4)+
  +0.1*dsn(x,x=-1,omega=1, alpha = 2)+
  +0.2*dsn(x,x=1,omega=1, alpha = 2)
curve(f,
      col = "blue",
      lwd = 2,
      add = TRUE)

Code

forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  
  for (m in 1:4) {
    n = length(data)
    
    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)
    
    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)
    
    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)
    
    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
    
    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }
    
    results[[m]] = list(
      llik = llik,
      filtering = filtering,
      pred = pred,
      m = m,
      gamma = gamma
    )
    
  }
  
  
  
  return(results)
}

out = forward_skew_loop(data=as.numeric(sp500.sr), 
                        location.vec = c(0.3,0.4,1,1), 
                        shape = c(0.6,0.8,1,1), 
                        scale.vec = c(5,4,2,2))

for (m in 1:4) {
  cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n")
}

HMM with 1 components has likelihood value: -6689.809 
HMM with 2 components has likelihood value: -6363.563 
HMM with 3 components has likelihood value: -6243.174 
HMM with 4 components has likelihood value: -6178.918

Code

# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 1:4) {
  gamma_m = out[[m]][["gamma"]]
  
  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))
  
  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
  
}

[1] 1
     [,1]
[1,]    1
[1] 0.2547356 0.7452644
          [,1]      [,2]
[1,] 0.2547356 0.7452644
[1] 0.3213792 0.3375037 0.3411171
          [,1]      [,2]      [,3]
[1,] 0.3213792 0.3375037 0.3411171
[1] 0.2251955 0.3108865 0.2652819 0.1986360
          [,1]      [,2]      [,3]     [,4]
[1,] 0.2251955 0.3108865 0.2652819 0.198636

Code

llik=function(par,m,data){
  gamma=matrix(par[1:m^2],nrow=m,byrow=TRUE)
  gamma=gamma/apply(gamma,FUN=sum,MAR=1)
  delta=c(solve(t(diag(m)-gamma+matrix(1,m,m)),rep(1,m)))
  delta=rep(1/m,m)
  location.vec=rep(0.0,m)
  scale.vec=par[(m^2+1):(m^2+m)]
  shape=seq(1,m)
  n = length(data)
  # Initialize matrices
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)
  
  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])
  
  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }
  
  
  return(llik)
}


llik_vals = numeric(4)
for (i in 1:4) {
  gamma_i = out[[i]][["gamma"]]
  
  # Build parameter vector: gamma + sd.vec (we're assuming mean=0, skew=0 here)
  par_i = c(t(gamma_i), rep(1, i))  # Replace rep(1, i) with actual sd/shape if needed
  
  llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sp500.sr))
  cat("Log-likelihood for", i, "components:", llik_vals[i], "\n")
}

Log-likelihood for 1 components: -5007.313 
Log-likelihood for 2 components: -5498.02 
Log-likelihood for 3 components: -5507.632 
Log-likelihood for 4 components: -5679.738

Code

llik_optim_vals = numeric(4)
for (i in 1:4){
  gamma_i = out[[i]][["gamma"]]
  par_i = c(t(gamma_i), rep(1, i))
  outhmm=optim(par_i,
               fn=llik,
               method="L-BFGS-B",
               lower=c(rep(0,i^2),rep(0.05,i)),
               upper=c(rep(Inf,i^2),rep(Inf,i)),
               control=list(fnscale=-1),data=as.numeric(sp500.sr),m=i)
  llik_optim_vals[i]=outhmm$value
  cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n")
}

MAX llik for 1 components: -4577.977 
MAX llik for 2 components: -4178.241 
MAX llik for 3 components: -4162.306 
MAX llik for 4 components: -4162.594

Code

aic_last_vals = numeric()
for (i in 1:4){
  aic_last_vals[i] = -2*llik_optim_vals[i]+2*i*(i+2)
  cat("AIC for", i, "components:", aic_last_vals[i], "\n")
}

AIC for 1 components: 9161.955 
AIC for 2 components: 8372.481 
AIC for 3 components: 8354.613 
AIC for 4 components: 8373.188

Code

best_model = which.min(aic_last_vals)
cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n")


Best model is with 3 components (lowest AIC =  8354.613 )

--- title: "Risk for models and forecasting: Hidden Markov model with skew-normal distribution." author: - "Saidjon Rustamov" - "Dinara Khaldarova" format: html: theme: default css: custom_style.css toc: true toc-depth: 3 toc-location: left toc-title: "Table of contents" code-fold: true code-tools: true code-copy: true code-line-numbers: true number-sections: true number-depth: 3 fig-width: 8 fig-height: 6 fig-format: svg fig-dpi: 300 embed-resources: true page-layout: article fontsize: 1.1em linestretch: 1.5 title-block-banner: true smooth-scroll: true anchor-sections: true search: true header-includes: - \usepackage{sn} - \usepackage{tseries} - \usepackage{tidyverse} --- # SKEW-NORMAL DISTRIBUTION ## 🧠 What is the Skew-Normal Distribution? The skew-normal distribution is a generalization of the normal (Gaussian) distribution that allows for **asymmetry (skewness)**. While the standard normal distribution is **symmetric around the mean**, the skew-normal can **lean to the left or right**, depending on the **shape parameter**. ## 📈 Probability Density Function (PDF) The probability density function (PDF) of the **skew-normal distribution** is: $$ f(x; \xi, \omega, \alpha) = \frac{2}{\omega} \, \phi\left( \frac{x - \xi}{\omega} \right) \, \Phi\left( \alpha \cdot \frac{x - \xi}{\omega} \right) $$ Where: - $\phi(\cdot)$ : Standard normal PDF\ - $\Phi(\cdot)$ : Standard normal CDF\ - $\xi$ : Location parameter (like the mean)\ - $\omega$ \> 0 : Scale parameter (like the standard deviation)\ - $\alpha$ : Shape parameter (controls skewness) ## 🔁 Special Cases - If ( $\alpha$ = 0 ): It becomes a **standard normal distribution**. - If ( $\alpha$ \> 0 ): The distribution is **right-skewed** (tail leans right). - If ( $\alpha$ \< 0 ): The distribution is **left-skewed** (tail leans left). ## 🔍 Difference Between Normal and Skew-Normal Distributions ::: panel-tabset ### Main Differences The **normal distribution** is a special case of the skew-normal distribution. Here's how they differ: | Feature | Normal Distribution | Skew-Normal Distribution | |------------------------|------------------------|------------------------| | Shape | Symmetrical (bell curve) | Asymmetrical (can lean left or right) | | Skewness | Exactly 0 | Controlled by a shape parameter ( $\alpha$ ) | | PDF | Based on the standard normal function | Involves both the standard normal PDF and CDF | | Parameters | Mean ( $\mu$ ), SD ( $\sigma$ ) | Location ( $\xi$ ), Scale ( $\omega$ ), Shape ( $\alpha$ ) | | Flexibility | Limited to symmetry | More flexible; can model skewed data | | Special Case | — | Normal dist. when ( $\alpha$ = 0 ) | In summary, the **skew-normal** adds an extra parameter to capture **asymmetry**, making it more adaptable for real-world data that often deviates from perfect symmetry. ### Normal Distribution with Curve ```{r} # Set seed for reproducibility set.seed(123) # Generate 1000 random values from a normal distribution normal_data <- rnorm(1000, mean = 0, sd = 1) # Plot histogram with density instead of frequency hist(normal_data, breaks = 30, col = "skyblue", border = "white", probability = TRUE, main = "Normal Distribution with Density Curve", xlab = "Value", ylab = "Density") # Add a normal density curve curve(dnorm(x, mean = mean(normal_data), sd = sd(normal_data)), col = "darkblue", lwd = 2, add = TRUE) ``` ### 📈 Skew-Normal Distribution with Curve (alpha = 5) ```{r message=FALSE} #/ warning=FALSE #/ message=FALSE #/ error=FALSE #/ output=FALSE # Install and load the sn package (only if not already installed) # install.packages("sn") library(sn) # Set seed for reproducibility set.seed(123) # Generate 1000 random values from a skew-normal distribution # Parameters: location (xi), scale (omega), shape (alpha) skew_data <- rsn(1000, xi = 0, omega = 1, alpha = 5) # Plot histogram with density hist(skew_data, breaks = 30, col = "lightgreen", border = "white", probability = TRUE, main = "Skew-Normal Distribution with Density Curve", xlab = "Value", ylab = "Density") # Add skew-normal density curve curve(dsn(x, xi = 0, omega = 1, alpha = 5), col = "darkgreen", lwd = 2, add = TRUE) ``` ### 📉 Skew-Normal Distribution (alpha = -5) ```{r} # Load required package library(sn) # Set seed for reproducibility set.seed(123) # Generate data from skew-normal distribution with alpha = -5 skew_data_neg <- rsn(n = 1000, xi = 0, omega = 1, alpha = -5) # Plot histogram with density hist(skew_data_neg, breaks = 30, col = "salmon", border = "white", probability = TRUE, main = "Skew-Normal Distribution (α = -5)", xlab = "Value", ylab = "Density") # Add skew-normal density curve curve(dsn(x, xi = 0, omega = 1, alpha = -5), col = "darkred", lwd = 2, add = TRUE) ``` ::: # HMM (Hidden Markov Models) ## 📦 What is a Hidden Markov Model? A **Hidden Markov Model (HMM)** is a statistical model where the system being modeled is assumed to follow a **Markov process** with **unobserved (hidden) states**. ::: callout-note In plain terms: - There's a process that **moves between hidden states** over time. - Each hidden state **emits an observable output**. - You **see the outputs**, but you **don’t see the states**. ::: ::: callout-warning **Finance** : Market regime switching: ***bull*** - ***bear*** market. ::: ## 🧩 HMM Components A **Hidden Markov Model (HMM)** is defined by the following key elements: ### 🔒 Hidden States Denoted as: $$ C_1, C_2, \dots, C_T $$ - A finite set of unobserved states (e.g., *calm/volatile*, *sunny/rainy*). - The state at time ( t ), denoted ( C_t ), follows a **Markov chain**: $$ P(C_t \mid C_{t-1}) = \text{transition probability} $$ This means the current state only depends on the **previous state**, not the full history. ### 📈 Observations Denoted as: $$ X_1, X_2, \dots, X_T $$ - Each state emits data based on a specific **probability distribution**. - The observation at time ( t ), ( X_t ), depends **only on the hidden state** at that time: $$ P(X_t \mid C_t) $$ These are sometimes called the **emission probabilities**. ### 🎲 Initial Probabilities The model starts with an **initial distribution** over the hidden states: $$ \delta = P(C_1) $$ This represents the probability of starting in each possible hidden state. ## 🔁 Markov Property The hidden states follow the **first-order Markov assumption**: $$ P(C_t \mid C_{1:t-1}) = P(C_t \mid C_{t-1}) $$ Also, the current observation depends only on the current state: $$ P(X_t \mid C_{1:T}, X_{1:t-1}) = P(X_t \mid C_t) $$ ## ⚙️ Parameters of an HMM Transition matrix ( \Gamma ): $$ \Gamma = \begin{bmatrix} P(C_{t+1} = 1 \mid C_t = 1) & \cdots & \cdots & P(C_{t+1} = m \mid C_t = m) \end{bmatrix} $$ ------------------------------------------------------------------------ Emission probabilities (depends on the application): - Normal: $\mathcal{N}(\mu_i, \sigma_i^2)$ - Poisson: $\ \text{Pois}(\lambda_i)$ - Skew-normal, etc. ------------------------------------------------------------------------ Initial distribution:\ $\ \delta = P(C_1 = i)$ ## 📈 What You Can Do With HMMs | **Task** | **What It Means** | |-----------------------|----------------------------------------| | Likelihood evaluation | Compute $\\P(X_1, \dots, X_T)$ | | State estimation | Infer the most likely hidden states | | Parameter estimation | Learn transition & emission parameters | | Forecasting | Predict future observations or states | ## 📐 Key Algorithms ------------------------------------------------------------------------ ### 🔄 Forward Algorithm Efficiently computes the **likelihood** of observed data using recursion: $$ \alpha_t(j) = \sum_i \alpha_{t-1}(i) \cdot \gamma_{ij} \cdot P(X_t \mid C_t = j) $$ Where: - $\alpha_t(j)$: probability of partial sequence ending in state $\\j$ - $\gamma_{ij}$: transition probability from state $\\i$ to $\\j$ - $\\P(X_t \mid C_t = j)$: emission probability at time $\\t$ ------------------------------------------------------------------------ ### 🔁 Backward Algorithm Used for: - **Smoothing** (estimating past states) - **EM Algorithm** for learning parameters ------------------------------------------------------------------------ ## 📊 Visualization - 📈 **Plot observations** with colored hidden states\ - 📊 Show **state probabilities** over time\ - 📉 Compare **model fit** using criteria like **AIC** and **BIC** across different numbers of states ------------------------------------------------------------------------ ## 📌 Advantages of HMMs ✅ Handles **hidden structure** in time series\ ✅ Efficient thanks to **Forward-Backward recursion**\ ✅ Works well for **sequential or temporal data**\ ✅ Easily extended (e.g., with **skew-normal emissions**, **covariates**, etc.) # Main Part ---\> Project (1) ## SX600 ---> Analyzing data ::: panel-tabset ### Uploading data and Plotting ```{r} library(tseries) library(sn) sx600 = get.hist.quote(instrument = "^stoxx", start = "2015-01-01", quote = "AdjClose") sx600.num = as.numeric(sx600) plot(sx600) nts = length(sx600) sx600.sr = 100*diff(sx600.num)/sx600.num[-nts] sx600.cr = diff(log(sx600)) plot(sx600.sr, type = "l", ylim = c(-10,10)) hist(sx600.sr, breaks = 500, xlim = c(-4, 4), probability = TRUE, col = "red", border = "white", main = "Histogram with Fitted Skew-Normal Curve2", xlab = "Returns") f=function(x) 0.7*dsn(x,x=0.5,omega=0.8, alpha = 5)+ 0.2*dsn(x,x=0.6,omega=0.8, alpha = -4)+ + 0.1*dsn(x,x=-1.85,omega=3, alpha = 2) curve(f, col = "blue", lwd = 2, add = TRUE) ``` ### Forward function ```{r} forward_skew_loop = function(data, location.vec, scale.vec, shape) { results = list() for (m in 1:4) { n = length(data) # Generate gamma: m x m transition matrix with row sums = 1 gamma = matrix(runif(m * m), nrow = m) gamma = gamma / rowSums(gamma) # Initial state probabilities (delta): uniform delta = rep(1/m, m) # Initialize matrices pred = matrix(NA, nrow = m, ncol = n) filtering = matrix(NA, nrow = m, ncol = n) # Time t = 1 pred[, 1] = delta filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = log(sum(filtering[, 1])) filtering[, 1] = filtering[, 1] / sum(filtering[, 1]) # Loop over time for (t in 2:n) { pred[, t] = c(t(filtering[, t - 1]) %*% gamma) filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = llik + log(sum(filtering[, t])) filtering[, t] = filtering[, t] / sum(filtering[, t]) } results[[m]] = list( llik = llik, filtering = filtering, pred = pred, m = m, gamma = gamma ) } return(results) } out = forward_skew_loop(data=as.numeric(sx600.sr), location.vec = c(0.5,0.6,-1.85,0), shape = c(0.8,0.8,3,1), scale.vec = c(5,4,2,3)) for (m in 1:4) { cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n") } ``` ### Likelihood function and invariant distribution ```{r} # Initialize empty lists to store results transition_m = list() solved = list() for (m in 1:4) { gamma_m = out[[m]][["gamma"]] # Compute stationary distribution transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m)) # Multiply with gamma to check if it's stationary solved[[m]] = transition_m[[m]] %*% gamma_m print(transition_m[[m]]) print(solved[[m]]) } llik=function(par,m,data){ gamma=matrix(par[1:m^2],nrow=m,byrow=TRUE) gamma=gamma/apply(gamma,FUN=sum,MAR=1) delta=c(solve(t(diag(m)-gamma+matrix(1,m,m)),rep(1,m))) delta=rep(1/m,m) location.vec=rep(0.0,m) scale.vec=par[(m^2+1):(m^2+m)] shape=seq(1,m) n = length(data) # Initialize matrices pred = matrix(NA, nrow = m, ncol = n) filtering = matrix(NA, nrow = m, ncol = n) # Time t = 1 pred[, 1] = delta filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = log(sum(filtering[, 1])) filtering[, 1] = filtering[, 1] / sum(filtering[, 1]) # Loop over time for (t in 2:n) { pred[, t] = c(t(filtering[, t - 1]) %*% gamma) filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = llik + log(sum(filtering[, t])) filtering[, t] = filtering[, t] / sum(filtering[, t]) } return(llik) } llik_vals = numeric(4) for (i in 1:4) { gamma_i = out[[i]][["gamma"]] # Build parameter vector: gamma + sd.vec (we're assuming mean=0, skew=0 here) par_i = c(t(gamma_i), rep(1, i)) # Replace rep(1, i) with actual sd/shape if needed llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sx600.sr)) cat("Log-likelihood for", i, "components:", llik_vals[i], "\n") } ``` ### Likelihood Optimization and AIC model selection ```{r} llik_optim_vals = numeric(4) for (i in 1:4){ gamma_i = out[[i]][["gamma"]] par_i = c(t(gamma_i), rep(1, i)) outhmm=optim(par_i, fn=llik, method="L-BFGS-B", lower=c(rep(0,i^2),rep(0.05,i)), upper=c(rep(Inf,i^2),rep(Inf,i)), control=list(fnscale=-1),data=as.numeric(sx600.sr),m=i) llik_optim_vals[i]=outhmm$value cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n") } aic_last_vals = numeric() for (i in 1:4){ aic_last_vals[i] = -2*llik_optim_vals[i]+2*i*(i+2) cat("AIC for", i, "components:", aic_last_vals[i], "\n") } best_model = which.min(aic_last_vals) cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n") ``` ::: ## SP500 ---> Analyzing data ::: panel-tabset ### Uploading data and Plotting ```{r} sp500 = get.hist.quote(instrument = "^gspc", start = "2015-01-01" , quote = "AdjClose") sp500.num = as.numeric(sp500) plot(sp500) nts = length(sp500) sp500.sr = 100*diff(sp500.num)/sp500.num[-nts] sp500.cr = diff(log(sp500)) plot(sp500.sr, type = "l", ylim = c(-10,10)) hist(sp500.sr, breaks = 500, xlim = c(-4, 4), probability = TRUE, col = "red", border = "white", main = "Histogram with Fitted Skew-Normal Curve2", xlab = "Returns") f=function(x) 0.5*dsn(x,x=0.3,omega=0.6, alpha = 5)+ +0.2*dsn(x,x=0.4,omega=0.8, alpha = -4)+ +0.1*dsn(x,x=-1,omega=1, alpha = 2)+ +0.2*dsn(x,x=1,omega=1, alpha = 2) curve(f, col = "blue", lwd = 2, add = TRUE) ``` ### Forward function ```{r} forward_skew_loop = function(data, location.vec, scale.vec, shape) { results = list() for (m in 1:4) { n = length(data) # Generate gamma: m x m transition matrix with row sums = 1 gamma = matrix(runif(m * m), nrow = m) gamma = gamma / rowSums(gamma) # Initial state probabilities (delta): uniform delta = rep(1/m, m) # Initialize matrices pred = matrix(NA, nrow = m, ncol = n) filtering = matrix(NA, nrow = m, ncol = n) # Time t = 1 pred[, 1] = delta filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = log(sum(filtering[, 1])) filtering[, 1] = filtering[, 1] / sum(filtering[, 1]) # Loop over time for (t in 2:n) { pred[, t] = c(t(filtering[, t - 1]) %*% gamma) filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = llik + log(sum(filtering[, t])) filtering[, t] = filtering[, t] / sum(filtering[, t]) } results[[m]] = list( llik = llik, filtering = filtering, pred = pred, m = m, gamma = gamma ) } return(results) } out = forward_skew_loop(data=as.numeric(sp500.sr), location.vec = c(0.3,0.4,1,1), shape = c(0.6,0.8,1,1), scale.vec = c(5,4,2,2)) for (m in 1:4) { cat("HMM with", m, "components has likelihood value:", out[[m]][["llik"]], "\n") } ``` ### Likelihood function and invariant distribution ```{r} # Initialize empty lists to store results transition_m = list() solved = list() for (m in 1:4) { gamma_m = out[[m]][["gamma"]] # Compute stationary distribution transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m)) # Multiply with gamma to check if it's stationary solved[[m]] = transition_m[[m]] %*% gamma_m print(transition_m[[m]]) print(solved[[m]]) } llik=function(par,m,data){ gamma=matrix(par[1:m^2],nrow=m,byrow=TRUE) gamma=gamma/apply(gamma,FUN=sum,MAR=1) delta=c(solve(t(diag(m)-gamma+matrix(1,m,m)),rep(1,m))) delta=rep(1/m,m) location.vec=rep(0.0,m) scale.vec=par[(m^2+1):(m^2+m)] shape=seq(1,m) n = length(data) # Initialize matrices pred = matrix(NA, nrow = m, ncol = n) filtering = matrix(NA, nrow = m, ncol = n) # Time t = 1 pred[, 1] = delta filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = log(sum(filtering[, 1])) filtering[, 1] = filtering[, 1] / sum(filtering[, 1]) # Loop over time for (t in 2:n) { pred[, t] = c(t(filtering[, t - 1]) %*% gamma) filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m], omega = scale.vec[1:m], alpha = shape[1:m]) llik = llik + log(sum(filtering[, t])) filtering[, t] = filtering[, t] / sum(filtering[, t]) } return(llik) } llik_vals = numeric(4) for (i in 1:4) { gamma_i = out[[i]][["gamma"]] # Build parameter vector: gamma + sd.vec (we're assuming mean=0, skew=0 here) par_i = c(t(gamma_i), rep(1, i)) # Replace rep(1, i) with actual sd/shape if needed llik_vals[i] = llik(par = par_i, m = i, data = as.numeric(sp500.sr)) cat("Log-likelihood for", i, "components:", llik_vals[i], "\n") } ``` ### Likelihood Optimization and AIC model selection ```{r} llik_optim_vals = numeric(4) for (i in 1:4){ gamma_i = out[[i]][["gamma"]] par_i = c(t(gamma_i), rep(1, i)) outhmm=optim(par_i, fn=llik, method="L-BFGS-B", lower=c(rep(0,i^2),rep(0.05,i)), upper=c(rep(Inf,i^2),rep(Inf,i)), control=list(fnscale=-1),data=as.numeric(sp500.sr),m=i) llik_optim_vals[i]=outhmm$value cat("MAX llik for", i, "components:", llik_optim_vals[i], "\n") } aic_last_vals = numeric() for (i in 1:4){ aic_last_vals[i] = -2*llik_optim_vals[i]+2*i*(i+2) cat("AIC for", i, "components:", aic_last_vals[i], "\n") } best_model = which.min(aic_last_vals) cat("\nBest model is with", best_model, "components (lowest AIC = ", aic_last_vals[best_model], ")\n") ``` :::