The skew-normal distribution is a generalization of the normal (Gaussian) distribution that allows for asymmetry (skewness). While the standard normal distribution is symmetric around the mean, the skew-normal can lean to the left or right, depending on the shape parameter.
1.2 📈 Probability Density Function (PDF)
The probability density function (PDF) of the skew-normal distribution is:

\[
f(x) = \frac{2}{\omega}\,\phi\!\left(\frac{x-\xi}{\omega}\right)\Phi\!\left(\alpha\,\frac{x-\xi}{\omega}\right),
\]

where \(\phi\) and \(\Phi\) are the standard normal PDF and CDF, \(\xi\) is the location parameter, \(\omega > 0\) the scale, and \(\alpha\) the shape. For \(\alpha = 0\) the density reduces to the normal; \(\alpha > 0\) gives right skew and \(\alpha < 0\) left skew.
In summary, the skew-normal adds an extra parameter to capture asymmetry, making it more adaptable for real-world data that often deviates from perfect symmetry.
Code
# Set seed for reproducibility
set.seed(123)

# Generate 1000 random values from a normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)

# Plot histogram with density instead of frequency
hist(normal_data,
     breaks = 30,
     col = "skyblue",
     border = "white",
     probability = TRUE,
     main = "Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add a normal density curve
curve(dnorm(x, mean = mean(normal_data), sd = sd(normal_data)),
      col = "darkblue",
      lwd = 2,
      add = TRUE)
Code
library(sn)
library(tseries)
library(xts)
library(ggplot2)
library(ggthemes)
library(e1071)
library(PerformanceAnalytics)
library(patchwork)
library(scales)
library(quantmod)
library(dplyr)
library(lubridate)

# Set seed for reproducibility
set.seed(123)

# Generate 1000 random values from a skew-normal distribution
# Parameters: location (xi), scale (omega), shape (alpha)
skew_data <- rsn(1000, xi = 0, omega = 1, alpha = 5)

# Plot histogram with density
hist(skew_data,
     breaks = 30,
     col = "lightgreen",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution with Density Curve",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = 5),
      col = "darkgreen",
      lwd = 2,
      add = TRUE)
Code
# Set seed for reproducibility
set.seed(123)

# Generate data from skew-normal distribution with alpha = -5
skew_data_neg <- rsn(n = 1000, xi = 0, omega = 1, alpha = -5)

# Plot histogram with density
hist(skew_data_neg,
     breaks = 30,
     col = "salmon",
     border = "white",
     probability = TRUE,
     main = "Skew-Normal Distribution (α = -5)",
     xlab = "Value",
     ylab = "Density")

# Add skew-normal density curve
curve(dsn(x, xi = 0, omega = 1, alpha = -5),
      col = "darkred",
      lwd = 2,
      add = TRUE)
2 HMM (Hidden Markov Models)
2.1 📦 What is a Hidden Markov Model?
A Hidden Markov Model (HMM) is a statistical model where the system being modeled is assumed to follow a Markov process with unobserved (hidden) states.
Note
In plain terms:

- There’s a process that moves between hidden states over time.
- Each hidden state emits an observable output.
- You see the outputs, but you don’t see the states.
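To make this concrete, here is a minimal, purely illustrative simulation: a Markov chain moves between two hidden states, and each state emits normally distributed observations. The transition matrix and state means below are made-up values for illustration, not estimates from this report.

Code
# Purely illustrative: simulate a 2-state HMM with normal emissions
set.seed(1)
n <- 500
gamma <- matrix(c(0.95, 0.05,   # hypothetical transition matrix (rows sum to 1)
                  0.10, 0.90),
                nrow = 2, byrow = TRUE)
mu <- c(0, 3)                   # hypothetical state-specific emission means

states <- numeric(n)
states[1] <- 1
for (t in 2:n) {
  # Next hidden state is drawn from the current row of the transition matrix
  states[t] <- sample(1:2, size = 1, prob = gamma[states[t - 1], ])
}

# Each hidden state emits an observation; only 'obs' would be visible in practice
obs <- rnorm(n, mean = mu[states], sd = 1)
plot(obs, type = "l", main = "Observations from a simulated 2-state HMM")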
\(\alpha_t(j)\): forward probability, i.e. the probability of the partial observation sequence up to time \(t\) ending in state \(j\)

\(\gamma_{ij}\): transition probability from state \(i\) to state \(j\)

\(P(X_t \mid C_t = j)\): emission probability at time \(t\)
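These quantities combine in the forward recursion, which the filtering code later in this section implements step by step:

\[
\alpha_1(j) = \delta_j \, P(X_1 \mid C_1 = j),
\qquad
\alpha_t(j) = \Big[\sum_{i} \alpha_{t-1}(i)\,\gamma_{ij}\Big] \, P(X_t \mid C_t = j),
\]

where \(\delta\) is the initial state distribution. Summing \(\alpha_T(j)\) over the states \(j\) at the final time \(T\) gives the likelihood of the whole observation sequence.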
2.6.2 🔁 Backward Algorithm
Used for:
Smoothing (estimating past states)
EM (Baum–Welch) algorithm for learning parameters (a minimal backward-pass sketch follows this list)
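For reference, here is a minimal backward pass written in the same style as the forward code used later in this section. Normal emissions are assumed purely for illustration (the models in this report use skew-normal emissions), and no scaling is applied, so this is a sketch rather than production code.

Code
# Backward recursion: beta[j, t] = P(X_{t+1}, ..., X_n | C_t = j)
backward_pass <- function(data, gamma, mu, sigma) {
  m <- nrow(gamma)
  n <- length(data)
  beta <- matrix(NA, nrow = m, ncol = n)
  beta[, n] <- 1  # boundary condition at the final time point
  for (t in (n - 1):1) {
    emis <- dnorm(data[t + 1], mean = mu, sd = sigma)  # emission densities at t+1
    beta[, t] <- gamma %*% (emis * beta[, t + 1])
  }
  beta
}

In practice each column should be rescaled (as the forward code below does with the filtering probabilities) to avoid numerical underflow on long series.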
2.7 📊 Visualization
📈 Plot observations with colored hidden states
📊 Show state probabilities over time
📉 Compare model fit using criteria like AIC and BIC across different numbers of states
2.8 📌 Advantages of HMMs
✅ Handles hidden structure in time series
✅ Efficient thanks to Forward-Backward recursion
✅ Works well for sequential or temporal data
✅ Easily extended (e.g., with skew-normal emissions, covariates, etc.)
Code
set.seed(42)  # For reproducibility
data <- as.numeric(sx600.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  # Sample without replacement, but from the full data each time
  splits[[i]] <- sample(data, size_i, replace = FALSE)
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")

  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params, xi, omega, alpha)

  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i] * 100, params["xi"], params["omega"], params["alpha"]))
}
Subset 1 (90% of data): xi = 0.875, omega = 1.358, alpha = -1.378
Subset 2 (70% of data): xi = 0.858, omega = 1.345, alpha = -1.278
Subset 3 (40% of data): xi = 0.923, omega = 1.447, alpha = -1.557
Subset 4 (20% of data): xi = -0.597, omega = 1.255, alpha = 0.809
Code
forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  for (m in 2:4) {
    n = length(data)

    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)

    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)

    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)

    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])

    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                       omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }

    results[[m]] = list(llik = llik,
                        filtering = filtering,
                        pred = pred,
                        m = m,
                        gamma = gamma)
  }
  return(results)
}

out = forward_skew_loop(data = as.numeric(sx600.sr), location.vec = xi,
                        scale.vec = omega, shape = alpha)

for (m in 2:4) {
  cat("HMM with", m, "components has log-likelihood value:", out[[m]][["llik"]], "\n")
}
HMM with 2 components has log-likelihood value: -3755.94
HMM with 3 components has log-likelihood value: -3753.523
HMM with 4 components has log-likelihood value: -3729.03
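The raw log-likelihoods above are not directly comparable across values of m, since larger models have more parameters. A sketch of the AIC/BIC comparison mentioned earlier, assuming an m-state skew-normal HMM has m(m-1) free transition parameters plus 3m emission parameters (the initial distribution is tied to the stationary one, so it adds none). Note it is illustrative only, since the forward pass above uses randomly initialized transition matrices rather than optimized ones.

Code
n_obs <- length(as.numeric(sx600.sr))
for (m in 2:4) {
  k <- m * (m - 1) + 3 * m                        # free parameter count
  aic <- -2 * out[[m]][["llik"]] + 2 * k
  bic <- -2 * out[[m]][["llik"]] + k * log(n_obs)
  cat("m =", m, " AIC =", round(aic, 1), " BIC =", round(bic, 1), "\n")
}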
Code
# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]

  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))

  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
}
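The solve() call is a standard way to obtain the stationary distribution: \(\delta\) must satisfy \(\delta \Gamma = \delta\) together with \(\delta \mathbf{1} = 1\), and both conditions can be folded into a single linear system,

\[
\delta \left( I_m - \Gamma + U \right) = \mathbf{1}^\top
\quad\Longleftrightarrow\quad
\left( I_m - \Gamma + U \right)^\top \delta^\top = \mathbf{1},
\]

where \(U\) is the \(m \times m\) matrix of ones. The code solves the right-hand form, and the subsequent multiplication by \(\Gamma\) checks that \(\delta \Gamma = \delta\) indeed holds.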
Best model's shape parameter is: -1.133608 -1.252673 -1.891089 1.360593
Code
llik1 = function(par, m, data) {
  results = list()

  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)

  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]

  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))

  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)

  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])

  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }

  results = list(llik = llik, filtering = filtering, pred = pred, delta = delta)
  return(results)
}

# llik_optim_pars: optimized parameter vectors from the maximum-likelihood step
pars = llik_optim_pars[[m]]
forward_results <- llik1(par = llik_optim_pars[[m]], m = m, data = as.numeric(sx600.sr))
filtering <- forward_results$filtering
most_likely_states <- apply(filtering, 2, which.max)

# Plot histograms by state using only training data
par(mfrow = c(2, 2))  # 2x2 grid for 4 states
for (state in 1:4) {
  # Filter: only training period & only current state
  data_state <- as.numeric(sx600.sr)[most_likely_states == state]
  if (length(data_state) == 0) {
    plot.new()
    title(main = paste("No data for State", state))
    next
  }

  # plot = FALSE already returns densities, so probability = TRUE is not needed here
  hist_info <- hist(data_state, breaks = 50, plot = FALSE)
  data_min <- min(data_state)
  data_max <- max(data_state)
  range_expansion <- (data_max - data_min) * 0.15
  x_min <- data_min - range_expansion
  x_max <- data_max + range_expansion
  x_vals <- seq(x_min, x_max, length.out = 500)
  y_vals <- dsn(x_vals, xi = xi[state], omega = omega[state], alpha = alpha[state])
  ylim_max <- max(max(hist_info$density), max(y_vals)) * 1.1

  hist(data_state, probability = TRUE, breaks = 50,
       main = paste("Histogram of Returns - State", state),
       xlab = "Log-Return", ylab = "Density",
       col = "lightgray", border = "white",
       xlim = c(x_min, x_max), ylim = c(0, ylim_max))
  curve(dsn(x, xi = xi[state], omega = omega[state], alpha = alpha[state]),
        from = x_min, to = x_max, add = TRUE, col = "red", lwd = 2)
}
Code
dates = index(sx600[-1])
sx600.ret = xts(sx600.cr, order.by = dates)
dates_full <- index(sx600.ret)
returns_full <- na.omit(as.numeric(sx600.ret[, 1]))

plot_data_train <- data.frame(
  Date = dates_full,
  Return = returns_full,
  State = factor(most_likely_states, levels = 1:4)  # Only 4 states
)

# Step 3: Plot
ggplot(plot_data_train, aes(x = Date, y = Return, color = State)) +
  geom_point(size = 0.8, alpha = 0.8) +
  scale_color_brewer(palette = "Dark2") +
  labs(title = "STOXX600 Daily Returns Colored by Hidden State (2015–2025)",
       y = "Log-Return",
       x = "Date") +
  theme_minimal()
Code
set.seed(45)  # For reproducibility
data <- as.numeric(sp500.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  # Sample without replacement, but from the full data each time
  splits[[i]] <- sample(data, size_i, replace = FALSE)
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")

  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params, xi, omega, alpha)

  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i] * 100, params["xi"], params["omega"], params["alpha"]))
}
Subset 1 (90% of data): xi = 0.795, omega = 1.370, alpha = -0.955
Subset 2 (70% of data): xi = 0.859, omega = 1.419, alpha = -1.052
Subset 3 (40% of data): xi = -0.649, omega = 1.369, alpha = 0.892
Subset 4 (20% of data): xi = 0.908, omega = 1.411, alpha = -1.186
Code
forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  for (m in 2:4) {
    n = length(data)

    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)

    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)

    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)

    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])

    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                       omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }

    results[[m]] = list(llik = llik,
                        filtering = filtering,
                        pred = pred,
                        m = m,
                        gamma = gamma)
  }
  return(results)
}

out = forward_skew_loop(data = as.numeric(sp500.sr), location.vec = xi,
                        scale.vec = omega, shape = alpha)

for (m in 2:4) {
  cat("HMM with", m, "components has log-likelihood value:", out[[m]][["llik"]], "\n")
}
HMM with 2 components has log-likelihood value: -4023.631
HMM with 3 components has log-likelihood value: -3987.416
HMM with 4 components has log-likelihood value: -3971.691
Code
# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]

  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))

  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
}
Best model's shape parameter is: -0.7264074 -1.438691 0.2471854 -1.268492
Code
llik1 = function(par, m, data) {
  results = list()

  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)

  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]

  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))

  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)

  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])

  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }

  results = list(llik = llik, filtering = filtering, pred = pred, delta = delta)
  return(results)
}

# llik_optim_pars: optimized parameter vectors from the maximum-likelihood step
pars = llik_optim_pars[[m]]
forward_results <- llik1(par = llik_optim_pars[[m]], m = m, data = as.numeric(sp500.sr))
filtering <- forward_results$filtering
most_likely_states <- apply(filtering, 2, which.max)

# Plot histograms by state using only training data
par(mfrow = c(2, 2))  # 2x2 grid for 4 states
for (state in 1:4) {
  # Filter: only training period & only current state
  data_state <- as.numeric(sp500.sr)[most_likely_states == state]
  if (length(data_state) == 0) {
    plot.new()
    title(main = paste("No data for State", state))
    next
  }

  # plot = FALSE already returns densities, so probability = TRUE is not needed here
  hist_info <- hist(data_state, breaks = 50, plot = FALSE)
  data_min <- min(data_state)
  data_max <- max(data_state)
  range_expansion <- (data_max - data_min) * 0.15
  x_min <- data_min - range_expansion
  x_max <- data_max + range_expansion
  x_vals <- seq(x_min, x_max, length.out = 500)
  y_vals <- dsn(x_vals, xi = xi[state], omega = omega[state], alpha = alpha[state])
  ylim_max <- max(max(hist_info$density), max(y_vals)) * 1.1

  hist(data_state, probability = TRUE, breaks = 50,
       main = paste("Histogram of Returns - State", state),
       xlab = "Log-Return", ylab = "Density",
       col = "lightgray", border = "white",
       xlim = c(x_min, x_max), ylim = c(0, ylim_max))
  curve(dsn(x, xi = xi[state], omega = omega[state], alpha = alpha[state]),
        from = x_min, to = x_max, add = TRUE, col = "red", lwd = 2)
}
Code
dates = index(sp500[-1])
sp500.ret = xts(sp500.cr, order.by = dates)
dates_full <- index(sp500.ret)
returns_full <- na.omit(as.numeric(sp500.ret[, 1]))

plot_data_train <- data.frame(
  Date = dates_full,
  Return = returns_full,
  State = factor(most_likely_states, levels = 1:4)  # Only 4 states
)

# Step 3: Plot
ggplot(plot_data_train, aes(x = Date, y = Return, color = State)) +
  geom_point(size = 0.8, alpha = 0.8) +
  scale_color_brewer(palette = "Dark2") +
  labs(title = "SP500 Daily Returns Colored by Hidden State (2015–2025)",
       y = "Log-Return",
       x = "Date") +
  theme_minimal()
Code
nikkei = get.hist.quote(instrument = "^N225", start = "2015-01-01",
                        end = "2025-05-30", quote = "AdjClose")
Warning: ^N225 contains missing values. Some functions will not work if objects
contain missing values in the middle of the series. Consider using na.omit(),
na.approx(), na.fill(), etc to remove or replace them.
Code
nikkei.num = as.numeric(nikkei)  # numeric price vector (its definition is missing above; assumed conversion)
nts = length(nikkei)
nikkei.sr = 100 * diff(nikkei.num) / nikkei.num[-nts]  # simple returns in percent
nikkei.cr = diff(log(nikkei))                          # log returns
plot(nikkei.sr, type = "l", ylim = c(-10, 10))
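The two series computed above are the simple (percentage) return and the continuously compounded (log) return:

\[
r_t^{\text{simple}} = 100 \cdot \frac{P_t - P_{t-1}}{P_{t-1}},
\qquad
r_t^{\text{log}} = \ln P_t - \ln P_{t-1}.
\]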
Code
hist(nikkei.sr, breaks = 500, xlim = c(-4, 4), probability = TRUE,
     col = "red", border = "white",
     main = "Histogram of NIKKEI index",
     xlab = "Returns")
Code
set.seed(36)  # For reproducibility
data <- as.numeric(nikkei.sr)
n <- length(data)

# Define your target proportions
weights <- c(0.9, 0.7, 0.4, 0.2)

# Initialize vectors to store estimated parameters
xi <- numeric(4)
omega <- numeric(4)
alpha <- numeric(4)

# Create overlapping subsets
splits <- list()
for (i in 1:4) {
  size_i <- floor(weights[i] * n)
  # Sample without replacement, but from the full data each time
  splits[[i]] <- sample(data, size_i, replace = FALSE)
}

# Estimate skew-normal parameters for each subset
results <- list()
for (i in 1:4) {
  fit <- selm(splits[[i]] ~ 1, family = "SN")
  params <- coef(fit, param.type = "DP")

  # Store estimates
  xi[i] <- params["xi"]
  omega[i] <- params["omega"]
  alpha[i] <- params["alpha"]
  results[[i]] <- c(params, xi, omega, alpha)

  cat(sprintf("Subset %d (%.0f%% of data): xi = %.3f, omega = %.3f, alpha = %.3f\n",
              i, weights[i] * 100, params["xi"], params["omega"], params["alpha"]))
}
Subset 1 (90% of data): xi = 0.851, omega = 1.567, alpha = -0.876
Subset 2 (70% of data): xi = 0.856, omega = 1.510, alpha = -0.945
Subset 3 (40% of data): xi = 1.032, omega = 1.639, alpha = -1.162
Subset 4 (20% of data): xi = -0.887, omega = 1.628, alpha = 0.915
Code
forward_skew_loop = function(data, location.vec, scale.vec, shape) {
  results = list()
  for (m in 2:4) {
    n = length(data)

    # Generate gamma: m x m transition matrix with row sums = 1
    gamma = matrix(runif(m * m), nrow = m)
    gamma = gamma / rowSums(gamma)

    # Initial state probabilities (delta): uniform
    delta = rep(1/m, m)

    # Initialize matrices
    pred = matrix(NA, nrow = m, ncol = n)
    filtering = matrix(NA, nrow = m, ncol = n)

    # Time t = 1
    pred[, 1] = delta
    filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape[1:m])
    llik = log(sum(filtering[, 1]))
    filtering[, 1] = filtering[, 1] / sum(filtering[, 1])

    # Loop over time
    for (t in 2:n) {
      pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
      filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                       omega = scale.vec[1:m], alpha = shape[1:m])
      llik = llik + log(sum(filtering[, t]))
      filtering[, t] = filtering[, t] / sum(filtering[, t])
    }

    results[[m]] = list(llik = llik,
                        filtering = filtering,
                        pred = pred,
                        m = m,
                        gamma = gamma)
  }
  return(results)
}

out = forward_skew_loop(data = as.numeric(nikkei.sr), location.vec = xi,
                        scale.vec = omega, shape = alpha)

for (m in 2:4) {
  cat("HMM with", m, "components has log-likelihood value:", out[[m]][["llik"]], "\n")
}
HMM with 2 components has log-likelihood value: -4304.308
HMM with 3 components has log-likelihood value: -4300.9
HMM with 4 components has log-likelihood value: -4269.035
Code
# Initialize empty lists to store results
transition_m = list()
solved = list()

for (m in 2:4) {
  gamma_m = out[[m]][["gamma"]]

  # Compute stationary distribution
  transition_m[[m]] = solve(t(diag(m) - gamma_m + matrix(1, m, m)), rep(1, m))

  # Multiply with gamma to check if it's stationary
  solved[[m]] = transition_m[[m]] %*% gamma_m
  print(transition_m[[m]])
  print(solved[[m]])
}
Code
xi = llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)]
omega = llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)]
alpha = llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)]

cat("\nBest model's location parameter is:", llik_optim_pars[[m]][(m^2 + 1):(m^2 + m)],
    "\nBest model's shape parameter is:", llik_optim_pars[[m]][(m^2 + 2*m + 1):(m^2 + 3*m)],
    "\nBest model's scale parameter is:", llik_optim_pars[[m]][(m^2 + m + 1):(m^2 + 2*m)])
Best model's location parameter is: 0.873272 0.3671296 1.264923 -0.8585419
Best model's shape parameter is: -1.091009 -1.500902 -1.264446 0.2061429
Best model's scale parameter is: 1.527224 0.6268233 1.078457 3.169442
Code
llik1 = function(par, m, data) {
  results = list()

  # Extract and normalize transition matrix
  gamma = matrix(par[1:m^2], nrow = m, byrow = TRUE)
  gamma = gamma / apply(gamma, 1, sum)

  # Extract parameters
  location.vec = par[(m^2 + 1):(m^2 + m)]
  scale.vec = par[(m^2 + m + 1):(m^2 + 2*m)]
  shape.vec = par[(m^2 + 2*m + 1):(m^2 + 3*m)]

  # Compute stationary distribution
  delta = c(solve(t(diag(m) - gamma + matrix(1, m, m)), rep(1, m)))

  n = length(data)
  pred = matrix(NA, nrow = m, ncol = n)
  filtering = matrix(NA, nrow = m, ncol = n)

  # Time t = 1
  pred[, 1] = delta
  filtering[, 1] = pred[, 1] * dsn(x = data[1], xi = location.vec[1:m],
                                   omega = scale.vec[1:m], alpha = shape.vec[1:m])
  llik = log(sum(filtering[, 1]))
  filtering[, 1] = filtering[, 1] / sum(filtering[, 1])

  # Loop over time
  for (t in 2:n) {
    pred[, t] = c(t(filtering[, t - 1]) %*% gamma)
    filtering[, t] = pred[, t] * dsn(x = data[t], xi = location.vec[1:m],
                                     omega = scale.vec[1:m], alpha = shape.vec[1:m])
    llik = llik + log(sum(filtering[, t]))
    filtering[, t] = filtering[, t] / sum(filtering[, t])
  }

  results = list(llik = llik, filtering = filtering, pred = pred, delta = delta)
  return(results)
}

# llik_optim_pars: optimized parameter vectors from the maximum-likelihood step
pars = llik_optim_pars[[m]]
forward_results <- llik1(par = llik_optim_pars[[m]], m = m, data = as.numeric(nikkei.sr))
filtering <- forward_results$filtering
most_likely_states <- apply(filtering, 2, which.max)

# Plot histograms by state using only training data
par(mfrow = c(2, 2))  # 2x2 grid for 4 states
for (state in 1:4) {
  # Filter: only training period & only current state
  data_state <- as.numeric(nikkei.sr)[most_likely_states == state]
  if (length(data_state) == 0) {
    plot.new()
    title(main = paste("No data for State", state))
    next
  }

  # plot = FALSE already returns densities, so probability = TRUE is not needed here
  hist_info <- hist(data_state, breaks = 50, plot = FALSE)
  data_min <- min(data_state)
  data_max <- max(data_state)
  range_expansion <- (data_max - data_min) * 0.15
  x_min <- data_min - range_expansion
  x_max <- data_max + range_expansion
  x_vals <- seq(x_min, x_max, length.out = 500)
  y_vals <- dsn(x_vals, xi = xi[state], omega = omega[state], alpha = alpha[state])
  ylim_max <- max(max(hist_info$density), max(y_vals)) * 1.1

  hist(data_state, probability = TRUE, breaks = 50,
       main = paste("Histogram of Returns - State", state),
       xlab = "Log-Return", ylab = "Density",
       col = "lightgray", border = "white",
       xlim = c(x_min, x_max), ylim = c(0, ylim_max))
  curve(dsn(x, xi = xi[state], omega = omega[state], alpha = alpha[state]),
        from = x_min, to = x_max, add = TRUE, col = "red", lwd = 2)
}
Code
dates = index(nikkei[-1])
nikkei.ret = xts(nikkei.cr, order.by = dates)
dates_full <- index(nikkei.ret)
returns_full <- na.omit(as.numeric(nikkei.ret[, 1]))

plot_data_train <- data.frame(
  Date = dates_full,
  Return = returns_full,
  State = factor(most_likely_states, levels = 1:4)  # Only 4 states
)

# Step 3: Plot
ggplot(plot_data_train, aes(x = Date, y = Return, color = State)) +
  geom_point(size = 0.8, alpha = 0.8) +
  scale_color_brewer(palette = "Dark2") +
  labs(title = "NIKKEI Daily Returns Colored by Hidden State (2015–2025)",
       y = "Log-Return",
       x = "Date") +
  theme_minimal()
4 Fitting the GEV distribution to the monthly maxima and minima
4.1 Generalized Extreme Value (GEV) Distribution — Theory
The GEV distribution is designed to model the extremes (either maxima or minima) of a dataset. Instead of modeling the entire distribution, it focuses only on tail events — e.g., extreme market crashes or spikes.
4.1.1 Definition

The GEV cumulative distribution function is

\[
G(x) = \exp\left\{ -\left[ 1 + \xi \, \frac{x - \mu}{\sigma} \right]^{-1/\xi} \right\},
\qquad 1 + \xi \, \frac{x - \mu}{\sigma} > 0,
\]

with the \(\xi = 0\) (Gumbel) case obtained as the limit \(G(x) = \exp\{ -e^{-(x - \mu)/\sigma} \}\).

Where:

- \(\mu\): Location parameter — shifts the distribution
- \(\sigma > 0\): Scale parameter — controls spread
- \(\xi\): Shape parameter — determines the tail behavior
4.1.2 GEV Types Based on Shape Parameter
| Type    | ξ (Shape) | Tail Behavior      |
|---------|-----------|--------------------|
| Fréchet | ξ > 0     | Heavy-tailed       |
| Gumbel  | ξ = 0     | Light-tailed       |
| Weibull | ξ < 0     | Bounded upper tail |
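To visualize the three regimes in the table, one can overlay GEV densities with different shape parameters. A sketch assuming the evd package (not among the libraries loaded earlier) and its dgev() density function:

Code
library(evd)  # assumed here for dgev(); any GEV implementation would do

x <- seq(-4, 6, length.out = 500)
plot(x, dgev(x, loc = 0, scale = 1, shape = 0.3), type = "l", col = "red",
     ylab = "Density", main = "GEV densities by shape parameter")
lines(x, dgev(x, loc = 0, scale = 1, shape = 0), col = "black")    # Gumbel limit
lines(x, dgev(x, loc = 0, scale = 1, shape = -0.3), col = "blue")  # Weibull type
legend("topright",
       legend = c("Fréchet (ξ = 0.3)", "Gumbel (ξ = 0)", "Weibull (ξ = -0.3)"),
       col = c("red", "black", "blue"), lty = 1)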
4.2 Why Use GEV in Financial Time Series?
Financial markets occasionally exhibit extreme behavior (e.g., crashes, spikes). While the HMM captures regime-switching behavior over time, it does not directly model the most extreme values within each state.
By fitting a GEV distribution to monthly maxima and minima, we gain:
Insight into tail risk,
Better understanding of risk under each regime,
Tools to estimate return levels (e.g., “What’s the largest monthly drop we expect every 10 years?”).
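The return level in the last point has a closed form: the level \(x_T\) exceeded on average once every \(T\) blocks (here, months) solves \(G(x_T) = 1 - 1/T\), giving for \(\xi \neq 0\)

\[
x_T = \mu + \frac{\sigma}{\xi} \left[ \left( -\ln\left(1 - \frac{1}{T}\right) \right)^{-\xi} - 1 \right].
\]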
Code
sx600 = get.hist.quote(instrument = "^stoxx", start = "2015-01-01",
                       end = "2025-04-30", quote = "AdjClose")
time series starts 2015-01-05
time series ends 2025-04-29
Code
# Ensure column name is standardized
colnames(sx600) <- "AdjClose"

# Convert to a data frame with an explicit date column
sx600_df <- data.frame(
  date = index(sx600),
  price = as.numeric(sx600$AdjClose)  # extract as numeric
)

# Add a "month" column
sx600_df <- sx600_df %>%
  mutate(month = floor_date(date, "month"))

# Compute monthly maxima and minima
monthly_extremes <- sx600_df %>%
  group_by(month) %>%
  summarise(
    monthly_max = max(price, na.rm = TRUE),
    monthly_min = min(price, na.rm = TRUE),
    .groups = "drop"
  )

# View result
head(monthly_extremes)
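With the monthly extremes in hand, the GEV fit itself could look like the following sketch, again assuming the evd package: fgev() fits block maxima directly, and minima are handled by the usual trick of negating them (the GEV is then fitted to \(-\min\)); qgev() gives the return level.

Code
library(evd)  # assumed; provides fgev() and qgev()

# Fit GEV to the monthly maxima
fit_max <- fgev(monthly_extremes$monthly_max)

# Fit GEV to the negated monthly minima (block-minima trick)
fit_min <- fgev(-monthly_extremes$monthly_min)

fit_max$estimate  # loc, scale, shape for the maxima
fit_min$estimate  # loc, scale, shape for the negated minima

# 10-year (120-month) return level for the maxima
p <- fit_max$estimate
qgev(1 - 1/120, loc = p["loc"], scale = p["scale"], shape = p["shape"])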
Code
sp500 = get.hist.quote(instrument = "^gspc", start = "2015-01-01",
                       end = "2025-04-30", quote = "AdjClose")
time series starts 2015-01-02
time series ends 2025-04-29
Code
# Ensure column name is standardized
colnames(sp500) <- "AdjClose"

# Convert to a data frame with an explicit date column
sp500_df <- data.frame(
  date = index(sp500),
  price = as.numeric(sp500$AdjClose)  # extract as numeric
)

# Add a "month" column
sp500_df <- sp500_df %>%
  mutate(month = floor_date(date, "month"))

# Compute monthly maxima and minima
monthly_extremes <- sp500_df %>%
  group_by(month) %>%
  summarise(
    monthly_max = max(price, na.rm = TRUE),
    monthly_min = min(price, na.rm = TRUE),
    .groups = "drop"
  )

# View result
head(monthly_extremes)
Code
nikkei = get.hist.quote(instrument = "^N225", start = "2015-01-01",
                        end = "2025-05-30", quote = "AdjClose")
Warning: ^N225 contains missing values. Some functions will not work if objects
contain missing values in the middle of the series. Consider using na.omit(),
na.approx(), na.fill(), etc to remove or replace them.
time series starts 2015-01-05
Code
nikkei = na.omit(nikkei)

# Ensure column name is standardized
colnames(nikkei) <- "AdjClose"

# Convert to a data frame with an explicit date column
nikkei_df <- data.frame(
  date = index(nikkei),
  price = as.numeric(nikkei$AdjClose)  # extract as numeric
)

# Add a "month" column
nikkei_df <- nikkei_df %>%
  mutate(month = floor_date(date, "month"))

# Compute monthly maxima and minima
monthly_extremes <- nikkei_df %>%
  group_by(month) %>%
  summarise(
    monthly_max = max(price, na.rm = TRUE),
    monthly_min = min(price, na.rm = TRUE),
    .groups = "drop"
  )

# View result
head(monthly_extremes)