Intro to hmix

Author

Giancarlo Vercellino

Published

September 10, 2024

“Honor is a luxury for aristocrats, but it is a necessity for servants.” - François Vatel

“A recipe has no soul. You, as the cook, must bring soul to the recipe.” - Thomas Keller

“Good food is very often, even most often, simple food.” - Anthony Bourdain

“Anyone can cook, but only the fearless can be great.” - Remy

Mixing different ingredients

The hmix algorithm is aimed at modeling time-dependent sequences, assuming that the system under analysis undergoes transitions between a finite set of hidden states, each of which produces observable outputs according to probabilistic rules.

The hmix algorithm starts by transforming the time series into returns to normalize the data and reduce non-stationarity. The data is then segmented and clustered using k-means, which identifies distinct regimes that serve as observed states for the Hidden Markov Model1 (HMM). The HMM is initialized with random probabilities and trained using the Baum-Welch algorithm, with Viterbi training as a fallback. Once trained, the model generates probabilistic forecasts by sampling future states and reconstructing the original series. The algorithm uses a mixture function to fit several distributions to the data, combining them into an empirical distribution function for more accurate forecasting. The accuracy of these forecasts is then evaluated using the Continuous Ranked Probability Score (CRPS) through a backtesting process.

The process flow of hmix in more detail:

  • Data Preparation: The original time series is transformed into returns, calculated as the percentage change from one time step to the next. This normalizes the data, making it more suitable for modeling and reducing non-stationarity. The transformed series is then reframed into a matrix format: the returns are organized into non-overlapping segments of length equal to the horizon, and each segment corresponds to a portion of the time series that will be used for clustering and HMM training (a minimal sketch of these steps is shown after this list).

  • Clustering: The reframed time series data is clustered into centers groups using the k-means algorithm2. Each cluster represents a distinct regime or state of the time series, characterized by similar statistical properties. K-means assigns each window of data to the nearest cluster center, minimizing the variance within each cluster. The resulting sequence of cluster labels, one per segment, is the sequence of observed states corresponding to the regimes identified by the clustering process; it serves as the input for training the HMM, whose goal is to model the transitions between these regimes over time.

  • HMM: The algorithm defines a set of hidden states (n_hidden) for the HMM. Each hidden state represents an underlying process that governs the transitions between observed regimes (clusters) in the time series. Hidden states are not directly observable but are inferred from the observed data. The transition probabilities (probabilities of moving from one hidden state to another), emission probabilities (probabilities of observing a particular regime given a hidden state), and initial state probabilities are randomly initialized using a Dirichlet distribution. The Dirichlet distribution ensures that the probabilities sum to one, making them valid probability distributions. This random initialization is crucial for starting the HMM training process, providing the model with a reasonable starting point for learning the true underlying probabilities from the data.

    The HMM is trained on the clustered data using the Baum-Welch algorithm. If Baum-Welch fails, Viterbi training is used as a fallback. The training process estimates the model parameters that best explain the observed sequences. After training, the HMM provides estimates of the transition probabilities (how likely it is to move from one hidden state to another) and emission probabilities (how likely each observed regime is given a hidden state). These probabilities form the basis of the model’s ability to generate forecasts and infer the underlying structure of the time series.

  • Forecast Sampling: This stage of the algorithm focuses on generating probabilistic forecasts from a trained HMM for a given time series. It begins by sampling future observations based on the probabilities derived from the HMM, specifically from the next_observation_probs matrix. This matrix represents the likelihood of different future regimes or clusters for the next sequence in the horizon. The algorithm uses these probabilities to sample potential future states, producing a vector of predicted clusters.

    Next, the algorithm retrieves actual data from the sampled clusters, constructing a data frame where each row corresponds to a sampled sequence drawn from the corresponding cluster bucket. This data is then used to reconstruct the original series from the returns, yielding the predicted time series values over the specified forecast horizon. To ensure the forecasted paths are valid, the algorithm filters out any sequences that violate the constraints of the original time series (e.g., negative values in a strictly positive series). It also rounds the results if the original series consists of integer values.

    The filtered forecast paths are then processed by fitting a mixture of distributions to each time step, capturing the uncertainty in the predictions through a blend of different distributions.

  • Mixture: The mixture function is designed to create a composite probabilistic model by fitting several different statistical distributions to a given set of data points. It begins by fitting three types of distributions: the Generalized Normal Distribution3, Generalized Logistic Distribution4 and Generalized Lambda Distribution5 to the input data. Each of these distributions captures different characteristics of the data, and the function generates a set of possible values (quantiles) from each fitted distribution.

    Once these fitted values are obtained, the function combines them with the original data to create a comprehensive set of potential outcomes that represent the underlying distribution of the data. It then filters these combined values to ensure they are consistent with the original data (e.g., non-negative values if the original data is non-negative). This step helps in maintaining the integrity of the modeled distribution relative to the characteristics of the input data.

    The final step involves creating an empirical distribution function from the combined and filtered values using the edfun package6. This empirical distribution function represents the mixture model, capturing the combined effects of the different distributions fitted earlier. The output of the mixture function is a predictive function that can be used for forecasting or evaluating the probability distribution of new data points, making it a versatile tool for probabilistic analysis (a simplified stand-in for this mixture-and-EDF step is sketched after this list).

  • Backtesting: The algorithm is wrapped in a backtesting function that compares the model’s predictions to the actual observations at each test point. The Continuous Ranked Probability Score (CRPS)7 is calculated for each test point to quantify the accuracy of the probabilistic forecasts. The result is returned as a matrix with n_tests rows and a number of columns equal to horizon (a sample-based CRPS approximation is sketched after this list).
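
To make the steps above more concrete, here is a minimal sketch in R of the preparation, clustering, HMM-training and sampling stages. It is not the internal code of hmix: the price series and the next-regime calculation are simplified assumptions, and only initHMM, baumWelch, viterbiTraining and posterior come from the HMM package for R (see footnote 1).

###A MINIMAL SKETCH OF THE PIPELINE (SIMPLIFIED, NOT THE PACKAGE INTERNALS)
library(HMM)

set.seed(42)
prices  <- cumprod(c(100, 1 + rnorm(280, 0, 0.01)))   # stand-in for a price series
horizon <- 7; centers <- 5; n_hidden <- 3

### 1. Data preparation: percentage returns, reframed into non-overlapping windows
rets    <- diff(prices) / head(prices, -1)
n_win   <- floor(length(rets) / horizon)
windows <- matrix(rets[1:(n_win * horizon)], nrow = n_win, byrow = TRUE)

### 2. Clustering: each window of returns becomes one observed regime
km  <- kmeans(windows, centers = centers, nstart = 10)
obs <- as.character(km$cluster)

### 3. HMM: random Dirichlet initialization, Baum-Welch with Viterbi training as fallback
rdiri   <- function(n) { g <- rgamma(n, 1); g / sum(g) }   # one flat Dirichlet draw
states  <- paste0("s", seq_len(n_hidden))
symbols <- as.character(seq_len(centers))
hmm0 <- initHMM(states, symbols,
                startProbs    = rdiri(n_hidden),
                transProbs    = t(replicate(n_hidden, rdiri(n_hidden))),
                emissionProbs = t(replicate(n_hidden, rdiri(centers))))
fit  <- tryCatch(baumWelch(hmm0, obs, maxIterations = 100)$hmm,
                 error = function(e) viterbiTraining(hmm0, obs, maxIterations = 100)$hmm)

### 4. Forecast sampling: probability of the next regime, then one sampled price path
post      <- posterior(fit, obs)[, length(obs)]                 # P(hidden state | data) at the last step
next_obs  <- (post %*% fit$transProbs) %*% fit$emissionProbs    # P(next observed regime)
regime    <- sample(symbols, 1, prob = as.vector(next_obs))
idx       <- which(obs == regime)                               # windows belonging to that regime
candidate <- windows[idx[sample.int(length(idx), 1)], ]
path      <- tail(prices, 1) * cumprod(1 + candidate)           # reconstructed price path over the horizon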
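
The mixture step can be illustrated with a deliberately simplified stand-in: fit a couple of distributions, pool their quantiles with the observed values, and hand the pooled sample to edfun. The real mixture function fits the generalized normal, generalized logistic and generalized lambda families via the normalp, glogis and gld packages (footnotes 3, 4 and 5); here the plain normal and logistic, fitted with MASS::fitdistr, are used only to keep the sketch short, and forecast_samples is a made-up input.

###A SIMPLIFIED STAND-IN FOR THE MIXTURE-AND-EDF STEP
library(MASS)
library(edfun)

set.seed(42)
forecast_samples <- 170 + rnorm(300, 0, 4)    # stand-in for the sampled values at one horizon point
probs <- seq(0.01, 0.99, by = 0.01)

fit_norm  <- fitdistr(forecast_samples, "normal")
fit_logis <- fitdistr(forecast_samples, "logistic")
q_norm    <- qnorm(probs,  fit_norm$estimate["mean"], fit_norm$estimate["sd"])
q_logis   <- qlogis(probs, fit_logis$estimate["location"], fit_logis$estimate["scale"])

pooled <- c(forecast_samples, q_norm, q_logis)
pooled <- pooled[pooled > 0]                  # keep values consistent with a strictly positive series

pred_fun <- edfun(pooled)                     # empirical dfun, pfun, qfun, rfun, as in pred_funs
pred_fun$qfun(c(0.1, 0.9))                    # e.g., an 80% interval from the pooled sample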
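
Finally, the CRPS used in the backtesting can be approximated directly from forecast samples with the standard sample-based form CRPS(F, y) ~ E|X - y| - 0.5 * E|X - X'|. The sketch below uses made-up samples and an arbitrary observed value rather than the package’s own backtesting wrapper; the scoringRules package provides crps_sample() for the same estimator.

###A SAMPLE-BASED APPROXIMATION OF THE CRPS
crps_from_samples <- function(samples, observed) {
  # mean absolute error of the samples minus half their mean pairwise spread
  mean(abs(samples - observed)) - 0.5 * mean(abs(outer(samples, samples, "-")))
}

set.seed(42)
forecast_samples <- 176 + rnorm(1000, 0, 5)   # stand-in for pred_funs$t7$rfun(1000)
crps_from_samples(forecast_samples, observed = 180)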

Show cooking with data: some practical examples

In this first example, we will demonstrate the basic usage of the hmix function on the AMZN time series, predicting over a time horizon of 7 points in the future, using 5 clusters, 3 hidden states, and 5 backtesting points.

library(hmix)

results <- hmix(dummy_set$AMZN.Close, horizon = 7, centers = 5, n_hidden = 3, n_tests = 5)

The result is a list containing two main objects:

  • error_sets: This object includes the Continuous Ranked Probability Score (CRPS) for each test point (rows) across the forecast horizon (columns), providing a detailed evaluation of the model’s predictive accuracy over time.

    ###THE CRPS COLLECTED DURING BACKTESTING
    results$error_sets$crps_set
                  t1        t2         t3        t4        t5       t6       t7
    test_1 5.6316605 3.4799276 12.0893445 13.409894 11.133238 2.651227 1.477643
    test_2 1.3557777 1.8574234  0.9769766  1.070928  2.912289 3.365841 3.143530
    test_3 0.8085863 2.9002482  2.1105824  2.539823  1.124659 2.676483 3.404681
    test_4 1.4962359 0.7674623  1.0357988  4.029595  3.198497 6.371474 5.260257
    test_5 0.7548107 1.3296448  1.5795320  2.195853  3.109705 3.121753 3.465947
    ###THE ERROR DISTRIBUTION OVER PERCENTILES
    quantile(results$error_sets$crps_set, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
            5%        25%        50%        75%        95% 
     0.7962491  1.4167102  2.6764834  3.4353140 11.4200700 

  • model: This object encapsulates the hmm_model, which contains comprehensive details about the Hidden Markov Model (HMM). It includes the observed data (clustered time series), the initial HMM configuration, and the trained HMM parameters, such as the states, symbols, initial state probabilities (startProbs), transition probabilities (transProbs), and emission probabilities (emissionProbs).

###THE OBSERVATIONS CREATED WITH K-MEANS
results$model$hmm_model$observations
  [1] 5 5 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 5 2 5 5 5 5 4 5 5 5 3 5 5 5 5 5 5 5 4
 [38] 1 4 1 5 5 5 4 4 1 1 5 5 5 5 5 5 5 5 4 5 4 3 4 5 1 2 4 2 3 5 3 4 2 1 1 4 3
 [75] 4 5 5 5 5 5 5 5 4 5 5 4 1 5 5 1 5 2 1 1 5 4 5 5 5 5 5 5 5 5 5 5 1 5 5 3 5
[112] 2 4 4 1 4 3 4 4 5 5 5 5 1 3 1 2 3 4 1 1 5 4 1 4 2 2 1 5 5 5 5 5 4 5 3 5 2
[149] 2 1 5 5 5 5 5 5 4 5 5 1 5 5 5 2 5 4 5 5 2 5 5 1 5 1 3 5 1 5 5 4 2 3 1 2 1
[186] 5 1 5 2 2 2 5 1 1 2 4 4 5 1 4 5 2 4 5 2 4 5 1 2 4 2 3 5 2 4 3 5 1 5 2 3 3
[223] 2 1 3 5 4 5 1 5 5 5 5 5 4 4 5 5 5 1 2 4 5 2 1 5 5 5 1 1 2 1 5 3 4 5 5 5 5
[260] 4 2 5 4 5 5 5 5 4 1 2 2 2 4 2
###THE TRAINED PARAMETERS FOR HMM
results$model$hmm_model$trained_hmm$hmm
$States
[1] "s1" "s2" "s3"

$Symbols
[1] 5 4 2 3 1

$startProbs
        s1         s2         s3 
0.04505689 0.57172222 0.38322089 

$transProbs
    to
from           s1            s2        s3
  s1 5.685692e-01 1.785201e-141 0.4314308
  s2 7.758710e-89  8.521139e-01 0.1478861
  s3 6.220538e-01  2.683945e-01 0.1095517

$emissionProbs
      symbols
states            5            4            2          3            1
    s1 1.157265e-01 3.384340e-01 1.848545e-01 0.03050405 3.304809e-01
    s2 1.394967e-41 2.988782e-02 1.160988e-02 0.07217932 8.863230e-01
    s3 4.690382e-01 5.213926e-82 1.322001e-10 0.53096182 1.070940e-44

The model also includes pred_funs, a set of predictive functions generated by the mixture function. These functions provide comprehensive probabilistic forecasting capabilities, including:

  • rfun: A sampler function that generates random samples from the predicted distribution for each point in the time horizon.

  • dfun: The probability density function (PDF), which computes the likelihood of different outcomes.

  • pfun: The cumulative distribution function (CDF), which calculates the probability of an outcome being less than or equal to a specific value.

  • qfun: The inverse cumulative distribution function (ICDF), which returns the quantile associated with a given probability.

These functions collectively enable a thorough probabilistic analysis of future time series values.

names(results$model$pred_funs)
[1] "t1" "t2" "t3" "t4" "t5" "t6" "t7"
results$model$pred_funs$t7
$dfun
function (x) 
dapproxfun(x)
<bytecode: 0x0000025e56edcb40>
<environment: 0x0000025e58eefcd0>

$pfun
Empirical CDF 
Call: ecdf(x)
 x[1:1458] = 143.35, 144.76, 149.24,  ..., 209.54, 214.98

$qfun
function (v) 
.approxfun(x, y, v, method, yleft, yright, f, na.rm)
<bytecode: 0x0000025e56e5e028>
<environment: 0x0000025e5a265610>

$rfun
function (n) 
sample(x, size = n, replace = TRUE)
<bytecode: 0x0000025e56edddd8>
<environment: 0x0000025e58eefcd0>

$pfun_integrate_dfun
NULL

Now, let’s explore the power of our prediction functions. We can calculate the probability of different price trends, determine the likelihood of the price falling within a specific range, and even construct confidence intervals and odds estimates for each point in our forecasting horizon.

###GROWTH PROBABILITY OVER THE TIME HORIZON?
1 - results$model$pred_funs$t7$pfun(tail(dummy_set$AMZN.Close, 1)) 
[1] 0.7018182
###PROBABILITY FOR A PRICE RANGE BETWEEN 180 AND 185 AT T7?
diff(results$model$pred_funs$t7$pfun(c(180, 185)))
[1] 0.1686364
###80% CONFIDENCE RANGE OF PREDICTION AT T1?
results$model$pred_funs$t1$qfun(c(0.1, 0.9))
[1] 170.1640 177.7822
###95% CONFIDENCE RANGE OF PREDICTION AT T3?
results$model$pred_funs$t3$qfun(c(0.025, 0.975))
[1] 165.0955 185.9422
###ODDS FOR PRICE 185 OVER 165 AT T3?
results$model$pred_funs$t3$dfun(185)/results$model$pred_funs$t3$dfun(165)
[1] 0.8325378
###CALCULATING THE EXPECTATION AT T7 USING THE SAMPLER
mean(results$model$pred_funs$t7$rfun(1000))
[1] 176.5844

Closing remarks

Why did Mr. Markov become a chef? Because he always loves to “stir” up hidden patterns and to “season” his predictions with a pinch of randomness! (yeah, it’s a joke)

Enzoi!

Footnotes

  1. The hmm package provides tools for the analysis and modeling of Hidden Markov Models, allowing users to estimate model parameters, decode hidden states, and apply the Baum-Welch and Viterbi algorithms. For more details, refer to the hmm package documentation.↩︎

  2. K-means needs no introduction: it is the most popular clustering algorithm, partitioning data into groups by minimizing the distance between points and their assigned centroids. It’s favored for its simplicity and efficiency, especially with large datasets, though it may struggle with irregularly shaped clusters. Here are the (universally standard) function references.↩︎

  3. The Generalized Normal Distribution is a generalization of the normal distribution that allows for heavier or lighter tails by adjusting a shape parameter. It is implemented in the normalp package in R, which provides tools for fitting and analyzing this distribution. For more details, refer to the normalp package documentation.↩︎

  4. The Generalized Logistic Distribution is an extension of the standard logistic distribution that can model skewness and kurtosis more flexibly. It is used to describe data with asymmetrical behavior and is implemented in the glogis package in R. You can find more information in the glogis package documentation.↩︎

  5. The Generalized Lambda Distribution can take on the shape of many other distributions (e.g., normal, exponential) by adjusting its parameters, making it useful for modeling a wide range of data shapes. The gld package in R provides tools for fitting and analyzing the generalized lambda distribution. Additional information is available in the gld package documentation.↩︎

  6. The edfun package in R provides tools for constructing empirical distribution functions, which are non-parametric estimates of the cumulative distribution function (CDF) and inverse CDF based on observed data. For more information, you can visit edfun on CRAN.↩︎

  7. The CRPS is a scoring rule used to evaluate the accuracy of probabilistic forecasts. It measures the difference between the cumulative distribution function (CDF) of the predicted probability distribution and the actual observed value. The CRPS rewards sharp and accurate forecasts, with lower scores indicating better predictive performance. For more information, I like the way it is explained here.↩︎