Intro to hmix
“Honor is a luxury for aristocrats, but it is a necessity for servants.” - François Vatel
“A recipe has no soul. You, as the cook, must bring soul to the recipe.” - Thomas Keller
“Good food is very often, even most often, simple food.” - Anthony Bourdain
“Anyone can cook, but only the fearless can be great.” - Remy
Mixing different ingredients
The hmix algorithm is aimed at modeling time-dependent sequences, assuming that the system under analysis undergoes transitions between a finite set of hidden states, each of which produces observable outputs according to probabilistic rules.
The hmix algorithm starts by transforming the time series into returns to normalize the data and reduce non-stationarity. The data is then segmented and clustered with k-means, which identifies distinct regimes that serve as the observed states for a Hidden Markov Model (HMM) [1]. The HMM is initialized with random probabilities and trained with the Baum-Welch algorithm, with Viterbi training as a fallback. Once trained, the model generates probabilistic forecasts by sampling future states and reconstructing the original series. A mixture function then fits several distributions to the sampled paths and combines them into an empirical distribution function for more accurate forecasting. The accuracy of these forecasts is evaluated with the Continuous Ranked Probability Score (CRPS) through a backtesting process.
The process flow of hmix in more detail:
Data Preparation: The original time series is transformed into returns, calculated as the percentage change from one time step to the next. This normalizes the data, making it more suitable for modeling and reducing non-stationarity. The transformed series is then reframed into a matrix of non-overlapping segments of length horizon, where each window corresponds to a portion of the time series that will be used for clustering and HMM training.
Clustering: The reframed data is clustered into centers groups using the k-means algorithm [2]. Each cluster represents a distinct regime of the time series, characterized by similar statistical properties. K-means assigns each window to the nearest cluster center by minimizing the variance within each cluster, so every time segment is mapped to one of the clusters, creating a sequence of observed states. This sequence of observed states, corresponding to the regimes identified by the clustering, is the input for training the HMM, whose goal is to model the transitions between these regimes over time.
HMM: The algorithm defines a set of hidden states (n_hidden) for the HMM. Each hidden state represents an underlying process that governs the transitions between the observed regimes (clusters); hidden states are not directly observable but are inferred from the observed data. The transition probabilities (probabilities of moving from one hidden state to another), emission probabilities (probabilities of observing a particular regime given a hidden state), and initial state probabilities are randomly initialized using a Dirichlet distribution, which ensures that each set of probabilities sums to one. This random initialization provides the model with a reasonable starting point for learning the true underlying probabilities from the data. The HMM is then trained on the clustered data with the Baum-Welch algorithm; if Baum-Welch fails, Viterbi training is used as a fallback. After training, the HMM provides estimates of the transition probabilities (how likely it is to move from one hidden state to another) and emission probabilities (how likely each observed regime is given a hidden state). These probabilities form the basis of the model's ability to generate forecasts and infer the underlying structure of the time series.
Forecast Sampling: This part of the algorithm generates probabilistic forecasts from the trained HMM. It begins by sampling future observations based on the probabilities derived from the HMM, specifically from the next_observation_probs matrix, which represents the likelihood of the different regimes (clusters) for the next sequence in the horizon. The algorithm uses these probabilities to sample potential future states, producing a vector of predicted clusters. Next, the algorithm retrieves actual data from the sampled clusters, constructing a data frame where each row corresponds to a sampled sequence from each cluster bucket. This data is used to reconstruct the original series from the returns, yielding predicted values over the specified forecast horizon. To ensure the forecasted paths are valid, the algorithm filters out any sequences that violate the constraints of the original series (e.g., negative values in a strictly positive series) and rounds the results if the original series consists of integer values. The filtered forecast paths are then processed by fitting mixture distributions to each time step, capturing the uncertainty of the predictions with a blend of different distributions.
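Continuing the sketch above, one plausible way to obtain something like the next_observation_probs matrix and to sample future regimes is outlined below. The actual sampling logic inside hmix may differ; fit, observed, windows and prices come from the previous snippet, and posterior() is the standard HMM-package function.
# Probability of each hidden state at the last observed time step
state_post <- posterior(fit, observed)[, length(observed)]

# One-step-ahead distributions: hidden states first, then observable regimes (clusters)
next_state_probs       <- as.vector(t(fit$transProbs) %*% state_post)
next_observation_probs <- as.vector(t(fit$emissionProbs) %*% next_state_probs)
names(next_observation_probs) <- fit$Symbols

# Sample future regimes and, for each, pick a stored window of returns from that cluster
n_paths          <- 100
sampled_clusters <- sample(fit$Symbols, n_paths, replace = TRUE, prob = next_observation_probs)
sampled_returns  <- t(sapply(sampled_clusters, function(cl) {
  rows <- which(observed == cl)
  windows[rows[sample.int(length(rows), 1)], ]
}))

# Reconstruct price paths from returns, starting at the last observed price
last_price <- tail(prices, 1)
paths <- t(apply(1 + sampled_returns, 1, function(r) last_price * cumprod(r)))
paths <- paths[apply(paths, 1, function(p) all(p > 0)), , drop = FALSE]  # keep only valid (positive) paths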
Mixture: The mixture function creates a composite probabilistic model by fitting several statistical distributions to a given set of data points: the Generalized Normal Distribution [3], the Generalized Logistic Distribution [4] and the Generalized Lambda Distribution [5]. Each of these distributions captures different characteristics of the data, and the function generates a set of possible values (quantiles) from each fitted distribution. These fitted values are then combined with the original data to create a comprehensive set of potential outcomes representing the underlying distribution. The combined values are filtered to keep them consistent with the original data (e.g., non-negative values if the original data is non-negative), preserving the integrity of the modeled distribution relative to the characteristics of the input. The final step builds an empirical distribution function from the combined and filtered values using the edfun package [6]. This empirical distribution function represents the mixture model, capturing the combined effect of the different fitted distributions. The output of the mixture function is a set of predictive functions that can be used for forecasting or for evaluating the probability distribution of new data points, making it a versatile tool for probabilistic analysis.
Backtesting: The algorithm is wrapped in a backtesting function that compares the model's predictions with the actual observations at each testing point. The Continuous Ranked Probability Score (CRPS) [7] is calculated for each test point to quantify the accuracy of the probabilistic forecasts. The result is returned as a matrix with n_tests rows and a number of columns equal to horizon.
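As a rough illustration of the Mixture step, the sketch below pools quantiles from two parametric fits with the raw forecast samples and converts the pool into an empirical distribution with edfun::edfun, whose output has the same dfun/pfun/qfun/rfun structure shown later in this post. For brevity, a plain normal and a plain logistic stand in for the generalized normal, logistic and lambda families that hmix actually fits; paths comes from the previous snippet.
library(edfun)   # empirical distribution functions: dfun, pfun, qfun, rfun

# Build one predictive function set per step of the horizon
pred_funs <- lapply(seq_len(ncol(paths)), function(t) {
  x  <- paths[, t]
  pr <- seq(0.01, 0.99, by = 0.01)

  # Stand-ins for the generalized families: quantiles from a fitted normal and logistic
  q_norm  <- qnorm(pr,  mean = mean(x), sd = sd(x))
  q_logis <- qlogis(pr, location = median(x), scale = sd(x) * sqrt(3) / pi)

  pooled <- c(x, q_norm, q_logis)
  pooled <- pooled[pooled > 0]      # keep values consistent with a strictly positive series
  edfun(pooled)                     # empirical dfun/pfun/qfun/rfun from the pooled values
})
names(pred_funs) <- paste0("t", seq_len(ncol(paths)))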
Show cooking with data: some practical examples
In this first example, we demonstrate the basic usage of the hmix function on the AMZN time series, predicting over a time horizon of 7 points in the future, using 5 clusters and 3 hidden states:
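library(hmix)
results <- hmix(dummy_set$AMZN.Close, horizon = 7, centers = 5, n_hidden = 3, n_tests = 5)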
The result is a list containing two main objects:
error_sets: This object includes the Continuous Ranked Probability Score (CRPS) for each test point (rows) across the forecast horizon (columns), providing a detailed evaluation of the model's predictive accuracy over time.
###THE CRPS COLLECTED DURING BACKTESTING
results$error_sets$crps_set
              t1        t2         t3        t4        t5       t6       t7
test_1 5.6316605 3.4799276 12.0893445 13.409894 11.133238 2.651227 1.477643
test_2 1.3557777 1.8574234  0.9769766  1.070928  2.912289 3.365841 3.143530
test_3 0.8085863 2.9002482  2.1105824  2.539823  1.124659 2.676483 3.404681
test_4 1.4962359 0.7674623  1.0357988  4.029595  3.198497 6.371474 5.260257
test_5 0.7548107 1.3296448  1.5795320  2.195853  3.109705 3.121753 3.465947
###THE ERROR DISTRIBUTION OVER PERCENTILES
quantile(results$error_sets$crps_set, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
        5%        25%        50%        75%        95% 
 0.7962491  1.4167102  2.6764834  3.4353140 11.4200700
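To relate these scores to the predictive functions described later, a CRPS value can be approximated directly from a forecast sampler with the standard sample-based estimator E|X - y| - 0.5*E|X - X'|. This is only an illustration, not necessarily the estimator used internally by hmix, and actual_price is a hypothetical realized value.
# Sample-based CRPS estimate from a sampler function and an observed value
crps_from_sampler <- function(rfun, y, n = 1000) {
  draws <- rfun(n)
  mean(abs(draws - y)) - 0.5 * mean(abs(outer(draws, draws, "-")))
}

actual_price <- 178   # hypothetical realized AMZN close at t7
crps_from_sampler(results$model$pred_funs$t7$rfun, actual_price)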
model: This object encapsulates the hmm_model, which contains comprehensive details about the Hidden Markov Model (HMM). It includes the observed data (the clustered time series), the initial HMM configuration, and the trained HMM parameters, such as the states, symbols, initial state probabilities (startProbs), transition probabilities (transProbs), and emission probabilities (emissionProbs).
###THE OBSERVATIONS CREATED WITH K-MEANS
results$model$hmm_model$observations
[1] 5 5 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 5 2 5 5 5 5 4 5 5 5 3 5 5 5 5 5 5 5 4
[38] 1 4 1 5 5 5 4 4 1 1 5 5 5 5 5 5 5 5 4 5 4 3 4 5 1 2 4 2 3 5 3 4 2 1 1 4 3
[75] 4 5 5 5 5 5 5 5 4 5 5 4 1 5 5 1 5 2 1 1 5 4 5 5 5 5 5 5 5 5 5 5 1 5 5 3 5
[112] 2 4 4 1 4 3 4 4 5 5 5 5 1 3 1 2 3 4 1 1 5 4 1 4 2 2 1 5 5 5 5 5 4 5 3 5 2
[149] 2 1 5 5 5 5 5 5 4 5 5 1 5 5 5 2 5 4 5 5 2 5 5 1 5 1 3 5 1 5 5 4 2 3 1 2 1
[186] 5 1 5 2 2 2 5 1 1 2 4 4 5 1 4 5 2 4 5 2 4 5 1 2 4 2 3 5 2 4 3 5 1 5 2 3 3
[223] 2 1 3 5 4 5 1 5 5 5 5 5 4 4 5 5 5 1 2 4 5 2 1 5 5 5 1 1 2 1 5 3 4 5 5 5 5
[260] 4 2 5 4 5 5 5 5 4 1 2 2 2 4 2
###THE TRAINED PARAMETERS FOR HMM
results$model$hmm_model$trained_hmm$hmm
$States
[1] "s1" "s2" "s3"
$Symbols
[1] 5 4 2 3 1
$startProbs
s1 s2 s3
0.04505689 0.57172222 0.38322089
$transProbs
to
from s1 s2 s3
s1 5.685692e-01 1.785201e-141 0.4314308
s2 7.758710e-89 8.521139e-01 0.1478861
s3 6.220538e-01 2.683945e-01 0.1095517
$emissionProbs
symbols
states 5 4 2 3 1
s1 1.157265e-01 3.384340e-01 1.848545e-01 0.03050405 3.304809e-01
s2 1.394967e-41 2.988782e-02 1.160988e-02 0.07217932 8.863230e-01
s3 4.690382e-01 5.213926e-82 1.322001e-10 0.53096182 1.070940e-44
The model also includes pred_funs, a set of predictive functions generated by the mixture function. These functions provide comprehensive probabilistic forecasting capabilities, including:
rfun: a sampler function that generates random samples from the predicted distribution for each point in the time horizon.
dfun: the probability density function (PDF), which computes the likelihood of different outcomes.
pfun: the cumulative distribution function (CDF), which calculates the probability of an outcome being less than or equal to a specific value.
qfun: the inverse cumulative distribution function (ICDF), which returns the quantile associated with a given probability.
These functions collectively enable a thorough probabilistic analysis of future time series values.
names(results$model$pred_funs)
[1] "t1" "t2" "t3" "t4" "t5" "t6" "t7"
results$model$pred_funs$t7
$dfun
function (x)
dapproxfun(x)
<bytecode: 0x0000025e56edcb40>
<environment: 0x0000025e58eefcd0>
$pfun
Empirical CDF
Call: ecdf(x)
x[1:1458] = 143.35, 144.76, 149.24, ..., 209.54, 214.98
$qfun
function (v)
.approxfun(x, y, v, method, yleft, yright, f, na.rm)
<bytecode: 0x0000025e56e5e028>
<environment: 0x0000025e5a265610>
$rfun
function (n)
sample(x, size = n, replace = TRUE)
<bytecode: 0x0000025e56edddd8>
<environment: 0x0000025e58eefcd0>
$pfun_integrate_dfun
NULL
Now, let’s explore the power of our prediction functions. We can calculate the probability of different price trends, determine the likelihood of the price falling within a specific range, and even construct confidence intervals and odds estimates for each point in our forecasting horizon.
###GROWTH PROBABILITY OVER THE TIME HORIZON?
1 - results$model$pred_funs$t7$pfun(tail(dummy_set$AMZN.Close, 1))
[1] 0.7018182
###PROBABILITY FOR A PRICE RANGE BETWEEN 180 AND 185 AT T7?
diff(results$model$pred_funs$t7$pfun(c(180, 185)))
[1] 0.1686364
###80% CONFIDENCE RANGE OF PREDICTION AT T1?
results$model$pred_funs$t1$qfun(c(0.1, 0.9))
[1] 170.1640 177.7822
###95% CONFIDENCE RANGE OF PREDICTION AT T3?
results$model$pred_funs$t3$qfun(c(0.025, 0.975))
[1] 165.0955 185.9422
###ODDS FOR PRICE 185 OVER 165 AT T3?
results$model$pred_funs$t3$dfun(185)/results$model$pred_funs$t3$dfun(165)
[1] 0.8325378
###CALCULATING THE EXPECTATION AT T7 USING THE SAMPLER
mean(results$model$pred_funs$t7$rfun(1000))
[1] 176.5844
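The same quantile functions can also be swept across the whole horizon to build a compact table of prediction percentiles; this is just a convenience recap using the documented qfun interface.
###PREDICTION PERCENTILES ACROSS THE WHOLE HORIZON
percentiles <- t(sapply(results$model$pred_funs, function(pf) pf$qfun(c(0.1, 0.5, 0.9))))
colnames(percentiles) <- c("q10", "q50", "q90")
percentiles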
Closing remarks
Why did Mr. Markov become a chef? Because he loves to “stir” up hidden patterns and to “season” his predictions with a pinch of randomness! (yeah, it’s a joke)
Enzoi!
Footnotes
1. The hmm package provides tools for the analysis and modeling of Hidden Markov Models, allowing users to estimate model parameters, decode hidden states, and apply the Baum-Welch and Viterbi algorithms. For more details, refer to the hmm package documentation.
2. K-means needs no introduction: it is the most popular clustering algorithm, partitioning data into groups by minimizing the distance between points and their assigned centroids. It is favored for its simplicity and efficiency, especially with large datasets, though it may struggle with irregularly shaped clusters. Here is the (universally standard) function reference.
3. The Generalized Normal Distribution is a generalization of the normal distribution that allows for heavier or lighter tails by adjusting a shape parameter. It is implemented in the normalp package in R, which provides tools for fitting and analyzing this distribution. For more details, refer to the normalp package documentation.
4. The Generalized Logistic Distribution is an extension of the standard logistic distribution that can model skewness and kurtosis more flexibly. It is used to describe data with asymmetrical behavior and is implemented in the glogis package in R. You can find more information in the glogis package documentation.
5. The Generalized Lambda Distribution can take on the shape of many other distributions (e.g., normal, exponential) by adjusting its parameters, making it useful for modeling a wide range of data shapes. The gld package in R provides tools for fitting and analyzing this distribution. Additional information is available in the gld package documentation.
6. The edfun package in R provides tools for constructing empirical distribution functions, which are non-parametric estimates of the cumulative distribution function (CDF) and inverse CDF based on observed data. For more information, you can visit edfun on CRAN.
7. The CRPS is a scoring rule used to evaluate the accuracy of probabilistic forecasts. It measures the difference between the cumulative distribution function (CDF) of the predicted probability distribution and the actual observed value. The CRPS rewards sharp and accurate forecasts, with lower scores indicating better predictive performance. For more information, I like the way it is explained here.