“Those who have knowledge don’t predict. Those who predict don’t have knowledge.” - Lao Tzu
“If you do not expect the unexpected, you will not find it.” - Heraclitus
“Prediction is very difficult, especially if it’s about the future.” - Niels Bohr
“I never think of the future—it comes soon enough.” - Albert Einstein
xpect the unexpected with Probabilistic Forecasting
The xpect function brings together the power of gradient-boosted regression (via XGBoost [1]) and conformal inference to deliver probabilistic forecasts. Instead of providing only point estimates, xpect constructs full predictive distributions that capture uncertainty in future time series values.
The xpect package works through the following steps:
Data Preparation & Feature Engineering
The process begins by transforming the raw time series into scaled returns, which normalizes the data to reduce non-stationarity and better expose underlying dynamics. Next, xpect reframes these scaled returns into structured matrices (based on past and future variables), creating input features suitable for predictive modeling. To enhance the quality and interpretability of these features, a Singular Value Decomposition (SVD) [2] is applied, retaining a specified fraction of total variance defined by the coverage parameter. This step effectively reduces dimensionality, filters noise, and highlights the most informative components in the data, ultimately improving forecasting accuracy and model stability.
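As a rough illustration of this preprocessing idea (a sketch only, not the package's internal code; the helper name make_features and its defaults are invented for the example), the following builds scaled returns, reframes them into a lagged matrix, and keeps just enough SVD components to reach a given variance coverage:

```r
# Illustrative sketch of the preprocessing step (not xpect's internal code):
# scaled returns -> lagged feature matrix -> SVD truncated by variance coverage.
make_features <- function(y, past = 10, coverage = 0.5) {
  # Scaled returns: change relative to the previous value
  ret <- diff(y) / head(y, -1)

  # Reframe the returns into a matrix with `past` lagged columns
  n <- length(ret) - past + 1
  X <- t(sapply(seq_len(n), function(i) ret[i:(i + past - 1)]))

  # Keep the smallest number of SVD components explaining `coverage` of the variance
  Xc <- scale(X, scale = FALSE)
  s  <- svd(Xc)
  k  <- which(cumsum(s$d^2) / sum(s$d^2) >= coverage)[1]

  # Project onto the retained components: lower-dimensional, denoised features
  Xc %*% s$v[, seq_len(k), drop = FALSE]
}

y <- cumsum(rnorm(200)) + 50                         # toy series
features <- make_features(y, past = 10, coverage = 0.5)
dim(features)                                        # rows = time windows, cols = k
```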
Modeling with Conformal Inference and Mixture of Distributions
An XGBoost model is trained on the prepared dataset, providing initial predictions for future time steps. These predictions are then calibrated through conformal inference [3], using a dedicated calibration set to estimate prediction residuals. After calibration, a mixture [4] model approach fits several candidate statistical distributions—including the Generalized Normal, Generalized Logistic, and Generalized Lambda, among others—to the calibrated predictions, effectively capturing complex uncertainties in the forecasts. The resulting ensemble of fitted distributions is combined into an empirical distribution function [5] using the edfun package, ultimately generating a suite of comprehensive predictive functions for each point in the forecasted horizon (future); a small sketch of this final step follows the list below:
rfun: Generates random samples from the forecast distribution.
dfun: Computes the probability density function (PDF).
pfun: Returns the cumulative distribution function (CDF).
qfun: Provides the inverse CDF (quantile function).
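As a loose sketch of this last step (with plain Normal and Logistic stand-ins for the generalized families xpect actually fits, and an equal-weight mixture assumed), samples from the fitted candidates can be pooled and wrapped into an empirical distribution via edfun:

```r
library(edfun)

# Stand-in for the calibrated predictions of one forecast point
set.seed(1)
calibrated <- rnorm(500, mean = 12, sd = 1.5)

# Pretend two candidate distributions were fitted to these values
# (xpect fits several generalized families; Normal/Logistic are placeholders)
samp_norm  <- rnorm(2000, mean(calibrated), sd(calibrated))
samp_logis <- rlogis(2000, location = median(calibrated),
                     scale = sd(calibrated) * sqrt(3) / pi)

# Equal-weight mixture by pooling the samples, wrapped into d/p/q/r functions
mix <- edfun(c(samp_norm, samp_logis))

mix$qfun(c(0.025, 0.975))   # quantile function (inverse CDF)
mix$pfun(12)                # cumulative distribution function
mix$rfun(5)                 # random samples
```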
Hyperparameter Optimization
xpect offers flexible hyperparameter optimization strategies, allowing users to tailor model tuning to the complexity of their data and available computational resources:
random_search: Randomly selects n_samples configurations from predefined or custom hyperparameter ranges, providing broad coverage of the parameter space with minimal assumptions.
coarse_to_fine: Starts with a broad random search of n_samples configurations and progressively narrows the hyperparameter space through a top_k selection across multiple iterative n_phases, systematically refining toward optimal configurations.
bayesian: Employs Bayesian optimization [6], intelligently balancing exploration and exploitation through probabilistic modeling. This method efficiently guides the search toward promising configurations, minimizing the number of evaluations required.
These methods empower users to select an optimization approach aligned with their analytical objectives, computational constraints, and data characteristics.
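For completeness, here is a hedged sketch of how the other two strategies might be invoked. The toy data frame and all argument values are illustrative, and the argument names top_k, n_phases, and n_exploration are assumed from the descriptions above and in the footnotes rather than taken from the package reference:

```r
library(xpect)

# Toy data, mirroring the practical example below
set.seed(42)
df <- data.frame(target_series = cumsum(rnorm(100)) + 30,
                 predictor1    = cumsum(rnorm(100)) + 30)

# Coarse-to-fine: broad random search, then iterative narrowing around the top_k models
res_ctf <- xpect(predictors = df, target = "target_series", future = 5,
                 search = "coarse_to_fine",
                 n_samples = 10, top_k = 3, n_phases = 3, seed = 123)

# Bayesian optimization: exploratory draws followed by guided sampling
res_bayes <- xpect(predictors = df, target = "target_series", future = 5,
                   search = "bayesian",
                   n_samples = 10, n_exploration = 5, seed = 123)
```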
What do you expect? A Practical Example
In the following example, we apply xpect to a synthetic dataset and forecast five future time steps using the random_search strategy.
# Load required libraries (assumes xpect and dependencies are installed)
library(xpect)
library(ggplot2)

# Create a dummy predictors dataset with sufficient observations
set.seed(42)
df <- data.frame(
  target_series = tail(RandomWalker::rw30()$y, 100) + 30,
  predictor1    = tail(RandomWalker::rw30()$y, 100) + 30
)

# Execute xpect with random search method
results <- xpect(
  predictors = df,
  target = "target_series",
  future = 5,
  past = NULL,                                  # STANDARD RANGE
  max_depth = c(3:8),                           # CUSTOM RANGE
  eta = seq(0.01, 0.3, length.out = 100),       # CUSTOM RANGE
  alpha = 0.5,                                  # FIXED VALUE
  lambda = seq(0.01, 0.1, length.out = 100),    # CUSTOM RANGE
  gamma = NULL,                                 # STANDARD RANGE
  search = "random_search",
  n_samples = 5,
  seed = 123
)

# Display the computational time and forecast plot
results$time_log
[1] "1M 24S"
results$plot
The result is a list containing:
history: A data frame logging each evaluated hyperparameter configuration along with its performance.
best_model: The optimal forecasting model with associated predictive functions (rfun, dfun, pfun, qfun).
best_params: A summary of the hyperparameters selected for the best-performing model.
plot: A ggplot visualization that overlays the forecast with its uncertainty intervals.
time_log: The duration of the complete model-building and optimization process.
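To get a quick overview of these components before digging in, a plain str call on the returned list is enough:

```r
# Top-level overview of the returned list
# (history, best_model, best_params, plot, time_log)
str(results, max.level = 1)
```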
Let’s inspect the structure of history:
knitr::kable(results$history, caption = "Hyperparameters from the random search exploration, including entropy evaluation")
Table: Hyperparameters from the random search exploration, including entropy evaluation

| past | coverage | max_depth |    eta |  gamma | alpha | lambda | subsample | colsample_bytree | entropy |
|-----:|---------:|----------:|-------:|-------:|------:|-------:|----------:|-----------------:|--------:|
|   15 |      0.5 |         4 | 0.0481 | 0.4505 |   0.5 | 0.0991 |       0.8 |              0.8 |  1.2366 |
|   19 |      0.5 |         8 | 0.0803 | 4.7648 |   0.5 | 0.0745 |       0.8 |              0.8 |  1.2038 |
|   14 |      0.5 |         5 | 0.2707 | 1.7367 |   0.5 | 0.0327 |       0.8 |              0.8 |  1.0961 |
|    3 |      0.5 |         7 | 0.2736 | 3.2432 |   0.5 | 0.0155 |       0.8 |              0.8 |  1.0844 |
|   10 |      0.5 |         6 | 0.2092 | 4.9449 |   0.5 | 0.0473 |       0.8 |              0.8 |  1.1146 |
Models in the history are evaluated using entropy because entropy provides a quantitative measure of the uncertainty in their predictive distributions. Lower entropy indicates that a model’s forecasts are sharper and more confident, suggesting it better captures the underlying dynamics of the data. By ranking models based on their entropy, xpect can objectively select the configuration that minimizes uncertainty and delivers more informative and reliable forecasts.
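The package's exact entropy computation is not shown here, but a common way to approximate the differential entropy of a predictive distribution from its density and sampler is a Monte Carlo estimate of -E[log f(X)]; the helper below is only meant to illustrate the criterion:

```r
# Monte Carlo estimate of the differential entropy H = -E[log f(X)]
# for one forecast point, using its sampler (rfun) and density (dfun).
# Illustrative only: the package's internal calculation may differ.
entropy_mc <- function(dist, n = 10000) {
  x <- dist$rfun(n)
  -mean(log(pmax(dist$dfun(x), .Machine$double.eps)))
}

entropy_mc(results$best_model$t5)
```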
Let’s inspect the structure of the best parameters:
knitr::kable(round(results$best_params, 3), caption = "Model with the lowest entropy")
Table: Model with the lowest entropy

|   | past | coverage | max_depth |   eta | gamma | alpha | lambda | subsample | colsample_bytree | entropy |
|---|-----:|---------:|----------:|------:|------:|------:|-------:|----------:|-----------------:|--------:|
| 4 |    3 |      0.5 |         7 | 0.274 | 3.243 |   0.5 |  0.016 |       0.8 |              0.8 |   1.084 |
And explore the predictive functions for the last forecast point (t5):
# List available predictive functions
names(results$best_model)
[1] "t1" "t2" "t3" "t4" "t5"
# Examine the prediction function for the 5th time step
results$best_model$t5
Using these functions for each time point, you can compute confidence intervals, derive quantiles, and even generate random samples from the forecasted distribution. For example:
# Calculate the 95% confidence interval for T5 predictions
results$best_model$t5$qfun(c(0.025, 0.975))
[1] 10.81514 16.01988
# Generate 1000 random forecast samples for T4 and compute their mean
mean(results$best_model$t4$rfun(1000))
[1] 13.79569
# Probability of growth over the horizon
1 - results$best_model$t5$pfun(tail(df$target_series, 1))
[1] 0.459
# Probability of a value between 15 and 20 at T2
results$best_model$t2$pfun(20) - results$best_model$t2$pfun(15)
[1] 0.062
# Odds for value 15 over value 13 at T5
results$best_model$t5$dfun(15) / results$best_model$t5$dfun(13)
[1] 0.8133226
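Since every horizon point exposes the same four functions, the whole forecast path can be summarized in one pass, for example medians with an 80% band via qfun:

```r
# Median and 80% interval for each step of the forecast horizon (t1..t5)
band <- t(sapply(results$best_model, function(dist) dist$qfun(c(0.10, 0.50, 0.90))))
colnames(band) <- c("q10", "median", "q90")
band
```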
Final Thoughts
Never forget—forecasting is a notoriously tricky endeavor. The knowledgeable refrain from making predictions, the wise embrace uncertainty, physicists insist it’s an impossible task, and as for Albert… well, he doesn’t even bother trying. So, in summary: the future’s a prankster, and we’re all just trying to guess the punchline.
Enzoi!
Footnotes
1. XGBoost is a widely used gradient boosting library designed for efficiency, flexibility, and portability. For a comprehensive overview, please refer to the XGBoost documentation.
2. Singular Value Decomposition (SVD) is a matrix factorization technique widely used for dimensionality reduction and noise filtering. For a primer on the math, take a look here.
3. Conformal prediction is a statistical framework that provides reliable uncertainty estimates by constructing predictive intervals with guaranteed coverage under minimal assumptions. In xpect, conformal inference is used to calibrate the raw forecasts by leveraging residuals from a held-out calibration set. For further reading, take a look here.
4. The mixture function in xpect captures forecast uncertainty by fitting multiple probability distributions—including Generalized Normal, Generalized Logistic, Generalized Extreme Value, Generalized Lambda, Asymmetric Laplace, and Generalized Hyperbolic—to the calibrated predictions. These fitted distributions are then combined into a composite model, which is transformed into an empirical distribution function (EDF) using edfun. This ensures that the predictive functions (rfun, dfun, pfun, qfun) accurately reflect uncertainty across each point in the time horizon.
5. edfun is an R package that facilitates the creation of empirical distribution functions (EDFs), enabling non-parametric estimation of cumulative distribution functions and quantiles. More details can be found on its CRAN page.
6. Bayesian optimization is employed to fine-tune hyperparameters by defining a custom objective that minimizes the mean entropy of the predictive functions estimated with xpect. Key driving variables are n_samples and n_exploration. For more details, refer to the rBayesianOptimization package documentation.