Intro to temper

Author

Giancarlo Vercellino

Published

July 15, 2025

“He is happy whose circumstances suit his temper; but he is more excellent who can suit his temper to any circumstances.” - David Hume

“You cannot predict the outcome of human affairs. Temper and unpredictability are their nature.” - Virginia Woolf

“Temper is the one thing you can’t get rid of by losing it.” - Jack Nicholson

What is temper

temper (Temporal Encoder–Masked Probabilistic Ensemble Regressor) is a machine learning algorithm for forecasting univariate time series with full uncertainty quantification. Instead of predicting only future values, temper estimates their entire probability distributions across multiple forecast horizons.

It works by combining a temporal autoencoder to compress historical patterns, a masked neural decision forest to generate diverse forecast samples, and a Gaussian mixture model to fit smooth predictive distributions. This design captures nonlinear dynamics, handles missing or noisy data robustly, and produces interpretable and calibrated uncertainty estimates.

By blending representation learning with probabilistic modeling, temper offers a versatile forecasting engine that is particularly valuable in domains where understanding forecast confidence is as important as the predictions themselves.

The analytical process implemented with temper

The temper algorithm follows a structured pipeline that transforms a univariate time series into full probabilistic forecasts. Each step plays a distinct role in enabling accurate, uncertainty-aware predictions:

  • Input preprocessing: The input is a numeric series of levels. Missing values are automatically filled using Kalman smoothing via imputeTS::na_kalman(), ensuring continuity. The series is then transformed into scaled differences (level changes divided by the previous value) to stabilize dynamics while preserving level interpretability.

  • Reframing into supervised format: Using a sliding window, the series is reframed into overlapping samples. Each sample consists of past consecutive scaled differences as inputs and future scaled differences as targets. This transforms the time series into a supervised dataset, enabling model training with input-output pairs (these first two steps are sketched just after this list).

  • Latent encoding via autoencoder:

    • Each input window is passed through a neural autoencoder, implemented using torch, which learns a compressed representation (latent_dim) of temporal dynamics.

    • The encoder captures the most informative structure in a low-dimensional latent space, while the decoder attempts to reconstruct the original input.

    • This reconstruction loss (mean squared error) acts as a regularizer, ensuring the latent code preserves important information.

  • Probabilistic masking and ensemble prediction:

    • The latent representation is passed through a differentiable forest — an ensemble of soft decision trees trained via torch.

    • Each tree includes a probabilistic feature mask, implemented using Gumbel–Softmax sampling. This learnable mask decides which latent features are retained or dropped during training (a toy sketch of the encoder and mask appears after this list).

    • The trees jointly produce diverse forecast samples (scaled differences) across multiple future steps, exploiting ensemble variance for uncertainty.

  • Inverse transformation to level space:

    • Forecasts in scaled-difference space are converted back to levels by reversing the differencing procedure—cumulatively applying the predicted scaled changes to the most recent observed value. This ensures the outputs remain on the same scale as the original series.

  • Gaussian mixture smoothing:

    • For each forecast horizon, the multiple sampled outcomes are post-processed via a Gaussian mixture model (GMix) using stats::kmeans() and custom density estimation routines.

    • A candidate number of components is set; K-means clustering identifies group structure; and means, variances, and weights are estimated.

    • The result is a smooth, interpretable, and reusable predictive distribution for each forecasted step (a minimal sketch of this smoothing step is given below).
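To make the first steps concrete, here is a minimal sketch of the preprocessing, reframing, and inverse transformation, assuming dummy_set$MSFT.Close is a plain numeric series. It is a simplified reconstruction of the description above, not temper's internal code, and the variable names are illustrative.

library(temper)
library(imputeTS)

# 1. Impute missing values (if any) with Kalman smoothing
y <- imputeTS::na_kalman(dummy_set$MSFT.Close)

# 2. Scaled differences: level changes divided by the previous level
d <- diff(y) / head(y, -1)

# 3. Sliding-window reframing: past consecutive differences as inputs,
#    future differences as targets
past <- 100; future <- 100
n <- length(d) - past - future + 1
X <- t(sapply(seq_len(n), function(i) d[i:(i + past - 1)]))                    # inputs
Y <- t(sapply(seq_len(n), function(i) d[(i + past):(i + past + future - 1)]))  # targets

# 4. Inverse transformation: cumulatively reapply (pretend) forecasted
#    scaled differences to the last observed level
pred_d <- rnorm(5, 0, 0.01)
pred_levels <- tail(y, 1) * cumprod(1 + pred_d)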
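The latent encoding and probabilistic masking can be sketched in the same spirit with torch for R. The snippet below is only a toy reconstruction of the idea (a small feed-forward autoencoder plus a Gumbel–Softmax keep/drop mask over the latent features); the layer sizes and names are invented and do not reflect temper's actual architecture.

library(torch)

past <- 100; latent_dim <- 10; tau <- 0.5

# A tiny encoder/decoder pair over windows of scaled differences
enc <- nn_sequential(nn_linear(past, 32), nn_relu(), nn_linear(32, latent_dim))
dec <- nn_sequential(nn_linear(latent_dim, 32), nn_relu(), nn_linear(32, past))

x <- torch_randn(16, past)               # a batch of input windows
z <- enc(x)                              # latent codes
recon_loss <- nnf_mse_loss(dec(z), x)    # reconstruction loss acts as a regularizer

# A Gumbel-Softmax feature mask over the latent code
mask_logits <- nn_parameter(torch_zeros(latent_dim, 2))   # learnable keep/drop logits
gumbel_mask <- function(logits, tau) {
  u <- torch_rand_like(logits)                       # uniform noise
  g <- -torch_log(-torch_log(u + 1e-20) + 1e-20)     # Gumbel(0, 1) samples
  probs <- nnf_softmax((logits + g) / tau, dim = 2)  # soft one-hot over keep/drop
  probs[, 1]                                         # keep-probability per feature
}
z_masked <- z * gumbel_mask(mask_logits, tau)$unsqueeze(1)   # masked latent codes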

This modular pipeline makes temper resilient to noise, expressive in uncertainty modeling, and flexible enough for downstream applications.
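To illustrate the final smoothing step, the following sketch fits a k-means-seeded Gaussian mixture to the sampled forecasts of a single horizon and exposes it as density and sampling functions. It follows the recipe above (cluster, then estimate means, variances, and weights) but is deliberately simplified, so temper's own GMix routine may differ in the details.

# Toy example: smooth one horizon's forecast samples with a Gaussian mixture
smooth_samples <- function(samples, k = 3) {
  km <- stats::kmeans(samples, centers = k, nstart = 10)
  w  <- as.numeric(table(km$cluster)) / length(samples)             # mixture weights
  mu <- as.numeric(km$centers)                                      # component means
  s  <- sapply(seq_len(k), function(j) sd(samples[km$cluster == j]))
  s[is.na(s) | s == 0] <- sd(samples) / k                           # guard tiny clusters
  list(
    dfun = function(x) Reduce(`+`, lapply(seq_len(k), function(j) w[j] * dnorm(x, mu[j], s[j]))),
    rfun = function(n) {
      j <- sample(seq_len(k), n, replace = TRUE, prob = w)
      rnorm(n, mu[j], s[j])
    }
  )
}

mix <- smooth_samples(rnorm(500, mean = 100, sd = 5))   # pretend forecast samples
mean(mix$rfun(10000))                                   # should be close to 100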

The main steps in temper

Output Construction and Visualization

Once all steps are completed, the algorithm returns a structured output object that includes:

  • pred_funs: A list of empirical distribution functions (pfun, qfun, rfun, and dfun) for each forecast step, allowing for quantile estimation, sampling, and density evaluation.

  • plot: A ready-to-publish fan chart implemented with ggplot2, displaying the original time series, the median forecast, and a predictive interval (typically 90%). This visualization is ideal for presentations, reporting, or exploratory analysis.

  • loss: A ggplot object showing the evolution of the training and validation loss (CRPS) over epochs. This aids in understanding convergence behavior and diagnosing overfitting or underfitting.

  • time_log: A lubridate::period object tracking the elapsed training time, offering transparency and aiding reproducibility.

Quick Start Example

Here’s a minimal example to get started with temper() in just a few lines:

library(temper)

fit <- temper(
  ts         = dummy_set$MSFT.Close,  # univariate numeric series (levels)
  future     = 100,                   # forecast horizon (steps ahead)
  past       = 100,                   # length of the past window
  latent_dim = 10,                    # size of the latent code
  n_trees    = 100,                   # number of soft decision trees
  depth      = 8,                     # depth of each tree
  epochs     = 30,                    # training epochs
  seed       = 123,                   # for reproducibility
  verbose    = FALSE
)

fit$plot  # Visualize fan chart
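
The other components of the output object can be inspected in the same way; the lines below are a short illustrative snippet based on the fields described in the previous section:

fit$loss              # training and validation CRPS across epochs (ggplot object)
fit$time_log          # elapsed training time (lubridate period)
names(fit$pred_funs)  # one set of pfun, qfun, rfun and dfun per forecast step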

How to Forecast with temper: A Practical Example

Forecast Distributions, Quantiles, and Expectation

Once you’ve trained a temper model, the output includes a full predictive distribution at each forecast horizon. This allows you to:

  • Sample future scenarios with rfun()

  • Compute quantiles using qfun()

  • Evaluate cumulative probabilities with pfun()

  • Access pointwise densities using dfun()

Example 1: Plotting the Forecast Distribution at Horizon 100

samples <- fit$pred_funs$t100$rfun(1000) 
plot(
  density(samples),
  main = "Forecast Distribution at t100",
  xlab = "Predicted Value",
  ylab = "Density"
)

Example 2: Growth Probability at Horizon 50 and 100

# Probability that the series will increase 
1 - fit$pred_funs$t50$pfun(tail(dummy_set$MSFT.Close, 1))  # t+50 
[1] 0.4792041
1 - fit$pred_funs$t100$pfun(tail(dummy_set$MSFT.Close, 1))  # t+100
[1] 0.6070789

Example 3: Forecast Quantiles at Horizon 10

fit$pred_funs$t10$qfun(c(0.1, 0.25, 0.5, 0.75, 0.9))
[1] 448.7369 465.3088 487.9334 512.1540 527.4680

Example 4: Expected Value via Sampling and Integration at Horizon 100

# Sampling-based expectation 
mean(fit$pred_funs$t100$rfun(100000))  
[1] 529.4935
# Numerical integration 
q0 <- fit$pred_funs$t100$qfun(0.000001) 
q1 <- fit$pred_funs$t100$qfun(0.999999) 
norm_base <- integrate(function(x) fit$pred_funs$t100$dfun(x), q0, q1)$value
expectation <- integrate(function(x) fit$pred_funs$t100$dfun(x) * x, q0, q1)$value/norm_base
expectation
[1] 529.3906

Final Thoughts

temper is a flexible, probabilistic forecasting model that learns from volatility, embraces unpredictability, and—unlike most tempers—never snaps under pressure. So yes, use it for forecasting markets, demand, or whether your next CRAN submission will pass on the first try. Spoiler: it won't.

(Yes, it is a joke).

Enzoi!