Audrex: a brief introduction

Giancarlo Vercellino

18-April-2022

… a voice comes to you speaking. Let the stars be set upon the board in accordance with their nature except for the Sun and Moon. And let the Sun be golden, the Moon silver, Kronos of obsidian, Ares of reddish onyx, Aphrodite lapis lazuli veined with gold, Hermes turquoise; let Zeus be of whitish stone, crystalline … .” (Wikipedia)

Mixing Dynamic Regression with Extreme Gradient Boosting

AUDREX allows you to automatically fine-tune the parameters of an ensamble of Dynamic Regression Models based on Extreme Gradient Boosting1. AUDREX sets up a different model for each time feature using each variable included within targets as regressor for the others and provides the Bayesian Optimization2 on the main parameters for the whole ensamble (with different parameters according to the chosen booster, gbtree or gblinear). For each sequence point in seq_len and each time feature, a different XGB model is validated according to n_windowsscheme (cumulative expanding). AUDREX 2.0 introduces also the opportunity of fine-tuning latent variables using SVD on the time features used as predictors3.

Below you can see a general depiction of AUDREX’s main processes. The important points here are the following:

  1. Differentiation and integration are automatically managed by AUDREX 2.0 using diff_threshold that is the maximal p-value for a repeated F-test for de-trending any time-feature: this allows to easily determine the different dynamic characteristics of each time feature (random, trending, exponential) and is somehow more predictable than Augmented Dickey-Fuller or Ljung Box Test. No more need to manually set-up the deriv variable for each feature.

  2. Besides the differentiation, AUDREX directly manages some kind of transformations . If you have limited missing values in your time features, AUDREX automatically proceeds with the imputation using the Kalman filter method implemented in the imputeTS package. If you prefer to project into the future the smoothed version of your time features, you can set smoother = TRUE to use loess4 function. If you want to normalize the time features using Yeo-Johnson before analysis, you can use norm= TRUE (normalization could be considered an hyper-parameter for fine tuning).

  3. The train and test errors are cross-validated through an expanding validation window: the default value is set to 3, meaning that the time features are divided into 3 + 1 segments guaranteeing at least three validation sets to measure the error on unforeseen data. The final prediction intervals comes from the integration of XGB prediction seeds with the sampling errors at each testing window in a Montecarlo simulation. For each point in the prediction sequence, a thousand samples are collected for the calculation of quantiles, mean, mode, standard deviation, skewness and kurtosis.

  4. Besides training and validating an ensemble model based on XGB, AUDREX lets you easily perform the Bayesian Optimization for the available boosters (gbtree and gblinear) : you only have to choose the number of sampling points for the hyper-space (n_sample) and the number of search cycles (n_search; in AUDREX 2.0, when this is set to zero, you call a Random Search). The fine-control of the process depends upon the acquisition function acq, and the interpolation parameters, kappa, eps and kernel. For any further information, you can check on rBayesianOptimization package.

  5. You can apply AUDREX to your time features only if you have enough data. When evaluating the available data for a potential application, you have to consider the dimensions of seq_len, the min_set of sequences for the training and validation processes, the number of n_windows for validation, the maximum differentiation done on the time features: all these variable have to be considered when assessing the time horizon of your predictions. When the data is not enough, AUDREX proposes a fix to seq_len values, if possible. If you set to NULL seq_len, AUDREX sets the maximum value to according to data length, minimum set, number of windows, etc.

The process flow of AUDREX

The process flow of AUDREX

What’s this mess with the climate? An example

In our introduction to Audrex, we are going to use some data on climate anomalies. Let’s take a look to two interesting time features: the Global Mean Temperature Anomalies (GMTA, degree °C) and the Global Mean Sea Level change (GMSL, meters). As showed in climate_anomalies, the time features are expected in ordered columns in a dataframe format.

Examples of time features: GMTA, GMSL
year_month GMTA GMSL
1993-1 0.3840 -1.6
1993-2 0.3967 -3.4
1993-3 0.4208 5.5
1993-4 0.3155 0.1
1993-5 0.3438 5.3
1993-6 0.2930 0.2
1993-7 0.2336 3.9
1993-8 0.2118 0.2
1993-9 0.1692 3.1
1993-10 0.2442 -0.1

In the first example, we are predicting the GMTA and the GMSL using a gblinear booster. In this naive example we try to set the hyper-parameters using only our gut feeling: max_dept= 10, min_child_weight= 0, eta= 1, gamma= 0, subsample= 0.5, colsample_bytree= 0.5, using 5 expanding windows for cross-validation (since the time features are not so long we have to reduce the min_setto 10 to allow a projection based on 20 past points and 12 future points).

fixed_date_format <- seq.Date(from = as.Date("1993-01-01"), to = as.Date("2015-02-01"), length.out = 266)###IF YOU WANT TO ADD DATES, BE SURE TO HAVE A CLEAR FORMAT

example1 <- audrex(climate_anomalies[, c("GMTA", "GMSL")], n_sample = 1, n_search = 0, seq_len = 15, n_windows = 5,  dates = fixed_date_format, booster = "gbtree", min_set = 10, max_depth = 10, min_child_weight = 0, eta = 1, gamma = 0, subsample = 0.5, colsample_bytree = 0.5, norm = T, n_dim = 2)
  time: 34.89 sec elapsed

The result is a list of different components, as you can see below.

names(example1)
  [1] "history"    "models"     "best_model" "time_log"

The first variable, history, summarizes the hyper-parameters selected by the user or through Bayesian Optmization or Random Search (setting n_search to zero) and relative error metrics (rmse, mae, mdae, mape, mase, rae, rse, rrse). Besides the predictions for each feature, best_model includes train-test error statistics for each time sequence (length equals to seq_len), joint-error (maximum error value across time features), prediction stats and plots for each one.

names(example1$best_model)
  [1] "predictions"  "joint_error"  "serie_errors" "pred_stats"   "plots"

The prediction is a list including the predicted results for each time-feature (quantile, min, max, mean, mode, sd, skewness, kurtosis for each time point in the seq_len sequence).

example1$best_model$predictions
  $GMTA
               min   10%   25%   50%   75%   90%   max  mean    sd  mode skewness
  2015-02-01 0.447 0.624 0.704 0.730 0.799 0.870 1.117 0.743 0.102 0.735   -0.041
  2015-03-05 0.235 0.619 0.731 0.813 0.903 0.981 1.320 0.813 0.147 0.826   -0.106
  2015-04-07 0.000 0.571 0.686 0.801 0.910 1.015 1.452 0.797 0.179 0.809   -0.120
  2015-05-09 0.000 0.577 0.706 0.845 0.983 1.100 1.481 0.843 0.204 0.822   -0.108
  2015-06-11 0.000 0.480 0.620 0.771 0.915 1.054 1.538 0.770 0.226 0.778   -0.061
  2015-07-14 0.000 0.435 0.578 0.740 0.912 1.047 1.585 0.741 0.245 0.714   -0.051
  2015-08-15 0.000 0.440 0.592 0.766 0.944 1.096 1.679 0.766 0.264 0.746   -0.014
  2015-09-17 0.000 0.408 0.580 0.775 0.956 1.127 1.794 0.772 0.284 0.796   -0.044
  2015-10-20 0.000 0.563 0.736 0.945 1.149 1.339 1.955 0.944 0.305 0.945   -0.010
  2015-11-21 0.000 0.419 0.615 0.837 1.061 1.258 1.882 0.837 0.323 0.857    0.048
  2015-12-24 0.000 0.422 0.623 0.846 1.090 1.280 2.024 0.855 0.338 0.899    0.083
  2016-01-26 0.000 0.561 0.764 0.998 1.261 1.469 2.196 1.013 0.356 0.982    0.048
  2016-02-27 0.000 0.586 0.812 1.039 1.311 1.532 2.237 1.054 0.372 1.039    0.039
  2016-03-31 0.229 0.850 1.101 1.344 1.621 1.858 2.550 1.355 0.391 1.301    0.040
  2016-05-03 0.000 0.488 0.748 0.998 1.286 1.517 2.235 1.006 0.404 0.961    0.050
             kurtosis
  2015-02-01    4.452
  2015-03-05    3.923
  2015-04-07    3.677
  2015-05-09    3.160
  2015-06-11    3.043
  2015-07-14    3.046
  2015-08-15    3.106
  2015-09-17    3.035
  2015-10-20    2.967
  2015-11-21    2.795
  2015-12-24    2.849
  2016-01-26    2.861
  2016-02-27    2.932
  2016-03-31    2.926
  2016-05-03    2.923
  
  $GMSL
                min    10%    25%     50%     75%     90%     max    mean     sd
  2015-02-01 69.651 77.041 78.701  81.017  83.349  85.641  92.951  81.125  3.415
  2015-03-05 68.512 77.602 80.419  83.584  86.852  89.591  96.967  83.649  4.723
  2015-04-07 69.054 79.559 83.013  86.595  90.530  94.098 102.943  86.763  5.683
  2015-05-09 61.886 73.005 76.954  81.367  85.671  89.010 100.039  81.210  6.408
  2015-06-11 62.783 77.674 82.179  86.805  91.589  95.676 107.084  86.772  7.067
  2015-07-14 64.797 79.532 84.473  90.273  95.328 100.394 118.591  90.104  8.059
  2015-08-15 65.882 83.143 88.755  94.260 100.256 104.906 122.798  94.326  8.543
  2015-09-17 67.985 84.309 89.546  95.834 102.854 107.410 120.792  96.072  9.176
  2015-10-20 79.701 91.808 96.901 104.657 111.628 117.380 132.477 104.521  9.740
  2015-11-21 79.751 92.474 98.279 105.406 112.729 118.913 134.121 105.679 10.207
  2015-12-24 77.458 92.674 98.825 106.509 113.680 120.546 138.791 106.475 10.659
  2016-01-26 71.716 91.837 97.904 105.509 113.700 120.931 142.180 105.845 11.301
  2016-02-27 63.185 88.227 94.244 102.551 110.589 118.138 140.631 102.736 11.636
  2016-03-31 71.790 92.843 99.188 107.644 115.859 124.235 149.002 107.953 12.063
  2016-05-03 66.805 93.379 99.814 108.279 117.177 124.987 146.834 108.517 12.560
                mode skewness kurtosis
  2015-02-01  80.468    0.226    3.096
  2015-03-05  83.916    0.019    2.862
  2015-04-07  87.153   -0.019    2.995
  2015-05-09  80.215   -0.127    3.007
  2015-06-11  87.446   -0.027    3.001
  2015-07-14  89.407    0.057    2.922
  2015-08-15  94.051    0.012    2.811
  2015-09-17  96.352    0.042    2.660
  2015-10-20 106.047    0.059    2.513
  2015-11-21 104.765    0.100    2.592
  2015-12-24 104.337    0.095    2.697
  2016-01-26 104.852    0.131    2.747
  2016-02-27 100.209    0.119    2.871
  2016-03-31 109.297    0.136    2.886
  2016-05-03 105.325    0.096    2.976

For each time features included in the model, you get a plot of the median with the chosen confidence interval (ci default is 0.8).

example1$best_model$plots
  $GMTA

  
  $GMSL

pred_stats includes some measures on the predicted sequences:

  1. the ratio of IQR to range (averaged across the prediction sequence points);
  2. the terminal IQR ratio (the ratio between last and first IQR-to-range);
  3. the risk ratio (absolute range above median to absolute range below median, average across all points);
  4. the terminal risk ratio (the risk ratio calculated at sequence end-points);
  5. the average divergence (max absolute ecdf difference between two points, averaged across all sequence);
  6. the terminal divergence (the divergence calculated between the last and the first point in prediction);
  7. the upside probability (probability of getting a value larger than previous point, averaged across all points);
  8. the terminal upside probability (as above, comparing the last and the first point in prediction).

While the IQR ratio could be helpful in understanding the aleatoric uncertainty affecting the predicted features, the divergence can give a better idea on the epistemic uncertainty.

example1$best_model$pred_stats
                        GMTA  GMSL
  avg_iqr_to_range     0.204 0.227
  terminal_iqr_ratio   5.663 3.736
  avg_risk_ratio       1.123 1.011
  terminal_risk_ratio  0.906 0.885
  avg_divergence       0.062 0.039
  terminal_divergence  0.095 0.002
  avg_upside_prob      0.572 0.672
  terminal_upside_prob 0.750 0.988

Automating the search for a better model

Now, the question is simple: can we get a better prediction searching the hyper-space with Bayesian Optimization? Let’s give it a try. The following example show you how to sample 10 points from a 5-dimensional hyper-parameter space: we are searching for the best max_depth, min_child_weight, eta, gamma, colsample_bytree inside the default boundaries (anyway, you can change the standard intervals replacing with your owns), while the Bayesian Optimization (thanks to the rBayesianOptimization package) is attempting 5 times to achieve the optimal solution (using the testing mape as target). You can also optimize for seq_len as hyper-parameter.

example2 <- audrex(climate_anomalies[, c("GMTA", "GMSL")], seq_len = 10, n_windows = 5,  dates = fixed_date_format, booster = "gbtree", min_set = 10, n_samp = 10, n_search = 5, seed = 12)
  
   Best Parameters Found: 
  Round = 1 norm = 0.0000   n_dim = 1.0000  max_depth = 4.0000  eta = 0.9995095 gamma = 18.63397    min_child_weight = 24.59459 subsample = 0.4618708   colsample_bytree = 0.1738651    Value = -2.664667 
  time: 315.47 sec elapsed

If we compare the error statistics from the best model in example2with the naive model in example1, we see an interesting improvement in the testing errors for GMTA and GMSL (error metrics are evaluated at the level of differentiated time-feature in AUDREX 2.0 for performance improvement).

The error statistics from example1.

example1$best_model$serie_errors
  $GMTA
            rmse   mae  mdae  mape  mase  rae   rse  rrse
  training 0.103 0.078 0.059 1.196 0.573 0.96 0.958 0.966
  testing  0.106 0.079 0.056 1.306 0.591 1.01 1.011 1.006
  
  $GMSL
            rmse   mae  mdae  mape  mase   rae   rse  rrse
  training 3.918 2.857 2.082 1.322 0.408 0.772 0.762 0.863
  testing  3.740 3.062 2.686 2.225 0.667 1.190 1.439 1.190

The error statistics from example2.

example2$best_model$serie_errors
  $GMTA
            rmse   mae  mdae  mape  mase rae   rse  rrse
  training 0.107 0.082 0.059 1.027 0.596   1 1.003 1.002
  testing  0.106 0.077 0.055 1.079 0.584   1 1.000 1.000
  
  $GMSL
            rmse   mae  mdae  mape  mase   rae   rse  rrse
  training 4.335 3.507 3.000 1.165 0.502 0.949 0.917 0.956
  testing  3.162 2.606 2.226 1.083 0.559 1.000 1.000 1.000

There is still something not working well here: testing error is lower than training error in many indicators, we need more space for validation cycles: let’s try windows = 7. Maybe, it would be better to expand the Bayesian Optimization increasing both n_samp and n_search: let’s see what happens if we set n_samp= 20, n_search= 10.

example3 <- audrex(climate_anomalies[, c("GMTA", "GMSL")], seq_len = 10, n_windows = 7,  dates = fixed_date_format, booster = "gblinear", min_set = 10, n_samp = 20, n_search = 10)
  
   Best Parameters Found: 
  Round = 11    norm = 1.0000   n_dim = 2.0000  eta = 0.2388142 lambda = 28.43439   alpha = 8.988946    Value = -2.754667 
  time: 451.19 sec elapsed

A closer look to the history table:

example3$history
      norm n_dim   eta lambda   alpha max_rmse max_mae max_mdae max_mape max_mase
  1  FALSE     2 0.813 22.492  18.138    3.233   2.676    2.374    1.065    0.589
  2  FALSE     2 0.592 38.783   7.755    3.231   2.672    2.365    1.062    0.589
  3  FALSE     2 0.545 72.278  62.368    3.233   2.676    2.374    1.065    0.589
  4   TRUE     2 0.250 77.086  92.977    3.233   2.676    2.374    1.065    0.589
  5  FALSE     1 0.398 23.386  46.456    3.233   2.676    2.374    1.065    0.589
  6   TRUE     1 0.966 34.915  56.575    3.233   2.676    2.374    1.065    0.589
  7  FALSE     2 0.525 53.678  10.152    3.231   2.674    2.367    1.060    0.589
  8   TRUE     2 0.237 89.180  80.599    3.233   2.676    2.374    1.065    0.589
  9  FALSE     2 0.558 36.882   0.815    3.299   2.716    2.470    1.303    0.589
  10  TRUE     1 0.404 77.270  63.155    3.233   2.676    2.374    1.065    0.589
  11  TRUE     2 0.239 28.434   8.989    3.231   2.672    2.361    1.061    0.589
  12 FALSE     2 0.510 35.797  35.394    3.233   2.676    2.374    1.065    0.589
  13  TRUE     2 0.596 98.930  70.238    3.233   2.676    2.374    1.065    0.589
  14  TRUE     1 0.309 26.123  77.748    3.233   2.676    2.374    1.065    0.589
  15  TRUE     1 0.787 73.569  93.444    3.233   2.676    2.374    1.065    0.589
  16  TRUE     1 0.814 62.679  90.480    3.233   2.676    2.374    1.065    0.589
  17 FALSE     1 0.622 24.214  89.804    3.233   2.676    2.374    1.065    0.589
  18 FALSE     1 0.689 13.520  47.307    3.233   2.676    2.374    1.065    0.589
  19 FALSE     1 0.357 34.226  26.713    3.233   2.676    2.374    1.065    0.589
  20 FALSE     2 0.433 84.970  44.366    3.233   2.676    2.374    1.065    0.589
  21 FALSE     2 0.277 70.447  18.462    3.233   2.676    2.374    1.065    0.589
  22  TRUE     2 0.090 27.150 100.000    3.233   2.676    2.374    1.065    0.589
  23  TRUE     2 0.556 81.554  30.989    3.233   2.676    2.374    1.065    0.589
  24  TRUE     1 0.263  2.520  21.871    3.233   2.676    2.374    1.065    0.589
  25 FALSE     2 0.988 84.135   8.676    3.231   2.674    2.364    1.059    0.589
  26  TRUE     1 0.706 66.726  49.557    3.233   2.676    2.374    1.065    0.589
  27 FALSE     2 0.941  0.312   8.717    3.236   2.672    2.358    1.068    0.589
  28  TRUE     1 0.102 64.163   8.525    3.231   2.673    2.363    1.059    0.589
  29 FALSE     2 0.030 48.261   8.303    3.231   2.673    2.363    1.874    0.617
  30  TRUE     2 0.875  8.982   8.394    3.233   2.670    2.367    1.066    0.589
     max_rae max_rse max_rrse wgt_avg_rank
  1    1.001   1.003    1.002        16.06
  2    1.000   1.003    1.002         6.23
  3    1.001   1.003    1.002        16.06
  4    1.001   1.003    1.002        16.06
  5    1.001   1.003    1.002        16.06
  6    1.001   1.003    1.002        16.06
  7    1.001   1.003    1.002         5.55
  8    1.001   1.003    1.002        16.06
  9    1.020   1.047    1.023        28.43
  10   1.001   1.003    1.002        16.06
  11   1.000   1.003    1.002         5.33
  12   1.001   1.003    1.002        16.06
  13   1.001   1.003    1.002        16.06
  14   1.001   1.003    1.002        16.06
  15   1.001   1.003    1.002        16.06
  16   1.001   1.003    1.002        16.06
  17   1.001   1.003    1.002        16.06
  18   1.001   1.003    1.002        16.06
  19   1.001   1.003    1.002        16.06
  20   1.001   1.003    1.002        16.06
  21   1.001   1.003    1.002        16.06
  22   1.001   1.003    1.002        16.06
  23   1.001   1.003    1.002        16.06
  24   1.001   1.003    1.002        16.06
  25   1.001   1.003    1.002         4.34
  26   1.001   1.003    1.002        16.06
  27   1.000   1.003    1.002        23.48
  28   1.000   1.003    1.002         3.59
  29   1.046   1.053    1.026        27.99
  30   1.000   1.003    1.002        22.76

Here are the best parameters discovered during the Bayesian Optimization:

example3$history[which.min(example3$history$wgt_avg_rank),]
     norm n_dim   eta lambda alpha max_rmse max_mae max_mdae max_mape max_mase
  28 TRUE     1 0.102 64.163 8.525    3.231   2.673    2.363    1.059    0.589
     max_rae max_rse max_rrse wgt_avg_rank
  28       1   1.003    1.002         3.59

Let’s have a look to the testing error: we have some improvement with the error on GMSL prediction, but still the testing error is greater than training: we need longer time features for a correct validation, data is not enough for GMSL.

example3$best_model$serie_errors
  $GMTA
            rmse   mae  mdae  mape  mase   rae   rse  rrse
  training 0.109 0.083 0.061 1.025 0.592 1.000 1.000 1.000
  testing  0.104 0.076 0.054 1.026 0.589 0.999 1.003 1.002
  
  $GMSL
            rmse   mae  mdae  mape  mase   rae   rse  rrse
  training 4.474 3.639 3.055 1.054 0.526 0.999 0.998 0.999
  testing  3.231 2.673 2.363 1.059 0.551 1.000 1.000 1.000

Well, it looks better to me.

example3$best_model$plot
  $GMTA

  
  $GMSL

Some useful references


  1. All you need to know about the Extreme Gradient Boosting implementation in R is here: https://github.com/dmlc/xgboost↩︎

  2. With reference to the Bayesian Optimization, we used the ever-green rBayesianOptimizationpackage, take a look here: https://github.com/yanyachen/rBayesianOptimization↩︎

  3. This approach was inspired by Christopher Xie, Alex Tank, Alec Greaves-Tunnell, Emily Fox, “A Unified Framework for Long Range and Cold Start Forecasting of Seasonal Profiles in Time Series” (<https://arxiv.org/abs/1710.08473>), at least for the idea of using latent variables as predictors.↩︎

  4. In some cases, maybe you want to operate on smoothed time-features. In this case, AUDREX calls on fANCOVA package. Here you can find all the latest: https://cran.r-project.org/web/packages/fANCOVA/index.html↩︎