“… a voice comes to you speaking. Let the stars be set upon the board in accordance with their nature except for the Sun and Moon. And let the Sun be golden, the Moon silver, Kronos of obsidian, Ares of reddish onyx, Aphrodite lapis lazuli veined with gold, Hermes turquoise; let Zeus be of whitish stone, crystalline … .” (Wikipedia)
AUDREX allows you to automatically fine-tune the parameters of an ensamble of Dynamic Regression Models based on Extreme Gradient Boosting1. AUDREX sets up a different model for each time feature using each variable included within targets
as regressor for the others and provides the Bayesian Optimization2 on the main parameters for the whole ensamble (with different parameters according to the chosen booster, gbtree
or gblinear
). For each sequence point in seq_len
and each time feature, a different XGB model is validated according to n_windows
scheme (cumulative expanding). AUDREX 2.0 introduces also the opportunity of fine-tuning latent variables using SVD on the time features used as predictors3.
Below you can see a general depiction of AUDREX’s main processes. The important points here are the following:
Differentiation and integration are automatically managed by AUDREX 2.0 using diff_threshold
that is the maximal p-value for a repeated F-test for de-trending any time-feature: this allows to easily determine the different dynamic characteristics of each time feature (random, trending, exponential) and is somehow more predictable than Augmented Dickey-Fuller or Ljung Box Test. No more need to manually set-up the deriv
variable for each feature.
Besides the differentiation, AUDREX directly manages some kind of transformations . If you have limited missing values in your time features, AUDREX automatically proceeds with the imputation using the Kalman filter method implemented in the imputeTS package. If you prefer to project into the future the smoothed version of your time features, you can set smoother
= TRUE
to use loess4 function. If you want to normalize the time features using Yeo-Johnson before analysis, you can use norm
= TRUE
(normalization could be considered an hyper-parameter for fine tuning).
The train and test errors are cross-validated through an expanding validation window
: the default value is set to 3, meaning that the time features are divided into 3 + 1 segments guaranteeing at least three validation sets to measure the error on unforeseen data. The final prediction intervals comes from the integration of XGB prediction seeds with the sampling errors at each testing window in a Montecarlo simulation. For each point in the prediction sequence, a thousand samples are collected for the calculation of quantiles, mean, mode, standard deviation, skewness and kurtosis.
Besides training and validating an ensemble model based on XGB, AUDREX lets you easily perform the Bayesian Optimization for the available boosters (gbtree
and gblinear
) : you only have to choose the number of sampling points for the hyper-space (n_sample
) and the number of search cycles (n_search
; in AUDREX 2.0, when this is set to zero, you call a Random Search). The fine-control of the process depends upon the acquisition function acq
, and the interpolation parameters, kappa
, eps
and kernel
. For any further information, you can check on rBayesianOptimization
package.
You can apply AUDREX to your time features only if you have enough data. When evaluating the available data for a potential application, you have to consider the dimensions of seq_len
, the min_set
of sequences for the training and validation processes, the number of n_windows
for validation, the maximum differentiation done on the time features: all these variable have to be considered when assessing the time horizon of your predictions. When the data is not enough, AUDREX proposes a fix to seq_len
values, if possible. If you set to NULL seq_len
, AUDREX sets the maximum value to according to data length, minimum set, number of windows, etc.
The process flow of AUDREX
In our introduction to Audrex, we are going to use some data on climate anomalies. Let’s take a look to two interesting time features: the Global Mean Temperature Anomalies (GMTA, degree °C) and the Global Mean Sea Level change (GMSL, meters). As showed in climate_anomalies
, the time features are expected in ordered columns in a dataframe format.
year_month | GMTA | GMSL |
---|---|---|
1993-1 | 0.3840 | -1.6 |
1993-2 | 0.3967 | -3.4 |
1993-3 | 0.4208 | 5.5 |
1993-4 | 0.3155 | 0.1 |
1993-5 | 0.3438 | 5.3 |
1993-6 | 0.2930 | 0.2 |
1993-7 | 0.2336 | 3.9 |
1993-8 | 0.2118 | 0.2 |
1993-9 | 0.1692 | 3.1 |
1993-10 | 0.2442 | -0.1 |
In the first example, we are predicting the GMTA and the GMSL using a gblinear
booster. In this naive example we try to set the hyper-parameters using only our gut feeling: max_dept
= 10, min_child_weight
= 0, eta
= 1, gamma
= 0, subsample
= 0.5, colsample_bytree
= 0.5, using 5 expanding windows for cross-validation (since the time features are not so long we have to reduce the min_set
to 10 to allow a projection based on 20 past points and 12 future points).
<- seq.Date(from = as.Date("1993-01-01"), to = as.Date("2015-02-01"), length.out = 266)###IF YOU WANT TO ADD DATES, BE SURE TO HAVE A CLEAR FORMAT
fixed_date_format
<- audrex(climate_anomalies[, c("GMTA", "GMSL")], n_sample = 1, n_search = 0, seq_len = 15, n_windows = 5, dates = fixed_date_format, booster = "gbtree", min_set = 10, max_depth = 10, min_child_weight = 0, eta = 1, gamma = 0, subsample = 0.5, colsample_bytree = 0.5, norm = T, n_dim = 2)
example1 : 34.89 sec elapsed time
The result is a list of different components, as you can see below.
names(example1)
1] "history" "models" "best_model" "time_log" [
The first variable, history
, summarizes the hyper-parameters selected by the user or through Bayesian Optmization or Random Search (setting n_search
to zero) and relative error metrics (rmse, mae, mdae, mape, mase, rae, rse, rrse). Besides the predictions for each feature, best_model
includes train-test error statistics for each time sequence (length equals to seq_len
), joint-error (maximum error value across time features), prediction stats and plots for each one.
names(example1$best_model)
1] "predictions" "joint_error" "serie_errors" "pred_stats" "plots" [
The prediction is a list including the predicted results for each time-feature (quantile, min, max, mean, mode, sd, skewness, kurtosis for each time point in the seq_len
sequence).
$best_model$predictions
example1$GMTA
10% 25% 50% 75% 90% max mean sd mode skewness
min 2015-02-01 0.447 0.624 0.704 0.730 0.799 0.870 1.117 0.743 0.102 0.735 -0.041
2015-03-05 0.235 0.619 0.731 0.813 0.903 0.981 1.320 0.813 0.147 0.826 -0.106
2015-04-07 0.000 0.571 0.686 0.801 0.910 1.015 1.452 0.797 0.179 0.809 -0.120
2015-05-09 0.000 0.577 0.706 0.845 0.983 1.100 1.481 0.843 0.204 0.822 -0.108
2015-06-11 0.000 0.480 0.620 0.771 0.915 1.054 1.538 0.770 0.226 0.778 -0.061
2015-07-14 0.000 0.435 0.578 0.740 0.912 1.047 1.585 0.741 0.245 0.714 -0.051
2015-08-15 0.000 0.440 0.592 0.766 0.944 1.096 1.679 0.766 0.264 0.746 -0.014
2015-09-17 0.000 0.408 0.580 0.775 0.956 1.127 1.794 0.772 0.284 0.796 -0.044
2015-10-20 0.000 0.563 0.736 0.945 1.149 1.339 1.955 0.944 0.305 0.945 -0.010
2015-11-21 0.000 0.419 0.615 0.837 1.061 1.258 1.882 0.837 0.323 0.857 0.048
2015-12-24 0.000 0.422 0.623 0.846 1.090 1.280 2.024 0.855 0.338 0.899 0.083
2016-01-26 0.000 0.561 0.764 0.998 1.261 1.469 2.196 1.013 0.356 0.982 0.048
2016-02-27 0.000 0.586 0.812 1.039 1.311 1.532 2.237 1.054 0.372 1.039 0.039
2016-03-31 0.229 0.850 1.101 1.344 1.621 1.858 2.550 1.355 0.391 1.301 0.040
2016-05-03 0.000 0.488 0.748 0.998 1.286 1.517 2.235 1.006 0.404 0.961 0.050
kurtosis2015-02-01 4.452
2015-03-05 3.923
2015-04-07 3.677
2015-05-09 3.160
2015-06-11 3.043
2015-07-14 3.046
2015-08-15 3.106
2015-09-17 3.035
2015-10-20 2.967
2015-11-21 2.795
2015-12-24 2.849
2016-01-26 2.861
2016-02-27 2.932
2016-03-31 2.926
2016-05-03 2.923
$GMSL
10% 25% 50% 75% 90% max mean sd
min 2015-02-01 69.651 77.041 78.701 81.017 83.349 85.641 92.951 81.125 3.415
2015-03-05 68.512 77.602 80.419 83.584 86.852 89.591 96.967 83.649 4.723
2015-04-07 69.054 79.559 83.013 86.595 90.530 94.098 102.943 86.763 5.683
2015-05-09 61.886 73.005 76.954 81.367 85.671 89.010 100.039 81.210 6.408
2015-06-11 62.783 77.674 82.179 86.805 91.589 95.676 107.084 86.772 7.067
2015-07-14 64.797 79.532 84.473 90.273 95.328 100.394 118.591 90.104 8.059
2015-08-15 65.882 83.143 88.755 94.260 100.256 104.906 122.798 94.326 8.543
2015-09-17 67.985 84.309 89.546 95.834 102.854 107.410 120.792 96.072 9.176
2015-10-20 79.701 91.808 96.901 104.657 111.628 117.380 132.477 104.521 9.740
2015-11-21 79.751 92.474 98.279 105.406 112.729 118.913 134.121 105.679 10.207
2015-12-24 77.458 92.674 98.825 106.509 113.680 120.546 138.791 106.475 10.659
2016-01-26 71.716 91.837 97.904 105.509 113.700 120.931 142.180 105.845 11.301
2016-02-27 63.185 88.227 94.244 102.551 110.589 118.138 140.631 102.736 11.636
2016-03-31 71.790 92.843 99.188 107.644 115.859 124.235 149.002 107.953 12.063
2016-05-03 66.805 93.379 99.814 108.279 117.177 124.987 146.834 108.517 12.560
mode skewness kurtosis2015-02-01 80.468 0.226 3.096
2015-03-05 83.916 0.019 2.862
2015-04-07 87.153 -0.019 2.995
2015-05-09 80.215 -0.127 3.007
2015-06-11 87.446 -0.027 3.001
2015-07-14 89.407 0.057 2.922
2015-08-15 94.051 0.012 2.811
2015-09-17 96.352 0.042 2.660
2015-10-20 106.047 0.059 2.513
2015-11-21 104.765 0.100 2.592
2015-12-24 104.337 0.095 2.697
2016-01-26 104.852 0.131 2.747
2016-02-27 100.209 0.119 2.871
2016-03-31 109.297 0.136 2.886
2016-05-03 105.325 0.096 2.976
For each time features included in the model, you get a plot of the median with the chosen confidence interval (ci
default is 0.8).
$best_model$plots
example1$GMTA
$GMSL
pred_stats
includes some measures on the predicted sequences:
While the IQR ratio could be helpful in understanding the aleatoric uncertainty affecting the predicted features, the divergence can give a better idea on the epistemic uncertainty.
$best_model$pred_stats
example1
GMTA GMSL0.204 0.227
avg_iqr_to_range 5.663 3.736
terminal_iqr_ratio 1.123 1.011
avg_risk_ratio 0.906 0.885
terminal_risk_ratio 0.062 0.039
avg_divergence 0.095 0.002
terminal_divergence 0.572 0.672
avg_upside_prob 0.750 0.988 terminal_upside_prob
Now, the question is simple: can we get a better prediction searching the hyper-space with Bayesian Optimization? Let’s give it a try. The following example show you how to sample 10 points from a 5-dimensional hyper-parameter space: we are searching for the best max_depth
, min_child_weight
, eta
, gamma
, colsample_bytree
inside the default boundaries (anyway, you can change the standard intervals replacing with your owns), while the Bayesian Optimization (thanks to the rBayesianOptimization
package) is attempting 5 times to achieve the optimal solution (using the testing mape
as target). You can also optimize for seq_len
as hyper-parameter.
<- audrex(climate_anomalies[, c("GMTA", "GMSL")], seq_len = 10, n_windows = 5, dates = fixed_date_format, booster = "gbtree", min_set = 10, n_samp = 10, n_search = 5, seed = 12)
example2
:
Best Parameters Found= 1 norm = 0.0000 n_dim = 1.0000 max_depth = 4.0000 eta = 0.9995095 gamma = 18.63397 min_child_weight = 24.59459 subsample = 0.4618708 colsample_bytree = 0.1738651 Value = -2.664667
Round : 315.47 sec elapsed time
If we compare the error statistics from the best model in example2
with the naive model in example1
, we see an interesting improvement in the testing errors for GMTA and GMSL (error metrics are evaluated at the level of differentiated time-feature in AUDREX 2.0 for performance improvement).
The error statistics from example1
.
$best_model$serie_errors
example1$GMTA
rmse mae mdae mape mase rae rse rrse0.103 0.078 0.059 1.196 0.573 0.96 0.958 0.966
training 0.106 0.079 0.056 1.306 0.591 1.01 1.011 1.006
testing
$GMSL
rmse mae mdae mape mase rae rse rrse3.918 2.857 2.082 1.322 0.408 0.772 0.762 0.863
training 3.740 3.062 2.686 2.225 0.667 1.190 1.439 1.190 testing
The error statistics from example2
.
$best_model$serie_errors
example2$GMTA
rmse mae mdae mape mase rae rse rrse0.107 0.082 0.059 1.027 0.596 1 1.003 1.002
training 0.106 0.077 0.055 1.079 0.584 1 1.000 1.000
testing
$GMSL
rmse mae mdae mape mase rae rse rrse4.335 3.507 3.000 1.165 0.502 0.949 0.917 0.956
training 3.162 2.606 2.226 1.083 0.559 1.000 1.000 1.000 testing
There is still something not working well here: testing error is lower than training error in many indicators, we need more space for validation cycles: let’s try windows
= 7. Maybe, it would be better to expand the Bayesian Optimization increasing both n_samp
and n_search
: let’s see what happens if we set n_samp
= 20, n_search
= 10.
<- audrex(climate_anomalies[, c("GMTA", "GMSL")], seq_len = 10, n_windows = 7, dates = fixed_date_format, booster = "gblinear", min_set = 10, n_samp = 20, n_search = 10)
example3
:
Best Parameters Found= 11 norm = 1.0000 n_dim = 2.0000 eta = 0.2388142 lambda = 28.43439 alpha = 8.988946 Value = -2.754667
Round : 451.19 sec elapsed time
A closer look to the history table:
$history
example3
norm n_dim eta lambda alpha max_rmse max_mae max_mdae max_mape max_mase1 FALSE 2 0.813 22.492 18.138 3.233 2.676 2.374 1.065 0.589
2 FALSE 2 0.592 38.783 7.755 3.231 2.672 2.365 1.062 0.589
3 FALSE 2 0.545 72.278 62.368 3.233 2.676 2.374 1.065 0.589
4 TRUE 2 0.250 77.086 92.977 3.233 2.676 2.374 1.065 0.589
5 FALSE 1 0.398 23.386 46.456 3.233 2.676 2.374 1.065 0.589
6 TRUE 1 0.966 34.915 56.575 3.233 2.676 2.374 1.065 0.589
7 FALSE 2 0.525 53.678 10.152 3.231 2.674 2.367 1.060 0.589
8 TRUE 2 0.237 89.180 80.599 3.233 2.676 2.374 1.065 0.589
9 FALSE 2 0.558 36.882 0.815 3.299 2.716 2.470 1.303 0.589
10 TRUE 1 0.404 77.270 63.155 3.233 2.676 2.374 1.065 0.589
11 TRUE 2 0.239 28.434 8.989 3.231 2.672 2.361 1.061 0.589
12 FALSE 2 0.510 35.797 35.394 3.233 2.676 2.374 1.065 0.589
13 TRUE 2 0.596 98.930 70.238 3.233 2.676 2.374 1.065 0.589
14 TRUE 1 0.309 26.123 77.748 3.233 2.676 2.374 1.065 0.589
15 TRUE 1 0.787 73.569 93.444 3.233 2.676 2.374 1.065 0.589
16 TRUE 1 0.814 62.679 90.480 3.233 2.676 2.374 1.065 0.589
17 FALSE 1 0.622 24.214 89.804 3.233 2.676 2.374 1.065 0.589
18 FALSE 1 0.689 13.520 47.307 3.233 2.676 2.374 1.065 0.589
19 FALSE 1 0.357 34.226 26.713 3.233 2.676 2.374 1.065 0.589
20 FALSE 2 0.433 84.970 44.366 3.233 2.676 2.374 1.065 0.589
21 FALSE 2 0.277 70.447 18.462 3.233 2.676 2.374 1.065 0.589
22 TRUE 2 0.090 27.150 100.000 3.233 2.676 2.374 1.065 0.589
23 TRUE 2 0.556 81.554 30.989 3.233 2.676 2.374 1.065 0.589
24 TRUE 1 0.263 2.520 21.871 3.233 2.676 2.374 1.065 0.589
25 FALSE 2 0.988 84.135 8.676 3.231 2.674 2.364 1.059 0.589
26 TRUE 1 0.706 66.726 49.557 3.233 2.676 2.374 1.065 0.589
27 FALSE 2 0.941 0.312 8.717 3.236 2.672 2.358 1.068 0.589
28 TRUE 1 0.102 64.163 8.525 3.231 2.673 2.363 1.059 0.589
29 FALSE 2 0.030 48.261 8.303 3.231 2.673 2.363 1.874 0.617
30 TRUE 2 0.875 8.982 8.394 3.233 2.670 2.367 1.066 0.589
max_rae max_rse max_rrse wgt_avg_rank1 1.001 1.003 1.002 16.06
2 1.000 1.003 1.002 6.23
3 1.001 1.003 1.002 16.06
4 1.001 1.003 1.002 16.06
5 1.001 1.003 1.002 16.06
6 1.001 1.003 1.002 16.06
7 1.001 1.003 1.002 5.55
8 1.001 1.003 1.002 16.06
9 1.020 1.047 1.023 28.43
10 1.001 1.003 1.002 16.06
11 1.000 1.003 1.002 5.33
12 1.001 1.003 1.002 16.06
13 1.001 1.003 1.002 16.06
14 1.001 1.003 1.002 16.06
15 1.001 1.003 1.002 16.06
16 1.001 1.003 1.002 16.06
17 1.001 1.003 1.002 16.06
18 1.001 1.003 1.002 16.06
19 1.001 1.003 1.002 16.06
20 1.001 1.003 1.002 16.06
21 1.001 1.003 1.002 16.06
22 1.001 1.003 1.002 16.06
23 1.001 1.003 1.002 16.06
24 1.001 1.003 1.002 16.06
25 1.001 1.003 1.002 4.34
26 1.001 1.003 1.002 16.06
27 1.000 1.003 1.002 23.48
28 1.000 1.003 1.002 3.59
29 1.046 1.053 1.026 27.99
30 1.000 1.003 1.002 22.76
Here are the best parameters discovered during the Bayesian Optimization:
$history[which.min(example3$history$wgt_avg_rank),]
example3
norm n_dim eta lambda alpha max_rmse max_mae max_mdae max_mape max_mase28 TRUE 1 0.102 64.163 8.525 3.231 2.673 2.363 1.059 0.589
max_rae max_rse max_rrse wgt_avg_rank28 1 1.003 1.002 3.59
Let’s have a look to the testing error: we have some improvement with the error on GMSL prediction, but still the testing error is greater than training: we need longer time features for a correct validation, data is not enough for GMSL.
$best_model$serie_errors
example3$GMTA
rmse mae mdae mape mase rae rse rrse0.109 0.083 0.061 1.025 0.592 1.000 1.000 1.000
training 0.104 0.076 0.054 1.026 0.589 0.999 1.003 1.002
testing
$GMSL
rmse mae mdae mape mase rae rse rrse4.474 3.639 3.055 1.054 0.526 0.999 0.998 0.999
training 3.231 2.673 2.363 1.059 0.551 1.000 1.000 1.000 testing
Well, it looks better to me.
$best_model$plot
example3$GMTA
$GMSL
All you need to know about the Extreme Gradient Boosting implementation in R is here: https://github.com/dmlc/xgboost↩︎
With reference to the Bayesian Optimization, we used the ever-green rBayesianOptimization
package, take a look here: https://github.com/yanyachen/rBayesianOptimization↩︎
This approach was inspired by Christopher Xie, Alex Tank, Alec Greaves-Tunnell, Emily Fox, “A Unified Framework for Long Range and Cold Start Forecasting of Seasonal Profiles in Time Series” (<https://arxiv.org/abs/1710.08473>), at least for the idea of using latent variables as predictors.↩︎
In some cases, maybe you want to operate on smoothed time-features. In this case, AUDREX calls on fANCOVA
package. Here you can find all the latest: https://cran.r-project.org/web/packages/fANCOVA/index.html↩︎