library(proteus)
Intro to proteus
“In Greek mythology, Proteus (/ˈproʊtiəs, -tjuːs/; Ancient Greek: Πρωτεύς, Prōteus) is an early prophetic sea-god or god of rivers and oceanic bodies of water, one of several deities whom Homer calls the “Old Man of the Sea” (halios gerôn). Some who ascribe a specific domain to Proteus call him the god of “elusive sea change”, which suggests the constantly changing nature of the sea or the liquid quality of water. He can foretell the future, but, in a mytheme familiar to several cultures, will change his shape to avoid doing so; he answers only to those who are capable of capturing him. From this feature of Proteus comes the adjective protean, meaning “versatile”, “mutable”, or “capable of assuming many forms”. “Protean” has positive connotations of flexibility, versatility and adaptability.” (Wikipedia)
Multiform like Proteus
Proteus is a Sequence-to-Sequence Variational Model designed for time-feature analysis, leveraging a wide range of distributions for improved accuracy. Unlike traditional methods that rely solely on the normal distribution, Proteus uses various latent models to better capture and predict complex processes. To achieve this, Proteus employs a neural network architecture that estimates the shape, location, and scale parameters of the chosen distribution. This approach transforms past sequence data into future sequence parameters, improving the model’s prediction capabilities. Proteus also assesses the accuracy of its predictions by estimating the error of measurement and calculating the confidence interval. By utilizing a range of distributions and advanced modeling techniques, Proteus provides a more accurate and comprehensive approach to time-feature analysis.
Here is a description of Proteus’s architecture. A number of neural network models are created to estimate each shape/location/scale parameter of the chosen latent distribution: moving from left to right, the tensors with the past sequences are transformed into the tensors of parameters of the future sequences. The latent model is then used to produce the future sequences, estimate the measurement error, and assess the confidence interval.
The time features, structured in a dataframe in columnar order, are “horizontally” reframed, creating a 3D tensor. The 3D tensor passes through three main steps: time2vec embedding (inspired by this reference¹), adaptive normalization (inspired by this reference²), and a simple neural network with three linear transformations to reach the target size. At the end of these three steps, you have the tensor of estimated parameters for the chosen latent model, and the sampling process may begin.
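To make the embedding step concrete, here is a minimal sketch of a time2vec-style encoding in base R, following Kazemi et al.¹ rather than the package’s internal implementation; the weights here are random placeholders:

# time2vec-style embedding: one linear (trend) term plus k - 1 sinusoidal
# (periodic) terms; weights are illustrative only, not fitted values
time2vec <- function(tau, w, b) {
  trend <- w[1] * tau + b[1]                                          # linear component
  periodic <- sin(outer(tau, w[-1]) + rep(b[-1], each = length(tau))) # periodic components
  cbind(trend, periodic)
}
set.seed(42)
emb <- time2vec(1:60, w = rnorm(20), b = rnorm(20))
dim(emb)  # 60 x 20: one trend plus 19 periodic embeddings per time step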
Starting from scratch
In our introduction to Proteus, we are going to use the Close Price series for Amazon, Google and Facebook (from Yahoo Finance). As shown in amzn_aapl_fb, the time features are expected as ordered columns in a dataframe.
knitr::kable(head(amzn_aapl_fb, 10), align = "ccc", caption = "Examples of time features: close prices for Amazon, Google and Facebook")
 | Date | AMZN | GOOGL | FB |
---|---|---|---|---|
3779 | 2012-05-18 | 213.85 | 300.5005 | 38.23 |
3780 | 2012-05-21 | 218.11 | 307.3624 | 34.03 |
3781 | 2012-05-22 | 215.33 | 300.7007 | 31.00 |
3782 | 2012-05-23 | 217.28 | 305.0350 | 32.00 |
3783 | 2012-05-24 | 215.24 | 302.1321 | 33.03 |
3784 | 2012-05-25 | 212.89 | 296.0611 | 31.91 |
3785 | 2012-05-29 | 214.75 | 297.4675 | 28.84 |
3786 | 2012-05-30 | 209.23 | 294.4094 | 28.19 |
3787 | 2012-05-31 | 212.91 | 290.7207 | 29.60 |
3788 | 2012-06-01 | 208.22 | 285.7758 | 27.72 |
In our first example, we use proteus to predict a single time feature, namely Amazon. Proteus reframes each time feature as a matrix of sequences, with n_sequences rows and past + future columns. The number of sequences is determined by the stride variable, which in this case is set to 1. This means that each past + future sequence is shifted by a single position in time.
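To make the reframing concrete, here is a hedged sketch, for intuition only, of how a single feature becomes an n_sequences x (past + future) matrix (proteus does this internally):

# illustrative sliding-window reframing of one time feature
past <- 60; future <- 30; stride <- 1
x <- amzn_aapl_fb$AMZN
n_seq <- floor((length(x) - past - future) / stride) + 1
seq_matrix <- t(sapply(seq(1, by = stride, length.out = n_seq),
                       function(i) x[i:(i + past + future - 1)]))
dim(seq_matrix)  # n_seq rows, past + future columns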
We aim to predict sequences of 30 time periods in the future based on 60 time periods in the past, utilizing 20 different temporal embeddings. These embeddings decompose the original time feature into 1 trend and 19 different periodic components. We use a feed-forward neural network with 32 nodes and a variational model based on the normal distribution.
The optimization method is another important hyper-parameter that impacts the error performance of the model. We measure error using back-testing on four rolling blocks, with n_blocks set to 4 and rolling_blocks set to TRUE. This means that the error is sampled over three different measurements using a rolling window scheme. If rolling_blocks is set to FALSE, the error is sampled three times using an incremental window scheme.
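The two schemes differ only in how the training window grows from one validation round to the next; here is a hedged illustration of the index arithmetic, not the package’s internal code:

# with n_blocks = 4, the sequence index is cut into 4 consecutive blocks;
# either scheme yields 3 train/test error measurements
n_seq <- 2164; n_blocks <- 4
blocks <- split(seq_len(n_seq), cut(seq_len(n_seq), n_blocks, labels = FALSE))
for (i in 1:(n_blocks - 1)) {
  rolling_train     <- blocks[[i]]          # rolling: only the latest block
  incremental_train <- unlist(blocks[1:i])  # incremental: all blocks so far
  test_set          <- blocks[[i + 1]]      # the next block is always the test set
}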
Setting verbose to TRUE provides detailed information on the training and validation process, including the number of sequences, the max batch size, and the loss metric. Proteus offers several loss metrics, including the Evidence Lower Bound, the Continuous Ranked Probability Score, and a Custom Score (the last is the absolute difference in CDF between prediction and actuals, computed on the estimated latent parameters for the chosen distribution).
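For intuition about the CRPS option, the CRPS of a normal predictive distribution has a well-known closed form (Gneiting and Raftery); this is a generic sketch, not necessarily how the package computes it:

# closed-form CRPS for a normal forecast N(mu, sigma) against observation y
crps_norm <- function(y, mu, sigma) {
  z <- (y - mu) / sigma
  sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi))
}
crps_norm(y = 2012, mu = 2010, sigma = 15)  # lower is better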
example1 <- proteus(amzn_aapl_fb, target = "AMZN", future = 30, past = 60, t_embed = 20, activ = "linear", nodes = 32, distr = "normal", optim = "adam", loss_metric = "crps", rolling_blocks = TRUE, n_blocks = 4, stride = 1, verbose = TRUE, dates = "Date")
date and value gaps filled with kalman imputation
block 1
541 sequence for training
541 sequence for testing
epoch: 3 Train loss: 0.3282238 Test loss: 0.3254851
epoch: 6 Train loss: 0.3331435 Test loss: 0.3290568
epoch: 9 Train loss: 0.3325217 Test loss: 0.3298825
epoch: 12 Train loss: 0.3241959 Test loss: 0.3331832
early stop at epoch: 12 Train loss: 0.1797601 Test loss: 0.295964
block 2
541 sequence for training
541 sequence for testing
epoch: 3 Train loss: 0.3333673 Test loss: 0.3357568
epoch: 6 Train loss: 0.327236 Test loss: 0.3315799
epoch: 9 Train loss: 0.3264316 Test loss: 0.3358205
epoch: 12 Train loss: 0.3347558 Test loss: 0.328617
epoch: 15 Train loss: 0.3318962 Test loss: 0.3311511
epoch: 18 Train loss: 0.3319906 Test loss: 0.3360375
epoch: 21 Train loss: 0.3293301 Test loss: 0.3344645
epoch: 24 Train loss: 0.3340255 Test loss: 0.332931
epoch: 27 Train loss: 0.3338883 Test loss: 0.3309637
epoch: 30 Train loss: 0.3275637 Test loss: 0.325176
block 3
541 sequence for training
541 sequence for testing
epoch: 3 Train loss: 0.3300303 Test loss: 0.3347785
epoch: 6 Train loss: 0.3347132 Test loss: 0.3320392
epoch: 9 Train loss: 0.3323337 Test loss: 0.3323828
epoch: 12 Train loss: 0.3296569 Test loss: 0.3317752
early stop at epoch: 13 Train loss: 0.2757971 Test loss: 0.3399342
final training on all 4
2164 sequence for training
epoch: 3 Train loss: 0.3334524
epoch: 6 Train loss: 0.3327346
epoch: 9 Train loss: 0.3326941
epoch: 12 Train loss: 0.3342977
epoch: 15 Train loss: 0.3330329
epoch: 18 Train loss: 0.3315944
epoch: 21 Train loss: 0.3305171
epoch: 24 Train loss: 0.3344898
epoch: 27 Train loss: 0.3330322
epoch: 30 Train loss: 0.3325613
variational model based on normal latent distribution with 104 tensors and 167850 parameters
proteus: 969.31 sec elapsed
The result is a list of different components, as you can see below.
names(example1)
[1] "model_descr" "prediction" "plot" "features_errors"
[5] "history" "time_log"
The first variable is a simple high-level description of the model.
example1$model_descr
[1] "variational model based on normal latent distribution with 104 tensors and 167850 parameters"
The prediction element is a list including the predicted results for each time feature (quantiles, min, max, mean, mode, sd, skewness, kurtosis, IQR to range, above-to-below range, upside probability, and divergence for each time point in the sequence). The IQR to range is the ratio of the interquartile range to the min-max range; the above-to-below range is the ratio of the range above the median to the range below it; the upside probability is the probability of growth compared to the previous point in the sequence; the divergence is the maximum distance between the cumulative normal curve of each point and that of the previous point in the sequence.
knitr::kable(example1$prediction$AMZN[1:10,], align = "ccc", caption = "Examples of time-feature prediction (first ten rows)")
 | min | 10% | 25% | 50% | 75% | 90% | max | mean | sd | mode | kurtosis | skewness | iqr_to_range | above_to_below_range | upside_prob | divergence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2019-07-12 | 1871.676 | 2000.441 | 2008.193 | 2011.581 | 2015.973 | 2023.987 | 2128.774 | 2011.812 | 15.9701 | 2011.240 | 18.8059 | -0.4415 | 0.0303 | 0.8377 | 0.5625 | 0.3590 |
2019-07-13 | 1846.539 | 1993.865 | 2006.659 | 2012.422 | 2019.670 | 2031.785 | 2146.090 | 2012.695 | 22.6081 | 2012.371 | 12.7274 | -0.3956 | 0.0434 | 0.8058 | 0.5632 | 0.0536 |
2019-07-14 | 1810.182 | 1988.702 | 2005.666 | 2013.443 | 2023.462 | 2039.838 | 2146.244 | 2013.838 | 27.7922 | 2012.857 | 10.1913 | -0.5391 | 0.0530 | 0.6534 | 0.5829 | 0.0271 |
2019-07-15 | 1767.992 | 1986.050 | 2004.821 | 2014.537 | 2027.263 | 2047.659 | 2157.775 | 2014.943 | 32.5280 | 2013.479 | 9.5906 | -0.7504 | 0.0576 | 0.5810 | 0.5816 | 0.0197 |
2019-07-16 | 1759.234 | 1981.934 | 2004.203 | 2015.539 | 2030.340 | 2053.713 | 2186.508 | 2015.691 | 36.5173 | 2013.907 | 9.0279 | -0.7370 | 0.0612 | 0.6671 | 0.5508 | 0.0203 |
2019-07-17 | 1781.287 | 1981.259 | 2003.228 | 2015.434 | 2032.437 | 2056.735 | 2201.561 | 2016.197 | 40.0886 | 2013.796 | 8.6095 | -0.6986 | 0.0695 | 0.7949 | 0.5311 | 0.0228 |
2019-07-18 | 1759.668 | 1979.082 | 2002.642 | 2016.389 | 2035.440 | 2060.172 | 2202.072 | 2016.932 | 43.2282 | 2013.050 | 8.9574 | -0.7630 | 0.0741 | 0.7233 | 0.5514 | 0.0216 |
2019-07-19 | 1729.402 | 1977.678 | 2001.741 | 2017.217 | 2037.231 | 2063.003 | 2235.560 | 2017.841 | 46.3900 | 2014.703 | 9.4186 | -0.7207 | 0.0701 | 0.7586 | 0.5656 | 0.0136 |
2019-07-20 | 1724.631 | 1977.317 | 2001.581 | 2017.070 | 2039.155 | 2067.486 | 2251.566 | 2018.706 | 49.2971 | 2015.534 | 9.4522 | -0.7050 | 0.0713 | 0.8019 | 0.5576 | 0.0148 |
2019-07-21 | 1718.067 | 1975.772 | 2001.306 | 2017.695 | 2041.739 | 2070.241 | 2280.978 | 2019.581 | 51.8477 | 2015.648 | 9.2776 | -0.7569 | 0.0718 | 0.8787 | 0.5607 | 0.0092 |
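As a quick sanity check, iqr_to_range can be recomputed from the quantile columns above (a hedged sketch, assuming the prediction table is indexable by the printed column names):

# interquartile range relative to the full min-max range, per time point
pred <- example1$prediction$AMZN
head((pred[, "75%"] - pred[, "25%"]) / (pred[, "max"] - pred[, "min"]))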
For each time feature included in the model, you get a plot of the median values with the chosen confidence interval (the ci default is 0.8).
example1$plot
$AMZN
Adding any number of time features
It is possible to select any number of time features from the starting dataset. In the following example, we select Amazon, Google and Facebook for a joint prediction.
example2 <- proteus(amzn_aapl_fb, target = c("AMZN", "GOOGL", "FB"), future = 30, past = 60, t_embed = 20, activ = "linear", nodes = 64, distr = "normal", optim = "adam", rolling_blocks = FALSE, stride = 10, verbose = FALSE, dates = "Date")
proteus: 249.98 sec elapsed
example2$plot
$AMZN
$GOOGL
$FB
The history plot reports the average selected loss across the validation blocks (in this case, based on an incremental window scheme with 10-strided sequences).
example2$history
The features_errors element includes the standard metrics (me, mae, mse, rmsse, mpe, mape, rmae, rrmse, rame, mase, smse, sce) for each time feature.
example2$features_errors
$AMZN
me mae mse rmsse mpe mape rmae rrmse
train 5.62200 15.79933 525.156 11.42867 0.011 0.04200000 0.0850000 0.1083333
test 15.64367 40.34967 4017.532 25.79433 0.016 0.04133333 0.1613333 0.1816667
rame mase smse sce
train 0.02233333 4.191000 131.0683 4826.621
test 0.10733333 9.393667 813.8523 6407.464
$GOOGL
me mae mse rmsse mpe mape rmae
train 6.337 16.91867 584.695 12.26367 0.01133333 0.03300000 0.0700000
test 5.176 26.83400 1460.180 18.52433 0.00500000 0.03066667 0.2703333
rrmse rame mase smse sce
train 0.08933333 0.02633333 4.526333 150.6217 5185.446
test 0.32466667 0.13833333 6.868333 361.1113 2257.251
$FB
me mae mse rmsse mpe mape rmae
train 1.573000 2.759000 15.71833 4.739 0.02600000 0.05533333 0.05533333
test 1.597667 4.945333 54.01133 7.686 0.01133333 0.03666667 0.28300000
rrmse rame mase smse sce
train 0.07166667 0.02966667 4.020667 22.57467 7611.148
test 0.31300000 0.22766667 6.735333 68.88967 3741.401
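To ground a couple of these metrics, here is a toy hand computation using the generic definitions of ME, MAE, and MAPE (the package may scale or aggregate them differently across blocks):

# generic error metrics on a tiny made-up actual/predicted pair
actual <- c(2000, 2010, 2020)
pred   <- c(1995, 2015, 2018)
me   <- mean(actual - pred)                    # mean error (bias)
mae  <- mean(abs(actual - pred))               # mean absolute error
mape <- mean(abs(actual - pred) / abs(actual)) # mean absolute percentage error
c(me = me, mae = mae, mape = mape)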
Shifting skin, enhancing precision and getting a better understanding of uncertainty
Proteus offers a selection of twelve different latent models, each with two or three parameters, that can be used to improve the accuracy of predictions. However, selecting a specific model may increase the number of parameters to be estimated and the computation time required.
Other important variables that impact computation time include rolling_blocks and stride. When rolling_blocks is set to FALSE, back-testing is performed using an incremental block scheme with an increasing number of sequences carried over from previous blocks, resulting in more accurate results but longer computation time. The stride parameter operates as a thinning factor, reducing tensor size and computation time. To illustrate, consider the comparison between example1 and example3. By using a larger stride, we were able to reduce computation time and overfitting. Therefore, selecting the appropriate latent model and adjusting key parameters can significantly impact the performance of time-feature analysis.
example3 <- proteus(amzn_aapl_fb, target = "AMZN", future = 30, past = 60, t_embed = 20, activ = "linear", nodes = 32, distr = "genbeta", optim = "adam", loss_metric = "crps", rolling_blocks = FALSE, stride = 20, verbose = FALSE, dates = "Date")
proteus: 173.34 sec elapsed
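The speedup follows directly from the thinning effect of stride on the number of sequences; a back-of-the-envelope count, reusing the sequence arithmetic sketched earlier with a hypothetical series length:

# sequence counts shrink roughly in proportion to stride
n <- 1800; past <- 60; future <- 30   # n is a hypothetical series length
for (s in c(1, 10, 20)) {
  cat("stride", s, "->", floor((n - past - future) / s) + 1, "sequences\n")
}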
example1$time_log
[1] "16M 9S"
example3$time_log
[1] "2M 53S"
example1$model_descr
[1] "variational model based on normal latent distribution with 104 tensors and 167850 parameters"
example3$model_descr
[1] "variational model based on genbeta latent distribution with 156 tensors and 251775 parameters"
example1$features_errors
$AMZN
me mae mse rmsse mpe mape rmae
train 8.101667 18.41333 711.8557 12.84200 0.01400000 0.03833333 0.1306667
test 13.773333 40.40367 3969.9887 25.80133 0.01433333 0.04133333 0.1623333
rrmse rame mase smse sce
train 0.1586667 0.056 4.691333 170.1490 31275.20
test 0.1843333 0.101 9.390667 806.6403 56542.38
example3$features_errors
$AMZN
me mae mse rmsse mpe mape rmae
train -3.061000 15.395 469.3167 11.03867 -0.014000000 0.04266667 0.08566667
test 5.324667 35.400 2957.0780 22.29100 0.002333333 0.03733333 0.14166667
rrmse rame mase smse sce
train 0.1096667 0.02866667 4.146000 122.1493 -1177.1557
test 0.1560000 0.02266667 8.274333 601.1843 975.2163
With the release of version 1.1, we implemented a dedicated function for hyper-parameter tuning using random search. To see if we can improve our results with a limited number of models, let’s begin a random search with a sample size of 3. This will not only provide us with potential improvements, but also give us valuable insights into how to further refine the tuning process.
example4 <- proteus_random_search(3, amzn_aapl_fb, target = "AMZN", future = 30, loss_metric = "crps", rolling_blocks = FALSE, verbose = FALSE, dates = "Date")
proteus: 409.89 sec elapsed
proteus: 377.65 sec elapsed
proteus: 217.81 sec elapsed
random search: 1005.41 sec elapsed
If we take a look inside the random_search table, we can get an idea of the best hyper-parameters.
knitr::kable(example4$random_search, align = "ccc", caption = "Examples of random search into the hyper-parameter space of proteus")
model | past | t_embed | activ | nodes | distr | optim | lr | stride | avg_me | avg_mae | avg_mse | avg_rmsse | avg_mpe | avg_mape | avg_rmae | avg_rrmse | avg_rame | avg_mase | avg_smse | avg_sce |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 30 | 5 | linear | 25 | chisq | sgd | 0.088 | 9 | 9.030500e+00 | 2.780270e+01 | 2.281405e+03 | 1.830150e+01 | 0.0103 | 0.0403 | 0.1197 | 0.1413 | 0.0543 | 6.681300e+00 | 4.704435e+02 | 5.084861e+03 |
2 | 34 | 11 | softmax | 304 | gpd | rprop | 0.088 | 9 | -2.055587e+02 | 2.069192e+02 | 3.465990e+05 | 2.165030e+02 | -0.4363 | 0.4370 | 1.1597 | 1.6680 | 1.5997 | 5.506780e+01 | 6.896170e+04 | -1.581803e+05 |
1 | 46 | 26 | mish | 129 | exp | asgd | 0.042 | 3 | -5.547500e+07 | 5.547500e+07 | 1.769282e+18 | 3.942615e+08 | -136643.7175 | 136643.7175 | 225947.5537 | 2951937.4700 | 271004.4677 | 1.269292e+07 | 3.371283e+17 | -1.400557e+11 |
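From here, a natural next step is to pick the winning configuration programmatically (a hedged sketch, assuming the column names shown in the table above):

# select the row with the lowest average MAE and inspect its hyper-parameters
rs <- example4$random_search
rs[which.min(rs$avg_mae), c("past", "t_embed", "activ", "nodes", "distr", "optim", "lr", "stride")]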
Footnotes
Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, StellaWu, Cathal Smyth, Pascal Poupart, Marcus Brubaker, Time2Vec: Learning a Vector Representation of Time, arXiv:1907.05321v1 [cs.LG] 11 Jul 2019↩︎
Nikolaos Passalis, Anastasios Tefas, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis, Deep Adaptive Input Normalization for Time Series Forecasting, arXiv:1902.07892v2 [q-fin.CP] 22 Sep 2019↩︎